Commit 31c874a

committed
Merge remote-tracking branch 'refs/remotes/origin/master'
2 parents 94508d2 + 0009de0 commit 31c874a

File tree

1 file changed: +45 -44 lines changed

doc/commands/gretl_functions_en.xml

+45 -44
Original file line number | Diff line number | Diff line change
@@ -1642,13 +1642,13 @@
16421642
<quote>Matrix input</quote> below for alternative usage.
16431643
</para>
16441644
<para>
1645-
In the most minimal usage, <argname>x</argname> is set to
1645+
In the simplest usage <argname>x</argname> is set to
16461646
<lit>null</lit>, <argname>byvar</argname> is a single series
1647-
and the third argument is omitted, or set to
1648-
<lit>null</lit>. In this case, the return value is a matrix
1649-
with two columns holding, respectively, the distinct values
1650-
of <argname>byvar</argname>, sorted in ascending order, and
1651-
the count of observations at which <argname>byvar</argname>
1647+
and the third argument is omitted or set to
1648+
<lit>null</lit>. The return value is then a matrix with two
1649+
columns holding, respectively, the distinct values of
1650+
<argname>byvar</argname> sorted in ascending order, and the
1651+
count of observations at which <argname>byvar</argname>
16521652
takes on each of these values. For example,
16531653
</para>
16541654
<code>
@@ -1661,40 +1661,40 @@
16611661
</para>
16621662
<para>
16631663
More generally, if <argname>byvar</argname> is a list with
1664-
<math>n</math> members, then the left-hand <math>n</math>
1665-
columns hold the combinations of the distinct values of each
1666-
of the <math>n</math> series and the count column holds the
1667-
number of observations at which each combination is
1668-
realized. Note that the count column can always be found at
1669-
the position <lit>nelem(byvar) + 1</lit>.
1664+
<math>n</math> members then the first <math>n</math> columns
1665+
of the returned matrix hold the combinations of the distinct
1666+
values of each of the <math>n</math> series, and the count
1667+
column holds the number of observations at which each
1668+
combination is realized. (The count column can always be
1669+
found at the position <lit>nelem(byvar)+1</lit>).
16701670
</para>
16711671
<subhead>Specifying an aggregation function</subhead>
16721672
<para>
1673-
If the third argument is given, then <argname>x</argname>
1673+
If the third argument is given then <argname>x</argname>
16741674
must not be <lit>null</lit>, and the rightmost
16751675
<math>m</math> columns hold the values of the statistic
16761676
specified by <argname>funcname</argname> for each of the
1677-
variables in <argname>x</argname>. (Thus, <math>m</math> is
1677+
variables in <argname>x</argname>. (So <math>m</math> is
16781678
equal to 1 if <argname>x</argname> is a single series and
16791679
equal to <lit>nelem(x)</lit> if <argname>x</argname> is a
1680-
list.) The given statistic is calculated on the respective
1680+
list.) The specified statistic is calculated on the
16811681
sub-samples defined by the combinations in
16821682
<argname>byvar</argname> (in ascending order); these
16831683
combinations are shown in the first <math>n</math> column(s)
16841684
of the returned matrix.
16851685
</para>
16861686
<para>
1687-
So, in the special case where <argname>x</argname> and
1688-
<argname>byvar</argname> are both individual series, the
1689-
return value is a matrix with three columns holding,
1690-
respectively, the distinct values of
1691-
<argname>byvar</argname>, sorted in ascending order; the
1692-
count of observations at which <argname>byvar</argname>
1693-
takes on each of these values; and the values of the
1694-
statistic specified by <argname>funcname</argname>
1695-
calculated on series <argname>x</argname>, using only those
1696-
observations at which <argname>byvar</argname> takes on the
1697-
value given in the first column.
1687+
So, if both <argname>x</argname> and
1688+
<argname>byvar</argname> are individual series, the return
1689+
value is a matrix with three columns holding the distinct
1690+
values of <argname>byvar</argname> sorted in ascending
1691+
order; the count of observations at which
1692+
<argname>byvar</argname> takes on each of these values; and
1693+
the values of the statistic specified by
1694+
<argname>funcname</argname> calculated on series
1695+
<argname>x</argname>, using just those observations at which
1696+
<argname>byvar</argname> takes on the value given in the
1697+
first column.
16981698
</para>
16991699
<para>
17001700
The following values of <argname>funcname</argname> are
@@ -1710,7 +1710,7 @@
17101710
be said to <quote>aggregate</quote> the series in some way.
17111711
If none of these built-in functions does what you need, you
17121712
can give the name of a user-defined function as the
1713-
aggregator; like the built-ins, such a function must take a
1713+
aggregator. Like the built-ins, such a function must take a
17141714
single series argument and return a scalar value.
17151715
</para>
17161716
<para>
@@ -1720,12 +1720,13 @@
17201720
(non-missing) observations on <argname>x</argname> at
17211721
each <argname>byvar</argname> combination.
17221722
</para>
1723+
<subhead>Some examples</subhead>
17231724
<para>
1724-
For a simple example, suppose that <lit>region</lit>
1725-
represents a coding of geographical region using integer
1726-
values 1 to <math>n</math>, and <lit>income</lit> represents
1727-
household income. Then the following would produce an <by
1728-
r="n" c="3"/> matrix holding the region codes, the count of
1725+
First, suppose that <lit>region</lit> represents a coding of
1726+
geographical region using integer values 1 to
1727+
<math>n</math>, and <lit>income</lit> represents household
1728+
income. Then the following would produce an <by r="n"
1729+
c="3"/> matrix holding the region codes, the count of
17291730
observations in each region, and mean household income for
17301731
each of the regions:
17311732
</para>
@@ -1752,15 +1753,15 @@
17521753
of <lit>income</lit> and <lit>age</lit>.
17531754
</para>
17541755
<para>
1755-
Note that if <argname>byvar</argname> is a list, some
1756-
combinations of the <argname>byvar</argname> values may not
1757-
be present in the data (giving a count of zero). In that
1758-
case the value of the statistics for <argname>x</argname>
1759-
are recorded as <lit>NaN</lit> (not a number). If you want
1760-
to ignore such cases you can use the <fncref targ="selifr"/>
1761-
function to select only those rows that have a non-zero
1762-
count. The column to test is one place to the right of the
1763-
number of <argname>byvar</argname> variables, so we can do:
1756+
If <argname>byvar</argname> is a list, some combinations of
1757+
the <argname>byvar</argname> values may not be present in
1758+
the data (giving a count of zero). In that case the value of
1759+
the statistics for <argname>x</argname> are recorded as
1760+
<lit>NaN</lit> (not a number). To cut out such cases you
1761+
can use the <fncref targ="selifr"/> function to select only
1762+
those rows that have a non-zero count. The column to test is
1763+
one place to the right of the number of
1764+
<argname>byvar</argname> variables, so we can do:
17641765
</para>
17651766
<code>
17661767
matrix m = aggregate(X, BY, sd)
@@ -1774,9 +1775,9 @@
17741775
form. However, if both arguments are provided they must
17751776
match in type (you cannot give a series or list for one
17761777
argument and a matrix for the other) and two matrix
1777-
arguments must have the same number of rows. Also note that
1778-
in this context matrix columns are treated as if they were
1779-
series, so the aggregation function must follow the pattern
1778+
arguments must have the same number of rows. In this
1779+
context matrix columns are treated as if they were series,
1780+
so the aggregation function must follow the pattern
17801781
described above, taking a series argument and returning a
17811782
scalar.
17821783
</para>
