| groupingsets {data.table} | R Documentation |
Grouping Set aggregation for data tables
Description
Calculate aggregates at various levels of groupings producing multiple (sub-)totals. Reflects SQLs GROUPING SETS operations.
Usage
rollup(x, ...)
## S3 method for class 'data.table'
rollup(x, j, by, .SDcols, id = FALSE, label = NULL, ...)
cube(x, ...)
## S3 method for class 'data.table'
cube(x, j, by, .SDcols, id = FALSE, label = NULL, ...)
groupingsets(x, ...)
## S3 method for class 'data.table'
groupingsets(x, j, by, sets, .SDcols,
id = FALSE, jj, label = NULL, enclos = parent.frame(), ...)
Arguments
x |
|
... |
argument passed to custom user methods. Ignored for |
j |
expression passed to data.table |
by |
character column names by which we are grouping. |
sets |
list of character vector reflecting grouping sets, used in |
.SDcols |
columns to be used in |
id |
logical default |
jj |
quoted version of |
label |
label(s) to be used in the 'total' rows in the grouping variable columns of the output, that is, in rows where the grouping variable has been aggregated. Can be a named list of scalars, or a scalar, or |
enclos |
the environment containing the symbols referenced by |
Details
All three functions rollup, cube, groupingsets are generic methods, data.table methods are provided.
The label argument can be a named list of scalars, or a scalar, or NULL. When label is a list, each element name must be (1) a variable name in by, or (2) the first element of the class in the data.table x of a variable in by, or (3) one of 'character', 'integer', 'numeric', 'factor', 'Date', 'IDate'. The order of the list elements is not important. A label specified by variable name will apply only to that variable, while a label specified by first element of a class will apply to all variables in by for which the first element of the class of the variable in x matches the label element name, except for variables that have a label specified by variable name (that is, specification by variable name takes precedence over specification by class). For label elements with name in by, the class of the label value must be the same as the class of the variable in x. For label elements with name not in by, the first element of the class of the label value must be the same as the label element name. For example, label = list(integer = 999, IDate = as.Date("3000-01-01")) would produce an error because class(999)[1] is not "integer" and class(as.Date("3000-01-01"))[1] is not "IDate". A corrected specification would be label = list(integer = 999L, IDate = as.IDate("3000-01-01")).
The label = <scalar> option provides a shorter alternative in the case where only one class of grouping variable requires a label. For example, label = list(character = "Total") can be shortened to label = "Total". When this option is used, the label will be applied to all variables in by for which the first element of the class of the variable in x matches the first element of the class of the scalar.
Value
A data.table with various aggregates.
References
https://www.postgresql.org/docs/9.5/static/queries-table-expressions.html#QUERIES-GROUPING-SETS https://www.postgresql.org/docs/9.5/static/functions-aggregate.html#FUNCTIONS-GROUPING-TABLE
See Also
Examples
n = 24L
set.seed(25)
DT <- data.table(
color = sample(c("green","yellow","red"), n, TRUE),
year = as.Date(sample(paste0(2011:2015,"-01-01"), n, TRUE)),
status = as.factor(sample(c("removed","active","inactive","archived"), n, TRUE)),
amount = sample(1:5, n, TRUE),
value = sample(c(3, 3.5, 2.5, 2), n, TRUE)
)
# rollup
by_vars = c("color", "year", "status")
rollup(DT, j=sum(value), by=by_vars) # default id=FALSE
rollup(DT, j=sum(value), by=by_vars, id=TRUE)
rollup(DT, j=lapply(.SD, sum), by=by_vars, id=TRUE, .SDcols="value")
rollup(DT, j=c(list(count=.N), lapply(.SD, sum)), by=by_vars, id=TRUE)
rollup(DT, j=sum(value), by=by_vars,
# specify label by variable name
label=list(color="total", year=as.Date("3000-01-01"), status=factor("total")))
rollup(DT, j=sum(value), by=by_vars,
# specify label by variable name and first element of class
label=list(color="total", Date=as.Date("3000-01-01"), factor=factor("total")))
# label is character scalar so applies to color only
rollup(DT, j=sum(value), by=by_vars, label="total")
rollup(DT, j=.N, by=c("color", "year", "status", "value"),
# label can be explicitly specified as NA or NaN
label = list(color=NA_character_, year=as.Date(NA), status=factor(NA), value=NaN))
# cube
cube(DT, j = sum(value), by = c("color","year","status"), id=TRUE)
cube(DT, j = lapply(.SD, sum), by = c("color","year","status"), id=TRUE, .SDcols="value")
cube(DT, j = c(list(count=.N), lapply(.SD, sum)), by = c("color","year","status"), id=TRUE)
# groupingsets
groupingsets(DT, j = c(list(count=.N), lapply(.SD, sum)), by = c("color","year","status"),
sets = list("color", c("year","status"), character()), id=TRUE)