Makes one data.table from a list of many
Same as do.call(rbind, l) on data.frames, but much faster.
Usage
rbindlist(l, use.names="check", fill=FALSE, idcol=NULL, ignore.attr=FALSE)
# rbind(..., use.names=TRUE, fill=FALSE, idcol=NULL)
Arguments
- l
A list containing data.table, data.frame or list objects. ... is the same but you pass the objects by name separately.
- use.names
TRUE binds by matching column name, FALSE by position. "check" (default) warns if all items don't have the same names in the same order and then currently proceeds as if use.names=FALSE for backwards compatibility (TRUE in future); see news for v1.12.2.
- fill
TRUE fills missing columns with NAs, or NULL for missing list columns. By default FALSE.
- idcol
Creates a column in the result showing which list item those rows came from. TRUE names this column ".id". idcol="file" names this column "file". If the input list has names, those names are the values placed in this id column, otherwise the values are an integer vector 1:length(l). See examples.
- ignore.attr
Logical, default FALSE. When TRUE, allows binding columns with different attributes (e.g. class).
Details
Each item of l can be a data.table, data.frame or list, including NULL (skipped) or an empty object (0 rows). rbindlist is most useful when there is an unknown number of (potentially many) objects to stack, such as those returned by lapply(fileNames, fread). rbind is most useful for stacking two or three objects which you know in advance. For rbind(...), ... should contain at least one data.table so that the fast method is called and a data.table is returned, whereas rbindlist(l) always returns a data.table, even when stacking a plain list with a data.frame, for example.
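The lapply(fileNames, fread) pattern mentioned above can be sketched as follows. This is only an illustration: the temporary directory, the file names jan.csv and feb.csv, and their contents are made up for the demo.

```r
library(data.table)

# Hypothetical setup: write two small CSV files into a temporary directory
tmp = tempfile("rbindlist_demo_")
dir.create(tmp)
fwrite(data.table(id = 1:2, value = c(0.5, 1.5)), file.path(tmp, "jan.csv"))
fwrite(data.table(id = 3:4, value = c(2.5, 3.5)), file.path(tmp, "feb.csv"))

fileNames = list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Read every file and name each list item after the file it came from
l = lapply(fileNames, fread)
names(l) = basename(fileNames)

# Stack all items; idcol="file" records the source file of each row
DT = rbindlist(l, use.names = TRUE, idcol = "file")
print(DT)

# Clean up the temporary files
unlink(tmp, recursive = TRUE)
```

Naming the list before binding is what makes the id column hold file names rather than the integers 1:length(l).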
Columns with duplicate names are bound in the order of occurrence, similar to base. The position (column number) at which each duplicate name occurs is also retained.
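A small sketch of how duplicate names are expected to line up when binding by name. The tables and their values are invented for illustration; the first "a" in each item should be matched to the first "a" in the others, the second to the second.

```r
library(data.table)

# Both items have two columns named "a"; the column order differs between items
DT1 = data.table(a = 1:2, b = c("x", "y"), a = 3:4)
DT2 = data.table(b = "z", a = 5L, a = 6L)

res = rbindlist(list(DT1, DT2), use.names = TRUE)

# Column names and positions follow the first item
print(names(res))
print(res)
```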
If column i does not have the same type in each of the list items (e.g., the column is integer in item 1 while it is numeric in the others), it is coerced to the highest type.
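As a minimal illustration of the coercion rule, with invented values: an integer column bound with a double column comes back as double.

```r
library(data.table)

DT1 = data.table(x = 1:2)            # integer column
DT2 = data.table(x = c(2.5, 3.5))    # double (numeric) column

res = rbindlist(list(DT1, DT2))

# The integer values are promoted so the whole column has one type
print(class(res$x))
print(res)
```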
If a column contains factors then a factor is created. If any of the factors are also ordered factors then the longest set of ordered levels is found (the first if this is tied). The ordered levels from each list item are then checked to be an ordered subset of these longest levels. If any ambiguities are found (e.g. blue<green vs green<blue), or any ordered levels are missing from the longest, then a regular factor is created with a warning. Any strings in regular factor and character columns which are missing from the longest ordered levels are added at the end.
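The factor handling can be explored with a sketch like the one below. The column name size and its levels are made up. Per the description above, the shorter ordered levels are an ordered subset of the longest set, and the unseen string from the character column should be appended to the levels; the code prints the result rather than asserting a particular level order.

```r
library(data.table)

# Longest set of ordered levels: small < medium < large
DT1 = data.table(size = factor(c("small", "large"),
                               levels = c("small", "medium", "large"),
                               ordered = TRUE))
# A compatible ordered subset: small < large
DT2 = data.table(size = factor(c("large", "small"),
                               levels = c("small", "large"),
                               ordered = TRUE))
# A plain character column containing one string not seen so far
DT3 = data.table(size = c("small", "huge"))

res = rbindlist(list(DT1, DT2, DT3))

# Inspect the resulting factor
print(is.factor(res$size))
print(is.ordered(res$size))
print(levels(res$size))
```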
When binding lists of data.table or data.frame objects whose columns carry units in class attributes (e.g., difftime objects with different units), the resulting data.table may not preserve the original units correctly. The result carries a single unit, but the underlying values are not converted to it. This applies to any class where the unit or precision is determined by attributes. Ensure that such columns have consistent units before calling rbindlist.
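One way to follow this advice for difftime columns is to convert every item to a common unit before binding. This is a sketch under assumptions: the column name elapsed and the choice of minutes as the common unit are illustrative only.

```r
library(data.table)

DT1 = data.table(elapsed = as.difftime(90, units = "mins"))
DT2 = data.table(elapsed = as.difftime(2, units = "hours"))

# Re-express DT2 in minutes so both items agree before binding;
# as.numeric(x, units=) converts the value, as.difftime() re-attaches the unit
DT2[, elapsed := as.difftime(as.numeric(elapsed, units = "mins"), units = "mins")]

res = rbindlist(list(DT1, DT2))

# Both rows are now expressed in minutes
print(units(res$elapsed))
print(as.numeric(res$elapsed))
```

Converting explicitly keeps the stored numbers and the unit attribute in agreement, which binding mixed units directly does not guarantee.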
Examples
# default case
DT1 = data.table(A=1:3,B=letters[1:3])
DT2 = data.table(A=4:5,B=letters[4:5])
l = list(DT1,DT2)
rbindlist(l)
#>        A      B
#>    <int> <char>
#> 1:     1      a
#> 2:     2      b
#> 3:     3      c
#> 4:     4      d
#> 5:     5      e
# bind correctly by names
DT1 = data.table(A=1:3,B=letters[1:3])
DT2 = data.table(B=letters[4:5],A=4:5)
l = list(DT1,DT2)
rbindlist(l, use.names=TRUE)
#>        A      B
#>    <int> <char>
#> 1:     1      a
#> 2:     2      b
#> 3:     3      c
#> 4:     4      d
#> 5:     5      e
# fill missing columns, and match by col names
DT1 = data.table(A=1:3,B=letters[1:3])
DT2 = data.table(B=letters[4:5],C=factor(1:2))
l = list(DT1,DT2)
rbindlist(l, use.names=TRUE, fill=TRUE)
#>        A      B      C
#>    <int> <char> <fctr>
#> 1:     1      a   <NA>
#> 2:     2      b   <NA>
#> 3:     3      c   <NA>
#> 4:    NA      d      1
#> 5:    NA      e      2
# generate index column, auto generates indices
rbindlist(l, use.names=TRUE, fill=TRUE, idcol=TRUE)
#>      .id     A      B      C
#>    <int> <int> <char> <fctr>
#> 1:     1     1      a   <NA>
#> 2:     1     2      b   <NA>
#> 3:     1     3      c   <NA>
#> 4:     2    NA      d      1
#> 5:     2    NA      e      2
# let's name the list
setattr(l, 'names', c("a", "b"))
rbindlist(l, use.names=TRUE, fill=TRUE, idcol="ID")
#>        ID     A      B      C
#>    <char> <int> <char> <fctr>
#> 1:      a     1      a   <NA>
#> 2:      a     2      b   <NA>
#> 3:      a     3      c   <NA>
#> 4:      b    NA      d      1
#> 5:      b    NA      e      2
# bind different classes
DT1 = data.table(A=1:3,B=letters[1:3])
DT2 = data.table(A=4:5,B=letters[4:5])
setattr(DT1[["A"]], "class", c("a", "integer"))
rbind(DT1, DT2, ignore.attr=TRUE)
#>      A      B
#>    <a> <char>
#> 1:   1      a
#> 2:   2      b
#> 3:   3      c
#> 4:   4      d
#> 5:   5      e