Remove rows with missing values on columns specified
This is a data.table method for the S3 generic stats::na.omit. The internals are written in C for speed. See examples for benchmark timings.
bit64::integer64 type is also supported.
Usage
# S3 method for class 'data.table'
na.omit(object, cols=seq_along(object), invert=FALSE, ...)
Arguments
- object
A data.table.
- cols
A vector of column names (or numbers) on which to check for missing values. Default is all the columns.
- invert
logical. If FALSE, omits all rows with any missing values (default). If TRUE, returns just those rows with missing values instead.
- ...
Further arguments special methods could require.
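As a minimal sketch of how cols and invert interact, the snippet below reuses the toy table from the Examples section and splits it into the rows kept and the rows dropped when only column x is checked. The variable names kept, dropped and kept_by_pos are illustrative only.

```r
library(data.table)

# Same toy table as in the Examples section of this page
DT = data.table(x=c(1,NaN,NA,3), y=c(NA_integer_, 1:3), z=c("a", NA_character_, "b", "c"))

# Rows where 'x' is present, selecting the column by name
kept = na.omit(DT, cols="x")

# Complement: rows where 'x' is missing (NA or NaN)
dropped = na.omit(DT, cols="x", invert=TRUE)

# Selecting the same column by position instead of by name
kept_by_pos = na.omit(DT, cols=1L)

# Every row of DT lands in exactly one of the two subsets
nrow(kept) + nrow(dropped) == nrow(DT)
```

Used this way, invert=TRUE is a convenient way to inspect the rows that a plain na.omit call would discard before actually discarding them.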
Details
The data.table method provides an additional argument, cols, which restricts the check for missing values to just the columns specified. The default value for cols is all the columns, to be consistent with the default behaviour of stats::na.omit.
It does not add the attribute na.action as stats::na.omit does.
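The difference in attributes can be checked directly. The following is a small sketch, assuming a toy table made up for illustration, that compares the result of the data.table method with the result of stats::na.omit on the equivalent data.frame. The names dt_res and df_res are hypothetical.

```r
library(data.table)

# Small illustrative table with one missing value in each column
DT = data.table(x=c(1, NA, 3), y=c("a", "b", NA))

# data.table method: dispatched because DT is a data.table
dt_res = na.omit(DT)

# data.frame method: dispatched after converting to a plain data.frame
df_res = stats::na.omit(as.data.frame(DT))

# The data.table result is expected to carry no 'na.action' attribute
is.null(attr(dt_res, "na.action"))

# The data.frame result records the omitted rows in 'na.action'
is.null(attr(df_res, "na.action"))
```

Code that relies on na.action to recover which rows were removed (for example via stats::naresid) therefore needs another way to track them when working with data.table objects; calling na.omit with invert=TRUE is one option.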
Value
A data.table with just the rows where the specified columns have no missing value in any of them.
Examples
DT = data.table(x=c(1,NaN,NA,3), y=c(NA_integer_, 1:3), z=c("a", NA_character_, "b", "c"))
# default behaviour
na.omit(DT)
#>        x     y      z
#>    <num> <int> <char>
#> 1:     3     3      c
# omit rows where 'x' has a missing value
na.omit(DT, cols="x")
#>        x     y      z
#>    <num> <int> <char>
#> 1:     1    NA      a
#> 2:     3     3      c
# omit rows where either 'x' or 'y' has a missing value
na.omit(DT, cols=c("x", "y"))
#>        x     y      z
#>    <num> <int> <char>
#> 1:     3     3      c
if (FALSE) { # \dontrun{
# Timings on relatively large data
set.seed(1L)
DT = data.table(x = sample(c(1:100, NA_integer_), 5e7L, TRUE),
y = sample(c(rnorm(100), NA), 5e7L, TRUE))
system.time(ans1 <- na.omit(DT)) ## 2.6 seconds
system.time(ans2 <- stats:::na.omit.data.frame(DT)) ## 29 seconds
# identical? check each column separately, as ans2 will have additional attribute
all(sapply(1:2, function(i) identical(ans1[[i]], ans2[[i]]))) ## TRUE
} # }