Split data.table, apply function, and return results in a data.table.

For each subset of a data.table, apply a function, then combine the results into a single data.table.
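As a rough illustration of the split-apply-combine pattern described here, the same three steps can be written by hand with plain data.table calls. This is only a sketch of the idea and is not meant to show how dt_ddply is implemented; the names pieces, best, and result are made up for the example.

```r
library(data.table)

dt <- data.table(x = 1:10, y = rep(1:5, times = 2))

# Split: one data.table per distinct value of y
pieces <- split(dt, by = "y")

# Apply: keep the row with the largest x in each piece
best <- lapply(pieces, function(d) d[which.max(x)])

# Combine: stack the per-group results back into one data.table
result <- rbindlist(best)
result
```

The rows in result should match the dt_ddply example further down this page (the largest x for each value of y).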

dt_ddply(
  .data,
  .variables,
  .f = NULL,
  ...,
  .progress = "none",
  .drop = TRUE,
  .parallel = FALSE
)

dt_ldply(
  .data,
  .f = NULL,
  ...,
  .progress = "none",
  .parallel = FALSE,
  .id = NA
)

dt_dlply(
  .data,
  .variables,
  .f = NULL,
  ...,
  .progress = "none",
  .drop = TRUE,
  .parallel = FALSE
)

Arguments

.data

data frame to be processed

.variables

variables to split data frame by, given as quoted variables (see as.quoted), a formula, or a character vector

.f

A function, specified in one of the following ways:

A named function, e.g. mean.
An anonymous function, e.g. \(x) x + 1 or function(x) x + 1.
A formula, e.g. ~ .x + 1. You must use .x to refer to the first argument. Only recommended if you require backward compatibility with older versions of R.
A string, integer, or list, e.g. "idx", 1, or list("idx", 1) which are shorthand for \(x) pluck(x, "idx"), \(x) pluck(x, 1), and \(x) pluck(x, "idx", 1) respectively. Optionally supply .default to set a default value if the indexed element is NULL or does not exist.
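The forms listed above can be tried out directly with purrr::as_mapper(). This sketch assumes that .f is resolved the same way as_mapper() resolves its input, which the wording suggests but this page does not state. The object records and the helper names are invented for illustration.

```r
library(purrr)

# Anonymous function and formula forms are interchangeable
add_one_fun     <- as_mapper(function(x) x + 1)
add_one_formula <- as_mapper(~ .x + 1)
add_one_fun(1:3)
add_one_formula(1:3)

# String and list forms are shorthand for pluck()
records <- list(idx = c(10, 20))
get_idx   <- as_mapper("idx")            # like \(x) pluck(x, "idx")
get_first <- as_mapper(list("idx", 1))   # like \(x) pluck(x, "idx", 1)
get_idx(records)
get_first(records)

# .default is returned when the indexed element does not exist
get_missing <- as_mapper("absent", .default = NA)
get_missing(records)
```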

...

other arguments passed on to .f

.progress

name of the progress bar to use, see create_progress_bar

.drop

should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default)
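The difference between dropping and preserving unused combinations can be seen with base R's split(), which has a drop argument with a comparable meaning. This is an analogy only and makes no claim about how .drop is handled internally; df, kept, and dropped are hypothetical names.

```r
# Two grouping factors; only the combinations a.x and b.y occur in the data
df <- data.frame(
  g1 = factor(c("a", "b")),
  g2 = factor(c("x", "y")),
  v  = 1:2
)

# drop = FALSE keeps all 4 level combinations, including the empty ones
kept <- split(df, list(df$g1, df$g2), drop = FALSE)
length(kept)

# drop = TRUE keeps only the 2 combinations that appear in the data
dropped <- split(df, list(df$g1, df$g2), drop = TRUE)
length(dropped)
```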

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach
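foreach needs a registered parallel backend before it can run work in parallel. A minimal sketch of registering one with doParallel is shown here; the choice of doParallel and of two workers is an assumption for the example, and any backend supported by foreach should serve the same purpose.

```r
library(foreach)
library(doParallel)

# Start two worker processes and register them as the foreach backend
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Once a backend is registered, %dopar% runs iterations on the workers
squares <- foreach(i = 1:4, .combine = c) %dopar% i^2
squares

# Shut the workers down when finished
parallel::stopCluster(cl)
```

With a backend registered like this, passing .parallel = TRUE would presumably make use of it; without one, foreach typically falls back to sequential execution with a warning.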

Examples

library(data.table)
# y has 5 values and is recycled to match the 10 values of x
dt <- data.table(x = 1:10, y = 1:5)
dt_dlply(dt, .(y), ~.[which.max(x)])
#> $`1`
#> [data.table]: 
#> # A data frame: 1 × 2
#>       x     y
#>   <int> <int>
#> 1     6     1
#> 
#> $`2`
#> [data.table]: 
#> # A data frame: 1 × 2
#>       x     y
#>   <int> <int>
#> 1     7     2
#> 
#> $`3`
#> [data.table]: 
#> # A data frame: 1 × 2
#>       x     y
#>   <int> <int>
#> 1     8     3
#> 
#> $`4`
#> [data.table]: 
#> # A data frame: 1 × 2
#>       x     y
#>   <int> <int>
#> 1     9     4
#> 
#> $`5`
#> [data.table]: 
#> # A data frame: 1 × 2
#>       x     y
#>   <int> <int>
#> 1    10     5
#> 
# top_n() comes from dplyr; here it keeps the row with the largest x in each group
dt_ddply(dt, .(y), ~ top_n(., 1, x))
#> [data.table]: 
#> # A data frame: 5 × 2
#>       x     y
#>   <int> <int>
#> 1     6     1
#> 2     7     2
#> 3     8     3
#> 4     9     4
#> 5    10     5