Split a data set into subsets based on column values, run a function on each subset, and return the results in a list named by the subsets. For example, if the data has a column `model` and you want to create a plot, fit a regression, or do anything else for each model, use `lapplydt(data,by="model",fun=function(x)lm(lAUC~lDose,data=x))`. The "l" in `lapplydt` indicates that a list is returned (as with `lapply()`); the "dt" indicates that the input is a data.table (anything that can be converted to one is accepted).

lapplydt(data, by, fun, drop.null = FALSE)

Arguments

data

Data set to process. Must be a data.frame-like structure.

by

Column to split data by.

fun

Function to apply to each subset; it is passed to `lapply()`. If `fun` defines an argument called `.nm`, that argument has a special meaning: it can be used to retrieve the name of the current subset. See examples.

drop.null

If `fun` returns `NULL` for some subsets, should those empty elements be dropped from the returned list? Defaults to `FALSE`.

Value

A list holding the result of `fun` for each subset, named by the subsets.

Details

The name of the current subset is available inside `fun` through the `.nm` argument, provided `fun` defines such an argument. See examples.

When is `lapplydt` better than `lapply(split(dt),...)`? In some cases, the `lapply()` approach may drop the names that stem from the values of the columns you are splitting by. Other advantages are the `.nm` feature described above and the option to drop `NULL` results. That's about it.
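To make the three features above concrete (named results, the `.nm` argument, and dropping `NULL` results), here is a minimal base R sketch that approximates the documented behaviour. It is not the NMdata implementation, which operates on data.tables; `lapply_by_sketch`, `toy`, and `label_big` are hypothetical names invented for this illustration.

```r
# Hypothetical helper, not part of NMdata: approximates the documented
# behaviour of lapplydt() using base R only.
lapply_by_sketch <- function(data, by, fun, drop.null = FALSE) {
    # One data.frame per distinct value of the `by` column.
    subsets <- split(data, data[[by]], drop = TRUE)
    # Pass the subset name only if `fun` declares a `.nm` argument.
    wants.nm <- ".nm" %in% names(formals(fun))
    res <- lapply(names(subsets), function(nm) {
        if (wants.nm) fun(subsets[[nm]], .nm = nm) else fun(subsets[[nm]])
    })
    # Carry the subset names over to the results.
    names(res) <- names(subsets)
    # Optionally remove elements where `fun` returned NULL.
    if (drop.null) res <- Filter(Negate(is.null), res)
    res
}

# Toy data invented for illustration.
toy <- data.frame(
    DOSE = c(3, 3, 10, 10, 10, 30),
    CONC = c(1.2, 0.8, 4.1, 3.9, 2.7, 9.5)
)

# Returns a label for subsets with more than one row, NULL otherwise
# (an `if` without `else` evaluates to NULL when the condition is FALSE).
label_big <- function(x, .nm) {
    if (nrow(x) > 1) paste0("DOSE ", .nm, ": ", nrow(x), " rows")
}

res.all <- lapply_by_sketch(toy, by = "DOSE", fun = label_big)
res.kept <- lapply_by_sketch(toy, by = "DOSE", fun = label_big, drop.null = TRUE)
```

In this sketch, `res.all` keeps an element for every dose level, including a `NULL` for the single-row subset, while `res.kept` contains only the non-`NULL` results.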

Examples

pk <- readRDS(file=system.file("examples/data/xgxr2.rds",package="NMdata"))
lapplydt(pk,by="DOSE",fun=nrow)
#> $`3`
#> [1] 300
#> 
#> $`10`
#> [1] 300
#> 
#> $`30`
#> [1] 301
#> 
#> $`100`
#> [1] 301
#> 
#> $`300`
#> [1] 300
#> 
lapplydt(pk,by="DOSE",fun=function(x,.nm) {
    message("this is subset",.nm)
    message(paste("Result:",nrow(x)))
})
#> this is subset3
#> Result: 300
#> this is subset10
#> Result: 300
#> this is subset30
#> Result: 301
#> this is subset100
#> Result: 301
#> this is subset300
#> Result: 300
#> $`3`
#> NULL
#> 
#> $`10`
#> NULL
#> 
#> $`30`
#> NULL
#> 
#> $`100`
#> NULL
#> 
#> $`300`
#> NULL
#>
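
The examples above do not exercise `drop.null`. The following sketch shows how it might be used with the same data set, assuming the NMdata package is installed (the guard skips the code otherwise). `count_if_large` is a hypothetical helper invented here, and no output is shown because it has not been verified against the package.

```r
# Assumes NMdata is installed; otherwise the block does nothing.
if (requireNamespace("NMdata", quietly = TRUE)) {
    library(NMdata)
    pk <- readRDS(file = system.file("examples/data/xgxr2.rds", package = "NMdata"))

    # Hypothetical helper: returns the row count only for subsets with more
    # than 300 rows. An `if` without `else` evaluates to NULL otherwise.
    count_if_large <- function(x) {
        if (nrow(x) > 300) nrow(x)
    }

    # Default (drop.null = FALSE): NULL results are kept in the list.
    res.all <- lapplydt(pk, by = "DOSE", fun = count_if_large)
    # drop.null = TRUE: NULL results are removed from the list.
    res.kept <- lapplydt(pk, by = "DOSE", fun = count_if_large, drop.null = TRUE)

    print(names(res.all))
    print(names(res.kept))
}
```

Comparing `names(res.all)` with `names(res.kept)` shows which subsets returned `NULL` and were dropped.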