The following points show ways to improve calculation speed with NMsim. The list is roughly prioritized, most important first.
- Reduce the amount of data written to file
  - Almost always use the table.vars argument. Only include the variables you need out of Nonmem. There is no reason to list variables from the input data, since NMsim can merge those back on. NMsim versions after 0.1.6 have a carry.out argument to control which input data variables are recovered. The default is carry.out=TRUE, which means that all input data variables are added back onto the results. If your data set is very large and has many columns, you can reduce this. You could do something as minimal as table.vars=c("PRED","IPRED") and carry.out=c("ID","TIME","TRT").
- If running multiple simulations, run them in parallel
  - Use sge=TRUE to submit all simulation runs to the cluster.
- Break the simulation into multiple Nonmem runs
  - For example, instead of running subproblems=1000, you could do subproblems=50, nsims=20, sge=TRUE. In the first case, one Nonmem model runs through 1000 subproblems. In the second case, 20 Nonmem runs go through 50 subproblems each, and the Nonmem runs are executed in parallel. See also the example under “Speeding up large simulations”.
- Use data.table
  - Depending on the data sets, this can make a very large difference. Consider using NMdataConf(as.fun="data.table").
- Use method.execute="nmsim"
  - If you provide the path to the Nonmem executable using path.nonmem, NMsim() will by default use method.execute="nmsim". Trick: configure this once and for all at the top of your script with something like NMdata::NMdataConf(path.nonmem="/path/to/nmfe75").
- Parallelize individual runs
  - NMsim does support parallelization (performing a Nonmem run using multiple cores). For instance, use 16 cores for each Nonmem run by adding nc=16.
  - However, this is rarely worth it for simulations. Most of the time, input/output (reading from and writing to disk) is the bottleneck, and splitting the data set into multiple runs will usually reduce run times more.
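Several of the settings above can be configured once at the top of a script. The following is a minimal sketch that assumes NMdata is installed (NMsim builds on it). After the NMdataConf() call, NMdata functions return data.tables instead of data.frames. The path.nonmem line is commented out because the path is only a placeholder for your own Nonmem installation.

```r
library(data.table)
library(NMdata)

## Configure once at the top of the script: NMdata functions now return
## data.tables instead of data.frames.
NMdataConf(as.fun = "data.table")
## Placeholder path (assumption): point this to your own Nonmem executable.
## NMdataConf(path.nonmem = "/path/to/nmfe75")

## Any NMdata function now returns a data.table
doses <- NMcreateDoses(TIME = c(0, 24), AMT = c(300, 150), CMT = 1)
class(doses)
```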
Most common reasons for NMsim to fail or be slow
If NMsim fails or behaves unexpectedly, make sure to read the output from Nonmem in the R console. If NMsim complains that it cannot find the output tables, it is likely because Nonmem failed and did not generate them.
Any variable that is used in Nonmem must be defined either inside the model or in the data set. If the estimated model uses a covariate that you did not include in your simulation data set, the simulation will break. Sometimes estimation models include variables in output tables that Nonmem did not use for anything other than reading them from the input data set and printing them in output tables. Imagine you cleverly included a unique row identifier called ROW in your estimation data set and listed it in $TABLE to reliably combine input and output data. It is not used for anything else in the model. If we do not customize the output table in NMsim() using the table.vars argument, and the simulation input data set does not include a numeric ROW column, we get this error:
```
Starting NMTRAN
AN ERROR WAS FOUND IN THE CONTROL STATEMENTS.
AN ERROR WAS FOUND ON LINE 60 AT THE APPROXIMATE POSITION NOTED:
$TABLE ROW TVKA TVV2 TVV3 TVCL KA V2 V3 CL Q PRED IPRED Y NOPRINT FILE=NMsim_xgxr021_noname.tab
X
THE CHARACTERS IN ERROR ARE: ROW
479 THIS ITEM IS NOT LISTED IN MODULE NMPRD4 AND MAY NOT BE DISPLAYED.
cp: cannot stat 'NMsim_xgxr021_noname.tab': No such file or directory
Error in NMscanTables(file, quiet = TRUE, as.fun = "data.table", col.row = col.row, :
NMscanTables: File not found: /home/philip/R/x86_64-pc-linux-gnu-library/4.2/NMsim/examples/nonmem/NMsim/xgxr021_noname/NMsim_xgxr021_noname.tab. Did you copy the lst file but forgot table file?
Results could not be read.
```
Nonmem gets to writing the $TABLE but cannot find a variable called ROW. But remember, NMsim normally does not need a row identifier to combine the input and output data. In many cases, the best way to fix this is to reduce the $TABLE section using the table.vars argument. All we need from the simulation results are the population and individual predictions anyway. We could have omitted ROW from the input data set and done something as simple as
```
simres <- NMsim(file.mod=file.mod,
                data=dat.sim,
                table.vars="PRED IPRED Y")
```
table.vars can help avoid many of these problems. And if NMsim is slow, this is a large, low-hanging fruit. In a benchmark example, I reduced the run time of a (very large) simulation from ~1.5 hours to ~7 minutes this way.
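Missing covariates are easier to diagnose before Nonmem is started. Below is a minimal sketch of such a pre-flight check. checkSimData is a hypothetical helper, not part of NMsim or NMdata. You have to supply the list of required variables yourself, for instance by reading $INPUT and the model code. WEIGHTB is a hypothetical covariate name.

```r
## Hypothetical helper, not part of NMsim or NMdata: stop with an informative
## message if any variable the model needs is absent from the simulation data.
checkSimData <- function(data, vars.needed) {
  missing.vars <- setdiff(vars.needed, colnames(data))
  if (length(missing.vars) > 0) {
    stop("Variables missing from simulation data: ",
         paste(missing.vars, collapse = ", "))
  }
  invisible(TRUE)
}

## Example usage (WEIGHTB is a hypothetical covariate):
## checkSimData(dat.sim, vars.needed = c("ID", "TIME", "AMT", "WEIGHTB"))
```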
Speeding up large simulations
NMsim() offers a powerful way to parallelize mutually independent simulations. These will typically be distinct subjects that can be simulated without regard to each other. The simple way to do this is to split the data set into a list of data sets and pass that list to the data argument of NMsim().
```
library(NMdata)
library(NMsim)
library(data.table)
## file.mod: path to the estimated Nonmem control stream (defined elsewhere)
dose1 <- NMcreateDoses(TIME=c(0,24),AMT=c(300,150),
                       addl=list(ADDL=c(0,5),II=c(0,24)),CMT=1,col.id=NA,
                       as.fun="data.table")
## repeat the dosing regimen for 1000 subjects
doses <- dose1[,.(ID=1:1000),by=dose1]
## add simulation records (EVID=2) at TIME 0, 1, ..., 168
dat.sim <- addEVID2(doses,TIME=0:(24*7),CMT=2)
## dat.sim is a large data set with 1000 subjects. We want 10 data
## sets with 100 subjects in each.
dat.sim$IDGRP <- (dat.sim$ID-1)%/%100+1
## Now IDGRP is the grouping of the IDs
as.data.table(dat.sim)[,.(minID=min(ID),maxID=max(ID)),by=IDGRP]
### with data.table, we easily split the data into a list of data.tables based on that grouping variable
data.multiple <- split(as.data.table(dat.sim),by="IDGRP")
## using sge=TRUE, we now send the run to the cluster as 10 parallel Nonmem runs.
path.multidata <- NMsim(file.mod,
                        data=data.multiple,
                        table.vars=c("PRED","IPRED","Y"),
                        name.sim="datalist_01",
                        sge=TRUE
                        )
### If we want to wait and read the results when they are ready, use wait=TRUE.
### NMsim will automatically combine them into one data set as if they were from just one Nonmem run.
simres <- NMreadSim(path.multidata,wait=TRUE)
```
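The grouping above relies on IDs running consecutively from 1 to 1000. If your IDs are not consecutive, a small helper can assign subjects to groups of roughly equal size instead. This is a sketch that assumes only data.table. splitBySubjects is a hypothetical name, not an NMsim function. You could pass the resulting list to the data argument of NMsim() in the same way as data.multiple.

```r
library(data.table)

## Hypothetical helper (not part of NMsim): assign each subject to one of
## n.groups contiguous groups of roughly equal size, then split the data into
## a list of data.tables, one per group. Row order within the data is kept.
splitBySubjects <- function(data, n.groups, col.id = "ID") {
  ## copy so the caller's data set is not modified by reference
  dt <- copy(as.data.table(data))
  ## unique subject IDs in order of first appearance; need not be consecutive
  ids <- unique(dt[[col.id]])
  if (n.groups > length(ids)) stop("n.groups cannot exceed the number of subjects")
  ## group number (1..n.groups) for each unique subject
  grp.of.id <- ceiling(seq_along(ids) * n.groups / length(ids))
  ## look up the group of every row through its subject ID
  set(dt, j = "IDGRP", value = grp.of.id[match(dt[[col.id]], ids)])
  split(dt, by = "IDGRP")
}

## Example: data.multiple <- splitBySubjects(dat.sim, n.groups = 10)
```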