Data Set Creation with NMsim

Introduction

This vignette covers data sets whose purpose is successful and meaningful simulation with NMsim().

The data argument to NMsim() is essential to any simulation. For a simulation of new subjects, the data set may be the only argument needed in addition to a control stream file path.

simres <- NMsim(file.mod=file.mod,data=data.sim)

NMsim and NMdata provide concise and feature-rich methods to create and check such data sets.

NMsim() does not require simulation data sets to be created using NMsim functions. If you are already accustomed to other data set generation tools, you can most likely continue using those with NMsim().

However it is created, the NMsim() data argument must

be a data.frame
have column names that exactly match the NONMEM $DATA variables
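
The two requirements above can be verified before calling NMsim(). The following is a minimal base R sketch. The helper check_sim_data() and its expected.cols argument are hypothetical names introduced here for illustration and are not part of NMsim or NMdata. The expected variable names are assumed to be supplied by the user based on the control stream.

```r
## Hypothetical helper, not part of NMsim or NMdata
check_sim_data <- function(data, expected.cols) {
    ## requirement 1: the object must be a data.frame
    if (!is.data.frame(data)) stop("data must be a data.frame")
    ## requirement 2: expected variable names must be present (exact spelling and case)
    missing.cols <- setdiff(expected.cols, colnames(data))
    if (length(missing.cols) > 0) {
        stop("columns missing in data: ", paste(missing.cols, collapse=", "))
    }
    invisible(TRUE)
}

## Variable names assumed to be those expected by the control stream
cols <- c("ID","TIME","EVID","CMT","AMT")

dat.ok <- data.frame(ID=1, TIME=0, EVID=1, CMT=1, AMT=300)
check_sim_data(dat.ok, expected.cols=cols)

## Lower-case "time" does not match "TIME", so the check fails
dat.bad <- data.frame(ID=1, time=0, EVID=1, CMT=1, AMT=300)
res <- try(check_sim_data(dat.bad, expected.cols=cols), silent=TRUE)
inherits(res, "try-error")
```

This sketch only checks that the expected names are present. NMdata::NMcheckData, used later in this vignette, performs far more thorough checks.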

Checking the data set before running Nonmem can significantly simplify identification of data set issues. This vignette uses NMdata::NMcheckData for automated checks of several value types and other properties of data sets. Like NMsim, NMdata::NMcheckData does not depend on how the data set was created.

A basic simulation data set

As an example, we create a regimen with a loading dose of 300 mg followed by 150 mg QD for 6 days. We dose into compartment 1, and we want to simulate samples in the second compartment. These compartment numbers depend on the model the data set is intended to be used with.

NMcreateDoses() is a flexible function that creates dosing records based on a concise syntax. We add a label to the regimen right away.

## Multiple-dose regimens with a loading dose are easily created with NMcreateDoses() too
## We use ADDL and II here (listing each dose time explicitly is just as easy)
doses <- NMcreateDoses(TIME=c(0,24),AMT=c(300,150),addl=data.frame(ADDL=c(0,5),II=c(0,24)),CMT=1)
doses <- transform(doses,trt="300 mg then 150 mg QD")
## Notice, the ID and MDV columns are included
doses

ID	TIME	EVID	CMT	AMT	II	ADDL	MDV	trt
1	0	1	1	300	0	0	1	300 mg then 150 mg QD
1	24	1	1	150	24	5	1	300 mg then 150 mg QD

Now we add the sample records using NMaddSamples().

## Add simulation records - longer for QD regimens
dat.sim <- NMaddSamples(doses,TIME=0:(24*7),CMT=2)

dat.sim is now a valid simulation data set with one subject. However, even though NMaddSamples() does try to order the data in a meaningful way, it is recommended to always manually order the data set. We use data.table’s setorder(). dplyr::arrange can just as well be used. A row identifier (counter) can make post-processing easier, so we add that too.

## sort data set 
setorder(dat.sim,ID,TIME,EVID)
## Adding a row identifier (generally not necessary but recommended)
dat.sim$ROW <- 1:nrow(dat.sim)

NMsim does not include any plotting functionality, but here is a simple way to show dosing amounts and sample times. NMdata::NMexpandDoses() is used to expand the doses coded with ADDL/II in order to get a data row to plot for each dose. We also take the sum of the amounts by time point in case doses are simultaneous.

dtplot <- NMdata::NMexpandDoses(dat.sim,as.fun="data.table")
dtplot <- dtplot[,.(AMT=sum(AMT)),by=.(ID,CMT,TIME,EVID)]

ggplot(dtplot,aes(TIME,factor(CMT),colour=factor(EVID)))+
    geom_point(data=function(x)x[EVID==1],aes(size=AMT))+
    geom_point(data=function(x)x[EVID==2],shape="|")+
    labs(x="Time (hours)",y="Compartment")+
    theme(legend.position="bottom")

The simulation data set plotted after expanding the doses with NMexpandDoses().

A brief overview of the number of events broken down by event type EVID and dose amount AMT:

trt	CMT	EVID	AMT	Nrows
300 mg then 150 mg QD	1	1	150	6
300 mg then 150 mg QD	1	1	300	1
300 mg then 150 mg QD	2	2	NA	169
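
The code behind the overview is not shown, but a summary like this one could be produced along the following lines. This is a sketch using data.table, not necessarily how the table above was generated. To keep it self-contained, the object dat.exp is a hand-built stand-in for the simulation data after dose expansion (7 dose rows and 169 sample rows). In practice the output of NMdata::NMexpandDoses() would be used instead.

```r
library(data.table)

## Stand-in for the expanded simulation data (assumption made for this example)
doses.exp <- data.table(ID=1, TIME=seq(0, 144, by=24), EVID=1, CMT=1,
                        AMT=c(300, rep(150, 6)))
samples <- data.table(ID=1, TIME=0:(24*7), EVID=2, CMT=2, AMT=NA_real_)
dat.exp <- rbind(doses.exp, samples)
dat.exp[, trt := "300 mg then 150 mg QD"]

## Count rows by treatment, compartment, event type and dose amount
overview <- dat.exp[, .(Nrows=.N), keyby=.(trt, CMT, EVID, AMT)]
overview
```

Grouping on AMT keeps the sample records as their own group because data.table treats NA as a regular grouping value.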

The first five rows show what the data now looks like. Notice that the following are not issues:

Data contains a mix of numeric and non-numeric columns
Columns are not sorted in Nonmem-friendly style with non-numeric columns to the right

ID	TIME	EVID	CMT	AMT	II	ADDL	MDV	trt	ROW
1	0	1	1	300	0	0	1	300 mg then 150 mg QD	1
1	0	2	2	NA	NA	NA	1	300 mg then 150 mg QD	2
1	1	2	2	NA	NA	NA	1	300 mg then 150 mg QD	3
1	2	2	2	NA	NA	NA	1	300 mg then 150 mg QD	4
1	3	2	2	NA	NA	NA	1	300 mg then 150 mg QD	5

Finally, we check the simulation data set for various potential issues in Nonmem data sets using NMdata::NMcheckData and summarize the number of doses and observations:

## until NMdata 0.1.7 NMcheckData requires a DV column
dat.sim[,DV:=NA_real_]
NMdata::NMcheckData(dat.sim,type.data="sim")
#> No findings. Great!

Be aware that NMsim() does not reorder rows before simulation. The user is responsible for ordering the simulation data.

Add time after previous dose and related information

Just as when preparing a data set with observed data, time after previous dose may be the relevant time scale to analyze the simulation results against. We use NMdata::addTAPD() to add this automatically.

dat.sim2 <- addTAPD(dat.sim)
head(dat.sim2)

ID	TIME	EVID	CMT	AMT	II	ADDL	MDV	trt	ROW	DV	DOSCUMN	TPDOS	TAPD	PDOSAMT	DOSCUMA
1	0	1	1	300	0	0	1	300 mg then 150 mg QD	1	NA	1	0	0	300	300
1	0	2	2	NA	NA	NA	1	300 mg then 150 mg QD	2	NA	0	NA	NA	NA	0
1	1	2	2	NA	NA	NA	1	300 mg then 150 mg QD	3	NA	1	0	1	300	300
1	2	2	2	NA	NA	NA	1	300 mg then 150 mg QD	4	NA	1	0	2	300	300
1	3	2	2	NA	NA	NA	1	300 mg then 150 mg QD	5	NA	1	0	3	300	300
1	4	2	2	NA	NA	NA	1	300 mg then 150 mg QD	6	NA	1	0	4	300	300

Notice TAPD for the sample at TIME==0. addTAPD does not use the order of the data set to determine the time-order of the records. The default behavior of addTAPD is to treat a sample taken at the exact same time as a dose as a pre-dose sample. If instead we want such samples to be considered post-dose, we have to specify how to order EVID numbers.

## order.evid=c(1,2) means doses are ordered before EVID=2 records
dat.sim2 <- addTAPD(dat.sim,order.evid=c(1,2))
## now the TIME=0 sample has TAPD=0
head(dat.sim2)

ID	TIME	EVID	CMT	AMT	II	ADDL	MDV	trt	ROW	DV	DOSCUMN	TAPD	PDOSAMT	DOSCUMA
1	0	1	1	300	0	0	1	300 mg then 150 mg QD	1	NA	1	0	300	300
1	0	2	2	NA	NA	NA	1	300 mg then 150 mg QD	2	NA	1	0	300	300
1	1	2	2	NA	NA	NA	1	300 mg then 150 mg QD	3	NA	1	1	300	300
1	2	2	2	NA	NA	NA	1	300 mg then 150 mg QD	4	NA	1	2	300	300
1	3	2	2	NA	NA	NA	1	300 mg then 150 mg QD	5	NA	1	3	300	300
1	4	2	2	NA	NA	NA	1	300 mg then 150 mg QD	6	NA	1	4	300	300

addTAPD uses NMdata::NMexpandDoses to make sure all dosing times are considered. See ?NMdata::addTAPD for what the other created columns mean and for many useful features.

Multiple endpoints (e.g. parent and metabolite)

Pass a data.frame to NMaddSamples’s CMT argument to include multiple endpoints.

NMaddSamples(doses,CMT=data.frame(CMT=c(2,3),DVID=c("Parent","Metabolite")),TIME=1:2)

ID	TIME	EVID	CMT	AMT	II	ADDL	MDV	trt	DVID
1	0	1	1	300	0	0	1	300 mg then 150 mg QD	NA
1	1	2	2	NA	NA	NA	1	300 mg then 150 mg QD	Parent
1	1	2	3	NA	NA	NA	1	300 mg then 150 mg QD	Metabolite
1	2	2	2	NA	NA	NA	1	300 mg then 150 mg QD	Parent
1	2	2	3	NA	NA	NA	1	300 mg then 150 mg QD	Metabolite
1	24	1	1	150	24	5	1	300 mg then 150 mg QD	NA

Cohort-dependent or individual sampling schemes

As with the CMT argument, TIME can also be a data.frame. If it contains a covariate found in the doses data, the added simulation times will be merged onto the doses accordingly. The covariate could be, say, a cohort identifier, or it could be ID, which allows you to reuse all or parts of the observed sample times.
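
As an illustration, here is a minimal sketch of a cohort-dependent sampling scheme. It is based on the description above and has not been checked against a specific NMsim version. The column name cohort, the two-subject dose data and the sample times are all assumptions made for this example.

```r
library(NMsim)

## Doses for two subjects in different cohorts.
## The `cohort` column name is an assumption made for this example.
doses1 <- NMcreateDoses(TIME=0, AMT=300, CMT=1)
doses.A <- transform(doses1, ID=1, cohort="A")
doses.B <- transform(doses1, ID=2, cohort="B")
doses.all <- rbind(doses.A, doses.B)

## Cohort-specific sample times: sparse for cohort A, richer for cohort B
times <- data.frame(cohort=c("A","A","B","B","B"),
                    TIME=c(1,4,1,2,8))

## TIME is merged onto the doses by the shared `cohort` column
dat.cohort <- NMaddSamples(doses.all, TIME=times, CMT=2)
dat.cohort
```

Replacing cohort with ID in the times data.frame would assign sample times to individual subjects instead.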

Other features

By default, NMaddSamples() uses EVID=2 for the records it adds, which means they are neither doses, observations, nor resetting events. Should you want to change that (maybe to EVID=0), use the EVID argument.
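
A short sketch of using the EVID argument follows. The dose and sample times are arbitrary values chosen for illustration.

```r
library(NMsim)

doses1 <- NMcreateDoses(TIME=0, AMT=300, CMT=1)
## Add sample records coded with EVID=0 instead of the default EVID=2
dat.evid0 <- NMaddSamples(doses1, TIME=c(1,2,4), CMT=2, EVID=0)
## Event types present in the result
table(dat.evid0$EVID)
```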

Philip Delff

August 06, 2025 using NMsim 0.2.3.936
