Simulation-Based Forest Plots with NMsim
Philip Delff
Boris Grinshpun
January 13, 2025
Source:vignettes/NMsim-forest.Rmd
NMsim-forest.Rmd
Introduction
The forest plot is an effective and widely recognized way to illustrate estimated covariate effects on exposure or response, or parameters related to these (e.g. clearance). The forest plot typically includes precision of the estimate in terms of a confidence interval. Often, an acceptance region such as the 80%-125% bio equivalence is included for comparison.
In some cases, the forest plot can be derived based on model estimates without simulation. This is the case for a forest plot on exposure if the PK is linear and only steady-state average concentration (or AUC) is of interest. If the PK is non-linear and/or exposure metrics such as Cmax which depends on multiple PK parameters are of interest, a simulation-based forest plot may be needed. As we shall see, NMsim provides a flexible and concise framework to perform the required simulations, a function for summarizing the simulation results is provided, and it is demonstrated how the summary is easily plotted using the coveffectsplot R package.
This vignette is no attempt to harmonize how to construct a forest plot. Instead, we shall describe the steps taken, and it is up to the scientist to choose how they want to construct the analysis to address their questions most effectively.
Initialization
library(data.table)
library(NMsim)
library(NMdata)
library(NMcalc) ## Optional. Used to calculate AUC.
library(coveffectsplot)
library(knitr) ## for printing tables with knitr::kable
NMdataConf(
path.nonmem = "/opt/NONMEM/nm75/run/nmfe75", ## path to NONMEM executable
dir.sims="simtmp-forest", ## where to store temporary simulation results
dir.res="simres-forest" ## where to store final simulation results
## ,as.fun="data.table"
)
A model is selected.
file.mod <- "NMsim-forest-models/xgxr134.mod"
Generation of simulation input data
The simulation data set has to match the model in compartment numbers, and it must contain all variables needed to run the NONMEM model. We simulate daily dosing of 30 mg.
doses <- NMcreateDoses(TIME=0,AMT=30,ADDL=100,II=24,col.id=NA)
kable(doses)
TIME | EVID | CMT | AMT | II | ADDL | MDV |
---|---|---|---|---|---|---|
0 | 1 | 1 | 30 | 24 | 100 | 1 |
NMsim::expandCovs()
can be used to construct a set of
simulations needed to derive a forest plot. It varies one covariate at a
time, keeping all other covariates at their reference value. It can
derive references and quantiles from a data set.
For reference values we use median in the observed population for continuous covariates and manually select “Female” as the reference for sex.
## reading output and input tables from estimation. Used to determine
## reference values and quantiles.
data.ref <- NMdata::NMscanData(file.mod,quiet=TRUE)
covs <- expandCovs(
AGE=list(ref=median,quantiles=c(10,25,75,90)/100,label="Age (years)"),
## notice, values OR quantiles can be provided
WEIGHTB=list(ref=median, quantiles=c(10,25,75,90)/100, label="Bodyweigt (kg)"),
MALEN=list(ref=c(Female=0), values=c(Male=1), label="Sex"),
data=data.ref,
as.fun="data.table"
)
## adding distinct ID's for each combination of covariates
covs[,ID:=.GRP,by=.(type,covvar,covval)]
kable(covs)
covvar | covval | covvalc | covlabel | covref | type | AGE | MALEN | WEIGHTB | ID |
---|---|---|---|---|---|---|---|---|---|
NA | NA | NA | NA | NA | ref | 54 | 0 | 110 | 1 |
AGE | 34 | 34 | Age (years) | 54 | value | 34 | 0 | 110 | 2 |
AGE | 45 | 45 | Age (years) | 54 | value | 45 | 0 | 110 | 3 |
AGE | 65 | 65 | Age (years) | 54 | value | 65 | 0 | 110 | 4 |
AGE | 73 | 73 | Age (years) | 54 | value | 73 | 0 | 110 | 5 |
WEIGHTB | 85 | 85 | Bodyweigt (kg) | 110 | value | 54 | 0 | 85 | 6 |
WEIGHTB | 96 | 96 | Bodyweigt (kg) | 110 | value | 54 | 0 | 96 | 7 |
WEIGHTB | 130 | 130 | Bodyweigt (kg) | 110 | value | 54 | 0 | 130 | 8 |
WEIGHTB | 140 | 140 | Bodyweigt (kg) | 110 | value | 54 | 0 | 140 | 9 |
MALEN | 1 | Male | Sex | 0 | value | 54 | 1 | 110 | 10 |
We now have all combinations of covariates in the object called
covs
. Notice the type
column which
distinguishes the reference combination in contrast to the other
simulations where one covariate is being varied at a time.
## repeating the doses for all combinations of covariates
dt.dos <- covs[,doses[],by=covs]
dims(covs,doses,dt.dos) |> kable()
data | nrows | ncols |
---|---|---|
covs | 10 | 10 |
doses | 1 | 7 |
dt.dos | 10 | 17 |
We sample every 15 minutes on the first day and one day in steady-state.
## add a sampling scheme
time.sim <- rbind(
data.table(TIME=seq(0,24,.25),period="Day 1"),
data.table(TIME=seq(0,24,.25)+30*24,period="Steady-State")
)
dt.sim <- addEVID2(dt.dos,TIME=time.sim,CMT=2)
dims(dt.dos,time.sim,dt.sim) |> kable()
data | nrows | ncols |
---|---|---|
dt.dos | 10 | 17 |
time.sim | 194 | 2 |
dt.sim | 1950 | 18 |
There is only one analyte in this data set. If for instance, you have a parent and a metabolite, adding sampling times could look like this:
dt.sim.parent.metab <-
addEVID2(dt.dos,TIME=time.sim,CMT=data.frame(analyte=c("parent","metabolite"),CMT=c(2,3)))
We used the period
column to distinguish time intervals
for the analysis. We have to remember to do the postprocessing by the
values in that column. If you are looking at multiple analytes, remember
to postprocess by the column that distringuishes those, too. In this
example, that would be analyte
.
Simulation
We need to run the simulations repeatedly, sampling parameters based
on uncertainty estimates. This can be done in various ways. First of
all, one must choose between non-parametric and parametric sampling.
Non-parameteric sampling is typically based on a bootstrap, and the
parametetric sampling is often based on a successfull
$COVARIANCE
step. NMsim
has methods to do
either type, and multiple methods are available for parametric sampling.
The best source of information on these different methods is the
ACOP2024 poster Simulation
of clinical trial predictions with model uncertainty using NMsim by
Sanaya Shroff and Philip Delff.
In this case we shall use parametric sampling. Since in this case we
are simulating typical subjects (random effects fixed at zero), we only
need variability on the fixed effects (THETA’s). Multiple methods are
available in NMsim
to do this. We shall use the method
provided by NMsim based on the multivariate normal distribution
(mvrnorm
). This method is selected because it is adequate
for sampling THETA
’s, it does not require additional
software installed, and it is robust.
simres.forest <- NMsim(file.mod # path to NONMEM model
,data=dt.sim, # simulation dataset
,name.sim="forest_mvrnorm" # output name suffix
,method.sim=NMsim_VarCov # sampling with mvrnorm
,nsims=500
,typical=TRUE # FALSE to include BSV
,table.vars=cc(PRED,IPRED) # output table variables
,seed.R=342 # seed for reproducibility
,sge=TRUE # TRUE if submitting to a cluster
,nc=1
)
If you encounter issues with samples that do not run, consider
whether any of your uncertainty estimates on THETA
’s could
lead to sampled THETA
’s that could make the model be
undefined. This would often be absorption parameters, clearences,
volumes or any other strictly positive parameter which is estimated with
poor precision. If this happens, you may want to go back and estimate
that THETA on the log scale to make sure all sampled values will be
positive. A quick way to look for this is by combining parameter
estimates (from the .ext
file) and the automatic labeling
of the parameters using NMdata::NMrelate()
:
NMdata::NMreadExt(file.mod,as.fun="data.table")[par.type=="THETA",.(RSE=se/value),by=.(par.name,FIX)] |>
NMdata::mergeCheck(NMrelate(file.mod,par.type="THETA",as.fun="data.table")[,.(par.name,code)],by="par.name",quiet=TRUE) |>
kable()
par.name | FIX | RSE | code |
---|---|---|---|
THETA(1) | 0 | 0.0804007 | LTVKA=THETA(1) |
THETA(2) | 0 | 0.0136327 | LTVV2=THETA(2) |
THETA(3) | 0 | 0.0789812 | LTVCL=THETA(3) |
THETA(4) | 0 | 0.0481949 | LTVV3=THETA(4) |
THETA(5) | 0 | 0.0292952 | LTVQ=THETA(5) |
THETA(6) | 0 | 0.4222300 | AGEEFF=THETA(6) |
THETA(7) | 0 | 1.9160784 | WEIGHTEFF=THETA(7) |
THETA(8) | 0 | 0.1041536 | MALEEFF=THETA(8) |
Post processing
NMsim
provides a function to do the post processing of
the set of simulations setup by expandCovs()
. The key steps
performed by this function are outlined below. Most importantly, it
normalizes the exposures by the reference value for each sampled set of
parameters. Then, it derives median and a confidence interval as
quantiles in the simulated distribution.
### Read simulation results
simres <- NMreadSim("simres-forest/xgxr134_forest_mvrnorm_MetaData.rds",wait=TRUE,rm.tmp=TRUE)
## Check how many models returned simulation results
as.data.table(simres)[,uniqueN(model.sim)]
## [1] 500
If you encounter issues with samples that do not run, please consider
whether any of your uncertainty estimates on \(THETA\)’s could lead to sampled THETA’s
that could make the model be undefined. This would often be absorption
parameters, clearences, volumes or any other strictly positive parameter
which is estimated with poor precision. If this happens, you may want to
go back and estimate that THETA on the log scale to make sure all
sampled values will be positive. A quick way to look into this is to
look at the relative standard error for each THETA
. In the
following we merge in the first line in the model using each
THETA
so we can refresh our memory on the interpretation of
each parameter without relying on a well formatted parameter table.
NMdata::NMreadExt(file.mod,as.fun="data.table")[
par.type=="THETA",.(RSE=se/value),by=.(par.name,FIX)] |>
NMdata::mergeCheck(
NMrelate(file.mod,par.type="THETA",as.fun="data.table")[,.(par.name,code)],
by="par.name",quiet=TRUE)
The step required by the user is to define the functions to derive relevant exposure metrics. We will use AUC 0-24h and Cmax.
### Define exposure metrics
funs.exposure <- list(
"Cmax"=function(x) max(x$PRED)
,"AUC"=function(x) trapez(x$TIME,x$PRED)
## ,"Concentration at 4 hours"=function(x) x$value[x$TAPD==4]
)
sum.uncertain <- summarizeCovs(simres,
funs.exposure = funs.exposure,
by=cc(period),
cover.ci=.95
)
Plotting
We will use the R package coveffectsplot
for plotting.
coveffectsplot::forest_plot()
requires certain column names
so we adjust those first.
setDT(sum.uncertain)
setnames(sum.uncertain,
cc(covvalf,predmm,predml,predmu,metric.var),
cc(label,mid,lower,upper,paramname)
)
sum.uncertain[,MEANVAL:=mid]
sum.uncertain[,covname:=covlabel]
nsig <- 3
sum.uncertain[,LABEL := sprintf("%s [%s - %s]",signif2(mid,nsig),signif2(lower,nsig),signif2(upper,nsig))]
descrip.legend <- "Reference (vertical line)\nClinically relevant limits 0.8-1.25 (colored area)"
### I don't know why forest_plot() needs this
label_value <- function(x,... )x
fun.plot <- function(data,...){
textsize <- 10
forest1 <- forest_plot(
data = data,
facet_formula = "covlabel ~ paramname",
facet_scales = "free_y",
facet_space = "free_y",
xy_facet_text_bold = FALSE,
plot_table_ratio = 1.7,
table_text_size = 3,
x_label_text_size = textsize,
y_label_text_size = textsize,
x_facet_text_size = textsize,
y_facet_text_size = textsize,
base_size=textsize,
strip_placement = "outside",
table_position = "right",
legend_order = c("pointinterval", "ref", "area"),
x_range = c(.5,1.5),
ref_legend_text = descrip.legend,
area_legend_text = descrip.legend,
facet_switch = c("y"),
legend_position="bottom",
...
)
}
forest.day1 <- fun.plot(sum.uncertain[period=="Day 1"])
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## ℹ The deprecated feature was likely used in the coveffectsplot package.
## Please report the issue at
## <https://github.com/smouksassi/coveffectsplot/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
forest.ss <- fun.plot(sum.uncertain[period=="Steady-State"])
Post processing explained step by step
We are including the code from the main steps in the post processing
function, NMsim::summarizeCovs()
. It is important that the
scientist understands that what the forest plots derived in this
document represent is the relative effect of the covariate on the
exposure metric. This is derived by NMsim::summarizeCovs()
as quantiles of the exposure relative to the reference subject. The
confidence interval hence expresses the uncertainty on the covariate
effect for a subject with other covariates at reference values.
Notice, the code below consists of snippets from the
NMsim::summarizeCovs()
function. The code is not intended
to be used on its own.
### use only simulated samples
simres <- as.data.table(data)[EVID==2]
### summarizing exposure metrics for each subject in each model,
### each combination of covariates
resp.model <- simres[,lapply(funs.exposure,function(f)f(.SD)),
by=c(allby,modelby,"ID")]
### the exposure metrics in long format.
mvars <- names(funs.exposure)
resp.model.l <- melt(resp.model,measure.vars=mvars,variable.name="metric.var",value.name="metric.val")
## deriving median by model and time to have a single value per
## model This is only relevant in case multiple subjects are
## simulated by each model.
sum.res.model <- resp.model.l[
,.(predm=median(metric.val))
,by=c(modelby,allby,"metric.var")
]
### making reference value a column rather than rows.
## column with refrence exposure value is called val.exp.ref
### summarize distribution of ratio to ref across parameter samples/models
sum.uncertain <- sum.res.model[
,setNames(as.list(quantile(predm/val.exp.ref,probs=c((1-cover.ci)/2,.5,1-(1-cover.ci)/2))),
c("predml","predmm","predmu"))
,by=c(allby,"metric.var")]