Skip to contents

NMsim

Introduction

The forest plot is an effective and widely recognized way to illustrate estimated covariate effects on exposure or response, or parameters related to these (e.g. clearance). The forest plot typically includes precision of the estimate in terms of a confidence interval. Often, an acceptance region such as the 80%-125% bio equivalence is included for comparison.

In some cases, the forest plot can be derived based on model estimates without simulation. This is the case for a forest plot on exposure if the PK is linear and only steady-state average concentration (or AUC) is of interest. If the PK is non-linear and/or exposure metrics such as Cmax which depends on multiple PK parameters are of interest, a simulation-based forest plot may be needed. As we shall see, NMsim provides a flexible and concise framework to perform the required simulations, a function for summarizing the simulation results is provided, and it is demonstrated how the summary is easily plotted using the coveffectsplot R package.

This vignette is no attempt to harmonize how to construct a forest plot. Instead, we shall describe the steps taken, and it is up to the scientist to choose how they want to construct the analysis to address their questions most effectively.

Initialization

library(data.table)
library(NMsim)
library(NMdata)  
library(NMcalc)  ## Optional. Used to calculate AUC.
library(coveffectsplot)
library(knitr) ## for printing tables with knitr::kable

NMdataConf(
    path.nonmem = "/opt/NONMEM/nm75/run/nmfe75", ## path to NONMEM executable
    dir.sims="simtmp-forest", ## where to store temporary simulation results
    dir.res="simres-forest"   ## where to store final simulation results
    ## ,as.fun="data.table"
)

A model is selected.

file.mod <- "NMsim-forest-models/xgxr134.mod"

Generation of simulation input data

The simulation data set has to match the model in compartment numbers, and it must contain all variables needed to run the NONMEM model. We simulate daily dosing of 30 mg.

doses <- NMcreateDoses(TIME=0,AMT=30,ADDL=100,II=24,col.id=NA)
kable(doses)
TIME EVID CMT AMT II ADDL MDV
0 1 1 30 24 100 1

NMsim::expandCovs() can be used to construct a set of simulations needed to derive a forest plot. It varies one covariate at a time, keeping all other covariates at their reference value. It can derive references and quantiles from a data set.

For reference values we use median in the observed population for continuous covariates and manually select “Female” as the reference for sex.

## reading output and input tables from estimation. Used to determine
## reference values and quantiles.
data.ref <- NMdata::NMscanData(file.mod,quiet=TRUE)
covs <- expandCovs(
    AGE=list(ref=median,quantiles=c(10,25,75,90)/100,label="Age (years)"),
    ## notice, values OR quantiles can be provided
    WEIGHTB=list(ref=median, quantiles=c(10,25,75,90)/100, label="Bodyweigt (kg)"),
    MALEN=list(ref=c(Female=0), values=c(Male=1), label="Sex"),
    data=data.ref,
    as.fun="data.table"
)
## adding distinct ID's for each combination of covariates
covs[,ID:=.GRP,by=.(type,covvar,covval)]

kable(covs)
covvar covval covvalc covlabel covref type AGE MALEN WEIGHTB ID
NA NA NA NA NA ref 54 0 110 1
AGE 34 34 Age (years) 54 value 34 0 110 2
AGE 45 45 Age (years) 54 value 45 0 110 3
AGE 65 65 Age (years) 54 value 65 0 110 4
AGE 73 73 Age (years) 54 value 73 0 110 5
WEIGHTB 85 85 Bodyweigt (kg) 110 value 54 0 85 6
WEIGHTB 96 96 Bodyweigt (kg) 110 value 54 0 96 7
WEIGHTB 130 130 Bodyweigt (kg) 110 value 54 0 130 8
WEIGHTB 140 140 Bodyweigt (kg) 110 value 54 0 140 9
MALEN 1 Male Sex 0 value 54 1 110 10

We now have all combinations of covariates in the object called covs. Notice the type column which distinguishes the reference combination in contrast to the other simulations where one covariate is being varied at a time.

## repeating the doses for all combinations of covariates
dt.dos <- covs[,doses[],by=covs]
dims(covs,doses,dt.dos) |> kable()
data nrows ncols
covs 10 10
doses 1 7
dt.dos 10 17

We sample every 15 minutes on the first day and one day in steady-state.

## add a sampling scheme
time.sim <- rbind(
    data.table(TIME=seq(0,24,.25),period="Day 1"),
    data.table(TIME=seq(0,24,.25)+30*24,period="Steady-State")
)
dt.sim <- addEVID2(dt.dos,TIME=time.sim,CMT=2)
dims(dt.dos,time.sim,dt.sim) |> kable()
data nrows ncols
dt.dos 10 17
time.sim 194 2
dt.sim 1950 18

There is only one analyte in this data set. If for instance, you have a parent and a metabolite, adding sampling times could look like this:

dt.sim.parent.metab <-
    addEVID2(dt.dos,TIME=time.sim,CMT=data.frame(analyte=c("parent","metabolite"),CMT=c(2,3)))

We used the period column to distinguish time intervals for the analysis. We have to remember to do the postprocessing by the values in that column. If you are looking at multiple analytes, remember to postprocess by the column that distringuishes those, too. In this example, that would be analyte.

Simulation

We need to run the simulations repeatedly, sampling parameters based on uncertainty estimates. This can be done in various ways. First of all, one must choose between non-parametric and parametric sampling. Non-parameteric sampling is typically based on a bootstrap, and the parametetric sampling is often based on a successfull $COVARIANCE step. NMsim has methods to do either type, and multiple methods are available for parametric sampling. The best source of information on these different methods is the ACOP2024 poster Simulation of clinical trial predictions with model uncertainty using NMsim by Sanaya Shroff and Philip Delff.

In this case we shall use parametric sampling. Since in this case we are simulating typical subjects (random effects fixed at zero), we only need variability on the fixed effects (THETA’s). Multiple methods are available in NMsim to do this. We shall use the method provided by NMsim based on the multivariate normal distribution (mvrnorm). This method is selected because it is adequate for sampling THETA’s, it does not require additional software installed, and it is robust.

simres.forest <- NMsim(file.mod # path to NONMEM model
                      ,data=dt.sim, # simulation dataset
                      ,name.sim="forest_mvrnorm" # output name suffix
                      ,method.sim=NMsim_VarCov # sampling with mvrnorm
                      ,nsims=500
                      ,typical=TRUE # FALSE to include BSV
                      ,table.vars=cc(PRED,IPRED) # output table variables
                      ,seed.R=342 # seed for reproducibility
                      ,sge=TRUE # TRUE if submitting to a cluster
                      ,nc=1
                       )

If you encounter issues with samples that do not run, consider whether any of your uncertainty estimates on THETA’s could lead to sampled THETA’s that could make the model be undefined. This would often be absorption parameters, clearences, volumes or any other strictly positive parameter which is estimated with poor precision. If this happens, you may want to go back and estimate that THETA on the log scale to make sure all sampled values will be positive. A quick way to look for this is by combining parameter estimates (from the .ext file) and the automatic labeling of the parameters using NMdata::NMrelate():

NMdata::NMreadExt(file.mod,as.fun="data.table")[par.type=="THETA",.(RSE=se/value),by=.(par.name,FIX)] |>
    NMdata::mergeCheck(NMrelate(file.mod,par.type="THETA",as.fun="data.table")[,.(par.name,code)],by="par.name",quiet=TRUE) |>
    kable()
par.name FIX RSE code
THETA(1) 0 0.0804007 LTVKA=THETA(1)
THETA(2) 0 0.0136327 LTVV2=THETA(2)
THETA(3) 0 0.0789812 LTVCL=THETA(3)
THETA(4) 0 0.0481949 LTVV3=THETA(4)
THETA(5) 0 0.0292952 LTVQ=THETA(5)
THETA(6) 0 0.4222300 AGEEFF=THETA(6)
THETA(7) 0 1.9160784 WEIGHTEFF=THETA(7)
THETA(8) 0 0.1041536 MALEEFF=THETA(8)

Post processing

NMsim provides a function to do the post processing of the set of simulations setup by expandCovs(). The key steps performed by this function are outlined below. Most importantly, it normalizes the exposures by the reference value for each sampled set of parameters. Then, it derives median and a confidence interval as quantiles in the simulated distribution.

### Read simulation results
simres <- NMreadSim("simres-forest/xgxr134_forest_mvrnorm_MetaData.rds",wait=TRUE,rm.tmp=TRUE)
## Check how many models returned simulation results
as.data.table(simres)[,uniqueN(model.sim)]
## [1] 500

If you encounter issues with samples that do not run, please consider whether any of your uncertainty estimates on \(THETA\)’s could lead to sampled THETA’s that could make the model be undefined. This would often be absorption parameters, clearences, volumes or any other strictly positive parameter which is estimated with poor precision. If this happens, you may want to go back and estimate that THETA on the log scale to make sure all sampled values will be positive. A quick way to look into this is to look at the relative standard error for each THETA. In the following we merge in the first line in the model using each THETA so we can refresh our memory on the interpretation of each parameter without relying on a well formatted parameter table.

NMdata::NMreadExt(file.mod,as.fun="data.table")[
            par.type=="THETA",.(RSE=se/value),by=.(par.name,FIX)] |>
    NMdata::mergeCheck(
                NMrelate(file.mod,par.type="THETA",as.fun="data.table")[,.(par.name,code)],
                by="par.name",quiet=TRUE)

The step required by the user is to define the functions to derive relevant exposure metrics. We will use AUC 0-24h and Cmax.

### Define exposure metrics
funs.exposure <- list(
    "Cmax"=function(x) max(x$PRED)
   ,"AUC"=function(x) trapez(x$TIME,x$PRED)
    ## ,"Concentration at 4 hours"=function(x) x$value[x$TAPD==4]
)

sum.uncertain <- summarizeCovs(simres,
                                 funs.exposure = funs.exposure,
                                 by=cc(period),
                                 cover.ci=.95
                                 )

Plotting

We will use the R package coveffectsplot for plotting. coveffectsplot::forest_plot() requires certain column names so we adjust those first.

setDT(sum.uncertain)
setnames(sum.uncertain,
         cc(covvalf,predmm,predml,predmu,metric.var),
         cc(label,mid,lower,upper,paramname)
         )

sum.uncertain[,MEANVAL:=mid]
sum.uncertain[,covname:=covlabel]
nsig <- 3
sum.uncertain[,LABEL := sprintf("%s [%s - %s]",signif2(mid,nsig),signif2(lower,nsig),signif2(upper,nsig))]


descrip.legend <- "Reference (vertical line)\nClinically relevant limits 0.8-1.25 (colored area)"

### I don't know why forest_plot() needs this 
label_value <- function(x,... )x

fun.plot <- function(data,...){
    textsize <- 10
    forest1 <- forest_plot(
        data = data,
        facet_formula = "covlabel ~ paramname", 
        facet_scales = "free_y",
        facet_space = "free_y",
        xy_facet_text_bold = FALSE, 
        plot_table_ratio = 1.7, 
        table_text_size = 3,
        x_label_text_size = textsize,
        y_label_text_size = textsize, 
        x_facet_text_size = textsize, 
        y_facet_text_size = textsize,
        base_size=textsize,
        strip_placement = "outside",
        table_position = "right",
        legend_order = c("pointinterval", "ref", "area"),
        x_range = c(.5,1.5),
        ref_legend_text = descrip.legend,
        area_legend_text = descrip.legend,
        facet_switch = c("y"),
        legend_position="bottom",
        ...
    )
}

forest.day1 <- fun.plot(sum.uncertain[period=="Day 1"])
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
##  The deprecated feature was likely used in the coveffectsplot package.
##   Please report the issue at
##   <https://github.com/smouksassi/coveffectsplot/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

forest.ss <- fun.plot(sum.uncertain[period=="Steady-State"])

Post processing explained step by step

We are including the code from the main steps in the post processing function, NMsim::summarizeCovs(). It is important that the scientist understands that what the forest plots derived in this document represent is the relative effect of the covariate on the exposure metric. This is derived by NMsim::summarizeCovs() as quantiles of the exposure relative to the reference subject. The confidence interval hence expresses the uncertainty on the covariate effect for a subject with other covariates at reference values.

Notice, the code below consists of snippets from the NMsim::summarizeCovs() function. The code is not intended to be used on its own.

### use only simulated samples
simres <- as.data.table(data)[EVID==2]


### summarizing exposure metrics for each subject in each model,
### each combination of covariates
resp.model <- simres[,lapply(funs.exposure,function(f)f(.SD)),
                     by=c(allby,modelby,"ID")]

### the exposure metrics in long format.
mvars <- names(funs.exposure)
resp.model.l <- melt(resp.model,measure.vars=mvars,variable.name="metric.var",value.name="metric.val")


## deriving median by model and time to have a single value per
## model This is only relevant in case multiple subjects are
## simulated by each model. 
sum.res.model <- resp.model.l[
   ,.(predm=median(metric.val))
   ,by=c(modelby,allby,"metric.var")
]

### making reference value a column rather than rows. 
## column with refrence exposure value is called val.exp.ref

### summarize distribution of ratio to ref across parameter samples/models
sum.uncertain <- sum.res.model[
   ,setNames(as.list(quantile(predm/val.exp.ref,probs=c((1-cover.ci)/2,.5,1-(1-cover.ci)/2))),
             c("predml","predmm","predmu"))
   ,by=c(allby,"metric.var")]