Built 2024-11-14 using NMdata 0.1.8.901.

This vignette is still under development. Please make sure to see latest version available here.

General NMdata questions

What is NMdata?

NMdata is an R package that can help

  • Creating event-based data sets for PK/PD modeling
  • Keeping Nonmem code updated to match contents of datasets
  • Read all output data and combine with input data from Nonmem runs
    • Very automated - supply output list file (.lst) only

NMdata comes with a configuration tool that can be used to tailor default behaviour to the user’s system configuration and preferences.

NMdata is not

  • A plotting package
  • A tool to retrieve details about model runs
  • A calculation or simulation toolbox
  • A “silo” that requires you to do things in a certain way
    • No tools in NMdata requires other NMdata tools to be used

How do I install NMdata?

For most users, taking NMdata from the package archive you are already using is the preferred way:

install.packages("NMdata")
library(NMdata)

To get the development version of NMdata, do the following from within R:

library(remotes)
install_github("nmautoverse/NMdata")
library(NMdata)

In a production environment, use a github release To install NMdata X.Y.Z release, do (notice “@v”:)

library(remotes)
install_github("nmautoverse/NMdata@vX.Y.Z")
library(NMdata)

This example is not automatically updated to latest available release. Please make sure to check for latest version on (https://github.com/nmautoverse/NMdata/releases)

If you need to archive a script that is dependent on a feature or bug fix which is only in the master branch, you can request a release.

What about NMdata dependencies?

NMdata depends on data.table only, and data.table does not have any dependencies. R 3.0 or later is required, that’s all (yes, you can run NMdata on pretty old R installations without any difference).

Why one more tool for interacting with Nonmem from R?

Other tools exist for reading data from Nonmem. However, they most often have requirements to how the dataset or the Nonmem control stream is written. NMdata aims at assuming as little as possible about how you work while still doing (and checking) as much as possible for you.

Tools in NMdata do not assume you use other tools in NMdata. If you like just one function in NMdata and want to integrate that in your workflow, that is perfectly possible. NMdata aims at being able to integrate with anything else and still do the job.

While many other tools available provide plotting functions, NMdata focuses on getting the data ready for smooth Nonmem experience (by checking as much as possible first), get the resulting data out of Nonmem and a little processing.

If you are building applications that process data from Nonmem, you may find very valuable tools in NMdata. I have not seen as generic and flexible a Nonmem data reader elsewhere.

I use another tool that automatically reads my data. Should I be interested in NMdata?

The data creation tools in NMdata may still be interesting. If another tool is working well for a project, you may not have any reason to use NMscanData (the Nonmem data reader in NMdata) for it. However, a lot of the time those tools have certain requirements to how datasets are constructed and especially how data is exported from Nonmem ($TABLE). If a Nonmem run does not live up to those requirements, and you want to look at the run results, NMscanData will let you do that without you having to “correct” your $TABLE statements and wait for Nonmem to run again.

Also, in other cases NMscanData can save you a lot of time and frustration even if you have your own routines for these tasks. It will allow you to easily read your old Nonmem runs that were done a little differently, or if you need to read someone elses work. A meta analysis of a large number of models implemented by different people over the years? That’s a candy store for NMscanData. Here, NMdata can save you many hours of frustration.

How can NMscanData do all that only based on the Nonmem output list file

NMscanData reads the names of the input and output tables from the list file, the path to the input data from the control stream (which it knows where to find, or you can change the method it uses to find it), and then it checks several properties of these files and their contents before combining all the info. It is quite a lot of checking and book keeping but the key steps are simple.

Why is NMdata fast?

NMdata is generally fast - all thanks to the incredible data.table package. If you don’t use data.table already, NMdata may even seem extremely fast. If you are noticing some circumstances where NMdata seems slow, I am very curious to hear about it.

So NMdata uses data.table. Does that mean variables are modified by reference?

NMdata definitely modify variables by reference internally but this will not affect your workspace (only exception is NMstamp()). It might improve speed to go all the way and modify by reference in the user workspace, but using NMdata must be easy for R users at all levels of experience. If you don’t understand what this is all about, you’re fine.

I want to use a tool from NMdata, but how does it integrate with dplyr or data.table?

Every function in NMdata has an argument called as.fun. You can provide a function in this argument which will automatically be applied to results.

You can get a tibble if you work with dplyr/tidyverse tools:

NMscanData(..., as.fun = tibble::as_tibble)

Under the hood, NMdata is implemented in data.table, and in fact the default is to convert to data.frame before returning the output. So if you want to have data.table’s back, use as.fun="data.table" (notice as a character string, not a function) to avoid the conversion

NMscanData(..., as.fun = "data.table")

Using as.fun=as.data.table (a function this time) wold work but is not recommended, because that would do an unnecessary copy of data.

If you want to change the behaviour generally for the session and omit the as.fun argument, you can do this by this option:

## for tibbles
NMdataConfig(as.fun = tibble::as_tibble)
## for data.table
NMdataConfig(as.fun = "data.table")

All NMdata functions (that return data) should now return tibbles or data.tables. If you use NMdataConfig to control this, for reproducibility please do so explicitly in your script and not in your .Rprofile or similar.

The help of an NMdata function says I have to pass a data.frame. Can I use a tibble or data.table?

Absolutely. data.tables and tibbles are data.frames, and data.frame in the documentation refers to any of these structures (that technically inherit from the data.frame class).

Can I request a feature?

Please open an issue on github.

Any plans on including some plotting functions?

No. See the question above about dependencies. If NMdata were extended with plotting features, it would become dependent on a number of packages, and backward compatibility would be very difficult to ensure.

The only way to provide plotting features for output from NMdata functions, would be to launch a separate package. I rarely need much code to get the plots I want based on the format returned by NMscanData and maybe a call to findCovs or findVars.

Questions specific to data reading tools

I don’t use PSN. What should I do?

NMscanData needs the path to the output control stream (technically, the input control stream will work too, but this is not recommended). It has no requirement to the naming of files.

NMdata by default assumes a PSN setup. The reason this makes a difference is a little technical but briefly, the path to the input data is not available in the list file when using PSN. If you don’t use PSN, all information will probably be available in the output control stream. Enough talking, just do this:

NMdataConf(file.mod = identity)

How can I customize how NMscanData names models?

NMscanData take the name of the model from the file name and adds it to a column in the returned data. If the name of the directory is the model name, you can do the following as well:

NMdataConfig(modelname = function(file) basename(dirname(normalizePath(file))))

Feel free to modify the function to omit parts of the string or add to it.

What is “input” and output data?

  • What is referred to as “input data” in NMdata documentation is the datafile read in the $DATA or $INFILE.
  • What is referred to as “output data” in the same documentation is the totality of files written by $TABLE statements.

Notice especially, “output data” does not refer to any of the files automatically written by Nonmem such as .phi, .ext, .cov etc.

Does NMdata read .phi, .ext, .cov and other files generated by Nonmem?

No. Reading this data is often very useful, but there are other tools out there that do this (e.g. nonmem2R).