Using EPIC in R
UsingEPICinR.Rmd1 Installation
If you do not have R installed on your computer, please go to Appendix 1. Once you have R installed, you can install the epicR package as per the instructions below:
Windows 7 or Later
- Download and Install the latest version of R from https://cran.r-project.org/bin/windows/base/
- Download and Install R Studio from https://posit.co/download/rstudio-desktop/
- Download and Install the latest version of Rtools from https://cran.r-project.org/bin/windows/Rtools/
- Using either an R session in Terminal or in R Studio, install the
package
pak:
install.packages('pak')- Install epicR from GitHub:
pak::pkg_install('resplab/epicR')Mac OS Sierra and Later
- Download and Install the latest version of R from https://cran.r-project.org/bin/macosx/
- Download and Install R Studio from https://posit.co/download/rstudio-desktop/
- Install Xcode Command Line Tools by running the following in Terminal:
- Using either an R session in Terminal or in R Studio, install the
package
pak:
install.packages('pak')- Install epicR from GitHub:
pak::pkg_install('resplab/epicR')Ubuntu 22.04 and Later
- Install R by executing the following commands in Terminal:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9If the installation is successful, you should be able to start R:
- Download and Install R Studio from https://posit.co/download/rstudio-desktop/
- Install
libcurlfrom Terminal:
- Using either an R session in Terminal or in R Studio, install the
package
pak:
install.packages("pak")- Install epicR from GitHub:
pak::pkg_install("resplab/epicR")2 Running the Model
Now that you have installed epicR, you can load the
library:
library(epicR)Recommended: Using the simulate()
Function
The easiest way to run EPIC is with the simulate()
function, which handles all session management automatically and
provides progress information:
# Run with defaults (Canada, 20 year horizon, 60,000 agents)
results <- simulate()
print(results$basic)
# Customize parameters
results <- simulate(
jurisdiction = "us",
time_horizon = 10,
n_agents = 100000
)
# Quick test with fewer agents
results <- simulate(n_agents = 10000)
# Get extended results
results <- simulate(return_extended = TRUE)
print(results$extended)
# Get event history
results <- simulate(return_events = TRUE, n_agents = 10000)
head(results$events)Advanced: Manual Session Management
For advanced users who need fine-grained control (e.g., running multiple simulations in one session), you can manage sessions manually.
The backend of this package is in C++. In other languages such as
R/Python, memory allocation is taken care of for you. However, in C/C++,
you need to allocate memory yourself. You also need to deallocate memory
when the program is done. Because of this, we have the function
init_session().
init_session() is in the core.R file,
and calls 3 C functions: Cdeallocate_resources(),
Callocate_resources(), and Cinit_session().
You don’t really need to know how these work, but basically they
allocate/deallocate memory.
You need to call this before you run anything:
In C++, a program that is successful returns 0, so you
should see this in the R console.
Step 1: Initialize the Session
(Already shown above with init_session())
Step 2: Set the Inputs
The default input values are created in the file
input.R. You can retrieve these values using
get_input(). This function returns a nested list. The
returned list contains values, help and
ref which each hold further sub-lists. help
and ref contain explanations regarding what each input
parameter means.
For easier notation we will refer the retrieved inputs as
input:
input <- get_input()Below is a list of the main categories that input$values
returns:
Input Value Categories:
| Category | Description |
|---|---|
global_parameters |
Time horizon, age ranges, discount rates |
agent |
Population demographics (sex distribution, etc.) |
smoking |
Smoking initiation, cessation, and relapse rates |
diagnosis |
COPD diagnosis and case detection parameters |
cost |
Healthcare costs (exacerbations, medications, maintenance) |
medication |
Medication usage and effectiveness parameters |
exacerbation |
Exacerbation rates and severity distributions |
events |
Event-specific parameters |
input$values returns a list of lists, so you can further
explore and set specific inputs using the $ operator. An
example of how to do this is shown below:
#changing time_horizon which can be accessed through global_parameters
input$values$global_parameters$time_horizon <- 5
#changing age0, which can be accessed through global_parameters, from the default value of 40 to 50
input$values$global_parameters$age0 <- 50
#changing p_females, which can be accessed through agent, from the default value 0.5 to 0.6.
input$values$agent$p_female <- 0.6Step 3: Run
After changing the inputs you can provide them to the
run() function. The run() function has two
arguments: max_n_agents and input. If
you created your own input in Step 2, you can put it in
here; otherwise, you can leave the input argument
blank. The max_n_agents argument is how many people you
want in your model. The default is set to the maximum integer your
computer will allow. You can set this to a lower number if you like (it
might run faster).
Step 4: Results
Once you have run the model simulation, there are several functions to access the results.
The first function is the Cget_output() which shows some
of the overall results:
results <- Cget_output()Basic Output from Cget_output():
| Output | Description |
|---|---|
n_agents |
Total number of agents simulated |
cumul_time |
Cumulative time in person-years |
n_deaths |
Total number of deaths |
n_COPD |
Number of agents with COPD |
n_exac_mild |
Number of mild exacerbations |
n_exac_moderate |
Number of moderate exacerbations |
n_exac_severe |
Number of severe exacerbations |
n_exac_very_severe |
Number of very severe exacerbations |
total_cost |
Total healthcare costs |
total_qaly |
Total quality-adjusted life years |
The second function Cget_output_ex() returns a very
large list of results:
resultsExra <- Cget_output_ex()Extended Output from
Cget_output_ex():
The extended output includes detailed breakdowns by: - Year (annual results over the time horizon) - Age groups - Sex - COPD severity (GOLD stages) - Smoking status
Key tables in the extended output: - n_death - Deaths by
year, age, sex - n_COPD - COPD prevalence by year, age, sex
- cost - Costs by year and category - qaly -
QALYs by year and health state - exacerbation -
Exacerbations by year, severity, and GOLD stage
Important Notes: - EPIC only starts simulating patients at age 40 - Comorbidities are not implemented in the current version of epicR
The full snippet of the code:
init_session()
input <- get_input()
input$values$global_parameters$time_horizon <- 5
run(input=input$values)
results <- Cget_output()
resultsExra <- Cget_output_ex()
terminate_session()For some studies, having access to the entire event history of the simulated population might be beneficial. Capturing event history is possible by setting record_mode as a setting.
Cget_all_events_matrix() function returns the events
matrix, with every event for every patient. Note that you might need a
large amount of memory available, if you want to collect event history
for a large number of patients.
settings <- get_default_settings()
# record_mode = 2 indicates recording every event that occurs
settings$record_mode <- 2
#n_base_agents is the number of people at time 0.
settings$n_base_agents <- 1e4
init_session(settings = settings)
run()
results <- Cget_output()
events <- as.data.frame(Cget_all_events_matrix())
head(events)
terminate_session()You can change the record mode of the simulation by accessing
settings$record_mode as shown above. Here are what
different record modes mean:
Record Modes:
| Mode | Description |
|---|---|
| 0 | Aggregate output only (minimal memory) |
| 1 | Agent-level summary statistics |
| 2 | Complete event history for all agents (high memory usage) |
As shown in the code snippet above, you can inspect the event matrix
Cget_all_events_matrix() by converting it into a data frame
using as.data.frame(Cget_all_events_matrix()). This data
frame consists of 33 columns, including:
Key Event Matrix Columns:
| Column | Description |
|---|---|
agent_id |
Unique identifier for each agent |
event |
Event type code (see table below) |
time |
Time of event occurrence |
age |
Age of agent at event |
sex |
Sex of agent (0=male, 1=female) |
smoking_status |
Current smoking status |
COPD |
COPD status (0=no, 1=yes) |
gold_stage |
GOLD severity stage (0-4) |
exacerbation_severity |
Severity of exacerbation event |
cost |
Cost associated with event |
qaly |
QALY impact of event |
In the events data frame, each type of event has a code corresponding to the table below:
Event Type Codes:
| Event | Code |
|---|---|
| Start | 0 |
| Annual | 1 |
| Birthday | 2 |
| Smoking change | 3 |
| COPD incidence | 4 |
| Exacerbation | 5 |
| Exacerbation end | 6 |
| Death by exacerbation | 7 |
| Doctor visit* | 8 |
| Medication change* | 9 |
| Background death | 13 |
| End | 14 |
*Note: Doctor visit and Medication change are not implemented in this version of epicR
Closed-cohort analysis
Closed-cohort analysis can be specified by changing the appropriate input parameters:
library(epicR)
input <- get_input(closed_cohort = 1)$values
init_session()
run(input=input)
Cget_output()
terminate_session()