Asthma Control Model
Data
Raw Data
EBA was a prospective, representative observational study of 618 participants aged 1-85 years (74% were 18 years or older) with self-reported, physician-diagnosed asthma from BC.
Measurements were taken every 3 months for one year. Of the 613 patients retained for analysis, only 6% were lost to follow-up during the year. There were at least 500 observations at each asthma control level.
As expected, the cohort included more females than males, since adult asthma is more prevalent among females.
The asthma control level changed during follow-up for 79% of patients.
We followed the 2020 GINA guidelines to define the asthma control level from the sum of four indicator variables (0 = no, 1 = yes), each referring to the 3 months before the measurement:
daily symptoms
nocturnal symptoms
inhaler use
limited activities
If the sum is 0, the asthma control level is controlled; if it is 1 or 2, it is partially-controlled; if it is 3 or 4, it is uncontrolled. Responses of "do not know" to an indicator variable were treated as "no". This analysis did not consider treatment or whether a patient experienced an exacerbation in the 3 months before the visit. We excluded two patients whose asthma diagnosis dates preceded their birth dates and three patients with no asthma diagnosis date, leaving a final count of 613 patients.
Column | Type | Description |
---|---|---|
`studyId` | `int` | 8-digit patient ID |
`visit` | `int` | The visit number, a value in [1, 5]. Visits were scheduled every 3 months for a year. |
`daytimeSymptoms` | `int` | 1 = yes, 2 = no |
`nocturnalSymptoms` | `int` | 1 = yes, 2 = no |
`inhalerUse` | `int` | 1 = yes, 2 = no |
`limitedActivities` | `int` | 1 = yes, 2 = no |
`exacerbations` | `int` | TODO |
`sex` | `int` | 0 = female, 1 = male |
`age` | `float` | Age in years |
`ageAtAsthmaDx` | `float` | Age at asthma diagnosis |
`time_since_Dx` | `float` | Time since asthma diagnosis in years |
`time_since_Dx_cat` | `int` | 1 = TODO, 2 = TODO, 3 = TODO |
Processed Data
In keeping with Python conventions, the columns were converted to snake case. In addition, `studyId` was renamed to `patient_id`, since `studyId` suggests that the ID identifies a study, when in fact it identifies an individual patient.
The variables `daytimeSymptoms`, `nocturnalSymptoms`, `inhalerUse`, and `limitedActivities` were converted to binary variables, where `1 = True` and `0 = False`.
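The renaming and recoding steps can be sketched as follows. This is a minimal illustration, assuming each raw row is a plain Python dictionary keyed by the raw column names in the table above; the helpers `to_snake_case` and `process_row` and the example row are hypothetical and not taken from the LEAP codebase, which more likely operates on a whole data frame.

```python
import re

# Raw indicator columns, coded 1 = yes, 2 = no
INDICATORS = ["daytimeSymptoms", "nocturnalSymptoms", "inhalerUse", "limitedActivities"]


def to_snake_case(name: str) -> str:
    """Convert camelCase (or mixed) column names to snake_case."""
    # Insert an underscore only between a lowercase letter/digit and an uppercase letter,
    # so names that already contain underscores (e.g. time_since_Dx) are not doubled.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def process_row(raw: dict) -> dict:
    """Rename columns and recode the four indicators to binary 1/0."""
    row = {}
    for name, value in raw.items():
        if name in INDICATORS:
            # 1 (yes) -> 1; any other code, including 2 (no), -> 0
            value = 1 if value == 1 else 0
        new_name = "patient_id" if name == "studyId" else to_snake_case(name)
        row[new_name] = value
    return row


# Made-up example row, for illustration only
raw = {"studyId": 12345678, "visit": 1, "daytimeSymptoms": 1, "nocturnalSymptoms": 2,
       "inhalerUse": 1, "limitedActivities": 2, "sex": 0, "age": 34.5,
       "ageAtAsthmaDx": 12.0, "time_since_Dx": 22.5, "time_since_Dx_cat": 3}
print(process_row(raw))
```

The explicit `studyId` check runs before the generic snake-case conversion, so the rename takes precedence over the automatic `study_id`.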
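We also needed to compute the asthma control level from the four indicator variables. We first computed the `control_score`, defined as:
\[
\text{control\_score} = \text{daytime\_symptoms} + \text{nocturnal\_symptoms} + \text{inhaler\_use} + \text{limited\_activities},
\]
which has a minimum value of `0` (maximum control) and a maximum value of `4` (minimum control).
Then we defined the asthma control level as follows:
\[
\text{control\_level} =
\begin{cases}
1 \ (\text{controlled}), & \text{control\_score} = 0 \\
2 \ (\text{partially-controlled}), & 1 \le \text{control\_score} \le 2 \\
3 \ (\text{uncontrolled}), & 3 \le \text{control\_score} \le 4
\end{cases}
\]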
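The GINA-based scoring rule described above translates directly into code. This sketch assumes a processed row is a dictionary keyed by the processed column names; the functions `control_score` and `control_level` are illustrative rather than taken from the LEAP codebase, and the level codes 1-3 follow the ordering controlled < partially-controlled < uncontrolled.

```python
def control_score(row: dict) -> int:
    """Sum of the four binary indicators (0 = maximum control, 4 = minimum control)."""
    return (row["daytime_symptoms"] + row["nocturnal_symptoms"]
            + row["inhaler_use"] + row["limited_activities"])


def control_level(score: int) -> int:
    """Map a control score to 1 = controlled, 2 = partially-controlled, 3 = uncontrolled."""
    if score == 0:
        return 1
    if score < 3:  # score is 1 or 2
        return 2
    return 3  # score is 3 or 4


row = {"daytime_symptoms": 1, "nocturnal_symptoms": 0, "inhaler_use": 1, "limited_activities": 0}
score = control_score(row)
print(score, control_level(score))  # → 2 2
```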
Column | Type | Description |
---|---|---|
`patient_id` | `int` | 8-digit patient ID |
`visit` | `int` | The visit number, a value in [1, 5]. Visits were scheduled every 3 months for a year. |
`daytime_symptoms` | `int` | `1 = True`, `0 = False` |
`nocturnal_symptoms` | `int` | `1 = True`, `0 = False` |
`inhaler_use` | `int` | `1 = True`, `0 = False` |
`limited_activities` | `int` | `1 = True`, `0 = False` |
`exacerbations` | `int` | TODO |
`sex` | `int` | 0 = female, 1 = male |
`age` | `float` | Age in years |
`age_at_asthma_dx` | `float` | Age at asthma diagnosis |
`time_since_dx` | `float` | Time since asthma diagnosis in years |
`time_since_dx_cat` | `int` | 1 = TODO, 2 = TODO, 3 = TODO |
`control_score` | `int` | Sum of the four indicator variables: 0 = maximum control, 4 = minimum control |
`control_level` | `int` | Asthma control level: 1 = controlled, 2 = partially-controlled, 3 = uncontrolled |
Model
Our goal is to fit a model that generates the proportion of time an individual labelled as asthmatic spends in each control level.
Ordinal Regression
Ordinal regression is a type of regression analysis used when the response variable (in our case, the control level) is ordered, but the intervals between the levels are arbitrary. Here, the order of the control levels matters (controlled < partially-controlled < uncontrolled), but the numbers assigned to them and the distances between those numbers are arbitrary.
To begin, we define our variables:
\(i\): the patient index
\(k\): the asthma control level, where \(k \in \{1,2,3\}\)
\(y^{(i)}\): the asthma control level for patient \(i\), where \(y^{(i)} \in \{1,2,3\}\)
\(\theta_k\): the threshold parameter for the \(k^{th}\) control level
\(x_n^{(i)}\): the \(n^{th}\) covariate for patient \(i\)
\(\beta_n\): the coefficient for the \(n^{th}\) covariate
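Then the model is:
\[
P\left(y^{(i)} \le k\right) = \sigma\left(\theta_k - \sum_{n} \beta_n x_n^{(i)}\right), \quad k \in \{1, 2\}, \quad \theta_1 < \theta_2,
\]
where \(\sigma\) is the logistic function:
\[
\sigma(z) = \frac{1}{1 + e^{-z}}
\]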
and the covariates are:
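To obtain the probability that a patient is in a specific control level, we take the difference of consecutive cumulative probabilities:
\[
P\left(y^{(i)} = k\right) = P\left(y^{(i)} \le k\right) - P\left(y^{(i)} \le k - 1\right), \quad k \in \{1, 2, 3\},
\]
where \(P\left(y^{(i)} \le 0\right) = 0\) and \(P\left(y^{(i)} \le 3\right) = 1\).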
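A minimal sketch of a three-level cumulative-logit (proportional-odds) model, assuming the parameterization \(P(y^{(i)} \le k) = \sigma(\theta_k - \sum_n \beta_n x_n^{(i)})\) with increasing thresholds. The function names and all numeric values are made up for illustration and are not fitted estimates.

```python
import math


def logistic(z: float) -> float:
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def level_probabilities(thetas, betas, x):
    """P(y = k) for k = 1, ..., len(thetas) + 1 under a cumulative-logit model.

    thetas: increasing thresholds (theta_1, theta_2) for the three-level case.
    betas, x: covariate coefficients and covariate values, paired by index.
    """
    eta = sum(b * xn for b, xn in zip(betas, x))  # linear predictor
    # Cumulative probabilities P(y <= k), padded with P(y <= 0) = 0 and P(y <= 3) = 1
    cumulative = [0.0] + [logistic(t - eta) for t in thetas] + [1.0]
    # Differences of consecutive cumulative probabilities give P(y = k)
    return [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]


# Illustrative (made-up) parameter values
probs = level_probabilities(thetas=[-0.5, 1.5], betas=[0.3, 0.01], x=[1, 40.0])
print([round(p, 3) for p in probs])
```

Because the thresholds are increasing, every difference is positive and the three probabilities sum to one by construction.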
Random Effects
In our model, we also include a random effect to account for the correlation between measurements from the same patient. This matters because measurements are taken repeatedly over time, and we expect measurements from the same patient to be more similar to each other than to measurements from different patients. The random effect is assumed to be normally distributed with mean zero and variance \(\sigma^2\) (here \(\sigma\) denotes a standard deviation, not the logistic function). The model with random effects is:
\[
P\left(y^{(i)} \le k\right) = \sigma\left(\theta_k - \beta_0^{(i)} - \sum_{n} \beta_n x_n^{(i)}\right), \quad \beta_0^{(i)} \sim \mathcal{N}\left(0, \sigma^2\right),
\]
where \(\beta_0^{(i)}\) is the random effect for patient \(i\).
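To see what the random intercept does, the sketch below holds the covariates fixed and varies \(\beta_0^{(i)}\). It assumes the parameterization \(P(y^{(i)} \le k) = \sigma(\theta_k - \beta_0^{(i)} - \sum_n \beta_n x_n^{(i)})\); under that convention, a patient with a larger intercept is shifted toward worse control at every visit, which is what makes that patient's repeated measurements resemble each other. The thresholds and the fixed-effect part are made-up values.

```python
import math


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def level_probabilities(thetas, eta):
    """P(y = k) for k = 1..3 under a cumulative-logit model with linear predictor eta."""
    cumulative = [0.0] + [logistic(t - eta) for t in thetas] + [1.0]
    return [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]


# Made-up thresholds and fixed-effect part of the linear predictor (sum of beta_n * x_n)
thetas = [-0.5, 1.5]
fixed_part = 0.7

# Same covariates, three hypothetical patients with different random intercepts beta_0
for beta0 in (-1.0, 0.0, 1.0):
    probs = level_probabilities(thetas, beta0 + fixed_part)
    print(f"beta0 = {beta0:+.1f}: P(uncontrolled) = {probs[2]:.3f}")
```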
Fitting the Model with EBA Data
The model predicts the probability of being in each control level during a 3-month period. To apply these predictions in our simulation, we make the following assumptions:
We assume that the probability of being in each of the control levels is equivalent to the proportion of time spent in each of the control levels.
We assume that we may extend these predictions from a 3-month period to a 1-year period (this is the time cycle of the simulation).
We assume that the probability of being in a control level does not depend on time.
We assume that the probability of being in a control level does not depend on the past history of asthma control.
We assume that the probability of being in a control level does not depend on the past history of exacerbations.
In short, for each virtual individual (agent) labelled as asthmatic, we sample an individual-specific intercept from the estimated distribution of the random effects and then use the asthma control prediction model with that intercept to simulate the proportion of time spent in each control level in each time cycle.
Predictions
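Once the ordinal regression model has been fit on the EBA dataset, the coefficients are saved to the `leap/processed_data/config.json` file. During the simulation, these coefficients are used to determine the probability of being in each control level for each agent labelled as asthmatic:
\[
P\left(y^{(i)} = k\right) = \sigma\left(\theta_k - \beta_0^{(i)} - \sum_{n} \beta_n x_n^{(i)}\right) - \sigma\left(\theta_{k-1} - \beta_0^{(i)} - \sum_{n} \beta_n x_n^{(i)}\right), \quad k \in \{1, 2, 3\},
\]
with \(\theta_0 = -\infty\) and \(\theta_3 = +\infty\), where \(\beta_0^{(i)}\) is assigned to each agent at the beginning of the simulation, drawn randomly from a normal distribution with mean \(\mu = 0\) and the standard deviation \(\sigma\) estimated when the model was fit.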
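A minimal sketch of how the saved coefficients might be used for the per-agent procedure summarized above. The JSON layout (`control`, `theta`, `beta`, `beta0_sigma`), the covariates (`sex`, `age`), and all numeric values are hypothetical placeholders: the actual keys in `leap/processed_data/config.json` and the model's covariates may differ. The sketch also assumes the parameterization \(P(y^{(i)} \le k) = \sigma(\theta_k - \beta_0^{(i)} - \sum_n \beta_n x_n^{(i)})\).

```python
import json
import math
import random

# HYPOTHETICAL config layout and values; the real file's keys may differ.
CONFIG_JSON = """
{
  "control": {
    "theta": [-0.5, 1.5],
    "beta": {"sex": 0.3, "age": 0.01},
    "beta0_sigma": 0.8
  }
}
"""


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def control_proportions(params, covariates, beta0):
    """Proportion of a time cycle spent in each control level (1, 2, 3) for one agent."""
    eta = beta0 + sum(params["beta"][name] * value for name, value in covariates.items())
    cumulative = [0.0] + [logistic(t - eta) for t in params["theta"]] + [1.0]
    return [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]


params = json.loads(CONFIG_JSON)["control"]
rng = random.Random(2024)

# Sample each agent's intercept once, at the start of the simulation
agents = [{"sex": 1, "age": 30.0}, {"sex": 0, "age": 55.0}]
intercepts = [rng.gauss(0.0, params["beta0_sigma"]) for _ in agents]

# Reuse the same intercept in every time cycle; only the covariates (here, age) change
for cycle in range(2):
    for agent, beta0 in zip(agents, intercepts):
        covariates = {"sex": agent["sex"], "age": agent["age"] + cycle}
        print(cycle, [round(p, 3) for p in control_proportions(params, covariates, beta0)])
```

Drawing the intercept once per agent, rather than once per cycle, is what carries the within-patient correlation of the fitted model into the simulation.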