Asthma Control Model

Data

Raw Data

EBA was a prospective, representative, observational study of 618 participants aged 1 to 85 years (74% aged 18 or older) with self-reported, physician-diagnosed asthma from BC. Measurements were taken every 3 months for one year. Of the 613 patients included in the analysis, only 6% were lost to follow-up during the year. There were at least 500 cases for each asthma control level. As expected, the cohort included more females than males, since adult asthma is more prevalent among females. The asthma control level changed during follow-up for 79% of the patients.

Following the 2020 GINA guidelines, we defined the asthma control level using the sum of four indicator variables (0 = no, 1 = yes), each referring to the 3 months before the measurement:

  1. daily symptoms

  2. nocturnal symptoms

  3. inhaler use

  4. limited activities

If the sum is 0, the asthma is controlled; if it is 1 or 2, it is partially controlled; if it is 3 or 4, it is uncontrolled. Responses of "do not know" to the indicator variables were treated as "no". This analysis considered neither treatment nor whether a patient experienced an exacerbation in the 3 months before the visit. We excluded two patients whose recorded asthma diagnosis dates preceded their birth dates and three patients with no asthma diagnosis date, leaving 613 of the 618 participants.

Column               Type   Description
studyId              int    8-digit patient ID
visit                int    Visit number, a value in [1, 5]; visits were scheduled every 3 months for one year
daytimeSymptoms      int    1 = yes, 2 = no
nocturnalSymptoms    int    1 = yes, 2 = no
inhalerUse           int    1 = yes, 2 = no
limitedActivities    int    1 = yes, 2 = no
exacerbations        int    TODO
sex                  int    0 = female, 1 = male
age                  float  Age in years
ageAtAsthmaDx        float  Age at asthma diagnosis
time_since_Dx        float  Time since asthma diagnosis in years
time_since_Dx_cat    int    1 = TODO, 2 = TODO, 3 = TODO

Processed Data

Following Python conventions, the column names were converted to snake case. In addition, studyId was renamed to patient_id: the name studyId suggests an identifier for a study, whereas the ID identifies an individual patient.

The variables daytimeSymptoms, nocturnalSymptoms, inhalerUse, and limitedActivities were recoded from 1 = yes, 2 = no to binary variables, where 1 = True and 0 = False.
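The renaming and recoding steps can be sketched in plain Python as follows. This is a hedged illustration rather than the project's processing code: the helper names `to_snake_case` and `process_row`, the one-dictionary-per-row representation, and the example row are hypothetical. We also assume that any indicator code other than 1 (including whatever code the raw data uses for "do not know") should become 0, which matches treating "do not know" as "no".

```python
import re

# Indicator columns (after renaming) recoded from 1 = yes, 2 = no to 1 = True, 0 = False.
INDICATORS = ("daytime_symptoms", "nocturnal_symptoms", "inhaler_use", "limited_activities")


def to_snake_case(name: str) -> str:
    """Insert an underscore before each capital that follows a lowercase letter or digit."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def process_row(raw: dict) -> dict:
    """Return a new row with snake-case column names and recoded indicators (hypothetical helper)."""
    row = {to_snake_case(key): value for key, value in raw.items()}
    # studyId identifies an individual patient, not a study.
    row["patient_id"] = row.pop("study_id")
    for column in INDICATORS:
        # Assumption: only code 1 means "yes"; 2 ("no") and any other code,
        # such as a "do not know" code, are mapped to 0.
        row[column] = 1 if row[column] == 1 else 0
    return row


# Hypothetical raw row, for illustration only.
raw = {"studyId": 12345678, "visit": 1, "daytimeSymptoms": 1,
       "nocturnalSymptoms": 2, "inhalerUse": 1, "limitedActivities": 2,
       "sex": 0, "age": 40.0, "ageAtAsthmaDx": 12.5, "time_since_Dx": 27.5}
print(process_row(raw))
```

The regular expression only splits at a lowercase-to-uppercase boundary, so names that already contain underscores, such as time_since_Dx_cat, simply become lowercase.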

Next, we derived the asthma control level from the four indicator variables. First, we computed the control_score, defined as:

\[\text{control_score} = \text{daytime_symptoms} + \text{nocturnal_symptoms} + \text{inhaler_use} + \text{limited_activities}\]

which has a minimum value of 0 (maximum control) and a maximum value of 4 (minimum control).

Then we defined the asthma control level as follows:

\[\text{control_level} = \begin{cases} 1 & \text{control_score} = 0 \\ 2 & 0 < \text{control_score} < 3 \\ 3 & \text{control_score} \geq 3 \end{cases}\]
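A minimal sketch of the two formulas above, written as plain Python functions over one processed visit record. The function names and the dictionary input are illustrative assumptions, not the project's code.

```python
def control_score(visit: dict) -> int:
    """Sum of the four binary indicators: 0 = maximum control, 4 = minimum control."""
    return (
        visit["daytime_symptoms"]
        + visit["nocturnal_symptoms"]
        + visit["inhaler_use"]
        + visit["limited_activities"]
    )


def control_level(score: int) -> int:
    """Map a score to 1 = fully controlled, 2 = partially controlled, 3 = uncontrolled."""
    if not 0 <= score <= 4:
        raise ValueError(f"control score must be in [0, 4], got {score}")
    if score == 0:
        return 1
    if score < 3:
        return 2
    return 3


visit = {"daytime_symptoms": 1, "nocturnal_symptoms": 0,
         "inhaler_use": 1, "limited_activities": 0}
score = control_score(visit)
print(score, control_level(score))  # 2 2
print([control_level(s) for s in range(5)])  # [1, 2, 2, 3, 3]
```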

Column               Type   Description
patient_id           int    8-digit patient ID
visit                int    Visit number, a value in [1, 5]; visits were scheduled every 3 months for one year
daytime_symptoms     int    1 = True, 0 = False
nocturnal_symptoms   int    1 = True, 0 = False
inhaler_use          int    1 = True, 0 = False
limited_activities   int    1 = True, 0 = False
exacerbations        int    TODO
sex                  int    0 = female, 1 = male
age                  float  Age in years
age_at_asthma_dx     float  Age at asthma diagnosis
time_since_dx        float  Time since asthma diagnosis in years
time_since_dx_cat    int    1 = TODO, 2 = TODO, 3 = TODO
control_score        int    0 = maximum control, 4 = minimum control
control_level        int    Asthma control level:
                            • 1 = fully controlled
                            • 2 = partially controlled
                            • 3 = uncontrolled

Model

Our goal is to fit a model that predicts the proportion of time an individual labelled as asthmatic spends in each control level.

Ordinal Regression

Ordinal regression is used when the response variable (here, the control level) is ordered but the intervals between its levels are not meaningful. The order of the control levels matters (controlled < partially controlled < uncontrolled), but the numbers assigned to them, and the distances between those numbers, are arbitrary.

To begin, we define our variables:

  • \(i\): the patient index

  • \(k\): the asthma control level, where \(k \in \{1,2,3\}\)

  • \(y^{(i)}\): the asthma control level for patient \(i\), where \(y^{(i)} \in \{1,2,3\}\)

  • \(\theta_k\): the threshold parameter for the \(k^{th}\) control level

  • \(x_n^{(i)}\): the \(n^{th}\) covariate for patient \(i\)

  • \(\beta_n\): the coefficient for the \(n^{th}\) covariate

Then the model is:

\[ P(y^{(i)} \leq k) = \sigma(\theta_k + \sum_{n=1}^{N} \beta_n x_n^{(i)}) \]

where \(\sigma\) is the logistic function:

\[ \sigma(x) = \dfrac{1}{1 + e^{-x}} \]

and the linear predictor is built from age, sex, and their interactions:

\[\sum_{n=1}^{N} \beta_n x_n := \beta_{\text{age}} \cdot \text{age} + \beta_{\text{sex}} \cdot \text{sex} + \beta_{\text{age2}} \cdot \text{age}^2 + \beta_{\text{sexage}} \cdot \text{sex} \cdot \text{age} + \beta_{\text{sexage2}} \cdot \text{sex} \cdot \text{age}^2\]
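For a single patient, the linear predictor above can be evaluated as in the sketch below. The `Coefficients` container and the numbers in the usage example are hypothetical placeholders, not the fitted EBA estimates, and the sketch assumes age enters in years without any rescaling.

```python
from dataclasses import dataclass


@dataclass
class Coefficients:
    """Fixed-effect coefficients; field names follow the subscripts in the formula."""
    age: float
    sex: float
    age2: float
    sexage: float
    sexage2: float


def linear_predictor(age: float, sex: int, beta: Coefficients) -> float:
    """Return sum_n beta_n * x_n for one patient; sex is 0 = female, 1 = male."""
    return (
        beta.age * age
        + beta.sex * sex
        + beta.age2 * age**2
        + beta.sexage * sex * age
        + beta.sexage2 * sex * age**2
    )


# Placeholder coefficients for illustration only (NOT fitted values).
beta = Coefficients(age=0.1, sex=0.2, age2=0.01, sexage=0.3, sexage2=0.001)
print(linear_predictor(10.0, 0, beta))  # female: all sex terms vanish
print(linear_predictor(10.0, 1, beta))  # male: adds the sex and interaction terms
```

Because sex is coded 0 for females, the three sex terms drop out for female patients, so \(\beta_{\text{sex}}\), \(\beta_{\text{sexage}}\), and \(\beta_{\text{sexage2}}\) describe how the male age curve differs from the female one.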

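To obtain the probability that a patient is in a specific control level, we take the difference of consecutive cumulative probabilities:

\[ P(y^{(i)} = k) = P(y^{(i)} \leq k) - P(y^{(i)} \leq k-1) \]

with the conventions \(P(y^{(i)} \leq 0) = 0\) and \(P(y^{(i)} \leq 3) = 1\), so only the two thresholds \(\theta_1 < \theta_2\) need to be estimated.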
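The sketch below turns the cumulative model into the three level probabilities for a given linear predictor. The function names and the threshold values in the example are illustrative assumptions. Note that with this parameterization a larger linear predictor raises \(P(y^{(i)} \leq k)\) and therefore shifts probability toward better control; many ordinal-regression packages instead write the model as \(\sigma(\theta_k - \sum_n \beta_n x_n)\), which flips the sign of the coefficients.

```python
import math
from typing import List


def logistic(x: float) -> float:
    """Numerically stable logistic function sigma(x) = 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def level_probabilities(eta: float, theta1: float, theta2: float) -> List[float]:
    """Return [P(y = 1), P(y = 2), P(y = 3)] for linear predictor eta.

    Uses P(y <= k) = sigma(theta_k + eta), with P(y <= 0) = 0 and P(y <= 3) = 1.
    """
    if not theta1 < theta2:
        raise ValueError("thresholds must satisfy theta1 < theta2")
    cumulative = [0.0, logistic(theta1 + eta), logistic(theta2 + eta), 1.0]
    return [cumulative[k] - cumulative[k - 1] for k in range(1, 4)]


# Placeholder thresholds and linear predictor, for illustration only.
print(level_probabilities(eta=0.0, theta1=-1.0, theta2=1.0))
```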

Random Effects

In our model, we also include a patient-level random intercept to account for the correlation between repeated measurements from the same patient. Because each patient was measured at up to five visits, we expect measurements from the same patient to be more similar to each other than to measurements from different patients. The random intercept is assumed to be normally distributed with mean zero and variance \(\sigma^2\); here \(\sigma\) denotes the random-effect standard deviation, not the logistic function. The model with random effects is:

\[ P(y^{(i)} \leq k) = \sigma(\theta_k + \sum_{n=1}^{N} \beta_n x_n^{(i)} + \beta_0^{(i)}) \]

where \(\beta_0^{(i)}\) is the random effect for patient \(i\).
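To see what the random intercept does, the sketch below evaluates the cumulative probabilities for one patient under several hypothetical values of \(\beta_0^{(i)}\). Because \(\beta_0^{(i)}\) is added inside \(\sigma(\cdot)\) for every threshold, it shifts both cumulative log-odds by the same amount; under this sign convention, a patient with a positive intercept is more likely than average to be controlled. All numeric values and helper names are placeholders for illustration.

```python
import math
from typing import List, Sequence


def logistic(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logit(p: float) -> float:
    """Inverse of the logistic function (log-odds)."""
    return math.log(p / (1.0 - p))


def cumulative_probabilities(eta: float, thetas: Sequence[float], b0: float) -> List[float]:
    """Return P(y <= k) for each threshold, given fixed effects eta and random intercept b0."""
    return [logistic(theta + eta + b0) for theta in thetas]


thetas = (-1.0, 1.0)  # placeholder thresholds theta_1 < theta_2
eta = 0.5  # placeholder fixed-effect linear predictor
for b0 in (-1.0, 0.0, 1.0):
    probs = cumulative_probabilities(eta, thetas, b0)
    print(f"b0 = {b0:+.1f}: P(y <= 1) = {probs[0]:.3f}, P(y <= 2) = {probs[1]:.3f}")
```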

Fitting the Model with EBA Data

The model predicts the probability of being in each control level over a 3-month period. To apply these predictions in our simulation, we make the following assumptions:

  1. We assume that the probability of being in each of the control levels is equivalent to the proportion of time spent in each of the control levels.

  2. We assume that these predictions can be extended from a 3-month period to a 1-year period, the length of one simulation time cycle.

  3. We assume that the probability of being in a control level does not depend on time.

  4. We assume that the probability of being in a control level does not depend on the past history of asthma control.

  5. We assume that the probability of being in a control level does not depend on the past history of exacerbations.

In short, for each virtual individual (agent) labelled as asthmatic, we sampled an individual-specific intercept from the estimated distribution of the random effects. Using that intercept in the asthma control prediction model, we then simulated the proportion of time spent in each control level in each time cycle.
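The per-agent procedure can be sketched as follows, under assumptions 1 to 5 above. This is an illustration, not the simulation's actual code: `simulate_agent`, `example_predictor`, and every numeric value are hypothetical. The intercept is drawn once per agent and reused in every cycle, which is what makes an agent's control persistently better or worse than average; each cycle is otherwise computed independently of past control and exacerbations.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def logistic(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def level_proportions(eta: float, theta1: float, theta2: float) -> Tuple[float, float, float]:
    """Proportions of a cycle spent controlled, partially controlled, uncontrolled.

    Assumption 1: the proportion of time equals the model probability.
    """
    c1 = logistic(theta1 + eta)
    c2 = logistic(theta2 + eta)
    return (c1, c2 - c1, 1.0 - c2)


def simulate_agent(
    ages: Sequence[float],
    sex: int,
    linear_predictor: Callable[[float, int], float],
    theta1: float,
    theta2: float,
    random_effect_sd: float,
    rng: random.Random,
) -> List[Tuple[float, float, float]]:
    """Sample one intercept for the agent, then return the proportions for each yearly cycle."""
    b0 = rng.gauss(0.0, random_effect_sd)  # drawn once, reused in every cycle
    return [level_proportions(linear_predictor(age, sex) + b0, theta1, theta2) for age in ages]


def example_predictor(age: float, sex: int) -> float:
    """Hypothetical linear predictor, for illustration only."""
    return -0.01 * age + 0.1 * sex


rng = random.Random(2024)
history = simulate_agent([40.0, 41.0, 42.0], 0, example_predictor, -1.0, 1.0, 0.8, rng)
for cycle, shares in enumerate(history, start=1):
    print(cycle, [round(s, 3) for s in shares])
```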

Predictions

Once the ordinal regression model has been fit on the EBA dataset, the coefficients are saved to the leap/processed_data/config.json file. During the simulation, these coefficients are used to determine the probability of being in each of the control levels for each agent labelled as asthmatic.

\[ P(y^{(i)} \leq k) = \sigma(\theta_k + \sum_{n=1}^{N} \beta_n x_n^{(i)} + \beta_0^{(i)}) \]

where \(\beta_0^{(i)}\) is assigned to each agent at the beginning of the simulation by sampling from a normal distribution with mean \(\mu = 0\) and standard deviation \(\sigma\) estimated when the model was fit.
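Loading the saved parameters and assigning each new agent its intercept might look like the sketch below. The JSON layout, including the keys `control`, `theta`, the `beta_*` names, and `random_effect_sd`, is a hypothetical stand-in, since the structure of leap/processed_data/config.json is not described here, and the numbers are placeholders rather than fitted estimates. In practice the text would come from the config file itself (for example via `json.load` on an open file handle).

```python
import json
import random

# Hypothetical layout and placeholder values; the real config keys may differ.
EXAMPLE_CONFIG = """
{
  "control": {
    "theta": [-1.0, 1.0],
    "beta_age": 0.0,
    "beta_sex": 0.0,
    "beta_age2": 0.0,
    "beta_sexage": 0.0,
    "beta_sexage2": 0.0,
    "random_effect_sd": 1.0
  }
}
"""


def load_control_parameters(text: str) -> dict:
    """Parse the (hypothetical) asthma-control section and check the thresholds."""
    params = json.loads(text)["control"]
    theta = params["theta"]
    if len(theta) != 2 or not theta[0] < theta[1]:
        raise ValueError("expected two increasing thresholds")
    return params


def assign_random_intercept(params: dict, rng: random.Random) -> float:
    """Draw beta_0 for a new agent from N(0, sd^2), once at the start of the simulation."""
    return rng.gauss(0.0, params["random_effect_sd"])


params = load_control_parameters(EXAMPLE_CONFIG)
rng = random.Random(7)
intercepts = [assign_random_intercept(params, rng) for _ in range(5)]
print(params["theta"], [round(b, 3) for b in intercepts])
```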