Asthma Control Model

Data

Raw Data

EBA was a prospective, representative observational study of 618 participants aged 1-85 years (74% were at least 18 years old) with self-reported, physician-diagnosed asthma from BC. Measurements were taken every 3 months for a year. Among the 613 patients retained for analysis (see the exclusions below), only 6% were lost during the one-year follow-up. There were at least 500 cases for each asthma control level. As expected, the cohort included more females than males, since adult asthma is more prevalent among females. The asthma control level changed during follow-up for 79% of the patients.

Following the 2020 GINA guidelines, we defined the asthma control level from the sum of four indicator variables (1 if yes, 0 if no), each referring to the 3 months before a measurement:

daytime symptoms
nocturnal symptoms
inhaler use
limited activities

If the sum is zero, the asthma control level is controlled. If it is 1 or 2, the level is partially-controlled. Otherwise (a sum of 3 or 4), it is uncontrolled. Responses of do not know to an indicator variable were treated as a no. In this analysis, we considered neither treatment nor whether a patient experienced an exacerbation in the 3 months before the visit. We excluded two patients whose asthma diagnosis dates preceded their birth dates and three patients who had no asthma diagnosis date, for a final count of 613 patients.

Column	Type	Description
`studyId`	`int`	8-digit patient ID
`visit`	`int`	The visit number. Visits were scheduled every 3 months for a year. A value in `[1, 5]`.
`daytimeSymptoms`	`int`	1 = yes, 2 = no
`nocturnalSymptoms`	`int`	1 = yes, 2 = no
`inhalerUse`	`int`	1 = yes, 2 = no
`limitedActivities`	`int`	1 = yes, 2 = no
`exacerbations`	`int`	TODO
`sex`	`int`	0 = female, 1 = male
`age`	`float`	Age in years
`ageAtAsthmaDx`	`float`	Age at asthma diagnosis
`time_since_Dx`	`float`	Time since asthma diagnosis in years
`time_since_Dx_cat`	`int`	1 = TODO, 2 = TODO, 3 = TODO

Processed Data

In keeping with Python conventions, the columns were converted to snake case. In addition, studyId was renamed to patient_id, as studyId indicates that the ID is for a given study, when in fact the ID was for an individual patient.

The variables daytimeSymptoms, nocturnalSymptoms, inhalerUse, and limitedActivities were recoded from the raw 1 = yes, 2 = no coding to binary variables, where 1 = True and 0 = False.
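The renaming and recoding steps can be sketched in plain Python. This is a minimal illustration, not the project's actual processing code: the helper names `to_snake_case` and `process_row` are hypothetical, and mapping every value other than 1 to 0 is an assumption that folds the "do not know" responses into a no, as described for the raw data.

```python
import re

# Raw indicator columns, as named in the raw data table.
INDICATORS = ("daytimeSymptoms", "nocturnalSymptoms", "inhalerUse", "limitedActivities")


def to_snake_case(name: str) -> str:
    """Convert a camelCase column name to snake_case."""
    # Insert an underscore wherever a lowercase letter or digit is followed
    # by an uppercase letter, then lowercase the whole name.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def process_row(raw: dict) -> dict:
    """Rename the columns of one raw record and recode its indicators to 1/0."""
    processed = {}
    for column, value in raw.items():
        if column in INDICATORS:
            # Raw coding: 1 = yes, 2 = no. Assumption: any value other than 1
            # (including a "do not know" code) is treated as a no.
            value = 1 if value == 1 else 0
        new_name = "patient_id" if column == "studyId" else to_snake_case(column)
        processed[new_name] = value
    return processed
```

For example, `to_snake_case("ageAtAsthmaDx")` gives `age_at_asthma_dx`, and a name that is already partly snake case, such as `time_since_Dx`, is simply lowercased.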

We also needed to compute the asthma control level from the four indicator variables. We first computed the control_score, defined as:

\[\text{control_score} = \text{daytime_symptoms} + \text{nocturnal_symptoms} + \text{inhaler_use} + \text{limited_activities}\]

which has a minimum value of 0 (maximum control) and a maximum value of 4 (minimum control).

Then we defined the asthma control level as follows:

\[\begin{split}\text{control_level} = \begin{cases} 1 & \text{control_score} = 0 \\ 2 & 0 ~ < \text{control_score} < 3 \\ 3 & \text{control_score} \geq 3 \end{cases}\end{split}\]
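The two definitions above translate directly into code. The following is a minimal sketch; the function names mirror the column names, but the functions themselves are illustrative rather than taken from the project.

```python
def control_score(daytime_symptoms: int, nocturnal_symptoms: int,
                  inhaler_use: int, limited_activities: int) -> int:
    """Sum of the four binary indicators: 0 (maximum control) to 4 (minimum control)."""
    return daytime_symptoms + nocturnal_symptoms + inhaler_use + limited_activities


def control_level(score: int) -> int:
    """Map a control score to 1 (controlled), 2 (partially-controlled), or 3 (uncontrolled)."""
    if not 0 <= score <= 4:
        raise ValueError(f"control_score must be in [0, 4], got {score}")
    if score == 0:
        return 1
    if score < 3:
        return 2
    return 3
```

A patient reporting only daytime symptoms and inhaler use has a score of 2 and is therefore partially-controlled (level 2).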

Column	Type	Description
`patient_id`	`int`	8-digit patient ID
`visit`	`int`	The visit number. Visits were scheduled every 3 months for a year. A value in `[1, 5]`.
`daytime_symptoms`	`int`	`1 = True`, `0 = False`
`nocturnal_symptoms`	`int`	`1 = True`, `0 = False`
`inhaler_use`	`int`	`1 = True`, `0 = False`
`limited_activities`	`int`	`1 = True`, `0 = False`
`exacerbations`	`int`	TODO
`sex`	`int`	0 = female, 1 = male
`age`	`float`	Age in years
`age_at_asthma_dx`	`float`	Age at asthma diagnosis
`time_since_dx`	`float`	Time since asthma diagnosis in years
`time_since_dx_cat`	`int`	1 = TODO, 2 = TODO, 3 = TODO
`control_score`	`int`	0 = maximum control, 4 = minimum control
`control_level`	`int`	Asthma control level: 1 = fully-controlled, 2 = partially-controlled, 3 = uncontrolled

Model

Our goal is to fit a model for generating the proportion of time that an individual labelled as asthmatic spends in each control level.

Ordinal Regression

Ordinal regression is a type of regression analysis used when the response variable is ordered but the intervals between its levels are arbitrary. In our case, the order of the control levels matters (controlled < partially-controlled < uncontrolled), but the numbers assigned to them, and the distances between those numbers, carry no meaning.

To begin, we define our variables:

\(i\): the patient index
\(k\): the asthma control level, where \(k \in \{1,2,3\}\)
\(y^{(i)}\): the asthma control level for patient \(i\), where \(y^{(i)} \in \{1,2,3\}\)
\(\theta_k\): the threshold parameter for the \(k^{th}\) control level; only \(\theta_1\) and \(\theta_2\) need to be estimated, since \(P(y^{(i)} \leq 3) = 1\)
\(x_n^{(i)}\): the \(n^{th}\) covariate for patient \(i\)
\(\beta_n\): the coefficient for the \(n^{th}\) covariate

Then the model is:

\[\begin{align} P(y^{(i)} \leq k) = \sigma(\theta_k + \sum_{n=1}^{N} \beta_n x_n^{(i)}) \end{align}\]

where \(\sigma\) is the logistic function:

\[\begin{align} \sigma(x) = \dfrac{1}{1 + e^{-x}} \end{align}\]

and the linear predictor is built from age and sex, with a quadratic age term and sex-by-age interactions:

\[\sum_{n=1}^{N} \beta_n x_n := \beta_{\text{age}} \cdot \text{age} + \beta_{\text{sex}} \cdot \text{sex} + \beta_{\text{age2}} \cdot \text{age}^2 + \beta_{\text{sexage}} \cdot \text{sex} \cdot \text{age} + \beta_{\text{sexage2}} \cdot \text{sex} \cdot \text{age}^2\]

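To obtain the probability that a patient is in a specific control level, we take differences of the cumulative probabilities, with the conventions \(P(y^{(i)} \leq 0) = 0\) and \(P(y^{(i)} \leq 3) = 1\):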

\[\begin{align} P(y^{(i)} = k) = P(y^{(i)} \leq k) - P(y^{(i)} \leq k-1) \end{align}\]
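As a rough sketch of how these formulas combine, the code below computes the three category probabilities from a pair of thresholds and a linear predictor. The function names and the dictionary keys for the coefficients are assumptions chosen to match the subscripts in the formula, and any numeric values passed in are purely illustrative, not fitted estimates.

```python
import math


def logistic(x: float) -> float:
    """Numerically stable logistic function sigma(x) = 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def linear_predictor(age: float, sex: int, beta: dict) -> float:
    """Sum of beta_n * x_n for the age, sex, age^2 and interaction terms."""
    return (beta["age"] * age
            + beta["sex"] * sex
            + beta["age2"] * age ** 2
            + beta["sexage"] * sex * age
            + beta["sexage2"] * sex * age ** 2)


def control_probabilities(theta: tuple, eta: float) -> list:
    """Return [P(y=1), P(y=2), P(y=3)] given thresholds (theta_1, theta_2).

    Uses P(y <= k) = sigma(theta_k + eta), with P(y <= 0) = 0 and P(y <= 3) = 1.
    """
    cumulative = [0.0] + [logistic(t + eta) for t in theta] + [1.0]
    return [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
```

The thresholds must satisfy \(\theta_1 \leq \theta_2\) for every probability to be non-negative. With this sign convention, a larger linear predictor shifts probability toward the controlled level.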

Random Effects

In our model, we also include a random effect to account for the correlation between measurements from the same patient. This is important because the measurements are taken repeatedly over time, and we expect measurements from the same patient to be more similar to each other than to measurements from different patients. The random effect is assumed to be normally distributed with mean zero and variance \(\sigma^2\) (here \(\sigma\) denotes a standard deviation, not the logistic function \(\sigma(\cdot)\)). The model with random effects is:

\[\begin{align} P(y^{(i)} \leq k) = \sigma(\theta_k + \sum_{n=1}^{N} \beta_n x_n^{(i)} + \beta_0^{(i)}) \end{align}\]

where \(\beta_0^{(i)}\) is the random effect for patient \(i\).

Fitting the Model with EBA Data

The predictions from this model are the probabilities of being in each of the control levels during a 3-month period. To apply these predictions to our simulation, we make the following assumptions:

We assume that the probability of being in each of the control levels is equivalent to the proportion of time spent in each of the control levels.
We assume that we may extend these predictions from a 3-month period to a 1-year period (this is the time cycle of the simulation).
We assume that the probability of being in a control level does not depend on time.
We assume that the probability of being in a control level does not depend on the past history of asthma control.
We assume that the probability of being in a control level does not depend on the past history of exacerbations.

In short, for each virtual individual (agent) labelled as asthmatic, we sample an individual-specific intercept from the estimated distribution of the random effects. With that intercept in the asthma control prediction model, we simulate the proportion of time spent in each of the control levels in each time cycle.

Predictions

Once the ordinal regression model has been fit on the EBA dataset, the coefficients are saved to the leap/processed_data/config.json file. During the simulation, these coefficients are used to determine the probability of being in each of the control levels for each agent labelled as asthmatic.

\[\begin{align} P(y^{(i)} \leq k) = \sigma(\theta_k + \sum_{n=1}^{N} \beta_n x_n^{(i)} + \beta_0^{(i)}) \end{align}\]

where \(\beta_0^{(i)}\) is assigned to each agent at the beginning of the simulation, sampled randomly from a normal distribution with mean \(\mu = 0\) and the standard deviation \(\sigma\) estimated when the model was fit (not to be confused with the logistic function \(\sigma(\cdot)\)).
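The per-agent step can be sketched as follows. This is an illustration under stated assumptions rather than the simulation's actual code: the function names are hypothetical, and the thresholds, linear predictor and standard deviation would in practice be read from the saved coefficients instead of being supplied by hand.

```python
import math
import random


def logistic(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_random_intercept(sd: float, rng: random.Random) -> float:
    """Draw an agent-specific intercept beta_0 from Normal(0, sd^2)."""
    return rng.gauss(0.0, sd)


def agent_control_proportions(theta: tuple, eta: float, intercept: float) -> tuple:
    """Proportion of a time cycle spent in levels 1, 2 and 3 for one agent.

    theta is (theta_1, theta_2); eta is the fixed-effects linear predictor;
    intercept is the agent's random effect, drawn once at the start.
    """
    shifted = eta + intercept
    p_le_1 = logistic(theta[0] + shifted)   # P(y <= 1)
    p_le_2 = logistic(theta[1] + shifted)   # P(y <= 2)
    return (p_le_1, p_le_2 - p_le_1, 1.0 - p_le_2)
```

The intercept is drawn once per agent and reused in every time cycle, so an agent keeps the same tendency toward better or worse control while age-dependent terms in the linear predictor change from cycle to cycle.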