Exacerbation Calibration Data¶

The number of exacerbations in a given year is modelled using a Poisson distribution, whose probability mass function is:

\[\begin{split}\begin{align} N_{\text{exacerbations}} &\sim \text{Poisson}(\lambda)\\ P(N_{\text{exacerbations}} = k) &= \dfrac{\lambda^k e^{-\lambda}}{k!} \end{align}\end{split}\]
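As a quick numerical check of the formula above, the following minimal sketch evaluates the Poisson probabilities with the standard library only. The function name `poisson_pmf` and the value `lam = 0.5` are illustrative assumptions and are not part of LEAP.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(N = k) for N ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)


lam = 0.5  # illustrative rate only, not a value taken from the model
probs = [poisson_pmf(k, lam) for k in range(20)]

# The probabilities should sum to approximately 1,
# and the mean of the distribution should be approximately lam.
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
print(f"total={total:.6f}, mean={mean:.6f}")
```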

Here \(\lambda\) is the expected number of exacerbations per year and \(k\) is a non-negative integer count. We obtain \(\lambda\) from a Poisson regression, which models the logarithm of \(\lambda\) as a linear function of the covariates:

\[\begin{align} \ln(\lambda) &= \ln(\alpha) + \beta_0 + \beta_{a} a + \beta_{s} s + \sum_{i=1}^3 \beta_i c_i \end{align}\]

where:

\(\alpha\): calibration multiplier
\(\beta_0\): intercept
\(a\): age
\(\beta_a\): age constant
\(s\): sex
\(\beta_s\): sex constant
\(c_i\): relative time spent in control level \(i\)
\(\beta_i\): control level constant

The exacerbation_data.py script calculates \(\alpha\). Exponentiating both sides of the equation makes the role of \(\alpha\) clearer: it scales the expected number of exacerbations multiplicatively.

\[\begin{align} \lambda &= \alpha \cdot e^{\beta_0} e^{\beta_{a} a} e^{\beta_{s} s} \prod_{i=1}^3 e^{\beta_i c_i} \end{align}\]
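The equivalence of the log-linear and multiplicative forms can be checked numerically. This is a minimal sketch: every coefficient value below is a hypothetical placeholder rather than a fitted LEAP value, and encoding sex as 0 or 1 is an assumption.

```python
import math

# Hypothetical placeholder values, NOT the fitted LEAP coefficients.
alpha = 1.5
beta_0, beta_a, beta_s = -1.0, 0.01, 0.1
beta_c = [0.2, 0.4, 0.6]   # one coefficient per control level
age = 40
sex = 1                    # assumption: sex encoded as 0 or 1
c = [0.5, 0.3, 0.2]        # relative time spent in each control level

# Log-linear form: ln(lambda) = ln(alpha) + beta_0 + beta_a*a + beta_s*s + sum_i beta_i*c_i
log_lam = (
    math.log(alpha)
    + beta_0
    + beta_a * age
    + beta_s * sex
    + sum(b * ci for b, ci in zip(beta_c, c))
)
lam_log_form = math.exp(log_lam)

# Multiplicative form: lambda = alpha * e^beta_0 * e^(beta_a*a) * e^(beta_s*s) * prod_i e^(beta_i*c_i)
lam_product_form = (
    alpha
    * math.exp(beta_0)
    * math.exp(beta_a * age)
    * math.exp(beta_s * sex)
    * math.prod(math.exp(b * ci) for b, ci in zip(beta_c, c))
)
```

Both expressions give the same value up to floating-point error, and doubling `alpha` doubles the rate.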

To obtain \(\alpha\), we first predict the mean number of exacerbations from the asthma control levels alone, again assuming the log-linear form used in a Poisson regression:

\[\begin{align} \ln(\lambda_{C}) &= \sum_{i=1}^3 \gamma_i c_i \end{align}\]

\(\lambda_C\): the average number of exacerbations in a given year
\(c_i\): relative time spent in control level \(i\)
\(\gamma_i\): control level constant (different from \(\beta_i\) above)

Here, the \(\gamma_i\) values were calculated from the Economic Burden of Asthma (EBA) study and are given by:

\[\begin{split}\begin{align} \gamma_1 &:= 0.1880058 & \text{rate(exacerbation | fully controlled)}\\ \gamma_2 &:= 0.3760116 & \text{rate(exacerbation | partially controlled)}\\ \gamma_3 &:= 0.5640174 & \text{rate(exacerbation | uncontrolled)} \end{align}\end{split}\]
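A minimal sketch of this calculation follows. The \(\gamma_i\) values are the ones quoted above; the function name `mean_exacerbations` and the example control proportions are hypothetical and chosen only for illustration.

```python
import math

# Control-level parameters quoted in the text above (from the EBA study).
GAMMA_CONTROL = (0.1880058, 0.3760116, 0.5640174)


def mean_exacerbations(control_levels, gamma=GAMMA_CONTROL):
    """Return lambda_C = exp(sum_i gamma_i * c_i)."""
    if len(control_levels) != len(gamma):
        raise ValueError("expected one control proportion per gamma value")
    return math.exp(sum(g * c for g, c in zip(gamma, control_levels)))


# Hypothetical proportions of time spent fully controlled,
# partially controlled, and uncontrolled.
example_levels = (0.5, 0.3, 0.2)
lambda_c = mean_exacerbations(example_levels)
```

Because the \(\gamma_i\) increase with worsening control, shifting time from the fully controlled level to the uncontrolled level increases \(\lambda_C\).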

The number of exacerbations predicted by the model is then:

\[\begin{align} N_{\text{exac}}^{\text{(pred)}} &= \lambda_C \cdot N_{\text{asthma}} \end{align}\]

\(N_{\text{asthma}}\): the number of people with asthma for a given year, age, and sex

The predicted number of hospitalizations is then:

\[\begin{align} N_{\text{hosp}}^{\text{(pred)}} &= N_{\text{exac}}^{\text{(pred)}} \cdot P(\text{hosp}) \end{align}\]

\(N_{\text{hosp}}^{\text{(pred)}}\): the predicted number of hospitalizations for a given year, age, and sex
\(P(\text{hosp})\): the probability of hospitalization due to asthma given the patient has an asthma exacerbation
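The two prediction steps can be sketched as below. The function name is hypothetical; the default `prob_hosp = 0.026` matches the default documented for `exacerbation_calibrator`, and the example inputs are made up.

```python
def predicted_hospitalizations(
    lambda_c: float, n_asthma: float, prob_hosp: float = 0.026
) -> float:
    """Return the predicted number of hospitalizations for one year/age/sex group."""
    # Predicted exacerbations: mean rate per person times the number of people.
    n_exac_pred = lambda_c * n_asthma
    # Predicted hospitalizations: the fraction of exacerbations requiring hospital care.
    return n_exac_pred * prob_hosp


# Hypothetical inputs: 1.5 exacerbations per person per year, 1000 people.
n_hosp_pred = predicted_hospitalizations(lambda_c=1.5, n_asthma=1000)
```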

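Finally, \(\alpha\) is computed as the ratio of the observed number of hospitalizations, \(N_{\text{hosp}}\), to the predicted number, for each age \(a\), sex \(s\), and year \(y\):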

\[\begin{align} \alpha(a, s, y) &= \dfrac{N_{\text{hosp}}(a, s, y)}{N_{\text{hosp}}^{\text{(pred)}}(a, s, y)} \end{align}\]
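The ratio itself is a one-line computation. In the sketch below the function name is hypothetical, and returning NaN when the predicted count is not positive is an assumption made for the example, not documented LEAP behaviour.

```python
import math


def calibration_multiplier(n_hosp_observed: float, n_hosp_predicted: float) -> float:
    """Return alpha = observed / predicted hospitalizations for one group."""
    # Assumption: an undefined ratio is reported as NaN instead of raising an error.
    if n_hosp_predicted <= 0:
        return math.nan
    return n_hosp_observed / n_hosp_predicted


# Hypothetical counts: 50 observed versus 40 predicted hospitalizations.
alpha = calibration_multiplier(50, 40)
```

A value above 1 means the uncalibrated model under-predicts hospitalizations for that group, and a value below 1 means it over-predicts.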

To generate the exacerbation calibration data, run:

cd LEAP
python3 leap/data_generation/exacerbation_data.py

leap.data_generation.exacerbation_data module¶

leap.data_generation.exacerbation_data.exacerbation_prediction(sex: str, age: int, gamma_control: list[float] | None = None)[source]¶

Calculate the mean number of exacerbations for a given age and sex.

\[\ln(\lambda_{C}) = \sum_{i=1}^3 \gamma_i c_i\]

where:

\(\lambda_{C}\) is the predicted average number of asthma exacerbations per year.
\(\gamma_i\) is the control parameter.
\(c_i\) is the relative time spent in control level \(i\).

Here the \(\gamma_i\) values were calculated from the Economic Burden of Asthma (EBA) study and are given by:

\begin{align*} \gamma_1 &:= 0.1880058 & \text{rate(exacerbation | fully controlled)}\\ \gamma_2 &:= 0.3760116 & \text{rate(exacerbation | partially controlled)}\\ \gamma_3 &:= 0.5640174 & \text{rate(exacerbation | uncontrolled)} \end{align*}

Parameters:¶

sex: str¶: One of “M” or “F”.
age: int¶: Integer age, a value in [3, 90].
gamma_control: list[float] | None = None¶: A list of three floats, the control parameters.

Returns:¶

The predicted number of exacerbations per year per person with asthma.

leap.data_generation.exacerbation_data.parse_sex(x: str) → str | float[source]¶

Reformat a string containing sex information.

Parameters:¶

x: str¶: A string containing sex information. For example, Female, Male, M, or F.

Returns:¶

Either M or F, or np.nan if the string does not contain sex information.
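The following is a hypothetical re-implementation that illustrates the documented behaviour; it is not the library's source. It assumes that labels such as `F_90` carry the sex before the underscore, and it returns `math.nan` (a float NaN, like `np.nan`) to stay free of third-party imports.

```python
import math


def parse_sex_sketch(x: str):
    """Return "M" or "F", or NaN if the string holds no sex information."""
    # Assumption: in labels such as "F_90" the sex precedes the underscore.
    token = x.strip().split("_")[0].lower()
    if token in {"f", "female"}:
        return "F"
    if token in {"m", "male"}:
        return "M"
    return math.nan
```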

leap.data_generation.exacerbation_data.parse_age(x: str) → int | float[source]¶

Reformat a string containing age information.

Parameters:¶

x: str¶: A string containing age information. If the string is in the format {sex}_{age}, we parse the integer age. For example, F_90 or M_1.

Returns:¶

The integer age, or np.nan if the string does not contain age information.
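A hypothetical sketch of this parsing rule, using a regular expression for the `{sex}_{age}` format, is shown below. The function name and pattern are assumptions for illustration, and `math.nan` stands in for `np.nan`.

```python
import math
import re

# One or more letters, an underscore, then the integer age, e.g. "F_90".
_SEX_AGE_PATTERN = re.compile(r"[A-Za-z]+_(\d+)")


def parse_age_sketch(x: str):
    """Return the integer age from a "{sex}_{age}" label, or NaN otherwise."""
    match = _SEX_AGE_PATTERN.fullmatch(x.strip())
    if match is None:
        return math.nan
    return int(match.group(1))
```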

leap.data_generation.exacerbation_data.load_hospitalization_data(province: str = 'CA', starting_year: int = 2000, min_age: int = 3) → pandas.core.frame.DataFrame[source]¶

Load the hospitalization data for the given province and starting year.

The data is from the Hospital Morbidity Database (HMDB) from the Canadian Institute for Health Information (CIHI).

The hospitalization data was collected from patients presenting to a hospital in Canada due to an asthma exacerbation. We will use this data to calibrate the exacerbation model.

Parameters:¶

province: str = 'CA'¶: The province for which to load the hospitalization data.
starting_year: int = 2000¶: The starting year for which to load the hospitalization data.
min_age: int = 3¶: The minimum age to be used in the data. We assume that asthma diagnoses are made at age 3 and older, so the default is 3.

Returns:¶

The hospitalization data for the given province and starting year. Columns:

year: The year of the data.
sex: One of M = male, F = female.
age: Integer age, a value in [3, 90].
hospitalization_rate: The observed number of hospitalizations per 100 000 people for a given year, age, and sex.
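Because `hospitalization_rate` is expressed per 100 000 people, comparing it with a count requires a unit conversion. The sketch below shows only that arithmetic; the function name and example values are hypothetical, and whether LEAP performs the conversion in exactly this way is an assumption.

```python
def hospitalizations_from_rate(rate_per_100k: float, population: float) -> float:
    """Convert a rate per 100 000 people into an expected count."""
    return rate_per_100k * population / 100_000


# Hypothetical example: 50 hospitalizations per 100 000 in a group of 200 000 people.
n_hosp = hospitalizations_from_rate(50.0, 200_000)
```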

leap.data_generation.exacerbation_data.load_population_data(province: str, starting_year: int, projection_scenario: str, max_year: int, min_age: int = 3, max_age: int = 90) → pandas.core.frame.DataFrame[source]¶

Load the population data for the given province, starting year, and projection scenario.

The population data was generated by the leap/data_generation/birth_data.py script.

Parameters:¶

province: str¶: The 2-letter abbreviation for the province.
starting_year: int¶: The starting year for the population data.
projection_scenario: str¶: The projection scenario for the population data.
max_year: int¶: The maximum year for the population data.
min_age: int = 3¶: The minimum age for the population data.
max_age: int = 90¶: The maximum age for the population data.

Returns:¶

A dataframe containing the Canadian population data. Columns:

year: The year of the data.
age: A value in [min_age, max_age].
province: The 2-letter province abbreviation.
sex: One of M = male, F = female.
n: The number of people in a given year, age, sex, province, and projection scenario.

leap.data_generation.exacerbation_data.exacerbation_calibrator(province: str = 'CA', starting_year: int = 2000, max_year: int = 2065, min_age: int = 3, max_age: int = 90, prob_hosp: float = 0.026, projection_scenario: str = 'M3') → pandas.core.frame.DataFrame[source]¶

Compute the ratio between the observed and predicted hospitalization rates.

Parameters:¶

province: str = 'CA'¶

The 2-letter abbreviation for the province.

starting_year: int = 2000¶

The starting year for the calibration.

max_year: int = 2065¶

The maximum year for the calibration.

min_age: int = 3¶

The minimum age for the calibration.

max_age: int = 90¶

The maximum age for the calibration.

prob_hosp: float = 0.026¶

The probability of a very severe exacerbation, defined as an exacerbation that requires hospitalization.

projection_scenario: str = 'M3'¶

The projection scenario for the population data. One of:

LG: low-growth projection
HG: high-growth projection
M1: medium-growth 1 projection
M2: medium-growth 2 projection
M3: medium-growth 3 projection
M4: medium-growth 4 projection
M5: medium-growth 5 projection
M6: medium-growth 6 projection
FA: fast-aging projection
SA: slow-aging projection

See: StatCan Projection Scenarios.
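When calling this function from a script, it can help to validate the scenario code up front. The lookup table below restates the list above; the names `PROJECTION_SCENARIOS` and `check_projection_scenario` are hypothetical helpers, not part of the LEAP API.

```python
# Scenario codes and descriptions as listed in the documentation above.
PROJECTION_SCENARIOS = {
    "LG": "low-growth projection",
    "HG": "high-growth projection",
    "M1": "medium-growth 1 projection",
    "M2": "medium-growth 2 projection",
    "M3": "medium-growth 3 projection",
    "M4": "medium-growth 4 projection",
    "M5": "medium-growth 5 projection",
    "M6": "medium-growth 6 projection",
    "FA": "fast-aging projection",
    "SA": "slow-aging projection",
}


def check_projection_scenario(code: str) -> str:
    """Return the code unchanged if it is a known scenario, else raise ValueError."""
    if code not in PROJECTION_SCENARIOS:
        valid = ", ".join(sorted(PROJECTION_SCENARIOS))
        raise ValueError(f"unknown projection scenario {code!r}; expected one of: {valid}")
    return code
```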

Returns:¶

A dataframe with the following columns:

year: The year of the data, a value in [starting_year, max_year].
age: The integer age, a value in [min_age, max_age].
sex: One of M or F.
calibrator_multiplier: The ratio between the observed and predicted number of hospitalizations.

leap.data_generation.exacerbation_data.generate_exacerbation_calibration_data()[source]¶: Generate the exacerbation calibration data for all provinces.