Exacerbation Calibration Data

The number of exacerbations in a given year is modelled using a Poisson distribution. The formula is:

\[\begin{align} N_{\text{exacerbations}} &\sim \text{Poisson}(\lambda), & P(N_{\text{exacerbations}} = k) &= \dfrac{\lambda^k e^{-\lambda}}{k!} \end{align}\]
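As a quick numerical check of the Poisson probability formula, the following sketch evaluates it with the standard library only. The mean of 1.2 exacerbations per year is a made-up value for illustration, not a LEAP parameter, and `poisson_pmf` is an illustrative name.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k events when the mean is lam."""
    if k < 0:
        return 0.0
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Hypothetical mean number of exacerbations per year (illustration only).
lam = 1.2

# Probabilities for k = 0..49; the tail beyond that is negligible for this mean.
probabilities = [poisson_pmf(k, lam) for k in range(50)]

# The probabilities should sum to (almost exactly) 1.
print(sum(probabilities))
```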

Here \(k\) is a possible number of exacerbations and \(\lambda\) is the expected number of exacerbations per year. To obtain \(\lambda\), we fit a Poisson regression, which assumes that \(\ln(\lambda)\) is a linear function of the covariates:

\[\begin{align} \ln(\lambda) &= \ln(\alpha) + \beta_0 + \beta_{a} a + \beta_{s} s + \sum_{i=1}^3 \beta_i c_i \end{align}\]

where:

  • \(\alpha\): calibration multiplier

  • \(\beta_0\): intercept

  • \(a\): age

  • \(\beta_a\): age constant

  • \(s\): sex

  • \(\beta_s\): sex constant

  • \(c_i\): relative time spent in control level \(i\)

  • \(\beta_i\): control level constant

The exacerbation_data.py file calculates \(\alpha\). Exponentiating both sides of the equation makes the meaning of \(\alpha\) more apparent, since it becomes a multiplicative factor on the rate:

\[\begin{align} \lambda &= \alpha \cdot e^{\beta_0} e^{\beta_{a} a} e^{\beta_{s} s} \prod_{i=1}^3 e^{\beta_i c_i} \end{align}\]
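The equivalence of the log-linear and multiplicative forms can be checked numerically. In this sketch every coefficient value, the numeric 0/1 encoding of sex, and the function names are assumptions made for illustration; none of them come from the LEAP model.

```python
import math

# Hypothetical regression coefficients (illustration only).
BETA_0 = -1.0
BETA_AGE = 0.01
BETA_SEX = 0.2
BETA_CONTROL = (0.1, 0.3, 0.6)


def rate_log_form(alpha, age, sex, control):
    """Evaluate lambda by exponentiating the log-linear predictor."""
    log_rate = (
        math.log(alpha)
        + BETA_0
        + BETA_AGE * age
        + BETA_SEX * sex
        + sum(b * c for b, c in zip(BETA_CONTROL, control))
    )
    return math.exp(log_rate)


def rate_product_form(alpha, age, sex, control):
    """Evaluate lambda as alpha times a product of exponential factors."""
    rate = alpha * math.exp(BETA_0) * math.exp(BETA_AGE * age) * math.exp(BETA_SEX * sex)
    for b, c in zip(BETA_CONTROL, control):
        rate *= math.exp(b * c)
    return rate


# Hypothetical proportions of time spent in each control level.
control = (0.3, 0.5, 0.2)
print(rate_log_form(1.5, 40, 1, control))
print(rate_product_form(1.5, 40, 1, control))
```

Both functions return the same value up to floating-point rounding, and doubling `alpha` doubles the rate, which is what makes \(\alpha\) a calibration multiplier.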

How do we obtain \(\alpha\)? We again assume that the mean has the same log-linear form as in a Poisson regression, this time depending only on asthma control:

\[\begin{align} \ln(\lambda_{C}) &= \sum_{i=1}^3 \gamma_i c_i \end{align}\]

where:

  • \(\lambda_C\): the average number of exacerbations in a given year

  • \(c_i\): relative time spent in control level \(i\)

  • \(\gamma_i\): control level constant (different from \(\beta_i\) above)

Here, the \(\gamma_i\) values were calculated from the Economic Burden of Asthma (EBA) study and are given by:

\[\begin{split}\begin{align} \gamma_1 &:= 0.1880058 & \text{rate(exacerbation | fully controlled)}\\ \gamma_2 &:= 0.3760116 & \text{rate(exacerbation | partially controlled)}\\ \gamma_3 &:= 0.5640174 & \text{rate(exacerbation | uncontrolled)} \end{align}\end{split}\]
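For a concrete sense of scale, the sketch below evaluates \(\lambda_C\) from the \(\gamma_i\) values listed above. The control-level proportions are hypothetical, and `mean_exacerbations` is an illustrative name rather than a LEAP function.

```python
import math

# Control-level coefficients from the EBA study, as listed above.
GAMMA = (0.1880058, 0.3760116, 0.5640174)


def mean_exacerbations(control, gamma=GAMMA):
    """Return lambda_C = exp(sum_i gamma_i * c_i)."""
    return math.exp(sum(g * c for g, c in zip(gamma, control)))


# Hypothetical split of time: 30% fully controlled, 50% partially
# controlled, 20% uncontrolled.
print(mean_exacerbations((0.3, 0.5, 0.2)))

# A person who is fully controlled all year: lambda_C = exp(gamma_1).
print(mean_exacerbations((1.0, 0.0, 0.0)))
```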

The number of exacerbations predicted by the model is then:

\[\begin{align} N_{\text{exac}}^{\text{(pred)}} &= \lambda_C \cdot N_{\text{asthma}} \end{align}\]
  • \(N_{\text{asthma}}\): the number of people with asthma in a given year, age, and sex

and the predicted number of hospitalizations is:

\[\begin{align} N_{\text{hosp}}^{\text{(pred)}} &= N_{\text{exac}}^{\text{(pred)}} \cdot P(\text{hosp}) \end{align}\]
  • \(N_{\text{hosp}}^{\text{(pred)}}\): the predicted number of hospitalizations for a given year, age, and sex

  • \(P(\text{hosp})\): the probability of hospitalization due to asthma given the patient has an asthma exacerbation

Finally, \(\alpha\) is computed as the ratio of the observed number of hospitalizations, \(N_{\text{hosp}}\), to the predicted number:

\[\begin{align} \alpha(a, s, y) &= \dfrac{N_{\text{hosp}}(a, s, y)}{N_{\text{hosp}}^{\text{(pred)}}(a, s, y)} \end{align}\]
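The chain from \(\lambda_C\) to \(\alpha\) can be traced with a small worked example. This is a sketch under stated assumptions: the input numbers are invented, `calibration_multiplier` is an illustrative name, and only the default hospitalization probability of 0.026 is taken from the `exacerbation_calibrator` signature documented on this page.

```python
def calibration_multiplier(n_hosp_observed, lambda_c, n_asthma, prob_hosp=0.026):
    """Return alpha = observed hospitalizations / predicted hospitalizations."""
    # Predicted exacerbations: mean rate per person times people with asthma.
    n_exac_pred = lambda_c * n_asthma
    # Predicted hospitalizations: exacerbations times P(hosp | exacerbation).
    n_hosp_pred = n_exac_pred * prob_hosp
    return n_hosp_observed / n_hosp_pred


# Hypothetical inputs for one (age, sex, year) cell.
alpha = calibration_multiplier(
    n_hosp_observed=260,  # invented observed count
    lambda_c=1.25,        # invented mean exacerbations per person per year
    n_asthma=10_000,      # invented number of people with asthma
)
print(alpha)  # about 0.8: the model over-predicts, so alpha scales the rate down
```

A value of \(\alpha\) below 1 means the uncalibrated model predicts more hospitalizations than were observed; a value above 1 means it predicts fewer.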

To generate the exacerbation calibration data, run:

cd LEAP
python3 leap/data_generation/exacerbation_data.py

leap.data_generation.exacerbation_data module

leap.data_generation.exacerbation_data.exacerbation_prediction(sex: str, age: int, gamma_control: list[float] | None = None)

Calculate the mean number of exacerbations for a given age and sex.

\[\ln(\lambda_{C}) = \sum_{i=1}^3 \gamma_i c_i\]

where:

  • \(\lambda_{C}\) is the predicted average number of asthma exacerbations per year.

  • \(\gamma_i\) is the control parameter.

  • \(c_i\) is the relative time spent in control level \(i\).

Here the \(\gamma_i\) values were calculated from the Economic Burden of Asthma (EBA) study and are given by:

\[\begin{split}\begin{align} \gamma_1 &:= 0.1880058 & \text{rate(exacerbation | fully controlled)}\\ \gamma_2 &:= 0.3760116 & \text{rate(exacerbation | partially controlled)}\\ \gamma_3 &:= 0.5640174 & \text{rate(exacerbation | uncontrolled)} \end{align}\end{split}\]

Parameters:
sex: str

One of “M” or “F”.

age: int

Integer age, a value in [3, 90].

gamma_control: list[float] | None = None

A list of three floats, the control parameters \(\gamma_1\), \(\gamma_2\), and \(\gamma_3\).

Returns:

The predicted number of exacerbations per year per person with asthma.

leap.data_generation.exacerbation_data.parse_sex(x: str) → str | float

Reformat a string containing sex information.

Parameters:
x: str

A string containing sex information. For example, Female, Male, M, or F.

Returns:

Either M or F, or np.nan if the string does not contain sex information.
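The documented behaviour can be illustrated with a standalone sketch. This is not the LEAP implementation: `parse_sex_sketch` is a hypothetical name, case-insensitive matching is an assumption, only the four example labels above are handled, and `math.nan` stands in for `np.nan` to keep the example dependency-free.

```python
import math


def parse_sex_sketch(x: str):
    """Map a sex label to "M" or "F", or return NaN if it is not recognized."""
    label = x.strip().upper()
    if label in ("F", "FEMALE"):
        return "F"
    if label in ("M", "MALE"):
        return "M"
    # Not a recognized sex label (assumption: anything else is missing data).
    return math.nan


print(parse_sex_sketch("Female"))
print(parse_sex_sketch("M"))
print(parse_sex_sketch("year"))
```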

leap.data_generation.exacerbation_data.parse_age(x: str) → int | float

Reformat a string containing age information.

Parameters:
x: str

A string containing age information. If the string is in the format {sex}_{age}, we parse the integer age. For example, F_90 or M_1.

Returns:

The integer age, or np.nan if the string does not contain age information.
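A standalone sketch of the `{sex}_{age}` parsing described above follows. It is not the LEAP implementation: `parse_age_sketch` is a hypothetical name, the regular expression is an assumption about what counts as a valid label, and `math.nan` stands in for `np.nan` to avoid a NumPy dependency.

```python
import math
import re

# Assumed label shape: letters, an underscore, then digits (e.g. "F_90").
_SEX_AGE_PATTERN = re.compile(r"^[A-Za-z]+_(\d+)$")


def parse_age_sketch(x: str):
    """Extract the integer age from a "{sex}_{age}" label, or return NaN."""
    match = _SEX_AGE_PATTERN.match(x.strip())
    if match is None:
        return math.nan
    return int(match.group(1))


print(parse_age_sketch("F_90"))
print(parse_age_sketch("M_1"))
print(parse_age_sketch("year"))
```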

leap.data_generation.exacerbation_data.load_hospitalization_data(province: str = 'CA', starting_year: int = 2000, min_age: int = 3) → pandas.core.frame.DataFrame

Load the hospitalization data for the given province and starting year.

The data is from the Hospital Morbidity Database (HMDB) from the Canadian Institute for Health Information (CIHI).

The hospitalization data was collected from patients presenting to a hospital in Canada due to an asthma exacerbation. We will use this data to calibrate the exacerbation model.

Parameters:
province: str = 'CA'

The province for which to load the hospitalization data.

starting_year: int = 2000

The starting year for which to load the hospitalization data.

min_age: int = 3

The minimum age to be used in the data. We assume that asthma diagnoses are made at age 3 and older, so the default is 3.

Returns:

The hospitalization data for the given province and starting year. Columns:

  • year: The year of the data.

  • sex: One of M = male, F = female.

  • age: Integer age, a value in [3, 90].

  • hospitalization_rate: The observed number of hospitalizations per 100 000 people for a given year, age, and sex.
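Because hospitalization_rate is expressed per 100 000 people, it has to be combined with a population count before it can be compared with a predicted number of hospitalizations. The sketch below shows one way this conversion could be done; the records, the dictionary layout, and the function name are all hypothetical, and the actual calibration code may proceed differently.

```python
# Hypothetical records keyed by (year, sex, age); values are illustrative only.
hospitalization_rate = {
    (2001, "F", 10): 50.0,  # hospitalizations per 100 000 people
    (2001, "M", 10): 80.0,
}
population = {
    (2001, "F", 10): 200_000,  # number of people in the cell
    (2001, "M", 10): 210_000,
}


def observed_hospitalizations(rate_per_100k: float, n: int) -> float:
    """Convert a rate per 100 000 people into a count for a population of n."""
    return rate_per_100k * n / 100_000


# Observed hospitalization counts for each (year, sex, age) cell.
n_hosp = {
    key: observed_hospitalizations(rate, population[key])
    for key, rate in hospitalization_rate.items()
}
print(n_hosp)
```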

leap.data_generation.exacerbation_data.load_population_data(province: str, starting_year: int, projection_scenario: str, max_year: int, min_age: int = 3, max_age: int = 90) → pandas.core.frame.DataFrame

Load the population data for the given province, starting year, and projection scenario.

The population data was generated by the leap/data_generation/birth_data.py script.

Parameters:
province: str

The 2-letter abbreviation for the province.

starting_year: int

The starting year for the population data.

projection_scenario: str

The projection scenario for the population data.

max_year: int

The maximum year for the population data.

min_age: int = 3

The minimum age for the population data.

max_age: int = 90

The maximum age for the population data.

Returns:

A dataframe containing the Canadian population data. Columns:

  • year: The year of the data.

  • age: A value in [min_age, max_age].

  • province: The 2-letter province abbreviation.

  • sex: One of M = male, F = female.

  • n: The number of people in a given year, age, sex, province, and projection scenario.

leap.data_generation.exacerbation_data.exacerbation_calibrator(province: str = 'CA', starting_year: int = 2000, max_year: int = 2065, min_age: int = 3, max_age: int = 90, prob_hosp: float = 0.026, projection_scenario: str = 'M3') → pandas.core.frame.DataFrame

Compute the ratio between the observed and predicted hospitalization rates.

Parameters:
province: str = 'CA'

The 2-letter abbreviation for the province.

starting_year: int = 2000

The starting year for the calibration.

max_year: int = 2065

The maximum year for the calibration.

min_age: int = 3

The minimum age for the calibration.

max_age: int = 90

The maximum age for the calibration.

prob_hosp: float = 0.026

The probability of a very severe exacerbation, defined as an exacerbation that requires hospitalization.

projection_scenario: str = 'M3'

The projection scenario for the population data. One of:

  • LG: low-growth projection

  • HG: high-growth projection

  • M1: medium-growth 1 projection

  • M2: medium-growth 2 projection

  • M3: medium-growth 3 projection

  • M4: medium-growth 4 projection

  • M5: medium-growth 5 projection

  • M6: medium-growth 6 projection

  • FA: fast-aging projection

  • SA: slow-aging projection

See: StatCan Projection Scenarios.

Returns:

A dataframe with the following columns:

  • year: The year of the data, a value in [starting_year, max_year].

  • age: The integer age, a value in [min_age, max_age].

  • sex: One of M or F.

  • calibrator_multiplier: The ratio between the observed and predicted number of hospitalizations.

leap.data_generation.exacerbation_data.generate_exacerbation_calibration_data()

Generate the exacerbation calibration data for all provinces.