Death Data

To obtain the mortality data for each year, we used the following data from StatCan:

  1. 1996 - 2021:

    For past years, we used Table 13-10-0837-01 from StatCan.

    The *.csv file can be downloaded from here: 13100837-eng.zip. It is saved as LEAP/leap/original_data/13100837.csv.

  2. 2021 - 2068:

    StatCan doesn’t provide annual projections for death probabilities, but does provide a projection for specific years (which we call calibration years) for the M3 projection scenario only. For Canada, this is 2068, and for BC, 2043. The following equation can be used to obtain the probability of death in future years:

    \[\sigma^{-1}(p(s, a, y)) = \sigma^{-1}(p(s, a, y_0)) - e^{\beta(s)(y - y_0)}\]

    where:

    • \(\sigma^{-1}\) is the inverse sigmoid function, also known as the logit function:

    \[\sigma^{-1}(p) = \ln\left(\dfrac{p}{1-p}\right)\]

    and:

    • \(a\) is the age

    • \(s\) is the sex

    • \(y_0\) is the year the collected data ends (in our case, 2020)

    • \(y\) is the future year

    • \(p(s, a, y_0)\) is the probability of death for a person of that age/sex in the year the collected data ends (in our case, 2020)

    • \(p(s, a, y)\) is the probability of death for a person of that age/sex in a future year.

    The parameter \(\beta(s)\) is unknown, so we first need to calculate it. To do so, we set \(y = \text{calibration\_year}\) and use the Brent root-finding algorithm to find the value of \(\beta(s)\) for which the predicted life expectancy in the calibration year matches the known life expectancy for that year.

    Once we have found \(\beta(s)\), we can use this formula to find the projected death probabilities.
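    Applying the sigmoid \(\sigma(x) = 1/(1 + e^{-x})\) to both sides of the projection equation and expanding the logit gives an explicit formula for the projected probability:

    \[p(s, a, y) = \sigma\left(\sigma^{-1}(p(s, a, y_0)) - e^{\beta(s)(y - y_0)}\right) = \dfrac{1}{1 + \dfrac{1 - p(s, a, y_0)}{p(s, a, y_0)}\exp\left(e^{\beta(s)(y - y_0)}\right)}\]

    and, multiplying the numerator and denominator by \(p(s, a, y_0)\):

    \[p(s, a, y) = \dfrac{p(s, a, y_0)}{p(s, a, y_0) + \left(1 - p(s, a, y_0)\right)\exp\left(e^{\beta(s)(y - y_0)}\right)}\]

    The second form is also defined when \(p(s, a, y_0)\) is exactly 0 or 1 (it returns 0 and 1 respectively), where the logit itself is not. Because \(\exp\left(e^{\beta(s)(y - y_0)}\right) > 1\), every projected probability strictly between 0 and 1 is lower than \(p(s, a, y_0)\).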

To run the data generation for the mortality data:

cd LEAP
python3 leap/data_generation/death_data.py

This will update the following data file:

  1. leap/processed_data/life_table.csv

leap.data_generation.death_data module

leap.data_generation.death_data.calculate_life_expectancy(life_table: pandas.core.frame.DataFrame) → float

Determine the life expectancy for a person born in a given year.

The life expectancy can be calculated from the death probability using the formulae delineated here: Life Table Definitions

Parameters:
life_table: pandas.core.frame.DataFrame

A dataframe containing the probability of death for a single year, province and sex, for each age. Columns:

  • age: the integer age.

  • sex: One of M = male, F = female.

  • year: the integer calendar year.

  • province: A string indicating the province abbreviation, e.g. "BC". For all of Canada, set province to "CA".

  • prob_death: the probability of death for a given age, province, sex, and year.

Returns:

The life expectancy for a person born in the given year, in a given province, for a given sex.
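To illustrate the life-table arithmetic, here is a minimal sketch. Assumptions: the hypothetical helper life_expectancy_from_probs takes the prob_death values as a plain sequence ordered by age (rather than the documented DataFrame), deaths are assumed to fall halfway through each year of age, and the last age is closed with a probability of 1. The linked StatCan definitions may treat age 0 and the final age group differently.

```python
def life_expectancy_from_probs(prob_death) -> float:
    """Period life expectancy at birth from age-ordered death probabilities.

    Hypothetical helper. prob_death[x] is the probability of dying between
    ages x and x + 1. Deaths are assumed to occur halfway through each year
    of age, and the last entry should be 1 (a closed life table); otherwise
    anyone surviving past the last age is ignored.
    """
    survivors = 1.0  # l_x, starting from a radix of l_0 = 1
    total_person_years = 0.0  # T_0, the sum of L_x over all ages
    for q in prob_death:
        deaths = survivors * q  # d_x = l_x * q_x
        total_person_years += survivors - 0.5 * deaths  # L_x = l_x - d_x / 2
        survivors -= deaths  # l_{x+1} = l_x - d_x
    # e_0 = T_0 / l_0, and l_0 = 1
    return total_person_years
```

With a life table already filtered to one year, province, and sex, a call such as life_expectancy_from_probs(life_table.sort_values("age")["prob_death"]) would pass the probabilities in age order.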

leap.data_generation.death_data.get_prob_death_projected(prob_death: float, year_initial: int, year: int, beta_year: float) → float

Given the (known) probability of death for a past year, calculate the probability of death in a future year.

\[\sigma^{-1}(p(\text{sex}, \text{age}, \text{year})) = \sigma^{-1}(p(\text{sex}, \text{age}, \text{year}_0)) - e^{\beta(\text{sex})(\text{year} - \text{year}_0)}\]
Parameters:
prob_death: float

The probability of death for year_initial, the last year that past data was collected, for a given age, sex, province, and projection scenario.

year_initial: int

The initial year with a known probability of death. This is the last year that the past data was collected.

year: int

The future year for which to project the probability of death.

beta_year: float

The beta parameter for the given sex, province, and projection scenario.

Returns:

The projected probability of death for the given year.
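A minimal sketch of the projection step, applying the equation above as written. The helpers logit and logistic are hypothetical names, and passing probabilities of exactly 0 or 1 through unchanged is an assumption of this sketch, since the logit is undefined there. It is not the module's actual source.

```python
import math


def logit(p: float) -> float:
    """Inverse sigmoid ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logistic(x: float) -> float:
    """Sigmoid 1 / (1 + e^(-x)), the inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def get_prob_death_projected(
    prob_death: float, year_initial: int, year: int, beta_year: float
) -> float:
    """Sketch: sigma(logit(p(y0)) - e^{beta (y - y0)})."""
    # Assumption: 0 and 1 (e.g. a closed last age group) have no finite
    # logit, so they are returned unchanged.
    if prob_death <= 0.0 or prob_death >= 1.0:
        return prob_death
    shift = math.exp(beta_year * (year - year_initial))  # e^{beta (y - y0)}
    return logistic(logit(prob_death) - shift)
```

For example, with an illustrative beta_year of -0.02, a probability of 0.01 in 2020 has e^(-0.96) ≈ 0.38 subtracted from its logit by 2068, which lowers it to about 0.0068.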

leap.data_generation.death_data.get_projected_life_table_single_year(beta_year: float, life_table: pandas.core.frame.DataFrame, year_initial: int, year: int, sex: str, province: str) → pandas.core.frame.DataFrame

Get the life table for a single year.

Parameters:
beta_year: float

The beta parameter for the given year.

life_table: pandas.core.frame.DataFrame

A dataframe containing the projected probability of death for the starting year, for a given sex and province. Columns:

  • age: the integer age.

  • sex: One of M = male, F = female.

  • year: the starting calendar year.

  • province: a string indicating the province abbreviation, e.g. "BC". For all of Canada, set province to "CA".

  • prob_death: the probability of death for a given age, province, sex, and year.

year_initial: int

The initial year with a known probability of death. This is the last year that the past data was collected.

year: int

The future year for which to project the life table.

sex: str

One of M = male, F = female.

province: str

a string indicating the province abbreviation, e.g. "BC". For all of Canada, set province to "CA".

Returns:

A dataframe containing the projected probability of death for the given year, sex, and province.
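A sketch of how the projection could be applied to a whole life table at once with pandas and NumPy. It uses the algebraically equivalent form \(p(y) = p_0 / (p_0 + (1 - p_0)\exp(e^{\beta(y - y_0)}))\) of the projection equation, which avoids evaluating the logit and returns exactly 0 or 1 when \(p_0\) is 0 or 1. Filtering on sex and province, and overwriting the year column, are assumptions about what the real function does.

```python
import numpy as np
import pandas as pd


def get_projected_life_table_single_year(
    beta_year: float,
    life_table: pd.DataFrame,
    year_initial: int,
    year: int,
    sex: str,
    province: str,
) -> pd.DataFrame:
    """Sketch: project every age's prob_death from year_initial to year."""
    # Assumption: select the requested sex and province in case the input holds several.
    mask = (life_table["sex"] == sex) & (life_table["province"] == province)
    table = life_table.loc[mask].copy()
    # e^{beta (year - year_initial)}, the amount subtracted from each logit.
    shift = np.exp(beta_year * (year - year_initial))
    p0 = table["prob_death"].to_numpy(dtype=float)
    # Closed form of sigmoid(logit(p0) - shift); exact at p0 = 0 and p0 = 1.
    table["prob_death"] = p0 / (p0 + (1.0 - p0) * np.exp(shift))
    table["year"] = year
    return table.reset_index(drop=True)
```

The vectorized closed form keeps the whole age range in one NumPy expression instead of calling a scalar projection function row by row.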

leap.data_generation.death_data.beta_year_optimizer(beta_year: float, life_table: pandas.core.frame.DataFrame, sex: str, province: str, year_initial: int, year: int) → float

Calculate the difference between the projected life expectancy and desired life expectancy.

This function is passed to the scipy.optimize.brentq function, which searches for the value of beta_year at which the returned difference is zero, i.e. at which the projected life expectancy matches the desired life expectancy.

Parameters:
beta_year: float

The beta parameter for the given year.

life_table: pandas.core.frame.DataFrame

A dataframe containing the projected probability of death for the calibration year, for a given sex and province. Columns:

  • age: the integer age.

  • sex: one of M = male, F = female.

  • year: the calibration calendar year.

  • province: a 2-letter string indicating the province abbreviation, e.g. "BC". For all of Canada, set province to "CA".

  • prob_death: the probability of death for a given age, province, sex, and year.

sex: str

one of M = male, F = female.

province: str

A 2-letter string indicating the province abbreviation, e.g. "BC". For all of Canada, set province to "CA".

year_initial: int

The initial year with a known probability of death. This is the last year that the past data was collected.

year: int

The year to project to; during calibration, this is the calibration year.

Returns:

The difference between the projected life expectancy of the calibration year and the desired life expectancy.
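A self-contained sketch of the calibration with scipy.optimize.brentq on a made-up life table. Assumptions: life_expectancy, project, and life_expectancy_gap are hypothetical stand-ins that take NumPy arrays rather than the documented DataFrame signature, life expectancy uses the simple mid-year formula, and the target is generated from a known beta so the recovered value can be checked. In practice the target is the known calibration-year life expectancy.

```python
import numpy as np
from scipy.optimize import brentq


def life_expectancy(prob_death: np.ndarray) -> float:
    """e_0 with a radix of 1 and deaths assumed to fall mid-year (closed table)."""
    survivors = np.concatenate(([1.0], np.cumprod(1.0 - prob_death)[:-1]))  # l_x
    deaths = survivors * prob_death  # d_x
    return float(np.sum(survivors - 0.5 * deaths))  # sum of L_x


def project(prob_death: np.ndarray, year_initial: int, year: int, beta: float) -> np.ndarray:
    """Closed form of sigmoid(logit(p) - e^{beta (year - year_initial)})."""
    shift = np.exp(beta * (year - year_initial))
    return prob_death / (prob_death + (1.0 - prob_death) * np.exp(shift))


def life_expectancy_gap(beta, prob_death, year_initial, year, target):
    """Hypothetical objective: projected minus desired life expectancy."""
    return life_expectancy(project(prob_death, year_initial, year, beta)) - target


# Toy table for ages 0..110 in the last observed year (not StatCan data).
ages = np.arange(111)
prob_2020 = np.minimum(1.0, 0.0005 * np.exp(0.085 * ages))
prob_2020[-1] = 1.0  # close the table at the last age

# Stand-in for the known calibration-year life expectancy, built from beta = -0.02.
target = life_expectancy(project(prob_2020, 2020, 2068, -0.02))

# Brent's method searches the bracket [-0.03, -0.01] for the zero of the gap.
beta_hat = brentq(
    life_expectancy_gap, -0.03, -0.01, args=(prob_2020, 2020, 2068, target), xtol=1e-5
)
```

brentq raises a ValueError when the gap has the same sign at both ends of the bracket, so bounds such as the a and b documented for load_projected_death_data must enclose the root. Here the gap is increasing in beta: a larger beta subtracts more from each logit, lowering mortality and raising life expectancy.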

leap.data_generation.death_data.load_past_death_data() → pandas.core.frame.DataFrame

Load the past death data from the StatCan CSV file.

Returns:

A dataframe containing the probability of death and the standard error for each year, province, age, and sex. Columns:

  • year: The integer calendar year.

  • province: a 2-letter string indicating the province abbreviation, e.g. "BC". For all of Canada, set province to "CA".

  • sex: One of M = male, F = female.

  • age: The integer age.

  • prob_death: The probability that a person of the given age, sex, and province will die in the given year.

  • se: The standard error of the probability of death.

leap.data_generation.death_data.load_projected_death_data(past_life_table: pandas.core.frame.DataFrame, a: float = -0.03, b: float = -0.01, xtol: float = 1e-05) → pandas.core.frame.DataFrame

Load the projected death data from the StatCan CSV file.

Parameters:
past_life_table: pandas.core.frame.DataFrame

A dataframe containing the probability of death and the standard error for each year, province, age, and sex. Columns:

  • year: the integer calendar year.

  • province: A 2-letter string indicating the province abbreviation, e.g. "BC". For all of Canada, set province to "CA".

  • sex: One of M = male, F = female.

  • age: the integer age.

  • prob_death: the probability of death.

  • se: the standard error of the probability of death.

a: float = -0.03

The lower bound for the beta parameter.

b: float = -0.01

The upper bound for the beta parameter.

xtol: float = 1e-05

The tolerance for the beta parameter.

Returns:

A dataframe containing the predicted probability of death and the standard error for each year, province, age, and sex. Columns:

  • year: The integer calendar year.

  • province: A 2-letter string indicating the province abbreviation, e.g. "BC". For all of Canada, set province to "CA".

  • sex: One of M = male, F = female.

  • age: The integer age.

  • prob_death: The probability that a person of the given age, sex, and province will die in the given year.

  • se: The standard error of the probability of death.

leap.data_generation.death_data.generate_death_data()

Generate the mortality data CSV.