Death Data¶
To obtain the mortality data for each year, we used one table from StatCan:
1996 - 2021:
For past years, we used Table 13-10-00837-01 from StatCan.
The *.csv file can be downloaded from here: 13100837-eng.zip and is saved as:
LEAP/leap/original_data/13100837.csv
2021 - 2068:
StatCan doesn’t provide annual projections for death probabilities, but does provide a projection for specific years (which we call calibration years) for the M3 projection scenario only. For Canada, this is 2068, and for BC, 2043. The following equation can be used to obtain the probability of death in future years:
\[\sigma^{-1}(p(s, a, y)) = \sigma^{-1}(p(s, a, y_0)) - e^{\beta(s)(y - y_0)}\]
where:
\(\sigma^{-1}\) is the inverse sigmoid function, also known as the logit function:
\[\sigma^{-1}(p) = \ln\left(\dfrac{p}{1-p}\right)\]
and:
\(a\) is the age
\(s\) is the sex
\(y_0\) is the year the collected data ends (in our case, 2020)
\(y\) is the future year
\(p(s, a, y_0)\) is the probability of death for a person of that age/sex in the year the collected data ends (in our case, 2020)
\(p(s, a, y)\) is the probability of death for a person of that age/sex in a future year.
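As a rough illustration, the projection equation above can be transcribed into Python term by term. This is only a sketch: the helper names logit, sigmoid, and project_prob_death, as well as the example values for p0 and beta, are assumptions made here and are not taken from the LEAP code base.

```python
import math


def logit(p: float) -> float:
    """Inverse sigmoid: ln(p / (1 - p)). Requires 0 < p < 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Sigmoid function, the inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def project_prob_death(p0: float, beta: float, y0: int, y: int) -> float:
    """Hypothetical helper: literal transcription of the equation above.

    logit(p(y)) = logit(p(y0)) - exp(beta * (y - y0))
    """
    return sigmoid(logit(p0) - math.exp(beta * (y - y0)))


# Illustrative values only (assumed, not taken from StatCan data).
p_2068 = project_prob_death(p0=0.01, beta=-0.02, y0=2020, y=2068)
print(f"projected probability of death: {p_2068:.5f}")
```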
The parameter \(\beta(s)\) is unknown, so we first need to calculate it. To do so, we set \(y = \text{calibration_year}\) and use the Brent root-finding algorithm to find the \(\beta(s)\) for which the predicted life expectancy in the calibration year matches the known life expectancy for that year.
Once we have found \(\beta(s)\), we can use this formula to find the projected death probabilities.
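The calibration step can be sketched as follows, assuming a toy mortality curve and a simplified life expectancy (the sum of survival probabilities over ages). The functions base_prob_death, project, life_expectancy, and objective are hypothetical stand-ins, and the target is generated from a known beta so that the root can be checked. Only scipy.optimize.brentq, the bounds -0.03 and -0.01, and the tolerance 1e-05 come from this page.

```python
import math

from scipy.optimize import brentq

AGES = range(0, 111)
Y0, CALIBRATION_YEAR = 2020, 2068


def base_prob_death(age: int) -> float:
    """Toy probability of death in year Y0 (assumed curve, not StatCan data)."""
    return min(0.0005 * math.exp(0.09 * age), 0.99)


def project(p0: float, beta: float) -> float:
    """Projection equation as written on this page, evaluated at the calibration year."""
    x = math.log(p0 / (1.0 - p0)) - math.exp(beta * (CALIBRATION_YEAR - Y0))
    return 1.0 / (1.0 + math.exp(-x))


def life_expectancy(beta: float) -> float:
    """Simplified life expectancy: expected number of complete years lived."""
    alive, total = 1.0, 0.0
    for age in AGES:
        alive *= 1.0 - project(base_prob_death(age), beta)
        total += alive
    return total


# Pretend the "known" life expectancy was produced by beta = -0.02.
target = life_expectancy(-0.02)


def objective(beta: float) -> float:
    """Difference between projected and desired life expectancy."""
    return life_expectancy(beta) - target


# Brent's method needs a bracket [a, b] where the objective changes sign.
beta_hat = brentq(objective, -0.03, -0.01, xtol=1e-5)
print(f"recovered beta: {beta_hat:.4f}")
```

Brent's method only requires a sign change of the objective over the bracket, which is why the bounds a and b matter: if the true beta lies outside them, scipy.optimize.brentq raises a ValueError.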
To run the data generation for the mortality data:
cd LEAP
python3 leap/data_generation/death_data.py
This will update the following data file:
leap/processed_data/life_table.csv
leap.data_generation.death_data module¶
- leap.data_generation.death_data.calculate_life_expectancy(life_table: pandas.core.frame.DataFrame) float [source]¶
Determine the life expectancy for a person born in a given year.
The life expectancy can be calculated from the death probability using the formulae delineated here: Life Table Definitions
- Parameters:¶
- life_table: pandas.core.frame.DataFrame¶
A dataframe containing the probability of death for a single year, province and sex, for each age. Columns:
age: the integer age.
sex: one of M = male, F = female.
year: the integer calendar year.
province: a string indicating the province abbreviation, e.g. "BC". For all of Canada, set province to "CA".
prob_death: the probability of death for a given age, province, sex, and year.
- Returns:¶
The life expectancy for a person born in the given year, in a given province, for a given sex.
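For orientation, here is a minimal sketch of how a life expectancy at birth can be derived from a column of death probabilities. It assumes that deaths are spread uniformly within each year of age (so a person who dies at age x contributes half a year), which may differ from the formulae the module actually uses. The function life_expectancy_at_birth and the toy table are hypothetical.

```python
import pandas as pd


def life_expectancy_at_birth(life_table: pd.DataFrame) -> float:
    """Hypothetical helper: expected years lived by a newborn.

    Assumes deaths occur, on average, halfway through each year of age.
    """
    table = life_table.sort_values("age")
    alive = 1.0          # fraction of the cohort still alive at age x
    person_years = 0.0   # accumulated years lived per newborn
    for q in table["prob_death"]:
        deaths = alive * q
        person_years += alive - 0.5 * deaths
        alive -= deaths
    return person_years


# Toy table with the documented columns; the values are illustrative only.
toy = pd.DataFrame(
    {
        "age": [0, 1],
        "sex": ["F", "F"],
        "year": [2020, 2020],
        "province": ["CA", "CA"],
        "prob_death": [0.5, 1.0],
    }
)
e0 = life_expectancy_at_birth(toy)
print(e0)
```

In the toy table, half the cohort dies during age 0 (contributing 0.75 person-years in total) and the rest during age 1 (contributing 0.25), so the sketch yields a life expectancy of one year.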
- leap.data_generation.death_data.get_prob_death_projected(prob_death: float, year_initial: int, year: int, beta_year: float) float [source]¶
Given the (known) probability of death for a past year, calculate the probability of death in a future year.
\[\sigma^{-1}(p(\text{sex}, \text{age}, \text{year})) = \sigma^{-1}(p(\text{sex}, \text{age}, \text{year}_0)) - e^{\beta(\text{sex})(\text{year} - \text{year}_0)}\]
- Parameters:¶
- prob_death: float¶
The probability of death for year_initial, the last year that past data was collected, for a given age, sex, province, and projection scenario.
- year_initial: int¶
The initial year with a known probability of death. This is the last year that the past data was collected.
- year: int¶
The current year.
- beta_year: float¶
The beta parameter for the given sex, province, and projection scenario.
- Returns:¶
The projected probability of death for the current year.
- leap.data_generation.death_data.get_projected_life_table_single_year(beta_year: float, life_table: pandas.core.frame.DataFrame, year_initial: int, year: int, sex: str, province: str) pandas.core.frame.DataFrame [source]¶
Get the life table for a single year.
- Parameters:¶
- beta_year: float¶
The beta parameter for the given year.
- life_table: pandas.core.frame.DataFrame¶
A dataframe containing the projected probability of death for the starting year, for a given sex and province. Columns:
age: the integer age.
sex: one of M = male, F = female.
year: the starting calendar year.
province: a string indicating the province abbreviation, e.g. "BC". For all of Canada, set province to "CA".
prob_death: the probability of death for a given age, province, sex, and year.
- year_initial: int¶
The initial year with a known probability of death. This is the last year that the past data was collected.
- year: int¶
The current year.
- sex: str¶
One of M = male, F = female.
- province: str¶
A string indicating the province abbreviation, e.g. "BC". For all of Canada, set province to "CA".
- Returns:¶
A dataframe containing the projected probability of death for the given year, sex, and province.
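A vectorized sketch of projecting a whole single-year table at once, assuming the projection equation documented on this page and the column names listed above. The function project_life_table and the sample values are hypothetical; the real function also takes sex and province arguments, which are omitted here.

```python
import numpy as np
import pandas as pd


def project_life_table(
    life_table: pd.DataFrame, beta: float, year_initial: int, year: int
) -> pd.DataFrame:
    """Hypothetical helper: apply the projection equation to every age row."""
    p0 = life_table["prob_death"]
    logit_p0 = np.log(p0 / (1.0 - p0))
    logit_p = logit_p0 - np.exp(beta * (year - year_initial))
    # assign() returns a new dataframe, leaving the input table untouched.
    return life_table.assign(year=year, prob_death=1.0 / (1.0 + np.exp(-logit_p)))


# Illustrative starting-year table (values assumed, not StatCan data).
base = pd.DataFrame(
    {
        "age": [0, 40, 80],
        "sex": ["M", "M", "M"],
        "year": [2020, 2020, 2020],
        "province": ["BC", "BC", "BC"],
        "prob_death": [0.004, 0.002, 0.06],
    }
)
projected = project_life_table(base, beta=-0.02, year_initial=2020, year=2043)
print(projected)
```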
- leap.data_generation.death_data.beta_year_optimizer(beta_year: float, life_table: pandas.core.frame.DataFrame, sex: str, province: str, year_initial: int, year: int) float [source]¶
Calculate the difference between the projected life expectancy and desired life expectancy.
This function is passed to the scipy.optimize.brentq function. We want to find beta_year such that the projected life expectancy is as close as possible to the desired life expectancy.
- Parameters:¶
- beta_year: float¶
The beta parameter for the given year.
- life_table: pandas.core.frame.DataFrame¶
A dataframe containing the projected probability of death for the calibration year, for a given sex and province. Columns:
age: the integer age.
sex: one of M = male, F = female.
year: the calibration calendar year.
province: a 2-letter string indicating the province abbreviation, e.g. "BC". For all of Canada, set province to "CA".
prob_death: the probability of death for a given age, province, sex, and year.
- sex: str¶
One of M = male, F = female.
- province: str¶
A 2-letter string indicating the province abbreviation, e.g. "BC". For all of Canada, set province to "CA".
- year_initial: int¶
The initial year with a known probability of death. This is the last year that the past data was collected.
- year: int¶
The current year.
- Returns:¶
The difference between the projected life expectancy of the calibration year and the desired life expectancy.
- leap.data_generation.death_data.load_past_death_data() pandas.core.frame.DataFrame [source]¶
Load the past death data from the StatCan CSV file.
- Returns:¶
A dataframe containing the probability of death and the standard error for each year, province, age, and sex. Columns:
year: the integer calendar year.
province: a 2-letter string indicating the province abbreviation, e.g. "BC". For all of Canada, the province is "CA".
sex: one of M = male, F = female.
age: the integer age.
prob_death: the probability that a person of the given age, sex, and province will die in the given year.
se: the standard error of the probability of death.
- leap.data_generation.death_data.load_projected_death_data(past_life_table: pandas.core.frame.DataFrame, a: float = -0.03, b: float = -0.01, xtol: float = 1e-05) pandas.core.frame.DataFrame [source]¶
Load the projected death data from the StatCan CSV file.
- Parameters:¶
- past_life_table: pandas.core.frame.DataFrame¶
A dataframe containing the probability of death and the standard error for each year, province, age, and sex. Columns:
year: the integer calendar year.
province: a 2-letter string indicating the province abbreviation, e.g. "BC". For all of Canada, set province to "CA".
sex: one of M = male, F = female.
age: the integer age.
prob_death: the probability of death.
se: the standard error of the probability of death.
- a: float = -0.03¶
The lower bound for the beta parameter.
- b: float = -0.01¶
The upper bound for the beta parameter.
- xtol: float = 1e-05¶
The tolerance for the beta parameter.
- Returns:¶
A dataframe containing the predicted probability of death and the standard error for each year, province, age, and sex. Columns:
year: the integer calendar year.
province: a 2-letter string indicating the province abbreviation, e.g. "BC". For all of Canada, the province is "CA".
sex: one of M = male, F = female.
age: the integer age.
prob_death: the probability that a person of the given age, sex, and province will die in the given year.
se: the standard error of the probability of death.