Mortality Model¶

Data¶

To obtain the mortality data for each year, we used one table from Statistics Canada:

Past Data: 1996 - 2021¶

For past years, we used Table 13-10-00837-01 from StatCan.

The *.csv file can be downloaded from here: 13100837-eng.zip and is saved as: LEAP/leap/original_data/13100837.csv.

The relevant columns are:

Column	Type	Description
`REF_DATE`	`int`	the calendar year
`AGE_GROUP`	`str`	the age of the person in years
`GEO`	`str`	the province or terriroty full name
`SEX`	`str`	one of “Both sexes”, “Females”, or “Males”
`ELEMENT`	`str`	describes what the variable of interest is; we want `"Death probability between age x and x+1 (qx)"`
`VALUE`	`int`	the probability of death between age `x` and `x+1` in that year, province, sex, and age group

Projected Data: 2021 - 2068¶

Statistics Canada doesn’t provide annual projections for death probabilities, but does provide a projection for specific years (which we call calibration years):

Region	Year	Projection Scenario	Mortality Scenario
Canada	`2028`	`LG`	`HM`
Canada	`2028`	`M1`	`MM`
Canada	`2028`	`M2`	`MM`
Canada	`2028`	`M3`	`MM`
Canada	`2028`	`M4`	`MM`
Canada	`2028`	`M5`	`MM`
Canada	`2028`	`HG`	`LM`
Canada	`2028`	`SA`	`HM`
Canada	`2028`	`FA`	`LM`
Canada	`2048`	`LG`	`HM`
Canada	`2048`	`M1`	`MM`
Canada	`2048`	`M2`	`MM`
Canada	`2028`	`M3`	`MM`
Canada	`2048`	`M4`	`MM`
Canada	`2048`	`M5`	`MM`
Canada	`2048`	`HG`	`LM`
Canada	`2048`	`SA`	`HM`
Canada	`2048`	`FA`	`LM`
Canada	`2073`	`LG`	`HM`
Canada	`2073`	`M1`	`MM`
Canada	`2073`	`M2`	`MM`
Canada	`2073`	`M3`	`MM`
Canada	`2073`	`M4`	`MM`
Canada	`2073`	`M5`	`MM`
Canada	`2073`	`HG`	`LM`
Canada	`2073`	`SA`	`HM`
Canada	`2073`	`FA`	`LM`
Provinces / Territories	`2028`	`HG`, `FA`	`LM`
Provinces / Territories	`2028`	`M1`, `M2`, `M3`, `M4`, `M5`, `M6`	`MM`
Provinces / Territories	`2028`	`LG`, `SA`	`HM`
Provinces / Territories	`2033`	`HG`, `FA`	`LM`
Provinces / Territories	`2033`	`M1`, `M2`, `M3`, `M4`, `M5`, `M6`	`MM`
Provinces / Territories	`2033`	`LG`, `SA`	`HM`
Provinces / Territories	`2038`	`HG`, `FA`	`LM`
Provinces / Territories	`2038`	`M1`, `M2`, `M3`, `M4`, `M5`, `M6`	`MM`
Provinces / Territories	`2038`	`LG`, `SA`	`HM`
Provinces / Territories	`2043`	`HG`, `FA`	`LM`
Provinces / Territories	`2043`	`M1`, `M2`, `M3`, `M4`, `M5`, `M6`	`MM`
Provinces / Territories	`2043`	`LG`, `SA`	`HM`
Provinces / Territories	`2048`	`HG`, `FA`	`LM`
Provinces / Territories	`2048`	`M1`, `M2`, `M3`, `M4`, `M5`, `M6`	`MM`
Provinces / Territories	`2048`	`LG`, `SA`	`HM`
Provinces / Territories	`2053`	`HG`, `FA`	`LM`
Provinces / Territories	`2053`	`M1`, `M2`, `M3`, `M4`, `M5`, `M6`	`MM`
Provinces / Territories	`2053`	`LG`, `SA`	`HM`
Provinces / Territories	`2058`	`HG`, `FA`	`LM`
Provinces / Territories	`2058`	`M1`, `M2`, `M3`, `M4`, `M5`, `M6`	`MM`
Provinces / Territories	`2058`	`LG`, `SA`	`HM`
Provinces / Territories	`2063`	`HG`, `FA`	`LM`
Provinces / Territories	`2063`	`M1`, `M2`, `M3`, `M4`, `M5`, `M6`	`MM`
Provinces / Territories	`2063`	`LG`, `SA`	`HM`
Provinces / Territories	`2068`	`HG`, `FA`	`LM`
Provinces / Territories	`2068`	`M1`, `M2`, `M3`, `M4`, `M5`, `M6`	`MM`
Provinces / Territories	`2068`	`LG`, `SA`	`HM`
Provinces / Territories	`2073`	`HG`, `FA`	`LM`
Provinces / Territories	`2073`	`M1`, `M2`, `M3`, `M4`, `M5`, `M6`	`MM`
Provinces / Territories	`2073`	`LG`, `SA`	`HM`

This data can be found in the Statistics Canada Population Projections Technical Report: Table 3.1, Table 3.2, Table 5.2.1, Table 5.2.2, Table 5.2.3.

Model¶

We have mortality data for past years (1996 - 2020), and life expectancy projections for specific future years; but we would like to have mortality data for all future years in our model. Statistics Canada describes how they model mortality here: Methods for Constructing Life Tables for Canada, Provinces and Territories.

In particular, the model they use is the Kannisto-Thatcher model, described in this paper: On the use of Kannisto model for mortality trajectory modelling at very old ages.

According to the Kannisto-Thatcher model, the instantaneous probability of death at age $x$ is given by:

\[\begin{split}\mu(x) &= \dfrac{a e^{\beta x}}{1 + a e^{\beta x}} \\ &= \lim_{\Delta x \to 0} \dfrac{P(\text{death between age $x$ and $x + \Delta x$} \mid \text{survived till $x$})}{\Delta x}\end{split}\]

In mathematical terms, $\mu(x)$ is the hazard rate. Let’s break this down further. Let $F_X(x)$ be the cumulative distribution function for age at death, $X$:

\[\begin{split}F_X(x) :&= P(\text{age at death} \leq \text{given age}) \\ &= P(X \leq x)\end{split}\]

We want the conditional probability of death between age $x$ and $x + \Delta x$, given that the person has survived till age $x$. This is given by:

\[P(x < X \leq x + \Delta x \mid X > x)\]

Recall that for a conditional probability:

\[P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}\]

and so:

\[P(x < X \leq x + \Delta x \mid X > x) = \dfrac{P(x < X \leq x + \Delta x \bigcap X > x)}{P(X > x)}\]

Since $F_X(x)$ is the cumulative distribution function, by definition it must sum to 1:

\[P(X > x) = 1 - F_X(x)\]

Since $X > x$ if $x < X \leq x + \Delta x$, we can rewrite the numerator as:

\[\begin{split}P(x < X \leq x + \Delta x \bigcap X > x) &= P(x < X \leq x + \Delta x) \\ &= F_X(x + \Delta x) - F_X(x)\end{split}\]

Putting it all together, we have:

\[P(x < X \leq x + \Delta x \mid X > x) = \dfrac{F_X(x + \Delta x) - F_X(x)}{1 - F_X(x)}\]

Now, we want to find the instantaneous rate of death; the probability of death per unit time. If we take the limit as $\Delta x \to 0$, we will find the instantaneous probability of death at age $x$. To get the probability of death per unit time, we need to divide by $\Delta x$:

\[\mu(x) = \lim_{\Delta x \to 0} \dfrac{F_X(x + \Delta x) - F_X(x)}{\Delta x (1 - F_X(x))}\]

You will recognize the derivative of $F_X(x)$:

\[\dfrac{d}{dx} F_X(x) = \lim_{\Delta x \to 0} \dfrac{F_X(x + \Delta x) - F_X(x)}{\Delta x}\]

and so:

\[\mu(x) = \dfrac{F_X'(x)}{1 - F_X(x)}\]

The data in the Statistics Canada mortality table is the probability of death between age $x$ and $x + 1$, which is denoted as $q_x$. This is the same as the probability $P(x < X \leq x + \Delta x \mid X > x)$, with $\Delta x = 1$. We would like to solve for $q_x$, using the Kannisto-Thatcher Equation for $\mu(x)$. First, we can write $q_x$ in terms of $F_X(x)$:

\[\begin{split}q_x &= P(x < X \leq x + 1 \mid X > x) \\ &= \dfrac{F_X(x + 1) - F_X(x)}{1 - F_X(x)}\end{split}\]

Let us define $S_X(x)$, the survival function, for convenience:

\[\begin{split}S_X(x) &:= 1 - F_X(x) \\ &= P(X > x)\end{split}\]

Then we have:

\[\dfrac{dS}{dx} = -F_X'(x)\]

and so $\mu(x)$ can be rewritten as:

\[\mu(x) = -\dfrac{dS}{dx}\dfrac{1}{S_X(x)}\]

Solving this first order separable linear differential equation, we have:

\[\begin{split}\int \dfrac{dS}{S_X} &= -\int \mu(x) dx \\ \ln(S_X(x)) &= -\int \mu(x) dx + C \\ &= -\int \dfrac{a e^{\beta x}}{1 + a e^{\beta x}} dx + C\end{split}\]

Letting $u(x) := 1 + a e^{\beta x}$, we have:

\[\begin{split}\ln(S_X(x)) &= - \dfrac{1}{\beta} \int \dfrac{du}{u} + C \\ &= - \dfrac{1}{\beta} \ln(u(x)) + C \\ S_X(x) &= e^C (1 + a e^{\beta x})^{-\frac{1}{\beta}} \\ &= k (1 + a e^{\beta x})^{-\frac{1}{\beta}} \\ 1 - F_X(x) &= k (1 + a e^{\beta x})^{-\frac{1}{\beta}} \\ F_X(x) &= 1 - k (1 + a e^{\beta x})^{-\frac{1}{\beta}}\end{split}\]

Now, we can substitute this into the equation for $q_x$:

\[\begin{split}q_x &= \dfrac{F_X(x + \Delta x) - F_X(x)}{1 - F_X(x)} \\ &= \dfrac{ 1 - k (1 + a e^{\beta (x + \Delta x)})^{-\frac{1}{\beta}} - 1 + k (1 + a e^{\beta x})^{-\frac{1}{\beta}} }{k (1 + a e^{\beta x})^{-\frac{1}{\beta}}} \\ &= \dfrac{ - k (1 + a e^{\beta (x + \Delta x)})^{-\frac{1}{\beta}} + k (1 + a e^{\beta x})^{-\frac{1}{\beta}} }{k (1 + a e^{\beta x})^{-\frac{1}{\beta}}} \\ &= 1 - \left(\dfrac{1 + a e^{\beta (x + \Delta x)}}{1 + a e^{\beta x}}\right)^{-\frac{1}{\beta}} \\ &= 1 - \left(\dfrac{1 + a e^{\beta x}}{1 + a e^{\beta (x + \Delta x)}}\right)^{\frac{1}{\beta}}\end{split}\]

If we take the logit of $q_x$, we have:

\[\begin{split}\sigma^{-1}(q_x) &= \ln\left(\dfrac{q_x}{1 - q_x}\right) \\ &= \ln\left(\dfrac{ 1 - \left(\dfrac{1 + a e^{\beta x}}{1 + a e^{\beta (x + \Delta x)}}\right)^{\frac{1}{\beta}} }{ \left(\dfrac{1 + a e^{\beta x}}{1 + a e^{\beta (x + \Delta x)}}\right)^{\frac{1}{\beta}} }\right) \\ &= \ln\left( \left(\dfrac{1 + a e^{\beta (x + \Delta x)}}{1 + a e^{\beta x}}\right)^{\frac{1}{\beta}} - 1 \right) \\ &= \ln \left(\dfrac{ (1 + \alpha e^{\beta (x + \Delta x)})^{\frac{1}{\beta}} - (1 + \alpha e^{\beta x})^{\frac{1}{\beta}} }{(1 + \alpha e^{\beta x})^{\frac{1}{\beta}}}\right) \\ &= \ln\left( (1 + \alpha e^{\beta (x + \Delta x)})^{\frac{1}{\beta}} - (1 + \alpha e^{\beta x})^{\frac{1}{\beta}} \right) - \dfrac{1}{\beta}\ln(1 + \alpha e^{\beta x})\end{split}\]

Let us now look at $\sigma^{-1}(q_x) - \sigma^{-1}(q_{x_0})$:

\[\begin{split}\sigma^{-1}(q_x) - \sigma^{-1}(q_{x_0}) &= \ln\left( (1 + \alpha e^{\beta (x + \Delta x)})^{\frac{1}{\beta}} - (1 + \alpha e^{\beta x})^{\frac{1}{\beta}} \right) - \dfrac{1}{\beta}\ln(1 + a e^{\beta x}) \\ &- \ln\left( (1 + \alpha e^{\beta (x_0 + \Delta x)})^{\frac{1}{\beta}} - (1 + \alpha e^{\beta x_0})^{\frac{1}{\beta}} \right) + \dfrac{1}{\beta}\ln(1 + \alpha e^{\beta x_0})\end{split}\]

Now, based on fitting the model to empirical data, typically we have [Appendix D, Table 5, [Kannisto, 1994]]:

$\beta \approx \mathcal{O}(10^{-1})$
$\alpha \approx \mathcal{O}(10^{-5})$

We can use the binomial approximation to simplify the above equation. Let us take:

\[(1 + \alpha e^{\beta x})^{\frac{1}{\beta}}\]

In order to use the binomial approximation, we must have:

\[\begin{split}\left|\alpha e^{\beta x}\right| &< 1 \\ \left|\dfrac{\alpha e^{\beta x}}{\beta}\right| &\ll 1 \\\end{split}\]

Since $x$ represents the age in years, we have $x \in [0, 120]$. These conditions hold for all ages. Using the binomial approximation, we have:

\[(1 + \alpha e^{\beta x})^{\frac{1}{\beta}} \approx 1 + \dfrac{\alpha e^{\beta x}}{\beta}\]

Going back to our equation for $\sigma^{-1}(q_x) - \sigma^{-1}(q_{x_0})$, we have:

\[\begin{split}\sigma^{-1}(q_x) - \sigma^{-1}(q_{x_0}) &\approx \ln\left( 1 + \dfrac{\alpha e^{\beta (x + \Delta x)}}{\beta} - 1 - \dfrac{\alpha e^{\beta x}}{\beta} \right) - \ln\left(1 + \dfrac{\alpha e^{\beta x}}{\beta}\right) \\ &- \ln\left( 1 + \dfrac{\alpha e^{\beta (x_0 + \Delta x)}}{\beta} - 1 - \dfrac{\alpha e^{\beta x_0}}{\beta} \right) + \ln\left(1 + \dfrac{\alpha e^{\beta x_0}}{\beta}\right) \\ &= \ln\left( \dfrac{\alpha e^{\beta (x + \Delta x)}}{\beta} - \dfrac{\alpha e^{\beta x}}{\beta} \right) - \ln\left(1 + \dfrac{\alpha e^{\beta x}}{\beta}\right) \\ &- \ln\left( \dfrac{\alpha e^{\beta (x_0 + \Delta x)}}{\beta} - \dfrac{\alpha e^{\beta x_0}}{\beta} \right) + \ln\left(1 + \dfrac{\alpha e^{\beta x_0}}{\beta}\right) \\ &= \textcolor{orange}{\cancel{\ln\left(\dfrac{\alpha}{\beta}\right)}} + \ln(e^{\beta x})+ \textcolor{magenta}{\cancel{\ln\left(e^{\beta \Delta x} - 1\right)}} - \ln\left(1 + \dfrac{\alpha e^{\beta x}}{\beta}\right) \\ &- \textcolor{orange}{\cancel{\ln\left(\dfrac{\alpha}{\beta}\right)}} - \ln(e^{\beta x_0}) - \textcolor{magenta}{\cancel{\ln\left(e^{\beta \Delta x} - 1\right)}} + \ln\left(1 + \dfrac{\alpha e^{\beta x_0}}{\beta}\right) \\ &= \ln(e^{\beta x})+ \ln\left(1 + \dfrac{\alpha e^{\beta x_0}}{\beta}\right) - \ln\left(1 + \dfrac{\alpha e^{\beta x}}{\beta}\right) - \ln(e^{\beta x_0}) \\ &= \beta (x - x_0) + \ln\left(\dfrac{1 + \dfrac{\alpha e^{\beta x_0}}{\beta}}{1 + \dfrac{\alpha e^{\beta x}}{\beta}}\right)\end{split}\]

The last term is much smaller than the first term, and so we can ignore it. Thus, we have:

\[\sigma^{-1}(q_x) \approx \sigma^{-1}(q_{x_0}) + \beta (x - x_0)\]

If $x_0$ is the age of the person in the starting year of the simulation, then $(\text{year} - \text{year}_0) = (x - x_0)$:

\[\sigma^{-1}(q_x(\text{sex}, \text{age})) = \sigma^{-1}(q_{x_0}(\text{sex}, \text{age})) - \beta_{\text{sex}}(\text{year} - \text{year}_0)\]

The parameter $\beta_{\text{sex}}$ is unknown, and so we first need to calculate it. To do so, we set $\text{year} = \text{year}_C$, the calibration year, and use the Brent root-finding algorithm to optimize $\beta_{\text{sex}}$ such that the life expectancy in the calibration year (which is known) matches the predicted life expectancy.

Once we have found $\beta_{\text{sex}}$, we can use this formula to find the projected death probabilities.