Mortality Model
Data
To obtain the mortality data for each year, we used one table from Statistics Canada:
Past Data: 1996 - 2021
For past years, we used
Table 13-10-00837-01 from StatCan.
The *.csv file can be downloaded from here:
13100837-eng.zip
and is saved as:
LEAP/leap/original_data/13100837.csv.
The relevant columns are:
Column |
Type |
Description |
REF_DATE
|
int
|
the calendar year |
AGE_GROUP
|
str
|
the age of the person in years |
GEO
|
str
|
the province or terriroty full name |
SEX
|
str
|
one of “Both sexes”, “Females”, or “Males” |
ELEMENT
|
str
|
describes what the variable of interest is; we want "Death probability between age x and x+1 (qx)" |
VALUE
|
int
|
the probability of death between age x and x+1 in that year, province, sex, and age group |
Projected Data: 2021 - 2068
Statistics Canada doesn’t provide annual projections for death probabilities, but does
provide a projection for specific years (which we call calibration years):
Region |
Year |
Projection Scenario |
Mortality Scenario |
Canada |
2028
|
LG
|
HM
|
Canada |
2028
|
M1
|
MM
|
Canada |
2028
|
M2
|
MM
|
Canada |
2028
|
M3
|
MM
|
Canada |
2028
|
M4
|
MM
|
Canada |
2028
|
M5
|
MM
|
Canada |
2028
|
HG
|
LM
|
Canada |
2028
|
SA
|
HM
|
Canada |
2028
|
FA
|
LM
|
Canada |
2048
|
LG
|
HM
|
Canada |
2048
|
M1
|
MM
|
Canada |
2048
|
M2
|
MM
|
Canada |
2028
|
M3
|
MM
|
Canada |
2048
|
M4
|
MM
|
Canada |
2048
|
M5
|
MM
|
Canada |
2048
|
HG
|
LM
|
Canada |
2048
|
SA
|
HM
|
Canada |
2048
|
FA
|
LM
|
Canada |
2073
|
LG
|
HM
|
Canada |
2073
|
M1
|
MM
|
Canada |
2073
|
M2
|
MM
|
Canada |
2073
|
M3
|
MM
|
Canada |
2073
|
M4
|
MM
|
Canada |
2073
|
M5
|
MM
|
Canada |
2073
|
HG
|
LM
|
Canada |
2073
|
SA
|
HM
|
Canada |
2073
|
FA
|
LM
|
Provinces / Territories |
2028
|
HG, FA
|
LM
|
Provinces / Territories |
2028
|
M1, M2, M3, M4, M5, M6
|
MM
|
Provinces / Territories |
2028
|
LG, SA
|
HM
|
Provinces / Territories |
2033
|
HG, FA
|
LM
|
Provinces / Territories |
2033
|
M1, M2, M3, M4, M5, M6
|
MM
|
Provinces / Territories |
2033
|
LG, SA
|
HM
|
Provinces / Territories |
2038
|
HG, FA
|
LM
|
Provinces / Territories |
2038
|
M1, M2, M3, M4, M5, M6
|
MM
|
Provinces / Territories |
2038
|
LG, SA
|
HM
|
Provinces / Territories |
2043
|
HG, FA
|
LM
|
Provinces / Territories |
2043
|
M1, M2, M3, M4, M5, M6
|
MM
|
Provinces / Territories |
2043
|
LG, SA
|
HM
|
Provinces / Territories |
2048
|
HG, FA
|
LM
|
Provinces / Territories |
2048
|
M1, M2, M3, M4, M5, M6
|
MM
|
Provinces / Territories |
2048
|
LG, SA
|
HM
|
Provinces / Territories |
2053
|
HG, FA
|
LM
|
Provinces / Territories |
2053
|
M1, M2, M3, M4, M5, M6
|
MM
|
Provinces / Territories |
2053
|
LG, SA
|
HM
|
Provinces / Territories |
2058
|
HG, FA
|
LM
|
Provinces / Territories |
2058
|
M1, M2, M3, M4, M5, M6
|
MM
|
Provinces / Territories |
2058
|
LG, SA
|
HM
|
Provinces / Territories |
2063
|
HG, FA
|
LM
|
Provinces / Territories |
2063
|
M1, M2, M3, M4, M5, M6
|
MM
|
Provinces / Territories |
2063
|
LG, SA
|
HM
|
Provinces / Territories |
2068
|
HG, FA
|
LM
|
Provinces / Territories |
2068
|
M1, M2, M3, M4, M5, M6
|
MM
|
Provinces / Territories |
2068
|
LG, SA
|
HM
|
Provinces / Territories |
2073
|
HG, FA
|
LM
|
Provinces / Territories |
2073
|
M1, M2, M3, M4, M5, M6
|
MM
|
Provinces / Territories |
2073
|
LG, SA
|
HM
|
This data can be found in the Statistics Canada Population Projections Technical Report:
Table 3.1, Table 3.2, Table 5.2.1, Table 5.2.2, Table 5.2.3.
Model
We have mortality data for past years (1996 - 2020), and life expectancy projections for
specific future years; but we would like to have mortality data for all future years in our
model. Statistics Canada describes how they model mortality here:
Methods for Constructing Life Tables for Canada, Provinces and Territories.
In particular, the model they use is the Kannisto-Thatcher model, described in this paper:
On the use of Kannisto model for mortality trajectory modelling at very old ages.
According to the Kannisto-Thatcher model, the instantaneous probability of death at age \(x\)
is given by:
\[\begin{split}\mu(x) &= \dfrac{a e^{\beta x}}{1 + a e^{\beta x}} \\
&= \lim_{\Delta x \to 0} \dfrac{P(\text{death between age $x$ and $x + \Delta x$} \mid \text{survived till $x$})}{\Delta x}\end{split}\]
In mathematical terms, \(\mu(x)\) is the hazard rate. Let’s break this down further.
Let \(F_X(x)\) be the cumulative distribution function for age at death, \(X\):
\[\begin{split}F_X(x) :&= P(\text{age at death} \leq \text{given age}) \\
&= P(X \leq x)\end{split}\]
We want the conditional probability of death between age \(x\) and \(x + \Delta x\),
given that the person has survived till age \(x\). This is given by:
\[P(x < X \leq x + \Delta x \mid X > x)\]
Recall that for a conditional probability:
\[P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}\]
and so:
\[P(x < X \leq x + \Delta x \mid X > x) = \dfrac{P(x < X \leq x + \Delta x \bigcap X > x)}{P(X > x)}\]
Since \(F_X(x)\) is the cumulative distribution function, by definition it must sum to 1:
\[P(X > x) = 1 - F_X(x)\]
Since \(X > x\) if \(x < X \leq x + \Delta x\), we can rewrite the numerator as:
\[\begin{split}P(x < X \leq x + \Delta x \bigcap X > x) &= P(x < X \leq x + \Delta x) \\
&= F_X(x + \Delta x) - F_X(x)\end{split}\]
Putting it all together, we have:
\[P(x < X \leq x + \Delta x \mid X > x) =
\dfrac{F_X(x + \Delta x) - F_X(x)}{1 - F_X(x)}\]
Now, we want to find the instantaneous rate of death; the probability of death per unit time.
If we take the limit as \(\Delta x \to 0\), we will find the instantaneous probability of
death at age \(x\). To get the probability of death per unit time, we need to divide by
\(\Delta x\):
\[\mu(x) = \lim_{\Delta x \to 0} \dfrac{F_X(x + \Delta x) - F_X(x)}{\Delta x (1 - F_X(x))}\]
You will recognize the derivative of \(F_X(x)\):
\[\dfrac{d}{dx} F_X(x) = \lim_{\Delta x \to 0} \dfrac{F_X(x + \Delta x) - F_X(x)}{\Delta x}\]
and so:
\[\mu(x) = \dfrac{F_X'(x)}{1 - F_X(x)}\]
The data in the Statistics Canada mortality table is the probability of death between age
\(x\) and \(x + 1\), which is denoted as \(q_x\). This is the same as the probability
\(P(x < X \leq x + \Delta x \mid X > x)\), with \(\Delta x = 1\). We would like to solve
for \(q_x\), using the Kannisto-Thatcher Equation for \(\mu(x)\). First, we can
write \(q_x\) in terms of \(F_X(x)\):
\[\begin{split}q_x &= P(x < X \leq x + 1 \mid X > x) \\
&= \dfrac{F_X(x + 1) - F_X(x)}{1 - F_X(x)}\end{split}\]
Let us define \(S_X(x)\), the survival function, for convenience:
\[\begin{split}S_X(x) &:= 1 - F_X(x) \\
&= P(X > x)\end{split}\]
Then we have:
\[\dfrac{dS}{dx} = -F_X'(x)\]
and so \(\mu(x)\) can be rewritten as:
\[\mu(x) = -\dfrac{dS}{dx}\dfrac{1}{S_X(x)}\]
Solving this first order separable linear differential equation, we have:
\[\begin{split}\int \dfrac{dS}{S_X} &= -\int \mu(x) dx \\
\ln(S_X(x)) &= -\int \mu(x) dx + C \\
&= -\int \dfrac{a e^{\beta x}}{1 + a e^{\beta x}} dx + C\end{split}\]
Letting \(u(x) := 1 + a e^{\beta x}\), we have:
\[\begin{split}\ln(S_X(x)) &= - \dfrac{1}{\beta} \int \dfrac{du}{u} + C \\
&= - \dfrac{1}{\beta} \ln(u(x)) + C \\
S_X(x) &= e^C (1 + a e^{\beta x})^{-\frac{1}{\beta}} \\
&= k (1 + a e^{\beta x})^{-\frac{1}{\beta}} \\
1 - F_X(x) &= k (1 + a e^{\beta x})^{-\frac{1}{\beta}} \\
F_X(x) &= 1 - k (1 + a e^{\beta x})^{-\frac{1}{\beta}}\end{split}\]
Now, we can substitute this into the equation for \(q_x\):
\[\begin{split}q_x &= \dfrac{F_X(x + \Delta x) - F_X(x)}{1 - F_X(x)} \\
&= \dfrac{
1 - k (1 + a e^{\beta (x + \Delta x)})^{-\frac{1}{\beta}} -
1 + k (1 + a e^{\beta x})^{-\frac{1}{\beta}}
}{k (1 + a e^{\beta x})^{-\frac{1}{\beta}}} \\
&= \dfrac{
- k (1 + a e^{\beta (x + \Delta x)})^{-\frac{1}{\beta}}
+ k (1 + a e^{\beta x})^{-\frac{1}{\beta}}
}{k (1 + a e^{\beta x})^{-\frac{1}{\beta}}} \\
&= 1 - \left(\dfrac{1 + a e^{\beta (x + \Delta x)}}{1 + a e^{\beta x}}\right)^{-\frac{1}{\beta}} \\
&= 1 - \left(\dfrac{1 + a e^{\beta x}}{1 + a e^{\beta (x + \Delta x)}}\right)^{\frac{1}{\beta}}\end{split}\]
If we take the logit of \(q_x\), we have:
\[\begin{split}\sigma^{-1}(q_x) &= \ln\left(\dfrac{q_x}{1 - q_x}\right) \\
&= \ln\left(\dfrac{
1 - \left(\dfrac{1 + a e^{\beta x}}{1 + a e^{\beta (x + \Delta x)}}\right)^{\frac{1}{\beta}}
}{
\left(\dfrac{1 + a e^{\beta x}}{1 + a e^{\beta (x + \Delta x)}}\right)^{\frac{1}{\beta}}
}\right) \\
&= \ln\left(
\left(\dfrac{1 + a e^{\beta (x + \Delta x)}}{1 + a e^{\beta x}}\right)^{\frac{1}{\beta}} - 1
\right) \\
&= \ln
\left(\dfrac{
(1 + \alpha e^{\beta (x + \Delta x)})^{\frac{1}{\beta}} -
(1 + \alpha e^{\beta x})^{\frac{1}{\beta}}
}{(1 + \alpha e^{\beta x})^{\frac{1}{\beta}}}\right)
\\
&= \ln\left(
(1 + \alpha e^{\beta (x + \Delta x)})^{\frac{1}{\beta}} -
(1 + \alpha e^{\beta x})^{\frac{1}{\beta}}
\right) -
\dfrac{1}{\beta}\ln(1 + \alpha e^{\beta x})\end{split}\]
Let us now look at \(\sigma^{-1}(q_x) - \sigma^{-1}(q_{x_0})\):
\[\begin{split}\sigma^{-1}(q_x) - \sigma^{-1}(q_{x_0}) &=
\ln\left(
(1 + \alpha e^{\beta (x + \Delta x)})^{\frac{1}{\beta}} -
(1 + \alpha e^{\beta x})^{\frac{1}{\beta}}
\right) -
\dfrac{1}{\beta}\ln(1 + a e^{\beta x}) \\
&-
\ln\left(
(1 + \alpha e^{\beta (x_0 + \Delta x)})^{\frac{1}{\beta}} -
(1 + \alpha e^{\beta x_0})^{\frac{1}{\beta}}
\right) +
\dfrac{1}{\beta}\ln(1 + \alpha e^{\beta x_0})\end{split}\]
Now, based on fitting the model to empirical data, typically we have
[Appendix D, Table 5, [Kannisto, 1994]]:
\(\beta \approx \mathcal{O}(10^{-1})\)
\(\alpha \approx \mathcal{O}(10^{-5})\)
We can use the binomial approximation to simplify the above equation. Let us take:
\[(1 + \alpha e^{\beta x})^{\frac{1}{\beta}}\]
In order to use the binomial approximation, we must have:
\[\begin{split}\left|\alpha e^{\beta x}\right| &< 1 \\
\left|\dfrac{\alpha e^{\beta x}}{\beta}\right| &\ll 1 \\\end{split}\]
Since \(x\) represents the age in years, we have \(x \in [0, 120]\). These conditions hold
for all ages. Using the binomial approximation, we have:
\[(1 + \alpha e^{\beta x})^{\frac{1}{\beta}} \approx 1 + \dfrac{\alpha e^{\beta x}}{\beta}\]
Going back to our equation for \(\sigma^{-1}(q_x) - \sigma^{-1}(q_{x_0})\), we have:
\[\begin{split}\sigma^{-1}(q_x) - \sigma^{-1}(q_{x_0}) &\approx
\ln\left(
1 + \dfrac{\alpha e^{\beta (x + \Delta x)}}{\beta} -
1 - \dfrac{\alpha e^{\beta x}}{\beta}
\right) -
\ln\left(1 + \dfrac{\alpha e^{\beta x}}{\beta}\right) \\
&-
\ln\left(
1 + \dfrac{\alpha e^{\beta (x_0 + \Delta x)}}{\beta} -
1 - \dfrac{\alpha e^{\beta x_0}}{\beta}
\right) +
\ln\left(1 + \dfrac{\alpha e^{\beta x_0}}{\beta}\right) \\
&= \ln\left(
\dfrac{\alpha e^{\beta (x + \Delta x)}}{\beta} -
\dfrac{\alpha e^{\beta x}}{\beta}
\right) -
\ln\left(1 + \dfrac{\alpha e^{\beta x}}{\beta}\right) \\
&-
\ln\left(
\dfrac{\alpha e^{\beta (x_0 + \Delta x)}}{\beta} -
\dfrac{\alpha e^{\beta x_0}}{\beta}
\right) +
\ln\left(1 + \dfrac{\alpha e^{\beta x_0}}{\beta}\right) \\
&= \textcolor{orange}{\cancel{\ln\left(\dfrac{\alpha}{\beta}\right)}} + \ln(e^{\beta x})+
\textcolor{magenta}{\cancel{\ln\left(e^{\beta \Delta x} - 1\right)}} -
\ln\left(1 + \dfrac{\alpha e^{\beta x}}{\beta}\right) \\
&-
\textcolor{orange}{\cancel{\ln\left(\dfrac{\alpha}{\beta}\right)}} - \ln(e^{\beta x_0}) -
\textcolor{magenta}{\cancel{\ln\left(e^{\beta \Delta x} - 1\right)}} +
\ln\left(1 + \dfrac{\alpha e^{\beta x_0}}{\beta}\right) \\
&= \ln(e^{\beta x})+
\ln\left(1 + \dfrac{\alpha e^{\beta x_0}}{\beta}\right) -
\ln\left(1 + \dfrac{\alpha e^{\beta x}}{\beta}\right) -
\ln(e^{\beta x_0}) \\
&= \beta (x - x_0) +
\ln\left(\dfrac{1 + \dfrac{\alpha e^{\beta x_0}}{\beta}}{1 + \dfrac{\alpha e^{\beta x}}{\beta}}\right)\end{split}\]
The last term is much smaller than the first term, and so we can ignore it. Thus, we have:
\[\sigma^{-1}(q_x) \approx \sigma^{-1}(q_{x_0}) + \beta (x - x_0)\]
If \(x_0\) is the age of the person in the starting year of the simulation, then
\((\text{year} - \text{year}_0) = (x - x_0)\):
\[\sigma^{-1}(q_x(\text{sex}, \text{age})) =
\sigma^{-1}(q_{x_0}(\text{sex}, \text{age})) -
\beta_{\text{sex}}(\text{year} - \text{year}_0)\]
The parameter \(\beta_{\text{sex}}\) is unknown, and so we first need to calculate it.
To do so, we set \(\text{year} = \text{year}_C\), the calibration year, and use the Brent
root-finding algorithm to optimize \(\beta_{\text{sex}}\) such that the life expectancy in the
calibration year (which is known) matches the predicted life expectancy.
Once we have found \(\beta_{\text{sex}}\), we can use this formula to find the projected death
probabilities.