Birth Model

Data

To obtain the population data for each year, we used two tables from Statistics Canada:

Past Data: 1999 - 2021

For past years, we used Table 17-10-0005-01 from Statistics Canada. This table contains the population data that the Canadian government collects and publishes, stratified by different variables.

The *.csv file can be downloaded from here: 17100005-eng.zip and is saved as: LEAP/leap/original_data/17100005.csv.

The relevant columns are:

| Column | Type | Description |
|---|---|---|
| REF_DATE | int | the calendar year |
| AGE_GROUP | str | the age of the person in years |
| GEO | str | the province or territory full name |
| SEX | str | one of “Both sexes”, “Females”, or “Males” |
| VALUE | int | the population in that year, province, sex, and age group |

Projected Data: 2021 - 2065

For future years, we used Table 17-10-0057-01 from Statistics Canada. Statistics Canada provides projected population data based on different projection scenarios.

The *.csv file can be downloaded from here: 17100057-eng.zip and is saved as: LEAP/leap/original_data/17100057.csv.

The relevant columns are:

| Column | Type | Description |
|---|---|---|
| REF_DATE | int | the calendar year |
| AGE_GROUP | str | the age of the person in years |
| GEO | str | the province or territory full name |
| Sex | str | one of “Both sexes”, “Females”, or “Males” |
| Projection scenario | str | the projection scenario used to model population growth (IDs listed below the table) |
| VALUE | int | the population in that year, province, sex, age group, and projection scenario |

The projection scenario IDs are:

  • LG: low-growth projection
  • HG: high-growth projection
  • M1: medium-growth 1 projection
  • M2: medium-growth 2 projection
  • M3: medium-growth 3 projection
  • M4: medium-growth 4 projection
  • M5: medium-growth 5 projection
  • M6: medium-growth 6 projection
  • FA: fast-aging projection
  • SA: slow-aging projection

Processed Data

The two source tables are combined by leap/data_generation/birth_data.py into a single processed file saved as: leap/processed_data/birth/birth_estimate.csv.

Past data (from 17100005.csv) covers years 1999 onwards using actual population counts. Projected data (from 17100057.csv) begins the year after the last available past year and covers projections up to 2065. In the projected source file, VALUE is stored in thousands and is multiplied by 1000 during processing.
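The combination rule described above can be sketched in a few lines of Python. This is only an illustration, not the code in birth_data.py: the function name combine_births, the value_thousands key, and all numbers are hypothetical, and the rows are assumed to be already reduced to one record per year and province.

```python
def combine_births(past_rows, projected_rows):
    """Append projected rows that start the year after the last past year."""
    last_past_year = max(row["year"] for row in past_rows)
    # Historical rows are labelled with the "past" scenario.
    combined = [dict(row, projection_scenario="past") for row in past_rows]
    for row in projected_rows:
        if row["year"] <= last_past_year:
            continue  # the past table already covers this year
        combined.append({
            "year": row["year"],
            "province": row["province"],
            # Projected VALUE is stored in thousands: convert to a head count.
            "N": round(row["value_thousands"] * 1000),
            "projection_scenario": row["projection_scenario"],
        })
    return combined


# Made-up numbers for illustration only (not Statistics Canada data).
past = [
    {"year": 2020, "province": "CA", "N": 360000},
    {"year": 2021, "province": "CA", "N": 365000},
]
projected = [
    {"year": 2021, "province": "CA", "value_thousands": 366.0,
     "projection_scenario": "M3"},
    {"year": 2022, "province": "CA", "value_thousands": 371.5,
     "projection_scenario": "M3"},
]
combined = combine_births(past, projected)
```

In this sketch the projected 2021 row is dropped because the past data already covers 2021, so the projections start in 2022.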

For both sources, only the AGE_GROUP = 0 (newborns) rows are used. The N column represents the total number of births (both sexes combined), and prop_male is derived as the number of male births divided by the total.
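As a rough sketch of how N and prop_male could be derived from rows shaped like the source tables, consider the example below. It is an assumption-laden illustration rather than the project's implementation: summarize_births and value_scale are hypothetical names, the sample rows are invented, and the newborn label is taken to be the literal string 0 as written above.

```python
import csv
import io

# Invented sample rows using the columns described above (not real data).
SAMPLE_CSV = """REF_DATE,GEO,SEX,AGE_GROUP,VALUE
2000,British Columbia,Both sexes,0,40000
2000,British Columbia,Males,0,20500
2000,British Columbia,Females,0,19500
2000,British Columbia,Both sexes,1,41000
"""


def summarize_births(rows, value_scale=1):
    """Return {(year, geo): {"N": ..., "prop_male": ...}} for newborn rows.

    value_scale would be 1000 for the projected table, whose VALUE column
    is stored in thousands.
    """
    totals = {}
    males = {}
    for row in rows:
        if row["AGE_GROUP"] != "0":  # keep newborns only
            continue
        key = (int(row["REF_DATE"]), row["GEO"])
        value = float(row["VALUE"]) * value_scale
        if row["SEX"] == "Both sexes":
            totals[key] = value
        elif row["SEX"] == "Males":
            males[key] = value
    # prop_male = male births / total births (both sexes)
    return {
        key: {"N": round(total), "prop_male": males[key] / total}
        for key, total in totals.items()
        if key in males and total > 0
    }


births = summarize_births(csv.DictReader(io.StringIO(SAMPLE_CSV)))
```

For the sample rows, the single British Columbia entry for 2000 has N = 40000 and prop_male = 20500 / 40000 = 0.5125; the age 1 row is ignored.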

| Column | Type | Description |
|---|---|---|
| year | int | the calendar year |
| province | str | the 2-letter province or territory ID (e.g., BC = British Columbia, AB = Alberta, CA = Canada) |
| N | int | the total number of births (both sexes) in that year and province |
| prop_male | float | the proportion of births that are male |
| projection_scenario | str | past for historical data, or one of the projection scenario IDs listed below for future data |

The projection scenario IDs are:

  • LG: low-growth projection
  • HG: high-growth projection
  • M1: medium-growth 1 projection
  • M2: medium-growth 2 projection
  • M3: medium-growth 3 projection
  • M4: medium-growth 4 projection
  • M5: medium-growth 5 projection
  • M6: medium-growth 6 projection
  • FA: fast-aging projection
  • SA: slow-aging projection
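To show how the processed columns fit together, here is a minimal lookup sketch. It assumes the file has exactly the columns listed above; the helper find_births is hypothetical and the sample rows are invented, so the real birth_estimate.csv should be substituted when trying it out.

```python
import csv
import io

# Invented rows with the processed columns (not the real file contents).
SAMPLE = """year,province,N,prop_male,projection_scenario
2020,CA,360000,0.512,past
2030,CA,380000,0.513,M3
2030,CA,350000,0.513,LG
"""


def find_births(rows, year, province, projection_scenario):
    """Return N and prop_male for one year, province, and scenario."""
    for row in rows:
        if (
            int(row["year"]) == year
            and row["province"] == province
            and row["projection_scenario"] == projection_scenario
        ):
            return {"N": int(row["N"]), "prop_male": float(row["prop_male"])}
    return None  # no matching row


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
record = find_births(rows, 2030, "CA", "M3")
```

Historical years are looked up with projection_scenario set to past, while future years need one of the scenario IDs, since the same year and province appear once per scenario.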