Birth Data

Original Data

To obtain the population data for each year, we used two tables from StatCan:

  1. 1999 - 2021:

    For past years, we used Table 17-10-0005-01 from StatCan.

    The *.csv file can be downloaded from here: 17100005-eng.zip

    and is saved as: LEAP/leap/original_data/17100005.csv

  2. 2021 - 2065:

    For future years, we used Table 17-10-0057-01 from StatCan.

    The *.csv file can be downloaded from here: 17100057-eng.zip

    and is saved as: LEAP/leap/original_data/17100057.csv

Generating Processed Data

To run the data processing for the population data, with data points taken every year:

cd LEAP
python leap/data_generation/birth_data.py --time-delta P1Y

This will update the following data files:

  1. leap/processed_data/{time_delta_tag}/birth/birth_estimate.csv

  2. leap/processed_data/{time_delta_tag}/birth/initial_population.csv

The --time-delta argument must be an ISO 8601 duration string:

ISO 8601          Meaning
----------------  -------
P1Y1M1DT1H1M1.1S  1 year, 1 month, 1 day, 1 hour, 1 minute, 1 second, and 100 milliseconds
P40D              40 days
P1Y1D             1 year and 1 day
P3DT4H59M         3 days, 4 hours, and 59 minutes
PT2H30M           2 hours and 30 minutes
P1M               1 month
PT1M              1 minute
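The position of the T separator decides whether M means months or minutes (P1M versus PT1M). The sketch below makes that concrete with a small regex-based parser. It is an illustration only: parse_iso8601_duration is a hypothetical helper, not part of LEAP, and it says nothing about how leap.utils.TimeDelta actually parses the argument. Week designators (P2W) are not handled.

```python
import re

# ISO 8601 duration of the form PnYnMnDTnHnMnS. Date components come before
# the "T" separator, time components after it.
_DURATION_RE = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_iso8601_duration(text: str) -> dict[str, float]:
    """Split an ISO 8601 duration string into its named components.

    Hypothetical helper for illustration; components that are absent
    from the string are returned as 0.
    """
    match = _DURATION_RE.match(text)
    # Reject non-matching strings, empty durations such as "P" or "PT",
    # and a dangling "T" with no time component after it.
    if (
        match is None
        or text.endswith("T")
        or all(value is None for value in match.groupdict().values())
    ):
        raise ValueError(f"not a supported ISO 8601 duration: {text!r}")
    return {name: float(value or 0) for name, value in match.groupdict().items()}


one_month = parse_iso8601_duration("P1M")    # months == 1, minutes == 0
one_minute = parse_iso8601_duration("PT1M")  # months == 0, minutes == 1
```

Months and years have no fixed length in seconds, which is why the sketch returns separate components instead of a single datetime.timedelta.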

Processed Data

The output of the data generation for the Birth module is two .csv files:

birth_estimate.csv

Column               Type         Description
-------------------  -----------  -----------
timepoint            dt.datetime  the date and time of the start of the time interval (e.g. 2020-01-01 00:00:00)
province             str          the 2-letter province or territory ID (e.g. BC = British Columbia, AB = Alberta, CA = Canada)
N                    int          total number of births (both sexes) during the given time interval and in the given province
prop_male            float        the proportion of births that are male
projection_scenario  str          past for historical data, or one of the projection scenario IDs for future data:

  • LG: low-growth projection

  • HG: high-growth projection

  • M1: medium-growth 1 projection

  • M2: medium-growth 2 projection

  • M3: medium-growth 3 projection

  • M4: medium-growth 4 projection

  • M5: medium-growth 5 projection

  • M6: medium-growth 6 projection

  • FA: fast-aging projection

  • SA: slow-aging projection
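As a reading aid for the columns above, the following sketch loads rows in the birth_estimate.csv layout and derives the number of male births as N * prop_male. The sample rows are made-up values for illustration, not StatCan figures, and male_births is a hypothetical helper. The column order in the real file is not stated here, so the sketch reads columns by name.

```python
import csv
import io

# Illustrative rows only: the numbers are invented, not real StatCan data.
SAMPLE = """timepoint,province,N,prop_male,projection_scenario
2020-01-01 00:00:00,BC,1000,0.51,past
2030-01-01 00:00:00,BC,1200,0.50,M3
"""


def male_births(csv_file) -> dict[tuple[str, str, str], float]:
    """Return the estimated number of male births per row.

    Rows are keyed by (timepoint, province, projection_scenario).
    """
    result = {}
    for row in csv.DictReader(csv_file):
        key = (row["timepoint"], row["province"], row["projection_scenario"])
        result[key] = int(row["N"]) * float(row["prop_male"])
    return result


counts = male_births(io.StringIO(SAMPLE))
```

To read the generated file instead of the sample, pass an open file handle for leap/processed_data/{time_delta_tag}/birth/birth_estimate.csv, with the tag filled in for your run.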

initial_population.csv

Column               Type         Description
-------------------  -----------  -----------
timepoint            dt.datetime  the date and time of the start of the time interval (e.g. 2020-01-01 00:00:00)
province             str          the 2-letter province or territory ID (e.g. BC = British Columbia, AB = Alberta, CA = Canada)
age                  int          age in years
prop_male            float        the proportion of the population of the given age that is male
n_age                float        number of people of the given age living in the given province during the given time interval
n_birth              float        number of births in the given province during the given time interval
projection_scenario  str          past for historical data, or one of the projection scenario IDs for future data:

  • LG: low-growth projection

  • HG: high-growth projection

  • M1: medium-growth 1 projection

  • M2: medium-growth 2 projection

  • M3: medium-growth 3 projection

  • M4: medium-growth 4 projection

  • M5: medium-growth 5 projection

  • M6: medium-growth 6 projection

  • FA: fast-aging projection

  • SA: slow-aging projection

leap.data_generation.birth_data module

leap.data_generation.birth_data.get_projection_scenario_id(projection_scenario: str) → str

Convert the long form of the projection scenario to the 2-letter ID.

Parameters:
projection_scenario: str

The long form of the projection scenario, e.g. Projection scenario M1.

Returns:

The 2-letter ID of the projection scenario, e.g. M1.
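A minimal sketch of this conversion, assuming the long form always contains the phrase "Projection scenario" followed by the 2-character ID. The function name projection_scenario_id is a hypothetical stand-in, and the actual implementation in leap.data_generation.birth_data may differ.

```python
import re


def projection_scenario_id(long_form: str) -> str:
    """Extract the 2-character scenario ID from a long-form label.

    Assumes the label looks like "Projection scenario M1", optionally
    followed by more text; raises ValueError otherwise.
    """
    match = re.search(r"Projection scenario (\w{2})", long_form)
    if match is None:
        raise ValueError(f"unrecognized projection scenario: {long_form!r}")
    return match.group(1)


scenario = projection_scenario_id("Projection scenario M1")  # "M1"
```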

leap.data_generation.birth_data.filter_age_group(age_group: str) → bool

Filter out age group labels that denote grouped categories, i.e. labels containing “Median”, “Average”, “All”, “to”, or “over”.

Parameters:
age_group: str

The age group string.

Returns:

True if the age group is not a grouped category, False otherwise.
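The rule described above can be sketched as a substring check. The name is_single_age and the example labels are assumptions made for illustration; the real function may match the keywords differently.

```python
# Keywords that mark a label as a grouped category, as listed above.
GROUPED_KEYWORDS = ("Median", "Average", "All", "to", "over")


def is_single_age(age_group: str) -> bool:
    """Return True if the label contains none of the grouped-category keywords."""
    return not any(keyword in age_group for keyword in GROUPED_KEYWORDS)


# Example labels are assumed, not taken from the StatCan tables.
labels = ["5 years", "0 to 4 years", "100 years and over", "All ages", "Median age"]
kept = [label for label in labels if is_single_age(label)]
```

Plain substring matching is case-sensitive and also rejects any label that happens to contain "to" inside a longer word, so a stricter variant might match whole words only.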

leap.data_generation.birth_data.interpolate(data: pandas.DataFrame, col_pred: str, time_delta: leap.utils.TimeDelta, columns_group: list[str]) → pandas.DataFrame

Interpolate the values of a column for missing timepoints.

Parameters:
data: pandas.DataFrame

The data to interpolate. Must contain a "timepoint" column.

col_pred: str

The name of the column to predict.

time_delta: leap.utils.TimeDelta

The duration of the time intervals to use for the data, e.g. 1 year, 5 years, etc.

Returns:

A dataframe with the same columns as the input data, but with the values of the column to predict interpolated for the missing timepoints. The dataframe will contain rows for all timepoints between the minimum and maximum timepoints in the input data, with a step size of time_delta.
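To illustrate the shape of the output (one row per step between the minimum and maximum timepoints), here is a pure-Python sketch. It assumes linear interpolation, which the description above does not confirm, and uses datetime.timedelta in place of leap.utils.TimeDelta. The columns_group parameter is not described above, so the sketch ignores grouping. interpolate_linear is a hypothetical name.

```python
from datetime import datetime, timedelta


def interpolate_linear(
    points: list[tuple[datetime, float]], step: timedelta
) -> list[tuple[datetime, float]]:
    """Linearly interpolate values at every `step` from the first to the last point.

    `points` must hold at least two (timepoint, value) pairs with
    strictly increasing timepoints.
    """
    result = []
    current, end = points[0][0], points[-1][0]
    i = 0  # index of the left end of the segment containing `current`
    while current <= end:
        # Move to the next segment once `current` passes its right end.
        while i < len(points) - 2 and current > points[i + 1][0]:
            i += 1
        (t0, v0), (t1, v1) = points[i], points[i + 1]
        fraction = (current - t0) / (t1 - t0)
        result.append((current, v0 + fraction * (v1 - v0)))
        current += step
    return result


# Two yearly points (illustrative values); 2020 has 366 days, so a 183-day
# step lands exactly halfway between them.
yearly = [(datetime(2020, 1, 1), 100.0), (datetime(2021, 1, 1), 200.0)]
half_yearly = interpolate_linear(yearly, timedelta(days=183))
```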

leap.data_generation.birth_data.load_past_births_population_data(time_delta: leap.utils.TimeDelta, min_timepoint: datetime.datetime = datetime.datetime(2000, 1, 1, 0, 0)) → pandas.DataFrame

Load the past birth data from the CSV file.

Parameters:
time_delta: leap.utils.TimeDelta

The duration of the time intervals to use for the data, e.g. 1 year, 5 years, etc.

min_timepoint: datetime.datetime = datetime.datetime(2000, 1, 1, 0, 0)

The minimum timepoint to include in the data.

Returns:

The past birth data. Columns:

  • timepoint: The date / time of the data.

  • province: The 2-letter province ID.

  • N: The total number of births in that time interval.

  • prop_male: The proportion of births in that time interval that are male.

  • projection_scenario: The projection scenario; all values are "past".

leap.data_generation.birth_data.load_projected_births_population_data(time_delta: leap.utils.TimeDelta, min_timepoint: datetime.datetime, max_timepoint: datetime.datetime = datetime.datetime(2070, 1, 1, 0, 0)) → pandas.DataFrame

Load the projected births data from the CSV file from StatCan.

Parameters:
time_delta: leap.utils.TimeDelta

The duration of the time intervals to use for the data, e.g. 1 year, 5 years, etc.

min_timepoint: datetime.datetime

The starting timepoint for the projected data.

max_timepoint: datetime.datetime = datetime.datetime(2070, 1, 1, 0, 0)

The ending timepoint for the projected data.

Returns:

The projected births data. Columns:

  • timepoint: The starting date / time of the time interval.

  • province: The 2-letter province ID.

  • N: The total number of births predicted for that time interval.

  • prop_male: The proportion of predicted births in that time interval that are male.

  • projection_scenario: The projection scenario, one of:

    • LG: low-growth projection

    • HG: high-growth projection

    • M1: medium-growth 1 projection

    • M2: medium-growth 2 projection

    • M3: medium-growth 3 projection

    • M4: medium-growth 4 projection

    • M5: medium-growth 5 projection

    • M6: medium-growth 6 projection

    • FA: fast-aging projection

    • SA: slow-aging projection

leap.data_generation.birth_data.load_past_initial_population_data(time_delta: leap.utils.TimeDelta, min_timepoint: datetime.datetime = datetime.datetime(2000, 1, 1, 0, 0)) → pandas.DataFrame

Load the past initial population data from the CSV file.

Parameters:
time_delta: leap.utils.TimeDelta

The duration of the time intervals to use for the data, e.g. 1 year, 5 years, 1 month, etc.

min_timepoint: datetime.datetime = datetime.datetime(2000, 1, 1, 0, 0)

The starting timepoint for the past data; only timepoints >= this value will be included in the returned data.

Returns:

The past initial population data. Columns:

  • timepoint: The date / time of the data.

  • province: The 2-letter province ID, e.g. BC.

  • age: The age of the population.

  • prop_male: The proportion of the population in that age group that are male.

  • n_age: The total number of people in that age group for the given time interval, province, and projection scenario.

  • n_birth: The total number of births in the given time interval, province, and projection scenario.

  • prop: The ratio of the number of people in that age group to the number of births in that time interval (n_age / n_birth).

  • projection_scenario: The projection scenario; all values are “past”.
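The prop column relates the two counts in each row. A short sketch of that arithmetic, reading prop as n_age divided by n_birth, with made-up values and a hypothetical add_prop helper:

```python
# Illustrative rows only: the counts are invented for the example.
rows = [
    {"age": 0, "n_age": 1000.0, "n_birth": 1000.0},
    {"age": 1, "n_age": 990.0, "n_birth": 1000.0},
    {"age": 2, "n_age": 1010.0, "n_birth": 1000.0},
]


def add_prop(rows: list[dict]) -> list[dict]:
    """Return copies of the rows with prop = n_age / n_birth added."""
    return [{**row, "prop": row["n_age"] / row["n_birth"]} for row in rows]


with_prop = add_prop(rows)
```

A prop above 1 means more people of that age are present than were born in the interval, so the value is a ratio and not a proportion bounded by 1.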

leap.data_generation.birth_data.load_projected_initial_population_data(time_delta: leap.utils.TimeDelta, min_timepoint: datetime.datetime, max_timepoint: datetime.datetime = datetime.datetime(2070, 1, 1, 0, 0)) → pandas.DataFrame

Load the projected initial population data from the CSV file.

Parameters:
time_delta: leap.utils.TimeDelta

The duration of the time intervals to use for the data, e.g. 1 year, 5 years, 1 month, etc.

min_timepoint: datetime.datetime

The starting timepoint for the projected data.

max_timepoint: datetime.datetime = datetime.datetime(2070, 1, 1, 0, 0)

The ending timepoint for the projected data.

Returns:

The projected initial population data. Columns:

  • timepoint: The starting date / time of the time interval.

  • province: The 2-letter province ID, e.g. BC.

  • age: The age of the population.

  • prop_male: The proportion of the population in that age group that are male.

  • n_age: The total number of people in that age group for the given time interval, province, and projection scenario.

  • n_birth: The total number of births in the given time interval, province, and projection scenario.

  • prop: The ratio of the number of people in that age group to the number of births in that time interval (n_age / n_birth).

  • projection_scenario: The projection scenario, one of:

    • LG: low-growth projection

    • HG: high-growth projection

    • M1: medium-growth 1 projection

    • M2: medium-growth 2 projection

    • M3: medium-growth 3 projection

    • M4: medium-growth 4 projection

    • M5: medium-growth 5 projection

    • M6: medium-growth 6 projection

    • FA: fast-aging projection

    • SA: slow-aging projection

leap.data_generation.birth_data.generate_birth_estimate_data(time_delta: leap.utils.TimeDelta, draw_plot: bool = True)

Create/update the birth_estimate.csv file.

Parameters:
time_delta: leap.utils.TimeDelta

The duration of the time intervals to use for the data, e.g. 1 year, 5 years, etc.

draw_plot: bool = True

If True, generate a plot for validation.

leap.data_generation.birth_data.generate_initial_population_data(time_delta: leap.utils.TimeDelta, draw_plot: bool = True)

Create/update the initial_population.csv file.

Parameters:
time_delta: leap.utils.TimeDelta

The duration of the time intervals to use for the data, e.g. 1 year, 5 years, etc.

draw_plot: bool = True

If True, generate a plot for validation.

leap.data_generation.birth_data.plot(df: pandas.DataFrame, y: str, color: str, title: str = '', file_path: pathlib.Path | None = None, width: int = 2000, height: int = 1500)

Plot a column of the birth or population data.

Parameters:
df: pandas.DataFrame

A dataframe containing the data to plot. Must have columns:

  • timepoint (dt.datetime): The given timepoint.

  • province (str): The 2-letter province ID, e.g. BC.

  • projection_scenario (str): The projection scenario, one of:
    • past: past data from StatCan, up to the most recent census date (2021-01-01)

    • LG: low-growth projection

    • HG: high-growth projection

    • M1: medium-growth 1 projection

    • M2: medium-growth 2 projection

    • M3: medium-growth 3 projection

    • M4: medium-growth 4 projection

    • M5: medium-growth 5 projection

    • M6: medium-growth 6 projection

    • FA: fast-aging projection

    • SA: slow-aging projection

y: str

The name of the column in the dataframe which will be plotted as the y data.

color: str

The name of the column in the dataframe which will be used to color the data.

title: str = ''

The title of the plot.

file_path: pathlib.Path | None = None

The path to save the plot to.

width: int = 2000

The width of the plot.

height: int = 1500

The height of the plot.