Birth Data
Original Data
To obtain the population data for each year, we used two tables from StatCan:
1999 - 2021:
For past years, we used Table 17-10-0005-01 from StatCan.
The *.csv file can be downloaded from here: 17100005-eng.zip and is saved as:
LEAP/leap/original_data/17100005.csv
2021 - 2065:
For future years, we used Table 17-10-0057-01 from StatCan.
The *.csv file can be downloaded from here: 17100057-eng.zip and is saved as:
LEAP/leap/original_data/17100057.csv
Generating Processed Data
To run the data processing for the population data, with data points taken every year:
cd LEAP
python leap/data_generation/birth_data.py --time-delta P1Y
This will update the following data files:
- leap/processed_data/{time_delta_tag}/birth/birth_estimate.csv
- leap/processed_data/{time_delta_tag}/birth/initial_population.csv
The --time-delta argument must be an ISO 8601 duration. Some examples:
| ISO 8601 | Meaning |
|---|---|
| P1Y1M1DT1H1M1.1S | 1 year, 1 month, 1 day, 1 hour, 1 minute, 1 second, and 100 milliseconds |
| P40D | 40 days |
| P1Y1D | 1 year and 1 day |
| P3DT4H59M | 3 days, 4 hours, and 59 minutes |
| PT2H30M | 2 hours and 30 minutes |
| P1M | 1 month |
| PT1M | 1 minute |
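The functions below take a leap.utils.TimeDelta. How that class parses the --time-delta string is not documented here. The following standalone sketch illustrates how the duration strings in the table break down into components. parse_iso8601_duration and Duration are hypothetical names, not part of LEAP. The parser deliberately covers only the subset of ISO 8601 shown in the table; weeks, negative durations, and fractional values other than seconds are not handled.

```python
import re
from dataclasses import dataclass

# Hypothetical parser for the subset of ISO 8601 durations shown in the table:
# P[nY][nM][nD][T[nH][nM][n(.n)S]]. Weeks ("W") and negative durations are not handled.
_DURATION_RE = re.compile(
    r"^P(?!$)"                           # must start with P and not be just "P"
    r"(?:(?P<years>\d+)Y)?"
    r"(?:(?P<months>\d+)M)?"             # M before T means months
    r"(?:(?P<days>\d+)D)?"
    r"(?:T(?=\d)"                        # T must be followed by a time component
    r"(?:(?P<hours>\d+)H)?"
    r"(?:(?P<minutes>\d+)M)?"            # M after T means minutes
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?"
    r")?$"
)


@dataclass(frozen=True)
class Duration:
    years: int = 0
    months: int = 0
    days: int = 0
    hours: int = 0
    minutes: int = 0
    seconds: float = 0.0


def parse_iso8601_duration(text: str) -> Duration:
    """Parse a duration string such as "P1Y" or "PT2H30M" into its components."""
    match = _DURATION_RE.match(text)
    if match is None:
        raise ValueError(f"not a supported ISO 8601 duration: {text!r}")
    # Components that are absent from the string default to zero.
    parts = {name: value or "0" for name, value in match.groupdict().items()}
    return Duration(
        years=int(parts["years"]),
        months=int(parts["months"]),
        days=int(parts["days"]),
        hours=int(parts["hours"]),
        minutes=int(parts["minutes"]),
        seconds=float(parts["seconds"]),
    )


print(parse_iso8601_duration("P1Y1M1DT1H1M1.1S"))
# Duration(years=1, months=1, days=1, hours=1, minutes=1, seconds=1.1)
```

Note that "M" is ambiguous on its own. It means months before the "T" separator and minutes after it, which is why P1M and PT1M in the table differ.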
Processed Data
The output of the data generation for the Birth module is two .csv files:
birth_estimate.csv
| Column | Type | Description |
|---|---|---|
| timepoint | datetime | the date and time of the start of the time interval |
| province | str | the 2-letter province or territory ID (e.g., BC) |
| N | | total number of births (both sexes) during the given time interval and in the given province |
| prop_male | float | the proportion of births that are male |
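The prop_male column makes it easy to split each birth total by sex. The following sketch assumes the CSV header uses the column names listed above. The two rows are made-up illustrative values, not real StatCan figures. With the generated file, you would pass its path (under leap/processed_data/{time_delta_tag}/birth/) to pd.read_csv instead of the inline text.

```python
import io

import pandas as pd

# Two made-up rows in the birth_estimate.csv layout (illustrative values only).
csv_text = """timepoint,province,N,prop_male
2021-01-01,BC,40000,0.512
2021-01-01,AB,50000,0.51
"""
births = pd.read_csv(io.StringIO(csv_text), parse_dates=["timepoint"])

# Split each birth total into male and female births using prop_male.
births["n_male"] = births["N"] * births["prop_male"]
births["n_female"] = births["N"] - births["n_male"]
print(births[["timepoint", "province", "n_male", "n_female"]])
```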
initial_population.csv
| Column | Type | Description |
|---|---|---|
| timepoint | datetime | the date and time of the start of the time interval |
| province | str | the 2-letter province or territory ID (e.g., BC) |
| age | | age in years |
| prop_male | float | the proportion of people of the given age that are male |
| n_age | | number of people of the given age living in the given province during the given time interval |
| n_birth | | number of births in the given province during the given time interval |
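Because initial_population.csv has one row per age, province-level totals come from summing over ages. The following sketch assumes the header names listed above. The rows are made-up illustrative values. The sketch computes the number of males per age and then aggregates by timepoint and province.

```python
import io

import pandas as pd

# Made-up rows in the initial_population.csv layout (illustrative values only).
csv_text = """timepoint,province,age,prop_male,n_age,n_birth
2021-01-01,BC,0,0.51,39500,40000
2021-01-01,BC,1,0.51,41000,40000
2021-01-01,AB,0,0.52,49000,50000
"""
pop = pd.read_csv(io.StringIO(csv_text), parse_dates=["timepoint"])

# Males of each age, then totals over all ages present for each timepoint/province.
pop["n_male"] = pop["n_age"] * pop["prop_male"]
totals = pop.groupby(["timepoint", "province"], as_index=False)[["n_age", "n_male"]].sum()
print(totals)
```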
leap.data_generation.birth_data module
- leap.data_generation.birth_data.get_projection_scenario_id(projection_scenario: str) -> str
Convert the long form of the projection scenario to the 2-letter ID.
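The exact long-form labels in 17100057.csv are not shown here. As a sketch, assume they look like "Projection scenario M1: medium-growth", with the 2-letter ID after the word "scenario". Under that assumption, the ID can be pulled out with a regular expression. scenario_id_from_label is a hypothetical stand-in, not LEAP's implementation, so check the pattern against the actual file before relying on it.

```python
import re

# Assumed label style (not confirmed by this page): "Projection scenario M1: medium-growth".
_SCENARIO_ID_RE = re.compile(r"Projection scenario ([A-Z][A-Z0-9]):")


def scenario_id_from_label(projection_scenario: str) -> str:
    """Hypothetical stand-in for get_projection_scenario_id."""
    match = _SCENARIO_ID_RE.search(projection_scenario)
    if match is None:
        raise ValueError(f"unrecognized projection scenario: {projection_scenario!r}")
    return match.group(1)


print(scenario_id_from_label("Projection scenario LG: low-growth"))  # LG
```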
- leap.data_generation.birth_data.filter_age_group(age_group: str) -> bool
Filter out grouped categories such as “Median”, “Average”, “All”, “to”, “over”.
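A simple way to apply the rule above is a substring check against the listed markers. This sketch assumes that True means "keep this row" (a single-year age), which suits a boolean mask over a DataFrame column. That polarity is an assumption; the real function may return the opposite. keep_single_age is a hypothetical name. The check is case-sensitive and naive, so labels containing these substrings for other reasons would also be dropped.

```python
# Substrings that mark aggregated age-group rows, taken from the description above.
_GROUPED_MARKERS = ("Median", "Average", "All", "to", "over")


def keep_single_age(age_group: str) -> bool:
    """Hypothetical sketch: True for single-year ages, False for grouped categories."""
    return not any(marker in age_group for marker in _GROUPED_MARKERS)


labels = ["0 years", "5 years", "0 to 4 years", "100 years and over", "Median age", "All ages"]
print([label for label in labels if keep_single_age(label)])  # ['0 years', '5 years']
```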
- leap.data_generation.birth_data.interpolate(data: pandas.DataFrame, col_pred: str, time_delta: leap.utils.TimeDelta, columns_group: list[str]) -> pandas.DataFrame
Interpolate the values of a column for missing timepoints.
- Parameters:
  - data: pandas.DataFrame
    The data to interpolate. Must contain a "timepoint" column.
  - col_pred: str
    The name of the column to predict.
  - time_delta: leap.utils.TimeDelta
    The duration of the time intervals to use for the data, e.g. 1 year, 5 years, etc.
- Returns:
  A dataframe with the same columns as the input data, but with the values of the column to predict interpolated for the missing timepoints. The dataframe will contain rows for all timepoints between the minimum and maximum timepoints in the input data, with a step size of time_delta.
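The following is a minimal sketch of this kind of gap filling in pandas. It rests on several assumptions that the documentation does not state. A pd.DateOffset stands in for leap.utils.TimeDelta. columns_group is taken to name the columns that identify separate series (for example province), each interpolated independently between its own first and last timepoint. Values are filled by linear interpolation on a regular grid. The sketch also assumes that existing timepoints fall on that grid and that no timepoint is duplicated within a group. interpolate_sketch is a hypothetical name; the LEAP implementation may differ.

```python
import pandas as pd


def interpolate_sketch(
    data: pd.DataFrame,
    col_pred: str,
    step: pd.DateOffset,
    columns_group: list[str],
) -> pd.DataFrame:
    """Fill missing timepoints within each group by linear interpolation."""
    pieces = []
    for keys, group in data.groupby(columns_group):
        # Depending on the pandas version, keys may be a scalar or a tuple.
        keys = keys if isinstance(keys, tuple) else (keys,)
        series = group.set_index("timepoint")[col_pred].sort_index()
        # Regular grid from this group's first to last timepoint, spaced by `step`.
        grid = pd.date_range(series.index.min(), series.index.max(), freq=step)
        filled = series.reindex(grid).interpolate(method="linear")
        piece = filled.rename(col_pred).rename_axis("timepoint").reset_index()
        for column, key in zip(columns_group, keys):
            piece[column] = key
        pieces.append(piece)
    return pd.concat(pieces, ignore_index=True)


df = pd.DataFrame({
    "timepoint": pd.to_datetime(["2000-01-01", "2002-01-01", "2000-01-01", "2001-01-01"]),
    "province": ["BC", "BC", "AB", "AB"],
    "N": [100.0, 300.0, 50.0, 60.0],
})
result = interpolate_sketch(df, "N", pd.DateOffset(years=1), ["province"])
print(result)  # BC gains a 2001-01-01 row with N = 200.0
```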
- leap.data_generation.birth_data.load_past_births_population_data(time_delta: leap.utils.TimeDelta, min_timepoint: datetime.datetime = datetime.datetime(2000, 1, 1, 0, 0)) -> pandas.DataFrame
Load the past birth data from the CSV file.
- Parameters:
  - time_delta: leap.utils.TimeDelta
    The duration of the time intervals to use for the data, e.g. 1 year, 5 years, etc.
  - min_timepoint: datetime.datetime = datetime.datetime(2000, 1, 1, 0, 0)
    The minimum timepoint to include in the data.
- Returns:
  The past birth data. Columns:
  - timepoint: The date / time of the data.
  - province: The 2-letter province ID.
  - N: The total number of births in that time interval.
  - prop_male: The proportion of births in that time interval that are male.
  - projection_scenario: The projection scenario; all values are "past".
- leap.data_generation.birth_data.load_projected_births_population_data(time_delta: leap.utils.TimeDelta, min_timepoint: datetime.datetime, max_timepoint: datetime.datetime = datetime.datetime(2070, 1, 1, 0, 0)) -> pandas.DataFrame
Load the projected births data from the StatCan CSV file.
- Parameters:
  - time_delta: leap.utils.TimeDelta
    The duration of the time intervals to use for the data, e.g. 1 year, 5 years, etc.
  - min_timepoint: datetime.datetime
    The starting timepoint for the projected data.
  - max_timepoint: datetime.datetime = datetime.datetime(2070, 1, 1, 0, 0)
    The ending timepoint for the projected data.
- Returns:
  The projected births data. Columns:
  - timepoint: The starting date / time of the time interval.
  - province: The 2-letter province ID.
  - N: The total number of births predicted for that time interval.
  - prop_male: The proportion of predicted births in that time interval that are male.
  - projection_scenario: The projection scenario, one of:
    - LG: low-growth projection
    - HG: high-growth projection
    - M1: medium-growth 1 projection
    - M2: medium-growth 2 projection
    - M3: medium-growth 3 projection
    - M4: medium-growth 4 projection
    - M5: medium-growth 5 projection
    - M6: medium-growth 6 projection
    - FA: fast-aging projection
    - SA: slow-aging projection
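The past and projected loaders document the same columns. Their outputs can therefore be stacked into one long table whose projection_scenario column mixes "past" with the projection IDs. This matches the values that plot accepts. The following sketch does that and rejects any scenario ID not listed on this page. combine_births is a hypothetical helper, and the example rows are made up.

```python
import pandas as pd

# Scenario IDs documented on this page, plus "past" for the historical rows.
KNOWN_SCENARIOS = {"past", "LG", "HG", "M1", "M2", "M3", "M4", "M5", "M6", "FA", "SA"}


def combine_births(past: pd.DataFrame, projected: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: stack past and projected births, rejecting unknown scenarios."""
    combined = pd.concat([past, projected], ignore_index=True)
    unknown = set(combined["projection_scenario"]) - KNOWN_SCENARIOS
    if unknown:
        raise ValueError(f"unexpected projection scenarios: {sorted(unknown)}")
    return combined.sort_values(
        ["projection_scenario", "province", "timepoint"], ignore_index=True
    )


columns = ["timepoint", "province", "N", "prop_male", "projection_scenario"]
past = pd.DataFrame([[pd.Timestamp("2020-01-01"), "BC", 41000, 0.51, "past"]], columns=columns)
projected = pd.DataFrame([[pd.Timestamp("2022-01-01"), "BC", 42000, 0.51, "M1"]], columns=columns)
births = combine_births(past, projected)
print(births)
```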
- leap.data_generation.birth_data.load_past_initial_population_data(time_delta: leap.utils.TimeDelta, min_timepoint: datetime.datetime = datetime.datetime(2000, 1, 1, 0, 0)) -> pandas.DataFrame
Load the past initial population data from the CSV file.
- Parameters:
  - time_delta: leap.utils.TimeDelta
    The duration of the time intervals to use for the data, e.g. 1 year, 5 years, 1 month, etc.
  - min_timepoint: datetime.datetime = datetime.datetime(2000, 1, 1, 0, 0)
    The starting timepoint for the past data; only timepoints >= this value will be included in the returned data.
- Returns:
  The past initial population data. Columns:
  - timepoint: The date / time of the data.
  - province: The 2-letter province ID, e.g. BC.
  - age: The age in years.
  - prop_male: The proportion of the population in that age group that are male.
  - n_age: The total number of people in that age group for the given time interval, province, and projection scenario.
  - n_birth: The total number of births in the given time interval, province, and projection scenario.
  - prop: The ratio of the number of people in that age group to the total number of births in that time interval.
  - projection_scenario: The projection scenario; all values are "past".
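Read literally, the prop column is n_age divided by n_birth for the same row. The following minimal sketch assumes that definition; the loader's actual computation is not shown on this page, and the rows are made-up illustrative values. Because it is a ratio rather than a share, prop can exceed 1.

```python
import pandas as pd

# Made-up rows with the n_age and n_birth columns described above.
pop = pd.DataFrame({
    "province": ["BC", "BC"],
    "age": [0, 1],
    "n_age": [39500, 41000],
    "n_birth": [40000, 40000],
})

# Assumed definition: people of a given age relative to births in the same interval and province.
pop["prop"] = pop["n_age"] / pop["n_birth"]
print(pop)
```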
- leap.data_generation.birth_data.load_projected_initial_population_data(time_delta: leap.utils.TimeDelta, min_timepoint: datetime.datetime, max_timepoint: datetime.datetime = datetime.datetime(2070, 1, 1, 0, 0)) -> pandas.DataFrame
Load the projected initial population data from the CSV file.
- Parameters:
  - time_delta: leap.utils.TimeDelta
    The duration of the time intervals to use for the data, e.g. 1 year, 5 years, 1 month, etc.
  - min_timepoint: datetime.datetime
    The starting timepoint for the projected data.
  - max_timepoint: datetime.datetime = datetime.datetime(2070, 1, 1, 0, 0)
    The ending timepoint for the projected data.
- Returns:
  The projected initial population data. Columns:
  - timepoint: The starting date / time of the time interval.
  - province: The 2-letter province ID, e.g. BC.
  - age: The age in years.
  - prop_male: The proportion of the population in that age group that are male.
  - n_age: The total number of people in that age group for the given time interval, province, and projection scenario.
  - n_birth: The total number of births in the given time interval, province, and projection scenario.
  - prop: The ratio of the number of people in that age group to the total number of births in that time interval.
  - projection_scenario: The projection scenario, one of:
    - LG: low-growth projection
    - HG: high-growth projection
    - M1: medium-growth 1 projection
    - M2: medium-growth 2 projection
    - M3: medium-growth 3 projection
    - M4: medium-growth 4 projection
    - M5: medium-growth 5 projection
    - M6: medium-growth 6 projection
    - FA: fast-aging projection
    - SA: slow-aging projection
- leap.data_generation.birth_data.generate_birth_estimate_data(time_delta: leap.utils.TimeDelta, draw_plot: bool = True)
Create or update the birth_estimate.csv file.
- Parameters:
  - time_delta: leap.utils.TimeDelta
    The duration of the time intervals to use for the data, e.g. 1 year, 5 years, etc.
  - draw_plot: bool = True
    If True, generate a plot for validation.
- leap.data_generation.birth_data.generate_initial_population_data(time_delta: leap.utils.TimeDelta, draw_plot: bool = True)
Create or update the initial_population.csv file.
- Parameters:
  - time_delta: leap.utils.TimeDelta
    The duration of the time intervals to use for the data, e.g. 1 year, 5 years, etc.
  - draw_plot: bool = True
    If True, generate a plot for validation.
- leap.data_generation.birth_data.plot(df: pandas.DataFrame, y: str, color: str, title: str = '', file_path: pathlib.Path | None = None, width: int = 2000, height: int = 1500)
Plot a column of the birth or population data.
- Parameters:
  - df: pandas.DataFrame
    A dataframe containing birth or population data. Must have columns:
    - timepoint (dt.datetime): The given timepoint.
    - province (str): The 2-letter province ID, e.g. BC.
    - projection_scenario (str): The projection scenario, one of:
      - past: past data from StatCan, up to the most recent census date (2021-01-01)
      - LG: low-growth projection
      - HG: high-growth projection
      - M1: medium-growth 1 projection
      - M2: medium-growth 2 projection
      - M3: medium-growth 3 projection
      - M4: medium-growth 4 projection
      - M5: medium-growth 5 projection
      - M6: medium-growth 6 projection
      - FA: fast-aging projection
      - SA: slow-aging projection
  - y: str
    The name of the column in the dataframe which will be plotted as the y data.
  - color: str
    The name of the column in the dataframe which will be used to color the data.
  - title: str = ''
    The title of the plot.
  - file_path: pathlib.Path | None = None
    The path to save the plot to.
  - width: int = 2000
    The width of the plot.
  - height: int = 1500
    The height of the plot.