Migration Data
StatCan does not contain immigration/emigration data broken down by the necessary groups (age, sex, etc.), so we do not have exact data for this category. Instead, we use the following data files:
- leap/processed_data/life_table.csv (generated by death_data.py)
- leap/processed_data/birth/initial_pop_distribution_prop.csv (generated by birth_data.py)
The life_table.csv file contains the probability of death during that year for each age, sex, province, and projection scenario.
The initial_pop_distribution_prop.csv file contains the number of people for a given age, sex, province, and projection scenario, along with the number of births for that year. This data is the net number of people, factoring in death, immigration, and emigration.
To obtain the net migration for anyone of age > 0, we first compute the number of people in each age group projected to die during that year, based on the prob_death column in life_table.csv. We then take the population counts from the n_age column in initial_pop_distribution_prop.csv. Comparing the current population with the expected survivors from the previous year gives the net number of people who migrated:
delta_n = n - n_prev * (1 - prob_death)
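As a worked illustration of the formula, the sketch below plugs in made-up numbers; they are not taken from the LEAP data files. The function name `delta_n_sketch` is hypothetical and only mirrors the formula above.

```python
def delta_n_sketch(n: float, n_prev: float, prob_death: float) -> float:
    """Net migration: current population minus expected survivors from last year."""
    expected_survivors = n_prev * (1 - prob_death)
    return n - expected_survivors


# Made-up numbers: 1000 people in the cohort last year, a 1% probability of
# death, and 1010 people in the cohort this year.
delta = delta_n_sketch(n=1010.0, n_prev=1000.0, prob_death=0.01)
print(delta)
```

With these numbers about 990 people are expected to survive, so the remaining difference of roughly 20 people is attributed to net migration.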
To run the data generation for the migration data:
cd LEAP
python3 leap/data_generation/migration_data.py
leap.data_generation.migration_data module
- leap.data_generation.migration_data.get_prev_year_population(df: pandas.core.frame.DataFrame, sex: str, year: int, age: int, min_year: int, min_age: int) -> pandas.core.series.Series
Get the age, sex, probability of death, and population for the previous year.
- Parameters:
  - df: pandas.core.frame.DataFrame
    A dataframe with the following columns:
    - year: The calendar year.
    - sex: One of M = male, F = female.
    - age: The integer age.
    - N: The population for a given year, age, sex, province, and projection scenario.
    - prob_death: The probability that a person in the given year, age, sex, province, and projection scenario will die within the year.
  - sex: str
    One of F = female, M = male.
  - year: int
    The calendar year.
  - age: int
    The integer age.
  - min_year: int
    The minimum year in the dataframe.
  - min_age: int
    The minimum age in the dataframe.
- Returns:
  The age, sex, probability of death, and population for the previous year.
- leap.data_generation.migration_data.get_delta_n(n: float, n_prev: float, prob_death: float) -> float
Get the population change due to migration for a given age and sex in a single year.
- Parameters:
  - n: float
    The number of people living in Canada for a single age, sex, year, province, and projection scenario.
  - n_prev: float
    The number of people living in Canada in the previous year who were one year younger, for the same sex, province, and projection scenario as n. So if n is the number of females aged 10 in the year 2020, n_prev is the number of females aged 9 in the year 2019.
  - prob_death: float
    The probability that a person with a given age and sex in a given year will die between the previous year and this year. So if the person is a female aged 10 in 2020, prob_death is the probability that a female aged 9 in 2019 will die by the age of 10.
- Returns:
  The change in population for a given year, age, and sex due to migration.
- leap.data_generation.migration_data.get_n_migrants(delta_N: float) -> pandas.core.series.Series
Get the number of immigrants and emigrants in a single year for a given age and sex.
Important
TODO: This function is wrong. delta_N is the change in population due to migration. This function currently assumes that if delta_N < 0, 100% of migration is emigration, and if delta_N > 0, 100% of migration is immigration. This has led to the data being very inaccurate (for example, it appears as though people in their 90s are emigrating a lot and people in their 20s are not). This will be remedied in a separate PR.
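The assumption described in the note can be sketched as follows. This illustrates the behaviour the note describes and is not the module's actual code: the name `split_net_migration`, the plain-dict return value (the real function returns a pandas Series), and the choice to report emigrants as a positive count are all assumptions made for the example.

```python
def split_net_migration(delta_n: float) -> dict[str, float]:
    """Attribute all net migration to a single direction, as the note describes."""
    if delta_n > 0:
        # Positive net change: treated as immigration only.
        return {"n_immigrants": delta_n, "n_emigrants": 0.0}
    if delta_n < 0:
        # Negative net change: treated as emigration only (assumed positive count).
        return {"n_immigrants": 0.0, "n_emigrants": -delta_n}
    return {"n_immigrants": 0.0, "n_emigrants": 0.0}


# A net change of +5 cannot distinguish "5 arrived, 0 left"
# from "105 arrived, 100 left".
print(split_net_migration(5.0))
print(split_net_migration(-3.0))
```

Because only the net figure is available, any age group with simultaneous inflow and outflow is misreported under this split, which is consistent with the inaccuracy the note mentions.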
- leap.data_generation.migration_data.load_migration_data() -> tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Generate migration data for the given provinces and years.
- Returns:
  A tuple containing two dataframes. The first dataframe contains the immigration data:
  - year: The calendar year.
  - province: A string indicating the 2-letter province abbreviation, e.g. "BC". For all of Canada, province is "CA".
  - sex: One of M = male, F = female.
  - age: The integer age.
  - projection_scenario: The projection scenario.
  - n_immigrants: The number of immigrants for a given year, province, sex, age, and projection scenario.
  - prop_immigrants_birth: The proportion of immigrants for a given year, province, sex, age, and projection scenario, relative to the total number of births in that year for the given province and projection scenario.
  - prop_immigrants_year: The proportion of immigrants for a given year, province, sex, age, and projection scenario, relative to the total number of immigrants in that year for the given province and projection scenario.
  The second dataframe contains the emigration data:
  - year: The calendar year.
  - province: A string indicating the 2-letter province abbreviation, e.g. "BC". For all of Canada, province is "CA".
  - sex: One of M = male, F = female.
  - age: The integer age.
  - projection_scenario: The projection scenario.
  - n_emigrants: The number of emigrants for a given year, province, sex, age, and projection scenario.
  - prop_emigrants_birth: The proportion of emigrants for a given year, province, sex, age, and projection scenario, relative to the total number of births in that year for the given province and projection scenario.
  - prop_emigrants_year: The proportion of emigrants for a given year, province, sex, age, and projection scenario, relative to the total number of emigrants in that year for the given province and projection scenario.
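To make the per-year proportion columns concrete, here is a minimal sketch of how a column like prop_immigrants_year could be derived from n_immigrants with pandas. The data values and the scenario label "scenario_a" are made up, and this is one possible approach rather than necessarily how load_migration_data computes it.

```python
import pandas as pd

# Made-up immigration counts for one province and one placeholder scenario.
df_immigration = pd.DataFrame({
    "year": [2020, 2020, 2021, 2021],
    "province": ["BC"] * 4,
    "sex": ["F", "M", "F", "M"],
    "age": [30, 30, 30, 30],
    "projection_scenario": ["scenario_a"] * 4,
    "n_immigrants": [30.0, 10.0, 20.0, 20.0],
})

# Total immigrants per year, province, and projection scenario,
# broadcast back onto every row of the group.
group_cols = ["year", "province", "projection_scenario"]
yearly_total = df_immigration.groupby(group_cols)["n_immigrants"].transform("sum")

# Share of each (sex, age) row within its yearly total.
df_immigration["prop_immigrants_year"] = df_immigration["n_immigrants"] / yearly_total
print(df_immigration)
```

Within each year, province, and projection scenario the resulting proportions sum to 1. The prop_immigrants_birth column would follow the same pattern with the yearly number of births as the denominator.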