Migration Data

StatCan does not publish immigration/emigration data broken down by the groups we need (age, sex, etc.), so we do not have exact data for this category. Instead, we derive net migration from the following data files:

leap/processed_data/life_table.csv (generated by death_data.py)
leap/processed_data/birth/initial_pop_distribution_prop.csv (generated by birth_data.py)

The life_table.csv contains the probability of death during that year for each age, sex, province, and projection scenario.

The initial_pop_distribution_prop.csv contains the number of people in a given age, sex, province, and projection scenario, along with the number of births for that year. This data is the net number of people, factoring in death, immigration, and emigration.

To obtain the net migration for anyone of age > 0, we first use the prob_death column in life_table.csv to compute how many people in each cohort are expected to survive the year. We then compare this with the observed population in the n_age column of initial_pop_distribution_prop.csv. The observed population minus the expected number of survivors is the net number of people who migrated:

delta_n = n - n_prev * (1 - prob_death)
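As a worked example of the formula above, the following minimal sketch plugs in made-up numbers. The function name net_migration and all values are illustrative assumptions, not taken from the LEAP code or the StatCan data.

```python
def net_migration(n: float, n_prev: float, prob_death: float) -> float:
    """Observed population minus the expected number of survivors."""
    expected_survivors = n_prev * (1 - prob_death)
    return n - expected_survivors


# Illustrative numbers only.
n_prev = 1000.0    # cohort size last year (e.g. age 9 in 2019)
prob_death = 0.01  # probability of dying during the year
n = 1010.0         # observed cohort size this year (e.g. age 10 in 2020)

delta_n = net_migration(n, n_prev, prob_death)
print(delta_n)
```

Here 990 people are expected to survive, so the observed 1010 implies a net gain of about 20 people through migration. A negative result would indicate net emigration.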

To run the data generation for the migration data:

cd LEAP
python3 leap/data_generation/migration_data.py

leap.data_generation.migration_data module

leap.data_generation.migration_data.get_prev_year_population(df: pandas.core.frame.DataFrame, sex: str, year: int, age: int, min_year: int, min_age: int) → pandas.core.series.Series

Get the age, sex, probability of death, and population for the previous year.

Parameters:

df: pandas.core.frame.DataFrame

A dataframe with the following columns:

year: The calendar year.
sex: One of M = male, F = female.
age: The integer age.
N: The population for a given year, age, sex, province, and projection scenario.
prob_death: The probability that a person in the given year, age, sex, province, and projection scenario will die within the year.

sex: str

One of F = female, M = male.

year: int

The calendar year.

age: int

The integer age.

min_year: int

The minimum year in the dataframe.

min_age: int

The minimum age in the dataframe.

Returns:

The age, sex, probability of death, and population for the previous year.
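The lookup described above could be sketched roughly as follows. This is not the LEAP implementation: the function name prev_year_row is hypothetical, and the handling of the first year and youngest age (falling back to the current row, since no earlier cohort row exists) is an assumption, because the documentation does not say how min_year and min_age are used.

```python
import pandas as pd


def prev_year_row(df: pd.DataFrame, sex: str, year: int, age: int,
                  min_year: int, min_age: int) -> pd.Series:
    """Return age, sex, prob_death and N for the cohort one year earlier."""
    if year == min_year or age == min_age:
        # Assumption: no earlier cohort row exists, so reuse the current row.
        prev_year, prev_age = year, age
    else:
        prev_year, prev_age = year - 1, age - 1
    mask = (
        (df["sex"] == sex)
        & (df["year"] == prev_year)
        & (df["age"] == prev_age)
    )
    return df.loc[mask, ["age", "sex", "prob_death", "N"]].iloc[0]


# Illustrative data only.
df = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020],
    "sex": ["F", "F", "F", "F"],
    "age": [9, 10, 9, 10],
    "N": [100.0, 110.0, 105.0, 102.0],
    "prob_death": [0.001, 0.002, 0.001, 0.002],
})

row = prev_year_row(df, "F", 2020, 10, min_year=2019, min_age=9)
edge = prev_year_row(df, "F", 2019, 10, min_year=2019, min_age=9)
```

In this sketch, row holds the values for females aged 9 in 2019, while edge falls back to the 2019 row for age 10 because 2019 is the first year in the data.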

leap.data_generation.migration_data.get_delta_n(n: float, n_prev: float, prob_death: float) → float

Get the population change due to migration for a given age and sex in a single year.

Parameters:

n: float: The number of people living in Canada for a single age, sex, year, province, and projection scenario.
n_prev: float: The number of people in the same cohort in the previous year, i.e. one year younger, with the same sex, province, and projection scenario as defined for n. So if n is the number of females aged 10 in the year 2020, n_prev is the number of females aged 9 in the year 2019.
prob_death: float: The probability that a person in that cohort will die between the previous year and this year. So if n refers to females aged 10 in 2020, prob_death is the probability that a female aged 9 in 2019 will die before reaching the age of 10.

Returns:

The change in population for a given year, age, and sex due to migration.

leap.data_generation.migration_data.get_n_migrants(delta_N: float) → pandas.core.series.Series

Get the number of immigrants and emigrants in a single year for a given age and sex.

Important

TODO: This function is wrong. delta_N is the change in population due to migration. This function currently assumes that if delta_N < 0, 100% of migration is emigration, and if delta_N > 0, 100% of migration is immigration. This has led to the data being very inaccurate (for example, it appears as though people in their 90s are emigrating a lot and people in their 20s are not). This will be remedied in a separate PR.

Parameters:

delta_N: float: The change in population for a given year, age, sex, province, and projection scenario due to migration.

Returns:

A pd.Series containing two values, the number of immigrants in a single year and the number of emigrants in a single year.
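To make the limitation described in the note above concrete, here is a minimal sketch of the all-or-nothing split. The function name split_net_migration, the Series labels, and the choice to report emigrants as a non-negative count are assumptions for illustration, not the actual LEAP code.

```python
import pandas as pd


def split_net_migration(delta_N: float) -> pd.Series:
    """Attribute all net migration to immigration or to emigration."""
    n_immigrants = max(delta_N, 0.0)   # positive net change: all immigration
    n_emigrants = max(-delta_N, 0.0)   # negative net change: all emigration
    return pd.Series({"n_immigrants": n_immigrants, "n_emigrants": n_emigrants})


gain = split_net_migration(20.0)
loss = split_net_migration(-5.0)
print(gain)
print(loss)
```

A net change of +20 could equally result from 120 arrivals and 100 departures, but a split like this would report 20 immigrants and 0 emigrants, which is why the resulting age profiles can be misleading.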

leap.data_generation.migration_data.load_migration_data() → tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Generate migration data for the given provinces and years.

Returns:

A tuple containing two dataframes. The first dataframe contains the immigration data:

year: The calendar year.
province: A string indicating the 2-letter province abbreviation, e.g. "BC". For all of Canada, province is "CA".
sex: One of M = male, F = female.
age: The integer age.
projection_scenario: The projection scenario.
n_immigrants: The number of immigrants for a given year, province, sex, age, and projection scenario.
prop_immigrants_birth: The proportion of immigrants for a given year, province, sex, age, and projection scenario, relative to the total number of births in that year for the given province and projection scenario.
prop_immigrants_year: The proportion of immigrants for a given year, province, sex, age, and projection scenario, relative to the total number of immigrants in that year for the given province and projection scenario.

The second dataframe contains the emigration data:

year: The calendar year.
province: A string indicating the 2-letter province abbreviation, e.g. "BC". For all of Canada, province is "CA".
sex: One of M = male, F = female.
age: The integer age.
projection_scenario: The projection scenario.
n_emigrants: The number of emigrants for a given year, province, sex, age, and projection scenario.
prop_emigrants_birth: The proportion of emigrants for a given year, province, sex, age, and projection scenario, relative to the total number of births in that year for the given province and projection scenario.
prop_emigrants_year: The proportion of emigrants for a given year, province, sex, age, and projection scenario, relative to the total number of emigrants in that year for the given province and projection scenario.
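The two proportion columns can be illustrated with a small pandas sketch, shown here for immigrants; the emigrant columns would follow the same pattern. The data values, the scenario label, and the n_birth column (the total births for that year, province, and projection scenario) are hypothetical and not taken from the LEAP data files.

```python
import pandas as pd

# Illustrative data only; n_birth is a hypothetical helper column.
df = pd.DataFrame({
    "year": [2020, 2020, 2020, 2021],
    "province": ["BC", "BC", "BC", "BC"],
    "sex": ["F", "M", "F", "F"],
    "age": [20, 20, 21, 20],
    "projection_scenario": ["M3", "M3", "M3", "M3"],
    "n_immigrants": [30.0, 50.0, 20.0, 40.0],
    "n_birth": [1000.0, 1000.0, 1000.0, 800.0],
})

group_cols = ["year", "province", "projection_scenario"]

# Share of that year's immigrants falling in each age/sex group.
yearly_total = df.groupby(group_cols)["n_immigrants"].transform("sum")
df["prop_immigrants_year"] = df["n_immigrants"] / yearly_total

# Immigrants in each age/sex group relative to that year's births.
df["prop_immigrants_birth"] = df["n_immigrants"] / df["n_birth"]

print(df[["year", "sex", "age", "prop_immigrants_year", "prop_immigrants_birth"]])
```

Within each year, province, and projection scenario, prop_immigrants_year sums to 1, whereas prop_immigrants_birth has no such constraint because it is scaled by births rather than by total immigration.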

leap.data_generation.migration_data.generate_migration_data()