Migration Data

StatCan does not provide immigration or emigration data broken down by the groups we need (age, sex, etc.), so exact data for this category is not available. Instead, we estimate net migration from the following data files:

  1. leap/processed_data/life_table.csv (generated by death_data.py)

  2. leap/processed_data/birth/initial_population.csv (generated by birth_data.py)

The life_table.csv contains the probability of death during that year for each age, sex, province, and projection scenario.

The initial_population.csv contains the number of people for each age, sex, province, and projection scenario, along with the number of births for that year. These population counts are net figures: they already account for deaths, immigration, and emigration.

To obtain the net migration for every age > 0, we first compute how many people in each previous-year cohort are expected to survive the year, using the prob_death column in life_table.csv. We then compare this with the population actually recorded in the n_age column of initial_population.csv. The recorded population minus the expected survivors is the net number of people who migrated:

delta_n = n - n_prev * (1 - prob_death)
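As a quick check of the formula, here is a minimal worked example in Python. The function name net_migration and all of the numbers are made up for illustration; they are not taken from StatCan data or from the LEAP source code.

```python
def net_migration(n: float, n_prev: float, prob_death: float) -> float:
    """Observed population minus the expected survivors of last year's cohort."""
    expected_survivors = n_prev * (1.0 - prob_death)
    return n - expected_survivors


# Illustrative numbers only:
# 1,000 females aged 9 in 2019, a 1% probability of death during the year,
# and 1,020 females aged 10 recorded in 2020.
delta_n = net_migration(n=1020.0, n_prev=1000.0, prob_death=0.01)
print(delta_n)
```

With these inputs, 990 people are expected to survive, so the remaining difference of about 30 people is attributed to net immigration. A negative result would indicate net emigration.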

To run the data generation for the migration data:

cd LEAP
python3 leap/data_generation/migration_data.py

leap.data_generation.migration_data module

leap.data_generation.migration_data.get_prev_year_population(df: pandas.DataFrame, sex: str, year: int, age: int, min_year: int, min_age: int) → pandas.Series

Get the age, sex, probability of death, and population for the previous year.

Parameters:
df: pandas.DataFrame

A dataframe with the following columns:

  • year: The calendar year.

  • sex: One of M = male, F = female.

  • age: The integer age.

  • N: The population for a given year, age, sex, province, and projection scenario.

  • prob_death: The probability that a person in the given year, age, sex, province, and projection scenario will die within the year.

sex: str

One of F = female, M = male.

year: int

The calendar year.

age: int

The integer age.

min_year: int

The minimum year in the dataframe.

min_age: int

The minimum age in the dataframe.

Returns:

The age, sex, probability of death, and population for the previous year.
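The lookup described above could be sketched as follows. This is not the LEAP implementation: the helper name prev_year_row is hypothetical, and returning None for the first year or the youngest age is an assumption, since the documentation does not say how min_year and min_age are used.

```python
from typing import Optional

import pandas as pd


def prev_year_row(
    df: pd.DataFrame, sex: str, year: int, age: int, min_year: int, min_age: int
) -> Optional[pd.Series]:
    """Return the row for the same cohort one year earlier, or None at the table edge."""
    # Assumption: no previous-year cohort exists for the first year or youngest age.
    if year == min_year or age == min_age:
        return None
    mask = (df["sex"] == sex) & (df["year"] == year - 1) & (df["age"] == age - 1)
    matches = df.loc[mask, ["age", "sex", "prob_death", "N"]]
    if matches.empty:
        return None
    return matches.iloc[0]


# Toy data, invented for illustration.
toy = pd.DataFrame(
    {
        "year": [2019, 2019, 2020, 2020],
        "sex": ["F", "F", "F", "F"],
        "age": [9, 10, 9, 10],
        "N": [1000.0, 980.0, 1010.0, 1020.0],
        "prob_death": [0.01, 0.01, 0.01, 0.01],
    }
)
row = prev_year_row(toy, sex="F", year=2020, age=10, min_year=2019, min_age=9)
```

Here row holds the entry for females aged 9 in 2019, which is the cohort that becomes age 10 in 2020.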

leap.data_generation.migration_data.get_delta_n(n: float, n_prev: float, prob_death: float) → float

Get the population change due to migration for a given age and sex in a single year.

Parameters:
n: float

The number of people living in Canada for a single age, sex, year, province, and projection scenario.

n_prev: float

The number of people living in Canada in the previous year who belong to the same cohort as n: same sex, province, and projection scenario, but one year younger. So if n is the number of females aged 10 in the year 2020, n_prev is the number of females aged 9 in the year 2019.

prob_death: float

The probability that a person with a given age and sex in a given year will die between the previous year and this year. So if the person is a female aged 10 in 2020, prob_death is the probability that a female aged 9 in 2019 will die by the age of 10.

Returns:

The change in population for a given year, age, and sex due to migration.
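For a whole table, the same calculation can be done in one pass by lining each row up with its previous-year cohort. The sketch below is one possible approach using a pandas merge, not necessarily how LEAP does it; the toy values and the n_prev and prob_death_prev column names are assumptions made for the example.

```python
import pandas as pd

# Toy table, invented for illustration; column names follow the docs above.
df = pd.DataFrame(
    {
        "year": [2019, 2019, 2020, 2020],
        "sex": ["F", "F", "F", "F"],
        "age": [9, 10, 10, 11],
        "N": [1000.0, 980.0, 1020.0, 965.0],
        "prob_death": [0.01, 0.02, 0.01, 0.02],
    }
)

# Shift each row forward one year and one age so it lines up with its own cohort.
prev = df.assign(year=df["year"] + 1, age=df["age"] + 1).rename(
    columns={"N": "n_prev", "prob_death": "prob_death_prev"}
)
merged = df.merge(prev, on=["year", "sex", "age"], how="inner")

# delta_n = n - n_prev * (1 - prob_death), with prob_death taken from the previous year.
merged["delta_n"] = merged["N"] - merged["n_prev"] * (1 - merged["prob_death_prev"])
```

Rows without a previous-year cohort (the first year, or the youngest age) drop out of the inner merge, so merged contains only the 2020 rows in this example.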

leap.data_generation.migration_data.load_migration_data() → pandas.DataFrame

Generate migration data for the given provinces and years.

Returns:

A dataframe with the following columns:

  • year: The calendar year.

  • province: A string indicating the 2-letter province abbreviation, e.g. "BC". For all of Canada, set province to "CA".

  • sex: One of M = male, F = female.

  • age: The integer age.

  • projection_scenario: The projection scenario.

  • delta_n: The signed change in population for a given year, age, sex, province, and projection scenario due to net migration. Positive values indicate net immigration; negative values indicate net emigration.

  • prop_migrants_birth: The signed proportion of delta_n relative to the total number of births in that year for the given province and projection scenario. Positive = net immigration, negative = net emigration.

  • prop_immigrants_year: For cells where delta_n > 0, the proportion of immigrants for this age and sex relative to the total number of immigrants in that year. Zero for emigration cells. Denominator includes only immigration cells.

  • prop_emigrants_year: For cells where delta_n < 0, the proportion of emigrants for this age and sex relative to the total number of emigrants in that year. Zero for immigration cells. Denominator includes only emigration cells.

  • prob_emigration: For cells where delta_n < 0, the per-person probability of emigrating (abs(delta_n) / N). Zero for immigration cells.

Return type:

pandas.DataFrame
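To make the derived columns concrete, the sketch below computes them for a single year, province, and projection scenario from a toy table. All values, including n_births, are invented. It assumes N is the population in the same row as delta_n, and it does not guard against a year with no immigrants or no emigrants, where the denominators would be zero.

```python
import pandas as pd

# Toy delta_n values for one year, province, and projection scenario (invented).
df = pd.DataFrame(
    {
        "age": [1, 2, 3, 4],
        "sex": ["F", "F", "M", "M"],
        "N": [1000.0, 1000.0, 500.0, 2000.0],
        "delta_n": [30.0, -10.0, 10.0, -40.0],
    }
)
n_births = 400.0  # hypothetical total births for the year

# Split the signed change into immigration-only and emigration-only magnitudes.
immigrants = df["delta_n"].clip(lower=0)
emigrants = (-df["delta_n"]).clip(lower=0)

df["prop_migrants_birth"] = df["delta_n"] / n_births
df["prop_immigrants_year"] = immigrants / immigrants.sum()
df["prop_emigrants_year"] = emigrants / emigrants.sum()
df["prob_emigration"] = emigrants / df["N"]
```

Each of prop_immigrants_year and prop_emigrants_year sums to 1 across the year, because the denominator counts only cells of the matching sign.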

leap.data_generation.migration_data.generate_migration_data()