Occurrence Data

To run the data processing for the occurrence data:

cd LEAP
python3 leap/data_generation/occurrence_data.py

leap.data_generation.occurrence_data module

leap.data_generation.occurrence_data.load_asthma_df(starting_year: int = 2000) → pandas.core.frame.DataFrame

Load the asthma incidence and prevalence data.

Parameters:
starting_year: int = 2000

The starting year for the data. Data before this year will be excluded from the analysis.

Returns:

The asthma incidence and prevalence data. Columns:

  • year (int): The calendar year.

  • age_group (str): The age group.

  • age (int): The average age of the age group.

  • sex (str): One of F = female, M = male.

  • incidence (float): The incidence of asthma.

  • prevalence (float): The prevalence of asthma.
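As a minimal, pandas-free sketch of this schema and of the starting_year cutoff, the hypothetical AsthmaRow and exclude_before below mirror the documented columns. They assume that "excluded" means rows with year < starting_year are dropped. The real function returns a pandas DataFrame, not a list of dataclasses.

```python
from dataclasses import dataclass


@dataclass
class AsthmaRow:
    """Hypothetical mirror of one row returned by load_asthma_df."""

    year: int          # calendar year
    age_group: str     # age group label
    age: int           # average age of the age group
    sex: str           # "F" = female, "M" = male
    incidence: float   # incidence of asthma
    prevalence: float  # prevalence of asthma


def exclude_before(rows: list[AsthmaRow], starting_year: int = 2000) -> list[AsthmaRow]:
    """Keep only rows whose year is on or after starting_year (assumed semantics)."""
    return [row for row in rows if row.year >= starting_year]
```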

leap.data_generation.occurrence_data.generate_occurrence_model(df_asthma: pandas.core.frame.DataFrame, formula: str, occ_type: str, maxiter: int = 1000) → statsmodels.genmod.generalized_linear_model.GLMResultsWrapper

Generate a GLM model for asthma incidence or prevalence.

Parameters:
df_asthma: pandas.core.frame.DataFrame

The asthma dataframe. Must have columns:

  • year (int): The calendar year.

  • sex (str): One of M = male, F = female.

  • age (int): The age in years.

  • incidence (float): The incidence of asthma.

  • prevalence (float): The prevalence of asthma.

formula: str

The formula for the GLM model. See the statsmodels documentation for more information.

occ_type: str

The type of occurrence data to model. Must be one of "incidence" or "prevalence".

maxiter: int = 1000

The maximum number of iterations to perform while fitting the model.

Returns:

The fitted GLM model.
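The inputs above carry a documented contract: a fixed set of required columns and an occ_type restricted to two values. A hypothetical pre-flight check (check_occurrence_inputs is not part of LEAP) that takes the dataframe's column names, for example list(df_asthma.columns), could look like this. Whether generate_occurrence_model performs such checks itself is not stated here.

```python
REQUIRED_COLUMNS = frozenset({"year", "sex", "age", "incidence", "prevalence"})
VALID_OCC_TYPES = ("incidence", "prevalence")


def check_occurrence_inputs(columns: list[str], occ_type: str) -> None:
    """Raise ValueError if occ_type or the column set breaks the documented contract."""
    if occ_type not in VALID_OCC_TYPES:
        raise ValueError(f"occ_type must be one of {VALID_OCC_TYPES}, got {occ_type!r}")
    # Any required column absent from the dataframe is reported by name.
    missing = REQUIRED_COLUMNS - set(columns)
    if missing:
        raise ValueError(f"df_asthma is missing columns: {sorted(missing)}")
```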

leap.data_generation.occurrence_data.generate_incidence_model(df_asthma: pandas.core.frame.DataFrame, maxiter: int = 1000) → statsmodels.genmod.generalized_linear_model.GLMResultsWrapper

Generate a GLM model for asthma incidence.

Parameters:
df_asthma: pandas.core.frame.DataFrame

The asthma dataframe. Must have columns:

  • year (int): The calendar year.

  • sex (str): One of M = male, F = female.

  • age (int): The age in years.

  • incidence (float): The incidence of asthma.

  • prevalence (float): The prevalence of asthma.

maxiter: int = 1000

The maximum number of iterations to perform while fitting the model.

Returns:

The fitted GLM model.

leap.data_generation.occurrence_data.generate_prevalence_model(df_asthma: pandas.core.frame.DataFrame, maxiter: int = 1000) → tuple[statsmodels.genmod.generalized_linear_model.GLMResultsWrapper, numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray]

Generate a GLM model for asthma prevalence.

Parameters:
df_asthma: pandas.core.frame.DataFrame

The asthma dataframe. Must have columns:

  • year (int): The calendar year.

  • sex (str): One of M = male, F = female.

  • age (int): The age in years.

  • incidence (float): The incidence of asthma.

  • prevalence (float): The prevalence of asthma.

maxiter: int = 1000

The maximum number of iterations to perform while fitting the model.

Returns:

A tuple containing:

  1. The fitted GLM model.

  2. The alpha parameters for the age polynomial.

  3. The norm2 parameters for the age polynomial.

  4. The alpha parameters for the year polynomial.

  5. The norm2 parameters for the year polynomial.
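The alpha and norm2 arrays resemble the coefficients that R's poly() stores for an orthogonal polynomial basis (there, alpha has degree entries and norm2 has degree + 2). Assuming LEAP follows that convention, which the text above does not confirm, they let the same orthogonal age or year basis be rebuilt for new values through a three-term recurrence. The hypothetical poly_fit and poly_predict below are a stdlib sketch: poly_fit uses the Stieltjes recurrence, which produces the same monic orthogonal polynomials as R's QR-based construction.

```python
import math


def poly_fit(x: list[float], degree: int) -> tuple[list[float], list[float]]:
    """Compute R-style (alpha, norm2) coefficients for an orthogonal polynomial basis.

    Runs the Stieltjes three-term recurrence on the monic orthogonal
    polynomials p_0 = 1, p_1, ..., p_degree evaluated at the data points x.
    """
    n = len(x)
    alpha: list[float] = []
    norm2 = [1.0]  # R prepends 1 so that the recurrence ratios line up
    p_prev = [0.0] * n  # p_{-1} = 0
    p_curr = [1.0] * n  # p_0 = 1
    for k in range(degree + 1):
        nk = sum(v * v for v in p_curr)  # ||p_k||^2
        norm2.append(nk)
        if k == degree:
            break
        a_k = sum(xi * v * v for xi, v in zip(x, p_curr)) / nk
        alpha.append(a_k)
        ratio = nk / norm2[k]  # ||p_k||^2 / ||p_{k-1}||^2; multiplies p_{-1} = 0 when k = 0
        p_next = [(xi - a_k) * c - ratio * p for xi, c, p in zip(x, p_curr, p_prev)]
        p_prev, p_curr = p_curr, p_next
    return alpha, norm2


def poly_predict(x: list[float], alpha: list[float], norm2: list[float]) -> list[list[float]]:
    """Rebuild the normalized basis columns 1..degree at (possibly new) points x."""
    degree = len(alpha)
    cols = [[1.0] * len(x)]
    if degree >= 1:
        cols.append([xi - alpha[0] for xi in x])
    for i in range(1, degree):
        ratio = norm2[i + 1] / norm2[i]
        cols.append(
            [(xi - alpha[i]) * c - ratio * p for xi, c, p in zip(x, cols[i], cols[i - 1])]
        )
    # Scale each column by 1 / sqrt(||p_j||^2) and drop the constant column.
    return [[v / math.sqrt(norm2[j + 1]) for v in cols[j]] for j in range(1, degree + 1)]
```

Under this assumption, poly_predict(new_ages, age_alpha, age_norm2) would return the age basis columns for ages that were not in the fitting data, which is what a prediction step needs.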

leap.data_generation.occurrence_data.get_predicted_data(model: statsmodels.genmod.generalized_linear_model.GLMResultsWrapper, pred_col: str, min_age: int = 3, max_age: int = 100, min_year: int = 2000, max_year: int = 2019) → pandas.core.frame.DataFrame

Get predicted data from a GLM model.

The GLM model must be fitted on the following columns:

  • year (int): The calendar year.

  • sex (int): One of 0 = female, 1 = male.

  • age (int): The age in years.

Parameters:
model: statsmodels.genmod.generalized_linear_model.GLMResultsWrapper

The fitted GLM model.

pred_col: str

The name of the column to store the predicted data.

min_age: int = 3

The minimum age to predict.

max_age: int = 100

The maximum age to predict.

min_year: int = 2000

The minimum year to predict.

max_year: int = 2019

The maximum year to predict.

Returns:

A dataframe containing the predicted data. Columns:

  • year (int): The calendar year.

  • sex (str): One of M = male, F = female.

  • age (int): The age in years.

  • pred_col (float): The predicted data.
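get_predicted_data evaluates the model over every combination of year, sex and age. The stdlib sketch below builds such a grid, assuming the age and year bounds are inclusive (not stated above) and using the integer sex coding documented for the model inputs. prediction_grid and label_sex are hypothetical helpers; the real function presumably passes a grid like this to the model's predict method and stores the result in pred_col.

```python
from itertools import product

SEX_CODES = {0: "F", 1: "M"}  # 0 = female, 1 = male, as documented for the model inputs


def prediction_grid(
    min_age: int = 3, max_age: int = 100, min_year: int = 2000, max_year: int = 2019
) -> list[dict[str, int]]:
    """Every (year, sex code, age) combination, with inclusive bounds (assumed)."""
    return [
        {"year": year, "sex": sex, "age": age}
        for year, sex, age in product(
            range(min_year, max_year + 1), (0, 1), range(min_age, max_age + 1)
        )
    ]


def label_sex(rows: list[dict[str, int]]) -> list[dict[str, object]]:
    """Map the integer sex codes back to the "F"/"M" strings used in the output."""
    return [{**row, "sex": SEX_CODES[row["sex"]]} for row in rows]
```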

leap.data_generation.occurrence_data.plot_occurrence(df: pandas.core.frame.DataFrame, y: str, title: str = '', file_path: pathlib.Path | None = None, min_year: int = 2000, max_year: int = 2019, year_interval: int = 2, max_age: int = 110, width: int = 1000, height: int = 800)

Plot the incidence or prevalence of asthma.

Parameters:
df: pandas.core.frame.DataFrame

A dataframe containing either incidence or prevalence data. Must have columns:

  • year (int): The calendar year.

  • sex (str): One of F = female, M = male.

  • age (int): The age in years.

  • y (float): The column named by the y argument; its values are plotted as the y data.

  • y_pred (float): Optional. The predicted y data. If this column is present, it is plotted alongside the actual data. Its name must be the value of y with _pred appended; for example, if y is incidence, the predicted column must be incidence_pred.

y: str

The name of the column in the dataframe which will be plotted as the y data.

title: str = ''

The title of the plot.

file_path: pathlib.Path | None = None

The path to save the plot to. If None, the plot will be displayed.

min_year: int = 2000

The minimum year to plot.

max_year: int = 2019

The maximum year to plot.

year_interval: int = 2

The interval between plotted years. Set it above 1 to plot only every nth year instead of every year.

max_age: int = 110

The maximum age to plot.

width: int = 1000

The width of the plot.

height: int = 800

The height of the plot.

Returns:

If file_path is None, the plot will be displayed. Otherwise, the plot will be saved to the specified path.

leap.data_generation.occurrence_data.add_beta_parameters(model: statsmodels.genmod.generalized_linear_model.GLMResultsWrapper, parameter_map: dict[str, list[int]], config: dict[str, Any]) → dict[str, Any]

Add the beta parameters to the config dictionary.

Parameters:
model: statsmodels.genmod.generalized_linear_model.GLMResultsWrapper

The fitted GLM model.

parameter_map: dict[str, list[int]]

A dictionary mapping the parameter names to their indices in the model parameters field, model.params. The keys are the parameter names and the values are lists of indices. For example, if βyear is the second parameter in the list, then the mapping would be {"βyear": [1]}, and it would be accessed by model.params.iloc[1].

config: dict[str, Any]

The config dictionary to add the parameters to.

Returns:

The config dictionary with the beta parameters added.
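The parameter_map lookup described above can be sketched on a plain list standing in for model.params, where params[i] plays the role of model.params.iloc[i]. add_beta_parameters_sketch is hypothetical: whether LEAP unwraps single-index entries into scalars or mutates config in place is not stated, so this version keeps lists and returns a new dictionary.

```python
from typing import Any, Sequence


def add_beta_parameters_sketch(
    params: Sequence[float],
    parameter_map: dict[str, list[int]],
    config: dict[str, Any],
) -> dict[str, Any]:
    """Copy the fitted coefficients named in parameter_map into a new config dict."""
    updated = dict(config)  # leave the caller's dict untouched
    for name, indices in parameter_map.items():
        # Positional lookup, mirroring model.params.iloc[i].
        updated[name] = [params[i] for i in indices]
    return updated
```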

leap.data_generation.occurrence_data.generate_occurrence_data()

Generate the asthma incidence and prevalence data.

Saves the data to a CSV file: processed_data/asthma_occurrence_predictions.csv.

The data is also plotted and saved to the following files:

  • data_generation/figures/asthma_incidence_predicted.png: The predicted asthma incidence per 100 in BC.

  • data_generation/figures/asthma_prevalence_predicted.png: The predicted asthma prevalence per 100 in BC.