Occurrence Data
To run the data processing for the occurrence data:
cd LEAP
python3 leap/data_generation/occurrence_data.py
leap.data_generation.occurrence_data module
leap.data_generation.occurrence_data.load_asthma_df(starting_year: int = 2000) → pandas.core.frame.DataFrame
Load the asthma incidence and prevalence data.
- Parameters:
  - starting_year: int = 2000
    The starting year for the data. Data before this year will be excluded from the analysis.
- Returns:
  The asthma incidence and prevalence data. Columns:
  - year (int): The calendar year.
  - age_group (str): The age group.
  - age (int): The average age of the age group.
  - sex (str): One of F = female, M = male.
  - incidence (float): The incidence of asthma.
  - prevalence (float): The prevalence of asthma.
leap.data_generation.occurrence_data.generate_occurrence_model(df_asthma: pandas.core.frame.DataFrame, formula: str, occ_type: str, maxiter: int = 1000) → statsmodels.genmod.generalized_linear_model.GLMResultsWrapper
Generate a GLM model for asthma incidence or prevalence.
- Parameters:
  - df_asthma: pandas.core.frame.DataFrame
    The asthma dataframe. Must have columns:
    - year (int): The calendar year.
    - sex (str): One of M = male, F = female.
    - age (int): The age in years.
    - incidence (float): The incidence of asthma.
    - prevalence (float): The prevalence of asthma.
  - formula: str
    The formula for the GLM model. See the statsmodels documentation for more information.
  - occ_type: str
    The type of occurrence data to model. Must be one of "incidence" or "prevalence".
  - maxiter: int = 1000
    The maximum number of iterations to perform while fitting the model.
- Returns:
  The fitted GLM model.
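The input requirements listed for generate_occurrence_model can be checked before fitting. The following is a minimal sketch, not the LEAP implementation: check_occurrence_inputs is a hypothetical helper, and it takes a plain list of column names instead of a dataframe so that it runs without pandas.

```python
# Columns and occurrence types as documented for generate_occurrence_model.
REQUIRED_COLUMNS = ("year", "sex", "age", "incidence", "prevalence")
VALID_OCC_TYPES = ("incidence", "prevalence")


def check_occurrence_inputs(columns: list[str], occ_type: str) -> None:
    """Raise ValueError if the documented input requirements are not met.

    Hypothetical helper for illustration; not part of LEAP.
    """
    if occ_type not in VALID_OCC_TYPES:
        raise ValueError(
            f"occ_type must be one of {VALID_OCC_TYPES}, got {occ_type!r}"
        )
    missing = [name for name in REQUIRED_COLUMNS if name not in columns]
    if missing:
        raise ValueError(f"df_asthma is missing required columns: {missing}")
```

With a real dataframe, the column names could be passed as list(df_asthma.columns).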
leap.data_generation.occurrence_data.generate_incidence_model(df_asthma: pandas.core.frame.DataFrame, maxiter: int = 1000) → statsmodels.genmod.generalized_linear_model.GLMResultsWrapper
Generate a GLM model for asthma incidence.
- Parameters:
  - df_asthma: pandas.core.frame.DataFrame
    The asthma dataframe. Must have columns:
    - year (int): The calendar year.
    - sex (str): One of M = male, F = female.
    - age (int): The age in years.
    - incidence (float): The incidence of asthma.
    - prevalence (float): The prevalence of asthma.
  - maxiter: int = 1000
    The maximum number of iterations to perform while fitting the model.
- Returns:
  The fitted GLM model.
leap.data_generation.occurrence_data.generate_prevalence_model(df_asthma: pandas.core.frame.DataFrame, maxiter: int = 1000) → tuple[statsmodels.genmod.generalized_linear_model.GLMResultsWrapper, numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray]
Generate a GLM model for asthma prevalence.
- Parameters:
  - df_asthma: pandas.core.frame.DataFrame
    The asthma dataframe. Must have columns:
    - year (int): The calendar year.
    - sex (str): One of M = male, F = female.
    - age (int): The age in years.
    - incidence (float): The incidence of asthma.
    - prevalence (float): The prevalence of asthma.
  - maxiter: int = 1000
    The maximum number of iterations to perform while fitting the model.
- Returns:
  A tuple containing:
  - The fitted GLM model.
  - The alpha parameters for the age polynomial.
  - The norm2 parameters for the age polynomial.
  - The alpha parameters for the year polynomial.
  - The norm2 parameters for the year polynomial.
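The alpha and norm2 arrays are what is needed to evaluate an orthogonal polynomial basis at new age or year values. The sketch below assumes they follow the convention of R's poly() function (alpha has one entry per degree, norm2 has two more entries than alpha); this page does not confirm that convention, and orthogonal_poly_terms is a hypothetical helper, not a LEAP function.

```python
import math


def orthogonal_poly_terms(
    x: float, alpha: list[float], norm2: list[float]
) -> list[float]:
    """Evaluate the orthogonal polynomial terms of degree 1..len(alpha) at x.

    Assumes the R poly() convention for alpha and norm2 (an assumption).
    """
    degree = len(alpha)
    if len(norm2) != degree + 2:
        raise ValueError("norm2 must have len(alpha) + 2 entries")
    # Unscaled terms from the three-term recurrence; z[0] is the constant term.
    z = [1.0, x - alpha[0]]
    for i in range(1, degree):
        z.append((x - alpha[i]) * z[i] - (norm2[i + 1] / norm2[i]) * z[i - 1])
    # Scale each term of degree >= 1 by the square root of its squared norm.
    return [z[k] / math.sqrt(norm2[k + 1]) for k in range(1, degree + 1)]


# Coefficients worked out by hand for the points 1, 2, 3, 4, 5 and degree 2.
example_alpha = [3.0, 3.0]
example_norm2 = [1.0, 5.0, 10.0, 14.0]
terms_at_4 = orthogonal_poly_terms(4.0, example_alpha, example_norm2)
```

Evaluated over the original points 1 to 5, the two resulting columns are orthonormal, which is a quick way to confirm that a set of coefficients matches this convention.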
leap.data_generation.occurrence_data.get_predicted_data(model: statsmodels.genmod.generalized_linear_model.GLMResultsWrapper, pred_col: str, min_age: int = 3, max_age: int = 100, min_year: int = 2000, max_year: int = 2019) → pandas.core.frame.DataFrame
Get predicted data from a GLM model.
The GLM model must be fitted on the following columns:
- year (int): The calendar year.
- sex (int): One of 0 = female, 1 = male.
- age (int): The age in years.
- Parameters:
  - model: statsmodels.genmod.generalized_linear_model.GLMResultsWrapper
    The fitted GLM model.
  - pred_col: str
    The name of the column to store the predicted data.
  - min_age: int = 3
    The minimum age to predict.
  - max_age: int = 100
    The maximum age to predict.
  - min_year: int = 2000
    The minimum year to predict.
  - max_year: int = 2019
    The maximum year to predict.
- Returns:
  A dataframe containing the predicted data. Columns:
  - year (int): The calendar year.
  - sex (str): One of M = male, F = female.
  - age (int): The age in years.
  - pred_col (float): The predicted data.
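To make the shape of the prediction input concrete, the sketch below builds one row per combination of year, sex, and age using the documented defaults, then maps the integer sex codes back to the string labels of the returned dataframe. It is an illustration only: build_prediction_grid is a hypothetical helper, and treating both the age and year bounds as inclusive is an assumption.

```python
from itertools import product

# Integer coding documented for the fitted model, mapped to output labels.
SEX_LABELS = {0: "F", 1: "M"}


def build_prediction_grid(
    min_age: int = 3,
    max_age: int = 100,
    min_year: int = 2000,
    max_year: int = 2019,
) -> list[dict[str, int]]:
    """Return one row per (year, sex, age) combination.

    Hypothetical helper; assumes all bounds are inclusive.
    """
    years = range(min_year, max_year + 1)
    ages = range(min_age, max_age + 1)
    return [
        {"year": year, "sex": sex, "age": age}
        for year, sex, age in product(years, SEX_LABELS, ages)
    ]


grid = build_prediction_grid()
# After prediction, the integer codes can be replaced by the string labels.
labelled = [{**row, "sex": SEX_LABELS[row["sex"]]} for row in grid]
```

Under these assumptions the default grid has 20 years, 2 sexes, and 98 ages.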
leap.data_generation.occurrence_data.plot_occurrence(df: pandas.core.frame.DataFrame, y: str, title: str = '', file_path: pathlib.Path | None = None, min_year: int = 2000, max_year: int = 2019, year_interval: int = 2, max_age: int = 110, width: int = 1000, height: int = 800)
Plot the incidence or prevalence of asthma.
- Parameters:
  - df: pandas.core.frame.DataFrame
    A dataframe containing either incidence or prevalence data. Must have columns:
    - year (int): The calendar year.
    - sex (str): One of F = female, M = male.
    - age (int): The age in years.
    - y (float): Specified by the y argument, this will be the y data.
    - y_pred (float): Optional, the predicted y data. If this column is present, it will be plotted alongside the actual data. The column name must be the same as y with _pred appended. For example, if y is incidence, then the predicted data must be incidence_pred.
  - y: str
    The name of the column in the dataframe which will be plotted as the y data.
  - title: str = ''
    The title of the plot.
  - file_path: pathlib.Path | None = None
    The path to save the plot to. If None, the plot will be displayed.
  - min_year: int = 2000
    The minimum year to plot.
  - max_year: int = 2019
    The maximum year to plot.
  - year_interval: int = 2
    The interval between years. This is used if you don’t want to plot every year.
  - max_age: int = 110
    The maximum age to plot.
  - width: int = 1000
    The width of the plot.
  - height: int = 800
    The height of the plot.
- Returns:
  If file_path is None, the plot will be displayed. Otherwise, the plot will be saved to the specified path.
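Two of the conventions described for plot_occurrence can be sketched in a few lines: the name of the optional prediction column, and the effect of year_interval on which years are shown. Both helpers are hypothetical, and the year selection in particular is an assumption (an inclusive max_year, stepping from min_year); the actual function may choose years differently.

```python
def predicted_column(y: str) -> str:
    """Name of the optional prediction column for the plotted column y."""
    return f"{y}_pred"


def years_to_plot(
    min_year: int = 2000, max_year: int = 2019, year_interval: int = 2
) -> list[int]:
    """Years selected for plotting, assuming max_year is inclusive."""
    return list(range(min_year, max_year + 1, year_interval))


pred_name = predicted_column("incidence")
default_years = years_to_plot()
```

With the defaults, stepping by 2 from 2000 stops at 2018, so 2019 itself would not be plotted under this assumption.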
leap.data_generation.occurrence_data.add_beta_parameters(model: statsmodels.genmod.generalized_linear_model.GLMResultsWrapper, parameter_map: dict[str, list[int]], config: dict[str, Any]) → dict[str, Any]
Add the beta parameters to the config dictionary.
- Parameters:
  - model: statsmodels.genmod.generalized_linear_model.GLMResultsWrapper
    The fitted GLM model.
  - parameter_map: dict[str, list[int]]
    A dictionary mapping the parameter names to their indices in the model parameters field, model.params. The keys are the parameter names and the values are lists of indices. For example, if βyear is the second parameter in the list, then the mapping would be {"βyear": [1]}, and it would be accessed by model.params.iloc[1].
  - config: dict[str, Any]
    The config dictionary to add the parameters to.
- Returns:
  The config dictionary with the beta parameters added.
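The parameter_map lookup can be illustrated without a fitted model. In this sketch a plain list stands in for model.params, so params[index] plays the role of model.params.iloc[index]. The helper collect_parameters, the names β0 and βage, and the coefficient values are all made up for illustration; how LEAP stores the values in the config dictionary is not specified here.

```python
def collect_parameters(
    params: list[float], parameter_map: dict[str, list[int]]
) -> dict[str, list[float]]:
    """Look up each named parameter by its positions in params.

    Hypothetical helper; params stands in for model.params.
    """
    return {
        name: [params[index] for index in indices]
        for name, indices in parameter_map.items()
    }


# Made-up coefficient values, for illustration only.
params = [-3.2, 0.05, 0.4, -0.01]
# Only "βyear" appears in the documentation; the other names are hypothetical.
parameter_map = {"β0": [0], "βyear": [1], "βage": [2, 3]}
betas = collect_parameters(params, parameter_map)
```

A name mapped to several indices, such as the hypothetical βage above, collects one coefficient per index.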
leap.data_generation.occurrence_data.generate_occurrence_data()
Generate the asthma incidence and prevalence data.
Saves the data to a CSV file: processed_data/asthma_occurrence_predictions.csv.
The data is also plotted and saved to the following files:
- data_generation/figures/asthma_incidence_predicted.png: The predicted asthma incidence per 100 in BC.
- data_generation/figures/asthma_prevalence_predicted.png: The predicted asthma prevalence per 100 in BC.