Prevalence Calibration
This module is a helper for occurrence_calibration_data.py. It contains the
functions used to calibrate the asthma prevalence equation.
leap.data_generation.prevalence_calibration module
- leap.data_generation.prevalence_calibration.get_asthma_prevalence_correction(asthma_prev_risk_factor_params: list[float], risk_factor_prob: list[float]) → float
Compute the correction term for asthma prevalence.
\[\alpha = \sum_{\lambda=1}^{n} p(\lambda) \cdot \beta_{\lambda}\]
where:
\(\alpha\) is the correction term for the asthma prevalence
\(p(\lambda)\) is the prevalence of risk factor level \(\lambda\), risk_factor_prob[λ]
\(\beta_{\lambda}\) is the parameter for risk factor level \(\lambda\), asthma_prev_risk_factor_params[λ]
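The sum above can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's implementation: the name correction_term is hypothetical, and the alignment of parameters with levels is an assumption based on the documented shapes, (n - 1, 1) for the parameters and (n, 1) for the probabilities, with the first level treated as a reference level that carries no parameter.

```python
def correction_term(params: list[float], probs: list[float]) -> float:
    """Hypothetical stand-in for get_asthma_prevalence_correction.

    Assumption: params has one entry fewer than probs, because the first
    risk factor level is treated as the reference and has no parameter.
    """
    if len(params) != len(probs) - 1:
        raise ValueError("expected len(params) == len(probs) - 1")
    # alpha = sum over the non-reference levels of p(lambda) * beta_lambda
    return sum(p * b for p, b in zip(probs[1:], params))


# Three levels with prevalences 0.6, 0.3, 0.1 and two parameters.
alpha = correction_term([0.5, 1.0], [0.6, 0.3, 0.1])  # 0.3 * 0.5 + 0.1 * 1.0
```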
- leap.data_generation.prevalence_calibration.compute_asthma_prevalence_λ(asthma_prev_risk_factor_params: list[float], odds_ratio_target: list[float], risk_factor_prob: list[float], β_0: float) → numpy.ndarray
Compute the asthma prevalence at each risk factor level, based on the risk factors and the parameters provided.
\[\zeta_{\lambda} = \sigma(\beta_0 + \log(\omega_{\lambda}) - \alpha)\]
where:
\(\beta_0 = \sigma^{-1}(\eta)\) is the intercept, the logit of the target asthma prevalence \(\eta\)
\(\omega_{\lambda}\) is the odds ratio for risk factor level \(\lambda\), odds_ratio_target[λ]
\(\alpha\) is the correction term for the asthma prevalence, computed by get_asthma_prevalence_correction
- Parameters:
- asthma_prev_risk_factor_params: list[float]
A vector of parameters for the risk factors, with shape (n - 1, 1).
- odds_ratio_target: list[float]
A vector of odds ratios between the risk factors and asthma, with shape (n, 1).
- risk_factor_prob: list[float]
A vector of the prevalence of the risk factor levels, with shape (n, 1).
- β_0: float
The intercept of the logistic regression model.
- Returns:
The predicted asthma prevalence at each risk factor level.
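As a rough illustration of the per-level formula, the sketch below evaluates \(\sigma(\beta_0 + \log(\omega_{\lambda}) - \alpha)\) for a small set of odds ratios. The helper names sigmoid, logit and prevalence_by_level are hypothetical, the inputs are made-up numbers, and the correction term is passed in directly rather than computed from the parameters.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written sigma in the formulas above."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Inverse of the logistic function."""
    return math.log(p / (1.0 - p))


def prevalence_by_level(odds_ratios: list[float], beta_0: float, alpha: float) -> list[float]:
    """Hypothetical sketch of the per-level prevalence zeta_lambda."""
    # zeta_lambda = sigma(beta_0 + log(omega_lambda) - alpha)
    return [sigmoid(beta_0 + math.log(w) - alpha) for w in odds_ratios]


eta = 0.1  # illustrative target prevalence
zeta_levels = prevalence_by_level([1.0, 1.5, 2.0], logit(eta), alpha=0.0)
```

With no correction (alpha = 0) and an odds ratio of 1, the first level reproduces the target prevalence; levels with larger odds ratios sit above it.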
- leap.data_generation.prevalence_calibration.compute_asthma_prevalence(asthma_prev_risk_factor_params: list[float], odds_ratio_target: list[float], risk_factor_prob: list[float], β_0: float) → float
Compute the asthma prevalence based on the risk factors and the parameters provided.
We want to find the calibrated asthma prevalence \(\zeta\):
\[\zeta = \sum_{\lambda=0}^{n} p(\lambda) \zeta_{\lambda}\]
where:
\(p(\lambda)\) is the probability of risk factor level \(\lambda\), risk_factor_prob[λ]
\(\zeta_{\lambda}\) is the predicted asthma prevalence at risk factor level \(\lambda\), asthma_prev_λ
We compute \(\zeta_{\lambda}\) as follows:
\[\begin{split}\zeta_{\lambda} &= \sigma(\beta_0 + \log(\omega_{\lambda}) - \alpha) \\ \beta_0 &= \sigma^{-1}(\eta) \\ \alpha &= \sum_{\lambda=1}^{n} p(\lambda) \cdot \beta_{\lambda}\end{split}\]
where:
\(\eta\) is the target asthma prevalence, asthma_prev_target, from the model of the BC Ministry of Health data.
\(\omega_{\lambda}\) is the odds ratio for risk factor level \(\lambda\), odds_ratio_target[λ]
\(\beta_{\lambda}\) is the parameter for risk factor level \(\lambda\), asthma_prev_risk_factor_params[λ]
\(\alpha\) is the correction term for the asthma prevalence, computed by get_asthma_prevalence_correction
- Parameters:
- asthma_prev_risk_factor_params: list[float]
A vector of parameters for the risk factors, with shape (n - 1, 1).
- odds_ratio_target: list[float]
A vector of odds ratios between the risk factors and asthma, with shape (n, 1).
- risk_factor_prob: list[float]
A vector of the prevalence of the risk factor levels, with shape (n, 1).
- β_0: float
The intercept of the logistic regression model.
- Returns:
The calibrated asthma prevalence.
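The chain from parameters to the overall prevalence can be sketched as below. This is an illustration under stated assumptions, not the library's code: overall_prevalence is a hypothetical name, the inputs are made-up numbers, and the first risk factor level is assumed to be a reference level without a parameter.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written sigma in the formulas above."""
    return 1.0 / (1.0 + math.exp(-x))


def overall_prevalence(params, odds_ratios, probs, beta_0):
    """Hypothetical stand-in for compute_asthma_prevalence."""
    # Assumption: the first level is the reference and has no parameter.
    alpha = sum(p * b for p, b in zip(probs[1:], params))
    # zeta_lambda for every level, then the probability-weighted sum zeta.
    zeta_levels = [sigmoid(beta_0 + math.log(w) - alpha) for w in odds_ratios]
    return sum(p * z for p, z in zip(probs, zeta_levels))


eta = 0.1  # illustrative target prevalence
beta_0 = math.log(eta / (1.0 - eta))  # logit of the target
zeta = overall_prevalence([0.0, 0.0], [1.0, 1.5, 2.0], [0.6, 0.3, 0.1], beta_0)
```

With all parameters at zero the correction vanishes and ζ (about 0.121 for these inputs) overshoots the target of 0.1, which is the gap the calibration is meant to close.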
- leap.data_generation.prevalence_calibration.compute_asthma_prevalence_difference(asthma_prev_risk_factor_params: list[float], odds_ratio_target: list[float], risk_factor_prob: list[float], β_0: float, asthma_prev_target: float) → float
Compute the absolute difference between the calibrated and target asthma prevalence.
We want to find:
\[|\zeta - \eta|\]
where:
\(\zeta\) is the calibrated asthma prevalence, computed by compute_asthma_prevalence
\(\eta\) is the target asthma prevalence, asthma_prev_target, from the model of the BC Ministry of Health data.
- Parameters:
- asthma_prev_risk_factor_params: list[float]
A vector of parameters for the risk factors, with shape (n - 1, 1).
- odds_ratio_target: list[float]
A vector of odds ratios between the risk factors and asthma, with shape (n, 1).
- risk_factor_prob: list[float]
A vector of the prevalence of the risk factor levels, with shape (n, 1).
- β_0: float
The intercept of the logistic regression model.
- asthma_prev_target: float
The target prevalence of asthma.
- Returns:
The absolute difference between the calibrated and target asthma prevalence.
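To show how this quantity behaves as an objective, the sketch below evaluates \(|\zeta - \eta|\) for two parameter vectors. The name prevalence_difference is hypothetical, the numbers are made up, and the first risk factor level is again assumed to be a parameter-free reference level.

```python
import math


def prevalence_difference(params, odds_ratios, probs, beta_0, target):
    """Hypothetical stand-in for compute_asthma_prevalence_difference."""
    # Assumption: the first level is the reference and has no parameter.
    alpha = sum(p * b for p, b in zip(probs[1:], params))
    # zeta = sum over levels of p(lambda) * sigma(beta_0 + log(omega) - alpha)
    zeta = sum(
        p / (1.0 + math.exp(-(beta_0 + math.log(w) - alpha)))
        for p, w in zip(probs, odds_ratios)
    )
    return abs(zeta - target)


eta = 0.1  # illustrative target prevalence
beta_0 = math.log(eta / (1.0 - eta))  # logit of the target
odds_ratios = [1.0, 1.5, 2.0]
probs = [0.6, 0.3, 0.1]

uncorrected = prevalence_difference([0.0, 0.0], odds_ratios, probs, beta_0, eta)
corrected = prevalence_difference([0.5, 0.5], odds_ratios, probs, beta_0, eta)
```

For these inputs the nonzero parameters bring the calibrated prevalence closer to the target, so corrected is smaller than uncorrected.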
- leap.data_generation.prevalence_calibration.optimize_prevalence_β_parameters(asthma_prev_target: float, odds_ratio_target: list[float], risk_factor_prob: list[float], β_0: float | None = None, verbose: bool = False) → list[float]
Calibrate asthma prevalence based on the target prevalence and odds ratios of risk factors.
We want to find the parameters \(\beta_{\lambda}\) such that the difference between the calibrated asthma prevalence and the target asthma prevalence is minimized. The calibrated asthma prevalence is computed as follows:
\[\begin{split}\beta_0 &= \sigma^{-1}(\eta) \\ \zeta_{\lambda} &= \sigma(\beta_0 + \log(\omega_{\lambda}) - \alpha) \\ \alpha &= \sum_{\lambda=1}^{n} p(\lambda) \cdot \beta_{\lambda} \\ \zeta &= \sum_{\lambda=0}^{n} p(\lambda) \zeta_{\lambda}\end{split}\]
where:
\(\eta\) is the target asthma prevalence, asthma_prev_target, from the model of the BC Ministry of Health data.
\(\omega_{\lambda}\) is the odds ratio for risk factor level \(\lambda\), odds_ratio_target[λ]
\(p(\lambda)\) is the prevalence of risk factor level \(\lambda\), risk_factor_prob[λ]
\(\beta_{\lambda}\) is the parameter for risk factor level \(\lambda\), asthma_prev_risk_factor_params[λ]
\(\alpha\) is the correction term for the asthma prevalence
\(\zeta_{\lambda}\) is the predicted asthma prevalence at risk factor level \(\lambda\)
\(\zeta\) is the predicted / calibrated asthma prevalence
The function uses the BFGS optimization algorithm to minimize the absolute difference between the calibrated asthma prevalence and the target asthma prevalence.
- Parameters:
- asthma_prev_target: float
The target prevalence of asthma from the BC Ministry of Health model.
- odds_ratio_target: list[float]
A vector of odds ratios for the risk factors, with shape (n, 1).
- risk_factor_prob: list[float]
A vector of the prevalence of the risk factors, with shape (n, 1).
- β_0: float | None = None
The intercept of the logistic regression model. If None, it is set to the logit of the target prevalence.
- verbose: bool = False
A boolean indicating if the trace should be printed.
- Returns:
A vector of the asthma prevalence beta parameters for each risk factor level, with shape (n - 1, 1).
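In the formulas as written above, the parameters enter the calibrated prevalence only through the scalar correction term \(\alpha\), and \(\zeta\) decreases as \(\alpha\) grows. The sketch below uses that observation to illustrate what the calibration is solving for. It deliberately swaps in a plain bisection on \(\alpha\) instead of the BFGS search the function is documented to use, so it is not the library's method; the names overall_prevalence and solve_alpha and all input numbers are hypothetical.

```python
import math


def overall_prevalence(alpha, odds_ratios, probs, beta_0):
    """zeta as a function of the scalar correction term alpha."""
    return sum(
        p / (1.0 + math.exp(-(beta_0 + math.log(w) - alpha)))
        for p, w in zip(probs, odds_ratios)
    )


def solve_alpha(target, odds_ratios, probs, lo=-20.0, hi=20.0, tol=1e-12):
    """Bisection for the alpha at which zeta equals the target (not BFGS)."""
    beta_0 = math.log(target / (1.0 - target))  # logit of the target
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # zeta decreases as alpha grows, so move the bound that keeps
        # the root bracketed.
        if overall_prevalence(mid, odds_ratios, probs, beta_0) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


eta = 0.1  # illustrative target prevalence
odds_ratios = [1.0, 1.5, 2.0]
probs = [0.6, 0.3, 0.1]

alpha = solve_alpha(eta, odds_ratios, probs)
beta_0 = math.log(eta / (1.0 - eta))
residual = abs(overall_prevalence(alpha, odds_ratios, probs, beta_0) - eta)
```

Under these assumptions, any parameter vector whose probability-weighted sum equals the solved \(\alpha\) reproduces the target prevalence, which is why a general-purpose optimizer over the parameters can reach the same result.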