Prevalence Calibration

This module is a helper for occurrence_calibration_data.py. It contains the functions used to calibrate the asthma prevalence equation.

leap.data_generation.prevalence_calibration module

leap.data_generation.prevalence_calibration.get_asthma_prevalence_correction(asthma_prev_risk_factor_params: list[float], risk_factor_prob: list[float]) → float

Compute the correction term for asthma prevalence.

\[\alpha = \sum_{\lambda=1}^{n} p(\lambda) \cdot \beta_{\lambda}\]

where:

\(\alpha\) is the correction term for the asthma prevalence
\(p(\lambda)\) is the prevalence of risk factor level \(\lambda\), risk_factor_prob[λ]
\(\beta_{\lambda}\) is the parameter for risk factor level \(\lambda\), asthma_prev_risk_factor_params[λ]

Parameters:

asthma_prev_risk_factor_params: list[float]: A vector of parameters for the risk factors, with shape (n - 1, 1).
risk_factor_prob: list[float]: A vector of the prevalence of the risk factor levels, with shape (n, 1).

Returns:

The correction term for asthma prevalence.
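
As a rough illustration of the correction term, the sketch below computes \(\alpha\) in plain Python. The name sketch_prevalence_correction is hypothetical and is not part of leap. The sketch also assumes that the first risk factor level is a reference level whose parameter is 0, which is one way to reconcile the documented lengths n - 1 and n; the actual implementation may handle this differently.

```python
def sketch_prevalence_correction(beta_params, risk_factor_prob):
    """Return alpha = sum over levels of p(level) * beta(level)."""
    # Assumption (not confirmed by the documentation): the first level is a
    # reference level with parameter 0, so the n - 1 parameters are padded
    # to match the n prevalence values.
    padded = [0.0] + list(beta_params)
    if len(padded) != len(risk_factor_prob):
        raise ValueError("expected len(beta_params) == len(risk_factor_prob) - 1")
    return sum(p * b for p, b in zip(risk_factor_prob, padded))


# Three risk factor levels with made-up prevalences and parameters.
alpha = sketch_prevalence_correction([0.5, 1.0], [0.5, 0.25, 0.25])
print(alpha)
```

The values passed in are invented for illustration only; they are not taken from any calibration data.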

leap.data_generation.prevalence_calibration.compute_asthma_prevalence_λ(asthma_prev_risk_factor_params: list[float], odds_ratio_target: list[float], risk_factor_prob: list[float], β_0: float) → numpy.ndarray

Compute the predicted asthma prevalence at each risk factor level, based on the risk factors and the parameters provided.

\[\zeta_{\lambda} = \sigma(\beta_0 + \log(\omega_{\lambda}) - \alpha)\]

where:

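\(\sigma\) is the logistic function and \(\sigma^{-1}\) is its inverse, the logit
\(\beta_0 = \sigma^{-1}(\eta)\) is the intercept, the logit of the target asthma prevalence \(\eta\)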
\(\omega_{\lambda}\) is the odds ratio for risk factor level \(\lambda\), odds_ratio_target[λ]
\(\alpha\) is the correction term for the asthma prevalence, computed by get_asthma_prevalence_correction

Parameters:

asthma_prev_risk_factor_params: list[float]: A vector of parameters for the risk factors, with shape (n - 1, 1).
odds_ratio_target: list[float]: A vector of odds ratios between the risk factors and asthma, with shape (n, 1).
risk_factor_prob: list[float]: A vector of the prevalence of the risk factor levels, with shape (n, 1).
β_0: float: The intercept of the logistic regression model.

Returns:

The predicted asthma prevalence \(\zeta_{\lambda}\) at each risk factor level.
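
The per-level formula can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names logistic, logit and sketch_prevalence_by_level are hypothetical, the sketch takes \(\alpha\) directly instead of the parameter and prevalence vectors that the documented function accepts, and it returns a list where the documented function returns a numpy.ndarray.

```python
import math


def logistic(x):
    """The logistic function, sigma(x)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """The inverse of the logistic function, sigma^{-1}(p)."""
    return math.log(p / (1.0 - p))


def sketch_prevalence_by_level(alpha, odds_ratio_target, beta_0):
    """Return zeta_lambda = sigma(beta_0 + log(omega_lambda) - alpha) per level."""
    return [logistic(beta_0 + math.log(w) - alpha) for w in odds_ratio_target]


# Made-up inputs: target prevalence 0.1, odds ratios 1 and 2, no correction.
beta_0 = logit(0.1)
zeta_levels = sketch_prevalence_by_level(0.0, [1.0, 2.0], beta_0)
print(zeta_levels)
```

With \(\alpha = 0\), the level with odds ratio 1 reproduces the target prevalence and the level with odds ratio 2 has doubled odds, which is what the formula implies.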

leap.data_generation.prevalence_calibration.compute_asthma_prevalence(asthma_prev_risk_factor_params: list[float], odds_ratio_target: list[float], risk_factor_prob: list[float], β_0: float) → float

Compute the calibrated asthma prevalence, averaged over the risk factor levels, based on the risk factors and the parameters provided.

We want to find the calibrated asthma prevalence \(\zeta\):

\[\zeta = \sum_{\lambda=0}^{n} p(\lambda) \zeta_{\lambda}\]

where:

\(p(\lambda)\) is the probability of risk factor level \(\lambda\), risk_factor_prob[λ]
\(\zeta_{\lambda}\) is the predicted asthma prevalence at risk factor level \(\lambda\), asthma_prev_λ

We compute \(\zeta_{\lambda}\) as follows:

\[\begin{split}\zeta_{\lambda} &= \sigma(\beta_0 + \log(\omega_{\lambda}) - \alpha) \\ \beta_0 &= \sigma^{-1}(\eta) \\ \alpha &= \sum_{\lambda=1}^{n} p(\lambda) \cdot \beta_{\lambda}\end{split}\]

where:

\(\eta\) is the target asthma prevalence, asthma_prev_target, from the model of the BC Ministry of Health data.
\(\omega_{\lambda}\) is the odds ratio for risk factor level \(\lambda\), odds_ratio_target[λ]
\(\beta_{\lambda}\) is the parameter for risk factor level \(\lambda\), asthma_prev_risk_factor_params[λ]
\(\alpha\) is the correction term for the asthma prevalence, computed by get_asthma_prevalence_correction

Parameters:

asthma_prev_risk_factor_params: list[float]: A vector of parameters for the risk factors, with shape (n - 1, 1).
odds_ratio_target: list[float]: A vector of odds ratios between the risk factors and asthma, with shape (n, 1).
risk_factor_prob: list[float]: A vector of the prevalence of the risk factor levels, with shape (n, 1).
β_0: float: The intercept of the logistic regression model.

Returns:

The calibrated asthma prevalence.
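
To make the weighted sum concrete, here is a minimal sketch in plain Python. The name sketch_calibrated_prevalence is hypothetical, the inputs are made up, and \(\alpha\) is passed in directly instead of being derived from asthma_prev_risk_factor_params as in the documented function.

```python
import math


def logistic(x):
    """The logistic function, sigma(x)."""
    return 1.0 / (1.0 + math.exp(-x))


def sketch_calibrated_prevalence(alpha, odds_ratio_target, risk_factor_prob, beta_0):
    """Return zeta = sum over levels of p(level) * zeta_level."""
    zeta_levels = [logistic(beta_0 + math.log(w) - alpha) for w in odds_ratio_target]
    return sum(p * z for p, z in zip(risk_factor_prob, zeta_levels))


eta = 0.1                             # made-up target prevalence
beta_0 = math.log(eta / (1.0 - eta))  # beta_0 = logit(eta)
zeta = sketch_calibrated_prevalence(0.0, [1.0, 2.0], [0.5, 0.5], beta_0)
print(zeta, eta)
```

In this toy case the uncorrected prevalence (\(\alpha = 0\)) comes out above the target, which shows why a correction term is subtracted inside the logistic function.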

leap.data_generation.prevalence_calibration.compute_asthma_prevalence_difference(asthma_prev_risk_factor_params: list[float], odds_ratio_target: list[float], risk_factor_prob: list[float], β_0: float, asthma_prev_target: float) → float

Compute the absolute difference between the calibrated and target asthma prevalence.

We want to find:

\[|\zeta - \eta|\]

where:

\(\zeta\) is the calibrated asthma prevalence, computed by compute_asthma_prevalence
\(\eta\) is the target asthma prevalence, asthma_prev_target, from the model of the BC Ministry of Health data.

Parameters:

asthma_prev_risk_factor_params: list[float]: A vector of parameters for the risk factors, with shape (n - 1, 1).
odds_ratio_target: list[float]: A vector of odds ratios between the risk factors and asthma, with shape (n, 1).
risk_factor_prob: list[float]: A vector of the prevalence of the risk factor levels, with shape (n, 1).
β_0: float: The intercept of the logistic regression model.
asthma_prev_target: float: The target prevalence of asthma.

Returns:

The absolute difference between the calibrated and target asthma prevalence.
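
The objective \(|\zeta - \eta|\) can be sketched as below, again in plain Python with made-up inputs. The name sketch_prevalence_difference is hypothetical, and the sketch is parameterized by \(\alpha\) directly instead of by asthma_prev_risk_factor_params.

```python
import math


def logistic(x):
    """The logistic function, sigma(x)."""
    return 1.0 / (1.0 + math.exp(-x))


def sketch_prevalence_difference(alpha, odds_ratio_target, risk_factor_prob,
                                 beta_0, asthma_prev_target):
    """Return |zeta - eta| for a given correction term alpha."""
    zeta = sum(
        p * logistic(beta_0 + math.log(w) - alpha)
        for p, w in zip(risk_factor_prob, odds_ratio_target)
    )
    return abs(zeta - asthma_prev_target)


eta = 0.1
beta_0 = math.log(eta / (1.0 - eta))
odds_ratios, probs = [1.0, 2.0], [0.5, 0.5]

# alpha = 0 leaves zeta above the target; alpha = log(2) pushes it below.
diff_low = sketch_prevalence_difference(0.0, odds_ratios, probs, beta_0, eta)
diff_high = sketch_prevalence_difference(math.log(2.0), odds_ratios, probs, beta_0, eta)
print(diff_low, diff_high)
```

Both evaluations return a positive value although \(\zeta\) lies on opposite sides of \(\eta\), so in this toy case the minimum of the objective sits somewhere between the two values of \(\alpha\).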

leap.data_generation.prevalence_calibration.optimize_prevalence_β_parameters(asthma_prev_target: float, odds_ratio_target: list[float], risk_factor_prob: list[float], β_0: float | None = None, verbose: bool = False) → list[float]

Calibrate asthma prevalence based on the target prevalence and odds ratios of risk factors.

We want to find the parameters \(\beta_{\lambda}\) such that the difference between the calibrated asthma prevalence and the target asthma prevalence is minimized. The calibrated asthma prevalence is computed as follows:

\[\begin{split}\beta_0 &= \sigma^{-1}(\eta) \\ \zeta_{\lambda} &= \sigma(\beta_0 + \log(\omega_{\lambda}) - \alpha) \\ \alpha &= \sum_{\lambda=1}^{n} p(\lambda) \cdot \beta_{\lambda} \\ \zeta &= \sum_{\lambda=0}^{n} p(\lambda) \zeta_{\lambda}\end{split}\]

where:

\(\eta\) is the target asthma prevalence, asthma_prev_target, from the model of the BC Ministry of Health data.
\(\omega_{\lambda}\) is the odds ratio for risk factor level \(\lambda\), odds_ratio_target[λ]
\(p(\lambda)\) is the prevalence of risk factor level \(\lambda\), risk_factor_prob[λ]
\(\beta_{\lambda}\) is the parameter for risk factor level \(\lambda\), asthma_prev_risk_factor_params[λ]
\(\alpha\) is the correction term for the asthma prevalence
\(\zeta_{\lambda}\) is the predicted asthma prevalence at risk factor level \(\lambda\)
\(\zeta\) is the predicted / calibrated asthma prevalence

The function uses the BFGS optimization algorithm to minimize the absolute difference between the calibrated asthma prevalence and the target asthma prevalence.

Parameters:

asthma_prev_target: float: The target prevalence of asthma from the BC Ministry of Health model.
odds_ratio_target: list[float]: A vector of odds ratios for the risk factor levels, with shape (n, 1).
risk_factor_prob: list[float]: A vector of the prevalence of the risk factor levels, with shape (n, 1).
β_0: float | None = None: The intercept of the logistic regression model. If None, it is set to the logit of the target prevalence.
verbose: bool = False: Whether to print the optimization trace.

Returns:

A vector of the asthma prevalence beta parameters for each risk factor level, with shape (n - 1, 1).
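
The calibration target can also be illustrated without an optimizer. In the equations as written above, the parameters \(\beta_{\lambda}\) affect \(\zeta\) only through the scalar \(\alpha\), and \(\zeta\) decreases as \(\alpha\) grows. The sketch below uses plain bisection on \(\alpha\) and is deliberately not the BFGS routine that this function uses. It assumes that risk_factor_prob sums to 1, the names calibrated_prevalence and sketch_solve_alpha are hypothetical, and it returns \(\alpha\), not the vector of \(\beta_{\lambda}\).

```python
import math


def logistic(x):
    """The logistic function, sigma(x)."""
    return 1.0 / (1.0 + math.exp(-x))


def calibrated_prevalence(alpha, odds_ratio_target, risk_factor_prob, beta_0):
    """Return zeta = sum over levels of p(level) * sigma(beta_0 + log(omega) - alpha)."""
    return sum(
        p * logistic(beta_0 + math.log(w) - alpha)
        for p, w in zip(risk_factor_prob, odds_ratio_target)
    )


def sketch_solve_alpha(asthma_prev_target, odds_ratio_target, risk_factor_prob,
                       tol=1e-12):
    """Find alpha such that zeta(alpha) matches the target, by bisection."""
    beta_0 = math.log(asthma_prev_target / (1.0 - asthma_prev_target))
    log_odds_ratios = [math.log(w) for w in odds_ratio_target]
    # Assuming the prevalences sum to 1, zeta >= eta at the smallest log odds
    # ratio and zeta <= eta at the largest, so the root lies in this bracket.
    lo, hi = min(log_odds_ratios), max(log_odds_ratios)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        zeta = calibrated_prevalence(mid, odds_ratio_target, risk_factor_prob, beta_0)
        if zeta > asthma_prev_target:
            lo = mid  # zeta too high: a larger correction is needed
        else:
            hi = mid  # zeta too low (or equal): shrink the correction
    return 0.5 * (lo + hi)


eta = 0.1
odds_ratios, probs = [1.0, 2.0], [0.5, 0.5]
alpha = sketch_solve_alpha(eta, odds_ratios, probs)
beta_0 = math.log(eta / (1.0 - eta))
print(alpha, calibrated_prevalence(alpha, odds_ratios, probs, beta_0))
```

Bisection is chosen here only because it is easy to check by hand for a single unknown; it says nothing about how the library's optimizer chooses among the individual \(\beta_{\lambda}\) values.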