Prevalence Calibration

This module is a helper for occurrence_calibration_data.py. It contains the functions used to calibrate the asthma prevalence equation.

leap.data_generation.prevalence_calibration module

leap.data_generation.prevalence_calibration.get_asthma_prevalence_correction(asthma_prev_risk_factor_params: list[float], risk_factor_prob: list[float]) → float

Compute the correction term for asthma prevalence.

\[\alpha = \sum_{\lambda=1}^{n} p(\lambda) \cdot \beta_{\lambda}\]

where:

  • \(\alpha\) is the correction term for the asthma prevalence

  • \(p(\lambda)\) is the prevalence of risk factor level \(\lambda\), risk_factor_prob[λ]

  • \(\beta_{\lambda}\) is the parameter for risk factor level \(\lambda\), asthma_prev_risk_factor_params[λ]

Parameters:
asthma_prev_risk_factor_params: list[float]

A vector of parameters for the risk factors, with shape (n - 1, 1).

risk_factor_prob: list[float]

A vector of the prevalence of the risk factor levels, with shape (n, 1).

Returns:

The correction term for asthma prevalence.
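The sum above can be sketched in a few lines of NumPy. This is an illustration only, not the leap implementation: the name correction_term_sketch is hypothetical, and the pairing of indices is an assumption. Because the parameter vector has one entry fewer than the probability vector, the sketch assumes that the first entry of risk_factor_prob is a reference level with no parameter.

```python
import numpy as np


def correction_term_sketch(beta, p):
    """Return alpha = sum of p(lambda) * beta_lambda.

    Assumption (not confirmed by the documentation above): p[0] is a
    reference level without a parameter, so beta[k] pairs with p[k + 1].
    """
    beta = np.asarray(beta, dtype=float)
    p = np.asarray(p, dtype=float)
    if beta.shape[0] != p.shape[0] - 1:
        raise ValueError("expected len(beta) == len(p) - 1")
    # Weighted sum over the non-reference levels.
    return float(np.dot(p[1:], beta))


# Three risk factor levels, two parameters (illustrative values only).
alpha = correction_term_sketch([0.5, 1.0], [0.6, 0.3, 0.1])
```

With these illustrative inputs the sum is 0.3 · 0.5 + 0.1 · 1.0.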

leap.data_generation.prevalence_calibration.compute_asthma_prevalence_λ(asthma_prev_risk_factor_params: list[float], odds_ratio_target: list[float], risk_factor_prob: list[float], β_0: float) → numpy.ndarray

Compute the asthma prevalence at each risk factor level from the risk factors and the parameters provided.

\[\zeta_{\lambda} = \sigma(\beta_0 + \log(\omega_{\lambda}) - \alpha)\]

where:

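  • \(\sigma\) is the logistic (sigmoid) function and \(\sigma^{-1}\) is its inverse, the logit

  • \(\beta_0 = \sigma^{-1}(\eta)\) is the intercept, the logit of the target asthma prevalence \(\eta\)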

  • \(\omega_{\lambda}\) is the odds ratio for risk factor level \(\lambda\), odds_ratio_target[λ]

  • \(\alpha\) is the correction term for the asthma prevalence, computed by get_asthma_prevalence_correction

Parameters:
asthma_prev_risk_factor_params: list[float]

A vector of parameters for the risk factors, with shape (n - 1, 1).

odds_ratio_target: list[float]

A vector of odds ratios between the risk factors and asthma, with shape (n, 1).

risk_factor_prob: list[float]

A vector of the prevalence of the risk factor levels, with shape (n, 1).

β_0: float

The intercept of the logistic regression model.

Returns:

The calibrated asthma prevalence at each risk factor level.
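As a worked illustration of the formula for \(\zeta_{\lambda}\), the following sketch applies the sigmoid to the shifted log odds. The function names and the input values are hypothetical, and the correction term is passed in directly rather than computed from the parameters.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def logit(prob):
    """Inverse of the logistic function."""
    return np.log(prob / (1.0 - prob))


def prevalence_by_level_sketch(odds_ratios, beta_0, alpha):
    """Return zeta_lambda = sigma(beta_0 + log(omega_lambda) - alpha)."""
    odds_ratios = np.asarray(odds_ratios, dtype=float)
    return sigmoid(beta_0 + np.log(odds_ratios) - alpha)


eta = 0.1  # illustrative target prevalence
zeta_levels = prevalence_by_level_sketch([1.0, 2.0, 4.0], logit(eta), alpha=0.0)
```

With no correction (alpha = 0), a level with odds ratio 1 reproduces the target prevalence, and levels with larger odds ratios sit above it.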

leap.data_generation.prevalence_calibration.compute_asthma_prevalence(asthma_prev_risk_factor_params: list[float], odds_ratio_target: list[float], risk_factor_prob: list[float], β_0: float) → float

Compute the asthma prevalence based on the risk factors and the parameters provided.

We want to find the calibrated asthma prevalence \(\zeta\):

\[\begin{split}\zeta &= \sum_{\lambda=0}^{n} p(\lambda) \zeta_{\lambda} \\\end{split}\]

where:

  • \(p(\lambda)\) is the probability of risk factor level \(\lambda\), risk_factor_prob[λ]

  • \(\zeta_{\lambda}\) is the predicted asthma prevalence at risk factor level \(\lambda\), asthma_prev_λ

We compute \(\zeta_{\lambda}\) as follows:

\[\begin{split}\zeta_{\lambda} &= \sigma(\beta_0 + \log(\omega_{\lambda}) - \alpha) \\ \beta_0 &= \sigma^{-1}(\eta) \\ \alpha &= \sum_{\lambda=1}^{n} p(\lambda) \cdot \beta_{\lambda}\end{split}\]

where:

  • \(\eta\) is the target asthma prevalence, asthma_prev_target, from the model of the BC Ministry of Health data.

  • \(\omega_{\lambda}\) is the odds ratio for risk factor level \(\lambda\), odds_ratio_target[λ]

  • \(\beta_{\lambda}\) is the parameter for risk factor level \(\lambda\), asthma_prev_risk_factor_params[λ]

  • \(\alpha\) is the correction term for the asthma prevalence, computed by get_asthma_prevalence_correction

Parameters:
asthma_prev_risk_factor_params: list[float]

A vector of parameters for the risk factors, with shape (n - 1, 1).

odds_ratio_target: list[float]

A vector of odds ratios between the risk factors and asthma, with shape (n, 1).

risk_factor_prob: list[float]

A vector of the prevalence of the risk factor levels, with shape (n, 1).

β_0: float

The intercept of the logistic regression model.

Returns:

The calibrated asthma prevalence.
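The sketch below chains the three formulas to show why a correction term is needed: with all parameters set to zero, the population-weighted prevalence overshoots the target. The function name and all input values are hypothetical, and the index pairing (the first probability entry is a reference level without a parameter) is an assumption.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def overall_prevalence_sketch(beta, odds_ratios, p, beta_0):
    """Return zeta = sum of p(lambda) * zeta_lambda."""
    beta = np.asarray(beta, dtype=float)
    odds_ratios = np.asarray(odds_ratios, dtype=float)
    p = np.asarray(p, dtype=float)
    # Assumption: p[0] is the reference level, so beta[k] pairs with p[k + 1].
    alpha = np.dot(p[1:], beta)
    zeta_levels = sigmoid(beta_0 + np.log(odds_ratios) - alpha)
    return float(np.dot(p, zeta_levels))


eta = 0.1                           # illustrative target prevalence
beta_0 = np.log(eta / (1.0 - eta))  # logit of the target
odds_ratios = [1.0, 2.0, 4.0]
p = [0.6, 0.3, 0.1]

zeta_uncorrected = overall_prevalence_sketch([0.0, 0.0], odds_ratios, p, beta_0)
zeta_shifted = overall_prevalence_sketch([0.5, 1.0], odds_ratios, p, beta_0)
```

Here zeta_uncorrected is larger than eta, and positive parameters lower the result because they increase \(\alpha\).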

leap.data_generation.prevalence_calibration.compute_asthma_prevalence_difference(asthma_prev_risk_factor_params: list[float], odds_ratio_target: list[float], risk_factor_prob: list[float], β_0: float, asthma_prev_target: float) → float

Compute the absolute difference between the calibrated and target asthma prevalence.

We want to find:

\[|\zeta - \eta|\]

where:

  • \(\zeta\) is the calibrated asthma prevalence, computed by compute_asthma_prevalence

  • \(\eta\) is the target asthma prevalence, asthma_prev_target, from the model of the BC Ministry of Health data.

Parameters:
asthma_prev_risk_factor_params: list[float]

A vector of parameters for the risk factors, with shape (n - 1, 1).

odds_ratio_target: list[float]

A vector of odds ratios between the risk factors and asthma, with shape (n, 1).

risk_factor_prob: list[float]

A vector of the prevalence of the risk factor levels, with shape (n, 1).

β_0: float

The intercept of the logistic regression model.

asthma_prev_target: float

The target prevalence of asthma.

Returns:

The absolute difference between the calibrated and target asthma prevalence.

leap.data_generation.prevalence_calibration.optimize_prevalence_β_parameters(asthma_prev_target: float, odds_ratio_target: list[float], risk_factor_prob: list[float], β_0: float | None = None, verbose: bool = False) → list[float]

Calibrate asthma prevalence based on the target prevalence and odds ratios of risk factors.

We want to find the parameters \(\beta_{\lambda}\) such that the difference between the calibrated asthma prevalence and the target asthma prevalence is minimized. The calibrated asthma prevalence is computed as follows:

\[\begin{split}\beta_0 &= \sigma^{-1}(\eta) \\ \zeta_{\lambda} &= \sigma(\beta_0 + \log(\omega_{\lambda}) - \alpha) \\ \alpha &= \sum_{\lambda=1}^{n} p(\lambda) \cdot \beta_{\lambda} \\ \zeta &= \sum_{\lambda=0}^{n} p(\lambda) \zeta_{\lambda}\end{split}\]

where:

  • \(\eta\) is the target asthma prevalence, asthma_prev_target, from the model of the BC Ministry of Health data.

  • \(\omega_{\lambda}\) is the odds ratio for risk factor level \(\lambda\), odds_ratio_target[λ]

  • \(p(\lambda)\) is the prevalence of risk factor level \(\lambda\), risk_factor_prob[λ]

  • \(\beta_{\lambda}\) is the parameter for risk factor level \(\lambda\), asthma_prev_risk_factor_params[λ]

  • \(\alpha\) is the correction term for the asthma prevalence

  • \(\zeta_{\lambda}\) is the predicted asthma prevalence at risk factor level \(\lambda\)

  • \(\zeta\) is the predicted / calibrated asthma prevalence

The function uses the BFGS optimization algorithm to minimize the absolute difference between the calibrated asthma prevalence and the target asthma prevalence.

Parameters:
asthma_prev_target: float

The target prevalence of asthma from the BC Ministry of Health model.

odds_ratio_target: list[float]

A vector of odds ratios for the risk factors, with shape (n, 1).

risk_factor_prob: list[float]

A vector of the prevalence of the risk factors, with shape (n, 1).

β_0: float | None = None

The intercept of the logistic regression model. If None, it is set to the logit of the target prevalence.

verbose: bool = False

If True, print the optimization trace.

Returns:

A vector of the asthma prevalence beta parameters for each risk factor level, with shape (n - 1, 1).
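A minimal sketch of the calibration loop, assuming SciPy is available, could look like the following. It is not the leap implementation: the helper prevalence_gap_sketch, the zero starting point, the input values, and the index pairing (the first probability entry is a reference level without a parameter) are all assumptions made for illustration. Only scipy.optimize.minimize with method="BFGS" is taken from the description above.

```python
import numpy as np
from scipy.optimize import minimize


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def prevalence_gap_sketch(beta, odds_ratios, p, beta_0, eta):
    """Return |zeta - eta| for a candidate parameter vector beta."""
    # Assumption: p[0] is the reference level, so beta[k] pairs with p[k + 1].
    alpha = np.dot(p[1:], beta)
    zeta_levels = sigmoid(beta_0 + np.log(odds_ratios) - alpha)
    return abs(np.dot(p, zeta_levels) - eta)


eta = 0.1                           # illustrative target prevalence
odds_ratios = np.array([1.0, 2.0, 4.0])
p = np.array([0.6, 0.3, 0.1])
beta_0 = np.log(eta / (1.0 - eta))  # default intercept: logit of the target

# Hypothetical starting point; the one used by leap is not documented here.
x0 = np.zeros(len(p) - 1)
initial_gap = prevalence_gap_sketch(x0, odds_ratios, p, beta_0, eta)

result = minimize(
    prevalence_gap_sketch,
    x0,
    args=(odds_ratios, p, beta_0, eta),
    method="BFGS",
)
beta_hat = result.x
final_gap = prevalence_gap_sketch(beta_hat, odds_ratios, p, beta_0, eta)
```

Because the absolute value is not differentiable at the optimum, BFGS may stop with a precision-loss warning rather than a clean convergence message, so it is worth checking final_gap directly instead of relying on result.success alone.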