leap.data_generation.utils module
- leap.data_generation.utils.get_province_id(province: str) → str
Convert a full-length province name to its abbreviation.
- leap.data_generation.utils.get_sex_id(sex: str) → str
Convert a full-length sex label to a single character.
- leap.data_generation.utils.parse_age_group(x: str, max_age: int) → tuple[int, int]
Parse an age group string into a tuple of integers.
- Parameters:
- x: str
The age group string. Must be in one of the formats “X-Y”, “X+”, “X-Y years”, or “<1 year”.
- max_age: int
The upper age assigned to open-ended groups such as “X+”.
- Returns:
A tuple of integers giving the lower and upper age of the age group.
Examples
>>> parse_age_group("0-4", max_age=65)
(0, 4)
>>> parse_age_group("5-9 years", max_age=65)
(5, 9)
>>> parse_age_group("10+", max_age=65)
(10, 65)
>>> parse_age_group("<1 year", max_age=65)
(0, 1)
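The four accepted formats can be illustrated with a small regular-expression parser. This is only a sketch that reproduces the documented examples; `parse_age_group_sketch` is a hypothetical name, and the library's actual implementation may differ (for instance in how it treats whitespace or invalid input).

```python
import re


def parse_age_group_sketch(x: str, max_age: int) -> tuple[int, int]:
    """Hypothetical re-implementation for illustration; not the library's code."""
    text = x.strip()

    # "<1 year": ages from 0 up to the stated bound.
    match = re.fullmatch(r"<\s*(\d+)(?:\s+years?)?", text)
    if match:
        return 0, int(match.group(1))

    # "X+": open-ended group, capped at max_age.
    match = re.fullmatch(r"(\d+)\s*\+(?:\s+years?)?", text)
    if match:
        return int(match.group(1)), max_age

    # "X-Y" or "X-Y years": closed range.
    match = re.fullmatch(r"(\d+)\s*-\s*(\d+)(?:\s+years?)?", text)
    if match:
        return int(match.group(1)), int(match.group(2))

    # Assumption: unrecognized strings are rejected rather than guessed at.
    raise ValueError(f"Unrecognized age group: {x!r}")
```

Trying the most specific patterns first keeps the three branches independent, so adding a new format would not change how the existing ones are parsed.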
- leap.data_generation.utils.format_age_group(age_group: str, upper_age_group: str = '100 years and over') → int
Convert age group to integer.
Examples
>>> format_age_group("110 years and over", "110 years and over")
110
>>> format_age_group("Under 1 year", "100 years and over")
0
>>> format_age_group("9 years")
9
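One way to obtain the behaviour shown in these examples is sketched below. `format_age_group_sketch` and its helper are hypothetical names, and the handling of "Under ..." labels and of malformed input is an assumption rather than the library's documented behaviour.

```python
import re


def _leading_int(label: str) -> int:
    """Return the integer at the start of a label such as '110 years and over'."""
    match = re.match(r"\s*(\d+)", label)
    if match is None:
        raise ValueError(f"No leading age found in {label!r}")
    return int(match.group(1))


def format_age_group_sketch(
    age_group: str, upper_age_group: str = "100 years and over"
) -> int:
    """Hypothetical re-implementation for illustration; not the library's code."""
    # The open-ended top group maps to its lower bound.
    if age_group == upper_age_group:
        return _leading_int(upper_age_group)
    # Assumption: any "Under N year(s)" label maps to age 0.
    if age_group.strip().lower().startswith("under"):
        return 0
    # Single-year labels such as "9 years" or "1 year".
    if re.fullmatch(r"\d+\s+years?", age_group.strip()):
        return _leading_int(age_group)
    raise ValueError(f"Unrecognized age group: {age_group!r}")
```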
- leap.data_generation.utils.heaviside(x: float | list[float] | numpy.ndarray | pandas.core.series.Series, threshold: float) → int | list[int]
Heaviside step function.
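A thresholded step function of this shape might look like the sketch below. `heaviside_sketch` is a hypothetical name, and the choice to return 1 when a value equals the threshold is an assumption; the source does not state which convention the library uses at the boundary.

```python
from collections.abc import Iterable


def heaviside_sketch(x, threshold: float):
    """Hypothetical step function for illustration; not the library's code."""
    # Lists, NumPy arrays, and pandas Series are all iterable,
    # so they are handled element by element.
    if isinstance(x, Iterable):
        # Assumption: values equal to the threshold map to 1.
        return [1 if value >= threshold else 0 for value in x]
    return 1 if x >= threshold else 0
```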
- class leap.data_generation.utils.ContingencyTable(a: float, b: float, c: float, d: float)
Bases: object
A class representing a contingency table.
- __init__(a: float, b: float, c: float, d: float)
Initialize the contingency table with proportions.
- Parameters:
- a: float
Proportion of the population with variable 1 + and variable 2 +.
- b: float
Proportion of the population with variable 1 + and variable 2 -.
- c: float
Proportion of the population with variable 1 - and variable 2 +.
- d: float
Proportion of the population with variable 1 - and variable 2 -.
- apply(func) → leap.data_generation.utils.ContingencyTable
Apply a function to each value in the contingency table.
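The idea behind `apply` can be illustrated with a small stand-in class. `ContingencyTableSketch` is hypothetical, and storing the cells as attributes named `a`, `b`, `c`, and `d` is an assumption based only on the constructor signature; the real class may store and print its values differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ContingencyTableSketch:
    """Hypothetical stand-in for ContingencyTable; not the library's class."""

    a: float
    b: float
    c: float
    d: float

    def apply(self, func: Callable[[float], float]) -> "ContingencyTableSketch":
        # Build a new table rather than mutating this one.
        return ContingencyTableSketch(
            func(self.a), func(self.b), func(self.c), func(self.d)
        )


# Convert cell counts for a sample of 100 people into proportions.
counts = ContingencyTableSketch(43, 9, 44, 4)
proportions = counts.apply(lambda value: value / 100)
```

Returning a new object keeps the original counts available, which is convenient when both counts and proportions are needed.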
- leap.data_generation.utils.conv_2x2(ori: float, ni: float, n1i: float, n2i: float, var_names: list = ['ai', 'bi', 'ci', 'di']) → leap.data_generation.utils.ContingencyTable
Create a 2x2 contingency table.
This function is based on the R function metafor::conv.2x2.
We want to determine the contingency table:

                        variable 2, outcome +   variable 2, outcome -   total
variable 1, outcome +   ai                      bi                      n1i
variable 1, outcome -   ci                      di
total                   n2i                                             ni

Given the odds ratio \(or_{i}\), the marginal counts \(n_{1i}\) and \(n_{2i}\), and the total sample size \(n_{i}\), we want to compute the probabilities \(a_{i}\), \(b_{i}\), \(c_{i}\), and \(d_{i}\).
\[\begin{split}n_{i} &= a_{i} + b_{i} + c_{i} + d_{i} \\ n_{1i} &= a_{i} + b_{i} \\ n_{2i} &= a_{i} + c_{i} \\ or_{i} &= \dfrac{a_{i} d_{i}}{b_{i} c_{i}}\end{split}\]
Examples
Suppose that, in a sample of 100 people, the probability of antibiotic use in infancy is 0.52 and the probability of having an asthma diagnosis is 0.87. We also know the odds ratio, i.e. the odds of asthma given antibiotic exposure divided by the odds of asthma given no exposure, is ori = 0.4343. Then the contingency table would be:

                  asthma      no asthma   total
antibiotics       ai          bi          n1i = 52
no antibiotics    ci          di
total             n2i = 87                ni = 100

We want to compute ai, bi, ci, and di. We can do this using the conv_2x2 function:
>>> from leap.data_generation.utils import conv_2x2
>>> conv_2x2(ori=0.4343, ni=100, n1i=52, n2i=87)
ContingencyTable(values=43, 9, 44, 4)
Here we have:
- ai = 43, the number of people who were exposed to antibiotics and have asthma.
- bi = 9, the number of people who were exposed to antibiotics and do not have asthma.
- ci = 44, the number of people who were not exposed to antibiotics and have asthma.
- di = 4, the number of people who were not exposed to antibiotics and do not have asthma.
These values satisfy the constraints: ai + bi = 52, ai + ci = 87, and (43 × 4) / (9 × 44) ≈ 0.4343.
We can divide them by ni to get the proportions:
- ai = 0.43, the probability of antibiotic exposure and asthma.
- bi = 0.09, the probability of antibiotic exposure and no asthma.
- ci = 0.44, the probability of no antibiotic exposure and asthma.
- di = 0.04, the probability of no antibiotic exposure and no asthma.
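To see where these numbers come from, the four constraints can be reduced to a single quadratic in ai. Substituting bi = n1i - ai, ci = n2i - ai, and di = ni - n1i - n2i + ai into the odds ratio gives (1 - ori) ai² + (ni - n1i - n2i + ori (n1i + n2i)) ai - ori n1i n2i = 0. The sketch below solves that equation directly; `solve_2x2_sketch` is a hypothetical name, and this is not claimed to be how conv_2x2 or metafor::conv.2x2 is implemented.

```python
import math


def solve_2x2_sketch(
    ori: float, ni: float, n1i: float, n2i: float
) -> tuple[int, int, int, int]:
    """Hypothetical solver for illustration; not the library's code."""
    # Coefficients of A*a^2 + B*a + C = 0, obtained by substituting
    # b = n1 - a, c = n2 - a, d = n - n1 - n2 + a into or = a*d / (b*c).
    A = 1.0 - ori
    B = ni - n1i - n2i + ori * (n1i + n2i)
    C = -ori * n1i * n2i

    if math.isclose(A, 0.0):
        # An odds ratio of 1 makes the equation linear in a.
        roots = [-C / B]
    else:
        disc = math.sqrt(B * B - 4.0 * A * C)
        roots = [(-B + disc) / (2.0 * A), (-B - disc) / (2.0 * A)]

    # Every cell must be non-negative, which bounds a on both sides.
    lower = max(0.0, n1i + n2i - ni)
    upper = min(n1i, n2i)
    feasible = [root for root in roots if lower <= root <= upper]
    if not feasible:
        raise ValueError("No feasible table for these inputs")

    # Assumption: cells are rounded to whole counts, as in the example above.
    a = round(feasible[0])
    return a, round(n1i - a), round(n2i - a), round(ni - n1i - n2i + a)


cells = solve_2x2_sketch(ori=0.4343, ni=100, n1i=52, n2i=87)
print(cells)  # (43, 9, 44, 4)
```

Only one root of the quadratic falls within the feasible range here; the other is negative and is discarded.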