leap.data_generation.utils module

leap.data_generation.utils.get_province_id(province: str) → str

Convert full length province name to abbreviation.

Parameters:

province: str: The full length province name, e.g. British Columbia.

Returns:

The abbreviation for the province, e.g. BC.
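The conversion can be pictured as a dictionary lookup. The following is a minimal sketch, not the module's implementation: the name `province_to_id` and the table `PROVINCE_ABBREVIATIONS` are hypothetical, and the real function may accept a different set of names.

```python
# Hypothetical lookup table (an assumption for illustration); the set of
# names accepted by the real get_province_id is not documented here.
PROVINCE_ABBREVIATIONS = {
    "Alberta": "AB",
    "British Columbia": "BC",
    "Manitoba": "MB",
    "New Brunswick": "NB",
    "Newfoundland and Labrador": "NL",
    "Nova Scotia": "NS",
    "Ontario": "ON",
    "Prince Edward Island": "PE",
    "Quebec": "QC",
    "Saskatchewan": "SK",
}


def province_to_id(province: str) -> str:
    """Return the two-letter abbreviation for a full province name."""
    try:
        return PROVINCE_ABBREVIATIONS[province]
    except KeyError:
        raise ValueError(f"Unknown province: {province!r}") from None


print(province_to_id("British Columbia"))  # BC
```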

leap.data_generation.utils.get_sex_id(sex: str) → str

Convert full length sex to single character.

Parameters:

sex: str: The full length string, either Female or Male.

Returns:

The single character string, either F or M.
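A minimal sketch of the documented mapping, assuming only the two inputs listed above are valid. The name `sex_to_id` is hypothetical, and how the real function treats other inputs is not documented.

```python
def sex_to_id(sex: str) -> str:
    """Map "Female" to "F" and "Male" to "M"."""
    mapping = {"Female": "F", "Male": "M"}
    if sex not in mapping:
        # Assumption: reject anything other than the two documented values.
        raise ValueError(f"Expected 'Female' or 'Male', got {sex!r}")
    return mapping[sex]


print(sex_to_id("Female"))  # F
print(sex_to_id("Male"))    # M
```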

leap.data_generation.utils.parse_age_group(x: str, max_age: int) → tuple[int, int]

Parse an age group string into a tuple of integers.

Parameters:

x: str: The age group string. Must be in one of the formats “X-Y”, “X+”, “X-Y years”, or “<1 year”.
max_age: int: The maximum age, used as the upper bound for open-ended groups such as “X+”.

Returns:

A tuple of integers representing the lower and upper age of the age group.

Examples

>>> parse_age_group("0-4", max_age=65)
(0, 4)
>>> parse_age_group("5-9 years", max_age=65)
(5, 9)
>>> parse_age_group("10+", max_age=65)
(10, 65)
>>> parse_age_group("<1 year", max_age=65)
(0, 1)
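One way to reproduce the four documented results is with regular expressions. This is a sketch under assumptions, not the library's code: `parse_age_group_sketch` is a hypothetical name, and generalizing “<1 year” to “<N year(s)” is an assumption.

```python
import re


def parse_age_group_sketch(x: str, max_age: int) -> tuple[int, int]:
    """Parse "X-Y", "X+", "X-Y years" or "<1 year" into (lower, upper)."""
    text = x.strip()

    # "<1 year": lower bound 0, upper bound the stated age.
    match = re.fullmatch(r"<\s*(\d+)\s*years?", text)
    if match:
        return (0, int(match.group(1)))

    # "X+": open-ended group, capped at max_age.
    match = re.fullmatch(r"(\d+)\s*\+", text)
    if match:
        return (int(match.group(1)), max_age)

    # "X-Y" with an optional "year"/"years" suffix.
    match = re.fullmatch(r"(\d+)\s*-\s*(\d+)(?:\s*years?)?", text)
    if match:
        return (int(match.group(1)), int(match.group(2)))

    raise ValueError(f"Unrecognized age group: {x!r}")


print(parse_age_group_sketch("10+", max_age=65))  # (10, 65)
```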

leap.data_generation.utils.format_age_group(age_group: str, upper_age_group: str = '100 years and over') → int

Convert an age group string to an integer age.

Parameters:

age_group: str: The age group string, e.g. 5 to 9 years.
upper_age_group: str = '100 years and over': The upper age group string, e.g. 100 years and over.

Returns:

The integer age.

Examples

>>> format_age_group("110 years and over", "110 years and over")
110
>>> format_age_group("Under 1 year", "100 years and over")
0
>>> format_age_group("9 years")
9
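The three documented cases can be sketched as follows. `format_age_group_sketch` is a hypothetical name. Taking the leading integer of any other string, including ranges such as “5 to 9 years”, is an assumption, since the result for ranges is not shown above.

```python
import re


def format_age_group_sketch(
    age_group: str, upper_age_group: str = "100 years and over"
) -> int:
    """Convert an age group label to a single integer age."""
    if age_group == "Under 1 year":
        return 0
    if age_group == upper_age_group:
        # Open-ended top group: use its lower bound, e.g. 110.
        return int(upper_age_group.split()[0])
    # Assumption: any other label starts with the age we want.
    match = re.match(r"\s*(\d+)", age_group)
    if match is None:
        raise ValueError(f"Unrecognized age group: {age_group!r}")
    return int(match.group(1))


print(format_age_group_sketch("9 years"))  # 9
```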

leap.data_generation.utils.heaviside(x: float | list[float] | numpy.ndarray | pandas.core.series.Series, threshold: float) → int | list[int]

Heaviside step function.

Parameters:

x: float | list[float] | numpy.ndarray | pandas.core.series.Series: The input value or array of values.
threshold: float: The threshold value.

Returns:

1 if x >= threshold, else 0. If x is a vector, this is computed for each entry.
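A dependency-free sketch of the rule stated above. `heaviside_sketch` is a hypothetical name; it treats any non-scalar input as an iterable and returns a plain list, which may differ from how the real function handles NumPy arrays or pandas Series.

```python
def heaviside_sketch(x, threshold):
    """Return 1 where x >= threshold and 0 elsewhere."""
    if isinstance(x, (int, float)):
        # Scalar input: a single 0/1 value.
        return 1 if x >= threshold else 0
    # Vector input: apply the same rule to each entry.
    return [1 if value >= threshold else 0 for value in x]


print(heaviside_sketch(0.5, threshold=0.5))              # 1
print(heaviside_sketch([0.1, 0.5, 0.9], threshold=0.5))  # [0, 1, 1]
```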

class leap.data_generation.utils.ContingencyTable(a: float, b: float, c: float, d: float)

Bases: object

A class representing a contingency table.

__init__(a: float, b: float, c: float, d: float): Initialize the contingency table with proportions.

Parameters:

a: float: Proportion of the population with variable 1 + and variable 2 +.
b: float: Proportion of the population with variable 1 + and variable 2 -.
c: float: Proportion of the population with variable 1 - and variable 2 +.
d: float: Proportion of the population with variable 1 - and variable 2 -.

to_list() → list[float]: Convert the contingency table to a list of values.

apply(func) → leap.data_generation.utils.ContingencyTable: Apply a function to each value in the contingency table.
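To illustrate how `to_list` and `apply` fit together, here is a stand-in class with the same three methods. `ContingencyTableSketch` is hypothetical and only mirrors the documented interface. The order `[a, b, c, d]` for `to_list` is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ContingencyTableSketch:
    """Stand-in for a 2x2 table holding the four cell values."""

    a: float
    b: float
    c: float
    d: float

    def to_list(self) -> list[float]:
        # Assumed order: a, b, c, d.
        return [self.a, self.b, self.c, self.d]

    def apply(self, func: Callable[[float], float]) -> "ContingencyTableSketch":
        # Build a new table; the original is left unchanged.
        return ContingencyTableSketch(*(func(v) for v in self.to_list()))


counts = ContingencyTableSketch(43, 9, 44, 4)
proportions = counts.apply(lambda v: v / 100)
print(proportions.to_list())
```

Returning a new object from `apply` keeps the original table available, for example to hold counts and proportions side by side.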

leap.data_generation.utils.conv_2x2(ori: float, ni: float, n1i: float, n2i: float, var_names: list = ['ai', 'bi', 'ci', 'di']) → leap.data_generation.utils.ContingencyTable

Create a 2x2 contingency table.

This function is based on the R function metafor::conv.2x2.

We want to determine the contingency table:

| | variable 2, outcome + | variable 2, outcome - | total |
|---|---|---|---|
| variable 1, outcome + | `ai` | `bi` | `n1i` |
| variable 1, outcome - | `ci` | `di` | |
| total | `n2i` | | `ni` |

Given the odds ratio \(or_{i}\), the marginal counts \(n_{1i}\) and \(n_{2i}\), and the total sample size \(n_{i}\), we want to compute the cell frequencies \(a_{i}\), \(b_{i}\), \(c_{i}\), and \(d_{i}\), which satisfy:

\[\begin{split}n_{i} &= a_{i} + b_{i} + c_{i} + d_{i} \\ n_{1i} &= a_{i} + b_{i} \\ n_{2i} &= a_{i} + c_{i} \\ or_{i} &= \dfrac{a_{i} d_{i}}{b_{i} c_{i}}\end{split}\]
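These four equations reduce to a single quadratic in \(a_{i}\): substituting \(b_{i} = n_{1i} - a_{i}\), \(c_{i} = n_{2i} - a_{i}\), and \(d_{i} = n_{i} - n_{1i} - n_{2i} + a_{i}\) into the odds ratio gives \((1 - or_{i})a_{i}^{2} + (n_{i} - n_{1i} - n_{2i} + or_{i}(n_{1i} + n_{2i}))a_{i} - or_{i}n_{1i}n_{2i} = 0\). The sketch below solves it directly. It is an illustration of the algebra, not necessarily how conv_2x2 or metafor::conv.2x2 is implemented, and `solve_2x2` is a hypothetical name.

```python
import math


def solve_2x2(ori: float, ni: float, n1i: float, n2i: float):
    """Solve the four constraints for the cells (ai, bi, ci, di)."""
    if math.isclose(ori, 1.0):
        # Odds ratio of 1: the quadratic term vanishes, leaving ni * ai = n1i * n2i.
        candidates = [n1i * n2i / ni]
    else:
        qa = 1.0 - ori
        qb = ni - n1i - n2i + ori * (n1i + n2i)
        qc = -ori * n1i * n2i
        root = math.sqrt(qb * qb - 4.0 * qa * qc)
        candidates = [(-qb + root) / (2.0 * qa), (-qb - root) / (2.0 * qa)]

    # Keep the root for which every cell is non-negative.
    for ai in candidates:
        bi, ci, di = n1i - ai, n2i - ai, ni - n1i - n2i + ai
        if min(ai, bi, ci, di) >= 0:
            return ai, bi, ci, di
    raise ValueError("No non-negative solution for these inputs")


cells = solve_2x2(ori=0.4343, ni=100, n1i=52, n2i=87)
print([round(v) for v in cells])  # [43, 9, 44, 4]
```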

Parameters:

ori: float: The odds ratio.
ni: float: The total sample size.
n1i: float: The marginal count for the first variable.
n2i: float: The marginal count for the second variable.
var_names: list = ['ai', 'bi', 'ci', 'di']: The names of the variables. Must be of length 4.

Returns:

A ContingencyTable containing the cell frequencies for the 2x2 table.

Examples

Let’s suppose we have 100 people, and we know that the probability of antibiotic use in infancy is 0.52 (so 52 people were exposed) and the probability of having an asthma diagnosis is 0.87 (so 87 people have asthma). We also know the odds ratio, i.e. the odds of asthma given antibiotic exposure divided by the odds of asthma given no exposure, is ori=0.4343. Then the contingency table would be:

| | asthma | no asthma | total |
|---|---|---|---|
| antibiotics | `ai` | `bi` | `n1i = 52` |
| no antibiotics | `ci` | `di` | |
| total | `n2i = 87` | | `ni = 100` |

We want to compute ai, bi, ci, and di. We can do this using the conv_2x2 function:

>>> from leap.data_generation.utils import conv_2x2
>>> conv_2x2(ori=0.4343, ni=100, n1i=52, n2i=87)
ContingencyTable(values=43, 9, 44, 4)

Here we have:

ai = 43, the number of people who were exposed to antibiotics and have asthma.
bi = 9, the number of people who were exposed to antibiotics and do not have asthma.
ci = 44, the number of people who were not exposed to antibiotics and have asthma.
di = 4, the number of people who were not exposed to antibiotics and do not have asthma.

We can divide them by ni to get the joint proportions:

ai = 0.43, the probability of antibiotic exposure and asthma.
bi = 0.09, the probability of antibiotic exposure and no asthma.
ci = 0.44, the probability of no antibiotic exposure and asthma.
di = 0.04, the probability of no antibiotic exposure and no asthma.
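As a quick consistency check, the four cell counts can be plugged back into the defining equations using plain arithmetic. No library calls are assumed here.

```python
# Cell counts from the example output.
ai, bi, ci, di = 43, 9, 44, 4

ni = ai + bi + ci + di   # total sample size
n1i = ai + bi            # row total: exposed to antibiotics
n2i = ai + ci            # column total: asthma diagnosis
odds_ratio = (ai * di) / (bi * ci)
proportions = [v / ni for v in (ai, bi, ci, di)]

print(ni, n1i, n2i)          # 100 52 87
print(round(odds_ratio, 4))  # 0.4343
print(proportions)           # [0.43, 0.09, 0.44, 0.04]
```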