leap.data_generation.utils module

leap.data_generation.utils.get_province_id(province: str) → str

Convert full length province name to abbreviation.

Parameters:
province: str

The full length province name, e.g. British Columbia.

Returns:

The abbreviation for the province, e.g. BC.
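
The mapping described above can be pictured as a simple dictionary lookup. The following is a minimal sketch only: the table contents, the name `province_to_abbreviation`, and the error handling are assumptions for illustration, not the module's actual implementation.

```python
# Hypothetical lookup table using the standard Canadian postal abbreviations.
# The real module may accept a different set of names.
PROVINCE_ABBREVIATIONS = {
    "Alberta": "AB",
    "British Columbia": "BC",
    "Manitoba": "MB",
    "New Brunswick": "NB",
    "Newfoundland and Labrador": "NL",
    "Nova Scotia": "NS",
    "Ontario": "ON",
    "Prince Edward Island": "PE",
    "Quebec": "QC",
    "Saskatchewan": "SK",
}


def province_to_abbreviation(province: str) -> str:
    """Return the abbreviation for a full-length province name."""
    try:
        return PROVINCE_ABBREVIATIONS[province]
    except KeyError:
        # Assumption: unknown names are treated as an error.
        raise ValueError(f"Unknown province name: {province!r}") from None


print(province_to_abbreviation("British Columbia"))
```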

leap.data_generation.utils.get_sex_id(sex: str) → str

Convert the full-length sex string to a single character.

Parameters:
sex: str

The full-length string, either Female or Male.

Returns:

The single character string, either F or M.

leap.data_generation.utils.parse_age_group(x: str, max_age: int) → tuple[int, int]

Parse an age group string into a tuple of integers.

Parameters:
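x: str

The age group string. Must be in one of the formats “X-Y”, “X+”, “X-Y years”, or “<1 year”.

max_age: int

The upper age used for open-ended groups of the form “X+”. In the examples below, "10+" with max_age=65 gives (10, 65).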

Returns:

A tuple of integers representing the lower and upper age of the age group.

Examples

>>> parse_age_group("0-4", max_age=65)
(0, 4)
>>> parse_age_group("5-9 years", max_age=65)
(5, 9)
>>> parse_age_group("10+", max_age=65)
(10, 65)
>>> parse_age_group("<1 year", max_age=65)
(0, 1)

leap.data_generation.utils.format_age_group(age_group: str, upper_age_group: str = '100 years and over') → int

Convert age group to integer.

Parameters:
age_group: str

The age group string, e.g. 5 to 9 years.

upper_age_group: str = '100 years and over'

The upper age group string, e.g. 100 years and over.

Returns:

The integer age.

Examples:

>>> format_age_group("110 years and over", "110 years and over")
110
>>> format_age_group("Under 1 year", "100 years and over")
0
>>> format_age_group("9 years")
9

leap.data_generation.utils.heaviside(x: float | list[float] | numpy.ndarray | pandas.core.series.Series, threshold: float) → int | list[int]

Heaviside step function.

Parameters:
x: float | list[float] | numpy.ndarray | pandas.core.series.Series

The input value or array of values.

threshold: float

The threshold value.

Returns:

1 if x >= threshold, else 0. If x is a vector, this is computed for each entry.
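
The rule above is a thresholded step function applied to a scalar or element by element. The sketch below uses only the standard library; the name `heaviside_sketch` is hypothetical, and the handling of array-like inputs by plain iteration (one-dimensional inputs only) is an assumption rather than the module's implementation.

```python
from collections.abc import Iterable


def heaviside_sketch(x, threshold: float):
    """Return 1 where x >= threshold and 0 elsewhere."""
    if isinstance(x, Iterable):
        # Lists, 1-D NumPy arrays and pandas Series can all be iterated.
        return [1 if value >= threshold else 0 for value in x]
    return 1 if x >= threshold else 0


print(heaviside_sketch(0.7, threshold=0.5))
print(heaviside_sketch([0.1, 0.5, 0.9], threshold=0.5))
```

Note that a value exactly equal to the threshold maps to 1, matching the `>=` in the description.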

class leap.data_generation.utils.ContingencyTable(a: float, b: float, c: float, d: float)

Bases: object

A class representing a contingency table.

__init__(a: float, b: float, c: float, d: float)

Initialize the contingency table with proportions.

Parameters:
a: float

Proportion of the population with variable 1 + and variable 2 +.

b: float

Proportion of the population with variable 1 + and variable 2 -.

c: float

Proportion of the population with variable 1 - and variable 2 +.

d: float

Proportion of the population with variable 1 - and variable 2 -.

to_list() → list[float]

Convert the contingency table to a list of values.

apply(func) → leap.data_generation.utils.ContingencyTable

Apply a function to each value in the contingency table.
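
To illustrate how `to_list` and `apply` might fit together, here is a minimal stand-in class. It is a sketch under assumptions: the name `ContingencyTableSketch`, the `[a, b, c, d]` ordering of `to_list`, and `apply` returning a new table rather than modifying in place are not confirmed by the documentation above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ContingencyTableSketch:
    """Hypothetical stand-in for ContingencyTable with cells a, b, c, d."""

    a: float
    b: float
    c: float
    d: float

    def to_list(self) -> list[float]:
        # Assumed ordering: a, b, c, d.
        return [self.a, self.b, self.c, self.d]

    def apply(self, func: Callable[[float], float]) -> "ContingencyTableSketch":
        # Assumed behaviour: build a new table from the transformed cells.
        return ContingencyTableSketch(*(func(value) for value in self.to_list()))


counts = ContingencyTableSketch(43, 9, 44, 4)
total = sum(counts.to_list())
proportions = counts.apply(lambda value: value / total)
print(proportions.to_list())
```

Here `apply` is used to turn cell counts into proportions by dividing each cell by the table total.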

leap.data_generation.utils.conv_2x2(ori: float, ni: float, n1i: float, n2i: float, var_names: list = ['ai', 'bi', 'ci', 'di']) → leap.data_generation.utils.ContingencyTable

Create a 2x2 contingency table.

This function is based on the R function metafor::conv.2x2.

We want to determine the contingency table:

                         variable 2, outcome +    variable 2, outcome -
variable 1, outcome +    ai                       bi                       n1i
variable 1, outcome -    ci                       di
                         n2i                                               ni

Given the odds ratio \(or_{i}\), the marginal counts \(n_{1i}\) and \(n_{2i}\), and the total sample size \(n_{i}\), we want to compute the cell frequencies \(a_{i}\), \(b_{i}\), \(c_{i}\), and \(d_{i}\), which satisfy:

\[\begin{split}n_{i} &= a_{i} + b_{i} + c_{i} + d_{i} \\ n_{1i} &= a_{i} + b_{i} \\ n_{2i} &= a_{i} + c_{i} \\ or_{i} &= \dfrac{a_{i} d_{i}}{b_{i} c_{i}}\end{split}\]
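
One way to see that these four constraints determine the table is to eliminate everything except \(a_{i}\). This is a sketch of the algebra only, and not necessarily the route taken by the implementation. The three linear constraints give:

\[\begin{split}b_{i} &= n_{1i} - a_{i} \\ c_{i} &= n_{2i} - a_{i} \\ d_{i} &= n_{i} - n_{1i} - n_{2i} + a_{i}\end{split}\]

Substituting into \(or_{i} \, b_{i} c_{i} = a_{i} d_{i}\) and collecting terms yields a quadratic in \(a_{i}\):

\[(1 - or_{i}) \, a_{i}^{2} + \left[ n_{i} - n_{1i} - n_{2i} + or_{i} (n_{1i} + n_{2i}) \right] a_{i} - or_{i} \, n_{1i} n_{2i} = 0\]

The relevant root is the one with \(\max(0, n_{1i} + n_{2i} - n_{i}) \le a_{i} \le \min(n_{1i}, n_{2i})\), so that all four cells are non-negative. When \(or_{i} = 1\) the quadratic term vanishes and \(a_{i} = n_{1i} n_{2i} / n_{i}\).
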
Parameters:
ori: float

The odds ratio.

ni: float

The total sample size.

n1i: float

The marginal count for the first variable.

n2i: float

The marginal count for the second variable.

var_names: list = ['ai', 'bi', 'ci', 'di']

The names of the variables. Must be of length 4.

Returns:

A ContingencyTable with the cell frequencies for the 2x2 table.

Examples

Suppose we know that the probability of antibiotic use in infancy is 0.52 and the probability of having an asthma diagnosis is 0.87, and that we have a sample of 100 people. We also know the odds ratio, i.e. the odds of asthma among those exposed to antibiotics divided by the odds of asthma among those not exposed, is ori=0.4343. Then the contingency table would be:

                  asthma      no asthma
antibiotics       ai          bi           n1i = 52
no antibiotics    ci          di
                  n2i = 87                 ni = 100

We want to compute ai, bi, ci, and di. We can do this using the conv_2x2 function:

>>> from leap.data_generation.utils import conv_2x2
>>> conv_2x2(ori=0.4343, ni=100, n1i=52, n2i=87)
ContingencyTable(values=43, 9, 44, 4)

Here we have:

  • ai = 43, the number of people who were exposed to antibiotics and have asthma.

  • bi = 9, the number of people who were exposed to antibiotics and do not have asthma.

  • ci = 44, the number of people who were not exposed to antibiotics and have asthma.

  • di = 4, the number of people who were not exposed to antibiotics and do not have asthma.

We can divide them by ni to get the proportions:

  • ai = 0.43, the joint probability of antibiotic exposure and asthma.

  • bi = 0.09, the joint probability of antibiotic exposure and no asthma.

  • ci = 0.44, the joint probability of no antibiotic exposure and asthma.

  • di = 0.04, the joint probability of no antibiotic exposure and no asthma.
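
The numbers in this example can be checked independently by solving the constraints directly. With bi = n1i - ai, ci = n2i - ai and di = ni - n1i - n2i + ai, the odds ratio condition becomes a quadratic in ai. The sketch below solves it; the name `solve_2x2_sketch` and the choice to return unrounded floats are assumptions, and this is not the code of `conv_2x2` or of metafor.

```python
import math


def solve_2x2_sketch(ori: float, ni: float, n1i: float, n2i: float):
    """Return (ai, bi, ci, di) matching the odds ratio and the margins."""
    # Feasible range for ai so that every cell is non-negative.
    lower = max(0.0, n1i + n2i - ni)
    upper = min(n1i, n2i)

    if math.isclose(ori, 1.0):
        # Independence: the quadratic term vanishes.
        ai = n1i * n2i / ni
    else:
        # Coefficients of (1 - or) a^2 + [n - n1 - n2 + or (n1 + n2)] a - or n1 n2 = 0.
        qa = 1.0 - ori
        qb = ni - n1i - n2i + ori * (n1i + n2i)
        qc = -ori * n1i * n2i
        root = math.sqrt(qb * qb - 4.0 * qa * qc)
        candidates = [(-qb + root) / (2.0 * qa), (-qb - root) / (2.0 * qa)]
        # Keep the root that lies in the feasible range.
        ai = next(r for r in candidates if lower <= r <= upper)

    return ai, n1i - ai, n2i - ai, ni - n1i - n2i + ai


cells = solve_2x2_sketch(ori=0.4343, ni=100, n1i=52, n2i=87)
print([round(value) for value in cells])
```

Rounding the solved cells to whole numbers gives 43, 9, 44 and 4, in agreement with the table above.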