Antibiotic Exposure Model
In this section, we will describe the model used to predict the number of antibiotics prescribed to infants in their first year of life. This model will be incorporated into the risk factors for developing asthma later in life.
Datasets
We obtained data from the BC Ministry of Health on the number of antibiotics prescribed to infants in their first year of life. The data is available for the years 2000 to 2018 and is formatted as follows:
Column | Type | Description |
---|---|---|
year | int | format XXXX, e.g. 2000, range [2000, 2018] |
sex | int | 1 = Female, 2 = Male |
n_abx | int | The total number of antibiotic courses prescribed to infants in BC in a given year. Note that this is not per infant; it is the total for all infants. |
Since the n_abx column gives us the total number of antibiotics prescribed, we need to use population data to convert this to a per-infant value. We obtained population data from the StatCan census data (the same data that is used in the Birth module):
Column | Type | Description |
---|---|---|
year | int | format XXXX, e.g. 2000, range [1999, 2021] |
sex | str | "M" = male, "F" = female |
n_birth | int | The total number of births in BC in a given year. |
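The conversion from totals to a per-infant value can be sketched as follows. This is a minimal illustration, not the project's actual code: the row values are made up, and the function name `per_infant_rates` and the mapping `SEX_CODE_TO_LETTER` are hypothetical. It assumes the two datasets are joined on year and sex, with the antibiotic data's 1/2 sex codes translated to the birth data's "F"/"M" codes.

```python
# Hypothetical rows shaped like the two tables above; the numbers are made up.
abx_rows = [
    {"year": 2000, "sex": 1, "n_abx": 800},
    {"year": 2000, "sex": 2, "n_abx": 1050},
]
birth_rows = [
    {"year": 2000, "sex": "F", "n_birth": 20000},
    {"year": 2000, "sex": "M", "n_birth": 21000},
]

# The antibiotic data codes sex as 1/2, the birth data as "F"/"M".
SEX_CODE_TO_LETTER = {1: "F", 2: "M"}


def per_infant_rates(abx_rows, birth_rows):
    """Return {(year, sex_letter): n_abx / n_birth} for rows present in both."""
    births = {(row["year"], row["sex"]): row["n_birth"] for row in birth_rows}
    rates = {}
    for row in abx_rows:
        key = (row["year"], SEX_CODE_TO_LETTER[row["sex"]])
        # Skip year/sex combinations with no (or zero) recorded births.
        if births.get(key, 0) > 0:
            rates[key] = row["n_abx"] / births[key]
    return rates


rates = per_infant_rates(abx_rows, birth_rows)
```

Years that appear in only one of the two datasets (for example 1999 or 2019 to 2021) are simply dropped by this join.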
Model: Generalized Linear Model - Negative Binomial
Since our model projects into the future, we would like to be able to extend this data beyond 2018. To obtain these projections, we use a Generalized Linear Model (GLM), a type of regression analysis that generalizes linear regression. See Generalized Linear Models for more information on GLMs.
Probability Distribution
When fitting a GLM, you must first choose a distribution for the response variable. In our case, the response variable is the number of antibiotics prescribed during the first year of life. This is a count observed over a fixed time interval (a year, in our case), so we need a discrete probability distribution. The Poisson distribution is a natural first choice for such data, but it has a limitation: it assumes that the mean and variance are equal, i.e.:
\[\mathbb{E}[Y] = \text{Var}(Y)\]
However, in our data, the variance is greater than the mean. This is a common problem in count data, called overdispersion. The Negative Binomial distribution is a generalization of the Poisson distribution that allows for overdispersion: it has an extra parameter, \(\theta\), which controls the amount of overdispersion. Typically, the distribution is written as:
\[P(K = k; r, p) = \binom{k + r - 1}{k} p^r (1 - p)^k\]
where:
\(k\) is the number of failures before \(r\) successes occur
\(p\) is the probability of a success
\(r\) is the number of successes
We can reparametrize this with \(\mu\) and \(\theta\) using the following equations:
\[\mu = \frac{r(1 - p)}{p}\]
\[\sigma^2 = \frac{r(1 - p)}{p^2} = \mu + \frac{\mu^2}{\theta}\]
where \(\mu\) is the mean, \(\sigma^2\) is the variance, and \(\theta\) is the overdispersion parameter.
Doing some algebra, we have:
\[p = \frac{\theta}{\theta + \mu}, \qquad r = \theta\]
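Letting \(y = k\), we have:
\[P(Y = y; \mu, \theta) = \frac{\Gamma(y + \theta)}{\Gamma(\theta)\, y!} \left(\frac{\theta}{\theta + \mu}\right)^{\theta} \left(\frac{\mu}{\theta + \mu}\right)^{y}\]
where \(\Gamma\) is the gamma function, which extends the binomial coefficient to non-integer \(\theta\).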
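The mean/overdispersion form of the Negative Binomial can be checked numerically. The sketch below takes \(r = \theta\) and \(p = \theta / (\theta + \mu)\), evaluates the probability mass function with log-gamma terms, and confirms that the probabilities sum to 1 with mean \(\mu\) and variance \(\mu + \mu^2/\theta\). The function name `nb_pmf` and the parameter values are illustrative assumptions, not fitted estimates.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative Binomial pmf in the mean/overdispersion form.

    Uses r = theta and p = theta / (theta + mu); requires mu > 0, theta > 0.
    Log-gamma terms keep the coefficient stable for large y.
    """
    log_coef = math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
    log_p = math.log(theta) - math.log(theta + mu)
    log_q = math.log(mu) - math.log(theta + mu)
    return math.exp(log_coef + theta * log_p + y * log_q)


mu, theta = 0.04, 2.0  # illustrative values only
ys = range(200)  # the tail beyond 200 is negligible for these values
probs = [nb_pmf(y, mu, theta) for y in ys]

total = sum(probs)
mean = sum(y * p for y, p in zip(ys, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))
```

Because the variance exceeds the mean by \(\mu^2/\theta\), smaller values of \(\theta\) correspond to stronger overdispersion, and the distribution approaches the Poisson as \(\theta\) grows.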
We added an upper bound on the mean parameter to prevent unrealistic extrapolation:
\[\mu \leq 0.05\]
In other words, the mean number of antibiotics prescribed to an infant in their first year of life is constrained to be less than or equal to 0.05.
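So we have:
\[P(Y = y; \mu, \theta) = \frac{\Gamma(y + \theta)}{\Gamma(\theta)\, y!} \left(\frac{\theta}{\theta + \mu}\right)^{\theta} \left(\frac{\mu}{\theta + \mu}\right)^{y}, \qquad \mu \leq 0.05\]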
Link Function
We also need to choose a link function. Recall that the link function \(g(\mu^{(i)})\) relates the mean to the linear predictor \(\eta^{(i)}\):
\[g(\mu^{(i)}) = \eta^{(i)}\]
How do we choose a link function? We are free to choose any link function we like, subject to some constraints. For example, in the Negative Binomial distribution, the mean is always >= 0. However, \(\eta^{(i)}\) can be any real number. Therefore, we need a link function whose inverse maps any real number back to a non-negative mean. The log link function is a good choice for this:
\[g(\mu^{(i)}) = \log(\mu^{(i)}) = \eta^{(i)}\]
so that \(\mu^{(i)} = e^{\eta^{(i)}} > 0\) for every real \(\eta^{(i)}\).
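As a quick illustration of why the log link fits this constraint, the sketch below applies the inverse link to a few arbitrary real values and confirms that each resulting mean is positive and that the link recovers the original value. The function names are illustrative.

```python
import math


def log_link(mu):
    """g(mu) = log(mu); defined only for mu > 0."""
    return math.log(mu)


def inverse_log_link(eta):
    """g^{-1}(eta) = exp(eta); positive for every real eta."""
    return math.exp(eta)


etas = [-5.0, -1.0, 0.0, 2.5]  # arbitrary real-valued predictors
mus = [inverse_log_link(eta) for eta in etas]
round_trip = [log_link(mu) for mu in mus]
```

A side effect of this choice is that additive changes in the predictor act multiplicatively on the mean: increasing the predictor by a constant \(c\) multiplies the mean by \(e^{c}\).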
Formula
Now that we have our distribution and link function, we need to decide on a formula for \(\eta^{(i)}\). We are permitted to use linear combinations of functions of the features in our dataset.
For our dataset, we want a formula using sex and year. Since prescribing practices change over time, and since infections requiring antibiotic prescriptions also change over time, we should include year in our formula. We also want to include sex, since there are sex differences in antibiotic prescriptions.
There is an additional factor specific to BC regulations. In 2005, the BC government introduced an antibiotic conservation program, which reduced the number of antibiotics prescribed [5]. It stands to reason that the formula may change before and after 2005. To account for this, we will introduce a Heaviside step function, which returns 0 for values below a given threshold, and 1 for values above the threshold. In our case, the threshold is 2005.
where:
\(s^{(i)}\) is the sex of the infant
\(t^{(i)}\) is the year of birth of the infant
\(H(t^{(i)} - 2005)\) is the Heaviside step function, which is 0 for years before 2005 and 1 for years after 2005
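To make the role of the step function concrete, here is a minimal sketch of how a predictor built from sex, year, and the 2005 step could be turned into a bounded mean. The particular additive form of the predictor, the coefficient names, and the coefficient values are all assumptions made for illustration; they are not the fitted model. Treating the year 2005 itself as "after" the threshold is also an assumption, since the text only defines the function for years before and after 2005.

```python
import math

MEAN_UPPER_BOUND = 0.05  # bound on the mean stated in the text


def heaviside(x):
    """Step function: 0 for x < 0, 1 for x > 0.

    Returning 1 at x == 0 (the year 2005 itself) is an assumption.
    """
    return 1.0 if x >= 0 else 0.0


def predicted_mean(sex, year, beta):
    """Hypothetical linear predictor, log link, then the upper bound."""
    eta = (
        beta["intercept"]
        + beta["sex"] * sex
        + beta["year"] * year
        + beta["step"] * heaviside(year - 2005)
    )
    return min(math.exp(eta), MEAN_UPPER_BOUND)


# Made-up coefficients for illustration; not fitted to any data.
beta = {"intercept": 40.0, "sex": 0.1, "year": -0.0215, "step": -0.2}
mu_2004 = predicted_mean(sex=1, year=2004, beta=beta)
mu_2006 = predicted_mean(sex=1, year=2006, beta=beta)
```

With these made-up coefficients, the unbounded mean for 2004 slightly exceeds the bound and is clipped to it, while the 2006 mean falls below it because of the negative step and year terms.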