SIRE-PC

Help

What are SNP and vaccination effects?

Vaccination is known to be an effective way to reduce the probability of becoming infected; however, vaccines do not protect every individual. Furthermore, once infected, a vaccinated individual may be less likely to transmit the disease and/or may recover faster. Understanding these mechanisms is crucial to estimating the effectiveness of a vaccination campaign. We define vaccination "effects" as the average difference in three key epidemiological traits between the vaccinated and unvaccinated groups: susceptibility, infectivity and recoverability.

Single nucleotide polymorphisms (SNPs) capture individual variability at a particular nucleotide position in the genome. Some SNPs have been discovered that have large "effects" on the three host epidemiological traits. Targeting these SNPs in selective breeding programmes offers a potential avenue for effective disease control. Here it is assumed that the SNP under consideration takes two possible alleles (A and B), and that individuals are diploid (i.e. they have genotypes AA, AB or BB).

The property of "dominance" captures the degree to which one allele (the dominant one) masks the phenotypic contribution of the other (recessive). For example, if A is dominant then individuals with genotype AA and AB will behave in a similar way, but those with genotype BB will behave differently.

What is the purpose of SIRE-PC?

We begin by assuming that a disease transmission experiment is going to be undertaken (or field data is being analysed) to establish how a particular SNP (or vaccination status) affects the three host traits: susceptibility, infectivity and recoverability.

Existing computational tools (such as SIRE) make it possible to estimate variation in these traits using the infection and recovery times of individuals in the experiment (or alternatively just recovery times or disease diagnostic test results, although these lead to an additional loss in statistical power not considered here).

A key question is how the experiment should be designed in order to maximise the precision with which SNP/vaccination effects can be estimated. It is this question that SIRE-PC aims to answer.

How do I use SIRE-PC?

The instructions below follow the case of SNP effects, and a description of how these are modified for vaccination effects is provided later.

The tool works in the following way:

Inputs

The first option to select is the type of effects being investigated (either SNP or vaccination).

The next option selects the experimental design to be carried out. These include the five potential designs outlined in the paper: "Single group (without dominance)", "Pure design (without dominance)", "Pure design (with dominance)", "Mixed design (without dominance)" and "Mixed design (with dominance)".

Selecting "User defined" allows the user to set an arbitrary design. Here the number of groups can be altered by entering the relevant number in the box (groups can also be removed by clicking on the "x" symbols in the corner).

Selecting "GWAS / Field data" gives results for the case in which the SNP allele under study is assumed to be randomly distributed across groups. Here the number of groups can be specified by entering the relevant number in the box. The genotypes of individuals (both seeders and contacts) are assumed to be in Hardy-Weinberg equilibrium with a specified A allele frequency (note, in the case of field data the number of seeders is usually set to one to represent the index case).
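As a rough illustration of the Hardy-Weinberg assumption, the expected genotype composition of a group can be worked out from the A allele frequency alone. The sketch below is not taken from the tool itself; the function name and the example numbers are hypothetical.

```python
def hardy_weinberg_counts(n_individuals, freq_a):
    """Expected genotype counts under Hardy-Weinberg equilibrium.

    freq_a is the frequency of allele A; allele B has frequency 1 - freq_a.
    Returned values are expectations, so they need not be whole numbers.
    """
    if not 0.0 <= freq_a <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    freq_b = 1.0 - freq_a
    return {
        "AA": n_individuals * freq_a ** 2,
        "AB": n_individuals * 2.0 * freq_a * freq_b,
        "BB": n_individuals * freq_b ** 2,
    }


# Hypothetical example: 20 contacts and an A allele frequency of 0.3.
expected = hardy_weinberg_counts(n_individuals=20, freq_a=0.3)
print(expected)
```

In any actual group the realised counts will vary around these expectations, because genotypes are assumed to be randomly distributed across groups.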

The number of seeders and contacts refers to the number of individuals initially infected and susceptible in each contact group. The genetic compositions of these subpopulations are shown in the panels on the right (one for each contact group). Here N_AA, N_BB and N_AB refer to the number of individuals in each of the three genotypes. These genetic compositions may be changed either by clicking on the numbers themselves or by dragging the purple circles within the triangle plots.

Often the same basic experimental design is repeated multiple times as a way of increasing statistical power. This duplication is represented by the number of "replicates".

The shape parameter k governs the gamma distributed recovery profile (note this quantity only affects estimates for recovery rate parameters).
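To give a feel for what k controls, the sketch below uses the standard property of the gamma distribution that, for a fixed mean, the standard deviation equals the mean divided by the square root of the shape parameter (so k = 1 gives an exponential distribution and larger k gives less variable recovery times). The mean duration and function names are hypothetical and are not inputs of the tool.

```python
import math
import random


def infectious_period_sd(mean_duration, k):
    """Standard deviation of a gamma distribution with the given mean and shape k."""
    return mean_duration / math.sqrt(k)


def sample_infectious_periods(mean_duration, k, n, seed=0):
    """Draw n gamma-distributed infectious durations with shape k and the given mean."""
    rng = random.Random(seed)
    scale = mean_duration / k  # gamma mean = shape * scale
    return [rng.gammavariate(k, scale) for _ in range(n)]


# Hypothetical example: mean infectious period of 10 days.
for k in (1, 4, 16):
    print(k, infectious_period_sd(10.0, k))

durations = sample_infectious_periods(10.0, k=4, n=5)
```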

The % infected slider sets ϕ, the expected fraction of contacts that become infected during the course of the experiment. This may be significantly less than 100% if the experiment is terminated early or the basic reproductive ratio R₀ is low.

In some circumstances allele A may dominate over B, or vice versa. This effect can be accounted for by adjusting the relevant dominance slider.

Outputs

N_total shows the total number of individuals for the experiment.
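For a design in which every group has the same number of seeders and contacts, the total can be reproduced with simple arithmetic. This is a sketch under that assumption (the tool may handle unequal groups differently), and the function name is hypothetical.

```python
def total_individuals(n_groups, seeders_per_group, contacts_per_group, n_replicates):
    """Total number of individuals, assuming all groups have the same size."""
    group_size = seeders_per_group + contacts_per_group
    return n_groups * group_size * n_replicates


# Hypothetical example: 2 groups of 5 seeders and 15 contacts, replicated 3 times.
n_total = total_individuals(
    n_groups=2, seeders_per_group=5, contacts_per_group=15, n_replicates=3
)
print(n_total)
```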

Model parameters a_g, a_f, and a_r represent the relative fractional differences in susceptibility, infectivity and recovery rate for individuals with an A compared to a B allele at the SNP under investigation. For example, a_g=0.1 represents the case in which individuals with genotype AA are approximately 20% more susceptible to disease than those with BB (see paper for a more precise definition).
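The arithmetic behind this example can be sketched as follows. The linear approximation (fractional difference of roughly twice the effect size) follows from the description above; the exponential form is only an assumed parameterisation used for illustration, and the paper should be consulted for the precise definition.

```python
import math


def approx_fractional_difference(effect):
    """Linear approximation: AA differs from BB by about twice the effect size."""
    return 2.0 * effect


def loglinear_fractional_difference(effect):
    """Fractional difference if AA and BB traits scale as exp(+effect) and exp(-effect).

    This log-linear form is an assumption made for illustration only.
    """
    return math.exp(2.0 * effect) - 1.0


a_g = 0.1
print(approx_fractional_difference(a_g))
print(loglinear_fractional_difference(a_g))
```

For small effect sizes the two forms agree closely, which is why the 20% figure is described as approximate.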

Due to the stochastic nature of data obtained from disease transmission experiments (i.e. infection and recovery times), precise estimates for a_g, a_f, and a_r are not possible. What this tool shows is the expected standard deviation in the posterior distribution of each of these quantities, which provides an estimate of its precision. Smaller values represent a higher degree of precision, and so the experimental design should be chosen to minimise these standard deviations as much as possible. A question mark ? is used to represent the case in which the standard deviation is infinite (i.e. no information is available).

One way of interpreting the standard deviations is that they provide a guide as to how small an effect the experiment is able to detect. For example, a value "SD in a_g = 0.2" would mean that a statistically significant association can be made in cases in which the AA and BB (or alternatively vaccinated/unvaccinated) individuals differ in their susceptibility by 20% or more. The experiment would likely lack the power to identify smaller effect sizes than this. Note this is only a general guide and the actual statistical power depends on stochastic variability inherent in the epidemiological process itself. Furthermore, when performing GWAS, power is substantially reduced because of the large number of tests being carried out (e.g. through Bonferroni correction).
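To illustrate why multiple testing reduces power, the sketch below computes the two-sided significance threshold (as a z-score) after a Bonferroni correction. It assumes an approximately normal test statistic, which is a simplification; the function name, the 5% significance level and the number of tests are hypothetical.

```python
from statistics import NormalDist


def bonferroni_z_threshold(n_tests, alpha=0.05):
    """Two-sided z-score threshold after Bonferroni correction for n_tests tests."""
    alpha_per_test = alpha / n_tests
    return NormalDist().inv_cdf(1.0 - alpha_per_test / 2.0)


z_single = bonferroni_z_threshold(1)      # a single SNP tested on its own
z_gwas = bonferroni_z_threshold(50_000)   # hypothetical GWAS with 50,000 SNPs
print(z_single, z_gwas)
```

The threshold for the GWAS case is considerably higher than for a single test, so an effect has to be larger relative to its standard deviation before it is declared significant.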

Parameters Δ_g, Δ_f, and Δ_r represent the scaled dominance factors of allele A over B. Note, the standard deviations in these quantities are each divided by the corresponding effect size (because if the SNP effect size is small, it becomes harder to establish dominance).
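One way to picture the scaled dominance factor is as the position of the heterozygote between the two homozygotes. The sketch below assumes a parameterisation in which AA, AB and BB contribute +a, a × Δ and −a respectively; this is consistent with the meaning of Δ = 1, 0 and −1 given in the definitions, but it is an illustrative assumption and the paper gives the precise model.

```python
def genotype_contributions(effect, dominance):
    """Trait contributions by genotype under an assumed scaled-dominance model.

    effect    -- SNP effect size a (e.g. a_g)
    dominance -- scaled dominance factor in [-1, 1]
    """
    if not -1.0 <= dominance <= 1.0:
        raise ValueError("scaled dominance must lie in [-1, 1]")
    return {
        "AA": effect,
        "AB": effect * dominance,  # slides between BB (-1) and AA (+1)
        "BB": -effect,
    }


print(genotype_contributions(0.1, 1.0))   # A fully dominant: AB matches AA
print(genotype_contributions(0.1, 0.0))   # no dominance: AB lies midway
print(genotype_contributions(0.1, -1.0))  # B fully dominant: AB matches BB
```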

Vaccination effects

The key difference between studying vaccination effects and SNP effects is that instead of three genotypes (AA, AB and BB) we have two classifications: unvaccinated and vaccinated. Effectively this means we can represent unvaccinated individuals by AA and vaccinated individuals by BB, and set the number of AB individuals to zero. This also means that dominance does not need to be considered, and so experimental designs aimed at estimating it are omitted (along with the input slider giving the expected dominance of A over B). Furthermore, instead of triangular plots we now simply use sliders to control the vaccination status composition for the seeder and contact populations within each group.

When the "field data" option is selected it is assumed that individuals are independently vaccinated with a certain specified frequency (resulting in a stochastic variation in vaccination levels across groups). If vaccination levels are actually known for each of the groups this data can be incorporated by selecting the "user defined" option.
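The stochastic variation in vaccination levels across groups can be pictured with a small simulation in which each individual is vaccinated independently with the specified frequency. This is only an illustrative sketch; the function name, group sizes and frequency are hypothetical.

```python
import random


def sample_vaccinated_counts(n_groups, group_size, vac_freq, seed=0):
    """Number of vaccinated individuals in each group under independent vaccination."""
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(group_size) if rng.random() < vac_freq)
        for _ in range(n_groups)
    ]


# Hypothetical example: 6 groups of 20 individuals, vaccination frequency 0.5.
counts = sample_vaccinated_counts(n_groups=6, group_size=20, vac_freq=0.5)
print(counts)
```

Each group's count follows a binomial distribution, so groups of the same size will generally end up with different numbers of vaccinated individuals.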

General application

Two important points need to be made when considering the general application of the power calculations presented here:

• It is assumed that individuals are randomly distributed regarding the effects of other SNPs on the three parameters. In particular, it is assumed that related individuals are randomly distributed across groups.

• The estimates provided by this tool represent lower bounds on the actual standard deviations of model parameters. As shown in the "Realistic model and data scenarios" section of the paper, residual contributions, group and fixed effects and incomplete data will all act to increase these standard deviations, and so reduce the statistical power with which SNP and vaccination-based associations can be made.

Definitions


SNP		Single nucleotide polymorphism
GWAS		Genome-wide association study
Seeder		Initially infected individual
Contact		Initially susceptible individual
Dominance		The degree to which one allele (dominant) masks the phenotypic contribution of the other (recessive)
Replicates		Number of times an experimental design is copied (to increase statistical power)
*N_AA*		Number of individuals with SNP genotype AA
*N_AB*		Number of individuals with SNP genotype AB
*N_BB*		Number of individuals with SNP genotype BB
*N_vac*		Number of vaccinated individuals
*N_unvac*		Number of unvaccinated individuals
*N_total*		The total number of individuals in the disease transmission experiment
k		Shape parameter that determines the distribution of the infectious duration (assumed gamma distributed)
ϕ		Fraction of contacts assumed to be infected
χ		Homozygote balance - the proportion of AA individuals minus the proportion of BB individuals
H		Homozygosity - the proportion of AA individuals plus the proportion of BB individuals
*a_g*		SNP effect for susceptibility - half the fractional change comparing the AA and BB genotypes (or unvaccinated/vaccinated)
*a_f*		SNP effect for infectivity - half the fractional change comparing the AA and BB genotypes (or unvaccinated/vaccinated)
*a_r*		SNP effect for recoverability - half the fractional change comparing the AA and BB genotypes (or unvaccinated/vaccinated)
SD in a_g		Expected standard deviation in the posterior distribution for a_g - see below for interpretation
SD in a_f		Expected standard deviation in the posterior distribution for a_f - see below for interpretation
SD in a_r		Expected standard deviation in the posterior distribution for a_r - see below for interpretation
Δ_g		Susceptibility effect scaled dominance factor (1 = A is completely dominant over B, -1 = B is dominant over A, 0 = no dominance)
Δ_f		Infectivity effect scaled dominance factor (1 = A is completely dominant over B, -1 = B is dominant over A, 0 = no dominance)
Δ_r		Recoverability effect scaled dominance factor (1 = A is completely dominant over B, -1 = B is dominant over A, 0 = no dominance)
SD in Δ_g		Expected standard deviation in the posterior distribution for Δ_g
SD in Δ_f		Expected standard deviation in the posterior distribution for Δ_f
SD in Δ_r		Expected standard deviation in the posterior distribution for Δ_r
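The definitions of χ and H above can be computed directly from the genotype counts. The following is a minimal sketch of that arithmetic; the function name and example counts are hypothetical.

```python
def balance_and_homozygosity(n_aa, n_ab, n_bb):
    """Return (chi, H) for a group with the given genotype counts.

    chi = proportion of AA minus proportion of BB (homozygote balance)
    H   = proportion of AA plus proportion of BB (homozygosity)
    """
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("group must contain at least one individual")
    prop_aa = n_aa / total
    prop_bb = n_bb / total
    return prop_aa - prop_bb, prop_aa + prop_bb


# Hypothetical example: 5 AA, 10 AB and 5 BB individuals.
chi, homozygosity = balance_and_homozygosity(n_aa=5, n_ab=10, n_bb=5)
print(chi, homozygosity)
```

A group made up entirely of AA individuals gives χ = 1 and H = 1, while a group made up entirely of AB individuals gives χ = 0 and H = 0.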

Interpretation

Quantities giving standard deviations in effect sizes (i.e. SD in a_g / a_f / a_r) provide an indication of what size of effect could potentially be identified by a transmission experiment. For example, a value "SD in a_g = 0.2" would mean that a statistically significant association can be made in cases in which the AA and BB (or alternatively vaccinated/unvaccinated) individuals differ in their susceptibility by 20% or more. The experiment would likely lack the power to identify smaller effect sizes than this.

Note, this tool only provides a general guide and the actual statistical power depends on stochastic variability inherent in the epidemiological process itself. Furthermore, when performing GWAS, power is substantially reduced because of the large number of tests being carried out (e.g. through Bonferroni correction).
