A variety of child-parent configurations are amenable to genetic association studies, including (but not limited to) cases in combination with unrelated controls, case-parent triads, and case-parent triads in combination with unrelated control-parent triads. Because genome-wide association studies (GWAS) are frequently underpowered due to the large number of single-nucleotide polymorphisms being tested, power calculations are necessary to choose an optimal study design and to maximize scientific gains from high genotyping and assay costs.
Statistical power is an important aspect of design comparison. Frequently, study designs are compared directly through a power analysis, without considering the total number of individuals that need to be genotyped. For example, a fixed number of complete case-parent triads could be compared with the same number of case-mother or case-father dyads. However, such an approach ignores the costs of data collection. A much more general and informative design comparison can be achieved by studying the relative efficiency, which we define as the ratio of the variances of two parameter estimators corresponding to two separate designs. Using log-linear modeling, we derive the relative efficiency from the asymptotic variance formulas of the parameters. The relative efficiency estimate takes into account the fact that different designs impose different costs relative to the number of genotyped individuals. The relative efficiency calculations are implemented as an easy-to-use function in our R package Haplin (H. K. Gjessing and Lie 2006).
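To make the cost adjustment concrete, the following R sketch computes a relative efficiency at a fixed genotyping budget. It is not Haplin's implementation: the function `rel_eff` and its arguments are hypothetical. It assumes that the asymptotic variance from one design unit is already known for each design, and that a value above 1 favours the comparison design.

```r
# Sketch, not Haplin's implementation: cost-adjusted relative efficiency of a
# comparison design versus a reference design.
#   var_comp, var_ref: asymptotic variance of the effect estimator from ONE
#                      design unit (e.g. one triad, or one case plus one control)
#   k_comp, k_ref:     number of genotyped individuals in one design unit
# With a budget of G genotyped individuals a design yields G / k units, so the
# estimator variance is var * k / G. The ratio Var_ref / Var_comp at equal
# budget is therefore (var_ref * k_ref) / (var_comp * k_comp).
rel_eff <- function(var_comp, k_comp, var_ref, k_ref) {
  (var_ref * k_ref) / (var_comp * k_comp)
}

# Equal per-unit variances (an assumption), case-control unit = 2 people,
# case-parent triad = 3 people.
rel_eff(var_comp = 1, k_comp = 2, var_ref = 1, k_ref = 3)   # 1.5
```

With equal per-unit variances the ratio reduces to the ratio of individuals per unit (3 / 2), which is one way to read the value of 1.5 reported for the case-control versus case-parent triad comparison in this vignette.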
We use the relative efficiency estimates to select the study design that attains the highest statistical power at the lowest sample collection and assay costs. The results depend on the genetic effect being assessed, and our analyses include regular autosomal (offspring or child) effects, parent-of-origin effects and maternal effects (definitions of these genetic effects are provided in M. Gjerdevik et al. 2019). Here we show example commands for various scenarios.
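In practice, a relative efficiency can be read as a sample size ratio. The sketch below uses a hypothetical helper `n_individuals` (not a Haplin function) based on a normal-approximation Wald test to compute how many genotyped individuals a design needs for a given power. The per-unit variance `v = 4` is an arbitrary illustrative value, assumed equal for both designs.

```r
# Hypothetical helper, not a Haplin function: genotyped individuals needed for a
# two-sided Wald test of H0: log(RR) = 0, using the normal approximation.
#   log_rr: assumed true log relative risk
#   v:      asymptotic variance of the estimator from one design unit
#   k:      genotyped individuals per design unit
n_individuals <- function(log_rr, v, k, alpha = 0.05, power = 0.8) {
  z <- qnorm(1 - alpha / 2) + qnorm(power)
  units <- z^2 * v / log_rr^2   # design units needed
  units * k                     # genotyped individuals needed
}

# Arbitrary per-unit variance, assumed equal for both designs.
n_triad <- n_individuals(log(1.5), v = 4, k = 3)   # case-parent triads
n_cc    <- n_individuals(log(1.5), v = 4, k = 2)   # one case + one control
n_triad / n_cc   # 1.5
```

Because the required number of units scales linearly with the per-unit variance, the ratio of required individuals equals the cost-adjusted variance ratio, whatever the effect size, significance level or target power.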
The relative efficiency of two designs is calculated by the Haplin function `hapRelEff`. Its arguments are very similar to those of the Haplin power calculation function `hapPowerAsymp`, which are explained in detail in our previously published paper (M. Gjerdevik et al. 2019). In general, one only needs to specify the study designs to be compared, the allele frequencies, and the type of genetic effect and its magnitude.
The following command calculates the efficiency of the standard case-control design with an equal number of case and control children relative to the case-parent triad design.
```r
hapRelEff(cases.comp = c(c=1), controls.comp = c(c=1), cases.ref = c(mfc=1), haplo.freq = c(0.1,0.9), RR = c(1,1))
## $haplo.rel.eff
##   Haplotype RR.rel.eff
## 1         1        1.5
## 2         2        ref
```
The arguments `cases.comp` and `controls.comp` specify the comparison design, whereas `cases.ref` and `controls.ref` specify the reference design. We use the following abbreviations to describe the family designs: the letters c, m and f denote a child, a mother and a father, respectively. Thus, the case-parent triad design is specified by `cases.comp = c(mfc=1)` or `cases.ref = c(mfc=1)`, whereas the standard case-control design is specified by `cases.comp = c(c=1)` and `controls.comp = c(c=1)`, or by `cases.ref = c(c=1)` and `controls.ref = c(c=1)`. To specify a case-control design with twice as many controls as cases, one could use the combination `cases.comp = c(c=1)` and `controls.comp = c(c=2)`.
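Under this notation, each letter in a name corresponds to one genotyped individual. The hypothetical helper `n_genotyped` below (not part of Haplin) counts the individuals per design unit, which is the cost that the relative efficiency adjusts for. Treating the numeric value as a multiplier, so that `c(c=2)` means two control children per case, is our reading of the notation above rather than a documented rule.

```r
# Hypothetical helper, not part of Haplin: genotyped individuals per design
# unit, counting each letter (c, m, f) in a name as one person and multiplying
# by the associated value. Accepts a named vector or a list of them.
n_genotyped <- function(design) {
  design <- unlist(design)
  sum(nchar(names(design)) * design)
}

n_genotyped(c(mfc = 1))                          # case-parent triad: 3
n_genotyped(c(c = 1)) + n_genotyped(c(c = 1))    # case-control: 2
n_genotyped(c(c = 1)) + n_genotyped(c(c = 2))    # 1 case, 2 controls: 3
```

For the first example in this vignette, the ratio of individuals per unit is 3 / 2 = 1.5 in favour of the case-control design.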
The genetic effects are determined by the choice of relative risk parameter(s), which also specify the effect sizes. A regular autosomal effect is specified by the relative risk argument `RR`. The relative efficiency estimated under the null hypothesis, i.e., when all relative risks are equal to one, is known as the Pitman efficiency (Noether 1955). However, other relative risk values can be used. Allele frequencies are specified by the argument `haplo.freq`. Note that the order and length of the specified relative risk parameter vectors should always match those of the corresponding allele frequencies in `haplo.freq`.
We see that the relative efficiency for the standard case-control design is 1.5, compared with the case-parent triad design. This result is well-known from the literature (H. J. Cordell and Clayton 2005).
To compare the full hybrid design, consisting of both case-parent triads and control-parent triads, with the case-parent triad design, we can use the command

```r
hapRelEff(cases.comp = c(mfc=1), controls.comp = c(mfc=1), cases.ref = c(mfc=1), haplo.freq = c(0.2,0.8), RR = c(1,1))
```
The relative efficiency for parent-of-origin (PoO) effects is computed by replacing `RR` with the two relative risk arguments `RRcm` and `RRcf`, denoting parental origin m (mother) and f (father), respectively. The command

```r
hapRelEff(cases.comp = c(mfc=1), controls.comp = c(mfc=1), cases.ref = c(mfc=1), haplo.freq = c(0.2,0.8), RRcm = c(1,1), RRcf = c(1,1))
```
calculates the efficiency for the full hybrid design, relative to the case-parent triad design. We refer to our previous paper (M. Gjerdevik et al. 2019) for an explanation of the full output.
Since children and their mothers have an allele in common, a maternal effect might be statistically confounded with a regular autosomal effect or a PoO effect. The relative efficiency for maternal effects can be analyzed jointly with that of a regular autosomal effect or a PoO effect by adding the relative risk argument `RR.mat` to the original command. For example, the command

```r
hapRelEff(cases.comp = list(c(mc=1)), cases.ref = list(c(mfc=1)), haplo.freq = c(0.1,0.9), RR = c(1,1), RR.mat = c(1,1))
## $haplo.rel.eff
##   Haplotype RR.rel.eff RRm.rel.eff
## 1         1        0.6         0.6
## 2         2        ref         ref
```
calculates the efficiency of the case-mother dyad design relative to the case-parent triad design, assessing both regular autosomal and maternal effects. In this example, the relative efficiency estimates for regular autosomal and maternal effects are identical (0.6) when adjusting for possible confounding of the effects with one another (M. Gjerdevik et al. 2019).
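The confounding arises because maternal and child genotypes are correlated. The base R simulation below illustrates this under assumed Hardy-Weinberg equilibrium and random mating; it is independent of Haplin and only shows the correlation, not the adjusted efficiency estimates.

```r
# Illustration only, not part of Haplin: a child inherits one allele from the
# mother, so maternal and child allele counts are correlated. Under
# Hardy-Weinberg equilibrium and random mating the theoretical value is 0.5.
set.seed(1)
n <- 1e5
p <- 0.1                                  # allele frequency, cf. haplo.freq = c(0.1, 0.9)
mother <- rbinom(n, 2, p)                 # maternal allele count (0, 1 or 2)
father <- rbinom(n, 2, p)                 # paternal allele count
from_mother <- rbinom(n, 1, mother / 2)   # allele transmitted by the mother
from_father <- rbinom(n, 1, father / 2)   # allele transmitted by the father
child <- from_mother + from_father
cor(mother, child)                        # close to 0.5
```

A maternal effect and a child effect therefore both shift the child's observed genotype distribution among cases, which is why the two are analyzed jointly.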
The default commands correspond to single-SNP analyses. However, the extension to haplotypes is straightforward. The number of markers and haplotypes is determined by the vector `nall`, where the number of markers is equal to `length(nall)` and the number of different haplotypes is equal to `prod(nall)`. Thus, two diallelic markers are denoted by `nall = c(2,2)`. The lengths of `haplo.freq` and of the relative risk arguments (such as `RR`) should correspond to the number of haplotypes, as shown in the example
```r
hapRelEff(nall = c(2,2), cases.comp = c(c=1), controls.comp = c(c=1), cases.ref = c(mfc=1), haplo.freq = c(0.1,0.2,0.3,0.4), RR = c(1,1,1,1))
## $haplo.rel.eff
##   Haplotype RR.rel.eff
## 1       1-1       1.31
## 2       2-1       1.22
## 3       1-2       1.27
## 4       2-2        ref
```
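The haplotype labels in this output list the allele at each marker, with the first marker varying fastest. The sketch below reproduces that ordering with `expand.grid` and checks that `haplo.freq` and `RR` have `prod(nall)` elements. Both helpers are hypothetical and not part of Haplin, and the label ordering is inferred from the example output rather than from Haplin's documentation.

```r
# Hypothetical helpers, not part of Haplin.

# Haplotype labels such as "1-1", "2-1", ...; expand.grid varies the first
# marker fastest, matching the order in the example output.
haplotype_labels <- function(nall) {
  alleles <- expand.grid(lapply(nall, seq_len))
  do.call(paste, c(alleles, sep = "-"))
}

# Check that allele frequencies and relative risks have one entry per
# haplotype and that the frequencies sum to one.
check_haplo_args <- function(nall, haplo.freq, RR) {
  n_haplo <- prod(nall)
  stopifnot(length(haplo.freq) == n_haplo,
            length(RR) == n_haplo,
            isTRUE(all.equal(sum(haplo.freq), 1)))
  invisible(TRUE)
}

haplotype_labels(c(2, 2))   # "1-1" "2-1" "1-2" "2-2"
check_haplo_args(nall = c(2, 2), haplo.freq = c(0.1, 0.2, 0.3, 0.4), RR = c(1, 1, 1, 1))
```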
We recommend consulting our paper (M. Gjerdevik et al. 2019) for a more detailed description of haplotype analysis.