Large upward bias in estimation of locus-specific effects from genomewide scans

H H Göring, J D Terwilliger, J Blangero

Abstract

The primary goal of a genomewide scan is to estimate the genomic locations of genes influencing a trait of interest. It is sometimes said that a secondary goal is to estimate the phenotypic effects of each identified locus. Here, it is shown that these two objectives cannot be met reliably by use of a single data set of a currently realistic size. Simulation and analytical results, based on variance-components linkage analysis as an example, demonstrate that estimates of locus-specific effect size at genomewide LOD score peaks tend to be grossly inflated and can even be virtually independent of the true effect size, even for studies on large samples when the true effect size is small. However, the bias diminishes asymptotically. The explanation for the bias is that the LOD score is a function of the locus-specific effect-size estimate, such that there is a high correlation between the observed statistical significance and the effect-size estimate. When the LOD score is maximized over the many pointwise tests being conducted throughout the genome, the locus-specific effect-size estimate is therefore effectively maximized as well. We argue that attempts at bias correction give unsatisfactory results, and that pointwise estimation in an independent data set may be the only way of obtaining reliable estimates of locus-specific effect, and then only if one does not condition on statistical significance being obtained. We further show that the same factors causing this bias are responsible for frequent failures to replicate initial claims of linkage or association for complex traits, even when the initial localization is, in fact, correct. The findings of this study have wide-ranging implications, as they apply to all statistical methods of gene localization. It is hoped that, by keeping this bias in mind, we will more realistically interpret and extrapolate from the results of genomewide scans.
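The selection effect described above can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions, not the variance-components linkage analysis used in the study: each locus gets an independent, normally distributed effect estimate, the test statistic is simply the estimate divided by its standard error (standing in for the LOD score), and the function name `simulate_scans` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_scans(true_effect, se, n_null_loci, z_threshold, n_reps, seed=0):
    """Toy genome scan: one true locus plus n_null_loci loci with no effect.

    Assumption: every locus estimate is an independent Normal(effect, se) draw,
    and significance is judged by estimate / se >= z_threshold.
    """
    rng = random.Random(seed)
    at_true_locus = []      # pointwise estimates at the true locus, no selection
    significant_peaks = []  # estimates at the scan-wide peak, kept only if significant

    for _ in range(n_reps):
        true_locus_estimate = rng.gauss(true_effect, se)
        null_estimates = [rng.gauss(0.0, se) for _ in range(n_null_loci)]
        at_true_locus.append(true_locus_estimate)

        # Maximizing the statistic over all loci also maximizes the estimate.
        # The peak may fall on a null locus rather than on the true one.
        peak = max([true_locus_estimate] + null_estimates)
        if peak / se >= z_threshold:
            significant_peaks.append(peak)

    mean_peak = (statistics.mean(significant_peaks)
                 if significant_peaks else float("nan"))
    return {
        "mean_at_true_locus": statistics.mean(at_true_locus),
        "mean_significant_peak": mean_peak,
        "fraction_significant": len(significant_peaks) / n_reps,
    }


if __name__ == "__main__":
    # Two different true effects, same sample size (same standard error).
    for effect in (0.02, 0.05):
        result = simulate_scans(effect, se=0.05, n_null_loci=200,
                                z_threshold=3.5, n_reps=2000, seed=1)
        print(effect, result)
```

In this sketch the unselected estimate at the true locus averages close to the generating value, whereas any estimate that clears the threshold is at least `z_threshold * se` by construction, whatever the true effect is. Shrinking `se` (a larger sample) lowers that floor, which is consistent with the bias diminishing asymptotically.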

Figures

Figure 1
Distribution of pointwise QTL-heritability estimate at position of QTL. h2q = the generating value for additive trait heritability attributable to QTL; ĥ2q = its sample estimate. See text for details of the simulation.
Figure 2
Nearly one-to-one relationship of QTL-heritability estimate and observed LOD score under full information on meiotic transmissions. Z = observed LOD score; ĥ2q = sample estimate of additive trait heritability attributable to QTL. See text for details of the simulation.
Figure 3
Bias in QTL-heritability estimate at true position of QTL for significant LOD scores. h2 = overall additive trait heritability (h2=0.5, unless otherwise indicated); h2q = additive trait heritability attributable to QTL. The indicated sample sizes refer to numbers of two-offspring nuclear families. See text for details of the approximate analytical approach.
Figure 4
Conceptual representation of genomewide bias as a function of sample size. h2q = additive trait heritability attributable to QTL in population; ĥ2q = its sample estimate; E(ĥ2q) = its mean sample estimate. The biases for the two sample sizes are indicated by the thick horizontal arrows. See text for details.
Figure 5
Relationship of bias in QTL-heritability estimate and probability of replication failure. h2q = additive trait heritability attributable to QTL. The indicated sample sizes refer to numbers of two-offspring nuclear families. See text for details on analytical approach.
Figure 6
Density functions of significant LOD scores as a function of QTL heritability. Z = LOD score; h2q = additive trait heritability attributable to QTL. The given power numbers refer to a sample of 1,000 two-offspring nuclear families. See text for details on analytical approach.

Source: PubMed
