The UK10K project identifies rare variants in health and disease

UK10K Consortium, Klaudia Walter, Josine L Min, Jie Huang, Lucy Crooks, Yasin Memari, Shane McCarthy, John R B Perry, ChangJiang Xu, Marta Futema, Daniel Lawson, Valentina Iotchkova, Stephan Schiffels, Audrey E Hendricks, Petr Danecek, Rui Li, James Floyd, Louise V Wain, Inês Barroso, Steve E Humphries, Matthew E Hurles, Eleftheria Zeggini, Jeffrey C Barrett, Vincent Plagnol, J Brent Richards, Celia M T Greenwood, Nicholas J Timpson, Richard Durbin, Nicole Soranzo

Abstract

The contribution of rare and low-frequency variants to human traits is largely unexplored. Here we describe insights from sequencing whole genomes (low read depth, 7×) or exomes (high read depth, 80×) of nearly 10,000 individuals from population-based and disease collections. In extensively phenotyped cohorts we characterize over 24 million novel sequence variants, generate a highly accurate imputation reference panel and identify novel alleles associated with levels of triglycerides (APOB), adiponectin (ADIPOQ) and low-density lipoprotein cholesterol (LDLR and RGAG1) from single-marker and rare variant aggregation tests. We describe population structure and functional annotation of rare and low-frequency variants, use the data to estimate the benefits of sequencing for association studies, and summarize lessons from disease-specific collections. Finally, we make available an extensive resource, including individual-level genetic and phenotypic data and web-based tools to facilitate the exploration of association results.

Conflict of interest statement

P.F. is a member of the Scientific Advisory Board of Omicia, Inc.

Figures

Figure 1. The UK10K-cohorts resource for variation discovery.
Number of SNVs identified in the UK10K-cohorts data set in all autosomal regions in different allele frequency (AF) bins, and percentages that were shared with samples of European ancestry from the 1000 Genomes Project (phase I, EUR n = 379) and/or the Genomes of the Netherlands (GoNL, n = 499) study, or unique to the UK10K-cohorts data set. AF bins were calculated using the UK10K data set, for allele count (AC) = 1, AC = 2, and non-overlapping AF bins for higher AC. All numerical values are in Extended Data Fig. 2.
Figure 2. Study design for associations tested in the UK10K-cohorts study.
Summary of phenotype–genotype association testing strategies employed in the UK10K-cohorts study.
Figure 3. Summary of association results across the UK10K-cohorts study.
Allelic spectrum for single-marker association results for independent variants identified in the single-variant analysis (Supplementary Table 5). A variant’s effect (absolute value of Beta, expressed in standard deviation units) is given as a function of minor allele frequency (MAF, x axis). Error bars are proportional to the standard error of the beta; variants identifying known loci are dark blue, and variants identifying novel signals replicated in independent studies are light blue. The red and orange lines indicate 80% power at the experiment-wide significance level (t-test; P value ≤ 4.62 × 10⁻¹⁰) for the maximum theoretical sample size for the WGS sample and WGS+GWA, respectively.
Figure 4. Power for single-variant and region-based tests.
a, Strength of single-variant associations detectable at 80% power as a function of MAF and sample size. Using data from chromosome 20, we calculated the smallest value of the strength of association Beta (measured in standard deviations) that would be detectable under a linear dosage model, given the MAF and r² of each variant imputable from both the 1000GP and the UK10K+1000GP reference panels, for various sample sizes, n. The averages of these minimum detectable beta values by MAF and sample size are shown. b, Power of region-based tests in the UK10K-cohorts sample. Evaluations assume n = 3,621, α = 6.7 × 10⁻⁸ and that the proportion of causal variants in the regions is either 5% or 20%, for maximum association (Max Beta) in a region = 2, 3, 4 s.d. c, Power of region-based tests and the impact of genotype imputation. Ten regions of 30 variants were randomly sampled from each autosome, and then genotype errors were randomly added to the data following observed r² values between genotypes from data imputed from different sources (WGS, high depth WES, GWAS imputed against 1000GP, GWAS imputed against the combined reference panel of 1000GP and UK10K; Supplementary Table 11), and matching the MAF of each variant using the same parameters as in b, with the proportion of causal variants in the regions set to 20%.
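The minimum detectable effect described in panel a can be approximated with a short calculation. The sketch below is illustrative only and is not the study's exact procedure: the function name `min_detectable_beta`, the two-sided normal approximation, the unit trait variance, and the treatment of imputation r² as a multiplier on effective sample size are all assumptions. The default significance level is the experiment-wide threshold quoted in the Figure 3 caption.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_beta(maf, r2, n, alpha=4.62e-10, power=0.80):
    """Smallest |beta| (in trait s.d. units) detectable at the given power.

    Assumptions (illustrative only): two-sided z-test, trait variance of 1,
    small per-variant variance explained, and imputation r2 acting as a
    multiplier on the effective sample size.
    """
    std_normal = NormalDist()
    z_alpha = -std_normal.inv_cdf(alpha / 2)  # critical value for two-sided alpha
    z_power = std_normal.inv_cdf(power)       # quantile matching the target power
    genotype_variance = 2 * maf * (1 - maf)   # dosage variance under Hardy-Weinberg
    return (z_alpha + z_power) / sqrt(n * r2 * genotype_variance)


# Hypothetical inputs: a 1% MAF variant imputed with r2 = 0.8.
for n in (3621, 10000, 50000):
    print(n, round(min_detectable_beta(maf=0.01, r2=0.8, n=n), 3))
```

Under these assumptions the detectable effect shrinks with the square root of n × r², which is why a better imputation panel (higher r²) behaves like a larger sample.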
Figure 5. Enrichment of single-marker associations by functional annotation in the UK10K-cohorts study.
Distribution of fold enrichment statistics for single-variant associations of low-frequency (MAF 1–5%) and common (MAF ≥ 5%) SNVs in near-genic elements or selected chromatin states and DNase I hotspots (DHS). Boxplots represent distributions of fold enrichment statistics estimated across the five (out of 31 core) traits where at least 10 independent SNVs were associated with the trait at a P value threshold of 10⁻⁷ (permutation test): HDL, LDL, TC, APOA1 and APOB. Chromatin state and DHS regions were inferred from ENCODE data in a liver cell line, HepG2, which is informative for lipids. Promoter and 5′ UTR are not shown, but corresponding statistics are given in Supplementary Table 12.
Extended Data Figure 1. UK10K-cohorts, sequence and sample quality and variation metrics.
a–e, Sample quality metrics for UK10K-cohorts (n = 3,781), where samples 1 to 1,927 correspond to ALSPAC and samples 1,928 to 3,781 to TwinsUK. This sample includes all individuals passing sample quality control, including related pairs and non-European individuals that were later removed from association tests. A subset of 3,621 individuals was included in association analyses. Samples sequenced at BGI are coloured in blue and samples sequenced at Sanger are coloured in grey. a, Number of singletons (AC = 1) by sample (×10³). b, Number of INDELs by sample (×10⁵). c, Read depth (sequence coverage) by sample. d, Ratio of heterozygous and homozygous non-reference (= homozygous alternative) SNV genotypes (mean for females = 1.54, mean for males = 1.47). e, Transition to transversion ratio (Ts/Tv) by sample. f–i, Sequence variation metrics for UK10K-cohorts. f, Types of substitution (×10⁶). g, Number of SNVs (×10⁶), INDELs (×10⁵) and large deletions (×10³) by non-overlapping non-reference allele frequency (AF) bins. h, Size distribution of INDELs. Negative INDEL lengths represent deletions and positive INDEL lengths represent insertions. i, Large deletion size distribution in unequal bin sizes where the smallest deletions were 200 bp to 1 kb long and the largest deletions 100 kb to 1 Mb. In total 18,739 deletions were called with GenomeSTRiP. The average deletion size was ~13 kb and the median size was ~3.7 kb. j, Total number of SNVs and INDELs by AF bin (based on 3,781 samples); multi-allelic variants are treated as separate variants. k, Sequence quality and variation metrics for UK10K-cohorts. For 61 overlapping TwinsUK individuals we compared the variant sites and genotypes of the low-coverage sequences with high-coverage exome data by non-overlapping AF bins (WGS versus Exomes). We considered 74,621 shared sites in non-overlapping AF bins.
We calculated the fraction of concordant over total sites, the number of non-reference genotypes and non-reference genotype discordance (NRD, in %) between WGS and Exomes; false discovery rate (FDR = FP/(FP + TP); TP, true positive; FP, false positive), where we consider the exomes as the truth set; number of false positives (FP) and FDR for sites that are or are not shared with the 1000 Genomes Project, phase I (1000GP); false negative rate (FNR = FN/(FN + TP); FN, false negative; TP, true positive), where AF bins were defined based on the 61 exomes. Furthermore, we compared 22 monozygotic twin pairs at 880,280 bi-allelic SNV sites on chromosome 20, reporting the percentage of concordant genotypes, non-reference genotypes and NRD. AFs are from the set of 3,621 samples, which contains at most one of the two monozygotic twins from each pair. We note that discrepancies can be caused by errors in either twin, so the expected NRD to the truth would be half the NRD value given.
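The quality metrics named in panel k can be written out as small functions. The FDR and FNR formulas follow the caption directly; the NRD formula is an assumption, since the caption does not spell it out: it is taken here as the fraction of discordant genotype pairs among pairs where at least one call is non-reference. The function names and all toy values are hypothetical.

```python
def false_discovery_rate(fp, tp):
    """FDR = FP / (FP + TP), as defined in the caption."""
    return fp / (fp + tp)


def false_negative_rate(fn, tp):
    """FNR = FN / (FN + TP), as defined in the caption."""
    return fn / (fn + tp)


def non_reference_discordance(calls_a, calls_b):
    """NRD between two call sets, genotypes coded as alt-allele counts 0/1/2.

    Assumed definition: pairs where both calls are homozygous reference
    are ignored; NRD is the discordant share of the remaining pairs.
    """
    compared = 0
    discordant = 0
    for a, b in zip(calls_a, calls_b):
        if a == 0 and b == 0:
            continue  # both homozygous reference: not counted
        compared += 1
        if a != b:
            discordant += 1
    return discordant / compared if compared else 0.0


# Hypothetical toy genotypes for one individual at six sites.
wgs_calls = [0, 1, 2, 1, 0, 2]
exome_calls = [0, 1, 1, 1, 1, 2]
print(non_reference_discordance(wgs_calls, exome_calls))
print(false_discovery_rate(fp=5, tp=95), false_negative_rate(fn=10, tp=90))
```

Excluding sites where both calls are homozygous reference keeps the metric from being dominated by the many sites at which a rare variant is simply absent.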
Extended Data Figure 2. UK10K-cohorts, comparison with GoNL and 1000GP-EUR.
Percentage of autosomal SNVs that are either shared between UK10K (n = 3,781), GoNL (n = 499) and 1000GP-EUR (n = 379), or unique to each set, for allele counts (AC) AC = 1, AC = 2, and non-overlapping allele frequency (AF) bins for higher AC. a, Shared and unique variants for GoNL with AF based on GoNL, and b, for 1000GP-EUR. AF bins are not directly comparable owing to the different sample sizes in each call set. The x-axis shows the number of variants in millions. The percentages next to the bars represent the percentage of variants from GoNL (a) and 1000GP-EUR (b) that are shared with at least one of the other data sets. All numerical values used in a can be found in d and for b in e. c, Numerical values for Fig. 1.
Extended Data Figure 3. UK10K-cohorts, derived allele frequency spectrum by functional annotation.
Derived allele frequency (DAF) spectrum for UK10K-cohorts chromosome 20 variants divided by functional class. a, Proportion of total variants (standardized across DAF bins) as a function of DAF for different genic elements. b, Standardized proportion of all variants by DAF bin, and divided into conserved (GERP > 2) versus neutral (GERP ≤ 2) sites. c, Ratio of conserved versus neutral variants by DAF bin, and classified by chromatin segmentation domains defined by ENCODE as detailed in the methods.
Extended Data Figure 4. UK10K-cohorts, false discovery rate (FDR).
a–g, FDR values for reporting associations at different P value cut-offs for all analyses reported in this study and the 31 core traits for single-variant analysis (a); naive exome-wide Meta SKAT (b); naive exome-wide Meta SKAT-O (c); functional exome-wide Meta SKAT (LoF and missense) (d); functional exome-wide Meta SKAT-O (LoF and missense) (e); functional exome-wide Meta SKAT (LoF) (f); functional exome-wide Meta SKAT-O (LoF) (g).
Extended Data Figure 5. UK10K-cohorts, QQ plots.
QQ plots for the association tests of the 31 core traits in the WGS data set (n = 3,621 individuals). a, Single-variant analysis (~14 million variants with MAF ≥ 0.1%); b, naive exome-wide Meta SKAT (1,783,548 variants with MAF < 1% in 50,717 windows); c, functional exome-wide Meta SKAT (LoF and missense; 256,733 variants with MAF < 1% in 14,909 windows); d, loss-of-function functional exome-wide Meta SKAT (LoF; 9,113 variants with MAF < 1% in 3,208 windows); e, genome-wide Meta SKAT (35,858,684 variants with MAF < 1% in 1,845,982 windows).
Extended Data Figure 6. UK10K-exomes, sequence variant statistics.
Number of variants (×10³) that are found in one or more of the three UK10K-exomes disease data sets, as a function of allele frequency (AF) of the non-reference allele. Variants are split into allele counts (AC) AC = 1, AC = 2 and non-overlapping AF bins for AC > 2. Allele frequency is the frequency of the alternative allele. The distributions of SNVs and INDELs across frequencies and disease collections are similar, except that there is a lower proportion of INDELs with AF > 1% compared to SNVs. a, SNVs. Multiallelic sites are included (1.6%), and non-reference alleles at the same site are treated as separate variants. b, INDELs. Counts are given in c. c, Variants are classed by whether they were found in more than one disease collection or unique to a specific group. d, Comparison of the UK10K patient set with European-American individuals from the NHLBI Exome Sequencing Project (EA ESP). The left panel shows the variants identified in UK10K and the percentage shared with EA ESP. Both the total number of variants and the number within the EA ESP bait regions (intersection of bait sets) are given. The right panel shows the variants identified in EA ESP and the percentage shared with UK10K. Both the total number of variants, and the number within the UK10K baits after removing any that failed UK10K quality control, are given. There is some overlap in the ranges of AC and AF for EA ESP variants because different numbers of individuals were included.
Extended Data Figure 7. UK10K-exomes, functional consequences.
a–d, Percentage of SNVs in each allele frequency bin that are loss of function (a), functional (b), possibly functional (c) and other (d), when consequences are restricted to given subsets of transcripts, and where the most severe consequence in qualifying transcripts is used. Values are percentages of SNVs that have transcripts of a given type. Protein-coding is transcripts with a biotype of protein coding. High expression is transcripts with FPKM (fragments per kilobase of transcript per million mapped reads) ≥ 1 in any tissue. Widely expressed is transcripts with FPKM ≥ 1 in 16 tissues. Only low expression is transcripts expressed at FPKM < 1 in all 16 tissues where there were no transcripts with high expression in that variant. Expression was determined from the Illumina Body Map data set. Variants mapping to protein-coding transcripts < 300 bp long or with missing or low quality expression data were excluded. Frequency bins are singletons and non-overlapping allele frequency ranges for allele counts above 1. Allele frequency is the frequency of the alternative allele. Multi-allelic sites were included with alternative alleles at the same site treated as separate variants. e, Counts of single nucleotide polymorphisms in each consequence class by allele frequency and transcript subset.
Extended Data Figure 8. UK10K-cohorts, genotype and phenotype similarities within and between regions.
a, b, Dot plots show the genetic (a) and phenotypic distribution (b) of the relationships of 1,139 unrelated TwinsUK individuals by their regional place of birth. To determine the genetic relationships we used the mean number of shared alleles between two individuals within and between regions for allele counts (AC) 2 to 7, where AC is calculated from the whole data set of 3,781 samples. To determine phenotypic similarities we calculated the mean difference between the residualized phenotypes. Genetically-related individuals are more closely related within a region than between regions, while the phenotypic distance measure has similar distributions within and between regions. The mean shared alleles increase with increasing allele count, and simultaneously the within and between distributions converge. c, The five lowest P values for AC 2 to 7 obtained from Mantel tests to determine similarities between genotypes and phenotypes by region. P values were not significant after correcting for multiple testing using the FDR method. Full trait names are given in Supplementary Table 1.
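The multiple-testing correction mentioned in panel c can be illustrated with a small adjustment routine. The caption says only "the FDR method"; the sketch below assumes this means the Benjamini-Hochberg step-up procedure, and the function name and input P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical Mantel-test P values for four traits.
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
```

A test is then reported as significant at a chosen FDR level only if its adjusted P value falls below that level.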
Extended Data Figure 9. UK10K-cohorts, population fine structure in the TwinsUK sample.
a, Chunk length matrix for all UK10K defined geographic regions, calculated as described in the methods. The bottom 5 regions are merged in Box 1 Figure. b, Coancestry matrix for all UK10K defined geographic regions, calculated as described in the methods. c, Chunk length matrix for all UK10K FineSTRUCTURE inferred populations, calculated as described in the methods. d, Coancestry matrix for all UK10K FineSTRUCTURE inferred populations. Details on calculation of these parameters are described in Methods. e, Pairwise coincidence matrix for the UK10K FineSTRUCTURE MCMC run, showing the fraction of the 1,000 retained iterations from the posterior in which each pair of individuals is in the same population, averaged for each pair of populations. The full posterior is extremely complex, which is indicative of a continuous admixture cline rather than discrete populations. f, Sources distribution for the FineSTRUCTURE inferred populations with the full set of inferred populations and geographic labels. Geographic labels of London, Southeast, North Midland, Southern and Eastern are merged into South and East for Box 1 Figure. FSPop labels are given to populations inferred by FineSTRUCTURE, which are merged into the Pop labels as shown in the main Box 1 Figure. g, The f2 haplotype age analysis estimates the time to the most recent common ancestor (tMRCA) between the two haplotypes underlying a given observed variant of allele count 2 in all of the TwinsUK samples. The observed IBD segment length around each f2 variant estimates the tMRCA, using an explicit model parameterized by the recombination and the mutation rates. Shown is the map of the UK with all regions used in this analysis depicted by their location, and lines colour-coding the observed median tMRCA of f2 haplotypes.

Source: PubMed
