A global reference for human genetic variation
1000 Genomes Project Consortium, Adam Auton, Lisa D Brooks, Richard M Durbin, Erik P Garrison, Hyun Min Kang, Jan O Korbel, Jonathan L Marchini, Shane McCarthy, Gil A McVean, Gonçalo R Abecasis
Abstract
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
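As a quick arithmetic check on the figures above, the three variant categories can be summed to confirm they are consistent with the stated total of over 88 million. This is a minimal sketch that uses only the rounded counts quoted in the abstract; the variable and function names are illustrative and do not come from the paper.

```python
# Rounded variant counts as quoted in the abstract (not exact call-set sizes).
variant_counts = {
    "SNPs": 84_700_000,
    "short indels": 3_600_000,
    "structural variants": 60_000,
}

# Sum of the three categories, to compare with "over 88 million variants".
total = sum(variant_counts.values())


def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


for name, count in variant_counts.items():
    print(f"{name}: {count:,} ({share(count, total):.2f}% of total)")
print(f"total: {total:,}")
```

With these rounded inputs the sum is 88.36 million, so SNPs account for the large majority of the reported variants.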
Conflict of interest statement
D.M.A. is affiliated with Vertex Pharmaceuticals, E.A. is on the speaker’s bureau for Illumina, P.A. is an advisor to Illumina and Ancestry.com, D.R.B., B.B., M.B., R.K.C., A.C., M.E., S.H., S.K., L.M., J.P. and R.S. are affiliated with Illumina, J.K.B. is affiliated with Ancestry.com, A.C. is on the Science Advisory Board of Biogen Idec. and the scientific advisory board of Affymetrix, A.W.C. is affiliated with DNAnexus, D.C. is affiliated with Personalis, C.J.D., J.G., J.P.S., T.W., B.W., and Y.Z. are affiliated with Affymetrix, E.T.D. is an advisor for DNAnexus, F.M.D.L.V. is employed by Real Time Genomics, M.A.D. is affiliated with SynapDx, P.D. is a co-founder and director of Genomics, and a partner in Peptide Groove, R.D. is a founder of Congenica and a consultant for Dovetail, E.E.E. is on the scientific advisory board of DNAnexus, and is a consultant for Kunming University of Science and Technology as part of the 1000 China Talent Program, P.F. is a member of the scientific advisory board of Omicia, M.G. is an advisor to Bina and DNAnexus, F.C.L.H. is affiliated with ThermoFisher Scientific, N.H. is affiliated with Life Technologies, C.L. is a scientific advisor for BioNano Genomics, H.Y.K.L. is affiliated with Bina Technologies which is part of Roche Sequencing, E.R.M. holds shares in Life Technologies, and G.M. is a co-founder of Genomics and a partner in Peptide Groove.
Source: PubMed