CSVS, a crowdsourcing database of the Spanish population genetic variability

María Peña-Chilet, Gema Roldán, Javier Perez-Florido, Francisco M Ortuño, Rosario Carmona, Virginia Aquino, Daniel Lopez-Lopez, Carlos Loucera, Jose L Fernandez-Rueda, Asunción Gallego, Francisco García-Garcia, Anna González-Neira, Guillermo Pita, Rocío Núñez-Torres, Javier Santoyo-López, Carmen Ayuso, Pablo Minguez, Almudena Avila-Fernandez, Marta Corton, Miguel Ángel Moreno-Pelayo, Matías Morin, Alvaro Gallego-Martinez, Jose A Lopez-Escamez, Salud Borrego, Guillermo Antiñolo, Jorge Amigo, Josefa Salgado-Garrido, Sara Pasalodos-Sanchez, Beatriz Morte, Spanish Exome Crowdsourcing Consortium, Ángel Carracedo, Ángel Alonso, Joaquín Dopazo, Fátima Al-Shahrour, Rafael Artuch, Javier Benitez, Luis Antonio Castaño, Ignacio Del Castillo, Aitor Delmiro, Carmina Espinos, Roser González, Daniel Grinberg, Encarnación Guillén, Pablo Lapunzina, Esther Lopez, Ramón Martí, Montserrat Milá, José Mª Millán, Virginia Nunes, Francesc Palau, Belen Perez, Luis Pérez Jurado, Rosario Perona, Aurora Pujol, Feliciano Ramos, Antonia Ribes, Jordi Rosell, Eulalia Rovira, Jordi Surrallés, Isabel Tejada, Magdalena Ugarte

Abstract

Knowledge of the genetic variability of the local population is of utmost importance in personalized medicine and has proven to be a critical factor in the discovery of new disease variants. Here, we present the Collaborative Spanish Variability Server (CSVS), which currently contains more than 2000 genomes and exomes of unrelated Spanish individuals. The database was generated through a collaborative crowdsourcing effort that collects sequencing data produced by local genomic projects and for other purposes. Sequences are grouped by ICD10 upper categories. A web interface allows the database to be queried with one or more ICD10 categories removed. In this way, aggregated counts of allele frequencies in the pseudo-control Spanish population can be obtained for diseases belonging to the removed categories. In addition to pseudo-control studies, some population studies can be carried out, for example on the prevalence of pharmacogenomic variants. These genomic data have also been used to define the first Spanish Genome Reference Panel (SGRP1.0) for imputation. CSVS is the first local repository of variability produced entirely by a crowdsourcing effort and constitutes an example for future initiatives to characterize local variability worldwide. CSVS is also part of the GA4GH Beacon network. CSVS can be accessed at: http://csvs.babelomics.org/.
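
The pseudo-control idea described in the abstract can be illustrated with a minimal Python sketch. It is not the CSVS implementation or its API: the function name `pseudo_control_frequencies`, the sample layout, the category labels and the variant identifiers are all hypothetical, and every site is assumed to be diploid and autosomal. The sketch only shows how removing one disease category changes the aggregated allele counts.

```python
from collections import defaultdict


def pseudo_control_frequencies(samples, excluded_categories):
    """Aggregate allele counts over samples outside the excluded categories.

    samples: iterable of (category, genotypes) pairs, where genotypes maps a
             variant identifier to the number of alternate alleles (0, 1 or 2).
    Returns a dict: variant -> (alt_count, total_alleles, frequency).
    """
    excluded = set(excluded_categories)
    alt_counts = defaultdict(int)
    allele_totals = defaultdict(int)
    for category, genotypes in samples:
        if category in excluded:
            continue  # leave out individuals from the removed category
        for variant, alt_alleles in genotypes.items():
            alt_counts[variant] += alt_alleles
            allele_totals[variant] += 2  # assumption: diploid autosomal site
    return {
        variant: (alt_counts[variant], total, alt_counts[variant] / total)
        for variant, total in allele_totals.items()
    }


# Hypothetical toy data: three individuals labelled with made-up category codes.
samples = [
    ("VI", {"var1": 1, "var2": 0}),
    ("IX", {"var1": 2, "var2": 1}),
    ("VII", {"var1": 0, "var2": 1}),
]

# Frequencies to use as pseudo-controls when studying a disease in category "IX".
controls = pseudo_control_frequencies(samples, excluded_categories=["IX"])
print(controls["var1"])
```

In this toy example, removing the individual in category "IX" leaves two individuals (four alleles), so the frequency of `var1` is computed from the remaining carriers only.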

© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.

Figures

Figure 1.
(A) Data are contributed by different genomic projects and pass through several quality control steps before being inserted into the database: an artefact and kinship test (which detects upper outliers, with an unexpectedly high ratio of private variants, most likely errors, and lower outliers, which are duplicates or closely related individuals) and a locality test. (B) Initial CSVS page. (C) Query panel in the Search option. (D) List of variants found in the Spanish population within the selected region, along with complementary information on impact, conservation, frequencies in other populations and phenotype. (E) Genomic browser that displays the selected variant in its genomic context. (F) Saturation plot. (G) Updated contents of the database.
Figure 2.
Circos plot showing the genes with high saturation (orange) and low saturation (green) along the chromosomes; these genes were significantly enriched in the functional terms shown in Supplementary Figure S1.

Source: PubMed
