COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer

Simon A Forbes, Nidhi Bindal, Sally Bamford, Charlotte Cole, Chai Yin Kok, David Beare, Mingming Jia, Rebecca Shepherd, Kenric Leung, Andrew Menzies, Jon W Teague, Peter J Campbell, Michael R Stratton, P Andrew Futreal

Abstract

COSMIC (http://www.sanger.ac.uk/cosmic) curates comprehensive information on somatic mutations in human cancer. Release v48 (July 2010) describes over 136,000 coding mutations in almost 542,000 tumour samples; of the 18,490 genes documented, 4,803 (26%) have one or more mutations. Full scientific literature curations are available on 83 major cancer genes and 49 fusion gene pairs (19 new cancer genes and 30 new fusion pairs this year) and this number is continually increasing. Key amongst these is TP53, now available through a collaboration with the IARC p53 database. In addition to data from the Cancer Genome Project (CGP) at the Sanger Institute, UK, and The Cancer Genome Atlas project (TCGA), large systematic screens are also now curated. Major website upgrades now make these data much more mineable, with many new selection filters and graphics. A BioMart is now available allowing more automated data mining and integration with other biological databases. Annotation of genomic features has become a significant focus; COSMIC has begun curating full-genome resequencing experiments, developing new web pages, export formats and graphics styles. With all genomic information recently updated to GRCh37, COSMIC integrates many diverse types of mutation information and is making much closer links with Ensembl and other data resources.

Figures

Figure 1.
Circos diagram summarizing the full somatic mutation content of cell line NCI-H209. Concentric rings summarize the data on different types of mutation. From the inside out, the core displays the structural rearrangements; intrachromosomal are in green, interchromosomal in purple. The next ring out shows the chromosomal copy number in histogram form, with inner red patches indicating regions of LOH. Further out, several rings of single base coding substitutions are shown (black tiles show splice site mutations, red stop-gained, purple non-synonymous and grey synonymous changes). The inner dark orange and outer light orange histograms represent non-coding mutations, showing the relative frequencies of homozygous and heterozygous mutations, respectively. In the final ring before the chromosome indicators, indels are shown in green; light green represents insertions and dark green deletions.
Figure 2.
The gene histogram page for TP53. The histogram shows relative frequencies of mutations (y-axis) across the CDS of the gene (x-axis). Underneath the x-axis scale bar are complex replacement mutations, followed by simple deletions (blue triangles) and insertions (red triangles). Under this, zoom options are available. On the left, the new specialization filters are shown, offering many query options.
Figure 3.
Pie charts (here showing the TP53 gene) are increasingly used for summarization of complex spectrum data in COSMIC. Two are currently live with many more forthcoming. The top graph (a) shows the breakdown of all observed mutations by type, and the lower (b) shows the breakdown of mutated samples by source. The totals differ slightly because some samples carry more than one mutation and are therefore counted once in (b) but twice or more in (a).
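The difference between the two totals can be illustrated with a minimal Python sketch. The records below are invented for illustration and are not COSMIC data; the variable names are assumptions, and this is not how the COSMIC website computes its charts.

```python
from collections import Counter

# Hypothetical (sample_id, mutation_type) records; not real COSMIC data.
records = [
    ("S1", "missense"),
    ("S1", "nonsense"),   # sample S1 carries two mutations
    ("S2", "missense"),
    ("S3", "deletion"),
]

# Chart (a): every observed mutation is counted, grouped by type.
mutations_by_type = Counter(mutation_type for _, mutation_type in records)
total_mutations = sum(mutations_by_type.values())

# Chart (b): each mutated sample is counted once, however many mutations it carries.
mutated_samples = {sample_id for sample_id, _ in records}
total_samples = len(mutated_samples)

print(total_mutations, total_samples)
```

Here the mutation total exceeds the sample total because one sample contributes two mutations.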
Figure 4.
Mutation spectrum histogram for whole-genome-resequencing sample COLO-829, displaying the considerable overrepresentation of C:G>T:A events in its coding mutation repertoire, reflecting the characteristic signature of DNA damage due to ultraviolet light exposure common in malignant melanoma.
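A spectrum of this kind is commonly built by collapsing the twelve possible single-base substitutions into six strand-symmetric classes, so that C>T and G>A are both counted as C:G>T:A. The following Python sketch shows one way to do this; the function names and the input calls are hypothetical, and it is not the code behind the COSMIC histogram.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a single-base substitution into one of six strand-symmetric classes."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Report the change from the pyrimidine strand (reference base C or T),
    # so that G>A is counted together with C>T.
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


def mutation_spectrum(substitutions):
    """Return the relative frequency of each substitution class."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in substitutions)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical (reference base, variant base) calls; not data from COLO-829.
calls = [("C", "T"), ("G", "A"), ("C", "T"), ("T", "C")]
spectrum = mutation_spectrum(calls)
print(spectrum)
```

In this toy input, three of the four calls fall into the C:G>T:A class, which is the class the figure shows as overrepresented.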


Source: PubMed
