COSMIC (the Catalogue of Somatic Mutations in Cancer): a resource to investigate acquired mutations in human cancer

Simon A Forbes, Gurpreet Tang, Nidhi Bindal, Sally Bamford, Elisabeth Dawson, Charlotte Cole, Chai Yin Kok, Mingming Jia, Rebecca Ewing, Andrew Menzies, Jon W Teague, Michael R Stratton, P Andrew Futreal

Abstract

The Catalogue of Somatic Mutations in Cancer (COSMIC) (http://www.sanger.ac.uk/cosmic/) is the largest public resource for information on somatically acquired mutations in human cancer and is freely available without restrictions. Currently (v43, August 2009), COSMIC contains details of 1.5 million experiments performed across 13,423 genes in almost 370,000 tumours, describing over 90,000 individual mutations. Data are gathered from two sources: publications in the scientific literature (v43 contains 7797 curated articles) and the full output of the genome-wide screens from the Cancer Genome Project (CGP) at the Sanger Institute, UK. Most of the world's literature on point mutations in human cancer has now been curated into COSMIC. While this is continually updated, a greater emphasis on curating fusion gene mutations is driving the expansion of this information; over 2700 fusion gene mutations are now described. Whole-genome sequencing screens are identifying large numbers of genomic rearrangements in cancer, and COSMIC now displays details of these analyses as well. Examination of COSMIC's data is primarily web-driven, focused on providing mutation range and frequency statistics based upon a choice of gene and/or cancer phenotype. Graphical views provide easily interpretable summaries of large quantities of data, and export functions can provide precise details of user-selected data.

Figures

Figure 1.
The Histogram page forms the core of the COSMIC website, showing the mutation range for each gene selected. The frequency of each single base substitution is shown in the histogram itself, and deletions, insertions and complex substitutions are indicated below as triangles (deletions point down in blue, insertions point up in red) or bars (which indicate the length of sequence being replaced). Below the mutations, long single-colour bars indicate structural domains, providing links to external databases including Pfam and InterPro. Two histograms are shown to demonstrate the range of mutation distributions observed: (a) BRAF shows the classical gain-of-function spectrum, whereby very specific mutations are required to activate growth promotion, in this case most frequently a c.1799T>A substitution; (b) CDKN2A (p16) is a tumour suppressor gene, only allowing growth promotion when its activity is absent (‘loss-of-function’). Tumourigenic mutations only need to inactivate the gene, shown here as a range of mutations across the gene’s length, including a large number of frameshifting insertions and deletions.
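As a rough illustration of the per-position tally that such a histogram displays, the following minimal sketch parses simple coding substitutions written in the c.1799T>A style and counts them by coding position. The function names and the input list are hypothetical and invented for this example; the input is not COSMIC data, and this is not a description of how the COSMIC website computes its views. Anything other than a simple single-base substitution is skipped.

```python
import re
from collections import Counter

# Matches only simple coding substitutions such as "c.1799T>A".
SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def parse_substitution(hgvs):
    """Return (position, ref, alt) for a simple coding substitution, or None."""
    match = SUBSTITUTION.match(hgvs)
    if match is None:
        return None
    position, ref, alt = match.groups()
    return int(position), ref, alt


def position_counts(mutations):
    """Count simple substitutions per coding position; skip anything else."""
    counts = Counter()
    for hgvs in mutations:
        parsed = parse_substitution(hgvs)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts


# Made-up input for illustration only; these are not COSMIC records.
example = ["c.1799T>A", "c.1799T>A", "c.1798G>A", "c.1799_1800delinsAA"]
counts = position_counts(example)
```

A gain-of-function gene would be expected to concentrate its counts at a few positions, whereas a loss-of-function gene would spread them along the coding sequence.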
Figure 2.
COSMIC has now standardized on Circos diagrams (4) to display genome-wide mutation analyses for a sample. Concentric rings display different data for the same sample, all located to the same genomic co-ordinates. On the outside of the circle, the chromosomes of the standard reference genome are coloured, numbered and appropriately scaled. Moving from the chromosome indicators to the centre of the circle, blue lines indicate the presence of point mutations in the coding domains of genes, a ring of histograms indicates the copy number of each genomic segment, red bars indicate the presence of small intrachromosomal rearrangements and green lines link chromosomes where large interchromosomal rearrangements have been observed.
Figure 3.
COSMIC mutations are available in the Ensembl genome browser, where they can be combined with all other genomic annotations. Each unique sequence change is indicated separately; in a genomic context, the coding bias of COSMIC’s data makes the annotations pile up over the exons. In this example, the first 15 rows of mutations in the PTEN gene are shown (reduced from 197 rows in Ensembl).

References

    1. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton MR. A census of human cancer genes. Nat. Rev. Cancer. 2004;4:177–183.
    2. Campbell PJ, Stephens PJ, Pleasance ED, O'Meara S, Li H, Santarius T, Stebbings LA, Leroy C, Edkins S, Hardy C, et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat. Genet. 2008;40:722–729.
    3. den Dunnen JT, Antonarakis SE. Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum. Mut. 2000;15:7–12.
    4. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones S, Marra M. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639–1645.
    5. Dowell RD, Jokerst RM, Day A, Eddy SR, Stein L. The distributed annotation system. BMC Bioinformatics. 2001;2:7.
    6. Forbes SA, Bhamra G, Bamford S, Dawson E, Kok C, Clements J, Menzies A, Teague JW, Futreal PA, Stratton MR. The catalogue of somatic mutations in cancer (COSMIC). Curr. Protoc. Hum. Genet. 2008;10:11.
    7. Haider S, Ballester B, Smedley D, Zhang J, Rice P, Kasprzyk A. Biomart Central Portal - unified access to biological data. Nucleic Acids Res. 2009;37:W23–W27.
    8. Sjoblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, Mandelker D, Leary RJ, Ptak J, Silliman N, et al. The consensus coding sequences of human breast and colorectal cancers. Science. 2006;314:268–274.

Source: PubMed
