Chemosensitivity prediction by transcriptional profiling

J E Staunton, D K Slonim, H A Coller, P Tamayo, M J Angelo, J Park, U Scherf, J K Lee, W O Reinhold, J N Weinstein, J P Mesirov, E S Lander, T R Golub

Abstract

In an effort to develop a genomics-based approach to the prediction of drug response, we have developed an algorithm for classification of cell line chemosensitivity based on gene expression profiles alone. Using oligonucleotide microarrays, the expression levels of 6,817 genes were measured in a panel of 60 human cancer cell lines (the NCI-60) for which the chemosensitivity profiles of thousands of chemical compounds have been determined. We sought to determine whether the gene expression signatures of untreated cells were sufficient for the prediction of chemosensitivity. Gene expression-based classifiers of sensitivity or resistance for 232 compounds were generated and then evaluated on independent sets of data. The classifiers were designed to be independent of the cells' tissue of origin. The accuracy of chemosensitivity prediction was considerably better than would be expected by chance. Eighty-eight of 232 expression-based classifiers performed accurately (with P < 0.05) on an independent test set, whereas only 12 of the 232 would be expected to do so by chance. These results suggest that at least for a subset of compounds genomic approaches to chemosensitivity prediction are feasible.
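The chance expectation quoted in the abstract can be checked with a line of arithmetic. The sketch below assumes that "expected by chance" simply means the P < 0.05 threshold applied to all 232 classifiers; the variable names are illustrative and do not come from the paper.

```python
# Figures taken from the abstract.
n_classifiers = 232   # compounds with an expression-based classifier
alpha = 0.05          # significance threshold on the independent test set
observed = 88         # classifiers that performed accurately at P < 0.05

# Assumption: under the null, each classifier passes with probability alpha.
expected_by_chance = n_classifiers * alpha
observed_fraction = observed / n_classifiers

print(f"expected by chance: about {round(expected_by_chance)} of {n_classifiers}")
print(f"observed: {observed} of {n_classifiers} ({observed_fraction:.1%})")
```

Under that assumption, the expectation of about 12 matches the abstract, against 88 observed.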

Figures

Figure 1
General scheme for classification of compound sensitivity in cell lines by using gene expression data.
Figure 2
Example of a compound (NSC 749; Azaguanine) with a bimodal distribution of growth inhibition. For each compound, log(GI50) values were normalized across the 60 cell lines, and cell lines with log(GI50) within 0.8 SDs of the mean were eliminated from analysis; the remaining cell lines were defined as sensitive or resistant to the compound. Compounds with at least 30 cell lines outside the 1.6-SD window, and for which the window represents at least 1 order of magnitude in raw GI50 data, were analyzed further. A total of 232 compounds met these criteria.
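The selection rule in the Figure 2 caption can be sketched in a few lines of Python. This is an illustration, not the authors' code. The function name `select_cell_lines` and its parameters are hypothetical. It assumes that log(GI50) is base 10, that the SD is the population SD across the cell lines, and that a lower GI50 means greater sensitivity.

```python
from statistics import mean, pstdev


def select_cell_lines(log_gi50, window_sd=0.8, min_lines=30, min_span_log10=1.0):
    """Split cell lines into sensitive/resistant for one compound.

    log_gi50: dict mapping cell line name -> log10(GI50) (assumed base 10).
    Returns (sensitive, resistant) lists, or None if the compound fails
    either criterion from the caption.
    """
    values = list(log_gi50.values())
    mu = mean(values)
    sd = pstdev(values)  # assumption: population SD across the cell lines
    if sd == 0:
        return None  # no variation, so nothing can be classified

    # Normalize, then drop lines within window_sd SDs of the mean.
    z = {name: (v - mu) / sd for name, v in log_gi50.items()}
    sensitive = [n for n, s in z.items() if s < -window_sd]  # low GI50
    resistant = [n for n, s in z.items() if s > window_sd]   # high GI50

    # Criterion 1: at least min_lines cell lines outside the window.
    enough_lines = len(sensitive) + len(resistant) >= min_lines
    # Criterion 2: the 2 * window_sd window spans >= 1 order of magnitude.
    wide_enough = 2 * window_sd * sd >= min_span_log10

    if not (enough_lines and wide_enough):
        return None
    return sensitive, resistant


# Toy bimodal compound: 30 lines near 0.1 uM and 30 near 10 uM (made-up data).
toy = {f"line{i}": -7.0 for i in range(30)}
toy.update({f"line{i}": -5.0 for i in range(30, 60)})
result = select_cell_lines(toy)
```

With the toy data, all 60 lines fall outside the window and the compound passes both criteria. A compound whose values cluster tightly around the mean returns `None`.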
Figure 3
Distribution of classification accuracies for 232 compounds. Percent accuracy for each compound is the average accuracy for classification of sensitive and resistant test cell lines. The control distribution represents results obtained from random classification (1,000 iterations) of the 232 test sets.
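The accuracy measure in the Figure 3 caption, the average of the per-class accuracies, can be sketched together with a random-classification control. The caption does not say how the random labels were drawn; the fair coin flip per test cell line used here is an assumption, and the function names are hypothetical.

```python
import random

CLASSES = ("sensitive", "resistant")


def percent_accuracy(true_labels, predicted_labels):
    """Average of the sensitive and resistant accuracies, in percent.

    Assumes both classes occur at least once in true_labels.
    """
    per_class = []
    for cls in CLASSES:
        idx = [i for i, t in enumerate(true_labels) if t == cls]
        correct = sum(predicted_labels[i] == cls for i in idx)
        per_class.append(correct / len(idx))
    return 100.0 * sum(per_class) / len(per_class)


def random_control(true_labels, iterations=1000, seed=0):
    """Accuracies from random classification (assumed: fair coin per line)."""
    rng = random.Random(seed)
    return [
        percent_accuracy(true_labels, [rng.choice(CLASSES) for _ in true_labels])
        for _ in range(iterations)
    ]


# Toy test set with two sensitive and two resistant lines (made-up labels).
truth = ["sensitive", "sensitive", "resistant", "resistant"]
pred = ["sensitive", "resistant", "resistant", "resistant"]
accuracy = percent_accuracy(truth, pred)   # (50% + 100%) / 2
control = random_control(truth)
```

Averaging per class rather than pooling all test lines keeps a classifier from scoring well just by favoring the larger class, which matters when the sensitive and resistant groups differ in size.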
Figure 4
Top 30 classifier genes for cytochalasin D (NSC-209835). The red and blue matrix represents the normalized expression patterns for each gene across the cell lines (brightest red indicates highest relative expression; darkest blue indicates lowest relative expression). (Top) The sensitive and resistant cell lines are shown. Tissue of origin for each cell line is indicated as follows: L, lung (nonsmall cell); C, colon; B, breast; O, ovarian; E, leukemia; R, renal; M, melanoma; P, prostate; N, central nervous system. Lines used as training sets are shown in bold. The list at right shows the weighting factor [measure of correlation; weights were computed by using negative log(GI50) values and thus a positive value correlates with sensitivity], the GenBank accession number, and the gene name. Genes whose products are known to have cytoskeletal and/or extracellular matrix functions are shown in bold.

Source: PubMed
