Intra- and inter-individual variance of gene expression in clinical studies

Wei-Chung Cheng, Wun-Yi Shu, Chia-Yang Li, Min-Lung Tsai, Cheng-Wei Chang, Chaang-Ray Chen, Hung-Tsu Cheng, Tzu-Hao Wang, Ian C Hsu

Abstract

Background: Variance in microarray studies has been widely discussed as a critical topic in the identification of differentially expressed genes; however, few studies have addressed the influence of variance estimation.

Methodology/principal findings: To break intra- and inter-individual variance in clinical studies down into three levels (technical, anatomic, and individual), we designed experiments and algorithms to investigate these three forms of variance. As a case study, a group of "inter-individual variable genes" was identified to exemplify the influence of underestimated variance on the statistical and biological aspects of identifying differentially expressed genes. Our results showed that inadequate estimation of variance inevitably led to the inclusion of non-statistically significant genes among those listed as significant, thereby interfering with the correct prediction of biological functions. Applying a higher fold-change cutoff in the selection of significant genes reduces or eliminates the effects of underestimated variance.

Conclusions/significance: Our data demonstrated that correct variance evaluation is critical in selecting significant genes. If the degree of variance is underestimated, "noisy" genes are falsely identified as differentially expressed genes. These genes add noise to the biological interpretation, reducing the biological significance of the gene set. Our results also indicate that applying a higher fold-change cutoff as the selection criterion reduces or eliminates the differences between distinct estimations of variance.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Microarray experimental design.
Three kinds of samples were employed in this study. Individual variance was evaluated using the first sample group (G1), comprising Samples 1 to 9 from nine individuals. The second sample group (G2) was used to evaluate anatomic variance. It contained Samples 8–1, 8–2, and 8–3, taken from three different sections of placenta from the same individual. The third sample group (G3) consisted of two technical replicates, Samples 8–3_1 and 8–3_2, which used an identical RNA pool for microarray hybridization to evaluate technical variance. The expression of Sample 8–3 was estimated by the mean expression of Samples 8–3_1 and 8–3_2. The mean expression of Samples 8–1, 8–2, and 8–3 represented the expression of Sample 8.
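The nested averaging described in this caption can be sketched in a few lines of Python. This is only an illustration for a single gene: the expression values and the helper name `collapse` are hypothetical and are not taken from the study.

```python
from statistics import mean

# Hypothetical log2 expression values for one gene (illustrative numbers only).
technical_replicates = {"8-3_1": 7.75, "8-3_2": 8.25}  # G3: same RNA pool
anatomic_sections = {"8-1": 8.5, "8-2": 7.5}           # G2: other placental sections


def collapse(values):
    """Average replicate measurements into one expression estimate."""
    return mean(values)


# Sample 8-3 is estimated from its two technical replicates.
sample_8_3 = collapse(technical_replicates.values())

# Sample 8 is estimated from the three sections 8-1, 8-2, and 8-3.
sample_8 = collapse(
    [anatomic_sections["8-1"], anatomic_sections["8-2"], sample_8_3]
)

print(sample_8_3, sample_8)
```

Averaging from the innermost level outward means that Sample 8 enters the inter-individual comparison (G1) as a single value, alongside the other individuals.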
Figure 2. Profiles of the three kinds of variance.
(a) The distribution of the differential expression for the three forms of variance. The differential expression for individual, anatomic, and technical variance was estimated by the data series S1, S2, and S3, respectively, for every possible pair of i and j. (b) D1, D2, and D3 are the probability density distributions of the D quantity, obtained with the permutation method from the data series S1, S2, and S3 when considering individual, anatomic, and technical variance, respectively.
Figure 3. The scatter plot of averaged fold change and p values, and the selection of inter-individual variable genes.
(a) The scatter plot of log2 (averaged fold change) and –log (p value). Pa is the p value determined by applying anatomic variance. Pt is the p value determined by applying technical variance. (b) The enlarged area of the rectangle in (a). The red arrows indicate the p value corresponding to an FDR of 5%. The gray arrows indicate the averaged fold change criteria: 1.2, 1.3, 1.4, and 1.5. (c) The number of inter-individual variable genes selected by the criteria of FDR 5%, evaluated by technical and anatomic variance (the red arrows in Figure 3b), and distinct averaged fold changes (the gray arrows in Figure 3b).
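The two-part selection rule in this caption, an FDR threshold combined with an averaged fold change cutoff, can be sketched as follows. The record does not say which FDR procedure was used, so the Benjamini-Hochberg step-up rule here is an assumption. Treating fold changes symmetrically (0.5-fold counted like 2-fold) is also an assumption, and the gene names, p values, and fold changes are invented for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return booleans marking which p values pass the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    passed = [False] * m
    for idx in order[:cutoff_rank]:
        passed[idx] = True
    return passed


def select_genes(genes, p_values, fold_changes, fdr=0.05, min_fold=1.5):
    """Keep genes that pass the FDR threshold and the fold change cutoff."""
    passed = benjamini_hochberg(p_values, fdr)
    selected = []
    for gene, ok, fc in zip(genes, passed, fold_changes):
        magnitude = max(fc, 1.0 / fc)  # assumption: direction of change is ignored
        if ok and magnitude >= min_fold:
            selected.append(gene)
    return selected


# Hypothetical input: five genes with made-up p values and averaged fold changes.
genes = ["g1", "g2", "g3", "g4", "g5"]
p_values = [0.001, 0.008, 0.02, 0.035, 0.30]
fold_changes = [2.0, 1.25, 0.6, 1.6, 3.0]

strict = select_genes(genes, p_values, fold_changes, min_fold=1.5)
lenient = select_genes(genes, p_values, fold_changes, min_fold=1.2)
print(strict, lenient)
```

In this toy input, raising the cutoff from 1.2 to 1.5 drops a gene that is statistically significant but changes only slightly, which is the kind of gene whose selection depends most on how variance was estimated.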

Source: PubMed
