MetSizeR: selecting the optimal sample size for metabolomic studies using an analysis based approach

Gift Nyamundanda, Isobel Claire Gormley, Yue Fan, William M Gallagher, Lorraine Brennan

Abstract

Background: Determining sample size for metabolomic experiments is important. Because these experiments are complex, however, there are currently no standard methods for sample size estimation in metabolomics. Pilot studies are also rarely performed in metabolomics, so existing sample size estimation approaches that rely on pilot data cannot be applied.

Results: In this article, an analysis-based approach called MetSizeR is developed to estimate sample size for metabolomic experiments, even when experimental pilot data are not available. The key idea behind MetSizeR is that the sample size estimate takes into account the statistical analysis technique the researcher intends to apply to the data. MetSizeR combines information about that technique with prior expert knowledge of the metabolomic experiment to simulate pilot data from a statistical model. Permutation-based techniques are then applied to the simulated pilot data to estimate the required sample size.
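The simulate-then-permute idea can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the MetSizeR implementation. The assumptions are:

- pilot data come from a PPCA model with two balanced treatment groups;
- group differences are a fixed mean shift `effect` in a fraction `prop_sig` of bins;
- loadings are drawn at random;
- the FDR is estimated by permuting group labels, in the spirit of Tibshirani (2006).

All function names and default settings are hypothetical.

```python
import numpy as np


def simulate_ppca_pilot(n_per_group, p, q, prop_sig, effect, sigma, rng):
    """Simulate one two-group pilot data set from a PPCA model (hypothetical settings).

    x_i = W u_i + d_i + e_i with u_i ~ N(0, I_q) and e_i ~ N(0, sigma^2 I_p);
    d_i shifts the first round(prop_sig * p) bins by `effect` in group 1 only.
    """
    W = rng.normal(size=(p, q))                    # loadings (assumed N(0, 1) entries)
    n = 2 * n_per_group
    U = rng.normal(size=(n, q))                    # latent scores
    X = U @ W.T + rng.normal(scale=sigma, size=(n, p))
    labels = np.repeat([0, 1], n_per_group)        # balanced treatment groups
    n_sig = int(round(prop_sig * p))
    X[labels == 1, :n_sig] += effect               # the truly "significant" bins
    return X, labels, n_sig


def welch_t(X, labels):
    """Per-bin Welch two-sample t statistics (group 1 minus group 0)."""
    a, b = X[labels == 0], X[labels == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (b.mean(axis=0) - a.mean(axis=0)) / se


def permutation_fdr(X, labels, n_top, n_perm, rng):
    """Permutation estimate of the FDR when the n_top bins with largest |t| are called."""
    if n_top < 1:
        raise ValueError("n_top must be at least 1")
    obs = np.abs(welch_t(X, labels))
    cutoff = np.sort(obs)[::-1][n_top - 1]         # |t| needed to be among the top n_top
    # Under permuted labels every call is false; average their number.
    false_calls = [
        np.sum(np.abs(welch_t(X, rng.permutation(labels))) >= cutoff)
        for _ in range(n_perm)
    ]
    return min(1.0, float(np.mean(false_calls)) / n_top)


def smallest_n(candidate_n_per_group, target_fdr=0.05, p=200, q=2, prop_sig=0.2,
               effect=1.0, sigma=1.0, n_perm=20, seed=0):
    """Smallest total n on the grid whose estimated FDR is at or below target_fdr."""
    rng = np.random.default_rng(seed)
    for n_g in candidate_n_per_group:
        X, labels, n_sig = simulate_ppca_pilot(n_g, p, q, prop_sig, effect, sigma, rng)
        if permutation_fdr(X, labels, n_sig, n_perm, rng) <= target_fdr:
            return 2 * n_g
    return None                                    # target not reached on this grid
```

For example, `smallest_n([5, 10, 15, 20], effect=2.0)` scans balanced designs of increasing size. This sketch uses a single simulated data set per size. Repeating the simulation gives the spread of FDR estimates that a sample size curve would summarise.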

Conclusions: The MetSizeR methodology, and a publicly available software package which implements the approach, are illustrated through real metabolomic applications. Sample size estimates, informed by the intended statistical analysis technique, and the associated uncertainty are provided.

Figures

Figure 1
Sample size estimation without experimental pilot data using the PPCA model. Each panel shows the estimated FDR (solid red line) together with its 10th and 90th percentiles (dashed red lines). The horizontal dashed black line marks the target FDR of 5%. (A) The estimated sample size n̂ is 30, with 15 samples in each treatment group. (B-D) The effect of varying the proportion of significant bins across a range of sample sizes.
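Panel (A) summarises, at each candidate sample size, the spread of FDR estimates over repeated simulations. The hypothetical helper below does the same. It takes a matrix of replicate FDR estimates with one row per candidate size, computes the mean and the 10th and 90th percentiles, and returns the smallest size that meets the target.

Several details are assumptions rather than confirmed MetSizeR behaviour:

- whether the solid curve is a mean or a median;
- whether the target is applied to the central curve or to the upper percentile (both rules are exposed through `use_upper`);
- the balanced per-group split.

```python
import numpy as np


def summarise_fdr(fdr_reps):
    """Row-wise summary of replicate FDR estimates.

    fdr_reps: array of shape (n_sizes, n_reps); row k holds the FDR estimates
    from repeated simulations at the k-th candidate sample size.
    Returns (10th percentile, mean, 90th percentile) for each sample size.
    """
    fdr_reps = np.asarray(fdr_reps, dtype=float)
    lo, hi = np.percentile(fdr_reps, [10, 90], axis=1)
    return lo, fdr_reps.mean(axis=1), hi


def select_sample_size(sizes, fdr_reps, target=0.05, use_upper=True):
    """Smallest total sample size whose FDR summary is at or below the target.

    use_upper=True requires the 90th percentile (upper dashed line) to reach the
    target, which is the more conservative rule; use_upper=False uses the mean.
    Returns (total n, n per group) assuming a balanced design, or None.
    """
    _, mean, hi = summarise_fdr(fdr_reps)
    curve = hi if use_upper else mean
    ok = np.flatnonzero(curve <= target)
    if ok.size == 0:
        return None                      # no candidate size reaches the target
    n_hat = int(sizes[ok[0]])
    return n_hat, n_hat // 2
```

The conservative rule guards against the uncertainty shown by the dashed lines. A design chosen this way reaches the target FDR in most simulated replicates, not just on average.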
Figure 2
Sample size estimation without experimental pilot data using the PPCCA and DPPCA models. (A) The estimated sample size using the PPCCA model with two covariates. (B) The estimated sample size for a longitudinal study using the DPPCA model.
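For the covariate-adjusted case in panel (A), pilot data can be simulated from a PPCCA model in which covariates shift the latent scores. This sketch assumes the following formulation:

- u_i ~ N(αᵀc_i, I_q), where c_i is the subject's covariate vector with a leading 1 for the intercept;
- x_i = W u_i + μ + ε_i with ε_i ~ N(0, σ²I_p).

The function name and arguments are hypothetical. The longitudinal DPPCA model of panel (B) is not covered.

```python
import numpy as np


def simulate_ppcca(covariates, W, alpha, mu, sigma, rng):
    """Draw data from a PPCCA-style model (a sketch, not the MetSizeR code).

    covariates: (n, L) array of subject covariates, e.g. L = 2.
    W: (p, q) loadings; alpha: (L + 1, q) regression coefficients, first row
    is the intercept; mu: (p,) mean spectrum; sigma: residual standard deviation.
    Returns the simulated data X (n, p) and latent scores U (n, q).
    """
    covariates = np.asarray(covariates, dtype=float)
    n = covariates.shape[0]
    C = np.column_stack([np.ones(n), covariates])          # prepend intercept column
    U = C @ alpha + rng.normal(size=(n, W.shape[1]))       # covariate-dependent scores
    noise = rng.normal(scale=sigma, size=(n, W.shape[0]))
    return U @ W.T + mu + noise, U
```

Because the covariates enter only through the latent scores, a two-covariate design such as the one in panel (A) adds just 3 × q regression coefficients to the PPCA parameters.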
Figure 3
Sample size estimation with pilot data. (A) The estimated sample size using the PPCCA model on NMR pilot data, with subject weight as a covariate. (B) The estimated sample size using the PPCA model with targeted MS metabolomic pilot data.
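When pilot data are available, the model parameters can be estimated from them before simulating. The sketch below is minimal and makes two assumptions:

- it uses the closed-form maximum-likelihood PPCA solution of Tipping and Bishop (1999), with the arbitrary rotation matrix set to the identity;
- it fits plain PPCA only, so MetSizeR's own fitting procedure, for example for PPCCA with covariates, may differ.

The helper names are hypothetical.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA fit.

    Returns mu (p,), W (p, q) and sigma2, where sigma2 is the mean of the
    discarded eigenvalues of the sample covariance (divisor n, as in the ML fit).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mu = X.mean(axis=0)
    S = (X - mu).T @ (X - mu) / n
    evals, evecs = np.linalg.eigh(S)                   # eigenvalues in ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]         # reorder to descending
    sigma2 = evals[q:].mean()
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return mu, W, sigma2


def simulate_from_fit(mu, W, sigma2, n, rng):
    """Simulate n new spectra from a fitted PPCA model."""
    U = rng.normal(size=(n, W.shape[1]))
    return mu + U @ W.T + rng.normal(scale=np.sqrt(sigma2), size=(n, W.shape[0]))
```

Pilot studies usually have fewer subjects than bins. In that case many sample eigenvalues are zero and the ML estimate of σ² is pulled down. A prior on σ² (expert knowledge, as in the no-pilot setting) may therefore be preferable for small pilots.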

Source: PubMed