Interdependence of signal processing and analysis of urine 1H NMR spectra for metabolic profiling

Shucha Zhang, Cheng Zheng, Ian R Lanza, K Sreekumaran Nair, Daniel Raftery, Olga Vitek

Abstract

Metabolic profiling of urine presents challenges because of the extensive random variation of metabolite concentrations and the dilution resulting from changes in the overall urine volume. Thus, statistical analysis methods play a particularly important role; however, appropriate choices of these methods are not straightforward. Here we investigate constant and variance-stabilization normalization of raw and peak-picked spectra, for use with exploratory analysis (principal component analysis) and confirmatory analysis (ordinary and Empirical Bayes t-tests) in 1H NMR-based metabolic profiling of urine. We compare the performance of these methods using urine samples spiked with known metabolites according to a Latin square design. We find that analysis of peak-picked and logarithm-transformed spectra is preferred, and that signal processing and statistical analysis steps are interdependent. While variance-stabilizing transformation is preferred in conjunction with principal component analysis, constant normalization is more appropriate for use with a t-test. The Empirical Bayes t-test provides more reliable conclusions when the number of samples in each group is relatively small. Performance of these methods is illustrated using a clinical metabolomics experiment on patients with type 1 diabetes to evaluate the effect of insulin deprivation.
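
As a rough illustration of the confirmatory pipeline described above, the following Python sketch applies constant (total-sum) normalization and a logarithm transform to peak-picked intensities, then computes an ordinary two-sample t statistic for each peak. It is not the authors' code: all function names are hypothetical, the small offset inside the logarithm is an assumption made to avoid log(0), and the pooled-variance form of the t statistic is assumed. Variance-stabilization normalization and the Empirical Bayes (moderated) t-test are not shown.

```python
import math
from statistics import mean, variance


def total_sum_normalize(spectrum):
    """Constant normalization: scale intensities so that they sum to 1."""
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("spectrum must have a positive total intensity")
    return [x / total for x in spectrum]


def log_transform(spectrum, offset=1e-6):
    """Natural-log transform; the offset (an assumption) guards against log(0)."""
    return [math.log(x + offset) for x in spectrum]


def ordinary_t_statistic(group_a, group_b):
    """Two-sample t statistic with pooled variance for a single peak."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err


def preprocess(spectrum):
    """Normalize first, then log-transform, one spectrum at a time."""
    return log_transform(total_sum_normalize(spectrum))


def per_peak_t(spectra_a, spectra_b):
    """Return one t statistic per peak, comparing group A with group B."""
    a = [preprocess(s) for s in spectra_a]
    b = [preprocess(s) for s in spectra_b]
    n_peaks = len(a[0])
    return [
        ordinary_t_statistic([s[j] for s in a], [s[j] for s in b])
        for j in range(n_peaks)
    ]


# Toy peak tables (made-up numbers): group B is roughly twice as dilute as
# group A, and its first peak is relatively more abundant.
group_a = [[10, 20, 70], [12, 19, 69], [11, 21, 68]]
group_b = [[10, 10, 35], [11, 9.5, 34.5], [10.5, 10.5, 34]]
t_stats = per_peak_t(group_a, group_b)
```

Because each spectrum is divided by its own total, a uniform dilution cancels before testing. The sketch also shows a known side effect of total-sum scaling: a genuine change in one peak shifts the normalized proportions of all the others.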

Figures

Figure 1
Observed log fold changes are plotted against nominal log fold changes for all metabolites, separately for each dilution. The fold changes are taken with respect to a baseline concentration of 800 µM. The solid line represents the expected pattern. The dotted lines denote the 75th quantile of standard deviations of the background metabolites. Colors indicate dilution types.
Figure 2
PCA score plots for the 54 spectra in the spike-in dataset. X and Y axes are the first and the second principal components, respectively. Colors indicate the six mixture types, and shapes indicate the three dilution levels. Variance-stabilization normalization (VSN) performs best at minimizing the dilution effect.
Figure 3
False positive rate (FPR) for detecting differentially abundant peaks. X-axis: number of detected differentially abundant peaks. Y-axis: average false positive rate, calculated over all pairs of mixtures in a comparison set. Shapes indicate normalization types. Colors indicate ordinary and moderated t-tests. (A) 90 urine-like pairwise comparisons of mixture types. Samples from different mixtures also differ in dilution. (B) 45 blood-like comparisons of mixture types. Samples from different mixtures have identical dilution.
Figure 4
PCA score plots for the diabetes data set. X and Y axes indicate the first and the second principal components, respectively. Black dots indicate samples from diabetic patients. Open circles indicate samples from healthy controls. The choice of normalization procedure impacts the appearance of the PCA score plots.
Figure 5
Venn diagrams of differentially abundant peaks detected for the diabetic urine data set at the false discovery rate of 5%. Total sum normalization produces the highest number of differentiating peaks, while the choice of ordinary or moderated t-test has little impact on this data set.

Source: PubMed
