Untargeted Metabolomics Strategies: Challenges and Emerging Directions

Alexandra C Schrimpe-Rutledge, Simona G Codreanu, Stacy D Sherrod, John A McLean

Abstract

Metabolites are building blocks of cellular function. These species are involved in enzyme-catalyzed chemical reactions and are essential for cellular function. Upstream biological disruptions result in a series of metabolomic changes and, as such, the metabolome holds a wealth of information that is thought to be most predictive of phenotype. Uncovering this knowledge is a work in progress. The field of metabolomics is still maturing; the community has leveraged proteomics experience where applicable and has developed a range of sample preparation and instrument methodologies along with myriad data processing and analysis approaches. The research focus has now shifted toward a fundamental understanding of the biology responsible for metabolomic changes. There are several types of metabolomics experiments, including both targeted and untargeted analyses. While untargeted, hypothesis-generating workflows exhibit many valuable attributes, challenges inherent to the approach remain. This Critical Insight comments on these challenges, focusing on the identification process in LC-MS-based untargeted metabolomics studies, specifically in mammalian systems. Biological interpretation of metabolomics data hinges on the ability to accurately identify metabolites. The range of confidence associated with identifications, which is often overlooked, is reviewed, and opportunities for advancing the metabolomics field are described.

Keywords: Bioinformatics; Discovery; Global; Identification; Metabolomics; Targeted; Untargeted; Validation.

Figures

Figure 1
Untargeted versus Targeted Metabolomics Studies. Untargeted, or discovery-based, metabolomics focuses on global detection and relative quantitation of small molecules in a sample. In contrast, targeted, or validation-based, metabolomics focuses on measuring well-defined groups of metabolites, with opportunities for absolute quantitation.
Figure 2
An illustration of the amount of information density present at different levels of mass measurement accuracy, using the validated entries in the PubChem compound database. (a) The distribution of molecules in the PubChem compound database between 0 and 1000 Da, as surveyed in 2007, 2011, and 2015. As new compounds are discovered and archived, the distribution has shifted to lower mass, with most entries currently centered between 100 and 600 Da. Theoretical molecular formulas determined from chemical stability rules are illustrated by the dotted line, indicating that most of these entries are isomers. The inset zooms in on a 10 Da window where over half a million compounds are represented. (b) At increasing levels of mass accuracy, the number of possible molecular formulas can be reduced to a few thousand, but in one extreme case shown at 1 ppm, one formula is represented by over 10,000 isomers in the database. Mass spectrometry can significantly reduce complexity, but it cannot fully address molecular characterization without other dimensions of information. Reproduced with permission of Annual Review of Analytical Chemistry, Volume 9 © by Annual Reviews, http://www.annualreviews.org from reference [2].
Figure 3
Proposed workflow for metabolite identification confidence using multidimensional mass spectrometry. From top to bottom: Obtaining an exact mass measurement for a Unique Feature (Level 5) allows database searching, which here is illustrated by the over 61 million compounds indexed in PubChem at the time of this review. Subsequent levels of mass accuracy reduce the number of possible molecular formulas from over 200,000 (unit resolution) to ca. 10,000 at 1 ppm mass accuracy for the example mass of 354 Da. Using higher mass accuracy and/or a heuristic filtering approach yields a unique Molecular Formula (Level 4), which still represents several thousand isomeric compounds. Tentative Structures (Level 3) match precursor m/z to a metabolite database, and Putative Identifications (Level 2) match fragmentation data to metabolite MS/MS libraries. Obtaining a Validated Identification (Level 1) requires additional evidence, such as MS/MS, LC, IM, or measurements from other analytical techniques (optical spectroscopy or NMR) that match corresponding reference standard data under identical experimental conditions. Right portion of figure modified with permission of Annual Review of Analytical Chemistry, Volume 9 © by Annual Reviews, http://www.annualreviews.org from reference [2].
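To make the mass accuracy figures in this caption concrete, the following is a minimal sketch of how a parts-per-million (ppm) tolerance translates into a mass search window for the 354 Da example. The function names `ppm_window` and `ppm_error` are hypothetical and chosen for illustration; they do not come from the article or from any particular software package, and the sketch does not reproduce the formula counts quoted above.

```python
from typing import Tuple


def ppm_window(mass_da: float, ppm: float) -> Tuple[float, float]:
    """Return the (low, high) mass bounds in Da for a ppm tolerance.

    Hypothetical helper for illustration only.
    """
    delta = mass_da * ppm / 1e6  # half-width of the window in Da
    return mass_da - delta, mass_da + delta


def ppm_error(measured: float, theoretical: float) -> float:
    """Return the signed mass error of a measurement in ppm."""
    return (measured - theoretical) / theoretical * 1e6


# Search windows around the caption's example mass of 354 Da.
for tolerance in (1, 5, 10):
    low, high = ppm_window(354.0, tolerance)
    print(f"{tolerance:>2} ppm: {low:.6f} to {high:.6f} Da "
          f"(width {high - low:.6f} Da)")
```

At 1 ppm the window around 354 Da is well under a millidalton wide, yet the caption notes that a single molecular formula can still correspond to several thousand isomers, which is why additional dimensions of information are needed.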
Figure 4
Network module output from mummichog analysis of the qualitative and relative quantitative differences in metabolomic profiles of G6PD-deficient vs. normal human erythrocytes. Feature m/z values and significance measurements were used to predict metabolic activity networks without the use of conventional MS/MS identification workflows. Metabolites are colored blue (negative fold change) or red (positive fold change), and the size and color intensity represent the magnitude of fold change.

Source: PubMed
