Integrative genomics in cardiovascular medicine

James S Ware, Enrico Petretto, Stuart A Cook

Abstract

Integrative genomics studies have greatly advanced our understanding of cardiovascular pathophysiology over the last decade. Here, we highlight the strengths and challenges of this cutting-edge approach and provide examples where novel insights have arisen through the integration of multi-level genomic information and cardiac physiology. Going forward, the integration of comprehensive next-generation sequencing data sets with quantitative phenotypes at the molecular, cellular, and whole-heart level using advanced modelling approaches provides an unprecedented opportunity for cardiovascular science.

Figures

Figure 1
Integrative genomics across data modalities. The cartoon depicts the flow of biological information from the DNA to the disease level, and modelling of multi-modality data within different layers: genetic (DNA), transcriptional (RNA), protein, and metabolic. Interactions within each layer of biological data can be described using network models and analysed in conjunction with endophenotypes at the cellular and organ level to understand human heart disease pathobiology. Network analyses of quantitative biochemical data sets provide information about complex gene–gene interactions and pathway annotation while increasing power to find individual ‘key players’ in disease, which is not possible in single-gene studies. Each network model (genetic, transcriptional, protein, metabolic) can be annotated using extensive bioinformatics and database (db) resources. This allows inference of the ‘functional context’ in which individual genes or networks operate by combining experimental and -omics data. OMIM, Online Mendelian Inheritance in Man (http://www.ncbi.nlm.nih.gov/omim/); GWAS db (for instance: https://www.gwascentral.org/), Mutation db (http://reseq.biosciencedbc.jp/resequence/); GO, Gene Ontology (http://www.geneontology.org/); KEGG, Kyoto Encyclopedia of Genes and Genomes (http://www.genome.jp/kegg/); Biocarta (http://www.biocarta.com/); BBID, Biological Biochemical Image Database (http://bbid.grc.nia.nih.gov/); String, Known and Predicted Protein-Protein Interactions (http://string-db.org/); InterPro, InterPro protein sequence analysis & classification (http://www.ebi.ac.uk/interpro/); SMART, Simple Modular Architecture Research Tool (http://smart.embl-heidelberg.de/); HMP, Human Metabolome Database (http://www.hmdb.ca/); MetaCyc, Metabolic Encyclopedia of enzymes and metabolic pathways (http://www.metacyc.org/); ENZYME, Enzyme nomenclature database (http://enzyme.expasy.org/).
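As an illustration of the network modelling described in this caption, the following is a minimal sketch of a co-expression network built within the transcriptional layer. It assumes that two genes are linked when the absolute Pearson correlation of their expression profiles exceeds a threshold; the gene names, expression values, and the threshold of 0.8 are hypothetical and are not taken from the article.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coexpression_network(expr, threshold=0.8):
    """Return the set of gene pairs whose |correlation| meets the threshold.

    The threshold is an illustrative assumption, not a recommended value.
    """
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


def degree(edges):
    """Count the number of network neighbours of each gene."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


# Hypothetical expression of four genes measured in five samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],
}

edges = coexpression_network(expr)
deg = degree(edges)
for gene in sorted(deg, key=deg.get, reverse=True):
    print(gene, deg[gene])
```

In a real data set, highly connected genes would be candidate 'key players', and the resulting modules would then be annotated against resources such as GO or KEGG.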
Figure 2
An integrative genomic approach to identify cardiovascular disease genes. Genotypes and phenotypes are measured in a population of related individuals, and each genomic marker position is assessed for linkage with the phenotypes. In this case, left ventricular mass (LVM) is studied in a rodent population. The allelic effect is shown for two genomic markers, marker 1 (m1) on chromosome 3 (chr3) and m2 on chr12. A linkage plot is shown for the first 12 chromosomes, showing linkage of LVM to a locus on chr3, at the position of marker m1. The y-axis is the LOD score (base-10 logarithm of the odds), and the dotted line represents genome-wide statistical significance. (A) Microarrays are used to obtain a genome-wide transcript expression profile for RNA expression in the left ventricle in the same population, and the expression of each transcript is mapped as a quantitative trait. The expression of transcript 1 (t1) maps to chr3, where this transcript is encoded: it is hence termed a cis-eQTL. t2, encoded at the same genomic locus, does not appear to be genetically regulated. Although both genes lie within the original LVM locus, t1 is prioritized as the best candidate after eQTL analysis. Expression of t3, also encoded on chr3, maps to chr4: it is a trans-eQTL. (B) Quantitative trait transcript analysis involves the direct correlation of phenotype and expression data. After correction for multiple testing, a single transcript emerges as most highly correlated with LVM. If this is t1, this adds further weight to its candidacy.
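The linkage and eQTL steps in this caption can be sketched in a few lines. The example below assumes simple single-marker regression with normally distributed residuals, for which the LOD score is (n/2) x log10(RSS0/RSS1), where RSS0 is the residual sum of squares ignoring genotype and RSS1 is the residual sum of squares within genotype groups. This may differ from the mapping method used in the studies reviewed. The LVM values, genotypes, genomic positions, and the 10 Mb cis window are all hypothetical.

```python
from math import log10


def lod_score(genotypes, phenotypes):
    """LOD score for one marker by single-marker regression (an assumption)."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    # Null model: a single mean for all animals.
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)
    # Alternative model: one mean per genotype group.
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    rss1 = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        rss1 += sum((y - group_mean) ** 2 for y in ys)
    if rss1 == 0:
        return float("inf")  # genotype explains the phenotype perfectly
    return (n / 2) * log10(rss0 / rss1)


def classify_eqtl(gene_chr, gene_pos, peak_chr, peak_pos, window=10_000_000):
    """Label an eQTL cis if the linkage peak lies near the encoding gene.

    The window size is a hypothetical choice for illustration.
    """
    if gene_chr == peak_chr and abs(gene_pos - peak_pos) <= window:
        return "cis"
    return "trans"


# Hypothetical LVM measurements (g) and genotypes at two markers.
lvm = [2.1, 2.3, 2.0, 2.2, 3.0, 3.2, 2.9, 3.1]
m1 = ["A", "A", "A", "A", "B", "B", "B", "B"]  # marker on chr3
m2 = ["A", "B", "A", "B", "A", "B", "A", "B"]  # marker on chr12

lod_m1 = lod_score(m1, lvm)
lod_m2 = lod_score(m2, lvm)
print(f"LOD at m1: {lod_m1:.2f}, LOD at m2: {lod_m2:.2f}")

# t1 is encoded on chr3 and maps to chr3; t3 is encoded on chr3 and maps to chr4.
t1_class = classify_eqtl("chr3", 50_000_000, "chr3", 52_000_000)
t3_class = classify_eqtl("chr3", 60_000_000, "chr4", 20_000_000)
print(t1_class, t3_class)
```

In this toy data set the LOD score at m1 is far higher than at m2, mirroring the linkage of LVM to chr3. A genome-wide significance threshold, such as the dotted line in the figure, would normally be estimated separately, for example by permutation.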
Figure 3
Different approaches for gene discovery in humans using next-generation sequencing (NGS). (A) One of the simplest applications of NGS is deep sequencing of genes at a known disease locus to identify functional variants that may be responsible for the observed effect. Here, a C/T substitution generates a novel stop codon, which truncates a gene product that may be functionally important. (B) Mendelian disease within a single family is typically genetically homogeneous: in the absence of phenocopies, affected individuals will all share the same causative variant. Whole-genome or whole-exome sequencing can be used to identify functional variants segregating with disease in a family. Many variants will be shared through simple relatedness, so large families are needed for this approach. It also makes assumptions about which classes of variants are likely to underlie Mendelian diseases, typically truncating variants and very rare non-synonymous SNPs. If the causative variant is synonymous or non-coding then it is unlikely to be detected by targeted NGS approaches. (C) An alternative methodology is to sequence unrelated individuals with the same phenotype or endophenotype. Here, we do not expect that affected individuals will carry the same variant, but we hypothesize that they may carry distinct variants in the same gene. This is a powerful approach for genetically homogeneous conditions, but inherited cardiac conditions are typically heterogeneous (variation in many different genes yields the same phenotype), limiting the applicability of this approach. (D) Strategy C can be extended to continuous traits. If rare variants of moderately large effect contribute to the phenotype then we may be able to detect these by focusing sequencing efforts on the extremes of a very large population, seeking genes that are enriched for rare functional variants using burden testing. (E) Where disease is caused by de novo variation, this may be detected by sequencing trios (proband and both parents). This approach may also be applied to recessive phenotypes. (F) RNA sequencing not only measures total transcript abundance, but can also quantify different isoforms, detect novel transcripts and novel splicing events, and identify sequence variants. Here, isoform 1 is the predominant transcript, but RNAseq provides evidence for two other isoforms. This can be used in integrative genomic studies.
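The burden testing mentioned in panel D can be illustrated with a minimal sketch. It assumes the simplest form of burden test: for one gene, the number of individuals carrying any rare functional variant is compared between the two phenotypic extremes using a one-sided Fisher's exact test. The caption does not specify a particular test, so this choice and the carrier counts below are hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in the first group.

    2x2 table: a = carriers in the upper extreme, b = non-carriers in the
    upper extreme, c = carriers in the lower extreme, d = non-carriers in
    the lower extreme. Returns P(carriers in upper extreme >= a) under the
    hypergeometric null of no association.
    """
    n_upper, n_lower = a + b, c + d
    carriers = a + c
    total = n_upper + n_lower
    numerator = 0
    for x in range(a, min(n_upper, carriers) + 1):
        # comb returns 0 when carriers - x exceeds n_lower.
        numerator += comb(n_upper, x) * comb(n_lower, carriers - x)
    return numerator / comb(total, carriers)


# Hypothetical counts for one gene: 100 individuals sequenced per extreme.
p_enriched = fisher_one_sided(12, 88, 2, 98)  # 12 vs 2 carriers
p_balanced = fisher_one_sided(5, 95, 5, 95)   # 5 vs 5 carriers
print(f"enriched gene: p = {p_enriched:.4f}")
print(f"balanced gene: p = {p_balanced:.4f}")
```

In a genome-wide setting, one such test would be run per gene and the resulting p-values corrected for the number of genes tested.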

Source: PubMed
