The human disease network

Kwang-Il Goh, Michael E Cusick, David Valle, Barton Childs, Marc Vidal, Albert-László Barabási

Abstract

A network of disorders and disease genes linked by known disorder-gene associations offers a platform to explore in a single graph-theoretic framework all known phenotype and disease gene associations, indicating the common genetic origin of many diseases. Genes associated with similar disorders show both higher likelihood of physical interactions between their products and higher expression profiling similarity for their transcripts, supporting the existence of distinct disease-specific functional modules. We find that essential human genes are likely to encode hub proteins and are expressed widely in most tissues. This suggests that disease genes also would play a central role in the human interactome. In contrast, we find that the vast majority of disease genes are nonessential and show no tendency to encode hub proteins, and their expression pattern indicates that they are localized in the functional periphery of the network. A selection-based model explains the observed difference between essential and disease genes and also suggests that diseases caused by somatic mutations should not be peripheral, a prediction we confirm for cancer genes.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Construction of the diseasome bipartite network. (Center) A small subset of OMIM-based disorder–disease gene associations (18), where circles and rectangles correspond to disorders and disease genes, respectively. A link is placed between a disorder and a disease gene if mutations in that gene lead to the specific disorder. The size of a circle is proportional to the number of genes participating in the corresponding disorder, and the color corresponds to the disorder class to which the disease belongs. (Left) The HDN projection of the diseasome bipartite graph, in which two disorders are connected if there is a gene that is implicated in both. The width of a link is proportional to the number of genes that are implicated in both diseases. For example, three genes are implicated in both breast cancer and prostate cancer, resulting in a link of weight three between them. (Right) The DGN projection where two genes are connected if they are involved in the same disorder. The width of a link is proportional to the number of diseases with which the two genes are commonly associated. A full diseasome bipartite map is provided as SI Fig. 13.
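The two projections described in this caption can be illustrated with a minimal sketch. The associations below are hypothetical toy data, not OMIM entries, and the function names `project_disorders` and `project_genes` are illustrative only; this is not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy bipartite data (disorder -> set of associated genes).
# These names are placeholders and are NOT taken from OMIM.
associations = {
    "disorder_A": {"G1", "G2", "G3"},
    "disorder_B": {"G2", "G3", "G4"},
    "disorder_C": {"G4"},
}


def project_disorders(assoc):
    """HDN-style projection: two disorders are linked if they share a gene.

    The link weight is the number of shared genes.
    """
    weights = {}
    for d1, d2 in combinations(sorted(assoc), 2):
        shared = len(assoc[d1] & assoc[d2])
        if shared:
            weights[(d1, d2)] = shared
    return weights


def project_genes(assoc):
    """DGN-style projection: two genes are linked if they share a disorder.

    The link weight is the number of disorders in which both genes appear.
    """
    weights = defaultdict(int)
    for genes in assoc.values():
        for g1, g2 in combinations(sorted(genes), 2):
            weights[(g1, g2)] += 1
    return dict(weights)


hdn = project_disorders(associations)
dgn = project_genes(associations)
```

In this toy example, `disorder_A` and `disorder_B` share two genes, so their link in the disorder projection has weight two, in the same way that the caption's breast cancer and prostate cancer link has weight three.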
Fig. 2.
The HDN and the DGN. (a) In the HDN, each node corresponds to a distinct disorder, colored based on the disorder class to which it belongs, the name of the 22 disorder classes being shown on the right. A link between disorders in the same disorder class is colored with the corresponding dimmer color and links connecting different disorder classes are gray. The size of each node is proportional to the number of genes participating in the corresponding disorder (see key), and the link thickness is proportional to the number of genes shared by the disorders it connects. We indicate the name of disorders with >10 associated genes, as well as those mentioned in the text. For a complete set of names, see SI Fig. 13. (b) In the DGN, each node is a gene, with two genes being connected if they are implicated in the same disorder. The size of each node is proportional to the number of disorders in which the gene is implicated (see key). Nodes are light gray if the corresponding genes are associated with more than one disorder class. Genes associated with more than five disorders, and those mentioned in the text, are indicated with the gene symbol. Only nodes with at least one link are shown.
Fig. 3.
Characterizing the disease modules. (a) Number of observed physical interactions between the products of genes within the same disorder (red arrow) and the distribution of the expected number of interactions for the random control (blue) (P < 10^-6). (b) Distribution of the tissue-homogeneity of a disorder (red). A random control (blue) with the same number of randomly chosen genes is shown for comparison. (c) Distribution of the Pearson correlation coefficient (PCC) values ρ_ij of the expression profiles of each disease gene pair that belongs to the same disorder (red) and the control (blue), representing the PCC distribution between all gene pairs (P < 10^-6). (d) The distribution of the average PCC between expression profiles of all genes associated with the same disorder (red) is also shifted toward higher values than the random control (blue) with the same number of randomly chosen genes (P < 10^-6).
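The comparison in panel (d), an average pairwise correlation for one disorder set against randomly chosen gene sets of the same size, can be sketched as follows. The expression profiles are made-up toy vectors, and the sampling scheme, seed, and function names are assumptions for illustration; the source does not specify the authors' exact procedure.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def mean_pairwise_pcc(genes, profiles):
    """Average PCC over all pairs of genes in the given set (needs >= 2 genes)."""
    pairs = list(combinations(genes, 2))
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def random_control(size, profiles, n_samples, seed=0):
    """Average PCC for n_samples random gene sets of the same size (assumed control)."""
    rng = random.Random(seed)
    all_genes = sorted(profiles)
    return [
        mean_pairwise_pcc(rng.sample(all_genes, size), profiles)
        for _ in range(n_samples)
    ]


# Hypothetical expression profiles (gene -> expression across four tissues).
profiles = {
    "G1": [1, 2, 3, 4],
    "G2": [2, 4, 6, 8],
    "G3": [4, 3, 2, 1],
    "G4": [1, 3, 2, 4],
    "G5": [2, 1, 4, 3],
}

observed = mean_pairwise_pcc(["G1", "G2"], profiles)
control = random_control(2, profiles, n_samples=100)
# Empirical fraction of random sets at least as coherent as the observed set.
empirical_p = sum(c >= observed for c in control) / len(control)
```

With real data, a small `empirical_p` would correspond to the shift toward higher values that the caption reports; the toy numbers here only demonstrate the mechanics.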
Fig. 4.
Functional characteristics of disease and essential genes. (a) The fraction of disease genes among those whose protein products interact with k other proteins. (b) Venn diagram showing the relationship between the human genes studied in this work. (c) The fraction of genes with lethal mouse phenotypes (essential genes) among those genes with mouse phenotypes whose protein products interact with k other proteins. (d) The same as in a, but only for nonessential disease genes, i.e., excluding 398 proteins with lethal mouse phenotypes. (e and f) The fraction of essential genes (e) and nonessential disease genes (f) among those whose average PCC with other genes is 〈ρ〉. (g and h) The fraction of essential genes (g) and nonessential disease genes (h) among those whose transcript is expressed in n_T tissues. Gray horizontal lines in a and c–h indicate the global average. Error bars represent standard errors. Note that for some data points the error bars are smaller than the symbol size and thus are not visible. In a, c, and d, gray symbols are the linearly binned data points, whereas colored symbols correspond to the statistically more uniform log-binned data. For details of the significance analysis, see SI Text.
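The log-binned fractions in panels a, c, and d can be illustrated with a minimal sketch. The base-2 bin edges, the toy degrees, and the function name `fraction_by_log_bin` are assumptions; the caption does not state the binning scheme actually used.

```python
from collections import defaultdict


def fraction_by_log_bin(degree, flagged):
    """Fraction of flagged genes per base-2 logarithmic degree bin.

    Genes with degree k >= 1 fall into the bin [2**b, 2**(b + 1)),
    where b = floor(log2(k)). Genes with degree 0 are skipped.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for gene, k in degree.items():
        if k < 1:
            continue
        b = k.bit_length() - 1  # exact floor(log2(k)) for positive integers
        totals[b] += 1
        if gene in flagged:
            hits[b] += 1
    return {(2 ** b, 2 ** (b + 1)): hits[b] / totals[b] for b in sorted(totals)}


# Hypothetical degrees (number of interaction partners k) and disease-gene flags.
degree = {"G1": 1, "G2": 2, "G3": 3, "G4": 4, "G5": 9, "G6": 12}
disease_genes = {"G2", "G5", "G6"}

fractions = fraction_by_log_bin(degree, disease_genes)
```

Wider bins at high k pool more genes per point, which is why log-binned points tend to be statistically more uniform than linearly binned ones when few genes have many partners.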

Source: PubMed
