Use of machine learning to classify high-risk variants of uncertain significance in lamin A/C cardiac disease

Jeffrey S Bennett, David M Gordon, Uddalak Majumdar, Patrick J Lawrence, Adrianna Matos-Nieves, Katherine Myers, Anna N Kamp, Julie C Leonard, Kim L McBride, Peter White, Vidu Garg

Abstract

Background: Variation in lamin A/C results in a spectrum of clinical disease, including arrhythmias and cardiomyopathy. Benign variation is rare, and classification of LMNA missense variants via in silico prediction tools results in a high rate of variants of uncertain significance (VUSs).

Objective: The goal of this study was to use a machine learning (ML) approach for in silico prediction of LMNA pathogenic variation.

Methods: Genetic sequencing was performed on family members with conduction system disease, and patient cell lines were examined for LMNA expression. In silico predictions of conservation and pathogenicity of published LMNA variants were visualized with uniform manifold approximation and projection. K-means clustering was used to identify variant groups with similarly projected scores, allowing the generation of statistically supported risk categories.
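
As a rough illustration of the clustering step described in Methods, the sketch below runs k-means (Lloyd's algorithm) on two-dimensional points that stand in for UMAP-projected in silico scores. The UMAP projection itself is not shown. The synthetic points, the farthest-first initialization, and the function names `farthest_first` and `kmeans` are assumptions made for illustration, not the study's implementation.

```python
import math
import random


def farthest_first(points, k):
    """Deterministic initialization (an assumption, not from the paper):
    start at the first point, then repeatedly add the point farthest
    from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and
    centroid recomputation until the labels stop changing."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Synthetic stand-in for projected variant scores: three separated blobs
# of 20 points each (hypothetical data, not LMNA variants).
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in centers
    for _ in range(20)
]

labels, centroids = kmeans(points, k=3)
cluster_sizes = [labels.count(c) for c in range(3)]
print(cluster_sizes)
```

In a real analysis, each point would carry a known classification (pathogenic, VUS, and so on), and the fraction of known pathogenic variants would be tallied per cluster to define risk categories.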

Results: We discovered a novel LMNA variant (c.408C>A:p.Asp136Glu) segregating with conduction system disease in a multigeneration pedigree, which was reported as a VUS by a commercial testing company. Additional familial analysis and in vitro testing found it to be pathogenic, which prompted the development of an ML algorithm that used in silico predictions of pathogenicity for known LMNA missense variants. This identified 3 clusters of variation, each with a significantly different incidence of known pathogenic variants (38.8%, 15.0%, and 6.1%). Three hundred thirty-nine of 415 head/rod domain variants (81.7%), including p.Asp136Glu, were in clusters with highest proportions of pathogenic variants.
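
To make the "significantly different incidence" comparison concrete, the sketch below computes a Pearson chi-square test of independence for a clusters-by-classification table. The counts are hypothetical placeholders, because the abstract reports only percentages. It is also an assumption that the reported test used a 3 x 2 table (3 clusters by pathogenic or not), which gives 2 degrees of freedom. For 2 degrees of freedom the p-value has the closed form exp(-χ²/2), so the statistic in the Figure 4 caption (chi-square 65.1) can be checked against its reported P value.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_two_dof(stat):
    """Upper-tail p-value of a chi-square statistic; valid only when
    the test has exactly 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


# HYPOTHETICAL counts (pathogenic, not pathogenic) per cluster.
# These are placeholders, not the study's data.
hypothetical_counts = [
    [30, 70],  # cluster 1
    [15, 85],  # cluster 2
    [5, 95],   # cluster 3
]
stat, dof = chi_square_independence(hypothetical_counts)
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p_value_two_dof(stat):.2e}")

# Consistency check on the statistic quoted in the Figure 4 caption.
reported_p = p_value_two_dof(65.1)
print(f"p for chi-square 65.1 with 2 dof: {reported_p:.1e}")
```

With the rounded statistic of 65.1, the closed form gives a value on the order of 7e-15, in line with the P=7.5e-15 in the caption. The small difference is what rounding of the statistic would produce.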

Conclusion: An unsupervised ML method successfully identified clusters enriched for pathogenic LMNA variants, including a novel variant associated with conduction system disease. Our ML method may assist in identifying high-risk VUSs when familial testing is unavailable.

Keywords: Atrial fibrillation; Atrioventricular block; Genetics; Lamin A/C; Machine learning.

Conflict of interest statement

No conflicts to disclose.

Copyright © 2021 Heart Rhythm Society. Published by Elsevier Inc. All rights reserved.

Figures

Figure 1. Multigenerational pedigree of atrial fibrillation (AF) and atrioventricular block (AVB).
(A) Four individuals (+) were positive for the heterozygous LMNA Asp136Glu variant, each with AF and AVB. (B) Sanger sequencing demonstrates the variant in affected individuals (+) and its absence (−) in unaffected individuals. Chromatograms are in the 3’ to 5’ direction.
Figure 2. Lamin A/C protein expression in immortalized lymphoblastoid lines.
(A) Lamin A/C is localized normally to the nucleus by immunofluorescence, but (B) overall expression level is significantly decreased in affected subjects (III-3 and II-2) when compared to unaffected (III-1). *, Kruskal-Wallis test, P<.0001. (C) Western blotting shows decreased Lamin A/C protein in cell lysates from III-3 and II-2 as compared to III-1. GAPDH, loading control. (D) Quantification of Lamin A/C protein expression from (C) is shown. *, Kruskal-Wallis test, P=.007.
Figure 3. Both pathogenic variants and variants of uncertain significance are distributed throughout the LMNA protein.
Pathogenic variants are more densely spaced in the head/rod region, and no large region lacks known variation.
Figure 4. Unsupervised machine learning identifies LMNA variant clusters enriched for pathogenic variants.
(A) UMAP projection and k-means clustering identify enrichment of pathogenic variants (red circles) in clusters 1 (pink) and 2 (yellow) versus cluster 3 (blue). (B) Clusters 1, 2, and 3 contain significantly different fractions of known LMNA pathogenic variants (chi-square 65.1, P=7.5e-15).
Figure 5. Distribution of variants in head/rod versus other domains is similar, but pathogenic head/rod variants are enriched in cluster 2.
(A) Distribution of variants from the head/rod and other domains is similar, with the greatest fraction of variants in cluster 2. (B) Increase in pathogenic variants in cluster 2 derives from a significant enrichment of pathogenic head/rod variants. *, P=.007.
Figure 6. Variants plotted by cluster across the LMNA transcript show areas of varying density.
(A) Cluster 1. (B) Cluster 2. (C) Cluster 3.

Source: PubMed
