AL-Base: a visual platform analysis tool for the study of amyloidogenic immunoglobulin light chain sequences

Kip Bodi, Tatiana Prokaeva, Brian Spencer, Maurya Eberhard, Lawreen H Connors, David C Seldin

Abstract

AL-Base, a curated database of human immunoglobulin (Ig) light chain (LC) sequences derived from patients with AL amyloidosis and controls, is described, along with a collection of analytical and graphic tools designed to facilitate analysis of these sequences. AL-Base is designed to compile and analyse amyloidogenic Ig LC sequences and to compare their predicted protein sequence and structure to those of non-amyloidogenic LC sequences. Currently, the database contains over 3000 de-identified LC nucleotide and amino acid sequences, of which 433 encode monoclonal proteins that were reported to form fibrillar deposits in AL patients. Each sequence is categorised according to germline gene usage, clinical status and sample source. Tools are available to search for sequences by various criteria, to analyse the biochemical properties of the predicted amino acids at each position and to display the results graphically. The likelihood that each sequence has evolved through somatic hypermutation can be predicted using an automated binomial or multinomial distribution model. AL-Base is available to the scientific community for research purposes.

Figures

Figure 1
An example of an alignment of selected amyloidogenic VL LC to closest germline progenitor. Each LC in the database can be aligned to its most likely germline gene progenitor. A substitution list is generated and the comparison can be presented visually using a coloured alignment. FRs and CDRs are distinguished by light grey and dark grey header lines, respectively.
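
The substitution list described in this caption can be illustrated with a minimal sketch. It assumes the LC and its germline progenitor are already aligned to equal length, and it numbers positions by simple 1-based column index rather than by IMGT numbering. The function name and the toy sequences are hypothetical and are not taken from AL-Base.

```python
def substitution_list(germline: str, query: str) -> list[str]:
    """List substitutions (e.g. 'V3A') between two pre-aligned sequences.

    Assumption: both strings have the same length and use '-' for gaps.
    Positions are 1-based alignment columns, not IMGT positions.
    """
    if len(germline) != len(query):
        raise ValueError("sequences must be pre-aligned to the same length")
    subs = []
    for pos, (g, q) in enumerate(zip(germline, query), start=1):
        # Report only true residue changes; skip matches and gap columns.
        if g != q and g != "-" and q != "-":
            subs.append(f"{g}{pos}{q}")
    return subs


# Toy sequences for illustration only (not database entries).
print(substitution_list("QSVLTQPPS", "QSALTQPPT"))  # ['V3A', 'S9T']
```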
Figure 2
An example of a multiple alignment of selected VL LC protein sequences. Once a set of LCs has been selected from a search, a multiple alignment can be performed. Either nucleotide or protein alignments can be shown. Alignments can be coloured according to the residue class or the property; in this case, the 11 IMGT amino acid chemical characteristics classes are shown [22]. Different regions of the LC as defined by the IMGT can be individually aligned. Alignments can be downloaded in ClustalW format. Each sequence is identified by its clinical type, accession number, LC family subtype and germline gene. Sequences in a multiple alignment can be individually selected for more information. A report can be generated for each column, giving the counts of residues at that position.
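
The per-column report mentioned at the end of this caption amounts to counting the residues found at one alignment position. The sketch below shows one way this could be done, assuming the alignment is held as a list of equal-length strings. The function name and toy alignment are hypothetical and do not reflect how AL-Base stores or processes its data.

```python
from collections import Counter


def column_counts(alignment: list[str], column: int) -> Counter:
    """Count residues in one 1-based column of a multiple alignment.

    Assumption: every row of the alignment has the same length.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    return Counter(row[column - 1] for row in alignment)


# Toy alignment for illustration only (not database entries).
toy = ["DIQMT", "DIVMT", "EIVLT"]
print(column_counts(toy, 3).most_common())  # [('V', 2), ('Q', 1)]
```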
Figure 3
Antigen selection algorithm results for LC EF589566. Top section shows nucleotide statistics for germline gene progenitor IGLV1-51*02. R, replacement sites; S, silent sites; RF, replacement frequency; LR, length of region compared with entire VL region. Bottom section shows results of antigen selection algorithm. R, observed replacement mutations; S, observed silent mutations; REXP, expected number of replacement mutations; PB, p-value as determined by binomial distribution model [9]; PM, p-value as determined by multinomial distribution model [10]. This LC shows evidence of antigen selection in both the FR and CDR regions.
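
The caption does not give the formulas behind REXP and PB, so the following is only a sketch under stated assumptions. It assumes that the chance of a single random mutation being a replacement mutation within a region is RF × LR, that REXP is the total number of observed mutations multiplied by that chance, and that PB is the upper-tail binomial probability of seeing at least the observed number of replacement mutations. Whether AL-Base reports a tail or a point probability is not stated here. The function names and input numbers are hypothetical.

```python
from math import comb


def expected_replacements(n_mutations: int, rf: float, lr: float) -> float:
    """Expected replacement mutations in a region (assumed: n * RF * LR)."""
    return n_mutations * rf * lr


def binomial_tail_p(n_mutations: int, r_observed: int, rf: float, lr: float) -> float:
    """P(X >= r_observed) for X ~ Binomial(n_mutations, RF * LR).

    Assumption: each mutation independently falls as a replacement
    mutation in the region with probability q = RF * LR.
    """
    q = rf * lr
    return sum(
        comb(n_mutations, k) * q**k * (1 - q) ** (n_mutations - k)
        for k in range(r_observed, n_mutations + 1)
    )


# Hypothetical inputs for illustration only (not values from Figure 3).
n, r, rf, lr = 16, 7, 0.75, 0.25
print(expected_replacements(n, rf, lr))  # 3.0
print(binomial_tail_p(n, r, rf, lr))
```

A small p-value under these assumptions would mean that more replacement mutations were observed in the region than random mutation alone would be expected to produce.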

Source: PubMed
