The RIN: an RNA integrity number for assigning integrity values to RNA measurements

Andreas Schroeder, Odilo Mueller, Susanne Stocker, Ruediger Salowsky, Michael Leiber, Marcus Gassmann, Samar Lightfoot, Wolfram Menzel, Martin Granzow, Thomas Ragg

Abstract

Background: The integrity of RNA molecules is of paramount importance for experiments that try to capture a snapshot of gene expression at the moment of RNA extraction. Until recently, there was no reliable standard for estimating the integrity of RNA samples, and the 28S:18S ribosomal RNA ratio, the common measure for this purpose, has been shown to be inconsistent. The advent of microcapillary electrophoretic RNA separation provides the basis for an automated, high-throughput approach to estimating the integrity of RNA samples unambiguously.

Methods: A method is introduced that automatically selects features from signal measurements and constructs regression models based on a Bayesian learning technique. Feature spaces of different dimensionality are compared in the Bayesian framework, which allows selecting a final feature combination corresponding to models with high posterior probability.
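The abstract names the approach without showing how feature spaces are compared. As a hedged illustration, the sketch below swaps the paper's Bayesian neural networks for Bayesian linear regression, whose log evidence has a closed form: ln p(t) = (M/2) ln alpha + (N/2) ln beta - E(m) - (1/2) ln|A| - (N/2) ln(2 pi), with A = alpha I + beta X^T X and posterior mean m = beta A^-1 X^T t. The hyperparameters alpha (prior precision) and beta (noise precision) are fixed by assumption rather than re-estimated, and all function names are hypothetical.

```python
import math
from itertools import combinations

def cholesky(a):
    """Lower-triangular L with a = L L^T for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def log_evidence(X, t, alpha=1.0, beta=25.0):
    """Log marginal likelihood of Bayesian linear regression (alpha, beta fixed by assumption)."""
    n, m = len(X), len(X[0])
    # A = alpha * I + beta * X^T X
    A = [[alpha * (i == j) + beta * sum(row[i] * row[j] for row in X)
          for j in range(m)] for i in range(m)]
    xt_t = [sum(row[i] * ti for row, ti in zip(X, t)) for i in range(m)]
    L = cholesky(A)
    mean_w = [beta * v for v in chol_solve(L, xt_t)]  # posterior mean of the weights
    resid = sum((ti - sum(w * x for w, x in zip(mean_w, row))) ** 2
                for row, ti in zip(X, t))
    E = 0.5 * beta * resid + 0.5 * alpha * sum(w * w for w in mean_w)
    log_det_A = 2.0 * sum(math.log(L[i][i]) for i in range(m))
    return (0.5 * m * math.log(alpha) + 0.5 * n * math.log(beta)
            - E - 0.5 * log_det_A - 0.5 * n * math.log(2 * math.pi))

def best_subset(features, t, max_size=3, **hyper):
    """Score every feature subset up to max_size by log evidence; return (score, subset)."""
    names = list(features)
    scored = []
    for k in range(1, max_size + 1):
        for subset in combinations(names, k):
            X = [[features[f][i] for f in subset] for i in range(len(t))]
            scored.append((log_evidence(X, t, **hyper), subset))
    return max(scored)
```

With a target that depends only on x1, adding an irrelevant x2 lowers the evidence through the extra ln|A| term (the Occam factor), so the smaller subset wins even though its fit is no worse.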

Results: This approach is applied to a large collection of electrophoretic RNA measurements recorded with an Agilent 2100 bioanalyzer to extract an algorithm that describes RNA integrity. The resulting algorithm is a user-independent, automated and reliable procedure for standardization of RNA quality control that allows the calculation of an RNA integrity number (RIN).

Conclusion: Our results show the importance of taking characteristics of several regions of the recorded electropherogram into account in order to obtain a robust and reliable prediction of RNA integrity, especially compared with traditional methods.

Figures

Figure 1
Application environment. (1) Role of RNA in gene expression and protein production, (2) extracted RNA molecules and measurement of RNA sizes with Agilent's 2100 bioanalyzer, and (3) assignment of integrity categories to RNA samples. Each sample contains RNA molecules of different sizes, and the 2100 bioanalyzer measures this size distribution. The integrity assessment is based on the size distribution of each sample.
Figure 2
RNA integrity categories. The figure shows typical representatives of the ten integrity categories. RIN values range from 10 (intact) to 1 (totally degraded). The gradual degradation of rRNA is reflected by a continuous shift towards shorter fragment sizes.
Figure 3
Evidence-based model selection. Model evidence (on a logarithmic scale) as a function of the number of input features and the degree of non-linearity in the hidden layer. Values are averages over a 10-fold cross-validation procedure. The highest evidence is reached for models with 5 to 7 input features and 2 to 5 hidden neurons. All these models have a low generalization error below 0.25.
Figure 4
Generalization errors. Generalization error of the model as a function of the number of input features and the degree of non-linearity in the hidden layer. Values are averages over a 10-fold cross-validation procedure. The models with the highest evidence (5 to 7 input features and 2 to 5 hidden neurons) have a low generalization error below 0.25.
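The captions report errors averaged over a 10-fold cross-validation procedure. A minimal sketch of that averaging follows, assuming mean squared error as the error measure (the caption does not define it) and using a least-squares line as a hypothetical stand-in for the paper's neural networks; the fold assignment is a seeded shuffle, also an assumption.

```python
import random
from statistics import mean

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_mse(xs, ys, fit, k=10):
    """Average held-out mean squared error over k folds (requires len(xs) >= k).

    fit(train_xs, train_ys) must return a predict(x) callable.
    """
    errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_xs = [x for i, x in enumerate(xs) if i not in held_out]
        train_ys = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit(train_xs, train_ys)
        # Error is measured only on the fold the model never saw.
        errors.append(mean((predict(xs[i]) - ys[i]) ** 2 for i in test_idx))
    return mean(errors)

def fit_line(train_xs, train_ys):
    """Least-squares line y = a + b*x; a hypothetical stand-in model."""
    mx, my = mean(train_xs), mean(train_ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(train_xs, train_ys))
         / sum((x - mx) ** 2 for x in train_xs))
    a = my - b * mx
    return lambda x: a + b * x
```

Any model that can be wrapped as fit(train_xs, train_ys) returning a predictor plugs into the same loop, which is what makes the per-configuration errors in the figure comparable.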
Figure 5
Receiver operating characteristics of categorical misclassifications. The figure shows the receiver operating characteristic (ROC) curve for distinguishing electropherograms of category 10 from the union of all other categories. The area under the curve (AUC) measures classification performance: random assignment yields an area of 0.5, whereas perfect assignment yields an area of 1.0. Only a few experiments are misassigned across more than one category boundary.
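The AUC quoted in the caption equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (the Mann-Whitney statistic). A minimal sketch of the one-versus-rest computation, assuming the predicted RIN serves as the score; the function names are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """ROC area as P(positive score > negative score); ties count one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def one_vs_rest_auc(scores, labels, target=10):
    """AUC for separating category `target` from the union of all other categories."""
    pos = [s for s, c in zip(scores, labels) if c == target]
    neg = [s for s, c in zip(scores, labels) if c != target]
    return auc(pos, neg)
```

The pairwise loop is quadratic in the number of samples; sorting the pooled scores gives the same value in O(n log n) for large collections.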
Figure 6
Correlation between RNA integrity and real-time PCR experiments. The figure shows the correlation between RNA integrity values and the outcome of a real-time PCR experiment, i.e. the average expression values of 4 housekeeping genes (GAPDH, KYNF, NEFL, β2M). The vertical line marks a meaningful threshold for RIN classification, while the horizontal line separates acceptable from unacceptable real-time PCR results. a) The RIN shows a strong correlation (0.52) with the expression values of the housekeeping genes; a straightforward separation into positives and negatives is possible. b) The ribosomal ratio shows a poor correlation (0.24) with the expression values of the housekeeping genes.
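The caption does not state which correlation coefficient was used. Assuming the Pearson coefficient, the comparison of panels a) and b) amounts to computing it once against RIN values and once against ribosomal ratios for the same samples; a minimal sketch:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Usage: pearson(rin_values, expression) versus pearson(ribosomal_ratios, expression), where all three lists are hypothetical per-sample measurements in the same order.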
Figure 7
2D visualization of integrity categories. The figure shows a projection of the categories onto the two-dimensional space spanned by the first two features of the selected combination: the total RNA ratio and the 28S peak height. The experiments are clearly grouped along a curve running from the bottom-left corner up to the top and then to the top-right corner. The spread of the experiments increases with larger category values. Categories 1 and 2 have almost no variance in this feature space. The grey border of the domain is given by the abnormality detectors for these two variables, i.e., no RIN is computed for a data point outside the white area.
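The caption says no RIN is computed outside the region accepted by the abnormality detectors, but does not say how the detectors work. The sketch below assumes one Gaussian kernel density estimate per feature, bandwidth from Silverman's normal-reference rule (1.06 * std * n^(-1/5)), and a hypothetical minimum-density threshold; a point is accepted only if every feature's density clears that threshold. The `model` callable is likewise a placeholder.

```python
import math

def silverman_bandwidth(samples):
    """Normal-reference rule of thumb: 1.06 * std * n^(-1/5) (needs n >= 2)."""
    n = len(samples)
    mu = sum(samples) / n
    std = math.sqrt(sum((s - mu) ** 2 for s in samples) / (n - 1))
    return 1.06 * std * n ** (-0.2)

def gaussian_kde_1d(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built from samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return density

class AbnormalityDetector:
    """Rejects feature vectors lying where the training data are too sparse."""
    def __init__(self, training_rows, min_density):
        columns = list(zip(*training_rows))  # one column per feature
        self.densities = [gaussian_kde_1d(c, silverman_bandwidth(c)) for c in columns]
        self.min_density = min_density

    def accepts(self, row):
        return all(d(x) >= self.min_density for d, x in zip(self.densities, row))

def rin_or_none(row, detector, model):
    """Return model(row) only inside the accepted domain, else None (no RIN)."""
    return model(row) if detector.accepts(row) else None
```

Refusing to predict outside the training domain trades coverage for safety: a regression model can return a plausible-looking number for an electropherogram unlike anything it was trained on.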
Figure 8
Feature extraction. Segments of an electropherogram: the segment preceding the lower marker is designated the pre-region. The marker-region coincides with the area occupied by the lower-marker peak. The 5S-region covers the small rRNA fragments (5S and 5.8S rRNA, and tRNA). The 18S-region and 28S-region cover the 18S peak and 28S peak, respectively. The fast-region lies between the 5S-region and the 18S-region. The inter-region lies between the 18S-region and the 28S-region. The precursor-region covers the precursor RNA following the 28S-region. Finally, the post-region lies beyond the precursor-region.
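Figure 8 names the segments but not how features are computed from them. A minimal sketch, assuming region boundaries are supplied as (start, end) windows in migration time that share endpoints with sampled time points; the "total RNA ratio" is assumed here to be the 18S plus 28S area over the total area excluding the marker, which may differ from the published definition. The 28S:18S area ratio is included because it is the traditional measure the RIN is compared against.

```python
def region_areas(times, signal, bounds):
    """Integrate the signal over each named region with the trapezoidal rule.

    bounds maps a region name to a (start, end) time window.
    """
    areas = {}
    for name, (lo, hi) in bounds.items():
        pts = [(t, s) for t, s in zip(times, signal) if lo <= t <= hi]
        areas[name] = sum((t1 - t0) * (s0 + s1) / 2
                          for (t0, s0), (t1, s1) in zip(pts, pts[1:]))
    return areas

def summary_features(times, signal, bounds):
    """A few region-based features; the definitions are assumptions."""
    areas = region_areas(times, signal, bounds)
    total = sum(a for name, a in areas.items() if name != "marker")
    lo28, hi28 = bounds["28S"]
    return {
        "total_rna_ratio": (areas["18S"] + areas["28S"]) / total,
        "ratio_28s_18s": areas["28S"] / areas["18S"],
        "peak_height_28s": max(s for t, s in zip(times, signal) if lo28 <= t <= hi28),
    }
```

In practice the boundaries would come from peak detection on each trace rather than fixed windows.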

Source: PubMed