Plasma protein patterns as comprehensive indicators of health

Stephen A Williams, Mika Kivimaki, Claudia Langenberg, Aroon D Hingorani, J P Casas, Claude Bouchard, Christian Jonasson, Mark A Sarzynski, Martin J Shipley, Leigh Alexander, Jessica Ash, Tim Bauer, Jessica Chadwick, Gargi Datta, Robert Kirk DeLisle, Yolanda Hagar, Michael Hinterberg, Rachel Ostroff, Sophie Weiss, Peter Ganz, Nicholas J Wareham

Abstract

Proteins are effector molecules that mediate the functions of genes1,2 and modulate comorbidities3-10, behaviors and drug treatments11. They represent an enormous potential resource for personalized, systemic and data-driven diagnosis, prevention, monitoring and treatment. However, the concept of using plasma proteins for individualized health assessment across many health conditions simultaneously has not been tested. Here, we show that plasma protein expression patterns strongly encode for multiple different health states, future disease risks and lifestyle behaviors. We developed and validated protein-phenotype models for 11 different health indicators: liver fat, kidney filtration, percentage body fat, visceral fat mass, lean body mass, cardiopulmonary fitness, physical activity, alcohol consumption, cigarette smoking, diabetes risk and primary cardiovascular event risk. The analyses were prospectively planned, documented and executed at scale on archived samples and clinical data, with a total of ~85 million protein measurements in 16,894 participants. Our proof-of-concept study demonstrates that protein expression patterns reliably encode for many different health issues, and that large-scale protein scanning12-16 coupled with machine learning is viable for the development and future simultaneous delivery of multiple measures of health. We anticipate that, with further validation and the addition of more protein-phenotype models, this approach could enable a single-source, individualized so-called liquid health check.

Conflict of interest statement

Competing interests

The SomaLogic co-authors (S.W., L.A., J.A., T.B., J.C., G.D., R.K.D., Y.H., M.H., R.O. and S.W.) were/are all employees of SomaLogic, Inc., which has a commercial interest in the results. N.W. and C.L. declared that SomaLogic, Inc. has given a grant to the University of Cambridge. P.G. is a member of the SomaLogic Medical Advisory board, for which he receives no remuneration of any kind. The remaining authors (M.K., A.H., J.P.C., C.B., C.J., M.S. and M.S.) have no competing interests.

Figures

Extended Data Fig. 1 |. Descriptors of parent studies and fractions used for model derivation and validation.
Solid black arrows designate how fractions of samples and clinical data were utilized independently; blue dashed arrows designate the validation of finalized models either in new fractions of the same dataset or in independent datasets. eGFR = estimated glomerular filtration rate; VO2max = maximum rate of oxygen consumption; kg = kilograms. *For Fenland, the precise numbers available for the 70%/15%/15% fractions depended on the number of participants with data for each endpoint, as follows: n = 9,654 for self-reported alcohol units, n = 11,471 with DEXA scans for body composition, n = 10,077 with ultrasound for liver fat, and n = 11,695 with individually calibrated heart rate and movement sensing for caloric expenditure due to physical activity. **For HERITAGE, the model was trained on the pre-training time point from half of the 523 participants and the post-training time point from the other half. The model was tested on samples from the opposite time points in the same participants and finally replicated in the 10% fraction not used for training.
Extended Data Fig. 2 |. Details of the 5 parent cohort studies.
Extended Data Fig. 3 |. Participant characteristics for current health state models.
Extended Data Fig. 4 |. Participant characteristics for current state body composition models.
Extended Data Fig. 5 |. Participant characteristics for modifiable behavioral factors models.
Extended Data Fig. 6 |. Participant characteristics for future metabolic health risks models.
Fig. 1 |. Model outputs compared to the truth standards against which they were derived.
All panels show the data from the validation sets, except for the diabetes survival model where, for clarity, the Kaplan-Meier curves are shown for the much larger discovery datasets. Box plots are broken down into quantiles: minimum, 25%, median, 75% and maximum. Scatter plots include a linear fit (red solid line). Dashed lines represent the upper and lower 95% confidence intervals. VAT, visceral adipose tissue; CVD, cardiovascular disease.

Source: PubMed
