Investigating phenotypes of pulmonary COVID-19 recovery: A longitudinal observational prospective multicenter trial

Thomas Sonnweber, Piotr Tymoszuk, Sabina Sahanic, Anna Boehm, Alex Pizzini, Anna Luger, Christoph Schwabl, Manfred Nairz, Philipp Grubwieser, Katharina Kurz, Sabine Koppelstätter, Magdalena Aichner, Bernhard Puchner, Alexander Egger, Gregor Hoermann, Ewald Wöll, Günter Weiss, Gerlig Widmann, Ivan Tancevski, Judith Löffler-Ragg

Abstract

Background: The optimal procedures to prevent, identify, monitor, and treat long-term pulmonary sequelae of COVID-19 are elusive. Here, we characterized the kinetics of respiratory and symptom recovery following COVID-19.

Methods: We conducted a longitudinal, multicenter observational study in ambulatory and hospitalized COVID-19 patients recruited in early 2020 (n = 145). Pulmonary computed tomography (CT) and lung function (LF) readouts, symptom prevalence, and clinical and laboratory parameters were collected during acute COVID-19 and at 60, 100, and 180 days follow-up visits. Recovery kinetics and risk factors were investigated by logistic regression. Classification of clinical features and participants was accomplished by unsupervised and semi-supervised multiparameter clustering and machine learning.

Results: At the 6-month follow-up, 49% of participants reported persistent symptoms. The frequency of structural lung CT abnormalities ranged from 18% in the mild outpatient cases to 76% in the intensive care unit (ICU) convalescents. Prevalence of impaired LF ranged from 14% in the mild outpatient cases to 50% in the ICU survivors. Incomplete radiological lung recovery was associated with increased anti-S1/S2 antibody titer, IL-6, and CRP levels at the early follow-up. We demonstrated that the risk of perturbed pulmonary recovery could be robustly estimated at early follow-up by clustering and machine learning classifiers employing solely non-CT and non-LF parameters.

Conclusions: The severity of acute COVID-19 and protracted systemic inflammation is strongly linked to persistent structural and functional lung abnormality. Automated screening of multiparameter health record data may assist in the prediction of incomplete pulmonary recovery and optimize COVID-19 follow-up management.

Funding: The State of Tyrol (GZ 71934), Boehringer Ingelheim/Investigator initiated study (IIS 1199-0424).

Clinical trial number: ClinicalTrials.gov: NCT04416100.

Keywords: COVID-19; computed tomography; epidemiology; global health; human; long COVID; machine learning; medicine; post-COVID-19 syndrome; pulmonary recovery.

Conflict of interest statement

TS, SS, AB, AP, AL, CS, MN, PG, KK, SK, MA, BP, AE, GH, EW, GW, GW, IT, JL: No competing interests declared. PT owns his own business, Data Analytics as a Service Tirol, for which he performs freelance data science work. He has also received an honorarium for the study data management, curation and analysis, and minor manuscript work. The author has no other competing interests to declare.

Figures

**Figure 1. Study inclusion flow diagram and analysis scheme.**

**Figure 2. Kinetic of recovery from COVID-19 symptoms.**
Recovery from any COVID-19 symptoms was investigated by mixed-effect logistic modeling (random effect: individual; fixed effect: time). Significance was determined by the likelihood ratio test corrected for multiple testing with the Benjamini–Hochberg method, and p-values and the numbers of complete observations are indicated in the plots. (A) Frequencies of individuals with any symptoms in the study cohort stratified by acute COVID-19 severity. (B) Frequencies of participants with particular symptoms. imp.: impaired.
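
The Benjamini–Hochberg correction named in this legend can be illustrated with a minimal Python sketch. The function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the study; the software the authors used for the correction is not stated in this section.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, ordered from the largest to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        candidate = p_values[idx] * m / rank
        # Enforce monotonicity: an adjusted value never exceeds that of
        # any larger raw p-value.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four likelihood ratio tests.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

Each adjusted value is the raw p-value scaled by the number of tests divided by its rank, capped so that the ordering of the raw p-values is preserved.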

**Figure 3. Kinetic of pulmonary recovery.**
Recovery from any lung computed tomography (CT) abnormalities, moderate-to-severe lung CT abnormalities (severity score > 5), and recovery from functional lung impairment were investigated in the participants stratified by acute COVID-19 severity by mixed-effect logistic modeling (random effect: individual; fixed effect: time). Significance was determined by the likelihood ratio test corrected for multiple testing with the Benjamini–Hochberg method. Frequencies of the given abnormality at the indicated time points are presented, and p-values and the numbers of complete observations are indicated in the plots.

**Figure 3—figure supplement 1. Co-occurrence of lung computed tomography (CT) abnormalities, functional lung impairment, and any persistent symptoms.**
Numbers and percentages of the study participants with any persistent symptoms, functional lung impairment, or lung CT abnormalities at the consecutive follow-up visits presented in quasi-proportional Venn diagrams. The numbers of participants with CT abnormalities, lung function (LF) impairment, and persistent symptoms are indicated in the diagrams, and the numbers of complete observations are shown under the plots.

**Figure 3—figure supplement 2. Co-occurrence of moderate-to-severe lung computed tomography (CT) abnormalities, functional lung impairment, and any persistent symptoms.**
Numbers and percentages of the study participants with any persistent symptoms, functional lung impairment, or moderate-to-severe lung CT abnormalities (severity score > 5) at the consecutive follow-up visits presented in quasi-proportional Venn diagrams. The numbers of participants with CT abnormalities, lung function (LF) impairment, and persistent symptoms are indicated in the diagrams, and the numbers of complete observations are shown under the plots.

**Figure 3—figure supplement 3. Frequency of mild and moderate-to-severe lung computed tomography (CT) abnormalities.**
Prognostic value of functional lung impairment and persistent symptoms for prediction of radiological lung abnormalities. (A) Relevance of functional lung impairment and persistent COVID-19 symptoms at predicting any lung CT abnormalities and moderate-to-severe lung CT abnormalities (severity score > 5) at the consecutive follow-up visits. The concordance of the outcome variables was determined by Cohen’s κ coefficient. Statistical significance (κ ≠ 0) was assessed by two-tailed t-test corrected for multiple testing with the Benjamini–Hochberg method. Kappa with 95% confidence intervals and p-values are presented as a heat map. The number of complete observations is indicated in the plot. (B) Percentages of mild (severity score ≤ 5) and moderate-to-severe lung CT abnormalities at the consecutive follow-up visits in the study participants stratified by the severity of acute COVID-19. Statistical significance of frequency differences was determined by χ2 test for trend corrected for multiple testing with the Benjamini–Hochberg method. The number of complete observations is indicated under the plots.
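
As an illustration of the concordance measure used in panel (A), Cohen's κ for two binary variables can be computed as sketched below. The function name `cohen_kappa` and the two example vectors are hypothetical; the sketch returns only the point estimate and omits the confidence interval and significance test reported in the figure.

```python
def cohen_kappa(x, y):
    """Cohen's kappa for two equally long sequences of 0/1 labels."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement: share of participants with identical labels.
    observed = sum(1 for a, b in zip(x, y) if a == b) / n
    # Agreement expected by chance from the marginal frequencies.
    px = sum(x) / n
    py = sum(y) / n
    expected = px * py + (1 - px) * (1 - py)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical data for ten participants (1 = abnormality present).
lf_impaired = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
ct_abnormal = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
kappa = cohen_kappa(lf_impaired, ct_abnormal)
print(round(kappa, 3))
```

A κ of 1 indicates perfect concordance and a κ of 0 indicates agreement no better than chance.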

**Figure 4. Risk factors of persistent radiological lung abnormalities.**
Association of 52 binary explanatory variables (Appendix 1—table 1) with the presence of any lung computed tomography (CT) abnormalities (A) or moderate-to-severe lung CT abnormalities (severity score > 5) (B) at the 180-day follow-up visit was investigated with a series of univariate logistic models (Appendix 1—table 2). Odds ratio (OR) significance was determined by Wald Z test and corrected for multiple testing with the Benjamini–Hochberg method. ORs with 95% confidence intervals for significant favorable and unfavorable factors are presented in forest plots. Model baseline (ref) and numbers of complete observations are presented in the plot axis text. Q1, Q2, Q3, Q4: first, second, third, and fourth quartile of anti-S1/S2 IgG titer; ICU: intensive care unit.
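
For a single binary explanatory variable, the odds ratio from a univariate logistic model equals the cross-product ratio of the corresponding 2 × 2 table, and the Wald standard error of the log odds ratio is the square root of the summed reciprocal cell counts. The sketch below uses this shortcut; the function name `odds_ratio_wald` and the counts are hypothetical and not study data.

```python
import math


def odds_ratio_wald(exposed_events, exposed_non_events,
                    unexposed_events, unexposed_non_events, z_crit=1.96):
    """Odds ratio, 95% Wald confidence interval, and two-sided Wald p-value."""
    a, b = exposed_events, exposed_non_events
    c, d = unexposed_events, unexposed_non_events
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_stat = log_or / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return {
        "or": math.exp(log_or),
        "lower": math.exp(log_or - z_crit * se),
        "upper": math.exp(log_or + z_crit * se),
        "p": p_value,
    }


# Hypothetical counts: CT abnormality at day 180 by presence of a risk factor.
result = odds_ratio_wald(20, 10, 10, 40)
print(result)
```

The resulting p-values for all tested variables would then be corrected for multiple testing, as stated in the legend.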

**Figure 5. Risk factors of persistent functional lung impairment.**
Association of 52 binary explanatory variables (Appendix 1—table 1) with the presence of functional lung impairment at the 180-day follow-up visit was investigated with a series of univariate logistic models (Appendix 1—table 2). Odds ratio (OR) significance was determined by Wald Z test and corrected for multiple testing with the Benjamini–Hochberg method. ORs with 95% confidence intervals for the significant favorable and unfavorable factors are presented in a forest plot. Model baseline (ref) and numbers of complete observations are presented in the plot axis text. Q1, Q2, Q3, Q4: first, second, third, and fourth quartile of anti-S1/S2 IgG titer; CKD: chronic kidney disease.

**Figure 6. Association of incomplete symptom, lung function, and radiological lung recovery with demographic and clinical parameters of acute COVID-19 and early recovery.**
Clustering of 52 non-computed tomography (non-CT) and non-lung function binary explanatory variables recorded for acute COVID-19 or at the early 60-day follow-up visit (Appendix 1—table 1) was investigated by partitioning around medoids (PAM) algorithm with simple matching distance (SMD) dissimilarity measure (Figure 6—figure supplement 1, Appendix 1—table 3). The cluster assignment for the outcome variables at the 180-day follow-up visit (persistent symptoms, functional lung impairment, mild lung CT abnormalities [severity score ≤ 5] and moderate-to-severe lung CT abnormalities [severity score > 5]) was predicted by k-nearest neighbor (kNN) label propagation procedure. Numbers of complete observations and numbers of features in the clusters are indicated in (A). (A) Cluster assignment of the outcome variables (diamonds) presented in the plot of principal component (PC) scores. The first two major PCs are displayed. The explanatory variables are visualized as points. Percentages of the data set variance associated with the PC are presented in the plot axes. (B) Five nearest neighbors (lowest SMD) of the outcome variables presented in radial plots. Font size, point radius, and color code for SMD values. Q1, Q2, Q3, Q4: first, second, third, and fourth quartile of anti-S1/S2 IgG titer; GITD: gastrointestinal disease; CKD: chronic kidney disease; ICU: intensive care unit; COPD: chronic obstructive pulmonary disease.
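
The simple matching distance and the nearest-neighbor assignment described in this legend can be sketched as follows. This is an illustration under assumptions: the feature names, vectors, and cluster labels are hypothetical, and a majority vote among the k nearest features is one plausible reading of the kNN label propagation step, not necessarily the authors' exact procedure.

```python
from collections import Counter


def simple_matching_distance(u, v):
    """Fraction of positions at which two equally long binary vectors differ."""
    if len(u) != len(v) or len(u) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(1 for a, b in zip(u, v) if a != b) / len(u)


def propagate_label(query, labelled_features, k=5):
    """Assign `query` to the most common cluster among its k nearest features.

    labelled_features maps a feature name to a (vector, cluster) pair.
    Returns the winning cluster and the k nearest (name, distance) pairs.
    """
    distances = sorted(
        (simple_matching_distance(query, vector), name, cluster)
        for name, (vector, cluster) in labelled_features.items()
    )
    nearest = distances[:k]
    votes = Counter(cluster for _, _, cluster in nearest)
    winner = votes.most_common(1)[0][0]
    return winner, [(name, dist) for dist, name, _ in nearest]


# Hypothetical binary features observed in six participants, already clustered.
features = {
    "feature_a": ([1, 1, 1, 0, 0, 0], 1),
    "feature_b": ([1, 1, 0, 0, 0, 0], 1),
    "feature_c": ([0, 0, 0, 1, 1, 1], 2),
    "feature_d": ([0, 0, 1, 1, 1, 1], 2),
    "feature_e": ([1, 1, 1, 1, 0, 0], 1),
}
# Hypothetical outcome variable at the 180-day visit for the same participants.
outcome = [1, 1, 1, 0, 0, 1]
cluster, neighbours = propagate_label(outcome, features, k=3)
print(cluster, neighbours)
```

The list of nearest features returned by the function corresponds to what panel (B) displays for each outcome variable.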

**Figure 6—figure supplement 1. Study feature clustering algorithm.**
Clustering of 52 non-computed tomography (non-CT) and non-lung function binary explanatory variables recorded for acute COVID-19 or at the early 60-day follow-up visit (Appendix 1—table 1). (**A, B**) Comparison of ‘explained’ variances (between-cluster to total sum-of-squares ratio) (A) and cluster stability (mean classification error in 20-fold cross-validation) (B) in clustering of the data set with several algorithms with k = 3 centers/branches (algorithms: K-means; PAM: partitioning around medoids; HCl Ward.D2: hierarchical clustering with Ward.D2 method; distances: SMD: simple matching distance; Jaccard, Dice, and Cosine). (**C, D**) The optimal number of the feature clusters in clustering with the optimally performing PAM algorithm with SMD dissimilarity measure was determined by the bend of the total within-cluster sum-of-squares curve (C) and confirmed by good stability (low mean classification error) in 20-fold cross-validation (D).
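
The 'explained' variance statistic, the ratio of between-cluster to total sum-of-squares, can be sketched as below. The sketch assumes squared Euclidean distances to cluster centroids, for which the between-cluster sum-of-squares equals the total minus the within-cluster sum-of-squares; how the authors computed the statistic for other distance measures is not specified here. The function name and the data are hypothetical.

```python
def explained_variance(points, labels):
    """Between-cluster to total sum-of-squares ratio (Euclidean, centroids)."""
    if not points or len(points) != len(labels):
        raise ValueError("points and labels must be non-empty and equally long")
    dim = len(points[0])

    def centroid(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    grand = centroid(points)
    total_ss = sum(sq_dist(p, grand) for p in points)
    if total_ss == 0:
        raise ValueError("all points are identical; the ratio is undefined")
    within_ss = 0.0
    for label in set(labels):
        members = [p for p, l in zip(points, labels) if l == label]
        center = centroid(members)
        within_ss += sum(sq_dist(p, center) for p in members)
    return (total_ss - within_ss) / total_ss


# Hypothetical binary data: four features measured in four participants.
data = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
good_split = explained_variance(data, [1, 1, 2, 2])
poor_split = explained_variance(data, [1, 2, 1, 2])
print(good_split, poor_split)
```

A ratio near 1 means the clusters account for most of the variation in the data, whereas a ratio near 0 means the partition is uninformative.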

**Figure 6—figure supplement 2. Semi-supervised clustering of mild and moderate-to-severe lung computed tomography (CT) abnormalities, functional lung impairment, and persistent symptoms at the 180-day follow-up with parameters of acute COVID-19 and early convalescence.**
Clusters of 52 non-CT and non-lung function binary explanatory variables recorded for acute COVID-19 or at the 60-day follow-up visit (Appendix 1—table 1) were defined by the optimally performing partitioning around medoids (PAM) algorithm and simple matching distance (SMD) dissimilarity measure (Figure 6A, Figure 6—figure supplement 1, Appendix 1—table 3). The cluster assignment for the outcome variables at the 180-day follow-up visit (persistent symptoms, functional lung impairment, mild lung CT abnormalities [severity score ≤ 5], and moderate-to-severe lung CT abnormalities [severity score > 5]) was predicted by k-nearest neighbor (kNN) label propagation procedure. SMD between the features and their cluster assignments are shown in a heat map. The numbers of features in the clusters and the total number of observations are indicated under the plot. CVD: cardiovascular disease; Q1, Q2, Q3, Q4: first, second, third, and fourth quartile of anti-S1/S2 IgG titer; GI: gastrointestinal; PD: pulmonary disease; GITD: gastrointestinal disease; ICU: intensive care unit; COPD: chronic obstructive pulmonary disease; CKD: chronic kidney disease.

**Figure 7. Clustering of the study participants by non-lung function and non-computed tomography (non-CT) clinical features.**
Study participants (n = 133 with the complete variable set) were clustered with respect to 52 non-CT and non-lung function binary explanatory variables recorded for acute COVID-19 or at the 60-day follow-up visit (Appendix 1—table 1) using a combined self-organizing map (SOM: simple matching distance) and hierarchical clustering (Ward.D2 method, Euclidean distance) procedure (Figure 7—figure supplement 1). The numbers of participants assigned to low-risk (LR), intermediate-risk (IR), and high-risk (HR) clusters are indicated in (A). (A) Cluster assignment of the study participants in the plot of principal component (PC) scores. The first two major PCs are displayed. Percentages of the data set variance associated with the PC are presented in the plot axes. (B) Presence of the most influential clustering features (Figure 7—figure supplement 2) in the participant clusters presented as a heat map. Cluster #1, #2, and #3 refer to the feature clusters defined in Figure 6. Q1, Q2, Q3, Q4: first, second, third, and fourth quartile of anti-S1/S2 IgG titer; GITD: gastrointestinal disease; CKD: chronic kidney disease; CVD: cardiovascular disease; GI: gastrointestinal; PD: pulmonary disease.

**Figure 7—figure supplement 1. Study participant clustering algorithm.**
Clustering of the study participants (n = 133 with the complete variable set) with respect to 52 non-computed tomography (non-CT) and non-lung function binary explanatory variables recorded for acute COVID-19 or at the 60-day follow-up visit (Appendix 1—table 1). The procedure involved clustering of the observations with self-organizing maps (SOM, 4 × 4 hexagonal grid, distances: SMD: simple matching distance, Jaccard, Dice, or Cosine) followed by clustering of the SOM nodes (algorithms: HCl Ward.D2: hierarchical clustering with Ward.D2 method; K-means; PAM: partitioning around medoids; distance: Euclidean). Different combinations of observation dissimilarity measures and SOM node clustering algorithms were tested in the search for the optimal clustering algorithm. (**A, B**) Comparison of ‘explained’ variances (between-cluster to total sum-of-squares ratio) (A) and cluster stability (mean classification error in 20-fold cross-validation) (B) in clustering of the data set with different observation distance measures and SOM node clustering algorithms. (C) Training of the SOM algorithm; the mean distance to the winning unit as a function of algorithm iterations is presented. Note the mean distance plateau indicative of the algorithm convergence. (**D–F**) The optimal number of the SOM node clusters in clustering with the optimally performing SOM HCl algorithm with SMD observation dissimilarity measure. The optimal cluster number was determined by the bend of the total within-cluster sum-of-squares curve (D) and confirmed by visual inspection of the HCl dendrogram (E) and good stability (low mean classification error) in 20-fold cross-validation (F).

**Figure 7—figure supplement 2. Impact of specific variables on the quality of participant clustering.**
The participant clusters were defined with the optimally performing self-organizing map (SOM)/HCl algorithm with simple matching distance (SMD) observation dissimilarity measure as presented in Figure 7 and Figure 7—figure supplement 1. The impact of a particular clustering variable was determined by comparing the ‘explained’ clustering variance (between-cluster to total sum-of-squares ratio) between the initial cluster structure and the structure with random resampling of the variable. Differences in the clustering variances for the most influential clustering variables (Δ clustering variance > 0) are presented in the plot. Q1, Q3: first, third quartile of anti-S1/S2 IgG titer; CKD: chronic kidney disease; GI: gastrointestinal; CVD: cardiovascular disease; PD: pulmonary disease; GITD: gastrointestinal disease.

**Figure 8. Frequency of persistent radiological lung abnormalities, functional lung impairment, and symptoms in the participant clusters.**
The clusters of study participants were defined by non-lung function and non-computed tomography (non-CT) features as presented in Figure 7. Frequencies of outcome variables at the 180-day follow-up visit (mild [severity score ≤ 5], moderate-to-severe lung CT abnormalities [severity score > 5], functional lung impairment, and persistent symptoms) were compared between the low-risk (LR), intermediate-risk (IR), and high-risk (HR) participant clusters by χ2 test corrected for multiple testing with the Benjamini–Hochberg method. p-Values and numbers of participants assigned to the clusters are indicated in the plots. (A) Frequencies of the outcome features in the participant clusters. (B) Frequencies of specific symptoms in the participant clusters.

**Figure 8—figure supplement 1. Risk of radiological lung abnormalities at the 180-day follow-up in the participant clusters.**
The clusters of participants were defined by non-lung function and non-computed tomography (non-CT) clinical features of acute COVID-19 and early convalescence (60-day follow-up visit, Appendix 1—table 1) with the optimally performing HCl algorithm with simple matching distance (SMD) observation dissimilarity measure as presented in Figure 7 and Figure 7—figure supplement 1. (A) Distribution of mild, moderate, severe, and critical acute COVID-19 cases in the participant clusters. Significance of the distribution differences was assessed with χ2 test. The numbers of participants assigned to the clusters are indicated under the plot. (B) Association of the participant cluster assignment (LR: low-risk; IR: intermediate-risk; HR: high-risk cluster) with the risk of any lung CT abnormalities and moderate-to-severe lung CT abnormalities (severity score > 5) at the 180-day follow-up visit was investigated by logistic modeling with and without inclusion of the acute COVID-19 severity effect (severity-adjusted). Odds ratio (OR) significance was determined by Wald Z test and corrected for multiple testing with the Benjamini–Hochberg method. ORs with 95% confidence intervals are presented in forest plots. Numbers of complete observations, outcome events, participants in the clusters, and the acute COVID-19 severity subsets are indicated under the plot.

**Figure 9. Prediction of persistent radiological lung abnormalities, functional lung impairment, and symptoms by machine learning algorithms.**
Single machine learning classifiers (C5.0; RF: random forests; SVM-R: support vector machines with radial kernel; NNet: neural network; glmNet: elastic net) and their ensemble (Ens) were trained in the cohort data set with 52 non-computed tomography (non-CT) and non-lung function binary explanatory variables recorded for acute COVID-19 or at the 60-day follow-up visit (Appendix 1—table 1) for predicting outcome variables at the 180-day follow-up visit (any lung CT abnormalities, moderate-to-severe lung CT abnormalities [severity score > 5], functional lung impairment, and persistent symptoms) (Appendix 1—table 4). The prediction accuracy was verified by repeated 20-fold cross-validation (five repeats). Receiver-operating characteristics (ROCs) of the algorithms in the cross-validation are presented: area under the curve (AUC), sensitivity (Sens), and specificity (Spec) (Appendix 1—table 5). The numbers of complete observations and outcome events are indicated under the plots.
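
The receiver-operating characteristic statistics listed in this legend can be illustrated with a short sketch. The AUC is computed here as the probability that a randomly chosen case with the outcome receives a higher predicted score than a randomly chosen case without it. The function names, the 0.5 cutoff, and the labels and scores are hypothetical and are not the study's cross-validation output.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    # A positive scoring above a negative counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold predict the outcome."""
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < threshold)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical out-of-fold predictions for seven participants.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores)
print(round(auc, 3), round(sens, 3), round(spec, 3))
```

In a repeated cross-validation setting, such statistics would be computed on predictions for participants held out from model training.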

**Figure 9—figure supplement 1. Correlation of the machine learning algorithm prediction accuracy.**
Machine learning classifiers (C5.0; RF: random forests; SVM-R: support vector machines with radial kernel; NNet: neural network; glmNet: elastic net) were trained in the cohort data set with 52 non-computed tomography (non-CT) and non-lung function binary explanatory variables recorded for acute COVID-19 or at the early 60-day follow-up visit (Appendix 1—table 1) for predicting outcome variables at the 180-day follow-up visit (any lung CT abnormalities, moderate-to-severe lung CT abnormalities [severity score > 5], functional lung impairment, and persistent symptoms) (Figure 9, Appendix 1—table 4). The prediction accuracy was verified by repeated 20-fold cross-validation (five repeats). Pearson’s correlation coefficients of the classifier prediction accuracy in the cross-validation are presented as heat maps. Numbers of complete observations and outcome events are indicated under the plots.

**Figure 9—figure supplement 2. Machine learning model ensembles.**
Single machine learning classifiers (C5.0; RF: random forests; SVM-R: support vector machines with radial kernel; NNet: neural network; glmNet: elastic net) were trained as shown in Figure 9. The model ensembles based on the single classifiers were constructed with the glmNet procedure (Appendix 1—table 4). glmNet regression coefficients (β) are presented in the plots. Point and text color correspond to the β value. Numbers of complete observations and outcome events are indicated under the plots.

**Figure 9—figure supplement 3. Prediction of persistent radiological lung abnormalities, functional lung impairment, and symptoms by machine learning algorithms in the training data sets.**
Single machine learning classifiers (C5.0; RF: random forests; SVM-R: support vector machines with radial kernel; NNet: neural network; glmNet: elastic net) and their ensembles were trained as shown in Figure 9. Performance of the classifiers in the training data sets was investigated by receiver-operating characteristic (ROC) of the algorithms (AUC: area under the curve; Sens: sensitivity; Spec: specificity, Appendix 1—table 5). Numbers of complete observations and outcome events are indicated under the plots.

**Figure 9—figure supplement 4. Variable importance statistics for prediction of lung computed tomography (CT) abnormalities at the 180-day follow-up by machine learning classifiers.**
C5.0, random forests (RF), and elastic net (glmNet) classifiers were trained as presented in Figure 9 for prediction of any lung CT abnormalities at the 180-day follow-up visit. Variable importance measures (C5.0: % attribute/variable usage in the tree model (A); RF: difference in Gini index (B); glmNet: absolute value of the regression coefficient β (C)) for the 10 most influential explanatory variables are presented. CKD: chronic kidney disease; Q1, Q4: first, fourth quartile of anti-S1/S2 IgG titer; PD: pulmonary disease.

**Figure 9—figure supplement 5. Variable importance statistics for prediction of moderate-to-severe lung computed tomography (CT) abnormalities at the 180-day follow-up by machine learning classifiers.**
C5.0, random forests (RF), and elastic net (glmNet) classifiers were trained as presented in Figure 9 for prediction of moderate-to-severe lung CT abnormalities (severity score > 5) at the 180-day follow-up visit. Variable importance measures (C5.0: % attribute/variable usage in the tree model (A); RF: difference in Gini index (B); glmNet: absolute value of the regression coefficient β (C)) for the 10 most influential explanatory variables are presented. PD: pulmonary disease; GITD: gastrointestinal disease; Q1, Q2, Q4: first, second, fourth quartile of anti-S1/S2 IgG titer.

**Figure 9—figure supplement 6. Variable importance statistics for prediction of functional lung impairment at the 180-day follow-up by machine learning classifiers.**
C5.0, random forests (RF), and elastic net (glmNet) classifiers were trained as presented in Figure 9 for prediction of functional lung impairment at the 180-day follow-up visit. Variable importance measures (C5.0: % attribute/variable usage in the tree model (A); RF: difference in Gini index (B); glmNet: absolute value of the regression coefficient β (C)) for the 10 most influential explanatory variables are presented. CKD: chronic kidney disease; Q1, Q2: first, second quartile of anti-S1/S2 IgG titer.

**Figure 9—figure supplement 7. Variable importance statistics for prediction of persistent symptoms at the 180-day follow-up by machine learning classifiers.**
C5.0, random forests (RF), and elastic net (glmNet) classifiers were trained as presented in Figure 9 for prediction of persistent symptoms at the 180-day follow-up visit. Variable importance measures (C5.0: % attribute/variable usage in the tree model (A); RF: difference in Gini index (B); glmNet: absolute value of the regression coefficient β (C)) for the 10 most influential explanatory variables are presented. CVD: cardiovascular disease; GITD: gastrointestinal disease; COPD: chronic obstructive pulmonary disease.

**Figure 10. Performance of the machine learning ensemble classifier in mild-to-moderate and severe-to-critical COVID-19 convalescents.**
The machine learning classifier ensemble (Ens) was developed as presented in Figure 9. Its performance at predicting outcome variables at the 180-day follow-up visit (any computed tomography [CT] lung abnormalities, moderate-to-severe lung CT abnormalities [severity score > 5], functional lung impairment, and persistent symptoms) in the entire cohort, mild-to-moderate (outpatient or hospitalized without oxygen), and severe-to-critical COVID-19 convalescents (oxygen therapy or ICU) in repeated 20-fold cross-validation (five repeats) was assessed by receiver-operating characteristic (ROC) (Appendix 1—table 6). ROC curves and statistics (AUC: area under the curve; Se: sensitivity; Sp: specificity) in the cross-validation are shown. Numbers of complete observations and outcome events are indicated in the plots.

1. Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, Zhang L, Fan G, Xu J, Gu X, Cheng Z, Yu T, Xia J, Wei Y, Wu W, Xie X, Yin W, Li H, Liu M, Xiao Y, Gao H, Guo L, Xie J, Wang G, Jiang R, Gao Z, Jin Q, Wang J, Cao B. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet (London, England) 2020;395:497–506. doi: 10.1016/S0140-6736(20)30183-5.
1. Huang C, Huang L, Wang Y, Li X, Ren L, Gu X, Kang L, Guo L, Liu M, Zhou X, Luo J, Huang Z, Tu S, Zhao Y, Chen L, Xu D, Li Y, Li C, Peng L, Li Y, Xie W, Cui D, Shang L, Fan G, Xu J, Wang G, Wang Y, Zhong J, Wang C, Wang J, Zhang D, Cao B. 6-month consequences of COVID-19 in patients discharged from hospital: a cohort study. Lancet (London, England) 2021a;397:220–232. doi: 10.1016/S0140-6736(20)32656-8.
1. Huang L, Yao Q, Gu X, Wang Q, Ren L, Wang Y, Hu P, Guo L, Liu M, Xu J, Zhang X, Qu Y, Fan Y, Li X, Li C, Yu T, Xia J, Wei M, Chen L, Li Y, Xiao F, Liu D, Wang J, Wang X, Cao B. 1-year outcomes in hospital survivors with COVID-19: a longitudinal cohort study. Lancet (London, England) 2021b;398:747–758. doi: 10.1016/S0140-6736(21)01755-4.
1. Hui DS, Wong KT, Ko FW, Tam LS, Chan DP, Woo J, Sung JJY. The 1-year impact of severe acute respiratory syndrome on pulmonary function, exercise capacity, and quality of life in a cohort of survivors. Chest. 2005;128:2247–2261. doi: 10.1378/chest.128.4.2247.
1. Johns Hopkins Coronavirus Resource Center. COVID-19 Map. 2021. [May 20, 2021].
1. Khanna D, Tashkin DP, Denton CP, Renzoni EA, Desai SR, Varga J. Etiology, Risk Factors, and Biomarkers in Systemic Sclerosis with Interstitial Lung Disease. American Journal of Respiratory and Critical Care Medicine. 2020;201:650–660. doi: 10.1164/rccm.201903-0563CI.
1. Kohonen T. Self-Organizing Maps. Berlin, Heidelberg: Springer; 1995.
1. Kuhn M. Building predictive models in R using the caret package. Journal of Statistical Software. 2008;28:1–26. doi: 10.18637/jss.v028.i05.
1. Kuznetsova A, Brockhoff PB, Christensen RHB. lmerTest Package: Tests in Linear Mixed Effects Models. Journal of Statistical Software. 2017;82:1–26. doi: 10.18637/jss.v082.i13.
1. Lam MHB, Wing YK, Yu MWM, Leung CM, Ma RCW, Kong APS, So WY, Fong SYY, Lam SP. Mental morbidities and chronic fatigue in severe acute respiratory syndrome survivors: long-term follow-up. Archives of Internal Medicine. 2009;169:2142–2147. doi: 10.1001/archinternmed.2009.384.
1. Lange T, Roth V, Braun ML, Buhmann JM. Stability-based validation of clustering solutions. Neural Computation. 2004;16:1299–1323. doi: 10.1162/089976604773717621.
1. Leng M, Wang J, Cheng J, Zhou H, Chen X. Adaptive Semi-Supervised Clustering Algorithm with Label Propagation. Journal of Software Engineering. 2013;8:14–22. doi: 10.3923/jse.2014.14.22.
1. Masclans JR, Roca O, Muñoz X, Pallisa E, Torres F, Rello J, Morell F. Quality of life, pulmonary function, and tomographic scan abnormalities after ARDS. Chest. 2011;139:1340–1346. doi: 10.1378/chest.10-2438.
1. Ng CK, Chan JWM, Kwan TL, To TS, Chan YH, Ng FYY, Mok TYW. Six month radiological and physiological outcomes in severe acute respiratory syndrome (SARS) survivors. Thorax. 2004;59:889–891. doi: 10.1136/thx.2004.023762.
1. Ngai JC, Ko FW, Ng SS, To KW, Tong M, Hui DS. The long-term impact of severe acute respiratory syndrome on pulmonary function, exercise capacity and health status. Respirology (Carlton, Vic.) 2010;15:543–550. doi: 10.1111/j.1440-1843.2010.01720.x.
1. Perez-Saez J. Serology-informed estimates of SARS-CoV-2 infection fatality risk in Geneva, Switzerland. The Lancet. Infectious Diseases. 2021;21:e69–e70. doi: 10.1016/S1473-3099(20)30584-3.
1. Pérez-Silva JG, Araujo-Voces M, Quesada V. nVenn: generalized, quasi-proportional Venn and Euler diagrams. Bioinformatics (Oxford, England) 2018;34:2322–2324. doi: 10.1093/bioinformatics/bty109.
1. Quinlan JR. C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc; 1993.
1. Raghu G, Wilson KC. COVID-19 interstitial pneumonia: monitoring the clinical course in survivors. The Lancet. Respiratory Medicine. 2020;8:839–842. doi: 10.1016/S2213-2600(20)30349-0.
1. Ripley BD. Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press; 2014.
1. Sachs MC. plotROC: A Tool for Plotting ROC Curves. Journal of Statistical Software. 2017;79:1–19. doi: 10.18637/jss.v079.c02.
1. Sahanic S, Tymoszuk P, Ausserhofer D, Rass V, Pizzini A, Nordmeyer G, Hüfner K, Kurz K, Weber PM, Sonnweber T, Boehm A, Aichner M, Cima K, Boeckle B, Holzner B, Rumpold G, Puelacher C, Kiechl S, Huber A, Wiedermann CJ, Sperner-Unterweger B, Tancevski I, Bellmann-Weiler R, Bachler H, Piccoliori G, Helbok R, Weiss G, Loeffler-Ragg J. Phenotyping of acute and persistent COVID-19 features in the outpatient setting: exploratory analysis of an international cross-sectional online survey. Clinical Infectious Diseases. 2021;2:ciab978. doi: 10.1093/cid/ciab978.
1. Shah W, Hillman T, Playford ED, Hishmeh L. Managing the long term effects of covid-19: summary of NICE, SIGN, and RCGP rapid guideline. BMJ (Clinical Research Ed.) 2021;372:136. doi: 10.1136/bmj.n136.
1. Sonnweber T, Boehm A, Sahanic S, Pizzini A, Aichner M, Sonnweber B, Kurz K, Koppelstätter S, Haschka D, Petzer V, Hilbe R, Theurl M, Lehner D, Nairz M, Puchner B, Luger A, Schwabl C, Bellmann-Weiler R, Wöll E, Widmann G, Tancevski I, Weiss G. Persisting alterations of iron homeostasis in COVID-19 are associated with non-resolving lung pathologies and poor patients’ performance: a prospective observational cohort study. Respiratory Research. 2020;21:276. doi: 10.1186/s12931-020-01546-2.
1. Sonnweber T, Sahanic S, Pizzini A, Luger A, Schwabl C, Sonnweber B, Kurz K, Koppelstätter S, Haschka D, Petzer V, Boehm A, Aichner M, Tymoszuk P, Lener D, Theurl M, Lorsbach-Köhler A, Tancevski A, Schapfl A, Schaber M, Hilbe R, Nairz M, Puchner B, Hüttenberger D, Tschurtschenthaler C, Aßhoff M, Peer A, Hartig F, Bellmann R, Joannidis M, Gollmann-Tepeköylü C, Holfeld J, Feuchtner G, Egger A, Hoermann G, Schroll A, Fritsche G, Wildner S, Bellmann-Weiler R, Kirchmair R, Helbok R, Prosch H, Rieder D, Trajanoski Z, Kronenberg F, Wöll E, Weiss G, Widmann G, Löffler-Ragg J, Tancevski I. Cardiopulmonary recovery after COVID-19: an observational prospective multicentre trial. The European Respiratory Journal. 2021;57:2003481. doi: 10.1183/13993003.03481-2020.
1. Sudre CH, Lee KA, Lochlainn MN, Varsavsky T, Murray B, Graham MS, Menni C, Modat M, Bowyer RCE, Nguyen LH, Drew DA, Joshi AD, Ma W, Guo CG, Lo CH, Ganesh S, Buwe A, Pujol JC, du Cadet JL, Visconti A, Freidin MB, El-Sayed Moustafa JS, Falchi M, Davies R, Gomez MF, Fall T, Cardoso MJ, Wolf J, Franks PW, Chan AT, Spector TD, Steves CJ, Ourselin S. Symptom clusters in COVID-19: A potential clinical prediction tool from the COVID Symptom Study app. Science Advances. 2021a;7:12. doi: 10.1126/sciadv.abd4177.
1. Sudre CH, Murray B, Varsavsky T, Graham MS, Penfold RS, Bowyer RC, Pujol JC, Klaser K, Antonelli M, Canas LS, Molteni E, Modat M, Jorge Cardoso M, May A, Ganesh S, Davies R, Nguyen LH, Drew DA, Astley CM, Joshi AD, Merino J, Tsereteli N, Fall T, Gomez MF, Duncan EL, Menni C, Williams FMK, Franks PW, Chan AT, Wolf J, Ourselin S, Spector T, Steves CJ. Attributes and predictors of long COVID. Nature Medicine. 2021b;27:626–631. doi: 10.1038/s41591-021-01292-y.
1. Suliman YA, Dobrota R, Huscher D, Nguyen-Kim TDL, Maurer B, Jordan S, Speich R, Frauenfelder T, Distler O. Brief Report: Pulmonary Function Tests: High Rate of False-Negative Results in the Early Detection and Screening of Scleroderma-Related Interstitial Lung Disease. Arthritis & Rheumatology (Hoboken, N.J.) 2015;67:3256–3261. doi: 10.1002/art.39405.
1. Venkatesan P. NICE guideline on long COVID. The Lancet. Respiratory Medicine. 2021;9:129. doi: 10.1016/S2213-2600(21)00031-X.
1. Vesanto J, Alhoniemi E. Clustering of the self-organizing map. IEEE Transactions on Neural Networks. 2000;11:586–600. doi: 10.1109/72.846731.
1. Wang J. Consistent selection of the number of clusters via crossvalidation. Biometrika. 2010;97:893–904. doi: 10.1093/biomet/asq061.
1. Wehrens R, Kruisselbrink J. Flexible self-organizing maps in kohonen 3.0. Journal of Statistical Software. 2018;87:1–18. doi: 10.18637/jss.v087.i07.
1. Weston J, Watkins C. Multi-Class Support Vector Machines. Proceedings, European Symposium on Artificial Neural Networks; 1998.
1. WHO Coronavirus. 2021. [May 20, 2021].
1. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Cham: Springer-Verlag; 2016.
1. Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen T, Miller E, Bache S, Müller K, Ooms J, Robinson D, Seidel D, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H. Welcome to the Tidyverse. Journal of Open Source Software. 2019;4:1686. doi: 10.21105/joss.01686.
1. Wilcox ME, Patsios D, Murphy G, Kudlow P, Paul N, Tansey CM, Chu L, Matte A, Tomlinson G, Herridge MS. Radiologic outcomes at 5 years after severe ARDS. Chest. 2013;143:920–926. doi: 10.1378/chest.12-0685.
1. Wilke CO. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. Sebastopol: O’Reilly Media; 2019.
1. Zhou F, Tao M, Shang L, Liu Y, Pan G, Jin Y, Wang L, Hu S, Li J, Zhang M, Fu Y, Yang S. Assessment of Sequelae of COVID-19 Nearly 1 Year After Diagnosis. Frontiers in Medicine. 2021;8:717194. doi: 10.3389/fmed.2021.717194.

Source: PubMed
