Natural Language Processing (NLP) Analysis of Free Text Notes to Investigate Coronavirus (COVID-19)

July 22, 2021 updated by: Dr Sapna Trivedi, Cambridge University Hospitals NHS Foundation Trust

A Database and Analytics Study of Free Text Clinical Notes and Structured Data to Investigate Phenotype Associations With Outcomes in Patients With COVID-19

A retrospective cohort study investigating clinical notes using Natural Language Processing in combination with structured data from the Electronic Health Record (EHR) to create a database for analytics to identify features associated with outcomes.

Study Overview

Status

Completed

Conditions

Detailed Description

Patients admitted to Cambridge University Hospitals (CUH) with COVID-19 have undergone routine clinical documentation and specific investigation and testing for COVID-19. The pathway for these patients ranges from supportive measures on the ward to deterioration requiring Intensive Therapy Unit (ITU) admission and ventilatory support. Patients are also at risk of developing complications such as acute kidney injury and thromboembolism. Identifying the risk factors for these and other outcomes, such as the requirement for ventilation, remains a challenge. Reviewing the clinical data for these patients is critical to understanding the relationship between patient characteristics and outcomes.

Data are available in structured fields in the EHR, but these are sometimes incomplete and inaccurate. An assessment of the free text clinical notes provides an opportunity to fill in the gaps and provide a much richer dataset for evaluation. We plan to use Natural Language Processing (NLP), a field of machine learning that allows computers to analyse human language, to review the Discharge Summaries of patients admitted to hospital with COVID-19 and convert free text data into structured data for analysis.

The NLP techniques developed by Dr Collier's team include methods for coding free text to SNOMED CT and other biomedical ontologies. These methods, based on statistical machine learning from human-annotated texts, have been benchmarked on scientific texts and social media. In this project we intend to adapt these techniques to patient records, which will require a number of human-annotated patient records. The NLP output will be combined with structured data from the EHR and undergo statistical analysis to identify the rates of complications in patients with COVID-19 and the risk factors associated with them. This may help to guide management decisions by enabling earlier intervention to prevent poor outcomes in these patients.
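As a rough illustration of the idea of turning free text into structured fields and combining it with EHR data, the sketch below uses a simple dictionary lookup. This is not the study's method, which is described as statistical machine learning trained on annotated text. The lexicon, the concept identifiers (placeholders, not real SNOMED CT codes), the field names, and the example note are all hypothetical.

```python
import re

# Hypothetical lexicon: surface phrase -> placeholder concept ID.
# These IDs are illustrative only and are NOT real SNOMED CT codes.
LEXICON = {
    "acute kidney injury": "CONCEPT_AKI",
    "aki": "CONCEPT_AKI",
    "pulmonary embolism": "CONCEPT_PE",
    "deep vein thrombosis": "CONCEPT_DVT",
}


def extract_concepts(note: str) -> set:
    """Return the placeholder concept IDs whose phrases occur in the note."""
    text = note.lower()
    found = set()
    for phrase, concept in LEXICON.items():
        # Whole-word match so that "aki" does not match inside other words.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.add(concept)
    return found


def combine(structured: dict, note: str) -> dict:
    """Merge structured EHR fields with concepts extracted from free text."""
    record = dict(structured)  # copy so the input is not modified
    record["nlp_concepts"] = sorted(extract_concepts(note))
    return record


# Hypothetical structured fields and discharge summary text.
example = combine(
    {"patient_id": "P001", "length_of_stay_days": 9, "itu_admission": True},
    "Admitted with COVID-19. Developed AKI on day 3 and a pulmonary embolism on day 5.",
)
print(example["nlp_concepts"])
```

A lookup like this does not handle negation ("no evidence of pulmonary embolism"), abbreviations outside the lexicon, or spelling variants, which is part of why a trained model and human-annotated records would be needed in practice.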

Study Type

Observational

Enrollment (Actual)

200

Contacts and Locations

This section provides the contact details for those conducting the study, and information on where this study is being conducted.

Study Locations

      • Cambridge, United Kingdom
        • Cambridge University Hospitals NHS Foundation Trust

Participation Criteria

Researchers look for people who fit a certain description, called eligibility criteria. Some examples of these criteria are a person's general health condition or prior treatments.

Eligibility Criteria

Ages Eligible for Study

18 years to 100 years (Adult, Older Adult)

Accepts Healthy Volunteers

N/A

Genders Eligible for Study

All

Sampling Method

Non-Probability Sample

Study Population

Patients admitted to Cambridge University Hospitals NHS Foundation Trust with confirmed COVID-19

Description

Inclusion Criteria:

  • Male and female
  • Age range: 18 to 100 years
  • Patients admitted to Cambridge University Hospitals with confirmed COVID-19 on lab testing

Exclusion Criteria:

Children and patients with a negative COVID test.

Study Plan

This section provides details of the study plan, including how the study is designed and what the study is measuring.

How is the study designed?

Design Details

What is the study measuring?

Primary Outcome Measures

Outcome Measure: Research database of EHR records from COVID-19 patients processed using NLP tools for named entity recognition and linking adapted to CUH EMR data to identify variables of interest
Measure Description: Our overarching hypothesis is that the NLP-extracted data from the free-text discharge summary can be combined with structured data from the EMR to yield insights into the development of complications. Patients with severe disease requiring ITU admission and patients with non-severe disease managed on an inpatient ward will be included. The variables of interest will include patient characteristics and specific encounter-related information, including length of stay, baseline investigations (e.g., blood tests) and interventions received.
Time Frame: 1 year

Secondary Outcome Measures

Outcome Measure: A set of annotation guidelines to produce human-expert (gold) labelled data for a subset of the EHR
Time Frame: 6 months

Outcome Measure: A comparison of the NLP output to terms in the structured problem list to identify missing terms in the structured problem list
Time Frame: 1 year
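
The comparison of NLP output against the structured problem list can be thought of as a set difference. The sketch below is a minimal illustration under that assumption. The function name, the normalisation step, and the example terms are hypothetical and are not taken from the study.

```python
def normalise(term: str) -> str:
    """Lower-case a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def missing_from_problem_list(nlp_terms, problem_list):
    """Return NLP-extracted terms that are absent from the structured problem list."""
    recorded = {normalise(t) for t in problem_list}
    extracted = {normalise(t) for t in nlp_terms}
    return sorted(extracted - recorded)


# Hypothetical terms for a single patient.
missing = missing_from_problem_list(
    nlp_terms=["Acute kidney injury", "Pulmonary embolism", "COVID-19"],
    problem_list=["COVID-19", "acute  kidney injury"],
)
print(missing)
```

Exact string matching after normalisation is a simplification. Comparing on ontology codes rather than surface text would avoid treating synonyms as missing terms.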

Collaborators and Investigators

This is where you will find people and organizations involved with this study.

Study record dates

These dates track the progress of study record and summary results submissions to ClinicalTrials.gov. Study records and reported results are reviewed by the National Library of Medicine (NLM) to make sure they meet specific quality control standards before being posted on the public website.

Study Major Dates

Study Start (Actual)

July 1, 2020

Primary Completion (Actual)

July 1, 2021

Study Completion (Actual)

July 1, 2021

Study Registration Dates

First Submitted

June 15, 2020

First Submitted That Met QC Criteria

June 15, 2020

First Posted (Actual)

June 16, 2020

Study Record Updates

Last Update Posted (Actual)

July 28, 2021

Last Update Submitted That Met QC Criteria

July 22, 2021

Last Verified

July 1, 2021

More Information

Terms related to this study

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.
