Prospective recruitment of patients with congestive heart failure using an ad-hoc binary classifier

Serguei V Pakhomov, James Buntrock, Christopher G Chute

Abstract

This paper addresses the problem of identifying patients diagnosed with a specific condition for potential recruitment into a clinical trial or an epidemiological study. We present a simple machine learning method for identifying patients diagnosed with congestive heart failure and related conditions by automatically classifying clinical notes dictated at Mayo Clinic. The method relies on an automatic classifier trained on comparable numbers of positive and negative samples of clinical notes previously categorized by human experts. The documents are represented as feature vectors whose features combine demographic information with single words and concept mappings to the MeSH and HICDA classification systems. We compare two simple and efficient classification algorithms (Naïve Bayes and Perceptron) and a baseline term-spotting method with respect to their accuracy and their recall on positive samples. Depending on the test set, we find that Naïve Bayes yields better recall on positive samples than Perceptron (95% vs. 86%) but worse accuracy (57% vs. 65%). Both algorithms outperform the baseline, which achieves a recall on positive samples of 71% and an accuracy of 54%.
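The comparison described in the abstract can be illustrated with a minimal sketch in Python. It is not the authors' implementation: the notes are invented toy data, the features are plain lowercase words only (no demographic, MeSH, or HICDA features), the spotting terms "chf" and "failure" are hypothetical, and the Naïve Bayes variant (multinomial with add-one smoothing) and the perceptron update rule are assumptions, since the abstract does not specify them. The sketch only shows how accuracy and recall on positive samples might be computed for the three methods.

```python
import math
from collections import Counter

def featurize(text):
    """Bag of lowercase words (assumption: stands in for the paper's richer features)."""
    return Counter(text.lower().split())

def train_naive_bayes(docs, labels):
    """Multinomial Naive Bayes; add-one smoothing is applied at prediction time."""
    model = {"vocab": {w for d in docs for w in d}, "prior": {}, "counts": {}, "total": {}}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        counts = Counter()
        for d in class_docs:
            counts.update(d)
        model["prior"][c] = math.log(len(class_docs) / len(docs))
        model["counts"][c] = counts
        model["total"][c] = sum(counts.values())
    return model

def predict_naive_bayes(model, doc):
    vocab_size = len(model["vocab"])
    def log_score(c):
        s = model["prior"][c]
        for w, n in doc.items():
            if w in model["vocab"]:  # words unseen in training are skipped
                s += n * math.log((model["counts"][c][w] + 1) / (model["total"][c] + vocab_size))
        return s
    return max(model["prior"], key=log_score)

def train_perceptron(docs, labels, epochs=10):
    """Classic perceptron: adjust weights only when a training note is misclassified."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for d, y in zip(docs, labels):
            target = 1 if y == 1 else -1
            activation = bias + sum(weights[w] * n for w, n in d.items())
            if target * activation <= 0:
                for w, n in d.items():
                    weights[w] += target * n
                bias += target
    return weights, bias

def predict_perceptron(model, doc):
    weights, bias = model
    return 1 if bias + sum(weights[w] * n for w, n in doc.items()) > 0 else 0

def predict_term_spotting(doc, terms=("chf", "failure")):
    """Baseline: positive if any (hypothetical) key term occurs in the note."""
    return 1 if any(t in doc for t in terms) else 0

def evaluate(predictions, labels):
    """Return (accuracy, recall on positive samples)."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    true_pos = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
    return correct / len(labels), true_pos / sum(y == 1 for y in labels)

# Invented toy notes, for illustration only (1 = heart failure, 0 = other).
train = [("patient with congestive heart failure and edema", 1),
         ("chf exacerbation with shortness of breath", 1),
         ("heart failure follow up on diuretics", 1),
         ("knee pain after fall no cardiac history", 0),
         ("routine visit for seasonal allergies", 0),
         ("renal failure ruled out normal labs", 0)]
test = [("worsening congestive heart failure", 1),
        ("edema and shortness of breath on diuretics", 1),
        ("seasonal allergies and knee pain", 0),
        ("normal labs no cardiac history", 0)]

train_docs = [featurize(t) for t, _ in train]
train_labels = [y for _, y in train]
test_docs = [featurize(t) for t, _ in test]
test_labels = [y for _, y in test]

nb = train_naive_bayes(train_docs, train_labels)
perceptron = train_perceptron(train_docs, train_labels)

results = {
    "naive_bayes": evaluate([predict_naive_bayes(nb, d) for d in test_docs], test_labels),
    "perceptron": evaluate([predict_perceptron(perceptron, d) for d in test_docs], test_labels),
    "term_spotting": evaluate([predict_term_spotting(d) for d in test_docs], test_labels),
}
```

On this toy set the term-spotting baseline misses the positive note that mentions neither key term, which is the kind of recall gap a trained classifier is meant to close. The figures it produces say nothing about the results reported in the abstract.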

Source: PubMed
