A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data

Michael G Kahn, Tiffany J Callahan, Juliana Barnard, Alan E Bauck, Jeff Brown, Bruce N Davidson, Hossein Estiri, Carsten Goerg, Erin Holve, Steven G Johnson, Siaw-Teng Liaw, Marianne Hamilton-Lopez, Daniella Meeker, Toan C Ong, Patrick Ryan, Ning Shang, Nicole G Weiskopf, Chunhua Weng, Meredith N Zozus, Lisa Schilling

Abstract

Objective: Harmonized data quality (DQ) assessment terms, methods, and reporting practices can establish a common understanding of the strengths and limitations of electronic health record (EHR) data for operational analytics, quality improvement, and research. Existing published DQ terms were harmonized to a comprehensive unified terminology with definitions and examples and organized into a conceptual framework to support a common approach to defining whether EHR data is 'fit' for specific uses.

Materials and methods: DQ publications, informatics and analytics experts, managers of established DQ programs, and operational manuals from several mature EHR-based research networks were reviewed to identify potential DQ terms and categories. Two face-to-face stakeholder meetings were used to vet an initial set of DQ terms and definitions that were grouped into an overall conceptual framework. Feedback received from data producers and users was used to construct a draft set of harmonized DQ terms and categories. Multiple rounds of iterative refinement resulted in a set of terms and an organizing framework consisting of DQ categories, subcategories, terms, definitions, and examples. The inclusiveness of the harmonized terminology and logical framework was evaluated against ten published DQ terminologies.

Results: Existing DQ terms were harmonized and organized into a framework by defining three DQ categories: (1) Conformance, (2) Completeness, and (3) Plausibility; and two DQ assessment contexts: (1) Verification and (2) Validation. The Conformance and Plausibility categories were further divided into subcategories. Each category and subcategory was defined with respect to whether the data may be verified with organizational data or validated against an accepted gold standard, depending on proposed context and uses. The coverage of the harmonized DQ terminology was validated by successfully aligning it to multiple published DQ terminologies.
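As an illustration only, the three categories and two assessment contexts described above can be sketched as a small data structure in which each DQ check is tagged with one category and one context. This is a minimal sketch, not the authors' implementation: the names `Category`, `Context`, `DQCheck`, and `summarize`, and the three example checks, are assumptions invented here. The subcategories are omitted because the abstract does not name them.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    CONFORMANCE = "Conformance"
    COMPLETENESS = "Completeness"
    PLAUSIBILITY = "Plausibility"


class Context(Enum):
    VERIFICATION = "Verification"  # checked against organizational data
    VALIDATION = "Validation"      # checked against an accepted gold standard


@dataclass(frozen=True)
class DQCheck:
    """One data quality check placed in a (category, context) cell."""
    name: str
    category: Category
    context: Context
    passed: bool


def summarize(checks):
    """Count failed checks per (category, context) cell of the framework."""
    failures = Counter()
    for check in checks:
        if not check.passed:
            failures[(check.category, check.context)] += 1
    return failures


# Hypothetical checks, invented for illustration only.
checks = [
    DQCheck("sex code is in the allowed value set",
            Category.CONFORMANCE, Context.VERIFICATION, True),
    DQCheck("birth date is present",
            Category.COMPLETENESS, Context.VERIFICATION, False),
    DQCheck("height agrees with an external registry",
            Category.PLAUSIBILITY, Context.VALIDATION, False),
]

failures = summarize(checks)
for (category, context), count in failures.items():
    print(f"{category.value} / {context.value}: {count} failed")
```

Tagging every check with both a category and a context gives a report that can be read as a three-by-two grid, which is one plausible way to apply a shared vocabulary when communicating DQ findings.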

Discussion: Existing DQ concepts, community input, and expert review informed the development of a distinct set of terms, organized into categories and subcategories. The resulting DQ terms successfully encompassed a wide range of disparate DQ terminologies. Operational definitions were developed to provide guidance for implementing DQ assessment procedures. The resulting structure is an inclusive DQ framework for standardizing DQ assessment and reporting. While our analysis focused on the DQ issues often found in EHR data, the new terminology may be applicable to a wide range of electronic health data such as administrative, research, and patient-reported data.

Conclusion: A consistent, common DQ terminology, organized into a logical framework, is an initial step in enabling data owners and users, patients, and policy makers to evaluate and communicate data quality findings in a well-defined manner with a shared vocabulary. Future work will leverage the framework and terminology to develop reusable data quality assessment and reporting methods.

Keywords: data completeness; data use & quality; electronic health records.

Figures

Figure 1.
Timeline of Significant Events in Developing the Harmonized DQ Terminology

Source: PubMed
