An i2b2-based, generalizable, open source, self-scaling chronic disease registry

Marc D Natter, Justin Quan, David M Ortiz, Athos Bousvaros, Norman T Ilowite, Christi J Inman, Keith Marsolo, Andrew J McMurry, Christy I Sandborg, Laura E Schanberg, Carol A Wallace, Robert W Warren, Griffin M Weber, Kenneth D Mandl

Abstract

Objective: Registries are a well-established mechanism for obtaining high-quality, disease-specific data, but they are often highly project-specific in their design, implementation, and policies for data use. In contrast to the conventional model of centralized data contribution, warehousing, and control, we design a self-scaling registry technology for collaborative data sharing, based on the widely adopted Integrating Biology & the Bedside (i2b2) data warehousing framework and the Shared Health Research Information Network (SHRINE) peer-to-peer networking software.

Materials and methods: Focusing our design on a scalable solution for collaboration within multi-site disease registries, we leverage the i2b2 and SHRINE open source software to create a modular, ontology-based, federated infrastructure that gives research investigators full ownership of, and access to, their contributed data while supporting permissioned yet robust data sharing. We accomplish these objectives via web services supporting peer-group overlays, group-aware data aggregation, and administrative functions.
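As an illustration of peer-group overlays and group-aware aggregation, the following minimal Python sketch assumes that each site answers a count query locally and that an overlay is simply a named set of member site identifiers. The names `SiteNode`, `PeerGroupOverlay`, and `broadcast_count` are hypothetical and are not part of the i2b2 or SHRINE APIs. Only aggregate counts leave a site, reflecting the federated model in which investigators keep ownership of their contributed data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, str]

@dataclass
class SiteNode:
    """Hypothetical stand-in for one site's i2b2 node and SHRINE adapter."""
    site_id: str
    records: List[Record] = field(default_factory=list)  # patient-level data stays local

    def count(self, predicate: Callable[[Record], bool]) -> int:
        # Only an aggregate count is returned to the network.
        return sum(1 for r in self.records if predicate(r))

@dataclass
class PeerGroupOverlay:
    """Hypothetical overlay: a named subset of sites sharing one dataset."""
    name: str
    members: Set[str]

def broadcast_count(nodes: Dict[str, SiteNode], overlay: PeerGroupOverlay,
                    predicate: Callable[[Record], bool]) -> Tuple[Dict[str, int], int]:
    """Query only the overlay's member sites and aggregate their counts."""
    per_site = {sid: nodes[sid].count(predicate)
                for sid in sorted(overlay.members) if sid in nodes}
    return per_site, sum(per_site.values())

# Usage: site XYZ holds data but is not in this overlay, so it is never queried.
nodes = {
    "ABC": SiteNode("ABC", [{"dx": "JIA"}, {"dx": "SLE"}, {"dx": "JIA"}]),
    "DEF": SiteNode("DEF", [{"dx": "JIA"}]),
    "XYZ": SiteNode("XYZ", [{"dx": "JIA"}, {"dx": "JIA"}]),
}
registry = PeerGroupOverlay("registry-A", {"ABC", "DEF"})
per_site, total = broadcast_count(nodes, registry, lambda r: r["dx"] == "JIA")
print(per_site, total)  # {'ABC': 2, 'DEF': 1} 3
```

Because overlay membership is just a set, the same site can belong to several overlays (for example, one per registry or study) without copying its data.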

Results: The 56-site Childhood Arthritis & Rheumatology Research Alliance (CARRA) Registry and 3-site Harvard Inflammatory Bowel Diseases Longitudinal Data Repository now utilize i2b2 self-scaling registry technology (i2b2-SSR). This platform, extensible to federation of multiple projects within and between research networks, encompasses >6000 subjects at sites throughout the USA.

Discussion: We utilize the i2b2-SSR platform to minimize technical barriers to collaboration while enabling fine-grained control over data sharing.

Conclusions: The implementation of i2b2-SSR for the multi-site, multi-stakeholder CARRA Registry has established a digital infrastructure for community-driven research data sharing in pediatric rheumatology in the USA. We envision i2b2-SSR as a scalable, reusable solution facilitating interdisciplinary research across diseases.

Conflict of interest statement

Competing interests: None.

Figures

Figure 1
The i2b2-SSR architecture. The diagram illustrates the interaction of i2b2-SSR core web services C and D, which are customized, i2b2-SSR ‘drop-in’ replacements for the standard SHRINE Broadcaster/Aggregator and i2b2 Project Manager Cell, respectively. In coordination with the i2b2-SSR Overlay Service (E), these modules support introduction of peer-group overlays for sharing of multiple datasets (I) using standard i2b2 nodes and SHRINE adapters (detail H). The authorized end-user (A) constructs a query based on shared ontologies that are pre-defined for the shared datasets. The Shared Ontology Service (F) may employ a standard i2b2 Ontology cell; alternatively, we provide an i2b2 Ontology module with the i2b2-SSR distribution that implements memory-based caching with ontology term search and autocomplete capabilities.
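The memory-based ontology caching with term search and autocomplete mentioned in the caption can be sketched as follows. This is an assumption-laden illustration, not the i2b2-SSR Ontology module: the class `OntologyCache` and the slash-delimited term paths are hypothetical. Autocomplete uses binary search over display names sorted in memory; free-text search is a simple linear scan.

```python
import bisect
from typing import Dict, List, Tuple

class OntologyCache:
    """Hypothetical in-memory ontology cache with search and autocomplete."""

    def __init__(self, terms: Dict[str, str]) -> None:
        # terms maps an (illustrative) ontology path to its display name.
        # Sorted (lowercase name, path) pairs allow binary-search prefix lookup.
        self._index: List[Tuple[str, str]] = sorted(
            (name.lower(), path) for path, name in terms.items()
        )
        self._keys = [key for key, _ in self._index]

    def autocomplete(self, prefix: str, limit: int = 10) -> List[str]:
        """Return paths whose display name starts with prefix (case-insensitive)."""
        p = prefix.lower()
        start = bisect.bisect_left(self._keys, p)
        out: List[str] = []
        for key, path in self._index[start:]:
            if not key.startswith(p) or len(out) >= limit:
                break
            out.append(path)
        return out

    def search(self, text: str) -> List[str]:
        """Return paths whose display name contains text anywhere (linear scan)."""
        t = text.lower()
        return [path for key, path in self._index if t in key]

cache = OntologyCache({
    "/Diagnoses/JIA/": "Juvenile idiopathic arthritis",
    "/Diagnoses/JDM/": "Juvenile dermatomyositis",
    "/Diagnoses/SLE/": "Systemic lupus erythematosus",
    "/Medications/MTX/": "Methotrexate",
})
print(cache.autocomplete("juv"))  # ['/Diagnoses/JDM/', '/Diagnoses/JIA/']
print(cache.search("lupus"))      # ['/Diagnoses/SLE/']
```

Keeping the sorted index in memory makes each keystroke a logarithmic lookup rather than a database round trip, which is the usual motivation for caching an ontology behind an autocomplete field.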
Figure 2
End user query interface. The Site Investigator dashboard view is shown, illustrating a sample visualization of summary statistics for site ‘ABC’ versus the entire CARRAnet registry.
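The site-versus-registry comparison shown in the dashboard reduces to computing the same summary statistic twice, once over a single site's subjects and once over all subjects. How the actual dashboard computes its statistics is not described here; this Python sketch only shows the comparison itself, using hypothetical `(site_id, category)` pairs and a hypothetical function name `site_vs_registry`.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def site_vs_registry(records: Iterable[Tuple[str, str]],
                     site: str) -> Dict[str, Tuple[float, float]]:
    """Per category, return (share of the site's subjects, share of all registry subjects).

    records holds hypothetical (site_id, category) pairs, one per subject.
    """
    rows = list(records)
    site_counts = Counter(cat for s, cat in rows if s == site)
    all_counts = Counter(cat for _, cat in rows)
    n_site, n_all = sum(site_counts.values()), sum(all_counts.values())
    return {
        cat: (site_counts[cat] / n_site if n_site else 0.0, all_counts[cat] / n_all)
        for cat in sorted(all_counts)
    }

records = [("ABC", "JIA"), ("ABC", "JIA"), ("ABC", "SLE"), ("DEF", "JIA"),
           ("DEF", "JDM"), ("DEF", "JDM"), ("GHI", "SLE"), ("GHI", "JIA")]
stats = site_vs_registry(records, "ABC")
for cat, (site_share, registry_share) in stats.items():
    print(f"{cat}: ABC {site_share:.0%} vs registry {registry_share:.0%}")
# JDM: ABC 0% vs registry 25%
# JIA: ABC 67% vs registry 50%
# SLE: ABC 33% vs registry 25%
```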
Figure 3
Self-scaling architecture—adding new sites (network nodes) and/or studies to an i2b2-SSR network. With appropriate approvals, a Site Administrator (K) configures the local SHRINE adapter to communicate with a particular registry Broadcaster/Aggregator endpoint (C) and installs a digital certificate distributed by a certificate authority (L) that is mutually trusted by the site and the i2b2-SSR Network Administrator (J).
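The onboarding flow in the caption (point the local adapter at the registry's Broadcaster/Aggregator endpoint, then present a certificate from a mutually trusted authority) can be modeled in a few lines. This is a hedged sketch, not SHRINE configuration: `AdapterConfig`, `RegistryNetwork`, `admit`, and the example URL are hypothetical, and certificate trust is deliberately reduced to an issuer-name check, whereas a real deployment validates the full certificate chain during the TLS handshake.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass(frozen=True)
class AdapterConfig:
    """Hypothetical settings the Site Administrator (K) edits in the local SHRINE adapter."""
    site_id: str
    broadcaster_url: str  # registry Broadcaster/Aggregator endpoint (C)
    cert_issuer: str      # CA (L) that issued the installed site certificate

@dataclass
class RegistryNetwork:
    """Hypothetical broadcaster-side membership view kept by the Network Administrator (J)."""
    broadcaster_url: str
    trusted_cas: Set[str]
    nodes: Dict[str, AdapterConfig] = field(default_factory=dict)

    def admit(self, cfg: AdapterConfig) -> bool:
        # Simplification: trust is reduced to matching the issuer name; real
        # deployments verify the certificate chain during the TLS handshake.
        if cfg.broadcaster_url != self.broadcaster_url:
            return False  # adapter points at a different registry
        if cfg.cert_issuer not in self.trusted_cas:
            return False  # certificate not from a mutually trusted CA
        self.nodes[cfg.site_id] = cfg  # existing sites need no reconfiguration
        return True

net = RegistryNetwork("https://registry.example.org/broadcaster", {"Example Registry CA"})
ok = net.admit(AdapterConfig("ABC", "https://registry.example.org/broadcaster", "Example Registry CA"))
bad = net.admit(AdapterConfig("XYZ", "https://registry.example.org/broadcaster", "Unknown CA"))
print(ok, bad, sorted(net.nodes))  # True False ['ABC']
```

In this model, adding a site touches only that site's own configuration and the broadcaster's membership table, which is one way to read the "self-scaling" property the figure describes.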
Figure 4
CARRA Registry, selected demographics (as of February 2012, data from 53 sites), see also table 1. (A) Distribution of subject enrollment by site. The majority of registry subjects (3647 out of 6175 total subjects enrolled, or ∼60%) are found at sites enrolling

Source: PubMed