Development of a large-scale de-identified DNA biobank to enable personalized medicine

D M Roden, J M Pulley, M A Basford, G R Bernard, E W Clayton, J R Balser, D R Masys

Abstract

Our objective was to develop a DNA biobank linked to phenotypic data derived from an electronic medical record (EMR) system. An "opt-out" model was implemented after significant review and revision. The plan included (i) development and maintenance of a de-identified mirror image of the EMR, namely, the "synthetic derivative" (SD) and (ii) DNA extracted from discarded blood samples and linked to the SD. Surveys of patients indicated general acceptance of the concept, with only a minority (approximately 5%) opposing it. As a result, mechanisms to facilitate opt-out included publicity and revision of a standard "consent to treatment" form. Algorithms for sample handling and procedures for de-identification were developed and validated in order to ensure acceptable error rates (<0.3% and <0.1%, respectively). The rate of sample accrual is 700-900 samples/week. The advantages of this approach are the rate of sample acquisition and the diversity of phenotypes based on EMRs.

Conflict of interest statement

The authors declared no conflict of interest.

Figures

Figure 1
Examples of under- and overmarking. The original text is shown on the left and the result of the scrubbing process is shown in the middle. The target text and the result of scrubbing are highlighted in red. (Lortab; Mikart, Atlanta, GA.)
Figure 2
A descriptive example of a record in the synthetic derivative (SD) described in the text. The arrows indicate examples of scrubbing: the medical record number has been removed (black), the social security and phone numbers have been masked (blue), names have been changed (purple), and dates have been shifted (red) as described in Methods.
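Two of the scrubbing steps named in this caption, masking of social security and phone numbers and shifting of dates, can be illustrated with a minimal Python sketch. This is not the authors' scrubbing software: the regular expressions, the mask strings, the ISO date format, and the idea of one constant `shift_days` offset per record are all assumptions made for illustration, and name replacement and medical record number removal are not covered.

```python
import re
from datetime import date, timedelta

# Hypothetical patterns: real clinical text needs far broader coverage.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE_PATTERN = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def scrub(text: str, shift_days: int) -> str:
    """Mask SSNs and phone numbers, then shift ISO dates by a fixed offset.

    Using the same shift_days for every date in a record (an assumption)
    preserves the intervals between events while hiding the true dates.
    """
    text = SSN_PATTERN.sub("XXX-XX-XXXX", text)
    text = PHONE_PATTERN.sub("XXX-XXX-XXXX", text)

    def shift(match: re.Match) -> str:
        year, month, day = (int(part) for part in match.groups())
        try:
            original = date(year, month, day)
        except ValueError:
            # Not a valid calendar date; leave the text untouched.
            return match.group(0)
        return (original + timedelta(days=shift_days)).isoformat()

    return DATE_PATTERN.sub(shift, text)


note = "SSN 123-45-6789, phone 615-555-0100, seen 2006-03-14"
scrubbed = scrub(note, shift_days=-10)
```

Pattern-based scrubbing of this kind is where the under- and overmarking shown in Figure 1 arises: a pattern that is too narrow misses identifiers, and one that is too broad masks clinically useful text.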
Figure 3
Synthetic derivative interrogation tool. Search criteria are entered in the blue box, and entries and potential records are returned, with the “keywords in context” shown below. The user then has the option of including the record in the sample set to be analyzed. (Ciprofloxacin; Bayer HealthCare, West Haven, CT.)
Figure 4
Mechanism for linking DNA samples and patient-related information in a de-identified fashion. The approach depends on the use of a one-way hash, an algorithm that always generates the same 128-character code (the research unique identifier, RUI) when the same medical record number is used as input. The medical record number on barcoded blood samples that are about to be discarded is scanned, eligible samples are relabeled with the RUI, and DNA is extracted and stored. The medical record number in each patient’s record is replaced by the RUI, and the record is de-identified to create the synthetic derivative described in the text.
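The linkage described in this caption can be sketched in a few lines of Python. The caption does not name the hash algorithm; SHA-512 is assumed here only because its hexadecimal digest happens to be 128 characters long. The function name `research_unique_identifier` and the optional `secret` parameter are hypothetical and do not come from the source.

```python
import hashlib


def research_unique_identifier(mrn: str, secret: str = "") -> str:
    """Return a 128-character one-way code for a medical record number.

    SHA-512 is an assumption: the source only says the code is 128
    characters and that the same input always yields the same output.
    The optional secret is a hypothetical addition, not from the source.
    """
    normalized = mrn.strip()
    payload = (secret + normalized).encode("utf-8")
    return hashlib.sha512(payload).hexdigest()


# The same medical record number is hashed in two places: once when the
# blood sample's barcode is scanned, and once when the EMR record is
# de-identified. Matching codes link DNA to the synthetic derivative.
rui_on_sample = research_unique_identifier("000123456")
rui_on_record = research_unique_identifier("000123456")
other_patient = research_unique_identifier("000123457")
```

Because the function is deterministic, the sample and the record receive the same RUI without any lookup table that maps RUIs back to medical record numbers. Since medical record numbers come from a small, guessable space, a plain hash could in principle be reversed by hashing every candidate number, which is why the sketch allows a secret value to be mixed in; whether the biobank does this is not stated in the source.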
Figure 5
Program review. The program plan was reviewed by the Office for Human Research Protections (OHRP) and the Institutional Review Board (IRB). The IRB recommended further review from the standpoint of ethics, and the Ethics Review recommendations included the formation of a Community Advisory Board. The IRB, Ethics, and Community reviews resulted in program revisions, and these are ongoing.

Source: PubMed
