It's all in the timing: calibrating temporal penalties for biomedical data sharing

Weiyi Xia, Zhiyu Wan, Zhijun Yin, James Gaupp, Yongtai Liu, Ellen Wright Clayton, Murat Kantarcioglu, Yevgeniy Vorobeychik, Bradley A Malin

Abstract

Objective: Biomedical science is driven by datasets that are being accumulated at an unprecedented rate, with ever-growing volume and richness. There are various initiatives to make these datasets more widely available to recipients who sign Data Use Certificate agreements, whereby penalties are levied for violations. A particularly popular penalty is the temporary revocation, often for several months, of the recipient's data usage rights. This policy is based on the assumption that the value of biomedical research data depreciates significantly over time; however, no studies have been performed to substantiate this belief. This study investigates whether this assumption holds and what it implies for data science policy.

Methods: This study tests the hypothesis that the value of data for scientific investigators, measured by the impact of the publications based on the data, decreases over time. The hypothesis is tested formally with a linear mixed-effects model using approximately 1200 publications from 2007 to 2013 that used datasets from the Database of Genotypes and Phenotypes, a data-sharing initiative of the National Institutes of Health.
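
As an illustration only, a model of this kind could be written as below. The abstract does not state the random-effects structure, so the grouping index j (for example, the dataset a publication draws on), the random intercept u_j, and all symbols are assumptions; only the response log2(JIF) and the predictor "period" come from the figure captions.

```latex
\log_2(\mathrm{JIF}_{ij}) = \beta_0 + \beta_1\,\mathrm{period}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0,\sigma_u^2),\quad \varepsilon_{ij} \sim \mathcal{N}(0,\sigma^2)
```

Under this sketch, a negative fixed-effect slope \beta_1 would correspond to depreciation, and the same form would apply with the journal Eigenfactor score (JES) as the response.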

Results: The analysis shows that the impact factors for publications based on Database of Genotypes and Phenotypes datasets depreciate in a statistically significant manner. However, we further discover that the depreciation rate is slow, only ∼10% per year, on average.

Conclusion: The enduring value of data for subsequent studies implies that revoking usage for short periods of time may not sufficiently deter those who would violate Data Use Certificate agreements and that alternative penalty mechanisms may need to be invoked.
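
To make the scale of this effect concrete, the following minimal sketch computes how much publication impact would remain after a revocation of a given length. It assumes that the reported average rate of about 10% per year compounds geometrically and that time is measured in years; the function names and the 3, 6, and 12 month periods are hypothetical choices for illustration, not taken from the study.

```python
import math

# Approximate average annual depreciation reported in the Results (about 10%).
ANNUAL_DEPRECIATION = 0.10


def retained_fraction(months, annual_rate=ANNUAL_DEPRECIATION):
    """Fraction of impact remaining after `months`, assuming compound depreciation."""
    return (1.0 - annual_rate) ** (months / 12.0)


def implied_log2_slope(annual_rate=ANNUAL_DEPRECIATION):
    """Per-year slope on a log2 scale that corresponds to the given annual rate."""
    return math.log2(1.0 - annual_rate)


if __name__ == "__main__":
    # Hypothetical revocation lengths, in months.
    for months in (3, 6, 12):
        print(f"{months:>2} months: {retained_fraction(months):.3f} of impact retained")
    print(f"implied log2 slope per year: {implied_log2_slope():.3f}")
```

Under these assumptions, a revocation lasting a few months leaves most of the data's value intact, which is the intuition behind the conclusion above.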

Keywords: biomedical data science; data sharing; economics of data; genomics; policy.

© The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Figures

Figure 1. The triage process for the data used in this study.
Figure 2. The probability densities of journal impact factor (JIF) and journal Eigenfactor score (JES).
Figure 3. The residual normal Q-Q plot of the log transformation of the journal impact factor (log2 (JIF)).
Figure 4. The residual normal Q-Q plot of the log transformation of the journal Eigenfactor score (log2 (JES)).
Figure 5. A scatterplot, with a LOESS smoothing curve, of the log transformation of the journal impact factor (log2 (JIF)) vs the period.
Figure 6. A scatterplot, with a LOESS smoothing curve, of the log transformation of the journal Eigenfactor score (log2 (JES)) vs the period.

Source: PubMed
