Sharing raw data from clinical trials: what progress since we first asked “Whose data set is it anyway?”

Ten years ago, one of the first papers published in Trials was a commentary entitled “Whose data set is it anyway?” The commentary pointed out that trialists routinely refused requests for data sharing and argued that this attitude was a community standard that had no rational basis. At the time, there had been few calls for clinical trial data sharing and certainly no institutional support. Today the situation could not be more different. Numerous organizations now recommend or require raw data to be made available, including the International Committee of Medical Journal Editors, which recently proposed that clinical trial data sharing be a “condition of … publication.” Furthermore, the literature is replete with papers covering an enormously wide variety of topics on data sharing. But despite a tectonic shift in attitudes, we are yet to see clinical trial data sharing become an unquestioned norm, where a researcher can readily download a data set from a trial almost as easily as they can now download a copy of the published paper. The battle over the next few years is to go beyond changing minds to ensuring that real data sets are routinely made available.


Background
It is often said that people don't change. Indeed, it is almost a point of academic pride to be cynical about our capacity for change, and to view optimists as naïve and callow. Yet change often happens remarkably quickly. In 2004, President George W Bush used opposition to gay marriage to motivate his supporters; in 2015, the Supreme Court legalized gay marriage with the majority support of the American public. In 1988, Jesse Jackson's run for president was considered a token exercise, leading Newsweek magazine to ask: "What makes Jesse run?"; 2008 saw the election of Barack Obama. I think we have seen similarly rapid change in terms of attitudes to data sharing. Ten years ago, one of the first papers published in Trials was a commentary entitled "Whose data set is it anyway?" [1]. The commentary pointed out that trialists tended to see trial data as their personal property and would routinely refuse requests for data sharing. As just one example, a National Institutes of Health (NIH) investigator refused to release data from the control group of a published trial, requested to help the sample size calculation for a new study. Anecdotes were complemented by survey data showing that three quarters of trialists, as well as pharmaceutical industry groups, were opposed to making raw data available after trial publication.
The key argument of the commentary was that this attitude was a community standard that had no rational basis. Arguments against data sharing were entirely trivial, such as spurious concerns about patient confidentiality (in most cases, it is straightforward to deidentify a data set) or complaints about the time and effort an investigator would have to invest in making a data set ready for sharing (would they not already have had to do so in order to analyze the data for publication?). Moreover, other disciplines, from genomic researchers to economists, routinely made data freely available. The clinical trialists did not share data because that is not what clinical trialists did, a social norm not much different in form from attitudes towards gays and blacks.
At the time of the 2006 Trials commentary, only a handful of papers had previously called for data sharing. There was a paper published 10 years previously in the BMJ, the title of which, in a case of inadvertent plagiarism, the commentary had mirrored [2]. Kirwan's review of data-sharing attitudes [3] had been cited in the commentary and, of course, one could go all the way back to the first issue of Biometrika, in which Galton called for publication of data alongside the primary analyses [4]. It might also be noted that at the time of the commentary, no major institution had called for clinical trial data sharing to be a matter of course.

Main text
Today, ten years on from publication of the Trials commentary, the situation could not be more different: numerous organizations now recommend or require raw data to be made available, and the literature is replete with papers covering an enormously wide variety of topics on data sharing. In terms of recommendations, clinical trial data sharing has been the subject of a report from the Institute of Medicine [5,6], which recommends, among other things, that funders should require trialists to share data and provide appropriate support to do so. Funders have certainly shown interest, with a group of 17 funders led by the Wellcome Trust publishing a "statement of purpose" on data sharing, including a set of principles [7]. Some funders have gone beyond principles: the National Heart, Lung, and Blood Institute [8], for instance, has developed specific data-sharing practices and a data repository currently including over half a million patients from over 100 trials and observational studies. In a recent, dramatic development, the International Committee of Medical Journal Editors has recommended that as a "condition of … publication" of a trial report, journals will "require authors to share with others the deidentified individual-patient data no later than 6 months after publication" [9]. If fully enacted, this recommendation would transform the landscape of clinical trial data sharing. The BMJ has already taken the lead, with a policy that now requires data sharing "on request" for all trials [10]. Some pharmaceutical groups are following suit, with Roche stating that they will provide individual patient data from clinical trials in response to requests with "good scientific merit" [11]. Project Data Sphere [12] is an industry-led initiative to provide a software platform for clinical trial data sharing, and initiatives by GSK and Medtronic to share clinical trial data have received wide praise [10].
Alongside these initiatives and recommendations, a substantial literature has been published that investigates data sharing as a research topic. We have seen papers developing data standards for clinical trials in narrow fields (for instance, polycystic kidney disease [13] and spinal cord injury [14]); technical papers on deidentification [15]; numerous surveys about the practice of or attitudes to data sharing [16][17][18][19][20][21]; discussion of ethical issues [22] (including those pertaining to highly localized issues in countries such as South Africa [23] or Vietnam [24]); and practical guidance on how to share data [25,26].
All that said, the war is far from won: attitudes have shifted dramatically, tectonically, but we are yet to see clinical trial data sharing become an unquestioned norm, where, say, a researcher can readily download a data set from a trial almost as easily as they can now download the trial publication. And there are still battles to be fought: the Pharmaceutical Research and Manufacturers of America, for instance, claims to be "firmly committed to enhancing public health" but current guidelines on communication of trial results [27] speak of making clinical trial data accessible only to investigators.

Conclusions
I draw three conclusions from my experiences in promoting clinical trial data sharing. First, we are blessed to be working in a discipline in which reason matters, and where individuals will change their attitudes when presented with sound arguments. Second, dramatic cultural change is indeed possible within a short period of time, if the cause is just. Third, changing attitudes is not enough. In the "stages of change" model describing how, say, a smoker quits smoking cigarettes, "contemplation" and "preparation" need to be followed by "action" and "maintenance." The 2006 commentary ended: "Let's make sharing of raw data a commonplace, natural part of the clinical trials process, in the same way that we view obtaining ethical approval or publication of the trial results." Our job over the next decades will be to make sure, first, that this does indeed happen and, second, that it stays that way.