General results
For our 18 trials, a small proportion of all the data items collected, a median of 5.0%, were for the primary outcome assessment, that is, the outcome assessment considered by the trialists themselves to be of the highest importance in their trial. The median is a better summary than the mean here because one trial, ViDiFlu, devoted half of its data items to the primary outcome.
Clearly the primary outcome is not the only thing that matters. One outcome is unlikely to provide all the information needed to make a judgement about the treatment being tested, so secondary outcomes are necessary. We also need to know something about harms and costs. Participant identifiers and some data management and process information will always be needed too, and trials may need to measure more than a single primary outcome.
In our sample the median proportion of secondary outcome data items was eight times that of the primary outcome, and the ratio of non-outcome data to primary outcome data was similar. This might be fine. However, the undeniable design importance of the primary outcome, together with its importance to external judgements about the utility of the intervention, makes this distribution of attention look odd. At a minimum it is worthy of some reflection.
Our study raises three key questions:
1. Given that the primary outcome is the most important and (likely) the only fully powered outcome, is the substantially larger proportion of data collected for secondary outcomes justified?
2. Do people really appreciate how much non-outcome data trials collect?
3. Does volume of data correlate with data collection effort?
Our study answers none of these questions, but it does highlight how important it is to try to answer them. Data collection itself is hard work, and it generates additional work by requiring data management systems, quality assurance and, usually, data entry. Given the undoubted importance of the primary outcome, we need to be sure that all of the outcomes in our set of secondary outcomes, many if not all of them underpowered, are worth the effort. If data collection effort does relate to data volume, then it seems disproportionate for trial teams to devote around eight times as much effort to the secondary outcomes as to the primary. Secondary outcomes may support understanding of the primary outcome result, but they are not the outcomes that trialists themselves consider to be of most importance. A new Trial Forge project called ORINOCO (Optimising Resource Use in Outcome Collection) will look at data collection effort by measuring the time spent collecting primary and secondary outcomes (https://www.abdn.ac.uk/hsru/what-we-do/research/projects/orinoco-826.php).
Why trial teams collect so much data is unclear but, anecdotally, we (and others) know that some investigators reason that, since the participant is providing data anyway, they might as well collect a few more things. The work involved in doing this is unlikely to be borne by the person making the request. Items unrelated to the original aims of the trial are added; trial team members have their own interests, and each adds something to the data collection job. Additional items can also be added by Trial Steering Groups and, for that matter, funders. When it comes to data collection, the tendency always seems to be upwards.
That said, our own anecdotal experience, and that of others [16], is that when the going gets tough with outcome collection, trial teams quickly start to focus on getting just primary outcome data from participants. This brings into stark focus the relative importance of those secondary outcomes. Secondary outcomes can address related questions and provide context in which to interpret the primary outcome, but we need to keep their relative importance in mind when selecting how many of them to collect. For definitive Phase III trials such as those we selected, a secondary outcome, like the primary, should be essential to the people (generally patients, healthcare professionals and policymakers) whose decisions the trial is intended to support. Anything else is garnish, which has clear resource implications in the cash-limited world of publicly funded research.
The amount of non-outcome data was a surprise. That a median of just under 5% of all data collected was linked to the participant ID was not a result we expected, nor was the finding that internal data management items (e.g. ticking a box if a process was completed) accounted for almost 6%. Some of this cannot be avoided, but even here there is likely to be scope for efficiencies. For example, the proportion of data items linked to demographics ranged from 0 to 6.9%, with a median of 0.5%. Across most of our trials, around 2% of data were demographic. Trial designers should ask themselves at the beginning of a trial what a reasonable volume of demographic (or other) data is, make sure they are resourced to collect at that level and have a clear use for these data once collected.
Our data underline that non-outcome data represent a substantial proportion of the data that participants need to provide and trial staff need to work with. Reducing the burden of trials on participants and staff was highlighted as an area in need of research to improve retention by the PRIORITY-2 project [10], and ways of assessing burden have been proposed [17]. Trial teams also need to choose carefully the non-outcome data that will be collected so that trial budgets and resources can be allocated proportionately. Those making policy and governance decisions about research (e.g. sponsors, regulators) need to weigh their requirements for non-outcome data against the work needed to collect and manage them. Although we included only three CTIMPs, the impact of regulation (or at least how regulation is interpreted) on data collection workload is visible: for our three CTIMPs a median of 11.5% of all data items collected were classed as regulatory, compared with 1.9% for non-CTIMPs. Regulatory decisions can directly lead to thousands of extra items of data collection across hundreds of trials. Regulators need to be confident that, on balance, their requirements do more good than harm, and they should increase the transparency of those requirements. Grey areas around what is needed to meet the conditions imposed can lead to over-collection of data because researchers may not be clear about exactly what is required. Some of the potential harm is workload, particularly if conservative interpretation, or misinterpretation, of legislation by research administrators adds unnecessary data collection requirements [18].
Limitations and strengths
Our work has limitations. The 18 included trials were a convenience sample rather than a random sample of published trials. We quickly found that the categorisation process required someone close to each included trial, and choosing trials that none of the team knew would have made categorisation difficult. As such, we do not claim that our results are representative of all trials. However, all the included trials are real trials, not hypothetical ones, and they vary enormously in terms of trial teams, interventions, sample sizes and follow-up durations. We would be surprised if the headline result of substantially more data items dedicated to secondary outcomes than to primary outcomes was overturned in a bigger sample. Our categorisation method (see Additional file 1) can be replicated by others for their own trials, and we could perhaps build up a larger sample over time.
Only three CTIMPs were included in our sample, which limits what we can say about comparisons between CTIMPs and non-CTIMPs. More regulatory and safety data appear to be collected in CTIMPs, but determining exactly how much would require rather more CTIMP trials. Moreover, some regulators (e.g. the UK’s Medicines and Healthcare products Regulatory Agency, the MHRA) categorise CTIMPs by risk, which means that we would need not only a larger sample but also a good mix of CTIMP risk categories.
The categories were not always mutually exclusive, and much of the discussion of discrepancies between the reviewers categorising each trial came down to which category should win when a case could be made for more than one. Our SOP and guidance provided some rules; generally, we followed the emphasis given in the trial protocol for outcomes and tried to be consistent for non-outcome data. Different pairs of reviewers may have reached slightly different conclusions for some data items, although we are confident that the process we used was as robust as it could be for these judgement-based decisions.
There are some strengths too. The study idea came from trial managers and addressed a question that was very important to their trial front-line experience: what sorts of data are collected in trials? We are confident of the importance of the topic covered by this work. The study also involved staff with diverse roles from seven trials units in three regions with differing regulatory environments (England, Ireland and Scotland), which brought a range of perspectives. We have created a set of data categories, a SOP, guidance and some templates (see Table 1 and Additional file 1) that others can now use to assess their own trials, including at the design stage. Finally, the project started some new collaborations and has some simple messages that we think are worth all trialists’ attention.
Implications for practice
1. For Phase III trials, we think trialists should continue to consult with patients, health professionals and policymakers to identify the outcomes they will need to inform their future decisions about the usefulness of the intervention being tested. Trialists should then resist requests to add to this list unless there is a compelling reason for collecting data that are not essential to stakeholders’ future treatment decisions. Core outcome sets [19] may help.
2. Trial teams could consider categorising the data they propose collecting on their CRFs before they start to use them, and then check that the distribution of data volume is what they anticipated and want; a simple way of tallying such a categorisation is sketched below. This information would support decisions about the resources needed to collect, process and quality control the data.
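To illustrate what such a pre-use check might look like, the short Python sketch below tallies proposed CRF data items by category and prints the resulting distribution. It is a minimal sketch only, not part of our SOP or templates: the file name crf_items.csv, the column name and the category labels are illustrative assumptions, and teams would substitute whatever categorisation they use.

```python
# Minimal illustrative sketch (not from the study's SOP or templates):
# tally proposed CRF data items by category before the CRFs are finalised,
# so a trial team can check whether the distribution of data volume is what
# they anticipated and want.
# Assumes a hypothetical file "crf_items.csv" with one row per proposed data
# item and a "category" column (e.g. "primary outcome", "secondary outcome",
# "demographics", "data management").
import csv
from collections import Counter


def category_distribution(path: str) -> list[tuple[str, int, float]]:
    """Return (category, item count, percentage of all items) per category."""
    with open(path, newline="") as f:
        counts = Counter(row["category"].strip().lower() for row in csv.DictReader(f))
    total = sum(counts.values()) or 1  # avoid division by zero on an empty file
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]


if __name__ == "__main__":
    for cat, n, pct in category_distribution("crf_items.csv"):
        print(f"{cat}: {n} items ({pct:.1f}% of all data items)")
```

Run against draft CRFs, a tally like this would, for example, make an eight-to-one ratio of secondary to primary outcome data items visible before any data are collected.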
Implications for future research
1. Measuring data collection effort. The time actually spent collecting data items is the focus of a new Trial Forge data collection project: ORINOCO will examine the time spent collecting primary and secondary outcomes (https://www.abdn.ac.uk/hsru/what-we-do/research/projects/orinoco-826.php).
2. Expanding this work to assess a larger number of trials, potentially with a focus on CTIMPs given the small sample included here, would be beneficial.
3. The work described here did not assess whether collected data were actually used or published. Doing so would be a useful addendum to this work, perhaps along the lines of the study done by O’Leary et al. [11].
4. The impact on data volume and distribution of doing one or both of the items listed under ‘Implications for practice’ would be worth evaluating. Like anything else, they are only worth doing if they lead to benefit. We would anticipate that they will reduce both the volume of data and the number of outcomes, especially secondary outcomes, but this needs to be demonstrated.