COMPare: a prospective cohort study correcting and monitoring 58 misreported trials in real time

Abstract

Background

Discrepancies between pre-specified and reported outcomes are an important source of bias in trials. Despite legislation, guidelines and public commitments on correct reporting from journals, outcome misreporting continues to be prevalent. We aimed to document the extent of misreporting, establish whether it was possible to publish correction letters on all misreported trials as they were published, and monitor responses from editors and trialists to understand why outcome misreporting persists despite public commitments to address it.

Methods

We identified five high-impact journals endorsing Consolidated Standards of Reporting Trials (CONSORT) (New England Journal of Medicine, The Lancet, Journal of the American Medical Association, British Medical Journal, and Annals of Internal Medicine) and assessed all trials over a six-week period to identify every correctly and incorrectly reported outcome, comparing published reports against published protocols or registry entries, using CONSORT as the gold standard. A correction letter describing all discrepancies was submitted to the journal for all misreported trials, and detailed coding sheets were shared publicly. The proportion of letters published and delay to publication were assessed over 12 months of follow-up. Correspondence received from journals and authors was documented and themes were extracted.

Results

Sixty-seven trials were assessed in total. Outcome reporting was poor overall and there was wide variation between journals on pre-specified primary outcomes (mean 76% correctly reported, journal range 25–96%), secondary outcomes (mean 55%, range 31–72%), and number of undeclared additional outcomes per trial (mean 5.4, range 2.9–8.3). Fifty-eight trials had discrepancies requiring a correction letter (87%, journal range 67–100%). Twenty-three letters were published (40%) with extensive variation between journals (range 0–100%). Where letters were published, there were delays (median 99 days, range 0–257 days). Twenty-nine studies had a pre-trial protocol publicly available (43%, range 0–86%). Qualitative analysis demonstrated extensive misunderstandings among journal editors about correct outcome reporting and CONSORT. Some journals did not engage positively when provided with correspondence identifying misreporting; we identified possible breaches of ethics and publishing guidelines.

Conclusions

All five journals were listed as endorsing CONSORT, but all exhibited extensive breaches of this guidance, and most rejected correction letters documenting shortcomings. Readers are likely to be misled by this discrepancy. We discuss the advantages of prospective methodology research sharing all data openly and pro-actively in real time as feedback on critiqued studies. This is the first empirical study of major academic journals’ willingness to publish a cohort of comparable and objective correction letters on misreported high-impact studies. Suggested improvements include changes to correspondence processes at journals, alternatives for indexed post-publication peer review, changes to CONSORT’s mechanisms for enforcement, and novel strategies for research on methods and reporting.

Background

Discrepancies between pre-specified and reported outcomes are an important and widespread source of bias in clinical trials [1]. Where outcome misreporting is permitted, it increases the likelihood that reported differences have arisen through chance or are exaggerated [2, 3]. Clinical trial registers were established to address selective reporting [4] and require that all pre-specified outcomes are entered at the outset of the trial in a time-stamped and publicly accessible location. Registering clinical trials and pre-specifying their outcomes are now mandated by legislation in numerous territories, including the US [5], with strong support from global organisations, including the World Health Organization (WHO) [6], the International Committee of Medical Journal Editors (ICMJE) [7] and an extensive range of professional bodies, funders, ethics committees, publishers, universities and legislatures [8]. The importance of reporting all pre-specified outcomes and documenting changes is also emphasised in the International Conference on Harmonisation of Good Clinical Practice (ICH-GCP) [9] and the detailed Consolidated Standards of Reporting Trials (CONSORT) guidelines on best practice in trial reporting [10], which are endorsed by 585 academic journals [11].

However, despite near universal recognition of the importance of this issue and extensive public commitments to address the problem, trial reports in academic journals routinely fail to report pre-specified outcomes, and add in non-pre-specified outcomes, without disclosing that this has occurred. A 2015 systematic review [1] found 27 studies comparing pre-specified outcomes against those reported, in cohorts of between 1 and 198 trials (median n = 65 trials). The median proportion of trials with a discrepancy on primary outcomes was 31% (interquartile range (IQR) 17–45%). Eight studies also assessed the impact of outcome switching on the statistical significance of the published outcome and found that outcome switching favoured the reporting of significant outcomes in half the trials. However, owing to lack of access to all measured outcomes, this biased reporting could not be assessed in many cases and therefore the reviewers concluded that this figure was likely to be an underestimate. The most common issues identified were failure to report a pre-specified outcome, publication of a non-pre-specified outcome, reporting a pre-specified primary outcome as a secondary outcome, and a change in the timing of a pre-specified outcome.

In the Centre for Evidence-Based Medicine Outcome Monitoring Project (COMPare), we aimed to assess the prevalence of outcome misreporting, as in previous research, and to explore whether it was possible to publish correction letters on all trials with misreported outcomes in real time, as they were published, in order to ensure that the academic record was more CONSORT-compliant, as per journals’ public commitments. We also aimed to monitor responses from editors and trialists to this standardised set of correction letters, to better understand why outcome misreporting persists despite public commitments to address it, and to test the ability of academic journals to self-correct when breaches of their public commitments are reported.

Methods

We set out to prospectively identify all trials published in five leading medical journals over a six-week period, identify every correctly and incorrectly reported outcome in every trial by comparing the published report against the published pre-trial protocol (or, where this was unavailable, the pre-trial registry entry), write a correction letter to the journal for publication on all misreported trials, and document the responses from journals. We used mixed methods combining quantitative analyses of the prevalence of flaws identified and quantitative and qualitative description of the responses of the journals to correspondence that notified them of misreporting. We used similar methods to assess the responses from trialists on the papers being assessed: these findings are reported in an accompanying paper.

Sample

We prospectively selected five leading academic journals regularly publishing randomised controlled trials (RCTs) from those currently listed as endorsing the CONSORT guidelines: New England Journal of Medicine (NEJM), The Lancet, Annals of Internal Medicine, Journal of the American Medical Association (JAMA), and the British Medical Journal (BMJ). All trials published between 19 October and 30 November 2015 were included. This sample frame was selected as it was likely to yield a sample of trials comparable to the median sample size of all studies in the most current systematic review on outcome misreporting [1] and reflected what was practically achievable with our team size and availability.

Coding of outcome reporting

Each published trial was allocated to one researcher (HD, AD, IM, ES and PH) who collected all relevant documents pertaining to that trial and archived them into a shared folder. This included the trial report, any appendices, a copy of the registry entry, and the trial protocol. As in previous work on outcome misreporting [1] and consistent with CONSORT’s requirements to declare all changes made after the start of a trial, we set out to identify the outcomes pre-specified before trial commencement. We searched initially for a published protocol dated before trial commencement or a subsequent protocol with a change log that allowed inference of pre-commencement outcomes. If this was not available, then we searched for a registry entry dated before trial commencement. If there were amendments to the registry entry, we accessed historical contents of the registry (for example, using the “archive” function on ClinicalTrials.gov) to find the most recent set of pre-specified outcomes dated before trial commencement. For each trial, the initial reviewer entered all pre-specified outcomes onto our data sheet for that study.

The trial report and appendices were then read in full and searched by the reviewer to establish whether each pre-specified outcome was reported and whether primary outcomes were reported as secondary outcomes (or vice versa) and to identify any novel non-pre-specified outcomes that were reported but not flagged as novel. The data sheet was updated accordingly. A single researcher reviewed and extracted data from each included trial, which was then checked and verified by a second researcher. This data set (including the data sheet and the underlying documents) was then presented, in a team meeting, to one of the senior supervising clinical academics on the team (BG, CH and KM), where the data extraction was replicated in full. During this meeting (meetings typically lasted two or more hours and there were multiple meetings each week), all source documents and extracted pre-specified outcomes were identified and checked, and the location where each outcome was reported was identified in the paper. If outcomes were not reported and had not already been found by two researchers, then, at a minimum, key search terms were used as a check on the trial report and appendices, and all results tables were reviewed. Any discrepancies were resolved through discussion or, where needed, through referral to one of the other senior supervising clinical academics until consensus was reached. For each trial where outcome switching had occurred, the text of the correction letter to the journal was finalised in the team meeting, formally signed off by the supervising clinical academic, and submitted to the journal by the first reviewer before the submission deadline. Task allocation was closely managed by one team member (HD) as turnaround time from publication to submission was very short for some journals (for example, two weeks for The Lancet and three weeks for NEJM) and many trials were being assessed and responded to simultaneously.
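
The completed coding sheets for every trial are shared as Additional file 3. Purely as an illustration of the kind of information each sheet captures, the minimal Python sketch below shows one way to represent a coded outcome and tally the discrepancy types described above; the field names and summary function are hypothetical and do not reproduce COMPare's actual sheet layout.

# Illustrative sketch only: hypothetical representation of a coded outcome.
# Field names are invented for this example and are not COMPare's sheet format.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class OutcomeRecord:
    description: str          # outcome wording (pre-specified or, for novel outcomes, as reported)
    prespecified_as: str      # "primary", "secondary", or "none" (i.e. a novel outcome)
    reported_as: str          # "primary", "secondary", "novel", or "not reported"
    declared_as_change: bool  # change openly declared in the trial report (CONSORT item 6b)

def tally_discrepancies(outcomes: List[OutcomeRecord]) -> Dict[str, int]:
    """Count the discrepancy types described in the Methods for one trial."""
    unreported = sum(1 for o in outcomes
                     if o.prespecified_as != "none" and o.reported_as == "not reported")
    undeclared_switches = sum(1 for o in outcomes
                              if o.prespecified_as in ("primary", "secondary")
                              and o.reported_as in ("primary", "secondary")
                              and o.reported_as != o.prespecified_as
                              and not o.declared_as_change)
    undeclared_novel = sum(1 for o in outcomes
                           if o.prespecified_as == "none" and not o.declared_as_change)
    return {"unreported_prespecified": unreported,
            "undeclared_switches": undeclared_switches,
            "undeclared_novel_outcomes": undeclared_novel}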

CONSORT guidance states that trial publications should report “completely defined pre-specified primary and secondary outcome measures, including how and when they were assessed” (6a) and “any changes to trial outcomes after the trial commenced, with reasons” (6b) with further elaboration on these issues in the accompanying CONSORT publication [10]. Therefore, consistent with CONSORT, where outcome switching occurred but was openly declared to have occurred in the trial report, these outcomes were classified as correctly reported, as there are often valid reasons for reported outcomes to differ from those pre-specified.

Letter preparation and submission

We constructed a set of template letters to match the journal word limits so that all letters were standardised and comparable (Additional file 1a–c). All journals’ instructions to authors were checked for the time limit and word limit on letters for publication to ensure that there were no grounds for rejection on these procedural issues. Letters reported only the fact of the outcome misreporting, and the breach of CONSORT guidelines, rather than any arguable issue of opinion that might differ between letters or otherwise impact on acceptance and responses. No comments were made on the authors’ background or possible motives for misreporting. We did not adjudicate on the validity of the reasons given for changing pre-specified outcomes. We did not give any subjective opinion on whether the outcome misreporting would lead to clinical harm and reported only the matter of fact: that the journal, having endorsed CONSORT, had breached the CONSORT trial reporting guidance.

All correspondence with journals was collected in a team email account and archived. Where a journal rejected all letters, we contested this decision. Where letters were published alongside trialists’ replies and these replies raised new issues or misunderstandings, we replied setting out our concerns. We aimed to conduct all correspondence extremely politely and to respond on matters of fact. All outgoing correspondence was reviewed and co-authored by at least one supervising clinical academic and, in many cases, all three. At the conclusion of the study, we extracted themes and key issues in all correspondence in collaboration with a qualitative researcher (CM).

We created a bespoke website at COMPare-trials.org to archive all data and correspondence in public, reflecting a broader commitment to open science [12]. All underlying raw data sheets were shared in full as studies were added to the site. This allowed any interested party, including trialists and editors, to openly review or contest our coding of every outcome in every trial. An automatically updated table of findings calculated rolling summary statistics from the underlying raw data. Correspondence with journals was archived publicly on the site, including the initial letter submitted for publication (after a 4-week delay, to avoid letters’ being rejected on grounds of prior publication), alongside key incoming and outgoing correspondence with journals and trialists.

Data analysis

We generated summary statistics, with confidence intervals and ranges as appropriate, on all outcomes. The follow-up period was 12 months from submission of the final letter. Our pre-specified primary outcomes were proportion of pre-specified outcomes reported, proportion of reported outcomes that are non-pre-specified and not declared as such, proportion of letters published in print, and publication delay in days. Our secondary outcomes were all of the primary outcomes, broken down by journal. At the end of the study, we added two outcomes that were not pre-specified in the original protocol. These were the number and proportion of trials with any discrepancy on primary outcomes, to generate figures commensurable with the systematic review, and the number and proportion of trials with a pre-trial protocol available online, as some editors expressed the view that protocols were more reliable.
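
As a worked illustration of the summary statistics described here, a proportion such as the letter publication rate reported below (23 of 58 letters, 40%, 95% CI 27–53%) can be approximated with a normal-approximation confidence interval for a binomial proportion. The sketch below is our own minimal example rather than the project's analysis code, and the exact interval method used in the study is not specified, so the bounds may differ by a percentage point from the published figures.

# Minimal illustration (not COMPare's analysis code): normal-approximation
# 95% confidence interval for a binomial proportion.
from math import sqrt

def proportion_with_ci(successes: int, total: int, z: float = 1.96):
    """Return the proportion and an approximate 95% confidence interval."""
    p = successes / total
    se = sqrt(p * (1 - p) / total)  # standard error of the proportion
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Example: 23 of 58 submitted correction letters were published.
p, low, high = proportion_with_ci(23, 58)
print(f"{p:.0%} (95% CI {low:.0%} to {high:.0%})")  # approximately 40% (95% CI 27% to 52%)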

A protocol was generated, instantiating the principles of CONSORT in simple instructions and workflows, to share with other teams who may wish to replicate this work on other journals. A full copy is posted at COMPare-trials.org. Amendments were added to this protocol as new challenges were encountered. For example, we initially planned to send corrected tables and figures to journals but this proved impractical; we reviewed our plans for ongoing monitoring after the initial six-week period; and we amended the delay before posting letters online on our own site to accommodate some journals’ consideration periods. All underlying data are shared online as Additional file 2 (full summary data set), Additional file 3 (full archive of all underlying raw coding sheets for each individual trial as available at www.COMPare-trials.org and template assessment sheet), Additional file 4 (COMPare protocol as at August 2016) and Additional file 5 (journal responses and themes). The correspondence archive is available at COMPare-trials.org/data.

Results

Workflow

We assessed 67 trials in total, a mean of 13.4 trials per journal (range 3–24). Each trial took between 1 and 7 hours to assess: workload was therefore high. One paper reported two trials, which were treated as separate trials.

Outcome reporting quality

All forms of outcome misreporting were common; summary statistics on outcome reporting are presented in Table 1. In total, 97 primary outcomes were pre-specified across 67 trials (mean 1.4 outcomes per trial); of these, 76.3% were reported correctly as primary outcomes and 80.4% were reported in any form; 19.4% of trials had at least one unreported pre-specified primary outcome. The proportion of correctly reported primary outcomes varied widely between journals (range 25–96%). There were 818 pre-specified secondary outcomes (mean 12.2 per trial); of these, 55.1% were reported, and there was wide variation in reporting rates between journals (range 31–72%). In total, 365 novel outcomes were reported without being declared, a mean of 5.4 per trial. Changes were rarely declared: across all trials, only 13.7% of novel non-pre-specified outcomes were correctly declared as novel and non-pre-specified, as required by CONSORT. Only 29 studies had a pre-trial protocol publicly available (43%, 95% confidence interval (CI) 31–56%), with journals ranging from 0 to 86%.

Table 1 Summary statistics on outcome reporting discrepancies

Letter publication rates

In total, 58 trials (87%, 95% CI 78–95%) had discrepancies breaching CONSORT and therefore requiring a correction letter. Journals varied considerably in the proportion of trials requiring a correction letter (range 67–100%). All letters were submitted to journals within their submission time limit (generally within 2 weeks of trial publication). Of 58 correction letters submitted, 23 (40%) were published (95% CI 27–53%). Acceptance rates and publication delay varied widely between journals, as shown in Table 2. Two journals (NEJM and JAMA) rejected all letters; one (BMJ) accepted all letters as online comments only but issued a formal correction to one trial; one (Annals) accepted all letters online and some for print but imposed restrictions on subsequent discussion online and in the journal; and one (Lancet) had no facility for rapid online comments but accepted the majority (80%) of letters in print, albeit with long delays (mean 150 days, range 40–257).

Table 2 Summary statistics on correction letter publication

Coding amendments

For one trial (ID = 53), additional misreporting was identified after the 2-week NEJM submission deadline had passed: as all letters to NEJM were rejected, including the initial letter which identified misreporting in this trial, no further letter was sent to the journal. From feedback on all our openly shared data across all included trials, we were made aware of two outcomes that were initially miscoded. One pre-specified outcome was initially coded as unreported but was in fact given in free text in the Results section of the trial report, using very different terminology to the pre-specification text: COMPare did not require identical word matches and attempted to manually identify all outcomes reported in tables and free text; this outcome was accidentally overlooked. The second miscoded outcome was initially coded as missing but was given in the trial report, in free text, using different terminology to the pre-specification text, and was reported only in the Discussion section of the trial report. Out of 756 outcome reporting discrepancies identified by COMPare, we are therefore aware of two errors, an error rate of 0.26%. Both errors were openly acknowledged by COMPare in correspondence for journal publication.

Themes in responses from journals

We identified a range of themes in responses from journals with respect to their understanding, and handling, of correct outcome reporting. We encountered several examples of journals expressing views that conflict with CONSORT. For example, NEJM stated that they use their judgement to decide which outcomes to report; Annals suggested that outcome switching is acceptable if the main results of the study are unaffected; and various editors appeared not to understand that under CONSORT it is acceptable to change outcomes as long as any changes from pre-commencement outcomes are disclosed in the paper reporting the results. We also found evidence of editors misunderstanding the importance of outcomes being pre-specified before the commencement of the trial.

Various editors made dismissive comments about registries, describing the content as unreliable or irrelevant and apparently accepting the notion that there will be multiple discrepant sets of contemporaneous pre-specified outcomes. Of note, for one trial (Trial 57, Annals, 03/05/16), we found three different sets of pre-specified outcomes in two registries (European Union Clinical Trials Register and ClinicalTrials.gov) and one protocol from the same time period, which is hard to reconcile with the notion of a single set of pre-specified outcomes. JAMA suggested that discrepancies between a trial report’s outcomes and pre-specified outcomes on the registry are a matter for the registry owners rather than the journal editors.

We also found editors placing responsibility for ensuring reporting fidelity onto others: NEJM suggested that editors need not ensure that reported outcomes match those pre-specified as readers can check this for themselves (although we found assessing this took between 1 and 7 hours); The Lancet left trialists to reply to our correspondence and did not give a view on whether editors believed that the misreported outcomes were correctly reported when asked directly.

We also encountered examples of what we coded as “rhetoric”. Annals made general statements about supporting the goals of COMPare. JAMA and NEJM both stated that space constraints meant that not all pre-specified outcomes could be reported, which conflicts with our finding that a mean of 5.4 novel non-pre-specified outcomes were added per manuscript. JAMA and Annals editors explained that they have rigorous processes to ensure that pre-commencement outcomes are correctly reported, which conflicted with our finding of extensive discrepancies and with previous work on the prevalence of outcome misreporting in the same journals. Further details and examples are presented in Table 3; a longer series of examples are presented in Additional file 5.

Table 3 Themes in responses from journals

Direct engagement by editors on specific misreported outcomes was rare. NEJM did not reply to COMPare directly on this issue but shared two documents with journalists reporting on COMPare, containing what NEJM stated were errors in COMPare’s coding on two trials. For illustration, a transcript and analysis of all six NEJM responses on one trial are presented in Table 4. This demonstrates errors by NEJM editors (such as confusing outcomes timed for the fourth week during treatment with the fourth week after treatment) and also provides further examples of editors’ approach to correct outcome reporting, such as the need for time points for outcome ascertainment to be pre-specified and adhered to.

Table 4 Errors in New England Journal of Medicine responses on trial 22

We also coded themes in journal editors’ criticisms of the letter-writing project. The dominant theme was misrepresentation of COMPare’s technical approach. For example, Annals editors stated that COMPare’s protocol was unreasonable because it required exact word matches between pre-specified and reported outcomes (it does not) and that COMPare only used registries as a source of pre-specified outcomes (as per the COMPare protocol, registries were used as a last resort when no pre-commencement protocol was available). JAMA stated that COMPare’s responses to published trial reports contained insufficient information (all COMPare raw data sheets were shared in full, detailing each pre-specified primary and secondary outcome; whether and how each pre-specified outcome was reported; each additional non-pre-specified outcome reported; and whether each non-pre-specified outcome added was correctly declared as non-pre-specified). It cannot be ascertained whether these inaccuracies represent misunderstandings or acts of rhetoric. Further details are presented in Table 5 and Additional file 5.

Table 5 Themes in journals’ criticisms of COMPare

We also found some positive responses. The BMJ issued a 149-word correction on the REEACT (Randomised Evaluation of the Effectiveness and Acceptability of Computerised Therapy) trial after receiving COMPare’s correction letter, and Annals corrected the “Reproducible Research” data-sharing statement on one trial after we reported that a protocol was withheld from us by the trialists. No other formal corrections were issued on any of the 58 misreported trials.

Narrative account of individual journals’ responses

Journals’ responses to correction letters reporting breaches of CONSORT were diverse but broadly dismissive. NEJM rejected all COMPare letters, stating “we do not, and never have, required authors to comply with CONSORT” and explaining that “space constraints” prevent all outcomes from being reported [NEJM emails 1]. COMPare appealed and received no reply. In March 2016, NEJM gave journalists a detailed review of COMPare’s assessment of one trial, which NEJM stated had identified six errors in COMPare’s assessment, as per Table 4 above.

JAMA published no letters and informed us halfway through the project that they would publish none, as in their view COMPare letters contained repetition and too little information on specific misreported outcomes [JAMA emails, 09/12/15]. JAMA imposes word length restrictions on letters responding to papers, and this limit prevented us from including all details of all misreported outcomes; however, all letters signposted COMPare-trials.org, where all underlying raw data were shared in full. For all letters submitted to JAMA after their December 9 reply, we removed all repetition and added specific detail of every misreported outcome in the main body of the text. None of these letters was published, and we received no further correspondence from JAMA. Of note, JAMA also stated that “trial protocols ... have been included as a supplement with each trial published in JAMA since mid-2014”. We found that pre-commencement protocols were available for only 53.9% of JAMA trials in our cohort.

The Lancet published 16 out of 20 COMPare letters, mostly with author replies. Most author replies contained misunderstandings of correct pre-specification and reporting of outcomes, as reported in our accompanying paper on trialists’ responses. We sent several replies addressing these issues, and two of them were published. Several of these replies requested that Lancet editors express a view on whether outcomes had been correctly reported; we received no comment from The Lancet editors throughout. The BMJ published only three trials during the study period but published all COMPare correspondence online; they issued a formal correction for one trial but not for another that had similarly misreported outcomes. Annals engaged in a lengthy and complex dispute with COMPare, as detailed in the timeline in Table 6.

Table 6 Timeline of Annals’ responses to COMPare

Discussion

Summary

We found that journals listed as endorsing the CONSORT guidelines, which require complete outcome reporting, fail to ensure compliance on this issue. The majority of correction letters were rejected. In addition, we found that two journals actively rejected all letters that signposted outcome misreporting, despite its being an important source of bias; and several journals disclosed that, contrary to their being listed as endorsing CONSORT, they do not regard breaches of CONSORT as problematic. Qualitative analysis of themes in extensive subsequent correspondence with journal editors and trialists demonstrates widespread misunderstandings of what constitutes complete outcome reporting. We additionally found breaches of best practice policies such as ICMJE guidelines.

Strengths and weaknesses

Post-publication peer review is an important component of the scientific process. There have been previous anecdotal reports of shortcomings around how journals handle individual items of critical correspondence [12, 13]: however, to the best of our knowledge, COMPare is the first systematic and prospective study setting out to generate and submit a comparable cohort of correction letters on a large systematic sample of misreported scientific studies, in order to assess how scientific journals are curating critical post-publication peer review. It is also the first to systematically assess whether journals will permit open discussion of possible editorial shortcomings.

The key strength and innovation of our study is that it was conducted prospectively, aiming to correct individual misreported trials in real time rather than retrospectively publishing an overall prevalence figure. This allowed us to go beyond previous work that documents only prevalence and instead generate data shedding light on the reasons for misreporting and, through letter acceptance rates, also generate an objective measure of journals’ commitment to correct reporting.

Our novel approach of prospective real-time corrections brought several additional methodological benefits. Previous work has mostly assessed whether there was outcome switching at all in a study [1]. To maximise the informativeness, credibility and impact of COMPare’s letters, we needed to assess the extent of outcome misreporting in more detail and share information on each individual misreported outcome. Because of this, and a broader commitment to open science in the team, all underlying raw data were shared in full during the project and prominently signposted in journals and all external coverage. This open approach is likely to have reduced the risk of small coding errors: our data were closely scrutinised by trialists and editors motivated to find evidence of errors; and we were highly motivated to ensure that no errors were found. None of the 29 previous cohort studies has shared data on individual outcomes and trials in this fashion. From all trialist feedback, we found two outcomes miscoded by COMPare out of 756 identified outcome discrepancies; no assertions of miscoding made by editors were valid (see Table 4 and Additional file 5 for examples). It remains possible that our data set contains small additional coding errors, as with all research. However, our prevalence figures are consistent with previous work, and at least three members of the research team reviewed each outcome.

An additional strength of our study is the lack of conflict of interest among the COMPare researchers on the specific interventions being trialled. Critical correspondence on methodological shortcomings in published research often originates from other academics in the same field who may have a personal history, financial or ideological conflicts of interest, or a competitive relationship with the individual research teams involved: this was not the case for our large systematic sample of trials and letters. However, we note that the COMPare team do have a complex range of additional conflicts of interest, in excess of what would normally be declared; for example, an academic in our position may be concerned not to appear critical of a journal, given the importance of journal publication to career progression; all senior academics on the team have previously published in at least one of the journals covered, and two of us (BG and CH) have previously worked with the BMJ on a transparency campaign.

The issue of generalisability is key. Our study covered all trials in five general medical journals, reporting a wide range of interventions from hand washing and acupuncture to antiviral drugs. However, all journals were very high-impact: lower-impact journals may have different or more heterogeneous performance on outcome reporting and publication of correction letters. Furthermore, the fact that our letters were part of a coordinated project may have led editors to treat them differently: it is hard to ascertain whether this would make editors more or less likely to handle them appropriately.

Ideally, our study would have examined trials from a wider sample of journals. However, the workload associated with checking trials in real time, in detail, and maintaining subsequent interactive correspondence within the timeline for publication was extremely high, even for a large coordinated team; and the intention of COMPare was not solely to measure the prevalence of outcome misreporting. Initial plans to maintain the process of analysing trials and submitting correction letters were shelved, owing to the high workload and the rejection of most letters.

Context of previous work

Our findings on the simple prevalence of outcome misreporting are consistent with previous work. The most current systematic review, from 2015 [1], found 27 studies comparing pre-specified outcomes against those reported, as described above: the median proportion of trials with a discrepancy on primary outcomes was 31% (range 0–100%, IQR 17–45%); in COMPare, we found that 19.4% of trials (95% CI 9.9–28.9%) had unreported primary outcomes. Therefore, while some journals argued that our assessment process was unreasonable, COMPare found a lower prevalence of discrepancies than previous work. Although most previous studies were published within the past decade, all but one included trials that commenced before the ICMJE 2005 policy mandating trial registration with pre-specified outcomes as a condition of acceptance for publication in member journals: reporting standards might therefore have been expected to improve in the interim. However, our findings give little evidence for any strong overall improvement, and two additional cohort studies published in 2008 [14] and 2016 [15] report a similar prevalence of discrepancies.

All previous studies have published only prevalence figures describing the overall extent of outcome misreporting, and none attempted to actively correct the record on individual misreported trials. It is widely agreed that the scientific literature should be self-correcting, with researchers engaging in post-publication peer review and submitting critical commentary or corrections in letters for publication, sometimes resulting in formal corrections or retraction where the main results of a study are invalidated by an error identified. Prior to our study, there have been only anecdotal reports that these systems fall short when tested. The largest we are aware of is a retrospective narrative description of four academics’ experience attempting to publish correction letters on 25 studies with various flaws that they identified while writing a newsletter for their research field: they found that “post-publication peer review is not consistent, smooth, or rapid” with journal editors unwilling to publish critical letters, correct errors, or retract articles with errors that invalidate their key findings [13].

The current Cochrane review of studies examining discrepancies between protocols or registry entries and published trial reports [16] reports high prevalence for numerous other related reporting flaws, including inconsistencies on sample size calculation, blinding, method of allocation concealment, subgroup analyses, and analytic approach. It is therefore highly likely that the problems we have identified with misreporting, and failure to correct that misreporting, will generalise beyond the single issue of outcome reporting.

Policy implications for journals

Some journals explicitly stated that they did not expect all pre-specified outcomes to be correctly reported, despite being publicly listed as endorsing the CONSORT guidelines which state the contrary. This disparity between public stance and editorial practice is likely to give false reassurance to readers, who may reasonably assume that all pre-specified outcomes are correctly reported in a journal article. While CONSORT compliance would be preferable, we suggest as a minimum that all medical journals explicitly clarify whether they do, or do not, aim to comply with CONSORT (if so, which elements) and specify the documents and methods they use to assess compliance. The workload for our team of independently checking for outcome misreporting was extremely high: however, this would be lower for journals, as any apparent discrepancy could be referred back to the authors, whereas for COMPare letters, an extremely high level of confidence in discrepancies—and therefore more laborious checking—was required before submission.

Most letters were not published, and we encountered instances of what we regard as a failure to abide by best practice around journal correspondence as set out in ICMJE guidance (Table 3, Annals). Since journal editors make editorial judgements about what is published in their journals, they may have significant conflicts of interest when their own editorial processes or judgements are subjected to critical scrutiny. The Lancet has an internal ombudsman issuing an annual report [17]. There have been calls for independent oversight of journal editors for many years [18]: although an external appeals process would likely be valuable, it also risks being cumbersome or vulnerable to abuse by special interest groups. We are aware of no evidence assessing the effectiveness of this approach. At a minimum, we suggest that all correspondence above a basic minimum threshold for quality be published accessibly online, as per Rapid Responses in the BMJ.

We also identified asymmetries in access to critical post-publication peer review. For example, at Annals’ website, all visitors (including those with no password or subscription) can read an abstract that misreports a trial’s outcomes, but only those with password-controlled registration and account access can read online comments demonstrating that pre-specified outcomes were misreported. In addition, there are restrictions on critical post-publication correspondence that may not be justifiable in the era of online publication. For example, most journals had length limits and tight submission deadlines for letters: both of these have been previously criticised [19]. There is further asymmetry here: The Lancet gives readers 2 weeks to submit correspondence on a specific paper yet did not publish some COMPare letters until more than 6 months after initial trial publication. During this period—possibly the period when the trial reports were most read—information about outcome misreporting was effectively withheld from readers. Overall, our findings suggest that post-publication peer review and critical appraisal are not currently well managed by journals. This suggests that alternative approaches such as PubMed Commons—with a lower threshold, instant online publication, good indexing, and independent editorial control—may be more appropriate.

Policy implications for registries

The denigration of trial registry data by some editors was unexpected. Registries were specifically set up as a public time-stamped information resource to address selective outcome reporting. They have received extensive public support from WHO, journals, and ICMJE, who state: “The purpose of clinical trial registration is to prevent selective publication and selective reporting of research outcomes”. The key driver for greater registry uptake was action by journals and specifically the ICMJE stating in 2005 that member journals would not consider unregistered trials for publication [7]. Journals taking the content of registry entries seriously would therefore likely be a key lever in ensuring that registries are used appropriately by trialists; that would be valuable, as we found that registries are often the only publicly accessible time-stamped source for a trial’s pre-specified outcomes. There are no valid reasons why registries should contain outcomes that are discrepant with contemporaneous protocols, as some argued in response to COMPare letters; indeed, trialists in many territories, including the US and European Union, have a legal duty to completely and accurately register their trial, including details of pre-specified outcomes, on a register with a statutory regulatory role. Despite this, for one trial, we found three different contemporaneous sets of pre-specified outcomes spread across two registry entries and one protocol.

Journals varied widely in their practical approach to registries. For example, the BMJ uses registries as the primary source for pre-specified outcomes, whereas Annals editors told COMPare that protocols were their chosen source for assessing complete outcome reporting, but none of the five trials published in Annals had a pre-trial protocol publicly available. Annals policy and practice therefore make independent verification of their assessment of correct outcome reporting impossible. We see no justification for relying on protocols when they are routinely unavailable and when registry entries—a legal requirement on publicly accessible services explicitly set up to address selective reporting—are now almost universally available. We hope that trial registry managers will also find our data on some editors’ approaches to their work informative in their broader strategic approach.

Policy implications for CONSORT

We believe that there is a need for greater clarity, emphasis, and awareness raising on certain aspects of CONSORT guidance and a need to review the mechanisms around the EQUATOR (Enhancing the Quality and Transparency Of health Research) network’s public list of journals “endorsing” CONSORT. Since some journals we examined eventually stated that they do not require CONSORT compliance on the key issue of correct outcome reporting, CONSORT may wish to consider removing journal titles from their list, implementing a two-level approach in which journals opt to “endorse” in spirit or “enforce” in practice, or offering a system to check and accredit compliance for journals wishing to demonstrate credibility to readers.

Future work

There are already extensive data on the simple prevalence of methodological and reporting errors for clinical trials. In our view, there is little value in repeating simple prevalence studies unless there are grounds to believe that the prevalence has changed. While we recognise that other research teams may be intimidated by the response our project received, the mixed methods approach of COMPare provides additional insights into the reasons why shortcomings persist despite public statements of adherence to reporting standards.

It is plausible that the modest coverage, impact, internal discussions and public debate triggered by our systematic programme of corrections have had a positive impact on policy or practice at journals. We are therefore now re-assessing outcome reporting in the same five journals to assess whether standards have improved following the initial COMPare study and feedback period. We would welcome others repurposing our methods and have shared our protocol in full online and as Supplementary Material, expanded where appropriate to clarify specific steps for those unfamiliar with specific requirements of CONSORT. We hope other groups may find this useful to run a similar project in a different set of specialty journals, the same journals, or other sectors where RCTs are becoming commonplace, such as development economics, education, or policing. Our method could also be extended to other methodological and reporting issues, including in fields outside of medicine, especially where there are similar methodological shortcomings that can be identified consistently, to produce a similarly comparable cohort of letters. This would allow researchers to assess whether the problem of journals rejecting legitimate critical commentary is limited to high-impact medical journals with a clinical focus and would move current high-profile discussion on shortcomings at journals forward from anecdotal descriptions of challenges around criticising individual studies.

In our view, the traditional model for research on shortcomings in studies’ methods and reporting—publishing prevalence figures alone, for retrospective cohorts—represents a wasteful use of resources. Specifically, it is a waste of the insights generated by expert reviewers, at considerable time and expense, about shortcomings in individual studies. We suggest that all such studies systematically write letters for publication about each individual misreported or flawed study they identify, in order to alert consumers of the academic literature to those flaws, to maximise efficient use of researcher time, to raise awareness of methodological flaws in published research, and to augment the impact of their work. This simple change will help academia to be a learning system with constructive feedback. In addition, it is likely to improve the data quality in methodological research, for the reasons described above, as researchers of studies coded as flawed will be able to openly contest adjudications they regard as inaccurate.

Conclusion

We found high levels of outcome misreporting amongst the five top medical journals listed as endorsing the CONSORT statement on correct reporting of clinical trials. Most of these journals rejected correction letters that documented their misreporting. We found extensive evidence of misunderstandings about correct outcome reporting at journals. The disparity between journals’ public stance and practical action may mislead readers into assuming that pre-specified outcomes are correctly reported. Possible solutions include changes to correspondence processes at journals, alternatives for indexed post-publication peer review, changes to CONSORT’s mechanisms for enforcement, and changes to traditional practices in methodology research to ensure that problems identified with published studies are routinely shared with the broader academic community.

Abbreviations

BMJ: British Medical Journal

CI: Confidence interval

COMPare: Centre for Evidence-Based Medicine Outcome Monitoring Project

CONSORT: Consolidated Standards of Reporting Trials

ICMJE: International Committee of Medical Journal Editors

IQR: Interquartile range

JAMA: Journal of the American Medical Association

NEJM: New England Journal of Medicine

RCT: Randomised controlled trial

WHO: World Health Organization

References

  1. Jones CW, Keil LG, Holland WC, Caughey MC, Platts-Mills TF. Comparison of registered and published outcomes in randomized controlled trials: a systematic review. BMC Med. 2015;13:282.

  2. Hart B, Lundh A, Bero L. Effect of reporting bias on meta-analyses of drug trials: reanalysis of meta-analyses. BMJ. 2012;344:d7202.

  3. Kirkham JJ, Dwan KM, Altman DG, Gamble C, Dodd S, Smyth R, et al. The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ. 2010;340:c365.

  4. ICMJE | Recommendations | Browse [Internet]. [cited 1 Sep 2016]. Available: http://www.icmje.org/recommendations/browse/.

  5. Office of the Commissioner. Food and Drug Administration Amendments Act (FDAAA) of 2007. US Food and Drug Administration; Available: https://www.fda.gov/RegulatoryInformation/LawsEnforcedbyFDA/SignificantAmendmentstotheFDCAct/FoodandDrugAdministrationAmendmentsActof2007/default.htm.

  6. WHO | Welcome to the WHO ICTRP. World Health Organization; 2016. Available: http://www.who.int/ictrp/en/.

  7. ICMJE | Recommendations | Browse [Internet]. [cited 1 Sep 2016]. Available: http://www.icmje.org/recommendations/browse/publishing-and-editorial-issues/clinical-trial-registration.html.

  8. WHO | Trial Registration. World Health Organization; 2014. Available: http://www.who.int/ictrp/trial_reg/en/index2.html.

  9. Good Clinical Practice: ICH [Internet]. [cited 1 Sep 2016]. Available: http://www.ich.org/products/guidelines/efficacy/efficacy-single/article/good-clinical-practice.html.

  10. Schulz KF, Altman DG, Moher D, CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c332.

  11. Consort - Endorsers [Internet]. [cited 1 Sep 2016]. Available: http://www.consort-statement.org/about-consort/endorsers.

  12. McKiernan EC, Bourne PE, Brown CT, Buck S, Kenall A, Lin J, et al. How open science helps researchers succeed. elife. 2016;5. https://doi.org/10.7554/eLife.16800.

  13. Allison DB, Brown AW, George BJ, Kaiser KA. Reproducibility: a tragedy of errors. Nature. 2016;530:27–9.

  14. Al-Marzouki S, Roberts I, Evans S, Marshall T. Selective reporting in clinical trials: analysis of trial protocols accepted by The Lancet. Lancet. 2008;372:201.

  15. Weston J, Dwan K, Altman D, Clarke M, Gamble C, Schroter S, et al. Feasibility study to examine discrepancy rates in pre-specified and reported outcomes in articles submitted to The BMJ. BMJ Open. 2016;6:e010075.

  16. Dwan K. Comparison of protocols and registry entries to published reports for randomised controlled trials. J Evid Based Med. 2011;4:194.

  17. Molyneux M. Ombudsman’s report for 2015. Lancet. 2016;387:210.

  18. Altman DG. Is there a case for an international medical scientific press council? JAMA. 1994;272:166–7.

  19. Altman DG. Unjustified restrictions on letters to the editor. PLoS Med. 2005;2:e126 discussion e152.

  20. Goldacre B. Where does Annals of Internal Medicine stand on outcome switching? A detailed response. In: COMPare [Internet]; 2016. [cited 25 Nov 2017]. Available: http://compare-trials.org/blog/where-does-Annals-of-internal-medicine-stand-on-outcome-switching-a-detailed-response/.

  21. Goldacre B. Make journals report clinical trials properly. Nat News. 2016;530:7.

  22. ICMJE | Recommendations [Internet]. [cited 27 Nov 2017]. Available: http://www.icmje.org/recommendations/.

  23. In major shift, medical journal to publish protocols along with clinical trials - Retraction Watch. In: Retraction Watch [Internet]. 2016. [cited 27 Nov 2017]. Available: http://retractionwatch.com/2016/05/13/in-major-shift-Annals-to-publish-protocols-along-with-clinical-trials/.

Acknowledgements

Not applicable.

Funding

No specific funding was sought for this project. BG is funded to work on research integrity by the Laura and John Arnold Foundation and employs AP-S and HD in part from this grant.

Availability of data and materials

All raw data sheets, letters (as they were sent) and correspondence are available at COMPare-trials.org/data. Template documents, full summary data set, raw data sheets, protocol, and a table of journal responses and themes are all shared as Additional files with this paper.

Author information

Contributions

BG conceived and designed the study, drafted the article and served as guarantor. BG, HD, CH and KRM developed the full protocol. BG, HD, CH, KRM, ES, PH, AD, IM and CM provided data collection. AP-S, BG and HD provided accompanying website and data management. BG, CM, AP-S and HD provided data analysis. All authors provided critical revisions of the article and gave final approval of the version to be published. BG, CH and KRM provided data checking. All data and correspondence are available at COMPare-trials.org and as appendices.

Corresponding author

Correspondence to Ben Goldacre.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

BG has received research funding from the Laura and John Arnold Foundation, the Wellcome Trust, the National Health Service (NHS) National Institute for Health Research (NIHR), the Health Foundation and the WHO. He also receives personal income from speaking and writing for lay audiences on the misuse of science. KM has received funding from the NHS NIHR and the Royal College of General Practitioners for independent research projects. CH has received grant funding from the WHO, the NIHR and the NIHR School of Primary Care. He is also an advisor to the WHO International Clinical Trials Registry Platform. The views expressed are those of the authors and not necessarily those of any of the funders or institutions mentioned above.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

a General template letter. b Template letter to the New England Journal of Medicine (NEJM). c Template letter to The Lancet. (ZIP 20 kb)

Additional file 2:

Full summary data set. (XLSX 58 kb)

Additional file 3:

Full archive of all underlying raw coding sheets for each individual trial as available at www.COMPare-trials.org. (DOCX 6 kb)

Additional file 4:

COMPare current protocol as at August 2016. (DOCX 93 kb)

Additional file 5:

Journal responses and themes. (PDF 96 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Goldacre, B., Drysdale, H., Dale, A. et al. COMPare: a prospective cohort study correcting and monitoring 58 misreported trials in real time. Trials 20, 118 (2019). https://doi.org/10.1186/s13063-019-3173-2
