We found that journals listed as endorsing the CONSORT guidelines, which require complete outcome reporting, fail to ensure compliance on this issue. The majority of correction letters were rejected. In addition, we found that two journals actively rejected all letters that signposted outcome misreporting, despite its being an important source of bias; and several journals disclosed that, contrary to their being listed as endorsing CONSORT, they do not regard breaches of CONSORT as problematic. Qualitative analysis of themes in extensive subsequent correspondence with journal editors and trialists demonstrates widespread misunderstandings of what constitutes complete outcome reporting. We additionally found breaches of best practice policies such as ICMJE guidelines.
Strengths and weaknesses
Post-publication peer review is an important component of the scientific process. There have been previous anecdotal reports of shortcomings around how journals handle individual items of critical correspondence [12, 13]: however, to the best of our knowledge, COMPare is the first systematic and prospective study setting out to generate and submit a comparable cohort of correction letters on a large systematic sample of misreported scientific studies, in order to assess how scientific journals are curating critical post-publication peer review. It is also the first to systematically assess whether journals will permit open discussion of possible editorial shortcomings.
The key strength and innovation of our study is that it was conducted prospectively, aiming to correct individual misreported trials in real time rather than retrospectively publishing an overall prevalence figure. This allowed us to go beyond previous work that documents only prevalence and instead generate data shedding light on the reasons for misreporting and, through letter acceptance rates, also generate an objective measure of journals’ commitment to correct reporting.
Our novel approach of prospective real-time corrections brought several additional methodological benefits. Previous work has mostly assessed whether there was outcome switching at all in a study. To maximise the informativeness, credibility and impact of COMPare’s letters, we needed to assess the extent of outcome misreporting in more detail and share information on each individual misreported outcome. Because of this, and a broader commitment to open science in the team, all underlying raw data were shared in full during the project and prominently signposted in journals and all external coverage. This open approach is likely to have reduced the risk of small coding errors: our data were closely scrutinised by trialists and editors motivated to find evidence of errors; and we were highly motivated to ensure that no errors were found. None of the 29 previous cohort studies has shared data on individual outcomes and trials in this fashion. From all trialist feedback, we found two outcomes miscoded by COMPare out of 756 identified outcome discrepancies; no assertions of miscoding made by editors were valid (see Table 2 and Additional file 5 for examples). It remains possible that our data set contains small additional coding errors, as with all research. However, our prevalence figures are consistent with previous work, and at least three members of the research team reviewed each outcome.
An additional strength of our study is the lack of conflict of interest among the COMPare researchers on the specific interventions being trialled. Critical correspondence on methodological shortcomings in published research often originates from other academics in the same field who may have a personal history, financial or ideological conflicts of interest, or a competitive relationship with the individual research teams involved: this was not the case for our large systematic sample of trials and letters. However, we note that the COMPare team do have a complex range of additional conflicts of interest, in excess of what would normally be declared; for example, an academic in our position may be concerned not to appear critical of a journal, given the importance of journal publication to career progression; all senior academics on the team have previously published in at least one of the journals covered, and two of us (BG and CH) have previously worked with the BMJ on a transparency campaign.
The issue of generalisability is key. Our study covered all trials in five general medical journals, reporting a wide range of interventions from hand washing and acupuncture to antiviral drugs. However, all journals were very high-impact: lower-impact journals may have different or more heterogeneous performance on outcome reporting and publication of correction letters. Furthermore, the fact that our letters were part of a coordinated project may have led editors to treat them differently: it is hard to ascertain whether this would make editors more or less likely to handle them appropriately.
Ideally, our study would have examined trials from a wider sample of journals. However, the workload associated with checking trials in real time, in detail, and maintaining subsequent interactive correspondence within the timeline for publication was extremely high, even for a large coordinated team; and the intention of COMPare was not solely to measure the prevalence of outcome misreporting. Initial plans to maintain the process of analysing trials and submitting correction letters were shelved, due to the high workload and blanket rejection of most letters.
Context of previous work
Our findings on the simple prevalence of outcome misreporting are consistent with previous work. The most current systematic review, from 2015, found 27 studies comparing pre-specified outcomes against those reported, as described above: the median proportion of trials with a discrepancy on primary outcomes was 31% (range 0–100%, IQR 17–45%); in COMPare, we found that 19.4% of trials (95% CI 9.9–28.9%) had unreported primary outcomes. Therefore, while some journals argued that our assessment process was unreasonable, COMPare found a lower prevalence of discrepancies than previous work. Although most previous studies were published within the past decade, all but one included trials that commenced before the ICMJE 2005 policy mandating trial registration with pre-specified outcomes as a condition of acceptance for publication in member journals: this policy may have led to improved reporting standards. However, our findings give little evidence for any strong overall improvement, and two additional cohort studies published in 2008 and 2016 report a prevalence of discrepancies similar to that found in earlier work.
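As an illustration of how a proportion and interval of this kind can be computed, the following minimal sketch uses a normal-approximation (Wald) interval. The counts used (13 of 67 trials) are an assumption: they are not stated in this section and were chosen only because they reproduce the reported 19.4% (95% CI 9.9–28.9%). The interval method is likewise assumed rather than confirmed by the text.

```python
import math

def wald_ci(events, total, z=1.96):
    """Return (proportion, lower, upper) using a normal-approximation interval."""
    p = events / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width

# Hypothetical counts (assumption, not taken from this section):
# 13 trials with unreported primary outcomes out of 67 trials assessed.
p, lower, upper = wald_ci(13, 67)
print(f"{p:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

With a sample of this size the interval is wide, which is worth keeping in mind when comparing the COMPare estimate against the 31% median reported in the systematic review.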
All previous studies have published only prevalence figures describing the overall extent of outcome misreporting, and none attempted to actively correct the record on individual misreported trials. It is widely agreed that the scientific literature should be self-correcting, with researchers engaging in post-publication peer review and submitting critical commentary or corrections in letters for publication, sometimes resulting in formal corrections or retraction where the main results of a study are invalidated by an error identified. Prior to our study, there had been only anecdotal reports that these systems fall short when tested. The largest we are aware of is a retrospective narrative description of four academics’ experience attempting to publish correction letters on 25 studies with various flaws that they identified while writing a newsletter for their research field: they found that “post-publication peer review is not consistent, smooth, or rapid” with journal editors unwilling to publish critical letters, correct errors, or retract articles with errors that invalidate their key findings.
The current Cochrane review of studies examining discrepancies between protocols or registry entries and published trial reports describes a high prevalence of numerous other related reporting flaws, including inconsistencies on sample size calculation, blinding, method of allocation concealment, subgroup analyses, and analytic approach. It is therefore highly likely that the problems we have identified with misreporting, and failure to correct that misreporting, will generalise beyond the single issue of outcome reporting.
Policy implications for journals
Some journals explicitly stated that they did not expect all pre-specified outcomes to be correctly reported, despite being publicly listed as endorsing the CONSORT guidelines which state the contrary. This disparity between public stance and editorial practice is likely to give false reassurance to readers, who may reasonably assume that all pre-specified outcomes are correctly reported in a journal article. While CONSORT compliance would be preferable, we suggest as a minimum that all medical journals explicitly clarify whether they do, or do not, aim to comply with CONSORT (if so, which elements) and specify the documents and methods they use to assess compliance. The workload for our team of independently checking for outcome misreporting was extremely high: however, this would be lower for journals, as any apparent discrepancy could be referred back to the authors, whereas for COMPare letters, an extremely high level of confidence in discrepancies—and therefore more laborious checking—was required before submission.
Most letters were not published, and we encountered instances of what we regard as a failure to abide by best practice around journal correspondence as set out in ICMJE guidance (Table 3, Annals). Since journal editors make editorial judgements about what is published in their journals, they may have significant conflicts of interest when their own editorial processes or judgements are subjected to critical scrutiny. The Lancet has an internal ombudsman issuing an annual report. There have been calls for independent oversight of journal editors for many years: although an external appeals process would likely be valuable, it also risks being cumbersome or vulnerable to abuse by special interest groups. We are aware of no evidence assessing the effectiveness of this approach. At a minimum, we suggest that all correspondence above a basic minimum threshold for quality be published accessibly online, as per Rapid Responses in the BMJ.
We also identified asymmetries in access to critical post-publication peer review. For example, at Annals’ website, all visitors (including those with no password or subscription) can read an abstract that misreports a trial’s outcomes, but only those with password-controlled registration and account access can read online comments demonstrating that pre-specified outcomes were misreported. In addition, there are restrictions on critical post-publication correspondence that may not be justifiable in the era of online publication. For example, most journals had length limits and tight submission deadlines for letters: both of these have been previously criticised. There is further asymmetry here: The Lancet gives readers 2 weeks to submit correspondence on a specific paper yet did not publish some COMPare letters until more than 6 months after initial trial publication. During this period (possibly the period when the trial reports were most read) information about outcome misreporting was effectively withheld from readers. Overall, our findings suggest that post-publication peer review and critical appraisal are not currently well managed by journals. This suggests that alternative approaches such as PubMed Commons (with a lower threshold, instant online publication, good indexing, and independent editorial control) may be more appropriate.
Policy implications for registries
The denigration of trial registry data by some editors was unexpected. Registries were specifically set up as a public time-stamped information resource to address selective outcome reporting. They have received extensive public support from WHO, journals, and ICMJE, who state: “The purpose of clinical trial registration is to prevent selective publication and selective reporting of research outcomes”. The key driver for greater registry uptake was action by journals and specifically the ICMJE stating in 2005 that member journals would not consider unregistered trials for publication. Journals taking the content of registry entries seriously would therefore likely be a key lever in ensuring that registries are used appropriately by trialists; that would be valuable, as we found that registries are often the only publicly accessible time-stamped source for a trial’s pre-specified outcomes. There are no valid reasons why registries should contain outcomes that are discrepant with contemporaneous protocols, as some argued in response to COMPare letters; indeed, trialists in many territories, including the US and European Union, have a legal duty to completely and accurately register their trial, including details of pre-specified outcomes, on a register with a statutory regulatory role. Despite this, for one trial, we found three different contemporaneous sets of pre-specified outcomes spread across two registry entries and one protocol.
Journals varied widely in their practical approach to registries. For example, the BMJ uses registries as the primary source for pre-specified outcomes, whereas Annals editors told COMPare that protocols were their chosen source for assessing complete outcome reporting, but none of five trials published in Annals had a pre-specified protocol publicly available. Annals policy and practice therefore make independent verification of their assessment of correct outcome reporting impossible. We see no justification for relying on protocols when they are routinely unavailable and when registry entries—a legal requirement on publicly accessible services explicitly set up to address selective reporting—are now almost universally available. We hope that trial registry managers will also find our data on some editors’ approaches to their work informative in their broader strategic approach.
Policy implications for CONSORT
We believe that there is a need for greater clarity, emphasis, and awareness raising on certain aspects of CONSORT guidance and a need to review the mechanisms around the EQUATOR (Enhancing the Quality and Transparency Of health Research) network’s public list of journals “endorsing” CONSORT. Since some journals we examined eventually stated that they do not require CONSORT compliance on the key issue of correct outcome reporting, CONSORT may wish to consider removing journal titles from their list, or implementing a two-level approach, with journals opting to “endorse” in spirit or “enforce” in practice, or possibly consider offering a system to check and accredit compliance, for journals wishing to demonstrate credibility to readers.
There are already extensive data on the simple prevalence of methodological and reporting errors for clinical trials. In our view, there is little value in repeating simple prevalence studies unless there are grounds to believe that the prevalence has changed. While we recognise that other research teams may be intimidated by the response our project received, the mixed methods approach of COMPare provides additional insights into the reasons why shortcomings persist despite public statements of adherence to reporting standards.
It is plausible that the modest coverage, impact, internal discussions and public debate triggered by our systematic programme of corrections have had a positive impact on policy or practice at journals. We are therefore now re-assessing outcome reporting in the same five journals to assess whether standards have improved following the initial COMPare study and feedback period. We would welcome others repurposing our methods and have shared our protocol in full online and as Supplementary Material, expanded where appropriate to clarify specific steps for those unfamiliar with specific requirements of CONSORT. We hope other groups may find this useful to run a similar project in a different set of specialty journals, the same journals, or other sectors where RCTs are becoming commonplace, such as development economics, education, or policing. Our method could also be extended to other methodological and reporting issues, including in fields outside of medicine, especially where there are similar methodological shortcomings that can be identified consistently, to produce a similarly comparable cohort of letters. This would allow researchers to assess whether the problem of journals rejecting legitimate critical commentary is limited to high-impact medical journals with a clinical focus and would move current high-profile discussion on shortcomings at journals forward from anecdotal descriptions of challenges around criticising individual studies.
In our view, the traditional model for research on shortcomings in studies’ methods and reporting—publishing prevalence figures alone, for retrospective cohorts—represents a wasteful use of resources. Specifically, it is a waste of the insights generated by expert reviewers, at considerable time and expense, about shortcomings in individual studies. We suggest that all such studies systematically write letters for publication about each individual misreported or flawed study they identify, in order to alert consumers of the academic literature to those flaws, to maximise efficient use of researcher time, to raise awareness of methodological flaws in published research, and to augment the impact of their work. This simple change will help academia to be a learning system with constructive feedback. In addition, it is likely to improve the data quality in methodological research, for the reasons described above, as researchers of studies coded as flawed will be able to openly contest adjudications they regard as inaccurate.