Quality of reporting of clinical non-inferiority and equivalence randomised trials - update and extension

Background Non-inferiority and equivalence trials require tailored methodology, and their adequate conduct and reporting are therefore demanding tasks. The aim of our review was to assess whether the criteria recommended by the CONSORT extension were followed. Methods We searched the Medline database and the Cochrane Central Register for reports of randomised non-inferiority and equivalence trials published in English. We excluded reports on bioequivalence studies, reports focusing on results other than the main results of a trial, and articles whose full text was not available. In total, we identified 209 reports (167 non-inferiority, 42 equivalence trials) and assessed their reporting and methodological quality using items abstracted from the CONSORT extension. Results Half of the articles did not report the method of randomisation, and only a third of the trials were reported to use blinding. The non-inferiority or equivalence margin was defined in most reports (94%), but was justified for only a quarter of the trials. A sample size calculation was reported for 90% of the trials, but the margin was taken into account in only 78%. Both intention-to-treat and per-protocol analyses were presented in less than half of the reports. A confidence interval was given for the results of 85% of the trials. In 21% of the reports, the conclusion was wrong or incomprehensible. Overall, we found a substantial lack of quality in reporting and conduct. The need for improvement also applied to aspects generally recommended for all randomised trials. Quality was partly better in high-impact journals than in other journals. Conclusions There are still important deficiencies in the reporting of the methodological approach as well as of results and interpretation, even in high-impact journals. It seems to take more than guidelines to improve the conduct and reporting of non-inferiority and equivalence trials.


Background
With an increasing number of effective interventions available, placebo-controlled clinical trials are no longer ethically justifiable for many diseases. At the same time, new treatments often cannot be expected to improve on the efficacy of a standard therapy. There are nevertheless frequent instances in which a new therapy is worth evaluating because of an expected advantage other than efficacy. Non-inferiority trials are performed in such situations to rule out that the treatment under investigation has unacceptably worse efficacy than an active control standard therapy, while superiority is acceptable or even desired. In contrast, the aim of equivalence trials is to demonstrate that the difference between the two treatments is not large in either direction. Non-inferiority and equivalence trials require tailored methodology and raise special challenges with respect to design, conduct, analysis and interpretation [1-5]. As a consequence, reporting these trials is also a demanding task.
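The decision logic that distinguishes the two designs can be sketched in a few lines. This is a minimal illustration, not a method taken from the reviewed trials, assuming that the treatment effect is a difference (new minus control) for which larger values favour the new treatment, that the margin is a positive number on the same scale, and that the confidence level was fixed at the design stage. The names `assess_ci` and `CiAssessment` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CiAssessment:
    non_inferior: bool  # lower CI bound lies above -margin
    superior: bool      # lower CI bound lies above zero
    equivalent: bool    # whole CI lies strictly inside (-margin, +margin)


def assess_ci(lower: float, upper: float, margin: float) -> CiAssessment:
    """Compare a CI for the difference (new - control) with a margin.

    Assumes larger differences favour the new treatment and margin > 0.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return CiAssessment(
        non_inferior=lower > -margin,
        superior=lower > 0,
        equivalent=-margin < lower and upper < margin,
    )


# A CI of (-0.05, 0.03) with a margin of 0.10 supports non-inferiority and
# equivalence, but not superiority.
print(assess_ci(-0.05, 0.03, 0.10))
```

A trial can thus be non-inferior without being equivalent (when the upper bound exceeds the margin). The strict inequalities reflect the convention that the interval must exclude the margin; which confidence level to use is a separate design decision.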
The CONSORT statement, first published in 1996 and updated in 2001 [6] and 2010 [7], contains a checklist of recommendations for reporting randomised controlled clinical trials to support authors and editors. Recent publications indicate that reporting quality has somewhat improved over time and that the CONSORT statement has contributed substantially to this improvement [8,9]. Nevertheless, evaluations of the reporting quality of specific trial designs have revealed substantial deficits in the reporting of conduct, analysis and results [10,11]. In a comprehensive review, Le Henanff et al. assessed the methodological quality of 162 articles on non-inferiority and equivalence trials published in 2003 and 2004 [10]. They found important shortcomings in the reporting of such trials and gave practical recommendations for improvement. At the same time, an extension of the CONSORT statement was published that focused on the methodological aspects specific to non-inferiority and equivalence trials, with the aim of improving reporting [12]. In our study, we describe how non-inferiority and equivalence trials published after the release of the CONSORT extension were conducted and reported, and the extent to which they adhered to the CONSORT criteria. To this end, we systematically searched and reviewed articles on trials published in 2009 that aimed to demonstrate non-inferiority or equivalence. In an additional analysis, we compared the results with those previously established for trials published in 2003 and 2004 [10]. Bioequivalence studies were not included in our investigation because several special features distinguish them from clinical non-inferiority and equivalence trials, for example the inclusion of healthy volunteers, the use of equivalence margins that are broadly accepted by regulatory agencies and the scientific community, the disproportionately frequent use of cross-over designs, and conduct under highly standardised conditions.
Therefore, a number of topics assessed in our review do not apply to bioequivalence trials, or are not directly comparable with the situation in such trials. Furthermore, other similar studies [10,13,14] also excluded bioequivalence trials, and one of the aims of our work was to compare our results with theirs.

Search strategy
We used a computerised literature search of the Medline database and the Cochrane Central Register of Controlled Clinical Trials via Ovid SP to identify reports of randomised non-inferiority and equivalence trials. We defined the following search terms: random* AND (equivalence OR equivalent OR noninferiority OR noninferior OR non-inferiority OR non-inferior), and excluded the publication types Meta-Analysis, Review, and Research Support. The search was limited to citations in English published between 1 January 2009 and 31 December 2009. The search date was 6 April 2010. We selected citations by screening title and abstract to identify relevant reports; the final decision was made on the basis of the full text. After internal harmonisation based on the assessment of the first 25 abstracts, each of the three reviewers (PS, MN, and NB) screened one third of the abstracts to identify the relevant reports. Reasons for exclusion of citations or articles are given in Figure 1. In case of duplicate publications, only the article that reported the main results for the primary endpoint was selected.
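The boolean search terms can be illustrated as a local text filter. This is a hypothetical sketch, not the Ovid SP search itself: Ovid's field handling and truncation syntax differ, and the date, language and publication-type limits are not modelled. It also shows why the term "equivalent" matches many unrelated abstracts.

```python
import re

# Hypothetical local re-implementation of the boolean search terms.
# The actual search ran in Ovid SP; its syntax and field handling differ.
# Date, language and publication-type limits are not modelled here.
RANDOM_TERM = re.compile(r"\brandom\w*", re.IGNORECASE)  # mimics random* truncation
DESIGN_TERMS = re.compile(
    r"\b(?:equivalence|equivalent|noninferiority|noninferior"
    r"|non-inferiority|non-inferior)\b",
    re.IGNORECASE,
)


def matches_search(text: str) -> bool:
    """random* AND (equivalence OR equivalent OR noninferiority OR ...)."""
    return bool(RANDOM_TERM.search(text)) and bool(DESIGN_TERMS.search(text))


print(matches_search("A randomised non-inferiority trial of two regimens"))  # True
print(matches_search("Patients were randomized to doses equivalent to 10 mg"))  # True
print(matches_search("A randomised superiority trial"))  # False
```

The second example matches although it describes no equivalence design, which is the kind of non-specific hit that had to be removed during screening.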

Evaluation criteria and data extraction
We extracted specific criteria to examine whether reports were prepared in compliance with the extension of the CONSORT statement for the reporting of noninferiority and equivalence randomised trials [12]. This included criteria referring to the reporting and to the methodological quality of the trials. In addition, we extracted general aspects to characterise the reported trials.
The reference to the respective item described in the CONSORT extension list and the way we abstracted them are given in Table 1. Some of the 22 items listed in the CONSORT extension were difficult to standardise for extraction, or the assessment would have required the evaluation of a number of associated trials published elsewhere. Therefore, we decided to assess 15 derived items that were essential in the context of noninferiority and equivalence, as well as feasible with regard to the extraction procedure. In addition, we assessed the interpretation given in the reports in relation to the results presented. We also checked whether the relevant guidelines such as the CONSORT extension for reporting of non-inferiority and equivalence randomised trials [12], the revised CONSORT statement [6], and the Points to Consider on the Choice of Noninferiority Margin [15] were referenced.
In addition, we compared our results for trials published in 2009 with the results reported for non-inferiority and equivalence trials published in 2003 and 2004, before the CONSORT extension was released [10]. In this comparison, we looked for changes in trial characteristics as well as in adherence to criteria related to reporting and methodological quality. For a more differentiated assessment, we compared the quality of reporting of trials published in four high-impact general medical journals (JAMA, NEJM, Lancet and BMJ; the same selection as in Ghimire 2012 [16]) with that of trials published in low-impact general medical journals and in speciality journals.
The data extraction was done by the three reviewers, using a form we developed to extract details of the selected articles. First, the reviewers completed data extraction for a random sample of seven articles. All discrepancies were discussed in two meetings, the data extraction form was modified, and the procedure was repeated. The finally agreed procedure was fixed in an instruction form accompanying the data extraction form to improve interobserver agreement. Interobserver agreement was then tested using a random sample of 21 articles, divided into three subsets of seven. Each reviewer independently extracted data from 14 articles (two subsets), so that each subset of seven articles was extracted in parallel by two reviewers.
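One allocation consistent with these numbers (21 = 3 × 7 articles; 14 = 2 × 7 per reviewer) assigns one subset to each pair of reviewers. This is our reading of the procedure, not a documented detail, and the function name and article IDs are hypothetical.

```python
from itertools import combinations


def allocate_double_extraction(articles, reviewers):
    """Split articles into one subset per reviewer pair.

    Both reviewers of a pair extract the same subset independently,
    so every article is extracted exactly twice.
    """
    pairs = list(combinations(reviewers, 2))
    if len(articles) % len(pairs) != 0:
        raise ValueError("articles must split evenly across reviewer pairs")
    size = len(articles) // len(pairs)
    assignment = {reviewer: [] for reviewer in reviewers}
    for k, pair in enumerate(pairs):
        subset = articles[k * size:(k + 1) * size]
        for reviewer in pair:
            assignment[reviewer].extend(subset)
    return assignment


# 21 sampled articles (hypothetical IDs 1-21), three reviewers.
plan = allocate_double_extraction(list(range(1, 22)), ["PS", "MN", "NB"])
print({reviewer: len(ids) for reviewer, ids in plan.items()})  # {'PS': 14, 'MN': 14, 'NB': 14}
```

With three reviewers there are exactly three pairs, so each reviewer belongs to two pairs and extracts two subsets.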
The data of all remaining articles were then extracted by a single reviewer. If there was uncertainty about a specific report, a second reviewer checked the data extraction and the issue was resolved by discussion.
We classified a trial as a non-inferiority or equivalence trial if the terms non-inferiority or equivalence were part of the title or were mentioned as the aim of the trial in the abstract. If this information was missing, the aim of the planned analysis reported by the authors, or the type of analysis actually performed, was used to classify the trial.

Data analysis
We calculated descriptive summary statistics for the general and specific items, stratified by trial design. Categorical variables were described by absolute and relative frequencies, and continuous variables by the median and the 25th and 75th percentiles. Differences in trial characteristics and in adherence to reporting criteria (before and after the CONSORT extension; high- and low-impact general medical journals and speciality medical journals) were quantified by the increase factor based on trials per year or by the absolute difference, respectively, together with the associated 95% confidence intervals (CI).
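The two comparison measures can be sketched as follows. The text does not state which confidence interval method was used, so the Wald interval for a difference in proportions below is an assumption for illustration; the function names are hypothetical. The increase factor uses the report counts given in this article (209 reports from one year versus 162 from two years).

```python
import math
from statistics import NormalDist


def increase_factor(n_after, years_after, n_before, years_before):
    """Ratio of the number of trials published per year (after / before)."""
    return (n_after / years_after) / (n_before / years_before)


def risk_difference_ci(x1, n1, x2, n2, level=0.95):
    """Absolute difference p1 - p2 with a Wald confidence interval.

    The Wald interval is an illustrative choice; other methods
    (e.g. Newcombe's) behave better for small samples or extreme p.
    """
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se


# 209 reports from one year (2009) v 162 reports from two years (2003-2004).
print(round(increase_factor(209, 1, 162, 2), 2))  # 2.58

# Hypothetical counts, for illustration only.
print(risk_difference_ci(50, 100, 40, 100))
```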
Interobserver agreement between the reviewers was estimated for several essential criteria using Fleiss' kappa [17]. Agreement ranged from 0.27 to 1.0: statement of the non-inferiority or equivalence margin, 1.0 (a CI according to Fleiss could not be calculated); justification of the margin, 0.54 (95% CI 0.17 to 0.59); sample size calculation based on the margin, 0.61 (95% CI 0.32 to 0.90); report of both per-protocol (PP) and intention-to-treat (ITT) analyses, 0.27 (95% CI 0.16 to 0.38); and presentation of the CI for the difference between treatment groups, 0.61 (95% CI 0.32 to 0.90).
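For reference, the point estimate of Fleiss' kappa can be computed as below (standard formula; the variance and CI calculation are omitted). The ratings are hypothetical, with two raters per report and two categories, as in the double extraction described above.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for N subjects, each rated by the same number n of raters.

    counts[i][j] is the number of raters who put subject i into category j.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    # Overall share of all ratings falling into each category.
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_categories)]
    # Observed agreement per subject: agreeing rater pairs / all rater pairs.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1 - p_e)


# Two raters per report, categories [yes, no]; hypothetical ratings.
ratings = [[2, 0], [2, 0], [0, 2], [1, 1], [2, 0]]
print(round(fleiss_kappa(ratings), 3))  # 0.524
```

Here four of five reports are rated identically (observed agreement 0.8), but because "yes" dominates, chance agreement is already 0.58, so kappa is only about 0.52.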
All analyses were carried out with the software SAS version 9.2 (SAS Institute Inc., Cary, NC, USA).

Identification of reports
The literature search provided a total of 869 citations. A subset of 294 potentially relevant articles was identified by screening the titles and abstracts. After reviewing the full-text articles, we identified 209 primary reports; of these, 167 (80%) were non-inferiority trials and 42 (20%) were equivalence trials. The flowchart (Figure 1) gives an overview of the selection process. The most frequent reasons for exclusion from the analysis were that the trial was not a non-inferiority or equivalence trial and was not a randomised clinical trial; this was mostly because the terms 'equivalence' and 'random' were used in a different context, or for descriptive purposes only. Table 2 provides information on the characteristics of the non-inferiority and equivalence trials included in the analysis. Eighty reports (38%) were published in general medical journals and 129 reports (62%) in speciality journals. More than three-quarters of the trials compared two treatment groups. Thirty-one trials (15%) investigated three groups and only 15 (7%) investigated four groups. Twenty-five trials (12%) included a placebo arm: to show non-inferiority or equivalence of one or two experimental treatments versus placebo (ten), to show superiority of one or two treatments compared by non-inferiority or equivalence analysis (eleven), or to investigate another objective (four). Two-thirds of the reports (n = 142, 68%) dealt with trials that investigated different treatment modalities. Eighty reports were of trials comparing the same pharmacological treatments but using different strategies (n = 49, 23%), doses (n = 27, 13%) or treatment durations (n = 5, 2%). Because some trials had more than two groups, some reports referred to the investigation of both different and the same treatment types (n = 13, 6%).
Five reports described trials with a cross-over design (2%).

Characteristics of reports
In 104 of the trials reported, a binary endpoint was chosen as the primary endpoint (50%). A continuous endpoint was investigated in 90 reports (43%) and a time-to-event endpoint in 15 reports (7%).
The median number of patients randomised per trial was 338 (25th to 75th percentiles 174 to 686). Non-inferiority trials had larger sample sizes than equivalence trials: 379 (202 to 714) v 198 (90 to 424). Table 3 shows the percentage of non-inferiority and equivalence trials that met the criteria recommended in the CONSORT extension related to reporting and methodological quality, as well as the assessment of the conclusions given by the authors. Just over half of the articles gave information on the method of randomisation (n = 115, 55%), whereas 190 articles (91%) reported on the method of blinding. Almost half the reports described the trial as double blind (43%), and more than one-third as open label (37%). Dates defining the period of recruitment were frequently presented (62%), but dates defining the period of follow-up were given in only 10% of the reports. The flow of participants through the trial was presented in 146 (70%) of the reports, with more non-inferiority trials than equivalence trials (73% v 57%) following this recommendation.

Reporting quality
The majority of reports provided baseline information for each group (96%). In reports of equivalence trials the percentage was only slightly lower than in reports of non-inferiority trials (93% v 97%). Adverse events were presented in three-quarters of reports, with a far higher percentage for non-inferiority trials (80%) than for equivalence trials (62%).
The criteria that are particularly important for non-inferiority and equivalence trials with respect to reporting quality were only partly followed. Most of the reported trials could be identified as non-inferiority or equivalence trials from the title or abstract (84%), but a justification for the design (48%) or a clear hypothesis (50%) was stated in only half of the reports. A justification for the choice of the non-inferiority or equivalence margin was given in only 24% of reports. The majority of reports identified a clear primary endpoint (94%) and gave information on the sample size calculation (90%), the statistical methods used for the group comparison (94%) and the analysis sets (90%). In most cases, the percentage of non-inferiority trials meeting the respective criterion was higher than that of equivalence trials. An exception was the justification of the margin, which was stated more often in reports of equivalence trials than of non-inferiority trials (31% v 23%).

Methodological quality
The evaluation of the criteria related to methodological quality showed that a high percentage of reports defined the non-inferiority or equivalence margin (non-inferiority v equivalence trials: 96% v 86%). The defined margin was taken into account in the sample size calculation in 81% of the non-inferiority trials and 64% of the equivalence trials. Eighty-five percent of reports presented results with a CI, but only 16% gave the recommended graphical display of the CI together with the margin. Less than half of the reports (42%) stated the results of both the ITT and the PP analysis; more non-inferiority trials than equivalence trials met this criterion (44% v 31%).
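To see why the margin belongs in the sample size calculation, consider the common normal-approximation formula for a non-inferiority comparison of two proportions. This is an illustrative sketch, not necessarily the formula used in any reviewed trial; it assumes a difference new minus control where higher is better and a one-sided significance level. Its parameters (control and expected proportions, margin, alpha, power) are the elements a reader needs to recalculate the sample size.

```python
import math
from statistics import NormalDist


def ni_sample_size_per_group(p_control, p_new, margin, alpha=0.025, power=0.9):
    """Approximate patients per group for a non-inferiority comparison of two
    proportions (difference new - control, higher = better), using the
    normal approximation with a one-sided significance level alpha.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    # Distance between the assumed true difference and the margin boundary -margin.
    effect = p_new - p_control + margin
    if effect <= 0:
        raise ValueError("assumed difference must lie above -margin")
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_new * (1 - p_new) + p_control * (1 - p_control)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Assumed 80% success in both arms, margin of 10 percentage points.
print(ni_sample_size_per_group(0.80, 0.80, 0.10))  # 337
# Halving the margin roughly quadruples the required size.
print(ni_sample_size_per_group(0.80, 0.80, 0.05))  # 1345
```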
All reports presented an interpretation of the trial results. The conclusion drawn by the authors was comprehensible and accurate in 165 reports (79%), with a higher percentage of correct interpretations among non-inferiority trials than among equivalence trials (81% v 71%). In total, the interpretation was wrong in 14 reports (7%), and in 30 reports (14%) it was incomprehensible in the light of the results presented. The percentage of equivalence trials presenting an incomprehensible conclusion was considerably higher than that of non-inferiority trials (43% v 13%). A third of the reports (34%) stated more specifically which advantage was expected as compensation for a potential, though irrelevant, inferiority. In 52 reports (25%), this advantage could be confirmed by the trial results. However, the reviewers frequently could not verify the reported advantage because the report lacked the necessary information (for example, the actual reduction in costs). Relevant guidelines were rarely cited. The CONSORT extension was cited in 16 reports (8%), the CONSORT statement for randomised trials only twice, and the Points to Consider document on the choice of the margin only four times.

Changes in trial characteristics and quality of reporting
The comparison of our findings with the results of a previous study published by Le Henanff et al. [10] before the CONSORT extension was released is shown in Table 4 and Figure 2. The previous study included 162 articles identified within two years (116 non-inferiority reports, 46 equivalence reports; Table 4). The comparison of adherence to criteria related to reporting quality showed changes that varied considerably across topics. The proportion of non-inferiority and equivalence trials that were clearly identifiable from the title or abstract decreased by 15 percentage points, from almost 100% to 84%. The proportion of trials described as double blind also decreased by 15 percentage points (from 58% to 43%).
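The changes in this comparison are absolute differences between proportions, as defined in the data analysis. For the double-blind item, the absolute and the relative change differ considerably:

```latex
\Delta_{\text{abs}} = 43\% - 58\% = -15\ \text{percentage points},
\qquad
\Delta_{\text{rel}} = \frac{43\% - 58\%}{58\%} \approx -25.9\%
```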
The percentage of reports presenting a justification of the defined margin remained low (change from 20% to 24%), whereas the proportion of reports that presented a sample size calculation increased by 11 percentage points. Reports of non-inferiority trials more often provided all elements needed to recalculate the sample size (an increase of 8 percentage points), but this proportion decreased for equivalence trials (by 5 percentage points).
With respect to methodological quality, there was hardly any improvement. Reports of non-inferiority trials published after the release of the CONSORT extension more often presented a sample size calculation that took the margin into account than previously published reports, with an absolute change of 12 percentage points (95% CI 1.4 to 21.8). Adherence to other methodological items, such as presenting the defined margin, presenting the results of both the PP and the ITT analysis, and presenting results with a CI, showed almost no change (absolute changes in adherence between −0.1 and 3.1 percentage points). The percentage of reports of non-inferiority trials that complied with all four important methodological criteria also increased, by 17 percentage points (95% CI 6.6 to 27.5).
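A composite criterion such as "compliant with all four important methodological criteria" can be much lower than each individual item. The sketch below uses hypothetical field names for the four criteria and invented example records; the exact operational definitions are those of the underlying reviews.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodsAssessment:
    """Hypothetical per-report record of the four methodological criteria."""
    margin_defined: bool
    sample_size_uses_margin: bool
    itt_and_pp_reported: bool
    ci_reported: bool

    def meets_all(self) -> bool:
        return all((self.margin_defined, self.sample_size_uses_margin,
                    self.itt_and_pp_reported, self.ci_reported))


def share_meeting_all(reports):
    """Proportion of reports that satisfy every criterion at once."""
    if not reports:
        raise ValueError("no reports given")
    return sum(r.meets_all() for r in reports) / len(reports)


# Each criterion is met by at least 3 of 4 invented reports...
reports = [
    MethodsAssessment(True, True, True, True),
    MethodsAssessment(True, True, False, True),
    MethodsAssessment(True, False, True, True),
    MethodsAssessment(True, True, True, False),
]
# ...but only one report meets all four.
print(share_meeting_all(reports))  # 0.25
```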
Reports of equivalence trials showed a more negative picture. Adherence to the methodological criteria decreased compared with reports published before the CONSORT extension, with absolute changes between −12 and −6 percentage points. Only the proportion of reports that complied with all important methodological criteria remained stable at about 20%, with an absolute change of 1.9 percentage points (95% CI −15.0 to 18.8).

Discussion
We investigated standard criteria regarding the reporting and methodological quality of non-inferiority and equivalence trials recommended in the CONSORT extension and compared our results with the results of a previous investigation including trials reported before the CONSORT extension was published [10,12].
The major result of our investigation was a substantial lack of quality in reporting and conduct. Most of the reported trials could be identified as non-inferiority or equivalence trials from the title or abstract, but less than half of the reports gave a rationale for the design or the expected advantage. The non-inferiority or equivalence margin was defined in most reports, but the majority of reports gave no justification for the margin. A sample size calculation was presented in 90% of the reports, but it did not always take the margin into account. More than half of the reports did not present the results of both ITT and PP analyses. Far too many reports gave an interpretation that was wrong or incomprehensible. Reports of equivalence trials mostly showed lower reporting quality than reports of non-inferiority trials. Overall, there has been no relevant improvement since the release of the CONSORT extension for non-inferiority and equivalence trials. Moreover, the need for improvement also extended to items that should be standard for any randomised trial. However, this overall finding might be biased by an increased number of trials published in journals that do not have a rigorous peer-review process based on strict rules and that do not endorse the CONSORT statement and its extensions.
Hopewell and colleagues showed that journal endorsement of CONSORT seems to be associated with improvements in the reporting of randomised trials [18]. They highlighted the need for journals to endorse the CONSORT statement and its extensions for special designs, and to incorporate the checklists and recommendations into the peer-review process [18]. In 2006, Hopewell et al. [8] found that 44% of randomised trials were published in journals that endorsed CONSORT. Later, in 2008, Hopewell and colleagues [18] reported that an even smaller proportion of a selection of high-impact journals mentioned the general CONSORT statement in their instructions for authors, and that only a fraction of those journals stated that this was a requirement. Moreover, very few journals mentioned the CONSORT extension papers [18]. Since neither the present nor the previous study determined the proportion of reports published in CONSORT-endorsing journals, we cannot give this proportion for journals publishing non-inferiority and equivalence trials. Instead, we stratified by impact factor and journal type and compared the results for reports published in high-impact general medical journals, low-impact general medical journals and speciality journals (see Additional file 1, Annex). The number of non-inferiority and equivalence trials published per year has increased strongly. It should be noted that reports published in speciality medical journals still predominate in our investigation, but their proportion is somewhat smaller than in the previous study, which might be due to a different classification.
In our review we found that only a little over half of the reports gave information on randomisation. The proportion was substantially larger in high-impact journals than in low-impact general medical journals or speciality journals. Since comparable groups are best attained by randomly dividing a single sample population, and randomisation is a sound basis for statistical inference [4], it should be implemented and its method should be reported.
We found that more than half of the articles reported blinding of the trial and more than one-third were open label trials. Hopewell et al. [8] found a similar proportion of randomised trials referring to any blinding, but a smaller proportion reported as unblinded. However, they also found a substantial proportion of reports that did not give clear information on blinding. In the evaluation of non-inferiority trials by Wangge et al. [14], one third of trials were also reported as open label. They pointed out that this was not consistent with the guidelines, which recommend blinding in any randomised trial whenever possible [4].
Figure 2 Change of adherence to quality criteria for reporting of non-inferiority (a) and equivalence (b) trials published after release of the CONSORT extension for non-inferiority and equivalence trials in relation to trials published before the CONSORT extension [10].
A diagram showing the flow of participants for each group was presented in more than two-thirds of the reports of non-inferiority and equivalence trials, and in an even larger proportion of those in high-impact journals. Reviews of randomised trials in specific medical fields found smaller proportions [19,20]. Surprisingly, Hopewell et al. [8] reported that fewer than one third of reports of randomised trials published in 2006 included details of participant flow. Overall, for these items, the reporting quality of non-inferiority and equivalence trials seemed somewhat better than that of randomised trials in general.
The most important specific element in planning a non-inferiority or equivalence trial is the definition of the margin and the method used to determine it. In non-inferiority trials, the margin was reported in most of the articles, much more often than in equivalence trials. Compared with the results of Le Henanff et al., this was no improvement, and it was even a change for the worse in equivalence trials. Only reports published in high-impact general medical journals fully met this requirement. However, the far more important justification of the margin was given in only a quarter of the reports. Although this was a small improvement over the period before the release of the CONSORT extension [12], it is far too low. Wangge et al. reported that a higher percentage of articles (46%) gave the method by which the margin was determined, but this is also insufficient [14]. This higher percentage could be due to the different selection of trials investigated by those authors, which excluded non-drug trials. However, we did not find such a high percentage among trials published in high-impact general medical journals.
Most of the reports presented information on the sample size calculation, and there was some improvement compared with previous studies. Although this proportion was higher in high-impact than in low-impact journals, the more important finding was that in a substantial proportion of reports the margin was not considered, and an even larger number of articles did not report all elements needed for a recalculation (20% and 40%, respectively). Here too, we found better reporting of these details in high-impact journals than in speciality or low-impact journals. The lower quality in speciality journals was also confirmed by a review of non-inferiority and equivalence trials in a specialised medical area [13].
Table 4 Changes of trial characteristics and adherence to criteria related to reporting and methodological quality of non-inferiority and equivalence trials reported before and after publication of the CONSORT extension
With respect to the analysis, the guidelines state that ITT and PP analyses are of equal importance in non-inferiority and equivalence trials [5,12], since both analyses can be biased. However, fewer than half of the reports (42%) stated the results of both analyses, which is similar to the percentage in the period before publication of the CONSORT extension [10]. Trials published in high-impact journals showed only slightly better results. The overall results found by Wangge et al. were similar, although the proportion of reports in high-impact journals presenting both analysis sets was considerably lower [14].
In most reports, the results were presented with a CI as recommended (85%); in high-impact journals, this was true of all reports. Nevertheless, only a fraction of reports displayed the CI graphically together with the margin, which is the recommended and most informative way of presenting the results with respect to interpretation. In our investigation there were only small differences in this respect between trials published in the different types of journals. In the speciality reports reviewed by Eyawo et al., however, graphical display of results was very rare [13].
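The recommended display places the CI for the treatment difference next to the margin. This is usually done with a plotting library; the dependency-free sketch below renders the same idea as two rows of text, assuming a difference scale from −0.2 to 0.2 and hypothetical values. The function name and layout are our own.

```python
def text_ci_plot(lower, estimate, upper, margin,
                 axis_min=-0.2, axis_max=0.2, width=41):
    """Return two text rows: the CI for the difference (new - control) and a
    reference row marking -margin, zero and +margin on the same axis."""
    if not (axis_min < axis_max and width >= 3):
        raise ValueError("invalid axis")
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")

    def col(x):
        # Map a value on [axis_min, axis_max] to a column, clamping outliers.
        x = min(max(x, axis_min), axis_max)
        return round((x - axis_min) / (axis_max - axis_min) * (width - 1))

    ci_row = [" "] * width
    for c in range(col(lower), col(upper) + 1):
        ci_row[c] = "-"
    ci_row[col(lower)] = "["
    ci_row[col(upper)] = "]"
    ci_row[col(estimate)] = "o"  # point estimate

    ref_row = [" "] * width
    ref_row[col(0.0)] = ":"      # no difference
    ref_row[col(-margin)] = "|"  # non-inferiority margin
    ref_row[col(margin)] = "|"   # upper equivalence margin
    return "".join(ci_row), "".join(ref_row)


ci, ref = text_ci_plot(-0.05, -0.01, 0.03, 0.10)
print(ci)
print(ref)
```

Because the whole interval lies to the right of the left-hand margin mark, the reader can see at a glance that non-inferiority is supported.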
In nearly two-thirds of the reports, the authors concluded that non-inferiority or equivalence, respectively, had been demonstrated. Wangge et al. found a substantially higher proportion of reports claiming to have demonstrated non-inferiority (90%) [14]. More important than the authors' conclusion itself, however, is whether it is supported by the results presented. The percentage of reports with a wrong or incomprehensible conclusion amounted to one fifth for non-inferiority trials and even more for equivalence trials, which is far too high. After all, many readers take in only the main message of a paper and will be misled.
A strength of our investigation is that we assessed the complete set of reports of non-inferiority and equivalence trials published in 2009, identified by a clearly defined search strategy. Hence, our investigation covers the entire picture of published reports and goes considerably beyond a more or less representative random sample [11,14] or an analysis of trials in a specific therapeutic area only [13,19-22].
In order to check the criteria most relevant for non-inferiority and equivalence trials, we abstracted most of the recommendations described in the extension to the CONSORT statement for this trial type. Furthermore, we defined all evaluation criteria a priori and established a comprehensive review procedure.
However, our research also has several limitations. Because of the considerable effort involved, we could only investigate reports published in 2009, and therefore had no data of our own from another year for a direct comparison. We used a search strategy similar to the one used in a previous survey published in 2006 [10] to allow comparison with that survey and to investigate the impact of the CONSORT extension. However, we are aware of possible bias due to the passage of time and different reviewers. This approach was also limited by the selection of criteria reported in the previous publication [10], and did not allow a comparison of some important items or a stratified comparison of high- and low-impact general medical journals for both periods. We could therefore only present the stratified results for reports published in 2009. Moreover, it was not possible to standardise or abstract all 22 items referred to in the CONSORT statement. For example, whether the interventions are appropriate with respect to the trials that established the efficacy of the reference treatment is an important issue, but assessing it would have required the evaluation of a huge number of associated trials published elsewhere. Including the term 'equivalent' in our database search produced a particularly high number of non-specific results. Although each abstract was carefully examined and questionable cases were clarified by two reviewers using the full text, it is possible that some suitable reports were not selected for the analysis.

Conclusion
There are still important deficiencies in the reporting of the methodological approach as well as of the results and interpretation of non-inferiority and equivalence trials, even in high-impact journals. Improving the overall situation seems to require measures beyond appropriate guidelines and recommendations. It might help to provide a better overview of, and access to, the guidelines relevant for the different trial types. But it might be more important to support researchers and reviewers by offering specific training, accompanied by an explicit demand for strict monitoring of CONSORT requirements in the peer-review process. This approach is strongly supported by the EQUATOR Network, an international initiative that aims to improve the reliability and value of the medical research literature by promoting transparent and accurate reporting of research studies [23,24]. Hopefully, these comprehensive measures will have a positive effect on the quality of reporting for different trial types. In any case, there is an urgent need for improvement, which is especially important against