We investigated standard criteria regarding the reporting and methodological quality of non-inferiority and equivalence trials recommended in the CONSORT extension and compared our results with the results of a previous investigation including trials reported before the CONSORT extension was published [10, 12].
The major result of our investigation was a substantial lack of quality in reporting and conduct. Most of the reported trials could be identified as non-inferiority or equivalence trials based on title or abstract, but fewer than half of the reports gave a rationale for the design or the expected advantage. The non-inferiority or equivalence margin was defined in most reports, but the majority gave no justification for the margin chosen. A sample size calculation was presented in 90% of the reports, but it did not always take the margin into account. More than half of the reports did not present the results of both ITT and PP analyses. Far too many reports gave an interpretation that was wrong or incomprehensible. Reports of equivalence trials mostly showed a lower reporting quality than reports of non-inferiority trials. Overall, there has been no relevant improvement since the release of the CONSORT extension for non-inferiority and equivalence trials. Moreover, the aspects in need of improvement included items that should be standard quality for any randomised trial.
However, this overall finding might be biased by an increased number of trials published in journals that lack a rigorous, rule-based peer-review process and that do not endorse the CONSORT statement and its extensions.
Hopewell and colleagues showed that journal endorsement of CONSORT seems to be associated with improvements in the reporting of randomised trials. They highlighted the need for journals to endorse the CONSORT statement and its extensions for special designs and to incorporate the checklists and recommendations into the peer-review process. In 2006, Hopewell et al. found that 44% of randomised trials were published in journals that endorsed CONSORT. Later, in 2008, Hopewell and colleagues reported that an even smaller proportion of a selection of high-impact journals mentioned the general CONSORT statement in their instructions for authors, and that only a fraction of those journals stated that this was a requirement. Moreover, very few journals mentioned the CONSORT extension papers. Since neither the present nor the previous study determined the proportion of reports published in CONSORT-endorsing journals, we cannot give this proportion for journals publishing non-inferiority and equivalence trials. Instead, we stratified by impact factor and journal type and compared the results for reports published in high-impact general medical journals, in low-impact general medical journals, and in speciality journals (see Additional file 1, Annex). The number of non-inferiority and equivalence trials published per year has increased strongly. It should be noted that reports published in speciality medical journals still predominate in our investigation, but their proportion is somewhat smaller than in the previous study, which might be caused by a different classification.
In our review we found that only a little over half of the reports gave information on the randomisation. The proportion was substantially larger in high-impact journals than in low-impact general medical journals or speciality journals. Since a similar composition of the groups compared is best attained by randomly dividing a single sample population, and randomisation is a sound basis for statistical inference, it should be implemented and the method should be reported.
We found that more than half of the articles reported blinding of the trial and more than one third were open-label trials. Hopewell et al. found a similar proportion of randomised trials reporting any blinding, but a smaller proportion were reported as unblinded. However, they also found a substantial proportion of reports that did not give clear information on blinding. In the evaluation of non-inferiority trials by Wangge et al., one third of trials were likewise reported as open label. In their review they pointed out that this was not consistent with the guidelines, which recommend blinding of any randomised trial whenever possible.
A diagram showing the flow of participants for each group was presented in more than two thirds of the reports on non-inferiority and equivalence trials. The subgroup of high-impact journals showed an even larger proportion. Reviews of randomised trials in special medical fields found smaller proportions [19, 20]. Surprisingly, Hopewell et al. reported that fewer than one third of reports of randomised trials published in 2006 included details of participant flow. Overall, with respect to these items there seemed to be a somewhat better reporting quality in non-inferiority and equivalence trials than in randomised trials in general.
The most important specific element in planning a non-inferiority or equivalence trial is the definition of the margin and the method used to determine it. In non-inferiority trials the margin was reported in most of the articles, much more often than in equivalence trials. Compared to the results of Le Henanff et al. this was no improvement, and even a change for the worse in equivalence trials. Only reports published in high-impact general medical journals fulfilled this requirement in full. However, the far more important justification of the margin was given in only a quarter of the reports. Though this represents a small improvement over the period before the release of the CONSORT extension, it is far too low. Wangge et al. reported a higher percentage (46%) of articles that gave the method by which the margin was determined, but this is also insufficient. The higher percentage could be caused by the different selection of trials the authors investigated, which excluded non-drug trials. However, we did not find such results for trials published in high-impact general medical journals.
Most of the reports presented information on the sample size calculation, and there was some improvement in comparison to previous studies. Although this proportion was higher in high-impact than in low-impact journals, the more important finding was that in a substantial proportion of reports the margin was not considered, and an even larger number of articles did not report all elements needed for a recalculation (20% and 40%, respectively). Here too, we found a better reporting quality with respect to these details in high-impact journals than in speciality or low-impact journals. The lower quality in speciality journals was also confirmed by a review of non-inferiority and equivalence trials in a specialised medical area.
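To illustrate why the margin must enter the sample size calculation, the following sketch applies the standard normal-approximation formula for a continuous endpoint with equal allocation; the function name, default parameters, and numerical inputs are illustrative choices of ours, not values taken from the reviewed trials.

```python
import math
from statistics import NormalDist


def ni_sample_size_per_group(sd, margin, alpha=0.025, power=0.90, true_diff=0.0):
    """Per-group sample size for a non-inferiority trial with a continuous
    endpoint (normal approximation, equal allocation, one-sided alpha).

    sd        : assumed common standard deviation of the endpoint
    margin    : non-inferiority margin (Delta > 0)
    true_diff : assumed true difference between treatments (often 0)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided significance level
    z_beta = NormalDist().inv_cdf(power)       # corresponds to power 1 - beta
    n = 2 * (sd * (z_alpha + z_beta) / (margin - true_diff)) ** 2
    return math.ceil(n)


# Halving the margin quadruples the required sample size:
print(ni_sample_size_per_group(sd=1.0, margin=0.5))   # 85 per group
print(ni_sample_size_per_group(sd=1.0, margin=0.25))  # 337 per group
```

A report that states the power, significance level, assumed variability, and the margin allows exactly this kind of recalculation; omitting any one of these elements makes the stated sample size unverifiable.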
With respect to the analysis, the guidelines state that ITT and PP analyses are of equal importance in non-inferiority and equivalence trials [5, 12], since either analysis can be biased. However, fewer than half of the reports stated the results of both analyses, which is similar to the percentage in the period before publication of the CONSORT extension. Trials published in high-impact journals showed only slightly better results. The overall results found by Wangge et al. were similar, though their proportion of high-impact journals reporting on both analysis sets was considerably lower.
In most reports (85%) the results were presented with a CI, as recommended. In high-impact journals this was true for 100% of the reports. Nevertheless, only a fraction of reports graphically displayed the CI together with the margin, which is the recommended and most informative way of presenting the results for interpretation. In our investigation there were only small differences between trials published in the different types of journals, but in the speciality reports reviewed by Eyawo et al., graphical display of the results was very rare.
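The interpretation rule underlying such a display can be stated compactly. This is a restatement of the standard CI-based criteria under a conventional sign convention (test minus reference, with larger differences favouring the test treatment), not a formula taken from the reviewed reports. With $\Delta > 0$ the pre-specified margin and $[L, U]$ the two-sided $(1-2\alpha)$ confidence interval for the treatment difference:

```latex
\underbrace{L > -\Delta}_{\text{non-inferiority shown}}
\qquad\qquad
\underbrace{-\Delta < L \;\text{ and }\; U < \Delta}_{\text{equivalence shown}}
```

Plotting $[L, U]$ against the vertical lines at $-\Delta$ (and, for equivalence, $+\Delta$) makes these conditions visible at a glance, which is why the graphical display is the most informative presentation.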
In nearly two thirds of the reports the authors concluded that non-inferiority or equivalence, respectively, was demonstrated. Wangge et al. found a substantially higher proportion of reports claiming to have demonstrated non-inferiority (90%). Nevertheless, more important than the authors' conclusion itself is whether it is supported by the results presented. The percentage of reports with a wrong or incomprehensible conclusion amounted to one fifth for non-inferiority trials and even more for equivalence trials, which is far too high. After all, many readers take away only the primary message of a paper and will be misled.
The strengths of our investigation were that we assessed the complete set of reports of non-inferiority and equivalence trials published in 2009 and identified these by a clearly defined search strategy. Hence, the basis of our investigation is the entire picture of published reports and considerably exceeds a more or less representative random sample [11, 14] or an analysis of trials regarding only a specific therapeutic area [13, 19–22].
In order to check the criteria most relevant for non-inferiority and equivalence trials, we abstracted most of the recommendations described in the extension to the CONSORT statement for this trial type. Furthermore, we defined all evaluation criteria a priori and established a comprehensive review procedure.
However, our research also has several limitations. Owing to the considerable effort involved, we could only investigate reports published in 2009 and therefore had no investigation of our own of another year's set of reports for a direct comparison. We decided to use a search strategy similar to the one used in a previous survey published in 2006 to allow a comparison with it and to investigate the impact of the CONSORT extension. However, we are aware of the possible bias due to the passage of time and the use of different reviewers. This approach was limited by the selection of criteria reported in the previous publication and did not allow the comparison of some important items, nor a stratified comparison for high- and low-impact general medical journals across both years. We could therefore present the stratified results only for reports published in 2009.
Moreover, it was not possible to standardise or abstract all 22 items referred to in the CONSORT statement. For example, the appropriateness of the interventions with respect to the trials that established efficacy for the reference treatment is an important issue, but assessing it would have required the evaluation of a huge number of associated trials published elsewhere. Because our database search included the term 'equivalent', we obtained a particularly high number of unspecific results. Though each abstract was carefully examined and questionable cases were clarified by two reviewers using the full text, it is possible that some further suitable reports were not selected for the analysis.