Superiority and non-inferiority: two sides of the same coin?
© The Author(s). 2018
- Received: 1 February 2018
- Accepted: 29 August 2018
- Published: 17 September 2018
The classification of phase 3 trials as superiority or non-inferiority has become routine, and it is widely accepted that there are important differences between the two types of trial in their design, analysis and interpretation.
There is a clear rationale for the superiority/non-inferiority framework in the context of regulatory trials. The focus of our article is non-regulatory trials with a public health objective. First, using two examples from infectious disease research, we show that the classification of superiority or non-inferiority trials is not always straightforward. Second, we show that several arguments for different approaches to the design, analysis and interpretation of superiority and non-inferiority trials are unconvincing when examined in detail. We consider, in particular, the calculation of sample size (and the choice of delta or the non-inferiority margin), intention-to-treat versus per-protocol analyses, and one-sided versus two-sided confidence intervals. We argue that the superiority/non-inferiority framework is not just unnecessary but can have a detrimental effect, being a barrier to clear scientific thought and communication. In particular, it places undue emphasis on tests for significance or non-inferiority at the expense of estimation. We emphasise that these concerns apply to phase 3 non-regulatory trials in general, not just to those where the classification of the trial as superiority or non-inferiority is ambiguous.
Guidelines and statistical practice should abandon the sharp division between superiority and non-inferiority phase 3 non-regulatory trials and be more closely aligned to the clinical and public health questions that motivate the trial.
It is widely accepted that important differences exist between superiority and non-inferiority trials in terms of their design, analysis and interpretation. This is reflected in regulatory agency guidelines, CONSORT statements on the reporting of trials and review articles [1–5]. The European Medicines Agency states the “pre-definition of a trial as a superiority trial, an equivalence trial or a non-inferiority trial is necessary for numerous reasons”, and one reporting guideline asserts that non-inferiority trials present “particular difficulties in their design, analysis, and interpretation”. Focussing on non-regulatory trials with a public health objective, our article challenges this dogma.
CAP-IT is a UK-based factorial randomised controlled trial assessing the optimal dose and duration of amoxicillin treatment for children with community-acquired pneumonia, with a primary outcome of clinical non-response requiring re-treatment (http://www.nets.nihr.ac.uk/projects/hta/138811). In original discussions, it was decided to compare the doses of 125 mg and 250 mg, both three times per day (although the final trial design was based on weight-band dependent dosing). At that time, and in the absence of any randomised evidence, the British National Formulary specified a 250 mg dose, but surveys had shown that the 125 mg dose was more commonly used in clinical practice [6, 7]. This raised the dilemma of which dose should be defined as standard and which as experimental. Following conventional statistical thinking, defining 250 mg as the standard dose implies a non-inferiority trial as the lower 125 mg dose would be unlikely to reduce the rate of relapse; conversely, defining 125 mg as the standard dose implies a superiority trial. The fact that the definition of standard versus experimental dose is arbitrary implies that the classification of the trial as superiority or non-inferiority is also arbitrary.
Table 1. Comparison of SECOND-LINE and EARNEST studies

| | SECOND-LINE | EARNEST |
|---|---|---|
| | Raltegravir less toxic than nucleoside reverse transcriptase inhibitors (NRTIs), aim to show similar efficacy | Raltegravir more expensive, aim to show better efficacy than NRTIs |
| | 37 sites in 15 countries in 5 continents | 14 sites in 5 sub-Saharan African countries |
| Number of subjects | | |
| | Viral load < 200 copies/mL at 48 weeks | Composite endpoint (good HIV disease control) at 96 weeks |
| Frequency of primary endpoint | | |
| | Difference = 1.8% (95% CI –4.7 to 8.3) | Difference = 4.2% (95% CI –2.4 to 10.7) |
| | Criterion for non-inferiority fulfilled | Superiority of raltegravir not shown |
| Interpretation (précised from paper Abstract) | The raltegravir regimen was easy to administer, effective, safe and tolerable … This simple NRTI-free treatment strategy might extend the successful public health approach to management of HIV | NRTIs retained substantial virologic activity without evidence of increased toxicity, and there was no advantage to replacing them with raltegravir |
In the remaining sections we discuss some areas where important differences are perceived to exist between trials classified as non-inferiority and those classified as superiority. Our points apply both to trials where the classification is natural and those where it is not, such as the CAP-IT trial.
In a superiority trial, the sample size calculation is conventionally based on achieving adequate power to demonstrate that the relevant confidence limit for the difference between the two treatments excludes zero, assuming that the experimental treatment is superior by a given amount (‘delta’). In a non-inferiority trial, the calculation is conventionally based on achieving adequate power to demonstrate that the relevant confidence limit excludes the specified non-inferiority margin, assuming that the two treatments are equally effective [5, 11]; these problems are symmetrical, given these assumptions. In the case of continuous variables, the sample size formulae are identical, provided two-sided confidence intervals (CIs) are used. In the case of binary variables, the formulae yield minor differences related to the computation of standard errors; this difference can go in either direction.
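The symmetry described above can be sketched numerically for a continuous outcome. This is a minimal illustration rather than the authors' own calculation: it uses the standard normal-approximation formula, and the function name `n_per_group`, the standard deviation (10), the difference (5) and the 90% power are all assumptions chosen for the example.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sigma, assumed_diff, boundary, alpha_two_sided=0.05, power=0.9):
    """Per-group sample size (normal approximation, continuous outcome).

    The trial is powered so that the relevant limit of a two-sided CI
    excludes `boundary` when the true difference equals `assumed_diff`.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha_two_sided / 2)
    z_beta = z(power)
    # Only the distance between the assumed truth and the boundary matters.
    distance = abs(assumed_diff - boundary)
    return ceil(2 * sigma**2 * (z_alpha + z_beta) ** 2 / distance**2)


sigma = 10.0  # hypothetical common standard deviation
d = 5.0       # hypothetical value used both as 'delta' and as the margin

# Superiority framing: true difference = delta, CI must exclude zero.
n_sup = n_per_group(sigma, assumed_diff=d, boundary=0.0)
# Non-inferiority framing: true difference = 0, CI must exclude -margin.
n_ni = n_per_group(sigma, assumed_diff=0.0, boundary=-d)

print(n_sup, n_ni)
```

The two calls return the same number because, under these assumptions, the formula depends only on the distance between the assumed true difference and the boundary that the confidence limit must exclude.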
This raises the critical question of whether delta and the non-inferiority margin are conceptually different or identical. We believe they are the same, with their meaning best captured by the term ‘smallest clinically important difference’, which can be quantified by eliciting opinions of expert clinicians and patients [13, 14]. There is no good reason why the size of this difference (and by implication the sample size) should depend on whether the trial is defined as superiority or non-inferiority. In particular, it is a misconception that non-inferiority trials need to be much larger than superiority trials. One reason why superiority trials are sometimes smaller is that delta is instead chosen as the value that corresponds to the expected difference, with optimistic values selected to reduce the sample size [14, 15]. Additionally, some non-inferiority trials define the non-inferiority margin as a certain fraction of the effect of the standard treatment (active control) as estimated from previous placebo-controlled trials [1, 16]. However, the logic of this approach has been challenged in the regulatory setting. The rationale for triangulating results with a hypothetical placebo group is even weaker in a health service context if offering no treatment to a patient with the condition in question is not a viable clinical option.
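The dependence of sample size on the chosen difference can be made explicit with the standard normal-approximation formula for a continuous outcome. This formula is an assumption introduced for illustration; the symbols (σ for the common standard deviation, δ for the target difference, z for standard normal quantiles, SCID for the smallest clinically important difference) are not defined in the article.

```latex
n_{\text{per group}} \approx \frac{2\sigma^{2}\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{\delta^{2}}
\qquad\Longrightarrow\qquad
\frac{n(\delta_{\text{expected}})}{n(\delta_{\text{SCID}})}
  = \left(\frac{\delta_{\text{SCID}}}{\delta_{\text{expected}}}\right)^{2}
```

For example, if an optimistic expected difference is set at twice the smallest clinically important difference, the calculated sample size falls to one quarter. The same inverse-square scaling applies whether δ is labelled ‘delta’ or ‘non-inferiority margin’.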
In superiority trials, a rigorous primary analysis should include all randomised patients, irrespective of whether they took study medication as randomised (intention-to-treat). Historically, non-inferiority trials placed greater emphasis on ‘per-protocol’ analyses, which exclude patients with major protocol violations, including unacceptably low levels of adherence to the study drug. The rationale for this is that including such patients dilutes the observed difference between the randomised groups and therefore increases the chance of wrongly demonstrating non-inferiority when the experimental treatment is in fact inferior. However, there is increasing scepticism about the value of per-protocol analyses, both because they subvert the integrity of the randomisation and because of the considerable variation in interpretation of what constitutes the per-protocol population [15, 19–21]. A range of methods to assess the impact of non-adherence have been developed, which can be applied equally to superiority and non-inferiority trials [22, 23]. The selection of the most appropriate method depends critically on the primary research question (e.g. whether inference is intended to apply to all patients or just to those who adhere to the recommended treatment), requiring clear communication between clinical researchers and statisticians.
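The dilution argument can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that non-adherent patients in the experimental arm achieve the same success rate as the control arm; the function name `itt_difference` and all rates are hypothetical and are not taken from any trial.

```python
def itt_difference(control_rate, experimental_rate_if_adherent, adherent_fraction):
    """Expected intention-to-treat difference (experimental minus control).

    Illustrative simplification: non-adherent patients in the experimental
    arm are assumed to achieve the control-arm success rate.
    """
    experimental_rate_itt = (
        adherent_fraction * experimental_rate_if_adherent
        + (1 - adherent_fraction) * control_rate
    )
    return experimental_rate_itt - control_rate


# Hypothetical rates: 80% success on control, 70% on experimental if taken.
true_diff = itt_difference(0.80, 0.70, adherent_fraction=1.0)
diluted_diff = itt_difference(0.80, 0.70, adherent_fraction=0.7)

print(f"full adherence: {true_diff:+.3f}; 70% adherence: {diluted_diff:+.3f}")
```

Under these assumptions, a true deficit of 10 percentage points appears as a deficit of 7 percentage points in the intention-to-treat comparison. The estimate moves towards zero in the same way whether the trial is labelled superiority or non-inferiority.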
In the SECOND-LINE trial described above, the non-inferiority margin was specified as 12%. Further, 80.8% of patients in the NRTI (control) group and 82.6% of patients in the raltegravir (experimental) group met the primary endpoint (HIV RNA plasma viral load < 200 copies/mL at 48 weeks), a difference of 1.8% (95% CI –4.7 to 8.3). In the Abstract, the authors concluded that the “criterion for non-inferiority was fulfilled”, i.e. they followed the advice in the CONSORT guidelines to take the non-inferiority hypothesis (margin) into account in the interpretation of the results. However, the lower limit of the observed CI tells us that raltegravir is inferior to NRTIs by a margin of 4.7% at most, i.e. less than half the pre-specified non-inferiority margin. As inference should be based primarily on point estimates and CIs rather than significance tests, the emphasis in the results should be on the observed value of 4.7% rather than the arbitrary value of 12%. As other authors have pointed out: “we will eventually come to see that the pre-specification by the sponsor of a non-inferiority margin does not form part of any rational approach to analysing such trials”. Finally, reports of superiority trials usually mention ‘delta’ only in the justification of the sample size calculation in the Methods section, and it rarely plays a part in the interpretation of the results. This is in sharp contrast with the central role of the non-inferiority margin in the interpretation of non-inferiority trials, and is a logical inconsistency between the two types of trial.
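The arithmetic behind this comparison can be checked directly from the reported interval. The standard error below is back-calculated on the assumption that the interval is a symmetric normal-approximation interval; it is not reported in the text.

```latex
\begin{aligned}
\text{midpoint} &= \frac{8.3 + (-4.7)}{2} = 1.8, &
\text{half-width} &= \frac{8.3 - (-4.7)}{2} = 6.5, \\[4pt]
\text{implied SE} &\approx \frac{6.5}{1.96} \approx 3.3, &
\frac{\text{margin}}{\lvert\text{lower limit}\rvert} &= \frac{12}{4.7} \approx 2.6 .
\end{aligned}
```

The midpoint reproduces the reported difference of 1.8%, and the data are compatible with a deficit of at most 4.7%, whichever margin had been written into the protocol.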
A leading medical journal requires that superiority trials present two-sided CIs but that non-inferiority trials present one-sided CIs. This is based on the dubious argument that “a non-inferiority trial only aims to demonstrate non-inferiority and does not aim to distinguish non-inferiority from superiority”. However, regulatory agencies do not exclude the possibility of switching between superiority and non-inferiority, and it makes no sense to ignore evidence on superiority if a trial produces such evidence, even if this outcome was not anticipated. A recent paper argues that a clear distinction should be made between statistical and clinical superiority, along with consistent presentation of two-sided CIs.
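The relationship between the two presentations can be sketched as follows, assuming a normal-approximation interval and a one-sided confidence level of 97.5% (the level required by the journal is not stated here). The estimate and standard error are illustrative values, not trial data, and the function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_ci(estimate, se, level=0.95):
    """Symmetric normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z * se, estimate + z * se


def one_sided_lower_bound(estimate, se, level=0.975):
    """One-sided lower confidence bound (no upper limit is reported)."""
    z = NormalDist().inv_cdf(level)
    return estimate - z * se


estimate, se = 1.8, 3.3  # illustrative values, in percentage points

lower, upper = two_sided_ci(estimate, se)
bound = one_sided_lower_bound(estimate, se)

print(f"two-sided 95% CI: {lower:.1f} to {upper:.1f}")
print(f"one-sided 97.5% lower bound: {bound:.1f}")
```

Under these assumptions, the one-sided bound coincides with the lower limit of the two-sided interval. The two-sided presentation therefore loses nothing for the non-inferiority question, while also reporting the upper limit needed to judge any evidence of superiority.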
The SECOND-LINE and EARNEST trials both found no material difference between the two randomised treatment strategies in terms of the study primary endpoints (Table 1). The investigators of EARNEST (the superiority trial) interpreted their results as evidence supporting the use of NRTIs in second-line regimens; the investigators of SECOND-LINE (the non-inferiority trial) concluded that raltegravir was an acceptable alternative to NRTIs in a second-line regimen. These conclusions are both ‘correct’ within the particular statistical framework chosen by the trial investigators. The fact that the conclusions are contradictory, despite a partial geographical overlap in the location of trial sites, raises concerns about the framework itself. While it is not unreasonable for two scientists to interpret the same data differently, the pre-definition of a trial as superiority or non-inferiority tends to impel a certain narrative influenced by the results of tests of significance or non-inferiority.
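Reading the two reported intervals in the same way, without reference to either trial's label, makes the similarity of the results concrete. The sketch below uses only the differences and 95% CIs quoted for the two trials; the function name `summarise` is hypothetical, and it assumes both differences are expressed as raltegravir minus NRTI. The two primary endpoints differ, so the side-by-side comparison is illustrative only.

```python
def summarise(estimate, lower, upper):
    """Describe a difference and its CI in estimation terms only."""
    return {
        "estimate": estimate,
        "includes_zero": lower <= 0.0 <= upper,
        # Largest deficit of raltegravir compatible with the data.
        "worst_plausible_deficit": max(0.0, -lower),
        # Largest advantage of raltegravir compatible with the data.
        "best_plausible_gain": max(0.0, upper),
    }


# Differences in percentage points with 95% CIs, as quoted in the text.
trials = {
    "SECOND-LINE": summarise(1.8, -4.7, 8.3),
    "EARNEST": summarise(4.2, -2.4, 10.7),
}

for name, s in trials.items():
    print(
        f"{name}: estimate {s['estimate']:+.1f}, "
        f"zero inside CI: {s['includes_zero']}, "
        f"deficit up to {s['worst_plausible_deficit']:.1f}, "
        f"gain up to {s['best_plausible_gain']:.1f}"
    )
```

Both intervals include zero and cover a similar range, which is why the contrasting narratives are hard to attribute to the data themselves.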
Non-inferiority trials were originally developed in the setting of drug approval, where regulatory agencies have to make a binary decision: either to license or not to license the experimental treatment. To ensure that the process is transparent and explicit, the agencies justifiably require that the study sponsors produce detailed study protocols, including pre-specification of the non-inferiority margin. In contrast, the main objective of non-licensing trials is to publish information that allows other bodies (commissioners of health services, producers of clinical guidelines, etc.) to make considered decisions about which treatments should be funded or recommended. These decisions are complex and need to consider issues such as cost, adverse drug effects and quality of life, in addition to clinical efficacy. Ideally, decision analysis models should be employed based on a synthesis of all relevant evidence. Evidence syntheses do not treat superiority and non-inferiority trials differently, nor do they consider whether a trial delivered a significant or non-significant result. As pointed out by Claxton: “the historical accident that dictates which of the alternatives is regarded as current practice is irrelevant”.
Our two examples highlight that the classification of trials as superiority or non-inferiority is sometimes arbitrary, particularly when the classification of treatment groups as standard or experimental is not straightforward. This would not matter much if the distinction was only one of terminology, but the received wisdom is that this classification has an important bearing on how a trial is designed, analysed and interpreted. However, we have shown that the arguments in support of this belief are weak and contend that the superiority/non-inferiority framework can act as a barrier to clear scientific thought and communication. In particular, it places undue emphasis on tests for significance or non-inferiority at the expense of estimation. We stress that these concerns apply to phase 3 non-regulatory trials in general, not just to those where the classification is ambiguous. Guidelines and statistical practice should abandon the sharp division between superiority and non-inferiority phase 3 non-regulatory trials, and should instead be more closely aligned to the clinical and public health questions that motivate the trial.
We thank Julia Bielicki, Mark Boyd, Tony Brady, Nicholas Paton and Mike Sharland for their comments on the paper, although the views expressed are our own.
David Dunn and Andrew Copas were supported by the UK Medical Research Council (MR_UU_12023/23).
The paper arose from discussions between DTD and AJC. PB provided clinical insights. DTD drafted the manuscript. All authors contributed to revisions of the draft, and read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
All authors grant consent for publication.
The authors declare that they have no competing interests.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
1. U.S. Department of Health and Human Services, Food and Drug Administration. Non-inferiority clinical trials to establish effectiveness. In: Guidance for industry; 2016. https://www.fda.gov/downloads/Drugs/Guidances/UCM202140.pdf. Accessed 11 Sept 2018.
2. Piaggio G, Elbourne DR, Altman DG, Pocock SJ, Evans SJ. Reporting of noninferiority and equivalence randomized trials: an extension of the CONSORT statement. JAMA. 2006;295(10):1152–60.
3. ICH Harmonised Tripartite Guideline. Choice of control group and related issues in clinical trials. E10. 2000. https://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E10/Step4/E10_Guideline.pdf. Accessed 11 Sept 2018.
4. The European Agency for the Evaluation of Medicinal Products, Committee for Proprietary Medicinal Products. Points to consider on switching between superiority and non-inferiority. 2000. http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500003658.pdf. Accessed 11 Sept 2018.
5. Mauri L, D'Agostino RB Sr. Challenges in the design and interpretation of noninferiority trials. N Engl J Med. 2017;377(14):1357–67.
6. Bielicki JA, Barker CI, Saxena S, Wong IC, Long PF, Sharland M. Not too little, not too much: problems of selecting oral antibiotic dose for children. BMJ. 2015;351:h5447.
7. Saxena S, Ismael Z, Murray ML, Barker C, Wong IC, Sharland M, Long PF. Oral penicillin prescribing for children in the UK: a comparison with BNF for children age-band recommendations. Br J Gen Pract. 2014;64(621):e217–22.
8. World Health Organization. Consolidated guidelines on the use of antiretroviral drugs for treating and preventing HIV infection: recommendations for a public health approach. Geneva: WHO; 2013.
9. Second-line Study Group. Ritonavir-boosted lopinavir plus nucleoside or nucleotide reverse transcriptase inhibitors versus ritonavir-boosted lopinavir plus raltegravir for treatment of HIV-1 infection in adults with virological failure of a standard first-line ART regimen (SECOND-LINE): a randomised, open-label, non-inferiority study. Lancet. 2013;381(9883):2091–9.
10. Paton NI, Kityo C, Hoppe A, Reid A, Kambugu A, Lugemwa A, van Oosterhout JJ, Kiconco M, Siika A, Mwebaze R, et al. Assessment of second-line antiretroviral regimens for HIV therapy in Africa. N Engl J Med. 2014;371(3):234–47.
11. Ganju J, Rom D. Non-inferiority versus superiority drug claims: the (not so) subtle distinction. Trials. 2017;18(1):278.
12. Blackwelder WC. “Proving the null hypothesis” in clinical trials. Control Clin Trials. 1982;3(4):345–53.
13. Corica T, Joseph D, Saunders C, Bulsara M, Nowak AK. Intraoperative radiotherapy for early breast cancer: do health professionals choose convenience or risk? Radiat Oncol. 2014;9:33.
14. Spiegelhalter DJ, Freedman LS, Parmar MKB. Bayesian approaches to randomized trials. J R Stat Soc Ser A. 1994;157:357–416.
15. Fleming TR. Current issues in non-inferiority trials. Stat Med. 2008;27(3):317–32.
16. Schumi J, Wittes JT. Through the looking glass: understanding non-inferiority. Trials. 2011;12:106.
17. Snapinn S, Jiang Q. Preservation of effect and the regulatory approval of new treatments on the basis of non-inferiority trials. Stat Med. 2008;27(3):382–91.
18. Jones B, Jarvis P, Lewis JA, Ebbutt AF. Trials to assess equivalence: the importance of rigorous methods. BMJ. 1996;313(7048):36–9.
19. Hill A, Sabin C. Designing and interpreting HIV noninferiority trials in naive and experienced patients. AIDS. 2008;22(8):913–21.
20. Abraha I, Montedori A. Modified intention to treat reporting in randomised controlled trials: systematic review. BMJ. 2010;340:c2697.
21. Wiens BL, Zhao W. The role of intention to treat in analysis of noninferiority studies. Clin Trials. 2007;4(3):286–91.
22. Shrier I, Steele RJ, Verhagen E, Herbert R, Riddell CA, Kaufman JS. Beyond intention to treat: what is the right question? Clin Trials. 2014;11(1):28–37.
23. Hauck WW, Anderson S. Some issues in the design and analysis of equivalence trials. Drug Inf J. 1999;33(1):109–17.
24. Sterne JA, Davey Smith G. Sifting the evidence-what's wrong with significance tests? BMJ. 2001;322(7280):226–31.
25. Senn S. Equivalence is different - some comments on therapeutic equivalence. Biom J. 2005;47:104–7.
26. Kaji AH, Lewis RJ. Noninferiority trials: is a new treatment almost as effective as another? JAMA. 2015;313(23):2371–2.
27. Claxton K. The irrelevance of inference: a decision-making approach to the stochastic evaluation of health care technologies. J Health Econ. 1999;18(3):341–64.