Commentary | Open Access | Open Peer Review
In search of justification for the unpredictability paradox
Trials, volume 15, Article number 480 (2014)
A 2011 Cochrane Review found that adequately randomized trials sometimes revealed larger, sometimes smaller, and often similar effect sizes compared with inadequately randomized trials. However, it found no statistically significant average difference in effect sizes between the two study types. Yet instead of concluding that adequate randomization had no effect, the review authors postulated the “unpredictability paradox”, which states that randomized and non-randomized studies differ, but in an unpredictable direction. Stipulating the unpredictability paradox is problematic for several reasons: 1) it makes the authors’ conclusion that adequate randomization makes a difference unfalsifiable: had adequately randomized trials shown significantly different average results from inadequately randomized trials, the authors could equally have pooled the results and concluded that adequate randomization protected against bias; 2) it leaves authors of other reviews with similar results confused about whether or not to pool results (and hence which conclusions to draw); 3) it discourages researchers from investigating the conditions under which adequate randomization inflates or deflates apparent treatment benefits; and 4) it could obscure the relative importance of allocation concealment and blinding, which may be more important than adequate randomization.
In spite of the rationale for adequate randomization, differences between adequately and inadequately randomized trials have proven difficult to detect empirically. In 1995, Schulz and colleagues [1] found that trials using allocation concealment (concealing the allocation sequence from those enrolling participants) and double-blinding yielded smaller effect sizes, but they found no statistically significant benefit of adequate over inadequate randomization. Odgaard-Jensen and colleagues [5] conducted an overview of systematic reviews in 2011 in an attempt to provide more definitive evidence. The overview included systematic reviews comparing randomized trials with trials that used some other, non-random method of assignment to conditions (such as alternation). Of the seven reviews eligible for the meta-analysis, six failed to detect a statistically significant difference between adequately and inadequately randomized trials, and one revealed smaller effects in randomized trials. Of the six reviews that failed to detect a statistically significant difference, three suggested that adequate randomization increased effect sizes and three suggested that it reduced them.
Odgaard-Jensen and colleagues did not pool the results. Had they done so (as we did; see Figure 1), they would have reported no statistically significant difference between the two study types. Instead they asserted that the results from randomized and non-randomized studies differ, but in an unpredictable direction: “it is not generally possible to predict the magnitude, or even the direction, of possible selection biases and consequent distortions of treatment effects from studies with non-random allocation” [5]. They called this the “unpredictability paradox”.
Yet there are several problems with the inference to the “unpredictability paradox” from the observed data.
Invoking the unpredictability paradox makes the conclusions of the Odgaard-Jensen review unfalsifiable and unscientific (from a Popperian perspective) [6]. If it had turned out that randomized trials had significantly different average results from non-randomized studies, the authors could have pooled the results and concluded that adequately randomized trials were better. In fact, adequate randomization did not yield statistically significantly different average results, and the authors drew the very same conclusion that they could have drawn had the data indicated differences between adequately and inadequately randomized trials. Drawing the same conclusion from conflicting evidence allows us to make assertions that do not take empirical evidence into account, which is unscientific in the absence of further justification.
Appeal to the unpredictability paradox reveals an inconsistent approach to pooling data in Cochrane Review methodology. When we pooled the results from the Odgaard-Jensen and colleagues review we found no statistically significant difference between randomized and non-randomized trials (standardized mean difference = −0.17, 95% CI = −0.64 to 0.29; P = 0.47; Figure 1). Pooling makes the inference to the conclusion that adequate randomization conferred no methodological benefit easy to draw. (As an aside, the problem is not pooling itself, but rather the inference from the unpooled result to the conclusion of a difference in an unpredictable direction.) The Cochrane Handbook recommends not pooling highly heterogeneous results [7], yet the results of the Odgaard-Jensen and colleagues review were remarkably consistent in terms of effect direction, with all but one included study revealing no statistically significant difference. Moreover, Cochrane Reviews conducted by the same review group have pooled results with substantially higher heterogeneity (I² = 87%) [8]. The inconsistency in Cochrane methodology was further highlighted in a recent similar systematic review of randomized versus observational studies. The authors of the latter review found similarly heterogeneous results, but decided to pool and concluded that randomized and non-randomized studies were not qualitatively different [9]. Had they adopted the same strategy as Odgaard-Jensen and colleagues they could have chosen not to pool, postulated the “unpredictability paradox”, and concluded that randomized trials have different results from observational studies, but in an unpredictable direction.
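For readers unfamiliar with the mechanics of pooling, the sketch below illustrates how standardized mean differences from several reviews can be combined under a DerSimonian-Laird random-effects model, and how the I² heterogeneity statistic is computed. The effect sizes and variances are hypothetical illustrations, not the actual data underlying Figure 1.

```python
import math

def pool_random_effects(effects, variances):
    """Pool effect sizes using DerSimonian-Laird random-effects weights."""
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q quantifies between-study variability
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # Between-study variance tau^2 (DerSimonian-Laird estimator)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # I^2: proportion of variability due to heterogeneity rather than chance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # Random-effects weights incorporate tau^2
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2

# Hypothetical SMDs and variances for seven reviews (illustration only)
smds = [-0.5, 0.2, -0.3, 0.1, -0.4, 0.3, -0.6]
variances = [0.04, 0.05, 0.03, 0.06, 0.04, 0.05, 0.03]
pooled, (lo, hi), i2 = pool_random_effects(smds, variances)
print(f"SMD = {pooled:.2f}, 95% CI = {lo:.2f} to {hi:.2f}, I² = {100 * i2:.0f}%")
```

With effect directions split this evenly, the pooled confidence interval straddles zero, which is the pattern (no statistically significant average difference) at issue in the debate above.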
The unpredictability paradox has not been used or replicated independently [10]. If the unpredictability paradox were justified, one would expect independent research to use and validate it. This has not been done.
Invoking the unpredictability paradox discourages researchers from investigating the conditions under which randomization inflates or deflates apparent treatment benefits. If, indeed, adequate randomization makes a difference, it would be interesting to know what makes adequate randomization increase effect sizes and what makes it decrease them. Proposing the unpredictability paradox as an explanation for the effect of adequate randomization suggests that there is nothing more fundamental to be learned about the conditions under which adequate randomization makes a difference, precisely because it is unpredictable. This approach therefore arguably stifles future research in the area.
If it turns out that adequate randomization is not a powerful protection against bias, invoking the unpredictability paradox could obscure the relative importance of allocation concealment and blinding, which may be more important.
Our arguments presented here do not imply that inadequate randomization is acceptable. In fact, one of us has written a book defending the virtues of (adequate) randomization [11]. We believe it is self-evident that inadequate randomization is a sign of sloppy research, and it also makes allocation concealment and blinding more difficult. Allocation concealment and blinding, in turn, have been shown empirically to reduce bias in many cases [4, 12]. It follows that, when the results from adequately randomized studies and inadequately randomized studies (or observational studies) differ, the results of the adequately randomized trial are likely to be closer to the truth (all other things being equal).
Our conclusion is that Odgaard-Jensen and colleagues’ proposed unpredictability paradox requires further justification. Providing a justification will improve the soundness and validity of the Odgaard-Jensen and colleagues review, inform debates about when to pool heterogeneous results in systematic reviews, rationalize Cochrane Review methodology, and tell us more about the mechanism by which adequate randomization reduces bias. Critical appraisal tools [13, 14], and justification for the inclusion of studies in systematic reviews may also need to be revised in light of an eventual justification for the unpredictability paradox.
1. Schulz KF, Chalmers I, Hayes RJ, Altman DG: Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995, 273: 408-412. 10.1001/jama.1995.03520290060030.
2. Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, Tugwell P, Klassen TP: Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses?. Lancet. 1998, 352: 609-613. 10.1016/S0140-6736(98)01085-X.
3. Kjaergard LL, Villumsen J, Gluud C: Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med. 2001, 135: 982-989. 10.7326/0003-4819-135-11-200112040-00010.
4. Jüni P, Altman DG, Egger M: Systematic reviews in health care: assessing the quality of controlled clinical trials. BMJ. 2001, 323: 42-46. 10.1136/bmj.323.7303.42.
5. Odgaard-Jensen J, Vist GE, Timmer A, Kunz R, Akl EA, Schünemann H, Briel M, Nordmann AJ, Pregno S, Oxman AD: Randomization to protect against selection bias in healthcare trials. Cochrane Database Syst Rev. 2011, 4: MR000012.
6. Popper KR: The Logic of Scientific Discovery. 1968, London: Hutchinson
7. Higgins JPT, Green S: Cochrane Handbook for Systematic Reviews of Interventions. Version 5.0.1 (updated March 2011). 2011, The Cochrane Collaboration. Available from http://www.cochrane-handbook.org
8. Hróbjartsson A, Gøtzsche PC: Placebo interventions for all clinical conditions. Cochrane Database Syst Rev. 2010, 1: CD003974.
9. Anglemyer A, Horvath HT, Bero L: Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials. Cochrane Database Syst Rev. 2014, 4: MR000034.
10. Kunz R, Oxman AD: The unpredictability paradox: review of empirical comparisons of randomized and non-randomised clinical trials. BMJ. 1998, 317: 1185-1190. 10.1136/bmj.317.7167.1185.
11. Howick J: The Philosophy of Evidence-Based Medicine. 2011, Chichester: Wiley Blackwell & BMJ Books
12. Savović J, Jones HE, Altman DG, Harris R, Jüni P, Pildal J, Als-Nielsen B, Balk E, Gluud C, Gluud L, Ioannidis J, Schulz K, Beynon R, Welton N, Wood L, Moher D, Deeks J, Sterne J: Influence of reported study design characteristics on intervention effect estimates from randomized, controlled trials. Ann Intern Med. 2012, 157: 429-438. 10.7326/0003-4819-157-6-201209180-00537.
13. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, Schünemann HJ, GRADE Working Group: GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008, 336: 924-926. 10.1136/bmj.39489.470347.AD.
14. OCEBM Levels of Evidence Working Group: Oxford Centre for Evidence-Based Medicine 2011 Levels of Evidence. [http://www.cebm.net/index.aspx?o=5653]
We thank Jan Odgaard-Jensen and Jan P Vandenbroucke of Leiden University Medical Center for their critical discussion of earlier drafts. JH was funded by the National Institute for Health Research School for Primary Care Research.
The authors declare that they have no competing interests.
Both JH and AM were involved in drafting and revising the manuscript. JH conceived of the study and performed the statistical analysis. Both authors read and approved the final manuscript.