Exclusion rates in randomized controlled trials of treatments for physical conditions: a systematic review

Background
The generalizability of randomized controlled trials (RCTs) can be uncertain because the impact of exclusion criteria is rarely quantified. The aim of this study was to systematically review studies examining the percentage of clinical populations with a physical health condition who would be excluded by RCTs of treatments for that condition.
Methods
Medline and Embase were searched from inception to 11 February 2018. Two reviewers independently completed screening, full-text review, data extraction and risk-of-bias assessment. The primary outcome was the percentage of patients in the clinical population who would have been excluded from each examined trial. Subgroup analyses examined exclusion by population setting, publication date and funding source.
Results
A total of 20,754 titles and abstracts were screened, and 50 studies were included, which reported exclusion rates from 305 trials of treatments in 31 physical conditions. Estimated exclusion rates varied from 0% to 100%, and the median exclusion rate was 77.1% of patients (interquartile range 55.5% to 89.0%). Median exclusion rates for trials in common chronic conditions were high, including hypertension 83.0%, type 2 diabetes 81.7%, chronic obstructive pulmonary disease 84.3%, and asthma 96.0%. The most commonly applied exclusion criteria related to age, co-morbidity and co-prescribing, whereas more implicit criteria relating to life expectancy or functional status were not typically examined. There was no evidence that exclusion varied by the nature of the clinical population in which it was evaluated or by trial funding source, and there was no statistically significant change in exclusion rates in more recent compared with older trials.
Conclusions
Most of the trials examined excluded most patients with the condition being treated. Almost a quarter of the trials studied excluded over 90% of patients, more than half excluded at least three quarters of patients, and four out of five excluded at least half of patients. A limitation is that most studies applied only a subset of eligibility criteria, so exclusion rates are likely under-estimated. The exclusion of older people and people with co-morbidity or co-prescribing from trials is increasingly untenable given population aging and increasing multimorbidity.
Trial registration
PROSPERO CRD42016042282.


Background
Randomized controlled trials (RCTs) are the gold standard method for evaluating the efficacy of treatments because well-designed RCTs minimize bias and confounding. They therefore maximize internal validity, giving confidence that the results are true for the trial population studied. However, trial populations are often highly selected, which may weaken the generalizability of RCT evidence by leaving uncertainty about whether the results apply to everyone with the condition in clinical practice [1,2]. Some exclusions from RCTs are justifiable (e.g., where an individual is allergic to the medicine being tested). However, Van Spall et al. estimated that 84.1% of trials published in high-impact general medical journals between 1994 and 2006 had poorly justified patient exclusion criteria [3].
A number of studies have shown that various landmark RCTs measuring treatment effects, many of which underpin guideline recommendations and influence regulatory decision-making, exclude large proportions of people with the condition being treated [4,5]. Older people, women, and people with co-morbidity or co-prescribing are disproportionately excluded from trials [3,6,7]. Although there is some evidence that women and older people are better represented in newer trials, they remain under-represented compared with the wider population [7]. These patterns of exclusion do not reflect the realities of current and future clinical practice. Most people with any chronic condition have co-morbidity, and multimorbidity is the norm in older people [8,9]. Because strict RCT eligibility criteria produce trial populations that differ substantially from the clinical populations seen in routine practice [12,13], applying guideline-recommended treatment in routine practice will often require significant extrapolation from RCT evidence [10,11].
The problem that strict RCT eligibility criteria pose for generalizing from RCT-derived evidence is well known [14,15]. However, the extent to which trials assessing treatment effects across different conditions exclude patients seen and treated in clinical practice is uncertain. The aims of this study were to undertake a systematic review of studies estimating the percentage of people with a chronic physical condition who would be excluded by RCTs of treatment for that condition and to examine how exclusion rates varied for different diseases, for different clinical populations, and over time.

Search strategy
A systematic review was undertaken, searching the Medline and Embase databases from inception to 11 February 2018 for all studies estimating the percentage of people in a 'clinical' population with a physical condition who would have been excluded from one or more trials of treatment intended for that condition. The search strategy is detailed in Additional file 1.

Inclusion criteria
We included published studies that explicitly examined the percentage of people with a chronic physical condition in a defined clinical population who would have been eligible for one or more selected RCTs of an individual patient treatment for that condition (including medication, surgery and other non-pharmacological interventions). The clinical populations were not restricted by setting or method of sampling and could therefore comprise unselected patients seen in primary or specialist care, patients in clinical or research registries, or research cohorts identified or recruited in these settings. However, the appropriateness of the clinical population used to examine exclusion from a particular trial was examined as part of the risk-of-bias evaluation.

Exclusion criteria
We excluded studies examining eligibility for trials of mental health conditions, studies that were not published in English, studies that did not explicitly report the percentage of patients eligible for trials or where percentages of patients eligible could not be calculated from the available data (e.g., those comparing recruited with non-recruited patients without examining exclusion in an underlying clinical population), and studies examining eligibility for a hypothetical trial or applying a set of common exclusion criteria from multiple trials instead of using actual exclusion criteria from single trials. Since estimated exclusion rates in very small clinical populations are likely to be imprecise, we also excluded studies where eligibility was calculated in a clinical population that included fewer than 100 patients.

Selection of studies
All titles and abstracts were independently screened by two reviewers to identify papers for full-text review. Full-text review and data extraction were carried out independently by two reviewers on the basis of the published protocol [16], and disagreements were resolved by discussion to reach consensus.

Data extraction and quality assessment
Data extraction was carried out by a minimum of two reviewers, involving a third reviewer where necessary, and disagreements were resolved by discussion to reach consensus. Data extracted for each study included the condition of interest and a description of the comparison clinical population, including the purpose of the clinical population dataset (e.g., clinical registry or electronic health record data), health-care setting and location, the date of clinical population recruitment or identification, clinical population size, and the diagnostic criteria used to define the clinical population. These data were used to assess risk of bias in relation to the overall appropriateness of the clinical population. Extracted data for the underlying trials examined by each study included the rationale for the choice of trials examined, the type of intervention or treatment in the trial, the listed trial eligibility criteria that were (or were not) applied to each clinical population to estimate the exclusion rate, and the trial's source of funding (pharmaceutical versus non-pharmaceutical).
The primary outcome extracted was the percentage of patients in the clinical population who would have been excluded from each trial examined, together with the reported 95% confidence interval (CI) for this percentage (calculated where not reported by the authors).
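The text does not state which method was used to calculate missing confidence intervals. As a minimal sketch, assuming a Wilson score interval for a binomial proportion (a normal-approximation or exact Clopper-Pearson interval would be equally plausible choices), the Python below computes an exclusion percentage and its 95% CI from hypothetical counts. It also illustrates why clinical populations of fewer than 100 patients were excluded: the interval narrows markedly as the population grows. The function name `exclusion_ci` is hypothetical.

```python
import math


def exclusion_ci(n_excluded: int, n_total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Percentage excluded with a Wilson score interval (an assumed method).

    Returns (percent_excluded, lower_bound, upper_bound), all in percent.
    z = 1.96 gives an approximately 95% interval.
    """
    if n_total <= 0 or not 0 <= n_excluded <= n_total:
        raise ValueError("require n_total > 0 and 0 <= n_excluded <= n_total")
    p = n_excluded / n_total
    denom = 1 + z**2 / n_total
    centre = (p + z**2 / (2 * n_total)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n_total + z**2 / (4 * n_total**2))
    return 100 * p, 100 * (centre - half_width), 100 * (centre + half_width)


# Hypothetical clinical populations, each with 80% of patients excluded.
for excluded, total in ((40, 50), (80, 100), (800, 1000)):
    pct, lo, hi = exclusion_ci(excluded, total)
    print(f"n={total:>5}: {pct:.1f}% excluded (95% CI {lo:.1f}% to {hi:.1f}%)")
```

With these counts the interval spans roughly 67% to 89% for 50 patients, about 71% to 87% for 100 patients, and about 77% to 82% for 1000 patients.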

Risk-of-bias assessment
There is no published risk-of-bias tool to assess the kinds of studies examined. We therefore developed three pre-specified risk-of-bias criteria that were independently assessed by two reviewers, namely:
1) How the reviewed paper selected trials to examine. We evaluated whether there was a systematic approach to trial selection (e.g., systematic search of the literature) or a clearly stated justification for the choice of trials, and whether that justification was judged to be adequate. Studies were considered to be at low risk of bias if the selection rationale was clearly stated and judged to be justifiable; otherwise, they were considered to be at high risk of bias.
2) The appropriateness of each trial-clinical population pair. This was assessed in relation to how well the clinical population represented the population for whom the treatment evaluated in the trial was intended or suitable. For example, a primary care population of people with heart failure is appropriate for a trial of beta-blockers or angiotensin-converting enzyme inhibitors used as long-term treatment [4], whereas an emergency department population is appropriate for a trial of treatment in acute, decompensated heart failure [17]. Studies were considered to be at low risk of bias if the clinical population was judged to be representative of real-world populations for which the trial treatment was intended or indicated, at high risk of bias if it was not, and at unclear risk of bias if insufficient information was provided for assessment.
3) The choice of trial eligibility criteria to examine. This was assessed in relation to the eligibility criteria that studies stated they had applied and not applied. Studies were considered to be at low risk of bias if they clearly stated that all important or common criteria were applied; otherwise, they were considered to be at high risk of bias.
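The three criteria are, in effect, decision rules, so they can be written down compactly. The Python sketch below encodes them as stated above; the names `Risk`, `trial_selection_risk`, `population_risk`, `criteria_choice_risk` and `is_low_risk` are hypothetical. The text does not say how the three domains were combined into a single rating per estimate, so `overall_risk` (taking the worst of the three ratings) is purely an illustrative assumption, not the review's method.

```python
from enum import Enum
from typing import Optional


class Risk(Enum):
    LOW = "low"
    UNCLEAR = "unclear"
    HIGH = "high"


def trial_selection_risk(rationale_stated: bool, rationale_justifiable: bool) -> Risk:
    """Criterion 1: low only if the selection rationale is stated AND judged justifiable."""
    return Risk.LOW if rationale_stated and rationale_justifiable else Risk.HIGH


def population_risk(representative: Optional[bool]) -> Risk:
    """Criterion 2: None means too little information to judge representativeness."""
    if representative is None:
        return Risk.UNCLEAR
    return Risk.LOW if representative else Risk.HIGH


def criteria_choice_risk(all_key_criteria_applied: bool) -> Risk:
    """Criterion 3: low only if all important or common criteria are stated as applied."""
    return Risk.LOW if all_key_criteria_applied else Risk.HIGH


# HYPOTHETICAL combination rule (not stated in the review): take the worst domain rating.
_SEVERITY = {Risk.LOW: 0, Risk.UNCLEAR: 1, Risk.HIGH: 2}


def overall_risk(*domains: Risk) -> Risk:
    return max(domains, key=_SEVERITY.__getitem__)


def is_low_risk(rating: Risk) -> bool:
    """Dichotomy used in the regression: low versus high or unclear."""
    return rating is Risk.LOW


rating = overall_risk(
    trial_selection_risk(rationale_stated=True, rationale_justifiable=True),
    population_risk(representative=None),  # insufficient information reported
    criteria_choice_risk(all_key_criteria_applied=True),
)
print(rating.value, is_low_risk(rating))  # prints: unclear False
```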

Data synthesis and analysis
Some trials were evaluated in more than one clinical population. In this situation, the trial-clinical population pair with the lowest percentage of patients excluded was selected for analysis in order to obtain the most conservative estimate of exclusion. For the resulting set of trial-clinical population pairs (one per trial), the median, range and interquartile range of the primary outcome (the estimated percentage of the clinical population excluded by each trial) were calculated overall, for condition groups (cardiovascular conditions, diabetes, respiratory conditions, cancer, rheumatoid arthritis (RA), human immunodeficiency virus (HIV) and other conditions) and for individual conditions. Variation was further examined by using linear regression to model unadjusted and adjusted differences in the percentage excluded by each trial in relation to whether the clinical population was recruited from primary or specialist care, whether the trial examined was publicly funded or industry-funded, the date of trial publication (with trials grouped into quartiles of publication date containing equal numbers of trials: 1994-1999, 2000-2003, 2004-2011 and 2012-2018), and risk-of-bias assessment (low risk of bias versus high or unclear risk of bias).
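The per-trial selection rule and the summary statistics can be sketched as follows. The records are hypothetical (not data from the review), the helper names are illustrative, and the quartile method (`statistics.quantiles` with `method="inclusive"`) is an assumption, since the text does not say how the interquartile range was computed. `share_at_least` gives the proportion of trials excluding at least a given percentage of patients, the kind of summary reported for the 50%, 75% and 90% thresholds.

```python
import statistics
from collections import defaultdict

# Hypothetical (trial_id, clinical_population, condition_group, percent_excluded)
# records; illustrative values only, not data from the review.
pairs = [
    ("T1", "registry A", "RA", 97.0),
    ("T1", "registry B", "RA", 90.0),
    ("T2", "primary care", "cardiovascular", 84.0),
    ("T3", "primary care", "diabetes", 56.0),
    ("T4", "specialist clinic", "respiratory", 96.0),
    ("T4", "primary care", "respiratory", 92.0),
    ("T5", "specialist clinic", "cancer", 40.0),
    ("T6", "primary care", "cardiovascular", 70.0),
]


def most_conservative(records):
    """Keep one record per trial: the pair with the LOWEST percentage excluded."""
    best = {}
    for rec in records:
        trial_id, pct = rec[0], rec[3]
        if trial_id not in best or pct < best[trial_id][3]:
            best[trial_id] = rec
    return list(best.values())


def summarise(values):
    """Median, interquartile range and range of percentages excluded (needs >= 2 values)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"median": statistics.median(values), "iqr": (q1, q3), "range": (min(values), max(values))}


def share_at_least(values, threshold):
    """Proportion of trials excluding at least `threshold` percent of patients."""
    return sum(v >= threshold for v in values) / len(values)


selected = most_conservative(pairs)
overall = [rec[3] for rec in selected]
print("overall:", summarise(overall))
for threshold in (50, 75, 90):
    print(f"trials excluding >= {threshold}%: {share_at_least(overall, threshold):.0%}")

# Repeat the summary within condition groups.
by_group = defaultdict(list)
for _, _, group, pct in selected:
    by_group[group].append(pct)
for group, values in sorted(by_group.items()):
    print(group, summarise(values) if len(values) >= 2 else f"single trial: {values[0]}")
```

Choosing the minimum per trial biases the summary towards lower exclusion, which is the intended conservative direction.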

Study characteristics
The searches identified 21,885 articles, with a further 18 identified from other sources, including examination of references of included studies. In total, 20,754 non-duplicate records were screened, and 222 full-text articles were examined. Fifty studies that examined trial eligibility in 57 distinct clinical populations were included (Fig. 1). Characteristics of all the trials examined by these studies are shown in supplementary tables S2 to S16.
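The flow counts above can be cross-checked with simple arithmetic. The sketch below derives the implied number of records removed at each stage from the reported totals, assuming that records from database searches and other sources were deduplicated together in a single step; Fig. 1 remains the authoritative account.

```python
# Counts reported in the text.
identified_databases = 21_885
identified_other_sources = 18
screened = 20_754  # non-duplicate titles/abstracts
full_text = 222
included = 50

# Derived counts (simple differences; assumes one pooled deduplication step).
flow = {
    "records identified": identified_databases + identified_other_sources,
    "duplicates removed": identified_databases + identified_other_sources - screened,
    "excluded at title/abstract screening": screened - full_text,
    "excluded at full-text review": full_text - included,
}
for step, count in flow.items():
    print(f"{step}: {count:,}")
```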

Risk of bias
In risk-of-bias assessment, 126 (41.3%) of the 305 estimates of trial exclusion rates were assessed as being at low risk of bias, 104 (34.1%) at high risk, and 75 (24.6%) at unclear risk. High risk of bias was driven largely by the clinical population used in the comparison being judged as less appropriate for the treatment being trialed (supplementary tables S17 and S18). Comparisons with a low risk of bias had significantly lower exclusion rates.

Trials where exclusion rates were estimated in multiple clinical populations
Thirty-eight trials were examined in two or more clinical populations (Table 3), and 30 of these were trials of treatment for RA. Exclusion rates of nine RA trials were each estimated in three clinical populations [18], whereas exclusion rates for the remaining 21 were estimated in two clinical populations [19]. For the nine trials examined in three clinical populations, estimated exclusion rates were higher in every comparison in the Veterans' Affairs Rheumatoid Arthritis (VARA) cohort (median 97.4%, range 75.6 to 98.4%) than in the Rheumatoid Arthritis Investigators' Network (RAIN) database (median 89.6%, range 74.7 to 91.6%) and the National Register for Biologic Treatment cohort (median 80.0%, range 56.0 to 92.4%). In the remaining 21 trials, estimated exclusion rates in every comparison were higher in VARA (median 97.4%, range 72.7 to 99.1%) than in RAIN (median 89.0%, range 64.9 to 93.5%). Such differences would be expected given variation in the data collected by different registries and in the clinical populations included (VARA, for example, is made up predominantly of male veterans, whereas RAIN is a less selected population of patients attending rheumatology clinics) [20]. Differences were more variable and sometimes larger for trials of treatments for the other conditions examined in more than one clinical population (atrial fibrillation, heart failure, acute myocardial infarction and COPD), although there was no consistent pattern to explain this in relation to risk of bias or the nature of the clinical population (Table 3).

Summary of evidence
This study examined estimated exclusion rates in clinical populations in 305 trials of treatments for physical conditions. Almost a quarter of the trials studied excluded 90% or more of patients, more than half of trials excluded more than 75% of patients, and four out of five trials excluded more than 50% of patients. There was variation in exclusion depending on the condition studied, but exclusion rates did not differ between studies using primary versus specialist care clinical populations to evaluate exclusion rates or between trials that were publicly versus industry-funded. There was no strong evidence that rates of exclusion had changed over time.
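The comparisons summarised above come from the linear regression described in the analysis plan. The following is a minimal, dependency-free Python sketch of that kind of model using simulated data (not the review's data) and hypothetical variable names: each covariate is a 0/1 indicator, publication dates are split into quartiles holding equal numbers of trials, and the model is fitted by ordinary least squares via the normal equations. For a single binary covariate, the unadjusted coefficient equals the difference in mean percentage excluded between the two groups; the adjusted model estimates each difference with the other covariates held fixed.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    size = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(size):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]


def ols(y, covariates):
    """OLS with an intercept; returns [intercept, b1, b2, ...] in covariate order."""
    rows = [[1.0] + [col[i] for col in covariates] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty)


rng = random.Random(42)
n = 200  # hypothetical number of trials
specialist = [rng.randint(0, 1) for _ in range(n)]  # 1 = specialist care population
industry = [rng.randint(0, 1) for _ in range(n)]    # 1 = industry funded
low_rob = [rng.randint(0, 1) for _ in range(n)]     # 1 = low risk of bias (vs high/unclear)
pub_year = [rng.randint(1994, 2018) for _ in range(n)]

# Quartiles with equal numbers of trials: rank by year (ties broken by list order).
date_q = [0] * n
for rank, i in enumerate(sorted(range(n), key=lambda j: pub_year[j])):
    date_q[i] = rank * 4 // n  # 0 = oldest quartile (reference category)
date_dummies = [[int(q == level) for q in date_q] for level in (1, 2, 3)]

# Simulated outcome: a true 15-point reduction for low risk of bias, a 2-point increase
# for industry funding, no effect of care setting or date, plus random noise.
pct_excluded = [80 - 15 * r + 2 * f + rng.gauss(0, 5) for r, f in zip(low_rob, industry)]

for name, x in (("specialist care", specialist), ("industry funding", industry), ("low risk of bias", low_rob)):
    print(f"{name}: unadjusted difference {ols(pct_excluded, [x])[1]:+.1f} points")

adjusted = ols(pct_excluded, [specialist, industry, low_rob, *date_dummies])
print(f"low risk of bias: adjusted difference {adjusted[3]:+.1f} points")
```

A real analysis would normally use a statistics package that also reports standard errors and p-values; this sketch shows only the point estimates.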
A third of the exclusion-rate estimates were at high risk of bias, most commonly because the clinical population used was not appropriate for the trial examined, and a further quarter were at unclear risk of bias. Exclusion rates were lower for estimates at low risk of bias (median exclusion 60.8%), although two thirds of these trials would still have excluded more than 50% of patients and one third more than 75%.

Strengths and limitations
A strength of the study is the systematic approach used to identify and examine the underlying literature, with a deliberately broad search strategy to maximize sensitivity. However, the nature of the literature examined and the lack of clear reporting criteria for such studies make it possible that some studies were not identified. Despite this, estimated exclusion rates in 305 trials in 57 clinical populations were included. A key observation is that the included studies were heterogeneous in a variety of ways: they varied in how they selected trials to compare, in their choice of clinical population, and in the trial inclusion and exclusion criteria they applied. Some of the observed variation in exclusion rates likely reflects these choices, but they were not always explicit in the included studies, which may be related to the lack of clear criteria for the conduct of such studies. A further limitation is that we excluded comparisons with fewer than 100 patients in order to avoid imprecise estimates for common conditions (although, in practice, only two studies were excluded as a result). Finally, most of the underlying studies applied only a subset of eligibility criteria, most commonly age, co-morbidity and co-prescribing, because these are easily applied to coded data extracted from electronic health records and clinical or research registries. The implication is that true exclusion rates are likely even higher than reported here, both because of unexamined explicit criteria and because trial recruitment also involves the application of implicit criteria by researchers (such as the presence of frailty and whether an individual is perceived to be likely to adhere to trial procedures).

Comparison with other literature
Exclusion and inclusion criteria are not always clearly reported in trial publications. For example, 56% of 255 cancer RCTs published in leading journals had discrepancies between the eligibility criteria listed in protocols and those listed in the papers reporting results, and 96.7% of these discrepancies implied that the trial population was broader than it actually was [21]. Examining RCTs published in high-impact journals between 1994 and 2006, Van Spall et al. found that co-morbidity, age and co-prescribing were used as exclusion criteria in the majority of the 283 trials examined, usually without any explicit justification [3]. A study of 4341 RCTs published in four high-impact general medical journals found that 29% had upper age limits for inclusion, which were rarely explicitly justified. Although the percentage of trials with upper age limits declined somewhat between 1998 and 2015, the absolute change over time was small [22], and only 7% of RCTs published in 2012 were specifically conducted in older patients [23]. Of 319 ongoing RCTs for 10 common conditions registered with ClinicalTrials.gov in 2014, 79% excluded patients with common co-morbidities [24]. Studies of trials in individual conditions have similar findings. Only one of 112 RCTs of secondary prevention of cardiovascular disease published in 2010-2012 justified the exclusion criteria applied [25]. Two thirds of RCTs for type 2 diabetes had upper age limits for inclusion, three quarters excluded a range of co-morbidities, and only 1.4% of the 440 RCTs examined were specifically in older adults [26]. However, this literature does not quantify the impact of inclusion and exclusion criteria on eligibility, as we have done here.

Implications for policy, practice and research
Exclusion of patients from trials matters only if the exclusion criteria are effect modifiers of treatment [27], meaning that the benefits or harms of treatment (or both) systematically differ between included and excluded patients. This review found that trial evidence is typically derived from narrow populations that are usually selected to have a higher risk of the outcomes that treatment is expected to improve (e.g., by selective inclusion of patients at high cardiovascular risk) and a lower risk of adverse effects (e.g., by selective exclusion of patients with co-morbidity, co-prescribing and frailty). Guideline developers, medicine regulators and clinicians therefore all face the problem of having to extrapolate RCT findings to excluded clinical populations in which benefits and harms may plausibly differ. Simple extrapolation requires assuming that the benefits and harms of treatment are similar in included and excluded populations [28]. This is often reasonable, but such assumptions do not always hold. For example, trial-derived estimates of the number needed to treat (NNT) with angiotensin-converting enzyme inhibitors for about 3 years to prevent one case of end-stage renal disease (ESRD) in chronic kidney disease are 9-25, whereas estimated NNTs to prevent ESRD in clinical populations are more than 100 because of a lower baseline risk of ESRD and a higher risk of competing mortality than in trial populations [29]. Adverse effects and harms from treatment are also usually greater in people with frailty and polypharmacy [30] and increase with age. Aspirin used after a cerebrovascular event in patients over 75 years old, for example, is associated with a fivefold increase in fatal bleeding compared with younger patients [31]. Therefore, even if treatment benefits are similar in trial and clinical populations, overall net benefit may still differ.
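The effect of baseline risk on the NNT follows from its definition as the reciprocal of the absolute risk reduction (ARR). If a treatment's relative risk reduction (RRR) is assumed to be constant across populations, then ARR = baseline risk × RRR, so a lower baseline risk (for example, because more patients die of other causes before reaching ESRD) inflates the NNT. The Python sketch below uses hypothetical inputs, not the figures behind references [29] or [31], and the net-benefit function additionally assumes that a prevented event and a treatment-caused harm carry equal weight.

```python
def nnt(baseline_risk: float, rrr: float) -> float:
    """Number needed to treat = 1 / ARR, with ARR = baseline_risk * rrr (constant RRR assumed)."""
    arr = baseline_risk * rrr
    if arr <= 0:
        raise ValueError("absolute risk reduction must be positive")
    return 1 / arr


def net_benefit_per_1000(baseline_risk: float, rrr: float, excess_harm_risk: float) -> float:
    """Events prevented minus treatment-caused harms per 1000 treated (equal weights assumed)."""
    return 1000 * (baseline_risk * rrr - excess_harm_risk)


RRR = 0.30  # hypothetical relative risk reduction, assumed identical in both populations
for label, risk in (("trial population", 0.20), ("clinical population", 0.03)):
    print(f"{label}: baseline risk {risk:.0%}, NNT {nnt(risk, RRR):.0f}")

# Same benefit, but a fivefold higher risk of serious harm in an older group (hypothetical).
for label, harm in (("younger", 0.002), ("older", 0.010)):
    print(f"{label}: net benefit {net_benefit_per_1000(0.05, RRR, harm):.1f} per 1000 treated")
```

With these inputs the NNT rises from about 17 to about 111 when baseline risk falls from 20% to 3%, and net benefit falls from 13 to 5 events per 1000 treated despite identical relative efficacy.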
Careful attention to internal validity has improved the quality of trial evidence and its systematic synthesis, but generalizability and applicability are usually less explicitly considered [32]. Despite recommendations that systematic reviews should always discuss the applicability of evidence [33], only a minority actually do so [34]. There remains a clear place for efficacy trials in highly selected populations, but choosing to design such a trial is also effectively a declaration that the trialists are concerned that net benefit may differ in excluded populations. While more restrictive eligibility criteria for early-stage clinical trials may be appropriate when little is known about a treatment's safety and efficacy, enrolling more diverse populations in later studies (or using adaptive enrolment to include broader populations depending on initial efficacy findings) will help ensure a better understanding of the treatment's effect in all patients likely to benefit. In this regard, the US Food and Drug Administration is exploring recommendations for modernizing eligibility criteria in cancer clinical trials [35]. Furthermore, robust methods for generating real-world evidence may help augment evidence from trials.
To facilitate judgements about applicability by clinicians, systematic reviewers, guideline developers and medicine regulators, journals and registries should require trialists to explicitly report and justify inclusion and exclusion criteria, to report data on who was excluded at screening (although much exclusion happens before formal eligibility screening), and ideally to report how the trial population compares with the clinical population from which it was recruited. Age, co-morbidity and co-prescribing exclusions in particular require justification, not least because population aging means that, for most conditions, older people with multimorbidity and polypharmacy will make up an increasing percentage of the clinically treated population [8,36].
Assessment of the applicability of evidence should be explicitly reported in systematic reviews and in guideline development. Extrapolation of evidence is inevitable but should be explicitly justified when recommendations are made for all patients with a condition on the basis of trial evidence from narrow subsets of the clinical population. Alternatively, guideline developers may consider making more nuanced or stratified recommendations that account for differences between trial and clinical populations [28,37]. Guideline development therefore needs to be better informed by evidence about applicability, making greater use of epidemiological data describing how the clinical population differs from trial populations. This is also relevant for medicine regulation, where a better understanding of differences between trial and real-world populations may help in risk-minimization planning, including in the design of post-authorization safety studies.
Finally, although this review found a large volume of evidence about exclusion, the quality of that evidence was variable. Future studies in this field should clearly justify their selection of trials to examine and prioritize landmark trials or those cited in high-quality guidelines since these most clearly define standards of practice. The clinical population used to examine eligibility should be clearly described, and its appropriateness for measuring exclusion rates in the trial being examined justified. Studies of exclusion should report all eligibility criteria applied and all criteria not applied and discuss the implications of this for interpreting the findings.

Conclusions
Most people with any of the physical conditions studied would be excluded from most trials of treatments for that condition. This is most commonly because the trial excludes older people and those with significant co-morbidity or co-prescribing. Population aging, increasing multimorbidity and increasing polypharmacy make it imperative that evidence of treatment effectiveness better match the people whom we actually treat in clinical practice.