Selecting patients for randomized trials: a systematic approach based on risk group
© Vickers et al; licensee BioMed Central Ltd. 2006
Received: 08 June 2006
Accepted: 05 October 2006
Published: 05 October 2006
A key aspect of randomized trial design is the choice of risk group. Some trials include patients from the entire at-risk population; others accrue only patients deemed to be at increased risk. We present a simple statistical approach for choosing between these strategies. The method is easily adapted to determine which of several competing definitions of high risk is optimal.
We treat eligibility criteria for a trial, such as a smoking history, as a prediction rule associated with a certain sensitivity (the number of patients who have the event and who are classified as high risk divided by the total number of patients who have an event) and specificity (the number of patients who do not have an event and who do not meet criteria for high risk divided by the total number of patients who do not have an event). We then derive simple formulae to determine the proportion of patients receiving intervention, and the proportion who experience an event, where either all patients or only those at high risk are treated. We assume that the relative risk associated with intervention is the same over all choices of risk group. The proportions of events and interventions are combined using a net benefit approach, and net benefit is compared between strategies.
We applied our method to design a trial of adjuvant therapy after prostatectomy. We were able to demonstrate that treating a high-risk group was superior to treating all patients, to choose the optimal definition of high risk, and to test the robustness of our results by sensitivity analysis. Our results had a ready clinical interpretation that could immediately aid trial design.
The choice of risk group in randomized trials is usually based on rather informal methods. Our simple method demonstrates that this decision can be informed by simple statistical analyses.
Protocols of randomized trials specify inclusion and exclusion criteria to determine the population under study. Exclusion criteria typically focus on identifying subjects who might be harmed by the study intervention, those for whom benefit is doubtful and those who are unlikely to provide useful data. Inclusion criteria tend to focus on risk: all trials identify the population at risk for the study event; some trials additionally specify criteria to define a study population at high risk. For example, a trial comparing recurrence rates between two approaches to prostatectomy will specify only that patients with localized prostate cancer are eligible; a trial to determine the effects of adjuvant therapy might further restrict eligibility to patients with locally advanced cancer who are at high risk of recurrence. In some cases trialists studying a similar question have reached different conclusions as to whether to include the whole at-risk population or only a high-risk subgroup. The PLCO trial, for instance, includes all older individuals in a study of lung cancer screening, whereas the National Lung Screening Trial includes only smokers, or recent quitters, with a smoking history of 30 pack-years or more.
In this paper, we present a simple statistical approach for determining whether trialists should use a "whole population" or "high-risk group" approach. The results of the method have a ready clinical interpretation that can immediately aid trial design; moreover, the method is easily adapted to determine which of several competing definitions of high-risk is optimal. In a previous paper, which focused on screening and prevention, two of us argued that the population accrued to a trial should be the same as that to whom the intervention will be applied in practice. Thus the decision whether to include all members of an eligible population, or just a high-risk sub-group, should depend on the relative benefit of these alternative strategies were they to be clinically applied after the completion of a trial indicating an effect of the intervention. We therefore model benefits of alternative strategies in a hypothetical population in order to determine the optimal approach.
Our method is based on the assumption that interventions proven in randomized trials will be offered to eligible patients similar to those studied in the trial. For example, we assume that if a trial accruing patients with locally advanced prostate cancer demonstrates effectiveness of adjuvant therapy, such treatment will subsequently be offered to patients with locally advanced but not organ-confined disease. We therefore compare the predicted outcome of treating all at-risk patients in the population at large to the outcome of treating only the high-risk subgroup. We then recommend the approach with the better outcome to determine the inclusion criteria for the randomized trial.
Outcome is defined in terms of "net benefit" in the eligible population. Net benefit is a concept often used in economic analysis and is simply benefits minus harms. In the case of a medical intervention, "benefits" are associated with reduction in the event rate compared to no additional treatment: in an adjuvant therapy trial, for instance, benefit would be a reduction in cancer recurrences or deaths compared to surgery alone. "Harms" are associated with the intervention itself: side-effects, costs, inconvenience and so on. To assess the relative outcome of the whole population and high-risk approach, we therefore need to calculate the proportion of patients who would be treated, and the reduction in event rate, for each approach. For the whole population approach this is straightforward: the proportion of patients treated is 100% and the reduction in event rate is simply the event rate in the absence of intervention multiplied by one minus the anticipated relative risk of the event with versus without intervention.
Table 1. Relationship between criteria for defining a "high-risk" sub-group and whether a patient has an event during a clinical trial. Classification variable: meets criteria for "high-risk"?
In the Appendix [see Additional file 1], we derive the following formulae for the intervention and event rates when selecting a high-risk group.
Intervention rate: the anticipated proportion of eligible population receiving intervention when the intervention is given only to high risk subjects
= Event rate in the absence of intervention × sensitivity + (1 – Event rate in the absence of intervention) × (1 – specificity)
Event rate: the anticipated proportion of eligible population who have the event when intervention given to the high risk group:
= Event rate in the absence of intervention × sensitivity × relative risk + Event rate in the absence of intervention × (1 – sensitivity)
Decrease in event rate due to intervention:
= Event rate in the absence of intervention – Event rate when intervention given to high risk group
Event: a negative medical outcome such as disease recurrence or death occurring within the projected time frame of the trial
Event rate in the absence of intervention: expected proportion of individuals in the eligible population who will have the event
Sensitivity: numerator: number of individuals in the eligible population who both have the event (in the absence of intervention) and who are classified as high risk; denominator: the total number of patients who have the event.
Specificity: numerator: number of individuals in the eligible population who do not have the event and who do not meet the criteria for high risk; denominator: total number who do not have the event.
Relative risk: relative risk of the event with versus without intervention
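The formulae above can be restated compactly in symbols. The notation is shorthand introduced here for illustration and is not taken from the paper: $p$ is the event rate in the absence of intervention, $Se$ and $Sp$ are the sensitivity and specificity of the high-risk criteria, $RR$ is the relative risk, and the subscript HR denotes the strategy of treating only the high-risk group.

```latex
\begin{align*}
I_{\mathrm{HR}} &= p\,Se + (1-p)(1-Sp)
  && \text{(intervention rate)} \\
E_{\mathrm{HR}} &= p\,Se\,RR + p\,(1-Se)
  && \text{(event rate)} \\
\Delta_{\mathrm{HR}} &= p - E_{\mathrm{HR}} = p\,Se\,(1-RR)
  && \text{(decrease in event rate)}
\end{align*}
```

Setting $Se = 1$ and $Sp = 0$ recovers the whole-population strategy: the intervention rate is 1 and the decrease in event rate is $p\,(1-RR)$.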
As described above, net benefit is benefit minus harm, where benefit is related to the number of events and harm to the number of interventions. To formulate net benefit precisely, it is necessary to put benefits and harms on the same scale. The problem is that events and interventions are not equivalent: an event, such as a prostate cancer recurrence, is generally considered worse than an intervention, such as adjuvant therapy. Just how much worse an event is considered than an intervention will vary from case to case. A common way of converting between events and interventions is the "number-needed-to-treat" (NNT). We define the threshold NNT (NNTt) as the maximum number of patients that a clinician would treat to prevent one event. The NNTt may be based on an informal subjective judgment; alternatively, methods have been described in the literature to derive NNTt based on the relative harm associated with intervention and an event. NNTt can be thought of in economic terms as the amount we would pay, in interventions, to avoid one event. As such, NNTt is independent of the event rate. Hence we define:
Net benefit = decrease in event rate – intervention rate ÷ NNTt
Note that the units of the left and right terms in the net benefit equation are the same: NNTt is in units of intervention rate divided by event rate, so the units are in terms of event rates.
We propose calculating net benefit for the strategy of treating all patients and for treating only the high-risk group. The approach with the highest net benefit in the eligible population after completion of the trial is chosen for trial design.
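The comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' own code: the function names and the example inputs are hypothetical and were chosen only to show the mechanics of ranking strategies by net benefit.

```python
def net_benefit(decrease_in_event_rate, intervention_rate, nnt_t):
    """Net benefit = decrease in event rate - intervention rate / NNTt."""
    return decrease_in_event_rate - intervention_rate / nnt_t


def best_strategy(strategies, nnt_t):
    """Return (label, net benefit) for the strategy with the highest net benefit.

    `strategies` maps a label to a tuple of
    (decrease in event rate, intervention rate), both as proportions.
    """
    scored = {
        label: net_benefit(decrease, rate, nnt_t)
        for label, (decrease, rate) in strategies.items()
    }
    label = max(scored, key=scored.get)
    return label, scored[label]


# Hypothetical inputs for illustration only (not taken from the paper's tables).
example = {
    "treat none": (0.00, 0.00),
    "treat all": (0.04, 1.00),
    "treat high-risk": (0.03, 0.30),
}

print(best_strategy(example, nnt_t=50))
```

Treating no one is included as a reference strategy with a net benefit of zero, so a negative net benefit signals that a strategy is worse than withholding the intervention altogether.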
A group of investigators wish to investigate whether adjuvant therapy can reduce the risk of recurrence after radical prostatectomy. They plan to randomize patients to surgery alone or surgery with hormonal therapy and follow patients for five years to determine the proportion who recur. About 20% of all prostatectomy patients recur within 5 years (i.e. the event rate in the absence of intervention) and the expected effect of adjuvant therapy is a relative risk of 0.75. Discussion with clinicians and patients suggests an NNTt of 100 for prostate cancer death: if 100 or fewer patients had to be treated with adjuvant therapy to prevent one death, the therapy would be considered worthwhile; if more than 100 had to take the agent to prevent one death, the costs, side-effects, risks and inconvenience of the drug would be seen to outweigh its benefits. As only approximately one in three patients who recur after prostatectomy die from disease, the NNTt for the study endpoint of recurrence is 33.
The standard predictive model for prostate cancer recurrence is the "Kattan nomogram" and this has been used in several randomized trials to determine eligibility. Trials have varied as to the threshold risk of recurrence used to determine eligibility: 40% for NCT00283062, 50% for NCT00132301 and 25% for NCT00258765. Let us imagine that our group of investigators disagree as to the optimal threshold: whilst one investigator wishes to define patients as "high risk" if they have 50% or greater risk of recurrence, another argues that the threshold should be set much lower, at 10%, in order to ensure that most patients who actually do recur would be eligible. Meanwhile, the drug company argues that prostate cancer is an unpredictable disease and that the investigators should keep an open mind about whether to accrue all prostatectomy patients to the trial. Note that although the Kattan nomogram is a multivariate model, this is not a requirement of our approach: eligibility criteria can be determined by a model, by a single risk factor – such as a smoking history of at least 30 pack-years – or a combination of risk factors, such as including patients with either high stage cancer or a positive surgical margin.
Table 2. Calculations to determine whether to treat the whole population or just a high-risk group. Columns: decrease in event rate (benefit); net benefit (benefit – intervention rate ÷ NNTt). Rows: treat high-risk (risk 10% +); treat high-risk (risk 50% +).
As a worked example, we will look at the strategy of treating only patients with a risk of 50% or more. The formula for the intervention rate is: Event rate in the absence of intervention × sensitivity + (1 – Event rate in the absence of intervention) × (1 – specificity), i.e., 20% × 47% + 80% × 4% = 12.6%. The formula for the event rate after the intervention is applied to high-risk subjects is: Event rate in the absence of intervention × sensitivity × relative risk + Event rate in the absence of intervention × (1 – sensitivity) or 20% × 47% × 0.75 + 20% × 53% = 17.65%. This is a decrease in event rate of 20% – 17.65% = 2.35%. The formula for net benefit is decrease in event rate – intervention rate ÷ NNTt, giving 2.35% – 12.60% ÷ 33 = 0.01968 as the net benefit for the strategy of treating only men with a risk of 50% or more.
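The arithmetic of this worked example can be checked with a short script. This is a sketch for verification only: the function names are ours, and the specificity of 96% is read off from the "80% × 4%" term in the text, where 4% is one minus specificity.

```python
def intervention_rate(event_rate, sensitivity, specificity):
    """Proportion of the eligible population classified as high risk and treated."""
    return event_rate * sensitivity + (1 - event_rate) * (1 - specificity)


def event_rate_if_high_risk_treated(event_rate, sensitivity, relative_risk):
    """Proportion of the eligible population having the event when only
    the high-risk group receives the intervention."""
    return (event_rate * sensitivity * relative_risk
            + event_rate * (1 - sensitivity))


# Inputs quoted in the worked example (50% risk threshold).
EVENT_RATE = 0.20      # event rate in the absence of intervention
SENSITIVITY = 0.47
SPECIFICITY = 0.96     # inferred: 1 - specificity = 4%
RELATIVE_RISK = 0.75
NNT_T = 33

treated = intervention_rate(EVENT_RATE, SENSITIVITY, SPECIFICITY)
events = event_rate_if_high_risk_treated(EVENT_RATE, SENSITIVITY, RELATIVE_RISK)
decrease = EVENT_RATE - events
benefit = decrease - treated / NNT_T

print(f"intervention rate:      {treated:.4f}")
print(f"event rate:             {events:.4f}")
print(f"decrease in event rate: {decrease:.4f}")
print(f"net benefit:            {benefit:.5f}")
```

The printed values should agree with the figures in the text: 12.6%, 17.65%, 2.35% and a net benefit of 0.01968.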
From the table, we can see that the highest net benefit is associated with treating only men with a nomogram predicted risk of recurrence of 10% or more. We would recommend using this as the eligibility criterion for the trial. One particular advantage of our approach is that net benefit has a simple clinical interpretation in terms of either a decrease in event rate while keeping the intervention rate constant or a decrease in the intervention rate while keeping the event rate constant. For example, the net benefit for the high-risk group is 0.0296 greater than that of not using adjuvant therapy in any patient. Thus the strategy of calculating a prediction for all patients and administering an intervention to those with a predicted risk of recurrence ≥ 10% gives the same net benefit as a strategy (say, a change in surgical technique) that leads to the equivalent of about 3 fewer recurrences per 100 patients without any patients receiving adjuvant therapy. A similar calculation can be conducted to determine the decrease in intervention rate for a constant event rate: in this case, the difference in net benefit is multiplied by the NNTt.
Any of the inputs required to calculate net benefit can be varied to determine whether this affects which strategy is deemed optimal. The event rate in the absence of the study intervention can usually be estimated (e.g. from cohort studies), and whether it is worth varying sensitivity and specificity will depend on the size and quality of the studies used to estimate these parameters. Hence the two most important sensitivity analyses concern NNTt – on the grounds that this is a judgment that can reasonably vary from individual to individual and place to place – and relative risk, on the grounds that this is unknown during trial planning.
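A sensitivity analysis of this kind can be sketched as a simple grid over relative risk and NNTt. The code below is an illustration under stated assumptions, not the authors' analysis: the function names and the grid values are hypothetical, and the sensitivity (47%) and specificity (96%) are those of the 50% threshold used in the worked example.

```python
def net_benefit(event_rate, sensitivity, specificity, relative_risk, nnt_t):
    """Net benefit of treating only patients who meet the high-risk criteria."""
    treated = event_rate * sensitivity + (1 - event_rate) * (1 - specificity)
    # Algebraically equal to: event rate without intervention
    # minus event rate when the high-risk group is treated.
    decrease = event_rate * sensitivity * (1 - relative_risk)
    return decrease - treated / nnt_t


def sensitivity_analysis(event_rate, sensitivity, specificity,
                         relative_risks, nnt_ts):
    """Return one row per (relative risk, NNTt) pair with the preferred strategy."""
    rows = []
    for rr in relative_risks:
        for nnt_t in nnt_ts:
            scores = {
                "treat none": 0.0,
                # Treating everyone is the special case sensitivity=1, specificity=0.
                "treat all": net_benefit(event_rate, 1.0, 0.0, rr, nnt_t),
                "treat high-risk": net_benefit(
                    event_rate, sensitivity, specificity, rr, nnt_t),
            }
            rows.append((rr, nnt_t, max(scores, key=scores.get), scores))
    return rows


# Hypothetical grid around the base case (relative risk 0.75, NNTt 33).
rows = sensitivity_analysis(
    event_rate=0.20, sensitivity=0.47, specificity=0.96,
    relative_risks=(0.60, 0.75, 0.90), nnt_ts=(20, 33, 50),
)

for rr, nnt_t, best, scores in rows:
    print(f"RR={rr:.2f} NNTt={nnt_t:>2}  best={best:<15} "
          f"all={scores['treat all']:+.4f} high={scores['treat high-risk']:+.4f}")
```

If the preferred strategy changes across plausible values of relative risk or NNTt, the choice of risk group is sensitive to those inputs and deserves further discussion before the design is fixed.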
Table 3. Sensitivity analysis. Net benefit when relative risk and NNTt are varied, by cut-off for risk of recurrence.
Our method assumes that, following a positive trial result, all or nearly all high-risk patients will receive the intervention, and none, or nearly none, of the low-risk population will be treated. This might be seen as a somewhat unrealistic ideal of evidence-based medical practice. However, it is easy to adjust estimates of event rates and intervention rates for departures from this standard by specifying the proportion of high-risk patients who are not treated and the proportion of low-risk patients who inappropriately receive the intervention (see Appendix [Additional file 1] for formulae).
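The Appendix formulae are not reproduced in this text, so the following is only a sketch of one way such an adjustment could be made. It assumes that the relative risk is constant and that whether a patient actually receives the intervention is unrelated to whether they would have the event. The function and the parameter names `untreated_high` and `treated_low` are hypothetical.

```python
def rates_with_imperfect_uptake(event_rate, sensitivity, specificity,
                                relative_risk,
                                untreated_high=0.0, treated_low=0.0):
    """Return (intervention rate, event rate) when a proportion
    `untreated_high` of high-risk patients go untreated and a proportion
    `treated_low` of low-risk patients receive the intervention."""
    # Cells of the 2x2 table, as proportions of the eligible population.
    high_event = event_rate * sensitivity
    low_event = event_rate * (1 - sensitivity)
    high_no_event = (1 - event_rate) * (1 - specificity)
    low_no_event = (1 - event_rate) * specificity

    treated_high = 1 - untreated_high
    intervention = (treated_high * (high_event + high_no_event)
                    + treated_low * (low_event + low_no_event))
    # Treated patients have their event risk scaled by the relative risk;
    # untreated patients keep their baseline risk.
    events = (high_event * (treated_high * relative_risk + untreated_high)
              + low_event * (treated_low * relative_risk + (1 - treated_low)))
    return intervention, events


# Worked-example inputs, with 10% of high-risk patients untreated and
# 5% of low-risk patients treated (illustrative proportions only).
print(rates_with_imperfect_uptake(0.20, 0.47, 0.96, 0.75,
                                  untreated_high=0.10, treated_low=0.05))
```

With both proportions set to zero, the function reduces to the intervention and event rate formulae given earlier; with `treated_low` set to one it reduces to treating the whole population.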
Applying the method to other sample scenarios
Table 4. Net benefit for treating high-risk and all patients, varying the event rate in the absence of intervention. Columns: event rate in the absence of intervention; net benefit (high-risk); net benefit (treat all); net benefit compared to treat all.
Table 5. Net benefit for treating high-risk and all patients, varying the effectiveness and tolerability of intervention. Columns: net benefit (high-risk); net benefit (treat all); net benefit: high-risk – treat all. Rows: effective intervention, high sensitivity; highly tolerable intervention; highly tolerable intervention, high sensitivity; adverse intervention, high specificity; the ideal intervention, high sensitivity and specificity; questionable intervention, poor sensitivity and specificity.
The final two rows of table 5 demonstrate the value of a decision analytic approach to the problem of risk group selection. In one scenario, selection criteria that have near perfect sensitivity and specificity are useless because the intervention is highly effective and tolerable, and therefore there is little downside to treating all patients. In another scenario, selection criteria that are only marginally better than random guessing should be used to select a high-risk group because intervening is of extremely marginal benefit.
Sample size considerations
Table 6. Sample size requirements for different scenarios. Sample size is calculated using 90% power and 5% alpha. Columns: event rate in control arm of trial; sample size (pts. screened) for relative risk of 0.75. Rows: treat high-risk (risk 10% +); treat high-risk (risk 25% +); treat high-risk (risk 50% +).
The general approach we suggest is based only on net benefit in the eligible population after completion of the trial, and does not take sample size considerations into account. If there is an upper bound on the sample size due to budget constraints, the risk group selected should be the one with the highest net benefit among those under consideration that satisfy the budget for the trial. Alternatively, one could consider a more complex calculation of net benefit subject to a constraint on total trial costs.
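To see how the choice of risk group interacts with sample size, the sketch below estimates the number of patients to enrol and to screen for a given definition of high risk. The paper does not state which sample size formula was used, so several assumptions are made here: the standard normal-approximation formula for comparing two proportions, 1:1 randomization, a two-sided alpha of 5% and 90% power. It also assumes that the control-arm event rate equals the proportion of high-risk patients who have the event, and that the number screened is the number enrolled divided by the proportion meeting the criteria. All function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, relative_risk, alpha=0.05, power=0.90):
    """Patients per arm for a two-sided comparison of two proportions
    (normal approximation, equal allocation)."""
    p_treated = p_control * relative_risk
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil(z ** 2 * variance / (p_control - p_treated) ** 2)


def trial_size(event_rate, sensitivity, specificity, relative_risk):
    """Return (control-arm event rate, total enrolled, total screened)."""
    # Proportion of the at-risk population meeting the high-risk criteria.
    eligible = event_rate * sensitivity + (1 - event_rate) * (1 - specificity)
    # Event rate among those classified as high risk.
    p_control = event_rate * sensitivity / eligible
    enrolled = 2 * n_per_arm(p_control, relative_risk)
    screened = ceil(enrolled / eligible)
    return p_control, enrolled, screened


# Whole population (sensitivity 1, specificity 0) versus the 50% threshold
# from the worked example (sensitivity 47%, specificity 96%).
print("treat all:      ", trial_size(0.20, 1.0, 0.0, 0.75))
print("high-risk (50%):", trial_size(0.20, 0.47, 0.96, 0.75))
```

Under these assumptions, a stricter definition of high risk raises the control-arm event rate and so reduces the number of patients who need to be enrolled, at the cost of screening more patients for each one enrolled.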
Determining who should receive an intervention is a key aspect of medical practice. It is inevitable that, although some interventions should be applied to all members of an at-risk population (e.g. antibiotics before surgery), others should be restricted to those at high risk (e.g. β-blockers before surgery). To our knowledge, no previous investigators have described a simple strategy for determining whether a trial should accrue patients selected from the whole at-risk population or only from a high-risk subgroup. Moreover, criteria for determining the appropriate definition of a high-risk subgroup have not been developed systematically, for example, by quantitatively comparing different definitions.
Accordingly, prior approaches to issues of patient selection in randomized trials have been rather informal. For example, at a National Cancer Institute workshop on risk prediction models, it was reported that participants "repeatedly discussed the use of cancer risk prediction models for high-risk versus population approaches to cancer prevention". Yet the only guidance given was that a population prevention strategy would be optimal unless a predictive model had "high discriminatory power" to identify those who will develop a disease. This raises the question of just how good a model has to be, and omits what we have demonstrated to be the key variables of the underlying event rate, intervention effectiveness and tolerability.
A debate concerning the inclusion criteria for the National Lung Screening Trial is similarly illustrative. In the original trial protocol, the investigators set inclusion criteria of a 30 pack-year or greater smoking history and no more than 15 years since quitting. No clear rationale was given for this threshold. Subsequently, a separate group of investigators created a risk prediction model for lung cancer and argued that, because its predictive properties were well understood, this might be used to select patients for a clinical trial. However, these investigators did not demonstrate clearly that any specific set of criteria derived from their model was superior to those used in the trial.
Given the apparent advantages of our method, we should discuss some of its limitations. One important assumption of the method is that the relative risk for intervention versus no intervention is constant over the choice of risk groups. Generally speaking, we do not know whether this is true: indeed, two of us have previously used possible inconstancy of relative risk to argue against accruing only high-risk patients and then applying the results to the whole eligible population. However, we think there is an important difference between using a certain assumption to help design a trial and using it to make a clinical decision. In the case of a clinical decision about treatment, patients could be harmed if assumptions about relative risk do not hold. We would therefore like to avoid any such assumptions. In the case of trial design, any design we choose necessarily involves assumptions, explicit or otherwise, about the relationship between relative and absolute risk. Moreover, these assumptions can be tested once the trial is completed and further research recommended if appropriate.
An apparent disadvantage of our method is that it involves a subjective judgment of NNTt, and a prediction as to relative risk. However, these would be needed for other design decisions even if an investigator chose not to follow our recommendations. For example, the NNTt is equivalent to the "minimum clinically significant difference" that is used in standard sample size calculations; predictions as to event rates are similarly part-and-parcel of sample size estimation.
An alternative to the approach suggested here would be to conduct a trial including all patients, use the trial data to build a predictive model and then select a high-risk group accordingly. The principal advantage is that we can model treatment benefit, rather than baseline risk, and therefore do not need to make assumptions about a constant relative risk. This approach has been pioneered successfully with respect to adjuvant chemotherapy. However, in practice, clinicians and statisticians are uncomfortable recommending interventions to sub-groups of patients unless these have demonstrated clinical and statistical significance in the primary analysis of a randomized trial. It is quite plausible that an intervention with a modest effect size, or one targeting a moderately prevalent disease, will not show sufficient overall effectiveness in a definitive trial and will be dropped from consideration, even though it would be of important benefit to a sub-group of high-risk patients. An illustrative recent example is the Women's Health Initiative study of calcium and vitamin D for fracture prevention. Overall, this study found rather small benefits of supplementation, the key conclusion being that treatment "did not significantly reduce hip fracture". Many of the women in this study were at very low risk: for example, 37% of the participants were aged 50 to 59 and the rate of fracture in this sub-group was only about 0.3%. It is entirely possible that supplementation is of important benefit for older women at higher risk of fracture, but that the use of supplementation in the community will decline given the rather negative overall study results.
In this paper, we have introduced a statistical method to determine whether or not to restrict a study to a high-risk population and, if so, to determine which of several competing definitions of high-risk is optimal. Our method is simple and produces results with direct clinical applicability. It should appeal to clinicians since the quantitative results are in concert with clinical intuition. However, we feel that the mathematical details of our method are perhaps less important than our overall message, which includes four main points. First, it may be more rational to focus on high-risk groups than to treat everyone at risk. Second, whether or not to restrict an intervention to a high-risk group is a question that can be informed by data and statistical analyses. It is our impression that current decisions about whom to include in trials have not been statistically based; rather, they appear to have depended on informal judgment. Third, eligibility criteria in trials that attempt to identify high-risk subjects (such as pack-years in a lung cancer screening trial) can be seen as predictions with certain statistical properties. We have chosen to describe these in terms of sensitivity and specificity, on the grounds that these terms are readily understood by most clinicians. Fourth, we can compare different approaches to trial eligibility using formal statistical analysis. We believe that a more systematic approach to patient selection will maximize the benefits of randomized trials for human health.
Dr Vickers' work on this research was funded by a P50-CA92629 SPORE from the National Cancer Institute. The sponsor had no role in design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript. Dr Michael Morris, Memorial Sloan-Kettering Cancer Center, provided advice on adjuvant trials in prostate cancer; Dr Mike Kattan, Cleveland Clinic Foundation, provided raw data to calculate the properties of his nomogram.
- Gohagan JK: The prostate, lung, colorectal and ovarian (PLCO) cancer screening trial. Controlled Clinical Trials. 2000, 21 (6, Supplement 1): 249S. 10.1016/S0197-2456(00)00096-9.
- Bach PB, Kattan MW, Thornquist MD, Kris MG, Tate RC, Barnett MJ, Hsieh LJ, Begg CB: Variations in Lung Cancer Risk Among Smokers. J Natl Cancer Inst. 2003, 95 (6): 470-478.
- Baker S, Kramer B, Corle D: The fallacy of enrolling only high-risk subjects in cancer prevention trials: Is there a "free lunch"?. BMC Medical Research Methodology. 2004, 4 (1): 24. 10.1186/1471-2288-4-24.
- Stephenson AJ, Scardino PT, Eastham JA, Bianco FJJ, Dotan ZA, DiBlasio CJ, Reuther A, Klein EA, Kattan MW: Postoperative Nomogram Predicting the 10-Year Probability of Prostate Cancer Recurrence After Radical Prostatectomy. J Clin Oncol. 2005, 23 (28): 7005-7012. 10.1200/JCO.2005.01.867.
- Sinclair JC, Cook RJ, Guyatt GH, Pauker SG, Cook DJ: When should an effective treatment be used?: Derivation of the threshold number needed to treat and the minimum event rate for treatment. Journal of Clinical Epidemiology. 2001, 54 (3): 253. 10.1016/S0895-4356(01)00347-X.
- Baker SG, Heidenberger K: Choosing sample sizes to maximize expected health benefits subject to a constraint on total trial costs. Med Decis Making. 1989, 9 (1): 14-25.
- Freedman AN, Seminara D, Gail MH, Hartge P, Colditz GA, Ballard-Barbash R, Pfeiffer RM: Cancer Risk Prediction Models: A Workshop on Development, Evaluation, and Application. J Natl Cancer Inst. 2005, 97 (10): 715-723.
- Gill S, Loprinzi CL, Sargent DJ, Thome SD, Alberts SR, Haller DG, Benedetti J, Francini G, Shepherd LE, Francois Seitz J, Labianca R, Chen W, Cha SS, Heldebrant MP, Goldberg RM: Pooled Analysis of Fluorouracil-Based Adjuvant Therapy for Stage II and III Colon Cancer: Who Benefits and by How Much?. J Clin Oncol. 2004, 22 (10): 1797-1806. 10.1200/JCO.2004.09.059.
- Jackson RD, LaCroix AZ, Gass M, Wallace RB, Robbins J, Lewis CE, Bassford T, Beresford SAA, Black HR, Blanchette P, Bonds DE, Brunner RL, Brzyski RG, Caan B, Cauley JA, Chlebowski RT, Cummings SR, Granek I, Hays J, Heiss G, Hendrix SL, Howard BV, Hsia J, Hubbell FA, Johnson KC, Judd H, Kotchen JM, Kuller LH, Langer RD, Lasser NL, Limacher MC, Ludlam S, Manson JAE, Margolis KL, McGowan J, Ockene JK, O'Sullivan MJ, Phillips L, Prentice RL, Sarto GE, Stefanick ML, Van Horn L, Wactawski-Wende J, Whitlock E, Anderson GL, Assaf AR, Barad D, the Women's Health Initiative I: Calcium plus Vitamin D Supplementation and the Risk of Fractures. N Engl J Med. 2006, 354 (7): 669-683. 10.1056/NEJMoa055218.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.