The effect of covariate adjustment for baseline severity in acute stroke clinical trials with responder analysis outcomes
© Garofolo et al.; licensee BioMed Central Ltd. 2013
Received: 18 November 2012
Accepted: 13 March 2013
Published: 11 April 2013
Traditionally in acute stroke clinical trials, the primary clinical outcome employed is a dichotomized modified Rankin Scale (mRS). New statistical methods, such as responder analysis, are being used in stroke studies to address the concern that baseline prognostic variables, such as stroke severity, impact the likelihood of a successful outcome. Responder analysis allows the definition of success to vary according to baseline prognostic variables, producing a more clinically relevant insight into the actual effect of investigational treatments. It is unclear whether or not statistical analyses should adjust for prognostic variables when responder analysis is used, as the outcome already takes these prognostic variables into account. This research aims to investigate the effect of covariate adjustment in the responder analysis framework in order to determine the appropriate analytic method.
Using a current stroke clinical trial and its pilot studies to guide simulation parameters, 1,000 clinical trials were simulated at varying sample sizes under several treatment effects to assess power and type I error. Covariate-adjusted and unadjusted logistic regressions were used to estimate the treatment effect under each scenario. In the case of covariate-adjusted logistic regression, the trichotomized National Institute of Health Stroke Scale (NIHSS) was used in adjustment.
Under various treatment effect settings, the operating characteristics of the unadjusted and adjusted analyses do not substantially differ. Power and type I error are preserved for both the unadjusted and adjusted analyses.
Our results suggest that, under the given treatment effect scenarios, the decision whether or not to adjust for baseline severity when using a responder analysis outcome should be guided by the needs of the study, as type I error rates and power do not appear to vary largely between the methods. These findings are applicable to stroke trials which use the mRS for the primary outcome, but also provide a broader insight into the analysis of binary outcomes that are defined based on baseline prognostic variables.
This research is part of the Stroke Hyperglycemia Insulin Network Effort (SHINE) trial, Identification Number NCT01369069.
Keywords: Responder analysis; Sliding dichotomy; Clinical trials; Acute stroke; Modified Rankin Scale; Baseline severity
Stroke is a potentially debilitating medical event that affects approximately 800,000 people in the United States each year, leaving as many as 30% of survivors permanently disabled. Given this impact, there is great demand for treatments that significantly improve functional outcome following a stroke. To date, few clinical trials for the treatment of acute stroke have succeeded; of over 125 acute stroke clinical trials, only three successful treatment methods have been identified [2, 3].
One of the possible reasons for the excessive number of neutral or unsuccessful stroke trials is the definition of successful outcome utilized in the studies. In clinical trials, stroke outcome is most commonly measured by the modified Rankin Scale (mRS) of global disability at 90 days. The mRS is a valid and reliable measure of functional outcome following a stroke. Past trials have dichotomized mRS scores into “success” and “failure”: scores of 0 to 1 (or 0 to 2) were considered to be “successes” while scores greater than 1 (or 2) were considered to be “failures,” regardless of baseline stroke severity [6–9]. This method fails to take into account the understanding that baseline severity is highly correlated with outcome. New methods, such as the global statistic, shift analysis, permutation testing and responder analysis, are evolving to make better use of the outcome data with the hopes of providing higher sensitivity to detect true treatment effects [2, 4, 6, 9–17].
Responder analysis, also known as the sliding dichotomy, dichotomizes ordinal outcomes into “success” and “failure,” but addresses the drawbacks of traditional dichotomization by allowing the definition of success to vary by baseline prognostic variables. Various trials have implemented the responder analysis where baseline severity is defined by one or many baseline prognostic factors [18–20]. Those study subjects in a less severe prognosis group at baseline must achieve a better outcome to be considered a trial “success,” whereas a less stringent criterion for success is applied to subjects in a more severe baseline prognosis category. The currently enrolling Stroke Hyperglycemia Insulin Network Effort (SHINE) trial employs responder analysis for its primary efficacy outcome.
The SHINE trial is a large, multicenter, randomized clinical trial designed to determine the efficacy and safety of targeted glucose control in hyperglycemic acute ischemic stroke patients. While the methodological details of the SHINE trial are discussed elsewhere, it should be noted that the primary outcome for efficacy is the baseline severity adjusted 90-day mRS score dichotomized as “success” or “failure” according to a sliding dichotomy. Eligibility criteria for SHINE require that a subject’s baseline NIHSS score be between 3 and 22, inclusive. Those with a “mild” prognosis, defined by a baseline NIHSS score of 3 to 7, must achieve a 90-day mRS of 0 to be classified as a “success.” Those with a “moderate” prognosis, defined by a baseline NIHSS score of 8 to 14, must achieve a 90-day mRS of 0 to 1 to be classified as a “success.” Finally, those subjects with a “severe” prognosis, defined by a baseline NIHSS score of 15 to 22, must achieve a 90-day mRS of 0 to 2 to be classified as a “success.” By using responder analysis with a trichotomized NIHSS, the threshold for success is stringent for the milder strokes, while the moderate to severe strokes are allowed to have more residual deficits in the threshold for success.
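The sliding dichotomy described above can be written as a small lookup. The following is a minimal sketch, not the trial's own code; the function and constant names are illustrative assumptions, and the 0 to 6 range check reflects the standard mRS scale rather than anything stated in this article.

```python
# Highest 90-day mRS score still counted as a "success" in each prognosis group,
# following the SHINE sliding dichotomy described in the text.
MAX_MRS_FOR_SUCCESS = {"mild": 0, "moderate": 1, "severe": 2}


def severity_category(nihss: int) -> str:
    """Map a baseline NIHSS score (3 to 22, the SHINE eligibility range) to a group."""
    if not 3 <= nihss <= 22:
        raise ValueError("baseline NIHSS must be between 3 and 22, inclusive")
    if nihss <= 7:
        return "mild"
    if nihss <= 14:
        return "moderate"
    return "severe"


def is_success(nihss: int, mrs_90: int) -> bool:
    """Return True if the 90-day mRS meets the success threshold for the subject's group."""
    if not 0 <= mrs_90 <= 6:  # assumption: standard mRS range of 0 to 6
        raise ValueError("mRS must be between 0 and 6")
    return mrs_90 <= MAX_MRS_FOR_SUCCESS[severity_category(nihss)]
```

For example, a subject with a baseline NIHSS of 10 and a 90-day mRS of 1 counts as a success, whereas the same mRS of 1 is a failure for a subject with a baseline NIHSS of 5.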
One of the questions that arose from the trial’s Data and Safety Monitoring Board was that of covariate adjustment. Statistical analyses often adjust for prognostic factors, or covariates, that may be predictive of the primary outcome, such as baseline severity [21, 22]; however, in the case of SHINE, this prognostic variable is also used to define the outcome. While the literature provides many resources on the design and implementation of responder analysis, as well as examples of trials which used responder analysis, there are no clear resources discussing whether or not statistical analyses should be adjusted for the prognostic variables used to define successful outcome.
This research aims to investigate the effect of covariate adjustment in the responder analysis framework, particularly when the covariate is involved in the definition of successful outcome. The cut-points for the SHINE trial are clinically, rather than statistically, defined and so it is conceivable that adjustment for baseline severity in the statistical analysis may account for additional variation and increase the power to detect a true treatment effect. A simulation study is conducted to assess the operating characteristics (power and type I error) of categorically-adjusted and unadjusted analyses under several possible treatment effect scenarios. In addition, treatment effect estimates and their standard errors are examined across the various scenarios. Since the primary outcome for the SHINE trial is binary, we expect covariate adjustment to increase the standard error of the treatment effect estimates, consistent with the findings of Robinson and Jewell. However, also consistent with Robinson and Jewell, we expect this increase in standard error to be balanced by a movement of the treatment effect estimate away from the null hypothesis.
By examining the effect of covariate adjustment in responder analysis, we aim to define the most appropriate statistical approach to identify true treatment effects. Our findings are not only applicable to the SHINE and other stroke trials which use the mRS for the primary outcome, but also provide insight into the appropriate use of categorical baseline prognostic variables in other trials which use an ordinal scale as a primary outcome measure.
Sliding dichotomy criteria for successful outcome in SHINE trial
Baseline NIHSS score (prognosis): 90-day mRS for successful outcome
3 to 7 (mild): 0
8 to 14 (moderate): 0, 1
15 to 22 (severe): 0, 1, 2
The simulation parameters were guided by the SHINE trial design. A total of 1,000 clinical trials were simulated at sample sizes ranging from 498 to 1,958. This sample size range allowed us to cover the planned SHINE sample size of 1,400 while also examining model behavior at smaller and larger sample sizes. A 1:1 randomization scheme was assumed for the purposes of this investigation. All analyses were performed using SAS version 9.2 (SAS Institute, Cary, NC, USA).
The prevalence of each baseline severity category was guided by data from two prior pilot trials of hyperglycemia management in acute stroke, the Glucose Regulation in Acute Stroke Patients (GRASP) and Treatment of Hyperglycemia in Ischemic Stroke (THIS) pilot trials. In the simulations, 42% of subjects were classified as “mild” at baseline, 32% classified as “moderate”, and the remaining 26% classified as “severe”. This distribution of prognosis categories was imposed using a uniform (0, 1) random variable. In order to simulate 90-day mRS scores for the control group, we examined the distribution of 90-day mRS scores for the control groups in the GRASP and THIS pilot trials. Though the simulation of 90-day mRS scores was primarily driven by the results of the GRASP and THIS pilot trials, the National Institute of Neurological Disorders and Stroke tissue Plasminogen Activator (NINDS tPA) trial control data were used to aid in the approximation of mRS outcome distributions within each of the baseline severity strata. The NINDS tPA control data helped smooth the distribution of mRS scores, as the GRASP and THIS pilot trials each had small sample sizes that resulted in several empty cells after baseline severity stratification. The exact control group distribution of 90-day mRS scores used in the simulation study is shown in Additional file 1: Table S1.
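One way to impose the 42%/32%/26% severity distribution from a uniform (0, 1) draw is to compare the draw against cumulative cut-points. The sketch below is illustrative only: the original simulations were run in SAS, and the names used here are assumptions.

```python
import random

# Cumulative cut-points for the simulated prevalences:
# 42% mild, 32% moderate (cumulative 74%), remaining 26% severe.
CUMULATIVE_CUTS = ((0.42, "mild"), (0.74, "moderate"), (1.00, "severe"))


def draw_severity(rng: random.Random) -> str:
    """Assign a baseline severity category from a single uniform(0, 1) draw."""
    u = rng.random()  # uniform on [0, 1)
    for upper, category in CUMULATIVE_CUTS:
        if u < upper:
            return category
    return "severe"  # not reachable, since u < 1.0 always holds


def simulate_severities(n: int, seed: int = 0) -> list:
    """Draw severity categories for n simulated subjects."""
    rng = random.Random(seed)
    return [draw_severity(rng) for _ in range(n)]
```

The 90-day mRS scores could be drawn in the same way within each stratum, using the stratum-specific cumulative distribution from Additional file 1: Table S1, which is not reproduced here.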
Success prevalence for simulated treatment effect scenarios
Treatment effect scenarios
No treatment effect
In the first varying scenario, we applied an 8.6% treatment effect in the mild category, a 9% treatment effect in the moderate category and a 2% treatment effect in the severe category; that is, there was an 8.6% increase in prevalence of the 0 mRS for the mild stratum, a 9% increase in the prevalence of the 0 to 1 range of mRS scores for the moderate stratum, and a 2% increase in the prevalence of the 0 to 2 range of mRS scores for the severe stratum. This scenario is relevant to the SHINE trial; it is similar to what we may observe if the investigational treatment is largely beneficial to mild and moderate stroke victims, but only marginally beneficial to victims of severe stroke. The second varying treatment effect scenario applies an opposite effect in which the intensive glucose control intervention is largely beneficial to more severe strokes, but only slightly beneficial to those subjects having mild strokes. Additional file 1: Table S1 shows the exact distribution of 90-day mRS scores for the treatment groups under each of these treatment effect scenarios. These distributions were used to randomly assign 90-day mRS scores to each simulated subject in each simulated trial, with the proportions of success following the scenarios in Table 2. Given a subject’s simulated baseline severity stratum (mild, moderate or severe), an assignment of “success” or “failure” was made according to the sliding dichotomy definitions.
Logistic regression was used to investigate each of these scenarios. The unadjusted case models “success” as a function only of treatment group, while the categorically-adjusted case models “success” as a function of treatment group and severity category. Severity was defined as “mild,” “moderate” or “severe” based on the NIHSS prognosis group discussed in the introduction. Power and type I error rate were based on the proportion of simulated trials at a given sample size which rejected the null hypothesis at a nominal level of 0.05. The treatment effect and its standard error were estimated for each trial.
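The rejection-rate calculation for the unadjusted analysis can be sketched as follows. This is not the authors' SAS code. It relies on the fact that, with treatment group as the only predictor, the logistic regression estimate of the treatment effect is the log odds ratio of the 2x2 table and its Wald standard error has a closed form. The control-arm success probabilities are hypothetical placeholders (the actual values are in Additional file 1: Table S1), alternating allocation stands in for 1:1 randomization, and all names are assumptions. The adjusted analysis would require fitting a logistic model with severity indicators and is not shown.

```python
import math
import random

CUMULATIVE_CUTS = ((0.42, "mild"), (0.74, "moderate"), (1.00, "severe"))
# Hypothetical control-arm success probabilities per stratum (NOT from the paper).
CONTROL_SUCCESS = {"mild": 0.25, "moderate": 0.25, "severe": 0.25}
# First varying scenario from the text: absolute increases in success prevalence.
VARYING_EFFECT = {"mild": 0.086, "moderate": 0.09, "severe": 0.02}
NO_EFFECT = {"mild": 0.0, "moderate": 0.0, "severe": 0.0}


def simulate_trial(n, effect, rng):
    """Return {arm: [successes, failures]} for one simulated trial of n subjects."""
    table = {0: [0, 0], 1: [0, 0]}
    for i in range(n):
        arm = i % 2  # simplification: alternating allocation for a 1:1 ratio
        u = rng.random()
        stratum = next(name for cut, name in CUMULATIVE_CUTS if u < cut)
        p = CONTROL_SUCCESS[stratum] + (effect[stratum] if arm == 1 else 0.0)
        table[arm][0 if rng.random() < p else 1] += 1
    return table


def wald_rejects(table, z_crit=1.959964):
    """Two-sided Wald test of the log odds ratio at the 0.05 level."""
    a, b = table[1]  # treatment successes, failures
    c, d = table[0]  # control successes, failures
    if min(a, b, c, d) == 0:
        return False  # estimate undefined; counted as a non-rejection
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return abs(log_or / se) > z_crit


def rejection_rate(n, effect, n_trials=1000, seed=1):
    """Proportion of simulated trials rejecting the null hypothesis."""
    rng = random.Random(seed)
    hits = sum(wald_rejects(simulate_trial(n, effect, rng)) for _ in range(n_trials))
    return hits / n_trials
```

Calling `rejection_rate(1400, NO_EFFECT)` estimates the type I error rate at the planned SHINE sample size, and `rejection_rate(1400, VARYING_EFFECT)` estimates power under the first varying scenario, in both cases conditional on the placeholder control rates.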
Successful stroke treatments are desperately needed given stroke’s large and detrimental effect on the worldwide population. Consequently, statistical methods that offer high power to detect a true treatment effect are also needed. With this simulation study, we sought to determine whether adjustment for baseline severity within the responder analysis setting would be beneficial or harmful in terms of power and type I error rates when compared to an unadjusted analysis.
The type I error rates did not differ substantially between the two methods. The experimental type I error rates for both of the methods stayed within the 95% confidence bounds. This is a welcome result, as a test that is either too liberal or too conservative (rejecting the null hypothesis more or less often than the nominal level, respectively) has implications for the power of the test. The oscillation around the nominal 5% level of significance is likely due to chance, and is to be expected in simulated data. Since neither method shows consistently larger type I error rates than the other, we can conclude that there is no meaningful difference between the two methods with respect to type I error.
The power appears to be approximately the same or slightly higher for the adjusted analyses in the selected scenarios. In the cases where the power is slightly higher, the magnitude is not remarkable and offers little evidence to suggest that adjusting by the single covariate leads to substantially more power. Although the simulation study presented is not exhaustive and, therefore, does not provide additional insight regarding this, the literature by Choi and Hernández suggests that an increase in power could occur as other important prognostic variables are added to the model [27, 28]. It is reassuring, however, that neither method appears to be detrimental to power under the given scenarios.
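The text does not state how the 95% confidence bounds were computed. One common choice, assumed here, is the normal approximation to the binomial for a true rejection probability of 0.05 estimated from 1,000 simulated trials. The function name is illustrative.

```python
import math


def monte_carlo_bounds(nominal=0.05, n_trials=1000, z=1.959964):
    """Approximate 95% bounds for an empirical rejection rate when the true rate is `nominal`."""
    half_width = z * math.sqrt(nominal * (1 - nominal) / n_trials)
    return nominal - half_width, nominal + half_width


lower, upper = monte_carlo_bounds()
```

Under this assumption the bounds are roughly 0.0365 and 0.0635, so an empirical type I error rate anywhere in that interval is compatible with a correctly sized test given the Monte Carlo error of 1,000 replicates.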
In terms of bias, the unadjusted analyses consistently underestimate the nominal treatment effect, while the adjusted analyses tend to be less biased, but often slightly overestimate the nominal treatment effect. Given the magnitude of the coefficient estimates and their standard errors, neither of these bias tendencies is substantial. In terms of MSE, the two methods do not differ greatly as the sample size increases. At the smaller sample sizes, the adjusted analyses have larger MSE values due to increased standard error; however, as the sample size increases, the MSE values for the two methods converge.
Though negligible differences were identified between the adjusted and unadjusted models, researchers should keep the randomization scheme of the study in mind when deciding whether or not to adjust for baseline severity. In general, it is advisable to “analyze as you randomize,” meaning that any variable used as a stratification variable during randomization should be included as a covariate in the analysis in order to preserve nominal type I error rates and power [22, 29]. Baseline severity is often used as a stratification variable in the randomization of acute stroke clinical trials, and should be included as a covariate in these cases.
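The trade-off between bias and standard error described above follows from the usual decomposition of mean square error into squared bias plus variance. The sketch below shows how bias, variance and MSE could be summarized from a set of simulated treatment effect estimates; it is an assumption about the computation, which the text does not spell out, and the names are illustrative.

```python
from statistics import fmean, pvariance


def bias_variance_mse(estimates, true_value):
    """Summarize simulated estimates of a known (nominal) treatment effect."""
    estimates = list(estimates)
    mean_estimate = fmean(estimates)
    bias = mean_estimate - true_value
    # Variance across simulated trials (population form, dividing by the number of trials).
    variance = pvariance(estimates, mu=mean_estimate)
    # Average squared distance from the true value; equals bias**2 + variance.
    mse = fmean([(e - true_value) ** 2 for e in estimates])
    return bias, variance, mse
```

A method with slightly larger variance but smaller bias can therefore have an MSE similar to a method with the opposite pattern, which is consistent with the convergence of the two methods at larger sample sizes.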
It is important to note that these analyses adjust categorically for baseline severity. The categories - mild, moderate and severe - are defined by the NIHSS score, which is a larger scale ranging from 0 to 42 (limited to 3 through 22 in SHINE’s inclusion criteria). A one-unit change in the NIHSS cannot easily be interpreted, as this change may have different meanings depending on the combination of neurological impairments and location along the scale. Despite this issue, the NIHSS is sometimes used as a continuous measure in the literature [30, 31]. This is not necessarily straightforward and should be done with caution. It is possible that adjusting by the actual NIHSS score will provide additional information to the model and increase or maintain power in some treatment effect scenario(s). However, due to uncertainties in the clinical interpretation of a continuous NIHSS variable, adjustment by actual NIHSS score has been left as a topic for future research.
Adjustment for other baseline prognostic variables may also impact study power under the given scenarios. The inclusion of additional covariates that were not used in defining the primary outcome has not been examined in these scenarios, as it is outside the primary focus of this research. It is conceivable that the addition of multiple covariates could reduce overall power due to the increasing standard error on the treatment effect estimate, as studied by Robinson and Jewell and discussed in the Background section of this paper.
Our results show negligible differences between analysis methods in the responder analysis setting, suggesting that in most treatment effect scenarios, adjustment for baseline severity in the primary analyses may best be guided by individual study needs rather than a blanket guideline for all studies. Though we have not shown the results here, we did examine other treatment effect scenarios which yield similar results. These scenarios included a flat and varying 15% treatment effect (instead of the 7% specified in the SHINE study plan), as well as a scenario in which the mild group experienced 5% harm.
Overall, these results shed light on the important concept of adjustment in the context of responder analysis. Though this study only examined a single severity scale, its findings are not restricted to use in stroke studies; they can provide insight into the treatment of categorical baseline prognostic covariates in other studies which use responder analysis to define their primary outcome of interest.
KG recently completed her master’s degree in biostatistics from the Medical University of South Carolina. This manuscript is a result of her master’s thesis and her work as a graduate student on the SHINE grant. The other authors were her committee members and VD was her primary mentor.
- GRASP: Glucose Regulation in Acute Stroke Patients
- mRS: modified Rankin Scale
- MSE: mean square error
- NIHSS: National Institute of Health Stroke Scale
- NINDS tPA: National Institute of Neurological Disorders and Stroke tissue Plasminogen Activator
- SHINE trial: Stroke Hyperglycemia Insulin Network Effort trial
- THIS: Treatment of Hyperglycemia in Ischemic Stroke.
This work was funded by both the SHINE trial NIH/NINDS grant 1U01-NS069498 and the Neurological Emergencies Treatment Trial (NETT) Statistical and Data Management Center NIH/NINDS grant U01-NS059041.
- Sacco RL, Frieden TR, Blakeman DE, Jauch EC, Mohl S: What the Million Hearts Initiative means for stroke: a presidential advisory from the American Heart Association/American Stroke Association. Stroke. 2012, 43: 924-928. 10.1161/STR.0b013e318248f00e.
- Saver JL: Optimal endpoints for acute stroke therapy trials: best ways to measure treatment effects of drugs and devices. Stroke. 2011, 42: 2356-2362. 10.1161/STROKEAHA.111.619122.
- Hong KS, Lee SJ, Hao Q, Liebeskind DS, Saver JL: Acute stroke trials in the 1st decade of the 21st century. Stroke. 2011, 42: e314-
- Bath PM, Gray LJ, Collier T, Pocock S, Carpenter J, The Optimising Analysis of Stroke Trials (OAST) Collaboration: Can we improve the statistical analysis of stroke trials? Statistical reanalysis of functional outcomes in stroke trials. Stroke. 2007, 38: 1911-1915.
- Banks JL, Marotta CA: Outcomes validity and reliability of the modified Rankin scale: implications for stroke clinical trials. Stroke. 2007, 38: 1091-1096. 10.1161/01.STR.0000258355.23810.c6.
- Saver JL: Novel end point analytic techniques and interpreting shifts across the entire range of outcome scales in acute stroke trials. Stroke. 2007, 38: 3055-3062. 10.1161/STROKEAHA.107.488536.
- Young FB, Lees KR, Weir CJ: Strengthening acute stroke trials through optimal use of disability end points. Stroke. 2003, 34: 2676-2680. 10.1161/01.STR.0000096210.36741.E7.
- Murray GD, Barer D, Choi S, Fernandes H, Gregson B, Lees KR, Maas AI, Marmarou A, Mendelow AD, Steyerberg EW, Taylor GS, Teasdale GM, Weir CJ: Design and analysis of phase III trials with ordered outcome scales: the concept of the sliding dichotomy. J Neurotrauma. 2005, 22: 511-517. 10.1089/neu.2005.22.511.
- Savitz SI, Lew R, Bluhmki E, Hacke W, Fisher M: Shift analysis versus dichotomization of the modified Rankin scale outcome scores in the NINDS and ECASS-II trials. Stroke. 2007, 38: 3205-3212. 10.1161/STROKEAHA.107.489351.
- Kasner SE: Clinical interpretation and use of stroke scales. Lancet Neurol. 2006, 5: 603-612. 10.1016/S1474-4422(06)70495-1.
- Tilley BC, Marler J, Geller NL, Lu M, Legler J, Brott T, Lyden P, Grotta J: Use of a global test for multiple outcomes in stroke trials with application to the National Institute of Neurological Disorders and Stroke t-PA Stroke Trial. Stroke. 1996, 27: 2136-2142. 10.1161/01.STR.27.11.2136.
- Cobo E, Secades JJ, Miras F, Gonzalez JA, Saver JL, Corchero C, Rius R, Dàvalos A: Boosting the chances to improve stroke treatment. Stroke. 2010, 41: e143-e150. 10.1161/STROKEAHA.109.567404.
- Saver JL, Gornbein J: Treatment effects for which shift or binary analyses are advantageous in acute stroke trials. Neurology. 2009, 72: 1310-1315. 10.1212/01.wnl.0000341308.73506.b7.
- McHugh GS, Butcher I, Steyerberg EW, Marmarou A, Lu J, Lingsma HF, Weir J, Maas AI, Murray GD: A simulation study evaluating approaches to the analysis of ordinal outcome data in randomized controlled trials in traumatic brain injury: results from the IMPACT Project. Clin Trials. 2010, 7: 44-57. 10.1177/1740774509356580.
- Howard G, Waller JL, Voeks JH, Howard VJ, Jauch EC, Lees KR, Nichols FT, Rahlfs VW, Hess DC: A simple, assumption-free, and clinically interpretable approach for analysis of modified Rankin outcomes. Stroke. 2012, 43: 664-669. 10.1161/STROKEAHA.111.632935.
- Saver JL, Yafeh B: Confirmation of tPA treatment effect by baseline severity-adjusted end point reanalysis of the NINDS-tPA stroke trials. Stroke. 2007, 38: 414-416. 10.1161/01.STR.0000254580.39297.3c.
- Young FB, Lees KR, Weir CJ: Improving trial power through use of prognosis-adjusted end points. Stroke. 2005, 36: 597-601. 10.1161/01.STR.0000154856.42135.85.
- Bruno A, Durkalski VL, Hall CE, Juneja R, Barsan WG, Janis S, Meurer WJ, Fansler A, Johnston KC: The stroke hyperglycemia insulin network effort (SHINE) trial; design and methodology. Int J Stroke. In press.
- den Hertog HM, van der Worp HB, van Gemert HM, Algra A, Kappelle LJ, van Gijn J, Koudstaal PJ, Dippel DW, PAIS Investigators: The Paracetamol (Acetaminophen) In Stroke (PAIS) trial: a multicentre, randomised, placebo-controlled, phase III trial. Lancet Neurol. 2009, 8: 434-440. 10.1016/S1474-4422(09)70051-1.
- Adams HP, Effron MB, Torner J, Davalos A, Frayne J, Teal P, Leclerc J, Oemar B, Padgett L, Barnathan ES, Hacke W: Emergency administration of abciximab for treatment of patients with acute ischemic stroke: results of an international phase III trial. Stroke. 2008, 39: 87-99. 10.1161/STROKEAHA.106.476648.
- Piantadosi S: Clinical Trials: A Methodological Perspective. 2005, Hoboken, New Jersey: John Wiley and Sons, 470-473. 2
- Harrell FE: The Role of Covariable Adjustment in the Analysis of Clinical Trials. [http://biostat.mc.vanderbilt.edu/twiki/pub/Main/FHHandouts/covadj.pdf]
- Robinson LD, Jewell NP: Some surprising results about covariate adjustment in logistic regression models. Int Stat Rev. 1991, 58: 227-240.
- Johnston KC, Hall CE, Kissela BM, Bleck TP, Conaway MR, GRASP Investigators: Glucose Regulation in Acute Stroke Patients (GRASP) trial: a randomized pilot trial. Stroke. 2009, 40: 3804-3809. 10.1161/STROKEAHA.109.561498.
- Bruno A, Kent TA, Coull BM, Shankar RR, Saha C, Becker KJ, Kissela BM, Williams LS: Treatment of Hyperglycemia in Ischemic Stroke (THIS): a randomized pilot trial. Stroke. 2008, 39: 384-389. 10.1161/STROKEAHA.107.493544.
- The National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group: Tissue plasminogen activator for acute ischemic stroke. N Eng J Med. 1995, 333: 1581-1588.
- Choi SC: Sample size in clinical trials with dichotomous endpoints: use of covariables. J Biopharm Stat. 1998, 8: 367-375. 10.1080/10543409808835246.
- Hernández AV, Steyerberg EW, Butcher I, Mushkudiani N, Taylor GS, Murray GD, Marmarou A, Choi SC, Lu J, Habbema DF, Maas AI: Adjustment for strong predictors of outcome in traumatic brain injury trials: 25% reduction in sample size requirements in the IMPACT study. J Neurotrauma. 2006, 23: 1295-1303. 10.1089/neu.2006.23.1295.
- Committee for Proprietary Medicinal Products: Committee for Proprietary Medicinal Products (CPMP): points to consider on adjustment for baseline covariates. Stat Med. 2004, 23: 701-709.
- Lin HJ, Chang WL, Tseng MC: Readmission after stroke in a hospital based registry: risk, etiologies, and risk factors. Neurology. 2011, 76: 438-443. 10.1212/WNL.0b013e31820a0cd8.
- Dai DF, Thajeb P, Tu CF, Chiang FT, Chen CH, Yang RB, Chen JJ: Plasma concentration of SCUBE1, a novel platelet protein, is elevated in patients with acute coronary syndrome and ischemic stroke. J Am Coll Cardiol. 2008, 51: 2173-2180. 10.1016/j.jacc.2008.01.060.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.