Baseline hospital performance and the impact of medical emergency teams: Modelling vs. conventional subgroup analysis

Background: To compare two approaches to the statistical analysis of the relationship between the baseline incidence of adverse events and the effect of medical emergency teams (METs). Methods: Using data from a cluster randomized controlled trial (the MERIT study), we analysed the relationship between the baseline incidence of adverse events and its change from baseline to the MET activation phase using quadratic modelling techniques. We compared the findings with those obtained with conventional subgroup analysis. Results: Using linear and quadratic modelling techniques, we found that each unit increase in the baseline incidence of adverse events in MET hospitals was associated with a 0.59 unit subsequent reduction in adverse events (95%CI: 0.33 to 0.86) after MET implementation and activation. This applied to cardiac arrests (0.74; 95%CI: 0.52 to 0.95), unplanned ICU admissions (0.56; 95%CI: 0.26 to 0.85) and unexpected deaths (0.68; 95%CI: 0.45 to 0.90). Control hospitals showed a similar reduction only for cardiac arrests (0.95; 95%CI: 0.56 to 1.32). Comparison using conventional subgroup analysis, on the other hand, detected no significant difference between MET and control hospitals. Conclusions: Our study showed that, in the MERIT study, when the treatment effect depended on baseline performance, an approach based on regression modelling helped illustrate the nature and magnitude of such dependence, whereas subgroup analysis did not. The ability to assess the nature and magnitude of such dependence may have policy implications. Regression techniques may thus prove useful in analysing data when there is a conditional treatment effect.


Introduction
Following the landmark reports from the Institute of Medicine (IOM), numerous programs designed to improve patient safety have been introduced [1][2][3][4][5][6]. Rigorous evaluation of such programs poses a considerable challenge. Organisational theory recognizes that the effectiveness of health care interventions is likely to be system dependent. Therefore, understanding system-specific organizational characteristics might be important in evaluating the effectiveness of interventions [7][8][9][10].
However, statistical analysis of data from the Medical Early Response & Intervention Therapy (MERIT) study, a 23-hospital cluster randomised controlled trial of the MET system, failed to show a difference in the aggregate incidence of unexpected cardiac arrests, unexpected deaths and unanticipated ICU admissions between MET and control hospitals [27]. The primary MERIT statistical analysis protocol was based on main effect analysis. Although it used the baseline incidence of adverse events as a covariate, it did not test for an interaction effect between treatment allocation and the baseline incidence of the study outcomes.
The baseline incidence of a specified outcome is an important hospital performance characteristic that may predict the magnitude of improvement in response to an intervention [28]. It varied greatly in MERIT. This raises questions regarding what might be the correct statistical approach to the analysis of the MERIT study results. Unfortunately, there is no established and widely accepted statistical approach that can be applied under these circumstances. Yet the choice of statistical method might well affect the interpretation of the study findings. In this setting, a comparison of statistical approaches might illustrate the impact of the choice of statistical technique on data interpretation and have useful implications for the analysis of similar studies in the future.
The aim of this study was to compare a regression-based approach with subgroup analysis in terms of their results and empirical interpretation when the baseline outcome is a continuous variable and a treatment-effect modifier. Accordingly, we developed a methodology that incorporated regression-modelling techniques and applied it to the MERIT study data. With this methodology, we studied the relationship between the baseline incidence of adverse events and the change that occurred during preparation for and activation of METs. We then compared the findings identified with this approach with those obtained from conventional subgroup analysis.

Methods
The sample recruitment, sample size calculation, ethical approval, and randomisation scheme for the MERIT study have been described previously [27]. The primary outcome for the MERIT study was the aggregate incidence (adverse events divided by the number of eligible patients admitted to the hospital during the study period) of three adverse events: 1) cardiac arrests without a pre-existing not-for-resuscitation (NFR) order, 2) unplanned ICU admissions, and 3) unexpected deaths (deaths without an NFR order) occurring in general wards. Secondary outcomes were the incidences of each individual adverse event. Data collection was conducted during a two-month baseline period. This was followed by a four-month standardised implementation period, during which education was delivered on the concepts and practice changes required with the introduction of the MET, and then by a six-month study activation period during which the MET system was operational [27]. The MERIT study was approved by the Ethics Committee of the University of New South Wales.
Data collection was conducted in control hospitals during the same time periods. The conduct of the study was not publicised in the control hospitals, and the management and resuscitation committees of the control hospitals agreed that the operation of their cardiac arrest teams would continue unchanged during the study.

Statistical Methods
To test our hypothesis, we used the previously published MERIT data. We set the change in the incidence of adverse events as the dependent variable. We then tested for interaction effects (both linear and quadratic) between the baseline incidence of the primary and secondary outcomes and treatment allocation (MET versus control). We used an analytically weighted regression model, weighted by the number of admissions during the study period. This weighting is an extension of the weighted t-test often used in cluster randomised controlled trials [29][30][31]. This initial approach established the existence of significant linear and quadratic interaction effects for the primary outcome, unexpected cardiac arrests and unplanned ICU admissions. We therefore conducted the statistical analyses for control and MET hospitals separately, as described below.
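The interaction test above can be sketched with synthetic data: an analytically weighted regression of the change in incidence on baseline incidence, treatment allocation, and their linear and quadratic interactions. This is an illustrative sketch only; all variable names and numerical values are hypothetical and do not reproduce the MERIT data or code.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                     # hypothetical hospitals (MERIT had 23)
baseline = rng.uniform(0, 10, n)            # baseline incidence per 1000 admissions
met = rng.integers(0, 2, n)                 # 1 = MET hospital, 0 = control
admissions = rng.integers(500, 5000, n).astype(float)  # analytic weights

# Simulated change in incidence with linear and quadratic interaction effects
change = (1.0 - 0.2 * baseline + 0.05 * baseline**2
          - 0.6 * met * baseline + 0.02 * met * baseline**2
          + rng.normal(0.0, 0.5, n))

# Design matrix: intercept, baseline, baseline^2, treatment, and the two
# treatment-by-baseline interaction terms whose significance is being tested
X = np.column_stack([np.ones(n), baseline, baseline**2,
                     met, met * baseline, met * baseline**2])

# Analytically weighted least squares: scale each row by sqrt(weight)
sw = np.sqrt(admissions)
beta, *_ = np.linalg.lstsq(X * sw[:, None], change * sw, rcond=None)
```

A significant coefficient on either interaction column (the last two entries of `beta`) would correspond to the linear or quadratic treatment-by-baseline interactions described in the text, and would motivate analysing the two arms separately.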

Statistical modelling given significant interaction effects
We analysed the primary and secondary outcomes in the same way. First, we fitted a quadratic model. If the model detected a significant quadratic effect, these results were presented together with the results of the linear-effect model for comparison. If the model detected no significant quadratic effect, only a linear-effect model was fitted and presented; this was done to minimize the problem of multicollinearity. This model showed that only unexpected cardiac arrests had negative linear slopes in both groups. We examined the statistical difference between these two slopes by testing the interaction effect between treatment group and the baseline incidence of unexpected cardiac arrests. We then addressed the issues of a) small sample size, b) lack of normality, and c) possible heteroscedasticity by comparing the analytically weighted regression results obtained with two different methods: ordinary least squares and the heteroscedasticity-consistent covariance matrix estimator (HC3) [31,32]. We also assessed the potential confounding effects of teaching status and hospital location, using a hybrid of forced entry and blocked backward elimination in a multivariate regression model: the baseline incidence (linear or quadratic) was forced into the model, the block comprised teaching status and hospital location, and the probability for exclusion of the block from the final model was set at 0.15. We analytically weighted the regression model by admission volume. We present the results as predicted regression curves with 95%CI bands, together with the original data points for each hospital.
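The HC3 estimator referred to above can be illustrated with a minimal sketch (our own illustrative implementation, not the study code): HC3 replaces the usual OLS covariance with a sandwich estimator that inflates each squared residual by its leverage, making the standard errors robust to heteroscedasticity. The data below are simulated with noise that grows with the predictor.

```python
import numpy as np

def hc3_standard_errors(X, y):
    """OLS coefficients with MacKinnon-White HC3 robust standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)    # leverage h_i (hat-matrix diagonal)
    omega = (resid / (1.0 - h)) ** 2               # HC3 weighting: e_i^2 / (1 - h_i)^2
    cov = XtX_inv @ (X.T * omega) @ X @ XtX_inv    # sandwich covariance
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(1)
n = 150
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 0.2 + 0.1 * x, n)   # heteroscedastic noise
X = np.column_stack([np.ones(n), x])
beta, se = hc3_standard_errors(X, y)
```

Comparing `se` with the classical OLS standard errors on the same fit shows how the two covariance methods mentioned in the text can differ when the error variance is not constant.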

Conventional subgroup analysis given significant interaction effects
For the purpose of this analysis, we followed convention and split the sample into two groups: hospitals at or above the median baseline value, and hospitals below the median baseline value, for a given outcome. We then applied the weighted t-test to each subgroup separately.
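A minimal sketch of this subgroup procedure follows, with hypothetical hospital-level numbers. The weighted t-test here uses a common analytic-weight convention (weights rescaled to sum to the number of clusters, Welch degrees of freedom); MERIT's exact formulation may differ.

```python
import numpy as np
from scipy import stats

def weighted_t_test(y1, w1, y2, w2):
    """Welch-style two-sample t-test on cluster-level outcomes,
    analytically weighted (weights rescaled to sum to cluster count)."""
    def summarise(y, w):
        y, w = np.asarray(y, float), np.asarray(w, float)
        n = len(y)
        ws = w * n / w.sum()                       # rescaled analytic weights
        m = (ws * y).sum() / n                     # weighted mean
        v = (ws * (y - m) ** 2).sum() / (n - 1)    # weighted variance
        return n, m, v
    n1, m1, v1 = summarise(y1, w1)
    n2, m2, v2 = summarise(y2, w2)
    se = np.sqrt(v1 / n1 + v2 / n2)
    t = (m1 - m2) / se
    df = (v1 / n1 + v2 / n2) ** 2 / ((v1 / n1) ** 2 / (n1 - 1)
                                     + (v2 / n2) ** 2 / (n2 - 1))
    return t, 2 * stats.t.sf(abs(t), df)

# Hypothetical hospital-level data: baseline incidence, change, arm, admissions
baseline = np.array([3.1, 5.2, 1.8, 7.4, 2.6, 6.1, 4.0, 8.3])
change   = np.array([-0.5, -2.1, 0.2, -3.3, -0.1, -2.8, -1.0, -4.0])
met      = np.array([1, 0, 1, 1, 0, 1, 0, 0], dtype=bool)
adm      = np.array([1200, 3400, 800, 2100, 1500, 2600, 1900, 2300.0])

# Median split on baseline incidence, then MET vs control within each half
for high in (False, True):
    grp = (baseline >= np.median(baseline)) == high
    t, p = weighted_t_test(change[grp & met], adm[grp & met],
                           change[grp & ~met], adm[grp & ~met])
```

Note how the split leaves only a handful of clusters per cell, which is the loss of power the Discussion attributes to this approach.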

Results
The baseline data for hospital and patient characteristics and the numbers for each event have been presented previously [27]. The baseline characteristics of the MET and control hospitals were similar. Table 1 shows the results of the interaction effects between treatment (MET versus control) and the baseline incidence of the primary and secondary outcomes. There were significant quadratic interaction effects between treatment allocation and the baseline incidence of the primary outcome, unplanned ICU admissions and unexpected deaths. There was no significant interaction effect for unexpected cardiac arrests.
Baseline incidence, study-period incidence and the change in incidence from baseline to the study period for the primary and secondary outcomes showed large variability (Table 2). The relationship between baseline incidence and its change, for both primary and secondary outcomes, is presented in Table 3. During the study period, hospital teaching status and location had no significant impact on this relationship (Table 4). Accordingly, we present the results from the models with baseline incidence only; the predicted curves for these relationships are shown in Figure 1. (Table 3 notes: baseline incidence and change in incidence are presented as events per 1000 admissions; P values in parentheses are for the regression coefficients and the constant; control hospitals showed a quadratic effect only for the primary outcome and unexpected deaths; * P < 0.05; ** P < 0.01; ¶ a sensitivity analysis removing the two MET hospitals with the highest baseline incidence produced a regression coefficient for baseline incidence of -0.416, p = 0.034; † only the MET hospitals showed a linear effect for all four outcomes.)
In MET hospitals, 5 of the 8 Pearson correlation coefficients between the baseline, implementation and study periods across the outcomes were greater than 0.83, with the lowest value at 0.65. In MET hospitals, the greater the baseline incidence of the primary outcome, the greater its reduction during the study period. For every 10 additional baseline events per 1000 admissions, there was an additional reduction of 5.92 events (59.2%). Furthermore, the baseline incidence of the primary outcome accounted for 71% of the variance of this change; in control hospitals, it accounted for 53%. Sensitivity analysis showed that even after removing the two hospitals with the highest baseline incidence for the primary outcome, the findings were not qualitatively different.
For both MET and control hospitals, the greater the baseline incidence of unexpected cardiac arrests, the greater its reduction during the study period. For every 10 additional baseline cardiac arrests per 1000 admissions, there was an additional reduction of 9.45 and 7.36 arrests for control and MET hospitals, respectively. Furthermore, the baseline incidence of unexpected cardiac arrests accounted for 78.2% and 85.1% of the variance of this change, respectively. The interaction-effect test showed no statistically significant difference between these slopes. The baseline incidence had no impact on the change in unplanned ICU admissions in control hospitals. In contrast, in MET hospitals, there was a significant linear relationship between the baseline incidence and its change during the study period: for every 10 additional baseline events per 1000 admissions, there was an additional reduction of 5.56 events (55.6%). Furthermore, the baseline incidence of unplanned ICU admissions accounted for 64.3% of the variance of this change in MET hospitals, compared with only 11.2% in control hospitals.
The baseline incidence showed a significant quadratic relationship with the change in unexpected deaths in control hospitals and a linear relationship in MET hospitals. In MET hospitals, for every 10 additional baseline unexpected deaths per 1000 admissions, there was an additional reduction of 6.76 events (67.6%). The baseline incidence accounted for 81.9% of the variance of this change. In comparison, in control hospitals, it accounted for 92.3% of this variance.
The relationship between baseline incidence and its change between the baseline and implementation periods for each outcome is shown in Table 5. This shows that, in MET hospitals, for all outcomes, the greater the baseline incidence, the greater the reduction. In contrast, control hospitals showed a quadratic trend for the primary outcome and unexpected deaths. These relationships are shown in Figure 2.

Conventional subgroup analyses results
There was no significant interaction effect (p = 0.081) for a dichotomized modifier using the baseline median value as the cut-off. Logically, therefore, there was no need to conduct further subgroup analysis. Nevertheless, the following subgroup analysis is presented for demonstrative purposes. Table 6 shows the results of the weighted t-test for the subgroups created by splitting at the median baseline values. None of the outcomes showed a statistically significant difference between MET and control hospitals in either subgroup, except for a borderline significant effect for unexpected cardiac arrests in the subgroup with an incidence below the median baseline incidence.

Figure 1. Relationship between baseline and change in incidence (y axis) for all outcomes during the study period. The 95%CI band of the predicted curve and the original data points are presented; the size of each scatter point is proportional to the volume of admissions during the study period; baseline incidence and change in incidence are presented as events per 1000 admissions; only for the primary outcome and unexpected deaths in control hospitals do the figures suggest a quadratic relationship.

Figure 2. Relationship between baseline and change in incidence (y axis) for all outcomes during the implementation period. The 95%CI band of the predicted curve and the original data points are presented; the size of each scatter point is proportional to the volume of admissions during the implementation period; baseline incidence and change in incidence are presented as events per 1000 admissions; only for the primary outcome and unexpected deaths in control hospitals do the figures suggest a quadratic relationship.

Discussion
We applied a statistical approach that incorporates regression-modelling techniques to the analysis of data from the MERIT study, a cluster randomized controlled trial of the implementation of medical emergency teams (METs). We compared this approach to one based on the conventional method of using a median cut-off value to separate groups and then performing subgroup analysis. We found that an approach incorporating regression modelling detected a significant effect of the baseline incidence of adverse events upon the subsequent effect of introducing METs; conventional subgroup analysis did not. The findings based on regression modelling suggest that the baseline incidence of cardiac arrests, unplanned ICU admissions, and unexpected deaths has a significant association with their subsequent reduction after the introduction of a MET. They also indicate that the magnitude of this reduction was proportional to the baseline incidence. Finally, they demonstrate that the choice of statistical analysis plan can significantly affect the interpretation of the outcome data obtained during MERIT [34].
The MERIT study was designed to compare the incidence of the primary and secondary outcomes during the study period, and was powered according to the expected size of the therapeutic effect and the expected incidence and variance of the primary outcome [35]. Accordingly, all the analyses were designed and conducted based on a main-effect model with adjustment for baseline incidences. This is similar to an ANCOVA approach [36]. Thus, the baseline incidences of both primary and secondary outcomes were adjusted for in the model, but only as covariates. No interaction effects were tested for. Such a strategy did not examine whether the treatment effect was influenced by the baseline incidence of the study outcomes.
It did not examine the possibility that the outcomes would be, to some extent, affected by the baseline incidence of adverse events (baseline hospital performance). Such dependence is plausible for many interventions, in which the relative risk reduction may vary with the baseline incidence of the primary outcome. In the MERIT trial, we could address this issue by empirically testing for an interaction effect between treatment allocation and the baseline incidence of the study outcomes. We tested for both linear and quadratic interaction effects in order to avoid potential misspecification of true interactions. Misspecification refers to situations in which significant interaction effects and/or higher-order effects are omitted from a model; a simplified model with only a main effect and/or linear effect is then a simple but possibly inaccurate reflection of the relationship under investigation. The rationale for introducing a higher-order interaction effect was that there was an insufficient theoretical basis to believe that any interaction effect should be linear. This is analogous to introducing a quadratic effect for continuous variables in a main-effect model.
Given the above considerations, we set out to test whether choice of statistical technique affected our ability to detect the impact of baseline incidence of adverse events on the effect of introducing MET systems. We used regression modelling techniques to explore the relationship between the baseline incidence of adverse events and its change.
The main advantage of using the change in the outcome as the endpoint is its intuitive interpretation: many system and policy initiatives aim to effect change.
The changes brought about by the intervention are often the outcomes we want to understand. Moreover, given that the distribution of baseline incidences was balanced and the before-after correlation was high for the primary endpoint, the efficiency gain from an ANCOVA approach may have been negligible [36]. We assessed MET and control hospitals separately after we found significant interaction effects, and found that, for the primary outcome, unplanned ICU admissions and unexpected deaths, the baseline incidence and the subsequent changes were related differently in MET hospitals than in control hospitals. In MET hospitals, the relationship between the baseline incidence and its change was linear.
In control hospitals, this relationship was quadratic for the primary outcome and for unexpected deaths. It was absent for unplanned ICU admissions. We also found that, for MET hospitals, these observations started to apply during the education and training period well before the MET system had been activated. A similar education effect on outcome after introduction of a MET system has been previously suggested [19].
The MERIT main-effect analysis, a valid unconditional analysis, showed that the benefit of the MET may be small on average. Our conditional analysis showed that the benefit may still be large among hospitals with high baseline incidences of the study outcomes. The observations derived from statistical modelling suggest a 'proportionality effect': the relative change in the incidence of an outcome was similar across all levels of baseline incidence, but the absolute change became greater as the baseline incidence increased. This phenomenon has been described with other interventions [28]. For example, a recent study of the impact of quality-of-care interventions in 3000 U.S. hospitals found that, for 16 of 17 process-of-care measures, hospitals with a low level of performance at baseline had the greatest improvement [28].
Our results are consistent with the notion that the effectiveness of interventions in complex organizations may be dependent on specific local characteristics. They also suggest that baseline characteristics of individual hospitals should be explored when assessing the possible effectiveness of interventions directed at system change. The strong predictive effect and the magnitude of the variance of improvement explained by the baseline incidence of outcome variables also suggest that such outcomes may be useful indicators of quality of care. They are relatively easy to measure, define, and report upon in a timely fashion.
A regression-towards-the-mean effect might, in principle, explain the baseline effect seen in MET hospitals with our statistical modelling approach. Regression toward the mean, sometimes called the regression effect, is a statistical principle concerning the relationship between two linked measurements, x and y: if x is above its mean, the associated y is likely to be closer to its mean than x was. The conventional way to assess data for the presence of this effect in MERIT would be to compare the pattern of results in MET hospitals with those in control hospitals [37]. The difference in the patterns shown in Figures 1 and 2 between MET and control hospitals suggests that the distribution of outcomes is unlikely to be explained by such an effect. In addition, the regression coefficients for all outcomes increased in magnitude during the study period. Furthermore, in MET hospitals, the correlations between the baseline, implementation and study periods were high; such high correlations make it less likely that the results are due to regression towards the mean [38]. Moreover, as we adopted a stratified (by teaching status and geographic location: urban versus metropolitan) and blocked randomisation scheme, the distributions of baseline incidences were balanced between MET and control hospitals [27]. Stratified randomization and baseline balance also make it less likely that a regression-towards-the-mean effect explains our findings.
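The expected size of a pure regression-to-the-mean effect can be illustrated with a small simulation (illustrative values only, assuming equal measurement noise in both periods): when two noisy measurements of a stable quantity are taken, regressing the change on the first measurement yields a negative slope of roughly -noise variance / total variance, which equals the between-period correlation minus one.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000
true = rng.normal(5.0, 2.0, n)           # stable hospital-level incidence
noise_sd = 1.0                           # measurement noise per period
x = true + rng.normal(0.0, noise_sd, n)  # "baseline" period measurement
y = true + rng.normal(0.0, noise_sd, n)  # "study" period, no intervention

# With no real change, the slope of (y - x) on x reflects only regression
# to the mean: about -noise_sd**2 / (2.0**2 + noise_sd**2) = -0.2 here,
# while the between-period correlation is 4 / 5 = 0.8.
slope = np.polyfit(x, y - x, 1)[0]
corr = np.corrcoef(x, y)[0, 1]
```

Under this pure-noise model the slope equals corr − 1, so between-period correlations as high as those reported for MET hospitals (0.65 to above 0.83) would be hard to reconcile with regression-to-the-mean slopes near the observed -0.59, consistent with the argument above.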
First, our statistical approach avoided the possibility of misspecification by incorporating a higher-order interaction effect. Second, by not splitting the sample at a median cut-off, it reduced the loss of statistical power associated with dichotomization. It may be possible to carry out a sufficiently powered conventional subgroup analysis in a very large randomized controlled trial. However, the MERIT study had only 25% power to detect a 30% difference in the primary outcome, and more than 100 hospitals would have been needed to give it sufficient power. These features of the MERIT study make it unlikely that one could detect a statistical difference when comparing the 5 to 6 hospitals within each arm of a subgroup. A further aspect of our modelling approach is that, although we analysed MET and control hospitals separately, our conclusions are based on comparing the same results between the two groups. This approach compared the relationship between baseline performance and study outcome across control and treatment groups and used all of the data to draw conclusions. Subgroup analysis, instead, assessed treatment effects within the lower and higher baseline-performance groups separately. This aspect is a further critical factor making it less likely that the different patterns seen were due to the regression-toward-the-mean phenomenon. The advantage of this regression approach over subgroup analysis may be that it allows for smooth linear or nonlinear relationships with baseline variables; such smooth relationships may be plausible in some settings, such as the MERIT study. However, one potential disadvantage of our approach is that, owing to the significant quadratic interaction effect, it may not be feasible to provide a single treatment-effect estimate (such as an absolute risk difference), as would be the case with a conventional approach.
Furthermore, the existence of non-linear interaction effects means that the incremental value of the treatment effect depends on the value of the baseline incidence. The baseline period was short compared with the implementation and study periods, and this may have influenced our findings. However, we found a strong association between the baseline incidence and its change, and the strength of this association provides evidence that the baseline incidence had sufficient predictive validity.
Our analysis was performed post hoc. Furthermore, the number of hospitals included in the study was relatively small. However, sensitivity analysis showed that even after removing the two hospitals with the highest baseline incidence for the primary outcome, the findings were not qualitatively different. Nonetheless, our results should be considered preliminary and they should be confirmed in future studies with similar design.

Conclusions
In summary, we used regression modelling to test the hypothesis that the baseline incidence of cardiac arrests, unplanned ICU admissions, and unexpected deaths has a significant influence on the change in their incidence that could be achieved through the introduction of a MET. Our findings support this hypothesis. We also found that the magnitude of this reduction was proportional to the baseline incidence of adverse events, such that the higher the baseline incidence, the greater the absolute reduction. These differences were consistent with the initial findings of significant interaction effects. We found that these observations were in contrast to those obtained using conventional subgroup analysis. They suggest that the choice of statistical analysis can significantly affect the interpretation of the findings of the MERIT study. They also raise concerns about the robustness of conventional subgroup analysis in similar settings.