Statistical analysis plan for the Steppedwedge Cluster Randomized trial of Electronic Early Notification of sepsis in hospitalized ward patients (SCREEN)

Background It is unclear whether screening for sepsis using an electronic alert in hospitalized ward patients improves outcomes. The objective of the Stepped-wedge Cluster Randomized Trial of Electronic Early Notification of Sepsis in Hospitalized Ward Patients (SCREEN) trial is to evaluate whether an electronic screening for sepsis compared to no screening among hospitalized ward patients reduces all-cause 90-day in-hospital mortality. Methods and design This study is designed as a stepped-wedge cluster randomized trial in which the unit of randomization or cluster is the hospital ward. An electronic alert for sepsis was developed in the electronic medical record (EMR), with the feature of being active (visible to treating team) or masked (inactive in EMR frontend for the treating team but active in the backend of the EMR). Forty-five clusters in 5 hospitals are randomized into 9 sequences of 5 clusters each to receive the intervention (active alert) over 10 periods, 2 months each, the first being the baseline period. Data are extracted from EMR and are compared between the intervention (active alert) and control group (masked alert). During the study period, some of the hospital wards were allocated to manage patients with COVID-19. The primary outcome of all-cause hospital mortality by day 90 will be compared using a generalized linear mixed model with a binary distribution and a log-link function to estimate the relative risk as a measure of effect. We will include two levels of random effects to account for nested clustering within wards and periods and two levels of fixed effects: hospitals and COVID-19 ward status in addition to the intervention. Results will be expressed as relative risk with a 95% confidence interval. Conclusion The SCREEN trial provides an opportunity for a novel trial design and analysis of routinely collected and entered data to evaluate the effectiveness of an intervention (alert) for a common medical problem (sepsis in ward patients). In this statistical analysis plan, we outline details of the planned analyses in advance of trial completion. Prior specification of the statistical methods and outcome analysis will facilitate unbiased analyses of these important clinical data. Trial registration ClinicalTrials.gov NCT04078594. Registered on September 6, 2019 Supplementary Information The online version contains supplementary material available at 10.1186/s13063-021-05788-3.

Trial registration: ClinicalTrials.gov NCT04078594. Registered on September 6, 2019 Keywords: Sepsis, Alert, Screening, qSOFA, Mortality, Electronic medical records Background Sepsis is a major cause of morbidity and mortality among hospitalized patients. Sepsis outcome is greatly dependent on the time-sensitive administration of appropriate antimicrobials, fluid resuscitation, and source control [1][2][3]. Screening for sepsis using an electronic alert in hospitalized patients may improve outcomes by early sepsis recognition and timely implementation of appropriate care processes. However, the evidence for such an intervention is modest [4], and a randomized trial is needed to measure its true effect.
The objective of the Stepped-wedge Cluster Randomized Trial of Electronic Early Notification of Sepsis in Hospitalized Ward Patients (SCREEN) is to evaluate whether electronic screening for sepsis compared to no screening among hospitalized ward patients reduces allcause 90-day in-hospital mortality [5]. The electronic screening for sepsis is based on the quick Sequential Organ Failure Assessment qSOFA [6].
In this manuscript, we describe the statistical analysis plan (SAP) of the SCREEN trial. This SAP complies with the International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use and both the "Statistical principles for clinical trials E9" and "Structure and content of clinical study reports E3" [7,8]. The final study report will follow the Consolidated Standards of Reporting Trials (CONSORT) 2010 guidelines for reporting randomized controlled trials and the CONSORT extension for stepped-wedge cluster randomized trials [9][10][11]. This SAP identifies the procedures to be applied to the primary and secondary analyses for the trial cohort once trial data are complete. The SAP was finalized during trial implementation, and all analyses were prospectively defined.

Study design
The study is conducted in the 5 Ministry of National Guard Health Affairs (MNGHA) hospitals which share the same electronic medical record (EMR) system (BESTCare, South Korea). This study is designed as a stepped-wedge cluster randomized trial, which allows to sequentially deliver the study intervention to all trial clusters over a number of periods. We present a glossary of terms in line with the CONSORT extension for stepped-wedge cluster randomized trials [11] in Table 1. The cluster refers to the unit of randomization, which is the hospital ward, and we will refer to it in the rest of the document as "ward." A list of ward-level and patient-level eligibility criteria is outlined in the study protocol [5]. Wards are randomized into 9 sequences of 5 wards each to receive the intervention. After a baseline period of 2 months, the intervention is implemented in a new sequence of five new randomly selected wards, until it is eventually implemented in all 9 sequences (45 wards) (Fig. 1). A computer-generated non-stratified randomly allocated concealed list determines the order in which the wards receive the intervention. The randomization list was maintained with a research coordinator who was not involved in this trial, and the ward allocation remained concealed from the research and clinical teams throughout the study and was revealed for a given sequence only 1 month before the implementation of the intervention to allow training.

Intervention and control groups
The intervention included the implementation of an electronic alert system and the associated training, feedback, and audit. The alert was developed in the electronic medical record (EMR), with the feature of being Table 1 Glossary of terms used in the current report, based on the CONSORT extension for cluster randomized trials [11] Term Definitions [11] Applicability to SCREEN trial

Cluster
The unit of randomization Hospital ward

Control condition
The comparator treatment Masked alert Duration of period Time (e.g., months) between each step 10 periods, each is 2 months

Intervention condition
The treatment under evaluation Active alert Sequence A sequence of codes defining the order of implementation of the treatment conditions for each cluster. More than one cluster can be allocated to each sequence 9 sequences, with 5 clusters in each sequence Step A planned point at which a cluster or a group of clusters crosses from control to intervention The period between two steps is 2 months active (visible to the treating team) or masked (inactive in EMR frontend for the treating team but active in the backend of the EMR). Once a ward is randomized to the intervention group, the alert system was activated. The intervention group constitutes of patients admitted to the wards with the active alert, and the control group constitutes of patients admitted to the wards with masked alert. In the intervention group, once a patient meets the qSOFA criteria, an alert appears in EMR as a popmessage and appears also on a hand-held device carried by the charge nurse of the corresponding ward. The alert prompts the nurse to notify the physician and prompts the physician to assess the patient for possible sepsis. At the beginning of the study, we launched a hospital-wide sepsis awareness campaign in all 5 participating hospitals focusing on the importance of timely interventions for sepsis. We also provided in-service training sessions to the related medical and nursing departments at the beginning of the study and before implementation in each ward. We conducted regular webinars with the leaders of active wards. We also created an intranet page with educational resources (videos, presentations, documents, posters, and related links) that explained the project and provided clinical guidance and resources. We developed a dashboard to display data for each active ward on the number of alerts and the percentage of acknowledged alerts and time to acknowledgment by nurses and physicians. We have set 15 min as an overstretched target for nursing acknowledgment and 30 min for physician acknowledgment, and we have tracked these indicators on the electronic dashboards. The data on the dashboards have been made available in real-time for nursing and medical managers of each ward. In addition, the same data reports are generated every 2 weeks and shared with the teams in the active wards. The hospital administration and quality management department are engaged in the process of feedback and of improving the response time to alerts. The bedside management and assessment are at the discretion of the treating team, and the project has adopted the 2016 Surviving Sepsis Campaign guidelines and the hour-1 bundle, but without specific monitoring of bundle compliance [2,12].

Study population
The study has been conducted in the 5 Ministry of National Guard Health Affairs (MNGHA) hospitals: King Abdulaziz Medical City, Riyadh; King Abdulaziz Medical City, Jeddah; and Prince Mohammed Bin Abdul Aziz Hospital, Al Madinah; King Abdulaziz Hospital, Al Ahsa; and Imam Abdulrahman Al Faisal Hospital, Dammam. A CONSORT diagram will be generated according to the CONSORT extension for stepped-wedge cluster randomized trials (CRTs) [11]. For each sequence at each period, the number of clusters receiving the intervention, the average cluster size, and its variance, and the number of clusters not receiving the intervention will be presented ( Fig. 1).

Sample size
The sample size for this stepped-wedge cluster randomized design was calculated for 45 clusters with 10 periods (including one baseline period) using Power Analysis and Sample Size (PASS) software (PASS 15 Power Analysis and Sample Size Software (2017). NCSS, LLC. Kaysville, UT, USA, ncss.com/software/pass). After a baseline period, 5 new clusters switch from the control group to the intervention group at the beginning of each subsequent period. Using historical data obtained from the development domain of the EMR for ward patients admitted from 01 July 2018 to 30 June 2019, we calculated a baseline in-hospital mortality rate by day 90 of 3.13%. Based on the same dataset, 18.3% of eligible ward patients had an alert based on qSOFA criteria [6] with in-hospital mortality of 8.16% compared to 2% in the non-alert patients. For sample size calculations, we made the following assumptions: (A) the impact of the intervention on mortality occurs only in patients who have the alert; (B) only half of the patients with the alert have sepsis; (C) 90% of deaths among the patients with the alert occur among septic patients; (D) early intervention resulting from the alert will reduce the in-hospital mortality by 50%, i.e., from 8.16 to 4.08% in patients with sepsis, and would lead to an overall change in inhospital mortality for the whole cohort from 3.13 to 2.46% yielding a relative risk of 0.79 and a risk difference of 0.67% which is the target effect size; (E) 80% power using two-sided Wald Z-test and significance level of 5%; and (F) an intra-cluster correlation (ICC, a measure of the relatedness of cluster) of 0.22 as estimated from the same retrospective electronic database. As the primary analysis would be adjusted for random effect to account for the correlation between patients within the same cluster, we used the estimation variance (σ 2 ), which is calculated from responses (P1, P2), as the withincluster variance (σw 2 ) as suggested by Hussey and Hughes and Hemming et al. (2015) [13,14]. As such, a reduction of in-hospital mortality by 90 days from 3.13 to 2.46% (relative risk of 0.79 and a risk difference of 0.67%) requires a total sample size of 65,250 subjects (average of 1450 subjects per cluster with an average of 145 subjects per cluster per period). With all five hospitals combined, this is expected to require 20 months (2 months per period). There is no planned interim analysis.

Study cohorts
Intention-to-treat cohort We will report patient flow according to the CONSORT flowchart for stepped-wedge cluster randomized trials by the allocated sequence and period [11] (Fig. 1). The intention-to-treat (ITT) cohort includes all eligible patients admitted to the eligible wards. The ITT analysis also implies that patients in the ITT cohort in the wards belonging to a particular period will be analyzed per their planned randomization regardless of what happens during the trial. For example, if a ward was planned to have the intervention during a given period and for technical reasons the alert system was not operational, patients admitted in that ward during that period will be analyzed as receiving an active alert. Although it is not anticipated that there will be wards that crossover their allocated group (i.e., change from alert to non-alert, or vice versa), such instances will be documented. Patients who are transferred from one ward to another will be counted as part of the first ward. The primary analysis will be based on this population. The study was launched in October 2019. During the study period in 2020-2021, and in response to the surge in the number of hospitalized patients with coronavirus disease-19 (COVID-19), some of the wards had to be converted to intensive care units (ICUs), making them ineligible for the study intervention [15]. These wards will be excluded from the ITT cohort while used as ICUs. During the peak of COVID-19 cases, total admissions to the wards declined substantially to less than 50%; therefore, 2 consecutive periods (starting June 2020) were extended from 2 to 3 months each to account for the decline in cluster size. In addition, some wards were designated for admission of suspected or confirmed COVID-19 cases. Given the higher mortality associated with such designation [16], and because such designation would likely be over-represented in the intervention group, data on a ward designated as a COVID-19 ward will be documented as a ward-level variable, which will be used as a fixed effect term in the primary model (as discussed below). Additionally, patient-level data on COVID-19 diagnosis will be also obtained.

Alert cohort
This cohort represents the subset of ITT patients who had the alert whether in the intervention wards or the control wards.
Statistical tests and their confidence intervals (CIs) will be calculated with two-sided. The statistical significance will be set at the 5% level. All analyses will be performed using SAS version 9.4. Categorical variables will be summarized as counts and frequencies, and continuous variables as median and interquartile ranges or means and standard deviations, if deemed normally distributed.

Reporting baseline characteristics, physiological parameters, and treatments
Baseline characteristics will be presented for the ITT and alert cohorts (Table S1) including age, sex, source of admission, admitting ward, comorbidities (extracted based on ICD-10-AM), Charlson comorbidity index, source of infection (pneumonia, urinary tract infection, skin and soft tissue infection, intra-abdominal infection, other infections, and no clear source, extracted based on ICD-10-AM), and dialysis. We will also report vital signs (systolic blood pressure, diastolic blood pressure, heart rate, temperature, and respiratory rate) as well as laboratory parameters (lactate, white blood cells, bilirubin, creatinine) and whether cultures of blood, respiratory, urine, or other body fluids (pleural, ascitic, cerebrospinal, joint) were obtained and whether treatments (intravenous fluids and antibiotics) were received at baseline. Definitions of these variables were outlined in the study protocol [5].

Reporting alert information
We will report the number of patients with at least one alert and the number of alerts per patient, the time to first alert, and the qSOFA criteria which led to the alert trigger (Table S2). We will report vital signs in the 12-h pre-alert (systolic blood pressure, diastolic blood pressure, heart rate, temperature, and respiratory rate) as well as laboratory parameters in the 12-h pre-alert (lactate, white blood cells, bilirubin, creatinine) and whether a culture of blood, respiratory, urine, or other body fluids (pleural, ascitic, cerebrospinal, joint) was obtained and whether treatments (intravenous fluids and antibiotics) were received in the 12-h pre-alert.

Reporting process measures and post-alert physiologic parameters
We will compare the following process measures between the two groups: (A) percentage of patients with lactate reported within 12 h of alert if not reported in the 12 h before alert and the highest lactate value reported in the 12 h after the alert; (B) the percentage of patients with blood culture ordered in 12 h if not performed in the 12 h before the alert; (C) percentage of patients with respiratory, urine, and body fluid cultures ordered in 12 h if not performed in the 12 h before the alert; (D) intravenous fluid administered in 12 h after alert (yes, no); (E) percentage of patients not on antibiotics in the 12 h before alert with a new antibiotic administered within 3 and 12 h of the alert; and (F) postalert systolic and diastolic blood pressure (the lowest value in the 12 h after the alert) and heart and respiratory rate (highest value in the 12 h after the alert) (Table  S2).

Study outcomes
The primary outcome is defined as all-cause in-hospital mortality within 90 days. Other outcomes include hospital length of stay (LOS) (ITT cohort and alert cohort), transfer to ICU within 90 days (ITT cohort and alert cohort) and 14 days of alert (alert cohort), ICU-free days in the first 90 days (ITT cohort and alert cohort), critical care response team (CCRT) activation within 90 days (ITT cohort and alert cohort) and 14 days of alert (alert cohort), cardiac arrest within 90 days (ITT cohort and alert cohort) and 14 days of alert (alert cohort), the need for mechanical ventilation, vasopressor therapy, and incident renal replacement therapy within 90 days (all in ITT cohort and alert cohort) and 14 days of alert (alert cohort). Balancing/safety outcome measures include antibiotic-free days, acquisition of multidrug-resistant organisms within 90 days in both groups (ITT cohort and alert cohort), and acquisition of Clostridium difficile infection within 90 days (ITT cohort and alert cohort) (Table S3 and S4).

Analysis of primary outcome
The primary outcome of all-cause in-hospital mortality by day 90 will be compared between the intervention group and the control group at the individual level with a generalized linear mixed model with a binary distribution using the jack-knife method to estimate standard errors to account for grouping within clusters and by incorporating a log-link function to estimate the relative risk as a measure of effect [13,17]. We will include two levels of random effects to account for nested clustering within wards and periods and two levels of fixed effects: hospitals and COVID-19 ward status in addition to the intervention. The model will be selected as the best model with a unique covariance structure that produces the lowest Bayesian Information Criterion (BIC) value. The covariance structures that will be considered in the model are the first order of autocorrelation covariance structure, unstructured covariance structure, Toeplitz covariance structure, and variance component structure (VC). We will use the Satterthwaite method to adjust for denominator degree of freedom for tests of the fixed effects. The random coefficients will be modeled using Gside random effects, and we will obtain the subjectspecific estimates by defining the appropriate variancecovariance structure. The overdispersion of parameters will also be assessed by calculating the ratio of the Pearson chi-square statistic and its degrees of freedom. If this ratio exceeds 1, it indicates that the variability in data has not been properly modeled and that there is residual overdispersion due to misspecification of the conditional distribution. In case of no event reported in a cluster for a particular period, a Firth correction will be used in the generalized linear mixed model to avoid the issue of quasi-separation. The Newton-Raphson optimization technique with ridging option might be used to help with the convergence of the model. Results will be expressed as relative risk with 95% CI. The primary outcome will be also analyzed similarly in the alert cohort. In case any model fails to converge, we will report odds ratio from mixed-effect logistic regression.

Analysis of secondary outcomes
Categorical outcomes including ICU admission, CCRT activation, cardiac arrest, the need for mechanical ventilation, vasopressor therapy, incident renal replacement therapy, acquisition of multidrug-resistant organisms, and Clostridium difficile infection will be compared between the intervention and the control group, in a similar model to the one used in the analysis primary outcome. We will include two levels of random effects to account for nested clustering within wards and periods and two levels of fixed effects: hospitals and COVID-19 ward status in addition to the intervention. Results will be expressed as relative risk with 95% CI. In case any model fails to converge, we will report odds ratio from the mixed-effect logistic regression.
Continuous outcomes including hospital LOS, ICUfree days, and antibiotic-free days will be compared using a mixed-effect Poisson model (Table S3). We will include two levels of random effects to account for nested clustering within wards and periods and two levels of fixed effects: hospitals and COVID-19 ward status in addition to the intervention. The model will be selected as the best model with a unique covariance structure that produces the lowest Bayesian Information Criterion (BIC) value. The covariance structures that will be considered in the model are the first order of autocorrelation covariance structure, unstructured covariance structure, Toeplitz covariance structure, and variance component structure (VC). The random coefficients will be modeled using G-side random effects, and we will obtain the subject-specific estimates by defining the appropriate variance-covariance structure. The results will be expressed as beta estimates with 95% CI.

Sensitivity and subgroup analyses
To address the concern about contamination, we will conduct a sensitivity analysis excluding all patients in the control group in the 90 days before crossing over to the intervention group. Because there are fewer than 50 clusters, we will conduct a sensitivity analysis using a small sample correction with the Kenward-Roger method [13,18,19]. We will conduct a sensitivity analysis adjusting for the following covariates: type of wards (medical, surgical, oncology, and mixed), age, baseline systolic blood pressure, baseline respiratory rate, Glasgow Coma Scale, and Charlson comorbidity index. For the latter analysis, we will use imputation for missing variables as outlined below. In addition, we will conduct a complete case sensitivity analysis. We will conduct a sensitivity analysis excluding the periods in which wards were assigned as COVID-19 wards.
We will analyze the primary outcome of all-cause inhospital mortality by day 90 across predefined subgroups using the same model of the primary analysis. The predefined subgroups include age ≤65 years and >65 years; patients with documented infection source (including ICD-10-AM for pneumonia, urinary tract infection, skin and soft tissue infection, intra-abdominal infection, or other infections); patients with no documented infection or infection source; patients admitted to medical, surgical, oncology, and mixed wards, alert within 48 h of admission and after 48 h of admission; and patients admitted to COVID-19 and non-COVID-19 wards (Table S6). Results of the test of interaction will be reported.

Handling dropouts and missing data
We do not expect to have missing observations in the variables required for the primary analysis. Given the nature of the study, missing observations are expected, for example, not all patients will have all laboratory tests and therapies within the narrow time windows defined in the protocol. Variables used in the descriptive analysis will not be imputed. In some of the secondary and sensitivity analyses, imputation will be used. Because the Glasgow Coma Scale score is not documented for patients with normal neurologic status, missing observations will be assigned a normal value for purposes of the model adjustment. Other variables used in the model adjustment model will be assessed and characterized in terms of their pattern (i.e., Missing Completely at Random, Missing at Random, Missing Not at Random). For Missing Completely at Random data, all analyses will be based on a list-wise deletion approach where observations with complete values will be only considered for analysis. For variables with values Missing at Random, multiple imputation techniques will be utilized to impute the missing values as suggested by Rubin [20]. And for variables with values Missing Not at Random, a pattern-mixture model technique will be used to impute the missing values [21].

Graphical presentation
The subgroup analyses will be displayed as a forest plot.

Adjustment for multiplicity
To adjust for multiple testing for secondary outcomes and subgroup analyses, we will use the false discovery rate (FDR) as described by Benjamini and Hochberg [22]. In this procedure, all hypothesis tests will be sorted in descending order based on their calculated p-value. All hypothesis tests below an index K will be rejected where K is calculated as follows: where i = 1, 2, …., m; m is the total number of tested hypotheses; and q = 0.05. The multiplicity testing adjustment will also be done on CIs by constructing 1 − K × q/m CI for each selected parameter [23].

Protocol deviations
We will report protocol deviations, if they occur, including failure to implement the intervention in a given ward, or wrong implementation of the intervention in a ward assigned to the control group. These deviations will be documented on the CONSORT flow diagram. The data from such wards will be analyzed according to the ITT principle.

Study governance
The study management and development committee is responsible for the overall management of the study, data management, and maintenance of the trial master file and statistical master file following the Good Clinical Practice principles.

Discussion
The SCREEN trial provides an opportunity for a novel trial design and analysis of routinely collected and entered data to evaluate the effectiveness of an intervention for a common medical problem (sepsis in ward patients).
Stepped-wedge cluster randomized trials involve randomization of clusters to different sequences and are increasingly used in clinical medicine [14,[24][25][26]. This design is suitable for evaluating interventions delivered at the level of the cluster. In the SCREEN trial, it allows the assessment of the effect of the alert system by comparing the outcomes of patients in the intervention and control cohorts as well as over time [11]. In addition, by having the masked alert, an additional comparison between patients with alerts in the intervention and control groups will be performed.
Strengths of our SAP include the analysis according to the ITT principle, the large sample size, and the multicenter nature. Our analysis plan highlights some potential trial limitations, in relation to the low event rate, which we have addressed by the large sample size. We will address the multiplicity of secondary analyses by reporting the false discovery rate. Our alert which is based on qSOFA is possibly applicable in other healthcare settings, although there may be variations across different EMRs. As per the design of the stepped-wedge trial, the proportion of clusters in the intervention group increases gradually with time, and as a result, the intervention group will, on average, have more observations at later dates of the study [14]. Because external factors may influence outcome over time, the time is a potential confounder and was accounted in the sample size calculation and will be adjusted for in the analysis [14]. The COVID-19 pandemic is an example of an external factor that is associated with increased in-hospital mortality of hospitalized patients [27]. Because certain wards were assigned as ICUs during the pandemic, we will exclude these wards for that duration of ICU assignment from the analysis. Other wards were assigned as COVID-19 wards. To address this confounder, we will account for COVID-19 ward status as a fixed effect term for each period. In addition, we will carry out a sensitivity analysis excluding the periods in which wards were assigned as COVID-19 wards. We will also conduct a subgroup analysis by whether patients were admitted to COVID-19 or non-COVID-19 wards. We used 90-day in-hospital mortality rather than a shorter mortality measure as the primary outcome to capture the effect of the intervention on a longer outcome. However, this would likely cause some degree of within-cluster contamination, as some patients in the control wards will become exposed to the alert after crossing over to the intervention group. The impact of the contamination is likely to be modest because the average stay in the hospital is likely to be <10 days. Nevertheless, we will conduct a sensitivity analysis excluding all patients in the control group in the 90 days before crossing over to the intervention group. We did not stratify randomization according to hospital given that the 5 hospitals follow a similar healthcare delivery model; however, we used hospital as a fixed effect term in the analytical model.
The study has started in October 2019 and is anticipated to have complete implementation and follow-up data by the end of October 2021.

Conclusion
In this SAP, we outline details of the planned analyses in advance of SCREEN trial completion. Prior specification of the statistical methods and outcome analysis will facilitate unbiased analyses of these important clinical data.