Sample size calculation
In the protocol, we specified that we would include 34 wards with an average of 16 employees per ward (total of 544 employees) [2]. However, we actually included 19 wards, with an average of 25 participants per ward (total 504 participants). Consequently, assuming that all participants provided baseline and 12-month data, the power would fall from 81% to 66%. If, in addition, HECSI data for both baseline and 12 months were available from an average of only 16 participants per ward, the power would decrease further to 56%. These power calculations were performed in PASS 15 (NCSS, LLC, Kaysville, UT, USA).
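The loss of power from including fewer, larger wards can be understood through the standard design effect for cluster randomised trials. The Python sketch below is illustrative only and does not reproduce the PASS 15 calculation: it assumes equal ward sizes, and the intracluster correlation coefficient (`ASSUMED_ICC`) is a hypothetical placeholder, not a value taken from the protocol.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Standard design effect for equal cluster sizes: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_clusters: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations that the clustered sample is worth."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


# Hypothetical ICC, chosen for illustration only (not from the protocol).
ASSUMED_ICC = 0.05

planned = effective_sample_size(34, 16, ASSUMED_ICC)   # design in the protocol
achieved = effective_sample_size(19, 25, ASSUMED_ICC)  # wards actually included
print(f"effective sample size, planned: {planned:.0f}, achieved: {achieved:.0f}")
```

For any positive ICC, larger wards carry a larger design effect, so a similar total number of participants spread over fewer wards yields a smaller effective sample size.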
Overall principles
The data analysis will start once 12-month follow-up data are available for all participants, or it is clear that any participants without 12-month data have dropped out of the study, and once the study database has been cleaned and locked for this time point. All participants will be analysed in the trial arm to which their ward was allocated in the ward-level randomisation. The analyses will first be performed blinded to treatment allocation so that the data and proposed analyses can be checked. Treatment allocation will be unmasked only when all data cleaning and all analyses to be presented have been finalised.
We will present the characteristics of wards and participants using simple descriptive statistics. We will use the mean and standard deviation to describe normally distributed continuous variables and the median and the lower and upper limits of the interquartile range to describe non-normally distributed continuous variables. We will assess the normality of continuous variables by visually inspecting histograms. We will use counts and percentages to present categorical variables. Two-sided P values of less than 0.05 will be considered statistically significant and statistical uncertainty will be expressed using two-sided 95% confidence intervals. No formal statistical testing will be performed to examine differences in baseline characteristics between the trial arms. The analyses will be performed by one of the investigators (MS), supervised by the other investigators (SK, JS) and a statistician (RH). All statistical programming and analysis will be performed using IBM SPSS Statistics version 24 (IBM Corp., Armonk, NY, USA).
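The descriptive summaries described above can be sketched as follows. This is an illustration in Python rather than the SPSS syntax that will actually be used. The function names and example values are hypothetical, and the quartile method (`method="inclusive"`) is an assumption that may not match the SPSS default.

```python
from collections import Counter
from statistics import mean, quantiles, stdev


def describe_normal(values):
    """Mean and sample standard deviation for a roughly normal variable."""
    return {"mean": mean(values), "sd": stdev(values)}


def describe_skewed(values):
    """Median with the lower and upper limits of the interquartile range."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"median": q2, "iqr_lower": q1, "iqr_upper": q3}


def describe_categorical(values):
    """Count and percentage for each category."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, 100.0 * n / total) for category, n in counts.items()}


# Made-up example values, for illustration only.
working_years = [2, 5, 7, 10, 12]
hecsi_scores = [0, 1, 2, 4, 30]   # skewed, so summarised by median and IQR
sex = ["F", "F", "F", "M"]

print(describe_normal(working_years))
print(describe_skewed(hecsi_scores))
print(describe_categorical(sex))
```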
Analysis populations and units
A true intention-to-treat population would include all participants randomised. However, due to substantial loss to follow-up in this study, we will perform the main analyses on a modified intention-to-treat population. This population will consist of all participants with a HECSI score at both baseline and 12 months. The per-protocol population will consist of all participants with a HECSI score at both baseline and 12 months who worked in the same ward for the whole duration of the study.
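The two population definitions above amount to simple inclusion rules, sketched below in Python. The `Participant` record and its field names are hypothetical and do not reflect the structure of the study database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # Field names are hypothetical, not taken from the study database.
    hecsi_baseline: Optional[float]   # None if the score is missing
    hecsi_12m: Optional[float]        # None if the score is missing
    same_ward_throughout: bool


def in_modified_itt(p: Participant) -> bool:
    """HECSI score available at both baseline and 12 months."""
    return p.hecsi_baseline is not None and p.hecsi_12m is not None


def in_per_protocol(p: Participant) -> bool:
    """Modified intention-to-treat, plus same ward for the whole study."""
    return in_modified_itt(p) and p.same_ward_throughout


# Made-up participants, for illustration only.
cohort = [
    Participant(4.0, 2.0, True),    # both populations
    Participant(4.0, 2.0, False),   # modified intention-to-treat only
    Participant(4.0, None, True),   # neither: no 12-month HECSI score
]
modified_itt = [p for p in cohort if in_modified_itt(p)]
per_protocol = [p for p in cohort if in_per_protocol(p)]
print(len(modified_itt), len(per_protocol))
```

By construction, the per-protocol population is a subset of the modified intention-to-treat population.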
Handling of missing data
In our main analyses for the primary outcome, we will use a simple joint model approach to model the missing HECSI scores at 12 months and the observed difference between the HECSI scores at baseline and 12 months. We will perform three types of sensitivity analyses on the way that we have dealt with missing data on the primary outcome. For participants with missing HECSI score data at 12 months, we will: (1) assume the best possible outcome (HECSI score of 0); (2) assume the worst possible outcome (highest observed HECSI score); and (3) perform multiple imputation for the difference between baseline and 12-month HECSI scores. We will use baseline characteristics as independent variables in the multiple imputations.
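Sensitivity analyses (1) and (2) are single-value imputations and can be sketched as below. This is a minimal Python illustration with hypothetical function names. It assumes that the "highest observed HECSI score" refers to the highest score observed at 12 months, which the text above does not specify. Multiple imputation (3) is not shown.

```python
from typing import List, Optional


def impute_best_case(scores_12m: List[Optional[float]]) -> List[float]:
    """Replace missing 12-month scores with 0, the best possible HECSI score."""
    return [0.0 if s is None else s for s in scores_12m]


def impute_worst_case(scores_12m: List[Optional[float]]) -> List[float]:
    """Replace missing 12-month scores with the highest observed score.

    Assumption: 'highest observed' means the maximum among the observed
    12-month scores. Raises ValueError if no score was observed at all.
    """
    worst = max(s for s in scores_12m if s is not None)
    return [worst if s is None else s for s in scores_12m]


# Made-up 12-month scores; None marks a participant lost to follow-up.
scores = [3.0, None, 10.0]
print(impute_best_case(scores))
print(impute_worst_case(scores))
```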
List of analyses
Recruitment and retention and baseline characteristics
We will present the numbers of wards and employees assessed for eligibility, included, randomised to the intervention and control arms and lost to follow-up in a Consolidated Standards of Reporting Trials (CONSORT) flow diagram (see Fig. 1).
We will present the baseline characteristics of all randomised participants in each arm in a table, without performing formal statistical testing. We will present: type of ward, working years, working hours, sex, atopic tendency, self-reported HD last month, NMF level and HECSI score.
Deviations and violations from protocol
No major deviations or violations from the protocol occurred. The main difference from the protocol is the number of wards included (20 wards), while the sample size calculation was based on 34 wards.
Primary and secondary outcomes
We will present crude means and 95% confidence intervals for the changes in HECSI score and levels of NMF between baseline and 12 months for the intervention and control groups. In addition, we will present crude proportions of participants with both baseline and 12-month HECSI scores for both groups. We will obtain P values for the difference between the intervention and control groups using generalised estimating equations with an exchangeable working correlation matrix to account for clustering within wards. We will use a linear model for the changes in HECSI score and levels of NMF and a binary model with a logit link function for the missing data. We will adjust the analysis of the primary outcome for the binary factor ward-level exposure to wet work in the preceding year, which was used to stratify the wards in the randomisation.
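One way to write down the marginal models described above for the primary outcome is given below. The notation is ours and is not taken from the protocol: $\Delta Y_{ij}$ is the change in HECSI score for participant $j$ in ward $i$, $\mathrm{arm}_i$ and $\mathrm{wet}_i$ are ward-level indicators for trial arm and for exposure to wet work in the preceding year, and $M_{ij}$ indicates a missing 12-month HECSI score. The covariates shown in the missing-data model are an assumption, since the text does not specify them.

```latex
\begin{align}
  \mathrm{E}[\Delta Y_{ij}] &= \beta_0 + \beta_1\,\mathrm{arm}_i + \beta_2\,\mathrm{wet}_i, \\
  \mathrm{Corr}(\Delta Y_{ij}, \Delta Y_{ik}) &= \alpha \quad (j \neq k), \\
  \operatorname{logit}\Pr(M_{ij} = 1) &= \gamma_0 + \gamma_1\,\mathrm{arm}_i .
\end{align}
```

Under this formulation, $\beta_1$ is the adjusted difference in mean change between the intervention and control groups, and the single parameter $\alpha$ is the exchangeable working correlation shared by any two participants in the same ward.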
Sensitivity analysis
In addition to the three sensitivity analyses on the handling of missing data, we will perform a sensitivity analysis on the effect of exposure to wet work observed during the study. No subgroup analyses will be performed.