The SW-CRCT design has received substantial attention over the last few years. In this work, we have addressed some of the issues associated with it through a literature review. Principally, we sought to discern the reporting quality of completed SW-CRCTs. To this end, we presented a review of 39 completed studies, assessing their reporting quality through 43 chosen indicators. Previous reviews of SW-CRCTs have assessed reporting quality on at most 15 completed studies and 23 indicators at once. Our analysis is therefore based on a far larger number of studies and indicators than previous reviews.
Our analysis found that much work remains to be done to improve quality. Sound reporting is key to assessing the validity of a study; however, the median performance across all considered criteria was 66.7%. Whilst the type of randomization used was well described (37/39, 92.3%), the method of random allocation (23/39, 59.0%), the allocation concealment mechanism (12/39, 30.8%) and the person who implemented the randomization (15/39, 38.5%) were not. Moreover, 25.6% (10/39) of studies did not describe their designs as randomized in the title or abstract.
Only 15.4% (6/39) of studies detailed any harm associated with their intervention. In many instances this may well be because there was no harm, but it is important to make this clear, particularly as many of the interventions involve drugs for which side-effects could well be expected.
In line with a 2016 review [35], 38.5% (15/39) of studies did not provide any form of justification for their sample size, and only 43.6% (17/39) mentioned the intracluster correlation or coefficient of variation. Moreover, 43.6% (17/39) and 41.0% (16/39) of the studies did not detail the planned type-I and type-II error rates for their design, respectively. Thus, there is again great room for improvement in the reporting of the specification of SW-CRCTs.
Nonetheless, Fig. 4 suggests that the standard of reporting of SW-CRCTs has improved since 2010. This improvement may be because more has now been published about the design, or could perhaps be a result of the lag time for journals adopting the new criteria contained within the CONSORT extension to CRCTs [37, 38].
Whilst the median performance across the criteria listed on the CONSORT extension to CRCTs was 80.8%, it is more alarming that the mean performance was 70.8%. This, however, should perhaps not surprise us, as the problem is not restricted solely to SW-CRCTs. Indeed, several recent reviews have identified that the reporting of CRCTs in several areas has been poor [48–51]. In particular, one 2011 review identified that the quality of reporting of only 5 of 14 considered criteria had improved following the publication of the CONSORT extension to CRCTs [51]. However, this may have been the result of a lag in the uptake of the new criteria. Regardless, further efforts are needed to improve reporting quality for not only SW-CRCTs, but all CRCTs. Nonetheless, there are several features specific to SW-CRCTs not contained within this CONSORT extension, and therefore further guidance is needed on the publication of SW-CRCTs. Fortunately, a CONSORT extension to SW-CRCTs is now under development [52]. In this extension, it appears warranted to encourage greater detail on the timing of each step, to ensure that the reporting of this feature improves in the future. It will also, in light of recent extensions to the classical SW-CRCT, be important to define clearly what constitutes a SW-CRCT, and therefore which trials would need to adhere to such guidance. Finally, 33.3% (13/39) of trials completed thus far did not include a diagram of their design. Such a diagram is strongly encouraged by the CONSORT extension to CRCTs. However, it may be wise for it to be listed as a specific point on an extension to SW-CRCTs, perhaps even with certain required features, as it is clearly a very powerful means of detailing a SW-CRCT design.
In addition, we sought to ascertain the general design features of all SW-CRCTs conducted to date. To this end, we presented a review of features from an additional 84 study protocols, registrations and conference reports, along with those from the aforementioned 39 completed studies.
We observed that 91% of the identified and included records were from 2010 onwards. Thus, it appears that the design is increasing in popularity, perhaps reflecting an increasing acknowledgement of its usefulness, or simply that more is now known about the design.
It is clear that the design has been used in an extremely broad range of research areas, in a wide range of countries. Moreover, the actual characteristics of the design have varied substantially as well. Therefore, it seems that the design has proven useful in a large selection of study scenarios, with its benefits not confined to one specific type of trial. Indeed, our work highlighted once again some of the advantages of the SW-CRCT design that have frequently been cited.
We identified that the most common reason for use was that trial organizers wished to give the intervention to all clusters eventually. Additionally, a large number cited logistical and practical constraints that made the staggered implementation involved in a SW-CRCT favourable. A small number of studies cited that the design was preferred for efficiency or power-related reasons. It is, of course, essential to remember that greater efficiency is not a universal property of the SW-CRCT design. Moreover, some studies cited that the design would allow them to fine-tune the intervention over time, but it is important to note that in SW-CRCTs one should not set out to change the intervention across time periods, as this would imply that the intervention effect would change during the trial. Overall, however, it seems that there are a large number of reasons for which the design has been preferred, contributing to the broad array of research domains in which it has been utilized.
There are also several disadvantages associated with the SW-CRCT design [27, 29, 31]. Specifically, if the chosen SW-CRCT design is one that requires data collection at each point when a new cluster receives the experimental intervention, this could, in some circumstances, see the cost of data collection become considerable. The design is also not guaranteed to require a smaller sample size than a parallel group CRCT, and it may require a substantially longer trial duration. Consequently, as discussed recently, researchers must remain careful when deciding whether to employ the design [29]. This seems particularly true in light of our finding that 52.0% of completed studies did not find a significant effect on any primary outcome measure for which a minimum clinically important difference had been specified. Of course, it could be that these studies were simply underpowered, owing to misspecification of the variance parameters. Extension of the classical SW-CRCT design to incorporate sample size re-estimation could therefore be an important advance. Alternatively, to guard against over-enthusiastic use of the design, interim analyses to stop the trial early for futility may also be a useful design extension for trial organizers to consider. This approach, however, would not always be applicable; for example, if the intervention were part of some wider planned roll-out, or if the possibility that some clusters might not receive the intervention was deemed unacceptable.
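The point that a SW-CRCT is not guaranteed to be more efficient than a parallel group CRCT can be illustrated numerically. The following is a minimal sketch only, not part of our review: it assumes a cross-sectional design with one cluster crossing over per step, equal cluster-period sizes, known variance components, and the closed-form variance of the treatment effect estimator under the linear mixed model of Hussey and Hughes. The comparator is a single-period parallel group CRCT with the same number of clusters and the same total number of participants. All function names and parameter values are hypothetical and chosen purely for illustration.

```python
def stepped_wedge_variance(n_clusters, n_per_period, icc, total_var=1.0):
    """Variance of the treatment effect estimator in a standard SW-CRCT.

    Assumes one cluster crosses over per step (n_clusters + 1 periods),
    a cross-sectional sample of n_per_period participants per cluster-period,
    and the Hussey and Hughes closed-form variance.
    """
    periods = n_clusters + 1
    tau2 = icc * total_var                          # between-cluster variance
    sigma2 = (1.0 - icc) * total_var / n_per_period  # variance of a cluster-period mean
    # x[i][j] = 1 once cluster i has switched to the intervention
    x = [[1 if j > i else 0 for j in range(periods)] for i in range(n_clusters)]
    u = sum(sum(row) for row in x)
    w = sum(sum(x[i][j] for i in range(n_clusters)) ** 2 for j in range(periods))
    v = sum(sum(row) ** 2 for row in x)
    numerator = n_clusters * sigma2 * (sigma2 + periods * tau2)
    denominator = (n_clusters * u - w) * sigma2 + (
        u ** 2 + n_clusters * periods * u - periods * w - n_clusters * v
    ) * tau2
    return numerator / denominator


def parallel_variance(n_clusters, n_per_cluster, icc, total_var=1.0):
    """Variance of the difference in means for a 1:1 parallel group CRCT."""
    tau2 = icc * total_var
    sigma_e2 = (1.0 - icc) * total_var
    return 4.0 * (tau2 + sigma_e2 / n_per_cluster) / n_clusters


if __name__ == "__main__":
    clusters, per_period = 6, 10                 # illustrative values only
    total_per_cluster = per_period * (clusters + 1)
    for icc in (0.01, 0.2):
        sw = stepped_wedge_variance(clusters, per_period, icc)
        par = parallel_variance(clusters, total_per_cluster, icc)
        print(f"ICC={icc}: SW variance={sw:.4f}, parallel variance={par:.4f}")
```

Under these assumed values, the parallel group design gives the smaller variance when the intracluster correlation is 0.01, whereas the stepped-wedge design gives the smaller variance when it is 0.2. Which design is more efficient therefore depends on the variance parameters, which is why their misspecification at the planning stage matters.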
Strengths and limitations
We used a literature search strategy based on a previously completed review to identify relevant articles for inclusion. In particular, we utilized a large number of search terms, across a large number of databases, and set no limits on the publication date. Therefore, we were able to complete an assessment of the general design features and quality of reporting of SW-CRCTs to date on more trials than all previous reviews. However, it is still possible that we missed some studies that described their design using phrases not included in our search. Additionally, we might have missed some studies reported in languages other than English. Finally, several reviews, ours included, have now looked to include SW-CRCTs conducted in non-health-based research settings. However, to maximize the chance of such studies being identified, additional databases should be included, such as the Campbell Collaboration [53].
We chose to exclude studies where there was no (baseline) period present in which no clusters received the experimental intervention; however, following recent publications, many might consider designs of this form to be SW-CRCTs. Moreover, similar statements are true for incomplete block SW-CRCTs. Future researchers might seek to include such studies in their reviews. Furthermore, we chose to assess reporting quality on a particular set of 10 criteria, which we considered to be key. Whilst justification for our choices was discussed, some researchers might have preferred alternative criteria to be included in this list. It is possible that future work could convene a panel of experts on SW-CRCT designs to determine which criteria to view as of paramount importance, enlisting the help, for example, of the COMET Initiative [54] or the EQUATOR Network [55].
In addition, future reviews could seek to expand the classification of their extracted data beyond our, and the previous reviewers’, simple ‘yes or no’ prescription. For some of the considered criteria, it would be particularly beneficial to incorporate whether they were partially satisfied, according to some designated scoring procedure. This would potentially allow the improvement of the reporting of SW-CRCTs to be more accurately measured.
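As a minimal sketch of what such a scoring procedure might look like, the example below contrasts the binary 'yes or no' classification with a partial-credit alternative. The weights (1 for 'yes', 0.5 for 'partial', 0 for 'no'), the criterion names and the ratings are all hypothetical; they are not data from our review, and a designated procedure would need to be agreed in advance.

```python
from statistics import median

# Hypothetical partial-credit weights; an agreed scoring procedure might differ.
PARTIAL_CREDIT = {"yes": 1.0, "partial": 0.5, "no": 0.0}
# Binary prescription: anything short of a full 'yes' counts as 'no'.
BINARY = {"yes": 1.0, "partial": 0.0, "no": 0.0}


def criterion_performance(ratings, weights):
    """Percentage performance on one criterion across all rated studies."""
    return 100.0 * sum(weights[r] for r in ratings) / len(ratings)


def median_performance(ratings_by_criterion, weights):
    """Median of the per-criterion performances."""
    return median(
        criterion_performance(ratings, weights)
        for ratings in ratings_by_criterion.values()
    )


# Invented ratings for four imaginary studies on three criteria.
example_ratings = {
    "diagram of design": ["yes", "partial", "no", "yes"],
    "sample size justification": ["partial", "partial", "yes", "no"],
    "allocation concealment": ["no", "no", "partial", "yes"],
}

if __name__ == "__main__":
    print("Binary median:", median_performance(example_ratings, BINARY))
    print("Partial-credit median:", median_performance(example_ratings, PARTIAL_CREDIT))
```

In this invented example the median performance rises from 25% under the binary prescription to 50% under partial credit, showing how a graded scale could register improvements in reporting that a 'yes or no' classification would miss.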
Finally, only one author conducted the inclusion search and the eventual data extraction. In our initial planning of the review, we did consider the use of duplicate abstract screening and data extraction. However, it was decided that one author would perform all of the screening and data extraction, marking the cases where decisions were unclear for joint discussion with the other authors. This was because, in nearly all cases, it was very clear what decision should be made, assisted by the careful choice of criteria that avoided the need for any subjective opinion. We acknowledge that this deviates from best practice for conducting a review but are confident it has not affected the quality of our work. For record selection, this claim is backed up by the verification that all trials included in previous reviews that met our inclusion criteria were included in our review.