Changing cluster composition in cluster randomised controlled trials: design and analysis considerations
 Neil Corrigan^{1, 2},
 Michael J G Bankart^{1, 3},
 Laura J Gray^{1} and
 Karen L Smith^{1}
DOI: 10.1186/1745-6215-15-184
© Corrigan et al.; licensee BioMed Central Ltd. 2014
Received: 8 November 2013
Accepted: 6 May 2014
Published: 24 May 2014
Abstract
Background
There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes.
Methods
We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis.
Results
Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated.
Conclusions
Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations include avoidance of cluster merges where possible, discontinuation of clusters following heterogeneous merges, allowance for potential loss of clusters and additional variability in cluster size in the original sample size calculation, and use of appropriate ICC estimates that reflect cluster size.
Keywords
Cluster merging, Cluster randomised trials, Loss to follow-up, Primary care, Sample size, Variability in cluster size
Background
Cluster randomised controlled trials (RCTs), in which groups of individuals rather than the individuals themselves are randomised, are conducted for a variety of reasons. The cluster design is often used when an intervention can be administered only to a group, such as a service-wide change or a public health campaign; when there is a risk that an intervention will affect participants in the non-intervention arm; or for reasons of cost or convenience. Such RCTs have a number of methodological challenges in their design, conduct and analysis, discussions of which can be found in a number of texts [1, 2]. One issue that has received little attention is the consequence of changes to the composition of clusters after randomisation, including the merging or fragmentation of clusters. Cluster RCTs are relatively common in general practice settings, where general practitioners (GPs) or general practices, rather than individual patients, are the chosen unit of randomisation. Unfortunately, organisational changes are not uncommon in primary care, with some practices merging and others splitting. The number and size of GP practices in the United Kingdom have changed over time, with a reduction by 28% in the number of single-handed GP practices between 2004 and 2009 and a 19% increase in the total number of GPs. Overall, there was a 9% decrease in the number of GP practices between 1997 and 2007 [3], and organisational changes to meet the challenges of patient care have been actively encouraged [4].
In this article, we focus on the implications of merging clusters for the design and analysis of cluster RCTs. We chose to focus on this effect in cluster RCTs carried out within primary care, because the reduction in the number of GP practices in recent years could result in greater potential for merges to occur in this setting than in other areas where cluster RCTs are frequently used, such as schools, communities, factories and hospitals.
There are few instances of cluster merging reported in the literature. Using a MEDLINE search (with search terms ‘Trial’ AND ‘primary care’ AND ‘cluster’), we identified reports of completed cluster RCTs in primary care published between 2004 and June 2012, with the start date chosen because 2004 was the year of publication of the Consolidated Standards of Reporting Trials (CONSORT) extension for cluster RCTs [5], which requires descriptions of the flow of participants and clusters. We identified 451 potentially useful references in the search.
After assessing the publication texts, we identified 211 reports of cluster RCTs in primary care. From among these, we found only one in which the authors explicitly reported a merge of clusters [6]. Foy et al. conducted two parallel cluster RCTs and reported that a practice merge brought together practices that were in the same arm of one RCT and different arms of another. It is not clear whether or how cluster merging was dealt with in their analysis.
To assess the extent of unreported cluster merges, we contacted the authors of papers published between 2010 and the present. Of the 67 authors contacted, 27 replied (response rate = 40.3%). Only one respondent had experienced a cluster merge, which involved two practices originally randomised to the same trial arm. In the analysis, these two practices were treated as one [7].
Although the number of reported and/or acknowledged instances of cluster merging is low, it is not obvious how RCT conduct and analysis should be handled when clusters do merge. We suggest that there are a number of simple options available: (1) discontinue recruitment to affected clusters, (2) analyse clusters separately as randomised or (3) analyse the clusters as a new merged cluster. The extent to which merging of clusters might create difficulties is likely to depend on the nature of the cluster merges, the design of the cluster RCT, the arm of the RCT to which the clusters were originally randomised and the timing of the merge. For example, if two primary care practices merge on a purely administrative level, with access to healthcare professionals and patient care unaffected, it seems reasonable to continue as if such clusters had not merged and to analyse them as two separate clusters. Other cases may not be so clear-cut, particularly if patient care is reorganised following merging of clusters, resulting in the potential for contamination. In such cases, careful consideration of the design will be needed with regard to the following issues: how recruitment is conducted (identification and enrolment prior to randomisation or recruitment of individuals post-randomisation), cohort or cross-sectional design and the nature of the intervention (for example, at the level of practice/clinician or patient). In most circumstances, it is unlikely that merged clusters will be analysed as one cluster if the clusters were originally randomised to different arms of a RCT, but it might be considered acceptable in a cross-sectional design, in which different patients are included at each measurement time point. The status of participants at the time of the cluster merge (for example, the number who have already completed treatment, the number part way through treatment and the number in follow-up) may also have a bearing on the decision.
In the remainder of this article, we explore statistical issues related to changing cluster composition. Methods and results are described for continuous outcome measures, although similar principles apply to binary outcome measures.
Methods
Impact on study design
The aim of any RCT is to obtain an unbiased estimate of the treatment effect with sufficient precision to enable inferences to be made. In order to accomplish this goal, careful consideration of sufficient sample size is required.
The most common approach to calculating the sample size for a cluster RCT involves increasing the number of participants required for an individually randomised trial by an inflation factor called the design effect. Details of the sample size calculation can be found elsewhere [8], but we describe them briefly for a continuous outcome measure.
where Φ is the cumulative distribution function for the standard normal distribution.
It can be seen that the study power increases monotonically as γ increases. Thus the impact of each parameter on study power can be examined. The parameters that will be changed by merging clusters are m, which after cluster merges will be the average cluster size (rather than a fixed size); c, the number of clusters; and λ, the allocation ratio of clusters.
Upon inspection (equation (5)), a simple monotonic relationship between c and power is apparent, such that (1 − β) → 1 as c → ∞ and (1 − β) → Φ(−ξ_{α/2}) as c → 0.
It is known that, at larger values of m, the benefit gained from further increasing the average cluster size becomes less as the study power plateaus [9]. This has important consequences for study power if clusters merge and the average cluster size increases.
The relationship between study power and the ratio of clusters allocated to each arm of a RCT is nonmonotonic, with optimum power at λ = 1, and decreases in power occurring as the value of λ deviates further from 1. Holding other parameters constant in equation (6), $\gamma(\lambda) \propto \frac{\lambda}{(1+\lambda)^{2}}$ and $\gamma'(\lambda) \propto \frac{1-\lambda}{(1+\lambda)^{3}}$, which is positive for λ ∈ [0, 1), negative for λ ∈ (1, ∞) and equal to 0 when λ = 1, indicating that γ(λ), and therefore the study power, reaches a maximum point at λ = 1.
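The dependence of power on c, m and λ described above can be explored numerically. The following is a minimal sketch, not the authors' code: it assumes the usual normal-approximation form, power = Φ(√γ − ξ_{α/2}), with γ built from the standard design effect 1 + (m(1 − w) − 1)ρ, and it reads w as the proportion lost to follow-up. The function name, argument names and the form of γ are assumptions made for illustration.

```python
from math import erf, sqrt


def norm_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def cluster_rct_power(c, m, rho, delta, sigma=1.0, lam=1.0, w=0.0,
                      z_alpha=1.959964):
    """Approximate power of a two-arm cluster RCT (assumed formula).

    c: total number of clusters; m: cluster size; rho: ICC;
    delta: difference in means; sigma: total standard deviation;
    lam: allocation ratio of clusters; w: assumed proportion lost to follow-up;
    z_alpha: two-sided critical value (1.96 for a 5% significance level).
    """
    m_eff = m * (1.0 - w)
    design_effect = 1.0 + (m_eff - 1.0) * rho
    gamma = (delta ** 2 * m_eff * c * lam) / (
        sigma ** 2 * design_effect * (1.0 + lam) ** 2)
    return norm_cdf(sqrt(gamma) - z_alpha)


# Design used in the simulation study: 80 clusters of 20, ICC 0.05, effect 0.2.
print(round(cluster_rct_power(80, 20, 0.05, 0.2), 3))           # about 0.817
# Unequal allocation (lam = 2) gives lower power than lam = 1.
print(round(cluster_rct_power(80, 20, 0.05, 0.2, lam=2.0), 3))
```

Under these assumptions, λ and 1/λ give the same power, which reflects the symmetry of λ/(1 + λ)² about λ = 1.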
and it is clear that the design effect increases as the variation in cluster sizes increases. As clusters merge, cluster size variability increases and the design effect also increases. This implies that, without an increase in sample size, study power will decrease following cluster merges.
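To make the effect of cluster size variability concrete, the sketch below uses one widely used form of the design effect for unequal cluster sizes, 1 + ((cv² + 1)m̄ − 1)ρ, where m̄ is the mean cluster size and cv its coefficient of variation. This form is an assumption here and may differ in detail from the expression intended in the text. With the population standard deviation, (cv² + 1)m̄ equals Σn²/Σn, which is what the code computes. The helper names are hypothetical.

```python
def sizes_after_merges(c, m, k):
    """Cluster sizes after k pairs of equal-sized clusters (size m) merge."""
    return [2 * m] * k + [m] * (c - 2 * k)


def design_effect_unequal(sizes, rho):
    """Design effect 1 + ((cv^2 + 1) * mean - 1) * rho, assumed form.

    (cv^2 + 1) * mean is computed as sum(n^2) / sum(n).
    """
    weighted_mean_size = sum(n * n for n in sizes) / sum(sizes)
    return 1.0 + (weighted_mean_size - 1.0) * rho


# 80 clusters of 20 with ICC 0.05, then 10 and 40 pairwise merges.
for k in (0, 10, 40):
    sizes = sizes_after_merges(80, 20, k)
    print(k, len(sizes), round(design_effect_unequal(sizes, 0.05), 3))
```

With the ICC held fixed, the design effect in this sketch rises from 1.95 with no merges to 2.2 after 10 merges and 2.95 when every cluster has merged.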
Consider a scenario in which all clusters are of equal size m and treatment groups receive equal allocations of the number of clusters. Let the number of clusters be c ∈ 2ℕ and let $n^{(i)}$ denote the size of cluster i (i = 1, …, c). The standard deviation of cluster size, $\sigma_{c}$, is therefore 0.
Suppose that $k \in \left[0, \frac{c}{2}\right]$ pairs of clusters merge, leaving c − k clusters in total with $n^{(j_1)} = \cdots = n^{(j_k)} = 2m$ for $j_1, \ldots, j_k \in \{1, \ldots, c-k\}$, and $n^{(i)} = m$ for $i \in \{1, \ldots, c-k\} \setminus \{j_1, \ldots, j_k\}$. For notational convenience, let S denote the set $\{1, \ldots, c-k\} \setminus \{j_1, \ldots, j_k\}$.
Recall that study power increases monotonically with c, and, holding all other terms in equation (6) fixed, we obtain $\gamma = \frac{m(1-w)c}{1+\left(m(1-w)-1\right)\rho} \times \mathrm{constant}$. After $k \in \left[0, \frac{c}{2}\right]$ merges, the average cluster size becomes $\tilde{m} = \frac{c}{c-k}m$ and this equation becomes $\tilde{\gamma} = \frac{\tilde{m}(1-w)(c-k)}{1+\left(\tilde{m}(1-w)-1\right)\rho} \times \mathrm{constant}$. Note that the constant terms in the expressions for γ and $\tilde{\gamma}$ are equal, but $\tilde{m}(c-k) = \frac{c}{c-k}m(c-k) = mc$. Therefore, $\tilde{\gamma} = \frac{m(1-w)c}{1+\left(\tilde{m}(1-w)-1\right)\rho} \times \mathrm{constant}$, and, because $\tilde{m} \ge m$, $\tilde{\gamma} \le \gamma$, with equality holding only if k = 0. Hence, power will be reduced following cluster merges if no additional clusters are recruited.
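As a worked illustration of this inequality, take the design values used in the simulation study (c = 80, m = 20, ρ = 0.05) and suppose, purely for illustration, that there is no loss to follow-up (w = 0, reading w as the proportion lost to follow-up, which is an assumption) and that k = 10 pairs of clusters merge:

$$\tilde{m} = \frac{c}{c-k}\,m = \frac{80}{70}\times 20 \approx 22.86, \qquad \frac{\tilde{\gamma}}{\gamma} = \frac{1+(m-1)\rho}{1+(\tilde{m}-1)\rho} = \frac{1.95}{2.093} \approx 0.93 .$$

In this example γ falls by roughly 7% if the ICC is held fixed, before any further penalty for the increased variability in cluster size is applied.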
Because optimum power is achieved when λ = 1, if the number of cluster merges is unequal between the treatment groups, the study power will be adversely affected.
These formulae have been used to explore the combined impact of the changes in design parameters graphically.
Impact on analysis: simulation study
Most of the few reported instances of cluster merges involved clusters within the same treatment arm of a RCT (which we refer to as homogeneous merging). In the one instance in which data analysis was reported, the resulting data were analysed with the merged cluster treated as a single cluster. We explored, by simulation, the appropriateness of this pragmatic strategy and considered approaches to analysis when clusters merge that were randomised to different treatment groups (that is, heterogeneous merging).
Cluster RCT data were simulated using the framework of a multilevel model with a simulated two-arm RCT, comprising a control group and an intervention group. Clusters were set to be of equal size with equal allocation of clusters to treatment arms. The outcome for each individual was generated as the sum of three components, $Y_{ij} = \mu_{ij}^{trt} + u_{0j} + \epsilon_{0ij}$, where $\mu_{ij}^{trt}$ was the mean outcome for the treatment group to which patient i in cluster j was allocated, $u_{0j}$ was sampled from $N\left(0, \sigma_{b}^{2}\right)$ and represented the cluster-level error for all individuals in that cluster, and $\epsilon_{0ij}$ was sampled from $N\left(0, \sigma_{w}^{2}\right)$ and was used as the individual-level error. Without loss of generality, $\sigma_{b}^{2}$ and $\sigma_{w}^{2}$ were chosen so that their sum was equal to 1. A value of 0.05 was used for the ICC, a commonly used value in designing cluster RCTs in primary care. The total number of clusters was set at 80, and 20 individuals were allocated to each cluster, giving 80% power at the 5% significance level to detect an effect size of 0.2. True treatment group means were given the values μ_{0} = 0 and μ_{1} = 0.2.
For each scenario, 1,000 simulations were generated and a random intercept model was fitted to the resultant data sets. For model fitting, we used restricted maximum likelihood to improve estimates of the variance components [15].
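The data-generating step described above can be sketched as follows. This is an illustrative Python version rather than the Stata code used in the study. To stay within the standard library, it replaces the random intercept model fitted by restricted maximum likelihood with a simpler cluster-level summary analysis (a difference in means of cluster means), which is a swapped-in technique and not the analysis reported here. All function and argument names are hypothetical.

```python
import random
from statistics import mean, variance


def simulate_trial(n_clusters=80, m=20, icc=0.05, mu=(0.0, 0.2), seed=None):
    """Simulate Y_ij = mu_trt + u_0j + e_0ij with total variance 1."""
    rng = random.Random(seed)
    sd_b, sd_w = icc ** 0.5, (1.0 - icc) ** 0.5
    rows = []
    for j in range(n_clusters):
        arm = j % 2                      # equal allocation of clusters to arms
        u = rng.gauss(0.0, sd_b)         # cluster-level error
        for _ in range(m):
            rows.append((j, arm, mu[arm] + u + rng.gauss(0.0, sd_w)))
    return rows


def cluster_level_estimate(rows):
    """Difference in arm means of cluster means, with its standard error."""
    by_cluster = {}
    for j, arm, y in rows:
        by_cluster.setdefault((j, arm), []).append(y)
    means = {0: [], 1: []}
    for (_, arm), ys in by_cluster.items():
        means[arm].append(mean(ys))
    effect = mean(means[1]) - mean(means[0])
    se = (variance(means[0]) / len(means[0])
          + variance(means[1]) / len(means[1])) ** 0.5
    return effect, se


rows = simulate_trial(seed=2014)
effect, se = cluster_level_estimate(rows)
print(len(rows), round(effect, 3), round(se, 3))
```

Repeating the last three lines over many seeds and counting how often |effect / se| exceeds the critical value gives an empirical power estimate in the same spirit as the one reported in the Results.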
We conducted further simulations, keeping the planned power static at 80% but increasing the cluster size with a corresponding reduction in the number of clusters. This resulted in combinations of 60 clusters with 40 individuals per cluster and 48 clusters with 100 individuals per cluster.
Homogeneous merges
Homogeneous cluster merges alter cluster size, average cluster size and, potentially, ICC, all of which have an impact on study power.
Scenario 1
For each homogeneous merge, two clusters from the same treatment group, which had not already been involved in a merge, were selected at random to become a merged cluster. Individual patient outcomes were left unchanged because it is assumed that treatment is not affected by the merge of clusters. The scenario was simulated for all pairs (k _{0}, k _{1}) ∈ M × M, where M = {0, 1, 2, 5, 10, 20} and k _{0} and k _{1} are the number of merges in the control and intervention groups, respectively.
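The merging rule for scenario 1 can be sketched as below. This is a hypothetical Python illustration, not the study's Stata code: within each arm, 2k distinct clusters are sampled without replacement, so that no cluster is involved in more than one merge, and they are paired off. Outcomes are untouched and only the cluster identifier changes. The function names are assumptions.

```python
import random
from collections import Counter
from statistics import variance


def merge_homogeneous(arms, k0, k1, seed=None):
    """Return a map from original cluster id to post-merge cluster id.

    arms[j] is the arm (0 = control, 1 = intervention) of original cluster j;
    k0 and k1 are the numbers of merges in the control and intervention arms.
    """
    rng = random.Random(seed)
    new_id = {j: j for j in range(len(arms))}
    for arm, k in ((0, k0), (1, k1)):
        candidates = [j for j, a in enumerate(arms) if a == arm]
        chosen = rng.sample(candidates, 2 * k)   # distinct, so no re-merging
        for a, b in zip(chosen[0::2], chosen[1::2]):
            new_id[b] = a                        # cluster b joins cluster a
    return new_id


def cluster_sizes(new_id, m):
    """Sizes of the post-merge clusters when every original cluster has size m."""
    return [count * m for count in Counter(new_id.values()).values()]


arms = [j % 2 for j in range(80)]
for k in (1, 2, 5, 10):
    sizes = cluster_sizes(merge_homogeneous(arms, k, k, seed=1), 20)
    print(k, len(sizes), round(variance(sizes), 1))
```

For 80 clusters of 20, the sample variance of cluster size after 1, 2, 5 and 10 merges per arm comes out at about 10.1, 20.2, 49.7 and 90.4.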
Scenario 2
A further scenario was simulated to reflect more closely what might happen in practice. In this scenario, half of the individuals were assumed to have completed treatment prior to the cluster merge, retaining the old cluster-level error term. The remainder were allocated to a new merged cluster, with a new cluster-level error term applied in generating the outcome.
Heterogeneous merges
Two different scenarios were used to simulate heterogeneous cluster merges.
Scenario 3
The simulated data sets were adjusted in a similar way to that used for homogeneous merges, with each merge consisting of one cluster from the control arm and one from the intervention arm randomly selected to form a merged cluster. In this scenario, which is unrealistic in practice and is presented here as an extreme illustration, we assumed that patient outcomes were unchanged following a merge, representing a RCT in which all patients completed the intervention prior to a merge.
Three strategies for analysis were explored: (1) merged clusters were allocated to the control arm of the study, (2) merged clusters were allocated to the intervention arm of the study or (3) merged clusters were eliminated from the analysis. It was expected that the first two strategies would lead to bias and that the third, whilst unbiased, would lead to a loss of power.
Scenario 4
Rather than assume that all patients completed the intervention prior to the merge, in this scenario, we assumed that only 50% of the patients did so. The treatment group mean component used to simulate outcomes for individuals not completing treatment prior to the merge was adjusted according to treatment group allocation post-merge. As with scenario 3, analysis was based on three strategies: (1) merged clusters were allocated to the control group, (2) merged clusters were allocated to the intervention group or (3) merged clusters were dropped from the analysis.
Additionally, this scenario was simulated both with and without those who did not complete treatment prior to the merge, with individuals analysed according to their original cluster assignment when non-completers were omitted. This analysis reflected a pragmatic approach of discontinuing clusters following a merge.
As with homogeneous merges, further simulations were conducted with increased cluster size and a reduced number of clusters, keeping the planned study power constant at 80%. All simulations and analyses were conducted using Stata 12 software (StataCorp, College Station, TX, USA). Example code for the simulations is given in Additional file 1.
Results
Study design
Analysis: simulation study
We analysed the complete simulated data sets without cluster merges. From among the 1,000 simulated data sets, 826 yielded evidence of a significant treatment difference at the 5% level, and parameter estimates were all in agreement with the ‘true’ level.
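Because the empirical power is a proportion estimated from 1,000 simulated data sets, it carries Monte Carlo error. A rough binomial calculation, offered here as an illustration and not as part of the original analysis, gives

$$\widehat{\text{power}} = \frac{826}{1000} = 0.826, \qquad \mathrm{SE} = \sqrt{\frac{0.826 \times 0.174}{1000}} \approx 0.012,$$

so an approximate 95% interval is 0.826 ± 1.96 × 0.012, or about 0.80 to 0.85. Differences of one or two percentage points in empirical power between scenarios are therefore within simulation noise.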
Homogeneous cluster merges
Scenario 1
Table 1 Parameter estimates following homogeneous cluster merges: scenario 1 ^{a}
Number of cluster merges per treatment group  

Empirical estimates  0  1  2  5  10  20 
Intercept, β_{0}  0.001 (−0.003, 0.005)  0.000 (−0.003, 0.004)  −0.001 (−0.005, 0.002)  −0.001 (−0.004, 0.002)  0.000 (−0.003, 0.004)  −0.003 (−0.006, 0.001) 
Treatment effect, β_{1}  0.200 (0.195, 0.205)  0.200 (0.195, 0.205)  0.201 (0.196, 0.206)  0.202 (0.198, 0.207)  0.201 (0.196, 0.206)  0.204 (0.199, 0.209) 
${\mathit{\sigma}}_{\mathit{b}}^{2}$  0.050 (0.048, 0.051)  0.049 (0.048, 0.051)  0.047 (0.046, 0.049)  0.044 (0.043, 0.045)  0.038 (0.037, 0.040)  0.024 (0.023, 0.025) 
${\mathit{\sigma}}_{\mathit{w}}^{2}$  0.950 (0.947, 0.952)  0.952 (0.949, 0.954)  0.954 (0.951, 0.957)  0.955 (0.953, 0.958)  0.962 (0.959, 0.965)  0.975 (0.973, 0.978) 
Intracluster correlation coefficient  0.050 (0.048, 0.051)  0.049 (0.048, 0.051)  0.047 (0.046, 0.049)  0.044 (0.042, 0.045)  0.038 (0.037, 0.039)  0.024 (0.023, 0.025) 
Cluster size variance  0  10.1  20.2  49.7  90.4  0 
Empirical power  81.8%  80.0%  82.0%  81.7%  80.9%  83.9% 
The same patterns in estimates were observed when the number and size of clusters were varied, with very similar results by proportion of clusters merging.
Scenario 2
When only 50% of patients were assumed to have completed treatment prior to a merge, a similar pattern was observed.
Heterogeneous cluster merges
Scenario 3
Table 2 Parameter estimates following heterogeneous cluster merges: scenario 3 ^{a}
Number of cluster merges per treatment group  

Empirical estimates  0  1  2  5  10  20 
Assigned to control  
Intercept, β_{0}  0.001 (−0.003, 0.004)  0.004 (0.000, 0.007)  0.006 (0.002, 0.009)  0.015 (0.012, 0.019)  0.032 (0.029, 0.035)  0.057 (0.054, 0.060) 
Treatment effect, β_{1}  0.199 (0.195, 0.204)  0.197 (0.192, 0.202)  0.194 (0.189, 0.198)  0.186 (0.181, 0.190)  0.170 (0.165, 0.175)  0.001 (−0.003, 0.004) 
${\mathit{\sigma}}_{\mathit{b}}^{2}$  0.051 (0.049, 0.052)  0.050 (0.049, 0.051)  0.049 (0.047, 0.0498)  0.048 (0.046, 0.049)  0.046 (0.045, 0.047)  0.039 (0.038, 0.040) 
${\mathit{\sigma}}_{\mathit{w}}^{2}$  0.951 (0.948, 0.954)  0.951 (0.949, 0.954)  0.953 (0.951, 0.956)  0.953 (0.951, 0.956)  0.958 (0.955, 0.961)  0.969 (0.966, 0.971) 
Intracluster correlation coefficient  0.050 (0.048, 0.051)  0.050 (0.049, 0.051)  0.049 (0.047, 0.0494)  0.048 (0.046, 0.049)  0.046 (0.044, 0.047)  0.039 (0.037, 0.040) 
Empirical power  82.5%  79.0%  78.8%  74.3%  64.3%  46.9% 
Assigned to intervention  
Intercept, β_{0}  0.002 (−0.002, 0.005)  −0.002 (−0.006, 0.001)  0.000 (−0.004, 0.003)  0.001 (−0.003, 0.005)  0.000 (−0.004, 0.004)  −0.003 (−0.008, 0.001) 
Treatment effect, β_{1}  0.201 (0.197, 0.206)  0.198 (0.194, 0.203)  0.192 (0.187, 0.196)  0.185 (0.180, 0.190)  0.166 (0.161, 0.171)  0.146 (0.141, 0.152) 
${\mathit{\sigma}}_{\mathit{b}}^{2}$  0.050 (0.048, 0.051)  0.050 (0.048, 0.051)  0.049 (0.047, 0.0499)  0.047 (0.046, 0.049)  0.045 (0.043, 0.046)  0.039 (0.038, 0.040) 
${\mathit{\sigma}}_{\mathit{w}}^{2}$  0.948 (0.946, 0.950)  0.950 (0.947, 0.952)  0.950 (0.948, 0.953)  0.956 (0.953, 0.958)  0.958 (0.956, 0.961)  0.968 (0.965, 0.970) 
Intracluster correlation coefficient  0.049 (0.048, 0.051)  0.050 (0.048, 0.051)  0.049 (0.047, 0.0498)  0.047 (0.046, 0.048)  0.045 (0.043, 0.046)  0.039 (0.037, 0.040) 
Empirical power  82.7%  80.7%  79.5%  72.9%  64.2%  45.1% 
Dropped from analysis  
Intercept, β_{0}  0.001 (−0.003, 0.004)  −0.002 (−0.006, 0.001)  −0.001 (−0.004, 0.003)  −0.004 (−0.007, −0.001)  −0.001 (−0.005, 0.003)  0.000 (−0.004, 0.005) 
Treatment effect, β_{1}  0.202 (0.197, 0.207)  0.201 (0.196, 0.206)  0.199 (0.194, 0.204)  0.202 (0.197, 0.207)  0.202 (0.196, 0.207)  0.198 (0.192, 0.205) 
${\mathit{\sigma}}_{\mathit{b}}^{2}$  0.051 (0.049, 0.052)  0.050 (0.048, 0.051)  0.050 (0.048, 0.051)  0.049 (0.048, 0.051)  0.049 (0.047, 0.051)  0.050 (0.048, 0.051) 
${\mathit{\sigma}}_{\mathit{w}}^{2}$  0.949 (0.947, 0.952)  0.951 (0.948, 0.953)  0.952 (0.950, 0.955)  0.950 (0.947, 0.953)  0.953 (0.950, 0.956)  0.950 (0.946, 0.953) 
Intracluster correlation coefficient  0.051 (0.049, 0.052)  0.050 (0.048, 0.051)  0.050 (0.048, 0.051)  0.049 (0.048, 0.051)  0.049 (0.047, 0.0499)  0.049 (0.048, 0.051) 
Empirical power  81.7%  81.0%  78.3%  77.6%  70.2%  53.1% 
As with the homogeneous merges, the ICC decreased as the total number of merges increased, but in this scenario the decrease was not sufficient to prevent the severe loss of power caused by the merges.
Scenario 4
Table 3 Parameter estimates following heterogeneous cluster merges: scenario 4 ^{a}
Number of cluster merges per treatment group  

Empirical estimates  0  1  2  5  10  20 
Assigned to control  
Intercept, β_{0}  0.001 (−0.003, 0.004)  0.003 (0.000, 0.007)  0.003 (0.000, 0.007)  0.007 (0.004, 0.011)  0.017 (0.013, 0.020)  0.035 (0.032, 0.0380) 
Treatment effect, β_{1}  0.201 (0.196, 0.205)  0.196 (0.191, 0.199)  0.196 (0.191, 0.201)  0.191 (0.186, 0.196)  0.184 (0.179, 0.189)  0.166 (0.159, 0.172) 
${\mathit{\sigma}}_{\mathit{b}}^{2}$  0.050 (0.048, 0.051)  0.049 (0.048, 0.051)  0.049 (0.047, 0.0496)  0.046 (0.045, 0.047)  0.043 (0.042, 0.044)  0.034 (0.033, 0.036) 
${\mathit{\sigma}}_{\mathit{w}}^{2}$  0.950 (0.948, 0.952)  0.951 (0.950, 0.953)  0.952 (0.950, 0.954)  0.957 (0.954, 0.959)  0.962 (0.959, 0.964)  0.971 (0.969, 0.974) 
Intracluster correlation coefficient  0.050 (0.048, 0.051)  0.049 (0.048, 0.0499)  0.049 (0.047, 0.0494)  0.046 (0.044, 0.047)  0.042 (0.041, 0.044)  0.034 (0.033, 0.035) 
Empirical power  81.2%  78.1%  78.3%  76.8%  72.1%  48.2% 
Assigned to intervention  
Intercept, β_{0}  0.003 (<0.001, 0.007)  0.001 (−0.003, 0.004)  0.002 (−0.002, 0.006)  −0.002 (−0.006, 0.001)  0.001 (−0.004, 0.005)  −0.004 (−0.010, 0.002) 
Treatment effect, β_{1}  0.198 (0.193, 0.202)  0.195 (0.191, 0.1997)  0.194 (0.189, 0.199)  0.192 (0.187, 0.197)  0.179 (0.174, 0.184)  0.168 (0.162, 0.175) 
${\mathit{\sigma}}_{\mathit{b}}^{2}$  0.049 (0.048, 0.051)  0.050 (0.048, 0.051)  0.048 (0.047, 0.0493)  0.047 (0.046, 0.049)  0.044 (0.042, 0.045)  0.034 (0.033, 0.035) 
${\mathit{\sigma}}_{\mathit{w}}^{2}$  0.949 (0.947, 0.951)  0.951 (0.949, 0.953)  0.953 (0.951, 0.955)  0.956 (0.954, 0.958)  0.963 (0.961, 0.965)  0.973 (0.971, 0.976) 
Intracluster correlation coefficient  0.049 (0.048, 0.050)  0.049 (0.048, 0.051)  0.048 (0.047, 0.049)  0.047 (0.046, 0.048)  0.043 (0.042, 0.044)  0.033 (0.032, 0.035) 
Empirical power  81.0%  79.7%  77.3%  75.8%  67.2%  50.7% 
Completers only  
Intercept, β_{0}  −0.000 (−0.004, 0.003)  0.004 (0.000, 0.007)  0.001 (−0.003, 0.005)  −0.000 (−0.004, 0.003)  0.001 (−0.002, 0.005)  −0.000 (−0.004, 0.003) 
Treatment effect, β_{1}  0.200 (0.195, 0.205)  0.195 (0.190, 0.1991)  0.198 (0.193, 0.203)  0.199 (0.194, 0.204)  0.198 (0.193, 0.203)  0.200 (0.194, 0.205) 
${\mathit{\sigma}}_{\mathit{b}}^{2}$  0.050 (0.049, 0.051)  0.050 (0.048, 0.051)  0.050 (0.049, 0.051)  0.051 (0.049, 0.052)  0.050 (0.049, 0.051)  0.050 (0.048, 0.051) 
${\mathit{\sigma}}_{\mathit{w}}^{2}$  0.950 (0.948, 0.952)  0.952 (0.950, 0.954)  0.949 (0.947, 0.952)  0.950 (0.947, 0.952)  0.951 (0.948, 0.953)  0.948 (0.946, 0.951) 
Intracluster correlation coefficient  0.050 (0.049, 0.051)  0.050 (0.048, 0.051)  0.050 (0.049, 0.051)  0.050 (0.049, 0.052)  0.050 (0.048, 0.051)  0.050 (0.048, 0.051) 
Empirical power  82.3%  79.2%  79.6%  79.5%  77.1%  73.3% 
When the analysis was restricted to those completing treatment prior to the cluster merge (labelled “Completers only” in Table 3), the treatment effect estimates remained unbiased, as expected, but were less precise because of the effective reduction in sample size. The ICC was unaffected by the number of merges, and study power was slightly affected. As with homogeneous merges, the same patterns in estimates were observed when the number and size of clusters were varied, with very similar results by proportion of clusters merging.
Discussion
We have demonstrated, through established approaches to power calculation, that cluster merges have an adverse impact on study power, assuming that the ICC is unaffected by the change in average cluster size and variability in cluster size. Given the way in which study power may be impacted if clusters merge, we suggest that allowance in this case may need to be made through recruitment of additional clusters rather than just by increasing the size of the clusters, which is the more common approach when allowing for loss to follow-up, although a combination of the two may need to be considered. This issue is closely related to that of variability in cluster size and loss to follow-up of clusters, in effect being a combination of the two. Consequently, the basis of allowance for cluster merges in the design could be through using established, previously published methods such as the one proposed by Taljaard et al. [16]. However, given the cost of additional clusters, we suggest that the decision whether to allow for cluster merging will depend on the perceived likelihood of merges in any particular study and will be based on knowledge of the chosen participating sites.
The simulations suggest that homogeneous cluster merges do not affect the treatment effect estimate. In our present analysis, we assumed that the cluster size represents the whole cluster for each cluster, not just a subset of a larger cluster being analysed. Consequently, the anticipated loss in study power was offset by the change in the ICC, such that the impact was much smaller than expected. The linear relationship obtained between the estimate of ICC and the total number of cluster merges indicates that the ICC depends on the average cluster size. This is in keeping with the relationship between ICC and natural cluster size that has been shown previously [17, 18], with smaller ICC as the average cluster size increases. This change in ICC would not occur if the size of the cluster represented the number from a larger cluster being analysed, because ICC is related to the natural cluster size rather than the number sampled, and, in such circumstances, we would expect to see a loss in study power following any merges.
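A heuristic calculation, added here as an illustration under the assumptions of scenario 1 (outcomes unchanged by the merge, two original clusters of size m with independent cluster effects), shows why the observed ICC falls and why power is largely preserved. Two distinct individuals drawn from a merged cluster of size 2m share an original cluster, and hence a cluster effect, with probability (m − 1)/(2m − 1), so the average correlation within the merged cluster is approximately

$$\tilde{\rho} \approx \rho\,\frac{m-1}{2m-1} = 0.05 \times \frac{19}{39} \approx 0.024 \quad (m = 20,\ \rho = 0.05),$$

which agrees with the ICC of 0.024 estimated when all clusters had merged (20 merges per treatment group). The corresponding design effect is unchanged:

$$1 + (2m-1)\tilde{\rho} = 1 + (m-1)\rho = 1.95 .$$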
The simulations therefore indicate that, if any homogeneous cluster merges occur, the pragmatic approach to analysis of treating the new merged cluster as one cluster is reasonable and does not cause bias or loss of precision in the treatment effect estimate.
The attenuation of the treatment effect estimate following heterogeneous cluster merges is unsurprising, given the change in cluster composition, although we note that the impact is minimal when there are only a few cluster merges. For example, under scenario 3, the clusters resulting from the merge consist of an equal number of individuals from each treatment group, and we might then expect the outcome in these clusters to be (μ_{0} + μ_{1})/2. Following assignment to either treatment group, the treatment effect will be attenuated, either through an increase in mean response in the control group or a decrease in mean response in the intervention group. Consequently, assigning merged clusters to either treatment group in these circumstances will result in biased estimates.
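The size of this attenuation can be approximated without simulation. The sketch below is a rough illustration under stated assumptions, not the analysis used in the study: merged clusters have mean (μ_{0} + μ_{1})/2, they are all assigned to the control arm, and the arm mean is an inverse-variance weighted average of cluster means, with the variance of a cluster mean taken as σ_b² + σ_w²/n at the true variance components. The function name and arguments are hypothetical.

```python
def expected_effect_after_merges(k, clusters_per_arm=40, m=20, icc=0.05,
                                 mu0=0.0, mu1=0.2):
    """Approximate expected treatment effect when k merged clusters
    (one control plus one intervention cluster each) are assigned to control."""
    var_b, var_w = icc, 1.0 - icc
    w_single = 1.0 / (var_b + var_w / m)          # weight of an unmerged cluster
    w_merged = 1.0 / (var_b + var_w / (2 * m))    # weight of a merged cluster
    n_single = clusters_per_arm - k               # unmerged clusters left per arm
    merged_mean = (mu0 + mu1) / 2.0
    control_mean = (n_single * w_single * mu0 + k * w_merged * merged_mean) / (
        n_single * w_single + k * w_merged)
    # The intervention arm keeps only unmerged intervention clusters.
    return mu1 - control_mean


for k in (0, 1, 2, 5, 10, 20):
    print(k, round(expected_effect_after_merges(k), 3))
```

Under these assumptions the expected effect falls from 0.2 with no merges to about 0.169 with 10 merges and about 0.143 with 20 merges. By symmetry, the same attenuation is expected when merged clusters are assigned to the intervention arm.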
Bias following heterogeneous merges can be avoided by dropping merged clusters from the analysis or by including only those individuals who completed treatment prior to the merge. In practice, this would require that any merged clusters discontinue the RCT.
In a review of 152 cluster RCTs in primary care, Eldridge et al. reported an average cluster size of 32 and an interquartile range of 9 to 82 [19]. In our present study, we assessed three fixed cluster sizes (20, 40 and 100) that reflect the cluster sizes in RCTs carried out in primary care. The findings in each scenario depended not on cluster size but only on the proportion of clusters merging, so we would not expect the impact to be any different with larger cluster sizes.
We have assumed a fixed cluster size, that is, that the number of individuals recruited per cluster is the same across clusters. In some RCTs, this may be unrealistic, such as in situations where an entire GP practice is included. A review of cluster RCTs in primary care showed that approximately two-thirds have clusters of unequal size [19]. Methods have already been proposed for inflating sample size to take into account such variability, the simplest of which rely on knowledge of the range of cluster sizes to be included [12]; however, many assume an average cluster size and do not take this into account when calculating sample size [20]. On the basis of the work presented herein, it might be expected that the impact of clusters merging may be less when the variability in cluster size has already been considered, but further work is needed to understand the consequences in this situation.
Although we have used primary care as the motivating example throughout this article, given the reduction over time in the number of GP practices within the United Kingdom [3], the results presented herein can be applied to other areas if there is a risk of cluster merges.
We have not yet considered other ways in which cluster composition may change. One example is a merge with a cluster not originally participating in the RCT. This is unlikely to lead to biased estimates, but power is likely to be affected as the cluster size increases or if more than two clusters are merged. In addition, clusters may fragment, resulting in more clusters of smaller average size. Again, treatment estimates will be unbiased if the original treatment allocation applies, but power will be affected. However, consideration would need to be given to whether these ‘new’ clusters should remain in the same treatment arm of the RCT, because it might be more appropriate to randomise them if cluster members are to participate. Cluster membership may also fluctuate during the course of the study without merging or fragmentation of clusters, particularly in primary care, where patients leave and join a practice, an issue discussed by Diehr et al. [21] in relation to survey design.
The CONSORT extension for cluster RCTs requires the flow of clusters, as well as the flow of patients, to be described. Our review of the literature indicates that, even when authors have revealed changes to clusters, they did not do so in a manner that allowed full understanding. Clearly, authors need to follow reporting guidelines more closely, and journal editors should emphasise the need to do so. Investigators also need to consider whether changes need to be made to protocols, either to pre-empt possible changes to cluster composition by defining up front how they should be dealt with, or in response to such changes.
Conclusions
When the design effect in power calculations is adjusted for variability in cluster size and changes in average cluster size, merging of clusters in cluster RCTs is expected to result in a loss of power. However, in our simulations, homogeneous cluster merges resulted in a much smaller loss of power, to the extent of being largely unimportant, because the observed ICC decreased. This suggests that the relationship of ICC with cluster size should not be ignored at the planning stage.
A pragmatic approach in which the merged clusters are analysed as one new cluster, following homogeneous cluster merges, results in acceptable treatment effect estimates, so such merges should not cause concern. However, heterogeneous merges are problematic, leading to biased treatment effect estimates unless merged clusters are discontinued. If such clusters are discontinued, the estimate is unbiased, but with a loss of precision. Allowance for loss to follow-up at the cluster level as well as at the individual level might be advisable at the planning stage of a cluster RCT. Further research is warranted to fully understand the impact of other changes to clusters post-randomisation and to develop appropriate approaches to statistical analysis.
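The interplay between cluster size, ICC and power described above can be sketched with a normal-approximation power calculation. This is an illustration under stated assumptions, not the method or the values used in this study. It assumes a continuous outcome, equal arms, equal cluster sizes within each configuration and a two-sided test. The function names and every number in the example (effect size, sample size and the halved ICC after merging) are hypothetical.

```python
from statistics import NormalDist


def design_effect(cluster_size, icc):
    """Standard design effect for equal cluster sizes: 1 + (m - 1) * icc."""
    return 1.0 + (cluster_size - 1.0) * icc


def power_two_arm(delta, sd, n_per_arm, deff, alpha=0.05):
    """Approximate power to detect a mean difference delta (normal approximation)."""
    # The variance of each arm mean is inflated by the design effect.
    se = sd * (2.0 * deff / n_per_arm) ** 0.5
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(abs(delta) / se - z_crit)


if __name__ == "__main__":
    # Hypothetical trial with 800 individuals per arm.
    scenarios = {
        "20 clusters of 40, ICC 0.05": design_effect(40, 0.05),
        "10 clusters of 80, ICC unchanged": design_effect(80, 0.05),
        "10 clusters of 80, ICC halved": design_effect(80, 0.025),
    }
    for label, deff in scenarios.items():
        power = power_two_arm(delta=0.25, sd=1.0, n_per_arm=800, deff=deff)
        print(f"{label}: design effect {deff:.3f}, power {power:.3f}")
```

If the ICC is held fixed, doubling the cluster size raises the design effect and lowers power. If the ICC falls as clusters grow, the design effect can stay almost unchanged, which is the pattern the homogeneous-merge simulations suggest.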
Abbreviations
CONSORT: Consolidated Standards of Reporting Trials
GP: General practitioner
ICC: Intracluster correlation coefficient
RCT: Randomised controlled trial
Declarations
Acknowledgements
We thank John Brookes for reviewing reports of cluster randomised trials. NC received funding to undertake his master’s degree in medical statistics from the National Institute for Health Research, and his contribution to this project formed part of his dissertation. This project received no specific funding.
References
1. Hayes RJ, Moulton LH: Cluster Randomised Trials. 2009, Boca Raton, FL: Chapman & Hall/CRC Press.
2. Eldridge S, Kerry S: A Practical Guide to Cluster Randomised Trials in Health Services Research. 2012, Chichester, UK: John Wiley & Sons.
3. Gregory S: General Practice in England: An Overview (Briefing). 2009, London: The King’s Fund. Available at http://www.kingsfund.org.uk/sites/files/kf/generalpracticeinenglandoverviewsarahgregorykingsfundseptember2009.pdf (accessed 14 May 2014).
4. Goodwin N, Dixon A, Poole T, Raleigh V: Improving the Quality of Care in General Practice (Report of an Independent Enquiry Commissioned by the King’s Fund). 2011, London: The King’s Fund. Available at http://www.kingsfund.org.uk/sites/files/kf/improvingqualityofcaregeneralpracticeindependentinquiryreportkingsfundmarch2011_0.pdf (accessed 15 May 2014).
5. Campbell MK, Elbourne DR, Altman DG; CONSORT Group: CONSORT statement: extension to cluster randomised trials. BMJ. 2004, 328: 702-708. 10.1136/bmj.328.7441.702.
6. Foy R, Eccles MP, Hrisos S, Hawthorne G, Steen N, Gibb I, Croal B, Grimshaw J: A cluster randomised trial of educational messages to improve the primary care of diabetes. Implement Sci. 2011, 6: 129. 10.1186/1748-5908-6-129.
7. Schermer TR, Akkermans RP, Crockett AJ, van Montfort M, Grootens-Stekelenburg J, Stout JW, Pieters W: Effect of e-learning and repeated performance feedback on spirometry test quality in family practice: a cluster trial. Ann Fam Med. 2011, 9: 330-336. 10.1370/afm.1258.
8. Donner A, Birkett N, Buck C: Randomization by cluster: sample size requirements and analysis. Am J Epidemiol. 1981, 114: 906-914.
9. Hemming K, Girling AJ, Sitch AJ, March J, Lilford RJ: Sample size calculations for cluster randomised controlled trials with a fixed number of clusters. BMC Med Res Methodol. 2011, 11: 102. 10.1186/1471-2288-11-102.
10. Kerry SM, Bland JM: Unequal cluster sizes for trials in English and Welsh general practice: implications for sample size calculations. Stat Med. 2001, 20: 377-390. 10.1002/1097-0258(20010215)20:3<377::AID-SIM799>3.0.CO;2-N.
11. Lake S, Kammann E, Klar N, Betensky R: Sample size re-estimation in cluster randomization trials. Stat Med. 2002, 21: 1337-1350. 10.1002/sim.1121.
12. Eldridge SM, Ashby D, Kerry S: Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method. Int J Epidemiol. 2006, 35: 1292-1300. 10.1093/ije/dyl129.
13. Kong SH, Ahn CW, Jung SH: Sample size calculation for dichotomous outcomes in cluster randomization trials with varying cluster size. Drug Inf J. 2003, 37: 109-114.
14. Manatunga AK, Hudgens MG, Chen S: Sample size estimation in cluster randomized studies with varying cluster size. Biom J. 2001, 43: 75-86. 10.1002/1521-4036(200102)43:1<75::AID-BIMJ75>3.0.CO;2-N.
15. Twisk JWR: Applied Multilevel Analysis: A Practical Guide for Medical Researchers. 2006, Cambridge, UK: Cambridge University Press.
16. Taljaard M, Donner A, Klar N: Accounting for expected attrition in the planning of community intervention trials. Stat Med. 2007, 26: 2615-2628. 10.1002/sim.2733.
17. Donner A: An empirical study of cluster randomization. Int J Epidemiol. 1982, 11: 283-286. 10.1093/ije/11.3.283.
18. Gulliford MC, Ukoumunne OC, Chinn S: Components of variance and intraclass correlations for the design of community-based surveys and intervention studies: data from the Health Survey for England 1994. Am J Epidemiol. 1999, 149: 876-883. 10.1093/oxfordjournals.aje.a009904.
19. Eldridge SM, Ashby D, Feder GS, Rudnicka AR, Ukoumunne OC: Lessons for cluster randomized trials in the twenty-first century: a systematic review of trials in primary care. Clin Trials. 2004, 1: 80-90. 10.1191/1740774504cn006rr.
20. Kerry SM, Bland JM: Sample size in cluster randomisation. BMJ. 1998, 316: 549. 10.1136/bmj.316.7130.549.
21. Diehr P, Martin DC, Koepsell T, Cheadle A, Psaty BM, Wagner EH: Optimal survey design for community evaluations: cohort or cross-sectional. J Clin Epidemiol. 1995, 48: 1461-1472. 10.1016/0895-4356(95)00055-0.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.