 Methodology
 Open Access
 Published:
The mixed model for repeated measures for cluster randomized trials: a simulation study investigating bias and type I error with missing continuous data
Trials volume 21, Article number: 148 (2020)
Abstract
Background
Cluster randomized trials (CRTs) are a design used to test interventions where individual randomization is not appropriate. The mixed model for repeated measures (MMRM) is a popular choice for individually randomized trials with longitudinal continuous outcomes. This model’s appeal is due to avoidance of model misspecification and its unbiasedness for data missing completely at random or at random.
Methods
We extended the MMRM to cluster randomized trials by adding a random intercept for the cluster and undertook a simulation experiment to investigate statistical properties when data are missing at random. We simulated cluster randomized trial data where the outcome was continuous and measured at baseline and three postintervention time points. We varied the number of clusters, the cluster size, the intracluster correlation, missingness and the datageneration models. We demonstrate the MMRMCRT with an example of a cluster randomized trial on cardiovascular disease prevention among diabetics.
Results
When simulating a treatment effect at the final time point we found that estimates were unbiased when data were complete and when data were missing at random. Variance components were also largely unbiased. When simulating under the null, we found that type I error was largely nominal, although for a few specific cases it was as high as 0.081.
Conclusions
Although there have been assertions that this model is inappropriate when there are more than two repeated measures on subjects, we found evidence to the contrary. We conclude that the MMRM for CRTs is a good analytic choice for cluster randomized trials with a continuous outcome measured longitudinally.
Trial registration
ClinicalTrials.gov, ID: NCT02804698.
Introduction
Cluster randomized trials (CRTs) are a design that randomizes clusters, rather than individuals, to intervention arms. The design may be used because the intervention is at the cluster level, such as behavioral group therapy, or due to potential contamination between participants, or because of ethical or logistic considerations [1]. Clusters may be households, clinics, schools or towns, and individuals within clusters are usually correlated, thereby violating the independence assumption of common statistical methods. The intracluster correlation (ICC), defined as the ratio of the betweencluster variance to the total variance, is the measure of this nonindependence [1]. CRTs are increasingly being used as they are a good design for comparative effectiveness research and pragmatic trials [2].
Many trials, both individually and cluster randomized, have repeated outcome measures over time. For example, a cardiovascular disease (CVD) prevention trial may measure weight, body mass index and stress at baseline, at 3 months post intervention and a year [3]. The longitudinal design must be accounted for in the analysis because repeated measures on the same individual are not independent. Maximumlikelihoodbased mixed models are one common statistical approach for handling nonindependence. One particular type of mixed model, commonly referred to as the mixed model for repeated measures (MMRM), is a popular choice for individually randomized trials with longitudinal continuous outcomes measured at set time points [4,5,6,7]. This model uses an unstructured time and covariance structure and its appeal is due to (1) avoidance of model misspecification and (2) its unbiasedness for data which are missing completely at random (MCAR) or missing at random (MAR). Although many researchers use simple methods, such as a ttest to compare arms at a given time point, or single imputation such as lastobservationcarriedforward [8,9,10]. these approaches can result in biased estimates; t tests make an MCAR assumption (defined below) and lastobservationcarriedforward has been shown to give unpredictable results [11,12,13]. In addition to unbiased estimation when data are MAR or MCAR, mixed models are more powerful than t tests when data are missing [14].
Data are MCAR if missingness is unrelated to any observed or unobserved data (covariates or outcomes). Data are MAR if missingness is related to observed outcome data (such as the previous weight, in the CVD example above), but not unobserved data, such as the subject’s current weight. Data are said to be missing not at random (MNAR) if the missingness is dependent on unobserved data (such as the current weight measurement in the CVD example), even after taking observed data into account. A fourth category is sometimes used, which is covariatedependent missing data. (Note that some researchers have called this type of missing data MCAR [15], whereas others have called it MAR [7, 16]) Statistical methods that result in unbiased estimation when data are MAR or MCAR include mixed models, multiple imputation, and inverse probability weighted generalized estimating equations. Reviews show that most trialists use methods that make the strong assumption that data are MCAR (e.g., t tests on available data, single imputation) [8,9,10]. A more conservative approach is to use a primary analysis that assumes that data are MAR, followed by a sensitivity analysis that weakens this assumption [11, 12].
CRTS can be analyzed at the individual or the cluster level, where data from each of the clusters is summarized by a single value, such as the mean (thereby removing the issue of intracluster correlation) [1]. For a small number of clusters (< 40 total) the recommendation is to use a clusterlevel analysis [17]. particularly if unweighted generalized estimating equations are used, as type I error can be severely inflated otherwise [18]. With respect to missing data, Hossain et al. compared individuallevel analysis versus clusterlevel analysis for CRTs with covariatedependent missing data where the continuous outcomes were measured twice (baseline and followup) [19]. They found that using mixed models or multiple imputation at the individual level resulted in unbiased estimation in all considered scenarios, whereas analysis at the cluster level did not always result in unbiased estimates. The focus of this paper is on analysis at the individual level, which is how most CRTs are analyzed [9, 17].
When the CRT has outcomes measured longitudinally on the same individual, both types of nonindependence must be accounted for. Research into the most appropriate analytical approach for this type of design has been limited, particularly with respect to missing data. Johnson et al. [20] investigated type I error for several analytical approaches at the individual and cluster level for CRTs with imbalanced cluster size, but did not consider outcomes that were measured longitudinally. They recommended using the KenwardRogers denominator degrees of freedom, a small sample correction that has been shown to have favorable properties [21, 22]. Murray et al. investigated analytical approaches for CRTs with longitudinally measured outcomes, and concluded that mixedmodel analyses of variance (ANOVAs) are inappropriate when there are more than two time points [23]. However, their conclusions may be too broad, as they did not test the model that we propose here.
We extend the MMRM to cluster trials (MMRMCRT) by simply adding a random effect for cluster. While this model is not necessarily new; for example, Littell discusses this model in the context of repeated measures with clustering due to schools [24], the choice of similar models has been criticized when outcomes are measured at more than two time points (as mentioned above) [23, 25, 26]. Furthermore, to our knowledge, this model has not been investigated for its statistical properties when outcome data are incomplete. The objective of this research was to extend the MMRM to CRTs with continuous outcomes measured longitudinally on the same subject at more than two fixed time points, and to investigate this model’s statistical properties using simulation, particularly with respect to missing data. Specifically, the aims of this study were to investigate the bias of treatment effects and variance estimates, as well as the type I error rate of the MMRMCRT. We aimed to investigate the impact of varying the ICC; the number of clusters per arm; cluster size; missingness mechanism; and underlying covariance structure. We demonstrate using a CRT for CVD prevention among diabetics, where the clusters are clinics from the state of Sonora, Mexico.
Methods
The MMRM in general
The mixed model for repeated measures uses an unstructured time and covariance structure [27]. Unstructured time means that time is modeled categorically, rather than continuously as a linear or polynomial function, and allows for an arbitrary trajectory over time. While the continuous time models may use fewer degrees of freedom and may, therefore, be more powerful, it can be difficult to anticipate the outcome’s time trajectory in advance. Since clinical trials often require a prespecified analysis plan, unstructured time can be appealing [27]. In the context of randomized controlled trials, fixed effects of time, treatment and their interaction are included in the MMRM model. The interaction term accommodates different patterns of change over time between the arms. Baseline values of the outcome are sometimes included [28]. Maximumlikelihoodbased mixed models provide unbiased estimation for data that are MCAR or MAR, as long as the model is not misspecified [29, 30]. All outcome data are used, regardless of whether an individual has complete data or not, making these models consistent with an intentiontotreat analysis [31,32,33].
Cluster randomized trials with longitudinally measured outcomes have two sources of nonindependence: the cluster and the repeated measures over time. Linear mixedeffects models are one option for handling the nonindependence of measurements over time. In the mixedmodel context, one may use a randomcoefficients model, using random effects for a subject’s intercept and sometimes slope. Alternatively, one may use covariance pattern models, where the covariance between repeated measures on the same subject is modeled explicitly from the residual effects [28, 30]. Some commonly used covariance structures, available in statistical software, include compound symmetric, autoregressive, Toeplitz or unstructured. A compound symmetric structure assumes that any two measurements on the same individual have the same covariance, regardless of timing. An autoregressive structure assumes that measurements’ correlation drops over time exponentially. The Toeplitz structure has homogeneous variance over time, and a banded correlation structure, so that the (r,c) element of the matrix is the same as the (r + 1, c + 1) and the (r + 2, c + 2) elements, etc. (i.e., when the degree of adjacency is the same) [28]. Unstructured covariance makes no assumptions about the correlation between measurements, thereby rendering misspecification not a problem; however, it can require that a large number of parameters must be estimated [30]. However, many cluster trials have a fairly small number of assessments on each subject.
The general mixed model for the ith subject is given by:
where γ_{i} is independent of ε_{i}; Y_{i} is the n_{i} × 1 response vector; n is the number of planned assessments for each subject i = 1, …, N and n_{i} is the number of observed assessments for the ith subject; β is the p × 1 fixedeffects vector; X_{i} is the n_{i} × p fixedeffects design matrix; Z_{i} is the n_{i} × q matrix of randomeffects design matrix; γ_{i} is the q × 1 vector of random effects and ε_{i} is the n_{i} × 1 vector of residuals. G is the q × q covariance matrix for the random effects, and R_{i} is the n_{i} × n_{i} covariance matrix for the residuals.
The MMRM for CRTs
Our proposed model, for a twoarm trial, has p = 2n + 2 fixed effects: a fixed effect for each assessment for each treatment arm, an intercept and a treatment indicator. The only random effect is for cluster, so q = 1, and G is a scalar. R_{i} is unstructured. This model is easily extended to include more than two arms, the baseline value of the outcome variable as a covariate (instead of in the outcome vector as shown here), and/or a baseline by treatment arm interaction [28].
The ICC
The ICC is defined as \( \frac{\sigma_C^2}{\sigma_{Total}^2} \) where betweencluster variance is \( {\sigma}_C^2 \) and \( {\sigma}_{Total}^2 \) is the total variance. In the MMRMCRT model, \( {\sigma}_{Total}^2 \) is \( {\sigma}_C^2+{\sigma}_B^2+{\sigma}_W^2 \), where \( {\sigma}_W^2 \) is the withinsubject (or residual) variance and \( {\sigma}_B^2 \) is the betweensubject variance. These variance components are functions of the elements of the G and R matrices. The ICC is used to calculate the design effect = 1 + (m − 1) × ICC, where m = cluster size, which is the factor used to inflate the required sample size of an individually randomized trial to account for clustering, while maintaining the same level of power.
Simulation study
We undertook a set of simulation studies to investigate the MMRM for CRTs in the presence of missing data. We simulated data from a twoarm CRT where the outcome was continuous and measured at baseline and three postintervention time points. We varied the ICC, the number of clusters per arm k, the cluster size m, missingness (complete or MAR), and the missingness mechanism direction (described below). To show the generality of the MMRMCRT we simulated data using three methods, which created different underlying covariance structures. The values for the simulation to investigate bias are shown in Table 1. We also simulated under the null hypothesis to investigate the type I error rate. We used 1000 replications for each parameter combination. We simulated MAR data only: MCAR data were not simulated because analyses which are unbiased for MAR data are also unbiased for MCAR data. Methods for MNAR data are beyond the scope of this paper.
Data generation
Multivariate normal data were generated using three methods, none of which are based directly on the model that we are proposing. To investigate bias, the means of y over time were (50, 50, 50, 50) for the control arm and (50, 55, 60, 55) for the treatment arm. To investigate type I error, we simulated under the null, with no differences in trajectories over time between the arms. Each method simulated data with ICCs of 0.01 and 0.1, values that are consistent with empirically estimated ICCs [34].
The first simulation method was a mixedeffects model with fixed effects for categorical time, treatment arm, and their interaction; random effects (intercepts) for subject and cluster; and a single residualvariance component, σ^{2}_{w}. The number of random effects is q = 2, so G is a 2 × 2 matrix comprised οϕ σ^{2}_{Χ} and σ^{2}_{B}. This model induces a compound symmetric covariance structure for measurements on the same subject. The correlation for subjects within the same cluster is the ICC. For ICC = 0.01, we set σ^{2}_{C}=1, σ^{2}_{B} = 60, and σ^{2}_{w} = 39. For ICC = 0.1, we set σ^{2}_{C} = 10, σ^{2}_{B} = 60, and σ^{2}_{w} = 30.
In the second simulation method we used the same fixedeffects structure as method 1, but with a single random cluster effect and a withinsubject covariance over time governed by the following Toeplitz covariance matrix. For ICC = 0.01, we set σ^{2}_{C} = 1 and σ^{2}_{w} = 99. For ICC = 0.1, we set σ^{2}_{C} = 10 and σ^{2}_{w} = 90.
The third datasimulation method used the same fixedeffects structure as the previous two methods, but included random effects for subject, cluster intercepts and cluster slopes. Each cluster k had a random effect generated ~ N(0, σ^{2}_{C}), which represents a constant offset from the overall mean trajectory over time, and additional random effects at each of the four time points ~ N(0, 0.4σ^{2}_{C}). he number of random effects is q = 5, so G is a 5 × 5 matrix. For ICC = 0.01,we set σ^{2}_{C} = 0.714, σ^{2}_{B} = 60, and σ^{2}_{w} = 39. For ICC = 0.1, we set σ^{2}_{C} = 7.14, σ^{2}_{B} = 60, and σ^{2}_{w} = 30. The simulation code is given in the Additional file 1.
Missingness
We created MAR dropout data by using thresholds, above or below which data were deleted with a given probability at each postbaseline time point. The probability and thresholds were tuned so that data were missing at a rate of 30% in both arms at the final time point. Missingness at the jth time point was based on the outcome value of the j−1th time point, which creates MAR data. All baseline data were complete, and missingness was dropout only. Two mechanisms were used: “same direction,” where values of lower y were deleted for both arms; and “opposite direction,” where high values of y were more likely to be deleted in the control arm, and low values were more likely to be deleted in the treatment arm. In practice, missingness is likely to be a result of a combination of mechanisms, and we use this mechanism as an extreme case. However, a possible scenario where this type of missingness might occur is when the toxicity for the experimental arm is high so that patients with low quality of life are likely to drop out, whereas the control arm’s toxicity is low, and patients with higher quality of life might be likely to drop out, as they feel better, or possibly cured.
Analysis
We analyzed each of the datasets using the MMRMCRT as described above using the SAS Mixed procedure (version 9.4, Cary, NC, USA) with a random intercept for cluster, and fixed effects of categorical time, treatment, and the interaction time x treatment. The unstructured covariance is indicated within the repeated statement. The SAS and R code to fit this model is given in the Additional file 1. Our interest was focused on the difference at the fourth time point between the treatment arms, which we estimated using a contrast within the model. We used restricted maximumlikelihood estimation, and the KenwardRogers denominator degrees of freedom [22].
Performance evaluation
The percent bias was calculated for the mean difference between the arms at the fourth time point by subtracting the true difference from the estimated difference and dividing by the true difference. Coverage was calculated as the percentage of 95% confidence intervals that contained the true value. We assessed the type I error rate for data that were simulated under the null hypothesis of no effect, by finding the percentage of tests (between arms at the fourth time point) that were significant at the 0.05 level. As bias was our performance measure of greatest interest, we calculated the Monte Carlo standard error for percent bias as:
where 5 was the true treatment difference [35].
We also estimated the bias of variance components. The estimates of variance generated by SAS are σ^{2}_{c} from the random statement, and the 4 × 4 R matrix from the repeated statement, 10 elements of which are unique. Thus, the bias for the variance of the random effect of cluster, σ^{2}_{c}, was estimated directly, as this variance estimate is default output for SAS Proc Mixed. For data that were simulated using methods 1 and 3, the expected value of the average of the diagonal elements of R is σ^{2}_{B} + σ^{2}_{W} and the average of the offdiagonal elements should be equal to σ^{2}_{W}.
For data that were generated using method 2 (a Toeplitz covariance structure and a random effect for cluster), we calculated bias by dividing each of the elements of the estimated R matrix by its presumed correlation due to the Toeplitz structure, and averaged these values to get an estimate of σ^{2}_{W}. Diagonal elements were divided by 1, one place off the diagonal were divided by 0.8, two places off the diagonal were divided by 0.7, and three places off the diagonal were divided by 0.6.
Simulation results
The direction of missingness did not affect the results, so we report results when missingness was in the same direction for both arms. Results for when missingness direction differed between arms are given in the Additional file 1.
Bias of treatment effect
The estimates of the difference between arms at the fourth time point were largely unbiased (Tables 1, 2 and 3). The true treatment effect was 5.0: estimates range from 4.93 to 5.12 and the percent bias ranged from − 1.3 to 2.4%. There was no effect of the number of clusters, the cluster size, the ICC, the simulation method or the direction of missing data. The Monte Carlo standard error for the percent bias of the treatment effect ranged from 0.2 to 1.9%.
Bias of variance components
In general, variance component estimates were also unbiased: of the 192 variance components estimated, 85% had less than 10% bias. Smaller cluster sizes, particularly when the number of clusters was small, and low ICCs were associated with higher relative bias for σ^{2}_{C}. For example, when k = 5, m = 10 and ICC = 0.01, simulation methods 1–3 had percent biases of 121, 137, and 186%, respectively. The estimates for σ^{2}_{C} were 2.2, 2.4, and 2.0 for true values of 1.0, 1.0, and 0.71. Using simulation method 3 estimates of the variance for cluster effects and within subject were slightly inflated and estimates for variance between subjects were slightly underestimated.
Coverage
When the ICC was small, at 0.01, coverage estimates for the treatment effect were close to 95% for all three simulation methods. There was noticeable undercoverage when the ICC was 0.1 under simulation method 3 (random slope effect for cluster) with coverage falling as low as 89.7%. See Table 4.
Type I error
When simulating under the null with 30% missing data, type I error ranged from 2.7 to 8.1% (Table 5). Larger values occurred using simulation method 3 (random intercept and slope for clusters) with larger ICC. Other methods and ICCs yielded values that were close to nominal.
Motivating example
We demonstrate the MMRMCRT with MetaSalud Diabetes, a CRT designed to reduce the risk of CVD in diabetics in the Mexican state of Sonora by focusing on improving healthy behaviors. Clusters were health clinics where the intervention was implemented, and randomization was stratified by region (north, south, central). Informed consent from all participants was given and ethical approval was obtained. Details can be found elsewhere [3]. Briefly, the primary outcome was the Framingham CVD risk score, which is a function of age, sex, blood pressure, cholesterol, smoking, and diabetic status, as detailed in D’Agostino et al. which estimates the risk of CVD in the next 10 years [36]. Twentyfour clinics were randomized to intervention (n = 293) or control (n = 242), with two clusters eventually being dropped from the control arm due to logistical reasons. Participants were assessed at baseline, 3, and 12 months. For this demonstration, we fit a MMRMCRT with fixed effects of time, arm, time x arm, strata, and a random effect for clinics. Time was fit categorically and the 3 × 3 covariance matrix for time was unstructured. Inference focused on the difference in CVD risk between the arms at 3 and 12 months.
Results
By month 12, the rates of missing outcome data were 21% and 11% for the intervention and control arm respectively. We found statistically significant differences in CVD risk at both 3 and 12 months. CVD risk was 4.8 percentage points higher in the intervention arm than the control arm at 3 months (95% CI 1.2, 8.5, p = 0.01); at 12 months the difference was 3.9 percentage points (95% CI 0.3, 7.4, p = 0.03). See Table 6. The ICC was estimated to be 0.031, similar to what other studies have found for various psychosocial and behavioral outcomes [34]. This trial had differential retention. While rates of missingness/retention should always be monitored and investigated by trial staff, unbiased estimation is still possible, as shown by Bell et al. [37]. In this particular case, it may have been due to the higher rate of participants who had just joined the health clinic in the intervention arm (34.1) versus the control arm (9.4), and were not fully committed to the clinic.
Discussion
We aimed to investigate the mixed model for repeated measures for CRTs, for complete data and for data MAR, where assessments of the continuous outcome are made at fixed time points. When simulating a treatment effect at the final time point we found that estimates were unbiased when data were complete and when data were MAR. Estimates of variance components were mostly unbiased, although cluster effects were, in some cases, overestimated, particularly when the number of clusters per arm was small (k = 5) and when the data were simulated with random intercept and slopes for cluster. Although the percentage bias was large in some cases, up to 186%, this is due to a small true value of σ^{2}_{C} = 0.71, and an average estimate of 2.0. In practice, this may not have much effect, but caution should be used whenever sample sizes are small, including when there are a small number of clusters with small cluster size.
Type I error rate was close to nominal for most of our simulation methods, ranging from 2.7 to 8.1% when simulating under the null. Generalized estimating equations, another popular approach for analyzing CRTs, also suffers from increased type I error when there is a small number of clusters. Huang et al. showed type I error of 47 to 12% when the number of clusters per arm ranged from two to five.
The worst performance for the MMRMCRT occurred when the ICC was larger, at 0.1, and when simulation method 3 (random intercepts and slopes) was used. Empirical estimates of several ICCs in family practice settings had a median of 0.01; a similar study in the field of psychooncology had a median ICC of 0.0007 for longitudinal studies with a maximum value of 0.09 [34, 38]. This suggests that in certain research settings the ICC may be unlikely to be as high as 0.1. A way to reduce the ICC is to adjust for covariates within the models [34]. Real data are not generated from a model, and in practice, it is likely that multiple mechanisms are involved.
At the request of a reviewer, further simulations using nonlinear trajectories for both arms were undertaken. The results are in the Additional file 1, and are similar to the main results: unbiased treatment estimation and variance components, except for slightly inflated betweencluster variance estimates when using simulation method 3, random intercepts and slopes. Type I error rate was similar to the primary results, with values slightly inflated for simulation method 3 and higher ICCs.
There have been multiple reviews that have asserted that the analyses for CRTs are incorrect based on whether a mixedmodel ANOVA is used, if there are more than two time points, unless a random
coefficients model is used [25, 26, 39]. Our results contradict this, as the MMRMCRT appears, as a whole, to have good statistical properties. The simulation study upon which these assertions are based, however, did not test the MMRMCRT as we have defined it, and was based on measurements on the same clusters over time, but not the same individuals over time [23]. Other differences include the low number of clusters simulated (five per arm); the assumption of a compound symmetric covariance structure, as opposed to the unstructured covariance in the MMRMCRT; simulation under the null only; and the use of empirical sandwich standard errors as well as restricted maximum likelihood (REML). A low number of clusters, along with empirical sandwich errors, has been shown to increase type I error [18]. Hossain et al. recommend linear mixed models for the analysis of CRTs with missing data over clusterlevel analyses, but their simulation study only used two time points.
While our study focused on endpoint analysis, i.e., comparison of arms at a single time point by using a contrast, the MMRM, for both individual and cluster randomization, can also assess response profiles over time. This allows for testing the difference in patterns of change over time between arms via the interaction of treatment and time [33].
Strengths and limitations
A strength of this study is that we used three different datageneration models, none of which were directly the analysis model, as well as two mechanisms within these simulation methods (same and opposite direction missingness). Our results were fairly consistent, indicating that the MMRMCRT is flexible and general. A limitation of this research is that, as a simulation study, we could not investigate all possible scenarios, of which there are an infinite number. However, this is a limitation of all simulation studies, and we varied several parameters that are important in practice. We only used three postbaseline time points; however, we see no reason why more time points would yield substantively different results. We did not simulate data that were MNAR. Although some studies have shown that MNAR data are modeled fairly well (in terms of bias) using MAR methods such as mixed models and multiple imputation, this is not true in general [31]. Another limitation is that we did not explore the case of unequalsized clusters.
Most trials, both individual and cluster randomized, use analyses that make the strong and unlikely assumption that data are MCAR [8,9,10]. The MMRM makes a MAR assumption, which is more plausible than MCAR. While it is possible that longitudinal trial data are MNAR, MNAR models can be complex and most require strong untestable assumptions. We recommend that MNAR models be considered for sensitivity analyses. MNAR models for CRTs, particularly ones with repeated measures on the same subject, are an emerging research topic; for example, Fiero et al. extended the MNAR patternmixture model to longitudinal cluster trials [40]. Future research should include more MNAR models for CRTs, as well as analytical approaches for longitudinal binary and ordinal outcomes.
Conclusion
The MMRM for individually randomized trials is popular because it uses all the data collected over time; is unlikely to misspecify the functional relationship between time and the outcome; and yields unbiased estimates for data that are MCAR or MAR. Our extension to cluster trials has similar properties, and can be considered as a primary analysis when continuous outcome data are collected at fixed time points.
Availability of data and materials
The simulation data can be recreated using the code from the online data files. The example data are proprietary and cannot be shared.
Abbreviations
 CRT:

Cluster randomized trial
 CVD:

Cardiovascular disease
 ICC:

Intracluster correlation
 MAR:

Missing at random
 MCAR:

Missing completely at random
 MMRM:

Mixed model for repeated measures
 MNAR:

Missing not at random
References
 1.
Hayes RJ, Moulton LH. Cluster randomised trials. Chapman & Hall/CRC: Boca Raton; 2009.
 2.
Craig P, Dieppe P, Macintyre S, et al. Developing and evaluating complex interventions: the new Medical Research Council guidance. BMJ. 2008;337:a1655.
 3.
Sabo S, Denman Champion C, Bell ML, et al. Meta salud diabetes study protocol: a clusterrandomised trial to reduce cardiovascular risk among a diabetic population of Mexico. BMJ Open. 2018;8:e020762. https://doi.org/10.1136/bmjopen2017020762.
 4.
Mallinckrodt CH, Kaiser CJ, Watkin JG, et al. Type 1 error rates from likelihoodbased repeated measures analyses of incomplete longitudinal data. Pharm Stat. 2004;3:71–186.
 5.
Mallinckrodt CH, Lane PW, Schnell D, et al. Recommendations for the primary analysis of continuous endpoints in longitudinal clinical trials. Drug Inf J. 2008;42:303–19.
 6.
Siddiqui O, Ali MW. A comparison of the randomeffects pattern mixture model with lastobservationcarriedforward (LOCF) analysis in longitudinal clinical trials with dropouts. J Biopharm Stat. 1998;8:545–63.
 7.
DeSouza CM, Legedza AT, Sankoh AJ. An overview of practical approaches for handling missing data in clinical trials. J Biopharm Stat. 2009;19:1055–73.
 8.
Bell ML, Fiero M, Horton NJ, et al. Handling missing data in RCTs; a review of the top medical journals. BMC Med Res Methodol. 2014;14:118. https://doi.org/10.1186/1471228814118.
 9.
Fiero MH, Huang S, Oren E, et al. Statistical analysis and handling of missing data in cluster randomized trials: a systematic review. Trials. 2016;17:1–10. https://doi.org/10.1186/s130630161201z.
 10.
Rabe BA, Day S, Fiero MH, et al. Missing data handling in noninferiority and equivalence trials: a systematic review. Pharm Stat. 2018. https://doi.org/10.1002/pst.1867.
 11.
Bell ML, Fairclough DL. Practical and statistical issues in missing data for longitudinal patientreported outcomes. Stat Methods Med Res. 2014;23:440–59. https://doi.org/10.1177/0962280213476378.
 12.
Council NR. The prevention and treatment of missing data in clinical trials. Washington DC: National Academies Press; 2010.
 13.
Kenward MG, Molenberghs G. Last observation carried forward: a crystal ball? J Biopharm Stat. 2009;19:872–88. https://doi.org/10.1080/10543400903105406.
 14.
Ashbeck EL, Bell ML. Single time point comparisons in longitudinal randomized controlled trials: power and bias in the presence of missing data. BMC Med Res Methodol. 2016;16:1–8. https://doi.org/10.1186/s1287401601440.
 15.
Hogan JW, Roy J, Korkontzelou C. Tutorial in biostatistics. Handling dropout in longitudinal studies. Stat Med. 2004;23:1455–97.
 16.
Preisser JS, Lohman KK, Rathouz PJ. Performance of weighted estimating equations for longitudinal binary data with dropouts missing at random. Stat Med. 2002;21:3035–54. https://doi.org/10.1002/sim.1241.
 17.
Crespi CM, Maxwell AE, Wu S. Cluster randomized trials of cancer screening interventions: are appropriate statistical methods being used? Contemp Clin Trials. 2011;32:77–484.
 18.
Huang S, Fiero MH, Bell ML. Generalized estimating equations in cluster randomized trials with a small number of clusters: review of practice and simulation study. Clin Trials. 2016. https://doi.org/10.1177/1740774516643498.
 19.
Anower H, Karla DO, Jonathan WB. Missing continuous outcomes under covariate dependent missingness in cluster randomised trials. Stat Methods Med Res. 2016;26:1543–62. https://doi.org/10.1177/0962280216648357.
 20.
Johnson JL, Kreidler SM, Catellier DJ, et al. Recommendations for choosing an analysis method that controls Type I error for unbalanced cluster sample designs with Gaussian outcomes. Stat Med. 2015;34:3531–45. https://doi.org/10.1002/sim.6565.
 21.
McNeish D, Stapleton LM. Modeling clustered data with very few clusters. Multivar Behav Res. 2016;51:495–518. https://doi.org/10.1080/00273171.2016.1167008.
 22.
Kenward MG, Roger JH. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics. 1997;53:983–97.
 23.
Murray DM, Hannan PJ, Wolfinger RD, et al. Analysis of data from grouprandomized trials with repeat observations on the same groups. Stat Med. 1998;17:1581–600.
 24.
Litttell R. Repeated measures analysis with clustered subjects. Gainesville: SAS Global Forum 2007; 2007.
 25.
Murray DM, Varnell SP, Blitstein JL. Design and analysis of grouprandomized trials: a review of recent methodological developments. Am J Public Health. 2004;94:423–32.
 26.
Turner EL, Prague M, Gallis JA, et al. Review of recent methodological developments in grouprandomized trials: part 2 – analysis. Am J Public Health. 2017;107:1078–86. https://doi.org/10.2105/ajph.2017.303707.
 27.
Mallinckrodt CH, Clark WS, Carroll RJ, et al. Assessing response profiles from incomplete longitudinal clinical trial data under regulatory considerations. J Biopharm Stat. 2003;13:179–90.
 28.
Mallinckrodt C, Lipkovich I. Analyzing longitudinal clinical trial data: a practical guide. Boca Raton: Chapman and Hall/CRC; 2016.
 29.
Carpenter J, Kenward M. Missing data in randomised controlled trials—a practical guide. National Institute for Health Research: Birmingham; 2008.
 30.
Fitzmaurice GM, Laird NM, Ware JH. Applied longitudinal analysis. 2nd ed. Hoboken: Wiley; 2011.
 31.
Molenberghs G, Thijs H, Jansen I, et al. Analyzing incomplete longitudinal clinical trial data. Biostatistics. 2004;5:445–64.
 32.
White IR, Carpenter J, Horton NJ. Including all individuals is not enough: lessons for intentiontotreat analysis. Clin Trials. 2012;9:396–407.
 33.
Mallinckrodt CH, Watkin JG, Molenberghs G, et al. Choice of the primary analysis in longitudinal clinical trials. Pharm Stat. 2004;3:161–9.
 34.
Bell ML, McKenzie JE. Designing psychooncology randomised trials and cluster randomised trials: variance components and intracluster correlation of commonly used psychosocial measures. PsychoOncology. 2013;22:1738–47.
 35.
Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019;38(11):2074–102. https://doi.org/10.1002/sim.8086.
 36.
D’Agostino RB Sr, Vasan RS, Pencina MJ, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation. 2008;117:743–53. https://doi.org/10.1161/circulationaha.107.699579.
 37.
Bell ML, Kenward MG, Fairclough DL, et al. Differential dropout and bias in randomised controlled trials: when it matters and when it may not. BMJ. 2013;346:e8668.
 38.
Adams G, Gulliford MC, Ukoumunne OC, et al. Patterns of intracluster correlation from primary care research to inform study design and analysis. J Clin Epidemiol. 2004;7:785–94.
 39.
Varnell SP, Murray DM, Janega JB, et al. Design and analysis of grouprandomized trials: a review of recent practices. Am J Public Health. 2004;94:393–9.
 40.
Fiero MH, Hsu CH, Bell ML. A patternmixture model approach for handling missing continuous outcome data in longitudinal cluster randomized trials. Stat Med. 2017;36:4094–105.
Acknowledgements
We are grateful to the team of MetaSalud Diabetes for providing these data.
Funding
Dr. Bell was supported by the National Cancer Institute Cancer Center Support Grant P30 CA023074 and National Institute of Health Grant 1R01HL125996–01. The funders had no role in the design, analysis or interpretation of this study.
Author information
Affiliations
Contributions
MLB designed the study, programmed the simulations and wrote the results. BAR performed the simulations and aided in writing the results.
Corresponding author
Correspondence to Melanie L. Bell.
Ethics declarations
Ethics approval and consent to participate
The example trial was given relevant Institutional Review Board approval. Participants from the example trial provided informed consent.
Competing interests
The authors declare they have no conflicts of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Bell, M.L., Rabe, B.A. The mixed model for repeated measures for cluster randomized trials: a simulation study investigating bias and type I error with missing continuous data. Trials 21, 148 (2020). https://doi.org/10.1186/s1306302041149
Received:
Accepted:
Published:
Keywords
 Missing data
 Dropout
 Variance components
 Intentiontotreat
 Cluster trials
 Group randomized trials
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate. Please note that comments may be removed without notice if they are flagged by another user or do not comply with our community guidelines.