 Methodology
 Open Access
 Open Peer Review
 Published:
Sample size reestimation in crossover trials: application to the AIM HYINFORM study
Trials volume 20, Article number: 665 (2019)
Abstract
Background
Crossover designs are commonly utilised in randomised controlled trials investigating treatments for longterm chronic illnesses. One problem with this design is its inherent repeated measures necessitate the availability of an estimate of the withinperson standard deviation (SD) to perform a sample size calculation, which may be rarely available at the design stage of a trial. Interim sample size reestimation designs can be used to help alleviate this issue by adapting the sample size midway through the trial, using accrued information in a statistically robust way.
Methods
The AIM HYINFORM study is part of the Informative Markers in Hypertension (AIM HY) Programme and comprises two crossover trials, each with a planned recruitment of 600 participants. The objective of the study is to test whether blood pressure response to first line antihypertensive treatment depends on ethnicity. An interim analysis is planned to reassess the assumptions of the planned sample size for the study. The aims of this paper are: (1) to provide a formula for sample size reestimation in both crossover trials; and (2) to present a simulation study of the planned interim analysis to investigate alternative withinperson SDs to that assumed.
Results
The AIM HYINFORM protocol sample size calculation fixes the withinperson SD to be 8 mmHg, giving > 90% power for a primary treatment effect of 4 mmHg. Using the method developed here and simulating the interim sample size reassessment, if we were to see a larger withinperson SD of 9 mmHg at interim, 640 participants for 90% power 90% of the time in the threeperiod threetreatment design would be required. Similarly, in the fourperiod fourtreatment crossover design, 602 participants would be required.
Conclusions
The formulas presented here provide a method for reestimating the sample size in crossover trials. In the context of the AIM HYINFORM study, simulating the interim analysis allows us to explore the results of a possible increase in the withinperson SD from that assumed. Simulations show that without increasing the planned sample size of 600 participants, we can reasonably still expect to achieve 80% power with a small increase in the withinperson SD from that assumed.
Trial registration
ClinicalTrials.gov, NCT02847338. Registered on 28 July 2016.
Background
Randomised crossover trials are a wellestablished design for longterm chronic illnesses such as hypertension [1]. The UK hypertension NICE guidance (CG127) stratifies hypertension treatment by age and selfdefined ethnicity (SDE), with guidelines adopting a ‘black versus white’ approach [2]. The problems with this stratification include a lack of data from UK populations supporting the current SDE stratification and no reference to South Asians – the largest ethnic minority group in the UK [2]. Consequently, the primary objective of the AIM HYINFORM study is to determine if the response to existing firstline antihypertensive drugs differs by ethnic group, white British, black African/African Caribbean or South Asian, for participants who are newly diagnosed or established hypertensives.
The AIM HYINFORM study comprises two openlabel, randomised crossover trials: one threeperiod threetreatment (monotherapy) trial for participants newly diagnosed with hypertension and one fourperiod fourtreatment (dual therapy) trial for participants with existing hypertension. The primary outcome is systolic blood pressure (SBP) mmHg; linear mixed models are the preferred method of analysis for a crossover design with a continuous outcome variable [1]. Repeated measurements of SBP taken from the same participant are correlated and this correlation needs to be accounted for in sample size calculations. That is, sample size estimation for any repeated measures design requires an estimate of the withinperson standard deviation (SD). Taking estimates of the withinperson SD from other studies can be unreliable due to differences in the study population and participant attributes, instruments and measurement techniques, or other background conditions, which can result in trials that are either under or overpowered [3, 4]. With the absence of reliable prior estimates of the withinperson SD available for the AIMHY INFORM study, in particular for South Asian ethnicities, the calculation of the required sample size to ensure the desired power to detect a single treatmentbyethnic interaction is challenging.
To address this issue in many repeated measures contexts, sample size reestimation designs have been considered [3, 5,6,7]. However, there is little, directly applicable work on sample size reestimation at interim for crossover designs. Zucker and Denne describe a strategy to deal with the unknown withinperson SD in a twostage, repeated measures design that examines the accrued data at an interim point, obtaining an estimate of the withinperson SD. They then use this estimate to update the covariance parameter in the linear mixed model and modify the sample size required to ensure the trial has sufficient power [3]. Moreover, a collection of papers have addressed sample size reestimation in bioequivalence trials using a twoperiod twotreatment crossover design [8,9,10,11]. While more recently, methodology for blinded and unblinded sample size reestimation in multitreatment crossover trials balanced for period was described [12]. None of these papers, however, directly allows for reestimation in the context of the AIM HYINFORM study, for reasons that will be described shortly.
Here, we present the framework for sample size reestimation in both our 3 × 3 (monotherapy) and 4 × 4 (dual therapy) settings. Precisely, the study design and models used are described, along with the methods developed for reestimation of the required sample size. The planned interim analysis in the AIM HYINFORM study states in the protocol that after 50 individuals have completed their treatment sequence, the sample size calculation for both the mono and dualtherapy treatment rotations will be recalculated using a midtrial estimate of the withinperson SD. Therefore, following our initial descriptions, results from a simulation study carried out ahead of trial recruitment, with the aim of simulating the interim analysis to explore the effect of a larger withinperson SD from that assumed in the protocol are presented.
Methods
Study design
AIM HYINFORM is a multicentre, prospective study comprising two randomised, openlabel crossover trials (threeperiod threetreatment monotherapy and fourperiod fourtreatment dual therapy) in a multiethnic cohort of hypertensive participants, where each study requires separate randomisation to treatment sequences [2].
Participants on both trials selfidentify into one of the following three ethnic groups (SDE):
White (white British, white Irish or any other white background);
Black or black British (black Caribbean, black African or any other black background);
Asian or Asian British (Asian Indian, Asian Pakistani, Asian Bangladeshi or any other South Asian background).
The monotherapy study is a 24week threeperiod threetreatment crossover trial of newly diagnosed hypertensives with Ambulatory Blood Pressure Monitoring (ABPM) ≥ 135/85 mmHg. After initial screening and baseline measurement collection, participants are randomised with equal allocation to one of six sequences of treatments from two threeperiod threetreatment Latin square designs: ABC; ACB; BAC; BCA; CAB; and CBA [13]. Here, treatment A is 1–2 weeks of Amlodipine 5 mg followed by 6 –7 weeks of Amlodipine 10 mg, treatment B is 1–2 weeks of Lisinopril 10 mg followed by 6–7 weeks of Lisinopril 20 mg and treatment C is approximately eight weeks of 25 mg Chlortalidone (Fig. 1).
The dualtherapy study is a 32week fourperiod fourtreatment crossover trial of established hypertensives with ABPM > 135/85 and < 200/110. After initial screening and baseline measurement collection, participants are randomised with equal allocation to one of four sequences of treatments from a fourtreatment fourperiod Williams square design: ABDC; BCAD; CDBA; and DACB. There are 24 possible Latin squares for a fourtreatment crossover design; the design used here is one of six special cases of the Latin square design which are balanced for firstorder carryover and are known as Williams Squares [13, 14].
For participants on dual therapy, treatment A is eight weeks of Amlodipine 5 mg and Lisinopril 20 mg, treatment B is eight weeks of Amlodipine 5 mg and Chlortalidone 25 mg, treatment C is eight weeks of Lisinopril 20 mg and Chlortalidone 25 mg and treatment D is eight weeks of Amiloride 10 mg and Chlortalidone 25 mg (Fig. 2).
The two randomised crossover trials (monotherapy and dual therapy) are openlabel and require separate randomisation to treatment sequence. To control for balance permuted blocks within strata are implemented with an allocation ratio: 1:1:1:1:1:1 for the monotherapy trial and 1:1:1:1 for the dualtherapy trial.
The monotherapy and dualtherapy trials are distinct and analysed using separate linear mixed models. The primary outcome of both trials is seated automated office SBP mmHg as measured eight weeks (± 4 days) after receiving each treatment. The primary objective of the trials is to test for a significant treatmentbyethnic group interaction.
With the absence of an available estimate of the withinparticipant SD for the trial participants, the protocol estimates a sample size of 600 participants assuming a fixed withinperson SD of 8 mmHg, based on previous clinical trial data in people representative of the general population with either high normal blood pressure or mild hypertension [2]. More precisely, for a withinperson SD of 8 mmHg and a single treatmentbyethnic interaction of 3 mmHg, with all other interactions being 0 mmHg, the protocol outlines that a sample size of 600 participants produces 81.3% power to detect treatmentbyethnic interactions using a global test of interaction at a 5% significance level. For the same fixed withinperson SD of 8 mmHg and a larger single treatmentbyethnic interaction of 4 mmHg, a sample size of 600 participants would give 98% power to detect a single treatmentbyethnic interaction. A 10% overrecruitment allows for loss to followup, resulting in 220 planned enrolments from each ethnic group, for each trial [3].
In order to check the assumptions that the withinperson SD is 8 mmHg, there is a planned interim analysis after approximately 50 participants have completed their treatment sequence. The aim of the interim analysis is twofold: (1) to obtain an estimate of the within participant SD from trial participants; and (2) to use this estimate to calculate the sample size required for either 80 or 90% power to detect a treatmentbyethnic interaction using a global test of interaction at a 5% significance level. The aim here is to describe the method for the sample size reestimation and present results from a simulation study carried out ahead of the interim analysis.
Sample size calculations for the protocol along with simulations, sample size and power calculations for the interim analysis were carried out using Stata Statistical Software: Release 14 (StataCorp LP, College Station, TX, USA).
Models and sample size calculations
Sample size reestimation in the threeperiod threetreatment monotherapy crossover
The linear mixed effects model for the threeperiod threetreatment crossover study has fixed effects for treatment, period, ethnic group and treatmentbyethnic group interactions. Additionally, subject is included as a random effect. This is our unrestricted model. We compare this unrestricted model with a nested restricted model that does not contain the treatmentbyethnic group interaction terms.
Restricted model, assuming n participants are recruited in total:
Unrestricted model:
Here
i = 1, …, n/18 indicates a particular individual, j = 1, 2, 3 indicates the time period, k = 1, …, 6 indicates the sequence a particular individual was allocated and l = 1, 2, 3 indicates which of the three ethnicities a particular individual selfdefined as. That is, i, k and l together completely prescribe a particular individual in the trial (there are n unique combinations of these three indices);
μ is an intercept term (the mean of the values y_{i1k1});
τ_{d(j, k)} is the direct effect of the treatment administered to a participant on sequence k in period j. That is, d(j, k) ∈ {A, B, C}. For identifiability purposes, we set τ_{A} = 0;
π_{j} is a fixed effect for period, with π_{1} = 0 for identifiability;
e_{l} is a fixed effect for ethnic group, with e_{1} = 0 for identifiability;
θ_{d(j, k)l} is a fixed interaction effect for treatment d(j, k) by ethnic group l. For identifiability, we set θ_{A1} = θ_{A2} = θ_{A3} = θ_{B1} = θ_{C1} = 0.
\( {u}_{ikl}\sim {N}_1\left(0,{\sigma}_b^2\right) \) is a random participant effect;
\( {\epsilon}_{ijkl}\sim {N}_1\left(0,{\sigma}_e^2\right) \) is the residual error.
We perform a global Wald test to see if the unrestricted model gives a significantly better fit than the restricted, in order to test the primary hypothesis of whether there is an interaction between treatment and ethnic group. To perform sample size reestimation, we require the sampling distribution of a suitable test statistic under the null and alternative hypotheses.
Precisely, setting θ = (θ_{B2}, θ_{B3}, θ_{C2}, θ_{C3})^{⊤}, we test the following null hypothesis of no treatmentbyethnic interactions
against the following alternative
Denoting by \( \hat{\boldsymbol{\theta}}={\left({\hat{\theta}}_{B2},{\hat{\theta}}_{B3},{\hat{\theta}}_{C2},{\hat{\theta}}_{C3}\right)}^{\top } \) our estimates of the interaction terms computed using conventional maximum likelihood estimation, we can test H_{0} using the following test statistic
Moreover, we power our trial for an alternative in which there is a single nonzero treatmentbyethnic interaction (θ_{B2} = δ, i.e. θ = δ = (δ, 0, 0, 0)^{⊤}).
The complexity of performing a hypothesis test in this setting, either in a fixed design or following sample size reestimation, arises from the fact that the sampling distribution of T, the random unknown value of t, is in general complex to compute. Explicitly, while the numerator degrees of freedom in a suitable Ftest would be 4, the denominator degrees of freedom is difficult to assign. Kenward and Roger [15] provided a comprehensive solution to this problem for fixed sample designs by computing the exact denominator degrees of freedom, but unfortunately their approach does not lend itself naturally to sample size reestimation, as the degrees of freedom specification procedure is datadependent.
Accordingly, Grayling et al. [12] specified the denominator degrees of freedom as that of a corresponding multilevel singlestage ANOVA design. For our unrestricted model this would designate the denominator degrees of freedom, ν, for sample size n, as
Here, the 3n term arises as the total number of measurements accrued, while 1 degree of freedom is subtracted for each participant, and for each fixed effect in the model. Thus, at the interim we suppose that we would reject H_{0} at the end of the trial when t > F^{−1}(1 − α, 0, 4, 2n − 10), where F^{−1}(q, 0, m, n) is the 100q th quantile of an F(0, a, b)distribution (a central Fdistribution on a and b degrees of freedom). Combining this with a suitable noncentral Fdistribution under the chosen alternative at which the trial is to be powered, interim reestimation can then be performed.
Such an approach, however, was found by Grayling et al. [12] to, in many circumstances, lead to a notable inflation of the typeI error rate, and frequently provide power slightly below the desired level under the specified alternative. Therefore, they discussed the utility of a potential αadjustment procedure, of using the above methodology for interim reestimation but performing the final hypothesis test using the method of Kenward and Roger [15], and they explored the advantages of a sample size inflation factor. A problem with these adjustments as individual amendments to the basic reestimation procedure described above, however, is that predicting their impact on both the typeI error rate and the power can be difficult.
Consequently, here, desiring to accurately control the typeI error rate while maintaining a high level of power, we make a heuristic conservative adjustment to the above degrees of freedom based on Hotelling’s T^{2} distribution [16].
Precisely, for x_{1}, …, x_{n}~N_{p}(ϕ, Ω), the test statistic \( {t}^2={\left(\overline{\boldsymbol{x}}\boldsymbol{\phi} \right)}^{\top}\hat{Cov}{\left(\overline{\boldsymbol{x}}\right)}^{1}\left(\overline{\boldsymbol{x}}\boldsymbol{\phi} \right)\sim {T}^2\left(p,n1\right) \), Hotelling’s T^{2} distribution on p and n − 1 degrees of freedom. We can work with this distribution using standard statistical software via the following relationship
In our case, we therefore suppose H_{0} will be rejected when
Moreover, to attain power 1 − β we can search for the minimal integer n such that
for
Such a search is easy to perform using interval bisection over the discrete n. The final problem therefore is to specify \( \hat{\mathbf{Cov}}{\left(\hat{\boldsymbol{\theta}}\right)}^{1} \). In the Appendix, we justify the use of
where \( {\hat{\sigma}}_e^2 \) is the interim estimated within person SD, computed using REML (for the reasons outlined below).
With this, our methodology for reestimating the required sample size in the 3 × 3 trial is complete. However, we must still specify how the final analysis will be performed.
Here, we use REML estimation as it takes into account the uncertainty in the fixed parameters in the model into account when estimating the random parameters, in theory leading to better estimates of the variance components with reduced bias when the number of groups is small [17]. Additionally, we use the Kenward–Roger approximation [15] to compute the denominator degrees of freedom in the final Ftest. These choices are again made in order to limit the possibility of observing inflation to the typeI error rate. As AIM HYINFORM is a confirmatory trial of treatment differences between different ethnic groups, inflation of the typeI errors should be avoided.
Sample size reestimation in the fourperiod fourtreatment dualtherapy crossover
The dualtherapy trial is a fourperiod fourtreatment crossover. It will compare the same restricted and unrestricted models as for the threeperiod threetreatment monotherapy trial. However, we now have
i = 1, …, n/12, j = 1, …, 4, k = 1, …, 4, and l = 1, 2, 3;
d(j, k) ∈ {A, B, C, D};
For identifiability, we set θ_{A1} = θ_{A2} = θ_{A3} = θ_{B1} = θ_{C1} = θ_{D1} = 0.
Here, setting θ = (θ_{B2}, θ_{B3}, θ_{C2}, θ_{C3}, θ_{D2}, θ_{D3})^{⊤}, we again test the following hypotheses
For \( \hat{\boldsymbol{\theta}}={\left({\hat{\theta}}_{B2},{\hat{\theta}}_{B3},{\hat{\theta}}_{C2},{\hat{\theta}}_{C3},{\hat{\theta}}_{D2},{\hat{\theta}}_{D3}\right)}^{\top } \) our test is once more based upon
Applying the Hotelling’s T^{2} based adjustment described above, in this case at interim we suppose H_{0} will be rejected when
and choose the required sample size by determining the minimal n such that
for
with δ = (δ, 0, 0, 0, 0, 0)^{⊤}, and where in the Appendix we now justify the use of
Finally, as for the 3 × 3 trial, the final analysis is performed using REML estimation and the Kenward–Roger correction.
At the interim, 50 participants have completed either three treatment periods if they are on the monotherapy trial, or four treatment periods if they are on the dualtherapy trial. Inherently, we have more data from the fourtreatment crossover and may anticipate better performance in this setting. It is important to note that at this stage we are not concerned with estimating treatment effects, we simply want the estimate, \( {\hat{\sigma}}_e^2 \), of the withinperson variance required for the sample size calculations outlined above.
Simulation design
To consider the variation in the estimate of the withinperson SD, a simulation study was carried out to investigate the effect on the requisite final sample size for different treatment effects and withinperson SD scenarios, based on the sample size reestimation calculations described above. In the protocol, the sample size calculation fixes the SD to be 8 mmHg giving power equal to 98% for a single 4 mmHg interaction effect with 600 participants, and 81.3% power with 600 participants and a single 3 mmHg interaction effect.
Two separate simulation studies have been carried out assuming participants are on either the monotherapy or dualtherapy treatment regimes, in all cases assuming that each participant has received three (monotherapy) or four (dualtherapy) treatments and that there are no missing values. Participants are randomly generated with approximately equal numbers of participants on each of the six treatment sequences—ABC, ACB, BAC, BCA, CAB and CBA—for monotherapy and four treatment sequences—ABDC, BCAD, CDBA and DACB—for dual therapy. We consider scenarios in which 80% and 90% power to detect treatmentbyethnic interactions are desired, using our outlined global test of interaction at a 5% significance level are assumed.
What is important to realise is that in sample size calculations the withinperson SD is typically fixed. In our simulation study, we set the desired power, the strength of a single treatmentbyethnic interaction and a withinperson SD, and solve for sample size across numerous replicates. Accordingly, of principle interest is the distribution of the interim estimated values of the withinparticipant SD and the distribution of the final required sample sizes.
The factors varied and held constant in these simulations are outlined in Table 1, resulting in 2 × 2 = 4 scenarios to assess for each of the two trials, with 1000 simulations carried out for each scenariotrial combination. We assume that there is no carryover effect and no treatmentbyperiod interaction. The betweenperson SD is held constant at 10 mmHg, we do not need to consider varying this as the procedures should be invariant to σ_{b} with a sample size of 50.
Our methodology for reestimating the sample size requires a minimum acceptable sample size. In the protocol, a sample size of 600 is estimated for each monotherapy and dualtherapy trial, based on a withinperson SD of 8 mmHg. As there is no plan to reduce the sample size at interim, we therefore set a minimum sample size of 600 for all simulations and scenarios investigated. Matlab and Stata code for sample size calculations can be found in Additional files 1, 2, 3 and 4.
Results
Simulation of the interim analysis and reestimation of the sample size: monotherapy
In the trial protocol, the sample size calculation fixed the SD to be 8 mmHg, giving > 80% power with 600 participants. With simulations, sampling variation means that we have a range of possible estimates of withinperson SD which means that in this case 90% of the time 640 participants would usually be fine to give us 90% power to detect the primary effect for a within person SD of 9 mmHg and a single treatmentbyethnic interaction of 4 mmHg (Fig. 3). A larger withinperson SD of 10 mmHg would mean that in 75% of the time 705 participants would usually be fine to give us 90% power to detect the same planned treatment effect (Fig. 3).
Assuming a smaller single treatmentbyethnic interaction of 3 mmHg, and a 1 mmHg increase in the assumed withinperson SD, 75% of the time with a sample size of 797 participants would give us 80% power to detect the primary hypothesis. A 2 mmHg increase in withinperson SD from that assumed in the protocol means that 75% of the time with a sample size of 979 participants would give us 80% power detect a treatment effect of 3 mmHg (Fig. 3). As would be expected, smaller values of δ, and larger values of σ_{e}, imply larger required sample sizes. Figure 3 shows for the scenario with σ_{e} = 9 mmHg and δ = 4 mmHg more than the planned 600 particpiants are required because of the nature of sample size reestimation and the methodology used here. The conservative approach adopted here to try and control the typeI error rate means we have to push the sample size up a little to keep the typeII error rate down. That is, the use of Kenward–Roger and the Hotelling adjustments pushes the power down compared to the methodology used for the initial 600 estimate, so it is inevitable that the simulations here produce > 600 for the sample size reestimation designs.
Simulation of the interim analysis and reestimation of the sample size: dual therapy
For the dualtherapy trial, sample sizes required for the different simulation scenarios are similar to those estimated for the monotherapy trial. For 90% power to detect a treatment effect of 4 mmHg, 90% of the time 602 participants would usually be fine when the withinperson SD is 9 mmHg. A larger withinperson SD of 10 mmHg would mean that 75% of the time 692 participants would usually be fine to give us 90% power to detect the same treatment effect (Fig. 4). In both scenarios, the dualtherapy trial requires slightly fewer participants.
Assuming a smaller single treatmentbyethnic interaction of 3 mmHg, a 1 mmHg increase in the assumed withinperson SD 75% of the time a sample size of 782 participants would give us 80% power to detect the primary hypothesis. A 2 mmHg increase in withinperson SD from that assumed in the protocol means that 75% of trials with a sample size of 966 participants would give us 80% power detect a planned treatment effect of 3 mmHg; again, in both scenarios, fewer participants are required than in the monotherapy trial, as a result of having more measurements overall in the dualtherapy trial.
In summary, for the same simulated treatmentbyethnic interaction an increase in withinperson SD requires a large sample size. For the same simulated withinperson SD, a smaller planned treatment effect also requires a larger sample size, which is what would be expected from sample size calculation theory. The fact that the dualtherapy trial when compared with the monotherapy trial requires slightly fewer participants when comparing likeforlike is a result of the increased number of denominator degrees of freedom in the sample size calculation for the fourperiod fourtreatment compared with the threeperiod threetreatment crossover: 3n – 14 = 136 compared with 2n – 10 = 90 when n = 50. The increased number in the denominator degrees of freedom in the fourperiod fourtreatment compared with the threeperiod threetreatment crossover is in turn due to the increased number of observations per participant in the fourbyfour crossover trial.
Discussion
A novel method for sample size reestimation has been described for threeperiod threetreatment and fourperiod fourtreatment crossover trials. Here we have dealt with a more complicated covariance matrix in a 3 × 3 and 4 × 4 randomised crossover setting that incorporates both a global test and allows for interaction terms in the linear mixed model.
The simulation study allowed us to explore the outcome for a possible increase in the withinperson SD from that assumed and used for the sample size calculations in the protocol ahead of trial recruitment and the interim analysis. As would be expected, an increase in the withinperson SD or a smaller primary treatment effect would require a larger sample size. The fact that the dualtherapy trial requires slightly fewer participants than the monotherapy trial when all variables are like for like is a consequence of the increased degrees of freedom in the denominator of the Ftest which is used in the sample size calculation. The simulation of the interim sample size indicates that we can only realistically aim for 80% in these scenarios without increasing the sample size above 600 participants.
Conclusions
The formulas presented here provide a means for reestimating the sample size in both threeperiod threetreatment and fourperiod fourtreatment crossover trials. In the context of the AIM HYINFORM study, simulating the planned interim analysis allows us to explore the outcome for a possible increase in the withinperson SD from that assumed in the protocol. Simulations show that without increasing the planned sample size of 600 participants, on each crossover trial, we can reasonably still expect to achieve 80% power with a small increased in the within person SD from that assumed.
Availability of data and materials
The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.
Abbreviations
 ABPM:

Ambulatory blood pressure monitoring (mmHg)
 AIM HYINFORM:

ComparIsoN oF Optimal HypeRtension RegiMens (Part of the Informative Markers in Hypertension (AIM HY) Programme – AIM HYINFORM)
 SBP:

Systolic blood pressure (mmHg)
References
 1.
Jones B, Kenward MG. Design and analysis of crossover trials. 3rd ed. Boca Raton: Chapman and Hall/ CRC Press; 2014.
 2.
Mukhtar O, Cheriyan J, Cockcroft JR, Collier D, Coulson JM, Dasgupta I, et al. A randomized controlled crossover trial evaluating differential responses to antihypertensive drugs (used as mono or dual therapy) on the basis of ethnicity: The comparIsoN oF Optimal Hypertension RegiMens; part of the Ancestry Informative Markers in HYpertension program—AIMHY INFORM trial. Am Heart J. 2018;204:102–8.
 3.
Zucker DM, Denne J. Samplesize redetermination for repeated measures studies. Biometrics. 2002;58:548–59.
 4.
Lenth RV. Some practical guidelines for effective sample size determination. Am Stat. 2001;55(3):187–93.
 5.
Guo Y, Logan HL, Glueck DH, Muller KE. Selecting a sample size for studies with repeated measures. BMC Med Res Methodol. 2013;13:100. https://doi.org/10.1186/1471228813100.
 6.
Lui GF. Sample size calculations for studies with correlated observations. Biometrics. 1997;53(30):937–47.
 7.
Shih WJ, Gould AL. Reevaluation design specifications of longitudinal clinical trials without unblinding when the key response is rate of change. Stat Med. 1995;14:2239–48.
 8.
Potvin D, DiLiberti CE, Hauck WW, Parr AF, Schuirmann DJ, Smith RA. Sequential design approaches for bioequivalence studies with crossover designs. Pharm Stat. 2008;7:245–62.
 9.
Montague TH, Potvin D, Diliberti CE, Hauck WW, Parr AF, Schuirmann DJ. Additional results for ‘Sequential design approaches for bioequivalence studies with crossover designs’. Pharm Stat. 2012;11:8–13.
 10.
Golkowski D, Friede T, Kieser M. Blinded sample size reestimation in crossover bioequivalence trials. Pharm Stat. 2014;13:157–62.
 11.
Xu J, Audet C, DiLiberti CE, Hauck WW, Montague TH, Parr AF, et al. Optimal adaptive sequential designs for crossover bioequivalence studies. Pharm Stat. 2016;15:15–27.
 12.
Grayling MJ, Mander AP, Wason JMS. Blinded and unblinded sample size reestimation in crossover trials balanced for period. Biom J. 2018;60:917–33.
 13.
Senn S. Crossover trials in clinical research. 2nd ed. Chichester: Wiley; 2002.
 14.
Williams EJ. Experimental designs balanced for the estimation of residual effects. Aust J Sci Res. 1949;2:149–68.
 15.
Kenward MG, Roger JH. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics. 1997;53:983–97.
 16.
Hotelling H. The generalization of Student’s ratio. Ann Math Statist. 1931;2(3):360–78.
 17.
Maas CJM, Hox JJ. Sufficient sample sizes for multilevel modelling. Methodology. 2005;1(3):86–92.
 18.
Fitzmaurice GM, Laird NM, and Ware JH. Applied Longitudinal Analysis, NJ: Wiley. Second edition; 2011.
Acknowledgements
We would like to acknowledge funding from the Medical Research Council, the British Heart Foundation and sponsorship from Cambridge University Hospitals NHS Foundation Trust for the Ancestry and biological Informative Markers for stratification of Hypertension: the (AIM HY) Study, and Chief Investigators Prof. Ian B. Wilkinson and Prof. Phil J. Chowienczyk. JW would like to thank Professor Ian White and Dr. Tim Morris for simulation study methods advice. JW is supported by the Medical Research Council grant reference MR/M016560/1, M.J.G. and A.P.M are supported by Medical Research Council grant reference MC_UU_00002/3. We are grateful to Caroline Fairhurst and Helen Mossop for taking the time to review our work and for their constructive comments that have improved this manuscript.
Funding
The AIM HY program is cofunded by the Medical Research Council and British Heart Foundation (Medical Research Council Reference: MR/M016560/1; EUDRACT number 2016–00016523).
Author information
Affiliations
Contributions
MJG developed the method for sample size reassessment. JW produced all simulation results and analysis of the data. JW drafted the manuscript. MJG and APM critically revised the manuscript for important intellectual content. All authors approved the final version.
Corresponding author
Correspondence to Julie Wych.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Consent for publication was not required as no individual person data are reported.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Appendix
Here, we justify our specified values for \( \hat{\mathbf{Cov}}\left(\hat{\boldsymbol{\theta}}\right) \), used in the sample size reestimation procedures. To begin, consider the 3 × 3 trial and cast our unrestricted linear mixed model in the form
where β = (μ, π_{2}, π_{3}, τ_{B}, τ_{C}, e_{2}, e_{3}, θ_{B2}, θ_{B3}, θ_{C2}, θ_{C3})^{⊤}. Then, the maximum likelihood estimator of the fixed effects β at the end of trial is
with Σ = ZCov(u)Z^{T} + Cov(ϵ). We then have
See, for example, Fitzmaurice et al. [18] for further details.
In our case, \( \mathbf{Cov}\left(\boldsymbol{u}\right)={\sigma}_b^2{\boldsymbol{I}}_n \) and \( \mathbf{Cov}\left(\boldsymbol{\epsilon} \right)={\sigma}_e^2{\boldsymbol{I}}_{3n}. \) This implies that Σ is block diagonal, with n blocks of the 3 × 3 matrix Σ_{3}, say, where
It can then be shown by verifying \( {\boldsymbol{\Sigma}}_3{\boldsymbol{\Sigma}}_3^{1}={\boldsymbol{I}}_3 \) that
Now
where X_{kl} is the design matrix for a single individual on treatment k, of ethnicity l.
Thus \( \mathbf{Cov}\left(\hat{\boldsymbol{\beta}}\right) \) can be found by evaluating the above expression, which can be readily achieved computationally using a symbolic algebra package. Performing these calculations in Matlab, and extracting the subcomponent corresponding to \( \hat{\boldsymbol{\theta}} \), we identified that
It is for this reason that we utilise
in our reestimation procedure.
Equivalent calculations for the 4 × 4 trial also demonstrate that
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Wych, J., Grayling, M.J. & Mander, A.P. Sample size reestimation in crossover trials: application to the AIM HYINFORM study. Trials 20, 665 (2019) doi:10.1186/s1306301937246
Received
Accepted
Published
DOI
Keywords
 Crossover trial
 Sample size calculation
 Simulation study
 AIM HYINFORM study
 Hypertension
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate. Please note that comments may be removed without notice if they are flagged by another user or do not comply with our community guidelines.