A randomised trial of adaptive pacing therapy, cognitive behaviour therapy, graded exercise, and specialist medical care for chronic fatigue syndrome (PACE): statistical analysis plan

Background The publication of protocols by medical journals is increasingly becoming an accepted means for promoting good quality research and maximising transparency. Recently, Finfer and Bellomo have suggested the publication of statistical analysis plans (SAPs). The aim of this paper is to make public and to report in detail the planned analyses that were approved by the Trial Steering Committee in May 2010 for the principal papers of the PACE (Pacing, graded Activity, and Cognitive behaviour therapy: a randomised Evaluation) trial, a treatment trial for chronic fatigue syndrome. It illustrates planned analyses of a complex intervention trial that allows for the impact of clustering by care providers, where multiple care providers are present for each patient in some but not all arms of the trial.
Results The trial design, objectives and data collection are reported. Considerations relating to blinding, samples, adherence to the protocol, stratification, centre and other clustering effects, missing data, multiplicity and compliance are described. Descriptive, interim and final analyses of the primary and secondary outcomes are then outlined.
Conclusions This SAP maximises transparency, providing a record of all planned analyses, and it may be a resource for those who are developing SAPs, acting as an illustrative example for teaching and methodological research. It is not the sum of the statistical analysis sections of the principal papers, being completed well before individual papers were drafted.
Trial registration ISRCTN54285094 assigned 22 May 2003; first participant was randomised on 18 March 2005.

documentation [7]. More recently, calls have been made for the publication of other key trial documentation. Chan, for instance, has argued the case for public access to regulatory agency submissions [8]. In an editorial for Critical Care and Resuscitation, Finfer and Bellomo suggested the publication of statistical analysis plans [9]. The plans for the NICE-SUGAR (Normoglycaemia in Intensive Care Evaluation and Survival Using Glucose Algorithm Regulation) and RENAL (Randomised Evaluation of Normal versus Augmented Level of Replacement Therapy) studies [10,11] were published in the same issue.
A statistical analysis plan (SAP) is defined within the International Conference on Harmonisation's guidance on the statistical principles for clinical trials (ICH E9) as 'a document that contains a more technical and detailed elaboration of the principal features of the analysis described in the protocol, and includes detailed procedures for executing the statistical analysis of the primary and secondary variables and other data' [12]. According to ICH E9, the statistical analysis plan should be prespecified, completed after the protocol has been finalised but reviewed and possibly updated as a result of a blind review of the data carried out after the completion of data collection. It is suggested that details of the primary analysis should be clearly distinguished from those of supporting analyses and that the methods for handling missing data, outliers and multiplicity be described [12]. While the statistical analysis plan is clearly an important document, at present it is rarely made available to people outside of the study.
There are many reasons why study-specific statistical analysis plans should be published in full, with electronic journals offering the greatest potential for this to be commonplace. Due to space constraints, the paper providing the principal results often contains only a very limited description of the analyses that were planned or carried out. If the study protocol is published, further information is likely to be available. However, this is often insufficient to enable full replication of the analyses. The statistical analysis plan complements both the protocol and the principal paper by providing a systematic and comprehensive description of the planned analyses, taking into consideration any relevant methodological or clinical developments that may have arisen since the study's inception. Its publication enables any changes to the original plan to be laid out, increasing the scientific rigour and transparency with which the principal analyses are currently reported.
Maximum transparency regarding what decisions were made a priori could be achieved by publishing the statistical analysis plan, which has been approved by the Trial Steering Committee (TSC), before the results of a study are known. The final analyses reported may differ from those planned, allowing for post-hoc analysis where it is indicated (as Finfer and Bellomo [9] have noted), reporting alternative methods if statistical models do not converge, and omitting planned analyses that are superseded, redundant, or no longer of interest. Assessment of the validity of the analyses, reporting and consequent interpretation would also be made easier by the increased visibility of selective or misreporting. This may, in turn, encourage more balanced, accurate and complete reporting of results and ultimately help to raise the standard of trial analyses. Peer review has particular advantages, as it encourages dialogue, the quality of which is likely to be improved by the level of detail given. Knowledge of this added scrutiny should, in turn, act to promote the quality of the submitted plan. This process would be especially valuable if the research is anticipated to generate debate or if it might have a large impact on clinical practice.
The benefits of publication go beyond those specific to the study. Making statistical analysis plans accessible will help future statisticians and other researchers design and analyse better studies. This is because each study throws up different issues, often more complex than the standard textbook ones. Publishing details of the ways in which different groups choose to address these helps to generate discussion and could also promote greater communication and collaboration between methodologists, applied statisticians and researchers.

The PACE trial
The rationale for the trial is outlined in the protocol [13] and main clinical paper [14]. To be brief, chronic fatigue syndrome is characterised by chronic disabling fatigue in the absence of an alternative diagnosis, present in 0.2 to 2.6% of the population. The National Institute for Health and Clinical Excellence (NICE, UK) recommends two treatments: cognitive behaviour therapy (CBT) and graded exercise therapy (GET), but patient organisations recommend a third treatment: adaptive pacing therapy (APT). A definitive randomised trial was therefore needed to compare all three treatments with standardised specialist medical care (SSMC) and to compare the established treatments (CBT, GET) against the new treatment (APT).
The objective of this paper was to make public and report in detail the planned analyses for the principal papers of the PACE (Pacing, graded Activity, and Cognitive behaviour therapy; a randomised Evaluation) trial, using the template statistical analysis plan developed by the Mental Health and Neuroscience (MH&N) Clinical Trials Unit based at the Institute of Psychiatry. These planned analyses were written with a view to publication and are reproduced almost as they were approved by the Trial Steering Committee (Version 1.2 dated 2 May 2010) prior to database lock. The changes from the original document were editorial clarifications suggested by reviewers and editors, for which we are most grateful; these changes in no way alter the strategy for analysis. The SAP supplements the published protocol [13], the main clinical [14] and health economics [15] papers and the authors' reply [16] to a selection of correspondence published by the Lancet [17][18][19][20][21][22][23][24]. It also provides an illustration of the planned analyses of a complex intervention trial taking into account the impact of clustering by care providers, where multiple care providers are present for each patient in some but not all arms of the trial. Details of the statistical aspects of multiple therapist-per-patient designs are published elsewhere [25].

Statistical analysis plan
Introduction
Purpose and scope of statistical analysis strategy This document details the presentation and analysis strategy for the principal paper(s) reporting results from the PACE Trial. It is intended that the results reported in these papers will follow the strategy set out here; subsequent papers of a more exploratory nature will not be bound by this strategy but will be expected to follow the broad principles laid down for the principal papers. The principles are not intended to curtail exploratory analysis or to prohibit sensible statistical and reporting practices, but they are intended to establish the strategy that will be followed, as closely as possible, when analysing and reporting the trial. Reference was made to the published trial protocol [13], ICH Guidance on Statistical Principles for Clinical Trials (E9) [12], CPMP points to consider on multiplicity [26], and CONSORT guidelines for the reporting of harms [27] and for non-pharmacological treatment trials [28].
Analysis strategy group The Statistical Analysis Strategy was developed by the PACE Analysis Strategy Group whose members were:
Convention Throughout this Statistical Analysis Strategy the four individual randomised interventions are referred to as APT (adaptive pacing therapy plus standardised specialist medical care), CBT (cognitive behaviour therapy plus standardised specialist medical care), GET (graded exercise therapy plus standardised specialist medical care), and SSMC (standardised specialist medical care alone).
Unless stated otherwise 'intervention' refers to the four randomised interventions (group), and 'therapy' refers to APT, CBT, or GET. 'Treatment' is used more generally and embraces all forms including drugs.
The anchoring date for visits and assessments is randomisation; thus 24 weeks refers to 24 weeks from randomisation.

Trial design and objectives
Study objectives The PACE trial aims to answer the questions set out below under primary objectives, secondary objectives, and health economics objectives.
Primary objectives:
1. Is APT more effective than SSMC in reducing (i) fatigue or (ii) disability up to 52 weeks from randomisation?
2. Is CBT more effective than SSMC in reducing (i) fatigue or (ii) disability up to 52 weeks from randomisation?
3. Is GET more effective than SSMC in reducing (i) fatigue or (ii) disability up to 52 weeks from randomisation?
4. Is CBT more effective than APT in reducing (i) fatigue or (ii) disability up to 52 weeks from randomisation?
5. Is GET more effective than APT in reducing (i) fatigue or (ii) disability up to 52 weeks from randomisation?
Secondary objectives:
1. Is the pattern of results relating to the primary objectives replicated with the outcome as the participants' self-rated clinical global impression change rating?
2. Do different interventions have differential effects on the two primary outcomes (that is, fatigue versus disability)?
3. Are the differences across interventions in the primary outcomes associated with similar differences in secondary outcomes?

Health economics objectives
The primary health economics objectives are as indicated below:
Secondary outcome measures include safety outcomes, efficacy outcomes and health economics outcomes.
Safety outcomes are:
1. Serious deterioration (primary) defined as one or more of the following up to 52 weeks:
a. SF-36 physical function score diminishing by 20 or more points between baseline and any two consecutive assessment interviews.
b. Participant-rated CGI change score of "much worse" or "very much worse" at two consecutive assessment interviews.
c. Withdrawal from therapy (APT, CBT, or GET) later than 8 [36].
A selection of the above efficacy outcomes will be reported in the primary paper as required to aid interpretation of the primary outcomes; other secondary outcomes will be reported in subsequent papers. The selection will, in part, be determined by space constraints.

Trial design
1. Adaptive pacing therapy + standardised specialist medical care (APT). APT is based on the illness model of CFS/ME as a currently undetermined organic disease, with the assumption that APT can improve quality of life while not affecting the core disease other than providing the best conditions for natural recovery. APT is essentially an energy management approach, which involves assessment of the link between activity and subsequent symptoms and disability, using a daily diary, with advice to plan and pace activity to avoid exacerbations.
2. Cognitive behavioural therapy + standardised specialist medical care (CBT). CBT is based on the illness model of fear avoidance, used in the three previous trials of CBT [39][40][41]. There are three essential elements: (a) assessment of illness beliefs and coping strategies; (b) structuring of daily rest, sleep, and activity, with a graduated return to normal activity; and (c) collaborative challenging of unhelpful beliefs about symptoms and activity.
3. Graded exercise therapy + standardised specialist medical care (GET). GET is based on the illness model of deconditioning and exercise intolerance, used in the previous trials [42,43]. Therapy involves an assessment of physical capacity, negotiation of an individually designed home exercise programme with set target heart rates and times, and participant feedback with mutual planning of the next fortnight's exercise programme.
4. Standardised specialist medical care alone (SSMC). SSMC is given to all participants and includes visits to the clinic doctor with general, but not specific, advice regarding activity and rest management, such as advice to avoid the extremes of exercise and rest, as well as symptomatic pharmacotherapy. SSMC is standardised in the SSMC Doctor's Manual. SSMC participants, like all other participants, will already have received the patient clinic leaflet (PCL). There will be no additional therapist involvement, and, in particular, there will be no diary monitoring with consequent advice.
Participating centres: Table 1 gives the details of the participating centres, including their IDs.

Sample size calculation taken from the protocol (v5.2)
The following is quoted from the PACE trial protocol (v5.2) (see also [13]) and describes sample size estimation based on percentages responding to the trial interventions. The primary outcomes were changed subsequently to measures on continuous scales.
At one year we assume that 60% will respond with CBT, 50% with GET, 25% with APT, and 10% with SSMC. The existing evidence suggests that at one-year follow-up, 50 to 63% of participants with CFS/ME had a positive outcome, by intention to treat, in the three RCTs of rehabilitative CBT [39][40][41] with 69% improved after an educational rehabilitation that closely resembled CBT [44]. This compares with 18 to 63% improved in the two RCTs of GET [42,43] and 47% improvement in a clinical audit of GET [45]. For usual medical care 6 to 17% improved by one year in two RCTs [40,41]. There are no previous RCTs of APT to guide us, but we estimate that APT will be at least as effective as the control therapy of relaxation and flexibility used in previous RCTs, with 26 to 27% improved on primary outcomes [39,43].
Our planned intention to treat analyses will compare APT against SSMC alone, and both CBT and GET against APT. Assuming α = 5% and a power of 90%, we require a minimum of 135 participants in the SSMC alone and APT groups, 80 participants in the GET group and 40 in the CBT group [46]. However these last two numbers are insufficient to study predictors, process, or cost-effectiveness. We will have low statistical power to detect the difference between CBT and GET, though our estimates will be useful in planning future trials. As an example, to detect a difference in responder rates of 50 and 60%, with 90% power, would require 520 participants per group; numbers beyond a realistic two-arm trial. Therefore, we will study equal numbers of 135 participants in each of the four arms, which gives us greater than 90% power to study differences in efficacy between APT and both CBT and GET. We will adjust our numbers for dropouts, at the same time as designing the trial and its management to minimise dropouts. Dropout rates were 12 and 33% in the two studies of GET [42,43] and 3, 10, and 40% in the three studies of rehabilitative CBT [39][40][41]. On the basis of our own previous trials we estimate a dropout rate of 10%. We therefore require approximately 150 participants in each intervention group, or 600 participants in all. Calculation of the sample size required to detect economic differences between intervention groups requires data on cost per change in outcome, which are not currently available.
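The arithmetic behind the quoted group sizes can be approximated as follows. This is a minimal sketch, assuming a two-sided normal-approximation formula for comparing two proportions; the protocol's exact method is the one cited as [46], so the results are close to, but not necessarily identical to, the quoted figures of 135, 80, 40 and 520. The function names `n_per_group` and `inflate_for_dropout` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Per-group sample size for a two-sided comparison of two proportions.

    Assumption: pooled-variance normal approximation, no continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


def inflate_for_dropout(n: int, dropout: float) -> int:
    """Number to randomise so that about n participants remain after dropout."""
    # The small tolerance guards against floating-point noise before rounding up.
    return ceil(n / (1 - dropout) - 1e-9)


if __name__ == "__main__":
    # Assumed responder rates quoted from the protocol: CBT 60%, GET 50%, APT 25%, SSMC 10%.
    comparisons = [
        ("APT vs SSMC", 0.25, 0.10),
        ("GET vs APT", 0.50, 0.25),
        ("CBT vs APT", 0.60, 0.25),
        ("CBT vs GET", 0.60, 0.50),
    ]
    for label, p1, p2 in comparisons:
        print(label, n_per_group(p1, p2))
    print("per arm after 10% dropout:", inflate_for_dropout(135, 0.10))
```

The last line reproduces the step from 135 analysable participants per arm to approximately 150 randomised per arm under the assumed 10% dropout rate.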
Stratification at randomisation Allocation of interventions to participants was by minimisation with a random component [47] and four stratification factors:
Centre (6 strata): 1 and 4, 2, 3, 5, 6, 7
CDC Criteria (2 strata): Met or unmet
London Criteria (2 strata): Met or unmet
Current Depressive Disorder (2 strata): Present or absent
Participants found to be incorrectly stratified will be kept in their original strata for the primary analysis in accordance with the principle of intention-to-treat (ITT) [48]. The extent of incorrect stratification will be reported.
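For readers unfamiliar with minimisation, the general idea can be sketched as below. This is an illustration only, not the trial's allocation algorithm: the imbalance measure (a simple sum of marginal counts), the probability `p_best` of choosing a least-imbalanced arm, and the class name `Minimiser` are all assumptions, since the plan cites [47] without giving these details.

```python
import random
from collections import defaultdict

ARMS = ("APT", "CBT", "GET", "SSMC")


class Minimiser:
    """Illustrative minimisation with a random component (hypothetical settings)."""

    def __init__(self, p_best: float = 0.8, seed=None):
        self.p_best = p_best  # assumed probability of taking a least-imbalanced arm
        self.rng = random.Random(seed)
        # counts[(factor, level)][arm] = participants already allocated
        self.counts = defaultdict(lambda: dict.fromkeys(ARMS, 0))

    def imbalance(self, profile: dict) -> dict:
        """Sum, per arm, of earlier participants sharing each of the new participant's levels."""
        return {
            arm: sum(self.counts[(f, lvl)][arm] for f, lvl in profile.items())
            for arm in ARMS
        }

    def allocate(self, profile: dict) -> str:
        totals = self.imbalance(profile)
        lowest = min(totals.values())
        preferred = [a for a in ARMS if totals[a] == lowest]
        others = [a for a in ARMS if totals[a] != lowest]
        if others and self.rng.random() >= self.p_best:
            arm = self.rng.choice(others)      # random component
        else:
            arm = self.rng.choice(preferred)   # minimise marginal imbalance
        for f, lvl in profile.items():
            self.counts[(f, lvl)][arm] += 1
        return arm


if __name__ == "__main__":
    m = Minimiser(seed=2010)
    new_participant = {"centre": "2", "cdc": "met", "london": "unmet", "depression": "absent"}
    print(m.allocate(new_participant))
```

The four keys in `new_participant` mirror the four stratification factors listed above; the level labels are placeholders.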

Data collection
Screening measures A clinic patient log book was kept of all new CFS/ME referrals to trial centres to facilitate monitoring of recruitment to the trial. Screening information will not be used in the analysis except for:
1. Reasons for patients not taking part in the trial (see Participant Flow).
2. Chalder Fatigue Questionnaire and SF-36 Physical Functioning subscale scores where these are not available at baseline (see Method for Handling Dropouts and Missing Data).
Baseline and outcome measures The information collected at baseline and follow-up is presented in Table 2.
In addition, details were recorded for i) training sessions, ii) therapist competency, iii) quality control checks of therapy sessions, and iv) homework compliance assessments that were made after every therapy session and that will be summarised as part of the general description of intended intervention policies.
Primary and secondary outcome variables will be derived from the follow-up data at each relevant timepoint as follows.
Primary outcomes: i) Fatigue total score (Likert scoring, higher scores indicate more fatigue).
ii) Physical disability total score (sum of 10 items multiplied by 5, lower scores indicate more disability).
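The two primary outcome scores can be derived along the following lines. This is a minimal sketch under stated assumptions: fatigue items are taken to be coded 0 to 3 under Likert scoring, and the 10 physical function items are taken to be coded 0 to 2 so that the total runs from 0 to 100. Neither coding is spelled out in this section, and the function names are hypothetical.

```python
from typing import Sequence


def fatigue_total(items: Sequence[int]) -> int:
    """Fatigue total under Likert scoring; higher scores indicate more fatigue.

    Assumption: each item is coded 0, 1, 2 or 3.
    """
    if any(item not in (0, 1, 2, 3) for item in items):
        raise ValueError("fatigue items must be coded 0-3")
    return sum(items)


def physical_function_total(items: Sequence[int]) -> int:
    """Sum of 10 items multiplied by 5; lower scores indicate more disability.

    Assumption: each item is coded 0, 1 or 2, giving a 0-100 scale.
    """
    if len(items) != 10:
        raise ValueError("expected exactly 10 physical function items")
    if any(item not in (0, 1, 2) for item in items):
        raise ValueError("physical function items must be coded 0-2")
    return 5 * sum(items)


if __name__ == "__main__":
    print(physical_function_total([2, 2, 1, 1, 0, 1, 2, 2, 1, 1]))
```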
Secondary outcomes are presented in Table 3.
Trial periods (recruitment and follow-up) Recruitment was initially intended to be ongoing for 36 months, with three centres recruiting during the first 12 months, six centres recruiting during the subsequent 24 months, and three centres recruiting at twice the annual rate during the last 12 months. Due to a funding extension, a seventh centre (Bristol) was added and recruitment was ongoing for 45 months. SSMC is ongoing over 52 weeks; therapy in APT, CBT and GET is ongoing for the first 23 weeks with one booster session between 36 and 52 weeks. Participants are followed up at 12, 24, and 52 weeks.
Research visit window definitions Screening data are collected prior to baseline visit 1; baseline data are collected prior to randomisation. Baseline visit 2 is at least one week after baseline visit 1. Baseline CFQ and SF-36PF should be collected within one month prior to randomisation. Follow-up data should be collected within one week of the expected date where possible.
Week 52 follow-up data can be collected at any time after week 52 with no specified upper time limit other than the end of 52 week data collection for the trial (31 January 2010).
When research visits fall outside of the guidance window, they will be analysed according to the most appropriate time point. Specifically, planned visits taking place up to 18 weeks will be used for the 12-week data, while the closest planned visit will be used for the 24- and 52-week data. If data from a planned visit are missing, data from a previous unscheduled visit can be used instead.
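One possible reading of the visit-window rule is sketched below: a visit taking place up to 18 weeks after randomisation is mapped to the 12-week time point, and a later visit is mapped to whichever of 24 or 52 weeks is closer. The function name `assign_timepoint` and the tie-break at 38 weeks (resolved towards 24 weeks) are assumptions, not part of the plan.

```python
def assign_timepoint(weeks_since_randomisation: float) -> int:
    """Map a visit to a nominal follow-up time point (12, 24 or 52 weeks)."""
    if weeks_since_randomisation < 0:
        raise ValueError("follow-up visits cannot precede randomisation")
    if weeks_since_randomisation <= 18:
        return 12
    # Closest of the remaining nominal time points; min() keeps 24 on an exact tie.
    return min((24, 52), key=lambda t: abs(weeks_since_randomisation - t))


if __name__ == "__main__":
    for weeks in (11, 17.5, 20, 30, 45, 60):
        print(weeks, "->", assign_timepoint(weeks))
```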
Visit windows will be summarised to indicate whether their distribution is similar across interventions; the use of unscheduled visits will also be summarised.
Where variation in visit times is large, or the average visit time differs across interventions, time will be fitted as a continuous instead of a categorical variable. This decision will be made by a consensus judgement of the authors.

General considerations
Blinding of the statistical analysis This document has been developed without reference to the PACE trial database. No analyses of outcomes relating to this strategy have been, or will be, conducted prior to final written approval of the analysis strategy by the TSC. Reports have been prepared with data presented descriptively by intervention (coded to maintain blinding) for the closed sessions of the Data Monitoring Committee (DMC). Consequently, both DMC and TSC were blind to intervention group, as were the trial statisticians. Data cleaning will be performed as blind to intervention allocation as possible. Decisions made during analysis concerning data or additional analyses will be documented.
Table 2 Timing of research assessments (Continued). Footnotes: a Baseline was conducted over two research visits prior to randomisation; b The SIQ is now known as the Cognitive and Behavioural Questionnaire; c The therapist and doctor data will be kept separate from the trial database and summarised by the
Trial samples Numbers (and percentages) of participants satisfying the following definitions will be reported overall and by intervention.
Intention-to-treat sample The intention-to-treat (ITT) sample is defined as all participants who were randomised into the trial included in the intervention to which they were randomised, regardless of the presence or absence of follow-up data. Participants will be included in the stratum in which they were randomised.
Available-case sample The available case sample is defined as all participants who were randomised into the trial, who have any outcome data available for analysis, included in the stratum and intervention to which they were randomised. This sample will be a subset of the ITT sample, excluding randomised participants who have no outcome data.
Per-protocol sample The per-protocol sample is defined as all participants who were randomised into the trial, who met trial eligibility criteria, and who followed their randomised intervention policy at the centre in which they were randomised; they will be included in the intervention to which they were randomised and with their correct stratum. This sample will be a subset of the ITT sample, excluding randomised participants who (i) are confirmed not to have met trial eligibility criteria at randomisation, or (ii) departed from their randomised intervention policy at any point up to 52 weeks.
As-treated sample The as-treated sample is defined for the health economic analyses as all participants who were randomised into the trial and received one of the trial interventions. This sample will be a subset of the ITT sample, excluding participants who have not received any of the four interventions. Participants will be assigned to their received therapy rather than to their randomised intervention if these disagree.
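The sample definitions in this section can be expressed as simple flags on each participant. This is a minimal sketch; the field names and the `sample_membership` helper are hypothetical, and the safety sample is included on the basis that the plan defines it as the ITT sample.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    randomised: bool
    has_outcome_data: bool
    eligible_at_randomisation: bool     # False only if confirmed ineligible
    followed_intervention_policy: bool  # no departure up to 52 weeks
    received_any_intervention: bool


def sample_membership(p: Participant) -> dict:
    """Return which analysis samples a participant belongs to."""
    itt = p.randomised
    return {
        "itt": itt,
        "available_case": itt and p.has_outcome_data,
        "per_protocol": itt and p.eligible_at_randomisation and p.followed_intervention_policy,
        "as_treated": itt and p.received_any_intervention,
        "safety": itt,  # defined in the plan as the ITT sample
    }


if __name__ == "__main__":
    example = Participant(True, True, True, False, True)
    print(sample_membership(example))
```

Note that the flags only record membership: the as-treated sample additionally groups participants by the therapy received rather than the intervention allocated.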
Safety sample The safety sample is the ITT sample for this trial.
Departure from intended therapy (APT, CBT, GET) Departures from intended therapy refer to discrepancies between the intended therapy (as described in the therapy manuals) and the manner in which the therapies were actually delivered within the trial. To assess the extent of fidelity to the manuals as well as the distinguishability of the therapies, a random sample of audio recordings of therapy session number 10 will be independently and blindly assessed at the end of the trial. This will be done by competent therapists who do not have specific allegiance to any of the three forms of therapy. The sample will be of sufficient size to ensure that at least one tape from each therapist will be assessed. Each tape will be evaluated by two raters using a treatment integrity schedule specifically designed for the purpose. The scheme will be piloted using three tapes from each therapy, nine in total. Inter-rater reliability will be assessed and the ratings reported using descriptive statistics.
Departures from randomised intervention policy The overall definition of departures from the randomised intervention policy is given in terms of session attendance as:
a. Fewer than three sessions of SSMC (participants allocated SSMC only)
b. Fewer than ten sessions of APT, CBT or GET (participants allocated these therapies)
The number of sessions includes both face-to-face sessions and those conducted over the telephone. Within this definition, formal withdrawal from intervention after three sessions of SSMC or ten sessions of APT, CBT, or GET have been completed will not be regarded as a departure from the randomised intervention policy. However, any participant withdrawing from his or her randomised intervention, or initiating another trial therapy prior to the above cut-offs, would be regarded as a departure from the randomised intervention (it will be noted when this was by mutual consent). This includes participants randomised to SSMC who, in fact, receive APT, CBT or GET as a trial therapy. The overall compliance variable will therefore be binary, separating those who followed their randomised intervention policy from those who did not.
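The binary compliance variable described above might be derived as follows. This is a sketch under stated assumptions: `sessions_attended` counts face-to-face and telephone sessions of the allocated intervention (SSMC sessions for the SSMC arm, therapy sessions for the other arms), and `other_therapy_before_cutoff` is a hypothetical flag for a participant who started another trial therapy before reaching the cut-off.

```python
MIN_SESSIONS = {"SSMC": 3, "APT": 10, "CBT": 10, "GET": 10}


def followed_randomised_policy(
    arm: str,
    sessions_attended: int,
    other_therapy_before_cutoff: bool = False,
) -> bool:
    """True if the participant followed the randomised intervention policy."""
    if arm not in MIN_SESSIONS:
        raise ValueError(f"unknown arm: {arm!r}")
    enough_sessions = sessions_attended >= MIN_SESSIONS[arm]
    return enough_sessions and not other_therapy_before_cutoff


if __name__ == "__main__":
    print(followed_randomised_policy("CBT", 12))  # attended at least ten sessions
    print(followed_randomised_policy("SSMC", 2))  # fewer than three SSMC sessions
```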
The average (and range) of the numbers of therapy and SSMC sessions attended will be reported by intervention.

Withdrawals from intervention
The decision to withdraw a participant from an intervention is made by the clinician or the participant (active withdrawals).
The number of active withdrawals (broken down by initiator (participant, clinical staff, both)) will be reported by intervention and centre, and by interval from randomisation. The most common reasons for withdrawal will be summarised.

Withdrawals from the trial and losses to follow-up
The decision to withdraw a participant from follow-up within the trial is made only when the participant withdraws their consent to research follow-up. All reasonable attempts are made to continue to follow up all participants, including those that withdraw from intervention.
For the purposes of analysis, losses to follow-up are those missing all primary outcome scale data at all follow-up assessments, those missing all primary outcome scale data at weeks 24 and 52, or those missing all primary outcome scale data at week 52.
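Because the three definitions are nested, each participant can be placed in at most one category. The sketch below assumes a mapping from follow-up week to a flag indicating whether any primary outcome scale data were collected at that visit; the function name and category labels are hypothetical.

```python
from typing import Dict, Optional


def loss_to_follow_up(has_primary_data: Dict[int, bool]) -> Optional[str]:
    """Classify a participant's loss to follow-up, or return None if not lost.

    has_primary_data maps follow-up week (12, 24, 52) to True when any
    primary outcome scale data are available at that assessment.
    """
    missing = {week: not has_primary_data.get(week, False) for week in (12, 24, 52)}
    if not missing[52]:
        return None
    if missing[12] and missing[24]:
        return "missing at all follow-up assessments"
    if missing[24]:
        return "missing at weeks 24 and 52"
    return "missing at week 52"


if __name__ == "__main__":
    print(loss_to_follow_up({12: True, 24: False, 52: False}))
```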
The numbers of withdrawals and losses to follow-up will be reported (see Comparisons of Losses to Follow-Up).

Statistical considerations
Stratification in the analysis The primary analysis of therapy effect will be adjusted by the factors used for stratification at randomisation (that is, centre, CDC criteria, London criteria and current depressive disorder) [12,49] and by the baseline assessment of the outcome variable.
Method for handling centre effects The PACE trial was designed with variation in participant outcomes between centres rather than between doctors or therapists in mind. For the primary analysis to be consistent with the trial design, the primary method for handling contextual variation in the analysis of therapy effects will be to include centre as a fixed covariate. The centre that randomises the largest number of participants will be the reference category. The centre assigned to each participant will be based on the participant's centre at randomisation. Consideration will also be given to including centre as a random effect [50].
Method for handling other clustering effects Outcomes at weeks 12, 24 and 52 are nested within participants. The primary method for handling clustering associated with repeated measurements will be to fit a cluster-specific random effects model [51][52][53] including the participant as a random intercept, and investigating the addition of a random slope over time. Where therapy effects cannot be interpreted as population-averaged effects because outcomes are binary, a population-average (GEE) model will also be fitted.
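As an illustration of the structure described so far (adjustment for the stratification factors and the baseline value of the outcome, with a participant-level random intercept), the random-intercept model for a continuous outcome could be written as below. The notation is an assumption introduced here, as is the choice to treat time as categorical with visit-specific intervention effects: Y_{ij} is the outcome for participant i at follow-up visit j, Y_{i0} the baseline value, s_i the vector of stratification indicators (centre, CDC criteria, London criteria, current depressive disorder), x_i the vector of intervention indicators, and u_i the participant random intercept.

```latex
\begin{equation*}
Y_{ij} = \beta_0 + \beta_1 Y_{i0} + \boldsymbol{\gamma}^{\top}\mathbf{s}_i
       + \lambda_j + \boldsymbol{\delta}_j^{\top}\mathbf{x}_i + u_i + \varepsilon_{ij},
\qquad u_i \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)
\end{equation*}
```

Here \lambda_j is the effect of visit j and \boldsymbol{\delta}_j holds the intervention effects at that visit; adding a random slope over time would introduce a second participant-level term multiplying the visit time.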
Due to (i) the nesting of participants within therapists and doctors; (ii) the partial nesting of therapists within APT, CBT, and GET as there was no therapist involvement in SSMC; and (iii) the crossing of doctors with interventions, variation in participant outcomes between therapists, and in intervention effects between SSMC doctors, are recognised to be potential sources of clustering in this trial [54]. The data structure envisaged in the design (Figure 2) differs from that observed in practice due to a number of planned deviations resulting from unavoidable therapist absences (section 8.6 of Protocol v5.2). To summarise:
a. Local centre cover delivered by a PACE therapist of the same discipline working in a nearby centre will mean that some therapists will be crossed with centres.
b. Distant therapy delivered by a PACE therapist of the same discipline means that participants will not always be seen by a single therapist.
c. Cross-cover therapy delivered by a PACE therapist of a different discipline means that participants will not always be seen by a single therapist and some therapists may be crossed with the therapies.
d. Recruitment of a replacement therapist means that more than one therapist per centre may deliver each therapy.
It is also possible that participants may be seen by more than one SSMC doctor over the course of the trial.
These deviations are anticipated to affect less than 10% of the trial participants. We will initially assume independence of outcomes within therapists/doctors in the primary analysis. Two further analyses are planned, using two-level heteroscedastic models assuming a fully nested design [54], with clusters based on i) the main care provider and ii) the pair comprising the main therapist and the main doctor to assess the robustness of the model to the assumption of independence.
If no clustering is found in (i) supporting the conclusions of the primary analysis, then (ii) will not be performed. The 'main care provider' is defined as the therapist or doctor providing the largest number of trial therapy sessions for each participant. As such, the main care provider is likely to be a therapist for APT, CBT, and GET and a doctor for SSMC. To be explicit, if the doctor provides more sessions than the therapist in APT, CBT or GET then the doctor is the main care provider (see Departures from Randomised Intervention Policy). If there is a tie in number of sessions delivered by two care providers the main care provider will be the one who delivered the earlier sessions.
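The rule for identifying the main care provider can be sketched as follows. This is an illustration under stated assumptions: sessions are supplied as (session order, provider ID) pairs, and the tie-break "the one who delivered the earlier sessions" is read as the tied provider who delivered the earliest session. The function name and input format are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def main_care_provider(sessions: List[Tuple[int, str]]) -> str:
    """Return the provider who delivered the most sessions to one participant.

    sessions: (session_order, provider_id) pairs, therapists and doctors alike.
    Ties go to the tied provider who delivered the earliest session (assumed reading).
    """
    if not sessions:
        raise ValueError("participant has no recorded sessions")
    counts = Counter(provider for _, provider in sessions)
    top = max(counts.values())
    tied = {provider for provider, n in counts.items() if n == top}
    for _, provider in sorted(sessions):
        if provider in tied:
            return provider
    raise AssertionError("unreachable: a tied provider always has a session")


if __name__ == "__main__":
    print(main_care_provider([(1, "D1"), (2, "T7"), (3, "T7"), (4, "T8")]))
```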
In summary, three analyses are planned: 1) without accounting for therapist effect/clustering, 2) accounting for main care provider, and 3) accounting for both the main therapist and main doctor for each participant. The third will not be done if the second shows no clustering effects.
An analysis accounting for the effect of clustering on secondary outcomes will be considered.
Any differences in the point estimates, confidence intervals (CIs) or conclusions will be reported. Any problems encountered in fitting these models will be reported and the scope of the analyses will be restricted; the weights used within the multiple membership model [55] will be determined by the proportion of participants treated by each therapist/doctor. Additional models to explore or take account of complex clustering effects may also be fitted; if so, the motivation for these will be reported together with their results.
Method for handling dropouts and missing data Data are missing completely at random (MCAR) when they represent a simple random sample of the complete sample and the missing data mechanism is independent of all observed and unobserved variables. The assumption that data are missing at random (MAR) is reasonable when missing data represent an identifiable stratified sample of the complete sample and the missing data mechanism is dependent only on other known and observed variables. Data are missing not at random (MNAR) where missing data represent an unidentifiable stratified sample of the complete sample and the missing data mechanism depends on measured and unmeasured variables. The model describing the missing data mechanism will take any clustering effects into consideration. The planned strategy for handling missing data at the item [56] and scale [57] levels will depend on whether the amount of item-missing data observed is minimal. Within practical constraints it will be assumed that data are missing at random (MAR) conditional on the variables included in the substantive model.
Missing item data To ensure the same strategy is followed across all scales reported in the principal paper(s) any guidance given by authors of validated questionnaires will be superseded by the strategy outlined here. Where item-missing data are considered minimal (defined here as no more than 10% of participants with any missing item data across visits where collected or where no more than 20% of the items within a scale are missing within participants), prorating (that is, mean imputation across items within a scale, or subscale where scales are formed of subscales, for each visit and participant) [56] will be used. The focus will instead be on handling scale-missing data. Any bias or underestimation of variance of scores associated with prorating is anticipated to be negligible where item-missing data are minimal [56].
Figure 2 Data structure envisaged in the design (diagram labels: Therapists, Doctors, Patient).
We will report the amount of missing item data by the percentage of participants who have more than 10% item-missing data for each scale reported. The amount of item-missing data is expected to be minimal. However, if this is not so for any outcome scale then multiple imputation [58,59] at the item-level will be the primary method used. Items will be imputed 100 times [60] separately for each scale (with the exception of the CFQ and SF-36PF, which will be imputed simultaneously). All of the other items for that scale across all time points (including baseline), scores (overall and any subscales) across all time points (including baseline), the four stratification factors at randomisation, randomised intervention, main therapist, and main SSMC doctor will be included in the imputation model.
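Prorating as described above (mean imputation across the items of a scale for one participant and visit) can be sketched as follows. The function name and the use of the 20% figure as a per-participant cut-off are assumptions made for illustration; the item responses are invented.

```python
def prorate_scale_score(items, max_missing_fraction=0.2):
    """Prorated total score for one participant, visit and scale.

    `items` holds the item responses, with None marking a missing item.
    Missing items are replaced by the mean of the observed items, provided
    no more than `max_missing_fraction` of the items are missing (assumed
    cut-off); otherwise None is returned and the scale counts as missing.
    """
    observed = [x for x in items if x is not None]
    n_missing = len(items) - len(observed)
    if not observed or n_missing / len(items) > max_missing_fraction:
        return None
    item_mean = sum(observed) / len(observed)
    return sum(observed) + n_missing * item_mean


# Hypothetical 11-item scale scored 0 to 3 per item, with two items missing.
score = prorate_scale_score([2, 2, 2, 2, 2, 2, 2, 2, 2, None, None])
```

With two of eleven items missing (18%), the total is prorated; with three missing (27%) the function returns None and the score would be handled as scale-missing data.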
Missing scale data Missing baseline scale data are not an issue for the primary analysis of efficacy; no missing data are expected for the stratification factors. Where the CFQ or SF-36PF is missing at baseline it will be replaced by the relevant scale at screening. There is specific guidance for missing baseline scale data, and this will be followed [61]. That is, we will use mean imputation of baseline variables assuming baseline and outcome are correlated less than 0.6.
Where the amount of item-missing data is considered minimal, missing outcome scale-data will be handled within the primary analysis by maximum likelihood [57,62] under a similar model for the missing data mechanism assumed for missing item data (see section above). We will report the amount of missing scale data by the percentage of participants who have more than 10% missing item data for each scale reported.
Loss to follow-up Some participants will withdraw from follow-up during the trial, and for these it may be more appropriate to assume data are missing not at random (MNAR). Where more than 10% of randomised participants are lost to follow-up, the impact of this will be investigated in a sensitivity analysis using the weighting approach described by Carpenter, Kenward and White [63] if multiple imputation is the primary method, or comparing selection model and pattern-mixture model therapy effect estimates [64] where maximum likelihood is the primary method.
Method for handling multiple comparisons and multiplicity The overall probability of falsely claiming a statistically significant result increases when multiple significance tests (or equally CIs) are interpreted simultaneously. Multiplicity considerations arise in this trial from the presence of (i) multiple outcomes, (ii) multiple intervention comparisons, and (iii) multiple analyses.
The strategy for adjusting, presenting and interpreting the results is set out below.
Multiplicity adjustments will be made as follows:
1. The following five comparisons will be made using two-sided hypothesis tests (alpha = 0.05) at 52 weeks: APT versus SSMC, CBT versus SSMC, GET versus SSMC, CBT versus APT, GET versus APT. For the co-primary outcomes, fatigue and disability, and for the secondary outcome, the participant-rated CGI, P-values will be presented unadjusted for multiplicity.
2. In addition, Bonferroni adjustment (0.05/5) will be applied separately to each of the three outcomes to control the outcome-wise type I error rate at 5%.
3. No adjustment will be made for any sensitivity analysis as their purpose is to increase confidence in the results obtained from the analysis nominated as primary [26].
4. No adjustment will be made within the principal paper(s) for other analyses including those for safety, secondary outcomes (except the CGI) [26], and health economics.
Presentation will occur as follows:
1. All analyses undertaken will be reported as far as practical (regardless of statistical significance) [65].
2. Estimated effects will be presented with unadjusted 2-sided 95% CIs and P-values.
3. P-values adjusted for multiplicity will also be presented and explained.
Interpretation will be done as indicated below:
1. Marginal interpretation of the results will be of primary interest and will be based on the size and precision of the observed differences between interventions with reference to point estimates and unadjusted 95% CIs.
2. Intervention recommendations will also take into consideration the consistency of effects
   a. across any supportive intervention contrasts,
   b. across sensitivity analyses, primary outcomes and time points,
   c. across efficacy, safety and cost analyses, and
   d. with the results of previous studies, and clinical and consumer opinions.
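The Bonferroni adjustment over the five planned comparisons (0.05/5) is equivalent to multiplying each unadjusted P-value by five and capping the result at 1. A minimal sketch, applied to one outcome at a time; the P-values shown are invented for illustration and are not trial results.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted P-values for a family of comparisons.

    `p_values` maps a comparison label to its unadjusted P-value. Each value
    is multiplied by the number of comparisons and capped at 1.
    """
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# Hypothetical unadjusted P-values for a single outcome at 52 weeks.
unadjusted = {
    "APT vs SSMC": 0.30,
    "CBT vs SSMC": 0.004,
    "GET vs SSMC": 0.008,
    "CBT vs APT": 0.02,
    "GET vs APT": 0.04,
}
adjusted = bonferroni_adjust(unadjusted)
# A comparison is significant after adjustment if its adjusted value is below 0.05.
significant = [name for name, p in adjusted.items() if p < 0.05]
```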
Method for handling compliance The primary analysis will be based on the intention-to-treat principle which compares the randomised intervention policies rather than the interventions per se [48]. Interpretation of the extent to which intervention effect estimates reflect the effects of the intervention described in the protocol requires analyses focusing on the effects of the interventions received rather than the interventions prescribed. It is recognised that per-protocol analyses have a number of limitations, most importantly, selection biases resulting from participants who are excluded not being a simple random sample of those randomised. As such, discrepancies between the conclusions of an intention-to-treat analysis and a per-protocol analysis may not reflect discrepancies between the effects of the intervention prescribed and the intervention received. Acknowledging these and other limitations, a per-protocol analysis will serve as the primary sensitivity analysis investigating the robustness of the conclusions of the primary analysis to assumptions about departures from the randomised intervention policies.

Descriptive analyses
Description of available data The patterns of availability of baseline and follow-up data will be summarised overall and separately for the four interventions and for each assessment visit at the scale level. If one or more case report forms (CRFs) are available for a particular visit then the visit will be regarded as available. If one or more (non-administrative) items are available then the scale will be regarded as available. Availability of baseline and follow-up data will be summarised, distinguishing fully or partially completed measures from those that are completely missing or contain only sketchy detail. The timing of baseline and follow-up data will be summarised overall and by intervention for each assessment visit in terms of the median (with lower quartile, upper quartile, minimum and maximum) number of days from randomisation and the proportion falling outside guideline timeframes. Histograms of distributions will also be examined. Where assessments for a particular visit are carried out on more than one date, the timing of CFQ and SF-36PF assessments will be used to summarise visit timing. The extent to which visits are carried out on more than one date will be examined together with any further relevant details.
Description of missing data Where available, the reasons for missing baseline and follow-up data will be summarised overall and by intervention at the visit and scale levels. This will be done using relevant information included in the comments fields of the database. It is anticipated that such information will be available principally for visit and scale missing data.
Where the level of item-missing data is borderline between 'minimal' and 'important' (see Method for Handling Dropouts and Missing Data), the appropriateness of prorating will be evaluated using the checks outlined by Fayers et al. [56]. Assumptions regarding the nature of the missing data mechanism (that is, MAR as compared to MCAR and MAR, conditional on the variables included in the substantive model as compared to additional variables) will be evaluated by looking descriptively at the statistical associations between whether or not data are missing and any potential predictors, including those generated by looking at the comments fields or the data.
Participant flow Participant throughput will be summarised in a CONSORT diagram [28] including the stages of enrolment, allocation, follow-up and analysis (see Figure 3). Where available, similar summary information will also be provided on the flow of therapists and doctors from recruitment to analysis [66]. In addition to the median, lower quartile, upper quartile, minimum and maximum, the arithmetic, harmonic and minimum-variance mean cluster sizes together with the standard deviation will be tabulated by intervention as these may be useful for calculating the design effect where cluster sizes vary [67,68].
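To illustrate why these cluster size summaries matter, the sketch below computes the arithmetic and harmonic mean cluster sizes and a design effect that allows for variable cluster sizes. The design-effect formula used, 1 + ((cv^2 + 1) m - 1) ICC with cv the coefficient of variation of cluster size, is a commonly used approximation and is an assumption here; the plan does not state which formula will be applied, and the minimum-variance mean is not shown. The caseloads and ICC are invented.

```python
from statistics import harmonic_mean, mean, pstdev


def cluster_size_summary(sizes, icc):
    """Summarise cluster sizes and an approximate design effect.

    `sizes` is the number of participants per care provider and `icc` is an
    assumed intracluster correlation. The population SD is used so that
    (cv^2 + 1) * mean equals sum(m^2) / sum(m).
    """
    arithmetic = mean(sizes)
    sd = pstdev(sizes)
    cv = sd / arithmetic
    design_effect = 1 + ((cv ** 2 + 1) * arithmetic - 1) * icc
    return {
        "arithmetic_mean": arithmetic,
        "harmonic_mean": harmonic_mean(sizes),
        "sd": sd,
        "design_effect": design_effect,
    }


# Hypothetical caseloads for three therapists and an assumed ICC of 0.05.
summary = cluster_size_summary([2, 4, 6], icc=0.05)
```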
Any participant attending at least one session of SSMC or at least one session of APT, CBT, or GET will be regarded as having initiated their randomised intervention. The overall definition of departures from randomised intervention policy (see Departures from Randomised Intervention Policy) will be used to define an inadequate randomised intervention.
Representativeness of sample This will be presented within the baseline comparability tables (see Baseline Comparability of Randomised Groups).
Baseline comparability of randomised groups The following participant-level baseline variables will be summarised both overall and between randomised interventions:
i) Oxford criteria met (yes; no)
ii) Centre (Barts, Bristol, Edinburgh, Kings, Oxford, Royal Free)
iii) Diagnostic criteria (neither met; CDC met only; London met only; both met)
iv) Current depressive disorder (present or absent)
v) GAD (yes, no)
vi) Agoraphobia (yes, no)
vii) Panic disorder (yes, no)
viii) Fibromyalgia (met, unmet)
ix) Duration of CFS/ME since start of illness
x) Taking hypnotics, analgesics or antidepressants
xi) Number of other medications/treatments taken
xii) CFQ Score (continuous)
xiii) SF-36PF score
xiv) Age at randomisation (years) (continuous)
xv) Age at randomisation (years) (
Eyeball comparisons of distributions will be carried out as a measure of the randomisation integrity.
The following therapist-level baseline variables will be summarised overall:
iii) Worked in CFS/ME or chronic pain service previously
iv) Employment grade (for health economic analysis)
Doctor variables will be summarised by:
i) Discipline (for example, psychiatrist/physician/GP)
ii) Grade (for example, Consultant/SpR/SHO)
Numbers (with percentages) for binary and categorical variables, and ordered categories plus means (and standard deviations), or medians (with lower and upper quartiles) for continuous variables will be presented. No statistical significance tests or CIs will be calculated for differences between randomised interventions on any participant-level baseline variables [69-71]. Differences in therapist-level baseline variables are expected because therapist characteristics are a component of the randomised intervention policies.
Median (lower and upper quartile) of number of participants per therapist will be reported.
Comparison of losses to follow-up Losses to follow-up will be reported at 13, 26, and 52 weeks by intervention and centre. Narrative summaries will be given of the reasons when known.
Therapy and other treatment received Summaries will be given of treatment received under the intervention policies. The number and percentage of those who comply will be reported by randomised intervention within the CONSORT diagram. In addition, more detailed descriptions will be given by randomised intervention including:
i) Number (percentage) of participants attending (i) fewer than three sessions of SSMC or (ii) fewer than ten sessions of APT, CBT or GET.
ii) Number and percentage of participants initiating a trial therapy other than the one randomised.
iii) Number, percentage and details of participants receiving a trial intervention from (i) more than one therapist/doctor, (ii) a therapist/doctor from a different centre, or (iii) a therapist delivering their second therapy type.
Details of the following will be reported overall and by randomised intervention: a) Mid-trial modifications to trial interventions and manuals. b) Partial suspension of randomisation.
Narrative summaries will be given of the reasons for withdrawal when known.
Each primary outcome will be tabulated in a 2 × 4 table by compliance status and randomised intervention.
Unblinding of randomised intervention While this trial is not blinded, due to impracticability, a number of steps were taken to minimise bias arising from this. The apparent success of these steps will be assessed where possible:

1. Extent of any unblinding of the Trial Steering and Data Monitoring Committees or the blinded statisticians will be reported.
2. Extent of primary outcomes data collected over the phone will be reported by randomised intervention.
3. The degree of self-declared expectations of the trial outcome among the trial team by professional role (that is, SSMC doctor, APT/CBT/GET therapist, therapy leader, centre leader, research staff) and centre by randomised intervention was collected.
4. Participant preferences will be reported by randomised intervention.
5. Participant expectations of outcome will be reported by randomised intervention.
6. Proportion and type of discrepancies between preferred intervention and randomised intervention will be reported by randomised intervention.

Interim analyses and safety monitoring analyses
No interim analyses were planned or have been carried out.

Analysis of fatigue and disability (co-primary outcomes)
Definition of outcome measure (including trial periods) The fatigue and physical disability outcomes are continuous scores defined separately at weeks 12, 24, and 52. These are the primary outcomes. Fatigue will be measured by the Likert scores of the CFQ (possible range 0 to 33). Physical disability will be measured by the continuous scale of the SF-36PF (possible range 0 to 100).
Descriptive statistics for outcome measures The distributions of the Likert Chalder fatigue scores will be presented in frequency histograms both overall and by intervention at each assessment point (baseline, weeks 12, 24, and 52). The distribution of the SF-36 physical function subscale score will also be presented in histograms both overall and by intervention at each assessment point. It is anticipated that the Likert Chalder fatigue score and the SF-36 physical function subscale score will be approximately normally distributed. Summary statistics (minimum, maximum, mean and standard deviation, median and inter-quartile range) will be tabulated and the response profiles plotted for each continuous score both overall and by intervention at each assessment point. The response profiles over time will also be plotted by outcome and intervention.
The mean scores (Likert Chalder fatigue scores and SF-36 physical function subscale scores) within each main therapist's caseload will be calculated by therapy (APT, CBT and GET). These means will be plotted to investigate the level of variability in participant outcomes between therapists and to examine the distribution of these summary statistics (that is, whether they are normally distributed or skewed). Differences in the mean scores within each main doctor's caseload will also be calculated and similar plots based on these presented.

Primary analysis (including method of analysis)
The primary analysis addressing primary objectives (1) to (5) and secondary objectives (1) and (3) will be based on the principle of intention-to-treat. If missing data are estimated using multiple imputation this analysis will be based on the intention-to-treat sample (see Trial Samples); if missing data are estimated via prorating and maximum likelihood, the analysis will be based on the available-case sample (see Trial Samples) and will exclude any participants with no follow-up data in a 'modified ITT' analysis. The primary outcomes of fatigue and physical disability will be analysed separately using two mixed-effects linear regressions, each including participant as a random intercept and investigating adding a random slope on time. Time (investigating the possibility of linearising across 12, 24 and 52 weeks), the time-by-intervention interaction, baseline CFQ Likert score, baseline SF-36 physical function score and the design factors (that is, centre, CDC criteria, London criteria and current depressive disorder) will be included as fixed effects. Primary interest will be in the fixed contrasts specified in the Method for Handling Multiple Comparisons and Multiplicity section at 52 weeks. The statistical models used in the analysis will be reported in full.
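One possible way of writing the model described above, for participant i at follow-up visit j, is sketched below. The notation is not taken from the plan and is an assumption throughout: time is treated as categorical (12, 24 and 52 weeks) rather than linearised, only a random intercept is shown, SSMC is taken as the reference arm, and the design factors are collected in a vector x_i.

```latex
\begin{aligned}
y_{ij} &= \beta_0
  + \sum_{t \in \{12,24,52\}} \tau_t \,\mathbb{1}[\mathrm{week}_j = t]
  + \sum_{k \in \{\mathrm{APT},\mathrm{CBT},\mathrm{GET}\}} \sum_{t \in \{12,24,52\}}
      \delta_{kt}\,\mathbb{1}[\mathrm{arm}_i = k]\,\mathbb{1}[\mathrm{week}_j = t] \\
 &\quad + \gamma_1\,\mathrm{CFQ}_{i0} + \gamma_2\,\mathrm{SF36PF}_{i0}
  + \mathbf{x}_i^{\top}\boldsymbol{\theta} + u_i + \varepsilon_{ij}, \\
u_i &\sim N(0, \sigma_u^2), \qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2).
\end{aligned}
```

Under this parameterisation the contrast of CBT versus SSMC at 52 weeks is \delta_{\mathrm{CBT},52}, and the contrast of CBT versus APT at 52 weeks is \delta_{\mathrm{CBT},52} - \delta_{\mathrm{APT},52}.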
Clinical importance of the mean differences in primary outcomes at 52 weeks This will be judged by reference to the trial sample SDs at baseline in this trial supported by estimates from other sources. Specifically, a difference between means of two intervention groups, at 52 weeks, of 0.3 SD will be regarded as of minimal clinical importance (a MCID) and of 0.5 SD as a clinically useful difference. From published literature on these scales these differences can be translated into 5 points on the SF-36PF, and 1.2 points on the CFQ, for minimal clinical importance and 8 points on the SF-36PF, and 2.0 points on the CFQ, for clinically useful.
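The thresholds above can be applied mechanically once a baseline SD is available. The sketch below is illustrative only: the function name is hypothetical, and the SD of 4.0 used in the example is simply the value implied by the CFQ figures quoted above (1.2 points for 0.3 SD and 2.0 points for 0.5 SD), not an observed trial estimate.

```python
def classify_difference(mean_difference, baseline_sd):
    """Classify a between-group mean difference against the 0.3 SD and 0.5 SD thresholds."""
    effect = abs(mean_difference) / baseline_sd
    if effect >= 0.5:
        return "clinically useful"
    if effect >= 0.3:
        return "minimal clinical importance"
    return "below minimal clinical importance"


# Hypothetical CFQ differences judged against an assumed baseline SD of 4.0.
cfq_sd = 4.0
labels = [classify_difference(d, cfq_sd) for d in (2.4, 1.5, 1.0)]
```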

By design factors and additional factors
This is the primary analysis.
Model assumption checks The following assumptions will be checked:
1. Independence of residuals will be checked using the supportive analyses described in the Method for Handling Other Clustering Effects section. ICC and within-cluster variance estimates will be reported.
2. Distribution of residuals and random effects will be checked visually using Q-Q plots and histograms of the residuals and by plotting the between-participant variation in participant outcomes and where appropriate the between-centre, the within-doctor but between-interventions, and the between-therapist variation in participant outcomes. Deviations from a Normal distribution would indicate a violation of model assumptions. In this event an alternative approach to the analysis would be investigated.
3. Equal variance of residuals will be checked visually using plots of the standardised residuals against the predicted values.
4. Absence of an intervention-by-centre interaction will be checked in the primary analysis by including fixed contrasts for the intervention-by-centre interaction.
Checks will be made for extreme outliers and points with high leverage. In the event that these are found, the analysis will be reported with and without these observations together with any relevant details.
Other analyses supporting the primary analysis A number of sensitivity analyses will be employed to examine the robustness of the conclusions of the primary analysis to:
1. Categorical responder/improver analysis. Clinically significant improvement will be taken as a CGI participant score of 1 (very much improved) or 2 (much improved). CGI (P) scores of 3 (a little improved), 4 (no change) and 5 (a little worse) will be considered as non-improvement. CGI (P) scores of 6 (much worse) and 7 (very much worse) will be considered as deterioration. The primary analyses will be repeated replacing the primary outcomes with the CGI (P) (response versus no response versus deterioration) and using mixed-effects logistic regression.
2. Methods for handling missing data. The primary analysis will be repeated assuming data are missing not at random (MNAR) as described in the Method for Handling Dropouts and Missing Data section.
3. Choice of sample. A per-protocol analysis will be employed using the per-protocol sample to examine the robustness of the results of the primary analysis to departures from the intended randomised intervention or eligibility criteria.
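The three-category coding of the participant-rated CGI used in the responder/improver analysis can be sketched as a simple mapping. The function name and category labels are hypothetical; the cut-points are those stated in the plan.

```python
def cgi_category(score):
    """Map a participant-rated CGI score (1 to 7) to the three analysis categories."""
    if score not in range(1, 8):
        raise ValueError("CGI score must be an integer from 1 to 7")
    if score <= 2:  # 1 very much improved, 2 much improved
        return "improvement"
    if score <= 5:  # 3 a little improved, 4 no change, 5 a little worse
        return "non-improvement"
    return "deterioration"  # 6 much worse, 7 very much worse


categories = [cgi_category(s) for s in range(1, 8)]
```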
Additional analyses The CBT versus GET contrast will be reported, recognising its exploratory status. Secondary objective (2) (Do different interventions have differential effects on primary outcomes?) will be addressed by extracting fixed contrasts for the outcome-type-by-intervention interaction from a bivariate mixed-effects linear regression [51,72-74] fitted with fatigue and physical disability as joint outcomes, participant as a random effect (investigating adding a random slope on time), outcome-type (physical disability versus fatigue), intervention (all contrasts specified), time (investigating linearising this effect across 12, 24 and 52 weeks), the time-by-intervention interaction, the outcome-type-by-intervention interaction, baseline CFQ Likert score, baseline SF-36 physical function score and the design factors (that is, centre, CDC criteria, London criteria and current depressive disorder) as fixed effects. These contrasts directly estimate the differences in the intervention effects between the two primary outcomes.

Analysis of secondary outcomes
Efficacy outcomes Definition of outcome measures (including trial periods) All secondary efficacy outcomes are defined separately at weeks 12, 24 and 52 unless specified otherwise (see Baseline and Outcome Measures). The PACE Scoring Protocol outlines in detail the process for calculating scores from questionnaire items and variables from case report forms. Participant-, therapist- and SSMC doctor-rated CGI are defined as ordinal variables with three categories. Participant satisfaction is defined as an ordinal variable with seven categories. The anxiety and depression subscale scores of the HADS, the Walking Test, and the total score of the Work and Social Adjustment scale are all continuous variables. However, the distribution of these is not pre-specified with the possibility that some or all may be skewed and the Walking Test may be bimodal. The number of CDC symptoms is a count variable and CDC Symptoms (1) and (8) are binary variables.
Descriptive statistics for outcome measures The distributions of all secondary efficacy outcomes will be presented in histograms (continuous/count) or bar charts (ordinal/binary) both overall and by intervention at each assessment point. A single table will be produced including summary statistics for all secondary efficacy outcomes by intervention and assessment point. Numbers (and percentages) or means (and standard deviations, minimums and maximums) or medians (and interquartile ranges, minimums and maximums) will be presented as appropriate. Summary statistics will be further plotted using line graphs for each outcome across time by intervention. The anticipated profiles have not been specified in advance. Potential variability in secondary efficacy outcomes between therapists and between doctors will be investigated using an approach similar to that outlined for the primary outcomes.
Primary analysis (including method of analysis) The primary analyses addressing secondary objective (3) will involve the secondary efficacy outcomes and will be based on the intention-to-treat principle. Participant will be included as a random intercept (investigating adding a random slope on time), time (investigating the possibility of linearising this effect across 12, 24 and 52 weeks) and the associated baseline variable as fixed effects and centre, CDC criteria, London criteria, and Current Depressive Disorder as fixed indicator variables. Participant-rated CGI and participant satisfaction will be analysed using mixed-effects ordinal logistic regressions. The anxiety and depression subscale scores of the HADS, number of CDC symptoms, the Jenkins sleep scale total score, the Walking Test, and the total score of the Work and Social Adjustment scale will be analysed using mixed-effects linear regressions, unless there is evidence to suggest that these outcomes are skewed/bimodal, in which case transformation and bootstrapping will be investigated. CDC Symptoms (1) and (8) will be analysed using mixed-effects logistic regressions. The intervention and time-by-intervention contrasts fitted for the primary outcomes will be extracted for each secondary efficacy outcome as outlined in the analyses of the primary outcomes.
Baseline adjustment The same as that outlined for the primary outcomes.
Model assumptions checks The following will be checked as described for the primary analysis of the primary outcomes:
1. Independence of residuals
2. Distribution of residuals
3. Equal variance of residuals
4. Distribution of random effects (as appropriate)
5. Absence of an intervention-by-centre interaction
6. Extreme outliers and points with high leverage
Other analyses supporting the primary analysis Sensitivity analyses investigating the robustness of the conclusions of the primary analyses of the secondary efficacy outcomes will be less extensive than those described for the primary outcomes unless concern is raised by those carried out for the primary outcomes.
Safety outcomes These analyses will be based on the safety sample (see Trial Samples).
Definition of outcome measures (including trial periods) The safety of the trial interventions will be assessed using the definition of serious deterioration that was developed for monitoring safety during the course of the trial (see Outcome Measures), participant-rated adverse events defined and recorded in accordance with the protocol, and withdrawals from intervention. Serious deterioration, defined at 52 weeks, will be the primary assessment of safety. Its four components will be reported separately to enable evaluation of their relative contributions. These draw on the two adverse outcomes defined in the protocol, namely negative change on either the participant-rated CGI or the SF-36 physical function scale defined at 12, 24 and 52 weeks.
Participant-reported adverse events, including comorbid conditions which started after randomisation, are reported in terms of their relatedness to the trial intervention (events versus reactions), seriousness (non-serious versus serious) and severity (mild, moderate, severe). In addition serious adverse events are reported by the above and by their expectedness (expected versus unexpected).
Three independent assessors, initially blinded to intervention, selected by AfME and approved by the TSC, will do the following: 1. review all non-serious adverse events to determine if any should be upgraded to serious adverse events (SAEs) (masked to intervention); 2. review all SAEs to agree their classification as such (masked to intervention); 3. rate the relationship of each SAE to the randomised interventions (unmasked to intervention) (to consider whether any might be serious adverse reactions (SAR) to an intervention or suspected unexpected serious adverse reactions (SUSAR)); and 4. review all SARs and SUSARs.
Assessors will work independently of each other during both classification periods. Where there is disagreement, consensus will be sought. Where disagreement continues, a majority vote will be taken.
Descriptive statistics for outcome measures Serious deterioration Serious deterioration will be tabulated both overall and by its four components at week 52 by randomised intervention. Absolute risk difference tests will be performed between serious deterioration (yes or no) and randomised intervention.
Adverse events Adverse events will be tabulated separately by type (non-serious adverse events, serious adverse events, serious adverse reactions and suspected unexpected serious adverse reactions), by time (weeks 0 to 12, weeks 12 to 26, weeks 26 to 52, and overall weeks 0 to 52), and by randomised intervention. Each table will include denominators showing how many participants were in the trial at each time point by randomised intervention. The numerator will indicate the number of affected participants, and an event rate will be provided indicating the events per unit of person time so as to capture events with recurrences.
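The quantities reported in each adverse event table (denominator, number of affected participants and an event rate per unit of person time) can be sketched as follows. The function name, the choice of person-weeks as the time unit and the scaling to 100 person-weeks are assumptions for illustration; the counts are invented.

```python
def adverse_event_summary(events_per_participant, weeks_in_trial):
    """Summarise adverse events for one intervention arm and time window.

    `events_per_participant` gives the number of events for each participant
    and `weeks_in_trial` the time each participant contributed, in the same
    order. Recurrent events count towards the rate but not the numerator.
    """
    total_events = sum(events_per_participant)
    person_weeks = sum(weeks_in_trial)
    return {
        "denominator": len(events_per_participant),
        "affected": sum(1 for n in events_per_participant if n > 0),
        "events": total_events,
        "rate_per_100_person_weeks": 100 * total_events / person_weeks,
    }


# Hypothetical arm: four participants over weeks 0 to 12, one withdrawing at week 6.
table_row = adverse_event_summary([0, 2, 1, 0], [12, 12, 6, 12])
```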
The frequency of non-serious adverse events (non-serious adverse events and non-serious adverse reactions) per participant will be tabulated by randomised intervention.
All serious adverse events will be described individually: stating randomised intervention, participant identification number, centre, sex and age, investigator's reported term, preferred term, date of onset relative to the date of randomisation, duration, number of SSMC sessions, number of therapist sessions (if applicable), action taken regarding the study intervention administration, use of a corrective treatment, outcome, relationship to the study intervention in the PACE clinician's opinion and expectedness. Where the independent scrutineers have disagreed with the PACE clinician's opinion, the scrutineers' views only will be reported.
Deaths will be reported as described for a serious adverse event.
All adverse events leading to withdrawal (which constitute significant adverse events) will be summarised by randomised intervention, and whether the participant withdrew from the whole trial or intervention only.

Discontinuation and withdrawals from intervention
Discontinuation and withdrawals from intervention will be listed by intervention, participant identification number, centre, who made decision for withdrawal, whether the participant withdrew from intervention or trial, the reason for withdrawal, and interval post-randomisation (in days). Reasons for discontinuation and withdrawal from intervention will be tabulated by time (week 0 to week 12, week 12 to week 26, week 26 to week 52 and week 0 to week 52), randomised intervention and reason for withdrawal.
More detailed descriptions of adverse events will be published separately.
Primary analysis (including method of analysis) All serious adverse events (SAEs, SARs and SUSARs combined) will be tabulated in relation to the intervention. Any doubling in harms observed between interventions will be highlighted. The percentages of participants with SAEs, SARs and SUSARs, and the three combined, as well as the number of non-serious AEs and the percentage of participants with one or more non-serious AEs, will be reported by intervention group, including differences between groups with 95% CIs.
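A difference in percentages between two intervention groups with a 95% CI could be computed as sketched below. The plan does not state which interval method will be used; the simple Wald interval here is an assumption chosen for brevity, and the counts are invented.

```python
from math import sqrt


def risk_difference(events_a, n_a, events_b, n_b, z=1.96):
    """Absolute risk difference (group A minus group B) with a Wald interval.

    Returns (difference, lower, upper) on the proportion scale. The Wald
    method is an assumed choice, not one specified in the plan.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se


# Hypothetical counts: 10 of 100 participants affected in one arm, 5 of 100 in another.
diff, lower, upper = risk_difference(10, 100, 5, 100)
```

Wald intervals can behave poorly when events are rare, so an alternative interval may be preferable for infrequent serious events.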

Health economics outcomes

Definition of outcome measures (including trial periods)

Service use and lost employment
Comprehensive data are being collected on all health, social care and other relevant services used by individual study members using a tailored version of the Client Service Receipt Inventory (CSRI). The CSRI is used at baseline and at 24- and 52-week follow-up, each time covering resource use for the previous 6 months. The CSRI covers the following broad categories of information.
Living situation/accommodation
Education, employment and income (including benefits)
Time off work (measured in days) and time unemployed (or retired due to illness), summed over the relevant cost period (−24 to 0 weeks, 0 to 24 weeks, 0 to 52 weeks)
Use of health and social care resources

Cost calculation
The costs of each resource item will be calculated using best available unit cost estimates [75]. The cost of APT, GET and CBT will be estimated using information on the core resource inputs involved in delivering the interventions, and estimating country-specific costs for those inputs. Costs will be calculated using data on the number of intervention sessions received by each participant.
Lost employment costs for those in employment will be calculated by combining time off work with daily earnings. For those unemployed or retired due to ill health, lost employment costs will be calculated by combining this period of time with average age- and gender-specific earnings.
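The two lost employment calculations can be sketched as follows. The earnings table, its age bands and all monetary values are hypothetical placeholders, and treating every day in the period as a costed day is a simplifying assumption rather than something the plan specifies.

```python
# Hypothetical average daily earnings by (age band, gender); not real figures.
AVERAGE_DAILY_EARNINGS = {
    ("18-39", "female"): 80.0,
    ("18-39", "male"): 90.0,
    ("40-64", "female"): 95.0,
    ("40-64", "male"): 110.0,
}

def lost_employment_cost(employed, days_lost, daily_earnings=None,
                         age_band=None, gender=None):
    """Cost of lost employment over one cost period.

    Employed participants: days off work x their own daily earnings.
    Unemployed/retired due to ill health: days in the period x the average
    age- and gender-specific daily earnings (looked up in the table above).
    """
    if employed:
        return days_lost * daily_earnings
    return days_lost * AVERAGE_DAILY_EARNINGS[(age_band, gender)]

# An employed participant with 10 days off work earning 100 per day
print(lost_employment_cost(True, 10, daily_earnings=100.0))
# A participant unemployed due to ill health for 120 days
print(lost_employment_cost(False, 120, age_band="40-64", gender="female"))
```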
The variables derived from the CSRI will be: (i) use (yes/no) of each service, (ii) number of service contacts/days in hospital, (iii) cost of each service, (iv) in employment (yes/no), (v) days not worked, and (vi) whether benefits received (each benefit: yes/no).

Quality adjusted life year measurement
The EQ-5D consists of five domains (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression). Each of these will receive a score of 1, 2 or 3 corresponding to no problems, moderate problems and major problems. Utility scores will be attached to each health state based on these scores (a table of utility values [76] has been produced by the Centre for Health Economics, University of York). These utility scores will be used to generate QALY gains over the follow-up period.
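One common way of turning utility scores at several assessment points into QALYs is the area under the curve, with linear interpolation between assessments. The plan does not state which method will be used, so the sketch below is an assumption. The utility values in the example are invented and are not taken from the published table [76].

```python
def qalys_area_under_curve(weeks, utilities):
    """QALYs accrued between the first and last assessment.

    Uses the trapezium rule: utility is assumed to change linearly between
    consecutive assessments, and time is converted from weeks to years.
    """
    total = 0.0
    for i in range(len(weeks) - 1):
        years = (weeks[i + 1] - weeks[i]) / 52.0
        mean_utility = (utilities[i] + utilities[i + 1]) / 2.0
        total += years * mean_utility
    return total

# Hypothetical utilities at baseline, 24 weeks and 52 weeks
qalys = qalys_area_under_curve([0, 24, 52], [0.5, 0.6, 0.7])
print(round(qalys, 3))
```

A participant in full health (utility 1.0) at every assessment would accrue exactly one QALY over the 52 weeks under this method.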

Descriptive statistics for outcome measures
Data will be reported on the number and percentage of participants using each service in the CSRI by intervention, at baseline and at 24- and 52-week follow-up. The mean and standard deviation of the number of service contacts for those using services will also be reported, as well as the mean and standard deviation of costs for all participants. The number and percentage of participants with a score of 1, 2 or 3 for each EQ-5D domain will be reported.

Primary analysis (including method of analysis)

Cost comparisons
Regression analysis will be used to compare service costs and total costs between the four interventions, which will each be represented by dummy variables. Each intervention will be used in turn as the reference category to make all relevant comparisons.

Predictors of cost
Participant characteristics will be used in a regression model to explain differences in baseline costs. We will test the hypothesised associations with both healthcare and societal costs, as well as using multivariable modelling of other possible predictors identified from univariate analyses. Subsequent regression models will be used to explain variations in follow-up costs, and these will also include clinical characteristics from preceding periods. Two types of regression model will be used. First, we will construct ordinary least squares models, with bootstrapping used to produce reliable 95% CIs around the regression coefficients. Second, we will construct generalised linear models with a log link and gamma distribution to account for the skewness that is likely in the costs data. Independent variables will include demographic characteristics (such as age, gender and marital status), year of randomisation, clinical variables (such as fatigue score, disability, depression, anxiety) and benefits status (whether receiving benefits and whether benefits are in dispute).

Cost-effectiveness analysis
Cost-effectiveness will be assessed by linking data on service cost differences and outcome (fatigue and physical disability) differences [77]. If any intervention has significantly lower costs and significantly better outcomes then it will be deemed to be more cost-effective. If costs are significantly higher and outcomes significantly better or if there is uncertainty in these findings (indicated by the CIs) then we will use the net benefit approach and cost-effectiveness acceptability curves to assess cost-effectiveness. Cost-effectiveness results will be plotted on a cost-effectiveness plane. This will involve producing estimates of cost and outcome differences from 1,000 bootstrapped re-samples of the original data. Such planes will be produced for each combination of two-way group comparisons. The plane will inform us as to the probability that an intervention has either (i) lower costs and better outcomes, (ii) lower costs and worse outcomes, (iii) higher costs and better outcomes or (iv) higher costs and worse outcomes than each comparator.
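The quadrant probabilities and a single point on a cost-effectiveness acceptability curve can be sketched as below. This is an illustrative simplification: participants are resampled independently within each arm, higher outcome values are assumed to be better (a fatigue score would need its sign reversed), and all data, function names and the willingness-to-pay value are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def bootstrap_differences(costs_a, effects_a, costs_b, effects_b,
                          n_resamples=1000, seed=0):
    """Bootstrap (cost difference, outcome difference) pairs, A minus B.

    Participants are resampled with replacement within each arm, keeping each
    participant's cost and outcome together.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        idx_a = [rng.randrange(len(costs_a)) for _ in costs_a]
        idx_b = [rng.randrange(len(costs_b)) for _ in costs_b]
        d_cost = mean([costs_a[i] for i in idx_a]) - mean([costs_b[i] for i in idx_b])
        d_effect = mean([effects_a[i] for i in idx_a]) - mean([effects_b[i] for i in idx_b])
        diffs.append((d_cost, d_effect))
    return diffs

def quadrant_probabilities(diffs):
    """Proportion of resamples in each quadrant of the cost-effectiveness plane.

    Differences of exactly zero are counted as 'higher costs' or 'worse outcomes'.
    """
    counts = {
        "lower costs, better outcomes": 0,
        "lower costs, worse outcomes": 0,
        "higher costs, better outcomes": 0,
        "higher costs, worse outcomes": 0,
    }
    for d_cost, d_effect in diffs:
        cost_label = "lower costs" if d_cost < 0 else "higher costs"
        effect_label = "better outcomes" if d_effect > 0 else "worse outcomes"
        counts[f"{cost_label}, {effect_label}"] += 1
    return {label: n / len(diffs) for label, n in counts.items()}

def probability_cost_effective(diffs, willingness_to_pay):
    """One point on an acceptability curve: share of resamples with a
    positive incremental net benefit (willingness_to_pay * d_effect - d_cost)."""
    positive = sum(1 for d_cost, d_effect in diffs
                   if willingness_to_pay * d_effect - d_cost > 0)
    return positive / len(diffs)

# Hypothetical data for two arms (costs and an outcome where higher is better)
costs_a = [900.0, 1100.0, 1000.0, 950.0, 1050.0]
effects_a = [0.70, 0.65, 0.72, 0.68, 0.66]
costs_b = [1500.0, 1400.0, 1600.0, 1450.0, 1550.0]
effects_b = [0.55, 0.60, 0.58, 0.52, 0.57]

diffs = bootstrap_differences(costs_a, effects_a, costs_b, effects_b)
probs = quadrant_probabilities(diffs)
print(probs)
print(probability_cost_effective(diffs, willingness_to_pay=20000))
```

Repeating `probability_cost_effective` over a range of willingness-to-pay values traces out the acceptability curve for one two-way comparison.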

Cost-utility analysis
The cost-utility analysis will be conducted in the same way as the cost-effectiveness analysis but will use quality adjusted life years (derived from the EQ-5D) as the outcome measure.

Predictors of cost-effectiveness/cost-utility
The net benefit approach allows multivariable analyses of economic data. This will enable us to identify predictors of cost-effectiveness and cost-utility. This will be done using regression models as described above. In particular we hypothesise that age and gender will predict cost-effectiveness and cost-utility.

Baseline adjustment
The predictors of the cost regression model will be adjusted by the CSRI baseline outcome data.

Model assumptions checks
Cost data are usually skewed, and if this results in similarly skewed residuals then the standard linear model is inappropriate. The distribution of the regression residuals will be checked visually and, if the distribution is non-normal, we will use bootstrapping with 10,000 resamples to estimate 95% CIs around the cost differences (CIs will be based on the percentile or bias-corrected method, depending on the level of bias observed in the model). The assumption of independent residuals will be checked by bootstrapping at the therapist level.
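A percentile bootstrap CI for a cost difference, with resampling at the therapist level, might look like the sketch below. It is a simplified illustration under several assumptions: each participant is attributed to a single therapist, whole therapists are resampled within each arm, only the percentile (not the bias-corrected) interval is shown, and the therapist labels and costs are invented.

```python
import random

def percentile_ci(values, level=0.95):
    """Percentile interval: the empirical quantiles of the bootstrap estimates."""
    ordered = sorted(values)
    alpha = (1 - level) / 2
    lower_index = round(alpha * len(ordered))
    upper_index = round((1 - alpha) * len(ordered)) - 1
    return ordered[lower_index], ordered[upper_index]

def cluster_resampled_mean(costs_by_therapist, rng):
    """Mean cost after resampling whole therapists with replacement.

    All participants of a sampled therapist are kept together, so any
    within-therapist correlation is preserved in each resample.
    """
    therapists = list(costs_by_therapist)
    sampled = [rng.choice(therapists) for _ in therapists]
    pooled = [cost for t in sampled for cost in costs_by_therapist[t]]
    return sum(pooled) / len(pooled)

def bootstrap_cost_difference_ci(arm_a, arm_b, n_resamples=10_000, seed=0):
    """95% percentile CI for the difference in mean cost (arm A minus arm B)."""
    rng = random.Random(seed)
    diffs = [cluster_resampled_mean(arm_a, rng) - cluster_resampled_mean(arm_b, rng)
             for _ in range(n_resamples)]
    return percentile_ci(diffs)

# Hypothetical costs grouped by (hypothetical) therapist in two arms
arm_a = {"T1": [1000.0, 1200.0], "T2": [900.0, 1100.0, 950.0], "T3": [1300.0]}
arm_b = {"T4": [1500.0, 1600.0], "T5": [1400.0], "T6": [1700.0, 1550.0]}

lower, upper = bootstrap_cost_difference_ci(arm_a, arm_b)
print(f"95% CI for cost difference: {lower:.0f} to {upper:.0f}")
```

Comparing this interval with one obtained by resampling individual participants gives an informal indication of how much the independence assumption matters.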

Other analyses supporting the primary analysis (including sensitivity analyses)
Sensitivity analyses will be carried out on two aspects of the analyses to assess the robustness of the findings. The effect of each of these alternative approaches on mean total societal costs at 12 months and subsequent cost-effectiveness calculations based on these costs will be explored in turn.
The main analyses will use an informal care unit cost based on the replacement method (where the cost of a homecare worker is used as a proxy for informal care). We will alternatively use a zero cost and a cost based on the national minimum wage for informal care. We will also conduct sensitivity analyses around the costs attached to lost employment.
The estimated costs of APT, GET and CBT will be increased and decreased by 50% to see how sensitive the costs, cost-effectiveness and cost-utility findings are to these variables.

Subgroup analyses
Exploratory sub-group analyses are planned to investigate whether intervention effects differ between those meeting and not meeting the CDC criteria or London criteria and between those with or without a depressive disorder at the point of randomisation.

Software
The data have been entered and checked during the course of the trial in a customised Microsoft Access [78] database. Once the database is locked, the data will be transferred into Stata [79]. It is anticipated that the analyses will be carried out primarily within Stata [79], although MLwiN [80] and other statistical packages may be used as necessary. The most up-to-date version available will be used in each case.