The correlation between baseline score and postintervention score, and its implications for statistical analysis
 Lei Clifton^{1} and
 David A. Clifton^{2}
 Received: 16 May 2018
 Accepted: 6 December 2018
 Published: 11 January 2019
Abstract
Background
When using a continuous outcome measure in a randomised controlled trial (RCT), the baseline score should be measured in addition to the postintervention score, and the data should be analysed using an appropriate statistical method.
Methods

Correlation between change score and baseline score
Correlation between change score and post score
Correlation between change score and average score
The setting here is a parallel, two-arm RCT, but the methods discussed in this paper apply to any study or trial that has a continuous outcome measure; they are not restricted to RCTs.
Results
We show that using the change score as the outcome measure does not address the problem of regression to the mean, nor does it take account of the baseline imbalance. Whether the outcome is change score or post score, one should always adjust for baseline using analysis of covariance (ANCOVA); otherwise, the estimated treatment effect may be biased. We show that these correlations also apply when comparing two measurement methods using Bland-Altman plots.
Conclusions
The correlation between baseline and postintervention scores can be derived using the variance sum law. We can then use the derived correlation to calculate the required sample size in the design stage. Baseline imbalance may occur in RCTs, and ANCOVA should be used to adjust for baseline in the analysis stage.
Keywords
 Correlation
 Baseline
 Change score
 Postintervention
 Treatment
 Statistical analysis
 Sample size
 Independent
 Means
 Standard error (SE)
 Standard deviation (SD)
 Regression to the mean (RTM)
 Outcome
 Analysis of covariance (ANCOVA)
 Randomised controlled trial (RCT)
 Bland-Altman plot
Background
When using a continuous outcome measure in a randomised controlled trial (RCT), the baseline score should be measured in addition to the postintervention score. In a previous paper [9], we have shown how to derive r, the correlation between baseline score and post score. In this paper, we derive correlations between different variables in the Appendix, assuming r is known. There are two options for the outcome measure: change score or post score. We examine the validity of using change score as the outcome measure in the “Method” section, and then further discuss the applications of our methods outlined in the “Discussion” section.
Method
The choice of outcome measure: post score vs. change score
The methods outlined in this paper are suitable for any continuous measure, and therefore we have used a generic notation. Suppose the primary continuous outcome measure is Y, with Y_{0} and Y_{1} denoting the value of Y at baseline and postintervention, respectively. Let r denote the correlation coefficient between Y_{0} and Y_{1}.
We will call Y_{1} “post score”, Y_{0} “baseline score”, and (Y_{1} − Y_{0}) “change score”. We note that Y_{1} is also called “follow up score” in [17]. The authors of [17] assumed that there is no interaction between baseline and intervention group, and we make the same assumption in this paper.
Equation 2 is ANCOVA using change score (Y_{1} − Y_{0}) as outcome and adjusting for baseline Y_{0}. Compared with the standard ANCOVA in Eq. 1, where post score Y_{1} is the outcome, nothing has changed except that the regression coefficient for Y_{0} has decreased by 1. The significance level and the width of the confidence intervals for all estimated regression coefficients remain the same as those in a standard ANCOVA. Further mathematical details can be found in [14].
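As a sketch of the step described above, the two models can be written as follows. Only b (the baseline coefficient) is named in the text; the intercept a, the treatment coefficient c, the group indicator G, and the error term ε are assumed notation, and the exact form of Eqs. 1 and 2 may differ.

\[ Y_1 = a + b\,Y_0 + c\,G + \varepsilon \]

\[ Y_1 - Y_0 = a + (b-1)\,Y_0 + c\,G + \varepsilon \]

The second line is obtained by subtracting Y_{0} from both sides of the first, so a, c, and ε are untouched and only the coefficient of Y_{0} changes, from b to b − 1.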
Equation 3 shows that using change score as outcome without adjusting for baseline is equivalent to a standard ANCOVA only when b = 1. In practice, the estimated b in an ANCOVA is rarely equal to 1; hence, the unadjusted change-score analysis is only a special case of ANCOVA.
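The relationship between the three analyses can be checked numerically. The following is a minimal sketch on simulated data, not the authors' code: the sample size, the true coefficients (b = 0.6, treatment effect 3), and the helper name `ols` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
group = np.repeat([0.0, 1.0], n // 2)      # 0 = control, 1 = treatment
y0 = rng.normal(50.0, 10.0, size=n)        # baseline score
# Simulated post score with an assumed true b = 0.6 and treatment effect 3.
y1 = 20.0 + 0.6 * y0 + 3.0 * group + rng.normal(0.0, 8.0, size=n)


def ols(columns, y):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


post_ancova = ols([y0, group], y1)           # post score, adjusted for baseline
change_ancova = ols([y0, group], y1 - y0)    # change score, adjusted for baseline
change_only = ols([group], y1 - y0)          # change score, not adjusted

print("treatment effect, post-score ANCOVA:  ", post_ancova[2])
print("treatment effect, change-score ANCOVA:", change_ancova[2])
print("treatment effect, unadjusted change:  ", change_only[1])
print("baseline coefficients:", post_ancova[1], change_ancova[1])
```

The two adjusted fits return the same treatment estimate, and their baseline coefficients differ by exactly 1. The unadjusted change-score estimate generally differs, by an amount that depends on how far b is from 1 and on the chance difference in baseline means between arms.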
Regression to the mean (RTM) and ANCOVA
RTM is a well-known statistical phenomenon, first described by Galton [10]. RTM has been discussed by a number of authors, e.g. [3, 6, 7, 15]. In this paper, we consider RTM in the context of baseline measures.
 1. Post score: If post score is positively correlated with the baseline score (which is usually the case in clinical practice), acupuncture will appear to have a negative effect, even though it has no effect; i.e. the treatment effect of acupuncture will be underestimated.
 2. Change score: Acupuncture will appear to have a positive effect, even though it has no effect; i.e. the treatment effect of acupuncture will be overestimated.
In both of the preceding scenarios, the appropriate statistical analysis is ANCOVA adjusting for baseline scores. The first scenario corresponds to Eq. 1, with its left-hand side showing the post score as the outcome measure. The second scenario corresponds to Eq. 2, with its left-hand side showing the change score as the outcome measure.
Using change score as the outcome measure does not address the problem of RTM, nor does it take account of the baseline imbalance. Even if change score is deemed to be the appropriate outcome measure after careful consideration, ANCOVA should still be used to adjust for baseline scores, as shown by Eq. 2.
Using change score as outcome does not adjust for baseline imbalance; instead, any imbalance will be reversed due to RTM [4]. Equation 2 shows that when change score is the chosen outcome, one should still adjust for baseline using ANCOVA. In such a case, ANCOVA is the valid statistical analysis.
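The reversal of a baseline imbalance under RTM can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a reproduction of any trial: the treatment has no effect, the treatment arm is given a baseline mean 5 points lower (a hypothetical stand-in for a chance imbalance), and the population mean, SD, and r = 0.5 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000                      # per arm; large so that sampling noise is small
mu, sd, r = 50.0, 10.0, 0.5    # assumed population mean, SD, baseline-post correlation
imbalance = -5.0               # hypothetical: treatment arm starts 5 points lower


def simulate_arm(baseline_mean):
    """Baseline and post scores for one arm, with no treatment effect."""
    y0 = rng.normal(baseline_mean, sd, size=n)
    # Post score regresses towards mu; the noise SD keeps var(y1) close to sd**2.
    y1 = mu + r * (y0 - mu) + rng.normal(0.0, sd * np.sqrt(1 - r**2), size=n)
    return y0, y1


c0, c1 = simulate_arm(mu)                  # control arm
t0, t1 = simulate_arm(mu + imbalance)      # treatment arm

post_diff = t1.mean() - c1.mean()                    # unadjusted post score
change_diff = (t1 - t0).mean() - (c1 - c0).mean()    # unadjusted change score

# ANCOVA: post score on intercept, baseline, and group indicator.
y0 = np.concatenate([c0, t0])
y1 = np.concatenate([c1, t1])
group = np.concatenate([np.zeros(n), np.ones(n)])
X = np.column_stack([np.ones(2 * n), y0, group])
coef, *_ = np.linalg.lstsq(X, y1, rcond=None)
ancova_effect = coef[2]

print("post-score difference:  ", post_diff)
print("change-score difference:", change_diff)
print("ANCOVA treatment effect:", ancova_effect)
```

Under these assumptions the expected post-score difference is r × imbalance (negative) and the expected change-score difference is (r − 1) × imbalance (positive), whereas the ANCOVA estimate stays close to the true effect of zero.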
ANCOVA has the advantage of being unaffected by baseline imbalance [17], and it has greater statistical power than other methods [16]. An RCT reduces RTM at the design stage, but one should still use ANCOVA to adjust for baseline at the analysis stage [3].
The validity of using change score as outcome measure
Let r denote the correlation between post score Y_{1} and baseline score Y_{0}. Let \( {s}_{Y_0}^2 \) denote the sample variance of baseline score Y_{0}, \( {s}_{Y_1}^2 \) denote the sample variance of post score Y_{1}, and \( {s}_{\left({Y}_1-{Y}_0\right)}^2 \) denote the sample variance of the change score (Y_{1} − Y_{0}). Let \( {s}_{Y_0} \), \( {s}_{Y_1} \), and \( {s}_{\left({Y}_1-{Y}_0\right)} \) denote their corresponding standard deviations (SD).
Equation 4 shows that corr(Y_{1} − Y_{0}, Y_{0}) has the same sign as \( r{s}_{Y_1}-{s}_{Y_0} \), because the covariance between the change score and the baseline score is \( r{s}_{Y_0}{s}_{Y_1}-{s}_{Y_0}^2 \). The correlation will therefore be positive if \( r>{s}_{Y_0}/{s}_{Y_1} \) and negative if \( r<{s}_{Y_0}/{s}_{Y_1} \). In the special case of r = 0 (i.e. Y_{1} and Y_{0} are not correlated), there will still be a negative correlation between change score (Y_{1} − Y_{0}) and baseline score Y_{0}. If the post score and baseline have similar variance, then corr(Y_{1} − Y_{0}, Y_{0}) will usually be negative because r ≤ 1. Most importantly, Eq. 4 shows that the change score and the baseline score are correlated except in the special case \( r={s}_{Y_0}/{s}_{Y_1} \); therefore, one should use ANCOVA to adjust for the baseline score.
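For reference, the correlation discussed above follows from the definitions of r and the sample SDs in a few lines. This is a sketch of the standard derivation; the form printed as Eq. 4 may be arranged differently.

\[ \operatorname{cov}(Y_1 - Y_0,\, Y_0) = \operatorname{cov}(Y_1, Y_0) - s_{Y_0}^2 = r\,s_{Y_0} s_{Y_1} - s_{Y_0}^2 \]

\[ \operatorname{corr}(Y_1 - Y_0,\, Y_0) = \frac{r\,s_{Y_0} s_{Y_1} - s_{Y_0}^2}{s_{Y_0}\, s_{(Y_1 - Y_0)}} = \frac{r\,s_{Y_1} - s_{Y_0}}{\sqrt{s_{Y_0}^2 + s_{Y_1}^2 - 2 r\,s_{Y_0} s_{Y_1}}} \]

Two special cases are easy to read off. With r = 0 the correlation is \( -{s}_{Y_0}/\sqrt{s_{Y_0}^2+s_{Y_1}^2} \), which is negative. With equal SDs and r < 1 it reduces to \( -\sqrt{(1-r)/2} \), which is also negative.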
Equation 4 was also applied when comparing methods of measurement [8], where Y_{1} and Y_{0} were replaced by test measure and standard measure, respectively. The authors of [8] show that plotting difference against standard method is misleading, because there will be a negative correlation even if the two methods are not correlated. The authors also conclude that plotting difference against the average is more useful in almost all medical measures.
Equation 5 shows that if r is small, \( {s}_{\left({Y}_1-{Y}_0\right)}^2 \) will be greater than \( {s}_{Y_1}^2 \); i.e. using change score will add variance compared with using post score as the outcome measure, and therefore will be less likely to show a significant result. Conversely, the change score will be more likely to show a significant result if r is high. However, the choice of the outcome measure should not be driven by the likelihood of a significant result; instead, it should be prespecified in the trial protocol [17].
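How small r must be can be made explicit with the variance sum law, which is presumably the content of Eq. 5 (an assumption, since the equation itself is not reproduced here):

\[ s_{(Y_1 - Y_0)}^2 = s_{Y_0}^2 + s_{Y_1}^2 - 2 r\,s_{Y_0} s_{Y_1} \]

\[ s_{(Y_1 - Y_0)}^2 - s_{Y_1}^2 = s_{Y_0}\left(s_{Y_0} - 2 r\,s_{Y_1}\right) > 0 \iff r < \frac{s_{Y_0}}{2\,s_{Y_1}} \]

When the baseline and post scores have equal SDs, the threshold is r = 0.5: below it the change score has the larger variance, and above it the post score does.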
Using change score as outcome has undesirable implications. For example, if there is a hard lower or upper limit on the score, it may lead to “floor” or “ceiling” effects in change score. If transformation of the original scores is used during data analysis, it is not guaranteed that the transformation applies to the change score. Different transformations can reorder change scores across patients. By contrast, using post scores is always valid and never misleading [12].
The change score can be a reasonable outcome when the correlation between baseline and post scores is high (e.g. r > 0.8) in stable chronic conditions such as obesity [17]. In this instance, ANCOVA is still the preferred general approach.
In the Appendix, we show that the change score is always correlated with the baseline score (Eq. 9) and with the post score (Eq. 10). This is purely a statistical artefact, and it exists regardless of whether the treatment is effective or not. The method of using change score as outcome measure is prone to incorrect interpretations of such correlations.
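These two correlations can be computed directly from r and the two SDs and compared with simulated data. The sketch below uses the standard covariance algebra; whether the expressions match the exact layout of Eqs. 9 and 10 is an assumption, and the function name and the example values (r = 0.5, SDs of 10 and 12) are hypothetical.

```python
import numpy as np


def change_score_correlations(r, s0, s1):
    """Correlation of the change score (Y1 - Y0) with baseline Y0 and with post Y1."""
    sd_change = np.sqrt(s0**2 + s1**2 - 2 * r * s0 * s1)
    corr_with_baseline = (r * s1 - s0) / sd_change
    corr_with_post = (s1 - r * s0) / sd_change
    return corr_with_baseline, corr_with_post


# Hypothetical inputs: no treatment is modelled at all.
r, s0, s1 = 0.5, 10.0, 12.0
theory_base, theory_post = change_score_correlations(r, s0, s1)

# Draw correlated baseline and post scores and compute the same quantities empirically.
rng = np.random.default_rng(2)
cov = [[s0**2, r * s0 * s1], [r * s0 * s1, s1**2]]
y0, y1 = rng.multivariate_normal([50.0, 50.0], cov, size=100_000).T
change = y1 - y0
sim_base = np.corrcoef(change, y0)[0, 1]
sim_post = np.corrcoef(change, y1)[0, 1]

print("corr(change, baseline): theory", theory_base, "simulated", sim_base)
print("corr(change, post):     theory", theory_post, "simulated", sim_post)
```

Both correlations are clearly non-zero even though the simulated data contain no treatment effect, which illustrates why they should not be read as evidence about efficacy.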
In summary, one should be cautious about using change score as the outcome measure. If justification exists for using change score as the outcome measure, one should still adjust for baseline using ANCOVA. This will increase statistical power and avoid the pitfall of RTM.
Discussion
Potential imbalance of baseline in RCTs
In practice, given the finite sample size and random nature of RCTs, any important prognostic factors and baseline score may not be balanced between arms. RCTs of small or moderate sample sizes are particularly prone to such imbalances.
The balance of specific prognostic factors can be achieved during randomisation. The most commonly used randomisation methods are stratified permuted blocks [1] and minimisation [2]. Both methods allow randomisation to be stratified according to prognostic factors, such as gender, disease severity, age group, etc., which ensures that these characteristics are balanced between the treatment and control arms. One should adjust for stratification or minimisation factors during the data analysis stage [13].
However, the randomisation methods outlined above do not deal with the potential imbalance in baseline scores between arms. It is therefore of particular importance to measure the baseline scores before randomisation and then use ANCOVA to adjust for baseline in the data analysis stage, as shown in this paper.
The correlation between change score and baseline score
The correlation between change score (Y_{1} − Y_{0}) and baseline score Y_{0} has previously been observed in the context of initial blood pressure and its fall with treatment [11]. We provide a detailed mathematical derivation in the Appendix. Equation 9 shows that there is always a correlation between the change score and the baseline, regardless of any treatment effects. This correlation will be negative if r is small; that is, if the change score is the chosen outcome measure (for instance, of blood pressure), we will observe larger falls in blood pressure among patients whose baseline blood pressure is higher. An incorrect interpretation of this pattern would be to conclude that the treatment is more effective for patients whose initial blood pressure is high.
Deriving correlation between baseline score Y_{0} and post score Y_{1}
In this paper, we have assumed that the value of r, the correlation between baseline score Y_{0} and post score Y_{1}, is known. However, the value of r is usually not readily available in the design stage of an RCT. In a previous paper [9], we have shown how to derive r using the variance sum law based on a published paper, and then use the derived value of r to calculate sample size using different methods.
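A minimal sketch of the rearrangement involved is given below. It assumes that a publication reports the SDs of the baseline score, the post score, and the change score; the function name and the example SDs are hypothetical, and the procedure in [9] may differ in detail.

```python
def derive_r(sd_baseline, sd_post, sd_change):
    """Baseline-post correlation implied by three reported SDs (variance sum law)."""
    r = (sd_baseline**2 + sd_post**2 - sd_change**2) / (2.0 * sd_baseline * sd_post)
    if not -1.0 <= r <= 1.0:
        raise ValueError(f"implied r = {r:.3f} is outside [-1, 1]; check the reported SDs")
    return r


# Hypothetical reported SDs: baseline 10, post 12, change 11.
r = derive_r(sd_baseline=10.0, sd_post=12.0, sd_change=11.0)
print(round(r, 4))  # 0.5125
```

The range check is a design choice for catching reporting errors, for example a standard error printed in place of an SD, which can produce an implied correlation outside [−1, 1].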
Once we have derived r, we can derive correlations between different variables using equations derived in the Appendix. In the ideal situation when the raw data are available, one can fully investigate correlations between different variables using the equations provided in this paper.
Bland-Altman plots
The same mathematical principles derived in the Appendix can be applied to both choosing outcome measures in an RCT and assessing agreement between two measurement methods [5, 8]. A Bland-Altman plot shows the difference of the two measures on the y-axis and their average on the x-axis.
When assessing the agreement between two measurement methods, one should use a Bland-Altman plot showing the difference of the two measures against their average. The correlation r between the two measures does not assess their agreement [5]. We note that if the ranges of the two measures are different, their variances will be different; therefore, there will be a trend on the Bland-Altman plot, caused by the correlation shown in Eq. 13. Therefore, one should examine the variances of the two measures before using Bland-Altman plots.
Similarly, in the context of outcome measures in an RCT, one can plot the change score against the average of the baseline and post scores, as in a Bland-Altman plot. Equation 13 shows that the correlation between the change score and the average of the two scores will be zero if the baseline score and post score have equal variance.
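The zero-correlation result can be seen from a short derivation, sketched here with the notation defined earlier; it is presumably equivalent to Eq. 13, although the printed form may differ.

\[ \operatorname{cov}\!\left(Y_1 - Y_0,\ \tfrac{1}{2}(Y_1 + Y_0)\right) = \tfrac{1}{2}\left(s_{Y_1}^2 - s_{Y_0}^2\right) \]

\[ \operatorname{corr}\!\left(Y_1 - Y_0,\ \tfrac{1}{2}(Y_1 + Y_0)\right) = \frac{s_{Y_1}^2 - s_{Y_0}^2}{\sqrt{\left(s_{Y_0}^2 + s_{Y_1}^2 - 2 r\,s_{Y_0} s_{Y_1}\right)\left(s_{Y_0}^2 + s_{Y_1}^2 + 2 r\,s_{Y_0} s_{Y_1}\right)}} \]

The cross terms cancel in the covariance, so the numerator depends only on the difference between the two variances. It is zero when the variances are equal, and its sign follows whichever measure has the larger variance.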
Limitations
The methods described in this paper only consider continuous variables or outcome measures. They are not applicable to binary variables.
Declarations
Acknowledgements
The research presented in this paper builds on original contributions from Professor Doug Altman, although he was not directly involved in the work described by this paper. This work was undertaken during the years LC worked for Professor Altman, who passed away during the review period of this paper. The authors dedicate this paper to the memory of Professor Altman, with our deepest admiration, affection, and respect.
Funding
Not applicable.
Availability of data and materials
Not required.
Authors’ contributions
LC conceived the research idea and led the writing of the paper. DC contributed to writing the paper. Both authors read and approved the final manuscript.
Ethics approval and consent to participate
Not required.
Consent for publication
Both authors have given consent for publication. No other personal data are used in this methodological paper.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
 Altman DG, Bland JM. How to randomise. BMJ. 1999;319(7211):703–4.
 Altman DG, Bland JM. Treatment allocation by minimisation. BMJ. 2005;330(7495):843.
 Barnett AG, van der Pols JC, Dobson AJ. Regression to the mean: what it is and how to deal with it. Int J Epidemiol. 2005;34(1):215–20.
 Bland JM. Regression towards the mean or Why was Terminator III such a disappointment? 2004. https://www-users.york.ac.uk/~mb55/talks/regmean.htm. Accessed 23 Dec 2018.
 Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;327(8476):307–10.
 Bland JM, Altman DG. Some examples of regression towards the mean. BMJ. 1994a;309(6957):780.
 Bland JM, Altman DG. Statistics notes: regression towards the mean. BMJ. 1994b;308(6942):1499.
 Bland JM, Altman DG. Comparing methods of measurement: why plotting difference against standard method is misleading. Lancet. 1995;346(8982):1085–7.
 Clifton L, Birks J, Clifton DA. Comparing different ways of calculating sample size for two independent means: a worked example. Contemp Clin Trials Commun. 2018:100309.
 Galton F. Regression towards mediocrity in hereditary stature. J Anthropol Inst G B Irel. 1886;15:246–63.
 Gill JS, Beevers DG, Zezulka AV, Davies P. Relation between initial blood pressure and its fall with treatment. Lancet. 1985;325(8428):567–9.
 Harrell F. How should change be measured? 2017. http://biostat.mc.vanderbilt.edu/wiki/Main/MeasureChange. Accessed 23 Dec 2018.
 Kahan BC, Morris TP. Improper analysis of trials randomised using stratified blocks or minimisation. Stat Med. 2012;31(4):328–40.
 Laird N. Further comparative analyses of pretest-posttest research designs. Am Stat. 1983;37(4):329–30.
 Pocock SJ, Bakris G, Bhatt DL, Brar S, Fahy M, Gersh BJ. Regression to the mean in SYMPLICITY HTN-3: implications for design and reporting of future trials. J Am Coll Cardiol. 2016;68(18):2016–25.
 Vickers AJ. The use of percentage change from baseline as an outcome in a controlled trial is statistically inefficient: a simulation study. BMC Med Res Methodol. 2001;1(1):6.
 Vickers AJ, Altman DG. Analysing controlled trials with baseline and follow up measurements. BMJ. 2001;323(7321):1123.