The choice of outcome measure: post score vs. change score
The methods outlined in this paper are suitable for any continuous measure, and therefore we have used a generic notation. Suppose the primary continuous outcome measure is Y, with Y0 and Y1 denoting the value of Y at baseline and post-intervention, respectively. Let r denote the correlation coefficient between Y0 and Y1.
We will call Y1 the “post score”, Y0 the “baseline score”, and (Y1 − Y0) the “change score”. We note that Y1 is also called “follow up score” in [17]. The authors of [17] assumed that there is no interaction between baseline score and intervention group, and we make the same assumption in this paper.
In analysis of covariance (ANCOVA), we estimate parameters a, b, and c in the following regression equation:
$$ \mathrm{post}\ \mathrm{score}=a+b\times \mathrm{baseline}+c\times \mathrm{group} $$
where “group” stands for “intervention group”. One is usually most interested in the estimate of c, the treatment effect, in an RCT. Substituting Y1 for “post score”, Y0 for “baseline score”, and G for “group”, we have the following regression equation for a standard ANCOVA that uses post score as outcome:
$$ {Y}_1=a+b{Y}_0+ cG \tag{1} $$
Rearranging Eq. 1, we have
$$ {Y}_1-{Y}_0=a+\left(b-1\right){Y}_0+ cG \tag{2} $$
Equation 2 is ANCOVA using change score (Y1 − Y0) as outcome and adjusting for baseline Y0. Compared with the standard ANCOVA in Eq. 1, where post score Y1 is the outcome, nothing has changed except that the regression coefficient for Y0 has decreased by 1. The residuals are identical, so the standard errors and confidence-interval widths of all estimated coefficients, and the estimates and p-values for a and c (in particular the treatment effect), remain the same as those in a standard ANCOVA. Further mathematical details can be found in [14].
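As a numerical sanity check of Eqs. 1 and 2, the sketch below fits both regressions by ordinary least squares on simulated data. The sample size, coefficients, and noise level are arbitrary assumptions for illustration, not values from any trial, and `fit_ols` is a hypothetical helper built on `numpy.linalg.lstsq`.

```python
import numpy as np

def fit_ols(y, X):
    """Ordinary least squares: return coefficients and residuals of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

rng = np.random.default_rng(1)
n = 200
g = np.repeat([0.0, 1.0], n // 2)                  # G: 0 = control, 1 = intervention
y0 = rng.normal(50, 10, n)                         # baseline score Y0 (simulated)
y1 = 10 + 0.6 * y0 + 3 * g + rng.normal(0, 8, n)   # post score Y1 (illustrative model)

X = np.column_stack([np.ones(n), y0, g])           # columns: intercept, Y0, G
beta_post, resid_post = fit_ols(y1, X)             # Eq. 1: outcome Y1
beta_change, resid_change = fit_ols(y1 - y0, X)    # Eq. 2: outcome Y1 - Y0

print("Eq. 1 (a, b, c):    ", beta_post.round(3))
print("Eq. 2 (a, b - 1, c):", beta_change.round(3))
```

Only the Y0 coefficient shifts by exactly 1; because the design matrix and the residuals are the same in both fits, the residual variance, and hence every standard error, is unchanged.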
Rearranging Eq. 1 in a different way, we have
$$ {Y}_1-b{Y}_0=a+ cG \tag{3} $$
Equation 3 shows that using change score as outcome without adjusting for baseline is equivalent to a standard ANCOVA only when b = 1. In practice, the estimated b in an ANCOVA is rarely equal to 1; hence, the unadjusted change-score analysis is only a special case of ANCOVA.
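With a binary group indicator G, Eq. 3 can be made concrete. Because the fitted ANCOVA passes through both group means, its treatment estimate is \( \hat{c}={\Delta}_1-\hat{b}\,{\Delta}_0 \), where \( {\Delta}_0 \) and \( {\Delta}_1 \) are the intervention-minus-control differences in mean baseline and mean post score, whereas the unadjusted change-score estimate is \( {\Delta}_1-{\Delta}_0 \). The two therefore differ by \( (\hat{b}-1){\Delta}_0 \). The sketch below checks this identity on simulated data; the parameter values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
g = np.repeat([0, 1], n // 2)                      # 0 = control, 1 = intervention
y0 = rng.normal(50, 10, n)                         # simulated baseline scores
y1 = 20 + 0.5 * y0 + 2 * g + rng.normal(0, 8, n)   # simulated post scores

def group_diff(v):
    """Mean of v in the intervention group minus mean in the control group."""
    return v[g == 1].mean() - v[g == 0].mean()

d0, d1 = group_diff(y0), group_diff(y1)
c_change = group_diff(y1 - y0)                     # unadjusted change-score estimate

# Standard ANCOVA (Eq. 1) fitted by least squares
X = np.column_stack([np.ones(n), y0, g])
a_hat, b_hat, c_ancova = np.linalg.lstsq(X, y1, rcond=None)[0]

print(f"b_hat = {b_hat:.3f}")
print(f"ANCOVA c = {c_ancova:.3f}, unadjusted change-score c = {c_change:.3f}")
# The gap equals (b_hat - 1) times the chance baseline imbalance d0
print(np.isclose(c_change - c_ancova, (b_hat - 1) * d0))
```

The gap vanishes only if \( \hat{b}=1 \) or the baseline means happen to coincide exactly.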
Regression to the mean (RTM) and ANCOVA
RTM is a well-known statistical phenomenon, first described by Galton [10]. RTM has been discussed by a number of authors, e.g. [3, 6, 7, 15]. In this paper, we consider RTM in the context of baseline measures.
If an extreme value is observed at baseline, the post-intervention value is likely to be less extreme, even if the intervention has no effect. In the RCT example in [17], the treatment effect of acupuncture was measured by a 100-point rating score, where lower scores indicate poorer outcomes. Suppose that the baseline scores of the control group reflect the scores of the general population, and that acupuncture has no treatment effect. If, by chance, the baseline scores of the intervention group are lower than those of the general population, their post scores will tend to be higher than their baseline scores because of RTM. We consider two choices of outcome measure:
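1. Post score: If post score is positively correlated with the baseline score (which is usually the case in clinical practice), the post scores of the intervention group will tend to remain below those of the control group, so acupuncture will appear to have a negative effect even though it has no effect; i.e. the treatment effect of acupuncture will be underestimated.
2. Change score: Because the intervention group started below the population mean, its scores will tend to rise more than those of the control group, so acupuncture will appear to have a positive effect even though it has no effect; i.e. the treatment effect of acupuncture will be overestimated.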
In both of the preceding scenarios, the appropriate statistical analysis is ANCOVA adjusting for baseline scores. The first scenario corresponds to Eq. 1, with its left-hand side showing the post score as the outcome measure. The second scenario corresponds to Eq. 2, with its left-hand side showing the change score as the outcome measure.
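The acupuncture scenario can be mimicked with a small Monte Carlo sketch. All settings are our own illustrative assumptions: both arms share mean 50, SD 10, and r = 0.5; there is no treatment effect; there are 50 patients per arm; and only simulated trials in which the intervention arm's mean baseline is at least 2 points below control are kept. The ANCOVA estimate uses the pooled within-group slope \( \hat{b} \) (the least-squares slope when an intercept and group indicator are included), so that \( \hat{c}={\Delta}_1-\hat{b}\,{\Delta}_0 \).

```python
import numpy as np

rng = np.random.default_rng(3)
trials, n = 4000, 50            # simulated trials; patients per arm (assumed)
mu, sd, rho = 50.0, 10.0, 0.5   # common mean, SD, and baseline-post correlation r

# Arrays of shape (trials, 2, n): arm 0 = control, arm 1 = intervention; no treatment effect
y0 = rng.normal(mu, sd, (trials, 2, n))
y1 = mu + rho * (y0 - mu) + np.sqrt(1 - rho**2) * sd * rng.normal(size=(trials, 2, n))

m0, m1 = y0.mean(axis=2), y1.mean(axis=2)     # arm means in each trial
d0 = m0[:, 1] - m0[:, 0]                      # chance baseline imbalance
d1 = m1[:, 1] - m1[:, 0]                      # difference in mean post score

# Pooled within-group slope, i.e. the ANCOVA estimate of b in each trial
dev0 = y0 - m0[..., None]
dev1 = y1 - m1[..., None]
b_hat = (dev0 * dev1).sum(axis=(1, 2)) / (dev0**2).sum(axis=(1, 2))

est_post = d1                 # post score, no baseline adjustment
est_change = d1 - d0          # change score, no baseline adjustment
est_ancova = d1 - b_hat * d0  # ANCOVA (same estimate from Eq. 1 or Eq. 2)

low = d0 < -2.0               # intervention arm started lower by chance
for name, est in [("post", est_post), ("change", est_change), ("ANCOVA", est_ancova)]:
    print(f"{name:>7}: mean estimate {est[low].mean():+.2f}")
```

Under these assumptions the post-score estimate averages below zero, the change-score estimate above zero, and the ANCOVA estimate close to zero, mirroring scenarios 1 and 2.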
Using change score as the outcome measure neither addresses the problem of RTM nor adjusts for baseline imbalance; instead, any imbalance tends to be reversed because of RTM [4]. Even if change score is deemed to be the appropriate outcome measure after careful consideration, ANCOVA should still be used to adjust for baseline scores, as shown by Eq. 2; in that case, ANCOVA is the valid statistical analysis.
ANCOVA has the advantages of being unaffected by baseline imbalance [17] and of having greater statistical power than other methods [16]. Randomization in an RCT addresses RTM at the design stage, because both groups are expected to regress to the mean equally, but one should still use ANCOVA to adjust for baseline in the analysis stage [3].
The validity of using change score as outcome measure
Let r denote the correlation between post score Y1 and baseline score Y0. Let \( {s}_{Y_0}^2 \) denote the sample variance of baseline score Y0, \( {s}_{Y_1}^2 \) denote the sample variance of post score Y1, and \( {s}_{\left({Y}_1-{Y}_0\right)}^2 \) denote the sample variance of the change score (Y1 − Y0). Let \( {s}_{Y_0} \), \( {s}_{Y_1} \), and \( {s}_{\left({Y}_1-{Y}_0\right)} \) denote their corresponding standard deviations (SD).
The Appendix shows that the correlation between change score (Y1 − Y0) and baseline score Y0 is
$$ corr\left({Y}_1-{Y}_0,{Y}_0\right)=\frac{r\ {s}_{Y_1}-{s}_{Y_0}}{\sqrt{s_{Y_0}^2+{s}_{Y_1}^2-2\ r\ {s}_{Y_0}{s}_{Y_1}}} \tag{4} $$
Equation 4 shows that corr(Y1 − Y0, Y0) will be positive if \( r>{s}_{Y_0}/{s}_{Y_1} \), negative if \( r<{s}_{Y_0}/{s}_{Y_1} \), and zero only when \( r={s}_{Y_0}/{s}_{Y_1} \). In the special case of r = 0 (i.e. Y1 and Y0 are uncorrelated), there will still be a negative correlation between change score (Y1 − Y0) and baseline score Y0. If the post score and baseline score have equal variance, then corr(Y1 − Y0, Y0) = \( -\sqrt{\left(1-r\right)/2} \), which is negative whenever r < 1. Most importantly, Eq. 4 shows that, except in the special case \( r={s}_{Y_0}/{s}_{Y_1} \), the change score is correlated with the baseline score; therefore, one should use ANCOVA to adjust for the baseline score.
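Equation 4 is an exact identity in the sample statistics, so it can be checked directly. The sketch below compares the formula with `numpy.corrcoef` on simulated data, then evaluates its sign for several values of r with \( {s}_{Y_0}=8 \) and \( {s}_{Y_1}=10 \), where the sign changes at r = 0.8. The data and the helper name `corr_change_baseline` are illustrative assumptions.

```python
import numpy as np

def corr_change_baseline(r, s0, s1):
    """Eq. 4: correlation between change score (Y1 - Y0) and baseline Y0."""
    return (r * s1 - s0) / np.sqrt(s0**2 + s1**2 - 2 * r * s0 * s1)

rng = np.random.default_rng(4)
y0 = rng.normal(50, 10, 500)              # simulated baseline scores
y1 = 0.7 * y0 + rng.normal(20, 9, 500)    # simulated post scores

r = np.corrcoef(y0, y1)[0, 1]
s0, s1 = y0.std(ddof=1), y1.std(ddof=1)

direct = np.corrcoef(y1 - y0, y0)[0, 1]   # computed from the change scores
formula = corr_change_baseline(r, s0, s1) # computed from r and the two SDs
print(f"direct {direct:.4f}, Eq. 4 {formula:.4f}")

# The sign depends on r relative to s0/s1 = 0.8 in this example
for r_ in (0.0, 0.5, 0.8, 0.9, 0.95):
    print(r_, round(corr_change_baseline(r_, s0=8.0, s1=10.0), 3))
```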
Equation 4 has also been applied to the comparison of measurement methods [8], with Y1 and Y0 replaced by the test measure and the standard measure, respectively. The authors of [8] show that plotting the difference against the standard method is misleading, because there will be a negative correlation even if the two methods are uncorrelated. They also conclude that plotting the difference against the average of the two methods is more useful for almost all medical measurements.
We now consider the variance of the change score (Y1 − Y0). The variance sum law states that the variance of the change score is
$$ {s}_{\left({Y}_1-{Y}_0\right)}^2={s}_{Y_0}^2+{s}_{Y_1}^2-2\ r\ {s}_{Y_0}{s}_{Y_1} \tag{5} $$
Equation 5 shows that if r is small (specifically, if \( r<{s}_{Y_0}/\left(2{s}_{Y_1}\right) \)), \( {s}_{\left({Y}_1-{Y}_0\right)}^2 \) will be greater than \( {s}_{Y_1}^2 \); i.e. using change score adds variance compared with using post score as the outcome measure, and the analysis will therefore be less likely to show a significant result. Conversely, if r is high (\( r>{s}_{Y_0}/\left(2{s}_{Y_1}\right) \)), the change score has the smaller variance and will be more likely to show a significant result. However, the choice of the outcome measure should not be driven by the likelihood of a significant result; instead, it should be pre-specified in the trial protocol [17].
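The trade-off in Eq. 5 can be tabulated directly. The sketch below assumes equal SDs of 10 at baseline and follow-up (an illustrative choice) and, for comparison, also shows \( {s}_{Y_1}^2\left(1-{r}^2\right) \), the residual variance of Y1 after linear regression on Y0 (population value, ignoring degrees-of-freedom corrections). The helper names are hypothetical.

```python
import numpy as np

def var_change(r, s0, s1):
    """Eq. 5: variance of the change score (Y1 - Y0)."""
    return s0**2 + s1**2 - 2 * r * s0 * s1

def var_resid_ancova(r, s1):
    """Residual variance of Y1 after linear regression on Y0 (population value)."""
    return s1**2 * (1 - r**2)

s0 = s1 = 10.0                 # assumed equal SDs at baseline and follow-up
threshold = s0 / (2 * s1)      # change score has smaller variance than post score only above this r

for r in (0.2, 0.5, 0.8):
    print(f"r = {r}: post {s1**2:.0f}, change {var_change(r, s0, s1):.0f}, "
          f"ANCOVA residual {var_resid_ancova(r, s1):.0f}")
```

With equal SDs the change-score variance is \( 2\left(1-r\right){s}^2 \), which drops below the post-score variance only for r > 0.5. The regression residual variance is never larger than either, since \( {s}_{\left({Y}_1-{Y}_0\right)}^2-{s}_{Y_1}^2\left(1-{r}^2\right)={\left({s}_{Y_0}-r{s}_{Y_1}\right)}^2\ge 0 \).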
Using change score as outcome has further undesirable implications. For example, if there is a hard lower or upper limit on the score, it may lead to “floor” or “ceiling” effects in the change score. If the original scores are transformed (e.g. log-transformed) for data analysis, the difference of the transformed scores is not, in general, a transformation of the raw change score, so conclusions about change can depend on the chosen scale; indeed, different transformations can reorder change scores across patients. By contrast, using post scores is always valid and never misleading [12].
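To illustrate the reordering point, the sketch below uses two hypothetical patients (values invented for illustration) and compares the raw change with the change on the log scale; the log is just one example of a transformation.

```python
import numpy as np

# Two hypothetical patients: (baseline, post) = (1, 10) and (50, 100)
baseline = np.array([1.0, 50.0])
post = np.array([10.0, 100.0])

raw_change = post - baseline                  # [9, 50]: patient 1 changes more
log_change = np.log(post) - np.log(baseline)  # [log 10, log 2]: patient 0 changes more

print("raw change:", raw_change, "-> larger for patient", raw_change.argmax())
print("log change:", log_change.round(3), "-> larger for patient", log_change.argmax())
```

The ranking of the two patients by change flips between scales, whereas their ranking by post score (10 vs 100) is the same on both scales, because the log is monotone.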
The change score can be a reasonable outcome when the correlation between baseline and post scores is high (e.g. r > 0.8), as in stable chronic conditions such as obesity [17]. Even in this case, ANCOVA is still the preferred general approach.
In the Appendix, we show that the change score is, in general, correlated with both the baseline score (Eq. 9) and the post score (Eq. 10). This correlation is purely a statistical artefact, and it exists regardless of whether the treatment is effective. Analyses that use change score as the outcome measure are therefore prone to misinterpreting such correlations.
In summary, one should be cautious about using change score as the outcome measure. If justification exists for using change score as the outcome measure, one should still adjust for baseline using ANCOVA. This will increase statistical power and avoid the pitfall of RTM.