 Methodology
 Open access
 Published:
A comparison of approaches for combining predictive markers for personalised treatment recommendations
Trials volume 22, Article number: 20 (2021)
Abstract
Background
In the presence of heterogeneous treatment effects, it is desirable to divide patients into subgroups based on their expected response to treatment. This is formalised via a personalised treatment recommendation: an algorithm that uses biomarker measurements to select treatments. It could be that multiple, rather than single, biomarkers better predict these subgroups. However, finding the optimal combination of multiple biomarkers can be a difficult prediction problem.
Methods
We described three parametric methods for finding the optimal combination of biomarkers in a personalised treatment recommendation, using randomised trial data: a regression approach that models outcome using treatment by biomarker interactions; an approach proposed by Kraemer that forms a combined measure from individual biomarker weights, calculated on all treated and control pairs; and a novel modification of Kraemer’s approach that utilises a prognostic score to sample matched treated and control subjects. Using Monte Carlo simulations under multiple datagenerating models, we compare these approaches and draw conclusions based on a measure of improvement under a personalised treatment recommendation compared to a standard treatment. The three methods are applied to data from a randomised trial of homedelivered pragmatic rehabilitation versus treatment as usual for patients with chronic fatigue syndrome (the FINE trial). Prior analysis of this data indicated some treatment effect heterogeneity from multiple, correlated biomarkers.
Results
The regression approach outperformed Kraemer’s approach across all datagenerating scenarios. The modification of Kraemer’s approach leads to improved treatment recommendations, except in the case where there was a strong unobserved prognostic biomarker. In the FINE example, the regression method indicated a weak improvement under its personalised treatment recommendation algorithm.
Conclusions
The method proposed by Kraemer does not perform better than a regression approach for combining multiple biomarkers. All methods are sensitive to misspecification of the parametric models.
Background
One of the primary aims of the modern paradigm of stratified medicine is to move beyond a onesizefitsall approach that allocates treatment based on population average responses, towards identifying patient subgroups for whom a given treatment is beneficial and those for whom it is not. Given a patient population with heterogeneous treatment response, it might be possible to produce an algorithm for clinical use that provides a recommendation for treatment based on measurable traits (biomarkers). For these purposes, it is necessary to separate biomarkers into those that predict treatment response (moderating biomarkers) and those that predict the outcome, regardless of treatment (prognostic biomarkers). When the treatment choice is binary (the situation considered in this paper), the algorithm may recommend a treatment over an alternative for values of a single moderating biomarker, or a weighted combination of multiple moderating biomarkers. Such an algorithm is referred to as a personalised treatment recommendation (PTR).
In many disease areas, it might be that a combination of multiple biomarkers is more effective in identifying subgroups with a beneficial treatment outcome than any single biomarker [1]. Finding the optimal combination of biomarkers in a PTR algorithm is a challenging prediction problem. In order to avoid the confounding between treatment assignment and outcome, it is considered optimal that PTR’s are estimated from randomised controlled trial (RCT) data. A method used to estimate a PTR is to fit a regression model with treatment by biomarker interaction terms [2, 3]. Kraemer [4] proposes an alternative method that uses a parametric model fitted to all pairwise combinations of treated and control subjects. The method assigns a weight to each moderator from the correlation between pair’s average moderator value and the difference in their outcomes. A composite moderator is then derived from the individual moderator weights and used to derive a PTR.
In this paper, we expand the Kraemer method by utilising a prognostic score to sample matched pairs of treated and control subjects, rather than using all pairwise combinations. This prognostic score is the estimated treatmentfree outcome in all subjects, regardless of treatment assignment. We use Monte Carlo simulations to compare the Kraemer method, its proposed modification and the regression method with respect to estimating a PTR. Their respective utility is measured using an estimate of the expected change in outcome under the PTR compared to a nonstratified approach. The comparisons are made under a range of datagenerating mechanisms and sample sizes. The methods are then applied to data from a randomised controlled trial of pragmatic rehabilitation versus treatment as usual, for patients with chronic fatigue syndrome [5].
Methods Constructing a personalised treatment recommendation
Following a randomised controlled trial, for a continuous outcome Y and binary treatment A (following Kraemer, treatment is effectcoded as +1/2 for treated and –1/2 for control), the outcome for the ith subject might be described by the linear model:
where α represents parameters for the prognostic variables in vector X (determining treatmentfree outcome) and β are parameters for the moderator variables in vector Z (determining treatment effect heterogeneity). β_{0} represents the effect of treatment at the average value of the moderators.
Kraemer method for creating a ‘composite moderator’
In order to estimate the overall moderating effect of multiple biomarkers, Kraemer proposes a method for creating a ‘composite moderator’ from a weighted sum of moderating biomarkers [4, 6]. In the first step, a dataset is constructed from all possible pairwise combinations of treated and untreated subjects. From the parameterisation in Eq. 1, the change in outcome between the jth pair of treated (superscript T) and control (superscript C) subjects is equivalent to:
which can be reexpressed as:
where ∆ represents the withinpair difference and μ represents the withinpair mean. Intuitively, if a single variable Z had a strong moderating effect, then we would expect the withinpair difference ∆(Y_{j}) would increase in line with the average value of the moderator. Similarly, if a single variable X had a strong prognostic effect, then we would expect that large withinpair differences in that variable would result in large withinpair differences in the outcome. Kraemer formally shows that, for a single marker, the moderator effect size is characterised by the correlation between ∆(Y) and μ(Z) and the prognostic effect size is characterised by the correlation between ∆(Y) and ∆(X).
We can account for multiple, correlated moderating variables by regressing the average moderator value for K variables on the change in outcome \( \Delta (Y)={\sum}_1^K{w}_k\mu (Z) \) to calculate moderatorspecific weights w_{1}, w_{2}, …w_{k}. A combined moderator is obtained by the sum of the individual moderator weights: \( {Z_i}^{\ast }=\sum \limits_1^K{w}_k{Z}_{ik} \). This combined moderator can be used to separate subgroups based on their expected treatment response (see below) [6].
Modified Kraemer approach: matching on a prognostic score
We modify Kraemer’s approach to determine whether considering only treated and control pairs that have similar values of a ‘prognostic score’ improves its performance. A prognostic score is an estimate of the treatmentfree outcome. This is calculated by fitting a regression model to the subjects in the control group only (thus excluding any moderators):
The resulting model can be used to predict the outcome under the control condition, for subjects in both the control and treatment groups.
The resulting estimates are referred to as prognostic scores and have been used in observational research to control for confounding [7].
We propose modifying Kraemer’s approach so that the composite moderator is derived using a sample of treated and control pairs that only have similar values of their prognostic scores. In our implementation, we use a single nearest neighbour matching algorithm, with replacement, and with a calliper such that, for each treated subject, a control is sought that is within 0.1 times the standard deviation of the prognostic score. The calliper width is arbitrary and was set prior to any implementation.
The rationale for this modification is to minimise the contribution to ∆(Y) by variables that are irrelevant to treatment effect modification. There is a tradeoff between minimising this variance and losing treated and control pairs who do not fit the matching criteria. We investigate whether this modification results in an improvement on the Kraemer approach by applying these techniques to simulated datasets. First, we establish how to measure improvement in the context of determining a personalised treatment recommendation.
Constructing a personalised treatment recommendation (PTR)
A PTR uses biomarker values to recommend whether a patient should be treated or not. For example, statins are recommended in the UK if a person is aged over 40 and if their estimated CVD risk is at least 10% over a 10year period [8]. Formally, this can be represented as: \( \mathrm{PTR}=\mathbbm{I}\left\{\mathrm{age}\ge 40\&\mathrm{CVD}\ \mathrm{risk}\ge 10\%\right\} \), where \( \mathbbm{I} \) indicates that treatment (statins) is recommended over the alternative (no statins) when the bracketed expression is true.
Assuming higher values of a continuous outcome are advantageous, an optimal PTR is one that recommends treatment when, conditional on a set of moderating biomarkers Z, the mean outcome under treatment μ(A = 1/2, Z) is greater than the mean outcome under control μ(A = − 1/2, Z):
Under the parametrisation in Eq. (1):
The parameters β_{0} and β can be estimated using ordinary least squares regression with treatment by moderator interaction terms [2, 3]. This we refer to as the regression approach.
A PTR can be constructed using the Kraemer or modified Kraemer method using the following steps: (1) calculate the combined moderator; (2) regress the outcome on a model with treatment, the combined moderator and their interaction with treatment: \( \mu \left(A,{Z}^{\ast}\right)={\alpha}_0+{\alpha Z}^{\ast }+A\left({\beta}_0^{\ast }+{\beta}^{\ast }{Z}^{\ast}\right) \) (where Z^{∗} is the combined moderator and \( {\beta}_0^{\ast }+{\beta}^{\ast } \) are the average effect of treatment and the moderated effect, respectively); and (3) use these parameters to calculate the PTR: \( \mathrm{PTR}=\mathbbm{I}\left\{\left({\beta}_0^{\ast }+{\beta}^{\ast }{Z}^{\ast}\right)>0\right\} \).
Measuring the performance of a personalised treatment recommendations
Under randomisation, an unbiased estimate of the population mean outcome under a PTR is provided by the mean of the observed outcome in those receiving the treatment they were recommended weighted by the probability of being randomised to their respective group (π) [9,10,11]:
This can be contrasted with the average outcome under treatment \( \mu \left\{A=1/2\right\}=\frac{1}{n}{\sum}^n\left(\frac{A+1/2}{\pi }Y\right) \) or control \( \mu \left\{A=1/2\right\}=\frac{1}{n}{\sum}^n\left(\frac{1/2A}{1\pi }Y\right) \) to get the parameters: θ_{T} = μ{PTR} − μ{A = 1/2 } and θ_{C} = μ{PTR} − μ{A = − 1/2 }. These are interpreted as the expected change in outcome under a PTR compared to a policy where everybody receives treatment or everybody receives control.
In a simulation study, an additional measure of the performance of a PTR is the rate of misclassification; that is, the proportion of subjects whose PTR conflicts with their known optimal treatment: ℙ(PTR(X) ≠ PTR^{opt}), where PTR^{opt} indicates treatment if the simulated outcome under treatment is greater than the simulated outcome under control.
Simulations comparing approaches
Monte Carlo simulations were constructed to compare the regression, Kraemer and modified Kraemer approaches to estimating a PTR. Training datasets were simulated with sample sizes 75, 200 and 300 with 1:1 randomisation. These datasets were generated under a range of scenarios (shown in Table 1). PTRs were estimated using all three approaches and applied to a test dataset of the same size and generated using the same specifications as the training dataset. For each PTR, we use the test dataset to calculate the change under PTR (θ_{T}) and the misclassification rate. For each datagenerating scenario, 5000 simulations were carried out and we evaluate each method by averaging θ_{T} and the misclassification across simulations.
Application to randomised trial data
The three approaches to constructing a PTR were applied to data from the Fatigue Intervention Nurses Evaluation (FINE) randomised trial [12]. This trial randomised 296 patients diagnosed with Chronic Fatigue Syndrome to three groups: homedelivered pragmatic rehabilitation, supportive listening or treatment as usual. It found marginal evidence that homedelivered pragmatic rehabilitation reduced fatigue scores, compared to treatment as usual (effect estimate − 1.18, 95% confidence interval − 2.18 to − 0.18; 2sided p value = 0.021) [5]. An exploratory secondary analysis considered individual moderators of this effect for those randomised to either pragmatic rehabilitation or treatment as usual (n = 195) using variables collected at baseline [13]. Here, we combine these moderators to identify whether a PTR that recommends treatment to a subset improves on a scenario where everybody receives treatmentasusual approach.
The original effect estimate was not enough to change policy, given that the treatment was costly. Stratifying the treatment so that it is provided to those who benefit most might be more costeffective than providing treatment to everybody. For our analysis, variables were designated prognostic (in the regression or modified Kraemer approaches) if they were identified as such in the initial analysis [13]. The outcome variable was change in fatigue score from baseline to followup (with positive values indicating an improvement). Variables were designated moderators if they had a univariate p value in the initial analysis of less than 0.10. Moderating variables were then excluded then if their p value was greater than 0.3, in a multivariate model that included all moderating variables. Three variables remained to include in the PTR: baseline fatigue score (p value for interaction = 0.24), EQ5D mobility (no problems, some problems, severe problems, p = 0.16) and score on the Oslo Social Support scale, relating to concern (p = 0.15). The data was randomly split in half between training and test datasets (size n = 98 and n = 97 respectively) and then PTRs were evaluated on the test dataset using the parameter θ_{C} outlined above. Inference for this parameter was determined by drawing 1000 bootstrap samples and using the normal approximation.
Results
Simulations
Across all datagenerating scenarios and sample sizes, the regression method was superior to both the Kraemer and the modified Kraemer methods: on average, it was both associated with higher values of θ_{T} (the expected benefit under PTR compared to treating everyone) and the lowest misclassification rate (Table 2). The modification of the Kraemer method, where treated and control subjects are matched on their prognostic score, improved on the Kraemer method across most datagenerating scenarios. The exception is the scenario with a strong prognostic variable that is not included in the regression model. This suggests that the prognostic score is useful only when it captures sufficient variation in the prognostic effects. Post hoc, we changed the size of the calliper distance but this did not make any notifiable difference until it was < 0.05 SD or > 1.5 SD of the prognostic score (Table 2).
In the scenario with a nonlinear datagenerating model, no method, on average, constructed a PTR where subjects had a better outcome compared to a policy where everybody was treated. It is worth noting that, in this scenario, if each simulated subject were allocated the treatment they should have received, then their expected outcome would be, on average, 1.40 higher than if everybody were treated.
Trial data
The results of the PTR algorithms and the estimated change under a PTR, compared to a treatmentasusual approach, are shown in Table 3. There is weak evidence that the regression method results in a PTR that results in a greater reduction in chronic fatigue symptoms compared to an approach where everybody receives treatment as usual (θ = 1.92, 95% CI − 0.65 to 4.49). There was little evidence that a PTR estimated using the Kraemer method or the modified Kraemer method results in an improvement compared to treatment as usual (p = 0.47 and p = 0.13 respectively). Eight subjects were excluded when implementing the modified Kraemer approach because they did not have a match within the set calliper distance of the prognostic score.
Discussion
This paper reported on a comparison of three methods for constructing personalised treatment recommendations from randomised controlled trial data: the regression method that models outcome using treatment by moderator biomarker interactions; a method proposed by Kraemer that forms a combined moderator from individual moderator weights, calculated on all treated and control pairs; and a modification of Kraemer’s approach that utilises a prognostic score to sample pairs of treated and control subjects. Across all simulations, the regression approach outperformed Kraemer’s approach. The modification of Kraemer’s approach appeared to indicate higher values of θ, except in the case where there was a strong unobserved prognostic marker. The superiority of the regression approach was replicated using realworld data from a randomised trial of homedelivered pragmatic rehabilitation for chronic fatigue patients; however, for this example, no method conclusively demonstrated that a PTR does better than a policy of ‘treatment as usual’ despite there being several individual moderators of treatment effect [13]. Therefore, in this case, we conclude that forming a PTR is more difficult than finding individual treatment effect moderators.
All three methods described here use linear parameterisation to model trial data, and therefore, the efficacy of these methods relies on the models being correctly specified. In many situations, nonlinear models may be more applicable and when there are many variables then the likelihood of correctly specifying the model might be low. For example, none of the approaches, on average, indicated an improvement under PTR in the scenario with a nonlinear datagenerating model. These simulations were limited because they did not include any variable selection or transformations of variables based on model fit. Such processes require care and often require knowledge of the variables at hand. In practice, researchers will be insuring against model underfitting by testing for nonlinear terms and higherorder interactions. Overfitting of these models could be counterbalanced using regularisation techniques, such as Lasso regression. Additionally, methods exist that are more robust to model misspecification, for example methods that seek to maximise the expected outcome under a PTR using classification techniques [14,15,16].
The application to the FINE randomised control trial showed that a PTR, as estimated using the regression approach, might result in an improvement over a recommendation where everybody is provided treatment as usual; however, the 95% confidence interval for all approaches included estimates where the PTR strategy is associated with a small amount of harm. These results should not be overinterpreted: the 95% confidence intervals were wide, which indicates insufficient power to detect a change. Data from another trial testing the use of the PTR would be needed to confirm whether any PTR results in an overall benefit. Another aspect that should be considered is whether including cost information in the measure of benefit has an effect on the decision to adopt a PTR strategy.
In the discussion to the paper, Kraemer says: ‘an irrelevant baseline factor and a nonspecific predictor can have no influence on making different choices between [treatments]’. We argue that this appears to be false, based on our findings: information from a prognostic score appears to result in a composite moderator that more effectively discriminates between those who should receive treatment and those that should not. Whilst the Kraemer approach does not appear to improve on the regression approach, it should be noted that the original paper provides a useful example of how to judge the relative effect sizes of multiple modifiers that would form a useful exploratory analysis before forming a PTR.
Conclusion
Our simulations demonstrate that the parametric method proposed by Kraemer does not result in a more effective personalised treatment recommendation than a method that uses a regression model. Utilising a prognostic score improves the Kraemer method, however not to an extent that it should be adopted over the regression method.
Availability of data and materials
Stata dofiles used to construct simulations are available on request from the corresponding author.
Abbreviations
 PTR:

Personalised treatment recommendation
 RCT:

Randomised controlled trial
 FINE:

Fatigue Intervention Nurses Evaluation randomised trial
References
The Academy of Medical Sciences. Realising the potential of stratified medicine: Academy of Medical Sciences; 2013. p. 105.
Byar DP. Assessing apparent treatment  covariate interactions in randomized clinical trials. Stat Med. 1985;4:255–63.
Gail AM, Simon R. Testing for qualitative interactions between treatment effects and patient subsets. Int Biometric Soc. 2009;41:361–72.
Kraemer HC. Discovering, comparing, and combining moderators of treatment on outcome after randomized clinical trials: a parametric approach. Stat Med. 2013;32:1964–73.
Wearden AJ, Dowrick C, ChewGraham C, Bentall RP, Morriss RK, Peters S, et al. Nurse led, home based self help treatment for patients in primary care with chronic fatigue syndrome: randomised controlled trial. Br Med J. 2010;340:c1777.
Wallace ML, Frank E, Kraemer HC. A novel approach for developing and interpreting treatment moderator profiles in randomized clinical trials. JAMA Psychiatry. 2013;70:1241–7.
Hansen BB. The prognostic analogue of the propensity score. Biometrika. 2008;95:481–8.
National Institute for Health and Care Excellence. Cardiovascular disease: risk assessment and reduction including lipid modification (NICE Guidelines CG181). 2016. Available at: https://www.nice.org.uk/guidance/cg181. Accessed 12 Oct 2002.
Song X, Pepe MS. Evaluating markers for selecting a patient’s treatment. Biometrics. 2004;60:874–83.
Janes H, Brown MD, Huang Y, Pepe MS. An approach to evaluating and comparing biomarkers for patient treatment selection. Int J Biostat. 2014;10:99–121.
Kapelner A, Bleich J, Cohen ZD, DeRubeis RJ, Berk R. Inference for treatment regime models in personalized medicine 2014: http://arxiv.org/abs/1404.7844.
Wearden AJ, Riste L, Dowrick C, ChewGraham C, Bentall RP, Morriss RK, et al. Fatigue intervention by nurses evaluationthe FINE trial. A randomised controlled trial of nurse led selfhelp treatment for patients in primary care with chronic fatigue syndrome: study protocol. BMC Medicine. 2006;4:9.
Wearden AJ, Dunn G, Dowrick C, Morriss RK. Depressive symptoms and pragmatic rehabilitation for chronic fatigue syndrome. Br J Psychiatry. 2012;201:227–32.
Zhao Y, Zeng D, Rush AJ, Kosorok MR. Estimating individualized treatment rules using outcome weighted learning. J Am Stat Assoc. 2012;107:1106–18.
Rubin DB, van der Laan MJ. Statistical issues and limitations in personalized medicine research with clinical trials. Int J Biostat. 2012;8:18.
Zhang B, Tsiatis AA, Davidian M, Zhang M, Laber E. Estimating optimal treatment regimes from a classification perspective. Stat. 2012;1:103–14.
Acknowledgements
We would like to thank Alison Wearden and the rest of the FINE team for the use of their trial data and Andrew Pickles and Jennifier Hellier for their helpful comments on this work. This work is dedicated to the memory of Professor Graham Dunn, who provided us with support and advice when developing this manuscript.
Funding
This work was supported by the MRC North West Hub for Trials Methodology Research (MR/K025635/1). This work represents independent research part funded by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.
Author information
Authors and Affiliations
Contributions
RE conceived of the study and MP developed the methodology and carried out the analysis. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Ethical approval for the FINE trial was granted by the Eastern Multicentre Research Ethics Committee, reference 03/5/62.
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Pierce, M., Emsley, R. A comparison of approaches for combining predictive markers for personalised treatment recommendations. Trials 22, 20 (2021). https://doi.org/10.1186/s13063020049012
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13063020049012