 Methodology
Adjustment for baseline characteristics in randomized trials using logistic regression: sample-based model versus true model
Trials volume 24, Article number: 107 (2023)
Abstract
Background
Adjustment for baseline prognostic factors in randomized clinical trials is usually performed by means of sample-based regression models. Sample-based models may be incorrect due to overfitting. To assess whether overfitting is a problem in practice, we used simulated data to examine the performance of the sample-based model in comparison to a “true” adjustment model, in terms of estimation of the treatment effect.
Methods
We conducted a simulation study using samples drawn from a “population” in which both the treatment effect and the effect of the potential confounder were specified. The outcome variable was binary. Using logistic regression, we compared three estimates of the treatment effect in each situation: unadjusted, adjusted for the confounder using the sample, and adjusted for the confounder using the true effect. Experimental factors were sample size (from 2 × 50 to 2 × 1000), treatment effect (logit of 0, 0.5, or 1.0), confounder type (continuous or binary), and confounder effect (logit of 0, −0.5, or −1.0). The assessment criteria for the estimated treatment effect were bias, variance, precision (proportion of estimates within 0.1 logit units), type 1 error, and power.
Results
Sample-based adjustment models yielded more biased estimates of the treatment effect than adjustment models that used the true confounder effect but had similar variance, accuracy, power, and type 1 error rates. The simulation also confirmed the conservative bias of unadjusted analyses due to the noncollapsibility of the odds ratio, the smaller variance of unadjusted estimates, and the bias of the odds ratio away from the null hypothesis in small datasets.
Conclusions
Sample-based adjustment yields similar results to exact adjustment in estimating the treatment effect. Sample-based adjustment is preferable to no adjustment.
Introduction
Randomized trials rely on chance to form patient groups that are comparable at baseline. However, randomization balances the trial arms only in expectation, as a long-term average; it does not guarantee that the groups will be comparable in any given instance [1,2,3]. As a result, current guidelines recommend that analyses of randomized clinical trials be adjusted for baseline patient characteristics that are associated with the outcome [4,5,6]. This approach assumes that the researchers are interested in the conditional treatment effect, i.e., the treatment effect with all other patient characteristics held constant [7]. Several adjustment methods exist, including multiple regression, use of propensity scores, and other methods [8]. Here, we will consider only one case, adjustment for the confounder using logistic regression. In this case, as an added benefit, adjustment for prognostic factors will eliminate a conservative bias due to the noncollapsibility of the odds ratio, which occurs even when the trial arms are balanced [9,10,11,12].
Ideally, the adjustment model should represent correctly the effects of the prognostic factors under consideration. For example, if being 10 years older doubled the risk of death, this is the effect of age that should be used for adjustment. In real life, true effects are typically unknown, and the analyst estimates the effect of age from the trial sample at hand. But this sample-based model reflects the associations present in the study sample and will not necessarily yield the correct effect estimate: in a particular dataset, 10 years of age might triple the risk, increase it by half, or even reduce it. There is no guarantee that statistical adjustment based on available data will yield the correct estimate of the treatment effect, but it is also possible that the effect of overfitting would be negligible.
To our knowledge, the extent to which a potentially overfitted sample-based adjustment model affects the estimation of treatment effects in randomized trials has not been explored. In this study, we use simulated data to compare a sample-based adjustment model to a true adjustment model, in terms of bias in estimating the treatment effect, as well as its variance, accuracy, and power.
Methods
We conducted an experimental simulation study. In brief, in each iteration, we generated a clinical trial dataset in which a patient was either treated or untreated (1:1), and each was assigned a specific value of the potential confounder. A binary outcome variable was generated for each patient, and the trial results were analyzed using three logistic regression models: without adjustment for the potential confounder, with confounder adjustment using a sample-based model, and with confounder adjustment using the true confounder effect. The estimates of the treatment effect were compared in terms of bias, variance, proportion of treatment effects that were reasonably close to the true value, power, and type 1 error (when the modeled treatment effect was nil). Each experiment was replicated 50,000 times.
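As a rough guide to the Monte Carlo error implied by 50,000 replicates, any reported proportion (type 1 error, power, or proportion of estimates close to the true value) can be given an approximate binomial standard error. This back-of-the-envelope check is not part of the original analysis; the symbols p (the underlying proportion) and R (the number of replicates) are introduced here for illustration only.

```latex
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p\,(1-p)}{R}}, \qquad R = 50{,}000:
\quad
p = 0.05 \;\Rightarrow\; \mathrm{SE} \approx 0.00097,
\qquad
p = 0.5 \;\Rightarrow\; \mathrm{SE} \approx 0.0022 .
```

Under this approximation, differences between methods of a few tenths of a percentage point are unlikely to be simulation noise.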
Data generation
For each sample, we generated individual observations as follows: the treatment variable T was set to 1 in the experimental group and to 0 in the control group, and the potential confounder variable C was drawn either from a uniform distribution or from a Bernoulli distribution. We note that since C is independent of treatment under random allocation, it cannot be a confounder of the treatment effect in expectation (in other words, the estimator of the effect of treatment is unconfounded). However, C can cause “realized confounding” when by chance its distribution is not balanced across the two trial arms (i.e., any particular estimate of the effect of treatment can be confounded). Hereafter, for simplicity, we use the term “confounder” to designate a covariate C that is associated with the outcome and may be unbalanced between trial arms in any particular sample.
To facilitate comparisons between models, we selected the distributions of C so as to obtain the same variance. Thus, the uniform distribution of C had bounds − 0.75 and + 0.75 (variance was 1.5^{2}/12, or 0.1875). The binary case had a Bernoulli parameter of 0.25 (variance was 0.25*0.75, or 0.1875). The expectations of C were 0 for the continuous case and 0.25 for the binary case.
Then, the probability of outcome r in an individual was obtained using the equation Logit(r) = β_{1}T + β_{2}C. The value of β_{1} was set to 0, 0.5, or 1.0 (we used positive values of β_{1} to facilitate the interpretation of the results; therefore, the outcome was clinically desirable). The value of β_{2} was set to 0, −0.5, or −1.0. We note that the sign of β_{2} is arbitrary and does not alter the estimation of the treatment effect. The value of r was obtained as e^{β_{1}T + β_{2}C}/(1 + e^{β_{1}T + β_{2}C}). The individual outcome was generated as a Bernoulli random variable Y with parameter r.
Sample sizes in each treatment arm were 50, 100, 200, 500, and 1000.
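The data-generating steps described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the authors' R code (which is in Additional file 1); the function names `expit` and `simulate_trial` and the seed are hypothetical choices made for this example.

```python
import math
import random


def expit(x):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_trial(n_per_arm, beta1, beta2, confounder="continuous", rng=None):
    """Generate one two-arm trial as a list of (T, C, Y) tuples."""
    rng = rng or random.Random()
    rows = []
    for t in (0, 1):  # control arm first, then experimental arm
        for _ in range(n_per_arm):
            if confounder == "continuous":
                # Uniform on [-0.75, 0.75]: variance 1.5**2 / 12 = 0.1875
                c = rng.uniform(-0.75, 0.75)
            else:
                # Bernoulli(0.25): variance 0.25 * 0.75 = 0.1875
                c = 1 if rng.random() < 0.25 else 0
            r = expit(beta1 * t + beta2 * c)   # Logit(r) = beta1*T + beta2*C
            y = 1 if rng.random() < r else 0   # Y ~ Bernoulli(r)
            rows.append((t, c, y))
    return rows


# One replicate at the smallest sample size with strong effects.
trial = simulate_trial(50, beta1=1.0, beta2=-1.0, confounder="binary",
                       rng=random.Random(1))
print(len(trial), "patients,", sum(y for _, _, y in trial), "events")
```

Passing an explicit `random.Random` instance keeps each replicate reproducible without touching the global random state.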
Analysis of each replicate
We estimated the treatment effects using these three models:

a) Unadjusted analysis: Logit(Y) = b_{0} + b_{1}T

b) Adjusted for C using the sample-based model: Logit(Y) = b_{0}' + b_{1}'T + b_{2}'C

c) Adjusted for C using the true effect: Logit(Y) = b_{0}″ + b_{1}″T + β_{2}C
The unadjusted model was included as a point of reference, even though it was not required to answer the research question. The difference between the two adjusted models is that b_{2}' was estimated from the data, whereas β_{2} took the value used in the simulation; the product β_{2}*C was introduced as an offset variable into the regression model.
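To make the role of the offset concrete, the three models can be fitted with a small Newton-Raphson routine. This is a sketch in standard-library Python under stated assumptions, not the authors' implementation: `fit_logistic`, `solve`, the seed, and the sample size of 2 × 500 are all hypothetical choices for illustration. In the third model the term β_{2}C enters the linear predictor with a fixed coefficient, so only the intercept and the treatment coefficient are estimated.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(x, y, offset=None, iters=25):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    x: rows of covariates (intercept column included).
    offset: term added to the linear predictor with coefficient fixed at 1.
    """
    p = len(x[0])
    offset = offset or [0.0] * len(y)
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi, oi in zip(x, y, offset):
            mu = expit(oi + sum(bj * v for bj, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        beta = [bj + s for bj, s in zip(beta, solve(hess, grad))]
    return beta


# One simulated trial with a continuous confounder (illustrative settings).
rng = random.Random(2023)
beta1, beta2, n = 1.0, -1.0, 500
T = [0] * n + [1] * n
C = [rng.uniform(-0.75, 0.75) for _ in T]
Y = [1 if rng.random() < expit(beta1 * t + beta2 * c) else 0
     for t, c in zip(T, C)]

# a) unadjusted; b) sample-based adjustment; c) true effect as an offset
unadjusted = fit_logistic([[1.0, t] for t in T], Y)
sample_based = fit_logistic([[1.0, t, c] for t, c in zip(T, C)], Y)
true_model = fit_logistic([[1.0, t] for t in T], Y,
                          offset=[beta2 * c for c in C])
print(unadjusted[1], sample_based[1], true_model[1])
```

The second element of each returned list is the estimated treatment effect (b_{1}, b_{1}', or b_{1}″). A production analysis would normally rely on an established routine with convergence checks rather than a fixed number of Newton steps.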
Analysis of the simulated results
For each of the 90 experimental situations (3 treatment effects, 3 confounder effects, 2 types of confounder, 5 sample sizes) and the 3 models, we report the following results:

a) Bias in the estimated treatment effect, i.e., the mean of b_{1} − β_{1}.

b) Variance of the estimated treatment effect b_{1}.

c) Proportion of estimated treatment effects b_{1} within ± 0.1 of the true parameter value (on the odds ratio scale, this corresponds to intervals of 0.90 to 1.11 when β_{1} = 0, 1.49 to 1.82 when β_{1} = 0.5, and 2.46 to 3.00 when β_{1} = 1).

d) Proportion of treatment effects that were statistically significant (p < 0.05), i.e., type 1 error rate when β_{1} = 0, and power when β_{1} = 0.5 or 1.
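The four criteria listed above can be computed from the replicate-level estimates and p values along the following lines. This is a hedged sketch in standard-library Python; the helper names `summarize` and `odds_ratio_interval` and the four toy replicates are hypothetical and serve only to show the arithmetic.

```python
import math
from statistics import mean, variance


def summarize(estimates, p_values, true_beta1, tol=0.1, alpha=0.05):
    """Summarize replicate-level results for one experimental condition."""
    n = len(estimates)
    return {
        # a) mean of b1 minus the true beta1
        "bias": mean(estimates) - true_beta1,
        # b) sample variance of b1 across replicates
        "variance": variance(estimates),
        # c) share of estimates within +/- tol logit units of the truth
        "within_tol": sum(abs(b - true_beta1) <= tol for b in estimates) / n,
        # d) share of replicates with p < alpha (type 1 error or power)
        "rejection_rate": sum(p < alpha for p in p_values) / len(p_values),
    }


def odds_ratio_interval(true_beta1, tol=0.1):
    """Translate the +/- tol logit window to the odds ratio scale."""
    return math.exp(true_beta1 - tol), math.exp(true_beta1 + tol)


# Toy input: four replicates of a condition with beta1 = 1.
print(summarize([0.95, 1.05, 1.2, 1.0], [0.01, 0.2, 0.03, 0.04], 1.0))
for b in (0.0, 0.5, 1.0):
    print(b, odds_ratio_interval(b))
```

Whether the rejection rate is read as a type 1 error rate or as power depends only on whether the true β_{1} of the condition is zero.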
Because some result patterns were similar across values of the treatment effect or confounder effect, we show herein only selected results.
To better understand the relationships between estimates of treatment effect, estimates of confounder effect, and confounder imbalance, we conducted the following analyses, for strong confounder and treatment effects (β_{2} = − 1 and β_{1} = 1), at N = 2 × 50:

a) Scatterplots of estimates of treatment effect in the three models (unadjusted, adjusted for C using the sample, adjusted for C using the true effect), for a continuous confounder.

b) Scatterplots of observed estimates of the adjusted treatment effect b_{1} versus observed confounder effect b_{2}, for both types of confounder, with nonparametric regression lines (Lowess).

c) Scatterplots of observed estimates of the adjusted treatment effect b_{1} versus baseline imbalance between treatment arms in the confounder (using Cohen’s d), with nonparametric regression lines (Lowess).
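The baseline imbalance measure used in the last analysis, Cohen's d, is the between-arm difference in mean C divided by the pooled observed standard deviation. A minimal standard-library Python sketch follows; the function name `cohens_d`, the seed, and the use of the pooled-variance formula with n − 1 denominators are assumptions made for illustration, since the text does not spell out the exact formula.

```python
import math
import random
from statistics import mean, variance


def cohens_d(treated, control):
    """Difference in arm means divided by the pooled observed SD."""
    n1, n0 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated)
                  + (n0 - 1) * variance(control)) / (n1 + n0 - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled_var)


# Chance imbalance of a continuous confounder in one trial of size 2 x 50.
rng = random.Random(3)
c_treated = [rng.uniform(-0.75, 0.75) for _ in range(50)]
c_control = [rng.uniform(-0.75, 0.75) for _ in range(50)]
print(round(cohens_d(c_treated, c_control), 3))
```

Because C is generated independently of treatment, the values of d obtained this way scatter around zero from one replicate to the next.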
The simulations and analyses were performed using the R software version 4.0.2 (R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/).
Results
All models converged in all 90 experimental conditions.
Bias
When C was not associated with the outcome (β_{2} = 0), treatment effects were biased upward in small samples, somewhat more under sample-based adjustment than without adjustment or with adjustment using the true model, for both a continuous and a binary confounder (Table 1). This upward bias was also detected in the presence of confounding. Furthermore, in the presence of potential confounding (β_{2} < 0), unadjusted estimates of treatment effect were biased downward, for both types of confounder, which corresponds to the expected effect of noncollapsibility of the odds ratio. Overall, adjustment using the true model (β_{2}) produced less positive bias at small sample size than adjustment with the sample-based model (b_{2}).
Variance
We limit variance results to simulations under a strong confounder effect (β_{2} = − 1), as the patterns were similar but weaker for the lower value of β_{2} (Table 2). The variance of the treatment effect decreased predictably with sample size and was slightly lower in the unadjusted analyses. The two adjustment methods performed similarly.
Accuracy of estimation
Proportions of estimates that fell within ± 0.1 of the real parameter value were fairly low and even for the largest sample size of 2 × 1000 they barely reached 70% (Table 3). Unadjusted analyses produced less accurate estimates when the treatment effect was strong, which is consistent with the conservative bias of the estimates due to noncollapsibility. The two adjustment methods performed similarly.
Type 1 error and power
Type 1 errors were well controlled in all circumstances (Table 4). Power rose predictably with sample size and was slightly better for adjusted analyses than for unadjusted analyses (Table 5). The two adjustment methods yielded similar power.
Correlations between treatment effect estimates
Unadjusted estimates of treatment effect were more strongly correlated with estimates adjusted for the true effect than with estimates from sample-based adjustment (Fig. 1). The Pearson correlation coefficients were 0.97, 0.98, and 0.99 in the three panels of Fig. 1. Despite the high correlations, the differences between the estimates of treatment effect could vary by 0.5 or 1 unit (on the logit scale) in some samples.
Joint distributions of estimated adjusted treatment and confounder effects
The scatterplots of the estimated treatment and confounder effects at size 2 × 50 (Fig. 2) yielded similar results for continuous and binary confounders. The estimated confounder effect b_{2} ranged between approximately − 3 and 1, for a true parameter value of − 1. Treatment effects b_{1} appeared stronger at negative values of the estimated confounder effect (i.e., when the confounder effect was overestimated). This showed as an asymmetry of the scatterplots and was confirmed by the nonparametric regression lines. Pearson correlation coefficients between the confounder effect and the treatment effect were − 0.10 for both types of confounder.
Estimated treatment effects as a function of baseline imbalance
The scatterplots of the estimated adjusted treatment effect as a function of baseline confounder imbalance were symmetric and did not reveal any bias or obvious heteroscedasticity (Fig. 3). Results were similar for continuous and binary confounders. Cohen’s d (i.e., the between-arm difference in C expressed in pooled observed standard deviation units) ranged from approximately −0.6 to +0.6. Since confounder variance was 0.1875 by design, one standard deviation unit was 0.4330, and 0.6 of this value corresponds to 0.26. Pearson correlation coefficients between the baseline imbalance and the treatment effect were null in both scenarios.
Discussion
This simulation study indicated that a sample-based adjustment model has only a small disadvantage vis-à-vis a true model when analyzing the results of a clinical trial. Specifically, the sample-based model produced estimates of the treatment effect that were more positively biased, but only at small sample sizes (2 × 50). There were no losses in terms of accuracy, type 1 error, or power. Furthermore, we found no relation between the magnitude of the baseline imbalance in the potential confounder and the estimation of the treatment effect, after sample-based adjustment. This indicates that sample-based adjustment works adequately across levels of imbalance. We found that the adjusted treatment effect was overestimated when the effect of the confounder was overestimated as well, but this occurred only in rather extreme situations (observed confounder effect at least twice as strong as the true effect). Overall, these results are reassuring; the current practice of adjusting based on the sample at hand appears reasonable.
We used the true adjustment model as a yardstick to demonstrate the possible impact of an incorrectly estimated sample-based confounder effect, but in real life, the true confounder effect β_{2} is usually unknown. Reasonably solid estimates may exist for some confounder effects: e.g., the prediction of death following brain injury has been modeled in trials that enrolled thousands of patients [13], and various mortality prediction models are available for intensive care patients, patients hospitalized with COVID-19, patients with coronary artery disease, etc. In other instances, reasonable guesses are possible, at least as to the direction of the effect: greater severity of disease, presence of comorbidities, or older age are typically associated with less favorable outcomes. If the observed associations ran in the opposite direction, it may be prudent either to remove the paradoxical covariate from the adjustment model (effectively setting the regression coefficient to 0) or to apply other regularization methods. In any case, such adaptive procedures should be prespecified in the statistical analysis plan, to avoid post hoc selection of the main analysis model.
This simulation study also confirmed two established results. One is the conservative bias present in unadjusted analyses of binary outcomes, due to the noncollapsibility of the odds ratio [9,10,11,12]. This bias increases with the effect of the confounding factor under consideration. This confirms the utility of adjusting trial results for known risk factors regardless of any imbalance at baseline. Such adjustment was particularly useful at larger sample sizes; indeed, with 2 × 1000 observations, adjusted estimates were substantially more accurate than unadjusted estimates.
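The conservative bias due to noncollapsibility can be illustrated without any sampling, by computing the marginal (unadjusted) log odds ratio implied by the data-generating model when the arms are perfectly balanced. The sketch below is an illustration in standard-library Python under the binary-confounder setup of this study (Bernoulli parameter 0.25); the function name `marginal_log_or` is hypothetical, and the calculation is not taken from the authors' code.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_log_or(beta1, beta2, p_c=0.25):
    """Marginal log odds ratio for T under Logit(r) = beta1*T + beta2*C.

    C is binary with P(C = 1) = p_c in both arms (perfect balance), so any
    gap between the result and beta1 reflects noncollapsibility alone.
    """
    def risk(t):
        # Average the outcome probability over the distribution of C.
        return ((1.0 - p_c) * expit(beta1 * t)
                + p_c * expit(beta1 * t + beta2))
    return logit(risk(1)) - logit(risk(0))


for beta2 in (0.0, -0.5, -1.0):
    print(beta2, round(marginal_log_or(1.0, beta2), 4))
```

With β_{2} = 0 the marginal and conditional effects coincide; as the confounder effect grows in absolute value, the marginal log odds ratio falls further below the conditional value of 1, even though the arms are balanced by construction.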
The other confirmatory result was the positive bias of logistic regression coefficient estimates at small sample sizes. This too has been described previously [14, 15]. This bias away from the null in small samples was revealed by the adjustment procedures, and this is one area where the true adjustment model performed better than sample-based adjustment.
Finally, we did not observe any substantial gain in power in adjusted models, compared with unadjusted analyses. This too is consistent with current knowledge. Power gains from confounder adjustment are expected in linear regression models for continuous outcomes, but not necessarily in analyses of binary outcomes [16], as adjusted estimates of treatment effect are generally less biased toward the null but also less precise. This was also shown in a previous simulation study partly based on actual trial results [17].
While estimates of treatment effect and of sample-based confounder effect were only weakly correlated, a notable bias in the treatment effect was seen only when the confounder effect was overestimated (Fig. 2). This suggests that analysts should remain cautious when the confounder effect is much larger than expected, based on prior knowledge. Furthermore, while unadjusted and adjusted treatment effects were highly correlated, substantial differences occurred on occasion (Fig. 1). This indicates that data dredging has the potential for yielding spurious results and reinforces the recommendation that adjustment models always be prespecified.
A limitation of this study is that we did not explore all possible situations, such as different levels of baseline risk, or multiple adjustment variables. However, we believe that this simulation study provides a realistic assessment of the potential of true adjustment models to improve the analysis strategy for clinical trials. We found this potential to be minor; the risk inherent in relying on sample-based models seems negligible.
Another limitation is that we did not examine what happens if the treatment effect varies across subgroups (i.e., effect modification, assuming that the effect modifier is distinct from the confounder). If the effect modifier is measured, then stratum-specific estimates of β_{1} can be obtained, with adjustment for the potential confounder. However, the estimation of the confounder effect can be pooled over strata of the effect modifier, which may reduce potential overfitting, according to our results. This would particularly benefit the estimation of treatment effects in small strata.
In conclusion, we saw on average little or no disadvantage to using a sample-based model, rather than a true regression model, for the adjustment for baseline prognostic factors. Adjusted estimates performed better than unadjusted estimates.
Availability of data and materials
Not applicable (the simulated datasets were discarded).
References
1. Greenland S. Randomization, statistics, and causal inference. Epidemiology. 1990;1:421–9.
2. Roberts C, Torgerson DJ. Baseline imbalance in randomised controlled trials. BMJ. 1999;319:185.
3. Hauck WW, Anderson S, Marcus SM. Should we adjust for covariates in nonlinear regression analyses of randomized clinical trials? Controlled Clin Trials. 1998;19:249–56.
4. Center for Drug Evaluation and Research. E9 statistical principles for clinical trials; 1998. Available at: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/e9-statistical-principles-clinical-trials
5. European Medicines Agency. Guideline on adjustment for baseline covariates in clinical trials. London, UK: European Medicines Agency, EMA/CHMP/295050/2013, 2015. Accessed at: https://www.ema.europa.eu/en/documents/scientific-guideline/guideline-adjustment-baseline-covariates-clinical-trials_en.pdf
6. Food and Drug Administration. Adjusting for covariates in randomized clinical trials for drugs and biologics with continuous outcomes. Guidance for industry. Draft. Rockville, MD, 2019. Accessed at: https://www.fda.gov/media/123801/download
7. Lee Y, Nelder JA. Conditional and marginal models: another view. Stat Sci. 2004;19:219–32.
8. Morris TP, Walker AS, Williamson EJ, White IR. Planning a method for covariate adjustment in individually randomised trials: a practical guide. Trials. 2022;23:328.
9. Gail MH, Wieland S, Piantadosi S. Biased estimates of treatment effect in randomized experiments with nonlinear regression and omitted covariates. Biometrika. 1984;71:431–44.
10. Greenland S, Robins JM, Pearl J. Confounding and collapsibility in causal inference. Stat Sci. 1999;14:29–46.
11. Steyerberg EW, Bossuyt PMM, Lee KL. Clinical trials in acute myocardial infarction: should we adjust for baseline characteristics? Am Heart J. 2000;139:745–52.
12. Groenwold RHH, Moons KGM, Peelen LM, Knol MJ, Hoes AW. Reporting of treatment effects from randomized trials: a plea for multivariable risk ratios. Contemp Clin Trials. 2011;32:399–402.
13. Turner EL, Perel P, Clayton T, Edwards P, Hernandez AV, Roberts I, Shakur H, Steyerberg EW. Covariate adjustment increased power in randomized controlled trials: an example in traumatic brain injury. J Clin Epidemiol. 2012;65:474–81.
14. Walter SD. Small sample estimation of log odds ratios from logistic regression and fourfold tables. Stat Med. 1984;4:437–44.
15. Nemes S, Jonasson JM, Genell A, Steineck G. Bias in odds ratios by logistic regression modelling and sample size. BMC Med Res Methodol. 2009;9:56.
16. Jiang H, Kulkarni PM, Mallinckrodt CH, Shurzinske L, Molenberghs G, Lipkovich I. Covariate adjustment for logistic regression analysis of binary clinical trial data. Stat Biopharm Res. 2017;9:126–34.
17. Kahan BC, Jairath V, Doré CJ, Morris TP. The risks and rewards of covariate adjustment in randomized trials: an assessment of 12 outcomes from 8 studies. Trials. 2014;15:139.
Acknowledgements
Not applicable
Funding
No specific funding was obtained.
Contributions
TP proposed the study, designed the study, interpreted the results, wrote the first draft; CC designed the study, interpreted the results, and revised the manuscript; AP designed the study, conducted the simulations, produced tables and figures, interpreted the results, and revised the manuscript. All authors read and approved the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Supplementary Information
Additional file 1: Appendix.
R code used for simulations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Perneger, T., Combescure, C. & Poncet, A. Adjustment for baseline characteristics in randomized trials using logistic regression: sample-based model versus true model. Trials 24, 107 (2023). https://doi.org/10.1186/s13063-022-07053-7