Adjustment for baseline characteristics in randomized trials using logistic regression: sample-based model versus true model

Perneger, Thomas; Combescure, Christophe; Poncet, Antoine

doi:10.1186/s13063-022-07053-7

Methodology
Open access
Published: 13 February 2023

Trials volume 24, Article number: 107 (2023)

Abstract

Background

Adjustment for baseline prognostic factors in randomized clinical trials is usually performed by means of sample-based regression models. Sample-based models may be incorrect due to overfitting. To assess whether overfitting is a problem in practice, we used simulated data to examine the performance of the sample-based model in comparison to a “true” adjustment model, in terms of estimation of the treatment effect.

Methods

We conducted a simulation study using samples drawn from a “population” in which both the treatment effect and the effect of the potential confounder were specified. The outcome variable was binary. Using logistic regression, we compared three estimates of the treatment effect in each situation: unadjusted, adjusted for the confounder using the sample, and adjusted for the confounder using the true effect. Experimental factors were sample size (from 2 × 50 to 2 × 1000), treatment effect (logit of 0, 0.5, or 1.0), confounder type (continuous or binary), and confounder effect (logit of 0, −0.5, or −1.0). The assessment criteria for the estimated treatment effect were bias, variance, accuracy (proportion of estimates within 0.1 logit units of the true value), type 1 error, and power.

Results

Sample-based adjustment models yielded more biased estimates of the treatment effect than adjustment models that used the true confounder effect but had similar variance, accuracy, power, and type 1 error rates. The simulation also confirmed the conservative bias of unadjusted analyses due to the non-collapsibility of the odds ratio, the smaller variance of unadjusted estimates, and the bias of the odds ratio away from the null hypothesis in small datasets.

Conclusions

Sample-based adjustment yields similar results to exact adjustment in estimating the treatment effect. Sample-based adjustment is preferable to no adjustment.

Introduction

Randomized trials rely on chance to form patient groups that are comparable at baseline. However, randomization balances the trial arms only in expectation, as a long-term average; it does not guarantee that the groups will be comparable in any given instance [1,2,3]. As a result, current guidelines recommend that analyses of randomized clinical trials be adjusted for baseline patient characteristics that are associated with the outcome [4,5,6]. This approach assumes that the researchers are interested in the conditional treatment effect, i.e., the treatment effect with all other patient characteristics held constant [7]. Several adjustment methods exist, including multiple regression, use of propensity scores, and other methods [8]. Here, we will consider only one case: adjustment for the confounder using logistic regression. In this case, as an added benefit, adjustment for prognostic factors will eliminate a conservative bias due to the non-collapsibility of the odds ratio, which occurs even when the trial arms are balanced [9,10,11,12].

Ideally, the adjustment model should represent correctly the effects of the prognostic factors under consideration. For example, if being 10 years older doubled the risk of death, this is the effect of age that should be used for adjustment. In real life, true effects are typically unknown, and the analyst estimates the effect of age from the trial sample at hand. But this sample-based model reflects the associations present in the study sample and will not necessarily yield the correct effect estimate—possibly, the effect of 10 years of age will be to triple the risk in this particular dataset, or to increase it by half, or even to reduce the risk. There is no guarantee that statistical adjustment based on available data will yield the correct estimate of the treatment effect, but it is also possible that the effect of over-fitting would be negligible.

To what extent using a potentially over-fitted sample-based adjustment model affects the estimation of treatment effects in randomized trials has not been explored to our knowledge. In this study, we use simulated data to compare a sample-based adjustment model to a true adjustment model, in terms of bias in estimating the treatment effect, as well as its variance, accuracy, and power.

Methods

We conducted an experimental simulation study. In brief, in each iteration, we generated a clinical trial dataset in which a patient was either treated or untreated (1:1), and each was assigned a specific value of the potential confounder. A binary outcome variable was generated for each patient, and the trial results were analyzed using three logistic regression models: without adjustment for the potential confounder, with confounder adjustment using a sample-based model, and with confounder adjustment using the true confounder effect. The estimates of the treatment effect were compared in terms of bias, variance, proportion of treatment effects that were reasonably close to the true value, power, and type 1 error (when the modeled treatment effect was nil). Each experiment was replicated 50,000 times.

Data generation

For each sample, we generated individual observations as follows: the treatment variable T was set to 1 in the experimental group and to 0 in the control group, and the potential confounder variable C was drawn either from a uniform distribution or from a Bernoulli distribution. We note that since C is independent of treatment under random allocation, it cannot be a confounder of the treatment effect in expectation (in other words, the estimator of the effect of treatment is unconfounded). However, C can cause “realized confounding” when by chance its distribution is not balanced across the two trial arms (i.e., any particular estimate of the effect of treatment can be confounded). Hereafter, for simplicity, we use the term “confounder” to designate a covariate C that is associated with the outcome and may be unbalanced between trial arms in any particular sample.

To facilitate comparisons between models, we selected the distributions of C so as to obtain the same variance. Thus, the uniform distribution of C had bounds −0.75 and +0.75 (variance was 1.5²/12, or 0.1875). The binary case had a Bernoulli parameter of 0.25 (variance was 0.25 × 0.75, or 0.1875). The expectations of C were 0 for the continuous case and 0.25 for the binary case.
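The variance matching described above can be checked with a few lines of arithmetic. This is a minimal Python sketch, not the authors' R code; the function names are illustrative.

```python
def uniform_variance(low, high):
    """Variance of a continuous uniform distribution on [low, high]."""
    return (high - low) ** 2 / 12


def bernoulli_variance(p):
    """Variance of a Bernoulli distribution with success probability p."""
    return p * (1 - p)


var_continuous = uniform_variance(-0.75, 0.75)  # 1.5**2 / 12
var_binary = bernoulli_variance(0.25)           # 0.25 * 0.75

print(var_continuous, var_binary)
```

Both expressions evaluate to 0.1875, so the continuous and binary confounders carry the same variance by construction.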

Then, the probability of outcome r in an individual was obtained using the equation Logit(r) = β₁T + β₂C. The value of β₁ was set to 0, 0.5, or 1.0 (we used positive values of β₁ to facilitate the interpretation of the results; therefore, the outcome was clinically desirable). The value of β₂ was set to 0, −0.5, or −1.0. We note that the sign of β₂ is arbitrary and does not alter the estimation of the treatment effect. The value of r was obtained as r = e^{β₁T + β₂C}/(1 + e^{β₁T + β₂C}). The individual outcome was generated as a Bernoulli random variable Y with parameter r.
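The data-generating steps can be sketched as follows. This is an illustrative Python version using only the standard library, not the authors' R code; the function name `simulate_trial`, its arguments, and the use of a seeded `random.Random` are assumptions made for the example.

```python
import math
import random


def expit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_trial(n_per_arm, beta1, beta2, confounder="continuous", rng=None):
    """Generate one trial as a list of (T, C, Y) tuples.

    T is 0 (control) or 1 (experimental), with n_per_arm patients each.
    C is Uniform(-0.75, 0.75) or Bernoulli(0.25), drawn independently of T.
    Y is Bernoulli(r) with logit(r) = beta1*T + beta2*C (no intercept term).
    """
    rng = rng or random.Random()
    rows = []
    for t in (0, 1):
        for _ in range(n_per_arm):
            if confounder == "continuous":
                c = rng.uniform(-0.75, 0.75)
            else:
                c = 1.0 if rng.random() < 0.25 else 0.0
            r = expit(beta1 * t + beta2 * c)
            y = 1 if rng.random() < r else 0
            rows.append((t, c, y))
    return rows


# One replicate with a strong treatment and confounder effect, N = 2 x 50.
trial = simulate_trial(50, beta1=1.0, beta2=-1.0, rng=random.Random(1))
```

Because C is drawn without reference to T, any imbalance between arms in a given replicate arises by chance alone, which is the "realized confounding" described earlier.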

Sample sizes in each treatment arm were 50, 100, 200, 500, and 1000.
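Taken together, the experimental factors form a full factorial grid. A short Python sketch (variable names are illustrative) enumerates it:

```python
from itertools import product

treatment_effects = [0.0, 0.5, 1.0]         # beta1, logit scale
confounder_effects = [0.0, -0.5, -1.0]      # beta2, logit scale
confounder_types = ["continuous", "binary"]
arm_sizes = [50, 100, 200, 500, 1000]       # patients per treatment arm

scenarios = list(
    product(treatment_effects, confounder_effects, confounder_types, arm_sizes)
)
print(len(scenarios))  # 3 * 3 * 2 * 5 = 90
```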

Analysis of each replicate

We estimated the treatment effects using these three models:

a) Unadjusted analysis: Logit(Y) = b₀ + b₁T
b) Adjusted for C using the sample-based model: Logit(Y) = b₀' + b₁'T + b₂'C
c) Adjusted for C using the true effect: Logit(Y) = b₀″ + b₁″T + β₂C

The unadjusted model was included as a point of reference, even though it was not required to answer the research question. The difference between the two adjusted models is that b₂' was estimated from the data, whereas β₂ took the value used in the simulation; the product β₂*C was introduced as an offset variable into the regression model.
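To make the role of the offset concrete, the three fits can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' R code: `fit_logistic` and `solve` are hypothetical helpers that implement maximum likelihood by Newton-Raphson, and the offset enters the linear predictor with its coefficient fixed at 1, so that β₂ is imposed rather than estimated.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def fit_logistic(X, y, offset=None, n_iter=25, tol=1e-10):
    """Logistic regression coefficients by Newton-Raphson.

    X holds one row per patient (include 1.0 for the intercept).
    offset is added to the linear predictor with a coefficient fixed at 1.
    """
    n, p = len(X), len(X[0])
    offset = offset or [0.0] * n
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi, oi in zip(X, y, offset):
            eta = oi + sum(b * x for b, x in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Illustrative data: N = 2 x 200, beta1 = 1, beta2 = -1, continuous confounder.
rng = random.Random(2023)
beta1, beta2 = 1.0, -1.0
T = [0] * 200 + [1] * 200
C = [rng.uniform(-0.75, 0.75) for _ in T]
Y = [1 if rng.random() < 1 / (1 + math.exp(-(beta1 * t + beta2 * c))) else 0
     for t, c in zip(T, C)]

unadjusted = fit_logistic([[1.0, t] for t in T], Y)                # b0, b1
sample_based = fit_logistic([[1.0, t, c] for t, c in zip(T, C)], Y)  # b0', b1', b2'
true_model = fit_logistic([[1.0, t] for t in T], Y,
                          offset=[beta2 * c for c in C])           # b0'', b1''
```

The treatment effect estimate is the second element of each returned list. Model c) estimates one parameter fewer than model b), because the confounder coefficient is supplied through the offset.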

Analysis of the simulated results

For each of the 90 experimental situations (3 treatment effects, 3 confounder effects, 2 types of confounder, 5 sample sizes) and the 3 models, we report the following results:

a) Bias in the estimated treatment effect, i.e., the mean of b₁ − β₁.
b) Variance of the estimated treatment effect b₁.
c) Proportion of estimated treatment effects b₁ within ±0.1 of the true parameter value (on the odds ratio scale, this corresponds to intervals of 0.90 to 1.11 when β₁ = 0, 1.49 to 1.82 when β₁ = 0.5, and 2.46 to 3.00 when β₁ = 1).
d) Proportion of treatment effects that were statistically significant (p < 0.05), i.e., type 1 error rate when β₁ = 0, and power when β₁ = 0.5 or 1.
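The four criteria listed above can be computed from the replicate-level results as in the following Python sketch. The function names are hypothetical, the p-values are assumed to come from whatever test accompanies each fitted model, and the use of the sample variance (n − 1 denominator) is an assumption, since the text does not specify the estimator.

```python
import math
from statistics import mean, variance


def or_interval(beta1, half_width=0.1):
    """Odds ratio interval corresponding to beta1 +/- half_width on the logit scale."""
    return math.exp(beta1 - half_width), math.exp(beta1 + half_width)


def summarize(estimates, p_values, beta1, half_width=0.1, alpha=0.05):
    """Summarize the estimated treatment effects from one experimental situation."""
    n = len(estimates)
    return {
        "bias": mean(estimates) - beta1,
        "variance": variance(estimates),  # sample variance (assumption)
        "within": sum(1 for b in estimates if abs(b - beta1) <= half_width) / n,
        # Type 1 error rate when beta1 == 0, power otherwise.
        "significant": sum(1 for p in p_values if p < alpha) / n,
    }


for b1 in (0.0, 0.5, 1.0):
    low, high = or_interval(b1)
    print(b1, round(low, 2), round(high, 2))
```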

Because some result patterns were similar across values of the treatment effect or confounder effect, we show herein only selected results.

To better understand the relationships between estimates of treatment effect, estimates of confounder effect, and confounder imbalance, we conducted the following analyses, for strong confounder and treatment effects (β₂ = − 1 and β₁ = 1), at N = 2 × 50:

a) Scatterplots of estimates of treatment effect in the three models (unadjusted, adjusted for C using the sample, adjusted for C using the true effect); for a continuous confounder.
b) Scatterplots of observed estimates of the adjusted treatment effect b₁ versus observed confounder effect b₂, for both types of confounder, with non-parametric regression lines (Lowess).
c) Scatterplots of observed estimates of the adjusted treatment effect b₁ versus baseline imbalance between treatment arms in the confounder (using Cohen’s d), with non-parametric regression lines (Lowess).
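The baseline imbalance measure used in the last analysis, Cohen's d, can be computed as in this Python sketch. The function name is illustrative, and the sign convention (experimental arm minus control arm) is an assumption; the pooled standard deviation follows the usual textbook formula.

```python
import math
from statistics import mean, variance


def cohens_d(treated, control):
    """Between-arm difference in means, in pooled standard deviation units."""
    n1, n0 = len(treated), len(control)
    pooled_var = (
        (n1 - 1) * variance(treated) + (n0 - 1) * variance(control)
    ) / (n1 + n0 - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled_var)


# Toy confounder values in each arm (illustrative only).
d = cohens_d([0.10, 0.40, -0.20, 0.55], [-0.30, 0.05, 0.20, -0.45])
print(round(d, 2))
```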

The simulations and analyses were performed using the R software version R-4.0.2 (R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/).

Results

All models converged in all 90 experimental conditions.

Bias

When C was not associated with the outcome (β₂ = 0), treatment effects were biased upward in small samples, somewhat more under sample-based adjustment than without adjustment or adjustment using the true model, for both a continuous and a binary confounder (Table 1). This upward bias was also detected in the presence of confounding. Furthermore, in the presence of potential confounding (β₂ < 0), unadjusted estimates of treatment effect were biased downward, for both types of confounder, which corresponds to the expected effect of non-collapsibility of the odds ratio. Overall, adjustment using the true model (β₂) produced less positive bias at small sample size than adjustment with the sample-based model (b₂).

Table 1 Bias in the estimation of treatment effect (β₁ = 1), for different values of the confounder effect (β₂) and of sample size, for 3 logistic regression models: unadjusted for confounder, adjusted in sample-based model, and adjusted in true model

Variance

We limit variance results to simulations under a strong confounder effect (β₂ = − 1), as the patterns were similar but weaker for the lower value of β₂ (Table 2). The variance of the treatment effect decreased predictably with sample size and was slightly lower in the unadjusted analyses. The two adjustment methods performed similarly.

Table 2 Variance in the estimation of treatment effect (b₁), for different values of the true treatment effect (β₁) and of sample size, with a strong confounder effect (β₂ = − 1): unadjusted for confounder, adjusted in sample-based model, and adjusted in true model

Accuracy of estimation

Proportions of estimates that fell within ± 0.1 of the real parameter value were fairly low and even for the largest sample size of 2 × 1000 they barely reached 70% (Table 3). Unadjusted analyses produced less accurate estimates when the treatment effect was strong, which is consistent with the conservative bias of the estimates due to non-collapsibility. The two adjustment methods performed similarly.

Table 3 Proportion of treatment effects within ± 0.1 of the true value, for different values of the true treatment effect (β₁) and of sample size, with a strong confounder effect (β₂ = − 1): unadjusted for confounder, adjusted in sample-based model, and adjusted in true model

Type 1 error and power

Type 1 errors were well controlled in all circumstances (Table 4). Power rose predictably with sample size and was slightly better for adjusted analyses than for unadjusted analyses (Table 5). The two adjustment methods yielded similar power.

Table 4 Proportion of type 1 errors, for different values of the confounder effect (β₂) and of sample size, for 3 logistic regression models: unadjusted for confounder, adjusted in sample-based model, and adjusted in true model

Table 5 Observed power for different values of the treatment effect (β₁) and of sample size, with a strong confounder effect (β₂ = − 1), for 3 logistic regression models: unadjusted for confounder, adjusted in sample-based model, and adjusted in true model

Correlations between treatment effect estimates

Unadjusted estimates of treatment effect were more strongly correlated with estimates adjusted using the true effect than with estimates from sample-based adjustment (Fig. 1). The Pearson correlation coefficients were 0.97, 0.98, and 0.99 in the three panels of Fig. 1. Despite the high correlations, the differences between the estimates of treatment effect could vary by 0.5 or 1 unit (on the logit scale) in some samples.

Joint distributions of estimated adjusted treatment and confounder effects

The scatterplots of the estimated treatment and confounder effects at size 2 × 50 (Fig. 2) yielded similar results for continuous and binary confounders. The estimated confounder effect b₂ ranged between approximately − 3 and 1, for a true parameter value of − 1. Treatment effects b₁ appeared stronger at negative values of the estimated confounder effect (i.e., when the confounder effect was overestimated). This showed as an asymmetry of the scatterplots and was confirmed by the non-parametric regression lines. Pearson correlation coefficients between the confounder effect and the treatment effect were − 0.10 for both types of confounder.

Estimated treatment effects as a function of baseline imbalance

The scatterplots of the estimated adjusted treatment effect as a function of baseline confounder imbalance were symmetric and did not reveal any bias or obvious heteroscedasticity (Fig. 3). Results were similar for continuous and binary confounders. Cohen’s d (i.e., the between-arm difference in C expressed in pooled observed standard deviation units) ranged from approximately −0.6 to +0.6; since confounder variance was 0.1875 by design, one standard deviation unit was 0.4330, and 0.6 of this value corresponds to 0.26. Pearson correlation coefficients between the baseline imbalance and the treatment effect were null in both scenarios.
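The conversion from standard deviation units to the raw scale of C is simple arithmetic, shown here as a brief Python check (variable names are illustrative):

```python
import math

confounder_variance = 0.1875           # fixed by design for both confounder types
sd = math.sqrt(confounder_variance)    # one standard deviation unit
raw_difference = 0.6 * sd              # between-arm difference in C at d = 0.6

print(round(sd, 4), round(raw_difference, 2))
```

An imbalance of 0.6 standard deviation units therefore corresponds to a between-arm difference of roughly 0.26 in the mean of C.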

Discussion

This simulation study indicated that a sample-based adjustment model has only a small disadvantage vis-à-vis a true model when analyzing the results of a clinical trial. Specifically, the sample-based model produced estimates of the treatment effect that were more positively biased, but only at small sample sizes (2 × 50). There were no losses in terms of accuracy, type 1 error, or power. Furthermore, we found no relation between the magnitude of the baseline imbalance in the potential confounder and the estimation of the treatment effect, after sample-based adjustment. This indicates that sample-based adjustment works adequately across levels of imbalance. We found that the adjusted treatment effect was overestimated when the effect of the confounder was over-estimated as well, but this occurred only in rather extreme situations (observed confounder effect at least twice as strong as the true effect). Overall, these results are reassuring; the current practice of adjusting based on the sample at hand appears reasonable.

We used the true adjustment model as a yardstick to demonstrate the possible impact of an incorrectly estimated sample-based confounder effect, but in real life, the true confounder effect β₂ is usually unknown. Reasonably solid estimates may exist for some confounder effects: e.g., the prediction of death following brain injury has been modeled in trials that enrolled thousands of patients [13], and various mortality prediction models are available for intensive care patients, patients hospitalized with COVID-19, patients with coronary artery disease, etc. In other instances reasonable guesses are possible, at least as to the direction of the effect: e.g., greater severity of disease, presence of comorbidities, or older age are typically associated with less favorable outcomes. If the observed associations ran in the opposite direction, it may be prudent either to remove the paradoxical covariate from the adjustment model (effectively setting the regression coefficient to 0) or to apply other regularization methods. In any case, such adaptive procedures should be pre-specified in the statistical analysis plan, to avoid post hoc selection of the main analysis model.

This simulation study also confirmed two established results. One is the conservative bias present in unadjusted analyses of binary outcomes, due to the non-collapsibility of the odds ratio [9,10,11,12]. This bias increases with the effect of the confounding factor under consideration. This confirms the utility of adjusting trial results for known risk factors regardless of any imbalance at baseline. Such adjustment was particularly useful at larger sample sizes; indeed, with 2 × 1000 observations, adjusted estimates were substantially more accurate than unadjusted estimates.
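The non-collapsibility effect can be illustrated numerically without any simulation. The Python sketch below takes the data-generating model from the Methods with a binary confounder (Bernoulli parameter 0.25), β₁ = 1, and β₂ = −1, assumes perfectly balanced arms, and computes the marginal log odds ratio, i.e., the quantity that an unadjusted analysis targets in large samples. The helper names are illustrative.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))


beta1, beta2, p_c = 1.0, -1.0, 0.25  # conditional effects and P(C = 1)


def marginal_risk(t):
    """Outcome probability in arm t, averaged over the distribution of C."""
    return (1 - p_c) * expit(beta1 * t) + p_c * expit(beta1 * t + beta2)


marginal_log_or = logit(marginal_risk(1)) - logit(marginal_risk(0))
print(round(marginal_log_or, 3))
```

The marginal log odds ratio comes out at about 0.96, below the conditional value of 1, even though C is identically distributed in both arms. The gap reflects averaging over a prognostic covariate on a non-linear scale, not confounding.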

The other confirmatory result was the positive bias of logistic regression coefficient estimates at small sample sizes. This too has been described previously [14, 15]. This bias away from the null in small samples was revealed by the adjustment procedures, and this is one area where the true adjustment model performed better than sample-based adjustment.

Finally, we observed at most a slight gain in power in adjusted models, compared with unadjusted analyses. This too is consistent with current knowledge. Power gains from confounder adjustment are expected in linear regression models for continuous outcomes, but not necessarily in analyses of binary outcomes [16], as adjusted estimates of treatment effect are generally less biased toward the null but also less precise. This was also shown in a previous simulation study partly based on actual trial results [17].

While estimates of treatment effect and of sample-based confounder effect were only weakly correlated, a notable bias in the treatment effect was seen only when the confounder effect was overestimated (Fig. 2). This suggests that analysts should remain cautious when the confounder effect is much larger than expected, based on prior knowledge. Furthermore, while unadjusted and adjusted treatment effects were highly correlated, substantial differences occurred on occasion (Fig. 1). This indicates that data dredging has the potential for yielding spurious results and reinforces the recommendation that adjustment models always be pre-specified.

A limitation of this study is that we did not explore all possible situations, such as different levels of baseline risk, or multiple adjustment variables. However, we believe that this simulation study provides a realistic assessment of the potential of true adjustment models to improve the analysis strategy for clinical trials. We found this potential to be minor; the risk inherent in relying on sample-based models seems negligible.

Another limitation is that we did not examine what happens if the treatment effect varies across subgroups (i.e., effect-modification, assuming that the effect-modifier is distinct from the confounder). If the effect-modifier is measured, then stratum-specific estimates of β₁ can be obtained, with adjustment for the potential confounder. However, the estimation of the confounder effect can be pooled over strata of the effect-modifier, which may reduce potential overfitting, according to our results. This would particularly benefit the estimation of treatment effects in small strata.

In conclusion, we saw on average little or no disadvantage to using a sample-based model, rather than a true regression model, for the adjustment for baseline prognostic factors. Adjusted estimates performed better than unadjusted estimates.

Availability of data and materials

Not applicable (the simulated datasets were discarded).

References

1. Greenland S. Randomization, statistics, and causal inference. Epidemiology. 1990;1:421–9.
2. Roberts C, Torgerson DJ. Baseline imbalance in randomised controlled trials. BMJ. 1999;319:185.
3. Hauck WW, Anderson S, Marcus SM. Should we adjust for covariates in nonlinear regression analyses of randomized clinical trials? Controlled Clin Trials. 1998;19:249–56.
4. Center for Drug Evaluation and Research. E9 statistical principles for clinical trials; 1998. Available at: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/e9-statistical-principles-clinical-trials
5. European Medicines Agency. Guideline on adjustment for baseline covariates in clinical trials. London, UK: European Medicines Agency, EMA/CHMP/295050/2013, 2015. Accessed at: https://www.ema.europa.eu/en/documents/scientific-guideline/guideline-adjustment-baseline-covariates-clinical-trials_en.pdf
6. Food and Drug Administration. Adjusting for covariates in randomized clinical trials for drugs and biologics with continuous outcomes. Guidance for industry. Draft. Rockville, MD, 2019. Accessed at: https://www.fda.gov/media/123801/download
7. Lee Y, Nelder JA. Conditional and marginal models: another view. Stat Sci. 2004;19:219–32.
8. Morris TP, Walker AS, Williamson EJ, White IR. Planning a method for covariate adjustment in individually randomised trials: a practical guide. Trials. 2022;23:328.
9. Gail MH, Wieland S, Piantadosi S. Biased estimates of treatment effect in randomized experiments with nonlinear regression and omitted covariates. Biometrika. 1984;71:431–44.
10. Greenland S, Robins JM, Pearl J. Confounding and collapsibility in causal inference. Stat Sci. 1999;14:29–46.
11. Steyerberg EW, Bossuyt PMM, Lee KL. Clinical trials in acute myocardial infarction: should we adjust for baseline characteristics? Am Heart J. 2000;139:745–52.
12. Groenwold RHH, Moons KGM, Peelen LM, Knol MJ, Hoes AW. Reporting of treatment effects from randomized trials: a plea for multivariable risk ratios. Contemp Clin Trials. 2011;32:399–402.
13. Turner EL, Perel P, Clayton T, Edwards P, Hernandez AV, Roberts I, Shakur H, Steyerberg EW. Covariate adjustment increased power in randomized controlled trials: an example in traumatic brain injury. J Clin Epidemiol. 2012;65:474–81.
14. Walter SD. Small sample estimation of log odds ratios from logistic regression and fourfold tables. Stat Med. 1984;4:437–44.
15. Nemes S, Jonasson JM, Genell A, Steineck G. Bias in odds ratios by logistic regression modelling and sample size. BMC Med Res Methodol. 2009;9:56.
16. Jiang H, Kulkarni PM, Mallinckrodt CH, Shurzinske L, Molenberghs G, Lipkivitch I. Covariate adjustment for logistic regression analysis of binary clinical trial data. Stat Biopharm Res. 2017;9:126–34.
17. Kahan BC, Jairath V, Doré CJ, Morris TP. The risks and rewards of covariate adjustment in randomized trials: an assessment of 12 outcomes from 8 studies. Trials. 2014;15:139.

Acknowledgements

Not applicable

Funding

No specific funding was obtained.

Author information

Authors and Affiliations

Division of Clinical Epidemiology, University of Geneva and Geneva University Hospitals, 6 Rue Gabrielle-Perret-Gentil, 1211, Geneva, Switzerland
Thomas Perneger, Christophe Combescure & Antoine Poncet

Authors

Thomas Perneger
Christophe Combescure
Antoine Poncet

Contributions

TP proposed the study, designed the study, interpreted the results, wrote the first draft; CC designed the study, interpreted the results, and revised the manuscript; AP designed the study, conducted the simulations, produced tables and figures, interpreted the results, and revised the manuscript. All authors read and approved the manuscript.

Corresponding author

Correspondence to Thomas Perneger.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Appendix.

R code used for simulations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Perneger, T., Combescure, C. & Poncet, A. Adjustment for baseline characteristics in randomized trials using logistic regression: sample-based model versus true model. Trials 24, 107 (2023). https://doi.org/10.1186/s13063-022-07053-7

Received: 31 March 2022
Accepted: 26 December 2022
Published: 13 February 2023
DOI: https://doi.org/10.1186/s13063-022-07053-7
