Interval-cohort designs and bias in the estimation of per-protocol effects: a simulation study

Background Randomized trials are considered the gold standard for making inferences about the causal effects of treatments. However, when protocol deviations occur, the baseline randomization of the trial is no longer sufficient to ensure unbiased estimation of the per-protocol effect: post-randomization, time-varying confounders must be sufficiently measured and adjusted for in the analysis. Given the historical emphasis on intention-to-treat effects in randomized trials, measurement of post-randomization confounders is typically infrequent. This may induce bias in estimates of the per-protocol effect, even using methods such as inverse probability weighting, which appropriately account for time-varying confounders affected by past treatment.

Methods/design In order to concretely illustrate the potential magnitude of bias due to infrequent measurement of time-varying covariates, we simulated data from a very large trial with a survival outcome and time-varying confounding affected by past treatment. We generated the data such that the true underlying per-protocol effect is null and under varying degrees of confounding (strong, moderate, weak). In the simulated data, we estimated per-protocol survival curves and associated contrasts using inverse probability weighting under monthly measurement of the time-varying covariates (which constituted complete measurement in our simulation), yearly measurement, as well as 3- and 6-month intervals.

Results Using inverse probability weighting, we were able to recover the true null under the complete measurement scenario no matter the strength of confounding. Under yearly measurement intervals, the estimate of the per-protocol effect diverged from the null; inverse probability weighted estimates of the per-protocol 5-year risk ratio based on yearly measurement were 1.19, 1.12, and 1.03 under strong, moderate, and weak confounding, respectively. Bias decreased as the measurement interval shortened. Under all scenarios, inverse probability weighted estimators were considerably less biased than a naive estimator that ignored time-varying confounding completely.

Conclusions Bias that arises from interval measurement designs highlights the need for planning in the design of randomized trials for collection of time-varying covariate data. This may come from more frequent in-person measurement or external sources (e.g., electronic medical record data). Such planning will provide improved estimates of the per-protocol effect through the use of methods that appropriately adjust for time-varying confounders.

Electronic supplementary material The online version of this article (10.1186/s13063-019-3577-z) contains supplementary material, which is available to authorized users.


Details of inverse probability weighted estimation when full time-varying covariate history is measured
We estimated the counterfactual survival by a given month $t + 1$ in arm $Z = z$, had everyone in that arm adhered to the protocol, as follows. Under full data collection, the analysis uses a person-time data set with one record per study participant in arm $Z = z$ per month of his/her follow-up:

1. Artificially censor a participant in the first month in which he/she deviates from the protocol (i.e. the first time $A_t \neq z$). This month of censoring will be the last record in the artificially censored data set for that subject.

2. For each record indexed by a particular subject and month $s \le t$ in the artificially censored data, attach a weight to that record which takes the value 0 if the subject first deviated from the protocol at time $s$ and otherwise takes the value

$$\prod_{j=0}^{s} \frac{\widehat{\Pr}(A_j = z \mid \bar{A}_{j-1} = \bar{a}_{j-1}, Y_j = 0, Z = z)}{\widehat{\Pr}(A_j = z \mid \bar{L}_j, \bar{A}_{j-1} = \bar{a}_{j-1}, Y_j = 0, Z = z)}$$

where overbars represent the history of a covariate through the specified time index (e.g. $\bar{A}_{j-1}$ is observed treatment history through $j - 1$), $\bar{a}_{j-1}$ is a vector of constants all equal to $z$, $\prod_{j=0}^{s}$ represents taking the product from month 0 through month $s$, $\widehat{\Pr}(A_j = z \mid \bar{A}_{j-1} = \bar{a}_{j-1}, Y_j = 0, Z = z)$ is an estimate of the overall probability that the subject adhered to the protocol in month $j$, and $\widehat{\Pr}(A_j = z \mid \bar{L}_j, \bar{A}_{j-1} = \bar{a}_{j-1}, Y_j = 0, Z = z)$ is an estimate of this same probability but conditional on the study participant's covariate history through $j$ (which we denote $\bar{L}_j$). Both the numerator and denominator probabilities can be estimated from the data via logistic regression models pooled over time.

3. Using the weighted records, estimate the monthly hazards of the outcome, $h_s(z)$.

4. From the monthly hazards, estimate the survival probability by month $t + 1$ for arm $Z = z$ that would have been observed under full adherence as $S(t + 1, z) = \prod_{s=1}^{t+1} (1 - h_s(z))$.
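The censoring, weighting, and survival steps above can be sketched in code. The following Python sketch is illustrative only and is not the authors' supplementary R code. It assumes that the numerator and denominator adherence probabilities have already been estimated and are supplied on each record (the keys `a`, `p_num`, `p_den`, and `event` are hypothetical names), that records are indexed from month 0, and that the monthly hazard is estimated as a weighted proportion of events, a form the text does not specify.

```python
from collections import defaultdict


def ipw_survival(subjects, z):
    """Sketch of inverse probability weighted survival for arm Z = z.

    subjects: list of subjects; each subject is a month-ordered list of
    records (dicts) with hypothetical keys:
      'a'     -- treatment actually taken in that month
      'p_num' -- estimated Pr(adhere | past adherence), weight numerator
      'p_den' -- estimated Pr(adhere | covariate history, past adherence)
      'event' -- 1 if the outcome occurs by the next month, else 0
    Returns a list whose element s is the estimated survival after month s.
    """
    weighted_events = defaultdict(float)
    weighted_at_risk = defaultdict(float)

    for records in subjects:
        weight = 1.0
        for s, rec in enumerate(records):
            if rec["a"] != z:
                # First deviation: the record gets weight 0 and the subject
                # is artificially censored from here on.
                break
            # Cumulative product of numerator / denominator through month s.
            weight *= rec["p_num"] / rec["p_den"]
            weighted_at_risk[s] += weight
            weighted_events[s] += weight * rec["event"]
            if rec["event"]:
                break  # no follow-up after the outcome

    # Assumed hazard estimator: weighted events / weighted person-months.
    survival, surv = [], 1.0
    for s in sorted(weighted_at_risk):
        hazard = weighted_events[s] / weighted_at_risk[s]
        surv *= 1.0 - hazard
        survival.append(surv)
    return survival


# Toy input: three subjects in arm z = 1, two months of follow-up each.
example = [
    [{"a": 1, "p_num": 1.0, "p_den": 1.0, "event": 0},
     {"a": 1, "p_num": 0.5, "p_den": 0.25, "event": 1}],
    [{"a": 1, "p_num": 1.0, "p_den": 1.0, "event": 0},
     {"a": 1, "p_num": 1.0, "p_den": 1.0, "event": 0}],
    [{"a": 1, "p_num": 1.0, "p_den": 1.0, "event": 0},
     {"a": 0, "p_num": 1.0, "p_den": 1.0, "event": 0}],
]
curve = ipw_survival(example, z=1)
```

In the toy input, the third subject deviates in the second month and so contributes nothing to that month's hazard, while the first subject, whose adherence was less likely given covariates, is up-weighted to stand in for similar subjects who deviated.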
The above algorithm is repeated in each arm $Z = 1$ and $Z = 0$ for each follow-up month $t + 1 = 1, \ldots, 60$ to obtain survival curves. An estimate of the per-protocol effect is obtained by contrasting $1 - S(t + 1, z = 1)$ with $1 - S(t + 1, z = 0)$. This estimate will recover the true per-protocol effect under our data generating mechanism (as depicted by the causal diagram in the main text), provided that the pooled logistic regression model used to estimate the weight denominator probabilities is correctly specified. Because we simulated $A_t$ directly from this model, we fit this model according to the true data generating model under the full data collection scenario.
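As a minimal illustration of the contrast described above, the following Python sketch computes a risk difference and a risk ratio from two estimated survival curves. It is not the authors' supplementary R code; the function name `risk_contrast` and the convention that element `month - 1` of each list holds the survival by follow-up month `month` are assumptions made for this example.

```python
def risk_contrast(surv_z1, surv_z0, month):
    """Contrast per-protocol risks at a given follow-up month (1-based).

    surv_z1, surv_z0: estimated survival curves for arms Z = 1 and Z = 0,
    where element month - 1 is the survival probability by that month
    (an indexing convention assumed for this sketch).
    Returns (risk difference, risk ratio); the ratio is NaN when the
    comparison-arm risk is zero.
    """
    risk1 = 1.0 - surv_z1[month - 1]
    risk0 = 1.0 - surv_z0[month - 1]
    ratio = risk1 / risk0 if risk0 > 0 else float("nan")
    return risk1 - risk0, ratio


# Identical curves correspond to a null per-protocol effect.
null_diff, null_ratio = risk_contrast([0.9, 0.8], [0.9, 0.8], month=2)
```

With 60 monthly survival estimates per arm, calling the function with `month=60` would give the 5-year contrast; a risk ratio of 1 and a risk difference of 0 correspond to the null.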
R code is provided in a separate supplementary file.

Comparison of bias calculations based on a single large sample versus the average of many smaller samples
Appendix Figures 1-4 show that the calculated bias is nearly the same whether it is based on a single sample of 100,000 individuals per arm or on the average of estimates from many samples of either 100 or 500 individuals per arm. This is illustrated for bias under the main scenario of strong confounding and approximately 40% nonadherence per arm (scenario 0 of Table 1