
Randomized controlled trials: who fails run-in?



Early identification of participants at risk of run-in failure (RIF) may present opportunities to improve trial efficiency and generalizability.


We conducted a partial factorial-design, randomized, controlled trial of calcium and vitamin D to prevent colorectal adenoma recurrence at 11 centers in the United States. At baseline, participants completed two self-administered questionnaires (SAQs) and a questionnaire administered by staff. Participants in the full factorial randomization (calcium, vitamin D, both, or neither) received a placebo during a 3-month single-blinded run-in; women electing to take calcium enrolled in a two-group randomization (calcium with vitamin D, or calcium alone) and received calcium during the run-in. Using logistic regression models, we examined baseline factors associated with RIF in three subgroups: men (N = 1606) and women (N = 301) in the full factorial randomization and women in the two-group randomization (N = 666).


Overall, 314/2573 (12 %) participants failed run-in; 211 (67 %) took fewer than 80 % of their tablets (poor adherence), and 103 (33 %) withdrew or were uncooperative. In multivariable models, the odds of RIF varied 8- to 13-fold by study center in the two largest groups. In men, RIF decreased with age (adjusted odds ratio [OR] per 5 years 0.85 [95 % confidence interval, CI; 0.76–0.96]) and was associated with being single (OR 1.65 [95 % CI; 1.10–2.47]), not graduating from high school (OR 2.77 [95 % CI; 1.58–4.85]), and missing SAQ data (OR 1.97 [95 % CI; 1.40–2.76]). Among women, RIF was associated primarily with health-related factors; RIF risk was lower with higher physical health score (OR 0.73 [95 % CI; 0.62–0.86]) and baseline multivitamin use (OR 0.44 [95 % CI; 0.26–0.75]). Women in the 5-year colonoscopy surveillance interval were at greater risk of RIF than those with 3-year follow-up (OR 1.91 [95 % CI; 1.08–3.37]), and the number of prescription medicines taken was also associated with RIF (p = 0.03). Perceived toxicities during run-in were associated with significantly increased odds of RIF (12- to 29-fold).


There were few common baseline predictors of run-in failure in the three randomization groups. However, the heterogeneity in run-in failure by study center and the association with missing SAQ data point to potential opportunities for intervention to improve trial efficiency and retention.

Trial registration NCT00153816. Registered September 2005.



The run-in period of a trial is a participatory phase between enrollment and randomization to determine participants’ eligibility to continue in the trial. At the end of run-in, participants are randomized only if they meet prespecified criteria. A common use of the run-in period is to identify and exclude individuals who are likely to adhere poorly to the trial protocol. When used appropriately, this helps to minimize dropouts after randomization [1] and maximize trial efficiency and statistical power in the estimation of efficacy, although it may impair external validity if run-in failures (RIFs) are systematically different from those retained in the trial. Other uses of a run-in period are to identify and remove “placebo responders,” to identify the most effective or best tolerated dose for each participant, to select participants with a good clinical response to (or tolerance of) the active treatment [2], or to establish baseline measurements for comparison after the intervention has been applied [3]. Such uses of a run-in period may introduce various forms of bias [4]. The run-in also offers time before randomization for participants to change their minds about taking part and for investigators to verify eligibility, for example, through review of enrollment blood test results and medical records.

In a trial whose placebo run-in is specifically designed to assess adherence and select adherent participants, participants may be largely responsible for their ineligibility, e.g., due to poor pill-taking or changing their mind about participation; in other cases they may have been good candidates to take part if factors beyond their control had not intervened, e.g., abnormal blood test results. Simplistically, we can think of these groups as voluntary and involuntary RIFs, respectively. An understanding of the characteristics associated with voluntary RIF may increase efficiency in trial planning, help determine enrollment targets for subgroups at risk of failing run-in, or be used to identify participants who might be retained in the trial using motivational strategies.

During a multicenter, randomized, placebo-controlled trial of calcium and vitamin D in the chemoprevention of colorectal adenoma recurrence [5], we examined the characteristics of participants who became “voluntary run-in failures” after an approximately 3-month single-blinded placebo run-in period.


We conducted a randomized, double-blinded, placebo-controlled multicenter trial of daily supplementation with 1000 IU vitamin D3 and/or 1200 mg elemental calcium as calcium carbonate, for the prevention of large bowel adenomas [5]. Participants were recruited between 2004 and 2008 at 11 clinical centers in the continental United States and Puerto Rico following a complete colonoscopy, during which at least one colorectal adenoma was removed and none remained after the procedure. Eligible participants aged 45–75 years were in good general health and had no contraindications to calcium or vitamin D, no familial colorectal cancer syndromes, and no history of serious gastrointestinal disease. All participants provided written informed consent; the research was approved by the Committee for the Protection of Human Subjects at Dartmouth College and by Institutional Review Boards (IRBs) at each clinical center (see Additional file 1 for the list). The trial is reported according to the Consolidated Standards of Reporting Trials (CONSORT) (see Additional file 2 for the CONSORT checklist).

Enrollment and run-in

During the informed consent process at enrollment, participants received an explanation about the study procedures, including randomized allocation to the study agents, and were provided a copy of the signed and dated informed consent form. Participants were mailed the SF-36 short form health survey and food frequency questionnaires to complete at home and bring to the enrollment interview. During a 2- to 3-hour in-person interview, they completed a detailed intake questionnaire including questions on their beliefs about the study interventions and the health effects of vitamin D: preference: “If you could choose, which kind of pill would you like to receive during the study?”; efficacy beliefs: “How likely do you think it is that vitamin D supplements [are helpful in preventing colon polyps]/[improve general health]/[improve pain in bones and joints]/[improve mood or depression]/[cause constipation]?”; and allocation belief: “If you were to place a bet, which pill would you bet you’ll be given during the trial?”. The latter question was poorly received by many participants and was removed from the survey after several months of use.

Participants were given a 7-day pill dispenser and their first bottle of study tablets containing placebo tablets, or calcium tablets for those in the two-group randomization, dispensed in a single-blinded manner. Information about the contents of the medicine bottle was shown on the label as: “This bottle contains one of the following: calcium, vitamin D, vitamin D + calcium, or placebo” or, for the two-group randomization, “This bottle contains one of the following: calcium, or vitamin D + calcium”, with instructions to take one tablet twice a day with food. The granulation and coating of the placebo tablets were composed of inactive compounds such as lactose (no more than 600 mg), cellulose, polymers, and minerals. All study tablets were manufactured for the study and looked similar. Participants were asked to take the first tablet during the enrollment interview; in some cases the first tablet was taken at home, and the participant was asked to confirm this by postcard or telephone. Participants were also given a study brochure, including general instructions on taking the tablets and specific instructions on staggered administration if they were taking medicines that might interact with calcium. Participants were asked to discontinue any personal calcium- or vitamin D-containing supplements and, for the duration of the trial, were offered a supply of multivitamins that excluded those ingredients. The label on the multivitamin bottle identified these as “Multivitamin supplement” and included a complete list of ingredients. Participants were counseled on the major dietary sources of calcium and vitamin D and were asked to avoid regularly consuming large amounts of such foods.

Blood was drawn at enrollment and tested for calcium, 25-hydroxyvitamin D (25-OH D), and creatinine levels. Where possible, an appointment was scheduled for a telephone interview in 2 to 3 months’ time. Participants received $100 at completion of the enrollment interview. Study coordinators and investigators were aware of the placebo (or calcium, in the two-group randomization) run-in, but participants were not informed about the run-in period. During run-in, medical records were reviewed and blood test results became available; individuals found to have disqualifying medical conditions or abnormal blood results (including serum 25-OH D levels <12 ng/ml) were ineligible for randomization. Throughout the trial, perceived toxicity reports were completed when participants reported symptoms that they attributed to the study tablets.


After a single-blinded run-in period of approximately 3 months (56–84 days), coordinators conducted a one-hour telephone interview to assess patient-reported eligibility criteria and confirmed that the criteria necessary for randomization had been met. They obtained the self-reported number of tablets left in the participant’s bottle, calculated the percentage of tablets taken, and disqualified from randomization anyone with self-reported adherence below 80 %. RIFs were generally discontinued from the study before completion of the full questionnaire. In rare cases, coordinators used their discretion, e.g., to retain a non-adherent participant who had misunderstood the dose during run-in, or to exclude a participant with unusual circumstances likely to adversely affect participation.
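The adherence rule described above can be illustrated with a minimal sketch. The text gives the 80 % threshold and the twice-daily dose, but not the exact formula the coordinators used; the calculation below (tablets taken divided by tablets expected over the run-in) and the field names are assumptions made for illustration only.

```python
from dataclasses import dataclass

ADHERENCE_THRESHOLD = 0.80  # from the text: adherence below 80 % fails run-in
TABLETS_PER_DAY = 2         # from the text: one tablet twice a day


@dataclass
class RunInCount:
    """Hypothetical record of one participant's tablet count."""
    tablets_dispensed: int  # tablets in the bottle at enrollment (assumed known)
    tablets_remaining: int  # self-reported at the telephone interview
    days_in_run_in: int     # days between enrollment and the interview


def adherence(count: RunInCount) -> float:
    """Fraction of expected tablets taken (assumed formula, not the trial's)."""
    if count.days_in_run_in <= 0:
        raise ValueError("run-in must last at least one day")
    expected = TABLETS_PER_DAY * count.days_in_run_in
    taken = count.tablets_dispensed - count.tablets_remaining
    return taken / expected


def passes_adherence_criterion(count: RunInCount) -> bool:
    """True if self-reported adherence meets the 80 % threshold."""
    return adherence(count) >= ADHERENCE_THRESHOLD


# Illustrative values only: 70 days of run-in, so 140 tablets expected.
adherent = RunInCount(tablets_dispensed=200, tablets_remaining=80, days_in_run_in=70)
non_adherent = RunInCount(tablets_dispensed=200, tablets_remaining=100, days_in_run_in=70)
print(passes_adherence_criterion(adherent), passes_adherence_criterion(non_adherent))
```

The sketch does not model the coordinator discretion mentioned in the text, which overrode the count in rare cases.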

Eligible participants who confirmed their ongoing commitment to the study were block-randomized to treatment in a double-blinded manner, using a web-based random number generator, stratified by study center, sex, and colonoscopy interval (3 or 5 years). Those in the full factorial randomization were randomly assigned to one of four treatment groups (calcium, vitamin D, both, or placebo). Those in the two-group randomization (i.e., women who were not willing to forgo calcium supplementation) were randomized to vitamin D or placebo and were all provided with calcium. Study treatment was scheduled to continue until the surveillance colonoscopy either 3 or 5 years after the qualifying examination, according to recommendations by each participant’s gastroenterologist.
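As an illustration of stratified block randomization, the following minimal sketch keeps one shuffled block of assignments per stratum (study center, sex, and colonoscopy interval). It is not the trial's web-based system: the block size, the seed, and the class and parameter names are assumptions, since the text does not report them.

```python
import random
from collections import defaultdict

# Arms of the full factorial randomization, as named in the text.
FULL_FACTORIAL_ARMS = ("calcium", "vitamin D", "calcium + vitamin D", "placebo")


class StratifiedBlockRandomizer:
    """Permuted-block randomization within strata (illustrative sketch)."""

    def __init__(self, arms, block_multiple=1, seed=None):
        self.arms = tuple(arms)
        self.block_multiple = block_multiple  # assumed; block size = arms * multiple
        self._rng = random.Random(seed)
        self._blocks = defaultdict(list)      # one pending block per stratum

    def assign(self, center, sex, interval_years):
        """Return the next treatment for a participant in the given stratum."""
        stratum = (center, sex, interval_years)
        block = self._blocks[stratum]
        if not block:
            # Start a new block with each arm equally represented, then shuffle.
            block.extend(self.arms * self.block_multiple)
            self._rng.shuffle(block)
        return block.pop()


randomizer = StratifiedBlockRandomizer(FULL_FACTORIAL_ARMS, seed=2004)
first_block = [randomizer.assign("center A", "M", 3) for _ in range(4)]
print(sorted(first_block) == sorted(FULL_FACTORIAL_ARMS))
```

Within each stratum, every completed block contains each arm equally often, which is what keeps the arms balanced by center, sex, and colonoscopy interval.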

Definitions: voluntary and involuntary run-in failures

RIFs were defined as individuals who enrolled and consented to participate in the trial but were not randomized after the run-in period. We defined “voluntary” RIFs as participants with some degree of control over the factors that prevented their randomization, i.e., those who declined to continue; those who took <80 % of their study tablets; and those who could not be reached for the telephone interview to be randomized. “Involuntary” RIFs were defined as participants whose removal from the trial was beyond their control, e.g., those whose safety blood tests were abnormal; those confirmed after enrollment as having a disqualifying medical condition; and others who changed residence after enrollment. Had the disqualifying medical information been available earlier, these participants would have been ineligible for enrollment. We focus on voluntary RIFs throughout this paper.

Statistical methods

We reasoned that the determinants of voluntary RIF are likely to vary in different trials, depending on the type of intervention offered and the characteristics of participants who enroll in a particular trial. For this reason, a priori we chose to analyze three groups of participants separately to illustrate what might happen in parallel “trials” with minor differences in participant characteristics: (M4) men in the full factorial randomization; (F4) women in the full factorial randomization; and (F2) women in the two-group randomization, i.e., women who chose to take calcium. We reasoned that the predictors of RIF in a single, pooled analysis would lack external validity, whereas common factors identified independently in the three subgroups might generate more plausible hypotheses that could be tested in future, similar trials.

For each group of participants, we looked for univariate associations between baseline factors and RIF using chi-square tests, t tests, and analysis of variance. All variables that had a p value <0.1 in univariate analyses were added to a “full” logistic regression model. The final model was obtained by removing all variables in the full model with p > 0.05 in order to give a minimum set of variables that together contribute significantly to the model. If significant in univariate analysis, reported perceived toxicity (the only post-enrollment factor examined) was added to the final model to allow us to assess how perceived toxicity influences the effect of the final model factors. For women in the two-group randomization, multivitamin use, calcium supplementation, and vitamin D supplementation were collinear, and we chose to include only multivitamin use in the full model.
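The univariate screening step can be sketched for the simplest case, a binary baseline factor cross-tabulated against RIF. The sketch below computes a Pearson chi-square test for a 2 × 2 table (without continuity correction, which is an assumption) and applies the p < 0.1 entry threshold from the text. The counts and variable names are hypothetical, and the t tests, analysis of variance, and the logistic regression itself are not shown.

```python
import math

ENTRY_THRESHOLD = 0.1  # from the text: p < 0.1 enters the "full" model


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p value (1 df) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def screen(tables, threshold=ENTRY_THRESHOLD):
    """Return names of factors whose univariate p value is below the threshold."""
    return [name for name, cells in tables.items()
            if chi_square_2x2(*cells)[1] < threshold]


# Hypothetical counts: (RIF with factor, randomized with factor,
#                       RIF without factor, randomized without factor)
candidate_tables = {
    "factor_x": (30, 70, 15, 85),
    "factor_y": (10, 90, 11, 89),
}
print(screen(candidate_tables))
```

Screening on univariate p values is a simple way to limit the number of candidate variables, although it can miss factors that matter only after adjustment for others.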

Post hoc analyses

In general, the analysis plan and potential predictors of RIF were selected a priori; however, during the analysis we discovered substantial variability in RIF according to the study center at which the participant had been recruited. This was unexpected and led us to undertake several post hoc analyses. (1) First, we recognized other variables that deserved investigation and tested those in the models. These were: imperfect completion of the interviewer-administered enrollment questionnaire or the self-administered SF-36 and food frequency questionnaires; the participant’s willingness to set a date at enrollment for the next interview; the study experience of the enrolling staff member in prior polyp prevention studies (dichotomous) and the total number of participants they enrolled during the entire study. For logistical reasons, we were only able to measure the latter by summing enrollments retrospectively, understanding that total accrual did not reflect staff experience up to the point at which each individual was enrolled. The variables used in these post hoc analyses are identified as such in tables. (2) We reasoned that the observed effect of the study center may reflect a range of factors from regional differences among participants (e.g., education, race) to differences in methods used by study staff; we examined this possibility by building multivariable models without the center variable. (3) Finally, we investigated the importance of RIF rates at each center for long-term trial efficiency by assessing the proportion failing run-in at each center in relation to three measures of post-randomization adherence, using scatterplots and Kendall’s tau-b.
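For readers unfamiliar with Kendall’s tau-b, the following minimal sketch computes it directly from its definition, (C − D) / sqrt((n0 − n1)(n0 − n2)), where C and D are the numbers of concordant and discordant pairs, n0 is the total number of pairs, and n1 and n2 are the numbers of pairs tied on each variable. The center-level values are hypothetical, not trial data, and the analysis itself was run in SAS and STATA rather than with this code.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences, allowing ties."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1  # pair tied on x (includes pairs tied on both)
        if dy == 0:
            ties_y += 1  # pair tied on y (includes pairs tied on both)
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    n0 = len(x) * (len(x) - 1) // 2
    denominator = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    if denominator == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denominator


# Hypothetical values for 11 centers: proportion failing run-in and a
# post-randomization adherence measure.
rif_proportion = [0.05, 0.08, 0.09, 0.10, 0.11, 0.12, 0.13, 0.15, 0.18, 0.22, 0.30]
adherence_rate = [0.93, 0.90, 0.94, 0.88, 0.91, 0.87, 0.89, 0.90, 0.84, 0.86, 0.80]
print(round(kendall_tau_b(rif_proportion, adherence_rate), 2))
```

A rank-based measure suits this setting because there are only 11 centers and no reason to assume a linear relationship between the two proportions.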

Odds ratios are presented with 95 % confidence intervals. A single p value for each categorical variable was estimated using a likelihood ratio test. Analyses were done using SAS (version 9) and STATA (version 14).


Of 2813 enrollees, 240 (8.5 %) were excluded from randomization for reasons beyond their control such as out-of-range blood test results (involuntary RIFs, Fig. 1), leaving 2573 enrollees who were medically eligible for randomization. A further 314/2573 (12 %) were not randomized because of poor adherence or refusal to continue in the study (voluntary RIFs, Fig. 1). The RIF proportions among M4, F4, and F2 were 183/1606 (11 %), 49/301 (16 %), and 82/666 (12 %), respectively (Table 1). Of the 103 who declined to participate, 66 (21 % of RIFs, 2 % of those eligible) clearly stated that they did not want to continue in our study (15 because of a perceived toxicity [PT]), and 37 (12 % of RIFs) were uncooperative or could not be contacted. Two-thirds (N = 211) of RIFs (8 % of those eligible) were attributed to poor pill-taking adherence.
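The counts in this paragraph can be cross-checked with a short sketch. All numbers are taken from the text above; the variable and function names are illustrative.

```python
# Counts reported in the text.
ENROLLED = 2813
INVOLUNTARY_RIF = 240
ELIGIBLE = ENROLLED - INVOLUNTARY_RIF

# Voluntary RIFs and eligible participants by group: (failed, eligible).
GROUPS = {"M4": (183, 1606), "F4": (49, 301), "F2": (82, 666)}

# Reasons for voluntary run-in failure.
REASONS = {
    "poor adherence": 211,
    "declined to continue": 66,
    "uncooperative or unreachable": 37,
}


def percent(numerator, denominator):
    """Percentage to one decimal place."""
    return round(100 * numerator / denominator, 1)


voluntary_rif = sum(failed for failed, _ in GROUPS.values())

# Internal consistency: groups and reasons should both add up to the totals.
assert sum(eligible for _, eligible in GROUPS.values()) == ELIGIBLE
assert sum(REASONS.values()) == voluntary_rif

print("Eligible:", ELIGIBLE, "voluntary RIFs:", voluntary_rif)
for name, (failed, eligible) in GROUPS.items():
    print(name, percent(failed, eligible), "% failed run-in")
for reason, count in REASONS.items():
    print(reason, percent(count, voluntary_rif), "% of RIFs,",
          percent(count, ELIGIBLE), "% of eligible")
```

Printing each reason against both denominators makes it easy to see which percentage in the text refers to RIFs and which to all medically eligible enrollees.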

Fig. 1 CONSORT flow diagram

Table 1 Participant characteristics at enrollment for randomized participants and voluntary run-in failures

Treating the three randomization groups as though they represent three separate study populations, the univariate analyses identified several variables that were associated with a higher proportion of RIF in each of the three groups: non-white race (overall proportions in RIF and randomized participants respectively, 17 % and 11 %), lower educational attainment (overall 27 % versus 11 %), being unmarried (overall 18 % versus 11 %), and failing to complete one or more questions in either of the self-administered questionnaires (overall 20 % versus 10 %) (Table 1). In all three groups, participants at the enrollment interview who confirmed a date and time for the next appointment were less likely to fail run-in (overall 11 % versus 17 %), but this was only statistically significant in the largest group, M4 (p < 0.001). Five additional factors were significantly associated with RIF in univariate analyses in two of the three groups: Hispanic ethnicity, study center, a lower SF-36 mental health score, non-use of multivitamins before the study, and answering “Don’t know” or refusing to answer one or more questions during the in-person enrollment interview.

Generally, participants’ preferences and beliefs about the properties of the study tablets did not substantially influence their probability of being randomized (Table 2). At study entry, 63 % of participants believed that calcium and vitamin D were very or somewhat likely to prevent colorectal polyps, 34 % did not know, and 3 % thought this unlikely. Overall, most participants believed that calcium and vitamin D were likely to improve general health (86 %) and pain in bones and joints (63 %); fewer believed that the study agents were likely to improve mood (32 %) or cause constipation (20 %). In univariate analyses, beliefs about the effectiveness of calcium and vitamin D and the baseline guess about allocation were not significantly associated with RIF. Although RIF risk tended to be lower in participants who would prefer to receive both calcium and vitamin D (overall 11 % versus 14 %), this was not statistically significant.

Table 2 Participant beliefs and voluntary run-in failure

The multivariable models developed for the three groups had few similarities (Tables 3, 4, 5). Study center was significantly associated with RIF in M4 and F2, with >8-fold variation in odds of RIF among the 11 centers in M4, and >13-fold in F2. In M4, men who missed or refused any question in the self-administered SF-36 or food frequency questionnaires had almost twice the odds of failing run-in (adjusted odds ratio [OR] 1.97; 95 % confidence interval [CI] 1.40–2.76). Younger men were more likely to fail run-in (adjusted OR per 5 years of age 0.85; 95 % CI 0.76–0.96), as were single or divorced men (adjusted OR 1.65; 95 % CI 1.10–2.47) and men who had not graduated from high school (OR 2.77; 95 % CI 1.58–4.85). Among women in the full factorial randomization (F4), RIF was more likely in those reporting use of no prescription medicines or three or more (p = 0.03). Among women in the two-group randomization (F2), in addition to study center, RIF was inversely associated with regular use of multivitamins at baseline (adjusted OR 0.44; 95 % CI 0.26–0.75) and with the SF-36 Physical Component Summary (PCS) measure (adjusted OR 0.73; 95 % CI 0.62–0.86). Women in the 5-year colonoscopy cycle had almost twice the odds of run-in failure as those with 3-year recommended follow-up (adjusted OR 1.91; 95 % CI 1.08–3.37).
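An odds ratio reported per 5 years of age can be re-expressed for other age differences, because a logistic model with age as a continuous term is linear on the log-odds scale. The sketch below shows the arithmetic using the reported estimate for men (0.85; 95 % CI 0.76–0.96); the function name is illustrative, and the rescaling assumes the log-linear relationship holds across the age range.

```python
import math


def rescale_odds_ratio(odds_ratio, units_reported, units_wanted):
    """Convert an OR per `units_reported` into an OR per `units_wanted`.

    Assumes the predictor enters the logistic model as a linear term,
    so the log odds ratio scales in proportion to the unit difference.
    """
    if odds_ratio <= 0 or units_reported <= 0:
        raise ValueError("odds ratio and reported units must be positive")
    return math.exp(math.log(odds_ratio) * units_wanted / units_reported)


# Reported for men: adjusted OR per 5 years of age, with 95 % CI limits.
estimate, lower, upper = 0.85, 0.76, 0.96

per_10_years = [rescale_odds_ratio(v, 5, 10) for v in (estimate, lower, upper)]
print([round(v, 2) for v in per_10_years])
```

The confidence limits are rescaled in the same way as the point estimate because they are also defined on the log-odds scale.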

Table 3 Logistic regression models of voluntary run-in failure: full factorial study males
Table 4 Logistic regression models of run-in failure: full factorial study females
Table 5 Logistic regression models of run-in failure: two-group randomization females

Perceived toxicities (PTs) were reported during run-in by 34 (1.8 %) participants in the full factorial randomization and by 12 (1.8 %) women in the two-group randomization. Of these 46 participants, 29 (63 %) became run-in failures, representing 12-fold increased odds of RIF in the full factorial randomization groups and 29-fold in the two-group randomization. Inclusion of PTs in the final models did not substantially alter the estimates for the other RIF predictors (Tables 3, 4, 5).
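To give a sense of scale, a crude odds ratio for run-in failure by PT can be computed from the pooled counts in the text. This is a minimal sketch, not the trial's analysis: it is unadjusted and pooled across the three groups, it uses a Wald interval on the log-odds scale, and it assumes that all 29 participants with a PT who failed run-in are counted among the 314 voluntary RIFs. It gives a crude OR of about 13, whereas the 12-fold and 29-fold estimates above come from the group-specific multivariable models.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald 95 % CI for the 2 x 2 table [[a, b], [c, d]].

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from the text (pooled across the three groups).
ELIGIBLE, VOLUNTARY_RIF = 2573, 314
PT_TOTAL, PT_RIF = 46, 29

pt_randomized = PT_TOTAL - PT_RIF                  # PT, did not fail run-in
no_pt_rif = VOLUNTARY_RIF - PT_RIF                 # no PT, failed run-in
no_pt_randomized = (ELIGIBLE - PT_TOTAL) - no_pt_rif

crude_or, ci_low, ci_high = odds_ratio_wald(PT_RIF, pt_randomized,
                                            no_pt_rif, no_pt_randomized)
print(round(crude_or, 1), round(ci_low, 1), round(ci_high, 1))
```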

Study center was included in the final multivariable model for men (Table 3) and for women in the two-group randomization (Table 5). When center was omitted from the model of RIF in men, two factors became significant: the odds of failing run-in were higher among men who had not scheduled a time for the next phone call before they left the enrollment interview (OR 1.61; 95 % CI 1.11–2.33), and lower among men enrolled by a coordinator who had worked in a prior polyp prevention study (OR 0.64; 95 % CI 0.42–0.97) (see Additional file 3). Omission of center from the model also led to a larger odds ratio for the lowest education category (OR 3.25; 95 % CI 1.92–5.49, compared with 2.77; 95 % CI 1.58–4.85 when center was included). Variable selection for the model for women in the two-group randomization was not affected by exclusion of center (see Additional file 4).

In analyses by center, the risk of run-in failure was inversely correlated with three measures of post-randomization adherence based on pill-taking and/or endpoint ascertainment, but none of these associations reached statistical significance (see Additional files 5, 6, and 7).


In this study, we analyzed data from the first 3 months of a long-term chemoprevention trial to identify baseline predictors of voluntary run-in failure (RIF) in three groups of participants. In all, 12 % of participants failed run-in, and this loss before randomization represents considerable effort that was invested specifically to improve long-term trial efficiency. Our analyses uncovered differences in the factors associated with RIF in the three groups, even though they experienced fundamentally the same trial conditions. RIF risk in men was primarily associated with sociodemographic characteristics, but the key drivers in women were health-related factors. There were further differences in RIF predictors among women in the full factorial and two-group randomization protocols. The former group experienced a true, single-blind, placebo run-in, whereas the latter received calcium during run-in; thus, their single-blind run-in was potentially influenced by the physiological effects of calcium as well as any health beliefs related to their preference for calcium supplementation. Our results illustrate the difficulties in defining a simple, generalizable set of risk factors to identify participants at risk of RIF. While one might expect to see different factors affecting RIF in trials that involve different diseases, interventions, and outcomes, here we see different multivariable models in men and women within the same trial.

In the two larger groups, study center was associated with an 8- to 13-fold variation in odds of RIF, even after adjustment for other factors. This substantial effect may reflect differences in the methods used by study staff at each center, or it may be due to residual confounding by medical, educational, cultural, or other characteristics of the participants. With the available data, we could not identify differences in methods between centers. However, in post hoc analyses designed to explore the substantial heterogeneity in RIF by center, we found that building the model without center led, among men, to stronger associations between run-in failure and coordinator experience, timely scheduling of the next interview, and participant education. Center may represent a mix of factors including participant and coordinator characteristics, participants’ uncertainty about their commitment, and competing constraints on their time (e.g., by employment). Center has been associated with RIF risk in other settings [6, 7], and this association may be worth exploring in future trials. Will a center with more RIFs subsequently have better adherence and endpoint ascertainment because participants were more stringently selected, or does a high RIF rate indicate a local problem that will persist throughout the study? Our exploratory post hoc data hinted that centers with more RIFs had lower rates of post-randomization adherence and trial completion; this might suggest local differences by center, e.g., in study methods, or in intrinsic differences among the participants who enrolled in each region.

Among men, those who were younger, single, and had not graduated from high school had the greatest risk of failing run-in. One possible explanation is that those who were working had less time to commit to the study than retired participants, but we did not collect employment data. In addition, men had about twice the odds of RIF if they overlooked or refused to answer one or more questions in the two self-administered questionnaires (SAQs). This finding might be explained by an association between perfect completion of the SAQ and an individual’s motivations underlying trial participation. However, an alternative explanation is differences in quality checking by study staff; those who more effectively identified missing SAQ questions and had participants correct them may have also been better at motivating enrollees during enrollment and run-in. With our data, we could not distinguish these two possibilities. The first suggests that a SAQ could be developed to help identify individuals at risk of run-in failure and target them for intervention. The second could be addressed via improvements in staff training and motivational protocols.

In women, health-related factors tended to be associated with RIF. In the full factorial randomization, RIF was most frequent among women taking no prescription medications at baseline. Among women in the two-group randomization, RIF was less common among those who took vitamin supplements at baseline, had better SF-36 physical health scores, or had a 3-year (rather than a 5-year) surveillance interval, which represents not only a shorter commitment to trial participation, but also higher risk adenomas that require more intensive follow-up. We found no convincing evidence that the successful negotiation of run-in was associated with participants’ expectations of health benefits from the agents, which tablets they would prefer if given the choice, or which agents they guessed they would be given for the study. Other studies have found that trial participation is associated with altruistic factors [8–11], so it is possible that individuals with stronger feelings about the study agents were less likely to enroll in the first place.

Predictors of RIF have varied in previous trials. RIF was associated with lower Karnofsky performance score and lower education level in a head and neck cancer chemoprevention trial [12], and with younger age, not working, and smoking in a trial of pregnant women to prevent adverse neonatal outcomes [7]. Interestingly, in the latter study, the proportion of RIFs ranged from 20 % to 40 % in five clinics but was lowest in one clinic where participants were told they would receive sugar pills during run-in. Another trial of motivational interventions to reduce blood pressure found no significant characteristics of RIFs, but the sample was small [1].

In our trial and others, the purpose of the adherence run-in is to help researchers identify and randomize the participants most likely to follow trial protocol. Its potential advantages include the retention of good long-term adherers [13], improved efficiency [14], improved internal validity in the estimation of efficacy, and greater statistical precision [15]. There is certainly evidence that removing poor adherers can change a study’s effect estimates. For example, in a trial of lovastatin to reduce cholesterol levels, Davis et al. found that participants who would have failed run-in, had an adherence criterion been applied, experienced 17.8 % smaller cholesterol reductions than more adherent participants [6]. By retaining these individuals in the trial, the proportion of less educated participants was increased (improving generalizability), but the overall measure of lovastatin’s effectiveness was 5.2 % lower than it would otherwise have been. Pablos-Mendez discussed two comparable trials of aspirin to prevent myocardial infarction [2]. The first trial, in American physicians, included a run-in and reported 90 % adherence over 5 years and a significant risk reduction of 44 % [16]. The other trial, in British physicians, did not include a run-in, and reported adherence of 70 % over 6 years and a non-significant risk reduction of 3 % [17]. These findings support the use of run-in to improve adherence, provided that the goal is to estimate efficacy rather than effectiveness.

One concern about the adherence run-in is that individuals who adhere poorly to treatment are very different from those who adhere well, with respect to health and other characteristics; poor adherers have worse health outcomes during clinical trials independent of assignment to the active or placebo group [18–23], and their exclusion may therefore reduce generalizability. Further, improvements in efficiency offered by the run-in may be attenuated if the exclusion criteria misclassify individuals who would have gone on to complete the trial successfully [14]. Although early trial adherence and longer term adherence tend to be highly correlated [7], one study showed that this was true for less educated participants but not for more educated ones [6]. The implication is that exclusion of better educated participants who adhere poorly during run-in may be relatively inefficient because their longer term adherence is less accurately predicted. However, although the factors associated with adherence in different settings have been studied extensively [24, 25], RIF is not simply an adherence issue; it also gives participants an opportunity to reconsider their enrollment (“buyer’s remorse”). In one study, when patients were interviewed within a month of enrollment in a variety of clinical trials, 16 (12 %) had already considered dropping out, and 4 of those continued because of a sense of obligation, despite a preference to withdraw [26]. In our study, 21 % of RIFs (4 % of enrollees medically eligible for randomization) clearly stated that they did not want to continue participating, and a further 12 % were uncooperative or could not be contacted. But among the two-thirds of RIFs we attributed to poor adherence, ambivalence towards trial participation may have been the underlying cause of that poor adherence in some cases. It is also possible that the $100 incentive payment was the primary motivation for enrollment; this may account for some early dropouts.

Future studies of run-in might collect more granular data to distinguish poor adherence from disinclination to continue in the trial. Potential strategies to address both problems may include increased communication with participants during run-in to elicit questions and concerns, motivate participants in their pill-taking routine, and develop a rapport that encourages individuals to stay in the trial. One option might be a two-stage run-in. For example, in our trial, participants could have been telephoned one week after enrollment to assess early adherence, identify cases of “buyer’s remorse,” and provide explanations and motivational counseling where appropriate. After 3 months, persistent non-adherers could be excluded from randomization as usual, and randomization could be stratified according to whether motivational counseling was initiated, to assess the impact of the strategy by subgroup. This approach could be extended to study the ideal frequency of participant contact or counseling needed to retain promising participants but not reluctant enrollees with poor long-term prospects of adherence.

Forty-six participants (2 % of enrollees and 15 % of RIFs) contacted their coordinator to report perceived toxicities (PTs) during run-in, and almost two-thirds of those became RIFs. In the full factorial randomization, participants received placebo during run-in, and PTs were associated with a 12-fold increase in the odds of RIF. In the two-group randomization, women receiving calcium and reporting a PT experienced a 29-fold increase in the odds of RIF. However, the risk of PT was similar in those given placebo or calcium during run-in. When a participant develops any new symptom by chance during run-in, they may attribute it to the study intervention; alternatively, the PT may be a nocebo phenomenon arising from expectations of an adverse effect upon starting a new treatment. Up to a quarter of patients in the placebo group in previous trials have reported symptoms after randomization, with a wide range of frequency across trials [27–29]. Some trials deliberately exclude participants who show a placebo response (or a coincident improvement in health) during run-in, but they may be criticized for poorer generalizability [4]. What we observed was an increased tendency for participants with a nocebo response (or coincidental development of symptoms) to fail run-in. Both types of exclusion involve a change in health while taking placebo: the first, determined by the investigator, excludes those with beneficial changes in health while taking placebo, while the second type, determined by the participant, excludes those with adverse changes in health. Both are likely to affect generalizability. Although PTs were very strongly associated with RIF, it is unclear how to address this, because if the blind must be maintained, participants cannot be told that they were taking placebo during run-in.
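To illustrate what a fold-increase in odds means, the Python sketch below computes a crude (unadjusted) odds ratio with a Wald 95 % confidence interval from a 2 × 2 table. It is a simplification: the estimates reported in this article are adjusted odds ratios from multivariable logistic regression, and the counts used here are hypothetical values chosen only to make the arithmetic visible, not data from the trial. The function name `odds_ratio_wald` is likewise an assumption.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2 x 2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts (NOT trial data): 30 participants report a PT, of whom
# 20 fail run-in; 970 report no PT, of whom 100 fail run-in.
estimate, lower, upper = odds_ratio_wald(a=20, b=10, c=100, d=870)
print(f"OR {estimate:.1f} (95 % CI {lower:.1f} to {upper:.1f})")
```

With these illustrative counts the odds of failure are 20/10 = 2.0 among those reporting a PT and 100/870 ≈ 0.11 among those not reporting one, giving a crude odds ratio of 17.4.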

This study was limited by differences in sample size in the three subgroups, which will have affected the power to consistently detect factors with common specific effect sizes and meant that our multivariable modeling did not directly compare RIF rates in men and women. In addition, we chose to only consider main effects in our analyses and did not model any interactions between potential predictor variables. A further limitation was our use of self-reported adherence via tablet counts, which, although superior to some methods, is still subject to misclassification [30]. Our study was a relatively long-term (3- to 5-year) trial using agents (calcium and vitamin D) that are generally thought to have few, if any, adverse side effects. As a chemoprevention trial, it recruited individuals with a specific interest in preventive health strategies, and our study was also limited to adults aged 45–75 in good general health. The study may therefore have limited generalizability to other types of trials, although this would be more of a concern if we had found a consistent set of RIF predictors in the three subgroups and were proposing to extend our findings to diverse trial settings.


The lack of a clear, single set of predictors of run-in failures among our three subgroups suggests that the search for general predictors of run-in failure in trials will not be straightforward. However, substantially different risks associated with study center and missing self-administered questionnaire data reflect opportunities for further study with the goal of identifying interventions that might improve trial efficiency and retention. Perceived toxicities during a placebo run-in represent the strongest risk factor for run-in failure, but perhaps are the most difficult to address in a blinded study. The loss of 12 % of participants due to voluntary factors represents considerable expense as well as loss of generalizability, and the search for a balance between optimal efficiency and generalizability via the adherence run-in deserves further attention.


CI, confidence interval; MCS, mental component summary measure (SF-36); OR, odds ratio; PCS, physical component summary measure (SF-36); PT, perceived toxicity; RIF, run-in failure; SAQ, self-administered questionnaire


  1. Ulmer M, Robinaugh D, Friedberg JP, Lipsitz SR, Natarajan S. Usefulness of a run-in period to reduce drop-outs in a randomized controlled trial of a behavioral intervention. Contemp Clin Trials. 2008;29(5):705–10.

  2. Pablos-Mendez A, Barr RG, Shea S. Run-in periods in randomized trials: implications for the application of results in clinical practice. JAMA. 1998;279(3):222–5.

  3. Frost C, Kenward MG, Fox NC. Optimizing the design of clinical trials where the outcome is a rate. Can estimating a baseline rate in a run-in period increase efficiency? Stat Med. 2008;27(19):3717–31.

  4. Berger VW, Rezvani A, Makarewicz VA. Direct effect on validity of response run-in selection in clinical trials. Control Clin Trials. 2003;24(2):156–66.

  5. Baron JA, Barry EL, Mott LA, Rees JR, Sandler RS, Snover DC, et al. A trial of calcium and vitamin D for the prevention of colorectal adenomas. N Engl J Med. 2015;373(16):1519–30. doi:10.1056/NEJMoa1500409.

  6. Davis CE, Applegate WB, Gordon DJ, Curtis RC, McCormick M. An empirical evaluation of the placebo run-in. Control Clin Trials. 1995;16(1):41–50.

  7. Blackwelder WC, Hastings BK, Lee ML, Deloria MA. Value of a run-in period in a drug trial during pregnancy. Control Clin Trials. 1990;11(3):187–98.

  8. Newburg SM, Holland AE, Pearce LA. Motivation of subjects to participate in a research trial. Appl Nurs Res. 1992;5(2):89–93.

  9. Cassileth BR, Lusk EJ, Miller DS, Hurwitz S. Attitudes toward clinical trials among patients and the public. JAMA. 1982;248(8):968–70.

  10. Mattson ME, Curb JD, McArdle R. Participation in a clinical trial: the patients’ point of view. Control Clin Trials. 1985;6(2):156–67.

  11. Rosenbaum JR, Wells CK, Viscoli CM, Brass LM, Kernan WN, Horwitz RI. Altruism as a reason for participation in clinical trials was independently associated with adherence. J Clin Epidemiol. 2005;58(11):1109–14.

  12. Hudmon KS, Chamberlain RM, Frankowski RF. Outcomes of a placebo run-in period in a head and neck cancer chemoprevention trial. Control Clin Trials. 1997;18(3):228–40.

  13. Glynn RJ, Buring JE, Manson JE, LaMotte F, Hennekens CH. Adherence to aspirin in the prevention of myocardial infarction. The Physicians’ Health Study. Arch Int Med. 1994;154(23):2649–57.

  14. Brittain E, Wittes J. The run-in period in clinical trials. The effect of misclassification on efficiency. [Erratum appears in Control Clin Trials 1991 Jun;12(3):456]. Control Clin Trials. 1990;11(5):327–38.

  15. Franciosa JA. Commentary on the use of run-in periods in clinical trials. Am J Cardiol. 1999;83(6):942–4. A9.

  16. Steering Committee of the Physicians’ Health Study Research Group. Final report on the aspirin component of the ongoing Physicians’ Health Study. N Engl J Med. 1989;321(3):129–35.

  17. Peto R, Gray R, Collins R, Wheatley K, Hennekens C, Jamrozik K, et al. Randomised trial of prophylactic daily aspirin in British male doctors. BMJ. 1988;296(6618):313–6.

  18. Affuso O, Kaiser KA, Carson TL, Ingram KH, Schwiers M, Robertson H, et al. Association of run-in periods with weight loss in obesity randomized controlled trials. Obes Rev. 2014;15(1):68–73. doi:10.1111/obr.12111.

  19. Coronary Drug Project Research Group. Influence of adherence to treatment and response of cholesterol on mortality in the coronary drug project. N Engl J Med. 1980;303(18):1038–41.

  20. Epstein LH. The direct effects of compliance on health outcome. Health Psychol. 1984;3(4):385–93.

  21. Horwitz RI, Viscoli CM, Berkman L, Donaldson RM, Horwitz SM, Murray CJ, et al. Treatment adherence and risk of death after a myocardial infarction. Lancet. 1990;336(8714):542–5.

  22. Granger BB, Swedberg K, Ekman I, Granger CB, Olofsson B, McMurray JJ, et al. Adherence to candesartan and placebo and outcomes in chronic heart failure in the CHARM programme: double-blind, randomised, controlled clinical trial. Lancet. 2005;366(9502):2005–11.

  23. Gallagher EJ, Viscoli CM, Horwitz RI. The relationship of treatment adherence to the risk of death after myocardial infarction in women. JAMA. 1993;270(6):742–4.

  24. Martin KA, Bowen DJ, Dunbar-Jacob J, Perri MG. Who will adhere? Key issues in the study and prediction of adherence in randomized controlled trials. Control Clin Trials. 2000;21(5 Suppl):195S–9S.

  25. Dunbar-Jacob J, Erlen JA, Schlenk EA, Ryan CM, Sereika SM, Doswell WM. Adherence in chronic disease. Ann Rev Nurs Res. 2000;18:48–90.

  26. Verheggen FW, Nieman FH, Reerink E, Kok GJ. Patient satisfaction with clinical trial participation. Int J Qual Health Care. 1998;10(4):319–30.

  27. Barsky AJ, Saintfort R, Rogers MP, Borus JF. Nonspecific medication side effects and the nocebo phenomenon. JAMA. 2002;287(5):622–7.

  28. Rief W, Avorn J, Barsky AJ. Medication-attributed adverse effects in placebo groups: implications for assessment of adverse effects. Arch Int Med. 2006;166(2):155–60.

  29. Rosenzweig P, Brohier S, Zipfel A. The placebo effect in healthy volunteers: influence of experimental conditions on the adverse events profile during phase I studies. Clin Pharmacol Ther. 1993;54(5):578–83.

  30. van Onzenoort HA, Verberk WJ, Kessels AG, Kroon AA, Neef C, van der Kuy PH, et al. Assessing medication adherence simultaneously by electronic monitoring and pill count in patients with mild-to-moderate hypertension. Am J Hypertens. 2010;23(2):149–54.



We would like to thank all individuals who enrolled in the trial, the study coordinators and co-investigators, the Dartmouth project coordinating center staff, the Data and Safety Monitoring Committee, and Pfizer Consumer Healthcare for provision of the study tablets. Open access for this article was funded by King’s College London.


This study was supported by a grant from the National Institutes of Health, National Cancer Institute (CA178272, to Dr. Rees). The parent trial was supported by a grant from the National Institutes of Health, National Cancer Institute (CA098286, to Dr. Baron).

Availability of data and materials

To request the data, readers should contact the Polyp Prevention Study Group (PPSG) Consortium Steering Committee by emailing John A. Baron, Principal Investigator for the PPSG. Our study involves human participants who have not consented to data publication, and the dataset is a complex one that contains indirect identifiers.

Authors’ contributions

JRR and JLP designed the research; JRR, JAB, JLP, ELB, JCF, DJR, and RSB conducted research and/or collected data; LAM analyzed data; JRR, ELB, LAM, JCF, DJR, and JLP wrote the paper; JRR had primary responsibility for the final content. All authors read and approved the final manuscript.

Authors’ information

JLP was partly supported by the National Institute for Health Research (NIHR) Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust and King’s College London, UK. The views expressed are those of the author(s) and not necessarily those of the UK NHS, the NIHR, or the UK Department of Health.

Competing interests

JLP received consultancy payment from Dartmouth College for statistical input into the study. The study tablets used in the trial were provided by Pfizer Consumer Healthcare. The authors declare that they have no competing interests.

Ethics approval and consent to participate

All participants provided written informed consent; the research was approved by the Committee for the Protection of Human Subjects at Dartmouth College and by IRBs at each clinical center.

Author information


Corresponding author

Correspondence to Janet L. Peacock.

Additional files

Additional file 1:

Institutional Review Boards that approved the trial at each study center. (DOCX 14 kb)

Additional file 2:

CONSORT 2010 checklist of information to include when reporting a randomised trial. (DOC 213 kb)

Additional file 3:

Logistic regression models of voluntary run-in failure, excluding study center: full factorial study males. (DOCX 20 kb)

Additional file 4:

Logistic regression models of run-in failure, excluding study center: two-group randomization females. (DOCX 19 kb)

Additional file 5:

Run-in failure and measures of post-randomization adherence, by center. Successful voluntary completion (%) is defined for each center as the number of participants with an end-of-treatment colonoscopy result and mean self-reported adherence ≥50 % throughout the trial (3–5 years), divided by the number of participants medically eligible to complete the trial. The measure excludes participants who involuntarily dropped out of the study or stopped taking pills for valid medical reasons. The correlation with RIF (%) measured by Kendall’s tau-b is –0.22 (95 % CI –0.81 to 0.37; p = 0.40). (PDF 43 kb)

Additional file 6:

Run-in failure and measures of post-randomization adherence, by center. Outcome data provision rate (%) is defined for each center as the number of participants with an end-of-treatment colonoscopy result, divided by the number of participants medically eligible to complete the trial. The measure excludes participants who involuntarily dropped out of the study or stopped taking pills for valid medical reasons. The correlation with RIF (%) measured by Kendall’s tau-b is –0.27 (95 % CI –0.68 to 0.13; p = 0.28). (PDF 43 kb)

Additional file 7:

Run-in failure and measures of post-randomization adherence, by center. Average adherence (%) is defined for each center throughout the trial (3–5 years) as the self-reported number of pills taken by all participants medically eligible to take study pills throughout the trial, divided by the maximum number of pills that they should have taken. The measure excludes participants who involuntarily dropped out of the study or stopped taking pills for valid medical reasons. The correlation with RIF (%) measured by Kendall’s tau-b is –0.31 (95 % CI –0.78 to 0.16; p = 0.22). (PDF 45 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Rees, J.R., Mott, L.A., Barry, E.L. et al. Randomized controlled trials: who fails run-in?. Trials 17, 374 (2016).


  • Randomized controlled trials
  • Run-in
  • Adherence
  • Generalizability