Bounding the per-protocol effect in randomized trials: an application to colorectal cancer screening
Trials volume 16, Article number: 541 (2015)
Abstract
Background
The per-protocol effect is the effect that would have been observed in a randomized trial had everybody followed the protocol. Though obtaining a valid point estimate for the per-protocol effect requires assumptions that are unverifiable and often implausible, lower and upper bounds for the per-protocol effect may be estimated under more plausible assumptions. Strategies for obtaining bounds, known as “partial identification” methods, are especially promising in randomized trials.
Results
We estimated bounds for the per-protocol effect of colorectal cancer screening in the Norwegian Colorectal Cancer Prevention trial, a randomized trial of one-time sigmoidoscopy screening in 98,792 men and women aged 50–64 years. The screening was not available to the control arm, while approximately two thirds of individuals in the treatment arm attended the screening. Study outcomes included colorectal cancer incidence and mortality over 10 years of follow-up. Without any assumptions, the data alone provide little information about the size of the effect. Under the assumption that randomization had no effect on the outcome except through screening, a point estimate for the risk under no screening and bounds for the risk under screening are achievable. Thus, the 10-year risk difference for colorectal cancer was estimated to be at least −0.6 % but less than 37.0 %. Bounds for the risk difference for colorectal cancer mortality (–0.2 to 37.4 %) and all-cause mortality (–5.1 to 32.6 %) had similar widths. These bounds appear helpful in quantifying the maximum possible effectiveness, but cannot rule out harm. By making further assumptions about the effect in the subpopulation who would not attend screening regardless of their randomization arm, narrower bounds can be achieved.
Conclusions
Bounding the per-protocol effect under several sets of assumptions illuminates our reliance on unverifiable assumptions, highlights the range of effect sizes we are most confident in, and can sometimes demonstrate whether to expect certain subpopulations to receive more benefit or harm than others.
Trial registration
Clinicaltrials.gov identifier NCT00119912 (registered 6 July 2005)
Background
Most randomized trials report the intention-to-treat (ITT) effect as the primary, or only, measure of the comparative effect of the studied interventions. A focus on the ITT effect is attractive for several reasons [1, 2]. However, the ITT effect may not be the effect of interest for patients and clinicians when there is a high rate of noncompliance or when the rate of noncompliance in the trial differs from that expected outside the trial setting. In such circumstances, the per-protocol effect – the effect that would have been observed had all trial participants followed the trial protocol – may be of greater interest [1, 2]. Unfortunately, when patient characteristics associated with noncompliance are also related to patient outcomes, the naïve approach to estimating this effect in a “per-protocol analysis” restricted to those who follow the protocol in each arm of the trial will be biased. In these cases, identifying the per-protocol effect in a randomized trial requires strong assumptions (e.g., no unmeasured confounding) and methods that are commonly used in the analysis of nonrandomized studies [2].
An alternative to estimating the per-protocol effect under these strong assumptions is to estimate lower and upper limits or “bounds” for the per-protocol effect under weaker, but perhaps more realistic, assumptions [3–8]. While effect bounding, known as “partial identification of the effect”, has been attempted in observational studies (particularly in the social sciences), it is rarely implemented in randomized trials. This is surprising because partial identification methods can capitalize on assumptions that are expected to hold in many randomized trials.
Here we provide a guide to the use of partial identification methods in randomized trials with dichotomous outcomes and point interventions, i.e., interventions that are not sustained over time. As an example, we demonstrate the estimation of bounds for the per-protocol effect of colorectal cancer (CRC) screening on the 10-year risk of CRC incidence and death in the Norwegian Colorectal Cancer Prevention (NORCCAP) trial.
Methods
The NORCCAP trial
The design, procedures, and primary findings of the NORCCAP trial have been described elsewhere [9–11]. In brief, 98,792 residents of the City of Oslo and of Telemark County, Norway, who had no history of CRC and were aged 55–64 years in 1998 or 50–54 years in 2000 were randomly assigned to either the treatment or the control arm. Those selected for the treatment arm were invited to CRC screening, consisting of either a once-only flexible sigmoidoscopy or a combination of once-only flexible sigmoidoscopy plus immunochemical fecal occult blood testing. Individuals assigned to the control arm were not offered any intervention. All participants who attended the screening provided written informed consent, and the study was approved by the Ethics Committee of South-East Norway and the Norwegian Data Inspectorate. Primary study endpoints prespecified in the study protocol were CRC incidence and mortality. Table 1 shows the 10-year risk of these outcomes [10] by age group, randomization arm, and screening intervention received. After standardization by age group, the ITT 10-year risk differences (95 % confidence interval) were −0.2 % (−0.4 %, −0.1 %) for CRC incidence, −0.1 % (−0.1 %, 0.0 %) for CRC mortality, and −0.2 % (−0.6 %, 0.2 %) for all-cause mortality.
Several features of this trial are relevant for our purposes. First, CRC screening was not available in the trial communities for individuals not assigned to screening in the trial; thus, nobody in the control arm received it. Second, the treatment was a once-only screen (a point intervention); thus, compliance in the screening arm is all-or-nothing. Third, loss to follow-up was minimal; only 3 individuals were not followed until they experienced a study endpoint, emigration, or end of 10-year follow-up.
Overview of the analytic approach
The following sections describe the estimation of bounds for the per-protocol effect of point interventions on dichotomous outcomes under increasingly strong assumptions. We begin with no assumptions (i.e., the data alone), then assume the so-called “instrumental conditions” described below, and then combine additional assumptions with these instrumental conditions. Intuitively, the more assumptions we make, the narrower the bounds become, but of course this comes at a cost if our assumptions are ill-placed. Table 2 summarizes the assumptions.
To compute the bounds in the NORCCAP trial, we first estimated bounds within age groups (50–54 years; 55–64 years) and then standardized these bounds. We begin by estimating the bounds in the older age group only so readers can check our calculations using the expressions provided in the text and the summary data in Table 1.
Results
Bounding the per-protocol effect under no assumptions
The per-protocol risk difference can theoretically range from −100 % (the treatment universally prevents the outcome) to 100 % (the treatment universally causes the outcome). However, the study data – without any assumptions – can be used to exclude parts of the theoretical range of the per-protocol effect. To see this, consider that, for every person in the study, we observe one of their two potential or counterfactual outcomes: e.g., for those who were screened, we see their outcome under screening, but do not know what would have happened to them had they not been screened (see illustration in Additional file 1: Table S1). We can compute bounds for the per-protocol effect by imputing the most extreme scenarios for the unobserved counterfactual outcomes – e.g., for the lower bound (the largest possible benefit), imagining that everybody who was screened would have experienced the outcome had they not been screened, and that everybody who was not screened would not have experienced the outcome had they been screened; the upper bound follows from the opposite imputation. Thus, the per-protocol risk difference must lie within the bounds (using probability notation):
where X is an indicator of treatment, Y is an indicator of a dichotomous outcome of interest, and LB and UB denote the lower and upper bounds, respectively. Expressions for the risk under each level of treatment, the risk ratio, and a formal definition of the effect of interest are provided in the Additional file 1.
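The imputation argument can be written out explicitly. The expressions below are a reconstruction from that argument rather than a quotation, and the counterfactual notation Y^{x} (the outcome under treatment level x) is an assumption introduced here. The lower bound sets every unobserved outcome under screening to 0 and every unobserved outcome under no screening to 1; the upper bound does the reverse.

```latex
\begin{aligned}
\mathrm{LB} &= -\Pr[Y = 0, X = 1] - \Pr[Y = 1, X = 0], \\
\mathrm{UB} &= \Pr[Y = 1, X = 1] + \Pr[Y = 0, X = 0], \\
\mathrm{LB} &\le \Pr[Y^{x=1} = 1] - \Pr[Y^{x=0} = 1] \le \mathrm{UB},
\qquad \mathrm{UB} - \mathrm{LB} = 1 .
\end{aligned}
```

The four joint probabilities sum to 1, which is why the width of these bounds is always 100 %.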
The first block of Table 3 shows that these assumption-free bounds for the 10-year risk difference in the NORCCAP trial are quite wide: e.g., from −10.0 to 90.0 % for CRC incidence. In fact, these bounds will always cover the null and necessarily have a width of 100 % for dichotomous outcomes. In order to obtain narrower bounds for the per-protocol effect we need to combine the data with assumptions.
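As a minimal sketch of the assumption-free calculation, the function below computes the bounds from the four joint proportions of treatment X and outcome Y. The function name and the input values are hypothetical and are not NORCCAP data; the formula is the extreme-imputation argument described above.

```python
def assumption_free_rd_bounds(p_y1_x1, p_y0_x1, p_y1_x0, p_y0_x0):
    """Return (lower, upper) bounds for the per-protocol risk difference.

    Arguments are the joint proportions Pr[Y=y, X=x]; they must sum to 1.
    """
    total = p_y1_x1 + p_y0_x1 + p_y1_x0 + p_y0_x0
    if abs(total - 1.0) > 1e-9:
        raise ValueError("joint proportions must sum to 1")
    # Lower bound: all treated would have had the outcome if untreated,
    # and all untreated would have avoided it if treated.
    lower = -(p_y0_x1 + p_y1_x0)
    # Upper bound: the opposite imputation of the unobserved outcomes.
    upper = p_y1_x1 + p_y0_x0
    return lower, upper


# Hypothetical joint proportions, for illustration only.
lb, ub = assumption_free_rd_bounds(0.002, 0.128, 0.012, 0.858)
print(f"risk difference bounds: {lb:.3f} to {ub:.3f} (width {ub - lb:.3f})")
```

Whatever proportions are supplied, the width of the interval is 1, which illustrates why data alone cannot exclude the null.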
Bounding the per-protocol effect under the instrumental conditions
The three instrumental conditions are often described as follows: (1) the randomization indicator (which we will denote Z) is associated with receiving treatment X; (2) the randomization indicator Z causes the outcome Y only through treatment X; and (3) the randomization indicator Z and the outcome Y share no causes [3]. The first condition, sometimes referred to as the “relevance” condition, can be checked in the data: e.g., in the 55–64-year age group of the NORCCAP trial the risk difference:
Here, the F-statistic = 75626 and p value < 0.0001.
The third “exchangeability” condition is expected by design in randomized trials. The second condition, known as the exclusion restriction, is expected to hold in double-blinded placebo-controlled randomized trials in which double-blinding is successfully maintained and there is no placebo effect, but may not hold in other trials, like ours. For example, the exclusion restriction would be violated if the invitation letter alone prompted new awareness of CRC risk and risk factors, and subjects in the screening arm adopted preventive measures that they would not have adopted had they been in the control arm. Although we cannot prove conditions (2) and (3) hold in a given study, it is sometimes possible to find empirical evidence refuting, i.e., falsifying, them [3].
Under the instrumental conditions, we can compute the following bounds for the per-protocol risk difference:
Related bounds have also been proposed under different interpretations of condition (3) [6, 8], but the bounds presented here make use of the strongest version of this condition expected to hold in randomized trials. See Richardson and Robins [12] for further discussion, including some intuition for these complicated expressions. See Additional file 1 for bounds for the (counterfactual) absolute risks under each treatment.
The second block of Table 3 shows that the bounds for the 10-year risk difference under the instrumental conditions are quite wide in the NORCCAP trial. For example, for CRC risk, the effect may fall anywhere between –0.7 % and 34.5 %. Interestingly, we can obtain a point estimate for the risk under no screening (for CRC risk: 1.4 %) but, because there is noncompliance in the screening arm, only bounds for the risk under screening (for CRC risk: 0.7 to 35.9 %). The wide bounds for the risk under screening drive the wide bounds for the risk difference.
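The structure of these results can be sketched for the special case of one-sided noncompliance, where nobody in the control arm is treated. This is a simplification and not the general expressions: it assumes the data do not contradict the instrumental conditions, so that the risk under no screening equals the control-arm risk and the only unknown is the risk under screening among those who did not attend. The function name and inputs are hypothetical.

```python
def one_sided_iv_bounds(risk_control, p_screened, p_y1_and_screened):
    """Bounds under the instrumental conditions with one-sided noncompliance.

    risk_control      : Pr[Y=1 | Z=0], risk in the control arm
    p_screened        : Pr[X=1 | Z=1], attendance in the screening arm
    p_y1_and_screened : Pr[Y=1, X=1 | Z=1], joint proportion in the screening arm
    """
    # Nobody in the control arm is screened, so the risk under no
    # screening is point-identified by the control-arm risk.
    risk0 = risk_control
    # Non-attenders' outcomes under screening are never observed:
    # impute all 0 (lower bound) or all 1 (upper bound).
    risk1_lower = p_y1_and_screened
    risk1_upper = p_y1_and_screened + (1.0 - p_screened)
    return {
        "risk_no_screening": risk0,
        "risk_screening": (risk1_lower, risk1_upper),
        "risk_difference": (risk1_lower - risk0, risk1_upper - risk0),
    }


# Hypothetical inputs, chosen only to resemble the magnitudes in the text.
result = one_sided_iv_bounds(risk_control=0.014, p_screened=0.65,
                             p_y1_and_screened=0.007)
print(result)
```

The width of the risk-difference interval equals the proportion of non-attenders, so higher attendance directly narrows the bounds.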
Bounding the per-protocol effect within compliance types
Under the instrumental conditions, we can describe patients in the study population as belonging to one of four mutually exclusive “compliance types” or “principal strata” [13–15]:

(1) “Always-takers,” those who would have always been treated regardless of randomization
(2) “Never-takers,” those who would have always opted out of treatment regardless of randomization
(3) “Compliers,” those who would have been treated had they been randomized to receive treatment, and would not have been treated had they been randomized to the control arm
(4) “Defiers,” those who would not have been treated had they been randomized to receive treatment, but would have been treated had they been randomized to the control arm
For many trials, it may be reasonable to assume there are zero (or at least a small number of) “defiers.” If we assume there are no “defiers” then we can identify the proportion of our study population who are in each of the other compliance types.
In the NORCCAP trial, there are no “always-takers” and no “defiers” because the screening was not available to those who were randomized to the control arm. Therefore, under exchangeability of the randomization arms, 35 % of trial participants are “never-takers” (estimated by Pr[X = 0 | Z = 1]) and the other 65 % are “compliers”. For each person in the treatment arm we know whether she is a “complier” (if she did undergo screening) or a “never-taker” (if she did not). In studies that have noncompliance in both randomization arms, we will not know with certainty any given subject’s compliance type.
Richardson and Robins [5] described bounds for the counterfactual risks and treatment effects within compliance types. In the special case when we (i) know there are only “compliers” and “never-takers” and (ii) have no empirical evidence against the instrumental conditions, then the effect within the “never-takers” is bounded as follows:
Meanwhile, the effect in the “compliers” is point-identified:
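For the special case just described (only “compliers” and “never-takers”, and data compatible with the instrumental conditions), expressions consistent with the surrounding text can be sketched as follows. They are a reconstruction from the definitions of X, Y and Z given earlier, not a quotation, and the subscripts NT and C for the two compliance types are labels assumed here. The “never-takers” are the unscreened members of the screening arm, so their risk under no screening is observed and their risk under screening is unrestricted; the “complier” effect is the ITT risk difference rescaled by attendance.

```latex
\begin{aligned}
-\Pr[Y = 1 \mid X = 0, Z = 1] \;\le\; \mathrm{RD}_{\mathrm{NT}}
  \;&\le\; 1 - \Pr[Y = 1 \mid X = 0, Z = 1], \\[4pt]
\mathrm{RD}_{\mathrm{C}}
  \;&=\; \frac{\Pr[Y = 1 \mid Z = 1] - \Pr[Y = 1 \mid Z = 0]}{\Pr[X = 1 \mid Z = 1]} .
\end{aligned}
```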
Beyond the special case where (i) is expected by design, more general expressions have been described for an assumed distribution of compliance types [5]. Note that any assumed distribution needs to be feasible. For example, in studies like the NORCCAP trial with noncompliance in only one treatment arm, the only feasible proportion of “defiers” is zero. In trials with noncompliance in both treatment arms, the data may be consistent with a range in the proportion of “defiers,” and investigators may consider computing bounds for the effects within compliance types under an assumed proportion of “defiers” within that range.
The second block of Table 3 further presents bounds for the counterfactual risks and per-protocol effect within the “compliers” and the “never-takers.” (In the NORCCAP trial, there is no evidence against the instrumental conditions, so we can use the expressions described above.) For the “never-takers”, we can obtain a point estimate for the risks under no screening, but we have (by definition) no information on what would have happened to them had they been made to follow the protocol in the screening arm and, therefore, been screened. As a result, we can only achieve wide bounds for the per-protocol effect in the “never-takers”. For the “compliers”, we can obtain a point estimate for the risks under screening and no screening, and, therefore, we can also obtain a point estimate for the per-protocol effect in the “compliers” [10], often referred to as a “local” average treatment effect [15]. In the NORCCAP trial, the “compliers” are known to be those who actually received the screening, and thus the effect in the “compliers” is the effect in the screened.
The bounds in the NORCCAP trial using the instrumental conditions alone, described above, are a weighted average of the bounds in the “never-takers” and the point estimate in the “compliers”. In order to obtain narrower bounds for the per-protocol effect in the study population, we may combine the instrumental conditions with restrictions on the upper bound of the risk under screening in the “never-takers” (i.e., a counterfactual risk in 35 % of the study population for which we have no empirical information; Fig. 1) [5]. For example, we might assume that the risk under screening in the “never-takers” is actually not greater than their risk under no screening. Under this assumption, the resulting bounds do not include the null value: (−0.7 %, −0.2 %) for CRC risk, (−0.3 %, −0.1 %) for CRC mortality, and (−5.9 %, −0.2 %) for all-cause mortality. Though this assumption is plausible for CRC risk and mortality, it may not be for all-cause mortality, which could be more susceptible to unintended consequences of screening. A less stringent restriction for all-cause mortality is that at most, say, 50 % of the “never-takers” would have died had they been screened. As seen in Fig. 1, this restriction (x-axis = 0.5) would imply bounds that include the possibility of a null or positive risk difference. A sensitivity analysis can be conducted under different hypothesized risks under screening in the “never-takers”, or the full range of possibilities could be presented as we do in the insets in Fig. 1. For trials with noncompliance in both treatment arms, similar sensitivity analyses could be conducted under hypothesized risks under no treatment in the “always-takers”.
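A sensitivity analysis of this kind can be sketched as below for the one-sided noncompliance case. The sketch assumes the data are compatible with the instrumental conditions and that the only added restriction is a hypothesized cap on the risk under screening in the “never-takers”; the function name, the cap values and the inputs are hypothetical and do not reproduce Fig. 1.

```python
def rd_bounds_with_cap(risk_control, p_screened, p_y1_and_screened, cap):
    """Risk-difference bounds when never-takers' risk under screening <= cap.

    risk_control      : Pr[Y=1 | Z=0]
    p_screened        : Pr[X=1 | Z=1]
    p_y1_and_screened : Pr[Y=1, X=1 | Z=1]
    cap               : hypothesized maximum risk under screening in never-takers
    """
    if not 0.0 <= cap <= 1.0:
        raise ValueError("cap must be a probability")
    p_never_taker = 1.0 - p_screened
    # Lower bound: no never-taker would have had the outcome if screened.
    lower = p_y1_and_screened - risk_control
    # Upper bound: never-takers' risk under screening sits at the cap.
    upper = p_y1_and_screened + p_never_taker * cap - risk_control
    return lower, upper


# Hypothetical inputs; sweep the cap from very restrictive to uninformative.
for cap in (0.0, 0.02, 0.5, 1.0):
    lo, hi = rd_bounds_with_cap(0.014, 0.65, 0.007, cap)
    print(f"cap={cap:.2f}: risk difference in [{lo:+.3f}, {hi:+.3f}]")
```

Setting the cap to 1 recovers the bounds under the instrumental conditions alone, while setting it to the observed risk among unscreened members of the screening arm corresponds to assuming screening does not raise the “never-takers’” risk.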
Point identification of the per-protocol effect under the instrumental conditions plus homogeneity
The bounds can also be narrowed by combining the instrumental conditions with assumptions that restrict effect heterogeneity. In fact, when the instrumental conditions are combined with sufficiently strong homogeneity assumptions, a point estimate of the per-protocol effect can be obtained [6, 7, 16]. Assuming (i) no additive effect modification by the instrument among the treated and the untreated leads to what is often referred to as the standard instrumental variable (IV) estimator:
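For a dichotomous instrument, treatment and outcome, the standard IV estimator is commonly written as the ratio of the ITT effect on the outcome to the ITT effect on treatment received. The form below uses the symbols X, Y and Z defined earlier and is given as the usual textbook expression, not as a quotation.

```latex
\frac{\Pr[Y = 1 \mid Z = 1] - \Pr[Y = 1 \mid Z = 0]}
     {\Pr[X = 1 \mid Z = 1] - \Pr[X = 1 \mid Z = 0]}
```

When nobody in the control arm is treated, Pr[X = 1 | Z = 0] = 0 and the denominator reduces to the attendance proportion Pr[X = 1 | Z = 1].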
Assuming (ii) no multiplicative effect modification by the instrument among the treated and the untreated leads to a different estimator:
where
In the NORCCAP trial, because we have only “compliers” and “never-takers”, these assumptions could be restated as equal effects in the “compliers” and “never-takers” on the (i) additive or (ii) multiplicative scale. Effect homogeneity assumptions may be implausible in many trials, particularly if there are interactions between patients’ treatment assignment and characteristics in informing treatment choice [16].
The third and fourth blocks of Table 3 show the point estimates of the per-protocol risk difference under effect homogeneity on the additive and multiplicative scale, respectively. These assumptions will not hold in our study if, for example, family history of cancers modifies the effect of screening and patients in the screening arm with no family history of cancers are more likely to forgo screening. The possibility for such modification is also apparent when examining the baseline risks across compliance types: the risk under no screening is lower in the “compliers” than in the “never-takers”, which might indicate that the magnitude of the effects in the “compliers” and “never-takers” could be very different. If family history and other relevant patient characteristics were measured, it would be possible to relax the homogeneity assumptions (and instrumental conditions) to hold within levels of covariates and then present a point estimate of the per-protocol effect within levels of the covariates [6, 7, 16].
Bounds and point identification results for the per-protocol effect, aged 50–64 years
Thus far, we have considered bounding the counterfactual risks and the per-protocol effect among the 55–64-year age group. We repeated these computations for subjects aged 50–54 years (Additional file 1: Table S2). We then standardized by age group to obtain the estimates presented in Additional file 1: Table S3; sex-stratified results are presented in Additional file 1: Table S4. The final age-standardized bounds for the per-protocol risk difference and risk ratio estimated under each set of assumptions are shown in Figs. 2 and 3. As demonstrated above with estimating the effect among the 55–64-year age group, the age-standardized bounds for the risk difference under the instrumental conditions are relatively wide (e.g., the risk difference for CRC incidence is between −0.6 % and 37.0 %), while narrower bounds or even point identification can be achieved by making additional assumptions about the possible magnitude and direction of the effect in the “never-takers”.
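Standardizing stratum-specific bounds can be sketched as a weighted average of the lower bounds and of the upper bounds. The weights used in the analysis are not stated in the text, so the sketch assumes weights proportional to the number of participants in each age group; the function name, the bounds and the stratum sizes are all hypothetical.

```python
def standardize_bounds(stratum_bounds, stratum_sizes):
    """Combine stratum-specific (lower, upper) bounds into standardized bounds.

    stratum_bounds : list of (lower, upper) tuples, one per stratum
    stratum_sizes  : list of stratum sizes used as standardization weights
    """
    if len(stratum_bounds) != len(stratum_sizes):
        raise ValueError("need one size per stratum")
    total = sum(stratum_sizes)
    weights = [n / total for n in stratum_sizes]
    # With non-negative weights, the extremes of the weighted average are
    # reached by combining all lower bounds or all upper bounds.
    lower = sum(w * lb for w, (lb, _) in zip(weights, stratum_bounds))
    upper = sum(w * ub for w, (_, ub) in zip(weights, stratum_bounds))
    return lower, upper


# Hypothetical bounds for two age groups and hypothetical group sizes.
bounds = [(-0.007, 0.345), (-0.004, 0.400)]
sizes = [60_000, 40_000]
print(standardize_bounds(bounds, sizes))
```

Because the weights are non-negative, standardized bounds can never be narrower than the narrowest stratum-specific interval or wider than the widest.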
Discussion
We have demonstrated how combining data with various sets of assumptions helps to bound the per-protocol effect of point interventions (i.e., interventions that are not sustained over time) in randomized trials with dichotomous outcomes. In our application to a trial of CRC screening, we showed how bounds for both the per-protocol risk difference and risk ratio are achievable. Our application illustrates three key benefits of an approach based on partial identification with progressively stronger assumptions.
First, this approach illuminates our reliance on unverifiable assumptions. In our trial, the wide bounds under no assumptions make clear that we cannot learn much at all about the effectiveness of screening without bringing in prior knowledge about the study design or our subject matter.
Second, this approach provides the range of effect sizes we are most confident in under fairly reasonable assumptions. In our trial we could estimate relatively informative lower bounds that quantify the maximum benefit of screening. For example, had everybody been screened, at most we would expect CRC risk to decrease by 0.6 percentage points. This number provides a limit for how much our ITT effect estimate (−0.2 %) might underestimate the effectiveness under perfect adherence, and a boundary that could be helpful in evaluating the cost-effectiveness of screening or informing clinical or policy decisions. We know less about the upper bound (minimum effectiveness or even possible harm) of the screening program without making more debatable assumptions, but the type of analyses presented in Fig. 1 provides a template for discussing what level of assumptions may be reasonable and how much differing opinions may lead to differing conclusions.
Third, this approach can demonstrate our confidence, or lack thereof, in the effect sizes for certain subpopulations [13–15]. In our trial, the estimates support the benefit of CRC screening for nearly two thirds of the study population (the “compliers”), and in this case we can describe which individuals are included in this group. In randomized trials with noncompliance in both arms, we can only obtain a point estimate for the effect in the “compliers” if we assume there are no “defiers”. However, we would not know who the “compliers” are and membership in this group may vary across studies. Because of this, the common practice of presenting this subgroup effect alone is of questionable interest for clinical or policy decision-making [17] as there is no obvious way of applying the results of the study to that particular subgroup. When presented alongside bounds for the effect in the full study population, however, investigators may sometimes be able to discern whether certain subpopulations are likely to receive more benefit or harm than others. In trials with one-sided noncompliance, like the NORCCAP trial, such practice is sometimes actionable because we can describe the subpopulation of “compliers” based on measured pre-randomization characteristics.
Investigators considering employing these methods in randomized trials with point interventions and dichotomous outcomes should consider how features of their particular study design may affect which of the sets of assumptions described in Table 2 are reasonable. The instrumental conditions are expected to hold in placebo-controlled, double-blinded randomized trials of point interventions where there is no loss to follow-up, no placebo effect, and double-blinding is successfully maintained, but the instrumental conditions are suspect in head-to-head randomized trials and whenever double-blinding is not successfully maintained or there is a possible placebo effect. The homogeneity conditions, on the other hand, are not expected to hold based on any study design feature and thus should be weighed judiciously when applied to the analysis of any randomized trial. A similar caveat applies to conditions about the distribution of or effects within compliance types when there is noncompliance in both treatment arms [5].
Our discussion of bounding the per-protocol effect focused on dichotomous outcomes and point interventions. Similar bounds under the instrumental conditions can be identified for continuous outcomes if one assumes the outcomes are finitely bounded [8], and the point-identification expressions under effect homogeneity conditions can also be restated to apply to continuous outcomes [6, 7, 16]. Because we can choose to estimate cumulative risk up through any point in time in follow-up, we could also extend these bounds to bounding the survival curve for time-to-event outcomes [18]. Partial identification strategies can also be applied to trials with substantial attrition by further incorporating methods to account for selection bias, e.g., inverse probability weighting [19]. In trials that involve an intervention sustained over time, accounting for nonadherence can be more complicated as participants may discontinue the intervention at different times during follow-up and time-varying patient characteristics may inform and be affected by these decisions. More research is needed on how to generalize partial identification strategies to such settings, although the point-identification results can be expanded upon using structural nested models under related homogeneity and instrumental conditions [7, 20]. Finally, our example and discussion have focused on identification, but there is a growing body of literature on how to incorporate random variability [21]. Specifically, there has been recent development in methods for estimating confidence intervals around the bounds [22–26] as well as estimating confidence intervals for the partially identified treatment effect itself [27, 28]. Incorporating random variability into the presentation of partial identification results in randomized trials is critical; however, more research is needed as there is currently no consensus in the statistical literature on – or readily available software for – the optimal approach.
The per-protocol effect is often of greater interest than, or complementary with, the ITT effect [1, 2]. In trials like the NORCCAP trial with essentially no loss to follow-up, we can easily compute an unbiased estimate for the ITT effect. However, the ITT effect quantifies the effect of assignment to treatment. From a patient’s perspective, deciding whether or not to take treatment requires knowledge about the effect of the treatment when received as intended rather than the effect of merely being assigned to treatment [1, 2]. Further, the ITT effect is study-specific because it depends on the magnitude and type of observed adherence to the intervention among study participants. That the per-protocol effect is independent of the observed adherence makes it interesting from a societal perspective too. For example, were the screening made available in the future to the Norwegian population, the actual adherence to the intervention could be different from that observed in the trial (not least because the trial itself contributed to establishing the efficacy of screening). As a result, the ITT effect from the trial would be outdated as a tool for decision-making, e.g., for cost-effectiveness analyses. On the other hand, unbiased estimates for the per-protocol effect, while potentially more relevant for decision-making, are not achievable from the data alone: investigators need to combine the data with assumptions based on the study design and subject matter expertise. Historically, this has deterred many investigators from estimating the per-protocol effect as expert knowledge is, by definition, provisional and fallible.
Conclusion
As we have demonstrated using data from the NORCCAP trial, bounding the per-protocol effect under several sets of assumptions provides investigators with a middle ground between presenting a single value for the per-protocol effect based on sometimes heroic assumptions and avoiding estimating the per-protocol effect altogether. This middle ground shifts the scientific debate to what assumptions are most plausible and, therefore, to what range of effect sizes we are most confident in.
Abbreviations
CRC: colorectal cancer
ITT: intention-to-treat
IV: instrumental variable
NORCCAP trial: Norwegian Colorectal Cancer Prevention trial
References
1. Hernán MA, Hernandez-Diaz S, Robins JM. Randomized trials analyzed as observational studies. Ann Intern Med. 2013;159(8):560–2. doi:10.7326/0003-4819-159-8-201310150-00709.
2. Hernán MA, Hernandez-Diaz S. Beyond the intention-to-treat in comparative effectiveness research. Clin Trials (London, England). 2012;9(1):48–55. doi:10.1177/1740774511420743.
3. Balke A, Pearl J. Bounds on treatment effects for studies with imperfect compliance. J Am Stat Assoc. 1997;92(439):1171–6.
4. Hernán MA, Hernandez-Diaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol. 2002;155(2):176–84.
5. Richardson T, Robins JM. Analysis of the binary instrumental variable model. In: Dechter R, Geffner H, Halpern JY, editors. Heuristics, probability, and causality: a tribute to Judea Pearl. London: College Publications; 2010. p. 415–44.
6. Robins JM. The analysis of randomized and nonrandomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In: Sechrest L, Freeman H, Mulley A, editors. Health service research methodology: a focus on AIDS. Washington, DC: US Public Health Service; 1989. p. 113–59.
7. Robins JM. Correcting for noncompliance in randomized trials using structural nested mean models. Commun Stat. 1994;23:2379–412.
8. Manski CF. Nonparametric bounds on treatment effects. Am Econ Rev. 1990;80(2):319–23.
9. Bretthauer M, Gondal G, Larsen K, Carlsen E, Eide TJ, Grotmol T, et al. Design, organization and management of a controlled population screening study for detection of colorectal neoplasia: attendance rates in the NORCCAP study (Norwegian Colorectal Cancer Prevention). Scand J Gastroenterol. 2002;37(5):568–73.
10. Holme O, Loberg M, Kalager M, Bretthauer M, Hernán MA, Aas E, et al. Colorectal cancer incidence and mortality after flexible sigmoidoscopy screening – First population-based randomized trial. JAMA. 2014;312(6):606–15.
11. Hoff G, Grotmol T, Skovlund E, Bretthauer M. Norwegian Colorectal Cancer Prevention Study G. Risk of colorectal cancer seven years after flexible sigmoidoscopy screening: randomised controlled trial. BMJ. 2009;338:b1846. doi:10.1136/bmj.b1846.
12. Richardson T, Robins JM. ACE bounds; SEMs with equilibrium conditions. Stat Sci. 2014;29(3):363–6.
13. Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics. 2002;58(1):21–9.
14. VanderWeele TJ. Principal stratification – uses and limitations. Int J Biostat. 2011;7(1):1–14.
15. Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. J Am Stat Assoc. 1996;91(434):444–55.
16. Hernán MA, Robins JM. Instruments for causal inference: an epidemiologist’s dream? Epidemiology (Cambridge, Mass). 2006;17(4):360–72. doi:10.1097/01.ede.0000222409.00878.37.
17. Swanson SA, Hernán MA. Think globally, act globally: an epidemiologist’s perspective on instrumental variable estimation. Stat Sci. 2014;29(3):371–4.
18. Pearl J. Imperfect experiments: bounding effects and counterfactuals. Causality. New York City: Cambridge University Press; 2009. p. 259–81.
19. Little RJ, D’Agostino R, Cohen ML, Dickersin K, Emerson SS, Farrar JT, et al. The prevention and treatment of missing data in clinical trials. New Engl J Med. 2012;367(14):1355–60.
20. Robins JM. Structural nested failure time models. In: Anderson PK, Keiding N, editors. The encyclopedia of biostatistics. Chichester, UK: Wiley; 1998. p. 4372–89.
21. Tamer E. Partial identification in econometrics. Annu Rev Econ. 2010;2(1):167–95.
22. Chernozhukov V, Hong H, Tamer E. Estimation and confidence regions for parameter sets in econometric models. Econometrica. 2007;75(5):1243–84.
23. Horowitz JL, Manski CF. Nonparametric analysis of randomized experiments with missing covariate and outcome data. J Am Stat Assoc. 2000;95(449):77–84.
24. Manski CF, Sandefur GD, McLanahan S, Powers D. Alternative estimates of the effect of family structure during adolescence on high school graduation. J Am Stat Assoc. 1992;87(417):25–37.
25. Ramsahai R, Lauritzen S. Likelihood analysis of the binary instrumental variable model. Biometrika. 2011;98(4):987–94.
26. Romano JP, Shaikh AM. Inference for identifiable parameters in partially identified econometric models. J Stat Plann Infer. 2008;138(9):2786–807.
27. Imbens GW, Manski CF. Confidence intervals for partially identified parameters. Econometrica. 2004;72(6):1845–57.
28. Vansteelandt S, Goetghebeur E, Kenward MG, Molenberghs G. Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Stat Sin. 2006;16(3):953–79.
Acknowledgments
This research was partly funded by National Institutes of Health grants R01 AI102634 and P01 CA134294.
Additional information
Competing interests
The authors report no competing interests.
Authors’ contributions
SS, ØH, ML, MK, MB, GH, EA, and MH contributed to the conception of the current study. SS performed the data analyses and drafted the manuscript. ØH, ML, MK, MB, GH, EA, and MH contributed to the interpretation of the data and provided critical revisions. All authors read and approved the final manuscript.
Additional file
Additional file 1:
Norwegian Colorectal Cancer Prevention trial instrumental variable (NORCCAP IV) bounds trials submission supplement. Supplemental tables and an appendix describing the derivations for bounds not presented in the main text. (PDF 132 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Swanson, S.A., Holme, Ø., Løberg, M. et al. Bounding the per-protocol effect in randomized trials: an application to colorectal cancer screening. Trials 16, 541 (2015). https://doi.org/10.1186/s13063-015-1056-8
Keywords
 Instrumental variable
 Partial identification
 Per-protocol effect
 Colorectal cancer
 Screening