### Objectives

As in any data analysis, the first consideration is the objective of the analysis. In the presence of dropouts, two types of question can be asked: (i) What would the treatment effect be without dropouts? and (ii) What would the treatment effect be in the presence of dropouts? Question (i) concerns an ideal situation and is also known as the 'question for explanatory trials' [6]. It often relates to the human pharmacological properties of new drugs under investigation rather than to their practical use.

For question (ii), we need to distinguish two further situations: patients drop out either (a) from the study altogether, with no data collected after withdrawal, or (b) only from the assigned study treatment, with data still being collected. In situation (b) there are no missing data. If trials can be designed so that patients are followed until the end of the study regardless of their compliance, then (ii) is a very practical question, also known as the 'question for pragmatic trials' [7]. Prevention studies with all-cause mortality as the primary endpoint usually follow this design, but other endpoints may also be followed up (until death) in the same way. A recent example is [8], in which all participants, including those who discontinued treatment (lovastatin or placebo), were contacted annually for vital status, cardiovascular events, and cancer history. Since no missing data would occur, design (b) is highly recommended for all trials whenever it is feasible. In fact, the ITT principle was originally intended to answer question (ii) with type (b) dropouts, where no missing data would occur.

More often than not, however, we face studies in which patients have withdrawn from the study entirely and so created a missing data problem, i.e., type (a), as Examples 1–5 above (with the exception of Example 3) have demonstrated. Unless the patient's clinical status does not permit further testing after the study treatment is discontinued, the type (a) dropout problem is a common design flaw and should be corrected. Nevertheless, the lack of follow-up data remains prevalent in clinical trials. For clinical trials conducted for drug registration, it is possible that, in light of the International Conference on Harmonization (ICH) E9 guideline [9], the data analyses have to address both questions (i) and (ii).

### Imputation methods

The analyses illustrated in Table 2 belong to the general category of imputation. The basic idea of imputation is to fill in the missing data with values derived from a model and its assumptions. Single imputation methods fill in one value for each missing value; multiple imputation methods instead replace each missing value with a set of plausible values that represent the uncertainty about the right value to impute. The attraction of imputation is that once the missing data are filled in (imputed), all the statistical tools available for complete data may be applied. Each of methods (a), (b) and (c) in Table 2 is a simple single imputation method, but together they may be viewed as a 'multiple simple imputation' method (as opposed to the 'proper multiple imputation' method discussed below). The data in Table 2 had only one time-point (Month 6) for analysis.

For longitudinal data with multiple time-points, the conventional last-observation-carried-forward (LOCF) approach is another commonly used simple imputation method. This approach was used by the authors in Examples 3 and 5. In an attempt to follow the ITT principle of accounting for all randomized subjects, the LOCF method includes every randomized subject who has at least one post-therapy observation. LOCF is popular among practitioners because it is simple to implement and because of a misconception that it is conservative (meaning that it works against an effective treatment group). However, every imputation method implicitly or explicitly assumes a model for the missing data. LOCF assumes (unrealistically) that the missing data after a patient's withdrawal are the same as the last value observed for that patient. As a consequence, the imputed data carry no within-subject variability, and the sample size is altered.
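The mechanics of LOCF can be illustrated with a minimal Python sketch. The patient identifiers, visit schedule, and blood pressure values below are hypothetical and chosen only for illustration; `None` marks a visit that was missed after withdrawal.

```python
from typing import Dict, List, Optional, Sequence


def locf(visits: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Carry the last observed value forward over later missing visits.

    Visits that are missing before any observation exists stay missing.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in visits:
        if value is not None:
            last = value  # a new observation replaces the carried value
        filled.append(last)
    return filled


# Hypothetical readings at four scheduled visits (illustrative values only).
patients: Dict[str, List[Optional[float]]] = {
    "P01": [98.0, 94.0, 91.0, 89.0],   # completer
    "P02": [101.0, 97.0, None, None],  # withdrew after visit 2
    "P03": [99.0, None, None, None],   # withdrew after visit 1
}

imputed = {pid: locf(values) for pid, values in patients.items()}
```

In this sketch every visit after withdrawal repeats the same number, which is exactly the stability assumption described above: the imputed stretch of each dropout's record has no within-subject variability.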

Proper multiple imputation (PMI) methods, described in [10] and [11], use regression models to create more than one imputed data set and thus provide variability within and between imputations. The PMI method has long been a preferred approach in survey research. Its popularity in clinical trials has grown recently, since the method became automated in commercial computer software [12, 13]. However, clinical trial practitioners should think carefully about the complexity of the regression models used in PMI, because the method assumes that the missing data process can be fully captured by the regression model fitted to the observed values. This assumption is called missing at random (MAR). MAR essentially says that the cause of the missing data may depend on observed data (such as data from previous visits) but must be independent of the missing value that would have been observed. It is a less restrictive model than missing completely at random (MCAR), which says that the missingness cannot depend on either the observed or the missing data. The design suggested by Murray and Findlay [14], which forced dropouts upon observing uncontrolled BP, uses the MAR principle. When the MAR or MCAR conditions are met, model-based analyses can appropriately be performed on the observed data alone, without further modeling of the missing data process.
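To make the idea of variability within and between imputations concrete, the following Python sketch imputes a missing Month 6 value from an earlier visit and pools the results with Rubin's rules. It is an illustration under stated assumptions, not the procedure of [10], [11] or of the software in [12, 13]: the data are hypothetical, the imputation model is a single-predictor linear regression, and parameter uncertainty is approximated by bootstrapping the completers rather than by drawing from a posterior distribution. All function and variable names are invented for this example.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a + b*x; returns (a, b, residual SD)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, (rss / (len(x) - 2)) ** 0.5


def multiply_impute(x: Sequence[float], y: Sequence[Optional[float]],
                    m: int = 20, seed: int = 0) -> Tuple[float, float]:
    """Return the pooled mean of y and its total variance over m imputations."""
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    estimates: List[float] = []
    variances: List[float] = []
    for _ in range(m):
        # Bootstrap the completers so each imputation uses different
        # regression parameters (an approximation to a posterior draw).
        boot = [rng.choice(observed) for _ in observed]
        while len({px for px, _ in boot}) < 2:  # avoid a degenerate fit
            boot = [rng.choice(observed) for _ in observed]
        a, b, sd = fit_line([p[0] for p in boot], [p[1] for p in boot])
        # Fill each missing value with a prediction plus residual noise.
        completed = [yi if yi is not None else a + b * xi + rng.gauss(0.0, sd)
                     for xi, yi in zip(x, y)]
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    # Rubin's rules: total variance = within + (1 + 1/m) * between.
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    return statistics.fmean(estimates), within + (1 + 1 / m) * between


# Hypothetical data: an early visit is always observed; Month 6 is missing
# (None) for three dropouts.
week4 = [90.0, 92.0, 95.0, 97.0, 99.0, 101.0, 103.0, 105.0, 96.0, 100.0]
month6 = [88.0, 89.0, 93.0, 94.0, 97.0, None, 100.0, None, 95.0, None]

pooled_mean, total_var = multiply_impute(week4, month6, m=20, seed=1)
```

Because the imputation model conditions only on the observed earlier visit, the sketch is valid only under MAR, as discussed above. The between-imputation component of the pooled variance is what a single imputation such as LOCF leaves out.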

Another imputation method, which lies between LOCF and PMI, is the partial imputation (PI) or improved LOCF method [15]. The idea is quite simple. In LOCF, one imputes every missing visit by carrying the last observation forward until the end of the study. Since LOCF requires the strong assumption of stability, the more it imputes, the more bias it introduces when that assumption does not hold. PI does not always carry observations forward to the final time-point of the study, but only far enough to balance the dropout patterns between the treatment groups. The underlying principle is that when the dropout patterns are made almost identical between the treatment groups, the relative comparison of the treatment effects will be less biased. Because the stability assumption usually does not hold and PI imputes less, PI is less biased than LOCF. Some simulation results under various missing data processes demonstrated the potential usefulness of PI over LOCF and over methods that use all available data [15]. However, more experience is still needed to test this new method in practice.