 Research
 Open Access
 Open Peer Review
A Bayesian prediction model between a biomarker and the clinical endpoint for dichotomous variables
Trials volume 15, Article number: 500 (2014)
Abstract
Background
Early biomarkers are helpful for predicting clinical endpoints and for evaluating efficacy in clinical trials, even if the biomarker cannot replace the clinical outcome as a surrogate. Building and evaluating an association model between biomarkers and clinical outcomes are two equally important concerns in predicting clinical outcomes. This paper addresses both issues in a Bayesian framework.
Methods
A Bayesian meta-analytic approach is proposed to build a prediction model between the biomarker and the clinical endpoint for dichotomous variables. Compared with other Bayesian methods, the proposed model requires only trial-level summary data from historical trials for model building. Using extensive simulations, we evaluate the link function and the conditions of application of the proposed Bayesian model under two scenarios: (i) equal positive predictive value (PPV) and negative predictive value (NPV) and (ii) higher NPV and lower PPV. In the simulations, patient-level data are generated to evaluate the meta-analytic model, and PPV and NPV describe the patient-level relationship between the biomarker and the clinical outcome. The minimum number of historical trials to include in model building is also considered.
Results
The simulations show that the logit link function performs better than the odds and cloglog functions under both scenarios. For accurate and precise prediction of the clinical outcome with the proposed model, we propose the conditions PPV = NPV ≥ 0.5 when PPV and NPV are equal, and PPV + NPV ≥ 1 when NPV is higher and PPV is lower. At least 20 historical trials should be included in model building when PPV and NPV are equal; for unequal PPV and NPV, the proposed minimum number of historical trials is five. A hypothetical example shows an application of the proposed model in global drug development.
Conclusions
The proposed Bayesian model predicts the clinical endpoint well from observed biomarker data for dichotomous variables, provided the above conditions are satisfied. It could be applied in drug development, but the practical problems of its application require further research.
Background
Biomarkers used as surrogate endpoints to evaluate efficacy are attractive in clinical trials: they can shorten trial duration and reduce costs [1, 2], improve compliance [3] and bring ethical benefits [4]. For example, the United States Food and Drug Administration (FDA) has granted accelerated marketing approval based on a surrogate endpoint, rather than the primary endpoint, for life-threatening diseases [5, 6]. Controversy has surrounded the evaluation of a biomarker as a surrogate [7–20] since Prentice [7] formulated a statistical framework in the context of hypothesis testing. The evaluation of a biomarker involves a range of considerations [21–23], experience with such evaluation is limited, and very few biomarkers have been accepted as surrogate endpoints in practice [21]. Nevertheless, as an intermediate endpoint, a biomarker can still help predict the effect of an intervention on the clinical endpoint even if it cannot directly replace the true clinical endpoint as a surrogate. In drug development, predicting the true clinical endpoint from an early biomarker supports early efficacy evaluation and decision making. The key question is how to build and evaluate the prediction model between the biomarker and the clinical endpoint.
A meta-analytic approach was first considered by Fleming [11] and Hughes et al. [12] to validate a surrogate endpoint from a collection of previous trials. Several approaches have modeled the association between a biomarker and a clinical endpoint based on trial-level summary data in a frequentist framework [16–19], but few have made such evaluations within a Bayesian framework. The main Bayesian meta-analytic method is that of Daniels and Hughes [13], who considered a mixed-effects model for the trial-level association between the treatment effects on the biomarker and on the clinical outcome. However, their method requires patient-level data to estimate the correlation between the biomarker and the clinical endpoint during model building [24], so both patient-level data and trial-level summary data are necessary. Furthermore, Van Walraven et al. [24] implemented a Monte Carlo model, and Wang et al. [25] employed a classical Cox model to predict clinical outcomes from biomarkers.
The evaluation of the model between the biomarker and the clinical endpoint is another key concern. The model must describe the association between the biomarker and the clinical outcome well and predict the clinical endpoint from the observed biomarker accurately and precisely in a new trial. Buyse et al. [14] proposed the coefficient of determination $R^2_{\text{trial}}$ to evaluate the trial-level association and predictive ability. However, $R^2_{\text{trial}}$ is difficult to interpret [20], and it does not clearly define which model is appropriate for prediction. In addition, although the accessibility of trial-level summary data from historical trials is one advantage of meta-analytic approaches, using such data loses information compared with patient-level data. Simulation studies that generate trial-level data directly from a specific assumption [17] therefore ignore this information loss and may overestimate the predictive ability of a meta-analytic model. It is more objective and reasonable to evaluate the meta-analytic model by simulating the patient-level data of previous trials.
In this paper, we consider a Bayesian meta-analytic method for building a prediction model between the biomarker and the clinical endpoint when both endpoints are dichotomous; model building is completely independent of patient-level data. The model is used to predict the rate ratio (RR) of the clinical endpoint from an early biomarker. In addition, we evaluate the proposed prediction model through extensive simulations based on practical clinical considerations. Patient-level data are simulated to evaluate the meta-analytic prediction model, which avoids making assumptions directly on trial-level data. Positive predictive value (PPV) and negative predictive value (NPV), which measure the patient-level association between a dichotomous biomarker and a dichotomous clinical endpoint, are employed in the simulations.
From the clinical point of view, a good biomarker is expected to have high PPV and NPV for predicting long-term clinical outcomes (for example, in the early detection of a disease in clinical diagnosis, or the early observed recurrence of a tumor in oncology studies). However, biomarkers with high NPV and low PPV are also common in medical studies. For example, persistent infection with human papillomavirus (HPV) is considered a potential biomarker of high-grade cervical disease and, eventually, cervical cancer: although persistent HPV infection does not always progress to high-grade cervical disease, a person without persistent HPV infection has, given the underlying medical mechanism, only a small probability of developing the disease. In HPV vaccine clinical trials, persistent HPV infection, as a biomarker with high NPV and low PPV, can help predict vaccine efficacy. Therefore, both scenarios, (i) equal PPV and NPV and (ii) higher NPV and lower PPV, are considered when evaluating the proposed model in the simulations.
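PPV and NPV, on which both scenarios are built, are conditional proportions computed from patient-level data: PPV is the probability of a clinical response given a biomarker response, and NPV is the probability of no clinical response given no biomarker response. A minimal Python sketch, assuming 0/1-coded patient-level vectors; the function name `ppv_npv` and the ten-patient data are hypothetical and not from the paper:

```python
from typing import Sequence, Tuple


def ppv_npv(d: Sequence[int], c: Sequence[int]) -> Tuple[float, float]:
    """Return (PPV, NPV) from paired 0/1 biomarker (d) and clinical (c) indicators.

    PPV = P(C = 1 | D = 1) and NPV = P(C = 0 | D = 0).
    """
    if len(d) != len(c):
        raise ValueError("d and c must have the same length")
    pos = [ci for di, ci in zip(d, c) if di == 1]  # clinical results of biomarker responders
    neg = [ci for di, ci in zip(d, c) if di == 0]  # clinical results of non-responders
    ppv = sum(1 for ci in pos if ci == 1) / len(pos)
    npv = sum(1 for ci in neg if ci == 0) / len(neg)
    return ppv, npv


# Ten hypothetical patients: 4 biomarker responders, 6 non-responders.
d = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
c = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(ppv_npv(d, c))  # (0.75, 0.8333333333333334)
```

Because PPV conditions on the biomarker being positive and NPV on it being negative, the two can move independently, which is what allows the "higher NPV, lower PPV" scenario.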
The proposed prediction model between the biomarker and the clinical outcome is built in a Bayesian framework, which distinguishes it from the frequentist meta-analytic approaches [14–19, 26]. It describes the association between the biomarker and the clinical endpoint intuitively and is easy to implement with Markov chain Monte Carlo (MCMC) techniques. Compared with other Bayesian methods, the proposed model also has distinctive features. The Bayesian mixed model of Daniels and Hughes [13] is not a fully meta-analytic method, because it also requires patient-level data, so the advantages of the meta-analytic approach cannot be realized; the proposed Bayesian model has no such restriction. Furthermore, the method of Daniels and Hughes ignores the variability of the within-trial treatment effect [10], which is included in the proposed Bayesian model.
Methods
Bayesian model for prediction
Consider $N$ randomized trials of size $n_i$ ($i = 1, 2, \ldots, N$), each with equal sample sizes in the treatment and control groups. In the $i$th trial, $B_{Ti}$ and $B_{Ci}$ biomarker responses are observed in the treatment and control groups, respectively, and $X_{Ti}$ and $X_{Ci}$ subjects respond on the clinical outcome in the two groups. Let $\phi_{Bi}$ be the proportion of biomarker responses that occur in the treatment group of the $i$th trial; it is estimated by $\hat{\phi}_{Bi} = B_{Ti}/(B_{Ti} + B_{Ci})$. Correspondingly, $\hat{\phi}_{Xi} = X_{Ti}/(X_{Ti} + X_{Ci})$ estimates $\phi_{Xi}$, the proportion of clinical outcome responses that occur in the treatment group. Assuming that the association between the biomarker and the clinical endpoint is the same across the $N$ trials irrespective of the intervention, the generalized linear model
$$g(\phi_{Xi}) = \beta_0 + \beta\, g(\phi_{Bi}) \qquad (1)$$
is proposed to describe the relationship, where $g(\cdot)$ is a link function. Both the biomarker and the clinical endpoint are transformed by the link function because both are dichotomous. Three link functions are considered in this paper, the odds function
$$g(\phi) = \frac{\phi}{1-\phi}, \qquad (2)$$
the logit function
$$g(\phi) = \log\frac{\phi}{1-\phi}, \qquad (3)$$
and the cloglog function
$$g(\phi) = \log\{-\log(1-\phi)\}, \qquad (4)$$
and they are compared in the simulations.
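A minimal Python sketch of the three link functions and their inverses, assuming the linear form $g(\phi_X) = \beta_0 + \beta\, g(\phi_B)$ for the model; the helper names `LINKS` and `phi_x_from_phi_b` are hypothetical. Inverting the link is what turns a value of $\phi_B$ into a value of $\phi_X$ once $\beta_0$ and $\beta$ are known:

```python
import math

# Link functions g(phi) for a proportion 0 < phi < 1, each paired with its inverse.


def odds(phi):
    return phi / (1.0 - phi)


def odds_inv(eta):
    # The odds scale is (0, inf), so the linear predictor must be positive.
    if eta <= 0:
        raise ValueError("odds link needs a positive linear predictor")
    return eta / (1.0 + eta)


def logit(phi):
    return math.log(phi / (1.0 - phi))


def logit_inv(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def cloglog(phi):
    return math.log(-math.log(1.0 - phi))


def cloglog_inv(eta):
    return 1.0 - math.exp(-math.exp(eta))


LINKS = {
    "odds": (odds, odds_inv),
    "logit": (logit, logit_inv),
    "cloglog": (cloglog, cloglog_inv),
}


def phi_x_from_phi_b(phi_b, beta0, beta, link="logit"):
    """Solve g(phi_X) = beta0 + beta * g(phi_B) for phi_X."""
    g, g_inv = LINKS[link]
    return g_inv(beta0 + beta * g(phi_b))
```

With $\beta_0 = 0$ and $\beta = 1$ every link returns $\phi_X = \phi_B$. The odds link yields a valid proportion only when the linear predictor is positive, a constraint the logit and cloglog links do not have.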
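Given $B_i$, $B_{Ti}$, $X_i$ and $X_{Ti}$, the treatment-group counts are modeled conditionally on the totals as
$$B_{Ti} \sim \mathrm{Binomial}(B_i, \phi_{Bi}) \qquad (5)$$
and
$$X_{Ti} \sim \mathrm{Binomial}(X_i, \phi_{Xi}), \qquad (6)$$
where $B_i = B_{Ti} + B_{Ci}$ and $X_i = X_{Ti} + X_{Ci}$.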
In the Bayesian model, we use a uniform prior for the coefficient $\beta$ and a normal prior with mean zero and variance $\sigma^2$ for the intercept $\beta_0$, and $\phi_{Bi}$ takes a $\mathrm{Beta}(a_{Bi}, b_{Bi})$ prior. Noninformative priors are used for all three parameters, $\beta$, $\beta_0$ and $\phi_{Bi}$. Their posterior distributions depend on the link function in the model. Given $B_i$, $B_{Ti}$, $X_i$, $X_{Ti}$ and a specific link function, the posterior distributions of $\beta$, $\beta_0$ and $\phi_{Bi}$ are estimated by MCMC based on formulas (1), (5) and (6). Incorporating formulas (5) and (6) in the model accounts for the uncertainty in the within-trial treatment effect.
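The authors fit the model in OpenBUGS. To illustrate what the MCMC step does, the following is a minimal random-walk Metropolis-within-Gibbs sketch in Python with NumPy, not the authors' code. It assumes the logit link, the binomial forms $B_{Ti} \sim \mathrm{Binomial}(B_i, \phi_{Bi})$ and $X_{Ti} \sim \mathrm{Binomial}(X_i, \phi_{Xi})$ for formulas (5) and (6), a $\mathrm{Beta}(1, 1)$ prior for $\phi_{Bi}$, a flat prior for $\beta$ and an $N(0, \sigma^2)$ prior with $\sigma = 100$ for $\beta_0$. The function names, step size and iteration counts are hypothetical choices:

```python
import numpy as np


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return -np.logaddexp(0.0, -x)


def trial_log_post(theta, beta0, beta, b_t, b_tot, x_t, x_tot):
    """Per-trial log posterior terms, with theta = logit(phi_B) and a logit link.

    Assumes B_T ~ Binomial(B, phi_B), X_T ~ Binomial(X, phi_X),
    logit(phi_X) = beta0 + beta * logit(phi_B) and a Beta(1, 1) prior on phi_B.
    """
    eta = beta0 + beta * theta  # logit(phi_X)
    ll_b = b_t * log_sigmoid(theta) + (b_tot - b_t) * log_sigmoid(-theta)
    ll_x = x_t * log_sigmoid(eta) + (x_tot - x_t) * log_sigmoid(-eta)
    # Uniform prior on phi_B rewritten on the logit scale: Jacobian phi_B * (1 - phi_B).
    log_jac = log_sigmoid(theta) + log_sigmoid(-theta)
    return ll_b + ll_x + log_jac


def sample_posterior(b_t, b_tot, x_t, x_tot, n_iter=4000, burn=1000,
                     sigma=100.0, step=0.1, seed=0):
    """Random-walk Metropolis-within-Gibbs draws of (beta0, beta) after burn-in."""
    rng = np.random.default_rng(seed)
    data = tuple(np.asarray(a, dtype=float) for a in (b_t, b_tot, x_t, x_tot))
    theta = np.log((data[0] + 0.5) / (data[1] - data[0] + 0.5))  # smoothed empirical logits
    beta0, beta = 0.0, 1.0
    draws = np.empty((n_iter - burn, 2))
    for it in range(n_iter):
        # 1) phi_B of every trial: given (beta0, beta) the trials are conditionally
        #    independent, so each component gets its own accept/reject step.
        prop = theta + step * rng.standard_normal(theta.shape)
        log_r = (trial_log_post(prop, beta0, beta, *data)
                 - trial_log_post(theta, beta0, beta, *data))
        theta = np.where(np.log(rng.random(theta.shape)) < log_r, prop, theta)

        # 2) (beta0, beta) jointly: flat prior on beta, N(0, sigma^2) prior on beta0.
        b0_new = beta0 + step * rng.standard_normal()
        b1_new = beta + step * rng.standard_normal()
        cur = trial_log_post(theta, beta0, beta, *data).sum() - beta0 ** 2 / (2 * sigma ** 2)
        new = trial_log_post(theta, b0_new, b1_new, *data).sum() - b0_new ** 2 / (2 * sigma ** 2)
        if np.log(rng.random()) < new - cur:
            beta0, beta = b0_new, b1_new

        if it >= burn:
            draws[it - burn] = (beta0, beta)
    return draws
```

As a sanity check, noise-free hypothetical data generated with $\beta_0 = 0$ and $\beta = 1$ (so that $\phi_X = \phi_B$) should give draws of $\beta$ centered near 1. A real analysis would also need convergence diagnostics and tuning of `step`.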
For a new trial $j$, the rate ratio of the treatment and control groups on the clinical endpoint,
$$RR_j = \frac{\pi_{XTj}}{\pi_{XCj}} = \frac{\phi_{Xj}}{1-\phi_{Xj}}, \qquad (7)$$
where $\pi_{XTj}$ and $\pi_{XCj}$ are the clinical response rates of the treatment and control groups (the second equality uses the equal group sizes), is used to evaluate the efficacy of the intervention. The association between the biomarker and the clinical endpoint is assumed to be the same in the new trial as in the historical trials; that is, the biomarker in the new trial captures the treatment effect in the same way as in the historical trials. Based on the Bayesian model built from $N$ historical trials, the MCMC estimate of the predictive distribution of $\phi_{Xj}$ is obtained for a given $\phi_{Bj}$, and the predictive distribution of $RR_j$ is then derived from formula (7). The median of the predictive distribution is taken as the point estimate of $RR_j$, and the 2.5th and 97.5th percentiles form the 95% credible interval (CI). The flow chart of Bayesian model building and prediction is depicted in Figure 1.
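The prediction step can be sketched as follows, assuming the logit link, posterior draws of $(\beta_0, \beta)$ from some MCMC run, and a fixed observed $\phi_{Bj}$ as in the text; `predict_rr` is a hypothetical name. With the logit link, $\phi/(1-\phi) = \exp\{\mathrm{logit}(\phi)\}$, so each draw maps directly to $RR_j = \exp\{\beta_0 + \beta\, \mathrm{logit}(\phi_{Bj})\}$:

```python
import numpy as np


def predict_rr(draws, phi_bj):
    """Median and 95% credible interval of RR_j from posterior draws of (beta0, beta).

    Assumes the logit link, so that
    RR_j = phi_Xj / (1 - phi_Xj) = exp(beta0 + beta * logit(phi_Bj)).
    """
    draws = np.asarray(draws, dtype=float)
    beta0, beta = draws[:, 0], draws[:, 1]
    logit_b = np.log(phi_bj / (1.0 - phi_bj))
    rr = np.exp(beta0 + beta * logit_b)  # one predicted RR_j per posterior draw
    median, lower, upper = np.percentile(rr, [50.0, 2.5, 97.5])
    return median, (lower, upper)
```

If every draw equals $(0, 1)$, the prediction simply carries the biomarker rate ratio $\phi_{Bj}/(1-\phi_{Bj})$ over to the clinical endpoint (about 0.25 for $\phi_{Bj} = 0.2$). More generally, under the logit link the model is linear in log rate ratios, $\log RR_X = \beta_0 + \beta \log RR_B$, which may help when interpreting $\beta_0$ and $\beta$.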
The predictive ability of the proposed model depends on the strength of the association between the biomarker and the clinical outcome. PPV and NPV are generally used to measure this association at the patient level, and they directly affect the predictive ability of the proposed Bayesian model. Before using the proposed approach to build the Bayesian model and predict the clinical outcome of a new trial, one should therefore first assess the strength of the association between the biomarker and the clinical endpoint, as measured by PPV and NPV. A simulation study is conducted to explore the conditions under which the proposed method can be applied.
Simulation study
A simulation study is conducted to (a) compare the predictive ability of the different link functions (the odds, logit and cloglog functions) in the proposed Bayesian model as PPV and NPV vary; (b) explore the effect of the strength of the association between the biomarker and the clinical endpoint, as measured by PPV and NPV, on the predictive ability of the model; and (c) determine the number of historical trials needed in model building for good clinical prediction in a new trial. As mentioned above, biomarkers with (i) equal PPV and NPV and (ii) higher NPV and lower PPV are both common in medical studies, so both scenarios are considered in the simulations. Each simulation is repeated 5,000 times; R is used for data generation and calculation, and OpenBUGS for Bayesian model fitting.
Data generation process
In the $N$ historical randomized trials, the biomarker response rate of the control group is assumed to be 0.3, and, for simplicity, 400 patients complete each trial, allocated 1:1 to the treatment and control groups.
Let $\pi_{BTi}$ and $\pi_{BCi}$ be the biomarker response rates of the treatment and control groups in the $i$th trial. With equal group sizes, $\phi_{Bi} = \pi_{BTi}/(\pi_{BTi} + \pi_{BCi})$, so $\pi_{BTi} = \frac{\phi_{Bi}}{1-\phi_{Bi}}\pi_{BCi}$. Because $0 < \pi_{BTi} < 1$, $\phi_{Bi}$ must satisfy $0 < \phi_{Bi} < 1/(1 + \pi_{BCi}) \approx 0.77$ when $\pi_{BCi} = 0.3$. We therefore draw $\phi_{Bi}$ from the uniform distribution $U(0, 0.76)$ and derive the biomarker response rate of the treatment group, $\pi_{BTi}$, with $\pi_{BCi} = 0.3$ in the simulations.
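The bound on $\phi_{Bi}$ follows from rearranging $\pi_{BTi} = \frac{\phi_{Bi}}{1-\phi_{Bi}}\pi_{BCi} < 1$, which gives $\phi_{Bi} < 1/(1 + \pi_{BCi}) \approx 0.769$ for $\pi_{BCi} = 0.3$; the upper limit 0.76 used in the simulations sits just inside it. A short Python check, with hypothetical function names and an arbitrary seed:

```python
import numpy as np


def pi_bt_from_phi(phi_b, pi_bc=0.3):
    """Treatment-group biomarker rate implied by phi_B (equal group sizes)."""
    return phi_b / (1.0 - phi_b) * pi_bc


def phi_b_upper(pi_bc=0.3):
    """Largest phi_B keeping pi_BT below 1: phi_B / (1 - phi_B) < 1 / pi_BC."""
    return 1.0 / (1.0 + pi_bc)


print(round(phi_b_upper(0.3), 4))  # 0.7692

rng = np.random.default_rng(1)
phi_b = rng.uniform(0.0, 0.76, size=10)  # phi_B ~ U(0, 0.76), as in the simulations
pi_bt = pi_bt_from_phi(phi_b)            # every value stays inside (0, 1)
```

At the simulation limit $\phi_{Bi} = 0.76$, the implied $\pi_{BTi}$ is $0.76/0.24 \times 0.3 = 0.95$.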
Let $D_{pi}$ be the biomarker response indicator of the $p$th patient in the $i$th trial, with $D_{pi} = 1$ if the biomarker responds and $D_{pi} = 0$ otherwise, so that $B_i = \sum_{p=1}^{n_i} D_{pi}$. $D_{pi}$ is randomly generated from $\mathrm{Bernoulli}(\pi_{BTi})$ in the treatment group and $\mathrm{Bernoulli}(\pi_{BCi})$ in the control group.
Let $C_{pi}$ denote the clinical response indicator of the $p$th patient in the $i$th trial, so that $X_i = \sum_{p=1}^{n_i} C_{pi}$, and let $\rho$ be the concordance indicator between $C_{pi}$ and $D_{pi}$. $\rho$ is drawn from $\mathrm{Bernoulli}(\mathrm{PPV})$ when $D_{pi} = 1$ and from $\mathrm{Bernoulli}(\mathrm{NPV})$ when $D_{pi} = 0$. The patient-level clinical endpoint is then derived as
$$C_{pi} = \rho D_{pi} + (1-\rho)(1-D_{pi}),$$
so that $C_{pi} = D_{pi}$ when the two are concordant and $C_{pi} = 1 - D_{pi}$ otherwise.
Finally, the summary statistics of each historical trial, $B_i$, $B_{Ti}$, $X_i$ and $X_{Ti}$, are derived from the patient-level data and used to build the Bayesian model with formulas (1), (5) and (6).
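The data-generation steps for one historical trial can be sketched in Python with NumPy. This assumes 400 patients split 1:1, $\pi_{BC} = 0.3$, and the concordance rule described above ($C = D$ when $\rho = 1$, $C = 1 - D$ when $\rho = 0$); `simulate_trial` and the dictionary keys are hypothetical names:

```python
import numpy as np


def simulate_trial(phi_b, ppv, npv, n=400, pi_bc=0.3, rng=None):
    """Simulate one historical trial at the patient level and return its summary counts.

    Biomarker D ~ Bernoulli(pi_BT or pi_BC); concordance rho ~ Bernoulli(PPV)
    if D = 1 and Bernoulli(NPV) if D = 0; clinical C = D if rho = 1, else 1 - D.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_arm = n // 2
    pi_bt = phi_b / (1.0 - phi_b) * pi_bc

    def clinical(d):
        rho = np.where(d, rng.random(d.size) < ppv, rng.random(d.size) < npv)
        return np.where(rho, d, ~d)  # concordant -> same as D, otherwise flipped

    d_t = rng.random(n_arm) < pi_bt  # biomarker responses, treatment group
    d_c = rng.random(n_arm) < pi_bc  # biomarker responses, control group
    c_t, c_c = clinical(d_t), clinical(d_c)
    b_t, x_t = int(d_t.sum()), int(c_t.sum())
    return {"B_T": b_t, "B": b_t + int(d_c.sum()), "X_T": x_t, "X": x_t + int(c_c.sum())}
```

Repeating this for $N$ trials, each with $\phi_{Bi}$ drawn from $U(0, 0.76)$, yields the trial-level inputs $(B_i, B_{Ti}, X_i, X_{Ti})$ for model building. With PPV = NPV = 1 the clinical counts reproduce the biomarker counts exactly.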
The calculation of true values
For a new trial $j$ with a sample size of 400, given PPV, NPV and $\pi_{BTj}$, the true clinical response rate of the treatment group is
$$\pi_{XTj} = \mathrm{PPV}\cdot\pi_{BTj} + (1-\mathrm{NPV})(1-\pi_{BTj}).$$
The true clinical response rate of the control group, $\pi_{XCj}$, is calculated in the same way from $\pi_{BCj}$, and the true value of $RR_j$ is $RR_j = \pi_{XTj}/\pi_{XCj}$.
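By the law of total probability over the biomarker status, the true clinical rate in each group mixes PPV and $1 - \mathrm{NPV}$. A minimal sketch (the name `true_rr` is hypothetical; $\pi_{BC} = 0.3$ as in the simulations):

```python
def true_rr(pi_bt, ppv, npv, pi_bc=0.3):
    """True clinical RR_j from the biomarker rates of both groups, PPV and NPV."""
    pi_xt = ppv * pi_bt + (1.0 - npv) * (1.0 - pi_bt)  # treatment group
    pi_xc = ppv * pi_bc + (1.0 - npv) * (1.0 - pi_bc)  # control group
    return pi_xt / pi_xc


# phi_Bj = 0.3 with pi_BC = 0.3 gives pi_BT = 0.09 / 0.7; with PPV = NPV = 0.9:
print(round(true_rr(0.09 / 0.7, 0.9, 0.9), 3))  # 0.597, versus a biomarker RR of about 0.429
```

Rewriting the formula as $\pi_X = (1 - \mathrm{NPV}) + (\mathrm{PPV} + \mathrm{NPV} - 1)\,\pi_B$ shows that the clinical rate depends on the biomarker rate only through the slope $\mathrm{PPV} + \mathrm{NPV} - 1$: at $\mathrm{PPV} + \mathrm{NPV} = 1$ the true RR is 1 whatever the biomarker effect. This offers one way to read the PPV + NPV ≥ 1 condition reported in the Results, which reduces to PPV/NPV ≥ 0.5 when the two are equal.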
Measures of predictive ability
The accuracy and robustness of the RR prediction are central to the predictive ability of the proposed Bayesian model. However, traditional measures such as bias and root mean square error (RMSE) cannot be applied directly here because the endpoint is binary. A log transformation is used to adjust for the non-normality of the binary endpoint, and a modified bias and a modified RMSE are proposed to measure the accuracy of the RR prediction. For the new trial $j$, given $K$ predicted values of $RR_j$, they are calculated by using
and
where $RR_j^{True}$ denotes the true value of $RR_j$ and $RR_{kj}^{P}$ is the $k$th predicted value of $RR_j$. Compared with the modified bias, in which errors of opposite sign can cancel, the modified RMSE measures the prediction error regardless of its direction. The closer the modified bias and the modified RMSE are to 0, the more accurate the RR prediction.
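The exact formulas for the modified bias and modified RMSE are not reproduced here. A plausible reading of the description, stated as an assumption rather than the paper's definition, is the mean and the root mean square of the log-scale errors $\log RR_{kj}^{P} - \log RR_j^{True}$. Under that assumption (the function name is hypothetical):

```python
import numpy as np


def modified_bias_rmse(rr_pred, rr_true):
    """Bias and RMSE of K predicted RR values on the log scale (assumed definition)."""
    err = np.log(np.asarray(rr_pred, dtype=float)) - np.log(rr_true)
    return float(err.mean()), float(np.sqrt(np.mean(err ** 2)))


# Two predictions off by a factor of 2 in opposite directions:
bias, rmse = modified_bias_rmse([2.0, 0.5], 1.0)
print(round(bias, 6), round(rmse, 4))  # 0.0 0.6931
```

The example shows why both measures are reported: symmetric over- and under-prediction cancel in the bias but not in the RMSE.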
To measure the precision of the RR prediction, the average width of the 95% CIs is calculated by using
where $RR_{kjU}^{P}$ and $RR_{kjL}^{P}$ are the upper and lower bounds of the 95% CI of the $k$th prediction of $RR_j$, respectively. The closer the average width of the 95% CIs is to 1, the better the precision of the RR prediction.
Results
Simulation I: comparison of Bayesian model with different link functions and positive predictive values/negative predictive values
Three link functions (the odds, logit and cloglog functions) are compared in this section. $\phi_{Bj}$ is set to 0.1, 0.3 and 0.5. Ten historical trials are generated randomly at the patient level, and their trial-level summary data are used to build the Bayesian model. The modified bias and modified RMSE are calculated to measure the accuracy of the RR prediction, and the average width of the 95% CIs is estimated to evaluate its precision. Both scenarios, (i) equal PPV and NPV and (ii) higher NPV and lower PPV, are considered.
Scenario I: equal PPV and NPV
In the simulations, PPV/NPV (the common value of PPV and NPV) takes the values 0.1, 0.25, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95 and 0.99. Figure 2 shows how the modified bias, the modified RMSE and the average width of the 95% CIs change with $\phi_{Bj}$ and the link function. Whichever link function is used, both the modified bias and the modified RMSE deteriorate when PPV/NPV < 0.5. For the logit link function, the modified RMSE cannot be estimated when PPV/NPV < 0.5 and $\phi_{Bj} \le 0.3$ because the predictions are poor; the other two link functions have the same problem when PPV/NPV = 0.1 and $\phi_{Bj} = 0.1$. The average width of the 95% CIs is also poor when PPV/NPV < 0.5, especially when $\phi_{Bj} < 0.5$. Therefore, PPV/NPV ≥ 0.5 is a necessary condition for RR prediction with the proposed Bayesian model under the assumption of equal PPV and NPV.
Among the three link functions, the logit function gives the modified bias and modified RMSE closest to zero when PPV/NPV ≥ 0.5, and the cloglog link function performs better than the odds function. For the logit link function, the modified bias and modified RMSE are slightly inflated at PPV/NPV = 0.8 and approach zero as PPV/NPV increases further. With the logit link function and PPV/NPV ≥ 0.5, the prediction remains accurate as $\phi_{Bj}$ varies. Regarding precision, the logit link function has the narrowest average width of the 95% CIs, which increases and approaches those of the other two link functions as PPV/NPV increases. As $\phi_{Bj}$ increases, the treatment effect becomes smaller and the average width of the 95% CIs of the logit link function decreases. In terms of both the accuracy and the precision of the RR prediction, the logit link function is the first choice for building the Bayesian model. The detailed simulation results are presented in Additional file 1: Table S1.
Scenario II: unequal positive predictive value and negative predictive value
In this scenario, NPV is set to 0.99, 0.95 and 0.9, and PPV does not exceed 0.5; the values of PPV and NPV are listed in Additional file 1: Table S2. As Figure 3 shows, the logit and cloglog link functions achieve better prediction accuracy, in terms of modified bias and modified RMSE, when PPV + NPV ≥ 1, whereas the odds link function leads to inaccurate predictions, especially when $\phi_{Bj} = 0.1$. As $\phi_{Bj}$ increases, the modified bias of the cloglog link function rises and deviates from zero when $\phi_{Bj} \ge 0.3$, although its modified RMSE still looks acceptable. In contrast, the modified bias of the logit link function fluctuates around zero regardless of $\phi_{Bj}$.
In addition, for the same PPV and NPV with PPV + NPV ≥ 1, the average width of the 95% CIs of the logit link function is slightly narrower than that of the cloglog link function. Considering both accuracy and precision, the logit link function is also the best choice for building the Bayesian model when PPV + NPV ≥ 1 under the assumption of higher NPV and lower PPV. Furthermore, the average width of the 95% CIs of the logit link function is slightly larger when $\phi_{Bj} = 0.1$ and becomes smaller as $\phi_{Bj}$ increases. For fixed PPV, the logit function has a narrower average width of the 95% CIs when NPV is smaller.
Simulation II: the number of historical trials to be included for model building
The number of historical trials included in model building is a key issue for meta-analytic approaches. In this section, Bayesian prediction models are built from 5, 10, 20, 30 and 50 historical trials, and their predictive abilities are compared under (i) equal PPV and NPV and (ii) higher NPV and lower PPV. Based on the simulation results of the previous section, the logit link function is used to build the Bayesian model. The observed $\phi_{Bj}$ of the new trial is again set to 0.1, 0.3 and 0.5.
Scenario I: equal positive predictive value and negative predictive value
Based on the results of the previous section, only PPV/NPV ≥ 0.5 is considered here; the values of PPV/NPV are listed in Additional file 1: Table S3. As Figure 4 shows, the RR prediction is slightly underestimated when $\phi_{Bj} = 0.1$. The modified bias increases with $\phi_{Bj}$ and even becomes positive, but it stays within (−0.08, 0.08) for all values of $\phi_{Bj}$. The modified RMSE is smallest when 50 trials are included in model building for $\phi_{Bj} = 0.1$. As $\phi_{Bj}$ increases, however, a larger N does not always give a more accurate prediction; for example, the model built from five trials has the smallest modified RMSE when $\phi_{Bj} = 0.5$. A possible explanation is that there is no treatment effect when $\phi_{Bj} = 0.5$, and including more historical trials introduces more variation into the model, which reduces prediction accuracy. Regarding the precision of the RR prediction, the average width of the 95% CIs is smaller when N is larger and as $\phi_{Bj}$ increases. The detailed simulation results are presented in Additional file 1: Table S3.
When N = 20, the maximum modified RMSE is only about 0.1, and the average width of the 95% CIs exceeds 0.6 even when $\phi_{Bj} = 0.1$. Therefore, a minimum of 20 historical trials is proposed for accurate and precise prediction when PPV and NPV are equal.
Scenario II: unequal positive predictive value and negative predictive value
In the simulations, NPV is set to 0.99, 0.95 and 0.9, and PPV ≤ 0.5. As long as PPV + NPV ≥ 1, the prediction is accurate in terms of both modified bias and modified RMSE, and both converge to zero as N and $\phi_{Bj}$ increase. The larger PPV + NPV is, the more accurate the prediction. Even when N = 5, the modified RMSE is below 0.4 for $\phi_{Bj} = 0.1$. Therefore, five historical trials are enough to build a model that predicts accurately.
In addition, a larger N leads to a smaller average width of the 95% CIs, as shown in Figure 5. For fixed PPV and $\phi_{Bj}$, the average width of the 95% CIs rises with NPV. When N = 5 and $\phi_{Bj} = 0.1$, the minimum average width of the 95% CIs is only about 0.2; the 95% CIs of the RR prediction are then wide, and the Bayesian model gives a conservative interval estimate. In conclusion, under higher NPV and lower PPV, a Bayesian model built from five historical trials is enough for an accurate point estimate of the prediction but gives a conservative interval estimate. If a precise interval estimate of the RR prediction is needed, N = 30 is proposed, for which the average width of the 95% CIs exceeds 0.5 even when $\phi_{Bj} = 0.1$.
Hypothetical example
To evaluate the efficacy of a vaccine in a clinical trial, the incidence rate of the disease of interest is usually observed. Vaccine efficacy (VE) is estimated as $VE = 1 - \pi_T/\pi_C = 1 - RR$, where $\pi_T$ and $\pi_C$ denote the incidence rates of the disease in the vaccine and control groups. However, observing disease occurrence requires a long follow-up time. Virus infection is considered an early biomarker of the disease, with an NPV of 99.99% and a PPV of 15% for predicting disease occurrence. Because VE is simply a transformation of RR, the proposed approach makes it possible to predict vaccine efficacy earlier, once infection cases are observed.
Suppose that five global placebo-controlled vaccine trials with a 1:1 allocation ratio have been completed in the United States, Europe and Japan. The numbers of disease cases and infection cases in the vaccine and placebo groups of each trial are listed in Table 1. Recently, a new placebo-controlled trial was conducted in China to evaluate VE in the Chinese population. A total of 3606 subjects were recruited into two groups of equal size. Thirty-six infection cases were observed, 7 of them in the vaccine group. No disease cases were observed, and the aim was to evaluate vaccine efficacy earlier based on the infection cases.
It is assumed that the association between infection and disease occurrence does not differ by ethnicity, even if vaccine efficacy differs between the Chinese and other populations. The proposed approach can then be applied to build the Bayesian model from the historical vaccine trials and to predict VE from the infection cases observed in the Chinese population. Because PPV + NPV = 0.15 + 0.9999 = 1.1499 ≥ 1, the application condition for higher NPV and lower PPV is satisfied, and, according to the simulation results in the previous section, a Bayesian model built from five historical vaccine trials in the United States, Europe and Japan gives an accurate point estimate and a conservative interval estimate of the VE prediction. With the logit link function, the predicted VE on disease in the Chinese population is 81.84% with a 95% CI of (53.98%, 97.54%). Consequently, the vaccine is considered to have significant efficacy in the Chinese population, because the lower bound of the conservative 95% CI is still above zero.
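The arithmetic around this example can be made explicit. The sketch below only uses the counts from the new trial and the VE prediction reported above; it does not refit the model, and the variable names are hypothetical. It computes the observed $\phi_{Bj}$ fed into the prediction and converts the reported VE interval to the RR scale, where the bounds swap because $VE = 1 - RR$:

```python
# Observed infection counts in the hypothetical Chinese trial (two equal arms).
b_t, b_total = 7, 36
b_c = b_total - b_t                     # 29 infections in the placebo group

phi_bj = b_t / (b_t + b_c)              # observed phi_Bj for the new trial
rr_infection = phi_bj / (1 - phi_bj)    # equals 7/29 because the arms are equal
ve_infection = 1 - rr_infection         # observed VE on infection

# Reported VE prediction on disease; VE = 1 - RR, so the bounds swap on the RR scale.
ve_point, ve_lower, ve_upper = 0.8184, 0.5398, 0.9754
rr_point = 1 - ve_point
rr_lower, rr_upper = 1 - ve_upper, 1 - ve_lower
significant = rr_upper < 1              # same statement as ve_lower > 0

print(f"phi_Bj = {phi_bj:.3f}, VE on infection = {ve_infection:.1%}")
# phi_Bj = 0.194, VE on infection = 75.9%
print(f"RR = {rr_point:.3f} ({rr_lower:.3f}, {rr_upper:.3f}), significant: {significant}")
# RR = 0.182 (0.025, 0.460), significant: True
```

The significance statement in the text is therefore equivalent to the upper credible bound of the predicted clinical RR lying below 1.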
Discussion
The proposed Bayesian prediction model between a biomarker and the clinical outcome is a meta-analytic approach. It focuses on predicting the clinical outcome from the biomarker, not on replacing the clinical outcome with the biomarker as a surrogate. In line with the Prentice criterion [7], the proposed Bayesian model requires that the association between the biomarker and the clinical endpoint be the same in the two groups. Compared with the Daniels and Hughes method, the proposed Bayesian model is a fully meta-analytic approach that requires only accessible trial-level summary data from previous trials; no patient-level data are necessary. Nikolakopoulos et al. [27] also used a Bayesian method to predict the clinical endpoint from an observed biomarker for phase II decision making, but the two approaches differ in important ways. First, clinical outcome prediction requires a known model between the biomarker and the clinical endpoint, and [27] did not consider how to build that model from previous studies; this is part of our work, which covers both model building from historical trials and prediction for a new trial. Second, as a meta-analytic approach, our model uses only trial-level summary data from historical studies for model building and predicts the clinical endpoint from trial-level summary biomarker data in the new trial, so patient-level data from the new trial are not needed. PPV and NPV are not involved in model building or prediction; they are needed only for model evaluation. Third, the prediction model itself differs: we propose a generalized linear model to describe the association between the biomarker and the clinical endpoint, place priors only on the model parameters $\beta$, $\beta_0$ and $\phi_{Bi}$ during model building, and place no prior on the clinical endpoint in the prediction.
Finally, this article discusses the scenario of unequal PPV and NPV between the biomarker and the clinical endpoint, which is common in vaccine clinical trials. In addition, the proposed Bayesian model describes the relationship between the biomarker and the clinical endpoint regardless of the treatment effect, so the historical trials used for model building do not need to share the same treatment and control groups. Observational studies can also be used for model building as long as both the biomarker and the clinical endpoint are collected. This further enlarges the pool of usable historical studies and makes the model easier to apply in practice.
In the proposed Bayesian model, $\phi_{Bi}$ and $\phi_{Xi}$ measure the treatment difference in the biomarker and in the clinical outcome between the two groups, and they are connected through formula (1), because this paper aims to predict the rate ratio of a binary clinical outcome. $\phi_{Bi}$ and $\phi_{Xi}$, which are commonly used in vaccine clinical trials, are directly connected with RR, as shown in formula (7), and do not depend on trial size, so the association between the biomarker and the clinical endpoint can be described well in terms of them. When the two groups have unequal sample sizes, $\phi_{Bi}$ is still defined as $\phi_{Bi} = \pi_{BTi}/(\pi_{BTi} + \pi_{BCi})$ but is estimated by $\hat{\phi}_{Bi} = B_{Ti}/(B_{Ti} + B_{Ci} \times R)$, where $R$ is the ratio of the treatment-group to the control-group sample size. Likewise, $\phi_{Xi}$ is estimated by $\hat{\phi}_{Xi} = X_{Ti}/(X_{Ti} + X_{Ci} \times R)$. The proposed Bayesian model therefore remains applicable when the two groups differ in size.
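Why the factor $R$ is needed can be checked numerically. A minimal sketch with hypothetical function names and a hypothetical 2:1 trial: with $R = n_T/n_C$, $\hat{\phi}/(1 - \hat{\phi}) = (B_T/n_T)/(B_C/n_C)$, the observed rate ratio, while omitting $R$ compares raw counts and hides the effect:

```python
def phi_hat(events_t, events_c, ratio=1.0):
    """Estimate phi = pi_T / (pi_T + pi_C) from event counts; ratio R = n_T / n_C."""
    return events_t / (events_t + events_c * ratio)


def rr_from_phi(phi):
    """Rate ratio pi_T / pi_C implied by phi, as in formula (7)."""
    return phi / (1.0 - phi)


# Hypothetical 2:1 trial: 30/300 events on treatment (10%) and 30/150 on control (20%).
print(rr_from_phi(phi_hat(30, 30, ratio=2.0)))  # about 0.5, the true rate ratio 0.10/0.20
print(rr_from_phi(phi_hat(30, 30)))             # 1.0: ignoring R hides the treatment effect
```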
Although multiple meta-analytic models between biomarkers and clinical outcomes have been proposed, few of them evaluated the model broadly or gave explicit guidance for application. The direct assumption on trial-level data in Baker's simulation study [17] may have influenced the evaluation: the association model may be overestimated because the information loss from using trial-level data was ignored. $R^2_{\text{trial}}$, introduced by Buyse et al. [14], is difficult to interpret, which makes it hard to define the conditions of application clearly. To overcome both problems, we make assumptions on the patient-level data of historical trials and perform simulations to evaluate the meta-analytic model.
Both scenarios, (i) equal PPV and NPV and (ii) higher NPV and lower PPV, are considered in the simulations for practical reasons. In clinical diagnosis, high PPV and NPV are usually required for a biomarker to detect a potential disease early [27, 28], but biomarkers with high NPV and low PPV are also common in vaccine studies. According to the simulation results, the proposed Bayesian model predicts the clinical RR well from a biomarker when PPV and NPV are both at least 0.5. For higher NPV and lower PPV, the model gives a meaningful prediction when PPV + NPV ≥ 1. In an actual trial, the exact values of NPV and PPV between the biomarker and the clinical outcome are generally unknown, but clinicians can estimate a plausible range for PPV and NPV from the medical mechanism and clinical experience and judge whether the model is applicable. Furthermore, in terms of the accuracy and precision of the RR prediction, the logit link function performs better than the other functions in model building. For a specific trial, however, extensive trial-specific simulation is recommended to choose the link function that best meets the needs of the trial.
Regarding the minimum number of historical trials to include in model building, 20 historical trials are advised as sufficient to build a model that predicts the clinical RR accurately and precisely when PPV and NPV are equal. When NPV is higher and PPV is lower, a model built from five historical trials yields an accurate point estimate of the prediction but a conservative interval estimate; if a more precise prediction is required, a larger N is needed. However, if very low biomarker and clinical response rates are expected in all historical trials, more historical trials should be included in model building so that the association between the biomarker and the clinical outcome is described accurately and the clinical outcome in the new trial is predicted well from the biomarker. A trial-specific simulation is also recommended to determine the minimum number of historical trials.
The proposed Bayesian model for predicting the clinical outcome from a biomarker is applicable not only to binary endpoints but also to continuous variables, provided a suitable link function is specified. The proposed method has broad potential applications in drug development, and the example shows one application in global drug development. Provided that the association between the biomarker and the clinical endpoint can be considered the same across ethnic groups on medical grounds, the proposed Bayesian model makes it possible to 'bridge' the relationship from historical global studies to a new regional trial and then to predict the clinical endpoint in the new region. This differs from a traditional bridging study, which bridges the treatment effect from historical trials, and offers another way to improve the efficiency of a regional trial. In addition, early prediction of the clinical outcome with a Bayesian model built from historical studies can support go/no-go decisions in new drug development. However, several practical problems, for example, sample size estimation for a regional trial involving a biomarker and go/no-go decision rules based on a biomarker, will be encountered when applying the proposed Bayesian model. They are beyond the scope of this article and will be discussed in further studies.
Conclusions
A Bayesian prediction model between a biomarker and the clinical outcome is proposed in this paper. It is a fully meta-analytic approach and requires only trial-level data for model building. It predicts the clinical outcome well from an observed biomarker when PPV/NPV ≥0.5 for equal PPV and NPV, and when PPV + NPV ≥1 for higher NPV and lower PPV. The logit link function is preferred in both scenarios. The minimum number of historical trials to include in model building is proposed to be 20 when PPV and NPV are considered equal. For higher NPV and lower PPV, a Bayesian model built from five historical trials can yield an accurate point estimate but a conservative interval estimate of the prediction. The proposed model has potential applications in decision making for new drug development and in global-regional drug development programs. The practical problems involved will be discussed in further studies.
Abbreviations
NPV: negative predictive value
PPV: positive predictive value
RMSE: root mean square error
RR: rate ratio
VE: vaccine efficacy
HPV: human papillomavirus
MCMC: Markov chain Monte Carlo
CI: credible interval
References
1. Boissel JP, Collet JP, Moleur P, Haugh M: Surrogate endpoints: a basis for a rational approach. Eur J Clin Pharmacol. 1992, 43: 235-244. 10.1007/BF02333016.
2. De Gruttola V, Clax P, DeMets DL, Downing GJ, Ellenberg SS, Friedman L, Gail MH, Prentice R, Wittes J, Zeger SL: Considerations in the evaluation of surrogate endpoint in clinical trials: summary of a National Institutes of Health workgroup. Control Clin Trials. 2001, 22: 485-502. 10.1016/S0197-2456(01)00153-2.
3. Chuang-Stein C, DeMasi R: Surrogate endpoints in AIDS drug development: current status. Drug Inform J. 1998, 32: 439-448.
4. Biomarkers Definitions Working Group: Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Ther. 2001, 69: 89-95.
5. US Department of Health and Human Services: FDA: The Nation’s Premier Consumer Health Protection Agency. 2004, Washington DC: US Food and Drug Administration
6. US Department of Health and Human Services: Guidance For Industry: Fast Track Drug Development Programs – Designation, Development, And Application Review. 2004, Washington DC: US Food and Drug Administration
7. Prentice RL: Surrogate endpoints in clinical trials: definitions and operational bias. Stat Med. 1989, 8: 431-440. 10.1002/sim.4780080407.
8. Freedman LS, Graubard BI, Schatzkin A: Statistical validation of intermediate endpoints for chronic diseases. Stat Med. 1992, 11: 167-178. 10.1002/sim.4780110204.
9. Weir CJ, Walley RJ: Statistical evaluation of biomarker as surrogate endpoints: a literature review. Stat Med. 2006, 25: 183-203. 10.1002/sim.2319.
10. Shi Q, Sargent DJ: Meta-analysis for the evaluation of surrogate endpoints in cancer clinical trials. Int J Clin Oncol. 2009, 14: 102-111. 10.1007/s10147-009-0885-4.
11. Fleming TR: Surrogate endpoints in clinical trials: definition and operational criteria. Stat Med. 1994, 8: 431-440.
12. Hughes MD, DeGruttola V, Welles SL: Evaluating surrogate markers. J Acquir Immune Defic Syndr Hum Retrovirol. 1995, 10: S1-S8. 10.1097/00042560-199510001-00001.
13. Daniels MJ, Hughes MD: Meta-analysis for the evaluation of potential surrogate markers. Stat Med. 1997, 16: 1865-1982.
14. Buyse M, Molenberghs G, Burzykowski T, Renard D, Geys H: The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics. 2000, 1: 49-67. 10.1093/biostatistics/1.1.49.
15. Gail MH, Pfeiffer R, Van Houwelingen HC, Carroll RJ: On meta-analytic assessment of surrogate outcomes. Biostatistics. 2000, 1: 231-246. 10.1093/biostatistics/1.3.231.
16. Korn EL, Albert PS, McShane LM: Assessing surrogates as trial endpoints using mixed models. Stat Med. 2005, 24: 163-182. 10.1002/sim.1779.
17. Baker SG: A simple meta-analytic approach for using a binary surrogate endpoint to predict the effect of intervention on true endpoint. Biostatistics. 2006, 7: 58-70.
18. Baker SG: Two simple approaches for validating a binary surrogate endpoint using data from multiple trials. Stat Methods Med Res. 2008, 17: 505-514. 10.1177/0962280207081861.
19. Baker SG, Sargent DJ, Buyse M, Burzykowski T: Predicting treatment effect from surrogate endpoints and historical trials: an extrapolation involving probabilities of a binary outcome or survival to a specific time. Biometrics. 2012, 68: 248-257. 10.1111/j.1541-0420.2011.01646.x.
20. Burzykowski T, Buyse M: Surrogate threshold effect: an alternative measure for meta-analytic surrogate endpoint validation. Pharm Stat. 2006, 5: 173-186. 10.1002/pst.207.
21. Rolan P: The contribution of clinical pharmacology surrogates and models to drug development – a critical appraisal. Br J Clin Pharmacol. 1997, 44: 219-225.
22. International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use: Statistical principles for clinical trials. Stat Med. 1999, 18: 1905-1942.
23. Lassere MN, Johnson KR, Boers M, Tugwell P, Brooks P, Simon L, Strand V, Conaghan PG, Ostergaard M, Maksymowych WP, Landewe R, Bresnihan B, Tak PP, Wakefield R, Mease P, Bingham CO, Hughes M, Altman D, Buyse M, Galbraith S, Wells G: Definitions and validation criteria for biomarkers and surrogate endpoints: development and testing of a quantitative hierarchical levels of evidence schema. J Rheumatol. 2007, 34: 607-615.
24. Van Walraven C, Oake N, Coyle D, Taljaard M, Forster AJ: Changes in surrogate outcomes can be translated into clinical outcomes using a Monte Carlo model. J Clin Epidemiol. 2009, 62: 1306-1315. 10.1016/j.jclinepi.2009.01.015.
25. Wang Y, Sung C, Dartois C, Ramchandani R, Booth BP, Rock E, Gobburu J: Elucidation of relationship between tumor size and survival in non-small-cell lung cancer patients can aid early decision making in clinical drug development. Clin Pharmacol Ther. 2009, 86: 167-174. 10.1038/clpt.2009.64.
26. Van Houwelingen HC, Arends LR, Stijnen T: Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med. 2002, 21: 589-624. 10.1002/sim.1040.
27. Nikolakopoulos S, van der Wal WM, Roes K: An analytical approach to assess the predictive value of biomarkers in phase II decision making. J Biopharm Stat. 2013, 23: 1106-1123. 10.1080/10543406.2013.814377.
28. Thoresen M, Hellstrom-Westas L, Liu X, de Vries LS: Effect of hypothermia on amplitude-integrated electroencephalogram in infants with asphyxia. Pediatrics. 2010, 126: 131-139. 10.1542/peds.2009-2938.
Acknowledgements
This work was partially supported by research grants (81273176, 81302509 and 81473069) from the National Science Foundation of China. We would like to thank two reviewers for their constructive comments and Dr. A. Lawrence Gould for suggestions to improve the manuscript.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
ZJ, YS and WW conceived the study concept and designed the simulation methods. ZJ and JX performed the simulations. All authors contributed to the interpretation of the simulation study. ZJ and QS performed the analysis of the example. All authors revised and approved the submitted manuscript.
Keywords
 Biomarker
 Clinical endpoint
 Bayesian model
 PPV
 NPV