The Comparative Effectiveness of Innovative Treatments for Cancer (CEIT-Cancer) project: Rationale and design of the database and the collection of evidence available at approval of novel drugs

Background: The available evidence on the benefits and harms of novel drugs and therapeutic biologics at the time of approval is reported in publicly available documents provided by the US Food and Drug Administration (FDA). We aimed to create a comprehensive database providing the relevant information required to systematically analyze and assess this early evidence in meta-epidemiological research.
Methods: We designed a modular and flexible database of systematically collected data. We identified all novel cancer drugs and therapeutic biologics approved by the FDA between 2000 and 2016, recorded regulatory characteristics, acquired the corresponding FDA approval documents, identified all clinical trials reported therein, and extracted trial design characteristics and treatment effects. Herein, we describe the rationale and design of the data collection process, particularly the organization of the data capture, the identification and eligibility assessment of clinical trials, and the data extraction activities.
Discussion: We established a comprehensive database on the comparative effects of drugs and therapeutic biologics approved by the FDA over a time period of 17 years for the treatment of cancer (solid tumors and hematological malignancies). The database provides information on the clinical trial evidence available at the time of approval of novel cancer treatments. The modular nature and structure of the database and the data collection processes allow updates, expansions, and adaptation for a continuous meta-epidemiological analysis of novel drugs. The database allows us to systematically evaluate benefits and harms of novel drugs and therapeutic biologics. It provides a useful basis for meta-epidemiological research on the comparative effects of innovative cancer treatments and continuous evaluations of regulatory developments.
Electronic supplementary material The online version of this article (10.1186/s13063-018-2877-z) contains supplementary material, which is available to authorized users.


Background
Cancer drug development is characterized by a perceived urgency to find novel treatments that improve patients' survival and quality of life. Timely access to such beneficial treatments is considered paramount for patients with cancer. Before granting approval and market access, health authorities such as the FDA review the available evidence on benefits and harms from clinical trials and the claims made by the pharmaceutical companies and sponsors of the trials. The FDA examines the submitted clinical trial results, re-analyzes the trials' patient-level data, and evaluates whether the trials were conducted and analyzed in accordance with the original study protocols [1,2]. For drugs and therapeutic biologics that receive approval, the FDA reviews are made publicly available in the Drugs@FDA database as "approval packages" [3]. These packages provide a wealth of information on the evidence on benefits and harms of innovative treatments at the time of approval.
With the introduction of new incentives and approval pathways, the FDA aimed to facilitate the development and approval process of drugs intended to treat serious or life-threatening conditions, including cancer [4]. For example, some policies focus specifically on orphan drugs for rare diseases [4]. Between 2000 and 2012, 46 out of 47 oncology drugs approved by the FDA underwent expedited approval [5]. In 2012, a further policy for so-called "breakthrough" therapies was introduced for drugs with highly promising clinical evidence [5].
However, the impact of these regulations is increasingly debated: because expedited and orphan drug approvals are often based on smaller studies than those used in traditional approvals, they may leave evidence gaps regarding efficacy and safety and increase uncertainty in clinical decision making [6]. At the time of approval, there may be a dearth of evidence on hard clinical outcomes, and subsequent follow-up evaluations suggest that such evidence either may never become available or may end up showing limited or no benefits [7][8][9]. Oncology and hematology are probably the medical fields currently most affected by such developments.
Numerous meta-epidemiological studies have aimed to better understand the evidence at the time of approval of novel cancer drugs and therapeutic biologics using data from the FDA and the European Medicines Agency (EMA). We give an overview of these studies and the research in context in Table 1 (details of the underlying search strategy are provided in Additional file 1). The first related investigation that we are aware of was published in 2009 [10], and the number of publications peaked in 2017 with 10 articles. Nonetheless, a major limitation is that many of these studies cover only certain types of cancer (for example, solid tumors). Overall, four studies [10][11][12][13] describe regulatory characteristics and clinical trials and assess endpoints and effect sizes used for approval across all cancer drugs, but none of them covers the most recently approved drugs (for example, those approved after 2013). They therefore cannot be used to assess newer policies such as the breakthrough program introduced in 2012. Thus, the current knowledge on approval evidence for cancer drugs is marked by not only a limited scope but also a great diversity in methods and approaches, reducing the interpretability of the findings.
To address such limitations, we intended to establish a comprehensive database allowing a continuous analysis of such regulatory developments in meta-epidemiological research. The ongoing "Comparative Effectiveness of Innovative Treatments for Cancer" (CEIT-Cancer) project aims to transparently describe and characterize the clinical trial evidence of novel cancer drugs. Our goal is to capture the relevant information required to systematically analyze and assess early evidence on benefits and harms of novel cancer drug treatments.
As a first step, we collected the pre-marketing clinical trial evidence using FDA approval documents with a specific focus on cancer drugs, randomized controlled trials (RCTs) and single-arm trials (SATs), and treatment effects on overall survival (OS), progression-free survival (PFS), and objective response rate (RR). However, the overall database structure is modular, which allows continuous updating of the list of drugs, the addition of new variables, expansion of the number of topics, health authorities, and outcomes, as well as linkage with other related datasets (for example, from post-approval evidence including non-randomized real-world studies). Herein, we describe the rationale and design of the data collection process for the pre-approval evidence, including the organization of the data capture, the identification of clinical trial information, the assessment of trials for eligibility, and the data extraction.

Table 1 (excerpt) Related studies and their objectives
[42]: To determine the availability of data on overall survival and quality of life benefits of cancer drugs.
Ocana and Tannock (2011) [48]: To determine if a difference in outcome between the experimental and control groups was detected that was equal to or greater than the value predefined in the protocol.
To review the long-term safety and efficacy of cancer drugs approved without evidence from randomized trials.
This list is based on a systematic search (Additional file 1) but is not intended to be exhaustive, as some relevant articles were brought to our attention by experts and could not be found with our limited search approach. *For example, approval pathways such as accelerated approval or orphan-drug status. Other regulatory characteristics (such as approval times, approval probabilities, or availability of pediatric label information) are not considered here.

Data collection
Project organization and database structure
The data collection consisted of three steps. In step 1, we made an inventory of novel FDA-approved drug products and acquired the corresponding FDA approval packages. In step 2, we made an inventory of RCTs and SATs reported in FDA approval documents, assessed their eligibility, and extracted trial design characteristics. In step 3, we extracted treatment effects on OS, PFS, and RR.
Steps 2 and 3 started with a planning and organizing phase (operationalization of concepts, drafting of an instruction manual for standardized data selection and extraction, setting up the extraction platform, pilot testing of the instruction manual and extraction platform, and training of reviewers) followed by an execution phase (independent data extraction and verification) and ended with a closing phase (documentation of activities). Specific project activities are described in greater detail in the following sections.
The clinical trial data were managed in a single database. The database consists of four data tables (with information about the drug, indication, trial, study groups and treatment comparisons, and treatment effects) that are linked in one-to-many (1:n) relationships (Fig. 1). The relational structure is indispensable because of the nature of the data (for example, multiple indications approved for a single drug, multiple clinical trials supporting approval of a single indication, and multiple comparisons within a single multi-arm clinical trial). We used both Microsoft Access as a local data extraction and management platform and Ragic [14] as a cloud-based equivalent.
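The one-to-many structure described above can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. This is not the project's actual schema: the table names, the column names, and the grouping of drug and indication into a single table are assumptions made for illustration, and the inserted rows are invented placeholder values.

```python
import sqlite3

# Hypothetical table and column names. The source only states that four
# tables are linked in one-to-many (1:n) relationships.
SCHEMA = """
CREATE TABLE drug_indication (
    id INTEGER PRIMARY KEY,
    generic_name TEXT NOT NULL,
    indication TEXT NOT NULL
);
CREATE TABLE trial (
    id INTEGER PRIMARY KEY,
    drug_indication_id INTEGER NOT NULL REFERENCES drug_indication(id),
    identifier TEXT NOT NULL,
    randomized INTEGER NOT NULL CHECK (randomized IN (0, 1))
);
CREATE TABLE comparison (
    id INTEGER PRIMARY KEY,
    trial_id INTEGER NOT NULL REFERENCES trial(id),
    arm_1 TEXT NOT NULL,
    arm_2 TEXT
);
CREATE TABLE treatment_effect (
    id INTEGER PRIMARY KEY,
    comparison_id INTEGER NOT NULL REFERENCES comparison(id),
    endpoint TEXT NOT NULL CHECK (endpoint IN ('OS', 'PFS', 'RR')),
    estimate REAL
);
"""


def build_demo_database() -> sqlite3.Connection:
    """Create an in-memory database with invented placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the 1:n links
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO drug_indication VALUES (1, 'drug-x', 'example indication')"
    )
    # One indication supported by two trials (one randomized, one single-arm).
    conn.executemany(
        "INSERT INTO trial VALUES (?, 1, ?, ?)",
        [(1, "TRIAL-A", 1), (2, "TRIAL-B", 0)],
    )
    # A multi-arm trial yields several comparisons.
    conn.executemany(
        "INSERT INTO comparison VALUES (?, ?, ?, ?)",
        [
            (1, 1, "drug-x", "placebo"),
            (2, 1, "drug-x high dose", "placebo"),
            (3, 2, "drug-x", None),
        ],
    )
    # Several endpoints can be recorded per comparison.
    conn.executemany(
        "INSERT INTO treatment_effect VALUES (?, ?, ?, ?)",
        [(1, 1, "OS", 0.80), (2, 1, "PFS", 0.70), (3, 2, "OS", 0.85)],
    )
    conn.commit()
    return conn


def effects_for_drug(conn: sqlite3.Connection, generic_name: str) -> list:
    """Follow the 1:n links from a drug down to its treatment effects."""
    query = """
        SELECT t.identifier, c.arm_1, e.endpoint, e.estimate
        FROM drug_indication d
        JOIN trial t ON t.drug_indication_id = d.id
        JOIN comparison c ON c.trial_id = t.id
        JOIN treatment_effect e ON e.comparison_id = c.id
        WHERE d.generic_name = ?
        ORDER BY e.id
    """
    return conn.execute(query, (generic_name,)).fetchall()
```

Calling `effects_for_drug(build_demo_database(), "drug-x")` walks all three links in a single query. The single-arm placeholder trial contributes no rows, which mirrors the statement that treatment effects were collected only for RCTs.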
Step 1: Inventory of FDA-approved drugs and acquisition of approval packages
The aims of this step were to identify and characterize all drugs licensed by the FDA for the treatment of cancer diseases and to download as well as prepare FDA approval documents for subsequent activities. This step was performed by a single reviewer (AL).

Inventory of FDA-approved drugs
In a first stage, we created a list of novel drugs and therapeutic biologics (referred to in this article as "drugs") that were granted their first FDA marketing authorization between 1 January 2000 and 31 December 2016. (Technically speaking, we included so-called "new molecular entities" and "new therapeutic biologics" approved via either a "New Drug Application" or a "Biologics License Application".) The drug names were collected from the "Annual drug and biologic approval activity" reports for new molecular and biological entities (2000 to 2016) [15] as well as the "FDA reports on drug innovation" (2011 to 2016) [16]. Information on therapeutic biologics approved before 2004 is not available in these documents and therefore we reviewed the drug approval reports by month for the period of January 2000 to December 2003 obtained from the Drugs@FDA database [3].

Selection of cancer indications
In a second stage, drugs were considered for inclusion in the CEIT-Cancer database if the original approval (that is, the first-ever approved use of a novel drug) was for the treatment of a solid tumor or hematological malignancy. Drugs without presumed cancer activity, such as supportive care drugs (for example, anti-emetics and hematopoietic stem cell mobilizing agents) or imaging drugs (for example, diagnostic radiopharmaceutical agents), were excluded. A medical oncologist (BK) was consulted in case of any doubts about eligibility.

Extraction of information on drug, indication, and regulatory characteristics
In the third stage, we collated information on drug, indication, and regulatory characteristics for each eligible drug and cancer indication ("drug-indication pair"; Table 2). The line of treatment was determined by a medical oncologist (BK). The remaining information was retrieved from various information sources as follows.
For drug-indication pair characteristics, we used the "Annual drug and biologic approval activity" reports for new molecular and biological entities (2000 to 2016) [15], the "FDA reports on drug innovation" (2011 to 2016) [16], and a peer-reviewed publication [17] for drug and regulatory characteristics, and the first-ever available FDA drug label from the Drugs@FDA database [3] for information about the FDA-approved indication(s).
For information on additional expedited programs and orphan status, we consulted the following sources: (1) "FDA reports on accelerated approvals" [18] to identify indications with accelerated approval, that is, indications approved on the basis of preliminary evidence that does not meet regulatory standards for traditional (full) approval [4]; (2) "Breakthrough designation approval" reports [19] to identify indications that received a breakthrough therapy designation in the pre-approval period, that is, drugs that are expected to advance the treatment of certain diseases [4]; and (3) the FDA database of orphan drug product designations [20] to identify indications that received orphan status, that is, drugs intended for the treatment of rare diseases affecting fewer than 200,000 people in the US [21].
All documents were downloaded or accessed on 2 November 2015 (for the 2000 to 2012 approvals) and 2 March 2017 (for the 2013 to 2016 approvals). We relied on the information from the Drugs@FDA database in the case of discrepant information between information sources (for example, if there were different approval dates presented). We categorized the drug innovation class (first-in-class, advance-in-class, and addition-to-class) in accordance with the algorithm of Lanthier et al. [17]. Accordingly, first-in-class drugs can be seen as "true" therapeutic innovation and define a new drug class. Advance-in-class drugs may offer an important therapeutic advance (that is, they were granted priority review by the FDA) over existing drugs in the same class. Drugs that do not fall under either of these two categories are categorized as addition-to-class.
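The three innovation classes described above amount to a short decision rule. The following is a minimal sketch of that rule as paraphrased in this section, not a reproduction of the algorithm of Lanthier et al. [17]. The `Drug` record and its field names are hypothetical, and whether a drug "defines a new drug class" is treated as a given input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drug:
    """Hypothetical record; field names are illustrative assumptions."""
    generic_name: str
    defines_new_class: bool  # judged elsewhere; not derived here
    priority_review: bool    # FDA priority review granted


def innovation_class(drug: Drug) -> str:
    """Assign the innovation class following the rule stated in the text."""
    if drug.defines_new_class:
        return "First-in-class"
    if drug.priority_review:
        # Priority review serves as the marker of an important
        # therapeutic advance over existing drugs in the same class.
        return "Advance-in-class"
    return "Addition-to-class"
```

Checking `defines_new_class` first means that a first-in-class drug with priority review is still labeled "First-in-class", which matches the order in which the categories are described.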

Approval packages
The FDA's review of the pre-clinical and clinical information generated by a biopharmaceutical company during the course of drug development is summarized in FDA "approval packages" published in the Drugs@FDA database. We retrieved the approval documents using an approach similar to one described recently [22], and we provided practical details on how we navigated the documents elsewhere [23]. The following documents served as source documents throughout this project and were made suitable for text searching using Adobe Acrobat's Optical Character Recognition (OCR) function: medical review (sometimes referred to as clinical review); statistical review; drug label; cross-discipline team leader review; summary review; and multi-discipline review.

Step 2: Trial selection and characterization
The aims of this step were to identify potentially eligible clinical trials in the medical review, assess their eligibility, and characterize their design. These activities were performed by teams of two independent reviewers. Trials include randomized and non-randomized studies (the latter within the category of SATs), and for each trial the database explicitly indicates whether a randomized design was used.

Identification of trials, eligibility assessment, and data extraction
Each reviewer was provided with a set of indications to identify potentially eligible trials. Reviewers independently searched the medical review document for randomized trials as well as for trials that were indicated as pivotal for approval (that is, the trial was described as "approval", "registration", "major", "pivotal", or similar) regardless of whether they were randomized or not. For each trial, the reviewers recorded the variables presented in Table 3. In particular, they extracted the study identifier, name, or acronym and determined whether the following criteria were met (each criterion was assessed separately): (1) the trial was explicitly described as pivotal to approval; (2) the patients were randomly assigned to treatment arms; (3) the patients matched broadly in their disease characteristics with the approved target population; (4) the patients were randomly assigned to at least one control arm that did not contain the drug under review (regardless of dose or administration schedule); and (5) as per the judgment of the reviewer, the trial could still be relevant even if none of the abovementioned criteria was met, for example, if the trial is extensively discussed or is the only trial evaluated in the medical review (which is sometimes the case in accelerated approval settings, where such trials are often not explicitly labeled as "pivotal" but are extensively discussed in the documents).

Table 2 Drug, indication, and regulatory characteristics

| Variable | Values or format | Description |
| --- | --- | --- |
| Generic name | (Character string) | According to US Adopted Names. |
| Type of active compound | "NME"; "NBE" | NME (New Molecular Entity; that is, a small molecule) or NBE (New Biologic Entity; that is, a biologic product). |
| Date of marketing authorization | (Date) | Format: YYYY-MM-DD. |
| Innovation class | "First-in-class"; "Advance-in-class"; "Addition-to-class" | Drug innovation class, following the definitions and categories described by Lanthier et al. [17]. New molecular or new biological entities are categorized as "First-in-class" if they define a new drug class, as "Advance-in-class" if they offer significant therapeutic advance (that is, they were granted priority review by the FDA) over existing drugs in the same class, or as "Addition-to-class" in any other case. |
| Indication characteristics | | |
| FDA-approved indication | (Character string) | Medical condition for which the drug of interest has been approved, according to the first-ever available FDA drug label. |
| Line of treatment | "1st"; "2nd"; "3rd"; "4th" | The clinical order in which the treatment is given. |
| NDA/BLA number | (Integer) | FDA's Original New Drug Application (NDA) or Biologics License Application (BLA) number. A unique identifier assigned to each application for approval submitted to the FDA. |
| Regulatory characteristics | | |
| Priority review | "Standard"; "Priority" | Priority review is an expedited FDA review program for drugs that provide a significant improvement over existing therapies. |
| Accelerated approval | "Yes"; "No" | Expedited FDA approval pathway for drugs that (a) treat serious conditions, (b) provide a meaningful advantage over available therapies, and (c) demonstrate effects on a surrogate endpoint that is reasonably likely to predict clinical endpoints. Accelerated approved drugs do not meet regulatory standards for traditional or full approval and are therefore required to provide evidence of clinical benefit in subsequent pivotal trials. |
| Breakthrough therapy designation | "Yes"; "No" | An expedited program at the FDA introduced in 2012 for drugs that (a) are intended to treat serious conditions and (b) provide preliminary clinical evidence of substantial improvement over existing therapies. |
| Orphan designation | "Yes"; "No" | A status assigned by the FDA to rare disease indications if fewer than 200,000 people in the US are affected. |
After completion, the two independently generated datasets were compared and disagreements resolved by consensus. The inter-rater reliability for trial identification (as assessed with the Kappa statistic [24]) was good (74%). Ultimately, trials that met any of the following sets of criteria were deemed eligible: (a) the trial was described as pivotal (criterion 1 alone is met; categorized as "explicitly pivotal"); (b) the trial was not described as pivotal but was randomized (criterion 2), enrolled a population that matched the approved target population (criterion 3), and had a control arm that did not contain the intervention under review (criterion 4) (categorized as "likely pivotal RCT"); or (c) the trial was neither "explicitly pivotal" nor a "likely pivotal RCT" but was considered otherwise essential (criterion 5) for the approval decision (categorized as "other pivotal"). Trials in the last category were typically single-arm studies in accelerated approval settings.
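The eligibility rules and the agreement statistic described above can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: the five criteria are simplified to booleans (how values such as "Partially" or "Not reported" were handled is not specified here), and the function `cohen_kappa` assumes that the Kappa statistic refers to Cohen's kappa for two raters. All function and parameter names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def eligibility_category(
    pivotal: bool,             # criterion 1
    randomized: bool,          # criterion 2
    on_label: bool,            # criterion 3
    clean_control: bool,       # criterion 4
    otherwise_relevant: bool,  # criterion 5
) -> str:
    """Map the five criteria to an eligibility category."""
    if pivotal:
        return "explicitly pivotal"
    if randomized and on_label and clean_control:
        return "likely pivotal RCT"
    if otherwise_relevant:
        return "other pivotal"
    return "not eligible"


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)
```

The order of the checks in `eligibility_category` mirrors the text: a trial described as pivotal is eligible on criterion 1 alone, and criterion 5 is consulted only when the first two categories do not apply.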
For each eligible trial, teams of two independent reviewers extracted information on variables presented in Table 4.
Step 3: Treatment effect estimates on overall survival, progression-free survival, and response rate
The aim of this step was to retrieve treatment effect estimates on OS, PFS, and RR for each treatment comparison. This information was collected only for RCTs. This activity was performed by teams of two independent reviewers.

Data extraction
We preferred trial analyses conducted by the FDA over sponsors' analyses, whenever both were available. Similarly, more recent data cutoff dates were preferred over older cutoff dates if there were multiple analysis results on the same endpoint available. We used the statistical review document (or any other FDA approval documents) if the medical review document was not available or was incomplete or not legible.
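The two preference rules above can be expressed as a sort key. The sketch below is illustrative only: the `Analysis` record and its fields are hypothetical, and the precedence between the two rules (FDA analysis first, then the more recent cutoff date) is an assumption, since the text does not state which rule prevails when they conflict.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Analysis:
    """Hypothetical record of one reported analysis of an endpoint."""
    endpoint: str      # "OS", "PFS", or "RR"
    analyst: str       # "FDA" or "sponsor"
    data_cutoff: date
    estimate: float


def preferred_analysis(
    analyses: Iterable[Analysis], endpoint: str
) -> Optional[Analysis]:
    """Pick one analysis per endpoint using the stated preferences."""
    candidates = [a for a in analyses if a.endpoint == endpoint]
    if not candidates:
        return None
    # Assumed precedence: FDA over sponsor first, then the later cutoff.
    return max(candidates, key=lambda a: (a.analyst == "FDA", a.data_cutoff))
```

With this key, a sponsor analysis is selected only when no FDA analysis of the same endpoint is available, even if the sponsor analysis has a later cutoff date.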
Table 3 (excerpt) Trial eligibility variables

| Variable | Values | Description |
| --- | --- | --- |
| Pivotal | "Yes"; "No" | Trial eligibility criteria: the trial is described as "pivotal" (or similar). |
| On-label | "Yes"; "No"; "Partially"; "Not reported" | Trial eligibility criteria: the drug of interest is tested in the approved indication. |
| Comparator | "Yes"; "No"; "Partially"; "Not reported" | Trial eligibility criteria: the control intervention does not contain the active component of the drug under review. |
| Relevance | "Yes"; "No" | Trial eligibility criteria: two reviewers consider that this trial was definitely used for approval, but none of the abovementioned eligibility criteria are met. |
| Eligible rationale | "explicitly pivotal"; "likely pivotal"; "other pivotal"; "not eligible" | The rationale for trial eligibility based on eligibility algorithm. |

For each treatment comparison, two reviewers independently searched the FDA review documents for treatment effect estimates on OS, PFS, and RR and extracted information on variables presented in Table 5. For OS and PFS endpoints with incomplete or missing information (for example, no confidence interval), we approximated treatment effect estimates following the methods described by Parmar et al. [25] and Tierney et al. [26]. At the end of the data collection activities in this step, the datasets of the two reviewers evaluating the same set of treatment comparisons were compared, and disagreements were resolved by consensus.
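To illustrate the kind of approximation mentioned above for OS and PFS estimates with incomplete information, the sketch below shows commonly used normal-approximation identities for recovering the standard error of a log hazard ratio. It is written in the spirit of Parmar et al. [25] and Tierney et al. [26]. Which of their methods were applied in which situation is not specified here, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def _z(level: float = 0.95) -> float:
    """Two-sided standard normal quantile for a confidence level."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def se_from_ci(lower: float, upper: float, level: float = 0.95) -> float:
    """SE of ln(HR) from a reported confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * _z(level))


def se_from_p_value(hazard_ratio: float, p_two_sided: float) -> float:
    """SE of ln(HR) from the HR and a two-sided p-value."""
    if not 0 < p_two_sided < 1 or hazard_ratio == 1:
        raise ValueError("need 0 < p < 1 and a hazard ratio other than 1")
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return abs(math.log(hazard_ratio)) / z


def se_from_events(total_events: float, allocation_ratio: float = 1.0) -> float:
    """SE of ln(HR) from the total number of events.

    Uses V ~ E * r / (1 + r)^2 and Var(ln HR) ~ 1 / V, where r is the
    allocation ratio; with 1:1 allocation this gives Var(ln HR) ~ 4 / E.
    """
    v = total_events * allocation_ratio / (1 + allocation_ratio) ** 2
    return 1 / math.sqrt(v)


def ci_from_se(hazard_ratio: float, se: float, level: float = 0.95):
    """Approximate confidence interval for the HR from the SE of ln(HR)."""
    half_width = _z(level) * se
    log_hr = math.log(hazard_ratio)
    return math.exp(log_hr - half_width), math.exp(log_hr + half_width)
```

For example, a hazard ratio reported without a confidence interval but with 100 total events in a 1:1 trial would receive a standard error of about 0.2 on the log scale, from which `ci_from_se` yields an approximate interval.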

Discussion
We have successfully developed the CEIT-Cancer database, which transparently describes and characterizes information on the clinical trial evidence of novel cancer drugs at the time of their approval by the FDA. Exploring characteristics of the evidence of novel cancer drugs at the time of their approval could greatly improve our understanding of the real-world clinical benefit and safety of such treatments. Importantly, it may also open new avenues of future research and regulation, leading to better-designed studies, reduced waste in research, and more rigorous criteria for health authorities and health systems to consider incorporating new interventions into the current cancer armamentarium.
The CEIT-Cancer database is a comprehensive, manually curated platform that captures regulatory, drug, indication, and clinical trial data from FDA approvals of novel cancer drugs. This database differs from previous investigations in three important ways. First, the CEIT-Cancer database covers a time frame of 17 years, which is substantially longer than that of most previous studies. Second, it assesses all types of cancers, including both solid tumors and hematologic malignancies. Third, the database encompasses the most recent FDA drug approvals. In addition, this database can be expanded to other medical fields and linked with other databases. It can be augmented with post-approval evidence and can also be expanded to include data extracted from approval documents of other health authorities, such as the EMA [11,27].
We have set up the database and realized the project in a multidisciplinary team including experts in clinical trial methodology and conduct, clinical epidemiology, health technology assessment, biostatistics, clinical research, information management, public health, and medical oncology. The initial dataset covers a time period of 17 years. This allows us to investigate several regulatory developments over time and changes in the focus of drug development, such as the development of targeted agents and immunotherapy in contrast to classic cytotoxic chemotherapy. Following standardized and established data extraction procedures as in systematic reviews, we created a large evidence base on treatment effects and trial quality. This lays the foundation for our planned continuous meta-epidemiological analysis of novel drugs and therapeutic biologics within the CEIT-Cancer project. We are currently developing the infrastructure to make the database available and aim to obtain structural funding and support to provide a sustainable solution. Through collaboration with other investigators, we aim to establish a data-sharing process to provide access to the database and foster further research.

Table 4 (excerpt) Trial and comparison characteristics

| Variable | Values or format | Description |
| --- | --- | --- |
| Other trial characteristics | "Parallel"; "Cross-over"; "Uncontrolled/historic control" | Patients are randomized to a concurrent control ("Parallel") or to a sequence of treatments ("Cross-over"). |
| Comparison characteristics | | |
| Arm 1: Type | "Experimental"; "Active"; "Placebo"; "No treatment"; "Dose-comparison" | In add-on trials, comparators were categorized as "Active" whenever an intervention was given on top of an active treatment (for example, standard of care with or without placebo). Comparators were categorized as "No treatment" if "supportive therapy" or "usual care" was given which included a wide variety of treatments rather than a specific intervention. |
| Arm 1: Characteristics | (Character string) | All interventions in arm 1, including drug names, doses, and route of administration. Interventions used to avoid treatment-related complications were not recorded, such as pre-treatment with acetaminophen/diphenhydramine to reduce infusion reactions with intravenous infusion of therapeutic biologics, or anti-emetics to reduce nausea and vomiting associated with certain chemotherapies. |

Conclusions
Publicly available drug approval documents offer valuable information for evidence syntheses and research-on-research projects. The CEIT-Cancer database transparently describes and characterizes this information on the clinical trial evidence of novel cancer drugs. It allows systematic analysis and assessment of early evidence on benefits and harms of novel drug treatments in meta-epidemiological research. The modular nature and structure of the database as well as the data collection processes permit continuous updates and expansions. Overall, the database provides a solid basis for meta-epidemiological research of the evidence on novel treatments in cancer.

Availability of data and materials
The datasets generated or analyzed (or both) during the current study are not publicly available, because data collection and processing are ongoing. They will be made available from the corresponding author on reasonable request as described in the article.