This review was prepared in accordance with the PRISMA extension for Scoping Reviews reporting guideline (see Additional File 1: eTable 1) [33]. The protocol for this review has been published elsewhere [30, 34]. This scoping review did not require ethics approval from our institution.
Eligibility criteria
Documents that provided guidance (advice or a formal recommendation) or a checklist describing outcome-specific information that should be included in a clinical trial protocol or report were eligible if published in the last 10 years in a language that our team could read (English, French, or Dutch). Dates were restricted to the 10 years preceding review commencement, both to focus the review on informing the update and extension of the existing outcome reporting guidance in CONSORT (published in 2010) and SPIRIT (published in 2013), and to keep the review feasible given the large number of documents identified in our preliminary searches. There were no restrictions on population, trial design, or outcome type. We only included documents that provided explicit guidance (“stated clearly and in detail, leaving no room for confusion or doubt” [35]), meaning the guidance had to specifically state that the information should be included in a clinical trial protocol or report [36]. An example of included guidance, from the CONSORT-PRO extension, is: “Evidence of patient-reported outcome instrument validity and reliability should be provided or cited, if available” [36].
Information sources
Documents were identified using: an electronic bibliographic database search (MEDLINE and the Cochrane Methodology Register; see eTable 2 in Additional file 2 for the search strategy), developed in close consultation with an experienced research librarian and run from database inception to 19 March 2018; a grey literature search; solicitation of colleagues; and reference list searching. Eligible document types included review articles, reporting guidelines, recommendation/guidance documents, commentary/opinion pieces/letters, regulatory documents, government reports, ethics review board documents, websites, funder documents, and other trial-related documents such as trial protocol templates.
The grey literature search included a systematic search of Google (www.google.com) using 40 combinations of key words (e.g., “trial outcome guidance”, “trial protocol outcome recommendations”; see eTable 3 in Additional file 3 for a complete list). The first five pages of search results for each key word combination were reviewed (10 hits per page; 40 combinations × 5 pages × 10 hits = 2000 Google hits screened in total). Documents were also sought through a targeted search of 41 relevant websites (e.g., the EQUATOR Network, Health Canada, the Agency for Healthcare Research and Quality; see eTable 3 in Additional file 3) identified by the review team, solicitation of colleagues, and use of a tool for searching health-related grey literature [37]. Website searching included screening the homepage and relevant subpages of each website; when applicable, the term “outcome” and its synonyms were entered into the website’s internal search feature. We searched online for forms and guidelines from an international sample of ethics review boards, as ethics boards are responsible for evaluating proposed trials, including the selection, measurement, and analyses of trial outcomes. To limit this search to a manageable sample of international ethics review board guidance, we restricted it to five major research universities and five major research hospitals (considered likely to be experienced in reviewing and providing guidance on clinical trials) in four English-speaking countries: the United States, the United Kingdom, Canada, and Australia (see eTable 3 in Additional file 3). Because some countries yielded substantially more documents than others, documents were randomly selected from the four countries in equal shares (i.e., 25% of the sample from each country) to ensure diverse geographic representation; this amounted to approximately half of all ethics review board documents initially identified.
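For illustration, this country-stratified random selection can be expressed as a short script. It is a minimal sketch: the pool sizes, document labels, and seed below are hypothetical placeholders, not the review’s actual document lists.

```python
import random

random.seed(2018)  # fixed seed so the selection is reproducible

# Hypothetical document pools per country; in practice, some countries
# yielded substantially more ethics review board documents than others.
documents_by_country = {
    "United States": [f"US-doc-{i}" for i in range(1, 21)],
    "United Kingdom": [f"UK-doc-{i}" for i in range(1, 13)],
    "Canada": [f"CA-doc-{i}" for i in range(1, 9)],
    "Australia": [f"AU-doc-{i}" for i in range(1, 7)],
}

total = sum(len(docs) for docs in documents_by_country.values())
target = total // 2        # retain roughly half of the documents identified
per_country = target // 4  # equal shares: 25% of the sample per country

sample = {
    country: random.sample(docs, min(per_country, len(docs)))
    for country, docs in documents_by_country.items()
}
```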
Additional documents and sources were obtained by contacting all founding members of the “InsPECT Group” [25]: 18 trialists, methodologists, knowledge synthesis experts, clinicians, and reporting guideline developers from around the world [28]. We asked each expert to identify documents, relevant websites, ethics review boards, and additional experts who might have further information; all recommended experts were contacted with the same request. Given the comprehensiveness of our search strategies and the large number of eligible documents identified, we performed reference list searching only for included documents identified via Google searching, as this document set encompassed the diversity of sources and document types eligible for inclusion (e.g., academic publications, websites).
Selection of sources of evidence
A trained team member (L. Saeed) performed the final electronic bibliographic database searches and exported the results into EndNote version X8 [38] to remove all duplicates. All other data sources were de-duplicated manually, first within each source and then against sources that had already been screened, leaving only new documents to move forward to “charting” (in scoping reviews, the data extraction process is referred to as charting the results) [32, 33].
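EndNote handled de-duplication of the database records automatically; for the manually handled sources, the underlying logic reduces to matching on a normalized citation key, as in this sketch. The title-only key is a simplifying assumption made here for illustration; manual matching would also weigh fields such as authors, year, and source.

```python
import re

def normalize(record):
    """Crude matching key: the title lowercased with punctuation
    and whitespace stripped."""
    return re.sub(r"[^a-z0-9]", "", record["title"].lower())

def deduplicate(new_source, already_screened):
    """Keep only records not duplicated within this source or in
    sources that were already screened."""
    seen = {normalize(r) for r in already_screened}
    unique = []
    for record in new_source:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```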
Initial screening
All screening and data charting forms are available on the Open Science Framework [39]. Titles and abstracts of documents retrieved from the electronic bibliographic database search were screened for potential eligibility by one of two reviewers with graduate-level epidemiological training (AM, EJM) before full texts were examined. The two reviewers first assessed 90 citations as a practice set and reviewed the results with a senior team member (NJB). They then screened a randomly selected training set of 100 documents from the database search, achieving 93% observed agreement against 71% expected chance agreement, for a Cohen’s κ of 0.76 (substantial agreement [40]). The remaining search results were then divided between the two reviewers and screened independently, with periodic verification checks performed by NJB. One reviewer (AM) screened and charted all website search results. Documents gathered from the ethics review board searches (by L. Saeed) and from the solicitation of experts moved directly to full-text review and charting by EJM.
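The reported κ follows directly from the observed and chance agreement rates via Cohen’s formula, κ = (p_o − p_e) / (1 − p_e); a minimal check:

```python
def cohens_kappa(observed_agreement, chance_agreement):
    """Cohen's kappa: agreement beyond chance, scaled by the
    maximum possible agreement beyond chance."""
    return (observed_agreement - chance_agreement) / (1 - chance_agreement)

# Reproducing the value reported for the 100-document training set:
kappa = cohens_kappa(0.93, 0.71)  # (0.93 - 0.71) / (1 - 0.71)
print(round(kappa, 2))            # 0.76 -> substantial agreement [40]
```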
Full-text screening
The reviewers (AM, EJM) performed full-text screening for eligibility using a process similar to that for title and abstract screening. A sample of 35 documents identified during title and abstract screening was assessed for eligibility, with an observed agreement rate of 94% (33 of 35 documents). The included documents from this sample (n = 14) were charted in duplicate, and the reviewers compared their charting results and resolved any discrepancies through discussion. Following review of the agreement results by a senior team member (NJB), the remaining search results were divided between the two reviewers and independently screened and charted, with periodic verification checks performed by NJB. Full-text screening decisions and reasons for exclusion were logged using a standardized form [39] developed in Research Electronic Data Capture (REDCap) software [41].
Data charting process
Included documents then underwent data charting using a standardized charting form [39] developed in REDCap software [41]. Before data charting began, 11 documents were piloted through the full-text screening and charting forms by EJM and AM (AM was not involved in developing the forms), and the forms were modified as necessary after the pilot results were reviewed with NJB and MO. The reviewers (AM, EJM) charted document characteristics (e.g., publication type, article title, last name of first author, publication year, publisher) as well as the scope and characteristics of each specific recommendation extracted from each included document (e.g., whether the recommendation was specific to clinical trial protocols or reports, or to a particular outcome type, trial design, or population). Given the nature of this review, a risk of bias assessment or formal quality appraisal of included documents was not performed. To help gauge the credibility of the recommendations gathered, we categorized each recommendation according to whether it was supported by empirical evidence provided within the source document (e.g., findings from a literature review or expert consensus methods), by citation(s) to other documents (e.g., a citation to an existing reporting guideline), by both, or by neither.
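To make the charting structure concrete, the record captured for each recommendation can be pictured as the following data structure. This is a hypothetical rendering: the field names are illustrative and do not reproduce the actual REDCap form.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Support(Enum):
    """Credibility categorization of a recommendation."""
    EMPIRICAL_EVIDENCE = "empirical evidence within the source document"
    CITATION = "citation(s) to other documents"

@dataclass
class ChartedRecommendation:
    # Document characteristics
    publication_type: str
    article_title: str
    first_author_last_name: str
    publication_year: int
    publisher: str
    # Scope of the specific recommendation
    applies_to_protocols: bool
    applies_to_reports: bool
    outcome_type: Optional[str] = None  # None = not outcome-type specific
    trial_design: Optional[str] = None
    population: Optional[str] = None
    # Zero, one, or both Support values may apply ("neither" = empty list)
    support: list[Support] = field(default_factory=list)
```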
Synthesis of results
Recommendations identified within the included documents were compared with the candidate outcome reporting items to support, refute, or refine item content and to assess whether additional candidate items were needed. To achieve these aims, the reviewers (AM and EJM) mapped each recommendation to an existing candidate item or to one of ten descriptive categories, supported by full-text extraction captured in free text boxes within the charting form; recommendations that fell outside the scope of all existing candidate items and categories were also captured in free text boxes. Members of the “InsPECT Operations Team” [25, 28] held eight in-person meetings over a 2-month period to review these recommendations and to develop new candidate reporting items or refine existing ones to better reflect the concepts and wording in the literature. Required attendees were the review lead author (NJB), the senior author (MO), and at least three other members of the Operations Team (EJM, AM, L. Saeed, A. Chee-a-tow). After data collection was complete, NJB reviewed the mapping of recommendations to candidate items in its entirety, and the mapping was finalized by consensus with the two reviewers (EJM, AM). The wording of the candidate items was then clarified as necessary and finalized by the Operations Team. Data analysis consisted of descriptive quantitative measures (counts and frequencies) characterizing the included guidance documents and their recommendations.
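The descriptive analysis amounts to tabulating counts and frequencies over the charted records; a minimal sketch follows, using hypothetical placeholder records purely for illustration (the real analysis ran over the recommendations charted in REDCap).

```python
from collections import Counter

# Hypothetical (document type, mapped candidate item) pairs.
charted = [
    ("reporting guideline", "candidate item 1"),
    ("regulatory document", "candidate item 1"),
    ("review article", "candidate item 2"),
    ("reporting guideline", "candidate item 2"),
    ("reporting guideline", "candidate item 1"),
]

doc_type_counts = Counter(doc_type for doc_type, _ in charted)
item_counts = Counter(item for _, item in charted)

total = len(charted)
for doc_type, n in doc_type_counts.most_common():
    print(f"{doc_type}: {n} ({n / total:.0%})")
for item, n in item_counts.most_common():
    print(f"{item}: {n} ({n / total:.0%})")
```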