How to select outcome measurement instruments for outcomes included in a “Core Outcome Set” – a practical guideline

Background
In cooperation with the Core Outcome Measures in Effectiveness Trials (COMET) initiative, the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative aimed to develop a guideline on how to select outcome measurement instruments for outcomes (i.e., constructs or domains) included in a “Core Outcome Set” (COS). A COS is an agreed minimum set of outcomes that should be measured and reported in all clinical trials of a specific disease or trial population.

Methods
Informed by a literature review to identify potentially relevant tasks on outcome measurement instrument selection, a Delphi study was performed among a panel of international experts, representing diverse stakeholders. In three consecutive rounds, panelists were asked to rate the importance of different tasks in the selection of outcome measurement instruments, to justify their choices, and to add other relevant tasks. Consensus was defined as being achieved when 70% or more of the panelists agreed and when fewer than 15% of the panelists disagreed.

Results
Of the 481 invited experts, 120 agreed to participate, of whom 95 (79%) completed the first Delphi questionnaire. We reached consensus on four main steps in the selection of outcome measurement instruments for COS: Step 1, conceptual considerations; Step 2, finding existing outcome measurement instruments, by means of a systematic review and/or a literature search; Step 3, quality assessment of outcome measurement instruments, by means of the evaluation of the measurement properties and feasibility aspects of outcome measurement instruments; and Step 4, generic recommendations on the selection of outcome measurement instruments for outcomes included in a COS (consensus ranged from 70 to 99%).

Conclusions
This study resulted in a consensus-based guideline on the methods for selecting outcome measurement instruments for outcomes included in a COS. This guideline can be used by COS developers in defining how to measure core outcomes.

Electronic supplementary material
The online version of this article (doi:10.1186/s13063-016-1555-2) contains supplementary material, which is available to authorized users.


Table of Contents
Abbreviations and definitions
Introduction
Step 1. Conceptual considerations
Step 2. Finding existing outcome measurement instruments
  Systematic reviews
  Literature searches
  Other sources
Step 3. Quality assessment of outcome measurement instruments
  Evaluation of the methodological quality of the included studies
  Evaluation of the quality of the measurement properties
  Best evidence synthesis
  Feasibility aspects
Step 4. Generic recommendations on the selection of outcome measurement instruments for a COS
  Select only one outcome measurement instrument for each outcome in a COS
  Minimum requirements for including an outcome measurement instrument in a COS
  A consensus procedure to agree on the outcome measurement instrument for each outcome in a COS

Introduction
A joint initiative between the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative [1] and the Core Outcome Measures in Effectiveness Trials (COMET) initiative [2] aimed to develop a guideline on how to select outcome measurement instruments (OMIs) (e.g. assessments by health professionals, biomarkers, clinical rating scales, imaging tests, laboratory tests, patient questionnaires, and performance-based tests) for outcomes (i.e. constructs or domains) included in a Core Outcome Set (COS). A COS is an agreed minimum set of outcomes that should be measured and reported in all clinical trials of a specific disease or trial population; it is a recommendation of what should be measured and reported in all clinical trials. [3] Once a COS is defined, it is then important to achieve consensus on how these outcomes should be measured, i.e. which OMIs should be selected.
The present guideline results from an international Delphi study among different groups of stakeholders that took place between November 2013 and October 2014. With this Delphi study we (that is, COSMIN and COMET) reached consensus on the different steps to be taken in the selection of OMIs for outcomes included in a COS. Details on the methods have been published elsewhere [4] and the results of the Delphi study have been published separately. [Trials 2016, in press] In this guideline we describe the optimal methodology, i.e. the preferred approach for selecting OMIs for outcomes included in a COS. We assume that the choices regarding what to measure (i.e. the core outcomes in a COS) have already been made.
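The consensus criterion used in the Delphi study (stated in the abstract as 70% or more of the panelists agreeing and fewer than 15% disagreeing) can be written down as a short check. This is a minimal sketch only: the function name and the idea of passing raw counts are assumptions made for illustration, and how individual ratings are classified as "agree" or "disagree" is not specified here.

```python
def consensus_reached(n_agree: int, n_disagree: int, n_total: int) -> bool:
    """Return True if at least 70% agree and fewer than 15% disagree.

    Hypothetical helper: the counts are assumed to have been derived
    from the panelists' ratings beforehand.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_agree < 0 or n_disagree < 0 or n_agree + n_disagree > n_total:
        raise ValueError("counts are inconsistent with n_total")
    # Integer arithmetic avoids floating-point edge cases at the thresholds.
    enough_agreement = 100 * n_agree >= 70 * n_total
    little_disagreement = 100 * n_disagree < 15 * n_total
    return enough_agreement and little_disagreement
```

For example, with 95 responding panelists, 67 agreeing and 10 disagreeing would meet the criterion under this sketch, whereas 67 agreeing and 15 disagreeing would not.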

Step 2. Finding existing outcome measurement instruments
The aim of the second step in the selection of OMIs for outcomes included in a COS is finding existing OMIs. With the intention to search for all existing OMIs, three sources of information can be used: 1) systematic reviews, 2) literature searches, and 3) other sources, considered as optional.

Systematic reviews
We recommend that COS developers use existing, good quality, and up-to-date systematic reviews of OMIs for the selection of OMIs for a COS. To see if a systematic review exists, the COSMIN database of systematic reviews of OMIs can be consulted.

Literature searches
MEDLINE (e.g., through the PubMed or OVID interface) is considered the minimum database that COS developers should consult for finding existing OMIs. An additional search in EMBASE is highly recommended because in several systematic reviews of OMIs, two or three relevant articles were found in EMBASE that were not found in MEDLINE. [11][12][13] We also recommend searching other specific databases, depending on the construct and target population of interest, for example the Cochrane Library, CINAHL, or PsycINFO. Lastly, we recommend that the reference lists of the included studies be screened for additional relevant studies. [1]

Other sources
Additional sources of information that can be consulted for finding existing OMIs include (online) databases of OMIs, books and/or book chapters, conference proceedings, contact/lead authors of publications in the field, the World Wide Web, trial registries, citations, consumer networks, patient organizations, and special interest groups. Examples of relevant online databases can be found in Table 1. However, searches in these additional sources cannot be performed in a systematic and reproducible manner. Moreover, it is unlikely that one will find OMIs of good quality that were not already identified in a systematic literature search. We therefore consider such searches optional.

Step 3. Quality assessment of outcome measurement instruments
The third step in the selection of OMIs is a quality assessment of the OMIs that result from Step 2. There are many aspects to assessing the measurement properties of OMIs, and their constructs and definitions vary. We chose the definitions listed at http://www.cosmin.nl/COSMIN%20taxonomy.html, which have been derived from an international Delphi consensus process. [14,15] The quality assessment includes two distinct parts: 1) the evaluation of the methodological quality of the included studies by using the COSMIN checklist, [14] and 2) the evaluation of the quality of the OMIs (i.e., their measurement properties and feasibility aspects) by applying criteria for good measurement properties. [16] Evidence on the methodological quality of the studies and the quality of the measurement properties should be combined in a best evidence synthesis, with feasibility aspects also taken into consideration. Step 3 may already have been performed in an existing and up-to-date systematic review of good quality.

Evaluation of the methodological quality of the included studies
The COSMIN checklist is recommended to evaluate the methodological quality of the identified studies. Although the COSMIN checklist was originally developed for evaluating the quality of studies on the measurement properties of health measurement instruments, it has also been used for evaluating the quality of studies on the measurement properties of other OMIs, such as performance-based tests. [17] This checklist is applied to each study and covers nine different measurement properties, each rated from excellent to poor, resulting in an overall quality score for each measurement property. [18] The COSMIN checklist, including detailed information on using the checklist, can be found on the COSMIN website. [1]

Evaluation of the quality of the measurement properties
To determine whether an OMI has good measurement properties, criteria for good measurement properties can be applied. [16]

Definitions of measurement properties
Reliability: The degree to which the measurement is free from measurement error
Measurement error: The systematic and random error of a patient's score that is not attributed to true changes in the construct to be measured
Validity
  Content validity (including face validity): The degree to which the content of a measurement instrument is an adequate reflection of the construct to be measured
  Structural validity: The degree to which the scores of a measurement instrument are an adequate reflection of the dimensionality of the construct to be measured
  Hypotheses testing: The degree to which the scores of a measurement instrument are consistent with hypotheses based on the assumption that the measurement instrument validly measures the construct to be measured
  Cross-cultural validity: The degree to which the performance of the items on a translated or culturally adapted measurement instrument are an adequate reflection of the performance of the items of the original version of the measurement instrument
  Criterion validity: The degree to which the scores of a measurement instrument are an adequate reflection of a 'gold standard'
Responsiveness: The ability of a measurement instrument to detect change over time in the construct to be measured

Content validity (including face validity)
We recommend that the content validity (including face validity) of the included OMIs be validity studies found in the literature, and 4) the content of the OMI. We recommend that content validity be judged by two reviewers independently and that the perspective of patients be included as well. COSMIN is currently working on updated standards and criteria for content validity that include these aspects (a Delphi study is currently ongoing).
The methodological quality of the studies on content validity can be evaluated by completing the COSMIN box for 'content validity'. Subsequently, criteria for good measurement properties can be applied.

Next, we recommend that the internal structure of the included OMIs be evaluated, focusing on the structural validity (i.e., dimensionality) and internal consistency of the OMI.
Structural validity refers to the degree to which the scores of an OMI are an adequate reflection of the dimensionality of the construct of interest. [15] Structural validity can be assessed by factor analysis, item response theory (IRT) analysis, or Rasch analysis.
Internal consistency refers to the degree of interrelatedness among the items. [15] Cronbach's alpha can be used to assess the internal consistency of an OMI that has been shown to be unidimensional by factor analysis.
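As an illustration of the statistic named above, Cronbach's alpha for k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The following is a minimal sketch using only the Python standard library; the function name and the row-per-respondent data layout are assumptions made for this example, and it presumes complete data and an OMI already shown to be unidimensional.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Compute Cronbach's alpha.

    scores: list of respondents, each a list with one score per item
    (hypothetical layout; no missing values are handled).
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every respondent must have a score for every item")

    # Sample variance of each item across respondents.
    item_columns = list(zip(*scores))
    sum_item_variances = sum(variance(column) for column in item_columns)

    # Sample variance of the total (summed) score per respondent.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total score has zero variance")

    return n_items / (n_items - 1) * (1 - sum_item_variances / total_variance)
```

The value obtained this way would then be judged against the criteria for good measurement properties; no threshold is built into the sketch.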
We recommend that the methodological quality of the studies on structural validity and internal consistency be evaluated by completing the applicable COSMIN boxes for 'structural validity' and for 'internal consistency'. Subsequently, criteria for good measurement properties can be applied (Table 3). The IRT criteria for structural validity were adapted from the Patient Reported Outcomes Measurement Information System (PROMIS) Standards. [22] The assessment of structural validity and internal consistency is only relevant for OMIs based on a reflective model. [19] In a reflective model, the items are expected to be highly correlated and interchangeable. If the OMI is based on a formative model (i.e., when the items together form the construct and do not need to be correlated), these measurement properties are not relevant and this task can be skipped. For example, for the ACR20, a formative model that indicates how much a person's rheumatoid arthritis has improved, the assessment of interrater reliability may be more relevant instead.
If there is evidence in multiple studies (i.e., at least two studies) of good quality or in one study of excellent quality that content validity (including face validity) AND structural validity or internal consistency are poor, the OMI will not be considered further, i.e. the other measurement properties (including reliability, measurement error, hypotheses testing, cross-cultural validity, criterion validity, and responsiveness) will not be evaluated. [1]
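The stopping rule in the preceding paragraph can be sketched as a small decision function. This is an illustrative sketch under stated assumptions, not the guideline's own implementation: the `Finding` record, the property names, and the quality labels are hypothetical, and the phrase "content validity AND structural validity or internal consistency" is read here as content validity AND (structural validity OR internal consistency). Conflicting evidence across studies is not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One study's result for one measurement property (hypothetical record)."""
    prop: str           # e.g. "content_validity"
    study_quality: str  # "excellent", "good", "fair", or "poor"
    rating: str         # "good" or "poor" rating of the measurement property


def strong_evidence_of_poor(findings, prop):
    """True if >= 2 good-quality studies or >= 1 excellent study rate prop as poor."""
    poor = [f for f in findings if f.prop == prop and f.rating == "poor"]
    n_excellent = sum(f.study_quality == "excellent" for f in poor)
    n_good_or_better = sum(f.study_quality in ("good", "excellent") for f in poor)
    return n_excellent >= 1 or n_good_or_better >= 2


def stop_further_evaluation(findings):
    """Assumed reading: content validity AND (structural validity OR internal consistency)."""
    return strong_evidence_of_poor(findings, "content_validity") and (
        strong_evidence_of_poor(findings, "structural_validity")
        or strong_evidence_of_poor(findings, "internal_consistency")
    )
```

Under this reading, an OMI with one excellent-quality study showing poor content validity and two good-quality studies showing poor internal consistency would be dropped, whereas poor content validity alone would not trigger the rule.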

Other measurement properties
As for all other measurement properties (i.e., reliability, measurement error, hypotheses testing, cross-cultural validity, criterion validity, and responsiveness), we recommend that the methodological quality of the included studies be evaluated, as well as the quality of the measurement properties.

Feasibility aspects
Aspects of feasibility may play an important role in the selection of an OMI for a COS. COS developers should ask themselves the following question: "Can the measure be applied easily in its intended setting, given constraints of time, money, and interpretability?" [5,6] Aspects of feasibility may be decisive in the acceptance of the OMI by researchers. A complete overview of all feasibility aspects that COS developers may take into consideration is provided in Table 5.

Step 4. Generic recommendations on the selection of outcome measurement instruments for a COS
These recommendations concern the final decision making on including an OMI in a COS.

Select only one outcome measurement instrument for each outcome in a COS
Taking all evidence of the measurement properties and feasibility aspects into consideration, and the specific situation for which the OMI is intended, we recommend, in principle, selecting only one OMI for each outcome in a COS. This will enhance the comparability of future clinical trials. If the outcome of interest is a complex outcome (e.g., pain) that consists of multiple aspects that are measured by different OMIs (e.g., pain intensity, pain interference), we recommend that these different aspects be considered as different outcomes.
In addition, we recommend that COS developers consider whether different (sub)populations may need their own OMIs to measure the same outcome. For example, a different OMI may be selected to measure pain in children and in adults.

Minimum requirements for including an outcome measurement instrument in a COS
Ideally, an OMI included in a COS has high quality evidence for all measurement properties. However, in practice, there is often unknown or (very) low evidence for some measurement properties. We recommend that an OMI can be provisionally included in a COS if there is at least high quality evidence for good content validity and for good internal consistency (if applicable), and if the OMI seems feasible. Conversely, there should be an absence of high quality evidence that one or more other measurement properties are poor. If internal consistency is not relevant, evidence for test-retest or inter-rater reliability should be available. Where an OMI lacks evidence on one or more measurement properties, we recommend proposing a research agenda for further validation studies. When no OMI with good content validity is available, we recommend developing a new OMI, followed by a quality assessment of the OMI.
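The minimum requirements above can be summarised as a checklist-style function. The sketch below is only one possible encoding: the `OmiEvidence` record and its field names are hypothetical, the evidence sets are assumed to come from the best evidence synthesis, and the check for poor properties is applied to every property listed, which is slightly stricter than the wording "other measurement properties".

```python
from dataclasses import dataclass


@dataclass
class OmiEvidence:
    """Summary of the evidence for one OMI (hypothetical structure)."""
    good_high_quality: frozenset  # properties with high quality evidence of being good
    poor_high_quality: frozenset  # properties with high quality evidence of being poor
    feasible: bool
    internal_consistency_applicable: bool = True
    reliability_evidence_available: bool = False  # test-retest or inter-rater


def can_be_provisionally_included(evidence: OmiEvidence) -> bool:
    """Apply the minimum requirements for provisional inclusion in a COS."""
    if "content_validity" not in evidence.good_high_quality:
        return False
    if not evidence.feasible:
        return False
    # No high quality evidence that any measurement property is poor.
    if evidence.poor_high_quality:
        return False
    if evidence.internal_consistency_applicable:
        return "internal_consistency" in evidence.good_high_quality
    # Internal consistency not relevant (e.g. formative model):
    # evidence for test-retest or inter-rater reliability should be available.
    return evidence.reliability_evidence_available
```

An OMI that fails this check because evidence is missing, rather than because evidence is poor, would be a candidate for the research agenda mentioned above.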

A consensus procedure to agree on the outcome measurement instrument for each outcome in a COS
We recommend that COS developers use a consensus procedure (e.g., a consensus meeting) to reach final agreement among all relevant stakeholders, including patients, on the OMIs selected for a COS. Group discussions and a plenary discussion plus voting during a face-to-face meeting among a group of stakeholders can be used to achieve consensus on the final core set of OMIs. [5,6]

Summary
We reached consensus on four main steps in the selection of outcome measurement instruments (OMIs) for outcomes included in a Core Outcome Set (COS): Step 1) conceptual considerations; Step 2) finding existing OMIs, by means of a systematic review and/or literature searches; Step 3) quality assessment of OMIs, by means of the evaluation of the measurement properties and feasibility aspects of the OMIs; and Step 4) generic recommendations on the selection of OMIs for outcomes included in a COS. In general, the methods for the selection of OMIs for a COS are considered to be similar to the methods for selecting OMIs for individual clinical trials. A summary of this guideline is presented in a flow chart (Figure 1).