- Open Access
- Open Peer Review
How to select outcome measurement instruments for outcomes included in a “Core Outcome Set” – a practical guideline
Trials volume 17, Article number: 449 (2016)
In cooperation with the Core Outcome Measures in Effectiveness Trials (COMET) initiative, the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative aimed to develop a guideline on how to select outcome measurement instruments for outcomes (i.e., constructs or domains) included in a “Core Outcome Set” (COS). A COS is an agreed minimum set of outcomes that should be measured and reported in all clinical trials of a specific disease or trial population.
Informed by a literature review to identify potentially relevant tasks on outcome measurement instrument selection, a Delphi study was performed among a panel of international experts, representing diverse stakeholders. In three consecutive rounds, panelists were asked to rate the importance of different tasks in the selection of outcome measurement instruments, to justify their choices, and to add other relevant tasks. Consensus was defined as being achieved when 70 % or more of the panelists agreed and when fewer than 15 % of the panelists disagreed.
Of the 481 invited experts, 120 agreed to participate of whom 95 (79 %) completed the first Delphi questionnaire. We reached consensus on four main steps in the selection of outcome measurement instruments for COS: Step 1, conceptual considerations; Step 2, finding existing outcome measurement instruments, by means of a systematic review and/or a literature search; Step 3, quality assessment of outcome measurement instruments, by means of the evaluation of the measurement properties and feasibility aspects of outcome measurement instruments; and Step 4, generic recommendations on the selection of outcome measurement instruments for outcomes included in a COS (consensus ranged from 70 to 99 %).
This study resulted in a consensus-based guideline on the methods for selecting outcome measurement instruments for outcomes included in a COS. This guideline can be used by COS developers in defining how to measure core outcomes.
There is a lack of consensus with regard to the selection of outcomes (i.e., constructs or domains) and outcome measurement instruments (OMIs) for clinical trials . As a result, different outcomes are assessed and a variety of OMIs (e.g., assessments by health professionals, biomarkers, clinical rating scales, imaging tests, laboratory tests, patient questionnaires, and performance-based tests) measure the same outcome, causing inconsistencies in reporting and difficulties in comparing and combining the findings in systematic reviews and meta-analyses [2, 3]. In addition, the quality of OMIs varies considerably, and it is usually not apparent that the most reliable and valid OMI has been selected. Standardization of the selection of outcomes and OMIs is needed.
The current project is a joint initiative between the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative  and the Core Outcome Measures in Effectiveness Trials (COMET) initiative . COSMIN aims to improve the selection of OMIs, and has developed methodological standards for studies on the measurement properties of OMIs . COMET aims to facilitate the development and application of agreed standardized sets of outcomes, also known as “Core Outcome Sets” (COS). A COS is an agreed minimum set of outcomes that should be measured and reported in all clinical trials of a specific disease or trial population. It is a recommendation of what should be measured and reported in all clinical trials .
Once a COS is defined, it is then important to achieve consensus on how these outcomes should be measured, i.e., which OMIs should be selected. In the selection of OMIs, a number of tasks need to be performed. For example, a literature search to find potentially relevant OMIs, and a quality assessment to evaluate the (methodological) quality of the available OMIs. However, in addition to existing and valuable expert-based guidelines, such as the HOME Roadmap  and the OMERACT Handbook [http://www.omeract.org/], a consensus-based guideline on OMI selection is missing.
The primary aim of this study was to develop a guideline on how to select OMIs for outcomes included in a COS. However, a COS is not usually specific for any given clinical trial. A clinical trial may impose additional requirements for selecting OMIs perhaps relating to feasibility or sensitivity. We therefore had a secondary aim of investigating whether the methods for selecting OMIs for a COS are similar to the methods for selecting OMIs for individual clinical trials.
As details on the methods and design have been published previously , this section is restricted to a summary.
A Delphi study was performed to achieve consensus on relevant tasks that need to be performed in the process of selecting OMIs for outcomes (i.e., constructs or domains) included in a COS. The resulting guideline is based on the results of the Delphi study. Also, existing methodology that has been developed by COSMIN for performing systematic reviews of OMIs was used to support the guideline  as well as methodology that stems from the Outcome Measures in Rheumatology (OMERACT) Filter 2.0 and the OMERACT Handbook for developing COSs for rheumatic diseases [10, 11], and the Primary Outcomes Reporting in Trials (PORTal) initiative which looks at primary outcomes reported in adult and pediatric clinical trials . These other sources of evidence were used to expand on items to a level not discussed in the Delphi study.
To inform the Delphi study, a literature review was performed to identify existing studies that provide guidance on OMI selection. A health research librarian conducted an electronic literature search in November 2012 in MEDLINE, EMBASE, PsycINFO, and Cinahl.
Inclusion criteria: studies that were guidelines, meta-analyses, review articles, or systematic reviews, and study protocols that developed or applied methodology for selecting outcomes or OMIs to be used in clinical trials. Exclusion criteria: studies that discussed “how to measure” rather than “how to select” outcomes or OMIs for use in clinical trials; and studies that aimed to evaluate the measurement properties of OMIs.
All search strategies are presented in Additional file 1.
Development of the Delphi questionnaire
The potentially relevant tasks on OMI selection identified from the literature review were included in the Delphi questionnaire. Questions were formulated on the relevance of each of the tasks, for example: “Should COS developers agree upon the target population before starting to search for outcome measurement instruments?” Response options included “highly recommended,” “desirable,” “not relevant,” and “not my expertise.” Free text boxes were included after each question to facilitate comments.
Selection of experts
Experts who were identified from the literature review, as well as experts who participated in a previous COSMIN Delphi study , were invited to participate. A “snowball sampling” approach was used to identify other potential experts. We found no guidelines for sample sizes of Delphi studies, but in general having more panelists will facilitate acceptance and implementation of the guideline . Based on our previous experiences with Delphi studies [6, 13, 15, 16], we anticipated a response rate of between 30 and 40 %. We therefore invited all 481 previously identified experts to participate.
The Delphi study was planned to consist of three questionnaire rounds in order to achieve consensus . Panelists were asked to anonymously rate the relevance of different tasks on OMI selection. They were encouraged to justify their choices and to add other possibly relevant tasks. Subsequently, panelists were asked for their opinion on whether the methods for selecting OMIs for a COS are similar to the methods for selecting OMIs for individual clinical trials.
Consensus was defined as being achieved when at least 70 % of the panelists agreed with a task (i.e., highly recommended or desirable) with no opposing arguments provided, and when fewer than 15 % of the panelists disagreed with a task (i.e., “not relevant”). Tasks on which such consensus was reached were included in the guideline and panelists were not asked to vote for these tasks again. When at least 50 % of the panelists disagreed with a task (i.e., “not relevant”) and when no strong arguments in favor of this task were given, we excluded the task from the guideline. Tasks with an indeterminate response were taken to the subsequent round. When consensus was not reached after the third round, the need for a fourth questionnaire round was considered by the Delphi Steering Committee (CP, SV, MR, and CT).
Data were analyzed both quantitatively (absolute values, percentages) and qualitatively (listings of the comments and suggestions given by the panelists). Based on the responses given in the first round, including the comments given in the free text boxes, new proposals were formulated. Response options included –“strongly agree,” “agree,” “no opinion,” “disagree,” and “strongly disagree.” Additionally, new questions that arose based on the comments given were formulated and were marked as “new questions.” Panelists were asked to rate their agreement on the given proposals and the relevance of the new tasks in the second round. The results of the second round were then again analyzed for consensus following the same procedure as for the first round .
A total of 481 experts were invited to participate. Delivery failed to 41 recipients and four “out of office” notifications were received concerning long-term absence. A total of 120/436 panelists (28 %) accepted the invitation. 95/120 panelists (79 %), from 14 different countries, completed the first Delphi questionnaire (Table 1). The second questionnaire was completed by 65/95 (68 %), and 76/95 (80 %) completed the third questionnaire.
In the first round, panelists were asked to rate 78 questions. Consensus was reached on 58 questions (74 %).
In the second round, panelists were asked to rerate 20 questions on which no consensus was achieved in the first round. In addition, 19 new questions were formulated based on the additional comments invited in the first round. For 2/19 new questions, a 70 % or greater consensus was not reached (67 %Footnote 1 and 48 %Footnote 2, respectively). For 7/19 questions, consensus was reached (range 71 to 84 %) but 15 % or more of the panelists disagreed. In reviewing the panelists’ comments on these items, it was clear that for a total of eight questions we were too restrictive in our formulations, too brief in the descriptions of the tasks, or that certain tasks might not be applicable in all circumstances.
In the third round, panelists were provided with eight new formulations, instead of questions, of the paragraph for potential inclusion in the guideline intending to address nuances applicable to specific situations. For example, in the first round it was suggested that the selection of OMIs should always be guided by a review of the face validity of an OMI. In the second round, panelists were asked if COS developers themselves should assess the face validity of an outcome measurement instrument to be included in a COS. Eighty-four percent of the panelists agreed; however, 16 % of them (strongly) disagreed. It was argued that only if no face validity assessment is reported in the literature, COS developers should do it themselves. In the third round, we proposed the following recommendation for the guideline: “It is recommended that, in case no face validity assessment is reported in the literature, COS developers assess the face validity of an OMI to be included in a COS.” On all eight formulations of the paragraph for potential inclusion in the guideline consensus was reached (range 81 to 93 %), but 15 % or more of the panelists disagreed on three of these formulations (15 %, 15 %, and 19 %, respectively). As no opposing arguments were provided against these three formulations, the Steering Committee decided to include all eight proposed formulations in the guideline.
We reached consensus on four main steps in the selection of OMIs for outcomes included in a COS (Table 2). Each of these four steps includes a variety of tasks.
Step 1. Conceptual considerations
We reached 98–99 % consensus that the first step in the selection of OMIs is to agree in detail upon the construct (i.e., outcome or domain) to be measured  and the target population (e.g., age, gender, disease characteristics) (Table 2). This is a key task of the group developing a COS for which OMIs are sought.
Step 2. Finding existing outcome measurement instruments
We reached 70–99 % consensus that the second step is to find existing OMIs. With the intention to search for all existing OMIs, three sources of information can be used: (1) systematic reviews, (2) literature searches, and (3) other sources, considered as optional (Table 2). The COSMIN guideline for systematic reviews of OMIs recommends that those searching the literature for all OMIs do not use search terms to cover “type of OMI” because a wide variety of terminology is used (e.g., OMIs are also termed measures, methods, questionnaires, tests, etc.). This variety of terms that has been used in the original articles can lead to a high risk of missing relevant studies . There is, however, one exception for patient-reported outcome measures (PROMs): for these a comprehensive PROM filter, developed for PubMed by the Patient Reported Outcomes Measurement Group of the University of Oxford, can be used. This search filter is available through the COSMIN website . In all other cases it is recommended to only use search terms for “construct,” “population,” and “measurement properties” in the search for all OMIs .
Step 3. Quality assessment of outcome measurement instruments
We reached 70–97 % consensus that the third step in the selection of OMIs is quality assessment of the available OMIs. According to COSMIN, this includes two distinctive parts: (1) evaluation of the methodological quality of the included studies by using the COSMIN checklist  and (2) evaluation of the quality of the OMIs (i.e., their measurement properties and feasibility aspects) by applying criteria for good measurement properties (Table 2) .
Following the COSMIN taxonomy on which international consensus was reached [6, 13], all nine measurement properties were considered relevant in the selection process of OMIs for outcomes included in a COS (Table 3). Consensus was achieved on the criteria for good measurement properties (Table 4). The quality assessment applies to all different types of OMIs, such as assessments by health professionals, biomarkers, clinical rating scales, imaging tests, laboratory tests, patient questionnaires, and performance-based tests, and the applicable measurement properties should be evaluated.
In the evaluation of the measurement properties of the OMIs that could potentially be included in a COS, COSMIN recommends a predefined order of importance of evaluating the measurement properties: (1) content validity, (2) internal structure (i.e., structural validity and internal consistency, and/or Item Response Theory (IRT)/Rasch model fit), and where applicable (3) the remaining measurement properties (i.e., reliability, measurement error, hypotheses testing, cross-cultural validity, criterion validity, and responsiveness). Content validity is considered to be the most important measurement property of an OMI because if it is unclear what the OMI is actually measuring, the assessment of the other measurement properties is not valuable. If the content validity of an OMI is poor or unknown, the OMI will not be further considered in the selection process. Subsequently, the internal structure (i.e., internal consistency and structural validity) should be evaluated. In case there is evidence that the internal structure of an OMI is poor, the OMI will not be further considered, i.e., the other measurement properties (including reliability, measurement error, hypotheses testing, cross-cultural validity, criterion validity, and responsiveness) will not be further evaluated .
To reach a conclusion about the overall quality of an OMI, an overall evaluation of the OMI should be constructed, based on all available evidence . This can be done by a best-evidence synthesis, where the quality of evidence should be graded for a body of evidence for each measurement property, taking into account the number of studies, the methodological quality of the studies, and the consistency of the results of the measurement properties (Table 5) .
We reached 77–97 % consensus that COS developers should take feasibility aspects into consideration in the selection of OMIs for outcomes included in a COS (Table 6).
Step 4. Generic recommendations on the selection of outcome measurement instruments for a COS
We reached 81–90 % consensus on three generic recommendations concerning the final decision-making on including an OMI in a COS: (1) it is recommended to select only one OMI for each outcome (i.e., constructs or domains) in a COS, which will enhance the comparability of clinical trials, (2) it is recommended that an OMI can be provisionally included in a COS if there is at least high quality evidenceFootnote 3 for goodFootnote 4 content validity and good4 internal consistency (or evidence for test-retest or interrater reliability) and if the OMI is feasible, and (3) it is recommended that COS developers use a consensus procedure to get final agreement on the selected OMIs included in a COS among relevant stakeholders, including patients (Table 2).
Following the OMERACT Handbook, the next phase of research needs to be more explicit on what categories of stakeholders should be considered (patients, public, practitioner, press, policy-maker, program manager, professor, payer) and what the minimum requirements are for consensus [10, 11].
In addition, we reached 95 % consensus that, in general, the methods for the selection of OMIs for a COS are considered to be similar to the methods for selecting OMIs for individual clinical trials. However, as in practice it may not be feasible to perform all these steps for a clinical trial, trialists can then chose to use those OMIs that are included in a COS.
The four main steps, including their tasks, were included in the final guideline that can be found in Additional file 2.
The present guideline on methods for selecting OMIs can be used by COS developers in defining how to measure the core outcomes (i.e., constructs or domains) that are included in a COS. The guideline is based on the results of the Delphi study, the methodology derived from the COSMIN initiative, and recommendations from OMERACT . With this stepwise approach, we intend to optimize the methodology of selecting OMIs for outcomes included in a COS. The field of COS development is relatively new but rapidly growing; COMET maintains a database with the aim of including all registered and ongoing initiatives on COS development including, for example, the Harmonizing Outcome Measures for Eczema (HOME), and the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) initiatives. Currently, this database includes 249 published COS studies that relate to 300 COS, and 144 ongoing COS studies [5, 21]. Other examples of the potential impact of COSs are that the National Institute for Health Research’s (a UK research funding body) Health Technology Assessment program, requires COSs to be considered in the funding applications of clinical trials, and that Cochrane and Grading of Recommendations Assessment, Development and Evaluation (GRADE) are encouraging the use of COSs in reviews and clinical practice guidelines. We believe that methodology guidelines should be based on the agreed methodology so as to deliver high-quality COSs that can be used in future clinical trials and other research. Using high-quality COSs will ultimately improve the conduct and reporting of clinical trials, enhance the value of evidence synthesis by reducing heterogeneity between trials, and may reduce outcome reporting bias. COSs reflect the best evidence at the time. However, as the field of COS development is continuously evolving (e.g., existing OMIs are further tested and new ones are being developed), the OMIs included in a COS might be reconsidered and/or replaced in light of new evidence.
There may be good reasons for COS developers to deviate from the guideline. For example, OMERACT wants responsiveness to be assessed before inclusion in a provisional core set, whereas we reached consensus for at least high quality evidence for good content validity and for good internal consistency. Another example is that, although a Cronbach’s alpha of >0.95 usually indicates item redundancy, there may be good reasons to retain certain potentially redundant items in a questionnaire. Also, we realize that in practice not all steps might be feasible within a given time frame or budget. We recommend that COS developers should decide what is feasible in their time frame and within their budget.
Although the methods for the selection of OMIs for a COS are considered to be similar to the methods for selecting OMIs for individual clinical trials, it was argued that a higher standard for selecting OMIs for a COS may be justified. Furthermore, it may not be feasible to perform all these steps for a clinical trial. This underlines the importance of the development of COSs, as trialists can then chose to use those OMIs that are included in a COS. When the primary outcome of a clinical trial is not a core outcome, the COS still needs to be measured. However, trialists could apply these recommendations to select the OMI for their primary outcome.
We acknowledge the limitations that might arise because of the relatively low response rate to the initial invitation of our Delphi study. As the results of Delphi studies in general are highly dependent upon the composition of the panel, we aimed to include a sample of experts who represent diverse disciplines, institutes and organizations and reflect the population that is intended to use a guideline for OMI selection. However, it is difficult to examine the representativeness of the panelists as it is impossible to draw a random sample from all experts. Experts were, therefore, selected nonsystematically, which may be considered as a limitation of our Delphi study. Another limitation of our study is that we did not include patient research partners in the Delphi process. We acknowledge that, herewith, we may have omitted their contribution to the selection of OMIs.
This consensus-based guideline on the methods for selecting OMIs for outcomes included in a COS can be used by COS developers and clinical trialists to define how to measure core outcomes (i.e., constructs or domains) for any diseases or other condition in health and social care.
Question: “The minimum standard for internal consistency of outcome measurement instruments to be included in a COS should be a Cronbach’s alpha of between 0.70 and 0.90.”
Question: “If no outcome measurement instrument exists that meets the requirements for adequate measurement properties, it can be included in a COS “conditionally.” What should be the minimum condition before an instrument can be included in a COS?”
“High quality evidence” is defined as consistent findings in multiple studies of at least good quality OR in one study of excellent quality AND a total sample size of 100 patients or more (Table 5)
“Good” is defined as a “+” rating according to the criteria for good measurement properties (Table 4)
Core Outcome Measures in Effectiveness Trials Initiative
Core Outcome Set
COnsensus-based Standards for the selection of health Measurement INstruments
Outcome Measures in Rheumatology
Outcome Measurement Instrument
Chalmers I, Glasziou P. Avoidable waste in the production and reporting of research evidence. Lancet. 2009;374:86–9.
Williamson PR, Altman DG, Blazeby JM, Clarke M, Devane D, Gargon E, et al. Developing core outcome sets for clinical trials: issues to consider. Trials. 2012;13:132.
Gorst SL, Gargon E, Clarke M, Blazeby JM, Altman DG, Williamson PR. Choosing Important Health Outcomes for Comparative Effectiveness Research: An Updated Review and User Survey. PLoS One. 2016;11:e0146444.
COnsensus-based Standards for the selection of health Measurement INstruments. COSMIN website. Available from http://www.cosmin.nl/. Accessed 25 Apr 2016.
Core Outcome Measures in Effectiveness Trials. COMET website. Available from http://www.comet-initiative.org/. Accessed 31 Aug 2016.
Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63:737–45.
Clarke M. Standardising outcomes for clinical trials and systematic reviews. Trials. 2007;8:39.
Schmitt J, Apfelbacher C, Spuls PI, Thomas KS, Simpson EL, Furue M, et al. The Harmonizing Outcome Measures for Eczema (HOME) Roadmap: a methodological framework to develop core sets of outcome measurements in dermatology. J Invest Dermatol. 2014;135:24–30.
Prinsen CAC, Vohra S, Rose MR, King-Jones S, Ishaque S, Bhaloo Z, et al. Core Outcome Measures in Effectiveness Trials (COMET) initiative: protocol for an international Delphi study to achieve consensus on how to select outcome measurement instruments for outcomes included in a ‘core outcome set’. Trials. 2014;15:247.
OMERACT Handbook. Available from http://www.omeract.org/pdf/OMERACT_Handbook.pdf. Accessed 25 Apr 2016.
Boers M, Kirwan JR, Wells GA, Beaton DE, Gossec L, D’Agostino MA, et al. Developing core outcome measurement sets for clinical trials: OMERACT filter 2.0. J Clin Epidemiol. 2014;67:745–53.
Adams D, Liu Y, Bhaloo Z, Hansraj N, Hartling L, Vohra S. Primary Outcomes Reporting in Trials (PORTal): a systematic review of pediatric randomized controlled trials. (accepted by J Clin Epidemiol, 2016). 2013. in press.
Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010;19:539–49.
Murphy MK, Black NA, Lamping DL, McKee CM, Sanderson CF, Askham J, et al. Consensus development methods, and their use in clinical guideline development. Health Technol Assess. 1998;2:1–88.
Chiarotto A, Deyo RA, Terwee CB, Boers M, Buchbinder R, Corbin TP, et al. Core outcome domains for clinical trials in non-specific low back pain. Eur Spine J. 2015;24:1127–42.
Verhagen AP, de Vet HC, de Bie RA, Kessels AG, Boers M, Bouter LM, et al. The Delphi list: a criteria list for quality assessment of randomized clinical trials for conducting systematic reviews developed by Delphi consensus. J Clin Epidemiol. 1998;51:1235–41.
Jones J, Hunter D. Consensus methods for medical and health services research. BMJ. 1995;311:376–80.
Patient Reported Outcomes Measurement Group, Nuffield Department of Population Health, University of Oxford. Available from http://www.cosmin.nl/images/upload/files/PROM%20Gp%20filtersOCTOBER%202010FINAL.pdf. Accessed 05 Sep 2016.
Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60:34–42.
The Standards for Educational and Psychological Testing. http://www.apa.org/science/programs/testing/standards.aspx. Accessed 25 Apr 2016.
Gargon E, Gurung B, Medley N, Altman DG, Blazeby JM, Clarke M, et al. Choosing important health outcomes for comparative effectiveness research: a systematic review. PLoS One. 2014;9:e99111.
We thank all panelists for their time spent completing all three Delphi questionnaires, including the following who accepted our invitation to be mentioned in the acknowledgement section:
Jordi Alonso, Felix Angst, Dorcas Beaton, Anna Beurskens, Andrea E Bialocerkowski, Jane M Blazeby, Maarten Boers, Peter J Bower, Peter Bragge, Merel Brehm, Zeeshan Butt, Patrina HY Caldwell, Sarah L Chamlin, An-Wen Chan, Alessandro Chiarotto, Kaj S Christensen, Tammy J Clifford, Natalie J Collins, Megan Davidson, Aileen M Davis, Jill Dawson, Marcel P Dijkers, Fiona L Dobson, Antonio Escobar, Ricardo M Fernandes, Montse Ferrer, Carlos G Forero, Robert Frouf, Margreth Grotle, Jan Hartvigsen, Kirstie L Haywood, Bridget E Hill, Albere JA Koke, Ida J Korfage, Jan Kottner, Johanna H van der Lee, Sarah McIntyre, Joerg J Meerpohl, Lidwine B Mokkink, Jennifer L Moore, Catherine J Morgan, Kelly K O’Brien, Amy C Plint, Marcel WM Post, Kevin Pottie, Cecilia AC Prinsen, Carlijn van Randeraad-van der Zee, Bernd Reuschenbach, Ewa M Roos, Michael R Rose, Anne WS Rutjes, Jochen Schmitt, Beverley J Shea, Lee S Simon, Peter M Smith, Claire Snyder, Sally Spencer, Mirjam AG Sprangers, David L Streiner, Nicholas F Taylor, Caroline B Terwee, Kim S Thomas, Peter Tugwell, Peter Vedsted, Henrica CW de Vet, Ester Villalonga-Olives, Sunita Vohra, Angela C Webster, Anne Whitehead, Gavin Williams, Hywel C Williams, Paula R Williamson.
The research has received funding from the European Union’s Seventh Framework Program (FP7/2007-2013) under grant agreement number .
CP conceptualized the study protocol, developed the Delphi questionnaire, coordinated the Delphi study, analyzed the data, wrote the manuscript, and reviewed it for important intellectual content. SV reviewed the study protocol, reviewed the Delphi questionnaire, and reviewed the manuscript for important intellectual content. MR reviewed the study protocol, reviewed the Delphi questionnaire, and reviewed the manuscript for important intellectual content. MB reviewed and revised the manuscript for important intellectual content. PT reviewed and revised the manuscript for important intellectual content. MC reviewed and revised the manuscript for important intellectual content. PW reviewed and revised the manuscript for important intellectual content. CT conceptualized the study protocol, developed the Delphi questionnaire, analyzed the data, wrote the manuscript, and reviewed it for important intellectual content. All authors have given final approval of the version to be published and agreed to be accountable for all aspects of the work.
Dr CB Terwee developed the COSMIN checklist. Dr CB Terwee and Dr CAC Prinsen are members of the COSMIN Steering Committee.
Ethics approval and consent to participate
As this project does not involve patients or study subjects as defined by the Dutch Medical Research in Human Subjects Act (WMO), the study was exempted from ethical approval in The Netherlands and similarly in the UK. Ethical approval was needed, and was obtained, from the Health Research Ethics Board of the University of Alberta, Canada (reference number: Pro00048898). Since our study sample consisted of experts (defined for these purposes as people who have a credibility relating to the target audience as indicated by, for example, authorship of multiple frequently cited publications in this field) and not patients or vulnerable subjects, consent to participate was implied through return of the questionnaire.
About this article
Cite this article
Prinsen, C.A.C., Vohra, S., Rose, M.R. et al. How to select outcome measurement instruments for outcomes included in a “Core Outcome Set” – a practical guideline. Trials 17, 449 (2016) doi:10.1186/s13063-016-1555-2
- Core Outcome Set
- Delphi study
- Instrument selection
- Outcomes research
- Outcome measurement instrument