
Table 4 Criteria for good measurement properties

From: How to select outcome measurement instruments for outcomes included in a “Core Outcome Set” – a practical guideline

Measurement property Rating* Criteria Percentage of agreement in the Delphi study (%)
Content validity
(including face validity)
+ All items refer to relevant aspects of the construct to be measured AND are relevant for the target population AND are relevant for the context of use AND together comprehensively reflect the construct to be measured 97
? Not all information for ‘+’ reported  
– Criteria for ‘+’ not met
Structural validity + CTT:
Unidimensionality: EFA: first factor accounts for at least 20% of the variability AND ratio of the variance explained by the first to the second factor greater than 4; OR bi-factor model: standardized loadings on a common factor > 0.30 AND correlation between individual scores under a bi-factor and unidimensional model > 0.90
Structural validity: CFI or TLI or comparable measure > 0.95 AND RMSEA < 0.06 OR SRMR < 0.08
Rasch/IRT:
At least limited evidence for unidimensionality or positive structural validity
AND no evidence for violation of local independence: Rasch: standardized item-person fit residuals between -2.5 and 2.5; OR IRT: residual correlations among the items after controlling for the dominant factor < 0.20 OR Q3's < 0.37
AND no evidence for violation of monotonicity: adequate looking graphs OR item scalability > 0.30
AND adequate model fit: Rasch: infit and outfit mean squares ≥ 0.5 and ≤ 1.5 OR Z-standardized values > -2 and < 2; OR IRT: G² > 0.01
Optional additional evidence:
Adequate targeting: Rasch: adequate person-item threshold distribution; IRT: adequate threshold range
No important DIF for relevant subject characteristics (such as age, gender, education), McFadden's R² < 0.02
Percentage of agreement: CTT: 84; Rasch/IRT: 90
? CTT: Not all information for ‘+’ reported
IRT: Model fit not reported
– Criteria for ‘+’ not met
Internal consistency + At least limited evidence for unidimensionality or positive structural validity AND Cronbach's alpha(s) ≥ 0.70 and ≤ 0.95 89
? Not all information for ‘+’ reported OR conflicting evidence for unidimensionality or structural validity OR evidence for lack of unidimensionality or negative structural validity  
– Criteria for ‘+’ not met
Reliability + ICC or weighted Kappa ≥ 0.70 88
? ICC or weighted Kappa not reported  
– Criteria for ‘+’ not met
Measurement error + SDC or LoA < MIC 72
? MIC not defined  
– Criteria for ‘+’ not met
Hypotheses testing + At least 75% of the results are in accordance with the hypotheses 87
? No correlations with instrument(s) measuring related construct(s) AND no differences between relevant groups reported  
– Criteria for ‘+’ not met
Cross-cultural validity + No important differences found between language versions in multiple group factor analysis or DIF analysis 84
? Multiple group factor analysis AND DIF analysis not performed
– One or more criteria for ‘+’ not met
Criterion validity + Convincing arguments that gold standard is “gold” AND correlation with gold standard ≥ 0.70 88
? Not all information for ‘+’ reported  
– Criteria for ‘+’ not met
Responsiveness + At least 75% of the results are in accordance with the hypotheses 88
? No correlations with changes in instrument(s) measuring related construct(s) AND no differences between changes in relevant groups reported  
– Criteria for ‘+’ not met
  1. Modified from Terwee et al. [19]
  2. AUC = area under the curve, CFI = comparative fit index, CTT = classical test theory, DIF = differential item functioning, EFA = exploratory factor analysis, ICC = intraclass correlation coefficient, IRT = item response theory, LoA = limits of agreement, MIC = minimal important change, RMSEA = root mean square error of approximation, SEM = Standard Error of Measurement, SDC = smallest detectable change, SRMR = standardized root mean residuals, TLI = Tucker-Lewis index
  3. * “+” = positive rating, “?” = indeterminate rating, “–” = negative rating
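
As an illustration only, the threshold-based rows of the table can be read as simple decision rules. The Python sketch below encodes four of them (reliability, internal consistency, measurement error, hypotheses testing). The function names, the argument names, and the use of `None` for "not reported" are assumptions made for this example and are not part of the guideline; an ASCII hyphen stands in for the “–” rating.

```python
from typing import Optional

# Rating symbols from the table footnote ("-" stands in for the en dash).
POSITIVE, INDETERMINATE, NEGATIVE = "+", "?", "-"


def rate_reliability(icc_or_kappa: Optional[float]) -> str:
    """'+' if ICC or weighted Kappa >= 0.70; '?' if not reported."""
    if icc_or_kappa is None:
        return INDETERMINATE
    return POSITIVE if icc_or_kappa >= 0.70 else NEGATIVE


def rate_internal_consistency(unidimensional: Optional[bool],
                              alpha: Optional[float]) -> str:
    """'+' needs evidence for unidimensionality AND 0.70 <= alpha <= 0.95.

    unidimensional (hypothetical encoding): True = at least limited evidence
    for unidimensionality or positive structural validity; False = evidence
    against; None = not reported or conflicting. Per the table, both False
    and None give '?'.
    """
    if unidimensional is not True or alpha is None:
        return INDETERMINATE
    return POSITIVE if 0.70 <= alpha <= 0.95 else NEGATIVE


def rate_measurement_error(sdc_or_loa: float, mic: Optional[float]) -> str:
    """'+' if SDC or LoA < MIC; '?' if MIC is not defined.

    Assumes the SDC or LoA value itself has been reported.
    """
    if mic is None:
        return INDETERMINATE
    return POSITIVE if sdc_or_loa < mic else NEGATIVE


def rate_hypotheses_testing(n_confirmed: int, n_tested: int) -> str:
    """'+' if at least 75% of results agree with the hypotheses.

    n_tested == 0 models the '?' case (no relevant results reported).
    """
    if n_tested == 0:
        return INDETERMINATE
    return POSITIVE if n_confirmed / n_tested >= 0.75 else NEGATIVE


if __name__ == "__main__":
    print(rate_reliability(0.82))
    print(rate_internal_consistency(True, 0.97))
    print(rate_measurement_error(2.0, None))
    print(rate_hypotheses_testing(3, 4))
```

The remaining rows (content validity, structural validity, cross-cultural validity, criterion validity) depend partly on qualitative judgement, such as "convincing arguments" or "adequate looking graphs", so they are left out of this sketch.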