Table 4 Criteria for good measurement properties

From: How to select outcome measurement instruments for outcomes included in a “Core Outcome Set” – a practical guideline

| Measurement property | Rating* | Criteria | Percentage of agreement in the Delphi study (%) |
| --- | --- | --- | --- |
| Content validity (including face validity) | + | All items refer to relevant aspects of the construct to be measured AND are relevant for the target population AND are relevant for the context of use AND together comprehensively reflect the construct to be measured | 97 |
| | ? | Not all information for ‘+’ reported | |
| | – | Criteria for ‘+’ not met | |
| Structural validity | + | CTT: Unidimensionality: EFA: First factor accounts for at least 20% of the variability AND ratio of the variance explained by the first to the second factor greater than 4 OR Bi-factor model: Standardized loadings on a common factor > 0.30 AND correlation between individual scores under a bi-factor and unidimensional model > 0.90. Structural validity: CFI or TLI or comparable measure > 0.95 AND RMSEA < 0.06 OR SRMR < 0.08 | CTT: 84 |
| | | Rasch/IRT: At least limited evidence for unidimensionality or positive structural validity AND no evidence for violation of local independence: Rasch: standardized item-person fit residuals between -2.5 and 2.5; OR IRT: residual correlations among the items after controlling for the dominant factor < 0.20 OR Q3's < 0.37 AND no evidence for violation of monotonicity: adequate looking graphs OR item scalability > 0.30 AND adequate model fit: Rasch: infit and outfit mean squares ≥ 0.5 and ≤ 1.5 OR Z-standardized values > -2 and < 2; OR IRT: G2 > 0.01 | Rasch/IRT: 90 |
| | | Optional additional evidence: Adequate targeting; Rasch: adequate person-item threshold distribution; IRT: adequate threshold range. No important DIF for relevant subject characteristics (such as age, gender, education), McFadden's R2 < 0.02 | |
| | ? | CTT: Not all information for ‘+’ reported; IRT: Model fit not reported | |
| | – | Criteria for ‘+’ not met | |
| Internal consistency | + | At least limited evidence for unidimensionality or positive structural validity AND Cronbach's alpha(s) ≥ 0.70 and ≤ 0.95 | 89 |
| | ? | Not all information for ‘+’ reported OR conflicting evidence for unidimensionality or structural validity OR evidence for lack of unidimensionality or negative structural validity | |
| | – | Criteria for ‘+’ not met | |
| Reliability | + | ICC or weighted Kappa ≥ 0.70 | 88 |
| | ? | ICC or weighted Kappa not reported | |
| | – | Criteria for ‘+’ not met | |
| Measurement error | + | SDC or LoA < MIC | 72 |
| | ? | MIC not defined | |
| | – | Criteria for ‘+’ not met | |
| Hypotheses testing | + | At least 75% of the results are in accordance with the hypotheses | 87 |
| | ? | No correlations with instrument(s) measuring related construct(s) AND no differences between relevant groups reported | |
| | – | Criteria for ‘+’ not met | |
| Cross-cultural validity | + | No important differences found between language versions in multiple group factor analysis or DIF analysis | 84 |
| | ? | Multiple group factor analysis AND DIF analysis not performed | |
| | – | One or more criteria for ‘+’ not met | |
| Criterion validity | + | Convincing arguments that gold standard is “gold” AND correlation with gold standard ≥ 0.70 | 88 |
| | ? | Not all information for ‘+’ reported | |
| | – | Criteria for ‘+’ not met | |
| Responsiveness | + | At least 75% of the results are in accordance with the hypotheses | 88 |
| | ? | No correlations with changes in instrument(s) measuring related construct(s) AND no differences between changes in relevant groups reported | |
| | – | Criteria for ‘+’ not met | |

  1. Modified from Terwee et al. [19]
  2. AUC = area under the curve, CFI = comparative fit index, CTT = classical test theory, DIF = differential item functioning, EFA = exploratory factor analysis, ICC = intraclass correlation coefficient, IRT = item response theory, LoA = limits of agreement, MIC = minimal important change, RMSEA = root mean square error of approximation, SEM = standard error of measurement, SDC = smallest detectable change, SRMR = standardized root mean square residual, TLI = Tucker-Lewis index
  3. * “+” = positive rating, “?” = indeterminate rating, “–” = negative rating
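
The simpler threshold criteria in Table 4 can be expressed as small rating functions. The sketch below is illustrative only and is not part of the guideline: the function names, the argument names, and the use of `None` for "not reported" are assumptions made for this example. It covers the reliability, internal consistency, measurement error, and hypotheses testing rows; the structural validity and content validity rows need more judgment than a threshold check and are left out.

```python
from typing import Optional

# Rating symbols as defined in the table footnote.
POSITIVE, INDETERMINATE, NEGATIVE = "+", "?", "–"


def rate_reliability(icc_or_kappa: Optional[float]) -> str:
    """'+' if ICC or weighted Kappa >= 0.70, '?' if not reported."""
    if icc_or_kappa is None:
        return INDETERMINATE
    return POSITIVE if icc_or_kappa >= 0.70 else NEGATIVE


def rate_internal_consistency(unidimensional: Optional[bool],
                              alpha: Optional[float]) -> str:
    """'+' needs evidence of unidimensionality AND 0.70 <= alpha <= 0.95.

    Assumption: unidimensional=None stands for missing or conflicting
    evidence, False for evidence against; the table rates both as '?'.
    """
    if not unidimensional or alpha is None:
        return INDETERMINATE
    return POSITIVE if 0.70 <= alpha <= 0.95 else NEGATIVE


def rate_measurement_error(sdc_or_loa: float, mic: Optional[float]) -> str:
    """'+' if SDC or LoA < MIC, '?' if the MIC is not defined."""
    if mic is None:
        return INDETERMINATE
    return POSITIVE if sdc_or_loa < mic else NEGATIVE


def rate_hypotheses(n_confirmed: int, n_tested: int) -> str:
    """'+' if at least 75% of the tested hypotheses are confirmed."""
    if n_tested == 0:
        return INDETERMINATE  # nothing reported to evaluate
    # Integer comparison avoids floating-point rounding at exactly 75%.
    return POSITIVE if 4 * n_confirmed >= 3 * n_tested else NEGATIVE


if __name__ == "__main__":
    print(rate_reliability(0.82))
    print(rate_internal_consistency(True, 0.97))
    print(rate_measurement_error(2.1, None))
    print(rate_hypotheses(3, 4))
```

Note that an alpha above 0.95 is rated negative here, because the table bounds the acceptable range on both sides, and that evidence against unidimensionality yields an indeterminate rating for internal consistency rather than a negative one.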