
Table 2 Rate of correct verification from suspected matches

From: Evaluating the re-identification risk of a clinical study report anonymized under EMA Policy 0070 and Health Canada Regulations

Study

Data details

% of suspected matches verified as actual matches

Kwok, Lafky, and colleagues [25, 26]

Matched 15,000 Safe Harbor de-identified admission records from a regional hospital to a marketing dataset of 30,000 records

10% (2/20)

Elliot et al. [29]

Attempted to re-identify sampled records from the UK Labour Force Survey (LFS) and the Living Costs and Food Survey (LCF). Matching was performed with and without the Output Area Classifier (OAC), which provides more precise geography

• LFS: 12% (6/50) using web-based information for matching; 28% (14/50) using commercial data

• LCF: 10% (2/20) for the dataset without OAC; 43% (18/42) for the dataset with OAC

Tudor and colleagues [30, 31]

Examined tabular data consisting of 89 tables that were determined to be potentially high risk

• 36% of claims of identifying a neighbor were correct

• 61% of claims of identifying self or family were correct

• All claims, except one, involved people the intruder knew

Sweeney [45]

News reports of hospitalizations (n = 81) were used to identify individuals in a Washington state hospital inpatient dataset of 648,384 records

23% (8/35)
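Each percentage in the table that comes with a fraction is the number of verified matches divided by the number of suspected matches, rounded to the nearest whole percent. The following is a minimal sketch of that arithmetic, not code from the study. The function name `verification_rate`, the list `REPORTED`, and the short row labels are hypothetical and chosen for illustration. The counts are copied from the rows above. The Tudor and colleagues entries are left out because the table gives no counts for them.

```python
# Illustrative only: recompute the verification rates reported in the table.
# Names and labels here are assumptions; the counts come from the table rows.

def verification_rate(verified: int, suspected: int) -> int:
    """Return verified/suspected as a percentage rounded to a whole number."""
    if suspected <= 0:
        raise ValueError("suspected must be a positive count")
    if not 0 <= verified <= suspected:
        raise ValueError("verified must lie between 0 and suspected")
    return round(100 * verified / suspected)


# (label, verified matches, suspected matches, percentage printed in the table)
REPORTED = [
    ("Kwok, Lafky, and colleagues", 2, 20, 10),
    ("Elliot et al., LFS, web-based information", 6, 50, 12),
    ("Elliot et al., LFS, commercial data", 14, 50, 28),
    ("Elliot et al., LCF, without OAC", 2, 20, 10),
    ("Elliot et al., LCF, with OAC", 18, 42, 43),
    ("Sweeney", 8, 35, 23),
]

if __name__ == "__main__":
    for label, verified, suspected, printed in REPORTED:
        rate = verification_rate(verified, suspected)
        status = "matches" if rate == printed else "differs from"
        print(f"{label}: {verified}/{suspected} = {rate}% ({status} the table)")
```

Run as a script, this prints one line per row and flags any row whose recomputed percentage differs from the printed one.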