
Table 2 Rate of correct verification from suspected matches

From: Evaluating the re-identification risk of a clinical study report anonymized under EMA Policy 0070 and Health Canada Regulations

| Study | Data details | % of suspected matches verified as actual matches |
| --- | --- | --- |
| Kwok and Lafky and colleagues [25, 26] | Matched 15,000 Safe Harbor de-identified admission records from a regional hospital to a marketing dataset of 30,000 records | 10% (2/20) |
| Elliot et al. [29] | Sampled records from the UK Labour Force Survey (LFS) and the Living Costs and Food Survey (LCF) to re-identify. Matches were performed with and without the Output Area Classifier (OAC), which provides more precise geography | LFS: 12% (6/50) using web-based information for matching; 28% (14/50) using commercial data. LCF: 10% (2/20) for dataset without OAC; 43% (18/42) for dataset with OAC |
| Tudor and colleagues [30, 31] | Data examined were tabular in nature, consisting of 89 tables that were determined to be potentially high risk | 36% of claims of identifying a neighbor were correct; 61% of claims of identifying self/family were correct; all claims, except one, involved people the intruder knew |
| Sweeney [45] | News reports of hospitalizations (n = 81) were used to identify individuals in a Washington state hospital inpatient dataset of 648,384 records | 23% (8/35) |
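
Each percentage in the table that comes with counts in parentheses is the number of verified matches divided by the number of suspected matches. The following minimal sketch recomputes those rates from the reported counts. The dictionary labels and the function name `verification_rate` are chosen here for illustration and do not come from the source. The Tudor and colleagues rows are left out because the table gives no counts for them.

```python
# (verified, suspected) match counts as reported in Table 2.
# Labels are illustrative shorthand, not names used by the source.
counts = {
    "Kwok and Lafky [25, 26]": (2, 20),
    "Elliot LFS, web-based [29]": (6, 50),
    "Elliot LFS, commercial [29]": (14, 50),
    "Elliot LCF, without OAC [29]": (2, 20),
    "Elliot LCF, with OAC [29]": (18, 42),
    "Sweeney [45]": (8, 35),
}


def verification_rate(verified: int, suspected: int) -> float:
    """Percentage of suspected matches that were verified as actual matches."""
    if suspected <= 0:
        raise ValueError("suspected must be a positive count")
    return 100.0 * verified / suspected


# Round to whole percentages, as the table does.
rates = {
    label: round(verification_rate(verified, suspected))
    for label, (verified, suspected) in counts.items()
}

for label, pct in rates.items():
    print(f"{label}: {pct}%")
```

Rounded to whole percentages, the recomputed values agree with those shown in the table.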