Evaluating the re-identification risk of a clinical study report anonymized under EMA Policy 0070 and Health Canada Regulations

Table 2 Rate of correct verification of suspected matches

Study	Data details	% of suspected matches verified as actual matches
Kwok, Lafky and colleagues [25, 26]	Matched 15,000 Safe Harbor de-identified admission records from a regional hospital to a marketing dataset of 30,000 records	10% (2/20)
Elliot et al. [29]	Attempted to re-identify sampled records from the UK Labour Force Survey (LFS) and the Living Costs and Food Survey (LCF). Matching was performed with and without the Output Area Classifier (OAC), which provides more precise geography	• LFS: 12% (6/50) using web-based information for matching; 28% (14/50) using commercial data • LCF: 10% (2/20) for the dataset without OAC; 43% (18/42) for the dataset with OAC
Tudor and colleagues [30, 31]	Data examined were tabular, consisting of 89 tables that were determined to be potentially high risk	• 36% of claims of identifying a neighbor were correct • 61% of claims of identifying self or family were correct • All claims except one involved people the intruder knew
Sweeney [45]	News reports of hospitalizations (n = 81) were used to identify individuals in a Washington State hospital inpatient dataset of 648,384 records	23% (8/35)
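Each percentage in Table 2 is the ratio of suspected matches that were verified as actual matches to all suspected matches. The following minimal Python sketch recomputes those rates from the counts listed in the table. It assumes the table rounds to the nearest whole percent. The study labels and the `verification_rate` helper are hypothetical names introduced only for illustration. The Tudor and colleagues row is omitted because it reports no counts.

```python
import math
from fractions import Fraction

# (verified, suspected) counts copied from Table 2; the labels are illustrative only.
SUSPECTED_MATCHES = {
    "Kwok, Lafky and colleagues": (2, 20),
    "Elliot et al., LFS, web-based information": (6, 50),
    "Elliot et al., LFS, commercial data": (14, 50),
    "Elliot et al., LCF, without OAC": (2, 20),
    "Elliot et al., LCF, with OAC": (18, 42),
    "Sweeney": (8, 35),
}


def verification_rate(verified: int, suspected: int) -> int:
    """Hypothetical helper: percentage of suspected matches confirmed as actual
    matches, rounded half up to a whole percent (assumed rounding rule)."""
    if suspected <= 0 or not 0 <= verified <= suspected:
        raise ValueError("need suspected > 0 and 0 <= verified <= suspected")
    # Exact rational arithmetic avoids floating-point surprises at .5 boundaries.
    pct = Fraction(100 * verified, suspected)
    return math.floor(pct + Fraction(1, 2))


if __name__ == "__main__":
    for study, (verified, suspected) in SUSPECTED_MATCHES.items():
        print(f"{study}: {verification_rate(verified, suspected)}% ({verified}/{suspected})")
```

Under this rounding assumption, the recomputed rates agree with every count-based entry in the table. For example, 18/42 is about 42.9%, which the table reports as 43%.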

ISSN: 1745-6215