
Table 1 Steps taken to process the data

From: Some data quality issues at ClinicalTrials.gov

1a. Steps taken to process the data, with each record representing one trial

| Number | Processing step | Records selected | Records rejected | Table(s) with the details |
| --- | --- | --- | --- | --- |
| 1 | Downloaded files | 112,013 | | Additional file 3: Table S1 |
| 2 | Selected trials with the start date between 1/1/2005 and 12/31/2014 | 79,838 | 32,175 | Additional file 4: Table S2 |
| 3 | Selected trials with “drug:” or “biological:” in the intervention field | 64,496 | 15,342 | Additional file 5: Table S3 |
| 4 | Selected trials with a completion date or primary completion date | 63,786 | 710 | Additional file 6: Table S4 and Additional file 7: Table S5 |
| 5 | Selected trials registered with a US authority | 35,121 | 28,665 | Additional file 8: Table S6 and Additional file 9: Table S7 |
| 6 | Selected trials that listed both the investigator’s name and role | 31,392 | 3,729 | Additional file 10: Table S8 |
| 7 | Selected trials where the number of investigators’ names matched the number of their roles | 31,375 | 17 | Additional file 11: Table S9 |
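
The counts in part 1a form a filtering funnel: the records rejected at each step should equal the difference between the records selected at the previous step and at the current one. The following is a minimal sketch of that consistency check, using only the figures reported in the table. The function name `check_funnel` and the list layout are illustrative assumptions, not part of the original analysis.

```python
# Records selected at steps 1 to 7 of Table 1a, in order.
selected = [112_013, 79_838, 64_496, 63_786, 35_121, 31_392, 31_375]
# Records rejected at steps 2 to 7 (step 1 reports no rejections).
rejected = [32_175, 15_342, 710, 28_665, 3_729, 17]


def check_funnel(selected, rejected):
    """Return the step numbers whose counts do not reconcile.

    Step n reconciles when selected[n-1] - selected[n] equals the
    number of records rejected at step n.
    """
    mismatches = []
    for i, dropped in enumerate(rejected, start=1):
        if selected[i - 1] - selected[i] != dropped:
            mismatches.append(i + 1)  # convert list index to step number
    return mismatches


# An empty list means every step in part 1a reconciles.
print(check_funnel(selected, rejected))
```

An empty result indicates that each "Records rejected" entry accounts exactly for the drop in "Records selected" at that step.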

1b. Steps taken to process the data, with each row in Additional file 12: Table S10 and Additional file 13: Table S11 representing one investigator

| Number | Processing step | Names selected | Names rejected | Table(s) with the details |
| --- | --- | --- | --- | --- |
| 8 | In trials where there were multiple names, separated them to create pairs of single names and their corresponding roles | 71,359 | | Additional file 12: Table S10 |
| 9 | Selected trials where the investigator’s name was that of a real individual | 60,787 | 10,572* | Additional file 13: Table S11 |

*These 10,572 “names” came from 8,907 NCT IDs.
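
Steps 7 and 8 can be illustrated with a short sketch: a trial is kept only when its name field and role field contain the same number of entries, and the entries are then paired one to one. The table does not state how multiple names are delimited in the source records, so the `|` separator, the function name `split_investigators`, and the sample values below are all hypothetical.

```python
def split_investigators(names_field, roles_field, sep="|"):
    """Pair each investigator name with its role.

    Returns a list of (name, role) tuples, or None when the number of
    names differs from the number of roles (the step 7 rejection case).
    The separator is an assumption, not taken from the source data.
    """
    names = [n.strip() for n in names_field.split(sep) if n.strip()]
    roles = [r.strip() for r in roles_field.split(sep) if r.strip()]
    if len(names) != len(roles):
        return None
    return list(zip(names, roles))


# Hypothetical record with two investigators and two roles.
pairs = split_investigators(
    "Jane Doe | John Roe",
    "Principal Investigator | Study Director",
)
print(pairs)

# Hypothetical record with mismatched counts, which would be rejected.
print(split_investigators("Jane Doe | John Roe", "Principal Investigator"))
```

Each returned pair corresponds to one row of the kind counted under "Names selected" in step 8. Deciding whether a name belongs to a real individual (step 9) is a separate judgment that this sketch does not attempt.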