Volume 12 Supplement 1

Clinical Trials Methodology Conference 2011


Central statistical monitoring in clinical trials

  • Amy A Kirkwood (1) and
  • Allan Hackshaw (1)
Trials 2011, 12(Suppl 1):A55


Published: 13 December 2011


On-site monitoring is a common but time-consuming and expensive activity, with little evidence that it is worthwhile. Centralised statistical monitoring (CSM) is a much cheaper alternative, where data checks are performed by the co-ordinating centre, reducing the need to visit every site. Although some publications have outlined possible methods, few have applied them to data from real clinical trials.


R programs were developed to check data at either the patient or the site level, for fraud or data errors. The checks included anomalous data patterns, digit preference, rounding, incorrect dates (e.g. visits recorded on weekends or holidays), values too close to or too far from the mean, odd correlation structures, and extreme values or variances. We applied these to three trials: (i) one where the data had already been checked, (ii) an ongoing trial where our findings could be checked in real time, and (iii) one where data errors and fake patients were deliberately created.
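Two of the simpler checks listed above, digit preference and dates falling on weekends, can be sketched as follows. This is a minimal illustration in Python, not the authors' R programs, which the abstract does not reproduce. The function names `terminal_digit_chi2` and `weekend_dates` are hypothetical, and the chi-square test on final digits is an assumed (though common) way of looking for digit preference.

```python
from collections import Counter
from datetime import date


def terminal_digit_chi2(values):
    """Chi-square statistic for uniformity of the final digit.

    Assumes integer-valued readings (e.g. blood pressure in mmHg).
    Under no digit preference each digit 0-9 is expected n/10 times.
    """
    counts = Counter(abs(int(v)) % 10 for v in values)
    expected = len(values) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected
               for d in range(10))


def weekend_dates(visit_dates):
    """Return the dates that fall on a Saturday or Sunday."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return [d for d in visit_dates if d.weekday() >= 5]


if __name__ == "__main__":
    rounded = [10 * i for i in range(20)]  # every reading ends in 0
    print(terminal_digit_chi2(rounded))
    visits = [date(2011, 12, 10), date(2011, 12, 11), date(2011, 12, 13)]
    print(weekend_dates(visits))
```

The statistic can be compared with a chi-square distribution on 9 degrees of freedom, whose 5% critical value is about 16.92. Holiday checks would need a calendar for the relevant country, which is omitted here.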


The programs were designed to run automatically and to produce simple tables or figures. As expected, few errors were detected in the trial where the data had already been checked; most data errors were found in the other two trials. The programs detected data errors, as well as fabricated patients that we had generated to have values too close to the multivariate mean (Figure 1). They also detected centres that had too few or too many serious adverse events (SAEs) (Figure 2). It might be difficult to apply some of the programs reliably to centres with few patients. Several fabricated patients were not detected, either because the data did not follow the assumptions used by the R programs or because the number of fabricated patients within a centre was too small. Examples of the different outputs, including easy-to-read diagrams and how they are interpreted, could be shown and discussed, along with their strengths and limitations.
Figure 1

Patient-level data checks. Output for one site, in which data were faked for 2 patients (shown in grey) by creating values for several variables that were close to the mean of all patients (a pattern that is more likely to occur when data are fabricated). Patients whose values lie too close to the multivariate mean are shown away from the others and were picked up (and circled in red) by the program.
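The idea behind this check, flagging patients whose values sit unusually close to the centre of the data, can be sketched as below. This is an illustrative Python sketch, not the authors' R program. It uses a simplified distance (the sum of squared z-scores), which ignores correlation between variables; a Mahalanobis distance would account for it. The names `distance_from_mean` and `flag_inliers` and the `fraction` cut-off are assumptions made for the example.

```python
from statistics import mean, stdev


def distance_from_mean(rows):
    """Sum of squared z-scores for each patient (row).

    Simplification: variables are treated as uncorrelated.
    Assumes no variable is constant (standard deviation > 0).
    """
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]
    return [sum(((x - m) / s) ** 2 for x, m, s in zip(row, mus, sds))
            for row in rows]


def flag_inliers(rows, fraction=0.1):
    """Return indices of the patients closest to the multivariate mean.

    `fraction` is a hypothetical cut-off: the share of patients to flag.
    """
    d = distance_from_mean(rows)
    k = max(1, int(len(rows) * fraction))
    closest = sorted(range(len(rows)), key=d.__getitem__)[:k]
    return sorted(closest)


if __name__ == "__main__":
    # Rows 2 and 5 sit exactly on the mean of both variables.
    data = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (3, 30)]
    print(flag_inliers(data, fraction=0.34))
```

A fixed fraction always flags someone, so flagged patients are candidates for review rather than confirmed fabrications; with few patients per site the distances are too unstable to interpret.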

Figure 2

Site-level data checks. The y-axis represents the SAE rate per site, allowing for patients' time in the trial. The lowest 10% of SAE rates are shown as black squares. The circled observation is a site where data were faked so that the site had too few SAEs compared with the average for all sites (horizontal line). Sites in the bottom right-hand corner have lower than expected SAE rates but relatively large numbers of patients, so could be targeted for on-site monitoring checks.
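The site-level comparison described in this caption can be sketched as follows, assuming each site supplies an SAE count and its total patient follow-up time. This is a hedged Python illustration, not the authors' R program. The function names, the use of patient-years as the time unit, and the pooled rate used as the reference line are all assumptions.

```python
def sae_rates(sites):
    """Map each site to its SAE rate (SAEs per unit of follow-up time).

    `sites` maps a site label to (number of SAEs, patient-years).
    Sites with no follow-up time are skipped.
    """
    return {s: n / t for s, (n, t) in sites.items() if t > 0}


def pooled_rate(sites):
    """Overall SAE rate across all sites (assumed reference line)."""
    total_sae = sum(n for n, _ in sites.values())
    total_time = sum(t for _, t in sites.values())
    return total_sae / total_time


def lowest_rate_sites(sites, fraction=0.10):
    """Return the sites in the lowest `fraction` of SAE rates."""
    rates = sae_rates(sites)
    k = max(1, int(len(rates) * fraction))
    return sorted(rates, key=rates.get)[:k]


if __name__ == "__main__":
    # Hypothetical counts and follow-up times, for illustration only.
    example = {"A": (10, 20.0), "B": (1, 25.0),
               "C": (8, 16.0), "D": (12, 30.0)}
    print(lowest_rate_sites(example))
    print(pooled_rate(example))
```

As the caption notes, a low rate matters more at a site with many patients, so the number of patients per site would be considered alongside the flagged rates.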


CSM appears to be a cost-effective and worthwhile alternative to on-site monitoring. It can identify incorrect patient data, or centres whose data, considered together, differ too much from those of all other sites and should therefore be reviewed. However, more research is needed to identify the situations in which CSM does not work well.

Authors’ Affiliations

CRUK and UCL Cancer Trials Centre


© Kirkwood and Hackshaw; licensee BioMed Central Ltd. 2011

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.