The International Stroke Trial database

Background: We aimed to make individual patient data from the International Stroke Trial (IST), one of the largest randomised trials ever conducted in acute stroke, available for public use, to facilitate the planning of future trials and to permit additional secondary analyses. Methods: For each randomised patient, we have extracted data on the variables assessed at randomisation, at the early outcome point (14 days after randomisation or prior discharge) and at 6 months, and provide them as an analysable database. Results: The IST dataset includes data on 19 435 patients with acute stroke, with 99% complete follow-up. Of these patients, 26.4% were aged over 80 years at study entry. Background stroke care was limited and none of the patients received thrombolytic therapy. Conclusions: The IST dataset provides a source of primary data which could be used for planning further trials, for sample size calculations and for novel secondary analyses. Given the age distribution and the nature of the background treatment given, the data may be of value in planning trials in older patients and in resource-poor settings.


Background
The International Stroke Trial (IST) was conducted between 1991 and 1996 (including the pilot phase between 1991 and 1993). It was a large, prospective, randomised controlled trial, with 100% complete baseline data and over 99% complete follow-up data. The aim of the trial was to establish whether early administration of aspirin, heparin, both or neither influenced the clinical course of acute ischaemic stroke [1].

Methods
The study had a prospective, randomised, open treatment, blinded outcome (PROBE) design. The inclusion criteria were: clinical diagnosis of acute ischaemic stroke, with onset within the previous 48 hours and no clear indication for, or clear contraindication to, treatment with aspirin or subcutaneous heparin. Unlike many stroke trials of that era (and subsequently), the study did not set an upper age limit. Patients were to have a CT brain scan to confirm the diagnosis of stroke, and this was to be done before randomisation if at all possible. To enter a patient in the study, the clinician telephoned a central randomisation service (at the Clinical Trial Service Unit, Oxford). During this telephone call, the baseline variables were entered and checked; once they were validated, the computer allocated the treatment and the telephonist informed the clinician. The patients and treating clinicians were not blinded to the treatment given.
Early outcome data were collected by the treating physician, who completed a follow-up form at 14 days, death or hospital discharge (whichever occurred first). This form recorded data on events in hospital within 14 days, and the doctor's opinion on the final diagnosis of the initial event that led to randomisation. These unblinded data may therefore be subject to some degree of bias. The primary outcome was the proportion of patients who were either dead or dependent on other people for activities of daily living at six months after randomisation. This outcome was collected by postal questionnaire mailed directly to the patient, or (in Italy) by telephone interview of the patient by a trained researcher, blinded to treatment allocation. The primary outcome was therefore assessed, as far as practicable, blind to treatment allocation and hence should be free from bias.
We re-checked the data set for inaccuracies and inconsistencies and extracted data on the variables assessed at randomisation and at the two outcome assessment points: at 14 days after randomisation, death or prior hospital discharge (whichever occurred first), and at 6 months.

Results
Consent for publication of raw data was not obtained from participants. Consent for participation in the trial was obtained from all subjects or from an appropriate proxy, according to the procedures approved by relevant national and local hospital ethics committees (or Institutional Review Boards [IRB]). These patients were treated 15-20 years ago, and many have died. The dataset (see additional file 1: IST_data.csv) is fully anonymous in a manner that can easily be verified by any user of the dataset. Patients and hospitals are identified only by an anonymous code; there are no identifying data such as name, address or social security number; patient age has been rounded to the nearest whole number. In our view, publication of the dataset presents no material risk to the confidentiality of study participants.
The dataset includes the following baseline data: age, gender, time from onset to randomisation, presence or absence of atrial fibrillation (AF), aspirin administration within 3 days prior to randomisation, systolic blood pressure at randomisation, level of consciousness and neurological deficit. The deficits were classified into one of the Oxfordshire Community Stroke Project (OCSP) categories: total anterior circulation syndrome (TACS), partial anterior circulation syndrome (PACS), posterior circulation syndrome (POCS) and lacunar syndrome (LACS). For events within 14 days, we extracted the occurrence of recurrent stroke, pulmonary embolism and death (date and cause of death). At 6 months we extracted degree of recovery, place of residence, current use of antiplatelet or anticoagulant drugs, and death (date and cause of death). The cause of death was classified as: due to initial stroke, recurrent ischaemic stroke, recurrent haemorrhagic stroke, pneumonia, coronary artery disease, pulmonary embolism, other vascular cause or a nonvascular cause. Patients were assigned to one of 6 categories according to their place of residence at 6 months after stroke: own home, relative's home, residential care, nursing home, other hospital departments or unknown. The variables extracted are listed with a brief description of each in Tables 1, 2 and 3.
[Table fragment: FU1_COMP, date discharge form completed (days elapsed from randomisation)]
A total of 19 435 patients from 467 hospitals in 36 countries were randomised within 48 hours of symptom onset, of whom 13020 had a CT before randomisation, 5569 were first scanned after randomisation and 846 were not scanned at all. Of the randomised patients, 5132 (26.4%) were aged over 80 years at study entry. Given that 5569 patients were first scanned after randomisation, and 846 were not scanned at all, the 'final diagnosis' is somewhat imprecise. However, since the analysis was by intention to treat, all participants were retained in the analysis, irrespective of the final diagnosis. The numbers of patients with each final diagnosis are given in Table 4. Whilst the 'final diagnosis' variable is of some interest, it may be influenced by events occurring after randomisation, so for any future analyses, the least biased assessment of the patient characteristics is that recorded at baseline, before randomisation.
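The counts quoted above can be cross-checked with simple arithmetic. The short Python sketch below uses only the figures reported in the text; the variable names are ours and are not names from the dataset.

```python
# Counts as reported in the text (variable names are illustrative only).
total_randomised = 19435
ct_before = 13020      # CT performed before randomisation
ct_after = 5569        # first scanned after randomisation
not_scanned = 846      # never scanned
aged_over_80 = 5132

# The three scanning groups should account for every randomised patient.
scan_total = ct_before + ct_after + not_scanned

# Percentages of all randomised patients, rounded to one decimal place.
pct_over_80 = round(100 * aged_over_80 / total_randomised, 1)
pct_ct_before = round(100 * ct_before / total_randomised, 1)

print(scan_total == total_randomised, pct_over_80, pct_ct_before)
```

The three scanning groups sum to the number randomised, and the proportion aged over 80 agrees with the 26.4% reported.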
To restrict analyses to cases of definite ischaemic stroke, confirmed at the time of trial entry, select records in which the variable denoting whether CT had been performed before entry is Y (RCT = Y) and the final diagnosis is also ischaemic (DDIAGISC = Y).
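As an illustration, this restriction might be sketched in Python as follows. The sketch assumes that RCT and DDIAGISC appear as column headers in the CSV file and are coded with the letter Y as described above; the function name and the small demonstration rows are hypothetical and are not IST data.

```python
import csv
import io

def definite_ischaemic(rows):
    """Keep rows with CT before entry (RCT = Y) and an ischaemic final diagnosis (DDIAGISC = Y)."""
    kept = []
    for row in rows:
        # Tolerate missing values, stray whitespace and lower-case codes.
        rct = (row.get("RCT") or "").strip().upper()
        isc = (row.get("DDIAGISC") or "").strip().upper()
        if rct == "Y" and isc == "Y":
            kept.append(row)
    return kept

# Made-up rows for illustration only; these are not IST data.
demo_csv = "RCT,DDIAGISC\nY,Y\nN,Y\nY,N\nY,U\n"
demo_rows = list(csv.DictReader(io.StringIO(demo_csv)))
subset = definite_ischaemic(demo_rows)
print(len(subset))
```

To apply the same filter to the published file, the rows could instead be read with `csv.DictReader(open("IST_data.csv", newline=""))`, assuming the file is in the working directory.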
Please note that, in the original 1997 Lancet report on the trial [1], Figures 2a and 2b reported the effects of allocation to aspirin and to heparin on the primary outcome, subdivided by various baseline characteristics and by the final diagnosis. The numbers of patients with each pathological type of stroke in that report differ somewhat from the numbers above, because they relate to the number of patients with complete 6-month follow-up data, whereas the numbers above relate to all randomised patients.

Anonymisation
As recommended by Hrynaszkiewicz et al. [2], we have removed all direct and indirect identifiers from the database. We therefore present each patient's age rounded to the nearest whole number of years. Time of admission to hospital (a potential identifier) was not recorded. Dates of events occurring after randomisation have been converted to the number of days from randomisation. The time variables that were recorded (see below) refer to the time of randomisation in the trial (i.e. the time at which the system generated the treatment allocation), not the time of admission to hospital; in our view, the time of randomisation would not help identify the patient.
[Table fragment: NCCODE, coding of compliance (see Table 3)]
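The conversion of post-randomisation event dates to days elapsed can be illustrated with a short Python sketch. This is not the procedure actually used to prepare the dataset, which is not described here; the function name and the dates are hypothetical.

```python
from datetime import date

def days_from_randomisation(randomised_on: date, event_on: date) -> int:
    """Return the number of whole days between randomisation and a later event."""
    delta = (event_on - randomised_on).days
    if delta < 0:
        # Only post-randomisation events are meaningful in this conversion.
        raise ValueError("event precedes randomisation")
    return delta

# Hypothetical dates for illustration only; these are not IST data.
randomised = date(1994, 3, 1)
discharged = date(1994, 3, 12)
elapsed = days_from_randomisation(randomised, discharged)
print(elapsed)
```

Storing only the elapsed days keeps the interval needed for analysis while discarding the calendar dates themselves.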

Discussion
This large data set, with very complete follow-up, includes a very broad range of acute stroke patients with a uniquely large number of very elderly patients, and so may be useful to researchers planning future research studies. Users of the dataset should be aware that the study was conducted at a time when stroke unit care was not widely available and thrombolytic therapy was used rarely (and none of the included patients received it) [3]. Thus, the background stroke care for the included subjects, while not typical of present-day acute stroke care [4], is perhaps more typical of current stroke care in resource-poor settings [5]. Given that the developing world faces a future epidemic of noncommunicable diseases, including stroke [5], these data may prove particularly valuable for planning future trials in resource-poor settings. In the developed world, the proportion of the general population who are 'very elderly' is rapidly increasing. Older people have been substantially under-represented in stroke trials to date [6], so we hope the large number of patients aged over 80 in this data set could also facilitate planning of trials in the 'older old'.
The publication of raw datasets such as the IST's may offer wholly unanticipated benefits to the wider research community. For example, the dataset was licensed to an independent statistical group who used the data to estimate the size and direction of biases introduced when non-randomised comparisons were made, and the differences between direct and indirect comparisons. This empirical work led to two important publications on the topic [7,8]. Such additional benefits, realised long after the original trial was completed, are a further clear indication of the value of opening access to such datasets.

Note for users of the data set
The authors ask that any publications arising from the use of this dataset acknowledge the source of the dataset, its funding and the collaborative group that collected the data.

Sources of Funding
The study was principally funded by the UK Medical Research Council, the UK Stroke Association, and the European Union BIOMED-1 program. Limited support for collaborators' meetings and travel was provided by Eli Lilly, Sterling Winthrop (now Bayer USA), Sanofi, and Bayer UK. Follow-up in Australia was supported by a grant from the National Heart Foundation and in Canada by a Nova Scotia Heart and Stroke Foundation grant. Czech Republic IST was supported by a grant from the IGA Ministry of Health. India IST was supported by the McMaster INCLEN program and the All India Institute of Medical Sciences. The IST in New Zealand was funded by the Julius Brendel Trust and the Lottery Grants Board. In Norway, the IST was supported by the Norwegian Council on Cardiovascular Disease and Nycomed (for insurance).

Additional material
Additional file 1: Database with information completed in IST.