Table 1 Issues to consider. This table describes issues to be considered before the decision to use HSD for collecting outcomes in an RCT is finalised. The aim is to help the trial team make an informed judgement based on an understanding of the suitability of HSD for outcome data in the context of the specific clinical trial, and to build in mitigation, for example by including the option to supplement with data collected directly from participants or sites

From: Using healthcare systems data for outcomes in clinical trials: issues to consider at the design stage

Working through the items below may highlight ways trialists can work with HSD providers to improve how such trials are designed and delivered

    • Trialists should consider additional incurred costs or unanticipated workarounds required during trial, e.g. changes in legislation, delays in data release, periodic renewal of data sharing agreements

    • Strategies to address uncertainties might include building contingency fund or agreeing phased project plan with funder

    • Researchers are encouraged to risk assess a broad range of possible scenarios and consider potential mitigation strategies

(1) Terminology

    • Be aware terminology within data access applications will likely differ between providers; seek clarification/examples from provider

    • Ensure awareness of how terms can be interpreted by individuals involved across multiple organisations

(2) Feasibility

2.1 Team

        • Seek to include in trial team: trial operations professionals, data and health specialists with experience of completing data access forms and analysing data from provider/s for relevant health research question

        • Ideally needs to include individuals who:

            (1) understand datasets, structure, interpretation and quality;

            (2) understand how and when data are collected at source;

            (3) have skills to handle dataset when provided;

            (4) will undertake statistical and health economic analysis

        • Where knowledge gaps are identified, seek funding for training and development activities

2.2 Data

        • Trialists should be aware of how HSD are entered and coded, QA processes, how data are validated at point of upload and how they are transferred

        • Data providers should be approached to provide this information

        • Trialists should justify the use of healthcare systems datasets in the trial protocol and, in greater detail, in the appropriate section of the Trial Master File (https://doi.org/10.1016/S2589-7500(22)00122-4, https://zenodo.org/records/6047155 and Appendix 2 of https://zenodo.org/records/6047938/files/Routine_dataset_justification_template_v1.0_2022-02-15.docx?download=1)

2.2.a Does the HSD include what the trial needs?

        • Using data provider’s available data dictionary, establish which outcome measures are collected “routinely”

        • Ascertain costs of data provision

        • Ascertain data provider timelines for data verification/release

        • Consider need for repeated data releases and costs relating to data retention

        • Discuss processes for data linkage if linkage to trial cohort and/or multiple data sources is sought

        • If time and resources permit, interrogate for limitations before deciding to use HSD

        • Dataset may cover only subset of outcomes relevant to trial question. Consider how other outcome data will be collected, or whether benefit of using single approach to data collection outweighs value of collecting data across multiple sources

        • Additionally, take into consideration the follow-up outcomes and their availability from HSD

        • For registry-based trials, discuss whether registry team could adapt or supplement routine HSD collection to meet trial’s needs without compromising integrity of registry

        • Whether HSD may be appropriate for aspects of safety reporting depends on clinical trial risk profile. Consider during trial design and define clearly in protocol. Likely to be appropriate in low-risk trials where adverse events are not informing emergent safety profile of treatment

        • Timeliness of data provision should be considered in relation to safety monitoring plans

        • Establish whether any precedent, or evidence of public support for accessing these data for research, exists, or alternatively whether issues have arisen previously. Consider trial participants’ needs for understanding of the use of their HSD for outcomes in research and how that may vary according to study populations

2.2.b Data quality assurance

        • Establish whether provider can provide information regarding data provenance, integrity, and completeness

        • Understand timeliness of collection of data held by provider, e.g. whether there is lag between site data collection and entry into provider system, or whether data are only released at certain times of year

        • Understand how provider receives and processes data, and how changes in processing and coding are handled and communicated

        • Consider what is known, from previous literature, about validity and completeness of outcome data, which may include national audit reports

        • Assess whether it is realistic to be able to provide funder with accurate idea of HSD data quality at application, or whether it is possible to build in approaches to examine uncertainty during trial

2.2.c Time

        • Ask provider how long it will take from point of request and then from point of approval to supply specified dataset to trial team

        • Determine if contract includes binding timelines and decide what is acceptable delay for delivery of data for first occasion and subsequent deliveries

        • Establish whether this time will reduce if datasets are requested on multiple occasions during trial. Consider in relation to whether interim analyses are planned or when using HSD for monitoring safety outcomes

2.2.d Algorithms for deriving outcomes

        • Explore whether validated algorithm for deriving outcomes from HSD exists

        • If not, consider whether to include time to develop and test proposed algorithm within utility comparison

2.2.e Considerations around missing data

        • Be aware of timing of data entry processes into HSD resource by clinical teams and data entry clerks, and subsequent availability or missingness, which may vary across sites. For example, within registries outcomes may be entered on annual basis or annual reviews may be delayed

        • Be aware of how long data may take from local collection into national or collated set, and how long it takes for latter to be released

        • Discuss whether possible to go back to participating sites to collect missing data

        • Otherwise consider imputation from other available variables, or other HSD datasets, with collection of extra variables to maximise effectiveness of imputation method. This may be where contingency fund for unanticipated workarounds would be helpful
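
The point about missingness varying across sites can be checked once a dataset is received. The following is a minimal sketch only, assuming records arrive as one dictionary per participant; the field names `site` and `outcome` and the function `completeness_by_site` are hypothetical and are not taken from any provider's data dictionary.

```python
from collections import defaultdict


def completeness_by_site(records, field):
    """Return {site: proportion of records with a non-missing value for field}."""
    present = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["site"]] += 1
        # A value of None (or an absent key) is treated as missing (assumption).
        if rec.get(field) is not None:
            present[rec["site"]] += 1
    return {site: present[site] / total[site] for site in total}


# Illustrative records only, not real trial data.
example = [
    {"site": "A", "outcome": 1},
    {"site": "A", "outcome": None},
    {"site": "B", "outcome": 0},
]
summary = completeness_by_site(example, "outcome")
```

Sites with low completeness could then be prioritised when deciding whether to return to sites for missing data or to rely on imputation.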

2.2.f Consideration of potential reporting errors/discrepancies

        • Discuss mechanism and opportunity for resolution of discrepancies with provider

        • Ask provider whether they have guidance on range of possible solutions based on experience (e.g. rules of precedence, windows for ‘same dates’, impossible events)

        • Always include costs for managing data queries; these could be part of contingency management
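
The kinds of solution mentioned above (rules of precedence, windows for 'same dates', impossible events) can be written down as explicit checks before data arrive. This is a sketch under stated assumptions, not a provider's method: the three-day window, the default precedence of HSD over site data, and all function names are hypothetical and would need to be agreed with the provider.

```python
from datetime import date

# Hypothetical tolerance: dates this close together are treated as the same event.
SAME_DATE_WINDOW_DAYS = 3


def same_event(site_date, hsd_date, window_days=SAME_DATE_WINDOW_DAYS):
    """Return True when the two recorded dates fall within the agreed window."""
    return abs((site_date - hsd_date).days) <= window_days


def is_impossible(event_date, death_date):
    """Flag an event recorded after the participant's recorded date of death."""
    return death_date is not None and event_date > death_date


def resolve(site_date, hsd_date, precedence="hsd"):
    """Apply a rule of precedence; return None when dates need a data query."""
    if not same_event(site_date, hsd_date):
        return None  # outside the window: raise as a discrepancy
    return hsd_date if precedence == "hsd" else site_date


# Illustrative dates only.
agreed = resolve(date(2023, 1, 10), date(2023, 1, 12))
queried = resolve(date(2023, 1, 10), date(2023, 2, 10))
```

Counting how many records return `None` gives a rough basis for costing data query management.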

2.2.g Preparation of trial dataset

        • Discuss with provider whether raw data or analysis-ready data will be provided. For example, it may be useful to consider whether trial team will need to do additional analyses for primary analysis, implying raw data more appropriate

        • However, if the trial team has limited statistical support or only needs one or two defined analyses, analysis-ready data might be more appropriate

        • Cost and time may be a factor: access to analysis-ready data could be more costly or take longer to receive

        • Additional considerations might be ability to verify derivation of analysis-ready data undertaken by third party. Raw data might be more appropriate here, where the trial team has control over analysis steps provided local statistical expertise exists

(3) Internal pilot

        • If internal pilot to be undertaken, determine how use of HSD compares to collecting outcome data traditionally, e.g. in terms of sufficiency, timeliness, completeness, and cost-effectiveness. Trial team needs to consider whether setting up trial using both approaches justified in terms of cost and complexity, e.g. by providing added value for health area more widely than individual trial

        • If internal pilot felt to be valuable and feasible, consider progression criteria to be applied to aspects related to use of HSD

(4) Onward data sharing

        • Discuss funder’s requirements for onward data sharing and whether provider can approve, considering it can facilitate further research and extend efficiency gains

        • Ensure issues around onward sharing or subsequent access considered in data sharing agreement/contract as well as any resources involved

        • Consider prospectively who (in broadest sense, e.g. trial oversight committees, trial team, industry partners, future meta-analysts) needs to see HSD, as raw or aggregated data

        • Explore legal, ethical and governance responsibilities in advance within appropriate timeframes. There may also be implications for consent forms for the trial, allowing further use of data past initial trial

        • Ensure any ethical or governance issues regarding the sharing of data from the registry are addressed

(5) Data destruction and archiving

        • Discuss regulatory requirements for archiving period with data provider, ensuring archiving agreements compliant with clinical trials regulations

        • Discuss costs associated with holding data for archiving period, and permissions to retain anonymised data, in original or derived format, beyond archive period