Protocol for the Development of a Repository of Individual Participant Data From Randomised Controlled Trials Conducted in UK Adult Care Homes (The Virtual International Care Homes Trials Archive (VICHTA))


Background
Approximately 418,000 people live in care homes in the UK, yet accessible, robust data on care home populations and organisation are lacking. This hampers our ability to plan, allocate resources or prevent risk. Large randomised controlled trials (RCTs) conducted in care homes offer a potential solution. The value of detailed data on residents’ demographics, outcomes and contextual information captured in RCTs has yet to be fully realised. Irrespective of the intervention tested, much of the trial data collected overlaps, in terms of structured assessments and descriptive information. Given the time and costs required to prospectively collect data in these populations, pooling anonymised RCT data into a structured repository offers benefit; secondary analyses of pooled RCT data can improve understanding of this under-researched population, and enhance future trial design. This protocol describes the creation of a repository of individual participant data (IPD) from trials conducted in care homes, to address the need for accurate, high quality IPD on this vulnerable population.

Methods
Informed by scoping of relevant literature, the principal investigators of RCTs conducted in adult care homes in the UK since 2010 will be invited to contribute trial IPD. Contributing trialists will form a Steering Committee, who will oversee data sharing and remain gatekeepers of their own trial’s data. IPD will be cleaned and standardised in consultation with the Steering Committee for accuracy. Planned analyses include comparison of pooled IPD with point estimates from administrative sources, to assess generalisability of RCT data to the wider care home population. We will also identify key resident characteristics and outcomes from within the trial repository, which will inform development of a national minimum dataset for care homes. Following project completion, management will migrate to the Virtual Trials Archives, forming a legacy dataset which will be accessible to the wider research community for analyses.

Discussion
Analysis of pooled IPD has the potential to inform and direct future practice, research and policy at low cost, enhancing the value of existing data and reducing research waste. We aim to create a permanent archive for care home trial data, and welcome the contribution of emerging trial datasets.


Introduction
Background & rationale {6a}
Approximately 418,000 people live in care homes in the UK, yet accessible, reliable data on UK care homes, their residents, and staff are lacking. The dearth of accessible, high quality data has been highlighted previously, but was starkly exposed in the recent and continuing COVID-19 pandemic (2).
Information about care home capacity, staffing, health and social care needs and resident demographics is required to inform resource allocation and meet residents' care needs. Administrative data (e.g. the UK Office for National Statistics census) provide information about age, sex and demographic change in the care home population over time, but cannot be readily linked to the long-term health, function or quality of life of individual residents. Length of stay, life expectancy and mortality of the care home population are not reliably known. Large cohort studies of older adults give much richer health data, but the proportion of care home residents in such studies is low (3,4). For example, the Cognitive Function and Ageing Studies (CFAS) report on 543 residents and the English Longitudinal Study of Ageing (ELSA) reports on 303 residents (5,6). Internationally, large care home datasets are available, for example through insurance schemes in private healthcare systems. However, with any routinely collected data there are concerns over data quality, and for many of these registers the data are collected for a specific purpose only and may not contain the most relevant clinical information. In addition to problems sourcing data about residents, it is also difficult to find consistent information about the fragmented care home market, including staffing (ratios and retention), case mix, funding mix, and ownership. The lack of publicly available, national data on the care home sector is detrimental to those who live and work there. Failure to quantify the needs of those requiring care, and their journey before entering care homes, impairs local and national planning for the care needs of the ageing population living with dementia, multimorbidity and frailty (7). For example, it is estimated that care home capacity will need to expand so that those with complex needs can receive care at the end of their lives (8,9). However, current staffing, funding sources, resident pathways to care and capacity to provide care are unknown.
Large randomised controlled trials (RCTs) conducted solely in care homes are a growing resource (10), collecting detailed information about every care home and resident they recruit. These RCTs may focus on a variety of health/care topics (e.g. falls risk, medication management, nutrition, or infection). However, from the study team's experience of working with various care home trials, we know that there is much overlap in outcome measurement and in the information collected on both residents and the care home structure. Trials in care homes monitor participants regularly, often for up to one year. Outcome measures, health resource use, and clinical events as well as care home characteristics can therefore be tracked over this period, allowing for longitudinal analysis. Secondary analysis of individual participant data (IPD) allows for more complex and flexible analyses than is possible with only summary-level results. Whilst single care home trial datasets are valuable, if IPD from existing trials could be pooled, they would collectively provide a much larger, richer dataset on residents and staff of care homes. Repurposing care home trial data would permit rapid synthesis of large IPD datasets from which to generate evidence based on high quality data. This principle aligns with current moves towards improving efficiency and reducing research waste (11), a theme of increasing importance to funders and peer reviewers. Pooled IPD would permit exploratory analysis to better understand the care home population, reduce duplication of effort, and refine and pilot future research questions. The International Committee of Medical Journal Editors has reiterated its commitment to improve trial transparency by sharing IPD from RCTs and registries (12), and strives to normalise the sharing of de-identified trial data (13). UK Clinical Trials Units have also signalled their support (14), and all trials started after January 1, 2019, must include an IPD sharing plan in their trial registration (13).

Data repository models
Generic data repositories such as www.figshare.com and www.datadryad.org are available to access IPD from single trials. To allow data from multiple trials to be pooled into a single source within a secure data infrastructure, we will replicate the model developed by the Virtual Trials Archives (VTA) (15). VTA was established in 2001, bringing together multiple, large, international data sets from completed clinical trials on stroke research (16,17). It has since expanded to include two additional repositories in areas of cardiovascular and cognition (VICCTA), and renal transplantation (VIRTTA) (18). VTA is a not-for-profit collaboration, with datasets hosted by the Robertson Centre for Biostatistics (RCB) at the University of Glasgow, UK. The VTA facilitates a wide range of empirical and methodological research including recent projects on test accuracy (19), psychometrics (20), prognosis (21), and trial design (22). Unlike with a traditional IPD meta-analysis (23,24), a key tenet is that data should be used for novel research and not to test original hypotheses from contributed RCTs. Investigators can access data by submitting a research proposal on the VTA website. Following approval by the relevant repository Steering Committee (a virtual collaboration of the original trialists), data extraction is tailored to the specific research question, and the requesting investigator is granted access to analyse the bespoke data extract on a secure analysis platform. On completion, the anonymised data extract is archived centrally. The VTA is funded by administrative charges per data request, which support data curation, storage, continued development and day-to-day administration of the resource. VTA has a well-established governance infrastructure, with the ability to host data securely on a working data-sharing platform, and the expertise to manage future trial inclusion and data access requests.
To enable the care home trial repository to operate on a long-term basis, we are working closely with the VTA from the outset. Once operational, the repository will formally migrate to the VTA, where it will be named the Virtual International Care Homes Trials Archive (VICHTA).
This protocol describes the creation of a care home trial repository as part of a funded project (the Developing research resources And minimum data set for Care Homes' Adoption and use (DACHA) study; hereafter described as the 'development stage'), and also outlines plans for operation of the VICHTA repository that will be accessible beyond the DACHA study (hereafter described as the 'operational stage').
Our aims are to create a repository of IPD from RCTs conducted in UK care homes, and to use the repository data to conduct analyses to inform a care home minimum dataset relevant to the UK context (25). Internationally there is significant heterogeneity in the terminology used in practice and research to describe the settings in which long-term care is delivered (26,27). We have used the term 'care home' to describe care facilities that provide 24-hour care to their residents, including those with and without onsite registered nursing staff.

Identifying trials
A scoping review identified potential care home trials for inclusion. As part of preparatory work, we contacted a small number of trialists who had completed RCTs in UK care homes to date. Based on provisional agreement from five of these trialists, we anticipate the repository will initially combine trial data for over 4200 residents from 250 care homes across the UK. Through an ongoing scoping review, we have identified a further thirteen potential trials, representing an additional 6000 residents from approximately 500 care homes. We anticipate this will increase further as the project develops. Additional trials will be identified through an ongoing Google Scholar alert, systematically through concurrent reviews (Prospero: CRD42020155923), by contacting all trialists listed in the NIHR "Advancing Care" Themed Review (10) (44 studies featured), the CLAHRC National Work stream Report (28) (32 studies featured), and snowballing techniques utilising the DACHA project management team, study steering committees, and their professional networks.

Approaching/inviting trialists to share their data
We have created a database to track potentially eligible trials, where we will record how IPD are requested, collected, and managed, and log all contact with trialists. We will write to original trialists explaining the purpose of the repository and how it will operate. A reminder email will be sent two weeks after the initial contact if the trialist has not responded. If the trialist declines or does not respond, we will log this dataset as unavailable. Following a positive response, we will set up a meeting (phone, Zoom, or face to face depending on trialist preference) to outline the project in more detail. If a trialist agrees to participate, they will be asked to sign a data transfer agreement that covers the transfer, use and storage of their trial data (see Terms of Reference, Supplementary Index 1).

Establishing Trialist Steering Committee (TSC)
Contributing trialists will make up the TSC, to oversee sharing, combining and repurposing of the pooled trial data. While day-to-day co-ordination will be led by the DACHA co-ordinator at University of Hertfordshire (LI) and latterly the Virtual Trials Archive (MA), the TSC will agree on Terms of Reference for the collaboration, including the approval process for data requests, and will have the ultimate responsibility for all decisions regarding strategy, confidentiality, scientific matters and determining publication policy. This system mirrors the VTA, to which the care home repository will ultimately migrate.
The main role of the TSC during the DACHA-funded phase will be to provide advice on trial-specific details to aid with the pooling of datasets and better understanding of original data. Key information will be drawn from the original trial protocol, funder's report, and standard study documentation such as case report form templates and statistical analysis plans. If any issues cannot be resolved from those sources, we will seek clarification from the original trial team.

Phase 2: Creating repository, preparing data and pooling individual trial datasets

Contributing trial data to repository
Once an agreement has been made to contribute data, trial data managers (e.g. within Clinical Trials Units (CTU)) will be engaged to prepare datasets. As is standard practice with individual participant data sharing models (29), only completely anonymised data will be held in the repository, to minimise the risk of re-identification. We will request that all data received are fully de-personalised (such as converting 'date of birth' to 'age at randomisation'). Full instructions on de-identification and on how to transfer data securely will be provided if necessary.
Additional documents to support datasets will be requested, including the trial protocol and data dictionary. Optional supporting documents will include blank, annotated case report forms, statistical analysis plans, relevant published outputs or grey literature about the trial. We will request evidence of ethical approval and consent procedure (e.g. blank consent and/or assent forms).
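
As an illustration of the de-personalisation requested from contributing trial teams (for example, converting 'date of birth' to 'age at randomisation'), the following minimal Python sketch shows one way such a step could be carried out before transfer. It is not the procedure used by any contributing trial: the field names (`name`, `date_of_birth`, `randomisation_date`, `participant_id`) and the choice to drop the randomisation date are assumptions made for the example only.

```python
from datetime import date

# Hypothetical identifier fields; real datasets will use their own variable names.
IDENTIFYING_FIELDS = {"name", "date_of_birth", "randomisation_date"}


def age_at_randomisation(date_of_birth: date, randomisation_date: date) -> int:
    """Return age in completed years on the day of randomisation."""
    if randomisation_date < date_of_birth:
        raise ValueError("randomisation date precedes date of birth")
    # Compare (month, day) pairs to see whether the birthday has occurred yet.
    had_birthday = (randomisation_date.month, randomisation_date.day) >= (
        date_of_birth.month,
        date_of_birth.day,
    )
    return randomisation_date.year - date_of_birth.year - (0 if had_birthday else 1)


def depersonalise(record: dict) -> dict:
    """Drop identifying fields and add the derived, less identifying age variable."""
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    cleaned["age_at_randomisation"] = age_at_randomisation(
        record["date_of_birth"], record["randomisation_date"]
    )
    return cleaned


# Illustrative record with invented values.
example = {
    "participant_id": "P001",
    "name": "Example Resident",
    "date_of_birth": date(1935, 6, 15),
    "randomisation_date": date(2020, 6, 14),
}
print(depersonalise(example))
```

Deriving age in completed years, rather than keeping either date, means the transferred record carries only the derived variable; whether other fields also need coarsening would depend on each trial's own de-identification guidance.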

Repository Data storage
The Virtual Trials Archive team have developed a DACHA data contribution form (15) where trialists can record information about the trial and complete a memorandum of understanding. Following this, the trial dataset and all accompanying files will be transferred in a zipped, password protected folder to University of Glasgow (UG)'s Robertson Centre for Biostatistics (RCB), using the University of Glasgow's File Transfer Protocol, where it will be held securely for the duration of the DACHA study and beyond. As it does for other VTA repositories, the RCB will act as an independent data host, providing common format and access mechanisms. All data will remain on their server and be analysed through their secure analysis platform. During the development stage, access to the data will be restricted to the core team (LI, JB & MA), who have undergone necessary data protection and confidentiality training. At the end of the DACHA project, the VTA will act as custodians of the data under the terms of the data transfer agreement.

Data preparation and quality checks
When trial data are submitted to the repository, the DACHA co-ordinator (LI) at University of Hertfordshire (UH) will access the server remotely via secure virtual private network. A data checking analysis plan will be developed, outlining procedures and decision rules for data pooling, according to established principles (29). We will check for invalid, out-of-range, or inconsistent items and query any anomalies with the trialist (or their nominated study contact) to ensure that the data are represented accurately. Trials may use the same outcome measure but administer it differently. If a measure could be completed, for example, face-to-face with a member of the research team, as self-report, or as a proxy response from care staff, we will ensure the mode of administration is coded in a standardised way. Decisions on standardisation will be made by consensus with the wider TSC or delegated groups, e.g. trial statisticians. Where possible, we will request all individual domain levels for outcome measures as opposed to the single, composite scores. All trial datasets will be cross-checked against their respective protocol and statistical analysis plan to confirm how each composite outcome was derived. If the scoring was modified, we will seek advice from the respective trialists in the TSC on whether the composite outcome data should be removed or amended to enable pooling with other trial datasets. We will record the number and timing of measurement points and ensure all timepoints are labelled consistently.
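
The checks for invalid or out-of-range items described above could be automated along the following lines. This is a minimal Python sketch under stated assumptions, not the project's data checking analysis plan: the variable names and the age plausibility bounds are hypothetical, and in practice the valid ranges would be taken from each trial's data dictionary. Missing values are passed over rather than filled in, so no imputation is performed.

```python
# Illustrative valid ranges only. MMSE total scores run from 0 to 30; the age
# bounds are assumed plausibility limits, not values taken from the protocol.
VALID_RANGES = {
    "mmse_total": (0, 30),
    "age_at_randomisation": (18, 120),
}


def find_anomalies(records):
    """Return (row index, variable, value, reason) tuples to query with the trialist."""
    queries = []
    for index, record in enumerate(records):
        for variable, (low, high) in VALID_RANGES.items():
            value = record.get(variable)
            if value is None:
                continue  # missing values are retained as missing, never imputed
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                queries.append((index, variable, value, "invalid type"))
            elif not low <= value <= high:
                queries.append((index, variable, value, "out of range"))
    return queries


# Invented example rows: one clean, one out of range, one missing, one invalid.
rows = [
    {"mmse_total": 24, "age_at_randomisation": 85},
    {"mmse_total": 35, "age_at_randomisation": 90},
    {"mmse_total": None, "age_at_randomisation": 79},
    {"mmse_total": "n/a", "age_at_randomisation": 88},
]
for query in find_anomalies(rows):
    print(query)
```

Returning a list of queries, rather than correcting values in place, reflects the intention that anomalies are resolved with the original trial team and not by the repository staff alone.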
We anticipate there will be a strong opportunity for methodological research to look at groups of measures, e.g. cognitive assessments, to attempt mapping or potentially harmonising similar variables (30,31). We would encourage external researchers to look at this in the operational phase; however, in the development phase we will not attempt to harmonise non-matched data.
We anticipate most RCTs with an economic evaluation component will use a variant of the Client Service Receipt Inventory (CSRI) (32) to record information on resource use and costs alongside the trial. We will request all health service use questionnaires used in the trials and look for differences which may potentially impact findings. Due to differences in price years and interpretation of unit costs, we will focus on resource use (e.g. number of GP contacts) as opposed to costs (e.g. total cost of GP contacts over the follow-up period). We will request that datasets include missing values where possible, and not the imputed values. In developing the repository, we will not perform any missing data imputation.

Database of trial summaries
We have collated aggregate data available in each trial (generated through protocol papers and funders reports) and will build on this database as new trials are published. A summary of available data will be published on the VTA website, allowing viewers to identify what outcome measures have been collected multiple times, how care home characteristics have been recorded, and contextual aspects of each trial e.g. sample size and follow-up points.
The repository will host trials with a range of clinical focus; it is therefore likely that some measures will be unique to single trials. However, a combination of several key outcome measures, e.g. Barthel, MMSE, EQ5D and DEMQoL (33-36), is used in almost all RCTs conducted in care homes. Additionally, clinical indicators such as hospitalisations, falls, and death rates are routinely reported (see Appendix 2: Examples of data available from each trial).

Phase 3: Analysis of pooled data to inform DACHA study objectives
When the initial set of trials has been added and variables prepared for pooling, we will temporarily lock the repository to allow two pre-specified analyses:
1. Identification of key resident characteristics and outcomes from within the trial repository, which could be used to inform the development of a minimum dataset (MDS) for care homes
2. Comparison of the pooled individual participant data with point estimates from administrative sources to assess the generalisability of RCT data
We will prepare a detailed research plan for each analysis, outlining the purpose of the request, objective/research question, plan for statistical analysis, and repository variables requested. This research plan will then be circulated to the Trialist Steering Committee (TSC) for approval, as per future data requests from external analysts.

Informing development of a prototype minimum dataset (MDS) for care homes
Briefly: We will focus on which clinical, demographic, and outcomes data from trials may be appropriate to include in a care homes MDS framework. We will categorise outcome measures into broad areas, e.g. cognition, anxiety & depression, pain, mobility, activities of daily living (ADLs), and specific clinical measures, and will focus on pre-specified outcome measures, in part identified through existing work on evidence reviews (Prospero: CRD42020155923 and CRD42020171323). This identification and critique of relevant outcome measures within existing trials will help inform the development of a prototype MDS (25). We will develop quality assessment criteria to assess proposed outcome measures in terms of:
- what has been measured: baseline, processes of care, outcomes
- how data were collected (resident notes, researcher observation/assessment, use of routine data sources)
- completeness of the data and, where data are incomplete, the nature of this (i.e. death, unavailable, withdrawn consent, unable to complete, unclear)
- where outcomes are measured across multiple studies, the range of values
- where outcomes are measured over time, their sensitivity to detect change
- what information may be derived from collected data, e.g. comorbidity scoring based on medication usage

Generalisability of trial data
Briefly: We will conduct an evidence synthesis of key care home demographic information, by collating data from administrative sources, e.g. UK Census, Care Quality Commission. We will report baseline characteristics about care homes and residents as derived from all pooled trial data, tabulated for each individual trial and the pooled dataset. We will then compare point estimates from administrative sources with point estimates from the pooled IPD trial data, to evaluate how generalisable the repository data are compared to alternative data sources.
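
The protocol does not specify a statistical method for comparing pooled IPD with administrative point estimates. As one simple possibility, the Python sketch below pools a binary characteristic across trials, computes a Wilson score 95% confidence interval for the pooled proportion, and reports whether an administrative reference proportion falls inside it. The choice of the Wilson interval, the function names, and all counts are assumptions for illustration; the numbers are invented and are not results from any trial or administrative source. The simple pooling of counts also ignores clustering of residents within care homes and trials, which a full analysis would need to address.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denominator = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denominator
    return centre - half_width, centre + half_width


def compare_with_reference(trial_counts, reference_proportion: float) -> dict:
    """Pool (successes, n) pairs across trials and compare with a reference value."""
    successes = sum(s for s, _ in trial_counts)
    n = sum(size for _, size in trial_counts)
    low, high = wilson_interval(successes, n)
    return {
        "pooled_proportion": successes / n,
        "ci": (low, high),
        "reference_within_ci": low <= reference_proportion <= high,
    }


# Invented counts: (residents with the characteristic, residents assessed) per trial.
hypothetical_trials = [(120, 200), (310, 450), (95, 150)]
print(compare_with_reference(hypothetical_trials, reference_proportion=0.70))
```

A reference value lying outside the interval would not by itself show that trial participants are unrepresentative, but it would flag a characteristic for closer examination.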

Phase 4: Preparing migration to Virtual Trials archive
The VICHTA repository will be a legacy output of the DACHA project: a valuable source of high-quality, anonymised, individual participants' data (IPD) to inform the development of future research, testing of hypotheses and optimisation of study design issues. We took an early decision to store all trial data solely on the University of Glasgow secure server, where the VTA is also stored. This means the repository will already have a permanent 'home' when the DACHA study ends. Management of the repository will be transferred from the DACHA team at University of Hertfordshire (LI, CG), to the VTA team at University of Glasgow (principally the VTA co-ordinator, MA). The VTA will maintain and update the VICHTA repository, and manage requests to access its data, in conjunction with the existing TSC.
Following formal migration to the VTA, external researchers may apply for data extracts, by submitting a project proposal (for review and approval by the TSC) and agreeing to predefined VTA data sharing terms and conditions (see Appendix 1). At the proposal stage, TSC members may declare an interest in joining the analysis team of a proposed project and take an active role, thereby meeting ICMJE criteria for authorship. All completed analyses will be forwarded to the TSC for review before submission for presentation or publication (see Data Processing Flowchart). The TSC is acknowledged on all publications using an "on behalf of VICHTA collaborators" by-line. Active involvement from each TSC member is encouraged but not essential, as data request decisions will be made by a quorum (see Appendix 3: Summary of Development and Operational Phases).

Oversight And Monitoring
Data protection considerations {27}
In sharing any form of individual participant data, protection of personal privacy must be upheld (37). A key factor in achieving this is to ensure that trial data are fully anonymised before they are added to the repository, to minimise the risk of re-identification. Electronic data will be stored securely on the University of Glasgow server and will not be transferred or copied to any other location. Any paper documentation linked to the study will be scanned and stored as electronic data in the DACHA Study OneDrive as well as within the RCB servers. The paper version will then be destroyed. Together with the Data Protection Officer at University of Hertfordshire, we have completed a Data Protection Impact Assessment to cover the research period of the DACHA study.

Research governance {5d}
University of Hertfordshire is the Sponsor for the study and their Ethics Review board has approved this methodology (HSK/SF/UH/04185 approved 18/06/2020). Virtual Trials Archive have overarching University ethical approval for all their repositories and will update this through University of Glasgow to include VICHTA. VTA will ask for indefinite ethics approval, subject to regular but infrequent reports at the discretion of the REC, e.g. 5-yearly, to minimise the administrative burden on both sides.

Data security
Access to data extracts is restricted to individuals who have been granted access by the TSC. The Robertson Centre for Biostatistics (RCB) is certified to ISO 9001:2015 for its Quality Management System and to ISO/IEC 27001:2013 for its Information Security Management System. RCB is audited every six months by the British Standards Institution (BSI) and is regularly audited by its sponsors and clients both prior to and during studies. RCB has extensive experience of managing data in the context of privacy and data protection legislation, including the Data Protection Act 2018 and EU General Data Protection Regulation. Extensive data security procedures are in place including firewall protection, virus detection, daily backups, routine transaction logging, restricted access, and on-site and off-site fire-proof storage of backups.

DACHA project management
The Virtual Trials Archives are coordinated on behalf of the steering committees by MA (a coordinator with more than 10 years of experience in running VTA). During the DACHA study, the TSC will be co-chaired by JB (co-investigator on DACHA) and TQ, also based at the University of Glasgow and experienced in chairing other VTA repositories. Chairmanship can be reassigned at the nomination of the TSC. The research team has extensive clinical trials experience and all members are familiar with handling confidential anonymised personal health data.
The DACHA project has an independent Steering Group which will oversee this work package and the wider aims of developing a minimum dataset for care homes. This committee meets twice per calendar year.

PPIE & Public consultation {31a}
Patient and public involvement and engagement (PPIE) for the DACHA study will be led by University of East Anglia, and our expert-by-experience co-applicant (a family carer). PPIE will be represented on the DACHA independent steering group by two carers with family living in care homes. A PPIE panel is planned to work as the hub of PPIE activities, made up of 8-10 people representing care home staff, managers, family carers of care home residents and representatives of people with dementia. This group will meet 4-monthly, initially virtually. The care home resident PPIE contribution will be supported through two groups based in Norfolk care homes.
In addition to the PPIE panel, DACHA will have four regional Expert Consultation groups, meeting annually. During these meetings we will explain what data will be available in the trial repository, and then ask members to identify research topics that may be important for further investigation. Residents, their relatives, care home workers, and managers are better placed to prioritise research questions on a practical level, so this exercise will help ensure the right issues are being addressed.

Discussion
Those living in care homes are a vulnerable population and research in this setting is challenging, not least due to high rates of incapacity and dementia (38). Re-use of data is efficient, minimising burden to overstretched staff if it reduces the need for primary data collection. It adds value to the original trial question: while most trials are framed as health research questions, IPD provides the opportunity to address the questions and priorities of social care, including experiences of living and dying in care homes. This protocol defines the methods to curate a repository of care home trials for IPD analysis. It uses the existing, established infrastructure of the Virtual Trials Archive (1) to create this resource for informing the DACHA study and generating a legacy repository for future researchers. This represents an efficient use of existing research resources, enhancing the value from existing data, and reducing waste (39,40). In the absence of standardised data sets about care home residents, trial data will help us to understand more about this under-researched population. Curating a resource which is based on setting of care, rather than being disease-specific, is attractive as we recognise that many of the challenges facing health and care services are in caring for those with complex multimorbidity. Furthermore, there are lessons to improve future trial design, by exploring the value of the assessments and measures used in care home trials, to understand their utility, feasibility and relevance to care home life. Many of these tools were designed for use in community-dwelling adults or those in hospital settings and their applicability to the population living in care homes has yet to be established. IPD analysis can help address these questions, which are otherwise unanswered.

Study Status
Protocol version 4.
The project began on January 6, 2020. It is funded via the DACHA Study (NIHR127234) until October 30, 2023, after which the repository (VICHTA) will be maintained by the Virtual Trials Archive. We anticipate pooled datasets will be available for sharing by late 2023.