  • Study protocol
  • Open access

Measuring the effects of a personalized music intervention on agitated behaviors among nursing home residents with dementia: design features for cluster-randomized adaptive trial



Agitated and aggressive behaviors (behaviors) are common in nursing home (NH) residents with dementia. Medications commonly used to manage behaviors have dangerous side effects. NHs are adopting non-pharmacological interventions to manage behaviors, despite a lack of effectiveness evidence and a limited understanding of optimal implementation strategies. We are conducting an adaptive trial to evaluate the effects of personalized music on behaviors. Adaptive trials may increase efficiency and reduce costs associated with traditional RCTs by learning and making modifications to the trial while it is ongoing.


We are conducting two consecutive parallel cluster-randomized trials with 54 NHs in each trial (27 treatment, 27 control). Participating NHs were recruited from 4 corporations which differ in size, ownership structure, geography, and residents’ racial composition. After randomization, there were no significant differences between the NHs randomized to each trial with respect to baseline behaviors, number of eligible residents, degree of cognitive impairment, or antipsychotic use. Agitated behavior frequency is assessed via staff interviews (primary outcome), required resident assessments conducted by nursing staff (secondary outcome), and direct observations of residents (secondary outcome). Between the two parallel trials, the adaptive design will be used to test alternative implementation strategies, increasingly enroll residents who are likely to benefit from the intervention, and seamlessly conduct a stage III/IV trial.


This adaptive trial allows investigators to estimate the impact of a popular non-pharmaceutical intervention (personalized music) on residents’ behaviors, under pragmatic, real-world conditions testing two implementation strategies. This design has the potential to reduce the research timeline by improving the likelihood of powered results, increasingly enrolling residents most likely to benefit from intervention, sequentially assessing the effectiveness of implementation strategies in the same trial, and creating a statistical model to reduce the future need for onsite data collection. The design may also increase research equity by enrolling and tailoring the intervention to populations otherwise excluded from research. Our design will inform pragmatic testing of other interventions with limited efficacy evidence but widespread stakeholder adoption because of the real-world need for non-pharmaceutical approaches.

{2a} Trial registration

ClinicalTrials.gov NCT03821844. Registered on January 30, 2019. This trial registration meets the World Health Organization (WHO) minimum standard.


{6a} Introduction

Most people with dementia will manifest agitated and/or aggressive behaviors (behaviors) at some point during their disease [1]. These behaviors are a significant source of patient and caregiver distress and can precipitate placement in a nursing home (NH) [2]. In addition to decreasing the quality of remaining life for NH residents with dementia, behaviors can result in injury to other residents [3] and increased staff burnout [4]. Antipsychotic medications, often used to manage such behaviors, increase the risk of death in people with dementia [5]. To improve dementia care, there is a need to identify effective non-pharmaceutical interventions that improve behaviors.

One popular non-pharmaceutical intervention is Music & MemorySM (M&M). M&M is a personalized music program in which the music a resident liked as a young adult is loaded onto a personal music device and administered by NH staff to address agitation [6]. While the mechanism of action is unknown, evidence suggests early musical memories are stored in a part of the brain affected later in dementia [7]. Listening to music may elicit autobiographical memories [8,9,10] and evoke a relaxation response [11, 12]. We hypothesize behaviors resulting from social isolation, depression, confusion, or sensory deprivation [13] may be affected by M&M.

The need for non-pharmaceutical approaches for managing behaviors in residents with dementia has resulted in widespread adoption of M&M ahead of effectiveness evidence. The largest pragmatic, randomized trial of the program to date enrolled 59 residents with dementia from 10 NHs and found no significant decrease in agitation after exposure compared to usual care controls [14]. Weaknesses of that study include small sample size, lack of a measure of behaviors close in time to the intervention, and inadequate implementation (music was only used an average of 9 days a month) [14].

Our study addresses the limitations of previous studies by enrolling over 1200 NH residents from 81 NHs, directly observing residents close in time to delivery of the intervention, and by using an adaptive trial design to test alternative implementation strategies which may improve nursing staff uptake of the intervention. Adaptive trials can increase efficiency and reduce costs associated with traditional RCTs by learning and making modifications to the trial while it is ongoing [15].

{7, 8, 6b} This protocol describes two parallel cluster-randomized, superiority trials designed to test the effectiveness of a personalized music intervention on agitated behaviors among nursing home residents with dementia compared to usual nursing care for behaviors (an appropriate comparator for a pragmatic trial). {7} We will also describe how the adaptive design will be used to test alternative implementation strategies, increasingly enroll residents who are likely to benefit from the intervention, and seamlessly conduct a stage III/IV trial [16]. To our knowledge, this is the first cluster-randomized controlled trial to use an adaptive design.


Methods are reported using SPIRIT guidelines (see Additional file 1 for the checklist) [17].


Potentially eligible NHs from four partnering NH corporations were identified and allowed to opt in.

{10} NHs were potentially eligible if they had at least 20 residents who were long-stay (90 of the last 100 days spent in the NH), had a dementia diagnosis, and were not completely deaf. The number of eligible residents in a NH was determined using the Minimum Data Set (MDS) [11]. MDS data are derived from routine, standardized assessments of residents. These data are similar to electronic health record data that can be used to identify study-eligible patients in large embedded pragmatic trials or quality improvement programs [18].

NH leadership removed potentially eligible NHs with competing demands that would affect successful implementation, including a recent poor inspection or major leadership change. NHs with prior exposure to M&M were also removed. Priority was given to NHs located in a common geographic area to reduce data collection costs. There were 44 potentially eligible NHs for corporation A, 15 for corporation B, 19 for corporation C, and 55 for corporation D. Most NH administrators were interested in participating in the trial; five declined. We enrolled NHs when they returned their letters of commitment, until each corporation reached capacity (A 24 NHs, B 12 NHs, C 15 NHs, D 30 NHs). Capacity was determined by corporation relative size and the desire to have the same number of NHs in each of the arms of the two parallel trials, where control facilities in the first trial are assigned to the treatment group in the second trial.

Study settings

{9, 15} We elicited volunteer NH corporations via the American Health Care Association and approached four NH corporations to participate to assure diversity in size, ownership structure, geography, and residents’ racial composition (Table 1). Two for-profit corporations (one with fewer than 25 eligible NHs (small), one with more than 50 eligible NHs (large)) and two non-profit corporations (one small, one large) were recruited. The Midwestern corporations had predominantly white resident populations, and the mid-Atlantic and Southern corporations had higher proportions of African American residents (40–50%). The corporations also differed in baseline CMS 5-Star quality ratings [19], antipsychotic use, and percent of eligible residents with behaviors. A list of participating corporations can be found at

Table 1 Characteristics of participating corporations and their potentially eligible nursing homes


{11a} Music & MemorySM is a personalized music program in which the music a resident with dementia liked as a young adult is loaded onto a personal music device and administered by NH staff to preempt or reduce agitation [6]. Earphones are used to deliver the personalized music to the residents. From a list of potentially eligible residents, NH staff choose 15 residents to receive the program. NH staff are instructed to use the music at times of day when behaviors are likely or at early signs of agitation. The recommended dose is 30 min a day. {11d} The control condition is usual care, which may include the use of ambient music or group music.


{12} The primary study outcome is agitated behaviors. Agitated behaviors are measured in three ways: researcher-conducted interviews of NH staff about resident behaviors in the past 2 weeks (primary study outcome); researcher-collected direct observation of residents (secondary outcome); and NH-collected standardized assessment data about resident behaviors in the past week (secondary outcome).

Researcher-collected behavior data include staff interviews (primary outcome) and direct observations (secondary outcome). {18a} Data collectors receive a 3-day intensive training and are required to have weekly phone calls with study staff while in the field. To collect the study primary outcome, the research staff interview a nursing staff member who knows the resident well using the Cohen-Mansfield Agitation Inventory (CMAI) [20], which asks about the frequency of 29 agitated behaviors in the past 2 weeks. Response options for each CMAI item range from never (1) to several times per hour (7). The total CMAI score ranges from 29 to 203. The total CMAI serves as the primary outcome variable in this study. Using the Agitation Behavior Mapping Instrument (ABMI) [21], research staff also observe residents for short intervals (3 min per observation) and record the number of times that 14 specific agitated behaviors occur (range 0–140). Each resident is observed at least four standardized times over the course of each 3-day visit. Both the ABMI and CMAI have been widely used in the NH setting and have high interrater reliabilities (0.88 to 0.93) [20, 21].
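The CMAI total described above can be illustrated with a minimal sketch. The function name `cmai_total` and the input layout (a list of 29 integer ratings) are hypothetical and are not part of the study's data systems; only the item count, the 1 to 7 response range, and the 29 to 203 total range come from the text.

```python
CMAI_ITEMS = 29                  # number of agitated behaviors rated
MIN_RATING, MAX_RATING = 1, 7    # 1 = never ... 7 = several times per hour


def cmai_total(ratings):
    """Sum the 29 CMAI item ratings into a total score (29 to 203)."""
    if len(ratings) != CMAI_ITEMS:
        raise ValueError(f"expected {CMAI_ITEMS} ratings, got {len(ratings)}")
    for rating in ratings:
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"rating {rating} is outside {MIN_RATING}-{MAX_RATING}")
    return sum(ratings)
```

A resident rated "never" on every item scores 29, and a resident rated "several times per hour" on every item scores 203, which reproduces the stated range.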

NH-collected behavior data captured in the MDS (secondary outcome) includes frequency of physical behavioral symptoms directed toward others, verbal behavioral symptoms directed toward others, other behavioral symptoms not directed toward others, and behaviors related to resisting necessary care [22]. These items are derived from the CMAI domains and include many of the same behaviors. Frequency in the past week is reported as behavior not exhibited, behavior occurred 1–3 days, behavior occurred 4–6 days, or behavior occurred daily. These four behavioral frequency items in the MDS are summed to create the Minimum Data Set Agitated and Reactive Behavior Scale (MDS-ARBS), which has adequate internal consistency [23].
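As an illustration of how the four MDS behavior items might be summed, the sketch below assumes that the four response options are coded 0 to 3 in the order listed above. That numeric coding, the item keys, and the function name `mds_arbs` are assumptions made for this example; the text states only that the four items are summed.

```python
# Assumed numeric coding of the four response options (not stated in the text).
FREQUENCY_CODES = {
    "not exhibited": 0,
    "1-3 days": 1,
    "4-6 days": 2,
    "daily": 3,
}

# Hypothetical keys for the four MDS behavior items named in the text.
ITEMS = ("physical", "verbal", "other", "resisting_care")


def mds_arbs(responses):
    """Sum the four coded behavior-frequency items for one assessment."""
    missing = [item for item in ITEMS if item not in responses]
    if missing:
        raise ValueError(f"missing items: {missing}")
    return sum(FREQUENCY_CODES[responses[item]] for item in ITEMS)
```

Under this assumed coding the summed score ranges from 0 (no behavior exhibited on any item) to 12 (every behavior occurring daily).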

There are several other secondary outcomes. MDS data measure changes in the administration of antipsychotic, anxiolytic, and hypnotic medications. Another secondary outcome of interest is resident mood. The Lawton Observed Emotion Rating Scale (OERS) measures researcher-observed pleasure, anger, anxiety/fear, sadness, and general alertness in NH residents with advanced dementia [24]. Depressed mood is also assessed using a version of the Patient Health Questionnaire (PHQ-9) embedded in the MDS [25].

Data collection, transfer, and monitoring

{19} On-site data will be collected using tablets through data entry systems developed in Qualtrics. Data will be uploaded to the Qualtrics central servers using a secure channel. {27} When entering the study data in Qualtrics, the patient data will only be identified by pre-assigned study identification numbers; no personally identifying information (PII) or existing identifiers (e.g., medical record number, social security number) will be entered. Partnering corporations will also transfer their MDS data to the research institution servers via SFTP with password protection. The information systems manager will be in charge of all data transfers and will replace PII with study identification numbers to allow linkage of data for analytic purposes. {29} Data use agreements limit access to participant-level analytic files to the study team. {31c} The full study protocol and statistical code will be made public through the Brown data repository. {31a} Lay language results will be shared with partnering corporations and posted on the Brown University, Center for Long-Term Care Quality & Innovation public-facing website.

Standardizing and monitoring implementation

{11c} This study had a 6-month pilot phase focused on developing and testing a step-by-step implementation guide [26]. The guide explains how to identify residents’ preferred music, download it onto a personalized music device, and test and use the music with the resident.

All participating NHs receive two types of training. First, NH staff participate in standard M&M training and certification, which includes two 1.5-h live webinars describing the program. Second, an in-person training, developed by researchers during the pilot, is administered jointly by corporate leadership and study consultants. This training follows the steps outlined in the implementation guide. Staff required to attend the in-person trainings include the NH administrator, director of nursing, activities director, a nurse manager, and a certified nursing assistant.

Another aspect of the program implementation includes monthly coaching calls with the NHs to monitor progress, troubleshoot problems, and share successes. Monthly coaching calls are led by corporate trainers and study implementation consultants. Participation in monthly calls is tracked; calls are audio recorded.

Adherence is monitored using data from the personalized music devices. For each song on the device, these data document the length in minutes and a count of the times the song was played, yielding an estimate of the amount of exposure to the intervention for each resident.
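The exposure estimate described above can be sketched as follows. The function names and the `(length_minutes, play_count)` tuple layout are hypothetical. The sketch also assumes that every counted play ran for the full length of the song, so the result is best read as an upper bound on listening time.

```python
def exposure_minutes(songs):
    """Estimate total minutes of music from per-song device data.

    `songs` is an iterable of (length_minutes, play_count) pairs.
    Assumption: each counted play lasted the full song.
    """
    return sum(length * plays for length, plays in songs)


def mean_daily_minutes(songs, days):
    """Average estimated exposure per day over an observation window."""
    if days <= 0:
        raise ValueError("days must be positive")
    return exposure_minutes(songs) / days
```

For example, a 3-min song played 10 times and a 4.5-min song played 4 times give 48 estimated minutes, or 1.6 min a day over a 30-day window, which could then be compared with the recommended dose of 30 min a day.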

Interim analyses and stopping

{11b, 21b} There are no formal stopping rules for the two trials. The study may be discontinued at any time by the Institutional Review Board or the National Institute on Aging, as part of their duties to ensure that research participants are protected. {24, 26a} This was deemed a minimal risk study by the Brown University Institutional Review Board, which issued a waiver of individual consent (#1705001793). Given the short implementation period for each trial (8 months), interim analyses were not practical.


{16a, 16b, 16c} NHs were randomized within corporate strata. Within each corporation, NHs were partitioned into triplets based on the Mahalanobis distance from the overall mean [27] on percentage of eligible residents with any agitated or aggressive behavior and number of eligible residents. Balancing was important because behaviors vary considerably at the NH level because of resident composition, staffing, and the degree of “ascertainment” and documentation of agitated behaviors [26, 28], and NHs with more eligible residents can be more selective in who receives the intervention. Within each balanced triplet, each NH was randomly assigned to one of three groups: treatment in the first parallel trial; control in the first parallel trial and treatment in the second parallel trial; or control in the second parallel trial. Random assignment was performed by the study statistician (RG). After randomization, there were no significant differences between the NHs randomized to the three groups with respect to baseline behaviors, number of eligible residents, degree of cognitive impairment, or antipsychotic use (Table 2).
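One way to picture the triplet randomization is the sketch below. It is not the study statistician's code. It assumes that triplets are formed by ranking NHs within a corporation on their Mahalanobis distance from that corporation's mean and grouping consecutive NHs in threes; the text does not specify the partitioning rule or whether the mean is taken within or across corporations. The arm labels and function names are hypothetical.

```python
import random
from statistics import mean

ARMS = (
    "trial1_treatment",
    "trial1_control_then_trial2_treatment",
    "trial2_control",
)


def mahalanobis_from_mean(points):
    """Mahalanobis distance of each 2-D point from the mean of all points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(max(d2, 0.0) ** 0.5)
    return distances


def randomize_corporation(nhs, rng):
    """Assign NHs in one corporation to the three groups within triplets.

    `nhs` is a list of (nh_id, pct_with_behaviors, n_eligible) tuples.
    Assumed rule: rank by distance, then group consecutive NHs in threes.
    """
    if len(nhs) % 3 != 0:
        raise ValueError("number of NHs must be a multiple of 3")
    distances = mahalanobis_from_mean([(pct, size) for _, pct, size in nhs])
    ranked = [nh_id for _, nh_id in sorted(zip(distances, (nh[0] for nh in nhs)))]
    assignment = {}
    for start in range(0, len(ranked), 3):
        arms = list(ARMS)
        rng.shuffle(arms)  # random order of the three groups within a triplet
        assignment.update(zip(ranked[start:start + 3], arms))
    return assignment
```

Applied to corporations of 24, 12, 15, and 30 NHs, this scheme yields 27 NHs in each of the three groups, so that each parallel trial has 27 treatment and 27 control NHs.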

Table 2 Characteristics of nursing homes at baseline (post-randomization)


{17a} Only aggregated post-random assignment comparisons of intervention and control NHs’ baseline characteristics are viewed by the investigators. The study principal investigator is blinded to the identity of both the control and intervention NHs. {17b} Unblinding during a trial is not permissible.

Sample size

{14} The required number of clusters to reach a pre-specified power was derived such that each of the two parallel trials is adequately powered to detect an effect size of δ. This may result in a conservative sample size estimation of the second parallel trial, because we do not consider the incorporation of the information from the first parallel trial in the sample size calculation. We collect information about residents’ CMAI scores before and after the intervention is implemented for each of the two trials. Within each parallel trial, this design is referred to as a cluster-randomized trial with the pretest–posttest design [29]. It has been shown that adjusting the posttest score for the pretest score can improve the power of the study [29, 30]. To estimate the required sample size for different effect sizes, we used the formula proposed by Teerenstra et al. [30]. For significance level α and power 1-β, the formula for the required number of residents is:

$$ {n}_{res}=\frac{2{\left({Z}_{1-\frac{\alpha }{2}}+{Z}_{1-\beta}\right)}^2{\sigma}^2}{\delta^2}\left(1+\left(n-1\right)\rho \right)\left(1-{r}^2\right), $$

where \( {Z}_x \) is the critical value from a normal distribution at x, \( {\sigma}^2 \) is the variance of the outcome CMAI, δ is the effect size, ρ is the intra-class correlation, n is the number of residents per cluster, and r is the correlation between cluster means at baseline and at follow-up. The number of clusters required per arm is \( {n}_{res}/n \). Assuming a nominal level of α = 0.05 and power of 80%, Table 3 describes the required number of clusters per arm for different effect sizes based on n = 15, σ = 20, ρ = 0.12, and r = 0.5. For a 6-point reduction in the total CMAI score, 24 NHs per study arm are required. To address possibly higher ICC values, non-participation, and lower correlation between the baseline and outcome scores, 27 NHs per study arm are recruited.
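The sample size formula can be checked numerically with a short sketch. The function name `clusters_per_arm` is hypothetical, and rounding the number of clusters up to the next integer is an assumption about how the tabulated values were obtained; the default parameter values are those stated above.

```python
from math import ceil
from statistics import NormalDist


def clusters_per_arm(delta, sigma=20.0, rho=0.12, r=0.5, n=15,
                     alpha=0.05, power=0.80):
    """Clusters per arm from the pretest-posttest formula given in the text."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    n_res = (
        2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2
        * (1 + (n - 1) * rho)   # design effect for clustering
        * (1 - r ** 2)          # gain from adjusting for the pretest
    )
    return ceil(n_res / n)      # assumed rounding rule


print(clusters_per_arm(delta=6))  # 24
```

With δ = 6 the formula gives roughly 351 residents per arm, or 23.4 clusters of 15 residents, which rounds up to the 24 NHs per arm reported above.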

Table 3 Number of nursing homes needed for different effect sizes

Statistical methods

{20a, 20b, 20c} First parallel trial analysis

The analytic approach in the first parallel trial is based on the frequency of agitated and aggressive behaviors in a long-stay population with dementia after intended exposure to the intervention (treatment) or after 4 months (control), conditional upon survival to at least one post-intervention observation (up to 4 months after baseline measurement). {18b} Our primary analysis is based upon an intent-to-treat principle, and we estimate complier average causal effect as a secondary analysis. The complier analysis estimates the effects of the intervention for residents who received the music or would have received the music.

Our primary ITT analysis model is based on the model described by Murray and Blitstein [29] and Teerenstra et al. [30]. Let \( {Y}_{ijk} \) be the staff interview (CMAI) score for resident \( i\in \left\{1,\dots, n\right\} \) from NH \( j\in \left\{1,\dots, J\right\} \) at time \( k\in \left\{\mathrm{baseline},\mathrm{post}\hbox{-} \mathrm{exposure}\right\} \), and \( {X}_{ij} \) a set of baseline covariates for resident i from NH j. We assume that \( {Y}_{ijk}={\mu}_{ijk}+{\epsilon}_{ijk} \), where \( {\epsilon}_{ijk}\sim N\left(0,{\sigma}_{\epsilon}^2\right) \), and

$$ {\mu}_{ijk}=\mu +{\alpha}_1{I}_{ij}+{\alpha}_2{T}_{k=1}+{\theta}^t{X}_{ij}+\delta {T}_{k=1}{I}_{ij}+{u}_j+{\left( u\tau \right)}_{j,k}+{s}_{ij}. $$

We define \( {T}_{k={k}^{\prime }} \) to be an indicator function that is equal to 1 when k = k′ and 0 otherwise (so that \( {T}_{k=1} \) flags the post-exposure measurement), \( {u}_j\sim N\left(0,{\sigma}_u^2\right) \) is the deviation of cluster j from the overall mean, \( {\left( u\tau \right)}_{j,k}\sim N\left(0,{\sigma}_{u\tau}^2\right) \) represents the variation of each cluster at different time points, \( {s}_{ij}\sim N\left(0,{\sigma}_s^2\right) \) is the variation of individuals, \( {I}_{ij} \) is an indicator for participating in the intervention group, \( {\alpha}_1 \) is the difference in baseline averages between control and treated units, \( {\alpha}_2 \) is the change from baseline to follow-up for the control cluster means, θ is a vector of unknown coefficients, and δ is the conditional difference in change from baseline between intervention and control cluster means. The conditional treatment effect is then defined as δ. Individual-level covariates comprise baseline variables. The estimate of interest would be the difference in marginal means.
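To make the roles of the model terms concrete, the sketch below simulates data from the mean structure above, omitting the covariate term for brevity, and recovers δ with an unadjusted difference-in-differences of cell means. This is a simplified moment estimator used only for illustration, not the mixed model the analysis will fit. All function names and parameter values are hypothetical.

```python
import random
from statistics import mean


def simulate_trial(J, n, mu, alpha1, alpha2, delta,
                   sd_u, sd_ut, sd_s, sd_eps, rng):
    """Simulate rows (cluster, treated, k, y); k = 0 baseline, 1 post-exposure.

    The first J // 2 clusters are treated. Covariates are omitted.
    """
    rows = []
    for j in range(J):
        treated = 1 if j < J // 2 else 0
        u_j = rng.gauss(0, sd_u)                        # cluster effect
        ut = [rng.gauss(0, sd_ut), rng.gauss(0, sd_ut)]  # cluster-by-time effect
        for _ in range(n):
            s_ij = rng.gauss(0, sd_s)                   # resident effect
            for k in (0, 1):
                mu_ijk = (mu + alpha1 * treated + alpha2 * k
                          + delta * k * treated + u_j + ut[k] + s_ij)
                rows.append((j, treated, k, mu_ijk + rng.gauss(0, sd_eps)))
    return rows


def did_estimate(rows):
    """Difference in mean change from baseline, treated minus control."""
    def cell_mean(treated, k):
        return mean(y for _, t, kk, y in rows if t == treated and kk == k)
    return (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
```

Because the cluster and resident effects are shared by the baseline and post-exposure measurements, they cancel in each within-arm change, and the remaining contrast between arms isolates δ.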

To estimate the effects among participants who would comply with the intervention, we used a technique described by Jo et al. [31]. Let \( {c}_{ij} \) be an indicator that is equal to 1 if resident i in NH j would use the music if provided. We assume that residents who are not offered the music will not attempt to obtain it on their own. Eligible residents who receive care in an intervention NH but do not receive the intervention are referred to as “non-compliers.” The effects of the intervention would be estimated using \( {\mu}_{ij}={\beta}_0+{\beta}_c{c}_{ij}+{\alpha}_c{c}_{ij}{I}_{ij}+{\sum}_{l=1}^L{\gamma}_{ij l}{X}_{ij l}+{u}_{nbj}\left(1-{c}_{ij}\right)+{u}_{nwij}\left(1-{c}_{ij}\right)+{u}_{cwij}{c}_{ij}+{u}_{cbj}{c}_{ij} \), where the macro-unit residuals \( {u}_{nbj} \) (non-compliers) and \( {u}_{cbj} \) (compliers) represent cluster-specific effects given \( {I}_{ij} \) and \( {X}_{ij l} \), which are assumed to be normally distributed with zero mean and the between-cluster variances \( {\sigma}_{nb}^2 \) (non-compliers) and \( {\sigma}_{cb}^2 \) (compliers), respectively. The micro-unit residuals \( {u}_{nwij} \) (non-compliers) and \( {u}_{cwij} \) (compliers) are assumed to be normally distributed with zero mean and the within-cluster variances \( {\sigma}_{nw}^2 \) (non-compliers) and \( {\sigma}_{cw}^2 \) (compliers), which are equal across clusters. The following model for compliance status was assumed:

$$ P\left({C}_{ij}=1\right)=\frac{\exp \left({\sum}_{l=1}^L{\pi}_{ij l}{X}_{ij l}+{\tau}_j\right)}{1+\exp \left({\sum}_{l=1}^L{\pi}_{ij l}{X}_{ij l}+{\tau}_j\right)} $$

where \( {\pi}_{ij l} \) are unknown parameters and \( {\tau}_j\sim N\left(0,{\sigma}_{\tau}^2\right) \) so that the proportion of compliers may vary across clusters. Compliance status is only known in the intervention arm. Thus, a mixture model for compliance status in the control arm will be applied. Using the full likelihood, the effect among compliers is estimated:

$$ {\hat{\delta}}_c=\left({\hat{\delta}}_{c1}-{\hat{\delta}}_{c0}\right)/{\rho}_c $$

where \( {\hat{\delta}}_{ct} \) is the average CMAI score among compliers in treatment group t and \( {\rho}_c \) is the proportion of compliers. \( {\hat{\delta}}_{ct} \) can be obtained from the above models across NHs. The variance of this estimate can be obtained via the delta method or using Markov chain Monte Carlo techniques.
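The two closed-form pieces of the complier analysis, the compliance probability and the complier effect, can be written out as a minimal sketch. It evaluates the expressions exactly as given above for supplied parameter values and does not perform the full-likelihood mixture estimation. The function names and all numeric inputs are hypothetical.

```python
from math import exp


def compliance_probability(x, pi, tau_j):
    """P(C_ij = 1) from the logistic compliance model.

    `x` and `pi` are equal-length sequences of covariates and coefficients;
    `tau_j` is the random effect for cluster j.
    """
    if len(x) != len(pi):
        raise ValueError("x and pi must have the same length")
    eta = sum(p * v for p, v in zip(pi, x)) + tau_j
    return 1.0 / (1.0 + exp(-eta))  # equals exp(eta) / (1 + exp(eta))


def complier_effect(mean_c1, mean_c0, rho_c):
    """Complier effect as written above: (delta_c1 - delta_c0) / rho_c."""
    if not 0 < rho_c <= 1:
        raise ValueError("rho_c must be in (0, 1]")
    return (mean_c1 - mean_c0) / rho_c
```

With a linear predictor of zero the compliance probability is 0.5, and a larger cluster effect \( {\tau}_j \) raises the probability for every resident in that NH, which is how the model lets the proportion of compliers vary across clusters.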

Second parallel trial analysis

A similar model to the one described for the primary ITT analysis in the first parallel trial would be implemented in the second parallel trial. However, to gain efficiency among the control population, we would rely on the meta-analytic-predictive approach [32]. This approach assumes that model parameters for the control population of both trials are exchangeable and are drawn from the same distribution. In this trial, this assumption is appropriate, because all of the facilities were randomized at the beginning of the trial and they are treating a similar population of patients. Data on individuals who reside in control facilities in the first trial can be used to inform estimation of model parameters for individuals in control facilities in the second trial. This method was shown to achieve a gain in precision while maintaining type I error [33].

Data safety and monitoring

{21a} An independent data safety and monitoring board (DSMB) with no financial or other competing interests will act in an advisory capacity to the National Institute on Aging (NIA) Director to monitor participant safety, data quality, and progress of the study. {5d} The steering committee, consisting of the principal investigator (VM) and the project director (EM), will have ultimate responsibility for all aspects of the study, including ensuring timely submission of all requested project materials to the funder, serving as the primary liaison between the project and the NH corporations, coordinating tasks among individual working groups, ensuring project milestones are met, and reviewing and approving all publications. Members of the study team who will participate in the semi-annual sessions of the DSMB include the PI (VM), the lead biostatistician (RG), and the project director (EM). The NIA project officer will attend DSMB meetings and serve as the liaison between the DSMB and the funder.


{13} The study timeline is provided as Table 4.

Table 4 Timeline for the adaptive cluster-randomized trial

Reporting harms

{22} The potential adverse events that could occur during this trial are distress or strong negative emotional reactions in response to the intervention or distress or strong negative emotional reactions in response to being observed. NH staff and data collectors are trained to report potential adverse events to the project director (EM). The project director will report potential adverse events to the PI (VM) via email or telephone immediately upon becoming aware of the event. All potential adverse events will be investigated and independently verified by the study geriatrician (JR). Verified adverse events will be reported quarterly to the Data Safety Monitoring Board (DSMB), the Program Officer, and the IRB. Unanticipated harms will be reported to the DSMB, the Program Officer, the Office for Human Research Protections, and the IRB within 24 h of the research team becoming aware of the event. {23} During verification, if it is determined that the event does not meet the criteria for an adverse event or unanticipated problem, the event reporting form and the event verification form will be retained for auditing purposes.

Key design features of the adaptive trial

Test alternative implementation strategies

In a previous trial of M&M, a common implementation barrier cited was a lack of “buy-in” by nursing staff responsible for administering medications [34]. This lack of buy-in may, in part, result from a lack of nursing ownership early in the intervention. The training provided by M&M emphasizes the importance of identifying the songs the resident loved when s/he was a young adult [6]. To accomplish this, M&M recommends talking to family members and individually testing each song with the resident to look for a positive response. This time-consuming, trial and error process is typically completed by activity staff or volunteers [34]. Given activity staff work primarily during the day and do not administer medications, it is unlikely that they will be able to respond to behaviors in real time to reduce pro re nata (PRN) medication use.

However, personalization of the playlists is one of the core components [35] of the M&M intervention. In theory, the intervention works by eliciting memories triggered by music residents loved when they were young adults. There is some preliminary evidence to support that long-stored musical memories are retained into later dementia [7] and resident preferred music may provoke a more visceral reaction than calming music alone [36]. However, there is no evidence to suggest the degree of personalization that is necessary; does the music need to be the resident’s favorite songs or is familiarity sufficient? If familiarity is sufficient to calm behaviors, Spotify or similar streaming services would be a less time-consuming way to deliver the intervention.

To better understand the degree of personalization which is required to potentially affect behaviors in NH residents with dementia, we will test two implementation strategies separately in each of the two parallel trials. The first trial will use a fully personalized approach, in which activity staff test individual songs with residents to look for a positive reaction. Activity staff identify 25–50 songs that the resident appears to like, and the music player is then given to frontline nursing staff to use at early signs of agitation. The second trial will use a partially personalized strategy, in which nursing staff identify residents with behaviors who they think would benefit from the intervention. Then, research staff preload music players based on the demographics of the resident and his/her preferred genre (if known). Music players are sent directly to the nursing staff champion for use at early signs of agitation.

For each parallel trial, we will measure the degree of nursing engagement with the intervention by assessing the proportion of residents who are chosen for the intervention to address agitated behaviors and by asking nursing staff how often in the past week they have used the music with the resident. We will also measure the dose of music that is received under each approach. We will keep all the outcome measurements as close as possible in the two trials, while modifying the intervention delivery to better understand the importance of personalization on behavior and the effect of personalization on nursing use of the intervention.

Increasingly enroll residents who are likely to benefit from the intervention

Each participating NH is provided equipment for 15 residents to be exposed to the M&M program during the 8-month study period. Given that many sites have more than 15 potentially eligible residents, it is important to standardize the process for choosing residents. NHs in treatment and control arms of the parallel design are asked to select and rank order 15 residents to receive the intervention at baseline. Standardized guidance is provided to staff about how to choose and rank these residents. NHs are asked to start with residents who like music, are visible to staff during the day, and have specific, non-severe behaviors. Early successes are key to moving forward with widespread intervention adoption.

At the intervention midpoint (4 months), NHs are allowed to replace residents from their original lists who had died or been discharged from the NH. At this point, there is a potential for differential selection of replacements between treatment and control NHs because treatment NHs have been using the intervention and learning what type of residents seem to most benefit from the intervention. During the year between parallel trials, we will examine this selection process as well as play data from the music devices to identify resident demographic and clinical characteristics associated with greater use of the music devices and greater likelihood of being chosen by NH staff at the intervention midpoint. At the beginning of the second trial, we will use this information to help NH staff better choose residents who are likely to benefit from the program, a hallmark of adaptive trials [37].

Seamlessly conduct a combination stage III/IV trial

This study was originally designed as a stage IV embedded pragmatic trial (ePCT) [16], a hallmark of which is case and outcome ascertainment using available data sources (MDS and EHR) [38, 39]. However, during the pilot phase of this research, we found considerable under-detection of behaviors in the MDS data [23], raising questions about the sensitivity of MDS data to detect changes in behaviors resulting from the music intervention [26]. The protocol was altered to have researchers visit NHs and collect “gold standard” CMAI measures in addition to the NH-collected measure.

The CMAI and ABMI require researchers to visit NHs, observe residents, and interview staff, an expensive proposition for researchers and a less pragmatic approach than using existing data. To compensate for the under-ascertainment in the MDS, we will develop a statistical measurement model to equate the MDS-ARBS to the CMAI and ABMI resident behavioral data among the treated and control NHs using the complete data set from the first trial. This model will be validated using data from the second trial.

This statistical imputation model will be used in two ways. First, we will use the imputation model to address missingness of baseline CMAI in the current trials. Using the estimated relationship between instruments, CMAI and ABMI scores will be multiply imputed for residents for whom only the MDS-ARBS is available [40, 41]. We will rely upon a two-stage imputation procedure allowing us to compare all residents using common instruments, increasing the efficiency of the study design because these two measures are known to be reasonably correlated [42]. Formally, a multivariate ordinal probit model will be used to estimate the relationship between the three scales (CMAI, ABMI, MDS-ARBS) while adjusting for demographics and other characteristics (e.g., gender, race, physical function, and comorbidities) [41]. Using estimates from this model, CMAI and ABMI will be multiply imputed for residents who are missing a baseline or outcome measure. This will result in K imputed datasets with complete CMAI and ABMI scores, and the results across datasets will be combined using common combination rules [43].
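As an illustration of the combination step, the sketch below pools K per-dataset estimates with Rubin's rules, which we take to be the combination rules referenced in [43]. It is a minimal sketch under stated assumptions, not the trial's analysis code: the function name `pool_rubin`, the `PooledEstimate` container, and the numeric inputs are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass
from statistics import mean, variance


@dataclass
class PooledEstimate:
    estimate: float     # average of the K point estimates
    within_var: float   # average of the K squared standard errors
    between_var: float  # sample variance of the K point estimates
    total_var: float    # within + (1 + 1/K) * between


def pool_rubin(estimates, variances):
    """Pool K per-dataset estimates and their squared standard errors."""
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need K >= 2 estimates with matching variances")
    q_bar = mean(estimates)
    u_bar = mean(variances)
    b = variance(estimates)  # divides by K - 1
    total = u_bar + (1 + 1 / k) * b
    return PooledEstimate(q_bar, u_bar, b, total)


# Hypothetical treatment-effect estimates from K = 4 imputed datasets
# (illustrative numbers only, not trial results).
effects = [-2.0, -1.0, -3.0, -2.0]
squared_std_errors = [1.0, 1.0, 1.0, 1.0]
pooled = pool_rubin(effects, squared_std_errors)
print(pooled)
```

The total variance exceeds the average within-dataset variance whenever the imputed datasets disagree, which is how the uncertainty introduced by imputing CMAI and ABMI scores carries through to the final inference.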

Second, we will consider the generalizability to future pragmatic trials of non-pharmaceutical interventions for NH residents with dementia. If we demonstrate that our imputation model is relatively accurate, other researchers could use this model to generate a more sensitive score that can be used in large-scale pragmatic trials of non-drug interventions in this population. This would allow for cost-effective, large-scale evaluation when an intervention lacks effectiveness evidence and simple application of available administrative measures may not be appropriately sensitive.


Using an adaptive study design, we are conducting two parallel, cluster-randomized controlled trials. The adaptive design has three key features: test alternative implementation strategies, increasingly enroll residents who are likely to benefit from the intervention, and seamlessly conduct a stage III/IV trial. To our knowledge, this is the first cluster-randomized trial to utilize an adaptive study design.

The proposed adaptive design has the potential to reduce the research timeline by leveraging enrollment and recruitment for one large study to test two implementation strategies. The current best-practice M&M protocol involves full personalization of the music playlists through individual testing of the songs with the residents with dementia to look for a positive response [6]. While there is some evidence that early learned music is better for recall than late learned music [7], and that preferred music is better than “calming” music [36], there is no evidence on how personalized the music playlists need to be. The only trial of the existing best-practice protocol is small (59 residents) with low adherence (music was used an average of 9 days a month) [14]. The next step of this research is to test the same protocol with an adequate sample and increased adherence monitoring. However, qualitative work from the same study suggested that the process for identifying resident preferred music was time-consuming and potentially a barrier to use [34]. The adaptive trial design allows us to test the existing protocol in a larger trial with increased adherence monitoring and to conduct a subsequent trial with a partially personalized music playlist strategy. If partial personalization is sufficient, the intervention could be more readily implemented by nursing staff, which is likely to result in more substitution of the intervention for PRN medications.

Another benefit of this design is that it allows us to better identify who is likely to benefit from the intervention and test that hypothesis within the same trial. Often, we are forced to rely on post hoc subgroup analyses to describe populations who are most likely to be affected by the intervention. These types of analyses are hypothesis generating at best; they are often underpowered and can lead to spurious results [44,45,46]. In this adaptive trial, we will use the observed selection from the first trial as well as play data from the music devices to identify resident demographic and clinical characteristics associated with greater use of the music devices and greater likelihood of being chosen by NH staff. We will use this information to guide NH staff on the choice of residents who are likely to benefit from the program for the second trial. As the number and type of sensory and reminiscence therapies for people with dementia grow [47], it is important to be able to identify which non-pharmaceutical alternatives are likely to work for specific individuals [48]. This adaptive feature has the potential to help us better match available interventions to residents.

The combined stage III/IV feature of the adaptive trial design has the potential to produce a scalable, cost-effective solution for dealing with under-detection of outcomes in administrative data. Using routinely collected administrative data to assess outcomes for participants is one way to increase pragmatism in study eligibility and contain study costs [49, 50]. However, administrative data have known biases. In the case of behavioral data, our primary outcome, NH staff normalize the behaviors of residents that they interact with every day and document only the most severe behaviors, leading to under-detection in the associated measures [51]. By equating on-site researcher-collected data to available NH-collected administrative data at the resident level, we can derive a more sensitive behavioral score using available administrative data without on-site data collection.

For this trial, we originally proposed a stepped wedge design in which 81 NHs received the intervention over the course of 3 study years (27 NHs per year). Enrollment of residents for the first study year began in June 2019 and ended in January 2020. We were forced to pause the training and roll-out of the intervention in NHs randomized to receive the intervention in the second study year because of the emergency response to the coronavirus pandemic in nursing homes. The stepped wedge trial design is sensitive to confounding by time, particularly when time is correlated with the study outcome due to a secular trend (like the increased agitation which may well have occurred during a national pandemic) [52]. Thus, we believed that the use of a stepped wedge design to complete the remainder of the study was irreparably damaged by this exogenous shock. We revised our study protocol to include the use of an adaptive trial design to conduct two parallel trials. This modified trial protocol was approved by the National Institute on Aging and an independent data safety and monitoring board in December 2020.
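The confounding-by-time concern can be made concrete with a small, noise-free sketch. It assumes a simplified version of the original design in which three waves of NHs cross over to the intervention in study years 1, 2, and 3, each NH is measured once per year, and the outcome equals a linear secular trend plus a constant treatment effect. The function name, the yearly measurement schedule, and all numbers are hypothetical; the sketch shows only what an unadjusted treated-versus-untreated comparison would absorb and does not represent the trial's planned analysis.

```python
def naive_difference(true_effect, yearly_trend, n_waves=3, years=(1, 2, 3)):
    """Unadjusted treated-minus-control mean outcome in a stepped wedge.

    Wave w starts the intervention in year w and stays treated afterwards.
    Outcome = yearly_trend * year + true_effect * treated (no noise).
    """
    treated, control = [], []
    for wave in range(1, n_waves + 1):
        for year in years:
            is_treated = year >= wave
            outcome = yearly_trend * year + (true_effect if is_treated else 0.0)
            (treated if is_treated else control).append(outcome)
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical effect of -2.0 points on an agitation score.
no_trend = naive_difference(true_effect=-2.0, yearly_trend=0.0)
with_trend = naive_difference(true_effect=-2.0, yearly_trend=1.5)
print(no_trend, with_trend)
```

Under these assumptions, treated observations fall on average one year later than control observations, so the unadjusted difference equals the true effect plus one year of trend: -2.0 without a trend, but -0.5 when agitation rises by 1.5 points per year. A sudden shock such as the pandemic is a more extreme case of the same problem.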

This trial has limitations. Interventionists traditionally establish efficacy before testing effectiveness using pragmatic methods [16, 39]. Yet there may be valid reasons to test interventions with limited efficacy evidence under real-world conditions, for example, when there are populations or settings in which it is not possible to obtain traditional efficacy data [53]. We decided to proceed with this trial, in part because there is a pressing need for effective non-pharmaceutical interventions to address dementia-related behaviors in NHs and because traditional efficacy studies systematically fail to enroll complex populations and typically require proxy consent [54, 55]. Residents with involved proxies differ from typical residents with dementia in important ways, including race [56], that may affect consent and the generalizability of efficacy studies. In such instances, it may be important to accelerate the testing of promising interventions. There are also several characteristics of this trial design which are not fully pragmatic. The PRECIS-2 tool helps researchers identify and justify the level of pragmatism of their study along nine relevant domains [38]. Our trial is highly pragmatic in six of the nine trial domains (recruitment, setting, delivery, adherence, outcome, and analysis), reflecting the flexibility of real-world implementation and the primary intent-to-treat analysis. The trial is less pragmatic in three PRECIS-2 domains: follow-up, organization, and eligibility. Our deviations from full pragmatism are direct results of piloting our implementation, measurement, and recruitment strategies. We argue that fully pragmatic trials are rare [57, 58], and piloting helps researchers understand where compromises must be made along the explanatory–pragmatic continuum to maintain the integrity of the research [59].

This design has the potential to reduce the research timeline by leveraging enrollment and recruitment for one large study to test two implementation strategies, increasingly enrolling residents who are likely to benefit from the intervention, and addressing known limitations associated with using administrative data to evaluate behavioral outcomes. Similar approaches may be of interest to funders, researchers, and clinicians serving populations in need of timely solutions to real-world problems.

Availability of data and materials

The datasets generated from the proposed study will be made available in the Brown University data repository:


  1. Zuidema S, Koopmans R, Verhey F. Prevalence and predictors of neuropsychiatric symptoms in cognitively impaired nursing home patients. J Geriatr Psychiatry Neurol. 2007;20:41–9. 2007/03/08.

  2. Phillips VL, Diwan S. The incremental effect of dementia-related problem behaviors on the time to nursing home placement in poor, frail, demented older people. J Am Geriatr Soc. 2003;51:188–93. 2003/02/01.

  3. DeBois KA, Evans SD, Chatfield SL. Resident-to-resident aggression in long-term care: analysis of structured and unstructured data from the National Violent Death Reporting System, 2003-2016. J Appl Gerontol. 2019:733464819863926. 2019/07/20.

  4. Kandelman N, Mazars T, Levy A. Risk factors for burnout among caregivers working in nursing homes. J Clin Nurs. 2018;27(1-2):E147–53.

  5. Maust DT, Kim HM, Seyfried LS, et al. Antipsychotics, other psychotropics, and the risk of death in patients with dementia: number needed to harm. JAMA Psychiatry. 2015;72:438–45. 2015/03/19.

  6. Why Get Certified. Music & Memory. Accessed 1 Dec 2019.

  7. Jacobsen JH, Stelzer J, Fritz TH, Chételat G, la Joie R, Turner R. Why musical memory can be preserved in advanced Alzheimer’s disease. Brain. 2015;138(8):2438–50.

  8. Foster NA, Valentine ER. The effect of auditory stimulation on autobiographical recall in dementia. Exp Aging Res. 2001;27:215–28. 2001/07/10.

  9. Irish M, Cunningham CJ, Walsh JB, et al. Investigating the enhancing effect of music on autobiographical memory in mild Alzheimer’s disease. Dement Geriatr Cogn Disord. 2006;22:108–20. 2006/05/24.

  10. El Haj M, Fasotti L, Allain P. The involuntary nature of music-evoked autobiographical memories in Alzheimer’s disease. Conscious Cogn. 2012;21:238–46. 2012/01/24.

  11. Peck KJ, Girard TA, Russo FA, et al. Music and memory in Alzheimer’s disease and the potential underlying mechanisms. J Alzheimers Dis. 2016;51:949–59. 2016/03/12.

  12. de la Rubia Ortí JE, García-Pardo MP, Iranzo CC, et al. Does music therapy improve anxiety and depression in Alzheimer’s patients? J Altern Complement Med. 2018;24:33–6. 2017/07/18.

  13. Cohen-Mansfield J, Dakheel-Ali M, Marx MS, Thein K, Regier NG. Which unmet needs contribute to behavior problems in persons with advanced dementia? Psychiatry Res. 2015;228(1):59–64.

  14. Kwak J, Anderson K, O'Connell VK. Findings from a prospective randomized controlled trial of an individualized music listening program for persons with dementia. J Appl Gerontol. 2018:733464818778991. 2018/06/07.

  15. Thorlund K, Haggstrom J, Park JJ, et al. Key design considerations for adaptive clinical trials: a primer for clinicians. BMJ. 2018;360:k698.

  16. Onken LS, Carroll KM, Shoham V, et al. Reenvisioning clinical science: unifying the discipline to improve the public health. Clin Psychol Sci. 2014;2:22–34. 2014/01/01.

  17. Chan A-W, Tetzlaff JM, Gøtzsche PC, et al. SPIRIT 2013 explanation and elaboration: guidance for protocols of clinical trials. BMJ. 2013;346(jan08 15).

  18. Mor V, Volandes AE, Gutman R, et al. PRagmatic trial Of Video Education in Nursing homes: the design and rationale for a pragmatic cluster randomized trial in the nursing home setting. Clin Trials. 2017;14:140–51. 2017/01/11.

  19. CMS. Five-star quality rating system. Accessed 1 Oct 2021.

  20. Cohen-Mansfield J, Marx MS, Rosenthal AS. A description of agitation in a nursing home. J Gerontol. 1989;44:M77–84. 1989/05/01.

  21. Cohen-Mansfield J, Werner P, Marx MS. An observational study of agitation in agitated nursing home residents. Int Psychogeriatr. 1989;1:153–65. 1989/01/01.

  22. Saliba D, Buchanan J. Development and validation of a revised nursing home assessment tool: MDS 3.0. Santa Monica: Rand Corp; 2008.

  23. McCreedy E, Ogarek JA, Thomas KS, et al. The minimum data set agitated and reactive behavior scale: measuring behaviors in nursing home residents with dementia. J Am Med Dir Assoc. 2019;20:1548–52. 2019/11/05.

  24. Lawton MP, Van Haitsma K, Klapper J. Observed affect in nursing home residents with Alzheimer’s disease. J Gerontol B Psychol Sci Soc Sci. 1996;51:P3–14. 1996/01/01.

  25. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16:606–13. 2001/09/15.

  26. McCreedy EM, Yang X, Baier RR, et al. Measuring effects of nondrug interventions on behaviors: Music & Memory pilot study. J Am Geriatr Soc. 2019;67:2134–8. 2019/07/14.

  27. Rencher AC, Schimek M. Methods of multivariate analysis. Comput Stat. 1997;12:422.

  28. Saliba DBJ. Development and validation of a revised nursing home assessment tool: MDS 3.0; 2008.

  29. Murray DM, Blitstein JL. Methods to reduce the impact of intraclass correlation in group-randomized trials. Eval Rev. 2003;27:79–103. 2003/02/06.

  30. Teerenstra S, Eldridge S, Graff M, et al. A simple sample size formula for analysis of covariance in cluster randomized trials. Stat Med. 2012;31:2169–78. 2012/04/13.

  31. Jo B, Asparouhov T, Muthen BO. Intention-to-treat analysis in cluster randomized trials with noncompliance. Stat Med. 2008;27:5565–77. 2008/07/16.

  32. Neuenschwander B, Capkun-Niggli G, Branson M, Spiegelhalter DJ. Summarizing historical information on controls in clinical trials. Clin Trials. 2010;7(1):5–18.

  33. van Rosmalen J, Dejardin D, van Norden Y, Löwenberg B, Lesaffre E. Including historical data in the analysis of clinical trials: is it worth the effort? Stat Methods Med Res. 2018;27(10):3167–82.

  34. Kwak J, Ha J-H, O’Connell VK. Lessons learned from the statewide implementation of the Music & Memory program in nursing homes in Wisconsin in the USA. Dementia. 2020;1471301220962234(5):1617–30.

  35. Jolles MP, Lengnick-Hall R, Mittman BS. Core functions and forms of complex health interventions: a patient-centered medical home illustration. J Gen Intern Med. 2019;34(6):1032–8.

  36. Gerdner LA. Effects of individualized versus classical “relaxation” music on the frequency of agitation in elderly persons with Alzheimer’s disease and related disorders. Int Psychogeriatr. 2000;12(1):49–65.

  37. Pallmann P, Bedding AW, Choodari-Oskooei B, Dimairo M, Flight L, Hampson LV, et al. Adaptive designs in clinical trials: why use them, and how to run and report them. BMC Med. 2018;16(1):1–15.

  38. Loudon K, Treweek S, Sullivan F, et al. The PRECIS-2 tool: designing trials that are fit for purpose. BMJ. 2015;350:h2147.

  39. Baier RR, Jutkowitz E, Mitchell SL, et al. Readiness assessment for pragmatic trials (RAPT): a model to assess the readiness of an intervention for testing in a pragmatic trial. BMC Med Res Methodol. 2019;19:156. 2019/07/20.

  40. Gu C, Gutman R. Combining item response theory with multiple imputation to equate health assessment questionnaires. Biometrics. 2017;73:990–8. 2016/12/10.

  41. Gu CY, Gutman R. Development of a common patient assessment scale across the continuum of care: a nested multiple imputation approach. Ann Appl Stat. 2019;13(1):466–91.

  42. Treiman DJ, Bielby WT, Cheng MT. Evaluating a multiple-imputation method for recalibrating 1970 US census detailed industry codes to the 1980 standard. Sociol Methodol. 1988:309–45.

  43. Rubin DB. Multiple imputation for nonresponse in surveys. New York: Wiley; 1987.

  44. Tanniou J, Van Der Tweel I, Teerenstra S, et al. Subgroup analyses in confirmatory clinical trials: time to be specific about their purposes. BMC Med Res Methodol. 2016;16(1):1–15.

  45. Schühlen H. Pre-specified vs. post-hoc subgroup analyses: are we wiser before or after a trial has been performed? Eur Heart J. 2014;35(31):2055–7.

  46. Naggara O, Raymond J, Guilbert F, Altman DG. The problem of subgroup analyses: an example from a trial on ruptured intracranial aneurysms. Am J Neuroradiol. 2011;32(4):633–6.

  47. Scales K, Zimmerman S, Miller SJ. Evidence-based nonpharmacological practices to address behavioral and psychological symptoms of dementia. The Gerontologist. 2018;58(suppl_1):S88–S102.

  48. Cohen-Mansfield J, Thein K, Marx MS, Dakheel-Ali M, Freedman L. Efficacy of nonpharmacologic interventions for agitation in advanced dementia: a randomized, placebo-controlled trial. J Clin Psychiatry. 2012;73(09):1255–61.

  49. Zimmerman S, Sloane PD. Making pragmatic trials pragmatic in post-acute and long-term care settings. J Am Med Dir Assoc. 2019;20:107–9. 2019/01/30.

  50. Lipman PD, Loudon K, Dluzak L, et al. Framing the conversation: use of PRECIS-2 ratings to advance understanding of pragmatic trial design domains. Trials. 2017;18:532. 2017/11/12.

  51. Bharucha AJ, Vasilescu M, Dew MA, et al. Prevalence of behavioral symptoms: comparison of the minimum data set assessments with research instruments. J Am Med Dir Assoc. 2008;9:244–50. 2008/05/07.

  52. Hemming K, Taljaard M, Forbes A. Analysis of cluster randomised stepped wedge trials with repeated cross-sectional samples. Trials. 2017;18(1):1–11.

  53. Nipp RD, Yao NA, Lowenstein LM, et al. Pragmatic study designs for older adults with cancer: report from the U13 conference. J Geriatr Oncol. 2016;7:234–41. 2016/05/21.

  54. Shepherd V, Wood F, Griffith R, et al. Protection by exclusion? The (lack of) inclusion of adults who lack capacity to consent to research in clinical trials in the UK. Trials. 2019;20:474. 2019/08/07.

  55. Baskin SA, Morris J, Ahronheim JC, et al. Barriers to obtaining consent in dementia research: implications for surrogate decision-making. J Am Geriatr Soc. 1998;46:287–90. 1998/03/26.

  56. McCreedy E, Loomer L, Palmer JA, et al. Representation in the care planning process for nursing home residents with dementia. J Am Med Dir Assoc. 2018;19:415–21. 2018/03/15.

  57. Dal-Ré R, Janiaud P, Ioannidis JP. Real-world evidence: how pragmatic are randomized controlled trials labeled as pragmatic? BMC Med. 2018;16(1):49.

  58. Ford I, Norrie J. Pragmatic trials. N Engl J Med. 2016;375(5):454–63.

  59. Brody AA, Barnes DE, Chodosh J, Galvin JE, Hepburn KW, Troxel AB, et al. Building a national program for pilot studies of embedded pragmatic clinical trials in dementia care. J Am Geriatr Soc. 2020;68(S2):S14–20.


Not applicable

Trial status

Protocol version 3.0 was drafted in September 2020 and approved by the National Institute on Aging and the independent data safety and monitoring board in December 2020. Recruitment began in June 2019. The expected end of recruitment is January 2022. This protocol is being submitted mid-trial due to a necessary change in study protocol in response to the devastating effects of the coronavirus pandemic on nursing homes.

{4} Funding

This work is supported by the National Institute on Aging (Grant #: R33AG057451). {5c} The sponsor did not have a role in the study design, collection, management, analysis, and interpretation of the data; the writing of the report; and the decision to submit the report for publication. The authors have the ultimate authority over these activities.

{5b}National Institute on Aging

Building 31, Room 5C27

31 Center Drive, MSC 2292

Bethesda, MD 20892

Author information

{31b} Authorship requirements include making significant contributions to the conceptualization, writing, analyses, or editing of the manuscript. EM, VM, and RG conceptualized and drafted the manuscript. EM and JO analyzed the data and created tables. RB, JR, KT, FD, and RU provided substantial edits to the manuscript. All authors read and approved the final manuscript. We do not plan to use professional writers.

Corresponding author

Correspondence to Ellen M. McCreedy.

Ethics declarations

Ethics approval and consent to participate

This was deemed a minimal risk study by the Brown University Institutional Review Board, which issued a waiver of individual consent (#1705001793).

Consent for publication

Not applicable

Competing interests

{28} The principal investigators, authors, and study sites declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

{3} Protocol Version: 9/23/20, version 3. {25} All protocol amendments must be approved by the National Institute on Aging and the independent Data Safety and Monitoring Board.

Supplementary Information

Additional file 1

SPIRIT Checklist for Trials.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

McCreedy, E.M., Gutman, R., Baier, R. et al. Measuring the effects of a personalized music intervention on agitated behaviors among nursing home residents with dementia: design features for cluster-randomized adaptive trial. Trials 22, 681 (2021).
