A meta-review of evidence on heart failure disease management programs: the challenges of describing and synthesizing evidence on complex interventions

Background: Despite favourable results from past meta-analyses, some recent large trials have not found heart failure (HF) disease management programs to be beneficial. To explore reasons for this, we evaluated evidence from existing meta-analyses.

Methods: A systematic review incorporating meta-review was used. We selected meta-analyses of randomized controlled trials published after 1995 in English that examined the effects of HF disease management programs on key outcomes. Databases searched: MEDLINE, EMBASE, Cochrane Database of Systematic Reviews (CDSR), DARE, NHS EED, NHS HTA, Ageline, AMED, Scopus, Web of Science and CINAHL; cited references, experts and existing reviews were also searched.

Results: Fifteen meta-analyses were identified, containing a mean of 18.5 +/- 10.1 randomized trials of HF interventions (range: 6 to 36). Overall quality of the meta-analyses was very mixed (mean AMSTAR score = 6.4 +/- 1.9; range 2-9). Reporting inadequacies were widespread around populations, intervention components, settings and characteristics, and comparator groups. Heterogeneity (statistical, clinical, and methodological) was not taken into account sufficiently when drawing conclusions from pooled analyses.

Conclusions: Meta-analyses of heart failure disease management programs have promising findings but often fail to report key characteristics of populations, interventions, and comparisons. Existing reviews are of mixed quality and do not adequately take account of program complexity and heterogeneity.


Background
Heart failure (HF) disease management programs are common in North America, Europe, and Australia [1,2]. These services provide care to optimize pharmacological regimens and to support medication management and effective self-care. Programs have been widely introduced following recommendations from international clinical guidelines [1,3,4], but a number of recent and comparatively large trials have found no or small benefits from programs [5-10]. These inconsistencies have been explained by design issues rather than biases, reporting inadequacies or differences in actual effects [11,12]. However, recent results from the Medicare Health Support Pilot Program (MHSPP) in the United States [13] provide corroboration that program effects are poorly understood. This independent randomized trial of nine disease management programs with 30,000 patients with heart failure and diabetes concluded that programs did not decrease mortality, frequency of hospitalization, or costs, and did not improve self-care, self-care efficacy, or mental and physical health [13].
These results raise questions about what clinicians should do in the light of contradictory evidence from trials and meta-analyses. When results from trials differ, it should not be concluded that an intervention is ineffective, because most trials are underpowered to identify true effects [14]. Meta-analyses can overcome this lack of power but are as prone to reporting and design flaws as any other type of research design [15]. Findings from meta-analyses frequently influence guidelines, yet, as the recent PRISMA guidelines acknowledge, systematic reviews can vary widely in quality [16,17].
Thus, the methods and overall quality of meta-analyses are of great importance. Despite this, there has been no systematic appraisal of the quality of meta-analyses of heart failure management programs to date. This is particularly important given the increasing awareness of the complexity and diversity of these programs [18]. To evaluate the strength of evidence from current meta-analyses of these programs, we appraised the nature and quality of evidence from existing published meta-analyses of HF disease management programs.

Methods
Meta-review was used to identify and appraise evidence from published meta-analyses of heart failure disease management programs or approaches. Meta-review appraises and synthesises findings from systematic reviews, in this instance, from meta-analyses [19]. The approach has evolved in response to the growing number of systematic reviews and the need to appraise quality of a review before application to practice and policy, for example via PRISMA [17].
Meta-review follows similar principles to systematic review [19]: it involves a comprehensive and detailed search of the literature for relevant studies with quality assessment to assess for bias, transparency, and comprehensiveness [19]. As with traditional systematic review, in meta-review, validation of quality by a second, independent reviewer is important to reduce potential for bias [19].
A comprehensive search was done to identify meta-analyses of randomized controlled trials published in English that examined the effects of HF disease management programs on key outcomes. To be included, reviews had to have a detailed and comprehensive search strategy (as identified by: naming of databases and years of searching and example or actual terms), contain data on study quality and make reference to synthesis of findings either by pooling data or rejecting the pooling of data. Due to changes in clinical practice, and to ensure some degree of congruence with contemporary clinical practice, we searched only for meta-analyses published after 1995, confined our search to reviews that contained comparisons of programs with usual care, and included samples of adults over the age of 18 years with confirmed diagnosis of HF. Meta-analyses of interventions that included patients with other forms of cardiac disease (such as cardiac rehabilitation or secondary prevention) that may have addressed heart failure disease management were not included due to the lack of data specific to heart failure populations in these reviews [20,21]. Finally, the meta-analyses had to contain extractable data for HF patients on mortality (all-cause or HF-related), hospital (re)admission (all-cause and HF-related), or health-related quality of life.
For the purposes of the review, interventions were defined as HF management programs if they consisted of more than one recognized disease management component (medication optimization, lifestyle modification, or education) with the purpose of improving outcomes related to HF in patients with a confirmed diagnosis or were self-identified by the authors as constituting a program or analogous health service intervention beyond usual care for the treatment of HF.
A variety of electronic databases were searched using a range of search terms (Table 1), including: MEDLINE, EMBASE, Cochrane Database of Systematic Reviews (CDSR), DARE, NHS EED, NHS HTA, Ageline, AMED, Scopus, Web of Science and CINAHL from January 1, 1995 to July 31, 2008. In addition, reference lists and bibliographies of identified reviews were hand searched.
The primary screening was conducted independently by LS and AMC with abstracts/titles being screened fully. Full papers for potential inclusion were then screened by LS and AMC for detailed evaluation with disagreements regarding eligibility being handled with joint discussion between LS, AMC, and DRT.
Data were extracted onto a standardized data extraction template relating to: population, intervention, comparison, and outcome (PICO). This approach has been developed for optimizing evidence-based practice. Quality of each meta-analysis was assessed independently by LS and AMC using a standardized and valid measure of quality of systematic review (AMSTAR) [22].
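As a rough illustration of the extraction step described above, the PICO data and AMSTAR ratings for one review could be held in a record like the following. This is a minimal sketch, not the template used in this meta-review: the class name `ReviewRecord`, its field names, and the example values are all hypothetical. The only fixed assumption is that the AMSTAR checklist has 11 yes/no items, so scores run from 0 to 11.

```python
from dataclasses import dataclass, field
from typing import List, Optional

AMSTAR_ITEMS = 11  # the AMSTAR checklist has 11 items


@dataclass
class ReviewRecord:
    """Hypothetical extraction record for one meta-analysis (PICO + quality)."""
    reference: str
    population: str
    intervention: str
    comparison: str
    outcomes: List[str]
    n_trials: Optional[int] = None      # None when the review does not report it
    n_patients: Optional[int] = None
    amstar_items_met: List[bool] = field(default_factory=list)

    def amstar_score(self) -> int:
        """Count the AMSTAR items rated as met (0 to 11)."""
        if len(self.amstar_items_met) != AMSTAR_ITEMS:
            raise ValueError(f"expected {AMSTAR_ITEMS} item ratings")
        return sum(self.amstar_items_met)


# Invented example values, for illustration only.
example = ReviewRecord(
    reference="hypothetical review",
    population="adults with confirmed HF",
    intervention="multi-component HF disease management program",
    comparison="usual care",
    outcomes=["all-cause mortality", "HF-related hospitalization"],
    amstar_items_met=[True] * 6 + [False] * 5,
)
print(example.amstar_score())
```

Keeping unreported values as `None` rather than guessing them makes reporting gaps, such as a missing sample size, visible when records are summarised.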

Results
A total of 4529 potential articles were initially identified (Figure 1), but primary screening excluded 4285 papers. After reviewing the remaining papers (n = 244), 15 meta-analyses met the inclusion criteria (Table 2). The 15 meta-analyses contained a mean of 18.5 +/- 10.1 randomized trials (range: 6 to 36) and a mean of 3267.4 +/- 2184.0 patients. Two reviews did not report sample size [9,23]. Overall quality of the meta-analyses based on AMSTAR criteria [22] was moderate but varied widely (mean score = 6.4 +/- 1.9; range 2-9). Main weaknesses in the reviews were lack of incorporation of study quality in conclusions and low detail regarding excluded studies (Additional file 1).
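The screening counts reported above can be checked with simple arithmetic. The sketch below uses only the three figures given in the text (4529 identified, 4285 excluded at primary screening, 15 included); the function name `screening_flow` and the dictionary keys are hypothetical, and the number excluded after primary screening is derived here rather than stated in the paper.

```python
from typing import Dict


def screening_flow(identified: int, excluded_primary: int, included: int) -> Dict[str, int]:
    """Derive the intermediate counts of a two-stage screening process."""
    remaining = identified - excluded_primary
    excluded_after_primary = remaining - included
    if remaining < 0 or excluded_after_primary < 0:
        raise ValueError("screening counts are inconsistent")
    return {
        "identified": identified,
        "excluded_at_primary": excluded_primary,
        "remaining_after_primary": remaining,
        "excluded_after_primary": excluded_after_primary,  # derived, not reported
        "included": included,
    }


flow = screening_flow(identified=4529, excluded_primary=4285, included=15)
for stage, count in flow.items():
    print(f"{stage}: {count}")
```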

Populations
Mean age of the review population was calculated in two reviews [28,31] (both mean age: 73 years) with the oldest reported mean age being 81.6 [23]. Seven reviews [23,26,27,30,33,34,36] reported an upper age limit of 80 years. The lowest mean age reported was 56 by five reviews by way of inclusion of the same trial [24,30,32,34,36]. Two additional reviews reported lower mean age limits of 57 and 58 [25,35] but none presented data on standard deviation of ages.
Six reviews [23,25,26,30,35,36] provided no data on the sex of the participants in the trials. Co-morbidities and characteristics of study populations were frequently not reported, with particular weaknesses in reporting of medication treatments (Table 3). In the four reviews that did report co-morbidities [26,31,36,37], hypertension, diabetes, chronic obstructive lung disease, and coronary artery disease were most common.

Definitions of trials
Reviews most frequently used operationalised definitions (Table 4) to guide inclusion of interventions, though only three used definitions involving approach, personnel, setting, and content [23,26,27]. The foci of reviews differed markedly: for example, reviews specified interventions provided only in particular settings [23,25-27] or without reference at all to content [25,34,37].

Program Content
The reviews specified a mean of 1.13 essential components of content (range 0 to 3). Interventions were described in terms of content using general descriptors, such as education, self-care, discharge plan, and medication support. Reviews most commonly stated that interventions had to have three or four component items though reviews could extend to five or more content components [26,30,37]. Educational and monitoring interventions were the most commonly identified elements. Other components included support at hospital discharge, medication review, and social support. Hence, a degree of overlap existed across settings. For example, a systematic review may focus on a nurse-led hospital-based intervention yet offer home visits, telephone support, and follow-up with a general practitioner [23]. Obtaining data on usual care was noted to be problematic [23,27-29,32,35] and the care provided to comparison groups was poorly defined (Table 5). For example, in seven of twelve trials in one review, descriptions of care were omitted entirely [35].

Outcomes
The follow-up period was 3 to 12 months in six reviews [24,27,29,31,33,36]. Three reviews reported follow-up periods beginning at three months, with upper limits extending to 16, 18, and 22 months [25,32,28]. Two reviews did not report length of follow-up [23,34].

Definitions of interventions included in reviews
Due to the limited reporting of interventions and control groups and the diversity of trials included in the reviews, it is not appropriate to pool outcomes from the meta-analyses here. This is important because findings from interventions that are excessively heterogeneous should not be pooled. This was particularly the case with these meta-analyses, which varied in, or contained unclear data on, a wide range of factors and strata of programs, for example, clinical populations, providers, location, mode of delivery, numbers of components, and length. These multiple ambiguities made pooling, sensitivity analysis, and meta-regression inappropriate [38-40].
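Statistical heterogeneity of the kind discussed here is commonly summarised with Cochran's Q and the I² statistic. The sketch below shows the standard inverse-variance calculation under a fixed-effect model. The function name `heterogeneity` is hypothetical, and the effect sizes and variances are invented purely for illustration; they are not taken from any of the included reviews, whose outcomes were judged unsuitable for pooling.

```python
from typing import List, Tuple


def heterogeneity(effects: List[float], variances: List[float]) -> Tuple[float, float, float]:
    """Return (fixed-effect pooled estimate, Cochran's Q, I-squared in percent)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between-study differences
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical log risk ratios and their variances (illustrative only).
log_rr = [-0.40, -0.05, -0.60, 0.10]
var = [0.04, 0.02, 0.09, 0.03]
pooled, q, i2 = heterogeneity(log_rr, var)
print(f"pooled log RR = {pooled:.3f}, Q = {q:.2f}, I^2 = {i2:.1f}%")
```

A higher I² indicates that more of the observed variation reflects differences between studies rather than chance. The statistic captures only statistical heterogeneity; clinical and methodological heterogeneity still have to be judged from how populations, interventions, and comparators are described.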

Discussion
This is the first meta-review of meta-analyses of HF disease management programs, and it conveys the challenges of performing meta-analyses of complex health services interventions. Overall, quality of the reviews was moderate though very mixed across reviews. This quality is important to consider when deciding whether review findings should guide practice and guidelines [22,41,42].
Table 6 Effect sizes of primary outcomes of reviews (95% Confidence Intervals). Columns: Review (reference number); All cause mortality; All cause re-hospitalization; HF-related hospitalization.

Based on the consistency and size of effect sizes identified by the meta-analyses, it would immediately appear reasonable to conclude either that, in generality, programs work or that programs of various types work [43]. However, this meta-review supports concerns that populations, programs, and analyses of these programs are inconsistently and poorly described [44,45]. For example, studies were poorly described in terms of populations and treatments with only one-fifth of reviews defining programs comprehensively in terms of approach, personnel, setting, and content. Even with the use of operationalised definitions to guide study selection in reviews, findings from interventions with very diverse characteristics and populations were pooled and, though mentioned in reviews, the implications of trial quality or statistical, clinical or methodological heterogeneity were seldom actually taken into account in analyses. No progress over time was evident in quality of reporting. Hence, reviews continue to focus on the results of study pooling over issues related to program complexity and heterogeneity.

Why might program complexity and heterogeneity be comparatively neglected in comparison to the findings of reviews? Firstly, this emphasis is understandable due to limitations in methodology. Complex interventions are often poorly described in published manuscripts [46] and it is well known that HF disease management programs are complex and diverse [43,45,47]. Current statistical and methodological techniques to describe and analyse such interventions in systematic review remain rudimentary [48]. Current meta-analyses also predate the existence of a taxonomy to classify HF disease management programs [18] and more extensive CONSORT reporting requirements for non-pharmacological trials [49].
Secondly, scientific findings that are more positive are more likely to be published in higher impact journals and cited more often in guidelines [50,51]. This reduces incentives to qualify results to take account of 'messy' issues related to program diversity and heterogeneity and fosters a disproportionate emphasis on positive findings without qualification [52] or recognition of how elements of context may moderate intervention effects [53]. This tendency may be combined with a wider perceived political need to champion multi-disciplinary health services interventions to attain greater recognition and usage of such interventions in healthcare systems seen to favour pharmacological interventions and biomedicine [54]. However, paradoxically, ignoring complexity and heterogeneity may actually reduce knowledge translation. This follows because uptake is likely to be reduced by unclear descriptions of what programs and comparison groups consist of, lack of clarity over likely benefits in important patient groups (for example, the effects of both age and sex on program outcomes are not known), and lack of specificity in findings regarding key program characteristics [16,53].
In future reviews, programs should be described comprehensively using systematic classification methods [18]. More sophisticated taxonomies are needed to fully capture the deeper characteristics of programs [48] and should be used in future reviews to describe programs comprehensively. The effects of clinical, methodological, and statistical heterogeneity must be formally taken into account in methods and conclusions, as per PRISMA guidelines [15]. Future trials should report key elements of populations, interventions, comparison group, and outcomes in accordance with the modified CONSORT statement for non-pharmacological trials [49]. These factors should be incorporated and reported comprehensively in meta-analyses. Findings from meta-analyses should be evaluated prior to application to practice and policy, with review quality being assessed using valid quality criteria [15].
In terms of limitations, as with any review, this meta-review was constrained by the quality of reporting of the component studies. The data presented here are descriptive because it was inappropriate to synthesise outcomes to generate pooled effect sizes due to the wide diversity of programs subsumed in the reviews and the lack of comprehensive reporting in the reviews of intervention, comparator groups, and population characteristics [55,56]. As pivotal elements of programs, reporting of these components has to be clear and comprehensive if synthesis is to be undertaken.

Conclusions
Meta-analyses of heart failure disease management programs have promising findings but often fail to report key characteristics of populations, interventions, and comparisons. Existing reviews are of mixed quality and do not adequately take account of program complexity and heterogeneity.

Additional material
Additional file 1: AMSTAR Quality Ranking of Included Studies. Quality assessment ratings for each item on the AMSTAR tool for each review.
Abbreviations
HF: heart failure