This study assessed the methods and reporting of systematic reviews with respect to information on the applicability of trial results that might aid readers in applying those results. We assessed 98 systematic reviews of interventions aimed at reducing or stopping tobacco use and at treating or preventing HIV infection, published during a recent 10-year period. In these reviews, the applicability of results was poorly reported and poorly taken into account.
These results, together with our finding of a lack of information on cultural and socioeconomic contexts, patient characteristics, and the content of the interventions, raise the question of how decision makers and clinicians can use the results of such reviews [16, 19, 29, 37–39]. This situation is particularly problematic in the fields we studied because more than half of the interventions concerned nonpharmacologic treatments, such as behavioural interventions, which are complex and difficult to reproduce in clinical practice and whose success depends heavily on socioeconomic and cultural contexts.
Considering that applicability is essential for guideline developers when grading the strength of recommendations, the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for grading the evidence underlying clinical guidelines clearly tackles this issue.
This grading system separates decisions regarding the quality of evidence (mainly considering the internal validity of the studies) from the strength of recommendations (i.e., taking into account the risk-benefit balance). The strength of recommendations is likely to differ by practice setting or patient group. For example, in the field of cardiovascular risk management, evidence from randomized controlled trials was downgraded most often because of reservations about the applicability of the trial results.
Most of the methodological research in the field of systematic reviews, particularly the work of the Cochrane Collaboration, has focused on the evaluation of internal validity. These efforts have led to better consideration of internal validity in systematic reviews performed by the Cochrane Collaboration [43, 44]. However, evaluating the applicability of results is of similar importance. In the field of HIV interventions, for example, Merson et al. highlighted that "the lack of [...] contextual data to tailor specific interventions is reprehensible, particularly in view of the large amount of resources that have been invested to date in HIV prevention efforts, and hinders policy makers' ability to make informed decisions on prevention priorities". The Applicability and Recommendations Methods Group (ARMG) of the Cochrane Collaboration is nevertheless tackling this important but difficult issue, and recommendations are pending. To add to this discussion, based on our results, we propose 3 recommendations for performing systematic reviews and 3 recommendations for methodological research (see Additional file 3).
Assessing applicability and external validity is difficult, as is deciding which items are relevant and should be reported. Further, the importance of some items may vary by context (e.g., assessing pharmacologic versus nonpharmacologic treatments). Therefore, when planning a systematic review, the protocol should define which applicability items are important and should be collected and reported. Not all of the applicability items we evaluated necessarily interact with effect size. However, methodological work evaluating the impact of applicability on effect size is lacking, so making a definitive statement on this issue is difficult. Moreover, even if some applicability items do not interact with effect size, details of these items must still be provided to allow clinicians, patients, and decision makers to decide whether and how they will apply the results in clinical practice. Items identified as possibly interacting with treatment effect estimates should be offered as a priori explanations of heterogeneity, and whether the treatment effect differs across these characteristics should be explored. Other items, aimed at helping readers appraise the applicability of the trials in their own context, should be reported. Online addenda now provide a great opportunity to adequately describe the included studies for interested readers without burdening every reader.
The reporting of data related to external validity is now clearly indicated in the PRISMA statement for reporting systematic reviews and meta-analyses. The PRISMA statement clearly emphasizes the need to frame the question using the components known by the acronym "PICOS" (Participants, Intervention, Comparator group, Outcome, and Study design). By focusing on PICOS, the statement should improve the reporting of external validity. Indeed, issues related to PICOS affect several PRISMA items, which call for a clear description of the participants, the disease, the setting of care, the intervention, and the comparator.
One explanation for the differences between Cochrane and non-Cochrane reviews could be the space constraints (limited word count, number of tables and figures) imposed by some journal editors but not by the publishers of Cochrane Library reports. Further, the questions evaluated in Cochrane and non-Cochrane reviews differed in terms of the type of treatment (pharmacologic or nonpharmacologic); for example, about half of the Cochrane reviews evaluated drugs, whereas non-Cochrane reviews more often evaluated nonpharmacologic treatments.
This study has several limitations. First, we focused on two medical areas, and these results should be confirmed in others. However, we chose tobacco consumption and HIV infection because they are among the top 5 causes of mortality worldwide. Second, no consensus currently exists on how to assess the applicability of study results; we identified the applicability items through a literature review, and the relevance of some items might vary. Third, we did not weight the importance of each item, even though it may vary according to context (e.g., assessing pharmacologic versus nonpharmacologic treatments). Fourth, during the appraisal process, we assumed that if data were reported for at least one randomized controlled trial included in a systematic review, these data had been gathered systematically in that review. This assumption may have overestimated the extent of reporting. Fifth, we excluded systematic reviews restricted to specific contexts or populations, which may have biased our sample toward widely applicable reviews.
Finally, the screening process and data collection were performed by only one reviewer; however, a quality assurance procedure was applied.