Table 1 Common problems with statistical analyses of RCTs in (mental) health research

From: Evaluation of randomized controlled trials: a primer and tutorial for mental health researchers


Missing data handling

Many systematic reviews have found that, while standards have improved in recent years [9], the missing data handling in many RCTs remains poor [9,10,11,12,13]:

• The amount of missing data is often insufficiently reported, as is the methodology used to handle missing values.

• Assumptions of the missing data handling strategy remain undiscussed (are the data assumed to be missing completely at random, missing at random, missing not at random—and why?).

• Methods that are inadequate (e.g., single imputation) or based on strong assumptions (e.g., complete case analysis) are used.

• Although recommended by many regulatory guidelines [14], sensitivity analyses are still underused. If sensitivity analyses are conducted, they are often not suited to test the assumptions of the main missing data handling strategy.

• Although the missing not at random (MNAR) assumption is often plausible, methods that model it are employed very infrequently and are often poorly reported (a minimal sensitivity-analysis sketch follows this list).
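
To make the last two points concrete, here is a minimal Python sketch, assuming a simple two-arm trial with a continuous outcome, of multiple imputation combined with a delta-adjustment sensitivity analysis for MNAR. The column names (y_base, y_post, treat) and the use of scikit-learn and statsmodels are illustrative assumptions, not the paper’s own analysis code.

```python
# Hypothetical sketch: multiple imputation under MAR, plus a
# delta-adjustment sensitivity analysis for MNAR. All names are
# illustrative; df must contain numeric columns y_base, y_post, treat.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def pooled_treatment_effect(df, m=20, delta=0.0, seed=0):
    """Impute y_post m times, shift imputed values in the treated arm
    by `delta` (delta=0 reproduces the MAR analysis), fit an ANCOVA
    model per completed data set, and pool with Rubin's rules."""
    missing = df["y_post"].isna().to_numpy()
    treated = (df["treat"] == 1).to_numpy()
    estimates, variances = [], []
    for i in range(m):
        imputer = IterativeImputer(sample_posterior=True, random_state=seed + i)
        completed = pd.DataFrame(imputer.fit_transform(df),
                                 columns=df.columns, index=df.index)
        # MNAR scenario: treated dropouts did worse (by delta) than
        # MAR-based imputation suggests.
        completed.loc[missing & treated, "y_post"] += delta
        fit = smf.ols("y_post ~ y_base + treat", data=completed).fit()
        estimates.append(fit.params["treat"])
        variances.append(fit.bse["treat"] ** 2)
    q = np.mean(estimates)            # pooled point estimate
    b = np.var(estimates, ddof=1)     # between-imputation variance
    u = np.mean(variances)            # within-imputation variance
    return q, np.sqrt(u + (1 + 1 / m) * b)  # Rubin's rules SE
```

Re-running pooled_treatment_effect over a grid of delta values shows how strongly the trial’s conclusion depends on departures from the MAR assumption.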

Baseline covariate tests

Methodologists have frequently argued that baseline covariate or “randomization” tests are superfluous and should not be conducted [15,16,17,18,19].

Nevertheless, these tests are frequently reported in RCT evaluations, and reviewers often demand them to show that the randomization “worked”. Because P values of these tests are often included in the baseline characteristics table, some refer to this as the “Table-1 Fallacy” [20].
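
As a hedged illustration of the descriptive alternative often recommended instead (this specific function is not from the paper), baseline balance can be summarized with standardized mean differences rather than P values:

```python
# Illustrative sketch: describe baseline balance with a standardized
# mean difference (SMD) instead of testing it; no hypothesis test.
import numpy as np

def standardized_mean_difference(x_treat, x_control):
    """(difference in means) / (pooled standard deviation)."""
    x_treat = np.asarray(x_treat, dtype=float)
    x_control = np.asarray(x_control, dtype=float)
    pooled_sd = np.sqrt((x_treat.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treat.mean() - x_control.mean()) / pooled_sd
```

An SMD simply describes the size of any chance imbalance; it makes no claim about whether the randomization “worked”.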

Analysis model

Even when the data were derived from a parallel-group RCT, researchers often calculate change from baseline and pre-post effect sizes to assess intervention effects. While widespread and often requested by reviewers, this approach does not account for regression to the mean and can produce highly misleading results [21,22,23].
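
For illustration, the sketch below contrasts the criticized change-score analysis with an ANCOVA-type model (the post-treatment outcome regressed on baseline and treatment), a commonly recommended alternative for parallel-group trials. The column names are hypothetical and the code is not from the paper.

```python
# Hedged sketch: change-score analysis vs. ANCOVA in a two-arm trial.
# Assumes numeric columns y_base, y_post, and treat (0/1) in df.
import statsmodels.formula.api as smf

def change_score_effect(df):
    # The criticized approach: model the pre-post change directly.
    df = df.assign(change=df["y_post"] - df["y_base"])
    return smf.ols("change ~ treat", data=df).fit().params["treat"]

def ancova_effect(df):
    # Adjusting for baseline accounts for regression to the mean.
    return smf.ols("y_post ~ y_base + treat", data=df).fit().params["treat"]
```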

Interpretation of results

Null results (i.e., P ≥ 0.05) are often interpreted as showing the absence of an effect, even though “absence of evidence does not imply evidence of absence” [24, 25]. This issue also pertains to negative effects, which may be uncommon but are important to detect. The problem is exacerbated by the fact that, in mental health research, most trials are not even sufficiently powered to detect the main effect of the intervention [26].
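
Where demonstrating the absence of a meaningful effect is the actual goal, equivalence testing is one option. The following is a hedged sketch using the two one-sided tests (TOST) procedure as implemented in statsmodels, with simulated data and a purely illustrative equivalence margin:

```python
# Illustrative sketch: a nonsignificant difference test does not show
# equivalence; TOST against a prespecified margin can. The margin and
# data below are made up for demonstration.
import numpy as np
from statsmodels.stats.weightstats import ttost_ind

rng = np.random.default_rng(1)
treat = rng.normal(loc=0.1, scale=1.0, size=80)
control = rng.normal(loc=0.0, scale=1.0, size=80)

# H0: |mean difference| >= 0.5; a small P value supports equivalence.
p_equiv, lower, upper = ttost_ind(treat, control, low=-0.5, upp=0.5)
print(f"TOST P value: {p_equiv:.3f}")
```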

In a similar vein, “post hoc” power analyses are often conducted (or requested), e.g., to calculate the power of a trial based on its final sample size and observed effect size, often with the intention of checking whether there is a “true” effect that the trial was simply not powered to detect. This approach is circular and logically flawed, since the observed power is simply a function of the P value [27, 28].
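
This circularity is easy to demonstrate: for a two-sided z-test, “observed power” can be computed from the P value alone, as in this minimal sketch (not from the paper):

```python
# Minimal sketch: "observed power" is a deterministic function of the
# P value for a two-sided z-test, so it adds no new information.
from scipy.stats import norm

def observed_power_from_p(p, alpha=0.05):
    z_obs = norm.ppf(1 - p / 2)        # |z| implied by the P value
    z_crit = norm.ppf(1 - alpha / 2)   # two-sided critical value
    return norm.cdf(z_obs - z_crit) + norm.cdf(-z_obs - z_crit)

print(round(observed_power_from_p(0.05), 3))  # P = 0.05 gives ~0.5
```

A trial with P = 0.05 always has an observed power of about 50%, regardless of the data.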

Reporting

There is evidence that the quality of clinical trial reports has improved substantially since journals started adopting the Consolidated Standards of Reporting Trials (CONSORT [29, 30]). Nevertheless, the reporting of RCT results in mental health research remains suboptimal [31, 32]. In the abstract, for example, trialists often fail to report methods of randomization and/or allocation concealment, or do not disclose the funding source.

Another concern is selective reporting. Many trials are still not preregistered in a clinical trial registry, and the statistical analysis plans (SAPs) provided in these registrations are often vague. This makes it easier to conceal questionable research practices such as selective outcome reporting (i.e., reporting only outcomes that fit the researcher’s objective) [33] or “outcome switching” [34] in clinical trial reports.

Core outcome sets (COS [35]) are collections of outcomes that should be measured and reported in all clinical trials within a field. They help ensure that endpoints are assessed consistently across a research field and with appropriate instruments. A number of COS or related consensus papers have been developed for various mental and behavioral disorders [36,37,38,39,40,41], but they remain underused. A comprehensive overview of available COS for mental health research and beyond is provided by the Core Outcome Measures in Effectiveness Trials (COMET [42]) initiative (www.comet-initiative.org).