Article Text


The limitations of using randomised controlled trials as a basis for developing treatment guidelines
  1. Roger Mulder1,2,
  2. Ajeet B Singh1,3,
  3. Amber Hamilton1,4,5,6,
  4. Pritha Das1,4,5,6,
  5. Tim Outhred1,4,5,6,
  6. Grace Morris1,4,5,6,
  7. Darryl Bassett1,7,
  8. Bernhard T Baune1,8,
  9. Michael Berk1,3,9,
  10. Philip Boyce1,10,
  11. Bill Lyndon1,5,11,12,
  12. Gordon Parker1,13,14,
  13. Gin S Malhi1,4,6
  1. Mood Assessment and Classification (MAC) Committee, Sydney, Australia
  2. Department of Psychological Medicine, University of Otago, Christchurch, New Zealand
  3. School of Medicine, IMPACT Strategic Research Centre, Deakin University, Barwon Health, Geelong, Victoria, Australia
  4. Department of Psychiatry, Northern Sydney Local Health District, St Leonards, New South Wales, Australia
  5. Sydney Medical School Northern, University of Sydney, Sydney, New South Wales, Australia
  6. CADE Clinic, Royal North Shore Hospital, Northern Sydney Local Health District, St Leonards, New South Wales, Australia
  7. Private Practice in Psychiatry and Division of Psychiatry, University of Western Australia, Perth, Australia
  8. Discipline of Psychiatry, University of Adelaide, Adelaide, South Australia, Australia
  9. Department of Psychiatry, Orygen Research Centre, and the Florey Institute for Neuroscience and Mental Health, University of Melbourne, Melbourne, Victoria, Australia
  10. Discipline of Psychiatry, Sydney Medical School, Westmead Clinical School, University of Sydney, Sydney, New South Wales, Australia
  11. Mood Disorders Unit, Northside Clinic, Greenwich, New South Wales, Australia
  12. ECT Services, Northside Group Hospitals, Greenwich, New South Wales, Australia
  13. School of Psychiatry, University of New South Wales, Kensington, New South Wales, Australia
  14. Black Dog Institute, Sydney, New South Wales, Australia

Correspondence to Professor Gin S Malhi, CADE Clinic, Academic Department of Psychiatry, Level 3, Main Building, Royal North Shore Hospital, St Leonards, Sydney, NSW 2065, Australia; gin.malhi@sydney.edu.au

Abstract

Randomised controlled trials (RCTs) are considered the ‘gold standard’ by which novel psychotropic medications and psychological interventions are evaluated and consequently adopted into widespread clinical practice. However, there are some limitations to using RCTs as the basis for developing treatment guidelines. While RCTs allow researchers to determine whether a given medication or intervention is effective in a specific patient sample, for practising clinicians it is more important to know whether it will work for their particular patient in their particular setting. This information cannot be garnered from an RCT. These inherent limitations are exacerbated by biases in design, recruitment, sample populations and data analysis that are inevitable in real-world studies. While trial registration and the CONSORT reporting standards have been introduced to address these issues, it is worrying that many trials fail to achieve such standards and yet their findings are used to inform clinical decision making. This perspective piece questions the assumptions of RCTs and highlights the widespread distortion of findings that currently undermines the credibility of this powerful design. It is recommended that clinical guidelines include advice as to what should be considered good and relevant evidence and that external bodies continue to monitor RCTs to ensure that the outcomes published indeed reflect reality.

  • protocols & guidelines
  • medical ethics
  • psychiatry
  • statistics & research methods
  • clinical trials


In preparing the Royal Australian and New Zealand College of Psychiatrists guidelines for mood disorders,1 the usual empirical methodological hierarchy was employed in which individual case reports were at the ‘bottom’ and randomised controlled trials (RCTs) at the ‘top’. This action is virtually unthinking, reflecting the rise of evidence-based medicine. What is good about RCTs? The canonical answer is that RCTs control for unknown confounders by a design that ensures that all features causally related to the outcome other than treatment are distributed identically between the treatment and control groups. If the outcome is more probable in the treatment group, then the only explanation possible is that the treatment caused the outcome in some members of that group.2 But as Cartwright has pointed out, the logic of RCTs is ideal for supporting ‘it-works-somewhere’ claims. Demonstrating that a drug is effective in a patient sample is an essential step in drug registration, hence the need for clinical trials to be conducted so that the drug can get to market. In clinical practice, we need evidence for its clinical utility; that it will produce the desired outcome in real-world patients and settings: the ‘it-works-for-us’ claim. This article questions the truth of both claims in the context of RCTs informing mood disorder clinical guidelines.

‘It-works-somewhere’

First, the ‘it-works-somewhere’ claim will be evaluated. RCTs in psychiatry may have bias in design, recruitment, patient populations, data analysis and presentation of findings. Studies are relatively small, generally involving, at most, a few hundred subjects. The treatment effect sizes are small, which compounds the problem of clinical utility and translation. Many syndromes have high spontaneous recovery and placebo response rates, which complicate analyses and obfuscate effects. Definitions of syndromes are often imprecise, overlapping and heterogeneous, and commonly result in highly mixed samples. Added to this, a variety of outcome measures are used while interventions are often only conducted in tertiary referral units.3 Such specialist units are usually found in academic institutions, where patients are referred due to more complex illness patterns. Such groups of patients tend to have poorer prospects of remission and are rarely reflective of the general population with mood disorders.
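The tension between small samples and small effect sizes can be made concrete with a standard sample-size calculation. The sketch below uses the common normal-approximation formula for a two-arm comparison of means; the effect sizes chosen are illustrative assumptions, not figures drawn from any trial cited here.

```python
import math

def n_per_group(d, z_alpha=1.959964, z_beta=0.841621):
    """Approximate sample size per arm for a two-sample comparison of means
    (normal approximation) to detect a standardised effect size d at
    two-sided alpha = 0.05 with 80% power."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

# A moderate effect (d = 0.5) needs roughly 63 subjects per arm, but a
# small effect (d = 0.3) already needs about 175 per arm, so a trial of
# "a few hundred subjects" sits near the minimum for small effects.
print(n_per_group(0.5))
print(n_per_group(0.3))
```

Under this approximation, halving the effect size roughly quadruples the required sample, which is why modest treatment effects and modest trial sizes jointly threaten statistical power.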

There is consistent evidence of selected or distorted reporting in RCTs. Chan et al 4 reviewed 102 trials and noted that the reporting of trial outcomes is not only frequently incomplete but also biased and inconsistent with protocols. Over half the trials were reported either in part (incompletely) or not at all, with statistically significant results having higher odds of being reported compared with non-significant outcomes for both efficacy (pooled OR 2.4) and harm (pooled OR 4.7). In other words, a significant outcome results in a greater likelihood of being reported. More disturbingly, 86% of survey responders denied the existence of unreported outcomes despite evidence to the contrary. A prominent example of this in psychiatry is Study 329: an RCT comparing paroxetine, imipramine and placebo in adolescents. A recent re-analysis reported that paroxetine only produced a positive result when four new secondary outcome measures were used instead of the primary outcomes. Analysing the primary outcome measure revealed no group differences.5
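The Study 329 re-analysis illustrates a general statistical point: if enough secondary outcome measures are tested, one of them is likely to reach nominal significance by chance alone. A minimal simulation of this multiplicity effect follows; it assumes independent outcomes (real rating scales are correlated, so the inflation in practice differs), and the trial counts are arbitrary.

```python
import random

random.seed(1)

def false_positive_rate(n_outcomes, n_trials=20000, alpha=0.05):
    """Simulate trials of a truly ineffective treatment measuring several
    uncorrelated outcomes, and count how often at least one outcome
    reaches nominal 'significance' by chance."""
    hits = 0
    for _ in range(n_trials):
        # under the null hypothesis each outcome's p-value is uniform(0, 1)
        if any(random.random() < alpha for _ in range(n_outcomes)):
            hits += 1
    return hits / n_trials

print(false_positive_rate(1))  # close to the nominal 0.05
print(false_positive_rate(4))  # close to 1 - 0.95**4, roughly 0.19
```

With four uncorrelated secondary outcomes, the chance of at least one spurious ‘positive’ result nearly quadruples, which is why substituting new secondary outcomes for a pre-specified primary outcome undermines the error control that randomisation is meant to provide.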

Due to these concerns, the International Committee of Medical Journal Editors introduced a policy to require registration of all clinical trials prior to enrolment of subjects.6 Registration involves information about trial protocols and the specified outcome measures being made publicly available. However, the efficacy of trial registration has been called into question by a number of studies. In a review of five psychiatry journals that mandate registering of prospective clinical trials, it was reported that only 33% of trials were correctly prospectively registered and, of these, 28% had evidence of selective outcome reporting and 27% a large change in participant numbers. Overall, only 14.4% were correctly registered and reported.7 For psychotherapy RCTs, the results were even worse. Only 24.1% were registered and 4.5% free from selective outcome reporting,8 underscoring the fact that bias is not an issue just confined to pharmaceutical industry trials.

The other major attempt to improve the conduct and reporting of RCTs is the Consolidated Standards of Reporting Trials (CONSORT) guidelines. These provide an evidence-based minimum set of recommendations for reporting RCTs.9 While there is evidence that reporting in psychiatric RCTs has improved, over 40% of studies still do not adhere to the CONSORT guidelines.10

There is also the often-cited influence of pharmaceutical marketing, which may motivate bias in design and limit external validity. A review of drug company authorship and sponsorship on drug trial outcomes reported that of 198 studies in three prestigious psychiatry journals (British Journal of Psychiatry, American Journal of Psychiatry and JAMA Psychiatry), only 23% were independently funded. Furthermore, independently funded studies were significantly more likely to report negative findings whereas industry-authored studies nearly always reported positive findings. Specifically, 74 out of 76 RCTs in this study demonstrated this bias11—although journal editors are also reluctant to publish negative studies, suggesting that this bias arises at multiple levels. Similar effects are found in psychotherapy RCTs. Larger positive effect sizes were found when authors’ allegiance to the studied psychotherapy existed, and this allegiance effect was even stronger where the RCT was performed by the developer of the preferred treatment.12 13 Finally, the influence of publication bias remains an issue. A large survey of RCT researchers (n=318) revealed that around 25% of trials go unpublished and these unpublished studies are less likely to have favoured the new therapy. Interestingly, they noted that non-publication was primarily a result of failure to write up and submit trial results rather than a rejection of submitted manuscripts.14
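The cumulative effect of selective publication can be demonstrated with a toy simulation: if only nominally significant, positive trials are ‘written up’, the published literature substantially overstates a small true effect. All the numbers below (true effect, trial size, trial count) are illustrative assumptions, not estimates from the surveys cited above.

```python
import random
import statistics

random.seed(0)

def run_trial(true_effect=0.1, n=100):
    """One two-arm trial with unit-variance outcomes: return the observed
    mean difference and whether it is nominally significant and positive
    (z statistic above 1.96)."""
    treat = [random.gauss(true_effect, 1) for _ in range(n)]
    ctrl = [random.gauss(0, 1) for _ in range(n)]
    diff = statistics.mean(treat) - statistics.mean(ctrl)
    se = (2 / n) ** 0.5  # standard error of the difference in means
    return diff, diff / se > 1.96

all_effects, published = [], []
for _ in range(5000):
    diff, significant = run_trial()
    all_effects.append(diff)
    if significant:  # only significant positive results get written up
        published.append(diff)

print(f"mean effect, all trials:       {statistics.mean(all_effects):.2f}")
print(f"mean effect, 'published' only: {statistics.mean(published):.2f}")
```

Under these assumptions the full set of trials averages close to the true effect of 0.10, while the ‘published’ subset averages several times larger, so a reader of the published literature alone would badly overestimate the treatment’s benefit.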

Overall, while the ‘it-works-somewhere’ claim has become somewhat more likely to be true over the past decade, it remains possible that many published RCTs are spurious or at least overstate their claims through a combination of methodological flaws (eg, type I and type II errors), selective reporting, marketing interests and publication bias.

‘It-will-work-for-us’

The ‘it-will-work-for-us’ claim is more difficult to evaluate. The CONSORT statement mandates a clear exposition of the recruitment pathway by which patients enter the RCT. This reporting is intended to enable clinicians to judge to whom the results of the RCT apply. But again the reality falls short. A review of trials leading to clinical alerts by the US National Institutes of Health revealed that in relation to 31 eligibility criteria, only 63% were published in the main trial report and only 19% in the clinical alert.15 Inadequate reporting is even more of a problem in secondary publications such as clinical guidelines since space limitations and the need for a succinct message do not allow for detailed consideration of eligibility of trials or other determinants of external validity.16 Exclusion of common comorbidities is one of the common factors preventing real-world generalisability of RCTs. Furthermore, the population-level statistical approach to evidence-based medicine can ‘homogenise’ the complex heterogeneity of clinical reality and produce empirical data sets that lack clinical salience to real-world patients. It is not good to be an outlier when a standard population-based evidence approach is applied to your care.

Much less discussed (other than the type II error bias)17 is the possibility that true findings may be annulled because of reverse bias (ie, bias that under-estimates the treatment effect). A potential source of reverse bias in psychiatric RCTs emerges from recruitment strategies. For example, participants entering clinical trials for depression are likely to be mildly or moderately depressed, sometimes better diagnosed as having an adjustment disorder or a persistent depressive disorder, or a depression related to psychosocial adversity, but are all lumped into a ‘major depression’ category to meet recruitment targets. Patients may inflate their scores to get ‘free’ treatment while assessing clinicians may inflate scores to enhance recruitment.18 Many trials exclude those with common comorbidities, such as those with suicidal ideation, and only seldom is a developmental trauma history obtained to better inform diagnosis.19

Conclusion

While RCTs provide the most credible research design for evaluating the effects of treatment at population levels, there is justifiable concern that the way the trials are conducted results in limited external validity and clinical salience. Despite efforts using trial registration and CONSORT, the evidence indicates many RCTs fall short of these standards. Further bias is introduced by pharmaceutical industry funding, ‘championing’ by developers of psychotherapies and publication bias. Clinical practice guidelines leave judgement largely to clinicians governed by clinical experience, but their observations do carry weight and inform decisions regarding patient care. Although this may seem inadequate, it reflects the current lack of explicit methodology to evaluate efficacy claims down to the level of individual patient decision-making. It can be argued that clinical guidelines need to include advice about what counts as good and relevant evidence.20 To use RCT evidence, we need to tackle rather than ignore the real issues of whether ‘it-works-somewhere’ is actually true and even more whether this means ‘it-will-work-for-us’. Applying empirical evidence to effectively care for the individual patient is the ‘art’ of medicine. It is an art that is alive and well, but can lead to idiosyncratic practice, making guidelines still pertinent despite the many epistemological limitations of population-level clinical trial science.

References

  1.
  2.
  3.
  4.
  5.
  6.
  7.
  8.
  9.
  10.
  11.
  12.
  13.
  14.
  15.
  16.
  17.
  18.
  19.
  20.

Footnotes

  • Funding The MAC Project was supported logistically by Servier who provided financial assistance with travel and accommodation for those MAC Committee members travelling interstate or overseas to attend the meeting in Sydney (held on 18th March 2017). Members of the committee were not paid to participate in this project and Servier had no input into the content, format or outputs from this project.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Collaborators The Mood Assessment and Classification Committee (MAC Committee) comprised academic psychiatrists with clinical expertise in the management of mood disorders and researchers with an interest in depression and bipolar disorders. The independently convened committee specifically targeted contentious aspects of mood disorders diagnosis and assessment with the express aim of informing clinical practice and future research. Members of the committee held one face-to-face meeting in Sydney (Australia) to discuss the issues in depth and agree upon outcomes. These were then developed further via email correspondence.