Realising the Value of Linked Data to Health Economic Analyses of Cancer Care: A Case Study of Cancer 2015

  • Practical Application
PharmacoEconomics

Abstract

There is a growing appetite for large complex databases that integrate a range of personal, socio-demographic, health, genetic and financial information on individuals. It has been argued that ‘Big Data’ will provide the necessary catalyst to advance both biomedical research and health economics and outcomes research. However, it is important that we do not succumb to being data rich but information poor. This paper discusses the benefits and challenges of building Big Data, analysing Big Data and making appropriate inferences in order to advance cancer care, using Cancer 2015 (a prospective, longitudinal, genomic cohort study in Victoria, Australia) as a case study. Cancer 2015 has been linked to State and Commonwealth reimbursement databases that have known limitations. This partly reflects the funding arrangements in Australia, a country with both public and private provision, including public funding of private healthcare, and partly the legislative frameworks that govern data linkage. Additionally, linkage is not without time delays and, as such, achieving a contemporaneous database is challenging. Despite these limitations, there is clear value in using linked data and creating Big Data. This paper describes the linked Cancer 2015 dataset, discusses estimation issues given the nature of the data and presents panel regression results that allow us to make possible inferences regarding which patient, disease, genomic and treatment characteristics explain variation in health expenditure.

Notes

  1. Note that these ‘V’ terms, and those that follow, have been variously defined in lists of the three Vs for Big Data [12], the five Vs [13], the seven Vs [14] and even ten Vs [15].

  2. MBS data will also capture private hospital services and hospital outpatient services if they are billed via Medicare.

  3. As the cohort did not specifically target the elderly, and is not expressly concerned with end-of-life care, we chose not to establish a linkage with the Commonwealth Department of Veterans’ Affairs (DVA) database. The cost of requesting Commonwealth data is also not inconsequential, so the decision was partly budgetary. As a result, our estimate of total healthcare expenditure is likely to be an underestimate. See Ward et al. [16] for an analysis that includes DVA patients.

  4. Interested researchers are invited to discuss applications to access the Cancer 2015 data with the Steering Committee.

  5. As VDL de-identify the data, they can provide data for as far back as the study team requests, although linkage and data quality do diminish for earlier years.

  6. Note that there is currently a lag with batch testing within the NGS panel, as it is more cost effective to run it with large samples of data.

  7. Although this is just the first wave of Cancer 2015 data to be linked, the breadth of the data, both in the cohort and as available from the State and Commonwealth governments, yields a combined dataset larger than 67 Mb; this will grow exponentially as enrolment and follow-up continue, and will very much be in the realm of terabytes of Big Data.

  8. The research team considered it unlikely that an individual diagnosed with cancer would not appear in either of these records; on the other hand, they may not appear in the hospital records if a palliative pathway was established from the outset or the cancer was particularly aggressive.

  9. The peak just after diagnosis is likely to be due to the large number of tests carried out to inform diagnosis and treatment alternatives and, where possible, surgery to remove the tumour.

  10. The Hausman test rejected the null hypothesis that the random effects and regressors are uncorrelated, favouring the fixed-effects specification.

  11. These were identified in the PBS records using the high-level ATC (Anatomical Therapeutic Chemical) code. The two ATC codes were included to reflect possible adverse effects of cancer treatment.

  12. The UK EQ-5D-3L value set was used to estimate EQ-5D values. We used backward (forward) extrapolation to determine QOL prior to (after) the first (last) recorded EQ-5D-3L, and linear interpolation between QOL measurement points, such that we have a measure of QOL for each time period.

  13. All analyses were undertaken in Stata/MP 13 (StataCorp LP, College Station, TX, USA). The dataset required manipulation and selection for the software to perform optimally; future analyses with more linked data are likely to require alternative software packages.
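
For reference, the statistic behind the Hausman test mentioned in note 10 is usually written as below. This is the textbook form only; the notation (the coefficient vectors, their estimated covariance matrices and the degrees of freedom k) is illustrative and is not taken from the paper.

```latex
% Textbook Hausman statistic (illustrative notation, not the authors' own).
% \hat{\beta}_{FE}, \hat{\beta}_{RE}: fixed- and random-effects estimates of
% the k coefficients on time-varying regressors.
H = \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)^{\top}
    \left[\widehat{\mathrm{Var}}\!\left(\hat{\beta}_{FE}\right)
        - \widehat{\mathrm{Var}}\!\left(\hat{\beta}_{RE}\right)\right]^{-1}
    \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)
    \;\sim\; \chi^{2}_{k} \quad \text{under } H_{0}
```

A large value of H relative to the chi-squared critical value rejects the null that the individual effects are uncorrelated with the regressors, which is the situation in which the fixed-effects estimator is preferred.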

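The gap-filling described in note 12 can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function name `fill_qol`, the integer period index and the example values are hypothetical, and the extrapolation is assumed to be flat (the first or last observed value is carried backward or forward), which the note does not specify.

```python
from typing import Dict, List


def fill_qol(observed: Dict[int, float], n_periods: int) -> List[float]:
    """Return one QOL value for each period 0..n_periods-1.

    ASSUMPTION: extrapolation is flat, i.e. the first (last) observed value
    is carried backward (forward). The source does not state the exact rule.
    Between observations, values are linearly interpolated.
    """
    if not observed:
        raise ValueError("at least one observed QOL value is required")
    times = sorted(observed)
    first, last = times[0], times[-1]
    filled: List[float] = []
    for t in range(n_periods):
        if t <= first:
            # Backward extrapolation: before (or at) the first measurement.
            filled.append(observed[first])
        elif t >= last:
            # Forward extrapolation: after (or at) the last measurement.
            filled.append(observed[last])
        else:
            # Nearest measurement points on either side of period t.
            lo = max(s for s in times if s <= t)
            hi = min(s for s in times if s >= t)
            if lo == hi:
                filled.append(observed[lo])
            else:
                weight = (t - lo) / (hi - lo)
                filled.append(observed[lo] + weight * (observed[hi] - observed[lo]))
    return filled


# Hypothetical patient with EQ-5D values recorded in periods 2 and 6 only.
values = fill_qol({2: 0.60, 6: 0.80}, n_periods=9)
```

Periods 0 to 2 take the first recorded value, periods 6 to 8 take the last, and periods 3 to 5 lie on the straight line between the two measurements.
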
References

  1. Trusheim MR, Berndt ER, Douglas FL. Stratified medicine: strategic and economic implications of combining drugs and clinical biomarkers. Nat Rev Drug Discov. 2007;6(4):287–93.

  2. Sullivan R, Peppercorn J, Sikora K, Zalcberg J, Meropol NJ, Amir E, et al. Delivering affordable cancer care in high-income countries. Lancet Oncol. 2011;12(10):933–80.

  3. Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst. 2014;2(1):3.

  4. Bates DW, Saria S, Ohno-Machado L, Shah A, Escobar G. Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Aff. 2014;33(7):1123–31.

  5. Boyd D, Crawford K. Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon. Inform Commun Soc. 2012;15(5):662–79.

  6. Collins B. Big Data and health economics: strengths, weaknesses, opportunities and threats. Pharmacoeconomics. 2015 (Epub).

  7. Hart CL, MacKinnon PL, Watt GC, Upton MN, McConnachie A, Hole DJ, et al. The Midspan studies. Int J Epidemiol. 2005;34(1):28–34.

  8. Geue C, Briggs A, Lewsey J, Lorgelly P. Population ageing and healthcare expenditure projections: new evidence from a time to death approach. Eur J Health Econ. 2014;15(8):885–96.

  9. Parisot JP, Thorne H, Fellowes A, Doig K, Lucas M, McNeil JJ, et al. “Cancer 2015”: a prospective, population-based cancer cohort—phase 1: feasibility of genomics-guided precision medicine in the clinic. J Personalised Med. 2015;5(4):354–69.

  10. Katz SJ. Cancer care delivery research and the National Cancer Institute SEER program challenges and opportunities. JAMA. 2015;313(2):165–73.

  11. Reichman ME, Altekruse S, Li CI, Chen VW, Deapen D, Potts M, et al. Feasibility study for collection of HER2 data by National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) Program central cancer registries. Cancer Epidemiol Biomark Prev. 2010;19(1):144–7.

  12. Laney D. 3D data management: controlling data volume, velocity and variety. META Group Research Note; 2001. File: 949. http://blogs.gartner.com/douglaney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf. Accessed 13 Apr 2015.

  13. Marr B. Big Data: using SMART big data, analytics and metrics to make better decisions and improve performance. Chichester: John Wiley & Sons; 2015.

  14. McNulty E. Understanding Big Data: the seven V’s. 2014. http://dataconomy.com/seven-vs-big-data/. Accessed 13 Apr 2015.

  15. Borne K, editor. Top 10 Big Data challenges – a serious look at 10 Big Data V’s. MapR Blog 2014 Apr 11. https://www.mapr.com/blog/top-10-big-data-challenges-%E2%80%93-seriouslook-10-big-data-v%E2%80%99s. Accessed 13 Apr 2015.

  16. Ward RL, Laaksonen MA, Gool K, Pearson SA, Daniels B, Bastick P, et al. Cost of cancer care for patients undergoing chemotherapy: the Elements of Cancer Care study. Asia Pac J Clin Oncol. 2015;11(2):178–86.

  17. Wong S, Fellowes A, Doig K, Ellul J, Bosma T, Irwin D, et al. Assessing the clinical value of targeted massively parallel sequencing in a longitudinal, prospective population-based study of cancer patients. Br J Cancer. 2015;112(8):1411–20.

  18. Sundararajan V, Henderson TM, Ackland M, Marshall R. Linkage of the Victorian Admitted Episodes Dataset. Symposium on health data linkage: its value for Australian health policy development and policy relevant research; 20–21 Mar 2002; Sydney.

  19. Independent Hospital Pricing Authority (IHPA). National Efficient Price Determination. 2015. http://www.ihpa.gov.au/internet/ihpa/publishing.nsf/Content/national-efficient-price-determination-lp. Accessed 2 Feb 2015.

  20. Independent Hospital Pricing Authority (IHPA). Technical specifications and NWAU calculators. 2015. http://www.ihpa.gov.au/internet/ihpa/publishing.nsf/Content/tech-specs-lp. Accessed 2 Feb 2015.

  21. Mihaylova B, Briggs A, O’Hagan A, Thompson SG. Review of statistical methods for analysing healthcare resources and costs. Health Econ. 2011;20(8):897–916.

  22. Jones AM. Health econometrics. In: Culyer AJ, Newhouse JP, editors. Handbook of health economics. Part 1. Amsterdam: North Holland; 2000. p. 265–344.

  23. Wong SQ, Li J, Tan AY, Vedururu R, Pang J-MB, Do H, et al. Sequence artefacts in a prospective series of formalin-fixed tumours tested for mutations in hotspot regions by massively parallel sequencing. BMC Med Genomics. 2014;7(1):23.

  24. Ellis RP, Fiebig DG, Johar M, Jones G, Savage E. Explaining health care expenditure variation: large-sample evidence using linked survey and health administrative data. Health Econ. 2013;22(9):1093–110.

  25. Medeiros BC, Satram-Hoang S, Hurst D, Hoang KQ, Momin F, Reyes C. Big data analysis of treatment patterns and outcomes among elderly acute myeloid leukemia patients in the United States. Ann Hematol. 2015;94(7):1127–38.

  26. Lorgelly P, Knott R, Doble B, Harris M. Modelling the cost of cancer: a system of equations approach to understanding inter-relationships. Health Economists Study Group; 22–24 Jun 2015; Lancaster University, UK.

  27. Saleema J, Shenoy PD, Venugopal K, Patnaik L. Cancer prognosis prediction model using data mining techniques. Data Min Knowl Eng. 2014;6(1):21–9.

  28. Al-Bahrani R, Agrawal A, Choudhary A. Colon cancer survival prediction using ensemble data mining on SEER data. 2013 IEEE International Conference on Big Data; 6–9 Oct 2013; Silicon Valley.

  29. Crown WH. Potential application of machine learning in health outcomes research and some statistical cautions. Value Health. 2015;18(2):137–40.

  30. Piana R. National Cancer Institute pulls PSA data from SEER. The ASCO Post. 2015;6(11). http://www.ascopost.com/issues/june-25,-2015/national-cancer-institute-pulls-psa-data-from-seer.aspx. Accessed 22 Oct 2015.

  31. Blakely T, Atkinson J, Kvizhinadze G, Wilson N, Davies A, Clarke P. Patterns of cancer care costs in a country with detailed individual data. Med Care. 2015;53(4):302–9.

Acknowledgments

We would like to sincerely thank all of the cancer patients who agreed to participate in the cohort. We acknowledge the contributions of the following staff and collaborators of this multi-site cohort: Kristy Barnes-Cullen, Kate Crough, Jessica McDonald, Kim Waddell, Jasmine Marr, Mandy Ballinger, Ann Officer, Anne Fennessy, Sonia Mailer, Connie Mascarenhas and Mathew Shibi from the Peter MacCallum Cancer Centre; Kate Richards, Laura Zamurs and Kate Hurford from Cabrini Health; Carolyn Wielens, Lea-Anne Harrison, Judi Broad, Robert Swiger, Tina Smith and Anne Woollett from The Andrew Love Cancer Centre, Barwon Health; Sandra Robinson and Marcelle Hennig from SouthWest Health; Monica Merceica, Stefanie Hartley, Pat Bugeja, Lidia Veca, Christopher Bates and Nicole Ng from The Royal Melbourne Hospital, Melbourne Health; Thomas John from The Olivia Newton John Cancer and Wellness Centre, Austin Health; Neil Watkins from Monash Medical Centre, Southern Health; and Paul Waring and Melissa Southey from Department of Pathology, University of Melbourne. We also acknowledge the contributions of the Cancer 2015 Expert Advisory Committee consisting of Richard Sullivan, John Zalcberg, Andrew Biankin, Sean Grimmond, David Roder and David Goldstein.

The ‘Big Data’ dataset would not have been possible without the collaboration of the Commonwealth Department of Human Services (DHS) and the Victorian Data Linkage Unit (VDL) at the Victorian Department of Health and Human Services. The authors are especially grateful to Rhonda Charlesworth at DHS and Ying Chen at VDL.

Cancer 2015 investigators

David M. Thomas, Division of Cancer Research, Peter MacCallum Cancer Centre; Sir Peter MacCallum Department of Oncology, The University of Melbourne; The Kinghorn Cancer Centre and Garvan Institute

Stephen B. Fox, Division of Cancer Research, Peter MacCallum Cancer Centre; Sir Peter MacCallum Department of Oncology, The University of Melbourne; Department of Pathology, Peter MacCallum Cancer Centre; The Department of Pathology, The University of Melbourne

Heather Thorne, Division of Cancer Research, Peter MacCallum Cancer Centre; Sir Peter MacCallum Department of Oncology, The University of Melbourne

John P. Parisot, Division of Cancer Research, Peter MacCallum Cancer Centre; Sir Peter MacCallum Department of Oncology, The University of Melbourne

Ken Doig, Division of Cancer Research, Peter MacCallum Cancer Centre; Sir Peter MacCallum Department of Oncology, The University of Melbourne

Andrew Fellowes, Department of Pathology, Peter MacCallum Cancer Centre

Alexander Dobrovic, Translational Genomics and Epigenomics Laboratory, Olivia Newton-John Cancer Research Institute; The Department of Pathology, The University of Melbourne; School of Cancer Medicine, La Trobe University

Paul A. James, Division of Cancer Medicine, Peter MacCallum Cancer Centre

Lara Lipton, Department of Medical Oncology, The Royal Melbourne Hospital

David Ashley, The Andrew Love Cancer Centre, Geelong Hospital, Barwon Health

Theresa Hayes, Warrnambool Hospital, SouthWest Healthcare

Paul McMurrick, Department of Surgery, Cabrini Institute, Cabrini Health

Gary Richardson, Department of Haematology and Oncology, Cabrini Institute, Cabrini Health

Paula Lorgelly, Centre for Health Economics, Monash University

Mark Lucas, Department of Epidemiology and Preventative Medicine, Alfred Centre, Monash University

John J. McNeil, Department of Epidemiology and Preventative Medicine, Alfred Centre, Monash University

Tom John, Department of Medical Oncology, Olivia Newton John Cancer and Wellness Centre, Austin Health

Author information

Corresponding author

Correspondence to Paula K. Lorgelly.

Ethics declarations

Cancer 2015 is funded by the Victorian Cancer Agency Translational Research Program. PKL, BD and RJK have no conflicts of interest to declare. This study was approved by the Human Research Ethics Committee (HREC) at the Peter MacCallum Cancer Centre (HREC number 11/69) and all participating hospitals; and has been performed in accordance with the ethical standards of the Declaration of Helsinki. Informed consent was obtained from all individual participants included in the study.

Author contributions

All authors contributed to the paper concept, data analyses, interpretation of results and report writing. PKL acts as the guarantor.

Additional information

The list of Cancer 2015 investigators is given in Acknowledgments.

Appendix

See Fig. 3.

Fig. 3 Cancer 2015 data collection and linkage. MBS Medicare Benefits Schedule, PBS Pharmaceutical Benefits Scheme, VAED Victorian Admitted Episodes Dataset, VEMD Victorian Emergency Minimum Dataset

About this article

Cite this article

Lorgelly, P.K., Doble, B., Knott, R.J. et al. Realising the Value of Linked Data to Health Economic Analyses of Cancer Care: A Case Study of Cancer 2015. PharmacoEconomics 34, 139–154 (2016). https://doi.org/10.1007/s40273-015-0343-2
