Abstract
There is a growing appetite for large complex databases that integrate a range of personal, socio-demographic, health, genetic and financial information on individuals. It has been argued that ‘Big Data’ will provide the necessary catalyst to advance both biomedical research and health economics and outcomes research. However, it is important that we do not succumb to being data rich but information poor. This paper discusses the benefits and challenges of building Big Data, analysing Big Data and making appropriate inferences in order to advance cancer care, using Cancer 2015 (a prospective, longitudinal, genomic cohort study in Victoria, Australia) as a case study. Cancer 2015 has been linked to State and Commonwealth reimbursement databases that have known limitations. This partly reflects the funding arrangements in Australia, a country with both public and private provision, including public funding of private healthcare, and partly the legislative frameworks that govern data linkage. Additionally, linkage is not without time delays and, as such, achieving a contemporaneous database is challenging. Despite these limitations, there is clear value in using linked data and creating Big Data. This paper describes the linked Cancer 2015 dataset, discusses estimation issues given the nature of the data and presents panel regression results that allow us to make possible inferences regarding which patient, disease, genomic and treatment characteristics explain variation in health expenditure.
Notes
MBS data will also capture private hospital services and hospital outpatient services if they are billed via Medicare.
As the cohort did not specifically target the elderly, and the study is not expressly concerned with end-of-life care, we chose not to establish a linkage with the Commonwealth Department of Veterans’ Affairs (DVA) database. The decision was also budgetary, as the cost of requesting Commonwealth data is not inconsequential. As a result, our estimate of total healthcare expenditure is likely to be an underestimate. See Ward et al. [16] for an analysis that includes DVA patients.
Interested researchers are invited to discuss applications to access the Cancer 2015 data with the Steering Committee.
As VDL de-identify the data, they can provide data for as far back as the study team requests, although linkage and data quality do diminish.
Note that there is currently a lag with batch testing within the NGS panel, as it is more cost effective to run it with large samples of data.
Although this is just the first wave of data from Cancer 2015 to be linked, the breadth of the data in both the cohort and available from the State and Commonwealth governments results in a combined dataset that is greater than 67 Mb; this will grow exponentially as enrollment and follow-up continue, and will very much be in the realm of terabytes of Big Data.
The research team considered it unlikely that an individual diagnosed with cancer would not appear in either of these records; on the other hand, they may not appear in the hospital records if a palliative pathway was established from the outset or the cancer was particularly aggressive.
The peak just after diagnosis is likely to be due to the large number of tests that occur to inform diagnosis and treatment alternatives, and in those circumstances where it is possible, the surgery to remove the tumour.
The Hausman test rejected the null hypothesis that the random effects and regressors are uncorrelated, favouring the fixed-effects specification.
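For readers less familiar with the test, its standard textbook form is sketched below. The notation is ours and is not taken from the paper: \hat{\beta}_{FE} and \hat{\beta}_{RE} denote the fixed-effects and random-effects estimates of the coefficients on the time-varying regressors, and k is the number of coefficients compared.

```latex
H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})^{\top}
    \left[ \widehat{\operatorname{Var}}(\hat{\beta}_{FE})
         - \widehat{\operatorname{Var}}(\hat{\beta}_{RE}) \right]^{-1}
    (\hat{\beta}_{FE} - \hat{\beta}_{RE})
    \;\sim\; \chi^{2}_{k} \quad \text{under } H_{0}
```

A large value of H relative to the \chi^{2}_{k} distribution indicates that the two sets of estimates differ systematically, which is evidence against the random-effects assumption.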
These were identified in the PBS records using the high-level ATC (Anatomical Therapeutic Chemical) code. These two ATC codes were included to reflect possible adverse effects of cancer treatment.
The UK EQ-5D-3L value set was used to estimate EQ-5D utility values. We used backward (and forward) extrapolation to determine QOL prior to (after) the first (last) recorded EQ-5D-3L, and linear interpolation between QOL measurement points, such that we have a measure of QOL for each time period.
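The following is a minimal sketch of how a QOL value could be assigned to every time period from a handful of recorded utilities. It is illustrative only and is not the study's Stata code. The function name `fill_qol`, the period indexing and the example utilities are hypothetical. The note does not state the form of extrapolation used, so the sketch assumes the simplest one: the first recorded value is carried backward and the last recorded value is carried forward.

```python
from bisect import bisect_right


def fill_qol(periods, measured):
    """Return a QOL (utility) value for every period.

    periods  : iterable of period indices (e.g. months since diagnosis)
    measured : dict mapping period -> recorded EQ-5D-3L utility
    """
    if not measured:
        raise ValueError("at least one recorded utility is required")
    times = sorted(measured)
    values = [measured[t] for t in times]
    filled = {}
    for p in periods:
        if p <= times[0]:
            # Assumption: carry the first recorded value backward.
            filled[p] = values[0]
        elif p >= times[-1]:
            # Assumption: carry the last recorded value forward.
            filled[p] = values[-1]
        else:
            # Linear interpolation between the two surrounding measurements.
            i = bisect_right(times, p)  # times[i - 1] <= p < times[i]
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            filled[p] = v0 + (v1 - v0) * (p - t0) / (t1 - t0)
    return filled


# Hypothetical patient with utilities recorded in periods 2 and 4 only.
example = fill_qol(range(0, 7), {2: 0.5, 4: 0.75})
```

In this example, periods 0 to 2 take the first recorded utility, period 3 is interpolated, and periods 4 to 6 take the last recorded utility. Other extrapolation rules (for example, a trend-based one) would only change the two boundary branches.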
All analyses were undertaken in Stata/MP® 13 (StataCorp LP, College Station, TX, USA). The dataset required manipulation and selection to perform optimally; it is likely that future analyses with more linked data will require alternative software packages.
References
Trusheim MR, Berndt ER, Douglas FL. Stratified medicine: strategic and economic implications of combining drugs and clinical biomarkers. Nat Rev Drug Discov. 2007;6(4):287–93.
Sullivan R, Peppercorn J, Sikora K, Zalcberg J, Meropol NJ, Amir E, et al. Delivering affordable cancer care in high-income countries. Lancet Oncol. 2011;12(10):933–80.
Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst. 2014;2(1):3.
Bates DW, Saria S, Ohno-Machado L, Shah A, Escobar G. Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Aff. 2014;33(7):1123–31.
Boyd D, Crawford K. Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon. Inform Commun Soc. 2012;15(5):662–79.
Collins B. Big Data and health economics: strengths, weaknesses, opportunities and threats. Epub: Pharmacoeconomics; 2015.
Hart CL, MacKinnon PL, Watt GC, Upton MN, McConnachie A, Hole DJ, et al. The Midspan studies. Int J Epidemiol. 2005;34(1):28–34.
Geue C, Briggs A, Lewsey J, Lorgelly P. Population ageing and healthcare expenditure projections: new evidence from a time to death approach. Eur J Health Econ. 2014;15(8):885–96.
Parisot JP, Thorne H, Fellowes A, Doig K, Lucas M, McNeil JJ, et al. “Cancer 2015”: a prospective, population-based cancer cohort—phase 1: feasibility of genomics-guided precision medicine in the clinic. J Personalised Med. 2015;5(4):354–69.
Katz SJ. Cancer care delivery research and the National Cancer Institute SEER program challenges and opportunities. JAMA. 2015;313(2):165–73.
Reichman ME, Altekruse S, Li CI, Chen VW, Deapen D, Potts M, et al. Feasibility study for collection of HER2 data by National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) Program central cancer registries. Cancer Epidemiol Biomark Prev. 2010;19(1):144–7.
Laney D. 3D data management: controlling data volume, velocity and variety. META Group Research Note; 2001. File: 949. http://blogs.gartner.com/douglaney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf. Accessed 13 Apr 2015.
Marr B. Big Data: using SMART big data, analytics and metrics to make better decisions and improve performance. Chichester: John Wiley & Sons; 2015.
McNulty E. Understanding Big Data: the seven V’s. 2014. http://dataconomy.com/seven-vs-big-data/. Accessed 13 Apr 2015.
Borne K, editor. Top 10 Big Data challenges—a serious look at 10 Big Data V’s. MapR Blog 2014 Apr 11. https://www.mapr.com/blog/top-10-big-data-challenges-%E2%80%93-seriouslook-10-big-data-v%E2%80%99s. Accessed 13 Apr 2015.
Ward RL, Laaksonen MA, Gool K, Pearson SA, Daniels B, Bastick P, et al. Cost of cancer care for patients undergoing chemotherapy: the Elements of Cancer Care study. Asia Pac J Clin Oncol. 2015;11(2):178–86.
Wong S, Fellowes A, Doig K, Ellul J, Bosma T, Irwin D, et al. Assessing the clinical value of targeted massively parallel sequencing in a longitudinal, prospective population-based study of cancer patients. Br J Cancer. 2015;112(8):1411–20.
Sundararajan V, Henderson TM, Ackland M, Marshall R. Linkage of the Victorian Admitted Episodes Dataset. Symposium on health data linkage: its value for Australian health policy development and policy relevant research; 20–21 Mar 2002; Sydney.
Independent Hospital Pricing Authority (IHPA). National Efficient Price Determination. 2015. http://www.ihpa.gov.au/internet/ihpa/publishing.nsf/Content/national-efficient-price-determination-lp. Accessed 2 Feb 2015.
Independent Hospital Pricing Authority (IHPA). Technical specifications and NWAU calculators. 2015. http://www.ihpa.gov.au/internet/ihpa/publishing.nsf/Content/tech-specs-lp. Accessed 2 Feb 2015.
Mihaylova B, Briggs A, O’Hagan A, Thompson SG. Review of statistical methods for analysing healthcare resources and costs. Health Econ. 2011;20(8):897–916.
Jones AM. Health econometrics. In: Culyer AJ, Newhouse JP, editors. Handbook of health economics. Part 1. Amsterdam: North Holland; 2000. p. 265–344.
Wong SQ, Li J, Tan AY, Vedururu R, Pang J-MB, Do H, et al. Sequence artefacts in a prospective series of formalin-fixed tumours tested for mutations in hotspot regions by massively parallel sequencing. BMC Med Genomics. 2014;7(1):23.
Ellis RP, Fiebig DG, Johar M, Jones G, Savage E. Explaining health care expenditure variation: large-sample evidence using linked survey and health administrative data. Health Econ. 2013;22(9):1093–110.
Medeiros BC, Satram-Hoang S, Hurst D, Hoang KQ, Momin F, Reyes C. Big data analysis of treatment patterns and outcomes among elderly acute myeloid leukemia patients in the United States. Ann Hematol. 2015;94(7):1127–38.
Lorgelly P, Knott R, Doble B, Harris M. Modelling the cost of cancer: a system of equations approach to understanding inter-relationships. Health Economists Study Group; 22–24 Jun 2015; Lancaster University, UK.
Saleema J, Shenoy PD, Venugopal K, Patnaik L. Cancer prognosis prediction model using data mining techniques. Data Min Knowl Eng. 2014;6(1):21–9.
Al-Bahrani R, Agrawal A, Choudhary A. Colon cancer survival prediction using ensemble data mining on SEER data. 2013 IEEE International Conference on Big Data; 6–9 Oct 2013; Silicon Valley.
Crown WH. Potential application of machine learning in health outcomes research and some statistical cautions. Value Health. 2015;18(2):137–40.
Piana R. National Cancer Institute pulls PSA data from SEER. The ASCO Post. 2015;6(11). http://www.ascopost.com/issues/june-25,-2015/national-cancer-institute-pulls-psa-data-from-seer.aspx. Accessed 22 Oct 2015.
Blakely T, Atkinson J, Kvizhinadze G, Wilson N, Davies A, Clarke P. Patterns of cancer care costs in a country with detailed individual data. Med Care. 2015;53(4):302–9.
Acknowledgments
We would like to sincerely thank all of the cancer patients who agreed to participate in the cohort. We acknowledge the contributions of the following staff and collaborators of this multi-site cohort: Kristy Barnes-Cullen, Kate Crough, Jessica McDonald, Kim Waddell, Jasmine Marr, Mandy Ballinger, Ann Officer, Anne Fennessy, Sonia Mailer, Connie Mascarenhas and Mathew Shibi from the Peter MacCallum Cancer Centre; Kate Richards, Laura Zamurs and Kate Hurford from Cabrini Health; Carolyn Wielens, Lea-Anne Harrison, Judi Broad, Robert Swiger, Tina Smith and Anne Woollett from The Andrew Love Cancer Centre, Barwon Health; Sandra Robinson and Marcelle Hennig from SouthWest Health; Monica Merceica, Stefanie Hartley, Pat Bugeja, Lidia Veca, Christopher Bates and Nicole Ng from The Royal Melbourne Hospital, Melbourne Health; Thomas John from The Olivia Newton John Cancer and Wellness Centre, Austin Health; Neil Watkins from Monash Medical Centre, Southern Health; and Paul Waring and Melissa Southey from Department of Pathology, University of Melbourne. We also acknowledge the contributions of the Cancer 2015 Expert Advisory Committee consisting of Richard Sullivan, John Zalcberg, Andrew Biankin, Sean Grimmond, David Roder and David Goldstein.
The linked ‘Big Data’ dataset would not have been possible without the collaboration of the Commonwealth Department of Human Services, and the Victorian Data Linkage Unit at the Victorian Department of Health and Human Services. The authors are especially grateful to Rhonda Charlesworth at DHS and Ying Chen at VDL.
Cancer 2015 investigators
David M. Thomas, Division of Cancer Research, Peter MacCallum Cancer Centre; Sir Peter MacCallum Department of Oncology, The University of Melbourne; The Kinghorn Cancer Centre and Garvan Institute
Stephen B. Fox, Division of Cancer Research, Peter MacCallum Cancer Centre; Sir Peter MacCallum Department of Oncology, The University of Melbourne; Department of Pathology, Peter MacCallum Cancer Centre; The Department of Pathology, The University of Melbourne
Heather Thorne, Division of Cancer Research, Peter MacCallum Cancer Centre; Sir Peter MacCallum Department of Oncology, The University of Melbourne
John P. Parisot, Division of Cancer Research, Peter MacCallum Cancer Centre; Sir Peter MacCallum Department of Oncology, The University of Melbourne
Ken Doig, Division of Cancer Research, Peter MacCallum Cancer Centre; Sir Peter MacCallum Department of Oncology, The University of Melbourne
Andrew Fellowes, Department of Pathology, Peter MacCallum Cancer Centre
Alexander Dobrovic, Translational Genomics and Epigenomics Laboratory, Olivia Newton-John Cancer Research Institute; The Department of Pathology, The University of Melbourne; School of Cancer Medicine, La Trobe University
Paul A. James, Division of Cancer Medicine, Peter MacCallum Cancer Centre
Lara Lipton, Department of Medical Oncology, The Royal Melbourne Hospital
David Ashley, The Andrew Love Cancer Centre, Geelong Hospital, Barwon Health
Theresa Hayes, Warrnambool Hospital, SouthWest Healthcare
Paul McMurrick, Department of Surgery, Cabrini Institute, Cabrini Health
Gary Richardson, Department of Haematology and Oncology, Cabrini Institute, Cabrini Health
Paula Lorgelly, Centre for Health Economics, Monash University
Mark Lucas, Department of Epidemiology and Preventative Medicine, Alfred Centre, Monash University
John J. McNeil, Department of Epidemiology and Preventative Medicine, Alfred Centre, Monash University
Tom John, Department of Medical Oncology, Olivia Newton John Cancer and Wellness Centre, Austin Health
Ethics declarations
Cancer 2015 is funded by the Victorian Cancer Agency Translational Research Program. PKL, BD and RJK have no conflicts of interest to declare. This study was approved by the Human Research Ethics Committee (HREC) at the Peter MacCallum Cancer Centre (HREC number 11/69) and all participating hospitals; and has been performed in accordance with the ethical standards of the Declaration of Helsinki. Informed consent was obtained from all individual participants included in the study.
Author contributions
All authors contributed to the paper concept, data analyses, interpretation of results and report writing. PKL acts as the guarantor.
Additional information
The list of Cancer 2015 investigators is given in Acknowledgments.
Appendix
See Fig. 3.
Cite this article
Lorgelly, P.K., Doble, B., Knott, R.J. et al. Realising the Value of Linked Data to Health Economic Analyses of Cancer Care: A Case Study of Cancer 2015. PharmacoEconomics 34, 139–154 (2016). https://doi.org/10.1007/s40273-015-0343-2
DOI: https://doi.org/10.1007/s40273-015-0343-2