Stabilizing Sparse Cox Model Using Statistic and Semantic Structures in Electronic Medical Records

Gopakumar, Shivapratap; Nguyen, Tu Dinh; Tran, Truyen; Phung, Dinh; Venkatesh, Svetha

doi:10.1007/978-3-319-18032-8_26

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9078)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Stability in clinical prediction models is crucial for transferability between studies, yet it has received little attention. The problem is most acute in high-dimensional data, which invites sparse models with feature selection capability. We introduce an effective method to stabilize a sparse Cox model of time-to-event outcomes using statistical and semantic structures inherent in Electronic Medical Records (EMR). Model estimation is stabilized using three feature graphs built from (i) Jaccard similarity among features, (ii) an aggregation of the Jaccard similarity graph and a recently introduced semantic EMR graph, and (iii) Jaccard similarity among features transferred from a related cohort. Our experiments are conducted on two real-world hospital datasets: a heart failure cohort and a diabetes cohort. On two stability measures, the Consistency index and the signal-to-noise ratio (SNR), our proposed methods significantly increased feature stability compared with the baselines.
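To make two ingredients of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes binary (presence/absence) EMR features and computes a Jaccard similarity graph among feature columns, and it assumes the Consistency index takes Kuncheva's standard form for two equal-sized feature subsets, I_C = (r n - k^2) / (k (n - k)), where r is the overlap, k the subset size, and n the total number of features. The function names `jaccard_feature_graph` and `consistency_index` and the `threshold` parameter are hypothetical. How the graph enters the Cox objective is not stated in the abstract and is not shown here.

```python
import numpy as np


def jaccard_feature_graph(X, threshold=0.0):
    """Pairwise Jaccard similarity between feature columns of X.

    Assumption: any nonzero entry means "feature present for this patient".
    Returns a symmetric (n_features, n_features) matrix with a zero diagonal.
    """
    B = (np.asarray(X) != 0).astype(float)
    inter = B.T @ B                      # |A_i intersect A_j| for every feature pair
    counts = np.diag(inter)              # |A_i|
    union = counts[:, None] + counts[None, :] - inter
    S = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    np.fill_diagonal(S, 0.0)             # no self-loops in the feature graph
    S[S < threshold] = 0.0               # optional sparsification (hypothetical knob)
    return S


def consistency_index(a, b, n_features):
    """Kuncheva's consistency index between two feature subsets of equal size."""
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k or not 0 < k < n_features:
        raise ValueError("subsets must have equal size k with 0 < k < n_features")
    r = len(a & b)                       # number of features selected in both runs
    return (r * n_features - k * k) / (k * (n_features - k))


if __name__ == "__main__":
    # Toy data: 4 patients x 3 binary features.
    X = np.array([[1, 1, 0],
                  [1, 0, 0],
                  [0, 1, 1],
                  [1, 1, 0]])
    print(jaccard_feature_graph(X))
    print(consistency_index({0, 1, 2}, {0, 1, 3}, n_features=10))
```

In a stability study, the consistency index would typically be averaged over pairs of feature subsets selected on different resamples of the training data; values near 1 indicate that the same features are chosen repeatedly.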

Author information

Authors and Affiliations

Center for Pattern Recognition and Data Analytics, Deakin University, Melbourne, 3216, Australia
Shivapratap Gopakumar, Tu Dinh Nguyen, Truyen Tran, Dinh Phung & Svetha Venkatesh

Corresponding author

Correspondence to Shivapratap Gopakumar.

Editor information

Editors and Affiliations

Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
Tru Cao
Singapore Management University, Singapore, Singapore
Ee-Peng Lim
Nanjing University, Nanjing, China
Zhi-Hua Zhou
Japan Advanced Institute of Science and Technology, Nomi City, Japan
Tu-Bao Ho
The University of Hong Kong, Hong Kong, Hong Kong SAR
David Cheung
Osaka University, Osaka, Japan
Hiroshi Motoda

About this paper

Cite this paper

Gopakumar, S., Nguyen, T.D., Tran, T., Phung, D., Venkatesh, S. (2015). Stabilizing Sparse Cox Model Using Statistic and Semantic Structures in Electronic Medical Records. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_26

DOI: https://doi.org/10.1007/978-3-319-18032-8_26
Published: 09 May 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18031-1
Online ISBN: 978-3-319-18032-8
eBook Packages: Computer Science, Computer Science (R0)
