Abstract
Stability in clinical prediction models is crucial for transferability between studies, yet it has received little attention. The problem is most acute in high-dimensional data, which invites sparse models with feature selection capability. We introduce an effective method to stabilize a sparse Cox model for time-to-event data using statistical and semantic structures inherent in Electronic Medical Records (EMR). Model estimation is stabilized using three feature graphs built from (i) Jaccard similarity among features, (ii) aggregation of the Jaccard similarity graph and a recently introduced semantic EMR graph, and (iii) Jaccard similarity among features transferred from a related cohort. Our experiments are conducted on two real-world hospital datasets: a heart failure cohort and a diabetes cohort. On two stability measures, the Consistency index and the signal-to-noise ratio (SNR), our proposed methods significantly increased feature stability compared with the baselines.
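As a rough illustration of two quantities named in the abstract, the sketch below builds a Jaccard similarity graph over binary features and computes a consistency index between two selected feature subsets. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `jaccard_feature_graph` and `consistency_index` are hypothetical, features are assumed to be binary indicators per patient, and the index is assumed to follow Kuncheva's definition for two equal-size subsets, (r·d − k²) / (k·(d − k)), where r is the overlap, k the subset size, and d the total number of features.

```python
from itertools import combinations


def jaccard_feature_graph(rows):
    """Build a feature graph weighted by Jaccard similarity.

    rows: list of equal-length 0/1 lists, one list per patient (assumed format).
    Returns a dict {(i, j): weight} for feature pairs i < j with weight > 0.
    """
    n_features = len(rows[0])
    # support[f] = set of patient indices in which feature f is present
    support = [set() for _ in range(n_features)]
    for patient, row in enumerate(rows):
        for feature, value in enumerate(row):
            if value:
                support[feature].add(patient)

    edges = {}
    for i, j in combinations(range(n_features), 2):
        union = support[i] | support[j]
        if not union:
            continue  # neither feature ever occurs; similarity left undefined
        weight = len(support[i] & support[j]) / len(union)
        if weight > 0:
            edges[(i, j)] = weight
    return edges


def consistency_index(subset_a, subset_b, n_features):
    """Consistency index between two equal-size feature subsets.

    Assumes Kuncheva's definition: (r*d - k^2) / (k*(d - k)).
    """
    a, b = set(subset_a), set(subset_b)
    k = len(a)
    if len(b) != k:
        raise ValueError("subsets must have the same size")
    if not 0 < k < n_features:
        raise ValueError("subset size must satisfy 0 < k < n_features")
    r = len(a & b)
    return (r * n_features - k * k) / (k * (n_features - k))


if __name__ == "__main__":
    toy = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 0]]  # toy data, not from the paper
    print(jaccard_feature_graph(toy))
    print(consistency_index([0, 1, 2], [1, 2, 3], 10))
```

In this reading, a higher index across models trained on resampled data indicates more stable feature selection; the edge weights are the kind of pairwise structure a graph-based regularizer could use.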
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Gopakumar, S., Nguyen, T.D., Tran, T., Phung, D., Venkatesh, S. (2015). Stabilizing Sparse Cox Model Using Statistic and Semantic Structures in Electronic Medical Records. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_26
DOI: https://doi.org/10.1007/978-3-319-18032-8_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18031-1
Online ISBN: 978-3-319-18032-8
eBook Packages: Computer Science, Computer Science (R0)