Stabilizing Sparse Cox Model Using Statistic and Semantic Structures in Electronic Medical Records

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9078)

Abstract

Stability in clinical prediction models is crucial for transferability between studies, yet it has received little attention. The problem is paramount in high-dimensional data, which invites sparse models with feature selection capability. We introduce an effective method to stabilize the sparse Cox model of time-to-event data using statistical and semantic structures inherent in Electronic Medical Records (EMR). Model estimation is stabilized using three feature graphs built from (i) Jaccard similarity among features; (ii) an aggregation of the Jaccard similarity graph and a recently introduced semantic EMR graph; and (iii) Jaccard similarity among features transferred from a related cohort. Our experiments are conducted on two real-world hospital datasets: a heart failure cohort and a diabetes cohort. On two stability measures, the Consistency index and the signal-to-noise ratio (SNR), our proposed methods significantly increased feature stability compared with the baselines.
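The abstract compresses two technical ingredients: a feature graph built from Jaccard similarity, used to regularize the sparse Cox model, and the stability measures used to evaluate the selected features. The Python sketch below illustrates both under stated assumptions, and is not the authors' implementation. It assumes binary EMR features (present/absent per patient); a Laplacian smoothness penalty, 0.5 * sum_ij W_ij (beta_i - beta_j)^2 (equivalently beta^T L beta with L = D - W), which is a common form of graph regularization that would be added to a lasso-penalized negative log partial likelihood (the paper's exact objective is not given here); the Kuncheva consistency index for equal-size feature subsets; and SNR taken as |mean| / standard deviation of one feature's weights across bootstrap resamples. All function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def jaccard_graph(X):
    """Return W with W[i][j] = Jaccard similarity of feature columns i and j.

    X is a list of patient rows, each a list of 0/1 feature indicators
    (assumption: binary EMR features). The diagonal stays 0 so W can be
    used directly as a graph adjacency matrix.
    """
    n_features = len(X[0])
    # For each feature, the set of patients (row indices) in which it occurs.
    present = [{r for r, row in enumerate(X) if row[j]} for j in range(n_features)]
    W = [[0.0] * n_features for _ in range(n_features)]
    for i, j in combinations(range(n_features), 2):
        union = present[i] | present[j]
        if union:  # two never-observed features keep similarity 0
            W[i][j] = W[j][i] = len(present[i] & present[j]) / len(union)
    return W


def graph_penalty(beta, W):
    """Laplacian smoothness penalty 0.5 * sum_ij W_ij * (beta_i - beta_j)^2.

    Pulls weights of strongly linked features toward each other, which is
    the intuition behind graph-based stabilization (assumed form).
    """
    n = len(beta)
    return 0.5 * sum(W[i][j] * (beta[i] - beta[j]) ** 2
                     for i in range(n) for j in range(n))


def kuncheva_index(a, b, n):
    """Kuncheva consistency index for two size-k feature subsets out of n."""
    k = len(a)
    if len(b) != k or k == 0 or k == n:
        raise ValueError("subsets must have equal size k with 0 < k < n")
    r = len(set(a) & set(b))  # number of shared features
    return (r * n - k * k) / (k * (n - k))


def mean_consistency(subsets, n):
    """Average pairwise Kuncheva index, e.g. over top-k sets from bootstraps."""
    pairs = list(combinations(subsets, 2))
    return sum(kuncheva_index(a, b, n) for a, b in pairs) / len(pairs)


def feature_snr(weights):
    """SNR of one feature's weights across resamples: |mean| / sample std."""
    return abs(mean(weights)) / stdev(weights)


if __name__ == "__main__":
    # Toy data: 4 patients x 3 binary features (hypothetical).
    X = [[1, 1, 0], [1, 0, 1], [1, 1, 0], [0, 0, 1]]
    W = jaccard_graph(X)
    print(W[0][1], W[0][2], W[1][2])
    print(graph_penalty([1.0, 1.0, 0.0], W))
    print(kuncheva_index([0, 1, 2], [0, 1, 3], n=10))
```

In this toy example features 0 and 1 co-occur in two of the three patients where either appears, so W[0][1] = 2/3; identical subsets score a consistency index of 1, and the index corrects for the overlap expected by chance when k features are drawn from n.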

Author information

Corresponding author

Correspondence to Shivapratap Gopakumar.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Gopakumar, S., Nguyen, T.D., Tran, T., Phung, D., Venkatesh, S. (2015). Stabilizing Sparse Cox Model Using Statistic and Semantic Structures in Electronic Medical Records. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_26

  • DOI: https://doi.org/10.1007/978-3-319-18032-8_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18031-1

  • Online ISBN: 978-3-319-18032-8

  • eBook Packages: Computer Science (R0)