Stabilizing Linear Prediction Models Using Autoencoder

Conference paper, Advanced Data Mining and Applications (ADMA 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10086)


Abstract

To date, the instability of prognostic predictors in sparse high-dimensional models, which hinders their clinical adoption, has received little attention. Stable prediction is often overlooked in favour of performance, yet stability remains essential when models are adopted in critical areas such as healthcare. Our study proposes a stabilization scheme that detects higher-order feature correlations. Using a linear model as the basis for prediction, we achieve feature stability by regularizing latent correlations among features, which are modelled using an autoencoder network. Stability is further enhanced by combining a recent technique that uses a feature graph and by augmenting external unlabelled data for training the autoencoder network. Our experiments are conducted on a heart-failure cohort from an Australian hospital. Stability was measured using the consistency index for feature subsets and the signal-to-noise ratio for model parameters. Our methods demonstrated significant improvements in feature stability and model estimation stability compared to baselines.
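
This page does not reproduce the method details, but the abstract names every ingredient: an autoencoder that models latent feature correlation, a feature graph used to regularize the linear model's weights, and two stability metrics (the consistency index for feature subsets, the signal-to-noise ratio for parameters). The following is a minimal NumPy sketch of such a pipeline, not the paper's exact formulation: the tied-weight linear autoencoder, the graph-Laplacian penalty, and all names and hyperparameters (hidden_dim, lam_graph, ...) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, hidden_dim=16, lr=0.01, epochs=200):
    """Tied-weight linear autoencoder fit by full-batch gradient descent.
    (The paper uses an autoencoder network; a linear one keeps the sketch short.)"""
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, hidden_dim))
    for _ in range(epochs):
        R = X @ W @ W.T - X                          # reconstruction residual
        W -= lr * (X.T @ R @ W + R.T @ X @ W) / n    # grad of 0.5*||R||_F^2 / n
    return W

def feature_laplacian(W, k=5):
    """Feature graph from encoder rows: positive cosine similarity, k nearest
    neighbours per feature, returned as the graph Laplacian L = D - S."""
    Wn = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    S = np.clip(Wn @ Wn.T, 0.0, None)                # keep positive similarity only
    np.fill_diagonal(S, 0.0)
    keep = np.argsort(S, axis=1)[:, -k:]             # k strongest neighbours per row
    mask = np.zeros_like(S, dtype=bool)
    mask[np.arange(S.shape[0])[:, None], keep] = True
    S = np.where(mask | mask.T, S, 0.0)              # symmetric sparse similarity
    return np.diag(S.sum(axis=1)) - S

def fit_logistic_graph(X, y, L, lam_graph=0.5, lam_l2=0.01, lr=0.05, epochs=500):
    """Logistic regression (bias omitted, as in footnote 3) with an added
    Laplacian penalty lam_graph * w^T L w that ties correlated weights together."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) / n + 2.0 * lam_graph * (L @ w) + 2.0 * lam_l2 * w
        w -= lr * grad
    return w

def consistency_index(A, B, n_features):
    """Kuncheva's consistency index between two equal-size feature subsets."""
    k = len(A)
    r = len(set(A) & set(B))
    return (r - k * k / n_features) / (k - k * k / n_features)

def parameter_snr(W_boot):
    """Signal-to-noise ratio of each weight across bootstrap refits."""
    return np.abs(W_boot.mean(axis=0)) / (W_boot.std(axis=0) + 1e-12)

# Toy usage on synthetic data (two informative features out of 50).
X = rng.normal(size=(200, 50))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(float)
L = feature_laplacian(train_autoencoder(X))
w = fit_logistic_graph(X, y, L)
top10 = np.argsort(-np.abs(w))[:10]                  # selected feature subset

In the paper's setting, stability would be assessed by refitting on bootstrap resamples and comparing selected subsets via consistency_index and weight estimates via parameter_snr; the external unlabelled data mentioned in the abstract would simply enlarge the X passed to train_autoencoder.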


Notes

  1. http://apps.who.int/classifications/icd10.

  2. https://www.aihw.gov.au/procedures-data-cubes/.

  3. We ignore the bias parameter for simplicity.

  4. Ethics approval was obtained from the Hospital and Research Ethics Committee at Barwon Health (number 12/83) and Deakin University.


Author information

Corresponding author: Shivapratap Gopakumar.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Gopakumar, S., Tran, T., Phung, D., Venkatesh, S. (2016). Stabilizing Linear Prediction Models Using Autoencoder. In: Li, J., Li, X., Wang, S., Li, J., Sheng, Q. (eds) Advanced Data Mining and Applications. ADMA 2016. Lecture Notes in Computer Science, vol 10086. Springer, Cham. https://doi.org/10.1007/978-3-319-49586-6_46

  • DOI: https://doi.org/10.1007/978-3-319-49586-6_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49585-9

  • Online ISBN: 978-3-319-49586-6

  • eBook Packages: Computer Science, Computer Science (R0)
