
A New Evolving Tree-Based Model with Local Re-learning for Document Clustering and Visualization

Published in: Neural Processing Letters 46, 379–409 (2017)

Abstract

The Evolving Tree (ETree) is a hierarchical clustering and visualization model that allows the number of clusters to grow and evolve with new data samples in an online learning manner. While many hierarchical clustering models are available in the literature, ETree stands out because of its visualization capability. It is an enhancement of the Self-Organizing Map, a well-known and useful clustering and visualization model. ETree organises the trained data samples in a tree structure for better presentation and visualization, especially for high-dimensional data samples. Even though ETree has been used in a number of applications, its use in textual document clustering and visualization is limited. In this paper, ETree is modified and deployed as a useful model for textual document clustering and visualization problems. We introduce a new local re-learning procedure that allows the tree structure to grow and adapt to new features, i.e., new words from new textual documents. The performance of the proposed ETree model is evaluated with two document data sets (one benchmark and one real). A number of key aspects of the proposed ETree model, including its topology representation, learning time, and recall and precision rates, are evaluated. The results show that the proposed local re-learning procedure is useful for handling an increasing number of features incrementally. In summary, this study contributes a modified ETree model and its application to a new domain, i.e., textual document clustering and visualization.
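The abstract does not reproduce the algorithmic details, but the general idea of an evolving-tree clusterer with a local re-learning step for newly seen words can be illustrated with a minimal Python sketch. All names (EvolvingTreeNode, partial_fit, grow_dims) and the specific constants (SPLIT_THRESHOLD, FANOUT, LEARNING_RATE) below are illustrative assumptions, not the authors' actual procedure.

import numpy as np

SPLIT_THRESHOLD = 10   # assumed hit count before a leaf spawns children
FANOUT = 2             # assumed number of children created on a split
LEARNING_RATE = 0.2    # assumed online update rate

class EvolvingTreeNode:
    def __init__(self, weight):
        self.weight = np.asarray(weight, dtype=float)  # prototype vector (bag-of-words)
        self.children = []
        self.hits = 0

    def is_leaf(self):
        return not self.children

    def find_leaf(self, x):
        """Descend the tree, always following the closest child (BMU search)."""
        node = self
        while not node.is_leaf():
            node = min(node.children, key=lambda c: np.linalg.norm(c.weight - x))
        return node

    def grow_dims(self, new_dim):
        """Local re-learning helper: pad every prototype with zeros so the
        tree can accommodate words unseen during earlier training."""
        if self.weight.size < new_dim:
            self.weight = np.pad(self.weight, (0, new_dim - self.weight.size))
        for child in self.children:
            child.grow_dims(new_dim)

    def partial_fit(self, x):
        """Online update: route x to a leaf, nudge that leaf's prototype,
        and split the leaf once it has absorbed enough samples."""
        x = np.asarray(x, dtype=float)
        self.grow_dims(x.size)                      # handle new words first
        leaf = self.find_leaf(x)
        leaf.hits += 1
        leaf.weight += LEARNING_RATE * (x - leaf.weight)
        if leaf.hits >= SPLIT_THRESHOLD:
            # The old leaf becomes an internal node; its children start as
            # perturbed copies of its prototype and are refined locally.
            leaf.children = [EvolvingTreeNode(leaf.weight + 0.01 * np.random.randn(x.size))
                             for _ in range(FANOUT)]
            leaf.hits = 0

# Toy usage: a three-word vocabulary that grows to five words as new documents arrive.
tree = EvolvingTreeNode(np.zeros(3))
for doc in [np.array([1, 0, 0]), np.array([0, 1, 1]), np.array([1, 0, 0, 2, 0])]:
    tree.partial_fit(doc)

In this sketch, only the feature-padding step and the subtree-local split stand in for the paper's local re-learning; the actual growth and re-learning rules are described in the full article.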




Acknowledgements

The authors thank the 2nd Regional Engineering Conference 2008 (EnCon 2008) and its organizing committee. Special thanks to Miss Liew Hui Chang, who helped with information collection and compilation. The authors are grateful for permission to use the collection of abstracts from EnCon 2008.

Author information


Corresponding author

Correspondence to Kai Meng Tay.

Appendix

See Tables 7, 8, 9, and 10.

Table 7 Textual documents mapped onto \(N_{81,14} \)
Table 8 Textual documents mapped onto \(N_{82,14} \)
Table 9 Summary of textual documents mapped onto \(N_{4,2} \)
Table 10 Summary of textual documents mapped onto \(N_{5,2} \)


About this article

Cite this article

Chang, W.L., Tay, K.M. & Lim, C.P. A New Evolving Tree-Based Model with Local Re-learning for Document Clustering and Visualization. Neural Process Lett 46, 379–409 (2017). https://doi.org/10.1007/s11063-017-9597-3
