Applications of Machine Learning for Linguistic Analysis of Texts

Rosemary Torney, John Yearwood, Peter Vamplew, Andrei V. Kelarev
ISBN13: 9781466618336 | ISBN10: 1466618337 | EISBN13: 9781466618343
DOI: 10.4018/978-1-4666-1833-6.ch008
Cite Chapter

MLA

Torney, Rosemary, et al. "Applications of Machine Learning for Linguistic Analysis of Texts." Machine Learning Algorithms for Problem Solving in Computational Applications: Intelligent Techniques, edited by Siddhivinayak Kulkarni, IGI Global, 2012, pp. 133-148. https://doi.org/10.4018/978-1-4666-1833-6.ch008

APA

Torney, R., Yearwood, J., Vamplew, P., & Kelarev, A. V. (2012). Applications of Machine Learning for Linguistic Analysis of Texts. In S. Kulkarni (Ed.), Machine Learning Algorithms for Problem Solving in Computational Applications: Intelligent Techniques (pp. 133-148). IGI Global. https://doi.org/10.4018/978-1-4666-1833-6.ch008

Chicago

Torney, Rosemary, et al. "Applications of Machine Learning for Linguistic Analysis of Texts." In Machine Learning Algorithms for Problem Solving in Computational Applications: Intelligent Techniques, edited by Siddhivinayak Kulkarni, 133-148. Hershey, PA: IGI Global, 2012. https://doi.org/10.4018/978-1-4666-1833-6.ch008

Abstract

This chapter describes a novel multistage method for linguistic clustering of large collections of texts available on the Internet, as a precursor to the linguistic analysis of these texts. The method addresses the practical difficulty of clustering a very large set of text documents by combining unsupervised clustering with supervised classification. It first creates many independent clusterings of a randomized sample selected from the International Corpus of Learner English. Several consensus functions and algorithms are then applied in two substages to combine these independent clusterings into one final consensus clustering, which is used to train fast classifiers that can profile very large collections of text and web data. This approach makes it possible to use advanced, highly accurate clustering techniques that would be impractical on the full collection, by pairing them with fast supervised classification algorithms. The effectiveness of the method therefore depends on how well the supervised classifiers perform at the final stage, when they process large data sets available on the Internet. Their performance may also indicate the quality of the consensus clustering obtained in the preceding stages. The authors’ experiments compare several classification algorithms incorporated in this multistage scheme and show that several of them achieve very high precision and recall, making them suitable for practical implementations of the method.
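The abstract outlines a pipeline: cluster a random sample many times, merge those clusterings into one consensus clustering, train a fast classifier on the consensus labels, check its precision and recall, and then let the classifier label the full collection. The chapter's actual text features, consensus functions and classifiers are not given here, so the following Python sketch fills them in under stated assumptions. Toy 2-D vectors from the hypothetical `toy_corpus` stand in for text features; repeated k-means runs with different seeds supply the independent clusterings; an evidence-accumulation (co-association) consensus is swapped in as one well-known consensus function; and a nearest-centroid classifier plays the role of the fast classifier. All function names are illustrative and are not the authors' code.

```python
import random
from collections import Counter

def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's k-means; each seed yields one independent clustering."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = mean(members)
    return labels

def consensus(clusterings, n, threshold=0.5):
    """Evidence-accumulation consensus (an assumed choice): points i and j are
    linked when they share a cluster in more than `threshold` of the runs;
    connected components of that graph become the consensus clusters."""
    runs = len(clusterings)
    together = [[sum(lab[i] == lab[j] for lab in clusterings) for j in range(n)]
                for i in range(n)]
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first walk over the thresholded co-association graph
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and together[i][j] / runs > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

def train_nearest_centroid(points, labels):
    """A deliberately cheap supervised classifier: one centroid per label."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: mean(ps) for lab, ps in groups.items()}

def predict(model, point):
    """Assign the label whose centroid is closest to `point`."""
    return min(model, key=lambda lab: dist2(point, model[lab]))

def precision_recall(true, pred, cls):
    """Precision and recall for one class, taking the consensus labels as truth."""
    tp = sum(t == cls and p == cls for t, p in zip(true, pred))
    fp = sum(t != cls and p == cls for t, p in zip(true, pred))
    fn = sum(t == cls and p != cls for t, p in zip(true, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def toy_corpus(n_per_group, seed=0):
    """Hypothetical 2-D feature vectors standing in for texts of three styles."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in centres for _ in range(n_per_group)]

if __name__ == "__main__":
    collection = toy_corpus(200)                            # the "large" collection
    sample = random.Random(42).sample(collection, 60)       # randomized sample
    runs = [kmeans(sample, k=3, seed=s) for s in range(7)]  # independent clusterings
    labels = consensus(runs, len(sample))                   # one consensus clustering
    model = train_nearest_centroid(sample[:40], labels[:40])
    pred = [predict(model, x) for x in sample[40:]]
    for cls in sorted(set(labels[40:])):
        prec, rec = precision_recall(labels[40:], pred, cls)
        print(f"cluster {cls}: precision={prec:.2f} recall={rec:.2f}")
    # Final stage: only the cheap classifier touches the whole collection.
    print(Counter(predict(model, x) for x in collection))
```

The co-association matrix grows quadratically with the number of documents, which is why a sketch like this clusters only the sample; the nearest-centroid step costs one distance computation per cluster per document and so scales to the whole collection. Precision and recall are measured against the consensus labels on the held-out part of the sample, mirroring the abstract's use of classifier performance as an indication of consensus quality.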