Machine Learning Algorithms for Analysis of DNA Data Sets

John Yearwood, Adil Bagirov, Andrei V. Kelarev
ISBN13: 9781466618336|ISBN10: 1466618337|EISBN13: 9781466618343
DOI: 10.4018/978-1-4666-1833-6.ch004
Cite Chapter

MLA

Yearwood, John, et al. "Machine Learning Algorithms for Analysis of DNA Data Sets." Machine Learning Algorithms for Problem Solving in Computational Applications: Intelligent Techniques, edited by Siddhivinayak Kulkarni, IGI Global, 2012, pp. 47-58. https://doi.org/10.4018/978-1-4666-1833-6.ch004

APA

Yearwood, J., Bagirov, A., & Kelarev, A. V. (2012). Machine Learning Algorithms for Analysis of DNA Data Sets. In S. Kulkarni (Ed.), Machine Learning Algorithms for Problem Solving in Computational Applications: Intelligent Techniques (pp. 47-58). IGI Global. https://doi.org/10.4018/978-1-4666-1833-6.ch004

Chicago

Yearwood, John, Adil Bagirov, and Andrei V. Kelarev. "Machine Learning Algorithms for Analysis of DNA Data Sets." In Machine Learning Algorithms for Problem Solving in Computational Applications: Intelligent Techniques, edited by Siddhivinayak Kulkarni, 47-58. Hershey, PA: IGI Global, 2012. https://doi.org/10.4018/978-1-4666-1833-6.ch004

Abstract

The application of machine learning algorithms to data sets of DNA sequences is an important problem. This chapter presents an experimental investigation of several machine learning algorithms applied to a JLA data set, which consists of DNA sequences derived from non-coding segments in the junction of the large single copy region and inverted repeat A of the chloroplast genome in Eucalyptus, collected by Australian biologists. Data sets of this sort represent a new situation, in which sophisticated alignment scores have to be used as the measure of similarity. Because alignment scores do not satisfy the properties of a Minkowski metric, new machine learning approaches have to be investigated. The authors' experiments show that machine learning algorithms based on local alignment scores achieve very good agreement with the known biological classes for this data set. A new machine learning algorithm based on graph partitioning performed best for clustering of the JLA data set, and the authors' novel k-committees algorithm produced the most accurate results for classification. Two new examples of synthetic data sets demonstrate that the k-committees algorithm can outperform both the Nearest Neighbour and k-medoids algorithms simultaneously.
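To make the role of local alignment scores concrete, the following is a minimal Python sketch, not the authors' implementation. It computes a Smith-Waterman style local alignment score and uses it for Nearest Neighbour classification. The scoring parameters (match = 2, mismatch = -1, linear gap = -2), the function names, and the toy sequences are all assumptions made for illustration. The abstract does not state the scoring scheme used in the chapter, and no JLA data appears here.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring parameters are illustrative assumptions, not values
    taken from the chapter.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def nearest_neighbour_label(query, labelled):
    """Return the label of the training sequence most similar to `query`.

    `labelled` is a list of (sequence, label) pairs. Similarity is the
    local alignment score, so the highest score wins.
    """
    _, label = max(labelled, key=lambda pair: local_alignment_score(query, pair[0]))
    return label


# Toy sequences (hypothetical, not from the JLA data set).
x, y, z = "AAAA", "AAAATTTT", "TTTT"
print(local_alignment_score(x, y))  # 8
print(local_alignment_score(y, z))  # 8
print(local_alignment_score(x, z))  # 0

training = [("ACGTACGT", "A"), ("TTGGCCAA", "B")]
print(nearest_neighbour_label("ACGTACGA", training))  # A
```

The toy scores show why such similarities do not behave like distances. Here `x` and `z` are each maximally similar to `y`, yet they have no similarity to each other. The score of a sequence against itself also grows with its length (for example 8 for "ACGT" and 4 for "AC" under these parameters), so there is no constant self-similarity from which a metric could be derived directly.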
