Automatic Speech-Based Classification of Gender, Age and Accent

Nguyen, Phuoc; Tran, Dat; Huang, Xu; Sharma, Dharmendra

doi:10.1007/978-3-642-15037-1_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6232)

Included in the following conference series:

Pacific Rim Knowledge Acquisition Workshop

Abstract

This paper presents an automatic speech-based scheme for classifying speaker characteristics. In the training phase, speech data are grouped into speaker groups according to the speakers' gender, age and accent. Voice features are then extracted as feature vectors, which are used to train speaker characteristic models with three techniques: Vector Quantization (VQ), Gaussian Mixture Models (GMM) and Support Vector Machines (SVM). The classification results from these groups are then fused to obtain a final classification result for each characteristic. The Australian National Database of Spoken Language (ANDOSL) corpus was used to evaluate gender, age and accent classification. Experiments showed that the proposed classification scheme achieves high performance.
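The abstract describes a two-stage pipeline: one model per speaker group (a combination of gender, age and accent), followed by a fusion step that turns group-level results into one decision per characteristic. The following Python/NumPy sketch illustrates the VQ variant of that pipeline. It is a minimal sketch under assumptions the abstract does not state: codebooks are trained with plain k-means, each group is scored by its average quantization distortion, and group scores are fused by summing softmax-normalized scores over all groups that share a characteristic value. The function names, the group labels (`young`, `old`, `broad`, `general`) and the 12-dimensional synthetic features are hypothetical; real inputs would be acoustic feature vectors extracted from ANDOSL recordings.

```python
import numpy as np

FEATURE_DIM = 12  # hypothetical feature dimension for the synthetic demo


def train_vq_codebook(features, codebook_size=4, iterations=20, seed=0):
    """Train one VQ codebook with plain k-means (assumed training procedure)."""
    rng = np.random.default_rng(seed)
    # Initialize the codewords from randomly chosen training vectors.
    idx = rng.choice(len(features), size=codebook_size, replace=False)
    codebook = features[idx].astype(float)
    for _ in range(iterations):
        # Assign every vector to its nearest codeword (squared Euclidean distance).
        d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d.argmin(axis=1)
        for k in range(codebook_size):
            members = features[nearest == k]
            if len(members) > 0:  # keep the old codeword if its cluster is empty
                codebook[k] = members.mean(axis=0)
    return codebook


def avg_distortion(features, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()


def classify(features, codebooks):
    """Score every group model, then fuse group scores per characteristic.

    codebooks maps a group key (gender, age, accent) to its VQ codebook.
    Returns one predicted label per characteristic.
    """
    keys = list(codebooks)
    # Lower distortion means a better match, so negate it to get a score.
    scores = np.array([-avg_distortion(features, codebooks[k]) for k in keys])
    # Softmax-normalize so that the group scores sum to 1.
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    result = {}
    for i, name in enumerate(("gender", "age", "accent")):
        # Assumed fusion rule: sum the scores of all groups sharing a value.
        fused = {}
        for key, p in zip(keys, probs):
            fused[key[i]] = fused.get(key[i], 0.0) + p
        result[name] = max(fused, key=fused.get)
    return result


# Synthetic demo: 8 hypothetical speaker groups, each with a distinct mean vector.
rng = np.random.default_rng(1)
groups = [(g, a, c) for g in ("female", "male")
          for a in ("young", "old") for c in ("broad", "general")]
train = {}
for j, key in enumerate(groups):
    centre = np.zeros(FEATURE_DIM)
    centre[j] = 5.0
    train[key] = centre + rng.normal(size=(200, FEATURE_DIM))
codebooks = {key: train_vq_codebook(x) for key, x in train.items()}

# A test utterance drawn around the ("male", "old", "broad") group mean.
test_centre = np.zeros(FEATURE_DIM)
test_centre[groups.index(("male", "old", "broad"))] = 5.0
test_utterance = test_centre + rng.normal(size=(50, FEATURE_DIM))
prediction = classify(test_utterance, codebooks)
print(prediction)  # expected: {'gender': 'male', 'age': 'old', 'accent': 'broad'}
```

In this sketch the fusion step is independent of the back end: a GMM (log-likelihood) or SVM (decision value) scorer could replace `avg_distortion` while `classify` keeps summing normalized group scores per characteristic.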

Author information

Authors and Affiliations

Faculty of Information Sciences and Engineering, University of Canberra, ACT 2601, Australia
Phuoc Nguyen, Dat Tran, Xu Huang & Dharmendra Sharma

Editor information

Editors and Affiliations

School of Computing and Information Systems, University of Tasmania, Launceston, TAS 7250, Tasmania, Australia
Byeong-Ho Kang
Computing Department, Division of Information and Communication Sciences, Macquarie University, Sydney, NSW 2109, Australia
Debbie Richards

About this paper

Cite this paper

Nguyen, P., Tran, D., Huang, X., Sharma, D. (2010). Automatic Speech-Based Classification of Gender, Age and Accent. In: Kang, BH., Richards, D. (eds) Knowledge Management and Acquisition for Smart Systems and Services. PKAW 2010. Lecture Notes in Computer Science, vol 6232. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15037-1_24

DOI: https://doi.org/10.1007/978-3-642-15037-1_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15036-4
Online ISBN: 978-3-642-15037-1
eBook Packages: Computer Science, Computer Science (R0)
