
Automatic Speech-Based Classification of Gender, Age and Accent

  • Conference paper
Knowledge Management and Acquisition for Smart Systems and Services (PKAW 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6232)

Abstract

This paper presents an automatic speech-based scheme for classifying speaker characteristics. In the training phase, speech data are grouped into speaker groups according to the speakers' gender, age and accent. Voice features are then extracted into feature vectors, which are used to train speaker characteristic models with three techniques: Vector Quantization (VQ), Gaussian Mixture Model (GMM) and Support Vector Machine (SVM). The classification results from these groups are then fused to obtain a final classification result for each characteristic. The Australian National Database of Spoken Language (ANDOSL) corpus was used to evaluate gender, age and accent classification. Experiments showed that the proposed classification scheme achieves high performance.
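The group-then-fuse idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the feature type, model sizes or fusion rule, so everything below is an assumption. It uses only the VQ technique (a plain k-means codebook per speaker group, scored by average distortion), synthetic 2-D vectors in place of real acoustic features, hypothetical group labels such as ("female", "old", "broad"), and an assumed fusion rule in which each characteristic value takes the lowest distortion of any group carrying that value.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Group = Tuple[str, str, str]  # hypothetical (gender, age, accent) label


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors: List[Vector], size: int, iters: int = 20,
                   seed: int = 0) -> List[List[float]]:
    """Train a VQ codebook with plain k-means (assumed training procedure)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(size)]
        for v in vectors:
            nearest = min(range(size), key=lambda i: sq_dist(v, codebook[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def avg_distortion(vectors: List[Vector], codebook: List[List[float]]) -> float:
    """Mean distance from each test vector to its nearest codeword (lower = better)."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def classify(utterance: List[Vector],
             models: Dict[Group, List[List[float]]]) -> Dict[str, str]:
    """Score every speaker group, then fuse the scores per characteristic.

    Assumed fusion rule: a value such as "female" receives the lowest
    distortion among all groups that carry that value.
    """
    scores = {group: avg_distortion(utterance, cb) for group, cb in models.items()}
    decision: Dict[str, str] = {}
    for j, name in enumerate(("gender", "age", "accent")):
        best: Dict[str, float] = {}
        for group, score in scores.items():
            best[group[j]] = min(score, best.get(group[j], math.inf))
        decision[name] = min(best, key=lambda value: best[value])
    return decision


# Synthetic 2-D "features" standing in for real acoustic feature vectors.
MEANS: Dict[Group, Tuple[float, float]] = {
    ("male", "young", "general"): (0.0, 0.0),
    ("male", "old", "broad"): (5.0, 0.0),
    ("female", "young", "general"): (0.0, 5.0),
    ("female", "old", "broad"): (5.0, 5.0),
}


def sample(mean: Tuple[float, float], n: int, rng: random.Random) -> List[Vector]:
    """Draw n noisy 2-D vectors around a group mean."""
    return [(rng.gauss(mean[0], 0.3), rng.gauss(mean[1], 0.3)) for _ in range(n)]


rng = random.Random(1)
models = {g: train_codebook(sample(m, 50, rng), size=4) for g, m in MEANS.items()}
test_utterance = sample((5.0, 5.0), 20, rng)
result = classify(test_utterance, models)
print(result)
```

Training one model per combined group, rather than one per characteristic, lets the fusion step reuse the same scores for gender, age and accent. A GMM or SVM back end could replace `avg_distortion` with a log-likelihood or decision score, provided the fusion rule is flipped to favour higher scores.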

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nguyen, P., Tran, D., Huang, X., Sharma, D. (2010). Automatic Speech-Based Classification of Gender, Age and Accent. In: Kang, BH., Richards, D. (eds) Knowledge Management and Acquisition for Smart Systems and Services. PKAW 2010. Lecture Notes in Computer Science, vol 6232. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15037-1_24

  • DOI: https://doi.org/10.1007/978-3-642-15037-1_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15036-4

  • Online ISBN: 978-3-642-15037-1

  • eBook Packages: Computer Science, Computer Science (R0)
