Multi-lingual Speech Emotion Recognition System Using Machine Learning

Emel Çolakoğlu, Serhat Hızlısoy, Recep Sinan Arslan

Abstract


Accurately predicting emotions from speech across different languages has remained a challenging task for researchers in recent years. A review of the work in this field shows that researchers generally attempt to recognize emotions from speech in their native language; such studies, however, do not generalize to multi-lingual environments around the globe. The Turkish emotional speech dataset created for our previous studies was further expanded for use in this study, and the EMO-DB dataset was used to benchmark the performance of the proposed model. Several pre-processing steps, including standardization, sorting, and resampling, were applied to the data to improve model performance. The openSMILE toolbox, widely used in this field, was employed to extract features that carry meaningful information about the emotion in speech, yielding thousands of features from the emobase2010 and emo_large feature sets. Eight machine learning algorithms were used to classify four emotions in the Turkish dataset and seven emotions in the EMO-DB dataset. The best recognition rates, 92.73% for the Turkish dataset (1099 recordings) and 96.3% for the EMO-DB dataset (535 recordings), were achieved using the emobase2010 feature set with a Logistic Regression classifier.
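The pipeline summarized above (feature standardization followed by a Logistic Regression classifier) can be sketched in plain Python. This is a hypothetical illustration, not the paper's implementation: the function names and hyperparameters (`lr`, `epochs`) are the author's assumptions, and only the binary core of the classifier is shown.

```python
import math

def standardize(features):
    """Z-score standardization: (x - mean) / std for each feature column."""
    n, dim = len(features), len(features[0])
    means = [sum(row[d] for row in features) / n for d in range(dim)]
    # Guard against zero variance by falling back to a divisor of 1.0.
    stds = [math.sqrt(sum((row[d] - means[d]) ** 2 for row in features) / n) or 1.0
            for d in range(dim)]
    return [[(row[d] - means[d]) / stds[d] for d in range(dim)] for row in features]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.1, epochs=200):
    """Binary logistic regression trained by per-sample gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Gradient of the log-loss for one sample: (p - y) * x.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return the class label (0 or 1) at the 0.5 decision threshold."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)
```

In the study itself the inputs would be the thousands of emobase2010/emo_large features extracted with openSMILE, and classifying four or seven emotions would require a multi-class extension such as one-vs-rest; this sketch shows only the binary building block.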


Keywords


Speech Emotion Recognition; Machine Learning; EMO-DB; openSMILE






Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Abstracting and indexing

Index Copernicus International

ICI World of Journals

ResearchBib

Selcuk University Journal of Engineering Sciences (SUJES)

ISSN: 2757-8828
