AIT Asian Institute of Technology

Performance evaluation criteria of deep learning techniques for music emotion retrieval using the combined audio and lyric feature

Author: Raboy, Love Jhoye Moreno
Call Number: AIT Diss. no. ICT-24-03
Subject(s): Music--Data processing
Information storage and retrieval systems--Music
Deep learning (Machine learning)
Note: A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Information and Communication Technologies
Publisher: Asian Institute of Technology
Abstract: This study explores the efficacy of deep learning techniques in music emotion recognition by leveraging a combination of audio and lyric features. Considering the varied emotional expressions across different song sections, our analysis involves datasets structured as Verse1, Chorus, Verse2, and combined Verse1-Chorus-Verse2 sequences. This research uses established metrics (accuracy, recall, precision, and F1 score) to determine which deep learning models perform best across these structured datasets. The models fall into two categories. The first is unimodal, comprising Artificial Neural Networks (ANN), One-Dimensional Convolutional Neural Networks (1DCNN), and Recurrent Neural Networks (RNN), each used independently. The second comprises stacked ensemble models in ANN-ANN, ANN-1DCNN, and ANN-RNN configurations, in which an ANN processes audio features while an ANN, 1DCNN, or RNN processes the lyrics. The study also integrates advanced natural language processing by employing Global Vectors for Word Representation (GloVe) and Bidirectional Encoder Representations from Transformers (BERT) embeddings to enhance the interpretation of lyrical content.

The findings reveal that stacked ensemble models, which draw on features from both audio signals and lyrics, significantly outperform unimodal models. This underscores the critical role of multimodal approaches in music emotion detection, particularly when different song sections are considered. The study not only advances our understanding of how to optimize deep learning for emotion detection in music but also provides a valuable framework for future research, highlighting the importance of song structure in emotional analysis.
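As a minimal sketch of the evaluation criteria named in the abstract (accuracy, recall, precision, and F1 score), the snippet below computes all four from binary emotion predictions. The label vectors and the `evaluate` helper are illustrative inventions for this sketch, not code or data from the dissertation itself:

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels (1 = emotion present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def evaluate(y_true, y_pred):
    """Compute the four metrics the study reports for each model."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical ground truth and model predictions for one emotion class.
scores = evaluate([1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 1, 0, 0, 1, 1, 0])
print(scores)  # all four metrics equal 0.75 for this toy example
```

In a multi-class emotion setting these per-class scores would typically be macro-averaged across emotion categories; the binary case above is kept deliberately simple.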
Year: 2024
Type: Dissertation
School: School of Engineering and Technology
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Information and Communication Technology (ICT)
Chairperson(s): Attaphongse Taparugssanagorn
Examination Committee(s): Mongkol Ekpanyapong; Chaklam Silpasuwanchai
Degree: Thesis (Ph.D.) - Asian Institute of Technology, 2024
