| Title | Performance evaluation criteria of deep learning techniques for music emotion retrieval using the combined audio and lyric feature |
| Author | Raboy, Love Jhoye Moreno |
| Call Number | AIT Diss. no.ICT-24-03 |
| Subject(s) | Music--Data processing; Information storage and retrieval systems--Music; Deep learning (Machine learning) |
| Note | A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Information and Communication Technologies |
| Publisher | Asian Institute of Technology |
| Abstract | This study explores the efficacy of deep learning techniques in music emotion recognition by leveraging combined audio and lyric features. Because emotional expression varies across song sections, the analysis uses datasets structured as Verse1, Chorus, Verse2, and combined Verse1-Chorus-Verse2 sequences. Established metrics (accuracy, recall, precision, and F1 score) are used to determine which deep learning models perform best across these structured datasets. The models fall into two categories. The first is unimodal: Artificial Neural Networks (ANN), One-Dimensional Convolutional Neural Networks (1DCNN), and Recurrent Neural Networks (RNN), each used independently. The second comprises stacked ensemble models in ANN-ANN, ANN-1DCNN, and ANN-RNN configurations, where an ANN processes the audio features and an ANN, 1DCNN, or RNN processes the lyrics. The study also integrates advanced natural language processing by employing Global Vectors for Word Representation (GloVe) and Bidirectional Encoder Representations from Transformers (BERT) embeddings to enhance the interpretation of lyrical content. The findings reveal that stacked ensemble models, which draw features from both audio signals and lyrics, significantly outperform unimodal models. This underscores the critical role of multimodal approaches in music emotion detection, particularly when different song sections are considered. The study not only advances our understanding of how to optimize deep learning for emotion detection in music but also provides a framework for future research, highlighting the importance of song structure in emotional analysis. |
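The abstract names accuracy, recall, precision, and F1 score as the evaluation criteria. As a minimal sketch of how those metrics are computed for a multi-class emotion task (macro-averaged over classes), assuming hypothetical emotion labels not taken from the dissertation:

```python
# Sketch of the four metrics named in the abstract: accuracy, plus
# macro-averaged precision, recall, and F1 over emotion classes.
# The labels below are hypothetical examples, not data from the study.

def evaluate(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,  # macro average
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Hypothetical per-song emotion labels
y_true = ["happy", "sad", "calm", "happy", "sad"]
y_pred = ["happy", "sad", "happy", "happy", "calm"]
print(evaluate(y_true, y_pred))  # accuracy is 3/5 = 0.6
```

In practice a library routine such as scikit-learn's `precision_recall_fscore_support` would replace the hand-rolled loop; the sketch only makes the arithmetic behind the reported scores explicit.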
| Year | 2024 |
| Type | Dissertation |
| School | School of Engineering and Technology |
| Department | Department of Information and Communications Technologies (DICT) |
| Academic Program/FoS | Information and Communication Technology (ICT) |
| Chairperson(s) | Attaphongse Taparugssanagorn |
| Examination Committee(s) | Mongkol Ekpanyapong; Chaklam Silpasuwanchai |
| Degree | Thesis (Ph.D.) - Asian Institute of Technology, 2024 |