AIT Asian Institute of Technology

Drum recognition and automatic transcription from augmented digital sounds with bi-directional recurrent neural network

Author: Tin Panthong
Call Number: AIT Thesis no. IOT-25-01
Subject(s): Drum; Sound--Data processing; Neural networks (Computer science); Computer sound processing
Note: A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Internet of Things (IoT) Systems Engineering
Publisher: Asian Institute of Technology
Abstract: This thesis presents an experimental drum recognition and transcription system, aimed at enhancing drumming skills using advanced technology. The transcription of drums, particularly due to their complex sound characteristics such as inharmonicity and rapid transients, presents significant challenges. To address these challenges, this study develops a system that transcribes drum performances into standard notation using MIDI data from an Electronic Drum Kit (E-GMD). The use of MIDI eliminates the variances and uncertainties found in real acoustic drum recordings, making the transcription process more controlled and accurate.
The proposed automatic drum transcription system integrates Short-Time Fourier Transform (STFT) and Bi-Directional Recurrent Neural Network (Bi-RNN). The STFT extracts prominent time-frequency features from drum sounds, which are then used as inputs to the neural network. In parallel, the Bi-RNN applies supervised learning to interpret drum activity and capture the temporal dynamics of individual drum hits. To address the issue where the model is overfitting, the training process employs a Binary Cross-Entropy loss function combined with the Adam optimizer. Furthermore, the learning rate scheduler is utilized to decrease the learning rate once the loss stabilizes at a fixed point.
Operating offline on a personal computer (PC), the system ensures a cost-effective and flexible solution for experimental use, providing an efficient alternative to real-time processing. The transcription accuracy is evaluated using the F-measure, ensuring that the system delivers reliable performance. The results show an impressive F1 score of approximately 0.80 with a model convergence loss of 0.03. This research advances the field of music technology by offering an experimental solution to drum transcription.
The system has the potential to support music educators, musicians, and students by providing a tool that aids in learning and improving drumming techniques through detailed transcription feedback.
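Illustrative note (not part of the thesis record): the abstract reports transcription accuracy as an F-measure. The sketch below shows one common way such a score can be computed for a single drum instrument, by matching estimated onset times to reference onset times within a tolerance window. The function name `f_measure`, the 50 ms tolerance, and the greedy one-to-one matching are assumptions made for illustration; the thesis may define its evaluation differently.

```python
def f_measure(reference, estimated, tolerance=0.05):
    """Onset-level precision, recall, and F1 for one drum instrument.

    reference, estimated: onset times in seconds.
    tolerance: maximum |estimated - reference| for a hit to count
               (0.05 s is an assumed value, not taken from the thesis).
    """
    ref = sorted(reference)
    est = sorted(estimated)
    matched = 0
    i = j = 0
    # Greedy one-to-one matching over the two sorted onset lists.
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= tolerance:
            matched += 1      # true positive: pair consumed on both sides
            i += 1
            j += 1
        elif diff < 0:
            j += 1            # estimated onset too early: false positive
        else:
            i += 1            # reference onset passed: false negative
    precision = matched / len(est) if est else 0.0
    recall = matched / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical onset times for a snare track, in seconds.
reference_onsets = [0.50, 1.00, 1.50, 2.00]
estimated_onsets = [0.51, 1.02, 1.70, 2.01, 2.50]
precision, recall, f1 = f_measure(reference_onsets, estimated_onsets)
```

In this made-up example three of the five estimated onsets fall within the tolerance of a reference onset, so precision is 3/5, recall is 3/4, and F1 is their harmonic mean, 2/3.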
Year: 2025
Type: Thesis
School: School of Engineering and Technology
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Internet of Things (IoT) Systems Engineering
Chairperson(s): Attaphongse Taparugssanagorn
Examination Committee(s): Chaklam Silpasuwanchai; Chantri Polprasert
Scholarship Donor(s): AIT Fellowship
Degree: Thesis (M. Eng.) - Asian Institute of Technology, 2025

