Use this URL to cite or link to this record in EThOS:
Title: Music emotion recognition based on feature combination, deep learning and chord detection
Author: Zhang, Fan
ISNI:       0000 0004 7961 7774
Awarding Body: Brunel University London
Current Institution: Brunel University
Date of Award: 2019
Availability of Full Text:
Access from EThOS:
Access from Institution:
As one of the most classic human inventions, music appeared in many artworks, such as songs, movies and theatres. It can be seen as another language, used to express the authors thoughts and emotion. In many cases, music can express the meaning and emotion emerged which is the authors hope and the audience feeling. However, the emotions which appear during human enjoying the music is complex and difficult to precisely explain. Therefore, Music Emotion Recognition (MER) is an interesting research topic in artificial intelligence field for recognising the emotions from the music. The recognition methods and tools for the music signals are growing fast recently. With recent development of the signal processing, machine learning and algorithm optimization, the recognition accuracy is approaching perfection. In this thesis, the research is focused on three differentsignificantpartsofMER,thatarefeatures, learningmethodsandmusicemotion theory, to explain and illustrate how to effectively build MER systems. Firstly, an automatic MER system for classing 4 emotions was proposed where OpenSMILE is used for feature extraction and IS09 feature was selected. After the combination with STAT statistic features, Random Forest classifier produced the best performance than previous systems. It shows that this approach of feature selection and machine learning can indeed improve the accuracy of MER by at least 3.5% from other combinations under suitable parameter setting and the performance of system was improved by new features combination by IS09 and STAT reaching 83.8% accuracy. Secondly, another MER system for 4 emotions was proposed basedon the dynamic property of music signals where the features are extracted from segments of music signals instead of the whole recording in APM database. Then Long Shot-Term Memory (LSTM) deep learning model was used for classification. The model can use the dynamic continuous information between the different time frame segments for more effective emotion recognition. However, the final performance just achieved 65.7% which was not as good as expected. The reason might be that the database is not suitable to the LSTM as the initial thoughts. The information between the segments might be not good enough to improve the performance of recognition in comparison with the traditional methods. The complex deep learning method do not suitable for every database was proved by the conclusion,which shown that the LSTM dynamic deep learning method did not work well in this continuous database. Finally, it was targeted to recognise the emotion by the identification of chord inside as these chords have particular emotion information inside stated in previous theoretical work. The research starts by building a new chord database that uses the Adobe audition to extract the chord clip from the piano chord teaching audio. Then the FFT features based on the 1000 points sampling pre-process data and STAT features were extracted for the selected samples from the database. After the calculation and comparison using Euclidean distance and correlation, the results shown the STAT features work well in most of chords except the Augmented chord. The new approach of recognise 6 emotions from the music was first time used in this research and approached 75% accuracy of chord identification. In summary, the research proposed new MER methods through the three different approaches. Some of them achieved good recognition performance and some of them will have more broad application prospects.
Supervisor: Meng, H. ; Boulgouris, N. Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: Long-short term memory ; Random Forest