Optimal feature selection and machine learning for high-level audio classification : a random forests approach

Al Maathidi, MM 2017, Optimal feature selection and machine learning for high-level audio classification : a random forests approach , PhD thesis, University of Salford.

PDF - Accepted Version
Download (2MB) | Preview


Content related information, metadata, and semantics can be extracted from soundtracks of multimedia files. Speech recognition, music information retrieval and environmental sound detection techniques have been developed into a fairly mature technology enabling a final text mining process to obtain semantics for the audio scene. An efficient speech, music and environmental sound classification system, which correctly identify these three types of audio signals and feed them into dedicated recognisers, is a critical pre-processing stage for such a content analysis system. The performance and computational efficiency of such a system is predominately dependent on the selected features.

This thesis presents a detailed study to identify the suitable classification features and associate a suitable machine learning technique for the intended classification task. In particular, a systematic feature selection procedure is developed to employ the random forests classifier to rank the features according to their importance and reduces the dimensionality of the feature space accordingly. This new technique avoids the trial-and-error approach used by many authors researchers. The implemented feature selection produces results related to individual classification tasks instead of the commonly used statistical distance criteria based approaches that does not consider the intended classification task, which makes it more suitable for supervised learning with specific purposes. A final collective decision-making stage is employed to combine multiple class detectors patterns into one to produce a single classification result for each input frames.

The performance of the proposed feature selection technique has been compared with the techniques proposed by MPEG-7 standard to extract the reduced feature space. The results show a significant improvement in the resulted classification accuracy, at the same time, the feature space is simplified and computational overhead reduced.

The proposed feature selection and machine learning technique enable the use of only 30 out of the 47 features without degrading the classification accuracy while the classification accuracy lowered by 1.7% only while just 10 features were utilised. The validation shows good performance also and the last stage of collective decision making was able to improve the classification result even after selecting only a small number of classification features. The work represents a successful attempt to determine audio feature importance and classify the audio contents into speech, music and environmental sound using a selected feature subset. The result shows a high degree of accuracy by utilising the random forests for both feature importance ranking and audio content classification.

Item Type: Thesis (PhD)
Schools: Schools > School of Computing, Science and Engineering
Depositing User: MM Abd
Date Deposited: 20 Feb 2018 12:39
Last Modified: 20 Oct 2021 14:28
URI: https://usir.salford.ac.uk/id/eprint/44338

Actions (login required)

Edit record (repository staff only) Edit record (repository staff only)


Downloads per month over past year