Audio content analysis in the presence of overlapped classes : a non-exclusive segmentation approach to mitigate information losses

Mohammed, DY, Duncan, PJ and Li, FF 2015, Audio content analysis in the presence of overlapped classes : a non-exclusive segmentation approach to mitigate information losses, in: Global Summit and Expo on Multimedia & Applications, 10-11 August 2015, Birmingham, UK.



Soundtracks of multimedia files are information rich, and much content-related metadata can be extracted from them. There is a pressing demand for automated classification, identification and information mining of audio content. A segment of an audio soundtrack can be speech, music, event sounds or a combination of these. Many individual algorithms exist for the recognition and analysis of speech, music or event sounds, allowing embedded information to be retrieved in a semantic fashion. A systematic review shows that no universal system exists that is optimised to extract the maximum amount of information for further text mining and inference. Mainstream algorithms typically work with a single class of sound, e.g. speech, music or event sounds, and classification methods are predominantly exclusive (detecting one class at a time), so much information is lost when two or three classes overlap. A universal open architecture for audio content and scene analysis has been proposed by the authors. To mitigate information losses in overlapped content, non-exclusive segmentation approaches were adopted. This paper presents one possible implementation of the universal open architecture as a paradigm, showing how the architecture can integrate existing methods and workflows while maximising extractable semantic information. In the current work, overlapped content is identified and segmented from carefully tailored feature spaces, and a family of decision trees is used to generate a content score. Results show that the developed system, when compared with well-established audio content analysers, can identify and thus extract information from many more speech and music segments. The full paper will discuss the methods, detail the results and illustrate how the system works.
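The difference between exclusive and non-exclusive labelling can be illustrated with a minimal Python sketch. It is not the authors' implementation: the feature names, thresholds, one-level trees (stumps) and the 0.5 cut-off are all hypothetical, chosen only to show how a per-class content score lets a frame keep more than one label.

```python
from typing import Callable, Dict, List, Set

# A frame is a mapping from feature name to value.
# All feature names and thresholds below are hypothetical placeholders.
Frame = Dict[str, float]
Tree = Callable[[Frame], int]


def stump(feature: str, threshold: float) -> Tree:
    """Build a one-level decision tree that votes 1 if feature > threshold."""
    return lambda frame: 1 if frame[feature] > threshold else 0


# One small family of trees per class (assumed for illustration).
TREES: Dict[str, List[Tree]] = {
    "speech": [stump("zcr_variance", 0.5),
               stump("mod_energy_4hz", 0.4),
               stump("pitch_variability", 0.3)],
    "music": [stump("tonal_stability", 0.5),
              stump("beat_strength", 0.4),
              stump("harmonicity", 0.6)],
    "event": [stump("onset_sharpness", 0.7),
              stump("spectral_flux", 0.6),
              stump("energy_burst", 0.5)],
}


def content_scores(frame: Frame) -> Dict[str, float]:
    """Content score per class: fraction of that class's trees voting 1."""
    return {cls: sum(tree(frame) for tree in trees) / len(trees)
            for cls, trees in TREES.items()}


def exclusive_label(scores: Dict[str, float]) -> str:
    """Exclusive decision: keep only the single highest-scoring class."""
    return max(scores, key=scores.get)


def non_exclusive_labels(scores: Dict[str, float],
                         cutoff: float = 0.5) -> Set[str]:
    """Non-exclusive decision: keep every class whose score reaches the cutoff."""
    return {cls for cls, score in scores.items() if score >= cutoff}


# A made-up frame in which speech is overlapped with music.
overlap_frame: Frame = {
    "zcr_variance": 0.8, "mod_energy_4hz": 0.7, "pitch_variability": 0.2,
    "tonal_stability": 0.9, "beat_strength": 0.6, "harmonicity": 0.7,
    "onset_sharpness": 0.1, "spectral_flux": 0.2, "energy_burst": 0.1,
}

scores = content_scores(overlap_frame)
print(exclusive_label(scores))               # single class only
print(sorted(non_exclusive_labels(scores)))  # every class above the cutoff
```

For this frame the exclusive decision keeps only music, so the overlapped speech would never reach a speech recogniser. The non-exclusive decision keeps both labels, so the segment can be routed to speech and music analysers alike.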

Item Type: Conference or Workshop Item (Speech)
Schools: Schools > School of Computing, Science and Engineering
Journal or Publication Title: Conference Proceedings of the Global Summit and Expo on Multimedia & Applications
Publisher: Omics Group
Depositing User: Duraid Yehya Mohammed
Date Deposited: 07 Jul 2017 08:21
Last Modified: 15 Feb 2022 22:11
