An audio-visual system for object-based audio: from recording to listening

Coleman, P ORCID: https://orcid.org/0000-0002-3266-7358, Franck, A ORCID: https://orcid.org/0000-0002-4707-6710, Francombe, J ORCID: https://orcid.org/0000-0003-3227-5001, Liu, Q ORCID: https://orcid.org/0000-0003-0778-2992, de Campos, T, Hughes, RJ ORCID: https://orcid.org/0000-0002-8916-1227, Menzies, D ORCID: https://orcid.org/0000-0003-1475-8798, Galvez, MFS, Tang, Y ORCID: https://orcid.org/0000-0003-1149-4272, Woodcock, JS ORCID: https://orcid.org/0000-0001-5654-5374, Jackson, PJB, Melchior, F, Pike, C ORCID: https://orcid.org/0000-0002-6638-7645, Fazi, FM, Cox, TJ ORCID: https://orcid.org/0000-0002-4075-7564 and Hilton, A 2018, 'An audio-visual system for object-based audio: from recording to listening', IEEE Transactions on Multimedia, 20(8), pp. 1919-1931.

PDF (Published Version), available under a Creative Commons Attribution License.

Abstract

Object-based audio is an emerging representation for audio content, in which content is represented in a reproduction-format-agnostic way and can thus be produced once for consumption on many different kinds of devices. This affords new opportunities for immersive, personalized, and interactive listening experiences. This paper introduces an end-to-end object-based spatial audio pipeline, from sound recording to listening. A high-level system architecture is proposed, which includes novel audio-visual interfaces to support object-based capture and listener-tracked rendering, and incorporates a proposed component for objectification, that is, recording content directly into an object-based form. Text-based and extensible metadata enable communication between the system components. An open architecture for object rendering is also proposed. The system's capabilities are evaluated in two parts. First, listener-tracked reproduction of metadata automatically estimated from two moving talkers is evaluated using an objective binaural localization model. Second, object-based scene capture with audio extracted using blind source separation (to remix between two talkers) and beamforming (to remix a recording of a jazz group) is evaluated.
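The abstract notes that text-based, extensible metadata carries the scene description between capture, production, and rendering components. As a minimal illustrative sketch only, assuming a simple JSON encoding with hypothetical field names (azimuth, elevation, distance, gain) rather than the paper's actual schema, exchanging object metadata between components might look like this:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class AudioObject:
    """One audio object in the scene (illustrative fields, not the paper's schema)."""
    object_id: int
    azimuth_deg: float    # horizontal angle relative to the listener, in degrees
    elevation_deg: float  # vertical angle, in degrees
    distance_m: float     # distance from the reference listening position
    gain: float           # linear gain applied by the renderer

def serialize_scene(objects):
    """Encode the object list as a text-based (JSON) scene description."""
    return json.dumps({"objects": [asdict(o) for o in objects]})

def deserialize_scene(text):
    """Decode a JSON scene description back into AudioObject instances."""
    return [AudioObject(**d) for d in json.loads(text)["objects"]]

# Two talkers placed symmetrically in front of the listener.
scene = [AudioObject(0, -30.0, 0.0, 2.0, 1.0),
         AudioObject(1, 30.0, 0.0, 2.0, 0.8)]
restored = deserialize_scene(serialize_scene(scene))
```

Because the representation is text-based, a production tool or a personalization interface can rewrite individual fields (e.g., the gain of one talker) without touching the audio itself, which is what enables the remixing experiments described in the evaluation.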

Item Type: Article
Schools: Schools > School of Computing, Science and Engineering
Journal or Publication Title: IEEE Transactions on Multimedia
Publisher: IEEE
ISSN: 1520-9210
Funders: Engineering and Physical Sciences Research Council (EPSRC)
Depositing User: RJ Hughes
Date Deposited: 11 Dec 2019 09:05
Last Modified: 16 Feb 2022 03:38
URI: https://usir.salford.ac.uk/id/eprint/53437
