Coleman, P ORCID: https://orcid.org/0000-0002-3266-7358, Franck, A
ORCID: https://orcid.org/0000-0002-4707-6710, Francombe, J
ORCID: https://orcid.org/0000-0003-3227-5001, Liu, Q
ORCID: https://orcid.org/0000-0003-0778-2992, de Campos, T, Hughes, RJ
ORCID: https://orcid.org/0000-0002-8916-1227, Menzies, D
ORCID: https://orcid.org/0000-0003-1475-8798, Galvez, MFS, Tang, Y
ORCID: https://orcid.org/0000-0003-1149-4272, Woodcock, JS
ORCID: https://orcid.org/0000-0001-5654-5374, Jackson, PJB, Melchior, F, Pike, C
ORCID: https://orcid.org/0000-0002-6638-7645, Fazi, FM, Cox, TJ
ORCID: https://orcid.org/0000-0002-4075-7564 and Hilton, A
2018, 'An audio-visual system for object-based audio: from recording to listening', IEEE Transactions on Multimedia, 20 (8), pp. 1919-1931.
PDF (Published Version). Available under License Creative Commons Attribution.
Abstract
Object-based audio is an emerging representation for audio content, where content is represented in a reproduction format-agnostic way and, thus, produced once for consumption on many different kinds of devices. This affords new opportunities for immersive, personalized, and interactive listening experiences. This paper introduces an end-to-end object-based spatial audio pipeline, from sound recording to listening. A high-level system architecture is proposed, which includes novel audiovisual interfaces to support object-based capture and listener-tracked rendering, and incorporates a proposed component for objectification, that is, recording content directly into an object-based form. Text-based and extensible metadata enable communication between the system components. An open architecture for object rendering is also proposed. The system’s capabilities are evaluated in two parts. First, listener-tracked reproduction of metadata automatically estimated from two moving talkers is evaluated using an objective binaural localization model. Second, object-based scene capture with audio extracted using blind source separation (to remix between two talkers) and beamforming (to remix a recording of a jazz group) is evaluated
Item Type: | Article |
---|---|
Schools: | Schools > School of Computing, Science and Engineering |
Journal or Publication Title: | IEEE Transactions on Multimedia |
Publisher: | IEEE |
ISSN: | 1520-9210 |
Funders: | Engineering and Physical Sciences Research Council (EPSRC) |
Depositing User: | RJ Hughes |
Date Deposited: | 11 Dec 2019 09:05 |
Last Modified: | 16 Feb 2022 03:38 |
URI: | https://usir.salford.ac.uk/id/eprint/53437 |