Blind estimation of room acoustic parameters from speech and music signals
Kendrick, P 2009, Blind estimation of room acoustic parameters from speech and music signals, PhD thesis, University of Salford.
The acoustic character of a space is often quantified using objective room acoustic parameters. These parameters are difficult to measure in occupied conditions, so measurements are usually performed when the space is unoccupied, even though occupancy is known to have a significant impact on the measured values. This thesis develops new methods by which naturalistic signals such as speech and music can be used to measure acoustic parameters. Using naturalistic signals enables passive measurement during orchestral performances and spoken announcements, which makes in-situ measurement easier. Two methods are described: (1) a method using artificial neural networks (ANN), in which a network is taught to recognise acoustic parameters from received, reverberated signals, and (2) a method based on maximum likelihood estimation of the room's decay curve, from which the parameters are then calculated. (1) The development of the neural network method focuses on a new pre-processor for use with music signals. The pre-processor uses a narrow-band filter bank with centre frequencies chosen according to the equal temperament scale. The success of a machine learning method is linked to the quality of its training data, so realistic acoustic simulation algorithms were used to generate a large database of room impulse responses. Room models were defined with realistic, randomly generated geometries and surface properties; these models were then used to predict the room impulse responses. (2) In the second approach, a statistical model of the decay of sound in a room was further developed. This model uses a maximum likelihood (ML) framework to yield a number of decay curve estimates from a received reverberant signal.
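As an illustration of the maximum likelihood idea, the following is a minimal sketch and not the model developed in the thesis. It assumes a much simpler, textbook-style model: a free-decay segment is treated as Gaussian noise multiplied by a single exponential envelope, and the decay rate is found by a plain grid search over candidate reverberation times. The function name `ml_reverberation_time`, the candidate grid and the synthetic test signal are all hypothetical choices made for this example.

```python
import numpy as np

def ml_reverberation_time(y, fs, candidates):
    """Grid-search ML estimate of reverberation time (s) for one decay segment.

    Assumed model (an illustration only): y[n] = a**n * x[n], with x[n]
    independent zero-mean Gaussian samples of unknown variance.
    """
    y = np.asarray(y, dtype=float)
    N = len(y)
    n = np.arange(N)
    best_T, best_ll = None, -np.inf
    for T in candidates:
        # Amplitude falls by 60 dB (a factor of 10**-3) after T seconds.
        log_a = -3.0 * np.log(10.0) / (T * fs)
        # ML estimate of the noise variance for this candidate decay rate.
        sigma2 = np.mean(y**2 * np.exp(-2.0 * log_a * n))
        # Log-likelihood with the variance profiled out.
        ll = (-0.5 * N * np.log(2.0 * np.pi * sigma2)
              - 0.5 * N * (N - 1) * log_a
              - 0.5 * N)
        if ll > best_ll:
            best_T, best_ll = T, ll
    return best_T

# Synthetic free decay with a known reverberation time (hypothetical data).
rng = np.random.default_rng(0)
fs = 8000
true_T = 0.6
n = np.arange(int(0.6 * fs))
envelope = 10.0 ** (-3.0 * n / (true_T * fs))
y = envelope * rng.standard_normal(n.size)

candidates = np.linspace(0.2, 2.0, 181)  # 10 ms steps
estimate = ml_reverberation_time(y, fs, candidates)
print(f"estimated reverberation time: {estimate:.2f} s")
```

A grid search is used here only because it is easy to follow; it stands in for, and does not represent, the optimisation used in the thesis.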
The success of the method depends on a number of stages developed for the algorithm: (a) a pre-processor to select appropriate decay phases for estimation, (b) a rigorous optimisation algorithm to ensure that the correct maximum likelihood estimate is found, and (c) a method to yield a single optimum decay curve estimate from which the parameters are calculated. The ANN and ML methods were tested using orchestral music and speech signals. The ANN method tended to perform well when estimating the early decay time (EDT); for both speech and music signals the error was within the subjective difference limens. However, accuracy was reduced for the reverberation time (Rt) and other parameters. By contrast, the ML method performed well for Rt, with results for both speech and music within the difference limens for reasonable (<4 s) reverberation times. In addition, reasonable accuracy was found for EDT, clarity (C80), centre time (Ts) and Deutlichkeit (D). The ML method is also capable of producing accurate estimates of the binaural parameters early lateral energy fraction (LEF) and late lateral strength (LG). A number of real-world measurements were carried out in concert halls, where the ML accuracy was shown to be sufficient for most parameters. The ML method has an advantage over the ANN method in being truly blind (the ANN method requires a period of learning and is therefore semi-blind). The ML method uses gaps of silence between notes or utterances; when these silent regions are not present, the method does not produce an estimate. Accurate estimation requires a long recording (hours of music or many minutes of speech) to ensure that at least some silent regions are present. This thesis shows that, given a sufficiently long recording, accurate estimates of many acoustic parameters can be obtained directly from speech and music.
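For readers unfamiliar with the monaural parameters named above, the sketch below shows how they are conventionally defined from an impulse response: backward (Schroeder) integration for the decay curve, and energy ratios for clarity, Deutlichkeit and centre time. It is a sketch of the standard textbook definitions, not the thesis's implementation. The function names, the 50 ms limit for D, the -5 to -35 dB fitting range for the reverberation time and the idealised exponential test response are assumptions made for this example.

```python
import numpy as np

def schroeder_decay_db(h):
    """Backward-integrated energy decay curve in dB, normalised to 0 dB."""
    energy = np.asarray(h, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]
    return 10.0 * np.log10(edc / edc[0])

def decay_time(decay_db, fs, upper_db, lower_db):
    """Time for a 60 dB decay, from a line fitted between two decay levels."""
    idx = np.where((decay_db <= upper_db) & (decay_db >= lower_db))[0]
    slope, _ = np.polyfit(idx / fs, decay_db[idx], 1)
    return -60.0 / slope

def energy_parameters(h, fs):
    """Clarity C80 (dB), Deutlichkeit D50 (ratio) and centre time Ts (s)."""
    energy = np.asarray(h, dtype=float) ** 2
    t = np.arange(len(energy)) / fs
    total = energy.sum()
    n50, n80 = int(round(0.05 * fs)), int(round(0.08 * fs))
    c80 = 10.0 * np.log10(energy[:n80].sum() / energy[n80:].sum())
    d50 = energy[:n50].sum() / total
    ts = (t * energy).sum() / total
    return c80, d50, ts

# Idealised impulse response: a pure exponential envelope that decays
# by 60 dB in exactly 1 s (hypothetical test data, not a measured room).
fs = 8000
t = np.arange(3 * fs) / fs
h = 10.0 ** (-3.0 * t / 1.0)

decay = schroeder_decay_db(h)
edt = decay_time(decay, fs, 0.0, -10.0)    # early decay time
rt = decay_time(decay, fs, -5.0, -35.0)    # reverberation time
c80, d50, ts = energy_parameters(h, fs)
print(f"EDT={edt:.3f} s, Rt={rt:.3f} s, C80={c80:.2f} dB, D={d50:.3f}, Ts={ts*1000:.1f} ms")
```

For a purely exponential decay the EDT and the reverberation time coincide; in real rooms they usually differ, which is one reason both are reported.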
Further extensions to the ML method detailed in this thesis combine the ML-estimated decay curve with cepstral methods that detect the locations of early reflections. This improves the accuracy of many of the parameter estimates.
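The abstract does not say how the cepstral detection is carried out, so the following only illustrates the general principle that makes a cepstrum useful here: a delayed copy of a signal produces a peak in the real cepstrum at the delay. The function names, the single synthetic reflection, its gain of 0.6 and the `min_lag` cut-off are all assumptions made for this sketch.

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(x)
    # A small constant guards against taking the log of zero.
    return np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)).real

def strongest_reflection_lag(x, min_lag):
    """Lag (in samples) of the largest cepstral peak at or above min_lag."""
    c = real_cepstrum(x)
    half = len(x) // 2  # the real cepstrum is symmetric, so search one half
    return min_lag + int(np.argmax(c[min_lag:half]))

# Synthetic signal: white noise plus one reflection delayed by 200 samples
# (hypothetical data standing in for a direct sound and one early reflection).
rng = np.random.default_rng(0)
direct = rng.standard_normal(4096)
delay = 200
x = np.zeros(direct.size + delay)
x[:direct.size] += direct
x[delay:] += 0.6 * direct

lag = strongest_reflection_lag(x, min_lag=20)
print(f"detected reflection lag: {lag} samples")
```

The low-lag region is skipped because it mainly reflects the overall spectral shape of the source and not discrete reflections.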
|Item Type:||Thesis (PhD)|
|Contributors:||Cox, TJ (Supervisor) and Li, F (Supervisor)|
|Themes:||Built and Human Environment; Media, Digital Technology and the Creative Economy; Memory, Text and Place|
|Schools:||Colleges and Schools > College of Science & Technology > School of Computing, Science and Engineering; Colleges and Schools > College of Science & Technology > School of the Built Environment|
|Depositing User:||P Kendrick|
|Date Deposited:||20 May 2011 10:38|
|Last Modified:||19 Feb 2014 15:16|