Robust speaker recognition in reverberant condition - toward greater biometric security

Al-Karawi, KA ORCID: https://orcid.org/0000-0001-9275-6902 2018, Robust speaker recognition in reverberant condition - toward greater biometric security, PhD thesis, University of Salford.

PDF - Submitted Version
Restricted to Repository staff only until 8 June 2020.

Abstract

Automatic speaker recognition systems have become an increasingly relevant technology for security applications. The primary challenge for automatic speaker recognition is coping with the variability of the environments and channels from which the speech is obtained. Previous work has achieved good results for clean, high-quality speech when training and test acoustic conditions match. However, under the mismatched conditions and reverberant environments often encountered in the real world, system performance degrades significantly. The main aim of this study is to improve the robustness of speaker recognition systems for real-world applications in reverberant conditions by developing methods that reduce the detrimental effects of reverberation on the single-microphone speech signal.

The collection of suitable speech data sets is crucially important for evaluating speaker recognition techniques during development. Therefore, a data set of anechoic speech recordings was generated and used to study the methods proposed in this thesis. Furthermore, a typical speaker recognition system was implemented and evaluated using the current state-of-the-art technique, Gaussian Mixture Models, with two standard features. The effects of reverberation time and source-to-receiver distance on system performance were also examined, and the results confirm that both parameters can affect system accuracy.
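As a rough illustration of how a Gaussian Mixture Model can score a verification attempt, the sketch below computes an average per-frame log-likelihood and compares a claimed speaker's model against a background model. The diagonal covariances, the background model, the threshold and all function names are assumptions made for illustration; the abstract does not state the model configuration or which two features were used.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of `frames` under a diagonal GMM."""
    total = 0.0
    for x in frames:
        comps = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        peak = max(comps)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total / len(frames)


def verify(frames, speaker_gmm, background_gmm, threshold=0.0):
    """Return (score, accepted) using a log-likelihood ratio (assumed rule)."""
    score = gmm_avg_loglik(frames, *speaker_gmm) - gmm_avg_loglik(
        frames, *background_gmm
    )
    return score, score > threshold
```

Each model here is a hypothetical tuple `(weights, means, variances)`; in practice the feature frames would come from a front end such as cepstral analysis, which is outside the scope of this sketch.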

A maximum likelihood algorithm is used to blindly estimate the reverberation time from speech signals submitted for verification. The estimated values are used to choose a matched acoustic impulse response for inclusion in the retraining or fine-tuning of the pattern recognition model.
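The matching step can be sketched as follows, assuming the reverberation time has already been estimated. The sketch picks the closest entry from a hypothetical bank of reverberation times, builds a synthetic impulse response as exponentially decaying noise, and convolves it with anechoic training speech. The synthetic impulse response, the bank and the function names are illustrative assumptions, not the impulse responses or procedure used in the thesis.

```python
import math
import random


def pick_matched_rt(estimated_rt60, bank_rt60s):
    """Choose the bank reverberation time closest to the blind estimate."""
    return min(bank_rt60s, key=lambda rt: abs(rt - estimated_rt60))


def synthetic_rir(rt60, fs=16000, seed=0):
    """Decaying noise whose amplitude envelope falls by 60 dB after rt60 seconds."""
    rng = random.Random(seed)
    n = int(rt60 * fs)
    decay = math.log(1000.0) / (rt60 * fs)  # 60 dB in amplitude is a factor of 1000
    return [rng.gauss(0.0, 1.0) * math.exp(-decay * k) for k in range(n)]


def convolve(signal, rir):
    """Direct-form linear convolution (slow, but dependency-free)."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


# Usage: reverberate anechoic training speech with the matched response.
anechoic = [0.0, 1.0, 0.5, -0.25]            # placeholder samples
matched_rt = pick_matched_rt(0.62, [0.3, 0.6, 0.9])
reverberant = convolve(anechoic, synthetic_rir(matched_rt, fs=1000))
```

For real recordings an FFT-based convolution would be far faster than the direct loop shown here; the loop is kept only so that the sketch needs no third-party packages.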

To improve performance further, the autocorrelation function is used to estimate the early reflections of the submitted signal. The estimated early reflections are convolved with the anechoic signal, and the result is used to train the pattern recognition model. Furthermore, both the early-to-late ratio and the reverberation time (RT) are estimated for the submitted sample and used to determine a matched channel for training on the fly, improving system performance.
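Two of the quantities mentioned above can be illustrated with a short sketch. The first function shows one common way autocorrelation exposes a reflection, as a peak at the reflection's delay; the second computes an early-to-late energy ratio from an impulse response. The abstract does not describe its estimators, so the peak-picking rule, the 50 ms early/late boundary and the function names are all assumptions.

```python
import math


def autocorrelation(signal, max_lag):
    """Unnormalised autocorrelation for lags 0..max_lag."""
    n = len(signal)
    return [
        sum(signal[k] * signal[k - lag] for k in range(lag, n))
        for lag in range(max_lag + 1)
    ]


def strongest_reflection_lag(signal, min_lag, max_lag):
    """Lag (in samples) of the largest autocorrelation value in the range."""
    r = autocorrelation(signal, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])


def early_to_late_ratio_db(rir, fs, split_ms=50.0):
    """Early-to-late energy ratio in dB, split `split_ms` after the direct sound."""
    # Assumption: the direct sound is the largest-magnitude tap.
    onset = max(range(len(rir)), key=lambda k: abs(rir[k]))
    split = onset + int(round(split_ms * 1e-3 * fs))
    early = sum(h * h for h in rir[onset:split])
    late = sum(h * h for h in rir[split:])
    if late == 0.0:
        return math.inf  # no late energy at all
    return 10.0 * math.log10(early / late)
```

With a 50 ms boundary the ratio coincides with the clarity index often written C50; other boundaries are equally plausible and would simply be passed as `split_ms`.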

The principal findings are that reverberation time, early reflections and early-to-late ratio can be estimated and then used with training-on-the-fly methods to improve speaker verification performance. The improvement is demonstrated by comparing the performance of speaker recognition using conventional methods with that of the proposed re-training method.

Item Type: Thesis (PhD)
Contributors: Li, FF (Supervisor)
Schools: Schools > School of Computing, Science and Engineering > Salford Innovation Research Centre
Funders: The Ministry of Higher Education, Iraq
Depositing User: Khamis Ahmed Yousif
Date Deposited: 21 Sep 2018 09:33
Last Modified: 21 Sep 2018 09:33
URI: http://usir.salford.ac.uk/id/eprint/47139
