ISBN 978-3-8439-4883-8, Elektrotechnik series
Benjamin Cauchi
Non-Intrusive Quality Evaluation of Speech Processed in Noisy and Reverberant Environments
128 pages, dissertation, Carl von Ossietzky Universität Oldenburg (2021), hardcover, B5
In many speech applications, such as hands-free telephony or voice-controlled home assistants, the distance between the user and the recording microphones can be relatively large. In such a far-field scenario, the recorded microphone signals are typically corrupted by noise and reverberation, which may severely degrade the performance of speech recognition systems and reduce the intelligibility and quality of speech in communication applications. To limit these effects, speech enhancement algorithms are typically applied. The main objective of this thesis is to develop novel speech enhancement algorithms for noisy and reverberant environments, as well as signal-based measures to evaluate these algorithms, focusing on solutions that are applicable in realistic scenarios.
First, we propose a single-channel speech enhancement algorithm for joint noise and reverberation reduction. The proposed algorithm uses a spectral gain to enhance the input signal, where the gain is computed using a combination of a statistical room acoustics model, minimum statistics and temporal cepstrum smoothing. This single-channel spectral enhancement algorithm can be combined easily with existing beamforming techniques when multiple microphones are available. Second, we propose two non-intrusive speech quality measures that combine perceptually motivated features and predicting functions based on machine learning. The first measure uses time-averaged modulation energies as input features to a model tree. The second measure uses time-varying modulation energies as input features to a recurrent neural network in order to take the time-dependency of the test signal into account. Both measures are trained and evaluated using a dataset of perceptually evaluated signals comprising a wide range of algorithms, settings and acoustic scenarios.
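As a rough illustration of the first contribution, the sketch below applies a spectral gain in the STFT domain, combining a crude running-minimum noise estimate with a late-reverberation PSD derived from an exponential-decay room model. This is a minimal sketch under assumed parameters, not the algorithm described in the thesis: the function name `enhance`, the use of `scipy`, the window lengths, the spectral floor, and the assumed T60 are all illustrative, and the temporal cepstrum smoothing step is omitted.

```python
import numpy as np
from scipy.signal import stft, istft

def enhance(x, fs, t60=0.5, delay_frames=4, nperseg=512, g_min=0.1):
    """Illustrative single-channel spectral-gain enhancer (not the thesis' exact method)."""
    _, _, X = stft(x, fs, nperseg=nperseg)            # STFT, shape (freq bins, frames)
    P = np.abs(X) ** 2                                # short-time power spectrum

    # Crude stand-in for minimum statistics: a running minimum of the power
    # spectrum over a sliding window approximates the noise PSD.
    win = 32
    noise_psd = np.stack([P[:, max(0, l - win):l + 1].min(axis=1)
                          for l in range(P.shape[1])], axis=1)

    # Late-reverberation PSD from an exponential-decay room model: power
    # observed `tau` seconds earlier, attenuated according to the assumed T60.
    hop = nperseg // 2
    tau = delay_frames * hop / fs
    decay = np.exp(-6.0 * np.log(10.0) * tau / t60)   # 60 dB energy decay over t60 seconds
    rev_psd = np.zeros_like(P)
    rev_psd[:, delay_frames:] = decay * P[:, :-delay_frames]

    # Wiener-like gain with a spectral floor, applied to the noisy STFT.
    gain = np.maximum(1.0 - (noise_psd + rev_psd) / np.maximum(P, 1e-12), g_min)
    _, x_hat = istft(gain * X, fs, nperseg=nperseg)
    return x_hat[:len(x)]
```

In the same spirit, the second sketch computes time-averaged modulation-energy features of the kind that could feed a learned predicting function. The band layout, the pooling into a few coarse modulation bands, and the name `modulation_energy_features` are assumptions rather than the auditory-motivated front end of the thesis, and the trained model tree or recurrent network is not reproduced here.

```python
import numpy as np
from scipy.signal import stft

def modulation_energy_features(x, fs, nperseg=512, n_mod_bands=8):
    """Illustrative time-averaged modulation-energy features (assumed front end)."""
    _, _, X = stft(x, fs, nperseg=nperseg)
    env = np.abs(X)                                   # sub-band envelopes, (freq, frames)
    mod_spec = np.abs(np.fft.rfft(env, axis=1)) ** 2  # modulation spectrum per band

    # Pool into a few coarse modulation-frequency bands, averaging over both
    # acoustic frequency and the bins within each band (assumes the signal is
    # long enough to span at least n_mod_bands modulation bins).
    edges = np.linspace(0, mod_spec.shape[1], n_mod_bands + 1, dtype=int)
    feats = np.array([mod_spec[:, edges[i]:edges[i + 1]].mean()
                      for i in range(n_mod_bands)])
    return np.log(feats + 1e-12)
```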