Data as of 4 April 2025

Verlag Dr. Hut GmbH
Sternstr. 18
80538 München
Tel: 0175 / 9263392
Mon - Fri, 9:00 - 12:00

Updated on 4 April 2025

ISBN 9783843948838

72.00 € (incl. VAT, plus shipping)

978-3-8439-4883-8, series Elektrotechnik (Electrical Engineering)

Benjamin Cauchi
Non-Intrusive Quality Evaluation of Speech Processed in Noisy and Reverberant Environments

128 pages, dissertation, Carl von Ossietzky Universität Oldenburg (2021), hardcover, B5

Abstract

In many speech applications such as hands-free telephony or voice-controlled home assistants, the distance between the user and the recording microphones can be relatively large. In such a far-field scenario, the recorded microphone signals are typically corrupted by noise and reverberation, which may severely degrade the performance of speech recognition systems and reduce intelligibility and quality of speech in communication applications. In order to limit these effects, speech enhancement algorithms are typically applied. The main objective of this thesis is to develop novel speech enhancement algorithms for noisy and reverberant environments and signal-based measures to evaluate these algorithms, focusing on solutions that are applicable in realistic scenarios.

First, we propose a single-channel speech enhancement algorithm for joint noise and reverberation reduction. The proposed algorithm uses a spectral gain to enhance the input signal, where the gain is computed using a combination of a statistical room acoustics model, minimum statistics and temporal cepstrum smoothing. This single-channel spectral enhancement algorithm can be combined easily with existing beamforming techniques when multiple microphones are available. Second, we propose two non-intrusive speech quality measures that combine perceptually motivated features and predicting functions based on machine learning. The first measure uses time-averaged modulation energies as input features to a model tree. The second measure uses time-varying modulation energies as input features to a recurrent neural network in order to take the time-dependency of the test signal into account. Both measures are trained and evaluated using a dataset of perceptually evaluated signals comprising a wide range of algorithms, settings and acoustic scenarios.
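To make the idea of a spectral gain for joint noise and reverberation reduction more concrete, the following is a minimal sketch and not the algorithm from the thesis. It rests on several assumptions that are not in the abstract: the input is a power spectrogram of shape (frames, frequency bins); the noise estimate is a crude sliding-minimum tracker standing in for minimum statistics (no bias compensation); the late reverberation estimate follows an exponential-decay statistical model parameterised by the reverberation time T60; the gain is a floored Wiener-like rule; and temporal cepstrum smoothing is omitted entirely. All function and parameter names are hypothetical.

```python
import numpy as np


def noise_psd_min_tracking(power, win=50, alpha=0.9):
    """Crude stand-in for minimum statistics (assumption, no bias correction):
    smooth the power recursively over time, then take the minimum over the
    last `win` frames in each frequency bin."""
    smoothed = np.empty_like(power)
    smoothed[0] = power[0]
    for l in range(1, power.shape[0]):
        smoothed[l] = alpha * smoothed[l - 1] + (1.0 - alpha) * power[l]
    noise = np.empty_like(power)
    for l in range(power.shape[0]):
        noise[l] = smoothed[max(0, l - win + 1): l + 1].min(axis=0)
    return noise


def late_reverb_psd(power, t60, frame_shift, delay_frames):
    """Late reverberant power under an exponential-decay room model
    (assumption): the power observed `delay_frames` earlier, attenuated by
    exp(-2 * delta * T_d) with delta = 3 ln(10) / T60."""
    if delay_frames < 1:
        raise ValueError("delay_frames must be at least 1")
    delta = 3.0 * np.log(10.0) / t60
    decay = np.exp(-2.0 * delta * delay_frames * frame_shift)
    reverb = np.zeros_like(power)
    reverb[delay_frames:] = decay * power[:-delay_frames]
    return reverb


def spectral_gain(power, noise, reverb, g_min=0.1):
    """Floored Wiener-like gain: attenuate bins dominated by the estimated
    interference (noise plus late reverberation), never below g_min."""
    interference = noise + reverb
    gain = 1.0 - interference / np.maximum(power, 1e-12)
    return np.clip(gain, g_min, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic power spectrogram: 200 frames, 129 bins (placeholder data).
    power = rng.exponential(scale=1.0, size=(200, 129))
    noise = noise_psd_min_tracking(power)
    reverb = late_reverb_psd(power, t60=0.6, frame_shift=0.016, delay_frames=3)
    gain = spectral_gain(power, noise, reverb)
    print(gain.shape, float(gain.min()), float(gain.max()))
```

In a complete system the gain would be applied to the complex short-time spectrum of the microphone signal (or of a beamformer output when several microphones are available) before resynthesis. The gain floor `g_min` trades interference suppression against audible artifacts.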