Catalog data as of November 15, 2024
ISBN 978-3-8439-5504-1, series: Ingenieurwissenschaften (Engineering Sciences)
Mhd Modar Halimeh: Some Contributions to Machine Learning-based System Identification and Speech Enhancement for Nonlinear Acoustic Echo Control
203 pages, dissertation, Universität Erlangen-Nürnberg (2024), softcover, A5
The contributions of this thesis can be outlined by decomposing the task of nonlinear acoustic echo control into two subtasks: Nonlinear Acoustic Echo Cancellation (NAEC) and Acoustic Echo Suppression (AES). For NAEC, the single-channel model-adaptation task is formulated as a Bayesian recursive filtering problem, and an evolutionary resampling strategy for particle filtering is proposed. The resulting Elitist Resampling Particle Filter (ERPF) is shown experimentally to be an efficient and high-performing approach. The fundamental problem of nonlinear model design is addressed by a novel Artificial Neural Network (ANN)-based approach, denoted the Adaptive Filtering-Inspired (AFI) ANN, that learns the optimal nonlinear basis functions for approximating the underlying nonlinear system. Using transfer learning, the learned basis functions are incorporated into conventional nonlinear models. Extending the ERPF to multichannel nonlinear models enables the adaptation of Nonlinear-in-the-Parameters (NIP) Multiple-Input/Multiple-Output (MIMO) echo path models. This extension is realized with a cooperative strategy that exploits the redundancy in the multichannel system identification problem and enables a geometrically informed approach that can utilize the microphone and loudspeaker array geometries. The resulting Cooperative Multichannel Elitist Resampling Particle Filter (CM-ERPF) is evaluated for both synthetic and real-world nonlinearities. Finally, for AES, a complex-valued Deep Neural Network (DNN) architecture (denoted the CPF) is proposed that estimates a complex-valued mask for extracting the desired near-end speech signal. Because it is built from complex-valued neural modules, the network can process and exploit complex-valued patterns and features such as complex-valued spectrograms. This results in speech signal estimates with minimal distortion and better overall quality than conventional counterparts.
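The masking step mentioned at the end of the abstract can be illustrated with a minimal sketch. It assumes that the complex-valued mask is applied to the microphone spectrogram by element-wise complex multiplication, which is a common convention but is not confirmed by the abstract. The function name `apply_complex_mask`, the toy data, and the oracle mask are hypothetical; the oracle mask merely stands in for the output of the DNN and is not the thesis's method.

```python
import numpy as np


def apply_complex_mask(mic_spec, mask):
    """Multiply a complex mask element-wise with a complex spectrogram.

    Assumption: the near-end estimate is mask * microphone spectrogram.
    """
    mic_spec = np.asarray(mic_spec, dtype=np.complex128)
    mask = np.asarray(mask, dtype=np.complex128)
    if mic_spec.shape != mask.shape:
        raise ValueError("mask and spectrogram must have the same shape")
    return mask * mic_spec


# Toy data (made up): 2 frequency bins x 3 time frames.
rng = np.random.default_rng(0)
near_end = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
echo = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
mic = near_end + echo  # microphone signal = near-end speech + residual echo

# Oracle mask: computable here only because the toy near-end signal is known.
# In practice a network would have to estimate the mask from observed signals.
oracle_mask = near_end / mic

estimate = apply_complex_mask(mic, oracle_mask)
```

Unlike a real-valued magnitude mask, a complex mask can modify both the magnitude and the phase of each time-frequency bin, which is why the oracle mask in this toy setup recovers the near-end spectrogram up to floating-point error.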