Catalog data as of 12 November 2024
ISBN 978-3-8439-2314-9, series Informationstechnik (Information Technology)
Felix Johannes Weninger: Intelligent Single-Channel Methods for Multi-Source Audio Analysis
221 pages, dissertation, Technische Universität München (2015), softcover, A5
This thesis investigates the potential of recent machine learning methods for the challenging task of single-channel, multi-source audio analysis, i.e., information extraction from single-channel audio in which the sources of interest (e.g., speech) are mixed with multiple interfering sources.
First, it is shown that source separation by recently proposed techniques for non-negative matrix factorization can significantly improve recognition performance compared with the state-of-the-art approach of training the recognition system on multi-source data.
Second, it is shown that by formulating the source separation problem itself as a recognition task, state-of-the-art methods for supervised training of recognition systems, such as deep neural network models, can be used to achieve unprecedented performance in single-channel source separation. In this context, supervised training of non-negative models is introduced as well.
The task of multi-source recognition as defined above is exemplified by challenging real-world speech separation and recognition problems in which speech is mixed with non-stationary background noise such as music. World-leading results in international evaluation campaigns are demonstrated for this task.
Furthermore, state-of-the-art results are presented in selected music information retrieval applications involving polyphonic audio, such as characterizing the singer, or transcribing the music into a score.
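As a rough illustration of the non-negative matrix factorization idea mentioned in the abstract, the following minimal sketch separates a magnitude spectrogram into a target part and a noise part. It is not the method of the thesis: it assumes that non-negative basis spectra for the target and for the noise have already been learned elsewhere, and it uses the standard multiplicative update for the generalized Kullback-Leibler divergence. All names (`nmf_activations`, `separate`, `W_target`, `W_noise`) are hypothetical and chosen only for this example.

```python
import numpy as np

EPS = 1e-12  # small constant that guards against division by zero


def kl_divergence(V, V_hat):
    """Generalized Kullback-Leibler divergence between V and its model V_hat."""
    return float(np.sum(V * np.log((V + EPS) / (V_hat + EPS)) - V + V_hat))


def nmf_activations(V, W, n_iter=200, seed=0):
    """Estimate non-negative activations H such that V ~ W @ H, with W fixed.

    V: (n_freq, n_frames) non-negative magnitude spectrogram (assumed input).
    W: (n_freq, n_bases) non-negative basis spectra (assumed pre-learned).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    col_sums = W.sum(axis=0)[:, None] + EPS  # W^T applied to an all-ones matrix
    for _ in range(n_iter):
        # Multiplicative update for the generalized KL divergence
        H *= (W.T @ (V / (W @ H + EPS))) / col_sums
    return H


def separate(V, W_target, W_noise, n_iter=200):
    """Split V into target and noise estimates with a soft (Wiener-like) mask."""
    W = np.hstack([W_target, W_noise])
    H = nmf_activations(V, W, n_iter)
    k = W_target.shape[1]
    target_model = W_target @ H[:k]
    noise_model = W_noise @ H[k:]
    mask = target_model / (target_model + noise_model + EPS)
    return mask * V, (1.0 - mask) * V
```

Because the two estimates are obtained by masking the mixture, they always add up to the input spectrogram. In a recognition pipeline, the target estimate would be passed on to feature extraction in place of the mixture.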