Data as of 15 November 2024

ISBN 9783843901222

72.00 € incl. VAT, plus shipping


978-3-8439-0122-2, Informatik series

Carsten Henneges
Feature Selection and Data Mining for Proteomics and Metabolomics

156 pages, dissertation, Eberhard-Karls-Universität Tübingen (2011), softcover, A5

Abstract

The past decades saw rapid improvement in the techniques of biological experimentation. While the early days were marked by long and complex experiments carried out by hand by laboratory staff, automated high-throughput methods have since emerged. Proteomics in particular has profited from advances in mass spectrometry, which is used to identify fragments of digested proteins. Mass spectrometry has also entered the newer field of metabolomics, which investigates small molecules from metabolism for their role as disease markers. The goal here is to develop monitoring and screening techniques based on easily obtained body fluids. However, further methods, such as IR spectroscopy, have yet to become established in metabolomics research.

Each of these research areas relies on data mining. Essentially, data mining can be subdivided into the tasks of feature selection and feature construction. Feature selection aims to select relevant features out of a larger pool. A selected combination may aid visualisation, and thus understanding, of a dataset, and may also improve the prediction performance of learning algorithms. To this end, three general approaches exist: wrapper, filter and embedded methods. While wrappers employ an arbitrary learning algorithm to assess the value of a feature combination, filters rely on statistical criteria. Most recently, embedded methods, in which feature selection is integrated into the learning algorithm itself, have attracted research interest. Feature construction algorithms, on the other hand, reconstruct the subsignals of an additive superposition. The key approach here is matrix factorisation under constraints. Constraints frequently used for this purpose are statistical independence, non-negativity and sparsity, each leading to problem-specific algorithms.
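
To make the distinction between filters and wrappers concrete, the following is a minimal sketch in plain Python, not code from the book. The toy dataset, the nearest-centroid classifier and all function names are assumptions chosen for illustration. The filter scores each feature on its own by its absolute correlation with the class label, while the wrapper greedily adds the feature that most improves the leave-one-out accuracy of the classifier. Embedded methods are not shown.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def filter_ranking(X, y):
    """Filter: rank features individually by |correlation| with the label."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -scores[j])

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
        point = [X[i][j] for j in features]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)

def wrapper_forward_selection(X, y, max_features):
    """Wrapper: greedy forward selection driven by classifier accuracy."""
    selected, best = [], 0.0
    while len(selected) < max_features:
        candidates = [j for j in range(len(X[0])) if j not in selected]
        if not candidates:
            break
        # Ties in accuracy are broken in favour of the lower feature index.
        acc, j = max(
            ((loo_accuracy(X, y, selected + [j]), j) for j in candidates),
            key=lambda t: (t[0], -t[1]),
        )
        if acc <= best:  # stop when no candidate improves the accuracy
            break
        selected.append(j)
        best = acc
    return selected, best

# Hypothetical toy data: features 0 and 2 separate the classes, feature 1 is noise.
X = [
    [0.1, 5.0, 1.0], [0.2, 1.0, 1.2], [0.0, 3.0, 0.8], [0.3, 2.0, 1.1],
    [1.0, 4.0, 1.9], [1.1, 1.5, 2.2], [0.9, 3.5, 2.0], [1.2, 2.5, 1.8],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

ranking = filter_ranking(X, y)
selected, accuracy = wrapper_forward_selection(X, y, max_features=2)
print("filter ranking:", ranking)
print("wrapper selection:", selected, "accuracy:", accuracy)
```

The filter returns a ranking of all features without reference to any classifier, whereas the wrapper stops as soon as an additional feature no longer improves the classifier it wraps. This is why a wrapper can discard a feature that is redundant given those already selected, even if that feature scores well on its own.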

This book supports life science researchers with adapted data mining methods from both feature selection and feature construction for proteomics and metabolomics. We describe biomarker identification for breast cancer prediction using an SVM-based wrapper and develop faster wrapper algorithms using surrogate-based optimisation. Applying filters for ranking-specific feature selection, we also design a cost-efficient prediction system for proteotypic peptides. As a further application, embedded methods are used to infer energetic interaction patterns in protein 3D structures. Finally, we develop a novel factorisation method for feature construction to decompose IR spectra within a metabolomics context.
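
As a rough illustration of feature construction by constrained matrix factorisation, the sketch below uses plain non-negative matrix factorisation with the well-known multiplicative update rules of Lee and Seung. This is a standard technique substituted for illustration and is not the novel method developed in the book. The toy "spectra", the mixing weights and all function names are assumptions.

```python
import random

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr))

def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Factorise non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start so that every update is well defined.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

# Hypothetical toy data: four observed signals, each an additive mixture
# of two non-negative source signals.
sources = [[1.0, 0.0, 2.0, 0.0, 1.0],
           [0.0, 3.0, 0.0, 1.0, 0.0]]
mixing = [[1.0, 0.5], [0.2, 1.0], [0.7, 0.7], [1.0, 0.0]]
V = matmul(mixing, sources)

W0, H0 = nmf(V, rank=2, iterations=0)   # random starting point only
W, H = nmf(V, rank=2, iterations=200)   # same start, then 200 updates
print("initial error:", frobenius_error(V, W0, H0))
print("final error:  ", frobenius_error(V, W, H))
```

The rows of `H` play the role of reconstructed subsignals and the rows of `W` hold the non-negative weights with which they are superimposed. Because the updates only multiply by non-negative ratios, both factors stay non-negative throughout, which is how the constraint is enforced here. Other constraints, such as statistical independence or sparsity, lead to different algorithms.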