ISBN 978-3-8439-0012-6, series: Informatik (Computer Science)
Georg Hinselmann: Data Mining on Chemical Graphs Using Kernel Algorithms
171 pages, dissertation, Eberhard-Karls-Universität Tübingen (2011), softcover, A5
This thesis presents novel algorithms and experiments for kernel-based data mining on chemical graphs. Data mining is the process of extracting knowledge from data by applying pattern recognition algorithms. Assessing chemical compounds in silico with dedicated data mining approaches is essential for reducing the number of expensive real-world experiments. First, the theoretical foundations needed to understand the encodings and data mining approaches are introduced. A central question here is how chemical information can be compared and how a special class of functions, the so-called Mercer kernels, can be applied to chemical graphs. Then, a new open-source toolkit for chemical fingerprints is introduced, which has been used in several publications. Afterwards, the results of a study are presented in which a modified large-scale linear support vector machine library was applied to large, unbalanced classification problems on chemical graphs. Large-scale machine learning tasks are becoming increasingly important because the amount of available data is growing steadily. Most approaches, however, are unsuited to such tasks because of their computational complexity. Next, new approaches are introduced for computing fingerprint encodings that can be compared efficiently. Finally, a novel graph kernel framework is presented that uses the geometrical and topological distance information between the vertices. In combination with a support vector machine, this framework produced excellent results.