ISBN 978-3-8439-1207-5, series: Statistik
Andreas Mayr: Boosting beyond the mean - extending component-wise gradient boosting algorithms to multiple dimensions
209 pages, dissertation, Universität Erlangen-Nürnberg (2013), softcover, A5
Flexible statistical modelling approaches have an increasing impact in biomedical research. One major focus is on modelling high-dimensional data sets, which typically result from modern applications such as genome-wide association studies. This thesis aims to extend component-wise boosting algorithms to multiple dimensions, where the term dimensions refers to both the prediction space and the parameter space.
All presented algorithms are based on the concept of boosting, which has its roots in the field of machine learning. The basic idea is to iteratively apply simple regression methods and to aggregate them into a final additive model. In this work, we first introduce a sequential stopping rule for the tuning of component-wise boosting algorithms. A second approach focuses on extending the classical point prediction resulting from boosting to prediction intervals via quantile regression. Third, a boosting algorithm is presented that achieves the simultaneous estimation of additive models for different distribution parameters, as implied by the GAMLSS (generalized additive models for location, scale and shape) model class. Applying boosting to estimate GAMLSS makes it possible to include the selection of informative predictors in the fitting process. Furthermore, the range of effect types that can be modelled is increased in comparison to the standard implementation. These advantages are demonstrated by empirical results based on simulated data as well as in various biomedical applications.
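
To make the component-wise principle concrete, the following sketch, written in Python and not to be confused with the freely available implementation accompanying the thesis, shows an L2-boosting variant with simple linear base-learners: in each iteration every covariate is fitted separately to the current negative-gradient vector (here the residuals), and only the best-fitting component receives a small, damped update. All names and default values are illustrative assumptions.

    import numpy as np

    def componentwise_l2_boost(X, y, n_iter=100, nu=0.1):
        """Component-wise L2 boosting with simple linear base-learners.
        Assumes (roughly) centered covariate columns; illustrative sketch only."""
        n, p = X.shape
        coef = np.zeros(p)            # aggregated additive (here: linear) model
        intercept = y.mean()          # offset
        fit = np.full(n, intercept)
        for _ in range(n_iter):
            u = y - fit               # negative gradient of the squared-error loss
            best_j, best_b, best_rss = 0, 0.0, np.inf
            for j in range(p):        # fit every component separately ...
                xj = X[:, j]
                b = xj @ u / (xj @ xj)
                rss = np.sum((u - b * xj) ** 2)
                if rss < best_rss:
                    best_j, best_b, best_rss = j, b, rss
            coef[best_j] += nu * best_b          # ... but update only the best one
            fit += nu * best_b * X[:, best_j]    # damped update of the ensemble
        return intercept, coef

    # illustrative usage with two informative out of 50 candidate covariates
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 50))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(200)
    intercept, coef = componentwise_l2_boost(X, y, n_iter=150)
    print(np.flatnonzero(coef))       # indices of the covariates that were selected

Components that are never selected keep a coefficient of exactly zero, which is how variable selection enters the fitting process when boosting is stopped early.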
The results of this thesis make it possible to simultaneously model different distribution parameters via boosting while incorporating the selection of the most influential variables for each model. Furthermore, it becomes possible for the first time to fit GAMLSS when there are more candidate variables than observations. The computational implementation of all presented algorithms is freely available, and the practical usage of the underlying code is demonstrated in different examples.
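
As a rough illustration of the idea behind modelling several distribution parameters at once, the following sketch, again a simplified Python illustration rather than the thesis' implementation, boosts the location mu and the log-scale eta (with sigma = exp(eta)) of a Gaussian response in turn: each parameter is updated by one component-wise step on the partial gradient of the log-likelihood, so variable selection is carried out separately for each additive predictor. The function names, the common step length, and the single shared number of iterations are simplifying assumptions; in practice the stopping iteration is tuned for each distribution parameter.

    import numpy as np

    def componentwise_step(X, u, fit, coef, nu=0.1):
        """One component-wise least-squares step on the working response u."""
        best_j, best_b, best_rss = 0, 0.0, np.inf
        for j in range(X.shape[1]):
            xj = X[:, j]
            b = xj @ u / (xj @ xj)
            rss = np.sum((u - b * xj) ** 2)
            if rss < best_rss:
                best_j, best_b, best_rss = j, b, rss
        coef[best_j] += nu * best_b               # coef is updated in place
        return fit + nu * best_b * X[:, best_j]

    def boost_gaussian_lss(X, y, n_iter=200, nu=0.1):
        """Cyclically boost location (mu) and log-scale (eta, sigma = exp(eta))."""
        n, p = X.shape
        coef_mu, coef_eta = np.zeros(p), np.zeros(p)
        fit_mu = np.full(n, y.mean())             # offsets from marginal moments
        fit_eta = np.full(n, np.log(y.std()))
        for _ in range(n_iter):
            sigma2 = np.exp(2.0 * fit_eta)
            # partial gradient of the Gaussian log-likelihood w.r.t. mu
            u_mu = (y - fit_mu) / sigma2
            fit_mu = componentwise_step(X, u_mu, fit_mu, coef_mu, nu)
            # partial gradient w.r.t. the log-scale eta
            u_eta = (y - fit_mu) ** 2 / sigma2 - 1.0
            fit_eta = componentwise_step(X, u_eta, fit_eta, coef_eta, nu)
        return coef_mu, coef_eta

The same scheme carries over to other distribution families by exchanging the partial gradients of the log-likelihood for each modelled parameter.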