ISBN 978-3-8439-2781-9, Series: Statistics
Moritz Maximilian Berger: On the Detection of Latent Structures in Categorical Data
256 pages, dissertation, Ludwig-Maximilians-Universität München (2016), softcover, A5
With the growing availability of huge amounts of data, it is increasingly important to uncover the underlying data-generating structures. The present work focusses on the detection of latent structures in categorical data, which have received less attention in the literature. In regression models, categorical variables are either the responses or part of the covariates. Depending on which role they play, different strategies have to be used to detect the underlying structures.
The first part of this thesis is dedicated to regression models with an excessive number of parameters. More concretely, we consider models with various categorical covariates and a potentially large number of categories. One interesting aspect is identifying the categories or units that have to be distinguished with respect to their effect on the response. The objective is to detect "latent groups" that share the same effect on the response variable. A novel approach to the clustering of categorical predictors or fixed effects is introduced, which is based on recursive partitioning techniques.
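To make the idea of fusing categories by recursive partitioning concrete, the following is a minimal sketch and not the method developed in the thesis. It assumes a numeric response and a single categorical predictor, and it uses the standard squared-error tree heuristic of ordering categories by their mean response and scanning the splits along that ordering. The function names and the `min_gain` stopping parameter are hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_binary_split(groups):
    """Return (sse_after_split, left_categories, right_categories).

    groups maps a category label to the list of responses observed in it.
    Categories are ordered by mean response and only splits along that
    ordering are scanned. Requires at least two categories.
    """
    ordered = sorted(groups, key=lambda c: mean(groups[c]))
    best = None
    for k in range(1, len(ordered)):
        left, right = ordered[:k], ordered[k:]
        left_y = [y for c in left for y in groups[c]]
        right_y = [y for c in right for y in groups[c]]
        total = sse(left_y) + sse(right_y)
        if best is None or total < best[0]:
            best = (total, left, right)
    return best


def cluster_categories(groups, min_gain=1.0):
    """Recursively split the categories into clusters with similar responses.

    A split is kept only if it reduces the SSE by at least min_gain
    (a hypothetical stopping rule chosen for illustration).
    """
    cats = list(groups)
    if len(cats) < 2:
        return [sorted(cats)]
    all_y = [y for c in cats for y in groups[c]]
    total, left, right = best_binary_split(groups)
    if sse(all_y) - total < min_gain:
        return [sorted(cats)]
    return (cluster_categories({c: groups[c] for c in left}, min_gain)
            + cluster_categories({c: groups[c] for c in right}, min_gain))
```

Categories that end up in the same cluster are treated as sharing one effect on the response, which is the sense in which they form a "latent group". A real analysis would replace the fixed `min_gain` threshold with a statistically motivated stopping criterion.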
The second part of this thesis deals with item response models, which can be considered regression models that aim at measuring the "latent abilities" of persons. In item response theory, one uses indicators such as the answers of persons to a collection of items to draw inferences about the underlying abilities. When developing psychometric tests, one has to be aware of the phenomenon of Differential Item Functioning (DIF). A general tree-based method is proposed that simultaneously detects the items and the subgroups of persons that carry DIF, while allowing for a set of covariates measured on different scales. Compared to classical approaches, a main advantage is that the proposed method automatically identifies the regions of the covariate space that are responsible for DIF, so these regions do not have to be prespecified.
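What DIF means in such a model can be illustrated with a small sketch. It assumes a Rasch-type model, which the abstract does not specify, in which the probability of a correct answer depends on the difference between person ability and item difficulty. DIF is represented here as a shift in item difficulty for persons whose covariate value lies beyond a threshold. The function names, the covariate `x`, the `threshold`, and the `shift` are hypothetical illustration values, not quantities from the thesis.

```python
import math


def rasch_prob(theta, beta):
    """Probability of a correct answer for ability theta and difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def prob_with_dif(theta, beta, x, threshold, shift):
    """Rasch-type probability with a covariate-dependent difficulty.

    Persons with x > threshold face difficulty beta + shift, all others
    face difficulty beta. A nonzero shift means the item shows DIF with
    respect to the covariate x.
    """
    difficulty = beta + shift if x > threshold else beta
    return rasch_prob(theta, difficulty)


# Two persons with the same ability but different covariate values.
p_low = prob_with_dif(theta=0.0, beta=0.0, x=30, threshold=40, shift=1.0)
p_high = prob_with_dif(theta=0.0, beta=0.0, x=50, threshold=40, shift=1.0)
```

With a nonzero `shift`, the two persons have different success probabilities despite equal ability. A tree-based approach in the spirit described above would search for the items and the split points at which such shifts occur, rather than requiring the threshold to be fixed in advance.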
The last part of the thesis addresses regression models for rating scale data, which are frequently used in behavioural research. Heterogeneity among respondents caused by "latent response styles" can lead to biased estimates and can affect the conclusions drawn from the observed ratings. The focus is on symmetric response categories and a specific form of response style, namely the tendency towards the middle or the extreme categories. The strength of the proposed models is that they can be embedded into the framework of generalized linear models, so that the inference techniques and asymptotic results for this class of models are available.