Quantitative Aspects of Modal Particle Use.
Studies on FOLK, the Automatically Annotated Corpus of Spoken German
DOI: https://doi.org/10.13092/9wv7n672

Abstract
This article examines quantitative aspects of the use of modal particles (MPs) in corpora of spoken German, i.e. the token rates of individual MPs or of all MPs combined, and their frequency rankings. Because MPs consistently have heterosemes in other word classes (adverb, focus particle, interjection, etc.), such analyses previously had to be conducted manually. Only in 2017 was automatic POS tagging adapted to spoken language released for the FOLK corpus, making an automatic search for MPs possible. By comparing the automatic counts in FOLK with the frequency data from the manually analysed corpora of Hentschel (1986) and Brünjes (2014), and by checking the POS tagging of samples randomly extracted from FOLK, the paper assesses how reliable the automatically generated MP data of FOLK are. With regard to the list of lexemes considered in FOLK, errors are essentially limited to the MP eigentlich and to a few quantitatively marginal cases. The overall frequency of MPs in FOLK (token rate 2.55%) also seems plausible. Major deviations from previous studies arise in the frequencies of some individual MPs, of which auch, mal and halt are analysed in more detail. While the discrepancies for auch are due to deficits in POS tagging, for mal and halt corpus characteristics (discourse types and survey periods) play a major role. When the adjusted frequencies found in the random samples are extrapolated to the whole corpus, however, the MP frequency rankings of FOLK correlate just as well with those of the manual counts (r=0.81/0.82) as the manually determined MP rankings of different corpora do with each other.
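The three quantities the abstract relies on (token rate, sample-based extrapolation, and rank correlation) can be illustrated with a minimal Python sketch. It rests on assumptions, since the abstract does not specify the procedure: the adjustment is modelled as scaling the automatic count by the share of correct MP tags in a checked sample (which corrects false positives only, not missed MPs), and the correlation is computed as Spearman's coefficient, although the abstract does not say which coefficient r denotes. All function names and all numbers in the usage example are hypothetical and are not taken from FOLK or from the manual studies.

```python
from math import sqrt


def token_rate(mp_tokens, corpus_tokens):
    """MP tokens as a percentage of all tokens in the corpus."""
    return 100.0 * mp_tokens / corpus_tokens


def extrapolate(tagged_count, sample_size, sample_correct):
    """Scale an automatic count by the precision seen in a checked sample.

    Assumption: only false positives are corrected; MPs that the tagger
    missed are not recovered by this adjustment.
    """
    return tagged_count * sample_correct / sample_size


def ranks(values):
    """Rank values in descending order (1 = most frequent), averaging ties."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical toy values, for illustration only.
    automatic = {"ja": 900, "auch": 800, "mal": 500, "halt": 300}
    checked = {"ja": (100, 95), "auch": (100, 40), "mal": (100, 90), "halt": (100, 85)}
    manual = {"ja": 850, "auch": 310, "mal": 480, "halt": 120}

    particles = sorted(automatic)
    adjusted = [extrapolate(automatic[p], *checked[p]) for p in particles]
    reference = [manual[p] for p in particles]
    print(spearman(adjusted, reference))
```

Comparing ranks rather than raw counts keeps the result independent of corpus size, which matters when the corpora being compared differ in token count.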

