Étudier l'écrit SMS: Un objectif du projet sms4science

  • Louise-Amélie Cougnon
  • Thomas François


This paper presents sms4science, an international project that aims to collect corpora of text messages (hereafter "SMS corpora") from around the world for scientific research. Ten regions already participate, including Belgium, Réunion, Switzerland and Quebec. The article first presents the initial corpora collected in these four areas (116,000 text messages in total) and the methodology used to collect them. It then sets out the research possibilities these corpora open up: corpus-based studies pertain as much to linguistics and sociolinguistics as to natural language processing and statistics. One specific statistical study is presented here; its tentative conclusions outline differences in SMS practices between regions, notably in abbreviation rate and message length. Finally, the paper describes the obstacles the project faces and proposes new perspectives for the current year (2011).
Cougnon, L.-A., & François, T. (2011). Étudier l’écrit SMS: Un objectif du projet sms4science. Linguistik Online, 48(4). https://doi.org/10.13092/lo.48.331