
Amsterdam Data Science has awarded one of its best thesis awards to Amsterdam’s Rachel van ‘t Hull for her BSc thesis “Towards Explainable Artificial Text Detection”.

Supervisors: Dr. Willem H. (Jelle) Zuidema, Valentin Vogelmann, Bas Cornelissen

Praise from the jury:

As the quality of artificially generated texts has improved considerably over the past few years, as manifested by deep-learning-based language models trained on many billions of tokens, so has the possibility of deceiving others by means of fake reviews, identity theft and phishing. To counter such malicious use of text generation, this thesis focuses on the task of distinguishing artificially generated texts from human-written texts. In contrast to the detector models commonly applied to this task, which lack generalizability and transparency, the thesis explores the value of word distributions as a signal in the context of different grammatical categories and corpus sizes. The outcomes of the extensive experimentation show promising performance of the proposed explainable approach to text classification.

In terms of impact, the thesis provides extensive empirical evidence that a focus on word distributions offers a broadly applicable and explainable alternative to the opaque detector-based models, thereby inspiring future studies of artificial text detection to investigate this perspective further. The complete pipeline is publicly shared.

The endeavor has obvious societal relevance, providing a powerful and explainable handle for automatically flagging deceptive texts that are generated at large scale. This work is a well-deserved winner of the thesis prize, standing out with a clear writing style and experimental structure, as well as extensive detail and motivation. It has impressive conceptual depth for a bachelor's thesis, and reflects good computational and statistical skills.