Peer-Reviewed Article

Authorship Attribution via Coupon-Collector-Type Indices

2019; Routledge; Volume: 27; Issue: 4; Language: English

10.1080/09296174.2019.1577939

ISSN

1744-5035

Authors

Lukun Zheng, Huiqiang Zheng

Topic(s)

Natural Language Processing Techniques

Abstract

Authorship attribution is the process of determining the author of a text in question by capturing an author’s writing style based on selected stylistic features. In this paper, we propose a new methodology for authorship attribution based on a profile of indices related to the generalized coupon collector problem, called coupon-collector-type indices. The coupon collector problem and its generalizations are of long-standing and recurring interest. Coupons are drawn one at a time from a population containing n distinct types of coupons. The process continues until a complete set of n distinct coupons is obtained, and the total number of draws, X, is recorded. We base our methodology on function words. We establish a testing procedure by constructing a confidence band for the coupon-collector-type indices using an empirical bootstrap technique. We validate our proposed methodology using several writing samples whose authorship is known. We then apply this methodology to explore the question of who wrote the fifteenth Oz book, whose authorship is disputed between Lyman Frank Baum (1856–1919) and his successor on the Oz series, Ruth Plumly Thompson (1891–1976).
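The drawing process described in the abstract can be illustrated with a short simulation. The sketch below is not the authors' implementation and does not compute their coupon-collector-type indices, whose exact definition is not given here. It only simulates the classical process: draw one coupon at a time from n types until all n have been seen, and record the number of draws X. The function names `draws_to_collect` and `expected_draws_uniform`, and the optional `weights` parameter (a stand-in for unequal type frequencies, such as function-word frequencies), are assumptions made for illustration.

```python
import random
from fractions import Fraction


def draws_to_collect(n, rng, weights=None):
    """Draw coupons one at a time until all n types are seen; return X.

    `weights` (hypothetical parameter) gives relative frequencies of the
    n types and must be strictly positive; None means equally likely types.
    """
    types = list(range(n))
    seen = set()
    draws = 0
    while len(seen) < n:
        coupon = rng.choices(types, weights=weights, k=1)[0]
        seen.add(coupon)
        draws += 1
    return draws


def expected_draws_uniform(n):
    """Classical result for equally likely types: E[X] = n * (1 + 1/2 + ... + 1/n)."""
    return float(n * sum(Fraction(1, k) for k in range(1, n + 1)))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    n = 10
    samples = [draws_to_collect(n, rng) for _ in range(2000)]
    print("simulated mean of X:", sum(samples) / len(samples))
    print("theoretical E[X]:  ", expected_draws_uniform(n))
```

With equally likely types the simulated mean should fall close to the theoretical value. Passing unequal `weights` typically increases X, because rare types take longer to appear.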
