Book chapter, Open access, Peer reviewed

HierCost: Improving Large Scale Hierarchical Classification with Cost Sensitive Learning

2015; Springer Science+Business Media; Language: English

10.1007/978-3-319-23528-8_42

ISSN

1611-3349

Authors

Anveshi Charuvaka, Huzefa Rangwala

Topic(s)

Spam and Phishing Detection

Abstract

Hierarchical Classification (HC) is an important problem with a wide range of applications in domains such as music genre classification, protein function classification and document classification. Although several innovative classification methods have been proposed to address HC, most of them are not scalable to web-scale problems. While simple methods such as top-down "pachinko" style classification and flat classification scale well, they either have poor classification performance or do not effectively use the hierarchical information. Current methods that incorporate hierarchical information in a principled manner are often computationally expensive and unable to scale to large datasets. In the current work, we adopt a cost-sensitive classification approach to the hierarchical classification problem by defining misclassification cost based on the hierarchy. This approach effectively decouples the models for various classes, allowing us to efficiently train effective models for large hierarchies in a distributed fashion.
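The idea of deriving misclassification costs from the hierarchy can be illustrated with a minimal sketch. The abstract does not specify the cost function, so everything here is an assumption made for illustration: the toy hierarchy `PARENT`, the use of tree distance as the cost of a negative example, the choice to give positive examples the largest negative cost, and the helper names `path_to_root`, `tree_distance` and `example_costs`.

```python
# Hypothetical toy hierarchy (not from the source): child -> parent.
PARENT = {
    "root": None,
    "music": "root",
    "sports": "root",
    "rock": "music",
    "jazz": "music",
    "tennis": "sports",
}


def path_to_root(node, parent=PARENT):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def tree_distance(a, b, parent=PARENT):
    """Number of edges on the path between nodes `a` and `b`."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for j, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:  # first shared ancestor
            return steps_from_a[n] + j
    raise ValueError("nodes are not in the same tree")


def example_costs(labels, target, parent=PARENT):
    """Per-example costs for the one-vs-rest problem of class `target`.

    Assumed scheme: a negative example costs its tree distance to
    `target`; a positive example costs the largest negative cost.
    """
    negative = [tree_distance(y, target, parent) for y in labels if y != target]
    positive_cost = max(negative) if negative else 1
    return [
        positive_cost if y == target else tree_distance(y, target, parent)
        for y in labels
    ]


if __name__ == "__main__":
    labels = ["rock", "jazz", "tennis"]
    print(example_costs(labels, "rock"))
```

Because the costs for one target class depend only on the labels and the hierarchy, each binary model can be trained independently, which is what makes a distributed setup straightforward. In practice such costs could be supplied as per-example weights to any learner that accepts them, for instance through the `sample_weight` argument of scikit-learn's `LogisticRegression.fit`.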
