HierCost: Improving Large Scale Hierarchical Classification with Cost Sensitive Learning
2015; Springer Science+Business Media; Language: English
DOI: 10.1007/978-3-319-23528-8_42
ISSN: 1611-3349
Authors: Anveshi Charuvaka, Huzefa Rangwala
Topic(s): Spam and Phishing Detection
Abstract: Hierarchical Classification (HC) is an important problem with a wide range of applications in domains such as music genre classification, protein function classification, and document classification. Although several innovative classification methods have been proposed to address HC, most of them do not scale to web-scale problems. Simple methods such as top-down "pachinko" style classification and flat classification scale well, but they either have poor classification performance or do not effectively use the hierarchical information. Current methods that incorporate hierarchical information in a principled manner are often computationally expensive and unable to scale to large datasets. In the current work, we adopt a cost-sensitive classification approach to the hierarchical classification problem by defining misclassification cost based on the hierarchy. This approach effectively decouples the models for various classes, allowing us to efficiently train effective models for large hierarchies in a distributed fashion.
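As a rough illustration of what a hierarchy-based misclassification cost could look like, the sketch below uses the tree distance between an example's true class and the class being trained. The toy hierarchy, the function names, and the rule for weighting positive examples are all assumptions made for illustration; the abstract does not specify the exact cost definition.

```python
# Hypothetical toy hierarchy (child -> parent); not taken from the paper.
PARENT = {
    "music": None,
    "rock": "music",
    "jazz": "music",
    "hard_rock": "rock",
    "soft_rock": "rock",
    "bebop": "jazz",
}


def ancestors(node):
    """Return the path from node up to the root, with node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_distance(a, b):
    """Number of edges on the path between a and b in the hierarchy."""
    steps_from_b = {n: i for i, n in enumerate(ancestors(b))}
    for steps_from_a, n in enumerate(ancestors(a)):
        if n in steps_from_b:
            # n is the lowest common ancestor of a and b.
            return steps_from_a + steps_from_b[n]
    raise ValueError("nodes are not in the same tree")


def example_costs(target_class, labels):
    """Per-example costs for the binary problem 'target_class vs. rest'.

    Negatives are weighted by their tree distance to target_class.
    Assumption: positives receive the largest negative cost, so that
    they are not swamped by the many negative examples.
    """
    negative_costs = [
        tree_distance(y, target_class) for y in labels if y != target_class
    ]
    positive_cost = max(negative_costs, default=1)
    return [
        positive_cost if y == target_class else tree_distance(y, target_class)
        for y in labels
    ]
```

Because `example_costs` depends only on the target class and the fixed hierarchy, the binary problem for each class can be set up and trained independently of the others, which is the kind of decoupling that makes distributed training straightforward.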