Artigo Acesso aberto Revisado por pares

Accurately Estimating User Cardinalities and Detecting Super Spreaders Over Time

2020; IEEE Computer Society; Volume: 34; Issue: 1 Linguagem: Inglês

10.1109/tkde.2020.2975625

ISSN

2326-3865

Autores

Peng Jia, Pinghui Wang, Yuchao Zhang, Xiangliang Zhang, Jing Tao, Jianwei Ding, Xiaohong Guan, Don Towsley,

Tópico(s)

Spam and Phishing Detection

Resumo

Online monitoring user cardinalities in graph streams is fundamental for many applications such as anomaly detection. These graph streams may contain edge duplicates and have a large number of user-item pairs, which makes it infeasible to exactly compute user cardinalities due to limited computational and memory resources. Existing methods are designed to approximately estimate user cardinalities, but their accuracy highly depends on complex parameters and they cannot provide anytime-available estimation. To address these problems, we develop novel bit/register sharing algorithms, which use a bit/register array to build a compact sketch of all users' connected items. Our algorithms exploit the dynamic properties of the bit/register arrays (e.g., the fraction of zero bits in the bit array) to significantly improve the estimation accuracy, and have low time complexity $O(1)$ to update the estimations for a new user-item pair. In addition, our algorithms are simple and easy to use, without requirements to tune any parameter. Furthermore, we extend our methods to detect super spreaders with large cardinalities in real-time. We evaluate the performance of our methods on real-world datasets. The experimental results demonstrate that our methods are several times more accurate and faster than state-of-the-art methods using the same amount of memory.

Referência(s)