Book chapter, peer-reviewed

A Method of Source Code Authorship Attribution Based on Graph Neural Network

2021; Springer Science+Business Media; Language: English

10.1007/978-981-16-6372-7_70

ISSN

1876-1119

Authors

Dixiao Guo, Anmin Zhou, Liang Liu, Shan Liao, Lei Zhang

Topic(s)

Advanced Malware Detection Techniques

Abstract

Source code authorship attribution helps resolve software infringement and plagiarism disputes, and it also helps identify the authors of malware in the field of cybersecurity. However, traditional de-anonymization methods mainly extract semantic and lexical features and ignore structural features of code such as control flow and data flow. The feature vectors they generate are also sparse, which makes them prone to overfitting when de-anonymizing a large number of programmers. In this paper, we propose a novel code de-anonymization model based on the abstract syntax tree (AST). By extracting both AST and structural features, the model builds a feature graph representation of a Python file and then uses a graph neural network to perform code de-anonymization. Experimental results show high accuracy: 98.06% with 117 programmers and 95.60% with 1000 programmers on Google Code Jam Python datasets.
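
The abstract does not specify how the feature graph is constructed, so the following is only a minimal sketch of the general idea of turning a Python file's AST into a graph. It assumes that nodes are labeled by AST node type and that edges are plain parent-child links; the function name `ast_to_graph` and the sample snippet are hypothetical, and the control-flow and data-flow edges mentioned in the abstract are not modeled here.

```python
import ast


def ast_to_graph(source: str):
    """Return (node_labels, edges) for the AST of `source`.

    Assumption for illustration only: node_labels[i] is the AST node
    type name of node i, and edges holds (parent_index, child_index)
    pairs. This is not the feature graph defined in the chapter.
    """
    tree = ast.parse(source)
    node_labels = []
    edges = []

    def visit(node, parent_index):
        # Assign the next free index to this node and record its type.
        index = len(node_labels)
        node_labels.append(type(node).__name__)
        if parent_index is not None:
            edges.append((parent_index, index))
        # Recurse into direct children in source order.
        for child in ast.iter_child_nodes(node):
            visit(child, index)

    visit(tree, None)
    return node_labels, edges


# Hypothetical sample input.
sample = "def add(a, b):\n    return a + b\n"
labels, edges = ast_to_graph(sample)
print(labels[:2])
print(len(labels), "nodes,", len(edges), "edges")
```

A node-label list and an edge list of this kind are the usual inputs a graph neural network library expects once the labels are mapped to integer ids or embeddings; that step, and the classifier itself, are left out of this sketch.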
