A Method of Source Code Authorship Attribution Based on Graph Neural Network
2021; Springer Science+Business Media; Language: English
DOI: 10.1007/978-981-16-6372-7_70
ISSN: 1876-1119
Authors: Dixiao Guo, Anmin Zhou, Liang Liu, Shan Liao, Lei Zhang
Topic(s): Advanced Malware Detection Techniques
Abstract: Source code authorship attribution helps resolve software infringement and plagiarism disputes, and in cybersecurity it helps identify the authors of malware. However, traditional de-anonymization methods mainly extract semantic and lexical features and ignore structural features of code such as control flow and data flow. The feature vectors they produce are also sparse, which makes them prone to overfitting when de-anonymizing large numbers of programmers. In this paper, we propose a novel code de-anonymization model based on the abstract syntax tree (AST). By extracting both AST and structural features, the model builds a feature graph representation of a Python file and then uses a graph neural network to perform code de-anonymization. Experiments on Google Code Jam Python datasets show high accuracy: 98.06% with 117 programmers and 95.60% with 1000 programmers.
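To make the idea of a "feature graph representation of a Python file" concrete, here is a minimal sketch using only Python's standard `ast` module. It is not the authors' pipeline: the abstract does not say which structural edges or node features the model uses, so the function name `build_ast_graph`, the edge kinds `"child"` and `"next_stmt"`, and the choice to link consecutive statements as a rough stand-in for control flow are all assumptions made for illustration.

```python
import ast


def build_ast_graph(source):
    """Return (labels, edges) describing the AST of a Python source string.

    labels[i] is the AST node type name of node i.
    edges is a list of (src_index, dst_index, kind) triples.
    Hypothetical helper for illustration; not taken from the paper.
    """
    tree = ast.parse(source)
    labels = []
    edges = []

    def visit(node):
        # Give every visited occurrence its own index.
        my_index = len(labels)
        labels.append(type(node).__name__)

        child_index = {}
        for child in ast.iter_child_nodes(node):
            j = visit(child)
            child_index[id(child)] = j
            # Syntactic edge: parent -> child in the AST.
            edges.append((my_index, j, "child"))

        # Assumed structural edge: link consecutive statements in a block.
        # This is only a crude approximation of control flow.
        body = getattr(node, "body", None)
        if isinstance(body, list):
            for first, second in zip(body, body[1:]):
                edges.append((child_index[id(first)], child_index[id(second)], "next_stmt"))

        return my_index

    visit(tree)
    return labels, edges


if __name__ == "__main__":
    labels, edges = build_ast_graph("x = 1\ny = x + 1\n")
    print(len(labels), "nodes,", len(edges), "edges")
```

A graph neural network library would typically consume such output as an integer-encoded node feature matrix plus an edge index list, with one graph per source file and the programmer identity as the graph-level label.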