On Leveraging Coding Habits for Effective Binary Authorship Attribution
2018; Springer Science+Business Media; Language: English
DOI: 10.1007/978-3-319-99073-6_2
ISSN: 1611-3349
Authors: Saed Alrabaee, Paria Shirani, Lingyu Wang, Mourad Debbabi, Aiman Hanna
Topic(s): Topic Modeling
Abstract: We propose BinAuthor, the first compiler-agnostic method for identifying the authors of program binaries. After filtering out unrelated (compiler and library) functions to isolate user-related functions, BinAuthor converts the user-related functions into a canonical form to eliminate compiler and compilation effects. It then leverages a set of features based on collections of choices that authors make while coding; these features capture an author's coding habits. Our evaluation demonstrated that BinAuthor outperforms existing methods in several respects. First, when tested on large datasets extracted from selected open-source C/C++ projects on GitHub, Google Code Jam events, and Planet Source Code contests, it successfully attributed a larger number of authors with significantly higher accuracy: around 90% when the number of authors is 1000. Second, when the code was subjected to refactoring techniques, code transformation, or processing with different compilers or compilation settings, accuracy did not drop significantly, indicating that BinAuthor is more robust than previous methods.
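The abstract outlines a three-stage pipeline: filter out compiler and library functions, canonicalize what remains, then compare coding-habit features across authors. The Python sketch below is only a toy illustration of that general shape under stated assumptions, not BinAuthor's implementation. The name-based filter list (`LIBRARY_OR_COMPILER_FUNCS`), the register/immediate normalization in `canonicalize`, the instruction-count features, and the cosine-similarity `attribute` step are all hypothetical stand-ins chosen for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in: a real system would detect compiler/library code
# far more robustly than by matching function names against a fixed set.
LIBRARY_OR_COMPILER_FUNCS = {"_start", "__libc_csu_init", "printf", "memcpy"}


def user_functions(functions):
    """Stage 1 (illustrative): keep only functions assumed to be user-written."""
    return {name: body for name, body in functions.items()
            if name not in LIBRARY_OR_COMPILER_FUNCS}


def canonicalize(instruction):
    """Stage 2 (illustrative): replace x86 register names and constants with placeholders."""
    instruction = re.sub(r"\b[re]?(ax|bx|cx|dx|si|di|sp|bp)\b", "REG", instruction)
    instruction = re.sub(r"\b(0x[0-9a-fA-F]+|\d+)\b", "IMM", instruction)
    return instruction


def features(functions):
    """Stage 3 (illustrative): count canonical instructions as a crude habit profile."""
    counts = Counter()
    for body in user_functions(functions).values():
        counts.update(canonicalize(ins) for ins in body)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def attribute(unknown, profiles):
    """Return the author whose profile is most similar to the unknown binary's features."""
    target = features(unknown)
    return max(profiles, key=lambda author: cosine(target, profiles[author]))
```

For example, with two toy author profiles built from hand-written disassembly, `attribute({"f": ["push rbp", "mov rbp, rsp", "xor eax, eax"]}, {"alice": alice, "bob": bob})` picks the author whose canonicalized instruction mix matches, and the `printf` body in Alice's sample is discarded by the filter. Canonicalizing before counting is what lets two builds that differ only in register allocation or constant values map to the same features, which is the intuition behind the compiler-agnostic claim.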