cfaed Publications
A Comparative Study on the Accuracy and the Speed of Static and Dynamic Program Classifiers
Reference
Anderson Faustino da Silva, Jeronimo Castrillon, Fernando Magno Quintão Pereira, "A Comparative Study on the Accuracy and the Speed of Static and Dynamic Program Classifiers", Proceedings of the 34th ACM SIGPLAN International Conference on Compiler Construction (CC 2025), Association for Computing Machinery, pp. 13–24, New York, NY, USA, Mar 2025. [doi]
Abstract
Classifying programs based on their tasks is essential in fields such as plagiarism detection, malware analysis, and software auditing. Traditionally, two classification approaches exist: static classifiers analyze program syntax, while dynamic classifiers observe program execution. Although dynamic analysis is regarded as more precise, it is often considered impractical due to its high overhead, leading the research community to largely dismiss it. In this paper, we revisit this perception by comparing static and dynamic analyses that use the same classification representation: opcode histograms. We show that dynamic histograms, generated from the instructions actually executed, are only marginally (4-5%) more accurate than static histograms in non-adversarial settings. However, if an adversary is allowed to obfuscate programs, the accuracy of the dynamic classifier is twice as high as that of the static one, because dynamic analysis does not observe dead code. Obtaining dynamic histograms with a state-of-the-art Valgrind-based tool incurs an 85x slowdown; however, once we account for the time to produce the representations for static analysis of executables, the overall slowdown drops to 4x: a result significantly lower than previously reported in the literature.
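For context, the opcode histogram used as the classification representation is simply the frequency of each instruction mnemonic, counted either over the binary's disassembly (static) or over the instructions actually executed (dynamic, e.g., with a Valgrind-based tool). The Python sketch below is an illustration only, not the paper's artifact: it computes a static histogram by parsing objdump -d output, and the tool choice, the parsing heuristic, and the normalization are assumptions.

# Illustrative sketch (not the paper's pipeline): build a static opcode
# histogram for a compiled binary by counting mnemonics in its disassembly.
import subprocess
from collections import Counter

def static_opcode_histogram(binary_path):
    """Return relative frequencies of opcode mnemonics in `binary_path`."""
    disasm = subprocess.run(
        ["objdump", "-d", binary_path],       # assumes GNU objdump is available
        capture_output=True, text=True, check=True
    ).stdout
    counts = Counter()
    for line in disasm.splitlines():
        # GNU objdump instruction lines look like: "  401000:<TAB>55<TAB>push   %rbp"
        parts = line.split("\t")
        if len(parts) >= 3 and parts[2].strip():
            counts[parts[2].split()[0]] += 1  # first token is the mnemonic
    total = sum(counts.values()) or 1
    # Normalize so programs of different sizes yield comparable feature vectors.
    return {op: n / total for op, n in counts.items()}

# Example: static_opcode_histogram("/bin/ls") returns a dict mapping each
# mnemonic to its relative frequency, ready to feed to a classifier.

A dynamic histogram would be gathered analogously, but by counting instructions as they execute under a dynamic binary instrumentation tool such as Valgrind; this is what makes it robust to dead-code obfuscation, at the cost of the slowdown reported above.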
Bibtex
@inproceedings{2503_daSilva_CC,
author = {Anderson Faustino da Silva and Jeronimo Castrillon and Fernando Magno Quint\~{a}o Pereira},
booktitle = {Proceedings of the 34th ACM SIGPLAN International Conference on Compiler Construction (CC 2025)},
title = {A Comparative Study on the Accuracy and the Speed of Static and Dynamic Program Classifiers},
doi = {10.1145/3708493.3712680},
isbn = {9798400714078},
location = {Las Vegas, NV, USA},
pages = {13--24},
publisher = {Association for Computing Machinery},
series = {CC 2025},
url = {https://doi.org/10.1145/3708493.3712680},
abstract = {Classifying programs based on their tasks is essential in fields such as plagiarism detection, malware analysis, and software auditing. Traditionally, two classification approaches exist: static classifiers analyze program syntax, while dynamic classifiers observe program execution. Although dynamic analysis is regarded as more precise, it is often considered impractical due to its high overhead, leading the research community to largely dismiss it. In this paper, we revisit this perception by comparing static and dynamic analyses that use the same classification representation: opcode histograms. We show that dynamic histograms, generated from the instructions actually executed, are only marginally (4-5\%) more accurate than static histograms in non-adversarial settings. However, if an adversary is allowed to obfuscate programs, the accuracy of the dynamic classifier is twice as high as that of the static one, because dynamic analysis does not observe dead code. Obtaining dynamic histograms with a state-of-the-art Valgrind-based tool incurs an 85x slowdown; however, once we account for the time to produce the representations for static analysis of executables, the overall slowdown drops to 4x: a result significantly lower than previously reported in the literature.},
address = {New York, NY, USA},
month = mar,
numpages = {11},
year = {2025},
}
Downloads
2503_daSilva_CC [PDF]
Permalink
https://cfaed.tu-dresden.de/publications?pubId=3805