Data Mining and Machine Learning for Reverse Engineering
Reverse engineering is one of the critical processes for mitigating the rapidly growing threat from malicious software. It is also a common practice for detecting and substantiating software plagiarism and software patent infringement when the source code is unavailable. However, it is a manually intensive and time-consuming process even for experienced reverse engineers. Working closely with reverse engineers, I studied these challenges and then designed and implemented an award-winning binary analysis platform called Kam1n0. It supports reverse engineers with novel data mining and machine learning models.
Kam1n0 enables large-scale subgraph clone search of assembly code. It greatly reduces the manual effort of reverse engineering because it can identify cloned parts that have been previously analyzed. It also includes specialized techniques that mitigate the variance introduced by different processor families, different compilers, optimization techniques, and binary protection techniques. Additionally, Kam1n0 provides a specialized neural network that can statically and accurately summarize a malware sample’s dynamic behaviors. Extensive experimental results suggest that Kam1n0 is accurate, efficient, and scalable for handling a large volume of data.
Kam1n0 won the Hex-Rays international plug-in contest award. It has been presented at the Smart Cybersecurity Network Canada (SERENE-RISC), SOPHOS, ESET, Above Security, and Google. It is now used at Defence Research and Development Canada (DRDC), Cisco, and the Los Alamos National Laboratory in the USA.
Steven is a Ph.D. candidate at McGill University, where he is affiliated with the Data Mining and Security Lab. His research focuses on developing novel data mining and machine learning techniques driven by the needs and challenges of real-life applications in cybersecurity. Steven has received the Dean’s Graduate Award at McGill University and the FRQNT Doctoral Research Scholarship of Quebec. His studies on binary analysis and authorship analysis have been published in top data mining and security venues, such as ACM SIGKDD and IEEE S&P. See Steven’s research website, http://stevending.net, for more information.