Toward Exact And Inexact Approximate Matching Of Executable Binaries

Lorenz Liebler discusses his research at DFRWS EU 2019.

The application of approximate matching (a.k.a. fuzzy hashing or similarity hashing) is often considered in the field of malware or binary analysis. Recent research showed major weaknesses of predominant fuzzy hashing techniques in the case of measuring the similarity of executables (Pagani et al., 2018).

Summarized, well known Context-Triggered Piecewise-Hashing approaches are not very reliant for the task of binary comparisons, as even benign changes heavily impact the underlying byte representation of an original binary. Modifications could be caused by benign or malicious source code changes, different compilers, and changed compiler settings.

Approaches based on the extraction of statistically improbable features (Roussev, 2010) or n-gram histograms (Oliver et al., 2013) showed a better detection performance in case of inexactly matching binaries with varying build settings or source code modifications.

Watch the presentation