A Novel Adversarial Example Detection Method For Malicious PDFs Using Multiple Mutated Classifiers

Chao Liu: Good morning, ladies and gentlemen, I would like to quickly introduce myself. My name is Chao Liu and I am a student of the Institute of Information Engineering, Chinese Academy of Sciences. Today, I would like to share with you our work: A Novel Adversarial Example Detection Method for Malicious PDFs Using Multiple Mutated Classifiers.

My talk this morning is divided into four main sections: motivation, background, methodology, and experiments and analysis. First, I'm going to present the motivation for our work. I think you all know that deep learning, and especially DNNs, has emerged as one of the primary techniques in academic communities as well as in industry. It brings great convenience to technology and people's lives, but it also brings safety concerns. These systems have been shown to be vulnerable in adversarial environments to novel evasion attacks. This has [inaudible] interest in research on detecting or defending against adversarial examples.

In computer vision, methods have been proposed to improve the robustness of DNN models. Being able to detect adversarial malicious PDFs among massive numbers of PDF files poses a big challenge for forensic investigators as well.

Next, I will introduce a little background for the research and the knowledge it requires. Let's start with a quick look at the basic concepts of the Portable Document Format, PDF. The structure of a PDF document consists of four parts: header, body, cross-reference table and trailer. As shown in the figure, the body specifies the content of the PDF, including the [indecipherable] blocks, fonts, images, and the metadata regarding the file itself. Inside it is a set of PDF objects that make up the content of the document. An attack is usually carried out by modifying the body and the cross-reference table. Supervised machine learning has been widely deployed for malware detection; concerning PDF files, multiple detectors implementing such technology were developed in the last decade.
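
To make the four-part layout concrete, here is a minimal sketch, not part of the talk, that scans a file's raw bytes for the markers of each part. The file name and the byte-level heuristics are illustrative assumptions (real PDFs may use cross-reference streams instead of a plain xref table).

```python
# Minimal sketch: locate the four structural parts of a PDF by scanning raw bytes.
# "sample.pdf" is only a placeholder path.

def outline_pdf(path: str) -> dict:
    data = open(path, "rb").read()
    return {
        "header": data[:8].decode("latin-1"),         # e.g. "%PDF-1.7"
        "body_objects": data.count(b" obj"),          # rough count of objects in the body
        "has_xref": b"xref" in data,                  # cross-reference table keyword
        "has_trailer": b"trailer" in data,            # trailer dictionary keyword
        "ends_with_eof": data.rstrip().endswith(b"%%EOF"),
    }

print(outline_pdf("sample.pdf"))
```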

The primary goal of machine learning detectors for malicious documents is to decide whether some unseen PDF should be labeled as malicious or benign. In general, their structure is shown in the figure and is composed of three main parts: pre-processing, feature extraction, and the classifier. Adversarial attacks on document detection systems are also called evasion attacks. They take advantage of knowledge of how the machine learning system operates, such as access to the training set or the feature set, to skillfully evade detection. Like image adversarial examples, document adversarial examples can be generated using two major approaches: the content-based approach and the feature space-based approach.
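
As an illustration of this three-part structure, the following is a minimal sketch assuming scikit-learn; the extract_features helper and the keyword counts it uses are hypothetical placeholders, not the feature set used in the paper.

```python
# Minimal sketch of the three-part detector: pre-processing (raw bytes in),
# feature extraction, and a classifier. The features are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def extract_features(pdf_bytes: bytes) -> np.ndarray:
    return np.array([
        pdf_bytes.count(b"/JavaScript"),   # embedded script indicator
        pdf_bytes.count(b"/OpenAction"),   # auto-execute action indicator
        pdf_bytes.count(b" obj"),          # rough object count
        len(pdf_bytes),                    # file size
    ], dtype=float)

# Training (X_train / y_train are labelled PDFs turned into feature vectors):
# clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
# Prediction for an unseen file:
# label = clf.predict([extract_features(open("unknown.pdf", "rb").read())])[0]
```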


The former mainly includes mimicry, reverse mimicry and flawed-document attacks, while the latter is represented by the gradient descent attack and the EvadeML attack. One method to defend against adversarial examples is adversarial training, but it can only be used to reinforce existing systems. There are also ways to deal with adversarial examples in document detection, such as ensemble classifier mutual agreement analysis, robust features and document entropy time series. Unfortunately, they all have several shortcomings.

Then I will share with you our methodology, which responds to the above challenges. We propose a detection method based on one essential attribute of adversarial examples: existing adversarial examples are usually generated for a certain model, and even accounting for transferability, they are likely to fail on another, similar model. Our approach finds models that counteract the effects of transferability. Then we use the prediction inversion of each model to get the detection results. It also depends on the reinforced design of the detection system.

Next, I will introduce the two main methods and the final detection system in turn. The first main method is mutation model generation. We take an approach similar to fuzzing to generate a detection model group. In other words, the training parameters of some normal machine learning models can be adjusted to generate similar mutated models.

It contains three key points. The first is to use six basic models commonly used in malware detection, including SVM, kNN, Naïve Bayes, Logistic Regression, Decision Tree and Random Forest. The second is to explore and test the corresponding mutation operators. The last one is the selection of models. The first two ensure smooth model generation, and the selection of models is strongly related to the next step of the method.
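
A minimal sketch of mutation model generation, assuming scikit-learn estimators: the Random Forest example and the specific hyperparameter ranges are illustrative mutation operators, not the exact operators selected in the paper.

```python
# Minimal sketch: clone the original model and jitter one or more training
# hyperparameters per mutant. The mutation operators shown are illustrative.
import random
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier

def generate_model_group(original, X_train, y_train, n_mutants=100, seed=0):
    rng = random.Random(seed)
    group = []
    for _ in range(n_mutants):
        mutant = clone(original)
        mutant.set_params(                                  # example operators for a random forest
            n_estimators=rng.randint(50, 200),
            max_depth=rng.choice([None, 10, 20, 40]),
            max_features=rng.choice(["sqrt", "log2", None]),
        )
        mutant.fit(X_train, y_train)
        group.append(mutant)
    return group

# original = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
# model_group = generate_model_group(original, X_train, y_train)
```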

After the generation of the detection model group, the next step is to input the testing examples into the models sequentially and record the prediction result of each model. Then we calculate the Prediction Inversion Rate (PIR) and compare it with m·Pb, where Pb is the mean PIR of benign examples and m is an adjustment factor. The principle of detection using PIR is as follows: a benign example input to the original model is predicted as benign (0). The prediction is reversed if a model in the detection model group classifies the input as malicious (1), and the same change occurs when the input is an adversarial example. We can calculate the PIR from the number of models whose predictions have reversed, and we find that adversarial examples are much more sensitive to model changes.

Two points need to be noted. The first is that the PIR calculation module needs to support the calculation of the benign examples' PIR, and the second is to set the adjustment factor of the threshold.
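
The following is a minimal sketch of the PIR computation and the threshold check described above; the value of the adjustment factor and the per-example predict calls are illustrative assumptions.

```python
# Minimal sketch of the Prediction Inversion Rate (PIR) and the detection rule.
# An input is flagged as adversarial when PIR > m * P_b, where P_b is the mean
# PIR of known benign examples and m is the threshold adjustment factor.

def prediction_inversion_rate(x, original_model, model_group):
    base = original_model.predict([x])[0]                     # prediction of the original model
    reversals = sum(mdl.predict([x])[0] != base for mdl in model_group)
    return reversals / len(model_group)

def is_adversarial(x, original_model, model_group, benign_mean_pir, factor=1.5):
    # factor (m) = 1.5 is only an assumed illustrative value
    pir = prediction_inversion_rate(x, original_model, model_group)
    return pir > factor * benign_mean_pir
```

Here benign_mean_pir (Pb) would be estimated offline by running known benign examples through the same model group.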

Based on the above modules, we redesigned the machine learning-based detection system. The whole structure is relatively clear; three parts of the detection model were revised. In the pre-processing stage, example filtering for flawed documents and mimicry attacks is added. Subsequently, in the feature extraction part, we mainly used two types of effective feature sets: content-based features and structure-based features. To ensure that the detection results are reliable when generating the final classifier, we first needed to train an original model and ensure satisfactory performance, then generate a series of models for calculating PIR based on the detection model group generation algorithm, and the PIR calculation module was added after the classifier.
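
To show how these parts fit together, here is a minimal wiring sketch; filter_example() is a hypothetical stand-in for the flawed-document and mimicry filtering, and the other helpers reuse the earlier sketches rather than the authors' implementation.

```python
# Minimal wiring sketch of the redesigned system: pre-processing filter,
# feature extraction, the original classifier, and the PIR module after it.

def detect(pdf_bytes, original_model, model_group, benign_mean_pir, factor=1.5):
    if not filter_example(pdf_bytes):                  # drop flawed-document / mimicry inputs
        return "filtered"
    x = extract_features(pdf_bytes)                    # content- and structure-based features
    pir = prediction_inversion_rate(x, original_model, model_group)
    if pir > factor * benign_mean_pir:                 # PIR module placed after the classifier
        return "adversarial"
    return "malicious" if original_model.predict([x])[0] == 1 else "benign"
```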

Finally, I will analyze the experimental results. The collection of datasets, experimental settings and evaluation indicators can be found in the paper and will not be described here.

You can see that table one lists the indicators of the original model. The detection performance of the model is very high when integrated features are used. Based on this performance, the final generated model group contains 700 models. Table two records the mean PIR values of all examples of each type; it can be seen from the table that the PIR of a benign example is significantly less than the PIR of any adversarial example.

This table records the results of the comparative experiment. Our system can effectively detect the adversarial examples generated by the aforementioned attacks. I think there are three main reasons. First, we used comprehensive features when training the model. Second, we added the state-of-the-art parsing method and the pruning algorithm in the early stage. Most importantly, PIR exploits the essential flaw of adversarial examples. In addition, our system's average detection time is slightly higher than that of other detectors; the main reason is the existence of the additional modules. That's all for my introduction. Thank you for listening. If you have any questions, please don't hesitate to ask.
