Automated Control Logic Forensics In Industrial Control Systems

Upon reading the title, the first few questions that might come to a reader’s mind are: what are Industrial Control Systems? Why are they important? How can an attacker attack them? What is control logic, engineering software, etc.? How does control logic forensics work? 

In this article, we will go through these questions one by one to understand industrial control systems, the attacks on them, and the challenges faced during a forensic investigation.

What are Industrial Control Systems (ICS) and why are they important?

Industrial control systems play a vital role in modern infrastructure. They control and monitor a range of cyber-physical systems, from things you see every day, like traffic signals and elevators, to highly critical infrastructure like power plants, oil and gas pipelines, electric grid stations, and other industrial plants.

Industrial control systems consist of two parts:

  • The control center, which contains engineering workstations to program the control system, a human-machine interface (HMI) to observe and manage the current state of the cyber-physical system, historians to record data and log the state of the ICS, and control servers to communicate with the field sites.
  • The field sites, where the actual cyber-physical systems are located, such as a gas pipeline, grid station, or nuclear plant. These cyber-physical systems are controlled by programmable logic controllers (PLCs) according to user-defined programs, also known as control logic.

Depending on the nature of the physical process, the control engineer can write a control logic in one of the five languages (Ladder logic, Function block diagram, Sequential function chart, Structured text, and Instruction list) supported by the software provided by the PLC vendor (engineering software) and then download or write it to the PLC via Ethernet.

First, the engineering software uses a built-in compiler to translate the high-level control logic to machine-readable binary. Then, the software encapsulates this binary in different ICS protocols, which are sometimes proprietary or have a proprietary layer. Finally, the binary is sent to the PLC via a series of request-response messages.

This process is called “downloading” control logic. Similarly, the control engineer can also read the project running on the PLC in the engineering software by “uploading” it to the engineering software. 

The security of industrial control systems is very critical and any cyber-attack on them can have devastating effects. For example, an attacker can disrupt the operations of a critical infrastructure like an electric grid station, leaving people without electricity, or can target water stations and pipelines to cut the supply of the water to a region.

After getting a basic understanding of industrial control systems and their importance, we will next discuss how cyber-attacks are performed on these systems. 

Figure 1: An overview of industrial control systems

How can an attacker attack Industrial Control Systems?

In ICS, PLCs are generally an attacker’s main target. The ideal goal is to disrupt the PLC’s functions to sabotage the physical process it controls. For this purpose, an attacker can infect the engineering workstation and then use it to download malicious control logic to the PLC to manipulate the behavior of a physical process. 

These types of attacks are called control logic injection attacks. Stuxnet is one example of such an attack. Stuxnet compromised the Siemens SIMATIC STEP7 engineering software and infected the control logic of Siemens S7-300 PLCs to periodically modify the motor speed of centrifuges from 1410 Hz to 2 Hz to 1064 Hz.

Figure 2: An attacker can download a malicious control logic to PLC to disrupt the physical system.

How does control logic forensics work? What tools are available? What are the challenges?

Although attackers can hide their activity on the engineering workstation, the network traffic of the control logic transfer (download), if captured, contains the malicious control logic. Given a forensic tool that can extract the binary control logic from the network traffic and then transform it into a human-readable high-level language, a forensic investigator can recover the malicious control logic for investigation.

Developing such a tool is not a trivial task, as there are many challenges:

  • First, binary control logic does not have a standard open format; each vendor uses its own proprietary format.
  • Second, engineering software from different vendors supports only some of the five languages mentioned above, so the binary control logic must be converted to a language supported by the PLC’s engineering software.
  • Finally, engineering software from different vendors uses proprietary ICS protocols, and because their specifications are not publicly available, it is very difficult to extract the control logic from a network dump.

Some existing tools, like Laddis and Similo, can be used to extract the control logic from the network traffic, but they both have some limitations.

Laddis is a binary control logic decompiler for Allen-Bradley’s RSLogix engineering software and MicroLogix 1400 PLC. It uses complete knowledge of the proprietary PCCC protocol to extract the control logic from the network traffic, and then uses a low-level understanding of binary control-logic semantics for decompilation.

Unfortunately, Laddis requires tedious and time-consuming manual reverse engineering efforts for exploring the ICS proprietary network protocols and semantics of binary control-logic.

Similo, on the other hand, addresses some of the Laddis system’s shortcomings. Similo is designed to investigate control-logic theft attacks, where the attacker reads the control logic from a PLC over the network. However, it does not support the forensic investigation of control logic injection attacks where the attacker transfers a malicious control logic from the engineering software to a target PLC.

Given the limitations of existing tools, we developed Reditus, a novel control-logic forensics framework for control logic injection attacks. It extracts and decompiles control logic from a network traffic dump automatically, without any manual reverse engineering or knowledge of the ICS protocols or the underlying binary control-logic format, and it can extract the control logic from both download and upload traffic.

Figure 3: Reditus can extract the malicious binary control logic from the network traffic and then convert it to human-readable form

How does Reditus work?

Reditus is based on the observation that engineering software can read control logic from a PLC remotely using the upload function, and that it has a built-in decompiler that transforms the uploaded control logic back into its source code.

Reditus uses a “virtual” PLC to recover the control logic by integrating the decompiler in the engineering software with a (previously captured) network traffic dump through the upload functionality. The virtual PLC takes the network dump under investigation and connects with the engineering software to upload the control logic.

There are many challenges in reusing an old network dump, such as session-related IDs, transaction IDs, and differences in the structure of upload and download messages. To resend the captured traffic to the engineering software, Reditus has to perform two important tasks:

  • First, identify all the session-dependent fields present in the messages and update them according to new sessions with the engineering software.
  • Second, learn and develop a template for the upload response so it can transform the write messages from a control logic download (during a control logic injection attack) into read messages, and send them back to the engineering software.

To perform these tasks, Reditus undergoes a two-part learning phase. In the first part it learns the session-dependent fields, and in the second part it learns the upload message template. The following figure shows the two learning phases and the testing phase of Reditus:

Figure 4: Overview of learning and testing phase of Reditus

Learning the session-dependent fields

Reditus learns the session-dependent fields using differential analysis, i.e., comparing two entities to identify the differences between them. To find the session-dependent fields (the values in message headers that vary across sessions), Reditus takes multiple sets of two benign network dumps from different sessions that have the same control logic and transfer direction (both download or both upload).

It identifies and pairs the same messages in two different sessions based on the length of the message and message string similarity, divides the messages in different groups based on the length of message, and then finally performs the differential analysis to identify the location of session dependent fields in each message pair and file.

The results are aggregated across multiple messages and control logic files, and the final set of session-dependent field locations is used for further analysis.
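The pairing and differential analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Reditus implementation: messages are assumed to be already extracted from the capture as `bytes`, and the function names, the `min_similarity` threshold, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def pair_messages(session_a, session_b, min_similarity=0.6):
    """Pair each message in session_a with the most similar unused
    message of the same length in session_b (threshold is hypothetical)."""
    pairs, used = [], set()
    for msg_a in session_a:
        best_idx, best_score = None, min_similarity
        for idx, msg_b in enumerate(session_b):
            if idx in used or len(msg_a) != len(msg_b):
                continue
            score = SequenceMatcher(None, msg_a, msg_b, autojunk=False).ratio()
            if score > best_score:
                best_idx, best_score = idx, score
        if best_idx is not None:
            used.add(best_idx)
            pairs.append((msg_a, session_b[best_idx]))
    return pairs


def differing_offsets(msg_a, msg_b):
    """Byte offsets at which two equal-length messages differ."""
    return {i for i, (x, y) in enumerate(zip(msg_a, msg_b)) if x != y}


def learn_session_fields(session_pairs, min_support=0.5):
    """Group message pairs by length and keep the offsets that differ in at
    least min_support of the pairs in each group."""
    votes, totals = {}, Counter()
    for session_a, session_b in session_pairs:
        for msg_a, msg_b in pair_messages(session_a, session_b):
            length = len(msg_a)
            totals[length] += 1
            votes.setdefault(length, Counter()).update(
                differing_offsets(msg_a, msg_b))
    return {
        length: sorted(off for off, n in counter.items()
                       if n / totals[length] >= min_support)
        for length, counter in votes.items()
    }


# Toy sessions: same control logic, only the byte at offset 1 changes.
session_a = [b"\x00\x01\x5a\x10\x20\x30", b"\x00\x02\x5a\x11\x21\x31\x41\x51"]
session_b = [b"\x00\x07\x5a\x10\x20\x30", b"\x00\x08\x5a\x11\x21\x31\x41\x51"]
print(learn_session_fields([(session_a, session_b)]))  # {6: [1], 8: [1]}
```

Because both dumps carry the same control logic in the same direction, any byte that differs between paired messages is a candidate session-dependent field; aggregating over several dump pairs filters out coincidental matches.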

Learning the session upload template

By examining the upload response messages from a real PLC to the engineering software, we observed that an upload response message can be divided into four types of fields:

  1. Session-dependent fields, which vary across sessions (e.g., the transaction ID).
  2. Static fields, which remain constant across all upload response messages, such as function codes.
  3. Dynamic fields, which depend on the content of the message and vary between messages.
  4. The control logic field, which contains chunks of the control logic binary being transferred.

Since Reditus has already learned the session-dependent fields in the first part, in the second learning phase it takes two benign network dumps that contain the same control logic but in different directions, i.e., one network dump of a control logic upload and one of a download. It pairs the similar messages as in the previous phase and then performs a series of operations to learn about the fields listed above.

For static fields, Reditus compares all the upload response messages and identifies the locations where the values do not change. For dynamic fields, such as the message length, Reditus runs a sliding window over an upload response message and compares the value inside the window with the length of the message outside the window.
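A minimal sketch of these two checks follows. It rests on assumptions that the article does not state: the length field is taken to be 1 or 2 bytes wide and to count the bytes that follow it, and the function names are hypothetical. A real protocol may encode length differently.

```python
def static_offsets(messages):
    """Offsets (within the shortest message) whose byte value is identical
    in every upload response message."""
    if not messages:
        return []
    shortest = min(len(m) for m in messages)
    return [i for i in range(shortest) if len({m[i] for m in messages}) == 1]


def find_length_fields(message, widths=(1, 2)):
    """Slide a window over the message and report windows whose integer
    value equals the number of bytes after the window (an assumption)."""
    hits = []
    for width in widths:
        orders = ("big",) if width == 1 else ("big", "little")
        for start in range(len(message) - width + 1):
            remaining = len(message) - (start + width)
            if remaining == 0:
                continue  # a zero at the very end would match trivially
            window = message[start:start + width]
            for order in orders:
                if int.from_bytes(window, order) == remaining:
                    hits.append((start, width, order))
    return hits


def common_length_fields(messages):
    """Keep only the candidates that hold for every message."""
    candidate_sets = [set(find_length_fields(m)) for m in messages]
    return sorted(set.intersection(*candidate_sets)) if candidate_sets else []


# Toy responses: byte 0 is a constant code, byte 1 is the payload length.
m1 = bytes([0x5A, 0x03, 0xAA, 0xBB, 0xCC])
m2 = bytes([0x5A, 0x05, 0x11, 0x22, 0x33, 0x44, 0x55])
print(static_offsets([m1, m2]))        # [0]
print(common_length_fields([m1, m2]))  # [(1, 1, 'big')]
```

Intersecting the candidates across messages of different sizes matters: a single message can contain a byte that equals the remaining length by coincidence.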

Finally, for the control logic field, Reditus takes the longest common subsequence of the upload responses and download requests, which shows the location of the control logic in both upload and download messages. After learning all the required fields and templates, Reditus goes into the testing phase, where it starts communicating with the engineering software.
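To illustrate how the control logic field can be located, the sketch below compares one download request with one upload response that carry the same chunk. As a simplifying assumption, it looks for the longest contiguous run of common bytes using Python's `difflib.SequenceMatcher.find_longest_match`, rather than a general subsequence; the function name, the `min_size` threshold, and the toy message layouts are hypothetical.

```python
from difflib import SequenceMatcher


def locate_control_logic(download_request, upload_response, min_size=4):
    """Return where the longest shared run of bytes starts in each message,
    or None if the shared run is shorter than min_size."""
    matcher = SequenceMatcher(None, download_request, upload_response,
                              autojunk=False)
    match = matcher.find_longest_match(0, len(download_request),
                                       0, len(upload_response))
    if match.size < min_size:
        return None
    return {
        "download_offset": match.a,  # start of the chunk in the write request
        "upload_offset": match.b,    # start of the chunk in the read response
        "size": match.size,
    }


# Toy messages: different headers, same 8-byte control logic chunk.
logic = bytes.fromhex("7c1c02fd0a9f3301")
download_request = b"\x00\x01\x29\x00\x10" + logic
upload_response = b"\x00\x09\xfe\x08" + logic
print(locate_control_logic(download_request, upload_response))
# {'download_offset': 5, 'upload_offset': 4, 'size': 8}
```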

In the testing phase, Reditus runs a virtual PLC and takes the forensic artifact or target PCAP file and generates a database of request and response messages present in the PCAP. Then it starts a PLC server on the same port as the real PLC and waits for a connection from the engineering software.

Once the PLC server receives a request message from the engineering software, it forwards the request to the Response Generator, which generates the response message using the messages in the database, the upload template, and the session-dependent fields. Finally, this response message is sent to the engineering software.

In this way, Reditus can transform the download messages to upload messages and extract the control logic downloaded on the PLC from the network dump.
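The response generation step can be pictured with the following sketch. It is not the actual Response Generator: the `UploadTemplate` class, its fields, and the message layout (a 2-byte session-dependent ID at the same position in request and response, a static code byte, and a 1-byte length field counting the bytes that follow it) are hypothetical and chosen only to show how learned fields could be combined.

```python
from dataclasses import dataclass


@dataclass
class UploadTemplate:
    """Hypothetical learned template for an upload response header."""
    static_header: bytes   # static bytes, with placeholders for other fields
    session_field: slice   # location of the session-dependent value
    length_offset: int     # location of a 1-byte length field


def build_upload_response(request, logic_chunk, template):
    """Turn a control logic chunk recovered from a download (write) request
    into an upload (read) response for the current session."""
    header = bytearray(template.static_header)
    # Copy the session-dependent value from the live request.
    header[template.session_field] = request[template.session_field]
    # Recompute the dynamic length field: bytes that follow the field.
    following = len(header) - template.length_offset - 1 + len(logic_chunk)
    if following > 0xFF:
        raise ValueError("chunk too large for a 1-byte length field")
    header[template.length_offset] = following
    return bytes(header) + logic_chunk


template = UploadTemplate(static_header=b"\x00\x00\xfe\x00",
                          session_field=slice(0, 2),
                          length_offset=3)
request = b"\x00\x2a\x28\x00\x00"     # live read request, session ID 0x002a
chunk = bytes.fromhex("7c1c02fd")      # chunk taken from the captured download
print(build_upload_response(request, chunk, template).hex())
# 002afe047c1c02fd
```

In a full virtual PLC, a function like this would sit behind a TCP server listening on the PLC's port and would be called once for each read request from the engineering software.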

How good is Reditus?

Apart from the extensibility of Reditus to different engineering software and PLCs, the most important measure is the accuracy of the control logic uploaded by the framework. To check this, we used a Schneider Electric Modicon M221 PLC and SoMachine Basic v1.6, and uploaded 40 different control logic programs of various sizes and complexities using both Reditus and a real PLC. We then manually compared each pair to check whether they contain the same control logic. Our results show that Reditus uploaded every one of these control logic programs from a network dump with 100% transfer accuracy.

Stuxnet is not an isolated case of attacks on critical infrastructures. In the past few years, there have been many cases: a German steel mill (2014), Ukrainian power grid attacks (2015, 2016) and the Kemuri water company (2016), in which attackers caused physical damage to the infrastructure and put the lives of many operators and consumers at risk.

These attacks highlight the increasing trend of cyber-attacks on critical infrastructure. Along with the increase in number, attackers are also diversifying the threat types, e.g., malware, ICS network intrusion, remote code execution, and code injection.

In these circumstances, there is a dire need for new ICS forensics tools that can help the forensics community investigate these cyber-attacks. One of the main hurdles in developing such tools is the heterogeneity (variety) of the ICS environment, where different PLC vendors have different protocols, decompilers, engineering software, firmware, etc.

We hope that through this research, the ICS community will focus more on automated and generic tools like Reditus that can support a wide range of PLCs.

Syed Ali Qasim is a PhD candidate and research assistant at Virginia Commonwealth University, where he studies digital forensics, industrial control systems, and protocol reverse engineering. Winner of the Best Student Paper Award at DFRWS-US 2020, “Control Logic Forensics Framework using Built-in Decompiler of Engineering Software in Industrial Control Systems” builds on Qasim’s previous research on an automated scalable framework for ICS control logic forensics. 

Dr. Irfan Ahmed is an Assistant Professor of Computer Science at Virginia Commonwealth University. He directs the Security and Forensics Engineering (SAFE) Lab, where Qasim is a Ph.D. student.
