The Opportunity In The Crisis: ICS Malware Digital Forensics And Incident Response

by Christa Miller, Forensic Focus

Malware aimed at industrial control systems (ICS) is nothing new. Nearly 10 years have passed since Stuxnet first targeted the supervisory control and data acquisition (SCADA) systems and programmable logic controllers (PLCs) associated with centrifuges in Iran’s nuclear program. Since then, Havex, BlackEnergy 2, and Crash Override / Industroyer have targeted various ICS.

Until very recently, targeted attacks on ICS remained rare. In 2017, Dragos, a provider of industrial security software and services, reported that most malware infections on ICS were accidental.

The following year, Kaspersky Lab likewise reported that most ICS malware infections, including cryptominers, ransomware, remote-access trojans (RATs), spyware, and other threats, were random. Dragos has also reported, however, that targeted ICS intrusions aren’t as rare as first believed.

Attacks on the electrical grid and other ICS have raised safety concerns for hospitals, transportation networks, and other systems. Still, ICS are deliberately designed with failsafes: if one system fails, independently running safety instrumented systems (SIS) are triggered to shut the process down, limiting risk to life.

That changed in 2017, when a new form of malware, Triton — also known as TRISIS or HatMan — was found to attack those very safety features.

Triton’s authors, known as XENOTIME, have a long reach. In April 2019, Motherboard reported that FireEye had been hired to respond to a breach at an undisclosed critical infrastructure facility and that, during the response, it had found traces it could link back to Triton’s authors.

So what’s the rub for digital forensics analysts and incident responders?

  1. Your skillset is in demand — and in flux. Not only could you end up using your abilities reactively, to respond to an incident; you might also be tapped to use them proactively, as part of threat hunting.
  2. Because the effectiveness of ICS security best practices is itself being questioned, any assessments you work on may shift along with the landscape.

Triton: Some Background

Named for its target, the Triconex SIS built by Schneider Electric, the Triton malware was first discovered in August 2017. It affected systems at the Saudi Arabian Oil Company’s (Saudi Aramco’s) Petro Rabigh petrochemical complex.

The attack concerned cybersecurity experts for many reasons:

  • It was the first known instance of a targeted attack on safety control systems that are designed to prevent explosions, hazardous chemical releases, or other threats.
  • Because the attackers would have needed the budget and time to purchase their own controller, the attack pointed to a nation-state.
  • The attack also took advantage of the way the plant conducted its operations, for example by initially targeting systems on a Saturday, when staffing was minimal.

The August attack was actually the second. Incident responders found that an earlier shutdown of a single controller, two months before in June, had been triggered by the same malware.

At that time, the plant called Schneider for help troubleshooting the controller. According to Kelly Jackson Higgins, writing for Dark Reading in January 2019: “[T]he vendor pulled logs and diagnostics from the machine, checked the machine’s mechanics, and, after later studying the data in its own lab, addressed what it thought was a mechanical issue.”

The second attack might have fooled the company too, had it again triggered only a single shutdown. This time, though, the malware had infected six controllers, all of which shut down. That made engineers suspicious.

The next clue that something was wrong, reported the New York Times, was the discovery of the malware itself. Written to resemble legitimate Schneider logic, the malware contained an error that had triggered the shutdown.

Incident responders quickly discovered multiple signs of an ongoing attack. Antivirus software alerted the organization to Mimikatz-related traffic, according to Higgins at Dark Reading, and Remote Desktop Protocol (RDP) sessions were running on the plant’s engineering workstation from within the network.

Additional findings, as reported by Gregory Hale for the Industrial Safety and Security Source blog, included unknown programs running in the infected controllers’ memory, and Python scripts created on the engineering workstation around the time of the August attack. Later on, suspected beacons to the attackers’ control network were found.

Ultimately, wrote Higgins, responders determined that the DMZ firewall between the information technology (IT) and operational technology (OT) networks was poorly configured. That allowed the attackers to compromise the DMZ itself so they could pivot into the control network.

The attackers failed to tamper with or disable the Triconex safety instrumented systems largely because of their own errors. Jason Gutmanis, the primary responder to the Triton incident, was quoted in a January 2019 article as saying he believed the attackers had simply become complacent after nearly two years of persistence. That complacency led them to leave tools on the system, which they later tried to eradicate.

Challenges of Digital Forensics in ICS

What might be called the “industrial internet of things” is a risk, wrote Catalin Cimpanu for Bleeping Computer in 2017, because of the “clear benefits from controlling SCADA equipment over the Internet,” as opposed to isolating the equipment from the internet and internal networks.

That’s supported by Kaspersky’s reports, which stated that the internet was the main source of infection, accounting for more than one-quarter of attacks across 2018. (Removable storage media and email clients were the other main sources.)

A facility’s own networks are only one potential target. Attackers could also pivot into an ICS network through a third party, in much the same way the 2013 Target attackers breached the retailer’s payment system through a heating, ventilation and air conditioning (HVAC) system vendor.

In other words, forensic analysis of ICS compromises can take nothing for granted, but even this approach has its challenges. In a presentation at the 2016 SANS Institute ICS Security Summit, Chris Sistrunk, Senior ICS Security Consultant at Mandiant, described the “DFIRence” for ICS responses involving embedded devices. Specifically, he covered which volatile and non-volatile data to collect and the major differences between a “standard” incident response process and one for ICS.

Tasks like situational assessments, communication to management, and documentation of findings are similar across environments. However, ICS complicates the return to a normal state of operation for a few reasons:

  • Responders must account for physical processes in addition to digital processes.
  • ICS device constraints make it difficult to remediate and regain control of affected devices.
  • ICS devices also communicate over different protocols from those found on IT networks.
  • Analysis can be tricky because there are no ICS-specific DFIR tools. Instead, Sistrunk noted, responders may have to rely on manual collection.

Who’s a first responder in an ICS facility?

Sistrunk’s presentation broke down ICS response and analysis responsibilities this way:

First responders consist of the ICS engineer or technician, the network engineer, and/or the vendor, who examine user and event logs to see what they reveal. Firmware, running configurations (compared against last known good and standard configurations), and communications should all be checked at this stage.
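Sistrunk’s checklist does not name a tool, but comparing what is running against a last known good state is easy to picture. The Python below is a hypothetical illustration, not a procedure from the presentation: it assumes configuration and firmware files have already been exported from the devices into a directory (in practice often by hand or with vendor software), and that a SHA-256 hash of each file was recorded when the device was known to be good. The function names and baseline format are invented for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_to_baseline(collected_dir: Path, baseline: dict[str, str]) -> dict[str, list[str]]:
    """Compare exported firmware/configuration files with last-known-good hashes.

    `baseline` maps each file name to the SHA-256 digest recorded when the
    device was last known to be in a good state (hypothetical format).
    """
    collected = {p.name: sha256_of(p) for p in collected_dir.iterdir() if p.is_file()}
    return {
        # Present in both, but the contents no longer match the baseline.
        "changed": sorted(n for n, h in collected.items() if n in baseline and baseline[n] != h),
        # Expected from the baseline but not found in this collection.
        "missing": sorted(n for n in baseline if n not in collected),
        # Collected but never baselined: possibly new, possibly planted.
        "unknown": sorted(n for n in collected if n not in baseline),
    }


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        export = Path(tmp)
        (export / "running.cfg").write_bytes(b"setpoint=42\n")
        (export / "firmware.bin").write_bytes(b"\x00\x01\x02")
        last_known_good = {
            "running.cfg": hashlib.sha256(b"setpoint=40\n").hexdigest(),  # has drifted
            "firmware.bin": hashlib.sha256(b"\x00\x01\x02").hexdigest(),  # unchanged
            "logic.prg": "0" * 64,  # in the baseline, absent from this export
        }
        print(compare_to_baseline(export, last_known_good))
        # prints {'changed': ['running.cfg'], 'missing': ['logic.prg'], 'unknown': []}
```

A mismatch only shows that something changed; deciding whether the change was an engineer’s edit or an attacker’s is still the responder’s job.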

In the analysis stage later on, the vendor, digital forensics specialist, or embedded systems analyst should evaluate embedded operating system files, captured data at rest and in transit, and if possible, volatile memory for code injection and potential rootkits.

What constitutes “first response” and “first responder,” though, seems to be in a state of flux. Gutmanis is on record saying that Schneider, the controllers’ manufacturer, could have detected the attack two months earlier, after the first shutdown in June 2017.

In an official statement quoted by Higgins at Dark Reading, however, Schneider insisted this wasn’t the case because plant engineers themselves didn’t suspect a security incident. They “took one Triconex system offline, completely removing the Main Processors, and sent them to Schneider Electric’s Triconex lab in Lake Forest Calif…. Once they were removed from power, the memory was cleared and there was no way to conclude that the failure was the result of a cyber incident.” Schneider’s focus at that point was whether the controllers worked correctly within their safety function, which it determined they did.

The engineers’ actions following the first Triton incident point to a need for greater awareness. In April 2018 for the Harvard Business Review, Andy Bochman, Senior Grid Strategist, National & Homeland Security at the Idaho National Laboratory (INL), argued: “Every employee, from the most senior to the most junior, should be aware of the importance of reacting quickly when a computer system or a machine in their care starts acting abnormally: It might be an equipment malfunction, but it might also indicate a cyberattack.”

Indeed, Sistrunk’s presentation noted that anomalies, whether increased network activity, strange behavior, or some kind of failure, always require investigation to determine whether they are a “known bad” or an “unknown bad” and, if unknown, whether to escalate them to a security incident.

Part of this is, of course, preventing the kinds of mistakes that can leave systems vulnerable to begin with. For instance, as reported by Dragos, “[t]he Triconex SIS controller had the keyswitch in ‘program mode’ during the time of the attack and the SIS was connected to the operations network against best practices.”

This mistake is an example of the trade-offs companies make in the name of efficiency, lower costs, and reliability. In another example, Bochman writes, security patches are installed in batches during periodic scheduled downtime to avoid disrupting operations, which can be months after the patches are released.

Moreover, he argues, even perfect implementation of best practices “would be no match for sophisticated hackers, who are well funded, patient, constantly evolving, and can always find plenty of open doors to walk through.”

Towards new best practice frameworks?

Best-practice frameworks such as the National Institute of Standards and Technology (NIST) Cybersecurity Framework and the SANS Institute’s top 20 security controls, writes Bochman, “entail continuously performing hundreds of activities without error. They include mandating that employees use complex passwords and change them frequently, encrypting data in transit, segmenting networks by placing firewalls between them, immediately installing new security patches, limiting the number of people who have access to sensitive systems, vetting suppliers, and so on.”

Bochman points to “numerous high-profile breaches” at companies with “large cybersecurity staffs [which] were spending significant sums on cybersecurity when they were breached” as evidence that adherence to best practices may be a losing battle. “Cyber hygiene is effective against run-of-the-mill automated probes and amateurish hackers,” he writes, “but not so in addressing the growing number of targeted and persistent threats to critical assets posed by sophisticated adversaries.”

Bochman outlines an unconventional INL strategy: consequence-driven, cyber-informed engineering (CCE), which could help companies to prioritize and mitigate damage to targets they might once have deemed unlikely:

“Identify the functions whose failure would jeopardize your business, isolate them from the internet to the greatest extent possible, reduce their reliance on digital technologies to an absolute minimum, and backstop their monitoring and control with analog devices and trusted human beings.”

By forcing organizations to create prioritized (versus comprehensive) inventories of hardware and software assets — something Bochman argues most “fail at” — the lab’s methodology “invariably turns up vulnerable functions or processes that leaders never realized were so vital that their compromise could put the organization out of business.”

Threat hunting: part of the mix

Much of what’s known about Triton is due to painstaking research by Dragos. Among their findings: Triton isn’t very scalable because even within the same product lines, like Triconex, “each SIS is unique and to understand process implications would require specific knowledge of the process. This means that this malware must be modified for each specific victim reducing its scalability.” 

Even so, they added, “the tradecraft displayed is now available as a blueprint to other adversaries looking to target SIS.” At the same time, Bochman wrote: “Information systems now are so complicated that U.S. companies need more than 200 days, on average, just to detect that they have been breached….”

That’s where threat hunting comes in. In an interview with CSO Online’s Roger Grimes, the SANS Institute’s Rob Lee described the role this way:

“Threat hunters are an early warning system. They shorten the threat’s ‘dwell time,’ which is the time from the initial breach until they are detected…. A threat hunter is taking the traditional indicators of compromise (IoC) and instead of passively waiting to detect them, is aggressively going out looking for them.” This activity is especially important since a “[crafty adversary] will avoid tripping the normal intrusion detection defenses.”
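Lee’s description of hunters going out to look for IoCs, rather than waiting for alerts, can be illustrated with the most basic kind of sweep: hashing every file on a host and checking it against a list of known-bad file hashes. The Python below is a minimal, hypothetical sketch; it assumes the indicator list is a set of SHA-256 digests from some threat intelligence source, and the function name and sample data are invented (they are not real Triton indicators).

```python
import hashlib
from pathlib import Path
from typing import Iterable


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sweep_for_iocs(root: Path, bad_hashes: Iterable[str]) -> list[tuple[str, str]]:
    """Walk every file under `root`; report (relative path, digest) for known-bad hashes."""
    wanted = {h.lower() for h in bad_hashes}  # normalize feed entries to lowercase hex
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            digest = sha256_of(path)
        except OSError:
            continue  # unreadable file (locked, permissions); a real sweep should log it
        if digest in wanted:
            hits.append((str(path.relative_to(root)), digest))
    return hits


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        host = Path(tmp)
        (host / "tools").mkdir()
        (host / "tools" / "dropper.py").write_bytes(b"print('payload')\n")
        (host / "notes.txt").write_bytes(b"shift log\n")
        # Hypothetical indicator feed: one digest matches the planted file.
        feed = [hashlib.sha256(b"print('payload')\n").hexdigest().upper()]
        for rel_path, digest in sweep_for_iocs(host, feed):
            print(f"HIT {rel_path} {digest}")
```

Hash matching is the narrowest form of hunting. Changing a single byte of a tool produces a different hash, which is exactly the kind of evasion a crafty adversary relies on, so hunters pair sweeps like this with behavioral leads such as the unexpected RDP sessions and Python scripts found during the Triton response.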

Such is the case with Triton’s authors, XENOTIME. Eduard Kovacs, writing for SecurityWeek, described FireEye findings showing that the group’s tactics, techniques, and procedures (TTPs) focus on maintaining access to compromised systems. Additional data on XENOTIME comes from Dragos, which presented on some of the group’s activities at SecurityWeek’s 2018 ICS Cyber Security Conference.

Lee’s recommended path to threat hunting: “… first work as a security analyst and likely graduate into IR and cyber threat intelligence fields.” One of the most advanced skillsets in information security, threat hunting requires security operations and analytics, IR and remediation, attacker methodology, and cyber threat intelligence capabilities. “Combined with a bit of knowledge of attacker methodology and tactics, threat hunting becomes a very coveted skill,” Lee said.

The threat of ICS-targeted malware is chilling, but it also presents extraordinary opportunities for motivated DFIR analysts. Security and vulnerability assessments, comprehensive forensic investigation, and threat hunting are all rapidly growing disciplines within the industry, and specialties such as machine learning add further dimensions.
