DFRWS Virtual USA 2020: Recap

In July, digital forensics professionals from the public, private, and academic sectors came together to hear about some of the latest research in the industry. In his opening remarks, DFRWS USA Conference Chair Josiah Dykstra, a Technical Fellow with the Cybersecurity Collaboration Center at the US Department of Defense, reflected that although the conference was virtual this year, the selection of papers was as rigorous as ever: a fitting way to reflect 20 years of success.

Since August 2001, Dykstra said, when the first organizers in New York State circulated a roadmap for digital forensics research, more people have joined the profession and more tools have become available. DFRWS itself has grown into a multinational organization. As of next year, its conference series will include not only the US and Europe, but also the Asia-Pacific region.

As with DFRWS-EU, Lightning Talks afforded attendees the chance to converse on a range of topics, including AI-based digital forensics solutions and diversity in the field. Meanwhile, the Digital Forensics Rodeo — a capture-the-flag-style event which extended throughout the week — featured steganography challenges involving audio and pictorial data, as well as challenges associated with files, decryption, redaction, and other data.

The keynotes: New frontiers and ways to keep learning

David Cowen, Managing Director at KPMG, kicked off the conference with a big-picture look at new opportunities in DFIR — including ones that might not even exist for this year’s graduates. This creates, he said, “a world of opportunities” to apply for grants and to research new categories of digital forensics, including the following drivers of funding:

  • Web 3.0. Shifts in data storage from endpoint to cloud and back to endpoint in “offline mode” have created forensic artifacts such as cache and history files — “byproducts of developer choices” to make things faster and easier to use.
  • DevOps, or “development operations,” is a “huge area for research,” said Cowen, because development philosophies applied to infrastructure creation and maintenance make for unprecedented access to data on both endpoints and servers.
  • Cloud forensics. Amazon, Azure, Google Compute, and others offer entirely new data sources, such as provider audit trails. However, the cloud also offers the promise of new techniques and automation, like parallel processing. (Look for a forthcoming SANS class!)
  • Automated correlation takes human error out of the process by normalizing data from different structures (such as the connection between the $MFT and parsed LNK files) — and allows human analysts to do what they’re better at: putting the data together in a bigger picture.
  • Ephemeral containers, which might be spun up and then spun down in seconds depending on demand, impact the data that relies on them — which might likewise be ephemeral. They offer little visibility, logging, or telemetry, so practitioners need other ways to investigate what was happening in containers like this.

The following day, Mari DeGrazia, a SANS instructor and Associate Managing Director at Kroll Cyber Security, Inc., presented “Everything I Know About Forensics, I Learned from Elle Woods.” A lighthearted but inspiring message grounded in the protagonist from the movie Legally Blonde, DeGrazia’s presentation offered the following insights:

  • Embrace change. Just as Elle shifted her career aspirations from fashion merchandising to law, DeGrazia herself switched from civil engineering to computer science and forensics — reflecting, she said, a common phenomenon in digital forensics.
  • Put in the time. Books and training afford the ability to develop core knowledge, while conferences, blogs and podcasts, Twitter, forums, Discord, and Slack offer what DeGrazia called “shorter tail” information on new artifacts and techniques.
  • Use the tools you need to get the job done. The tools that “everyone else” is using may not be best or most intuitive for you, so it’s wise to understand tools’ limitations and unique capabilities — and validate their results on an ongoing basis.
  • Find a mentor. Like Elle’s law professor, a good mentor asks questions to help you connect the dots — and gives strong words of advice and motivation to get you back on track when you want to give up.
  • Flex when you need to. People might make incorrect assumptions about your expertise. There’s no need to be rude about it, said DeGrazia, but handle it like Elle: show your knowledge and put your critic in their place.
  • Follow your instincts, which can extend from parsing artifacts to building connections across cases. That way, even if something looks “off” and you don’t know why, you’ll know when to follow its lead.
  • Be bold and take chances. It can be “terrifying,” said DeGrazia, to post your first blog based on your research, or to commit to public speaking — anything outside your comfort zone. What helps: support from people in our lives who see things in us we may not see, encourage us to take chances, and back the decisions we make.
  • Stand out. This ties to being bold and doing what others don’t want to take on. It’s a way, DeGrazia said, to elevate yourself and put your accomplishment on your resume.
  • You do you. Elle remains true to herself, doing things “the Elle Woods way.” She succeeds by being herself — and ultimately, the sum of her unique experience helps her win her case.

“Everything we bring to the table with us is based on our journey to get there,” said DeGrazia, observing that because each practitioner sees cases differently through their own lens, making different connections because of that perspective — and bringing those perspectives together — has real value.

Opportunities for collaboration

DeGrazia’s points built on ones Josh Hickman, a senior associate with Kroll Cyber Risk, had raised the day before in his presentation “I Care, But Where Do I Start? Sharing Knowledge in Digital Forensics.”

Hickman said incident response and mobile forensics (among others) change so rapidly, it’s always possible to see a new tactic/technique/procedure (TTP), a change to an existing TTP, or an artifact that no one else has seen. The discipline progresses iteratively as practitioners build on one another’s work, so the key is to start with a reasonable — though slightly difficult-to-reach — goal.

Even at that, he said, everyone experiences impostor syndrome. One example from Hickman’s own experience: delaying a blog post for nine days for fear of being “wrong.” This happens when you get outside of your comfort zone, though, so it’s also a sign of growth.

Lack of time, or work prohibitions, can be dealt with by reaching out to other researchers for help with proofreading, research guidance, and other steps; Phill Moore, he said, has a “how to blog” post pinned, and “microblog” resources like Twitter are options for those who want to start small. Resources like AboutDFIR, This Week in 4n6, and DFIR.training can help to promote posts, while DFIR Review exists to peer review blog posts and lend credence to research. (Editor’s note: Forensic Focus welcomes research. Write an article for us to post, or join us on our forums!)

One potential collaborative opportunity came from Joe Sylve, Director of Research and Development at BlackBag Technologies (a Cellebrite company). His “work in progress” presentation, “Revisiting the Linear Hash,” invited collaborators and comments via email or Twitter.

Why it needs revisiting: the common linear hash, where every bit of data is read and hashed in logical order, is not always appropriate for modern evidence sources. For instance, Sylve said, memory may be unmapped and therefore out of order, and storage can run to petabytes. In general, data streams aren’t necessarily sequential, and the linear hash itself can’t be parallelized — so data can’t be hashed as fast as it can be read.

Although the term “hashing” has become synonymous with the linear hash, Sylve said, there’s likely no “one size fits all” solution. Other methods, like hashing the logical image stream or Merkle Hash Trees, can be time consuming, change embedded metadata, or even change the block size and thus the end result.
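Sylve’s point about block size is easy to demonstrate. The sketch below — a hand-rolled illustration, not Sylve’s proposal — contrasts a linear hash, which must consume every byte in order, with a block-wise “hash of hashes,” where each block digest could be computed in parallel:

```python
import hashlib

def linear_hash(data: bytes) -> str:
    # The classic approach: every byte is consumed in logical order,
    # so the computation cannot be split across workers.
    return hashlib.sha256(data).hexdigest()

def block_hash(data: bytes, block_size: int = 4096) -> str:
    # Hash fixed-size blocks independently (each digest could be
    # computed in parallel), then hash the concatenated digests.
    digests = [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]
    return hashlib.sha256(b"".join(digests)).hexdigest()
```

Note that `block_hash(data, 4096)` and `block_hash(data, 2048)` differ over the same data — exactly the kind of ambiguity a standardized way to record the hashing choice would resolve.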

In contrast, modern imaging formats like AFF4 map data across multiple devices and data streams. Sylve argued for extending AFF4 with a set of standard identifiers — a standardized way for tools to communicate their choice of hashing, improving interoperability. That way, users know what kind of hash was used and can validate it.

Technical presentations: File system forensics

Not unrelated to Sylve’s presentation was “An Empirical Study of the NTFS Cluster Allocation Behavior Over Time,” presented by Martin Karresand of the Norwegian University of Science and Technology (NTNU). Standard linear extractions, Karresand echoed, have become inefficient given increases in both storage capacity and the number of cases. 

Because this affects file carving, his team’s research — a continuation of work presented at last year’s DFRWS-USA conference — explored improving forensic efficiency based on the way the NTFS allocation algorithm structures stored data.

Studying the algorithm and its behavior over time between Windows 7 and Windows 10, the researchers found that disk space isn’t a factor. Instead of claiming the unused space at the end of a partition, the allocation strategy aims to fill “holes” in the already written area.

In other words, said Karresand, like looking for a set of dropped keys in the approximate location where you dropped them, operating systems’ allocation algorithms — and thus, forensic practitioners — can calculate the place where user data is most likely to reside. 
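The “hole filling” behavior can be illustrated with a toy first-fit allocator over a cluster bitmap — a deliberate simplification of the real NTFS driver behavior Karresand’s team measured:

```python
def allocate(bitmap, need):
    """Toy first-fit allocator: return the start index of the first run
    of `need` free clusters. Because it scans from the front, it fills
    holes in the already-written area before touching free space at the
    end of the partition."""
    run_start, run_len = None, 0
    for i, used in enumerate(bitmap):
        if used:
            run_start, run_len = None, 0
        else:
            if run_start is None:
                run_start = i
            run_len += 1
            if run_len == need:
                return run_start
    return None  # no free run is large enough
```

For a practitioner, the consequence is the same as for the allocator: the densest concentration of recently written user data sits in those reclaimed holes, not at the end of the volume.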

File carving was also the subject of “Generic Metadata Time Carving,” winner of the Best Overall Paper award. Presented by Kyle Porter of the NTNU, this paper described using timestamps as a dynamic, common metadata signature for files and directories within each file system.

Like linear extractions and hashing, traditional file carving has its limits. That’s because it relies on data structures like file signatures or semantics within the file. As a result, it misses filesystem metadata and has trouble with fragmentation.

Porter’s research team relied on file systems’ way of recording multiple timestamps per metadata entry. They found that three or more collocated timestamps per entry offered a reliable way to find and carve file and metadata structures in different file systems — and to recover the associated files, as long as they aren’t deleted.

Porter then described the research team’s generic timestamp carver, which is an algorithm that relies on simple string matching to provide potential locations for repeated timestamps in each metadata structure. After identifying these, the carver’s semantic parser filters the results depending on the specific file system type.
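The core signature — several plausible timestamps close together — can be sketched in a few lines of Python. This is a crude illustration, not the authors’ carver: the epoch bounds, window size, and 32-bit little-endian encoding are arbitrary choices made for this sketch.

```python
import struct

# Plausible Unix-epoch range: 2015-01-01 .. 2025-01-01 (an assumption
# for this sketch; a real carver's bounds would be case-driven).
LOW, HIGH = 1420070400, 1735689600

def find_timestamp_runs(buf, window=64, min_hits=3):
    """Return start offsets where >= min_hits plausible 32-bit LE
    timestamps occur within `window` bytes of each other -- a crude
    stand-in for a generic metadata timestamp signature."""
    hits = []
    for off in range(len(buf) - 3):
        (val,) = struct.unpack_from("<I", buf, off)
        if LOW <= val <= HIGH:
            hits.append(off)
    runs, start = set(), 0
    for i in range(len(hits)):
        # Shrink the window until it spans at most `window` bytes.
        while hits[i] - hits[start] > window:
            start += 1
        if i - start + 1 >= min_hits:
            runs.add(hits[start])
    return sorted(runs)
```

A real implementation would then hand each candidate offset to a file-system-specific parser, as the paper’s semantic filtering step does.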

Both file carving and metadata were topics of “Unifying Metadata-Based Storage Reconstruction and Carving with LAYR,” a paper delivered by Janine Schneider of Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU). She defined a “semantic gap” between the abstraction layers computing systems use to organize their storage resources: higher-level storage for file systems and files, constructed from lower-level storage such as disk volumes.

Bridging the gap between these layers, said Schneider, is important because there are so many different forms of digital evidence, and within each, differences in how data is stored, parsed and analyzed. However, she added, most forensic storage reconstruction techniques work on either side of the gap.

For example, metadata-based reconstruction tools such as The Sleuth Kit (TSK) and many commercial products gather data at lower layers and interpret it to reconstruct higher layers. Conversely, pattern-based reconstruction — file carvers like Foremost and Scalpel — focuses mainly on hard-to-reconstruct deleted files.

Schneider’s research team bridged the gap via LAYR, a forensic reconstruction / analysis tool derived from the researchers’ modular framework. It automatically and reliably combines the different reconstruction approaches, seamlessly drawing on each technique’s respective strengths.

Technical presentations: Memory forensics

Ralph Palutke presented the paper he authored together with Frank Block, Patrick Reichenberger, and Dominik Stripeika of the Friedrich-Alexander Universität Erlangen-Nürnberg (FAU) IT security research team: “Hiding Process Memory via Anti-Forensic Techniques.”

Palutke described three novel subversion techniques used by attackers to prevent malicious user space memory from appearing in analysis tools, and to make the memory inaccessible to security analysts. Any of these, he said, can be used alone or together:

  • Memory area structure (MAS) remapping
  • Page table entry (PTE) remapping malicious pageframes to benign, as well as PTE erasure
  • Shared memory subversion, which makes use of the fact that memory doesn’t have to be shared, just that it can be

All three techniques are detectable, said Palutke, demonstrating two Rekall plugins that automate hidden memory detection for the shared memory scenario. The research evaluated all techniques on Windows and Linux operating systems using memory forensics and live analysis.

Andrew Case of the Volatility Foundation followed that up with his talk, “Memory Analysis of macOS Page Queues.” Unstructured techniques relying on strings, regular expressions, or file carving don’t offer the context needed to analyze modern malware and attacker toolkits, said Case — yet many memory forensics tools ignore the memory pages that might otherwise be reconstructed from these queues.

In part, that’s because the pages need to be reordered from the non-contiguous virtual address spaces that macOS and running processes use to organize their code and data, back to contiguous physical pages.

Performing this reordering, said Case, means understanding the macOS memory management mechanisms that allocate and manage the physical pages, in order to model the address translation accurately. That way, tool frameworks like Volatility could translate between virtual addresses and physical offsets.
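The translation step Case described can be sketched with a toy, single-level page table. Real macOS translation involves multi-level tables and the kernel’s memory management structures; this sketch only shows the shape of the problem:

```python
PAGE_SIZE = 4096

def translate(page_table, vaddr):
    """Map a virtual address to a physical offset via a flat
    {virtual page number: physical frame number} table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pfn = page_table.get(vpn)
    if pfn is None:
        return None  # page not resident in our toy model
    return pfn * PAGE_SIZE + offset

def reassemble(page_table, mem, vstart, npages):
    """Rebuild a contiguous virtual range from scattered physical
    pages, zero-filling any page that cannot be translated."""
    out = bytearray()
    for n in range(npages):
        pa = translate(page_table, vstart + n * PAGE_SIZE)
        out += b"\x00" * PAGE_SIZE if pa is None else mem[pa:pa + PAGE_SIZE]
    return bytes(out)
```

The forensic value is in the `None` case: pages a naive tool would skip may still be recoverable if the analyst knows which queue they sit on.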

A new tool and visualization framework, FORESHADOW — Memory FOREnSics of HArDware cryptOcurrency Wallets — was introduced by Tyler Thomas and Matthew Piscitelli, members of the hacking team at the University of New Haven’s research lab (cFREG).

Referring to the increase in cryptocurrency use in major crimes — including the high-profile Twitter hack in which $100,000 in Bitcoin was scammed from victims, as well as ransomware attacks — the researchers said it can be difficult for law enforcement to identify perpetrators.

Toward closing this gap, their research, funded by the National Science Foundation (NSF), asked three questions about a memory image of a computer that had recently been running a cryptocurrency hardware wallet client:

  • Can we detect the wallet usage?
  • Can we extract forensically relevant data (e.g. transaction history) related to that use?
  • If so, how long does the data persist in memory?

Their research indeed found forensically relevant data in memory including transaction history, extended public keys, passphrases, and unique device identifiers. As a result, the team was able to contribute to the Artifact Genome Project, as well as to the Volatility framework with a new plugin. The final contribution, the FORESHADOW framework itself, helps analysts to:

  • Associate a hardware wallet with a computer
  • Allow an observer to deanonymize all past and future transactions due to hierarchical deterministic wallet address derivation
  • Measure the persistence and integrity of artifacts produced by Ledger and Trezor hardware wallet clients

Technical presentations: Development

In “Integrating GRR Rapid Response with Graylog Extended Log Format,” Jacob Brown relied on two real-world scenarios to demonstrate how an organization might set up a security operations center (SOC) to perform remote live forensics at minimal cost.

Google’s open source GRR Rapid Response project, developed to offer rapid, scalable remote live forensics, can already gather volatile data, live stored data, and data in transit from remote hosts. However, GRR’s existing six output plugins are, said Brown, “somewhat limited” — if not in scalability or functionality, then in accessibility: some carry licensing costs.

Brown demonstrated both reactive forensic and proactive threat hunting possibilities using a no-cost, enterprise-ready incident response plugin based on the Graylog Extended Log Format (GELF) protocol.

Designed to overcome the shortcomings of Syslog — including limited message lengths, payload size limits, lack of data types, and lack of compression — GELF helps to further analyze GRR output. Brown said incident responders at smaller organizations can manage data coming back from a large number of hosts by sending it via GELF automatically, then indexing and graphing it.
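GELF itself is simple: a payload is compressed JSON with a handful of required fields. The sketch below is a hand-rolled illustration, not Brown’s plugin:

```python
import json
import time
import zlib

def build_gelf(host, short_message, **extra):
    """Build a zlib-compressed GELF 1.1 payload. Per the GELF spec,
    additional custom fields are prefixed with an underscore."""
    msg = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),
        "level": 6,  # syslog severity: informational
    }
    msg.update({"_" + k: v for k, v in extra.items()})
    return zlib.compress(json.dumps(msg).encode("utf-8"))
```

Each payload is then sent as a UDP (or TCP/HTTP) message to Graylog’s GELF input, which listens on port 12201 by default.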

Limited logging is a significant risk with 3D printers, said Muhammad Haris Rais of Virginia Commonwealth University (VCU), presenting “Forensic Readiness Framework for 3D Printing Process: A Case Study of Ultimaker 3.” That’s because malicious activity — what Rais termed thermodynamic and kinetic attacks — can surreptitiously damage a 3D-printed object.

Because 3D printing goes layer by layer, said Rais, a malicious actor doesn’t have to sabotage raw materials. Instead, they can make slight changes to manufacturing parameters such as nozzle temperature and positioning. This kind of attack can destabilize an object’s internal layers by manipulating parameters including:

  • Object shape, print orientation, and toolpath sequence 
  • Infill pattern — the geometry of each layer and how layers are printed
  • Layer thickness and quantity of material needed for each object
  • Timing profile: commands’ sequence and execution, which can affect objects’ properties even if they look the same
  • Temperature profiles for the nozzle and the printing bed

These factors’ impact on an object’s material properties has critical safety implications for components of commercial products such as aircraft engines, turbines, and cabin interiors — all of which are increasingly manufactured using 3D printers, and which are likely to undergo forensic investigation in the aftermath of a catastrophic incident.

Assessing a 3D printer for compromise isn’t a matter of conventional digital forensics, said Rais, because compromised printers can contain fake data logs to appear normal. His team came up with a “forensic readiness” framework for a fused deposition modeling (FDM)-based 3D printer, the Ultimaker 3. The framework relies on out-of-band sensors and algorithms that log and analyze sensor data for critical printing parameters.

Another VCU researcher, Syed Ali Qasim, presented an acquisition and decoding-oriented paper, “Control Logic Forensics Framework using Built-in Decompiler of Engineering Software in Industrial Control Systems” — winner of Best Student Paper at the conference.

Qasim described how the injection of malicious control logic into industrial control systems’ (ICS) programmable logic controllers (PLCs) could sabotage physical processes at the field sites of nuclear plants, traffic-light signals, elevators, and conveyor belts.

The malicious logic can leave behind forensic artifacts within suspicious network traffic, said Qasim, but existing solutions like Laddis and Similo are limited. His research team wanted to analyze network traffic carrying control logic and recover source code from the transferred binary.

The team came up with its own control logic forensics framework, Reditus, which integrates the decompiler built into engineering software. This wasn’t without its own challenges — the binary format isn’t standardized, multiple programming languages are in play, and different ICS protocols are often proprietary — but by evaluating the framework at the functional level, the packet level, and for its transfer accuracy, the team was able to confirm that it works.

Mobile forensics tools and methods

Following his presentations at the National Cybercrime Conference (NCCC) and the SANS DFIR Summit, Alexis Brignoni described his community-driven, open source Python frameworks, iLEAPP and ALEAPP — brought together as xLEAPP — which offer an open source solution to triage, parse, test and validate data extracted from iOS and Android operating systems.

Brignoni demoed xLEAPP, which offers a GUI with a command-line option and is cross-platform, including Windows. Users can input TAR, ZIP, or even logical extraction files to parse. An HTML-based report presents actionable data about the files processed and their locations.

One unique artifact xLEAPP offers is a notifications report that contains both message content and protobuf content, which supporting libraries decode into a Python dictionary. In turn, the dictionary helps automate the process of locating artifacts within an extraction.
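Schema-less protobuf decoding of this kind can be sketched with a minimal field walker — an illustration only, not the xLEAPP implementation:

```python
def read_varint(buf, pos):
    """Decode a protobuf base-128 varint starting at pos."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        result |= (b & 0x7F) << shift
        pos += 1
        if not b & 0x80:
            return result, pos
        shift += 7

def decode_fields(buf):
    """Schema-less decode of a protobuf message into
    {field_number: [values]}. Handles wire type 0 (varint) and
    2 (length-delimited) only -- enough for many string- and
    ID-heavy blobs."""
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_no, wire_type = key >> 3, key & 0x07
        if wire_type == 0:        # varint
            val, pos = read_varint(buf, pos)
        elif wire_type == 2:      # length-delimited (bytes/str/submessage)
            length, pos = read_varint(buf, pos)
            val = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.setdefault(field_no, []).append(val)
    return fields
```

Without the original `.proto` schema, the analyst sees field numbers rather than names, but the values themselves — strings, timestamps, identifiers — are often self-explanatory.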

Beyond these benefits, said Brignoni, xLEAPP is an option for those for whom commercial tools are out of reach. It also enables examiners to parse brand-new or obscure artifacts — those which vendors haven’t added to established tools. Because it’s open source, analysts are welcome to add to the project — Brignoni’s Python study group on the DFIR Discord server encourages development.

iOS “vault,” or content hiding, apps necessitated the development of VIDE – the Vault App Identification and Extraction System for iOS Devices. Gokila Dorai, an assistant professor at Augusta University, described vault apps as well as decoy apps — vaults disguised as standard apps, such as calculators — and how to use VIDE to identify and analyze them.

VIDE relies on what Dorai called the first in-depth study of iOS vault apps. Focusing on iOS devices and vault apps from the App Store in the United States, Russia, India and China, VIDE operates in a two-step process that rapidly identifies and then extracts data concealed within vault apps.

The lightweight, automated VIDE starts by scanning apps from an App Store. To identify potential vault apps, it applies English, Russian, Chinese, or Hindi keywords to the app title, subtitle, and URL. Details from positively identified apps are stored in VIDE’s local database.

This initial categorization is followed by more precise binary classification, which is needed, said Dorai, because vault apps are regularly removed from the US App Store. They aren’t well maintained and often contain malware payloads, so even after they’re removed, VIDE can still identify their presence. Once this is done, VIDE’s extraction engine performs a logical backup of the iOS device, then identifies the app’s unique ID and extracts the data.

Of course, extracting data using a tool like VIDE or iLEAPP is just the start. Putting a user behind an iOS device is the next step, as Heather Mahalik, Cellebrite’s senior director of digital intelligence and a SANS Institute author, co-curriculum lead and senior instructor, pointed out.

Mahalik focused on iOS device setup, saying that understanding whether an iOS device was activated fresh out of the box, restored from iCloud or via an iTunes backup, or even wiped prior to setup can help to determine who’s responsible for artifacts and activities.

Sharing her test methodology and process — devices, models, OS versions, a test plan, type of tools to use (anything free for validation reasons), and documentation — Mahalik stressed the need to share research results, for a number of reasons:

  • Well conducted research can positively impact many investigations, whether you come up with something new, update someone else’s work, compare and contrast methods, or try the same methods on new devices / versions.
  • Peer feedback — even pushback — can reflect important perspective, because — as reflected in DeGrazia’s keynote — we all think differently.
  • Validating results can actually help to relieve the pressure to obtain all the right answers, especially when results may be unexpected or not make sense at first.
  • Validation also shows that while practitioners may come across artifacts in different ways, consistent and factual answers are what counts.

Other presentations at DFRWS-USA 2020

  • “Cyber Sleuth: Education and Immersion for the Next Generation of Forensicators,” to be recapped in a forthcoming article about secondary education for teens interested in digital forensics
  • “Statistical Methods for the Forensic Analysis of Geolocated Event Data,” to be recapped in a future article about the use of likelihood ratios and other structured evaluation methods for digital evidence
  • “Exploring the Learning Efficacy of Digital Forensics Concepts and Bagging & Tagging of Digital Devices in Immersive Virtual Reality,” to be recapped in a forthcoming article about DFIR education during a pandemic. 
  • “Facilitating Electromagnetic Side-Channel Analysis for IoT Investigation: Evaluating the EMvidence Framework,” which we recapped from the DFRWS-EU conference.
  • “The Potential of Digital Traces in Providing Evidence at Activity Level” from Dr. J. Henseler at the University of Applied Sciences Leiden (the Netherlands); we gave an overview of this talk in our recent article, “Timelines in Digital Forensic Investigation.”
  • Workshops on the CASE Ontology.
  • “Certificate Injection-based Encrypted Traffic Forensics in AI Speaker Ecosystem,” following up from a paper published at DFRWS 2019, and this year proposing a forensic model to directly analyze encrypted traffic between AI speakers and the cloud.
  • “Performing Linux Forensic Analysis and Why You Should Care,” offered by Dr. Ali Hadi at Champlain College. (Note: Dr. Hadi’s shorter presentation on the topic is in our SANS DFIR Summit recap; look for an article coming soon on Forensic Focus!)
  • “Investigating Windows Subsystem for Linux (WSL) Endpoints,” delivered by Tanium’s Director, Endpoint Detection & Response (EDR), Asif Matadar, who described the underlying architecture changes that will allow investigators to respond to incidents involving compromised Windows 10 or Windows Server 2019 systems.
  • “How Security Ninjas whisper the Sigma sounds” by Roberto Martinez, a Senior Security Researcher at Kaspersky, GReAT, covering how a Sigma instance can help round out tools like Snort and Yara by focusing on the logs, events, and other artifacts residing inside operating systems.
  • “Forensic Audio Clarification,” a hands on workshop for beginners presented by David Notowitz.
  • “Transitioning from Python to Rust for Forensic Tool Creation” by Matthew Seyer.

Dykstra gave his thanks to DFRWS sponsors, its organizing committee, and the volunteers who helped run the conference, which will again be virtual in 2021 out of pandemic-related caution. Learn more at DFRWS.org!

Christa Miller is a Content Manager at Forensic Focus. She specializes in writing about technology and criminal justice, with particular interest in issues related to digital evidence and cyber law.
