by Christa Miller, Forensic Focus
This year’s Digital Forensics Research Workshop (DFRWS) EU, originally intended to take place in Oxford, England in March 2020, instead moved online in June. The conference included:
- Fourteen paper presentations focused on memory forensics, digital forensic science, AI-enabled investigations, network forensics, and file system forensics.
- Three in-depth workshops, along with shorter practitioner presentations and lightning talks, rounded out the educational portion.
- Nine Birds of a Feather sessions allowed participants to interact around a range of technical and operational topics.
- Organizers also left plenty of room for networking and fun: the first two days ended with team activities, including the Forensics Rodeo and a Pub Quiz.
Standardization was a major theme. Multiple paper presentations, the keynote address, and a workshop described the need to standardize both processes and tools to improve not just the way they run, but also their accountability in courts of law.
Positioning the matter in terms of risk management, Dr Gillian Tully’s keynote on Day 2 drew on her extensive experience as the United Kingdom’s Forensic Science Regulator. Overseer of the ISO/IEC 17025 accreditation process for the UK’s 60+ legal entities performing digital forensics services, Tully described the results of a recent assessment that detailed how forensic labs have improved, and areas they still need to address. (Forensic Focus will detail these findings in a forthcoming article.)
The Research Papers
Resolving these problems by standardizing the entire digital forensics process, or at least parts of it, featured in a number of the presentations.
Digital Forensic Science
Virginia N. L. Franqueira presented her research with Graeme Horsman, “Towards Sound Forensic Arguments: Structured Argumentation Applied to Digital Forensics Practice,” which earned the Best Paper Award at the conference.
The ability to organize key facts and align with a scientific approach, said Franqueira, is increasingly both necessary and difficult given digital forensics’ investigative complexities. Using three case examples, she described how philosopher Stephen Toulmin’s structure of arguments could make it easier for peers and attorneys to understand and scrutinize forensic work.
Argumentation was also the subject of a paper presented by Erisa Karafili, “An Argumentation-Based Reasoner to Assist Digital Investigation and Attribution of Cyber-Attacks.” Karafili’s research together with Linna Wang and Emil Lupu focused on their proof-of-concept argumentation-based reasoner (ABR), which combines technical and social evidence of a cyber-attack.
Describing the ABR as a “countermeasure” to complex analysis and attribution both during and following attacks, Karafili spoke about how the tool handles incomplete and conflicting information, filling knowledge gaps and indicating missing evidence or new investigation paths to follow. The goal: to attribute the attack to a particular threat actor.
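The ABR itself uses formal argumentation semantics; as a rough illustration of the underlying idea, the toy sketch below (invented rules and evidence labels, not Karafili's implementation) shows how rules over technical and social evidence can each contribute an argument for an actor, so that conflicting evidence produces competing arguments rather than a hard failure.

```python
# Toy argumentation-style attribution step. The rules, evidence labels,
# and actor names are invented for illustration only.

def attribute(evidence: set) -> dict:
    """Map each supported threat actor to the arguments backing it."""
    # Each rule: (set of required evidence, concluded actor)
    rules = [
        ({"malware_language_ru", "targets_gov"}, "actor_A"),
        ({"infra_overlap_actor_B"}, "actor_B"),
        ({"working_hours_utc+8"}, "actor_B"),
    ]
    arguments = {}
    for premises, actor in rules:
        if premises <= evidence:  # all premises observed
            arguments.setdefault(actor, []).append(sorted(premises))
    return arguments

# Conflicting evidence yields arguments for more than one actor,
# flagging where further investigation is needed.
args = attribute({"malware_language_ru", "targets_gov", "working_hours_utc+8"})
```

A real argumentation-based reasoner would additionally rank or defeat arguments against each other and point out which missing evidence would resolve the conflict.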
Eoghan Casey then presented “Structuring the Evaluation of Location-Related Mobile Device Evidence,” a framework that can allow a forensic examiner to generate multiple alternative hypotheses and then, supposing each is true, evaluate the evidence to state the probability of the evidence supporting each hypothesis.
Although they acknowledged that the framework has both advantages and disadvantages when applied to mobile device location data, Casey and his coauthors — David-Olivier Jaquet-Chiffelle, Hannes Spichiger, Elénore Ryser and Thomas Souvignet — emphasized how it could reorient a forensic examination on using observable data to answer technical questions, versus those better left to triers of fact.
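As a minimal sketch of the evaluative style the framework encourages (the hypotheses and probability values below are invented, not from the paper), an examiner assigns the probability of the observed evidence under each competing hypothesis and compares them, rather than asserting a single conclusion:

```python
# Hypothetical example of evaluating location evidence against
# competing hypotheses. All numbers are illustrative placeholders
# an examiner would assess case by case.

# P(evidence | hypothesis): probability of the observed cell-tower
# record supposing each hypothesis is true.
likelihoods = {
    "H1: device was at the scene": 0.60,
    "H2: device was elsewhere in the same cell sector": 0.35,
    "H3: record is a registration artifact": 0.05,
}

def likelihood_ratio(h_num: str, h_denom: str) -> float:
    """Strength of the evidence for one hypothesis over another."""
    return likelihoods[h_num] / likelihoods[h_denom]

for h, p in likelihoods.items():
    print(f"P(E | {h}) = {p:.2f}")

lr = likelihood_ratio("H1: device was at the scene",
                      "H2: device was elsewhere in the same cell sector")
```

Reporting a likelihood ratio of this kind keeps the examiner on the technical question (how probable is the evidence under each hypothesis) and leaves the ultimate question of which hypothesis is true to the trier of fact.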
In “BMCLeech: Introducing Stealthy Memory Forensics to BMC” — winner of one of two Best Student Paper Awards this year — Tobias Latzo described incident response using BMCLeech, the first software to bring forensic memory acquisition to the Baseboard Management Controller (BMC), which allows an administrator to monitor and administer a server remotely.
Based on OpenBMC and compatible with PCILeech — the framework for memory acquisition via DMA — BMCLeech makes it possible to acquire host memory, inject code into the kernel, or pull files from the host’s filesystem. Quantitatively and visually comparing this approach to LiME for software-based memory acquisition, Latzo and coauthors Julian Brost and Felix Freiling concluded that using BMCLeech together with PCILeech is a practical step in incident response.
In the following session, “Tampering Digital Evidence is Hard: The Case of Main Memory Images,” Janine Schneider described research that sought to quantify the effort needed to convincingly manipulate digital evidence. The conclusion: although tampering with main memory dumps appears to be more difficult than tampering with hard disc images, it’s also more likely that an analyst will miss the signs of manipulation or will need to put greater effort into detection.
Acknowledging that their research turned out to be a qualitative study of approaches to main memory manipulation, rather than a statistically significant study, Schneider and her research partners, Julian Wolf and Felix Freiling, wanted to show how digital evidence tampering could affect the interpretation of data, and the risks to a court case.
Ricardo Rodríguez then presented “On Challenges in Verifying Trusted Executable Files in Memory Forensics.” The research investigated memory forensics’ limitations — data incompleteness, data changes caused by relocation, catalog-signed files, and executable file and process inconsistencies — when it comes to verifying the digital signatures of signed Windows PE files obtained from a memory dump.
Malware abuses this code-signing technology to establish trust and remain undetected on computer systems. Rodríguez and his coauthor, Daniel Uroz, therefore wanted a way to verify digital signatures easily. Part of their study included the development of a Volatility plugin, sigcheck, which can recover executable files from a memory dump and compute their digital signatures.
Felix Anda presented “DeepUAge: Improving Underage Age Estimation Accuracy to Aid CSEM Investigation.” He described how he and fellow researchers Nhien An Le Khac and Mark Scanlon used VisAGe, the largest underage facial age dataset, to train a deep learning classification model, DeepUAge, that achieved state-of-the-art performance for age estimation of minors.
To solve the problem of existing algorithms’ difficulty estimating children’s ages, the researchers validated class labels and reduced unnecessary image features such as hair or backgrounds. DeepUAge thus represents a significant step forward for automated age estimation, which is crucial to reducing both investigator exposure to CSAM and the backlog of data from social media, messaging, CCTV, hard drives, and other media to be examined.
Following was the paper that was awarded both the conference’s other Best Student Paper Award and Best Poster Award, and was also the subject of a workshop on Day 3: “Cutting through the Emissions: Feature Selection from Electromagnetic Side-Channel Data for Activity Detection.”
Asanka Sayakkara presented research done together with Luis Miralles, Nhien An Le Khac and Mark Scanlon on using electromagnetic side-channel analysis (EM-SCA) for digital evidence acquisition. Explaining how computer processors’ patterns of electromagnetic (EM) radiation signals correlate with software behavior manipulated at the processor, Sayakkara described how to use passive EM-SCA techniques for digital forensics on Internet of Things (IoT) devices.
This method can result in data that is too plentiful to process on-site in real time, so Sayakkara described how AI can automate data processing to decrease the noise-to-signal ratio and lead to a much smaller subset of channels — offering the capacity to analyze these channels in real time. Sayakkara’s workshop went in-depth, showing how the EMvidence open-source software framework can be used to inspect IoT devices for digital forensic use cases.
Patrick Mullan then presented “Towards Open-Set Forensic Source Grouping on JPEG Header Information.” Grouping roughly 2.8 million JPEG images by file header, Mullan and coauthors Christian Riess and Felix Freiling wanted to see whether an algorithm could predict the make of a previously unseen camera model — and from there, identify the source of multimedia content.
The goal was to apply AI rather than have to keep a database updated with new camera make/model information. The results: median accuracies above 90% for preprocessed images from each make. For post-processed images, median accuracy dropped to about 55% for desktop software and 75% for smartphone apps.
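The feature extraction behind this kind of work starts from the JPEG header itself. As a simplified illustration (not the authors' implementation), the sketch below groups images by the ordered sequence of JPEG segment markers in their headers, one simple header-derived signature a classifier could build on:

```python
# Illustrative sketch: grouping JPEG data by the ordered sequence of
# header segment markers. This is a simplified stand-in for the
# header-based features used in the paper, not the actual method.

from collections import defaultdict

def jpeg_marker_signature(data: bytes) -> tuple:
    """Ordered segment markers from SOI up to start-of-scan (SOS)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker; not a JPEG")
    markers, i = [], 2
    while i + 2 <= len(data) and data[i] == 0xFF:
        marker = data[i + 1]
        markers.append(marker)
        if marker == 0xDA:               # SOS: compressed data follows
            break
        if i + 4 > len(data):
            break
        # Segment length field counts itself (2 bytes) plus the body.
        seg_len = int.from_bytes(data[i + 2:i + 4], "big")
        i += 2 + seg_len                 # skip marker + segment
    return tuple(markers)

def group_by_signature(images: dict) -> dict:
    """Group image names by their header-marker signature."""
    groups = defaultdict(list)
    for name, data in images.items():
        groups[jpeg_marker_signature(data)].append(name)
    return dict(groups)
```

Images from the same source software tend to share header structure, which is why even such coarse signatures carry information about the originating make.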
Sadegh Torabi presented “A Scalable Platform for Enabling the Forensic Investigation of Exploited IoT Devices and their Generated Unsolicited Activities,” research he conducted together with Elias Bou-Harb, Chadi Assi and Mourad Debbabi.
By leveraging Apache Spark — a big data analytics framework — the research team designed and developed a scalable system to automate the detection of compromised IoT devices. After identifying 27,849 compromised IoT devices that generated more than 300 million unsolicited packets, the researchers used the data to infer and fingerprint their unsolicited activities.
Following Torabi’s presentation, Xiaolu Zhang presented another angle of IoT compromise. “IoT Botnet Forensics: A Comprehensive Digital Forensic Case Study on Mirai Botnet Servers,” research Zhang conducted together with Oren Upton, Nicole Beebe and Kim-Kwang Raymond Choo, examined IoT bot malware in light of IoT’s increasing foothold around the world. The goal: to identify both data to obtain, and information about which device(s) to target for acquisition and investigation.
This first published digital forensic case study of the Mirai bot malware examined infected devices and Mirai network devices by setting up a fully functioning Mirai botnet architecture. Comprehensive forensic analysis — both remote and hands-on — of the botnet server allowed the team to discover forensic artifacts left on the attacker’s terminal, command and control server, database server, scan receiver and loader, as well as network packets.
The final presentation in this topic set, “IP addresses in the context of digital evidence in the criminal and civil case law of the Slovak Republic,” was delivered by Pavol Sokol. Analyzing judicial decisions from the Slovak Republic between 2008 and 2019, Sokol and fellow researchers Laura Rózenfeldová, Katarína Lučivjanská and Jakub Harašta wanted a sense of both current and developing trends in the use of IP addresses as digital evidence in judicial proceedings.
The research produced a surprising finding: the more evidence is produced, the more likely it is that the court finds the defendant not guilty or the civil action unsuccessful. More salient, however, were “common errors” in courts’ reliance on IP addresses to identify people rather than devices. The researchers concluded that the use of IP addresses as evidence in court — especially in criminal proceedings — would benefit from standardization.
File System Forensics
Tobias Groß described his research with Paul Prade and Andreas Dewald, “Forensic Analysis of the Resilient File System (ReFS) Version 3.4,” detailing Microsoft’s new modern file system.
Their research was designed to document ReFS’ general concepts and internal structures, with particular emphasis on deleted file recovery. They did this by extending the open source forensic tool The Sleuth Kit to be able to parse and interpret ReFS partitions. They additionally found that page carving enabled the recovery of more data than did the use of only deleted entries still present on the disk.
Following this talk, Frank Breitinger presented “Artifacts for detecting timestamp manipulation in NTFS and their reliability.” Coauthored with David Palmbach, Breitinger’s research offered new uses of four existing Windows artifacts, going beyond $MFT and $LogFile to include the $USNjrnl, LNK files, prefetch files, and Windows event logs to detail information about executed programs or additional timestamps.
On the other hand, none of the tested artifacts turned out to be a reliable source of information because a sophisticated attacker or malicious software could still delete or manipulate them, and their ability to retain information — either due to manipulation or obsolescence — is limited. The researchers recommend combining multiple artifacts with methods such as time-stamp rules to reveal inconsistencies and increase the odds of detecting timestamp forgery.
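A minimal sketch of that "combine multiple artifacts" recommendation (artifact names and timestamps below are invented for illustration): record the same event as seen by several independent artifacts and flag pairs that disagree beyond a tolerance.

```python
# Hedged sketch of cross-checking one event's timestamp across
# multiple Windows artifacts. The observations dictionary is a
# hypothetical example, not real case data.

from datetime import datetime

def timestamp_inconsistencies(observations: dict, tolerance_s: int = 5) -> list:
    """Return pairs of artifacts whose timestamps for one event disagree."""
    items = sorted(observations.items())
    flags = []
    for i, (src_a, t_a) in enumerate(items):
        for src_b, t_b in items[i + 1:]:
            delta = abs((t_a - t_b).total_seconds())
            if delta > tolerance_s:
                flags.append((src_a, src_b))
    return flags

# Example: creation time of one executable as seen by three artifacts.
obs = {
    "$MFT $STANDARD_INFORMATION": datetime(2020, 6, 3, 10, 0, 0),
    "$USNjrnl create record": datetime(2020, 6, 3, 10, 0, 2),
    "Prefetch first run": datetime(2020, 6, 1, 9, 0, 0),  # suspiciously early
}
```

Because an attacker would have to manipulate every artifact consistently, disagreements like the prefetch outlier above raise the odds of detecting forgery, even though no single artifact is reliable on its own.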
Practitioner Presentations, Lightning Talks, and Poster Presentations
In “Digital Forensic Techniques for Preservation and Archiving”, Neil Jefferies described how digital forensics can be applied to the preservation of archives and library collections, with scholarly authenticity the goal rather than legal admissibility. Because corporate archives — the work of multiple people over time — can be complex, and signature systems age poorly, digital forensics can help with long-term data management, even reconstructing narratives that are important to humanities studies.
Ameya Puranik presented “Relevance scoring and clustering of digital traces,” which combines timeline and metadata analysis to identify relevant incriminating traces in large datasets; automated timeline reconstruction and metadata-based classification could then be scaled out using distributed computing.
In “Forensic analysis of Apple HomePod,” Mattia Epifani described the different artifacts available from the HomePod, the “hub” of the Apple HomeKit system. Room information, music playback and timeline, Wi-Fi logs, power logs, and the syslog can all provide insights into a user’s activity in their home environment.
Timothy Bollé and Eoghan Casey described research they conducted together with Francesco Servida, Johann Polewczyk, and Thomas Souvignet: “Expressing evaluative conclusions in cases involving tampering of digital evidence.” Their approach formalizes case assessment and interpretation by offering a method of forming and testing hypotheses that can be applied at any decision point in an investigation.
Casey also discussed “Automated normalisation and correlation of mobile device extractions using CASE,” his research together with Martina Reif and Quentin Rossy. While the goal is to visualize data for analysis, Casey said, bringing it all together cohesively is the challenge because commercial tools tend to retain the data in the tool. That limits analysts’ ability to bring in data from outside the tool.
The open-source, ontology-based standard Cyber-investigation Analysis Standard Expression (CASE) offers a structured way to “liberate” data, allowing different data sources to be correlated and combined in any type of investigation. Casey and others explained the concept further in a Day 3 workshop, “Making the CASE for Cyber-investigation Interoperability,” which provided an overview and update of CASE, including practical case studies and a roadmap; more information is available at https://caseontology.org/.
Lightning Talks and Poster Presentations
Both prearranged and submitted on the spot, Lightning Talks provided overviews of poster presentations as well as information about relevant research in the community:
- Three poster presentations described the use of blockchain for evidence management throughout a chain of custody:
- “Tainted Digital Evidence and Privacy Protection in Blockchain-based Systems”, David Billard
- “The Application of Blockchain of Custody in Criminal Investigation Process”, Yueh-Tan Chiang, Fu-Ching Tsai
- “Chronological independently verifiable electronic chain of custody ledger using blockchain technology”, Xavier Burri
- “PNG Data Detector for DECA”, a file carving algorithm that checks hard drive clusters for relevant file sizes and header/footer data, then skips clusters without those types of data. Researchers: Kingson Chinedu Odogwu, Pavel Gladyshev and Babak Habibnia
- “Big Data Forensics: Hadoop 3.2.0 Reconstruction”, Edward Harshany, Ryan Benton, David Bourrie and William Glisson
- “Detecting Cyberbullying “Hotspots” on Twitter: A Predictive Analytics Approach”, Shuyuan Mary Ho, Dayu Kao, Ming-Jung Chiu-Huang, Wenyi Li and Chung-Jui Lai
- “Infection Detection of Emotet Malware Using Capture-Display-Analyze Model in Wireshark Packet Extraction”, Te-Min Liu, En-Chun Kuo, Da-Yu Kao
- Context-based decryption for law enforcement, offered by Aikaterini Kanta, who described how a suspect’s digital life (local device information, online presence, previous passwords, and so on) could be engineered into a smarter, personalized dictionary list to unlock encrypted devices.
- Felix Freiling talked about the Cybercrime & Forensic Computing Project in Germany, a collaborative effort between law and computer science researchers seeking to apply scientific methods to questions — reproducibility, quantifiability, relationships between offense types, and classes of digital evidence — that are relevant in courts of law.
- Hans van Beek described the Hansken project, or “Digital Forensics as a Service,” being undertaken by the Netherlands Forensic Institute. An open, transparent, scalable service for a traceable chain of evidence that withstands Dutch judicial review, Hansken is available to both law enforcement agencies and academic institutions worldwide.
- Jessica Hyde talked about DFIR Review, a way to get blog posts, papers, and other articles peer reviewed and assigned a DOI, improving the rigor and citability of published research.
- Daryl Pfeif offered information about the Cyber Sleuth Science Lab, a National Science Foundation-funded project designed to prepare high school students to work in digital forensics.
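The cluster-skipping idea behind the PNG detector for DECA described above can be sketched in a few lines. This is an illustrative simplification under assumed parameters (4 KiB clusters, header/footer matching only), not the DECA implementation:

```python
# Illustrative sketch of a carver pass that keeps only clusters
# containing PNG signature or footer bytes and skips the rest.
# Cluster size and matching rules are assumptions for this example.

PNG_HEADER = b"\x89PNG\r\n\x1a\n"   # 8-byte PNG file signature
PNG_FOOTER = b"IEND\xaeB`\x82"      # IEND chunk type plus its CRC

def clusters_of_interest(image: bytes, cluster_size: int = 4096) -> list:
    """Indices of clusters that contain PNG header or footer bytes."""
    hits = []
    for idx in range(0, len(image), cluster_size):
        cluster = image[idx:idx + cluster_size]
        if PNG_HEADER in cluster or PNG_FOOTER in cluster:
            hits.append(idx // cluster_size)
    return hits
```

A production carver would also handle signatures that straddle a cluster boundary and would use the recovered header/footer pairs to bound candidate file sizes, but even this naive pass shows how skipping uninteresting clusters cuts the search space.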
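Kanta's contextual-dictionary idea, mentioned above, can be illustrated with a toy generator. The mangling rules below (capitalization, leetspeak substitution, year suffixes) are generic examples chosen for this sketch, not her actual method:

```python
# Toy personalized-wordlist generator: expand contextual tokens
# (pet names, old passwords, significant years) with simple mangling
# rules. Rules and tokens here are illustrative assumptions.

LEET = str.maketrans({"a": "4", "e": "3", "o": "0", "s": "5"})

def personalized_wordlist(tokens: list, years: list) -> list:
    """Build sorted candidate passwords from contextual tokens."""
    candidates = set()
    for t in tokens:
        for base in (t, t.capitalize(), t.translate(LEET)):
            candidates.add(base)
            for y in years:
                candidates.add(f"{base}{y}")    # token + year
                candidates.add(f"{base}{y}!")   # common trailing symbol
    return sorted(candidates)

# Example: tokens harvested from a (hypothetical) suspect's digital life.
wl = personalized_wordlist(["rex", "summer"], ["2019", "19"])
```

Ordering such a list by how strongly each token relates to the suspect is what makes the dictionary "smarter" than a generic one.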
DFRWS EU afforded the opportunity for practitioners and researchers from many countries to connect and interact in spite of the COVID-19 pandemic. DFRWS US will likewise be virtual, running July 20-24.