by Brandon Epstein, Walter Bruehs, Bertram Lyons, and Dan Fischer
Video evidence is prolific in modern criminal investigations, and cell-phone video is one of the most prevalent ways video is captured [1]. While these silent witnesses can provide excellent investigative leads or paint a picture for a jury at trial, they do present unique challenges to detectives and attorneys.
There are many questions asked of digital evidence, from understanding content to authentication of video files. Perhaps the most difficult question to answer is the provenance, or origin, of a video file, or the ability of an investigator to attribute a video to a specific device. Compounding the difficulty in answering these questions is that investigators are often asked about individual files, either emailed to an investigator or obtained from social media or streaming sites such as YouTube.
A novel approach to digital media examination
Digital and forensic video examiners may currently perform a number of examinations on submitted video files, including:
- authenticity evaluation – examining media to determine if it is what it purports to be and has not been manipulated since it was created.
- image clarification – enhancing the viewer’s ability to see images/video and make informed decisions about their content.
- comparative analysis – evaluating an object in recorded imagery, such as a shirt, hat, or car, to determine if it is the same object in the real world (e.g. one seized by law enforcement after a crime).
- photogrammetry – determining height and/or distances of three-dimensional objects in a two-dimensional image or video.
- image source authentication – evaluating a recorded image and matching it to a specific digital camera based on a one-to-one correlation of the Photo Response Non-Uniformity (PRNU) [2].
Although examiners are often asked where a video file originated, there is currently no means at their disposal to answer this question effectively and efficiently. Typically, metadata is examined to try to ascertain the origins of a digital file; however, metadata is easily altered and is often removed by editing programs.
Recent research and product development have led to a new approach to examining the origin of digital video files [3, 4]. Investigators are now able to conduct device source and generational classification/authentication based on a single video file’s attributes; in other words, they can identify the device that created the video along with other devices/software it touched along the way.
This classification/authentication is based not on the visual content of the video file but on the unique way that files are constructed and handled (opaque to the user) during many transmission (and editing) processes.
For example, a singular video file sent to an investigator may not have any associated metadata indicating where that file originated or the devices/software that it may have passed through before being sent to law enforcement.
Analysis of the file’s construction can nonetheless show that the file originated from an iPhone 6S and was sent to its recipient via WhatsApp. Authenticity can also be evaluated by identifying the presence (or absence) of editing traces in the construction of the file, including whether the file has touched any editing software.
While it is possible to manually perform a one-to-one comparison of a questioned video to a known file, it is not possible to manually conduct device source and generational classification/authentication based on a single video [5].
Much as a fingerprint examiner relies on an automated fingerprint identification system (AFIS), automation is required to evaluate source and generation in a practical manner. Research and development of an automated system for source and generational classification/authentication using a basic machine learning algorithm has shown promise not only in correctly identifying source, generation, and manipulation, but also in assigning a level of confidence to those findings [6]. This automation will allow front-line officers, investigators, and forensic examiners to evaluate large batches of files rapidly and with little effort.
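To make the idea of automated comparison concrete, the sketch below ranks a questioned file against a small library of known sources and reports a similarity score for each. It is only an illustration and not the system described in [6]: it assumes each file has already been reduced to an ordered list of structural element names, it swaps in Python's standard `difflib.SequenceMatcher` for the machine learning algorithm, and the source labels and sequences in `REFERENCE_SIGNATURES` are invented for the example rather than measured from real devices.

```python
from difflib import SequenceMatcher

# Hypothetical reference library: source label -> ordered structural elements.
# These sequences are invented for illustration, not taken from real devices.
REFERENCE_SIGNATURES = {
    "camera-app-A": ["ftyp", "wide", "mdat", "moov"],
    "messaging-app-B": ["ftyp", "moov", "free", "mdat"],
    "editor-C": ["ftyp", "free", "mdat", "moov", "udta"],
}


def rank_sources(questioned, references):
    """Return (label, score) pairs sorted from most to least similar.

    The score is SequenceMatcher's similarity ratio in the range 0..1.
    It is a stand-in for a confidence level, not a calibrated probability.
    """
    scored = [
        (label, SequenceMatcher(None, questioned, known).ratio())
        for label, known in references.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# A questioned file whose structure happens to match one reference exactly.
questioned = ["ftyp", "moov", "free", "mdat"]
ranking = rank_sources(questioned, REFERENCE_SIGNATURES)
best_label, best_score = ranking[0]

for label, score in ranking:
    print(f"{label}: {score:.2f}")
```

Because the comparison is automated, the same function can be run over a large batch of files in a loop. A production system would need a far larger reference library and a properly validated confidence measure.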
If the concept of analyzing the construction of video files to determine where they came from and how they got from point A to point B sounds complex, it is for good reason. Digital video files are constructed in a variety of ways, and each contains a multitude of variables [7].
Those variables make it challenging to understand the inner workings of a file, but they are also the characteristics that allow a file to be individualized to a potential source.
Furthermore, every piece of software that touches the video file can leave identifiable traces of the interaction, if you know where to look. Although this may sound like carving for a specific artifact in the hex of a file, it is in fact a different approach: looking at the arrangement of data within the entire file.
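As a rough illustration of what "the arrangement of data within the entire file" can mean, the sketch below walks the nested boxes (also called atoms) of an MP4/MOV-style container and records the order and nesting in which they appear. It is a minimal example under stated assumptions, not the authors' method: `walk_boxes`, `make_box`, and the short `CONTAINER_TYPES` list are names chosen for this example, and the sample bytes are synthetic rather than taken from a real recording.

```python
import struct

# Box types treated here as pure containers of child boxes (a partial list,
# chosen for this example).
CONTAINER_TYPES = {b"moov", b"trak", b"mdia", b"minf", b"stbl", b"edts", b"dinf"}


def walk_boxes(data, start=0, end=None, path=""):
    """Yield (path, size) for each box in data[start:end], in file order."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        # Every box starts with a 4-byte big-endian size and a 4-byte type.
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            if pos + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:  # box extends to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated box: stop rather than guess
        name = path + "/" + box_type.decode("latin-1")
        yield name, size
        if box_type in CONTAINER_TYPES:
            yield from walk_boxes(data, pos + header, pos + size, name)
        pos += size


def make_box(box_type, payload=b""):
    """Build one synthetic box for the demonstration below."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic sample: ftyp, then moov (holding mvhd and trak/tkhd), then mdat.
sample = (
    make_box(b"ftyp", b"isom\x00\x00\x02\x00isommp42")
    + make_box(
        b"moov",
        make_box(b"mvhd", b"\x00" * 100)
        + make_box(b"trak", make_box(b"tkhd", b"\x00" * 84)),
    )
    + make_box(b"mdat", b"\x00" * 16)
)

signature = [name for name, _ in walk_boxes(sample)]
print(signature)
```

Two files with identical visual content can produce different lists here if, for example, one writer places `moov` before `mdat` and another places it after. Differences of that kind are the sort of structural trace the approach relies on, although a real examination would consider many more attributes than box order.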
Potential impacts of source and generational classification to investigations
The implications of this new approach are far reaching across many kinds of investigations.
ICAC/CSAM investigators are constantly attempting to identify where illicit images came from prior to arriving on a suspect’s device. Attribution to a specific device or suspect could make all the difference between a specific charge, potential prosecution, or even victim identification.
Generational classification of a file can reveal if a video was stored on the device (gallery, photos, DCIM folder) and then transmitted versus captured within an app (iMessage, WhatsApp, Instagram) and transmitted within the app.
This determination can reveal whether a video originated on a specific device. It can also help to identify a pattern of transmission in large-scale investigations, where agencies could accept and authenticate large amounts of crowd-sourced video through a number of different channels (e.g. email, text message, website upload).
Patrol officers and investigators are often tasked with obtaining a single video of interest from a witness’s or victim’s cell phone. Do they take the phone for forensic analysis, depriving the owner of their phone for days, or do they have the victim/witness email the file directly to them, potentially opening an avenue to challenge authenticity? [8]
With device source and generational classification/authentication, single files can be transmitted (email, Dropbox, Google Drive, file sharing platform), evaluated for manipulation, and tied to a source model with specific attributes, potentially eliminating the need to forensically acquire the entirety of a device when only a single video is sought.
Authenticity remains a concern even when using a digital forensic tool’s targeted acquisition; an independent identification of a video’s source could help to authenticate individual files from a targeted acquisition without the contents of an entire device.
Given our ever-connected society it is not uncommon for investigators to make a public request for potential video evidence to be sent to an agency. This potential deluge of video from unknown and/or anonymous sources presents significant challenges to evaluating the authenticity and provenance of video files. The ability to independently evaluate each file for manipulation and provenance can help investigators rely on specific files and ease their introduction in legal proceedings.
Social media and streaming videos are also becoming commonplace across a wide array of investigations. While best practices currently dictate that the video be obtained directly from the service provider, legal process can be cumbersome and time consuming [9].
As such, investigators often rely upon third-party tools or screen capture to obtain these videos. Even with files obtained directly from the hosting service, much of the video metadata is removed in the conversion processes that allow for efficient streaming of the file.
By integrating the capability to download many cloud-based videos using open-source tools with device source and generational classification/authentication, investigators can gain insight into the authenticity as well as provenance absent the streaming file’s metadata.
In the time before COVID-19 became a household name and the lead of every newscast, the 2020 presidential election dominated this country’s conversations. A recurring theme in those conversations, as well as others, is the effect of deepfake videos on the election process and society in general [10].
While deepfake videos may not be an immediate threat and many experts are attempting to develop a solution to identify manipulated videos, there is currently no comprehensive video authentication tool.
By shifting the approach from examining a video file’s content to examining its source, investigators can gain new insight into the authenticity of a video. It may be easy to alter a video’s content and produce realistic results; whether that can be done without altering any of the intricate construction of the file itself remains to be seen.
Practical Considerations
It should be noted that while source and generational classification/authentication may be a promising solution to many issues involving digital video files in criminal investigations, it is not an all-encompassing solution. Identification of a video’s source and method of transmission is powerful information, but it is not content authentication [11].
For example, suppose a video is captured of a purported celebrity in the company of someone who is not their spouse. Source classification/authentication may show that the video was captured with a Samsung Galaxy S9 cell phone and has not been manipulated, but it provides no answer as to the true identity of the persons in the video: is it actually the celebrity or a look-alike?
While source and generational classification/authentication based on a single video file’s attributes has proven accurate, there is still opportunity to increase the number of file formats and devices supported for analysis. Like any new approach or technology, this support will grow over time, possibly exponentially due to machine learning and AI algorithms.
It is foreseeable that in the near future, the vast majority of files can be authenticated and mapped across multiple transmission sources to the originating device, changing the way data can be acquired and investigations conducted.
About The Authors
Brandon Epstein, Bertram Lyons, and Dan Fischer are respectively the Director of Law Enforcement Training, Managing Director of Software, and Senior Developer at Medex Forensics. Medex Forensics’ flagship product, Medex, a source and generational classification and authentication tool, is due to launch in the third quarter of 2020.
Walter E. Bruehs is employed by the Federal Bureau of Investigation as the Supervisory Physical Scientist (Image Analysis), in the Forensic Audio, Video, Image Analysis Program in the Digital Forensic Analysis Unit.
References
[1] “2020 Annual Digital Intelligence Benchmark Report: Law Enforcement,” Cellebrite Digital Intelligence (published April 2020), available at https://www.cellebrite.com/en/insights/industry-report/ (accessed May 19, 2020).
[2] M. Chen, J. Fridrich, M. Goljan, and J. Lukas, “Determining Image Origin and Integrity Using Sensor Noise,” IEEE Transactions on Information Forensics and Security, vol. 3(1), pp. 74-90, 2008.
[3] B. Lyons and W. Bruehs. “Structural Signatures: Using Source-Specific Format Structures to Identify the Provenance of Digital Video Files.” Presented at the 104th IAI International Educational Conference, Reno, August 15, 2019.
[4] B. Lyons and D. Fischer. “Structural Signatures: Using Source-Specific Format Structures to Identify the Provenance of Digital Video Files.” Presented at the Joint Technical Symposium, Amsterdam, October 5, 2019. https://weareavp.aviaryplatform.com/collections/6/collection_resources/13957
[5] T. Gloe, A. Fischer and M. Kirchner, “Forensic analysis of video file formats,” Digital Investigation, vol. 11, pp. 68-76, 2014.
[6] D. Fischer and B. Lyons. Source Identifying Forensics System, Device, and Method for Multimedia Files. US Patent Pending. New York, August 12, 2019.
[7] “SWGDE Technical Overview of Digital Video Files,” Scientific Working Group on Digital Evidence (published September 2019), available at www.swgde.org (accessed May 27, 2020).
[8] “SWGDE Best Practices for Mobile Device Evidence and Collection Preservation and Acquisition,” Scientific Working Group on Digital Evidence (published July 2019), available at www.swgde.org (accessed May 19, 2020).
[9] “SWGDE Best Practices for Digital and Multimedia Evidence Video Acquisition from Cloud Storage,” Scientific Working Group on Digital Evidence (published April 2018), available at www.swgde.org (accessed May 19, 2020).
[10] C. Vaccari and A. Chadwick, “Deepfakes and Disinformation: Exploring the Impact of Synthetic Political Video on Deception, Uncertainty, and Trust in News,” Social Media + Society, Jan-Mar 2020, pp. 1-13, 2020.
[11] N. Dimitrova, H. Zhang, B. Shahraray, I. Sezan, T. Huang and A. Zakhor. “Applications of Video-Content Analysis and Retrieval,” IEEE Multimedia, Jul-Sep 2002, pp. 42-55, 2002.
The proposed Medex Forensics software solution is intriguing, but it must have a large enough library to detect and discern different file origins, especially for videos from social media, as they have likely gone through multiple transcodings and different codecs/containers. I will submit a public review (likely on Forensic Focus) once I have evaluated their first beta or release version. In the interim, I primarily rely upon Vocord’s Video Expert, VideoCleaner, and a hex editor (I also extract an I-frame so I can use Amped’s Authenticate).