Author: Stuart Clarke, 7Safe
The EDRM (Electronic Discovery Reference Model) is a widely accepted workflow that guides those involved in eDiscovery. Typically, email and common office documents are harvested during the identification and collection phases, but as technology moves forward, is this enough? Many of us are seeing a rise in audio discovery projects that use solutions such as phonetic search and speech-to-text. In time, this is likely to extend to rich media, in particular video. As a forensic analyst, I know only too well the variety of data sources that are overlooked in electronic disclosure exercises, yet I appreciate the strength of the proportionality argument. Nevertheless, with the appropriate skill sets and techniques, it is relatively straightforward to overcome some proportionality claims. Throughout this article I will discuss proof-of-concept solutions for handling Skype data in eDiscovery.
Skype is an ever-present communication application, both for personal use and within the business environment. It is also a perfect example of the electronic data we are overlooking in eDiscovery. Skype claims that during peak times it has around 20 million users online (http://about.skype.com/). This figure is hardly surprising when you consider that Skype runs on a range of popular devices, including computers and smartphones. Current versions of Skype support instant messaging (IM), file transfers, SMS, voice calls and video calls. Helpfully, Skype records our activities in unencrypted form on the local device, and the average user rarely disables its logging functionality. Consequently, all IM conversations, call records and details of file transfers are stored locally, whether on a computer hard drive or a mobile device. This might not come as a surprise to many of you; I expect most Skype users have at some point noticed that they can view their message history at the click of a button.
Skype records history in various database tables containing countless fields of data, which, in honesty, is not user-friendly and requires the analyst to have some database knowledge. Many of the fields within the database are of little interest in an electronic discovery matter. Yet perhaps the real reason this data is ignored is the format of the Skype database file: it is a SQLite database, and few computer forensic or eDiscovery solutions can fully index the data in a SQLite database and present it in a manner a non-technical person can understand.
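To illustrate the kind of database knowledge involved, the following minimal Python sketch uses the standard-library `sqlite3` module to list every table in a SQLite file together with its column names. The demo tables and columns (`Messages`, `Contacts`, `chatname`, `author`, `timestamp`, `body_xml` and so on) are assumptions chosen for illustration, loosely modelled on a Skype history file; they are not a documented Skype schema, and a real file contains many more.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> dict:
    """Return {table_name: [column names]} for every table in the database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields one row per column: (cid, name, type, notnull, default, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [col[1] for col in columns]
    return schema


if __name__ == "__main__":
    # Hypothetical stand-in for a Skype history database, built in memory
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Messages (id INTEGER PRIMARY KEY, chatname TEXT, "
                 "author TEXT, timestamp INTEGER, body_xml TEXT)")
    conn.execute("CREATE TABLE Contacts (id INTEGER PRIMARY KEY, skypename TEXT, displayname TEXT)")
    for table, columns in describe_schema(conn).items():
        print(table, columns)
```

Even this first step assumes the reviewer can read SQL and knows which of the many tables matter, which is the barrier described above.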
By manipulating the data within the Skype database file, it is possible to extract and present some powerful information from a user’s Skype history. Instant messaging is perhaps one of the more interesting features of Skype from an investigative perspective. Not only does Skype store the instant message content, it also records details of file transfers made within an instant message session. I will therefore discuss and demonstrate how data from the Skype history databases can be manipulated and presented in a format that many common eDiscovery products understand. To do this, I have developed a process that extracts Skype instant messages into an email archive, formatted so that processing engines and forensic tools parse the Skype messages as if they were email messages.
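The internals of this process are not published in this article, so what follows is only a minimal Python sketch of the general idea, using the standard-library `email` and `mailbox` modules: each IM thread becomes one email message in an mbox archive. The input shape (a chat name, a participant list and `(unix_timestamp, author, text)` tuples), the synthetic addresses on the reserved `.invalid` domain and the `X-Skype-Chatname` header are all hypothetical choices for illustration.

```python
import mailbox
from email.message import EmailMessage
from email.utils import formatdate


def address(skype_name):
    # Hypothetical mapping: Skype names are not email addresses, so place them
    # on the reserved .invalid domain to keep the headers syntactically valid
    return f"{skype_name}@skype.invalid"


def thread_to_email(chatname, participants, messages):
    """Render one IM thread as a single email message.

    messages: list of (unix_timestamp, author, text) tuples, oldest first.
    """
    first_ts, first_author, _ = messages[0]
    msg = EmailMessage()
    msg["From"] = address(first_author)  # whoever opened the thread
    msg["To"] = ", ".join(address(p) for p in participants if p != first_author)
    msg["Subject"] = f"Skype chat {chatname}"
    msg["Date"] = formatdate(first_ts, usegmt=True)
    msg["X-Skype-Chatname"] = chatname  # hypothetical custom header for the thread ID
    lines = [f"[{formatdate(ts, usegmt=True)}] {author}: {text}"
             for ts, author, text in messages]
    msg.set_content("\n".join(lines))
    return msg


def write_archive(path, threads):
    """Append each (chatname, participants, messages) thread to an mbox file."""
    box = mailbox.mbox(path)
    box.lock()
    try:
        for chatname, participants, messages in threads:
            box.add(thread_to_email(chatname, participants, messages))
    finally:
        box.close()  # flushes pending writes, releases the lock, closes the file
```

mbox is used here only because Python supports it out of the box; any archive format the target review platform ingests would serve the same purpose.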
I have used the Nuix processing engine to demonstrate how the extracted Skype data is presented. Each Skype IM thread appears as a single message, as shown in Figure 1.1. This is a vast improvement on the native SQLite databases, where each IM entry is a separate row in a table. Those entries also require some analysis before they can be reviewed: for example, dates are stored as numerical values that must be decoded, and the individual entries must be linked manually into chat threads using Skype’s internal chat references.
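The decoding and linking steps can be sketched in a few lines of Python with the standard `sqlite3` and `datetime` modules. This sketch assumes, for illustration, a `Messages` table with `chatname`, `author`, `timestamp` and `body_xml` columns, with timestamps stored as seconds since the Unix epoch; these assumptions should be checked against the actual database before relying on them.

```python
import sqlite3
from collections import defaultdict
from datetime import datetime, timezone


def load_threads(conn):
    """Group message rows into chat threads, decoding each numeric timestamp.

    Returns {chatname: [(datetime, author, body), ...]} in chronological order.
    """
    threads = defaultdict(list)
    rows = conn.execute(
        "SELECT chatname, timestamp, author, body_xml FROM Messages "
        "ORDER BY chatname, timestamp, id")
    for chatname, ts, author, body in rows:
        # Assumption: timestamp is seconds since 1970-01-01 00:00:00 UTC
        when = datetime.fromtimestamp(ts, tz=timezone.utc)
        threads[chatname].append((when, author, body))
    return dict(threads)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Messages (id INTEGER PRIMARY KEY, chatname TEXT, "
                 "author TEXT, timestamp INTEGER, body_xml TEXT)")
    conn.executemany(
        "INSERT INTO Messages (chatname, author, timestamp, body_xml) VALUES (?, ?, ?, ?)",
        [("#alice/$bob;0001", "bob", 1300000060, "hi there"),
         ("#alice/$bob;0001", "alice", 1300000000, "hello")])
    for chatname, entries in load_threads(conn).items():
        for when, author, body in entries:
            print(chatname, when.isoformat(), author, body)
```

Sorting in SQL and grouping in Python keeps each thread in chronological order even when rows were inserted out of sequence.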
Figure 1.1 – Extracted Skype IM thread in Nuix
Metadata is a vitally important part of an eDiscovery matter; litigants often rely on it for filtering and searching, and to establish the ownership and lifecycle of an electronic file. Consequently, I have strived to build and preserve a rich set of metadata in this Skype solution. A sample of the metadata fields includes a list of all participants in the Skype chat session, the date the session started, the chat subject, the path of the original Skype database file, the thread identifier of the chat session, and the details of each sender and recipient of a message. For each message, it also includes the number of seconds elapsed between the start of the chat session and the moment the participant posted it. I have formatted this information so that most eDiscovery platforms can ingest it, making it fully available for searching, load file production and legal review.
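As a rough illustration of how some of these fields can be derived, here is a minimal Python sketch that summarises one thread: thread identifier, source path, participants, start time and per-message elapsed seconds. The input shape is a hypothetical choice, and participants are inferred only from who posted, so members who never posted would be missed; a real extraction would need another source for those.

```python
from datetime import datetime, timezone


def thread_metadata(chatname, messages, source_path):
    """Summarise one IM thread as a flat metadata record.

    messages: iterable of (unix_timestamp, author, text) tuples, in any order.
    """
    ordered = sorted(messages, key=lambda m: m[0])
    start = ordered[0][0]
    return {
        "thread_id": chatname,
        "source_path": source_path,
        # Inferred from posters only; silent members are not captured here
        "participants": sorted({author for _, author, _ in ordered}),
        "started": datetime.fromtimestamp(start, tz=timezone.utc).isoformat(),
        # One (author, seconds since the session started) pair per message
        "offsets": [(author, ts - start) for ts, author, _ in ordered],
    }
```

Keeping the record flat (strings, lists and numbers only) makes it straightforward to write out as load-file columns or email headers.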
Figure 1.2 – Summary of extracted metadata
By harnessing Nuix for both processing and data representation, it is possible to present the Skype data visually, as shown in Figure 1.3. This gives a powerful, high-level overview of the Skype instant messages and, to my knowledge, was unique at the time of writing. Using such graphical representations, a reviewer can quickly establish whom a custodian regularly communicated with and gauge the number of separate chat threads exchanged.
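Nuix’s network view is proprietary, but the underlying idea can be sketched with a simple count in Python: treat each thread’s participants as a group, count how many separate threads each pair of people shared, then rank a custodian’s contacts by that count. The input mapping and the names used are hypothetical.

```python
from collections import Counter
from itertools import combinations


def communication_edges(threads):
    """Count, for each pair of participants, how many separate threads they shared.

    threads: mapping of thread id -> iterable of participant names.
    """
    edges = Counter()
    for participants in threads.values():
        # Sorting makes each pair key order-independent: ("alice", "bob") only
        for a, b in combinations(sorted(set(participants)), 2):
            edges[(a, b)] += 1
    return edges


def custodian_contacts(edges, custodian):
    """Rank the custodian's contacts by the number of shared threads."""
    counts = Counter()
    for (a, b), n in edges.items():
        if a == custodian:
            counts[b] += n
        elif b == custodian:
            counts[a] += n
    return counts.most_common()
```

The pair counts are exactly the edge weights a graph view would draw between people.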
Figure 1.3 – Network view of data in Nuix
We must also bear in mind that, like other forms of data used in eDiscovery, this Skype data becomes fully searchable in an eDiscovery platform, again because of the way I have formatted it as an email archive. I have had great success with Nuix, which creates a full-text index of the instant message content and the metadata, making both available for searching and metadata filtering. We have also had positive results in producing this data; Figure 1.4 shows an example PDF rendering of a Skype thread.
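Nuix’s indexing internals are not covered here. As a generic illustration of what a full-text index with metadata filtering does, the following Python sketch builds a simple inverted index over thread text and then filters the hits on metadata fields. The document shape and the `custodian` field are hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def build_index(documents):
    """Build an inverted index: lower-cased term -> set of document IDs.

    documents: mapping doc_id -> {"text": str, plus any metadata fields}.
    """
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for token in TOKEN.findall(doc["text"].lower()):
            index[token].add(doc_id)
    return index


def search(index, documents, query, **filters):
    """Return IDs containing every query term whose metadata matches every filter."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return set()
    # .get avoids inserting empty entries into the defaultdict
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return {d for d in hits
            if all(documents[d].get(field) == value for field, value in filters.items())}
```

A production index would add stemming, phrase queries and ranking; this sketch shows only the term lookup and metadata filter.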
Figure 1.4 – PDF render of Skype chat thread
Questions will inevitably be raised about the production of Skype data. The most prominent is likely to focus on the data not being produced in its ‘native’ format. This is of course a valid argument, but we need to consider the proportionality of reviewing Skype data in its native format. Such a review would require the reviewing personnel to understand the SQLite database format and to be able to write and execute SQL queries. They would also need to know how to decode numerical date values into human-readable form. Even with these issues overcome, a review would be very demanding and time-consuming, much like reviewing an Excel spreadsheet cell by cell and row by row. The solution presented in this paper reformats the Skype data, presenting the native data in a readable form. No changes are made to the native data or the IM content. In addition, every action undertaken during the manipulation is fully audited, allowing the process to be repeated, in line with the ACPO guidelines for computer-based evidence.
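One way to support the claim that the native data is left unchanged is to open the database through SQLite’s read-only URI mode and record a SHA-256 hash of the file before and after each query in an audit log. The following Python sketch does that; it is an illustration under stated assumptions rather than the author’s audit mechanism, the log fields are hypothetical, and in practice it would be run against a verified forensic copy rather than the original evidence.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large databases need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def audited_query(db_path, sql, log):
    """Run a query on a read-only connection, appending an audit entry to log."""
    before = sha256_of(db_path)
    # mode=ro makes SQLite refuse any write through this connection
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(sql).fetchall()
    finally:
        conn.close()
    after = sha256_of(db_path)
    log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "source": str(db_path),
        "sql": sql,
        "sha256_before": before,
        "sha256_after": after,
        "rows": len(rows),
    })
    if before != after:
        raise RuntimeError("source database changed during extraction")
    return rows
```

Recording the exact SQL alongside the hashes is what makes the extraction repeatable: another examiner can rerun the same statements against the same hashed file and compare results.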
It must be accepted that electronic data exists in many forms, and it is not always possible to present this information in an appropriate, human-readable manner in its native form. A prominent example is an Outlook email message, which is also stored in a database format and is composed of many different components. These components are manipulated so that an Outlook message can be presented as an MSG file. While not identical to the process described in this paper, the production of Outlook email in MSG format demonstrates that such data manipulations already occur in the industry.
This paper has focused solely on Skype instant message data, as I deem it the most relevant to eDiscovery today. However, Skype offers investigation teams considerably more, including contact lists, details of file transfers and SMS messages, all of which are recorded by default. Nor need this proof of concept stop at Skype: there are countless instant messaging applications available, most of them free, including Windows Live Messenger, Facebook chat and Yahoo Messenger. Like Skype, these applications provide instant messaging, and some also offer contact lists and file transfers, again stored on the user’s local machine. As the eDiscovery community becomes more accepting of the value of instant messaging data, it will come to appreciate that these artefacts cannot be left behind.
For a copy of the full research paper, or for further information, please contact [email protected]