YouDetect – Implementing the principles of statistical classifiers and cluster analysis for the purposes of classifying illegally acquired multimedia files

Author: Jonathan Murphy, 7Safe


Because not all instances of the illegal acquisition of multimedia are known, it is not possible to calculate a complete loss value, but the IPI has suggested a loss of $12.5 billion. Copyright enforcement continues its attempts to eradicate illegal downloading as a means of protecting the media companies and the income they receive from legal sales. This is forcing those who support illegal downloading to invent new and more creative ways of adapting technology as a means to their end.

‘YouTube Downloader’ (YTD) is a case in point. It allows the user to download videos (of any nature) from a number of video streaming websites simply by entering the URL of the video they wish to download. Whilst the application is specifically named after the website, videos from many other websites can be acquired in this manner. The software allows the user to convert this video to a variety of multimedia formats including .mp3 and .avi. The individual can then view these files on any supporting media device or computer. In the case of copyrighted material, the individual who uploaded the material to YouTube in the first instance, as well as the individual who then ‘reproduced’ the material by extracting the video file, have infringed copyright law. As of September 2011, YTD has received approximately 85 million downloads via software download website, ‘’ making it the most commonly used tool of its type by a significant margin. Yet, for something which assists and supports illegal downloading and multimedia piracy so significantly, little has been done to develop a suitable response.

Evidential Markers

It is important to consider the evidential value of suspected illegal download files and how these files would be identified. Evidential markers are defined as ‘identifying components of the internal structure of a file based on the file creation method’. The analysis of the evidential markers poses a unique problem, in that previous research into YTD as a major contributor to the illegal acquisition of multimedia is virtually non-existent. The concern with file conversion is that a file which has been converted completely, as opposed to simply having its extension changed, is recognised as a valid file of its new type by file signature comparison. As such, the detection of converted files must be achieved through alternative means. The evidential markers were found to be unique to the conversion process and to differ significantly from those of legitimate multimedia files, allowing a clear distinction to be made on a suspect’s machine between multimedia files acquired through YTD and those acquired through other means. This means that only the relevant evidential files need be extracted or reported, making a software application a more suitable solution than before.
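The limitation of file signature comparison described above can be illustrated with a minimal Python sketch. The AVI and MP3 signatures used are the standard ones (a RIFF container tagged ‘AVI ’, and an ID3 tag or MPEG frame sync). The marker value, however, is purely hypothetical: this article does not disclose the actual evidential markers, so `HYPOTHETICAL_MARKER` and both function names are assumptions made for illustration only.

```python
# Placeholder only: the real YTD evidential markers are not published here.
HYPOTHETICAL_MARKER = b"example-converter-tag"


def matches_signature(data: bytes, extension: str) -> bool:
    """Return True if the leading bytes are consistent with the extension."""
    ext = extension.lower()
    if ext == ".avi":
        # AVI is a RIFF container: 'RIFF', 4 size bytes, then 'AVI '.
        return data[0:4] == b"RIFF" and data[8:12] == b"AVI "
    if ext == ".mp3":
        # Either an ID3v2 tag, or an MPEG audio frame sync (11 set bits).
        if data[0:3] == b"ID3":
            return True
        return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0
    return False


def contains_marker(data: bytes, marker: bytes = HYPOTHETICAL_MARKER) -> bool:
    """Return True if the (hypothetical) conversion marker occurs in the file."""
    return marker in data


# A fully converted file and a legitimate file both carry a valid signature,
# so the signature check alone cannot tell them apart.
converted = b"RIFF\x00\x00\x00\x00AVI " + b"header" + HYPOTHETICAL_MARKER
legitimate = b"RIFF\x00\x00\x00\x00AVI " + b"header"
renamed = b"%PDF-1.4 not a video at all"
```

In this sketch a renamed file fails the signature check, whereas the converted and legitimate files both pass it; only the marker search separates the last two.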

In conclusion, the research conducted for the topic analysis made it possible to identify the scope of YTD as an issue in the fields of copyright infringement and computer crime investigation. From this, it was possible to propose the theoretical and practical exploration of a potential solution to the issue, to justify the requirements and to ensure a suitable selection of tools and techniques for implementation. Since a software application proved viable, the development of that application, hereafter referred to as ‘YouDetect’, was undertaken.

Statistical Classification and Cluster Analysis

YouDetect utilises the data mining process of pattern extraction from data. This acts as a way of establishing a link between the data which must be identified and the data which is already available. Statistical classification is the procedure whereby input data, known as an ‘instance’, is processed to place it into a category, or ‘class’. Classification requires data sets to be exact matches to each other to allow for perfect implementation. Cluster analysis consists of discovering similarities in data sets and grouping the data according to the closeness of the similarity.

As two of the core principles of data mining would be executed together, extra caution was required when producing evidence from this procedure, so as to avoid the infringement of privacy and the inclusion of incidental data which may be collected alongside the desired files. Statistical classification acts as the core data mining process, with cluster analysis being used as a refinement technique for the files initially identified. In this sense, cluster analysis contributes the degree-of-similarity component. Rather than grouping files by distance measures, the principle of similarity is extracted to allow for the inclusion of more detailed evidential markers than are used in statistical classification. This has allowed a much more complete and robust solution to be developed, improving the potential admissibility of the evidence acquired for use in prosecutions.
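The two-stage approach described above, exact-match classification followed by a similarity-based refinement, might be sketched as follows. This is an illustration under stated assumptions and not the YouDetect implementation. The `FileProfile` structure, the marker names, the 0.75 threshold and the use of Jaccard similarity are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileProfile:
    primary: frozenset  # markers that must match exactly (classification)
    detail: frozenset   # finer-grained markers (similarity refinement)


def classify(candidate: FileProfile, reference: FileProfile) -> str:
    """Stage 1: assign a class by exact match on the primary markers."""
    return "converted" if candidate.primary == reference.primary else "other"


def jaccard(a: frozenset, b: frozenset) -> float:
    """Degree of similarity between two marker sets, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def refine(candidate: FileProfile, reference: FileProfile,
           threshold: float = 0.75) -> bool:
    """Stage 2: keep a classified file only if its detail markers are close."""
    if classify(candidate, reference) != "converted":
        return False
    return jaccard(candidate.detail, reference.detail) >= threshold


# Hypothetical marker names, for illustration only.
reference = FileProfile(frozenset({"m1", "m2"}),
                        frozenset({"d1", "d2", "d3", "d4"}))
close = FileProfile(frozenset({"m1", "m2"}), frozenset({"d1", "d2", "d3"}))
far = FileProfile(frozenset({"m1", "m2"}), frozenset({"d1"}))
unrelated = FileProfile(frozenset({"m1"}), frozenset({"d1", "d2", "d3", "d4"}))
```

Here `close` and `far` both pass the exact-match stage, but only `close` survives refinement, which is the role the text assigns to the similarity component.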

YouDetect Functionality

YouDetect is able to identify all multimedia files which exist in plain view on a device and is then able to use principles of statistical classification or cluster analysis to establish which multimedia files have been created using the YTD conversion process. This is evidenced by the repeated testing undertaken. By ensuring that YouDetect is 100% successful in discovering files in plain view, both efficiently and accurately, a genuinely useful application has been developed. The fact that YouDetect is able to identify the modified YTD files is the most important result which can be derived. If the application were unable to provide this functionality, it would have failed to meet expected standards. Regardless of the minor issues which will be addressed in the future development of YouDetect, it is possible to describe YouDetect as an application which successfully provides the functionality it claims to provide.

As with all applications, the available memory on which it can run is a core component of efficiency. The scanning of a full hard drive can take quite some time and, as such, questions should be asked as to whether an application which performs only this one extensive task is suitable. However, the application does allow flexibility in that the triage investigator can choose the directory from which the scan starts. So whilst the investigator should be performing a full scan from the root of the drive, it is also possible for them to select more specific directories should they wish, allowing for more efficient searching to take place.
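The directory-selectable scan described above could look something like the following Python sketch. It is an assumption-laden illustration rather than the application’s own code: the function name `find_multimedia` is hypothetical, and the extension list contains only the two formats named in this article.

```python
import os

# Only .mp3 and .avi are named in the article; any others would be assumptions.
MULTIMEDIA_EXTENSIONS = {".mp3", ".avi"}


def find_multimedia(root: str):
    """Yield paths of multimedia files found beneath the chosen directory."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            extension = os.path.splitext(name)[1].lower()
            if extension in MULTIMEDIA_EXTENSIONS:
                yield os.path.join(dirpath, name)
```

Passing the root of a drive gives the full scan, while passing a narrower directory limits the walk to that subtree, which is where the efficiency gain comes from.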

YouDetect conforms fully with the relevant forensic guidelines on evidential integrity. In triage and live analysis it is always more likely that changes will occur to original evidence, in the same way that running some anti-virus scans will alter the ‘last accessed’ metadata. However, YouDetect avoids this issue, and the metadata of the multimedia files continues to reference the last time the file was executed. It is possible for YouDetect to be executed from an external device, meaning that a type of USB dongle could be created to contain the executable file and the database repositories it uses. It was originally noted that, by storing the application on a USB device, a record of the device insertion would remain on the suspect machine. This is an unavoidable occurrence, but one which is outweighed by the benefits of YouDetect and which could be easily justified in a court of law. Product testing has also revealed that no other alterations are made to the files which have been scanned or to the auditing .dat files, and that accordance with ACPO guidelines has been achieved.
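One way a tester might check the claim that scanned files are not altered is to compare a snapshot of each file taken before and after a scan. The Python sketch below is an assumption, not the testing procedure used for YouDetect, and the names `snapshot` and `unchanged` are hypothetical. It covers content, size and modification time only. Reading a file to hash it may itself update the ‘last accessed’ time on some systems, and this sketch does nothing to prevent that.

```python
import hashlib
import os


def snapshot(path: str) -> dict:
    """Record the content hash, size and modification time of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large media files are not loaded whole.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    info = os.stat(path)
    return {
        "sha256": digest.hexdigest(),
        "size": info.st_size,
        "mtime_ns": info.st_mtime_ns,
    }


def unchanged(before: dict, after: dict) -> bool:
    """Return True if nothing recorded in the snapshot has changed."""
    return before == after
```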

The reports created by the application are formatted so that specific metadata can be extracted from them as required. These reports currently exist in .txt format, as this allows for simple report production and processing by the application itself. It should be noted that the reports provide full disclosure of all actions which have been undertaken as a result of the investigation. This assists in proving conformance to specification point two, in relation to evidential integrity, in that time and date stamps for all actions are also logged.
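A plain-text report with a time and date stamp for every action, as described above, might be produced along these lines. The line layout, the action labels and the function names are hypothetical; the actual YouDetect report format is not given in this article.

```python
from datetime import datetime, timezone


def format_entry(action: str, detail: str, when: datetime = None) -> str:
    """Build one report line: timestamp, action label, free-text detail."""
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} UTC | {action} | {detail}"


def append_entry(report_path: str, action: str, detail: str,
                 when: datetime = None) -> None:
    """Append a single time-stamped action to the .txt report."""
    with open(report_path, "a", encoding="utf-8") as report:
        report.write(format_entry(action, detail, when) + "\n")
```

Keeping one delimited action per line is a design choice made for this sketch: it keeps the report readable by an officer while remaining simple for a program to split and process.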

A key aspect of any software tool to be used within computer forensics is a practical and functional graphical user interface (GUI). Officers who may be inexperienced in computer forensics as a discipline have to be able to understand and relate to the interface with a minimum of training, otherwise triage becomes more of a burden to computer forensics than a benefit. Consequently, a terminal/command prompt interface was not appropriate, as the technical knowledge of the commands required to operate an application in such a way, whilst potentially useful, would not be suitable for an inexperienced user. The efficiency and ease of use of the core program are key non-functional requirements, and when these are combined with the principle that the interface should feature only those components which are necessary to the task undertaken by the triage officer, usability should increase. The interface is both simple and intuitive with regard to usability. However, as discussed, in terms of perception it may be more suitable to develop a GUI using an alternative library or through the use of interface development software.

Integration as a Forensic Module

It may also be possible to convert the functionality of YouDetect into EnScript, the programming language used by Guidance Software’s EnCase. EnScripts provide forensic investigators with automated means of performing particular tasks. As they exist within a programming language similar to JavaScript, it would be relatively easy to translate the Java functionality into a form which could be used within EnCase. Whilst this ignores the desired triage functionality of YouDetect, it does provide an alternative solution for those forensic teams who are unable to organise and manage triage response units and who would otherwise be denied access to beneficial software. The same could be said of alternative forensic suites which could accommodate the inclusion of such a module, regardless of the programming language used by each.


Whilst every new development should seek to be unique in the approaches it undertakes, or the benefits it provides, this is often not the case. However, the development of YouDetect and its focus on the illegal acquisition of multimedia through video conversion do seem to be unique. It would have been possible to view the lack of an existing response as a lack of interest. Yet initial analysis showed that this gap in the market seemed to be due to a lack of competitors. With this positive situation in mind, the development of YouDetect was undertaken.

Knowing that a conversion process was undertaken to fully convert one file format into the other, as opposed to simple file extension swapping, it was hypothesised that these files would contain unique markers. These evidential markers would provide a conclusive basis from which to discover and categorise illegally acquired multimedia. Having created a hypothetical basis from which to launch an analysis, a collection of YTD converted files was acquired and analysed. It was found that the proposed evidential markers did exist and could be extracted for use in comparison. Whilst YTD is not the only video downloading/conversion software tool available, it was shown to be the most prevalent. With this in mind, and knowing that the developed application was a proof of concept, only the architecture of YTD converted files was investigated. However, future development was considered throughout.


Through the development of YouDetect, research has been undertaken into two modern computer crime issues, triage and illegal downloading, and has shown that a joint solution is appropriate. Specifically, this research has presented unique concepts for the classification vectors extracted from YTD files. Evidential markers have been discovered for all the variations of YTD converted files, which presents a complete solution to this issue. The markers used across this project are not publicly documented and have not been incorporated into other forensic response tools. Alongside the research conducted into evidential markers, this paper has also provided research into illegal multimedia acquisition and computer forensic triage. This paper has indicated that triage forensics is a developing area, but one from which many future benefits can be derived. It would appear that the CPS is aware of the benefits of sole triage investigation in cases, but that this is still in the testing and development stage.

The solution is a software application which uses the prioritising principles of triage forensics to classify and identify instances of illegally acquired multimedia. YouDetect is a forensic triage application which successfully uses statistical classification and cluster analysis to identify all instances of YTD converted files. The evidential integrity of the files which are scanned is maintained in compliance with the ACPO guidelines. The application automatically produces reports based on the actions of the triage officer and the results of the scan. Significant testing has been conducted in the form of user and statistical analysis to ensure that a robust solution has been developed.

As an industry, the multimedia organisations of the world have dealt with piracy in multiple forms of development. This is true in comparison to the degree of piracy which will always exist on the Internet. Whilst it may not be possible to completely prevent individuals from illegally acquiring multimedia, the implementation of a software application, such as YouDetect, has the potential to provide computer forensic specialists with a manageable starting point of investigation. Perhaps, in turn, this deterrent will allow the legitimate multimedia industry to prosper.

For a copy of the full research paper undertaken, or to find out further information regarding YouDetect, please contact
