Rick: Good morning. My name is Rick Ayres, I’m the project lead for the Computer Forensics Tool Testing project at NIST. My talk today is going to provide an overview of updates to our Computer Forensics Reference Data Sets project, also known as CFReDS.
Digital data sets, which are collections of information about a specific topic or use case, are extremely valuable to the forensics community, but often difficult to find, as they tend to be spread out across various websites. They’re typically not organized. We’ve been working on an improved CFReDS portal, which is a gateway to documented digital forensics datasets produced by various contributors that assist in a variety of tasks, such as general practitioner training, tool testing, developing an understanding of the tools’ capabilities and limitations, etc. So our goal is for the CFReDS website to be a centralized portal, providing the forensics community with a quick and easy way to find and share datasets of interest.
So why are data sets hard? Creating data sets is difficult and time-consuming. In order for them to be useful, they must be accurately constructed and provide users with detailed documentation. Data sets must contain important artifacts that show a tool’s or procedure’s limitations. For instance, in mobile, we have a test for long address book names, such as John Jacob Jingleheimer Schmidt. It checks that a mobile forensics tool doesn’t truncate the entry. Others include messages containing non-Latin characters in social media related data, such as Facebook, Twitter, LinkedIn, and many others.
So digital forensics tools often include a variety of functions, such as string searching, file carving, deleted file recovery, disk imaging, mobile acquisition, etc., each of which requires an individual dataset. Lastly, there are few standards or best practices for dataset development in digital forensics.
The updated CFReDS is a taxonomy-driven website that supports large amounts of data that can be quickly searched, and it allows third parties to upload datasets, which are then approved by an admin user. Each data set within CFReDS contains a description and tags, which add another layer of information and are useful for searching for a particular subset of datasets. The description and the finding aids help users locate datasets by the year produced, author, or by attributes of a dataset. The current beta development is at the URL located at the bottom of this slide.
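To illustrate the kind of check a documented data set makes possible, here is a minimal Python sketch that compares the values recorded when a test device was populated against what a tool reports. The field names, the sample values for the "extracted" side, and the `check_extraction` helper are all hypothetical; this is not the CFTT test procedure, only one way such a comparison could look.

```python
from dataclasses import dataclass


@dataclass
class FieldCheck:
    field: str
    expected: str   # value documented in the data set
    extracted: str  # value reported by the tool under test

    @property
    def status(self) -> str:
        if self.extracted == self.expected:
            return "match"
        # A non-empty proper prefix of the expected value suggests truncation.
        if self.extracted and self.expected.startswith(self.extracted):
            return "truncated"
        return "mismatch"


def check_extraction(expected: dict[str, str], extracted: dict[str, str]) -> list[FieldCheck]:
    """Compare each documented value against what the tool reported."""
    return [
        FieldCheck(name, value, extracted.get(name, ""))
        for name, value in expected.items()
    ]


# Values documented when the test device was populated (illustrative only).
expected = {
    "contact_name": "John Jacob Jingleheimer Schmidt",
    "message_body": "Привет, мир",  # non-Latin characters
}
# Values a hypothetical tool reported for the same device.
extracted = {
    "contact_name": "John Jacob Jingle",
    "message_body": "Привет, мир",
}

results = check_extraction(expected, extracted)
for r in results:
    print(f"{r.field}: {r.status}")
```

A check like this only works because the data set documents exactly what was placed on the device, which is why detailed documentation matters as much as the data itself.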

Here is the homepage, which provides a description of the site and is organized into two columns, one for newest datasets, and another for popular datasets. The homepage allows users to either search for a specific dataset with a quick search bar located at the top left, or browse individual datasets by title, author, date, and tags. If you click browse datasets, this takes you to the browse page. If you click contribute, this takes you to a page allowing third parties to upload datasets, which are approved by an admin-privilege user.
The browse page allows the user to view all data sets or begin filtering what is displayed by title, author, date and/or tags. The author and tag fields have a pull-down function that displays the associated content for all datasets. So if one is looking for the author of a specific data set, but can’t remember the name, this is a nice feature.
Here’s a view of the taxonomy, which is controlled by admin-privilege users. I’ve expanded the nodes beneath the IT system type root node, leading to phone, mobile and tablet, Android, Android OS, and Android version.

Our upcoming plans for CFReDS 2.5 include test report integration and integration of the CFTT tool catalog database. Test report integration will allow users to search for reports containing specific variables, such as a mobile report that contains information on a Samsung Galaxy device running Android OS 6 that was populated with Facebook-related data.
This is useful if one is interested in seeing how a tool performed extracting data from a specific application and device. The tool catalog integration will allow users to search for digital forensics tools that provide a specific function, such as mobile data extraction, string searching, file carving, deleted file recovery, write blocking and many others.
We’re very excited to go live with CFReDS 2.5 and feel it will be extremely beneficial to the forensics community, providing a centralized portal for finding datasets of interest, as well as giving others a location to share datasets.
The remainder of the presentation is a demo of the functionality of the CFReDS website.
Mehdi: Hello everyone. My name is Mehdi Shahid, and I’m the lead developer behind the new CFReDS. Please excuse my voice: I’m currently a little bit under the weather at the time of recording. Also, please excuse the accent, but there isn’t much I can do about that one! Anyways, I’m going to give you guys a quick rundown of the new features of CFReDS 2.0. If you’re used to the current website, this should feel fairly new, so please bear with me.
This is the homepage. And as you can see, it’s been fully redesigned. The navigation is now more intuitive, leveraging a search bar on top that lets you access anything from any page on the app. We also now have a menu on the left that will take you to the main sections of the application at a click. The content has been given the most space available, right in the middle.
Below the small description of what CFReDS is and what it’s trying to do, we now have a news section that is aiming to provide the users with up to date relevant information on what is happening in the computer forensics world. For demonstration purposes, it has been filled with placeholder data for now, but anyone can submit new content, and as long as it gets approved by CFReDS staff, it will be visible to everyone else in the forensic community.
One cool feature of the new homepage is that you get quick access to the newest datasets, and also the most popular ones ranked by number of downloads.

The core of the application is a browsing page where you can find all the datasets that have been added. You can see there’s a lot of them right now, but no need to worry, the browsing page has a filtering feature. You can filter using the title of the dataset, but the app also groups all distinct authors in a list and gives you the ability to fine-tune your new query using those. The same thing goes for the date of creation of a dataset.
And finally, you can filter using tags. Please note that all features can actually interact with each other and you can mix and match different metadata to categorise your search. We can use this to find datasets that match very powerful queries, such as: let’s find all the iOS datasets created by NIST in 2020.
Each dataset has dynamic feature weighted tags that are chosen by the user when uploading the dataset. We know it can be a little bit daunting to scroll through hundreds of tags, so we have also given you the option to do graphical filtering.
What you’re seeing right now is version 1.0 of the CFReDS taxonomy. Each tag that is available in CFReDS is actually a node of an ever-evolving taxonomy. And we have found that navigating text structurally, as it is shown right now, is much easier and more human-readable. For instance, we scroll down to iOS, and you are now seeing all the datasets that have been tagged with iOS.
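The "mix and match" filtering described here can be sketched in a few lines of Python. This is a minimal illustration, not the CFReDS implementation: the `Dataset` record, the `filter_datasets` function, and the sample catalog entries are all assumptions made up for the example. Every filter that is supplied must match, which is how combining title, author, date and tags narrows the result.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    title: str
    author: str
    year: int
    tags: set[str] = field(default_factory=set)


def filter_datasets(datasets, title=None, author=None, year=None, tags=None):
    """Return the datasets that satisfy every filter that was supplied."""
    wanted_tags = set(tags or [])
    matches = []
    for ds in datasets:
        if title and title.lower() not in ds.title.lower():
            continue  # title filter: case-insensitive substring match
        if author and ds.author != author:
            continue
        if year is not None and ds.year != year:
            continue
        if not wanted_tags <= ds.tags:
            continue  # dataset must carry all requested tags
        matches.append(ds)
    return matches


# Made-up catalog entries for illustration.
catalog = [
    Dataset("iPhone iOS image", "NIST", 2020, {"iOS", "Mobile"}),
    Dataset("Android backup", "NIST", 2020, {"Android", "Mobile"}),
    Dataset("iPad logical extraction", "NIST", 2019, {"iOS", "Tablet"}),
    Dataset("iOS chat artifacts", "Example Lab", 2020, {"iOS"}),
]

# "All the iOS datasets created by NIST in 2020"
hits = filter_datasets(catalog, author="NIST", year=2020, tags=["iOS"])
print([ds.title for ds in hits])
```

Leaving a filter out simply skips that condition, so the same function covers a plain title search as well as a fully combined query.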

By clicking over the little down arrow next to the dataset title, you get a short description of what the dataset is about. But if you want to access the full data that is inside, you can also click the title, and it will take you to a more in-depth view.
In this detailed view, you have access to all the information about the dataset: you can see the tags associated with it, the short description, a long description if the author felt you need more information, the download links, and we have also added a comment section for the community to discuss hopefully relevant information about the specified dataset.
One of the ways of navigating through CFReDS is to use the quick search bar. It is available at all times and on all pages. As you can see, you can use it to directly type in the title of a dataset, or you can also use it to do a quick search using the fields we described earlier.
Quick, graphical filtering is also available from anywhere on the app, by pressing the icon next to the search bar. And let’s head out to one other very important feature, which is the dataset input process. I figured it was best if we just did it together, and try to add a new dataset.
So let’s say I’m John Smith, this dataset was created by Kevin Smith. Let’s input his email. I have a very cool iOS image extracted from an iPhone, so I will just put “iPhone iOS image” as a title, which isn’t the best title, but this is once again, only for demonstration purposes. Same thing goes for the short description, and let’s say this image was created in 2020. I don’t have a need to provide a longer description, so I would just click next.

I am now asked to tag my dataset with something relevant that would help users find it. So thanks to the taxonomy that we discussed, if I just put in iOS, the app will automatically add all the subsequent nodes in the tree as tags, which makes the tagging process very easy and straightforward. I don’t have to be worried about having missed anything, or a specific tag. Then, I just specify where the dataset is hosted, and we’re good.
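One way to picture this automatic tagging is a walk over a tree of tags. The Python sketch below is only an assumption about how it could work: it reads "subsequent nodes" as the nodes beneath the chosen tag, and the `TAXONOMY` slice and `expand_tag` helper are hypothetical, not taken from the CFReDS code. An implementation could just as well walk upward to the parent nodes instead.

```python
# Hypothetical slice of a taxonomy: each node maps to its child nodes.
TAXONOMY = {
    "Phone, Mobile and Tablet": ["Android", "iOS"],
    "Android": ["Android OS"],
    "iOS": ["iOS Version"],
    "iOS Version": ["iOS 12", "iOS 13"],
}


def expand_tag(tag, taxonomy):
    """Return the chosen tag followed by every node beneath it, depth first."""
    tags = [tag]
    for child in taxonomy.get(tag, []):
        tags.extend(expand_tag(child, taxonomy))
    return tags


expanded = expand_tag("iOS", TAXONOMY)
print(expanded)
```

Because tags are derived from the tree rather than typed by hand, a node added to the taxonomy later becomes available to tagging without any change to the expansion logic.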
Finally, I get taken to a summary of everything that I have inputted, and if everything seems fine to me, we can just click create. That’s it, it tells me my dataset has been successfully submitted and it is waiting for review. It is now time to showcase the management capabilities of CFReDS.
By logging into our private administrators site, the admins can review all the user submitted data that is currently available on the application. After logging in, a new navigation menu appears on the bottom. Each section allows the admins to manage a specific portion of the application.
The first tab of the data section is the data review tab. The data review tab focuses on all the data submitted by the users that hasn’t been approved yet. For now we can manage the datasets, the comments that we mentioned earlier, and the users’ requests that we will get to in a bit. Some other features have been implemented, but they are not crucial in the showcase of the application for now, so we’ll just skip them.

Once again, to better demonstrate how everything works, let’s just go back to our real-life example, and, let’s approve the dataset that John Smith has just submitted. We can search for it using a number of fields, but I’m going to use the author tag in this example.
Once found, we have three quick links that give us quick management options. First one is the details button: by clicking it, you can access all the information about the dataset that the user has selected. The next one is the authorize button, which allows us to make a dataset visible to everyone on the app. Finally, the delete button, which, you’ve guessed, makes the dataset disappear.
Let’s click on details, and we are taken to the full description of the iPhone iOS image. However, there is now an edit button next to all the fields which allows us to change the metadata if something seems wrong. This all looks fairly good, so let’s go ahead and make this dataset visible.
The second tab of the data section focuses on all the user submitted data that has been approved and is now visible to all the users. There is no need to go into much more detail as it is very similar to what we’ve just seen, but this is how we can unauthorise data at any time.
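The review flow shown in these two tabs amounts to a small state machine: a submission starts out pending, becomes visible once authorized, and can be unauthorized again. The Python sketch below is a minimal illustration under stated assumptions. The `ReviewQueue` class and its method names are hypothetical, and it assumes that unauthorizing a dataset returns it to the pending tab, which the demo does not spell out.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"    # submitted, awaiting admin review
    APPROVED = "approved"  # visible to every visitor


class ReviewQueue:
    """Tracks the review status of submitted datasets, keyed by title."""

    def __init__(self):
        self._status = {}

    def submit(self, title):
        self._status[title] = Status.PENDING

    def _set(self, title, status):
        if title not in self._status:
            raise KeyError(f"unknown dataset: {title}")
        self._status[title] = status

    def authorize(self, title):
        self._set(title, Status.APPROVED)

    def unauthorize(self, title):
        # Assumption: an unauthorized dataset goes back to the pending tab.
        self._set(title, Status.PENDING)

    def delete(self, title):
        del self._status[title]

    def pending(self):
        return [t for t, s in self._status.items() if s is Status.PENDING]

    def visible(self):
        return [t for t, s in self._status.items() if s is Status.APPROVED]


queue = ReviewQueue()
queue.submit("iPhone iOS image")
print(queue.pending(), queue.visible())

queue.authorize("iPhone iOS image")
print(queue.pending(), queue.visible())

queue.unauthorize("iPhone iOS image")
print(queue.pending(), queue.visible())
```

Keeping a single status per dataset means the two admin tabs are just two views over the same records, filtered by status.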
The next section in the administrative panel is the news section. Once again, it works exactly the same as the data section, but this one focuses solely on the news aspect of the application. We have no news pending approval as of now, so the first tab is empty, and you can see a placeholder news item on the other one.
Finally, we will go over one last use case to fully demonstrate the new CFReDS functionalities. What happens if you submitted a dataset and you’ve made a mistake or something has changed, or you would simply like to have it deleted? Well, you can head into the last navigation link, which is the dataset management tool. You simply have to select which dataset you are inquiring about and fill out a small form, giving us proper instructions. In our example, the iPhone iOS image was actually running iOS 14. So I just go in to ask the admins to add that tag to our dataset.

We are now back in the management site, and if we head to the user requests, we can see a new one giving us the instructions we just typed. We have a small issue though: we don’t have an iOS 14 tag just yet as it didn’t exist when the taxonomy was last updated.
This, very conveniently, takes us to the last section of the administrator panel that I’m going to share with you today. That is the settings section. Let’s delete this request and quickly take care of it. In this section, you can manage all of the application logic data that the website is using. We are now only focusing on the taxonomy because we need to add a new iOS 14 tag, but the other features, which aren’t relevant here, are also available.
Similarly to what we did when doing the graphical filtering, we can now scroll to the proper tree level and simply add the iOS 14 node, which will automatically create a tag at the proper level and propagate throughout the whole application. We can now update Mr. Smith’s dataset and we are done.
The application is currently running, and I’m sure there are a lot of things that we can improve on. So if you have any feedback or suggestions, please feel free to contact us. Thank you very much for listening, and have a good day.