Christian Zoubek’s work on selective deletion was featured at DFRWS 2017.
Presenter: Unfortunately, Christian, who did the [00:16] on selective deletion in images, is ill, and he asked me, as the advisor for his Master's thesis, to give the presentation. So please excuse that I have shortened the detailed technical part of the presentation and that I have some notes with me. It was a spontaneous thing that I am now the presenter of this presentation.
A short outline: after an introduction, we will look at selective deletion, then take a wider look at the evaluation, and close with a short conclusion. To give you some background on the motivation of this Master's thesis: I work at a police station in [01:08] as a computer forensic examiner. The motivation for this thesis was that in law enforcement investigations, the search and seizure of digital evidence is a standard procedure, as we already heard this morning from Martin, and normally a bitwise copy of the digital evidence is created, even if the investigation is of a non-cybercrime nature. I talked on Monday to the judge responsible for all crimes resulting in death, such as murder, and she said that in most cases, up to 90 per cent, the digital evidence is significant for the court.
Martin already mentioned some of the problems facing law enforcement agencies: the mass of data, the needle in the needlestack, as you heard this morning. A possible solution is selective imaging, but that is not the topic here. And we face more and more specialized defense counsels; what that means, I will show you on the next slide. The possible solutions are selective imaging or selective deletion.
We already heard some buzzwords like wiretapping in the talks of Martin and [Daniel Spiegelman], but what does wiretapping have to do with selective deletion? A lot, because the legal considerations for wiretapping and for the storage and seizure of digital evidence are, in the [02:59], very similar. The first decision on this point was made by the German Federal Constitutional Court in 1957: the so-called Elfes decision was the first decision limiting the access to and usage of private information. This holds for wiretapping and for digital evidence in general.
So what does this mean? The quintessence of all the data privacy laws and acts [in the US] is that law enforcement is forced to spare out blocks of data irrelevant to the case. [It's forced, but] if it's done, this is not [03:42]. If this cannot be done during the imaging or acquisition process, the deletion has to be applied as soon as possible, and documentation of the deletion is mandatory. Sparing out the blocks during the acquisition process, or even deleting them afterwards, is not a common practice in German law enforcement agencies. There are several reasons for this. The first reason you will hear from law enforcement investigators is that the deletion of data modifies the image you have created from the evidence; the second is that the image may afterwards not be accepted in court.
Now, just to give you a short example: imagine you have a blog server hosting hundreds of blogs, and in a law enforcement investigation you get to the server. The [04:50] forensically correct approach would be seizure of the server, but nobody wants to seize the server and then put it on the [05:01].
The normal procedure would be imaging directly in front of the server.
But what happens if you image the whole server? You get a lot of case-irrelevant data, especially data from innocent bystanders, if the blog server hosts hundreds of blogs and only a few of them are involved in illegal activities.
Which questions arise? Under German law, the deletion afterwards is [05:30], but how do you have to delete this data? Deletion by the instructions implemented in the OS is, as everybody knows, not a real deletion; only the index is modified. The second possible [05:51] would be wiping, that is, zeroing out the content of the physically allocated blocks of the data. But we state that even then the metadata still yields enough information to reveal non-relevant details about one's privacy and private life.
The first thing is that no current forensic tool on the market allows direct modification or deletion in images. That is why [this] forensically sound software was developed. To extend this example to the metadata point: think about some private data stored on the blog server, maybe some holiday pictures, in a directory structure like 'Holidays 2017 Berlin' and 'Holidays 2018 France'. Even if you delete the content of the data, you may still get information about a person's private life out of the metadata; maybe you know where the person was in 2016, 2017, 2018, and so on.
Two problems arise with mass data, or with data and metadata. The first is how to classify which data is relevant or not. Martin already said that this is a human task; there is no tool that decides whether data is relevant. The second is how to delete this non-relevant data in the image without affecting all the other data on the image: you have to guarantee the integrity of all the other data.
Our definition of a forensically sound selective deletion would be, with respect to the law, that private non-relevant data has to be deleted and the integrity of the residual data has to be guaranteed. From the technical point of view, the wiping of the physically allocated blocks of the data has to be done, and our additional requirement is that all metadata that may yield information about one's private life has to be deleted too.
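The technical core of that definition, overwriting the physically allocated blocks of a selected object directly inside the image, can be sketched roughly as follows. This is a minimal illustration with hypothetical names, not Christian's actual plugin code; the cluster runs would normally come from parsing the file's MFT entry, but here they are passed in explicitly.

```python
def wipe_runs(image_path, runs, cluster_size=4096):
    """Zero out each (start_cluster, cluster_count) run in a raw image.

    Illustrative sketch: a real tool would derive the runs from the
    file's MFT data runs and would log every overwritten range.
    """
    with open(image_path, "r+b") as img:
        for start_cluster, count in runs:
            img.seek(start_cluster * cluster_size)
            img.write(b"\x00" * (count * cluster_size))

# toy usage: a four-cluster "image", wipe clusters 1 and 2
with open("toy.img", "wb") as f:
    f.write(b"\xff" * (4096 * 4))
wipe_runs("toy.img", [(1, 2)])
data = open("toy.img", "rb").read()
assert data[:4096] == b"\xff" * 4096          # cluster 0 untouched
assert data[4096:12288] == b"\x00" * 8192     # clusters 1-2 zeroed
assert data[12288:] == b"\xff" * 4096         # cluster 3 untouched
```

Note that the surrounding clusters are untouched, which is exactly the integrity property the residual data must keep.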
Let me say a few words about Christian's [08:33] task: to investigate whether a secure selective deletion tool is technically achievable or not. He realized this task as a plugin for the Digital Forensics Framework (DFF), which you see on the slide. Unfortunately, the DFF site went down last week. We bound this plugin to Microsoft's NTFS, because we wanted to focus first on one operating system and one file system.
During the implementation of the selective deletion tool, some more functionality beyond the deletion module was implemented. One module was the detection of duplicate files, which are not necessarily flagged. We will come back to the point that searching and flagging relevant data is a human procedure, and [09:40] a human may have skipped one file, so otherwise you would not get all files in your investigations.
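The duplicate-detection idea, finding unflagged copies of flagged files by content hash, can be sketched like this. The function name and file layout are illustrative assumptions, not the actual DFF plugin code.

```python
import hashlib
from collections import defaultdict

def find_duplicates(files):
    """files: mapping of path -> content bytes.

    Group paths by content hash and return every group with more
    than one member, i.e. sets of identical files. A flagged file's
    unflagged copies elsewhere in the tree show up in its group.
    """
    groups = defaultdict(list)
    for path, content in files.items():
        groups[hashlib.sha256(content).hexdigest()].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]

dupes = find_duplicates({
    "/blogs/a/img.jpg":  b"JPEGDATA",
    "/blogs/b/copy.jpg": b"JPEGDATA",   # identical copy in another blog
    "/blogs/c/other.jpg": b"OTHER",
})
assert dupes == [["/blogs/a/img.jpg", "/blogs/b/copy.jpg"]]
```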
The second part of the implementation was a basic partition table parser, because the original partition table parser of the DFF framework was broken. Another module was the detection of carved files, the so-called [10:08], which manages to show carved files from the former format of the device. The next was hard link detection, to prevent, if more than one file is linked to the same content, that the residual data is [10:30]. Then there is the calculation of hash trees, which we will come to later; this is for the integrity checks.
This is just a short overview of the deletion module; as I said in the introduction, I am not so familiar with the technical implementation. The main module is the deletion module. Besides zeroing out the content of the allocated data blocks of the selected data objects, the deletion module performs the deletion of the metadata. If you look at the MFT entries, it is not possible to zero them out totally; the image would then be in a corrupt state. So the MFT entries, especially with regard to fixup values, have to be modified in a special way.
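A rough model of the MFT handling just described: instead of zeroing the whole 1024-byte record, the record header and the per-sector fixup bytes are preserved so the volume still parses. This is a simplified sketch under assumed sizes and offsets, not the thesis implementation; real records need the exact header length from the record itself.

```python
MFT_RECORD_SIZE = 1024
SECTOR_SIZE = 512
KEEP_HEADER = 56  # assumed: FILE header incl. update sequence array

def scrub_mft_record(record):
    """Zero the attribute area of one MFT record, keeping the FILE
    header and the last two bytes of each sector (the fixup values)
    so NTFS fixup verification still succeeds after deletion."""
    out = bytearray(MFT_RECORD_SIZE)
    out[:KEEP_HEADER] = record[:KEEP_HEADER]
    for end in range(SECTOR_SIZE, MFT_RECORD_SIZE + 1, SECTOR_SIZE):
        out[end - 2:end] = record[end - 2:end]  # preserve fixup bytes
    return bytes(out)

record = b"FILE" + b"\xaa" * 1020               # toy 1024-byte record
out = scrub_mft_record(record)
assert out[:56] == record[:56]                  # header kept
assert out[510:512] == b"\xaa\xaa"              # sector 1 fixup kept
assert out[56:510] == b"\x00" * 454             # attribute area wiped
```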
If you modify the NTFS file system and delete some files or folders, especially folders, the underlying B-tree structure would become corrupt, so another part of the deletion module is the B-tree update. There are different routines depending on whether the deleted data object is on a leaf level or on a node level, but at this point let me refer to the thesis for the technical details.
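The leaf-versus-node distinction can be illustrated with a toy B-tree delete. This is heavily simplified (no rebalancing or underflow handling, which a real NTFS index update would need): a key in a leaf is simply dropped, while a key in an internal node must be replaced by its in-order successor so the tree stays ordered.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty children -> leaf

def delete_key(node, key):
    """Toy B-tree deletion showing the leaf/node case split only."""
    if key in node.keys:
        i = node.keys.index(key)
        if not node.children:            # leaf level: just drop the key
            node.keys.pop(i)
        else:                            # node level: promote the
            succ = node.children[i + 1]  # in-order successor from a leaf
            while succ.children:
                succ = succ.children[0]
            node.keys[i] = succ.keys.pop(0)
    elif node.children:                  # descend toward the key
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        delete_key(node.children[i], key)

left, right = Node([5, 10]), Node([30, 40])
root = Node([20], [left, right])
delete_key(root, 10)                     # leaf case
assert left.keys == [5]
delete_key(root, 20)                     # node case: 30 is promoted
assert root.keys == [30] and right.keys == [40]
```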
The hash calculator module was implemented to guarantee the integrity of the residual data. With the hash calculator module, two hash trees are calculated, one before the deletion and one after it, and the comparison of the two hash trees should show the same modifications as the log file created during the deletion.
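The before/after check might look like this sketch: build a Merkle-style hash tree over the image's blocks before and after deletion; only the leaves covering wiped blocks should differ, and those differences should match the deletion log, while an unchanged image reproduces the same root. All function names here are illustrative assumptions.

```python
import hashlib

def merkle_levels(blocks):
    """Build all levels of a hash tree over the given data blocks.
    levels[0] are the leaf hashes; levels[-1][0] is the root hash.
    An odd node at the end of a level is paired with itself."""
    level = [hashlib.sha256(b).digest() for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [
            hashlib.sha256(level[i] + level[min(i + 1, len(level) - 1)]).digest()
            for i in range(0, len(level), 2)
        ]
        levels.append(level)
    return levels

def changed_blocks(before, after):
    """Indices of leaf blocks whose hashes differ between two trees."""
    return [i for i, (a, b) in enumerate(zip(before[0], after[0])) if a != b]

blocks = [b"A", b"B", b"C", b"D"]
t_before = merkle_levels(blocks)
t_after = merkle_levels([b"A", b"B", b"\x00", b"D"])  # block 2 wiped
assert changed_blocks(t_before, t_after) == [2]       # matches the log
assert t_before[-1][0] != t_after[-1][0]              # root reflects it
```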
Just to give you a short overview of the evaluation: there was an experimental setup with two scenarios. The first scenario was set up to show the correct functionality of the implemented [12:57]. It consists of seven test cases. In some test cases directories were deleted; in others, resident and non-resident files were deleted. Another test case used a reformatted device to show the correct functionality of the [power cleaner] module, and two of the test cases worked with a bootable Windows to show that the B-tree update works correctly.
The second scenario was the comparison against an existing implementation in professional software, in this case [X-Ways] Forensics. We set up USB sticks with many directories and cloned, i.e. duplicate, files in different directories, then deleted one whole folder and the duplicates of its files across the image.
The evaluation of the first scenario showed the correct functionality of the implementation, without major problems. The data content was erased, the B-tree update was managed properly, and the image and the disk were mounted properly. Even the Windows partition could be booted. There was one exception with the Windows partition: we tried to delete a user's home directory, and after booting Windows, a pop-up appeared warning that Windows was broken; the user directory was recreated by Windows, but there was no data in it, just the empty directory. Afterwards, Windows worked correctly.
We cross-checked that with other tools, such as FTK, which confirmed that all metadata and all data content in the physically allocated areas were zeroed out.
The second scenario was another test setup; first, both tools had to find the duplicates. The professional software deletes the files by sparing out only the data content, so the metadata is still usable; even the full file names could be found. The professional software only zeroes out the physically allocated data blocks of the selected data objects. The input image was not modified by the professional software because, as I said before, no professional forensic software is able to modify images. It [15:51] in that case created a so-called cleansed image out of the original image.
So all [EnCase] does is this: the marked entries are "deleted" by skipping their file contents during the copying process. In comparison, our tool operates directly on the image, as I said. There is no original image and a cleansed image; you only have one image, and if you delete something on this image, it is gone. So be careful if you use it. And we did a further verification of the correctness of these two evaluations with FTK, which [16:42] that implementation.
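The "cleansed image" approach described above, copying the image while replacing flagged content with zeros, can be contrasted with in-place deletion in a short sketch. Names and cluster granularity are hypothetical, not the vendor's documented behavior.

```python
def cleanse_copy(src_path, dst_path, flagged_clusters, cluster_size=4096):
    """Copy an image cluster by cluster, zero-filling flagged clusters.

    The source image is left untouched; the output is a separate
    'cleansed' image, unlike in-place selective deletion, which
    overwrites the single original image directly.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        cluster = 0
        while True:
            data = src.read(cluster_size)
            if not data:
                break
            dst.write(b"\x00" * len(data) if cluster in flagged_clusters
                      else data)
            cluster += 1

# toy usage: three clusters, cluster 1 flagged as irrelevant
with open("src.img", "wb") as f:
    f.write(b"A" * 4096 + b"B" * 4096 + b"C" * 4096)
cleanse_copy("src.img", "dst.img", {1})
assert open("src.img", "rb").read()[4096:8192] == b"B" * 4096  # untouched
assert open("dst.img", "rb").read()[4096:8192] == b"\x00" * 4096
```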
Just to show you: at the top of the slide you see the original image; [16:53] this is similar to the one you see on the [X-Ways] page. [X-Ways] only spares out the content blocks, the file content of file C or file A, during the copying process, while the selective deletion [17:10] for DFF not only spares out the content blocks but also modifies the MFT entries.
So let me come directly to the conclusion. Christian showed in his Master's thesis that a practical, prototypical selective deletion tool is possible. All of the legal requirements and our additional demands were fulfilled by this tool: not only was the file content in the relevant physical blocks wiped unrecoverably, the metadata was wiped as well, and the residual data stays untouched. With the usage of the hash trees [18:01], the verification of the data integrity is guaranteed, and continuous logging of every single step is documented [18:13].
That leads me to the last point, and to a problem: the logging during the deletion could also reveal information about bystanders. But everybody knows you should do your documentation and logging, so maybe you should keep the log files separate from the image.
So that’s all. For questions, please contact Christian.[laughter, applause]
Presenter: Thank you. Just one more word: the tool may be available, I don't know, maybe from next week, on this home page.
Host: Alright, thank you. Do we have any questions?
Audience member: Hi. I might have two questions in one. [19:09] If I am working in a forensically sound manner, all of my steps need to be auditable by a third person. So if I do a deletion out of the image, I need to maintain the image so that the step can be audited. But if I do not conduct logging … that is some kind of paradox, so …
Presenter: Yes, it is, of course. But it is the same problem with wiretapping: if you do wiretapping in an investigation, you have to delete parts from the record on the tape, and you write down what you deleted, or at what timestamp you deleted, and you [didn't have to know which one]. So yes, it's not forensically sound … it's not repeatable. This is the point.
Audience member: And the second part would be: I have to search for evidence both supporting the allegations and relieving the suspect. So if I delete a part where I, as an investigator, say it has nothing to do with the case, I expect the defender in court to stand up and say, "Okay, maybe you deleted evidence that would relieve my suspect. Prove that this did not happen."
Presenter: Yeah. The first point is that the court has to rely on you not deleting any relevant data. I think the tool is for the storage in the law enforcement agency [20:41], not for the court. It is not for deleting specific data and giving the image or the device back to the suspect. It is for the storage, if you have to delete private data as [in the Wiretapping Act], but it's forensically … yeah.
Host: Thank you. Any other questions?
Audience member: First, I find the [21:08] very elegant, but I was wondering whether the [21:13] … whether it would be enough to just calculate that afterwards, or whether you would want to create the [21:23] tree root already at acquisition, so that you have to [21:28].
Presenter: You calculate two hash trees, one before the deletion and one after the deletion. And this is deletion directly on the image, so normally, if you have an investigation, you calculate [21:44] …
Audience member: My question was whether that is sufficient for tracking the chain of custody, or whether you should already calculate the hash tree when you are acquiring your image.
Presenter: No. During the acquisition, you just calculate normal hash values with the tool of your choice. And this is just another [22:08] … it's just for the image. It's not for the device; it's really for the evidence file, the image or Expert Witness Format image or whatever. So not during the acquisition, [22:19].
Host: Any other questions?
Audience member: Question: which side effects are to be expected? If I understand right, there is no effect on the local file and [22:32], and nothing relevant in the registry. But what about effects of the tool itself?
Presenter: About the traces from the tool in the image?
Audience member: Yes.
Presenter: There are no traces. I think there are no traces. [laughter]
Presenter: We haven't seen any traces in our evaluation. So it's not "there are no traces". Yeah.
Host: Any other questions? [Hans]?
Audience member: So I can see that this would work for complete files; servers with a database application are probably not good targets, but for personal computers I think it would work well. The registry, though, might of course contain all kinds of privacy-related information. Did you think about a solution for cleaning up the registry?
Presenter: No, not about that data, or metadata. Not even the Internet Explorer history. We only thought about the MFT entries and that kind of metadata.
Audience member: Thank you.
Audience member: [23:38] slack space. Do you also wipe the slack space?
Presenter: That’s a good question.[laughter]
Presenter: Um … Christian?[laughter]
Presenter: No, I’m sorry, I can’t answer this question. About this [slack space], I don’t know. Sorry.
Host: Okay. Thank you very much, [Hans], in [24:08]. [applause]
End of Transcript