
How Trustworthy Is Digital Evidence?


Posted Wednesday January 09, 2019 (16:55:38)
Leonhard Hosch shares his research at DFRWS EU 2018.


Leonhard: So, thank you. I’ll be presenting the paper ‘Controlled Experiments in Digital Evidence Tampering, or How Trustworthy is Digital Evidence?’ I’m Leonhard Hosch, I’m with Felix Freiling, I’m currently doing my master’s thesis, and … yeah, I work for him.

What actually is digital evidence? In our case, we use a quite simple definition. We say that the evidence we get is on a hard drive, and of course this isn’t always the case, but mostly. And how does it come into existence? There’s some input to some device, and depending on the input, something gets written to the hard drive and we have our evidence. But it’s not really that easy. I guess some of you already know this cartoon by P. Steiner from 1993 – actually, the year I was born. And yeah, that’s kind of a problem.

We try to see which input happened and which input leads to the evidence we have, and we have to reconstruct that. But obviously, it’s not that easy. Anybody can hit some keys on a computer, even a dog. So, that’s another problem, because the number of possible inputs is just really, really, really big, even compared to the space that we have now on our hard drives.

So, yeah, I don’t know how many of you know the Trojan horse defense in cybercrime cases. Do any of you know it? Raise your hands. Yeah, a couple of you. It’s like somebody … the police was there, took some evidence, and it’s taken to the court, and they say, “We found these masses of incriminating pictures on your device,” and the defendant says, “Well, actually, you found it on my device, but somebody else placed it there. Maybe I was framed by the police and the police officers put it there before they took the hash.” Or maybe, “Oh, I have some distant cousin that absolutely hates me and of course it was him.” Or maybe it was just a child surfing the internet and somehow finding shady links and putting them on the device.

So, that leads to our research questions: How easy is it to forge evidence? And which factors actually influence the effort needed to do so? We set up a quite simple scenario. We have some evidence, some hard drive – actually a system drive – and there was tampering on it before the first hash was calculated. An equivalent scenario would be that whoever tampered with the device also had the chance to change the hash, which is somewhat unrealistic.
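The integrity baseline this scenario hinges on is just a cryptographic hash computed over the raw image at acquisition time – anything changed before that point is invisible to the hash. A minimal sketch of how such an acquisition hash might be computed (the choice of SHA-256 and the chunked read are my assumptions, not from the talk):

```python
import hashlib

def image_hash(path, algo="sha256", chunk_size=1 << 20):
    """Compute the acquisition hash of a disk image, reading it in
    chunks so arbitrarily large images fit in constant memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Any byte flipped in the image before this hash is first computed and recorded produces the same kind of "clean" baseline as an untampered drive – which is exactly the window the experiment exploits.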

Our setting – fortunately we had a course called ‘Forensic II’. It was held from October 2016 to February 2017. We had 20 students, all of whom had already had basic IT forensics education from earlier courses, though they were otherwise completely inexperienced. But of course, they are now real IT security experts. We gave them a pre-study questionnaire and an evidence tampering task, and afterwards they had to perform some analysis to find evidence of tampering or, as it turned out, no evidence. And the documentation of effort was mandatory for every participant. In total, we used data from 14 participants: 14 actually did the questionnaire, and 13 participated in the experiments afterwards.

The tampering task: There was a given computer system R, a given time period T in the past, and some website W, and the task was to manipulate the device such that a forensic investigator would reach the conclusion that the website W had been visited during that time period T. We’ll take a closer look at that.

We had some image of some device R, and during the time period T – here, time goes from top to bottom – we had some … we call them originals, Q1 to Qn, and they actually had access to the website W in the time period T. Then there were images which had no access to W. We took such an image and gave it to the participants, and they got the first task: they had to manipulate that image so that it at least looked like the website W had been accessed. We called that stage 1.A, and yeah, they had to do the forging.

Afterwards, we put all the forged images and the originals into a random pool, redistributed them to the participants, and they had to switch roles. They no longer had to forge; instead, they had to analyze these images and write a short report on why or why not an image had been tampered with, or whether it was an original. We called that stage 1.B.

And then we did another round, with some slight changes. Again, we have our image of the computer R. Again, some had access to the website, some had no access. But this time, we didn’t give the images to the students. Instead, the participants had to write some short software or script or whatever they liked. We would take it, put it on the machines, execute the binary or script or whatever it was as root, and then we did the same thing as in the first round: we put all the images into the random pool, redistributed them, and gave them to the students. We called these stages 2.A and 2.B. And actually, the students were told that the probability of getting an image that was an original or a forgery would be 50/50, so that they had the least possible bias.

But of course, they knew that there were forgeries. That would maybe be comparable to a case where you get an image to analyze and there was a policeman involved about whom some people say, “He tampered with evidence,” and you get to know that – so that’s not too unrealistic.

Well, let’s take a closer look at what we actually found. Here, you can see the manipulation effort by the participants. This was the first round; we see that 13 participants actually did the task. There are quite different times, from really low, as low as 120, up to some students who really invested a lot of time. It’s in minutes, so 120 minutes is two hours.

And on the right, you see the chart for the second task. You see some students actually dropped out of the course, as expected. And it wasn’t really obligatory to do the task, so some just didn’t do it and still participated in the course.

What we can see here is that the average effort for manipulating an image was nearly the same, but the standard deviation was definitely a lot higher in the first round than in the second one. That could be due to learning or exhaustion effects – we don’t really know, but it’s interesting at least. And the median is actually a lot higher in the second phase, so maybe some invested more time?

The most interesting result is what the participants actually found out: whether the images were forged or not. And well, we were kind of surprised, but in hindsight we might have guessed it – every forgery was correctly identified as one. Most of the originals were also identified correctly, except for one. We see here somebody … this column is the effort for analyzing, and that student invested a lot of time – the second-highest, actually. And yeah, he found something that he thought absolutely couldn’t happen in a normal image, so this was obviously a forgery. What he found were some log files which showed that the time synchronization had been disabled, and … yeah, we still don’t really know why that’s in there, but it definitely wasn’t due to any tampering with the evidence. So, if you search long enough, you will find something that’s hard to explain, but possibly not impossible to explain. Like the talk we heard yesterday about the phones and battery voltage – that’s hard to find.

And yeah, also notably, this one student, 13, here … well, the task was to manipulate the image so that nobody would be able to tell the difference. What he actually did was put a lot of anti-forensics inside that image, like some terabytes of zeroes, and if you actually tried to boot that image, it would delete all files, and some other funny things. What he also did, which was fairly interesting, was delete log files. This was deemed the cause of the analyst’s effort being so high – it’s the highest one. The student who analyzed that image really had a hard time finding really good clues, because the log files were deleted. It was kind of obvious that there had been tampering, but it was hard to prove. So, that might be another approach you could take, so we’ll add it to the data.

I need my next slide.

So, yeah, let’s take another look at the times actually needed to classify images as non-originals or originals. The ones which were classified correctly as originals took an effort between 180 and 855 minutes, which means some people actually invested a lot of time. And the average effort to classify an original correctly was about 386 minutes. And yeah, okay, we got one original that was wrongly classified as a forgery, where the analyst invested a lot of time.

And what is interesting – classifying a forgery took an average of 680 minutes, which is a lot, lot more than it took to classify an original. This is most likely not due to more effort in documentation, because the reports they had to write were really short – only a couple of pages, with a more or less fixed format for submission. So, we actually think the students spent that time analyzing the image.

Next slide. Yeah, let’s take a look at the manipulation effort in the first round compared to the analysis effort in the first round. Generally, we can see that higher effort for forging also leads to higher effort needed to analyze that image. Also, the effort needed to analyze an image was generally lower than the effort needed to manipulate it, which is actually good for us, because it means that for any time we spend analyzing images, somebody needs to spend more time manipulating them. On the other hand, the exponential interpolation points slightly upwards, so maybe with a really, really big amount of time put into manipulating an image, our effort would increase even more. But it’s only a couple of data points – six – so we can’t be too sure about that.

So, what actually was the difference between the first and the second round, considering the effort for manipulation? Well, as you would expect, you need more time to forge an image that you can only put some software on, compared to one you have full control over. What is also interesting is that the students who put more effort into the first task also put more effort into the second task. So, maybe that number is also correlated with some abilities, or more connected to the students than to the analysis itself. That could also be a possibility. We don’t really know that much about how good these students were, because all images were classified correctly, apart from one. So, maybe some just used better tools, or had better scripting or better skills to be faster. It would actually be interesting to find out whether that was the case and how they did it. But you would need a couple more students for that.

Then, another one: the effect of control. Now we compare the effect of control, which means the effort in the first and the second round, not for manipulation but for the analysis. And again, interestingly, if the forger has less control over the image, the analyst actually has an easier job, because we expect that the images manipulated with perfect control were better. The probability of leaving traces that you didn’t realize is much smaller when you can look at the complete image and flip some bits or bytes, compared to having to execute software – then there are traces everywhere: log files, tampering with time, because you had to place the access in the past. And also, some [19:37], so that leaves traces too. You have to delete your own traces, which isn’t that easy, considering that the image was later analyzed, and if you just used a normal delete, the deleted data can still be reconstructed.
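The point about tampering with time can be made concrete: on a POSIX system a forger can set a file’s access and modification times into the past, but the inode change time (ctime) is bumped to "now" by that very operation and cannot be set from userspace – exactly the kind of inconsistency an analyst can look for. A small illustration (the function name and file name are mine, not from the talk):

```python
import os
import time

def backdate(path, when):
    """Set a file's atime and mtime to an arbitrary past moment.
    Note: this call itself updates the inode change time (ctime)
    to the current time, and ctime cannot be set from userspace --
    a classic trace a forger has to worry about."""
    os.utime(path, (when, when))

# e.g. pretend a file was last touched a year ago:
# backdate("visited.html", time.time() - 365 * 24 * 3600)
```

So the mtime says "a year ago" while the ctime says "today" – one of the inconsistencies between timestamps and log files that the analysts in the study exploited.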

So, you have to use a secure delete, and if you were really good, you also had to clean up the inodes and check that there were no suspicious numbers or ranges of numbers in them. Some students actually did look at the inodes, but most students had success identifying forgeries mostly by looking at log files, and some forgers even left really obvious traces, like not deleting their own manipulation software and scripts. That was obvious, of course.
And in the paper there is also a more thorough description of what the students did and how the forgeries were detected. It’s like … yeah, you can read the paper. [chuckles]
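The secure-delete point is worth spelling out: a normal delete only unlinks the directory entry, so the data blocks survive in unallocated space until they are reused. A rough sketch of the overwrite-then-unlink idea (a sketch only – journaling filesystems, SSD wear levelling, and the inode metadata itself can all still retain traces, as the talk notes):

```python
import os

def secure_delete(path, passes=1):
    """Overwrite a file's contents in place before unlinking it, so
    the old bytes are not sitting in unallocated blocks afterwards.
    This does NOT cover filesystem journals, SSD wear levelling, or
    the inode/directory entry -- exactly the residue the students
    were caught by."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))  # replace contents with noise
            f.flush()
            os.fsync(f.fileno())       # force the overwrite to disk
    os.remove(path)
```

Even done correctly, an overwritten-then-unlinked file can leave suspicious inode numbering gaps – which is why the really careful forgers in the study also had to inspect the inodes.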

So, let’s summarize the results. In our scenario, creating a good forgery was actually a lot harder than we expected. Also, it takes more time to analyze a forgery than it takes to analyze an original, which may lead to the conclusion that if, after some time, you don’t find any traces of a forgery, maybe there just isn’t one. Also, the better the forgery – the more time invested into it – the harder it becomes to detect. And the time needed for the analysis of a forgery has, at least in our cases, been shorter than the time needed to create that forgery. Actually, I like that.

However, there are some restrictions. The students were students – they had some experience, they had done some courses on digital forensics, but they are not really experienced investigators, or criminals and forgers. But it was the best we could get at the time.

But if you want to participate in such experiments, contact me, and we’ll set up a new one. [chuckles] More participants would actually be really helpful too. We could take a closer look at how the forgeries were detected if we had more participants and more images. We did look at them, but of course with a couple of students it wasn’t really enough to draw good conclusions.

So, is perfect manipulation a realistic setting – having access to an offline image before the hash sum is calculated and stored? Maybe you actually need to be an expert in forging such images to make use of that: you have to be able to really create such an image, or access one, or somehow get the physical hard drive, take it from a computer, forge it, put it back, and then tip off the police or something – that would be a real setting. It’s not that unlikely if you take some secret service as the attacker, because they have the manpower, the experience, and the money. So, maybe that perfect manipulation setting is only valid for really high-profile targets.

Yeah, of course with more students it would be easier to take a look at how the manipulations were detected, and maybe draw some conclusions in detail.
So, that’s it.

[applause]

End of transcript

 
