Jacques Boucher discusses his work at DFRWS EU 2018.
Jacques: Thank you. Alright. First of all, I’ll start off by thanking U-City, thanking [An], who’s my prof, who [00:13] be here today, but actually he’s teaching in Sri Lanka, so he wasn’t able to be here.
So, this presentation will be a lot less technical than some of the other ones, although it’s still very much related to forensics. The agenda will look at the motivation for why I undertook this research; what the objective was; did a quick survey of practitioners, I’ll talk about that when we get to it; we’ll go through the framework; we’ll look at the application of the framework; and then, conclusion and future work.
The motivation was in… today, application developers strive to make users’ experience seamless between your devices. So, whether you got your Mac or your iPhone or even in the Android environment, with Google, everything is seamless. So, Google Chrome, you log into one device, and you go through the other device, everything syncs over – which is great for the end user, but it does create a challenge for us as forensic examiners. How can we show that the activity that was found on one particular device actually occurred on that device and didn’t sync from another device, and why would that be important?
Well, if I seize a laptop from an individual who was accused of navigating to child porn websites, I analyze the browser, I find they’ve navigated to the child porn websites, we get to court, the user takes the stand, says, “Your Honor, that wasn’t me. I’ve got a tablet at home, and my tablet logs into my Chrome account. One of my kids or one of their friends must have gone to that. It wasn’t me.” So, if we don’t look at synced data, how can we address that defense in court?
So, the objective was to develop a framework that computer forensic examiners can use to attempt to answer the question, “Is it local or is it synced data?” One of the things I did, and this was kind of last-minute – the beginning of the week, I sent out a really quick survey, not a very well-constructed survey I realized once I started seeing answers come in. But it still gave us a good picture of … this was a survey to about 400 practitioners, and I got about a 10% response.
I asked, do you look for artefacts … at whether your artefacts are synced or local, when you’re doing your forensic analysis? And 45 or almost 46% said they never look at that. And another 30% only look at it between 10-30% of the time. And if we look at those who look at it always, it’s only 8%. So, we see it’s not something that examiners are currently looking at very often.
And when I asked them why do you not look for it, we had 42% said because they never thought of it. And then, we have another 11%, they don’t see the relevancy of checking for it. Another 11% said it’s too time-consuming. And another 34% said they don’t know how they would look for that. And again, the reason for the framework.
So, the framework itself seems quite trivial when we look at it through this graph here. We have your operating system analysis, your application analysis, and timeline analysis. You take all of that into consideration, and from there you form an opinion.
If we flesh that out a bit more, [there] your application analysis – this is the framework I came up with. And bear in mind, I’m sure there’s … improvements [will be able] to come from this. But what you want to look at as forensic examiners, you’re going to have to look at does your application support syncing of data? If it doesn’t, then you’re done. It doesn’t sync data, there’s no need to worry, you know it was created on that device. Does the application log the syncing that it does, if it does sync? Can you selectively choose what syncs? Can you opt in or out of syncing? And there’s another series of questions there. So, as I work through my research, this is the research I did on the application.
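The application questions above can be written down as a small record per application. This is only a minimal sketch: the class name `AppSyncProfile`, its fields, and the wording returned by `next_step` are hypothetical illustrations, not part of the framework as presented.

```python
from dataclasses import dataclass


@dataclass
class AppSyncProfile:
    """Answers to the application-analysis questions (hypothetical field names)."""
    name: str
    supports_sync: bool
    logs_sync: bool = False       # does the app record what it synced?
    selective_sync: bool = False  # can the user choose what syncs?
    can_opt_out: bool = False     # can the user turn syncing off?


def next_step(profile):
    """Suggest where the examiner goes next, based on the answers."""
    if not profile.supports_sync:
        return "done: artefacts were created on this device"
    if profile.logs_sync:
        return "use the application's own sync records"
    return "corroborate with non-syncing artefacts, OS logs and timeline"
```

For an application that does not sync at all, `next_step` stops immediately, which mirrors the "then you're done" branch described in the talk.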
In my case, I used Google Chrome and Windows 10 for the environment to do my testing. For Google Chrome, I had to answer all these questions. And one of the challenges that became obvious … and just to give you a bit of background, I teach Google Chrome forensics at the [Canadian] Police College, and I’ve been researching Google Chrome for some time now. And I have a fair bit of knowledge built up on Google Chrome. But despite all that, there was still a considerable amount of effort I had to put into answering some of these questions with regard to Google Chrome. So, imagine if it was an application that an examiner’s not familiar with. So, one of the challenges is the lack of documentation on these applications.
The other step was the operating system analysis. Does your operating system track when a device is powered on or off? Does it track when a user logs in or out? Does it track when there’s network connectivity? For example, in the case of the application analysis, its strength is … in the case of Google Chrome for example, it tracks whether a URL is local or synced. Other applications may not. But if the device is tracking that, then that’s good information, and that’s fairly reliable.
In the case of your operating system, it’s not as good a part of the framework to determine whether something was local. It’s better to tell you whether something was synced. So, for example, if certain sites were navigated on a certain date and time, and you go and you look at the logs, the event logs in Windows, for example, and you determine that the computer wasn’t even powered on at that time, you can then form an opinion that, in all likelihood, that data synced from the other device. In absence of mistakes with the time … like the time has been changed on a device. But putting that aside, once you’ve ruled that out – because I do have that in part of my framework as well – you can pretty much say that this happened on another device, because the device wasn’t even powered on, or the user wasn’t even logged in.
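The operating system check described here amounts to asking whether an activity timestamp falls inside any period when the device was powered on. A minimal sketch, assuming the power-on and power-off times have already been extracted and the clock has been ruled out as a source of error; the function names and the wording of the opinions are hypothetical.

```python
from datetime import datetime


def device_was_on(activity_time, power_sessions):
    """True if activity_time lies inside any (power_on, power_off) interval."""
    return any(start <= activity_time <= end for start, end in power_sessions)


def os_opinion(activity_time, power_sessions):
    """The OS step is asymmetric: strong for 'synced', weak for 'local'."""
    if device_was_on(activity_time, power_sessions):
        return "inconclusive: device was on, activity could be local or synced"
    return "likely synced: device was not powered on at that time"
```

The asymmetry is deliberate: a powered-off device supports the synced opinion, while a powered-on device only shows that local activity was possible.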
The last part of the framework is the timeline analysis. Now, the timeline analysis is arguably the least developed part of the framework, from the research I’ve done, it’s quite a bit more challenging. What I did for the timeline analysis is I attempted to look at what does the timeline of the system look like when there’s no activity on it, when it’s just powered on, no user logged in? Then I did another one with the user logged in, but no activity, then another one using the device, but not using Google Chrome, which is what I was using [for] my application platform for testing. And then another one when I was using Google Chrome. And I observed the different timelines. A problem I faced here is that when I used log2timeline to create my timelines, I noticed that if I ran log2timeline more than once against the exact same dataset, the exact same virtual machine, it had different results. It carved out a different number of events each time. So, clearly, it wasn’t a reliable tool to create my timelines. So, my fallback was I used X-Ways, and used that to carve out a timeline.
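The repeatability problem described here can be checked by comparing two exports of the same evidence row by row. A minimal sketch, assuming each run has been exported to a list of text rows (one event per row); `compare_runs` is a hypothetical helper and not a feature of log2timeline or X-Ways.

```python
from collections import Counter


def compare_runs(run_a, run_b):
    """Compare two timeline exports given as iterables of event rows (strings)."""
    a, b = Counter(run_a), Counter(run_b)
    only_a = a - b  # events present in run A but missing from run B
    only_b = b - a  # events present in run B but missing from run A
    return {
        "identical": not only_a and not only_b,
        "only_in_a": sum(only_a.values()),
        "only_in_b": sum(only_b.values()),
    }
```

Using `Counter` rather than `set` keeps duplicate rows, so a run that emits the same event twice is still reported as different.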
So, you want to look at … the timeline will give you more of a graphical view of what was happening on that system. But one of the challenges with the timeline is what looks normal on my system is going to be very different from what looks normal on someone else’s system. It’s going to depend on what applications I have installed, what launches at startup. So, it’s trying to come up with a way to determine, when I look at a timeline, can I tell from the activity in there whether there was [a user at a keyboard] or not at that time. Because here again, if the system’s powered on, the user’s logged in, but there’s no user at the keyboard, then arguably, user data that’s been created on that device during that period of time on a timeline likely didn’t occur on that device, but it would have come from another device.
The application of the framework – I looked at Google Chrome and Windows 10. With Google Chrome, [like I said,] fortunately, Google Chrome tracks what URLs are synced from another device and which ones are local. But if we cast that aside, how would we approach it? And that’s what I did in this case here. Because not all applications are going to let you know what’s synced and what’s local.
In the case of Google Chrome, I did the research and, as many probably in this room know, Google Chrome uses SQLite databases for its backend for all of its artefacts. The artefacts that I found as synced were the history, but only the typed URLs. Other activity in there would not sync, unless there was an equivalent typed URL to it. Form data would sync, login data would sync. What doesn’t sync is your cache, your cookies, your network action predictor, shortcuts, downloads, those are all things in Chrome that don’t sync.
Here’s an example – so, we have a site that was visited, that’s the Canadian Police College website. So, it’s visited on 2018-03-18 at 2011 hours. So, when I run an SQLite query against the history, I can see where it was visited here – we see we have a typed URL and we have clicks on the link here.
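A query of the kind described here might look like the sketch below. It assumes a working copy of Chrome's History file and the commonly documented layout: a `urls` table, a `visits` table whose `url` column points at `urls.id`, timestamps in microseconds since 1601-01-01, and the core transition type in the low byte of `transition` (0 for a link click, 1 for a typed URL). Verify these against the Chrome version under examination; the function names are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

WEBKIT_EPOCH = datetime(1601, 1, 1)  # Chrome stores microseconds since this date
CORE_MASK = 0xFF                     # low byte holds the core transition type
TRANSITION_NAMES = {0: "link", 1: "typed"}

QUERY = """
SELECT urls.url, visits.visit_time, visits.transition
FROM visits JOIN urls ON urls.id = visits.url
WHERE urls.url LIKE ?
ORDER BY visits.visit_time
"""


def webkit_to_datetime(microseconds):
    """Convert a Chrome/WebKit timestamp to a naive UTC datetime."""
    return WEBKIT_EPOCH + timedelta(microseconds=microseconds)


def visits_for(conn, fragment):
    """Return (url, visit time, transition name) for URLs containing fragment."""
    rows = conn.execute(QUERY, (f"%{fragment}%",)).fetchall()
    return [
        (url, webkit_to_datetime(ts), TRANSITION_NAMES.get(tr & CORE_MASK, "other"))
        for url, ts, tr in rows
    ]
```

Against a copied History file, `visits_for(sqlite3.connect("History"), "example.org")` would list each visit with its time and whether it was typed or clicked; the path and the domain here are placeholders.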
The next thing I did is I went and I looked at cookies, because I know cookies, in the case of Chrome, don’t sync. So, when I looked at cookies, I found that indeed there were cookies of the Canadian Police College – which I knew there would be, obviously, because it’s a scenario I created. But seeing these cookies for the Canadian Police College then tells me, okay, these cookies, they don’t sync in Chrome, so therefore, it had to be a local activity. The time of the cookies matched the time of the history activity.
The next thing I looked at was the network action predictor. Same thing – we have some activity here for the Canadian Police College, and now, in this case, there’s no timestamp in this particular SQLite file, but I know this is activity that does not sync, based on testing I did.
The other thing is in the cache – cache doesn’t sync in Chrome. So, in looking at the cache activity, I was able to find cache for this particular website that had been surfed, and that was found in History.
So, based on all that, what allowed me to formulate an opinion is the fact that cache exists, the fact that network action predictor exists, the fact that just cookies exist, for this URL that I found in History, it allows me to form an opinion that that surfing had to have happened on that local device, it didn’t sync from another device. So, if the accused [10:52] tries to say, “No, Your Honor, that wasn’t me. It must have synced from one of my other devices,” you can say, “Nope, that’s not possible, because these artefacts would not exist if it was synced data.”
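The reasoning in this step can be sketched as a simple rule: if a history entry is accompanied by artefacts that never sync, the visit is attributed to the local device. The set below lists the Chrome artefacts named in the talk as not syncing; the function name and the "undetermined" outcome are assumptions added for illustration, since missing artefacts (for example, a cleared cache) do not by themselves show that data was synced.

```python
# Chrome artefacts reported in the talk as not syncing between devices.
NON_SYNCING = {"cookies", "cache", "network_action_predictor", "shortcuts", "downloads"}


def application_opinion(artefacts_found):
    """Return (opinion, supporting artefacts) for one URL found in History."""
    corroborating = NON_SYNCING & set(artefacts_found)
    if corroborating:
        return "local", sorted(corroborating)
    # Absence of non-syncing artefacts is not proof of syncing.
    return "undetermined", []
```

For the scenario in the talk, the history entry plus cookies, cache and network action predictor records would yield a "local" opinion supported by three independent artefacts.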
The strength of the application analysis – it gives you a higher confidence that the artefacts are from the applic- … because your artefacts are from the application itself. So, depending on the app, some will track it better than others. If you end up having an application that syncs absolutely everything seamlessly and there’s absolutely no trace, it becomes a lot more challenging.
In the case of Windows 10, what you can go look at is … in my case, I looked at event logs. You can run XML queries against the event logs to pull out events from there. So, to log on, log off, network connections – if there’s no network connectivity at the time of certain activity that is internet activity, then obviously it couldn’t have happened on that local device, because the internet wasn’t present, the connection wasn’t present to the internet.
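Once events have been pulled from the logs, the relevant ones can be filtered and ordered. The sketch below assumes the events were exported as Windows event XML wrapped in a single root element, and it uses a small selection of well-known event IDs (4624 logon and 4634 logoff in the Security log, 6005 and 6006 for the event log service starting and stopping in the System log) as stand-ins for the talk's log on, log off and power events. The talk does not name specific IDs, so treat the selection as an assumption; network connectivity events are left out.

```python
import xml.etree.ElementTree as ET

EVENT_NS = "http://schemas.microsoft.com/win/2004/08/events/event"
NS = {"e": EVENT_NS}

# Assumed selection of event IDs; extend it to suit the examination.
EVENTS_OF_INTEREST = {
    4624: "logon",
    4634: "logoff",
    6005: "event log service started",
    6006: "event log service stopped",
}


def extract_events(xml_text):
    """Return sorted (time, event id, label) tuples for the events of interest."""
    root = ET.fromstring(xml_text)
    results = []
    for event in root.iter(f"{{{EVENT_NS}}}Event"):
        system = event.find("e:System", NS)
        event_id = int(system.find("e:EventID", NS).text)
        if event_id in EVENTS_OF_INTEREST:
            when = system.find("e:TimeCreated", NS).get("SystemTime")
            results.append((when, event_id, EVENTS_OF_INTEREST[event_id]))
    return sorted(results)  # ISO-style timestamps sort chronologically
```

The resulting list gives the log on, log off and service start and stop times that the operating system step of the framework compares against the application's activity.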
For the OS analysis, what it allows you to say is, in this case, the system was powered on at the time, there was a user logged in, there was a network connection that was present. So, it corroborates that it could have happened on the device, but it doesn’t give you the same level of confidence as … [did it happen] on the device. Because it’s still possible that, as I’m actively surfing on this device, Mark could be actively surfing on the device over there that’s synced to the exact same Chrome account, and it would sync to my computer here. The fact that I’m logged in corroborates that it could be local activity, but that by itself won’t tell you it’s local activity. You have to use other … the application artefacts.
The timeline … this is just a sample one from the research I’d done, where, in this particular case … and you can’t see it on the timeline right here because there are so few activities. But the internet activity in this area here, and this is where the system shut down, and this is where the system powered back on. So, the timeline can give you a real quick snapshot if there is activity during this period here of browsing. And you see nothing else, the timeline is pretty flat. It allows you to form an opinion here again, that in all likelihood, there was no user at the keyboard, the system wasn’t being used at the time that those artefacts were created.
As I said, it provides you a visual overview. But the problem with the timeline – it’s only as good as the tool that creates it. And as I mentioned earlier, with log2timeline it created some problems. It created different events every single time. And arguably – or, not arguably – obviously, it’s the least developed of the three steps of the framework here.
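The "flat timeline" observation can be approximated by counting events per hour and flagging hours that contain browsing artefacts but almost no other system activity. This is a rough sketch only: the one-hour bucket and the `threshold` value are arbitrary assumptions, and what counts as a quiet system differs from machine to machine.

```python
from collections import Counter
from datetime import datetime


def events_per_hour(timestamps):
    """Count events in each clock hour."""
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)


def quiet_hours(system_times, browsing_times, threshold=1):
    """Hours with browsing artefacts but at most `threshold` other system events."""
    system = events_per_hour(system_times)
    browsing = events_per_hour(browsing_times)
    return sorted(hour for hour in browsing if system.get(hour, 0) <= threshold)
```

Hours returned by `quiet_hours` are candidates for synced activity, to be weighed alongside the application and operating system findings rather than relied on alone.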
Before I get into the conclusion part, I just want to give you a real-world scenario. I’ve got a colleague of mine working for an international enforcement agency where they had three … one shared Skype account with chats from three different individuals. So, one person would log in, they would chat, they would log out, someone else would log in, they would read the draft that was left on that same account, and they would log out, the third person would log in. And they were in three different geographical areas. So, the investigator contacted me at the time when I was doing my research, and he said, “Is there any way to prove which part of these chats happened on which device, in order to attribute those chats to specific individuals,” from these three individuals he had.
I just messaged them this morning, and turns out there actually weren’t three individuals, there were only two, who were both within Europe. They were pretending there was a third individual, making it look like there was a third individual involved in that chat, because it had to do with defrauding for taxes or something, making it look like there was another individual there, and then making claims. So, it was … but he said it was very messy in this case here, because the people were actively chatting at the same time, and all this data was syncing between their devices. So, it became very challenging.
So, there are some real challenges when that happens. But in your typical scenario, where a user is using one device at a time, your artefacts would be a lot cleaner, to be able to determine on which device the activity happened.
The conclusion – it’ll become more prevalent and seamless in the world of Internet of Things, the potential to become the next go-to defense. I liken this to the malware defense. Back in the early 2000s, when I started in forensics, malware was becoming more and more relevant in our forensic analysis. A lot of forensic examiners were kind of casting it aside and keeping their fingers crossed they wouldn’t be challenged in court on it. But of course we got challenged in court on it, so we were forced to develop a process on how do we look for malware on a device, and if we do find it, how do we deal with it.
This is what’s happening with synced evidence. We are going to get challenged on it, defense is going to go to court, and they’re going to say, “Your Honor, it didn’t happen on that device. My client’s not guilty. It’s synced from another device.” So, unless we have a framework on how we address that, how do we deal with it, we’re not going to be able to respond to those challenges.
The application analysis, [as I said] it’s got the most potential. The OS analysis, the strength is to identify synced data – so if the OS is showing the laptop is shut off, it gives you pretty good evidence. But if the OS says the laptop is on, the user is logged in, it doesn’t give you as much confidence to be able to say what exactly happened. And the timeline is only as good as the tool that created it. But it can provide you a quick snapshot.
For future work, the framework could be tested with other applications and other operating systems. In the amount of time that I had doing my research, I was only able to test it on Windows 10 and Google Chrome, and that still amounted to over 300 hours’ worth of research. So, it’s very time-consuming.
The tool to compare … another possible future work is a tool to compare application artefacts across multiple devices. So, if you’ve got a laptop and a desktop and a mobile phone that you’ve seized, and you could examine, for example, Chrome or Skype across all three of those devices, and do some analytics to determine, okay, based on what I’m seeing on all three of those devices, this activity within Chrome or Skype happened here, this activity happened over here, and this activity happened over here.
The other possibility is with machine learning. Could you do a binary classification model? And I’m going to say that I’m not a data scientist by any stretch of the imagination, but I do recognize that there’s potential value. So, could we use machine learning to determine whether or not certain activity was synced or not? Or whether or not there was a user at the keyboard or not? I expect it would be challenging because of all the different platforms, different applications could be running. But I know there’s experts in the room here, so hopefully some will be able to see opportunities here to apply that.
The other thing is we don’t have a credible repository of research that can be shared within the international community. As I said, I had to do extensive research on Chrome, but if somebody faces that elsewhere in the world who doesn’t have access to my research, they’re going to have to repeat all that research, or the research I’ve done on Windows, they’ll have to repeat that research. So, we do need a way to centralize all that research so it can be accessible, so that people aren’t repeating that whenever they have to answer these questions.
And that’s it. Thank you very much. Any questions?
[applause]
End of transcript