Turbinia: Automation Of Forensic Processing In The Cloud

Thomas Chopitea and Aaron Peterson discuss their research at DFRWS US 2018.

Thomas: It’s our first time presenting here at DFRWS. It’s also my first time and Aaron’s first time attending. So, we’re pretty excited to be here. I’m Tom, this is Aaron. We both work at Google. We do forensics and incident response. So, this basically means that we write a lot of code, because we’re lazy and we like to do things automatically. Aaron is the core developer of Turbinia, and I’m one of the core developers of dfTimewolf, which we will introduce in a minute.

We’re in this sweet position where we can write a lot of code and also use the same code that we write in our daily incidents. So, that’s pretty cool, because we don’t have to file feature requests … well, we do, but it’s just between the both of us. But most of the time, since we’re on the same team, we can get things solved pretty fast. And we really know what to expect and what we want our tools to do. So, that’s pretty cool.

And the ambition for both of us – and I guess the same goes for the [01:05] team – is to automate ourselves out of a job, so we can spend most of our time doing actual forensic investigation and analysis, rather than [01:15].


So, why are you guys here? You’re going to learn about most of our toolkit as it applies to everything cloud. Everything that we’re going to present is free and open-source software, so you can use it in your own infrastructure, in your own projects. And since presentations of tools can be a bit boring, we’ll try to tie all of this together with a scenario of sorts. So, we’re going to focus on dfTimewolf and Turbinia, which I mentioned. We’re also going to talk about Plaso and Timesketch. That’s a lot of things to cover in the 18 minutes that are left, so we’re going to [01:58].

A quick show of hands – who’s heard about log2timeline or Plaso? Alright, that’s [02:04], so you all know how it works. It basically builds a big timeline from the things it encounters – files, file systems, event logs, etc. It’s going to extract timestamps and try to build a nice, neat timeline of the things that happened on a system.
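
As a rough illustration (not shown in the talk), a minimal sketch of driving Plaso from Python might look like this; the image and output paths are hypothetical, and the exact flags vary between Plaso versions:

```python
import subprocess

# Sketch: extract events from a disk image into a Plaso storage file, then
# render a CSV timeline with psort. Paths are hypothetical and flag names
# vary between Plaso releases.
subprocess.run(
    ["log2timeline.py", "--storage-file", "evidence.plaso", "disk.raw"],
    check=True,
)
subprocess.run(
    ["psort.py", "-o", "l2tcsv", "-w", "timeline.csv", "evidence.plaso"],
    check=True,
)
```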

Who knows about Timesketch? Also pretty cool. So, Timesketch plays really well with Plaso, as it should, because our team develops both. It’s basically a forensic timeline visualization tool – some sort of enhanced grep – which is pretty cool, because you can see several of the timelines that you extracted from different systems all in the same view, and that’s pretty useful when dealing with large-scale investigations. It’s also multi-user, so you can have many users on the same Timesketch instance, collaborating with each other. Multi-case – you can have several sketches in the same Timesketch instance, for different incidents. And also multi-timeline, so you can timeline several systems and have them all in the same sketch. So, you can easily draw associations between each of them.

I mentioned dfTimewolf – who knows about dfTimewolf? Oh, that’s a few people! That’s pretty cool. Thanks! So, dfTimewolf is basically everything that glues our toolset together. If it has an API, then dfTimewolf can talk to it and can make sense out of how to use it. Basically, it works with modules. Modules are just very simple Python classes that interact with the different APIs. So, you can have collectors – for example, if you need [03:43], you can have a [03:45] collector. We have a Plaso processor that’s going to process the evidence that came from [03:49], and we can have a Timesketch exporter that’s going to export whatever file the processor gave it. Everything is articulated with what we call recipes, which is basically a very small JSON dictionary that explains how to chain all these modules together.
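
To make the recipe idea concrete, here is a hypothetical sketch; the module names and argument placeholders are illustrative rather than dfTimewolf’s actual recipe format:

```python
# Hypothetical sketch of a dfTimewolf-style recipe: a small dictionary that
# names the modules to chain and the arguments each one takes. Module and
# parameter names are illustrative, not the real dfTimewolf ones.
hypothetical_recipe = {
    "name": "grr_to_timesketch",
    "short_description": "Collect with GRR, process with Plaso, export to Timesketch",
    "modules": [
        {"name": "GRRArtifactCollector", "args": {"hosts": "@hosts", "artifacts": "@artifacts"}},
        {"name": "LocalPlasoProcessor", "args": {}},
        {"name": "TimesketchExporter", "args": {"sketch_id": "@sketch_id"}},
    ],
}
```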

We’re also going to talk about Turbinia, and I’ll let Aaron talk about this.

Aaron: Hey everybody. I’m going to talk to you guys about Turbinia and some of the work we’ve been doing over the last year in this project. It’s an open-source project, similar to all the other ones that we’re working on. Can you guys hear me okay?

I’m going to … Is that better?

Audience: Yes.

Aaron: Awesome. Yeah, so what we want to do with this is automate a lot of the common tools that we use every day. We want to be able to do this at scale, and we also want to be able to do this not just in the cloud, because it gives us some nice scaling properties, but also on the cloud, meaning actually processing forensic artefacts from the cloud itself. As people are moving more of their infrastructure into the cloud, we wanted to be able to process that with the rest of our tools, to keep up with that.

A couple of other notes about this. This was originally conceived by [Johan and Corey] a couple of years ago. They worked on the initial proof of concept; a year and a half ago, maybe, we started rewriting that and re-architecting it a little bit. We’re not actually very good at [05:24] here – this is the first [05:25] that we came up with, which we affectionately refer to as the flying [05:28]. [laughter] [We came up with the one we have now just a little bit earlier.]

Yeah, so we can install this in different environments. Cloud is kind of the major [05:43] case that we have, where [05:44] everything is in the cloud – all of the evidence that we store, all the processors – and we use the cloud infrastructure. We also have the ability to put this in kind of a hybrid environment, so the infrastructure pieces are still in the cloud and you don’t have to maintain those, but the workers and storage stay local, so you don’t have to push all of your data into the cloud, but you still get some of the benefits. And then, finally, just in the last couple of months, we have this local version of Turbinia, so you can keep everything completely local. That was contributed by [Eric at Facebook]. And that uses some other [06:27], and that’s kind of nice, because then you can have all your [06:35].

Here’s the overall architecture. I’m not going to spend a lot of time on this, but you can see there are clients that make requests to the server through Pub/Sub. The server then schedules tasks, also through Pub/Sub, in this case using [06:49]. Workers pick up those tasks and can talk to the evidence storage, which in this case … this is the cloud version, so this is all in the cloud, in GCS or system disks. And the server stores its state in Datastore, and then we can query that from our clients through Cloud Functions. So, it’s kind of nice, because this is all in the cloud, and from the client’s perspective, you can have a super lightweight client that can just talk to [07:17] functions.
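
To give a flavour of that cloud path, a lightweight client handing a request to the server over Pub/Sub might look roughly like the sketch below; the project, topic, and message fields are illustrative, not Turbinia’s actual wire format:

```python
import json
from google.cloud import pubsub_v1

# Sketch of a lightweight client handing a processing request to the server
# over Cloud Pub/Sub. Project/topic names and the message fields are
# illustrative, not Turbinia's actual wire format.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-forensics-project", "turbinia-requests")

request = {
    "request_id": "req-0001",
    "evidence": {"type": "GoogleCloudDisk", "disk_name": "greendale-admin-copy"},
}
future = publisher.publish(topic_path, data=json.dumps(request).encode("utf-8"))
print("Published message id:", future.result())
```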

Here’s the hybrid version, which is basically the same, except you can see at the bottom that the workers and the shared storage have moved off, but we can still leverage the rest of the functionality of the cloud. And then, here’s the new local version – a very similar diagram, except you have the components replaced, with Kombu as the messaging interface, Celery for the task scheduling, and Redis as the storage that clients can talk to directly. And then, same here – the workers are also local, on [07:53].
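
A minimal sketch of that local plumbing, assuming Redis serves as both the Celery broker and result backend (the task body here is purely illustrative, not Turbinia’s real worker code):

```python
from celery import Celery

# Minimal sketch of the local setup: Celery for task scheduling, Kombu (used
# internally by Celery) for messaging, Redis as broker and result backend.
app = Celery(
    "turbinia_local_sketch",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/0",
)

@app.task
def process_evidence(evidence_path):
    # A real worker would run the mapped forensic task against the evidence.
    return "processed {0:s}".format(evidence_path)

# Usage from a client: process_evidence.delay("/mnt/evidence/disk.raw")
```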

Obviously, there are some pros and cons to doing this. The cloud is nice – you can scale up, and you don’t have to manage that infrastructure. But it can be a little bit difficult to get some evidence in and out of the cloud: you’ve got a lot of big raw disks, and it can be a pain to copy them up there. So, in our environment, we’ve got a couple of different installations, and we can push things to whatever makes sense, whichever [08:20] makes sense. For the hybrid version, only the metadata goes up to the cloud, and everything else stays local, so you can just have everything on shared storage. And then, with local, again, everything is local.

Evidence. In this case, what we mean by evidence is really anything that we want to process. This can be a raw disk, a cloud disk, a Plaso file – anything that we can define metadata for. And these are all defined in Python. Generally, when we make a request to Turbinia, it’s to process something like a disk. As the tasks are processing this, they can generate new evidence that gets fed back into Turbinia. That gets mapped against jobs that can process it, and then new tasks get scheduled to process those, and that kind of generates a tree of processing.

The evidence, as far as the client and server are concerned, is really just metadata, and then the actual storage for that [09:26] stored on some type of shared storage or in the cloud.

We have pre- and post-processors that can make the evidence available to the tasks – so, for example, a cloud disk can get attached. A nice property of this is that the evidence types can be kind of stacked. Because this is all in Python, you can have … the example I have here is a cloud disk, raw embedded, and what that is is a persistent disk with a raw disk inside of it. So the parent class’s pre-processor can be run to attach the outer disk first, then the inner disk can be mounted, and we can continue down that chain. For example, one of the things we want to add is support for encrypted disks, and then just have another [10:17] process.
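
A simplified sketch of how that stacking could be expressed in Python follows; the class and helper names are illustrative, not Turbinia’s exact API:

```python
# Simplified sketch of stacked evidence types: each subclass runs the parent
# pre-processor first (e.g. attach the cloud disk), then its own step
# (e.g. mount the raw image found inside). Names are illustrative.
class Evidence:
    def preprocess(self):
        pass  # base evidence needs no setup

class GoogleCloudDisk(Evidence):
    def preprocess(self):
        super().preprocess()
        self.device_path = self._attach_persistent_disk()  # hypothetical helper

    def _attach_persistent_disk(self):
        return "/dev/sdb"

class GoogleCloudDiskRawEmbedded(GoogleCloudDisk):
    def preprocess(self):
        super().preprocess()  # attach the outer persistent disk first
        self.mount_path = self._mount_inner_raw_image(self.device_path)  # hypothetical helper

    def _mount_inner_raw_image(self, device_path):
        return "/mnt/inner"
```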

The output manager – because we’re running in the cloud, we need to be able to copy things back and forth, so this basically just copies things around: [10:27] a task comes up and needs access to evidence, it can copy that into the task, and then conversely, when [10:35] new evidence is generated, it copies that back to storage.
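
As an illustration of that copy-in/copy-out idea using Google Cloud Storage (the bucket and object names here are made up):

```python
from google.cloud import storage

# Illustration of the output-manager idea: copy evidence from GCS to the
# worker before a task runs, and copy newly generated evidence back after.
# Bucket and object names are made up.
client = storage.Client()
bucket = client.bucket("turbinia-evidence-example")

def copy_in(object_name, local_path):
    bucket.blob(object_name).download_to_filename(local_path)

def copy_out(local_path, object_name):
    bucket.blob(object_name).upload_from_filename(local_path)

copy_in("requests/req-0001/disk.raw", "/tmp/disk.raw")
# ... task runs and writes /tmp/evidence.plaso ...
copy_out("/tmp/evidence.plaso", "requests/req-0001/evidence.plaso")
```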

A typical workflow: the client sends a request to the server, the server maps that against jobs that can process that particular evidence type, and it creates tasks, which get scheduled. A worker from [10:56] will then pick that up, copy the evidence if needed, pre-process it, and then actually run the task itself to process the evidence. And then, as I mentioned, it can generate more evidence and send that back [11:10].
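
In rough, pseudocode-style Python, the server-side loop might look something like this; the job registry and names are illustrative:

```python
# Rough sketch of the server-side flow: map incoming evidence to the jobs
# that can handle it, schedule tasks on workers, and re-queue any new
# evidence a task produces. The job registry and names are illustrative.
JOBS_BY_EVIDENCE_TYPE = {
    "RawDisk": ["PlasoJob", "StringsJob"],
    "PlasoFile": ["GrepJob"],
}

def handle_request(initial_evidence, schedule_task):
    queue = [initial_evidence]
    while queue:
        evidence = queue.pop(0)
        for job_name in JOBS_BY_EVIDENCE_TYPE.get(evidence["type"], []):
            # schedule_task hands the work to a worker and returns any new
            # evidence the task generated, which is fed back into the queue.
            queue.extend(schedule_task(job_name, evidence))
```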

As I mentioned earlier, this kind of creates a graph of jobs. This is actually an auto-generated graph from the code. You can see that the rectangles are evidence, and they are mapped to jobs, which generate more evidence, and that kind of flows all the way from a raw disk down to a filtered text file.

Creating new tasks is easy now that the infrastructure is there, even if you’re not a super strong coder. For a simple execution task, it’s just 15 lines of code. There’s some boilerplate code, and documentation [11:53] on adding things.
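
For flavour, here is a minimal sketch of what such an execution task could look like, assuming a TurbiniaTask base class that provides an output directory and an execute() helper; the import path and signatures are assumptions, so check the project’s documentation for the real boilerplate:

```python
import os

# Minimal sketch of a simple command-execution task, assuming a TurbiniaTask
# base class that provides an output directory and an execute() helper. The
# import path and helper signature are assumptions.
from turbinia.workers import TurbiniaTask  # assumed import path

class StringsTask(TurbiniaTask):
    def run(self, evidence, result):
        # Run `strings` over the evidence and save the output for later tasks.
        output_path = os.path.join(self.output_dir, "strings.txt")
        cmd = "strings -a -t d {0:s} > {1:s}".format(evidence.local_path, output_path)
        self.execute(cmd, result, shell=True, close=True)
        return result
```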

And then, just a quick note on the scope here. For Turbinia, all of the orchestration and things that happen externally happen with Timewolf – we’ll be demoing that – because we intentionally want to limit the privileges that Turbinia has. We could have it go out and grab all of these things from different places, but we actually want to push things into Turbinia rather than have it pull, and have Timewolf do the pushing.

Next steps. We’ve got a lot of things we want to do with this. We want to support encryption. We want to add reporting to the tasks. This quarter we’re focusing on adding new tasks; we’re doing [12:33] analysis. We want to add recipes, similar to what Timewolf has, so we can make it a little bit more configurable. And some of the code for the very last demo that we’re going to show you, we still have to polish up and publish to the external repo, so that’s [12:49]; everything else is going to be open source, [12:52].

Just to go quickly over the big picture here: this is, again, kind of end-to-end with our tools. We didn’t talk much about GRR – that’s kind of like an endpoint [13:01] systems, and we can pull data back from it. We can use Timewolf to go out and grab that from GRR. We can grab things from the cloud, and then we can push that for processing here with Turbinia. And then, finally, after processing happens, we can push timelines and other things like that into Timesketch.

I put this iceberg image on here just because I really wanted to make the point: a lot of the automation and things we’re doing is built on a lot of other tools and a lot of time and effort – years of work on all these other parsers, with Plaso, libyal, TSK, all these things that we depend on – that made this possible.

With that, I’m going to introduce our scenario. And Tom is going to talk a little bit about the demos we’re going to do.

Thomas: Let’s talk a bit about the scenario that we chose. Just FYI, none of what I’m going to talk about next is true, except for the demos. So, please don’t freak out if you see people being compromised.

The victim that we chose is Greendale Poly. It’s a very famous fictitious university. Everyone’s on semester break when, suddenly, someone gets a tip, and the admin reports a suspicious domain. It’s grendale.xyz, with one ‘e’. Well, this is the scenario, and Greendale has conveniently been migrating all their infrastructure to Google Cloud, so this is where all of our tools become very useful.

So, if we look at this, we can think, “Okay, grendale, that looks pretty targeted.” We’re going to look for related artefacts and explore a bit more what our forensic options are. This is the first demo.

Basically, what we’re going to do is run dfTimewolf against a cloud instance in a cloud project – GCP in this case – and this will run the GCP forensics [14:58] as we can see here. It will bootstrap an analysis VM in a separate project. It will grab a snapshot of the disk that already exists on the allegedly compromised instance, which is the Greendale admin here, because we want to check if the admin has clicked on any links. And it will create a copy of the disk, [15:23] Greendale copy.
So, what can we do with this? This will also create an analysis VM that we can SSH into, and we already have all our tools available there, like fls, and the disk. The allegedly compromised disk is already [15:37], so you can just do an fls [15:40] if you want, and you can start doing forensics right there.

What if you want to do things a bit more at scale, with Turbinia? That’s exactly what Aaron is going to talk about now.

Aaron: This is the demo we have for Turbinia. Just to set this up real quickly, we’ve got three different windows here, in [16:09] session. We have a client – that’s kind of the view you’d actually have as a client. And I’ve put the server and worker here, so you can see what’s happening.

So, we make a request to the server, and we can see here that we’re specifying a keywords file. In that keywords file, we put the grendale.xyz domain, and that’s going to filter some of the output for this when we’re processing our cloud disk, and there we have the other parameters for that. We can see that the server picked up that request. It’s mapping it against the different jobs. And now we can see it’s scheduled a Plaso task. That got picked up by a worker, we can see Plaso is running, and it’s going to generate a Plaso file and send that back to the server. Here, in a second, the server is going to map that against jobs, and then it’s going to create some other jobs as a result of that. One of those jobs is the grep task. It’s going to take the keywords that we entered in the first step, filter the output, and generate a second, filtered list of output here.
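
Conceptually, that grep step is straightforward; here is an illustrative sketch of filtering a rendered timeline by the keywords file (the file names follow the scenario, and the code is not Turbinia’s actual grep task):

```python
# Illustrative sketch of the grep-style filtering step: keep only timeline
# lines that match any keyword from the keywords file.
def filter_timeline(timeline_csv, keywords_file, output_file):
    with open(keywords_file) as f:
        keywords = [line.strip().lower() for line in f if line.strip()]
    with open(timeline_csv) as src, open(output_file, "w") as dst:
        for line in src:
            if any(keyword in line.lower() for keyword in keywords):
                dst.write(line)

# keywords.txt would contain e.g. "grendale.xyz" in the scenario.
filter_timeline("timeline.csv", "keywords.txt", "timeline_filtered.csv")
```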

Again, at the client, at the bottom there, we can see how many tasks have been run. And then, what I do here is I just look … it’s a little squished on the screen here, but I take the filtered output from there and look at it, and then we can see here that there’s a [17:37] to our domain – the bad domain here.

That is Turbinia. Everything’s fine.

I’m just going to go through … we kind of already talked about this, and we’re running out of time, so I’m not going to go through all of this, but dfTimewolf is copying the disk and sending it to Turbinia. What I really want to show you here is the last demo, which really combines all the stuff. This is the part that we’ve been working on over the last few weeks, which just got completed – or almost complete, complete enough to make a demo of it – and we’ll polish that up and publish it soon. This is going to look very similar … it’s very fast and works very similarly to the first one. We specify a separate recipe, the GCP forensics Turbinia recipe, and then we specify all the same cloud information. We could also potentially specify all of the other instances, to automatically do this at scale. But basically, it’s similar to the first demo, where I copied the disks and then sent a request to Turbinia, a little way in here – and I time-sliced this to make it go faster – but we can see the output from that.

And then, at the very bottom, what we have here is a link to Timesketch. So, we can actually see all the output from those previous things in Timesketch, which is right here, and if we do a search on that suspicious domain, we can also see, in the output of Plaso’s parsing of the bash history file, that we found that same [19:19].

I’m going to let Tom wrap up here.

Thomas: Right, so fortunately, no one got owned. The payload in question was a keylogger, but we didn’t find any traces of lateral movement, and Greendale uses two-factor authentication everywhere, so they’re all safe. We suspect the attacker’s objective was to disrupt the launch of Greendale’s new PhD program in AC flow studies, which is pretty interesting, and all the universities want it.

We’ve covered a bunch of tools, but what else can they do? We didn’t talk much about GRR, we just mentioned it. GRR can do some host timelining and also run custom Python scripts. Basically, whenever you want to grab stuff from your endpoints, you’ll use GRR. With Plaso, you can also focus processing on specific user-selected artefacts – like you can tell Plaso, “I want you to grab this disk and show me all the browser history artefacts that you have. Process them and give me a timeline.” That’s something you can do, as sketched below.
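
For example, a hedged sketch of restricting Plaso to its web-history parsers; the preset name and flags depend on the Plaso version, and the paths are hypothetical:

```python
import subprocess

# Sketch: focus Plaso on browser-history artefacts by restricting the parsers
# it runs (the "webhist" preset). Preset and flag names depend on the Plaso
# version; paths are hypothetical.
subprocess.run(
    ["log2timeline.py", "--parsers", "webhist",
     "--storage-file", "browser.plaso", "disk.raw"],
    check=True,
)
```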

With Timewolf, as we’ve seen, you can basically chain together whatever has an API. So, if you want to augment, I don’t know, your output Plaso file or your sketch with your threat intelligence repo, it’s easy to do. You just need to write a small module, and you can automatically grab the last output that you got and interact with Timesketch.

Timesketch also has a bunch of other features, like heatmaps and graphs. They’re all pretty cool, and I encourage you to check them out. There’s a demo instance at demo.timesketch.org, I think. And Turbinia, as we’ve seen, is [20:54] for repetitive, parallelizable tasks that you may want to run on 32 cores, if you don’t have those at home.

If you haven’t been paying attention today, this is what you really need to write down and take back for your report. Basically, these are all tools that you can use in your ecosystem. We’ve seen most examples with GCP, but you can use them, as Aaron mentioned, locally, with other software. Timewolf is not bound to anything cloud-related.

There are some cloud recipes and cloud modules, of course, but you can also use it with GRR and other things. It’s used daily by our teams, so these aren’t just proof-of-concept tools – we already use them to respond to incidents. So, that’s pretty cool. And we also encourage contributions. It’s all open source, on the next slide that I’m going to show you, and it’s all Apache 2 licensed.

Here’s a list of all the open-source [21:47]. It’s mostly on GitHub. We also recently opened a Slack channel, https://open-source-dfir.slack.com. There’s a small herokuapp link that you can use to automatically join the Slack channel. It’s to provide [22:04] about all these tools, but also DFIR tools in general – as long as it’s open source, we’re happy to talk about it. And I think that’s all for us, so if you have any questions, we’re happy to take them.

Thanks.

[applause]

Host: I think there’s minus one minute left for questions, but we can indulge one while the next speaker comes up.

Audience member: I love Plaso, and I like what I see here, especially some of the future discussions I think we’ll have about moving from [22:49] to the cloud. One question I have is how you’re dealing with, say, dropped messages or invisible [areas] – the problem of processing perhaps not getting completed, or missing results [23:02] back to some collection. I’ve seen this on other distributed processing systems. It’s very … we found it difficult to manage.

Aaron: Yeah, that can be an issue. We haven’t built anything into the system to try and do [23:20] sort of thing. We try to do a good job of detecting errors and [23:27] that up to the user, so that the user can make that decision – so if a task fails, we have a status message that gets [23:34] back up to them. But that’s something that we have to deal with too.

Audience member: Thanks.

Aaron: Thanks, everybody.

[applause]

End of transcript
