Eoghan Casey on the CASE Ontology for Digital Forensics Practice & Process

Christa: Forensic Focus’ coverage of standardization and digital forensics continues this week by exploring CASE: the Cyber-investigation Analysis Standard Expression. An extension of the Unified Cyber Ontology, which defines classes of cyber objects in relation to one another, their provenance and actions, CASE is an international standard supporting automated combination, validation and analysis of cyber investigation information in any context, including criminal, corporate and intelligence.

To tell us more, including what it all means and how to get involved, this week the Forensic Focus podcast welcomes Eoghan Casey, who hardly needs an introduction, but for today is here as the presiding director on the CASE Community Governance Committee. I’m your podcast host, Christa Miller. Welcome Eoghan.

Eoghan: Thank you, Christa. Thank you for having me.

Christa: So to start with, if you would please, walk us through what CASE and UCO are, the implications for tool interoperability and advanced analysis, and beyond that, this culture of common comprehension and collaborative problem solving that you’ve mentioned. How is CASE contributing to the effort to improve the quality of digital forensic science and make it more evaluative?

Eoghan: Well, there’s a number of parts, and also CASE and UCO fit quite a range of needs that we’ve had for a long time in the cyber investigation community. So from the interoperability standpoint, fundamentally when we’re dealing with data from different sources and processing it using different tools, there’s a bit of hand jamming, or putting it into spreadsheets or formats that we can put together that’s error prone, and not particularly repeatable or consistent.

And so by automating a lot of the normalization and combination of the data, we increase efficiency. We bring those multiple data sources together more consistently and allow the practitioners to spend more time analyzing the data and less time just piecing it together.

And you mentioned provenance, that’s built in (really kind of baked in) to CASE. A fundamental portion of our structure is to allow for tracking who operated on digital evidence when, what actions were performed during an investigation, even where.

And so that’s a chain of evidence that allows us to make, I think, a more concrete, or at least traceable, link back to the source and authentication of the evidence, sometimes in an automated fashion, filling some of the gaps that currently exist through just paper documentation, or let’s say, less routine and systematic documentation.
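
As an illustration of the provenance tracking described above (not something shown in the interview), here is a minimal Python sketch of walking a chain of recorded actions back to the original source. The record fields (`action`, `performer`, `when`, `where`, `input`, `output`) and all identifiers are hypothetical and deliberately simplified; they are not the actual CASE/UCO vocabulary.

```python
# Hypothetical, simplified action records; NOT the real CASE/UCO vocabulary.
actions = [
    {"action": "acquire", "performer": "Examiner A", "when": "2021-11-01T09:00:00Z",
     "where": "Lab 1", "input": "phone-001", "output": "image-001"},
    {"action": "extract", "performer": "Tool X", "when": "2021-11-01T11:30:00Z",
     "where": "Lab 1", "input": "image-001", "output": "message-042"},
]


def trace_to_source(item, actions):
    """Follow output -> input links from an item back to its original source.

    Returns the actions in reverse chronological order (most recent first).
    """
    produced_by = {a["output"]: a for a in actions}
    chain = []
    while item in produced_by:
        step = produced_by[item]
        chain.append(step)
        item = step["input"]  # move one step closer to the source
    return chain


for step in trace_to_source("message-042", actions):
    print(step["when"], step["performer"], step["action"], step["input"], "->", step["output"])
```

Each step answers who, when, where and what for one hop, so the chain as a whole links an extracted item back to the device it came from.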

One of the other areas in terms of quality is tool validation, and the automation of some of the tool outputs and comparison against known ground-truth really will make a big difference for validation and ultimately the quality of digital forensic science.

Christa: So how so? Because I’m thinking about this in terms of accreditation efforts in the UK and elsewhere, the broader conversation around standardizing digital forensics processes and practices. So like just on a practical level, how do you envision that working?

Eoghan: Well, it’s more than envisioning. So in fact, there was a recent… this year, there was a funded, focused effort in the UK to show a proof of concept of automated tool validation using CASE. So it took some commercial tools with ground-truth data sets that were run through a process that used CASE as the layer of standardization, where you can take the ground-truth representation and compare it with the output of your tool automatically. You’re really taking a lot of the manual labor out of that, and the time that it saves is also important.

But you can do this on a routine basis as new versions of tools arise, you have the same ground-truth data set, and you can compare the new output or the output of the new version of the tool automatically against the ground-truth and see if there are any differences. So that’s happening now. It’s something that’s being kind of built out more in an operational context in the UK, given the importance now of some of these issues that have arisen in the past year.
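
To make the idea of automated comparison against ground truth concrete, here is a minimal Python sketch. It is not the UK proof of concept; it assumes, purely for illustration, that both the ground truth and the tool output have already been normalized into dictionaries keyed by file path, with hypothetical `sha256` and `size` fields.

```python
def compare_to_ground_truth(ground_truth, tool_output):
    """Report artifacts the tool missed, added, or reported with different values."""
    expected, reported = set(ground_truth), set(tool_output)
    return {
        "missed": sorted(expected - reported),
        "unexpected": sorted(reported - expected),
        "mismatched": sorted(
            key for key in expected & reported
            if ground_truth[key] != tool_output[key]
        ),
    }


# Hypothetical normalized records (illustrative values only).
ground_truth = {
    "/docs/a.txt": {"sha256": "aa11", "size": 120},
    "/docs/b.txt": {"sha256": "bb22", "size": 300},
    "/docs/c.txt": {"sha256": "cc33", "size": 42},
}
tool_output_v2 = {
    "/docs/a.txt": {"sha256": "aa11", "size": 120},
    "/docs/c.txt": {"sha256": "cc33", "size": 40},   # size differs
    "/docs/d.txt": {"sha256": "dd44", "size": 7},    # not in ground truth
}

report = compare_to_ground_truth(ground_truth, tool_output_v2)
print(report)
```

Because the ground-truth data set stays fixed, the same comparison can be rerun unchanged whenever a new tool version is released.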

Christa: Okay. I have to admit as a non-practitioner, it’s still a little bit abstract for me. Just from a standpoint of using a tool interface for collection, analysis and reporting, can you give us some examples of CASE objects or facets and the types of information they represent, and from there, the value to different sectors? Public sector, law enforcement, private, academic, or commercial vendors.

Eoghan: Sure. And really from the user interface standpoint, it’s kind of ideally behind the scenes up to the point where it’s exporting or importing data to create this interoperability or to generate reports that can then be processed through some systematic way.

So if you look at Autopsy as an example, there’s a report generation option there that lets you export some of the file system information in CASE/UCO format. And what that provides is some of the file system objects and metadata, and their context within the forensic image. But then what you start to see is, you’re not just dealing with pieces of metadata, you’re not just dealing with an individual date/time stamp or an individual file name. They are in context.

And so if you have a file, let’s say an outlook.pst file, it has email messages inside of it. And ultimately that’s parsed out to give you the headers with some email to and from addresses. You’re not just getting an email address as an entity in your data. You’re seeing an email address and its context within the to or the from, within the context of the wrappers that surround it.

And so, by maintaining its structural context, you can perform more powerful analysis where you can see email addresses coming from different contexts, but still be able to correlate them. And so you have this contextual analysis, and it’s even a graph based representation. So there’s even this link node analysis that allows for very powerful, more advanced analysis and visualization. So that structure is the key. And it’s what CASE is, it’s ontology based. So it provides this conceptual structure, it allows for more advanced analysis on the data.
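
The following Python sketch illustrates the point about keeping an email address in its structural context. It is an assumption-laden toy model, not CASE/UCO itself: the graph is a plain list of (subject, relationship, object) tuples, and every identifier and relationship name is hypothetical.

```python
# Toy graph of (subject, relationship, object) edges; names are hypothetical
# and much simpler than the real CASE/UCO ontology.
edges = [
    ("image-1", "contains", "outlook.pst"),
    ("outlook.pst", "contains", "message-17"),
    ("message-17", "from", "alice@example.com"),
    ("message-17", "to", "bob@example.com"),
    ("image-2", "contains", "contacts.db"),
    ("contacts.db", "contains", "contact-3"),
    ("contact-3", "email", "alice@example.com"),
]


def contexts_for(address, edges):
    """Return (role, containment path) for every place the address appears."""
    parent = {obj: subj for subj, rel, obj in edges if rel == "contains"}
    results = []
    for subj, rel, obj in edges:
        if obj == address and rel != "contains":
            path = [subj]
            while path[-1] in parent:        # climb the wrappers around the item
                path.append(parent[path[-1]])
            results.append((rel, list(reversed(path))))
    return results


for role, path in contexts_for("alice@example.com", edges):
    print(role, " > ".join(path))
```

The same address is correlated across two sources, but each hit keeps its role (sender versus contact entry) and the chain of wrappers it was found in.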

Christa: Based on the CASE website, I saw that a handful of commercial vendors, along with some of the open source tools (you mentioned Autopsy, as well as the NIST NSRL), have adopted or are in the process of adopting CASE. In what ways, and have any independent tool developers signed on?

Eoghan: So there’s a range of developers. Most of the community is coming from organizations that either make tools or use tools in our domain. Some independent developers are involved, but we’re looking for more.

One of the key elements here is that there’s a collaboration across all the sectors to try and find a balance between this structural integrity (that the data is in its context) and the use, so making it practical. And that’s where we’re really starting to pull together. These coming few months are really going to be a time where we’re pulling together the adopters and the developers to cross that threshold of making it much more usable and providing tooling to facilitate implementation.

But one of the things that’s helping is some organizations (so customers of commercial tool vendors), some organizations are now requiring this type of interoperability so that they can liberate their data and take the data from a variety of commercial tools and other open source and COTS tools and be able to use the data for their purposes and not have it stove-piped or separated into individual proprietary repositories.

Christa: I want to get to the different roles. You mentioned adopters and developers, and we’ll get to that in a little bit. I want to focus more on the community aspect of it at the moment. Because this quarter, I know CASE is part of a broader cyber domain ontology project that’s transitioning to become a Linux Foundation community project, thanks to unanimous affirmation from the CASE Community Governance Committee. Is that the plan that this will encourage more adoption across both commercial and open source developers, and if so, how do you anticipate that happening?

Eoghan: So, yeah, it’s an exciting milestone for the community to be transitioning a community driven effort to this larger cyber domain ontology project under the Linux Foundation, which has various other… in addition to the Linux operating system, they’re kind of the holders of a lot of major open source projects. And by becoming a part of that ecosystem, which is used within both the private and public sectors (the open source stacks of software) we do have a bit more visibility and ability to plug into other efforts.

But fundamentally it will improve our ability to do fundraising and start to put some money into the process, which up until now has been largely voluntary or really donated time by the community. What we ultimately can expect is not just for CASE and UCO to thrive, but the vision of this effort was to bring other cyber domains into this overarching cyber domain ontology.

So supply chain risk management, alignment with CTI and trying to, for example, align with the STIX standard and create a much more cohesive culture of common understanding, not just for cyber investigation, but for the cyber domain and improve cybersecurity and our ability to be resilient against the growing threats, more so than we are now. And perhaps we can start to grow in areas that we haven’t foreseen, but ultimately we’re just trying to create a kind of a fertile ground for collaboration and developing this culture of common understanding.

Christa: On that note, at a webinar in September, you spoke about the Linux Foundation being neutral international ground to develop CASE and UCO. I wanted to find out more about what that means in particular.

Eoghan: Well, for a number of years, we were trying to… the CASE and UCO community were thinking about becoming nonprofit and to do something like that, you have to pick a place, right? So if we became a nonprofit in the US, then that creates some intellectual property challenges for folks outside of the US and vice versa.

And we tried navigating that, and we couldn’t find a viable solution until the Linux Foundation came and said they’ve figured this out for many different projects and could hold the intellectual property in a way that’s neutral, and is not creating any sort of barriers to entry or barriers to use anywhere in the world by any organization. So truly open source and collaborative.

Christa: Having said that, as we’ve been looking at and covering some of the existing EU-based projects, like LOCARD, FORMOBILE, the Netherlands Forensic Institute Hansken project, how does CASE, or would CASE fit with, or enhance projects like those as well as independent academic research?

Eoghan: That’s an excellent question. And it’s something that in fact… I’ll start with the NFI’s Hansken, because they’ve been a founding contributor to the CASE UCO effort with a lot of influence from their experience in building their data model. But they’re now really focusing and investing in the coming year on implementing CASE in their Hansken system.

And what that amounts to is the ability to export data in a common format and import data from other tools that support CASE. And so you have then the much more powerful ability to… Hansken itself is very powerful as a system, with a lot of capabilities and a lot of provenance information within the system. When they export that, you’re pulling data out in PDF format or in some formats that are useful for, let’s say, court presentation.

But to make it interoperable with other systems and really have a two-way bi-directional flow between tools, organizations, countries that are working together in joint operations, doing that at machine speed is what the common language of CASE enables. And so that’s really where the main benefits come.

But there’s another aspect that we’re really starting now to bring to the community, which is the analysis and inference part. So formalizing how evidence-based inferences are represented. By doing that you start to get people to think about their process in a bit more of a formal way, where right now we don’t necessarily (in digital forensic science) we don’t interpret our digital evidence in a consistent or even common standardized way.

I’m not suggesting that we’ll get to that point as a result of CASE, but it creates some discipline and some rigor to have to represent your inferences in a formalized way. And so we’ve created an inference concept that is flexible enough to support anyone’s approach, but to get people to really put it out there instead of it happening just in the brain and coming out as a conclusion, without some transparency into how that happened, we formalized how that works.

And I know that the Hansken developers have been looking at building this type of inference and evaluation into their tools. So we’re going to be collaborating with them to see how those concepts can be made useful in practice.

With FORMOBILE, they’re working on some standardization, also in training for mobile device analysis. And so we’ve been collaborating to figure out how CASE can fit into this. And that’s a work in progress, and it really is dependent on ultimately their deliverables, but they have tools that they’re focused on. And some of those tools and the developers of those tools are involved in the CASE community.

So trying to figure out ways to create this, when you have an ontology or you’ve defined your concepts as a community, it’s a common language and that’s something we’ve lacked to date. And so that can perhaps have some use in a training context or in some of the tool testing context that FORMOBILE is working on.

With LOCARD, they’re focused more on the kind of the storage and what I would call the management of the data, and CASE can provide a metadata layer on top of that. And we’ve actually developed a proof of concept, it was blockchain based, which is part of what LOCARD provides.

And so we’re trying to provide the tooling to allow anyone in the community for any purpose that they have, to make use of the collective effort of the CASE community. Like I said, at the beginning, it covers a lot, and there’s a steep learning curve. My recommendation is: come participate, but define your problem first, because you can’t swallow this whole.

Christa: On that note, one other thing that you mentioned in September is that the project is moving from Confluence to the very popular GitHub for better transparency and participation. And then going back to what you were saying earlier about adopters and developers, the website lists four roles, including ontologists, mappers and discussors as well as adopters. What does each involve and then how can practitioners and developers determine the most appropriate fit for them?

Eoghan: So just to kind of give some background: we’ve always used GitHub as the repository for the results of work, and there’s a lot in there in terms of being able to see the work that’s ongoing. But some of the workflow, such as proposal development for new concepts or filling gaps in the current standard, was happening in a development environment, in Confluence. That was somewhat restricted just in terms of being able to keep track of who needed access to what to make sure that they could get the work done.

What we realized in moving to the Linux Foundation, but also as we grew, was that that wasn’t scalable. And also it’s not as inclusive as we want to be. And just in discussions with the Linux Foundation, they said, “Well, don’t you want anybody to be able to come with a proposal that’s kind of formed that you can consider?” “Yes, we want this federated model”, and we couldn’t keep up with all the new ideas that people were coming with, kind of, proposals. We said, “Okay, go develop it, we’ll help you, but do it in your GitHub repository or collectively as a working group.”

And by federating it, we increase our scalability and then people can come with more fully formed proposals, not just an idea, but we have a particular set of requirements for the proposals. They can come with it for comment and review. And so it streamlines the process a lot more, and it’s much more transparent because you don’t have to then have access to Confluence. You just have to have access to GitHub, and you’re good to go.

One of the things we are still using some of the workflow management within the Atlassian suite (Jira and Confluence) for is just to track the workflow of new issues, new proposals, making sure that it follows a consistent process and it’s documented so that everyone feels like and knows that they can get a fair consideration for any proposal they bring.

And get comments back from the broader community. Really a huge side benefit of all of this is the collaboration that grows between people in our community. And they come up with other ideas that they go and collaborate on. And it’s really an exciting thing to see.

Christa: Yeah, it sounds like it! Can you give examples of the kinds of proposals that people are coming up with? Like how are they arriving at them? As people are listening and they might be interested in getting involved, like how is their workflow feeding some of these proposals and what kinds of things are people in the community coming up with?

Eoghan: Just to give you one example: coming from the telecommunications side of things and seeing more data coming from cell towers, and let’s say, call detail records. A particular group of the community needs that information in CASE format to correlate with data from other sources, from mobile devices and computers. We have a group now working on proposals for some of that data.

Another example is that many of us in the community said, “We need to represent the output of machine learning.” Now, machine learning is being applied to digital evidence in a variety of ways, we need to be able to represent the output of that. But that’s quite a big ground, it’s kind of quite a broad scope to cover.

And so we realized in talking through it that there are some things that you need to represent, but ultimately what the output is, it’s an inference on the evidence. And that’s where this inference concept came from, and we collectively developed this inference concept proposal, which is now well-developed and which we’re starting to implement.

So it ranges from just representation of certain data types that parts of the community need, to this more, let’s say, analysis focused and advanced inference and reasoning side of things. Once we can connect all of those things together with the tools it’s, I think, going to elevate our ability to find data and reason on that data across incident response, digital forensics, the whole range of the cyber investigation process… from SOC to court, we would say!

Christa: All right, it sounds fascinating. I can’t wait to see how it evolves from here.

Eoghan: Yeah. So the evolution… the next year is going to be fairly accelerated. And so I would say that we’re expecting a lot more momentum, people joining, some of this federated development I mentioned, but also the momentum that we’re looking for through the Linux Foundation and some of the collaboration that comes with other initiatives that they have. And so yes, we have a busy 2022, and welcome all comers.

Christa: Exciting. All right. Well Eoghan, thank you again for joining us on the Forensic Focus podcast to talk about it all.

Eoghan: Thank you, Christa. And my pleasure. Yeah. Let me know if you want me to come back.

Christa: Absolutely, thank you! Thanks also to our listeners. You’ll be able to find this recording and transcription along with more articles, information and forums at www.forensicfocus.com. Stay safe and well.
