Daniel Spiekermann discusses his research at DFRWS EU 2017.
Spiekermann: I would like to present a paper on digital investigation in OpenFlow networks with ForCon. I am a PhD student at the FernUniversität in Hagen, and virtual networks and network forensic investigation are the topics I’m focused on.
First, I would like to start with a scenario – just imagine you work for a law enforcement agency, [as Martin mentioned before], maybe the state police of [00:43], and you have a job: to wiretap the traffic of the red virtual machine. Network forensic investigation in law enforcement differs slightly from common network forensic investigation in, I would say, a company, where the focus is on capturing the traffic of all hosts, but you only capture traffic which you pre-define. In wiretapping, you try to capture every packet of the system of interest.
In this case, it’s the red one, and the common traditional method of forensic investigation is really easy, I would say. You identify the target – it’s the red virtual machine – you install your equipment, whatever you need for this tap, [01:38], you capture the traffic, you wait, and then you analyze the data.
Normally, you would choose [01:47] here on the left side to capture the traffic at this physical [01:53], this physical switch, whatever you prefer. But it’s a virtual environment, so what might happen? The virtual machine might migrate from one physical host, Physical Host I on the left side, to Physical Host II on the right side. Your capture technique is installed on the left side. So what will you do?
Here you can take your [sneakers], you can run to Physical Host II, and install the equipment again, reconfigure the capture process, and meanwhile, you won’t capture any traffic. So the captured traffic file will not contain any relevant information.
So my research question is based on that. How can you capture the entire network traffic of the system of interest? If you separate this question into little steps, [there is] the question: how can you determine the migration of the system of interest? There are different sources of information which you can use, but which are the best? And how can you reconfigure the capture process as fast as possible? Which means, at the time the virtual machine migrates, you should reconfigure your capture process as soon as possible to capture each packet sent or received by the virtual machine. And how can the network traffic be reduced to the relevant information? Mark mentioned the case of wiretapping the mobile phone, that’s really not fun. Analyzing lots of network packets captured in one file, without any relevant information of the virtual machine, is not really fun either.
So just a short outline. I will briefly mention the new challenge. There are lots of new challenges in virtual networks for network forensic investigations; I will just discuss the migration problem. I will talk briefly about the basics of software-defined networks and OpenFlow. In our research we derived the virtual network forensic process framework, which offers a guide to implementing successful virtual network forensics. And as a result, [we] developed ForCon. It’s still a proof of concept, but we try to evaluate our virtual network forensic process with this software. And I will finish my presentation with a short evaluation of ForCon.
So the new challenge – in virtual environments, I think everyone knows the migration of virtual machines is an everyday occurrence. It’s not a special event which is very rare. I think it happens all the time, every minute; lots of virtual machines are moved through the virtual network, from one physical host to another. Now, this is done by, very simplified, two different controllers. There are cloud controllers, which manage the migration of the virtual machines and the dependent services, hence the storage tools and [this], and on the other side you have the software-defined network controller, which manages the network part, which means the traffic control, the [forwarding] route, the [firewall rules], routing policies, [access controllers]; all of it is adapted in time by the SDN controller. There are some other controllers, like for the virtual storage and so on, but it’s a simplified model.
So how does the [06:04] work? Software-defined networking with OpenFlow … we focused on OpenFlow because we say it’s the most notable protocol for the so-called southbound API. This is the part which defines the communication between the SDN controller and the SDN switch. There are also the northbound API and the eastbound and westbound APIs, but in network forensic investigation in law enforcement we focus on the southbound API.
OpenFlow – as the name says – uses flows to process the packets. And these flows are stored in the flow table on the OpenFlow switch. They are not stored in the controller the whole time. They are valid for the given SDN switch.
So a flow, in the meaning of OpenFlow, is a combination of header fields. A typical layer two network switch is able to extract, I would say, or use a [07:08], maybe some additional information, but OpenFlow is much more powerful. It uses all these types of header fields: different [LAN] information, the source IP, destination IP, port addresses or port numbers, even the port at which the network packet arrives at the OpenFlow switch.
And at the bottom of the slide, you see that’s just [a flow]. To be correct, that’s a rule that defines, I would say, a [07:46] request or something. The [broadcast] destination is 255 four times, so something like that [07:57].
The SDN controller in combination with the cloud controller really creates a highly dynamic and a very flexible environment for the provider. The network forensic investigator has lots of issues with this highly dynamic environment.
Why? The traditional network forensic investigation is static. It’s limited to your hardware-based techniques like [span], or [08:24], [08:25], a bridge, whatever to choose – this is really static. You can’t reconfigure it as fast as the environment really provides it for the virtual machines.
At the beginning, we tried to split the whole process of capturing traffic in virtual networks into different parts, different steps, and tried to identify the relevant parts. Identification is the first step. You still have to, of course, find your system of interest. In traditional network forensic investigation, it’s really easy. You can trace the cable or you can find it in a rack, because it’s static. This is a physical [server], so this is where I should connect my [capture] device. In virtual environments, you have to find the relevant [OpenFlow] switch too. This is not easy, because there is no physical OpenFlow switch. There are OpenFlow switches which might be physical, but most of the relevant OpenFlow switches we try to use are implemented in software. So you have to find the relevant OpenFlow switch.
After that, you have to extract the needed information. What does this mean? You have to extract the flows out of the OpenFlow switch. The OpenFlow switch stores the flows which are valid for this switch. Not the controller. Only the OpenFlow switch stores every relevant flow which is needed to forward the network packets arriving at this switch. And in our approach, we try to manipulate these flows on the [relevant] switch. This is called the preparation phase. And then you can capture and store the traffic. In network forensics in law enforcement, we define three steps – capturing, recording, and analysis. So you have to capture and you have to store the data – only capturing in memory is not enough, you have to store it on a hard disk for the subsequent analysis.
This is really necessary, and this is new, we would say. You have to monitor the environment. Only the first implementation of identification, preparation, capture, and recording is not enough. You have to monitor the environment to react to relevant changes. The network is really flexible, lots of things happen there, and at first you have to react to [11:16] … react to [all] changes, and then you have to evaluate, to analyze these changes: which are really relevant for your capture process? Let’s say another virtual machine starts, which produces lots of files and changes in the environment. But the start of that virtual machine is maybe not relevant for your capture process, let’s say for the [11:45]. If a group [11:47] somewhere starts in the environment, sure, but it does not impede our capture process. So reacting to relevant changes is necessary.
And if a relevant event appears in the network, you have to adapt your capture process, which means you have to create new flows, you have to manipulate other flows in the network to reconfigure your capture process as fast as possible. And okay, the subsequent analysis is not part of capturing traffic, but it belongs to the network forensic investigation, so we mention the analysis phase at the end of this process.
And if you put some arrows between the phases, you get another framework. It’s called the virtual network forensic process. It derives from the generic network forensic process, which is, I would say, limited to the left side, which means identification, preparation, capture, recording, and analysis. And we define three separate phases, which are used repeatedly; it’s a circle, and it goes on and on, until someone – for example, the digital investigator – says, “Okay, let’s quit this process,” and then you leave the circle to [analyze] the relevant data.
So this is really theoretical, and I prefer to validate my theoretical thoughts in a practical way. So we developed ForCon. ForCon is still a proof of concept. It’s the short form for Forensic Controller, and it is based on the virtual network forensic process. ForCon implements network forensic investigation in virtual networks with a focus on law enforcement. It won’t implement a network forensic investigation that captures the traffic of all machines you have in a network. It only captures the traffic of one given system, identified by, let’s say, the MAC address, the IP address, the combination of both, whatever you want.
ForCon uses one central server and distributed agents. We call them SDN-agent and mirror-agent, [14:30]. There is one SDN-agent per physical host, which means every [14:36] [node] running in your network; in, say, the [Amazon] network, maybe you have … I don’t know, 100, 1,000 SDN-agents? But you only need one mirror-agent. The mirror-agent is there to connect your capture system somewhere in the network, where all traffic is transferred to.
And the main idea of ForCon is to extract relevant OpenFlow flows and manipulate them. This manipulation implements the ongoing capture process of the target system. ForCon is still … it has command-type information. At the beginning, the different agents connect to your central server, and then each agent transmits the local flows it found on each OpenFlow switch installed on its physical host to the central server. A message starts with an ‘I’ for informational, then the name of the OpenFlow switch instance is transmitted, and in this case two MAC addresses, which means the local OpenFlow switch has two flows in it, which define the source with the MAC address ending with a 3, and [16:06] the destination MAC address ending with 1, and the other way around, which means, if you try to identify the communication, there are only clients communicating, as shown above. That’s simplified, but the [server] [16:22] shows that it works in larger environments.
The agent only extracts flows and sends them to the server. The ForCon server analyzes these flows and searches for the identifier given by the digital investigator: maybe the MAC address, the IP address, a combination, whatever you want. Depending on this identification, ForCon has to decide. Either it’s a miss – there is no information about the relevant system I am searching for, so I just wait. I do not do anything; I’m waiting for other flows inside the network, sent maybe by another agent. On the other side, if the identifier of the target system is found inside the flows, ForCon creates a special tunnel. In this case, we use a VXLAN tunnel, but you can use a GRE or VLAN tunnel, whatever you want, between the mirror-agent, the one mirror-agent in the network, and the involved vswitch.
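The hit-or-miss decision described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ForCon’s actual code: the function name `find_target_switch`, the `reports` dictionary, and the flow strings are all hypothetical, and the identifier is matched by plain substring search.

```python
# Hypothetical sketch of the server-side hit/miss decision; not ForCon's real code.

def find_target_switch(reports, identifier):
    """Return the switch whose flows mention the identifier, or None on a miss.

    reports maps a switch name to the list of flow strings its agent sent.
    """
    needle = identifier.lower()
    for switch_name, flows in reports.items():
        if any(needle in flow.lower() for flow in flows):
            return switch_name  # hit: a tunnel to this switch would be created
    return None  # miss: keep waiting for flows from other agents


# Assumed example data: two agents report the flows of their local switches.
reports = {
    "s1": ["dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:04,actions=output:1"],
    "s2": [
        "dl_src=00:00:00:00:00:03,dl_dst=00:00:00:00:00:01,actions=output:1",
        "dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:03,actions=output:2",
    ],
}

print(find_target_switch(reports, "00:00:00:00:00:03"))  # s2
print(find_target_switch(reports, "00:00:00:00:00:09"))  # None
```

Substring matching is good enough for full MAC addresses, but it would be too loose for IP addresses (10.0.0.1 also matches 10.0.0.10), so a real implementation would have to compare parsed fields instead.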
How does ForCon manipulate the flows? There are different parts which are necessary to implement valid manipulations. At first, we split the communication into ingress and egress flows. Ingress means our target system is the destination, so these are the packets transmitted to our system of interest. Seen here, the destination MAC address is ending with a 3, which means in our case, we try to capture all traffic of the system of interest with the MAC address ending with number 3. And ingress is really easy to implement.
You have the actions in the OpenFlow … so here. Which means: put all traffic with the destination MAC address ending with a 3 to port 2. ForCon manipulates this and adds an additional [output mode], in this case it’s 99, which is defined by the tunnel which is created. So the first flow means: if the packet is sent to our system of interest, [send it] to this system of interest, because the system should not know that we are wiretapping it. And on the other side, copy this packet and transmit it to our capture system.
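The ingress manipulation amounts to appending one more output action to an existing flow. The following Python sketch shows the idea on a flow written as a text string; the helper name `add_mirror_output` and the exact flow string are assumptions made for illustration, while the ports 2 and 99 are the ones mentioned in the talk.

```python
# Hypothetical helper: append a mirror output to a flow's action list.

def add_mirror_output(flow, mirror_port):
    """Return the flow with an extra output action for the mirror port."""
    match, sep, actions = flow.partition("actions=")
    if not sep:
        raise ValueError("flow has no actions field")
    mirror = f"output:{mirror_port}"
    action_list = [a for a in actions.split(",") if a]
    if mirror not in action_list:  # do not add the mirror port twice
        action_list.append(mirror)
    return f"{match}actions={','.join(action_list)}"


# Assumed ingress flow: traffic to the MAC ending in 3 is delivered on port 2.
ingress = "dl_dst=00:00:00:00:00:03,actions=output:2"
mirrored = add_mirror_output(ingress, 99)
print(mirrored)  # dl_dst=00:00:00:00:00:03,actions=output:2,output:99
```

The original output stays in place, so the system of interest still receives every packet; the copy to port 99 is purely additional.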
Outgoing packets – egress packets – are not that easy. We need additional information. Only defining our destination, in this case number 1, and saying please copy this to output 99, would mean that lots of packets that aren’t relevant for us would arrive at the capture system. So we combine different information and implement a flow with … yeah, information of our system of interest. In this case, we add the MAC address again, as dl_src, and this flow defines: each packet sent from our system of interest with the MAC address ending with 3, sent to a system with the MAC address 1 at the end, put … [20:09] send it to output port 1, and send it additionally to output port 99. And for this flow to really be used, we have to set the priority. Priority manages the use of dedicated flows in the switch. The higher the priority, the … yeah, this flows [20:34].
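Why the priority matters for the egress flow can be shown with a small model of a flow table lookup, in which the matching entry with the highest priority is used. The Python sketch below is a simplified model, not Open vSwitch or ForCon code; the priority values 100 and 200 and the dictionary layout are assumptions chosen for the example.

```python
# Simplified model of a flow table lookup; priorities 100/200 are assumed values.

MAC1 = "00:00:00:00:00:01"
MAC2 = "00:00:00:00:00:02"
MAC3 = "00:00:00:00:00:03"  # system of interest

flows = [
    # Existing flow: everything destined for MAC1 goes out on port 1.
    {"priority": 100, "match": {"dl_dst": MAC1}, "actions": ["output:1"]},
    # Added egress flow: only packets from MAC3 to MAC1 are also copied to 99.
    {"priority": 200, "match": {"dl_src": MAC3, "dl_dst": MAC1},
     "actions": ["output:1", "output:99"]},
]


def select_flow(table, packet):
    """Return the matching flow with the highest priority, or None."""
    matching = [
        f for f in table
        if all(packet.get(field) == value for field, value in f["match"].items())
    ]
    return max(matching, key=lambda f: f["priority"], default=None)


from_target = select_flow(flows, {"dl_src": MAC3, "dl_dst": MAC1})
from_other = select_flow(flows, {"dl_src": MAC2, "dl_dst": MAC1})
print(from_target["actions"])  # ['output:1', 'output:99']
print(from_other["actions"])   # ['output:1']
```

Without the higher priority, the more general flow could be chosen for packets from the system of interest and nothing would reach the capture system; with it, traffic from other hosts to the same destination is still forwarded but not mirrored.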
How do we do this? [Let’s just all] dive into it. We’re using existing tools. Our agents are limited to Open vSwitch, but I think it’s the most notable OpenFlow switch in software. It’s an improvement of the old [20:56] switch, and it has lots of benefits. So in a virtual environment, Open vSwitch is … yeah, highly used.
So the format of [flow is] deterministic, but it’s vendor-specific, so we can’t access the SDN controller to extract information. So we have to go to the virtual switch, the OpenFlow switch, to extract the relevant information, and we have to take care of different fields: the priority I mentioned before; the action, which means the traffic is sent to this port or to that port; there are groups to combine different flows and put them in another table, and these are all parts we have to manage. There are also different timers in OpenFlow, which run out; the hard timeout, for example, defines that when it reaches zero, the flow has to be deleted from the OpenFlow switch.
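Extracting fields such as the priority, the actions, and the hard timeout from a textual flow entry could look like the following Python sketch. The sample line is modelled on the kind of output a flow dump from Open vSwitch produces, but its exact layout here is an assumption and may differ between versions; `parse_flow` is a hypothetical helper, not part of ForCon.

```python
# Hypothetical parser for a textual flow entry; the sample layout is assumed.

def parse_flow(line):
    """Split a flow line into a dict of fields plus a list of actions."""
    head, _, actions = line.strip().partition(" actions=")
    fields = {}
    for token in head.split(","):
        key, sep, value = token.strip().partition("=")
        if key:
            # Bare keywords without '=' are stored as flags.
            fields[key] = value if sep else True
    fields["actions"] = [a for a in actions.split(",") if a]
    return fields


sample = (
    "cookie=0x0, duration=12.3s, table=0, n_packets=5, n_bytes=490, "
    "hard_timeout=30, priority=200,dl_src=00:00:00:00:00:03,"
    "dl_dst=00:00:00:00:00:01 actions=output:1,output:99"
)

flow = parse_flow(sample)
print(flow["priority"], flow["hard_timeout"], flow["actions"])
```

All values are kept as strings. Actions that contain commas inside parentheses are not handled by this simple split, which is one reason to treat it as a sketch only.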
So in law enforcement, you can’t say, “Oh, it’s [22:11] but we will still capture the traffic [22:13].” It has a reason to put the hard timeout, so we have to mention this.
We split the flows, we store the relevant data, and the adaptation is [with regards] to the given situation.
Lots of text, plus an image. This is a simplified but realistic infrastructure. You see two different SDN controllers, one on the left, one on the right. The OpenFlow switch s1 is connected to both SDN controllers, which is possible. And the OpenFlow switches s2 and s3 are connected only to the right SDN controller. Then you can implement ForCon in your virtual environment, and these green marked squares are the agents, one per [computer], one per physical host, not one per OpenFlow switch. So the number is a little bit lower.
And you have the [brmon] agent running on the upper right side, where your capture system is connected. And if you start ForCon and say, “Okay, let’s capture the traffic,” [don’t know] [23:33]. On the right side of switch 2, it identifies, it tells the agents, “Okay, send me your flows.” Each agent, on the left and on the right side, sends its flows to ForCon. ForCon analyzes the flows, and then says, “Okay, my system of interest is connected to OpenFlow switch s2, so we have to create the channel to capture all the traffic and transmit every packet to our capture system.”
After the initial capture, the monitoring phase is started. So each agent, or at the beginning only the agent on s2, on the physical host where s2 is running, [24:21] environment, waiting for a relevant change of the connection of the system of interest. All other events are irrelevant. But if [something] happens, the agent informs ForCon, and ForCon says, “Okay, all agents, please update your flows, and send me the current data you got,” and if the system of interest has moved to s3, then the information is sent from the agent, and now your system is connected to s3 and the tunnel is reconfigured, not ending here, but ending there. And you have an ongoing capture process, without any [packet loss].
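The monitoring and adaptation cycle described here, where fresh flow reports reveal that the system of interest has moved from s2 to s3, can be modelled as a small state tracker. The Python sketch below is an assumption-laden illustration: the class `TargetTracker`, its method names, and the report format are hypothetical, and the actual tunnel reconfiguration is only indicated by a comment.

```python
# Hypothetical tracker for the switch the system of interest is attached to.

class TargetTracker:
    def __init__(self, identifier):
        self.identifier = identifier.lower()
        self.current_switch = None

    def update(self, reports):
        """Process fresh flow reports; return (old, new) on a move, else None."""
        seen = [
            name for name, flows in reports.items()
            if any(self.identifier in flow.lower() for flow in flows)
        ]
        if not seen or seen == [self.current_switch]:
            return None  # miss, or the target is still where it was
        # Prefer a switch other than the current one: stale flows may linger.
        new_switch = next(name for name in seen if name != self.current_switch)
        move = (self.current_switch, new_switch)
        self.current_switch = new_switch
        # A real controller would now re-point the mirror tunnel to new_switch.
        return move


MAC3 = "00:00:00:00:00:03"
tracker = TargetTracker(MAC3)
first = tracker.update({"s1": [], "s2": [f"dl_dst={MAC3},actions=output:2"]})
second = tracker.update({"s2": [f"dl_dst={MAC3},actions=output:2"]})
third = tracker.update({"s2": [], "s3": [f"dl_dst={MAC3},actions=output:4"]})
print(first, second, third)  # (None, 's2') None ('s2', 's3')
```

Preferring a switch other than the current one when the identifier shows up in two places is a design guess for the case where old flows have not yet expired on the previous host.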
Is that really the case? So we evaluated this. This is a very simplified environment, but how does ForCon operate in different situations? What will happen if the CPU load reaches 100 per cent? Or the [25:23] address the memory table where our MAC addresses are stored? What if the number of entries stored in it keeps increasing to 100, 1,000, whatever the limit of the [25:35] memory table is? What will happen if the network we are using, the physical network, is highly used? Lots of packets are transmitted; what will happen, how will ForCon act in these scenarios? And does ForCon really capture all network packets, or is there packet loss?
It’s very easy to decide: you only compare the number of transmitted and received packets [in the environment]. You can send a predefined capture file into it and you can see, okay, this capture file has ten packets, so on the other side, I should receive ten packets too. And the most important question, I would say, for law enforcement: are the captured packets correct? Do they really have the transmitted data in them, or has something changed? Because it’s a virtual environment and we’re using a tunnel to transfer the data. Is what we capture really correct?
In short, yes, it is. We evaluated the high CPU load: we used the tool stress to increase the CPU load to 100 per cent, and we did not have any packet loss. We [filled the CAM] with, I would say, 2,500 virtual NICs, so 2,500 MAC addresses are stored in the CAM, and so the number of flows is still increasing, exploding, and we still have a 100 per cent packet match. No packets were [lost]. And for the network usage, we used iperf to simulate a high load on the network, and even then we had a 100 per cent packet match – no packets lost.
Integrity – whether the packets are correct – was not that easy to determine, because it takes time to transfer the packet to different systems. The capture system, it’s on the [right side], is not as close to the source as the other systems. So the packets have to travel through the network, so the arrival timestamp differs. Depending on the environment, it may [vary].
But if you extract only the payload which is relevant, so if you cut off some information, like timestamps or so, and you only extract the MAC address or the IP address of the real transfer data, you can see the [comparison] with hash values showed that the packets are the same. It’s not that easy: you have to split your whole capture file, store each network packet in a single file, and chop off some information, and if you then compare the network data, the real payload, you see that the captured data is the same.
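The two checks from the evaluation, counting packets and comparing hash values of the payload while ignoring timestamps, can be sketched as follows. The packet representation (a dictionary with `timestamp` and `payload`), the function names, and the choice of SHA-256 are assumptions for this example; the talk does not say which hash function or tooling was used.

```python
import hashlib
from collections import Counter

# Hypothetical packet model: only the payload is hashed, the timestamp is ignored.

def payload_digest(packet):
    return hashlib.sha256(packet["payload"]).hexdigest()


def compare_captures(sent, received):
    """Compare two captures by packet count and by payload hash values."""
    sent_digests = Counter(payload_digest(p) for p in sent)
    received_digests = Counter(payload_digest(p) for p in received)
    missing = sent_digests - received_digests      # sent but never captured
    unexpected = received_digests - sent_digests   # captured but never sent
    return {
        "sent": len(sent),
        "received": len(received),
        "missing": sum(missing.values()),
        "unexpected": sum(unexpected.values()),
    }


sent = [
    {"timestamp": 1.000, "payload": b"packet-one"},
    {"timestamp": 1.001, "payload": b"packet-two"},
]
# The capture system sees the same payloads slightly later.
received = [
    {"timestamp": 1.004, "payload": b"packet-one"},
    {"timestamp": 1.006, "payload": b"packet-two"},
]
result = compare_captures(sent, received)
print(result)
```

Using multisets of digests instead of comparing the lists position by position means a reordering on the way to the capture system does not count as a mismatch, while a lost or altered packet still shows up.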
So let me conclude. SDN and network virtualization increase flexibility and dynamics in today’s data centers. So I would say the virtual network is becoming a bigger problem for law enforcement agencies. Traditional wiretapping techniques will fail over time, I would say. OpenFlow, as the most notable protocol, does not provide any forensic capabilities. I don’t really know of a tool which is designed for use in a data center and which provides any forensic technology that improves the work of a digital investigator.
Lots of issues exist in these virtual environments, but the migration of virtual machines is, I would say, the most critical. If a virtual machine migrates and moves to another physical host, your static capture process will fail.
ForCon, as a proof of concept, eradicates the static implementation and really creates a highly dynamic and flexible process to ensure the ongoing capture process, even if the system of interest moves. ForCon uses distributed agents, one per compute node, to monitor [each] environment, and one mirror-agent. And yeah, the evaluation of ForCon validates the correctness.
That’s it. Thank you.
Host: Thank you, Daniel. I think we’ve got time for two very quick questions. Anyone?
Audience member: Chris. Thank you for your excellent talk. Why do you communicate with the OpenFlow switch directly instead of integrating or communicating with the SDN controller?
Spiekermann: We have different ways to extract the relevant information. In law enforcement, you typically come into an environment where you do not have any information at the beginning. So if you come to, let’s say, some environment, and they use an SDN controller named Floodlight. Floodlight stores … no SDN controller stores any relevant information for a long time. They are in memory, but you have to extract them … in any way you could imagine. But there is no valid way to extract the flows out of every SDN controller. In our research, we found more than 58 different SDN controllers, and you need … some provide an API to extract relevant information, sometimes you have to reload some modules to extract this information, but by extracting the flows out of the OpenFlow switch, you have all the information you need.
So the amount of relevant information on an SDN controller is typically not enough to create the capture process. And it might happen that the SDN controller has informed the OpenFlow switch, “Hey, transmit all these packets now to open port 5.” And if you then, after this, go to the controller and extract information from it, you won’t get this information. So it takes more time to create the initial capture process. The OpenFlow switch has each flow stored, and if you extract this information, you can create your capture process in time. Just extracting, creating, finished. Otherwise, you have to wait until the controller sends new information of this agent. Okay?
Host: One more, very quick …
Audience member: Maybe just a question about scalability of the system. How scalable? Because we are talking about virtualization, we are not talking of actual machines, right?
Audience member: So you have some [33:09] result about how scalable can …
Spiekermann: We evaluated ForCon with … ForCon needs an agent on each physical host. So if you leave some physical hosts without any supervision, you won’t get any information. So you have to install an agent on each physical host. That’s first. But at the beginning, each agent sends information to the server, and then only the relevant switch is involved, and this switch is [33:55]. So you can have 40,000 agents, but there is only one relevant [in this type]. So it’s … ForCon is not a virtual environment, it’s just … in our proof of concept it’s just some lines of Python code, which extract the relevant information. So scalability is, until now, not needed, I would say.
Host: Good. Thank you, Daniel. Let’s give him a hand.
End of Transcript