Hi, my name is Tina Wu, and today I’ll be presenting our work on IoT Network Traffic Analysis: Opportunities and Challenges for Forensic Investigators.
I’ll give you a brief introduction to the topic and what motivated us to carry out this research, a detailed breakdown of our methodology, and our results from the experiments. Finally, I’ll talk about the tool we developed to help investigators carry out the analysis.
With the increasing number of IoT devices and the potential evidence on them, these devices have become increasingly important in criminal cases. When IoT devices are involved in an investigation, the investigator must decide how to collect evidence: for example, from the device’s memory or from the network layer. This work focuses on the latter approach.
Typically, this involves examining the network traffic between the devices and the systems they communicate with. In other words, we look at all the communication channels of IoT devices, such as the cloud and mobile apps, searching, for example, for unencrypted information.
So what motivated us to carry out this research? We wanted to understand what metadata in network traffic can be useful as evidence. There has been a significant amount of research on IoT devices and their network traffic from various angles, such as security. However, most existing research does not focus on the forensic implications. This helped us formulate four research questions.
Our first research question was: does an IoT device expose ports that allow an investigator to connect to or access the device? Previous research has shown that many IoT devices expose remote ports. For example, an open port 22 (SSH) would allow investigators to easily access and acquire the file system or obtain evidence for an investigation. We also wanted to study whether remote access was widely available or limited to a subset of IoT devices.
Our second research question was: do IoT devices use encryption when sending or receiving information from the cloud and the mobile app? Previous research often states that the majority of IoT devices encrypt their communication channels, which makes it very difficult to obtain any useful forensic traces. So we set out to investigate whether this assertion is correct and, if not, whether certain devices are more prone to sending clear text data.
Our third research question was: do IoT device mobile apps use encryption when sending or receiving information? Being able to observe the content exchanged between a mobile app and the cloud can be of forensic interest, and even when encryption is used, we can use a proxy server to decrypt the HTTPS traffic. So we set out to investigate what content is sent between IoT device mobile apps and the cloud.
Our last research question was: to which countries do IoT devices and mobile apps communicate or establish connections? This helps indicate where data resides. Previous research often highlights that data is spread across many different countries. We also assumed that data from IoT devices is stored either locally, in the same country, or at least within the EU, so we wanted to investigate whether this is true.
We now give a detailed breakdown of our methodology. For our experiments, we used network traffic from two sources as our datasets.
- Dataset one: network traffic collected from 17 IoT devices connected to our test bed.
- Dataset two: an existing dataset created by previous research, whose authors used it to classify IoT device traffic into categories using machine learning, for example, whether a device was a light bulb or a home assistant.
They collected traffic from 28 IoT devices, such as cameras, switches and hubs. Given the sheer amount of data they collected, we randomly selected only seven days from the dataset and excluded any devices that overlapped with our experiments. This left us with 15 IoT devices.
Merging datasets one and two gave us 32 IoT devices for our experiments. When selecting the 17 IoT devices for our test bed, our first criterion was variety: the devices came from a range of families, including hubs, cameras, switches and smart speakers.
This also ensured that we had different manufacturers, so the sample was representative of devices available on the market. Our second criterion was popularity: we searched popular outlets such as Amazon and eBay and selected devices based on popularity, average customer rating and reviews. This ensured that we chose devices that consumers were more likely to use.
Our third criterion was compatibility with virtual assistants. When looking at popular devices, we found that users very often favored devices compatible with Amazon Alexa or Google Home Mini, so if we had a choice between two particular devices, we chose the compatible one.
We collected network traffic from the various communication channels in our IoT environment. We first carried out port scanning, and then collected network traffic from the three communication channels of an IoT environment. We then used a proxy server to examine the network traffic between a mobile app and the cloud and searched the HTTPS traffic. Finally, we established the location of the data.
To answer research question one (does an IoT device expose ports that allow an investigator to connect to or access the device?), we carried out port scans to identify open ports that could allow an investigator to connect to a device. We used Nmap to do a quick scan for open ports, and then used appropriate software to access them, such as a browser for ports 80 and 443.
We used PuTTY to access port 22. We found that the majority of devices used well-known ports or proprietary ports. Only one device, the Vera hub, allowed remote access, via port 22. The root password was also written on the hub, which made it easy for us to gain remote access.
We found that the Victure cam and the Wansview cam exposed a large number of TCP and UDP ports. This contrasted with two other smart cameras, the Xiaomi cam and the YI cam, where all the ports were closed. Although having all ports closed is beneficial from a security perspective, it obviously prevents an investigator from gaining remote access to acquire the file system using traditional forensic tools.
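Nmap and PuTTY were the tools used here. Purely to illustrate what a basic open-port check does, the sketch below performs a simple TCP connect scan with Python’s standard library. It is a simplified illustration under stated assumptions, not Nmap’s method (a privileged Nmap run defaults to a SYN scan), and it covers only TCP, not the UDP ports mentioned above.

```python
import socket


def tcp_connect_scan(host: str, ports: list[int], timeout: float = 0.5) -> list[int]:
    """Return the ports on `host` that accept a full TCP connection.

    A completed three-way handshake means the port is open. UDP is not
    probed, and this is not how Nmap scans by default.
    """
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            try:
                # connect_ex returns 0 on success instead of raising.
                if sock.connect_ex((host, port)) == 0:
                    open_ports.append(port)
            except OSError:
                # Timeouts and unreachable hosts are treated as "not open".
                pass
    return open_ports
```

For example, `tcp_connect_scan("192.168.1.50", [22, 80, 443])` would report which of the SSH, HTTP and HTTPS ports accept connections on a hypothetical device address.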
Our network traffic analysis was divided into three parts: whether the device uses encryption, whether we could decrypt the network traffic using an HTTP proxy, and the location of the data.
As an initial step to determine whether a device used encryption, we analyzed all the network traffic with NetworkMiner and Wireshark. In NetworkMiner, we used the clear text dictionary to carry out a customized search, and in Wireshark, we carried out a string search on the network traffic of each device, looking for any device identifiers and personal information, such as names, emails and passwords.
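The Wireshark string search is done interactively. As a rough scripted equivalent, assuming packet payloads have already been extracted as bytes, a keyword and MAC-address search could look like the sketch below. The keyword list is hypothetical; in practice it would be built from the names, emails, passwords and device identifiers relevant to the case.

```python
import re

# Hypothetical search terms; replace with case-specific identifiers.
KEYWORDS = [b"password", b"email", b"username"]
# Six hex byte pairs separated by ':' or '-', e.g. AA:BB:CC:DD:EE:FF.
MAC_RE = re.compile(rb"(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}")


def find_cleartext_hits(payload: bytes) -> list[str]:
    """Return the keywords and MAC-address strings found in a raw payload."""
    lowered = payload.lower()  # case-insensitive keyword matching
    hits = [kw.decode() for kw in KEYWORDS if kw in lowered]
    hits += [mac.decode() for mac in MAC_RE.findall(payload)]
    return hits
```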
To find an easier way to identify whether traffic was unencrypted, we used entropy tests with a tool called ENT, which analyzes the packet payload for unencrypted information. The entropy value ranges from zero to eight bits per byte. A high entropy value indicates randomness in the payload, which is then most likely encrypted. A value closer to zero, which is low entropy, means the payload is more likely to be clear text. Although there is no definitive threshold, we set ours at seven to avoid missing any unencrypted traffic.
We then set up a proxy server to intercept traffic between a mobile device and the cloud. We used the Fiddler proxy server with an Android mobile device, and ran and tested various interactions, such as device log-in and log-off.
In total, we examined 13 mobile apps and manually analyzed the data in Fiddler. For the location of the data, we used the metadata and primarily focused on the location of the connected cloud services. We developed a Python script to extract the destination IP address and host field for each server contacted by each device. We then used the destination IP address to identify the location with a GeoIP database, and the host address with WHOIS data to identify the IoT cloud infrastructure.
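The entropy test itself is short to express. This sketch computes Shannon entropy in bits per byte, the same 0 to 8 scale that ENT reports, for a single payload and applies the threshold of seven. Computing it per payload rather than per capture file is an assumption made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(payload: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)  # frequency of each byte value 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def likely_cleartext(payload: bytes, threshold: float = 7.0) -> bool:
    """Flag payloads below the entropy threshold as candidate clear text."""
    return shannon_entropy(payload) < threshold
```

Two caveats follow from the formula. A payload of n bytes can score at most log2(n) bits, so short encrypted payloads can still fall below seven; and compressed data such as video also scores high despite being unencrypted.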
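A minimal sketch of the extraction step follows. It assumes packets have already been parsed out of the PCAP into hypothetical `(device, destination IP, payload)` tuples, and it collects each device’s destination IPs together with the HTTP Host header wherever the payload is plaintext HTTP. The GeoIP and WHOIS lookups are separate steps and are not shown, since they depend on an external database or service.

```python
import re
from collections import defaultdict
from typing import Optional

# Matches a "Host:" header line anywhere in a plaintext HTTP payload.
HOST_RE = re.compile(rb"^Host:\s*([^\r\n]+)", re.IGNORECASE | re.MULTILINE)


def extract_host(payload: bytes) -> Optional[str]:
    """Return the HTTP Host header value, or None if there is none."""
    match = HOST_RE.search(payload)
    return match.group(1).strip().decode(errors="replace") if match else None


def destinations_by_device(records):
    """Map each device to the set of (destination IP, Host header) pairs.

    `records` is an iterable of (device_name, dst_ip, payload) tuples,
    a hypothetical stand-in for packets already parsed from a PCAP.
    """
    result = defaultdict(set)
    for device, dst_ip, payload in records:
        result[device].add((dst_ip, extract_host(payload)))
    return dict(result)
```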
Research question two: do IoT devices use encryption when sending and receiving information from the cloud and the mobile app?
We examined the unencrypted traffic for any evidence potentially useful for an investigation. We found that the majority of devices encrypted their network traffic, especially between the mobile app and the cloud.
Overall, we found that nine devices used no encryption with the cloud or the mobile app. Seven devices used no encryption between the device and the cloud, and three used no encryption between the mobile app and the device.
Specifically, the Xiaomi camera used no encryption when communicating with the cloud, and the D-Link camera communicated with the mobile app in clear text. Both devices nevertheless showed high entropy scores: because they use video compression, the entropy test fails on such devices, cameras in particular. We also found that when the Xiaomi camera detected motion, it sent the unencrypted video, its MAC address and a timestamp.
The D-Link camera behaved similarly: when the mobile app was activated, partial JPEG images were present in clear text in the HTTP header exchanged between the device and the mobile app during live streaming.
The Samsung camera also sent unencrypted HTTP POST requests to the cloud that exposed unique identifiers, such as the MAC address, username, serial number, timestamp and other user-specific device names.
The three devices manufactured by Withings all sent clear text data through HTTP POST requests. More specifically, the Withings smart scale exposed a considerable amount of user information, such as weight and height, mainly in HTTP POST requests. All this data is sensitive and helpful not just for identifying the user but also for establishing their physical characteristics.
Research question three: do IoT device mobile apps use encryption when sending and receiving information? For the mobile app-to-cloud channel, we found that seven of the thirteen apps allowed a proxy connection, while the rest used certificate pinning.
When we opened the YI cam mobile app, it sent a list of URLs containing the captured motion, along with the username, user ID and API key.
We also found some more unusual activity from the TP-Link camera. When we opened the app, it would take a snapshot, and the traffic included a timestamp and a URL linking to the JPEG snapshot. We weren’t able to control or disable this functionality. From the same mobile app, we also captured HTTP GET requests that exposed the basic authentication field, which contains the username and password used to log in to the device. These credentials are only encoded in Base64, so they can be decoded easily.
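To show how little protection Base64 offers, this sketch decodes an HTTP Basic `Authorization` header value into a username and password using only the standard library. The example credentials are made up.

```python
import base64


def decode_basic_auth(header_value: str) -> tuple[str, str]:
    """Decode an 'Authorization: Basic <token>' value into (username, password)."""
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.lower() != "basic":
        raise ValueError(f"not a Basic credential: {scheme!r}")
    # The token is base64("username:password"); split on the first colon only,
    # since the password itself may contain colons.
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password
```

For example, `decode_basic_auth("Basic YWRtaW46c2VjcmV0")` returns `("admin", "secret")`: no key or secret is involved in the decoding.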
Research question four: to which countries do IoT devices and apps communicate and establish connections? This gives an indication of where the data resides and identifies the destinations the data traverses. We found that, overall, the connections of 26 of the 32 devices terminated in the US, as shown in this table. This is unexpected, as the closest Amazon data center to our test bed is in the UK.
This figure shows the flow of traffic to the top 10 countries; the height of each band corresponds to the bytes sent by each device. Overall, we found that 75% of the IoT devices sent data to multiple destinations. From an investigator’s point of view, this is interesting, as multiple destinations and jurisdictions can potentially delay gaining access to and securing the data.
Next, we examined the device that contacted the most destinations. We found that one of the light bulbs contacted over 28 destinations. Compared with other devices such as smart cams and smart hubs, the light bulb had limited features, so it is surprising that it contacted that many destinations.
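Given a per-device set of contacted destinations (IP addresses, hosts or countries), both figures discussed here, the share of devices with multiple destinations and the device with the most destinations, reduce to simple counting. The input below is hypothetical, not the study’s data.

```python
def destination_summary(device_destinations: dict[str, set[str]]):
    """Return (share of devices with >1 destination, top device, its count)."""
    multi = sum(1 for dests in device_destinations.values() if len(dests) > 1)
    share = multi / len(device_destinations)
    top_device = max(device_destinations, key=lambda d: len(device_destinations[d]))
    return share, top_device, len(device_destinations[top_device])


# Hypothetical example input: device name -> set of destinations contacted.
example = {"bulb": {"a", "b", "c"}, "cam": {"a"}, "hub": {"a", "b"}, "plug": {"x"}}
share, top_device, count = destination_summary(example)
```

Here two of the four hypothetical devices contact more than one destination, so `share` is 0.5, and the bulb is the top device with three destinations.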
Tool creation. Although our results can be obtained using separate open-source tools, manually extracting the data would take an investigator considerable time. Consequently, we developed a tool called IoT Network Analyzer in Python to automate the process.
The tool has four main features. First, entropy calculation: it identifies all sessions in clear text using a Shannon entropy test. Second, location of data: it extracts the source and destination IP addresses and estimates their geolocation. Third, usage of secure ports: it creates a list of ports with the total number of occurrences of each port, and then checks this list against a predefined list of 22 secure ports. Finally, clear text extraction: the tool extracts any clear text information it finds in the network packets.
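The secure-port feature can be sketched with a `Counter`. The tool’s predefined list of 22 secure ports is not given here, so `SECURE_PORTS` below is a hypothetical stand-in containing a few well-known TLS and SSH ports.

```python
from collections import Counter

# Hypothetical subset standing in for the tool's list of 22 secure ports:
# SSH, HTTPS, SMTPS, IMAPS, POP3S and MQTT over TLS.
SECURE_PORTS = {22, 443, 465, 993, 995, 8883}


def port_overview(dst_ports):
    """Count each destination port and flag whether it is in SECURE_PORTS.

    Returns {port: (occurrences, is_secure)}, most frequent ports first.
    """
    counts = Counter(dst_ports)
    return {port: (n, port in SECURE_PORTS) for port, n in counts.most_common()}
```

For example, `port_overview([443, 443, 80, 8883])` counts port 443 twice and flags port 80 as the only port outside the secure list.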
This is a quick demo of the prototype tool, IoT Network Analyzer. Here we’re using it to analyze a PCAP file captured from one of the IoT devices on our test bed. It has a command-line interface and is very interactive.
First, we load the PCAP file we want to analyze. The user can then select various commands. For example, G lists all of the IP addresses and their geolocation data, E displays a list of the calculated entropy values, CN lists all connections between source and destination, and PO gives an overview of all the ports used and the number of occurrences of each port.
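The command menu described above maps naturally onto a dispatch table. This is a hypothetical sketch, not the tool’s code: it assumes the analysis results (geolocation, entropy, connections and port counts) have already been computed into a dictionary, and it only formats the view each command selects.

```python
def run_command(command: str, analysis: dict) -> str:
    """Dispatch one menu command (G, E, CN, PO) to a text view of the results."""
    views = {
        "G": lambda: "\n".join(f"{ip}\t{loc}" for ip, loc in analysis["geo"].items()),
        "E": lambda: "\n".join(f"{sess}\t{h:.2f}" for sess, h in analysis["entropy"].items()),
        "CN": lambda: "\n".join(f"{src} -> {dst}" for src, dst in analysis["connections"]),
        "PO": lambda: "\n".join(f"{port}\t{n}" for port, n in analysis["ports"].items()),
    }
    view = views.get(command.strip().upper())
    return view() if view else f"unknown command: {command!r}"


def interactive_loop(analysis: dict) -> None:
    """Read commands until the user enters Q."""
    while True:
        cmd = input("command (G/E/CN/PO, Q to quit)> ")
        if cmd.strip().upper() == "Q":
            break
        print(run_command(cmd, analysis))
```

Keeping the formatting in a pure function, separate from the `input()` loop, makes each view easy to test without an interactive session.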
Finally, we evaluated the tool’s four main features and its performance. We used five PCAP files that we knew contained clear text data and five PCAP files with no clear text data. To evaluate the entropy feature, we compared our results with a similar tool called ENT, and both produced identical results.
For location of data, we used an external IP geolocation service called ipstack and compared it with two similar services, IP2Location LITE and GeoLite2. The databases showed very little difference, meaning any of them would be equally suitable. To evaluate the usage of secure ports feature, we compared its results with a similar tool, TShark, and both tools showed identical results.
For clear text extraction, we compared our tool with NetworkMiner. Our tool identified the same PCAP files as containing clear text, and both tools showed the same clear text information.
Finally, we tested the tool’s performance with respect to processing speed, CPU and memory usage. For PCAP files ranging from 500 to 75,000 kilobytes, processing took from around one second to six minutes. The CPU and memory evaluation shows that the tool is not resource intensive: it consumed only 60% CPU and has a very small memory footprint.
Finally, our conclusions. Out of the 17 devices, we found only one device that allowed remote access, and two smart cameras had closed all their ports. The majority of devices encrypted their network traffic. Devices that sent data in clear text were mainly smart cameras and smart healthcare devices, with the smart healthcare devices exposing the most personal information, which an investigator could use to identify characteristics of a person of interest.
Seven of the 13 mobile apps allowed a proxy connection. Unexpectedly, some of these apps would take snapshots when opened, and the traffic included a timestamp and a URL linking to the JPEG snapshot. Lastly, the majority of the data we found was sent to the US, despite our test bed being based in the UK and the second dataset having been collected in Australia.
Thank you for listening to my presentation.