Morteza Safaei Pour shares his research at DFRWS EU 2019.
Morteza: Hello and good morning. My name is Morteza Safaei Pour, from the CTI Lab at Florida Atlantic University. Today I'm going to talk about comprehending the IoT cyber threat landscape: how we can gain insight into IoT maliciousness and IoT probing campaigns.
We have all heard a lot about the Internet of Things. Many of these devices are produced every day and connected to the network, and we expect that by 2025 more than 75 billion of them will be connected. They range from network devices, IP cameras, and smart homes to cyber-physical systems and industrial control systems.
Usually, people connect these devices directly to the internet, or the devices use protocols that open ports to the public network. Interestingly, if you connect a brand-new device to the network, it takes anywhere from a few seconds to a few minutes for it to be infected by one of the IoT malware families.
There are some problems with these devices. Most of them arise because manufacturers don't take security into account, whether to cut costs or to meet deadlines. These devices may also have limited resources, so there is no security module on them. But the main problem is the update, or patch, mechanism.
These devices are also deployed widely, all around the world, on many different kinds of networks. We have no access to many of them, and the insecurity of one device can affect the security of other parts of the network. They are heterogeneous, spanning many vendors, firmware versions, and hardware platforms, so no single solution fits all of them. And there is a serious lack of IoT-centric empirical data to analyse in order to gain more insight into these devices.
There are several concerns related to the insecurity of the Internet of Things. You may have heard of the Mirai botnet: in 2016 it performed one of the largest DDoS attacks in history, and since then we have witnessed the evolution of this botnet as well as several other IoT botnets. They are getting more and more sophisticated: compared with the first generation of Mirai, they exploit more powerful vulnerabilities and attacks to gain access to devices and infect them.
Attackers use these devices to perform larger-scale DDoS attacks, spam campaigns, and cryptojacking (running cryptominers on your devices without your permission), among other activities. They can also use them to attack critical infrastructure, not only by gaining access to industrial control systems in a manufacturing plant, but also by controlling a large number of high-wattage devices to cause power outages or damage the power grid.
The devices can also serve as an entry point into other parts of a network, because network administrators don't pay attention to them and, as mentioned, they lack update and patch mechanisms to fix vulnerabilities.
There are also severe privacy concerns. These devices carry many different kinds of sensors; in many cases they can watch you, listen to what you're saying, and collect other information through smart-home systems and other services.
The objective of this research was to obtain an internet-wide view of IoT maliciousness, because the first step toward mitigation or remediation is visibility into all of these devices. As a next step, we wanted to identify campaigns of devices working together, based on their network traffic.
But to achieve these goals, we had to overcome several challenges.
The first is data gathering. These devices are distributed all around the world: how can we get information about all of them? And most of the time, networks don't want to share their data, for various reasons.
Also, how can we tell that these devices are infected without having access to them? In addition, we need a way to determine whether a device is IoT or non-IoT. And we don't have any ground truth to test our models or approach.
For data gathering, we rely on the darknet, or network telescope. It is a set of routable, allocated, but unused IP addresses, acting like a collection of sensors on the internet. They don't interact with anything; they passively gather all the packets sent to them. They contain no legitimate hosts and are never assigned to any machine, so there is no legitimate reason for anyone to communicate with them. Every packet arriving at these addresses is therefore the result of some misconfiguration or malfunction, or of something suspicious.
For this research, we used the CAIDA /8 darknet, which is about 17 million IP addresses, so it is a good vantage point on the whole internet.
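To make the /8 figure concrete: a /8 prefix fixes the first 8 of the 32 IPv4 address bits, leaving 2^24 = 16,777,216 addresses, which is the "about 17 million" above. The sketch below, assuming a hypothetical telescope prefix of 44.0.0.0/8 chosen only for illustration, counts those addresses and keeps the source IPs of packets that land inside the prefix. The `(src, dst)` packet tuples and the function names are also assumptions, a simplified stand-in for real packet captures.

```python
import ipaddress

# Hypothetical telescope prefix, used only for illustration.
DARKNET = ipaddress.ip_network("44.0.0.0/8")


def hits_darknet(dst_ip: str) -> bool:
    """True if a packet's destination address falls inside the telescope."""
    return ipaddress.ip_address(dst_ip) in DARKNET


def darknet_sources(packets):
    """Collect source IPs of packets addressed to the telescope.

    `packets` is an iterable of (src_ip, dst_ip) string pairs. Since no
    legitimate host lives in the darknet, every such source is suspicious.
    """
    return {src for src, dst in packets if hits_darknet(dst)}


if __name__ == "__main__":
    print(DARKNET.num_addresses)  # 2**24 addresses in a /8
    sample = [("198.51.100.7", "44.3.2.1"), ("203.0.113.9", "8.8.8.8")]
    print(darknet_sources(sample))
```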
When one of these devices gets infected, it starts scanning the internet to find new vulnerable devices, infects them, and adds them to its botnet. This is usually done in an orchestrated manner, with the task divided among several bots.
When they scan the internet, they inevitably hit the darknet as well. By processing the traffic arriving at the darknet, we can gain information about the devices doing the scanning.
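As a rough sketch of why the telescope sees these scanners, assume a bot picks targets uniformly at random from the whole IPv4 space; real malware may skip reserved ranges or use other strategies, so this is a simplifying assumption. A /8 covers 2^24 of the 2^32 addresses, i.e. 1/256 of the space, so a bot sending N probes should land about N/256 of them in the darknet. Inverting that ratio gives a crude estimate of a scanner's global probing rate from what the telescope observes. The function names and the first octet used in the simulation are hypothetical.

```python
import random

IPV4_SPACE = 2 ** 32


def darknet_fraction(prefix_len: int = 8) -> float:
    """Share of the IPv4 space covered by a single prefix of this length."""
    return 2 ** (32 - prefix_len) / IPV4_SPACE


def expected_darknet_hits(n_probes: int, prefix_len: int = 8) -> float:
    """Expected probes landing in the telescope under uniform random targeting."""
    return n_probes * darknet_fraction(prefix_len)


def estimated_global_rate(darknet_pps: float, prefix_len: int = 8) -> float:
    """Extrapolate a scanner's total probing rate (packets/s) from its darknet rate."""
    return darknet_pps / darknet_fraction(prefix_len)


def simulate_hits(n_probes: int, darknet_first_octet: int = 44, seed: int = 1) -> int:
    """Monte Carlo check: count random IPv4 targets whose first octet matches."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_probes)
               if rng.getrandbits(32) >> 24 == darknet_first_octet)
```

For example, under this assumption a scanner seen sending 2 packets per second into the /8 would be probing roughly 512 packets per second across the whole internet.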
We used this approach to analyse the darknet data, infer these probing and scanning activities, and treat them as an indicator of compromise to find infected scanners, that is, infected devices. The problem is that we don't know which of these devices are IoT and which are non-IoT.
So in the next step, we use Shodan. Shodan is a search engine for the Internet of Things: it crawls the internet, actively scans all IPs, and indexes all internet-facing devices and services.
We take the Shodan database and correlate it with what we gather from the darknet, the list of scanners, to find the IoT devices that are scanning, i.e. the infected IoT devices on the internet.
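The correlation step amounts to a join between the darknet scanner list and a device index. The sketch below assumes a hypothetical local snapshot of Shodan-like results, a dict mapping each IP to a record with a `device_type` field, and a hypothetical set of device types treated as IoT. It does not call the real Shodan API, and the field names are illustrative only.

```python
from typing import Dict, Iterable, List

# Hypothetical labels treated as IoT; a real pipeline would derive these
# from banner and product information in the index.
IOT_TYPES = {"router", "ip_camera", "dvr", "environmental_sensor"}


def infected_iot(scanners: Iterable[str],
                 device_index: Dict[str, dict]) -> List[dict]:
    """Return index records for scanning IPs that look like IoT devices.

    `scanners` are source IPs flagged as scanning in the darknet;
    `device_index` is a hypothetical IP -> record snapshot. Scanners absent
    from the index are dropped, as are records whose device type is not
    in IOT_TYPES.
    """
    matches = []
    for ip in sorted(set(scanners)):
        record = device_index.get(ip)
        if record and record.get("device_type") in IOT_TYPES:
            matches.append({"ip": ip, **record})
    return matches
```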
This is the big picture of the methodology for analysing the darknet. More than 100 GB of packets arrive at the CAIDA telescope every hour. We analyse all the packets using threshold random walk to detect scanning activity. After that, we correlate the results with Shodan and filter out non-IoT devices, then add more information such as geolocation: IP, country, and so on. This lets us tag many different entities and network providers and prepare a database organised by business sector, telling us whether the devices belong to manufacturing, healthcare, government, or other sectors.
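Threshold random walk (TRW), as introduced by Jung et al. (2004), is a sequential hypothesis test: each connection attempt from a source multiplies a likelihood ratio, and the source is declared a scanner or benign once the ratio crosses an upper or lower threshold. The sketch below follows that generic formulation with illustrative parameter values (θ0 = 0.8, θ1 = 0.2, P_D = 0.99, P_F = 0.01) that are assumptions, not figures from the talk. How the authors adapted TRW to darknet traffic, where every probe to an unused address is effectively a failed attempt, is not described here, so treat this as an illustration rather than their implementation.

```python
from typing import Iterable, Optional, Tuple

# Illustrative parameters (assumptions, not taken from the talk):
THETA0 = 0.8   # P(connection succeeds | benign host)
THETA1 = 0.2   # P(connection succeeds | scanner)
P_D = 0.99     # desired detection probability
P_F = 0.01     # tolerated false-positive probability

ETA1 = P_D / P_F              # upper threshold: declare scanner
ETA0 = (1 - P_D) / (1 - P_F)  # lower threshold: declare benign


def trw(outcomes: Iterable[bool]) -> Optional[Tuple[str, int]]:
    """Run threshold random walk over one source's connection outcomes.

    `outcomes` yields True for a successful attempt and False for a failure
    (a probe to an unused darknet address counts as a failure). Returns
    ("scanner" | "benign", observations_used) once a threshold is crossed,
    or None while the evidence is still inconclusive.
    """
    ratio = 1.0
    for n, success in enumerate(outcomes, start=1):
        if success:
            ratio *= THETA1 / THETA0
        else:
            ratio *= (1 - THETA1) / (1 - THETA0)
        if ratio >= ETA1:
            return ("scanner", n)
        if ratio <= ETA0:
            return ("benign", n)
    return None
```

With these values every failed probe multiplies the ratio by about 4, so a source that only ever hits unused addresses is flagged after its fourth probe.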
Next, we want to find campaigns of devices working together, so we need to extract some behavioural features of each scan.
First, we estimate the probing rate for each scan from the packets arriving at the darknet. We record the protocol, TCP or UDP. Then the scan type, which can be vertical, horizontal, or a mix of the two. Then the scan trend, which can be IP-sequential, reverse-sequential, or a permutation; for this we used the Mann-Kendall statistical test to check whether the sequence of target IP addresses shows a trend and, if so, whether it is increasing or decreasing. We also use entropy to see how widely the targets are spread: we put the targets into different bins and calculate the entropy over them.
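Two of these features can be sketched directly. The Mann-Kendall statistic S sums the signs of all pairwise differences in the sequence of target addresses; under the no-ties approximation its variance is n(n-1)(2n+5)/18, and the continuity-corrected z-score decides between an increasing trend (IP-sequential), a decreasing trend (reverse-sequential), or no trend (permutation). The entropy feature is computed here by binning targets by their /16 prefix, which is an assumed bin choice; the talk does not say how targets were binned. The function names and the 1.96 cutoff (a 5% two-sided level) are also assumptions.

```python
import ipaddress
import math
from collections import Counter
from typing import Sequence


def mann_kendall_trend(values: Sequence[float], z_crit: float = 1.96) -> str:
    """Classify a sequence as 'increasing', 'decreasing' or 'no trend'.

    Mann-Kendall S statistic with the no-ties variance approximation
    and a continuity correction.
    """
    n = len(values)
    if n < 3:
        return "no trend"
    s = sum((values[j] > values[i]) - (values[j] < values[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    if z > z_crit:
        return "increasing"
    if z < -z_crit:
        return "decreasing"
    return "no trend"


def scan_trend(targets: Sequence[str]) -> str:
    """Map the Mann-Kendall result onto the scan-trend categories."""
    ints = [int(ipaddress.ip_address(t)) for t in targets]
    return {"increasing": "IP-sequential",
            "decreasing": "reverse-sequential"}.get(
        mann_kendall_trend(ints), "permutation")


def target_entropy(targets: Sequence[str], prefix_len: int = 16) -> float:
    """Shannon entropy (bits) of targets grouped into /prefix_len bins."""
    bins = Counter(ipaddress.ip_network(f"{t}/{prefix_len}", strict=False)
                   for t in targets)
    total = len(targets)
    return -sum((c / total) * math.log2(c / total) for c in bins.values())
```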
The next feature is dispersion. For a scan event, we go through a window of packets and find how many of the most-significant bits of the target addresses stay constant; if the 16 MSBs are constant, we set the dispersion to 16. It indicates how narrow or wide the scope of the scan is. Finally, there is the target port, which shows the intention of the scanner.
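The dispersion feature, as described, is the number of most-significant bits shared by every target address in a window. One way to compute it, sketched below under the assumption that targets arrive as IPv4 strings, is to XOR each address against the first one, OR the differences together, and count the leading zero bits. For a /8 telescope every observed target already shares its first 8 bits, so values range from 8 to 32 there. The function name is hypothetical.

```python
import ipaddress
from typing import Sequence


def dispersion(window: Sequence[str]) -> int:
    """Count leading address bits that stay constant across a packet window.

    A value of 16 means all targets share their top 16 bits (one /16),
    i.e. a narrow scan; smaller values mean a wider scope.
    """
    addrs = [int(ipaddress.ip_address(ip)) for ip in window]
    diff = 0
    for a in addrs[1:]:
        diff |= a ^ addrs[0]  # set bits mark positions that vary
    return 32 - diff.bit_length()
```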
Next, we need dimensionality reduction and clustering to find the campaigns, and for this work we used L1-PCA. Principal-component analysis, PCA, is well known for dimensionality reduction, but it is based on minimising the sum of squared errors, so it is sensitive to outliers or faulty data in a dataset.
So we used L1-PCA and built our approach on it, because L1-PCA is remarkably resistant to faulty data and outliers. If some scanners do not belong to any campaign, they can distort the result; with L1-PCA, we reduce this effect.
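The difference between the two criteria shows up on a toy example. Standard (L2) PCA picks the unit direction w maximising Σ(w·x_i)², whereas L1-PCA maximises Σ|w·x_i|, so a few large points weigh less. The sketch below uses Kwak's (2008) fixed-point iteration for the L1 objective with random restarts; this is a swapped-in, generic technique that only reaches a local optimum, not the authors' algorithm, and it assumes the data is already centred. On ten inliers spread along the x-axis plus two outliers at (0, ±10), the L2 direction swings entirely onto the outliers' axis, while the L1 direction, roughly (0.83, 0.55) up to sign, stays tilted toward the inliers.

```python
import numpy as np


def l2_pca_direction(X: np.ndarray) -> np.ndarray:
    """Leading principal direction under the squared-error (L2) criterion."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]


def l1_pca_direction(X: np.ndarray, n_restarts: int = 10,
                     max_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Approximate leading L1-PCA direction: maximise sum_i |w . x_i|.

    Kwak-style fixed-point iteration with random restarts. Rows of X are
    samples, assumed already centred. Returns a local optimum only.
    """
    rng = np.random.default_rng(seed)
    best_w, best_obj = None, -np.inf
    for _ in range(n_restarts):
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            signs = np.sign(X @ w)
            signs[signs == 0] = 1.0  # break ties toward +1
            new_w = signs @ X
            norm = np.linalg.norm(new_w)
            if norm == 0:
                break
            new_w /= norm
            if np.allclose(new_w, w):
                break
            w = new_w
        obj = np.abs(X @ w).sum()
        if obj > best_obj:
            best_w, best_obj = w, obj
    return best_w


if __name__ == "__main__":
    inliers = [(k, 0.0) for k in (1, 2, 3, 4, 5, -1, -2, -3, -4, -5)]
    outliers = [(0.0, 10.0), (0.0, -10.0)]
    X = np.array(inliers + outliers, dtype=float)
    print("L2-PCA:", l2_pca_direction(X))
    print("L1-PCA:", l1_pca_direction(X))
```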
We also proposed an algorithm for the L1-PCA-based dimensionality reduction that keeps it computationally acceptable. We built 10 datasets, because we don't have any ground truth to compare our results against. We mixed several sources: we know about the orchestrated probing campaign by the Sality botnet in 2012, so we took its information and mixed it with features from new 2018 data hitting the same darknet, adding other scan activities as noise.
To see how much our features and methodology are affected, we compared L1-PCA with L2-PCA and showed that L1-PCA is more accurate and more resistant to faulty data and outliers.
We then applied this methodology to real data: ten hours of darknet traffic, more than 1 TB. We detected about 130,000 infected IoT devices actively scanning the internet for new devices, spread across many countries and many ISPs.
But we were more interested in the business sectors hosting these infected devices. We expected to see many of them in internet service providers, because many of these devices belong to consumers. But we also found many infected devices in government, manufacturing, and healthcare networks. That is something we learned from this data.
We also sent automatic emails, after inferring infected devices in real time, to alert the network providers about these infections.
These are the most common manufacturers of infected devices. As we can see, most of them make network devices such as MikroTik routers, IP cameras, or environmental sensors, but we also see other devices, such as time-attendance and parking-related devices.
Using this methodology, we found more than 140 large IoT probing campaigns. Several of them are interesting, such as Campaign 2, which is distributed all around the world and tries to find open telnet services.
In Campaign 6, they are scanning to find cryptominers on the internet. Two campaigns are trying to attack ADB, the Android Debug Bridge, on Android devices, and Campaign 8 scans at a very low rate to try to evade detection. And the main campaign, with more than 50,000 devices, is scanning for telnet ports as a point of entry.
Here is more information about this campaign. We see devices from many manufacturers and of many kinds, hosted in different sectors, including finance and government.
When we explored all of these campaigns, we noticed some other activity: some campaigns were scanning for open resolvers that can be used for amplification attacks. Looking into them, we found that many were scanning for Memcached or CHARGEN, or for several of these open services at the same time.
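A simple way to surface such campaigns, sketched under assumptions, is to check which UDP ports each scanner probes against well-known amplification-prone services. The port list below (CHARGEN 19, DNS 53, NTP 123, SSDP 1900, Memcached 11211) uses standard assignments but is not exhaustive, and the input format, a mapping from scanner IP to the set of UDP destination ports it hit, is hypothetical, as are the function names.

```python
from typing import Dict, Set

# Well-known UDP services often abused for reflection/amplification.
AMPLIFICATION_PORTS = {
    19: "CHARGEN",
    53: "DNS",
    123: "NTP",
    1900: "SSDP",
    11211: "Memcached",
}


def amplification_targets(udp_ports_by_scanner: Dict[str, Set[int]]) -> Dict[str, Set[str]]:
    """Map each scanner to the amplification-prone services it probed.

    Scanners that probed none of the listed ports are omitted.
    """
    result = {}
    for scanner, ports in udp_ports_by_scanner.items():
        services = {AMPLIFICATION_PORTS[p] for p in ports if p in AMPLIFICATION_PORTS}
        if services:
            result[scanner] = services
    return result


def multi_service_scanners(udp_ports_by_scanner: Dict[str, Set[int]],
                           minimum: int = 2) -> Set[str]:
    """Scanners probing several amplification services at the same time."""
    return {s for s, svc in amplification_targets(udp_ports_by_scanner).items()
            if len(svc) >= minimum}
```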
In this work, we provide a methodology for an internet-wide view of IoT maliciousness using a network telescope, or darknet. We treat scanning by these devices as an indicator of compromise; we extract features from their scanning behaviour and cluster them to find campaigns of devices working together to do something harmful.
We report on the large IoT probing campaigns. We detected a campaign with more than 50,000 members; several campaigns try to hide by scanning at a very low rate; and in some campaigns most members belong to a specific vendor.
In the report we also noted many IoT probing campaigns searching for open resolvers to use in future amplification attacks. As future work, we are pursuing automated malware attribution and IoT fingerprinting, so that we can remotely determine the device type, the manufacturer, and which IoT malware a device is infected with. Thank you, everyone.
Host: Thanks. Do we have any questions?
Audience member: I have a question. You showed the features you selected, and then you used L1-PCA and L2-PCA, but have you compared their performance with the original dataset, the one without principal-component analysis?
Morteza: You mean…?
Audience member: Just using those initial six features that you mentioned. Yeah, those.
Morteza: We tried that, but first of all, the data is really big, because we were trying to consider all the devices: more than 130,000 devices, each with its features. So it's a problem from a computational perspective.
Also, when the dimension is higher, this clustering method usually doesn't work very well. So we tried that, and then we decided to do dimensionality reduction, and then…
Audience member: But then, isn't L1-PCA more computationally intensive? Because in your paper you mention the system on which you perform the actual L1-PCA, and it's a pretty powerful system…
Morteza: L1-PCA?
Audience member: Was it worth using that system and spending the time, rather than just using the original features directly? That is my question.
Morteza: Yes. We compared them; we first tried different approaches, including the one you describe. I didn't include the result here, but the results improved when we used L1-PCA or L2-PCA compared with not using them, and L1-PCA is an improvement on L2-PCA.
Audience member: Thanks.
Host: Any other questions? I've got a comment, or maybe a suggestion for future research. Typically, when we talk about IoT security, we're talking about protecting the device itself. But these devices are in constant communication with the vendor's servers, sending telemetry data, exchanging data, getting configurations and software updates, and that's a bit concerning if you think about the security of those servers.
If you have a company that’s selling internet-connected toasters, and you sell one million toasters, if a criminal can break into your company’s servers, that criminal instantly has a one-million-node botnet. And I think it would be useful to do a bit of research on the security of these telemetry servers that are managing these… well, they are vendor botnets, basically. And that might be an idea for the future.
Good. Thanks a lot.