Automatic Classification Of CVE Items And Cyber Security Articles

Tianyi Wang: Hi, I’m Tianyi Wang from the University of Hong Kong. Today I’m presenting my extended abstract on automatic classification of CVE entries and cyber security articles.

Common Vulnerabilities and Exposures (CVE) is a well-known cyber security vulnerability database, often referenced as a standard in the cybersecurity field for both research and commercial purposes. Common Weakness Enumeration (CWE) provides a useful vulnerability taxonomy for CVE entries, serving as a baseline for identifying and classifying cyber weaknesses and vulnerabilities.

To make the most of CVE entries and CWE categories, the National Vulnerability Database, known as NVD, performs analysis on CVE entries that binds CVEs to CWEs. Here is an example showing a CVE entry with the corresponding CWE ID 20, Improper Input Validation. However, CWE categories are assigned entirely by hand. This leaves cybersecurity professionals waiting an unpredictable amount of time for up-to-date information to be published.

Therefore we propose automatic classification models that assign CWE IDs to unlabeled CVE candidates, using CVE entries that have already been labeled with the corresponding CWE IDs. A considerable amount of research has been done on CVE entries. However, to our knowledge, only one published article has attempted CVE classification using CVE entries and their corresponding CWE categories. That study applied Naïve Bayes to CVEs from 1999 to 2016, classified vulnerability types for the top 10 CWE categories by frequency, and achieved 75.5% accuracy.

Because of data imbalance, we also build our automatic classification models on CVE entries from the top 10 CWE categories by frequency. We propose three different deep neural network models and compare their performance.
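The restriction to the top 10 CWE categories by frequency can be sketched as follows. This is only an illustration, assuming the labeled data is available as (description, CWE ID) pairs; the function name `keep_top_k_classes` and the toy records are hypothetical and do not come from the talk.

```python
from collections import Counter


def keep_top_k_classes(records, k=10):
    """Keep only records whose CWE label is among the k most frequent labels.

    records: iterable of (description, cwe_id) pairs (hypothetical format).
    Returns the filtered records with integer class indices, plus the mapping
    from CWE ID to class index.
    """
    records = list(records)
    counts = Counter(label for _, label in records)
    top_labels = [label for label, _ in counts.most_common(k)]
    label_to_index = {label: i for i, label in enumerate(top_labels)}
    filtered = [
        (text, label_to_index[label])
        for text, label in records
        if label in label_to_index
    ]
    return filtered, label_to_index


# Toy records, invented for illustration only (not real CVE descriptions).
records = [
    ("Improper input validation in parser", "CWE-20"),
    ("Buffer overflow in image decoder", "CWE-119"),
    ("Input not validated in form handler", "CWE-20"),
    ("Cross-site scripting in search page", "CWE-79"),
    ("Stack overflow in packet handler", "CWE-119"),
    ("Missing validation of length field", "CWE-20"),
]

# k=2 here so the toy data shows a rare class being dropped; the talk uses k=10.
filtered, label_to_index = keep_top_k_classes(records, k=2)
```

With the toy data, the single CWE-79 record is dropped because its class is not among the two most frequent.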

The first model we propose is a bidirectional LSTM. We applied an embedding layer, followed by the LSTM, a dropout layer, a fully connected layer, and softmax. The output has a dimension of 10, which corresponds to the 10 CWE categories. After training for a hundred epochs with a learning rate of 5 × 10^-5, we obtained an accuracy of 75.85%, which is slightly higher than the existing work. Here’s the loss curve of our model, and we can see that it converges.
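A minimal PyTorch sketch of the layer stack described above (embedding, bidirectional LSTM, dropout, fully connected layer, softmax over 10 classes). The talk does not name a framework; the vocabulary size, embedding and hidden dimensions, dropout rate, and the Adam optimizer are all assumptions made for illustration. Only the 10-way output and the 5e-5 learning rate come from the talk.

```python
import torch
from torch import nn


class BiLSTMClassifier(nn.Module):
    """Embedding -> bidirectional LSTM -> dropout -> linear (10 classes)."""

    def __init__(self, vocab_size=5000, embed_dim=128, hidden_dim=128,
                 num_classes=10, dropout=0.5):
        super().__init__()
        # All sizes except num_classes are assumed values.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)            # (batch, seq, embed)
        _, (h_n, _) = self.lstm(embedded)               # h_n: (2, batch, hidden)
        # Concatenate the final forward and backward hidden states.
        features = torch.cat([h_n[0], h_n[1]], dim=1)   # (batch, 2 * hidden)
        return self.fc(self.dropout(features))          # unnormalized logits


model = BiLSTMClassifier()
# Learning rate from the talk; the choice of Adam is an assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
# CrossEntropyLoss applies log-softmax internally, so the model returns logits.
loss_fn = nn.CrossEntropyLoss()

# Random token IDs standing in for tokenized CVE descriptions.
batch = torch.randint(1, 5000, (4, 50))
logits = model(batch)
probs = torch.softmax(logits, dim=1)  # per-class probabilities
```

The model returns logits and applies softmax only when probabilities are needed, which is the usual pairing with `nn.CrossEntropyLoss` during training.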

The next model we propose is a CNN. We did the same thing, adding an embedding layer, a dropout layer, a fully connected layer, and softmax. More specifically, for the CNN part we used three kernel layers, with kernel sizes of 6×6, 5×5, and 4×4, respectively. After training the CNN with the same hyperparameters, we got an accuracy of 82.45%, which is much higher. Here’s the loss curve, which also converges.
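One plausible reading of the CNN description is sketched below in PyTorch. The talk does not say how the three kernels are arranged, so this sketch assumes three parallel one-dimensional convolutions over the token sequence with widths 6, 5, and 4, each followed by max-over-time pooling. That arrangement, along with the vocabulary size, embedding dimension, filter count, and dropout rate, is an assumption rather than the speaker's confirmed architecture.

```python
import torch
from torch import nn


class TextCNN(nn.Module):
    """Embedding -> parallel 1D convolutions -> dropout -> linear (10 classes)."""

    def __init__(self, vocab_size=5000, embed_dim=128, num_filters=64,
                 kernel_sizes=(6, 5, 4), num_classes=10, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One convolution per kernel width (assumed parallel branches).
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes]
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):
        # Conv1d expects (batch, channels, seq), so move the embedding axis.
        x = self.embedding(token_ids).transpose(1, 2)
        # Max-over-time pooling gives one value per filter per branch.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = torch.cat(pooled, dim=1)     # (batch, 3 * num_filters)
        return self.fc(self.dropout(features))  # unnormalized logits


cnn = TextCNN()
# Random token IDs standing in for tokenized CVE descriptions.
cnn_batch = torch.randint(1, 5000, (4, 50))
cnn_logits = cnn(cnn_batch)
```

Input sequences must be at least as long as the widest kernel (6 tokens here), so shorter descriptions would need padding.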

The third one is Bidirectional Encoder Representations from Transformers, known as BERT. It is a neural-network-based technique for natural language processing pre-training. We trained for four epochs, as suggested by the original paper, and got an accuracy of 83.73%, which is the highest. As a reminder, the existing Naïve Bayes work reported 75.5% accuracy. As a result, the novel BERT-based model outperforms the LSTM, the CNN, and the existing Naïve Bayes work. This points to a potential direction for improvement, namely improving the fine-tuned BERT. The well-trained classification models can be further improved and adjusted to apply to real-life threat-intelligence articles and reports, for indexing convenience and performance evaluation in the future. That’s it. Thank you.

Kacper Gradon: Ladies and gentlemen, my name is Kacper Gradon, and I will introduce the notion of the so-called future crimes in relation to the practices of the law enforcement and intelligence services. 

For the purpose of my work, I’ll call the internet and related modern technologies the new battlefield, where two sides of the struggle, the criminals on the one side and law enforcement on the other, fight and compete for domination.

The reason for the use or abuse of technology is quite simple. It’s a multi-purpose tool that is easy to use and inexpensive, and it enables maintaining a high degree of anonymity while reaching global goals. Additionally, due to generational changes, the so-called digital natives, new generations of offenders highly skilled with modern technologies, will be entering the criminal market in the near future. We can see a sudden shift in trends from the traditional notion of cyber crime to abusing technology with effects in real-life circumstances, from the preparation stage until the completion of the crime. The so-called future crimes deal with all aspects of criminal behavior, from “regular” crime to organized crime, terrorism, and violent extremism, up to cyber warfare.

One of the things to look at is open-source-intelligence-based attacks, where victim selection and monitoring, or target selection, is done online using various sources of information. One of the interesting new developments is the abuse of wearable technology to profile victims. One interesting case is the recent abuse of Strava data, which was used to disclose the location of American military bases abroad.

We have also seen recently that so-called pet wearables are used to track the human victims or their whereabouts. Generally speaking, social media information is most frequently used by criminals in cases like child grooming, child pornography, and human trafficking, and also to groom and profile potential violent extremists. The abuse of 3D printing is a quite well-known problem, but due to the decreasing price of 3D printing technologies, we can expect such crimes to be on the rise in the near future. The same goes for attacks on the Internet of Things, either on specific devices owned by specific people or on whole networks of IoT-enabled devices, including medical devices, which pose quite a substantial threat nowadays.

In our analysis we also covered the potential near-future abuse of autonomous or semi-autonomous vehicles used as weapons, roadblocks, or even kidnapping devices, as well as the use of various types of drones for surveillance, as weapons, for the delivery of drugs, or for the disruption of air traffic control systems.

Military robots might seem to be a distant future, but the technology is getting cheaper so we might brace ourselves for potential abuse of such technology in the future. 

The next notion is cyber warfare, with specific stress put on disinformation and misinformation used by rogue governments to attack democracies, as in the cases of meddling in the US presidential elections and Brexit, or the whole armies of trolls, either humans or bots, that are used to destroy the social fabric of societies. Artificial intelligence is a whole new chapter, and the same goes for blockchain and cryptocurrency abuse by criminals worldwide for various reasons and objectives.

New horizons that we can expect in the criminal market will be the abuse of emerging technologies, such as quantum computing, and of technologies that are now entering the market, such as 5G telecom networks, which will provide increased capabilities for criminal offenders or terrorists.

So to summarize, those technologies make up the so-called “sum of all fears”, something that we need to get ready for. If you have any questions, please reach me by email. Thank you very much. And goodbye.
