Katrin Franke discusses her research at DFRWS EU 2018.
Katrin: What I’d like to share with you is computational forensics. This is the field I have been working in officially, under this title, for more than 10 years, and unofficially, before the term was even born, for more than 20 years. It means using machine learning, pattern recognition, and computational intelligence to advance the forensic sciences: first physical evidence, such as documents, handwriting, banknotes, and passports, and for the last ten years, digital evidence.
An alternative title would be ‘Artificial Intelligence in Digital Forensics: What is behind the buzz?’ I say this a little provocatively, because there is currently so much talk about artificial intelligence out there that I sometimes get really [worried] about what is promised and what expectations are raised. In my presentation, I would like to take the chance to give you a somewhat deeper insight. I assume that 50% of you are already working with this, but I also assume that the other 50% are a little scared of what is coming next.
A keynote is usually supposed to teach us something and show us something exciting, so I hope I push all the buttons.
This is Norway, and this is NTNU. NTNU is a technology university; 33% of it is natural sciences and technology, which is quite unusual for a university. This is the number of students. If you consider, however, that Norway is a highly industrialized country, benefitting mainly from oil and gas and the shipping industry, and providing a lot of manufacturing (the German automotive industry, for instance, buys from Norway to a great extent), then you may understand why Norway has so much education and research in technology. And of course, given the distances in Norway, digitalization, automation, and communication technology have a long tradition.
What are we doing? Digitalization is also a buzzword; everybody talks about it. There are great forecasts and promises about its impact: a 10% increase in digitalization, for instance, is forecast to lead to 0.75% growth in gross domestic product for Norway. At the same time, there are enormous losses due to cyber incidents; for Norway, the reported numbers alone are approximately the same size. So we have to do something. This is the mission we have been given: supporting digitalization and making it more secure, in critical infrastructures, in health, and in finance, and also protecting national security and defense.
We are affiliated with the Department of Communication Technology and Information Security, which has 80 full-time employees and about 131 people in total. Compared with other universities that have a digital forensics or computer science department, few reach this size, and we cover communication technology and information security only.
We support two master’s programs and two PhD programs, and 45% of the department’s budget comes from external funding. We are also active in research communities such as ESORICS, a well-known information security conference, which was held in Oslo last year.
We realized that it is not enough to stay among academics, so we established close collaboration with the public and private sectors and founded the Center for Cyber and Information Security, which now has 26 partners who all contribute financially to research and education in the center.
Thematically, our research is organized into groups, typical academic groups. But in our group, for instance, there are also cops, or rather special investigators from the high-tech crime unit or the economic crime unit in Norway, as active members.
The model for this, and the credit, goes to 2Centre, originally established between [UCD] in Ireland and France. We wanted to become a member, but this was not possible since Norway is only an associated country. We liked the model and supported it, and when we got the no-go from the European Commission, the police commissioner of Norway said, “If Europe does not pay us, we do it ourselves.” He funded three professorships over ten years, which was the kick-off for running the digital forensics group at this scale and with the resources we have today.
As you see, the topics are traditional digital forensics topics, but already in 2013 the strategic decision was taken to invest in machine learning and computational intelligence to support the forensic sciences.
The group today has about 26 members. We still have positions to fill, and we are quite active in different domains, which I will elaborate on later. Our focus is on technology: digital forensics from mobile and embedded devices up to computational algorithms and algorithmic [liability].
Funding, of course, is important. I would particularly like to point out the joint educational program with the Police University College, which was developed in collaboration with, or inspired by, [UCD] and educates police officers as part of their primary education.
This is the team, and I’d like to point out two faces you should recognize if you attended yesterday: in the last row are Jens-Petter and Gunnar, who both gave presentations on embedded devices yesterday.
In education, we cover the full range, from bachelor to master and PhD level, and, as I said, in collaboration with the Police Academy also lower levels for retraining police personnel to deal with digital evidence.
Teaching is important; research is fun. Our research agenda, to warm up: computational forensics, which I will elaborate on in detail today; cloud forensics; cybercrime investigation, where two of our team members will go to the Digital Forensic Research Workshop in [the US] to present their work; economic crime investigation, of course, because crime happens where the money is; and, more and more, mobile and embedded devices, like the work from Jens-Petter and Gunnar I just mentioned.
We are technologists, but of course we also collaborate closely with people with a legal background and in information security management. For example, we work closely [with Aprisio and Marie-Angela] in [ESSENTIAL], a [09:23] training network that educates technologists and legal personnel.
From a technical perspective, we meet large-scale investigations more and more. We have increasingly large amounts of data that need to be analyzed. The data are widely distributed and need to be enriched from open sources on the internet or the darknet. In an investigation, there may be mobile devices, computers, whatever can be found, or, in the case of economic crime, also traditional paper documents.
Human capacity alone is simply no longer sufficient to deal with these massive amounts of data; we have to bring in computing support. And everybody who works, as I usually say, in the basement knows that tight operational deadlines are always there. Cases need to be pushed out quickly, so any machine support is highly appreciated. The question is how we can apply those methods in the most sensible way without manipulating or falsifying evidence.
I would like to remind you that when we talk about large-scale investigations, we differentiate between two fundamental types. One type is found, for instance, in the regional police districts, where we have child exploitation, drug abuse, robbery, murder, and so on. Usually, each case has a comparably small amount of data to be analyzed, but the number of cases and the cross-analysis between cases add up to an ever-increasing amount that needs to be handled. On the other hand, there are large-scale investigations as they usually occur in economic crime, where a single case ends up with terabytes of data and usually takes at least one year, often more.
Public examples of this are, for instance, the Enron email corpus, which I think everybody in the room knows: 160 GB became publicly available after the investigation of the company, amounting to 1.7 million email messages of original, real-world email communication within the company. Another very well-known example is the Panama Papers, which were in the press everywhere: 11.5 million documents, amounting to 2.6 TB of data. Just analyzing this data took a huge human effort: 376 journalists worked on it, and I don’t think they went through all of it.
I think no further motivation is needed that something must be done and that we cannot do it manually any longer. For comparison, this shows previous leaks, WikiLeaks for example, versus the Panama Papers. I was happy to find some international case studies on how case sizes can be specified. Normal cases have, for example, around 100,000 documents; large cases, I am told, 1 million; very large cases, 100 million. And they call cases with more than 100 million documents [laughs] “ridiculous” cases, indeed. Ouch. What do we do with this?
And Økokrim was kind enough to give us some real numbers. Their largest case currently is 20 times larger than the Panama Papers: around 220 million documents, amounting to 52 TB of data. How can we handle this? And how can we handle it correctly?
And remember, it’s not only economic crime that produces so much data. In the Internet of Everything, as we say today, we have electronic and digital data in all types of systems: traditional [SCADA] systems, industrial systems… Økokrim, for example, is also responsible for environmental crime and investigates the fish farms in Norway; the [salmon] you eat in Italy most likely comes from Norway. There is huge insurance fraud there, for instance, attempts to cash in insurance money for fish that was supposedly lost, while, surprisingly, the feeding machines kept working. These are proprietary systems that produce data that needs to be analyzed and, for instance, cross-correlated with purchases of fish food and the corresponding invoices.
Computational forensics: how can we tackle this? First of all, and this is now the teacher in me, I would like to make a clear statement about how I define the difference between forensics and criminology. The forensic sciences are based on the natural sciences; they apply methods from the different disciplines to identify artefacts, analyze them, and provide evidence to reconstruct what has happened. In the forensic sciences, we do not develop theories about crimes. It is purely fact-based.
Criminology, on the other hand, studies crime as a social phenomenon, with a background in sociology and the social sciences. It develops hypotheses and theories about what causes crime, how it can be prevented, how it spreads, and so on.
In everything we do, we stay fact-based, grounded in the natural sciences: forensic sciences.
That was the wrong button.
I think everybody in this room knows what forensic science is, but I have highlighted some words: with the objective to investigate and reconstruct, we need to collect and analyze trace evidence. And then we identify, classify, quantify, and individualize.
Someone who has heard about machine learning and pattern recognition may say, “Of course, this is what we do.” When I saw this for the first time, I said, “Oh, perfect. I’m from machine learning and pattern recognition. This is what we have the algorithms for.”
The question is: is this what forensic experts usually do? Can we automate it, or at least semi-automate it?
What are the challenges? Of course: tiny pieces of evidence and chaotic environments; we have to deal with incompleteness; we are looking for abnormalities and specific properties. All of this is what normal, mainstream pattern recognition and machine learning does not do. Take book recommendations, for instance: people who read this book also read that other book. This is not what we are interested in, right? We are looking for the one person who has very strange interests, for instance in reading books, and we are not interested in all the others who do the same thing. And while dealing with these forensic particularities, as I call them, we have to be objective, robust, and reproducible: Monday’s decision must be the same as Friday’s, also for our algorithms.
Our motivation in computational forensics is to assist basic and applied research, to establish and prove a scientific basis for the forensic disciplines. Some of you may hear a bell ringing: yes, we work towards supporting the fulfillment of the [Daubert] criteria. And then, of course, if we have a scientific basis, we can assist forensic experts in their daily casework. We do not want to replace forensic experts; we would like cooperation between human and machine.
Computational methods are out there, long studied in the computing sciences, but how can we bring them efficiently and reliably into the forensic domain?
We define computational forensics as gaining an in-depth understanding of a forensic discipline, evaluating its methodological basis, and providing a systematic approach. In general, we apply modelling, simulation, analysis, and recognition.
To differentiate, please remember: computational forensics can deal with any type of evidence and can support the traditional disciplines, such as chemistry and biology, whereas computer forensics deals with digital evidence. And here and now, we want to apply computational methods to digital evidence.
And now you’ll tell me, “Katrin, don’t you know there are all these cool tools out there already? Why do you bother?” Well, yes, there are a lot of tools out there, and a lot of companies developing new software to support forensic casework. Do you know what’s in the box? Do you know when it works and when it does not? Can you explain it to the judge and the jury in a court of law? My point is: if we want to apply computational methods more and more, we truly need to understand them. There need to be forensic experts who know the insides and who can challenge them, because a computer may be a new witness in the courtroom and may be challenged like any other expert witness. For this, we need to understand what’s under the [hood].
This list from [Gartner] does not mention three tools that I’d like to mention here: [Squell], Hansken, and [Palantir]. Those are currently the hot tools. But I’m not going to talk about tools; my focus is on algorithms. And for everybody who has not been working with algorithms, let’s look at how machines learn and how machines identify: machine learning and pattern recognition 101.
First, machine learning and pattern recognition are very old disciplines, developed independently in engineering and computer science. Engineers tried to define patterns, for instance for quality control in production processes.
Computer scientists tried to create machines that can learn and behave like humans. Over the last 20 years, both disciplines merged, and around 10 years ago, businesses picked them up and coined the term ‘predictive analytics’. More or less… you want to challenge me? [laughs] Maybe even earlier.
The underlying technologies are largely similar; the purposes differ a little. The primary goal is to perform classification, either learning from examples or without examples, such that patterns show small intra-class variation and large inter-class variation.
[Simple methods], and this is really 101: in supervised learning, at the time of system design, we need to define and describe the classes that are later used to, for instance, match pattern characteristics. Those classes are mathematical models, formulas, or similar. In unsupervised learning, we just look at the characteristics, and I usually compare it to doing a puzzle: you start by finding the corner pieces and the edge pieces, and then you go about [25:24]. Unsupervised learning works in a similar way.
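To make the supervised versus unsupervised contrast concrete, here is a minimal sketch, assuming patterns have already been described as small numeric feature vectors. A nearest-centroid classifier stands in for supervised learning (classes fixed in advance by labelled examples) and k-means for unsupervised learning (groups emerge from the data). Both are swapped-in textbook techniques, not methods from the talk, and the class names and values are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def nearest(point, centres):
    """Index of the centre closest to `point` (Euclidean distance)."""
    return min(range(len(centres)), key=lambda k: math.dist(point, centres[k]))

# Supervised: the classes are fixed at design time by labelled examples.
def train_nearest_centroid(labelled):
    """`labelled` maps a class name to a list of feature vectors."""
    names = sorted(labelled)
    return names, [centroid(labelled[name]) for name in names]

def classify(point, model):
    names, centres = model
    return names[nearest(point, centres)]

# Unsupervised: no labels; groups emerge from the data (k-means).
def kmeans(points, k, iterations=20):
    # Deterministic farthest-first initialisation, so every run agrees.
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centres)].append(p)
        # Keep the previous centre if a group happens to be empty.
        centres = [centroid(g) if g else centres[i] for i, g in enumerate(groups)]
    return centres

samples = {"benign": [(0.1, 0.2), (0.2, 0.1)], "malicious": [(0.9, 0.8), (0.8, 0.9)]}
print(classify((0.85, 0.7), train_nearest_centroid(samples)))  # malicious
```

The k-means initialisation here is deliberately deterministic rather than random, so that repeated runs on the same data produce the same groups.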
Machines learn from examples, so any machine is more or less as good as its training: either a human operator gave it a description of the patterns as structured rules, or patterns were presented and the machine adapted its own parameters. I usually compare this to an old radio receiver, where you turned the knobs to tune in to the carrier wave. The idea is approximately the same: we tune parameters to capture patterns that share the same characteristics.
Machine learning is a well-defined task: it has an objective, it learns from experience, and there is a performance measure of how well the task is solved. It is usually done in iterations; the patterns need to be presented several times to tune the parameters.
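The loop of presenting patterns repeatedly, tuning parameters, and measuring performance can be sketched with a perceptron, one of the oldest learning algorithms. This is a minimal illustration assuming two classes labelled -1 and +1 and a hypothetical toy dataset; the training-set accuracy recorded after each pass (epoch) plays the role of the performance measure.

```python
def predict(weights, bias, x):
    """Linear threshold unit: +1 if w.x + b > 0, else -1."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1

def accuracy(weights, bias, data):
    """Performance measure: fraction of examples classified correctly."""
    return sum(predict(weights, bias, x) == y for x, y in data) / len(data)

def train_perceptron(data, epochs=100, rate=0.1):
    """data: list of (feature_tuple, label) with label in {-1, +1}."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    history = []
    for _ in range(epochs):  # each epoch presents every example once
        for x, y in data:
            if predict(weights, bias, x) != y:  # only "turn the knobs" on mistakes
                weights = [w + rate * y * xi for w, xi in zip(weights, x)]
                bias += rate * y
        history.append(accuracy(weights, bias, data))
        if history[-1] == 1.0:  # stop once the training set is solved
            break
    return weights, bias, history

toy = [((0.1, 0.2), -1), ((0.2, 0.1), -1), ((0.9, 0.8), 1), ((0.8, 0.9), 1)]
w, b, hist = train_perceptron(toy)
print(hist[-1])  # 1.0
```

The length of `history` shows how many presentations of the data were needed; a harder problem with more features would typically need more.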
For machines to learn, we need to represent the real world in a machine-understandable way, and this is where the challenge kicks in. We need to describe our phenomenon with, for example, numerical parameters.
And now the question is: you, the domain experts, how do you describe malware behavior in numerical terms, so that a machine can learn the malicious behavior of malware?
The good news is: even with these intelligent machines and artificial intelligence out there, domain experts, forensic experts, and malware experts are still urgently needed to describe those phenomena. Don’t answer me now with “but now we have deep learning, we don’t need you anymore.” I will tell you we still need them. In particular, if you need to go to the judge, you cannot bring your deep neural network with you. How do you explain it?
Forensic expertise needs to be incorporated so that we can make machines learn. And then the challenge is: how many parameters do I need to describe malware behavior? How many parameters do I need to describe network packets, for instance, for intrusion detection systems? And here is a simple math task. Do you know the solution to this problem?
Numerical parameters help us describe the phenomenon, and the more numerical parameters we have, the more classes we can differentiate. For example, with only one or two parameters, you can differentiate only very few classes; you cannot differentiate ten malware families. You need a little more. So the crux of successful machine learning is describing the phenomenon with parameters. How many? The answer: it depends. The right ones. But what are the right ones? That depends on the task. So my take-home message is: everybody can solve these little toy examples manually, and then think about how to turn your forensic problem into numerical parameters that can be presented to a machine learning algorithm.
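As a minimal sketch of turning an artefact into numerical parameters, the function below maps raw bytes to four numbers. The choice of features (size, Shannon entropy, share of printable ASCII, share of null bytes) is purely an assumption for illustration; by the talk’s own argument, whether these are the right features depends entirely on the task.

```python
import math
from collections import Counter

def byte_features(data: bytes) -> dict:
    """Map raw bytes to a few numeric parameters (an illustrative choice only)."""
    n = len(data)
    if n == 0:
        return {"size": 0, "entropy": 0.0, "printable_ratio": 0.0, "null_ratio": 0.0}
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    # Shannon entropy in bits per byte, between 0.0 and 8.0
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    printable = sum(c for b, c in counts.items() if 32 <= b < 127)
    return {
        "size": n,
        "entropy": entropy,
        "printable_ratio": printable / n,
        "null_ratio": counts.get(0, 0) / n,
    }

print(byte_features(b"MZ\x90\x00\x03\x00\x00\x00"))
```

Four features like these could never separate ten malware families on their own; the sketch only shows the mechanics of the mapping from artefact to numbers.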
How often do I need to present my examples for the machine to remember them? If you have [30:38], it’s like with children: we start easy. For a small problem, we may need to present the examples only 100 times. For a big, more complex problem with 60, 100, or 1,000 features, maybe 10,000 times. Performance measures help us determine this. What you may slowly recognize is that it’s not just about buying the tool; it’s about understanding what is behind it, in order to assess whether your machine is able to learn at all.
This you know: identification versus verification. Identification is like the traditional SQL query in a database: give me the pattern, or the records, with the following attributes. But what do we do if there is no SQL, or if the parameters are more dynamic? The other one is verification. I usually take the example of signature verification, a typical problem in forensic signature analysis, where a forensic expert needs to infer whether two signatures were written by the same person, that is, whether a signature is genuine or forged. How do we teach this to a machine? Through the parameters and then through the comparison. How much tolerance do we allow?
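The difference between identification (1:N, search all enrolled references) and verification (1:1, compare against one claimed reference with a tolerance) can be sketched as follows, assuming each signature has already been reduced to a hypothetical numeric feature vector and that Euclidean distance is an acceptable comparison. The tolerance value is an assumption that would have to be established empirically.

```python
import math

def identify(query, database):
    """1:N identification: return the enrolled ID whose reference vector
    is closest to the query (note: it always returns *someone*)."""
    return min(database, key=lambda ident: math.dist(query, database[ident]))

def verify(query, reference, tolerance):
    """1:1 verification: accept the claimed identity only if the query lies
    within `tolerance` of that single reference."""
    return math.dist(query, reference) <= tolerance

# Hypothetical reference features for two enrolled writers.
references = {"writer_a": (1.0, 2.0), "writer_b": (5.0, 5.0)}
questioned = (1.2, 2.1)
print(identify(questioned, references))                           # writer_a
print(verify(questioned, references["writer_b"], tolerance=0.5))  # False
```

`identify` returns a name even for a writer who is not in the database at all; in practice a distance threshold is also needed so that “none of the above” is a possible answer.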
Altogether, and this is 101 again, pattern recognition is a process that allows us to systematically describe the different steps. This is the same as in the forensic process, where you have acquisition, pre-processing, analysis, linkage…; similar steps exist in machine learning. So there is a nice analogy that lets you move between the disciplines, back and forth.
The typical and easiest examples relate to template matching. If you have a blueprint of an unknown [chip], and there are other blueprints in your records, you just try to match the wiring or the [bondings]. That is template matching, the first approach. Structural pattern recognition is where you describe, for instance, the links between individuals or between phone-call connections, and so on. The last one, statistical pattern recognition, is where you count the frequency of particular properties and use adaptive mechanisms, like on our radio, to adapt parameters that separate the classes, purely based on the statistical properties of the parameters.
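Template matching, the first of the three approaches, can be sketched in one dimension: slide a known template over a signal and score each position by the sum of absolute differences. Comparing blueprints would work in two dimensions, so this is a simplified, hypothetical illustration of the principle only.

```python
def best_match(signal, template):
    """Slide `template` over `signal` and return (offset, score) for the
    position with the lowest sum of absolute differences (0 = exact match)."""
    if len(template) > len(signal):
        raise ValueError("template is longer than the signal")
    best = None
    for offset in range(len(signal) - len(template) + 1):
        window = signal[offset:offset + len(template)]
        score = sum(abs(a - b) for a, b in zip(window, template))
        if best is None or score < best[1]:  # keep the first best position
            best = (offset, score)
    return best

print(best_match([0, 1, 4, 6, 3, 0], [3, 7, 3]))  # (2, 2)
```

A score of 0 would mean an exact match; the non-zero score here shows the template was found only approximately, which is where a tolerance decision comes in again.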
And then it goes deep. For statistical pattern recognition alone, there is a nice survey paper from 2000 that is a classic in the domain. It lists nine different methods for feature projection. Many of you may have heard of principal component analysis; well, this is just one of them, linear discriminant analysis is another, and there are seven others.
Feature selection methods: seven in the literature. Feature selection is still an active area of research; new methods keep appearing that are, for instance, better optimized and no longer heuristic. There is something happening.
Then there are learning algorithms, classification methods, and the fusion of classifiers: a whole world by itself. The tools you buy may have implemented something easy for you, but can you trust that they use the right methods? We hope so.
What I’d also like to point out is that all the methods of statistical pattern recognition were originally developed for rather simple tasks: 20 years ago, our computers were not so powerful. Maybe IBM mainframes, but who had them or had access to them? Now we have ever-increasing computing power and the opportunity to use advanced computational and data-driven methods, because we permanently produce data. With this, the types of algorithms need to change as well. And what you are experiencing and hearing about everywhere now, deep neural networks, is exactly one effect of this.
Another example I’d like to mention, because it’s so popular in the forensic community: everybody likes regular expressions, everybody uses them, and sometimes I think, “Oh my god, are they able to encode all the variations they want to encode?” Wouldn’t it be nice to have a machine do it for us? And yes, it’s possible.
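One classic way to let a machine derive a pattern from examples, instead of hand-writing a regular expression, is grammatical inference of a finite-state machine. Below is a minimal sketch of its usual starting point, the prefix tree acceptor: a deterministic automaton that accepts exactly the positive examples. Algorithms such as RPNI then generalise by merging states, guided by negative examples; that step is omitted here, and the example strings are hypothetical.

```python
def build_pta(examples):
    """Build a prefix tree acceptor: a deterministic finite automaton that
    accepts exactly the given example strings."""
    transitions = {}   # (state, symbol) -> next state
    accepting = set()
    next_state = 1     # state 0 is the start state
    for word in examples:
        state = 0
        for symbol in word:
            if (state, symbol) not in transitions:
                transitions[(state, symbol)] = next_state
                next_state += 1
            state = transitions[(state, symbol)]
        accepting.add(state)
    return transitions, accepting

def accepts(automaton, word):
    """Run the automaton; reject on a missing transition or a non-accepting end."""
    transitions, accepting = automaton
    state = 0
    for symbol in word:
        if (state, symbol) not in transitions:
            return False
        state = transitions[(state, symbol)]
    return state in accepting

automaton = build_pta(["ab12", "ab13", "ac"])
print(accepts(automaton, "ab13"), accepts(automaton, "ab14"))  # True False
```

On its own the prefix tree generalises nothing; its value is as a precise, inspectable structure that a state-merging algorithm can then generalise in a controlled way.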
[State] machines are very much able… [37:00] state machines are able to support us. We just need to get our hands dirty and work on it. And I say forensic experts don’t need to develop those algorithms themselves. But it’s very fruitful if forensic experts work hand in hand with computer scientists to tune those new algorithms so that they are practical and applicable in the forensic discipline.
Theoretical foundations: something you usually don’t hear about with commercial products. The Ugly Duckling theorem? Who can guess when it was formulated? 1969. The Ugly Duckling theorem is one of the fundamental theorems in machine learning and pattern recognition, and it says that as long as I have no task at hand and no prior knowledge, all attributes, or all features, that describe a pattern are equally good.
To say it the other way around: it doesn’t matter how many features I put into the pot for my classification; I need to select features that support my classification task. The example I give in class is: if I try to differentiate male and female, there is no point in counting arms, fingers, legs, and so on. It is important to get the right features. Now translate this: what does it mean for your malware? For intrusion detection? For image analysis on your hard drives? Which features are the ones that support the task? Ugly Duckling theorem. And any company that does not reveal the features it uses to solve the task should be forbidden to sell its products.
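The counting argument behind the Ugly Duckling theorem can be checked by brute force. Treat every possible subset of n objects as one binary feature (a predicate that is true exactly for the objects in that subset); then every pair of distinct objects shares exactly 2^(n-2) of these predicates, so without task-specific weighting no pair is more similar than any other. This is a sketch of the textbook argument, not code from the talk.

```python
from itertools import combinations

def shared_predicates(n):
    """For every pair of distinct objects among n, count how many of the
    2**n possible predicates (subsets of objects) are true for both."""
    counts = {}
    for i, j in combinations(range(n), 2):
        counts[(i, j)] = sum(
            1 for mask in range(2 ** n)           # each mask is one predicate
            if mask >> i & 1 and mask >> j & 1    # true for both objects i and j
        )
    return counts

print(set(shared_predicates(4).values()))  # {4}
```

Every pair gets the same count, which is exactly why a task, and domain knowledge about which features matter, has to come from outside the data.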
The next one is the No Free Lunch theorem. There is no such thing as a perfect algorithm. However long we do research, we will not find an algorithm that solves all pattern-matching problems; this was proven mathematically by Wolpert and Macready in ’97. For each algorithm, we can construct a problem on which it fails. Now, translate this to forensics.
There is no guarantee that nothing slips through if you use machines in an investigation. Your features and your classifier may turn out to be unable to solve the particular task for the particular case you have. If people get tired of hearing about [40:41]: you need to test, it shall be peer reviewed, [there need to be standardized datasets]. All this becomes even more important when you introduce computing technology into the forensic sciences, because machine learning and pattern recognition methods are developed and tuned to fulfil a particular task, under particular conditions only. We have to assess under which conditions these algorithms work.
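A toy version of the No Free Lunch result for supervised learning (its off-training-set form) can be verified by enumeration: over a four-point domain, train on two points, test on the other two, and average over all 16 possible labelings. Any deterministic learner, here a majority-vote learner and its deliberate opposite, both hypothetical, then scores exactly 50% on average. The domain size and learners are assumptions chosen to keep the enumeration small.

```python
from itertools import product

DOMAIN = [0, 1, 2, 3]
TRAIN, TEST = [0, 1], [2, 3]   # the off-training-set points are 2 and 3

def majority_learner(train):
    """Predict the majority training label everywhere (ties -> 1)."""
    ones = sum(label for _, label in train)
    guess = 1 if ones * 2 >= len(train) else 0
    return lambda x: guess

def anti_majority_learner(train):
    """Deliberately predict the opposite of the majority learner."""
    model = majority_learner(train)
    return lambda x: 1 - model(x)

def average_ots_accuracy(learner):
    """Average off-training-set accuracy over ALL 2**4 possible targets."""
    targets = list(product([0, 1], repeat=len(DOMAIN)))
    total = 0.0
    for target in targets:                      # target[x] is the true label of x
        model = learner([(x, target[x]) for x in TRAIN])
        total += sum(model(x) == target[x] for x in TEST) / len(TEST)
    return total / len(targets)

print(average_ots_accuracy(majority_learner), average_ots_accuracy(anti_majority_learner))  # 0.5 0.5
```

A learner only beats chance when the problems it meets are not uniformly distributed, which is why the conditions under which an algorithm works must be tested rather than assumed.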
So, if you remember only two things from my presentation – Ugly Duckling and No Free Lunch, please.
Data science can be complicated. Data-driven approaches include computational intelligence, soft computing, evolutionary algorithms, neural networks, and fuzzy logic. Many more algorithms were developed in the ’80s, ’90s, and 2000s to deal with the increasing amount of data, to learn from data and adapt to it, and to be softer in their decisions than a hard threshold as in statistical pattern recognition. These algorithms are inspired by nature, but by no means try to simulate human nature, such as reasoning or decision-making. That’s why you will only hear ‘computational intelligence’ from me, never ‘artificial intelligence’.
What is the difference between hard computing and soft computing? In a decision tree, which everybody can easily follow, you have [a crisp path to the leaf], where you say, “This is this type of pattern.” With fuzzy logic, you have fine-tuning: a membership function, so a pattern belongs to one class only to some degree, and a little bit to the other. With this, we can be smoother at the edges and smoother in our decision-making. More [fuzzy].
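The difference between a crisp and a fuzzy decision can be sketched with a single feature. The feature (a transaction amount), the threshold of 10,000, and the ramp from 5,000 to 15,000 are hypothetical values chosen for illustration; the fuzzy side uses a simple linear (ramp) membership function.

```python
def crisp_suspicious(amount, threshold=10_000):
    """Hard decision: fully suspicious or not suspicious at all."""
    return 1.0 if amount >= threshold else 0.0

def fuzzy_suspicious(amount, low=5_000, high=15_000):
    """Degree of membership in 'suspicious', rising linearly from 0 at
    `low` to 1 at `high` (a ramp membership function)."""
    if amount <= low:
        return 0.0
    if amount >= high:
        return 1.0
    return (amount - low) / (high - low)

for amount in (9_999, 10_001):
    print(amount, crisp_suspicious(amount), fuzzy_suspicious(amount))
# 9999 0.0 0.4999
# 10001 1.0 0.5001
```

Two nearly identical amounts land on opposite sides of the crisp threshold, while their fuzzy membership degrees stay close together.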
There are more specific challenges when we use computing in the forensic sciences. First: many algorithms struggled with performance, so they introduced some randomization to find approximate solutions, that is, heuristics. A typical example is the [Monte Carlo] method, where you try to solve an optimization problem by trial and error. You eventually find a solution, but there is no guarantee that you will find exactly the same solution twice; it will be a little bit different.
How do we deal with this in forensics? Do we go to the judge and say, “Oh, I got this result last week; if I run it right now, it might be a little bit different”? Assume we use heuristic methods for feature selection, just by trial and error, and on two different days, at two different times, we get two different sets of features. Oops! So, be careful. When choosing algorithms for the forensic sciences, prefer an optimal or deterministic algorithm over a heuristic one, even if it takes longer. Then you know what you get, and it’s reproducible.
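The contrast between deterministic and heuristic feature selection can be sketched as follows. The feature names and relevance scores are hypothetical, and the additive score is a stand-in for a real subset-quality measure. Exhaustive search returns the same subset on every run; random search with `seed=None` may not.

```python
import random
from itertools import combinations

# Hypothetical relevance of each feature to some task (illustrative numbers).
RELEVANCE = {"entropy": 0.9, "imports": 0.7, "sections": 0.4, "size": 0.2}

def score(subset):
    """Toy subset quality: sum of per-feature relevance. A real score would
    be, e.g., the validated accuracy of a classifier on that subset."""
    return sum(RELEVANCE[f] for f in subset)

def exhaustive_selection(features, k):
    """Deterministic: evaluate every k-subset and return the best. `max`
    keeps the first of equal scores, so reruns give the same answer."""
    return max(combinations(sorted(features), k), key=score)

def random_selection(features, k, trials, seed=None):
    """Heuristic: evaluate only `trials` random k-subsets. With seed=None
    the result may change from run to run."""
    rng = random.Random(seed)
    candidates = [tuple(sorted(rng.sample(sorted(features), k))) for _ in range(trials)]
    return max(candidates, key=score)

print(exhaustive_selection(RELEVANCE, 2))  # ('entropy', 'imports')
```

Fixing the seed makes the heuristic repeatable, but it still gives no guarantee of finding the best subset; exhaustive search does, at a cost that grows combinatorially with the number of features.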
Second: outliers versus normal data. In the forensic sciences, we are very much interested in the outliers. Many off-the-shelf products deal with general characteristics, and the outlier, this one data point up there, is treated as noise: “Ah, we can filter this away.” Not in forensics. This may give us the lead.
So look into the algorithms you get, or look into them together with a data scientist, to make sure outliers are not filtered out, because they may provide the indication you need for your case.
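Rather than silently removing the point “up there”, a pipeline can flag it for the examiner and leave the data untouched. This minimal sketch uses the modified z-score based on the median absolute deviation; the 3.5 cutoff is a common rule of thumb and an assumption here, not a forensic standard, and the sample values are hypothetical.

```python
import statistics

def flag_outliers(values, cutoff=3.5):
    """Return the indices of outliers by the modified z-score,
    0.6745 * |x - median| / MAD. Nothing is removed; the caller decides."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values equal the median: anything else stands out.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > cutoff]

logins_per_hour = [12, 11, 13, 12, 250, 12, 11]
print(flag_outliers(logins_per_hour))  # [4]
```

Median-based statistics are used instead of mean and standard deviation because a single extreme value would otherwise inflate the spread and partly hide itself.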
Next problem: imbalanced datasets. If we develop [or tune] machine learning algorithms to solve our problems, we need examples; remember, machines learn from examples. What do we get? Usually, very imbalanced datasets: a lot of normal traffic, and then this one incident. How do we deal with such highly imbalanced datasets? And how do we deal with the fact that, at the time of [system design], the characteristics of an attack may not be available at all?
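Why imbalance matters can be shown with a hypothetical dataset of 999 normal events and one attack: a classifier that always answers “normal” looks 99.9% accurate while missing the only incident. The sketch below reports recall and precision alongside accuracy and computes inverse-frequency class weights, n_samples / (n_classes * n_class_samples), the same formula scikit-learn uses for `class_weight="balanced"`; the labels are assumptions.

```python
from collections import Counter

def evaluate(y_true, y_pred, positive="attack"):
    """Accuracy alone hides the minority class; report recall and precision too."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * n_class_samples),
    so that rare classes count more during training."""
    counts = Counter(labels)
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}

y_true = ["normal"] * 999 + ["attack"]
y_pred = ["normal"] * 1000                 # a classifier that never flags anything
print(evaluate(y_true, y_pred))            # {'accuracy': 0.999, 'recall': 0.0, 'precision': 0.0}
print(inverse_frequency_weights(y_true)["attack"])  # 500.0
```

Weighting is only one remedy; it cannot help when, as the talk points out, no attack examples are available at design time at all.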
And the last one, which I mentioned earlier: what do we do if we use this fancy new deep neural network, present our expert findings in a court of law, and are asked to explain how we came to our conclusion? Will you tell the judge and the jury, “Oh, I have this deep neural network. It has 1,000 input nodes and 10 layers, each with about 100 nodes, and then these three output nodes. This made my decision”?
Sorry? Can we translate it into something a human can understand? Yes, but it costs us some work.
Quickly, some highlights on how to address these challenges. For everything I am telling you now, we have solutions, but I think this is work in progress for everybody, for the whole community, and we need more research and cooperation to improve it further.
First, an example related to economic crime investigation. I talked about mobile money. There are big telecom providers in Norway that have large mobile money datasets and a huge fraud problem, and they even give researchers real datasets to learn from. What we did was analysis by synthesis: using the one dataset we received, we created a simulation engine, first, to understand the fraud patterns in the dataset and, second, to systematically inject fraud patterns into a clean dataset. It’s the same as what you would do when you run malware in a sandbox environment; we do the same with financial fraud. We have established a financial fraud simulator, which is open source and publicly available, and for those who are interested, the datasets can be downloaded from Kaggle, a platform for machine learning and pattern recognition competitions.
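The analysis-by-synthesis idea of injecting known fraud patterns into clean data can be sketched in a few lines. This is not the team’s simulator: the record fields, the `large_transfer` pattern, and the injection rate are all hypothetical. The point is that every injected record carries a ground-truth label, which real casework data rarely provides.

```python
import random

def inject_fraud(clean, fraud_generator, rate, seed=0):
    """Return a shuffled, labelled copy of `clean` with synthetic fraud added.

    clean: list of transaction dicts (left unmodified)
    fraud_generator: function(rng) -> one synthetic fraudulent transaction dict
    rate: number of injected fraud records per clean record
    """
    rng = random.Random(seed)  # fixed seed keeps the synthetic dataset reproducible
    data = [dict(tx, is_fraud=0) for tx in clean]  # copies, labelled as clean
    for _ in range(round(len(clean) * rate)):
        data.append(dict(fraud_generator(rng), is_fraud=1))
    rng.shuffle(data)
    return data

def large_transfer(rng):
    """Hypothetical fraud pattern: an unusually large transfer."""
    return {"type": "TRANSFER", "amount": round(rng.uniform(50_000, 100_000), 2)}

clean = [{"type": "PAYMENT", "amount": float(i)} for i in range(100)]
data = inject_fraud(clean, large_transfer, rate=0.05)
print(sum(r["is_fraud"] for r in data))  # 5
```

Because the generator and seed are fixed, the same labelled dataset can be regenerated later, which matters if results derived from it have to be reproduced.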
And four weeks ago, it became official: we have received generous funding to establish a national cyber range. Yes, cyber ranges usually sit in the basements of homeland security organisations. We have received funding for a smaller cyber range, in particular for research and for cooperation with industry. With a real cyber range, we can also run simulations, collect evidence, and use it for later analysis. And our studies on financial fraud and [50:15] will very much help us. Stay tuned.
Another example I’d like to give is from threat intelligence. Cyber threat intelligence aims to provide intelligence before incidents occur, and for this it also pays attention to dark web markets, where, for instance, malware samples are traded. We cooperated with one of the main security providers in Norway, who also provided us with datasets. The objective was, for instance, to establish security-related trends in dark web forums: where password databases are traded, where other credentials are traded, for [service], and so on.
Without going into the algorithmic details, I’d like to point out that we were very eager to understand the performance of deep learning, because everybody talks about it, and we were wondering: is it really so good? According to No Free Lunch, we should be able to break it. The message is that deep neural networks, even if we use, for instance, pre-trained models from Google, have their, let’s say, performance … now let me see whether I can find it … yes. This is the convolutional neural network, and this is its performance. And you see, everything performs quite well.
But unsurprisingly, with the right features and a classic support vector machine algorithm, you can achieve almost exactly the same recognition rates. So there is not necessarily a need to bring in the huge machinery of deep neural networks to solve your problems. If you understand your problem, choose the right features, and understand the classification task, then simple algorithms [do] as well. And "simple": I remember a time when the support vector machine was new, and everybody was in the same hype as they are today about deep neural networks. It just goes around in circles. So this is the proof of No Free Lunch: there is no superior algorithm.
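To show how little machinery a "classic" linear SVM needs once good features exist, here is a dependency-free sketch trained with the Pegasos stochastic sub-gradient method (hinge loss plus L2 regularisation). Pegasos is swapped in only to avoid library dependencies; in practice one would use an established SVM library. The two-dimensional feature vectors are hypothetical stand-ins for hand-crafted features.

```python
import random


def train_linear_svm(X, y, lam=0.01, iters=2000, seed=0):
    """Pegasos training of a linear SVM. Labels must be +1 / -1.
    A bias term is handled by appending a constant 1.0 feature."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iters + 1):
        x, label = rng.choice(data)
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = label * sum(wi * xi for wi, xi in zip(w, x))
        # Shrink w: gradient step on the L2 regulariser.
        w = [(1 - eta * lam) * wi for wi in w]
        # Hinge-loss sub-gradient step, only when the margin is violated.
        if margin < 1:
            w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical, well-separated hand-crafted features.
X = [(2, 1), (1, 2), (3, 3), (2, 2.5), (-2, -1), (-1, -2), (-3, -3), (-2.5, -2)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X] == y)  # True
```

The point mirrors the talk: when the features already separate the classes, a simple linear model reaches the same recognition rate as a far heavier one.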
Another threat intelligence example: this is an ongoing project in which we cooperate with a major security provider in Norway that provides threat intelligence, first to the financial sector; it is then handed over to the federal police and Europol.
Dog food: do you know what dog food is? Software engineers usually know. One of the major producers of dog food, I forget the name, said, “Our product is good if you dare to eat [… you, producer of dog food], if you dare to eat your own dog food.” So we eat our own dog food: we use our own algorithms to support the security operations of the whole university. The digital security section works closely together with us, and some of our PhD students are security-cleared, so they can work there and test new algorithms.
The last example is malicious code, malware analysis of course. This is a slightly older slide, only to show you the different approaches, static analysis and dynamic analysis; this is cookbook information. We have been studying dynamic analysis quite intensively for information-based dependency matching, where we follow the entire [flow]. And of course PDF analysis, because in 2012 Norway was under a major national attack with infected PDFs, so our friends from the basement had a huge interest.
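As a small illustration of static PDF triage (not the dynamic, dependency-matching analysis described in the talk), the sketch below counts PDF name objects that often indicate active content, in the spirit of keyword scanners such as pdfid. It also decodes `#xx` hex escapes inside names, a simple obfuscation trick (`/J#61vaScript` means `/JavaScript`). The keyword list and the name-token regex are simplifying assumptions, not a full PDF parser.

```python
import re
from collections import Counter

# A PDF name: '/' followed by regular (non-whitespace, non-delimiter) bytes.
NAME_RE = re.compile(rb"/[^\s/<>\[\]()%{}]+")
HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")

# Names commonly associated with active or embedded content (assumed list).
SUSPICIOUS = {b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile"}


def pdf_keyword_counts(data: bytes) -> Counter:
    """Count suspicious PDF names in raw bytes, after decoding #xx escapes."""
    counts = Counter()
    for raw in NAME_RE.findall(data):
        name = HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
        if name in SUSPICIOUS:
            counts[name.decode()] += 1
    return counts


sample = (b"1 0 obj << /Type /Catalog /OpenAction 2 0 R >> endobj\n"
          b"2 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj\n"
          b"3 0 obj << /J#61vaScript 1 >> endobj")
print(pdf_keyword_counts(sample))
```

Such counts only flag candidates for closer (e.g. sandboxed) inspection; names hidden inside compressed object streams are not visible to this naive byte scan.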
The example I picked for now is malware classification: multinomial classification based on static analysis, in order to identify malware categories and families. This is particularly important for malware analysts. Every day, millions of samples pop in, and they don’t want to analyse the malware samples that are already known; they want to sandbox the samples that are unknown. So how can we quickly filter out known malware samples if they have no, let’s say, EC signature?
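The triage workflow described here (skip exact known samples, attribute near-variants to a known family, send everything else to the sandbox) can be sketched as follows. This swaps in a plain technique, SHA-256 lookup plus nearest-neighbour Jaccard similarity over byte 4-grams, and is not the fuzzy-logic classifier from the talk. The 0.6 threshold, the family names, and the sample bytes are hypothetical.

```python
import hashlib


def ngrams(data: bytes, n: int = 4) -> set[bytes]:
    """Set of overlapping byte n-grams."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: set, b: set) -> float:
    union = a | b
    # Two empty sets (tiny inputs) count as "no evidence", not a perfect match.
    return len(a & b) / len(union) if union else 0.0


class Triage:
    def __init__(self, threshold: float = 0.6):
        self.known_hashes: set[str] = set()
        self.families: dict[str, list[set[bytes]]] = {}
        self.threshold = threshold

    def add_known(self, sample: bytes, family: str) -> None:
        self.known_hashes.add(hashlib.sha256(sample).hexdigest())
        self.families.setdefault(family, []).append(ngrams(sample))

    def route(self, sample: bytes):
        """Return ('known', None), ('variant', family) or ('sandbox', None)."""
        if hashlib.sha256(sample).hexdigest() in self.known_hashes:
            return ("known", None)
        grams = ngrams(sample)
        best_family, best_score = None, 0.0
        for family, refs in self.families.items():
            for ref in refs:
                score = jaccard(grams, ref)
                if score > best_score:
                    best_family, best_score = family, score
        if best_score >= self.threshold:
            return ("variant", best_family)
        return ("sandbox", None)


t = Triage()
t.add_known(b"MZ\x90\x00 payload-alpha v1 connect c2.example", "alpha")
t.add_known(b"MZ\x90\x00 totally different beta dropper", "beta")
print(t.route(b"MZ\x90\x00 payload-alpha v2 connect c2.example"))  # ('variant', 'alpha')
```

Real systems replace the linear scan with indexed similarity search, since comparing against every known sample does not scale to millions per day.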
What we used there, because of all these obfuscation techniques, is fuzzy logic: on the one hand to deal with approximate matching of the patterns, and on the other to get rid of [crisp, hard-coded] formulations. For instance, if you want to warn someone that a weight is dropping on them and you tell them, “There’s a weight dropping, 105 tons, at a velocity of …”, boom! You had better say “Watch out!”, and they jump. We would like to achieve the same when we communicate with lawyers or our forensic experts.
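The "Watch out!" idea, replacing a crisp number with a graded linguistic term, is what fuzzy membership functions do. The sketch below maps a similarity score in [0, 1] to the terms "low", "medium" and "high" using triangular membership functions; the terms and their breakpoints are hypothetical choices for illustration.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms for a similarity score in [0, 1].
# Peaks at 0 and 1 use breakpoints outside the range so the ends reach 1.0.
TERMS = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}


def describe(score: float) -> dict[str, float]:
    """Degree to which the score belongs to each linguistic term."""
    return {term: triangular(score, *abc) for term, abc in TERMS.items()}


def linguistic(score: float) -> str:
    """The single term that best describes the score."""
    degrees = describe(score)
    return max(degrees, key=degrees.get)


print(describe(0.8))    # medium ~0.4, high ~0.6, low 0.0
print(linguistic(0.8))  # high
```

A score of 0.8 is thus partly "medium" and mostly "high", which is easier to communicate than a bare threshold decision.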
To my knowledge (please tell me if there is something better), there is one method out there that combines the advantage of exploratory data analysis with neural networks and the creation of human-understandable rules: neuro-fuzzy. You use neural networks for the statistical data analysis first, and then transfer the output, as I said, 1000 input parameters, 10 times 1000 hidden layers, into if-then rules: malware if this and this; don’t know; benign. Simple, understandable rules. Neuro-fuzzy was developed in the ’90s. It became quite unpopular because it couldn’t compete with algorithms like support vector machines, so it was somewhat in decline. But thinking about the problem, forensics needs something that humans can understand. So I thought: let’s pull it out again. We know it’s not good, so let’s look at what happens and take the algorithm apart.
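Once a neuro-fuzzy system has produced its rules, evaluating them is simple and readable. The sketch below evaluates hand-written if-then rules with the common min operator for AND and max for combining rules with the same conclusion, yielding "malware", "benign" or "don't know". The features (`entropy`, `api_suspicious`), their fuzzy sets, and the rules are hypothetical; in a neuro-fuzzy system they would be learned from data rather than written by hand.

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical fuzzy sets on features normalised to [0, 1].
LOW, MID, HIGH = (-0.5, 0.0, 0.5), (0.0, 0.5, 1.0), (0.5, 1.0, 1.5)

# Hypothetical rules: IF feature is TERM AND ... THEN class.
RULES = [
    ({"entropy": HIGH, "api_suspicious": HIGH}, "malware"),
    ({"entropy": LOW, "api_suspicious": LOW}, "benign"),
    ({"entropy": MID}, "don't know"),
]


def classify(features: dict[str, float]):
    strength: dict[str, float] = {}
    for conditions, label in RULES:
        # AND of the antecedents = minimum membership degree.
        fire = min(tri(features[f], *term) for f, term in conditions.items())
        # Rules sharing a conclusion are combined with maximum (fuzzy OR).
        strength[label] = max(strength.get(label, 0.0), fire)
    best = max(strength, key=strength.get)
    if strength[best] == 0.0:  # no rule fires at all
        return "don't know", strength
    return best, strength


label, strength = classify({"entropy": 0.9, "api_suspicious": 0.8})
print(label)  # malware
```

Each decision can be traced back to the rule that fired and how strongly, which is the kind of explanation a forensic expert can present and defend.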
What we realized is that in the transfer from the neural network output to the fuzzy rules, there are some very inaccurate estimations of so-called fuzzy patches. That is very much a machine learning detail, so I won’t bother you with it. Let me just go one slide … this is the slide.
Originally, we wanted to do something good for forensics: to help create algorithms that are human-understandable. No such algorithm existed, only the [badly working] neuro-fuzzy approach known in the literature. By taking the time and effort to understand what was going wrong with the algorithm, we found that the estimation of the patches that describe small data clusters was too inaccurate, and that it could be improved. With this improvement, we are able to provide a completely new methodology to the forensic domain, and yes, as I showed earlier, we can also combine it with all the advantages of deep learning.
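The talk does not say how the fuzzy patches were estimated or how the estimation was improved, so the following is only a generic sketch of the underlying idea: describing a small data cluster as a fuzzy patch. Here the patch is an axis-aligned Gaussian per dimension, centred on the cluster mean with width equal to the population standard deviation, and a point's membership is the product of the per-dimension degrees. This is an assumed textbook-style construction, not the authors' improved method.

```python
import math
from statistics import mean, pstdev


def estimate_patch(points):
    """Estimate an axis-aligned Gaussian fuzzy patch from a cluster:
    one (centre, width) pair per dimension."""
    dims = list(zip(*points))
    centres = [mean(d) for d in dims]
    # Floor the width so a degenerate (constant) dimension cannot divide by zero.
    widths = [max(pstdev(d), 1e-6) for d in dims]
    return centres, widths


def membership(patch, x):
    """Degree in [0, 1] to which point x belongs to the patch."""
    centres, widths = patch
    return math.prod(
        math.exp(-0.5 * ((xi - c) / w) ** 2) for xi, c, w in zip(x, centres, widths)
    )


cluster = [(1, 1), (1, 3), (3, 1), (3, 3)]
patch = estimate_patch(cluster)   # centres [2, 2], widths [1.0, 1.0]
print(membership(patch, (2, 2)))  # 1.0 at the centre
print(membership(patch, (3, 2)))  # exp(-0.5), about 0.61
```

How well such a patch fits its cluster directly determines how faithful the resulting if-then rule is, which is why inaccurate patch estimates degrade the whole rule base.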
So there is hope for dealing with complexity: there are methodologies that turn complex parameter output into human-understandable rules. But my message is still: watch out, many things can go wrong, and we don’t want promising technologies to be thrown out just because someone didn’t know how to apply them correctly.
With this, I … I have much more. [chuckles]
With this, I go to my last slide, the [lesson] summary: the admission of computational forensics. With computational forensics, we can increase performance in the forensic sciences, both efficiency and effectiveness. We can provide datasets for benchmarking and assessment, and we can implement new methodologies and standardized procedures. My personal dream is to have computers as expert witnesses in the room, challenged like any other expert. To get there, of course, we need more training, better understanding, and cooperation between data scientists and forensic scientists. And I don’t think that any hardcore forensic scientist will be replaced by computers. There will be a synergy between the two fields, and both sides can learn from each other.
Last but not least, introducing computing technology into the forensic sciences on a large scale also requires a look at the law. Forensics is done in the service of law enforcement. So if we create a new technology to be used there, the question is: what do we do if a computing mechanism goes wrong? No Free Lunch, Ugly Duckling: how do we deal with this?
I hope you are more curious than frustrated, and I look forward to talking with you. Thank you.
[applause]

End of transcript