I vaguely recall an article in an old edition of the BT Technology Journal saying that BT ran trials of, or used (I can't remember which), AI for landline call traffic analysis and network investigation as part of their data mining exercises.
Data mining is an interesting one; another thought occurs with regard to the reassembly of file fragments …
I recently saw a demo of predictive coding in the ED area. As you tag docs, it starts suggesting other docs that are similar. The more you tag, the better its suggestions get. If it's used as an aid, then I'm all for it. If you start letting it run your case for you, then I'm against it.
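For what it's worth, the "suggest similar documents" behaviour doesn't need anything more exotic than TF-IDF vectors and cosine similarity. A minimal sketch in Python with scikit-learn (the document texts and the tagged index below are invented for illustration, not taken from any real product):

```python
# A minimal sketch of "suggest similar documents": represent each document
# as a TF-IDF vector and rank the untagged ones by cosine similarity to
# the document the reviewer just tagged. Texts and index are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "invoice for consulting services rendered in March",
    "quarterly financial statement and audit notes",
    "email thread discussing the disputed contract terms",
    "contract amendment draft circulated for review",
]

vectors = TfidfVectorizer().fit_transform(docs)

tagged = 2  # the reviewer has just tagged doc 2 as relevant
scores = cosine_similarity(vectors[tagged], vectors).ravel()

# Rank the remaining documents by similarity to the tagged one.
suggestions = sorted(
    (i for i in range(len(docs)) if i != tagged),
    key=lambda i: scores[i],
    reverse=True,
)
for i in suggestions:
    print(f"doc {i}: similarity {scores[i]:.2f}")
```

The more documents get tagged, the more examples there are to compare against, which is essentially why the suggestions improve over time.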
Just wondering how comfortable you guys would feel about the use of Artificial Intelligence within the field of computer forensics?
It is nearly impossible to answer a question like this, simply.
First of all, there are many individual areas of research and applied technology lumped into the category of AI, including, but not limited to, machine learning, natural language understanding/generation, decision support systems, neural networks, etc.
Each of these disciplines focuses on a narrow aspect of "intelligence" rather than the whole ball of wax. As an example, the system that I cut my programming teeth on, Internist-1/Caduceus, was attempting to model the reasoning processes of expert diagnosticians using a couple of different models of reasoning. It did this very well, but that was all that it did.
Certainly, we are nowhere near, today, the vision of artificial intelligence as manifested in the works of Arthur C. Clarke or Isaac Asimov and nowhere near the point where any AI system could replace a human reasoner.
However, much has been learned via AI research that can be applied to the technical problems of today. Bayesian classification systems are commonly found in SPAM and malware detection programs and appliances. Natural language understanding and generation are widely used in voice response systems.
Some of the techniques used in search engines and text classifiers started out as subjects of research interest in AI. It is probably no coincidence that the inventors of Google and Yahoo! came from Stanford University, one of the early pioneers of AI research, and that Lycos came from CMU (another AI pioneer).
So, the simple answer is that techniques which were once considered AI are already found in commonplace tools, and there is no reason they would not turn up in forensics. I've often thought that malware detection would be a nice application for neural networks, which excel at automated classification.
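Purely to illustrate the classification idea (this is a toy sketch, not a real detector; the feature vectors and labels below are fabricated), the general shape in Python with scikit-learn might be:

```python
# Toy sketch: train a small neural network on hypothetical per-sample
# feature vectors (imagine byte entropy, imported-API counts, section
# sizes) labelled benign/malicious. All data here is fabricated.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Fabricated training data purely for illustration.
X_train = rng.random((200, 8))               # 200 samples, 8 numeric features
y_train = (X_train[:, 0] > 0.5).astype(int)  # stand-in benign/malicious labels

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

sample = rng.random((1, 8))
print("predicted class:", clf.predict(sample)[0])
print("class probabilities:", clf.predict_proba(sample)[0])
```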
But I don't see HAL taking the stand as a rebuttal witness in my lifetime, and I wouldn't be too concerned that the output of AI systems would survive a Daubert challenge.
At least not in the foreseeable future.
Yes! I also read about some work on using subsets of information classification: after training, the "AI" predicts how relevant a piece of ESI is to the case more accurately than another set of human reviewers. Very interesting…
I did recently read an article suggesting that some AI was 90% accurate in responsive document selection, compared with 51% for purely human effort. I recall it was a vendor site, though, so I question the article's objectivity, but it does make you wonder about the potential future benefits, especially with the large data sets I'm seeing these days.
As Sean points out, AI is kind of a grab bag of different techniques. The first distinction I'd like to make is between Strong AI (creating, and debating, machine intelligence equal or superior to human intelligence) and the grab bag of algorithms and techniques which may be better characterized as "advanced search techniques". I won't get into the former; it's a conversation better left to armchairs and late-night brandies by the fireplace.
The advanced search techniques broadly break down into supervised and unsupervised categories. Supervised techniques, like neural nets, support vector machines, Bayesian classification, etc., work by trying to build a model based on human feedback (the feedback part is called "training"). Unsupervised techniques build models based solely on the data and are not subject to human bias. In general, the supervised techniques have better price/performance ratios, but often lack explanatory power. Unsupervised techniques often require more horsepower… and there's no guarantee that the model they build up will be meaningful. For example, decision trees are extraordinarily transparent, but there's no guarantee that the optimal decision tree would make any intuitive sense to Joe Investigator, let alone the Hon. Joseph Judge.
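To make the supervised/unsupervised split concrete, here's a rough sketch in Python with scikit-learn; the four-document corpus and relevance labels are invented, and real systems are obviously far more elaborate. The supervised model needs the human-supplied labels; the clustering step does not:

```python
# Rough sketch of the supervised/unsupervised distinction.
# The tiny corpus and labels are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.cluster import KMeans

docs = [
    "meeting minutes board approval merger",
    "merger due diligence checklist",
    "fantasy football league standings",
    "office fantasy football trash talk",
]
labels = [1, 1, 0, 0]  # human "training" feedback: 1 = relevant, 0 = not

X = TfidfVectorizer().fit_transform(docs)

# Supervised: the model is built from human-coded examples.
nb = MultinomialNB().fit(X, labels)
print("supervised predictions:", nb.predict(X))

# Unsupervised: the model is built from the data alone, no labels needed.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("unsupervised clusters:", km.labels_)
```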
Predictive coding, which is quite common in the higher-end eDiscovery world, is an example of supervised techniques. The reviewers are building up training data from previously coded documents, and the model gets applied to new documents. The computer might be able to say, hey, I predict there's a 96% chance you'll code this document as Privileged, for example.
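Under the hood that's just a classifier reporting a class probability. A hand-wavy sketch (Python/scikit-learn; the coded examples and the new document are made up, and real predictive-coding engines are considerably more sophisticated):

```python
# Sketch of the "96% chance you'd code this Privileged" idea using a
# logistic-regression classifier over TF-IDF features. All of the
# training texts, labels and the new document are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

coded_docs = [
    "attorney client advice regarding pending litigation",
    "legal memo privileged and confidential draft",
    "lunch order for the team offsite",
    "monthly facilities maintenance schedule",
]
coded_labels = ["Privileged", "Privileged", "Not Privileged", "Not Privileged"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(coded_docs)

model = LogisticRegression().fit(X, coded_labels)

# A new, previously unseen document gets a probability per coding decision.
new_doc = ["draft advice from outside counsel on the litigation"]
probs = model.predict_proba(vectorizer.transform(new_doc))[0]
for cls, p in zip(model.classes_, probs):
    print(f"{cls}: {p:.0%}")
```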
I think that supervised techniques work very well for helping rank and/or highlight interesting results (either in eDiscovery, which is mostly just traditional information retrieval, or in forensics). While normal human error is probably far more likely in any given case than a bad judgment due to supervised techniques, I do think it'd be hard to defend using supervised techniques as justification in a [perhaps somewhat idealized] courtroom.
Unsupervised techniques seem to offer a better path, potentially. For example, most of the unsupervised techniques are clustering algorithms, and they can help an investigator understand the data set pretty easily. In addition, you could derive a decision tree, perhaps a non-optimal one, from the clusters and then get court agreement that the decision tree could be used for filtering the data.
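As a sketch of that cluster-then-explain workflow (Python/scikit-learn, with synthetic per-file features; just the shape of the idea, not a finished tool):

```python
# Sketch of cluster-then-explain: cluster the data without labels, then
# fit a small decision tree to the cluster assignments so the grouping
# can be expressed as human-readable rules. Data is synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)

# Fabricated per-file features, e.g. [size_kb, entropy], for illustration.
X = np.vstack([
    rng.normal([50, 3.0], [10, 0.3], size=(50, 2)),
    rng.normal([900, 7.5], [100, 0.2], size=(50, 2)),
])

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# A shallow tree over the clusters yields rules an examiner (or a court)
# can actually read and agree to use as a filter.
tree = DecisionTreeClassifier(max_depth=2).fit(X, clusters)
print(export_text(tree, feature_names=["size_kb", "entropy"]))
```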
I've been working on these types of techniques for a while now, and hope to release more information (and tools) in 2011.
Jon
As others have stated, I am all for better methods and tools to do my job. However, I am dubious at best and resistant at worst to AI as a means of doing my job for me or in place of any competent CF examiner. A person always needs to be in the loop to make the decisions and final conclusions.
Anything that could reduce an ever growing amount of information would be welcome.
The best stuff that achieves this today is the opposite: artificial ignorance.
Can you please suggest some links that would help me find information about the use of AI in network forensics?