All,
I am trying to expand my knowledge so I can help fellow investigators save time when analysing captured hard drive and network data to find the "bad files". I am trying to figure out the best way to help them: fuzzy hashing, data mining techniques that have been integrated into analysis tools, proven methodologies and processes. If you know of any current processes that you are using, they would be most appreciated. Every organization has its own policies on how it wants the data presented for evidence, leads, etc. Please list the source, journal, or website that outlines your high-level approach to big data in digital forensics.
Thanks,
Eric
What are the 'bad files', Eric? Your question is quite broad - the type of investigation usually dictates whether you can use some of the 'big data' techniques, and if so, which ones are appropriate.
One option purloined from the e-discovery world is clustering, or what AD calls Document Content Analysis in later versions of FTK/Lab (page 401 onwards, from https://).
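If you want to play with the idea outside of FTK, a minimal sketch in Python with scikit-learn looks something like the below - the export directory, feature count and cluster count are all made-up assumptions for illustration, not anything AD-specific.

# Sketch of content clustering for triage: turn exported document text into
# TF-IDF vectors and group similar documents so they can be reviewed together.
from pathlib import Path
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

docs = sorted(Path("exported_text").glob("*.txt"))  # hypothetical text export
texts = [d.read_text(errors="ignore") for d in docs]

# Each document becomes a weighted term vector.
vectors = TfidfVectorizer(stop_words="english", max_features=5000).fit_transform(texts)

# k-means groups similar documents; 10 clusters is an arbitrary starting point.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(vectors)

for doc, label in zip(docs, labels):
    print(label, doc.name)

The payoff is that a reviewer can sample a handful of documents per cluster instead of reading the whole export end to end.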
Good old-fashioned hashing can help too, depending on whether you want to include or exclude. KFF has always been useful here for excluding a lot of irrelevant data. There are a few good technologies out there for 'best guessing' explicit images if that's what your investigation hinges on, things like Cerberus for highlighting malicious code if that's important to you, and so on…
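As a concrete illustration of the include/exclude point, here is a rough sketch combining a known-hash exclusion list (the KFF/NSRL idea) with fuzzy matching via the python-ssdeep binding - the file paths, hash-list formats and the 80% similarity threshold are assumptions for the example, not a recommended workflow.

# KFF-style triage sketch: skip files whose MD5 is in a known-file hash set,
# then fuzzy-match the remainder against ssdeep hashes of known bad material.
import hashlib
from pathlib import Path
import ssdeep  # assumes the python-ssdeep binding is installed

known_good = set(Path("known_good_md5.txt").read_text().split())  # hypothetical hash set
known_bad = [l.strip() for l in Path("known_bad_ssdeep.txt").read_text().splitlines() if l.strip()]

for path in Path("evidence_export").rglob("*"):
    if not path.is_file():
        continue
    data = path.read_bytes()
    if hashlib.md5(data).hexdigest() in known_good:
        continue  # known, irrelevant file - excluded from review
    fuzzy = ssdeep.hash(data)
    for bad in known_bad:
        if ssdeep.compare(fuzzy, bad) >= 80:  # arbitrary similarity cut-off
            print(f"possible match: {path} ~ {bad}")

Fuzzy hashing is what catches the near-duplicates that exact MD5/SHA-1 matching misses, e.g. a known image that has been recompressed or trimmed.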
Redcat,
The bad files are files that would be considered evidence, i.e. CP pictures, illegally downloaded software, movies, malware-dispensing programs, and the list goes on. I am aware of FTK, EnCase, and SANS software for processing evidence.
I am trying to develop and standardize a process for analyzing the possible evidence, possibly using data mining techniques. I am still in the learning phase; I regularly use EnCase on my cases. It is good, but I know there are better programs out there.
Thanks,
Eric