I'm looking for a 'one stop' software solution (preferably free) that will index files and, using a keyword list, produce output (CSV) showing which keywords appear in each document. These are 'general' office docs and mail (EML and MSG), e.g.:
Doc1.doc Keyword1 Keyword2 Keyword3
Test.xlsx Keyword2 Keyword3
Scan.pdf Keyword1 Keyword3
Message.eml Keyword4
The keywords need to be input as GREP expressions, and a keyword might also comprise two parts that should only be reported when both appear - e.g. 'elephant' and 'giraffe' together, but no hit reported if only one of the two appears.
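For what it's worth, the matching logic described above (regex keywords plus two-part AND keywords, written out to CSV) is only a few lines of script once the documents have been reduced to plain text. A minimal sketch in Python - the pattern labels and expressions are placeholders, and real office docs/EML/MSG would need a text-extraction step first:

```python
import csv
import re
from pathlib import Path

# Placeholder keyword list: single GREP-style patterns...
SINGLE_PATTERNS = {
    "Keyword1": re.compile(r"elephant\w*", re.IGNORECASE),
    "Keyword2": re.compile(r"giraffe", re.IGNORECASE),
}
# ...and two-part keywords, reported only when BOTH parts appear.
PAIR_PATTERNS = {
    "Keyword3": (re.compile(r"elephant", re.IGNORECASE),
                 re.compile(r"giraffe", re.IGNORECASE)),
}

def hits_for(text):
    """Return the list of keyword labels found in one document's text."""
    hits = [label for label, pat in SINGLE_PATTERNS.items() if pat.search(text)]
    for label, (a, b) in PAIR_PATTERNS.items():
        if a.search(text) and b.search(text):   # AND logic: both parts required
            hits.append(label)
    return hits

def scan(folder, out_csv):
    """Walk a folder and write 'filename, hit, hit, ...' rows to a CSV."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for path in Path(folder).rglob("*"):
            if not path.is_file():
                continue
            # errors="ignore" is only adequate for plain text; office
            # formats would need proper extraction (e.g. a filter tool).
            text = path.read_text(errors="ignore")
            found = hits_for(text)
            if found:
                writer.writerow([path.name] + found)
```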
I've got an idea of how to go about this in EnCase - playing around with the CSV output in a database and querying it, etc. - but is there a simpler solution out there?
Many thanks for any advice.
As far as I know this is either impossible or impractical in EnCase v6. There is no way to create an exported list like that, or to combine search hits in the way you describe, using a basic keyword search. You may be able to do it if you index the case, but sadly indexing in EnCase 6 is very poor.
X-Ways, on the other hand, will do everything you need from a simple keyword search. Once a search is complete, X-Ways populates a special column with the keyword hits. You can then export the filename along with this column to get the output you want.
X-Ways also handles "combination" keywords very well. You can do exactly what you specified quite easily - i.e. only show files where "elephant" AND "giraffe" appear.
Many thanks for the reply. I agree that EnCase is impractical - it is achievable, as I've shown on a small test data set, but it involves a multi-stage process to get to my desired result. Putting that into practice on several much larger sets, with a larger number of keywords, would be achievable but logistically a nightmare.
I will give X-Ways a go.
dtSearch will provide a list of keywords, and their frequency.
It also has the ability to look "inside" non-text documents, such as MSG, PDF, spreadsheets and MS Office documents.
You may want to try P2 Commander. P2 Commander does a good job with email and can perform recursive searches through email, archives, OLE streams, etc., so the search results will be comprehensive. You can then bookmark the results and create a CSV report showing the path to each search result. You can even choose to have the target files exported along with the report.
PM me and I can get you a sample report.
In FTK, you can use the Labels function to assign labels to your responsive files. You could then export the file properties with the Labels column included in the report. However, this means running one keyword search at a time then creating the label and applying it to the responsive files for that search instance. This becomes impractical when you have a long list of keywords.
Was the production of a CSV file the ultimate aim?
Or was the production of a CSV file just an intermediate step, which you believed was necessary in order to search via grep or to load the data into a custom database?
Because if the ultimate aim is to search across a set of documents (with various boolean expressions), there are much better ways to build an index than what you have proposed.
For example, your solution doesn't deal with stemming, exact phrase searches, different character sets, stop words, ranking results by relevance, breaking up email archive files into individual emails, searching by criteria other than keywords (e.g. dates, file names, email From addresses), wildcard searches, etc.
Your solution will also be slow.
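To illustrate the difference, a real full-text index answers boolean queries directly, with no CSV intermediate. A generic sketch using SQLite's FTS5 module (not related to any of the products mentioned above, and assuming your SQLite build includes FTS5, as standard Python builds do):

```python
import sqlite3

# Build an in-memory FTS5 full-text index over (filename, body) pairs.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE docs USING fts5(name, body)")
con.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        ("Doc1.doc", "the elephant stood alone"),
        ("Test.xlsx", "a giraffe and an elephant together"),
        ("Scan.pdf", "nothing relevant here"),
    ],
)

# Boolean AND: only documents containing BOTH terms are returned,
# without re-reading every file at query time.
rows = con.execute(
    "SELECT name FROM docs WHERE docs MATCH 'elephant AND giraffe'"
).fetchall()
# rows == [("Test.xlsx",)]
```

The index is built once; each subsequent boolean query is a lookup rather than another pass over the documents, which is where the speed difference comes from.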
You are much better off using a pre-made solution, like the well respected products suggested above, or our own OSForensics software, which may well do what you want for free.