Hi All,
First time poster, longtime lurker. I work more in the ediscovery area, but dabble in non-forensic collections.
I've been using FTK Imager to collect PCs for some time now. I usually bring the image back to the server area, then use a proper ediscovery tool to index, search and process the docs I want.
I need to head out to a client's house next week to collect some data. My usual "collect everything then cull later" approach is not going to work, as it is a home PC and they don't want me to take everything. I have a small list of keywords that I have to use.
Has anyone got any clever suggestions? I don't need to restore deleted items, and really I don't need anything "forensic", just a decent process.
I was thinking of just using something like dtSearch to index the machine and collecting the documents from there.
Am I missing something?
Bob
How much time are they going to give you to index their drive? And then choose what you want to collect and then collect it?
And this is not a smart alec question, knowing how much time you are allotted makes a difference when recommending tools.
I reckon I will have about 1-2 hrs…
It was one of those Friday night calls from the legal team…
I'm running some tests now on my home laptop, just to get the process down and hopefully give me a rough benchmark in GB/hr.
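For what it's worth, the benchmark arithmetic can be sketched in a few lines of Python. This is only a rough illustration: the function names are made up for the example, a GB is assumed to be 1024^3 bytes, and it assumes the client's machine performs roughly like the test laptop, which it may well not.

```python
def gb_per_hour(bytes_processed, seconds_elapsed):
    """Throughput of a test run, in GB per hour (1 GB taken as 1024**3 bytes)."""
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    return (bytes_processed / 1024**3) / (seconds_elapsed / 3600)


def hours_needed(data_gb, rate_gb_per_hour):
    """Estimated hours to process data_gb at the measured rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate_gb_per_hour must be positive")
    return data_gb / rate_gb_per_hour


if __name__ == "__main__":
    # Hypothetical test run: 5 GB of documents processed in 12 minutes.
    rate = gb_per_hour(5 * 1024**3, 12 * 60)
    print(f"rate: {rate:.1f} GB/hr")
    print(f"40 GB would take about {hours_needed(40, rate):.1f} hours")
```

The numbers in the example run are invented; plug in whatever the test on the laptop actually shows.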
If you have a known set of keywords, and you know you won't be adding to them, there really isn't a need to index the drive since your time is short. You can do a live search and export the hits.
X-Ways Forensics would be a really good choice, if you have it. XWF can search the drive with your set of keywords, export those hits, create a log file of what you did, and also create a spreadsheet listing every exported file with all of its metadata. Other tools can do the same, but XWF can run from an external drive plugged into the custodian machine, which saves you the time of removing the drive and connecting it to your own machine, since you aren't going to image it.
1 to 2 hours is a tight schedule, especially if you happen to run into a large-capacity hard drive. Whatever search tool you use, you will have to configure it to search user-created files only to speed up the process. And if email is involved, such as the drive having .pst files, your time will just keep getting shorter.
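To make the "live search and export the hits" idea concrete, here is a minimal Python sketch. It is not how XWF or any other tool works internally. The extension list, function names and log columns are all assumptions for illustration, and the search is a plain case-insensitive byte match, so it will miss text inside compressed formats (.docx, .zip), UTF-16 text, and anything scanned. Point export_dir at a location outside the tree being searched (e.g. the external drive).

```python
import csv
import hashlib
import os
import shutil

# Assumed list of plain-text, user-created file types; adjust for the matter.
USER_EXTENSIONS = {".txt", ".csv", ".htm", ".html", ".rtf"}


def keyword_hits(data, keywords):
    """Return the keywords found in data (ASCII case-insensitive byte match)."""
    lowered = data.lower()
    return [kw for kw in keywords if kw.lower().encode("utf-8") in lowered]


def search_and_export(root, keywords, export_dir, log_path,
                      extensions=USER_EXTENSIONS):
    """Copy every matching file under root to export_dir and write a CSV log."""
    os.makedirs(export_dir, exist_ok=True)
    rows = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() not in extensions:
                continue
            src = os.path.join(dirpath, name)
            try:
                with open(src, "rb") as fh:
                    data = fh.read()  # whole file in memory: fine for small docs
                stat = os.stat(src)
            except OSError:
                continue  # unreadable or locked file: skipped in this sketch
            hits = keyword_hits(data, keywords)
            if not hits:
                continue
            # Prefix a counter so same-named files do not overwrite each other.
            dest = os.path.join(export_dir, f"{len(rows) + 1:05d}_{name}")
            shutil.copy2(src, dest)  # copy2 also copies timestamps where it can
            rows.append([src, dest, stat.st_size, stat.st_mtime,
                         hashlib.sha256(data).hexdigest(), ";".join(hits)])
    with open(log_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["source", "exported_as", "size_bytes",
                         "modified_epoch", "sha256", "keywords_hit"])
        writer.writerows(rows)
    return rows
```

The CSV plays the role of the spreadsheet listing mentioned above: one row per exported file, with the original path, size, modified time, hash and the keywords that hit.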
Hi Bob,
Whip the drive out, write block it to your system and use EnCase (or run it portable directly on the target system, if you have it)? Build some basic conditions, i.e. find by file extension (DOC, XLS, PDF, PST, ZIP etc.) to strip out the unwanted system clutter and focus on the bits you may want. I know this methodology does not rely on file signature tests, but with a tight time limit it is better than wasting an hour or more on a file signature search. Then copy/unerase these files out to a spare drive, run dtSearch on it for the keywords you want, and export the hits to the media you are going to take away. Wipe the original export. In these time-limited cases you aren't going to be able to search really effectively. For instance, dtSearch isn't going to be able to read any 'flat' format PDFs or any scanned docs that might be in TIFFs etc. I tend to do a quick condition for images over 70k and manually review these in ImageSorter, where I can quickly go to the ones that look like a white background with black text and see if they are of any use. For the PDFs I run a PDF converter over them, convert them to txt and add these to dtSearch. But with your time constraints it might be worth running your eyes over them manually, depending on the quantity.
If you don't have EnCase you could make up a WinFE disk and boot the host with it. Use a utility like TreeSize to find files by extension again and do the same export/save routine as above.
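The extension and image-size conditions described above could be roughed out in Python along these lines. It is a sketch only: the image extension list is an assumption, "70k" is read as 70 * 1024 bytes, and like the original method it trusts file extensions rather than file signatures.

```python
import os

# Document extensions named in the post above.
DOC_EXTENSIONS = {".doc", ".xls", ".pdf", ".pst", ".zip"}
# Assumed image extensions that might hold scanned documents.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".png", ".bmp"}
# "Images over 70k", interpreted here as 70 * 1024 bytes.
IMAGE_MIN_BYTES = 70 * 1024


def triage(root):
    """Return (documents, images_to_review) as sorted lists of paths."""
    documents, images = [], []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            path = os.path.join(dirpath, name)
            if ext in DOC_EXTENSIONS:
                documents.append(path)
            elif ext in IMAGE_EXTENSIONS:
                try:
                    size = os.path.getsize(path)
                except OSError:
                    continue  # file vanished or is unreadable
                if size > IMAGE_MIN_BYTES:
                    images.append(path)
    return sorted(documents), sorted(images)
```

The first list is what would go to the keyword search; the second is the pile to eyeball manually for scanned pages.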
Good luck.
Regards
Shep
Thanks Bithead, Bshavers and Shep47 for your replies.
Normally I would throw this kind of job to the forensic guys, but you know what lawyers are like ("search some documents - how hard can it be?"). He was using webmail, so no need for PSTs/NSFs or anything like that.
I ended up going with dtSearch and just indexed the extensions we were looking for. It worked pretty well in the end. I wasn't worried about him trying to hide or obscure documents - it's not that kind of case, so no need to go to that level.
We don't have licences for EnCase or X-Ways :(
Shep47, it's interesting that you mention the flat TIFF/PDF files. I'm blown away by how many forensic applications do not run an additional OCR process over files with no or small character counts.
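The kind of check meant here is simple enough to sketch in Python. This is purely illustrative: it assumes some other tool has already extracted whatever text it could from each file, and the 25-character threshold is an arbitrary guess, not a recommendation.

```python
def ocr_candidates(extracted, min_chars=25):
    """Return sorted paths whose extracted text has fewer than min_chars
    non-whitespace characters.

    extracted maps each file path to the text pulled from it (None or ""
    when nothing could be extracted).
    """
    flagged = []
    for path, text in extracted.items():
        visible = sum(1 for ch in (text or "") if not ch.isspace())
        if visible < min_chars:
            flagged.append(path)
    return sorted(flagged)


if __name__ == "__main__":
    sample = {
        "minutes_scan.pdf": "",              # flat PDF, no text layer
        "contract.tif": None,                # image, nothing extracted
        "report.doc": "Quarterly results " * 20,
    }
    print(ocr_candidates(sample))
```

Anything on the returned list would be sent for OCR or manual review rather than silently treated as "no hits".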
Anyhow, thanks all.
Rob
Not to change the subject too much…but I can think of a number of reasons for this:
1. Forensic application developers very often are not practitioners. I have used a commercial forensics tool for some years, and it's currently many versions past where it was when I started using it…and yet, when I run a keyword search, it still only tells me which file the hit was located in…I then have to extract the file itself in order to run an additional search across it to find out where within the file the hit is located, and determine the context of the hit. Look at AccessData's FTK 2.0 when it came out, or the current issues publicized on the web regarding EnCase v7…many practitioners seem to be unhappy with their licensed, paid-for tools due to some of the design and implementation decisions made by vendors.
2. Most practitioners seem to prefer to struggle and make do with what they have, rather than contact a vendor or open source tool author and ask for assistance. I've talked to a number of practitioners who are happy spending days or weeks trying to figure something out themselves, rather than contacting someone and getting help to get the job done now.
3. Years ago, I was associated with an investigation where the customer thought that PCI data may have been exposed. Only after we had sent an analyst on-site did we find out that the system involved was a fax server, and that the faxes which supposedly contained clear text CCNs were TIFF files located in PDF documents. However, in over 3 1/2 years of doing PCI exams, that's the ONLY instance I ever heard of this kind of issue. As such, there may not be enough of this kind of work or issue to justify spending the money to add an OCR capability to a search utility.
OK, I know we're down a different road now, but OCR's kinda interesting. We used Sherpa's Discovery Attender for Exchange for our internal eDiscovery/eDisclosure efforts. It's cheap and generally does what it says on the tin, including listing Exceptions when it comes across password-protected, encrypted, empty, or corrupted files.
What it doesn't do - and for a major litigation case this was a show-stopper - is list file types that it couldn't search inside, e.g. JPG, TIFF, scanned PDFs etc. So we ended up collecting everything and giving that to external counsel, who in turn got their Lit Support vendor to run their super-duper software - which attempts to OCR anything non-text on the fly and reports on any errors.
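The missing exception list is easy to picture as a small Python sketch. This has nothing to do with how Discovery Attender or the vendor's software works. It just assumes you can get a list of every collected file and a set of the files the text search actually looked inside, then reports the remainder by extension. The function and variable names are invented for the example.

```python
import os
from collections import Counter


def exception_report(all_paths, searched_paths):
    """Return (counts_by_extension, unsearched_paths) for files that were
    collected but never searched inside."""
    searched = set(searched_paths)
    unsearched = sorted(p for p in all_paths if p not in searched)
    counts = Counter(os.path.splitext(p)[1].lower() or "(none)"
                     for p in unsearched)
    return counts, unsearched


if __name__ == "__main__":
    collected = ["a.doc", "b.TIF", "c.jpg", "d.tif", "README"]
    searched = {"a.doc"}
    counts, paths = exception_report(collected, searched)
    for ext, n in sorted(counts.items()):
        print(f"{ext}: {n} file(s) not searched")
```

Even a crude list like this lets the legal team see up front which files the keywords were never run against.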
Cheers
Who doesn't love a good off-topic though?
Cults14 - you are spot on. For litigation, not having OCR capability can be a massive showstopper. A number of times I have had to explain to the legal team that they might not have Doc X if it was scanned and not OCR'd by the client. This is followed by letters to the client, other party, judge etc.
So many of the key documents are minutes with handwritten notes (scanned back in) and other signed documents such as contracts/agreements. I get massively concerned when I hear of Big Accounting Firm X using a tool that I know does not OCR.