by Dominik Weber
In this month's installment, I will take a break from a specific problem and talk about a fundamental issue with deep forensics: scalability.
Scalability is simply the ability of our forensic tools and processes to perform on larger data sets. We have all witnessed the power of Moore's law: hard drives are getting bigger and bigger, and a 2 TB SATA hard drive can be had for well under $100. With massive storage space being the norm, operating systems and software are leveraging it more and more. For instance, my installation of Windows 7 with Office is ~50 GB. Browsers cache more data, and many temporary files are being created. Since Windows Vista introduced the TxF layer for NTFS, transactional file systems are the norm, and the operating system keeps restore points, Volume Shadow Copies and previous versions. Furthermore, a lot of old, deleted file data no longer gets overwritten.
This "wastefulness" is a boon to forensic investigators. Many more operating and file system artifacts are being created. Data is being spread out in L1, L2, L3 caches, RAM, Flash storage, SSDs and hard drive caches. For instance the thumbnail cache now stores data from many volumes and Windows search happily indexes a lot of user data, creating artifacts and allowing analysis of its data files…
Please use this thread for discussion of Dominik's latest column.
Dominik,
Regarding Windows Desktop Search, you're right that there are no mainstream tools which analyse the Windows.edb file; however, Joachim Metz has a SourceForge project which does this very nicely indeed. He even wrote an article here on Forensic Focus (http://www.forensicfocus.com/windows-search-forensics) which went into some detail about how Microsoft's obfuscation of the text fields was achieved and how he reversed it.
I've written a wrapper around this to ease some of the extraction issues for analysts, and we're using it in our lab to obtain some very fine evidence!
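For anyone who wants to roll their own, the core of such a wrapper can be very small. The sketch below is illustrative only (it is not our lab code): it assumes esedbexport, the command-line tool that ships with Joachim's libesedb, is on the PATH and accepts -t to set the export target, so do check the options and paths against your own build.

#!/usr/bin/env python
# Sketch of a wrapper around libesedb's esedbexport tool.
# Assumptions: esedbexport is on the PATH and takes "-t <target>" for the
# export location. Names and paths here are illustrative only.
import os
import subprocess
import sys


def export_windows_edb(edb_path, target="windows_edb"):
    """Run esedbexport over a Windows Search database and return the target."""
    if not os.path.isfile(edb_path):
        raise IOError("No such file: %s" % edb_path)
    # esedbexport writes the exported tables under a directory derived from
    # the target name (check -h on your version for the exact behaviour).
    subprocess.check_call(["esedbexport", "-t", target, edb_path])
    return target


if __name__ == "__main__":
    # Typical location on a mounted Vista/Win7 image:
    #   ProgramData/Microsoft/Search/Data/Applications/Windows/Windows.edb
    print(export_windows_edb(sys.argv[1]))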
Kind regards,
John.
I think one big problem with ever-increasing disk sizes is that a lot of analysis is sequential, i.e. read a sector and process it. I feel there are three basic tasks: read, process, and save. Thus five cores on an 8-core machine will often be idle (with many applications it will be seven).
Any attempt to have two cores reading will end up with the hard drive relocating its heads all the time.
Software has to catch up with hardware. Microsoft Visual Studio 2010 at last has support for parallel programming, so some improvements may be seen in the near future. The trouble is that it is much easier to think sequentially than in parallel, where one may have to deal with an error long after the data was read or processed. A sketch of the read/process/save split follows below.
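To make that split concrete, here is a minimal sketch (in Python rather than VS2010, purely for brevity): one sequential reader feeds chunks to a small pool of worker processes, which hash each chunk as a stand-in for the real analysis, while the main loop saves the results. The image name, chunk size and worker count are illustrative only.

import hashlib
from collections import deque
from concurrent.futures import ProcessPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024  # large sequential reads keep the heads still


def process_chunk(offset, data):
    # CPU-bound stage, run on the worker cores (MD5 as a stand-in).
    return offset, hashlib.md5(data).hexdigest()


def hash_image(image_path, workers=4, max_outstanding=8):
    results = {}
    pending = deque()
    with ProcessPoolExecutor(max_workers=workers) as pool, \
            open(image_path, "rb") as image:
        offset = 0
        while True:
            data = image.read(CHUNK_SIZE)      # single sequential reader
            if not data:
                break
            pending.append(pool.submit(process_chunk, offset, data))
            offset += len(data)
            # Backpressure: don't let the reader run miles ahead of the workers.
            while len(pending) >= max_outstanding:
                off, digest = pending.popleft().result()
                results[off] = digest          # "save" stage
        while pending:                         # drain the remaining work
            off, digest = pending.popleft().result()
            results[off] = digest
    return results


if __name__ == "__main__":
    for off, digest in sorted(hash_image("disk.img").items()):
        print("%012d  %s" % (off, digest))

The point is exactly the mental shift described above: any failure in process_chunk surfaces some time after the chunk was read, somewhere other than the read loop, and the code has to be written with that in mind.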
The other big improvement we may all benefit from is USB-3.
Six years ago a typical drive I saw for recovery was 40-80 GB. It is now typically 500 GB, and multi-TB RAIDs are getting common too.
The original post also commented on more complex data structures. Processing times will continue to get longer.