Forensic Focus - Computer Forensics, Computer Forensic Training, Digital Forensics
LoginRegisterForumsArticles/PapersEducationReviewsInterviewsNewsletterJobsEventsBlogAdvertise
Search Forensic Focus
Custom Search

Find us on Facebook
Follow Forensic Focus on Twitter

Submit article, paper or blog post
Latest Articles
· “The Data Specimen is the Blood of Cyber Forensics”
· Forensic Imaging of Hard Disk Drives- What we thought we knew
· Can Your Digital Images Withstand A Court Challenge?
· Review: Proof Finder by Nuix
· Forensic Toolkit v3 Tips and Tricks ― Not on a Budget
· Is your client an attorney? Be aware of possible constraints on your investigation. (Part 2 of a multi-part series)
· iPhone Tracking – from a forensic point of view (Update!)
· Android Forensics Study of Password and Pattern Lock Protection
· Skype in eDiscovery
· Forensic Toolkit v3 Tips and Tricks – On a budget

read more...
Main Menu
MY ACCOUNT
COMMUNITY
EMPLOYMENT
EDUCATION
RESOURCES
MISC
Follow Forensic Focus

Join newsletter

Join LinkedIn group

Follow on Twitter

Subscribe to news

Subscribe to forums

Subscribe to blog

Subscribe to tweets

Members' blogs

External feeds

Bookmark & share: Bookmark and Share

Newsletter
Newsletter

You must be a
registered user
to receive our newsletter

Register Now!
Forensic Focus

Forensic Focus

Copy and paste the text below to insert the button displayed above on your site. Thanks for your support!


Scalability: A Big Headache

by Dominik Weber

Dominik Weber
About the Author

Dominik Weber is a Senior Software Architect for Guidance Software, Inc.

In this month's installment, I will take a break from a specific problem and talk about a fundamental issue with deep forensics: Scalability.

Scalability is simply the ability of our forensic tools and processes to perform on larger data sets. We have all witnessed the power of Moore's law. Hard drives are getting bigger and bigger. A 2 TB SATA hard drive is to be had for much under $100. With massive storage space being the norm, operating systems, and software is leveraging this more and more. For instance, my installation of Windows 7 with Office is ~50GB. Browsers cache more data and many temporary files are being created. After Windows Vista introduced the TxF layer for NTFS, transactional file systems are now the norm, and the operating system keeps restore points, Volume Shadow Copies and previous versions. Furthermore, a lot of the old, deleted file data will not get overwritten anymore.

This "wastefulness" is a boon to forensic investigators. Many more operating and file system artifacts are being created. Data is being spread out in L1, L2, L3 caches, RAM, Flash storage, SSDs and hard drive caches. For instance the thumbnail cache now stores data from many volumes and Windows search happily indexes a lot of user data, creating artifacts and allowing analysis of its data files.

That was the good news. The bad news is that most of this data is in more complex, new and evolving formats, requiring more developer efforts to stay current. For instance I am not aware of any forensic tool that analyzes Windows Search databases - not that I had time to look (if you know of such a tool, post in the forum topic, please - see below). Worse than that is the need to thoroughly analyze the data. Traditionally, the first step is to acquire the data to an evidence file (or a set thereof). The data must be read, hashed, compressed and possibly encrypted. All this does take time, despite new multi-threaded and pipelined acquisition engines appearing (for instance in EnCase V6.16). High speed hardware solutions are also more prevalent. Luckily, this step is linear in time, meaning that a acquiring a full 2TB hard drive will take twice as long as a full 1TB drive. Note that unwritten areas of hard drives are usually filled with the same byte pattern (generally 00 or FF) and these areas will compress highly, yielding faster acquisition rates.

The next step is the interpretation of the file systems by the forensic tool. And here we run into some issues. We want to detect and display deleted and overwritten files properly in their hierarchy, detect cross linked files and compute unallocated space. This process is non-linear with the number of files and with the hard drive space. A lot of processes are n-squared. That means that a doubling of the input would quadruple the output. For instance, if a 125 GB NTFS drive would take 1 minute to process, a 2 TB would take about 4.5 hours (256 minutes). In reality this figure is actually worse. This is due to smaller datasets being able to be handled in memory without paging; the file system cache and large amounts of RAM really help here. This impact is because we need to analyze the file system as a whole, rather than having the luxury of just being concerned with a single non-deleted file or directory. This is why there is no slowdown noticeable on your own computer. Well that and that the CPU, graphics and disk I/O are more powerful to handle the new load of the applications and operating system. In large sets, the system starts paging and the performance degrades due to the I/O subsystem needing to read from the evidence while paging memory in and out (or accessing the database files).

The next steps also take more time; carving, detecting and mounting file structures (zip/cab files, office documents, PST/DBX and similar). Here we run into similar issues, and this step adds more items to the whole data set.

Now we have a large amount of files and data to sift through in order to find the data of interest. New and inexperienced investigators can be quickly overwhelmed by the data flood. So what can we do about this? There are some practices that are currently used that help with this data flood at the expense of some computing / automated analysis time:

1) Hash Analysis: All the files get hashed and these hashes get looked up in Hash sets. This will determine if a file is part of a standard software packet/ operating system installation. There are also hash sets for malware and known child pornography images.

2) Indexing / Search analysis. The data gets either searched raw with keywords that would hit on terms of interest (victim's name, credit card number etc.) or the whole text is indexed.

3) Custom analysis: Images can be carved out, custom scripts, plug-ins and processes that analyze the data can be run; these can range from image finding and skin tone analysis to detection for encrypted data and email thread or web cache data mining.

4) Data sets can be reduced by filtering and other criteria and this data is put into its own reduced data set (logical evidence files).

While these things do help in human processing time, they also increase the automated processing time. This time is usually spent up-front, leading to larger lag times. Unfortunately most of these algorithms are worse-than-linear. Thus, the fundamental challenge in forensic tool design / programming is to use advanced algorithms that have better run times. Multi-threading, pipelining and a better use of the hardware platform to optimize I/O, CPU and RAM throughput are needed. These techniques take much longer to design, implement and debug because they are much more complex, the data sets ran against them are large and the result must be validated. This large effort is very hard to achieve in open-source projects and even in commercial products it takes effort, commitment and larger architectural changes. This effort must be balanced against the need to release products and provide updates and new features.

So, dear reader, as a forensic developer, I am curious, what do you think about the current state of forensic tools? Are they adequate? What other things do you do in order to deal with the data volume? How should the "new feature" be balanced against the scalability of the existing process?


Click here to discuss this article.





--

Read Dominik's previous columns

Dominik Weber is a Senior Software Architect for Guidance Software, Inc. He has a Masters of Computer Science from the University of Karlsruhe, Germany and worked for video game companies (Activision) and on computer animation / motion-capture projects (Jay Jay the Jet Plane) before joining Guidance Software in 2001. He can be reached at forensicdeveloper@gmail.com


Guidance Software is recognized worldwide as the industry leader in digital investigative solutions. Its EnCase and Enterprise platforms provide the foundation for government, corporate and law enforcement organizations to conduct thorough, network-enabled, and court-validated computer investigations. Worldwide there are more than 30,000 licensed users and thousands attend its renowned training programs annually.


Forensic Education

computer forensics education choices COURSE DIRECTORY

User Info

Welcome Anonymous

Nickname

Membership:
Latest: Draugrs
New Today: 0
New Yesterday: 13
Overall: 20808

People Online:
Members: 5
Visitors: 37
Bots: 4
Staff: 0
Staff Online:

No staff members are online!
Latest Jobs

Data Analytics Assistant Director, Dubai
Last post by ScottBurkeman in Digital Forensics Job Vacancies on Feb 02, 2012 at 17:14:03

Experienced Forensic Computer Analyst, Surrey
Last post by pickle in Digital Forensics Job Vacancies on Jan 31, 2012 at 12:35:31

eDiscovery Analyst and Assistant Manager, London £35-£50000
Last post by ScottBurkeman in Digital Forensics Job Vacancies on Jan 23, 2012 at 14:12:11

QCC Vacancy - Digital Forensics Sales Executive (London)
Last post by garybrevans in Digital Forensics Job Vacancies on Jan 20, 2012 at 13:17:43

E-Discovery Consultant- London- £40-£50K basic + 10% bonus
Last post by Teval in Digital Forensics Job Vacancies on Jan 20, 2012 at 10:09:56

Senior Software Licence Review Manager. London. Up to £100K
Last post by Tyrrell66 in Digital Forensics Job Vacancies on Jan 19, 2012 at 13:46:41

Senior Forensic Manager - London
Last post by diana2012 in Digital Forensics Job Vacancies on Jan 18, 2012 at 18:05:43

Data Analytics Consultant
Last post by Nicola in Digital Forensics Job Vacancies on Jan 18, 2012 at 18:04:08

Forensic General Investigations Accountant Consultant London
Last post by Nicola in Digital Forensics Job Vacancies on Jan 17, 2012 at 15:13:44

Forensic Technology - Sr. Consultant Needed in Boston, MA
Last post by mfeeley in Digital Forensics Job Vacancies on Jan 12, 2012 at 18:39:18

Blog
· Harry Onderwater
· Forensic Toolkit v3 Tips and Tricks ― Not on a Budget
· Is your client an attorney? Be aware of possible constraints (Part 2)
· iPhone Tracking – from a forensic point of view
· Android Forensics Study of Password and Pattern Lock Protection
· Skype in eDiscovery
· Forensic Toolkit v3 Tips and Tricks – On a budget
· Anonymous, what does it mean?
· YouDetect – Implementing the principles of statistical classifiers and cluster analysis for the purposes of classifying illegally acquired multimedia files
· Advice for Digital Forensics Job Seekers

read more...
Members' Blogs

Start Blogging

What is Computer Forensics?
Computer forensics (or forensic computing) is the use of specialized techniques for recovery, authentication, and analysis of electronic data with a view to presenting evidence in a court of law.
Downloads
  1: Forensic Examination of Digital Evidence: A Guide for Law Enforcement (pdf)
  2: ACPO Good Practice Guide for Computer based Electronic Evidence
  3: Ancysoft Data Recovery Software
  4: Electronic Crime Scene Investigation: A Guide for First Responders (pdf)
  5: HELIX incident response CD
  6: PDA Forensic Tools:An Overview and Analysis
  7: Recover My Files
  8: Autopsy Forensic Browser Version 2.03 (source code)
  9: Handy Recovery
  10: PC On/Off Time

Use of this website signifies your agreement to the Terms of Use/Privacy Policy available here.

All logos and trademarks in this site are property of their respective owner. The comments are property of their posters, all the rest © 2011 Forensic Focus


Interactive software released under GNU GPL, Code Credits, Privacy Policy
.: fisubsilver shadow phpbb2 style by Daz :: CPG-Nuke port by norseman :: ported to CPG-Dragonfly by jamin :.