As I see it, the hash computed while imaging is calculated over the "stream" of data read from the source and written to the target disk.
As such, when you verify the hash of the written image by comparing it to the one obtained during imaging, you are ONLY confirming that there were no "write errors" on your target, not necessarily that there were no "read errors" on the source.
If you prefer, the hashing is a way to know for sure that the image you examined, and of which you provided a copy to the other party in the trial, has not been tampered with, and that it represents an exact "snapshot" of what was read from the device on a given date.
With perfectly working disks and perfectly working equipment/software, in theory what you read is actually what is on the source.
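Just to make it concrete, here is a minimal sketch (in Python, with hypothetical paths) of "hashing while imaging" and of the later verification of the target:
[code]
import hashlib

CHUNK = 1024 * 1024  # read/write in 1 MiB chunks

def image_and_hash(source_path, target_path):
    """Copy source to target, hashing the data stream as it passes through."""
    h = hashlib.sha1()
    with open(source_path, "rb") as src, open(target_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            h.update(chunk)   # hash what was just read from the source...
            dst.write(chunk)  # ...and write the same bytes to the target
    return h.hexdigest()

def rehash(path):
    """Re-hash a file (e.g. the written image) for verification."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()

# acquisition_hash = image_and_hash("/dev/sdb", "evidence.dd")  # hypothetical paths
# verify_hash = rehash("evidence.dd")
# a match proves the target holds what was READ, not necessarily what is ON the source
[/code]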
But then we need to set aside "pure forensics" for a moment and get to "data recovery".
Hard disks do develop "bad sectors" and do have "malfunctions".
Still, in theory any modern hard disk is intelligent enough to re-map a "weak sector" to a spare one "transparently", and normally the way the disk's internal firmware works is (more or less, see also the sketch after this list):
- let me read (because the OS told me to do so) sector 123456
- hmmm, the ECC sector checksum (or whatever) does not match at first attempt
- let me try to correct the data read through my internal (and not documented) recovery algorithm
- hmmm, nope, it still does not work
- let me try to apply the parity check algorithm (another undocumented feature)
- pheeew, now it matches, good
- to be on the safe side, let me remap sector 123456 to spare sector 999001 (without telling the OS, nor the filesystem) and let me jot down this new translation in my G-list (or P-list or *whatever*)
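Purely as a conceptual model (the real firmware logic is, as said, undocumented), the flow above could be sketched in Python like this:
[code]
class ToyDisk:
    """Toy model only: real firmware internals are undocumented."""

    def __init__(self, sectors):
        self.sectors = dict(sectors)   # sector number -> data
        self.weak = set()              # sectors that fail the first ECC check
        self.g_list = {}               # remap table: logical sector -> spare sector
        self.next_spare = 999001       # hypothetical spare-sector numbering

    def read(self, lba):
        if lba in self.g_list:         # already remapped: use the spare, transparently
            return self.sectors[self.g_list[lba]]
        data = self.sectors[lba]
        if lba not in self.weak:       # ECC matches at the first attempt
            return data
        recovered = data               # pretend the (undocumented) recovery attempts succeed
        spare = self.next_spare        # to be on the safe side, remap to a spare...
        self.next_spare += 1
        self.sectors[spare] = recovered
        self.g_list[lba] = spare       # ...and jot the translation down in the G-list
        return recovered

# disk = ToyDisk({123456: b"some data"}); disk.weak.add(123456)
# disk.read(123456)   # returns the data and silently remaps 123456 -> 999001
[/code]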
It is perfectly possible (in theory and in practice) that in the exact moment you are reading a sector, it "becomes" bad.
What happens then?
In the best case the sector was "weak" but was *somehow* read correctly, it became "bad" a fraction of a nanosecond after having been read, and the disk managed the issue fine.
But what if a given sector passes from "good" to "bad" immediately after you have read it, and the disk does not manage it so gracefully?
The disk, at the next occasion, finds it bad, attempts to recover it and fails (or succeeds but, for *whatever* reason, fails in copying it to the spare sector or in updating the list).
When you then try to re-hash the source drive, you will get either read errors or a different hash.
On the other hand, I believe it is not "common practice" to write the image from the source to several targets at the same time.
So, for a given period of time, you have only a "source" and a "target". The same malfunction may happen to the "target" instead of the source (and you find out only because a new hashing of the target, or of a copy of it, comes out different), in which case I think what is done is to re-image from the original.
In other words, the hashing process is an important part of the procedures but it is not the "only" solution.
A better approach could be a more granular form of hashing, the smallest "atomic" component being a sector or "block".
So you could hash each sector by itself and create a list of hashes, one for each sector, or decide to group 10/100/1000/10000/100000 sectors into a "blocklist" and hash these blocklists.
This would bring, IMHO, two advantages (see also the sketch after this list):
- you know for sure that ONLY a given "blocklist" is affected (and ALL the other ones are fine)
- if more than one blocklist (or many, or all of them) does not hash correctly, then something (be it OS instability, hardware issues or *whatever*) is causing a "generalized" problem before the "whole" image is even completed
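As a minimal sketch (Python again, with an arbitrary choice of sectors per blocklist) of what such "blocklist" hashing could look like:
[code]
import hashlib

SECTOR = 512

def blocklist_hashes(image_path, sectors_per_blocklist=1000):
    """Hash an image in fixed-size groups of sectors, one hash per "blocklist"."""
    block_size = SECTOR * sectors_per_blocklist
    hashes = []
    with open(image_path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            hashes.append(hashlib.md5(block).hexdigest())
    return hashes

def differing_blocklists(hashes_a, hashes_b):
    """Indices of the blocklists whose hashes do not match."""
    return [i for i, (a, b) in enumerate(zip(hashes_a, hashes_b)) if a != b]

# one mismatch   -> ONLY that blocklist is affected, all the others are fine
# many mismatches -> something more "generalized" (OS, hardware, ...) is going on
[/code]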
jaclaz
P.S. It seems that not only is the previous idea nothing new, but it has also been taken to the "next" level:
Distinct Sector Hashes for Target File Detection
Joel Young, Kristina Foster, and Simson Garfinkel, Naval Postgraduate School
Kevin Fairbanks, Johns Hopkins University
http//
On further checking, the idea hinted at in the P.S. above is being implemented with the support of digitalcorpora, see here:
http//
And there is an article (of course behind the usual paywall, but the abstract is enough)
http//
which provides an interesting empirical analysis
This article reports the results of a case study in which the hashes for over 528 million sectors extracted from over 433,000 files of different types were analyzed. The hashes were computed using SHA1, MD5, CRC64, and CRC32 algorithms and hash collisions of sectors from JPEG and WAV files to other sectors were recorded. The analysis of the results shows that although MD5 and SHA1 produce no false-positive indications, the occurrence of false positives is relatively low for CRC32 and especially CRC64. Furthermore, the CRC-based algorithms produce considerably smaller hashes than SHA1 and MD5, thereby requiring smaller storage capacities. CRC64 provides a good compromise between number of collisions and storage capacity required for practical implementations of sector-scanning forensic tools.
It confirms my initial thought that one could use a much simpler algorithm than MD5 for block hashing (thus saving computational resources, i.e. time and space for the hash database).
So, one could "keep" the current MD5 or SHA-1 hashes for the "whole image" but use a much simpler CRC algorithm for "block hashing"; since the purpose here is only verification, and not comparison against a database of known hashes, a simple CRC32 would be enough.
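A minimal sketch of such a mixed approach in Python (whole-image MD5 plus per-block CRC32, with an arbitrary block size):
[code]
import hashlib
import zlib

BLOCK_SIZE = 65536  # 64 KiB blocks, one of the sizes discussed just below

def whole_and_block_hashes(image_path):
    """Return (whole-image MD5, list of per-block CRC32 values)."""
    md5 = hashlib.md5()
    crcs = []
    with open(image_path, "rb") as f:
        for block in iter(lambda: f.read(BLOCK_SIZE), b""):
            md5.update(block)
            crcs.append(zlib.crc32(block) & 0xFFFFFFFF)  # 4 bytes per block
    return md5.hexdigest(), crcs
[/code]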
Putting this together with the considerations about block size in other articles on the same subject, particularly this one:
Using purpose-built functions and block hashes to enable small block and sub-file forensics. Simson Garfinkel, Alex Nelson, Douglas White, and Vassil Roussev.
http//
http//
it would make sense to hash with CRC32 blocks of 16,384 bytes, or maybe 32,768 or even 65,536 bytes.
The "overhead" of the hash database would be anything (for an "average" 500 Mib to 1 Tib disk image) between 30 and 200 Mb, IMHO not trifling, but not really preoccupying.
jaclaz