If you want to prove that a drive is all zero, scan it with a hex editor for a non-zero byte; if the scan finds nothing, the drive is all zeroes.
A checksum that simply adds all bytes will roll over at some point, so a zero result does not prove that every byte is zero, although you would have to be very unlucky to get a zero checksum from a drive holding non-zero data.
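To illustrate the rollover point, here is a minimal Python sketch. It is not taken from any tool mentioned in this thread; the function name `additive_checksum` and the 8-bit accumulator width are assumptions chosen to make the wrap-around easy to see.

```python
def additive_checksum(data: bytes, bits: int = 8) -> int:
    """Sum all bytes, keeping only the low `bits` bits (a fixed-width accumulator)."""
    return sum(data) % (1 << bits)


zeros = bytes(512)        # one all-zero block
ones = b"\x01" * 256      # 256 non-zero bytes

print(additive_checksum(zeros))      # 0
print(additive_checksum(ones))       # 0 as well: the sum 256 wraps to 0 in 8 bits
print(additive_checksum(ones, 32))   # 256: a wider accumulator has not wrapped yet
```

A wider accumulator only postpones the wrap; it does not remove it unless the accumulator is sized for the largest possible sum.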
> If you want to prove that a drive is all zero then scan it
> with a hex editor for a non-zero byte, if the scan fails to
> find a result then the drive is all zeroes.
Agreed, that works fine, but for very large files it can take a long time. The main reason for the app was this: if, for example, you've already captured a drive/device and it appears to be all-zero when you look, you can check the hash already generated during imaging against the allZeroHash from the app and validate in seconds that everything is zero, rather than spending the hour(s) it would take a [^\x00] search to run across a terabyte image.
> A checksum that simply adds all bytes will roll over at
> some point so a zero result will not prove all zero bytes
> although you would have to be very unlucky to get a
> zero checksum with a drive with non-zero data.
Yep, the same is true for the carefully selected CRC: you'd only get a zero result if unlucky or through deliberate manipulation. Another option I considered was a multiple-precision sum (using the GNU bignum library), which would allow summation across 2TB within a 50-bit accumulator without overflow. However, while this would be a little quicker than the hash, I like the other benefits of the hash: its resistance to manipulation, and the ability to "quick check" existing images and non-DD formats.
BTW, I'm not really trying to champion a methodology here - everyone's preferred method will vary, and some are more suitable in different circumstances; but having had the issue of people not understanding the properties of CRCs across large data blocks on a few occasions, I just thought I'd share my thoughts, and the app, in case it helped others.
Phil.
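As a rough check of the 50-bit figure above, the worst case for a plain byte sum is every byte being 0xFF. The sketch below works that out in Python, whose integers are arbitrary-precision; the helper name `bits_needed` is hypothetical, and 2TB is assumed to mean 2 x 2^40 bytes.

```python
def bits_needed(size_bytes: int) -> int:
    """Bits required to hold the largest possible sum of `size_bytes` bytes."""
    worst_case_sum = size_bytes * 0xFF   # every byte at its maximum value
    return worst_case_sum.bit_length()


two_tb = 2 * 2**40                  # assumption: binary terabytes
print(bits_needed(two_tb))          # 49
print(bits_needed(two_tb) <= 50)    # True: a 50-bit accumulator cannot overflow
```

With an accumulator at least this wide, a sum of zero can only come from all-zero data, because no wrap-around is possible.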
So, how do you go about proving that in court?
Here is how I heard it. You image the drive, but after doing your examination on the image you want to start up the computer as the suspect did, same hardware and such. You can't just put the evidence drive in, so you copy the image to a new, similar drive. However, the defense might balk: how do you know that the new, similar drive did not have old data on it, data that you are now using against their client?
I might have heard wrong, but my curiosity remains. How do you go about proving a zero filled drive is actually zero filled? A program or something?
This brings up the old debate of whether each image/case should be made on/stored on a separate device which has been previously securely deleted to avoid the accusation of "cross-contamination".
If you're going to get into explaining hashes to jurors anyway as your scenario implies, then you can state (and show) to the court that the acquisition hash matches the verification hash = no data on the image was added or modified since imaging took place. A simple and established procedure used universally in computer forensics; there's really no forensic requirement to prove a drive is zero-filled in this situation.
There is nothing 'unforensic' whatsoever about placing images from different cases on to the same storage device - be that a HDD, server, NAS or SAN.
No reasonable hash will produce zero for a string of N zero-bytes. I think the direct answer to your question is that you can demonstrate the drive is zeroed by hashing the drive (and recording its size) after zeroing it. Then hash an equivalent volume of zero bytes and show that the two hashes are the same. This is straightforward in Linux and fairly easy to program in a scripting language.
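The procedure just described might look something like this in Python. It is only a sketch under stated assumptions: the function names (`hash_of_zeros`, `hash_of_file`, `is_all_zero_by_hash`) are hypothetical, MD5 is used only because it is the hash mentioned in the thread, and the path is assumed to be an image file or a device node that can be read like a file.

```python
import hashlib

CHUNK = 1024 * 1024  # process 1 MiB at a time so memory use stays small


def hash_of_zeros(size: int, algo: str = "md5") -> str:
    """Hash `size` zero bytes without touching any disk."""
    h = hashlib.new(algo)
    zeros = bytes(CHUNK)
    full_chunks, remainder = divmod(size, CHUNK)
    for _ in range(full_chunks):
        h.update(zeros)
    h.update(zeros[:remainder])
    return h.hexdigest()


def hash_of_file(path: str, algo: str = "md5"):
    """Hash a file (or readable device) and return (hex digest, size in bytes)."""
    h = hashlib.new(algo)
    size = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            h.update(chunk)
            size += len(chunk)
    return h.hexdigest(), size


def is_all_zero_by_hash(path: str, algo: str = "md5") -> bool:
    """Compare the hash of the data with the hash of an equal run of zero bytes."""
    digest, size = hash_of_file(path, algo)
    return digest == hash_of_zeros(size, algo)
```

If a hash and size were already recorded during imaging, only `hash_of_zeros(size)` needs to be computed and compared with the recorded value, so the image itself does not have to be read again.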
> Agreed that works fine; but for very large files can take a long time. The main reason for the app…..
Phil.
Our posts crossed, so I was not trying to compete with your app. I was just trying to give the OP a simple answer to his question using readily available tools, without any comment as to whether it was necessary or not.
The question of zeroing forensic media/using a separate drive for each case is a continual one. There is clearly no technical reason to do it, but if it can avoid a line of questioning then it must be worthwhile, mustn't it? The only downside (apart from the effort) is if explaining why it is done is more complex for the Court than explaining why it does not need to be done.
I'm playing around with doing a zero fill. I've done a 500 gig hard drive and a 2 gig flash drive. However, after I do the zero fill, the checksum doesn't add up to zero. The MD5 hash, for instance, returns a non-zero result. I was under the impression that a zeroed-out drive returns a zero hash. Looking at the disk with a hex editor confirms the drive was zeroed out.
Assuming my information is wrong, how do you ensure the drive is all zeros without having to pore through pages and pages of zeros?
Thanks!
This is in response to the comment above, as well as those that followed.
The problem is that the terms checksum, CRC and hash are being used interchangeably - they each use a different algorithm in their calculations. A very straightforward method for showing that a drive has been 'zeroed' - the original question - is to use a checksum program capable of handling very large numbers. The checksum algorithm adds up the number of '1' bits - if the final answer is "0", then there are no '1' bits, and you can say with confidence that the drive was zeroed. I like this because it can be easily demonstrated on a whiteboard to anyone.
lsg
> A very straightforward method for showing that a drive has been 'zeroed' - the original question - is to use a checksum program capable of handling very large numbers.
A simpler method that avoids the problem of an arbitrarily long accumulator is to do a logical OR of every byte on the disk. This has the effect of 'collecting' all the one bits which will answer the question and would only take a few lines of code.
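A minimal Python sketch of the logical-OR idea, for illustration only: the function name `or_of_stream` is hypothetical, and it takes any binary file object so that it can be tried on an in-memory buffer as well as an opened image or device.

```python
import io
from functools import reduce
from operator import or_


def or_of_stream(stream, chunk_size: int = 1 << 20) -> int:
    """OR every byte of the stream into a single 8-bit accumulator."""
    acc = 0
    while chunk := stream.read(chunk_size):
        # Iterating over bytes yields integers, so they can be OR-ed directly.
        acc = reduce(or_, chunk, acc)
    return acc


print(or_of_stream(io.BytesIO(bytes(4096))))                # 0: no bit set anywhere
print(or_of_stream(io.BytesIO(bytes(100) + b"\x10\x03")))   # 19 (0x13): set bits collected
```

The accumulator never needs more than 8 bits, so there is no overflow to worry about; a result of 0 means no bit was set anywhere in the data.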
If I was asked to check if a disk is just zeros, then the answer has to be Yes or No.
I would start by skipping through the disk checking maybe every millionth sector. If a sector contains a single set bit then the answer is No, and you can stop. This could find an area of the disk that has been used, without searching the whole disk.
After that, just sequentially check every byte of every sector, until either the end of the disk is reached or a set bit is found.
The final answer will be a Yes or No. No sumcheck is required.
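The two passes described above could be sketched in Python roughly as follows. This is an illustration, not a tested forensic tool: the names `is_zero` and `find_nonzero_sector` are hypothetical, a 512-byte sector is assumed, and `dev` is any seekable binary file object (an opened image, a device node, or an in-memory buffer). A practical version would read larger blocks than one sector at a time in the second pass.

```python
import io

SECTOR = 512  # assumed sector size in bytes


def is_zero(block: bytes) -> bool:
    """True if every byte in the block is zero."""
    return block.count(0) == len(block)


def find_nonzero_sector(dev, total_sectors: int, stride: int = 1_000_000):
    """Return the index of a sector holding a set bit, or None if all are zero."""
    # Pass 1: sample every `stride`-th sector, hoping for a quick "No".
    for n in range(0, total_sectors, stride):
        dev.seek(n * SECTOR)
        if not is_zero(dev.read(SECTOR)):
            return n
    # Pass 2: check every sector in order, stopping at the first set bit.
    dev.seek(0)
    for n in range(total_sectors):
        if not is_zero(dev.read(SECTOR)):
            return n
    return None


# In-memory stand-in for a disk: 10 sectors with one bit set in sector 7.
image = bytearray(10 * SECTOR)
image[7 * SECTOR + 3] = 0x01
print(find_nonzero_sector(io.BytesIO(bytes(image)), 10, stride=4))  # 7
```

A return value of `None` is the "Yes, all zero" answer; any sector index is a "No". Note that the sampling pass can only ever prove "No"; a "Yes" always requires the full sequential pass.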
Happy New Year
My last post for the decade!
To address a few points from above…
Though CRC is simple to manipulate, this is not a reason to shy away from it (because you would have to be the manipulator). The 2 main reasons we do not use it are
*SPEED, and
*there is a rough 1 in 30,000 chance of checksum collision (i.e. an incorrect checksum is produced and still validates the current byte). See Birthday Attack / Birthday Collision if you're interested in this topic. (Technical note: this case won't occur here, because the research is specific to using CRCs as a network transmission verification tool, so you can use a CRC here if you don't mind the speed hit. Without the network transmission, there's no risk of a bit flip being produced.)
Why you should not use disk-level operations
You may need to track bad clusters on your own. This adds an unnecessary level of complexity. (Remember, bad sectors are NEVER written to … even when we're zeroing out a disk.)
Why you should not add up all bytes and test for a non-nil value
Possibility of variable overflow in your program. If anything, test if the current byte is non-zero, and once the 1st is reached, break, returning your finding.
Why you should not use Logical OR
Unnecessary comparison. You can simply check for the 1st non-zero and jump out. No need to do any math!
> The final answer will be a Yes or No. No sumcheck is required
mscotgrove's exactly right.
What's the checksum of a zeroed out disk when MD5 and CRC32 are used? Why are the checksums different for two zeroed out disks of differing size?
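One way to explore that question is to compute both values for zero runs of different lengths. The sketch below uses Python's standard `hashlib` and `zlib` modules; the helper names are hypothetical and the small sizes are stand-ins for real disk sizes (for a real disk the zeros would be fed in chunks rather than held in memory). The values differ with length because MD5 includes the message length in its padding, and CRC32 as implemented in zlib starts from a non-zero initial value, so each additional zero byte still changes the running state.

```python
import hashlib
import zlib


def md5_of_zeros(n: int) -> str:
    """MD5 hex digest of n zero bytes."""
    return hashlib.md5(bytes(n)).hexdigest()


def crc32_of_zeros(n: int) -> int:
    """CRC32 (zlib variant) of n zero bytes."""
    return zlib.crc32(bytes(n))


for n in (512, 1024, 4096):
    print(n, md5_of_zeros(n), f"{crc32_of_zeros(n):08x}")
```

Neither value is zero for these inputs, and each size gives a different result, which is why a zeroed drive has to be compared against the hash of the same number of zero bytes rather than against a "zero hash".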


