Dear All,
A colleague and I are currently battling with 200 USB devices. The background is that we need to check that the information stored on each USB pen is identical to the next.
We have tried several approaches, but none of them works, or they take so long that it would be the opening ceremony of the Olympics before we had finished all 200.
We have hashed each file (including the folders) on the USB device, but doing so means we would also need to check every file and make sure no additional files were present. After testing we have found that some programs don't work at all, and some compare the files but don't detect the extras that could be on the devices, making that method pointless.
We have tried to get an MD5 of the drive as a whole, but I know that with 200 devices there will be batch variations between the different USB devices, meaning the number of bytes differs slightly, and as such the MD5s are different, making this solution null and void.
Finally, we tried to hash the volume, but the hashes don't match. We're unsure why, but it means this method is out too.
Each USB device is the same type, model and manufacturer (as far as we can tell), and each is the same size (1 GB, 950 MB usable).
Does anyone have any ideas or programs we can try?
Thanks
Greetings,
Here's what I'd do, using Linux.
First, figure out a command to hash all the files on the drive. Here's one suggestion:
find . -type f -a \! -name MD5SUMS | parallel -j+0 "md5sum {} >>MD5SUMS"
or
find path/to/folder -type f -print0 | sort -z | xargs -0 sha1sum | sha1sum
(Both lifted from Stack Overflow.) If you use the first version you should definitely sort the resulting MD5SUMS file before comparing, since the parallel jobs finish in no particular order.
Then, using your "golden" USB drive, generate a master hash list.
For each of the remaining drives, run your hash command and diff the results against the master hash list.
If the diff comes back empty, you're good to go. If not, look at the output to see why. It should show files where the hash doesn't match as well as instances where there are fewer or more files on the current drive than there were on the master.
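Roughly, using the second command above, it could look like this (the mount points and output filenames are just placeholders for whatever you use):

# golden drive: hash every file, sorted so the list is deterministic
cd /mnt/usb
find . -type f -print0 | sort -z | xargs -0 sha1sum > ~/master.sha1

# each candidate drive: same command, then diff against the master list
cd /mnt/usb
find . -type f -print0 | sort -z | xargs -0 sha1sum > ~/candidate.sha1
diff ~/master.sha1 ~/candidate.sha1

Lines only in the master list are missing files, lines only in the candidate list are extra files, and the same path appearing with two different hashes is changed content.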
-David
Try to find out why the volume hashes don't match. Take some images (dd) and do a byte-by-byte compare (in DOS: fc /b image1 image2). If it's just a few bytes different then it may be a volume ID or a date change and can probably be ignored. If there are large areas that differ, these will need to be investigated.
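On Linux the equivalent byte-by-byte compare can be done with cmp; a quick sketch, assuming you have already dd'd two drives to golden.img and drive042.img (names are just examples):

# -l lists every differing byte: offset, then the two octal values
cmp -l golden.img drive042.img | head -50

# or simply report the first difference and stop
cmp golden.img drive042.img

If only a handful of offsets show up, that points at the volume ID / date-field sort of change mentioned above.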
Hashing is a very binary test - identical or different. Almost identical may be what you are looking for.
I would start by imaging all 200 USB drives and creating hashes at the same time. It is easier to investigate multiple images than multiple physical devices.
Have you tried using md5deep?
Create "hash.txt" using the golden key, then run a negative hash match against the USB drives.
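Something along these lines (mount points are placeholders, and the flags are worth checking against your version's man page):

# build hash.txt from the golden drive, recursively, with relative paths
cd /mnt/golden
md5deep -r -l . > ~/hash.txt

# negative match: list any file on a candidate drive whose hash is NOT in hash.txt
cd /mnt/usb042
md5deep -r -l -x ~/hash.txt .

Note that negative matching only flags files that are present but different or extra; it won't notice files that are missing from the candidate drive, so pair it with a file count or a diff of the file listings.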
What make/model are these USBs?
Wear leveling, for example?
Wear leveling should be invisible to any access through the USB port.
If you write to sector xx, the data must always read back from sector xx. Otherwise, the drive could not be moved to a different PC without also carrying a remapping file with it.
What the chip does internally is nobody's concern. Wear leveling is invisible to users.
I stand corrected; I was thinking of housekeeping processes, not the wear leveling.
Yes, housekeeping might change a single bit or byte, maybe within a date field.
This is the limitation of hashing: it only shows yes or no. A compare will indicate whether it is 'close enough'.
Kovar's solution is pretty good. My default answer would have been that if you do the duplication on a Linux system by DD'ing the whole device (partition table and all), you should be able to get the volume hashes to agree. If not, I'd use cmp or something to figure out where they differ and, from there, why.
Remember, though, if you're looking at a volume hash, that mounting the drive read-write changes the bits on the disk and thus the hash.
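A sketch of that approach, assuming the stick shows up as /dev/sdb (the device name is an example; check with lsblk or dmesg first, and read the raw device as root without mounting it so nothing gets written back):

# hash the raw device directly
dd if=/dev/sdb bs=1M | md5sum

# or keep an image around for later investigation
dd if=/dev/sdb of=drive042.img bs=1M
md5sum drive042.img

If two images that should match don't, compare them with cmp as described earlier to see where and how much they differ.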
You could use dcfldd to split the drives during imaging and have it hash each split as it goes. Then you would have a hash for every X bytes, and you could compare how many of those hashes match between drives.
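For example, dcfldd's hashwindow= option gives the per-chunk hashes in a single pass, without having to hash each split file separately afterwards; a sketch, assuming the stick is /dev/sdb (an assumption):

# image the drive, hashing every 1 MB window as it goes
dcfldd if=/dev/sdb of=drive042.img bs=1M hash=md5 hashwindow=1M hashlog=drive042.hashes

# each line of the hashlog covers one 1 MB window; identical drives should
# produce identical hashlogs, and a diff shows which windows disagree
diff golden.hashes drive042.hashes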
You could also try using ssdeep
http://ssdeep.sourceforge.net/
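ssdeep's fuzzy hashes give a similarity score rather than a plain yes/no, which fits the "almost identical" point above. A minimal example against two dd images (filenames are placeholders):

# fuzzy-hash the golden image
ssdeep -b golden.img > golden.ssdeep

# compare a candidate image against it; output includes a 0-100 match score
ssdeep -bm golden.ssdeep drive042.img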