
How to explain different hash values?

6 Posts
5 Users
0 Reactions
7,484 Views
CHA23x
(@cha23x)
New Member
Joined: 3 years ago
Posts: 2
Topic starter  

As I understand it, the hash of a file is the hash of its contents. Metadata such as the file name, timestamps, permissions, etc. have no influence on the hash.
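One way to check this claim on your own system is a small test: write identical bytes to two files with different names and very different modification times, then hash both. This sketch uses Python's standard `hashlib` with SHA-256, which is an assumed choice of algorithm (MD5 or SHA-1 would behave the same way here). The function names are made up for illustration.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash only the bytes stored in the file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def demo_metadata_does_not_matter() -> tuple[str, str]:
    """Create two files with equal content but different names and mtimes."""
    data = b"identical content\n"
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "original.bin")
        b = os.path.join(tmp, "renamed_copy.bin")
        for path in (a, b):
            with open(path, "wb") as f:
                f.write(data)
        # Change file system metadata only: the two modification times differ.
        os.utime(a, (1_600_000_000, 1_600_000_000))
        os.utime(b, (315_532_800, 315_532_800))  # 1980-01-01 00:00:00 UTC
        return sha256_of_file(a), sha256_of_file(b)


if __name__ == "__main__":
    h1, h2 = demo_metadata_does_not_matter()
    print(h1 == h2)  # name and mtime are not part of the hashed bytes
```

If this prints `True` but your copied files still hash differently, the difference has to be in the file contents themselves.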

Now we have copied a large amount of data several times (for forensic analysis), and for a few files we are seeing the same size and content but different modified timestamps. We strongly assume that this is caused by a known issue with ROBOCOPY (modified date set to 2/1/1980).

What I now do not understand at all: the hash values are different too.
How would you explain this strange result?

 


   
AmNe5iA
(@amne5ia)
Estimable Member
Joined: 9 years ago
Posts: 175
 

You are correct in your assumption that a change to the modified date shouldn't result in a change to the file's hash. But that is only true for file system timestamps. If the modified timestamp is stored inside the file, then changing it will obviously change the hash of the file, as the contents of the file itself have changed. An example of this would be the Last Modified date stored within a Word document.

As an exercise, you could create a Word document with some content and save it. Hash it, then open it again. Save it again without making any changes to the actual content, and you should find that the hash is now different.
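The exercise above can also be simulated without Word. A .docx file is a ZIP container, and every ZIP entry carries its own modification date inside the archive bytes. The Python sketch below (standard library only; the entry name and text are invented for illustration) builds two archives whose entries have identical names and content and differ only in that internal timestamp.

```python
import hashlib
import io
import zipfile


def build_zip(date_time: tuple[int, int, int, int, int, int]) -> bytes:
    """Build an in-memory ZIP with one entry whose internal timestamp is date_time."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("document.xml", date_time=date_time)
        zf.writestr(info, b"<p>same text</p>")
    return buf.getvalue()


v1 = build_zip((2020, 5, 1, 10, 0, 0))
v2 = build_zip((2020, 5, 1, 10, 0, 2))  # "saved again" two seconds later

# The stored entry content is identical...
print(zipfile.ZipFile(io.BytesIO(v1)).read("document.xml") ==
      zipfile.ZipFile(io.BytesIO(v2)).read("document.xml"))
# ...but the archive bytes, and therefore the file hashes, differ.
print(hashlib.sha256(v1).hexdigest() == hashlib.sha256(v2).hexdigest())
```

The two archives have the same size, so a size check alone would not reveal the change; only the hash (or a binary comparison) does.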

If the hashes are different, then the contents of the files are different. To even start to figure out what may have caused this, you need to compare both versions of the files at the binary level to see which parts of the file have changed.
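For a first look at which bytes differ, a few lines of Python are enough. This is a minimal sketch (the script and function names are hypothetical) that streams both files in chunks and reports every differing offset; a byte shown as `--` means that file has already ended.

```python
import sys


def diff_bytes(path_a: str, path_b: str, chunk_size: int = 1 << 16):
    """Yield (offset, byte_a, byte_b) for every position where the files differ.

    byte_a or byte_b is None when that file is shorter than the offset.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                return
            n = max(len(a), len(b))
            for i in range(n):
                byte_a = a[i] if i < len(a) else None
                byte_b = b[i] if i < len(b) else None
                if byte_a != byte_b:
                    yield offset + i, byte_a, byte_b
            offset += n


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python bindiff.py file1 file2
    for off, x, y in diff_bytes(sys.argv[1], sys.argv[2]):
        fx = "--" if x is None else f"{x:02X}"
        fy = "--" if y is None else f"{y:02X}"
        print(f"{off:08X}: {fx} {fy}")
```

Differences clustered in a small region (for example a few bytes near the start of the file) often point to an embedded timestamp or header field rather than to damaged data.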


   
(@Anonymous 6593)
Guest
Joined: 17 years ago
Posts: 1158
 

> As I understand the hash of a file is the hash of its contents. Metadata such as the file name, timestamps, permissions, etc. have no influence on the hash.

That would be true in most cases. However, it is up to you to know how the tools you are using actually compute hashes. You don't say what tools you are using, so we are not in a particularly good position to make suggestions.

> Now we have copied a large amount of data several times (for forensic analysis) and we are seeing for a few files the same size and content of data but different values of the modified timestamp. We strongly assume that this is caused by a known issue of ROBOCOPY (modified date set to 2/1/1980).

Don't stop there. Verify it. Under what circumstances does the version of ROBOCOPY you are using behave like that? Always? That is, can you repeat the job (or a subset of it), say, three times and get exactly the same result each time? Or does something (even something insignificant, like a timestamp or a hash value) change between attempts?
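To compare repeated runs of the same job, one option is to record a manifest of every copied file (relative path, size, hash) per run and list the files whose entries are not identical across all runs. A sketch in Python under these assumptions: SHA-256 as the hash, each run copied into its own destination folder, and the function names are hypothetical.

```python
import hashlib
import os


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> dict[str, tuple[int, str]]:
    """Map each file's path (relative to root) to (size in bytes, SHA-256)."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            manifest[rel] = (os.path.getsize(full), sha256_of_file(full))
    return manifest


def compare_runs(manifests: list[dict[str, tuple[int, str]]]) -> dict[str, set]:
    """Return the files whose (size, hash) is not identical in every run.

    A missing file shows up as None in the set of observed results.
    """
    all_paths = set().union(*manifests)
    unstable = {}
    for path in sorted(all_paths):
        results = {m.get(path) for m in manifests}
        if len(results) > 1:
            unstable[path] = results
    return unstable
```

An empty result from `compare_runs` over three runs is evidence that the job is repeatable; any entry it returns is a concrete file to investigate.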

> What I now do not understand at all: the hash values are different too. How would you explain this strange result?

Something changes your source data while you are working, or your data path is not forensically clean, or you are not using your tools in the right way.  Or all of those. Have you validated your imaging methodology on the relevant platform? Or do you have the necessary data to do such a validation?  (This would basically be a set of files containing hash validation data, i.e. data for which you already have authoritative results.)
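Validation against data with known, authoritative hashes can be scripted too. The sketch below assumes the reference hashes are stored in the common `sha256sum` text layout (`<hex digest>  <relative path>` per line); that format, SHA-256, and the function names are assumptions for illustration, not something the thread specifies.

```python
import hashlib
import os


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def load_reference(lines) -> dict[str, str]:
    """Parse '<hex digest>  <relative path>' lines into {path: digest}."""
    ref = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        digest, rest = line.split(" ", 1)
        # GNU layout: after the digest and one space comes ' ' (text mode)
        # or '*' (binary mode), then the file name.
        name = rest[1:] if rest[:1] in (" ", "*") else rest
        ref[name] = digest.lower()
    return ref


def validate(root: str, reference: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (missing, mismatched) file names relative to root."""
    missing, mismatched = [], []
    for name, expected in sorted(reference.items()):
        path = os.path.join(root, name)
        if not os.path.isfile(path):
            missing.append(name)
        elif sha256_of_file(path) != expected:
            mismatched.append(name)
    return missing, mismatched
```

Running this on the destination of a test copy tells you whether your data path reproduces the reference set exactly before you rely on it for real evidence.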

You don't say how you run ROBOCOPY, so you have to answer this yourself: do any of the command line options you used affect the forensic cleanness of the original data or the data path? Did it return an exit status indicating an error?
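Checking the exit status is easy to automate. A hedged Python sketch: the bit meanings below are as I recall them from Microsoft's robocopy documentation (1 = files copied, 2 = extra files in the destination, 4 = mismatches, 8 = copy failures, 16 = serious error, and any value of 8 or more means at least one failure), so please verify them against the documentation for your Windows release. The function names are hypothetical.

```python
import subprocess

# Exit-code bits as described in Microsoft's robocopy documentation.
ROBOCOPY_BITS = {
    1: "one or more files were copied",
    2: "extra files or directories exist in the destination",
    4: "mismatched files or directories were detected",
    8: "some files or directories could not be copied",
    16: "serious error: robocopy did not copy any files",
}


def decode_robocopy_exit(code: int) -> tuple[bool, list[str]]:
    """Return (failed, messages); a code of 8 or higher means at least one failure."""
    messages = [text for bit, text in ROBOCOPY_BITS.items() if code & bit]
    if code == 0:
        messages.append("no files were copied; no failures and no mismatches")
    return code >= 8, messages


def run_robocopy(source: str, dest: str, *options: str) -> tuple[bool, list[str]]:
    """Run robocopy (Windows only) and decode its exit status.

    check=True is deliberately not used: codes 1-7 are successful outcomes.
    """
    completed = subprocess.run(["robocopy", source, dest, *options])
    return decode_robocopy_exit(completed.returncode)
```

For example, `run_robocopy(r"\\server\share\case01", r"E:\case01", "/E")` (hypothetical paths) returns a `(failed, messages)` pair you can write into your case notes instead of trusting the console output alone.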

How did you establish that the hash data are wrong? You are not working on live data, I hope?

You may want to create a test job from one single file that doesn't make it through unchanged, and then repeat the job for that file only. If that keeps failing, you have something to work with. If it doesn't fail, you can start looking at another area of robocopy's behaviour that sometimes causes problems with access to remote shares.
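The single-file test can be wrapped in a small harness. In this Python sketch, `copy_func` is a stand-in for the copy step under test: `shutil.copy2` is only a placeholder, and a real test would plug in a function that invokes your actual robocopy command for that one file. The source is hashed before and after, so a source that changes during the test is reported separately from a copy step that alters data. All names are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile
from typing import Callable


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeat_single_file(source: str,
                       copy_func: Callable[[str, str], object] = shutil.copy2,
                       runs: int = 3) -> dict:
    """Copy one file `runs` times into fresh directories and compare every result."""
    before = sha256_of_file(source)
    copies = []
    for _ in range(runs):
        with tempfile.TemporaryDirectory() as dest_dir:
            dest = os.path.join(dest_dir, os.path.basename(source))
            copy_func(source, dest)
            copies.append(sha256_of_file(dest))
    after = sha256_of_file(source)
    return {
        "source_stable": before == after,
        "all_copies_match_source": all(h == before for h in copies),
        "copy_hashes": copies,
    }
```

If `source_stable` is False, the problem is upstream of the copy; if only `all_copies_match_source` is False, the copy step itself is changing the data.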

Robocopy is not a perfect tool to use, as its logs do not contain all the information you would need to collect, particularly the release version. Nor does it report the command line options you used; instead it reports the options it decided were in effect (you may need to verify that those are correct). I also have a vague memory that you need to take particular care to catch any error messages yourself, as they don't end up in the log file. (But that may be from using an old release, and not the one you are using.)
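One way around these logging gaps is to record the run yourself: the exact command line as issued, start and end times, exit code, and everything written to stdout and stderr. A minimal Python sketch, assuming a JSON-lines log file; the function name and record layout are hypothetical, and the OS platform string is included only as extra context because the tool's own log may not state its version.

```python
import datetime
import json
import platform
import subprocess


def run_and_record(argv: list[str], log_path: str) -> int:
    """Run a command and append a JSON record of what happened to log_path."""
    started = datetime.datetime.now(datetime.timezone.utc).isoformat()
    completed = subprocess.run(argv, capture_output=True, text=True, errors="replace")
    record = {
        "argv": argv,                      # the command line exactly as issued
        "platform": platform.platform(),   # OS details of the machine that ran it
        "started_utc": started,
        "finished_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,        # messages that may not reach the tool's own log
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return completed.returncode
```

Because the record stores the arguments you passed rather than the options the tool reports, you can compare the two afterwards.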

 


   
mokosiy
(@mokosiy)
Trusted Member
Joined: 13 years ago
Posts: 55
 
Posted by: @cha23x

What I now do not understand at all: the hash values are different too.
How would you explain this strange result?

We can only speculate from a distance. That said, you have everything you need to figure it out pretty quickly. I think the most effective approach is to dig into the data of the same files with different hash values.

  1. Select the smallest possible "identical" files with different hashes
  2. Use a binary comparison tool like this one to see the hex differences between them: https://www.guiffy.com/Binary-Diff-Tool.html
  3. The different bytes will serve you as the best clues to answer the question
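Step 1 can be automated if each copy has been summarized as a manifest mapping relative path to (size in bytes, hash). That manifest structure and the function name are assumptions for this Python sketch; it returns the files that have the same size in both copies but different hashes, smallest first, which are the cheapest ones to inspect in step 2.

```python
def candidate_files(manifest_a: dict[str, tuple[int, str]],
                    manifest_b: dict[str, tuple[int, str]]) -> list[tuple[int, str]]:
    """Return (size, path) for files with equal size but different hash, smallest first."""
    hits = []
    for path, (size_a, hash_a) in manifest_a.items():
        other = manifest_b.get(path)
        if other is None:
            continue  # file missing from the other copy: a different problem
        size_b, hash_b = other
        if size_a == size_b and hash_a != hash_b:
            hits.append((size_a, path))
    return sorted(hits)


# Example with made-up manifests (sizes in bytes, shortened hash strings):
copy1 = {"a.doc": (100, "aa"), "b.pdf": (20, "bb"), "c.txt": (5, "cc")}
copy2 = {"a.doc": (100, "ab"), "b.pdf": (20, "bc"), "c.txt": (5, "cc")}
print(candidate_files(copy1, copy2))
```

Files whose sizes also differ are excluded on purpose: they are a different symptom from the "same size, different hash" case described in the question.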

If the GUI tool is not good enough, there is also paid software such as Araxis.

As a last resort, there is a Windows console utility that compares files in binary mode:

fc.exe /b file1 file2

   
CHA23x
(@cha23x)
New Member
Joined: 3 years ago
Posts: 2
Topic starter  

Thanks for these helpful answers, especially the procedures you recommended. We are now analyzing the affected files in more depth.

The tech guys on my team do see a connection between the ROBOCOPY issue and the hash value differences. I'll keep you updated ...


   
(@surfandwork)
Eminent Member
Joined: 19 years ago
Posts: 26
 

For copying large files and folders containing a lot of files I use FastCopy ( https://fastcopy.jp/).

I tested several forensic file copying programs, and FastCopy was the only one that met all of my requirements: it copies 1 TB+ files and folders nested 5+ levels deep, verifies copies by hash, writes a text log, preserves the original dates/times, runs portably from a USB flash drive, is updated a few times a year, and is free.
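The two checks that matter most in this thread, hash verification and preserved timestamps, can be illustrated in a few lines. This Python sketch is not how FastCopy works internally; it uses `shutil.copy2` (which copies data plus metadata such as the modification time) and then verifies both. The 2-second tolerance is an assumption to allow for file systems such as FAT that store modification times at 2-second resolution; the function name is hypothetical.

```python
import hashlib
import os
import shutil


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src: str, dst: str, mtime_tolerance: float = 2.0) -> dict:
    """Copy src to dst keeping its timestamps, then verify content hash and mtime."""
    shutil.copy2(src, dst)
    src_hash, dst_hash = sha256_of_file(src), sha256_of_file(dst)
    src_mtime, dst_mtime = os.stat(src).st_mtime, os.stat(dst).st_mtime
    return {
        "hash_ok": src_hash == dst_hash,
        "mtime_ok": abs(src_mtime - dst_mtime) <= mtime_tolerance,
        "sha256": dst_hash,
    }
```

Checking the two results separately matters here: a copy can have a correct hash but a wrong timestamp (as with the 1980 dates in the question), or the reverse.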


   