Excel hashing  

RABIDFOX
(@rabidfox)
New Member

So I did some hash tests on files in Excel. I made 4 files: one was an exact copy (same hash), one I renamed in Windows (also the same hash value), and one I renamed using the Save As feature, which displayed a different hash value. Can anyone explain why this happened?

Posted : 01/12/2018 1:32 am
tracedf
(@tracedf)
Active Member

If you opened the file in Excel and chose "Save As", the metadata was probably updated. Even if the change was not visible to you, something in the file changed.

Posted : 01/12/2018 3:31 am
athulin
(@athulin)
Community Legend

RABIDFOX wrote: So I did some hash tests on files in Excel. I made 4 files: one was an exact copy (same hash), one I renamed in Windows (also the same hash value), and one I renamed using the Save As feature, which displayed a different hash value. Can anyone explain why this happened?

The best way to figure that out is, usually, to compare the files, byte by byte. Easiest way is probably to do

C:\Users\Whoever> COMP book1.xlsx book2.xlsx

and examine the output. You'll get a list of places where the two files differ. If they do … hashes will differ as well, of course.
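For anyone without COMP to hand, the same byte-by-byte comparison can be roughly sketched in Python. This is only an illustration: the function names `first_differences` and `sha256_of` are made up for this example, and it reads both files fully into memory, which is fine for small workbooks.

```python
import hashlib
from pathlib import Path


def first_differences(path_a, path_b, limit=10):
    """Return (diffs, len_a, len_b).

    diffs holds up to `limit` tuples of (offset, byte_a, byte_b) for the
    positions where the two files differ within their common length.
    """
    data_a = Path(path_a).read_bytes()
    data_b = Path(path_b).read_bytes()
    diffs = []
    for offset, (byte_a, byte_b) in enumerate(zip(data_a, data_b)):
        if byte_a != byte_b:
            diffs.append((offset, byte_a, byte_b))
            if len(diffs) >= limit:
                break
    return diffs, len(data_a), len(data_b)


def sha256_of(path):
    """Hash the raw bytes of a file; the file name plays no part in the result."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()
```

Calling `first_differences("book1.xlsx", "book2.xlsx")` returns the first few differing offsets plus both file lengths. Any difference at all means `sha256_of` will give different values for the two files, while a plain rename leaves the bytes, and so the hash, untouched.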

As xlsx files are zip archives, you can unpack them, and compare the contents. Or, open both in 7zip and check the CRC column. I expect that only the docProps folders will show different CRC data. If you want to find exactly where the difference is located, just go on from there.
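If 7zip is not available, the CRC check can be approximated with Python's standard `zipfile` module, which exposes the CRC-32 stored for each archive member. A minimal sketch, with `member_crcs` and `changed_members` being names invented for this example:

```python
import zipfile


def member_crcs(xlsx_path):
    """Map each archive member name to the CRC-32 recorded in the zip directory."""
    with zipfile.ZipFile(xlsx_path) as archive:
        return {info.filename: info.CRC for info in archive.infolist()}


def changed_members(path_a, path_b):
    """Return sorted member names whose CRC differs, or that exist in only one file."""
    crcs_a = member_crcs(path_a)
    crcs_b = member_crcs(path_b)
    all_names = set(crcs_a) | set(crcs_b)
    return sorted(name for name in all_names if crcs_a.get(name) != crcs_b.get(name))
```

`changed_members("book1.xlsx", "book2.xlsx")` should then list only the parts of the workbook that actually changed between the two saves.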

Posted : 01/12/2018 6:40 am
randomaccess
(@randomaccess)
Active Member

Athulin, the second way you suggested is probably going to yield more useful results. As you pointed out, the xlsx format is a zip, so I think the first one might show that they're different, but the data will still be compressed.

If you unzipped both documents and then hashed the individual components, you'd probably see the difference quickly. My guess is that internal metadata stored in docProps is what has changed (which was also suggested by athulin).
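As a rough sketch of that idea, the components can be hashed straight out of the archive without unpacking to disk. The helper names `member_hashes` and `report_differences` are hypothetical, and SHA-256 is just one reasonable choice of hash:

```python
import hashlib
import zipfile


def member_hashes(xlsx_path):
    """Return {member name: SHA-256 hex digest of its decompressed content}."""
    with zipfile.ZipFile(xlsx_path) as archive:
        return {
            name: hashlib.sha256(archive.read(name)).hexdigest()
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries, if any
        }


def report_differences(path_a, path_b):
    """Return {member name: (hash_a, hash_b)} for members that differ.

    A hash of None means the member is missing from that archive.
    """
    hashes_a = member_hashes(path_a)
    hashes_b = member_hashes(path_b)
    report = {}
    for name in sorted(set(hashes_a) | set(hashes_b)):
        hash_a, hash_b = hashes_a.get(name), hashes_b.get(name)
        if hash_a != hash_b:
            report[name] = (hash_a, hash_b)
    return report
```

Because the hashes are taken over the decompressed content, a difference here reflects a real change in that component rather than a difference in how the archive was compressed.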

Posted : 01/12/2018 11:30 am
athulin
(@athulin)
Community Legend

randomaccess wrote: Athulin, the second way you suggested is probably going to yield more useful results. As you pointed out, the xlsx format is a zip, so I think the first one might show that they're different, but the data will still be compressed.

It may yield results, but in this particular case, I think the only useful result is if the OP begins to understand what's going on.

Posted : 01/12/2018 7:04 pm
RABIDFOX
(@rabidfox)
New Member

I forgot you could extract them. The two files within the Excel archive that had changed were core.xml and workbook.xml.
core.xml is where the metadata is actually stored, so the modified time affects it, and workbook.xml contains a unique document ID that changes.
Thanks for the help, guys.
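For reference, the timestamps in core.xml can be read out with a few lines of Python. This sketch assumes the usual Office Open XML layout, where the file sits at `docProps/core.xml` and the timestamps are `dcterms:created` and `dcterms:modified` elements directly under the root. `core_timestamps` is a name made up for the example.

```python
import xml.etree.ElementTree as ET
import zipfile

# Dublin Core terms namespace used for the created/modified elements (assumed layout).
DCTERMS = "http://purl.org/dc/terms/"


def core_timestamps(xlsx_path):
    """Return {'created': ..., 'modified': ...} from docProps/core.xml.

    A value is None when the corresponding element is absent.
    """
    with zipfile.ZipFile(xlsx_path) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    timestamps = {}
    for tag in ("created", "modified"):
        element = root.find(f"{{{DCTERMS}}}{tag}")
        timestamps[tag] = element.text if element is not None else None
    return timestamps
```

Comparing the result for the original workbook and the Save As copy should show the modified value moving, which on its own is enough to change the hash of the whole file.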

Posted : 01/12/2018 8:32 pm
randomaccess
(@randomaccess)
Active Member

athulin wrote: It may yield results, but in this particular case, I think the only useful result is if the OP begins to understand what's going on.

Looks like that's happened.

Good work, OP. Digging into the weeds of the file format is always a good place to start when trying to understand how this whole crazy world fits together.

Posted : 02/12/2018 7:20 am