Compare two Excel files forensically

General (Technical, Procedural, Software, Hardware etc.)

Last Post by PaulSanderson 16 years ago

9 Posts

8 Users


andries

(@andries)

Active Member

Joined: 17 years ago

Posts: 12

Topic starter 06/01/2010 10:13 pm

Hi There

I am working on a case where we found an Excel file on a suspect's PC that belonged to a company he used to work for. Now the company wants to prove that this file is the same as their original file.

What tools are out there to compare files forensically to see if they are the same?

I found Compare IT and windiff.

Thanks

keydet89

(@keydet89)

Famed Member

Joined: 21 years ago

Posts: 3568

06/01/2010 10:29 pm

I would suggest that the question is far too broad.

First off, Excel is a binary format, particularly the versions prior to Office 2007. Tools usually used to determine whether files are exactly identical would include running MD5 hashes, but changing a single bit would also change the hash. If the user opened the file on their own system, you may find that that action alone modified enough information within the binary contents of the file (i.e., metadata) to alter the hash.

I'd start with file sizes, then document the hash and metadata. Probably the best way to get a definitive answer on this would be to just open the files and compare the two.
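As a rough illustration of the size-then-hash step described above, here is a minimal Python sketch. The function names `file_digest` and `compare_files` are made up for this example, and SHA-256 is added alongside MD5 as an assumption about what you might want in your notes. It only tells you whether the two files are bit-for-bit identical, nothing more.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_files(path_a, path_b):
    """Collect sizes and hashes of two files (hypothetical helper)."""
    a, b = Path(path_a), Path(path_b)
    report = {
        "size_a": a.stat().st_size,
        "size_b": b.stat().st_size,
        "md5_a": file_digest(a, "md5"),
        "md5_b": file_digest(b, "md5"),
        "sha256_a": file_digest(a, "sha256"),
        "sha256_b": file_digest(b, "sha256"),
    }
    # Identical hashes mean identical bytes; different hashes say nothing
    # about how different the contents are.
    report["identical"] = report["sha256_a"] == report["sha256_b"]
    return report
```

Run it against working copies of the evidence, not the originals. A mismatch here is where the metadata and content comparison starts, not where it ends.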

seanmcl

(@seanmcl)

Honorable Member

Joined: 19 years ago

Posts: 700

06/01/2010 10:48 pm

I had a very similar case in which a contractor claimed that an ex-employee had taken a copy of their job bidding software, which was implemented in Excel. The bidding program was the key to my client's success because it contained data and algorithms which allowed him to underbid competitors and still make a profit.

In this case, the ex-employee changed certain column names, macro names, etc., but the essence of the program remained the same.

We were able to convince a judge that the defendant's system was a copy of the plaintiff's by looking at similarities in the macros and the location and values of key data elements.

In other words, brute force.

If the data (not the code) is important, you can always dump each spreadsheet to a CSV file and diff the files.
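A minimal sketch of the diff step, assuming each sheet has already been exported to CSV (for example via Save As in Excel on a working copy). The function name `diff_csv_text` and the default file labels are invented for this example; the diff itself uses Python's standard `difflib`.

```python
import difflib


def diff_csv_text(text_a, text_b, name_a="original.csv", name_b="suspect.csv"):
    """Return unified-diff lines between two CSV exports (hypothetical helper).

    An empty list means the two exports are textually identical.
    """
    a_lines = text_a.splitlines(keepends=True)
    b_lines = text_b.splitlines(keepends=True)
    return list(
        difflib.unified_diff(a_lines, b_lines, fromfile=name_a, tofile=name_b)
    )


if __name__ == "__main__":
    # Made-up sample rows, purely for illustration.
    original = "item,cost\nsteel,100\nlabour,250\n"
    suspect = "item,cost\nsteel,100\nlabor,250\n"
    print("".join(diff_csv_text(original, suspect)))
```

Note that a CSV export only carries the cell values as displayed, so formulas, macros and formatting are not compared this way.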

Also, if this is pre-Office 2007, don't forget the metadata which might help.

mscotgrove

(@mscotgrove)

Prominent Member

Joined: 17 years ago

Posts: 940

06/01/2010 11:04 pm

Assuming there are differences, hashing is not relevant, i.e. the files are different.

If it is an xlsx file, use WinZip to extract the elements of the files; they are mostly XML files. (You may need to rename the file to .zip.) You then want to do a text compare of these XML files and determine whether the differences are relevant.
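The extract-and-text-compare idea can be sketched in Python with the standard `zipfile` and `difflib` modules, which read the container directly so no renaming is needed. The function name `diff_ooxml_parts` is made up for this example, and splitting the XML at `><` boundaries is just an assumption to make the diffs readable, since these parts are often stored as one long line.

```python
import difflib
import zipfile


def _readable_lines(raw):
    """Decode a part and break it at tag boundaries for line-based diffing."""
    text = raw.decode("utf-8", errors="replace")
    return text.replace("><", ">\n<").splitlines()


def diff_ooxml_parts(path_a, path_b):
    """Map part name -> diff lines for parts that differ or are missing.

    Hypothetical helper: parts that are byte-identical are left out.
    """
    results = {}
    with zipfile.ZipFile(path_a) as za, zipfile.ZipFile(path_b) as zb:
        names_a, names_b = set(za.namelist()), set(zb.namelist())
        for name in sorted(names_a | names_b):
            if name not in names_b:
                results[name] = ["only in first file"]
            elif name not in names_a:
                results[name] = ["only in second file"]
            else:
                raw_a, raw_b = za.read(name), zb.read(name)
                if raw_a != raw_b:
                    results[name] = list(
                        difflib.unified_diff(
                            _readable_lines(raw_a),
                            _readable_lines(raw_b),
                            fromfile="a/" + name,
                            tofile="b/" + name,
                            lineterm="",
                        )
                    )
    return results
```

Deciding whether a reported difference matters (a changed cell value versus a changed last-saved timestamp, say) is still a manual judgment.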

For a 2003 file, I think the idea of a CSV dump followed by a text compare of the resulting files is the way to go. This should highlight lines that are different.

From my limited use of WinDiff, I would suggest it is a good starting point for the text compares.

douglasbrush

(@douglasbrush)

Prominent Member

Joined: 16 years ago

Posts: 812

06/01/2010 11:20 pm

Do you have access to the hard drive of the system the suspect file was retrieved from, as well as the file and drive from the client's system? You can hash the systems and files and look for matches. If you are able to take forensic images of the two systems, that would be helpful. You can also search deleted files and unallocated space for instances of the files.

As the others said, however, if the files have been handled quite a bit the data might have changed. In cases such as this I find the client has passed the file around numerous times and/or opened and emailed it before and THEN asked for comparisons. Try to stem the changes by capturing the data forensically before you try to compare.

The metadata and OLE content can be analyzed. There is the now famous case of the Blair document:
http://www.computerbytesman.com/privacy/blair.htm

FOCA Online can give you some examples:
http://www.informatica64.com/FOCA/default.aspx

reedsie

(@reedsie)

Eminent Member

Joined: 16 years ago

Posts: 48

07/01/2010 5:11 am

Maybe fuzzy hashing?

douglasbrush

(@douglasbrush)

Prominent Member

Joined: 16 years ago

Posts: 812

14/01/2010 6:28 am

Anyone been playing around with Office 2007 .docx and .xlsx files?

If you change the extension to .zip you will see several XML files. These can individually be hashed and compared to another document the same way to see if there are sections that match.
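A minimal sketch of hashing each part of the container and looking for matches, using Python's standard `zipfile` and `hashlib`. The names `part_hashes` and `matching_parts` are invented for this example, and SHA-256 is an arbitrary choice of algorithm. Only parts with the same internal name are compared here, which is an assumption; a renamed sheet part would need its hash looked up across all names instead.

```python
import hashlib
import zipfile


def part_hashes(path, algorithm="sha256"):
    """Return {part name: hex digest} for every file inside the container."""
    with zipfile.ZipFile(path) as z:
        return {
            info.filename: hashlib.new(algorithm, z.read(info)).hexdigest()
            for info in z.infolist()
            if not info.is_dir()
        }


def matching_parts(path_a, path_b):
    """Split the parts present in both files into (same, different) name lists."""
    hashes_a, hashes_b = part_hashes(path_a), part_hashes(path_b)
    common = sorted(set(hashes_a) & set(hashes_b))
    same = [n for n in common if hashes_a[n] == hashes_b[n]]
    different = [n for n in common if hashes_a[n] != hashes_b[n]]
    return same, different
```

The hashes are taken over the decompressed part contents, so two files can share matching parts even when the containers themselves hash differently.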

Patrick4n6

(@patrick4n6)

Honorable Member

Joined: 16 years ago

Posts: 650

14/01/2010 7:14 am

X-Ways displays the contents of docx as a group of XML files by default. It is after all a container format.

PaulSanderson

(@paulsanderson)

Honorable Member

Joined: 19 years ago

Posts: 651

14/01/2010 3:09 pm

If the hashes are different but the data looks the same, then I would start by extracting the individual OLE data streams, ignoring the property streams, and see whether they match.
