Hi There
I am working on a case where we found an Excel file on a suspect's PC that belonged to a company he used to work for. Now the company wants to prove that this file is the same as their original file.
What tools are out there to compare files forensically to see if they are the same?
So far I have found Compare IT and WinDiff.
Thanks
I would suggest that the question is far too broad.
First off, Excel is a binary format, particularly the versions prior to Office 2007. The usual way to determine whether files are exactly identical is to compare hashes (e.g. MD5), but changing a single bit also changes the hash. If the user opened the file on their own system, you may find that this action alone modifies enough information within the binary contents of the file (i.e. metadata) to alter the hash.
I'd start with file sizes, then document the hash and metadata. Probably the best way to get a definitive answer on this would be to just open the files and compare the two.
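For the "document the size, hash and metadata" step, something like the following Python sketch (standard library only; the function name fingerprint is just illustrative) records the size, MD5 and SHA-256 of each file plus the file-system modification time. Run it against forensic copies rather than the originals: reading a file does not change its contents or hash, but on some systems it can update the last-access timestamp.

```python
import hashlib
import os
from datetime import datetime, timezone


def fingerprint(path: str, chunk_size: int = 1 << 20) -> dict:
    """Return size, MD5, SHA-256 and mtime for one file (read-only)."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    st = os.stat(path)
    return {
        "path": path,
        "size": st.st_size,
        "md5": md5.hexdigest(),
        "sha256": sha256.hexdigest(),
        # File-system modification time in UTC; this is NOT the
        # "last saved" time stored inside the Excel file's own metadata.
        "mtime_utc": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(fingerprint(p))
```

Recording both MD5 and SHA-256 costs one extra pass in memory only, and gives you a value that matches older reports (MD5) alongside a stronger one.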
I had a very similar case in which a contractor claimed that an ex-employee had taken a copy of their job bidding software, which was implemented in Excel. The bidding program was the key to my client's success because it contained data and algorithms that allowed him to underbid competitors and still make a profit.
In this case, the ex-employee changed certain column names, macro names, etc., but the essence of the program remained the same.
We were able to convince a judge that the defendant's system was a copy of the plaintiff's by looking at similarities in the macros and the location and values of key data elements.
In other words, brute force.
If the data (not the code) is important, you can always dump each spreadsheet to a CSV file and diff the files.
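If the sheets have been exported to CSV (for example with Save As > CSV on a working copy), a cell-by-cell comparison can report differences by cell reference rather than by line. This Python sketch assumes both exports have the same layout (no inserted or deleted rows) and the same encoding, UTF-8 by default; Excel's plain CSV export may use the system code page instead (e.g. cp1252), so pass that if needed. The function names are hypothetical.

```python
import csv
from itertools import zip_longest


def col_letter(index: int) -> str:
    """Convert a 0-based column index to Excel letters (0 -> A, 26 -> AA)."""
    letters = ""
    n = index + 1
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def read_rows(path: str, encoding: str) -> list:
    # newline="" is what the csv module expects for files it parses.
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))


def compare_csv(path_a: str, path_b: str, encoding: str = "utf-8") -> list:
    """List (cell_ref, value_a, value_b) for every cell that differs.

    A cell present in one file but missing from the other is reported
    with None on the missing side.
    """
    diffs = []
    rows_a = read_rows(path_a, encoding)
    rows_b = read_rows(path_b, encoding)
    for r, (row_a, row_b) in enumerate(zip_longest(rows_a, rows_b, fillvalue=[]), start=1):
        for c, (va, vb) in enumerate(zip_longest(row_a, row_b, fillvalue=None)):
            if va != vb:
                diffs.append((f"{col_letter(c)}{r}", va, vb))
    return diffs
```

For example, if the only change is that B2 went from 10 to 12, compare_csv("original.csv", "suspect.csv") returns [("B2", "10", "12")].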
Also, if this is pre-Office 2007, don't forget the metadata which might help.
Assuming there are differences, hashing is not relevant; the hashes will just tell you the files are different.
If it is an .xlsx file, use WinZip to extract the parts of the file; they are mostly XML files. (You may need to rename the file to .zip.) You then want to do a text compare of these XML files and determine whether the differences are relevant.
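A rough Python sketch of that text compare, standard library only. Python's zipfile module ignores the extension, so no renaming is needed. Because Excel typically writes each XML part as one very long line, the sketch re-indents the XML with minidom before diffing so that changes show up element by element. The default part name xl/worksheets/sheet1.xml is only the usual location of the first worksheet (the sheet-to-part mapping lives in xl/workbook.xml and its relationships file), and the function names are hypothetical.

```python
import difflib
import zipfile
from xml.dom import minidom


def pretty_part(xlsx_path: str, part: str) -> list:
    """Read one XML part from an OOXML (zip) file and re-indent it."""
    with zipfile.ZipFile(xlsx_path) as zf:
        data = zf.read(part)
    # One element per line, so a line diff pinpoints the changed element.
    return minidom.parseString(data).toprettyxml(indent="  ").splitlines()


def diff_part(path_a: str, path_b: str, part: str = "xl/worksheets/sheet1.xml") -> list:
    """Unified diff of the same XML part in two .xlsx files."""
    return list(
        difflib.unified_diff(
            pretty_part(path_a, part),
            pretty_part(path_b, part),
            fromfile=f"{path_a}:{part}",
            tofile=f"{path_b}:{part}",
            lineterm="",
        )
    )
```

An empty list means the part is identical after re-indenting; otherwise the "-" and "+" lines show which elements changed.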
For a 2003 file, I would go with the idea of a CSV dump and then a text compare of those files. This should highlight the lines that differ.
From my limited use of WinDiff, I would suggest it is a good starting point for the text compares.
Do you have access to the hard drive the suspect file was retrieved from, as well as the file and drive from the client's system? You can hash the drives and files and look for matches. If you are able to make forensic images of the two systems, that would help. You can also look through deleted files and unallocated space for instances of the file.
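If the drives have been imaged and mounted read-only, a simple Python sketch like this walks the mounted tree and lists every file whose SHA-256 matches the known original. It only finds exact, allocated copies visible through the file system; deleted files and unallocated space still need a forensic suite or carving tool. The root path and function names are placeholders.

```python
import hashlib
import os


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_by_hash(root: str, known_sha256: str) -> list:
    """Return sorted paths under root whose SHA-256 equals known_sha256."""
    known = known_sha256.lower()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if sha256_of(path) == known:
                    matches.append(path)
            except OSError:
                # Unreadable entries (permissions, broken links) are skipped
                # here; in real casework, log them instead of ignoring them.
                continue
    return sorted(matches)
```

Renamed copies are still found, since only the contents are hashed, but any copy that was opened and re-saved will not match.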
As the others said, however, if the files have been handled quite a bit, the data might have changed. In cases such as this I find the client has passed the file around numerous times and/or opened and emailed it, and only THEN asks for comparisons. Try to stem the changes by capturing the data forensically before you try to compare.
The metadata and OLE content can be analyzed. There is the now famous case of the Blair document.
FOCA Online can give you some examples.
Maybe fuzzy hashing?
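The usual tool for fuzzy hashing is ssdeep (context-triggered piecewise hashing). The sketch below is only a toy Python illustration of the idea and is not ssdeep-compatible: chunk boundaries are chosen by a simple sum over a 7-byte rolling window, each chunk is hashed, and the chunk sequences are scored with difflib.SequenceMatcher rather than ssdeep's own scoring. The window and block size are arbitrary assumptions. Because boundaries follow the content, inserting a few bytes only changes the chunks around the edit, so a lightly edited copy still scores close to 1.0.

```python
import hashlib
from collections import deque
from difflib import SequenceMatcher

WINDOW = 7  # bytes in the rolling window that decides chunk boundaries


def ctph_chunks(data: bytes, block_size: int = 64) -> list:
    """Split data at content-defined boundaries and hash each chunk.

    A boundary is placed wherever the sum of the last WINDOW bytes,
    modulo block_size, equals block_size - 1.
    """
    digests = []
    window = deque()
    rolling = 0
    start = 0
    for i, byte in enumerate(data):
        window.append(byte)
        rolling += byte
        if len(window) > WINDOW:
            rolling -= window.popleft()
        if rolling % block_size == block_size - 1:
            digests.append(hashlib.sha1(data[start:i + 1]).hexdigest()[:16])
            start = i + 1
    if start < len(data):
        # Trailing bytes after the last boundary form the final chunk.
        digests.append(hashlib.sha1(data[start:]).hexdigest()[:16])
    return digests


def similarity(a: bytes, b: bytes, block_size: int = 64) -> float:
    """Score from 0.0 (no chunks in common) to 1.0 (same chunk sequence)."""
    return SequenceMatcher(
        None, ctph_chunks(a, block_size), ctph_chunks(b, block_size), autojunk=False
    ).ratio()
```

With a binary .xls, a small edit made in Excel can rewrite larger parts of the file, so a score like this is an indicator to follow up, not proof of copying.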
Has anyone been playing around with Office 2007 .docx and .xlsx files?
If you change the extension to .zip you will see several XML files. Each of these can be hashed individually and compared with the corresponding file from another document to see which sections match.
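A minimal Python sketch of that per-part comparison (standard library only, hypothetical function names). It hashes each member's decompressed contents rather than the .zip as a whole, so zip-level differences such as timestamps or compression level do not cause false mismatches.

```python
import hashlib
import zipfile


def part_hashes(path: str) -> dict:
    """SHA-256 of each member's decompressed contents in an OOXML file."""
    with zipfile.ZipFile(path) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info.filename)).hexdigest()
            for info in zf.infolist()
            if not info.is_dir()
        }


def compare_parts(path_a: str, path_b: str) -> dict:
    """Classify every part as 'same', 'different', 'only_a' or 'only_b'."""
    a, b = part_hashes(path_a), part_hashes(path_b)
    result = {}
    for name in sorted(a.keys() | b.keys()):
        if name not in b:
            result[name] = "only_a"
        elif name not in a:
            result[name] = "only_b"
        else:
            result[name] = "same" if a[name] == b[name] else "different"
    return result
```

If the only part reported as different is docProps/core.xml, which holds core properties such as author and last-modified dates, the content parts are identical and only the metadata has changed.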
X-Ways displays the contents of docx as a group of XML files by default. It is after all a container format.
If the hashes were different but the data looks the same, I would start by extracting the individual OLE data streams, ignoring the property streams, and see whether they match.
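A sketch of that approach in Python. Reading the OLE2 container assumes the third-party olefile package; the comparison itself needs only the standard library. Property-set stream names begin with the control character \x05 (for example \x05SummaryInformation and \x05DocumentSummaryInformation), and those are the ones skipped. In a pre-2007 .xls the cell data lives in the Workbook stream, but a re-saved file can still differ there (it records, among other things, the name of the user who last saved it), so a mismatch in that stream is a reason to look closer rather than proof that the content differs. Function names are hypothetical.

```python
import hashlib


def is_property_stream(path: str) -> bool:
    # OLE property-set streams have names starting with the \x05 control
    # character, e.g. "\x05SummaryInformation" (author, dates, etc.).
    return any(part.startswith("\x05") for part in path.split("/"))


def compare_streams(streams_a: dict, streams_b: dict) -> dict:
    """Compare two {stream_path: bytes} maps, ignoring property streams."""
    a = {p: hashlib.sha256(d).hexdigest() for p, d in streams_a.items() if not is_property_stream(p)}
    b = {p: hashlib.sha256(d).hexdigest() for p, d in streams_b.items() if not is_property_stream(p)}
    common = a.keys() & b.keys()
    return {
        "same": sorted(p for p in common if a[p] == b[p]),
        "different": sorted(p for p in common if a[p] != b[p]),
        "only_a": sorted(a.keys() - b.keys()),
        "only_b": sorted(b.keys() - a.keys()),
    }


def read_ole_streams(path: str) -> dict:
    """Return {"storage/stream": bytes} for every stream in an OLE2 file."""
    import olefile  # third-party (pip install olefile); only needed here

    ole = olefile.OleFileIO(path)
    try:
        return {
            "/".join(entry): ole.openstream(entry).read()
            for entry in ole.listdir(streams=True, storages=False)
        }
    finally:
        ole.close()
```

Typical use would be compare_streams(read_ole_streams("original.xls"), read_ole_streams("suspect.xls")), run on forensic copies of both files.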