Question about timestamp in Office Open XML files

doublezero
(@doublezero)
Posts: 12
Active Member
Topic starter
 

Hey guys,

I'm working on a case where we are trying to find modifications to an Excel .xlsx file.
These Office files (docx, xlsx, etc.) work as "packages" and can be unpacked like compressed archives.
The files inside these packages contain timestamps, and those timestamps match the file metadata. OK!
But I'm now being questioned on why the folders in that structure are "1 Jan 1970" (the epoch date), and to be honest, I don't know WHY.
Can someone point to some direction?
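For anyone who wants to see what the package itself records before any unpacking tool is involved, here is a minimal Python sketch using only the standard-library `zipfile` module. `list_entries` is a hypothetical helper name; it just reports each entry's name, the `date_time` tuple that `zipfile` decodes from the central directory, and whether the entry is an explicit folder entry.

```python
import zipfile


def list_entries(source):
    """Return (name, date_time, is_dir) for each entry in a ZIP-based package.

    `source` may be a path or a binary file object. `date_time` is the
    (year, month, day, hour, minute, second) tuple zipfile decodes from the
    central directory; the basic ZIP format stores no time zone for it.
    """
    with zipfile.ZipFile(source) as zf:
        return [(info.filename, info.date_time, info.is_dir())
                for info in zf.infolist()]
```

Calling `list_entries("Book1.xlsx")` (file name hypothetical) on an .xlsx works directly, since the package is an ordinary ZIP archive.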

 
Posted : 07/04/2020 1:47 am
(@athulin)
Posts: 1156
Noble Member
 

But I'm now being questioned on why the folders in that structure are "1 Jan 1970" (the epoch date), and to be honest, I don't know WHY.

For that a whole lot of other information needs to be collected.

What folders exactly? _rels? xl?

What tool is saying that? Have you validated that it does translate these timestamps correctly? If not, the reason may be 'it's a bug in my software'.

What file timestamp exactly? Zip only supports 'last modification date' as standard, but stores it in two places. And Zip allows for at least one situation in which one of these may be left zero – though it involves encryption, which I assume is not relevant. Or is it some other timestamp you're talking about?
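The "stored in two places" point can be checked directly: each entry has a DOS time/date pair in its local file header (bytes 10 to 13 of the standard local header) and another copy in the central directory. This is a minimal sketch under the assumption of an ordinary, unencrypted archive; `compare_header_times` and `dos_to_tuple` are hypothetical names, and it does not handle the encrypted case where one copy may be masked.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIG = 0x04034B50  # "PK\x03\x04"


def dos_to_tuple(dos_time, dos_date):
    """Decode MS-DOS time/date words into zipfile's 6-tuple layout."""
    return ((dos_date >> 9) + 1980, (dos_date >> 5) & 0x0F, dos_date & 0x1F,
            dos_time >> 11, (dos_time >> 5) & 0x3F, (dos_time & 0x1F) * 2)


def compare_header_times(data: bytes):
    """Return (name, local, central) for entries whose two timestamps differ."""
    mismatches = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            # Local header: signature(4) version(2) flags(2) method(2) time(2) date(2)
            sig, dos_time, dos_date = struct.unpack_from(
                "<I6xHH", data, info.header_offset)
            if sig != LOCAL_HEADER_SIG:
                raise ValueError(f"no local header found for {info.filename!r}")
            local = dos_to_tuple(dos_time, dos_date)
            # info.date_time was decoded from the central directory copy
            if local != info.date_time:
                mismatches.append((info.filename, local, info.date_time))
    return mismatches
```

An empty list means both copies agree for every entry; a non-empty one is worth a closer look before trusting any single tool's display.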

Are you looking at the .xlsx/zip internals, or the result of an unpack? If the latter, perhaps the unpacking tool misbehaves – have you validated that it works as expected?

Does normal use of .xlsx produce other timestamps? If it does, the reason may be that these files were produced by some other software that doesn't follow Microsoft conventions. (I've checked an Office 2010 .xlsx file with PKZIP without seeing anything like what you seem to be describing, so … ???)

Without knowing what tools you are using or exactly where those tools take their information from, it's not easy to suggest an answer.

That said …

One possible anomaly noted: are you sure that '1 jan 1970' is correct? ZIP stores timestamps in MS-DOS format, which has its epoch in 1980, so seeing a date so closely tied to the Unix (and sometimes C) time epoch is odd. (Is that what the question is really about?)
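To make the 1980 point concrete, here is a minimal sketch that decodes the two 16-bit MS-DOS fields ZIP uses (`decode_dos_datetime` is a hypothetical name). The year is a 7-bit offset from 1980, so the field can only express 1980 through 2107; a 1970 date cannot come from it, and an all-zero field is simply invalid (month 0, day 0) rather than the Unix epoch.

```python
import datetime


def decode_dos_datetime(dos_date: int, dos_time: int) -> datetime.datetime:
    """Decode the MS-DOS date and time words stored in ZIP headers."""
    year = 1980 + (dos_date >> 9)      # 7-bit offset from 1980
    month = (dos_date >> 5) & 0x0F     # 4 bits, valid range 1..12
    day = dos_date & 0x1F              # 5 bits, valid range 1..31
    hour = dos_time >> 11              # 5 bits
    minute = (dos_time >> 5) & 0x3F    # 6 bits
    second = (dos_time & 0x1F) * 2     # 5 bits, counted in 2-second units
    # No time zone is stored, so this is a naive (local-time) datetime.
    # Out-of-range fields, e.g. an all-zero date, raise ValueError here.
    return datetime.datetime(year, month, day, hour, minute, second)
```

The lowest valid value, date word 0x0021 with time 0, decodes to 1980-01-01 0:00, which is exactly what Microsoft-written entries show.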

And ZIP does not store timestamps for directories, as far as I can find. Only for files.

On the test I mentioned above, PKZIP (which is as close to an official implementation as I believe I can get) does not list any date for any of the directories of the test .xlsx/ZIP file, and shows the timestamp '1980-01-01 0:00' for all files present. (I used Microsoft Excel 2010 to create and write the file; just the default empty spreadsheet written straight to .xlsx.)

Any directory timestamps thus seem fairly likely to come from one or more of the tools you have used.
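One way to see which folders the archive says anything about: a folder only has metadata if there is an explicit entry whose name ends in '/'; otherwise it exists only as part of some file's path. This is a minimal sketch (`directory_report` is a hypothetical name) that splits the folders of a package into those two groups. Any timestamp shown for an implied folder has to come from the tool, not the archive.

```python
import posixpath
import zipfile


def directory_report(source):
    """Split a ZIP package's folders into explicit entries and implied ones.

    Explicit: the archive holds an entry whose name ends in '/'.
    Implied: the folder only appears inside some other entry's path,
    so the archive carries no metadata for it at all.
    """
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
    explicit = {name.rstrip("/") for name in names if name.endswith("/")}
    implied = set()
    for name in names:
        parent = posixpath.dirname(name.rstrip("/"))
        while parent:  # walk upwards: xl/worksheets -> xl -> ''
            implied.add(parent)
            parent = posixpath.dirname(parent)
    return sorted(explicit), sorted(implied - explicit)
```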

(Added: at this point it might be relevant who actually asked you about the timestamp. If this is in an educational setting, this is as good an illustration as I can think of that tools may not do the right thing, that processes may not be valid for all file systems, and that the more things you do with a particular piece of evidence (the .xlsx file), the larger the risk that unwanted behaviour affects the result. In this case an unknown unpacking tool is at work, as well as an unknown file system, and fairly likely an untested and possibly non-forensic 'tool' to read out the timestamps from that file system. Thus, as teaching, this may be just what is intended.

If an officer of the court asked the question – opposing counsel/attorney, for example – it's a bit different.

You don't need to reply to that question.)

 
Posted : 07/04/2020 7:06 am
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

On the test I mentioned above, PKZIP (which is as close to an official implementation as I believe I can get) does not list any date for any of the directories of the test .xlsx/ZIP file, and shows the timestamp '1980-01-01 0:00' for all files present. (I used Microsoft Excel 2010 to create and write the file; just the default empty spreadsheet written straight to .xlsx.)

I don't have any Office that can create .xlsx, but I checked with 7-zip a few .xlsx files I had (at random; stuff downloaded or sent to me) and I can confirm that in them
1) files have date '1980-01-01 0:00'
2) folders have no date

BUT (only for the record) 😯 by pure chance I have found a single .xlsx that behaves differently (it is encrypted/password-protected); in it
1) files have NO date
2) folders have both created and modified dates (and they are "current", *like* 2013-10-04 9:55)

It is part of a package I downloaded and I don't have (nor need) a password for it; it is only a spreadsheet with some listing that I already got from a previous version.

In the root of the .zip/.xlsx there are:
[6]DataSpaces (Folder with date)
EncryptedPackage (File no date)
EncryptionInfo (File no date)

It seems 7-zip can open it, but it is not really a .zip; rather, it is an OLE compound file:
https://stackoverflow.com/questions/858232/office-open-xml-ooxml-specification-encryption
http://www.lyquidity.com/devblog/?p=35
https://code.google.com/archive/p/ooxmlcrypto/
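A quick way to tell the two container types apart without relying on what 7-zip chooses to display is to look at the first bytes: a normal OOXML package starts with the ZIP local-header signature `PK\x03\x04`, while an OLE compound file starts with the 8-byte signature `D0 CF 11 E0 A1 B1 1A E1`. A minimal sketch; `container_type` and `classify_header` are hypothetical names, and an empty ZIP (which starts with a different record) would come back as "unknown".

```python
ZIP_LOCAL_HEADER = b"PK\x03\x04"
OLE_CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def classify_header(head: bytes) -> str:
    """Classify a file by its leading signature bytes."""
    if head.startswith(ZIP_LOCAL_HEADER):
        return "zip"        # ordinary OOXML package
    if head.startswith(OLE_CFB_SIGNATURE):
        return "ole-cfb"    # e.g. a password-protected OOXML file
    return "unknown"


def container_type(path) -> str:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(8))
```

For an "ole-cfb" result, ZIP-style per-entry timestamps simply do not apply, which fits the different behaviour seen with the protected file.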

jaclaz

 
Posted : 07/04/2020 10:24 am
doublezero
(@doublezero)
Posts: 12
Active Member
Topic starter
 

I’ll be in the lab later today and get more info about it for a reply. Thank you guys.

 
Posted : 07/04/2020 3:32 pm
doublezero
(@doublezero)
Posts: 12
Active Member
Topic starter
 

But I'm now being questioned on why the folders in that structure are "1 Jan 1970" (the epoch date), and to be honest, I don't know WHY.

For that a whole lot of other information needs to be collected.

What folders exactly? _rels? xl?

_rels, docProps, drawings

What tool is saying that? Have you validated that it does translate these timestamps correctly? If not, the reason may be 'it's a bug in my software'.

What file timestamp exactly? Zip only supports 'last modification date' as standard, but stores it in two places. And Zip allows for at least one situation in which one of these may be left zero – though it involves encryption, which I assume is not relevant. Or is it some other timestamp you're talking about?

Are you looking at the .xlsx/zip internals, or the result of an unpack? If the latter, perhaps the unpacking tool misbehaves – have you validated that it works as expected?

On Linux, Ark shows the epoch date, 1 Jan 1970, for the folders I mentioned before; all the other files have modification dates matching the last time each file was modified. Unzip and 7z show 1 Jan 1980 for [Content_Types].xml, and everything else is set to the date of decompression. I'll test more tools.

Does normal use of .xlsx produce other timestamps? If it does, the reason may be that these files were produced by some other software that doesn't follow Microsoft conventions. (I've checked an Office 2010 .xlsx file with PKZIP without seeing anything like what you seem to be describing, so … ???)

Without knowing what tools you are using or exactly where those tools take their information from, it's not easy to suggest an answer.

The original file was created in LibreOffice, but I tested other files with the latest Microsoft Office and the same applies.

That said …

One possible anomaly noted: are you sure that '1 jan 1970' is correct? ZIP stores timestamps in MS-DOS format, which has its epoch in 1980, so seeing a date so closely tied to the Unix (and sometimes C) time epoch is odd. (Is that what the question is really about?)

And ZIP does not store timestamps for directories, as far as I can find. Only for files.

Some directories in my file contain "correct" timestamps (consistent with the file metadata from Aug 2019), but others don't. The same applies to some files inside the zip. I can't see a pattern here…

On the test I mentioned above, PKZIP (which is as close to an official implementation as I believe I can get) does not list any date for any of the directories of the test .xlsx/ZIP file, and shows the timestamp '1980-01-01 0:00' for all files present. (I used Microsoft Excel 2010 to create and write the file; just the default empty spreadsheet written straight to .xlsx.)

Any directory timestamps thus seem fairly likely to come from one or more of the tools you have used.

(Added: at this point it might be relevant who actually asked you about the timestamp. If this is in an educational setting, this is as good an illustration as I can think of that tools may not do the right thing, that processes may not be valid for all file systems, and that the more things you do with a particular piece of evidence (the .xlsx file), the larger the risk that unwanted behaviour affects the result. In this case an unknown unpacking tool is at work, as well as an unknown file system, and fairly likely an untested and possibly non-forensic 'tool' to read out the timestamps from that file system. Thus, as teaching, this may be just what is intended.

If an officer of the court asked the question – opposing counsel/attorney, for example – it's a bit different.

You don't need to reply to that question.)

Yeah, my problem is that the lawyer who asked for this job is concerned about the epoch date timestamp. Everything else that we have can help with the case, but the epoch date can help the opposing side.

 
Posted : 07/04/2020 8:18 pm
doublezero
(@doublezero)
Posts: 12
Active Member
Topic starter
 

Guys, after several tests I can confirm that a "pure" LibreOffice .xlsx stores modification times for ALL files inside the container. A Microsoft .xlsx doesn't.
I think that in the case I'm working on, the original file was created with Microsoft Office and was then modified in LibreOffice, which saved some of the files with their modification times.
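Based only on what was reported in this thread (Microsoft-written entries carry the 1980-01-01 0:00 default, LibreOffice-written entries carry real times), a minimal sketch that splits a package's file entries into those two groups could look like this. `split_by_timestamp` is a hypothetical name; the grouping shows which parts look alike, it is not proof of which program wrote them.

```python
import zipfile

# The lowest DOS date/time value, as written by the Microsoft tests above
DOS_DEFAULT = (1980, 1, 1, 0, 0, 0)


def split_by_timestamp(source):
    """Split file entries into those with the 1980 default and the rest."""
    default, dated = [], []
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # explicit folder entries are not counted here
            group = default if info.date_time == DOS_DEFAULT else dated
            group.append(info.filename)
    return default, dated
```

If the hypothesis is right, entries rewritten by LibreOffice would land in the second list with plausible dates, while untouched Microsoft-written parts would stay in the first.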

 
Posted : 07/04/2020 9:48 pm
(@athulin)
Posts: 1156
Noble Member
 

On Linux, Ark shows the epoch date, 1 Jan 1970, for the folders I mentioned before; all the other files have modification dates matching the last time each file was modified. Unzip and 7z show 1 Jan 1980 for [Content_Types].xml, and everything else is set to the date of decompression.

What version of Ark is that? When I try Ark 17.2.3 (on Kubuntu, latest LTS, default installation) it does not 'show' anything for folders in an 'Untitled.xlsx' created from an empty LibreOffice spreadsheet. For the files, it shows, as you note, the correct date, not 1 jan 1980. (For the Windows test file, it does show 1980, but then we have already established that Microsoft doesn't seem to set file timestamps correctly.)

(Added: see below. I was able to repeat it with a later version of Ark …)

'unzip -l' as well as '7z l' show correct dates for the files in the Linux test .xlsx file, and 1980 for the files in the Windows test file. Directories are not shown at all, except as components of file paths, so there are no timestamps for them.

Extracted file time stamps show up with 'ls -l' to be close enough to extraction time, though ARK produces slightly different results from unzip and 7zip. It looks as if there's a time zone adjustment going on with ARK that does not happen for the other tools, but … it's not a conclusion, only an impression.
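The DOS fields carry no time zone, so every tool has to assume one, which could explain differences like the one seen with Ark. Some ZIP writers also add the Info-ZIP "extended timestamp" extra field (header ID 0x5455), which holds a Unix time in UTC. Whether the files in this case carry that field, and which field Ark actually reads, was not established in this thread, so treat both as assumptions to check. A minimal sketch that parses the field from a raw extra-field byte string (`extended_mtime_utc` is a hypothetical name):

```python
import datetime
import struct

EXTENDED_TIMESTAMP_ID = 0x5455  # Info-ZIP "UT" extra field


def extended_mtime_utc(extra: bytes):
    """Return the UTC mtime from an extended-timestamp extra field, or None."""
    i = 0
    while i + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, i)
        body = extra[i + 4:i + 4 + size]
        # Flags byte first; bit 0 set means a 4-byte mtime follows it.
        if header_id == EXTENDED_TIMESTAMP_ID and len(body) >= 5 and body[0] & 1:
            (mtime,) = struct.unpack_from("<i", body, 1)
            return datetime.datetime.fromtimestamp(mtime, tz=datetime.timezone.utc)
        i += 4 + size  # skip to the next extra field
    return None
```

With `zipfile`, each entry's central-directory extra data is available as `info.extra`, so the UTC value (when present) can be printed next to the time-zone-less `info.date_time` for comparison.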

So … as I can confirm your findings, except for ARK and the 1970 date, I suspect your ARK may not be 'forensically sound', or that your source file may do something aggravating to ARK(?) that I can't repeat. As ARK seems to time-zone adjust timestamps without timezone specification, I would, in general, be inclined to put it on the list of 'Linux tools that are not forensically sound'. But then it probably wasn't designed to be so.

From a practical standpoint, any extracted directory structures from .xlsx = zip archives should come with the warning that directory timestamps are not and cannot be correct, as zip archives do not contain such information (but see jaclaz' posting on encrypted archives).

That makes me scratch my head over your statement that

Some directories in my file contain "correct" timestamps (consistent with the file metadata from Aug 2019), but others don't.

assuming it relates to directories in the .xlsx archive.

(Added later: I repeated the tests with Ark 19.04.3 … and it gives unpacked directories a Jan 1 1970 timestamp:

ath@ubuntu:~/New Folder/Book1$ ls -l
total 16
-rw-rw-r-- 1 ath ath 1304 Jan  1  1980 '[Content_Types].xml'
drwxrwxr-x 2 ath ath 4096 Jan  1  1970 docProps
drwxrwxr-x 2 ath ath 4096 Jan  1  1970 _rels
drwxrwxr-x 5 ath ath 4096 Apr  8 00:38 xl

That's probably slightly better than giving them 'now' as the timestamp, as the earlier version I tested did, but it may still be mistaken for a real timestamp. (The only filesystem I know of that allows explicitly undefined timestamps is ISO 9660; perhaps UDF has it too. On those it would be possible to avoid this particular problem … )

So … the warning about timestamps of extracted directories holds true.)
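For checking an already-extracted tree, a minimal sketch that flags directories whose modification time is exactly Unix time 0, the placeholder Ark 19.04.3 appears to use. `epoch_zero_dirs` is a hypothetical name. It assumes the tool writes exactly 0; `ls -l` shows that value in local time, so on a machine west of UTC it would appear as 31 Dec 1969. An empty result does not make the other directory timestamps real, since older Ark versions used the extraction time instead.

```python
import os


def epoch_zero_dirs(root):
    """Return directories under root whose mtime is exactly the Unix epoch."""
    hits = []
    for dirpath, dirnames, _files in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mtime == 0:  # 1970-01-01 00:00:00 UTC
                hits.append(os.path.relpath(path, root))
    return sorted(hits)
```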

 
Posted : 08/04/2020 7:23 am