Word Forensic Analysis And Compound File Binary Format

by Arman Gungor

Microsoft Word forensic analysis is something digital forensic investigators do quite often for document authentication. Because of the great popularity of Microsoft Office, many important business documents such as contracts and memoranda are created using Word. When things go south, some of these documents become key evidence and are subjected to forensic authentication.

My goal in this article is to review a sample Word document in Word Binary File Format, take a look at the underlying data in Compound File Binary (CFB) file format and see what we can find out beyond what mainstream tools show us.

I chose a sample in Word Binary Format (i.e., .doc) rather than in Word Extensions to the Office Open XML File Format (i.e., .docx) because many other file types in the Microsoft universe, such as MSG files, are also based on the CFB file format. I consider CFB to be a treasure trove of forensic artifacts.


Target Document for Word Forensic Analysis

Our target Word document is a document created on 8/30/2018 8:19 PM (PDT) using Word 2007 on a computer running Windows 7 SP-1. It was saved as a DOC file by using the “Word 97-2003 Document” option in the file save dialog in Word. While installing Office 2007, the suspect had chosen “Chris Doe” and “CD” as his “User name” and “Initials” respectively. These preferences are shown in Word options as follows:

Manipulation by the Suspect

It is important to the suspect that this Word document appears to have been created in February 2007. He is somewhat tech-savvy and identifies the FILETIME structures in the Summary Information stream of the document. The creation date (GKPIDSI_CREATE_DTM), last save date (GKPIDSI_LASTSAVE_DTM), and last printed date (GKPIDSI_LASTPRINTED) timestamps look as follows:

Internal Timestamps in The Summary Information Stream Found during Word Forensic Examination

The suspect uses an online date converter and converts the date February 8, 2007 16:15:19 UTC to FILETIME format and arrives at the value “80456D539C4BC701”. He makes a working copy of the file using Windows Explorer, and then replaces the bytes above for the creation timestamp with his new FILETIME value.
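The conversion the suspect performs with an online tool can be reproduced in a few lines. The following is a minimal sketch, assuming only that a FILETIME is a 64-bit little-endian count of 100-nanosecond intervals since January 1, 1601 (UTC); the function name `datetime_to_filetime_bytes` is illustrative and does not come from any library.

```python
from datetime import datetime, timezone

# FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def datetime_to_filetime_bytes(dt: datetime) -> bytes:
    """Return the 8-byte little-endian FILETIME for a timezone-aware datetime."""
    delta = dt - FILETIME_EPOCH
    ticks = (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
    return ticks.to_bytes(8, "little")


backdated = datetime(2007, 2, 8, 16, 15, 19, tzinfo=timezone.utc)
print(datetime_to_filetime_bytes(backdated).hex().upper())  # 80456D539C4BC701
```

The bytes are printed in on-disk (little-endian) order, which is how they appear in a hex editor.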

The suspect checks the internal metadata of the Word document using freely available tools such as olefile and ExifTool and sees that the creation date internal timestamp is reported as he expected. The following is the output by olefile (emphasis added):

The suspect then fires up Word to see if his edits are recognized by Word as he intended. Word shows the properties of the file as follows:

To his surprise, the original creation date of 8/30/2018 8:19 PM (UTC -7) is still there! Running a search for the byte sequence for that FILETIME value (006A3A5CD940D401) returns no results. So, where is this date coming from?

He also notices a discrepancy between how Word counts the number of words in the document (538), and how ExifTool and olefile count them (539). He is not too concerned about this from a Word forensic authentication perspective.

Perplexed by the mysterious creation date, the suspect goes back to the drawing board, does more research and learns that Word documents also contain a Dop structure, which stores their creation (dttmCreated), last modification (dttmRevised), and last print (dttmLastPrint) dates as DTTM structures.

The DTTM structure is quite different from a FILETIME structure. It looks as follows:

* Day of the week is an unsigned integer starting with Sunday (0x0) and ending with Saturday (0x6).
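As a minimal sketch of that layout, based on the DTTM definition in [MS-DOC]: the 32-bit little-endian value packs, from the least significant bit upward, the minute (6 bits), hour (5 bits), day of month (5 bits), month (4 bits), year minus 1900 (9 bits), and day of the week (3 bits). The `Dttm` tuple and `decode_dttm` function are illustrative names, and the sample bytes are the 0F42 B286 value that the suspect derives later in this article.

```python
import struct
from typing import NamedTuple


class Dttm(NamedTuple):
    year: int
    month: int
    day: int
    hour: int
    minute: int
    weekday: int  # 0 = Sunday ... 6 = Saturday


def decode_dttm(raw: bytes) -> Dttm:
    """Unpack a 4-byte little-endian DTTM into its bit fields."""
    (value,) = struct.unpack("<I", raw)
    return Dttm(
        minute=value & 0x3F,                  # bits 0-5
        hour=(value >> 6) & 0x1F,             # bits 6-10
        day=(value >> 11) & 0x1F,             # bits 11-15
        month=(value >> 16) & 0x0F,           # bits 16-19
        year=1900 + ((value >> 20) & 0x1FF),  # bits 20-28, stored as year - 1900
        weekday=(value >> 29) & 0x07,         # bits 29-31
    )


print(decode_dttm(bytes.fromhex("0F42B286")))
# Dttm(year=2007, month=2, day=8, hour=8, minute=15, weekday=4)
```

There is no field for seconds or for a time zone, so a DTTM can only be read as a local time with minute precision.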

The suspect finally finds the DTTM structure that represents the creation date of the document (dttmCreated). It looks as follows:

Word dttmCreated Value in Dop Found during Word Forensic Analysis

He then converts his desired creation date (Thursday, February 8, 2007 08:15 AM (UTC -8)) to DTTM as follows (note that the DTTM structure does not contain any data for seconds, nor does it contain time zone information):

This results in a DTTM value of 0F42 B286. Once the byte sequence is replaced, Word shows the internal creation timestamp as follows:
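The packing step can be sketched as follows. This is an illustration under the assumption that the DTTM bit layout is minute, hour, day of month, month, year minus 1900, and weekday, from the least significant bit upward, as defined in [MS-DOC]; `encode_dttm` is a hypothetical helper name.

```python
import struct
from datetime import datetime


def encode_dttm(dt: datetime) -> bytes:
    """Pack a local date and time into a 4-byte little-endian DTTM."""
    # Python's weekday() starts at Monday = 0; DTTM starts at Sunday = 0.
    wdy = (dt.weekday() + 1) % 7
    value = (
        dt.minute                    # 6 bits
        | dt.hour << 6               # 5 bits
        | dt.day << 11               # 5 bits
        | dt.month << 16             # 4 bits
        | (dt.year - 1900) << 20     # 9 bits
        | wdy << 29                  # 3 bits
    )
    return struct.pack("<I", value)


# Thursday, February 8, 2007 08:15 AM local time; seconds are discarded.
print(encode_dttm(datetime(2007, 2, 8, 8, 15)).hex().upper())  # 0F42B286
```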

Word document metadata after manipulation

Pleased with his accomplishment, the suspect emails the manipulated document and calls it a “native ESI production”. This way, he thinks, he won’t have to worry about inconsistencies in the file system timestamps, although he is confident that, with some more effort, he could doctor them, too.

Forensic Authentication of the Word Document

The forensic examiner receives a copy of the email containing the manipulated Word document for forensic authentication. The email is in MSG format, exported from the mailbox of the attorney who hired her. This is not ideal, but it is the best available copy she has access to at that moment.

After making a preservation copy, she starts by examining the attachments table in the MSG file. The IAttach interface shows the following MAPI properties for the attachment:

MAPI Properties for Attachment Found during Word Forensic Authentication

Manually decoding the FILETIME values for PR_CREATION_TIME and PR_LAST_MODIFICATION_TIME, she finds a creation timestamp of 9/10/2018 22:20:46.9509489 (UTC) and a last modification timestamp of 9/11/2018 04:20:34.1458881 (UTC). The email containing the attachment has creation and sent dates (PR_CREATION_TIME and PR_CLIENT_SUBMIT_TIME) that are both several hours later—9/11/2018 20:48 (UTC).

Considering the presence of high-resolution timestamps and the timing of when the email was sent, it is very likely that the creation and last modification timestamps the examiner identified above were the file system timestamps of the Word document on the suspect’s system as he attached the file to the email. Moreover, it is likely that the file resided in a file system with high timestamp resolution such as NTFS. She makes a note of these timestamps.

The forensic examiner then saves the Word attachment out to a folder for further analysis. She notes that when the attachment is saved, the creation file system timestamp is preserved (i.e., 9/10/2018 22:20:46.9509489 (UTC)), but the last modification file system timestamp is set to the time when she saved the attachment.

Keeping this in mind, she runs the file through X-Ways 19.5 and extracts internal file metadata to get her examination started. X-Ways shows the following information:

Word Document Metadata Extracted by X-Ways Forensics 19.5 during Word Forensic Authentication

There are a few things here that she finds interesting from a Word forensic authentication perspective:

Application Version (AppVersion)

X-Ways reported an AppVersion value of 12.0. Our forensic examiner wants to manually verify where this value is coming from. The Document Summary Information stream of the Word document contains a property named GKPIDDSI_VERSION. This property specifies the version of the application that wrote the property set storage.

In this case, the value of this property is set to 000C 0000. The 000C 0000 bytes indicate the major and minor version of the application, which is interpreted as 12.0 in the following manner:

0xC0000 is equal to 786,432 in decimal, which was the “version” value reported by olefile earlier in the article.
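The interpretation can be sketched in code. This assumes the property is read as the 32-bit integer 0x000C0000, with the major version in the high-order 16 bits and the minor version in the low-order 16 bits; `split_app_version` is an illustrative name.

```python
def split_app_version(version: int) -> tuple[int, int]:
    """Split a GKPIDDSI_VERSION value into (major, minor)."""
    return version >> 16, version & 0xFFFF


major, minor = split_app_version(0x000C0000)
print(f"{major}.{minor}")  # 12.0
print(0x000C0000)          # 786432
```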

Word 12.0 is also known as Word 2007, which was released in late 2006. So, the application version does not pose a problem with the apparent creation date of the document, which is in February 2007.

Operating System Version (OSVersion)

In addition to the AppVersion, both the Summary Information and the Document Summary Information streams in a Word document contain a 4-byte PropertySetSystemIdentifier structure. The first two bytes of the structure indicate the major and minor versions of the operating system that wrote the property set. The last two bytes represent the OSType. According to the specification, OSType must be 0x0002.

Operating System Version Found during Word Forensic Analysis

In the screenshot above, you can see the PropertySetSystemIdentifier structure highlighted. The 06 and 01 values indicate the major and minor version of the OS respectively. Windows 6.1 represents Windows 7, which was released to the public in the second half of 2009, which is after the apparent creation date of the Word document.
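A short sketch of how the four bytes break down follows. The byte sequence 06 01 02 00 is an assumption reconstructed from the values described here (major 6, minor 1, OSType 0x0002 stored little-endian), and `parse_system_identifier` is a hypothetical helper name.

```python
import struct


def parse_system_identifier(raw: bytes) -> tuple[int, int, int]:
    """Return (os_major, os_minor, os_type) from a PropertySetSystemIdentifier."""
    # One byte each for the major and minor version, then a 16-bit OSType.
    os_major, os_minor, os_type = struct.unpack("<BBH", raw)
    return os_major, os_minor, os_type


os_major, os_minor, os_type = parse_system_identifier(bytes.fromhex("06010200"))
print(f"OS version {os_major}.{os_minor}, OSType 0x{os_type:04X}")
# OS version 6.1, OSType 0x0002
```

Keeping OSType separate from the version avoids reading the value as a three-part version number.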

It is easy to jump to a conclusion here and consider this a red flag. However, the forensic examiner knows that when the Word document is saved, the property sets are re-written and the AppVersion and OSVersion values are updated to reflect the application and OS that were used during the last save. Since the Word document was last modified in 2018, it is possible that Windows 7 was used in a subsequent save, but not necessarily when the document was initially created.

Root Entry Date

In addition to the internal creation and last modification timestamps, the Word document contains a FILETIME structure that represents the modification timestamp of the root entry of the CFB format file. This value looks as follows:

CFB Root Entry Modification Date Found during Word Forensic Examination

The FILETIME value 90AD48A1DA40D401 represents 8/31/2018 03:28:05.3530000 AM (UTC). The digital forensics expert notes a few things here:

  1. The root entry modification date is within the same minute as the internal last modification timestamp of the document. This makes sense, as saving the document via Word would cause the modification date of the root entry to be updated.
  2. The root entry timestamp has millisecond precision although the FILETIME structure allows for higher precision. This is consistent with a Word document in this format.
  3. This value matches what X-Ways reported as “Internal Modification”.
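For manual verification, a FILETIME can be decoded without losing its sub-second digits. The sketch below is one possible approach, assuming a little-endian 64-bit count of 100-nanosecond intervals since January 1, 1601 (UTC). Because Python's `datetime` stops at microseconds, the fractional part is kept as a separate tick count. The function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(raw: bytes) -> tuple[datetime, int]:
    """Return (UTC datetime in whole seconds, leftover 100 ns ticks)."""
    ticks = int.from_bytes(raw, "little")
    seconds, sub_ticks = divmod(ticks, 10_000_000)
    return FILETIME_EPOCH + timedelta(seconds=seconds), sub_ticks


def format_filetime(raw: bytes) -> str:
    """Format a FILETIME with all seven fractional digits."""
    moment, sub_ticks = filetime_to_utc(raw)
    return f"{moment:%Y-%m-%d %H:%M:%S}.{sub_ticks:07d} UTC"


# Root entry modification timestamp quoted above.
print(format_filetime(bytes.fromhex("90AD48A1DA40D401")))
```

Trailing zeros in the fractional part show how coarse the stored value really is, which is what the second observation in the list relies on.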

Resolution of The Timestamps in The Summary Information Stream

When the forensic examiner looks at the internal timestamps found in the Summary Information stream of the document, she sees the following:

Resolution of The Timestamps found in The Summary Information Stream during Word Forensic Analysis

The timestamps are as follows:

Last printed date: 003C84C7D940D401 (8/31/2018 03:22:00.0000000)
Creation date: 80456D539C4BC701 (2/8/2007 16:15:19.0000000)
Last modification date: 00E0179EDA40D401 (8/31/2018 03:28:00.0000000)

As you will notice in the highlighted digits, both the last printed and the last modification timestamps have minute precision, while the creation timestamp has second precision. While the FILETIME structure allows for much higher precision, in my experience, the three timestamps found in the Summary Information stream of Word documents have minute precision.

The computer forensics expert notes that the creation timestamp found in the Summary Information stream of the Word document has inconsistent precision compared to the other timestamps. This could be because the timestamp was altered outside of Word.
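This precision check lends itself to automation. The following sketch classifies the coarsest unit that a FILETIME value is aligned to; the function name and the set of units tested are choices made for illustration, not part of any specification.

```python
def filetime_resolution(raw: bytes) -> str:
    """Return the coarsest unit the FILETIME value falls exactly on."""
    ticks = int.from_bytes(raw, "little")
    units = (
        ("minute", 600_000_000),   # 60 s in 100 ns ticks
        ("second", 10_000_000),
        ("millisecond", 10_000),
    )
    for label, ticks_per_unit in units:
        if ticks % ticks_per_unit == 0:
            return label
    return "100 ns"


summary_info = {
    "Last printed": "003C84C7D940D401",
    "Creation": "80456D539C4BC701",
    "Last modification": "00E0179EDA40D401",
}
for name, hex_value in summary_info.items():
    print(f"{name}: {filetime_resolution(bytes.fromhex(hex_value))}")
# Last printed: minute
# Creation: second
# Last modification: minute
```

A genuine timestamp can land on a coarser boundary by chance, so a result like this is a lead to follow up rather than proof of tampering on its own.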

Word Forensic Authentication Findings & Next Steps

The forensic examiner had limited information to work with in this case. She had to look at the Word document in isolation and attempt to find out whether there were any inconsistencies. A summary of her key findings is as follows:

  1. The file system modification timestamp of the Word document (9/11/2018) did not match the last modification timestamps found inside the Word document (8/31/2018). Apparently, the file system timestamp was changed after the Word document was last saved. This may have been caused by someone altering the document outside of Word.
  2. The internal creation date timestamp found in the Summary Information stream of the Word document had an inconsistent resolution when compared to other timestamps in the same stream.

At this point, the forensic examiner will want to review the workstation where the Word document was created, modified and accessed. Examining the artifacts found on the workstation, she will attempt to confirm that the Word document was backdated, and find out how.

Conclusions and Notes

First of all, some of the readers might be thinking that this is not the most efficient way to backdate a Word document. It requires quite a bit of technical knowledge and leaves a lot of room for error. You are right! My goal was to highlight certain data structures that may be valuable to fellow digital forensic examiners—not to help the bad guys get more proficient in document forgery.

It is important to note that the mainstream digital forensics tools that I tested (i.e., X-Ways v19.5 and FTK v6.4), as well as freely available, general-purpose tools such as olefile and ExifTool, did not parse the DTTM structures in the Dop. So, when there was a discrepancy between the dates found in the Summary Information stream of the Word document and those in the Dop, I was able to observe the discrepancy only in Word, and by manual examination. It pays to check the Dop against the Summary Information stream if manipulation of the document is suspected.
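Once both values have been extracted by hand, the cross-check itself is easy to script. The sketch below is one possible approach and rests on several assumptions: the FILETIME is in UTC, the DTTM is a local time with no zone information, and a genuine pair should differ only by a plausible UTC offset (here, a multiple of 15 minutes within 14 hours). Locating the Dop inside the file is not shown, and all function names are illustrative.

```python
import struct
from datetime import datetime, timedelta

FILETIME_EPOCH = datetime(1601, 1, 1)  # naive, interpreted as UTC


def filetime_to_minute(raw: bytes) -> datetime:
    """Decode a FILETIME to naive UTC, truncated to the minute."""
    seconds = int.from_bytes(raw, "little") // 10_000_000
    return FILETIME_EPOCH + timedelta(seconds=seconds - seconds % 60)


def dttm_to_datetime(raw: bytes) -> datetime:
    """Decode a DTTM to a naive local datetime (no seconds, no zone)."""
    (v,) = struct.unpack("<I", raw)
    return datetime(1900 + ((v >> 20) & 0x1FF), (v >> 16) & 0x0F,
                    (v >> 11) & 0x1F, (v >> 6) & 0x1F, v & 0x3F)


def implied_utc_offset(filetime_raw: bytes, dttm_raw: bytes) -> timedelta:
    """Offset that would reconcile the Dop value with the FILETIME."""
    return dttm_to_datetime(dttm_raw) - filetime_to_minute(filetime_raw)


def is_plausible_offset(offset: timedelta) -> bool:
    """True if the gap looks like a time zone rather than a different date."""
    return (abs(offset) <= timedelta(hours=14)
            and offset % timedelta(minutes=15) == timedelta(0))


doctored_dttm = bytes.fromhex("0F42B286")
# Doctored FILETIME against doctored DTTM: consistent, implies UTC -8.
print(is_plausible_offset(implied_utc_offset(bytes.fromhex("80456D539C4BC701"), doctored_dttm)))
# Illustrative mismatch: original FILETIME against the doctored DTTM.
print(is_plausible_offset(implied_utc_offset(bytes.fromhex("006A3A5CD940D401"), doctored_dttm)))
```

The first comparison passes because the suspect changed both values consistently, so this check only catches a manipulation that touched one location and missed the other.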

The various FILETIME structures used throughout the CFB file format have different resolutions by design. For example, the creation, last modification and last print timestamps in the Summary Information stream of the document have minute precision; the edit time value (GKPIDSI_EDITTIME) found in the Summary Information stream of the document represents a duration in hundreds of nanoseconds rather than a full date; and the modification timestamp of the root entry has millisecond precision. It is important to be familiar with the resolution of each timestamp for Word forensic authentication.

AppVersion and OSVersion are data points that can be helpful in identifying discrepancies in a Word document. For example, if the OSVersion points to Windows 10, but none of the timestamps in the document are after 2013, you might want to take a closer look at that document.
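A check along these lines can be sketched as a lookup of public release dates. The table is an assumption based on the usual Windows version numbering and general-availability dates, it is deliberately incomplete, and the names `WINDOWS_RELEASES` and `os_newer_than_timestamps` are hypothetical.

```python
from datetime import date
from typing import Optional

# (major, minor) -> (product, general availability date); not exhaustive.
WINDOWS_RELEASES = {
    (5, 1): ("Windows XP", date(2001, 10, 25)),
    (6, 0): ("Windows Vista", date(2007, 1, 30)),
    (6, 1): ("Windows 7", date(2009, 10, 22)),
    (6, 2): ("Windows 8", date(2012, 10, 26)),
    (6, 3): ("Windows 8.1", date(2013, 10, 17)),
    (10, 0): ("Windows 10", date(2015, 7, 29)),
}


def os_newer_than_timestamps(os_version: tuple[int, int],
                             latest_timestamp: date) -> Optional[bool]:
    """True if the OS was released after the newest timestamp in the file.

    Returns None when the version is not in the lookup table.
    """
    entry = WINDOWS_RELEASES.get(os_version)
    if entry is None:
        return None
    _, released = entry
    return released > latest_timestamp


# Sample document: Windows 7, newest internal timestamp in August 2018.
print(os_newer_than_timestamps((6, 1), date(2018, 8, 31)))    # False
# Hypothetical case from the paragraph above: Windows 10, nothing after 2013.
print(os_newer_than_timestamps((10, 0), date(2013, 12, 31)))  # True
```

Only the newest timestamp is compared, because the property sets reflect the last save and not the original creation.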

Finally, if you perform Word forensic authentication, I strongly recommend that you get familiar with the Microsoft specifications listed below as references. You will see that the CFB file format contains a ton of interesting information which could be valuable in your next investigation. You can also manually verify values parsed by your forensic tools, and find evidence that’s beyond what mainstream tools are able to report.

For example, comparing the outputs of X-Ways v19.5 and FTK v6.4, I found that FTK did not report the AppVersion or the root entry modification date, and reported the OSVersion of the Word document as 6.1.2, which is not entirely correct as the “2” at the end refers to the OSType, not version. Such interpretation issues can quickly be cleared up if you are able to write a quick script, or fire up your hex editor and take a look for yourself.


  • Word (.doc) Binary File Format [MS-DOC]
  • Office Common Data Types and Objects Structures [MS-OSHARED]
  • Compound File Binary File Format [MS-CFB]
  • Object Linking and Embedding (OLE) Property Set Data Structures [MS-OLEPS]

About The Author

Arman Gungor, CCE, is a digital forensics and eDiscovery expert and the founder of Metaspike. He has over 21 years’ computer and technology experience and has been appointed by courts as a neutral computer forensics expert as well as a neutral eDiscovery consultant.
