Windows Search forensics


Analyzing the Windows (Desktop) Search Extensible Storage Engine database

by Joachim Metz


Summary

While some may curse Windows Vista for all its changes, for us forensic investigators it also introduced interesting new 'features'. One is the integration of Windows (Desktop) Search into the operating system. Most corporations have been reluctant to adopt Vista; however, more and more Windows XP systems are being replaced by Windows 7 equivalents. Windows 7 also contains Windows Search and enables it by default. It can actually be challenging to disable, so Windows Search is becoming a relevant source of information in the forensic analysis of Windows systems.

What is not widely known is that Windows Search uses the Extensible Storage Engine (ESE) to store its data. This is the same engine that Microsoft Exchange uses. Because ESE uses a proprietary database format, little information about it is available in the public domain. As a consequence, it is unclear how well different forensic tools support the ESE database format.

Several years after the introduction of Windows Vista and Windows Search, only a handful of forensic analysis tools seem to provide support for the Windows Search database, even though it can be a valuable source of evidence. This paper provides an overview of the ESE database format and the Windows Search database, and of what the latter might contribute to your investigations.


Background

Although the Extensible Storage Engine (ESE) is a generic database engine, forensic analysis of ESE databases seems to be centered around Exchange. Little information about forensic investigation of ESE databases in general seems to have been published in the public domain. As far as I can tell, Mark Woan, author of EseDbViewer, was one of the first to publish information about forensic analysis of ESE databases in general. This was in 2008.

In early 2009, I was getting search results in Windows.edb files (Windows Search databases) on Windows XP systems in some investigations. Neither EnCase nor FTK seemed to offer any support for this file, although they claim to have EDB support. Few other tools seemed to be available to analyze the Windows Search ESE database. However, when investigating Windows Vista systems, the Windows.edb file no longer contained any relevant results.

Together with my attempts to verify my assumptions about the Exchange-related parts of Microsoft Exchange OST files, this triggered me to start working on the ESE database format. I therefore started the libesedb project in September 2009. Findings from the libesedb project and some from Mark Woan's EseDbViewer have been integrated in this document.


Table of Contents

1. Overview of the ESE database format
1.1. Database header
1.2. Page based storage
1.3. Database tables and indexes
2. Analysis of a Windows Search database
2.1. Data obfuscation
2.2. Data compression
2.3. Investigative artifacts and usefulness
2.4. The Vista welcome mail
3. Conclusion
Appendix A. References
Appendix B. GNU Free Documentation License


1. Overview of the ESE database format

The Extensible Storage Engine (ESE) database format is mainly known for its use in Microsoft Exchange, i.e. for the priv1.edb file. What is less widely known is that a lot of Microsoft products use this file format, some of which are Active Directory (ntds.dit), Windows (Desktop) Search (Windows.edb) and Windows Mail (WindowsMail.MSMessageStore).

ESE is also known as Jet Blue, in contrast to Jet Red, which refers to the Microsoft Access database format. Microsoft has kept the specification of the ESE database format closed, although the Jet Blue API has been partially documented on MSDN. The information in this document was obtained from information available on the Internet and from reverse engineering of the file format. The information obtained is maintained in a working document titled: the Extensible Storage Engine (ESE) database (DB) format specification [ESEDB09].

There are three main variants of the ESE, one for Exchange 5.5 (ESE97), one for Exchange 2000 and later (ESE98) and one for Windows NT and later (ESENT). Active Directory and Windows Search use the ESENT version.

Basically an ESE database consists of the following elements:

• database header and a backup
• pages containing:
  • space tree data
  • database table data
  • database index data
  • long value data

The following paragraphs provide an overview of some of these elements.


1.1. Database header

The ESE database starts with a database header. The effective size of the database header is at least 667 bytes. Its first 16 bytes look like this, for example:

00000000: 5c ca 88 0b ef cd ab 89 20 06 00 00 00 00 00 00 \....... .......

The four bytes at offsets 4 through 7 of the database header contain the unique signature '\xef\xcd\xab\x89' of the ESEDB format. Other significant values in the header are the file type, the format version and revision, and the page size. The database header is actually stored in a block the size of a page, which is directly followed by another block containing a backup of the database header. This is one of the data redundancy measures provided in the ESE database format.
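The first 16 bytes from the hex dump above can also be checked programmatically. The following is a minimal Python sketch, not the way libesedb does it. The signature position and the 0x620 version come from this document; that the first four bytes hold a checksum and that the four bytes at offset 12 hold the file type are assumptions not stated in the text above. All values are read as little-endian.

```python
import struct

# Signature bytes as they appear on disk; read as a little-endian
# 32-bit integer they form 0x89abcdef.
ESE_SIGNATURE = b"\xef\xcd\xab\x89"


def parse_header_prefix(data: bytes) -> dict:
    """Decode the first 16 bytes of an ESE database header."""
    if len(data) < 16:
        raise ValueError("need at least 16 bytes of header data")
    # Assumed layout: checksum, signature, format version, file type.
    checksum, signature, version, file_type = struct.unpack("<I4sII", data[:16])
    return {
        "checksum": checksum,
        "signature_ok": signature == ESE_SIGNATURE,
        "format_version": version,
        "file_type": file_type,
    }


# The 16 header bytes from the hex dump above.
sample = bytes.fromhex("5c ca 88 0b ef cd ab 89 20 06 00 00 00 00 00 00")
info = parse_header_prefix(sample)
print(info["signature_ok"], hex(info["format_version"]))
```

A check like this is only a quick plausibility test before handing a file to a real parser; it does not validate the rest of the header.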

Different versions of Windows NT use different revisions of ESE, e.g. Windows XP uses version 0x620 revision 9, Windows Vista uses version 0x620 revision 12 and Windows 7 uses version 0x620 revision 17. Different revisions can have different methods of storing data, e.g. the Windows 7 version of ESE allows for 'native' compression of data; in previous versions applications using ESE needed to do compression themselves, like the (RTF) LZFu compression used by Exchange.

When no measures are taken to detect and handle compressed data, linear search and index-based search techniques will fail. So these techniques do not suffice for finding all the strings in ESE databases.
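To illustrate why a plain keyword search misses compressed values, the sketch below compresses a made-up record and searches both forms. It uses Python's zlib purely as a stand-in: this is not the compression scheme that ESE or Windows Search uses, and the record text and keyword are invented for the example.

```python
import zlib

keyword = b"secret-plan.docx"
# Invented record content, repeated so the compressor has something to work with.
record = b"The user opened secret-plan.docx from the desktop. " * 20

# Stand-in compression only: ESE does not use zlib.
stored = zlib.compress(record)

found_in_plain = keyword in record
found_in_stored = keyword in stored
found_after_decompress = keyword in zlib.decompress(stored)

print(found_in_plain, found_in_stored, found_after_decompress)
```

The same reasoning applies to any scheme that rewrites the byte sequence of a value: a tool has to recognize and undo the encoding before searching.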

The ESE database format is also used for streaming files, e.g. priv1.stm used by Exchange; however, until now little is known about the specifics of these streaming files. ESE uses transaction logs, which in theory could be used to analyze different versions of the data and mutations. However, version analysis is currently in its infancy.

ESE comes with the eseutil tool (or its equivalent, esentutl). Eseutil can be used to print the database header of an ESE database. The following example prints the database header of a Windows Vista Search database (Windows.edb).

eseutil.exe /mh Windows.edb

Initiating FILE DUMP mode...
Database: Windows.edb
File Type: Database
Format ulMagic: 0x89abcdef
Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12

Sometimes you can come across a 'dirty' database. This is a database that was not neatly closed. The following line in the header information indicates that an ESE database is considered 'dirty'.

State: Dirty Shutdown

A 'dirty' database can be repaired using the repair option in eseutil.

eseutil.exe /r Windows.edb

Repairing an ESE database will alter the database file, but might be necessary for tools that cannot open 'dirty' databases. Sometimes a repair is also necessary before eseutil can perform certain operations on 'dirty' databases. Note that a successful repair is not guaranteed. Libesedb [ESEDB09] will try to open the database in its 'dirty' state.
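When many databases have to be triaged, the textual header dump can be parsed to spot 'dirty' databases before choosing a tool. The following Python sketch parses output in the form shown above. It does not call eseutil itself, the function names are hypothetical, and the revision-to-Windows mapping covers only the three revisions mentioned earlier in this document.

```python
# Revisions of format version 0x620 named in this document.
KNOWN_REVISIONS = {9: "Windows XP", 12: "Windows Vista", 17: "Windows 7"}


def parse_header_dump(text: str) -> dict:
    """Turn 'Name: value' lines of a header dump into a dictionary."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            fields[name.strip()] = value.strip()
    return fields


def summarize(fields: dict) -> dict:
    """Extract format version, revision, and shutdown state."""
    version_text, _, revision_text = fields["Format ulVersion"].partition(",")
    revision = int(revision_text)
    return {
        "version": int(version_text, 16),
        "revision": revision,
        "likely_os": KNOWN_REVISIONS.get(revision, "unknown"),
        "dirty": fields.get("State", "").lower().startswith("dirty"),
    }


# Lines taken from the example output in this section.
sample_dump = """\
Initiating FILE DUMP mode...
Database: Windows.edb
File Type: Database
Format ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
State: Dirty Shutdown
"""

summary = summarize(parse_header_dump(sample_dump))
print(summary)
```

Working from the dump text keeps the evidence file untouched; the decision whether to repair a copy can then be made per database.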


1.2. Page based storage

At the lowest level an ESE database stores its data in pages. The size of the pages is stored in the database header and is applied to the entire database. A single page consists of a header, values and an index. A page does not need to be entirely filled, therefore a page has 'page unallocated space' which can contain remnant data. This remnant data can be of interest for forensic analysis.

A feature that affects this remnant data is 'ESE (page) zeroing', which overwrites unused pages with various byte values. The 'zeroing' can be performed manually, by eseutil, or automatically, during online backup. For Exchange, zeroing during online backup is controlled by the following Registry key.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeIS\ParametersSystem\Zero Database During Backup

Currently the actual impact of ESE (page) zeroing for forensic investigations is unknown.

As of Windows Vista Search, a page can contain an error correcting code (ECC). The Microsoft documentation states that these ECCs can only recover single-bit errors. The actual ECC method is not documented. In Windows 7 three additional ECCs were added, which probably allow for multi-bit recovery. This is another data redundancy measure provided in the ESE database format. Note that libesedb currently does not correct errors using ECCs.

A page can contain multiple page values. Eseutil can be used to print the page values in a page. The following example prints the values in page 13 of a Windows Vista Search ESE database (Windows.edb).

eseutil.exe /m /p13 Windows.edb

Initiating FILE DUMP mode...
Database: Windows.edb

Page: 13

expected checksum = 0x5c54a3ab36656192
new checksum format
expected ECC checksum = 0x5c54a3ab
expected XOR checksum = 0x36656192

checksum <0x00FE0000, 8>: 6653122505280414098
(0x5c54a3ab36656192)
dbtimeDirtied <0x00FE0008, 8>: 4646
(0x0000000000001226)
pgnoPrev <0x00FE0010, 4>: 0 (0x00000000)
pgnoNext <0x00FE0014, 4>: 14 (0x0000000e)
objidFDP <0x00FE0018, 4>: 2 (0x00000002)
cbFree <0x00FE001C, 2>: 3636 (0x0e34)
cbUncommittedFree <0x00FE001E, 2>: 0 (0x0000)
ibMicFree <0x00FE0020, 2>: 5151 (0x141f)
itagMicFree <0x00FE0022, 2>: 74 (0x004a)
fFlags <0x00FE0024, 4>: 10242 (0x00002802)
Leaf page
Primary page
New record format
New checksum format


TAG 0 cb:0x000d ib:0x0000 offset:0x0028-0x0034 flags:0x0000
TAG 1 cb:0x0037 ib:0x000d offset:0x0035-0x006b flags:0x0004 (c)
TAG 2 cb:0x0033 ib:0x0044 offset:0x006c-0x009e flags:0x0006 (cd)
...
TAG 73 cb:0x0057 ib:0x1025 offset:0x104d-0x10a3 flags:0x0005 (cv)

First the information about the page header is provided, followed by the locations of the page values. Each page value is defined by a tag (or index entry) and controlled by three flags, which are identified by the characters c, d and v. The actual meaning of the flags is undocumented, but the d flag seems to be used for deleted or defunct values. These deleted values are not overwritten and can therefore be interesting from an investigative point of view.
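The header fields in the dump can be read directly from raw page bytes. The sketch below is a minimal Python example that assumes the field order and sizes printed by eseutil for this Windows Vista page (40 bytes, little-endian); other ESE revisions may lay the header out differently. The split of the 64-bit checksum into an ECC half and an XOR half follows the 'expected' values in the dump. The sample bytes are rebuilt from the dumped values rather than taken from a real file.

```python
import struct
from collections import namedtuple

# Field order and sizes as listed in the eseutil dump:
# 8 + 8 + 4 + 4 + 4 + 2 + 2 + 2 + 2 + 4 = 40 bytes.
PAGE_HEADER = struct.Struct("<QQIIIHHHHI")
PageHeader = namedtuple(
    "PageHeader",
    "checksum dbtimeDirtied pgnoPrev pgnoNext objidFDP "
    "cbFree cbUncommittedFree ibMicFree itagMicFree fFlags",
)


def parse_page_header(page: bytes) -> PageHeader:
    """Decode the 40-byte header at the start of a page."""
    return PageHeader(*PAGE_HEADER.unpack_from(page, 0))


def split_checksum(checksum: int) -> tuple:
    """Return (ECC checksum, XOR checksum) as the upper and lower 32 bits."""
    return checksum >> 32, checksum & 0xFFFFFFFF


# Rebuild the header of page 13 from the values in the dump.
sample = PAGE_HEADER.pack(
    0x5C54A3AB36656192, 4646, 0, 14, 2, 3636, 0, 5151, 74, 0x2802
)
header = parse_page_header(sample)
ecc, xor = split_checksum(header.checksum)
print(header.pgnoNext, header.cbFree, hex(ecc), hex(xor))
```

The fFlags value is left undecoded here on purpose, since the dump prints four flag descriptions for a value with only three bits set.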

Eseutil does not provide a means to access the data in the page values, except for some database metadata tables, like the catalog and the space tree.
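Since eseutil only reports where the page values are, the value bytes themselves have to be cut out of the raw page by other means. The following Python sketch works from the TAG lines in the page dump above. It assumes, based on that sample only, that value offsets are counted from the end of the 40-byte page header and that the c, d and v flags correspond to the bits 0x4, 0x2 and 0x1. The function names are hypothetical.

```python
import re

PAGE_HEADER_SIZE = 0x28  # TAG 0 starts at offset 0x0028 in the sample dump
# Flag bits inferred from the sample: 0x0004 (c), 0x0006 (cd), 0x0005 (cv).
FLAG_BITS = {"c": 0x4, "d": 0x2, "v": 0x1}

TAG_LINE = re.compile(
    r"TAG\s+(\d+)\s+cb:0x([0-9a-fA-F]+)\s+ib:0x([0-9a-fA-F]+)"
    r".*?flags:0x([0-9a-fA-F]+)"
)


def parse_tags(dump: str) -> list:
    """Parse TAG lines into dictionaries with page-relative offsets."""
    tags = []
    for match in TAG_LINE.finditer(dump):
        index = int(match.group(1))
        size, start, flags = (int(match.group(n), 16) for n in (2, 3, 4))
        first = PAGE_HEADER_SIZE + start
        tags.append({
            "index": index,
            "first": first,
            "last": first + size - 1,
            "flags": "".join(ch for ch, bit in FLAG_BITS.items() if flags & bit),
        })
    return tags


def extract_value(page: bytes, tag: dict) -> bytes:
    """Cut the bytes of one page value out of a raw page."""
    return page[tag["first"]:tag["last"] + 1]


# TAG lines copied from the page dump above.
sample_tags = """\
TAG 0 cb:0x000d ib:0x0000 offset:0x0028-0x0034 flags:0x0000
TAG 1 cb:0x0037 ib:0x000d offset:0x0035-0x006b flags:0x0004 (c)
TAG 2 cb:0x0033 ib:0x0044 offset:0x006c-0x009e flags:0x0006 (cd)
TAG 73 cb:0x0057 ib:0x1025 offset:0x104d-0x10a3 flags:0x0005 (cv)
"""

tags = parse_tags(sample_tags)
# Values carrying the d flag may be deleted or defunct and deserve a closer look.
flagged = [t["index"] for t in tags if "d" in t["flags"]]
print(flagged)
```

The computed first and last offsets can be compared against the offset column that eseutil prints, which gives a simple consistency check before any bytes are extracted.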





