by Anil Kumar
Types of Virtual Hard Disk Image Format
A VM's hard disk is implemented as a file that resides on the host machine's native file system. Microsoft Virtual PC and Microsoft Virtual Server support the following virtual hard disk formats:
Fixed – A fixed hard disk image is a file allocated to the full size of the virtual disk. For instance, if a user creates a 2 GB virtual hard disk, the host creates a file of approximately 2 GB. The space allocated for data is followed by a footer structure, so the total file size is the size of the hard disk in the guest OS plus the size of the footer. Because of size limits in the host file system, a fixed hard disk may be limited in size. For instance, on FAT32 file systems the maximum virtual hard disk size is roughly 4 GB.
Dynamic – A dynamic hard disk image is a file that, at any given time, is only as large as the data actually written to it plus the size of the header and footer. Allocation is done in blocks: as more data is written, the file grows by allocating more blocks. For example, the file backing a 2 GB virtual hard disk is initially about 2 MB on the host file system, and it grows as data is written, up to a maximum of 2 GB. Dynamic hard disks also store metadata that is used to access the user data stored on the disk. The maximum size of a dynamic hard disk is 2040 GB, but the usable size is further restricted by the underlying disk hardware protocol; for example, ATA hard disks have a 127 GB limit. Every time a data block is added, the hard disk footer must be moved to the end of the file. Because the footer is a crucial part of the dynamic hard disk image, it is mirrored as a header at the front of the file for redundancy.
Differencing – A differencing hard disk image represents the current state of the virtual hard disk as a set of modified blocks relative to a parent image. It is not independent: it depends on another hard disk image to be fully functional. The parent can be any of the three image types described here, including another differencing image.
Format for the Footer
All hard disk image types share a basic footer format, and each type extends this format according to its specific needs.
The footer format is given in the table below. The footer is 512 bytes in total, and multi-byte values are stored in big-endian byte order.
Footer Field | Size (bytes)
Cookie | 8
Features | 4
File Format Version | 4
Data Offset | 8
Time Stamp | 4
Creator Application | 4
Creator Version | 4
Creator Host OS | 4
Original Size | 8
Current Size | 8
Disk Geometry | 4
Disk Type | 4
Checksum | 4
Unique Id | 16
Saved State | 1
Reserved | 427
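To make the layout concrete, here is a minimal Python sketch that decodes a footer with the standard `struct` module. It assumes the on-disk field order and big-endian encoding of Microsoft's VHD specification, and it splits Disk Geometry into its cylinder, heads, and sectors-per-track parts. The names `FOOTER_FORMAT`, `VhdFooter`, and `parse_footer` are hypothetical.

```python
import struct
from collections import namedtuple

# Big-endian layout of the 512-byte footer in on-disk order. Disk Geometry is
# split into cylinders (2 bytes), heads (1 byte) and sectors per track (1 byte).
FOOTER_FORMAT = ">8sIIQI4sI4sQQHBBII16sB427s"

VhdFooter = namedtuple("VhdFooter", [
    "cookie", "features", "file_format_version", "data_offset", "timestamp",
    "creator_application", "creator_version", "creator_host_os",
    "original_size", "current_size", "cylinders", "heads",
    "sectors_per_track", "disk_type", "checksum", "unique_id",
    "saved_state", "reserved",
])


def parse_footer(raw: bytes) -> VhdFooter:
    """Decode a 512-byte VHD footer into named fields (hypothetical helper)."""
    if len(raw) != struct.calcsize(FOOTER_FORMAT):
        raise ValueError("a VHD footer is exactly 512 bytes")
    return VhdFooter._make(struct.unpack(FOOTER_FORMAT, raw))
```

Given the bytes of an image file in `data`, `parse_footer(data[-512:]).disk_type` would then yield the disk type value.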
Important to Note
Versions prior to Microsoft Virtual PC 2004 created disk images with a 511-byte footer. The hard disk footer can therefore occupy either the last 511 or the last 512 bytes of the file that holds the hard disk image.
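A reader that has to handle both footer sizes might probe the end of the file for the cookie, as in this sketch. `find_footer` is a hypothetical helper; it assumes the footer always begins with the 8-byte cookie `conectix`, and it returns the raw bytes, leaving the 511-byte case for the caller to handle.

```python
FOOTER_COOKIE = b"conectix"


def find_footer(image: bytes) -> bytes:
    """Return the footer bytes, trying the 512-byte layout before the 511-byte one."""
    for size in (512, 511):
        candidate = image[-size:]
        # The footer must be complete and must start with the cookie.
        if len(candidate) == size and candidate.startswith(FOOTER_COOKIE):
            return candidate
    raise ValueError("no VHD footer cookie found at the end of the image")
```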
Field Descriptions of the Footer
The sections below describe each field of the hard disk footer.
Cookie
Cookies uniquely identify the original creator of the hard disk image. The values are case-sensitive. Microsoft uses the string “conectix” to identify a file as a hard disk image created by Microsoft Virtual PC, Microsoft Virtual Server, or their predecessors. The cookie is stored as an 8-character ASCII string, with “c” in the first byte, “o” in the second byte, and so on.
Features
This is a bit field used to indicate support for specific features. The table below lists the defined features; any bits not listed are reserved.
Feature | Value
No features enabled | 0x00000000
Temporary | 0x00000001
Reserved | 0x00000002
- “No features enabled.” The hard disk image has no special features enabled.
- “Temporary.” This bit is set if the current disk is a temporary disk, which tells an application that the disk is a candidate for deletion when the machine shuts down.
- “Reserved.” This bit must always be set to 1.
All other bits are also reserved and must be set to 0.
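A parser will usually reject files whose cookie or feature bits it does not recognize. The sketch below shows one hypothetical way to do that. It deliberately does not insist that the Reserved bit be set, because the “No features enabled” value of 0x00000000 would otherwise be rejected.

```python
FOOTER_COOKIE = b"conectix"      # 8 ASCII bytes, compared case-sensitively
FEATURE_TEMPORARY = 0x00000001
FEATURE_RESERVED = 0x00000002    # documented as always set


def check_cookie_and_features(cookie: bytes, features: int) -> bool:
    """Return True if the disk is flagged as temporary.

    Hypothetical helper: raises ValueError for a foreign cookie or for
    feature bits outside the two defined ones.
    """
    if cookie != FOOTER_COOKIE:
        raise ValueError(f"unexpected cookie {cookie!r}")
    if features & ~(FEATURE_TEMPORARY | FEATURE_RESERVED):
        raise ValueError(f"undefined feature bits set: {features:#010x}")
    return bool(features & FEATURE_TEMPORARY)
```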
The Data Offset
This field holds the absolute byte offset, from the beginning of the file, to the next structure. It is used by dynamic and differencing disks but not by fixed disks; for fixed disks, this field should be set to “0xFFFFFFFF”.
The File Format Version
This field is divided into a major and a minor version and matches the version of the specification used to create the file. The most significant two bytes hold the major version and the least significant two bytes hold the minor version. The value must match the file format specification; for the current specification, the field must be initialized to “0x00010000”. The major version is incremented only when the file format is modified in a way that makes it incompatible with earlier versions of the format.
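The major/minor split comes down to two bit operations. The same split applies to the Creator Version field and to the dynamic disk header's Header Version. `is_supported_format` is a hypothetical check for a reader that only understands major version 1.

```python
def split_version(value: int) -> tuple[int, int]:
    """Split a 32-bit version field into (major, minor)."""
    return value >> 16, value & 0xFFFF  # high 16 bits: major, low 16 bits: minor


def is_supported_format(value: int) -> bool:
    # Hypothetical policy: only a major-version bump breaks compatibility,
    # so a reader written for major version 1 accepts any 1.x file.
    return split_version(value)[0] == 1
```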
The Time Stamp
The time stamp field stores the creation time of the hard disk image, as the number of seconds since 12:00:00 AM on January 1, 2000, UTC/GMT.
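Converting the time stamp is a fixed offset from January 1, 2000. This sketch uses Python's `datetime` module; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Reference point for VHD time stamps: 2000-01-01 00:00:00 UTC.
VHD_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def vhd_timestamp_to_datetime(seconds: int) -> datetime:
    """Convert a VHD Time Stamp field to a timezone-aware UTC datetime."""
    return VHD_EPOCH + timedelta(seconds=seconds)


def datetime_to_vhd_timestamp(moment: datetime) -> int:
    """Convert a timezone-aware datetime to seconds since the VHD epoch."""
    return int((moment - VHD_EPOCH).total_seconds())
```

Equivalently, a VHD time stamp equals a Unix time stamp minus 946684800, the number of seconds between 1970-01-01 and 2000-01-01 UTC.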
The Creator Version
This field holds the major/minor version of the application that created the hard disk image. Virtual Server 2004 sets this value to “0x00010000”, and Virtual PC 2004 sets it to “0x00050000”.
The Creator Application
The creator application field records the application that created the hard disk. It is a left-justified text field that uses a single-byte character set. If the hard disk image is created by Microsoft Virtual PC, “vpc” is written in this field; if it is created by Microsoft Virtual Server, “vs ” is written instead. Other applications should use their own unique identifiers.
The Creator Host Operating System
This field stores the type of host operating system on which the disk image was created.
Host Operating System Type | Value
Macintosh | 0x4D616320 (“Mac ”)
Windows | 0x5769326B (“Wi2k”)
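The host OS values are four ASCII characters packed into a 32-bit integer, and the Creator Application identifiers such as “vpc” are stored the same way. Here is a small sketch that converts in both directions. It assumes that tags shorter than four characters are padded with spaces, consistent with the field being left-justified; the function names are hypothetical.

```python
def code_to_str(value: int) -> str:
    """Render a 32-bit four-character code, such as 0x5769326B, as text."""
    return value.to_bytes(4, "big").decode("ascii")


def str_to_code(text: str) -> int:
    """Pack a tag of up to four ASCII characters, space-padded, into 32 bits."""
    return int.from_bytes(text.ljust(4).encode("ascii"), "big")
```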
The Current Size
This field stores the current size of the hard disk, in bytes, from the perspective of the virtual machine. When the hard disk is created, this value equals the original size; it can change later if the hard disk is expanded.
The Original Size
This field stores the size of the hard disk, in bytes, from the perspective of the virtual machine at the time the disk was created. It is for informational purposes only.
The Disk Geometry
This field stores the cylinder, heads, and sectors-per-track values of the hard disk.
Disk Geometry Field | Size (bytes)
Cylinder | 2
Heads | 1
Sectors per track/cylinder | 1
When a hard disk is configured as an ATA hard disk, the ATA controller uses its CHS values (cylinders, heads, and sectors per track) to determine the size of the disk. As a result, when a user creates a hard disk of a certain size, the disk seen inside the VM is slightly smaller than the size the user requested, because the CHS values calculated from the requested size are rounded down.
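The rounding is visible in the CHS calculation itself. The function below is a Python transcription of the CHS algorithm that Microsoft's VHD specification publishes in its appendix; treat it as a sketch to check against that text rather than as a reference implementation. Every division truncates.

```python
def chs_from_size(disk_size_bytes: int) -> tuple[int, int, int]:
    """Return (cylinders, heads, sectors_per_track) for a requested disk size."""
    total_sectors = disk_size_bytes // 512
    total_sectors = min(total_sectors, 65535 * 16 * 255)  # cap at the CHS maximum
    if total_sectors >= 65535 * 16 * 63:
        sectors_per_track = 255
        heads = 16
        cyl_times_heads = total_sectors // sectors_per_track
    else:
        # Try 17, then 31, then 63 sectors per track until the cylinder
        # count fits within 1024 per head.
        sectors_per_track = 17
        cyl_times_heads = total_sectors // sectors_per_track
        heads = max((cyl_times_heads + 1023) // 1024, 4)
        if cyl_times_heads >= heads * 1024 or heads > 16:
            sectors_per_track = 31
            heads = 16
            cyl_times_heads = total_sectors // sectors_per_track
        if cyl_times_heads >= heads * 1024:
            sectors_per_track = 63
            heads = 16
            cyl_times_heads = total_sectors // sectors_per_track
    return cyl_times_heads // heads, heads, sectors_per_track
```

For a 2 GiB (2,147,483,648-byte) disk this gives 4161 cylinders, 16 heads, and 63 sectors per track, or 4161 × 16 × 63 × 512 = 2,147,475,456 bytes, which is 8 KiB less than requested.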
Disk Type
Disk Type | Value
None | 0
Reserved (deprecated) | 1
Fixed hard disk | 2
Dynamic hard disk | 3
Differencing hard disk | 4
Reserved (deprecated) | 5
Reserved (deprecated) | 6
Reserved
This field consists of zeroes; its size is 427 bytes.
Saved State
This field holds a one-byte flag that describes whether the system is in a saved state. If the hard disk is in a saved state, the value is set to 1. Operations such as compaction and expansion cannot be performed on a hard disk in a saved state.
Checksum
This field holds a basic checksum of the hard disk footer: the one's complement of the sum of all bytes in the footer, excluding the checksum field itself. If checksum verification fails, Microsoft Virtual PC and Microsoft Virtual Server use the copy of the footer at the start of the file (the header) instead. If the checksum in that header also fails, the file should be assumed to be corrupt.
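The checksum rule is short enough to show directly. This sketch assumes the field offsets implied by the specification's layout: the checksum sits at byte 64 of the footer and at byte 36 of the dynamic disk header, which uses the same rule. The function names are hypothetical.

```python
def vhd_checksum(structure: bytes, checksum_offset: int) -> int:
    """One's complement of the byte sum, skipping the 4-byte checksum field."""
    total = sum(structure[:checksum_offset]) + sum(structure[checksum_offset + 4:])
    return ~total & 0xFFFFFFFF  # keep the result as an unsigned 32-bit value


def verify_checksum(structure: bytes, checksum_offset: int) -> bool:
    """Compare the stored big-endian checksum with a freshly computed one."""
    stored = int.from_bytes(structure[checksum_offset:checksum_offset + 4], "big")
    return stored == vhd_checksum(structure, checksum_offset)
```

A reader following the fallback described above would call `verify_checksum(footer, 64)` on the copy at the end of the file first, then on the copy at offset 0.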
Unique ID
Every hard disk stores a unique ID that identifies it. This is a 128-bit UUID (universally unique identifier). The field is used to associate a parent hard disk image with its differencing hard disk images.
Dynamic Disk Header Format
For dynamic and differencing disk images, the “Data Offset” field in the footer points to a secondary structure, the dynamic disk header, which provides additional information about the disk image. The dynamic disk header must appear on a 512-byte sector boundary.
Dynamic Disk Header Format Table
Dynamic Disk Header Field | Size (bytes)
Cookie | 8
Data Offset | 8
Table Offset | 8
Header Version | 4
Max Table Entries | 4
Block Size | 4
Checksum | 4
Parent Unique ID | 16
Parent Time Stamp | 4
Reserved | 4
Parent Unicode Name | 512
Parent Locator Entry 1 | 24
Parent Locator Entry 2 | 24
Parent Locator Entry 3 | 24
Parent Locator Entry 4 | 24
Parent Locator Entry 5 | 24
Parent Locator Entry 6 | 24
Parent Locator Entry 7 | 24
Parent Locator Entry 8 | 24
Reserved | 256
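Like the footer, the header can be decoded with `struct`. This sketch assumes the on-disk field order and big-endian encoding defined by the VHD specification (1024 bytes in total), and it keeps the eight parent locator entries as one 192-byte blob that can be sliced into 24-byte entries. `HEADER_FORMAT`, `DynamicHeader`, and both functions are hypothetical names.

```python
import struct
from collections import namedtuple

# Big-endian layout of the 1024-byte dynamic disk header in on-disk order.
HEADER_FORMAT = ">8sQQIIII16sII512s192s256s"

DynamicHeader = namedtuple("DynamicHeader", [
    "cookie", "data_offset", "table_offset", "header_version",
    "max_table_entries", "block_size", "checksum", "parent_unique_id",
    "parent_timestamp", "reserved1", "parent_unicode_name",
    "parent_locators", "reserved2",
])


def parse_dynamic_header(raw: bytes) -> DynamicHeader:
    """Decode a dynamic disk header and check its cookie."""
    if len(raw) != struct.calcsize(HEADER_FORMAT):
        raise ValueError("a dynamic disk header is exactly 1024 bytes")
    header = DynamicHeader._make(struct.unpack(HEADER_FORMAT, raw))
    if header.cookie != b"cxsparse":
        raise ValueError("missing 'cxsparse' cookie")
    return header


def locator_entries(header: DynamicHeader) -> list[bytes]:
    """Split the locator blob into eight 24-byte entries."""
    blob = header.parent_locators
    return [blob[i:i + 24] for i in range(0, 192, 24)]
```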
Field Descriptions for the Dynamic Disk Header
The sections below describe each field of the dynamic disk header.
Cookie
This particular field holds the value “cxsparse”. This field is there to identify the header.
Data Offset
This field holds the absolute byte offset to the next structure in the hard disk image. It is not used by the current formats and should be set to 0xFFFFFFFF.
Table Offset
The table offset field stores the absolute byte offset of the BAT (Block Allocation Table) in the file.
Header Version
This field stores the version of the dynamic disk header. It is divided into a major and a minor version: the least significant two bytes hold the minor version and the most significant two bytes hold the major version. The value must match the file format specification; for this specification, the field must be initialized to 0x00010000. The major version is incremented only when the header format is modified in a way that makes it incompatible with older versions of the product.
Max Table Entries
This field holds the maximum number of entries in the Block Allocation Table (BAT). It should equal the number of blocks in the disk, that is, the disk size divided by the block size.
Block Size
A block is the unit of expansion for dynamic and differencing hard disks. The block size is stored in bytes and does not include the size of the block bitmap; it covers only the data section of the block. The number of sectors per block must always be a power of 2. The default value is “0x00200000”, which indicates a block size of 2 MB.
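Max Table Entries and the BAT's on-disk size follow from the disk size and the block size. The sketch below also enforces the power-of-2 rule for sectors per block. Rounding a partial final block up to a whole entry is an assumption, since the text only says the disk size is divided by the block size. `bat_layout` is a hypothetical name.

```python
SECTOR_SIZE = 512
DEFAULT_BLOCK_SIZE = 0x00200000  # 2 MB


def bat_layout(disk_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> tuple[int, int]:
    """Return (max_table_entries, bat_size_in_sectors) for a dynamic disk."""
    if block_size <= 0 or block_size % SECTOR_SIZE:
        raise ValueError("block size must be a positive multiple of 512")
    sectors_per_block = block_size // SECTOR_SIZE
    if sectors_per_block & (sectors_per_block - 1):
        raise ValueError("sectors per block must be a power of 2")
    entries = -(-disk_size // block_size)          # ceiling division (assumption)
    bat_bytes = entries * 4                        # each BAT entry is 4 bytes
    return entries, -(-bat_bytes // SECTOR_SIZE)   # BAT padded to a sector boundary
```

For example, `bat_layout(2 * 1024**3)` returns `(1024, 8)`: 1024 four-byte entries fill exactly 8 sectors.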
Checksum
This field holds a basic checksum of the dynamic header: the one's complement of the sum of all bytes in the header, excluding the checksum field itself. If checksum verification fails, the file should be assumed to be corrupt.
Parent Unique ID
This field is used by differencing disks. A differencing hard disk stores the 128-bit UUID (universally unique identifier) of its parent hard disk here.
Parent Time Stamp
This field stores the modification time stamp of the parent hard disk, as the number of seconds since 12:00:00 AM on January 1, 2000, UTC/GMT.
Reserved
This particular field must be set to 0.
Parent Unicode Name
This field consists of a UTF-16 Unicode string of the filename of the parent hard disk.
Parent Locator Entries
These entries store the absolute byte offset within the file where the parent locator of a differencing disk is stored. They are used only by differencing disks and must be set to 0 for dynamic disks.
Table Describing the Fields within Every Locator Entry
Parent Locator Field | Size (bytes)
Platform Code | 4
Platform Data Space | 4
Platform Data Length | 4
Reserved | 4
Platform Data Offset | 8
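A 24-byte locator entry can be decoded in the same way. This sketch assumes big-endian fields in the specification's order (Platform Code, Platform Data Space, Platform Data Length, Reserved, Platform Data Offset). It returns the raw locator payload without decoding it, since the text encoding depends on the platform code. The names are hypothetical.

```python
import struct
from collections import namedtuple

LOCATOR_FORMAT = ">4sIIIQ"  # 24 bytes, big-endian

ParentLocator = namedtuple("ParentLocator", [
    "platform_code", "data_space", "data_length", "reserved", "data_offset",
])


def parse_locator(raw: bytes) -> ParentLocator:
    """Decode one 24-byte parent locator entry."""
    return ParentLocator._make(struct.unpack(LOCATOR_FORMAT, raw))


def read_locator_data(image: bytes, loc: ParentLocator) -> bytes:
    """Return the raw locator payload, or b"" for an unused entry."""
    if loc.platform_code == b"\x00\x00\x00\x00":
        return b""
    return image[loc.data_offset:loc.data_offset + loc.data_length]
```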
BAT & Data Blocks
The BAT (Block Allocation Table) is a table of absolute sector offsets into the file backing the hard disk. It is pointed to by the “Table Offset” field of the dynamic disk header. The size of the BAT is calculated when the hard disk is created: the number of entries is the number of blocks needed to store the contents of the disk once it is fully expanded. For instance, a 2 GB disk image that uses 2 MB blocks needs 1024 BAT entries. Each entry is 4 bytes long, and all unused entries are initialized to “0xFFFFFFFF”. The BAT is always extended to a sector boundary, and the “Max Table Entries” field in the dynamic disk header indicates how many entries are valid. Each BAT entry refers to a block in the disk image, and each data block consists of a sector bitmap followed by data.
- For dynamic disks, the sector bitmap indicates which sectors contain valid data (ones) and which sectors have never been modified (zeros).
- For differencing disks, the sector bitmap indicates which sectors are located in the differencing disk (ones) and which are located in the parent disk (zeros).
The bitmap is padded to a 512-byte sector boundary. A block is a power-of-2 multiple of sectors; by default, a block is 4096 sectors of 512 bytes (2 MB). All blocks within a given image must be the same size, which is specified in the “Block Size” field of the dynamic disk header. Every sector within a block whose bit in the bitmap is 0 must contain 512 bytes of zeros on disk. Software that accesses the disk image can take advantage of this assumption to improve performance.
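Two details matter when reading a block's bitmap: how many sectors it occupies once padded, and which bit belongs to which sector. The sketch below computes the first from the block size. For the second, it assumes that the most significant bit of the first byte maps to the first sector of the block; this is an assumption, since the bit order is not stated above. The names are hypothetical.

```python
SECTOR_SIZE = 512


def bitmap_sectors(block_size: int) -> int:
    """Sectors occupied by a block's bitmap, padded to a 512-byte boundary."""
    sectors_per_block = block_size // SECTOR_SIZE
    bitmap_bytes = -(-sectors_per_block // 8)   # one bit per sector, rounded up
    return -(-bitmap_bytes // SECTOR_SIZE)      # pad to whole sectors


def sector_is_set(bitmap: bytes, sector_in_block: int) -> bool:
    # ASSUMPTION: bit 7 (MSB) of byte 0 corresponds to sector 0 of the block.
    return bool(bitmap[sector_in_block // 8] & (0x80 >> (sector_in_block % 8)))
```

With the default 2 MB block, 4096 sectors need 4096 bits (512 bytes), so the bitmap takes exactly one sector. A 512 KB block needs only 128 bytes of bitmap but is still padded to one sector.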
Note
Although the format supports varying block sizes, Microsoft Virtual PC 2004 and Virtual Server 2005 have been tested only with 512 KB and 2 MB block sizes.
Implementation of a Dynamic Disk
Blocks are allocated on demand. When a dynamic disk is created, no blocks are allocated; a newly created image contains only the data structures described earlier, including the dynamic disk header and the BAT. When data is written to the image, the dynamic disk expands to include a new block, and the BAT is updated with the offset of each newly allocated block.
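On-demand allocation can be sketched with an in-memory model of the file. The model is hypothetical: `image` is a `bytearray` holding the whole file with the footer in its last 512 bytes, and `bat` is a Python list standing in for the BAT. A real implementation would also write the updated BAT entry back into the file. Each new block is placed where the footer used to be, and the footer is then re-appended at the end.

```python
SECTOR_SIZE = 512
FOOTER_SIZE = 512
UNUSED_ENTRY = 0xFFFFFFFF


def allocate_block(image: bytearray, bat: list[int], block_number: int,
                   bitmap_sectors: int, block_size: int) -> int:
    """Allocate a block if needed and return its starting sector."""
    if bat[block_number] != UNUSED_ENTRY:
        return bat[block_number]                  # already allocated
    body_len = len(image) - FOOTER_SIZE
    if body_len % SECTOR_SIZE:
        raise ValueError("new block would not start on a sector boundary")
    footer = bytes(image[body_len:])
    del image[body_len:]                          # the block goes where the footer was
    # Zero-filled bitmap plus data area for the new block.
    image.extend(bytes(bitmap_sectors * SECTOR_SIZE + block_size))
    image.extend(footer)                          # footer moves back to the end
    bat[block_number] = body_len // SECTOR_SIZE
    return bat[block_number]
```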
Mapping a Disk Sector to a Sector Within a Block
To calculate the block number, and the sector within that block, from a referenced sector number, use the following formulas:
BlockNumber = floor(RawSectorNumber / SectorsPerBlock)
SectorInBlock = RawSectorNumber % SectorsPerBlock
BlockNumber is used as the index into the BAT. The BAT entry contains the absolute sector offset of the start of the block's bitmap, which is followed by the block's data.
The location of the data can then be calculated with the following formula:
ActualSectorLocation = BAT [BlockNumber] + BlockBitmapSectorCount + SectorInBlock
In this way, blocks can be allocated in any order while the BAT preserves their sequence. When a block is allocated, the image footer must be pushed back to the end of the file, and the newly expanded portion of the file should be zeroed.
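The two formulas translate directly into code. This sketch returns a byte offset into the file, or `None` when the block has no BAT entry yet. On a dynamic disk such a sector reads as zeros; on a differencing disk it comes from the parent. The function name and signature are hypothetical.

```python
from typing import List, Optional

SECTOR_SIZE = 512
UNUSED_ENTRY = 0xFFFFFFFF


def locate_sector(raw_sector: int, bat: List[int], sectors_per_block: int,
                  bitmap_sector_count: int) -> Optional[int]:
    """Map a virtual sector number to a byte offset in the image file."""
    block_number = raw_sector // sectors_per_block      # floor division
    sector_in_block = raw_sector % sectors_per_block
    entry = bat[block_number]
    if entry == UNUSED_ENTRY:
        return None                                      # block not allocated yet
    actual_sector = entry + bitmap_sector_count + sector_in_block
    return actual_sector * SECTOR_SIZE
```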
The Process of Splitting Hard Disk Images
Versions prior to Microsoft Virtual Server 2005 supported splitting disk images. A disk image was split when it grew larger than the maximum file size supported by the host machine's file system; for example, FAT32 has a 4 GB file size limit. When a disk image expanded beyond this size, Microsoft Virtual PC 2004 and earlier versions split it into another file. The split files have no headers or footers, only raw data; the footer is stored at the end of the last split file. The first file in a split hard disk image has the extension .vhd, and subsequent files use the extensions .v01, .v02, and so on. The split files are located in the same directory as the main disk image. A maximum of 64 split files is allowed, and the size of a split file cannot be changed.
Implementation of a Differencing Hard Disk
Differencing disks store a locator for the parent hard disk file within the differencing disk itself. When a VM opens a differencing hard disk, both the differencing disk and its parent are opened. The parent can itself be a differencing disk, in which case there is a chain of differencing disks that ends in a single non-differencing disk. To allow hard disks to be moved across platforms, the format can store parent file locators for several platforms simultaneously. The parent locator table is used only by differencing disks; it stores a platform code for each parent file locator saved in the file. The VM reads the parent file locator for the current platform and opens that disk image. On Microsoft Windows there are two platform locator types, “W2ku” and “W2ru”: W2ku holds the absolute pathname of the parent hard disk, and W2ru holds the parent's pathname relative to the differencing disk.
Differencing Hard Disk’s Write Operation
All data is written to the differencing disk image. For every sector written to a block, the corresponding bit in that block's bitmap is marked dirty.
Differencing Hard Disk’s Read Operation
When a VM reads sectors from the hard disk image, the differencing disk subsystem checks the block bitmap in the differencing disk. Sectors marked dirty are read from the differencing disk, and sectors marked clean are read from the parent disk.
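The read and write rules for differencing disks can be modelled without any file I/O. In this hypothetical sketch, a dictionary of written sectors stands in for the sector bitmap: a key that is present means the bit is dirty. A read walks up the parent chain until it finds a disk that owns the sector, and returns zeros if none does.

```python
from typing import Dict, Optional

SECTOR_SIZE = 512


class Disk:
    """Hypothetical in-memory stand-in for one VHD in a differencing chain."""

    def __init__(self, parent: Optional["Disk"] = None):
        self.parent = parent
        self.written: Dict[int, bytes] = {}  # sector -> data; present = dirty bit

    def write(self, sector: int, data: bytes) -> None:
        # All writes land in this disk; the parent is never modified.
        if len(data) != SECTOR_SIZE:
            raise ValueError("writes are whole 512-byte sectors")
        self.written[sector] = data

    def read(self, sector: int) -> bytes:
        disk: Optional[Disk] = self
        while disk is not None:
            if sector in disk.written:   # dirty: read from this disk
                return disk.written[sector]
            disk = disk.parent           # clean: fall through to the parent
        return bytes(SECTOR_SIZE)        # never written anywhere: zeros
```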
Parent Hard Disk Image Identification
Each hard disk has a UUID stored in its footer. When a differencing hard disk is created, the parent's UUID is stored in the differencing disk. Both the UUID and the parent disk name are used to identify the parent hard disk.
Modifying Parent Hard Disk Image
After a differencing disk has been created for a parent, the parent must not be modified, because any change to it would invalidate the state of the differencing disk. To allow such changes to be detected, the parent's modification date is stored in the differencing disk's structure.
Note
Both the UUID of the parent hard disk and the modification date of the parent hard disk must be checked to ensure that a valid parent-child relationship exists.
Anil Kumar is a senior analyst at BitRecover Software with more than 20 years of experience in fast and certified data forensics recovery of hard drives, RAID, SSDs, smartphones and digital camera media.