mft2csv - NTFS systemfile extracter and $MFT decoder
I'm posting it here as well since it is in fact open source and free.
So I finally finished the first version of my mft2csv application. I registered it at the Google Code project hosting: http://code.google.com/p/mft2csv/
Edit: The new home is https://github.com/jschicht
It consists of two applications:
- One that carves files directly from the physical disk by reading the sectors specified in $MFT. Several modes are available.
- The main application is the mft2csv tool, which decodes and logs large amounts of data from $MFT to csv format. The current base of 126 variables is a very good starting point for further improvement (and maybe some are unneeded).
Check out the site; there is source, a compiled exe and descriptions. The source is free and pretty much without restrictions.
I really recommend doing something like this if you want to dig into NTFS and learn more about it in an interesting and exciting way. It is very likely that I will expand it with more features soon. Since it is written in the scripting language AutoIt, it is Windows only. However, it is very easy to work with.
Great tool with an easy-to-look-at output!
I was having some problems with it recognizing a volume mounted with FTK Imager, but it could see the same image mounted in EnCase with PDE, so it must be an issue with Imager.
I was also able to carve MFT records and rename the output to $MFT.bin, and the MFT2CSV utility parsed it out well.
As a suggestion for learning the $MFT layout, it might be good to put the attribute byte offset in each column header.
PS - Darren Freestone's booklet at lockandcode.com is also a great resource for parsing the $MFT manually.
Thanks for the feedback. Suggestions noted.
New versions of both apps have just been uploaded.
In mft2csv I've fixed the issue with more than 3 ADSes. I have also added file size as a new csv field; it is for the first $DATA attribute, and I think it is easier to also have it available this way instead of only split across resident/non-resident fields.
In the extracter there was an error in the function _GetAllRuns() that incorrectly solved runs in certain cases. That is now fixed. Note however that the experimental functionality to rip files by their MFT number is not fully working. I did not account for fragmentation of the $MFT itself, but since I have the runs solved I think I can handle that too. Until it's fixed I've put a hardcoded exit after record 1000, since fragmentation is rare so early in the file. The functionality does work when the $MFT is not fragmented, though.
While fixing a tiny bug in the timestamps, I started wondering which timestamp format is best to use: UTC or local time?
Well, the timestamp is what it is. What you're asking is what timezone you should present the information using. And my answer is - I don't adjust the timestamp at all. The user should record the timezone settings of the system they pulled the $MFT from.
You could add a switch to allow the user to report all times using a specific timezone.
Sorry for the late reply. I was trying to completely understand the timestamp and timezone handling, and I think it's clear now. As you suggested, it only really matters what timezone configuration is on the system that analyzes the $MFT (and not the system it was taken from). For that reason I've implemented a choice when launching the app for whether UTC or local time is wanted in the timestamps. I also added information in the csv header about which format (UTC or local) the timestamps are shown in, as well as info about the current timezone configuration (file_time_delta / 36000000000 = hours). Btw, milliseconds are now as they should be, after a bugfix.
Sample csv header:
#Timestamps presented in Local time
#Current timezone configuration (bias) including adjustment for any daylight saving = -12 hours
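To illustrate the conversion discussed above, here is a minimal Python sketch (the tool itself is written in AutoIt; the function name and bias handling here are illustrative, not the tool's actual code). A raw NTFS timestamp is a 64-bit FILETIME counting 100-nanosecond intervals since 1601-01-01 UTC, which is also where the file_time_delta / 36000000000 = hours figure comes from:

```python
from datetime import datetime, timedelta, timezone

# Windows FILETIME epoch: 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(filetime, bias_hours=0):
    """Convert a raw 64-bit FILETIME to a datetime.

    bias_hours is the analyst's chosen presentation offset
    (0 keeps UTC); the raw on-disk value itself is always UTC.
    """
    utc = FILETIME_EPOCH + timedelta(microseconds=filetime // 10)
    return utc + timedelta(hours=bias_hours)

# 36000000000 FILETIME units per hour = 3600 s * 10,000,000 units/s.
assert 3600 * 10_000_000 == 36_000_000_000
```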
Now the $MFT runs are also solved, so essentially any file on the volume with a valid record should be extractable from the file system (with certain limitations).
I'm not an Autoit programmer, but I think you have a small logic error in your mft2csv program. The line
If NOT StringMid($MFTEntry,1,8) = '46494C45' Then
does not do what you think it does. I think it should be
If NOT (StringMid($MFTEntry,3,8) = '46494C45') Then
with the two values enclosed in brackets. Note also the offset should be 3 not 1.
Thank you very much for taking the time to comment on the app. You are absolutely right about the error in detecting invalid records. It will be fixed.
You're welcome, it's a fascinating file system.
I've been following the references that you gave and just realised that the original AutoIt script, and the Python one before it, both seem to omit one important aspect of MFT records. An MFT record needs to be corrected before it can be decoded properly. In Linux I've seen the process referred to as 'fixup'. It uses the update sequence to provide a coarse integrity check on each sector of the record. Using the partial record shown below, it works as follows:
At offset 0x04, the two bytes, 30 00, give the offset to the update sequence (0x30) and the next two bytes, 03 00, give the size of the update sequence array in words (0x03=6 bytes).
At offset 0x30 the 6 bytes are 09 00 31 0b 00 00. The first two bytes are used to replace the real bytes at the end of each sector in the record. So at offset 0x01fe and at 0x03fe the bytes 09 00 appear again, as shown. This provides an integrity check for the sectors. If the bytes are not the same, the sector, and hence the record, is probably bad.
Fixing up a good record is done by moving the second pair of bytes in the update sequence array to the end of the first sector and the third pair of bytes to the end of the second sector. In this case, 31 0b is moved to 0x01fe and 00 00 is moved to 0x03fe.
For this particular partial record, the change at 0x03fe makes no difference, but the one at 0x01fe is part of the data run for the file and would make a huge difference when reading the file. The bottom line is that if you don't do the fixup, it will bite you at some stage.
0000: 46 49 4c 45 30 00 03 00-56 e6 03 1d 07 00 00 00
0010: 14 00 02 00 38 00 01 00-10 02 00 00 00 04 00 00
0020: 00 00 00 00 00 00 00 00-05 00 00 00 d6 a7 00 00
0030: 09 00 31 0b 00 00 00 00-10 00 00 00 60 00 00 00
01f0: 31 89 72 05 31 32 b8 98-fa 31 2d 19 91 02 09 00
0200: 2c 72 fd 00 d0 30 94 e2-ff ff ff ff 82 79 47 11
03f0: 00 00 00 00 00 00 00 00-00 00 00 00 00 00 09 00
Please let me know if you can't follow my explanation and I'll have another go.
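The fixup procedure described above can be sketched in a few lines of Python (a hedged illustration, not the tool's AutoIt implementation; a 512-byte sector size is assumed):

```python
def apply_fixup(record, sector_size=512):
    """Apply the NTFS update sequence fixup to one MFT record.

    Returns the corrected record, or raises ValueError if a
    sector-end check value does not match the update sequence number.
    """
    rec = bytearray(record)
    usa_offset = int.from_bytes(rec[0x04:0x06], 'little')  # e.g. 0x30
    usa_count = int.from_bytes(rec[0x06:0x08], 'little')   # words, incl. USN
    usn = rec[usa_offset:usa_offset + 2]                   # e.g. 09 00
    for i in range(1, usa_count):
        end = i * sector_size                              # 0x200, 0x400, ...
        if rec[end - 2:end] != usn:
            raise ValueError('sector %d failed the fixup check' % i)
        # Restore the real bytes saved in the update sequence array.
        rec[end - 2:end] = rec[usa_offset + 2 * i:usa_offset + 2 * i + 2]
    return bytes(rec)
```

Run against a record built from the partial dump above, this moves 31 0b back to 0x01fe and 00 00 back to 0x03fe, exactly as the walkthrough describes.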
I have recently updated a few things in the package.
In the file extractor:
- Added a FileSelectFolder function to specify where to save the output.
- Removed the default ".bin" extension, so that the outputted extension is as given in $MFT.
- Created a new format like YYYY-MM-DD HHMMSSMSMSMS, which is more filter-friendly in Excel. But I'm open to suggestions on other datetime formats.
- Added nanoseconds to the datetime variables.
- Default outputted datetime format changed to "YYYY-MM-DD HHMMSSMSMSMSNSNSNSNS" for all datetime variables.
- MSecTest changed to evaluate the last 7 digits (msec + nsec)
Also included 64-bit compiled binaries for both applications.
I forgot about the issue you described, but will fix that next time.
Nice to see you are still progressing the utility, but I'm disappointed you have not corrected the programming errors.
Just a couple of quick questions and a piece of information.
In your code, when you use the Select statement, you end the cases with a long "AND" expression. Is there some reason you can't simply use "Case Else" which basically means any other value not already cased?
I'm sure this is a silly question and the answer is obvious, but I just can't see it. In your csv output file, why are all the terms in inverted commas? Surely these are redundant in a csv file?
The information I have is that you only accept values of 0-3 for the $HEADER_Flags variable but, if you look at the MFT record for the $Secure metafile, you will see its flag value is 9. Also the values for $Quota, $ObjId and $Reparse are 13. Unfortunately, in terms of files/folders etc., I don't know what these other bits mean. Maybe one has to do with file locking?
Updated to version 1.4 of mft2csv.
- Fixed a bug in the record signature evaluation.
- Added a csv entry for the record signature evaluation.
- Added a record integrity check comparing the Update Sequence number to the record page (sector) end markers (2 bytes for each sector).
- Some cosmetic changes.
Answers to your questions:
1. Quotation marks (inverted commas) in csv.
Honestly I don't remember. I agree with you and they are now removed.
2. Unknown $HEADER_Flags.
I have not found documentation for these, nor found any logic for it. If you (or anybody else) are certain about some new flags, I am happy to add it. And that's very fast and easy to change anyway.
3. The logic error when evaluating record signature.
You were absolutely right and it is now fixed. I added a new variable ($Signature) to the csv as the second field. The implementation is a little bit of a shortcut, because I just put "GOOD" for all 0x46494C45 and "BAD" for all others.
4. The fixups.
You are right about it (of course). For now I have implemented a basic integrity check by comparing those 3 values. If any of those differ, then the variable $IntegrityCheck in the csv will get the value "BAD", else it's "OK". Implementing a full fixup is thus on the todo list.
5. The Select statement.
That can probably be done in several different ways. If I get the time, I can test whether any speed can be gained by changing it. The speed issue in general is a drawback and will be looked into in the future.
Re the unknown $HEADER_Flags: like you, I have not found any documentation for any of the bits other than the first two. It is very definitely a four-bit field though. If you examine the MFT record for the metafile $Secure, its bit field is 1001 (i.e. 9). Also the three metafiles $ObjId, $Quota and $Reparse, which are in the $Extend folder, have bit fields 1101 (i.e. 13).
I can give you a rational explanation of what these extra bits are likely to mean but first you need to look at the first two bits a little differently.
Normally we have 00xy, where x indicates file/folder and y indicates in use/not in use. One of the basic features of the NTFS file system, though, is that everything is a file. So if instead we view the x bit as distinguishing a file containing data from a file containing an index, this provides the key to the unknown bits.
If you look at the MFT records for the above-named metafiles, you will see that they all contain sets of indices, not data. I think bit four, when set, refers to a file containing security indices. Similarly, a set bit three refers to a file containing "other" indices.
As far as I am aware these are the only files that use these extra bits.
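Under the interpretation proposed above, decoding the flag nibble could be sketched like this in Python (an illustration only; the INDEX_SECURITY and INDEX_OTHER labels are this thread's proposal, not official NTFS documentation):

```python
def decode_header_flags(flags):
    """Interpret the MFT record header flag nibble.

    Bits 0-1 are documented (in-use, file/folder); the labels for
    bits 2-3 follow the interpretation proposed in this discussion.
    """
    parts = [
        'ALLOCATED' if flags & 0x1 else 'DELETED',
        'FOLDER' if flags & 0x2 else 'FILE',
    ]
    if flags & 0x4:
        parts.append('INDEX_OTHER')     # set on $ObjId, $Quota, $Reparse (0xD)
    if flags & 0x8:
        parts.append('INDEX_SECURITY')  # set on $Secure (0x9) and 0xD
    return '+'.join(parts)
```

Note that 0xD (1101) has both bits 2 and 3 set, so a bitwise decode reports both index labels for $ObjId, $Quota and $Reparse.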
Both apps have a new update:
Changed record signature detection to:
46494C45 = GOOD (although FILE is strictly speaking the correct one)
42414144 = BAAD
00000000 = ZERO
all other = UNKNOWN
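The classification above could be sketched as follows in Python (the function name is illustrative; the signature is simply the first four bytes of the record, read as ASCII):

```python
def classify_signature(record):
    """Classify an MFT record by its 4-byte signature, as listed above."""
    sig = bytes(record[:4])
    if sig == b'FILE':               # hex 46 49 4C 45
        return 'GOOD'
    if sig == b'BAAD':               # hex 42 41 41 44
        return 'BAAD'
    if sig == b'\x00\x00\x00\x00':
        return 'ZERO'
    return 'UNKNOWN'
```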
Added header flag interpretation:
0x9 = FILE+INDEX_SECURITY (ALLOCATED)
0xD = FILE+INDEX_OTHER (ALLOCATED)
Fixed crash on decoding of certain corrupt $MFT, by skipping decode of those records.
Solved several bugs in the code that calculated runs. Negative moves were resolved wrongly, and very fragmented files with runs located past record offset 0x1fd were also incorrectly solved. That last fix also fixed decoding of attributes extending over that same offset.
Note however that compressed and/or sparse files are not yet supported for extraction. I therefore added a break when trying to extract such a file.
I'm now very sure that runs are correctly solved. My "worst" $MFT, at 97 MB and with 121 runs (including lots of negative ones), is correctly extracted!
I just made a small discovery while redoing the runs. Bytes 3-4 (in big endian) in the Update Sequence Array are the missing 2 bytes at offset 0x1fe-0x1ff when an attribute stretches beyond that point (for instance runs within $DATA). I.e., in such a case (for instance with resident data or maybe another attribute) you must not forget to copy back the 2 bytes of missing data that should have been at 0x1fe-0x1ff.
Edit: Just uploaded version 1.5 of the extracter with a fix for resident data extraction.