
mft2csv - NTFS systemfile extracter and $MFT decoder

joakims (topic starter)

I have recently updated a few things in the package.

In the file extractor
- Added a FileSelectFolder function to specify where to save the output.
- Removed the default ".bin" extension, so that the extracted file keeps the extension given in $MFT.

In WinTimeFunctions
- Created a new format like YYYY-MM-DD HHMMSSMSMSMS, which is more filter-friendly in Excel. But I'm open to suggestions for other datetime formats.

In mft2csv
- Added nanoseconds to the datetime variables.
- Default output datetime format changed to "YYYY-MM-DD HHMMSSMSMSMSNSNSNSNS" for all datetime variables (see the sketch further down).
- MSecTest changed to evaluate the last 7 digits (msec + nsec).

Also included 64-bit compiled binaries for both applications.
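
For anyone wanting to reproduce the timestamp handling outside the package, here is a minimal Python sketch (not the package's actual code; the function name and the exact separators are just illustrative) of decoding a 64-bit NTFS FILETIME into a date string with 7 fractional digits (3 for milliseconds plus 4 for the remaining 100-ns units):

from datetime import datetime, timedelta, timezone

def filetime_to_string(filetime):
    # FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC
    seconds, ticks = divmod(filetime, 10_000_000)   # 10^7 ticks per second
    dt = datetime(1601, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=seconds)
    return dt.strftime("%Y-%m-%d %H:%M:%S") + ".%07d" % ticks

print(filetime_to_string(129876543210000000))       # arbitrary example value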

@Ddan
I forgot about the issue you described, but will fix that next time.


   
Ddan

Nice to see you are still making progress on the utility, but I'm disappointed that you have not corrected the programming errors.

Just a couple of quick questions and a piece of information.

In your code, when you use the Select statement, you end the cases with a long "AND" expression. Is there some reason you can't simply use "Case Else", which basically means any other value not already cased?

I'm sure this is a silly question and the answer is obvious but I just can't see it. In your csv output file, why are all the terms in inverted commas? Surely these are redundant in a csv file?

The information I have is that you only accept values of 0-3 for the $HEADER_Flags variable but, if you look at the mft record for the $Secure metafile, you will see its flag value is 9. Also the values for $Quota, $ObjId and $Reparse are 13. Unfortunately, in terms of files/folders etc., I don't know what these other bits mean. Maybe one has to do with file locking?

Ddan


   
joakims (topic starter)

Updated to version 1.4 of mft2csv.

Changes
- Fixed a bug in the record signature evaluation.
- Added a csv entry for the record signature evaluation.
- Added a record integrity check by comparing the Update Sequence number to the record page (sector) end markers (2 bytes for each sector).
- Some cosmetic changes.

@Ddan
Answers to your questions

1. Quotation marks (inverted commas) in csv.
Honestly I don't remember. I agree with you and they are now removed.

2. Unknown $HEADER_Flags.
I have not found documentation for these, nor found any logic to them. If you (or anybody else) are certain about some new flags, I am happy to add them; that is a very fast and easy change anyway.

3. The logic error when evaluating record signature.
You were absolutely right, and it is now fixed. I added a new variable ($Signature) to the csv as the second field. The implementation is a bit of a shortcut, because I just assign "GOOD" to every record starting with 0x46494C45 and "BAD" to all others.

4. The fixups.
You are right about it (of course). For now I have implemented a basic integrity check by comparing those 3 values. If any of them differ, the variable $IntegrityCheck in the csv gets the value "BAD"; otherwise it is "OK". Implementing a full fixup is thus on the todo list.
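
For point 4, that basic check looks roughly like this (a Python sketch of my own, not the tool's code, assuming a 1024-byte record with 512-byte sectors):

import struct

def integrity_check(record):
    # Offset/size of the Update Sequence Array sit at 0x04/0x06 in the record header
    usa_offset, usa_count = struct.unpack_from("<HH", record, 0x04)
    usn = record[usa_offset:usa_offset + 2]          # the Update Sequence Number
    for sector in range(1, usa_count):               # entry 0 is the USN itself
        end = sector * 512
        if record[end - 2:end] != usn:               # last 2 bytes of each sector
            return "BAD"
    return "OK"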

5. The Select statement.
That can probably be done in several different ways. If I get the time, I will test whether any speed can be gained by changing it. The speed issue is a drawback, however, and will be looked into in the future.


   
Ddan

Re the unknown $HEADER_Flags, like you I have not found any documentation for any of the bits other than the first two. It is very definitely a four bit field though. If you examine the mft record for the metafile $Secure, its bit field is 1001 (ie 9). Also the three metafiles $ObjId, $Quota and $Reparse which are in the $Extend folder have bit fields 1101 (ie 13).

I can give you a rational explanation of what these extra bits are likely to mean but first you need to look at the first two bits a little differently.

Normally we have 00xy, where x indicates file/folder and y indicates in use/not in use. One of the basic features of the NTFS file system, though, is that everything is a file. So if instead we view the x bit as indicating a file containing data/a file containing an index, this provides the key to the unknown bits.

If you look at the mft records for the above named metafiles, you will see that they all contain sets of indices not data. I think bit four, when set, refers to a file containing security indices. Similarly a set bit three refers to a file containing "other" indices.

As far as I am aware these are the only files that use these extra bits.


   
joakims (topic starter)

New updates for both apps:

mft2csv v1.5
Changed record signature detection to

46494C45 = GOOD (although "FILE" is, strictly speaking, the correct name for it)
44414142 = BAAD
00000000 = ZERO
all other = UNKNOWN
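
A minimal Python sketch of that classification (my own illustration, comparing the record's first four bytes against the ASCII signatures "FILE" and "BAAD"):

def classify_signature(record):
    sig = bytes(record[0:4])
    if sig == b"FILE":
        return "GOOD"
    if sig == b"BAAD":
        return "BAAD"
    if sig == b"\x00\x00\x00\x00":
        return "ZERO"
    return "UNKNOWN"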

Added header flag interpretation
0x9 = FILE+INDEX_SECURITY (ALLOCATED)
0xD = FILE+INDEX_OTHER (ALLOCATED)
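
And a sketch of the flag interpretation as a simple lookup with a default for anything unknown (my own Python illustration; 0x0-0x3 are the standard documented file/folder + allocated/deleted combinations, while 0x9 and 0xD follow the labels above; the default case is the "Case Else" idea mentioned earlier):

HEADER_FLAGS = {
    0x0: "FILE (DELETED)",
    0x1: "FILE (ALLOCATED)",
    0x2: "FOLDER (DELETED)",
    0x3: "FOLDER (ALLOCATED)",
    0x9: "FILE+INDEX_SECURITY (ALLOCATED)",
    0xD: "FILE+INDEX_OTHER (ALLOCATED)",
}

def decode_header_flags(flags):
    return HEADER_FLAGS.get(flags, "UNKNOWN (0x%X)" % flags)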

Fixed a crash when decoding certain corrupt $MFT files, by skipping the decode of those records.

Extractor v1.4
Solved several bugs in the code that calculates runs. Negative moves were resolved wrongly, and very fragmented files with runs located past record offset 0x1fd were also solved incorrectly. That last fix also fixed the decoding of attributes extending past that same offset.
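
For anyone following along, a minimal Python sketch of how such runs are typically decoded (my own illustration, not the extractor's code); the key point is that each run's cluster offset is a signed value relative to the previous run, which is where the negative moves come from:

def decode_runs(runlist):
    runs, pos, lcn = [], 0, 0
    while pos < len(runlist) and runlist[pos] != 0x00:
        header = runlist[pos]
        len_size, off_size = header & 0x0F, header >> 4
        pos += 1
        length = int.from_bytes(runlist[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((None, length))   # sparse run: no clusters on disk
        else:
            delta = int.from_bytes(runlist[pos:pos + off_size], "little", signed=True)
            pos += off_size
            lcn += delta                  # signed delta; negative = backward move
            runs.append((lcn, length))
    return runs

# Example: header 0x21 -> 1-byte length, 2-byte offset; a run of 0x18 clusters at LCN 0x5634
print(decode_runs(bytes([0x21, 0x18, 0x34, 0x56, 0x00])))   # [(22068, 24)]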

Note, however, that compressed and/or sparse files are not yet supported for extraction. I have therefore added a break when trying to extract such a file.

I'm now very sure that runs are correctly solved. My "worst" $MFT, at 97 MB and with 121 runs (including lots of negative ones), is correctly extracted!

@Ddan
I just made a small discovery when redoing the runs. Bytes 3-4 (in big endian) in the Update Sequence Array are the missing 2 bytes at offset 0x1fe-0x1ff when an attribute stretches beyond that point (for instance runs within $DATA). I.e. in such a case (for instance with resident data or maybe other attributes) you must not forget to copy back in the 2 bytes of missing data that should have been at 0x1fe-0x1ff.

Edit: Just uploaded version 1.5 of the extractor with a fix for resident data extraction.


   
Ddan

That 'small discovery' was what my earlier post about Fixup was all about. I did say that it would bite you at some stage if you didn't do it. It is not unusual for mft records to extend past the 0x1fe-0x1ff point, but I don't think I have seen any where the bytes at 0x3fe-0x3ff make any difference.

Having now discovered for yourself that you really do need to do the fixup before processing the mft record, you will undoubtedly find that it also resolves the problem with filenames that have bad unicode characters!

Those bytes at 0x1fe-0x1ff can have a dramatic effect on things like long filenames, times and dataruns when they are not corrected.

Ddan


   
Ddan

I'm sorry to keep harping on about this, but it really is important. I thought from my first post about Fixup that you had understood, but re-reading your last post makes it clear that you didn't. So let's be unequivocal.

When you read an mft record from disk, you do not get the real record. What you get is a modified version. It has been modified so that it has an inbuilt integrity check. You need to undo those modifications before you process the record. If you don't undo the modifications, you are not processing the real record. The undo process is called Fixup. It consists of putting two pairs of bytes back into their proper position. The two pairs of bytes are in the update sequence array and their proper positions are the last two bytes in each sector of the mft record.
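
In code terms, the undo step looks roughly like this (a Python sketch of my own, extending the integrity check sketched earlier with the restore step; it assumes a 1024-byte record with 512-byte sectors, and that the record header stores the array's offset and length at 0x04 and 0x06):

import struct

def apply_fixup(record, sector_size=512):
    record = bytearray(record)
    usa_offset, usa_count = struct.unpack_from("<HH", record, 0x04)
    usn = record[usa_offset:usa_offset + 2]              # Update Sequence Number
    for i in range(1, usa_count):                        # entry 0 is the USN itself
        end = i * sector_size
        if record[end - 2:end] != usn:                   # integrity check first
            raise ValueError("update sequence mismatch: torn/corrupt record")
        saved = record[usa_offset + 2 * i:usa_offset + 2 * i + 2]
        record[end - 2:end] = saved                      # put the real bytes back
    return bytes(record)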

You have already seen examples of what happens when you don't do Fixup. It causes dataruns to go haywire, filenames to have bad unicode, resident data to contain bad values, etc, etc, etc.

Ddan


   
joakims (topic starter)

He he, I now feel like a moron... Anyway, I have moved on to reassembling runs with compression (including decompressing them), and it's going really well. But this time I will keep my mouth shut a little longer. :)


   
joakims (topic starter)

A question concerning ntfs compression.

I have resolved the runs, meaning I can extract the raw compressed data of the file. I have also found that the WinAPI RtlDecompressBuffer is able to handle at least parts of the compressed data. But I'm facing the issue that only the first 8192 bytes are decompressed, so I'm wondering:

1. Is there a better API or method to use?
2. Is more modification of the reassembled compressed parts needed?

Anybody familiar with this?

Edit: It turns out decompression works just fine if applied to each individual run, so reassembly must be done on the decompressed chunks.
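
For reference, calling that API from Python via ctypes looks roughly like this (Windows only; a sketch under the assumption that each compressed chunk is passed in separately, as per the edit above; the buffer size and names are illustrative):

import ctypes
from ctypes import wintypes

ntdll = ctypes.WinDLL("ntdll")
COMPRESSION_FORMAT_LZNT1 = 0x0002

def decompress_unit(compressed, out_size=16 * 4096):     # one 16-cluster unit at 4K clusters
    out = ctypes.create_string_buffer(out_size)
    final = wintypes.ULONG(0)
    status = ntdll.RtlDecompressBuffer(
        wintypes.USHORT(COMPRESSION_FORMAT_LZNT1),
        out, wintypes.ULONG(out_size),
        compressed, wintypes.ULONG(len(compressed)),
        ctypes.byref(final))
    if status != 0:                                       # 0 = STATUS_SUCCESS
        raise OSError("RtlDecompressBuffer failed, NTSTATUS 0x%08X" % (status & 0xFFFFFFFF))
    return out.raw[:final.value]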


   
Ddan

I'm not familiar with the api so cannot help there.

With the 8192 bytes though, was the cluster size 512 bytes? If it was, then it probably is ok.

The compression works on a block-by-block basis. A compression block is 16 clusters. The compression flag is always 4, and 2 to the power of 4 is 16. The fact that the flag is set doesn't actually mean that the file is compressed. It only means that the file is scheduled to be compressed.

If the file is resident, it is never compressed. A non-resident file is first split into 16-cluster compression blocks. Each block is then examined and, if compression saves at least one cluster, it is compressed. Otherwise it is not compressed. This means that a compressed file can have some blocks which are compressed and some which are not. The first two bytes in the compressed block tell you which is which.

To determine whether a file is compressed, you need to split the data run into 16-cluster subsets. A compressed subset would be of the form (x data clusters) plus (y sparse clusters) where x+y=16. Note that after decompression, the x data clusters will expand to 16 clusters so the sparse clusters are not actually used.
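
A small Python sketch of that splitting step (my own illustration, reusing the (lcn, length) run tuples from the earlier run-decoding sketch, where lcn is None for a sparse run):

def split_compression_units(runs, unit=16):
    clusters = []                                   # expand runs to one entry per cluster
    for lcn, length in runs:
        if lcn is None:
            clusters += [None] * length
        else:
            clusters += range(lcn, lcn + length)
    for start in range(0, len(clusters), unit):     # walk in 16-cluster compression units
        chunk = clusters[start:start + unit]
        data = [c for c in chunk if c is not None]
        if len(data) == len(chunk):
            yield "raw", data                       # 16 data clusters: stored uncompressed
        elif data:
            yield "compressed", data                # x data + y sparse: decompress these together
        else:
            yield "sparse", data                    # all sparse: zeros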

Hope this helps.

Ddan


   