
NTFS: More than one INDX record pointing to the same MFT rec  

CyberGonzo
(@cybergonzo)
Active Member

From time to time, rarely but still, I see more than one INDX record pointing to the same MFT record. Inside one folder. There are a few differences (time, size) perhaps, but the essence (MFT) is the same.

I was wondering if there is a clue I'm not taking into account that would let me disregard such records, although I see no flag differences etc. The goal is to avoid parsing an MFT record twice (or more), and listing the same files / folders twice (or more) in one folder.

At the moment I check every new MFT ID, as I parse the INDX, to see whether it was already used for a file/folder inside the folder I'm parsing. But for huge folders (e.g. Windows\WinSxS\ with > 16,000 objects) there is a performance penalty.
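
The per-folder check described above need not scan every previously parsed object. A minimal Python sketch, assuming each index entry's 8-byte file reference has already been read as an integer (low 48 bits: MFT record number, high 16 bits: sequence number, per the commonly published reverse-engineered layout). `split_file_reference` and `unique_records` are hypothetical names, not taken from any tool mentioned in this thread:

```python
from typing import Iterable, Iterator, Tuple


def split_file_reference(ref: int) -> Tuple[int, int]:
    """Split a 64-bit NTFS file reference into (record number, sequence number)."""
    return ref & 0xFFFFFFFFFFFF, ref >> 48


def unique_records(file_refs: Iterable[int]) -> Iterator[int]:
    """Yield each MFT record number once, in first-seen order."""
    seen = set()  # hash set: average O(1) membership test
    for ref in file_refs:
        record_no, _sequence = split_file_reference(ref)
        if record_no in seen:
            continue  # a second index entry for the same MFT record
        seen.add(record_no)
        yield record_no
```

With a set, a folder of 16,000 entries costs about 16,000 lookups, rather than a comparison against every object parsed so far for each new entry.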

So if there is something I'm overlooking, kindly let me know

Posted : 29/07/2017 1:15 am
jaclaz
(@jaclaz)
Community Legend

Have you compared your results with those from (say)
https://github.com/jschicht/Indx2Csv

Or maybe
http://www.williballenthin.com/forensics/indx/
https://github.com/williballenthin/INDXParse

?

jaclaz

Posted : 29/07/2017 1:29 am
CyberGonzo
(@cybergonzo)
Active Member

Thanks. I tend not to look into other code, so no I haven't. I'm not good with that anyway. I like a specification (if there is one) and a blank 'canvas'. Or some reverse engineering, but in this case I'm not seeing it.
I was more hoping somebody knows something I don't and points me in the right direction.

Posted : 29/07/2017 1:53 am
passcodeunlock
(@passcodeunlock)
Senior Member

There are no right directions, just follow jaclaz's hints this time :)

Posted : 29/07/2017 3:43 am
JimC
(@jimc)
Member

The Binary Markup Toolkit (BMTK) software I developed includes an INDX parser.

The software is available to bona fide forensic practitioners and is completely free. My only request is that you please let me know what you think about it, how it works and what improvements you would like to see. You can read more about the software here

www.binarymarkup.com

I would be happy to answer any questions about the software either here or via email.

Best wishes

Jim

Posted : 29/07/2017 4:59 am
CyberGonzo
(@cybergonzo)
Active Member

Thanks All.

Jim, perhaps you can answer my question here, since you implemented a parser yourself ?

It's not a big issue, nor something I'd like to put much time into. It works fine for me the way it is, and has for many years (it is used by thousands of investigators, in fact). I just wonder if there is anything I don't know about this, so I'm looking for that verbal

'yup, you're right, there can be more than one INDX record pointing to the same MFT record and you can't tell from the INDX record itself that there are more such records in the INDX, so the only way to avoid doubles is to check if you didn't parse a similar link already'

or

'nope, you can definitely see in the INDX record itself that it's not worth checking the MFT record, because another similar INDX record will follow (or has preceded it), pointing to the same MFT record, but this particular INDX record is not it'

I hope you understand what I mean ?
Of course in case of the latter I'd love a pointer as to what field to check.
Right now I simply check all prior parsed objects in the same folder to see if that MFT record was not already processed.

I posted this question after a long day, an investigation, tired, starting to see double :)
As far as I could tell the records were identical (except for time and size), but there was no way I could tell that another INDX record would follow, pointing to the same MFT record, or vice versa.
Hence my thought, let's post it here and see if others have seen this too and know something I don't know.

Posted : 29/07/2017 1:36 pm
joakims
(@joakims)
Active Member

Maybe you can elaborate on how you are locating these INDX's? Do you even know it is in allocated space? And if so, what kind of allocated?

Posted : 29/07/2017 2:24 pm
JimC
(@jimc)
Member

To expand on the previous post, yes, it would be very helpful to know more about the circumstances:

a. Which sort of INDX record are you talking about? I assume you mean $I30 indexes of filenames, but it is worth checking nevertheless.

b. Are the INDX records you've found in an active index or old records in unallocated space?

c. How do they differ from the $FN attribute in the "live" MFT record?

Assuming you are talking about the most common $I30 index of filenames, these are really little more than an ordered list of the FILE_NAME structures present in the NTFS $FN attribute. Each record contains the 4 timestamps (see below), the file size and the file name. If the INDX record is in unallocated space then I would anticipate that it is an older record recording the state of the FILE_NAME structure at some earlier time. This could be evidentially significant because it may refer to a previous file name (if the file has been renamed), file size or time stamps.
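
To make that layout concrete, here is a hedged Python sketch of decoding one $I30 index entry. The offsets follow the commonly published reverse-engineered descriptions of NTFS (there is no official specification to check them against), and `parse_i30_entry` and `filetime_to_datetime` are illustrative names only:

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a Windows FILETIME (100 ns ticks since 1601-01-01 UTC)."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def parse_i30_entry(buf: bytes, offset: int = 0) -> dict:
    """Decode one $I30 index entry starting at `offset` (assumed layout)."""
    # Entry header: file reference, entry length, key length, flags.
    file_ref, entry_len, key_len, flags = struct.unpack_from("<QHHI", buf, offset)
    entry = {
        "record_number": file_ref & 0xFFFFFFFFFFFF,  # low 48 bits
        "sequence_number": file_ref >> 48,           # high 16 bits
        "entry_length": entry_len,
        "is_last": bool(flags & 0x02),               # end-of-node marker
    }
    if entry["is_last"] or key_len < 66:
        return entry  # no FILE_NAME key to decode

    key = offset + 16  # the FILE_NAME structure starts after the header
    (parent_ref, created, modified, mft_changed, accessed,
     allocated_size, real_size) = struct.unpack_from("<7Q", buf, key)
    name_chars, namespace = struct.unpack_from("<BB", buf, key + 64)
    raw_name = buf[key + 66:key + 66 + 2 * name_chars]
    entry.update({
        "parent_record": parent_ref & 0xFFFFFFFFFFFF,
        # Timestamps are kept as raw FILETIME integers; carved data may hold
        # values that do not convert to a valid datetime.
        "created": created,
        "modified": modified,
        "mft_changed": mft_changed,
        "accessed": accessed,
        "allocated_size": allocated_size,
        "real_size": real_size,
        "namespace": namespace,
        "name": raw_name.decode("utf-16-le", errors="replace"),
    })
    return entry
```

The sketch assumes the update sequence (fixup) values of the enclosing INDX record have already been applied to the buffer.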

One other thing you may like to consider is the header of the INDX stream. Each INDX cluster (typically 4KB) starts with an INDX_CLUSTER_HEADER. This is similar to an MFT record header and starts with the "INDX" signature. The 3rd record in the header is the VCN of the INDX cluster. This defines the logical order of the record relative to others for the same index. This may be useful if reassembling "old" indexes that have been found in unallocated space. Given enough pieces, it may be possible to completely reassemble the non-resident index and therefore the contents of a directory at a previous time.
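
As a rough illustration of reading that header, the following Python sketch pulls out the VCN and the bounds of the used entry area. The field offsets are an assumption based on the commonly published reverse-engineered layout (signature, update sequence offset and count, $LogFile sequence number, VCN, then the index node header at offset 24); `parse_indx_header` is a hypothetical name:

```python
import struct


def parse_indx_header(block: bytes) -> dict:
    """Read the fixed header fields of one INDX record (assumed layout)."""
    if block[:4] != b"INDX":
        raise ValueError("not an INDX record")
    # Offsets 4..23: update sequence offset/count, $LogFile LSN, VCN.
    usa_offset, usa_count, lsn, vcn = struct.unpack_from("<HHQQ", block, 4)
    # Index node header at offset 24; its offsets are relative to offset 24.
    entries_offset, entries_size, allocated_size, node_flags = struct.unpack_from(
        "<IIIB", block, 24
    )
    return {
        "usa_offset": usa_offset,
        "usa_count": usa_count,
        "lsn": lsn,
        "vcn": vcn,
        "first_entry": 24 + entries_offset,  # where the first index entry starts
        "entries_end": 24 + entries_size,    # end of the used entry area
        "allocated_end": 24 + allocated_size,
        "has_children": bool(node_flags & 0x01),
    }
```

Recovered blocks can then be ordered with `sorted(headers, key=lambda h: h["vcn"])`. Bytes between `entries_end` and `allocated_end` are slack within the record and may hold stale entries.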

NB Timestamps in NTFS $FN attributes should be taken with a pinch of salt. They are only updated when the $FN record itself is modified (for instance when the record is first created or the object is renamed) and probably aren't of much forensic value without other corroborating artifacts such as a chronological USN change journal.

I hope this helps.

Jim

www.binarymarkup.com

Posted : 29/07/2017 5:08 pm
jaclaz
(@jaclaz)
Community Legend

> Thanks. I tend not to look into other code, so no I haven't. I'm not good with that anyway. I like a specification (if there is one) and a blank 'canvas'. Or some reverse engineering, but in this case I'm not seeing it.
> I was more hoping somebody knows something I don't and points me in the right direction.

I suggested that you compare results, NOT that you *look into other code* (though in this case, since the suggested tools are open source, that wouldn't be a problem).

What I meant was reducing the possibilities. Right now, for all I know, EITHER
1) such duplicates actually exist
OR
2) such duplicates do not exist "in nature" 😯 and either your (buggy) software creates them out of thin air or your samples (and your samples only) contain them for *whatever* reason
(just for the sake of reasoning)

Then, once determined that these duplicates actually exist, other Authors may well
1) have completely ignored their duplicated nature (and their tool's results will show them duplicated)
2) have noticed them but decided to not care (and their tool's results will show them as above)
3) have noticed them and decided to de-list duplicates at random
4) have noticed them and found a clean, smart way to dedupe them

There are of course no "real" specifications for NTFS (the only ones that would be "authentic" are somewhere in a safe in Redmond ;) ), only some reverse engineering here and there, with - lately - Joakim Schicht (joakims), who has done lots of interesting work and made available a number of nice related tools.

jaclaz

Posted : 29/07/2017 5:13 pm
CyberGonzo
(@cybergonzo)
Active Member

@joakims

Allocated space yes (* see further)
$I30 indexes of filenames, correct

This is just normal parsing of a good working file system. Not looking for deleted files or anything. Just starting with the root and working down.
PS. the instance I'm looking at at the moment is in fact for the root.

@JimC

(*) Can you define unallocated space ?
I just had another look and for this particular root I'm parsing there are 3 x 4K INDX blocks.
Based on the header information of each block I parse what is 'allocated'
As far as I know I do this correctly but I can dig deeper in a week or two (I'm having a bit of vacation now).
This is years-proven code, which doesn't mean there can't be an issue of course, but if there is one it has eluded binary comparison testing of found files/folders so far.

> NB Timestamps in NTFS $FN attributes should be taken with a pinch

I only use the information from the MFT records
INDX records are used only to find the associated MFT records basically

@jaclaz

> I suggested you to compare results

It's of no use if the tools ignore such dupes (as that is what Windows does). Or even if they list them too (see further).

Which brings me to the initial question: do you 'recognize' a dupe as a dupe, or do you check whether the associated MFT record (and hence filename) was already processed for this folder?

Unless you're saying the situation I describe is not possible, never seen out there, then I need to consider a bug in my software. But for the time being I have no indication yet that I'm doing something wrong.

I suppose I had hoped for a quick and crystal answer saying 'this is how you recognize it' or even 'this is not possible, something is wrong in the image or your code' by somebody who recognizes the situation.

> 1) such duplicates actually exist

Yes. Unless I'm in unallocated space, which at this point I still doubt (but to be investigated)

> 2) such duplicates do not exist "in nature" 😯 and either your (buggy) software creates them out of thin air or your samples (and your samples only) contain them for *whatever* reason

These are image files sent by people 'out there' from their properly working Windows systems. Where they see my software list a few duplicate folders, but Windows obviously doesn't see them.
I have never seen such a situation myself during testing, but I am now aware of two cases out of many tens of thousands of installs. Then again, most people would probably not bother to report it, so that is not a good indication.

> 1 - 4

Exactly. And what tool does what in which situation ? I'd still be ploughing through all code to see if they disregard the dupes or not find them at all if they don't list the dupes, or simply show them, which only confirms what I see but doesn't offer an answer either.

> There are of course no "real" specifications for NTFS

If there were, I would have looked there. I was more hoping to get an AHA moment from somebody who does the same and has seen this as well. There is of course the possibility that nobody has seen this yet. Because of an NTFS rarity or because my code reads where it shouldn't. But then I'd like to hear that too.

I HAVE to sign off so as not to aggravate the family more :) We leave soon.
I'll review the code when I get back.
Meanwhile I'd appreciate insights from people who have seen the same or know something I don't. E.g. it is not possible, or it happens from time to time and this is what I do in such cases …

Posted : 29/07/2017 7:29 pm
jaclaz
(@jaclaz)
Community Legend

> These are image files sent by people 'out there' from their properly working Windows systems. Where they see my software list a few duplicate folders, but Windows obviously doesn't see them.
> I have never seen such a situation myself, during testing, but I am now aware of two cases, from the many tens of thousands of installs, but then most people would probably not bother to report it so that is not a good indication

Well, with a little torture applied, we managed to establish that at least it is not an overly common issue.
(which may mean that it was never noticed before)

I would point out anyway (maybe I am too skeptical) that the fact that you never managed to find this situation yourself may mean that this

> These are image files sent by people 'out there' from their properly working Windows systems.

may be written as

> These are image files sent by people 'out there' from Windows systems that they affirm are working properly.

Subtle difference, still ….

Which software of yours is that?

OT (but not much) and JFYI
http://reboot.pro/topic/21558-files-now-think-theyre-directories/
Never seen that before in quite a few years: a directory that is a sparse file AND that CHKDSK does not fix.

jaclaz

Posted : 29/07/2017 8:13 pm
CyberGonzo
(@cybergonzo)
Active Member

> Which software of yours is that?

www.isobuster.com

Posted : 29/07/2017 8:20 pm
jaclaz
(@jaclaz)
Community Legend

> Which software of yours is that?

> www.isobuster.com

Ahh, good :) this shows how old I am getting :( , I still connected it with just CDs/DVDs.

jaclaz

Posted : 30/07/2017 5:22 pm
joakims
(@joakims)
Active Member

@CyberGonzo
I have seen many strange things with regards to NTFS, but it is hard to just summarize everything in an understandable way here. If you have an example with real data sometime, I can probably assist in sorting it out if you need further help.

Posted : 30/07/2017 6:01 pm
JimC
(@jimc)
Member

Unallocated space

The INDX records could be more correctly called "Non-resident" NTFS indexes. Small indexes are stored purely in the MFT using the $INDEX_ROOT (0x90) attribute. When the index gets a little larger (typically just a handful of filenames) an $INDEX_ALLOCATION (0xA0) attribute is created and the bulk of the index is pushed outside the MFT to normal data clusters. The clusters used are recorded in the $INDEX_ALLOCATION attribute and marked as allocated in the $Bitmap file. Each INDX cluster starts with the INDX_CLUSTER_HEADER and, as noted already, uses the VCN field to record the logical position of the cluster in the complete index.

As the directory contents change, the index is frequently rebuilt. If a cluster is no longer required (for instance if the directory has shrunk or maybe just because NTFS felt like storing the data elsewhere) it is marked as unallocated in the $Bitmap file but the contents will likely remain. The INDX record becomes an "orphan" recording the directory contents (with links back to the MFT) at a particular point in time.

Therefore, if parsing an NTFS file system for the INDX header, I would suggest considering:

a. Is the cluster currently allocated in $Bitmap?

Yes - The INDX record is part of a "live" index
No - The INDX record is no longer in a "live" index. It is an orphan from the past

b. Searching the MFT for the $INDEX_ALLOCATION attribute that contains the cluster.

If appropriate, there may also be further mileage in

c. Searching volume shadow copies of the MFT for previous instances

d. Searching the USN Change Journal for operations on the MFT record of interest
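
Check (a) above can be sketched in a few lines of Python. This assumes the $Bitmap data has already been read into memory and that it holds one bit per cluster, least significant bit first within each byte, with a set bit meaning "allocated"; the function names are hypothetical:

```python
def is_cluster_allocated(bitmap: bytes, lcn: int) -> bool:
    """Return True if logical cluster `lcn` is marked allocated in $Bitmap."""
    byte_index, bit_index = divmod(lcn, 8)
    if lcn < 0 or byte_index >= len(bitmap):
        raise IndexError(f"cluster {lcn} is outside the bitmap")
    return bool((bitmap[byte_index] >> bit_index) & 1)


def classify_indx_cluster(bitmap: bytes, lcn: int) -> str:
    """Label an INDX cluster as 'live' or 'orphan' using $Bitmap alone."""
    # An allocated cluster may since have been reused by another file, so
    # 'live' here still needs confirming against the owning
    # $INDEX_ALLOCATION attribute (check b).
    return "live" if is_cluster_allocated(bitmap, lcn) else "orphan"
```

On its own the bitmap only says that a cluster is in use, not by which index, which is why check (b) remains necessary.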

There is an excellent explanation of this in Brian Carrier's book. As @joakims suggested, if you can post some example data it may be possible to help more.

Jim

www.binarymarkup.com

Posted : 31/07/2017 11:09 pm