Yep, but we are trying to agree on a set of "definitions", what about
- "file slack" <- unindexed space between file size and allocated space, three types:
  - "sector file slack" <- unindexed space between file size and next sector border (Maxsize=SectorSize-1, i.e. on a 512 byte/sector device at most 511 bytes)
  - "cluster file slack" <- unindexed space between file size and next cluster border (Maxsize=ClusterSize-1, i.e. on a typical 4096 byte/cluster filesystem at most 4095 bytes); "cluster file slack" includes "sector file slack"
  - "valid data file slack" <- unindexed space between actual file size and declared file size (only on NTFS filesystems where the SetFileValidData() function or "fsutil file setvaliddata" has been used to allocate to the file a size exceeding the size of its contents)
- "filesystem slack" <- unindexed space within the filesystem
- "volume or partition slack" <- unindexed space outside the filesystem but inside the partition/volume
- "disk slack" <- unindexed space outside the partition/volume but (obviously) inside the disk
?
jaclaz
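The two size limits quoted in the list (511 and 4095 bytes) follow from simple modular arithmetic. A minimal sketch, assuming 512-byte sectors and 4096-byte clusters as in the examples above; the function names are hypothetical and only illustrate the definitions:

```python
def sector_slack(file_size: int, sector_size: int = 512) -> int:
    """Bytes between the end of the file and the next sector border."""
    return (-file_size) % sector_size


def cluster_slack(file_size: int, cluster_size: int = 4096) -> int:
    """Bytes between the end of the file and the next cluster border."""
    return (-file_size) % cluster_size


size = 5000  # example file size in bytes
in_sector = sector_slack(size)     # 120 bytes up to the next sector border
in_cluster = cluster_slack(size)   # 3192 bytes up to the next cluster border
# The cluster figure includes the sector figure; the rest is whole sectors.
whole_sectors = (in_cluster - in_sector) // 512
```

A 1-byte file gives the maximum in both cases (511 and 4095 bytes), and a file that ends exactly on a border has no slack of that type.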
Adding "sparse file" concept just confuses the definitions.
In general sparse files are OS/application specific "compression", nothing more. From the perspective of a drive, a file, be it logical sparse file or physical sparse file, still remains a file.
I have seen two types of sparse files. One, where the OS/application interprets an "empty" section of a large data set and compresses it on the disk. Second, where an application reserves a large amount of disk space by creating a large file with empty content, to be used in future.
In either case, from the drive perspective, file slack may exist. In the first scenario there is no virtual "slack", as the empty space only exists as it is referenced from the OS/application. In the second scenario, there is "internal slack space" that may or may not have been used, but is currently not in use.
Adding "sparse file" concept just confuses the definitions.
I am not sure I understand you.
If I got this right, a "sparse" file is different from the above.
A "sparse" file has TWO different "sizes":
- the declared one (bigger and fixed)
- the one actually used on disk (smaller, and growing up to the above size), corresponding to increases of content
Apart from a minimal initial "overhead", the actual used disk space "grows" when the content is added or, if you prefer, no real disk space is "claimed" until actual contents are added/increase.
The "set valid data" file is different: it still has TWO different sizes, but they are actually THREE, of which two are the same:
- the declared one (fixed)
- the one actually used on disk (same as above and fixed)
- the actual size of content (either growing or shrinking)
I.e., when you create an empty "sparse" file only the very minimal disk space is "claimed/used", and this amount changes as soon as you add content; with an empty "set valid data" file, you directly claim a given space on disk (even if you write NO content to it), and this space may contain residual "deleted data" or "freed clusters previously occupied by *something* of use".
jaclaz
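The difference between the two kinds of file can be put into numbers. This is only a toy model of the description above, not of any real filesystem: the 4096-byte cluster size, the `Sizes` record and both helper functions are assumptions made for illustration, and the minimal initial "overhead" is ignored.

```python
from dataclasses import dataclass

CLUSTER = 4096  # assumed cluster size in bytes


def round_up(n: int, unit: int = CLUSTER) -> int:
    """Round n up to the next multiple of unit."""
    return -(-n // unit) * unit


@dataclass
class Sizes:
    declared: int  # size the file claims to have
    on_disk: int   # disk space actually claimed
    content: int   # bytes of real content written so far


def sparse_file(declared: int, content: int) -> Sizes:
    # Disk space is claimed only as content is added.
    return Sizes(declared, round_up(content), content)


def set_valid_data_file(declared: int, content: int) -> Sizes:
    # The whole declared size is claimed up front, whatever the content.
    return Sizes(declared, round_up(declared), content)


a = sparse_file(declared=40960, content=10)
b = set_valid_data_file(declared=40960, content=10)
# Space that is claimed but holds no content of this file:
residual_a = a.on_disk - a.content   # less than one cluster
residual_b = b.on_disk - b.content   # nearly the whole declared size
```

In this model only the second file has a large claimed-but-unwritten area where residual data could sit.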
"valid data file slack" <- unindexed space between actual file size and declared file size (only on NTFS filesystems where the SetValidData() function or "fsutil file setvaliddata" has been used to allocate to the file a size exceeding the size of its contents)
I'm afraid I don't find the definition enlightening – it doesn't explain anything, particularly not to someone who doesn't already know what the definition is about. It also relies on definitions of 'actual file size' and 'declared file size' which need to be provided somewhere.
That said, I do appreciate the difficulty in creating a definition. Myself, I typically explain it from the other end: I walk the person (who has to be a programmer!) through the various steps of extending a file by adding zeroed bytes at the end, through the recommended coding pattern that was found to have unforeseen consequences for NTFS, and end up with the solution that Microsoft came up with to solve that particular problem. And with a reasonably experienced Windows programmer, I may discuss alternative ways of solving it – like using sparse files.
My own attempt at a definition
valid data file slack – file slack created by NTFS when a file is extended by a number of zero bytes through the use of the system call SetValidData(). While a suitable number of clusters are allocated and added to the file to support the extra bytes, the previous cluster content is not wiped. Instead NTFS uses special code to ensure that any attempt to read from the additional bytes results in a suitable number of programmatically zeroed bytes, instead of actual cluster content. Not until the program writes to one (or more) of these clusters will they be initialized and from there on be treated as 'normal' clusters.
But I expect that's just as difficult to understand. Trying to get it into a single sentence is probably doomed to failure.
By the way, what is 'unindexed space'?
We are in agreement in both scenarios, jaclaz.
Two types.
Let's presume OS/Application uses memory for declared data.
Let's also presume D represents data, 0 represents zeroed out data, and X just general disk space.
Also, let's agree that { and } represents beginning and ending of files, and | represents "clusters" (or whatever the chunking is for the storage media).
Finally, the underlined characters are slack space as I understand it in each scenario.
Type 1
Memory content
DDDDD00000DDDDDDDDDDD
Disk content allocated to the "file" (declared size to OS)
|{DDDDD|DDDDD|DDDDD|D}XXXX|XXXXX
Type 2
Memory content
DDDDDDDDDDDDDDDD
Disk content allocated to the "file" (declared size to OS)
|{DDDDD|00000|DDDDD|DDDDD|DXXXXX|}XXXXXX
In type 1, it is simply a "compression". This is what most people understand under sparse files.
In type 2, the OS/application reserves disk space, in essence for its own use. We used this in large scale distributed databases to prevent running out of storage before the systems can fully synchronize. In some implementations the allocated space is wiped, and in others the OS is told "this is part of the file", yet nothing is really written to it until needed.
In either case, the OS or file system does not know the actual data content size.
In type 1, the slack is outside of the file.
In type 2, the slack is inside of the file. Note that type 2 may still have an outside slack (but in my experience this is not the case, as apps tend to take advantage of the storage segmentation).
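Type 1 can be sketched as storing only the non-zero runs together with their offsets. This is a toy illustration of the "compression" idea using the D/0 notation above, not how any particular filesystem records sparse ranges; `to_extents` and `rebuild` are hypothetical helper names.

```python
import re

# Memory content from the Type 1 example: 5 data, 5 zeroed, 11 data.
memory = "D" * 5 + "0" * 5 + "D" * 11


def to_extents(content: str) -> list[tuple[int, str]]:
    """Return (offset, run) pairs for each non-zero run; zero runs are not stored."""
    return [(m.start(), m.group()) for m in re.finditer(r"[^0]+", content)]


def rebuild(extents: list[tuple[int, str]], length: int) -> str:
    """Recreate the logical content, filling every gap with zeros."""
    out = ["0"] * length
    for offset, run in extents:
        out[offset:offset + len(run)] = run
    return "".join(out)


extents = to_extents(memory)
stored = sum(len(run) for _, run in extents)  # 16 units on disk for 21 in memory
```

The zeroed section exists only in the offsets, which matches the point that there is nothing on disk for it to leave behind.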
valid data file slack – file slack created by NTFS when a file is extended by a number of zero bytes through the use of the system call SetValidData(). While a suitable number of clusters are allocated and added to the file to support the extra bytes, the previous cluster content is not wiped. Instead NTFS uses special code to ensure that any attempt to read from the additional bytes results in a suitable number of programmatically zeroed bytes, instead of actual cluster content. Not until the program writes to one (or more) of these clusters will they be initialized and from there on be treated as 'normal' clusters.
File slack is created by many other file systems, not just NTFS.
What is "SetValidData()"? In what programming language? The only system calls I know of for disk are INT 13h calls . . .
My point - I think you overcomplicated it.
File slack is created by many other file systems, not just NTFS.
At present, the only known source of this particular type of slack is NTFS. Strictly speaking, it has only been verified for Windows; status on non-Microsoft implementations of NTFS is still unknown.
What is "SetValidData()"?
A Microsoft Windows system call – though 'system call' should not necessarily be interpreted to mean 'kernel call' or 'BIOS call'. It's implemented by 'kernel32.dll' – that's why I call it a system call. You find the details in the usual MS places – try googling for 'MSDN SetValidData' for one possibility.
I know that EnCase 6 supports this kind of file system structure, though it's not classed as file slack. I'm less sure what EnCase 7 does, though it probably does the same, and I have no idea how other forensic suites or toolkits deal with it.
My point - I think you overcomplicated it.
It may not be a type of file slack that is the most important thing to get right in a list of definitions of slack in general, true. But it does belong on a list of types of file slack.
Type 1 […]
Type 2 […]
Just to make sure, SetValidData() creates a structure that could be called Type 3
Memory content (added spaces for legibility)
DDDD DDDD DDDD DDD0 000
Disk content allocated to the "file" (declared size to OS)
|{DDDD|DDDD|DDDD|DDDX|XXX}X|
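The Type 3 layout can be mimicked with a toy model in which reads past the valid data are answered with zeros while the clusters underneath keep their old content. This only illustrates the behaviour as described in the definition above; the `ToyFile` class and its field names are assumptions, and the character "0" stands in for a zero byte to match the notation.

```python
class ToyFile:
    """A file whose declared size exceeds the data actually written to it."""

    def __init__(self, disk: bytes, size: int, valid_len: int):
        self.disk = disk            # raw content of the allocated clusters
        self.size = size            # declared file size
        self.valid_len = valid_len  # bytes that were really written

    def read(self) -> bytes:
        # What a normal read returns: real data, then programmatic zeros.
        return self.disk[:self.valid_len] + b"0" * (self.size - self.valid_len)

    def slack(self) -> bytes:
        # What direct disk access finds inside the file beyond the valid data.
        return self.disk[self.valid_len:self.size]


# Disk content from the Type 3 diagram: 15 data units, then 5 old units,
# of which 4 lie inside the declared file size of 19.
f = ToyFile(disk=b"D" * 15 + b"X" * 5, size=19, valid_len=15)
logical = f.read()    # 15 D followed by 4 zeros
hidden = f.slack()    # the 4 X that a normal read never shows
```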
I think there is a mixup of the winapi SetFileValidData and the fsutil switch SetValidData.
My own attempt at a definition
valid data file slack – file slack created by NTFS when a file is extended by a number of zero bytes through the use of the system call SetValidData(). While a suitable number of clusters are allocated and added to the file to support the extra bytes, the previous cluster content is not wiped. Instead NTFS uses special code to ensure that any attempt to read from the additional bytes results in a suitable number of programmatically zeroed bytes, instead of actual cluster content. Not until the program writes to one (or more) of these clusters will they be initialized and from there on be treated as 'normal' clusters.
With all due respect, that is not a definition, it is a short article illustrating the feature.
But I expect that's just as difficult to understand. Trying to get it into a single sentence is probably doomed to failure.
Yes, it is, but at the moment we are still at the "list" and "definitions" stage; we can add to each "short", "synthetic" (and possibly clear enough) "definition" a "corollary" of *any* length to explain and expand on the definition.
By the way, what is 'unindexed space'?
To me "unindexed space" is some space that is "not indexed", in the sense that it is something that you cannot access through the normal commands.
A "sparse" file does not (in my view) fall in this category, as it is "indexed" all right (like with DIR) and accessible all right (if you prefer, it can never contain "residual data").
The fsutil setvaliddata one, on the contrary, "indexes" more space than you can see (as an example, for a plain .txt file) with a TYPE command.
The other "slack types" can all contain at least a bunch of bytes that are viewable only with direct disk access.
On the other hand, IF it was easy we would already have a valid set of definitions and we wouldn't try to create one and discuss on it…..
I think there is a mixup of the winapi SetFileValidData and the fsutil switch SetValidData.
Very possibly. athulin introduced the SetFileValidData() function, I found the linked-to article that uses fsutil setvaliddata, and I assumed that fsutil used the SetFileValidData() function (this would be "logical", but you never know with the good MS guys).
If you prefer, athulin introduced a concept and I found a practical way to try replicating it (an existing command line that created that effect). Since I have no way of "calling" the SetFileValidData() function, I cannot try comparing the two; care to disambiguate the issue?
jaclaz
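For anyone who wants to try the API route rather than fsutil, the Win32 function can be reached from Python through ctypes. This is a minimal, untested sketch under several assumptions: it runs on Windows against an NTFS volume, the process already has the SE_MANAGE_VOLUME_NAME privilege enabled (the call fails otherwise), and the function name `extend_without_zeroing` is made up for illustration. Whether fsutil goes through the same call is not established here.

```python
import ctypes
import os
import sys

FILE_BEGIN = 0  # dwMoveMethod: offset is counted from the start of the file


def extend_without_zeroing(path: str, new_size: int) -> None:
    """Extend path to new_size bytes, then declare all of it valid data."""
    if sys.platform != "win32":
        raise NotImplementedError("SetFileValidData exists only in the Windows API")
    import msvcrt  # Windows-only module, hence the late import

    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.SetFilePointerEx.argtypes = [ctypes.c_void_p, ctypes.c_longlong,
                                     ctypes.POINTER(ctypes.c_longlong),
                                     ctypes.c_uint32]
    k32.SetFilePointerEx.restype = ctypes.c_int
    k32.SetEndOfFile.argtypes = [ctypes.c_void_p]
    k32.SetEndOfFile.restype = ctypes.c_int
    k32.SetFileValidData.argtypes = [ctypes.c_void_p, ctypes.c_longlong]
    k32.SetFileValidData.restype = ctypes.c_int

    fd = os.open(path, os.O_RDWR | os.O_BINARY)  # write access is required
    try:
        handle = msvcrt.get_osfhandle(fd)
        # Step 1: move the end-of-file marker to the new size.
        if not k32.SetFilePointerEx(handle, new_size, None, FILE_BEGIN):
            raise ctypes.WinError(ctypes.get_last_error())
        if not k32.SetEndOfFile(handle):
            raise ctypes.WinError(ctypes.get_last_error())
        # Step 2: set the valid data length to the same value.
        if not k32.SetFileValidData(handle, new_size):
            raise ctypes.WinError(ctypes.get_last_error())
    finally:
        os.close(fd)
```

Comparing a file treated this way with one treated by "fsutil file setvaliddata", using direct disk access on a scratch volume, would be one way to settle whether the two produce the same result.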