I'm looking for opinions on which type of file, whether identified by extension or by content type, is easiest to uncover forensically. Is it graphics, video, sound, or text?
There is no right answer to this question, IMHO. There are many different formats for the types of files that you mention, and most (not all) can be identified by certain signatures, such as headers and footers.
But then you have the problem of file fragmentation: a file may not be stored contiguously on disk. So to recover the file completely, you need to be able to use file system data to determine which blocks the file actually occupied.
The tasks are the same for almost any instance of the file types mentioned, so I don't know that you can say that one is easier to recover than another.
A raw text file with no structure, or a binary file with no discernible structure, could be the most difficult to recover.
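To make the header/footer point concrete, here is a minimal carving sketch in Python. The SOI/EOI markers are real JPEG signatures; everything else (the file names, reading the whole image into memory) is just for illustration, and note that it assumes the file is contiguous, which is exactly the fragmentation caveat above.

# A minimal carving sketch, assuming we are hunting JPEGs: scan a raw
# image for the SOI marker and carve up to the next EOI marker. This
# only recovers files stored contiguously on disk.
import sys

SOI = b"\xff\xd8\xff"   # JPEG start-of-image signature (header)
EOI = b"\xff\xd9"       # JPEG end-of-image signature (footer)

def carve_jpegs(data):
    """Yield (offset, candidate_bytes) for each header/footer pair found."""
    pos = 0
    while True:
        start = data.find(SOI, pos)
        if start == -1:
            return
        end = data.find(EOI, start + len(SOI))
        if end == -1:
            return
        yield start, data[start:end + len(EOI)]
        pos = end + len(EOI)

if __name__ == "__main__":
    # Usage (hypothetical file name): python carve.py disk.img
    with open(sys.argv[1], "rb") as f:
        data = f.read()   # fine for a demo; a real tool would mmap or stream
    for offset, blob in carve_jpegs(data):
        with open(f"carved_{offset:08x}.jpg", "wb") as out:
            out.write(blob)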
Well, ASCII text files, of course. Just by looking at them in a text or ASCII viewer, you will easily see their contents. UTF-16BE and UTF-32BE English may be a little more difficult, and non-Western languages might be the most difficult text files.
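For what it's worth, here is a quick Python illustration of why UTF-16BE English is only slightly harder: the ASCII code points are still there, just padded with NUL bytes. The detector below is a rough heuristic of my own, not a standard test.

# Illustration only: the same English text in three encodings, plus a
# crude heuristic for spotting UTF-16BE English (ASCII code points with
# a 0x00 high byte), which is why it is only "a little" harder.
text = "evidence"

print(text.encode("ascii"))      # b'evidence'
print(text.encode("utf-16-be"))  # b'\x00e\x00v\x00i\x00d\x00e\x00n\x00c\x00e'
print(text.encode("utf-32-be"))  # b'\x00\x00\x00e\x00\x00\x00v...'

def looks_like_utf16be_english(chunk):
    """Heuristic: most even-indexed bytes are NUL, odd ones printable."""
    if len(chunk) < 8:
        return False
    half = len(chunk) // 2
    nuls = sum(1 for b in chunk[0::2] if b == 0)
    printable = sum(1 for b in chunk[1::2] if 0x20 <= b <= 0x7e)
    return nuls > 0.9 * half and printable > 0.9 * half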
It might be more interesting to talk about which file types are the most difficult to investigate. Your first step is to identify the file. It is pretty difficult to identify Portrait Innovations Images (.PI2), because they contain no headers or signatures and are encrypted.
Here are some more file types that are equally difficult to identify. We utilize our own proprietary methods to identify them with FI TOOLS.
Crystal Decisions 3D View Angles (.3DA)
Extended Binary Coded Decimal Interchange Code (EBCDIC)
MIME Base64 Archive (.B64; Headerless)
Sound Sample (.SND or .SOU; Headerless)
XnView Filter (.HFP)
If you're not using FI TOOLS, hopefully there isn't any evidence hiding in any of these file types. -)
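For readers without FI TOOLS: about the only generic signal for headerless or encrypted data like the types above is statistical. Here is a rough Python sketch of a Shannon entropy check; the thresholds are rule-of-thumb guesses of mine, and this is emphatically not FI TOOLS' proprietary method.

# Shannon entropy per block: values near 8 bits/byte suggest encrypted
# or compressed content, lower values suggest text or sparse data.
import math
from collections import Counter

def shannon_entropy(block):
    """Bits per byte of a bytes block (0.0 for empty input)."""
    if not block:
        return 0.0
    counts = Counter(block)
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def classify(block):
    h = shannon_entropy(block)
    if h > 7.5:                 # rough threshold, not a proven cutoff
        return "likely encrypted/compressed"
    if h < 5.5:                 # ditto
        return "likely text or sparse data"
    return "structured binary?"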
Well, ASCII text files, of course. Just by looking at them in a text or ASCII viewer, you will easily see their contents.
Sure, but on a 1 TB hard drive with loads of data in unallocated space, how do you find a text file if there is no hint in the MFT and no headers or footers?
I suppose that you could search for words starting with those most statistically likely to be found in an average document, but that still leaves a lot of brute force browsing.
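Here is roughly what I mean, sketched in Python: scan the raw bytes for a handful of statistically common English words and flag cluster-sized windows with multiple hits. The word list, window size, and threshold are arbitrary choices for illustration; this only ranks regions for review rather than eliminating the browsing.

# Flag windows of unallocated space that look like English text.
# Simplification: words that straddle a window boundary are missed.
COMMON = [b" the ", b" and ", b" that ", b" have ", b" with "]
WINDOW = 4096

def hot_windows(data):
    """Yield (offset, hit_count) for windows that look like English text."""
    for off in range(0, len(data), WINDOW):
        chunk = data[off:off + WINDOW]
        hits = sum(chunk.count(w) for w in COMMON)
        if hits >= 3:           # arbitrary threshold
            yield off, hits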
I suppose that you could search for words starting with those most statistically likely to be found in an average document, but that still leaves a lot of brute force browsing.
At what levels do RedBull consumption and lack of sleep become dangerous?
seanmcl,
If we're talking about data/file carving, then the most structured file types would be the easiest. They would include a header, clear signatures at the start of every object, and a footer. Files based on RIFF, IFF, TIFF, PDF, OLE2, and PK Zip might be the best candidates. Did I miss any chunk-based file types?
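For example, RIFF chunks are self-describing (a 4-byte ID plus a 4-byte little-endian size), so a few lines of Python can walk and validate one. This sketch handles only top-level chunks and skips the contents of nested LIST chunks.

# Walk the top-level chunks of a RIFF file (WAV, AVI, ...). Each chunk
# announces its own size, which is why these formats carve so well.
import struct

def walk_riff(data):
    """Yield (chunk_id, size, offset) for top-level chunks in a RIFF file."""
    if data[:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    total = struct.unpack("<I", data[4:8])[0]
    form = data[8:12]                      # e.g. b'WAVE' or b'AVI '
    pos = 12
    while pos + 8 <= min(len(data), 8 + total):
        chunk_id = data[pos:pos + 4]
        size = struct.unpack("<I", data[pos + 4:pos + 8])[0]
        yield chunk_id, size, pos
        pos += 8 + size + (size & 1)       # chunks are word-aligned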
With OLE2 containers using their own form of internal disk structure, their block borders may actually line up with the logical and physical disk sector borders as well. Less RedBull should be required. -)
ForensicRob
I don't disagree. The OP asked what kind of files were easiest to uncover forensically, not the easiest to identify. To that question, I'd have to answer ASCII text or binary data files which lacked an obvious structure.
I third ASCII.
Most other formats require extraction and interpretation.
ASCII will appear as is even when viewed "directly".
The easiest to recover is a format with a header and a footer, or a header with the file size stored within.
IOW, no specific category.
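To illustrate the "header with stored file-size" case: BMP keeps the total file size as a little-endian uint32 at offset 2, so a carver can cut exactly the right number of bytes with no footer at all. A minimal sketch (the sanity bounds are my own guesses):

# Carve a BMP using the file size its own header declares.
import struct

def carve_bmp_at(data, offset):
    """Return the BMP starting at offset, using its stored size, or None."""
    if data[offset:offset + 2] != b"BM":
        return None
    size = struct.unpack_from("<I", data, offset + 2)[0]
    if size < 26 or offset + size > len(data):  # 26 = smallest plausible BMP
        return None
    return data[offset:offset + size]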
There are really two categories there. Graphics, video, and sound data are almost always compressed and stored in structured files of a variety of formats. If you have complete files and a table of the appropriate signatures, these would be incredibly easy to detect programmatically (and easy, but slow, to do by hand). With smaller fragments of data, you may be less successful. Text tends to be uncompressed and either stored in a structured file (like a Word document), stored plain in a file, or embedded in some larger structure. Even small blocks of data that are English text (ASCII or UTF-8) can be identified by hand. This can be done programmatically, too, with statistics: runs of bytes that happen, by chance, to form valid ASCII or Unicode character sequences become vanishingly unlikely as the run gets reasonably long.
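A sketch of that statistical test in Python, for the curious: a block that decodes cleanly as UTF-8 and is mostly printable is probably text, and the odds of random bytes passing both tests shrink rapidly with length. The 0.95 cutoff is an arbitrary choice of mine.

# Classify a byte block as probable text: it must be valid UTF-8 and
# consist almost entirely of printable characters or common whitespace.
def probably_text(block, printable_ratio=0.95):
    try:
        s = block.decode("utf-8")
    except UnicodeDecodeError:
        return False               # random bytes usually fail here fast
    if not s:
        return False
    printable = sum(1 for ch in s if ch.isprintable() or ch in "\n\r\t")
    return printable / len(s) >= printable_ratio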