File Signature Analysis and Data Carving
I have to find an occurrence of a PDF file within a disk image (it is in raw data, not within the file system). Once it is identified, the file has to be retrieved using data carving techniques.
Does anyone know how to do this using Sleuthkit on Ubuntu?
Hi Blonde,
This sounds to me like the sort of thing a student has been asked to do. We've seen a lot of discussions here on helping students with assignments, so here are my suggestions.
If you have to do this using Sleuthkit on Ubuntu then I would suggest that you read the Sleuthkit documentation first.
However, if you don't have to use Sleuthkit, you will learn more about what the tool would be doing by doing it manually. Google will point you to resources on file signature analysis and the PDF file signature. Using a hex editor you will be able to locate occurrences of that signature. You will need to understand the file system of your image to avoid false positives, and you could do worse than invest in a copy of Brian Carrier's File System Forensic Analysis. It will be an investment.
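As a rough sketch of that manual search (the image and offsets here are invented for illustration, not from the OP's case): PDF files begin with the magic bytes %PDF, and grep can report the byte offset of every match in an image.

```shell
# Build a toy "disk image" with a PDF signature at a known offset
# (filenames and sizes are examples only).
printf 'A%.0s' $(seq 1 8192) > image.dd            # 8 KiB of filler
printf '%%PDF-1.4\nfake pdf body\n%%%%EOF\n' >> image.dd

# grep -a treats the file as text, -b prints the byte offset of each
# match, -o prints only the matched bytes; "%PDF" is the PDF magic.
grep -abo '%PDF' image.dd
# prints "8192:%PDF" - the byte offset to carve from
```

The same search works on a real image; you then translate the byte offset into a cluster number for the carve.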
Finally you should be able to carve out the file using dd or dcfldd. Use the man pages or Google to find out more about how they work.
Best of luck,
Steak'n'Eggs
I would recommend foremost or scalpel to carve out the data. I have never heard of using dd or its variants for anything other than imaging a block device. I would be interested in how you could use dd to carve.
I second Beetle's comment…I would be very interested to see the process used to carve using dd or dcfldd.
Thank you.
It's relatively simple. The good thing about PDFs is that they are too large to be resident, so you're dealing with straight cluster carving. Identify the offset of the file-type header match, and then use dd with the bs, skip and count parameters. Of course this isn't so great if you have fragmentation, but then nothing handles fragmentation well when there are no directory entries.
E.g., your drive uses 4k clusters, and you locate a hit for the file signature at cluster 50123. Determine a reasonable size for a carve, say 100k, and you get
> dd if=/jobs/case123/image123.dd of=/jobs/case123/export/carve0001.pdf bs=4k skip=50123 count=25
It's relatively simple to script something like that also to handle multiple file types.
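A minimal sketch of what such a script might look like, assuming a 4k cluster size and a fixed 100k carve as in the example above (the image here is a toy one built inline so the loop has something to find; paths are invented):

```shell
# Hypothetical scripted version of the dd carve: locate every %PDF
# header hit and carve a fixed number of clusters from the start of
# the containing cluster.
IMG=image.dd          # example path, not from the thread
BS=4096               # cluster size - check your file system first
COUNT=25              # 25 clusters = 100 KiB per carve, as above

# Toy image: 2 clusters of filler, then a fake PDF.
printf 'A%.0s' $(seq 1 8192) > "$IMG"
printf '%%PDF-1.4 toy body %%%%EOF' >> "$IMG"

n=0
grep -abo '%PDF' "$IMG" | cut -d: -f1 | while read -r off; do
    n=$((n + 1))
    # Integer division turns the byte offset into a cluster number.
    dd if="$IMG" of="carve$n.pdf" bs="$BS" skip=$((off / BS)) \
       count="$COUNT" 2>/dev/null
done
```

Extending it to multiple file types is just a matter of looping over a table of signatures.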
Maybe I am missing something, but why would you use dd, or a variant? Wouldn't you want to use icat, blkcat, or something like that?
And yes, this totally sounds like a classroom exercise. If you can't answer this question without cheating, you probably shouldn't be in Computer Forensics.
foremost and scalpel are good, but photorec from the testdisk suite is simpler to use and recovers fragmented files where it can. It uses an ncurses interface by default, but there are CLI options too.
Don't be fooled by the name - it recovers far more than just photos.
Paul
Folks,
Patrick4n6 is on my wavelength but I'd go a bit further with this particular example bearing in mind that I'm working on the premise that this is a student's assignment.
Using dd or dcfldd for file types which have footer file signatures as well as header file signatures (e.g. pdf, jpeg) allows the operator to measure the potential file size and then carve out the specific file exactly. Clearly one has to be aware of false positives, given that the footer will not be aligned to a cluster boundary.
There is no doubt that using your carving tool of choice is easier but this method is focussed on the student. It enables a learner to manually execute the process, hopefully leading to a solid understanding of the mechanics of what a tool would be doing. In my opinion this is a necessity, otherwise we're building on sand…
Steak'n'Eggs
PS
I find myself in a bit of a quandary here. I would be happy to post a fully-detailed example but I'm mindful of the fact that my intention is not to give the OP the answer on a plate.
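Without giving the OP's exact answer away, here is the shape of that header-plus-footer measurement on an invented toy image (this assumes a single, unfragmented PDF; %PDF and %%EOF are the PDF header and end-of-file markers):

```shell
# Toy image: filler, one fake PDF, more filler (all paths/sizes invented).
IMG=image.dd
printf 'X%.0s' $(seq 1 4096)  > "$IMG"
printf '%%PDF-1.4 body body %%%%EOF' >> "$IMG"
printf 'Y%.0s' $(seq 1 4096) >> "$IMG"

# Measure: byte offset of the header, byte offset of the footer.
start=$(grep -abo '%PDF' "$IMG"  | cut -d: -f1 | head -n 1)
end=$(grep -abo '%%EOF' "$IMG" | cut -d: -f1 | head -n 1)

# The footer itself is 5 bytes, so the file size is (end + 5) - start.
size=$((end + 5 - start))
dd if="$IMG" of=carved.pdf bs=1 skip="$start" count="$size" 2>/dev/null
```

bs=1 is fine for a demonstration; on a real image you would carve cluster-aligned with a larger block size for speed.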
How is that any different from using foremost or scalpel? I mean, besides the fact that I can just give those tools the header and tell them to extract X number of bytes afterward? Whereas, with dd/dcfldd, I first have to locate the offset through some other means?
"I would be happy to post a fully-detailed example but I'm mindful of the fact that my intention is not to give the OP the answer on a plate."
I'd look at it this way…posting a fully-detailed example is likely to be useful to many more on this list (and others), so giving the OP the answer on a plate would be a rather small consideration, I'd think.
If you have identified the offset of the PDF header, presumably with a hex editor, isn't it easier to select a block from that point to where you think the file ends, copy it out, and write it to a new file, à la old-style Norton Disk Editor deleted-file recovery? Using dd as you set out would work, but it is an additional step with another tool and not necessarily the most efficient approach. I suspect the exercise (assuming it is one) was set by the instructor after the various tools had been introduced in a lecture or class demonstration, and that the point is to take those tools and use them effectively.