I have a drive image that has many sectors that hold exactly the same data; there is a great deal of repetition.
I am attempting to identify every sector on the drive that holds this data, by MD5 hashing the repeated sector/data and then searching through the sectors finding matches.
I have two questions that hopefully someone can answer, but first here is what I have done so far.
For testing purposes I exported the first 10 MB of data from the drive into a file named "1st_10meg.bin".
I then tested that I could interrogate the file by entering the following
dd if=1st_10meg.bin bs=512 count=1 | od -Ax -tx1z
This successfully displays the first sector, as expected.
I then entered this
dd if=1st_10meg.bin bs=512 count=1 | md5sum
Which displayed an MD5 hash for the first sector.
Next i entered
for ((x=0;x<10;x++)); do dd if=1st_10meg.bin bs=512 count=1 skip=$x | md5sum >> md5s.txt; done
Which puts all of the sectors md5 hashes into the text file specified, in this case "md5s.txt".
The MD5 hash of the repeated sector happened to contain the number "146", so next I entered
grep 146 md5s.txt
Which displayed all md5 hashes containing "146", and so displayed all of the repeated sectors.
I then displayed the number of repeated sectors by entering
grep 146 md5s.txt | wc -l
Next i entered
for ((x=0;x<10;x++)); do echo -n "$x " >> md5s.txt; dd if=1st_10meg.bin bs=512 count=1 skip=$x | md5sum >> md5s.txt; done
which, as before, put all the MD5 hashes into a text file, this time with their related sector numbers.
That is where I am up to. What I want to know is
** how to take the repeated blocks from "md5s.txt" and put them into their own text eg "repeated_blocks"
and more importantly
** how to use these commands to interrogate the whole of the drive as opposed to a file containing the first 10 MB.
I attempted to display the first sector from the drive directly by typing the following
dd if=//./ physicaldrive01 bs=512 count=1 | od -Ax -tx1z
however this did not work.
I hope someone on here knows more about this stuff and can help…
Thanks
Hi Forenz,
I know that we have had some brief discussion over PM, but for the record, I think it is best to answer here so that it can be cross referenced in future :-)
I've voiced the opinion that you'd be best to do things in Perl, and that really is my opinion alone … I find it easier to construct anything more complex than a one liner in. However, as you pointed out that time was something of a limited resource, completing your task as you have started is fair enough. For reference though, the books by Harlan Carvey are very good for Perl on Windows, obviously "The Camel Book" etc. from O'Reilly are great, and, if you are going to write regularly, I would strongly recommend "Perl Best Practices" by Damian Conway, also from O'Reilly.
Right, back to the task in hand, working backwards …
Cygwin is a fickle beast, it is UNIX running on Windows … A match made in Hell … So to carry out the command that you have tried to access the physical drive, you need to know that a "\" is an escape character in the UNIX shell, and in order to type a literal "\" you need to type "\\" … With me ?
So the actual command is
dd.exe if=\\\\.\\physicaldrive0 bs=512 count=1 | od -Ax -tx1z
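As a side note, doubling the backslashes is not the only way to get them past the shell. A minimal sketch, assuming the Cygwin shell is bash: single quotes switch off the escape handling, so the Windows device path can be typed as it stands. The variable names are only there for illustration, and nothing here touches a real drive.

```shell
# Unquoted: every "\" has to be doubled, because the shell treats "\" as an escape.
escaped=\\\\.\\physicaldrive0

# Single-quoted: the shell passes the text through untouched.
quoted='\\.\physicaldrive0'

# Both spellings end up as the same string.
printf '%s\n' "$escaped" "$quoted"

# So the first-sector read could also be written as:
#   dd if='\\.\physicaldrive0' bs=512 count=1 | od -Ax -tx1z
```

Which form to use is a matter of taste; the quoted one is arguably easier to read back later.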
Your other question
** how to take the repeated blocks from "md5s.txt" and put them into their own text eg "repeated_blocks"
Is a little less clear. You could simply, assuming that you are only looking for blocks that match the "146" signature, run the output of your command through grep again and pipe the output to a file …
grep 146 md5s.txt | tee repeated_blocks
This however would also match any block number that happened to contain "146" ( 146, 1146, 1460 etc. ) which would be a pain. If you can get a longer search string ( preferably with a letter in it ! ) that would reduce hits.
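One way round the false hits, sketched here with made-up sample data rather than the real md5s.txt: compare the whole hash against the hash column only, so that a sector number can never match. The file names under the temporary directory and the target hash are assumptions for the example.

```shell
tmp=$(mktemp -d)

# Hypothetical sample in the "sector hash *-" layout used in this thread.
cat > "$tmp/md5s.txt" <<'EOF'
146 98643d9aa0859d267ce0d1196ab3bba0 *-
888 bf619eac0cdf3f68d496ea9344137e8b *-
889 bf619eac0cdf3f68d496ea9344137e8b *-
EOF

# Full hash of the repeated sector (taken from the sample above).
target=bf619eac0cdf3f68d496ea9344137e8b

# Test column 2 (the hash) for an exact match; sector 146 is not a hit.
awk -v h="$target" '$2 == h' "$tmp/md5s.txt" | tee "$tmp/repeated_blocks"
```

With a plain "grep 146" the first sample line would have been picked up purely because of its sector number.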
This is hardly an answer with any finesse though 😉
So firstly, I'm going to make some small changes to your code so that we can do what we want …
for ((x=0;x<1000;x++));
do printf %10d%s $x " " >> md5s.txt;
dd if=\\\\.\\physicaldrive0 bs=512 skip=$x count=1 | md5sum >> md5s.txt;
done
This gives a slightly longer run of the first 1000 sectors, and rather than run it on the command line, I put it into a shell script file. Also, rather than just putting the sector number on the first part of the line, I have assigned it to a fixed length string. This means that in the next part, I know in advance how many of the chars at the beginning of each line I can ignore. Otherwise at some points it would be 1 ( 0 - 9 ), some it would be 2 (10 - 99), some 3 (100 - 999) etc. You could process this, but to my warped mind, I think that this is an easier way of doing it. Obviously it would be wise to figure out how many sectors you intend to process this way and make sure that they fit into the 10 chars that I have assigned !
( For more info on printf type "info printf" in the Cygwin command line ).
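For an image file (not the raw device), the loop bound does not have to be hard-coded. The following is only a sketch under stated assumptions: it builds a tiny stand-in image of four zeroed sectors, works out the sector count from the file size, and writes the same fixed-width layout as the script above. It uses a plain while loop so it does not depend on bash-only syntax; the names img and out are hypothetical.

```shell
# Build a tiny stand-in image: 4 sectors of zeros (test data only).
img=$(mktemp)
out=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=4 2>/dev/null

# Work out the sector count from the file size instead of hard-coding it.
bytes=$(wc -c < "$img")
sectors=$((bytes / 512))

x=0
while [ "$x" -lt "$sectors" ]; do
    # Right-align the sector number in 10 columns, followed by one space.
    printf '%10d ' "$x" >> "$out"
    # count=1 limits each hash to a single 512-byte sector.
    dd if="$img" bs=512 skip="$x" count=1 2>/dev/null | md5sum >> "$out"
    x=$((x + 1))
done

cat "$out"
```

For a physical drive the size would have to come from somewhere else, since wc -c is only sensible on an ordinary file.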
Ok, so now we have a file that looks something like this …
882 98643d9aa0859d267ce0d1196ab3bba0 *-
883 b04b87ce3d841179cdeff239963171ea *-
884 2da7cdae7acf7de11caea35da52e4d76 *-
885 7f97c09d649fd21f0bde6b2e4fe5a1b1 *-
886 2c08651e2ae72796062369ad988f358b *-
887 b7f611826de8ace5963ecc155c9cb956 *-
888 bf619eac0cdf3f68d496ea9344137e8b *-
889 bf619eac0cdf3f68d496ea9344137e8b *-
890 bf619eac0cdf3f68d496ea9344137e8b *-
891 bf619eac0cdf3f68d496ea9344137e8b *-
892 bf619eac0cdf3f68d496ea9344137e8b *-
893 bf619eac0cdf3f68d496ea9344137e8b *-
894 bf619eac0cdf3f68d496ea9344137e8b *-
895 56229dc60870150ab6ae225ccff2eb77 *-
896 f27e9596214101ca62892214ab1f042c *-
897 8b315805774b729af8c763b78a43d152 *-
898 559ea2dc61a4896fd9373bc9047d7a7c *-
899 bdcf2816e232a8319681b74788f64946 *-
900 cb66ec7f1a6d01fd349cb230b74af303 *-
901 fd4937c98025b9e582513148c8dccc79 *-
902 e1aed7dd3ec0bddb6df0422c5dfe1882 *-
At this point it gets easy 😉
Cygwin has the standard UNIX uniq utility - if you use the -D or --all-repeated switch, it will print all of the lines that are, well, repeated …
-s 10 skips the first 10 chars ( which we know are the sector numbers ), the "=separate" means to put a blank line between each block of duplicates.
uniq --all-repeated=separate -s 10 md5s.txt | tee repeated_blocks
This gives us, in the repeated_blocks file … ( a small subsection ) …
119 bf619eac0cdf3f68d496ea9344137e8b *-
281 bf619eac0cdf3f68d496ea9344137e8b *-
282 bf619eac0cdf3f68d496ea9344137e8b *-
283 bf619eac0cdf3f68d496ea9344137e8b *-
284 bf619eac0cdf3f68d496ea9344137e8b *-
285 bf619eac0cdf3f68d496ea9344137e8b *-
286 bf619eac0cdf3f68d496ea9344137e8b *-
340 bf619eac0cdf3f68d496ea9344137e8b *-
341 bf619eac0cdf3f68d496ea9344137e8b *-
348 bf619eac0cdf3f68d496ea9344137e8b *-
349 bf619eac0cdf3f68d496ea9344137e8b *-
350 bf619eac0cdf3f68d496ea9344137e8b *-
561 2c08651e2ae72796062369ad988f358b *-
562 2c08651e2ae72796062369ad988f358b *-
I've tested all of the above in Cygwin …
CYGWIN_NT-5.1 allanon 1.5.24(0.156/4/2) 2007-01-31 10:57 i686 Cygwin
So hopefully you should be able to "plug and play" the code 😉
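One thing to be aware of: uniq only compares neighbouring lines, so on its own it groups runs of consecutive identical sectors. A possible variation, which is a sketch on made-up sample data and not part of the commands tested above, is to sort on the hash column first so that matching sectors from anywhere in the file end up next to each other. It assumes GNU sort and uniq, as shipped with Cygwin.

```shell
tmp=$(mktemp -d)

# Hypothetical sample: 10-column sector number, a space, then the md5sum output.
{
    printf '%10d %s\n' 5 'bf619eac0cdf3f68d496ea9344137e8b *-'
    printf '%10d %s\n' 6 '2c08651e2ae72796062369ad988f358b *-'
    printf '%10d %s\n' 7 'bf619eac0cdf3f68d496ea9344137e8b *-'
    printf '%10d %s\n' 8 '98643d9aa0859d267ce0d1196ab3bba0 *-'
} > "$tmp/md5s.txt"

# Sort on the hash column, then numerically on the sector number, so that
# equal hashes become neighbours before uniq looks at them.
sort -k2,2 -k1,1n "$tmp/md5s.txt" |
    uniq --all-repeated=separate -s 10 |
    tee "$tmp/repeated_blocks"
```

In the sample, sectors 5 and 7 share a hash but are not adjacent, so they are only reported once the sort has brought them together.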
I do hope that this helps.
All the Best,
Azrael
Azrael,
Absolutely spot on, exactly what I wanted! Thanks a lot for putting the time in to answer my question in full mate! It's really appreciated.
Forenz
So if I made this into a Windows command line program in PERL, would I also be able to make a nice GUI for it in PERL?
Also, how different/tricky would it be to make this for Windows with PERL? I suppose it's a whole lot different from Linux commands!
Hmmm …
One of Perl's mottos is "There is more than one way to do it" 😉 Things would be easiest for you if you stick to the Cygwin Perl interpreter, rather than getting something like the ActiveState one - it is very good, but the Cygwin one will be more … urm … UNIXy ?
You could relatively easily run the system commands from Perl, using the backquotes `` to capture the output, or you could do individual parts of it from Perl itself. Or you could do the full monty in Perl.
For an image file, it is easy to open it as a binary file, read n bytes, use the Digest::MD5 module to get the hash ( or the Digest::SHA1 for that matter … ), push results to an array and do comparisons across it …
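The "collect the results, then compare across them" idea can be tried out before writing any Perl. What follows is a shell/awk sketch, not Perl and not tested against a real image: it remembers every sector number seen for each hash and reports only the hashes that occur more than once. The sample data and file names are made up for the example.

```shell
tmp=$(mktemp -d)

# Hypothetical sample in the "sector hash *-" layout from earlier in the thread.
cat > "$tmp/md5s.txt" <<'EOF'
5 bf619eac0cdf3f68d496ea9344137e8b *-
6 2c08651e2ae72796062369ad988f358b *-
7 bf619eac0cdf3f68d496ea9344137e8b *-
8 bf619eac0cdf3f68d496ea9344137e8b *-
EOF

# Keep a count and a list of sector numbers per hash, then print one
# summary line for each hash that turned up more than once.
awk '
    { count[$2]++; sectors[$2] = sectors[$2] " " $1 }
    END {
        for (h in count)
            if (count[h] > 1)
                print h, count[h] ":" sectors[h]
    }
' "$tmp/md5s.txt" | tee "$tmp/summary.txt"
```

A Perl version would follow the same shape, with a hash of arrays in place of the two awk arrays.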
I must admit that I work with images, not raw disks, but I'm positive that you would have no issues with reading directly from the same path as you did in the command line, provided that you run your Perl through the Cygwin window … Everything in UNIX is a file, that's why it's so cool.
With regard to the interface, Cygwin supports Perl Tk so you should be able to build a GUI, provided that you are able to run the X Windows Server part of the Cygwin distribution on your machine.
Perl isn't dissimilar to Shell Scripting, which is what you have done so far, it just has far more to it, and a greater extensibility.
Have a look at
http://search.cpan.org
http//
http//
http//
Try the Library for the books in the above post, and also try your Athens account - some of the Uni's subscribe to the O'Reilly Safari online bookshelf through Athens.
Hope this helps a bit. If you get stuck somewhere specific, post again 😉
All the Best,
Azrael
I don't know if I'm definitely going to do the "full monty" in PERL yet, but I do fancy it at some point…
If i purchase the famous "camel book" http//
will this give me sufficient knowledge to replicate what has been done with these commands, except entirely in PERL?
If I did this I would eventually want to create a GUI for it; it might be a useful project for uni.
Thanks again!
Ok. Short, medium, and long answers …
Short: Yes.
Medium: If you read the whole of the Camel book, you would know a heck of a lot about Perl, and, at the end of the process, you should be able to figure out how to do the above and a darn sight more.
Long: Whilst it is possible to recreate all of the steps above ( md5 etc. ) it would be counter productive as these things have been done already by other people, and these are available as Perl modules from/via CPAN. The Camel book explains how to use a generic module, but it doesn't tell you what is available or how to use the specific interface of a given module.
This means that you would have to do research beyond the scope of the book. Also, obviously, it is a book about Perl, not forensics, so any knowledge about that area would need to be gleaned from elsewhere …
The Camel book doesn't cover Perl TK at all, so you would need to learn that from elsewhere as well. ( There is a section on it in the Perl Cookbook though … Again, a great investment … )
I completely appreciate that books in the UK cost a fortune - quite how a book that costs $40 retails for £40 ( more than double the price at today's exchange rate )* I will never understand. So there is a great deal to be said, when it comes to learning programming, for setting yourself a project and doing it. There isn't much in Perl that you couldn't find out for free on the internet. Try http//
If you get particularly stuck at any point, PM me :-)
—-
* before I get lynched - yes, you can get it all cheaper online, I'm referring to bookshops … But even so, "Programming Perl" is $32.97 from Amazon.com and $47.66 from Amazon.co.uk, that's still nearly 50% more …