I recently attended the F3 conference here in the UK and there was an excellent presentation from Guidance on hashing at block level. I totally "got" how useful this could be, but I am intrigued by the legal implications and the weight of evidence a couple of hash matches on their own would carry. If I were to try to explain this in court, I would compare it to trying to prove that the suspect had possession of a completed 100,000-piece jigsaw on their premises, and the forensic investigator found 3 or 4 pieces scattered around the house that match the original pieces. I know it's a flawed metaphor but it's the best I can come up with.
I have used this method, and you are correct in saying it's flawed with 4 pieces, however….
if you discovered 87,000 pieces scattered around the floor, it changes everything
Mitch
Agree totally, but there is a grey area somewhere in between that will need careful consideration, to say the least
I guess it comes down to the matter of 'reasonable doubt'.
If you, as an expert, can say beyond reasonable doubt that file x existed on system a, because y percent of the total file still exists on it and in your expert opinion there is no other explanation for those blocks existing where they do, then surely that should be sufficient - that is your purpose as an expert witness, after all: to present the evidence as it exists and provide your expert opinion.
It depends entirely on the size and content of the file (i.e. lots of zero-blocks in a small file is not going to convince me!)
Further corroborating evidence (as per Simon's demonstration, there were LimeWire artefacts pointing to the file having been downloaded) is surely hugely useful.
It's easy to fall into the trap of saying that it is "x" percent of the total file (implying that the rest of that particular file was there). But there is a danger that, because we are trying to prove that one file was present, we ignore the other possibilities. What are the odds of another file producing a perfect match at block level? For example, a video file was demonstrated during the presentation and then around 25% of it was matched and re-constructed. Because the file format was sequential, the first part of the video ran, and as we were comparing it to the first video that was shown, it's easy to put 2 and 2 together. But there could be 1,000 different versions of that video available online to download that have been edited half way through. So, on that basis, it's very hard to say that it's beyond reasonable doubt that the exact file was ever stored on that piece of media.
Obviously, the context of the investigation is all important.
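To make the "x percent" point concrete, here is a rough sketch of the general idea rather than the actual script demonstrated at F3: hash every 512-byte block of a known reference file, discard trivial blocks (all 0x00, all 0xFF, or any other single repeated byte), then scan a raw image sector by sector for matches. The block size and file names are assumptions for illustration only.

```python
import hashlib

BLOCK = 512  # block size assumed here; the tool demonstrated may use a different size

def block_hashes(path):
    """SHA-1 of each non-trivial block -> list of block indexes in the reference file."""
    hashes = {}
    with open(path, "rb") as f:
        index = 0
        while True:
            data = f.read(BLOCK)
            if not data:
                break
            # skip blocks made of one repeated byte (all 0x00, all 0xFF, etc.)
            if data.count(data[0:1]) != len(data):
                hashes.setdefault(hashlib.sha1(data).hexdigest(), []).append(index)
            index += 1
    return hashes

def scan_image(image_path, ref_hashes):
    """Return {image byte offset: [matching reference block indexes]}."""
    matches = {}
    with open(image_path, "rb") as img:
        offset = 0
        while True:
            sector = img.read(BLOCK)
            if len(sector) < BLOCK:
                break
            digest = hashlib.sha1(sector).hexdigest()
            if digest in ref_hashes:
                matches[offset] = ref_hashes[digest]
            offset += BLOCK
    return matches

ref = block_hashes("reference_video.bin")   # hypothetical file names
found = scan_image("suspect_disk.dd", ref)
total = sum(len(v) for v in ref.values())
matched = {i for idxs in found.values() for i in idxs}
print(f"{len(matched)} of {total} non-trivial reference blocks matched "
      f"({100.0 * len(matched) / max(1, total):.1f}%)")
```

Quoting the percentage of non-trivial blocks, rather than of all blocks, at least avoids inflating the figure with zero-fill; it still says nothing about which of many near-identical files the blocks came from.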
I was at the same conference.
I think there are two issues. One is a match of a hash value, and we all know that a false positive is possible. However, if you can access the original file and do a byte compare on the sector / cluster, then this removes false positives. You do, though, need to look at the data to make sure it is not likely to be a random match, e.g. many file starts can be the same, or a sector may contain just a few numbers and mainly zeros or 0xFFs.
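The byte compare suggested here is a few lines in practice. A minimal sketch, with placeholder paths, offsets and block size: once a sector's hash matches, read the raw bytes from both the image and the original file and compare them directly, so a hash collision cannot produce a false positive.

```python
BLOCK = 512

def verify_match(image_path, image_offset, reference_path, ref_block_index, block=BLOCK):
    """Compare the raw bytes on both sides, removing any hash false positive."""
    with open(image_path, "rb") as img, open(reference_path, "rb") as ref:
        img.seek(image_offset)
        ref.seek(ref_block_index * block)
        return img.read(block) == ref.read(block)

# e.g. confirm the sector at byte offset 1048576 really is block 37 of the original file
print(verify_match("suspect_disk.dd", 1048576, "reference_video.bin", 37))
```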
The longer the match the better, i.e. cluster rather than sector. If you can show that the matching clusters are all user data (and not just control blocks) then I would suggest a few matches could be extremely important.
The demo we saw tried to scatter the sectors in an odd sequence; normally a file does not get very fragmented (5 fragments is a large number), and typically most fragments are in sequence.
To give more weight to the match, I would expect your few fragments to be sequential, and it should be clear from the disk use that the other fragments have been overwritten, and hopefully you can tell what has been written before and after the fragments you have found, i.e. 4 matches in sequence are more significant than 4 random matches over the disk.
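Checking whether the hits are sequential is mechanical once you have the matched offsets. A small sketch, assuming the (image offset, reference block index) pairs come from a block-hash scan like the one above and a 512-byte block size:

```python
BLOCK = 512

def sequential_runs(matches):
    """Group (image byte offset, reference block index) pairs into runs where
    both the image offsets and the reference blocks advance together."""
    runs = []
    for offset, ref_idx in sorted(matches):
        if runs and offset == runs[-1][-1][0] + BLOCK and ref_idx == runs[-1][-1][1] + 1:
            runs[-1].append((offset, ref_idx))
        else:
            runs.append([(offset, ref_idx)])
    return runs

hits = [(4096, 10), (4608, 11), (5120, 12), (902144, 200)]   # illustrative values only
for run in sequential_runs(hits):
    print(f"run of {len(run)} block(s) starting at image offset {run[0][0]}")
```

A single run of 4 consecutive blocks, with identifiable data before and after it, is exactly the "4 matches in sequence" case described above.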
… but there is a grey area somewhere in between that will need careful consideration, to say the least
That is probably 'the least'. What you need is research, both into the distribution of block hashes and into the size and distribution of fragments in non-contiguous files. In short, you need statistics.
The presence of unique block hashes – hashes that are present only in one file – would probably be of much higher importance than hashes that appear in several files.
But until there are solid statistics to lean on, there seems to be little that can be done with this.
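Gathering that first statistic is straightforward to sketch, even if building a meaningful corpus is not. A hedged example, with a hypothetical corpus directory: hash every block of every file in a reference set and keep only the digests that occur in exactly one file, since those are the block hashes worth attaching weight to.

```python
import hashlib
import os

BLOCK = 512

def unique_block_hashes(corpus_dir):
    """Return the block digests that occur in exactly one file of the corpus."""
    seen = {}                                   # digest -> set of file names it appears in
    for name in os.listdir(corpus_dir):
        path = os.path.join(corpus_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            while True:
                data = f.read(BLOCK)
                if not data:
                    break
                seen.setdefault(hashlib.sha1(data).hexdigest(), set()).add(name)
    return {digest for digest, files in seen.items() if len(files) == 1}

unique = unique_block_hashes("reference_corpus")    # hypothetical directory
print(f"{len(unique)} block hashes occur in only one file of the corpus")
```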
I am intrigued by the legal implications and the weight of evidence a couple of hash matches on their own would carry.
Surely you can answer this for yourself. If someone is going to adduce this evidence in court then the non-technical people in the court (including the jury) are going to rely on the opinions of experts. What would you say if you were the technical expert asked to comment on evidence that, of a file consisting of 500 x 512-byte blocks, only 1 or 2 blocks were found? What if it were 100, or even 250? What changes if the blocks found are all 0x00 or 0xFF?
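For what it is worth, the raw arithmetic for those scenarios (before any all-0x00 / all-0xFF blocks are discounted) looks like this:

```python
# 500 blocks of 512 bytes is a 256,000-byte file
total_blocks = 500
for found in (1, 2, 100, 250):
    print(f"{found} of {total_blocks} blocks = {100.0 * found / total_blocks:.1f}% of the file")
```

That prints 0.2%, 0.4%, 20.0% and 50.0%; the expert's job is then to say what, if anything, each figure actually proves.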
Having seen this presentation at F3 myself, my one regret is that I didn't have this script 3 years ago. I had a case where I had to do block-level hashing by hand (almost). I still have a print-out of the 4096 times table on my desk.
Paul
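The 4096 times table reduces to a couple of lines of Python, assuming a 4096-byte cluster and adjusting for the partition start where needed:

```python
CLUSTER = 4096   # cluster size assumed; adjust to the file system in question

def cluster_to_offset(cluster_number, cluster_size=CLUSTER, partition_start=0):
    """Byte offset in the image of the start of a given cluster."""
    return partition_start + cluster_number * cluster_size

print(cluster_to_offset(1234))   # 5054464
```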
I can't speak about what was presented, but I can say that the term "block-level hashing" can be a bit of a misnomer, at least as it is applied to an approach such as ssdeep/spamsum.
The more appropriate term is a "rolling hash", and this is somewhat different. It is easy to oversimplify (and harder to explain), but the goal of a rolling hash program is to determine the degree to which two files are alike, rather than to identify which blocks are identical.
I want to be sure that we are talking about the same thing.
IMHO, true block-level hashing could be important in identifying, for example, whether a certain program had been executed (i.e. via memory analysis) or whether unallocated space contained remnants of a file of interest, whereas a rolling hash would be more significant in identifying the degree to which two files were similar.
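A quick sketch of the distinction, assuming the python-ssdeep bindings are installed and with placeholder file names: ssdeep's rolling (context-triggered) hash gives a single 0-100 similarity score for two whole files, whereas the block-level approach asks which exact fixed-size blocks recur.

```python
import hashlib
import ssdeep   # python-ssdeep bindings, assumed to be installed

# Rolling / context-triggered hashing: one similarity score for the whole files.
h1 = ssdeep.hash_from_file("original.avi")      # placeholder file names
h2 = ssdeep.hash_from_file("edited_copy.avi")
print("ssdeep similarity (0-100):", ssdeep.compare(h1, h2))

# Block-level hashing asks a different question: which exact 512-byte blocks recur?
def block_digests(path, block=512):
    with open(path, "rb") as f:
        return {hashlib.sha1(chunk).hexdigest()
                for chunk in iter(lambda: f.read(block), b"")}

common = block_digests("original.avi") & block_digests("edited_copy.avi")
print("identical 512-byte blocks in common:", len(common))
```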
I would be much harder pressed to justify its use in demonstrating that an artefact found on a computer is indicative that a suspect file was there (given the small number of data points in the straw-man example).
Just want to make sure that this is an apples to apples comparison.
I wasn't at F3 but have heard about this presentation.
From listening to second-hand versions of the talk, my thoughts are that it's an interesting academic exercise but it raises many questions as to its real-world application. The technique, as I understand it, can prove that a fragment exists of a file which, in its whole form, is contraband. How do you then show how it got there? Was it ever there in its whole form? Was it originally there (as has already been raised) in an edited form? When did it get there? How do you show user intent? It's difficult enough showing this when you carve a whole file from unallocated space, let alone a fragment of a file. I would be interested to hear whether these points were addressed.