
Flaw in evidence verification process?

ThePM
(@thepm)
Active Member

Hey guys, I would appreciate your input on a discussion that we had at the office regarding the verification process of evidence files used by most forensic software/hardware.

For years, we were under the impression that after verifying an evidence file/drive, if the MD5/SHA1 hashes match, it was confirmation that the source and the destination data were exactly the same and that we had a "forensic copy" of the source drive.

However, what was pointed out is that if there is an error in the bitstream of data read from the source drive, the erroneous data will be written to the destination file/drive and the cumulative "source" MD5 will be calculated from this erroneous data. When the target MD5 is calculated for verification, it is calculated from the same erroneous data, so the verification MD5 will match the cumulative "source" MD5.
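To make the mechanism concrete, here is a minimal sketch (in Python, not taken from any particular imaging tool; the names are purely illustrative) of a typical acquire-then-verify loop, which shows why a corrupted read slips through:

```python
import hashlib

def acquire_and_verify(source, image_path, block_size=65536):
    """Minimal acquire-and-verify sketch: the 'source' hash is computed from
    whatever bytes the read returned, so a corrupted read is invisible."""
    acq_hash = hashlib.md5()
    with open(source, 'rb') as src, open(image_path, 'wb') as dst:
        while True:
            block = src.read(block_size)   # if this read is corrupted...
            if not block:
                break
            acq_hash.update(block)         # ...the corruption goes into the "source" hash
            dst.write(block)               # ...and into the image
    # "Verification" re-hashes the image that was written from the same bytes,
    # so it matches the acquisition hash even though neither matches the disk.
    ver_hash = hashlib.md5()
    with open(image_path, 'rb') as img:
        for chunk in iter(lambda: img.read(block_size), b''):
            ver_hash.update(chunk)
    return acq_hash.hexdigest(), ver_hash.hexdigest()
```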

But this "verification" absolutely does not mean that we have an exact copy of the source drive, since it has been calculated on erroneous data. The only way to be absolutely certain that the evidence data is a forensic copy of the source data would then be to hash the source drive (aside from SSD drives that bring additional challenges).

Am I missing something here? Is there some data validation during the data transfer that I'm not aware of? Because now, from my standpoint, I can't testify that I'm using a forensic copy of a drive just because the hashes match.

thanks.

Quote
Topic starter Posted : 02/05/2014 8:23 pm
Techie1
(@techie1)
New Member

Yes, although highly unlikely, I think erroneous data from the source media could cause this scenario. You may want to introduce into your procedures a second, separate hash of the source media, preferably using a different tool, to rule out errors in the tool as well.
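As a rough illustration of that second, independent check (assuming the source is still attached, e.g. behind a write blocker; the device path below is just an example), re-reading and hashing the source with a separate script and comparing against the tool's reported hash would expose such an error:

```python
import hashlib

def hash_source(path, block_size=1024 * 1024):
    """Independently re-read and hash the source media."""
    h = hashlib.sha1()
    with open(path, 'rb') as dev:
        for chunk in iter(lambda: dev.read(block_size), b''):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage: compare against the hash reported by the imaging tool.
# if hash_source('/dev/sdb') != tool_reported_source_hash:
#     print("Source hash mismatch - investigate before relying on the image")
```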

IIRC a read command to IDE/SATA only has the facility to return data - or just not return data. There is no facility or separate channel to indicate errors. This is why, when reading bad sectors, a lot of computers seem to hang: they have a long timeout while waiting for the data to be presented.

Probably need a post from an expert in HDD controllers etc to chip in on error detection on HDD reads.

ReplyQuote
Posted : 03/05/2014 2:36 am
mscotgrove
(@mscotgrove)
Senior Member

I agree with Techie.

I think that an error in reading is very rare. Also, if there is an error, it is normally going to be a repeated block, a block of rubbish, or maybe just a single bit error. These errors will change the hash values.

However, the important point is whether any such very rare error would change the evidence. Again, the chance that it could change a 'no' to a 'yes' is almost zero.

A much bigger concern is how one images a failing disk where one knows that each read of the disk may produce different data.

Overall, a hash value is just one section of the overall 'control' system. If a hash difference is detected, the next stage will be to track down the reason for the difference and then decide if it is significant.

ReplyQuote
Posted : 03/05/2014 1:59 pm
athulin
(@athulin)
Community Legend

For years, we were under the impression that after verifying an evidence file/drive, if the MD5/SHA1 hashes match, it was confirmation that the source and the destination data were exactly the same and that we had a "forensic copy" of the source drive.

You may want to try to trace where that idea comes from. It may apply to some particular piece of software (or even hardware) used in particular circumstances, but it seems unsafe to generalize it beyond that.

The only way to be absolutely certain that the evidence data is a forensic copy of the source data would then be to hash the source drive (aside from SSD drives that bring additional challenges).

A hash can never give you absolute certainty of identity, only absolute certainty of non-identity. Of course, this depends on how you define 'absolute' – my interpretation is obviously 'absolute = with no error at all'.

Besides, hashing is not 'the only way'. You can also compare images bit by bit, without involving any hashing at all. It may be less practical, but it may be more useful, as it also tells you where and how extensive the discrepancies are, which is a basis for a more informed decision about whether the discrepancies affect important evidence or not.
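A rough sketch of what such a comparison might look like (the file paths are placeholders); unlike a single hash, it tells you where, and how much, the two images differ:

```python
def compare_images(path_a, path_b, block_size=65536):
    """Compare two images block by block and report the byte ranges that differ."""
    diffs = []
    with open(path_a, 'rb') as a, open(path_b, 'rb') as b:
        offset = 0
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if not block_a and not block_b:
                break
            if block_a != block_b:
                diffs.append((offset, offset + max(len(block_a), len(block_b))))
            offset += block_size
    return diffs  # an empty list means the images are identical (including length)
```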

You also seem to assume that a hard disk will give you the same image the next time you image it. While it is probable, under normal conditions, it cannot be taken for granted. If the disk is stored away somewhere, and not actually used, the information on it decays. The next time you image it, you may get additional bad sectors, or changes in a known bad sector, and thus get a different hash. At that point, if the hash is all you go by, you're probably stuck.

Am I missing something here? … Because now, from my standpoint, I can't testify that I'm using a forensic copy of a drive just because the hashes match.

If the hash logged on acquiry matches a repeated image hash, it tells you that the image is unlikely (depending on what hash algorithm is being used) to have been changed between the time of acquiry and the time you perform the hash the second time.

But you should also know how the image was performed: what the source is, what tool was used, how it was configured, whether external conditions affected the operation, and whether the acquiry report from it can be trusted or omits any information, and if it does, how you obtain that information by other means. If there were bad sectors on the source disk, you should have a record of them, and you should know how they were treated (kept, or replaced with zeroes, say).

You should, I think, be able to testify that, within those limits, the image corresponds to the original hard drive.

There are additional issues if an image is taken on an unstable platform: that instability may affect the data acquired. I remember an acquiry I made on a system with bad memory – I was unable to get a solid image until I had identified and removed the bad memory, but I did not get any error indications from the acquiry software. If the power supply is overloaded, or if a laptop has bad batteries (even if it is connected to mains power), you can get some weird behaviour, which may also affect the behaviour of the imaging software. And if you're booting a Live CD for imaging, you may have to inspect – and perhaps even save – any system logs both before and after the acquiry to be able to say that there were no detected problems. (After you've ascertained that logging hasn't been turned off completely, of course.)

ReplyQuote
Posted : 03/05/2014 2:09 pm
PaulSanderson
(@paulsanderson)
Senior Member

Here's an old post of mine from 2004 that bears on the OP's question:

http://osdir.com/ml/security.forensics/2004-04/msg00016.html

In this case, one bit of the 16-bit IDE channel was held at 1 for every single read. The drive was read successfully (and could be re-read by any tool), but the data was corrupt.

Food for thought…

ReplyQuote
Posted : 03/05/2014 6:22 pm
jhup
(@jhup)
Community Legend

First, if we want to be purists, you never, ever have an exact copy. You lack the sync, alignment, gap, ECC, bad blocks, various tables, the controller program, and other nuances from the device. But, I digress…

Does the error originate from the device? Is the part generating the error part of the original evidence?

I think if, for example, there is a bit error generated by an IDE interface on a drive, then that error is part of the data - and should be part of it. Of course tracing it back and identifying it is important. Also, if the error is introduced by the forensic process, it must be removed if possible, or a mitigating solution found for whatever is generating the error.

Example - Would malware in evidence data be part of the evidence? This may sound circular, and contain the answer in itself. Yet I have talked to "forensicators" whose first reaction is to clean the malware.

This goes back to a pet peeve of mine.

We do not need exact "bit-by-bit" copies for forensics. Think about it. Does fingerprint analysis use 100% of an (already partial copy of a) fingerprint? Does DNA analysis use 100% of the DNA?

Here is something that should blow your mind, if you are stuck on "bit-by-bit". In most other forensics fields the evidence is, at least partially, destroyed… 😯

Remember, beyond reasonable doubt.

ReplyQuote
Posted : 04/05/2014 9:11 am
MDCR
(@mdcr)
Active Member

It would probably be better if there were some sort of multi-hash signature that could detect corruption; even if corruption occurred, you would still have a signature saying that 99.99% of the media was intact, and you could probably ignore the whole signature problem as it exists today.

Example:

A+B+C+D+E+F+G+H+I+J
vs
Q+B+C+D+E+F+G+H+I+J

= 90% still intact.
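A toy sketch of that idea, assuming the "signature" is simply a list of per-segment hashes recorded at acquisition time (the names and segment size are illustrative only):

```python
import hashlib

def segment_hashes(path, segment_size=1024 * 1024):
    """Hash fixed-size segments so later corruption can be localized and quantified."""
    hashes = []
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(segment_size), b''):
            hashes.append(hashlib.sha1(chunk).hexdigest())
    return hashes

def percent_intact(reference, current):
    """Percentage of segments whose hashes still match the reference signature."""
    matches = sum(1 for r, c in zip(reference, current) if r == c)
    return 100.0 * matches / max(len(reference), 1)
```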

ReplyQuote
Posted : 04/05/2014 9:39 am
jaclaz
(@jaclaz)
Community Legend

This goes back to a pet peeve of mine.

We do not need exact "bit-by-bit" copies for forensics. Think about it. Does fingerprint analysis use 100% of an (already partial copy of a) fingerprint? Does DNA analysis use 100% of the DNA?

Here is something that should blow your mind, if you are stuck on "bit-by-bit". In most other forensics fields the evidence is, at least partially, destroyed… 😯

Remember, beyond reasonable doubt.

Yep :), I would spend a few words on the exact nature of the effects of data corruption (if any) in the imaging process.

One of the most common "issues" (among others) raised by the good guys "obsessed" with the bit-by-bit copy approach (and the use of write blockers, etc.) is that the very moment you connect a disk to a Windows NT system (before the WinFE approach), its signature may be altered.
This happens in two cases:
1) the disk was never connected to a Windows NT system (and thus has a 00000000 disk signature)
2) there is a collision with the disk signature of another disk connected at the same time (probabilistically very rare)
See also here:
http://mistype.reboot.pro/documents/WinFE/winfe.htm#signatures
and more specifically here:
http://reboot.pro/topic/18953-is-winfe-forensically-sound/
http://reboot.pro/topic/18953-is-winfe-forensically-sound/?p=177532

Think of a car accident: you take photos, mark on the road where the vehicles are, take measurements and sketches, then move the vehicles to allow the road to be reopened.

The day after you may decide to re-close the road for a few hours, put back the vehicles exactly where you found them to better understand the dynamics of the crash.

As long as the procedure is adequately documented, it is perfectly "forensically sound".

What I think are "common" false equations are:
"forensically sound"="untouched"
"forensically sound"="identical"
"forensically sound"="unmodified"

I see "forensic sound" also something that has been "touched", "modified" or "moved", as long as this has been done along a procedure and of course a "proper" and repeatable procedure.

So, we attach a disk to a running Windows NT OS (with no automount).
In some cases it may change the disk signature (as seen above, there are several ways to avoid this, or to make a "snapshot" of it beforehand).

How will this affect the presence on the disk of a compromising exchange of e-mails, or of a folder containing tens or hundreds of CP images?
Will a disk signature change be able to create by sheer magic the above incriminating evidence?
More "widely", would a disk signature change produce any change of any kind to other data (except the specific 4 bytes)?
Like altering timestamps, or deleting data or making it unrecoverable?

Taking it a step further, will more serious (wrong) manipulations or changes to the filesystem ever produce those artifacts?
Of course not; in the very worst case, a change to the filesystem will delete (or make inaccessible) some data.

If you think a bit about it, when you carve unallocated space and recover partially overwritten data, what you get is not "really sound" data, but rather fragments (or bits and pieces).
The "sound", "original" data, let's say as an example a Word document has already been altered (by beingn first deleted from the OS and then partially overwritten by another file), yet the parts that you manage to recover and re-assemble can be part of the accusatory or exculpatory evidence.
What if the same Word document becomes corrupt because of a malfunction (or instability, or whatever) of the system while you were imaging it?
Are not the bits and pieces you recover from it "as good as" the bits and pieces you recover from the .doc carved in free space?

jaclaz

ReplyQuote
Posted : 04/05/2014 7:40 pm
ThePM
(@thepm)
Active Member

Thanks everyone for the input.

A couple of remarks about your comments:

I think that an error in reading is very rare. Also, if there is an error, it is normally going to be a repeated block, a block of rubbish, or maybe just a single bit error. These errors will change the hash values.

The assumptions of my colleagues were that, when using USB to create the image (or transfer any large amount of data), there was more risk of errors during the transfer. I cannot say that I share those assumptions, as I have not seen more errors when transferring data via USB than through other connection modes. As for the statement "These errors will change the hash values", this is true if you generate the hash value of the source drive, not if you rely only on the "verification" process of most forensic software/hardware solutions.

First, if we want to be purists, you never, ever have an exact copy. You lack the sync, alignment, gap, ECC, bad blocks, various tables, the controller program, and other nuances from the device. But, I digress…

Of course, I was only considering "user data", not the data that is in the system area or the servo metadata.

Does the error originate from the device? Is the part generating the error part of the original evidence?

I think if, for example, there is a bit error generated by an IDE interface on a drive, then that error is part of the data - and should be part of it. Of course tracing it back and identifying it is important

In the scenario I was debating with my colleagues, the error was introduced during the data transfer, so the error is not present on the source drive. As far as tracing it back goes, I'm not sure how this can be done if the verification process does not indicate an error in the first place.

We do not need exact "bit-by-bit" copies for forensics. Think about it. Does fingerprint analysis use 100% of an (already partial copy of a) fingerprint? Does DNA analysis use 100% of the DNA?

Here is something that should blow your mind, if you are stuck on "bit-by-bit". In most other forensics fields the evidence is, at least partially, destroyed…

I agree that we do not need exact "bit-by-bit". However, if you have the opportunity to use an entire, exact copy of the data, why shouldn't you? I'm not a specialist in other fields of forensics, but I guess that if fingerprint analysts had the choice of working from partial fingerprints or full fingerprints, they would choose full fingerprints.

Again, I agree that we do not need exact copies. With proper documentation, we can definitely explain in court why a copy might not be an exact copy of the source drive. However, I would like to know when my copy is not exact, and that's the issue I'm raising with the verification process. Because until now, when I created an image (or a clone) of a drive using forensic software with the "verify" option and got a "verified successfully" or "hashes matching" result, I assumed it meant that, at the time of capture, the user data from my source drive and my destination drive were identical and that I could testify to that. But, right now, I believe that I cannot swear that the drives are identical, despite the results given by the imaging software/device. A defence attorney who knows his stuff, or who saw this thread, might contradict me on this, and he could be right.

A hash can never give you absolute certainty of identity, only absolute certainty of non-identity. Of course, this depends on how you define 'absolute' – my interpretation is obviously 'absolute = with no error at all'.

I'm not sure I'm following you here… If there is the slightest difference between 2 files (even 1 bit), the hash values of the 2 files will be completely different. So, if the hash values of 2 files match, then it should indicate that they are identical, thus no error at all.

How will this affect the presence on the disk of a compromising exchange of e-mails, or of a folder containing tens or hundreds of CP images?
Will a disk signature change be able to create by sheer magic the above incriminating evidence?
More "widely", would a disk signature change produce any change of any kind to other data (except the specific 4 bytes)?
Like altering timestamps, or deleting data or making it unrecoverable?

I totally agree with you on the fact that an error during the data transfer will not make incriminating evidence appear. I believe, as you said, that the worst thing that might happen is that evidence disappears or timestamps are altered. And that you look bad in court if you testified that the copies were identical…

ReplyQuote
Topic starter Posted : 05/05/2014 8:04 pm
athulin
(@athulin)
Community Legend

A hash can never give you absolute certainty of identity, only absolute certainty of non-identity. Of course, this depends on how you define 'absolute' – my interpretation is obviously 'absolute = with no error at all'.

I'm not sure I'm following you here… If there is the slightest difference between 2 files (even 1 bit), the hash values of the 2 files will be completely different. So, if the hash values of 2 files match, then it should indicate that they are identical, thus no error at all.

I'm afraid that isn't correct – at least not from a strict point of view. (That's why I explained my take on 'absolute'.)

If two files (of unknown contents) are hashed, and the hashes are different, then the files are also different. The only way the same file can produce two different hash sums is if the implementation is bad, or the hash function isn't repeatable … and I ignore those possibilities as uninteresting.

However, if the hashes are the same, there is a small probability that the files may be different. (After all, a hash of fixed width can only distinguish so many files, say N. Now add one extra file to that collection of N files that each hash to a unique hash value. What hash does that additional file get – it must be one of those already calculated, hence a collision. This just demonstrates the fact that there will be such collisions, not that they are likely. In some special cases, we can already generate two files of different contents that have the same hash – while this is quite artificial, it's still a sign of unwanted weakness in the hash function.)

The probability of such a collision has never been well estimated – the closest anyone comes is by assuming the files have random contents, and that any imperfections of the hash function can be ignored, ending up with an estimate of once in 2^(number of bits in the hash) cases. However, as files are extremely unlikely to be random, and as hash functions are known to be less than perfect – though not very much – as regards bit distribution, all we can say just now is that 2^(whatever) is an optimistic estimate. But at present no-one seems to know what the error term is.
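For what it's worth, the usual back-of-the-envelope figures (under the standard, and admittedly idealized, assumption that the hash behaves like a random function) are:

```latex
% Chance that two specific, different inputs get the same b-bit hash:
P_{\mathrm{pair}} \approx 2^{-b} \qquad (\text{e.g. } 2^{-128} \text{ for MD5})

% Chance of any collision among n hashed items (birthday bound):
P_{\mathrm{coll}} \approx 1 - e^{-n(n-1)/2^{b+1}} \approx \frac{n(n-1)}{2^{b+1}}
```

As the post says, these are optimistic estimates: real inputs are not random, real hash functions are not perfect, and nobody has a good handle on the error term.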

While just about everyone seems to prefer the notion that the probability of a collision can be ignored in practice, it seems foolish to insist that it is ignorable absolutely.

That's why I say – different hashes, definitely different files; same hashes, probably same file.

But then I don't have to deal with juries …

ReplyQuote
Posted : 05/05/2014 8:58 pm
jaclaz
(@jaclaz)
Community Legend

As I see it, the hash done while imaging takes into account the "flux" of data read from the source and written to the target disk.

As such, when you verify the hash of the written image by comparing it to the one you obtained during imaging, you are ONLY saying that there were no "write errors" on your target, not necessarily that there were no "read errors" on the source.

If you prefer, the hashing is a way to know for sure that the image you examined, and of which you provided a copy to the other party in the trial, has not been tampered with, and represents an exact "snapshot" of what was read from the device on a given date.

With perfectly working disks and perfectly working equipment/software, and in theory, what you read is actually what is on the source.

But then we need to set "pure forensics" aside for a moment and turn to "data recovery".

Hard disks do develop "bad sectors" and do have "malfunctions".
Still, in theory any modern hard disk is intelligent enough to re-map a "weak sector" to a spare one "transparently", and normally the way the disk's internal OS works is (more or less):

  • let me read (because the OS told me to do so) sector 123456
  • hmmm, the ECC sector checksum (or whatever) does not match at first attempt
  • let me try to correct the data read through my internal (and not documented) recovery algorithm
  • hmmm, nope, it still does not work
  • let me try to implement the parity check algorithm (another not documented feature)
  • pheeew, now it matches, good
  • to be on the safe side, let me remap sector 123456 to spare sector 999001 (without telling the OS, nor the filesystem) and let me jot down this new translation in my G-list (or P-list or *whatever*)

It is perfectly possible (in theory and practice) that in the exact moment you are reading a sector this "becomes" bad.

What happens then?
The sector was "weak" but was *somehow* read correctly; it became "bad" a fraction of a nanosecond after having been read, and the disk managed the issue fine.

But what if a given sector passes from "good" to "bad" immediately after you have read it?
The disk, at the next occasion, finds it bad, attempts to recover it and fails (or succeeds but, for *whatever* reason, fails in copying it to the spare sector or in updating the list).

When you try to re-hash the source drive, you will get either errors or a different hash.

On the other hand, I believe it is not "common practice" to write the image from the source to several targets at the same time.

So, for a given period of time, you have only a "source" and a "target"; the same malfunction may happen to the "target" instead of the source (and you find out only because a new hash of the target, or of a copy of it, comes out different), in which case I think what is done is to re-image from the original.

In other words, the hashing process is an important part of the procedures but it is not the "only" solution.

A better approach could be a more granular form of hashing, the smallest "atomic" component being a sector or "block".
So you could hash each sector by itself and create a list of hashes, one for each sector, or decide to group 10/100/1000/10000/100000 sectors into a "blocklist" and hash these blocklists.
This would bring, IMHO, two advantages (a rough sketch follows the list):

  1. you know for sure that ONLY a given "blocklist" is affected (and ALL the other ones are fine)
  2. if more than one blocklist (or many, or all of them) does not hash correctly, then something (be it OS instability, hardware issues or *whatever*) is causing it in a "generalized" way before the "whole" image is completed
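Here is the rough sketch referred to above: per-block MD5 hashes recorded alongside the whole-image hash, so that a later mismatch can be narrowed down to specific blocks (the block size and names are illustrative only):

```python
import hashlib

def image_with_block_hashes(source, image_path, block_size=64 * 1024):
    """Image a source while recording a whole-image hash plus one hash per block,
    so a later mismatch can be narrowed down to the block(s) that changed."""
    whole = hashlib.md5()
    block_hashes = []
    with open(source, 'rb') as src, open(image_path, 'wb') as dst:
        for block in iter(lambda: src.read(block_size), b''):
            whole.update(block)
            block_hashes.append(hashlib.md5(block).hexdigest())
            dst.write(block)
    return whole.hexdigest(), block_hashes

def find_bad_blocks(reference_hashes, path, block_size=64 * 1024):
    """Return the indices of blocks whose current hash no longer matches.
    (Blocks missing from a truncated file are not reported by this sketch.)"""
    bad = []
    with open(path, 'rb') as f:
        for i, block in enumerate(iter(lambda: f.read(block_size), b'')):
            if i >= len(reference_hashes) or \
               hashlib.md5(block).hexdigest() != reference_hashes[i]:
                bad.append(i)
    return bad
```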

jaclaz

P.S. It seems that not only is the previous idea nothing new, it has also been taken to a "next" level:
Distinct Sector Hashes for Target File Detection
Joel Young, Kristina Foster, and Simson Garfinkel, Naval Postgraduate School
Kevin Fairbanks, Johns Hopkins University
http://www.computer.org/csdl/mags/co/2012/12/mco2012120028.pdf

ReplyQuote
Posted : 05/05/2014 9:26 pm
jaclaz
(@jaclaz)
Community Legend

On further checking, the idea hinted at in the P.S. above is being implemented within/with the support of digitalcorpora; see here:
http://digitalcorpora.org/archives/391

And there is an article (of course behind the usual paywall, but the abstract is enough):
http://www.tandfonline.com/doi/abs/10.1080/15567280802050436

which provides an interesting empirical analysis:

This article reports the results of a case study in which the hashes for over 528 million sectors extracted from over 433,000 files of different types were analyzed. The hashes were computed using SHA1, MD5, CRC64, and CRC32 algorithms and hash collisions of sectors from JPEG and WAV files to other sectors were recorded. The analysis of the results shows that although MD5 and SHA1 produce no false-positive indications, the occurrence of false positives is relatively low for CRC32 and especially CRC64. Furthermore, the CRC-based algorithms produce considerably smaller hashes than SHA1 and MD5, thereby requiring smaller storage capacities. CRC64 provides a good compromise between number of collisions and storage capacity required for practical implementations of sector-scanning forensic tools.

which confirms my initial thought that one could use a much simpler algorithm than MD5 for block hashing (thus saving computational resources, i.e. time and space for the hash database).

So, one could "keep" the current MD5 or SHA-1 hashes for the "whole image" but use a much simpler CRC algorithm for "block hashing"; since the scope here is only verification, and not comparison with a database of known hashes, a simple CRC32 would be enough.

Combining this with the considerations about block size in other articles on the same subject, particularly the one here:

Using purpose-built functions and block hashes to enable small block and sub-file forensics. Simson Garfinkel, Alex Nelson, Douglas White, and Vassil Roussev.

http://www.dfrws.org/2010/program.shtml
http://www.dfrws.org/2010/proceedings/2010-302.pdf
It would make sense to hash with CRC32 blocks of 16,384 bytes, or maybe 32,768 or even 65,536 bytes.
The "overhead" of the hash database would be anything (for an "average" 500 GiB to 1 TiB disk image) between 30 and 200 MB; IMHO not trifling, but not really worrying.

jaclaz

ReplyQuote
Posted : 09/05/2014 10:01 pm