
Let's talk about MD5

26 Posts
12 Users
0 Reactions
4,449 Views
(@jonathan)
Prominent Member
Joined: 20 years ago
Posts: 878
 

Paul, good points. However, on the "mud sticks" level alone, why not use SHA-256/512 where practical, to save yourself from having to answer these questions again?
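Here is a sketch of what "where practical" might look like. The evidence only has to be read once to produce MD5 and the SHA-2 digests side by side. Legacy MD5 records stay usable, and the stronger values are on file if MD5 is ever challenged. It uses only Python's standard `hashlib`. The function name `multi_digest` and the 1 MiB chunk size are illustrative assumptions, not anything proposed in the thread.

```python
import hashlib


def multi_digest(path, algorithms=("md5", "sha256", "sha512"), chunk_size=1 << 20):
    """Read `path` once and return {algorithm: hex digest} for every algorithm.

    multi_digest is a hypothetical helper name; hashlib.new() accepts any
    algorithm name listed in hashlib.algorithms_available.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as fh:
        # iter() with a b"" sentinel stops at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            # every chunk feeds all digests, so the file is read only once
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

For example, `multi_digest("image.E01")` (a placeholder path) would return all three digests for the acquisition notes in a single pass.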


   
(@Anonymous 6593)
Guest
Joined: 17 years ago
Posts: 1158
 

Based on the quote above that the probability of a collision is 1 in 2^21, this, taken at face value, means that in any set of slightly over 2 million files we should expect to get a collision. In the real world, how often does that happen?

Well spotted. I got confused over two different aspects of MD5 'strength'. The 2^21 does not refer to collision probability (as I claimed) but to computational complexity: the amount of work it takes to find a collision, about 2^21 MD5 computations. Even though it applies to a specific case, it is still a disastrous result for a cryptographic hash algorithm.

That, of course, makes my reference to CRC32 collision probabilities nonsense. Mea culpa.
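This check puts numbers on the distinction just made. It assumes MD5 outputs behave like uniformly random 128-bit values, an idealisation that says nothing about deliberately constructed collisions. Under that assumption, the chance of an *accidental* collision among n files follows the birthday bound, p ~ 1 - exp(-n(n-1)/2^129), which is nothing like 1 in 2^21. The 32-bit line is there only to show how differently a CRC32-sized checksum behaves at the same file count. The function name is a hypothetical choice.

```python
import math


def accidental_collision_probability(n_files, digest_bits=128):
    """Birthday-bound estimate of P(at least one accidental collision)
    among n_files uniformly random digest_bits-bit values.

    p = 1 - exp(-pairs / 2^bits); expm1 keeps precision when p is tiny.
    """
    pairs = n_files * (n_files - 1) / 2
    return -math.expm1(-pairs / 2.0 ** digest_bits)


n = 2 ** 21  # "slightly over 2 million files"
print(f"128-bit digest (MD5 size): {accidental_collision_probability(n):.3e}")
print(f"32-bit digest (CRC32 size): {accidental_collision_probability(n, 32):.3f}")
```

At n = 2^21 the 128-bit figure comes out below 10^-26, while the 32-bit case is effectively certain to contain a collision.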

I'd also like to clarify that my opinion of MD5 refers to MD5 alone, and that is also the sense in which MD5 is broken. Actually, it could probably be argued that if we need to add a manual check of file contents after an MD5-based identification of a file as CP, we also have sufficient evidence that MD5 isn't completely trusted on its own.


   
(@mscotgrove)
Prominent Member
Joined: 17 years ago
Posts: 940
 

I would like to understand the 2^21 chance of a collision.

If I take a file and manipulate a 128-bit buffer (maybe with random data), there is a good likelihood of a collision within 2 million attempts. Or have I misunderstood the 2^21 chance?
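Random tinkering with a 128-bit block does not get you there. A blind search for *any* two colliding inputs needs roughly 2^(k/2) attempts for a k-bit digest (the birthday bound), which is about 2^64 for full MD5. The 2^21 work figure comes from cryptanalytic attacks that exploit MD5's internal structure, not from random trials. The sketch below runs the blind search against MD5 truncated to 16, 24 and 32 bits so it finishes in seconds. The truncation, the prefix bytes and the helper names are assumptions made purely for illustration.

```python
import hashlib
import os


def truncated_md5(data, bits):
    """Top `bits` bits of MD5(data) as an int: a deliberately weakened hash."""
    return int.from_bytes(hashlib.md5(data).digest(), "big") >> (128 - bits)


def random_collision_search(bits, prefix=b"file contents|", block_len=16):
    """Append a random 16-byte (128-bit) block to `prefix` until two different
    blocks give the same truncated MD5. Returns (attempts, block_a, block_b)."""
    seen = {}  # truncated digest -> block that produced it
    attempts = 0
    while True:
        block = os.urandom(block_len)
        attempts += 1
        h = truncated_md5(prefix + block, bits)
        if h in seen and seen[h] != block:
            return attempts, seen[h], block
        seen[h] = block


for bits in (16, 24, 32):
    attempts, _, _ = random_collision_search(bits)
    print(f"{bits}-bit MD5 prefix: collision after {attempts:,} random blocks "
          f"(birthday estimate about 2^{bits // 2} = {2 ** (bits // 2):,})")
```

Each extra 2 bits of digest roughly doubles the work. Extending the same curve to 128 bits lands near 2^64 attempts, far beyond a few million.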


   
jhup
(@jhup)
Noble Member
Joined: 16 years ago
Posts: 1442
 

The numbers here do not grant proof of spoliation to the contesting person.

Although there may be a 1 in 2 million chance of an MD5 hash collision, the probability is much higher for a collision where the content would be legitimate.

That is, although there may be a collision, the colliding data would likely be gibberish.

It is highly improbable to find, for example, an E01 file with exculpatory and inculpatory registry entries.

Second to the mathematics, when MD5 collisions are brought up in legal discussions, there is the giant naked mole rat in the corner that no one wants to talk about. This discussion implies that the chain of custody was somehow compromised, or that the FI compromised the data intentionally. Both are high hurdles to clear with legitimate FIs.

So, as a scientific exercise, being aware of this is great.

From a practical perspective, it is of little value to contest MD5s in this context.

[corrected spelling]


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

So, if I am using MD5 signatures to exclude a load of "known" executables/files/whatever so that I can reduce the number of files I need to review in a hacking/malware/whatever case, should I use MD5? Hell, yes. It's a means to an end, and if a review of what remains finds nothing, this approach won't be the only string to my bow.

IMHO this is (possibly) the single point where I beg to disagree (playing devil's advocate).

Let's say that a file containing some (incriminating) data (steganographed, crypted, or what not) has been altered in such a way that file length and MD5 correspond to a "known file" in your list of MD5 hashes.
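One possible defence against that scenario is to have the known-file list carry a stronger digest too. A size-plus-MD5 hit then counts as "known" only when SHA-256 also agrees, and is flagged for review rather than silently excluded. This is a sketch under that assumption. The `KnownFile` record, the `(size, md5)` lookup key and the helper names are hypothetical, and real hash sets differ in which digests they publish.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownFile:
    """Hypothetical known-file record holding two independent digests."""
    size: int
    md5: str
    sha256: str


def make_entry(data):
    """Build a KnownFile record from a reference copy's bytes."""
    return KnownFile(len(data), hashlib.md5(data).hexdigest(),
                     hashlib.sha256(data).hexdigest())


def classify(data, known):
    """Return "unknown", "known", or "md5-match-sha256-mismatch".

    `known` maps (size, md5 hex) -> KnownFile. A size+MD5 match whose
    SHA-256 differs is exactly the crafted-collision case, so it is
    flagged for review rather than excluded."""
    md5 = hashlib.md5(data).hexdigest()
    entry = known.get((len(data), md5))
    if entry is None:
        return "unknown"
    if hashlib.sha256(data).hexdigest() == entry.sha256:
        return "known"
    return "md5-match-sha256-mismatch"


reference = b"contents of a known system file"
known_set = {(e.size, e.md5): e for e in [make_entry(reference)]}
print(classify(reference, known_set))           # known
print(classify(b"something else", known_set))   # unknown
```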

So you first divide the files between the "known ones" and the "unknown ones".

Then you go through the "unknown ones" with whatever tool is suited to analyze them.

Two possibilities:

  1. you find "nothing"
  2. you find "something"

If #1, you go through the "previously catalogued as known files" and see if you find "something".
If #2, what happens?
Again, two possibilities:

  1. the "something" you found is, in your opinion, "NOT enough"
  2. the "something" you found is, in your opinion, "enough"

If #1, again you go through the "previously catalogued as known files" and see if you find "something more".
If #2, you never check the "previously catalogued as known files", as you are "satisfied" with what you have found (or do you decide to go through them nonetheless?)
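The decision tree above can be sketched as a small function. `is_known`, `analyse` and `enough` are hypothetical callbacks. They stand in for the hash lookup, whatever tool does the review, and the examiner's judgement. The `recheck_known` flag covers the "or do you go through them nonetheless?" branch.

```python
def triage(files, is_known, analyse, enough, recheck_known=False):
    """Two-pass review: unknown files first, known files only when needed.

    files:      iterable of file identifiers
    is_known:   hash-based filter, e.g. an MD5 lookup in a known-file set
    analyse:    returns a list of findings for one file
    enough:     examiner's judgement on whether the findings suffice
    """
    known, unknown = [], []
    for f in files:
        (known if is_known(f) else unknown).append(f)

    findings = [item for f in unknown for item in analyse(f)]

    # "nothing" or "NOT enough": go back to the files set aside as known.
    # "enough": the known files are only reviewed if recheck_known is set.
    if not findings or not enough(findings) or recheck_known:
        findings += [item for f in known for item in analyse(f)]
    return findings


# Usage: the incriminating item hides in a file filtered out as "known".
files = ["kernel32.dll", "notes.txt", "disguised.bin"]
result = triage(
    files,
    is_known=lambda f: f in {"kernel32.dll", "disguised.bin"},
    analyse=lambda f: ["hidden payload"] if f == "disguised.bin" else [],
    enough=lambda found: len(found) >= 1,
)
print(result)  # ['hidden payload']
```

In the "enough" branch without `recheck_known`, a disguised file would never be looked at. That is the gap the scenario above is pointing at.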

jaclaz


   
(@mscotgrove)
Prominent Member
Joined: 17 years ago
Posts: 940
 


Although there may be a 1 in 2 million chance of an MD5 hash collision, the probability is much higher for a collision where the content would be legitimate.

But surely this means that by playing with just a few million variations of a non-critical 128-bit block of data, completely different files could be made to match. Many files do not mind if data is added to the end of them. As MD5 computes its value block by block, it is probably not critical which 128-bit block you 'play' with.
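For reference, here is how MD5 actually consumes its input. It works through the message in 512-bit (64-byte) blocks, carrying a 128-bit internal state from one block to the next, and the final state becomes the digest. The short check below uses only Python's `hashlib` to show those sizes. It also shows that a saved mid-stream state fully determines the rest of the computation. That property is why a collision engineered over whole blocks survives an identical suffix appended to both files. The padding also encodes the total length, so the two colliding files need the same length.

```python
import hashlib

h = hashlib.md5()
print(h.block_size * 8, "bit input blocks")   # 512 bit input blocks
print(h.digest_size * 8, "bit digest")        # 128 bit digest

data = b"A" * 200
one_shot = hashlib.md5(data).hexdigest()

h.update(data[:64])          # exactly one 512-bit block
saved_state = h.copy()       # snapshot of the hash object's internal state

h.update(data[64:])
saved_state.update(data[64:])

# Incremental hashing matches one-shot hashing, and the snapshot, continued
# with the same suffix, reaches the same digest: the state is all that is carried.
assert h.hexdigest() == one_shot
assert saved_state.hexdigest() == one_shot
```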

Am I missing something? It sounds as if you can change file contents as much as you like, and then play with maybe 2-10 million random 128-bit blocks and hope for an MD5 value you require.

If the 2^21 chance is actually a 2^61 chance, then it is secure (for a few more years).
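Hitting an MD5 value you *require* is a different problem from finding any collision. It is a (second) preimage search, and blind trials need about 2^k attempts for a k-bit digest rather than 2^(k/2). For full MD5 that is about 2^128, and published preimage attacks on MD5 remain far from practical. The sketch uses a truncated digest so it runs quickly. The truncation, the file contents and the helper names are all illustrative assumptions.

```python
import hashlib
import os


def top_bits(data, bits):
    """Top `bits` bits of MD5(data) as an int (a deliberately weakened hash)."""
    return int.from_bytes(hashlib.md5(data).digest(), "big") >> (128 - bits)


def targeted_search(original, forged, bits, max_tries=10_000_000):
    """Append random 16-byte blocks to `forged` until its truncated MD5 equals
    that of `original`. Returns (tries, block), or None if max_tries runs out."""
    target = top_bits(original, bits)
    for tries in range(1, max_tries + 1):
        block = os.urandom(16)
        if top_bits(forged + block, bits) == target:
            return tries, block
    return None


original = b"original evidence file"
forged = b"completely different contents|"
for bits in (8, 12, 16):
    tries, _ = targeted_search(original, forged, bits)
    print(f"{bits}-bit target: hit after {tries:,} tries (expected about {2 ** bits:,})")
```

Going from 12 to 16 bits multiplies the expected work by 16. At 128 bits, no realistic number of random blocks is enough.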


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

@mscotgrove
But are we talking of "casual collisions" (IMHO unlikely) or "intentional collisions"?

The latter ones are relatively easy, see here
http://www.mathstat.dal.ca/~selinger/md5collision/
(for .exe's)

and here (postscript file)
http://ipsc.ksp.sk/contests/ipsc2006/real/problems/h.html
http://ipsc.ksp.sk/contests/ipsc2006/real/solutions/h.html

Another approach
http://www.win.tue.nl/hashclash/
http://cryptography.hyperlink.cz/MD5_collisions.html
Of course, a lot of CPU is needed, but it seems "more than doable".
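Deliberately built pairs like the ones linked above share an MD5 but differ in content, so a second digest separates them from ordinary duplicate copies. This is a minimal sketch. It assumes the files of interest sit under one directory and are small enough to read whole. `find_md5_collisions` is a hypothetical helper name.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_md5_collisions(root):
    """Walk `root`, group files by MD5, and return only the groups whose
    members share an MD5 but have different SHA-256 values (genuine MD5
    collisions rather than duplicate copies of the same file)."""
    by_md5 = defaultdict(dict)  # md5 hex -> {sha256 hex: [paths]}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        data = path.read_bytes()
        md5 = hashlib.md5(data).hexdigest()
        sha = hashlib.sha256(data).hexdigest()
        by_md5[md5].setdefault(sha, []).append(path)
    # more than one distinct SHA-256 under a single MD5 means a collision
    return {md5: shas for md5, shas in by_md5.items() if len(shas) > 1}
```

Identical copies land under a single SHA-256 and are ignored. Only content that differs yet hashes to the same MD5 is reported.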

jaclaz


   
jhup
(@jhup)
Noble Member
Joined: 16 years ago
Posts: 1442
 

Exactly.

The giant naked mole rat in the corner is what "intentional collisions" implies.

Either the forensic investigator failed to maintain custody, or she intentionally falsified data.

@mscotgrove
But are we talking of "casual collisions" (IMHO unlikely) or "intentional collisions"?

The latter ones are relatively easy, see here
http://www.mathstat.dal.ca/~selinger/md5collision/
(for .exe's)

and here (postscript file)
http://ipsc.ksp.sk/contests/ipsc2006/real/problems/h.html
http://ipsc.ksp.sk/contests/ipsc2006/real/solutions/h.html

jaclaz


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

Exactly.

Exactly WHAT? 😯

Q. Are we talking of A or B?
A. Exactly.

jaclaz


   
jhup
(@jhup)
Noble Member
Joined: 16 years ago
Posts: 1442
 

The question.

Exactly the right question to ask. The answer is the question. The question is the answer.

mrgreen


   