I think Paul said it perfectly. We overcomplicate things to the extreme. MD5 is a perfectly good hash algorithm; using another one is just a waste of CPU power. The likelihood of two files having the same MD5 is 2^128. Compound this with other factors, such as the two files in question having to exist in the universe of documents you are testing, and the odds get even slimmer.
If you want something cheap and effective, try this:
http://noclone.net
Using SHA-1 is not a waste of CPU power in this case, because computing hashes is limited by disk I/O, not by how much computation the hash requires. (Even computing MD5, SHA-1, SHA-256, and SHA-512 simultaneously should be I/O-bound.)
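To illustrate the "compute several hashes at once" point, here is a minimal Python sketch using the standard `hashlib` module: the file is read once, in chunks, and each chunk is fed to every hash object. The function name `hash_file` and the 1 MiB chunk size are illustrative choices, not anything from this thread, and whether the whole job is really I/O-bound still depends on the disk and CPU involved.

```python
import hashlib


def hash_file(path, algorithms=("md5", "sha1", "sha256", "sha512"), chunk_size=1 << 20):
    """Hash one file with several algorithms while reading it only once.

    hash_file and the 1 MiB default chunk size are illustrative choices.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            # Every algorithm sees the same bytes, so the disk is read once.
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

Calling `hash_file("somefile.bin")` (a hypothetical filename) returns a dict mapping each algorithm name to its hex digest.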
"The likelihood of two files having the same MD5 is 2^128."
No, the chance of two particular files having the same MD5 hash is 1 in 2^128; that is not the chance of any two files colliding. If you have drawn an Ace, the chance of drawing a second Ace is not the same as the chance of drawing any pair from a deck.
The chance of two files having the same MD5 hash value is 1/(2^128), roughly 1/(3.4 × 10^38), or about one in 340 billion billion billion billion.
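Written out, the arithmetic behind those figures (2^128 exactly, then the rounded forms used above):

```latex
\[
2^{128} = 340\,282\,366\,920\,938\,463\,463\,374\,607\,431\,768\,211\,456 \approx 3.4 \times 10^{38},
\qquad
\frac{1}{2^{128}} \approx \frac{1}{3.4 \times 10^{38}}
\]
\[
340 \times \left(10^{9}\right)^{4} = 3.4 \times 10^{2} \times 10^{36} = 3.4 \times 10^{38}
\]
```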
Yes, and the chance of a random MD5 collision in a set of N files is roughly N^2/2^129.
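The N^2/2^129 figure is the birthday bound: N files form N(N-1)/2 pairs, and if the hash behaves like a random function each pair matches with probability 1/2^128, so the expected number of colliding pairs is about N^2/2^129; while that number is small it is also roughly the probability of at least one collision. A minimal Python sketch of the quoted approximation and of the slightly more careful form 1 - exp(-N(N-1)/2^129), assuming an idealized, uniformly random hash (which says nothing about deliberately constructed collisions). The function names are illustrative.

```python
import math


def collision_probability(n_files, bits=128):
    """Chance that at least two of n_files distinct files share a hash,
    assuming an idealized hash with `bits` uniformly random output bits."""
    pairs = n_files * (n_files - 1) / 2
    expected_collisions = pairs / 2**bits
    # 1 - exp(-x), computed with expm1 so tiny values don't round to 0.
    return -math.expm1(-expected_collisions)


def rough_estimate(n_files, bits=128):
    """The simpler N^2 / 2^(bits+1) form quoted in the thread."""
    return n_files**2 / 2**(bits + 1)
```

For example, with a billion files both functions give roughly 1.5 × 10^-21 for a 128-bit hash; with a 16-bit hash, 300 files already give close to even odds.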
Yes, sorry for the typo; it should have been 1 in 2^128. My point is still that the chance of you ever hitting an MD5 collision is so remote that there is no reason to worry about it. The courts have accepted it as a valid method of fingerprinting a file.
MD5 is not safe; see this example of MD5 collisions:
http://noclone.net/info/Trueduplicate.aspx
VERY clear sentence 🙄
Why MD5 is not reliable?
Some of the duplicate finding software in the market uncovers duplicate files by comparing MD5 hash string of file content. However it is not reliable that there is a chance of MD5 hash collision.
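A common way around the collision worry raised in the quoted article is to use the hash only to find candidates and then confirm them byte for byte. The Python sketch below does that: it groups files by size, then by MD5, then checks each candidate group with `filecmp.cmp(..., shallow=False)`. It illustrates the general technique under those assumptions and is not a description of how noclone or any other product works; `find_duplicates` and `md5_of` are hypothetical names.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Return groups of paths whose contents are byte-for-byte identical.

    1. Bucket by size (cheap, no reading).
    2. Within each size bucket, bucket by MD5.
    3. Within each MD5 bucket, confirm with a full byte comparison, so an
       MD5 collision can never merge two different files.
    """
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[md5_of(p)].append(p)
        for candidates in by_hash.values():
            remaining = list(candidates)
            while remaining:
                first = remaining.pop(0)
                # shallow=False forces filecmp to compare the actual contents.
                same = [first] + [q for q in remaining
                                  if filecmp.cmp(first, q, shallow=False)]
                remaining = [q for q in remaining if q not in same]
                if len(same) > 1:
                    groups.append(same)
    return groups
```

The size and hash steps only narrow the search; the final decision rests on the byte comparison.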
I like how the article expands on the computation of probabilities of an MD5 collision…. 😯
Unfortunately, the actual "example" has no working links…. :(
But they are here ;)
http//
http//
It seems to me that the example is a "forged" one, i.e. something intentionally constructed, since FC /B shows only 6 bytes of difference?
C:\Downloaded\testnoclone>fc /B hello.exe erase.exe
Comparing files hello.exe and ERASE.EXE
00000953: 09 89
0000096D: 86 06
0000097B: 91 11
00000993: 28 A8
000009AD: 54 D4
000009BB: E8 68
…. the "scary string" is also present in the "harmless" hello.exe….
jaclaz
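The `fc /B` comparison above can be reproduced in a few lines of Python, which makes it easy to check how far apart two supposedly colliding files really are. A minimal sketch, assuming both files fit in memory and have the same length; `diff_bytes`, `print_diffs`, and `FC_PAIRS` are illustrative names, and the output layout only approximates that of `fc`.

```python
def diff_bytes(path_a, path_b):
    """Return (offset, byte_a, byte_b) for each position where two
    equal-length files differ, in the spirit of Windows `fc /B`."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    if len(a) != len(b):
        raise ValueError("files have different lengths")
    # Iterating over bytes objects yields ints, so x and y are 0..255.
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def print_diffs(diffs):
    # Hex offset followed by the two differing bytes, roughly as fc /B prints them.
    for offset, x, y in diffs:
        print(f"{offset:08X}: {x:02X} {y:02X}")


# The six differing byte pairs from the hello.exe / erase.exe output above.
FC_PAIRS = [(0x09, 0x89), (0x86, 0x06), (0x91, 0x11),
            (0x28, 0xA8), (0x54, 0xD4), (0xE8, 0x68)]
# XOR shows which bits differ in each pair.
print([hex(x ^ y) for x, y in FC_PAIRS])  # ['0x80', '0x80', '0x80', '0x80', '0x80', '0x80']
```

Every listed pair differs only in the top bit (XOR 0x80), so apart from six single-bit changes the two executables are identical.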
Yes, "hello" and "erase" are intentionally produced collisions. There's a small program called "evilize" that lets you generate such executables.