Notifications

Clear all

Hash validity

General (Technical, Procedural, Software, Hardware etc.)

Last Post by jaclaz 11 years ago

10 Posts

5 Users

0 Reactions

963 Views

RSS

jmack

(@jmack)

New Member

Joined: 12 years ago

Posts: 4

Topic starter 06/06/2014 1:53 am

Greetings

I am looking for scholarly research regarding hash validity. I am looking to ascertain the validity of hash values regarding courtroom testimony for various hash functions. I am specifically refereeing to forensic use of hash functions for the identification or validation of files or strings. For example the hash validity of MD5 collision being one in ################.

Quote

jaclaz

(@jaclaz)

Illustrious Member

Joined: 18 years ago

Posts: 5133

06/06/2014 2:00 am

Where ################ equates to N ~ 2^42 = 4.398.046.511.104 😯

Though this post is in Italian, it contains a large number of references
http//www.marcomattiucci.it/md5.php

Mainly (reyour question)
http//www.marcomattiucci.it/MD5collision.pdf
http//www.marcomattiucci.it/hash.pdf

jaclaz

ReplyQuote

jmack

(@jmack)

New Member

Joined: 12 years ago

Posts: 4

Topic starter 06/06/2014 2:00 am

SHA1 (160 bit) probability that two files will be similar one in a quindecillion chance.

ReplyQuote

jaclaz

(@jaclaz)

Illustrious Member

Joined: 18 years ago

Posts: 5133

06/06/2014 4:41 pm

SHA1 (160 bit) probability that two files will be similar one in a quindecillion chance.

No. (
There is nothing like "similar" in hashing, it is pretty much binary, two hashes are the same (=1) or they are NOT (=0).

As a matter of fact, to have a collision, the files need NOT to be "similar", and it is very likely that they will be quite different.

The chances of finding a collision are actually between 2^69 (specific calculation) and 2^80 (random one), see previous linked to paper and
https://www.schneier.com/blog/archives/2005/02/cryptanalysis_o.html

SHA-1 produces a 160-bit hash. That is, every message hashes down to a 160-bit number. Given that there are an infinite number of messages that hash to each possible value, there are an infinite number of possible collisions. But because the number of possible hashes is so large, the odds of finding one by chance is negligibly small (one in 2^80, to be exact). If you hashed 2^80 random messages, you'd find one pair that hashed to the same value. That's the "brute force" way of finding collisions, and it depends solely on the length of the hash value. "Breaking" the hash function means being able to find collisions faster than that. And that's what the Chinese did.

They can find collisions in SHA-1 in 2^69 calculations, about 2,000 times faster than brute force.

See also (about the feasability of the attempt and related costs)
https://www.schneier.com/blog/archives/2012/10/when_will_we_se.html

But the point is …? ?

jaclaz

ReplyQuote

Anonymous 6593

(@Anonymous 6593)

Guest

Joined: 17 years ago

Posts: 1158

06/06/2014 4:56 pm

SHA1 (160 bit) probability that two files will be similar one in a quindecillion chance.

Good exanple, mainly because it raises so many questions.

It's not quite right. Given file A, the comparison file must be chosen at random – if your choice is predictable, so is the hash.

Also, the contents of the file has to be random – for much the same reason.

At that point the OP must ask are those criteria fulfilled? Or is he comparing file A with a file that isn't a random choice, but has been chosen by someone else? In that case, nothing can be said about likelihood of getting the same hash by using this kind of approach.

Add to that the fact that very few files have random contents. How does that affect the result? Much? Little?

ReplyQuote

AlexC

(@alexc)

Reputable Member

Joined: 16 years ago

Posts: 301

06/06/2014 6:19 pm

For the right type of data, the probability of a collision in MD5 is 100%

http//www.mscs.dal.ca/~selinger/md5collision/ (TLDR you can create a windows executable of the same file size and has that does two different things)

SHA-1 is also compromised https://www.schneier.com/blog/archives/2005/02/sha1_broken.html (though not in as horrible of a way)

MD5 is properly, properly broken though.

Edit
Does this matter? Well the collisions are engineered, so for validating that some data which you have control of hasn't changed…it's not ideal, but it's not world ending. For checking the identity of previously known data - that is where that MD5 issue is dangerous. If you can hide malicious content in the data of known-good data… well, we just shouldn't be using MD5…

ReplyQuote

jaclaz

(@jaclaz)

Illustrious Member

Joined: 18 years ago

Posts: 5133

06/06/2014 6:34 pm

For the right type of data, the probability of a collision in MD5 is 100%

NO. cry
That is NOT the probability of a collision.

That is the fact that a collision can be artificially provoked through (a simple) calculation.

The calculation method found is simple enough to be within practical limits of *any* computer, hence MD5 is "compromised", but the probability of a collision on random files/data remains ~2^64, or more exactly 1.26*2^64 (but as seen in the already linked to sources can be lowered in a collision attack to around 2^42).

The SHA-1 as you can read in the links (also by Bruce Schneier) I posted earlier is NOT "compromised" as the calculation method found only reduces the number of brute force attempts from 2^80 (probability of a collision) to 2^69, which is however yet well beyond the possibilities of most computers
https://www.schneier.com/blog/archives/2012/10/when_will_we_se.html

Then, since log2(350) ~ 8.4 the cost of the attack will be approximately

213 * 28.4 = 221.4 ~ $2.77M in 2012

211 * 28.4 = 219.4 ~ $700K by 2015

29 * 28.4 = 217.4 ~ $173K by 2018

27 * 28.4 = 215.4 ~ $43K by 2021

A collision attack is therefore well within the range of what an organized crime syndicate can practically budget by 2018, and a university research project by 2021.

jaclaz

P.S. Edited to make more clear the difference between collision probability and range for a collision attack.

ReplyQuote

jhup

(@jhup)

Noble Member

Joined: 16 years ago

Posts: 1442

08/06/2014 7:07 am

Except much of this is really for entertainment purposes and to annoy the opposing counsel (which is worthwhile in itself).

After all, the original post was about "forensic use of hash functions for the identification or validation of files or strings."

As so succinctly pointed out by jaclaz, "[…] to have a collision,[…] it is very likely that they will be quite different."

When I hash things, I also know not just the hash (as in cryptography), but the file type, MACD, location, ACL and all kind of other details. Each detail further increases the requirements and the exponent of the collision possibility.

Think about it. What are the chances of creating two sequences of bytes which are both valid Microsoft Word 2007 documents, both have matching MFT entries, etc? Slightly less complicated with raw bytes based data (such as text files), but even there the text file can only use certain ASCII, in a specific sequence, and so on.

ReplyQuote

Anonymous 6593

(@Anonymous 6593)

Guest

Joined: 17 years ago

Posts: 1158

08/06/2014 11:36 am

Except much of this is really for entertainment purposes and to annoy the opposing counsel (which is worthwhile in itself).

For all we know, that may be the situation of the original poster to annoy or to be annoyed.

ReplyQuote

jaclaz

(@jaclaz)

Illustrious Member

Joined: 18 years ago

Posts: 5133

08/06/2014 2:55 pm

For all we know, that may be the situation of the original poster to annoy or to be annoyed.

If the second, you can leave it to me wink , I usually can manage to annoy someone in a very short time. D

For NO apparent reason mrgreen
http//media.veryfunnypics.eu/2013/03/funny-picture-430-am-and-youre-trying-to-sleep-let-me-sing-you-the-song-of-my-people.jpg

jaclaz

ReplyQuote

Android Forensics

I have conducted a logical and partial extraction of a ...

By SgtAndroid , 6 hours ago
Article: The Balance Between Digital Forensic Examiners And Digital Evidence Technicians: Expertise Vs. Efficiency

Can digital forensic labs cut backlogs without cutting ...

By Zoe , 1 day ago
Prefetch Question

Hello All, I have a question regarding Windows prefet...

By Forensic_Tester , 2 days ago

8 Forums
15.7 K Topics
92.3 K Posts
176 Online
41.1 K Members

Forum Icons: Forum contains no unread posts Forum contains unread posts

Topic Icons: Not Replied Replied Active Hot Sticky Unapproved Solved Private Closed