Child Exploitation Hash Sets

tracedf
(@tracedf)
Estimable Member
Joined: 10 years ago
Posts: 169
 

If the hashsets were made publicly available I would be utterly shocked if there weren't sites on Tor which, within 24 hours of this availability, would guarantee (and advertise) that their entire collection of material is not found in any LE hashset.

Any of them could do this now. There's no reason to have the hash set; they just need to make each image unique by modifying at least one bit of the image. They could build a web app to serve up the images and modify one pixel at random each time the image is accessed; every download would be unique and therefore every hash value would be unique.

-Steven


   
PaulSanderson
(@paulsanderson)
Honorable Member
Joined: 19 years ago
Posts: 651
 

all forensic products and hash sets would need to move away from MD5 and adopt another hash algorithm.

Or maybe for just those investigations that rely solely on a hash.

Unless things have changed since I last did lots of investigations (quite possible).

Hashsets were used to identify the positives, which were then reviewed, as there have always been a few spurious files in any hashset and classification baselines sometimes change.

The remainder of the images were then either ignored or manually classified.

So even if poisoned images do start appearing, those that are found would be ruled out by someone having a look at the image before a charge was made.
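The workflow described above (hash match first, human review second) can be sketched roughly as follows. This is only an illustration under assumptions: the function names `file_digest` and `triage`, the choice of SHA-256, and the idea that the hash set is a plain set of hex strings are all hypothetical and not taken from any particular forensic product.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def triage(paths, known_hashes, algorithm="sha256"):
    """Split files into hash-set hits and everything else.

    A hit is only a candidate: it still goes to a human reviewer.
    Files that do not match are NOT cleared; they go to manual classification.
    """
    flagged_for_review, needs_manual_classification = [], []
    for p in paths:
        if file_digest(p, algorithm) in known_hashes:
            flagged_for_review.append(p)
        else:
            needs_manual_classification.append(p)
    return flagged_for_review, needs_manual_classification
```

The point of returning two lists rather than a yes/no verdict is that neither bucket is a conclusion: a spurious or deliberately colliding file in the first list is caught when someone looks at it, and new material in the second list is not silently skipped.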


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

That's not possible given any of the currently known attacks on MD5 or SHA-1. There are two basic criteria for a hash function

I have no idea (and I don't want to know) which hash algorithm the "known hashsets" use.

IF it is MD5, it is relatively easy (and cheap) to create a collision:
http://natmchugh.blogspot.it/2014/10/how-i-created-two-images-with-same-md5.html

I found that I was able to run the algorithm in about 10 hours on an AWS large GPU instance bringing it in at about $0.65 plus tax.

@PaulSanderson
Sure :), no risk of jailing an innocent.

The fun of the (fictitious/hypothetical) collision making "attack" would be exactly that of having the investigators go through thousands of false positive lolcats.

jaclaz


   
tracedf
(@tracedf)
Estimable Member
Joined: 10 years ago
Posts: 169
 

IF it is MD5, it is relatively easy (and cheap) to create a collision:
http://natmchugh.blogspot.it/2014/10/how-i-created-two-images-with-same-md5.html

jaclaz

That's for collisions, not pre-images. The distinction is that a pre-image matches a known hash value, either by recreating the original input or by finding another input that matches the same hash. The known, practical attacks on MD5 do not do this; instead, they work to find two inputs with the same hash, and they depend on the attacker being able to modify either input.

Finding a preimage for a specific hash is a much harder problem.
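To make the difference in difficulty concrete, here is a toy sketch. It assumes a deliberately tiny 16-bit digest (SHA-256 truncated, via the hypothetical helper `toy_hash`) so that brute force finishes instantly. It is not an MD5 attack: the real MD5 collision attacks use cryptanalytic shortcuts, whereas this only shows the generic gap between "find any two inputs that match" (roughly 2^(n/2) tries) and "find an input matching one given hash" (roughly 2^n tries).

```python
import hashlib
from itertools import count

BITS = 16  # toy digest size, chosen only so the searches finish quickly


def toy_hash(data: bytes) -> int:
    """Top BITS bits of SHA-256, standing in for a very weak hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - BITS)


def find_collision():
    """Birthday search: any two distinct inputs with the same toy digest."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1  # both inputs and number of tries
        seen[h] = msg


def find_second_preimage(target: bytes):
    """Search for a different input that matches one fixed, given digest."""
    goal = toy_hash(target)
    for i in count():
        msg = str(i).encode()
        if msg != target and toy_hash(msg) == goal:
            return msg, i + 1


if __name__ == "__main__":
    a, b, collision_tries = find_collision()
    m, preimage_tries = find_second_preimage(b"some fixed file")
    print("collision after", collision_tries, "tries")
    print("second preimage after", preimage_tries, "tries")
```

With a 16-bit digest the collision search is expected to need a few hundred tries and the second-preimage search tens of thousands; the exact counts depend on the inputs. Scaling the same gap up to a 128-bit digest is why a collision attack does not, by itself, let anyone match a hash value that is already in a hash set.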


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

That's for collisions, not pre-images. The distinction is that a pre-image matches a known hash value, either by recreating the original input or by finding another input that matches the same hash. The known, practical attacks on MD5 do not do this; instead, they work to find two inputs with the same hash, and they depend on the attacker being able to modify either input.

Finding a preimage for a specific hash is a much harder problem.

Sure, the practical result of the given experiment is two (actually three) images with the same hash, which is all that is needed to make m00t of MD5 hashes as a method of "recognizing" images, because (in theory) the bad guys could produce a number of specially crafted "real CP" images, and once they become part of the hashset, start feeding specially crafted lolcats with corresponding hashes.

The fact that pre-image breaking is much harder (which is obviously a good thing :) ) only guarantees that a document that you created and hashed cannot be substituted by another one with the same hash (because creating the latter is impossible or too "processing heavy"). In this case, however, the creators of the documents/images are unknown and the hashes are (presumably) added to the hashset as soon as the images are "seen in the wild", so there is no control over how the initial document/image was made, and thus the hash might be "contaminated".

The "Nostradamus" (nice BTW) POC about US Presidential Elections 2008
http://www.win.tue.nl/hashclash/Nostradamus/
works this way.

Possible? Yes.
Probable or going to happen soon? No.

jaclaz


   
BraindeadVirtually
(@braindeadvirtually)
Estimable Member
Joined: 17 years ago
Posts: 115
 

I'm not sure which I find less credible:

1. An IIOC case is reviewed exclusively on whether data pinging against some known hash sets is present (so any new stuff just gets ignored presumably?) and otherwise it all gets overlooked - please please tell me nobody is doing this or anything like this! Hashing should be used exclusively for indicative white/blacklisting i.e. at best a timesaver prior to proper filtering and searching.

2. A suspect is sufficiently technically savvy to understand file hashing, obtain LE hash sets, compare them against his or her data, and then slightly modify his or her library of IIOC material as necessary to not flag anything when the device gets seized and investigated by the criminally lazy investigator in (1.)… and yet doesn't think to use extremely strong encryption/duress passwords etc, which, let's face it, are going to give most investigators far more of a headache than the fact your hash sets aren't pinging known IIOC.

Ahh, the rabbit holes we can go down when we completely ignore reality…


   
Chris_Ed
(@chris_ed)
Reputable Member
Joined: 16 years ago
Posts: 314
 

1. An IIOC case is reviewed exclusively on whether data pinging against some known hash sets is present (so any new stuff just gets ignored presumably?) and otherwise it all gets overlooked..

Nobody is saying this in this thread.

2. A suspect is sufficiently technically savvy to understand file hashing.. etc.

Of course, encryption is absolutely a better way to hide your data, and I doubt that any single person would take the time to alter their images in such a way - but even if this single use case is unlikely to occur, IMO it is still in the interest of LE not to publicly provide such hash sets.

Ahh, the rabbit holes we can go down when we completely ignore reality…

Snark noted and appreciated, thumbs up. :)


   
BraindeadVirtually
(@braindeadvirtually)
Estimable Member
Joined: 17 years ago
Posts: 115
 

IMO it is still in the interest of LE not to publicly provide such hash sets.

I agree with everything you have said (you are always welcome for the snark) apart from this. A pure list of SHA256 / MD5 strings is of little to no value to somebody looking to employ counterforensics, in my opinion. That said, I'm in LE and have no intention of sharing any hashes that aren't already openly available with anybody, because that seems like the correct thing to do.


   