Child Exploitation Hash Sets

airo · 2016-09-13T11:50:55Z

Hi,

Can anybody help us locate the Child Exploitation Hash Sets? We are currently looking at writing scanning software for images that classifies them into different categories. Having access to these hash sets would be useful.

We know that these hash sets should be free, but we failed to get access to them and are not sure whom to reach.

Thanks & regards,
Ian


tracedf

(@tracedf)

Estimable Member

Joined: 10 years ago

Posts: 169

16/09/2016 10:57 pm

If the hashsets were made publicly available I would be utterly shocked if there weren't sites on Tor which, within 24 hours of this availability, would guarantee (and advertise) that their entire collection of material is not found in any LE hashset.

Any of them could do this now. There's no reason to have the hash set; they just need to make each image unique by modifying at least one bit of the image. They could build a web app to serve up the images and modify one pixel at random each time the image is accessed; every download would be unique and therefore every hash value would be unique.

-Steven


PaulSanderson

(@paulsanderson)

Honorable Member

Joined: 19 years ago

Posts: 651

16/09/2016 11:01 pm

all forensic products and hash sets would need to move away from MD5 and adopt another hash algorithm.

Or maybe for just those investigations that rely solely on a hash.

Unless things have changed since I last did lots of investigations (quite possible).

Hashsets were used to identify the positives - which were then reviewed as there have always been a few spurious files in any hashset and sometimes classification baselines change.

Then the remainder of the images were either ignored or manually classified.

So even if poisoned images do start appearing, those that are found would be ruled out by someone having a look at the image before a charge was made.
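As a rough illustration of that workflow (hash match first, human review second), here is a minimal sketch. It assumes the hash set is simply a Python set of lowercase SHA-256 hex digests; the function names `sha256_of` and `triage` are made up for this example and do not come from any forensic product.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def triage(paths, known_hashes):
    """Split files into hash-set hits and everything else.

    A hit is only a lead that still needs an examiner's review;
    a miss is not a clearance and still needs manual classification.
    """
    flagged, unmatched = [], []
    for path in paths:
        if sha256_of(path) in known_hashes:
            flagged.append(path)
        else:
            unmatched.append(path)
    return flagged, unmatched
```

Both returned lists still go to a person: the first to confirm the match, the second for whatever manual classification the case requires.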


jaclaz

(@jaclaz)

Illustrious Member

Joined: 18 years ago

Posts: 5133

17/09/2016 1:14 am

That's not possible given any of the currently known attacks on MD5 or SHA-1. There are two basic criteria for a hash function

I have no idea (and I don't want to know) which hash algorithm the "known hashsets" use.

IF it is MD5, it is relatively easy (and cheap) to create a collision
http://natmchugh.blogspot.it/2014/10/how-i-created-two-images-with-same-md5.html

I found that I was able to run the algorithm in about 10 hours on an AWS large GPU instance, bringing it in at about $0.65 plus tax.

@PaulSanderson
Sure :), no risk of jailing an innocent.

The fun of the (fictitious/hypothetical) collision making "attack" would be exactly that of having the investigators go through thousands of false positive lolcats.

jaclaz


tracedf

(@tracedf)

Estimable Member

Joined: 10 years ago

Posts: 169

17/09/2016 8:26 am

IF it is MD5, it is relatively easy (and cheap) to create a collision
http://natmchugh.blogspot.it/2014/10/how-i-created-two-images-with-same-md5.html
…
jaclaz

That's for collisions, not pre-images. The distinction is that a pre-image matches a known hash value, either by recreating the original input or by finding another input that matches the same hash. The known, practical attacks on MD5 do not do this; instead, they work to find two inputs with the same hash, which requires being able to modify either input.

Finding a preimage for a specific hash is a much harder problem.
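To make the difference in effort concrete, here is a toy sketch. It assumes a deliberately weakened hash (SHA-256 truncated to 16 bits, via the made-up helper `toy_hash`) so both searches finish quickly. It says nothing about the real cost of attacking MD5; it only shows that a search where you choose both inputs needs far fewer trials on average than a search against one fixed hash value.

```python
import hashlib
from itertools import count


def toy_hash(data: bytes, nbytes: int = 2) -> bytes:
    """Deliberately weak hash: the first nbytes of SHA-256."""
    return hashlib.sha256(data).digest()[:nbytes]


def find_collision():
    """Birthday search: any two distinct inputs with the same toy hash."""
    seen = {}
    for i in count():
        msg = b"a-%d" % i
        digest = toy_hash(msg)
        if digest in seen:
            return seen[digest], msg, i + 1  # both inputs, number of trials
        seen[digest] = msg


def find_second_preimage(target_msg: bytes):
    """Brute force: a different input matching one fixed, given toy hash."""
    target = toy_hash(target_msg)
    for i in count():
        msg = b"b-%d" % i
        if msg != target_msg and toy_hash(msg) == target:
            return msg, i + 1  # matching input, number of trials


if __name__ == "__main__":
    m1, m2, collision_trials = find_collision()
    m3, preimage_trials = find_second_preimage(b"original evidence file")
    print("collision trials:", collision_trials)
    print("second-preimage trials:", preimage_trials)
```

For an n-bit hash the first search is expected to take on the order of 2^(n/2) trials and the second on the order of 2^n, which is why a practical collision attack does not imply a practical pre-image attack.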


jaclaz

(@jaclaz)

Illustrious Member

Joined: 18 years ago

Posts: 5133

17/09/2016 2:43 pm

That's for collisions, not pre-images. The distinction is that a pre-image matches a known hash value, either by recreating the original input or by finding another input that matches the same hash. The known, practical attacks on MD5 do not do this; instead, they work to find two inputs with the same hash, which requires being able to modify either input.

Finding a preimage for a specific hash is a much harder problem.

Sure, the practical result of the given experiment is two (actually three) images with the same hash, which is all that is needed to make m00t of MD5 hashes as a method of "recognizing" images, because (in theory) the bad guys could produce a number of specially crafted "real CP" images, and once they become part of the hashset, start feeding specially crafted lolcats with corresponding hashes.

The fact that pre-image breaking is much harder (which is obviously a good thing :) ) only prevents a document that you created and hashed from being substituted by one with the same hash (because creating this latter document is impossible or too "processing heavy"). In this case, though, the creators of the documents/images are unknown and the hashes are (presumably) added to the hashset as soon as they are "seen in the wild", so there is no control over the making of the initial document/image, and thus the hash might be "contaminated".

The "Nostradamus" (nice BTW) POC about US Presidential Elections 2008
http://www.win.tue.nl/hashclash/Nostradamus/
works this way.

Possible? Yes.
Probable or going to happen soon? No.

jaclaz


BraindeadVirtually

(@braindeadvirtually)

Estimable Member

Joined: 17 years ago

Posts: 115

20/09/2016 4:27 pm

I'm not sure which I find less credible:

1. An IIOC case is reviewed exclusively on whether data pinging against some known hash sets is present (so any new stuff just gets ignored presumably?) and otherwise it all gets overlooked - please please tell me nobody is doing this or anything like this! Hashing should be used exclusively for indicative white/blacklisting i.e. at best a timesaver prior to proper filtering and searching.

2. A suspect is sufficiently technically savvy to understand file hashing, obtain LE hash sets, compare them against his or her data, and then slightly modify his or her library of IIOC material as necessary to not flag anything when the device gets seized and investigated by the criminally lazy investigator in (1.)… and yet doesn't think to use extremely strong encryption/duress passwords etc, which, let's face it, are going to give most investigators far more of a headache than the fact your hash sets aren't pinging known IIOC.

Ahh, the rabbit holes we can go down when we completely ignore reality…


Chris_Ed

(@chris_ed)

Reputable Member

Joined: 16 years ago

Posts: 314

20/09/2016 5:49 pm

1. An IIOC case is reviewed exclusively on whether data pinging against some known hash sets is present (so any new stuff just gets ignored presumably?) and otherwise it all gets overlooked..

Nobody is saying this in this thread.

2. A suspect is sufficiently technically savvy to understand file hashing.. etc.

Of course, encryption is absolutely a better way to hide your data, and I doubt that any single person would take the time to alter their images in such a way - but even if this single use case is unlikely to occur, IMO it is still in the interest of LE not to publicly provide such hash sets.

Ahh, the rabbit holes we can go down when we completely ignore reality…

Snark noted and appreciated, thumbs up. :)


BraindeadVirtually

(@braindeadvirtually)

Estimable Member

Joined: 17 years ago

Posts: 115

20/09/2016 6:21 pm

IMO it is still in the interest of LE not to publicly provide such hash sets.

I agree with everything you have said (you are always welcome for the snark) apart from this. A pure list of SHA256 / MD5 strings is of little to no value to somebody looking to employ counterforensics, in my opinion. That said, I'm in LE and have no intention of sharing any hashes that aren't already openly available with anybody, because that seems like the correct thing to do.

