Child Exploitation Hash Sets  

airo
(@airo)
New Member

Hi

Can anybody help us locate the child exploitation hash sets? We are looking at writing software that scans images and classifies them into different categories. Having access to these hash sets would be useful.

We understand that these hash sets should be free, but we have failed to get access to them and are not sure whom to contact.

Thanks & regards
Ian

Posted : 13/09/2016 12:50 pm
Chris_Ed
(@chris_ed)
Active Member

Hi Ian,

I'm going to assume you're not in Law Enforcement for the sake of this reply; if you are LE then there should be channels by which you can obtain hash sets. Whether they "should be free" or not is perhaps a different conversation. Outside of LE you will have a difficult time obtaining hash sets like this.

That aside - broadly speaking, what you are looking for is Project VIC. Have you tried applying to become an official partner at somewhere like ForceLab? That seems to be the developer-facing version of VIC.

As another aside: do you need specific hash sets of child exploitation material? If you are working on "scanning and categorisation" software then surely you can build your own hash set and demonstrate it on a working copy. But pretending for a moment that you have the hash sets, how would you test it? By downloading your own child exploitation material? This is an extremely dangerous (and illegal) path, so please beware.

Posted : 13/09/2016 1:05 pm
airo
(@airo)
New Member

Thanks for the information; I will have a look.

I know that we cannot test this particular functionality of the solution we are planning, as it would be illegal. However, if it works in practice we should be able to find LE with the necessary clearance willing to run some tests and pass on feedback.

Thanks & regards
Ian

Posted : 13/09/2016 2:01 pm
dan0841
(@dan0841)
Member

Thanks for your information I will have a look.

I know that we cannot test this particular functionality of the solution we are planning as it would be illegal, however if it works in practice we should be able to find LE with the necessary clearance willing to run some tests and pass feedback.

Thanks & regards
Ian

I would just write your software; there should be no need for child abuse hash sets. You could test fully, and in principle, using any hash sets which you create yourself (i.e. hashes of any set of legal pictures, videos, etc.). The technical solution is identical.

LE would (and should IMHO) be very cautious about releasing hash sets externally.
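
To make the point about self-made hash sets concrete, here is a minimal sketch that builds a hash set from a folder of your own legal files and then scans a second folder against it. The choice of SHA-256, the folder layout and the function names (`sha256_of`, `build_hash_set`, `scan`) are assumptions for illustration only, not how any LE tool works.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_hash_set(folder: Path) -> set[str]:
    """Hash every file under `folder`; this is the self-made 'known' set."""
    return {sha256_of(p) for p in folder.rglob("*") if p.is_file()}


def scan(folder: Path, known: set[str]) -> list[Path]:
    """Return the files under `folder` whose digest is in the known set."""
    return sorted(
        p for p in folder.rglob("*") if p.is_file() and sha256_of(p) in known
    )


def demo() -> list[str]:
    """Build a tiny reference set from stand-in files and scan a target folder."""
    with tempfile.TemporaryDirectory() as tmp:
        reference = Path(tmp) / "reference"
        target = Path(tmp) / "target"
        reference.mkdir()
        target.mkdir()
        # Stand-in bytes; in a real test these would be your own legal pictures.
        (reference / "legal_photo.jpg").write_bytes(b"stand-in bytes for a legal test image")
        (target / "copy_of_photo.jpg").write_bytes(b"stand-in bytes for a legal test image")
        (target / "unrelated.jpg").write_bytes(b"some other file")
        known = build_hash_set(reference)
        return [p.name for p in scan(target, known)]


print(demo())
```

The matching logic never looks at what the files depict, which is why the content of the test set does not matter for exact hashing.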

Posted : 13/09/2016 3:09 pm
EricZimmerman
(@ericzimmerman)
Active Member

No one in LE is going to release hash sets like this.

As others have said, make up your own data and use that. If you are doing binary hashing, it doesn't matter what the files are. Test data for something like PhotoDNA or a similar solution (i.e. fuzzy matching) can still be fabricated.

Once you get things to a working state, get with LE (local ICAC, IcacCops, etc.) and let them test the software against their data (they will have all the actual files in question and the hash sets, so they can run it through its paces).
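
As a rough illustration of fabricating test data for fuzzy matching, the sketch below uses a simple "average hash" on made-up 8x8 grayscale grids. This is not PhotoDNA (which is proprietary); it is a much simpler stand-in technique, and the grids, helper names and distance threshold are all assumptions chosen for the example.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """64-bit average hash of an 8x8 grayscale grid: bit is 1 where pixel >= mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value >= mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def gradient(offset: int = 0) -> list[list[int]]:
    """Fabricated test image: a smooth ramp, optionally brightened by `offset`."""
    return [[min(255, (r * 8 + c) * 4 + offset) for c in range(8)] for r in range(8)]


def checkerboard() -> list[list[int]]:
    """Fabricated test image with a completely different structure."""
    return [[255 if (r + c) % 2 == 0 else 0 for c in range(8)] for r in range(8)]


THRESHOLD = 5  # hypothetical cut-off: distances at or below this count as a match

original = average_hash(gradient())
brightened = average_hash(gradient(offset=3))   # same picture, slightly brighter
different = average_hash(checkerboard())

print(hamming(original, brightened) <= THRESHOLD)  # near-duplicate is matched
print(hamming(original, different) <= THRESHOLD)   # unrelated image is not
```

A real tool would first decode and downscale actual image files; the grids here skip that step so the example stays dependency-free.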

Posted : 13/09/2016 10:22 pm
tracedf
(@tracedf)
Active Member

LE would (and should IMHO) be very cautious about releasing hash sets externally.

Why are they so restrictive about the hash sets? They can't be used to recreate the images. If they made these more widely available, I think they would find that many organizations would proactively scan for them and report offenders to law enforcement. I worked in a K-12 school district and we would have loved to have a way to identify if any of our staff/teachers ever downloaded child exploitation photos.

Posted : 13/09/2016 11:14 pm
EricZimmerman
(@ericzimmerman)
Active Member

Because if a pedophile got hold of the hash sets they would know what LE knows and could act accordingly.

If you have a school resource officer, that is a good way to get access to LE stuff, but giving things like hashes and keywords out to the general public won't happen.

Posted : 13/09/2016 11:19 pm
UnallocatedClusters
(@unallocatedclusters)
Senior Member

Free Hash Sets for Download

http://www.nsrl.nist.gov/Downloads.htm

Paid Hash Sets for Download

http://www.hashsets.com/

My Favorite Hash Sets

https://www.pinterest.com/pin/222013456607420254/

Posted : 14/09/2016 7:33 am
tracedf
(@tracedf)
Active Member

because if a pedophile got a hold of the hash sets they would know what LE knows and can act accordingly.

if you have a school resource officer that is a good way to get access to LE stuff, but giving things out like hashes and keywords to the general public wont happen.

1) Do the sets include new images from open investigations? I can see limiting access to that, but the hashes from known images in cases where charges have already been filed and/or where the cases have been tried would still be really valuable to schools, service providers, etc.

I didn't think about having our school resource officers request it; that's a good idea. Thanks.

Posted : 14/09/2016 9:06 pm
jaclaz
(@jaclaz)
Community Legend

1) Do the sets include new images from open investigations? I can see limiting access to that, but the hashes from known images in cases where charges have already been filed and/or where the cases have been tried would still be really valuable to schools, service providers, etc.

So there is a given image with a given hash.

Knowing that the given hash is known, I can change just one byte of the image and obtain one that looks indistinguishable from the original but that will pass under the radar of a hash comparison.

Publishing the known hashsets has consequences.

And there is NOT one reason in the world for wanting the hash sets (without the images) if what you need/want is to validate the hashing algorithm or a specific implementation.
I would say that by this time the algorithm has been validated enough and anyway - since it is a generic algorithm of which tens of implementations exist - a specific implementation can be validated by comparison to existing tools applied to "common" images.
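
As a sketch of that kind of validation, the snippet below checks an implementation against the widely published digests of the empty input and of "abc" for MD5, SHA-1 and SHA-256. Here Python's `hashlib` stands in for the implementation under test; `implementation_under_test` and `validate` are hypothetical names, and you would swap your own function in.

```python
import hashlib

# Published reference digests for the empty input and for b"abc".
KNOWN_VECTORS = {
    "md5": {
        b"": "d41d8cd98f00b204e9800998ecf8427e",
        b"abc": "900150983cd24fb0d6963f7d28e17f72",
    },
    "sha1": {
        b"": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
        b"abc": "a9993e364706816aba3e25717850c26c9cd0d89d",
    },
    "sha256": {
        b"": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        b"abc": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
    },
}


def implementation_under_test(algorithm: str, data: bytes) -> str:
    """Placeholder for your own hashing code; hashlib is used as a stand-in."""
    return hashlib.new(algorithm, data).hexdigest()


def validate(hash_fn) -> list[tuple[str, bytes]]:
    """Return the (algorithm, input) pairs for which hash_fn gives a wrong digest."""
    failures = []
    for algorithm, vectors in KNOWN_VECTORS.items():
        for data, expected in vectors.items():
            if hash_fn(algorithm, data) != expected:
                failures.append((algorithm, data))
    return failures


print(validate(implementation_under_test))  # an empty list means every vector matched
```

The same loop works with digests produced by an existing tool over a folder of ordinary images in place of the fixed vectors.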

Using images of meerkats for the tests is the way to go
http://www.forensicfocus.com/Forums/viewtopic/p=6569664/#6569664

The only exception would be of course if you want to "filter" some traffic, but unless you are LE, that would pose another kind of problem.

Let's say that your filter finds a corresponding hash for a file called daisies.jpg downloaded from the Internet by Mrs. Donovan (the nice, elderly, gray haired lady that teaches Class 3E) and an alarm is triggered.

What is your action?
Examples:
1) Log the file download but allow it, make a copy of the file on another PC/server and call the cops?
2) Log the file download but allow it, make a copy of the file on another PC/server and view the image yourself to make sure, then call the cops?
3) Drop/block the download, make a copy of the file on another PC/server and call the cops?
4) Drop/block the download, make a copy of the file on another PC/server and view the image yourself to make sure, then call the cops?
5) Something else …

Please consider the possible consequences of the action you choose from the list above, or of the action you have in mind (please describe), both in the case of a correct "positive" and of a false one.

jaclaz

Posted : 14/09/2016 9:43 pm
tracedf
(@tracedf)
Active Member

So there is a given image with a given hash.

Knowing that the given hash is known, I can change just one byte of it and obtain an image indistinguishable from the original when seen but that will pass under the radar of a hash comparison.

Publishing the known hashsets has consequences.

<snip>

You don't need to know the hash to change the images. Any collector/distributor of child pornography would be smart to write a program that can toggle a random pixel in each image to break the hash. Releasing the hashes does nothing to aid the child pornographer.

I can't see trying to use hashes to filter images being downloaded (too much latency), but it would be useful for identifying child pornography stored on a workstation or file server. If it is detected, the best way forward may depend on the locality, but I would run it by my organization's attorneys and coordinate with local law enforcement to determine what our response should be. With ordinary content filtering, we get a lot of false positives because many sites are categorized based on keywords, so a NY Times article about sexual assault on college campuses can get categorized as pornographic. With hashes of known images, a positive result should be definitive 99.99% of the time; the only exception being images that were added to the hash set by mistake (a mis-identification of an adult pornographic image maybe).

In the K-12 environment, we had school resource officers who were sworn police officers so we could have leveraged them in our response.

I think there is more benefit to sharing the information than keeping it secret (excepting new images from open investigations).

For testing software, any hash set works so I agree that these are not needed for that purpose.

Posted : 15/09/2016 2:21 am
jaclaz
(@jaclaz)
Community Legend

You don't need to know the hash to change the images. Any collector/distributor of child pornography would be smart to write a program that can toggle a random pixel in each image to break the hash.

But then the whole hashsets concept is totally useless. 😯

I mean, if every collector/distributor/redistributor actually "injects" a few bytes and creates a "random" hash, the hash set will never find any positive, not even if it grows to billions of hashes. Instead it will likely start giving lots of false positives: for each hash that is added to it, the same image will be regenerated several times, creating several new hashes, and if those are added to the hash set too, sooner or later it will contain every possible hash.

Maybe it's time to have image recognition techniques instead of hashes …

jaclaz

Posted : 15/09/2016 1:30 pm
tracedf
(@tracedf)
Active Member

… if they are added to the hashset, before or later the hashset will contain every possible hash.

Maybe it's time to have image recognition techniques instead of hashes …

jaclaz

Even with a 128-bit hash, exhausting the hash space really isn't an issue. Even a handful of individual collisions is pretty improbable. As far as I know, the people who commit these crimes are not doing this, but it would be relatively easy to do if they had any programming skills. Supplementing hash sets with image recognition would be a good move and the technology exists (e.g. Google's reverse image search).
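
For a rough sense of scale, here is a back-of-the-envelope sketch using the standard birthday approximation. It assumes hashes behave like uniformly random 128-bit values (deliberately constructed collisions, as are known for MD5, are a separate matter), and the figure of one billion entries is just an assumed example size.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: chance that any two of n random hashes collide."""
    # p ~= 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps precision for tiny values.
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


def fraction_of_space_used(n: int, bits: int) -> float:
    """Share of all possible hash values taken up by n distinct hashes."""
    return n / 2 ** bits


entries = 10 ** 9  # assumed size: a hash set with one billion entries

print(f"chance of any accidental collision: {collision_probability(entries, 128):.2e}")
print(f"fraction of the hash space used:    {fraction_of_space_used(entries, 128):.2e}")
```

Both numbers come out vanishingly small (the collision chance is on the order of 1e-21), which is the basis for saying that filling the hash space is not a practical concern.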

This is a bigger problem in computer security/incident response where the bad guys are constantly tweaking their tools and use techniques to generate new versions with trivial differences. In those cases, it is more difficult to identify their tools as you might have many different hashes or signature strings for the same tool.

-Steven

Posted : 15/09/2016 9:15 pm
armresl
(@armresl)
Community Legend

You are 100% right. Most of the time, it's just cops being cops and objecting just to object.

That argument will come up a lot if you happen to work for the defense. More to the point, the number of roadblocks placed in your path if you are non-LE grows very quickly.

LE would (and should IMHO) be very cautious about releasing hash sets externally.

Why are they so restrictive about the hash sets? They can't be used to recreate the images. If they made these more widely available, I think they would find that many organizations would proactively scan for them and report offenders to law enforcement. I worked in a K-12 school district and we would have loved to have a way to identify if any of our staff/teachers ever downloaded child exploitation photos.

Posted : 16/09/2016 8:20 am
Chris_Ed
(@chris_ed)
Active Member

While it is of course programmatically easy to change a file in order to generate a new hash, there will always be enough uncertainty in detection that there won't be huge efforts made in this regard. With a readily available hash set, there is no uncertainty.

If the hashsets were made publicly available I would be utterly shocked if there weren't sites on Tor which, within 24 hours of this availability, would guarantee (and advertise) that their entire collection of material is not found in any LE hashset.

Hash sets are not ideal, yes, and perhaps with a large enough collection of child abuse images we could train a machine to spot them with decent accuracy, but for right now hashing is still the fastest way to detect this sort of stuff.

I work in the private sector and I can appreciate the frustrations, but IMO there are some things which rightfully should remain in the LE domain.

Posted : 16/09/2016 1:30 pm