Processing thousands of images

(@bchaseaz)
Posts: 13
Active Member
Topic starter
 

I have a hard drive with tens of thousands of image files on it. I processed the entire drive with FTK.

There are about 4-6 copies of each image on the drive. I need to find a way to filter and sort images. Ideally, I'd like to produce a report or a folder with a single copy of each image. I'd also like to do things like filter out the jokes, meme images, etc.

It would also be nice to have that report show the image, and give the 5 places where that image was located on the drive. FTK can do some of this using the PhotoDNA feature. File hashing will not work, because the images are not exact copies.
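To illustrate why a file hash misses these copies while a "visual" hash can still match them, here is a minimal sketch of a simple, widely used perceptual hash (a difference hash, "dHash"). This is a generic technique swapped in for illustration; it is not PhotoDNA. The sketch assumes each image has already been decoded and shrunk to a 9x8 grid of grayscale values by some image library, and the 10-bit match threshold is an arbitrary example value.

```python
# Minimal difference-hash ("dHash") sketch. Assumption: each image has already
# been decoded and downscaled to 8 rows of 9 grayscale values (0-255) by some
# image library; only the hashing and comparison steps are shown here.

def dhash(gray_rows):
    """Return a 64-bit difference hash from 8 rows of 9 grayscale pixels."""
    if len(gray_rows) != 8 or any(len(row) != 9 for row in gray_rows):
        raise ValueError("expected 8 rows of 9 grayscale values")
    bits = 0
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            # Each bit records whether brightness increases left-to-right,
            # so uniform brightness/contrast changes leave the hash unchanged.
            bits = (bits << 1) | (1 if right > left else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def near_duplicate(a, b, max_distance=10):
    """Treat two images as the same picture if their hashes differ in few bits.
    The 10-bit threshold is an illustrative assumption, not a standard."""
    return hamming(a, b) <= max_distance
```

A re-saved or slightly brightened copy changes every byte of the file (so MD5/SHA values differ) but usually changes few or no dHash bits, which is the property the poster is after.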

Is there any good program to help me do this?

Thanks.

 
Posted : 23/06/2014 11:05 pm
(@mscotgrove)
Posts: 938
Prominent Member
 

With my software I just deduplicate the images based on hash values. This can remove a large number of them. Obviously this will only detect exact matches, but on most drives there are many such matches.
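Exact-match deduplication of this kind can be sketched in a few lines. This is a minimal illustration, not CnW's implementation: it assumes the images are ordinary files under one folder, and the choice of SHA-256 and the extension list are assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_exact_duplicates(root, extensions=(".jpg", ".jpeg", ".png", ".gif", ".bmp")):
    """Map each SHA-256 digest to every image path under root that has it.
    The extension list is an illustrative assumption (matched case-insensitively)."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            groups[file_sha256(path)].append(path)
    return dict(groups)
```

Keeping the first path of each group gives one copy per exact duplicate, and the full list of paths gives every location where that copy was found.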

The next stage is harder.

1) I use my software to skip known files, based on NSRL hash values. This will remove some.

2) As a rather vague stage, one can often skip small files, or files without embedded dates (again, CnW software adds in the date of the file from the internal metadata).

If you are only interested in certain files, it should be possible to filter by date, camera type, etc.
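The triage steps above (skip known hashes, skip small or undated files, optionally keep a single camera model) could be sketched as follows. This is not CnW's implementation: the `ImageRecord` fields, the 20,000-byte size cut-off, and the idea of loading a set of known MD5 values (for example derived from the NSRL) are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageRecord:
    # Hypothetical fields; in practice these would come from a hashing/EXIF tool.
    path: str
    md5: str
    size_bytes: int
    exif_date: Optional[str] = None    # e.g. "2013:07:04 18:22:10"
    camera_model: Optional[str] = None


def triage(records, known_md5s, min_size=20_000, require_date=True, camera=None):
    """Drop known files (known_md5s: lowercase hex MD5 values), small files and,
    optionally, files without an embedded date or from other camera models."""
    kept = []
    for rec in records:
        if rec.md5.lower() in known_md5s:
            continue  # known file (e.g. in an NSRL-derived set): skip
        if rec.size_bytes < min_size:
            continue  # icons, thumbnails and web graphics tend to be small
        if require_date and not rec.exif_date:
            continue  # no embedded date
        if camera is not None and rec.camera_model != camera:
            continue  # not from the camera of interest
        kept.append(rec)
    return kept
```

Each filter is a separate parameter so the looser "vague" stages can be switched off (for instance `require_date=False`) when they would throw away too much.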

 
Posted : 24/06/2014 2:00 am
Adam10541
(@adam10541)
Posts: 550
Honorable Member
 

Sometimes the hard way is the only way :)

If this is a CP-related job, then I would just bite the bullet and start reviewing with minimal exclusions, because you never know what will be relevant.

You may even be able to exclude some locations and concentrate only on user folders.

Depending on what your goal is, tens of thousands of images is not really that big a job; it's when they get into the hundreds of thousands or millions that you have a hard time.

If you are just looking for particular content (CP or regular porn) then you should be able to get through 30-40k in a single day quite comfortably.
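As a rough check of what that pace means, assuming an 8-hour review day (an assumption; the post does not say how long "a single day" is):

```latex
\frac{30\,000\ \text{images}}{8 \times 3600\ \text{s}} \approx 1.04\ \text{images/s}
\quad (\approx 0.96\ \text{s per image}),
\qquad
\frac{40\,000\ \text{images}}{8 \times 3600\ \text{s}} \approx 1.39\ \text{images/s}
\quad (\approx 0.72\ \text{s per image})
```

That is roughly one image per second, which fits the point that this rate works for spotting particular content rather than for close examination.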

If the pics need closer examination, then you are in for a long haul, because no software can reliably cut out much of the legwork here.

 
Posted : 24/06/2014 7:18 am
(@twjolson)
Posts: 417
Honorable Member
 

That is the kind of job Netclean was designed for. It uses PhotoDNA to group images that are visually similar into a visual group. The reporting isn't as good as I'd like, but when it comes to image and video review and categorization, it is top notch.
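Netclean's PhotoDNA grouping is proprietary, so the following is only a rough illustration of the idea of "visual groups": it clusters precomputed perceptual hashes (any 64-bit hash such as a dHash is assumed) by joining any two whose Hamming distance is within a threshold, using a small union-find. The threshold value is an assumption. The pairwise loop is quadratic, so for tens of thousands of images a real tool would use an index (for example a BK-tree) instead.

```python
def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def visual_groups(hashes, max_distance=10):
    """Group indices of hashes within max_distance bits of each other,
    chaining transitively (single-linkage) via union-find."""
    parent = list(range(len(hashes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(hashes)):
        for j in range(i + 1, len(hashes)):
            if hamming(hashes[i], hashes[j]) <= max_distance:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(hashes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())  # each group lists indices in ascending order
```

Single-linkage chaining means A and C can land in one group because each is close to B, which matches the "same picture, several slightly different copies" situation but can over-merge if the threshold is set too loose.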

And, best of all, it is free to law enforcement.

 
Posted : 24/06/2014 11:56 am
(@sam305754)
Posts: 44
Eminent Member
 

Hi,

You can use VizX2 from ZiuZ or LACE from BlueBear. These tools are not free, but you can ask for a trial version. They are designed to analyze large amounts of pictures and video, and you can use hash values.
VizX2 has the ZZ40 hash, which detects ‘near duplicates’. Working closely with Microsoft, ZiuZ has integrated their PhotoDNA technology with VizX2 to create an additional means of matching. PhotoDNA is a little more tolerant to changes in an image than ZZ40.

Regards

 
Posted : 24/06/2014 12:39 pm
PaulSanderson
(@paulsanderson)
Posts: 651
Honorable Member
 

Reconnoitre has a very advanced reporting engine and can create very customisable reports (which can be saved and re-used or shared).

I have just modified an existing indecent images report template (it took about 5 minutes) so that it compares images by hash and only reports the full details for the first of a series of duplicate images. Subsequent duplicate images have just summary information displayed, i.e. filename, path and VSC number (Reconnoitre automatically processes VSCs and the images within them).
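The "full details for the first copy, summary lines for later copies" layout can be sketched generically. This is not Reconnoitre's report engine; the `(path, digest, vsc)` input tuples and the output format are hypothetical, and in a real report the first entry would carry the thumbnail and metadata rather than a single header line.

```python
def _where(path, vsc):
    """Describe a location; vsc is a Volume Shadow Copy number, or None for the live volume."""
    return f"{path} [VSC {vsc}]" if vsc is not None else f"{path} [live volume]"


def summary_report(items):
    """items: (path, digest, vsc) tuples in discovery order. The first item seen
    for each digest gets a full entry; later copies get one summary line each."""
    groups = {}  # dicts keep insertion order (Python 3.7+)
    for path, digest, vsc in items:
        groups.setdefault(digest, []).append((path, vsc))

    lines = []
    for digest, entries in groups.items():
        first_path, first_vsc = entries[0]
        lines.append(f"IMAGE {digest}: {_where(first_path, first_vsc)}, {len(entries)} location(s)")
        for path, vsc in entries[1:]:
            lines.append(f"    duplicate: {_where(path, vsc)}")
    return lines
```

This is also close to what the original poster asked for: one entry per image, followed by every place that image was found.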

A screenshot from a page of the report is shown, and the full report (in this case a PDF) can be downloaded at the link below; the PDF shows a few more of the summary features in this report.

Summary images report

 
Posted : 25/06/2014 1:09 pm
minime2k9
(@minime2k9)
Posts: 481
Honorable Member
 

I'll second the Netclean recommendation.

If you're not LE and therefore can't get a free copy, I would say C4All is a good option just for viewing the images.

Tens of thousands of images is actually a fairly low number; the average job in our unit is 150,000ish after known hashes have been removed.

 
Posted : 25/06/2014 7:27 pm
(@bchaseaz)
Posts: 13
Active Member
Topic starter
 

Thanks for all of the suggestions. I will start testing them out.
I am not LE, I'm on the defense side of this particular case.

I have a hard drive created by LE, which is made up of everything they collected. It's not a hard drive image, rather just an external drive where they dumped all of their data. So I can't filter by folders, users, access dates, etc.

The problem I have is I don't know what is relevant. It is not a CP case. Some relevant pictures may have people, others may show items, but we don't really know what we are going to find that is useful.
The attorney really wants to filter down the list of images into something more manageable, because he needs to be able to go through them.

 
Posted : 28/06/2014 8:49 pm
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

I have a hard drive created by LE, which is made up of everything they collected. It's not a hard drive image, rather just an external drive where they dumped all of their data. So I can't filter by folders, users, access dates, etc.

The problem I have is I don't know what is relevant. It is not a CP case. Some relevant pictures may have people, others may show items, but we don't really know what we are going to find that is useful.
The attorney really want to filter down the list of images into something more manageable, because he needs to be able to go through them.

Then this is a bit outside "common" forensics "standards"; it is more like re-organizing a "mixed/shuffled" archive.

BTW (and as a side note), I believe that (at least in many countries) the LE/prosecution has to provide the defense with the actual unmodified hard disk images, not a bunch of files copied to a hard disk.

If the images have been copied maintaining date/time of the original filesystem, dividing them in folders by date would make a lot of sense.
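If the copies did keep their original timestamps, sorting them into per-day folders is straightforward. A minimal sketch, assuming the modification time is the timestamp worth sorting on and that `src_root` and `dest_root` are separate folders; the function names and the `_1`, `_2` renaming scheme are hypothetical. `shutil.copy2` copies the file together with its timestamps, so the sorted copies keep their dates.

```python
import shutil
from datetime import datetime
from pathlib import Path


def date_folder_name(path):
    """YYYY-MM-DD folder name from the file's modification time (local time)."""
    return datetime.fromtimestamp(Path(path).stat().st_mtime).strftime("%Y-%m-%d")


def sort_into_date_folders(src_root, dest_root):
    """Copy every file under src_root into dest_root/<YYYY-MM-DD>/ without
    modifying the source; copy2 preserves the timestamps on the copies."""
    dest_root = Path(dest_root)
    for path in sorted(Path(src_root).rglob("*")):
        if not path.is_file():
            continue
        target_dir = dest_root / date_folder_name(path)
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / path.name
        # Identical file names are common in a dump like this; never overwrite.
        counter = 1
        while target.exists():
            target = target_dir / f"{path.stem}_{counter}{path.suffix}"
            counter += 1
        shutil.copy2(path, target)
```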

Likewise, IF the images still have their EXIF data (with date/time) but not the actual filesystem timestamps, it would also be possible to make a "big" split between those that have no EXIF data and the rest, and then order the rest in folders by date based on the EXIF data.
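A sketch of that split, assuming the EXIF DateTimeOriginal values have already been extracted by some tool (for example ExifTool) into a dict of path to string, or `None` when the tag is missing. EXIF stores this tag as "YYYY:MM:DD HH:MM:SS"; the "no_exif" and "YYYY/MM" folder names are hypothetical choices.

```python
from datetime import datetime


def exif_bucket(exif_datetime):
    """Folder name for one image, given its EXIF DateTimeOriginal string
    ("YYYY:MM:DD HH:MM:SS") or None when the tag is missing."""
    if not exif_datetime:
        return "no_exif"
    try:
        taken = datetime.strptime(exif_datetime.strip(), "%Y:%m:%d %H:%M:%S")
    except ValueError:
        # Zeroed or malformed dates such as "0000:00:00 00:00:00" occur in practice.
        return "no_exif"
    return taken.strftime("%Y/%m")


def split_by_exif(exif_dates):
    """exif_dates: dict of path -> DateTimeOriginal string (or None).
    Returns dict of folder name -> sorted list of paths."""
    buckets = {}
    for path, value in sorted(exif_dates.items()):
        buckets.setdefault(exif_bucket(value), []).append(path)
    return buckets
```

Treating unparseable dates the same as missing ones keeps the "big" split simple: everything in `no_exif` needs a different approach.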

There are also tools that let you preview the images sorted by appearance/colour; as an example, this oldish one often worked for me:
http://download.chip.eu/it/ImageSorter-2.02_1756890.html

The original home page is "dead", though it can be accessed through the Wayback Machine:
https://web.archive.org/web/20100327205424/http://mmk.f4.fhtw-berlin.de/Projekte/ImageSorter

This should be the latest available version:
http://www.pixolution.de/index.php?id=18

But I believe there are other programs with a similar approach/functions, for example:
http://www.mindgems.com/products/VS-Duplicate-Image-Finder/VSDIF-About.htm
http://www.visipics.info/index.php?title=Main_Page
http://www.keronsoft.com/dupdetector.html
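A much cruder version of "sort by colour" than what ImageSorter does can still help a reviewer, because similar-looking pictures end up next to each other. A minimal sketch, assuming each image's average RGB colour has already been computed by some image library; the 0.15 saturation cut-off and the 12 hue bins are hypothetical tuning values.

```python
import colorsys


def colour_sort_key(mean_rgb):
    """Sort key from an image's average colour (r, g, b in 0-255):
    near-greys first, then coloured images by hue bin, then by brightness."""
    r, g, b = (channel / 255.0 for channel in mean_rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    is_colourful = s >= 0.15  # hypothetical cut-off between "grey" and "coloured"
    hue_bin = round(h * 12) % 12 if is_colourful else 0  # 12 bins; reds wrap to bin 0
    return (is_colourful, hue_bin, v)


def sort_by_colour(mean_colours):
    """mean_colours: dict of path -> (r, g, b) average colour; returns sorted paths."""
    return sorted(mean_colours, key=lambda path: colour_sort_key(mean_colours[path]))
```

Binning the hue before sorting by brightness keeps, for example, all mostly-green images together instead of interleaving them with neighbouring hues.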

jaclaz

 
Posted : 28/06/2014 10:00 pm
(@martjno)
Posts: 5
Active Member
 

In addition to the already mentioned products by ZiuZ and BlueBear, I think Adroit Photo Forensics by Digital Assembly is also worth a look.

 
Posted : 30/06/2014 11:08 am