
Processing thousands of images

13 Posts
10 Users
0 Reactions
1,528 Views
(@bchaseaz)
Active Member
Joined: 12 years ago
Posts: 13
Topic starter  

I have a hard drive with tens of thousands of image files on it. I processed the entire drive with FTK.

There are about 4-6 copies of each image on the drive. I need to find a way to filter and sort images. Ideally, I'd like to produce a report or a folder with a single copy of each image. I'd also like to do things like filter out the jokes, meme images, etc.

It would also be nice for that report to show the image and list the 5 places where that image was located on the drive. FTK can do some of this using the PhotoDNA feature. File hashing will not work, because the images are not exact copies.
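Exact hashing fails on copies like these, and near-duplicate matching addresses that with a perceptual hash. The example below is not PhotoDNA (which is proprietary). It swaps in a simple difference hash (dHash) and assumes each image has already been decoded to grayscale as a list of rows of 0-255 values. Decoding JPEG/PNG needs a library such as Pillow, which is left out to keep this stdlib-only. The function names are hypothetical.

```python
def resize_nearest(pixels, width, height):
    """Shrink a grayscale image (list of equal-length rows) by nearest-neighbour sampling."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels, hash_size=8):
    """Difference hash: one bit per horizontally adjacent pixel pair in a
    (hash_size + 1) x hash_size thumbnail, set when the left pixel is brighter."""
    small = resize_nearest(pixels, hash_size + 1, hash_size)
    value = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            value = (value << 1) | (1 if left > right else 0)
    return value


def hamming(a, b):
    """Number of bits in which two hashes differ."""
    return bin(a ^ b).count("1")
```

Uniformly brightening an image (without clipping) leaves this toy hash unchanged, while the file's SHA-256 would change completely. Re-encoding, resizing or cropping can still flip a few bits, which is why near-duplicates are matched by a small Hamming distance rather than by equality.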

Is there any good program to help me do this?

Thanks.


   
(@mscotgrove)
Prominent Member
Joined: 17 years ago
Posts: 940
 

With my software, I just deduplicate the images based on hash values. This may remove a large number. Obviously this will only detect exact matches, and on most drives there are many such matches.
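Exact-match deduplication of this kind can be sketched in a few lines of Python. This is a generic illustration, not how CnW's software does it, and the extension list and function names are assumptions. Grouping paths by SHA-256 also gives the list of places each image was found, which the original poster wanted in the report.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_exact_duplicates(root, extensions=(".jpg", ".jpeg", ".png", ".gif", ".bmp")):
    """Map each content hash to every path under `root` holding exactly those bytes."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):  # case-insensitive extension check
                path = os.path.join(dirpath, name)
                groups[sha256_of(path)].append(path)
    return dict(groups)
```

Keeping one path per group (for example the first in sorted order) yields a single-copy set; the remaining paths in the group are the other locations to list in a report.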

The next stage is harder.

1) I use my software to skip known files, based on NSRL hash values. This will remove some.

2) As a rather vague stage, one can often skip small files or files without embedded dates. (Again, CnW software adds the date of the file from its internal metadata.)

If you are only interested in certain files, it should be possible to filter by date, camera type, etc.
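The filtering stages above could be sketched roughly as follows. It assumes the metadata has already been extracted into simple records, and that the known-file hashes (for example from the NSRL reference data set) have been loaded as a set of lowercase SHA-1 hex strings; parsing the NSRL distribution itself is left out. The 20,000-byte minimum size, the record fields and the function names are hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional, Set


@dataclass
class ImageRecord:
    path: str
    sha1: str                 # hex digest of the file contents
    size: int                 # file size in bytes
    exif_date: Optional[str]  # EXIF "YYYY:MM:DD HH:MM:SS", or None if absent
    camera: Optional[str]     # camera model from EXIF, or None


def parse_exif_date(value: Optional[str]) -> Optional[datetime]:
    """Return a datetime, or None for missing or malformed (e.g. zeroed) EXIF dates."""
    if not value:
        return None
    try:
        return datetime.strptime(value.strip(), "%Y:%m:%d %H:%M:%S")
    except ValueError:
        return None


def triage(records: Iterable[ImageRecord], known_sha1: Set[str], min_size: int = 20_000,
           require_date: bool = True, camera: Optional[str] = None,
           start: Optional[datetime] = None, end: Optional[datetime] = None) -> Iterator[ImageRecord]:
    """Yield only the records that survive every filtering stage, in input order."""
    for rec in records:
        if rec.sha1.lower() in known_sha1:
            continue  # stage 1: known file (e.g. listed in the NSRL)
        if rec.size < min_size:
            continue  # stage 2: small files such as icons and thumbnails
        taken = parse_exif_date(rec.exif_date)
        if taken is None and (require_date or start is not None or end is not None):
            continue  # stage 2: no usable embedded date
        if start is not None and taken < start:
            continue  # optional date range, lower bound
        if end is not None and taken > end:
            continue  # optional date range, upper bound
        if camera is not None and (rec.camera or "").lower() != camera.lower():
            continue  # optional camera model filter
        yield rec
```

Passing `require_date=False` keeps undated files, which may matter when nobody yet knows what is relevant.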


   
Adam10541
(@adam10541)
Honorable Member
Joined: 13 years ago
Posts: 550
 

Sometimes the hard way is the only way :)

If this is a CP-related job, then I would just bite the bullet and start reviewing with minimal exclusions, because you never know what will be relevant.

You may even be able to exclude some locations and concentrate only on user folders.

Depending on what your goal is, tens of thousands of images is not really that big of a job; it's when they get into the hundreds of thousands or millions that you have a hard time.

If you are just looking for particular content (CP or regular porn) then you should be able to get through 30-40k in a single day quite comfortably.

If the pics need more close examination then you are in for a long haul because no software can reliably cut out too much of the leg work here.


   
(@twjolson)
Honorable Member
Joined: 17 years ago
Posts: 417
 

That is the kind of job Netclean was designed for. It uses PhotoDNA to group images that are visually similar into a visual group. The reporting isn't as good as I'd like, but when it comes to image and video review and categorization, it is top notch.

And, best of all, it is free to law enforcement.
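PhotoDNA itself is proprietary, but the "visual group" idea can be illustrated with any perceptual hash. The sketch below assumes each image already has a 64-bit perceptual hash stored as an int (for example a dHash) and links any two images whose hashes differ in at most `max_distance` bits. The threshold of 10 and the function name are arbitrary assumptions, not values used by Netclean or PhotoDNA.

```python
from itertools import combinations


def group_near_duplicates(hashes, max_distance=10):
    """Cluster paths whose 64-bit perceptual hashes differ in at most `max_distance` bits.

    `hashes` maps a path to its integer hash. Union-find merges chains of similar
    images, so A~B and B~C put A, B and C in one group. Largest groups come first.
    """
    paths = list(hashes)
    parent = {p: p for p in paths}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps the trees shallow
            p = parent[p]
        return p

    for a, b in combinations(paths, 2):
        if bin(hashes[a] ^ hashes[b]).count("1") <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for p in paths:
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values(), key=len, reverse=True)
```

The pairwise loop is quadratic, so in pure Python it gets slow at tens of thousands of images; dedicated tools use indexing structures instead of comparing every pair.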


   
(@sam305754)
Eminent Member
Joined: 14 years ago
Posts: 44
 

Hi,

You can use VizX2 from ZiuZ or LACE from BlueBear. These programs are not free, but you can ask for a trial version. They are designed to analyze large amounts of pictures and video, and you can use hash values.
VizX2 has the ZZ40 hash, which detects ‘near duplicates’. Working closely with Microsoft, ZiuZ has integrated their PhotoDNA technology with VizX2 to create an additional means of matching. PhotoDNA is a little more tolerant of changes in an image than ZZ40.

Regards


   
PaulSanderson
(@paulsanderson)
Honorable Member
Joined: 19 years ago
Posts: 651
 

Reconnoitre has a very advanced reporting engine and can create very customisable reports (which can be saved and re-used or shared).

I have just modified an existing indecent images report template (took about 5 minutes) such that it compares images by hash and only reports the full details for the first of a series of duplicate images. Subsequent duplicate images have just summary information displayed, i.e. filename, path and VSC number (Reconnoitre automatically processes VSCs and the images within them).

A screenshot from a page of the report is shown, and the full report (in this case a PDF) can be downloaded at the link below; the PDF shows a few more of the summary features in this report.

Summary images report
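The "full details for the first image, summary for later duplicates" layout is a generic pattern, sketched below as a plain-text report in Python. It is not Reconnoitre's template language, and the record keys (`hash`, `filename`, `path`, `vsc`, `size`, `created`) are hypothetical. Here "first" simply means first after sorting by path, which is an assumption.

```python
from itertools import groupby


def vsc_label(vsc):
    """'live' for the current volume, otherwise the Volume Shadow Copy number."""
    return "live" if vsc is None else str(vsc)


def summary_report(records):
    """Full details for the first image with each hash, one summary line per later duplicate."""
    ordered = sorted(records, key=lambda r: (r["hash"], r["path"], r["filename"]))
    lines = []
    for digest, group in groupby(ordered, key=lambda r: r["hash"]):
        first, *dupes = list(group)
        lines.append(f"== {first['filename']} ({digest})")
        lines.append(
            f"   path: {first['path']}  VSC: {vsc_label(first['vsc'])}  "
            f"size: {first['size']} bytes  created: {first['created']}"
        )
        for dup in dupes:
            # Later copies get only filename, path and VSC number.
            lines.append(f"   duplicate: {dup['filename']}  {dup['path']}  VSC: {vsc_label(dup['vsc'])}")
    return "\n".join(lines)
```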


   
minime2k9
(@minime2k9)
Honorable Member
Joined: 14 years ago
Posts: 481
 

I'll second the Netclean recommendation.

If you're not LE and therefore can't get a free copy, I would say C4All is a good option just for viewing the images.

Tens of thousands of images is actually a fairly low number; the average job in our unit is 150,000ish after known hashes have been removed.


   
(@bchaseaz)
Active Member
Joined: 12 years ago
Posts: 13
Topic starter  

Thanks for all of the suggestions. I will start testing them out.
I am not LE, I'm on the defense side of this particular case.

I have a hard drive created by LE, which is made up of everything they collected. It's not a hard drive image, rather just an external drive where they dumped all of their data. So I can't filter by folders, users, access dates, etc.

The problem I have is I don't know what is relevant. It is not a CP case. Some relevant pictures may have people, others may show items, but we don't really know what we are going to find that is useful.
The attorney really wants to filter down the list of images into something more manageable, because he needs to be able to go through them.


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 


This is a bit outside "common" forensics "standards", then; it is more like re-organizing a "mixed/shuffled" archive.

BTW (and as a side note) I believe that (at least in many countries) the LE/Prosecution has to provide to the defense the actual unmodified hard disk images and not a bunch of files copied to a hard disk.

If the images have been copied maintaining date/time of the original filesystem, dividing them in folders by date would make a lot of sense.

As well, IF the images still have their EXIF data (with date/time) but not the actual filesystem timestamps, it would also be possible to make a first "big" split, setting aside those that have no EXIF data, and sort the rest into folders by date based on the EXIF data.
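The folder-by-date idea could look roughly like this. It assumes the EXIF DateTimeOriginal value has already been read out as the standard "YYYY:MM:DD HH:MM:SS" string (reading it from the file needs an EXIF library, not shown) and that a filesystem modification time, if preserved, is available as POSIX seconds. The folder names and the function name are hypothetical, and an mtime only means something if the copy really kept the original timestamps.

```python
import os
from datetime import datetime, timezone
from typing import Optional


def date_folder(exif_datetime: Optional[str], fs_mtime: Optional[float] = None) -> str:
    """Choose a relative output folder for one image.

    Prefers the EXIF DateTimeOriginal string, falls back to a preserved filesystem
    modification time (POSIX seconds, read as UTC), and otherwise returns "no_date".
    """
    if exif_datetime:
        try:
            taken = datetime.strptime(exif_datetime.strip(), "%Y:%m:%d %H:%M:%S")
            return os.path.join("exif", f"{taken:%Y}", f"{taken:%Y-%m-%d}")
        except ValueError:
            pass  # malformed or zeroed values such as "0000:00:00 00:00:00"
    if fs_mtime is not None:
        stamp = datetime.fromtimestamp(fs_mtime, tz=timezone.utc)
        return os.path.join("filesystem", f"{stamp:%Y}", f"{stamp:%Y-%m-%d}")
    return "no_date"
```

Each file could then be copied with `shutil.copy2` into `os.path.join(output_root, date_folder(...))`, where `output_root` is a hypothetical destination directory, giving the split between EXIF-dated, filesystem-dated and undated images.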

There are also tools that let you preview the images sorted by appearance/colour; as an example, this oldish one often worked for me:
http://download.chip.eu/it/ImageSorter-2.02_1756890.html

The original home page is "dead", though it can be accessed through the Wayback Machine:
https://web.archive.org/web/20100327205424/http://mmk.f4.fhtw-berlin.de/Projekte/ImageSorter

This should be the latest available version:
http://www.pixolution.de/index.php?id=18

but I believe there are other programs with a similar approach/functions, for example:
http://www.mindgems.com/products/VS-Duplicate-Image-Finder/VSDIF-About.htm
http://www.visipics.info/index.php?title=Main_Page
http://www.keronsoft.com/dupdetector.html
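As a very rough illustration of sorting by appearance, the sketch below orders images by their average colour: near-grey images first by brightness, then colourful ones by hue. This is not how ImageSorter or the other tools above work internally. It assumes the pixels are already decoded into (r, g, b) tuples, and the 0.15 saturation cut-off and the function names are hypothetical.

```python
import colorsys


def mean_rgb(pixels):
    """Average colour of an image given as a flat list of (r, g, b) tuples, 0-255."""
    n = len(pixels)
    return (
        sum(p[0] for p in pixels) / n,
        sum(p[1] for p in pixels) / n,
        sum(p[2] for p in pixels) / n,
    )


def appearance_key(pixels, min_saturation=0.15):
    """Sort key: greyish images first (by brightness), then colourful ones by hue, then brightness."""
    r, g, b = mean_rgb(pixels)
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    colourful = s >= min_saturation
    return (colourful, round(h, 3) if colourful else 0.0, v)


def sort_by_appearance(images):
    """`images` maps a path to its pixel list; similar-looking images end up next to each other."""
    return sorted(images, key=lambda path: appearance_key(images[path]))
```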

jaclaz


   
(@martjno)
Active Member
Joined: 13 years ago
Posts: 5
 

In addition to the already mentioned products by ZiuZ and BlueBear, I think Adroit Photo Forensics by Digital Assembly is also worth a look.


   