Searching the internet for images by metadata

General (Technical, Procedural, Software, Hardware etc.)

Last Post by gmarshall139 17 years ago

9 Posts

6 Users

0 Reactions

3,552 Views

gmarshall139

(@gmarshall139)

Reputable Member

Joined: 22 years ago

Posts: 378

Topic starter 31/08/2009 10:08 pm [#4462]

The goal is to find images posted to the internet from a specific camera. So if the camera's serial number (or some other unique identifier) is contained within an image's metadata, could it be searched for to locate other pictures created by the same device?

How about creating a web crawler to do the same?

keydet89

(@keydet89)

Famed Member

Joined: 22 years ago

Posts: 3568

01/09/2009 12:55 am

If you can extract EXIF data from a JPG on an analysis system, I don't see why you couldn't do the same thing via this sort of functionality.

Good luck.
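As a rough illustration of keydet89's point, here is a minimal Python sketch (standard library only) that pulls a few identifying fields out of a JPEG's EXIF block. It assumes a well-formed JPEG and only decodes ASCII strings and single LONG values from IFD0 and the Exif sub-IFD. The BodySerialNumber tag (0xA431) was only standardized in EXIF 2.3; many cameras store their serial number in a vendor-specific MakerNote instead, which this sketch does not parse. The names `read_exif` and `_read_ifd` are illustrative, not from any library.

```python
import struct

# Tag IDs from the EXIF/TIFF specification that help identify a camera.
TAG_NAMES = {
    0x010F: "Make",
    0x0110: "Model",
    0xA431: "BodySerialNumber",  # EXIF 2.3; older cameras often use MakerNote
}
EXIF_IFD_POINTER = 0x8769  # IFD0 entry pointing at the Exif sub-IFD
ASCII, LONG = 2, 4         # TIFF field types decoded by this sketch


def _read_ifd(tiff, offset, endian):
    """Return {tag: value} for one IFD; only ASCII and single LONG values."""
    entries = {}
    (count,) = struct.unpack_from(endian + "H", tiff, offset)
    for i in range(count):
        tag, typ, n, raw = struct.unpack_from(
            endian + "HHI4s", tiff, offset + 2 + 12 * i)
        if typ == ASCII:
            # Strings longer than 4 bytes are stored elsewhere; raw is an offset.
            if n <= 4:
                data = raw[:n]
            else:
                (ptr,) = struct.unpack(endian + "I", raw)
                data = tiff[ptr:ptr + n]
            entries[tag] = data.rstrip(b"\x00").decode("ascii", "replace")
        elif typ == LONG and n == 1:
            (entries[tag],) = struct.unpack(endian + "I", raw)
    return entries


def read_exif(jpeg: bytes) -> dict:
    """Extract a few identifying EXIF fields from JPEG bytes; {} if none."""
    if jpeg[:2] != b"\xff\xd8":  # SOI marker missing: not a JPEG
        return {}
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: metadata segments come before this
            break
        (length,) = struct.unpack_from(">H", jpeg, pos + 2)
        segment = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and segment[:6] == b"Exif\x00\x00":
            tiff = segment[6:]
            endian = "<" if tiff[:2] == b"II" else ">"
            (ifd0,) = struct.unpack_from(endian + "I", tiff, 4)
            tags = _read_ifd(tiff, ifd0, endian)
            if EXIF_IFD_POINTER in tags:
                tags.update(_read_ifd(tiff, tags.pop(EXIF_IFD_POINTER), endian))
            return {TAG_NAMES[t]: v for t, v in tags.items() if t in TAG_NAMES}
        pos += 2 + length
    return {}
```

Comparing the returned `BodySerialNumber` (or `Make`/`Model`) against the value from a known image is then a simple dictionary lookup.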

watcher

(@watcher)

Estimable Member

Joined: 20 years ago

Posts: 125

01/09/2009 1:26 am

gmarshall139 wrote: "The goal is to find images posted to the internet from a specific camera. …"

A crawler that extracts and compares EXIF data is certainly technically possible; in fact, it wouldn't be that hard. That said, there are some practical issues to consider.

Size and time alone would require some criteria to narrow the focus. I suspect that new pictures are added to the web faster than you could crawl them.

Additionally, most images on the web outside of photo sites tend to be small, low-resolution images. Generally, external editing/resizing software does not keep the EXIF data in the new, smaller images. Unless the image was taken directly by the camera in question and posted unedited, there is a good chance the EXIF data is gone.
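Since many downloaded images will carry no metadata at all, a crawler could cheaply discard those before doing any parsing. The minimal sketch below (the helper name `has_exif` is hypothetical) walks the JPEG segment headers and reports whether an APP1 segment beginning with the `Exif\0\0` identifier appears before the image data starts. It assumes well-formed JPEG input and does not look for XMP or other metadata containers.

```python
def has_exif(jpeg: bytes) -> bool:
    """True if the JPEG still carries an APP1 'Exif' segment."""
    if jpeg[:2] != b"\xff\xd8":  # not a JPEG (no SOI marker)
        return False
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: metadata segments come before this
            return False
        # Segment length is big-endian and includes its own two bytes.
        length = int.from_bytes(jpeg[pos + 2:pos + 4], "big")
        if marker == 0xE1 and jpeg[pos + 4:pos + 10] == b"Exif\x00\x00":
            return True
        pos += 2 + length
    return False
```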

wmpwi

(@wmpwi)

New Member

Joined: 17 years ago

Posts: 1

01/09/2009 5:02 am

gmarshall139 wrote: "The goal is to find images posted to the internet from a specific camera. So if the camera's serial number (or some other unique identifier) is contained within an image's metadata, could it be searched for to locate other pictures created by the same device? How about creating a web crawler to do the same?"

You just happened to hit really close to a hot button of mine right now. I've been doing a lot of research on how I might be able to use EXIF data in our investigations, and we've already had enough luck to keep me encouraged. The crawl is an interesting idea, but I see some wisdom in what watcher said.

If one can design the crawler, then it shouldn't be a big leap to focus the crawl on selected websites. Anyone prolific enough to catch a manufacturing case and careless enough to leave in the metadata may also frequent Picasa, Flickr, or the like. I've pulled EXIF data from those. Now just figure out how to crawl or scrape their data and I'll sign on. I'll be watching for updates.
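Focusing the crawl on selected sites amounts to filtering the crawl frontier by host. Below is a minimal breadth-first sketch, assuming the caller supplies a `get_links(url)` function that fetches a page and returns its hrefs (left abstract so the example runs offline). `focused_crawl`, `allowed_hosts`, and `max_pages` are illustrative names, and the host names in the test are placeholders.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def focused_crawl(seeds, allowed_hosts, get_links, max_pages=100):
    """Breadth-first crawl that only follows links on allowed hosts.

    get_links(url) -> iterable of href strings (caller-supplied, hypothetical).
    Returns the URLs visited, in order.
    """
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in get_links(url):
            link = urljoin(url, href)  # resolve relative links
            host = urlparse(link).hostname or ""
            # Accept an allowed host or any of its subdomains.
            on_target = any(host == h or host.endswith("." + h)
                            for h in allowed_hosts)
            if on_target and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The `max_pages` cap is one crude way to keep a crawl bounded, given that the web grows faster than it can be crawled.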

gmarshall139

(@gmarshall139)

Reputable Member

Joined: 22 years ago

Posts: 378

Topic starter 01/09/2009 6:56 pm

watcher wrote: "Additionally, most images on the web outside of photo sites tend to be small low resolution images. Generally, external editing/resizing software does not keep the EXIF data in the new smaller images. Unless the image was taken directly by the camera in question and posted unedited, there is a good chance the EXIF data is gone."

I agree that these two points are the major limitations. Any good resources for programming web crawlers?
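For anyone writing their own crawler rather than adapting an existing one, the core step is pulling link and image URLs out of each fetched page. A minimal sketch using Python's built-in `html.parser.HTMLParser`; the class and function names are illustrative, and fetching the HTML (for example with `urllib.request.urlopen`) is left out so the example stays self-contained.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndImageParser(HTMLParser):
    """Collects <a href> links and <img src> URLs, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def extract(html, base_url):
    """Return (page links, image URLs) found in an HTML string."""
    parser = LinkAndImageParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.images
```

The page links feed the crawl frontier, while the image URLs are the candidates to download and check for metadata.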

gmarshall139

(@gmarshall139)

Reputable Member

Joined: 22 years ago

Posts: 378

Topic starter 01/09/2009 7:11 pm

I found a modular crawler to play around with here:

http://www.cs.cmu.edu/~rcm/websphinx/

kovar

(@kovar)

Prominent Member

Joined: 19 years ago

Posts: 805

02/09/2009 3:10 am

Greetings,

Be aware that if you crawl large volumes of data from a professionally run site, they may shut you down. This often happens when imaging web sites without a throttle set on your application. The site will detect excessive bandwidth demands from your IP and either shut you off completely or throttle you way back. wget and other similar tools have settings to limit the bandwidth demands.

Crawling a photo site would likely trigger similar responses.

-David
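A home-made crawler can follow this throttling advice by enforcing a minimum delay between requests to the same host, similar in spirit to wget's `--wait` option (wget's `--limit-rate` caps bandwidth instead). A minimal sketch; the `Throttle` class and its parameters are hypothetical, and the clock and sleep functions are injectable so the behaviour can be tested without actually waiting.

```python
import time


class Throttle:
    """Enforces a minimum delay between successive requests to the same host."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests per host
        self.clock = clock
        self.sleep = sleep
        self.last_request = {}  # host -> clock() value of the previous request

    def wait(self, host):
        """Block until enough time has passed since the last request to host."""
        now = self.clock()
        previous = self.last_request.get(host)
        if previous is not None:
            remaining = previous + self.min_interval - now
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_request[host] = now
```

Call `throttle.wait(host)` immediately before each request to that host; requests to different hosts are not delayed by each other.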

seanmcl

(@seanmcl)

Honorable Member

Joined: 20 years ago

Posts: 700

02/09/2009 3:45 am

Look at

www.rfxn.com/projects

I have installed APF and BFD on a couple of client sites, and they do exactly what you describe: they look for an unusual number of connection attempts within a short period of time and, if they see them, create an iptables rule to block your address.

Of course, this is Linux based, but most of the contraband that you see out there would likely be on Linux hosted servers.
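The behaviour described here (counting connection attempts per address within a short time window and blocking addresses that exceed a limit) can be sketched with a sliding window of timestamps. This only illustrates the general technique and is not how APF or BFD are actually implemented; `ConnectionRateMonitor`, `max_hits`, and `window` are hypothetical names, and where a real tool would add an iptables rule, this sketch just records the address in a set.

```python
from collections import defaultdict, deque


class ConnectionRateMonitor:
    """Flags an IP once it exceeds max_hits attempts within `window` seconds."""

    def __init__(self, max_hits, window):
        self.max_hits = max_hits
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent attempts
        self.blocked = set()            # stand-in for a firewall block rule

    def record(self, ip, timestamp):
        """Log one attempt; return True if the IP is (now) blocked."""
        q = self.hits[ip]
        q.append(timestamp)
        # Drop attempts that have fallen out of the sliding window.
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        if len(q) > self.max_hits:
            self.blocked.add(ip)
        return ip in self.blocked
```

From the crawler's side, the same arithmetic explains why throttling helps: spacing requests more than `window / max_hits` seconds apart keeps the count under the limit.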

gmarshall139

(@gmarshall139)

Reputable Member

Joined: 22 years ago

Posts: 378

Topic starter 02/09/2009 7:11 am

I've found it pretty easy to pull down images off of web sites with a crawler. So I can now process them in the normal way with a metadata parser or even a simple keyword search since I am looking for one specific string. Not an elegant or efficient solution though.

So far I haven't been blocked, but I'm just running some tests and not really pushing it. It's also not a suitable technique for child pornography investigations, for obvious reasons. But that's not what I need now anyway.
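The simple keyword search over downloaded images could look something like the following minimal sketch, which scans the raw bytes of each JPEG under a directory for the target string. It assumes the identifier is stored as ASCII text: a serial number kept as a binary value (for example inside a MakerNote) would be missed, and an unrelated byte sequence could produce a false positive. `files_containing` is a hypothetical name.

```python
from pathlib import Path


def files_containing(directory, needle, suffixes=(".jpg", ".jpeg")):
    """Return JPEG paths under directory whose raw bytes contain needle."""
    target = needle.encode("ascii")
    matches = []
    for path in sorted(Path(directory).rglob("*")):
        # Compare suffixes case-insensitively so DSC0001.JPG is included.
        if path.is_file() and path.suffix.lower() in suffixes:
            if target in path.read_bytes():
                matches.append(path)
    return matches
```

Any hits can then be confirmed with a proper metadata parser, since a raw byte match does not say which field the string came from.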
