
Automated extraction of camera RAW file metadata  

mitchmcc
(@mitchmcc)
New Member

Hello all,

I am a retired software engineer who is trying to learn digital forensics. My computer experience, which goes back 38 years, encompasses a lot of system software, including many decades of Unix/Linux. I have also written a lot of Python scripts, and want to keep using Python for my own experiments in digital forensic analysis.

I just wrote a Python script that iterates over a set of files and directories and reads the EXIF data of every JPG file. If a file's EXIF contains GPS information, the script looks up the address for those coordinates (reverse geocoding) and searches for a keyword in the 'town', 'city', or 'village' fields (if present). You can see how easy it would be to extend this model to look for anything in the returned location data, down to the street name.
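For reference, EXIF stores GPSLatitude and GPSLongitude as three rationals (degrees, minutes, seconds) plus a GPSLatitudeRef/GPSLongitudeRef letter, so a script like this has to turn them into signed decimal degrees before any reverse-geocoding lookup. A minimal sketch, assuming the tag values have already been read (for example with Pillow or exiftool); the function name and sample values are hypothetical:

```python
from fractions import Fraction
from typing import Sequence, Union

# Each EXIF GPS coordinate is a (degrees, minutes, seconds) triple.
# This works for ints, floats, and Fraction-like rational values.
Rational = Union[int, float, Fraction]


def dms_to_decimal(dms: Sequence[Rational], ref: str) -> float:
    """Convert an EXIF degrees/minutes/seconds triple to signed decimal degrees."""
    degrees, minutes, seconds = (float(v) for v in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # GPSLatitudeRef "S" and GPSLongitudeRef "W" denote negative values.
    if ref.strip().upper() in ("S", "W"):
        value = -value
    return value


# Hypothetical tag values, not taken from any photo in this thread.
lat = dms_to_decimal((41, 25, Fraction(48, 1)), "N")
lon = dms_to_decimal((71, 27, 36), "W")
print(f"{lat:.5f}, {lon:.5f}")
```

Converting to float early keeps the arithmetic simple; seconds are usually stored with sub-second precision, so nothing meaningful is lost.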

The question I have is about camera raw files. I have Canon equipment: the older cameras produce .CRW files, and my newer 80D produces .CR2. When I want to edit photos, I use a program like Canon's Digital Photo Professional to edit them and convert them to JPG. Other programs, such as RawTherapee, do the same.

But my real question is what a DF analyst would do with thousands of CR2 files. Data about where they were taken could be critically important to an investigation, but converting them manually with a program like DPP4.exe would be extremely time-consuming.

I have tried to find Python libraries that can open raw files, but the only one I have found, rawpy, I cannot get to install properly.

But what do people do in the real world? Can programs like EnCase or Autopsy look inside raw files?

Thanks,

Mitch

Posted : 21/09/2020 1:43 pm
jaclaz
(@jaclaz)
Community Legend

Well, I can't speak for EnCase or Autopsy, but exiftool does read Canon RAW formats:
https://exiftool.org/

https://exiftool.org/TagNames/Canon.html

https://exiftool.org/canon_raw.html
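To handle thousands of CR2 files without converting them, exiftool can be driven from Python. A minimal sketch, assuming exiftool is installed and on the PATH; it uses only standard exiftool options (-json, -n, -r, -ext), and the function names and directory are hypothetical. It can only recover GPS data if the camera (or an attached GPS unit) actually recorded it in the file.

```python
import json
import subprocess
from pathlib import Path
from typing import Dict, List, Tuple

# Request the raw EXIF GPS tags; with -n, coordinates come back as numbers
# and the *Ref tags as single letters ("N", "S", "E", "W").
GPS_TAGS = ["-EXIF:GPSLatitude", "-EXIF:GPSLatitudeRef",
            "-EXIF:GPSLongitude", "-EXIF:GPSLongitudeRef"]


def run_exiftool(root: Path, extensions=("CR2", "CRW", "JPG")) -> List[dict]:
    """Recursively dump GPS tags for matching files as parsed JSON records."""
    cmd = ["exiftool", "-json", "-n", "-r"]
    for ext in extensions:
        cmd += ["-ext", ext]
    cmd += GPS_TAGS + [str(root)]
    # Don't raise on a non-zero exit (e.g. one unreadable file);
    # parse whatever JSON exiftool produced.
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    return json.loads(out) if out.strip() else []


def parse_records(records: List[dict]) -> Dict[str, Tuple[float, float]]:
    """Map each SourceFile to signed (lat, lon); files without GPS are skipped."""
    result = {}
    for rec in records:
        try:
            lat = abs(float(rec["GPSLatitude"]))
            lon = abs(float(rec["GPSLongitude"]))
        except (KeyError, TypeError, ValueError):
            continue  # no usable GPS fix recorded in this file
        # Assumption: a missing Ref tag is treated as N/E.
        if str(rec.get("GPSLatitudeRef", "N")).upper().startswith("S"):
            lat = -lat
        if str(rec.get("GPSLongitudeRef", "E")).upper().startswith("W"):
            lon = -lon
        result[rec["SourceFile"]] = (lat, lon)
    return result
```

Usage would be along the lines of `parse_records(run_exiftool(Path("/cases/photos")))`, with the placeholder path replaced by the evidence directory. Keeping the parsing separate from the subprocess call makes it easy to test against saved exiftool JSON.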

jaclaz

Posted : 22/09/2020 7:43 am
keydet89
(@keydet89)
Community Legend

I've been in private-sector DFIR for over 20 years, most of that as a consultant. While I've had to iterate over a lot of stuff, finding locations from digital picture files is not something I've ever had to do.

However, I can see how this would be extremely valuable to those who work in areas of DFIR where image and video files with illicit content are of prime concern.

Posted : 22/09/2020 1:17 pm
mitchmcc
(@mitchmcc)
New Member

@jaclaz

Thanks for this information. As I am not in LE, I have to rely on open-source tools. I will look at exiftool.

Posted : 22/09/2020 2:32 pm
mitchmcc
(@mitchmcc)
New Member

@keydet89

Well, getting it out of regular JPG files (when it is present) was pretty easy using Python.

Here is the output from my program:

 

Address data for photo: amp2.jpg
Address: 87 Captain Freebody Road
City: Narragansett
County: Washington County
State: Rhode Island
Country: United States of America

Address data for photo: amp3.jpg
Address: 26 Major Arnold Road
City: Narragansett
County: Washington County
State: Rhode Island
Country: United States of America

Address data for photo: bobbit_and_connor.jpg
Address: NC 73
County: Cabarrus County
State: North Carolina
Country: United States of America

Address data for photo: cooper.jpg
Address: Gold Hill Road
County: Stanly County
State: North Carolina
Country: United States of America

Address data for photo: Dell Monitor Kybd and Mouse.jpg
Address: 17 Major Arnold Road
City: Narragansett
County: Washington County
State: Rhode Island
Country: United States of America

It seems like the latitude and longitude data is never quite exact, at least down to the house address. Also, my program can look for a city/town/village keyword and report only the photos that match.
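The keyword filter described here, including the fallback to the county that the output above shows for photos with no city, could look roughly like this. A minimal sketch, assuming the reverse geocoder returns an address dictionary with OpenStreetMap/Nominatim-style keys ("city", "town", "village", "county"); the key names and helper functions are assumptions, not the actual script:

```python
from typing import List, Mapping, Optional

# Assumed key names, modelled on Nominatim-style reverse-geocoding results;
# adjust to whatever your geocoder actually returns.
LOCALITY_KEYS = ("city", "town", "village")


def locality(address: Mapping[str, str]) -> Optional[str]:
    """Return the first locality field present, falling back to the county."""
    for key in LOCALITY_KEYS:
        if address.get(key):
            return address[key]
    return address.get("county")


def matching_photos(results: Mapping[str, Mapping[str, str]], keyword: str) -> List[str]:
    """Names of photos whose locality (or county fallback) contains the keyword."""
    kw = keyword.casefold()
    matches = []
    for name, address in results.items():
        loc = locality(address)
        # Case-insensitive substring match on the chosen locality name.
        if loc is not None and kw in loc.casefold():
            matches.append(name)
    return matches
```

For example, `matching_photos(results, "Narragansett")` would keep amp2.jpg from the output above, while cooper.jpg (county only) would be matched against "Stanly County".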

Mitch

Posted : 22/09/2020 2:37 pm
Rich2005
(@rich2005)
Senior Member
Posted by: @mitchmcc

It seems like the latitude and longitude data is never quite exact, at least to the house address.  Also, my program can look for a city/town/village keyword, and only report on the ones that match.

GPS isn't perfect, so I wouldn't expect pin-sharp accuracy; a whole raft of factors can improve or degrade it. Not being an expert in that, I'd treat it with a big pinch of salt. If it were critical to a case, I'd report the information with a big caveat: if accuracy in terms of metres or tens of metres is crucial, it should be assessed by someone with substantial GPS expertise.

Posted : 23/09/2020 12:20 pm
jaclaz
(@jaclaz)
Community Legend
Posted by: @mitchmcc

Also, my program can look for a city/town/village keyword, and only report on the ones that match.

Personally, I wouldn't rely on city/town/village keywords, since there are homonyms and possible collisions. It would make more sense (to me) to provide a "center" (in GPS coordinates) and a "radius" (or freely draw a shape on a map[1]) and extract the relevant images whose coordinates fall within that area.

jaclaz

[1] as is done on some real estate websites
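The center-plus-radius and drawn-shape filters can both be sketched in a few lines. This is a minimal sketch using two generic techniques, not code from any tool mentioned in the thread: the haversine formula (spherical Earth, errors of roughly 0.5% at most) for the radius test, and a standard ray-casting test for the shape. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_M = 6_371_000  # mean Earth radius for a spherical model


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def within_radius(point: LatLon, center: LatLon, radius_m: float) -> bool:
    """True if the point lies within radius_m metres of the center."""
    return haversine_m(point, center) <= radius_m


def in_polygon(point: LatLon, polygon: Sequence[LatLon]) -> bool:
    """Ray-casting point-in-polygon test.

    Treats lat/lon as planar coordinates, which is reasonable for small areas
    that do not cross the antimeridian or a pole.
    """
    lat, lon = point
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Does this edge straddle the point's latitude?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude where the edge crosses that latitude.
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside  # each crossing to the east flips the state
    return inside
```

For example, `within_radius(photo_coords, (51.575, -0.1257), 500)` would select photos within 500 m of the coordinate discussed below, regardless of which borough name a geocoder attaches to it.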

Posted : 23/09/2020 1:50 pm
mitchmcc
(@mitchmcc)
New Member

@jaclaz

I am not sure what you mean by "collisions". If a given GPS lat/long returns a town/city/village, I don't see how accepting it could be misleading. The whole point of my program was not to reach even the precision that GPS offers, but to identify a wider area, similar to your "radius" idea. If none of those three is returned (as I saw in my testing), I report the 'county'.

In any case, this has been a fun exercise in thinking about ways to use Python in DF, which I know is a big topic covered by many books and websites.

Mitch

Posted : 23/09/2020 8:00 pm
jaclaz
(@jaclaz)
Community Legend
Posted by: @mitchmcc

@jaclaz

I am not sure what you mean by "collisions". 

The data you get from the file is GPS coordinates, which you *somehow* translate into a location name.

Location names:
1) are far from "unique"
2) may NOT "properly" represent an area the investigator might be interested in, due to city/county/state/whatever borders, commonly used names, etc.

Example 1:

https://en.wikipedia.org/wiki/Paris_(disambiguation)

Example 2:

https://en.wikipedia.org/wiki/London_Borough_of_Islington

https://www.google.com/maps/place/London+Borough+of+Islington,+London,+UK/@51.5371971,-0.148157,13z/

https://en.wikipedia.org/wiki/London_Borough_of_Haringey

https://www.google.com/maps/place/London+Borough+of+Haringey,+London,+UK/@51.587973,-0.1413858,13z/

For "51.575000, -0.125700", does your tool return:

a. Islington
b. London
c. Haringey
d. *something else*

?

Or, if you prefer, does a location like, say, Lost Springs:
https://en.wikipedia.org/wiki/Lost_Springs,_Wyoming

have the same "resolution" as, again say, Chicago?

jaclaz

Posted : 24/09/2020 12:22 pm