This is a typical black swan statement: I've never seen it, therefore it does not exist. And if you're not looking for them, you'll not find them either.
This statement is true if applied only to data carving. However, it is not true when doing signature testing on good/deleted files on a working disk.
If I do a (non-carving) file recovery and a jpg/jpeg file does not have a valid signature, I check it. It is in this mode that I have not come across other values - though it is not a perfect process, and I may have missed some.
Another area to be aware of is where, say, jpgs have been renamed as .dat files in an attempt to hide them. If they were non-0xe0/0xe1 files, the hiding might work, as signature recognition would not detect them as jpgs.
This statement is true if applied only to data carving. However, it is not true when doing signature testing on good/deleted files on a working disk.
It applies to your use case as well. How do you know your sample set is representative?
If you have a disk with all JPEGs from the same application, it will likely yield the same results.
As long as all swans are white, it will reinforce your thinking model.
I've done a lot of file format analysis; take it from me, if there can be an edge case, sooner or later there will be a sample that confirms it.
jaclaz, interesting finds. I'll take the time to read them a bit more carefully. FYI I've pinged C. Grenier on the matter; let's see if he agrees.
I wrote a JPEG raw recovery program a long, long time ago (likely 15 years ago) and, if I remember well, to filter out most of the corrupted images I kept reading chunk after chunk until I found unexpected data at the end of a chunk, in which case I marked the JPEG as corrupted (so no stream type checking whatsoever).
If the two bytes after the last chunk were FF D9 I considered the file as good and saved it.
I used only the first two bytes to recognize the header and all seemed to work alright (I still had to do additional cleanup on the recovered files with some command line tools though).
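In today's terms, a minimal sketch of that approach (a reconstruction of the idea described above, not the original program; the function name is made up) could look like this - walk the marker segments after SOI and accept the file only if the stream ends with the EOI marker FF D9:

import struct

SOI = b"\xff\xd8"
EOI = b"\xff\xd9"
SOS = 0xDA  # start of scan: entropy-coded data follows

def looks_like_good_jpeg(data: bytes) -> bool:
    # Rough sketch of the chunk-by-chunk check described above.
    if not data.startswith(SOI):
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return False  # unexpected data where a marker should be: treat as corrupted
        marker = data[pos + 1]
        if marker == SOS:
            # entropy-coded data follows; accept only if the file ends with EOI
            return data[-2:] == EOI
        # other segments carry a 2-byte big-endian length that includes itself
        # (standalone markers such as RSTn would need special-casing in a fuller parser)
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + seg_len
    return False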
Another area to be aware of is where, say, jpgs have been renamed as .dat files in an attempt to hide them. If they were non-0xe0/0xe1 files, the hiding might work, as signature recognition would not detect them as jpgs.
Allow me to disagree. 😯
If you pass a ".dat" file through (say) the mentioned TrID, it will give you a "correct" identification as ".jpg" with 75% confidence, actually higher than for the "base" E0 file:
TrID/32 - File Identifier v2.10 - (C) 2003-11 By M.Pontello
Definitions found 5387
Analyzing...
File .\DATs\Base_hexE0.dat
50.0% (.JPG) JFIF JPEG Bitmap (4003/3)
File .\DATs\modded0xDE.dat
75.0% (.JPG) JPEG Bitmap (3000/1)
File .\DATs\modded0xFF.dat
75.0% (.JPG) JPEG Bitmap (3000/1)
Then you try viewing it with a "common" viewer (still to remain within the tested tools, MS Photo Editor), BUT if it is one of those values that the tool does not display (let's say DE), what will you do?
Test it with jpegsnoop anyway?
Or vice versa, you test it with a "dedicated" program (again for the sake of the example, jpegsnoop), BUT the file has value FF?
JPEGsnoop 1.7.2 by Calvin Hass
-------------------------------------
Filename [D\partitionview\vidma04\Photorec_test\256_jpegs\DATs\Base_hexE0.dat]
Filesize [36316] Bytes
Start Offset 0x00000000
Marker SOI (xFFD8)
OFFSET 0x00000000
Marker APP0 (xFFE0)
OFFSET 0x00000002
Length = 16
Identifier = [JFIF]
version = [1.1]
density = 72 x 72 DPI (dots per inch)
thumbnail = 0 x 0
Marker DQT (xFFDB)
Define a Quantization Table.
OFFSET 0x00000014
Table length = 67
....
JPEGsnoop 1.7.2 by Calvin Hass
-------------------------------------
Filename [D\partitionview\vidma04\Photorec_test\256_jpegs\DATs\modded0xDE.dat]
Filesize [36316] Bytes
Start Offset 0x00000000
Marker SOI (xFFD8)
OFFSET 0x00000000
OFFSET 0x00000002
Header length = 16
Skipping unsupported marker
Marker DQT (xFFDB)
Define a Quantization Table.
OFFSET 0x00000014
Table length = 67
....
JPEGsnoop 1.7.2 by Calvin Hass
-------------------------------------
Filename [D\partitionview\vidma04\Photorec_test\256_jpegs\DATs\modded0xFF.dat]
Filesize [36316] Bytes
Start Offset 0x00000000
Marker SOI (xFFD8)
OFFSET 0x00000000
Skipped 1 marker pad bytes
OFFSET 0x00000003
WARNING Unknown marker [0xFF00], stopping decode
Use [Img Search Fwd/Rev] to locate other valid embedded JPEGs
@joachimm
Good, I made a post on jpegsnoop's page to make Calvin Hass also aware of the matter.
jaclaz
I think initially one has to work on what is most likely. If 99.5% of jpegs are standard*, then they will be picked up with any extension.
If the files are among the 0.5% non-standard, then there should be a warning, because files called .jpg do not open, validate, or match a signature.
At this point, I would dive into a hex editor and see what is what.
With pressure on the speed of forensic examinations, is 99.5% acceptable, or do we need 99.999%? Hence my question: how common are non-e0/e1 files?
A disk under investigation is very likely to have many files of the same type. Thus, if non-e0/e1 files are present, they should be spotted.
*percentage figure is my guess - am I a long way off?
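A minimal sketch of that warning idea (my own illustration, assuming only the common two-value e0/e1 signature discussed in this thread; the function and variable names are made up):

import os

COMMON_JPEG_HEADERS = (b"\xff\xd8\xff\xe0", b"\xff\xd8\xff\xe1")

def triage(folder):
    # Warn when the extension and the common JPEG signature disagree.
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            head = f.read(4)
        jpg_name = name.lower().endswith((".jpg", ".jpeg"))
        jpg_sig = head in COMMON_JPEG_HEADERS
        if jpg_name and not jpg_sig:
            print(f"WARNING: {name} is named like a JPEG but lacks the e0/e1 signature")
        elif jpg_sig and not jpg_name:
            print(f"NOTE: {name} carries a JPEG signature under a different extension")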
I think initially one has to work on what is most likely. If 99.5% of jpegs are standard*, then they will be picked up with any extension.
If the files are among the 0.5% non-standard, then there should be a warning, because files called .jpg do not open, validate, or match a signature.
What is standard? Most occurring in your sample set? Who defines the sample set? Do you include all the cameras in your sample set? What if a very exotic camera is relevant?
With pressure on the speed of forensic examinations, is 99.5% acceptable, or do we need 99.999%? Hence my question: how common are non-e0/e1 files?
Common or uncommon is relative. And what relevance does it have? If your tool catches the edge cases as well, that's a nice addition, isn't it?
The relevance to your case will depend. In most cases where the graphical content of a JPEG is relevant, maybe even a 50% success rate is sufficient. In case you're looking for that one specially crafted JPEG file used to compromise your image web service, you very likely want 99.999%.
But if you don't know what you're looking for and don't know how relevant it is, how can you justify not looking for it?
(no need to answer, this is a philosophical question 😉 )
With pressure on the speed of forensic examinations, is 99.5% acceptable, or do we need 99.999%? Hence my question: how common are non-e0/e1 files?
I understand your point of view, but I see it from a different standpoint.
Is it "better" to cover 99.51% than 99.50% of possibilities?
If it is, how much does this increase cost (in terms of *whatever*, be it time, money, or images of little furry creatures used for tests)?
If nothing or next to nothing, why not do it?
Also, is digital forensics - and specifically digital forensics on criminal cases - the ONLY field where correctly identifying a JPEG and displaying it matters?
Wouldn't data recovery, malware analysis, and more or less a larger set of other computer-related fields take advantage of knowing about these peculiarities?
More generally, I guess we succeeded in at least moving the item from the group of the unknown unknowns 😯 to that of the known unknowns 😉
jaclaz
I thought I'd chime in with a few thoughts based on my experience developing JPEGsnoop and recovering damaged photos.
Attempting to identify "JPEG" images from a file header signature will run into a number of issues as has already been raised in this thread.
It's important to note that there are really three general standards at play with "JPEG" images:
- JPEG (ITU-T.81)
- JFIF
- EXIF
Of these, the JPEG format is the most permissive, being the superset. JFIF and EXIF leverage "JPEG" but reduce the flexibility to a subset (e.g. to address interoperability). Please note that the following is just a quick list of some considerations with respect to file carving - I am certainly no expert and therefore it is by no means exhaustive. Plus it is quite likely that I've made an error in overlooking a detail )
Some of the issues we could run into:
- 1) It is possible to encode "viewable" JPEG images meeting the spec of JPEG and file header requirements of JFIF but not EXIF
- 2) It is possible to encode "viewable" JPEG images meeting the spec of JPEG and file header requirements of EXIF but not JFIF
- 3) It is possible to encode "viewable" JPEG images meeting the spec of JPEG without meeting either JFIF or EXIF requirements.
- 4) It is possible to encode "viewable" JPEG images not meeting the spec of JPEG and similarly not meeting either JFIF or EXIF requirements.
By the "spec" of JPEG, I mean the use of valid marker encapsulation, valid marker values and a sequencing that meets the "Flow of Compressed Data syntax" and "Flow of Marker segments".
When decoding a file that deviates from the "spec" (#4), it is up to the implementation to decide how resilient or sensitive it may be. Some invalid markers could be ignored, provided that the necessary tables (ie. other markers) are present before we get to the scan data. That is why JPEGsnoop and Windows may give different results.
By my read of the specifications, we have a few different "valid" file headers for #1, #2 & #3:
- 1) JFIF 0xFFD8, 0xFFE0
- 2) EXIF 0xFFD8, 0xFFE1
- 3) JPEG 0xFFD8, {0xFFDB, 0xFFC4, 0xFFCC, 0xFFDD, 0xFFFE, 0xFFE0..0xFFEF, 0xFFC0..0xFFC7, 0xFFC9..0xFFCF, 0xFFDE, 0xFFD9}
Note that the file header list for #3 above may be overly permissive (I'd need to double-check that there are not further restrictions documented elsewhere).
So, if we are data carving to identify all possible images that fall under #1,#2,#3 then we could theoretically include all the values above. Although these may all be valid markers per the flow, some of them are probably unrealistic/incorrect to have at the start of the file.
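As a rough illustration (my own sketch, not code from any of the tools discussed, and the function name is made up), the permissive header test for #1/#2/#3 could be expressed as a set of allowed second markers after SOI:

JFIF_SECOND = {0xE0}
EXIF_SECOND = {0xE1}
JPEG_SECOND = ({0xDB, 0xC4, 0xCC, 0xDD, 0xFE, 0xDE, 0xD9}
               | set(range(0xE0, 0xF0))   # APP0..APP15
               | set(range(0xC0, 0xC8))   # SOF0..SOF7
               | set(range(0xC9, 0xD0)))  # SOF9..SOF15 (incl. DAC)

def classify_header(data: bytes) -> str:
    # Coarse classification based only on the first four bytes.
    if len(data) < 4 or data[0:2] != b"\xff\xd8" or data[2] != 0xFF:
        return "not a JPEG header"
    second = data[3]
    if second in JFIF_SECOND:
        return "JFIF-style header (FFD8 FFE0)"
    if second in EXIF_SECOND:
        return "EXIF-style header (FFD8 FFE1)"
    if second in JPEG_SECOND:
        return "plain JPEG (ITU-T.81) header"
    return "FFD8 followed by an unexpected marker"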
Given that non-image files could also have random data that aliases with the above marker values (e.g. if the parsing occurred mid-file because of file fragmentation prior to recovery), we should also consider cross-checking other encapsulation/format/sequencing checks in the file after validating the file header. I believe this was also mentioned earlier in the thread. In other words, one could have a permissive header check, but then apply further tests for validity to weed out the false positives. I could follow up in another post with more details/ideas.
Conversely, if you want a data carving utility to detect images that have been encoded improperly or corrupted (i.e. #4), one can't rely on the subset of valid markers alone. Instead, one would need to look for further indicators or heuristics. For example, one could look for the scan segment (0xFFDA) and then look for stuff bytes (0xFF00), or one of many other methods.
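For instance, a purely illustrative (hypothetical) version of that stuff-byte heuristic: once the SOS marker is found, any FF byte inside the scan data should only be followed by 00 (stuffing), a restart marker D0-D7, or the EOI D9; anything else is a strong hint of corruption or of a false positive:

def scan_data_plausible(data: bytes) -> bool:
    # Heuristic sketch only: a single-scan baseline JPEG is assumed;
    # progressive images with multiple SOS segments would need a fuller parser.
    sos = data.find(b"\xff\xda")
    if sos < 0:
        return False
    seg_len = int.from_bytes(data[sos + 2:sos + 4], "big")  # SOS header length
    pos = sos + 2 + seg_len
    while pos < len(data) - 1:
        if data[pos] == 0xFF:
            nxt = data[pos + 1]
            if nxt == 0xD9:                      # EOI reached: looks plausible
                return True
            if nxt != 0x00 and not (0xD0 <= nxt <= 0xD7):
                return False                     # unexpected marker inside scan data
            pos += 2
        else:
            pos += 1
    return False                                 # ran out of data before EOI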
If it's helpful, I could try performing a search across my database of images from the web (100k+) and see what file headers were actually used.
Calvin
Calvin, I would love to know if my wild guess that 99.5% of images are 0xe0/0xe1 is actually anywhere near correct. Your suggested scan might help here.
One type of file I skip when carving is jpeg headers within an AVI file; these have the string "AVI" starting in the 6th byte (after an e0 or e1). This was probably determined by false positives rather than detailed analysis of the spec!
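For what it's worth, a tiny sketch of that filter (hypothetical, with offsets following the description above; the function name is made up):

def is_avi_embedded_jpeg(header: bytes) -> bool:
    # Skip MJPEG frames carved out of AVI files: SOI, APP0/APP1, then "AVI".
    return (header[0:2] == b"\xff\xd8"
            and header[2] == 0xFF
            and header[3] in (0xE0, 0xE1)
            and header[6:9] == b"AVI")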
I thought I'd chime in with a few thoughts based on my experience developing JPEGsnoop and recovering damaged photos.
Which I am pretty sure will be useful/interesting; happy you joined, Calvin. )
By my read of the specifications, we have a few different "valid" file headers for #1, #2 & #3:
- 1) JFIF 0xFFD8, 0xFFE0
- 2) EXIF 0xFFD8, 0xFFE1
- 3) JPEG 0xFFD8, {0xFFDB, 0xFFC4, 0xFFCC, 0xFFDD, 0xFFFE, 0xFFE0..0xFFEF, 0xFFC0..0xFFC7, 0xFFC9..0xFFCF, 0xFFDE, 0xFFD9}
Very good, and definitely a non-trifling widening from the two-value set E0/E1 and from the four-value set E0/E1/EC/FE.
(I believe there are some repeated values in #3)
Your list of "valid according to specifications" is similar to the one joachimm posted:
application segment "\xff[\xe3-\xef]"
Table segments
"\xff\xc4" # Define Huffmann table (DHT)
"\xff\xcc" # Arithmetic coding condition table (DAC)
"\xff\xdb" # Define quantization table (DQT)Reserved segments
"\xff\xc8" # Start of Frame (JPG) (Reserved for JPEG extensions)
"\xff[\xf0-\xfd]" # Reserved for JPEG extensions
"\xff\xfe" # Comment (COM)
"\xff[\x02-\xbf]" # Reserved
(though not exactly the same one).
And still, the test showed how jpegsnoop will also "display" some of them (though "skipping" the unknown marker):
Values that produced a "valid" log (i.e. that continued the parsing after the header)
01, C4, C8, CC, DE, E0-FE
And will "choke" on
Values that crashed jpegsnoop
C0-C3, C5-C7, C9-CB, CD-CF
And
Values that produced an "invalid" log (i.e. that stopped the parsing after the header)
00, 02-BF, D0-DD, DF, FF
Now, I do understand how the "crash" and some of the "invalid" values are due to the "further" checks that jpegsnoop (which is obviously not a carver, nor a filetype identifier) performs, but it would be nice if the program would not crash and would have (say) an option (or whatever) to force the skipping.
As well, I would like it if there could be an option (or something) to allow "correcting" the values that are now "invalid" but that can be displayed by "common" viewers.
BUT among the "invalid" values (that jpegsnoop could NOT display), these were viewable in Explorer/Photo Editor as above
00, D0-D7, DC, FF
One of the probabilities that mscotgrove would not take into consideration, representing, say, 0.00000000000001% of cases: let's say that someone has an image that displays correctly in (still say) Photo Editor and that has E0 as its fourth byte.
After some time - for whatever reasons - *something* changes the fourth byte to 00.
The user continues to have the image display fine in Photo Editor.
Then *something else* changes another byte, making the image no longer display correctly in Photo Editor.
Right now, if the user gets jpegsnoop to attempt to recover the image, he will never be able to go past the initial fourth byte, which is not the actual "issue".
Someone will notice how the three people (joachimm, Calvin and myself) who posted about these values managed to find three different notations for the same data, so that it is very difficult to compare the values. 😯
I propose the use of a "more graphical" notation, such as the "table format" I used in my half-@§§ed batches.
What joachimm seemingly posted (please double check):
C4 C8 CC
DB
E0 E1 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE
What Calvin/Impulse seemingly posted (please double check):
C0 C1 C2 C3 C4 C5 C6 C7 C9 CA CB CC CD CE CF
D9 DB DD DE
E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
FE
jaclaz