JPEG carving/identifying/recovering

39 Posts
5 Users
0 Reactions
7,215 Views
(@mscotgrove)
Prominent Member
Joined: 17 years ago
Posts: 940
 

It's like once you deleted the image, it automatically zeroized the free space.
Has anyone encountered something like this?

This was the original question, and I am not sure we have answered it. (The discussion has been useful though)

The following link

http://www.xda-developers.com/android/yet-another-reason-to-update-to-android-4-3-trim-support/

does state that TRIM is included on the S4 - this would surely explain the blank data area.


   
(@joachimm)
Estimable Member
Joined: 17 years ago
Posts: 181
Topic starter  

does state that TRIM is included on the S4 - this would surely explain the blank data area.

The OP does not indicate whether the image was deleted on the internal storage or on a media card, nor did they indicate the file system or the Android version, or how they analyzed the free space. But yes, TRIM is one of the options. Also see http://www.anandtech.com/show/7185/android-43-update-brings-trim-to-all-nexus-devices


As indicated, these were based on my notes; let me backtrack to the origin of this information before comparing notes. (I actually found that I had been incomplete in copying from my notes.)

First some links
http://forensicswiki.org/wiki/JPEG

Let's zoom in on ITU-T T.81
http://www.w3.org/Graphics/JPEG/itu-t81.pdf

Annex B - Compressed data formats

Section "B.2.1 High-level syntax"
* here we see "SOI, frame, EOI" as the high-level structure; the first part of the frame is defined as an optional "Tables/misc." section followed by the "Frame header"

Section "B.2.4 Table-specification and miscellaneous marker segment syntax"

Here this segment is defined as a marker segment which consists of
* Quantization table-specification or
* Huffman table-specification or
* Arithmetic conditioning table-specification or
* Restart interval definition or
* Comment or
* Application data

Section "B.2.2 Frame header syntax"
Indicates "Start of frame marker" (SOFn)

Looking at the individual sections and "Table B.1 – Marker code assignments"

* Quantization table-specification X’FFDB’
* Huffman table-specification X’FFC4’
* Arithmetic conditioning table-specification X’FFCC’
* Restart interval definition (DRI) X’FFDD’ (the restart markers RST0 through RST7 are X’FFD0’ through X’FFD7’)
* Comment X’FFFE’
* Application data X’FFE0’ through X’FFEF’
* Start of frame marker X’FFC0’, X’FFC1’, X’FFC2’, X’FFC3’, X’FFC5’, X’FFC6’, X’FFC7’, X’FFC9’, X’FFCA’, X’FFCB’, X’FFCD’, X’FFCE’, X’FFCF’

So I rechecked my notes and found I had not copied them entirely; I was missing the SOF markers.

X’FFC8’, X’FF02’ through X’FFBF’, X’FFF0’ through X’FFFD’ are defined as reserved.
Since nothing is known about these, one could argue whether they are valid markers or not; for the sake of a flexible parser that does not break when one of these is added, I chose to accept them.

Also something I just noticed
Page 82

Note that optional X’FF’ fill bytes which may precede any marker shall be discarded before determining which marker is present.


   
(@joachimm)
Estimable Member
Joined: 17 years ago
Posts: 181
Topic starter  

Some more notes re-checking FFE2 => application_segment_flashpix.
So this is another specific standard on top of JPEG; also see
http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/FlashPix.html

However, some of the structures used in FlashPix streams are part of the EXIF specification, and are still being used in the APP2 FPXR segment of JPEG images by some digital cameras from manufacturers such as FujiFilm, Hewlett-Packard, Kodak and Sanyo.


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

X’FFC8’, X’FF02’ through X’FFBF’, X’FFF0’ through X’FFFD’ are defined as reserved.
Since nothing is known about these, one could argue whether they are valid markers or not; for the sake of a flexible parser that does not break when one of these is added, I chose to accept them.

Well, yes and no.
I mean, we have to decide if we are going "according to specifications", "according to values found in nature", or "according to values that can be commonly displayed"; I had hoped we could get a "probability map".

So, if I get it right, this is your "final" map of acceptable values; please do review it once again:
02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
D0 D1 D2 D3 D4 D5 D6 D7 D9 DB
E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE
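For what it's worth, the map above can be expressed compactly as a set, which makes it machine-checkable; the exclusions (00, 01, D8, DA, DC, DD, DE, DF, FF) are simply read off the rows above (this is just a restatement of the map, not an endorsement of it):

```python
# jaclaz's map of acceptable 4th-byte values, expressed as a Python set.
# Excluded values are read directly off the rows of the map above.
ACCEPTED_FOURTH_BYTE = (
    set(range(0x02, 0xFF))
    - {0xD8, 0xDA, 0xDC, 0xDD, 0xDE, 0xDF}
)
```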

Also something I just noticed
Page 82
Note that optional X’FF’ fill bytes which may precede any marker shall be discarded before determining which marker is present.

If I get it right, this, translated into English, means "repetitions of the FF value must be considered as a single instance, thus FF is not an acceptable value for the 4th byte"?

All in all, considering that
00 displays in Photo Editor
01 displays in both Photo Editor and jpegsnoop
DC displays in Photo Editor
FF displays in Photo Editor

And that the (new entry in the test) IrfanView can display
00-BF
D0-D8 and DC
E0-FF

I.e. the only values that "stop it" are
C0-CF
D9, DA, DB, DD

The three-byte pattern FFD8FF seems like a reasonable one to "tentatively identify" a JPEG file; after which, instead of checking the fourth byte (which can seemingly be *almost* any value, with the exception of a handful), it would be smarter to look for other "typical" patterns or values.

jaclaz


   
(@joachimm)
Estimable Member
Joined: 17 years ago
Posts: 181
Topic starter  

Apparently we got our own thread 😉

If I get it right, this, translated into English, means "repetitions of the FF value must be considered as a single instance, thus FF is not an acceptable value for the 4th byte"?

So yes, and this also gets back to one of your earlier questions: why do 3 people yield 3 different results?
1. language is ambiguous
2. reading and writing standards is difficult
3. theory and practice do not necessarily align

As I read it,
it means any valid segment marker can be preceded by an arbitrary number of FF byte values,
e.g. FF C4 should be interpreted in the same way as FF FF FF FF FF FF C4.
Which makes parsing the format so much more fun [I'm being sarcastic here].
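That fill-byte equivalence can be illustrated in a couple of lines (Python; the helper name is mine):

```python
def strip_fill_bytes(segment: bytes) -> bytes:
    """Collapse a run of X'FF' fill bytes before a marker to a single FF,
    so FF FF FF FF FF FF C4 parses the same as FF C4."""
    i = 0
    while i + 1 < len(segment) and segment[i] == 0xFF and segment[i + 1] == 0xFF:
        i += 1
    return segment[i:]
```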

Well, yes and no.

I thought I addressed that with the remark about it being debatable 😉

I mean, we have to decide if we are going "according to specifications", "according to values found in nature", or "according to values that can be commonly displayed"; I had hoped we could get a "probability map".

As I've indicated to mscotgrove, probabilities are not worth much:
1. what is the sample set you base the probabilities on?
2. is the exotic/edge case relevant for your case?

So again this depends on how you interpret the standard:
* does reserved mean it cannot be used? (then again, "invalid" would be clearer)
* or does it mean some future variant of the format can use it?
* whether you want to build your parser to allow it or to detect it depends on your use case IMO, hence debatable

But to get back to assumptions in software: here we have 2 areas where assumptions happen,
and if the tool does not point out the assumptions it is based on, can you as a digital forensic analyst rely on the tool?

So, if I get it right, this is your "final" map of acceptable values; please do review it once again:

02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
D0 D1 D2 D3 D4 D5 D6 D7 D9 DB
E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE

So D9 is not one I mentioned, since that would be EOI, and according to the spec there should always be a valid frame (assuming I'm interpreting it correctly). But the rest looks good at first glance (though it has been a long day, so don't rely on 100% accuracy here at the moment).

All in all, considering that
00 displays in Photo Editor
01 displays in both Photo Editor and jpegsnoop
DC displays in Photo Editor
FF displays in Photo Editor

And that the (new entry in the test) IrfanView can display
00-BF
D0-D8 and DC
E0-FF

I.e. the only values that "stop it" are
C0-CF
D9, DA, DB, DD

So this is where "creative interpretation" probably comes into play; not uncommon for file formats 😉
Again, what is your goal: determining the edge cases, finding images that correspond to a certain camera, trying to find the one JPEG that was fabricated to compromise your web server? You (in general) as the analyst will have to adapt to the situation; hopefully your tool is as helpful as it can get without giving you false information.

The three-byte pattern FFD8FF seems like a reasonable one to "tentatively identify" a JPEG file; after which, instead of checking the fourth byte (which can seemingly be *almost* any value, with the exception of a handful), it would be smarter to look for other "typical" patterns or values.

This will depend on your use case, but the point to stress here is: is this a conscious decision by you (in general) as the analyst? Or are you relying on the knowledge of the people that created the tool?


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

Well, I am not the analyst, nor the one that writes a tool; I am simply someone passing by and trying to understand - since at first sight I notice something that seems to me "just not right" - whether there are some margins for improvement in the tool, in the procedure, or in both.

Let's try to recap.

In practice, there are two (E0 and E1) or maybe four (E0, E1, EC, FE) 4th-byte values that are "relevant".

  • mscotgrove thinks that the two values represent 99.50% (say) or more (and he is most probably right).
  • Christophe Grenier evidently found the other two values "in the wild" (otherwise it would have made no sense to add the patterns to PhotoRec); let's say for simplicity that these account for 0.49%.
  • I found that - depending on the viewer used (but without using any little-known marginal tool on a very uncommon niche OS; using two or three common tools, some actually built into the most common OSes) - there is a whole range of other values that keep the image displayable and that the mentioned recovery tools would have no chance to recognize. In the tests it also came out that JPEGsnoop, which is a well-known tool for analysing JPEGs, either failed to display some of the files with "queer" values or crashed on them.
  • joachimm confirmed how, through a re-reading of the specifications, some of these values (more or less, depending on how "elastic" the reading is) are actually also within the specs.

I would think that recovery/carving tools would need to be *somehow* modified to take into account the rare possibility of a non-common fourth-byte value.

This could be done in several ways; as an example, with a "switch" to look only for "more common" values and a switch for "all values" (or something similar), the output could be something like:
Found 10,000 possible JPEG images, of which:
9,950 standard, common ones (E0/E1)
49 not common (EC/FE)
1 extremely rare
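A sketch of that kind of triage switch (Python; the bucket names and the EC/FE grouping follow the example output above and are illustrative only):

```python
# Bucket a carved JPEG header by its 4th byte, per the example output above.
COMMON = {0xE0, 0xE1}        # JFIF / EXIF
LESS_COMMON = {0xEC, 0xFE}   # the values found "in the wild" (PhotoRec patterns)

def classify(header: bytes) -> str:
    """Classify a candidate header into the buckets of the example report."""
    if header[:3] != b"\xff\xd8\xff":
        return "not a JPEG header"
    fourth = header[3]
    if fourth in COMMON:
        return "standard"
    if fourth in LESS_COMMON:
        return "not common"
    return "extremely rare"
```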

JFYI (about what a mess the standard is) see here
http://www.hackerfactor.com/blog/index.php?/archives/242-Graphic-Content,-Parental-Supervision-Advised.html

Sadly, JPEG really looks like a format designed by committee (because it was). It is one of the most idiotic file formats I have ever come across.

😉

jaclaz


   
(@joachimm)
Estimable Member
Joined: 17 years ago
Posts: 181
Topic starter  

Well, I am not the analyst, nor the one that writes a tool; I am simply someone passing by and trying to understand - since at first sight I notice something that seems to me "just not right" - whether there are some margins for improvement in the tool, in the procedure, or in both.

Ironic; the person who is the most interested fills neither of the roles I would have *assumed* this to be most relevant for. That teaches me for assuming 😉

I would think that recovery/carving tools would need to be *somehow* modified to take account of the rare possibility of a non-common fourth byte value.

Just an update: C. Grenier has implemented the following change for PhotoRec
http://git.cgsecurity.org/cgit/testdisk/commit/?id=4c5fcd4164b7fd06eafa54fa44ecfb9fb2d02d00

At least one carving tool more that now does this 😉

photorec should also handle 0xff-pre-segment-marker values

This could be done in several ways; as an example, with a "switch" to look only for "more common" values and a switch for "all values" (or something similar),

Agreed, there is a definite need for what was dubbed "anomaly detection" in tools. Though I would personally like this to be relative to a sample set that I as an analyst can control, not someone else's model of the world.

JFYI (about how the standard is a mess) see here

Now if only it were just JPEG, I would be very glad. Alas, IMO this is more the rule than the exception (file formats having idiotic designs).


   
(@impulse)
New Member
Joined: 16 years ago
Posts: 4
 

Calvin, I would love to know if my wild guess that 99.5% of images are 0xE0/0xE1 is actually anywhere near correct. Your suggested scan might help here.

Good guess Michael!

I ran a quick test on 100k images sourced from two popular image hosts (content from a few years back) and detected the following:

  • APP0 (0xE0) 17.8%
  • APP1 (0xE1) 82.1%
  • APP13 (0xED) 0.084%
  • COM (0xFE) 0.048%

So, with this limited set, we see 99.9% are covered by the traditional EXIF and JFIF standard headers. One caveat here is that these image hosts might have employed some filtering on user uploads.
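A survey like this can be scripted in a few lines. Here is a sketch in Python; the directory, the extension filter, and of course the sample set itself are up to the analyst:

```python
import collections
import pathlib

def survey_fourth_bytes(directory: str) -> collections.Counter:
    """Count the marker byte following FFD8FF across a directory of images."""
    counts = collections.Counter()
    for path in pathlib.Path(directory).rglob("*.jpg"):
        head = path.read_bytes()[:4]
        # Only count files that actually start with the FFD8FF signature.
        if len(head) == 4 and head[:3] == b"\xff\xd8\xff":
            counts[head[3]] += 1
    return counts
```

Dividing each count by the total then gives percentages comparable to the APP0/APP1/APP13/COM breakdown above.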

jaclaz & joachimm - you might be interested to note that JPEGsnoop v1.7.4 added a "relaxed parsing" mode that actually displays all images correctly from jaclaz's 256-image test set, apart from the 3 below. (There should be no crashing going on 🙂)

  • SOI (0xD8)
  • EOI (0xD9)
  • SOS (0xDA)

One type of file I skip when carving is JPEG headers within an AVI file; these have the string "AVI" starting at the 6th byte (after an E0 or E1). This was probably determined by false positives rather than detailed analysis of the spec!

There are a few video formats that use embedded JPEG images, including both AVI and QuickTime container formats and some less common proprietary formats.

Of the more common containers

  • Microsoft AVI: the JPEG AVI spec typically has an APP0 marker (0xFFE0) with an "AVI1" identifier immediately after the SOI (start of image). This seems to match what you have observed empirically.
  • Apple QuickTime: the Motion-JPEG Format A typically has an APP1 marker (0xFFE1) with the identifier "mjpg", but this does not necessarily immediately follow the SOI. Given that this identifier is not in a fixed position, filtering these out is a little more involved.
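The AVI case can be sketched in Python; the offset of "AVI1" assumes the usual layout (SOI, APP0 marker, two length bytes, identifier), so double-check it against real samples:

```python
def is_avi_embedded_jpeg(data: bytes) -> bool:
    """True if the buffer looks like an AVI-embedded JPEG frame:
    an APP0 segment immediately after SOI carrying the "AVI1" identifier.
    Assumed layout: FF D8 | FF E0 | length (2 bytes) | "AVI1"."""
    return data[:4] == b"\xff\xd8\xff\xe0" and data[6:10] == b"AVI1"
```

A carver could use this as a cheap pre-filter before deciding whether a hit is a standalone photo or a video frame; the QuickTime "mjpg" case would need a search rather than a fixed-offset check.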

Cal


   
(@joachimm)
Estimable Member
Joined: 17 years ago
Posts: 181
Topic starter  

So, with this limited set, we see 99.9% are covered by the traditional EXIF and JFIF standard headers. One caveat here is that these image hosts might have employed some filtering on user uploads.

Maybe one of you can explain to me why you are so interested in these percentages? What value do these numbers have in your perception? If tomorrow, e.g., a new EXIF variant hits the market that uses APP5, these numbers will change.

jaclaz & joachimm - you might be interested to note that JPEGsnoop v1.7.4 added a "relaxed parsing" mode that actually displays all images correctly from jaclaz's 256-image test set, apart from the 3 below. (There should be no crashing going on 🙂)

Thanks for making that so; that is good news.


   
(@mscotgrove)
Prominent Member
Joined: 17 years ago
Posts: 940
 

Why do I want to know the percentage? My feeling is that when dealing with JPEGs, there will normally be many photos from the same camera. This could be Granny on her seaside holiday, or CP images.

If either of them had a camera that was not standard, I hope any investigator / recovery person would spot many JPEG-named files that did not open, and so investigate.

If a disk has 1,000 standard images and one non-standard, then personally I would probably ignore it. It is unlikely that the only CP image would be in a different format - I would expect there to be many. If there was just one, I suspect it may have been downloaded, having been processed elsewhere.

Someday, iPhone 99 or Android version 27 may start using another standard, but this would hit software very quickly, and many updates would be required. In the meantime, I will upgrade my software to allow for the two extra codes highlighted in this discussion.

The area where I am finding many, many valid variations is video files; along with built-in fragmentation, this keeps me busy.


   