JPEG carving/identifying/recovering

39 Posts
5 Users
0 Reactions
7,213 Views
(@joachimm)
Estimable Member
Joined: 17 years ago
Posts: 181
Topic starter  

Relying on a single byte value here sounds like a big gamble.

Why not use alternative techniques here, like PhotoDNA [http://en.wikipedia.org/wiki/PhotoDNA]?


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

@Impulse

I tried jpegsnoop 1.7.3.
Could not find that option "relaxed parsing", and the tool (only quickly tested) behaves like 1.7.2.

Cannot find version 1.7.4.

I'll wait for its release :) and test it to confirm.

jaclaz


   
(@impulse)
New Member
Joined: 16 years ago
Posts: 4
 

@Impulse
Cannot find version 1.7.4.
I'll wait for its release :) and test it to confirm.

1.7.4 is in beta. I have sent you a PM with a link so that you can test it now on the modified headers.

Thanks


   
jaclaz
(@jaclaz)
Illustrious Member
Joined: 18 years ago
Posts: 5133
 

1.7.4 is in beta. I have sent you a PM with a link so that you can test it now on the modified headers.

Got it and replied. :)

The relaxed parsing added support for almost all the "invalid" ones, but the "Cx" ones still crash, at least here and on my test image.

D8, D9 and DA don't parse (as expected); FF doesn't parse either.

jaclaz


   
(@francesco)
Trusted Member
Joined: 12 years ago
Posts: 79
 

I'm a bit puzzled by those non-supported markers: how is a parser going to know whether there's a payload after them, and how can it be skipped? I remember that a 0xFF in the data was followed by a 0x00 to prevent it from being recognized as a marker (the 0x00 had to be skipped), but I don't think that applied to all kinds of segment payloads.


   
(@joachimm)
Estimable Member
Joined: 17 years ago
Posts: 181
Topic starter  

I'm a bit puzzled by those non-supported markers: how is a parser going to know whether there's a payload after them, and how can it be skipped? I remember that a 0xFF in the data was followed by a 0x00 to prevent it from being recognized as a marker (the 0x00 had to be skipped), but I don't think that applied to all kinds of segment payloads.

Every segment marker is followed by a size; from the revit07 specification:

# A segment consists of [2 byte segment marker] [2 byte segment size] [data]
# The segment size does not include the 2 bytes of the segment marker
# but the 2 bytes of the size itself is included
element segment
{
    parameter pattern

    variable size
    {
        value 2 big_endian
        subtract 2
    }
    alignment size
}

Only 2 markers can have no size: SOI and EOI. Note that the marker itself can vary in size, since leading 0xff values should be ignored (if I read the specs correctly).
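
As a rough illustration of that layout (not taken from revit07), a minimal Python sketch might read one segment like this. The names `read_segment` and `STANDALONE` are made up for the example; the set also lists RST0-RST7 and TEM, which ITU-T T.81 marks as stand-alone markers alongside SOI and EOI.

```python
import struct

# Marker codes that stand alone (no size field): TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def read_segment(data: bytes, offset: int):
    """Return (marker_code, payload, next_offset) for the segment at offset."""
    if offset >= len(data) or data[offset] != 0xFF:
        raise ValueError("no marker at offset %d" % offset)
    pos = offset + 1
    # Leading 0xFF fill bytes are skipped before reading the marker code.
    while pos < len(data) and data[pos] == 0xFF:
        pos += 1
    if pos >= len(data):
        raise ValueError("truncated marker")
    marker = data[pos]
    pos += 1
    if marker == 0x00:
        raise ValueError("0xFF 0x00 is a stuffed byte, not a marker")
    if marker in STANDALONE:
        return marker, b"", pos
    if pos + 2 > len(data):
        raise ValueError("truncated size field")
    # Big-endian size that counts its own 2 bytes but not the marker.
    (size,) = struct.unpack_from(">H", data, pos)
    if size < 2:
        raise ValueError("size smaller than the size field itself")
    return marker, data[pos + 2 : pos + size], pos + size
```

Calling it in a loop, feeding `next_offset` back in as `offset`, walks the header segments until SOS; the scan data that follows SOS needs different handling.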


   
(@francesco)
Trusted Member
Joined: 12 years ago
Posts: 79
 

I'm a bit puzzled by those non-supported markers: how is a parser going to know whether there's a payload after them, and how can it be skipped? I remember that a 0xFF in the data was followed by a 0x00 to prevent it from being recognized as a marker (the 0x00 had to be skipped), but I don't think that applied to all kinds of segment payloads.

Every segment marker is followed by a size; from the revit07 specification.

Only 2 markers can have no size: SOI and EOI. Note that the marker itself can vary in size, since leading 0xff values should be ignored (if I read the specs correctly).

Wikipedia also seems to list the Restart (RST) markers as having no payload. I wonder if they're inside the segment data or outside (or both). Maybe it's better not to know 😯.

I remember that JPEG markers could be found by looking for any 0xFF that isn't followed by another 0xFF or a 0x00, but maybe I'm confusing that with another format. However, I found this: "To allow for recovery in the presence of errors, it must be possible to detect markers without decoding all of the intervening data. Hence markers must be unique. To achieve this, if an FFh byte occurs in the middle of a segment, an extra 00h stuffed byte is inserted after it and 00h is never used as the second byte of a marker." on this page. Maybe I remembered things right, or is that just a subset of cases? I remember I wrote two different carvers, and one worked on that markers-are-unique concept, but that was something like 15+ years ago; I wonder if it would still work with recent images.


   
(@joachimm)
Estimable Member
Joined: 17 years ago
Posts: 181
Topic starter  

I remember that JPEG markers could be found by looking for any 0xFF that isn't followed by another 0xFF or a 0x00, but maybe I'm confusing that with another format.

Seems to correspond with the standard:
http://www.w3.org/Graphics/JPEG/itu-t81.pdf, page 32

B.1.1.3 Marker assignments
All markers shall be assigned two-byte codes: an X’FF’ byte followed by a second byte which is not equal to 0 or X’FF’. The second byte is specified in Table B.1 for each defined marker. An asterisk (*) indicates a marker which stands alone, that is, which is not the start of a marker segment.

That also answers your question on RST, which is apparently a stand-alone marker, making its size (at least) 2 bytes. So my earlier comment about SOI and EOI being the only markers without a size was wrong; thanks for pointing that out. There appears to be another one used without a segment, namely TEM.

Also note, on page 82:

Note that optional X’FF’ fill bytes which may precede any marker shall be discarded before determining which marker is present.
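
To make the two quoted rules concrete, here is a small Python sketch of a marker scan: a 0xFF counts as a marker only if the next byte is neither 0x00 nor 0xFF, and runs of 0xFF fill bytes are discarded. The function name `find_markers` is invented for the example, and the scan is only trustworthy where byte stuffing applies (the entropy-coded data); ordinary segment payloads may contain arbitrary 0xFF xx pairs.

```python
def find_markers(data: bytes):
    """Yield (offset, code) pairs; offset is the last 0xFF before the code."""
    pos = 0
    last = len(data) - 1
    while pos < last:
        pos = data.find(b"\xff", pos)
        if pos < 0 or pos >= last:
            break
        code = data[pos + 1]
        if code == 0xFF:
            # Fill byte: discard it and look at the next 0xFF.
            pos += 1
        elif code == 0x00:
            # Stuffed byte: the 0xFF was data, not a marker.
            pos += 2
        else:
            yield pos, code
            pos += 2


sample = bytes.fromhex("ffd8ff0012ffffffda34ffd9")
print([(off, hex(code)) for off, code in find_markers(sample)])
```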


   
(@impulse)
New Member
Joined: 16 years ago
Posts: 4
 

For anyone still interested in the details… :)

The only markers that aren't followed by a length are SOI, EOI, RST0-RST7 and TEM. Given the lack of redundancy in the scan segment (compressed image data), IMHO it would have been nicer if SOI was defined to include a length. Perhaps the lack of length was done to simplify the task of encoders built for streaming.

Wikipedia also seems to list the Restart (RST) markers as having no payload. I wonder if they're inside the segment data or outside (or both). Maybe it's better not to know 😯.

We should only find RST within the scan data as they're intended to provide robustness in case the decoder loses sync / data stream gets damaged. I think it is really unfortunate that very few digicams or imaging programs enable restart markers. The small filesize hit is well worth it when it comes to dealing with recovery!
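
For recovery work, one property that can be checked is that restart markers count up modulo 8 (RST0, RST1, ... RST7, RST0, ...). The Python sketch below is only an illustration: `restart_sequence_ok` is a made-up name, and the check starts counting from the first RST it sees so that it also works on a fragment cut from the middle of a scan.

```python
def restart_sequence_ok(scan: bytes) -> bool:
    """Return True if every RSTn in scan follows its predecessor modulo 8."""
    expected = None  # unknown until the first restart marker is seen
    pos = 0
    while True:
        pos = scan.find(b"\xff", pos)
        if pos < 0 or pos + 1 >= len(scan):
            return True
        code = scan[pos + 1]
        if code == 0xFF:
            pos += 1  # fill byte, re-examine the next 0xFF
            continue
        if 0xD0 <= code <= 0xD7:
            index = code - 0xD0
            if expected is not None and index != expected:
                return False  # a restart interval is missing or out of order
            expected = (index + 1) % 8
        pos += 2
```

A `False` result hints that data between two restart markers was lost or that the fragments were joined in the wrong order.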

I remember that JPEG markers could be found by looking for any 0xFF that isn't followed by another 0xFF or a 0x00, but maybe I'm confusing that with another format.

It's also possible to get runs of consecutive 0xFF as a result of padding (fill bytes) that immediately precedes a marker.

It shouldn't come as much of a surprise that if we encounter corruption at certain key points within the file, the marker-length mechanism can misinterpret unrelated data as a length and derail any decoding/parsing. In JPEGsnoop (relaxed parsing mode), one method I have used to work around this is to speculatively test for a valid marker at the offset implied by the interpreted length. If it is invalid, I then test for an adjacent marker or the typical length of the marker segment.
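
The idea of checking the length speculatively can be sketched in Python as follows. This is not JPEGsnoop's code: the function names are invented, and the fallback used here (scanning a small window forward from the size field for the next plausible marker) is a simplification standing in for the adjacent-marker and typical-length tests. Fill bytes are not handled.

```python
import struct


def plausible_marker_at(data: bytes, pos: int) -> bool:
    """True if data[pos:pos+2] looks like 0xFF plus a real marker code."""
    return (
        pos + 1 < len(data)
        and data[pos] == 0xFF
        and data[pos + 1] not in (0x00, 0xFF)
    )


def next_segment_offset(data: bytes, offset: int, window: int = 64):
    """offset points at the 0xFF of a marker that carries a size field."""
    if offset + 4 > len(data):
        return None
    (size,) = struct.unpack_from(">H", data, offset + 2)
    candidate = offset + 2 + size
    # Trust the size only if a marker really starts where it says.
    if size >= 2 and plausible_marker_at(data, candidate):
        return candidate
    # Fallback (assumption): take the first plausible marker nearby.
    start = offset + 4
    for pos in range(start, min(start + window, len(data) - 1)):
        if plausible_marker_at(data, pos):
            return pos
    return None
```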

I remember I wrote two different carvers, and one worked on that markers-are-unique concept, but that was something like 15+ years ago; I wonder if it would still work with recent images.

You'd be surprised, not a lot has changed since the spec came out :) For the purpose of JPEG carving, testing for stuff bytes (i.e. 0xFF00 in the scan segment) can be a pretty effective heuristic given arbitrary sectors/fragments.
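
A minimal Python sketch of such a stuff-byte test, under some assumptions of my own: a sector is accepted only if it contains at least `min_stuffed` 0xFF00 pairs and every other 0xFF is followed by a fill byte, an RSTn or EOI. The name `looks_like_scan_data` and the threshold are hypothetical, and sectors of progressive files that contain DHT or SOS between scans would be rejected by this version.

```python
def looks_like_scan_data(sector: bytes, min_stuffed: int = 1) -> bool:
    """Heuristic: does this sector look like JPEG entropy-coded data?"""
    stuffed = 0
    pos = 0
    last = len(sector) - 1
    while True:
        pos = sector.find(b"\xff", pos)
        if pos < 0 or pos >= last:
            break
        nxt = sector[pos + 1]
        if nxt == 0xFF:
            pos += 1  # fill byte, re-examine the next 0xFF
            continue
        if nxt == 0x00:
            stuffed += 1  # stuffed byte, typical for scan data
        elif not (0xD0 <= nxt <= 0xD7 or nxt == 0xD9):
            return False  # 0xFF followed by something unexpected in a scan
        pos += 2
    return stuffed >= min_stuffed
```

Raising `min_stuffed` reduces false positives on larger fragments at the cost of missing short ones.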


   