"I was thinking about something like (I see that you well know about it Smile ) repair-jpg
Yes. That's the same idea as my tool and is what I got the idea from.
"I thought that a "smart" jpeg parser might be able to "guess" much smaller ranges"
That might be possible if you actually decompress and decode the data, and depending on the corruption the tool may be able to guess whether the data is corrupt or not. My tool does not do a full decode; I only Huffman-decompress so that I can link points in a JPEG image to specific data inside the file.
But still, we have all at some point seen a JPEG that looks distorted, with a color shift and a shift in image data, yet the image viewer shows the entire image. That basically means it had no trouble decoding it. So a decoder will not be able to tell corrupt from intact data unless you implement some kind of AI or something that actually looks at the image to detect whether it looks 'odd' and at what point the 'odd-ness' is introduced.
Even when you can detect it, you cannot replace just those few corrupt bytes; you'd need to rewrite the whole MCU. As for guessing what to replace corrupt data with, I am now experimenting with looking at the data in MCUs right above the corrupt MCU, which is often very similar. This kinda works, and works better in larger files. Larger files tend to be more 'forgiving' in general.
But for now I guess repair of corruption in the JPEG bit stream requires quite a bit of manual labor: https://www.youtube.com/watch?v=M_i0ZqXhDuU
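To make the "MCU right above" idea concrete, here is a minimal sketch. It assumes MCUs are numbered left to right, top to bottom, and that the MCU width in pixels is known (16 is common with 4:2:0 subsampling, 8 without subsampling). The function name and parameters are hypothetical and are not taken from the tool discussed here.

```python
import math

def mcu_above(mcu_index, image_width, mcu_width=16):
    """Return the index of the MCU directly above mcu_index, or None on the top row.

    Assumes raster order (left to right, top to bottom) and a known MCU width.
    """
    # A partially filled MCU at the right edge still counts as a full MCU.
    mcus_per_row = math.ceil(image_width / mcu_width)
    if mcu_index < mcus_per_row:
        return None  # top row: nothing above to borrow from
    return mcu_index - mcus_per_row

# Example: in a 3200 px wide image with 16 px MCUs there are 200 MCUs per row,
# so the neighbour above MCU 250 is MCU 50.
neighbour = mcu_above(250, 3200)
```

This only locates the candidate donor MCU. Copying its data into the corrupt position still means re-encoding that MCU, as described above.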
Sure, the difference is that in this particular scenario (as compared to a "random" corruption) we know exactly which bytes in the JPEG are "wrong", i.e. we know exactly where to replace the values, or if you prefer, we know exactly where the corruption occurred and where the repair is needed.
jaclaz
Yeah, but unless you can determine a pattern (which is sometimes possible because specific bits are 'flipping' in the raw, compressed and encoded, data, in which case you can just flip them back), it's still guesswork. Now if the MCU affected by this corruption in the raw data is, for example, part of an even blue sky, you could probably just repeat the previous MCU. But if the MCU happens to be at the boundary of some object against a clear blue sky, repeating the previous MCU may affect all data in the stream following it. JPEG doesn't describe specific individual pixels; the data are encoded blocks (MCUs) that, so to speak, build on data from previous MCUs. One corrupt MCU will affect all data that follows. And one corrupt bit may be all it takes to make that happen.
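One reason a single bad MCU affects everything after it is that baseline JPEG stores each block's DC (average) value as a difference from the previous block of the same component. The sketch below is a toy model of only that part: the numbers are made up, and it ignores the Huffman bit alignment problem, which usually makes things worse.

```python
from itertools import accumulate

def decode_dc(diffs):
    """Rebuild absolute DC values from the stored differences (running sum)."""
    return list(accumulate(diffs))

# Made-up DC differences for six consecutive blocks of one component.
diffs = [50, 2, -1, 3, 0, 1]
good = decode_dc(diffs)

# Corrupt a single difference (block 2) and decode again.
bad_diffs = diffs.copy()
bad_diffs[2] = 40
bad = decode_dc(bad_diffs)

# Blocks before the error are untouched; every block from the error onward
# is off by the same amount, which shows up as a brightness or color shift.
offsets = [b - g for g, b in zip(good, bad)]
```

In files that use restart markers the predictor is reset at each marker, so the shift would stop there. Many camera files do not use them.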
Sure, I understand that, but (pure theory, mind you) if we know the exact offset of the three corrupted bytes, a parser can work out which MCU they belong to and then brute-force the values 000000-FFFFFF there until that MCU is valid.
What I suspect is that it may be possible to greatly restrict the range, i.e. depending on where the three bytes "fall", the possible valid values may not be that many?
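As a very rough illustration of restricting the range: one rule that needs no decoding at all is byte stuffing, i.e. inside entropy-coded data a 0xFF byte must be followed by 0x00. The sketch below filters candidates on that rule alone. The function name is hypothetical, and it assumes the damaged region contains no restart markers (which are also introduced by 0xFF).

```python
def stuffing_ok(window: bytes, prev_byte: int, next_byte: int) -> bool:
    """Check the 0xFF -> 0x00 byte-stuffing rule for a candidate window.

    prev_byte and next_byte are the known-good bytes on either side of the
    window. Assumption: no restart markers occur in this region.
    """
    data = bytes([prev_byte]) + window + bytes([next_byte])
    # Every 0xFF that has a known follower must be followed by 0x00.
    return all(data[i + 1] == 0x00
               for i in range(len(data) - 1)
               if data[i] == 0xFF)

# Example: keep only the candidates that pass the rule.
candidates = [b"\x12\xff\x00", b"\x12\xff\x01", b"\x12\x34\x56"]
plausible = [c for c in candidates if stuffing_ok(c, 0x34, 0x56)]
```

On its own this rule removes only a small fraction of the 000000-FFFFFF range. Anything stronger would need an actual decode of the MCU to judge the result.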
At this point in time (almost ten years later) the original issue is of course of no relevance whatsoever anymore: the text and HTML data have been retrieved, and retrieving the few "important" .jpg files is not needed.
I was pointing it out to you because, maybe, similar cases of "known address corruption" could happen in other scenarios.
jaclaz
"Sure I understand that, but (pure theory, mind you), if we know the exact offset of the first three corrupted bytes, a parser can understand if they belong to which MCU, and then bruteforce values 000000-FFFFFF there until that MCU is valid."
Yes, I appreciate that, I see what you're getting at, and I wish it was doable. But 000000 IS valid as far as the JPEG spec is concerned; a decoder will happily decode past it. You'd normally not see that because of Huffman compression, but I use it as a technique to shift image data: it pushes all data following it to the right. In the video I linked to, each time you see part of the image shift to the right, I'm just stuffing zeros.
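For anyone who wants to try the zero-stuffing trick by hand, a minimal sketch follows. It assumes "stuffing" means inserting zero bytes at a chosen offset in the entropy-coded data. The function name and the offset in the example are hypothetical.

```python
def stuff_zeros(data: bytes, offset: int, count: int) -> bytes:
    """Return a copy of data with `count` zero bytes inserted at `offset`.

    Everything from `offset` onward moves `count` bytes later in the file.
    """
    if not 0 <= offset <= len(data):
        raise ValueError("offset outside data")
    return data[:offset] + bytes(count) + data[offset:]

# Example on a toy byte string: insert three zero bytes after the second byte.
patched = stuff_zeros(b"\xaa\xbb\xcc\xdd", 2, 3)
```

How far the picture shifts per inserted byte depends on the image, so the count has to be found by trial and by looking at the result.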
In the end, the only way to determine whether data is valid is to look at the actual picture and visually verify that it looks right.
Edit: Maybe, BTW, the stuff you propose is doable, but you're asking the wrong person. I am not an ace programmer to start with. And even then, my math skills are limited. To get some idea of the math involved in coding and decoding JPEG data you may want to watch this series of videos: https://