JPG Forensics

  

tootypeg
Senior Member
 

JPG Forensics

Post Posted: Sep 29, 19 16:25

Hi all,

Quick question: I've been looking into JPG forensics a little, and I have a load of cached JPGs from a single website. They all have the same quantization values, suggesting this is a trait of the site, but in addition they also have similar 'Progressive DCT SOF2' 17-byte values. I just can't find any resources that explain this to me in simple terms. What is this value?
 
  

athulin
Senior Member
 

Re: JPG Forensics

Post Posted: Sep 29, 19 17:11

- tootypeg
Hi all,

Quick question: I've been looking into JPG forensics a little, and I have a load of cached JPGs from a single website. They all have the same quantization values, suggesting this is a trait of the site, but in addition they also have similar 'Progressive DCT SOF2' 17-byte values. I just can't find any resources that explain this to me in simple terms. What is this value?


'Progressive' refers to progressive coding to speed up display over the internet: the image is first coded very 'coarsely', giving a kind of out-of-focus view, then coded with progressively 'finer' resolution, showing more and more detail. Good when you have a slow link, as you get a rough approximation of the image fairly quickly. That is, if your browser/viewer does progressive JPEG -- that used to be a problem in some quarters.

'DCT' = Discrete Cosine Transform, which is the backbone of JPEG compression. Ask a mathematician to explain that ...

SOF - 'Start of Frame'. A magic number if you like: JPEG has multiple SOFs to flag the data described. One of these is the baseline DCT SOF (SOF0), another the progressive DCT SOF (SOF2).

Following the SOF marker are a number of fixed bytes, with the same layout for all types of SOF (8 bytes, I think), and a number of additional fields, the count of which is specified by one of the fixed fields. I guess that the particular SOF header you have ... produces 17 bytes either in toto, or in the SOF fields, or in only the dynamic SOF fields ...

You probably need the JPEG standard to make sense of the structure of a JPEG file.  
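
To see which segment a tool is reporting, it can help to walk the marker segments yourself and print each marker with its length field. The following is only a minimal sketch: `list_segments` is a made-up name, it assumes the file starts at SOI and is not corrupt, and it stops at the first SOS because entropy-coded data follows.

```python
import struct

# Markers that stand alone and carry no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))

def list_segments(data: bytes):
    """Return (marker, offset, length) tuples up to and including the first SOS.

    'length' is the value of the 2-byte length field (which counts itself but
    not the FF xx marker bytes); it is 0 for standalone markers.
    """
    segments = []
    pos = 0
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:            # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in STANDALONE:
            segments.append((marker, pos, 0))
            pos += 2
            if marker == 0xD9:        # EOI: end of image
                break
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((marker, pos, length))
        if marker == 0xDA:            # SOS: compressed scan data follows
            break
        pos += 2 + length
    return segments
```

Usage would be something like `for m, off, n in list_segments(open(path, "rb").read()): print(f"FF{m:02X} at {off}, length {n}")`, where `path` is whatever file you are examining. SOF0 shows up as FFC0 and SOF2 as FFC2.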
 
  

jaclaz
Senior Member
 

Re: JPG Forensics

Post Posted: Sep 29, 19 17:20

Around pages 34-35 here?

www.w3.org/Graphics/JP...tu-t81.pdf

jaclaz
_________________
- In theory there is no difference between theory and practice, but in practice there is. - 
 
  

trewmte
Senior Member
 

Re: JPG Forensics

Post Posted: Sep 29, 19 17:45

From what you have stated could it be you are looking at the 17 bytes associated with DHT?


DHT
The DHT (Define Huffman Table) marker defines (or redefines) Huffman tables, which are identified by a class (AC or DC) and a number. A single DHT marker can define multiple tables; however, baseline mode is limited to two of each type, and progressive and sequential modes are limited to four. The only restriction on the placement of DHT markers is that if a scan requires a specific table identifier and class, it must have been defined by a DHT marker earlier in the file.

The structure of the DHT marker is shown below. Each Huffman table is 17 bytes of fixed data followed by a variable field of up to 256 additional bytes. The first fixed byte contains the identifier for the table. The next 16 form an array of unsigned 1-byte integers whose elements give the number of Huffman codes for each possible code length (1-16). The sum of the 16 code length counts is the number of values in the Huffman table. The values are 1 byte each and follow the length counts, in order of Huffman code.

The number of Huffman tables defined by the DHT marker is determined from the length field. An application needs to maintain a counter that is initialized with the value of the length field minus 2. Each time you read a table you subtract its length from the counter. When the counter reaches zero all the tables have been read. No padding is allowed in a DHT marker, so if the counter becomes negative the file is invalid.


Field Size.......Description
1 byte...........The 4 high-order bits specify the table class. A value of 0 means a DC table, a value of 1 means an AC table. The 4 low-order bits specify the table identifier. This value is 0 or 1 for baseline frames and 0, 1, 2, or 3 for progressive and extended frames

16 bytes.........The count of Huffman codes of length 1 to 16. Each count is stored in 1 byte

Variable.........The 1-byte symbols sorted by Huffman code. The number of symbols is the sum of the 16 code counts
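
The counter logic described above can be sketched roughly as follows. This is an illustration, not code from any particular library: `parse_dht` is a hypothetical name, and the input is assumed to be the DHT segment starting at its 2-byte length field (the FF C4 marker bytes already stripped).

```python
import struct

def parse_dht(segment: bytes):
    """Split a DHT segment into its Huffman tables.

    Returns a list of dicts with the table class, identifier, the 16
    per-length code counts and the symbol values.
    """
    (length,) = struct.unpack(">H", segment[:2])
    if len(segment) < length:
        raise ValueError("segment shorter than its length field")
    remaining = length - 2            # counter from the description above
    pos = 2
    tables = []
    while remaining > 0:
        if remaining < 17:
            raise ValueError("truncated Huffman table header")
        class_and_id = segment[pos]
        counts = list(segment[pos + 1:pos + 17])   # codes of length 1..16
        nsymbols = sum(counts)
        table_len = 17 + nsymbols                  # 17 fixed bytes + symbols
        remaining -= table_len
        if remaining < 0:                          # no padding is allowed
            raise ValueError("table runs past the end of the segment")
        tables.append({
            "class": "AC" if class_and_id >> 4 else "DC",
            "id": class_and_id & 0x0F,
            "counts": counts,
            "symbols": list(segment[pos + 17:pos + table_len]),
        })
        pos += table_len
    return tables
```

Note that the 17 fixed bytes here belong to each Huffman table, not to the segment as a whole, so a DHT segment is always longer than 17 bytes.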
_________________
Institute for Digital Forensics (IDF) - www.linkedin.com/groups/2436720
Mobile Telephone Examination Board (MTEB) - www.linkedin.com/groups/141739
Universal Network Investigations - www.linkedin.com/groups/13536130
Mobile Telephone Evidence & Forensics trewmte.blogspot.com 
 
  

tootypeg
Senior Member
 

Re: JPG Forensics

Post Posted: Sep 29, 19 19:32

Thanks everyone, all three of you have really helped me move this forward.

My reason for asking is that I'm putting together some work and want to make sure I'm doing this correctly. I have collected the quantization tables from cached browser images from 50 pornography sites online. Basically, I think the quant tables can allow me to tell which site an image has come from. In addition, I also collected the Start Of Frame (Progressive DCT) 17-byte value, as this also seemed to be of value (from a data/pattern-matching point of view), but I'm worried that I've misunderstood it, so I'm hesitant to include it.

So, I'm guessing from the data I've got that each site seems to encode its hosted JPGs in its own way (and all hosted JPGs on each individual site have consistent quant tables). So I think that if someone gave me a random image from one of the 50 sites, I could match it to a site by comparing the quant tables. This is my interpretation, but I'm also not a JPG expert. Does that sound useful, and does my understanding seem right?
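
For anyone wanting to try the same comparison, here is a rough sketch of pulling the quantization tables out of the DQT segments (marker FF DB) and reducing them to a single value that can be compared between files. The names `extract_quant_tables` and `fingerprint` are invented for this example, and hashing with SHA-256 is just one convenient choice, not something the thread prescribes. It assumes a well-formed file with no standalone markers between SOI and the first SOS.

```python
import hashlib
import struct

def extract_quant_tables(data: bytes):
    """Return {table_id: tuple of 64 values} from all DQT segments before SOS.

    Values are kept in the order stored in the file (zigzag order).
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI")
    tables = {}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:                    # fill byte, skip it
            pos += 1
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        body = data[pos + 4:pos + 2 + length]
        if marker == 0xDB:                    # DQT: one or more tables
            i = 0
            while i < len(body):
                precision, table_id = body[i] >> 4, body[i] & 0x0F
                size = 128 if precision else 64   # 16-bit or 8-bit entries
                raw = body[i + 1:i + 1 + size]
                if len(raw) != size:
                    raise ValueError("truncated quantization table")
                tables[table_id] = struct.unpack(">64H" if precision else "64B", raw)
                i += 1 + size
        if marker == 0xDA:                    # SOS: stop before scan data
            break
        pos += 2 + length
    return tables

def fingerprint(tables):
    """Hash the tables (sorted by identifier) into one comparable hex string."""
    canonical = repr(sorted(tables.items())).encode("ascii")
    return hashlib.sha256(canonical).hexdigest()
```

Two files with identical quantization tables give the same fingerprint regardless of image content or dimensions, which is the property the comparison relies on.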
 
  

trewmte
Senior Member
 

Re: JPG Forensics

Post Posted: Sep 29, 19 19:46

tootypeg a bit of extra info:



SOFn
The SOFn (Start of Frame) marker defines a frame. Although there are many frame types, all have the same format. The SOF marker consists of a fixed header after the marker length followed by a list of structures that define each component used by the frame. The structure of the fixed header and the structure of a component definition are shown below.

Components are identified by an integer in the range 0 to 255. The JFIF standard is more restrictive and specifies that the components be defined in the order {Y, Cb, Cr} with the identifiers {1, 2, 3} respectively. Unfortunately, some encoders do not follow the standard and assign other identifiers to the components. The most inclusive way for a decoder to match the colourspace component with the identifier is to go by the order in which the components are defined and to accept whatever identifier the encoder assigns. There can be only one SOFn marker per JPEG file and it must precede any SOS markers.

Fixed Portion of an SOF Marker

Field Size.......Description
1 byte...........Sample precision in bits (can be 8 or 12)
2 bytes..........Image height in pixels
2 bytes..........Image width in pixels
1 byte...........Number of components in the image


Component-Specific Area of an SOF Marker

Field Size.......Description
1 byte...........Component identifier. JPEG allows this to be 0 to 255. JFIF restricts it to 1 (Y), 2 (Cb), or 3 (Cr)

1 byte...........The 4 high-order bits specify the horizontal sampling for the component. The 4 low-order bits specify the vertical sampling. Either value can be 1, 2, 3, or 4 according to the standard. We do not support values of 3 in our code

1 byte...........The quantization table identifier for the component. Corresponds to the identifier in a DQT marker. Can be 0, 1, 2, or 3
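
Putting the two layouts above together: the length field (2 bytes) plus the fixed portion (6 bytes) is 8 bytes, and each component adds 3, so the SOF length field is 8 + 3 x components. For a three-component image that comes to 17, which may well be where the 17-byte value in the original question comes from. A minimal sketch of decoding it (the function name `parse_sof` is made up here, and the input is assumed to start at the 2-byte length field, with the FF Cn marker bytes already stripped):

```python
import struct

def parse_sof(segment: bytes):
    """Decode an SOFn segment into its fixed header and component list."""
    length, precision, height, width, ncomp = struct.unpack(">HBHHB", segment[:8])
    if length != 8 + 3 * ncomp:
        raise ValueError("SOF length does not match the component count")
    components = []
    for i in range(ncomp):
        # 3 bytes per component: identifier, sampling factors, quant table id
        cid, sampling, qtable = segment[8 + 3 * i:11 + 3 * i]
        components.append({
            "id": cid,
            "h_sampling": sampling >> 4,     # 4 high-order bits
            "v_sampling": sampling & 0x0F,   # 4 low-order bits
            "qtable": qtable,
        })
    return {
        "length": length,
        "precision": precision,
        "height": height,
        "width": width,
        "components": components,
    }
```

Because width and height sit inside these bytes, the SOF bytes of two images from the same site will only be identical when the images also share dimensions, so the sampling factors and table identifiers are probably the more useful part for comparison.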
 
  

athulin
Senior Member
 

Re: JPG Forensics

Post Posted: Sep 30, 19 05:34

- tootypeg
So, I'm guessing from the data I've got that each site seems to encode its hosted JPGs in its own way (and all hosted JPGs on each individual site have consistent quant tables).


That sounds interesting, but somewhat odd. What mechanism in JPEG encoding/compression or web delivery would account for that? A tool artifact, I could accept. A user artifact ('Our user X set up the tool ...') perhaps. Web server plugins like ImageResizer -- probably fall into the tool artifact category.

Or is it an artifact associated with the original data? (Difficult to see how: why would tools 1, 2 and 3 do the same thing just because input data had a particular format?)

Or is it related to the content? Don't see how. Or image sizes?

Or is there some kind of copyright protection involved? Some kind of watermarking component involved that works depending on the identity of the publisher or the web site?

- tootypeg
So I think that if someone gave me a random image from one of the 50 sites, I could match it to a site by comparing the quant tables. This is my interpretation, but I'm also not a JPG expert. Does that sound useful, and does my understanding seem right?


It's a start, but .... What would happen if you took images from outside the master collection? Say, Corel image library, or images from some of those CDs that are archived over at archive.org? Are they identified as 'not recognized'?

Can you get the same result from tools and encoder settings *you* choose? Or have you found something that works only because of some kind of 'best practice' in the porn industry?
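
The 'not recognized' test could be sketched like this. It is only an illustration under assumptions: `build_index` and `classify` are hypothetical names, and the fingerprints are taken to be any string derived from an image's quantization tables (for example a hash of them). The point is that a lookup should have three outcomes, not one: no match, a unique match, or several sites sharing the same tables (which can happen when sites use the same encoder at the same quality setting).

```python
from collections import defaultdict

def build_index(samples):
    """Map each fingerprint to the set of sites it was observed on.

    'samples' is an iterable of (site, fingerprint) pairs.
    """
    index = defaultdict(set)
    for site, fp in samples:
        index[fp].add(site)
    return index

def classify(index, fp):
    """Return the sorted list of candidate sites for a fingerprint.

    An empty list means 'not recognized'; more than one entry means the
    tables alone cannot tell the candidate sites apart.
    """
    return sorted(index.get(fp, ()))
```

Running images from outside the master collection through `classify` and counting how many come back non-empty would give a first idea of the false-match rate.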
 
