Steganography with ooXML (zip) - abusing zip structures

joakims
(@joakims)
Posts: 224
Estimable Member
Topic starter
 

Updated to version 4 - 14.12.11; http://www.mediafire.com/?m9hk90yv93lhfld

I thought I had come up with something brilliant and new and started writing code, and only realized when finalizing it that it has already been partly described. It is about stealthily hiding data inside a zip archive by abusing the zip file format. Think of it like data hiding in NTFS metafiles, but for zip instead:

http://www.codeproject.com/KB/security/steganodotnet16.aspx
http://www.reversinglabs.com/products/NyxEngine.php

At first I thought about a general zip approach, but moved more towards specializing it for ooXML (docx, pptx, xlsx, odp, odf, zip, etc). Because ooXML follows the zip specification, the steganography method will also work on zip archives in general. More methods will probably work if you do not focus on ooXML specifically. By "working" I mean, for instance, that Office opens the document without complaining about anything, while unzipping the archive does not reveal the hidden file. Some of the methods I started out with:

- Data hidden in the Extra Field (Office 2007/2010 will not complain). What I have focused on.
- Data hidden before central directory, after compressed data in local file header (Office 2007/2010 will complain about errors, but most zip handling software will not).
- Data hidden right before central directory end (Office 2007/2010 will complain about errors, but most zip handling software will not).
- Data hidden in the information fields inside central directory (Office 2007/2010 will complain about errors, but most zip handling software will not).

Note about Office 2007/2010
Office 2007 (and probably 2010) cannot repair or detect content in a docx with tampered information fields. This is extremely lame, as the zip structure is perfectly healthy and opens in most zip handling software (I just tried WinRar and Windows' own zip implementation). This is due to MS Office using its own headers inside the Extra Field (EF).

The method used in the first link above is different from mine, and will also produce errors when such a document is opened in MS Office. The second link seems to describe the issues at a very general level. They mention a lot of potential issues, but do not go into the details (I have browsed their documents). The program supplied with their SDK (NyxConsole.exe) is unable to identify the data hidden by the hider part of my suite (ooXML_Steganography_Hider.exe). It could be, though, that the console does not implement everything available in the SDK (I don't know). Apart from that, the console cannot handle broken archives with a damaged Central Directory (CDS).

So how does my program work?
It (ab)uses the Extra Fields (EF) tied to the Local File Headers (LFH) and the Central Directory (CDS). This is an information field that ooXML uses to store information about the growth of the file. According to the ooXML specification, this field can take up as much as 65535 bytes per entry (minus the size of the other fields of that entry). In practice, only the entries in the CDS can hold that much data without Office complaining. The EF in the LFH appears to handle a maximum of 256 or 512 bytes, and only for certain files. MS Office uses its own signature inside the EF, which always starts with "20 A2" followed by the total size of the EF minus 4 bytes. Only these first 4 bytes of the header are actually used, leaving the next 4 bytes of the header free, as well as the whole following block of either 256 or 512 bytes (docx and pptx). The xlsx documents seem to follow a slightly different approach and are currently not supported by my program when using the LFH methods.
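To make the two Extra Field locations concrete, here is a minimal Python sketch (not taken from the tool, whose source has not been published) that walks the Central Directory of a zip archive and returns the raw EF bytes stored in each entry's LFH and CDS record. The function name `read_extra_fields` is made up for illustration. It assumes a plain, single-disk, non-zip64 archive and that the last "PK\x05\x06" in the file really is the Central Directory End.

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # Central Directory End
CDH_SIG = b"PK\x01\x02"   # Central Directory entry
LFH_SIG = b"PK\x03\x04"   # Local File Header


def read_extra_fields(data: bytes):
    """Return a list of (name, lfh_extra, cds_extra) tuples for a zip held in memory."""
    eocd = data.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no Central Directory End record found")
    # Fixed 22-byte record: sig, disk, cd_disk, entries_on_disk, entries_total,
    # cd_size, cd_offset, comment_length
    _, _, _, _, total, _, cd_offset, _ = struct.unpack_from("<4s4H2LH", data, eocd)

    results = []
    pos = cd_offset
    for _ in range(total):
        cdh = struct.unpack_from("<4s6H3L5H2L", data, pos)  # fixed 46-byte part
        if cdh[0] != CDH_SIG:
            raise ValueError("bad Central Directory signature at offset %d" % pos)
        flags = cdh[3]
        name_len, extra_len, comment_len = cdh[10], cdh[11], cdh[12]
        lfh_offset = cdh[16]

        name_start = pos + 46
        encoding = "utf-8" if flags & 0x800 else "cp437"
        name = data[name_start:name_start + name_len].decode(encoding)
        cds_extra = data[name_start + name_len:name_start + name_len + extra_len]
        pos = name_start + name_len + extra_len + comment_len

        lfh = struct.unpack_from("<4s5H3L2H", data, lfh_offset)  # fixed 30-byte part
        if lfh[0] != LFH_SIG:
            raise ValueError("bad Local File Header signature at offset %d" % lfh_offset)
        lfh_name_len, lfh_extra_len = lfh[9], lfh[10]
        extra_start = lfh_offset + 30 + lfh_name_len
        lfh_extra = data[extra_start:extra_start + lfh_extra_len]

        results.append((name, lfh_extra, cds_extra))
    return results
```

Calling `read_extra_fields(open("container.docx", "rb").read())` (the file name is only an example) gives one tuple per internal file, so the LFH and CDS copies of the EF can be compared side by side.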

So how do you know where the data is located? I put a custom header in the 4 unused bytes of the header. The first 2 bytes say which part of a larger file this fragment belongs to; for smaller files this is always set to "01 00". The next 2 bytes hold the size of the hidden data. The hidden data always follows immediately after.
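The layout described above can be sketched as a small Python parser. This is only an illustration under assumptions: the function name `parse_hidden_block` is hypothetical, and the byte order is assumed to be little-endian (as elsewhere in zip, and consistent with "01 00" meaning fragment 1). An untouched Office file may hold other values in these 4 bytes, so a real tool would need more sanity checks than shown here.

```python
import struct

# "20 A2" read as a little-endian 16-bit value
MS_EF_ID = 0xA220


def parse_hidden_block(extra: bytes):
    """Return (fragment_number, hidden_bytes), or None if the layout does not match.

    Assumed layout of the Extra Field, following the description in the post:
      bytes 0-1  signature "20 A2"
      bytes 2-3  total size of the EF minus 4
      bytes 4-5  fragment number ("01 00" for a single-fragment file)
      bytes 6-7  size of the hidden data
      bytes 8-   hidden data
    """
    if len(extra) < 8:
        return None
    ef_id, _ef_size, fragment, hidden_size = struct.unpack_from("<4H", extra, 0)
    if ef_id != MS_EF_ID:
        return None
    if hidden_size == 0 or 8 + hidden_size > len(extra):
        return None  # declared size does not fit inside this EF
    return fragment, extra[8:8 + hidden_size]
```

Feeding it the EF bytes of a single entry returns the fragment number together with the payload, or `None` when the field does not look like a carrier.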

The suite comprises 2 applications, a hider and an unhider (ooXML_Steganography_Hider.exe and ooXML_Steganography_UnHider.exe). They should be self-explanatory, but I'll describe the images anyway, as there are some important things to note when using them.

Always start by choosing the file to be hidden. Tick the boxes for compression and, optionally, encryption (AES 256). Then click "Prepare secret data", and its size after processing will be given in the box marked in red. Then choose the container in which you want to hide the secret file. The program will then analyze the container and either enable or disable the controls for the different methods, depending on the size of the secret data. In the above image you see that the size is 152 bytes and you may choose between method 1, 2 or 3. Method 4 is only activated when the secret data exceeds 65024 bytes, which then requires splitting it up into fragments. Choose one of the available internal archives from the dropdown box, click on "Prepare archive", and then click on "Hide file". If the tickbox about timestamps is ticked, all 4 MACE (or MAC for FAT) timestamps are reset to their original values after modification. Now the data should be hidden. To unpack the data, launch the unhider;

Tick the same options as were used when hiding the data. When the target file has been opened, it is analyzed automatically. If hidden data is found, the suitable method is activated, with the right internal archives present in the dropdown. Click "Unhide data" to rip it out.

This is how it looks in a hex editor;

The start of the Local File Header (LFH) is marked with the upper red line. The start of the Extra Field is at offset 0x709. The required signature is marked. The next 4 bytes after that are my own header. There you can read that the size of the hidden data is 0x95 bytes, starting at 0x711.

Some limitations;
- MS Office / OpenOffice will wipe the extra field if the document is modified and saved. Other zip handling software does not behave like this: WinRar and Windows' own zip handler keep the hidden data after a mod & save operation.
- Program must be restarted after each hide/unhide (yes I know it's lame).
- Only 1 encryption password per method (if hiding more than 1 file).
- When fragmenting with method 4, the method can only be used once on the same container.
- Max size of secret file depends on the container. It is given by the number of CDS entries with empty EF, multiplied by 65024 bytes. On a minimal docx this equates to roughly 700 kB. A pptx takes far more by the way.
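The capacity rule in the last bullet can be sketched with Python's standard `zipfile` module, whose `ZipInfo.extra` attribute holds the EF read from the Central Directory. This is an illustrative estimate only: the helper name `estimate_capacity` is made up, and the 65024-byte figure is simply the per-entry size quoted above.

```python
import zipfile

# Per-entry payload size quoted in the post
FRAGMENT_SIZE = 65024


def estimate_capacity(path_or_file) -> int:
    """Estimate how many bytes fit in the CDS Extra Fields of a container.

    Counts the Central Directory entries whose Extra Field is empty and
    multiplies that count by the per-entry size.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        empty_entries = sum(1 for info in zf.infolist() if not info.extra)
    return empty_entries * FRAGMENT_SIZE
```

As a sanity check on the arithmetic, a container with 11 eligible entries would give 11 x 65024 = 715264 bytes, which is in line with the rough 700 kB figure for a minimal docx.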

When detecting this sort of thing you must carefully evaluate the EF both in the LFH and in the CDS. The CDS data is more easily seen in a hex editor, since the CDS is located at the end of the archive, right before the Central Directory End. The data hidden in the LFH is trickier to spot, unless you know what to look for. If you analyze the files programmatically it is much easier. Beware that the code is not very efficient and there is plenty of room for improvement. I will provide the source when it has been improved slightly; I think it is just too buggy to show off right now. Anyway, it proves the point that it works.

Tip
After a successful hide operation, try to unhide the data just to check for any mismatches in CDS signature detection. If any error occurs, redo the hide operation with a different encryption password and verify.

I have not really seen this described before, so I hope you find it interesting. I think it's very interesting.


 
Posted : 18/07/2011 5:47 am
(@koonaka)
Posts: 7
Active Member
 

Very interesting stuff. Have you tried to detect the presence of your process being applied to a zip file?
What are some applications for your system?

Adrian

 
Posted : 18/07/2011 7:26 pm
joakims
(@joakims)
 

Updated to version 2 of the toolkit; http://www.mediafire.com/file/t6ycso9tquboway/ooXML_Steganography_v2.zip

Test the modified container2.docx inside the download. Password is "joakim".

Fixes;
- Method 1 & 2 for xlsx's.
- Reduced the verbose output info.
- Changed the encryption algo back from 3DES to AES256.
- More correct activation/deactivation of method controls after initial analysis of the input file.

Note
MS Office will wipe the extra field if the document is modified and saved. The same behaviour seems to apply for OpenOffice. However, WinRar for instance will not wipe the extra field if the file is modified and saved, and nor will Windows' own zip handler.

@koonaka
I do not have a license for any of the good steganalysis software like StegAlyzer etc. If anyone has one, it would be nice to get some info about how it reports on these modified documents.

The only purpose with this is to show how data can be hidden in zip structured files.

 
Posted : 19/07/2011 12:15 am
joakims
(@joakims)
 

New version: http://www.mediafire.com/file/xod7y2riycfnmxu/ooXML_Steganography_v3.zip

Fixed
- The control showing the verbose information now works satisfactorily, and does not "freeze" any longer.
- Added a proper message to indicate whether the program has finished successfully. The last message should say "FINISHED!!".

Tip
After a successful hide operation, try to unhide the data just to check for any mismatches in CDS signature detection. If any error occurs, redo the hide operation with a different encryption password and verify.

Some updated news
I discovered that the comment field in the Central Directory End (CDE) can also be used for storing data. I may have been wrong in what I said about this earlier. The size limit is the same as for the EF in the CDS (0xFFFF minus the size of the rest of that entry). The behaviour of all software mentioned earlier is the same for this entry as well. But this method is probably the easiest to identify, since the data is located at the very end of the archive (end of CDE). According to the not-always-so-easy-to-understand ooXML documentation, it seems that data in the Extra Field may survive a mod & save operation if the archive is of type zip64.
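Checking the CDE comment field is straightforward with Python's standard `zipfile` module, which exposes the archive comment as `ZipFile.comment`. The sketch below is a quick triage aid, not a detector for the tool's format, and the helper name `archive_comment` is made up for illustration.

```python
import zipfile


def archive_comment(path_or_file) -> bytes:
    """Return the raw comment stored in the Central Directory End record."""
    with zipfile.ZipFile(path_or_file) as zf:
        return zf.comment
```

A non-empty result in an Office document is at least worth a closer look in a hex editor.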

 
Posted : 20/07/2011 12:09 am
joakims
(@joakims)
 

I got an evaluation copy of StegAlyzerSS, which I tested on these new-style Office documents. It failed to identify anything suspicious, so I sent them a note about it. They replied that it will be on their list of future enhancements. I have not yet tested other software in that category, though. Report here if you have.

 
Posted : 22/07/2011 10:58 am
joakims
(@joakims)
 

Just a note about signed documents. The digital signature of an MS Office document is still valid after data has been injected as already described (I tested). 😯 That means a document with a valid signature may still contain hidden data. The document content is still the same, but the file content is not. This is close to how the same thing works for signed executables.

 
Posted : 02/12/2011 2:50 am
joakims
(@joakims)
 

And to add even more to it: if the (Office 2007/2010) document is both encrypted and signed by the built-in functionality, data can still be hidden without invalidating the signature or even making Office complain about the file's integrity. Simply add the data to EOF! It seems that many places in such encrypted documents are not properly evaluated, including the header too!! 😯

 
Posted : 05/12/2011 4:00 am
jaclaz
(@jaclaz)
Posts: 5133
Illustrious Member
 

And to add even more to it: if the (Office 2007/2010) document is both encrypted and signed by the built-in functionality, data can still be hidden without invalidating the signature or even making Office complain about the file's integrity. Simply add the data to EOF! It seems that many places in such encrypted documents are not properly evaluated, including the header too!! 😯

If I get it right, this means that if one does

COPY /B myofficedoc.xlsx + myhidden.txt
will produce a still valid myofficedoc.xlsx but with the contents of myhidden.txt added to it?

jaclaz

 
Posted : 05/12/2011 6:16 pm
joakims
(@joakims)
 

If I get it right, this means that if one does

COPY /B myofficedoc.xlsx + myhidden.txt
will produce a still valid myofficedoc.xlsx but with the contents of myhidden.txt added to it?

That's true if myofficedoc.xlsx is encrypted.

 
Posted : 05/12/2011 7:34 pm
jaclaz
(@jaclaz)
 

That's true if myofficedoc.xlsx is encrypted.

Yep, of course.
I was thinking of something "hooking" the encrypting command and silently adding the whole doc unencrypted at the end, just for the fun of it (and to show how security is something too important to leave in the hands of the otherwise good MS guys).

jaclaz

 
Posted : 05/12/2011 11:46 pm