Hello all,
Just a quick question. Are there any tools for extracting metadata from PDF and DOC files? This can be GUI or CLI. I prefer something for Linux, but will take what I can get.
Also, I've read that PDF files may contain JavaScript. Is there a way to review the JavaScript within a PDF?
Along the same lines, is there a tool to review any macros in a DOC file?
Thank you in advance.
John
On the Windows side, Google turned up Pinpoint MetaViewer and Metadata Analyzer, both of which are freeware tools. I use Metadata Assistant from Payne Consulting.
For Acrobat, the JavaScript Console, which is included in the Acrobat JavaScript Debugger, is a good starting point.
For macro analysis in Word, try the Visual Basic editor or Visual Studio Express (free).
I've been using Workshare's product for about 3 years now for metadata analysis of Office files. It analyzes your files, shows you the contents, and also has a fairly robust cleansing feature. The version I've used doesn't analyze PDF files, though.
Thanks for the quick reply. I appreciate the help.
MetaViewer is a pretty good tool but it doesn't appear to work with Office 2007 documents.
Also, the Acrobat debugger requires the full version of Acrobat Pro, not the Reader.
I'll have to do some research on how to view Office macros with Visual Studio.
Thanks
MetaViewer is a pretty good tool but it doesn't appear to work with Office 2007 documents.
You must be looking at the old version. The new version works with 2007 just fine.
Also, the Acrobat debugger requires the full version of Acrobat Pro, not the Reader.
True.
I'll have to do some research on how to view Office macros with Visual Studio.
TechNet can be your friend.
Am I wrong in thinking that Office 2007 docs do not keep metadata?
I've used the Perl scripts included on the DVD with "Windows Forensic Analysis" very effectively for this. In fact, I've not only retrieved the last printed date from Word docs, but also retrieved metadata from Excel spreadsheets.
Am I wrong in thinking that Office 2007 docs do not keep metadata?
Office 2007 documents (.docx) are actually zip archives. The metadata is stored inside the archive at the path
docProps/core.xml
You can extract it to your current directory (assuming you have Unix or Cygwin) by running
unzip -j filename.docx docProps/core.xml
But you'll find that it mainly contains core properties such as the creator and last-modified-by information; application properties are stored separately in docProps/app.xml. Office 2007 does not keep all of the metadata kept in prior versions of Office.
This also means that if you have a pre-Office 2007 document and want to preserve all the metadata, don't save it as a .docx.
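If you would rather script this than unzip by hand, here is a minimal Python sketch of the same idea, using only the standard library. It opens the .docx as a zip archive, parses docProps/core.xml, and returns whatever properties happen to be present. The function names read_docx_properties and build_sample_docx are made up for illustration, and the sample archive is a hand-built stand-in (with invented values) so the sketch runs without a real document; it is not a valid Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_docx_properties(source, part="docProps/core.xml"):
    """Return {property name: text} for one metadata part of a .docx.

    `source` is a file path or a binary file-like object.
    (Hypothetical helper name, not part of any library.)
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read(part))
    props = {}
    for element in root:
        # Tags arrive as "{namespace-uri}name"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        props[local_name] = (element.text or "").strip()
    return props


def build_sample_docx():
    """Build a tiny stand-in archive holding only docProps/core.xml."""
    core_xml = (
        '<cp:coreProperties'
        ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
        ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
        '<dc:creator>Alice</dc:creator>'
        '<cp:lastModifiedBy>Bob</cp:lastModifiedBy>'
        '</cp:coreProperties>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Swap build_sample_docx() for a real path such as "filename.docx".
    for name, value in read_docx_properties(build_sample_docx()).items():
        print(f"{name}: {value}")
```

Passing part="docProps/app.xml" to the same function should list the application properties in the same way, assuming that part exists in the file you are examining.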