PDF Metadata Tool?
I'm looking for a couple of good PDF metadata viewing/extracting tools. Metadataminer Catalogue does an okay job, and I'm waiting for a trial version of APGetInfo from Appligent, but if anybody has any other tools to recommend, that would be excellent.
I'm conducting an internal test and evaluation of various metadata tools, and right now PDF is the only format I need more tools for.
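As a quick baseline when comparing tools, the classic document information dictionary can often be read straight from the raw file bytes. Here's a minimal Python sketch. It assumes the Info entries are plain literal strings sitting outside any compressed object stream, and it ignores escape sequences, hex-string values and XMP metadata. `read_info` is a hypothetical helper, not part of any tool named in this thread.

```python
import re

# Standard keys of the document information dictionary (the "Info" dict).
INFO_KEYS = ("Title", "Author", "Subject", "Keywords",
             "Creator", "Producer", "CreationDate", "ModDate")


def read_info(data: bytes) -> dict:
    """Hypothetical helper: naively pull literal-string Info entries out of raw PDF bytes."""
    info = {}
    for key in INFO_KEYS:
        # Matches e.g. b"/Author (Jane Doe)"; values containing parentheses are skipped.
        pattern = rb"/" + key.encode() + rb"\s*\(([^()]*)\)"
        matches = re.findall(pattern, data)
        if not matches:
            continue
        raw = matches[-1]  # last occurrence, since incremental updates append newer objects
        if raw.startswith(b"\xfe\xff"):
            # UTF-16BE text string marked with a byte-order mark
            info[key] = raw[2:].decode("utf-16-be", errors="replace")
        else:
            # Latin-1 as a stand-in for PDFDocEncoding (close, not identical)
            info[key] = raw.decode("latin-1")
    return info


sample = b"%PDF-1.4\n1 0 obj\n<< /Title (Quarterly Report) /Author (J. Smith) >>\nendobj\n"
print(read_info(sample))  # {'Title': 'Quarterly Report', 'Author': 'J. Smith'}
```

Anything stored in a compressed object stream (common in PDF 1.5+ files) will be missed, so this only gives a lower bound on what a proper tool should report.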
Harlan, your Word Perl module looks very interesting - I'll probably be writing an extraction script using it as part of my tool evaluation.
Interesting location… that means you're very near me.
Thanks koko, that works nicely. Plus, it's command line!
Harlan - yup! I work in downtown DC doing forensics for a federal agency.
Anybody have any other tools for comparison?
koko, do you know anything about the PdfID0 and PdfID1 fields? They look like some sort of hash or checksum, which might be useful when verifying the authenticity of a partially downloaded (or otherwise mangled) PDF doc.
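I hadn't come across those fields before, so I checked them out. According to the PDF Reference (pdfreference16.pdf; you should download it and peruse it), they're the two halves of the file identifier: the first is fixed when the file is created, and the second changes when the file is updated. The spec suggests building it as an MD5 of various info about the file (the current time, the file's location, its size, and the document information dictionary values), so the doc has a unique string that identifies it without relying on the filename. It's not a checksum of the file's contents, though. There's a discussion of it in section 10.3.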
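To make the MD5 recipe concrete, here's a minimal Python sketch. `make_file_id` is a hypothetical helper. The spec names the inputs but not how to encode or order them, so the string encodings and the sorted key order below are assumptions.

```python
import hashlib
import time


def make_file_id(path: str, size: int, info: dict) -> bytes:
    """Hypothetical helper: MD5 over the inputs the PDF Reference suggests."""
    md5 = hashlib.md5()
    md5.update(repr(time.time()).encode("ascii"))   # the current time
    md5.update(path.encode("utf-8"))                # string form of the file's location
    md5.update(str(size).encode("ascii"))           # file size in bytes
    for _, value in sorted(info.items()):           # Info dict values (sorted by key: an assumption)
        md5.update(str(value).encode("utf-8"))
    return md5.digest()                             # 16 raw bytes


digest = make_file_id("/cases/report.pdf", 48213, {"Title": "Report", "Author": "J. Smith"})
print(digest.hex())  # changes from run to run because the time is hashed in
```

Because the time and path feed the hash, re-running this on a downloaded copy won't reproduce the original value. The digest identifies the document; it doesn't vouch for its bytes.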
From what I understand, if something's partially downloaded, you won't get the ID, because it's in the trailer at the end of the file. But you should still be able to decode the individual objects you did receive.
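One rough way to see what survived a truncated download is to scan the raw bytes for complete `N G obj ... endobj` objects and check whether an `/ID` made it in. This is a naive regex sketch, not a real parser. `salvage` is a hypothetical helper, and the code assumes uncompressed object syntax, hex-string IDs, and no stray `endobj` inside binary stream data. A proper tool would also use the cross-reference table. Linearized ("fast web view") files also carry a trailer near the start, so a partial download of one of those may still include a trailer.

```python
import re

# A complete indirect object: "N G obj ... endobj" (non-greedy, stops at the first endobj).
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.S)
# The file identifier as two hex strings: /ID [<hex> <hex>]
ID_RE = re.compile(rb"/ID\s*\[\s*<([0-9A-Fa-f\s]*)>\s*<([0-9A-Fa-f\s]*)>\s*\]")


def salvage(data: bytes):
    """Hypothetical helper: complete objects in possibly truncated PDF bytes, plus the /ID if present."""
    objects = {(int(m.group(1)), int(m.group(2))): m.group(3).strip()
               for m in OBJ_RE.finditer(data)}
    found = ID_RE.findall(data)
    if not found:
        return objects, None  # no /ID received
    # Last match wins: incremental updates append newer trailers at the end.
    first, second = (re.sub(rb"\s", b"", h).decode("ascii").lower() for h in found[-1])
    return objects, (first, second)


truncated = (b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
             b"2 0 obj\n<< /Type /Pages /Kids [3 0")
objs, ids = salvage(truncated)
print(sorted(objs))  # [(1, 0)]
print(ids)           # None
```

Object 1 comes through intact, object 2 is cut off mid-dictionary and is skipped, and no `/ID` is found because the trailer never arrived.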