
PDF Metadata Tool?  

colsanders
(@colsanders)
New Member

Greetings all,

I'm looking for a couple of good PDF metadata viewing/extracting tools. Metadataminer Catalogue does an okay job, and I'm waiting for a trial version of APGetInfo from Appligent, but if anybody has any other tools to recommend, that would be excellent.

I'm conducting an internal test and evaluation of various metadata tools, and right now PDF is the only format I need more tools for.

Harlan, your Word Perl module looks very interesting - I'll probably be writing an extraction script using it as part of my tool evaluation.

Cheers,
Sean

Posted : 15/03/2006 3:46 am
keydet89
(@keydet89)
Community Legend

Sean,

Interesting location…means you're very near me.

Harlan

Posted : 15/03/2006 4:50 pm
koko
(@koko)
New Member

I've had a very good experience with pdftk, the PDF toolkit. Best of all, it's free.

http://www.accesspdf.com/pdftk/

Posted : 16/03/2006 12:15 am
colsanders
(@colsanders)
New Member

Thanks, koko, that works nicely. Plus, it's command-line!

Harlan - yup! I work in downtown DC doing forensics for a federal agency.

Anybody have any other tools for comparison?

Cheers,
Sean

Posted : 16/03/2006 7:32 pm
colsanders
(@colsanders)
New Member

koko, do you know anything about the PdfID0 and PdfID1 fields? It looks like some sort of hash or checksum - which might be useful when verifying the authenticity of a partially-downloaded (or otherwise mangled) PDF doc.

Example:

InfoKey: CreationDate
InfoValue: D:20051014152450-07'00'
PdfID0: a563ff9aa22cdcfc73c39dde67f9f2a
PdfID1: 39da5c21b911045aa21b3d1a1a6ed45
NumberOfPages: 18

Thanks!
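For anyone scripting a comparison like this, here is a minimal Python sketch that turns a pdftk `dump_data` report into a dictionary and decodes PDF date strings such as the CreationDate above. It assumes each field is printed as `Key: value` with InfoKey/InfoValue pairs on consecutive lines, and that dates follow the PDF Reference's `D:YYYYMMDDHHmmSS` form with an optional `Z`, `+` or `-` offset like `-07'00'`. The names `parse_dump_data`, `parse_pdf_date` and `pdftk_dump_data` are hypothetical helpers, not part of pdftk.

```python
import re
import subprocess
from datetime import datetime, timedelta, timezone


def parse_dump_data(text):
    """Parse `Key: value` lines from a pdftk dump_data report (assumed format)."""
    result = {"Info": {}}
    pending_key = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # blank lines or markers without a colon are skipped
        key, value = key.strip(), value.strip()
        if key == "InfoKey":
            pending_key = value
        elif key == "InfoValue" and pending_key is not None:
            result["Info"][pending_key] = value  # pair with the preceding InfoKey
            pending_key = None
        else:
            result[key] = value  # assumes one value per key (e.g. PdfID0)
    return result


# D:YYYYMMDDHHmmSS, each field after the year optional, then an optional
# Z, + or - offset such as -07'00'
_PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+-])(?:(\d{2})'?(\d{2})?'?)?)?"
)


def parse_pdf_date(value):
    """Convert a PDF date string to a datetime (naive if no offset is given)."""
    m = _PDF_DATE.match(value)
    if not m:
        raise ValueError(f"not a PDF date string: {value!r}")
    year, month, day, hour, minute, second, sign, tz_h, tz_m = m.groups()
    tz = None
    if sign in ("Z", "z"):
        tz = timezone.utc
    elif sign:
        offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
        tz = timezone(-offset if sign == "-" else offset)
    return datetime(int(year), int(month or 1), int(day or 1),
                    int(hour or 0), int(minute or 0), int(second or 0),
                    tzinfo=tz)


def pdftk_dump_data(path):
    """Run pdftk (must be on PATH) and parse the report it writes to stdout."""
    proc = subprocess.run(["pdftk", path, "dump_data"],
                          capture_output=True, text=True, check=True)
    return parse_dump_data(proc.stdout)


if __name__ == "__main__":
    sample = """InfoKey: CreationDate
InfoValue: D:20051014152450-07'00'
PdfID0: a563ff9aa22cdcfc73c39dde67f9f2a
PdfID1: 39da5c21b911045aa21b3d1a1a6ed45
NumberOfPages: 18"""
    data = parse_dump_data(sample)
    print(data["NumberOfPages"])  # 18
    print(parse_pdf_date(data["Info"]["CreationDate"]).isoformat())
    # 2005-10-14T15:24:50-07:00
```

Keeping the raw strings in the dictionary and decoding dates separately means the original values stay available for the report, which matters when comparing what different tools display.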

Posted : 16/03/2006 7:38 pm
koko
(@koko)
New Member

I haven't come across those fields before, so I checked them out. According to the PDF Reference (pdfreference16.pdf; you should download and peruse it), they make up the file identifier: typically an MD5 hash of various information about the file, so the document has a unique string that identifies it without relying on the filename. There's a discussion of it in section 10.3.
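To illustrate why this isn't a content checksum: the ID is an array of two byte strings (PdfID0 is the permanent identifier from when the file was created, PdfID1 changes whenever the file is updated), and the reference recommends feeding MD5 things like the current time, the file's location and size, and the Info dictionary values. The Python sketch below, a hypothetical `make_file_id`, follows that recommendation; real PDF producers may use other inputs. Since the time and path can't be recovered from the file, you generally can't recompute the ID to check a download; hashing the whole file is the better tool for that.

```python
import hashlib
import time


def make_file_id(location, size, info, now=None):
    """Hypothetical file identifier built from the PDF Reference's suggested
    inputs: current time, file location, file size and Info dictionary values.
    Returns hex for readability; the PDF stores the 16 digest bytes, usually
    written as a hex string."""
    now = time.time() if now is None else now
    md5 = hashlib.md5()
    md5.update(repr(now).encode())             # current time
    md5.update(location.encode("utf-8"))       # file location (path or URL)
    md5.update(str(size).encode())             # file size in bytes
    for key in sorted(info):                   # Info dictionary values
        md5.update(str(info[key]).encode("utf-8"))
    return md5.hexdigest()


info = {"CreationDate": "D:20051014152450-07'00'"}
first = make_file_id("/cases/report.pdf", 1024, info, now=0.0)
later = make_file_id("/cases/report.pdf", 1024, info, now=1.0)
print(len(first), first != later)  # 32 True
```

Same file, same metadata, different timestamp, different ID: the identifier tells you which document you're looking at, not whether its bytes are intact.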

From what I understand, if something's partially downloaded, you won't get the ID, because it's stored in the trailer at the end of the file. But you should still be able to decode the various objects you did receive.
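A rough way to check a partial download for the identifier is to search the raw bytes for the trailer's /ID entry. This Python sketch assumes both ID strings are written as hex strings (`<...>`), which is common but not guaranteed, and takes the last match because incremental updates append newer trailers. Linearized ("fast web view") files also carry a first-page trailer near the start, so a truncated linearized file may still yield an ID. `find_trailer_ids` and `read_ids` are hypothetical names.

```python
import re

# An /ID entry whose two elements are hex strings, e.g. /ID [<A563...><39DA...>].
# Assumption: literal-string IDs such as /ID [(...)(...)] are not handled.
_ID_RE = re.compile(rb"/ID\s*\[\s*<([0-9A-Fa-f\s]*)>\s*<([0-9A-Fa-f\s]*)>\s*\]")


def _clean_hex(raw):
    """Drop whitespace (allowed inside PDF hex strings) and lowercase."""
    return re.sub(rb"\s", b"", raw).decode("ascii").lower()


def find_trailer_ids(data):
    """Return the last hex /ID pair in raw PDF bytes, or None if absent.

    The last match is used because incremental updates append a newer
    trailer after the original one."""
    matches = list(_ID_RE.finditer(data))
    if not matches:
        return None
    first, second = matches[-1].groups()
    return _clean_hex(first), _clean_hex(second)


def read_ids(path):
    """Read a whole file from disk and look for its /ID pair."""
    with open(path, "rb") as f:
        return find_trailer_ids(f.read())


complete = (b"%PDF-1.4\n...\ntrailer\n<< /Size 5 /ID [<A563FF9A><39DA5C21>] >>\n"
            b"startxref\n0\n%%EOF\n")
print(find_trailer_ids(complete))       # ('a563ff9a', '39da5c21')
print(find_trailer_ids(complete[:20]))  # None: cut off before the trailer
```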

Posted : 16/03/2006 11:46 pm