PDF Metadata Tool?

General (Technical, Procedural, Software, Hardware etc.)

Last Post by koko 20 years ago

6 Posts

3 Users

colsanders

(@colsanders)

Active Member

Joined: 20 years ago

Posts: 8

Topic starter 15/03/2006 4:46 am [#767]

Greetings all,

I'm looking for a couple of good PDF metadata viewing/extracting tools. Metadataminer Catalogue does an okay job, and I'm waiting for a trial version of APGetInfo from Appligent, but if anybody has any other tools to recommend, that would be excellent.

I'm conducting an internal test and evaluation of various metadata tools, and right now PDF is the only format I need more tools for.

Harlan, your Word Perl module looks very interesting - I'll probably be writing an extraction script using it as part of my tool evaluation.

Cheers,
Sean

keydet89

(@keydet89)

Famed Member

Joined: 22 years ago

Posts: 3568

15/03/2006 5:50 pm

Sean,

Interesting location…means you're very near me.

Harlan

koko

(@koko)

Eminent Member

Joined: 21 years ago

Posts: 21

16/03/2006 1:15 am

i've had very good experience with pdftk, the pdf toolkit. best of all, it is free.

http://www.accesspdf.com/pdftk/
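
Since pdftk runs from the command line, it is easy to script. Here is a minimal sketch, assuming the `pdftk` binary is installed and on the PATH. `dump_data` is pdftk's own metadata-reporting operation; the two function names are made up for illustration.

```python
import subprocess


def dump_data_command(pdf_path):
    """Build the argument list for pdftk's dump_data operation."""
    return ["pdftk", pdf_path, "dump_data"]


def dump_metadata(pdf_path):
    """Run pdftk and return its metadata report as text.

    Assumes pdftk is installed; raises CalledProcessError if pdftk fails.
    """
    result = subprocess.run(
        dump_data_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `dump_metadata("some.pdf")` should hand back the same report you would see in the terminal, ready for further parsing.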

colsanders

(@colsanders)

Active Member

Joined: 20 years ago

Posts: 8

Topic starter 16/03/2006 8:32 pm

Thanks koko, that works nicely. Plus, it's command line!

Harlan - yup! I work in downtown DC doing forensics for a federal agency.

Anybody have any other tools for comparison?

Cheers,
Sean

colsanders

(@colsanders)

Active Member

Joined: 20 years ago

Posts: 8

Topic starter 16/03/2006 8:38 pm

koko, do you know anything about the PdfID0 and PdfID1 fields? They look like some sort of hash or checksum, which might be useful when verifying the authenticity of a partially-downloaded (or otherwise mangled) PDF doc.

Example:

…
InfoKey: CreationDate
InfoValue: D:20051014152450-07'00'
PdfID0: a563ff9aa22cdcfc73c39dde67f9f2a
PdfID1: 39da5c21b911045aa21b3d1a1a6ed45
NumberOfPages: 18
…

Thanks!
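
For a tool comparison it may help to turn that report into a structure that can be diffed against other tools' output. A rough sketch, assuming each line has the form `Key: value` as pdftk prints it, and that every `InfoKey` line is followed by its `InfoValue` line. `parse_dump_data` is a hypothetical helper, not part of pdftk.

```python
def parse_dump_data(text):
    """Split pdftk dump_data text into (top-level fields, Info dictionary)."""
    fields = {}
    info = {}
    pending_key = None
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # skip lines that are not "Key: value"
        if key == "InfoKey":
            pending_key = value  # remember the key until its value arrives
        elif key == "InfoValue" and pending_key is not None:
            info[pending_key] = value
            pending_key = None
        else:
            fields[key] = value
    return fields, info


# The excerpt from the post above, with the elided lines left out.
sample = """InfoKey: CreationDate
InfoValue: D:20051014152450-07'00'
PdfID0: a563ff9aa22cdcfc73c39dde67f9f2a
PdfID1: 39da5c21b911045aa21b3d1a1a6ed45
NumberOfPages: 18
"""

fields, info = parse_dump_data(sample)
print(fields["PdfID0"], fields["NumberOfPages"], info["CreationDate"])
```

All values stay as strings, so `NumberOfPages` comes back as `"18"`; convert it yourself if you need a number.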

koko

(@koko)

Eminent Member

Joined: 21 years ago

Posts: 21

17/03/2006 12:46 am

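i haven't come across those fields before, so i checked it out. according to the pdf reference (pdfreference16.pdf - you should download and peruse), it's a file identifier: a pair of strings stored in the file trailer, which pdftk reports as PdfID0 and PdfID1. the reference suggests computing it as an md5 of various info about the file (the current time, the file's path and size, and the document info values), so that there's a unique string to identify the doc without having to use the filename. that makes it an identifier, not a checksum of the file's contents. there's a discussion of it in section 10.3.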

from what i understand, if something's partially downloaded, you won't get the id because it comes in the trailer at the end. but you should be able to decode various objects that you did receive.
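
To illustrate the point about the trailer, here is a small sketch that scans raw PDF bytes for the `/ID` entry. It assumes both ID strings are written as hex strings (`<...>`), which is common but not required, and it uses a toy byte string in place of a real PDF. `find_file_id` is a hypothetical name.

```python
import re

# Matches an /ID entry written as two hex strings: /ID [<...> <...>]
ID_PATTERN = re.compile(
    rb"/ID\s*\[\s*<([0-9A-Fa-f]*)>\s*<([0-9A-Fa-f]*)>\s*\]"
)


def find_file_id(data):
    """Return the last (first ID, second ID) pair in raw PDF bytes, or None."""
    matches = ID_PATTERN.findall(data)
    if not matches:
        return None
    # An incrementally updated file can hold several trailers; the last wins.
    first, second = matches[-1]
    return first.decode("ascii"), second.decode("ascii")


# Toy data for illustration only, not a valid PDF.
complete = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"trailer\n<< /Root 1 0 R /ID [<0123456789abcdef0123456789abcdef>"
    b" <fedcba9876543210fedcba9876543210>] >>\n"
    b"startxref\n0\n%%EOF\n"
)
# Simulate a partial download that stops before the trailer.
truncated = complete[: complete.index(b"trailer")]

print(find_file_id(complete))
print(find_file_id(truncated))
```

The truncated copy yields no ID because the trailer never arrived. Even when the ID is present, a match only shows which document the file claims to be; it does not prove the bytes are intact.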
