Document Metadata Extraction

General (Technical, Procedural, Software, Hardware etc.)

cvanaernam

(@cvanaernam)

Active Member

Joined: 12 years ago

Posts: 10

Topic starter 21/06/2013 4:00 am

I am looking for a program that can take a large group of assorted files, mainly Word documents, Excel workbooks, PowerPoint presentations and PDFs, extract the document-level metadata, and export it to a CSV file.

I do have Pinpoint's Metadiscover tool, but that only handles Microsoft files and it has been hit or miss at times for me.

I also tried metadata-extractor, but I could not get the program to run correctly, and troubleshooting did not help.

I have also tried the exif gui, which can extract metadata from a group of files, but it has no CSV output option and it creates a separate output file for each original file analyzed.

Thanks for any help

bshavers

(@bshavers)

Estimable Member

Joined: 20 years ago

Posts: 211

21/06/2013 4:38 am

In X-Ways Forensics,
1) Refine volume snapshot ("Extract internal metadata…")
2) Select the files and right-click to choose "Export list…"
3) Choose to export as TSV
That's it.

There are also a few dozen categories other than metadata that you can add to the export list (type, extension, hash, size, etc.).
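
Since the original request was for CSV output, a TSV export like the one described above could be converted with a few lines of Python. This is only a minimal sketch: the function name and file names are hypothetical, and it assumes the exported list is plain tab-delimited text in the encoding passed in (UTF-8 here is a guess, not something confirmed for X-Ways).

```python
import csv


def tsv_to_csv(tsv_path, csv_path, encoding="utf-8"):
    """Rewrite a tab-separated export as comma-separated text; return the row count.

    The input encoding is an assumption; change it if the export uses another one.
    """
    rows = 0
    with open(tsv_path, newline="", encoding=encoding) as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        # csv.reader splits on tabs; csv.writer quotes any field containing a comma.
        for row in csv.reader(src, delimiter="\t"):
            writer.writerow(row)
            rows += 1
    return rows
```

For example, `tsv_to_csv("export.tsv", "export.csv")` would write `export.csv` next to a hypothetical `export.tsv`.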

keydet89

(@keydet89)

Famed Member

Joined: 21 years ago

Posts: 3568

21/06/2013 5:44 pm

EXIFTool is based on a Perl module…which means that you can modify it to do anything you like.

I taught the use of EXIFTool to identify metadata to a group of developers, and one of them turned around and used the module to create a tool to run through web directories, locate image files, and ID and remove all metadata prior to the site going live.

I would suggest that you could do something similar, or ask someone to assist you in doing the same thing.
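
As one hedged illustration of scripting around ExifTool, the sketch below uses Python rather than the Perl module mentioned above. It shells out to the `exiftool` command-line program with its `-json` option and flattens the result into CSV text. The field list, function names and file names are hypothetical choices for illustration, and it assumes `exiftool` is installed and on the PATH.

```python
import csv
import io
import json
import subprocess

# Hypothetical selection of tags; adjust to whatever the case needs.
FIELDS = ["SourceFile", "Author", "Title", "CreateDate", "ModifyDate"]


def run_exiftool(paths):
    """Run the exiftool CLI on the given paths and return parsed JSON records.

    Assumes the `exiftool` executable is installed and on the PATH.
    """
    result = subprocess.run(
        ["exiftool", "-json", *paths],
        capture_output=True, text=True, check=False,
    )
    # exiftool prints a JSON array with one object per file.
    return json.loads(result.stdout or "[]")


def records_to_csv(records, fields=FIELDS):
    """Flatten ExifTool-style records (a list of dicts) into CSV text.

    Tags missing from a record become empty cells; extra tags are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

A call such as `records_to_csv(run_exiftool(["report.docx", "slides.pptx"]))` would then return one CSV row per file, with both file names being placeholders.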

jlindmar

(@jlindmar)

Eminent Member

Joined: 20 years ago

Posts: 30

21/06/2013 7:07 pm

Take a look at Nuix's Proof Finder: http://www.prooffinder.com/

You will need to create a custom column profile to display and export the information, but it is able to extract a variety of metadata from a number of file types.

It's $100.00 annually and all proceeds go to charity. It does have a 15 GB data set cap though.

Regards,

Jesse

chad131

(@chad131)

Trusted Member

Joined: 16 years ago

Posts: 63

21/06/2013 11:17 pm

You could try MetaExtractor.

http://www.4discovery.com/our-tools/#2

Disclaimer: I wrote the tool.

–
Chad

keydet89

(@keydet89)

Famed Member

Joined: 21 years ago

Posts: 3568

21/06/2013 11:24 pm

Chad,

Speaking of the tools, is there any reason why Link Parser doesn't parse the shell item ID lists?

http://windowsir.blogspot.com/2013/06/there-are-four-lights-lnk-parsing-tools.html

Note: I didn't include Link Parser in the testing, in part because my selection of tools was based on what others said they used. However, I did run the tool separately and found that it does not parse the shell item ID lists.

cvanaernam

(@cvanaernam)

Active Member

Joined: 12 years ago

Posts: 10

Topic starter 21/06/2013 11:42 pm

Thank you all for the responses.

Brett, I do not have access to a copy of X-Ways Forensics, so I could not try your method.

I was eventually able to get exiftool to read the entire data set once I moved everything into a single directory, and then to parse the original document metadata dates and times I needed into a CSV file.
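
For anyone wanting to reproduce a run like this, here is a minimal sketch of one way to drive it from Python. It is not the exact command used in this thread. It relies on ExifTool's `-csv` option (one row per file, first column `SourceFile`) and `-r` option (recurse into subdirectories), which may remove the need to flatten the folder structure first. The `CreateDate` and `ModifyDate` tags are an assumption about which dates were needed, and the function names and paths are hypothetical.

```python
import subprocess


def build_exiftool_csv_command(root, tags=("CreateDate", "ModifyDate")):
    """Build an exiftool command line that emits one CSV row per file.

    -csv : print tabular CSV to standard output
    -r   : recurse into subdirectories of `root`
    """
    command = ["exiftool", "-csv", "-r"]
    command += [f"-{tag}" for tag in tags]  # e.g. -CreateDate, -ModifyDate
    command.append(str(root))
    return command


def export_metadata_csv(root, output_csv):
    """Run exiftool (assumed to be on the PATH) and save its CSV output."""
    with open(output_csv, "w", encoding="utf-8", newline="") as handle:
        subprocess.run(build_exiftool_csv_command(root), stdout=handle, check=False)
```

Calling `export_metadata_csv("evidence", "metadata.csv")` would write the CSV for a hypothetical `evidence` folder.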

Chad, I gave MetaExtractor a try and it seemed to work very well: it is easy to use and does exactly what I wanted. One problem I found was that it sometimes did not parse all the files within a specific folder. For example, I had a folder with 16 PDF files in it, but when I added that folder to MetaExtractor it only parsed 6.

Thanks again.

chad131

(@chad131)

Trusted Member

Joined: 16 years ago

Posts: 63

21/06/2013 11:47 pm

Speaking of the tools, is there any reason why Link Parser doesn't parse the shell item ID lists?

When I originally wrote the link parser, I didn't need the shell items. I'll have an update out in the next week or two that fixes it. If you have any files that are good to test with, I would appreciate them.

chad131

(@chad131)

Trusted Member

Joined: 16 years ago

Posts: 63

21/06/2013 11:55 pm

Chad, I gave MetaExtractor a try and it seemed to work very well: it is easy to use and does exactly what I wanted. One problem I found was that it sometimes did not parse all the files within a specific folder. For example, I had a folder with 16 PDF files in it, but when I added that folder to MetaExtractor it only parsed 6.

Are the PDF files that it missed corrupt? Can you open them in Acrobat? Also, were the files missing from the output entirely or did they just not have metadata extracted (i.e. file name, path, md5 are there, just no metadata)?
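
A rough way to triage the corruption question before opening each file by hand is sketched below. It only checks for the `%PDF-` signature at the start of the file and an `%%EOF` marker near the end. A file can pass both checks and still fail to open in Acrobat, and some readers tolerate files that fail this strict check, so treat it as a first-pass filter only. The function names and folder layout are hypothetical.

```python
from pathlib import Path


def quick_pdf_check(path):
    """Return a dict of cheap structural checks for one file.

    header_ok  : the file starts with the '%PDF-' signature
    trailer_ok : an '%%EOF' marker appears in the last 1024 bytes
    """
    data = Path(path).read_bytes()
    return {
        "file": str(path),
        "size": len(data),
        "header_ok": data.startswith(b"%PDF-"),
        "trailer_ok": b"%%EOF" in data[-1024:],
    }


def triage_folder(folder):
    """Check every *.pdf directly inside `folder`, sorted by name."""
    return [quick_pdf_check(p) for p in sorted(Path(folder).glob("*.pdf"))]
```

Files that report `header_ok` or `trailer_ok` as false would be the first candidates to compare against the ones missing from the output.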
