
Document Metadata Extraction

18 Posts
10 Users
0 Reactions
2,257 Views
(@cvanaernam)
Active Member
Joined: 12 years ago
Posts: 10
Topic starter  

I am looking for a program that can take a large group of assorted files, mainly Word documents, Excel workbooks, PowerPoint presentations, and PDFs, extract the document-level metadata, and export it to a CSV file.

I do have Pinpoint's Metadiscover tool, but that only handles Microsoft files and it has been hit or miss at times for me.

I also tried metadata-extractor, but could not get the program to run correctly, and my troubleshooting did not fix it.

I have also tried the exif GUI, which can extract metadata from a group of files, but it has no CSV output option and creates a separate output file for each original file analyzed.

Thanks for any help


   
bshavers
(@bshavers)
Estimable Member
Joined: 20 years ago
Posts: 211
 

In X-Ways Forensics,
1) Refine volume snapshot ("Extract internal metadata…")
2) Select the files and right-click to choose "Export list…"
3) Choose TSV as the export format
That's it.

There are also a few dozen categories other than metadata that you can add to the export list (type, extension, hash, size, etc.).


   
keydet89
(@keydet89)
Famed Member
Joined: 21 years ago
Posts: 3568
 

ExifTool is based on a Perl module, which means that you can modify it to do anything you like.

I taught a group of developers how to use ExifTool to identify metadata, and one of them turned around and used the module to create a tool that runs through web directories, locates image files, and identifies and removes all metadata before the site goes live.

I would suggest that you could do something similar, or ask someone to assist you in doing the same thing.
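
For anyone who would rather script this than modify the Perl module directly, here is a minimal sketch in Python that shells out to the exiftool command line instead of using the module. It assumes exiftool is installed and on the PATH. The function names, the tag list, and the extension list are illustrative assumptions, not something from this thread; `-csv`, `-r`, and `-ext` are standard exiftool options.

```python
import subprocess
from pathlib import Path

# Illustrative defaults (assumptions, not taken from the thread).
DEFAULT_TAGS = ("Title", "Author", "CreateDate", "ModifyDate")
DEFAULT_EXTS = ("doc", "docx", "xls", "xlsx", "ppt", "pptx", "pdf")


def build_exiftool_command(root, tags=DEFAULT_TAGS, exts=DEFAULT_EXTS):
    """Build an exiftool command that prints one CSV row per file under root."""
    cmd = ["exiftool", "-csv", "-r"]      # CSV output, recurse into subfolders
    for ext in exts:
        cmd += ["-ext", ext]              # only process these file extensions
    cmd += [f"-{tag}" for tag in tags]    # only report these tags
    cmd.append(str(root))
    return cmd


def export_metadata_csv(root, out_csv):
    """Run exiftool and save its CSV output. Requires exiftool on the PATH."""
    result = subprocess.run(
        build_exiftool_command(root), capture_output=True, text=True
    )
    Path(out_csv).write_text(result.stdout, encoding="utf-8")
    # exiftool may exit non-zero when some files could not be read,
    # so the exit code is returned rather than raised.
    return result.returncode
```

Because `-r` recurses into subfolders, a run like `export_metadata_csv("case_files", "metadata.csv")` should produce a single CSV for the whole tree rather than one output file per document.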


   
(@jlindmar)
Eminent Member
Joined: 20 years ago
Posts: 30
 

Take a look at Nuix's Proof Finder: http://www.prooffinder.com/

You will need to create a custom column profile to display and export the information, but it is able to extract a variety of metadata from a number of file types.

It's $100.00 annually and all proceeds go to charity. It does have a 15 GB data set cap though.

Regards,

Jesse


   
(@chad131)
Trusted Member
Joined: 16 years ago
Posts: 63
 

You could try MetaExtractor.

http://www.4discovery.com/our-tools/#2

Disclaimer: I wrote the tool.


Chad


   
keydet89
(@keydet89)
Famed Member
Joined: 21 years ago
Posts: 3568
 

Chad,

Speaking of the tools, is there any reason why Link Parser doesn't parse the shell item ID lists?

http://windowsir.blogspot.com/2013/06/there-are-four-lights-lnk-parsing-tools.html

Note: I didn't include Link Parser in the testing, in part because my selection of tools was based on what others said they used. However, I did run the tool separately and found that it does not parse the shell item ID lists.


   
(@cvanaernam)
Active Member
Joined: 12 years ago
Posts: 10
Topic starter  

Thank you all for the responses.

Brett, I do not have access to a copy of X-Ways Forensics, so I could not try your method.

I was eventually able to get exiftool to read the entire data set once I moved everything into a single directory, and then to parse the original document metadata dates and times I needed into a CSV file.
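
The exact commands used are not given here, so the following is only a rough sketch of the second step: trimming a full exiftool CSV export down to the date and time columns. `SourceFile` is the first column exiftool writes in `-csv` mode; `CreateDate` and `ModifyDate` are assumed to be the dates of interest, and the function name is made up for illustration.

```python
import csv
import io


def keep_columns(csv_text, wanted=("SourceFile", "CreateDate", "ModifyDate")):
    """Return CSV text containing only the wanted columns, in the given order.

    Columns that are missing from the input are written as empty cells.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(wanted), lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Copy only the wanted cells; treat absent values as empty strings.
        writer.writerow({name: row.get(name) or "" for name in wanted})
    return out.getvalue()
```

Keeping this as a separate pass means the full export can be kept untouched as the record of what was extracted, with the reduced CSV generated from it as needed.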

Chad, I gave MetaExtractor a try and it seemed to work very well: it is easy to use and does exactly what I wanted. One problem I found was that it sometimes did not parse all the files within a folder. For example, I had a folder with 16 PDF files in it, but when I added that folder to MetaExtractor it parsed only 6.

Thanks again.


   
(@chad131)
Trusted Member
Joined: 16 years ago
Posts: 63
 

Speaking of the tools, is there any reason why Link Parser doesn't parse the shell item ID lists?

When I originally wrote Link Parser, I didn't need the shell items. I'll have an update out in the next week or two that fixes it. If you have any files that would be good to test with, I would appreciate them.


   
(@chad131)
Trusted Member
Joined: 16 years ago
Posts: 63
 

Chad, I gave MetaExtractor a try and it seemed to work very well: it is easy to use and does exactly what I wanted. One problem I found was that it sometimes did not parse all the files within a folder. For example, I had a folder with 16 PDF files in it, but when I added that folder to MetaExtractor it parsed only 6.

Are the PDF files that it missed corrupt? Can you open them in Acrobat? Also, were the files missing from the output entirely, or were they listed without any extracted metadata (i.e., the file name, path, and MD5 are there, but no metadata)?
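
One way to answer that last question is to compare the list of files in the folder against the rows of the output CSV. The sketch below is only an illustration: the column names passed in are hypothetical, since MetaExtractor's actual output columns are not described in this thread, and the function name is made up.

```python
import csv
import io


def classify_output(expected_paths, csv_text, path_column, metadata_columns):
    """Split expected files into two lists:

    missing     - files with no row in the CSV at all
    no_metadata - files with a row whose metadata columns are all empty
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = {row[path_column]: row for row in reader}
    missing, no_metadata = [], []
    for path in expected_paths:
        row = rows.get(path)
        if row is None:
            missing.append(path)
        elif not any((row.get(col) or "").strip() for col in metadata_columns):
            no_metadata.append(path)
    return missing, no_metadata
```

Files in the first list never reached the output, which points at file enumeration or a read failure; files in the second list were seen and hashed but produced no metadata, which points at the parser or at the documents themselves.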


   