I am looking for a program that can take a large group of assorted files (mainly Word documents, Excel spreadsheets, PowerPoint presentations and PDFs), extract the document-level metadata, and export it to a CSV file.
I do have Pinpoint's Metadiscover tool, but that only handles Microsoft files and it has been hit or miss at times for me.
I also tried metadata-extractor, but could not get the program to run correctly, and my attempts at troubleshooting failed.
I have also tried the exif gui which allows for the extraction of metadata from a group of files, but does not have a csv output option and also creates an individual output file for each original file analyzed.
Thanks for any help
In X-Ways Forensics,
1) Refine volume snapshot ("Extract internal metadata…")
2) Select the files and right-click to choose "Export list…"
3) Choose to export as TSV
That's it.
There are also a few dozen categories other than metadata that you can add to the export list (type, extension, hash, size, etc.).
EXIFTool is based on a Perl module…which means that you can modify it to do anything you like.
I taught the use of EXIFTool to identify metadata to a group of developers, and one of them turned around and used the module to create a tool to run through web directories, locate image files, and ID and remove all metadata prior to the site going live.
I would suggest that you could do something similar, or ask someone to assist you in doing the same thing.
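As a rough illustration of that idea, here is a minimal Python sketch (not the Perl module itself) that walks a directory tree, picks out document files by extension, and hands them to the exiftool command line with its `-csv` option. The function names, the extension list, and the output path are my own assumptions for illustration, and the export step only runs if exiftool is found on the PATH.

```python
import os
import shutil
import subprocess

# Extensions treated as "documents" here; adjust to taste (assumption).
DOC_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".pdf"}


def find_documents(root):
    """Return sorted paths under root whose extension is in DOC_EXTENSIONS."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in DOC_EXTENSIONS:
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def build_exiftool_command(paths):
    """Build an argument list asking exiftool for CSV output on stdout."""
    return ["exiftool", "-csv"] + list(paths)


def export_metadata(root, csv_path):
    """Write one CSV for all documents under root; False if nothing was run."""
    paths = find_documents(root)
    if not paths or shutil.which("exiftool") is None:
        return False
    with open(csv_path, "wb") as out:
        # exiftool may exit non-zero when some files cannot be read,
        # so the exit status is deliberately not treated as fatal here.
        subprocess.run(build_exiftool_command(paths), stdout=out, check=False)
    return True
```

Passing every path as an argument keeps the sketch short, but a very large file set could exceed the operating system's command-line length limit, so batching the paths may be needed.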
Take a look at Nuix's Proof Finder http//
You will need to create a custom column profile to display and export the information, but it is able to extract a variety of metadata from a number of file types.
It's $100.00 annually and all proceeds go to charity. It does have a 15 GB data set cap though.
Regards,
Jesse
You could try MetaExtractor.
http//
Disclaimer: I wrote the tool.
–
Chad
Chad,
Speaking of the tools, is there any reason why Link Parser doesn't parse the shell item ID lists?
http//
Note: I didn't include Link Parser in the testing, in part because my selection of tools was based on what others said they used. However, I did run the tool separately and found that it does not parse the shell item ID lists.
Thank you all for the responses.
Brett, I do not have access to a copy of X-Ways Forensics, so I could not try your method.
I was eventually able to get exiftool to read the entire data set once I moved everything into a single directory, and I then parsed the original document metadata dates and times I needed into a CSV file.
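For anyone doing something similar, the CSV step could look roughly like the Python sketch below. It assumes the metadata has already been read into one dictionary per file (for example from exiftool's JSON output, where each record carries a `SourceFile` key). The function name and the sample tag names and values are illustrative only, not taken from my actual data.

```python
import csv
import io


def records_to_csv(records, out):
    """Write dicts with differing keys to one CSV; return the column order."""
    # Start with SourceFile, then add each new tag name in first-seen order,
    # so files of different types can share a single table.
    fields = ["SourceFile"]
    for rec in records:
        for key in rec:
            if key not in fields:
                fields.append(key)
    # restval fills in an empty cell when a file lacks a given tag.
    writer = csv.DictWriter(out, fieldnames=fields, restval="")
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)
    return fields


if __name__ == "__main__":
    # Hypothetical sample records for illustration.
    sample = [
        {"SourceFile": "report.pdf", "CreateDate": "2012:01:02 03:04:05"},
        {"SourceFile": "budget.xlsx", "ModifyDate": "2012:02:03 04:05:06"},
    ]
    buffer = io.StringIO()
    records_to_csv(sample, buffer)
    print(buffer.getvalue())
```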
Chad, I gave MetaExtractor a try and it seemed to work very well; it is easy to use and does exactly what I wanted. One problem I found was that it sometimes would not parse all of the files within a specific folder. For example, I had a folder with 16 PDF files in it, but when I added that folder to MetaExtractor it only parsed 6.
Thanks again.
Speaking of the tools, is there any reason why Link Parser doesn't parse the shell item ID lists?
When I originally wrote the link parser, I didn't need the shell items. I'll have an update out in the next week or two that fixes it. If you have any files that are good to test with, I would appreciate them.
Chad, I gave MetaExtractor a try and it seemed to work very well; it is easy to use and does exactly what I wanted. One problem I found was that it sometimes would not parse all of the files within a specific folder. For example, I had a folder with 16 PDF files in it, but when I added that folder to MetaExtractor it only parsed 6.
Are the PDF files that it missed corrupt? Can you open them in Acrobat? Also, were the files missing from the output entirely or did they just not have metadata extracted (i.e. file name, path, md5 are there, just no metadata)?
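If it helps to answer that last question, a quick check along these lines would separate the two cases. This is only a sketch: the `SourceFile` column name and the function name are assumptions, not MetaExtractor's actual output layout, so set `identity_columns` to whatever columns hold the file name, path and hash in your export.

```python
import csv
import os


def reconcile(folder, csv_path, identity_columns=("SourceFile",)):
    """Return (missing, empty): PDFs absent from the CSV, and PDFs listed
    with nothing but identity columns filled in."""
    on_disk = {n for n in os.listdir(folder) if n.lower().endswith(".pdf")}
    missing = set(on_disk)
    empty = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # The first identity column is assumed to hold the file path.
            name = os.path.basename(row.get(identity_columns[0]) or "")
            missing.discard(name)
            has_metadata = any(
                value for key, value in row.items()
                if key not in identity_columns
            )
            if name in on_disk and not has_metadata:
                empty.append(name)
    return sorted(missing), sorted(empty)
```

Files in the first list never made it into the output at all, while files in the second list were seen but had no metadata extracted.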