Hi,
I am a final-year student at Northumbria University. As a project idea I have been looking into writing a PST analysis tool that will let me extract each email from a PST file created in the newer file format (Outlook 2003 onwards).
I have loaded a PST file into EnCase and been able to parse the information that way through the GUI. However, when looking at the hex of the PST, I cannot see any obvious structure to the file. I understand from the MSDN library that the PST file uses COM Structured Storage, but despite reading the documents I still cannot work out a logical way of parsing the file.
Is anyone aware of any research, in any format, that might help me with my studies?
Regards,
Matt
The PST structure is fairly complex. For a starting point, take a look at the LibPST project.
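As a first sanity check before digging into the internal structures, it can help to confirm which kind of PST you have. The sketch below reads a few header fields: the "!BDN" magic, the format version (wVer) and the encoding method byte (bCryptMethod). The offsets (10, 461 and 513) and the version values are taken from my reading of Microsoft's [MS-PST] header layout and should be treated as assumptions to check against the specification; the function names are made up for illustration.

```python
import struct

# Assumed header layout (verify against the [MS-PST] specification):
#   offset 0,  4 bytes: magic "!BDN"
#   offset 10, 2 bytes: wVer, little-endian (14/15 = ANSI, 23 or above = Unicode)
#   bCryptMethod: offset 461 in ANSI files, offset 513 in Unicode files
MAGIC = b"!BDN"
CRYPT_OFFSET_ANSI = 461
CRYPT_OFFSET_UNICODE = 513
CRYPT_NAMES = {0: "none", 1: "permute (compressible)", 2: "cyclic (high)"}


def describe_pst_header(header: bytes) -> dict:
    """Return the format, version and encoding method found in a PST header."""
    if len(header) < 514:
        raise ValueError("need at least 514 bytes of header")
    if header[:4] != MAGIC:
        raise ValueError("bad magic: not a PST file")
    (version,) = struct.unpack_from("<H", header, 10)
    if version in (14, 15):
        fmt, crypt_offset = "ANSI", CRYPT_OFFSET_ANSI
    elif version >= 23:
        fmt, crypt_offset = "Unicode", CRYPT_OFFSET_UNICODE
    else:
        raise ValueError(f"unrecognised wVer {version}")
    crypt = header[crypt_offset]
    return {
        "format": fmt,
        "version": version,
        "crypt": CRYPT_NAMES.get(crypt, f"unknown ({crypt})"),
    }


def describe_pst_file(path: str) -> dict:
    """Read the first 514 bytes of a file and describe its PST header."""
    with open(path, "rb") as fh:
        return describe_pst_header(fh.read(514))
```

Calling describe_pst_file on a PST created by Outlook 2003 or later would be expected to report the Unicode format. If the encoding method is anything other than "none", message text will not appear as plain bytes in the raw file.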
Thank you for the info, juddlawr. I have looked through it; it has already helped me somewhat and will no doubt prove invaluable.
I did come across an issue today, however. I created a standard PST file in Outlook (no encryption, compression, passwords etc.), created a single folder and added an email to that folder from my inbox. Using a keyword search in EnCase, I have been unable to find any matches for a word or string that I know was included in the email.
Is anyone aware of how messages are stored within the PST file? The site suggested above provides a lot of insight into getting certain details from the PST, but as yet I am unable to find anything relating to the message body (i.e. the message itself).
Thanks,
Matt
Hi Matt,
Did you have the PST file mounted in EnCase before running the keyword search?
Cheers Dave
(Surprised you can concentrate on your studies with the panto at the toon at the moment!!)
Dave,
The PST file was not mounted when I first ran the keyword search, which is why it returned no hits. Once I mounted the PST file in EnCase, the keywords were found, because EnCase had parsed the file.
Cheers,
Matt
(Champions League in 2 seasons... no worries! :P )
PST parsing is a massive undertaking. The text is normally encoded (the PST "codepage"), so it will not show up as plain text. Expect this to take you 4+ months of work. LibPST is not complete and has issues.
Also you'd be re-inventing the wheel.
You can use Visual Basic to extract information from PST files.
We took a copy of the PST and wrote some trivial code to extract the attachments.
A simple Google search turns up sample code.
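On the encoding point above, a rough way to see why a raw keyword search can miss is to look for the keyword under more than one byte encoding. The sketch below checks a buffer for the keyword as single-byte text (cp1252, picked here as an assumption for a Western code page) and as UTF-16LE, which is how the newer Unicode format stores strings. The function names are made up for illustration. If the PST also applies its own byte-level encoding (Outlook's "compressible encryption", which I believe is the default), neither form will match in the raw file, and the data has to be decoded first.

```python
def keyword_patterns(keyword: str) -> dict:
    """Return the byte patterns to look for, keyed by encoding name."""
    patterns = {}
    # cp1252 is an assumed single-byte code page; utf-16-le covers Unicode PSTs.
    for name in ("cp1252", "utf-16-le"):
        try:
            patterns[name] = keyword.encode(name)
        except UnicodeEncodeError:
            # The keyword cannot be represented in this encoding; skip it.
            continue
    return patterns


def find_keyword(data: bytes, keyword: str) -> dict:
    """Return every offset at which each encoded form of keyword occurs."""
    hits = {}
    for name, pattern in keyword_patterns(keyword).items():
        offsets = []
        start = data.find(pattern)
        while start != -1:
            offsets.append(start)
            start = data.find(pattern, start + 1)
        hits[name] = offsets
    return hits
```

For example, find_keyword(open("test.pst", "rb").read(), "invoice") returns a dictionary of offsets per encoding (the file name is a placeholder). The search is case-sensitive, and an empty result for both encodings suggests the content is encoded rather than absent.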