Email deduplication


Email deduplication

Post Posted: Thu Jan 17, 2013 12:36 am

Hello all,

I am new to this site and have been in the digital forensics field for a few months. I recently found out that FTK does not hash emails. I have an email set that comes from the PST files of an Exchange server and from a laptop, and it contains a huge number of duplicate emails. Any suggestions on how to dedupe them? Thanks in advance. Much appreciated.


Re: Email deduplication

Post Posted: Fri Jan 18, 2013 3:41 am

I have done some research and now understand why it is forensically unsound to hash email to begin with. My question now is how you go about dealing with the huge number of "duplicate" emails on a server. It's my understanding that the same email sent to multiple people will result in multiple files, but how do you deal with duplicates across separate email sets, one from an email server and one from a personal computer whose mail goes through that server?


Re: Email deduplication

Post Posted: Fri Jan 18, 2013 3:28 pm

The forensically sound way is to do the same tasks and analysis on both sources. It's not too inconvenient, as email analysis allows heavily automated processes.
But never merge artifacts! In my opinion, the idea is so prone to errors that it isn't even suitable for e-discovery. Consider that your server and client sources serve completely different purposes and are subject to different human interference. Usually, duplicates should be eliminated only when matching individual results in common timelines, link charts, etc. Before that point, the fact that a communication left traces on two or more systems is information in itself.
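As an illustration of keeping the sources separate while still seeing the overlap, here is a minimal Python sketch. It assumes each item has already been given some matching key by your own tooling; the `(source, key)` record layout and the names `group_by_key`, `"server"` and `"laptop"` are hypothetical and not taken from any forensic product. Nothing is deleted or merged: the result only records which sources hold a trace of each communication.

```python
from collections import defaultdict


def group_by_key(records):
    """Map each matching key to the set of sources it was found in.

    `records` is an iterable of (source, key) pairs. Both fields are
    placeholders for whatever your own export provides.
    """
    groups = defaultdict(set)
    for source, key in records:
        groups[key].add(source)
    return dict(groups)


records = [
    ("server", "msg-a"),
    ("laptop", "msg-a"),
    ("server", "msg-b"),
]

groups = group_by_key(records)

# Keys seen on more than one system; the originals stay untouched.
on_multiple_systems = sorted(k for k, s in groups.items() if len(s) > 1)
```

The overlap list can then be used at the timeline or link-chart stage, which is where the post suggests duplicates should be resolved.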



Re: Email deduplication

Post Posted: Fri Jan 18, 2013 6:57 pm

Deduplication of e-mail is a touchy subject.

What are you going to deduplicate on?

In my experience, deduplication across multiple mailboxes using to, from, subject, date & time, and sometimes unique ID works, but it is still fraught with many issues.

For example, date & time: which one? What if the client software makes automagic timezone adjustments? And "to": is it the verified source, the SMTP "to" field? What about aliases, or "sent in name of"?
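To make the field questions above concrete, here is a rough Python sketch of a metadata key built from from, to, subject and date. It is one possible set of choices, not an agreed standard: it assumes you are working from the message header strings, uses the Date header (not a server received time), converts it to UTC so a client-side timezone shift does not change the key, and reduces addresses to bare lowercase form. The function name `metadata_key` is hypothetical. It does nothing about aliases or "sent in name of".

```python
import hashlib
from datetime import timezone
from email.utils import getaddresses, parsedate_to_datetime


def metadata_key(from_hdr, to_hdr, subject, date_hdr):
    """Build a comparison key from four header values (illustrative only)."""
    # Bare addresses, lowercased and sorted: display names and
    # recipient order no longer affect the key.
    senders = sorted(addr.lower() for _, addr in getaddresses([from_hdr]))
    recipients = sorted(addr.lower() for _, addr in getaddresses([to_hdr]))

    # Normalize the Date header to UTC. Assumption: the header carries an
    # explicit offset; a header without one would be read as local time.
    sent = parsedate_to_datetime(date_hdr).astimezone(timezone.utc)

    # Collapse whitespace and case in the subject.
    subject_norm = " ".join(subject.split()).lower()

    raw = "|".join(
        [",".join(senders), ",".join(recipients), subject_norm, sent.isoformat()]
    )
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


key_server = metadata_key(
    "Alice <Alice@Example.com>",
    "bob@example.com, Carol <carol@example.com>",
    "Re:  Budget",
    "Thu, 17 Jan 2013 00:36:00 +0000",
)
key_laptop = metadata_key(
    "alice@example.com",
    "carol@example.com, bob@example.com",
    "re: budget",
    "Wed, 16 Jan 2013 19:36:00 -0500",
)
key_other = metadata_key(
    "alice@example.com",
    "bob@example.com",
    "re: budget",
    "Thu, 17 Jan 2013 00:37:00 +0000",
)
```

Here `key_server` and `key_laptop` match because they describe the same instant and the same parties, while `key_other` differs. Every normalization step is a decision that can hide a real difference, so each one would need to be documented and agreed.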

I experimented with using a percentage of the content as part of the deduplication, but a simple version change or an automatic conversion from HTML to rich text to plain text would mess the whole thing up. The process requires normalizing all messages to a single format, deduplicating the normalized copies, and then marking the matching originals.
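The normalize, deduplicate, then mark sequence could look roughly like the Python sketch below. It is a simplification under stated assumptions: bodies are available as strings, only HTML and plain text are handled (rich text is not), and the names `normalize_body`, `body_hash`, `mark_duplicates` and the item IDs are hypothetical. The originals are never altered; the output only maps each item to the first item with the same normalized body.

```python
import hashlib
from html.parser import HTMLParser

# Tags treated as line or paragraph boundaries (an assumption, not a full list).
BLOCK_TAGS = {"p", "br", "div", "li", "tr"}


class _TextExtractor(HTMLParser):
    """Collect text content, inserting a space at block boundaries."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def normalize_body(body, is_html=False):
    """Reduce a body to lowercase text with single spaces."""
    if is_html:
        parser = _TextExtractor()
        parser.feed(body)
        parser.close()
        body = "".join(parser.parts)
    return " ".join(body.split()).lower()


def body_hash(body, is_html=False):
    return hashlib.sha256(normalize_body(body, is_html).encode("utf-8")).hexdigest()


def mark_duplicates(messages):
    """Map each item ID to the first item ID with the same normalized body.

    `messages` is a list of (item_id, body, is_html) tuples.
    """
    first_seen = {}
    marks = {}
    for item_id, body, is_html in messages:
        marks[item_id] = first_seen.setdefault(body_hash(body, is_html), item_id)
    return marks


messages = [
    ("srv-1", "<p>Hello <b>team</b>,</p><p>see attached.</p>", True),
    ("lap-7", "Hello team,\n see attached.", False),
    ("srv-2", "Different text", False),
]
marks = mark_duplicates(messages)
```

In this run `lap-7` is marked as matching `srv-1`, and `srv-2` matches only itself. A body hash like this would normally be combined with header fields, since two different messages can share the same text.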

All deduplication methods should be agreed at the meet & confer, and you had better be there, or you will end up with a mess on your hands, such as an agreement to deduplicate a single mailbox . . .


Re: Email deduplication

Post Posted: Sun Jan 20, 2013 5:54 pm

Thank you for the replies. Email deduplication seems rather blurred from case to case and an extremely touchy subject. Learned a lot in the process though. Thanks again.  


Re: Email deduplication

Post Posted: Mon Jan 21, 2013 3:42 am

I can describe what one CF person did in a matter where it was stated that de-dupe was needed.

They de-duped by message number; I'll elaborate a bit.

Many times messages are listed as Message 01 or Message 001. This person thought that they could just take the first Message 001 and delete all the other Message 001s. What difference would it make? There were numerous email addresses, each one having a Message 001. PSTs, AOL, POP3, it was all there.
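A small Python sketch shows why that approach loses data. The mailbox names and the function `dedupe_by_label` are made up for illustration; the point is only that a per-mailbox sequence label such as "Message 001" says nothing about which message it is.

```python
def dedupe_by_label(items):
    """Keep the first item per label, ignoring the mailbox (the flawed method)."""
    seen = set()
    kept = []
    for mailbox, label in items:
        if label in seen:
            continue  # discarded even though it is a different message
        seen.add(label)
        kept.append((mailbox, label))
    return kept


items = [
    ("alice.pst", "Message 001"),
    ("bob.pst", "Message 001"),
    ("carol-aol", "Message 001"),
    ("alice.pst", "Message 002"),
]

kept = dedupe_by_label(items)
lost = len(items) - len(kept)
```

Only the two items from the first mailbox survive, and two unrelated messages from the other mailboxes are dropped.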

It was a sad day for one side. Their CF person had dug in on their statements, when in fact there were up to 10x more emails, missed because of a poorly thought-out de-dupe, and those emails spelled the case out in black and white.

Alan B. "A man can live a good life, be honorable, give to charity, but in the end, the number of people who come to his funeral is generally dependent on the weather. " 

