Email deduplication


Posted: Wed Jan 16, 2013 7:36 pm

Hello all,

I am new to this site and have been in the digital forensics field for a few months. I recently found out that FTK does not hash emails. I have an email set drawn from PST files from an Exchange server and from a laptop, and it contains a huge number of duplicate emails. Any suggestions on how to dedupe the emails? Thanks in advance. Much appreciated.

D4CS
Newbie
 
 
  

Re: Email deduplication

Posted: Thu Jan 17, 2013 10:41 pm

I have done some research and understand why it is forensically unsound to hash email to begin with. My question now is how you go about dealing with the huge number of "duplicate" emails on a server. It's my understanding that the same email sent to multiple people will result in multiple files, but how do you deal with duplicates across separate email sets, one from an email server and one from a personal computer that goes through that server?

D4CS
Newbie
 
 
  

Re: Email deduplication

Posted: Fri Jan 18, 2013 10:28 am

Hi,
The forensically sound way is to do the same tasks and analysis on both sources. It's not too inconvenient, as email analysis allows heavily automated processes.
But never merge artifacts! In my opinion, the idea is so prone to errors that it isn't even suitable for e-discovery. Consider that your server and client sources serve completely different purposes and are subject to different human interference. Usually, duplicates should be eliminated only when matching individual results in common timelines, link charts, etc. Before that point, the fact that a communication left traces on two or more systems is information in itself.

-Richard  

C.R.S.
Senior Member
 
 
  

Re: Email deduplication

Posted: Fri Jan 18, 2013 1:57 pm

Deduplication of e-mail is a touchy subject.

What are you going to deduplicate on?

In my experience, deduplication across multiple mailboxes using to, from, subject, date & time, and sometimes unique ID works, but it is still fraught with many issues.

For example, date & time: which one? What if there are automagic timezone adjustments by client software? The "to" field: is it the verified source, the SMTP "to" field? What about aliases, or "sent in name of"?
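As a rough illustration of the metadata approach and the timezone question, here is a minimal sketch using Python's standard `email` package. The function name `metadata_key` and the choice of fields are assumptions made for this example, not any tool's actual method. It assumes each message has a Date header with an explicit UTC offset, and it does nothing about aliases or "sent in name of" senders.

```python
import hashlib
from datetime import timezone
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def metadata_key(raw_message: str) -> str:
    """Build a hypothetical dedup key from From, To, Subject, Date and Message-ID."""
    msg = message_from_string(raw_message)

    # Keep only the address part, lowercased, so display names do not matter.
    senders = sorted(a.lower() for _, a in getaddresses(msg.get_all("From", [])))
    recipients = sorted(a.lower() for _, a in getaddresses(msg.get_all("To", [])))

    # Convert the Date header to UTC so the same instant written with
    # different offsets compares equal (assumes an explicit offset is present).
    sent = parsedate_to_datetime(msg["Date"]).astimezone(timezone.utc).isoformat()

    # Collapse whitespace and case in the subject; Message-ID is used as-is.
    subject = " ".join((msg["Subject"] or "").split()).lower()
    message_id = (msg["Message-ID"] or "").strip()

    fields = [",".join(senders), ",".join(recipients), subject, sent, message_id]
    return hashlib.sha256("\x1f".join(fields).encode("utf-8")).hexdigest()
```

Two copies of one message whose Date headers read `19:36:00 +0000` and `14:36:00 -0500` would get the same key here, while a copy with a different subject would not. Which fields go into the key is exactly the sort of thing that would need to be agreed on beforehand.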

I experimented with using a percentage of the content as part of the deduplication, but a simple version change or an automatic conversion from HTML to rich text to plain text would mess the whole thing up. The process requires normalizing all messages to a single format, then deduplicating, then marking the matching originals.
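The normalize, deduplicate, then mark sequence could be sketched roughly as follows. This is only an illustration under simplifying assumptions: the tag stripping is a crude regular expression rather than a real HTML or RTF converter, and `normalize_body` and `mark_duplicates` are hypothetical names. Nothing is deleted or merged; the result only records which originals match.

```python
import hashlib
import html
import re
from collections import defaultdict


def normalize_body(body: str) -> str:
    """Reduce an HTML or plain-text body to lowercase, whitespace-collapsed text."""
    # Drop script/style blocks, then strip remaining tags (crude, not a parser).
    text = re.sub(r"(?is)<(script|style).*?>.*?</\1>", " ", body)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    text = html.unescape(text)  # decode entities such as &amp;
    return " ".join(text.split()).lower()


def mark_duplicates(items):
    """items: iterable of (source, item_id, body) tuples.

    Returns {digest: [(source, item_id), ...]} for bodies that match after
    normalization. The originals are left untouched and keep their source.
    """
    groups = defaultdict(list)
    for source, item_id, body in items:
        digest = hashlib.sha256(normalize_body(body).encode("utf-8")).hexdigest()
        groups[digest].append((source, item_id))
    return {d: refs for d, refs in groups.items() if len(refs) > 1}
```

Because each reference keeps its source, a match between a server copy and a laptop copy stays visible as two traces of the same communication instead of being collapsed into one item.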

All deduplication methods should be agreed at the meet & confer - and you had better be there, or you will end up with a pile of mess on your hands - like an agreement to deduplicate a single mailbox . . .

jhup
Senior Member
 
 
  

Re: Email deduplication

Posted: Sun Jan 20, 2013 12:54 pm

Thank you for the replies. Email deduplication seems rather blurred from case to case and an extremely touchy subject. Learned a lot in the process though. Thanks again.  

D4CS
Newbie
 
 
  

Re: Email deduplication

Posted: Sun Jan 20, 2013 10:42 pm

I can describe what one CF person did in a matter where it was stated that de-dupe was needed.

They de-duped by message number; I'll elaborate a bit.

Many times messages are listed as Message 01 or Message 001. This person thought that they could just take the first Message 001 and delete all the other Message 001s. What difference would it make? There were numerous email addresses, each one having a Message 001. PSTs, AOL, POP3, it was all there.
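To make the problem concrete, here is a small sketch with made-up data (the mailbox names and labels are purely hypothetical). Keying on the message label alone silently discards unrelated messages from other mailboxes, because the label is only unique inside one container.

```python
# Hypothetical listing: (mailbox, message label) pairs as a tool might show them.
items = [
    ("alice.pst", "Message 001"),
    ("alice.pst", "Message 002"),
    ("bob.pst", "Message 001"),
    ("carol_aol", "Message 001"),
]


def survivors_by_label(items):
    """Flawed approach: keep only the first item seen for each label."""
    seen = {}
    for mailbox, label in items:
        seen.setdefault(label, (mailbox, label))
    return list(seen.values())


def survivors_by_mailbox_and_label(items):
    """Key on mailbox and label together, so only true repeats are dropped."""
    return list(dict.fromkeys(items))


print(len(survivors_by_label(items)))              # 2 of 4 items survive
print(len(survivors_by_mailbox_and_label(items)))  # all 4 items survive
```

Even the second key only identifies an item within its container; it says nothing about whether two messages in different mailboxes are copies of each other.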

It was a sad day for one side: the CF person had dug their heels in on their statements, yet there were up to 10x more emails than they had, missed because of a poorly thought-out de-dupe, and those emails spelled the case out in black and white.
_________________
Why order a taco when you can ask it politely?

Alan B. "A man can live a good life, be honorable, give to charity, but in the end, the number of people who come to his funeral is generally dependent on the weather. " 

armresl
Senior Member
 
 