deduplication software

Page 1 / 3 Next

General (Technical, Procedural, Software, Hardware etc.)

Last Post by indur 16 years ago

30 Posts

16 Users

0 Reactions

1,889 Views

BornToWriteBlock

(@borntowriteblock)

Active Member

Joined: 17 years ago

Posts: 15

Topic starter 03/10/2008 8:12 pm

Hey folks,
I was wondering if anybody has any recommendations for deduplication software. I often have a couple thousand files sitting in a folder, and it would be great if I could run the folder through the software and remove any dupes… Thanks in advance.

keydet89

(@keydet89)

Famed Member

Joined: 21 years ago

Posts: 3568

03/10/2008 9:17 pm

This should be pretty trivial to accomplish w/ Perl…

Run through the entire dir using readdir(), and get each file name (although you won't have duplicate names), size and MD5 hash. If there are any dups based on size + hash, flag them, and let the analyst decide to delete them.
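
For anyone who would rather not use Perl, the same approach can be sketched in Python. This is only a minimal illustration, not a vetted forensic tool: the function names `md5_of_file` and `find_duplicates` are made up for this example, it looks only at files directly inside the folder (no subfolders), and it reports candidate duplicates without deleting anything, leaving that decision to the analyst.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder):
    """Group regular files directly inside `folder` by size, then by MD5.

    Returns a list of lists; each inner list holds the sorted paths of
    files that share both size and hash. Nothing is deleted.
    """
    # Step 1: bucket files by size (hypothetical helper structure).
    by_size = defaultdict(list)
    for entry in os.scandir(folder):
        if entry.is_file(follow_symlinks=False):
            by_size[entry.stat().st_size].append(entry.path)

    # Step 2: hash only the files whose size is shared with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[md5_of_file(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling `find_duplicates` on a folder path returns the groups of matching files, which can then be printed or logged for review before anything is removed.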

BornToWriteBlock

(@borntowriteblock)

Active Member

Joined: 17 years ago

Posts: 15

Topic starter 04/10/2008 1:45 am

That sounds good… any ideas without using Perl?

Minesh

(@minesh)

Trusted Member

Joined: 18 years ago

Posts: 75

04/10/2008 1:54 am

Google "Easy Duplicate Finder"

Minesh

mas66

(@mas66)

Eminent Member

Joined: 20 years ago

Posts: 21

08/10/2008 7:24 pm

I use 'Beyond Compare' www.scootersoftware.com

seanmcl

(@seanmcl)

Honorable Member

Joined: 19 years ago

Posts: 700

08/10/2008 10:18 pm

I use fdupes, myself. If you Google "fdupes" and select the Wikipedia entry, you'll find URLs for a number of open source/free deduplicators.

As a nod to Harlan, at least a couple of these are written in Perl.

donven

(@donven)

Eminent Member

Joined: 17 years ago

Posts: 26

05/02/2009 1:38 am

I know I'm digging up an old post, but have any of you used these items successfully in court?

PaulSanderson

(@paulsanderson)

Honorable Member

Joined: 19 years ago

Posts: 651

05/02/2009 1:40 am

keydet89 wrote:
"This should be pretty trivial to accomplish w/ Perl…

Run through the entire dir using readdir(), and get each file name (although you won't have duplicate names), size and MD5 hash. If there are any dups based on size + hash, flag them, and let the analyst decide to delete them."

Why size AND hash? If the size is different, the hash will (barring a collision) also be different.

keydet89

(@keydet89)

Famed Member

Joined: 21 years ago

Posts: 3568

05/02/2009 1:43 am

Two files can be the same size and have different hashes, indicating different content, and therefore the files are not duplicates.
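
A small Python illustration of that point, offered as a sketch only: the helper name `fingerprint` and the two sample byte strings are invented for the example. Size is a cheap first check, but the hash is what settles whether the contents match.

```python
import hashlib


def fingerprint(data):
    """Return (size, MD5 hex digest) for a bytes object."""
    return len(data), hashlib.md5(data).hexdigest()


# Two made-up inputs of equal length that differ in their last byte.
first = b"invoice 0001"
second = b"invoice 0002"

size_a, hash_a = fingerprint(first)
size_b, hash_b = fingerprint(second)

print(size_a == size_b)  # True: the sizes match
print(hash_a == hash_b)  # False: the contents differ, so not duplicates
```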

kovar

(@kovar)

Prominent Member

Joined: 18 years ago

Posts: 805

05/02/2009 1:47 am

If I had to go to court, I'd probably do this with EnCase but I'd also be confident doing it as Harlan describes.

-David
