deduplication software

30 Posts
16 Users
0 Reactions
1,889 Views
BornToWriteBlock
(@borntowriteblock)
Active Member
Joined: 17 years ago
Posts: 15
Topic starter  

Hey folks,
I was wondering if anybody has any recommendations for deduplication software. I often have a couple thousand files sitting in a folder, and it would be great if I could run the folder through the software and remove any dupes… Thanks in advance.


   
keydet89
(@keydet89)
Famed Member
Joined: 21 years ago
Posts: 3568
 

This should be pretty trivial to accomplish w/ Perl…

Run through the entire dir using readdir(), and get each file name (although you won't have duplicate names), size and MD5 hash. If there are any dups based on size + hash, flag them, and let the analyst decide to delete them.
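For anyone who would rather not use Perl, the same idea can be sketched in Python. This is only a minimal illustration of the size + MD5 approach described above, not a vetted tool: the function names `md5_of_file` and `find_duplicates` are made up for this example, it looks at a single folder without recursing, and it only reports duplicates so the analyst can decide what to delete.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder):
    """Group regular files in `folder` by (size, MD5).

    Returns only the groups holding more than one file name.
    Nothing is deleted; flagged files are left for the analyst to review.
    """
    groups = defaultdict(list)
    for entry in os.scandir(folder):
        # Skip directories and symlinks; hash regular files only.
        if entry.is_file(follow_symlinks=False):
            key = (entry.stat().st_size, md5_of_file(entry.path))
            groups[key].append(entry.name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}
```

Calling `find_duplicates("some_folder")` (the folder name is a placeholder) returns a dictionary keyed by `(size, md5)` whose values are the lists of file names sharing that size and hash.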


   
BornToWriteBlock
(@borntowriteblock)
Active Member
Joined: 17 years ago
Posts: 15
Topic starter  

That sounds good… any ideas without using Perl?


   
(@minesh)
Trusted Member
Joined: 18 years ago
Posts: 75
 

Google "Easy Duplicate Finder"

Minesh


   
(@mas66)
Eminent Member
Joined: 20 years ago
Posts: 21
 

I use 'Beyond Compare' www.scootersoftware.com

MS


   
(@seanmcl)
Honorable Member
Joined: 19 years ago
Posts: 700
 

I use fdupes, myself. If you Google "fdupes" and select the Wikipedia entry, you'll find URLs for a number of open source/free deduplicators.

As a nod to Harlan, at least a couple of these are written in Perl.


   
donven
(@donven)
Eminent Member
Joined: 17 years ago
Posts: 26
 

I know I'm digging up an old post, but have any of you used these tools successfully in court?


   
PaulSanderson
(@paulsanderson)
Honorable Member
Joined: 19 years ago
Posts: 651
 

This should be pretty trivial to accomplish w/ Perl…

Run through the entire dir using readdir(), and get each file name (although you won't have duplicate names), size and MD5 hash. If there are any dups based on size + hash, flag them, and let the analyst decide to delete them.

Why size AND hash? If the size is different, the hash will also be different.


   
keydet89
(@keydet89)
Famed Member
Joined: 21 years ago
Posts: 3568
 

Two files can be the same size and have different hashes, indicating different content, and therefore the files are not duplicates.
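A quick way to see this is a small Python sketch. The helper name `same_content` is invented for the example, and checking the size before the hash is an assumption about ordering on my part: it just skips the hash whenever the sizes already differ.

```python
import hashlib


def same_content(data_a: bytes, data_b: bytes) -> bool:
    """Treat two byte strings as duplicates only if size and MD5 both match."""
    # Cheap check first: different sizes can never be duplicates.
    if len(data_a) != len(data_b):
        return False
    # Same size is not enough; the content hash has to match as well.
    return hashlib.md5(data_a).digest() == hashlib.md5(data_b).digest()


# Both inputs are 5 bytes long, but their content (and so their MD5) differs.
print(same_content(b"hello", b"world"))
print(same_content(b"hello", b"hello"))
```

The first call compares two inputs of equal size with different content, so it is the hash comparison that rules them out as duplicates.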


   
(@kovar)
Prominent Member
Joined: 18 years ago
Posts: 805
 

If I had to go to court, I'd probably do this with EnCase but I'd also be confident doing it as Harlan describes.

-David


   