Hey folks,
I was wondering if anybody has any recommendations for deduplication software. I often have a couple thousand files sitting in a folder, and it would be great if I could run the folder through the software and remove any dupes… Thanks in advance.
This should be pretty trivial to accomplish w/ Perl…
Run through the entire dir using readdir() and get each file's name (although you won't have duplicate names), size, and MD5 hash. If there are any dups based on size + hash, flag them and let the analyst decide whether to delete them.
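A minimal, untested sketch of that approach in Perl. The directory comes from the command line ('.' is just a placeholder default), and it assumes the core Digest::MD5 module:

  use strict;
  use warnings;
  use Digest::MD5;

  my $dir = shift // '.';   # folder to scan; '.' is a placeholder default
  opendir(my $dh, $dir) or die "Can't open $dir: $!";

  my %seen;                 # "size:md5" => first path seen with that signature
  for my $name (readdir($dh)) {
      my $path = "$dir/$name";
      next unless -f $path;             # skip '.', '..', and subdirectories

      open(my $fh, '<:raw', $path) or die "Can't read $path: $!";
      my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
      close($fh);

      my $key = (-s $path) . ":$md5";
      if (exists $seen{$key}) {
          print "DUP: $path == $seen{$key}\n";  # flag it; the analyst decides
      } else {
          $seen{$key} = $path;
      }
  }
  closedir($dh);

It only flags duplicates rather than deleting them, which keeps the analyst in the loop as described.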
That sounds good… any ideas without using Perl?
Google "Easy Duplicate Finder"
Minesh
I use 'Beyond Compare'
MS
I use fdupes, myself. If you Google "fdupes" and select the Wikipedia entry, you'll find URLs for a number of open source/free deduplicators.
As a nod to Harlan, at least a couple of these are written in Perl.
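For example (assuming fdupes is installed; the path is a placeholder, -r recurses into subfolders, and -d prompts you to choose which copies to keep before deleting):

  fdupes -r /path/to/folder
  fdupes -rd /path/to/folder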
I know I'm digging up an old post, but have any of you used these tools successfully in court?
> This should be pretty trivial to accomplish w/ Perl…
> Run through the entire dir using readdir() and get each file's name (although you won't have duplicate names), size, and MD5 hash. If there are any dups based on size + hash, flag them and let the analyst decide whether to delete them.
Why size AND hash? If the size is different, the hash will also be different.
Two files can be the same size and still have different hashes, which indicates different content, so they are not duplicates. The size check also acts as a cheap first-pass filter: you only need to hash files whose sizes match, rather than reading every file in the folder in full.
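To make that efficiency point concrete, here's a rough two-pass sketch in Perl (untested, folder path from the command line as before): group files by size first, then MD5 only the size groups with more than one member, so files with a unique size never get read at all.

  use strict;
  use warnings;
  use Digest::MD5;

  my $dir = shift // '.';
  opendir(my $dh, $dir) or die "Can't open $dir: $!";
  my %by_size;
  for my $name (readdir($dh)) {
      my $path = "$dir/$name";
      push @{ $by_size{ -s $path } }, $path if -f $path;
  }
  closedir($dh);

  # Hash only files that share a size; a unique size can't be a duplicate.
  for my $paths (grep { @$_ > 1 } values %by_size) {
      my %by_md5;
      for my $path (@$paths) {
          open(my $fh, '<:raw', $path) or die "Can't read $path: $!";
          push @{ $by_md5{ Digest::MD5->new->addfile($fh)->hexdigest } }, $path;
          close($fh);
      }
      for my $dupes (grep { @$_ > 1 } values %by_md5) {
          print "Duplicates: @$dupes\n";
      }
  }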
If I had to go to court, I'd probably do this with EnCase, but I'd also be confident doing it as Harlan describes.
-David