Doomworld Forums > Misc. > Everything Else > Program for weeding out duplicate files?
 
invictius
Forum Regular


Posts: 681
Registered: 03-06


I'm in the process of downloading every single wad on all the servers here: http://camoyoshi.floorchan.org/master/

So far the total size is 260 GB (!!!), with only a few hundred files left: 26,000 files so far. However, I want to get rid of duplicates. The download manager has been set to skip dupes, but some remain, and of course there will be dupes in my /idgames mirror. Can anyone recommend a file manager that will delete duplicates? I think Norton had something ages ago that did the job, though it wasn't freeware.

Old Post 12-10-13 09:27 #
Infinite Ammunition
Forum Regular


Posts: 694
Registered: 10-03


www.dupkiller.net/index_en.html

Old Post 12-10-13 12:13 #
geo
Forum Staple


Posts: 3627
Registered: 10-05


I've done it with PHP and Flash. Just have it keep a log of downloaded files. When there's a dupe... don't download it.
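A minimal Python sketch of the "keep a log, skip dupes" idea, assuming the log is a plain text file with one file name per line (the log format and function names here are hypothetical, not geo's actual PHP/Flash code). Names are compared case-insensitively, as Windows would treat them.

```python
import os


def load_log(log_path: str) -> set:
    """Return the lower-cased file names already recorded in the log."""
    if not os.path.exists(log_path):
        return set()
    with open(log_path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def should_download(name: str, seen: set) -> bool:
    """Skip any file whose name is already in the log."""
    return name.lower() not in seen


def record(name: str, seen: set, log_path: str) -> None:
    """Append a newly downloaded file name to the log and the in-memory set."""
    key = name.lower()
    if key not in seen:
        seen.add(key)
        with open(log_path, "a", encoding="utf-8") as f:
            f.write(key + "\n")
```

Note that matching on names alone also skips different files that happen to share a name (HANGAR.WAD, BASE.WAD and so on); hashing the file contents avoids that.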

Old Post 12-10-13 16:04 #
fraggle
Filled with the code of Doom


Posts: 7827
Registered: 07-00


md5sum
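For anyone who wants the md5sum approach spelled out, here is a rough Python equivalent (a sketch, not a replacement for the md5sum tool itself): hash every file under a folder and report groups whose hashes match. Files are read in chunks so large WADs don't have to fit in memory.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> list:
    """Walk `root` and return groups of paths whose contents hash identically."""
    by_hash = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            by_hash[md5_of(path)].append(path)
    # Only hashes seen more than once are duplicates.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

MD5 is fine for spotting identical files in a personal collection; if you want to be extra careful, byte-compare a group before deleting anything from it.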

Old Post 12-10-13 16:22 #
spicyjack
Mini-Member


Posts: 61
Registered: 08-09


I'm guessing you're running Windows; you don't mention it in your original post.

If you were on Linux or a Mac, I would say fdupes:

http://code.google.com/p/fdupes/

Debian and Ubuntu have a package for it. It will delete the duplicate files it finds, replacing them with hardlinks (basically a UNIX-y pointer to a file), saving you a bunch of space.
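To show what "replace the duplicate with a hardlink" means in practice, here is a small Python sketch (not fdupes' own code; the temporary suffix is a made-up name). It refuses to link files whose contents differ, and creates the link under a temporary name before renaming it over the duplicate, so the duplicate is never missing if linking fails. Both paths must be on the same filesystem.

```python
import filecmp
import os


def replace_with_hardlink(keep: str, duplicate: str) -> None:
    """Replace `duplicate` with a hard link to `keep` (same filesystem only)."""
    if os.path.samefile(keep, duplicate):
        return  # already the same file, e.g. linked on an earlier run
    if not filecmp.cmp(keep, duplicate, shallow=False):
        raise ValueError("contents differ; refusing to link")
    tmp = duplicate + ".tmp-link"  # hypothetical temporary name
    os.link(keep, tmp)             # second directory entry for the kept file
    os.replace(tmp, duplicate)     # swap it in place of the duplicate
```

After this, both names point at one copy of the data on disk, which is where the space saving comes from.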

There's also teh googles and wikipaedias

https://www.google.com/search?q=fdupes%20windows
http://en.wikipedia.org/wiki/List_o...te_file_finders

For what it's worth, /idgames currently holds 34,418 files, taking up around 32.56 GB.

Old Post 12-12-13 05:40 #
Creaphis
I will deliberately take a contrary position just for the sake of writing incredibly long arguments


Posts: 4184
Registered: 10-05


This is exactly what I need for my porn collection.

Old Post 12-12-13 17:55 #
40oz
Why don't I have a custom title by now?!


Posts: 7031
Registered: 08-07


I've been trying to think of a smart way to archive wads with repeated names. I think I accidentally overwrote a bunch of them when I downloaded the idgames archive, and I don't really know how to keep different wads with identical names without renaming them or having a weird network of folders in my wad directory. Wads like HANGAR.WAD, BASE.WAD, CASTLE.WAD, HELL.WAD, etc. Common names like that.
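One way to avoid overwriting same-named wads when merging folders, sketched in Python under the assumption that a numeric suffix is acceptable (the `_2`, `_3` naming is just an illustration). If an identical file already exists under that name, nothing is copied; if a different file is there, the next free suffixed name is used.

```python
import filecmp
import os
import shutil


def safe_copy(src: str, dest_dir: str) -> str:
    """Copy `src` into `dest_dir` without clobbering a different file of the same name.

    HANGAR.WAD, HANGAR_2.WAD, HANGAR_3.WAD, ... Returns the path the file ends up at.
    """
    base, ext = os.path.splitext(os.path.basename(src))
    n = 1
    while True:
        name = f"{base}{ext}" if n == 1 else f"{base}_{n}{ext}"
        target = os.path.join(dest_dir, name)
        if not os.path.exists(target):
            shutil.copy2(src, target)  # free name: copy, keeping timestamps
            return target
        if filecmp.cmp(src, target, shallow=False):
            return target  # same content already stored under this name
        n += 1  # different wad with this name: try the next suffix
```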

Old Post 12-12-13 23:29 #
GreyGhost
I have a custom title now!


Posts: 8859
Registered: 01-08


I keep my archive mirror separated from the other wads I've collected, though that still hasn't eliminated the need to rename the odd file or few.

Old Post 12-13-13 04:05 #
Kirby
Senior Member


Posts: 1604
Registered: 10-04



40oz said:
I don't really know how to keep different wads with identical names without renaming them or having a weird network of folders in my wad directory. Wads like HANGAR.WAD, BASE.WAD, CASTLE.WAD, HELL.WAD, etc. Common names like that.
This is somewhat relevant to me: I run into the same problem, though less with WADs than with the files I organize at work.

At this point, why not append the author's name (or last name) to the wad filename? If wads with exactly the same name are now stacking up on your hard drive, it wouldn't be a bad idea to include the author's first or last name in the filename. That way, even if you have CASTLE.WAD (not renamed because you got it last year) and then CASTLE_FUENTES.WAD, you know which one was made by Mr. Fuentes because his name is in the actual filename. Marking which wads belong to which creators helps narrow down the field, and in my mind it's a very good way to do so (at least with WAD files and clients sharing similar titles :P)
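If you want to automate the renaming, a rough Python sketch could pull the author from the wad's accompanying text file and append the last word of it. This assumes the .txt follows the usual idgames template with a line like `Author : John Fuentes`; the regex and function names are illustrative only, and real text files won't always follow the template.

```python
import os
import re
from typing import Optional

# Assumption: an "Author : <name>" line, as in the idgames text-file template.
AUTHOR_LINE = re.compile(r"^[ \t]*Author[ \t]*:[ \t]*(\S.*?)[ \t]*$",
                         re.IGNORECASE | re.MULTILINE)


def author_from_txt(text: str) -> Optional[str]:
    """Return the author field from a wad's text file, or None if absent."""
    m = AUTHOR_LINE.search(text)
    return m.group(1) if m else None


def name_with_author(wad_name: str, author: str) -> str:
    """CASTLE.WAD + 'John Fuentes' -> CASTLE_FUENTES.WAD (last word of the name)."""
    parts = author.split()
    if not parts:
        return wad_name
    base, ext = os.path.splitext(wad_name)
    last = re.sub(r"[^A-Za-z0-9]", "", parts[-1]).upper()  # keep it filename-safe
    return f"{base}_{last}{ext}" if last else wad_name
```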

Granted, it's not a surefire way to organize, but it should help. If author names are no good, look for something else that specifically identifies a WAD other than its 8-character name. It still requires you to rename files when you first put them on your computer, but 5-10 seconds of renaming is a small price to pay for organization.

Old Post 12-13-13 04:13 #
Opulent
Senior Member


Posts: 2124
Registered: 07-01


This is something I do every day.

Before I used a file duplicate checking program, I had over 1 million files of just demos. With demos and wads being posted and re-posted so many times, a duplicate file manager is a must for any file collector.

Unfortunately, solving this is not as easy as it first appears, since filenames can be descriptive, zipfiles can use different compression types and levels, and there is more than one popular archive type.
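Because two zips holding the same files can differ byte-for-byte (different compression levels, different outer filenames), a whole-file hash won't match them. A hedged Python sketch of one workaround: fingerprint the decompressed contents instead. Member names are lower-cased and sorted so case and ordering inside the archive don't matter. This covers zip only, not RAR or other formats.

```python
import hashlib
import zipfile


def zip_fingerprint(path: str) -> str:
    """Hash a zip's decompressed members, ignoring compression method and level."""
    h = hashlib.md5()
    with zipfile.ZipFile(path) as zf:
        members = sorted((i for i in zf.infolist() if not i.is_dir()),
                         key=lambda i: i.filename.lower())
        for info in members:
            h.update(info.filename.lower().encode("utf-8"))  # member name
            h.update(b"\0")                                   # separator
            h.update(hashlib.md5(zf.read(info)).digest())     # member content
    return h.hexdigest()
```

Two archives with equal fingerprints contain the same files with the same contents, even if one is stored and the other is deflated at level 9.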

I use a combination of FastDuplicateFileFinder, LookDisk, and Linux scripts using MD5 hashes.
FDFF is probably the first basic tool you are looking for. It only matches whole files for duplicates. You can choose the directories you wish to compare.
LookDisk works the problem in a different way. It does the same thing, but you can select all the files in a specific directory to be moved or deleted (instead of picking files individually).
LD can also look inside RAR and zip 1.0/2.0 files for duplicates.
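The "select every duplicate in one directory and move it" workflow can be approximated in Python. This is a sketch, not how FDFF or LookDisk work internally: hash everything in a master folder, then move (not delete) any file in an incoming folder whose content is already there into a holding folder. The folder roles are assumptions; keep the holding folder outside the incoming one.

```python
import hashlib
import os
import shutil


def file_md5(path: str) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def hashes_under(root: str) -> set:
    """MD5 digests of every file below `root`."""
    return {file_md5(os.path.join(dp, n)) for dp, _, fs in os.walk(root) for n in fs}


def move_known_files(master: str, incoming: str, holding: str) -> list:
    """Move files in `incoming` whose content already exists in `master` into
    `holding`, preserving their relative paths. Returns the new paths."""
    known = hashes_under(master)
    moved = []
    for dirpath, _dirs, files in os.walk(incoming):
        for name in files:
            src = os.path.join(dirpath, name)
            if file_md5(src) in known:
                dest = os.path.join(holding, os.path.relpath(src, incoming))
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                shutil.move(src, dest)
                moved.append(dest)
    return moved
```

Moving instead of deleting leaves room to double-check the holding folder before anything is thrown away.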

As someone who has done this for many years, I recommend:
1) Have a main master folder that you compare everything new against.
2) Don't "work" on your master copy; only work on a copy of your data until you are sure it is good.
3) Segregate your data into smaller chunks, e.g. ZDaemon demos, Doom wads, resource wads, etc.
4) Use 7-Zip to unzip large collections of miscellaneous data (misc data, not the idgames archive!), as it can auto-rename files as they are extracted to prevent overwriting.
5) Use WinRAR to rescue broken zipfiles.
6) Use WinRAR to mass re-zip files (so they are all zip 2.0 and compressed with the same method, which makes them recheckable later as duplicates).
7) rom-zipper and batchtoolkit both have useful features, but they also have limitations that can cause data loss, so be careful. I use simple Linux scripts to do things like zip files together (like av.txt and av.wad into av.zip).
8) BeyondCompare (not freeware) or TreeComp can compare entire directory trees (this is useful for multiple copies of the Compet-N archive or the idgames archive).
9) Back up your data!!!
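The "zip av.txt and av.wad into av.zip" step from tip 7 could look something like this in Python. It's an illustration, not Opulent's actual Linux script: files sharing a stem are bundled into STEM.zip, only the listed extensions are considered (an assumption you may want to widen), existing zips are never overwritten, and the originals stay in place so you can check the result first.

```python
import os
import zipfile
from collections import defaultdict


def zip_by_stem(folder: str, exts=(".wad", ".txt")) -> list:
    """Bundle files that share a stem (AV.WAD + AV.TXT) into a lower-case STEM.zip."""
    groups = defaultdict(list)
    for name in os.listdir(folder):
        stem, ext = os.path.splitext(name)
        if ext.lower() in exts and os.path.isfile(os.path.join(folder, name)):
            groups[stem.lower()].append(name)
    created = []
    for stem, names in sorted(groups.items()):
        if len(names) < 2:
            continue  # nothing to pair this file with
        target = os.path.join(folder, stem + ".zip")
        if os.path.exists(target):
            continue  # never overwrite an existing archive
        # Mode "x" also refuses to overwrite, as a second safeguard.
        with zipfile.ZipFile(target, "x", zipfile.ZIP_DEFLATED) as zf:
            for name in sorted(names):
                zf.write(os.path.join(folder, name), arcname=name)
        created.append(target)
    return created
```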

All Windows programs suffer from Windows file-handling limitations (especially on Windows 7/8), which can cause data loss. You've been warned.

good luck

Old Post 01-11-14 22:19 #