spicyjack

HOWTO: Make your own idgames mirror

Ever wanted to make your own copy of the idgames mirror?

Instead of downloading a few WADs here and there, why not just grab the whole thing?

I use a command line tool called 'wget' to mirror the idgames archive(s). I'm sure there are GUI equivalents for people who are afraid of a command line, but I'm only going to explain here what I know how to use.

I have a copy of both the idgames and idgames2 archives on my server at my house, and by running the script below at regular intervals, I get all of the new uploads as well. The first download will take you a while, so be prepared. Subsequent runs will only fetch the things that showed up on the mirror since the last time you ran the script.


Here's the script I run:

/usr/bin/wget \
    --verbose \
    --mirror \
    --wait=2 \
    --random-wait \
    --no-host-directories \
    --cut-dirs=3 \
    --directory-prefix=/home/ftp/doom/idgames_mirror \
    --dot-style=binary \
    ftp://ftp.fu-berlin.de/pc/games/idgames/
PLEASE PLEASE PLEASE PLEASE don't abuse this command. With the options above, the wget program will connect to the mirror server with a random interval of between 1 and 3 seconds between each connection. If you shorten the wait time between queries, you'll hammer the server, and that usually makes server admins angry.
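
To see where the "between 1 and 3 seconds" figure comes from: per the wget manual, --random-wait varies the delay between 0.5 and 1.5 times the --wait value. Here's a tiny sketch of that arithmetic. Note that wait_range is a made-up helper name, not something wget provides:

```shell
# wait_range is a hypothetical helper, not a wget feature.
# It prints the minimum and maximum delay (in seconds) that
# --random-wait gives for a given --wait value: 0.5x to 1.5x.
wait_range() {
    awk -v w="$1" 'BEGIN { printf "%g %g\n", 0.5 * w, 1.5 * w }'
}

wait_range 2    # the --wait=2 used above; prints "1 3"
```

So if you drop --wait to 1, requests can arrive as little as half a second apart, which is the kind of hammering to avoid.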

Current mirrors according to the README file:

ftp://ftp.fu-berlin.de/pc/games/idgames/
ftp://ftp.chg.ru/pub/games/idgames/
http://youfailit.net/pub/idgames/

About the wget options:
    * --cut-dirs removes that many directories after the server's hostname; in this case, I wanted to get rid of /pc/games/idgames and substitute it with my own directory structure.
    * --no-host-directories will prevent wget from using the server's hostname as part of the directory structure on your local filesystem.
    * All of the other options should be self-explanatory.
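
As a rough illustration of how those two options combine with --directory-prefix, here's a sketch that mimics the path mapping for the command above. local_path is a made-up helper and the file name is only an example; wget does all of this internally, so treat it as a way to reason about where files land, not as part of the mirroring itself:

```shell
# local_path is a hypothetical helper that imitates the effect of
# --no-host-directories, --cut-dirs=N and --directory-prefix=DIR.
#   $1 = path part of the URL, without the leading slash
#   $2 = number of leading directories to cut
#   $3 = local directory prefix
local_path() {
    remote_path=$1
    cut_dirs=$2
    prefix=$3
    # Drop the first $cut_dirs path components; the hostname is never
    # included, which is what --no-host-directories gives you.
    rest=$(printf '%s\n' "$remote_path" | cut -d/ -f"$((cut_dirs + 1))"-)
    printf '%s/%s\n' "$prefix" "$rest"
}

# Example file name is made up for illustration.
local_path pc/games/idgames/levels/doom/example.zip 3 /home/ftp/doom/idgames_mirror
```

With --cut-dirs=3, the pc/games/idgames part is dropped and the rest of the path is recreated under the prefix directory.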
wget comes with pretty much every Linux distro. Windows users can try the link at the end of this post for Windows downloads. Mac-o-philes can use MacPorts or Fink to grab a copy; I think OS X comes with curl by default, which is a similar tool.

NOTE: The one thing I haven't worked out (yet) is that files that disappear from the mirror do not get deleted on my local filesystem. This means that over time, your local copy of the mirror will build cruft in the form of files that were deleted from the mirror but still remain on your hard drive.

Now that I have my own copy of the mirror, I ran a program that detects duplicate files using file size and MD5 checksums. I used fdupes, installed from the Debian package archive, so it's available for Ubuntu users too. For the curious, here's the fdupes output from today's mirror run:

http://doom.spicyjack.com/idgames_mirror_fdupes.log

Run with:
fdupes --recurse --sameline --size idgames_mirror/ \
 | tee ~/idgames_mirror_fdupes.log
Thanks to the mirror maintainers, and all of the people who host disk space and bandwidth so that we can grab this stuff. If you look at the fdupes file above, there's not a lot of duplication.
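
If fdupes isn't handy, the checksum half of that idea can be approximated with standard tools. This is only a sketch under a few assumptions: it relies on GNU md5sum, it compares MD5 sums only (no file size pre-check, so it is slower than fdupes), and list_dupes is a made-up name:

```shell
# list_dupes is a hypothetical helper: print every file under $1 whose
# MD5 sum matches at least one other file. Assumes GNU md5sum, whose
# output is a 32-character hash, two separator characters, then the name.
# Usage: list_dupes idgames_mirror/
list_dupes() {
    find "$1" -type f -exec md5sum {} + | sort | awk '
        { sum = $1; name = substr($0, 35) }
        # Same hash as the previous line: print both members of the group.
        sum == prev { if (!printed) print prevname; print name; printed = 1 }
        sum != prev { printed = 0 }
        { prev = sum; prevname = name }'
}
```

Files with identical content come out on consecutive lines, since the list is sorted by hash first.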

Here's the size of my mirror directories; I mentioned above that I haven't figured out pruning files that no longer exist on the mirror, so your disk usage sizes will most likely be different (less).
$ du -sh idgames_mirror/ idgames2_mirror/
29G     idgames_mirror/
4.7G    idgames2_mirror/
Edit: forgot to include the link to sites with Windows binaries: http://tinyurl.com/yec6k8j


Nice little tutorial.

I don't know of any GUI frontend for wget. It would be nice just to run a query to see a listing of available URLs from an individual index. A quick search of Firefox plug-ins shows DownThemAll and SpiderZilla ... the pictures seem to confirm that these utilities are quite similar to wget.

I've never tried setting up a mirror using rsync, but it doesn't sound impossible. Rsync is a GPL tool for maintaining a backup archive of a filesystem and it updates modified files as well as removing orphans automatically. I've just never found the time to put it to work on my system.

Toughguy said:

I've never tried setting up a mirror using rsynch, but it doesn't sound impossible. Rsynch is a GPL tool for maintaining a backup archive of a filesystem and it updates modified files as well as removing orphans automatically. I've just never found the time to put it to work on my system.


That's rsync, no h, and in order to use rsync, you'd have to ask the people who own the mirrors for permission. The admins of the server being mirrored would have to set up either an rsync server or ssh access so that you could use rsync to slurp up the data.

When you run rsync, the client rsync process talks to an rsync process on the server, and between the two of them, they work out the data that gets transferred. Using rsync would also take care of deleting files on the client that don't exist on the server, which I mentioned previously wget can't do.

I use rsync a lot actually... but in this case, plain wget works well because 99% or so of the Doomworlders only have access to the mirror server over HTTP/FTP, i.e. they lack God Mode on the server to be mirrored :)

spicyjack said:

NOTE: The one thing I haven't worked out (yet) is that files that disappear from the mirror do not get deleted on my local filesystem. This means that over time, your local copy of the mirror will build cruft in the form of files that were deleted from the mirror but still remain on your hard drive.

There's ftp://ftp.fu-berlin.de/pc/games/idgames/fullsort.gz which apparently contains a list of all files in the archive.
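
If that list can be boiled down to one relative path per line (that's an assumption; the actual layout of fullsort.gz isn't verified here), a rough sketch for spotting stray local files could look like this. find_orphans and its two arguments are made-up names, and it only prints candidates, it doesn't delete anything:

```shell
# find_orphans is a hypothetical helper. It assumes $1 is a text file
# with one path per line, relative to the top of the archive, and $2 is
# the local mirror directory. fullsort.gz would need converting to that
# form first; its real format is not checked here.
find_orphans() {
    remote_list=$1
    local_dir=$2
    tmp_remote=$(mktemp)
    tmp_local=$(mktemp)
    # comm needs both inputs sorted the same way, so pin the locale.
    LC_ALL=C sort -u "$remote_list" > "$tmp_remote"
    # List local files relative to the mirror root, skipping the
    # .listing files that wget --mirror keeps in each FTP directory.
    (cd "$local_dir" && find . -type f ! -name .listing) |
        sed 's|^\./||' | LC_ALL=C sort -u > "$tmp_local"
    # Lines only in the second input: local files missing from the list.
    LC_ALL=C comm -13 "$tmp_remote" "$tmp_local"
    rm -f "$tmp_remote" "$tmp_local"
}
```

It would be worth eyeballing the output before removing anything, since a bad conversion of the list would make every local file look like an orphan.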

spicyjack said:

That's rsync no h

Fix'd, cheers.

spicyjack said:

in order to use rsync, you'd have to ask the people who own the mirrors for permission. The server admins on the server to be mirrored would have to set up either an rsync server or ssh access to the server so that you could use rsync to slurp up the data.

Oh yeah, that makes sense; that would be a sweet deal. Did you try typing IDDQD at the home page? :D

spicyjack said:

Using rsync would also take care of deleting files on the client that don't exist on the server, which I mentioned previously wget can't do.

That's what I meant by "removing orphans automatically." I was trying to cut down on words but couldn't find something more clear.


Once your mirror's set up, you might like to try out FileZilla's synchronized browsing and directory comparison view modes as a means of locating stray files.
