Linguica

The end of .zip


Since 1989, PC users have known and loved the ZIP file format. Doom WADs benefited significantly from the compression, and it quickly became the standard way to distribute WADs (and LMPs) on the internet. Even today with PK3s and PK7s and what have you, the ****.zip filename is still standard operating procedure for distribution.

This is going to become a problem, unfortunately.

Google is moving to become the purveyor of .zip as a new top-level domain, like .com and .net and .world and whatever else. I've heard about this for a while, but didn't really think much of it. Then today I saw this:



In it, doom.zip had been linked to a nonexistent http://doom.zip/. And with that one hyperlink, I realized that the war, if there even had been one, was lost. For 25+ years we've been able to write "blah.zip" and trust that the reader knows we're talking about a filename. No longer. As Twitter has already shown, a bare filename ending in .zip is now increasingly likely to be automatically turned, by whatever website you're using, into a link to an unrelated domain name. And this is something of a problem for the Doom community, since we've accumulated tens of thousands of ZIP files, those ZIP files are stored in a public-facing FTP archive, and it's very easy to refer to a file as "d2twid.zip" or what have you.

And now in the future, if you happen to post on a website a link to

http://www.gamers.org/pub/idgames/levels/doom/v-z/wow.zip

You can't be sure it won't come out the other end like

http://www.gamers.org/pub/idgames/levels/doom/v-z/wow.zip (with just the trailing wow.zip turned into a link to http://wow.zip/)

I just finished rewriting the /idgames database to no longer use the .zip portion of filenames in canonical URLs to files, since they can no longer be implicitly trusted. (Incidentally, I had noticed that all the /idgames database URLs that ended in .zip were not being crawled by Google, even though they were perfectly valid, and Google absolutely refused to do so, even when given a sitemap. Coincidence?)
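For illustration, here is a minimal Python sketch of that kind of rewrite, assuming the database maps each archive path to a page URL. The base URL `https://example.org/idgames/` and the function name `canonical_url` are hypothetical, not the real /idgames code:

```python
from urllib.parse import quote

# Hypothetical base URL for the database front end (an assumption).
BASE = "https://example.org/idgames/"

def canonical_url(archive_path: str) -> str:
    """Map an archive path such as 'levels/doom/v-z/wow.zip' to a
    canonical page URL that does not end in '.zip'."""
    path = archive_path.strip("/")
    if path.lower().endswith(".zip"):
        path = path[: -len(".zip")]
    # Percent-encode unsafe characters but keep the directory separators.
    return BASE + quote(path, safe="/")

print(canonical_url("levels/doom/v-z/wow.zip"))
# → https://example.org/idgames/levels/doom/v-z/wow
```

With the extension gone, the last path segment has no dot in it, so there is nothing left for an over-eager linkifier to mistake for a .zip domain.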

I'm not sure what the endgame is here. In one fell swoop, Google has basically guaranteed websites that deal with ZIP files no end of grief. Until now we could safely assume a link ending in .zip pointed to a real file; now we have to do some sort of check for whether it's a valid domain name. Worse, it's now possible that a malformed URL to a ZIP file will instead lead to a .zip domain, perhaps one specifically set up to phish or otherwise do bad things to you. Yay technology.
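A rough Python sketch of that check: does a bare token such as wow.zip also parse as a host name? The small `KNOWN_TLDS` set and the name `could_be_domain` are illustrative assumptions; a real implementation would load the full IANA list of top-level domains.

```python
import re

# Hand-picked sample of delegated TLDs (assumption for illustration only).
KNOWN_TLDS = {"com", "net", "org", "world", "zip", "mov"}

# One DNS label: letters, digits and inner hyphens, 1 to 63 characters.
LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)\Z", re.IGNORECASE)

def could_be_domain(token: str) -> bool:
    """True if a bare 'name.ext' token is also a syntactically valid
    host name under one of the known TLDs."""
    labels = token.rstrip(".").split(".")
    if len(labels) < 2:
        return False
    return labels[-1].lower() in KNOWN_TLDS and all(
        LABEL.match(label) for label in labels
    )

for name in ["wow.zip", "d2twid.zip", "doom2.wad", "readme.txt"]:
    print(name, could_be_domain(name))
```

With this sample set, wow.zip and d2twid.zip come back True while doom2.wad and readme.txt come back False: the same string is now both a plausible filename and a plausible host.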


That was very informative and interesting, thanks for sharing your thoughts on it Ling. It's funny how the Internet and its associated big businesses operate. It still feels like we're in the old West or in its infancy in some regards. It's like there are no rules as to what is and isn't allowed.


Sooo how about when there are .txt, .wav, .jpg, .png, .rar, .7z, .mov, etc. TLDs? How about .wad? Do we just quit naming files with extensions entirely? >_>


Interesting stuff, Linguica, but I'm not convinced that the TLD will spell the end of .zip. The heuristics that sites like Twitter use to find and mark up URLs are exactly that: heuristics, and they make mistakes. The question is how often foo.zip is actually going to be a URL, and how often it isn't. That will depend on how popular the TLD becomes, which is hard to predict. But Doom isn't the only thing using ZIPs; they're still utterly prevalent all over the place. (Personally I prefer them to .tar.gz on UNIX systems, because they have a table of contents, the central directory at the end of the archive, so you don't need to read through the whole file to figure out what's in it.)
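Python's standard `zipfile` module shows the table-of-contents point in practice. ZIP keeps its table of contents (the central directory) at the end of the file, so listing an archive only means reading that index. The two-file test archive built here is a made-up example:

```python
import io
import zipfile

def make_archive() -> bytes:
    """Build a small two-file archive in memory so the example is self-contained."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("wow.wad", b"PWAD" + bytes(1000))
        zf.writestr("wow.txt", "Title: WOW\n")
    return buf.getvalue()

def list_members(archive: bytes) -> list[tuple[str, int]]:
    """List (name, uncompressed size) pairs. Opening the archive reads the
    end-of-central-directory record and the central directory; no member
    data is decompressed."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]

print(list_members(make_archive()))
# → [('wow.wad', 1004), ('wow.txt', 11)]
```

A .tar.gz has no such index: to list it you have to decompress the stream from the start and walk every tar header.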

I would not be surprised if Twitter tweaked their algorithm and foo.zip went back to not being a URL.

(and damn Google for jumping on the goldrush bandwagon with these nonsense TLDs. I can forgive .xyz for being a) cheap and b) meaningless, which pokes the whole unrestricted TLD nonsense in the eye a little bit.)


Why are the retards on google registering a tld with the same name as one of the most popular file extensions? Maybe the most popular, actually, after txt I guess.


What's next? A ".com" TLD? How are we supposed to know the difference between an executable and a domain!?

chungy said:

What's next? A ".com" TLD? How are we supposed to know the difference between an executable and a domain!?

You laugh, but that's been an actual issue in Windows since IE 4.0 was integrated with the shell ;)


I guess I can't tell people to look up newgothic.zip anymore. Not that I normally add the .zip to it. From now on I guess people are going to make databases with .7z files instead.

Quasar said:

You laugh, but that's been an actual issue in Windows since IE 4.0 was integrated with the shell ;)



Has it?

I couldn't resist testing this, typing 'command.com' into the address bar of Windows Explorer and the results were what sane people would expect:

- a file with the given name exists in the path: This file gets launched.
- no file exists: Explorer tries to open a website with the given name.
- Internet Explorer always opens the website.

So obviously the local variant is given precedence, but only in Windows Explorer. Doesn't sound like a problem to me.


That's why just looking at a "%s.%s"-type string should not lead to assumptions about what it is.

It could be...

  1. A filename
  2. A URL (but without a protocol prefix such as ftp://, http:// etc., it should NOT be assumed it's a URL, even though in the context of a tweet it may make sense)
  3. An object/struct field-access statement in many programming languages
  4. etc.
It's a typical context-dependent interpretation problem, and there's no single "correct" approach (other than leaving it alone and not trying to guesstimate. Screw "user friendliness". If a luser thinks it's a URL, well, have him copy & paste it himself.)
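A sketch of that conservative policy in Python: only text with an explicit, allowed protocol prefix becomes a link, and bare "name.ext" tokens are never guessed at. The `ALLOWED_SCHEMES` choice and the `linkify` name are illustrative assumptions, not any particular site's code:

```python
import html
import re
from urllib.parse import urlsplit

# Schemes we are willing to turn into links (an illustrative choice).
ALLOWED_SCHEMES = {"http", "https", "ftp"}
# A candidate is an explicit "scheme://..." run up to the next whitespace.
CANDIDATE = re.compile(r"\b[A-Za-z][A-Za-z0-9+.-]*://\S+")

def linkify(text: str) -> str:
    """Wrap explicit URLs with an allowed scheme in <a> tags. Bare tokens
    such as doom.zip or command.com are left untouched. Text outside the
    links is returned as-is (escape it separately if rendering HTML)."""
    def replace(match: re.Match) -> str:
        url = match.group(0)
        parts = urlsplit(url)
        if parts.scheme.lower() in ALLOWED_SCHEMES and parts.netloc:
            safe = html.escape(url, quote=True)
            return f'<a href="{safe}">{safe}</a>'
        return url
    return CANDIDATE.sub(replace, text)
```

The trade-off is the one described above: some users will have to paste a schemeless address themselves, but a filename is never silently turned into a link to someone else's domain.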

VGA said:

Why are the retards on google registering a tld with the same name as one of the most popular file extensions? Maybe the most popular, actually, after txt I guess.


Well, after all, there are only 17,576 TLAs. The most obvious "victim" is the .com extension, which in modern OSes is little more than a legacy anyway. I don't know if it's even possible to create a .COM executable with modern compilers. Also, what about the Aminet habit of storing files extension-first? E.g. "zip.doom", "bin.data", "mod.music" etc.
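The 17,576 figure is just 26³, the number of three-letter strings over a to z (ignoring digits), which a quick Python check confirms:

```python
from itertools import product
from string import ascii_lowercase

# Every three-letter string over a-z: 26 * 26 * 26 combinations.
tlas = ["".join(letters) for letters in product(ascii_lowercase, repeat=3)]
print(len(tlas))  # → 17576
```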

Linguica said:

I think any link-parsing software should be smart enough to notice that org is the top-level domain here. And with the http:// at the beginning, a parser should even recognize unknown top-level domain names. In fact, this forum automatically forms a link from anything beginning with http:// or www., and it is also smart enough to get something like http.pk7.org/www/http/7z/rar/ftp/zip/www.zip right.

Incidentally, I had noticed that all the /idgames database URLs that ended in .zip were not being crawled by Google, even though they were perfectly valid, and Google absolutely refused to do so, even when given a sitemap.

Now that is too stupid not to be intentional. Most retarded decision ever.

At least there's also a URL ending in .txt accompanying each entry, so they can still be found.
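Python's `urllib.parse.urlsplit` shows what a parser sees in those examples: the host is everything between `//` and the first `/`, so wow.zip at the end of the gamers.org path is just part of the path. The helper name `host_and_tld` and the trick of prefixing `//` to schemeless input are assumptions for this sketch:

```python
from urllib.parse import urlsplit

def host_and_tld(link: str) -> tuple[str, str]:
    """Split a link into its host and top-level domain. Links without a
    scheme are parsed as network-path references by prefixing '//'."""
    if "://" not in link:
        link = "//" + link
    host = urlsplit(link).hostname or ""
    return host, host.rsplit(".", 1)[-1]

print(host_and_tld("http://www.gamers.org/pub/idgames/levels/doom/v-z/wow.zip"))
print(host_and_tld("http.pk7.org/www/http/7z/rar/ftp/zip/www.zip"))
print(host_and_tld("wow.zip"))
```

The first two report org as the TLD; the last one shows the new ambiguity, since a bare wow.zip parses as a host under zip.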


Obviously the Doom community needs to take ownership of the .wad and .pk3 TLD.

Maes said:

The most obvious "victim" is the .com extension, which in modern OSes is little more than a legacy anyway. I don't know if it's even possible to create a .COM executable with modern compilers.

It's not only a file extension, but also used for serial ports. If you try to use it as a file name on DOS or NT based systems, you'll be in trouble. It may even hang the computer under certain conditions. If you were really mean to DOS and Windows users, you could create a zip file in any competing OS and name all the files after DOS ports.

LogicDeLuxe said:

It's not only a file extension, but also used for serial ports. If you try to use it as a file name on DOS or NT based systems, you'll be in trouble. It may even hang the computer under certain conditions. If you were really mean to DOS and Windows users, you could create a zip file in any competing OS and name all the files after DOS ports.


Heh I forgot about those, but I think they almost always use 4-letter identifiers (COM1, COM2, LPT1, LPT2, NULL, etc.)

Gez said:

Obviously the Doom community needs to take ownership of the .wad and .pk3 TLD.


Last I checked it was around $185,000 to submit an application for a TLD (which might not be granted).

Doom community kickstarter? :)

Maes said:

Heh I forgot about those, but I think they almost always use 4-letter identifiers (COM1, COM2, LPT1, LPT2, NULL, etc.)

Well, you're right. You can name a file com in Windows. It refuses to name a file something like com1.txt, though.

LogicDeLuxe said:

Well, you're right. You can name a file com in Windows. It refuses to name a file something like com1.txt, though.


Heh, there's something I didn't know. FWIW, the name of the null device is "NUL". Other reserved TLAs include AUX, CON and PRN.
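Putting those observations together, a small Python check for the classic reserved DOS/Windows device names might look like this. The set below is the classic list (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9); newer Windows documentation lists a few extra variants, which are deliberately left out of this sketch:

```python
# Classic reserved device names. Assumption: newer Windows docs list a few
# additional variants that are not included here.
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)

def is_reserved_dos_name(filename: str) -> bool:
    """True if the part before the first dot is a reserved device name,
    so 'com1.txt' and 'NUL' are rejected while a plain 'com' is fine."""
    stem = filename.split(".", 1)[0]
    return stem.upper() in RESERVED
```

This matches what was found above: `com` is allowed, `com1.txt` is not, and the null device is `NUL`, not `NULL`.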


Interesting article, Linguica, and this decision to make .zip a domain just sounds really dumb.

As other people have said, .zip is probably the most widely used extension for compressed files, and lots of people use it. I really hope that Google will change their plans...

EDIT: I just tested whether a .zip file can still be downloaded or whether I get an error: on my phone the link works, but if I type file.zip in the URL bar, it sends me to a nonexistent .zip domain...

EDIT2: Oh boy, there's a .mov domain too? Sure, it's a pretty rare video format, but this is just silly...

walter confalonieri said:

EDIT2: Oh boy, there's a .mov domain too? Sure, it's a pretty rare video format, but this is just silly...


There'll be a TLD for any popular file extension in short order. What's silly is over-eager link-finding heuristics.

Maes said:

I don't know if it's even possible to create a .COM executable with modern compilers.



You can rename any .exe into .com and it still will launch. The .com format is indeed dead as it's strictly DOS.

Graf Zahl said:

You can rename any .exe into .com and it still will launch. The .com format is indeed dead as it's strictly DOS.


Out of curiosity, I explored a few ".com" files that still exist in the deep recesses of the Windows root directory (Windows 7, 64-bit SP1, for the record) with a hex viewer: they all have the "MZ" executable header, so I guess that this makes them technically ".exe" files, and that ".com" is simply considered as an alias for ".exe". Besides, I don't think a true .com file with 16-bit x86 code and exclusive 1-segment access could work under a 64-bit OS.
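The same hex-viewer check can be scripted in Python: read the first two bytes and look for the "MZ" signature. The System32 path in the usage part is an assumption for a typical Windows install and is simply skipped when it doesn't exist:

```python
from pathlib import Path

def has_mz_header(path) -> bool:
    """True if the file begins with the two-byte 'MZ' signature shared by
    DOS EXE and Windows PE executables. A genuine DOS .COM program is a
    flat memory image with no header, so it would normally return False."""
    with open(path, "rb") as f:
        return f.read(2) == b"MZ"

if __name__ == "__main__":
    # Assumed location; adjust for your system. Skipped if it doesn't exist.
    system32 = Path(r"C:\Windows\System32")
    if system32.is_dir():
        for com_file in sorted(system32.glob("*.com")):
            print(com_file.name, has_mz_header(com_file))
```

Because the loader goes by this header rather than by the extension, a renamed .exe still launches as .com, which fits the observations in this thread.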

Graf Zahl said:

You can rename any .exe into .com and it still will launch.

Just tried that. It even works on 64-bit. I'm surprised that it recognizes that extension at all and tries to execute it, even though it doesn't support 16-bit software to begin with.


And this comes up after all the other good file extensions are taken as TLDs, too! If we had only had more foresight, we could've staked out a claim to .meme, .blackfriday, and .wang before they were all snatched up by the unrelenting corporate machine.


Hmm, indeed, I don't see THAT much use of zip files these days. Applications are downloaded as .exe, .msi, .dmg, .deb, .rpm. PDFs and other media are downloaded directly. You can't even download a .zip on an iOS device unless you have an app for it. And haha: if there's a need to package multiple files, laymen tend to use .rar files more often than .zip!

Someone from this community should make or buy "doom.zip" as soon as possible! Even more, why not just grab them all?

ptoing said:

Or just use .7z which has a better compression rate than zip as well. \o/


or .xz that has better than 7z, or.... perhaps compression rate isn't the most important factor. (decompression *speed* is often pretty important too. And compatibility.)
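To weigh ratio against decompression speed on your own files, here is a small Python sketch using the standard library. `zlib` implements Deflate, the method most ZIP files use, and `lzma` writes the .xz container (LZMA2), the algorithm family .7z archives commonly use. The `compare` name is hypothetical, and the numbers it reports depend entirely on the data and the machine:

```python
import lzma
import time
import zlib

def compare(data: bytes) -> dict[str, tuple[int, float]]:
    """Return {codec: (compressed size in bytes, seconds to decompress)}."""
    codecs = (
        ("deflate (zlib)", lambda d: zlib.compress(d, 9), zlib.decompress),
        ("lzma (xz)", lambda d: lzma.compress(d, preset=9), lzma.decompress),
    )
    results = {}
    for name, compress, decompress in codecs:
        packed = compress(data)
        start = time.perf_counter()
        restored = decompress(packed)
        elapsed = time.perf_counter() - start
        if restored != data:  # round-trip sanity check
            raise ValueError(f"{name} round trip failed")
        results[name] = (len(packed), elapsed)
    return results

for codec, (size, secs) in compare(b"PWAD" + bytes(range(256)) * 4000).items():
    print(f"{codec}: {size} bytes, decompressed in {secs * 1000:.2f} ms")
```

Compatibility doesn't show up in a benchmark at all, which is a large part of why ZIP has stuck around.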

