Idea: METADATA lump for Slade and Doom Builder

DaniJ · May 15, 2015

Sorry kb1 but I must say that I think you are barking up the wrong tree entirely here and consequently, your proposed solution is unworkable in a practical sense due to its sheer complexity and number of prerequisites.

You state that you want to have confidence in the accuracy of the data and prevent mishaps due to user error. However, your format introduces so many vectors for this to occur that I fail to see how it addresses the problem. For example, storing modification dates, MD5 hashes etc... in the METADATA - why? A lot of this kind of information is already present in the data when using a format like ZIP. (I appreciate that not all ports currently support ZIP).

Reliance on a single editor for modifying METADATA is a completely unnecessary requirement if one instead uses a format which inherently prevents user error by virtue of its fundamental design. Not only does this increase complexity at a design level but also means you now have a specialist tool that must be "ported" to any number of platforms in order to use your format.

Rather than try to "converge" the metadata of separate logical components (e.g., maps by different authors) into a single METADATA file, this problem can be solved far more elegantly by introducing the "component" concept into the design of the format itself. In other words, each component has its own METADATA and the author(s) of each component is responsible for ensuring the accuracy and integrity of their specific component.

Such components can then be combined dynamically and METADATA conflicts brought to light automatically.

kb1 · May 16, 2015

The only real complexity (other than explaining the concept) is in dropping MD5 support in, and calling it. Sure, you can avoid verifying the lump, but then it cannot be trusted.

The format doesn't introduce any error that is not inherently already there. It is the minimum complexity required to get the job done, in a way that doesn't completely suck. Actually, it's pretty darn simple, if you leave out the name/value pair custom property idea.

It would work perfectly if the editor supports it, and will maintain itself as much as possible in editors that don't support it. That functionality comes as a freebie.

It's the best you can do without dedicated fields in a new WAD format.

I am basically trying to provide a way to handle the OP's idea, which appears to have the potential to provide some benefits.

DaniJ, you say that my 6-field flat file lump is complex, yet you describe a separate as-of-yet-undefined different-per-component format - how on Earth could an editor support that, and how is that less complex?

You've gone this route in multiple discussions, anytime someone starts to describe a simple plan to try to standardize a functionality across ports. I wonder why that is... I think I know why: Is it because you've implemented a lot of these things into your port already?

I think it annoys you that your custom formats have not been adopted by other ports. If I'm wrong, forgive me. But, I have to ask: Did you collaborate with others when you were designing those formats? Cause that's what I am trying to do - collaborate, so we can reach an agreement, and get some standards in place.

Yet, even on something this simple, things get over-engineered to the point that no one wants to adopt it.

Please don't do that to this discussion.

If there are valid reasons why my idea is non-optimal, I will recognize those, and I will work with anyone to remedy the situation. Essentially, it's not my idea, but I chose to offer a possible solution. I offered explanation as to why certain requirements were necessary, and what the benefits were.

If anyone can devise a different solution that is as reliable, as easy to implement, and has no additional cons, please describe it.

If it's worth doing, it's worth doing right.

DaniJ · May 16, 2015

kb1 said:
@DaniJ - (Questions regarding my motivation)

I'll address these points first because I think it is important to get these out of the way so we can continue to discuss a potential solution for the original idea, unencumbered by inter-personal disputes and questioning of motives.

You've gone this route in multiple discussions, anytime someone starts to describe a simple plan to try to standardize a functionality across ports. I wonder why that is... I think I know why: Is it because you've implemented a lot of these things into your port already?

Categorically and emphatically - NO.

Let me start by saying that although I do have obvious vested interest in this topic (and indeed the chosen solution) due to my involvement with Doomsday - I can assure you that my only concern here is the long term future of the modding community. In such discussions I try to set aside my interests as a Doomsday developer in the hopes that together we can arrive at a robust solution that has a long term positive effect for the community.

I think it annoys you that your custom formats have not been adopted by other ports. If I'm wrong, forgive me.

Maybe there is a tiny bit of that if I'm entirely honest but the flip side of that is the DOOM community as a whole has not really taken the time to understand solutions we have introduced (only to adopt something similar but with significant interoperability issues down the line).

I certainly appreciate how you arrived at that opinion of me but I can honestly say that it really doesn't matter to me one way or the other whether the wider community adopts a solution of "ours" or not because whatever the outcome I'll have to support it in any case.

Please do not reduce my argument to a simple "NiH power play" move. That is not a factor for me personally and never has been.

But, I have to ask: Did you collaborate with others when you were designing those formats?

Well, at the time these formats were designed back in 2005/6 it would have been completely pointless to try and involve others in the design process because back then, Doomsday was one of the only ports to show any interest in moving away from the extreme constraints of the WAD container format. At the time, other big name ports, like ZDoom, turned down the idea of using ZIP, of using native folders on the local system in a global virtual file system, and more.

I realize you might not be fully aware of the history but please don't think that this has much bearing on matters as from my perspective at least - this is all water under the bridge and we all must now deal with the interoperability fallout which has occurred since.

Yet, even on something this simple, things get over-engineered to the point that no one wants to adopt it.

Please don't do that to this discussion.

If it's worth doing, it's worth doing right.

I must say that these points seem fundamentally and diametrically opposed to me. Surely if it's worth doing right then a robust and scalable solution that can stand the test of time is the right way to go?

At the end of the day, it is far less painful to discuss the merits of a proposed solution and to iteratively refine it than it is to deal with the fallout of a failed, short-sighted implementation.

Edit: It is now very late here so I'll respond with regard to the technical aspects of your solution later.

Gez · May 16, 2015

Including the hash or checksum seems necessary to me, as a way to check that the corresponding lump was last modified with a utility supporting the metadata feature. If that control value doesn't fit, then the lump was modified and the rest of the metadata cannot be considered reliable anymore.
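
A minimal sketch of that check, assuming the metadata records an MD5 hex digest per lump. The function name and the way the recorded value is passed in are hypothetical; the thread does not fix a record layout.

```python
import hashlib


def lump_is_unmodified(lump_data: bytes, recorded_md5: str) -> bool:
    """Return True if the lump still matches the MD5 recorded in its metadata.

    A mismatch means the lump was changed by a tool that did not update the
    metadata, so the rest of that lump's metadata should not be trusted.
    """
    return hashlib.md5(lump_data).hexdigest() == recorded_md5.lower()


# Hypothetical usage: the digest is recorded when a supporting editor saves the lump.
lump = b"example lump bytes"
recorded = hashlib.md5(lump).hexdigest()

print(lump_is_unmodified(lump, recorded))         # lump untouched since it was recorded
print(lump_is_unmodified(lump + b"!", recorded))  # lump edited by an unaware tool
```

The hash is used here only as a staleness check, not as protection against deliberate tampering.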

Including the username is something I am a bit wary about, though. Should it be the username as taken from the operating system? Or a username that people have to choose in their editor config? In any case, there's no way to prevent ten different people from inputting "fred" as their username.

DaniJ · May 16, 2015

kb1 said:
The only real complexity (other than explaining the concept) is in dropping MD5 support in, and calling it. Sure, you can avoid verifying the lump, but then it cannot be trusted.

The complexity comes not from the design of the lump (though I have issues with that too) but from the fact it exists in the first place and that a specialist tool is required in order to maintain it.

The thinking here doesn't make complete logical sense. On the one hand you say this data is 'optional' and that editors which don't support it can therefore ignore it; on the other you say that a supporting editor is required to ensure the integrity of the data, recalculating hashes and updating the metadata to keep it "in sync".

Looking at this in simple terms, such a METADATA lump is ostensibly replicating functionality in WAD which is integral to the fundamental design of all-round better container formats, such as ZIP.

MD5 hashes only become necessary in METADATA because of the need to validate that the metadata remains current to the data which it describes. However, in order to do so in a practical sense it requires a specialist tool, to keep track of edits and update the METADATA lump.

So, if one removes the need to verify that two disjoint pieces of information (disjoint because it is an "optional" augmentation) still correlate then we no longer require the hashes. (Certainly they would likely be useful for other purposes, don't get me wrong, but let's ignore that for now).

This reminds me of another data schema for augmenting WAD with information which is integral in better container formats, called DD_DIREC (introduced all the way back in 2002). This lump introduces a virtual folder hierarchy and removes the character (/length) restrictions for lump names. Like your would-be METADATA lump, the DD_DIREC schema also requires a specialist tool which has to be integrated into one's workflow.

Have you heard of it? I bet you haven't because it never took off. Sure, some people made use of it for a short while, for things like 3D models. DD_DIREC became obsolete the moment ZIP support arrived, because it was simply too cumbersome to use that specialist tool every time one wanted to update the WAD.

Nowadays DD_DIREC support is just baggage that we maintain simply because it was at one time useful.

The format doesn't introduce any error that is not inherently already there. It is the minimum complexity required to get the job done, in a way that doesn't completely suck. Actually, it's pretty darn simple, if you leave out the name/value pair custom property idea.

That is only true if one considers the WAD format and the proposed METADATA solution in a vacuum, detached from a reality in which far better formats exist and which have integral mechanisms for ensuring the integrity of the data they contain. Furthermore, such formats are industry wide standards, with implementations that allow users to easily produce and modify such files, on practically any platform, without the need for niche/specialist tools.

Essentially, the basis of my argument is - "Why use WAD when better solutions exist?"

As I see it, METADATA has been designed starting from a perspective of needing a way to augment the WAD container format. Rather, one should instead use a more modern format as a basis for designing a supplemental metadata schema as this avoids potentially skewing to integrate concepts that otherwise don't need to be considered (case in point - verifying the integrity of and correlation of the elements in METADATA with the lump data).

as-of-yet-undefined different-per-component format

I was actually arguing a theoretical point, which does not require a formal definition or design specification. It's a purely logical argument.

As we've already uncovered, the documentation for the format which I was proposing isn't up to scratch so I can't even direct you there or use it to argue my point. I'm actually on fairly shaky ground given the current documentation shortcomings, so I'm trying to work with what I do have (reason and logic).

Linguica · May 16, 2015

DaniJ said:
Essentially, the basis of my argument is - "Why use WAD when better solutions exist?"

This argument is a nonstarter in this case though. Obviously the WAD format is extremely limited and in a perfect world we'd all use ZIPs or PK3s or whatever. But the whole point is to design something that is entirely backward compatible and can be used with vanilla WADs. If you start talking about different file formats then that's a discussion about source port features.

DaniJ · May 16, 2015

That is a fair point. However, as mentioned in the above post - it surely makes more sense to assume that a format like ZIP is supported by all sourceports and to then design a solution which can deploy metadata to WAD, in a way that is compatible with the core high-level concepts.

Graf Zahl · May 16, 2015

DaniJ said:
it surely makes more sense to assume that a format like ZIP is supported by all sourceports

... as they say: Assumption is the mother of all f*ckups.

And since this assumption is not even remotely true, any work based on it is a waste of time.
(The only active ports with Zip support are, to my knowledge: ZDoom and children, Eternity and Doomsday.)

DaniJ · May 16, 2015

Erm, yeah... I am talking about a logical assumption as the basis for reasoned design. That's not the same thing, Graf.

Graf Zahl · May 16, 2015

What you call 'reasoned design' I call 'ignoring reality'.

DaniJ · May 16, 2015

(Sigh). Funny, I was thinking the exact same thing about METADATA (as I explained, earlier). But let's not digress... instead we should try a different approach because clearly we have reached an impasse.

Would you agree that a single, shared solution for mod metadata which can be deployed to any container, regardless of format, is a worthwhile design trait? (I'm going to assume yes.)

Let us also assume that we have already solved the problem of deploying a shared solution to WAD as well as ZIP.

Would you say that a binary representation for metadata is the right choice, or that a textual representation is better, being fundamentally more extensible and easier to work with?

Graf Zahl · May 16, 2015

Personally, I don't see much use in the metadata stuff to begin with.

But if some people want it, it needs to be discussed under realistic conditions.
And under such conditions it should be:
- one unified solution
- simple to implement
- in text format

I don't see much use in a binary format; since this is supposed to be informative, it should be in a readable form. I'd also ignore the possibility of some people deliberately destroying it. If too much focus is put on such fringe situations, the necessary focus will be lost.

DaniJ · May 16, 2015

Good, we have reached a common understanding to build upon.

So on that basis, does it make sense to introduce MD5 hashes, modification timestamps, et al. into the unified metadata solution for the purpose of deploying to WAD?

esselfortium · May 16, 2015

Given that modification timestamps are one of the basic goals outlined in the first post, it would be pretty strange to pursue a metadata system without timestamps. :P

The point is to know when something was last edited and to be able to have optional tags automatically follow resource lumps from one wad to the next, done automatically by Slade/Doom Builder without the user needing to manually do it. If it only works on certain port-specific formats or requires the user to manually transfer it (no one will bother doing this), it fails its primary goals.

DaniJ · May 16, 2015

I agree, Essel. The point is not whether that information is necessary, rather it is whether it should be included in a unified metadata representation that will be deployed to many different containers.

If we agree that the WAD format is the weakest link - i.e., it does not support an integral mechanism for tracking data modification - then does it make sense to encumber the unified metadata representation to hold that information?

Would it not be better to use an additional lump for WAD deployment which marries the unified metadata representation with the few bits of information that WAD lacks, when compared to other deployments such as ZIP?

esselfortium · May 16, 2015

Possibly? :) I'm not entirely sure what you're suggesting. Do you mean something like the lump you mentioned that would give directory structure and long lump names in WADs? I do like the idea of that, but it gets further away from something that would only need to be supported by a few tools and could be safely ignored by the game engine and anything else.

Graf Zahl · May 16, 2015

DaniJ said:
Would it not be better to use an additional lump for WAD deployment which marries the unified metadata representation with the few bits of information that WAD lacks, when compared to other deployments such as ZIP?

That negates 'simple'.

DaniJ · May 16, 2015

Something along those lines yes. At least from my perspective, it makes sense to keep those specific pieces of information out of the unified metadata and to instead, deliver it in the only place it is required - in WAD.

Graf Zahl · May 16, 2015

That might be correct if there was some guarantee that every tool preserved that information in Zips.

DaniJ · May 16, 2015

I am inclined to think that it's unreasonable to consider the implications of broken ZIP implementations. Why encumber the design of a unified metadata solution on the off chance that somebody somewhere has failed to implement the well-defined ZIP standard correctly?

Graf Zahl · May 16, 2015

Why encumber it by being selective about the container format?

DaniJ · May 16, 2015

How am I being selective with regard to the choice of container format? As we have already established, our unified text format representation could be deployed to any container, on the assumption that there is no direct dependency on the requirements of one container.

Surely, that is true of METADATA but not of our (theoretical) unified text representation?

Gez · May 16, 2015

DaniJ said:
Looking at this in simple terms, such a METADATA lump is ostensibly replicating functionality in WAD which is integral to the fundamental design of all-round better container formats, such as ZIP.

Yes. Which is why I think the first step is to determine what kind of metadata should be stored.

If it's only metadata that is already handled by zips, then it becomes tempting to say "just use zips".

kb1 suggested a lot more Doom-specific information, some of which could be hard to standardize. (How would you say "this map uses jumping and arch-vile ghosts" in a way that every active port could understand?)

Several of the suggested metadata items don't feel, to me, like the province of tool-generated information. Map compatibility options and port requirements seem to me to be something that should be in the text file, and attempting to have tools generate that information would mean either bloating tools with tons of fallible heuristics or prompting the user with annoying forms to fill in when creating a new map.

Linguica said:
But the whole point is to design something that is entirely backward compatible and can be used with vanilla WADs.

You can always go the Freedoom route and have a development repository tracked by a real versioning system, plus an automated script that spits out a wad at the end.
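
As a rough sketch of the "script that spits out a wad" step (not Freedoom's actual build tooling), a PWAD can be assembled from named lumps with nothing but the standard library. It relies on the usual WAD layout: a 12-byte header (magic, lump count, directory offset) and 16-byte directory entries (offset, size, 8-byte name). The function name `build_pwad` is hypothetical.

```python
import struct


def build_pwad(lumps):
    """Return the bytes of a PWAD containing `lumps`, a list of (name, data) pairs.

    Lump order is preserved, which matters for map lumps and markers.
    """
    body = b""
    directory = b""
    offset = 12  # lump data starts right after the 12-byte header
    for name, data in lumps:
        raw_name = name.upper().encode("ascii")
        if not 1 <= len(raw_name) <= 8:
            raise ValueError(f"lump name must be 1-8 characters: {name!r}")
        # Directory entry: file offset, size, name padded with NULs to 8 bytes.
        directory += struct.pack("<ii8s", offset, len(data), raw_name)
        body += data
        offset += len(data)
    # Header: magic, number of lumps, offset of the directory (placed after the data).
    header = struct.pack("<4sii", b"PWAD", len(lumps), 12 + len(body))
    return header + body + directory


# Hypothetical usage: in a real build the lump data would be read from the repository.
wad_bytes = build_pwad([("CREDITS", b"textures by fred\n")])
print(len(wad_bytes))  # 12-byte header + 17 bytes of data + 16-byte directory = 45
```

With the sources under version control, authorship and modification history come from the repository log rather than from anything stored in the wad itself.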

DaniJ · May 16, 2015

I absolutely agree, Gez. That is the point behind my argument. METADATA, as it stands, is a short-sighted and WAD-centric view of the world. Naturally, focusing solely on ZIP is also narrow-minded; however, as I argued earlier, it makes more sense to use this as a basis for design than to use WAD (because of the implied specialist tools required to maintain it).

Graf Zahl · May 16, 2015

DaniJ said:
How am I being selective with regard to the choice of container format?

Because the only additional information Zips store is the file modification timestamp. And adding metadata just for that is indeed a waste of time.

But let me restate, I think the entire idea is fundamentally wrong if it's intended to help track down changes to the data.

If someone wants that I agree with Gez that going the Freedoom route of using a real versioning system is what should be done, even if it requires alteration of the editing workflow. But you'd get all the relevant info virtually for free plus a complete history of all changes.

I also get the feeling that the understanding of 'metadata' that different people have here is not the same, so we really need to get some consensus of what we actually want here before starting to discuss the fine details.

esselfortium · May 16, 2015

I just realized that I previously missed kb1's long post at the bottom of page 1. I think what's being suggested there makes sense!

One unfortunate thing I see about that proposal, though, is that by using the size and md5 to match metadata entries to lumps, rather than only to verify that the lumps haven't been changed since the listed date, it means that a lump's metadata is lost entirely whenever it's edited with an unsupported tool.

That then raises the question of how else they could be matched to lumps, though. I suppose that Slade could potentially be made to attempt to match up any orphaned entries based on lump names and ordering.
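
One way that fallback could look, as a sketch only: match on size and MD5 first, then try to re-attach orphaned entries by lump name in directory order, flagging them as unverified. The `Entry` fields and the function name are hypothetical, since no record layout has been agreed on here.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical metadata record for one lump."""
    name: str
    size: int
    md5: str
    author: str


def match_entries(lumps, entries):
    """Map lump index -> (entry, verified) for `lumps`, a list of (name, data) pairs."""
    matched = {}
    unused = list(entries)

    # Pass 1: exact match on size and MD5; these entries can be trusted.
    for index, (name, data) in enumerate(lumps):
        digest = hashlib.md5(data).hexdigest()
        for entry in unused:
            if entry.size == len(data) and entry.md5 == digest:
                matched[index] = (entry, True)
                unused.remove(entry)
                break

    # Pass 2: orphaned entries are re-attached by name, in directory order.
    # The lump changed since the entry was written, so mark it unverified.
    for index, (name, data) in enumerate(lumps):
        if index in matched:
            continue
        for entry in unused:
            if entry.name == name:
                matched[index] = (entry, False)
                unused.remove(entry)
                break

    return matched
```

An editor could then keep the author and origin fields of an unverified match while discarding its timestamp, which is the part an outside edit actually invalidates.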

@Graf Zahl: A versioning system does not solve the same problem as METADATA at all, because users need to opt into a versioning system, and more importantly, versioning information is not transmitted from one project to the next. With METADATA-equipped tools, for instance, you can go back to a resource wad you threw together carelessly a year ago and see exactly where you got each texture from so that you can put together a reliable credits list. And if the wads you got the textures from were also METADATA-equipped, you might even have specific author information for each individual texture, rather than having to guess about who made what based on a vague list in the accompanying textfile.

As for your other question, in my opinion the METADATA lump is solely relevant to editing tools, not to source ports. It's for tracking timestamps and authors and origins of lumps, not for defining anything that would be useful ingame. We already have more than enough mechanisms for setting up game-related settings.

DaniJ · May 16, 2015

My understanding of worthwhile metadata in this context is information concerning the "who and why" aspects, and delivering it primarily on an FYI basis to users.

I agree that using it as little more than a vehicle for delivering "what and when" is the wrong outlook and does indeed fall into the realm of duplicating integral functionality already present in better container formats.

Linguica · May 16, 2015

I feel like referring to this idea as a METADATA lump is bringing a lot of unnecessary baggage and assumptions. It might be better to consider referring to it as a SOURCES lump or something else like that, to make it clear that this is not intended as anything more than a convenient way to store basic information about the provenance of resources.

To draw an analogy... as it is, most WADs come with a text file that more often than not has a lot of relevant information in a more-or-less standardized format (the /idgames archive web frontend wouldn't exist if this wasn't the case). But to the best of my knowledge, no source port has ever tried to detect it and integrate it in any way, or tried to lobby to have it extended*, because it's always been assumed it's just a nicety that people usually try to include, and can't be assumed to always be reliable.

Which, as far as I can tell, is basically what this idea is meant to be - a nicety that we would like people to include, albeit integrated directly into the WAD file for convenience.

*beyond the addition of an "advanced engine needed" field near the top, I suppose.

DaniJ · May 18, 2015

As we now seem to be (more or less) on the same page with the general idea and have a rough handle on the design of a good potential solution - let's try to define in more concrete terms which elements of metadata should be included and why.

As Ling pointed out, the text files usually found in mods on /idgames are a great place to start, so let's start by looking at the questions asked by the Text File Generator and see if there is anything that shouldn't be there, or anything conspicuously missing that would be useful, in a standardized representation supported by many source ports.

The first thing that stands out to me - as being tricky to generalize - is the breakdown of contents: Graphics, Sounds, Demos, DEHACKED, etc. I would not like to see equivalent fields in a would-be standardized metadata representation because that is an "open-ended itemization" (what about Models? Particle effects? etc.). This kind of itemization can be answered more correctly through dynamic inspection of the mod's contents if necessary. So, if this information is to be conveyed in a standardized representation then it would be best to do so in summary, either within the mod description, or a similar property with a narrower scope (e.g., Resources = "Graphics, models, sounds, stuff...").

DaniJ · May 18, 2015

In the interest of moving the discussion forward I'd like to expand upon/make clear the ideas mentioned in my previous post.

Open-ended itemization
As there is no way of knowing today what types of content (images, 3D models, sounds, etc.) might be needed at some point in the future, it makes sense to use a representation which allows for future expansion. However, there is also a need to allow robots to search the mod metadata, to perhaps compile a list of those which contain specific types of content.

Clearly, we need a mechanism that allows for new "content type identifiers" to be introduced to metadata by anyone at any time.

For this purpose I would like to suggest we introduce a generalized "content tagging" mechanism, as a core concept in metadata.

Content tagging
Essentially this would work through the introduction of a markup language that can be used in text fields/properties. Each content type can be assigned a name which can be used to identify content types anywhere in a field, so as to avoid potentially confusing use of these names in the same context. For example, a mod for a high-resolution sprite pack might refer to "3D models" in a description like "...I created this high-res sprite pack because the 3D models suck!" - obviously we don't want robots to encounter the "3D models" term and include it in a result listing for a search for "3D models", as that is not what the user expects.

A generalized tag could look something like this:

Resources = "Awesome {3D model}s to replace the really old fashioned sprites."

Such markup would allow robots to parse supporting text fields in metadata to determine that the mod contains 3D models, while ignoring the irrelevant use of "sprites".

This kind of open-ended tagging system avoids the need to itemize all possible content types, a priori, while allowing for future extensibility and searching for content types unknown to us at design time.
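
To illustrate how a robot might read such a field, here is a small sketch that pulls the tagged content types out of a text property. The brace syntax is taken from the example above; the lower-casing and whitespace trimming are assumptions made so that "{3D model}" and "{3d Model}" index the same way.

```python
import re

# A tag is any run of text enclosed in a single pair of braces.
TAG_PATTERN = re.compile(r"\{([^{}]+)\}")


def content_tags(text):
    """Return the set of content type names tagged in a metadata text field."""
    return {tag.strip().lower() for tag in TAG_PATTERN.findall(text)}


resources = "Awesome {3D model}s to replace the really old fashioned sprites."
print(content_tags(resources))  # {'3d model'}

# Untagged mentions are ignored, so a description like this yields no tags at all.
description = "I created this high-res sprite pack because the 3D models suck!"
print(content_tags(description))  # set()
```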
