Idea: METADATA lump for Slade and Doom Builder

DaniJ · May 19, 2015

esselfortium said:
I think our goals might be hopelessly at odds, then, because being able to track things like that was the entire impetus behind this whole proposal :)

Perhaps indeed. I think it rather depends on what you intend to use the history for and how accurate you need it to be.

In the general case, if a long and accurate history is required for tracking edits to files then it is usually best to use a tool designed expressly for that purpose, a versioning system like Git or SVN.

If you only need a rough idea of the history, then the metadata could certainly be designed to accommodate that, and a specialist tool like SLADE could be enhanced to update it if you are working within a ZIP or WAD container. (Perhaps such a tool could even interact with a third-party versioning system to compile the edit history and insert it into the metadata for you.)

esselfortium · May 19, 2015

Ah, whoops. Crossed wires there. I wasn't referring to the history-keeping stuff, which definitely wasn't part of the original plan and might be overkill for this, realistically. I just meant the Timestamp and Origin fields.

DaniJ · May 19, 2015

Personally, I would leave modification timestamps out of the unified metadata representation because in the general case this is information usually supplied by the underlying native/local file system and/or better container formats such as ZIP. We can find a better way to include this information on a per-lump basis for WAD.

'Origin' either meaning Author or Source, aka provenance, is absolutely information that should feature in the metadata.

DaniJ · May 19, 2015

esselfortium said:
How would it {metadata} be stored if not in a single master record (lump in a WAD, or file in a ZIP)? I can't speak for sirjuddington or Gez, but I imagine managing individual METADATA lumps (or files) for every individual lump (or file) would be a lot more work on the editor-support side.

As we need to avoid a long-winded 1:1 relationship between metadata and data, clearly we need a way to combine some pieces of metadata and have them correspond to multiple data files.

One way to achieve that would be a "master record" style approach that describes relationships between elements of metadata and their corresponding data file(s).

Another way to do that is to have a single metadata file correspond to one or more data files.

Clearly we want to avoid aggregating metadata for many files into a master record for the reasons we've already discussed. So the better option here is to relate a single metadata file to multiple data files.

One way to introduce such a relationship is through the introduction of an organizational structure into the representation, creating a "physical" grouping of metadata and data, and thereby establishing a "scope".

In a native/local file representation we could achieve that by simply introducing a new sub-folder, moving a subset of the data files into it and then adding another metadata file for this subset.

Let's examine whether our other deployment targets support a comparable mechanism. ZIP most certainly does support folders, so there is already a strong argument for taking this approach. WAD, however, has no built-in mechanism for structuring data hierarchically. Still, by augmenting it with a data schema like DD_DIREC, we could deploy to WAD as well, since that schema introduces our hierarchy.

So, using this model, we are able to:
* Deploy the same unified metadata representation to ZIP, WAD and the native/local file system
* Avoid encumbering the metadata with information specific to a single target
* Allow vanilla to ignore both the metadata and the lump which contains the virtual directory structure in WAD
* Allow existing WAD lump editors to extract both metadata and data lumps without needing to understand either, or indeed the structure (in this case the responsibility passes to the mod author)
* Allow actively-maintained WAD lump editors to be updated to hide the manipulation of the data schema entirely from the user
* Allow dynamically resolving metadata conflicts by traversing the hierarchy and attempting to unify metadata into a "master record" at runtime
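The last point can be sketched roughly as follows. This is a minimal illustration, assuming each virtual folder may carry its own metadata mapping and that a deeper folder's values override those inherited from its parents; that override rule, the path syntax and the function name `effective_metadata` are assumptions, not part of the proposal.

```python
from typing import Dict

# Hypothetical: metadata attached to each virtual folder, keyed by folder path.
# "" stands for the root scope of the WAD, ZIP or native directory.
Scopes = Dict[str, Dict[str, str]]


def effective_metadata(scopes: Scopes, file_path: str) -> Dict[str, str]:
    """Merge metadata from the root scope down to the file's own folder.

    Assumption: a key defined in a deeper folder overrides the same key
    defined higher up in the hierarchy.
    """
    folders = file_path.split("/")[:-1]  # folder components, without the file name
    merged: Dict[str, str] = dict(scopes.get("", {}))
    prefix = ""
    for part in folders:
        prefix = f"{prefix}/{part}" if prefix else part
        merged.update(scopes.get(prefix, {}))
    return merged


if __name__ == "__main__":
    scopes = {
        "": {"Source": "Example Author"},
        "music": {"Source": "Bobby Prince"},
    }
    print(effective_metadata(scopes, "music/D_RUNNIN"))  # {'Source': 'Bobby Prince'}
    print(effective_metadata(scopes, "maps/MAP01"))      # {'Source': 'Example Author'}
```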

Gez · May 19, 2015

DaniJ said:
Aggregating the metadata for multiple files into a single "master record" introduces many problems. Using a WAD-specific format further compounds these problems.

Which problems?

You have to understand that, if such a thing were implemented, the additional properties (timestamp and source/author/origin string) would be part of the ArchiveEntry class, not the Archive class.

When loading a wad file, that metadata lump is loaded, and the properties are assigned to the entries. When saving the wad file, the old metadata lump is simply forgotten about, and a new one is generated based on the properties on all the lumps in the archive.

Conceptually, this is no different from the other aggregation of metadata for all the multiple files in the archive into a single master record that is already taking place, and is known as the wad directory.
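Gez's load/save lifecycle, sketched in Python under stated assumptions: `ArchiveEntry` here is a hypothetical stand-in rather than SLADE's actual class, and because how a record is matched to an entry is still open at this point in the thread, the key function is pluggable (directory position by default).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ArchiveEntry:
    # Hypothetical stand-in for an editor's entry class: metadata is a per-entry property.
    name: str
    data: bytes
    props: Dict[str, str] = field(default_factory=dict)


# How a record is matched to an entry is still being debated, so the key is pluggable.
KeyFn = Callable[[int, ArchiveEntry], object]


def by_index(i: int, e: ArchiveEntry) -> object:
    """Default key: the entry's position in the directory."""
    return i


def assign_on_load(entries: List[ArchiveEntry],
                   records: Dict[object, Dict[str, str]],
                   key: KeyFn = by_index) -> None:
    """Copy each parsed metadata record onto the entry it belongs to."""
    for i, e in enumerate(entries):
        e.props = dict(records.get(key(i, e), {}))


def regenerate_on_save(entries: List[ArchiveEntry],
                       key: KeyFn = by_index) -> Dict[object, Dict[str, str]]:
    """Forget the old metadata lump; rebuild it from the entries' current properties."""
    return {key(i, e): dict(e.props) for i, e in enumerate(entries) if e.props}
```

The point it illustrates is that the metadata lump is never edited in place: it is read once on load and discarded, and a fresh one is produced from whatever the entries hold at save time.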

For zip archives, it'd be implemented differently, as zip already supports file metadata: timestamp is an integral part of its specifications, and there just happens to also be a way to store arbitrary custom data as well. So, talking about zips is essentially redundant, the capability is there and standard, all we need to do is agree on the unique identifier we'd use. I suggest 0x4D44 (MD for metadata in ASCII), or 0x0666 just to be cute.
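A sketch of the zip side, assuming the standard extra-field layout from the ZIP specification (a 2-byte header ID, a 2-byte data size, then the data, all little-endian) and Gez's suggested ID 0x4D44. Whether 0x4D44 collides with an ID already registered with PKWARE is not checked here, and the helper names are hypothetical. Python's `zipfile` exposes the raw bytes as `ZipInfo.extra`.

```python
import io
import struct
import zipfile
from typing import Optional

METADATA_HEADER_ID = 0x4D44  # Gez's suggestion ("MD"); not checked against registered IDs


def build_extra_field(header_id: int, payload: bytes) -> bytes:
    """One extra-field block: 2-byte header ID, 2-byte data size, then the data."""
    if len(payload) > 0xFFFF:
        raise ValueError("extra-field data is limited to 65535 bytes")
    return struct.pack("<HH", header_id, len(payload)) + payload


def find_extra_field(extra: bytes, header_id: int) -> Optional[bytes]:
    """Walk concatenated extra-field blocks and return the payload for header_id."""
    pos = 0
    while pos + 4 <= len(extra):
        hid, size = struct.unpack_from("<HH", extra, pos)
        start = pos + 4
        if hid == header_id:
            return extra[start:start + size]
        pos = start + size
    return None


def roundtrip(name: str, data: bytes, meta: bytes) -> Optional[bytes]:
    """Store one member with a metadata extra field in an in-memory ZIP, then read it back."""
    buf = io.BytesIO()
    info = zipfile.ZipInfo(name)
    info.extra = build_extra_field(METADATA_HEADER_ID, meta)
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(info, data)
    with zipfile.ZipFile(buf) as zf:
        return find_extra_field(zf.getinfo(name).extra, METADATA_HEADER_ID)
```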

With that out of the way, let's focus on how it'd be stored in wad files.

kb1 said:
Good Lord, I created a shitstorm.

First, a correction: I never said that unsupported editors should ignore the METADATA lump, what I meant was that unsupported editors have no choice but to ignore the lump, because they know nothing about the METADATA lump.

I am glad that some people here get it. Here's the whole idea:

If you use a supported editor, you can tag some additional info to a lump. In the interest of simplicity and universal support, I suggest, at a minimum:

LumpName, Size, MD5, Date_Created, Date_Modified, Editor, Source, Misc

All of these are generated automatically by the editor, except for Source, and Misc, which can be set by the user.

Required fields
LumpName: Name of lump being described. Used with Size and MD5 to validate metadata.

Size: Size of lump being described. Used with LumpName and MD5 to validate metadata.

MD5: MD5 stamp in hex format. Used with LumpName and Size to validate metadata.

That's seriously overkill. All you need is the hash. (I'd be tempted to use CRC-32 checksum instead, because that's already coded in SLADE 3). If two files have the same hash, they are the same file. If I import hitfloor.mid into a wad, it becomes HITFLOOR lump. Renaming it to D_RUNNIN doesn't change its author or source.

And if I replace all the music in Doom II by identical copies of hitfloor, then they all share the same metadata, don't they?
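Gez's checksum-only association can be sketched as a dictionary keyed by CRC-32 (here from Python's `zlib.crc32`); the class name is hypothetical. Lump names never enter the lookup, so renaming HITFLOOR to D_RUNNIN keeps its record, and byte-identical copies resolve to the same record.

```python
import zlib
from typing import Dict, Optional


class ChecksumMetadata:
    """Metadata keyed purely by content checksum; lump names are ignored."""

    def __init__(self) -> None:
        self._by_crc: Dict[int, Dict[str, str]] = {}

    def set(self, data: bytes, props: Dict[str, str]) -> None:
        self._by_crc[zlib.crc32(data)] = dict(props)

    def get(self, data: bytes) -> Optional[Dict[str, str]]:
        # Any lump with byte-identical content resolves to the same record,
        # whatever it is called and wherever it sits in the directory.
        return self._by_crc.get(zlib.crc32(data))
```

The flip side, raised in the replies, is that lumps with identical contents, such as zero-length marker lumps, cannot carry different records under this scheme.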

DaniJ · May 19, 2015

I appreciate your argument, Gez, and it is a fair point. However, what practical purpose does it serve to shoe-horn our custom metadata into the integral mechanism provided by ZIP, when we'd have to come up with another special mechanism for WAD and yet another for native files?

By definition you are destroying one of the main benefits of a unified metadata representation, i.e., that one can easily extract it and deploy it to another container with zero modifications.

esselfortium · May 19, 2015

Gez said:
That's seriously overkill. All you need is the hash. (I'd be tempted to use CRC-32 checksum instead, because that's already coded in SLADE 3). If two files have the same hash, they are the same file. If I import hitfloor.mid into a wad, it becomes HITFLOOR lump. Renaming it to D_RUNNIN doesn't change its author or source.

And if I replace all the music in Doom II by identical copies of hitfloor, then they all share the same metadata, don't they?

I can imagine some exceptions to this, where you'd want to have different metadata on lumps with matching contents. Marker lumps would all have the same size and hash, but can functionally serve different purposes. (For instance, the _MARBLE and similar markers used to categorize patch/flat groups in CC4-tex and BTSX!)

Storing both the name and hash would also help if the tool is attempting to reconstruct metadata in a wad that's been modified by a non-metadata-friendly tool. If the name is the same as an orphaned metadata record and it's the only lump with that name, it's probably the same lump. If the name is the same, there are multiple lumps with that name, and the adjacent lumps are in the same order as the adjacent metadata records, there is a high chance that it's the same lump. If the name has been changed but the adjacent lumps match up, it might be the same lump, but it might then be worthwhile to ask the user before deciding whether to keep or throw out the orphaned record.
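The heuristic above could be sketched like this. It is a simplification under stated assumptions: each orphaned record remembers the names of the records on either side of it, neighbours are compared by name only, and the confidence labels and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class OrphanRecord:
    name: str
    prev_name: Optional[str]  # name of the neighbouring record before it when written
    next_name: Optional[str]  # name of the neighbouring record after it


def neighbours(names: List[str], i: int) -> Tuple[Optional[str], Optional[str]]:
    prev = names[i - 1] if i > 0 else None
    nxt = names[i + 1] if i + 1 < len(names) else None
    return prev, nxt


def guess_match(rec: OrphanRecord, names: List[str]) -> Tuple[str, Optional[int]]:
    """Return (confidence, lump index) for an orphaned metadata record."""
    same_name = [i for i, n in enumerate(names) if n == rec.name]
    if len(same_name) == 1:
        return "probable", same_name[0]          # only lump with that name
    context = (rec.prev_name, rec.next_name)
    for i in same_name:
        if neighbours(names, i) == context:
            return "likely", i                   # duplicate name, neighbours agree
    if not same_name:
        for i in range(len(names)):
            if neighbours(names, i) == context:
                return "ask_user", i             # possibly renamed: let the user decide
    return "no_match", None
```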

Edit: Just thought of a more obvious example where metadata would be useful on identical marker lumps: map lumps.

esselfortium · May 19, 2015

kb1: Your virtual directory concept is definitely less ugly and more sensible than what I had initially envisioned, in which each lump would just have a filepath defined in its metadata. (Ick!) Your way is infinitely less bloated and not a bad start, IMO, if we want to pursue this as a feature.

The need for an extra DIR lump to go back a folder seems a bit excessive, though. If it'd be possible to include multiple entries in a single DIR lump, you could do

..
..
..
+A_FOLDER
+MY_FIRST_SUBFOLDER

in a single lump rather than needing two or more ".." lumps for backtracking, which could start to add up if you have multiple folders in a subdirectory, and would add up even more quickly if you have nested folders.
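One way to read the multi-entry DIR lump, sketched under assumptions: `..` leaves the current folder and `+NAME` enters a sub-folder, applied in order to a path stack. The exact DIR syntax is not pinned down in this excerpt, and `apply_dir_lump` is a hypothetical name.

```python
from typing import List


def apply_dir_lump(stack: List[str], entries: List[str]) -> List[str]:
    """Apply one DIR lump's entries to the current virtual folder path.

    Assumed syntax: ".." leaves the current folder; "+NAME" enters a
    sub-folder called NAME. Blank lines are ignored.
    """
    path = list(stack)
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue
        if entry == "..":
            if not path:
                raise ValueError("'..' above the root folder")
            path.pop()
        elif entry.startswith("+") and len(entry) > 1:
            path.append(entry[1:])
        else:
            raise ValueError(f"unrecognised DIR entry: {entry!r}")
    return path
```

Starting three folders deep, the five entries in the example above end up in A_FOLDER/MY_FIRST_SUBFOLDER, using one lump instead of several.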

Gez · May 19, 2015

esselfortium said:
I can imagine some exceptions to this, where you'd want to have different metadata on lumps with matching contents. Marker lumps would all have the same size and hash, but can functionally serve different purposes. (For instance, the _MARBLE and similar markers used to categorize patch/flat groups in CC4-tex and BTSX!)

Marker lumps don't get to have metadata, because I hate them. Suck it, you dumb marker lumps!

Seriously though, why would you want to track the author and the timestamp of a marker lump?

esselfortium said:
Storing both the name and hash would also help if the tool is attempting to reconstruct metadata in a wad that's been modified by a non-metadata-friendly tool. If the name is the same as an orphaned metadata record and it's the only lump with that name, it's probably the same lump. If the name is the same, there are multiple lumps with that name, and the adjacent lumps are in the same order as the adjacent metadata records, there is a high chance that it's the same lump. If the name has been changed but the adjacent lumps match up, it might be the same lump, but it might then be worthwhile to ask the user before deciding whether to keep or throw out the orphaned record.

Yeah but why rebuild something that you can't trust anymore?

Suppose I have a metadata like this:

[D_RUNNIN]
Index=496;
CRC32=d87cbf10;
TimeStamp=1993;
LongName="Running from Evil";
Source="Bobby Prince";

Problem is, I still have a D_RUNNIN at index 496, but its CRC32 is now f5e3e8dd. Do I recover the metadata anyway and assume the title is "Running from Evil" and the author is Bobby Prince?
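A sketch of reading records in the bracketed `Key=Value;` format above and checking the stored CRC32 against the lump's current contents, which is the test that triggers Gez's question. Assumptions: one record per `[LUMPNAME]` header, trailing semicolons optional, double-quoted values unquoted, CRC32 stored as hex; the function names are hypothetical.

```python
import zlib
from typing import Dict, List, Tuple


def parse_metadata(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Parse [LUMPNAME] sections of Key=Value lines (trailing ';' optional)."""
    records: List[Tuple[str, Dict[str, str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            records.append((line[1:-1], {}))
            continue
        if not records or "=" not in line:
            raise ValueError(f"malformed line: {raw!r}")
        key, value = line.split("=", 1)
        value = value.strip().rstrip(";").strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop surrounding double quotes
        records[-1][1][key.strip()] = value
    return records


def still_valid(props: Dict[str, str], data: bytes) -> bool:
    """True only if a recorded CRC32 (hex) matches the lump's current contents."""
    recorded = props.get("CRC32")
    return recorded is not None and int(recorded, 16) == zlib.crc32(data)
```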

esselfortium said:
Edit: Just thought of a more obvious example where metadata would be useful on identical marker lumps: map lumps.

Maps will have to have a special handling because logically it's just one element, but it's spread over several lumps.

esselfortium · May 19, 2015

Timestamps aren't particularly important on marker lumps, but the ability to put comments or long names on them would be useful. I guess it's not the end of the world if they can't be supported, but it seems more sensible to include them than not, IMO.

Special handling for map lumps as groups might cause more trouble than it's worth, as there would almost inevitably be some exceptions from the expected rules, and would just confuse things. Plus, while editing the lumps individually isn't exactly common practice, it's not unheard of. (Actually, if BEHAVIOR and SCRIPTS are considered map lumps, those would be good examples of map lumps being edited independently of the others. Other situations that come to mind are just rare hacks to try modifying textures or something without breaking demo compat by rebuilding nodes. :P)

And that's a valid point regarding orphaned metadata entries. It would be ideal if Slade could give the user a prompt to select which orphan entries to keep and remove, guessing the matching lump based on name and order but allowing the user to manually reassign it to a different lump. Tossing out a lump's metadata entirely when it's been edited by a non-compliant tool would be a shame, but I recognize that doing otherwise adds a bunch of complexity and blackmagick into the equation.

DaniJ · May 19, 2015

Once you start trying to introduce arbitrary rules to govern which types of data lump deserve or benefit from metadata you complicate the whole system because now, you are required to inspect the data to make that decision. It should be all or nothing, simple as that.

Linguica · May 19, 2015

Hi I have questions

You come across a WAD with two lumps that have different names, but point (as in: same size, same pointer) to the exact same data. The entries both seem to have valid metadata (i.e. correct CRC), but the metadata has different authors and timestamps for each, despite the data being identical. What happens?
A WAD has a MY_BUTT lump between P_START and P_END, another, wholly different, MY_BUTT lump between F_START and F_END, and a third MY_BUTT near the end not inside any marker lumps. What happens?
A WAD has a MY_DILZ lump immediately followed by another, identical MY_DILZ lump. The metadata for the WAD only has a single, seemingly valid MY_DILZ entry, that could apparently describe either of them equally. What happens? If the tool only associates the metadata with one of the lumps, what happens if you delete that one lump? Is the metadata deleted and the other lump now has none at all?
There is a gimmick-project community WAD where people make levels using the same collection of things. The WAD has a bunch of custom levels, but the WAD is formatted so that the THINGS lump for each level points to the same data. What happens?
Another community project has a bunch of levels, some with REJECT lumps like normal, and some where there are no REJECT lumps for some reason. What happens?
A source port has added the ability to bake lightmaps into custom levels, and they are stored in a custom LITEMAPS lump that is placed right after the BLOCKMAP lump for the level. How does the metadata code handle this?
There is a WAD for a source port that supports arbitrary level names. It has level marker lumps titled LEVEL01, LEVEL02, etc. What happens?
An old 2002 WAD turns out to have a "METADATA" lump (or whatever the name of the special lump turns out to be) where, for some reason, the author had stored the readme. You open the WAD in SLADE, import a new lump, and save it. What happens?
An author has released a WAD where the metadata lump has somehow had all the normal \n linebreaks turned into \r\n (or vice versa, or whatever). What happens?

DaniJ · May 19, 2015

Linguica said:
You come across a WAD with two lumps that have different names, but point (as in: same size, same pointer) to the exact same data. The entries both seem to have valid metadata (i.e. correct CRC), but the metadata has different authors and timestamps for each, despite the data being identical. What happens?

This scenario concerns the WAD deployment target only. In the interest of conforming to the high-level design of the model it does not make sense to consider this scenario as a valid construct (consider that an equivalent situation cannot occur with native files or ZIP). Therefore supporting WAD lump editors should actively try to prevent this situation from occurring. Supporting implementations should detect this scenario and flatly declare it broken and unsupported.

It is unlikely to occur in the wild because a supporting WAD lump editor that understands the metadata and data schema should actively prevent it from occurring. If we try to admit such a thing specifically for WAD then we introduce additional complexity that will need strong justification.

A WAD has a MY_BUTT lump between P_START and P_END, another, wholly different, MY_BUTT lump between F_START and F_END, and a third MY_BUTT near the end not inside any marker lumps. What happens?

This scenario concerns the WAD deployment target only. Implicitly, files with "duplicated" names must be handled when dealing with this target, therefore the data schema used to map metadata to data lump must handle this situation also.

A WAD has a MY_DILZ lump immediately followed by another, identical MY_DILZ lump. The metadata for the WAD only has a single, seemingly valid MY_DILZ entry, that could apparently describe either of them equally. What happens? If the tool only associates the metadata with one of the lumps, what happens if you delete that one lump? Is the metadata deleted and the other lump now has none at all?

Same situation as Q:1

There is a gimmick-project community WAD where people make levels using the same collection of things. The WAD has a bunch of custom levels, but the WAD is formatted so that the THINGS lump for each level points to the same data. What happens?

In this scenario metadata refers to the map entities rather than the contents of a specific THINGS lump. Metadata is still independent. Also, some overlap with Q:1

Another community project has a bunch of levels, some with REJECT lumps like normal, and some where there are no REJECT lumps for some reason. What happens?

This is irrelevant from the perspective of relating metadata to data. It could be mentioned in the metadata for the map but otherwise nothing special happens.

A source port has added the ability to bake lightmaps into custom levels, and they are stored in a custom LITEMAPS lump that is placed right after the BLOCKMAP lump for the level. How does the metadata code handle this?

If the metadata elements are defined so that they exclude open-ended itemizations, this additional lightmap data can either be ignored in the metadata or mentioned using the open-ended tagging mechanism.

There is a WAD for a source port that supports arbitrary level names. It has level marker lumps titled LEVEL01, LEVEL02, etc. What happens?

Nothing special. Metadata is not concerned with naming conventions applied to data files, or what an editor or sourceport infers from them.

An old 2002 WAD turns out to have a "METADATA" lump (or whatever the name of the special lump turns out to be) where, for some reason, the author had stored the readme. You open the WAD in SLADE, import a new lump, and save it. What happens?

Overlaps with Q:1. In this situation the supporting editor should detect the conflict and ask the user what action to take (e.g., discard it, or import it and reformat for inclusion in the unified metadata).

An author has released a WAD where the metadata lump has somehow had all the normal \n linebreaks turned into \r\n (or vice versa, or whatever). What happens?

Linebreaks *should* be considered unified at all stages. If non-conforming linebreaks are detected they should be excluded from the runtime representation but only removed when that lump is next written to storage.
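DaniJ's rule (normalize in memory, rewrite only on the next save) might look like this. UTF-8 as the lump's encoding is an assumption, and both function names are hypothetical.

```python
def normalize_linebreaks(raw: bytes) -> str:
    """Decode a metadata lump and unify CRLF and lone CR line breaks to LF.

    Only the runtime copy is normalized here; the stored lump keeps its
    original bytes until the archive is next saved.
    """
    text = raw.decode("utf-8")
    return text.replace("\r\n", "\n").replace("\r", "\n")


def needs_rewrite(raw: bytes) -> bool:
    """True if the stored lump contains non-conforming line breaks."""
    return b"\r" in raw
```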

Gez · May 19, 2015

The way I'd implement the stuff if it were entirely my decision:

Linguica said:
Hi I have questions
You come across a WAD with two lumps that have different names, but point (as in: same size, same pointer) to the exact same data. The entries both seem to have valid metadata (i.e. correct CRC), but the metadata has different authors and timestamps for each, despite the data being identical. What happens?

Not possible. Having identical data would give them the same checksum, and therefore the same metadata.

Linguica said:
A WAD has a MY_BUTT lump between P_START and P_END, another, wholly different, MY_BUTT lump between F_START and F_END, and a third MY_BUTT near the end not inside any marker lumps. What happens?

As long as they have different content, and therefore a different checksum, they all get their own metadata, and that they have the same name is irrelevant.

Linguica said:
A WAD has a MY_DILZ lump immediately followed by another, identical MY_DILZ lump. The metadata for the WAD only has a single, seemingly valid MY_DILZ entry, that could apparently describe either of them equally. What happens? If the tool only associates the metadata with one of the lumps, what happens if you delete that one lump? Is the metadata deleted and the other lump now has none at all?

The same metadata describes both. If you delete one lump the other lump still has that metadata. Keep in mind that the metadata lump is only relevant when opening and when saving the archive.

Linguica said:
There is a gimmick-project community WAD where people make levels using the same collection of things. The WAD has a bunch of custom levels, but the WAD is formatted so that the THINGS lump for each level points to the same data. What happens?

Identical lumps have only one checksum, so they only get one metadata entry.

Linguica said:
Another community project has a bunch of levels, some with REJECT lumps like normal, and some where there are no REJECT lumps for some reason. What happens?

Nothing special. It works. SLADE 3 already supports maps that don't have a reject lump.

Linguica said:
A source port has added the ability to bake lightmaps into custom levels, and they are stored in a custom LITEMAPS lump that is placed right after the BLOCKMAP lump for the level. How does the metadata code handle this?

Just the way it'd handle any other lump. Open SVE.wad in SLADE 3 if you want an example of how litemap lumps are handled.

Linguica said:
There is a WAD for a source port that supports arbitrary level names. It has level marker lumps titled LEVEL01, LEVEL02, etc. What happens?

Again, nothing special. It just works.

Linguica said:
An old 2002 WAD turns out to have a "METADATA" lump (or whatever the name of the special lump turns out to be) where, for some reason, the author had stored the readme. You open the WAD in SLADE, import a new lump, and save it. What happens?

Parsing fails, so metadata lump is ignored. When saving the wad, another metadata lump is created at the end.

Linguica said:
An author has released a WAD where the metadata lump has somehow had all the normal \n linebreaks turned into \r\n (or vice versa, or whatever). What happens?

As long as it's whitespace, it's ignored by the parser, so it still works.

If the extra characters are not whitespace, parsing fails and the metadata is ignored and treated as a "normal" lump, so it'll get its own metadata in the new metadata lump that'll be created. :p
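Gez's fallback behaviour, sketched with a strict parser: if the reserved lump does not parse, it is kept as an ordinary lump, and a new metadata lump is appended on save. `METADATA` is only a placeholder, since the thread has not settled on a name, and the function names are hypothetical. Stray carriage returns are harmless here because `str.splitlines()` treats `\r`, `\n` and `\r\n` all as line breaks and each line is stripped of surrounding whitespace.

```python
from typing import Dict, List, Optional, Tuple

Lump = Tuple[str, bytes]
Record = Tuple[str, Dict[str, str]]


def try_parse(text: str) -> Optional[List[Record]]:
    """Strict parse of [NAME] / Key=Value lines; returns None on anything else."""
    records: List[Record] = []
    for line in text.splitlines():
        line = line.strip()  # spaces, tabs and leftover CRs are just whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            records.append((line[1:-1], {}))
        elif records and "=" in line:
            key, value = line.split("=", 1)
            records[-1][1][key.strip()] = value.strip().rstrip(";")
        else:
            return None  # e.g. a readme stored under the reserved name
    return records


def load(lumps: List[Lump], meta_name: str = "METADATA") -> Tuple[List[Record], List[Lump]]:
    """Use the last lump called meta_name if it parses; otherwise treat it as normal data."""
    for i in range(len(lumps) - 1, -1, -1):
        name, data = lumps[i]
        if name == meta_name:
            parsed = try_parse(data.decode("utf-8", errors="replace"))
            if parsed is not None:
                return parsed, lumps[:i] + lumps[i + 1:]
            break
    return [], list(lumps)


def save(lumps: List[Lump], meta_text: str, meta_name: str = "METADATA") -> List[Lump]:
    """A freshly generated metadata lump is always appended at the end."""
    return lumps + [(meta_name, meta_text.encode("utf-8"))]
```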

Linguica · May 19, 2015

DaniJ said:
This scenario concerns the WAD deployment target only. In the interest of conforming to the high-level design of the model it does not make sense to consider this scenario as a valid construct (consider that an equivalent situation cannot occur with native files or ZIP). Therefore supporting WAD lump editors should actively try to prevent this situation from occurring. Supporting implementations should detect this scenario and flatly declare it broken and unsupported.

So your solution for implementing a backwards-compatible custom WAD lump involves declaring that the existing WAD format, as it has existed for over two decades, shall no longer be as flexible as it was, and limits must be placed on how it is structured so that unrelated ZIP files can work a certain way.

DaniJ · May 19, 2015

So effectively then, Gez and I are in agreement :)

Linguica · May 19, 2015

Gez said:
Not possible. Having identical data would give them the same checksum, and therefore the same metadata.

Wait, so are you saying that as far as the metadata is concerned, lumps are characterized only by their checksum, and not by their name or position in the index at all? (My eyes have glazed over on the last page.)

So if you edit a WAD lump with a non-metadata-supporting tool and then go back to SLADE, the now-invalid-checksummed lump has all its metadata automatically deleted?

DaniJ · May 19, 2015

I would not use hashes as the sole method of relating metadata to data for obvious reasons. As all lumps in a WAD have an absolute position index regardless of name, format, or other, it makes sense to relate the two absolutely using their indices and use the hashes (if deemed necessary) for accelerating lookup.
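One possible reading of this, sketched with hypothetical names: the directory position is authoritative, and a CRC-32 index only confirms it or, when the position no longer holds the same content, narrows the search to a unique content match. The fallback step goes beyond what is described here and is an assumption.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional


def build_crc_index(lumps: List[bytes]) -> Dict[int, List[int]]:
    """CRC-32 -> directory positions holding that content (a lookup accelerator)."""
    index: Dict[int, List[int]] = defaultdict(list)
    for pos, data in enumerate(lumps):
        index[zlib.crc32(data)].append(pos)
    return dict(index)


def resolve(record_pos: int, record_crc: int, lumps: List[bytes],
            crc_index: Dict[int, List[int]]) -> Optional[int]:
    """Trust the recorded position if its content still matches; otherwise
    accept a match only when exactly one lump has the recorded checksum."""
    if 0 <= record_pos < len(lumps) and zlib.crc32(lumps[record_pos]) == record_crc:
        return record_pos
    candidates = crc_index.get(record_crc, [])
    return candidates[0] if len(candidates) == 1 else None
```

If the content appears more than once, the sketch declines to guess, which is exactly where the objection about shifted indices in the next post bites.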

Linguica · May 19, 2015

If the metadata identifies lumps by their position in the WAD index (e.g. it lists metadata for "lump #79" or whatever), you have the same problem: if you use another tool to add a new lump at the beginning, the rest of the lumps now have a different index and the metadata won't know what to do.

DaniJ · May 19, 2015

Yep. If, however, we are going to use hierarchical association and a data schema like DD_DIREC to create a 1:many relationship for metadata, we don't even need to map metadata to data lumps explicitly (it can be deduced from their relative position in the hierarchy).

esselfortium · May 19, 2015

Relative positioning would be ideal, for sure. How would you recommend handling that? I think it's a bit over my head.

DaniJ · May 19, 2015

Representation-wise? It's still up for debate. We never got around to implementing this part of the model in Doomsday, so I'm open to ideas.

Like DD_DIREC the last lump (with this name) in the WAD would describe the hierarchy of the whole WAD's contents. It would seem to make sense to include the lump timestamps here, also.
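Resolving "the last lump with this name" works the same way as ordinary WAD lookups. The sketch below reads the standard WAD header (identification, lump count, directory offset) and the 16-byte directory entries (offset, size, 8-byte name); `DD_DIREC` is used only as an example name, and the helpers, including the tiny test writer, are hypothetical.

```python
import struct
from typing import List, Optional, Tuple

DirEntry = Tuple[str, int, int]  # (name, file offset, size)


def read_directory(wad: bytes) -> List[DirEntry]:
    """Parse the WAD header and lump directory (integers are little-endian int32)."""
    ident, numlumps, dir_ofs = struct.unpack_from("<4sii", wad, 0)
    if ident not in (b"IWAD", b"PWAD"):
        raise ValueError("not a WAD file")
    entries: List[DirEntry] = []
    for i in range(numlumps):
        filepos, size, raw = struct.unpack_from("<ii8s", wad, dir_ofs + 16 * i)
        name = raw.split(b"\0", 1)[0].decode("ascii", errors="replace")
        entries.append((name, filepos, size))
    return entries


def last_lump(wad: bytes, name: str) -> Optional[bytes]:
    """Return the data of the last lump called `name` (later lumps override earlier ones)."""
    for lname, pos, size in reversed(read_directory(wad)):
        if lname == name:
            return wad[pos:pos + size]
    return None


def build_pwad(lumps: List[Tuple[str, bytes]]) -> bytes:
    """Tiny PWAD writer for testing: lump data first, directory at the end."""
    body = b"".join(data for _, data in lumps)
    directory, pos = b"", 12
    for name, data in lumps:
        directory += struct.pack("<ii8s", pos, len(data), name.encode("ascii"))
        pos += len(data)
    return struct.pack("<4sii", b"PWAD", len(lumps), 12 + len(body)) + body + directory
```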

DaniJ · May 19, 2015

One potential pitfall there (and a likely point of contention) is how to deal with characters like '/' which are admitted in lump data file names but will obviously confuse things if we tried to simply map such lump names to paths.

Doomsday uses a global percent-encoded representation to deal with such characters in paths. ZDoom uses a different model.

So we need to come up with a solution which supports at least both of these and which any sourceport can easily transform into their internal/native path representation.
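A minimal sketch of percent-encoding lump names into path segments, using Python's `urllib.parse.quote` with no safe characters so that `/` becomes `%2F` and `%` itself becomes `%25`, which keeps the mapping reversible. This is a stand-in for illustration; it does not reproduce Doomsday's exact encoding rules or ZDoom's model, and the function names are hypothetical.

```python
from typing import List
from urllib.parse import quote, unquote


def lump_name_to_segment(name: str) -> str:
    """Percent-encode a lump name so characters like '/' cannot act as separators."""
    return quote(name, safe="")


def segment_to_lump_name(segment: str) -> str:
    """Inverse of lump_name_to_segment."""
    return unquote(segment)


def to_virtual_path(folders: List[str], name: str) -> str:
    """Join encoded folder names and the encoded lump name with '/'."""
    return "/".join(quote(part, safe="") for part in [*folders, name])
```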

kb1 · May 19, 2015

Gez said:
Marker lumps don't get to have metadata, because I hate them. Suck it, you dumb marker lumps!

Seriously though, why would you want to track the author and the timestamp of a marker lump?

You wouldn't, but now you have to add a rule "Ignore 0-length lumps".

Gez said:
Yeah but why rebuild something that you can't trust anymore?

Suppose I have a metadata like this:

[D_RUNNIN]
Index=496;
CRC32=d87cbf10;
TimeStamp=1993;
LongName="Running from Evil";
Source="Bobby Prince";

Problem is, I still have a D_RUNNIN at index 496, but its CRC32 is now f5e3e8dd. Do I recover the metadata anyway and assume the title is "Running from Evil" and the author is Bobby Prince?

No, you'd have to drop the metadata, because it is no longer D_RUNNIN. I mean, yeah, you could pop up a dialog box - hopefully one that lists all of the discrepancies at once, on one screen, with checkboxes, but I don't know if that's necessary.

I probably shouldn't have even mentioned the optional stuff - it over-complicates a very simple idea. Let's drop some of that, like the history, ok?

Gez: I think you need the name and size along with the hash, because that adds a ton of confidence that you've made a match, and it handles the occasional hash collision.

If you like CRC32, then let's go with CRC32. I am a big fan of your multiline name/value pair format:

[D_RUNNIN]
Index=496;
CRC32=d87cbf10;
TimeStamp=1993;
LongName="Running from Evil";
Source="Bobby Prince";

That makes custom properties very straightforward. I don't know about exposing the lump index, though. I see the temptation, but, again, if you appreciate the benefit of recovery from an unsupporting editor, the reliance on index is sketchy, and the algorithm should be able to work without it. An unsupporting editor can move lumps around all day. There's nothing wrong with having it, but we need to create a set of rules that handle each and every scenario without ambiguity.

On the long lumpname thing, I wish I could back away from it. It puts a ton more importance on maintaining the metadata. Originally, metadata was a "nice-to-have", but if you use it for long names, it is absolutely required. It is workable, but, if we were going that route, I would almost begin to argue for an enhanced WAD format for real. (Ouch!). I won't go there today.

Then again, it could definitely be in there, especially with Gez's ".ini file" format. The semicolon is unnecessary if the header lumpname brackets are there, but it looks nice, and should be allowed.

So, to move forward:

Step 1: Define the lump format exactly. Again, I like the .ini file approach, but let's agree on format, including which properties are required first.

Step 2: Define the rules that govern how the metadata gets married to the lumps in editor memory. That should include things like:

. matching metadata to lump index
. handling duplicate lumpnames
. handling duplicate lumpname, size, CRC
. handling multiple metadata lumps
. hiding the metadata lump from the user's view in the editor, or making it read-only (or not).
. writing out the metadata lump.
. avoiding writing the lump into IWADs, or read-only PWADs, or by user choice.
. providing a custom property edit screen to add/edit/delete user properties to lumps.
. What to do with 0-length lumps.
. How to handle a compound lumpset (Maps).
. Keeping it simple. Most all of these requirements come automatically during coding, simply by doing steps in the proper order, and handling situations properly.
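A sketch of the "writing out the metadata lump" step that folds in three of the rules above: no lump for IWADs, an optional skip for zero-length lumps (one possible answer to that open question), and nothing written when no lump has properties. The output follows the bracketed format discussed earlier, but quoting every value and recomputing CRC32 on save are simplifications, and values containing double quotes are not escaped. The function name is hypothetical.

```python
import zlib
from typing import Dict, List, Optional, Tuple


def write_metadata_lump(lumps: List[Tuple[str, bytes, Dict[str, str]]],
                        wad_type: bytes, skip_empty: bool = True) -> Optional[str]:
    """Serialize per-lump properties into bracketed Key=Value text.

    Returns None when nothing should be written (IWAD target, or no metadata).
    """
    if wad_type == b"IWAD":
        return None
    out: List[str] = []
    for name, data, props in lumps:
        if not props or (skip_empty and len(data) == 0):
            continue
        out.append(f"[{name}]")
        out.append(f"CRC32={zlib.crc32(data):08x};")  # always recomputed from current data
        for key, value in props.items():
            if key != "CRC32":
                out.append(f'{key}="{value}";')
        out.append("")  # blank line between records
    return "\n".join(out) if out else None
```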

I have been pushing for proper management of edits done by unsupporting editors, because we get that feature, almost for free. But, that should be considered an exception. We should operate from the idea that people are generally using supporting editors, and only rarely opening up an old editor. As a courtesy, we attempt to reconnect the metadata, but that's the same function we use anyway, so we get that ability "for free". Maybe we create a new thread, just for specs, and leave the general discussion, criticisms, etc in this thread. But I think we are ready for exacting specs. Agreed?

esselfortium · May 19, 2015

After talking with Linguica about it on IRC, I should mention in here that if we're doing long texture1 names and/or virtual folders, it would probably make the most sense for those to be folded into the METADATA lump itself rather than as separate lumps. Virtual directories could be defined by the use of directory entries within the METADATA list, positioned in the same places that your proposal would have put the separate lumps. This could still use syntax similar to what you proposed earlier, or whatever works best.

Though if we're getting away from doing virtual folders or texture names, nevermind all this. :)

Gez · May 19, 2015

Linguica said:
Wait, so are you saying that as far as the metadata is concerned, lumps are characterized only by their checksum, and not by their name or position in the index at all? (My eyes have glazed over on the last page.)

Yes.

The checksum is a unique identification of content. File has a different checksum, file is different. File has identical checksum, file is identical.

Like, suppose I sell a painting that I call "portrait of my mother", but turns out that its checksum is the exact same checksum as that of the Mona Lisa (which has mysteriously disappeared from the Louvre). Am I an art thief Y/N?

Linguica said:
So if you edit a WAD lump with a non-metadata-supporting tool and then go back to SLADE, the now-invalid-checksummed lump has all its metadata automatically deleted?

Yes.

Suppose that when I stole the Mona Lisa from the Louvre, I put White on White in its place (which has mysteriously disappeared from the MOMA). Does it mean that Leonardo da Vinci is now the real author of Malevich's piece?

kb1 · May 19, 2015

Somehow, I didn't see some threads before I posted. Oh boy, I'll give it a shot :)

Disclaimer: In my previous post, I suggested that we simplify the spec by dropping long name, history, virtual folders, and other things that would make METADATA difficult, or so important that it was required by a source port to function. With that in mind, here goes:

Linguica said:
Hi I have questions
You come across a WAD with two lumps that have different names, but point (as in: same size, same pointer) to the exact same data. The entries both seem to have valid metadata (i.e. correct CRC), but the metadata has different authors and timestamps for each, despite the data being identical. What happens?

Good question, and we need to define those rules. I would suggest that metadata is parsed in order, so lump #1 gets metadata #1, lump #2 gets #2, etc.
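A rough sketch of the in-order rule, assuming a hypothetical entry record holding just a checksum and an author (the real field set was still being debated). Lumps are walked in directory order and each one consumes the first still-unused entry with a matching checksum, so two lumps sharing identical data pick up entry #1 and entry #2 respectively. Matching on checksum alone is a simplification for brevity.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Lump:
    name: str
    data: bytes

@dataclass
class MetaEntry:
    crc: int      # checksum recorded when the entry was written
    author: str   # hypothetical field; the real field set was still under discussion

def crc(data: bytes) -> int:
    return zlib.crc32(data) & 0xFFFFFFFF

def assign_in_order(lumps: List[Lump], entries: List[MetaEntry]) -> List[Optional[MetaEntry]]:
    """Walk the lumps in directory order; each lump takes the first still-unused
    entry with a matching checksum, so lump #1 gets entry #1, lump #2 gets #2."""
    unused = list(entries)
    assigned: List[Optional[MetaEntry]] = []
    for lump in lumps:
        lump_crc = crc(lump.data)
        match = None
        for i, entry in enumerate(unused):
            if entry.crc == lump_crc:
                match = unused.pop(i)  # consume the entry so it is used only once
                break
        assigned.append(match)
    return assigned

# Two lumps sharing identical data, two entries with different authors.
lumps = [Lump("FOO", b"shared"), Lump("BAR", b"shared")]
entries = [MetaEntry(crc(b"shared"), "Alice"), MetaEntry(crc(b"shared"), "Bob")]
print([e.author for e in assign_in_order(lumps, entries)])  # ['Alice', 'Bob']
```

If there is only one entry for two identical lumps, the same function gives it to the first lump and returns None for the second.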

Linguica said:
A WAD has a MY_BUTT lump between P_START and P_END, another, wholly different, MY_BUTT lump between F_START and F_END, and a third MY_BUTT near the end not inside any marker lumps. What happens?

Not an issue if we avoid the compound lump handling, and just maintain metadata on a per-lump basis.

Linguica said:
A WAD has a MY_DILZ lump immediately followed by another, identical MY_DILZ lump. The metadata for the WAD only has a single, seemingly valid MY_DILZ entry, that could apparently describe either of them equally. What happens? If the tool only associates the metadata with one of the lumps, what happens if you delete that one lump? Is the metadata deleted and the other lump now has none at all?

Following the rule above, the 1st lump would get metadata, and the 2nd would not. Once loaded, the metadata is in memory, so it will follow the same lump thru the edit session. Upon save, all lumps are recorded into metadata, so the problem fixes itself.
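The "upon save, all lumps are recorded" step might look roughly like this. It is a sketch only: the entry fields (`index`, `name`, `size`, `crc`, `author`) and the `"unknown"` default are assumptions, not an agreed format. Every lump gets exactly one freshly checksummed entry in directory order, so a lump that arrived without metadata simply receives a default entry.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Lump:
    name: str
    data: bytes
    author: Optional[str] = None  # metadata attached in memory during the edit session

def build_metadata_on_save(lumps: List[Lump], default_author: str = "unknown") -> List[Dict[str, object]]:
    """Emit exactly one entry per lump, in directory order, each stamped with a fresh CRC-32."""
    entries: List[Dict[str, object]] = []
    for index, lump in enumerate(lumps):
        entries.append({
            "index": index,
            "name": lump.name,
            "size": len(lump.data),
            "crc": zlib.crc32(lump.data) & 0xFFFFFFFF,
            "author": lump.author if lump.author is not None else default_author,
        })
    return entries

# The MY_DILZ case after loading: only the first copy picked up metadata.
lumps = [Lump("MY_DILZ", b"x", author="Carol"), Lump("MY_DILZ", b"x")]
for entry in build_metadata_on_save(lumps):
    print(entry["index"], entry["name"], entry["author"])
```

After this save, the lump count and entry count agree again, which is what makes the problem "fix itself".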

Linguica said:
There is a gimmick-project community WAD where people make levels using the same collection of things. The WAD has a bunch of custom levels, but the WAD is formatted so that the THINGS lump for each level points to the same data. What happens?

Do you mean that someone hacked the WAD directory to make the header pointers point to the same physical location of the THINGS data in the WAD? Some editors would choke, I would think, and others would write a copy out to the saved file. Either way, there will be a unique entry in the WAD header, so it can have a unique metadata entry associated with it. Have you ever seen such a WAD? I'd like to check it out :)
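For reference, a WAD starts with a 12-byte header (4-byte "IWAD"/"PWAD" tag, lump count, directory offset) followed somewhere by a directory of 16-byte entries (file position, size, 8-byte NUL-padded name). The sketch below reads that directory and builds a hypothetical PWAD in which two THINGS entries point at the same bytes; each still comes back as its own entry with its own index, which is what a per-entry metadata record could hang off.

```python
import struct
from typing import List, Tuple

HEADER = struct.Struct("<4sii")   # identification, numlumps, infotableofs
DIRENT = struct.Struct("<ii8s")   # filepos, size, name (NUL-padded)

def read_directory(wad: bytes) -> List[Tuple[int, str, int, int]]:
    """Return (index, name, filepos, size) for every directory entry."""
    ident, numlumps, infotableofs = HEADER.unpack_from(wad, 0)
    if ident not in (b"IWAD", b"PWAD"):
        raise ValueError("not a WAD file")
    entries = []
    for i in range(numlumps):
        filepos, size, raw = DIRENT.unpack_from(wad, infotableofs + DIRENT.size * i)
        name = raw.split(b"\0", 1)[0].decode("ascii")
        entries.append((i, name, filepos, size))
    return entries

def shared_things_wad() -> bytes:
    """Hypothetical PWAD whose two THINGS entries point at the same bytes."""
    things = bytes(10)  # placeholder lump data
    data_start = HEADER.size
    dir_offset = data_start + len(things)
    entry = DIRENT.pack(data_start, len(things), b"THINGS")
    return HEADER.pack(b"PWAD", 2, dir_offset) + things + entry + entry

for entry in read_directory(shared_things_wad()):
    print(entry)  # same filepos and size, different index
```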

Linguica said:
Another community project has a bunch of levels, some with REJECT lumps like normal, and some where there are no REJECT lumps for some reason. What happens?

Again, if we avoid the compound lump handling, and just maintain metadata on a per-lump basis, there's no issue.

Linguica said:
A source port has added the ability to bake lightmaps into custom levels, and they are stored in a custom LITEMAPS lump that is placed right after the BLOCKMAP lump for the level. How does the metadata code handle this?

See previous answer.

Linguica said:
There is a WAD for a source port that supports arbitrary level names. It has level marker lumps titled LEVEL01, LEVEL02, etc. What happens?

See previous answer.

Linguica said:
An old 2002 WAD turns out to have a "METADATA" lump (or whatever the name of the special lump turns out to be) where, for some reason, the author had stored the readme. You open the WAD in SLADE, import a new lump, and save it. What happens?

If SLADE doesn't crash, it will not find any matches for the data in the METADATA lump, so metadata will be blank. Upon save, it will write a new METADATA lump, losing the old data.

Or, SLADE will recognize the lump as invalid, and preserve it, and you will not have metadata for that WAD.
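One way an editor could "recognize the lump as invalid" is to require a signature line at the top of the lump. The signature `METADATA 1` below is purely hypothetical (no format had been fixed in the thread); the sketch treats anything without it, including undecodable bytes, as a foreign lump to be left alone, which corresponds to the second option above.

```python
MAGIC = "METADATA 1"  # hypothetical signature line; the thread had not fixed a format

def classify_metadata_lump(data: bytes) -> str:
    """'metadata' if the lump starts with the expected signature line, otherwise 'foreign'."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "foreign"
    lines = text.splitlines()
    if lines and lines[0].strip() == MAGIC:
        return "metadata"
    return "foreign"

def plan_save(existing: bytes) -> str:
    """kb1's second option: keep a foreign lump untouched and write no metadata."""
    if classify_metadata_lump(existing) == "foreign":
        return "preserve existing lump; skip metadata"
    return "rewrite metadata lump"

# A 2002-era readme that happens to live in a lump called METADATA.
print(plan_save(b"MYWAD.WAD\nA level by somebody.\n"))
```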

Linguica said:
An author has released a WAD where the metadata lump has somehow had all the normal \n linebreaks turned into \r\n (or vice versa, or whatever). What happens?

The metadata is lost, and new default metadata is written out.

These are all good, valid questions, and some rules need to be defined to spell out exactly what happens. If we stick to basics, and don't make source ports dependent on the metadata, then the worst that can happen is that the metadata is lost or jumbled. Here are some things to be aware of:

1. It's VERY easy to jumble up a plain text lump, but a managed lump provides some protection. Including a hash adds even more.

2. If tools with META support are used, the data will almost always stay intact. If there are 510 lumps, there should always be 510 metadata entries once that tool saves the WAD. And they are MD5/CRC stamped, and in order.

3. If unsupported tools are used, there is a really good chance that the editor will get it right anyway, and, what else can we ask for?

4. METADATA in a WAD is always specific to that WAD. Editors will know this, and will resolve the data upon load. If 2 METADATA lumps are found, the editor should concatenate them, and resolve lumps in lump/metadata order. Upon save, the editor writes out one METADATA lump.

5. Anyone deliberately trying to steal credit or otherwise hack metadata is going to succeed. It's not designed for security. What are the chances that someone innocently arrives at a situation where METADATA becomes so confusing to an editor that it cannot be handled reasonably?

6. It seems to me that there are practically only 2 scenarios where METADATA is applied incorrectly:
. The metadata cannot be applied to a lump, because a match is not found.
. The metadata gets applied to the wrong lump, because 2 lumps with exact matching content exist, and there's a mismatch between # of lumps and # of metadata entries. And, this only happens if a WAD is edited with unsupporting tools. The chances of that combination occurring are kinda small.

7. What if METADATA is applied incorrectly? What are the ramifications? 1% of lumps are incorrectly tracked? Author X gets credit for one of Author Y's resources? Forget for a minute that these conditions would be very rare. Are these showstoppers? Should we drop the idea altogether, because occasionally a lump or two get temporarily out of sync?
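Point 4 (concatenate multiple METADATA lumps on load, write one on save) could be sketched like this. Lumps are modeled as plain `(name, data)` pairs and metadata as lines of text; both are simplifications, and putting the merged lump last is an assumption rather than something the thread settled.

```python
from typing import List, Tuple

Lump = Tuple[str, bytes]  # (name, data)

def split_metadata(lumps: List[Lump]) -> Tuple[List[Lump], List[str]]:
    """Pull out every METADATA lump, concatenating their lines in directory order."""
    content: List[Lump] = []
    meta_lines: List[str] = []
    for name, data in lumps:
        if name == "METADATA":
            meta_lines.extend(data.decode("utf-8").splitlines())
        else:
            content.append((name, data))
    return content, meta_lines

def save_lumps(content: List[Lump], meta_lines: List[str]) -> List[Lump]:
    """Write the content lumps back, followed by a single merged METADATA lump.
    Placing it last is an assumption, not something the thread settled."""
    return content + [("METADATA", "\n".join(meta_lines).encode("utf-8"))]

wad = [("METADATA", b"entry A\nentry B"), ("MAP01", b""), ("METADATA", b"entry C")]
content, lines = split_metadata(wad)
print([name for name, _ in save_lumps(content, lines)])  # ['MAP01', 'METADATA']
```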

None of these scenarios scare me away - I think they can all be managed by proper handling, and a careful spec that defines how editors should treat METADATA (handling duplicates, handling missing info, and order of operations).
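The two failure scenarios in point 6 can be detected up front, which is part of what "handling duplicates, handling missing info" would mean in practice. This sketch works on lists of checksums (small placeholder integers here) and encodes kb1's second condition exactly as stated: counts differ and at least two lumps share identical content.

```python
from collections import Counter
from typing import List

def unmatched_lumps(lump_crcs: List[int], entry_crcs: List[int]) -> List[int]:
    """Scenario 1: indices of lumps for which no metadata entry with that checksum is left."""
    available = Counter(entry_crcs)
    missing = []
    for index, c in enumerate(lump_crcs):
        if available[c] > 0:
            available[c] -= 1  # each entry can satisfy only one lump
        else:
            missing.append(index)
    return missing

def misassignment_possible(lump_crcs: List[int], entry_crcs: List[int]) -> bool:
    """Scenario 2, as kb1 states it: the lump and entry counts differ and
    at least two lumps share identical content."""
    if len(lump_crcs) == len(entry_crcs):
        return False
    return any(n > 1 for n in Counter(lump_crcs).values())

print(unmatched_lumps([1, 2, 2], [1, 2]))         # [2]
print(misassignment_possible([1, 2, 2], [1, 2]))  # True
```

An editor could use these checks to warn the user rather than silently guessing.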

Gez said:
...The checksum is a unique identification of content. File has a different checksum, file is different. File has identical checksum, file is identical.

Your match must be a combination of name and hash (and might as well include size), for several reasons: to decrease the odds of a false match (1 in 2^32) and, more importantly, to distinguish between differently-named 0-size lumps. It doesn't hurt to match on all fields, and it's beneficial in lots of ways. I'm sure it's the proper approach. (Maybe I misunderstood you???)

EDIT: Nope, I understood, from reading old posts. Gez, you gotta include name, at least. Some sprites have identical graphics, but different names. Ideally, you want to match all metadata entries with all lumps, for a 100% success rate. Any orphans start to allow some of Ling's scenarios to creep in. I don't know what the ramifications are, but I do know that you avoid all of them by matching on name and hash, at a minimum. Trust me on this one: it seals a potential leak, so to speak.
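A minimal sketch of the combined key kb1 describes (name, size and CRC-32 together). It shows the 0-size case directly: every empty lump has CRC-32 0, so marker lumps like S_START and S_END collide on checksum alone but not on the full key. The same holds for identically drawn sprites under different names.

```python
import zlib
from typing import NamedTuple

class MatchKey(NamedTuple):
    name: str
    size: int
    crc: int

def match_key(name: str, data: bytes) -> MatchKey:
    """Identify a lump by name, size and CRC-32 together."""
    return MatchKey(name, len(data), zlib.crc32(data) & 0xFFFFFFFF)

# Two empty marker lumps share the same (zero) checksum...
print(match_key("S_START", b"").crc == match_key("S_END", b"").crc)  # True
# ...but their full keys differ because the names differ.
print(match_key("S_START", b"") == match_key("S_END", b""))          # False
```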

EDIT 2: Damn, everyone picked apart Ling's questions. As much as I've typed into this thread, I could have written the code, debugged it, and provided examples :)

Linguica · May 19, 2015

esselfortium said:
After talking with Linguica about it on IRC, I should mention in here that if we're doing long texture1 names

That reminds me: so patches can obviously all have their own metadata entries because they're just normal bare lumps.

But textures aren't stored in the WAD directly, but rather they're defined in TEXTURE1/TEXTURE2/whatever as a mosaic of patches. So do they have their own metadata? If so, is it stored differently? Would the code just basically pretend they're "virtual lumps" or something?
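One possible reading of the "virtual lumps" idea, sketched under assumptions: vanilla TEXTURE1/TEXTURE2 start with a texture count and a table of offsets, and each texture definition begins with an 8-byte name, so an editor could enumerate the definitions and key metadata on a hypothetical `(container lump, texture name)` pair. Nothing in the thread settled on this keying.

```python
import struct
from typing import List, Tuple

def texture_names(texture_lump: bytes) -> List[str]:
    """List texture names defined in a TEXTURE1/TEXTURE2 lump (vanilla Doom layout)."""
    (count,) = struct.unpack_from("<i", texture_lump, 0)
    offsets = struct.unpack_from(f"<{count}i", texture_lump, 4)
    names = []
    for off in offsets:
        (raw,) = struct.unpack_from("<8s", texture_lump, off)  # maptexture name field
        names.append(raw.split(b"\0", 1)[0].decode("ascii"))
    return names

def virtual_entries(lump_name: str, texture_lump: bytes) -> List[Tuple[str, str]]:
    """Hypothetical: one 'virtual lump' key per texture definition."""
    return [(lump_name, name) for name in texture_names(texture_lump)]

def sample_texture1() -> bytes:
    """One-texture TEXTURE1 lump: count, offset table, maptexture, one mappatch."""
    # name, masked, width, height, columndirectory (obsolete), patchcount
    maptexture = struct.pack("<8sihhih", b"STARTAN3", 0, 128, 128, 0, 1)
    mappatch = struct.pack("<5h", 0, 0, 0, 1, 0)  # originx, originy, patch, stepdir, colormap
    return struct.pack("<ii", 1, 8) + maptexture + mappatch

print(virtual_entries("TEXTURE1", sample_texture1()))  # [('TEXTURE1', 'STARTAN3')]
```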

esselfortium · May 19, 2015

It's worth mentioning that even if longnames, texture1 metadata, or virtual directories are implemented, source ports would still not need to care about or be aware of metadata to remain compatible with wads. Binary-format Doom maps only have an 8-character string for textures, so the short name would always need to be written, regardless of whether a virtual name is being displayed to the user, or what virtual directory its patches are in.
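To show why the short name always has to be written: a vanilla binary SIDEDEFS record is 30 bytes (x offset, y offset, three 8-byte texture names, sector number). In the sketch below, the long display name lives only in a hypothetical editor-side mapping and never reaches the map data; `DISPLAY_NAMES` and its contents are invented for illustration.

```python
import struct

# Vanilla Doom SIDEDEFS record: x offset, y offset, upper, lower, middle texture, sector (30 bytes).
SIDEDEF = struct.Struct("<hh8s8s8sh")

# Hypothetical metadata mapping from the real 8-character name to a longer display name.
DISPLAY_NAMES = {"STARTAN3": "Tan panel, long display name"}

def pack_sidedef(x_off: int, y_off: int, upper: str, lower: str, middle: str, sector: int) -> bytes:
    """Write a binary sidedef using the true short texture names only."""
    for tex in (upper, lower, middle):
        if len(tex) > 8:
            raise ValueError(f"texture name {tex!r} does not fit in 8 bytes")
    return SIDEDEF.pack(x_off, y_off, upper.encode("ascii"),
                        lower.encode("ascii"), middle.encode("ascii"), sector)

raw = pack_sidedef(0, 0, "-", "-", "STARTAN3", 0)  # "-" means no texture
print(len(raw))                       # 30
print(DISPLAY_NAMES.get("STARTAN3"))  # an editor-side label; never written into the map
```

Deleting the metadata would only lose the display label; the map itself still references STARTAN3 and loads as before.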

The only possible way that metadata should be relevant to a source port is if the port's developer decides that they want to use metadata to add new functionality to their port, making use of metadata fields for something they decide to add. Metadata-equipped wads should be able to load normally in vanilla, because the game engine isn't expected to resolve virtual names or directories: it's just looking up the lumps in the normal list, by their true (short) names.

kb1 · May 19, 2015

Linguica said:
That reminds me: so patches can obviously all have their own metadata entries because they're just normal bare lumps.

But textures aren't stored in the WAD directly, but rather they're defined in TEXTURE1/TEXTURE2/whatever as a mosaic of patches. So do they have their own metadata? If so, is it stored differently? Would the code just basically pretend they're "virtual lumps" or something?

Whoa, we're discussing long texture names now? Scope creep, guys! You can't fit more than 8 characters into a sidedef entry (unless you're using, say, UDMF, and have grown the sidedef texture string storage space).

And, again, I am sorry for going overboard brainstorming ideas for this metadata lump. We should avoid adding anything that requires source port modification, I think. In other words, you should be able to delete the METADATA lump, and still use the WAD 100%, in my opinion. As reliable as the format is, it is NOT 100% reliable, as many have demonstrated.

Lump METADATA would not assist in long texture names anyway!

EDIT: esselfortium beat me to it!
