Jump to content
Search In
  • More options...
Find results that contain...
Find results in...
Sign in to follow this  
Maes

Endianness of Doom's data structures on disk?

Recommended Posts

As I progress in my Java Doom port, I inevitably stumbled upon the problem of data endianness. As we all (?) know, Java is for the most part Big Endian for what regards file I/O, while good old Doom has had most of its data created on an Intel system, and so most of the structs that are stored in WADs are little endian.

However the endianness problems were easy to overcome (highlight if you want the gory details, else skip to the question).

Spoiler

For pure file I/O I have created an extension of RandomAccessFile called "DoomFile" (similar to Jake2's "QuakeFile"), which inherits its readInt, readShort etc. methods and also adds little-endian specific readers for some data types, as well as facilities for reading arrays of objects/structs. Objects that can be read (& written) from/to disk implement the ReadableDoomObject and WritableDoomObject interfaces, whidh define the read(Doomfile f) and write (Doomfile f) methods. Each object then is responsible for self-loading given a pointer to a file (same approach used in Jake2).

A similar facility exists for memory-cached lumps (yeah, memory caching is implemented), only that the interface is called CacheableDoomObject...which does the same thing but from ByteBuffers. Surprisingly it was easier to handle endianness with those, since byte order can be changed on-the-fly anytime.

My question is: I encountered a weird situation with patch_t structs, in particular in HU_stuff.c, where the first of STCFN% lumps (hu_font[0]) that are read from disk (Doom fonts?) is treated strangely:

In the "official" GNU code there are the following lines:
HU_TITLEY =   (167 - SHORT(hu_font[0].height));

and

HU_INPUTY = (HU_MSGY + HU_MSGHEIGHT*SHORT(hu_font[0].height) +1);
which seem to change endianness of hu_font[0].height before using it in calculations with the SHORT macro. This makes no sense, as both hex editing and debugging reveals that the value of the height field is 0x0700 in hex -> 7 decimal in little endian, and thus the value of HU_TITLEY after loading should be 160.
Similarly, HU_INPUTY has a value of 8 after loading.

If I DON'T remove the SHORT macros (or their equivalent in my Java version of the code), then I get numbers like -1605...so my guess is that there is something fishy. Why should the endianness of just the first font patch's height change when being used?

Are there any lumps in the WADs which actually have big endian values stored in them?

Share this post


Link to post

No, unless you try to load the Jaguar Doom IWAD. (And then you have more trouble ahead because of the gratuitous encryption.)

On a little-endian platform, removing the SHORT macro shouldn't change anything (m_swap.h):

/ Endianess handling.
// WAD files are stored little endian.
#ifdef __BIG_ENDIAN__
short	SwapSHORT(short);
long	SwapLONG(long);
#define SHORT(x)	((short)SwapSHORT((unsigned short) (x)))
#define LONG(x)         ((long)SwapLONG((unsigned long) (x)))
#else
#define SHORT(x)	(x)
#define LONG(x)         (x)
#endif

Share this post


Link to post

That code is likely to deal with having a little-endian IWAD on Macs, as Macs used to be big-endian. I think the NeXT machines where also big-endian.

IMO the code would have been a lot cleaner if they had just made functions to read structs in member by memeber properly, instead of reading everything as a large block and putting these macros everywhere.

Share this post


Link to post
Scet said:

That code is likely to deal with having a little-endian IWAD on Macs, as Macs used to be big-endian. I think the NeXT machines where also big-endian.

They developed Doom on NeXT machines. The NeXT explanation seems a thousand times more likely than the Mac one.

Scet said:

IMO the code would have been a lot cleaner if they had just made functions to read structs in member by memeber properly, instead of reading everything as a large block and putting these macros everywhere.

It's C, not C++.

Share this post


Link to post
Gez said:

It's C, not C++.


So? A simple function pointer passed to something similar to the W_CacheLump functions would have worked.

Share this post


Link to post
Scet said:

So? A simple function pointer passed to something similar to the W_CacheLump functions would have worked.

\


There's no comparison in performance and coding effort between the two approaches: with a binary memory-mapped struct, you just read sizeof(struct) bytes from disk/buffer directly into a memory location and BAM! your struct is ready for use, regardless of complexity, with all the bits in the right place.

Unfortunately in Java (and in most managed languages) you have no guarantee on how stuff will be arranged in memory within structs/objects/arrays etc. and usually you have no direct access to memory to begin with, so you're forced to write per-struct/object marshallers (which is what I'm doing, and what the guys of Jake2 did).

Update: after scanning through some more of the files, I noticed that especially patch_t fields are read with SHORT(...) all the time. Since Java can parse them properly as little endians, the memory representation ceases being important.

However I agree, that's one instance where a struct-specific loader should be used: after all, patch_t loading is only done once and in very specific places (or on every cache miss), while reading of the structs field is much more frequent and scattered throughout the code. They could have written a struct loader for that one struct and keep the access to their fields simpler (and more efficient on big endian platforms, instead of changing it on-the-fly each time something needs to be read).

Still, that shows how certain details that can for the most part be overlooked when working directly with C/C++ become crucial when you move away from the -relative- convenience of using the C source almost "as is" in most other source ports.

Share this post


Link to post

afaik Java sees the data as big endian, depends though, on implementation. You could read byte by byte and shift it up.

Share this post


Link to post
GhostlyDeath said:

afaik Java sees the data as big endian, depends though, on implementation. You could read byte by byte and shift it up.


The memory representation is always the machine's native one, but bit manipulation at the register level (such as when using << and >> ) is always Big Endian, that's the only level where actual endianness is abstracted away. However when dealing with binary blobs stored on disk since time immemorable, it's a different story.

File I/O is Big Endian by default, even on Intel: RandomAccessFile and most other DataInput-implementing classes can only read Big Endian (so you need to manually compensate) but some libraries like ByteBuffer actually provide an endianness switching mechanism.

It's interesting to note how Java has gradually acquired classes and new functionality that makes the porting process somewhat easier than in the past: doing an e.g. Java v1.2 compliant port would be a lot harder (no enums, no ByteBuffer, no Java2D, etc.)

Share this post


Link to post
Maes said:

It's interesting to note how Java has gradually acquired classes and new functionality that makes the porting process somewhat easier than in the past: doing an e.g. Java v1.2 compliant port would be a lot harder (no enums, no ByteBuffer, no Java2D, etc.)

If backward compatibility is retained, any change or addition can only either make it easier or have no impact.

Share this post


Link to post
Gez said:

If backward compatibility is retained, any change or addition can only either make it easier or have no impact.


Java has underwent dramatic changes since it was made official, more than 10 years ago, and backward/forward compatibility is somewhat flaky, surprisingly much more than in C. It's more similar to how "compatible" the various BASIC dialects used in the 80s were between them.

If you get an early 1998 Applet, you'll very likely get deprecation warnings (especially if AWT and/or multithreading is used), and in some cases it may be outright uncompilable or non-functional if deprecation isn't addressed.

The funny thing is that the incompatibility manifests itself mostly at the source code level: already compiled .class files will in general work fine on modern JVMs, but their source code may not compile under the equivalent JDK.

Getting a modern Java 5.0 program written with any of the following features: generics, enhanced for loops, variable parameter methods, static import, use of new base libraries (which you can't just plug in on an older version of the JDK, as if they were third party libraries), enum etc. will just fail miserably with syntax errors, unknown keywords and missing/unknown classes on older JDKs, so forward compatibility is practically nonexistent.

Anyway, my point is that if I attempted this with just the pre 1.4.2 or even pre-5.0 feature set, it would be MUCH harder or even impossible. This is ALSO part of what caused all attempts at making a Java Doom so far to fail. That, and the need for a major OO redesign, which is part of what I'm doing now.

Share this post


Link to post

Shifts and bit mask positions in C are also standardized, and the idea that they change with machine word order is a common misconception.

BOOM (and thus Eternity) exploit this fact to provide what are termed symmetric endianness conversion routines. The effect of using byte-wise decomposition of the integer in concert with the corresponding masks and bitshifts results in code which, on little endian CPUs, does effectively nothing, while on big endian CPUs, swaps the byte order. It looks like such:

d_inline static int32_t SwapLong(int32_t x)
{
  return (((uint8_t *) &x)[3] << 24) +
         (((uint8_t *) &x)[2] << 16) +
         (((uint8_t *) &x)[1] <<  8) +
          ((uint8_t *) &x)[0];
}
Variants exist for signed/unsigned 16- and 32-bit integers for both little-endian and big-endian input. It's important to realize that the assumption made by the routine is the endianness of the input. Big endian input sent to the function above would remain big endian on a little endian machine, and would become little endian on a big endian machine. Achieving the opposite result is as simple as reversing the shifts in relation to the byte positions.

Of course I am not suggesting that this approach is viable in Java, since it does not have pointers. I am only pointing out that it is possible to handle this kind of thing in ANSI C without any messy macros, etc.

Share this post


Link to post

Interesting routine....the return value is always big endian when processed in registers and only becomes little endian again if written back to main memory.

BTW, I know of no practical CPU architectures that had register level little-endian representations, that's also reason of why such a code can work.

As for Java, when you read an int from memory you are effectively manipulating it at the register level, so it's big endian to all purposes and effects. Yeah, you can't access it like a small array of 4 bytes that directly, and even if you could, it would be big endian for the reasons I outlined.

However if you read it straight from disk with readInt() methods and took no further measures to ensure endianness sanity, then you may end with a scrambled number.

In any case the endianness issue was the easiest to address, I was just surprised to see on-the-fly endianness checks in several places in the code (BTW, I took a look at RUIN's codebase, which also has a functional WAD loader, and the author had come up with a similar solution (reading raw lumps into ByteBuffers, per-member parsing, use of setOrder and ByteOrder functionality in Java) but he did the dispatching in one large "God method" which has a rather large hardcoded switch statement for every case (PLAYPAL, levels, etc.)

There are other "programming gems" such as pointer level self-incrementing iterations, const char* strings, length limited or null-terminated strings to be read from disk or processed (I extended the readers for to handle this), and the best one so far: lookup of lump names is done by converting the strings into two ints (!) and using numerical comparisons.

Wanna know the funniest part? I converted even THIS part dutifully to Java (enter the dreaded name8 class!) and it worked!!! BTW, isn't that one of the aspects of DOOM that Lee Killough optimized in BOOM? IMHO a HashTable or an AttributeMap would work better...but I should benchmark!

Share this post


Link to post
Maes said:

Interesting routine....the return value is always big endian when processed in registers and only becomes little endian again if written back to main memory.

This seems to demonstrate continued misunderstanding if I'm reading your meaning correctly. Intel values are little endian while in registers as well as while in memory (this is simple to see by examining the binary instruction set used to load values directly into registers, and is implicit in how al vs ax vs eax etc work).

The level of abstraction you are seeing is in the C language, not in the machine. << 24 for example always means "shift 24 bits toward the *most significant bit*", NOT "shift 24 bits left", with "left" depending on the host endianness.

C language routines never attempt to infer the endianness of data loaded from disk. It stays in memory as it was read from disk. That means if your host CPU endianness does not match the input file's, you need to perform this exact correction. Otherwise it is unneeded.

Maes said:

BTW, I know of no practical CPU architectures that had register level little-endian representations, that's also reason of why such a code can work.

Intel architecture is little-endian ground-up. It is the language you use that shields you from having to care about it. But if you go write assembly language, this becomes important when dealing with different register sizes.

Maes said:

There are other "programming gems" such as pointer level self-incrementing iterations, const char* strings, length limited or null-terminated strings to be read from disk or processed (I extended the readers for to handle this), and the best one so far: lookup of lump names is done by converting the strings into two ints (!) and using numerical comparisons.

Indeed, this was an ineffectual hack. :) The amount of cycles spent to pack the lump names into DWORDS saves almost no time at all.

Lee Killough removed this hack and added a hash table (W_InitLumpHash in BOOM descendants). The overhead in lump name-to-number lookups comes from doing a linear search on a set of 3000+ entries and cannot be mitigated by hacks like DWORD packing that sometimes actually *add* CPU time required to do the comparison.

Share this post


Link to post
Quasar said:

This seems to demonstrate continued misunderstanding if I'm reading your meaning correctly. Intel values are little endian while in registers as well as while in memory (this is simple to see by examining the binary instruction set used to load values directly into registers, and is implicit in how al vs ax vs eax etc work).


TBQH, the actual endianness of registers is obscure in most CPU literature I've seen, as well as textbooks.

Almost all of them leave you with the idea that, as you said, when you're playing at the register level, the MSB is to the "left" (and in fact, even on Intel the SAL opcode (Shift Arithmetic Left) is used to multiply by 2, which means that there IS a left = big and right = little convention) (source). Thus, it's irrelevant if, somewhere in the polygon pushers' specs, the flip-flop containing a register's MSB is actually labelled "0" instead of "31": to all purposes and effects, the bit order is MSB-first (and left) even on Intel, inside registers, as treated by the SAL/SAR/SHL/SHR opcodes.

However, when written in memory, the picture changes: little endian apparently fucks up the order, while big endian does not. Yeah, I know that when you use the CPU's platform-specific instructions to load platform-specific data into its registers everything is going to appear fine and dandy. In fact, with C or C++ and maybe even C# on Intel there would be no problems: they'll probably map to low-level assembly to load an int as-is on disk, and if it happened to be little endian to begin with, then all is fine and dandy.

However Java always assumes big endian data even on Intel, when you're using readInt() of RandomAccessFile or of DataBuffer methods, and apparently overrides native endianness by default. And yeah, when you're doing bit shifting, it too falls for the "MSB == LEFT" and "LSB == RIGHT" convention.

So let me rephrase it: I've never seen a CPU architecture where "left" and "right" had different meanings, or where the opcodes DIDN'T hint at bit somehow being "left" and "right". A misnomer? An improper convention? Maybe, but it also has the added benefit/curse that is matches the Big Endian convention.

Again, textbooks and even standards stay silent/ambiguous about the use of left/right. However, de-facto left = MSB = multiply and right = LSB = divide.

Quasar said:

Indeed, this was an ineffectual hack. :) The amount of cycles spent to pack the lump names into DWORDS saves almost no time at all.

Lee Killough removed this hack and added a hash table (W_InitLumpHash in BOOM descendants).


Heh....I generally try not to optimize prematurely as I convert stuff, but I might make an exception for that. Here's what CheckNumForName looks like in Java:

public int CheckNumForName (String name){  
    int		v1;
    int		v2;
    //lumpinfo_t lump_p;	
    
    int lump_p; // MAES: John Romero would make this code his bitch.
    // make the name into two integers for easy compares
    // case insensitive
    name8 union=new name8(strupr (name));

    v1 = union.x[0];
    v2 = union.x[1];


    // scan backwards so patch lump files take precedence
    lump_p = numlumps;

    while (lump_p-- != 0)
    {
        int a=name8.stringToInt(lumpinfo[lump_p].name,0);
        int b=name8.stringToInt(lumpinfo[lump_p].name,4);
	if ( ( a== v1)&&(
	        b == v2))
	{
	    return lump_p;
	}
    }

    // TFB. Not found.
    return -1;
}
...I think I should better check out Lee Killough's way for that one ;-)

Share this post


Link to post

Everything you said is correct and I get your point, and it is true that the sal/sar/shl/shr opcodes refer to the big end as left, even though the physical layout is opposite, but the other way in which it matters is register layout corresponding to the various sizes of registers.

For example if you load a DWORD 0xAABBCCDD into eax (the 32-bit register), the contents of the register are DD CC BB AA. If you then access the 8-bit register al, you get 0xDD, and if you access the 16-bit register ax, you get 0xCCDD. Big-endian CPUs may or may not present the same semantics, but if they do, the chip circuitry actually has to be more complicated to accomplish it :) This is because the chip location it writes/reads will change on big-endian, whereas it does not on little. But this is becoming increasingly irrelevant to your project so let's not get further off-track ;)

I like your code. It barely looks different than the original.

Share this post


Link to post
Quasar said:

I like your code. It barely looks different than the original.


That's one of the things I'm trying to accomplish: I try not to override or replace functions, so that someone familiar with the C code will be able to find his way even through the Java code.

Also, note how the function has become a non-static method and lacks the W_ prefix: that's because it's part of a WadLoader class, and when actually used, you can do something like:

WadLoader W=new WadLoader();
// Status and data is entirely inside an instance of "W"!
W.InitMultipleFiles("DOOM.WAD", "BITTER.WAD");

int c= W.CheckNumForName("E1M1");
Similary, when initializing the "Heads Up" code (which is also completed):
// This particular instance of the wadloader and game state is
// passed to the Heads-Up class
HeadsUp HU=new HeadsUp(W, doomstate);
HU.Init();
where doomstate is a new non-static object representing Doom's internal status, mostly taken from doomstat.c and .h. Imagine, multiple internal states ;-)

In general I believe I've come pretty close to a consistent method to "objectify" Doom and create a new OO "Doom API", while at the same time trying to keep as close to the original code as possible.

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
Sign in to follow this  
×