Quasar

Future-safing C code?


I'm worried about what will happen with evolving C standards in the future and compiler support for code which isn't fully compliant.

Currently, with GCC 4.5, EE no longer executes successfully if the -fno-strict-aliasing flag is removed and -O2 or higher is used. This is due to the "strict aliasing" requirement of the C99 standard, which abruptly killed off the ability to cast so-called "unrelated type" pointers. No exception was made in the standard for similar structures (i.e., structures that begin with an identical prologue of fields), and no exception was made for void *.
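For illustration, a minimal sketch (not EE code) of why the optimizer can now break such programs: under the C99 rules the compiler may assume that pointers to unrelated types never refer to the same storage, so at -O2 it is free to reorder or cache accesses made through them.

// Hypothetical example, not from EE. Under strict aliasing the compiler
// may assume *ip and *fp never overlap, so it can fold the return value
// to the constant 1 -- even if a caller has punned the same bytes
// through both pointer types.
int punned_update(int *ip, float *fp)
{
  *ip = 1;
  *fp = 0.0f;
  return *ip;   // may be optimized to "return 1"
}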

Basically the ability to do structured or object-oriented programming in C has been butchered, and I'm concerned that support for C90 or prior code that uses these once-valid constructs will eventually disappear. GCC is probably pretty safe for the foreseeable future due to the fact that prominent projects such as the Linux kernel depend on the switch. But what about commercial products like Visual C++, which typically have less customizability when it comes to standards-compliance and optimization behavior?

Moving to C++ is a partial solution, but this would break as many things as it would fix. For example the zone system is not compatible with C++ objects due to the way in which C++ makes assumptions about alignment when generating code to access members of class instances. So even if we go C++ and convert all our type-punning code into class hierarchies, we're suddenly cut off from our allocator, and its support for domain-specific lifetimes and garbage-collected auto allocs.

So what do we do? I think we've become a project stuck between languages with no safe place to turn.


Aliasing issues could likely be avoided if structs were properly padded and aligned, but I know many ports (if not all) are still relying on raw fread/fwrites and struct sizes being exact.

The zone system can be implemented in C++. I've actually done something similar, but unfortunately can't find it. It worked by overloading the new operator, which will give you the real size of a class (virtual table and all) and call the class's constructor, while still allowing for block allocation.
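A hedged sketch of roughly what's being described here (the Z_Malloc/Z_Free signatures and the PU_STATIC tag are assumed stand-ins for whichever zone API a given port actually has):

#include <cstddef>

// Assumed zone API -- signatures vary from port to port.
extern "C" void *Z_Malloc(std::size_t size, int tag, void *user);
extern "C" void  Z_Free(void *ptr);
#define PU_STATIC 1   // placeholder tag value

// Route C++ allocation through the zone heap by overloading operator new
// on a common base class. The size passed in already covers the vtable
// pointer and all derived-class fields, and the constructor still runs
// after the zone block is obtained.
class ZoneObject
{
public:
  void *operator new(std::size_t size) { return Z_Malloc(size, PU_STATIC, NULL); }
  void  operator delete(void *p)       { if(p) Z_Free(p); }
  virtual ~ZoneObject() {}
};

class Ceiling : public ZoneObject { /* thinker fields, Think(), etc. */ };
// "new Ceiling" now allocates from the zone and still calls the constructor.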

I doubt compilers will remove those options. There's plenty of C code for embedded systems out there that relies on non-aliasing, and judging by VC's ability to compile for different architectures I'd say MS still wants a piece of that pie. There's also Maes' Java port, and I'm working on a "clean-up" of the Doom source that should eliminate some memory hacks.

I think you're overreacting.

Quasar said:

Moving to C++ is a partial solution, but this would break as many things as it would fix. For example the zone system is not compatible with C++ objects due to the way in which C++ makes assumptions about alignment when generating code to access members of class instances.


I don't understand this one. In what way precisely is this incompatible? I never had problems with C++ code that overloaded operator new with a custom allocator.

Scet said:

Aliasing issues could likely be avoided if structs were properly padded and aligned, but I know many ports (if not all) are still relying on raw fread/fwrites and struct sizes being exact.

File IO and memory-mapping structs onto byte arrays is a different issue entirely. EE's current challenge is the casting of what the compiler considers to be unrelated types, which are actually meant to exist in a sort of inheritance relationship. DOOM itself used this, to pun various structure types as thinker_t, and to pun degenmobj_t as mobj_t. EE extended the same approach to structures such as mdllistitem_t and, most recently, metaobject_t.

Graf Zahl said:

I don't understand this one. In what way precisely is this incompatible? I never had problems with C++ code that overloaded operator new with a custom allocator.

C++ code is allowed to make alignment assumptions, e.g. that objects of type Foo always begin on a 16-byte boundary. Padding the beginning of the allocation with additional fields, such as the zone block header, may or may not maintain such assumptions, and just because it works on most compilers doesn't mean it'll work everywhere or that it will work forever.
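A sketch of the usual defensive fix for that, assuming a Z_Malloc-style block header (memblock_t here is a stand-in, not EE's actual layout): round the header size up to the strictest alignment you care about, so the user area keeps whatever alignment the underlying heap provided.

#include <stddef.h>

// Hypothetical header; field names are placeholders.
typedef struct memblock_s
{
  int                size;
  int                tag;
  void             **user;
  struct memblock_s *next, *prev;
} memblock_t;

// Pad the header to a 16-byte multiple so the user area that follows it
// stays 16-byte aligned (assuming the raw block itself was allocated
// with at least that alignment).
#define ZONE_ALIGN  16
#define HEADER_SIZE ((sizeof(memblock_t) + (ZONE_ALIGN - 1)) & ~(size_t)(ZONE_ALIGN - 1))

/* user pointer = (unsigned char *)block + HEADER_SIZE */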


I guess any compiler making such assumptions would be in deep trouble with any piece of code that has ever been written to use a custom allocator.

However, I'd say this is the same kind of paranoid fear which made you start this thread in the first place. Even if they move towards C99 (MS hasn't done that yet at all and probably never will), no compiler can afford to drop support for older standards. There's so much code out there, much of it even in old K&R syntax, that any compiler developer who'd dare to remove it would probably be skinned alive...

Anyway, can you point me to some source for the alignment assumption you named? I never heard of it, and I've been using C++ for longer than Doom has existed.


Graf Zahl said:

no compiler can afford to drop support for older standards. There's so much code out there, much of it even in old K&R syntax, that any compiler developer who'd dare to remove it would probably be skinned alive...


Tru dat. It's even more extreme in languages such as Fortran, which have to stretch backwards compatibility over a range that well exceeds that of the most ancient C compiler you can think of - and yet you can run a '50s FORTRAN program, with minor adaptations, even under a modern Fortran 95 compiler.

However, to stay on topic, I'd say that you can reasonably future-proof any Doom codebase against the specific issues of I/O and memory alignment by using custom unmarshaling/caching/allocation, as I do in Mocha Doom. At least THAT part of the code proved to be rock-steady, and it doesn't depend on being able to "yank" raw disk byte reads into memory, hoping that they will "land" precisely on your idealized arrays of structs.

I see no reason why you couldn't do that in C or C++ too, other than that you'll spend some extra time writing the loaders/allocators by hand and actually calling them whenever you need to read something from disk, rather than just yanking a bunch of bytes directly into memory. Having to do that allows for some load-time optimizations, e.g. splitting column-based graphics neatly into separate columns, that need to be done only once.

However, even the original Doom code itself had loaders that did some custom processing before handing control to the game, be it adapting between on-disk and in-memory structs (yup, many things have a dual format), converting/expanding to fixed-point arithmetic, and/or accounting for endianness mismatches.
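As a concrete (hedged) example of such a loader in C-style code, here's a sketch of unpacking a WAD vertex field-by-field instead of fread()ing the on-disk struct straight over the in-memory one; the original source does roughly this in P_LoadVertexes with its SHORT() byte-swap macro, while this sketch uses an explicit little-endian read helper.

#include <stdint.h>

#define FRACBITS 16
#define FRACUNIT (1 << FRACBITS)
typedef int32_t fixed_t;

typedef struct { fixed_t x, y; } vertex_t;   // in-memory layout

// Read one little-endian 16-bit value from the raw lump, regardless of
// the host's endianness or struct padding.
static int16_t ReadLE16(const unsigned char *p)
{
  return (int16_t)(p[0] | (p[1] << 8));
}

// Unpack one on-disk vertex (two LE16 values) and expand to fixed point.
void UnpackVertex(vertex_t *out, const unsigned char *raw)
{
  out->x = (fixed_t)ReadLE16(raw + 0) * FRACUNIT;
  out->y = (fixed_t)ReadLE16(raw + 2) * FRACUNIT;
}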


Well on further reading it doesn't seem that allocation alignment concerns are actually more strict in C++ than they are in C, so I'm probably creating a false issue there. Doing stuff like Z_Malloc does to add a header into an allocation has always been iffy, but it's always been possible to pull off too - and if a particular platform had issues you'd just have to account for them by adding the proper amount of padding I guess.

Calling my concerns paranoid is a bit out of line, I think, as I didn't think we'd live to see a C standard that made pointer casting essentially illegal in the first place. The main tenet of C as a language was always its bare-metal lack of abstractions placed on top of machine memory. A pointer was a pointer, arrays were pointers, structures were just memory buffers, etc. But C99 has reduced that quite a bit, and it naturally concerns me - maybe the next standard will make it even more strict, who's to say.

It'd be OK if the other parts of C99, such as support for ISO fixed-size data types, weren't lumped in with these fundamental semantic changes, but as is, EE is basically using parts of C99 without being fully compliant, and that makes me feel dirty as a programmer :/ In my mind it'd be ideal to have full compliance. The solutions I'm seeing for doing that while still being able to use "pseudo-OO" techniques are sorry, though - unions, for example.


Q, we had this conversation a looooong time ago. I still think converting the code progressively to C++ is the way to go. Start with the parts which you know will cause problems with the new C standard then continue on.

I never understood the idea that the Doom source code needed to stay in C and that pseudo-OO was a necessity. It's not. I'm always amazed to see how dreadfully redirection-heavy (in terms of function calls) the original code is, while it is perfectly feasible to go the "object with inline methods" way 99% of the time... which would make the source code that much more maintainable and, paradoxically, more optimized (anyone still pretending C++ is slower obviously has no idea how to code in it properly from a performance point of view).

The only issue I see revolves around backward compatibility (i.e. demos), but it's nothing you can't tackle provided you have a set of maps and associated demos to use as "unit tests".

God, I wish I had more time on my hands.

Best of luck with whatever decision you take, Eternity is still the source port I'm most in love with ;)

Julian said:

Q, we had this conversation a looooong time ago. I still think converting the code progressively to C++ is the way to go. Start with the parts which you know will cause problems with the new C standard then continue on.

I never understood the idea that the Doom source code needed to stay in C and that pseudo-OO was a necessity. It's not. I'm always amazed to see how dreadfully redirection-heavy (in terms of function calls) the original code is, while it is perfectly feasible to go the "object with inline methods" way 99% of the time... which would make the source code that much more maintainable and, paradoxically, more optimized (anyone still pretending C++ is slower obviously has no idea how to code in it properly from a performance point of view).

The only issue I see revolves around backward compatibility (i.e. demos), but it's nothing you can't tackle provided you have a set of maps and associated demos to use as "unit tests".

God, I wish I had more time on my hands.

Best of luck with whatever decision you take, Eternity is still the source port I'm most in love with ;)

Thanks for the advice Julian. I will probably be creating a cpp-branch of EE so I can see how much effort will be involved in the conversion. I don't think it'll actually be very difficult as I've already eliminated most of the problems aside from lack of explicit pointer casts. Of course all the places doing type punning need to be converted to use proper inheritance mechanisms, as C++ frowns on the former almost as much as C99, and I figure that'd be the bulk of the work needed.
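For what it's worth, a hedged sketch of the shape that conversion usually takes (class names here just mirror the C structs; this is not the actual EE design): the thinker function pointer becomes a virtual method, and the list walk no longer needs any casting at all.

// Sketch: thinker punning recast as inheritance. The function-pointer
// "RTTI" turns into virtual dispatch.
class Thinker
{
public:
  Thinker *next, *prev;
  virtual void Think() = 0;
  virtual ~Thinker() {}
};

class Mobj : public Thinker
{
public:
  int x, y, z;              // fixed_t in the real code
  virtual void Think() { /* former P_MobjThinker body */ }
};

void P_RunThinkers(Thinker *cap)
{
  for(Thinker *th = cap->next; th != cap; th = th->next)
    th->Think();            // replaces the if(th->function == P_MobjThinker) dispatch
}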

Quasar said:

which abruptly killed off the ability to cast so-called "unrelated type" pointers.

I'll believe that when I see it -- citation needed.

You could always switch to GCC for all platforms. Someone's bound to chime in about how MinGW (the cross compiler) is the worst compiler in the world, but it works for me (tm).

andrewj said:

I'll believe that when I see it -- citation needed.

You could always switch to GCC for all platforms. Someone's bound to chime in about how MinGW (the cross compiler) is the worst compiler in the world, but it works for me (tm).

But for those interested in up-to-date definitive information on the C standard, refer to ISO/IEC 9899:TC2 [open-std.org]. Here is the most relevant text from section "6.5 Expressions":

C99 said:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

* a type compatible with the effective type of the object,
* a qualified version of a type compatible with the effective type of the object,
* a type that is the signed or unsigned type corresponding to the effective type of the object,
* a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
* an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
* a character type.
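The last two bullets are the portable escape hatches: access through a union that contains both types, or byte-wise access through a character type, which is effectively what memcpy does. A minimal sketch of the memcpy form (assuming sizeof(float) == sizeof(unsigned int)), which compilers typically optimize down to a plain register move:

#include <string.h>

// Well-defined under 6.5: copy the object representation instead of
// dereferencing a casted pointer.
// Assumes sizeof(float) == sizeof(unsigned int).
unsigned int FloatBits(float f)
{
  unsigned int u;
  memcpy(&u, &f, sizeof u);
  return u;
}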

andrewj said:

I'll believe that when I see it -- citation needed.

You could always switch to GCC for all platforms. Someone's bound to chime in about how MinGW (the cross compiler) is the worst compiler in the world, but it works for me (tm).




... and lose all the debugging features MSVC offers? I can't speak for Quasar, of course, but for me that'd amount to a significant loss of productivity.


You can work around a few strict aliasing problems by wrapping things in unions. For example, instead of calling S_StartSound on a degenmobj_t directly, I use this wrapper:

//
// P_SectorSound()
//
// S_StartSound for a sector or linedef sound origin (degenmobj_t)
//

void P_SectorSound(const degenmobj_t *soundorg, int sound_id)
{
  union { const mobj_t *mobj; const degenmobj_t *soundorg; } alias;

  // avoid type punning breaking strict alias rule warnings
  alias.soundorg = soundorg;
  S_StartSound(alias.mobj, sound_id);
}
One day I'm basically going to put every different kind of thinker_t in an incredibly big union. It's been on my todo list for a long time but it seems like a lot of work so I haven't bothered yet.
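For reference, a sketch of what that big union might look like, with stand-in struct definitions (the real members would be the port's actual thinker structs). Because an aggregate or union containing the punned types is on the 6.5 list quoted earlier in the thread, access through such a union's members is at least arguably blessed.

// Stand-in definitions; in a real port these are the existing structs.
typedef struct thinker_s
{
  struct thinker_s *next, *prev;
  void (*function)(void *);
} thinker_t;

typedef struct { thinker_t thinker; int x, y, z; }          mobj_t;
typedef struct { thinker_t thinker; int speed, topheight; } ceiling_t;

// One object type for every thinker: all kinds share storage, and each
// begins with the common thinker_t header.
typedef union
{
  thinker_t thinker;
  mobj_t    mobj;
  ceiling_t ceiling;
  /* ...every other thinker type... */
} thinkerobj_t;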


I think that kind of mess is precisely what Quasar does not want to do.

If C99 really has such excessively strict typecasting rules I honestly don't see that version of the language go anywhere. It goes against everything C stands for.


With regards to that particular code snippet, at least in MochaDoom I got around it by making degenmobj_t inherit (or extend) from mobj_t. If you bother examining the two "classes" closely, you'll see that a degenmobj_t is just a small, memory-aligned subset of a mobj_t which is why this approach works (or at least stops the compiler from nagging ;-)

Since the S_StartSound function just needs the positioning info of a mobj_t, that way works just fine (and S_StartSound can be declared as taking just mobj_t arguments rather than using unions or overloading).

C++ should have no problem supporting a similar pure OO approach just like Java or C#.

Maes said:

With regards to that particular code snippet, at least in MochaDoom I got around it by making degenmobj_t inherit (or extend) from mobj_t. If you bother examining the two "classes" closely, you'll see that a degenmobj_t is just a small, memory-aligned subset of a mobj_t which is why this approach works (or at least stops the compiler from nagging ;-)

Since the S_StartSound function just needs the positioning info of a mobj_t, that way works just fine (and S_StartSound can be declared as taking just mobj_t arguments rather than using unions or overloading).

C++ should have no problem supporting a similar pure OO approach just like Java or C#.



What you write here does not make sense. degenmobj_t is a subset of mobj_t, so inheriting as you describe would make it as large as a full mobj_t, thereby negating its entire purpose. If inheritance is to be used here, the other way around would be better. Of course the proper solution would be to overload S_StartSound with versions that take a sector_t or a line_t as a parameter, so that this kind of hackery is not even necessary.
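In C++ terms, a hedged sketch of what that would look like (names follow the Doom structs; the overloads themselves are hypothetical): the lightweight sound origin becomes the base class, mobj_t extends it, and S_StartSound variants take the owning structures directly so callers never cast anything.

// Inheritance "the other way around": the small sound origin is the base.
struct degenmobj_t { int x, y, z; };                 // fixed_t in the real code
struct mobj_t : public degenmobj_t { /* the rest of the map-object fields */ };

struct sector_t { degenmobj_t soundorg; /* ... */ };

// Any derived object converts to the base implicitly -- no cast needed.
void S_StartSound(const degenmobj_t *origin, int sound_id);

// Hypothetical convenience overload so callers can pass the sector itself.
inline void S_StartSound(const sector_t *sec, int sound_id)
{
  S_StartSound(&sec->soundorg, sound_id);
}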

Graf Zahl said:

If inheritance is to be used here the other way around would be better.


Duly noted ;-)

Then again, from what I recall, degenmobjs are never explicitly allocated in the code, only "casted" from mobj_t's, so functionally there's no difference in the space used: you're passing pointers to mobj's anyway, and you already allocated them by the time you reach that code.

When you "cast" a mobj_t to degenmobj_t in C you're just passing a pointer to a full mobj_t and saying "treat this as a degenmobj_t", which has the side effect of making only certain fields directly visible to the . and -> operators (pointer hacks notwithstanding). So in that sense there's no extra space penalty from just having degenmobjs extend mobjs, because you never instantiate them explicitly to begin with. TBQH, there's no reason to distinguish between them, semantically (other than implementing a poor man's version of object immutability and interfacing).

Then again, a hardcore OO purist would argue that the way to go would be to use getters/setters (or just getters) rather than directly exposed fields, and make mobj_t and degenmobj_t implement an interface which exposes just these methods, no matter what fields they actually contain. But we like it hard and dirty, don't we?


The degenmobjs in the C version are parts of the sector structure so they take up space.

Maes said:

Then again, a hardcore OO purist would argue that the way to go would be to use getters/setters (or just getters) rather than directly exposed fields, and make mobj_t and degenmobj_t implement an interface which exposes just these methods, no matter what fields they actually contain. But we like it hard and dirty, don't we?


Hardcore OOP purist code is probably the only thing that's worse than badly structured C. All that 'clean' encapsulation can easily end up more Spaghetti than gotos if you have to click through 10 or more getter functions to find out what it does. Been there, done that, no fun at all.

Graf Zahl said:

Hardcore OOP purist code is probably the only thing that's worse than badly structured C. All that 'clean' encapsulation can easily end up more Spaghetti than gotos if you have to click through 10 or more getter functions to find out what it does. Been there, done that, no fun at all.


Yep. I find that the only way to judge code is from a maintenance point of view. Over-architected OOP is no better than the hackathon mess the Doom code ended up being.

Graf Zahl said:

The degenmobjs in the C version are parts of the sector structure so they take up space.


Duly noted. That would make a getter/setter/interface approach more sensible, since you can't have interfaces that define just member fields in Java. Or simply say "fuck it", waste a few extra bytes of memory and make degenmobjs a mobj at their core (or vice versa).

In any case, S_StartSound and its callee S_StartSoundAtVolume are both weakly typed (they accept void*), so in order to avoid that kind of cast you'll have to define a base "sound origin" class anyway. For now I made that be just mobj_t, but as you said it makes more sense to have a more lightweight object with just x and y information... arghhhh

The bottom line is that if you really-really-really want to future-proof the code or depart from C/C++ altogether, you'll have to progressively move away from typeless raw data and accept at least SOME strong typing, although I too believe that, as Graf said, C will always be C just as Fortran will always be Fortran, and the issue will never go beyond a few compiler warnings/discouragements.

Maes said:

and accept at least SOME strong typing



Strong typing as such is not a problem. It becomes one if it's used in a language which has no concept of inheritance and type relationships. Then you'd be back to the bad old Pascal/Modula-2 days. So the irony here is that although C claims to become safer by this, all it will produce is even more hacky code to get around a pointless limitation. RjY's union is a perfect example of bad ideas spawning even worse workarounds.


Well I thought it was pretty clever I mean it fixed the warnings which was kind of the point

I'm just gonna go sit in the corner I guess


I didn't say it was not clever.

But ultimately it's a workaround for a problem that shouldn't really exist in the first place. It's just a more clumsy way to do an 'unsafe' conversion with precisely the same end result.

Graf Zahl said:

But ultimately it's a workaround for a problem that shouldn't really exist in the first place. It's just a more clumsy way to do an 'unsafe' conversion with precisely the same end result.

Why could they not have simply defined an "alias" keyword, similar to the "restrict" keyword, which would tell the compiler NOT to make aliasing assumptions about a particular pointer? Then it would have been a simple matter of adding this keyword to any pointers that can alias each other.

But design by committee always seems to turn out like design by gorillas.
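Worth noting: something close to this does exist as a vendor extension. GCC has a may_alias type attribute that exempts accesses through a given type from type-based alias analysis; it's non-standard, and it works per type rather than per pointer, but it expresses essentially the same opt-out.

// GCC extension (not standard C): objects accessed through a may_alias
// type are treated like character-type accesses for aliasing purposes,
// so the optimizer must not assume they can't overlap other types.
typedef int __attribute__((__may_alias__)) alias_int;

int ReadPunned(void *p)
{
  alias_int *ip = (alias_int *)p;   // no strict-aliasing assumption applies
  return *ip;
}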

Graf Zahl said:

If C99 really has such excessively strict typecasting rules I honestly don't see that version of the language go anywhere. It goes against everything C stands for.

QFT. The only issue here is compilers which have jumped on it and now opt us in by default. My opinion is that it is the default rule-set used by such compilers which is 'wrong'.

Any C compiler which defaults to C99 is logically broken imo.


As a weekend hobby coder I don't dig too deep into compiler science, but I do hate all the new warnings and errors with every major gcc release. I've already run into that strict aliasing warning; I might try RjY's solution.

I don't have a problem with Zone; I converted PrBoom's functions into a C++ class a long time ago.

RjY said:

One day I'm basically going to put every different kind of thinker_t in an incredibly big union. It's been on my todo list for a long time but it seems like a lot of work so I haven't bothered yet.


I'm trying to do the opposite. At first glance I could create separate lists for the different thinker types and run their functions separately. Not sure about the consequences; I'll be wiser a month from now.

rpeter said:

I'm trying to do the opposite. At first glance I could create separate lists for the different thinker types and run their functions separately. Not sure about the consequences; I'll be wiser a month from now.



Desyncs will be guaranteed if you do that.

What I don't really understand here is why you have to alter the code to work with GCC's default mode. Doesn't it have a command line option to disable such annoying features, for precisely the scenario where you want to use the (valid) old syntax?

In the end, if you 'fix' your code, all you're doing is playing into the hands of the idiots that made the change.


I thought about demo desyncing; that's no problem for me, I removed all the demo code ages ago: I don't care about demos.

I want to dissect that thinker stuff to pieces, because I have problems following all the casting, union, void* parameter hackage. I'm not a pro, just a hobby coder, cannot argue with compiler makers.

rpeter said:

I want to dissect that thinker stuff to pieces, because I have problems following all the casting, union, void* parameter hackage. I'm not a pro, just a hobby coder, cannot argue with compiler makers.

The only casting that goes on in the thinker system is this sort of thing:

for(th = thinkercap.next; th != &thinkercap; th = th->next)
{
  if(th->function == P_MobjThinker)
  {
     mobj_t *mo = (mobj_t *)th; // GCC WARNING: Cast to unrelated type violates strict aliasing rules
     ...
  }
}
This is what is referred to as type punning. It was at least implicitly allowed up through C90, and it makes sense so long as the types involved are analogous in their prologue fields - and from what I've learned, the C99 standard in fact included, for a time, an exception for such types in the strict aliasing rules - and it was removed because some morons protested it.

Why do C programs do this? It's pseudo-inheritance. By casting a structure down to a larger type which contains an instance of the smaller type, you have achieved exactly the same thing that is done in C++ by using dynamic_cast. In the example above, the value of th->function serves as RTTI for thinkers.
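For comparison, a hedged sketch of the C++ spelling of that same loop, with thinker_t recast as a polymorphic base class so there is real RTTI for dynamic_cast to use:

// Minimal stand-in classes; a real conversion would carry over all fields.
class Thinker
{
public:
  Thinker *next, *prev;
  virtual ~Thinker() {}      // polymorphic, so dynamic_cast works
};

class Mobj : public Thinker
{
public:
  int x, y, z;
};

void P_RunMobjPass(Thinker &thinkercap)
{
  for(Thinker *th = thinkercap.next; th != &thinkercap; th = th->next)
  {
    if(Mobj *mo = dynamic_cast<Mobj *>(th))   // replaces the function-pointer check plus cast
    {
      (void)mo;   // ... operate on mo here ...
    }
  }
}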

In Eternity, the metatable has an elaborate RTTI system based on hashed string keys, and downcasting is achieved by checking MetaIsKindOf with the METATYPE() macro applied to the name of the type you want to downcast a metaobject_t toward - for example metaint_t, or metastate_t.

Obviously it'll be simple to convert this to C++, a language that still, as of the time of this writing, supports such notions. C99 is not on that list.

Quasar said:

Obviously it'll be simple to convert this to C++, a language that still, as of the time of this writing, supports such notions. C99 is not on that list.



C++ will be 'safe' unless they remove reinterpret_cast from the language. That one's existence clearly shows that the designers knew well enough that serious programmers need such options.

