flubbernugget

Reading Source Code

I have recently been learning C++ and to further my comprehension of it I have been trying to read source code of various programs and projects I have found on the internet. However, these huge source files are very overwhelming and hard to follow. So I ask the programmers at DoomWorld:

How do you begin to look through and modify source code you have never seen before?

Are there any projects in particular that are easy to read for beginners?

How crippled will I be in understanding source code if I do not know the libraries it uses? Would I find it difficult to understand how a program uses physics if its rendering section heavily uses OpenGL?


I think the best way to learn how to read source code is to write a lot of it. The more source code you've written, the more experience you have working with it and after a while it becomes second nature.

That said, some code is much more easy to read than others. The presence of comments makes a lot of difference. It helps if you have some level of understanding of what the program is doing. This might just be knowing what the program does or it might be more detailed knowledge of how it functions. Basically every part of the source code is a small cog in the overall machine, and the more information you have about how the machine is put together and what it does, the easier it is to figure out how the code you're looking at fits in.

If you're new to programming, the best advice I can give is to start with small programs and gradually build up to more complicated ones. It helps greatly if you can work with code that is well-written - by which I mean, lots of comments, well structured, divided into small functions that can be understood in isolation. You describe "huge source files [that] are very overwhelming and hard to follow" - these are almost certainly the wrong projects for a beginner to be trying to figure out.

How crippled will I be in understanding source code if I do not know the libraries it uses? Would I find it difficult to understand how a program uses physics if its rendering section heavily uses OpenGL?

If you don't know the libraries and those libraries are complex ones like OpenGL, you won't be able to understand the code that interfaces to those libraries. In a well-written program though, different subsystems should be separated - it would make no sense to see OpenGL calls in the physics code.
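
As a rough illustration of that separation, here is a minimal Python sketch. It is not taken from any real engine, and `Body`, `step`, `draw` and `GRAVITY` are hypothetical names made up for the example. The physics function is plain arithmetic and never calls the drawing code, so it can be read without knowing any rendering library.

```python
from dataclasses import dataclass

GRAVITY = -9.8  # assumed constant; units are arbitrary in this sketch


@dataclass
class Body:
    y: float   # height
    vy: float  # vertical velocity


def step(body: Body, dt: float) -> Body:
    """Physics only: advance the body by dt with a simple Euler-style update."""
    vy = body.vy + GRAVITY * dt
    return Body(y=body.y + vy * dt, vy=vy)


def draw(body: Body) -> str:
    """Rendering only: a stand-in for the OpenGL side of a real program."""
    return f"body at height {body.y:.2f}"
```

If you only care about how things move, `step` is the whole story; `draw` could be swapped for real OpenGL calls without touching it.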


For a large-scale project, it doesn't really make sense to try to understand all parts of it at once. Generally, when I want to modify something in a program, I have a rough idea of where it should be, or at least how I could find it. For example, I look for a text string shown to the user when something happens, something related to what I want to change. Then grep the entire source code for that string (a good text editor can do that for you with a "Find in Files" option, generally Shift+Ctrl+F instead of just Ctrl+F). Then go upstream from there. Find the string's ID (if there's such a system for internationalization support). Find where that ID is referenced in the code. Look at what the function that references it does. Find what other functions call this function. And so on. It can be done rather quickly.
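
If you have neither grep nor an editor with that feature handy, the search step can be sketched in a few lines of Python. This is only an illustration; `find_in_files` and its default list of file suffixes are assumptions made up for the example, not part of any tool mentioned here.

```python
from pathlib import Path


def find_in_files(root, needle, suffixes=(".c", ".h", ".cpp")):
    """Return (path, line_number, line) for every source line containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and anything that is not a source file.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Going upstream is then just a matter of repeating the search: first for the user-visible string, then for the ID or function name you found next to it, and so on.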

A good integrated development environment can help you with that, too. Something like MSVC's "IntelliSense" is helpful for making visual representations of function calls.

Comments do make it much easier though. If the function is commented and uses explicit variable names, you don't have to read and understand all the code to understand what it does and what it's supposed to be doing.

fraggle said:

That said, some code is much more easy to read than others. The presence of comments makes a lot of difference.

Not to mention simply the way the code is written. Some people seem to have a knack for writing the most ludicrously obscure code for no particular reason, and understanding such bullshit is an inhuman task.

flubbernugget said:

I have recently been learning C++ and to further my comprehension of it I have been trying to read source code of various programs and projects I have found on the internet. However, these huge source files are very overwhelming and hard to follow. So I ask the programmers at DoomWorld:


I'm more of a traditional C/C#/Java programmer, but I often have to read/write code in other languages as well, including Pascal/Delphi/Fortran/Matlab script etc.

From my experience, C++ source files (especially if there's template abuse going on) are easily the hardest to follow, and without a particularly powerful IDE, they may be next to impossible to follow and look "beyond" the indirections. That being said, I've seen well-written C++ code, but I'd say you started with the wrong language. I recently had to translate template-based C++ code to C#, and the final result was much cleaner and easier to follow. C++ code that's just C with syntactic sugar may be easier to follow.

flubbernugget said:

How do you begin to look through and modify source code you have never seen before?


As others said, by looking for a specific functionality, although excessive indirection and template metaprogramming may be able to effectively hide it if you don't "get" into the mindset of the original programmer.

flubbernugget said:

Are there any projects in particular that are easy to read for beginners?


The Doom source code is a good place to start, surprisingly, although it's pure C. Otherwise I'd point you to early Borland C++ examples (if you don't mind their DOS-oriented flavor) and then work your way upwards from there.

flubbernugget said:

How crippled will I be in understanding source code if I do not know the libraries it uses? Would I find it difficult to understand how a program uses physics if its rendering section heavily uses OpenGL?


In general that's a show-stopper if you don't have the libraries or at least if you're not able to detect that, yes, the program is calling an external library at a certain point. Some stuff that interferes with normal language syntax is the worst (#pragmas, BOOST macros etc.) as you absolutely need the header files.

Maes said:

From my experience, C++ source files (especially if there's template abuse going on) are easily the hardest to follow.



I have to concur and it's something I really don't get. C++ is a language that, when used properly, should make it relatively easy to write clean and comprehensible source code and yet nearly any publicly available C++ code makes a mess of using the language's features in a way that's neither efficient nor elegant.

Of course nothing of this can be really surprising when you look at the standard C++ libraries, in particular that abomination called STL.
To me it just serves as the best example of how *NOT* to write source code. No matter how powerful it is, the source is a complete and utter mess and debugging code using it is a nearly hopeless affair.

Sadly too many C++ developers seem to take their pointers from STL and write equally obtuse code... :(

I've come to avoid C++ based public libraries like the plague. 90% of them are so deeply tied into specific C++ features that they are essentially non-portable and utterly useless in certain situations, especially where memory footprint is a concern. Clean C libraries are much better most of the time.


Just start small and take your time. Pick something fun and start tinkering with it. To become proficient at writing or reading code, you have to write a lot of it and read a lot of it. It takes time.

The key to writing software is to break it into independent pieces that are small and simple enough to be understood separately. I don't know if there's any trick to familiarizing yourself with an existing codebase. In general, it's much harder to read code than to write it. But a similar principle applies: try to understand some small piece of the code, and then branch out a little bit at a time. Be prepared to spend time on the task; there's not really much more to it than time and the experience that comes with time.

Graf Zahl said:

Clean C libraries are much better most of the time.

Very true. Now if only C had just slightly better support for generics...

Fredrik said:

Very true. Now if only C had just slightly better support for generics...

You mean *any* support for it, since C99 removed what support it did have through coercion of structure pointers to their first contained type.

Quasar said:

You mean *any* support for it, since C99 removed what support it did have through coercion of structure pointers to their first contained type.

Apart from solutions that add some runtime overhead, you can get quite far with macros (with obvious drawbacks).


Along with this conversation I have to ask: Aren't there any comments in the code? I mean, my first year first semester CS131 course taught us to insert comments into the code as notes for what a section is meant to do and what-not. It just seems odd that these programs you're looking through would not have that.


The value of comments is entirely random. Sometimes they help a lot and sometimes they are crap. Commenting very simple functions is often a fantastic waste of time. I find most of the comments I include in my code are either:

  • TODO reminders because I need to come back and finish/polish something
  • Reminders of what's going on, so I don't forget why I did something or perhaps for team members
  • The reason I just did something hackish (really a special case of above)
Comments that regurgitate the contents of the function name, manage to be incorrect, or just parrot the flow of the code are worse than no comments.
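
To make the difference concrete, here is a small Python sketch. Both functions and the scoreboard scenario are invented for illustration. The first comment just parrots the code; the second records a reason and a TODO that the code alone cannot tell you.

```python
def parroting(scores):
    # Loop over scores and add each score to total.  <- restates the code
    total = 0
    for s in scores:
        total += s
    return total


def average_score(scores):
    # Empty input gives 0.0 instead of raising, because (in this made-up
    # example) the scoreboard shows the average before any game is played.
    # TODO: confirm 0.0 is really what the scoreboard should display.
    if not scores:
        return 0.0
    return sum(scores) / len(scores)
```

Delete the comment in `parroting` and nothing is lost; delete the one in `average_score` and the next reader has to guess why the special case exists.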

Generally I agree with what the other guys said. Start with something simple, and even then just start with one or two pieces of it. The net is full of little open-source tools that make good reads.

I also agree regarding C++. Too many programmers get into making some kind of language-feature porn. Makes me want to bash in their heads and mine.

Quasar said:

You mean *any* support for it, since C99 removed what support it did have through coercion of structure pointers to their first contained type.

Assuming I've interpreted your statement correctly, I'm pretty sure you're wrong.

C99 standard, section 6.7.2.1, paragraph 13 (page 103):

Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

Which makes sense. If this wasn't the case, there would be a huge amount of code (including everything ever written that depends on Gtk+ or Glib) that would be broken.

Considering that the standard explicitly states that structures should be laid out this way, it's kind of a shame that they didn't go one step further and state that it should be possible to implicitly cast them to the type of the first member. Obviously C isn't an OOP language but it would certainly make things nicer sometimes.
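
The layout rule quoted above can be poked at from Python through the standard `ctypes` module, which follows the platform C compiler's struct layout. This is only a sketch: `Base`, `Derived` and `as_base` are hypothetical names, and it demonstrates the layout and the explicit conversion, not what any particular C compiler's optimizer does with such code.

```python
import ctypes


class Base(ctypes.Structure):
    _fields_ = [("type_id", ctypes.c_int)]


class Derived(ctypes.Structure):
    # Base is the first member, so it sits at offset 0 with no padding before it.
    _fields_ = [("base", Base), ("payload", ctypes.c_double)]


def as_base(derived):
    """Explicitly convert a pointer to Derived into a pointer to its first member."""
    return ctypes.cast(ctypes.pointer(derived), ctypes.POINTER(Base)).contents
```

The conversion has to be spelled out with `cast`, much like the explicit cast needed in C; there is no implicit version.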

fraggle said:

Assuming I've interpreted your statement correctly, I'm pretty sure you're wrong.

C99 standard, section 6.7.2.1, paragraph 13 (page 103):


Which makes sense. If this wasn't the case, there would be a huge amount of code (including everything ever written that depends on Gtk+ or Glib) that would be broken.

Considering that the standard explicitly states that structures should be laid out this way, it's kind of a shame that they didn't go one step further and state that it should be possible to implicitly cast them to the type of the first member. Obviously C isn't an OOP language but it would certainly make things nicer sometimes.

Yes but now try using that pointer that you cast. Doing so breaks strict aliasing, which is another edict of the same standard. In GCC with anything greater than NO optimization, abusing aliasing of pointers results in generation of invalid binary code which does things you cannot even imagine or anticipate based on what you wrote.

During the definition process of the C99 standard, the portion of the text which would have allowed an exception for casting pointers of structures to the type of their first element *was deliberately removed* because the compiler writers wouldn't allow it. They didn't want to have to make any exceptions in their "NO CASTING POINTERS TO UNRELATED TYPES" dogma.

And yes, there *are* reams of previously working code which are now miserably broken and will never be able to compile as C99, just because of this careless and thoughtless change to semantics in the name of optimization.


Then turn on -fno-strict-aliasing. Not a biggie. I don't really understand why you think it's such a big deal to be honest.

fraggle said:

Then turn on -fno-strict-aliasing. Not a biggie. I don't really understand why you think it's such a big deal to be honest.

It's not so long as that flag's available; it just means your code isn't standards-compliant. Using -fno-strict-aliasing is not really any different than, say, using non-ANSI functions in an otherwise ANSI C program. For example compilers aren't required by the standard to support code that claims to be C99 but abuses aliasing. GCC does so precisely because it can't get away with removing such for some of the reasons you mentioned.

To me it's just better to move into a language that supports the idiom you're attempting rather than to brave the waters of non-compliance.


This mess only means that whatever small chance C99 ever had to become a success is gone forever. Since it's a fundamentally broken mess and there's tons of old C code floating around, no compiler developer can afford to drop support for old C standards. So the inevitable outcome is that most code will never migrate to C99 - especially as long as Microsoft shows no interest in supporting it.

If GCC really dropped these options the ensuing shitstorm would be enormous.

Also, just because C99 is the newer version does not mean that old standards are made obsolete by it as you seem to persistently imply.

Maes said:

I'd say you started with the wrong language.


What language would you recommend I begin to invest my time in then? So far I am looking into Scheme and Java. While Scheme seems to be a better language to learn basic programming concepts, Java has better documentation.


I would personally recommend Python. Starting completely from scratch, I'd use Python 3; you can easily learn Python 2 semantics later if needed, and Python 3 has put a lot of effort into simplifying the language, especially for newcomers.

Python has a fairly simple and mostly obvious syntax, and it will teach you basic programming paradigms that you can apply elsewhere when you decide to learn another language (like C, Java, Scheme, whatever). Not only that, but Python will remain incredibly useful outside of the learning environment. That probably doesn't sound too shocking these days, but it's a bit of a big thing: way back when, a lot of people cut their teeth on BASIC and then found that BASIC was completely inadequate for any production work, not to mention that it taught the wrong paradigms.

Simplest Python program you'll ever be able to encounter(*):

print('Hello, World!')

And only slightly more complicated:

def fib(limit):
    '''Prints out the Fibonacci numbers below limit'''
    a, b = 0, 1
    while a < limit:
        print(a)
        a, b = b, a + b

fib(100)  # fib() prints as it goes and returns None, so no print() around the call

It's simple enough that you can probably understand it already. This isn't simplified; this is a real, complete Python program.

Obviously a more useful program would be more complicated, but generally the syntax and use remains clear.

(*) OK, technically you can run a completely empty file through the Python interpreter, but this happens to work in many other languages too :P

Quasar said:

It's not so long as that flag's available; it just means your code isn't standards-compliant. Using -fno-strict-aliasing is not really any different than, say, using non-ANSI functions in an otherwise ANSI C program. For example compilers aren't required by the standard to support code that claims to be C99 but abuses aliasing. GCC does so precisely because it can't get away with removing such for some of the reasons you mentioned.

To me it's just better to move into a language that supports the idiom you're attempting rather than to brave the waters of non-compliance.

Sounds nothing short of ridiculous to me. You're talking about an abstract concept of strict adherence to a language standard, but the reality is that the compilers that exist have ways to resolve the problem that you face.

Standards and programming languages are a means to an end, not an end in themselves. You're making real-world decisions about which programming language to use based on some idealist notion of standards purity when you could easily just solve the problem with a one-line fix.

If you really want to go down this path, you can spend your entire life chasing after compliance to a standard that adds no practical benefit. A specific example: the Doom engine, as you know, has fixed_t that it uses to represent fixed-point fractional numbers. The Doom source code is littered with stuff like this:

r_draw.c:	*dest = dc_colormap[dc_translation[dc_source[frac>>FRACBITS]]];

Here's the thing: the C standard doesn't always define what that right shift actually does. fixed_t is a signed type, and the result is "implementation defined".

The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 divided by the quantity, 2 raised to the power E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.

In reality, almost every compiler does the sensible thing, which is bit shift right, filling in the top bits with 1s when it's a negative value. But they don't have to. There are almost certainly a few compilers for some architectures that fill in the top bits with 0s instead, and a compiler that does that is fully compliant with the standard.
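
The two possible outcomes can be simulated in Python for a 32-bit value. This is a sketch under assumptions: it assumes fixed_t is a 32-bit signed integer with FRACBITS equal to 16, and the helper names are invented. Python's own `>>` on a negative int is sign-preserving, so the zero-filling variant has to be emulated with a mask.

```python
FRACBITS = 16          # assumed, as in Doom's 16.16 fixed-point format
MASK32 = 0xFFFFFFFF


def to_signed32(u):
    """Reinterpret the low 32 bits of u as a signed two's-complement value."""
    u &= MASK32
    return u - (1 << 32) if u & 0x80000000 else u


def arithmetic_shift_right(value, bits):
    """Fill the top bits with copies of the sign bit (the usual behaviour)."""
    return value >> bits


def logical_shift_right(value, bits):
    """Fill the top bits with zeros (also allowed by the standard)."""
    return to_signed32((value & MASK32) >> bits)
```

For -1.5 stored as the fixed-point value -98304, the first function gives -2 and the second gives 65534, while for positive values the two agree. That gap is exactly what "implementation-defined" leaves open.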

The C standards (and presumably C++ too?) are riddled with implementation-defined and undefined behaviours. The only thing you can do is just be practical about it. I'm sure the new C99 behaviour is problematic, but the gcc authors have already identified the problem and deliberately gone out of their way to provide you a solution to it. It's reasonable to assume that authors of other C99 compilers would provide the same option, for the very reasons you cite. It doesn't make any sense at all to just abandon the entire language when there's no practical reason to do so.

chungy said:

def fib(limit):
    '''Prints out the Fibonacci numbers below limit'''
    a, b = 0, 1
    while a < limit:
        print(a)
        a, b = b, a + b

fib(100)  # fib() prints as it goes and returns None, so no print() around the call

It's simple enough that you can probably understand it already. This isn't simplified; this is a real, complete Python program.



I am very wary of languages that use indentation for structuring. How can that ever be something good, especially for beginners?


Quite simply because the indentation makes the code blocks very obvious, and honestly, you indent in the same manner you would reasonably write C or Java programs anyway. If you never indent in C or Java or combine many statements onto one line (which is actually possible in Python by using ; to terminate statements... but this should be used cautiously), you're probably doing it wrong anyway.

Also Python automatically gets rid of the problem of forgetting to put a { or } or having an extra one.
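
A quick way to see this for yourself is to ask Python whether a snippet even compiles. The sketch below uses only the built-in `compile` function; `compiles` and the three sample strings are names made up for the example.

```python
def compiles(source):
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


GOOD = "if True:\n    x = 1\n"   # body indented: a proper block
BAD = "if True:\nx = 1\n"        # body not indented: rejected up front
ONE_LINE = "a = 1; b = 2\n"      # semicolons separating statements are legal
```

`compiles(GOOD)` and `compiles(ONE_LINE)` are true, while `compiles(BAD)` is false: a block that is missing its indentation is refused before anything runs, where a missing brace in C can produce an error message far away from the actual mistake.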

Graf Zahl said:

I am very wary of languages that use indentation for structuring. How can that ever be something good, especially for beginners?

Yeah, the syntactically significant indentation thing is the classic objection that experienced programmers always have when encountering Python for the first time. I think I was turned off by it at first as well. It takes time to get used to, but not a lot of time, and after a while you just forget about it, to be honest. It's not as big a deal as you might think.

I'm reminded of Eric Raymond's story about when he first encountered Python:

I immediately tripped over the first odd feature of Python that everyone notices: the fact that whitespace (indentation) is actually significant in the language syntax. The language has no analog of the C and Perl brace syntax; instead, changes in indentation delimit statement groups. And, like most hackers on first realizing this fact, I recoiled in reflexive disgust.

I am just barely old enough to have programmed in batch FORTRAN for a few months back in the 1970s. Most hackers aren't these days, but somehow our culture seems to have retained a pretty accurate folk memory of how nasty those old-style fixed-field languages were. Indeed, the term “free format”, used back then to describe the newer style of token-oriented syntax in Pascal and C, has almost been forgotten; all languages have been designed that way for decades now. Or almost all, anyway. It's hard to blame anyone, on seeing this Python feature, for initially reacting as though they had unexpectedly stepped in a steaming pile of dinosaur dung.


That was his initial reaction. He goes on to describe how "oddly enough, Python's use of whitespace stopped feeling unnatural after about twenty minutes" and he was subsequently blown away by the sheer usefulness of the language.


I'm wary of that word "old", as well as possibly having been included with Blender? You may not have IDLE or documentation in that case.

Python is not a large download, and you can have multiple versions living side-by-side if you wanted to. Make life a bit easier and just go get it from http://python.org/download/

flubbernugget said:

What language would you recommend I begin to invest my time in then? So far I am looking into Scheme and Java. While Scheme seems to be a better language to learn basic programming concepts, Java has better documentation.


Between this specific pair of languages, Java, without so much as a shadow of a doubt. Java has better documentation, wider application, and heaps of code examples. Scheme is too much of a niche language, while Java is a good introduction to the "curly brackets" language family, easy to learn, widespread, flexible, and pretty much a "must" on any modern programmer's CV. It opens doors to (among others) mobile development, web development, and database programming, and is a great stepping stone for C#, C, and even C++ later on.

Python is good if you want to add an interpreted language which is actually useful to your skillset, and which is also easy to learn.

But C++...is really putting the cart before the horse.


I know little about Python 3, but if using 2.x, I recommend using DrPython instead of IDLE to avoid all the firewall problems with "sockets" that IDLE seems to have. I'm trying to make an animation interpolation-ish program (sort of like how Flash interpolates 'in between' frames) using OOP for the first time, and I basically have to just sit and think with my eyes closed for hours until I smell brain smoke before I have the slightest clue of where to even begin. I want to make a Punch-Out clone eventually.
You might try the pygame library (what I've been using) since there's lots of source for simple games on the pygame site. Almost all use OOP though.

Graf Zahl said:

This mess only means that whatever small chance C99 ever had to become a success is gone forever. Since it's a fundamentally broken mess and there's tons of old C code floating around, no compiler developer can afford to drop support for old C standards. So the inevitable outcome is that most code will never migrate to C99 - especially as long as Microsoft shows no interest in supporting it.

If GCC really dropped these options the ensuing shitstorm would be enormous.

Also, just because C99 is the newer version does not mean that old standards are made obsolete by it as you seem to persistently imply.

They aren't but the old standards lack things which are useful, and even vital, for programming in a modern environment, such as fixed-size types as one very good example. Others would include built-in support for inlining, 64-bit integers, and more secure library functions such as snprintf. Most of these were available in various C90 implementations but require a maze of #ifdefs and universally included header files to enable effective portable use.
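
The fixed-size-types point is easy to check from Python, since `ctypes` exposes the platform's C types. A small sketch (the function name `type_sizes` is made up): the size of a plain `long` depends on the platform, while the fixed-size types are the same everywhere.

```python
import ctypes


def type_sizes():
    """Report the size in bytes of a few C integer types on this platform."""
    return {
        "long": ctypes.sizeof(ctypes.c_long),      # typically 4 or 8, platform-dependent
        "int32_t": ctypes.sizeof(ctypes.c_int32),  # always 4
        "int64_t": ctypes.sizeof(ctypes.c_int64),  # always 8
    }
```

Run it on 64-bit Windows and 64-bit Linux and the "long" entry will usually differ, which is the kind of thing the #ifdef mazes were papering over.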

I am happily at work currently, ditching all of that crap out of the Eternity codebase.

