Doomworld Forums > Classic Doom > Source Ports > Some port questions

All times are GMT.
Justince
Junior Member


Posts: 151
Registered: 02-11


Though I could probably find some of this info myself, I thought I'd check with the code monkeys first.

So, are any active ports supporting multithreading?

Next, in your opinion which port would be ideal for forking to support a multi-core embedded RISC system?

Finally, is that Jaguar Doom source still hanging around?

Thanks dudes.

Old Post 07-21-12 19:04 #
Gez
Why don't I have a custom title by now?!


Posts: 11399
Registered: 07-07


Yeah, Jaguar Doom source is still hanging around somewhere. For example, you can find it here.

As for multithreading: many modern ports use it to some extent. For example, ZDoom uses a separate thread for the sound system, and Maes experimented with parallelizing certain rendering tasks in Mocha Doom. But multithreading the game logic itself would be difficult, because the way it was designed in 1993 relies too heavily on everything running in a single thread. Just look at all the global variables that are "shared" by many functions... And unlike the audio and video output, you can't just replace the game logic wholesale and hope it'll be good enough.

Last edited by Gez on 07-21-12 at 19:23

Old Post 07-21-12 19:15 #
Justince
Junior Member


Posts: 151
Registered: 02-11


Oh, thanks. My next question was going to be which routines I should look at for multicore optimization. It sounds like that may be a fair bit of work, but I'm not opposed to rewriting large sections of the game.

Old Post 07-21-12 19:31 #
Maes
I like big butts!


Posts: 12763
Registered: 07-06


The quick and easy answer is to try optimizing the renderer, since it's one of the most taxing subsystems and can be decoupled from the game logic to a large extent. The benefit will depend a lot on what is being rendered, though, and on whether it's the map objects (monsters) or the geometry that's taxing.

In my own tests, a map like nuts.wad runs at about 60 fps "normally" in Mocha Doom at 1280 x 900 resolution, using the standard single-threaded renderer. If I disable rendering completely, I get at most 90 fps or so. This means that even if you manage an infinite speedup of the renderer's code, for that particular map you won't get more than a 50% speed benefit (60 to 90 fps), since everything outside the renderer still has to run.

By using 2-4 threads and parallelizing the drawing of flats, walls and sprites, for that particular map again, I managed to get between 65 and 75 FPS, which seems a reasonable speedup considering the theoretical maximum would be 90 with zero rendering overhead.
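
As a rough sanity check on those figures, here is a small Amdahl-style model in Python. It is only a sketch: the function name `predicted_fps` is made up for illustration, and it assumes the frame time splits cleanly into "renderer" and "everything else" and that the renderer's share scales perfectly with the thread count, which a real renderer won't quite do.

```python
def predicted_fps(fps_total, fps_no_render, threads):
    """Ideal frame rate if only the renderer's share of the frame is split across threads."""
    frame_time = 1.0 / fps_total            # whole frame, single-threaded renderer
    other_time = 1.0 / fps_no_render        # everything except rendering
    render_time = frame_time - other_time   # the renderer's share of the frame
    return 1.0 / (other_time + render_time / threads)


# The 60 and 90 fps figures are the nuts.wad measurements quoted above.
for n in (1, 2, 4):
    print(n, round(predicted_fps(60, 90, n), 1))
```

With 60 fps total and 90 fps without rendering, the renderer accounts for one third of the frame, so the ideal result is about 72 fps with 2 threads and 80 fps with 4. The measured 65 to 75 fps sits a bit below that, which is what you'd expect once thread overhead and imperfect load balancing are included.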

But nuts.wad is an anomalous situation: very little geometry and too many monsters. Other maps with the opposite situation (e.g. a barren map with very complex geometry) might benefit more, while some will be average or equally complex on both fronts.

The biggest bottleneck after rendering is the game logic, which as others said, is inherently serial. You can't parallelize it without losing causality and repeatability of actions (so you essentially sacrifice the ability to play back demos, and even RECORD demos with your own port, since actions won't be repeatable unless you use a single thread). If you introduce thread locks and mutexes to try and preserve causality and demo compatibility, you will certainly lose any benefit from multithreading, and the execution will degenerate to a single thread.
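
To illustrate why ordering matters for demos: Doom's gameplay code draws its pseudo-random numbers from a fixed table through a single shared index, so which actor gets which number depends on the order the actors run in. The Python sketch below uses a made-up four-entry table and hypothetical names (`Rng`, `run_tic`); it is not the engine's code, just the same idea in miniature.

```python
class Rng:
    """Table-driven generator with one shared index (the table here is made up)."""
    TABLE = [8, 109, 220, 222]

    def __init__(self):
        self.index = 0

    def next(self):
        value = self.TABLE[self.index % len(self.TABLE)]
        self.index += 1
        return value


def run_tic(actors):
    """Run each actor once, in the given order; every actor draws one number."""
    rng = Rng()
    return {name: rng.next() for name in actors}


serial = run_tic(["imp", "demon"])    # fixed order: what a recorded demo relies on
swapped = run_tic(["demon", "imp"])   # an order a thread scheduler might produce
print(serial, swapped)
```

The same two actors end up with different numbers depending on who runs first, so any scheduling that doesn't reproduce the original order makes the game state diverge from the recording.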

Last edited by Maes on 07-21-12 at 20:53

Old Post 07-21-12 20:47 #
printz
CRAZY DUMB ZEALOT


Posts: 8894
Registered: 06-06


I find multithreading most needed when something in the game triggers a new sound effect, causing a split-second freeze in the gameplay. Some of the modern ports should implement it.

__________________
Automatic Wolfenstein - Version 1.0 - also on Android

Old Post 07-21-12 21:37 #
Maes
I like big butts!


Posts: 12763
Registered: 07-06


Decoupling video & audio processing from gameplay processing should be a given, and to a certain extent is possible even without a multithreading programming model (e.g. with interrupts), but even if you have exactly zero A/V processing latency, you will still have a lot to do.

And starting a new thread for every new sound effect... not really a good idea, given the way mixing engines work and the overhead of continuously creating and destroying threads. Certain sound systems (e.g. Java's) do effectively work this way, however: each time you play a single SoundClip, a temporary AudioLine object with its own dedicated thread is created. That's not terribly efficient, especially with lots of short, overlapping sound effects.

It's much more efficient to create a single "freewheeling" AudioLine that never shuts down, and to feed it through a dedicated mixing routine (which can also run on its own thread).
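
A minimal sketch of that single-mixer arrangement, in Python rather than Java and with Python's `threading` and `queue` modules standing in for the audio line. The `Mixer` class, its method names and the plain list used as the "output device" are all hypothetical, and the mixing is a bare sum with no clamping or volume handling.

```python
import queue
import threading


class Mixer:
    """One long-lived mixing thread; sounds are queued rather than given threads."""

    def __init__(self, chunk=4):
        self.chunk = chunk              # frames mixed per pass
        self.pending = queue.Queue()    # sounds waiting to start
        self.output = []                # stands in for the audio device
        self.thread = threading.Thread(target=self._run)
        self.thread.start()

    def play(self, samples):
        self.pending.put(list(samples))

    def close(self):
        self.pending.put(None)          # sentinel: finish what is playing, then stop
        self.thread.join()

    def _run(self):
        active, closing = [], False
        while active or not closing:
            try:
                # Sleep while idle; only poll when something is already playing.
                item = self.pending.get(block=not active)
                if item is None:
                    closing = True
                else:
                    active.append(item)
            except queue.Empty:
                pass
            if active:
                self._mix_chunk(active)

    def _mix_chunk(self, active):
        for _ in range(self.chunk):
            # Sum one sample from every sound that still has data (no clamping here).
            self.output.append(sum(s.pop(0) for s in active if s))
        active[:] = [s for s in active if s]


mixer = Mixer()
mixer.play([1, 2, 3])
mixer.play([10, 20, 30, 40, 50])
mixer.close()
print(mixer.output)
```

However many sounds are played, only the one mixer thread ever exists. Whether the two sounds overlap in the output depends on timing, but every sample is mixed exactly once.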

Old Post 07-21-12 22:51 #
hex11
Senior Member


Posts: 2237
Registered: 09-09


LinuxDoom had its own separate sndserver process that stayed running, and that worked pretty well even on a 486DX/33 machine. Maybe there was some extra latency compared to the DOS version, but I couldn't tell the difference. They must have done standard IPC via pipe and signal (I think that code was part of the doomsrc release, so check it).

Old Post 07-21-12 23:38 #
Ladna
Member


Posts: 309
Registered: 04-10


Just guessing here, but the pause is likely due to the sound not being cached by the engine. In EE, for example, you can configure it to precache sound effects, and that way you don't get pauses (or the disk icon, if you've enabled it).

Old Post 07-23-12 21:54 #
printz
CRAZY DUMB ZEALOT


Posts: 8894
Registered: 06-06



Ladna said:
Just guessing here, but the pause is likely due to the sound not being cached by the engine. In EE, for example, you can configure it to precache sound effects and you don't get pauses (and the disk icon if you've enabled it) that way.

Yeah, but that would increase the loading time at program startup.


Old Post 07-23-12 21:55 #
Maes
I like big butts!


Posts: 12763
Registered: 07-06


Even vanilla Doom precaches sounds during soundsystem initialization. Imagine if it didn't.

Old Post 07-23-12 21:58 #
Ladna
Member


Posts: 309
Registered: 04-10


Yeah it does increase startup time, so it's definitely a preference thing.

Re: vanilla, I imagine it *tries* to cache them in the zone, but rarely used ones get purged pretty quickly. Again, just a guess here.

Old Post 07-24-12 01:15 #
wesleyjohnson
Senior Member


Posts: 1056
Registered: 04-09


When using an older port, try giving it a huge memory allocation, to avoid losing stuff from the cache. Older ports cannot grow their memory allocation, so they purge stuff instead. The default varies but could be around 8 MB or less. I think most levels can be contained in 40 MB.
> Doom -mb 512

DoomLegacy has a separate sound process when compiled for Linux X-windows. I think this was inherited from linuxdoom, but I'm not sure.

Old Post 07-24-12 03:52 #
Justince
Junior Member


Posts: 151
Registered: 02-11


Thanks for the info, guys. It looks like the easiest way to get these two processors to get along is going to be the interrupt method. The two chips have to share a bus, and only one of them can directly access memory at a time, as far as I can tell. I'll keep researching and get back to it. The audio is going to be a nightmare, I can tell already.

Old Post 07-24-12 20:20 #
Maes
I like big butts!


Posts: 12763
Registered: 07-06



Justince said:
Thanks for the info guys, looks like the easiest way to get these two processors to get along is going to be the interrupt method.


If you are simply trying to "dedicate" one processor to audio or I/O while the other one runs logic/rendering, sure. That's easy, but won't really boost much.

If you want to try multithreaded rendering, you can look up the Mocha Doom source code to see some already implemented methods, though prDoom also has one. In essence, performance depends a lot on what you consider as the minimum rendering unit/instruction to build a pipeline around.

I have tried both column-based parallelization (minimum unit is a wall or sprite column) as well as seg-based (minimum unit is an entire wall seg, which results in drawing multiple columns).

Both methods have advantages and disadvantages: column-based is really simple to implement and easy to balance (N threads, each gets 1/Nth of the total columns to render), but it requires dynamic memory allocation (the actual number of columns to render is heavily variable) and some overhead for storing the column pipeline. It doesn't scale very well to very high resolutions or complex architecture.

Seg-based is more complicated, especially when it comes to splitting work between multiple threads: some walls will be drawn by different threads, and it's hard to ensure that all threads get an equal amount of work. But -in theory- it should have an advantage with complex architecture, as the number of walls actually visible is often much lower than the number of individual columns, so there is less overhead.
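
The balancing trade-off can be seen with a toy example. This is a Python sketch with made-up seg widths and a hypothetical helper `split_evenly`; it only counts how many columns each thread would end up drawing under a naive contiguous split, and does no actual rendering.

```python
def split_evenly(jobs, n):
    """Cut a job list into n contiguous parts whose sizes differ by at most one."""
    base, extra = divmod(len(jobs), n)
    parts, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        parts.append(jobs[start:start + size])
        start += size
    return parts


# Hypothetical visible wall segs as (first_column, width) on a 640-wide screen.
segs = [(0, 500), (500, 20), (520, 20), (540, 100)]
threads = 2

# Column-based: the unit of work is a single column.
columns = [x for first, width in segs for x in range(first, first + width)]
column_load = [len(part) for part in split_evenly(columns, threads)]

# Seg-based: the unit of work is a whole seg; load is still counted in columns drawn.
seg_load = [sum(width for _, width in part) for part in split_evenly(segs, threads)]

print(column_load, seg_load)   # [320, 320] [520, 120]
```

The column split is perfectly balanced but needs a 640-entry job list, while the seg split needs only 4 entries and leaves one thread with more than four times the work of the other. A smarter seg scheduler (for example, handing out segs from a shared list as threads become free) would narrow that gap at the cost of more synchronization.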

Flats are a special case, more similar to how segs work. Sprites can be parallelized either by column or by sprite, but can only be rendered in parallel after they have been sorted. Sprite sorting itself can be parallelized, if you have an efficient sorter with little start/stop overhead. Rendering sprites in parallel using individual sprites as the base unit has the same work-splitting considerations as seg-based wall drawing. My approach? With N threads, each thread draws only the part of each sprite that falls inside its own 1/Nth portion of the screen. Most sprites land entirely inside one portion; a large one (e.g. a pinky in your face) is drawn partially by more than one thread, with no overdraw.
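
The screen-portion approach for sprites boils down to clipping each sprite's column range against each thread's strip. A Python sketch, assuming vertical strips of equal width and using hypothetical helper names (`strips`, `clip_sprite`) and made-up numbers:

```python
def strips(width, n):
    """Inclusive (first, last) column ranges, one vertical strip per thread."""
    edges = [width * i // n for i in range(n + 1)]
    return [(edges[i], edges[i + 1] - 1) for i in range(n)]


def clip_sprite(x1, x2, strip):
    """Part of sprite columns x1..x2 that falls inside the strip, or None."""
    first, last = max(x1, strip[0]), min(x2, strip[1])
    return (first, last) if first <= last else None


# A wide sprite (hypothetical numbers) covering columns 100..500 on a 640-wide screen.
work = [clip_sprite(100, 500, s) for s in strips(640, 4)]
print(work)   # [(100, 159), (160, 319), (320, 479), (480, 500)]
```

Each column of the sprite is assigned to exactly one thread, so there is no overdraw and no locking on the framebuffer. A small sprite spanning columns 200..250 would come out as `None` for every strip except the second.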

I don't know what programming model you're going to use, though. The best thing, if you have really low-level control, would be to come up with customized SMP primitives with less overhead than threads (the 32X port also had twin CPUs, so maybe you should look at its source code). It depends on the OS you'll be using, though.


Another idea is to have one core run the game logic, and at the end of each tic, copy the state of various objects to a temporary memory location, and immediately start computing the next tic. The other core will start rendering the frame that represents that saved state. If you assume that rendering takes as long as running the logic for each tic, this method should in theory give up to 100% speedup with 2 cores.

Furthermore, the renderer itself can still be internally parallelized in order to run a bit faster, but total tic running time will be dominated by the slowest of the two (needless to say, they should be synchronized at the end of the tic).
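
Here is a minimal sketch of that logic/render pipeline, using Python threads in place of the two cores. Everything in it is hypothetical (the `run_pipeline` name, the two-field game state, the strings standing in for rendered frames); the point is only the hand-off of a copied state at the end of each tic.

```python
import copy
import queue
import threading


def run_pipeline(num_tics):
    """Logic runs on the calling thread; a second thread renders copied snapshots."""
    snapshots = queue.Queue(maxsize=1)   # at most one snapshot waiting to be rendered
    rendered = []

    def renderer():
        while True:
            snap = snapshots.get()
            if snap is None:             # sentinel: no more frames
                break
            rendered.append(f"tic {snap['tic']}: player at x={snap['player_x']}")

    thread = threading.Thread(target=renderer)
    thread.start()

    state = {"tic": 0, "player_x": 0}
    for _ in range(num_tics):
        state["tic"] += 1                # stand-in for running one tic of game logic
        state["player_x"] += 2
        # Copy the state so the renderer never sees a half-updated tic.
        # put() blocks while a snapshot is still waiting, which keeps both sides in step.
        snapshots.put(copy.deepcopy(state))

    snapshots.put(None)
    thread.join()
    return rendered


print(run_pipeline(3))
```

If one tic of logic takes time L and one frame takes time R, running them back to back costs L + R per tic, while the pipeline costs roughly max(L, R) plus the copy. That ratio reaches 2 only when L and R are equal, which is where the "up to 100%" figure comes from.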

Old Post 07-25-12 19:46 #
Justince
Junior Member


Posts: 151
Registered: 02-11


Thanks, Maes, for that info; you seem quite versed in the subject. I had no idea the 32X source was available, and the SuperH family of processors is my target platform! So that source would be awesome to pore over. A lot more research will be needed on these various systems before I can come up with a good approach.

Old Post 07-26-12 20:37 #