Kaiser

OpenGL and render optimizations


One thing I am curious about with GL ports: what are the common practices for rendering a scene in the most efficient and optimized way?

The reason I ask is that I am currently looking into ways to further optimize the scene render time (on a very minimal-spec PC) in Doom64ex, which uses SDL+OpenGL.

The biggest bottlenecks, from what I have debugged so far, are traversing the BSP tree, clipping BSP nodes that are not within view, and rendering the segs/subsectors. The clipping code is based on Tim Stump's clipping algorithm, which seems to be commonly used in other ports.

I considered looking into vertex arrays, which ZdoomGL happens to use as well, but after I implemented them in the engine I was getting the same results; in fact, the FPS and render time in R_DoRenderPlayerView were actually slightly worse than before, which doesn't seem right, even if I don't bother texturing anything.

So, for those of you with more experience with OpenGL ports: how do you tackle this sort of issue?

DaniJ

Most current OpenGL ports render almost entirely in immediate mode, constructing geometry and doing lighting calculations (etc.) every frame while traversing the BSP.

In general, it's not the actual rendering that is the bottleneck; rather, the port can't issue those GL commands fast enough.

This sounds like the problem you allude to, but then you mention you are using vertex arrays. Are you populating those every frame (i.e., doing exactly the same process as I outlined above, but rather than sending a stream of GL commands you instead write to the vertex arrays and then send those as batch jobs)?

In that case, you aren't really gaining much by simply reducing the number of GL function calls per frame. Certainly it will help, but in general this is not an optimization that leads to order-of-magnitude performance improvements.
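To make the batching idea concrete, here is a minimal C sketch of packing wall quads into one interleaved client-side array so a frame's worth of geometry can be submitted in a single glDrawArrays call. The vertex layout, buffer size, and function names are illustrative assumptions, not Doom64ex code:

```c
#include <assert.h>
#include <string.h>

/* Interleaved vertex: position + texture coordinate.
 * This layout is an assumption for illustration. */
typedef struct {
    float x, y, z;  /* position */
    float u, v;     /* texture coordinate */
} vtx_t;

#define MAX_VERTS 4096

static vtx_t drawlist[MAX_VERTS];
static int numverts = 0;

/* Append one quad (4 vertices) to the batch; returns 0 on overflow. */
static int AddQuad(const vtx_t quad[4])
{
    if (numverts + 4 > MAX_VERTS)
        return 0;
    memcpy(&drawlist[numverts], quad, sizeof(vtx_t) * 4);
    numverts += 4;
    return 1;
}

/* Submit the whole batch with one draw call, then reset.
 * The GL calls are shown as comments so the sketch stays self-contained:
 *
 *   glVertexPointer(3, GL_FLOAT, sizeof(vtx_t), &drawlist[0].x);
 *   glTexCoordPointer(2, GL_FLOAT, sizeof(vtx_t), &drawlist[0].u);
 *   glDrawArrays(GL_QUADS, 0, numverts);
 */
static void FlushBatch(void)
{
    numverts = 0;
}
```

As DaniJ notes, this only reduces call overhead; if the array is refilled from scratch every frame, the per-frame construction cost remains.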

What kind of texturing are you doing? How many passes are you making? Which blending modes are you using? How, when, and which buffers are you clearing? Are you using alpha/depth/whatever testing when it's not really necessary?

That's the kind of thing that will gain you a more noticeable improvement.

However, as long as you are constructing geometry and doing lighting calculations every frame, you'll still struggle to feed OpenGL commands fast enough. Therefore, the goal should be to preprocess as much of this information as possible and reuse it across multiple render frames.
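A minimal sketch of that preprocess-and-reuse idea, with purely illustrative names and sizes (nothing here is from an actual port): each subsector caches its computed geometry and light, and only recomputes when a dirty flag says something changed:

```c
#include <assert.h>

/* Cached per-subsector render data; rebuilt only when flagged dirty
 * (sector moved, light level changed, etc.). */
typedef struct {
    int   dirty;        /* set when sector geometry/light changes */
    int   numverts;
    float verts[32][3]; /* cached world-space positions */
    float light;        /* cached light level, 0..1 */
} sub_cache_t;

static int rebuilds = 0;  /* counts how often we actually recompute */

static void RebuildSubsector(sub_cache_t *c)
{
    /* ...recompute verts[] and light from map data here... */
    rebuilds++;
    c->dirty = 0;
}

static void DrawSubsector(sub_cache_t *c)
{
    if (c->dirty)
        RebuildSubsector(c);  /* pay the construction cost only when needed */
    /* submit c->verts with glDrawArrays or similar */
}
```

Drawing the same subsector across many frames then costs one rebuild total, rather than one per frame.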

Kaiser

DaniJ said:

This sounds like the problem you allude to, but then you mention you are using vertex arrays. Are you populating those every frame (i.e., doing exactly the same process as I outlined above, but rather than sending a stream of GL commands you instead write to the vertex arrays and then send those as batch jobs)?


If a seg or subsector isn't included in the array list, I draw it as normal (with standard GL function calls) and then create an array afterwards based on what was drawn, so when that seg/subsector needs to be drawn again it'll call glDrawArrays instead, because it's been added to the array list.

DaniJ said:

What kind of texturing are you doing? How many passes are you making? Which blending modes are you using? How, when, and which buffers are you clearing? Are you using alpha/depth/whatever testing when it's not really necessary?


glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); is called before R_DoRenderPlayerView, which of course is the main rendering loop function.

Only one pass is done per texture, unless you consider multitexturing to be multiple passes; regardless, the primitives are only drawn once per loop.

The standard combine mode used for textures is primary RGB * texture RGB, while for glowing lights I add a constant alpha value to brighten the texture's RGB.

I turn off depth testing when drawing 2D objects and enable alpha testing when drawing walls/ceilings/floors with masked textures.
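For reference, "primary RGB * texture RGB" corresponds to OpenGL's GL_MODULATE texture environment. Here is a hedged CPU-side illustration of that math (the glow effect is approximated as a simple clamped constant addition; actual fixed-function combiner setup will differ), with 0..255 components:

```c
#include <assert.h>

/* Modulate: fragment color = primary * texel, per component.
 * This is what GL_MODULATE computes in fixed-function texturing. */
static unsigned char Modulate(unsigned char primary, unsigned char texel)
{
    return (unsigned char)((primary * texel) / 255);
}

/* Glow: add a constant and clamp, roughly what an additive
 * constant-color combine stage does. */
static unsigned char AddGlow(unsigned char c, unsigned char glow)
{
    int v = c + glow;
    return (unsigned char)(v > 255 ? 255 : v);  /* clamp to 255 like GL */
}
```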

DaniJ

Kaiser said:

If a seg or subsector isn't included in the array list, I draw it as normal (with standard GL function calls) and then create an array afterwards based on what was drawn, so when that seg/subsector needs to be drawn again it'll call glDrawArrays instead, because it's been added to the array list.

We'd need to know more about when and how frequently you are replacing previously drawn elements in said arrays. Naturally, the ideal situation would be to only update geometry when it moves, texture coords when the texture moves and lighting values when they change. If possible, try a more fine-grained update method where you only recalculate and update what has changed.

glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); is called before R_DoRenderPlayerView, which of course is the main rendering loop function.

I don't know enough about your renderer to say for sure but in general you don't need to clear the color buffer every frame so long as you have other rendered surfaces covering the view.

Only one pass is done per texture, unless you consider multitexturing to be multiple passes; regardless, the primitives are only drawn once per loop.

How many texturing passes are you making? It could be that your renderer is fill-limited.

Ajapted

Things may be different now, but way back when I was working on EDGE's OpenGL renderer I got a big speed increase by minimising TEXTURE changes. Code-wise this entailed storing primitives in an array and sorting them by texture id before sending them to the GL.
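A minimal sketch of that texture-sorting approach: buffer primitives during traversal, sort by texture id with qsort, then draw, so glBindTexture is called once per distinct texture instead of once per primitive. The prim_t structure is an illustrative stand-in, not EDGE's actual code:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int tex_id;   /* GL texture object to bind */
    int first;    /* index of first vertex in the vertex array */
    int count;    /* number of vertices */
} prim_t;

/* qsort comparator: order primitives by texture id. */
static int CmpByTexture(const void *a, const void *b)
{
    return ((const prim_t *)a)->tex_id - ((const prim_t *)b)->tex_id;
}

/* Draw the list, binding only when the texture actually changes.
 * Returns the number of binds, so the saving is measurable. */
static int DrawPrims(const prim_t *prims, int n)
{
    int binds = 0, last = -1;
    for (int i = 0; i < n; i++) {
        if (prims[i].tex_id != last) {
            /* glBindTexture(GL_TEXTURE_2D, prims[i].tex_id); */
            last = prims[i].tex_id;
            binds++;
        }
        /* glDrawArrays(GL_TRIANGLE_FAN, prims[i].first, prims[i].count); */
    }
    return binds;
}
```

Sorting five primitives with textures {3, 1, 3, 1, 2} drops the bind count from 5 to 3; on real maps with many segs per texture the reduction is far larger.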


Minimizing OpenGL state changes is always a good idea. Especially changes to the current texture unit configuration.

Kaiser

DaniJ said:

We'd need to know more about when and how frequently you are replacing previously drawn elements in said arrays. Naturally, the ideal situation would be to only update geometry when it moves, texture coords when the texture moves and lighting values when they change. If possible, try a more fine-grained update method where you only recalculate and update what has changed.
I don't know enough about your renderer to say for sure but in general you don't need to clear the color buffer every frame so long as you have other rendered surfaces covering the view.
How many texturing passes are you making? It could be that your renderer is fill-limited.


Only once; once the seg/subsector has been drawn for the first time, it's added to the array, where it will be drawn as a vertex array from then on. The array is then updated only when needed.

The reason for this setup is that it lets me copy the vertex data used to create the drawn primitive into the vertex array after it has been drawn for the first time.

For texture passes, I only draw once, but for glowing/water effects multitexturing is involved.

entryway

Ajapted said:

Things may be different now, but way back when I was working on EDGE's OpenGL renderer I got a big speed increase by minimising TEXTURE changes. Code-wise this entailed storing primitives in an array and sorting them by texture id before sending them to the GL.

Results for some levels on outdated and modern hardware:
http://www.doomworld.com/vb/post/773882
http://www.doomworld.com/vb/post/773896

Kaiser

entryway said:

Results for some levels on outdated and modern hardware:
http://www.doomworld.com/vb/post/773882
http://www.doomworld.com/vb/post/773896


If I am reading those correctly, there isn't a huge improvement; only a ~1-2 fps gain.

The problem appears to be texture binding and the lighting calculation. If I can sort out how textures are being bound and precache the lighting data, I can probably gain back performance.

entryway

Kaiser said:

lighting calculation

What is "lighting calculation"?

That?

float gld_CalcLightLevel_glboom(int lightlevel)
{
  return lighttable[usegamma][lightlevel];
}

Kaiser

entryway said:

What is "lighting calculation"?

That?

float gld_CalcLightLevel_glboom(int lightlevel)
{
  return lighttable[usegamma][lightlevel];
}


The lighting in Doom64 is very different from Doom 1/2: it is computed per seg vertex rather than per sector, and involves more color variations than the standard light gradients in Doom 1/2.

More calculations are involved.
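One way the "precache the lighting data" idea mentioned earlier could apply to per-vertex RGB lighting: build a scale table once at startup, so lighting each vertex costs three table lookups instead of three multiply/divides. All names and sizes here are illustrative assumptions, not Doom64ex's actual code:

```c
#include <assert.h>

typedef struct { unsigned char r, g, b; } rgb_t;

/* Precomputed scale table: lightscale[light][component] = component * light / 255.
 * 256*256 bytes = 64 KB, built once at startup. */
static unsigned char lightscale[256][256];

static void InitLightTable(void)
{
    for (int l = 0; l < 256; l++)
        for (int c = 0; c < 256; c++)
            lightscale[l][c] = (unsigned char)(l * c / 255);
}

/* Apply a 0..255 light level to one seg-vertex color via lookups;
 * GL then interpolates the resulting per-vertex colors (Gouraud). */
static rgb_t LightVertex(rgb_t c, unsigned char light)
{
    rgb_t out = { lightscale[light][c.r],
                  lightscale[light][c.g],
                  lightscale[light][c.b] };
    return out;
}
```

This is the same spirit as GLBoom's `lighttable` lookup shown above, extended to three color components.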

rpeter

Why don't you just skip steps to see the effect? Measure the duration of a single frame, then omit something like rendering flats, the skydome, or the light calculation and see what changes. I don't know what you call outdated hardware, but a GF6600GT made almost no difference whether I shoved commands and polygons at it or commented those lines out. Inefficient storage of polygons (I used linked lists instead of arrays), however, made even the automap choppy with large and detailed wads.

Anyway, do you render the scene while traversing the bsp (like GLBoom) or traverse the bsp, collect what needs rendering and render them as a separate step (like GZDoom)?
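The step-skipping measurement suggested above could be sketched like this, with a bitmask to toggle individual render stages on and off between timed frames. Stage names are illustrative, and `clock()` is used only to keep the sketch portable; a real profiler or a high-resolution timer would give better numbers:

```c
#include <time.h>

/* Toggleable render stages; omit one at a time to see where the time goes. */
enum {
    STAGE_WALLS = 1,
    STAGE_FLATS = 2,
    STAGE_SKY   = 4,
    STAGE_LIGHT = 8
};

/* Render one frame with only the enabled stages and return elapsed ms. */
static double FrameMillis(int stages)
{
    clock_t start = clock();
    if (stages & STAGE_WALLS)  { /* render walls here */ }
    if (stages & STAGE_FLATS)  { /* render flats here */ }
    if (stages & STAGE_SKY)    { /* render skydome here */ }
    if (stages & STAGE_LIGHT)  { /* run light calculations here */ }
    return (double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC;
}
```

Comparing, say, `FrameMillis(~0)` against `FrameMillis(~0 & ~STAGE_LIGHT)` isolates the cost of lighting directly.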

rpeter said:

Anyway, do you render the scene while traversing the bsp (like GLBoom) or traverse the bsp, collect what needs rendering and render them as a separate step (like GZDoom)?

What?

GLBoom/+ collects all polygons from the scene in internal structures and renders them in gld_DrawScene().

Kaiser

rpeter said:

Why don't you just skip steps to see the effect? Measure the duration of a single frame, then omit something like rendering flats, the skydome, or the light calculation and see what changes.


I've already got stuff like that set up for debugging. The majority of the render time is definitely being spent on lighting; everything else is BSP traversal. Oddly enough, texture coordinate calculations have very little impact on the rendering cost.

rpeter said:

Anyway, do you render the scene while traversing the bsp (like GLBoom) or traverse the bsp, collect what needs rendering and render them as a separate step (like GZDoom)?


Yeah, I draw the polygons as the renderer goes through the BSP, though I am considering going down the other route: collecting the data first and then rendering everything.
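The collect-then-render split being discussed can be sketched in two phases: BSP traversal only appends visible segs to a list, and a second pass (cf. GLBoom's gld_DrawScene) draws the whole list, which also makes sorting by texture or translucency straightforward. The seg_t fields and list size are illustrative assumptions:

```c
#include <assert.h>

#define MAX_SEGS 1024

typedef struct {
    int id;       /* which seg this is */
    int tex_id;   /* texture to bind when drawing it */
} seg_t;

static seg_t visible[MAX_SEGS];
static int numvisible = 0;

/* Phase 1: called during BSP traversal for each seg that passes clipping.
 * No GL calls happen here; we only record what needs drawing. */
static void CollectSeg(const seg_t *s)
{
    if (numvisible < MAX_SEGS)
        visible[numvisible++] = *s;
}

/* Phase 2: draw everything collected this frame, then reset the list.
 * Returns the number of segs drawn. */
static int DrawScene(void)
{
    int drawn = numvisible;
    /* sort visible[] by tex_id, bind textures, issue glDrawArrays here */
    numvisible = 0;
    return drawn;
}
```

Separating the phases means traversal cost and draw-submission cost can also be measured independently, which ties back to the profiling suggestion above.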
