Kaiser

OpenGL and render optimizations


One thing I am curious about with GL ports: what are the common practices for rendering a scene in the most efficient and optimized way?

The reason I ask is that I am currently looking into ways to further optimize the scene render time (on a very minimal-spec PC) in Doom64ex, which uses SDL+OpenGL.

The biggest bottlenecks, from what I have debugged so far, are traversing the BSP tree, clipping BSP nodes that are not within view, and rendering the segs/subsectors. The clipping code is based on Tim Stump's clipping algorithm, which seems to be commonly used in other ports.

I considered looking into vertex arrays, which ZdoomGL happens to use as well, but after I implemented them in the engine I was getting the same results; in fact, the FPS and render time in R_DoRenderPlayerView were actually slightly worse than before, which doesn't seem right, even if I don't bother texturing anything.

So, for those of you with more experience with OpenGL ports: how do you tackle this sort of issue?

DaniJ

Most current OpenGL ports render almost entirely in immediate mode, constructing geometry and doing lighting calculations (etc.) every frame while traversing the BSP.

In general, it's not the actual rendering that is the bottleneck; rather, the port can't issue those GL commands fast enough.

This sounds like the problem you allude to, but then you mention you are using vertex arrays. Are you populating those every frame (i.e., doing exactly the same process as I outlined above, but rather than sending a stream of GL commands you instead write to the vertex arrays and then send those as batch jobs)?

In that case, you aren't really gaining much by simply reducing the number of GL function calls per frame. Certainly it will help, but in general this is not an optimization that leads to order-of-magnitude performance improvements.
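To make the batching idea concrete, here is a minimal C sketch of packing wall quads into one interleaved client-side array so a frame's worth of geometry can be submitted in a single glDrawArrays call. The vertex layout, buffer size, and function names are illustrative assumptions, not Doom64ex code:

```c
#include <assert.h>
#include <string.h>

/* Interleaved vertex: position + texture coordinate.
 * This layout is an assumption for illustration. */
typedef struct {
    float x, y, z;  /* position */
    float u, v;     /* texture coordinate */
} vtx_t;

#define MAX_VERTS 4096

static vtx_t drawlist[MAX_VERTS];
static int numverts = 0;

/* Append one quad (4 vertices) to the batch; returns 0 on overflow. */
static int AddQuad(const vtx_t quad[4])
{
    if (numverts + 4 > MAX_VERTS)
        return 0;
    memcpy(&drawlist[numverts], quad, sizeof(vtx_t) * 4);
    numverts += 4;
    return 1;
}

/* Submit the whole batch with one draw call, then reset.
 * The GL calls are shown as comments so the sketch stays self-contained:
 *
 *   glVertexPointer(3, GL_FLOAT, sizeof(vtx_t), &drawlist[0].x);
 *   glTexCoordPointer(2, GL_FLOAT, sizeof(vtx_t), &drawlist[0].u);
 *   glDrawArrays(GL_QUADS, 0, numverts);
 */
static void FlushBatch(void)
{
    numverts = 0;
}
```

As DaniJ notes, this only reduces call overhead; if the array is refilled from scratch every frame, the per-frame construction cost remains.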

What kind of texturing are you doing? How many passes are you making? Which blending modes are you using? How, when, and which buffers are you clearing? Are you using alpha/depth/whatever testing when it's not really necessary?

That's the kind of thing that will gain you a more noticeable improvement.

However, as long as you are constructing geometry and doing lighting calculations every frame, you'll still struggle to feed OpenGL commands fast enough. Therefore, the goal should be to preprocess as much of this information as possible and reuse it across multiple render frames.
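A minimal sketch of that preprocess-and-reuse idea, with purely illustrative names and sizes (nothing here is from an actual port): each subsector caches its computed geometry and light, and only recomputes when a dirty flag says something changed:

```c
#include <assert.h>

/* Cached per-subsector render data; rebuilt only when flagged dirty
 * (sector moved, light level changed, etc.). */
typedef struct {
    int   dirty;        /* set when sector geometry/light changes */
    int   numverts;
    float verts[32][3]; /* cached world-space positions */
    float light;        /* cached light level, 0..1 */
} sub_cache_t;

static int rebuilds = 0;  /* counts how often we actually recompute */

static void RebuildSubsector(sub_cache_t *c)
{
    /* ...recompute verts[] and light from map data here... */
    rebuilds++;
    c->dirty = 0;
}

static void DrawSubsector(sub_cache_t *c)
{
    if (c->dirty)
        RebuildSubsector(c);  /* pay the construction cost only when needed */
    /* submit c->verts with glDrawArrays or similar */
}
```

Drawing the same subsector across many frames then costs one rebuild total, rather than one per frame.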

Kaiser

DaniJ said:

This sounds like the problem you allude to, but then you mention you are using vertex arrays. Are you populating those every frame (i.e., doing exactly the same process as I outlined above, but rather than sending a stream of GL commands you instead write to the vertex arrays and then send those as batch jobs)?


If a seg or subsector isn't included in the array list, I draw it as normal (with standard GL function calls) and then create an array afterwards based on what was drawn, so when that seg/subsector needs to be drawn again it'll call glDrawArrays instead, because it's been added to the array list.

DaniJ said:

What kind of texturing are you doing? How many passes are you making? Which blending modes are you using? How, when, and which buffers are you clearing? Are you using alpha/depth/whatever testing when it's not really necessary?


glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); is called before R_DoRenderPlayerView, which of course is the main rendering loop function.

Only one pass is done per texture, unless you consider multitexturing to be multiple passes; regardless, the primitives are only drawn once per loop.

The standard combine mode used for textures is primary RGB * texture RGB, while for glowing lights I add a constant alpha value to brighten the texture's RGB.

I turn off depth testing when drawing 2D objects and enable alpha testing when drawing walls/ceilings/floors with masked textures.
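For reference, "primary RGB * texture RGB" corresponds to OpenGL's GL_MODULATE texture environment. Here is a hedged CPU-side illustration of that math (the glow effect is approximated as a simple clamped constant addition; actual fixed-function combiner setup will differ), with 0..255 components:

```c
#include <assert.h>

/* Modulate: fragment color = primary * texel, per component.
 * This is what GL_MODULATE computes in fixed-function texturing. */
static unsigned char Modulate(unsigned char primary, unsigned char texel)
{
    return (unsigned char)((primary * texel) / 255);
}

/* Glow: add a constant and clamp, roughly what an additive
 * constant-color combine stage does. */
static unsigned char AddGlow(unsigned char c, unsigned char glow)
{
    int v = c + glow;
    return (unsigned char)(v > 255 ? 255 : v);  /* clamp to 255 like GL */
}
```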

DaniJ

Kaiser said:

If a seg or subsector isn't included in the array list, I draw it as normal (with standard GL function calls) and then create an array afterwards based on what was drawn, so when that seg/subsector needs to be drawn again it'll call glDrawArrays instead, because it's been added to the array list.

We'd need to know more about when and how frequently you are replacing previously drawn elements in said arrays. Naturally, the ideal situation would be to only update geometry when it moves, texture coords when the texture moves and lighting values when they change. If possible, try a more fine-grained update method where you only recalculate and update what has changed.

glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); is called before R_DoRenderPlayerView, which of course is the main rendering loop function.

I don't know enough about your renderer to say for sure but in general you don't need to clear the color buffer every frame so long as you have other rendered surfaces covering the view.

Only one pass is done per texture, unless you consider multitexturing to be multiple passes; regardless, the primitives are only drawn once per loop.

How many texturing passes are you making? It could be that your renderer is fill-limited.

Ajapted

Things may be different now, but way back when I was working on EDGE's OpenGL renderer I got a big speed increase by minimising TEXTURE changes. Code-wise this entailed storing primitives in an array and sorting them by texture id before sending them to the GL.
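A minimal sketch of that texture-sorting approach: buffer primitives during traversal, sort by texture id with qsort, then draw, so glBindTexture is called once per distinct texture instead of once per primitive. The prim_t structure is an illustrative stand-in, not EDGE's actual code:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int tex_id;   /* GL texture object to bind */
    int first;    /* index of first vertex in the vertex array */
    int count;    /* number of vertices */
} prim_t;

/* qsort comparator: order primitives by texture id. */
static int CmpByTexture(const void *a, const void *b)
{
    return ((const prim_t *)a)->tex_id - ((const prim_t *)b)->tex_id;
}

/* Draw the list, binding only when the texture actually changes.
 * Returns the number of binds, so the saving is measurable. */
static int DrawPrims(const prim_t *prims, int n)
{
    int binds = 0, last = -1;
    for (int i = 0; i < n; i++) {
        if (prims[i].tex_id != last) {
            /* glBindTexture(GL_TEXTURE_2D, prims[i].tex_id); */
            last = prims[i].tex_id;
            binds++;
        }
        /* glDrawArrays(GL_TRIANGLE_FAN, prims[i].first, prims[i].count); */
    }
    return binds;
}
```

Sorting five primitives with textures {3, 1, 3, 1, 2} drops the bind count from 5 to 3; on real maps with many segs per texture the reduction is far larger.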


Minimizing OpenGL state changes is always a good idea. Especially changes to the current texture unit configuration.

Kaiser

DaniJ said:

We'd need to know more about when and how frequently you are replacing previously drawn elements in said arrays. Naturally, the ideal situation would be to only update geometry when it moves, texture coords when the texture moves and lighting values when they change. If possible, try a more fine-grained update method where you only recalculate and update what has changed.
I don't know enough about your renderer to say for sure but in general you don't need to clear the color buffer every frame so long as you have other rendered surfaces covering the view.
How many texturing passes are you making? It could be that your renderer is fill-limited.


Only once; once the seg/subsector has been drawn for the first time, it's added to the array, where it will be drawn as a vertex array from then on. The array is then updated only when needed.

The reason for this setup is that it lets me copy the vertex data used to create the drawn primitive into the vertex array after it has been drawn for the first time.

For texture passes, I only draw once, but for glowing/water effects multitexturing is involved.

entryway

Ajapted said:

Things may be different now, but way back when I was working on EDGE's OpenGL renderer I got a big speed increase by minimising TEXTURE changes. Code-wise this entailed storing primitives in an array and sorting them by texture id before sending them to the GL.

Results for some levels on outdated and modern hardware:
http://www.doomworld.com/vb/post/773882
http://www.doomworld.com/vb/post/773896

Kaiser

entryway said:

Results for some levels on outdated and modern hardware:
http://www.doomworld.com/vb/post/773882
http://www.doomworld.com/vb/post/773896


If I am reading those correctly, there isn't a huge improvement; only a ~1-2 fps gain.

The problem appears to be texture binding and the lighting calculation. If I can sort out how textures are being bound and precache the lighting data, I can probably gain back performance.

entryway

Kaiser said:

lighting calculation

What is "lighting calculation"?

That?

float gld_CalcLightLevel_glboom(int lightlevel)
{
  return lighttable[usegamma][lightlevel];
}

Kaiser

entryway said:

What is "lighting calculation"?

That?

float gld_CalcLightLevel_glboom(int lightlevel)
{
  return lighttable[usegamma][lightlevel];
}


The lighting in Doom64 is very different from Doom 1/2: it is computed per seg vertex rather than per sector, and involves more color variations than the standard light gradients in Doom 1/2.

More calculations are involved.
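One way the "precache the lighting data" idea mentioned earlier could apply to per-vertex RGB lighting: build a scale table once at startup, so lighting each vertex costs three table lookups instead of three multiply/divides. All names and sizes here are illustrative assumptions, not Doom64ex's actual code:

```c
#include <assert.h>

typedef struct { unsigned char r, g, b; } rgb_t;

/* Precomputed scale table: lightscale[light][component] = component * light / 255.
 * 256*256 bytes = 64 KB, built once at startup. */
static unsigned char lightscale[256][256];

static void InitLightTable(void)
{
    for (int l = 0; l < 256; l++)
        for (int c = 0; c < 256; c++)
            lightscale[l][c] = (unsigned char)(l * c / 255);
}

/* Apply a 0..255 light level to one seg-vertex color via lookups;
 * GL then interpolates the resulting per-vertex colors (Gouraud). */
static rgb_t LightVertex(rgb_t c, unsigned char light)
{
    rgb_t out = { lightscale[light][c.r],
                  lightscale[light][c.g],
                  lightscale[light][c.b] };
    return out;
}
```

This is the same spirit as GLBoom's `lighttable` lookup shown above, extended to three color components.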

rpeter

Why don't you just skip steps to see the effect? Measure the duration of a single frame, then omit something like rendering flats, the skydome, or the light calculation and see what changes. I don't know what you call outdated hardware, but a GF6600GT made almost no difference whether I shoved commands and polygons at it or commented those lines out. Inefficient storage of polygons (I used linked lists instead of arrays), however, made even the automap choppy with large and detailed wads.

Anyway, do you render the scene while traversing the bsp (like GLBoom) or traverse the bsp, collect what needs rendering and render them as a separate step (like GZDoom)?
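The step-skipping measurement suggested above could be sketched like this, with a bitmask to toggle individual render stages on and off between timed frames. Stage names are illustrative, and `clock()` is used only to keep the sketch portable; a real profiler or a high-resolution timer would give better numbers:

```c
#include <time.h>

/* Toggleable render stages; omit one at a time to see where the time goes. */
enum {
    STAGE_WALLS = 1,
    STAGE_FLATS = 2,
    STAGE_SKY   = 4,
    STAGE_LIGHT = 8
};

/* Render one frame with only the enabled stages and return elapsed ms. */
static double FrameMillis(int stages)
{
    clock_t start = clock();
    if (stages & STAGE_WALLS)  { /* render walls here */ }
    if (stages & STAGE_FLATS)  { /* render flats here */ }
    if (stages & STAGE_SKY)    { /* render skydome here */ }
    if (stages & STAGE_LIGHT)  { /* run light calculations here */ }
    return (double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC;
}
```

Comparing, say, `FrameMillis(~0)` against `FrameMillis(~0 & ~STAGE_LIGHT)` isolates the cost of lighting directly.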

rpeter said:

Anyway, do you render the scene while traversing the bsp (like GLBoom) or traverse the bsp, collect what needs rendering and render them as a separate step (like GZDoom)?

What?

GLBoom/+ collects all polygons from the scene in internal structures and renders them in gld_DrawScene().

Kaiser

rpeter said:

Why don't you just skip steps to see the effect? Measure the duration of a single frame, then omit something like rendering flats, the skydome, or the light calculation and see what changes.


I've already got stuff like that set up for debugging. The majority of the render time is definitely being spent on lighting; everything else is BSP traversal. Oddly enough, texture coordinate calculations have very little impact on the rendering cost.

rpeter said:

Anyway, do you render the scene while traversing the bsp (like GLBoom) or traverse the bsp, collect what needs rendering and render them as a separate step (like GZDoom)?


Yeah, I draw the polygons as the renderer goes through the BSP, though I am considering going down the other route: collecting the data first and then rendering everything.
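The collect-then-render split being discussed can be sketched in two phases: BSP traversal only appends visible segs to a list, and a second pass (cf. GLBoom's gld_DrawScene) draws the whole list, which also makes sorting by texture or translucency straightforward. The seg_t fields and list size are illustrative assumptions:

```c
#include <assert.h>

#define MAX_SEGS 1024

typedef struct {
    int id;       /* which seg this is */
    int tex_id;   /* texture to bind when drawing it */
} seg_t;

static seg_t visible[MAX_SEGS];
static int numvisible = 0;

/* Phase 1: called during BSP traversal for each seg that passes clipping.
 * No GL calls happen here; we only record what needs drawing. */
static void CollectSeg(const seg_t *s)
{
    if (numvisible < MAX_SEGS)
        visible[numvisible++] = *s;
}

/* Phase 2: draw everything collected this frame, then reset the list.
 * Returns the number of segs drawn. */
static int DrawScene(void)
{
    int drawn = numvisible;
    /* sort visible[] by tex_id, bind textures, issue glDrawArrays here */
    numvisible = 0;
    return drawn;
}
```

Separating the phases means traversal cost and draw-submission cost can also be measured independently, which ties back to the profiling suggestion above.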
