scifista42

C++ display framework, minimum complexity, maximum speed?


I do realize that I'm probably naive. I imagine this is a problem that many inexperienced people who want to create games or graphical applications "from scratch" wish they had solved easily and efficiently before even starting to do anything. I'm one of them now.

So, what I want is to implement a windowed display of a "software rendered" scene. The "software rendering" part is irrelevant though - the display itself is what matters. I insist that the scene must be stored in an array of unsigned char with size "screenHeight*screenWidth*3", where each triplet of consecutive elements stores the R, G and B values of one (as yet imaginary) pixel. The application must create a window, call a function to display this array in the window as a picture, and then call this function repeatedly to update the picture within the window, while the "software renderer" changes the array's contents in the meantime.
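To be concrete, this is the kind of access I mean - a minimal sketch, where the helper name and the row-major ordering are just my own illustration, not a fixed API:

#include <vector>

const int screenWidth  = 800;
const int screenHeight = 600;
std::vector<unsigned char> framebuffer(screenWidth * screenHeight * 3);

// Write one pixel: (x, y) with y = 0 at the top, channels stored in R, G, B order.
void putPixel(int x, int y, unsigned char r, unsigned char g, unsigned char b)
{
    unsigned char *p = &framebuffer[(y * screenWidth + x) * 3];
    p[0] = r;
    p[1] = g;
    p[2] = b;
}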

I think you already have a clear idea of what I wish to do: implement a basic but adequate graphical application with a potentially high frame rate, as a framework for my own game or another application. It should be as fast and stable as possible. Ideal programming language: C++.

It sounds like such a perfect concept to me that I suspect there must be a catch - also because I haven't found a clear explanation like "you need this and do it this way, here is a prototype" anywhere. Now here is where I need advice from a competent and experienced graphical application programmer:

If there is indeed a catch, what is it and what needs to be done to overcome it? If there's not, then still, what needs to be done to achieve my goal with minimum code and the maximum potential frame rate?

Let's say that a Windows-only application suffices for my purposes, at least for now. Also:

Inclusion of any outside code or libraries must not cause ANY theoretical or practical copyright problems if I want to distribute my application FOR FREE / UNDER THE GPL.

Sorry for being a noob, but asking about this has been too tempting for me for too long, and reading relevant discussions on stackoverflow.com and elsewhere didn't give me actually satisfying answers.


You'd probably start by looking at libraries such as SFML or SDL.

SFML's license (it's the same as the zlib license) is permissive enough to let you license your programs any way you want. The same goes for SDL, at least for the 2.0 version.
http://www.sfml-dev.org/
http://www.libsdl.org/

You could also jump directly into OpenGL or DirectX and use them to display the image you've rendered.
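For example, the SFML route could look roughly like this - an untested sketch, so check the current SFML documentation for the exact signatures; note that SFML textures want RGBA, i.e. 4 bytes per pixel rather than 3:

#include <SFML/Graphics.hpp>
#include <vector>

int main()
{
    const unsigned width = 800, height = 600;
    sf::RenderWindow window(sf::VideoMode(width, height), "Software framebuffer");

    sf::Texture texture;
    texture.create(width, height);
    sf::Sprite sprite(texture);

    // The CPU-side buffer your software renderer writes into (RGBA order).
    std::vector<sf::Uint8> pixels(width * height * 4, 255);

    while (window.isOpen())
    {
        sf::Event event;
        while (window.pollEvent(event))
            if (event.type == sf::Event::Closed)
                window.close();

        // ... software renderer updates `pixels` here ...

        texture.update(pixels.data()); // upload the CPU buffer to the texture
        window.clear();
        window.draw(sprite);           // draw the texture as a full-window sprite
        window.display();
    }
    return 0;
}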


I'm not quite sure what you are asking.

Drawing your game-world into an off-screen image ("buffer" or "surface") and then sending ("blitting") it to the display hardware -- that was just the usual way many games worked (before 3D hardware became ubiquitous, anyway). These days nearly everything uses OpenGL and that off-screen image is now called a "texture" or "frame buffer object".

Or do you mean having your renderer (which creates an image of the visible game-world) run separately or independently from sending the current image (buffer) to the display hardware?


I just want my program to have full pixel-by-pixel control over the graphical display at any time, AND the maximum possible speed of updating the display. Working with an array of char is easy and fast, and I've assumed that updating the display would be fastest if the entire array got converted to the entire displayed image, all at once, whenever it needed to be updated. If there is a faster way that still allows full pixel-by-pixel control, I'd be glad to know.

scifista42 said:

Working with an array of char is easy and fast

Depends a *lot* on what you are doing.

These days storing an image into an OpenGL/D3D texture and using the 3D hardware to draw it is *much* faster than doing it yourself on the CPU.

And implementing your own image blitting code with scaling, alpha blending and/or bilinear filtering is *not* easy.

But don't take my word for it, make your thing and see for yourself.
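If you do try the texture route, the per-frame cost on your side is basically one upload call - a rough sketch, assuming a GL context is current and the texture was created once with glTexImage2D at the right size:

#ifdef _WIN32
#include <windows.h>   // must come before GL/gl.h on Windows
#endif
#include <GL/gl.h>

// Re-upload the CPU-side RGB buffer into an existing texture each frame.
void uploadFrame(GLuint tex, int width, int height, const unsigned char *rgbBuffer)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1); // rows are tightly packed 3-byte pixels
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGB, GL_UNSIGNED_BYTE, rgbBuffer);
    // ...then draw one screen-sized textured quad and swap buffers.
}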


I'm afraid that the hardware method doesn't allow me to have full pixel-by-pixel Red-Green-Blue control over the display at any time - correct? This is very important for my intended purpose. I don't need any hardware-renderer features, except for one: actually getting the damn pixels onto the monitor, fast.

Can you please advise me on a working and efficient way to achieve what I've described in my previous posts, either via example code or by stating what exact routines are needed?

scifista42 said:

I'm afraid that the hardware method doesn't allow me to have full pixel-by-pixel Red-Green-Blue control over the display at any time - correct?

Not correct. Reread this:

andrewj said:

Drawing your game-world into an off-screen image ("buffer" or "surface") and then sending ("blitting") it to the display hardware -- that was just the usual way many games worked (before 3D hardware became ubiquitous, anyway). These days nearly everything uses OpenGL and that off-screen image is now called a "texture" or "frame buffer object".

You can access individual pixels at your leisure in that OpenGL texture or FBO.


The Allegro 5 game library is fairly easy to use: http://liballeg.org/
It is C++ compatible, has an open source license, and good documentation and examples.

al_create_bitmap() will create an off-screen buffer.
al_lock_bitmap() and al_unlock_bitmap() allow you to modify it yourself.
al_draw_bitmap() and al_flip_display() for drawing it to the actual screen.
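Putting those together, a rough (untested) sketch of the whole round trip - check the manual for the pixel formats available on your machine:

#include <allegro5/allegro.h>
#include <cstdint>

int main()
{
    const int width = 800, height = 600;
    if (!al_init()) return 1;

    ALLEGRO_DISPLAY *display = al_create_display(width, height);
    ALLEGRO_BITMAP  *buffer  = al_create_bitmap(width, height);

    // Lock the bitmap so the CPU-side code can write pixels into it.
    ALLEGRO_LOCKED_REGION *lock =
        al_lock_bitmap(buffer, ALLEGRO_PIXEL_FORMAT_ABGR_8888, ALLEGRO_LOCK_WRITEONLY);
    for (int y = 0; y < height; y++) {
        uint32_t *row = (uint32_t *)((char *)lock->data + y * lock->pitch);
        for (int x = 0; x < width; x++)
            row[x] = 0xFF0000FFu; // placeholder color; your renderer writes here
    }
    al_unlock_bitmap(buffer);

    // Draw the off-screen buffer to the actual screen.
    al_set_target_backbuffer(display);
    al_draw_bitmap(buffer, 0, 0, 0);
    al_flip_display();

    al_rest(2.0); // keep the window up briefly
    al_destroy_bitmap(buffer);
    al_destroy_display(display);
    return 0;
}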


I just finished converting a game that ran in 8-bit color under DOS into a 32-bit GL app for the NDS, so I can tell you it's quite a simple and easy way to do graphics.

If you want the framebuffer approach, then do all your drawing into a software framebuffer - it can even be 8-bit if you want, but you need to remap it into a 32-bit surface when it gets "dirty", before doing the texture upload, since modern cards do not support indexed textures in hardware. This is dead simple and reasonably fast for undemanding games, though - just a 256-entry uint32_t array of colors from your palette, packed in the destination texture's color format.
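A sketch of that remap step (the function name is just illustrative):

#include <cstdint>
#include <cstddef>

// Expand an 8-bit indexed framebuffer into a 32-bit buffer using a lookup table.
void remapToTrueColor(const uint8_t *src, uint32_t *dst, size_t numPixels,
                      const uint32_t palette[256])
{
    // palette[] holds the 256 colors already packed in the texture's format
    // (e.g. ARGB8888), so the whole frame is one table lookup per pixel.
    for (size_t i = 0; i < numPixels; i++)
        dst[i] = palette[src[i]];
}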

Though if you want - and this is the approach that was taken in Noctropolis - you can also treat every independent object in the game as its own two-dimensional primitive: upload its sprites or other graphics as textures, create a rect or triangle strip at the proper position on screen, and then bind the texture while drawing.

All of this - getting your magic 1:1 texel-to-pixel correspondence - starts with what is called an orthogonal projection. You can find open source code for setting up such a projection in multiple Doom source ports, including Eternity.
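For reference, the setup itself is only a few calls in classic fixed-function OpenGL - a bare-bones sketch that assumes a GL context already exists; the source ports mentioned have more complete versions:

#ifdef _WIN32
#include <windows.h>   // must come before GL/gl.h on Windows
#endif
#include <GL/gl.h>

// Orthographic setup for a 1:1 pixel mapping, y = 0 at the top like a framebuffer.
void setupOrtho2D(int width, int height)
{
    glViewport(0, 0, width, height);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0.0, width, height, 0.0, -1.0, 1.0);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
}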


Thanks everybody.

I have set myself a goal to make test window displays in Allegro, DirectX, OpenGL, SDL and SFML, provide them with pre-prepared char/int arrays to represent images, benchmark their speed of display, then compare them and decide which one is the fastest.

After failing to set up Allegro despite following guides and trying for over an hour, I've decided to start with SDL instead - and after 6 more hours of effort, I had success!

I've made my benchmarking program in this fashion: Upon startup, the program generates 1000 independent arrays of integers, each array 800*600 elements big, and fills them with random numbers to represent random colors all over the screen. Then it initializes SDL, creates an 800*600 window and makes it FULLSCREEN. Then the program saves the value of time (in milliseconds) that has passed since the program started running. Then it runs a procedure that 1000 times changes a certain pointer to one of the pre-generated integer arrays, and each time makes the window update its displayed content with data from this pointer. Finally, the program once again checks how much time has passed since it started running, subtracts the previously saved time, divides the number by 1000 to get an average display/window-update time, and prints it into an output file.

Repeatedly running this program and checking the output confirmed that it takes on average 2.6 - 2.7 milliseconds to display/update an 800*600 window from the array of integer data.

Right now, I don't feel like going through the pain of setting up the other libraries and figuring out their specifics. So, I'd like to discuss just my result measured with SDL.

2.7 milliseconds just to update pixels in a window - is that fast or slow?

Let's say that I want to achieve a stable frame rate of 35 FPS. 1/35 of a second = 28.57 milliseconds. This means that I have 28.57 milliseconds to process a frame, which I want to do fully with my own software-rendering algorithm. Out of these 28.57 milliseconds, 2.7 milliseconds are needed just for the pure display part, to actually update pixels in the window. That's roughly 9% of the available time. Will it be enough?

Compare it to Doom, or more precisely to modern Doom source ports with software rendering, given the same frame rate (35 FPS) and screen size (800*600). Let's disregard the fact that my renderer is truecolor and Doom's is 8-bit, and compare just pure speed. I expect my software renderer to be several times slower than Doom's - that is, to take several times more time to process a single frame before displaying it. My concern and question is: do I have a chance to maintain 35 FPS under these circumstances? When a modern Doom source port processes a frame, does it take less than 2.7 milliseconds to update the display in the window? Does it take less than 9% of the total available time, and does the actual software rendering algorithm take more or less than 91% of said available time to render a frame?

I hope I'm making myself clear enough. I'm merely concerned about speed, and it's actually irrelevant what exactly my "software renderer" will be doing - perhaps something completely different from a 2.5D graphics engine - but that doesn't matter when we're discussing just the speed of display, which is what I want. I'd be glad if anyone experienced could explain to me whether the display speed I've achieved is fast enough for realtime rendering, assuming that further software-rendering code will have to be executed every frame. In comparison with Doom, or otherwise, I'd just like to get a realistic outlook on the problem.


I would try it on various different CPUs (PowerPC, ARM, etc.) at various CPU speeds.

Slowdown is usually more noticeable on slower systems.


Too bad I have just my one laptop.

Here is the source code. Would anybody compile it and test it on their platforms with different CPUs at various CPU speeds, please? :)


#include <iostream>
#include <fstream>
#include <sstream>
#include <ctime>
#include "SDL.h"

using namespace std;

int screenWidth = 800;
int screenHeight = 600;
int screenSize;
int benchTests = 1000;

SDL_Window *window;
SDL_Surface *displaySurface;
SDL_Surface *screenSurface;

unsigned int **screen;

void initOutput() {
    FILE *f = fopen("output.txt","w");
    fclose(f);
}

void sendOutput(const char *text) {
    FILE *f = fopen("output.txt","a");
    fputs(text,f);
    fclose(f);
}

int main( int argc, char* args[] )
{
    screenSize = screenWidth*screenHeight;
    
    int i,j;
    
    screen = new unsigned int*[benchTests];
    
    for(j=0;j<benchTests;j++) {
        screen[j] = new unsigned int[screenSize];
        for(i=0;i<screenSize;i++) {
            screen[j][i] = ((rand()&255)<<16) + ((rand()&255)<<8) + (rand()&255);
        }
    }
    
    //Start SDL
    SDL_Init( SDL_INIT_EVERYTHING );
    
    srand(time(NULL));
    
    initOutput();
    
    window = SDL_CreateWindow("Hello World!", 100, 100, screenWidth, screenHeight, SDL_WINDOW_FULLSCREEN | SDL_WINDOW_SHOWN);
    if (window == NULL) { sendOutput(SDL_GetError()); SDL_Quit(); return 1; }
    
    displaySurface = SDL_GetWindowSurface(window);
    
    screenSurface = SDL_CreateRGBSurface(0,screenWidth,screenHeight,sizeof(int)*8,0x00ff0000,0x0000ff00,0x000000ff,0);
    
    int startTime = clock();
    
    for(j=0;j<benchTests;j++) {
        screenSurface->pixels = screen[j];
        SDL_BlitSurface(screenSurface,NULL,displaySurface,NULL);
        SDL_UpdateWindowSurface(window);
    }
    
    int endTime = clock() - startTime;
    
    sendOutput("Redrawing an 800x600 screen ");
    stringstream os;
    os << benchTests;
    sendOutput(os.str().c_str());
    os.str("");
    sendOutput(" times took ");
    os << endTime;
    sendOutput(os.str().c_str());
    os.str("");
    sendOutput(" miliseconds.\n\nThat is ");
    os << (endTime/benchTests) << "." << (endTime%benchTests);
    sendOutput(os.str().c_str());
    sendOutput(" miliseconds per one redraw.");
    
    for(j=0;j<benchTests;j++) {
        delete screen[j];
    }
    delete screen;
    
    /*SDL_FreeSurface(screenSurface); // Crashes for some reason
	screenSurface = NULL;
	SDL_DestroyWindow(window);
	window = NULL;*/
    
    //Quit SDL
    SDL_Quit();
    
    return 0;    
}

It needs to include headers from, and be linked with, SDL2. I don't know how much I need to explain this; ask me about my particular include/link configuration if necessary.


I only have a couple minutes before work, but something about this part seems off.

for(j=0;j<benchTests;j++) {
        screenSurface->pixels = screen[j];
        SDL_BlitSurface(screenSurface,NULL,displaySurface,NULL);
        SDL_UpdateWindowSurface(window);
    }
I think reassigning screenSurface->pixels is creating a memory leak. A lock on the displaySurface's buffer and a memcpy seem to me like a better option; then I think you could eliminate the BlitSurface call. Also, it's been a while, but I think SDL_UpdateWindowSurface is just the standard Windows repaint call, whereas there should be an SDL_Flip or similar function for double buffering that might be faster.
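Roughly what I have in mind - from memory, so double-check the flags and the pixel format against the SDL 1.2 docs; the copy goes row by row in case the surface pitch differs from width * 4:

#include "SDL.h"
#include <cstring>

int main(int argc, char *argv[])
{
    const int width = 800, height = 600;
    SDL_Init(SDL_INIT_VIDEO);

    SDL_Surface *screen = SDL_SetVideoMode(width, height, 32,
                                           SDL_HWSURFACE | SDL_DOUBLEBUF);
    Uint32 *frame = new Uint32[width * height]; // your software renderer's buffer

    for (int i = 0; i < 100; i++) {
        // ... renderer writes into `frame` here ...

        if (SDL_MUSTLOCK(screen)) SDL_LockSurface(screen);
        for (int y = 0; y < height; y++)
            std::memcpy((Uint8 *)screen->pixels + y * screen->pitch,
                        frame + y * width, width * sizeof(Uint32));
        if (SDL_MUSTLOCK(screen)) SDL_UnlockSurface(screen);
        SDL_Flip(screen); // swap/update the actual display
    }

    delete[] frame;
    SDL_Quit();
    return 0;
}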


You are right about the memory leak, but there is only one (the very original screenSurface->pixels array), and it can be fixed by copying screenSurface->pixels into a dummy pointer right after its initialization and then copying it back at the end of the program before freeing the surface.

I don't see how memcpy could possibly be better than changing a single pointer, but I like the idea of getting rid of the BlitSurface call.

I haven't yet discovered a way to write directly to displaySurface->pixels that would work - at least I think it didn't work when I tried it back when I was starting to make this program.

Thanks for everything, I will try to look into it. If somebody else is faster than me or already knows what's needed to improve the speed of my code, I'd appreciate hearing it too!

scifista42 said:

Program.


Your program appears to use 100% CPU while allocating tons of memory by requesting memory maps constantly.

mmap(NULL, 1921024, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7950000
mmap(NULL, 1921024, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb777b000
mmap(NULL, 1921024, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb75a6000
Using gdb, this gives:
(gdb) bt
#0  mmap () at ../sysdeps/unix/syscall-template.S:83
#1  0x0fb99a00 in sYSMALLOc (av=<optimized out>, nb=<optimized out>) at malloc.c:3026
#2  _int_malloc (av=0xfc96260, bytes=1920000) at malloc.c:4776
#3  0x0fb9b41c in *__GI___libc_malloc (bytes=1920000) at malloc.c:3660
#4  0x0fe25a8c in operator new(unsigned int) () from /usr/lib/powerpc-linux-gnu/libstdc++.so.6
#5  0x0fe25bf4 in operator new[](unsigned int) () from /usr/lib/powerpc-linux-gnu/libstdc++.so.6
#6  0x10000f40 in main (argc=1, args=0xbffff534) at sci.cxx:40
I do not exactly have 1831MiB of free RAM to spare.

Changing benchTests to 50 results in random fuzz going by (in about a second) as if you were to watch VHF today (except with more color).

GhostlyDeath said:

Your program appears to use 100% CPU while allocating tons of memory by requesting memory maps constantly.
Using gdb, this gives:

This is fine on my computer. The program needs a long startup time to pre-generate the random arrays that will be displayed later. It actually takes 20+ seconds if benchTests is 1000, and several seconds if it's 100.

Changing benchTests to 50 results in random fuzz going by (in about a second) as if you were to watch VHF today (except with more color).

Fully intentional, to simulate a demanding visual output.

Also, benchTests is supposed to be either 1, 10, 100 or 1000, otherwise the output speed won't be calculated properly.


In other news:

I fixed the memory leak (via a dummy pointer "origpixels"), here is the new code:


#include <iostream>
#include <fstream>
#include <sstream>
#include <ctime>
#include "SDL.h"

using namespace std;

int screenWidth = 800;
int screenHeight = 600;
int screenSize;
int benchTests = 100;

void *origpixels;

SDL_Window *window;
SDL_Surface *displaySurface;
SDL_Surface *screenSurface;

unsigned int **screen;

void initOutput() {
FILE *f = fopen("output.txt","w");
fclose(f);
}

void sendOutput(const char *text) {
FILE *f = fopen("output.txt","a");
fputs(text,f);
fclose(f);
}

int main( int argc, char* args[] )
{
screenSize = screenWidth*screenHeight;

int i,j;

screen = new unsigned int*[benchTests];

for(j=0;j<benchTests;j++) {
screen[j] = new unsigned int[screenSize];
for(i=0;i<screenSize;i++) {
screen[j][i] = ((rand()&255)<<16) + ((rand()&255)<<8) + (rand()&255);
}
}

//Start SDL
SDL_Init( SDL_INIT_EVERYTHING );

srand(time(NULL));

initOutput();

window = SDL_CreateWindow("Hello World!", 100, 100, screenWidth, screenHeight, SDL_WINDOW_FULLSCREEN | SDL_WINDOW_SHOWN);
if (window == NULL) { sendOutput(SDL_GetError()); SDL_Quit(); return 1; }

displaySurface = SDL_GetWindowSurface(window);

screenSurface = SDL_CreateRGBSurface(0,screenWidth,screenHeight,sizeof(int)*8,0x00ff0000,0x0000ff00,0x000000ff,0);

origpixels = screenSurface->pixels;

int startTime = clock();

for(j=0;j<benchTests;j++) {
screenSurface->pixels = screen[j];
SDL_BlitSurface(screenSurface,NULL,displaySurface,NULL);
SDL_UpdateWindowSurface(window);
}

int endTime = clock() - startTime;

sendOutput("Redrawing an 800x600 screen ");
stringstream os;
os << benchTests;
sendOutput(os.str().c_str());
os.str("");
sendOutput(" times took ");
os << endTime;
sendOutput(os.str().c_str());
os.str("");
sendOutput(" miliseconds.\n\nThat is ");
os << (endTime/benchTests) << "." << (endTime%benchTests);
sendOutput(os.str().c_str());
sendOutput(" miliseconds per one redraw.");

for(j=0;j<benchTests;j++) {
delete screen[j];
}
delete screen;

screenSurface->pixels = origpixels;

SDL_FreeSurface(screenSurface);
screenSurface = NULL;
SDL_DestroyWindow(window);
window = NULL;

//Quit SDL
SDL_Quit();

return 0;
}

I learned that SDL_Flip doesn't exist in SDL2 anymore. Also that displaySurface's pixels don't seem to be directly modifiable and displayable at the same time.

Also, this page advised a "better" method (even mentioning Doom!) of displaying array data as pixels by sending them to the GPU. I wrote a new program with this method:

#include <iostream>
#include <fstream>
#include <sstream>
#include <ctime>
#include "SDL.h"

using namespace std;

int screenWidth = 800;
int screenHeight = 600;
int screenSize;
int benchTests = 100;

SDL_Window *window;
SDL_Renderer *renderer;
SDL_Texture *screenTexture;

unsigned int **screen;

void initOutput() {
FILE *f = fopen("output2.txt","w");
fclose(f);
}

void sendOutput(const char *text) {
FILE *f = fopen("output2.txt","a");
fputs(text,f);
fclose(f);
}

int main( int argc, char* args[] )
{
screenSize = screenWidth*screenHeight;

int i,j;

screen = new unsigned int*[benchTests];

for(j=0;j<benchTests;j++) {
screen[j] = new unsigned int[screenSize];
for(i=0;i<screenSize;i++) {
screen[j][i] = ((rand()&255)<<16) + ((rand()&255)<<8) + (rand()&255);
}
}

//Start SDL
SDL_Init( SDL_INIT_EVERYTHING );

srand(time(NULL));

initOutput();

window = SDL_CreateWindow("2",0,0,800,600,SDL_WINDOW_FULLSCREEN);
if (window==NULL ) { sendOutput(SDL_GetError()); SDL_Quit(); return 1; }

renderer = SDL_CreateRenderer(window,-1,0);
if (renderer==NULL ) { sendOutput(SDL_GetError()); SDL_Quit(); return 1; }

screenTexture = SDL_CreateTexture(renderer,SDL_PIXELFORMAT_ARGB8888,SDL_TEXTUREACCESS_STREAMING,800,600);

int startTime = clock();

for(j=0;j<benchTests;j++) {
SDL_UpdateTexture(screenTexture,NULL,screen[j],800*sizeof(Uint32));
SDL_RenderClear(renderer);
SDL_RenderCopy(renderer, screenTexture, NULL, NULL);
SDL_RenderPresent(renderer);
}

int endTime = clock() - startTime;

sendOutput("Redrawing an 800x600 screen ");
stringstream os;
os << benchTests;
sendOutput(os.str().c_str());
os.str("");
sendOutput(" times took ");
os << endTime;
sendOutput(os.str().c_str());
os.str("");
sendOutput(" miliseconds.\n\nThat is ");
os << (endTime/benchTests) << "." << (endTime%benchTests);
sendOutput(os.str().c_str());
sendOutput(" miliseconds per one redraw.");

for(j=0;j<benchTests;j++) {
delete screen[j];
}
delete screen;

SDL_DestroyWindow(window);
SDL_DestroyRenderer(renderer);
SDL_DestroyTexture(screenTexture);

//Quit SDL
SDL_Quit();

return 0;
}

It turned out that this method takes 3.2 milliseconds on average to display pixels in a same-sized window - so the result is worse, at least on my computer. Now, if somebody would compare the runs of these 2 programs on their computers, it would be nice.

Finally, I'm still failing to set up SFML and OpenGL/GLUT; there are always linking or runtime errors, even though I've done everything according to the guides to prevent them.


I converted the original program to SDL 1.2 and followed my own advice above; my program got 2.34 ms per frame. Eliminating the memcpy (my buffer-to-screen copy) gets 0.77 ms. I don't have time tonight to test OpenGL. Note that my computer is an old e-machines box with an Intel G41 Express that still runs Windows XP, so not exactly top of the line.


Screw SDL2 then. Thanks!

EDIT: Well, not really. I've also converted my program to SDL 1.2 and tested. With memcpy and SDL_Flip, I'm getting a time of 2.8 milliseconds, worse than before. When omitting the memcpy and changing the "pixels" pointer manually instead, nothing is drawn at all - I don't understand it. Can anybody explain it, and whether it can be worked around? If not, I will return to SDL2.

EDIT2: I have just realized something. Please correct me if I'm wrong. I had assumed that since "displaySurface->pixels" is a pointer, I can change the pointer to make it point to another place in memory, and then when the window gets updated, it would automatically take data from the new place. But that doesn't seem to be the case. It seems that the window update functions always take data from the same memory that "displaySurface->pixels" initially pointed to. This means that there is a fixed place in memory from which the data must be taken, so if I want to change the picture, it is necessary to copy the new data into that specified, unchangeable place in memory. This also means that my alleged tests of display speed were actually tests of memory copy speed + display speed together. But, after I make my software renderer, I can make it write directly to the final memory address, and the time needed to do that will replace the time needed to copy memory in my current tests. I need to make entirely new tests that use one standardized memory-writing technique, so that a comparison of the measured speeds will actually correspond to a comparison of pure display speeds.

EDIT3: I made test programs specifically to measure just the average speed of the actual pixel-updating functions (UpdateWindow, Flip, UpdateTexture / RenderClear / RenderCopy / RenderPresent), and I've come to these results:

SDL1.2 Flip: 1.4 milliseconds
SDL2 UpdateWindow: 1.4 milliseconds
SDL2 hardware method: 2.5 milliseconds

SDL1.2 and SDL2 are actually equally fast at displaying. I will use SDL2 then, after all.
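For reference, this is roughly what I mean by writing the frame straight into the final memory with SDL2 - an untested sketch that assumes the window surface ends up as a 32-bit format, with a placeholder color instead of a real renderer:

#include "SDL.h"

int main(int argc, char *argv[])
{
    const int width = 800, height = 600;
    SDL_Init(SDL_INIT_VIDEO);

    SDL_Window *window = SDL_CreateWindow("direct write", 100, 100,
                                          width, height, SDL_WINDOW_SHOWN);
    SDL_Surface *surface = SDL_GetWindowSurface(window);

    for (int frame = 0; frame < 100; frame++) {
        if (SDL_MUSTLOCK(surface)) SDL_LockSurface(surface);

        // The software renderer writes into the surface's own memory; the pitch
        // may be larger than width * 4, so step through it row by row.
        for (int y = 0; y < height; y++) {
            Uint32 *row = (Uint32 *)((Uint8 *)surface->pixels + y * surface->pitch);
            for (int x = 0; x < width; x++)
                row[x] = SDL_MapRGB(surface->format, (Uint8)frame, 0, 255);
        }

        if (SDL_MUSTLOCK(surface)) SDL_UnlockSurface(surface);
        SDL_UpdateWindowSurface(window);
    }

    SDL_DestroyWindow(window);
    SDL_Quit();
    return 0;
}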

