Maes

Even with pure software scaling, some memory access patterns work better than others.
From my understanding of the Chocolate Doom code, as of now you do "square scaling", i.e. you draw 4 (2x), 9 (3x), 16 (4x), in general n^2 (nx) pixels at once in a square/rectangular block, addressing each pixel individually.
This is terribly inefficient, as it fucks up cache locality: scaling a single pixel forces 2, 3, 4 etc. different scanlines to be cached, and then the next pixel causes the same lines to be fetched again, and so on.
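For reference, a minimal sketch of what "square scaling" looks like in generic form. This is not the Chocolate Doom code; the class name, method name and the 8-bit row-major buffer layout are assumptions made for illustration.

```java
// Hypothetical illustration of "square scaling"; names and layout are assumed.
class SquareScaleSketch {
    /** Scales src (srcW x srcH, row-major, one byte per pixel) by an integer factor n. */
    static byte[] scale(byte[] src, int srcW, int srcH, int n) {
        int dstW = srcW * n;
        byte[] dst = new byte[dstW * srcH * n];
        for (int y = 0; y < srcH; y++) {
            for (int x = 0; x < srcW; x++) {
                byte p = src[y * srcW + x];
                // n * n individual writes per source pixel,
                // spread across n different destination scanlines.
                for (int dy = 0; dy < n; dy++) {
                    int row = (y * n + dy) * dstW;
                    for (int dx = 0; dx < n; dx++) {
                        dst[row + x * n + dx] = p;
                    }
                }
            }
        }
        return dst;
    }
}
```

Every source pixel here touches n separate scanlines before the loop moves on to the next pixel.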
Instead, if you do only horizontal integer scaling for one scanline (pixel doubling, tripling etc.), all the writes stay in one scanline and cache locality is preserved. The result is the so-called "master scanline".
Once horizontal scaling of a scanline is complete, you can just memcpy the master scanline as many times as the vertical scale requires (n-1 copies for nx) and move on to the next one.
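A minimal sketch of the master scanline idea, under the same assumptions as before (8-bit pixels, row-major buffers). The class and method names are hypothetical and not taken from Mocha Doom or Chocolate Doom.

```java
// Hypothetical sketch: horizontal scaling into a master scanline,
// followed immediately by bulk copies for the vertical scaling.
class MasterScanlineSketch {
    static byte[] scale(byte[] src, int srcW, int srcH, int n) {
        int dstW = srcW * n;
        byte[] dst = new byte[dstW * srcH * n];
        for (int y = 0; y < srcH; y++) {
            int master = y * n * dstW; // start of this row's master scanline
            int d = master;
            // Horizontal pass: every write lands in the same destination scanline.
            for (int x = 0; x < srcW; x++) {
                byte p = src[y * srcW + x];
                for (int dx = 0; dx < n; dx++) {
                    dst[d++] = p;
                }
            }
            // Vertical pass: copy the master scanline into the n - 1 rows below it.
            for (int k = 1; k < n; k++) {
                System.arraycopy(dst, master, dst, master + k * dstW, dstW);
            }
        }
        return dst;
    }
}
```

The output should be identical to pixel-by-pixel square scaling; only the order and granularity of the writes differ.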
An even better access pattern is to do as above, but only do the vertical memcpys once all of the master scanlines are complete (i.e. do all the time-consuming horizontal scaling first, then the vertical scaling using super-efficient memcpy).
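The two-pass variant could be sketched roughly like this. Again, the names (TwoPassScaleSketch, expandRows, replicateRows) are made up for the example, and the buffer is assumed to be 8-bit, row-major, and sized for the scaled frame.

```java
// Hypothetical two-pass sketch: all horizontal scaling first, then all bulk copies.
class TwoPassScaleSketch {
    /** Pass 1: writes only the master scanlines (every n-th destination row). */
    static void expandRows(byte[] src, int srcW, int srcH, int n, byte[] dst) {
        int dstW = srcW * n;
        for (int y = 0; y < srcH; y++) {
            int d = y * n * dstW;
            for (int x = 0; x < srcW; x++) {
                byte p = src[y * srcW + x];
                for (int dx = 0; dx < n; dx++) {
                    dst[d++] = p;
                }
            }
        }
    }

    /** Pass 2: copies each master scanline into the n - 1 rows below it. */
    static void replicateRows(byte[] dst, int dstW, int n) {
        int height = dst.length / dstW;
        for (int i = 0; i + n <= height; i += n) {
            for (int k = 1; k < n; k++) {
                System.arraycopy(dst, i * dstW, dst, (i + k) * dstW, dstW);
            }
        }
    }

    static byte[] scale(byte[] src, int srcW, int srcH, int n) {
        byte[] dst = new byte[srcW * n * srcH * n];
        expandRows(src, srcW, srcH, n, dst);
        replicateRows(dst, srcW * n, n);
        return dst;
    }
}
```

Splitting the passes also means anything else drawn onto the master scanlines before pass 2 gets vertically scaled for free.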
You can see an implementation of this in Mocha Doom's SoftwareVideoRenderer.java:
code:
/**
 * Pretty crude in-place scaling. It's fast, but only works full-screen.
 * Width needs to be specific, height is implied.
 */
protected final void scaleSolid(int m, int n, int screen, int width) {
    int height = screens[screen].length / width;
    // Row i is the master scanline of each group of n rows.
    for (int i = 0; i < height; i += n) {
        // Each copy propagates the scanline one row further down the group.
        for (int j = 0; j < n - 1; j++) {
            System.arraycopy(screens[screen], (i + j) * width,
                             screens[screen], (i + j + 1) * width, width);
        }
    }
}
This snippet only does the final memcpy-based vertical scaling, and I only use it for scaling full-screen "solid" stuff like the title screen, help pages etc.
Compare the part of code that does 4x scaling in Mocha Doom (it actually works on patches/columns, but the reasoning is the same):
code:
// Scales each pixel of a particular column 4x horizontally ONLY
for (int j = 0; j < column.postlen[i]; j++) {
    dest[destPos] = data[ptr++];
    dest[destPos + 1] = dest[destPos];
    dest[destPos + 2] = dest[destPos];
    dest[destPos + 3] = dest[destPos];
    // Skip n rows down to the next master scanline
    destPos += n * this.width;
}
After everything has been horizontally scaled, scaleSolid is called to complete the job with bulk vertical scaling.
In Chocolate Doom:
code:
for (x=x1; x<x2; ++x)
{
    *sp++ = *bp; *sp++ = *bp; *sp++ = *bp; *sp++ = *bp;
    *sp2++ = *bp; *sp2++ = *bp; *sp2++ = *bp; *sp2++ = *bp;
    *sp3++ = *bp; *sp3++ = *bp; *sp3++ = *bp; *sp3++ = *bp;
    *sp4++ = *bp; *sp4++ = *bp; *sp4++ = *bp; *sp4++ = *bp;
    ++bp;
}
The former does only 4 expensive individual writes to scale a pixel 4x, and leaves vertical scaling to the efficient bulk System.arraycopy (Java's equivalent of memcpy), while the latter does 16 individual pointer writes, spread across 4 scanlines, for every source pixel.
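To put rough numbers on that comparison, here is a small counting sketch. It assumes 8-bit pixels and uses hypothetical names; it counts operations only and does not measure actual performance.

```java
// Hypothetical helper that counts writes for the two approaches.
class WriteCountSketch {
    /** Individual pixel writes when every n x n block is written pixel by pixel. */
    static long squareWrites(int w, int h, int n) {
        return (long) w * h * n * n;
    }

    /** Individual pixel writes when only the master scanlines are written pixel by pixel. */
    static long masterWrites(int w, int h, int n) {
        return (long) w * h * n;
    }

    /** Bulk row copies needed to replicate each master scanline n - 1 times. */
    static long bulkCopies(int h, int n) {
        return (long) h * (n - 1);
    }
}
```

For a 320x200 frame at 4x this gives 1,024,000 individual writes for square scaling, versus 256,000 individual writes plus 600 bulk copies of 1280 bytes each for the master scanline approach. The total number of bytes written is the same; what changes is that three quarters of them go through sequential bulk copies.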
Last edited by Maes on 01-12-12 at 10:59