Help with a new low detail mode (ZDoom LE)

As you know, I restored the low-detail modes in ZDoom LE and added a quad mode (4x2).
Now I've added a 3x3 low-detail mode (4x2 was easy). It works, but it crashes on exit and I don't know why. I've traced the algorithm and it's right. The only way to prevent the crash is to stop the inner loop at x < rowsize-6, but then the rightmost part of the screen isn't filled and displays junk. I've kept detailxshift and detailyshift at 1, since otherwise it would be a mess, but I think that doesn't matter. For instance, rowsize is 160 and the screen width is 320. The crash happens in ~DSimpleCanvas, but the debug info is not useful. If someone wants to help, I would appreciate it. Thanks.

 

// [RH] Double pixels in the view window horizontally
//		and/or vertically (or not at all).
void R_DetailDouble ()
{
	if (!viewactive) return;
	DetailDoubleCycles.Reset();
	DetailDoubleCycles.Clock();

	switch (r_detail) // (detailxshift << 1) | detailyshift
	{

...

	case 4:		// x-quad and y-double
		{
			int rowsize = viewwidth;
			int realpitch = RenderTarget->GetPitch();
			int pitch = realpitch << 1;
			int y,x;
			BYTE *linefrom, *lineto;

			linefrom = dc_destorg;
			for (y = viewheight; y != 0; --y, linefrom += pitch)
			{
				lineto = linefrom - viewwidth;
				for (x = 0; x < rowsize; x = x+2)
				{
					BYTE c = linefrom[x];
					lineto[x*2] = c;
					lineto[x*2+1] = c;
					lineto[x*2+2] = c;
					lineto[x*2+3] = c;
					lineto[x*2+realpitch] = c;
					lineto[x*2+realpitch+1] = c;
					lineto[x*2+realpitch+2] = c;
					lineto[x*2+realpitch+3] = c;
				}
			}
		}
		break;
	case 5:		// x- and y-triple
		{
			int rowsize = viewwidth;
			int realpitch = RenderTarget->GetPitch();
			int pitch = realpitch << 1;
			int y,x;
			BYTE *linefrom, *lineto;

			linefrom = dc_destorg;
			for (y = viewheight; y > 0; --y, linefrom += pitch)
			{
				lineto = linefrom - viewwidth*3;
				for (x = 0; x < rowsize-3; x=x+3)
				{
					BYTE c = linefrom[x];
					lineto[x*2] = c;
					lineto[x*2+1] = c;
					lineto[x*2+2] = c;
					c = linefrom[x+1];
					lineto[x*2+3] = c;
					lineto[x*2+4] = c;
					lineto[x*2+5] = c;
				}
				x = rowsize-1;
				BYTE c = linefrom[x-1];
				lineto[x*2] = c;
				lineto[x*2+1] = c;
				memcpy (lineto+realpitch, lineto, rowsize*2);
				memcpy (lineto+realpitch*2, lineto, rowsize*2);
			}
		}
		break;
	}
	DetailDoubleCycles.Unclock();
}

 


 

No idea, but let me throw a few things out there:

You almost certainly are writing past the buffer. I'm thinking past the top (before the start of the buffer), which is unusual. As you probably know, the heap allocator reserves a bit more memory than you ask for and stores bookkeeping information about the block in that extra space, which is used when the memory is freed. A crash on exit suggests those bytes are being corrupted. Suggestions (for debugging):

  1. Replace y > 0 with y >= 2 in case 5 as a temporary fix. When viewheight is not a multiple of 3, once y drops below 2, lineto points into negative array elements (I think; I'm just guessing from a quick look).
  2. If that doesn't work, add a bounds check around each memory write that logs to a file. Or:
  3. Compile in debug mode. The debug runtime places sentinel bytes around each heap block and tests them for corruption during deallocation, which almost always detects this type of bug. Or:
  4. Use the HOM indicator option, which fills the draw buffer with red alternating with black. Then reduce your x and y bounds by 3, take a few screenshots in quick succession, open them, and check whether anything is being written over the red.
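Suggestions 2 and 3 can be combined into a small debugging aid. The sketch below is hypothetical (GuardedBuffer and its methods are not ZDoom names, and it logs to stderr rather than a file): it surrounds the pixel buffer with guard bytes holding a known value, refuses and logs any checked write outside the usable range, and can verify afterwards that no unchecked write touched the guards, which is roughly what a debug heap does on free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

namespace {
constexpr std::size_t kGuardBytes = 64;     // guard zone size on each side
constexpr std::uint8_t kGuardFill = 0xFD;   // value the guards must keep
}

// Hypothetical debugging aid (not a ZDoom class): a byte buffer with guard
// zones on both sides and a checked write that logs out-of-range offsets.
class GuardedBuffer {
public:
    explicit GuardedBuffer(std::size_t size)
        : size_(size), storage_(size + 2 * kGuardBytes, kGuardFill) {
        assert(size > 0);
    }

    // First usable byte: what the drawer would treat as the canvas start.
    std::uint8_t* data() { return storage_.data() + kGuardBytes; }

    // Checked write; offset is relative to data(). Out-of-range writes are
    // logged and skipped instead of silently corrupting memory.
    void put(std::ptrdiff_t offset, std::uint8_t value) {
        if (offset < 0 || offset >= static_cast<std::ptrdiff_t>(size_)) {
            ++bad_writes_;
            std::fprintf(stderr, "out-of-range write at offset %lld\n",
                         static_cast<long long>(offset));
            return;
        }
        data()[offset] = value;
    }

    // True if neither guard zone was modified, e.g. by an unchecked write
    // through data().
    bool guards_intact() const {
        for (std::size_t i = 0; i < kGuardBytes; ++i) {
            if (storage_[i] != kGuardFill ||
                storage_[kGuardBytes + size_ + i] != kGuardFill)
                return false;
        }
        return true;
    }

    int bad_writes() const { return bad_writes_; }

private:
    std::size_t size_;
    std::vector<std::uint8_t> storage_;
    int bad_writes_ = 0;
};
```

While debugging, each lineto[i] = c in the drawer would become a put() call with the offset of lineto + i from the start of the canvas.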

Some instrumentation will find this bug easily; the problem is that it's a lot of data to analyze. That's why, if 1 doesn't work, 4 may be the easiest way to find it. A quick glance at the code suggests it's very close to working. Good luck! Please reply when you find it (I love a good mystery!)
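Suggestion 4 works like the HOM indicator: prefill the buffer with a known pattern, run the drawer with reduced bounds, then look for pattern bytes that changed outside the area the drawer was allowed to touch. A minimal sketch, assuming a plain 8-bit buffer with a row pitch; the function names and the two palette indices are placeholders, not ZDoom's:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder palette indices standing in for the HOM red/black pattern.
constexpr std::uint8_t kPatternA = 0xB0;
constexpr std::uint8_t kPatternB = 0x00;

// Expected pattern value at (x, y): a checkerboard of the two indices.
inline std::uint8_t pattern_at(int x, int y) {
    return ((x + y) & 1) ? kPatternB : kPatternA;
}

// Prefill a width x height area (rows 'pitch' bytes apart) with the pattern.
void fill_pattern(std::vector<std::uint8_t>& buf, int pitch, int width, int height) {
    assert(pitch >= width && buf.size() >= static_cast<std::size_t>(pitch) * height);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            buf[y * pitch + x] = pattern_at(x, y);
}

// Count pattern bytes that changed OUTSIDE [0, allowed_w) x [0, allowed_h).
// A stray write that happens to store the expected value is not detected.
int count_stray_writes(const std::vector<std::uint8_t>& buf, int pitch,
                       int width, int height, int allowed_w, int allowed_h) {
    int stray = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            if (x < allowed_w && y < allowed_h) continue;  // drawer may write here
            if (buf[y * pitch + x] != pattern_at(x, y)) ++stray;
        }
    return stray;
}
```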

 

Edited by kb1


Thanks, but I already tried that. It's fixed now anyway. I wanted to do this mainly to prove to myself that I was capable of it. I guess the compiler will optimize this so that the y > 0 condition isn't checked on every iteration of the outer loop, but I'm not sure.
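On the y > 0 question: hoisting that test out of the loop means peeling off the first iteration, which an optimizer is allowed to do but no compiler guarantees. To be sure, the first iteration can be peeled by hand. A sketch in which a made-up process_row function stands in for the drawer body:

```cpp
#include <cassert>
#include <vector>

// Stand-in for the per-row work; 'first' selects the special-case path
// (in the drawer: the different lineto offset and the single memcpy).
static void process_row(std::vector<int>& log, int y, bool first) {
    assert(y >= 0);
    log.push_back(first ? -y - 1 : y);
}

// Original shape: the branch is evaluated on every iteration.
std::vector<int> rows_with_branch(int height) {
    std::vector<int> log;
    for (int y = 0; y < height; ++y) {
        if (y > 0) process_row(log, y, false);
        else       process_row(log, y, true);
    }
    return log;
}

// Peeled shape: row 0 is handled once before the loop, so the loop body
// no longer needs the y > 0 test.
std::vector<int> rows_peeled(int height) {
    std::vector<int> log;
    if (height > 0) process_row(log, 0, true);
    for (int y = 1; y < height; ++y) process_row(log, y, false);
    return log;
}
```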
There's one remaining problem: at high resolutions the image is displaced to the right, and I think that's due to the CPU.DataL1LineSize cache optimization in DSimpleCanvas::DSimpleCanvas. I guess the pitch must be different, but I don't know how to fix it; I've tried without luck.

 

Edit: fixed as well. Post editing is BROKEN!

Edited by drfrag


Here's the missing code; this forum system is a nightmare:

	case 5:		// x- and y-triple
		{
			int rowsize = viewwidth;
			int realpitch = RenderTarget->GetPitch();
			int pitch = realpitch << 1;
			int y,x;
			BYTE *linefrom, *lineto;
			BYTE c;
			int offset = viewwidth > 320 ? CPU.DataL1LineSize : 0;

			linefrom = dc_destorg;
			for (y = 0; y < viewheight; ++y, linefrom += pitch)
			{
				if (y > 0)
				{
					lineto = linefrom - viewwidth*3 - offset;
				}
				else
				{
					lineto = linefrom - viewwidth;
				}
				for (x = 0; x < rowsize-3; x=x+3)
				{
					c = linefrom[x];
					lineto[x*2] = c;
					lineto[x*2+1] = c;
					lineto[x*2+2] = c;
					c = linefrom[x+1];
					lineto[x*2+3] = c;
					lineto[x*2+4] = c;
					lineto[x*2+5] = c;
				}
				x = rowsize-1;
				if (viewwidth >= 200)
				{
					c = linefrom[x-2];
					lineto[x*2-2] = c;
					lineto[x*2-1] = c;
				}
				c = linefrom[x-1];
				lineto[x*2] = c;
				lineto[x*2+1] = c;
				if (y > 0)
				{
					memcpy (lineto+realpitch, lineto, rowsize*2);
					memcpy (lineto+realpitch*2, lineto, rowsize*2);
				}
				else
				{
					memcpy (lineto+realpitch, lineto, rowsize*2);
				}
			}
		}
		break;

 


Since I can't edit posts that contain code...

Your reply was actually useful, but I was already debugging, and then it crashed immediately in r_draw.cpp upon changing the video mode to 1024, with lineto out of bounds.

Edit: it's actually a 3x2 mode. I think 3x3 is impossible; I'd need to use linefrom += realpitch*3, but then I get junk.
Same for 3x1: setting detailyshift to 0 makes the engine crash somewhere else.

Edited by drfrag


Hmmm, I can edit code posts just fine by double-clicking the code, which opens the code edit window.

 

I can't see a logical reason why 3x3 would be impossible, but you need a divide by 3, which is, of course, slow. A long time ago (so it may no longer be relevant), ZDoom and others had code that increased the pitch of certain power-of-2 resolutions, like 1024, to 1028 or more, to ensure that successive vertical writes did not hit the exact same position in the cache. This supposedly sped up rendering quite a bit. Of course, your final blit then cannot be a single block move; it must be done line by line, to skip the extra pixels. You can also write debug wrappers around each pixel write to determine exactly what's happening. I'd suggest a function that draws test lines across the whole screen, just to make sure everything is where you think it is. When you're knee-deep in such far-reaching modifications, it's easy for an assumption to fly right under your radar.

Recently, I spent an obscene amount of time debugging a stupid bug. Each time I single-stepped through the code, I'd step over a function that "couldn't possibly be causing the problem". Guess what? Yep, that function had an initialization bug. A friend of mine was watching me and said, "Hey, what about that call?" Had he not said that, I wouldn't have found it for a long time.
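The padded-pitch idea and the line-by-line blit can be sketched as follows. The padding rule (add one cache line once the width exceeds a threshold) is an assumption for illustration, loosely modeled on the CPU.DataL1LineSize adjustment mentioned earlier in the thread; it is not ZDoom's exact logic. The copy shows why the blit must go row by row: each source row is followed by pitch - width padding bytes that have to be skipped.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical pitch rule: keep pitch == width for small modes, otherwise
// pad each row by one cache line so vertically adjacent pixels are less
// likely to collide in the cache when the width is a power of two.
int padded_pitch(int width, int cache_line_size, int threshold) {
    return width <= threshold ? width : width + cache_line_size;
}

// Copy a width x height image stored with row stride src_pitch into a
// tightly packed destination (row stride == width), one row at a time,
// skipping the padding bytes at the end of each source row.
void blit_rows(const std::uint8_t* src, int src_pitch,
               std::uint8_t* dst, int width, int height) {
    assert(src_pitch >= width);
    for (int y = 0; y < height; ++y)
        std::memcpy(dst + y * width, src + y * src_pitch, width);
}
```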

 

Glad you got it working. Good job!

 


I didn't know about the double-click. I had just deleted the code, and it's not possible to add new code without wiping out the entire message.

 

I've uploaded the final version to https://github.com/drfrag666/gzdoom/commits/gzdoomle (and gzdoom32 and zdoomcl).
On the 3x3 thing: with linefrom += realpitch*3 there were actually many missing lines (missing data), and I couldn't fill them. I don't think it's related to the extra alignment bytes in the pitch. With pitch*2 there was one extra line I couldn't fill (quadrupling vertically would be easy). I don't know what you mean by divide by three.

I even drew a sketch on paper, but I couldn't get it to work. Anyway, as soon as you do something strange, the engine crashes in GillotineBinPack.cpp.
Note that there are no extra bytes at low resolutions, so realpitch equals the screen width.

Edited by drfrag


I'd have to spend some more time looking at your code before I could say exactly how to do it. But, hypothetically, it should be doable on paper. Say you wanted to do 640x400 with a 3x3-detail video mode. You'd end up with (640/3)x(400/3) low-detail pixels, or 213.333x133.333 big pixels. I imagine the .333 is what was giving you trouble. Without even trying, I could imagine internally rendering at 213x133 and blitting 3x3 squares to the screen, leaving a 1-pixel horizontal gap (at the bottom?) and a 1-pixel vertical gap (on the right?), since 213*3 = 639 and 133*3 = 399. Or internally rendering at 214x134 and filling in the gaps with smaller pixel slivers. So it *can* be conceived on paper, but either you leave the gaps, or your code gets a lot larger just to handle the slivers. 2x2 almost always works, because resolutions are almost always divisible by 2. Same deal with 4x4: resolutions are almost always divisible by 4. 3x3 is an anomaly in many cases.
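The arithmetic behind the two options, as a quick sketch with made-up names: rounding down gives whole blocks plus a leftover gap, while rounding up adds one narrower sliver block at the edge whose width equals the leftover.

```cpp
#include <cassert>

struct BlockSplit {
    int whole_blocks;   // blocks that are a full block_size wide
    int remainder;      // leftover pixels: gap width, or width of the sliver
    int total_blocks;   // whole blocks plus one sliver if remainder > 0
};

BlockSplit split_axis(int real_resolution, int block_size) {
    assert(block_size > 0);
    BlockSplit s;
    s.whole_blocks = real_resolution / block_size;   // integer division
    s.remainder = real_resolution % block_size;
    s.total_blocks = s.whole_blocks + (s.remainder > 0 ? 1 : 0);
    return s;
}
```

For 640x400 at 3x3 this yields 213x133 whole blocks with a 1-pixel remainder on each axis, or 214x134 blocks if the slivers are drawn.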

 

If you decide to avoid the slivers, you can handle any block size with this formula:

// in the following formula, '\' = integer division:
fixed_x_resolution = (real_x_resolution \ x_block_size) * x_block_size; 
fixed_y_resolution = (real_y_resolution \ y_block_size) * y_block_size;

So, for 640x400, at 3x3, you have:

real_x_resolution = 640;
real_y_resolution = 400;
x_block_size = 3;
y_block_size = 3;

And, finally:

fixed_x_resolution = (640 \ 3) * 3 = 639;
fixed_y_resolution = (400 \ 3) * 3 = 399;
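The same formula in C++, where / on two ints already performs integer division (truncation, which equals flooring for the positive sizes used here). The struct and function names are illustrative only.

```cpp
#include <cassert>

struct FixedResolution {
    int width;
    int height;
};

// Largest width/height that holds a whole number of blocks on each axis.
FixedResolution fixed_resolution(int real_x, int real_y, int block_x, int block_y) {
    assert(block_x > 0 && block_y > 0);
    return { (real_x / block_x) * block_x, (real_y / block_y) * block_y };
}
```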

You only need to calculate these once per video mode change. Use these fixed values to check bounds and to terminate your for loops, thereby avoiding the extra pile of sliver-handling code (which looks ugly anyway). Your drawer can remain fast and streamlined, and that one drawer can handle *any* block size (though it may still make sense, performance-wise, to have separate drawers optimized for each specific size; it's not strictly necessary).
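A minimal sketch of such a drawer, with hypothetical names. It assumes the low-detail image sits in its own buffer, separate from the destination; ZDoom's R_DetailDouble instead expands in place inside the same canvas, which is what makes the real offsets tricky, and that part is not modeled here. The loop bounds come from dst_w / bw and dst_h / bh, so only whole blocks are drawn and nothing lands outside the fixed resolution:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Expand a src_w x src_h low-detail image into bw x bh blocks in dst.
// At most (dst_w / bw) x (dst_h / bh) whole blocks are written.
void expand_blocks(const std::uint8_t* src, int src_pitch, int src_w, int src_h,
                   std::uint8_t* dst, int dst_pitch, int dst_w, int dst_h,
                   int bw, int bh) {
    assert(bw > 0 && bh > 0 && dst_pitch >= dst_w);
    const int cols = std::min(src_w, dst_w / bw);
    const int rows = std::min(src_h, dst_h / bh);
    for (int sy = 0; sy < rows; ++sy) {
        const std::uint8_t* from = src + sy * src_pitch;
        std::uint8_t* first = dst + sy * bh * dst_pitch;
        // Widen one source row into the first destination row of the block.
        for (int sx = 0; sx < cols; ++sx)
            for (int dx = 0; dx < bw; ++dx)
                first[sx * bw + dx] = from[sx];
        // Duplicate that row vertically, like the memcpy calls in case 5.
        for (int dy = 1; dy < bh; ++dy)
            std::memcpy(first + dy * dst_pitch, first,
                        static_cast<std::size_t>(cols) * bw);
    }
}
```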

 

If you're really cool, you center the output, distributing the gaps on both sides of each axis, using:

// '%' = mod
start_x = ((real_x_resolution - fixed_x_resolution) % x_block_size) >> 1;
start_y = ((real_y_resolution - fixed_y_resolution) % y_block_size) >> 1;

This makes the gap as even as possible on both the X and Y axes.
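In C++ the centering offsets can be computed as below (illustrative names). Because real - fixed is already smaller than the block size, the % in the formula above leaves the value unchanged; the offset is simply half the leftover pixels, rounded down.

```cpp
#include <cassert>

struct Offset { int x; int y; };

// Top-left corner that centers the whole-block area on the screen.
Offset centered_start(int real_x, int real_y, int block_x, int block_y) {
    assert(block_x > 0 && block_y > 0);
    const int fixed_x = (real_x / block_x) * block_x;
    const int fixed_y = (real_y / block_y) * block_y;
    return { (real_x - fixed_x) >> 1, (real_y - fixed_y) >> 1 };
}
```

For 640x400 at 3x3 the 1-pixel gap cannot be split, so both offsets are 0; for 800x600 at 7x7 the leftovers are 2 and 5 pixels, giving offsets of 1 and 2.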

 

One final note: If you use the above code, you may also want to paint the gaps black, so remnants of the title screen and other things will be cleaned up. Personally, I'd rather see all the large blocks whole with a tiny blank gap at the borders, than to fill the gaps with tiny slivers off to one side. That would look worse, I think.
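Clearing the gaps only needs to touch the strips outside the centered block area. A sketch with a hypothetical function name, assuming an 8-bit buffer with a row pitch; the palette index for black is passed in, since it depends on the palette in use:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Fill everything outside the rectangle [x0, x0 + w) x [y0, y0 + h) with
// 'black' in a screen_w x screen_h buffer whose rows are 'pitch' bytes apart.
void clear_gaps(std::uint8_t* buf, int pitch, int screen_w, int screen_h,
                int x0, int y0, int w, int h, std::uint8_t black) {
    assert(x0 >= 0 && y0 >= 0 && x0 + w <= screen_w && y0 + h <= screen_h);
    for (int y = 0; y < screen_h; ++y) {
        std::uint8_t* row = buf + y * pitch;
        if (y < y0 || y >= y0 + h) {
            std::memset(row, black, screen_w);               // top/bottom strip
        } else {
            std::memset(row, black, x0);                     // left strip
            std::memset(row + x0 + w, black, screen_w - x0 - w);  // right strip
        }
    }
}
```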

 

I guess for 4K monitors, it is not unreasonable to expect some people to want a 12x10 block size, to emulate 320x200... (yikes)
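Picking the block size for a target resolution is the same integer division in reverse. A sketch with illustrative names: on a 3840x2160 screen, 320x200 gives 12x10 blocks, filling 2000 of the 2160 rows, and 320x240 gives 12x9, which fills all 2160 rows.

```cpp
#include <cassert>

struct BlockSize { int x; int y; };

// Largest whole-pixel block size that fits target_w x target_h on the screen.
// A component is 0 if the target is larger than the screen on that axis.
BlockSize block_size_for(int screen_w, int screen_h, int target_w, int target_h) {
    assert(target_w > 0 && target_h > 0);
    return { screen_w / target_w, screen_h / target_h };
}
```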

Edited by kb1


Yes, it's possible on paper, but I still don't know how to do it. 3x3 wouldn't be very useful anyway; I've added a 4x4 mode instead, and it was not as easy as I expected. I've also corrected the horizontal position with reduced screen sizes.


The code I provided should handle any x*y pretty easily, using your code with a little finesse. I'm not as concerned with 3x3 vs. 4x4; I'd be thinking about larger sizes. A 4K monitor would need 12x9 or 12x10, for example. The code I posted handles any combination. It's a little less efficient because you can't unroll the writes, but it lets you emulate 320x200, 320x240, 600x400, 640x480, whatever, with any source resolution, using the same code. If it were me, I'd write optimized drawers for up to 4x4, as you've done, and use the above code for all the other cases, so you don't have to write more drawers.

 

To start, you can take your 4x4 drawer and get it to accept the 4 and the 4 as function arguments (DrawTextureLine(4, 4, ...)). If you can get that to work, it should handle 3x3, or whatever, without you having to figure each case out. That'd be my first try, anyway. Good luck.
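One way to get both optimized small-size drawers and a single generic one is to write the loop once and wrap it in a template for the common sizes. This is a sketch with hypothetical names, not ZDoom code, and it assumes separate source and destination buffers; the specialized sizes (2x2, 3x2, 4x2, 4x4) simply mirror the modes discussed in this thread. With constant block sizes the optimizer can typically unroll the inner loops, but that is up to the compiler.

```cpp
#include <cassert>
#include <cstdint>

// Core loop: expands each source pixel of a w x h image into a bw x bh block.
// dst must hold h * bh rows of at least w * bw bytes, dst_pitch apart.
inline void expand_core(const std::uint8_t* src, int src_pitch,
                        std::uint8_t* dst, int dst_pitch,
                        int w, int h, int bw, int bh) {
    for (int sy = 0; sy < h; ++sy)
        for (int dy = 0; dy < bh; ++dy) {
            std::uint8_t* to = dst + (sy * bh + dy) * dst_pitch;
            for (int sx = 0; sx < w; ++sx)
                for (int dx = 0; dx < bw; ++dx)
                    to[sx * bw + dx] = src[sy * src_pitch + sx];
        }
}

// Same loop with bw/bh fixed at compile time.
template <int BW, int BH>
void expand_fixed(const std::uint8_t* src, int src_pitch,
                  std::uint8_t* dst, int dst_pitch, int w, int h) {
    expand_core(src, src_pitch, dst, dst_pitch, w, h, BW, BH);
}

// Dispatch: specialized versions for common sizes, generic loop otherwise.
void expand(const std::uint8_t* src, int src_pitch, std::uint8_t* dst,
            int dst_pitch, int w, int h, int bw, int bh) {
    assert(bw > 0 && bh > 0);
    if (bw == 2 && bh == 2)      expand_fixed<2, 2>(src, src_pitch, dst, dst_pitch, w, h);
    else if (bw == 3 && bh == 2) expand_fixed<3, 2>(src, src_pitch, dst, dst_pitch, w, h);
    else if (bw == 4 && bh == 2) expand_fixed<4, 2>(src, src_pitch, dst, dst_pitch, w, h);
    else if (bw == 4 && bh == 4) expand_fixed<4, 4>(src, src_pitch, dst, dst_pitch, w, h);
    else expand_core(src, src_pitch, dst, dst_pitch, w, h, bw, bh);
}
```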
