Quasar

Strictly defined means to interpolate angle_t


There isn't one, according to the C++ standard. Consider the obvious way to write it:

  return oldangle + FixedMul(lerp, newangle - oldangle);
Given fixed_t defined as signed int (or int32_t) and angle_t defined as unsigned int (or uint32_t), the above expression invokes implementation-defined behavior whenever the difference lies outside the range INT_MIN to INT_MAX: the unsigned result of newangle - oldangle must be converted to the signed fixed_t parameter of FixedMul, and for out-of-range values that conversion is implementation-defined.

I have tried a thousand different ways of getting around this and none of them work so far. I am about to call it quits unless somebody has a better solution. I will NOT deliberately invoke the above sort of lazy pray-it-works behavior when I know better.
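For what it's worth, here is one way to stay entirely within defined behavior, sketched for illustration only (LerpAngle and its structure are made up for this example, not code from the thread): keep the wrap-sensitive subtraction unsigned, branch on which way around the circle is shorter, and do the scaling in 64 bits so nothing can overflow.

```cpp
#include <cstdint>

typedef int32_t  fixed_t;   // 16.16 fixed point, FRACUNIT == 1 << 16
typedef uint32_t angle_t;   // binary angles: the full circle is 2^32

// Illustrative sketch, assuming lerp is in [0, FRACUNIT]. Every operation
// is fully defined: unsigned subtraction wraps modulo 2^32, the 64-bit
// product cannot overflow, and narrowing to angle_t reduces modulo 2^32.
static angle_t LerpAngle(angle_t oldangle, angle_t newangle, fixed_t lerp)
{
    // Wrapping difference modulo 2^32; values >= 2^31 represent a
    // negative delta, i.e. the short way around is backwards.
    angle_t delta = newangle - oldangle;

    if (delta < 0x80000000u)
    {
        // Forward: scale the magnitude in 64 bits.
        uint64_t step = ((uint64_t)delta * (uint32_t)lerp) >> 16;
        return oldangle + (angle_t)step;   // wraps harmlessly
    }
    else
    {
        // Backward: negate via the wrap (0u - delta is defined),
        // scale the magnitude, then subtract.
        uint64_t mag  = (uint64_t)(angle_t)(0u - delta);
        uint64_t step = (mag * (uint32_t)lerp) >> 16;
        return oldangle - (angle_t)step;   // wraps harmlessly
    }
}
```

The branch costs something, which is presumably why the benchmarked "correct" versions below are slower than the naive expression.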


Angles cannot be interpolated as simply as in that code. For example, going from 1 degree to 359 degrees should go the short way (interpolating over 2 degrees), not the long way around.

Also the C standard (section A4.2) guarantees that unsigned integers will wrap around when an addition overflows.


Quasar found two versions that work and are more correct; however, they are slower. From benchmarks run on an x86_64 Linux system with an i7 3770K @ 4.2 GHz, the times are as follows:
No optimization from compiler:

103 seconds elapsed for double routine.
93 seconds elapsed for int routine.
O3 optimization:
40 seconds elapsed for double.
20 seconds elapsed for int(!)
This benchmark was built using clang 3.4, but any fairly recent C++ compiler with support for C++11 should work. Please test and report back on different platforms/architectures so as to give Quasar a better "feel" for which will perform better on a majority of systems.

The benchmark in question just calls each interpolation routine with random data as the arguments 300 million times per routine, while timing how long it takes.

Here is the source code of the benchmark program:
http://pastebin.com/Y9wx41d4
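For reference, the shape of such a timing loop can be sketched like this (a minimal illustration, not the pastebin code; BenchLerp, the seed, and the iteration count are made up):

```cpp
#include <chrono>
#include <cstdint>
#include <random>

// Call one interpolation routine `iterations` times with random inputs
// and return the elapsed wall time in seconds.
template <typename Fn>
double BenchLerp(Fn lerpfn, int iterations)
{
    std::mt19937 rng(12345);
    std::uniform_int_distribution<uint32_t> dist;

    // volatile sink keeps the results live so the loop isn't optimized away.
    volatile uint32_t sink = 0;

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        sink = lerpfn(dist(rng), dist(rng), dist(rng) & 0xFFFFu);
    auto stop = std::chrono::steady_clock::now();
    (void)sink;

    return std::chrono::duration<double>(stop - start).count();
}
```

Note that the random-number generation itself takes measurable time, which is why later posts in the thread time it separately.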

EDIT:
GCC 4.7.3 is even faster.
No optimizations:
79 seconds elapsed for double.
61 seconds elapsed for int.
O3 optimizations:
21 seconds elapsed for double.
15 seconds elapsed for int.

MP2E said:

Please test and report back on different platforms/architectures so as to give Quasar a better "feel" for which will perform better on a majority of systems.

Here are my results:
I ran the test on Windows 7 x64 on an AMD Phenom II X6 1035T @ 2.6 GHz. The compiler is the latest MinGW package, which doesn't support C++11 features, so I slightly edited your source file.
Here it is: http://pastebin.com/2mXmsM4R
Please correct me if there's something wrong.

No optimizations:

21 seconds elapsed on getting 3*300 million random numbers.
37 seconds elapsed for double.
38 seconds elapsed for float.
27 seconds elapsed for int.
O3:
22 seconds elapsed on getting 3*300 million random numbers.
30 seconds elapsed for double.
30 seconds elapsed for float.
25 seconds elapsed for int.


Visual C++ 2013, Core i7, 3.2 GHz:

Unoptimized:

15 seconds elapsed for getting random.
30 seconds elapsed for double.
44 seconds elapsed for float.
35 seconds elapsed for int.

Optimized with SSE math:

10 seconds elapsed for getting random.
13 seconds elapsed for double.
14 seconds elapsed for float.
15 seconds elapsed for int.

Optimized with x87 math:

10 seconds elapsed for getting random.
11 seconds elapsed for double.
13 seconds elapsed for float.
15 seconds elapsed for int.

I find it interesting that x87 math is consistently faster than SSE2, quite contrary to common wisdom, but I experienced the same in GZDoom as well.


I would have to know what horrible thing the C++ standard has implied now that is causing the problem.
Is it the lack of specified behavior for signed integer overflow?
I assume it is not some problem with truncating instead of rounding, or some loss of precision. Or is there some concern that some compilations may get more precision than others if the compiler uses longer registers than the specified 32 bits?
The C++ standard may have left itself open to alternative ways of handling overflow, such as saturating math (an overflow value for integers, like floating point has) or an exception on overflow, but the processors we are actually using do not implement integer math that disallows overflow wrap.
It is also desirable to have overflow math available, with reliable wrapping, so one question is what the C++ standard provides for doing such calculations.

My stock answer when pressed with such a problem:
1. Embed the 32-bit value as an unsigned value in a signed 64-bit register; the upper bits can be all 0. This is the same as adding your angle to some 64-bit constant. If you don't want the result to go negative, add an additional constant before subtracting.
2. Do the difference calculation. There cannot be overflow, and the conversion constants added previously cancel each other.
3. Use the lower 32 bits as an unsigned result. The mask step emulates the wrap, and there is no sign to cause problems.
4. Converting to a signed result involves another wrap. Easiest is to assign the 32-bit unsigned value to a signed type and let the format change do it; there is no overflow to trigger an exception.
5. This will be much faster than any code that tests to prevent overflow.
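The steps above can be sketched as follows (DiffAngle64 is an illustrative name, not code from the thread):

```cpp
#include <cstdint>

// Sketch of the 64-bit embedding described above: widen both unsigned
// 32-bit angles into a signed 64-bit type (upper bits zero), subtract
// where overflow is impossible, then reduce back to 32 bits.
static uint32_t DiffAngle64(uint32_t newangle, uint32_t oldangle)
{
    // No overflow possible: both operands fit in 33 bits of a 64-bit type.
    int64_t diff = (int64_t)newangle - (int64_t)oldangle;

    // Conversion to unsigned reduces modulo 2^32 (the same effect as
    // masking the low 32 bits), emulating the wrap with no sign problems.
    return (uint32_t)(uint64_t)diff;
}
```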

