Idea: METADATA lump for Slade and Doom Builder

Gez said:

So basically MD5 takes twice as much time as CRC-32 and SHA-1 takes thrice.

MD5 appears to be nearly twice as fast as CRC-32 (and SHA-1 is somewhere in between) according to that table; it takes 247 usec to produce a checksum for a block that takes CRC-32 468 usec.

Share this post


Link to post
grommile said:

MD5 appears to be nearly twice as fast as CRC-32 (and SHA-1 is somewhere in between) according to that table; it takes 247 usec to produce a checksum for a block that takes CRC-32 468 usec.

You're right, for some reason I remembered Adler32 as being an implementation of CRC-32 instead of being a different thing.

Share this post


Link to post

I was actually surprised at just how fast MD5 was in that test considering it's a Real Hashing Algorithm (albeit a now-broken one), unlike something like adler32, which is basically literally just "count up the values of all the bytes, modulo some prime number".

Share this post


Link to post

Hmm yeah, trying it out again calculating the md5 hash for each entry in doom2.wad only adds ~40ms to the load time on my system (from ~80ms to ~120ms), while crc was actually very slightly slower.

No idea where the slowness I remember from last time came from.

So I guess it is probably best to use md5 for this rather than crc32.

Share this post


Link to post

(Woah, big bump).

 

CRC32 is blindingly fast when optimized:

First, add the table (which can be generated vs. this list), and initialize the CRC stream.

Spoiler

// Standard CRC32 table
static const unsigned int crc_table[256] = {
	0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L,
	0x706af48fL, 0xe963a535L, 0x9e6495a3L, 0x0edb8832L, 0x79dcb8a4L,
	0xe0d5e91eL, 0x97d2d988L, 0x09b64c2bL, 0x7eb17cbdL, 0xe7b82d07L,
	0x90bf1d91L, 0x1db71064L, 0x6ab020f2L, 0xf3b97148L, 0x84be41deL,
	0x1adad47dL, 0x6ddde4ebL, 0xf4d4b551L, 0x83d385c7L, 0x136c9856L,
	0x646ba8c0L, 0xfd62f97aL, 0x8a65c9ecL, 0x14015c4fL, 0x63066cd9L,
	0xfa0f3d63L, 0x8d080df5L, 0x3b6e20c8L, 0x4c69105eL, 0xd56041e4L,
	0xa2677172L, 0x3c03e4d1L, 0x4b04d447L, 0xd20d85fdL, 0xa50ab56bL,
	0x35b5a8faL, 0x42b2986cL, 0xdbbbc9d6L, 0xacbcf940L, 0x32d86ce3L,
	0x45df5c75L, 0xdcd60dcfL, 0xabd13d59L, 0x26d930acL, 0x51de003aL,
	0xc8d75180L, 0xbfd06116L, 0x21b4f4b5L, 0x56b3c423L, 0xcfba9599L,
	0xb8bda50fL, 0x2802b89eL, 0x5f058808L, 0xc60cd9b2L, 0xb10be924L,
	0x2f6f7c87L, 0x58684c11L, 0xc1611dabL, 0xb6662d3dL, 0x76dc4190L,
	0x01db7106L, 0x98d220bcL, 0xefd5102aL, 0x71b18589L, 0x06b6b51fL,
	0x9fbfe4a5L, 0xe8b8d433L, 0x7807c9a2L, 0x0f00f934L, 0x9609a88eL,
	0xe10e9818L, 0x7f6a0dbbL, 0x086d3d2dL, 0x91646c97L, 0xe6635c01L,
	0x6b6b51f4L, 0x1c6c6162L, 0x856530d8L, 0xf262004eL, 0x6c0695edL,
	0x1b01a57bL, 0x8208f4c1L, 0xf50fc457L, 0x65b0d9c6L, 0x12b7e950L,
	0x8bbeb8eaL, 0xfcb9887cL, 0x62dd1ddfL, 0x15da2d49L, 0x8cd37cf3L,
	0xfbd44c65L, 0x4db26158L, 0x3ab551ceL, 0xa3bc0074L, 0xd4bb30e2L,
	0x4adfa541L, 0x3dd895d7L, 0xa4d1c46dL, 0xd3d6f4fbL, 0x4369e96aL,
	0x346ed9fcL, 0xad678846L, 0xda60b8d0L, 0x44042d73L, 0x33031de5L,
	0xaa0a4c5fL, 0xdd0d7cc9L, 0x5005713cL, 0x270241aaL, 0xbe0b1010L,
	0xc90c2086L, 0x5768b525L, 0x206f85b3L, 0xb966d409L, 0xce61e49fL,
	0x5edef90eL, 0x29d9c998L, 0xb0d09822L, 0xc7d7a8b4L, 0x59b33d17L,
	0x2eb40d81L, 0xb7bd5c3bL, 0xc0ba6cadL, 0xedb88320L, 0x9abfb3b6L,
	0x03b6e20cL, 0x74b1d29aL, 0xead54739L, 0x9dd277afL, 0x04db2615L,
	0x73dc1683L, 0xe3630b12L, 0x94643b84L, 0x0d6d6a3eL, 0x7a6a5aa8L,
	0xe40ecf0bL, 0x9309ff9dL, 0x0a00ae27L, 0x7d079eb1L, 0xf00f9344L,
	0x8708a3d2L, 0x1e01f268L, 0x6906c2feL, 0xf762575dL, 0x806567cbL,
	0x196c3671L, 0x6e6b06e7L, 0xfed41b76L, 0x89d32be0L, 0x10da7a5aL,
	0x67dd4accL, 0xf9b9df6fL, 0x8ebeeff9L, 0x17b7be43L, 0x60b08ed5L,
	0xd6d6a3e8L, 0xa1d1937eL, 0x38d8c2c4L, 0x4fdff252L, 0xd1bb67f1L,
	0xa6bc5767L, 0x3fb506ddL, 0x48b2364bL, 0xd80d2bdaL, 0xaf0a1b4cL,
	0x36034af6L, 0x41047a60L, 0xdf60efc3L, 0xa867df55L, 0x316e8eefL,
	0x4669be79L, 0xcb61b38cL, 0xbc66831aL, 0x256fd2a0L, 0x5268e236L,
	0xcc0c7795L, 0xbb0b4703L, 0x220216b9L, 0x5505262fL, 0xc5ba3bbeL,
	0xb2bd0b28L, 0x2bb45a92L, 0x5cb36a04L, 0xc2d7ffa7L, 0xb5d0cf31L,
	0x2cd99e8bL, 0x5bdeae1dL, 0x9b64c2b0L, 0xec63f226L, 0x756aa39cL,
	0x026d930aL, 0x9c0906a9L, 0xeb0e363fL, 0x72076785L, 0x05005713L,
	0x95bf4a82L, 0xe2b87a14L, 0x7bb12baeL, 0x0cb61b38L, 0x92d28e9bL,
	0xe5d5be0dL, 0x7cdcefb7L, 0x0bdbdf21L, 0x86d3d2d4L, 0xf1d4e242L,
	0x68ddb3f8L, 0x1fda836eL, 0x81be16cdL, 0xf6b9265bL, 0x6fb077e1L,
	0x18b74777L, 0x88085ae6L, 0xff0f6a70L, 0x66063bcaL, 0x11010b5cL,
	0x8f659effL, 0xf862ae69L, 0x616bffd3L, 0x166ccf45L, 0xa00ae278L,
	0xd70dd2eeL, 0x4e048354L, 0x3903b3c2L, 0xa7672661L, 0xd06016f7L,
	0x4969474dL, 0x3e6e77dbL, 0xaed16a4aL, 0xd9d65adcL, 0x40df0b66L,
	0x37d83bf0L, 0xa9bcae53L, 0xdebb9ec5L, 0x47b2cf7fL, 0x30b5ffe9L,
	0xbdbdf21cL, 0xcabac28aL, 0x53b39330L, 0x24b4a3a6L, 0xbad03605L,
	0xcdd70693L, 0x54de5729L, 0x23d967bfL, 0xb3667a2eL, 0xc4614ab8L,
	0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL,
	0x2d02ef8dL
};

static unsigned int current_crc_value;

//
// BeginCRCStream
//  Begin a new CRC stream
//
void BeginCRCStream(void)
{
	current_crc_value = 0xFFFFFFFF;
}

 

 

Then, add bytes, ints, or strings to the stream. (Could probably be optimized a bit more...)

Spoiler

//
// AddCRCByte
//  Adds a byte into current CRC stream
//
void AddCRCByte(char value)
{
	current_crc_value = (current_crc_value >> 8) ^
		crc_table[(current_crc_value & 0xFF) ^ value];
}

//
// AddCRCInt
//  Adds an int into current CRC stream
//
void AddCRCInt(int value)
{
	int v = value;
	int n;

	for (n = 0; n < 4; n++)
	{
		current_crc_value = (current_crc_value >> 8) ^
			crc_table[(current_crc_value & 0xFF) ^ (v & 0xFF)];

		v >>= 8;
	}
}

//
// AddCRCStr
//  Adds a string into current CRC stream
//
void AddCRCStr(char *str)
{
	char *p;
	
	for (p = str; *p; p++)
	{
		current_crc_value = (current_crc_value >> 8) ^
			crc_table[(current_crc_value & 0xFF) ^ (*p)];
	}
}

 

 

And, finally, get the final CRC32 value:

Spoiler

//
// GetCRCValue
//  Returns the current CRC value
//
int GetCRCValue(void)
{
	return ~(current_crc_value);
}

 

 

Using this last function to get the value is required, because the standard required bit flip is postponed until you're ready for the final value. This allows the streaming to be a bit faster.

 

So, for each byte, we've got a right shift, a bitwise AND, 2 XORs, and a table lookup, which, hopefully ends up in a few cache lines. This is so fast that the bottleneck is, most probably in the disk I/O involved in reading the lumps, not the hash function. You could get "slick" and do the hashing in the background, in a thread, or in a pseudo-thread (in a timer, read a few entries between mouse/keyboard clicks), but that gets tricky.

 

MD5 is a bit more complex, and a bit slower, but still very fast, as it only does primitive logic transformations too (and, or, xor, bit-shift, add, etc.) The performance difference is probably not even noticeable, compared to the lump-read time.

 

@DaniJ: I know it's been a long time, but I was re-reading the last few pages of this thread, and I feel that I had a less-than-desirable attitude towards your concerns. In fact, I don't think I ever really understood what your concerns were. If you want to explain, I'd be interested to try to understand, but I don't want to fill up this thread. Maybe, if you wish, send me a PM - I'd be interested to understand better.
 

I am curious if any more progress has been made? Just looking for a status report, please :)

 

EDIT:

Oops, I thought the code windows would be collapsible. What tag should I use? (You know, a plus sign that you click which expands everything within tags to toggle visibility?

 

RE-EDIT: Woah, it works, even through an edit. And, it's nice: Being able to choose "C-style" formatting is awesome!

 

Edited by kb1
trying code within spoilers - it works!

Posted (edited)

Share this post


Link to post

I'm a bit behind on this topic (which started promising and gradually became depressing as I read on). Skip forward 3 years. Since you're all debating which hashing algorithm is faster, does that mean this is otherwise a done deal?

Share this post


Link to post
20 hours ago, kb1 said:

Oops, I thought the code windows would be collapsible. What tag should I use? (You know, a plus sign that you click which expands everything within tags to toggle visibility?

 

The idea would have been to embed the code windows within spoiler boxes, but it doesn't work (yet?).

Edited by Gez
nope, didn't work.

Posted (edited)

Share this post


Link to post
On 3/18/2017 at 9:14 PM, Linguica said:
  Reveal hidden contents


hello world

 

 

Yep, works like a champ! I went back and edited my post.

 

On 3/18/2017 at 7:19 PM, Jon said:

I'm a bit behind on this topic (which started promising and gradually became depressing as I read on). Skip forward 3 years. Since you're all debating which hashing algorithm is faster, does that mean this is otherwise a done deal?

No debate. A few posts back, sirjuddington said that his test of the CRC32 hash read was slow. If you're calling a library function to do the hashing, maybe try my included CRC32 code vs. a library call - it should blow away MD5, in theory, anyway. As far as hashing the whole WAD, it really shouldn't be very slow on a semi-modern box, unless the I/O is being done one byte at a time. If each lump is read in whole, or even in chunks of, say, 64K or so, a whole WAD should take about as much time as it takes to copy the WAD to another folder. The bonus here is that, afterwards, the entire WAD should be in disk cache memory, so manipulating it in the editor should be fast. I'm not sure about Slade, but some editors read the lump contents, in an attempt to auto-categorize the lump, so the hashing could be done during that phase.

 

Alternately, you could do the lump read/hash in the background, after displaying the contents, under the assumption that it will take the user a couple of seconds to begin editing anyway. This could be done to hide the delay caused by the lump read, though it must be done carefully. Unread lumps must be locked until they've been read. And, if any hashes are inconsistent, you'll probably want to tell the user and/or correct the lumps before allowing the lumps to be edited.

 

And, finally, you may want to paint the lump list in two passes: Pass 1 shows the standard info, and pass 2 shows long names, pseudo folders (if you're supporting those features), and the other new fields (create date, modify date, author, etc).

 

1 person likes this

Share this post


Link to post

The crc-32 code in SLADE is here if you're curious.

Share this post


Link to post
1 hour ago, Gez said:

The crc-32 code in SLADE is here if you're curious.

Yeah, that's nice, clean, fast code there. If there's a bottleneck, that ain't it :)

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now