
Synchronization of pseudorandom number generators across a network feasible?


Right now, Odamex's monsters stay synchronized between client and server by keeping the seeds of their pseudorandom number generators synchronized. Each monster has its own index into Doom's random number table, which gets incremented every time they access it (I realize this isn't very random at all, but that is irrelevant to synchronization as far as I know, so unless I'm wrong on this I'd like to not get sidetracked in this direction). They then use that for gameplay-related randomness.
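Roughly, the per-monster indexing amounts to something like this (a minimal sketch with assumed field and function names, not Odamex's actual code):

```c
// Each actor keeps its own cursor into Doom's fixed 256-entry random table;
// the cursor advances by one on every access and wraps around.
// Field/function names here are illustrative only.
extern unsigned char rndtable[256];   // Doom's precomputed table of "random" bytes

typedef struct actor_s
{
    int rndindex;                     // this monster's position in rndtable
    /* ... position, state, target, etc. ... */
} actor_t;

static int P_ActorRandom(actor_t *actor)
{
    actor->rndindex = (actor->rndindex + 1) & 0xFF;   // advance, wrap at 256
    return rndtable[actor->rndindex];
}
```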

However, the monsters also desync quite a bit. The indexes are synchronized every 10 tics along with other monster properties like momentum, state, and target, but slight timing issues might be throwing things off, even when they're updated every single tic. The indexes sometimes seem to randomly be off by a couple, even if it's just you and a cyberdemon in a square room. This seems to occur more if you shoot said cyberdemon. I'm guessing there's some kind of information asymmetry or mistiming issue, but it's hard to track down because there's quite a bit of duplicated code between the client and server (ouch).

Is it feasible to make this system work reliably, or is it better to take the gameplay-related randomness completely out of the client, like Skulltag 97c2 does? I'm thinking it's too complicated and unreliable to try to make sure the client and server make exactly the same number of calls to every monster's random number table, but if someone here knows anything, please come forward!


I was under the impression that Skulltag uses a client-server architecture so why is there a need to sync the client's playsim RNG when that should be handled by the server?

DaniJ said:

I was under the impression that Skulltag uses a client-server architecture so why is there a need to sync the client's playsim RNG when that should be handled by the server?

In Odamex, routines like P_CheckMissileRange, P_TryWalk, P_NewChaseDir, and A_Chase are still called by the client to obtain a local representation of the gamestate, but the server updates clients with new monster information (including the random table index) to resync them every 10 tics.
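To give an idea of what gets pushed, the 10-tic update carries roughly this information per monster (a hypothetical layout for illustration; the actual Odamex message format differs):

```c
// Hypothetical resync payload, one per monster, sent by the server every
// 10 tics; the client overwrites its local actor with these values.
typedef int fixed_t;                  // Doom's 16.16 fixed-point type

typedef struct
{
    unsigned int netid;               // which actor this update is for
    fixed_t      x, y, z;             // position
    fixed_t      momx, momy, momz;    // momentum
    int          statenum;            // current state
    unsigned int target_netid;        // netid of the actor's target (0 = none)
    int          rndindex;            // random table index, resyncs the PRNG
} monster_update_t;
```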

Skulltag 97c2 avoids doing those calculations on the client at all, relying more on the server to inform the client, and eliminating the need to synchronize pseudorandom numbers.

Which approach do you think is better?

Spleen said:

In Odamex, routines like P_CheckMissileRange, P_TryWalk, P_NewChaseDir, and A_Chase are still called by the client to obtain a local representation of the gamestate, but the server updates clients with new monster information (including the random table index) to resync them every 10 tics.

Problem is, this is just pure guesswork and very error-prone to implement well.

Which is better depends on your POV but in my book the best solution is the one which reduces in-play packet transfer frequency and/or client-side gamestate variance.

Bear in mind I'm no expert on those ports, but it sounds like Odamex is using those routines for client-side prediction (and so is Skulltag). If Skulltag is removing them client-side, that means they are being replaced with more generic prediction logic (no randomness).


Well, Skulltag 97c2 updates monsters every 3 tics. If you change it to every 10 tics like in Odamex, they become really jerky. So it's a bit of a trade-off between "transfer frequency" and "gamestate variance".


I'm no expert, but I think that the PRNG sync isn't the problem. There's either a discrepancy between the client and server code or packets are being dropped. It could also be both. I would just use the Skulltag approach and ignore the client PRNG during C/S games. I can't think of anything you'd need the PRNG for; I think all state is already sent in Oda's network messages anyway; just send them more frequently if things get jerky (or make it a CVAR).


In all network-based games, a tradeoff is usually made between 100% status accuracy and the appearance of fluidity.

The most anal (and precise from an informational point of view) approach would be to send the PRNG index EACH TIC to EVERY CLIENT with mutual acknowledgement, and THEN AND ONLY THEN advance to the next tic, slowing down or even pausing the game for everybody if necessary: this is the so-called "synchronous" network model, where everything works in a strict lockstep.
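In pseudo-Doom terms the lockstep tic boils down to something like this (BroadcastState, AllClientsAcked and WaitForPackets are made-up helpers; only G_Ticker and prndindex are real vanilla names):

```c
// Strict lockstep: nobody advances to tic N+1 until every node has
// acknowledged tic N, so the PRNG index stays bit-identical everywhere.
extern int prndindex;                        // vanilla Doom's global table index

void BroadcastState(int tic, int rndindex);  // hypothetical helpers
int  AllClientsAcked(int tic);
void WaitForPackets(void);
void G_Ticker(void);                         // real: runs one tic of the game

void RunLockstepTic(int tic)
{
    BroadcastState(tic, prndindex);

    while (!AllClientsAcked(tic))            // the whole game stalls here if
        WaitForPackets();                    // any client is slow or lossy

    G_Ticker();                              // only now advance the simulation
}
```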

This has the advantage of having a 100% consistent status across all nodes, at the expense that slowdowns/dropouts affect everyone. This is only really feasible over a LAN, or else the game has to be slowed according to the lowest common denominator, i.e. the slowest and laggiest connection will dominate.

In practice, most games use an asynchronous or "freewheeling" network model, where clients are allowed to keep their own status and syncing is done sparsely or even never. This has the advantage that the game is never explicitly frozen/slowed down for anyone and thus appears to run "continuously" and "smoothly", but status can get VERY inconsistent, and in some games this can fuck up multiplayer gameplay way more than slowing/pausing. Yet, gamers seem to have embraced this solution, because it appears "faster". Well, "faster" as in the "let's just plough on without giving a damn if things get out of sync" kind of "faster", which IMHO sucks ass. But whatever.

Now, I'm not sure what vanilla Doom used for its networking model, but I know for a fact that some ports like ZDaemon can get pretty out of sync, and with pings > 100 you either have to camp, rocket-spam, BFG-spam etc. else there's no fucking way you can win a straight one-to-one.


I personally wouldn't implement an acknowledgment message in my protocol; if I wanted that I'd just use TCP and disable Nagle. Furthermore you don't strictly need to lag all other clients while waiting for messages from one. I'd probably use a thread to poll the network socket, and place the packets in a queue. Or if you hate threads, just have the main Doom loop do it before TryRunTics() or whatever it is (I think Odamex does it this way); with no timeout value you wouldn't be lagging the server with high-latency/low-bandwidth clients, just getting whatever data is currently available.
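Something like this is what I mean by the no-timeout polling variant (a sketch only; the socket handling and QueuePacket are assumptions, not Odamex's code):

```c
// Drain whatever datagrams are waiting and never block: call this once per
// frame before TryRunTics(), so a slow or silent peer can't stall the loop.
#include <sys/types.h>
#include <sys/socket.h>
#include <stddef.h>

void QueuePacket(const unsigned char *data, size_t len);  // hypothetical queue

void NET_PollSocket(int sock)
{
    unsigned char buf[2048];

    for (;;)
    {
        ssize_t len = recv(sock, buf, sizeof(buf), MSG_DONTWAIT);
        if (len <= 0)
            break;                    // nothing left to read right now (EAGAIN)
                                      // or an error; handle errors elsewhere
        QueuePacket(buf, (size_t)len);
    }
}
```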

The real difficulty with C/S network code is dealing with latency discrepancies. Like Client A sees a completely different world than Client B, so should the server try and compensate for that (yes!)? Even a little latency can cause problems like Client B killing Client A, Client A not receiving that message yet and activating a switch that spawns 700 cacodemons. In this case, the server has to be smart enough to disregard the switch activation message, and the client has to not just spawn 700 cacodemons without the server saying it's OK. Waiting for server acknowledgment is a big pain in competitive games though; imagine trying to open doors or activate lifts and waiting 250ms for it to work. Even worse, imagine playing against a player with 250ms latency, watching lifts instantly descend and your opponent fly up them like it's nothing. Good luck shooting something that is never where your client thinks it is.

Re: Odamex, I think the best possible network play would be to broadcast client commands every TIC. There's a number of obstacles, not the least of which is the server and client having completely separate (but only slightly different) codebases regarding Actor movement, and the use of UDP (packet loss) will make some kind of sync necessary. But if you do that, player movement prediction (both first-person and opponent) is pretty much just taken care of. Otherwise you're going to end up syncing player positions quite often, and probably implementing some kind of interpolation to avoid the resulting jerky player movement.
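For what it's worth, the "broadcast client commands every TIC" idea is basically this (a sketch with assumed server-side helpers; ticcmd_t and MAXPLAYERS are vanilla Doom names):

```c
// Each tic, the server relays every player's ticcmd_t to every other client,
// so clients can run the same movement code locally instead of waiting for
// position resyncs. SV_SendToClient is a hypothetical transport helper.
#include "d_ticcmd.h"                 // vanilla Doom's ticcmd_t definition

#define MAXPLAYERS 4                  // as in vanilla doomdef.h

void SV_SendToClient(int clientnum, int gametic,
                     const void *data, int len);   // hypothetical

void SV_BroadcastCommands(int gametic, const ticcmd_t cmds[MAXPLAYERS])
{
    for (int dest = 0; dest < MAXPLAYERS; dest++)
        for (int src = 0; src < MAXPLAYERS; src++)
            if (src != dest)
                // The tic stamp lets the receiver know which tic these
                // commands belong to, and spot gaps from packet loss.
                SV_SendToClient(dest, gametic, &cmds[src], sizeof(ticcmd_t));
}
```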

It's probably also worth saying that every so often I read that someone is concerned about client bandwidth use. When ZDaemon was hyping its mythical 1.09 version, one of the big bullet points was very low bandwidth usage. OK, ZDaemon uses less than 10K/s; I can literally run 70 ZDaemons before I run out of bandwidth. I'd really prefer smoother player movement and fewer desyncs (corpses, invisible players, flag touches/picks/returns/captures not being announced, etc.) over moving down to 8K/s. There is absolutely no game that a dial-up user can currently play and not get royally screwed, whether by lag and packet loss or simply by not being allowed to join the server in the first place. I personally think catering to that small group of users is insane, and I'd urge you to consider the fact that most gamers have at least 512kbps upload (~60K/s), and we're talking about sending a ticcmd_t over the wire 35 times a second. Even if we count the extra byte at the beginning of the network message, it's really just 280 bytes/second. So go hog wild. You can always just keep Oda's current architecture and make it a client cvar whether they'd prefer to receive client commands and infrequent position updates, or no client commands and frequent position updates.


I was not talking about how the code implements waiting for acks (it could just as well cycle animations in place or compute a Mandelbrot fractal, for all I care).

I was speaking about having a strict requirement that NO TIC GOES AWOL. I had the "pleasure" of modding an RTS game written in C++ (Warlords Battlecry III) whose network receiver module did use TCP, but the packets had to be split-microsecond accurate in order to maintain consistent status, at least when using "Internet play". If a packet arrived out of order, it was discarded UNLESS it was a message issued by the server (the hosting player, in this case, since there was no centralized server system; PvP was always P2P... or almost).

This meant of course that the "host" player got an unfair advantage, for his commands would ALWAYS get executed once they reached the other players, while those coming from non-host peers would be discarded. This created plenty of opportunities for e.g. initiating a "zerg rush" type attack and spamming the network with APM events: the host's attacks always hit, while the defender could not counter.

The LAN code actually had the option of asking for re-transmission of out-of-order packets, and in fact the same game played over Hamachi (a VPN) could use the LAN mode over the Internet, resulting in more consistent (and fair) matches.

So that's what I meant: either you have a 100% lax "asynchronous" network model, or a strict, lockstepped, synchronous one with occasional pauses or even REWINDING of status to a status quo ante, until ALL packets arrive perfectly in order and status is bit-accurate across clients. Whether you have a 100% blocking call or several threads computing Fibonacci numbers in Bignum arithmetic in parallel while waiting for a resync is another matter.


Yeah, I think these days a P2P architecture is a mistake for network gaming, just because of the advantage it gives the host... although now that I'm thinking about it I'm not sure why you need a "host" per se, just the address of someone in the "game" from whom you can get the addresses of all the other players. It seems overly complicated though, which is probably why most "P2P" architectures are really "my client is also a server" architectures.

Like you said, the tradeoff is always fidelity versus latency, and I personally think it should be up to the client. For instance, the way ZDaemon handles packet loss is to just use re-syncing. So if I haven't received a position update for Client A in a while (whether I'm dropping packets or they are), they're just stuck there until I receive another one. When I DO receive another one, they warp to that location. I think ZDaemon has some restrictions on how far away from your current position you can move on a per-tic basis (when not teleporting), but all that aside, it just makes aiming very difficult when players are warping all around you because the netcode doesn't mind dropping TICs, or you yourself are warping around because your client is constantly re-syncing.

But while there's nothing you can do if you haven't received a ticcmd_t message from a client, it is possible to avoid the crude position snapping. Because client TICs are sent with every ticcmd_t packet, the server knows how lagged each client is relative to it and every other client, so it can take that into account when calculating collisions. In our example, the server knows that Client B is now seeing Client A 35 TICs further into the past already anyway; Client A just freezes on Client B's screen for a while (maybe flashes some kind of connection problem icon). If Client B tries to shoot Client A, the server won't register the damage/kill yet because it doesn't know where Client A is at that TIC -- assuming it's Client A's problem. If it's Client B's problem then either the server and Client A will eventually receive B's packets (and vice-versa), or B's connection will be dropped by the server.
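A crude way to picture the "don't register the kill yet" part (assumed names and an arbitrary staleness cutoff, not any port's real code):

```c
// The server remembers the last tic it received a ticcmd_t from each client;
// if the target's data is too stale, the hit is simply not awarded yet.
#define MAX_STALE_TICS 35             // ~1 second at 35 tics/sec, arbitrary

typedef struct
{
    int last_cmd_tic;                 // tic stamp of the newest ticcmd_t seen
    /* ... position history for lag compensation, etc. ... */
} client_t;

int SV_CanRegisterHit(const client_t *target, int current_tic)
{
    // If we don't know where the target "really" is at this tic, wait:
    // either its packets eventually arrive and the hit can be resolved,
    // or the server drops its connection.
    return (current_tic - target->last_cmd_tic) <= MAX_STALE_TICS;
}
```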

So to boil it down, I think I'm talking about an "eventually synchronous" model that's tolerant of and compensates for latency. I would probably use TCP because packet loss is the goddamn devil, and I'd probably make the client wait on server acknowledgment of line activations, spawns/deaths, etc., because rewinding logic is just too difficult or laborious to do right. But I wouldn't care if clients dropped their connections or fell behind due to latency spikes.


Each object can use its own random number index. This would break vanilla behavior though, so it's not feasible for Odamex.

GhostlyDeath said:

Each object can use its own random number index. This would break vanilla behavior though, so it's not feasible for Odamex.

Each object already uses its own random number index in Odamex, at least online. :P

Ladna and Maes: thank you, you guys have some interesting insights.

