sun_stealer

AI research and Doom multiplayer


27 minutes ago, sun_stealer said:

@kb1 I apologize for not being active here; I was quite busy with work and just life in general. I will provide my comments on what you said once I sort things out.

 

As a project update: I hit a major roadblock, being unable to train any good policies with memory (LSTMs), and I suspect bugs in the research framework I was using. I ended up rewriting the algorithm in PyTorch from scratch, and now it looks like it is working better, but it took a lot of effort.

No apologies necessary; please take your time. For a few posts there, we were going back and forth frequently, so I was a bit thrown when you didn't reply. My bad.

 

I don't envy you, chasing bugs. It's the type of system where it's tricky to even know that you have bugs, not to mention trying to track them down :)


@kb1

Please find my comments below:

 

> In a reward system, the worst possible action can be bounded at 0, yet the best possible action cannot be bounded (because it's infinite).

 

In reinforcement learning the standard approach is to use what is called a "discount factor". When we evaluate whether the action at timestep t was good or bad, we multiply the future reward/penalty received at timestep t+i by a coefficient close to 1 (e.g. 0.999) raised to the power i, i.e. 0.999^i.

The further we look into the future, the less attention we pay to those events. This gives nice numerical guarantees: the sum of discounted future rewards is always bounded.

The discount factor is also a parameter of the algorithm that lets us interpolate between a shortsighted agent (e.g. 0.9) and an agent that cares a lot about the far future (e.g. 0.99999). Although a high discount factor seems attractive, relatively shortsighted agents are much easier to train because the variance of their returns is much lower.

Bottom line: we try to avoid infinities. Whether rewards are positive or negative, the agent's objective is always bounded, even in the infinite-horizon case.
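To make this concrete, here is a minimal sketch in plain Python (illustrative only, not our actual training code) showing how the discounted sum of rewards stays bounded and how the discount factor controls shortsightedness:

```python
# Discounted return: sum of r_t * gamma^t. With rewards bounded by r_max,
# the return is bounded by r_max / (1 - gamma), even over an infinite horizon.
def discounted_return(rewards, gamma=0.999):
    return sum(r * gamma**t for t, r in enumerate(rewards))

rewards = [1.0] * 100_000                     # constant reward of 1 at every timestep
print(discounted_return(rewards))             # ~1000, close to the bound 1 / (1 - 0.999)
print(discounted_return(rewards, gamma=0.9))  # ~10: a much more shortsighted agent
```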

 

> If your reward system is mainly only using number of frags, here's the problem: Shots that kill are rewarded, but shots that do damage, but don't kill are not rewarded at all. This makes it very difficult to learn how to use the lesser weapons like fist, pistol, and chainsaw: If a single use doesn't cause a frag, there's no reward for using them, thus no knowledge to learn. Sure, eventually a single use will yield a kill, but this will appear as noise, along with the rare death caused by stepping into a nuke pool.

 

Currently, in fact, I am using a small reward for dealing damage, for exactly the reason you described. This is the "credit assignment problem": it is hard for the agent to learn a behavior (dealing damage without killing) whose benefit only shows up, say, 300 timesteps into the future.

Still, this is not impossible. There's always a balance between making the task easier by adding more reward shaping and keeping the agent unbiased so that it cares about nothing except the final objective.

One interesting approach to this is meta-optimization. You can have an inner learning loop that maximizes the shaped reward and an outer loop that optimizes the coefficients for rewarding particular events so as to maximize the single final goal: the number of frags. One such algorithm is called Population-Based Training, but there are many.
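As a rough illustration (the event names and coefficients below are hypothetical, not our actual reward spec), the shaped reward is just a weighted sum of per-timestep events, and those weights are exactly the knobs an outer loop like Population-Based Training could tune while the final metric remains the number of frags:

```python
# Hypothetical shaped reward: a weighted sum of per-timestep game events.
# The coefficients are the knobs a meta-optimization loop could adjust.
reward_coeffs = {"frag": 1.0, "damage_dealt": 0.01, "death": -0.5}

def shaped_reward(events, coeffs=reward_coeffs):
    # events: per-timestep counts, e.g. {"frag": 0, "damage_dealt": 12, "death": 0}
    return sum(coeffs[name] * count for name, count in events.items())

print(shaped_reward({"frag": 0, "damage_dealt": 12, "death": 0}))  # 0.12
```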

 

> You can actually combine both systems (add up and scale penalties, then scale and subtract rewards...or vice-versa).

 

Yeah, this is what we're currently doing. There is also a neural network called the critic that predicts the future rewards from any given situation. This prediction is then subtracted from the actual return to produce a quantity called the advantage. If the advantage is positive we did better than expected; if it's negative, we did worse.

Advantages are normalized (the mean is subtracted, then we divide by the standard deviation), so the sign and magnitude of the rewards do not actually matter that much. Only relative quantities matter.
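A minimal sketch of that advantage computation (NumPy, illustrative only, not our actual code):

```python
import numpy as np

returns = np.array([5.0, 2.0, 7.0, 1.0])   # actual discounted returns observed
values  = np.array([4.0, 3.0, 6.0, 2.0])   # critic's predictions for the same states

advantages = returns - values               # positive: did better than expected
advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
print(advantages)                           # zero mean, unit variance: only relative size matters
```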


> Gathering data can be challenging with a screen-pixels-only approach. In the case of causing damage, I suppose you could detect blood sprays. You could also read weapon counters, health, armor, and keys by "seeing" the on-screen numbers. Personally, I would dig these stats from within game memory. The stats are by no means hidden from a human...therefore not cheating.

 

We are feeding some info into the neural network directly: ammo, health, selected weapon, etc., but only information that's available to the player through the GUI.
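Schematically (a hypothetical PyTorch sketch, not our actual network architecture), the GUI-visible scalars are simply concatenated with the flattened convolutional features of the screen before the policy head:

```python
import torch
import torch.nn as nn

class DoomPolicy(nn.Module):
    """Toy policy: conv encoder for pixels + a few GUI-visible scalars (ammo, health, weapon)."""
    def __init__(self, num_actions=10, num_scalars=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        conv_out = self.conv(torch.zeros(1, 3, 72, 96)).shape[1]  # infer feature size
        self.head = nn.Sequential(
            nn.Linear(conv_out + num_scalars, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, pixels, scalars):
        # concatenate image features with the scalar game-state inputs
        return self.head(torch.cat([self.conv(pixels), scalars], dim=1))

policy = DoomPolicy()
logits = policy(torch.zeros(1, 3, 72, 96), torch.tensor([[50.0, 100.0, 2.0]]))  # ammo, health, weapon id
```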

 

> The scoring can also be a combination of accumulative stats, as well as stats altered during this tic.

> A very truncated example score system:

 

Your code actually looks rather similar to mine :)

 

> In closing, the concept I'm trying to advise on is that the scoring function is how you instill desire onto the AI. The more stats you can feed it, the better. Each of these stats should have weights assigned to them, that let the AI intelligently choose the best action, within a list of bad choices. The magnitude of such weights is not that important - what is important is the relative magnitude of one weight vs. the other. Some empirical testing will help you adjust those weights - it's not very difficult once you get into it.

 

Yes, this is not far from my approach.

My philosophy, though, is that by shaping the reward too finely we can instill undesirable human bias, e.g. we can discourage behaviors that would otherwise help maximize the number of frags. What if shooting the wall in a particular situation scares the opponent away and lets us get to a health pickup? Just an example.

 

There's an interesting take on this in the recently released DeepMind podcast, where they talk about their bots for the Capture the Flag game. The author of the paper says that the bots aren't necessarily "smarter" than human opponents at all times, but they are completely ruthless and relentless. E.g. if a bot is holding the flag and running past an opponent, it won't even turn to look if it knows it can score. It does not care about absolutely anything except the objective, and this allows the bots to be highly efficient and actually extremely hard to beat.

 


Sadly, there's no project update at this time. We decided to focus on some theoretical questions that we encountered while working on the project, namely how agents learn to use memory with recurrent neural networks, etc.

 

I am planning to go back to this project later :)


I was thinking again about porting VizDoom functionality into a client-server version of the game. Having the ability to actually run a 24/7 server with AI bots just makes the project so much cooler and gives me motivation to work on it.

 

I looked at the VizDoom codebase, and actually the difference between ZDoom and VizDoom is not that scary; it's quite manageable.

Also, I compared Zandronum and Odamex. Zandronum is based on a much newer version of ZDoom, around 2.8.1, just like VizDoom, so certain files have a one-to-one correspondence, which makes the porting much easier.

On the other hand, it looks like Odamex is based on an older version, and the differences are larger.

 

Do I actually gain anything by using Odamex over Zandronum? I can emulate "classic" Doom gameplay in Zandronum by configuring my server to restrict vertical mouse movement, jumps, etc., right?

What would you guys recommend using? Are there any other things I didn't consider?

 

@Fonze @Maes @Doomkid @Decay

 

BTW, I trained some policies based on recurrent neural nets, and this version of the bot is pretty insane: https://www.youtube.com/watch?v=Lk8OWLVGpVM

 


There are substantial differences between Odamex and Zandronum.

 

Zand is far more advanced and has many new features that Odamex lacks, as Odamex was based on ZDoom 1.22 from a few centuries ago. One strong advantage Odamex has over Zandronum is that it can record demo files compatible with the vanilla Doom executables; Zandronum can't do that. If you want to mimic vanilla gameplay, Odamex is going to be better for that overall, but Zandronum has all sorts of crazy stuff like skins, new game modes, OpenGL rendering, 3D floors and polyobjects, etc. Odamex does have some other random cool features that Zand lacks, such as multi-wad rotations for servers and a better rcon system, but these are limited-use features for most users (apparently).

 

Long story short, if using Zandronum is easier, do that. They're both very widely cross-compatible, so that shouldn't really be an issue either!


The obvious answer is Zandronum, because it's easier for you to port and it's the most widely played client-server port. However, I personally would love to see this in Odamex, because it currently has no bot support at all and desperately needs it.

 

Both :)

16 hours ago, Hekksy said:

The obvious answer is Zandronum because its easier for you to port and the most widely played client server port. However, I personally would love to see this in Odamex because it currently lacks bot support at all and desperately needs them. 

 

Both :)

Ackshually, I went back to working on bots for Odamex. They are the ZCajun bots, so expect them to be garbage at first, though they really aren't that hard to improve. It just takes a little bit of patience and a couple of hours.

 

In Odamex, it would be rather hard to use the bots, since you would have to open multiple instances and, I imagine, have them all visible on-screen at the same time, which is obviously very impractical and not really comfortable at all.

On 9/10/2019 at 9:45 AM, Avoozl said:

I am surprised to see that this thread isn't by GoatLord.

 

It doesn't involve rectally insertable computers (AFAIK), so why would it be?


Bear with me when I bring this up..

But the old Doom Legacy version with "ACBOT" was actually a pretty good bot for its time, for deathmatch anyway. Co-op, not so good.

One of them was supposedly based on my movements, node movement somehow; I'm not sure how TonyD did that. In any case, the bot's name was "YOMOMMA", as he asked me what I wanted to call the bot.

 

 


Sorry for the bump; I'm just curious if any more work has been done on this? It's pretty fascinating stuff.


I would really like to see a good deathmatch AI for Doom, as long as it is not powerful only because of overpowered aim, latency control, hitbox detection, or insane reaction time. I mean sure, I would like that too, but that's nothing impressive compared to the game intelligence required by some of the maps... if there is any. Maybe AI can show us how foolish our game intelligence actually is. I wouldn't be too surprised, because the Dota 2 AI showed us humans some new interesting tactics and beat us... and then humans learned them and beat the AI again.


Hi everyone! I am happy to announce that this project is not dead. It changed course a bunch of times, and there were complications, but finally we are planning to publish a paper at a big machine learning conference about a very fast, scalable architecture for deep reinforcement learning. We will report experiments with Doom environments, with agents learning how to act in several game modes from raw pixel observations, just like humans do.

 

Now, the interesting part! Doom lovers have a chance to contribute to a scientific publication! For some of our experiments we would like to report the "human baseline". In order to collect the data we're asking Doom players of all levels to play four game modes and submit the results. The whole process should not take more than 30 minutes.

 

We are asking volunteers to try to score as many kills as possible in four game modes:

- two single player environments against monsters

- duel against a bot

- 8-player deathmatch against bots

The objective is to kill as many opponents as possible; other statistics, e.g. the number of deaths, do not matter. The only goal is to be the most effective killing machine. It makes sense to really try, because the participants will represent humanity in all its glory! Making several attempts is allowed.

We have no intention of misrepresenting the results, and beating humans in Doom is not the point of this paper. In fact, we have no illusions that it will be possible to beat humans in a fair 1v1 without some serious work. The goal here is to establish the human baseline for these particular tasks and create a reference point for future work.

 

Here are the instructions:

1) Download the archive and unzip it to C:/your/preferred/dir: doom_human_play_scripts.zip

2) Install Miniconda for Python 3.7: https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe (accept all the default options during installation)

3) After installation, search for "Miniconda" in the Windows Start menu to open the Anaconda prompt (a command-line window with the Python environment)

4) Navigate to the directory in the command line by typing "cd C:/your/preferred/dir" (without quotes)

5) Run "create_env.bat" to install the Python dependencies

6) Run "run_singleplayer_1.bat" to play single-player environment #1

Use run_singleplayer_2.bat, run_duel.bat, and run_deathmatch.bat for the other three environments

7) After you have played the desired number of games in each environment, please zip the "results" folder inside the script directory and send it to petrenko@usc.edu with the subject "Doom study" (or post it here)

8) In the email, please mention whether you are a "novice", "amateur", or "expert" Doom player!

 

Currently we only have a Windows version, but if there are many people on other platforms, we can make a package for Linux/macOS too. Please let me know! If there are problems with the installation, please let me know too.

 

My results are 37 & 19 in the single-player environments, 14 in the duel, and 32 in the deathmatch. I am an amateur at best, playing with no practice, so this should be easy to beat! Let's have a little competition in this thread to see who can score the highest :) I will post a link to the paper when it's out so we can compare against the agent's results.

 


I see people are not extremely excited about this :)

@Fonze @Maes @Doomkid @Decay

I'd really appreciate your help! The deadline is Feb 5th.

This is not critical for the paper, but I think it'd be a really cool addition and could help people make sense of the numbers in the paper. This is what researchers often do with experiments in other domains, e.g. Atari games.

 

If you have feedback of any kind, please post here. Maybe I made the instructions too scary ;D It actually takes just 5 minutes to go through them!


Hey, I think a lot of people have just taken a back seat, watching what happens here as you work through things. It may have been good to give people a bit more time to help you, though; Feb 5th is a couple of days away! This won't leave you much time to get very many people's results, and the time is cut even shorter for those not on Windows.

 

I'll be happy to help, though setting up Python will be a new experience for me. I do wonder if I can get a bit more info on each of the wads/game modes you'd like people to play (I'm at work for the next several hours).


If you're involving actual players in the research, shouldn't that require an informed consent document?

10 hours ago, Fonze said:

Hey, I think a lot of people have just taken a back seat to seeing what happens here as you work through things. It may have been good to give people a bit more time to help you though; Feb 5th is a couple days away! This wont leave you much time to get very many people's results, and time cut shorter for those not on windows.

 

I'll be happy to help, though setting up python will be a new experience for me. I do wonder if I can get a bit more info on each of the wads/game modes youd like people to play (at work for the next several hours)

 

Yes, we were not actually planning to do this at all, at least not for this publication, but we thought it would be a great addition to the experimental section. I appreciate the feedback. Time is indeed short. Even if we don't collect enough results, I'm still grateful to the people sending their scores; they will still be useful in future work.

I think if we ever publish a follow-up paper on this, we will do it properly, with cash rewards for participation, bounties for the highest scores, etc. But this will take a long time to get approved by the university, because bureaucracy.


Regarding the game modes: the two single-player ones are randomly generated maps where you need to find and kill as many monsters as possible in a short amount of time. The 8-player deathmatch is on Dwango5 MAP01 versus 100% in-game bots. The duel is on an SSL2 map, also against a 100% bot.

 

6 hours ago, Decay said:

If you're involving actual players in the research, shouldn't that require an informed consent document?

 

I'm not sure if it's required for this kind of research. The results will, of course, be completely anonymous, and we won't store any personal information from players. Thanks for the feedback though; I will double-check that.

 


I've been somewhat time-strapped; since the deadline is the 5th, I'll do my best to get some results to you before then. Rest assured, I'm very excited by this research!


I was running through this last night and it was pretty cool. The whole spawn invuln threw me off in the duels and the DM. The AI was fairly predictable in the duel test: I could get him to go to the same spot every time after the spawn protection wore off and get an easy one-shot kill, until I ran outta ammo. I am going to run some more tests tonight and send ya the results.

 

Pretty cool stuff.


@sun_stealer I've set up ViZDoom to run so I can watch the AI figure out how to play. I've added the bot 'Rambo' and he's running around shooting stuff (poorly). Will he actually incrementally improve? He already seems to know how to pick up medikits and open doors. If he does improve, is that information stored in a particular file?

 

Edit: I'm sure I'm completely off the mark as to how training the agents actually works. I'm doing some reading now.

