sun_stealer

Members
  • Content count

    30

About sun_stealer

  • Rank
    Green Marine


  1. sun_stealer

    AI research and Doom multiplayer

    Yes, we were not actually planning to do this at all, not for this publication, but we thought it would be a great addition to the experimental section. I appreciate the feedback. Time is indeed short. Even if we don't collect enough results, I'm still grateful to everyone sending their scores; it will still be useful in future work. I think if we ever publish a follow-up paper on this, we will do it properly, with cash rewards for participation, bounties for the highest scores, etc. But that would take a long time to approve with the university because of bureaucracy.

    Regarding the game modes: the two singleplayer ones are randomly generated maps where you need to find and kill as many monsters as possible in a short amount of time. The 8-player deathmatch is on Dwango5 MAP01 versus 100% in-game bots. The duel is on the SSL2 map, also against a 100% bot.

    Not sure if that's required for this kind of research. The results will, of course, be completely anonymous and we won't store any personal information from players. Thanks for the feedback though, I will double-check.
  2. sun_stealer

    AI research and Doom multiplayer

    I see people are not extremely excited about this :) @Fonze @Maes @Doomkid @Decay I'd really appreciate your help! The deadline is Feb 5th.

    This is not critical for the paper, but I think it'd be a really cool addition and could help people make sense of the numbers in the paper. This is what researchers often do with experiments in other domains, e.g. Atari games. If you have feedback of any kind, please post here.

    Maybe I made the instructions too scary ;D It actually takes just 5 minutes to go through them!
  3. sun_stealer

    AI research and Doom multiplayer

    Hi everyone! I am happy to announce that this project is not dead. It changed course a bunch of times, and there were complications. But we are finally planning to publish a paper at a big machine learning conference about a very fast, scalable architecture for deep reinforcement learning. We will report experiments with Doom environments, with agents learning how to act in several game modes from raw pixel observations, just like humans do.

    Now, the interesting part! Doom lovers have a chance to contribute to a scientific publication. For some of our experiments we would like to report a "human baseline". To collect the data, we're asking Doom players of all levels to play four game modes and submit the results. The whole process should not take more than 30 minutes.

    We are asking volunteers to try to score as many kills as possible in four game modes:
    - two single-player environments against monsters
    - duel against a bot
    - 8-player deathmatch against bots

    The objective is to kill as many opponents as possible; the number of deaths, for example, does not matter. The only goal is to be the most effective killing machine. It makes sense to really try, because the participants will represent humanity in all its glory! Making several attempts is allowed. We have no intention to misrepresent the results, and beating humans in Doom is not the point of this paper. In fact, we have no illusions that it will be possible to beat humans in a fair 1v1 without some serious work. The goal here is to show the human baseline for these particular tasks and create a reference point for future work.

    Here are the instructions:
    1) Download the archive and unzip it to C:/your/preferred/dir: doom_human_play_scripts.zip
    2) Install Miniconda for Python 3.7: https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe Follow all the default options during installation.
    3) After installation, search "Miniconda" in the Windows start menu to open the Anaconda prompt (a command-line window with a Python environment).
    4) Navigate to the directory in the command line by typing "cd C:/your/preferred/dir" (without quotes).
    5) Type "create_env.bat" to install the Python dependencies.
    6) Type and execute the command "run_singleplayer_1.bat" to play single-player environment #1. Use run_singleplayer_2.bat, run_duel.bat, and run_deathmatch.bat for the other three environments.
    7) After the desired number of games in each environment has been played, please zip the folder "results" inside the script dir and send it to petrenko@usc.edu with the subject "Doom study" (or post it here).
    8) In the email, please mention whether you are a "novice", "amateur", or "expert" Doom player!

    Currently we only have a Windows version, but if there are many people on other platforms, we can make a package for Linux/macOS too. Please let me know! If there are problems with installation, please let me know too.

    My results are: 37 & 19 for singleplayer, 14 in duel, 32 in deathmatch. I am an amateur at best, and played with no practice, so this should be easy to beat! Let's do a little competition in this thread to see who can score the highest :) I will post a link to the paper when it's out so we can compare to the agent's results.
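    For the curious, here is a rough idea of what one of these play scripts might do under the hood. This is only a sketch assuming the standard VizDoom Python API; the scenario config name and results file path are made up for illustration, and the actual .bat scripts in the archive may work differently.

    import vizdoom as vzd  # pip install vizdoom

    game = vzd.DoomGame()
    game.load_config("singleplayer_1.cfg")  # hypothetical scenario config
    game.set_mode(vzd.Mode.SPECTATOR)       # the human plays, the script only records
    game.set_window_visible(True)
    game.init()

    game.new_episode()
    while not game.is_episode_finished():
        game.advance_action()               # advance the game with the human's input

    # Score for the singleplayer scenarios: monsters killed this episode
    kills = game.get_game_variable(vzd.GameVariable.KILLCOUNT)
    with open("results/singleplayer_1.txt", "a") as f:  # hypothetical results file
        f.write(f"{kills}\n")
    game.close()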
  4. sun_stealer

    AI research and Doom multiplayer

    Thanks a lot! I'll give it a go with Zandronum
  5. sun_stealer

    AI research and Doom multiplayer

    I was thinking again about porting the VizDoom functionality into a client-server version of the game. Having the ability to actually run a 24/7 server with AI bots just makes the project so much cooler and gives me motivation to work on it. I looked at the VizDoom codebase, and the difference between ZDoom and VizDoom is actually not that scary, quite manageable.

    I also compared Zandronum and Odamex. Zandronum is based on a much newer version of ZDoom, around 2.8.1, just like VizDoom. Therefore certain files have a 1-to-1 correspondence, which makes the porting so much easier. Odamex, on the other hand, looks like it's based on an older version, and the differences are larger. Do I actually gain anything by using Odamex over Zandronum? I can emulate "classic" Doom gameplay in Zandronum by configuring my server to restrict vertical mouse movement, jumps, etc., right? What would you guys recommend using? Are there any other things I didn't consider? @Fonze @Maes @Doomkid @Decay

    BTW, I trained some policies based on recurrent neural nets, and this version of the bot is pretty insane :) https://www.youtube.com/watch?v=Lk8OWLVGpVM
  6. sun_stealer

    AI research and Doom multiplayer

    Sadly, there's no project update at this time. We decided to focus on some theoretical things that we found while working on the project, namely how agents learn to use memory with recurrent neural networks etc. I am planning to go back to this project later)
  7. sun_stealer

    AI research and Doom multiplayer

    @kb1 Please find my comments below:

    > In a reward system, the worst possible action can be bounded at 0, yet the best possible action cannot be bounded (because it's infinite).

    In reinforcement learning the standard is to use what is called a "discount factor". When we evaluate whether the action at timestep t was good or bad, we multiply the future reward/penalty received at timestep t+i by a coefficient close to 1 (e.g. 0.999) raised to the power i, i.e. 0.999^i. The further we look into the future, the less attention we pay to future events. This gives nice numerical guarantees: the sum of future rewards is always bounded. It is also a parameter of the algorithm that lets us interpolate between a shortsighted agent (e.g. 0.9) and an agent that cares a lot about the far future (e.g. 0.99999). Although using a high value for the discount factor seems attractive, relatively shortsighted agents are much easier to train because the variance of returns is much lower. Bottom line: we try to avoid infinities; whether rewards are positive or negative, the agent's objective is always bounded, even in the infinite-horizon case.

    > If your reward system is mainly only using number of frags, here's the problem: Shots that kill are rewarded, but shots that do damage, but don't kill are not rewarded at all. This makes it very difficult to learn how to use the lesser weapons like fist, pistol, and chainsaw: If a single use doesn't cause a frag, there's no reward for using them, thus no knowledge to learn. Sure, eventually a single use will yield a kill, but this will appear as noise, along with the rare death caused by stepping into a nuke pool.

    Currently, in fact, I am using a small reward for dealing damage for exactly the reason you described. This is called the "credit assignment problem": it is hard for the agent to learn behavior (dealing damage without killing) that pays off only, let's say, 300 timesteps into the future. Still, it's not impossible. There's always a balance between making the task easier by adding more reward shaping versus making the most unbiased agent that cares about nothing except the final objective. One interesting approach to this is meta-optimization: you can have an inner learning loop that maximizes the shaped reward and an outer loop that optimizes the coefficients for rewarding certain events so as to maximize only one final goal, the number of frags. One such algorithm is called Population-Based Training, but there are many.

    > You can actually combine both systems (add up and scale penalties, then scale and subtract rewards...or vice-versa).

    Yeah, this is what we're currently doing. There is also a neural network called the Critic that predicts the future rewards in any given situation. This prediction is then subtracted from the actual return to produce a quantity called the Advantage. If the advantage is positive we did better than we expected; if it's negative we did worse. Advantages are normalized (mean subtracted, divided by standard deviation), so the sign and magnitude of the rewards do not actually matter that much; only relative quantities matter. (A small sketch of this computation is at the end of this post.)

    > Gathering data can be challenging with a screen-pixels-only approach. In the case of causing damage, I suppose you could detect blood sprays. You could also read weapon counters, health, armor, and keys by "seeing" the on-screen numbers. Personally, I would dig these stats from within game memory. The stats are by no means hidden from a human...therefore not cheating.

    We are feeding some info into the neural network directly: ammo, health, selected weapon, etc., but only the information that's available to the player through the GUI.

    > The scoring can also be a combination of accumulative stats, as well as stats altered during this tic. A very truncated example score system:

    Your code actually looks rather similar to mine :)

    > In closing, the concept I'm trying to advise on is that the scoring function is how you instill desire onto the AI. The more stats you can feed it, the better. Each of these stats should have weights assigned to them, that let the AI intelligently choose the best action, within a list of bad choices. The magnitude of such weights is not that important - what is important is the relative magnitude of one weight vs. the other. Some empirical testing will help you adjust those weights - it's not very difficult once you get into it.

    Yes, this is not far from my approach, although my philosophy is that by shaping the reward too finely we can instill undesirable human bias, i.e. discourage behaviors that would otherwise help to maximize the number of frags. What if shooting the wall in a particular situation scares away the opponent and allows us to reach a health pickup? Just an example. There's an interesting take on this in the recently released DeepMind podcast, where they talk about their bots for the Capture the Flag game. The author of the paper says that the bots aren't necessarily "smarter" than human opponents at all times, but they are completely ruthless and relentless. E.g. if a bot is holding a flag and running past an opponent, it won't even turn to look if it knows it can score. It does not care about absolutely anything else except the objective, and this allows the bots to be highly efficient and actually extremely hard to beat.
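    To make the discount factor and advantage computation above concrete, here is a minimal NumPy sketch. This is my assumption of a typical implementation, not the exact code used in the project:

    import numpy as np

    def discounted_returns(rewards, gamma=0.999):
        # Sum of future rewards, each weighted by gamma**i; bounded even for long episodes.
        returns = np.zeros(len(rewards))
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            returns[t] = running
        return returns

    def normalized_advantages(returns, values):
        # Advantage = actual return minus the critic's prediction, then normalized,
        # so only the relative magnitudes of rewards matter.
        adv = returns - values
        return (adv - adv.mean()) / (adv.std() + 1e-8)

    # Example: rewards and critic predictions for a short 5-step episode
    rewards = np.array([0.0, 0.0, 1.0, 0.0, -0.5])
    values = np.array([0.3, 0.4, 0.8, -0.1, -0.4])
    print(normalized_advantages(discounted_returns(rewards), values))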
  8. sun_stealer

    AI research and Doom multiplayer

    @kb1 I apologize for not being active here, I was quite busy with work and just life in general) I will provide my comments on what you said once I clear things up. As a project update: I hit a major roadblock, being unable to train any good policies with memory (LSTMs), and I suspect bugs in the research framework I was using. I ended up rewriting the algorithm in PyTorch from scratch, and now it looks like it is working better. But it took a lot of effort.
  9. sun_stealer

    AI research and Doom multiplayer

    Currently, the main reward term is just the number of frags. I tried increasing the death penalty, but it makes learning extremely hard in the beginning, because an untrained agent can't do anything and gets killed all the time. So the agent gets really afraid of dying and hides somewhere to reduce the rate at which it's getting killed, rather than going out into the world to explore. The current agent isn't really afraid of dying; in fact, I think it sometimes dies on purpose to get new weapons and full health :D
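    As an illustration of how such a per-step reward term might be put together (the coefficients below are invented for the example and are not the values used in the project):

    def shaped_reward(frags_delta, damage_dealt, died,
                      frag_reward=1.0, damage_coef=0.01, death_penalty=0.0):
        # Main term: frags scored this step; small bonus for damage dealt;
        # death penalty kept small (or zero) so the agent is not afraid to explore.
        reward = frag_reward * frags_delta
        reward += damage_coef * damage_dealt
        if died:
            reward -= death_penalty
        return reward

    # Example: the agent dealt 40 damage and scored one frag this step
    print(shaped_reward(frags_delta=1, damage_dealt=40, died=False))  # 1.4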
  10. sun_stealer

    AI research and Doom multiplayer

    @kb1: That's a good question, and the line is very fine. @Maes is right, my goal is to do research in deep reinforcement learning, which means training neural agents that learn autonomously by interacting with the environment. I want to see how far we can push RL on this particular task.

    One thing about this project that I don't want to change is that the input provided to the agent is the same as what a human player sees, which is different from AlphaZero, AlphaStar or OpenAI Five. So we're ruling out designs that incorporate hidden game state or game state perfectly preprocessed for the AI. I think it's a nice motivation because it makes the human vs AI comparison more transparent, and playing against such an agent should also feel more "fair". A bot with perfect information always knows where you are and there's no way to mindgame or trick it. I think an AI that uses just screen input is much more appealing: if it knew precisely where you were hiding - sorry, the AI just outsmarted you, but in a fair way. It also makes the agent one tiny step closer to future real-world AI deployments, in the sense that it has to act in a 3D world using only egocentric sensory input.

    As for the other things, it depends. Let me provide a few examples.

    1) A separate sub-system that can parse (preprocess) the screen input into a representation that is easier for the agent, e.g. a computer-vision system (neural network) trained to detect enemies and objects in the agent's view. This can significantly speed up training and solve the "poor eyesight" problem. I don't see why this is not fair: humans also have a specialized sub-system called the visual cortex that is already good even in infants and can detect faces and nipples and stuff. As long as the initial input to the entire agent is pure pixels, I'm fine with that.

    2) Encouraging the agent to do certain things via "reward shaping". Certain behaviors are hard to learn via pure interaction, e.g. how to explore the map or find certain items. You can give the agent a higher reward during training for doing certain things to speed up the process. Again, this is common in humans: we have dopamine centers in the brain that reward us for eating food, interacting with partners, etc. These mechanisms evolved over millions of years and are not part of our everyday learning process. Finding efficient ways to encourage some actions without stripping the agent of the ability to make independent decisions is an important research direction.

    3) Motivating the agent to learn better representations by making it solve auxiliary tasks. E.g. instead of just generating actions we can ask the neural network to predict some things about the game: where your opponent is, how much health it has, where you are on the map. An agent that maintains accurate beliefs about the environment can also make better and more informed actions. This area of research is called "reinforcement learning with auxiliary losses" (a small sketch is at the end of this post). This is also similar to how a human plays: we don't focus 100% of our attention on how to move the mouse and press keys; we reason about our opponent, the map, etc.

    Some negative examples:

    1) Telling the agent precisely which weapon is best in the current situation. I want the agent to learn how to make that decision, even if the choice seems obvious. By limiting the agent's options one can make training easier, but the ultimate performance will suffer, because there might be a strategy involving different weapons that you didn't think of.

    2) Adding any kind of scripted behavior. Again, we can make the training easier, but ultimately it will make the agent more exploitable and predictable.

    Generally, a lot of contemporary research in AI is about developing ways to incorporate prior knowledge into general-purpose learning and search methods. This project is no exception. Especially interesting are those ideas that are general and can be used across tasks and domains (e.g. the auxiliary losses example).
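    Here is a minimal PyTorch sketch of the auxiliary-loss idea from example 3 above. The architecture, layer sizes and the choice of health as the auxiliary target are my own illustrative assumptions, not the actual network used in the project:

    import torch
    import torch.nn as nn

    class AgentWithAuxHead(nn.Module):
        """Policy/value network with an extra head that predicts a game fact
        (here: the player's health) from the same pixel features."""
        def __init__(self, n_actions: int = 8):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Flatten(),
            )
            feat = self.encoder(torch.zeros(1, 3, 72, 128)).shape[1]
            self.policy = nn.Linear(feat, n_actions)  # action logits
            self.value = nn.Linear(feat, 1)           # critic
            self.aux_health = nn.Linear(feat, 1)      # auxiliary prediction

        def forward(self, obs):
            z = self.encoder(obs)
            return self.policy(z), self.value(z), self.aux_health(z)

    # The auxiliary term is simply added to the usual RL loss during training, e.g.:
    # loss = rl_loss + aux_weight * F.mse_loss(aux_health_pred, true_health)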
  11. sun_stealer

    AI research and Doom multiplayer

    First of all, I should be clear that the agent being very fast to act does not necessarily mean it's easy to train. Training the neural network and learning from experience require tremendous amounts of compute; maybe we would love to make our models bigger and more sophisticated, but then we'd have a hard time training them with reinforcement learning. The idea you described certainly makes a lot of sense; in fact, there's a whole subfield devoted to it called "Hierarchical Reinforcement Learning". Interestingly enough, people have been trying this for the last 25-30 years, and still no one has managed to create a robust approach for training a hierarchical decision-making system. There are things that work in certain domains, like Feudal Networks, but they are not universal. DeepMind and OpenAI applied a two-level memory architecture where one level is slower than the other. This is the closest we have to a hierarchical system so far.
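    As a toy illustration of the two-timescale idea (this is only my own sketch of the general concept, not the actual DeepMind/OpenAI architectures): a slow "manager" recurrent cell updates its state every few steps, while a fast "worker" cell conditions on it and acts every step.

    import torch
    import torch.nn as nn

    class TwoTimescaleAgent(nn.Module):
        """Toy hierarchical controller with a slow and a fast recurrent level."""
        def __init__(self, obs_dim=64, hidden=128, n_actions=8, manager_period=8):
            super().__init__()
            self.manager = nn.GRUCell(obs_dim, hidden)            # slow level
            self.worker = nn.GRUCell(obs_dim + hidden, hidden)    # fast level
            self.policy = nn.Linear(hidden, n_actions)
            self.period = manager_period

        def forward(self, obs_seq):
            # obs_seq: (T, obs_dim) sequence of observation features
            h_m = torch.zeros(1, self.manager.hidden_size)
            h_w = torch.zeros(1, self.worker.hidden_size)
            logits = []
            for t, obs in enumerate(obs_seq):
                obs = obs.unsqueeze(0)
                if t % self.period == 0:  # the slow level updates only occasionally
                    h_m = self.manager(obs, h_m)
                h_w = self.worker(torch.cat([obs, h_m], dim=1), h_w)
                logits.append(self.policy(h_w))
            return torch.stack(logits)

    agent = TwoTimescaleAgent()
    print(agent(torch.randn(32, 64)).shape)  # (32, 1, 8): per-step action logits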
  12. sun_stealer

    AI research and Doom multiplayer

    @Maes Training at a higher resolution makes everything much harder and reduces the throughput of the system (i.e. how many hours of experience the bot can learn from within a realtime hour). I can't tell you precisely how much, but a significant portion of the compute is spent just rendering the game frames and transferring them between the actors and the learner. Yes, the AI will be susceptible to attacks from long range, as it basically has very poor eyesight. This is a limitation right now.

    Answering your question more generally: the AI is highly adapted to its training setup but will struggle to generalize. Currently, it's trained to hunt down green bots, so it will recognize those sprites and shoot at them. That said, there is no hand-crafted behavior of any sort; I don't even know what features it uses. The neural network is fairly big and is trained end-to-end as a black box. Whatever is useful for the task, the bot will learn: movement, edges, textures. You could easily train it on bots of random colors to make it more robust to changes in visuals. Standing still in the corner won't help, that's for sure. I tried :D

    Short summary: it will struggle against anything that is significantly outside its training data distribution: new weapons, new maps, new enemies, new visuals. We're not really studying generalization here, because that is a huge orthogonal problem. Currently, we just want to train a bot that is really good in a bunch of fixed scenarios, if that makes sense. This is not some kind of magical AGI, the field is still figuring out the basic stuff :)
  13. sun_stealer

    AI research and Doom multiplayer

    #2 is correct. The "AI" is a separate process that watches the video output and makes decisions (keypresses). This way the agent has the same access to the game state as a human player, thus eliminating the unfair advantage that scripted bots usually have.

    The agent is based on a neural network, which is currently relatively tiny by the standards of these applications, maybe just a couple of million parameters (aka synapses). For comparison, neural networks in language applications recently reached billions of parameters, because they are easier to train. In fact, the task of playing Doom is not computationally heavy at all: on my PC this agent can play a few hundred copies of Doom at the same time, each faster than realtime, without even using the GPU. I could have made it even faster, making decisions on every frame, but that would further complicate the training because of the problem called "credit assignment". Essentially, it means that certain actions have consequences hundreds of timesteps in the future, and it can become extremely hard to correlate actions with the advantage they bring. That's why it is common practice to make the "timestep" longer, so that credit assignment becomes a little more tractable (see the sketch at the end of this post). This bot plays using a resolution of just 128x72 pixels.

    Although I wouldn't use strong words like "human-like". This agent is still very simple. The reactive and fast-paced nature of the game mode allows it to dominate, but it does not have general common sense, situational awareness or long-term strategy. This is something that we want to push in the future. I think "neural" AI like this will become more common in the next few years. I believe with this method you can create more interesting AI opponents, improving the singleplayer experience for the user.
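    A minimal sketch of the longer-timestep idea (assuming a generic gym-style environment interface, not the project's actual code): the policy is queried only every few frames and the chosen action is repeated in between.

    def play_episode_with_frameskip(env, policy, skip=2):
        # The policy decides once per `skip` frames; the action is repeated on the
        # intermediate frames, which makes each decision "timestep" longer.
        obs = env.reset()
        done = False
        total_reward = 0.0
        while not done:
            action = policy(obs)
            for _ in range(skip):
                obs, reward, done, info = env.step(action)
                total_reward += reward
                if done:
                    break
        return total_reward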
  14. sun_stealer

    AI research and Doom multiplayer

    Well, this one does not have any problems with the default bots. This agent adopted a very simple no-bullshit strategy: mostly stay in the open area and score shotgun kills. Earlier in training it did quite a lot of BFG spamming, and other weapons like the machinegun and rocket launcher were also used, but eventually, it seems, the shotgun is the most effective option. In an average 4-minute match on Dwango5 MAP01, this bot manages to kill ~60-70 bots, while 100% default bots score maybe 15-20 kills. I tried to play a few matches against it as well, and I was always able to beat the default bots on the scoreboard, while the agent was far ahead. But that's not saying much :)

    This agent makes decisions every 2nd frame (~18 times a second), therefore the footage looks very jittery. It reminds me of those Half-Life TAS videos with perfect bhopping. There might be ways to prevent this jittering by constraining "mouse acceleration" or penalizing the agent for changing its viewpoint too often.
  15. sun_stealer

    AI research and Doom multiplayer

    What I might need soon is a way to progressively increase the difficulty of the bots to keep them around the same level as the agent during training, until the agent is able to consistently beat the hardest ones. If you start training the agent from scratch against the hardest bots, it does not learn very well, because the bots just kill it immediately. (A rough sketch of what I mean is below.) Besides reactiontime, is there a way to modulate the difficulty of tdbots in such a fashion? Also, on maximum settings, are your bots stronger than the standard ones?
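    Purely to illustrate the kind of curriculum I have in mind (the function, thresholds and the notion of a 0-1 difficulty knob are invented for the example; nothing here is tied to tdbots or any real API):

    def update_bot_difficulty(difficulty, agent_win_rate,
                              step=0.05, lower=0.4, upper=0.6):
        # Keep the opponents roughly matched to the agent: raise difficulty when
        # the agent wins too often, lower it when it loses too often.
        if agent_win_rate > upper:
            difficulty = min(1.0, difficulty + step)
        elif agent_win_rate < lower:
            difficulty = max(0.0, difficulty - step)
        return difficulty

    # Example: start easy and adjust after every batch of evaluation matches
    difficulty = 0.1
    for win_rate in [0.9, 0.8, 0.7, 0.5, 0.3]:
        difficulty = update_bot_difficulty(difficulty, win_rate)
        print(round(difficulty, 2))  # 0.15, 0.2, 0.25, 0.25, 0.2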