sun_stealer

AI research and Doom multiplayer


I would love to see the AI you're working on beat the crap out of the highest-level regular ol' ZDoom bots in time! (perfect aim, perfect reaction time, etc)

Just now, Doomkid said:

I would love to see the AI you're working on beat the crap out of the highest-level regular ol' ZDoom bots in time! (perfect aim, perfect reaction time, etc)

Working on it right now :D

[screenshot]

35 minutes ago, sun_stealer said:

Working on it right now :D


Yeah, setting everything to 100 actually makes them worse, so if I were you I would just leave those values undefined.

Most of the time they will just ignore you, run into walls and never react again, and even when they manage to start firing they don't dodge much.

I still recommend using my bots instead, as they are built to be customizable, and I can modify them however you need them to be. But if you insist on using bots that are almost 2 decades old, with almost completely untouched and unmaintained code since their release, I guess I'm not stopping you.

3 hours ago, -TDRR- said:

Yeah, setting everything to 100 actually makes them worse, so if I were you I would just leave those values undefined.

Most of the time they will just ignore you, run into walls and never react again, and even when they manage to start firing they don't dodge much.

I still recommend using my bots instead, as they are built to be customizable, and I can modify them however you need them to be. But if you insist on using bots that are almost 2 decades old, with almost completely untouched and unmaintained code since their release, I guess I'm not stopping you.

 

Oh, I am only using them because they are already included in ViZDoom; it's a matter of effort. I am not questioning the quality of your bots :)

Currently I have only one pair of hands working on this project (mine), so I just cut every corner that I can.

Just now, sun_stealer said:

 

Oh, I am only using them because they are already included in ViZDoom; it's a matter of effort. I am not questioning the quality of your bots :)

Currently I have only one pair of hands working on this project (mine), so I just cut every corner that I can.

Ah, in that case, since you already got regular bots working in ViZDoom, you can just load the TDBots .pk3 file by drag-and-dropping it; from there you only need to change the console variables tdbots_reactiontime and tdbots_easymode. To avoid having to load it manually all the time, you can also put it in a "skins" folder in the same directory as your ViZDoom executable.

tdbots_reactiontime is the time the bots take until they fire, in tics. 35 tics is one second, and 70 tics is the max you can set (otherwise the bots become far too crippled).

And tdbots_easymode is a bool: if it's 1, bots will deliberately miss many shots. If it's 0, bots will have perfect accuracy (obviously not 100% perfect, as weapon inaccuracy also factors in, but at least their aim is perfect).

These settings aren't saved, so I recommend putting them into an autoexec.cfg file so you always have your bot settings saved!

I tried to make the setup as quick and painless as possible, so do give it a shot whenever you can.

EDIT: Almost forgot, these bots are also added by using the "addbot" command. If you ever want to switch back to the original ZDoom bots, type "tdbots_enable 0" into the console (without quotes, of course) and "addbot" will add regular bots.

It's too late to say this, but this whole setup is faster than changing the bots.cfg file manually :S
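
For reference, a minimal sketch of how these cvars might be set from ViZDoom's Python API (assuming the pk3 is passed via the engine's -file argument; the paths and values here are illustrative, not tested settings):

import vizdoom as vzd

game = vzd.DoomGame()
game.set_doom_map("map01")
# Host a deathmatch and load the TDBots pk3 as an extra file (path assumed).
game.add_game_args("-host 1 -deathmatch -file tdbots.pk3")
game.init()

# TDBots difficulty cvars described above, sent as console commands.
game.send_game_command("tdbots_reactiontime 70")  # slowest reaction, in tics
game.send_game_command("tdbots_easymode 1")       # deliberately miss shots

# addbot spawns TDBots while tdbots_enable is 1; set it to 0 for stock ZDoom bots.
for _ in range(7):
    game.send_game_command("addbot")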


What I might need soon is a way to progressively increase the difficulty of bots to keep them around the same level as the agent during training, until the agent is able to consistently beat the hardest ones.

If you start training the agent from scratch against the hardest bots, it does not learn very well, because the bots just kill it immediately.

 

Besides reactiontime, is there a way to modulate the difficulty of tdbots in such a fashion?

 

Also, on maximum settings, are your bots stronger than standard?

 

Just now, sun_stealer said:

What I might need soon is a way to progressively increase the difficulty of bots to keep them around the same level as the agent during training, until the agent is able to consistently beat the hardest ones.

If you start training the agent from scratch against the hardest bots, it does not learn very well, because the bots just kill it immediately.

 

Besides reactiontime, is there a way to modulate the difficulty of tdbots in such a fashion?

 

Also, on maximum settings, are your bots stronger than standard?

 

They are far more kickass than the regular ZDoom bots even at 2 seconds reaction time, I can guarantee that one :D

What other settings could you want? I could add some more difficulty settings if you give me ideas. But still, reactiontime 70 and easymode 1 should be enough for anyone, even for your agents if they have preserved their previous training against the older bots. You can also use tdbots_usenodes 1, which short-circuits the TDBots' item searching, so most of the time (on anything other than DWANGO5 MAP01, at least) the bots should only have their Pistol to fight with.

Actually, even without keeping their training against the other bots, they can still have time to learn if you pit them against one bot.

On 7/12/2019 at 6:06 PM, Doomkid said:

I would love to see the AI you're working on beat the crap out of the highest-level regular ol' ZDoom bots in time! (perfect aim, perfect reaction time, etc)

 

Well, this one does not have any problems with default bots.

This agent adopted a very simple, no-bullshit strategy: mostly stay in the open area and score shotgun kills. Earlier in training it did quite a lot of BFG spamming and also used other weapons like the machine gun and rocket launcher, but eventually the shotgun seems to be the most effective option.

In an average 4-minute match on Dwango5 MAP01, this bot manages to get ~60-70 kills, while default bots at 100% settings score maybe 15-20. I tried to play a few matches against it as well, and I was always able to beat the default bots on the scoreboard, while the agent was far ahead. But that's not saying much :)

This agent makes decisions every 2nd frame (~18 times a second), so the footage looks very jittery. It reminds me of those Half-Life TAS videos with perfect bhopping. There might be ways to prevent this jittering by constraining "mouse acceleration" or penalizing the agent for changing its viewpoint too often.
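
In ViZDoom terms the decision rate is just the action-repeat argument of make_action; a rough sketch (the policy call is a placeholder, not the actual model):

frame_skip = 2  # 35 tics = 1 second, so 2 tics per decision gives ~17-18 decisions/s

# `game` is an initialized vizdoom.DoomGame; `policy` stands in for the trained agent.
while not game.is_episode_finished():
    state = game.get_state()               # screen buffer + game variables
    action = policy(state.screen_buffer)   # placeholder for the trained agent
    game.make_action(action, frame_skip)   # repeat the chosen action for 2 tics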

 

 


@sun_stealer : fascinating stuff! Quick question:

Is your AI code:

  1. Built into the Doom executable?
  2. Built as a separate app on the same PC that "watches" the video output and sends actions?
  3. Built as a separate app on the same PC watching and sending actions, with the AI on a different box, or on a network of dedicated AI machines?

You mentioned 18 FPS reaction time (which seems pretty darn fast), so I'm wondering what's powering the video scraping and the decision-making. I'm also wondering how practical it is (or may become) to have this level of AI built-in as a standard feature of some Doom source ports. In other words, do you foresee always needing a lot more CPU power than is currently consumed by Doom, or do you think it can be streamlined, and, if so, how much?

 

I'm asking, mainly, because the concept fascinates me, especially when trying to determine the absolute minimum visual input throughput required to have a bot possess convincingly human-like intelligence.

23 hours ago, kb1 said:
 

Is your AI code:

  1. Built into the Doom executable?
  2. Built as a separate app on the same PC that "watches" the video output and sends actions?
  3. Built as a separate app on the same PC watching and sending actions, with the AI on a different box, or on a network of dedicated AI machines?

You mentioned 18 FPS reaction time (which seems pretty darn fast), so I'm wondering what's powering the video scraping and the decision-making. I'm also wondering how practical it is (or may become) to have this level of AI built-in as a standard feature of some Doom source ports. In other words, do you foresee always needing a lot more CPU power than is currently consumed by Doom, or do you think it can be streamlined, and, if so, how much?

 

I'm asking, mainly, because the concept fascinates me, especially when trying to determine the absolute minimum visual input throughput required to have a bot possess convincingly human-like intelligence.

 

#2 is correct. The "AI" is a separate process that watches the video output and makes decisions (keypresses). This way the agent has the same access to the game state as a human player, which eliminates the unfair advantage that scripted bots usually have. The agent is based on a neural network that is relatively tiny by current standards, maybe just a couple of million parameters (aka synapses). For comparison, neural networks in language applications recently reached billions of parameters, because they are easier to train.

 

In fact, the task of playing Doom is not computationally heavy at all; on my PC this agent can play a few hundred copies of Doom at the same time, each one faster than real time, without even using the GPU.

 

I could have made it even faster, making decisions on every frame, but that would further complicate the training because of a problem called "credit assignment".

Essentially, certain actions have consequences hundreds of timesteps in the future, and it can become extremely hard to correlate actions with the advantage they bring. That's why it is common practice to make the "timestep" longer, which makes credit assignment a little more tractable.
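
One way to see why (illustrative numbers only, with an assumed discount factor of 0.99): an outcome that arrives a fixed number of game tics after an action is discounted much less harshly when each decision step spans several tics.

gamma = 0.99  # assumed, typical discount factor
# An event 350 tics (~10 s) after an action is discounted by gamma ** (350 / frame_skip)
# when credit is assigned per decision step rather than per game tic.
for frame_skip in (1, 2, 4):
    print(frame_skip, round(gamma ** (350 / frame_skip), 3))
# frame_skip=1 -> 0.03, frame_skip=2 -> 0.172, frame_skip=4 -> 0.415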

 

This bot plays using a resolution of just 128x72 pixels. That said, I wouldn't use strong words like "human-like". This agent is still very simple. The reactive and fast-paced nature of the game mode allows it to dominate, but it does not have general common sense, situational awareness or long-term strategy. This is something that we want to push in the future.
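
(For context: ViZDoom renders at a fixed set of resolution presets, so a frame like this would typically be rendered small and then resized; the exact preset and resize step below are assumptions, not the actual pipeline.)

import cv2
import vizdoom as vzd

game = vzd.DoomGame()
game.set_screen_resolution(vzd.ScreenResolution.RES_160X120)  # one of the small presets
game.set_screen_format(vzd.ScreenFormat.GRAY8)                # single-channel frames
game.init()
game.new_episode()

state = game.get_state()
obs = cv2.resize(state.screen_buffer, (128, 72), interpolation=cv2.INTER_AREA)
print(obs.shape)  # (72, 128)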

I think "neural" AI like this will become more common in the next few years. I believe with this method you can create more interesting AI opponents, improving the singleplayer experience for the user.

3 hours ago, sun_stealer said:

This bot plays using a resolution of just 128x72 pixels. 

 

So, does that mean that in theory it would be more vulnerable vs long-range attacks or fighting vs camouflaged opponents? Doesn't that impede map exploration/scouting, e.g. discovering a partially hidden weapon that might be visible only as a single pixel? Is the low resolution necessary in order to keep the screen-scraping overhead low? How much is it, BTW?

 

Edit: new questions popped into my mind, if you don't mind my inquisitiveness ;-)

 

Focusing more on the screen scraping/visual interface part, exactly how does it track and recognize opponents? E.g. is it trained to recognize only the general shape of Doomguy, or to shoot at anything that moves? Is it trained to only track e.g. green Doomguy sprites, or is the tracking purely edge/shape based? What if you use a wall texture that looks just like Doomguy? Will it blindly attack it, or dismiss it due to lack of movement? And can the latter be exploited if an opponent sits perfectly still and only shoots at you when you have your back turned?


@Maes

Training at a higher resolution makes everything much harder and reduces the throughput of the system (i.e. how many hours of experience the bot can learn from within a real-time hour). I can't tell you exactly how much, but a significant portion of compute is spent just rendering the game frames and transferring them between the actors and the learner.

Yes, the AI will be susceptible to attacks from long range, as it basically has very poor eyesight. This is a limitation for now.

 

Answering your question more generally: the AI is highly adapted to its training setup but will struggle to generalize. Currently, it's trained to hunt down green bots, so it will recognize those sprites and shoot at them. That said, there is no hand-crafted behavior of any sort; e.g. I don't even know what features it uses. The neural network is substantially big and is trained end-to-end as a black box. Whatever is useful for the task, the bot will learn it: movement, edges, textures. You can easily train it on bots of random colors to make it more robust to changes in visuals.

Standing still in the corner won't help, that's for sure. I tried :D

 

Short summary: it will struggle against anything that is significantly outside its training data distribution: new weapons, new maps, new enemies, new visuals. We're not really studying generalization here, because that is a huge orthogonal problem. Currently, we just want to train a bot that is really good in a bunch of fixed scenarios, if that makes sense. This is not some kind of magical AGI; the field is still figuring out the basic stuff :)

 

On 7/21/2019 at 1:29 AM, sun_stealer said:

 

#2 is correct. The "AI" is a separate process that watches the video output and makes decisions (keypresses). [...]

Very interesting! And, kinda frightening that it can hold its own at such computational speeds.

 

Regarding 'situational awareness', and 'long-term strategy', I'm thinking that, because your current setup is so fast, a separate, higher-level process could be devoted to big picture stuff. This level would not look into the screen array, but into the original AI memory space. In fact, this could be generalized into a hierarchy, using another level of nodes, and eventually multiple levels. That's the beauty of this approach: Modifying the logic simply becomes a data issue, vs. having to build excessive amounts of logic. Good stuff!

5 hours ago, kb1 said:

Very interesting! And, kinda frightening that it can hold its own at such computational speeds.

 

Regarding 'situational awareness', and 'long-term strategy', I'm thinking that, because your current setup is so fast, a separate, higher-level process could be devoted to big picture stuff. This level would not look into the screen array, but into the original AI memory space. In fact, this could be generalized into a hierarchy, using another level of nodes, and eventually multiple levels. That's the beauty of this approach: Modifying the logic simply becomes a data issue, vs. having to build excessive amounts of logic. Good stuff!

 

First of all, I should be clear that the agent is very fast to act, which does not necessarily mean it's easy to train. Training the neural network and learning from experience requires tremendous amounts of compute, and while we would love to make our models bigger and more sophisticated, we'd then have a hard time training them with reinforcement learning.

 

The idea you described certainly makes a lot of sense; in fact, there's a whole subfield devoted to it called "Hierarchical Reinforcement Learning". Interestingly enough, people have been trying this for the last 25-30 years, and still no one has managed to create a robust approach to training a hierarchical decision-making system. There are things that work in certain domains, like Feudal Networks, but they are not universal. DeepMind and OpenAI applied a two-level memory architecture where one level is slower than the other. This is the closest we have to a hierarchical system so far.

30 minutes ago, sun_stealer said:

 

First of all, I should be clear that the agent is very fast to act, which does not necessarily mean it's easy to train. Training the neural network and learning from experience requires tremendous amounts of compute, and while we would love to make our models bigger and more sophisticated, we'd then have a hard time training them with reinforcement learning.

The idea you described certainly makes a lot of sense; in fact, there's a whole subfield devoted to it called "Hierarchical Reinforcement Learning". Interestingly enough, people have been trying this for the last 25-30 years, and still no one has managed to create a robust approach to training a hierarchical decision-making system. There are things that work in certain domains, like Feudal Networks, but they are not universal. DeepMind and OpenAI applied a two-level memory architecture where one level is slower than the other. This is the closest we have to a hierarchical system so far.

My initial thoughts were of something that starts out very simple, and can be enhanced over time. Don't know if it's "sacrilege", but currently there are some bots with standard (hard-coded) logic that could be used to bridge the gap. For example: "I'm going to need to grab that red key, eventually", or "oh shit, I need some health". Some traditional code could assist the high-level task of navigating to the red key through doors, while still leaving the choice to do so up to the AI. In other words, depending on your project's goals/philosophy, there are plenty of shortcuts that could be used to emulate more AI very "cost-effectively".

 

But, again, that depends on the goal (which I haven't really asked you about yet):

  • Take pure "neural AI" as far as it can go, by itself
  • Build the smartest Doom bot known to man

My main point is that there's lots of grey area in between those two goals. So, I'll ask: Where do you stand? (just curious).

56 minutes ago, kb1 said:

In other words, depending on your project's goals/philosophy, there are plenty of shortcuts that could be used to emulate more AI very "cost-effectively".

 

Such AI models are known as "hybrid models", with the "traditional" logic taking exactly the role you described: if you have something that already works and it makes more sense to just shoehorn it in, somehow, instead of having to re-learn it from scratch, then why not? Especially if the application at hand is a real-world engineering one, not something that can be trained and trained and trained in an academic setting. At a certain point concrete results will be demanded... and nothing beats some old shoestring and bubblegum when it comes to that, amirite?

 

56 minutes ago, kb1 said:
  • Build the smartest Doom bot known to man

IMO, that's not the goal here. For one, there's the screen-scraping overhead, and constraining the bot to interact through what is essentially a handicapping bottleneck. If one wanted to be super-efficient at outsmarting conventional bots, the starting point would be full access to the game's state, just like they have. And not just to e.g. fire with superhuman accuracy over extreme ranges, but also to track the position, weapon and ammo status of opponents, proximity to chokepoints or weapon-spamming zones, etc.


@Maes: You're right, I wasn't being clear:

@sun_stealer: Please let me ask my question with better clarity:

I believe that your original goal was purely to work with neural AI, and you happened to choose Doom as the underlying test platform. So which of the two choices better describes your intentions:

  1. You want to only use pure "neural AI" logic techniques to train your Doom bot, even if doing so yields an inferior Doom bot.
  2. You are open to the idea of enhancing your AI logic with domain-specific techniques (specific knowledge of Doom mechanics) and more "traditional" techniques to produce a more formidable bot (such as A* pathfinding, weapon and enemy stat data tables, etc.)

 

@Maes: Yes, surely you can dig into the game memory and get exact enemy locations, map layout, thing placement, etc. I assume those techniques are out-of-scope for this project, and may be considered "cheating" by the OP. I may sound like I'm contradicting #2 - I'm trying not to. I'm suggesting that there might be a bit of middle ground here, where the final decisions are always made via AI nodes, but the massive scope of, say, a Doom level, could be assisted just a bit, to get past the massive amount of training required to simply teach the layout of each level. Such assistance could get the AI past a 'boring' stage, letting it get to "the good stuff".

 

I hope that makes sense.


@kb1: That's a good question and the line is very fine. 

@Maes is right, my goal is to do research in deep reinforcement learning, which implies training neural agents that learn autonomously by interacting with the environment. I want to see how far we can push RL on this particular task.

One thing about this project that I don't want to change is that the input provided to the agent is the same as what a human player sees. This is different from AlphaZero, AlphaStar or OpenAI Five. So we're ruling out designs that incorporate hidden game state, or game state perfectly preprocessed for the AI. I think it's a nice constraint because it makes the human-vs-AI comparison more transparent, and playing against such an agent should also feel more "fair". A bot with perfect information always knows where you are and there's no way to mindgame or trick it. I think an AI that uses just screen input is much more appealing. If it figures out precisely where you're hiding - sorry, the AI just outsmarted you. But in a fair way.

It also brings the agent one tiny step closer to future real-world AI deployments, in the sense that it has to act in a 3D world using only egocentric sensory input.

 

As for the other things, it depends. Let me provide a few examples.

1) A separate sub-system that can parse (preprocess) the screen input into a representation that is easier for the agent, e.g. a computer-vision system (neural network) that is trained to detect enemies and objects in the agent's view. This can significantly speed up training and solve the "poor eyesight" problem. I don't see why this wouldn't be fair; humans also have a specialized sub-system, the visual cortex, that is already good even in infants and can detect faces and nipples and stuff. As long as the initial input to the entire agent is pure pixels, I'm fine with that.
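
Purely as an illustration of what such a sub-system could look like (none of this is from the actual project), a tiny PyTorch network that turns raw frames into "where are the enemies" features which a policy could consume alongside its own encoding:

import torch
import torch.nn as nn

class EnemyDetector(nn.Module):
    # Assumed shapes: RGB frames at 72x128; trained separately on labelled frames.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(64),  # coarse detection features handed to the policy
        )

    def forward(self, frames):  # frames: (batch, 3, 72, 128)
        return self.net(frames)

detector = EnemyDetector()
features = detector(torch.zeros(1, 3, 72, 128))
print(features.shape)  # torch.Size([1, 64])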

 

2) Encouraging the agent to do certain things via "reward shaping". Certain behaviors are hard to learn via pure interaction, e.g. how to explore the map or find certain items. You can give the agent a higher reward during training for doing certain things, to speed up the process.

Again, this is common in humans. We have dopamine centers in the brain that reward us for eating food, interacting with partners, etc. These mechanisms evolved over millions of years and are not part of our everyday learning process.

Finding efficient ways to encourage some actions without stripping the agent of the ability to make independent decisions is an important research direction.

 

3) Motivating the agent to learn better representations by making it solve auxiliary tasks. E.g. instead of just generating actions, we can ask the neural network to predict some things about the game: where your opponent is, how much health it has, and where you are on the map. An agent that maintains accurate beliefs about the environment can also make better and more informed actions. This area of research is called "reinforcement learning with auxiliary losses".

This is also similar to how a human plays: we don't focus 100% of our attention on how to move the mouse and press keys, but we reason about our opponent and the map, etc.
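
A toy PyTorch sketch of the auxiliary-head idea (sizes, names and targets are invented; the point is only that an extra prediction loss is added to the usual RL loss):

import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyWithAux(nn.Module):
    def __init__(self, obs_dim=512, n_actions=12):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.policy_head = nn.Linear(256, n_actions)
        self.value_head = nn.Linear(256, 1)
        self.aux_head = nn.Linear(256, 1)   # e.g. predict the opponent's health

    def forward(self, obs):
        h = self.encoder(obs)
        return self.policy_head(h), self.value_head(h), self.aux_head(h)

model = PolicyWithAux()
obs = torch.randn(8, 512)                   # pretend batch of encoded frames
logits, value, predicted_health = model(obs)

rl_loss = torch.tensor(0.0)                 # placeholder for the usual policy-gradient loss
aux_targets = torch.rand(8, 1) * 100        # invented supervision, for illustration
aux_loss = F.mse_loss(predicted_health, aux_targets)
total_loss = rl_loss + 0.1 * aux_loss       # the auxiliary term shapes the representation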

 

Some negative examples:

1) Telling the agent precisely which weapon is best in the current situation. I want the agent to learn how to make that decision, even if the choice seems obvious. By limiting the agent's options one can make training easier, but the ultimate performance will suffer, because there might be a strategy involving different weapons that you didn't think of.

 

2) Adding any kind of scripted behavior. Again, this can make training easier, but ultimately it will make the agent more exploitable and predictable.

 

 

Generally, a lot of contemporary research in AI is about developing ways to incorporate prior knowledge into general-purpose learning and search methods. This project is no exception. Especially interesting are those ideas that are general and can be used across tasks and domains (e.g. the auxiliary losses example)

1 hour ago, sun_stealer said:

@kb1: That's a good question and the line is very fine. [...]

Totally agree with all statements above, except for reward shaping. Don't get me wrong - it has to be done. But, therein lies the "artificial-ness" of it all. The scoring system (you call it "reward shaping") is *the* most interesting aspect of AI, as it is this that drives all learning and eventual decision making.

 

The scoring system sits above everything - it is where the human programs the desire. Instead of "win Doom", it could be switched to "die as quickly as possible", or "go as far North as possible", etc. The AI couldn't care less. The only "desire", as far as the AI is concerned, is getting the lowest (or highest) score. The scoring system must be able to convert all measurable outcomes into a number that reflects how close the current situation matches the most perfect result.

 

So, I don't really like the term "reward shaping", as it tends to minimize the importance of it. "Outcome shaping" might be closer to the truth.

 

Some other notes:

The video preprocessor is good - this can help your bot avoid shooting moving walls, doors, lifts, as well as co-op players and team members (available in some multiplayer ports).

 

Continuing with the screen-pixel input is good - this can allow your bot to work with multiple ports, Doom and otherwise. Does game input consist of you sending fake keystrokes and mouse inputs? That lets your code be truly independent of the source port.

 

Anyway, keep up the good work! I am jealous - I wish I had the time. Good luck!

7 hours ago, kb1 said:

The scoring system sits above everything - it is where the human programs the desire. Instead of "win Doom", it could be switched to "die as quickly as possible", or "go as far North as possible", etc. The AI couldn't care less. The only "desire", as far as the AI is concerned, is getting the lowest (or highest) score. The scoring system must be able to convert all measurable outcomes into a number that reflects how close the current situation matches the most perfect result.

 

I wonder, if the goal is made to be "die as little as possible" or "maximize K:D ratio", will the AI evolve into a sophisticated camper/sneaker? :-p

 

What if you have a whole bunch of such AIs in the same map? Will they all simply sit it out?

11 hours ago, Maes said:

 

I wonder, if the goal is made to be "die as little as possible" or "maximize K:D ratio", will the AI evolve into a sophisticated camper/sneaker? :-p

 

What if you have a whole bunch of such AIs in the same map? Will they all simply sit it out?

 

Currently, the main reward term is just the number of frags. I tried increasing the death penalty, but it makes learning extremely hard in the beginning, because an untrained agent can't do anything and gets killed all the time. So the agent gets really afraid of dying and hides somewhere to reduce the rate at which it's getting killed, rather than going out into the world to explore.
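
For concreteness, a rough sketch of that kind of reward in ViZDoom terms (FRAGCOUNT and DEATHCOUNT are real game variables, but the function and coefficient are my illustration, not the project's code):

import vizdoom as vzd

def step_reward(game, prev_frags, prev_deaths, death_penalty=0.0):
    # Main term: change in frag count; optional penalty for dying this step.
    frags = game.get_game_variable(vzd.GameVariable.FRAGCOUNT)
    deaths = game.get_game_variable(vzd.GameVariable.DEATHCOUNT)
    reward = (frags - prev_frags) - death_penalty * (deaths - prev_deaths)
    return reward, frags, deaths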

 

The current agent isn't really afraid of dying; in fact, I think it sometimes dies on purpose to get new weapons and full health :D

On 7/24/2019 at 4:25 PM, sun_stealer said:

Currently, the main reward term is just the number of frags.

Did you provide the AI the game's internal frag counter as input, or are you reading the on-screen frag indicator?

 

Quote

I tried increasing the death penalty, but it makes learning extremely hard in the beginning, because an untrained agent can't do anything and gets killed all the time. So the agent gets really afraid of dying and hides somewhere to reduce the rate at which it's getting killed, rather than going out into the world to explore.

 

The current agent isn't really afraid of dying; in fact, I think it sometimes dies on purpose to get new weapons and full health :D

Being afraid to die is a valid (albeit boring) strategy that should be present in some capacity: imagine a nearly-dead bot seeing a +100% soulsphere nearby. Running away from the danger, toward the soulsphere, can be a smart thing to do.

 

Your AI would most likely improve immediately with a more comprehensive scoring (reward) system. In every system I built, I used a "penalty" system (lowest score is best) vs. a reward system (highest score is best). One is the mathematical reverse of the other, with a twist:

 

In a reward system, the worst possible action can be bounded at 0, yet the best possible action cannot be bounded (because it's infinite).

In a penalty system, the best possible action is bounded at 0, meaning that it can be defined exactly (as 0), yet the worst possible action is infinite.

 

An example of perfect play might be a shot that kills all enemies, while picking up 200% health, 200% armor, all keys, etc.

 

If your reward system is mainly using only the number of frags, here's the problem: shots that kill are rewarded, but shots that do damage without killing are not rewarded at all. This makes it very difficult to learn how to use the lesser weapons like the fist, pistol, and chainsaw: if a single use doesn't cause a frag, there's no reward for using them, thus no knowledge to learn. Sure, eventually a single use will yield a kill, but this will appear as noise, along with the rare death caused by stepping into a nuke pool.

 

Re-stated for a penalty system: shots that don't kill would have a penalty applied to them (which makes sense, because you wasted a bullet). But none of that information is being considered if you're only looking at frag count.

 

The idea is to devise a formula that takes all measurable data, multiplies each data point by some pre-determined constant (weighting), and adds them together. This can be done for either a reward system or a penalty system (you'd need a different formula for either system). I like penalty systems because it tends to be easier to detect mistakes than it is to detect successes.

 

You can actually combine both systems (add up and scale penalties, then scale and subtract rewards...or vice-versa).


Gathering data can be challenging with a screen-pixels-only approach. In the case of causing damage, I suppose you could detect blood sprays. You could also read weapon counters, health, armor, and keys by "seeing" the on-screen numbers. Personally, I would dig these stats from within game memory. The stats are by no means hidden from a human...therefore not cheating.

 

To me, getting your AI to be able to make sense of the on-screen numbers is an interesting side task, but I feel that it complicates things too much, and it distracts from the much more interesting task of learning fighting tactics and navigational skills. If you are really opposed to digging these values out of game memory, I suppose your visual pre-processor idea could be built to detect blood sprays, and to read those on-screen ammo, health, armor, and frag counters.

 

The scoring can also be a combination of accumulative stats, as well as stats altered during this tic.

A very truncated example score system:


double current_reward;
double current_penalty;
double accumulated_reward;
double accumulated_penalty;
double penalty_score;

current_reward = 
  (frags_this_tic * TIC_FRAGS_REWARD) +
  (damage_dealt_this_tic * TIC_DAMAGE_REWARD) +
  (health_grabbed_this_tic * TIC_HEALTH_REWARD) +
  (ammo_grabbed_this_tic * TIC_AMMO_REWARD)
  ...

current_penalty =
  (damage_taken_this_tic * TIC_DAMAGE_PENALTY) +
  (ammo_spent_this_tic * TIC_AMMO_PENALTY)
  ...

accumulated_reward =
  (total_frags * ACCUM_FRAGS_REWARD) +
  (total_health_grabbed * ACCUM_HEALTH_REWARD)
  ...

accumulated_penalty =
  (total_ammo_spent * ACCUM_AMMO_PENALTY) +
  (total_damage_taken * ACCUM_DAMAGE_PENALTY) +
  (total_leveltime * ACCUM_TIME_PENALTY) +       // this one helps keep the AI from camping...
  ...

penalty_score = ((current_penalty + accumulated_penalty) * PENALTY_SCALE) - 
                ((current_reward + accumulated_reward) * REWARD_SCALE);

if (penalty_score < 0)
  penalty_score = 0; // weights should prevent this from ever happening

 

 

This example is massively simplified. You want it to be fast, but, above all, you want it to capture every possible minuscule "brownie point" and "demerit". Something as simple as firing a bullet at the wall wastes a bullet, so it should produce a slight penalty. A properly-tuned scoring system like this guides the learning process with precision, without compromising any "purity of thought". Without this, every action equates to throwing everything at the wall to see what sticks. It's shuffling the deck, hoping to deal a royal flush.

 

Important: It is tempting to think of rewards as "negative penalties", and vice-versa, but this is questionable (dying might be considered 1000 penalty, but is "not dying" considered 1000 reward?). Again, there are some mathematical benefits of thinking in terms of penalties. If you can rethink your rewards as penalties, you can scale using multiplication and division, as well as plus/minus, with everything pulling towards 0 (vs. infinity). As an example, grabbing a 25% health can be thought of as +25 reward, or it can be thought of as +175 penalty (grabbing no health = +200 penalty, grabbing the maximum-sized health of 200 = +0 penalty).

 

Some notes:

  • The score function is the enforcer: Spanking the AI, or giving it a cookie, thereby directing the learning process
  • I used both immediate stats and accumulated stats. Accumulated stats help steer long-term goals, whereas immediate stats cause actions to occur. Therefore immediate stats should be weighted higher than accumulative stats.
  • I used both penalties and rewards, and, as mentioned above, this can be questionable. Your mileage may vary.
  • The constants create the AI's "personality". They must be set carefully, relative to each other, but are otherwise not critical. What I mean is this: There are a lot of mutually-exclusive actions (stay and fight, or run and get health) or (grab that good weapon in the room full of enemies). The constants are the programmer's way to say that action A is more important than action B, all other things being equal. For example, AMMO_SPENT should be much lower than DAMAGE_CAUSED, as it's way more important to kill your enemy than it is to save ammo. But, ammo should not be wasted, so there must be some penalty.

In closing, the concept I'm trying to advise on is that the scoring function is how you instill desire into the AI. The more stats you can feed it, the better. Each of these stats should have weights assigned to them that let the AI intelligently choose the best action within a list of bad choices. The magnitude of such weights is not that important - what is important is the relative magnitude of one weight vs. another. Some empirical testing will help you adjust those weights - it's not very difficult once you get into it.

 

I hope you can understand my chicken-scratch, and that you find some of this helpful...even if it's just for inspiration :)

 


@sun_stealer I was really hoping for some feedback on your thoughts about more comprehensive scoring.

2 hours ago, kb1 said:

I was really hoping for some feedback on your thoughts about more comprehensive scoring.

And I bet sun_stealer was hoping that people in this thread would stop wasting his time by posting their own "visions" of how the AI should be trained.

5 hours ago, andrewj said:

And I bet sun_stealer was hoping that people in this thread would stop wasting his time by posting their own "visions" of how the AI should be trained.

Congratulations for typing the first shit-slinging, time-wasting post in this thread. Other than your post, all indications suggest that the original poster was having a fascinating discussion about a fascinating project, with everyone.

 

You know, I was building systems that analyze incomplete, big datasets, often with opposing conclusions, and make pretty good educated guesses, before neural networks were even an established area of study. That doesn't make me an authority, but it does afford me the ability to speak on the subject at a level greater than an unsolicited "you're wasting my time."

 

If sun_stealer had even hinted that I might be wasting his time with my "visions", I would, of course, respectfully stop posting in his thread. My hopes are that sun_stealer might appreciate some chit-chat about a subject that he was interested enough to post about...isn't that the main reason to post such topics on a public-facing forum?

 

And, in my "vision" post, I finished it with the statement: "I hope that... you find some of this helpful...even if it's just for inspiration".

 

Your post is not helpful, or inspiring. Wow, what a wicked thing to say - what did I do to you to deserve that???

 

@sun_stealer: Please feel free to reply, as bluntly as desired, in this thread, or by PM, if you'd rather I stop making observations, suggestions, or posting at all in your thread. I will completely understand - no hard feelings, no worries. I will continue to read your thread, however, and check out any source code you might produce, as I find it fascinating, both the concept, and the impressive results, especially considering the relatively short amount of time you've put into it.

 

I do not presume to be knowledgeable of the inner workings of your source, or your methodology, other than what you've revealed in this thread. I have had experience with systems that iterate to make decisions, evaluate the amount of error in those decisions, then propagate the error back through to tweak the interpretation of the inputs used to collectively calculate the next decision. (what a mouthful!) In that model, I've found that the better you can calculate the amount of error, the faster the input bias can be tweaked, therefore, the quicker the AI "learns".

 

I did presume that, because of the nature of your project, you would find such discussions interesting, and maybe inspiring (I know I would, if our roles were reversed). If not, by all means, please let me know.

 

Likewise, please feel free to post, or PM me to discuss anything - if you want to bounce ideas by me, or need a second pair of eyes...whatever. I'm here. Take care.

 

13 hours ago, kb1 said:

@sun_stealer I was really hoping for some feedback on your thoughts about more comprehensive scoring.

 

Dude, it has been less than a day, don't you take weekends off too?


This is a pretty interesting topic and I'd like to see how these new DM bots turn out!

Count me in!


@kb1 I apologize for not being active here; I was quite busy with work and just life in general. I will provide my comments on what you said once I clear things up.

 

As a project update: I hit a major roadblock, being unable to train any good policies with memory (LSTMs), and I suspect bugs in the research framework I was using. I ended up rewriting the algorithm in PyTorch from scratch, and now it looks like it is working better. But it took a lot of effort.
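
For the curious, a minimal sketch of the kind of memory-based policy core involved (an nn.LSTMCell between an encoder and the action/value heads; sizes are placeholders, not the actual architecture):

import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, n_actions=12):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU())
        self.core = nn.LSTMCell(hidden_dim, hidden_dim)   # the "memory"
        self.policy_head = nn.Linear(hidden_dim, n_actions)
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs_features, hidden):
        x = self.encoder(obs_features)
        h, c = self.core(x, hidden)        # carry (h, c) across timesteps
        return self.policy_head(h), self.value_head(h), (h, c)

policy = RecurrentPolicy()
h = torch.zeros(1, 256)
c = torch.zeros(1, 256)
logits, value, (h, c) = policy(torch.randn(1, 512), (h, c))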

