fraggle

Algorithm learns to play NES games

Recommended Posts

This is really impressive work. Skip to several minutes in to see the actual footage if you don't care too much about the technical details, or (if you do) read the actual paper.


Is this how TimeOfDeath records his demos? :-)


J/K, it's pretty impressive. The potential here is immense, if someone devises a more robust method of analyzing the state of an entire program and determining what its goals are in any given case.

At its extreme, it would become like an automated TAS tool that "knows" even the most intimate parts and functioning of a game (since it can literally peek at the entire memory), anticipating actions and exploiting controls down to superhuman precision. I wonder how long it would take such a system to find out e.g. the "play forever" patterns in Pac Man ;-)

I also recall that some game-cheating utilities worked somewhat this way, by taking snapshots of memory and reporting potentially interesting locations back to the player/hacker (e.g. take a snapshot, lose a life, take another, see what changed: bingo, you've found the life counter ;-). But this one is on steroids...
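That snapshot-diff trick is only a few lines of code. Here's a toy sketch (hypothetical byte dumps standing in for real emulator RAM):

```python
# Toy cheat-finder sketch: diff two RAM snapshots to find candidate
# addresses for a value that just changed (e.g. a life counter).

def diff_snapshots(before: bytes, after: bytes) -> list:
    """Return addresses whose byte value differs between snapshots."""
    return [addr for addr, (b, a) in enumerate(zip(before, after)) if b != a]

# Hypothetical 8-byte "RAM": the life counter lives at address 5
# and drops from 3 to 2 when we lose a life.
before = bytes([0, 17, 42, 0, 9, 3, 0, 255])
after  = bytes([0, 17, 42, 0, 9, 2, 0, 255])

candidates = diff_snapshots(before, after)
print(candidates)  # [5]
```

A real tool would re-run the diff across several snapshots to narrow the candidate list down, since lots of unrelated bytes change between any two frames.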


Heh, I like the way it ragequitted Tetris. "The only winning move is not to play", indeed.

This is really cool stuff. I wonder if it will ever be plausible to create a program that learns to play something much more complex, like an FPS?


The most impressive (or, from a certain point of view, the most disappointing) part is that no computer vision is involved at all, which would be a massive rat's nest in itself: it works by peeking where it's not supposed to, i.e. at the memory itself. Think about it: that's how computer opponents actually play. They "know" the game from inside information, and this often results in cries of "The computer is a damn cheater!"

Even an early FPS like Doom, however, has a far more complex status snapshot than the entirety of the NES's 2 KB of memory (which is why this approach worked), and I have no idea how the method's complexity would scale up.


It's particularly interesting because it's such a simple technique. It's basically using the fact that the game behaves in a deterministic way: it creates lots of random possible inputs, runs the game for each possible input and calculates a score for the future that would result. Then it picks the future with the best score. It's kind of like how computers play chess by calculating all the possible future moves, except in this case there's no human opponent. But it can only see a small distance ahead into the future, so it's not good at long-term planning, which is why it's quite good at platform games that just involve avoiding obstacles, but not very good at Tetris or Pacman, which require you to think ahead.
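A minimal sketch of that sampled-lookahead idea, using a made-up toy game interface (the paper's actual playfun system is considerably more elaborate, with weighted objective functions and backtracking):

```python
import random

def best_input_sequence(start_state, step_fn, score_fn,
                        n_candidates=100, horizon=10, buttons=(0, 1)):
    """Sample random button sequences, replay each from the same saved
    state (the game is deterministic, so replays are exact), and keep
    the sequence whose resulting future scores highest."""
    best_seq, best_score = None, float("-inf")
    for _ in range(n_candidates):
        seq = [random.choice(buttons) for _ in range(horizon)]
        state = start_state
        for inp in seq:
            state = step_fn(state, inp)  # deterministic simulation step
        score = score_fn(state)
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq, best_score

# Toy "game": state is Mario's x position; button 1 moves right.
step = lambda state, inp: state + (1 if inp == 1 else 0)
seq, score = best_input_sequence(0, step, lambda s: s, n_candidates=200)
print(score)  # close to the horizon of 10 with enough candidates
```

Note how the horizon limits planning: the search happily maximizes the next ten frames and has no concept of what comes after, which is exactly the short-sightedness described above.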

Theoretically I guess it should be possible to do something with Doom demos, but while it would probably be good at avoiding rockets and so on, finding the exit for levels probably wouldn't work so well.


In the case of Tetris, someone must tell it that the goal is to keep the stack low and to form lines. Believe it or not, it's not that obvious for first-time players.

I remember playing Tetris for the first time at the arcades. While I had watched older players, for some reason I didn't exactly understand the mechanics of forming lines, nor the goal of the game, so the very first time I played was a disaster.

Chess simulators, on the other hand, come partially or extensively pre-trained (at the very least they know the rules, can avoid common pitfalls, and can even draw upon databases of standard moves and openings).

Hell, some games like Moondust are nearly impossible to understand simply by any amount of watching:



And even real sports like baseball or cricket are nearly impossible to follow for someone who doesn't know all the subtle rules, like permitted hitting angles/directions, rules for running between bases etc.: it just looks like a bunch of guys hitting a ball with a stick, and sometimes it's good, sometimes it's not :-p


It's funny how it discovers trick moves like stomping enemies while jumping. I also liked how it pauses Tetris in order not to lose.

fraggle said:

the actual paper


I can't believe I read the whole thing.

The main interesting thing to me is the generality: the same algorithm can play multiple games, instead of being rigidly built for one unique purpose, as AI typically is. But the simple/"elegant" approach basically defines the winning goal as making variable values increase, regardless of the specific game, which doesn't turn out to be so general after all. It's basically normal AI that chooses the best among fairly random future possibilities, but with goals that may or may not be out of whack, because winning/losing is assumed to mean variables incrementing. Maybe if every single NES game used the same standardized overlay code to define what is "bad" and "good", the AI would know what to do in each specific game.
For Doom this might result in running northeast, just to increase the player's x and y variables (but then each enemy has an x and y too... actually I'm not sure how it handled Mario, since the Goombas' x/y variables would seem to interfere with Mario's).
Life seems to operate on the algorithm of increasing its numberOfReplications variable.
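That "find the variables that increase" idea can be approximated in a few lines (a crude toy; the real learnfun derives weighted lexicographic orderings over memory locations from a human playthrough, which is subtler):

```python
def increasing_locations(traces):
    """Given a list of RAM snapshots from a human playthrough (each a
    bytes object of the same length), return addresses whose value
    never decreases and increases at least once -- crude proxies for
    'score' or 'progress' counters."""
    candidates = []
    for addr in range(len(traces[0])):
        vals = [t[addr] for t in traces]
        nondecreasing = all(a <= b for a, b in zip(vals, vals[1:]))
        if nondecreasing and vals[-1] > vals[0]:
            candidates.append(addr)
    return candidates

# Toy trace: address 2 behaves like a score counter, address 0 is noise,
# address 1 never changes.
traces = [bytes([5, 0, 10]), bytes([3, 0, 12]),
          bytes([7, 0, 12]), bytes([1, 0, 20])]
print(increasing_locations(traces))  # [2]
```

This also shows why the objective can end up "out of whack": any counter that happens to only go up during the demo (a timer, a frame count) looks exactly like a score to this heuristic.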

Also, I've known of tom7's music for many years (videogame-ish music). I recommend "rutgers" and "alaplantine" here:
http://mp3.tom7.org/t7es/

Also this was interesting at the end: "human geniuses used tools to beat Mega Man (Mega Men?) 3, 4, 5, and 6 all at the same time using the exact same input sequence sent to all four games"


On the one hand, interesting design. On the other hand, the narration is fragging insufferable. "I just like this graph, it doesn't matter what it is." Thanks for the insight, dude.

fraggle said:

It's basically using the fact that the game behaves in a deterministic way: it creates lots of random possible inputs, runs the game for each possible input and calculates a score for the future that would result. Then it picks the future with the best score.

...

Theoretically I guess it should be possible to do something with Doom demos, but while it would probably be good at avoiding rockets and so on, finding the exit for levels probably wouldn't work so well.

An algorithm like this is roughly how Automatic Wolfenstein works, isn't it? Of course, playing wolf3d is a lot simpler than Doom: all you do is open doors (and pushwalls), move and shoot, with no switch-pulling or moving floors or ceilings. From the demos I've seen, Automatic Wolfenstein doesn't even bother switching weapons when it picks up a new one, even when it could e.g. use the knife to save ammo.


I wonder how long it's going to take for this program to become self-aware and start doing shitty internet game reviews.

Urban Space Cowboy said:

An algorithm like this is about how Automatic Wolfenstein works, isn't it? Of course playing wolf3d is a lot simpler than Doom: all you do is open doors (and pushwalls), move and shoot, with no switch-pulling or moving floors or ceilings. From the demos I've seen, Automatic Wolfenstein doesn't even bother switching weapons when it picks up a new one, even if it could e.g. use the knife to save ammo.

No, AutoWolf is just a deterministic state machine that uses graph traversal to locate targets and hunt them. No machine learning involved (until I get to support mods, but I'd have to adjust to ECWolf's different and ZDoomish codebase), no randomness either. It's just high-school stuff. The Git repository has an up-to-date version with slightly improved behaviour, I wish I knew how to make frequent releases without wasting my time.
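No machine learning indeed: that kind of deterministic target-hunting can be sketched as a plain breadth-first search over the tile grid (a toy illustration, not AutoWolf's actual code, which is of course far more involved):

```python
from collections import deque

def hunt(grid, start, is_target):
    """BFS from start to the nearest target tile; returns the shortest
    path as a list of (x, y) tiles, or None if no target is reachable.
    grid[y][x] is True where the bot can walk. Deterministic: same map,
    same start, same path every time."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        (x, y), path = queue.popleft()
        if is_target(x, y):
            return path
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if (0 <= ny < len(grid) and 0 <= nx < len(grid[0])
                    and grid[ny][nx] and (nx, ny) not in seen):
                seen.add((nx, ny))
                queue.append(((nx, ny), path + [(nx, ny)]))
    return None

# Toy 3x3 open map, hunting a target at tile (2, 2):
grid = [[True] * 3 for _ in range(3)]
path = hunt(grid, (0, 0), lambda x, y: (x, y) == (2, 2))
print(len(path))  # 5: start tile plus a shortest 4-step walk
```

High-school stuff, as the post says: no randomness, no training, just graph traversal.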


I think this might be able to handle something like Doom if you added some sensible cost functions whose results it could see.


If the program's ultimate goal is to get a high score, I can't imagine it would do well in score-less open-ended games.

Technician said:

I wonder how long it's going to take for this program to become self-aware and start doing shitty internet game reviews.

...Complaining about artificial difficulty.

Bucket said:

If the program's ultimate goal is to get a high score, I can't imagine it would do well in score-less open-ended games.


Well, that was more a kind of relative "laziness" on the part of the algorithm designer: in his ultimate quest to make it absolutely "preconception free", he omitted any goal-customization/guidance component (other than providing representative player input).

But in theory, the goal function could be anything: increasing, decreasing, periodic, or even advancing a complex state machine. In that case, though, there would need to be some sort of "game rules module".

E.g. in order to play an adventure game, that module would have to be nothing short of a replica of the adventure game's scripting system and general rules, plus a symbolic outline of a walkthrough/solution. Imagine trying to make it play Leisure Suit Larry: it would need to know what a "door" is, that it can be "OPENed", which verbs/actions it can perform, needs to perform, or are applicable, etc.

As I said, a game can be complex and non-obvious enough even for human players using higher reasoning: the chance of progressing just by randomizing inputs in such games becomes infinitesimal, so some sort of restriction/guidance will be needed. In other words, the AI needs to RTFM too ;-)

Technician said:

I wonder how long it's going to take for this program to become self-aware and start doing shitty internet game reviews.

The Schizophrenic Video Game Robot? (SVGR for short). Probably not long if they feed it with some LJN shit ;)

exp(x) said:

I think this might be able handle something like Doom if you added some sensible cost functions that it could see the results of.


I can see it dying in the nukage a lot.

Quasar said:

The Schizophrenic Video Game Robot? (SVGR for short). Probably not long if they feed it with some LJN shit ;)


I would love to see it try to play an LJN game.

Csonicgo said:

I can see it dying in the nukage a lot.


Or it could become a compulsive wall-humper/item-picker/linedef-triggerer, after it discovers that "shit happens" after those actions. Remember, it can "see" into the computer's memory, including relationships like sector and linedef tags that are invisible to humans. So IMO it would learn to avoid nukage pretty quickly, and it would also "unfairly" avoid non-obvious damaging floors ;-)

In theory, if fully trained, it could even run straight for switches that would be hard for a human player to find without prior knowledge (again, remember, NOTHING would be hidden from it), and no doubt it could become nearly invincible in player-to-monster combat (even with tysoning, dodging and landing hits with tic-precision).

The problem is that Doom is so complex, and so many things change in main memory as you move, that guessing the correct set of inputs or goals by random search alone would probably be computationally intractable.

In the case of the NES you have a 2 KB memory, which is 16384 bits, for 2^16384 possible states! Luckily, for the purpose of keeping game state, only a small subset of it changes, and an even smaller subset changes frequently, so the optimization functions can focus on that.

Even if you assume that in Mario you have 100 bytes that change constantly, that's 800 bits, or 2^800 states, to follow... which is at least reasonable as input for an optimization system, whether neural-network-based or otherwise.
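A quick sanity check on those counts (Python's arbitrary-precision integers make this painless):

```python
# 2 KB of NES RAM = 2048 bytes = 16384 bits -> 2**16384 raw states.
nes_ram_bits = 2048 * 8
assert nes_ram_bits == 16384

# If only ~100 bytes actually matter, that's 800 bits -> 2**800 states.
active_bits = 100 * 8
assert active_bits == 800

# Both are astronomical, but 2**800 is tiny next to 2**16384:
print(len(str(2 ** 800)))    # 241 decimal digits
print(len(str(2 ** 16384)))  # 4933 decimal digits
```

Either way the search never enumerates states; it only ever scores the handful of futures it actually simulates.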

But with DOOM? A game that needs a minimum of 4 MB of working RAM? Good luck with that... it would only be viable if the algorithm managed to "cut out" large, confusing parts, e.g. the video buffer (unless it could learn to play Doom by pixel-vision...), static data etc., and only followed player and monster status. Even then, there's simply a lot of state to keep track of (in particular, the monster states might be very confusing/misleading). Eventually it might learn that getting them all into their "dead" state is good, that fiddling with switches opens doors, and that opening doors leads to new rooms.

Edit: Hmm...

Maes's compulsive Dooming algorithm for (not so) intelligent bots and occasionally for human Doomers:

  • Some linedef or sector tag in the level triggers a level exit. Find it. That is your ultimate goal.
  • Always run at constant SR50 speed, as this way you can clear more obstacles.
  • Move about randomly in rooms, disposing of annoying monsters, and avoiding crossing damaging sectors as much as possible.
  • If you encounter an impassable wall/door, check for sector/linedef tags. If there are (and are within your currently reachable areas), temporarily change mission goal to locating and activating them. In theory, this should open up new passages.
  • If a door or object requires a key/keycard, temporarily change mission goal to locating that key/keycard.
  • Reachability is defined per-sector in a recursive way and dynamically mapped. A sector is reachable iff at least one of its adjacent sectors is reachable (you can walk to it or otherwise transport there) and there are no height or impassability constraints on all of its linedefs, or the sector is reachable through other means (e.g. teleportation, or it's within SR50 jump landing distance from reachable sectors within a certain Euclidean radius, taking ballistics into account. Hah! Take that, speedrunners!).
  • If a certain sector contains a teleporter destination, then it's reachable from all sectors containing a linedef with the same tag.


There :-p Now go implement it.
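OK, a rough sketch of just the goal-stack part (the "temporarily change mission goal" rule). All the level-query hooks here are hypothetical stand-ins, stubbed with a toy level so the control flow can actually run:

```python
def plan(goals, is_reachable, prerequisites):
    """Pop goals off a stack; whenever the top goal is blocked, push its
    prerequisite (a switch, a key...) on top and retry. Returns goals in
    the order they get completed. Assumes prerequisites don't cycle."""
    done = []
    stack = list(goals)
    while stack:
        goal = stack[-1]
        if is_reachable(goal, done):
            done.append(stack.pop())
        else:
            # Detour: the blocker becomes the new temporary mission goal.
            stack.append(prerequisites[goal])
    return done

# Toy level: the exit needs a switch thrown, the switch needs a key.
prereq = {"exit": "switch", "switch": "key"}
reachable = lambda g, done: g == "key" or prereq.get(g) in done
print(plan(["exit"], reachable, prereq))  # ['key', 'switch', 'exit']
```

The recursive reachability mapping and the SR50 ballistics are left as an exercise for the speedrunners :-p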








I doubt the program could recognize differences in textures well enough to figure out what's a switch or a door without chronic wall-humping.

Technician said:

I doubt the program can recognize the differences in textures to possibly figure out what's a switch or a door without chronic wall-humping.


Doesn't need to, it can "peek" at linedef attributes directly ;-) It would even be able to detect unmarked switches, a 1994 favorite, and secret doors would have nothing secret for it.


Pretty much. The idea is not to make an AI bot with environment awareness, the idea is to exploit the code for the best possible outcome.

Maes said:

Doesn't need to, it can "peek" at linedef attributes directly ;-) It would even be able to detect unmarked switches, a 1994 favorite, and secret doors would have nothing secret for it.

Hans Gruber said:

When Alexander saw the breadth of his domain, he wept for there were no more worlds to conquer.


Wow... this guy catches up to what Nintendo made for New Super Mario Bros. Personally I'd have it recognize pits, coins, enemies and deal with them accordingly.

geo said:

Wow... this guy catches up to what Nintendo made for New Super Mario Bros. Personally I'd have it recognize pits, coins, enemies and deal with them accordingly.


The point is to make an algorithm that does not have to be specifically tuned to the mechanics of any one game, or even actually understand them. As it is, this approach has its limitations: the more obscure a game's goals, and the less progress feedback or immediate response to input it provides, the less success such an algorithm will have. Kinda like a 13 yo kid with ADD ;-)

Perhaps the ultimate test in pathological gaming would be Blue Ice. A complete sphinx :-p


I'm taking some data mining courses at school right now, and I wonder how much of that could be applied to a Doom bot. Could classification techniques be applied to, say, thing state chains, to infer that actors are monsters (of various threat levels), items, pseudo-monsters, or something else? Same goes for weapon state chains, to determine when it's optimal to use each weapon.

