I am developing a chess game and at the moment I'm trying to implement a minimax algorithm. I haven't done this before, also the little i known about how to programmatically represent and implement the following evaluation function features(material, mobility, piece square table, centre control, trapped piece, king safety, tempo and pawn structure) is not quite clear to me (I will be grateful if someone can explain to me in detail). I have been able to assign values to each chess pieces, piece action values and a square table for each piece. The problem am having at the moment is how to generate Piece attacked and defended values which will be added or subtracted from the score. The idea here is that i want to reward the AI agent for protecting its pieces and penalize it for having the pieces attacked. thanks in advances.
Each of the evaluation features you mentioned will take up compute time. As you may already be aware, playing strength of a chess engine comes from two sources:
Search
Evaluation
And both contend for the same valuable resource, compute time. Evaluation tends to be heuristics based and hence a bit fuzzy, whereas search tends to yield more concrete and relevant results. If you are starting to build an engine then I would recommend focusing on search while keeping evaluation basic (but not weak!). That way you will be able to tell exactly where something went wrong and hence avoid possible early disappointments. Moreover, popular engines like Stockfish also started out by first building a strong search algorithm.
If you've been patient enough to read this far, let me point you to two useful resources for evaluation:
Chess Programming Wiki's evaluation page: This website is probably the best online resource for chess engine development in general.
Link to a basic but not weak evaluation function: This is C# code. Unfortunately I can't find the original article that I based this evaluation on.
Hope it helps :)
I think that you shouldn't include the computation on attack and defended pieces. That functionality is already taken into account by the minmax algorithm in a more efficient way.
A piece in under attack if at the following move the opponent can take it. If you try to evaluate this possibility in a static evaluation function you will get into troubles if you want to do it correctly. If my protected pawn is taken by the opponent queen that is not an issue. How do you take this into account? If my queen is taken by the opposite pawn but moving the pawn puts the king under attack?
These considerations are better managed by the minmax algorithm, not the evaluator. Consider that to know how many pieces you can eat/can be eaten, you should take into account all possible moves and you probably would spend the same time that would be used to go one level deeper in the minmax algorithm. Moreover that time is wasted if you later decide to indeed proceed one step further in the minmax.
Related
I have a challenge that my teacher gave to beat an army of his soldiers on a 18x24 grid, with random obstacles placed on the board. The game is turn based and I have an army of 50 soldiers, each of which needs to either move or attack on their turn.
My problem is I only have access to creating a class of soldiers to fight in this environment. Currently I have a method that evaluates the board position by looking at how many soldiers there are left from each team and does yourTeam - enemyTeam to get the current score, and I have a method that will produce the legal moves for the soldier.
I want to know how I would create a reinforcement learning agent in Java with what I have access to. If you know any ways to do this or any resources that may help that would be great. Thank you for the help!
Java is not a good language for doing math heavy computation (which is what you will need to do for RL). You could attempt to implement the Q-Learning, value-iteration or policy-iteration algorithms but I would avoid doing anything with neural networks/modern deep RL approaches here as your work load will increase dramatically.
With regard to your problem, if you are to implement one of the old-school algorithms. Think about your state and action space. I have serious concerns about the size of your action space, even with a small number of moves for each solider (say 3 - attack, move up, move down) with 50 soldiers the action space will be very large - 50^3, even this many will be difficult to deal with, any more (even 4 or 5) will send you deep into some complex topics in RL.
Other problems are - defining a good reward signal, efficiently running (potentially millions) of simulated games.
The short answer is, this is not something to be taken lightly, it would be challenging and time consuming even for someone who has experience in the field and using Java is a no-no (Python is better). Given you probably don't have long to find a good solution, I would recommend trying a different approach - planning based maybe, or hard coding a reasonable strategy.
If you still want to go ahead and read up on the topic here are some good resources:
Reinforcement Learning an Introduction (Sutton & Barto) - any edition is fine
Selected chapters in Artificial Intelligence: A Modern Approach (Russel & Norvig)
Hope this helps and sorry it may not have been the answer you we hoping for!
we are working on a little Java game, based on the game Blokus.
Blokus-Manual
I'm a Java-beginner and plan to implement an advanced artificial intelligence. We already have a random AI (picks a random valid move) and an AI with a simple move-rating mechanism. We also want an AI which should be as good as possible (or at least very good ;) ).
The question is: Which AI-concept would be suitable for our purpose?
The minimax-algorithm seems to be a valid choice, but how do you adapt it to a 4-player-game? Are there better concepts for a game like blokus?
Thanks already :)
Min-max is hard to implement in a 4 player game because:
Decision tree grows exponentially, so you're going to be bounded by memory and/or computation time to a log(medMoves)=N steps. For a 4 player game, this is down to N/4. If N is 8 for example, you're only going to be able to see 2 moves ahead for each player.
Player collusion is hard to account for. In a realistic game, some players might help each other out (even if they're not on the same team). This will cause them to deviate from their personal 'maximum'.
If you want Minmax, you're going to have to do a lot of pruning to make it viable. What I would suggest is learning a few patterns so the AI would know how to react. This can be done via neural net, or reinforcement learning with a few tweaks.
These patterns could be be static (you can create the input scenario manually or programmatically), or dynamic (create all valid scenarios and randomly makes moves select the ones with the best score).
Theoretically speaking, an "as good as possible AI" is a perfect AI, which is an AI that has, at any moment in the game, full knowledge of the game state (if the full game state is not known by human players). In case of games that everyone has full game state knowledge (like Blokus), a good as possible AI is an AI that can try to predict the very best move to make (minimax here, as you said). You can also google for genetic algorithms and simulated annealing, as they are valid, depending on what you want. Also, you can use minimax for more than 2 players.
I would recommend minimax algorithm. One thing you can add to make it more efficient (meaning you should be able go more moves deep into the future) is alpha-beta pruning.
The problem with minimax search is that the number of games states it has to examine is exponential in the depth of the tree. Unfortunately, we can't eliminate the exponent, but it turns out we can effectively cut it in half.
The quote is from Chapter 5.3 of Artificial Intelligence: A Modern Approach third edition by Stuart Russel and Peter Norvig. It was holding up my monitor, and I used it in a few of my classes in college. I know people don't often reference books on SO, but it's extremely relevant. I have used it extensively, and I do really recommend it for both being understandable, and covering a wide range of AI content.
It is available on amazon for $104, or * cough cough * I'm sure you can find it online if you don't have that kind of money for a textbook floating around. Looking up the minimax algorithm and alpha beta pruning online should also get you good results.
I think the only circumstance that would make Minimax a poor option for you is if the game state is only partially observable to any given player (they don't know everything about what's going on), or if the game is non-deterministic (it has random elements). Because neither of these are the case for Blokus, I think you made an excellent choice with Minimax.
The area of AI is called Adversarial Search in the textbook (Chapter 5: Adversarial Search), so looking up more info online with that term may get you more helpful information, or help you find an example Java implementation. I do not consider this a beginner's task, but it sounds like you are up to it, if you made the game and can pick random valid moves. Keep up the good work!
In 2011, with many updates since then, a program called Pentobi
was released, and it is a very strong Blokus playing program.
The only one known to date, in fact, which is any good at all, and it
surpasses all the others by a great deal. It will beat many good human players and gives even the best a run for their money.
Its main algorithm is Monte Carlo Search Tree, but it also uses a "book" of openings and some heuristics.
There is documentation and download information at
http://pentobi.sourceforge.net/
I found that using a very simple heuristic provides a fairly intelligent player even using only 1-step look ahead. I implemented what I called "space heuristic" which takes the board state and floods it by coloring all squares adjacent to each placed piece the color of that piece. Then, the total number of colored squares is counted once the flooding terminates. The space heuristic gives a rough estimate of how much a play claims or occupies board space, and way outperforms random play. Could be combined with minimax or MCTS to get way stronger as well.
I'm going be competing in a board game AI competition at my school and am trying to come up with some ideas for concurrency to gain an edge. I will most likely be at a disadvantage because I will be implementing it in java and I understand c or c++ would be much faster.
It doesn't seem like you could just split the game tree in half because of the move ordering which should leave the best moves first and it seems that it would be difficult or maybe even impossible to communicate the current alpha/beta at a given depth. I'm going to be using transposition tables as well which would need to be synchronized.
Besides searching, is there something that a second thread could be doing which could aid in the search or provide some type of speed increase. Each AI will have 5 seconds to make a move and your program can be working while the opponent is thinking.
Any input, no matter how obscure, would be appreciated.
An overview can be found in the Chess Programming Wiki's parallel search article. Even if your actual game is not chess, many concepts will also apply. The site also covers sophisticated solutions for shared transposition tables.
However, when you don't have much time, I would not start with a parallel search. You are correct that parallelism can increase the strength of the search algorithm. It is very difficult to get it right, though, and the benefits are way lower than one would expect.
If you want to experiment with parallelism, go ahead. It is an interesting topic. However, if you just want to get the best results in a limited amount of time, I would recommend to stick with a sequential search, and instead focus on move ordering and correctness.
It is possible. You have to make communication between threads to have AB prunning help. Also, move ordering must be tweaked, it doesn't help if one thread has the best-rated moves to analyze while the others not.
This is my first question here, if I did something wrong, tell me...
I'm currently making a draughts game in Java. In fact everything works except the AI.
The AI is at the moment single threaded, using minimax and alpha-beta pruning. This code works, I think, it's just very slow, I can only go 5 deep into my game tree.
I have a function that recieves my mainboard, a depth (starts at 0) and a maxdepth. At this maxdepth it stops, returns the player's value (-1,1 or 0) with the most pieces on the board and ends the recursive call.
If maxdepth isn't reached yet, I calculate all the possible moves, I execute them one by one, storing my changes to the mainboard in someway.
I also use alpha-beta pruning, e.g. when I found a move that can make the player win I don't bother about the next possible moves.
I calculate the next set of moves from that mainboard state recursively. I undo those changes (from point 2) when coming out of the recursive call. I store the values returned by those recursive calls and use minimax on those.
That's the situation, now I have some questions.
I'd like to go deeper into my game tree, thus I have to diminish the time it takes to calculate moves.
Is it normal that the values of the possible moves of the AI (e.g. the moves that the AI can choose between) are always 0? Or will this change if I can go deeper into the recursion? Since at this moment I can only go 5 deep (maxdepth) into my recursion because otherwise it takes way too long.
I don't know if it's usefull, but how I can convert this recursion into a multithreaded recursion. I think this can divide the working time by some value...
Can someone help me with this please?
1. Is it normal that the values of the possible moves of the AI (e.g. the moves that the AI can choose between) are always 0?
Sounds strange to me. If the number of possible moves is 0, then that player can't play his turn. This shouldn't be very common, or have I misunderstood something?
If the value you're referring to represents the "score" of that move, then obviously "always 0" would indicate that all move are equally good, which obviously doesn't make a very good AI algorithm.
2. I don't know if it's usefull, but how I can convert this recursion into a multithreaded recursion. I think this can divide the working time by some value...
I'm sure it would be very useful, especially considering that most machines have several cores these days.
What makes it complicated is your "try a move, record it, undo it, try next move" approach. This indicates that you're working with a mutable data structure, which makes it extremely complicated to paralellize the algorithm.
If I were you, I would let the bord / game state be represented by an immutable data structure. You could then let each recursive call be treated as a separate task, and use a pool of threads to process them. You would get close to maximum utilization of the CPU(s) and at the same time simplify the code considerably (by removing the whole restore-to-previous-state code).
Assuming you do indeed have several cores on your machine, this could potentially allow you to go deeper in the tree.
I would strongly recommend reading this book:
One Jump Ahead: Computer Perfection At Checkers
It will give you a deep history about computer AI in the game of Checkers and will probably given you some help with your evaluation function.
Instead of having an evaluation function that just gives 1/0/-1 for differing pieces, give a score of 100 for every regular piece and 200 for a king. Then give bonuses for piece structures. For instance, if my pieces form a safe structure that can't be captured, then I get a bonus. If my piece is all alone in the middle of the board, then I get a negative bonus. It is this richness of features for piece configurations that will allow your program to play well. The final score is the difference in the evaluation for both players.
Also, you shouldn't stop your search at a uniform depth. A quiescence search extends search until the board is "quiet". In the case of Checkers, this means that there are no forced captures on the board. If you don't do this, your program will play extremely poorly.
As others have suggested, transposition tables will do a great job of reducing the size of your search tree, although the program will run slightly slower. I would also recommend the history heuristic, which is easy to program and will greatly improve the ordering of moves in the tree. (Google history heuristic for more information on this.)
Finally, the representation of your board can make a big difference. Fast implementations of search do not make copies of the board each time a move is applied, instead they try to quickly modify the board to apply and undo moves.
(I assume by draughts you mean what we would call checkers here in the States.)
I'm not sure if I understand your scoring system inside the game tree. Are you scoring by saying, "Position scores 1 point if player has more pieces than the opponent, -1 point is player has fewer pieces, 0 points if they have the same number of pieces?"
If so, then your algorithm might just be capture averse for the first five moves, or things are working out so that all captures are balanced. I'm not deeply familiar with checkers, but it doesn't seem impossible that this is so for only five moves into the game. And if it's only 5 plies (where a ply is one player's move, rather than a complete set of opposing moves) maybe its not unusual at all.
You might want to test this by feeding in a board position where you know absolutely the right answer, perhaps something with only two checkers on the board with one in a position to capture.
As a matter of general principle, though, the board evaluation function doesn't make a lot of sense-- it ignores the difference between a piece and a crowned piece, and it treats a three piece advantage the same as a one piece advantage.
I want to implement a reinforcement learning connect four agent.
I am unsure how to do so and how it should look. I am familiar with the theoretical aspects of reinforcement learning but don't know how they should be implemented.
How should it be done?
Should I use TD(lambda) or Q-learning, and how do MinMax trees come in to this?
How does my Q and V functions work (Quality of action and Value of state). How do I score those things? What is my base policy which I improve, and what is my model?
Another thing is how should I save the states or statesXactions (depending on the learning algorithm). Should I use neural networks or not? And if yes, how?
I am using JAVA.
Thanks.
This might be a more difficult problem than you think, and here is why:
The action space for the game is the choice of column to drop a piece into. The state space for the game is an MxN grid. Each column contains up to M pieces distributed among the 2 players.This means there are (2M+1-1)N states. For a standard 6x7 board, this comes out to about 1015. It follows that you cannot apply reinforement learning to the problem directly. The state value function is not smooth, so naĆve function approximation would not work.
But not all is lost. For one thing, you could simplify the problem by separating the action space. If you consider the value of each column separately, based on the two columns next to it, you reduce N to 3 and the state space size to 106. Now, this is very manageable. You can create an array to represent this value function and update it using a simple RL algorithm, such as SARSA.
Note, that the payoff for the game is very delayed, so you might want to use eligibility traces to accelerate learning.