I'm trying to build a game of Connect 4 with minimax (and alpha-beta pruning), mostly to prove to myself that I can do it. However, the one big conceptual problem I'm having is how to actually utilize the minimax algorithm. The way I do it is that I have an AI class with a single function that performs the minimax algorithm and returns an int.
public int minimax(Board board, int depth, int alpha, int beta, String player) {
    if (depth == 0 || board.getScore() >= 512) {
        return board.getScore();
    }
    else if (player.equals("computer")) {
        int temp = -1000000;
        for (Integer[] moves : board.availableMoves) {
            board.putPiece(player, moves[0]);
            temp = Math.max(temp, minimax(board, depth-1, alpha, beta, "human"));
            board.removePiece(moves[0], moves[1]);
            alpha = Math.max(alpha, temp);
            if (alpha >= beta) {
                break;
            }
        }
        return temp;
    }
    else {
        int temp = 1000000;
        for (Integer[] moves : board.availableMoves) {
            board.putPiece(player, moves[0]);
            temp = Math.min(temp, minimax(board, depth+1, alpha, beta, "computer"));
            board.removePiece(moves[0], moves[1]);
            beta = Math.min(beta, temp);
            if (alpha >= beta) {
                break;
            }
        }
        return temp;
    }
}
This is called from a method of the Game class, computerMove():
public int computerMove() {
    Board tempBoard = board;
    int bestMove = 0;
    AI ai = new AI();
    ai.minimax(board, difficulty, -1000000, 1000000, "computer");
    return bestMove;
}
But what do I do with the int that is returned? How do I use it to actually make the move? The int that is returned is simply the best score I could reach, right? It tells me nothing about which move or board position I should actually play.
Any and all help is greatly appreciated.
Thanks,
The books all say to return just the score, but that's impractical for actually playing the game. Of course, the overhead of maintaining the best move everywhere can really slow down the program, so generally you use a driver function that does the first level of expansion itself and additionally keeps track of the best move. This effectively wraps the implementation in an argmax function, which is just a fancy way of saying it returns the best move at the top level instead of the score. You can see an example of this in a little project I worked on last year. The code is in C# but it's close enough to Java for you to get the idea.
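For your Connect 4 code, such a driver could look roughly like this. This is only a sketch: it assumes your Board methods (availableMoves, putPiece, removePiece) and your minimax() behave exactly as in the snippet above, and it returns the column instead of the score.

// Sketch of an argmax driver: do the first ply here, remember which column
// produced the best score, and let minimax() handle everything below it.
public int bestMove(Board board, int depth) {
    int bestScore = -1000000;
    int bestColumn = -1;
    int alpha = -1000000;
    int beta = 1000000;
    for (Integer[] move : board.availableMoves) {
        board.putPiece("computer", move[0]);
        int score = minimax(board, depth - 1, alpha, beta, "human");
        board.removePiece(move[0], move[1]);
        if (score > bestScore) {
            bestScore = score;
            bestColumn = move[0];   // remember the move, not just its score
        }
        alpha = Math.max(alpha, bestScore);
    }
    return bestColumn;
}

computerMove() can then simply drop a piece in the returned column.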
Alternatively, you can modify the code to return a tuple (a small class with a score field and a move field) instead of a bare int. This is easier (and a little cleaner, IMO) than writing the argmax wrapper, but without some extra engineering it will probably slow the minimax function down noticeably, because it results in far more allocations. If performance isn't your top priority, this is probably the way to go.
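A minimal sketch of such a result class (the field names are mine, not from your code):

// Small value class returned by minimax instead of a bare int.
public class SearchResult {
    final int score;    // minimax value of the position
    final int column;   // the move (column) that produced that value

    SearchResult(int score, int column) {
        this.score = score;
        this.column = column;
    }
}

Each recursive call then builds its result from the best child's score plus the move that led to it, and computerMove() simply plays result.column.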
I should also point out that your implementation has at least one bug. The depth should always decrease regardless of who is playing, but in your human branch you increase it. This means the depth will never reach 0, and the base case will only be hit when a player is determined to be the winner. Additionally, when using alpha-beta it's important that the board evaluation knows whose turn it is and who is the maximizing player, or you'll run into lots of hard-to-find bugs. You don't show that code here, but I want to point it out because it gets me every time.
First off I'm sorry for the slightly incorrect title, I just didn't want it to be 30 words long.
The alpha-beta pruning I implemented enormously reduced the number of evaluations when I applied it to my TicTacToe game; see for yourself below.
Each pair of evaluation counts is measured with the same game state as input.
The problem arises when I apply the pruning to the checkers-playing neural network I've been working on, which was the goal of this whole thing to begin with; I only whipped up the TicTacToe game to experiment with MiniMax + alpha-beta, as I'd never dealt with these algorithms before.
Here is the same sort of experiment with the NN.
Now for the code (the checkers one; let me know if you want to have a peek at the TicTacToe version, though they are almost identical).
I'll paste the beginning of both methods only once, as it is absolutely identical, but I'll show both signatures since they differ slightly.
A small note to make the code clearer:
Board is the object which keeps track of the pieces, the available moves, whose turn it is, whether the game has been won/drawn, etc.
Move is the object which contains all information pertinent to a move; when I make the clone in the first line of the method, I simply clone the given board and the constructor applies the given move to it.
private double miniMax(Board b, Move m, int depth) {
and
private double alphaBeta(Board b, Move m, int depth, double alpha, double beta) {
beginning of both methods:
    Testboard clone = new Testboard(b, m);
    // Making a clone of the board in order to
    // avoid making changes to the original one

    if (clone.isGameOver()) {
        if (clone.getLoser() == null)
            // It's a draw, evaluation = 0
            return 0;
        if (clone.getLoser() == Color.BLACK)
            // White (Max) won, evaluation = 1
            return 1;
        // Black (Min) won, evaluation = -1
        return -1;
    }

    if (depth == 0)
        // Reached the end of the search, returning current evaluation of the board
        return getEvaluation(clone);
Regular MiniMax continuation:
    // If it's not game over
    if (clone.getTurn() == Color.WHITE) {
        // It's white's turn (Max player)
        double max = -1;
        for (Move move : clone.getMoves()) {
            // For each child node (available move)
            // its minimax value is calculated
            double score = miniMax(clone, move, depth-1);
            // Only the highest score is stored
            if (score > max)
                max = score;
        }
        // And is returned
        return max;
    }
    // It's black's turn (Min player)
    double min = 1;
    for (Move move : clone.getMoves()) {
        // For each child node (available move)
        // its minimax value is calculated
        double score = miniMax(clone, move, depth-1);
        // Only the lowest score is stored
        if (score < min)
            min = score;
    }
    // And is returned
    return min;
}
MiniMax with Alpha/Beta pruning continuation:
    // If it's not game over
    if (clone.getTurn() == Color.WHITE) {
        // It's white's turn (Max player)
        for (Move move : clone.getMoves()) {
            // For each child node (available move)
            // its minimax value is calculated
            double score = alphaBeta(clone, move, depth-1, alpha, beta);
            if (score > alpha)
                // If this score is greater than alpha,
                // it becomes the new highest score
                alpha = score;
            if (alpha >= beta)
                // The loop is cut off early if alpha is greater than or equal to beta
                break;
        }
        // The alpha value is returned
        return alpha;
    }
    // It's black's turn (Min player)
    for (Move move : clone.getMoves()) {
        // For each child node (available move)
        // its minimax value is calculated
        double score = alphaBeta(clone, move, depth-1, alpha, beta);
        if (score < beta)
            // If this score is lower than beta,
            // it becomes the new lowest score
            beta = score;
        if (alpha >= beta)
            // The loop is cut off early if alpha is greater than or equal to beta
            break;
    }
    // The beta value is returned
    return beta;
}
I'm honestly stuck, and I'm not sure what I could do to figure out what's going on. I've tried MiniMax + A/B on several different (even randomly generated) neural networks, but I've never seen an improvement in the number of evaluations made. I hope someone here can shed some light on this situation, thanks!
Thanks to @maraca for helping me figure this out; I'm going to answer myself since he only replied with a comment.
There is nothing wrong with the code that I posted; the problem lies with the evaluation function that I was using once the search reached the depth limit.
I was using a still-untrained neural network that was essentially just spitting out random values, which forced MiniMax + A/B to go through all of the nodes: there was no consistency in the evaluations, and it turns out that such consistency is exactly what pruning needs in order to happen.
I have created a game board (5x5) and I now want to decide as fast as possible whether a move is legal. For example, a piece at (0,0) wants to go to (1,1): is that legal? First I tried to work this out with computations, but that seemed cumbersome. I would rather hard-code the possible moves for each position on the board and then iterate through all the possible moves to see whether they match the piece's destination. I'm having trouble getting this down on paper. This is what I would like:
//game piece is at 0,0 now, decide if 1,1 is legal
Point destination = new Point(1,1);
destination.findIn(legalMoves[0][0]);
The first problem I face is that I don't know how to put a list of possible moves in an array at, for example, index [0][0]. This must be fairly obvious, but I have been stuck on it for some time. I would like to create an array in which each cell holds a list of Point objects. So in semi-code: legalMoves[0][0] = {Point(1,1),Point(0,1),Point(1,0)}
I am not sure if this is efficient, but it makes more sense logically than, say, [[1,1],[0,1],[1,0]]; I'm not sold on it either way.
The second problem I have is that instead of creating the object at every start of the game as an instance variable legalMoves, I would rather read it from disk. I think it should be quicker this way? Is Serializable the way to go?
My third, smaller problem is that the legal moves for the 25 positions are unbalanced: some positions have 8 possible legal moves, others have 3. Maybe this is not a problem at all.
You are looking for a structure that will give you the candidates for a given point, i.e. Point -> List<Point>.
Typically, I would go for a Map<Point, List<Point>>.
You can initialise this structure statically at program start or lazily when needed. For instance, here I use two helper arrays that contain the possible translations from a point; these will yield the neighbours of the point.
// (-1 1) (0 1) (1 1)
// (-1 0) (----) (1 0)
// (-1 -1) (0 -1) (1 -1)
// from (1 0) anti-clockwise:
static int[] xOffset = {1,1,0,-1,-1,-1,0,1};
static int[] yOffset = {0,1,1,1,0,-1,-1,-1};
The following Map contains the actual neighbours for a Point, together with a function that computes, stores and returns these neighbours. You could choose to initialise all neighbours in one pass instead, but given the small numbers, I would not expect the lazy approach to be a problem performance-wise.
static Map<Point, List<Point>> neighbours = new HashMap<>();

static List<Point> getNeighbours(Point a) {
    List<Point> nb = neighbours.get(a);
    if (nb == null) {
        nb = new ArrayList<>(xOffset.length); // size the list
        for (int i = 0; i < xOffset.length; i++) {
            int x = a.getX() + xOffset[i];
            int y = a.getY() + yOffset[i];
            if (x >= 0 && y >= 0 && x < 5 && y < 5) {
                nb.add(new Point(x, y));
            }
        }
        neighbours.put(a, nb);
    }
    return nb;
}
Now checking a legal move is a matter of finding the point in the neighbours:
static boolean isLegalMove(Point from, Point to) {
    boolean legal = false;
    for (Point p : getNeighbours(from)) {
        if (p.equals(to)) {
            legal = true;
            break;
        }
    }
    return legal;
}
Note: the class Point must define equals() and hashCode() for the map to behave as expected.
The first problem I face is that I don't know how to put a list of possible moves in an array at for example index [0][0]
Since the board is 2D, and the number of legal moves could generally be more than one, you would end up with a 3D data structure:
Point[][][] legalMoves = new Point[5][5][];
legalMoves[0][0] = new Point[] { new Point(1,1), new Point(0,1), new Point(1,0) };
instead of creating the object at every start of the game with an instance variable legalMoves, I would rather have it read from disk. I think that it should be quicker this way? Is the serializable class the way to go?
This cannot be answered without profiling. I cannot imagine that computing legal moves of any kind for a 5x5 board could be so computationally intensive as to justify any kind of additional I/O operation.
for the 25 positions the legal moves are unbalanced. Some have 8 possible legal moves, others have 3. Maybe this is not a problem at all.
This can be handled nicely with the 3D "jagged array" described above, so it is not a problem at all.
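If you do want to precompute the whole structure once, a rough sketch is below. It reuses the neighbour-offset idea from the other answer; the helper name and the temporary List used during construction are mine, and it assumes java.util.List/ArrayList plus a Point class with an (x, y) constructor.

// Hypothetical one-time initialisation of the jagged array for the 5x5 board,
// using the eight neighbour offsets from the other answer.
static Point[][][] buildLegalMoves() {
    int[] dx = {1, 1, 0, -1, -1, -1, 0, 1};
    int[] dy = {0, 1, 1, 1, 0, -1, -1, -1};
    Point[][][] moves = new Point[5][5][];
    for (int x = 0; x < 5; x++) {
        for (int y = 0; y < 5; y++) {
            List<Point> list = new ArrayList<>();
            for (int i = 0; i < dx.length; i++) {
                int nx = x + dx[i], ny = y + dy[i];
                if (nx >= 0 && ny >= 0 && nx < 5 && ny < 5) {
                    list.add(new Point(nx, ny));
                }
            }
            moves[x][y] = list.toArray(new Point[0]);
        }
    }
    return moves;
}

A legality check is then a scan of at most 8 entries per square, so the imbalance between corner squares (3 moves) and centre squares (8 moves) is handled for free.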
I am trying to implement a chess game with alpha beta pruning. The following is almost working, but it returns wrong moves.
For example, the following can occur.
White (user) to move, white king position - a1 / Black (computer), black king position - h1
White moves its king from a1 to a2, then black returns the move g2 - g1???
It appears that the computer returns a move for the wrong node (board representation), as if the best evaluation of a given board position is not being propagated all the way back up the tree. So in one of the simulated positions explored, the computer "imagines" its king moving to g2 and then returns the move to be made from this position, not realising that this position is a simulated position and not the representation of the actual board (the root node?).
How can I correct the code to make the computer return a move for the actual board representation and not one of the simulations by mistake?
Thank you.
Initial call alphaBeta(3, ChessEngine.invertBoard(ChessEngine.board), -10000, 10000, true);
private static int alphaBetaEvaluate = 0;
private static int alphaBetaSelectedSquare = 0;
private static int alphaBetaMoveToSquare = 0;

public static int alphaBeta(int depth, char[] board, int alpha, int beta, boolean maxPlayer) {
    //create a copy of the board
    char[] boardCopy = board.clone();

    //if terminal state has not been met, keep searching
    if (maxPlayer == true && depth > 0) {
        //for all of the moves that max can make
        for (int i = 0; i < board.length; i++) {
            for (int move : ChessEngine.getValidMoves(i, boardCopy)) {
                //make the move
                boardCopy[move] = boardCopy[i];
                boardCopy[i] = '.';
                alphaBetaEvaluate = rating(board, boardCopy, i, move);

                //store the best move to make
                int temp = alphaBeta(--depth, ChessEngine.invertBoard(boardCopy), -10000, 10000, false);
                if (temp > alpha) {
                    alphaBetaSelectedSquare = i;
                    alphaBetaMoveToSquare = move;
                    alpha = temp;
                }

                //reset the board for the next simulated move
                boardCopy = board.clone();
                if (beta <= alpha) {
                    break;
                }
            }
        }
        return alpha;
    } else if (maxPlayer == false && depth > 0) {
        //for all of the moves that min can make
        for (int i = 0; i < board.length; i++) {
            for (int move : ChessEngine.getValidMoves(i, boardCopy)) {
                //make the move
                boardCopy[move] = boardCopy[i];
                boardCopy[i] = '.';
                beta = Math.min(beta, alphaBeta(--depth, ChessEngine.invertBoard(boardCopy), -10000, 10000, true));

                //reset the board for the next simulated move
                boardCopy = board.clone();
                if (beta <= alpha) {
                    break;
                }
            }
        }
        return beta;
    }
    return alphaBetaEvaluate;
}
I don't quite follow your implementation. First of all, what you want to do is build a decision tree and propagate the decision back up. You want to maximize your evaluation, while expecting that the enemy will select the move that minimizes your evaluation in return.
So inverting the board does not sound reasonable to me, unless you know that the evaluation you perform on the position adjusts correctly for it.
Another serious problem for me is that you always call the min/max for the next move with -10000 and 10000 as the boundaries for alpha and beta. This way your algorithm does not 'learn' anything from the moves it has already searched.
What you need is to check the algorithm again (Wikipedia, for instance, which I used) and see that alpha and beta are passed down after being narrowed by earlier evaluations. That way the search at greater depth can, firstly, stop earlier and, secondly, evaluate the best move better.
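For your max branch that means something like the following. This is only a sketch of the window-passing change, not a fix for the other issues:

// Pass the current alpha/beta window to the recursive call instead of
// resetting it to (-10000, 10000). Note also depth - 1: the original --depth
// permanently decrements depth on every iteration of the move loop.
int temp = alphaBeta(depth - 1, ChessEngine.invertBoard(boardCopy), alpha, beta, false);
if (temp > alpha) {
    alphaBetaSelectedSquare = i;
    alphaBetaMoveToSquare = move;
    alpha = temp;
}

and correspondingly alphaBeta(depth - 1, ..., alpha, beta, true) in the min branch.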
I am no expert in this; it's been decades since I wrote my implementation, and I used something different.
Another idea is not to handle min and max within the same method, but to use separate min and max methods instead. That makes it more likely you'll spot other defects.
Also, do not use two lone kings to test the evaluation. There is nothing to achieve in that position: two kings just move around randomly and can't win. Try something like two knights, or four queens, and the like. That is less random, and you can watch the queens dancing around without being able to catch each other. Or use three knights versus a single queen.
And try to create some unit tests around your other parts, just to ensure that each part works correctly on its own. And why are you using characters? Why not enums or objects? You can reuse the same object for each kind of piece.
But anyhow, that is style and not algorithm correctness.
I've made a negamax algorithm for a chess-like game and I want to know how to use the final board value result. I understand the final return of the negamax algorithm represents what the board value will be after the player takes his best possible move, but that isn't exactly useful information. I need to know what that move is, not what it's worth.
Here's the code:
public int negamax(Match match, int depth, int alpha, int beta, int color) {
    if (depth == 0) {
        return color*stateScore(match);
    }
    ArrayList<Match> matches = getChildren(match, color);
    if (matches.size() == 0) {
        return color*stateScore(match);
    }
    int bestValue = Integer.MIN_VALUE;
    for (int i = 0; i != matches.size(); i++) {
        int value = -negamax(matches.get(i), depth-1, -beta, -alpha, -color);
        if (value > bestValue) {
            bestValue = value;
        }
        if (value > alpha) {
            alpha = value;
        }
        if (alpha >= beta) {
            break;
        }
    }
    return bestValue;
}

public void getBestMove(Match match, int color) {
    int bestValue = negamax(match, 4, Integer.MIN_VALUE, Integer.MAX_VALUE, color);
    // What to do with bestValue???
}
I thought of re-evaluating the children of the current match state after bestValue is determined, then iterating through them and finding which child has a stateScore equal to bestValue. But that wouldn't work, because a lot of them will have the same stateScore anyway; it's what they can lead to that counts...
I can see you're doing a qsearch and alpha-beta. Your algorithm is well-known but you're missing a key part.
Let me sketch out the basic algorithm for chess search, it applies even to Stockfish (the strongest engine in the world).
search(Position p) {
    if (leaf node)
        qsearch(p)
    if (need to do move reduction)
        do_move_reduction_and_cut_off(p)
    moves = generate_moves(p)
    for_each(move in moves) {
        p.move(move)
        v = -search(p, -beta, -alpha)
        p.undo(move)
        store the score and move into a hash table
        if (v > beta)
            cutoff break;
    }
}
This is just a very brief sketch, but all chess algorithms follow it. Compare your version with it, do you notice that you haven't done p.move(move) and p.undo(move)?
Basically, the traditional approach generates a list of moves for a given position, then loops through the moves: play each one, search it, and undo it. If you do that, you know exactly which move produces which score.
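Applied to your code, the root level could keep the move itself; since in your setup each child Match already represents the position after a move, returning the best child does the job. This is just a sketch that assumes your negamax and getChildren work exactly as in the question:

// Root-level driver: expands the first ply itself so it knows which child
// (i.e. which move) produced the best negamax score.
public Match getBestMove(Match match, int depth, int color) {
    Match bestChild = null;
    int bestValue = Integer.MIN_VALUE;
    int alpha = -1000000;   // finite bounds, so -alpha / -beta cannot overflow
    int beta  =  1000000;
    for (Match child : getChildren(match, color)) {
        int value = -negamax(child, depth - 1, -beta, -alpha, -color);
        if (bestChild == null || value > bestValue) {
            bestValue = value;
            bestChild = child;          // remember the move, not just the score
        }
        alpha = Math.max(alpha, value);
    }
    return bestChild;                   // the position after the best move
}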
Also notice the line for storing the move and score into a hash table. If you do this, you can easily reconstruct the entire principal variation from a root node.
I don't know exactly what is inside your Java class Match, but in any case your attempt was close, though not exactly the classical way to do a search. Remember that a search algorithm should be given a position object; instead you gave it a Match object, which is not the same thing.
I have a basic implementation of alpha-beta pruning, but I have no idea how to improve the move ordering. I have read that it can be done with a shallow search, with iterative deepening, or by storing the best moves in a transposition table.
Any suggestions how to implement one of these improvements in this algorithm?
public double alphaBetaPruning(Board board, int depth, double alpha, double beta, int player) {
    if (depth == 0) {
        return board.evaluateBoard();
    }
    Collection<Move> children = board.generatePossibleMoves(player);
    if (player == 0) {
        for (Move move : children) {
            Board tempBoard = new Board(board);
            tempBoard.makeMove(move);
            int nextPlayer = next(player);
            double result = alphaBetaPruning(tempBoard, depth - 1, alpha, beta, nextPlayer);
            if (result > alpha) {
                alpha = result;
                if (depth == this.origDepth) {
                    this.bestMove = move;
                }
            }
            if (alpha >= beta) {
                break;
            }
        }
        return alpha;
    } else {
        for (Move move : children) {
            Board tempBoard = new Board(board);
            tempBoard.makeMove(move);
            int nextPlayer = next(player);
            double result = alphaBetaPruning(tempBoard, depth - 1, alpha, beta, nextPlayer);
            if (result < beta) {
                beta = result;
                if (depth == this.origDepth) {
                    this.bestMove = move;
                }
            }
            if (beta <= alpha) {
                break;
            }
        }
        return beta;
    }
}
public int next(int player) {
    if (player == 0) {
        return 4;
    } else {
        return 0;
    }
}
Node reordering with a shallow search is trivial: calculate the heuristic value for each child of the state before recursively checking them. Then sort these states [descending for a max vertex, ascending for a min vertex] and recursively invoke the algorithm on the sorted list. The idea is that if a state looks good at shallow depth, it is more likely to be good at greater depth as well, and if that is true, you will get more prunings.
The sorting should be done before this [in both the if and the else clauses]:
for (Move move : children) {
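Against your code, such an ordering pass could look roughly like this. It is only a sketch: it assumes the Board copy constructor, makeMove and evaluateBoard from your snippet, uses a 1-ply evaluation as the sort key, and needs java.util.ArrayList/Comparator:

// Hypothetical shallow-search ordering: score each child move with a 1-ply
// evaluation and sort before running the main alpha-beta loop over it.
List<Move> ordered = new ArrayList<>(children);
ordered.sort(Comparator.comparingDouble((Move m) -> {
    Board tmp = new Board(board);
    tmp.makeMove(m);
    return tmp.evaluateBoard();
}).reversed());   // descending for the max player; drop reversed() in the min branch

for (Move move : ordered) {
    // ... the existing loop body, unchanged ...
}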
Storing moves is also trivial: many states are calculated twice, so when you finish calculating any state, store it [with the depth of the calculation! it is important!] in a HashMap. The first thing you do when you start calculating a vertex is check whether it has already been calculated, and if it has, return the cached value. The idea behind it is that many states are reachable via different paths, so this way you can eliminate redundant calculations.
The change should be made in the first line of the method [something like if (cache.contains(new State(board, depth, player))) return cache.get(new State(board, depth, player))] [excuse me for the lack of elegance and efficiency - just explaining an idea here].
You should also add cache.put(...) before each return statement.
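A minimal sketch of that cache, wrapped around the existing method. State is the hypothetical key class from the pseudo-code above and must implement equals() and hashCode() over the board, the depth and the player; the wrapper also needs java.util.Map/HashMap:

// Hypothetical transposition cache keyed on (board, depth, player).
private final Map<State, Double> cache = new HashMap<>();

public double cachedAlphaBetaPruning(Board board, int depth, double alpha, double beta, int player) {
    State key = new State(board, depth, player);
    Double cached = cache.get(key);
    if (cached != null) {
        return cached;   // this position was already searched at this depth
    }
    double value = alphaBetaPruning(board, depth, alpha, beta, player);
    cache.put(key, value);
    return value;
}

One caveat: a value computed under a narrowed alpha-beta window is only a bound rather than an exact score, so serious transposition tables also store an exact/lower/upper flag; the sketch keeps the naive form described here.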
First of all, one has to understand the reasoning behind move ordering in an alpha-beta pruning algorithm. Alpha-beta produces the same result as minimax, but in many cases can do it faster because it does not search through irrelevant branches.
It is not always faster, because it is not guaranteed to prune; in fact, in the worst case it will not prune at all, will search exactly the same tree as minimax, and will be slower because of the alpha/beta bookkeeping. In the best case (maximum pruning) it lets you search a tree twice as deep in the same time. For a random tree it can search about 4/3 times deeper in the same time.
Move ordering can be implemented in a couple of ways:
- you have a domain expert who gives you suggestions about which moves are better. For example, in chess, promoting a pawn or capturing a high-value piece with a lower-value piece are on average good moves. In checkers it is better to capture more pieces in a move than fewer, and it is better to promote a piece to a king. So your move generation function returns the better moves first.
- you get a heuristic for how good a move is by evaluating the position at one level less depth (your shallow search / iterative deepening). You calculate the evaluation at depth n-1, sort the moves, and then evaluate at depth n.
The second approach you mentioned has nothing to do with move ordering. It relates to the fact that the evaluation function can be expensive and many positions are evaluated multiple times. To avoid this, you can store the value of a position in a hash table once you have calculated it, and reuse it later.