Alpha-beta move ordering - java

I have a basic implementation of alpha-beta pruning, but I have no idea how to improve the move ordering. I have read that it can be done with a shallow search, iterative deepening, or by storing the best moves in a transposition table.
Any suggestions on how to implement one of these improvements in this algorithm?
public double alphaBetaPruning(Board board, int depth, double alpha, double beta, int player) {
    if (depth == 0) {
        return board.evaluateBoard();
    }

    Collection<Move> children = board.generatePossibleMoves(player);
    if (player == 0) {
        for (Move move : children) {
            Board tempBoard = new Board(board);
            tempBoard.makeMove(move);
            int nextPlayer = next(player);
            double result = alphaBetaPruning(tempBoard, depth - 1, alpha, beta, nextPlayer);

            if (result > alpha) {
                alpha = result;
                if (depth == this.origDepth) {
                    this.bestMove = move;
                }
            }
            if (alpha >= beta) {
                break;
            }
        }
        return alpha;
    } else {
        for (Move move : children) {
            Board tempBoard = new Board(board);
            tempBoard.makeMove(move);
            int nextPlayer = next(player);
            double result = alphaBetaPruning(tempBoard, depth - 1, alpha, beta, nextPlayer);

            if (result < beta) {
                beta = result;
                if (depth == this.origDepth) {
                    this.bestMove = move;
                }
            }
            if (beta <= alpha) {
                break;
            }
        }
        return beta;
    }
}

public int next(int player) {
    if (player == 0) {
        return 4;
    } else {
        return 0;
    }
}

Node reordering with a shallow search is trivial: calculate the
heuristic value of each child of the state before recursively
checking them. Then sort these states by value [descending
for a max vertex, ascending for a min vertex] and recursively invoke
the algorithm on the sorted list. The idea is that if a state looks good at
shallow depth, it is more likely to be good at deeper depth as well,
and if that holds you will get more prunings.
The sorting should be done before this loop [in both the if and the else clauses]:
for (Move move : children) {
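A minimal sketch of that idea, reusing the names from the question's code (generatePossibleMoves, evaluateBoard, and the Board copy constructor are assumed to behave as shown above); each move gets a 1-ply score and the list is sorted before the main loop:

// A possible helper for the class above; uses java.util (List, ArrayList,
// Map, HashMap, Comparator). Each move gets a 1-ply score from evaluateBoard()
// and the list is sorted so the most promising moves are searched first.
private List<Move> orderMoves(Board board, Collection<Move> children, int player) {
    List<Move> ordered = new ArrayList<>(children);
    Map<Move, Double> shallowScore = new HashMap<>();
    for (Move move : ordered) {
        Board tempBoard = new Board(board);
        tempBoard.makeMove(move);
        shallowScore.put(move, tempBoard.evaluateBoard());
    }
    Comparator<Move> byScore = Comparator.comparingDouble(shallowScore::get);
    // Max player (0) wants the highest shallow scores first, min player the lowest.
    ordered.sort(player == 0 ? byScore.reversed() : byScore);
    return ordered;
}

Then, in both branches of alphaBetaPruning, iterate with for (Move move : orderMoves(board, children, player)) instead of iterating over children directly.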
Storing states is also trivial - many states are calculated twice.
When you finish calculating any state, store it [with the depth of
the calculation! it is important!] in a HashMap. The first thing you do
when you start calculating a vertex is check whether it has already been
calculated - and if it has, return the cached value. The idea behind
it is that many states are reachable from different paths, so this
way you can eliminate redundant calculations.
The change should be made in the first line of the method [something like if (cache.containsKey(new State(board, depth, player))) return cache.get(new State(board, depth, player));] [excuse the lack of elegance and efficiency - just explaining the idea here].
You should also add cache.put(...) before each return statement.
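A rough sketch of that cache, assuming Board implements sensible equals() and hashCode() (the State class and the cache field are additions for illustration, not part of the posted code). Note that a value computed with a narrowed alpha-beta window is only a bound, so a full transposition table would also record whether each entry is exact or a bound:

// Hypothetical transposition-table key (uses java.util.Objects / HashMap).
private static final class State {
    final Board board;
    final int depth;
    final int player;

    State(Board board, int depth, int player) {
        this.board = board;
        this.depth = depth;
        this.player = player;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof State)) return false;
        State s = (State) o;
        return depth == s.depth && player == s.player && board.equals(s.board);
    }

    @Override
    public int hashCode() {
        return Objects.hash(board, depth, player);
    }
}

private final Map<State, Double> cache = new HashMap<>();

At the top of alphaBetaPruning, look the state up and return the cached value if present; before each return alpha / return beta, call cache.put(new State(board, depth, player), value).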

First of all, one has to understand the reasoning behind move ordering in an alpha-beta pruning algorithm. Alpha-beta produces the same result as minimax, but in many cases does it faster because it does not search the irrelevant branches.
It is not always faster, because it is not guaranteed to prune; in fact, in the worst case it will not prune at all, searching exactly the same tree as minimax and ending up slower because of the alpha/beta bookkeeping. In the best case (maximum pruning) it allows you to search a tree twice as deep in the same time. For a random tree it can search about 4/3 times deeper in the same time.
Move ordering can be implemented in a couple of ways:
you have a domain expert who tells you which moves tend to be better. For example, in chess, promoting a pawn or capturing a high-value piece with a lower-value piece are on average good moves. In checkers it is better to capture more checkers in a move than fewer, and it is better to create a queen. So your move generation function returns the better moves first (see the comparator sketch after this list)
you derive a heuristic for how good a move is by evaluating the position one level of depth shallower (your shallow search / iterative deepening). You calculate the evaluation at depth n-1, sort the moves, and then evaluate at depth n.
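A minimal sketch of the first approach for a checkers-like game; getCapturedCount() and isPromotion() are hypothetical Move accessors chosen for illustration, since the original post does not show the move API:

// Hypothetical static ordering for a checkers-like game: more captures first,
// then promotions, then quiet moves.
List<Move> ordered = new ArrayList<>(board.generatePossibleMoves(player));
ordered.sort(Comparator.comparingInt((Move m) -> {
    int score = 10 * m.getCapturedCount();   // capturing more checkers is better
    if (m.isPromotion()) {
        score += 5;                          // creating a queen is a bonus
    }
    return score;
}).reversed());                              // highest ordering score first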
The second approach you mentioned has nothing to do with move ordering. It has to do with the fact that the evaluation function can be expensive and many positions are evaluated many times. To avoid this, you can store the value of a position in a hash table once it has been calculated and reuse it later.

Related

Why is Alpha/Beta pruning having no effect on my MiniMax algorithm?

First off I'm sorry for the slightly incorrect title, I just didn't want it to be 30 words long.
The alpha/beta pruning I implemented enormously reduced the number of evaluations when I applied it to my TicTacToe game, see for yourself below.
Each pair of evaluation counts is measured with the same game state as input.
The problem arises when I want to apply the pruning to the checkers-playing neural network I've been working on, which was the goal of this whole thing to begin with; I just whipped up the TicTacToe game to experiment with MiniMax + Alpha/Beta as I've never dealt with these algorithms before.
Here is the same sort of experiment with the NN.
Now for the code (checkers one, let me know if you want to have a peek at the TicTacToe version, they are almost identical though).
I'll paste the beginning of both methods only once, as it is absolutely identical; I will show both signatures as they differ slightly.
Small note to make the code clearer.
Board is the object which keeps track of pieces, available moves,
whose turn it is, whether the game has been won/drawn, etc.
Move is the object which contains all information pertinent to a move; when I make
the clone in the first line of the method I simply make a clone of the
given board and the constructor applies the given move to it.
private double miniMax(Board b, Move m, int depth) {
and
private double alphaBeta(Board b, Move m, int depth, double alpha, double beta) {
beginning of both methods:
Testboard clone = new Testboard(b, m);
// Making a clone of the board in order to
// avoid making changes to the original one

if (clone.isGameOver()) {
    if (clone.getLoser() == null)
        // It's a draw, evaluation = 0
        return 0;
    if (clone.getLoser() == Color.BLACK)
        // White (Max) won, evaluation = 1
        return 1;
    // Black (Min) won, evaluation = -1
    return -1;
}

if (depth == 0)
    // Reached the end of the search, returning the current evaluation of the board
    return getEvaluation(clone);
Regular MiniMax continuation:
// If it's not game over
if (clone.getTurn() == Color.WHITE) {
    // It's white's turn (Max player)
    double max = -1;
    for (Move move : clone.getMoves()) {
        // For each child node (available move)
        // its minimax value is calculated
        double score = miniMax(clone, move, depth - 1);
        // Only the highest score is kept
        if (score > max)
            max = score;
    }
    // And is returned
    return max;
}

// It's black's turn (Min player)
double min = 1;
for (Move move : clone.getMoves()) {
    // For each child node (available move)
    // its minimax value is calculated
    double score = miniMax(clone, move, depth - 1);
    // Only the lowest score is kept
    if (score < min)
        min = score;
}
// And is returned
return min;
}
MiniMax with Alpha/Beta pruning continuation:
// If it's not game over
if (clone.getTurn() == Color.WHITE) {
    // It's white's turn (Max player)
    for (Move move : clone.getMoves()) {
        // For each child node (available move)
        // its minimax value is calculated
        double score = alphaBeta(clone, move, depth - 1, alpha, beta);
        if (score > alpha)
            // If this score is greater than alpha,
            // it is assigned to alpha as the new highest score
            alpha = score;
        if (alpha >= beta)
            // The loop is cut off early if alpha is greater than or equal to beta
            break;
    }
    // The alpha value is returned
    return alpha;
}

// It's black's turn (Min player)
for (Move move : clone.getMoves()) {
    // For each child node (available move)
    // its minimax value is calculated
    double score = alphaBeta(clone, move, depth - 1, alpha, beta);
    if (score < beta)
        // If this score is lower than beta,
        // it is assigned to beta as the new lowest score
        beta = score;
    if (alpha >= beta)
        // The loop is cut off early if alpha is greater than or equal to beta
        break;
}
// The beta value is returned
return beta;
}
I'm honestly stuck and I'm not sure what I could do to figure out what's going on. I've tried the MiniMax + A/B on several different, even randomly generated, neural networks, but I've never seen an improvement in the number of evaluations made. I hope someone here is able to shed some light on this situation, thanks!
Thanks #maraca for helping me figure this out, going to answer myself as he only replied with a comment.
There is nothing wrong with the code that I posted, the problem lies with the evaluation function that I was using once the search reached the depth limit.
I was using a still-untrained neural network that was essentially just spitting out random values. This forced the MiniMax + A/B to go through all the nodes, as there was no consistency in the answers, which it turns out is what is necessary for pruning to happen.

Implementing a Minimax Algorithm in Java for Connect 4

I'm trying to build a game of Connect 4 with minimax (and alpha-beta pruning), mostly to prove to myself that I can do it. However, the one big conceptual problem I'm having is with how to actually use the minimax algorithm. The way I do it is that I have an AI class with one function whose job is to perform the minimax algorithm and return an int.
public int minimax(Board board, int depth, int alpha, int beta, String player) {
    if (depth == 0 || board.getScore() >= 512) {
        return board.getScore();
    } else if (player.equals("computer")) {
        int temp = -1000000;
        for (Integer[] moves : board.availableMoves) {
            board.putPiece(player, moves[0]);
            temp = Math.max(temp, minimax(board, depth - 1, alpha, beta, "human"));
            board.removePiece(moves[0], moves[1]);
            alpha = Math.max(alpha, temp);
            if (alpha >= beta) {
                break;
            }
        }
        return temp;
    } else {
        int temp = 1000000;
        for (Integer[] moves : board.availableMoves) {
            board.putPiece(player, moves[0]);
            temp = Math.min(temp, minimax(board, depth + 1, alpha, beta, "computer"));
            board.removePiece(moves[0], moves[1]);
            beta = Math.min(beta, temp);
            if (alpha >= beta) {
                break;
            }
        }
        return temp;
    }
}
This is called by a function of the Game class called computerMove().
public int computerMove() {
    Board tempBoard = board;
    int bestMove = 0;
    AI ai = new AI();
    ai.minimax(board, difficulty, -1000000, 1000000, "computer");
    return bestMove;
}
But what do I do with the int that is returned? How do I use it to actually move a piece? The int that is returned is simply the score of the best possible board I could reach, right? It tells me nothing about which column to play or which board to move to.
Any and all help is greatly appreciated.
Thanks,
The books all say to return just the score, but that's impractical for actually playing the game. Of course the overhead of maintaining the best move everywhere can really slow down the program, so generally you use a driver function that does the first level of expansion, and additionally keeps track of the best move. This is effectively wrapping the implementation in an argmax function, which is just a fancy way of saying it returns the best move at the top level instead of the score. You can see an example of this in a little project I worked on last year. The code is in C# but it's close enough to Java for you to get the idea.
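A rough Java sketch of such a driver, reusing the question's minimax and board API (availableMoves, putPiece, removePiece, difficulty) as shown above; treat it as an illustration of the argmax idea rather than a drop-in implementation:

// Hypothetical root-level driver: expands the first ply itself and remembers
// which column produced the best minimax score.
public int computerMove() {
    int bestColumn = -1;
    int bestScore = Integer.MIN_VALUE;
    AI ai = new AI();
    for (Integer[] moves : board.availableMoves) {
        board.putPiece("computer", moves[0]);
        // The child position is evaluated from the human's (minimizing) point of view.
        int score = ai.minimax(board, difficulty - 1, -1000000, 1000000, "human");
        board.removePiece(moves[0], moves[1]);
        if (score > bestScore) {
            bestScore = score;
            bestColumn = moves[0];      // remember the move, not just its score
        }
    }
    return bestColumn;
}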
Alternatively, you can modify the code to return a tuple (class with multiple fields) that has the score and the best move. This is easier (and a little cleaner IMO) than writing the argmax wrapper, but without some extra engineering this will probably result in some noticeable slow down of the minimax function because it's going to result in tons more allocations. If performance isn't your top priority, this is probably the way to go.
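A minimal sketch of that alternative (the SearchResult class is a hypothetical addition):

// Hypothetical result holder: the minimax method would return one of these
// instead of a bare int, so the root call can read both the score and the move.
public final class SearchResult {
    public final int score;
    public final Integer[] bestMove;   // null at leaf nodes

    public SearchResult(int score, Integer[] bestMove) {
        this.score = score;
        this.bestMove = bestMove;
    }
}

Each recursive call then returns new SearchResult(temp, bestMoveSoFar), and computerMove() simply plays result.bestMove.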
I should also point out that your implementation has at least one bug. The depth should always decrease regardless of who is playing, but in your human branch you have it increasing (depth+1). This means the depth will never reach 0 and the base case will only be hit when a player is determined to be the winner. Additionally, when using alpha-beta it's important that the board evaluation knows whose turn it is and who is the maximizing player, or you'll run into lots of hard-to-find bugs. You don't show that code here, but I want to point it out because it gets me every time.

Negamax chess algorithm: How to use final return?

I've made a negamax algorithm for a chess-like game and I want to know how to use the final board value result. I understand the final return of the negamax algorithm represents what the board value will be after the player takes his best possible move, but that isn't exactly useful information. I need to know what that move is, not what it's worth.
Here's the code:
public int negamax(Match match, int depth, int alpha, int beta, int color) {
    if (depth == 0) {
        return color * stateScore(match);
    }
    ArrayList<Match> matches = getChildren(match, color);
    if (matches.size() == 0) {
        return color * stateScore(match);
    }
    int bestValue = Integer.MIN_VALUE;
    for (int i = 0; i != matches.size(); i++) {
        int value = -negamax(matches.get(i), depth - 1, -beta, -alpha, -color);
        if (value > bestValue) {
            bestValue = value;
        }
        if (value > alpha) {
            alpha = value;
        }
        if (alpha >= beta) {
            break;
        }
    }
    return bestValue;
}

public void getBestMove(Match match, int color) {
    int bestValue = negamax(match, 4, Integer.MIN_VALUE, Integer.MAX_VALUE, color);
    // What to do with bestValue???
}
I thought of re-evaluating the children of the current match state after bestValue is determined, then iterating through them and finding which of those children has a stateScore equal to bestValue. But that wouldn't work, because a lot of them will have the same stateScore anyway; it's what they can lead to that counts...
I can see you're doing a qsearch and alpha-beta. Your algorithm is well known, but you're missing a key part.
Let me sketch out the basic algorithm for chess search; it applies even to Stockfish (the strongest engine in the world).
search(Position p, alpha, beta) {
    if (leaf node)
        return qsearch(p)
    if (need to do move reduction)
        do_move_reduction_and_cut_off(p)

    moves = generate_moves(p)
    for_each(move in moves) {
        p.move(move)
        v = -search(p, -beta, -alpha)
        p.undo(move)
        store the score and move into a hash table
        if (v > alpha)
            alpha = v
        if (v > beta)
            break   // cutoff
    }
    return alpha
}
This is just a very brief sketch, but all chess search algorithms follow it. Compare your version with it; do you notice that you haven't done p.move(move) and p.undo(move)?
Basically, the traditional approach generates a list of moves for a given position, then loops through the moves, playing each one, searching it, and undoing it. If you do that, you know exactly which move produces which score.
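A rough Java sketch of that pattern applied to the question's negamax. The posted code only shows getChildren, so the Move type, the generateMoves helper, and the makeMove/undoMove methods on Match are assumptions for illustration:

// Hypothetical root driver: plays each move, searches, undoes it, and keeps
// the move whose search score is best.
public Move getBestMove(Match match, int color) {
    Move bestMove = null;
    int bestValue = Integer.MIN_VALUE;
    int alpha = -1000000;   // avoid Integer.MIN_VALUE, which overflows when negated
    int beta = 1000000;
    for (Move move : generateMoves(match, color)) {   // assumed move generator
        match.makeMove(move);
        int value = -negamax(match, 3, -beta, -alpha, -color);
        match.undoMove(move);
        if (value > bestValue) {
            bestValue = value;
            bestMove = move;            // the move is remembered together with its score
        }
        if (value > alpha) {
            alpha = value;
        }
    }
    return bestMove;
}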
Also notice the line that stores the move and score in a hash table. If you do this, you can easily reconstruct the entire principal variation from the root node.
I don't know exactly what is inside your Java class Match, but in any case your attempt was close, though not exactly the classical way to do a search. Remember that a search algorithm should be given a position object, but instead you gave it a Match object, which is wrong.

A* (A star) algorithm optimization

I'm a student, and my team and I have to make a simulation of students' behaviour on a campus (forming "groups of friends", walking around, etc.). To find the path a student has to take, I used the A* algorithm (as I found out it's one of the fastest path-finding algorithms). Unfortunately our simulation doesn't run smoothly (it takes about 1-2 seconds between successive iterations). I wanted to optimize the algorithm but I have no idea what more I can do. Can you help me out and tell me whether it's possible to optimize my A* implementation? Here's the code:
public LinkedList<Field> getPath(Field start, Field exit) {
    LinkedList<Field> foundPath = new LinkedList<Field>();
    LinkedList<Field> opensList = new LinkedList<Field>();
    LinkedList<Field> closedList = new LinkedList<Field>();
    Hashtable<Field, Integer> gscore = new Hashtable<Field, Integer>();
    Hashtable<Field, Field> cameFrom = new Hashtable<Field, Field>();
    Field x = new Field();
    gscore.put(start, 0);
    opensList.add(start);

    while (!opensList.isEmpty()) {
        int min = -1;
        // searching for minimal F score
        for (Field f : opensList) {
            if (min == -1) {
                min = gscore.get(f) + getH(f, exit);
                x = f;
            } else {
                int currf = gscore.get(f) + getH(f, exit);
                if (min > currf) {
                    min = currf;
                    x = f;
                }
            }
        }

        if (x == exit) {
            // path reconstruction
            Field curr = exit;
            while (curr != start) {
                foundPath.addFirst(curr);
                curr = cameFrom.get(curr);
            }
            return foundPath;
        }

        opensList.remove(x);
        closedList.add(x);

        for (Field y : x.getNeighbourhood()) {
            if (!(y.getType() == FieldTypes.PAVEMENT || y.getType() == FieldTypes.GRASS)
                    || closedList.contains(y) || !(y.getStudent() == null)) {
                continue;
            }
            int tentGScore = gscore.get(x) + getDist(x, y);
            boolean distIsBetter = false;
            if (!opensList.contains(y)) {
                opensList.add(y);
                distIsBetter = true;
            } else if (tentGScore < gscore.get(y)) {
                distIsBetter = true;
            }
            if (distIsBetter) {
                cameFrom.put(y, x);
                gscore.put(y, tentGScore);
            }
        }
    }
    return foundPath;
}

private int getH(Field start, Field end) {
    int x = start.getX() - end.getX();
    int y = start.getY() - end.getY();
    if (x < 0) {
        x = x * (-1);
    }
    if (y < 0) {
        y = y * (-1);
    }
    return x + y;
}

private int getDist(Field start, Field end) {
    int ret = 0;
    if (end.getType() == FieldTypes.PAVEMENT) {
        ret = 8;
    } else if (start.getX() == end.getX() || start.getY() == end.getY()) {
        ret = 10;
    } else {
        ret = 14;
    }
    return ret;
}
EDIT:
This is what I got from jProfiler:
So getH is a bottleneck, yes? Maybe remembering the H score of each field would be a good idea?
A linked list is not a good data structure for the open set. You have to find the node with the smallest F in it; you can either search through the list in O(n) or insert into a sorted position in O(n), so either way it's O(n). With a heap it's only O(log n). Updating the G score would remain O(n) (since you have to find the node first), unless you also add a hash table from nodes to indices in the heap.
A linked list is also not a good data structure for the closed set, where you need fast "contains", which is O(n) in a linked list. You should use a HashSet for that.
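A minimal sketch of those two replacements using the question's getH and gscore names. In Java, PriorityQueue is a binary heap but has no decrease-key, so a common workaround, shown here, is to re-add the node with its new F value and skip stale entries when they are polled (the OpenEntry record needs Java 16+; use a small class on older versions):

// Open-set entries freeze the F value at insertion time; stale entries are
// skipped when polled. The closed set is a HashSet for O(1) contains().
record OpenEntry(Field field, int f) {}

PriorityQueue<OpenEntry> open =
        new PriorityQueue<>(Comparator.comparingInt(OpenEntry::f));
HashSet<Field> closed = new HashSet<>();

open.add(new OpenEntry(start, getH(start, exit)));
while (!open.isEmpty()) {
    Field x = open.poll().field();
    if (!closed.add(x)) {
        continue;                  // stale duplicate from an earlier, worse G score
    }
    // ... expand neighbours as in the original loop; whenever a better
    // tentative G score is found, update gscore and cameFrom and push
    // new OpenEntry(y, gscore.get(y) + getH(y, exit)).
}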
You can also optimize the problem by using a different algorithm; the following page illustrates and compares many different algorithms and heuristics:
A*
IDA*
Dijkstra
Jump Point Search
...
http://qiao.github.io/PathFinding.js/visual/
From your implementation it seems that you are using a naive A* algorithm. Consider the following:
A* is an algorithm implemented with a priority queue, similar to BFS.
The heuristic function is evaluated at each node to define its fitness for being selected as the next node to visit.
As a new node is visited, its unvisited neighbouring nodes are added to the queue with their heuristic values as keys.
Do this until every key remaining in the queue is worse than the value calculated for the goal state (i.e. the goal has been reached with the best score).
Find the bottlenecks of your implementation using a profiler, e.g. jProfiler is easy to use.
Use threads in areas where the algorithm can run in parallel.
Tune your JVM to run faster.
Allocate more RAM.
a) As mentioned, you should use a heap in A* - either a basic binary heap or a pairing heap, which should theoretically be faster.
b) In larger maps, it always happens that the algorithm needs some time to run (i.e., when you request a path, it will simply take a while). What can be done is to use some local navigation algorithm (e.g., "run directly toward the target") while the path computes.
c) If you have a reasonable number of locations (e.g., in a navmesh) and some time at the start of your program, why not use the Floyd-Warshall algorithm? Using that, you can look up where to go next in O(1); see the sketch below.
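A small sketch of that precomputation on a generic graph (the dist/next arrays and the integer node indexing are assumptions for illustration, not part of the posted code):

// Floyd-Warshall with path reconstruction: after the O(n^3) precomputation,
// next[u][v] gives the first hop on a shortest path from u to v in O(1).
static final int INF = Integer.MAX_VALUE / 2;   // large but safe to add without overflow

static int[][] next;                            // next[u][v] = first node after u towards v

static void precompute(int[][] dist) {          // dist[u][v] = edge weight, 0 on the diagonal, or INF
    int n = dist.length;
    next = new int[n][n];
    for (int u = 0; u < n; u++)
        for (int v = 0; v < n; v++)
            next[u][v] = (dist[u][v] < INF) ? v : -1;

    for (int k = 0; k < n; k++)
        for (int u = 0; u < n; u++)
            for (int v = 0; v < n; v++)
                if (dist[u][k] + dist[k][v] < dist[u][v]) {
                    dist[u][v] = dist[u][k] + dist[k][v];
                    next[u][v] = next[u][k];    // go towards k first
                }
}

// At runtime: the step to take from u towards v, in O(1).
static int stepTowards(int u, int v) {
    return next[u][v];
}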
I built a new pathfinding algorithm called Fast* or Fastaer. It is a BFS-like A*, but faster and more efficient than A*; its accuracy is about 90% of A*. Please see this link for info and a demo:
https://drbendanilloportfolio.wordpress.com/2015/08/14/fastaer-pathfinder/
It has a fast greedy line tracer to make the path straighter.
The demo file has it all. Check Task Manager when using the demo for performance metrics. So far the profiler results for this show a maximum surviving generation of 4 and low to no GC time.

Minimax algorithm doesn't return best move

I'm writing an Othello engine using minimax with alpha-beta pruning.
It's working OK, but I found the following problem:
When the algorithm finds that a position is lost, it returns -INFINITY as expected, but in this case I'm not able to track the 'best' move... the position is already lost, but it should return a valid move anyway (preferably a move that survives longer, as good chess engines do).
Here is the code:
private float minimax(OthelloBoard board, OthelloMove best, float alpha, float beta, int depth)
{
    OthelloMove garbage = new OthelloMove();
    int currentPlayer = board.getCurrentPlayer();

    if (board.checkEnd())
    {
        int bd = board.countDiscs(OthelloBoard.BLACK);
        int wd = board.countDiscs(OthelloBoard.WHITE);
        if ((bd > wd) && currentPlayer == OthelloBoard.BLACK)
            return INFINITY;
        else if ((bd < wd) && currentPlayer == OthelloBoard.BLACK)
            return -INFINITY;
        else if ((bd > wd) && currentPlayer == OthelloBoard.WHITE)
            return -INFINITY;
        else if ((bd < wd) && currentPlayer == OthelloBoard.WHITE)
            return INFINITY;
        else
            return 0.0f;
    }

    // search until the end? (true during end game phase)
    if (!solveTillEnd)
    {
        if (depth == maxDepth)
            return OthelloHeuristics.eval(currentPlayer, board);
    }

    ArrayList<OthelloMove> moves = board.getAllMoves(currentPlayer);
    for (OthelloMove mv : moves)
    {
        board.makeMove(mv);
        float score = -minimax(board, garbage, -beta, -alpha, depth + 1);
        board.undoMove(mv);

        if (score > alpha)
        {
            // Set best move here
            alpha = score;
            best.setFlipSquares(mv.getFlipSquares());
            best.setIdx(mv.getIdx());
            best.setPlayer(mv.getPlayer());
        }
        if (alpha >= beta)
            break;
    }
    return alpha;
}
I call it using:
AI ai = new AI(board, maxDepth, solveTillEnd);
//create empty (invalid) move to hold best move
OthelloMove bestMove = new OthelloMove();
ai.bestFound = bestMove;
ai.minimax(board, bestMove, -INFINITY, INFINITY, 0);
// dispatch a thread
new Thread(ai).start();
//wait for thread to finish
OthelloMove best = ai.bestFound();
When a lost position is searched (imagine it's lost 10 moves later, for example), the best variable above is still equal to the empty invalid move passed as an argument... why??
Thanks for any help!
Your problem is that you're using -INFINITY and +INFINITY as win/loss scores. You should have scores for win/loss that are higher/lower than any other positional evaluation score, but not equal to your infinity values. This will guarantee that a move will be chosen even in positions that are hopelessly lost.
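A minimal sketch of that idea against the question's end-of-game check. The constants are hypothetical; the depth adjustment is an optional extra that makes the engine prefer quick wins and slow losses, which also addresses the "survive longer" wish:

// Hypothetical constants: win/loss scores are larger in magnitude than any
// heuristic evaluation, but strictly smaller than the search window bounds,
// so "score > alpha" can still fire and a best move gets recorded.
static final float INFINITY   = 1_000_000f;  // used only as the initial alpha/beta window
static final float WIN_SCORE  =   900_000f;
static final float LOSS_SCORE =  -900_000f;

// Used inside the board.checkEnd() branch instead of returning +/-INFINITY.
// Subtracting/adding depth makes the engine prefer fast wins and slow losses.
static float terminalScore(boolean currentPlayerWins, boolean draw, int depth) {
    if (draw) return 0.0f;
    return currentPlayerWins ? WIN_SCORE - depth : LOSS_SCORE + depth;
}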
It's been a long time since I implemented minimax, so I might be wrong, but it seems to me that your code, if it encounters a winning or losing move, does not update the best variable (this happens in the board.checkEnd() branch at the top of your method).
Also, if you want your algorithm to try to win by as much as possible, or lose by as little as possible if it can't win, I suggest you update your eval function. In a win situation it should return a large value (larger than in any non-win situation): the more you win by, the larger the value. In a lose situation it should return a large negative value (lower than in any non-lose situation): the more you lose by, the lower the value.
It seems to me (without trying it out) that if you update your eval function that way and skip the if (board.checkEnd()) check altogether, your algorithm should work fine (unless there are other problems with it). Good luck!
If you can detect that a position is truly won or lost, then that implies you are solving the endgame. In this case, your evaluation function should be returning the final score of the game (e.g. 64 for a total victory, 31 for a narrow loss), since this can be calculated accurately, unlike the estimates that you will evaluate in the midgame.
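A tiny sketch of that exact endgame score, reusing the question's countDiscs API (the helper itself is a hypothetical addition):

// Hypothetical exact endgame score: disc differential from the side to move,
// used instead of the heuristic once the game can be played out to the end.
static float exactScore(OthelloBoard board, int currentPlayer) {
    int own   = board.countDiscs(currentPlayer);
    int other = board.countDiscs(currentPlayer == OthelloBoard.BLACK
                                 ? OthelloBoard.WHITE : OthelloBoard.BLACK);
    return own - other;   // e.g. +64 for a total victory, -2 for a narrow loss
}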
