I want to implement a reinforcement learning connect four agent.
I am unsure how to do so and how it should look. I am familiar with the theoretical aspects of reinforcement learning but don't know how they should be implemented.
How should it be done?
Should I use TD(lambda) or Q-learning, and how do minimax trees come into this?
How do my Q and V functions work (quality of action and value of state)? How do I score those things? What is my base policy which I improve, and what is my model?
Another thing is how should I save the states or statesXactions (depending on the learning algorithm). Should I use neural networks or not? And if yes, how?
I am using JAVA.
Thanks.
This might be a more difficult problem than you think, and here is why:
The action space for the game is the choice of column to drop a piece into. The state space for the game is an MxN grid. Each column contains up to M pieces distributed among the 2 players. This means there are (2^(M+1) - 1)^N states. For a standard 6x7 board, this comes out to about 10^15. It follows that you cannot apply reinforcement learning to the problem directly. The state value function is not smooth, so naïve function approximation would not work.
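As a quick check of that count, here is a minimal sketch. The class and method names are made up for illustration, and the figure is an upper bound that also includes positions that can never occur in a real game.

```java
import java.math.BigInteger;

class StateCount {
    // Each column holds 0..rows pieces, and a column of height h can be
    // filled in 2^h ways, so one column has 2^(rows+1) - 1 configurations.
    static BigInteger upperBound(int rows, int cols) {
        BigInteger perColumn = BigInteger.valueOf(2).pow(rows + 1).subtract(BigInteger.ONE);
        return perColumn.pow(cols);
    }

    public static void main(String[] args) {
        System.out.println(upperBound(6, 7)); // standard 6x7 board
        System.out.println(upperBound(6, 3)); // a window of three columns
    }
}
```

For the standard board this gives 127^7, which is a little over 5 x 10^14.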
But not all is lost. For one thing, you could simplify the problem by separating the action space. If you consider the value of each column separately, based on the two columns next to it, you reduce N to 3 and the state space size to about 10^6. Now, this is very manageable. You can create an array to represent this value function and update it using a simple RL algorithm, such as SARSA.
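One way this could look in Java is sketched below. It is only one reading of the idea: the column encoding, the class name LocalColumnSarsa and the choice to treat each table entry as "drop into the middle column of this three-column window" are assumptions, and how to handle the edge columns is left open.

```java
class LocalColumnSarsa {
    static final int ROWS = 6;
    static final int PER_COLUMN = (1 << (ROWS + 1)) - 1;            // 127 fillings per column
    static final int STATES = PER_COLUMN * PER_COLUMN * PER_COLUMN; // 127^3 entries

    // One value per three-column window; plays the role of Q(s, a).
    final double[] value = new double[STATES];

    // Encode one column: 'height' pieces; bit i of 'owners' is set if the
    // i-th piece from the bottom is ours. Heights map to disjoint index ranges.
    static int encodeColumn(int height, int owners) {
        return (1 << height) - 1 + owners;
    }

    // Combine the encoded left, middle and right columns into one table index.
    static int encodeWindow(int left, int middle, int right) {
        return (left * PER_COLUMN + middle) * PER_COLUMN + right;
    }

    // One SARSA step: move value[s] toward reward + gamma * value[sNext].
    void update(int s, int sNext, double reward, boolean terminal,
                double alpha, double gamma) {
        double target = reward + (terminal ? 0.0 : gamma * value[sNext]);
        value[s] += alpha * (target - value[s]);
    }
}
```

The table has about two million doubles, so it fits comfortably in memory as a plain array.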
Note that the payoff for the game is very delayed, so you might want to use eligibility traces to accelerate learning.
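A minimal sketch of tabular SARSA(lambda) with accumulating traces follows. The class name and the flat indexing of state-action pairs are assumptions made for illustration. Each step spreads the current TD error over all recently visited entries, which is what lets a win at the end of the game reach the early moves quickly.

```java
import java.util.Arrays;

class SarsaLambda {
    final double[] q;      // action values, one per (state, action) index
    final double[] trace;  // eligibility trace per index
    final double alpha, gamma, lambda;

    SarsaLambda(int size, double alpha, double gamma, double lambda) {
        this.q = new double[size];
        this.trace = new double[size];
        this.alpha = alpha;
        this.gamma = gamma;
        this.lambda = lambda;
    }

    // Apply one transition from index sa to index saNext with the given reward.
    void step(int sa, int saNext, double reward, boolean terminal) {
        double delta = reward + (terminal ? 0.0 : gamma * q[saNext]) - q[sa];
        trace[sa] += 1.0; // accumulating trace for the entry just visited
        for (int i = 0; i < q.length; i++) {
            q[i] += alpha * delta * trace[i]; // credit in proportion to eligibility
            trace[i] *= gamma * lambda;       // decay every trace
        }
        if (terminal) {
            Arrays.fill(trace, 0.0); // start the next episode with clean traces
        }
    }
}
```

For a table with millions of entries, looping over the whole array on every step is wasteful; keeping a list of the indices visited in the current episode and updating only those is a common shortcut.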
I have a challenge that my teacher gave to beat an army of his soldiers on an 18x24 grid, with random obstacles placed on the board. The game is turn based and I have an army of 50 soldiers, each of which needs to either move or attack on their turn.
My problem is I only have access to creating a class of soldiers to fight in this environment. Currently I have a method that evaluates the board position by looking at how many soldiers there are left from each team and does yourTeam - enemyTeam to get the current score, and I have a method that will produce the legal moves for the soldier.
I want to know how I would create a reinforcement learning agent in Java with what I have access to. If you know any ways to do this or any resources that may help that would be great. Thank you for the help!
Java is not a good language for doing math heavy computation (which is what you will need to do for RL). You could attempt to implement the Q-Learning, value-iteration or policy-iteration algorithms but I would avoid doing anything with neural networks/modern deep RL approaches here as your work load will increase dramatically.
With regard to your problem, if you are going to implement one of the old-school algorithms, think about your state and action space. I have serious concerns about the size of your action space: even with a small number of moves for each soldier (say 3: attack, move up, move down), with 50 soldiers the joint action space is very large, 3^50. Even this many will be difficult to deal with, and any more moves per soldier (even 4 or 5) will send you deep into some complex topics in RL.
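To get a feel for the scale, here is a small sketch that counts joint actions when every soldier independently picks one of a fixed number of moves each turn. The class and method names are made up for illustration.

```java
import java.math.BigInteger;

class JointActions {
    // Number of ways to assign one of 'actionsPerSoldier' moves to each soldier.
    static BigInteger count(int actionsPerSoldier, int soldiers) {
        return BigInteger.valueOf(actionsPerSoldier).pow(soldiers);
    }

    public static void main(String[] args) {
        System.out.println(count(3, 50)); // 3 moves per soldier, 50 soldiers
        System.out.println(count(5, 50)); // 5 moves per soldier
    }
}
```

With 3 moves per soldier the result already has 24 decimal digits, far too many to enumerate in a table.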
Other problems are - defining a good reward signal, efficiently running (potentially millions) of simulated games.
The short answer is, this is not something to be taken lightly, it would be challenging and time consuming even for someone who has experience in the field and using Java is a no-no (Python is better). Given you probably don't have long to find a good solution, I would recommend trying a different approach - planning based maybe, or hard coding a reasonable strategy.
If you still want to go ahead and read up on the topic here are some good resources:
Reinforcement Learning: An Introduction (Sutton & Barto) - any edition is fine
Selected chapters in Artificial Intelligence: A Modern Approach (Russell & Norvig)
Hope this helps and sorry it may not have been the answer you were hoping for!
I am developing a chess game and at the moment I'm trying to implement a minimax algorithm. I haven't done this before, and the little I know about how to programmatically represent and implement the following evaluation function features (material, mobility, piece-square tables, centre control, trapped pieces, king safety, tempo and pawn structure) is not quite clear to me (I will be grateful if someone can explain them to me in detail). I have been able to assign values to each chess piece, piece action values and a square table for each piece. The problem I am having at the moment is how to generate piece attacked and defended values which will be added to or subtracted from the score. The idea here is that I want to reward the AI agent for protecting its pieces and penalize it for having its pieces attacked. Thanks in advance.
Each of the evaluation features you mentioned will take up compute time. As you may already be aware, playing strength of a chess engine comes from two sources:
Search
Evaluation
And both contend for the same valuable resource, compute time. Evaluation tends to be heuristics based and hence a bit fuzzy, whereas search tends to yield more concrete and relevant results. If you are starting to build an engine then I would recommend focusing on search while keeping evaluation basic (but not weak!). That way you will be able to tell exactly where something went wrong and hence avoid possible early disappointments. Moreover, popular engines like Stockfish also started out by first building a strong search algorithm.
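As an illustration of a deliberately basic evaluation, here is a sketch that counts material only. It reads the piece-placement field of a FEN string; the class name and the centipawn values are assumptions (commonly used ballpark figures, not taken from any particular engine).

```java
class MaterialEval {
    // Centipawn value of a piece letter; kings, digits and '/' count as 0.
    static int pieceValue(char piece) {
        switch (Character.toLowerCase(piece)) {
            case 'p': return 100;
            case 'n': return 320;
            case 'b': return 330;
            case 'r': return 500;
            case 'q': return 900;
            default:  return 0;
        }
    }

    // Material balance from White's point of view.
    // In FEN, uppercase letters are White pieces and lowercase are Black.
    static int evaluate(String piecePlacement) {
        int score = 0;
        for (char c : piecePlacement.toCharArray()) {
            int value = pieceValue(c);
            score += Character.isUpperCase(c) ? value : -value;
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"));
    }
}
```

A function this small is cheap to call at every leaf, which leaves the time budget for the search itself.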
If you've been patient enough to read this far, let me point you to two useful resources for evaluation:
Chess Programming Wiki's evaluation page: This website is probably the best online resource for chess engine development in general.
Link to a basic but not weak evaluation function: This is C# code. Unfortunately I can't find the original article that I based this evaluation on.
Hope it helps :)
I think that you shouldn't include the computation of attacked and defended pieces. That functionality is already taken into account by the minimax algorithm in a more efficient way.
A piece is under attack if the opponent can take it on the following move. If you try to evaluate this possibility in a static evaluation function, you will run into trouble doing it correctly. If my protected pawn is taken by the opponent's queen, that is not an issue. How do you take this into account? What if my queen can be taken by an opposing pawn, but moving that pawn exposes its own king to attack?
These considerations are better managed by the minimax algorithm, not the evaluator. Consider that to know how many pieces you can capture or can lose, you have to take into account all possible moves, and you would probably spend the same time that it would take to go one level deeper in the minimax algorithm. Moreover, that time is wasted if you later decide to proceed one step further in the minimax anyway.
I'm going to be competing in a board game AI competition at my school and am trying to come up with some ideas for concurrency to gain an edge. I will most likely be at a disadvantage because I will be implementing it in Java and I understand C or C++ would be much faster.
It doesn't seem like you could just split the game tree in half because of the move ordering which should leave the best moves first and it seems that it would be difficult or maybe even impossible to communicate the current alpha/beta at a given depth. I'm going to be using transposition tables as well which would need to be synchronized.
Besides searching, is there something that a second thread could be doing which could aid in the search or provide some type of speed increase? Each AI will have 5 seconds to make a move and your program can be working while the opponent is thinking.
Any input, no matter how obscure, would be appreciated.
An overview can be found in the Chess Programming Wiki's parallel search article. Even if your actual game is not chess, many concepts will also apply. The site also covers sophisticated solutions for shared transposition tables.
However, when you don't have much time, I would not start with a parallel search. You are correct that parallelism can increase the strength of the search algorithm. It is very difficult to get it right, though, and the benefits are way lower than one would expect.
If you want to experiment with parallelism, go ahead. It is an interesting topic. However, if you just want to get the best results in a limited amount of time, I would recommend to stick with a sequential search, and instead focus on move ordering and correctness.
It is possible. You have to set up communication between threads for alpha-beta pruning to help. Also, move ordering must be tweaked: it doesn't help if one thread has the best-rated moves to analyze while the others do not.
This is my first question here, if I did something wrong, tell me...
I'm currently making a draughts game in Java. In fact everything works except the AI.
The AI is at the moment single threaded, using minimax and alpha-beta pruning. This code works, I think, it's just very slow, I can only go 5 deep into my game tree.
1. I have a function that receives my mainboard, a depth (starts at 0) and a maxdepth. At this maxdepth it stops, returns the value (-1, 1 or 0) of the player with the most pieces on the board and ends the recursive call.
2. If maxdepth isn't reached yet, I calculate all the possible moves and execute them one by one, storing my changes to the mainboard in some way.
3. I also use alpha-beta pruning, e.g. when I find a move that can make the player win I don't bother with the next possible moves.
4. I calculate the next set of moves from that mainboard state recursively. I undo those changes (from point 2) when coming out of the recursive call. I store the values returned by those recursive calls and use minimax on those.
That's the situation, now I have some questions.
I'd like to go deeper into my game tree, thus I have to reduce the time it takes to calculate moves.
Is it normal that the values of the possible moves of the AI (e.g. the moves that the AI can choose between) are always 0? Or will this change if I can go deeper into the recursion? At this moment I can only go 5 deep (maxdepth) into my recursion because otherwise it takes way too long.
I don't know if it's useful, but how can I convert this recursion into a multithreaded recursion? I think this can divide the working time by some value...
Can someone help me with this please?
1. Is it normal that the values of the possible moves of the AI (e.g. the moves that the AI can choose between) are always 0?
Sounds strange to me. If the number of possible moves is 0, then that player can't play his turn. This shouldn't be very common, or have I misunderstood something?
If the value you're referring to represents the "score" of that move, then "always 0" would indicate that all moves are equally good, which obviously doesn't make a very good AI algorithm.
2. I don't know if it's useful, but how can I convert this recursion into a multithreaded recursion? I think this can divide the working time by some value...
I'm sure it would be very useful, especially considering that most machines have several cores these days.
What makes it complicated is your "try a move, record it, undo it, try next move" approach. This indicates that you're working with a mutable data structure, which makes it extremely complicated to parallelize the algorithm.
If I were you, I would let the board / game state be represented by an immutable data structure. You could then let each recursive call be treated as a separate task, and use a pool of threads to process them. You would get close to maximum utilization of the CPU(s) and at the same time simplify the code considerably (by removing the whole restore-to-previous-state code).
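A minimal sketch of that idea, using a toy game instead of draughts so that it stays self-contained: players alternately take 1 to 3 stones and whoever takes the last stone wins. The Position class, the game and all names are assumptions for illustration. Only the root moves are turned into tasks here, and there is no shared alpha-beta bound between threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelRootSearch {
    // Immutable game state: playing a move returns a new object.
    static final class Position {
        final int stones;
        Position(int stones) { this.stones = stones; }
        Position play(int take) { return new Position(stones - take); }
        int maxTake() { return Math.min(3, stones); }
    }

    // +1 if the side to move wins with best play, -1 otherwise.
    static int negamax(Position p) {
        if (p.stones == 0) return -1; // the opponent took the last stone
        int best = -1;
        for (int take = 1; take <= p.maxTake(); take++) {
            best = Math.max(best, -negamax(p.play(take)));
        }
        return best;
    }

    // Evaluate every root move as its own task; return the best number to take.
    static int bestMove(int stones) {
        Position root = new Position(stones);
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<Integer>> scores = new ArrayList<>();
            for (int take = 1; take <= root.maxTake(); take++) {
                final Position child = root.play(take);
                Callable<Integer> task = () -> -negamax(child);
                scores.add(pool.submit(task));
            }
            int bestTake = 1;
            int bestScore = Integer.MIN_VALUE;
            for (int i = 0; i < scores.size(); i++) {
                int score = scores.get(i).get(); // waits for that task to finish
                if (score > bestScore) { bestScore = score; bestTake = i + 1; }
            }
            return bestTake;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because no task ever mutates a shared board, no locking is needed. The price is one allocation per move and the loss of pruning information between sibling subtrees.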
Assuming you do indeed have several cores on your machine, this could potentially allow you to go deeper in the tree.
I would strongly recommend reading this book:
One Jump Ahead: Computer Perfection At Checkers
It will give you a deep history of computer AI in the game of Checkers and will probably give you some help with your evaluation function.
Instead of having an evaluation function that just gives 1/0/-1 for differing pieces, give a score of 100 for every regular piece and 200 for a king. Then give bonuses for piece structures. For instance, if my pieces form a safe structure that can't be captured, then I get a bonus. If my piece is all alone in the middle of the board, then I get a negative bonus. It is this richness of features for piece configurations that will allow your program to play well. The final score is the difference in the evaluation for both players.
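The material part of such an evaluation might be sketched like this. The board encoding as a char array and the class name are assumptions, and the structure bonuses mentioned above are left out.

```java
class DraughtsEval {
    static final int MAN = 100;
    static final int KING = 200;

    // squares: 'w' = white man, 'W' = white king, 'b' = black man,
    // 'B' = black king, any other character = empty square.
    // Returns the material difference from the given side's point of view.
    static int evaluate(char[] squares, boolean forWhite) {
        int white = 0;
        int black = 0;
        for (char c : squares) {
            switch (c) {
                case 'w': white += MAN;  break;
                case 'W': white += KING; break;
                case 'b': black += MAN;  break;
                case 'B': black += KING; break;
                default: break; // empty square
            }
        }
        int difference = white - black;
        return forWhite ? difference : -difference;
    }
}
```

Unlike a -1/0/1 score, this distinguishes a one-piece lead from a three-piece lead and a man from a king.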
Also, you shouldn't stop your search at a uniform depth. A quiescence search extends search until the board is "quiet". In the case of Checkers, this means that there are no forced captures on the board. If you don't do this, your program will play extremely poorly.
As others have suggested, transposition tables will do a great job of reducing the size of your search tree, although the program will run slightly slower. I would also recommend the history heuristic, which is easy to program and will greatly improve the ordering of moves in the tree. (Google history heuristic for more information on this.)
Finally, the representation of your board can make a big difference. Fast implementations of search do not make copies of the board each time a move is applied, instead they try to quickly modify the board to apply and undo moves.
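The apply-and-undo pattern, combined with alpha-beta pruning, can be sketched on a toy game so that the example stays self-contained: players alternately take 1 to 3 stones and whoever takes the last stone wins. The game and all names are assumptions for illustration; a draughts engine would apply and undo moves on its board in the same two places.

```java
class MakeUndoSearch {
    private int stones; // mutable state shared by the whole recursion

    MakeUndoSearch(int stones) { this.stones = stones; }

    int stonesLeft() { return stones; }

    // Negamax with alpha-beta pruning.
    // Returns +1 if the side to move wins with best play, -1 otherwise.
    int search(int alpha, int beta) {
        if (stones == 0) return -1; // the opponent took the last stone
        int best = Integer.MIN_VALUE;
        int maxTake = Math.min(3, stones);
        for (int take = 1; take <= maxTake; take++) {
            stones -= take;                      // apply the move in place
            int score = -search(-beta, -alpha);  // opponent's view, negated
            stones += take;                      // undo the move
            if (score > best) best = score;
            if (best > alpha) alpha = best;
            if (alpha >= beta) break;            // cutoff: remaining moves cannot matter
        }
        return best;
    }

    static int solve(int stones) {
        return new MakeUndoSearch(stones).search(-1, 1); // full score range is [-1, 1]
    }
}
```

No position object is allocated during the search, and the state is back to its starting value once the call returns.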
(I assume by draughts you mean what we would call checkers here in the States.)
I'm not sure if I understand your scoring system inside the game tree. Are you scoring by saying, "Position scores 1 point if player has more pieces than the opponent, -1 point if player has fewer pieces, 0 points if they have the same number of pieces"?
If so, then your algorithm might just be capture averse for the first five moves, or things are working out so that all captures are balanced. I'm not deeply familiar with checkers, but it doesn't seem impossible that this is so for only five moves into the game. And if it's only 5 plies (where a ply is one player's move, rather than a complete set of opposing moves) maybe it's not unusual at all.
You might want to test this by feeding in a board position where you know absolutely the right answer, perhaps something with only two checkers on the board with one in a position to capture.
As a matter of general principle, though, the board evaluation function doesn't make a lot of sense-- it ignores the difference between a piece and a crowned piece, and it treats a three piece advantage the same as a one piece advantage.
I'm a first year computer engineering student and I'm quite new here. I have been learning Java for the past three and a half months, and C++ for six months before that. My knowledge of Java is limited to defining and using own methods, absolute basics of object-oriented programming like use of static data members and member visibility.
This afternoon, my computer programming prof taught us about multi-dimensional arrays in Java. About multi-dimensional arrays being simply arrays of arrays and so on. He mentioned that in nominal, educational programming, arrays beyond 2 dimensions are almost never used. Even 3D arrays are used only where absolutely essential, like carrying out scientific functions. This leaves next to zero use for 4D arrays as using them shows that "you're using the wrong datatype" in my prof's words.
However, I'd like to write a program in which the use of a 4D array, of any data type, primitive or otherwise, is justified. The program must not be as trivial as printing the elements of the array.
I have no idea where to begin, this is why I am posting this here. I'd like your suggestions. Relevant problem statements, algorithms, and bits and pieces of code are also welcome.
Thank you.
Edit: Forgot to mention, I have absolutely no idea about working with GUIs in Java, so please do not post ideas that implement GUIs.
Ideas:
- Matrix multiplication and its applications, like finding shortest paths in graphs
- Solving of systems of equations
- Cryptography -- many cryptoprotocols represent data or keys or their internal structures in the form of matrices.
- Any algo on graphs represented as matrices
I must have been having some kind of fixation on matrices, sorry :)
For 4D arrays, one obvious thing I can think of is the representation of a 3D environment changing in time, so the 4th dimension represents the time scale. Or any representation of 3D data which has an additional associated property placed in the 4th dimension of the array.
You could create a Sudoku hypercube with 4 dimensions and store the numbers the user enters in a 4-dimensional int array.
One use could be applying dynamic programming to a function that takes 4 integer parameters f(int x,int y,int z,int w). To avoid calling this expensive function over and over again, you can cache the results in a 4D array, results[x][y][z][w]=f(x,y,z,w);.
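As one possible candidate for such a function, here is a sketch that counts the monotone lattice paths from the origin to the point (x, y, z, w), caching results in a 4D array. The choice of function and the class name are assumptions made for illustration.

```java
class LatticePaths4D {
    // results[x][y][z][w] caches f(x, y, z, w); 0 means "not computed yet",
    // which is safe here because every real value is at least 1.
    private final long[][][][] results;

    LatticePaths4D(int max) {
        int n = max + 1;
        results = new long[n][n][n][n];
    }

    // Number of paths from (0,0,0,0) to (x,y,z,w) using unit steps
    // that each increase exactly one coordinate.
    long f(int x, int y, int z, int w) {
        if (x < 0 || y < 0 || z < 0 || w < 0) return 0;
        if (x == 0 && y == 0 && z == 0 && w == 0) return 1;
        if (results[x][y][z][w] != 0) return results[x][y][z][w];
        long value = f(x - 1, y, z, w) + f(x, y - 1, z, w)
                   + f(x, y, z - 1, w) + f(x, y, z, w - 1);
        results[x][y][z][w] = value;
        return value;
    }

    public static void main(String[] args) {
        System.out.println(new LatticePaths4D(2).f(2, 2, 2, 2));
    }
}
```

Without the cache the recursion would revisit the same coordinates many times; with it, each cell is computed once.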
Now you just have to find an expensive integer function with arity of 4, oh, and a need for calculating it often...
Just to back him up,..your prof is quite right. I'm afraid I might be physically violent to anyone using a 4D+ array in production code.
It's kinda cool to be able to go into greater than 3 dimensions as an educational exercise but for real work it makes things way too complicated because we don't really have much comprehension of structures with greater than 3 dimensions.
The reason it's difficult to come up with a practical use for 4D+ arrays is because there is (almost) nothing that complicated in the real world to model.
You could look into modelling something like a tesseract, which is (in layman's terms) a 4D cube, or, as Victor suggests, use the 4th dimension to model time.
HTH
There are many possible uses. As others have said, you can model a hypercube or something that makes use of a hypercube as well as modeling a change over time. However, there are many other possible uses.
For example, one of the theoretical simulation models of our universe uses 11th dimensional physics. You can write a program to model what these assumed physics would look like. The user would only be able to see a 3-dimensional space which definitely limits usability, but the 4th dimensional coordinate could act like the changing of a channel allowing the user to change their perspective. If a 4th dimensional explosion occurs, for example, you might even want a 5th dimensional array so that you can model what it looks like in each connected 3-dimensional space as well as how it looks in each frame of time.
To take a step away from the scientific, think about an MMORPG. Today many of those games use "instanced" locations, which means that a copy of a given zone is created exclusively for the use of a given group of players so as to prevent lag. If this "instanced" concept were given a 4th dimensional coordinate and allowed players to shift their position across instances, it could effectively allow all server worlds to be merged together while giving the players a great deal of control over where they go and decreasing cost.
Of course, your question wants to know about ideas without using a GUI. That's a bit more difficult because you are working in a 2D environment. One real application would be calculus. We have 3D graphing calculators, but for higher dimensions you pretty much have to do it by hand. A program that aims to solve these calculations for you might not be able to properly display the information, but it can certainly calculate it. Also, when holographic interfaces become a widespread reality it may be possible to represent a hypercube graph in 3D, making such a program useful.
You might be able to write a text based board game where the position of pieces is represented with text. You can add dimensions and game rules to use them.
The simplest idea I could give you is a save state system. At each interval the program in memory is copied and stored into a file. Its coordinate is its position in time. At face value you may not need a 4D array to handle this, but suppose the program you were saving states of used a 3D array. You could set it up to represent each saved state as a position in time that you can make use of and then view the change over time.
I'm not sure what specifically you could do with this, because I just started thinking about it. But you could possibly use a 4D array for some sort of basic physics simulation, like modeling a projectile flight involving some wind values and what not. That just came to mind because the term 4D always brings to mind that the "position" of any object is 4 values, with time as the 4th.
As a physics student: we have only 3 dimensions of space, but we have a 4th dimension, which is time. Thinking that way, we can consider an array of any dimension (1D, 2D or 3D) whose values change with time, or an array which keeps a record of every such array as its values change over time.
This is quite familiar to us. For example, the "ATTENDANCE REGISTER" which we usually have in our classroom.
This is my view to it.
That's it.
Enjoy :-)
To give a concrete example for Ishtar's answer: four-string alignment. To compute the optimal two-string alignment, you write one string along one (say, horizontal) axis of a 2D array (a matrix!) and the other one along the other axis. The array is filled with edit costs, and the optimal alignment of the two strings is the one which produces the lowest cost, associated with a path through the matrix. A common way of finding such a path is the above-mentioned dynamic programming. You can look up 'Levenshtein distance' or 'edit distance' for technical details.
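For reference, the two-string case can be sketched as the classic Levenshtein distance with unit costs for insertion, deletion and substitution. The class name is made up for illustration.

```java
class EditDistance {
    // cost[i][j] = minimum number of edits turning the first i characters
    // of a into the first j characters of b.
    static int levenshtein(String a, String b) {
        int[][] cost = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) cost[i][0] = i; // delete everything
        for (int j = 0; j <= b.length(); j++) cost[0][j] = j; // insert everything
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int substitute = cost[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int delete = cost[i - 1][j] + 1;
                int insert = cost[i][j - 1] + 1;
                cost[i][j] = Math.min(substitute, Math.min(delete, insert));
            }
        }
        return cost[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));
    }
}
```

The bottom-right cell holds the total cost; tracing back through the cells that produced each minimum recovers the alignment itself.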
The basic idea can be expanded to any number of strings. For four strings you'd need a four-dimensional array, to write each string along one of the dimensions.
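To show what a 4D table looks like in practice, here is a sketch that computes the length of the longest common subsequence of four strings. Note that this is a simpler relative of alignment with edit costs, swapped in to keep the example short; the class name is made up for illustration.

```java
class FourStringLcs {
    // t[i][j][k][l] = LCS length of the first i, j, k and l characters
    // of a, b, c and d respectively.
    static int lcs(String a, String b, String c, String d) {
        int[][][][] t =
                new int[a.length() + 1][b.length() + 1][c.length() + 1][d.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                for (int k = 1; k <= c.length(); k++) {
                    for (int l = 1; l <= d.length(); l++) {
                        char ch = a.charAt(i - 1);
                        if (ch == b.charAt(j - 1) && ch == c.charAt(k - 1)
                                && ch == d.charAt(l - 1)) {
                            // All four strings end in the same character: extend.
                            t[i][j][k][l] = t[i - 1][j - 1][k - 1][l - 1] + 1;
                        } else {
                            // Otherwise drop the last character of one string.
                            t[i][j][k][l] = Math.max(
                                    Math.max(t[i - 1][j][k][l], t[i][j - 1][k][l]),
                                    Math.max(t[i][j][k - 1][l], t[i][j][k][l - 1]));
                        }
                    }
                }
            }
        }
        return t[a.length()][b.length()][c.length()][d.length()];
    }
}
```

The table has one cell per combination of prefix lengths, so memory and time grow with the product of the four string lengths.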
In practice, however, multiple string alignment is not done this way, for at least two reasons:
Lack of flexibility: Why would you need to align exactly four strings? In computational molecular biology, for example, you might wish to align many strings (think of DNA sequences), and their number is not known in advance, but it is seldom four. Your program would be useful for a very limited class of problems.
Computational complexity, in space and time. The requirements are exponential in the number of dimensions, making the approach impractical for most real-world purposes. Besides, most of the entries in such a multi-dimensional array would lie on suboptimal paths which are never even touched, so storing them would simply be a waste of space.
So, for all practical purposes, I believe your professor was right.