How do I make a reinforcement learning agent in Java?

How do I make a reinforcement learning agent in Java? - java

I have a challenge that my teacher gave to beat an army of his soldiers on a 18x24 grid, with random obstacles placed on the board. The game is turn based and I have an army of 50 soldiers, each of which needs to either move or attack on their turn.
My problem is I only have access to creating a class of soldiers to fight in this environment. Currently I have a method that evaluates the board position by looking at how many soldiers there are left from each team and does yourTeam - enemyTeam to get the current score, and I have a method that will produce the legal moves for the soldier.
I want to know how I would create a reinforcement learning agent in Java with what I have access to. If you know any ways to do this or any resources that may help that would be great. Thank you for the help!

Java is not a good language for doing math heavy computation (which is what you will need to do for RL). You could attempt to implement the Q-Learning, value-iteration or policy-iteration algorithms but I would avoid doing anything with neural networks/modern deep RL approaches here as your work load will increase dramatically.
With regard to your problem, if you are to implement one of the old-school algorithms. Think about your state and action space. I have serious concerns about the size of your action space, even with a small number of moves for each solider (say 3 - attack, move up, move down) with 50 soldiers the action space will be very large - 50^3, even this many will be difficult to deal with, any more (even 4 or 5) will send you deep into some complex topics in RL.
Other problems are - defining a good reward signal, efficiently running (potentially millions) of simulated games.
The short answer is, this is not something to be taken lightly, it would be challenging and time consuming even for someone who has experience in the field and using Java is a no-no (Python is better). Given you probably don't have long to find a good solution, I would recommend trying a different approach - planning based maybe, or hard coding a reasonable strategy.
If you still want to go ahead and read up on the topic here are some good resources:
Reinforcement Learning an Introduction (Sutton & Barto) - any edition is fine
Selected chapters in Artificial Intelligence: A Modern Approach (Russel & Norvig)
Hope this helps and sorry it may not have been the answer you we hoping for!

Related

Working with data in java

I've been using a formula for some time to try to find value in spreads for sports betting and done this by basically creating my own spread and comparing to what bookies offer and would like to automate the process.
I've written some code in java which will do the maths on the data I give it and I'm looking for a way to populate the input data either from a database or from an xml file I create but i'm quite new to programming.
Say for example if I pick two teams to compare. For each team I need a list of the teams they played, how many points each team scored in total, how many points each team conceded in total and how many games each team played so I can run the maths on those figures and I have no idea where to start. Could anyone help me or point me in the right direction?

It sounds like you've defined your problem (how to start), and also listed the information you need to get started (compare two teams, points, previous games, conceded points, etc). Are you sure you don't know how to start?
For a point in the right direction - I recommend creating a test case where you select two teams, give them some sample data for their previous games, points scored and conceded, and start working on the structure of your program.
This question lends itself too much to personal opinion and personal experience, and that makes it difficult to give you definitive answers without looking at any code or a program layout.
Give it your best effort and reply back with what you come up with, that will be much easier to critique and offer suggestions to.

Artificial Intelligence for a 'Blokus' game (1-4 Player)

we are working on a little Java game, based on the game Blokus.
Blokus-Manual
I'm a Java-beginner and plan to implement an advanced artificial intelligence. We already have a random AI (picks a random valid move) and an AI with a simple move-rating mechanism. We also want an AI which should be as good as possible (or at least very good ;) ).
The question is: Which AI-concept would be suitable for our purpose?
The minimax-algorithm seems to be a valid choice, but how do you adapt it to a 4-player-game? Are there better concepts for a game like blokus?
Thanks already :)

Min-max is hard to implement in a 4 player game because:
Decision tree grows exponentially, so you're going to be bounded by memory and/or computation time to a log(medMoves)=N steps. For a 4 player game, this is down to N/4. If N is 8 for example, you're only going to be able to see 2 moves ahead for each player.
Player collusion is hard to account for. In a realistic game, some players might help each other out (even if they're not on the same team). This will cause them to deviate from their personal 'maximum'.
If you want Minmax, you're going to have to do a lot of pruning to make it viable. What I would suggest is learning a few patterns so the AI would know how to react. This can be done via neural net, or reinforcement learning with a few tweaks.
These patterns could be be static (you can create the input scenario manually or programmatically), or dynamic (create all valid scenarios and randomly makes moves select the ones with the best score).

Theoretically speaking, an "as good as possible AI" is a perfect AI, which is an AI that has, at any moment in the game, full knowledge of the game state (if the full game state is not known by human players). In case of games that everyone has full game state knowledge (like Blokus), a good as possible AI is an AI that can try to predict the very best move to make (minimax here, as you said). You can also google for genetic algorithms and simulated annealing, as they are valid, depending on what you want. Also, you can use minimax for more than 2 players.

I would recommend minimax algorithm. One thing you can add to make it more efficient (meaning you should be able go more moves deep into the future) is alpha-beta pruning.
The problem with minimax search is that the number of games states it has to examine is exponential in the depth of the tree. Unfortunately, we can't eliminate the exponent, but it turns out we can effectively cut it in half.
The quote is from Chapter 5.3 of Artificial Intelligence: A Modern Approach third edition by Stuart Russel and Peter Norvig. It was holding up my monitor, and I used it in a few of my classes in college. I know people don't often reference books on SO, but it's extremely relevant. I have used it extensively, and I do really recommend it for both being understandable, and covering a wide range of AI content.
It is available on amazon for $104, or * cough cough * I'm sure you can find it online if you don't have that kind of money for a textbook floating around. Looking up the minimax algorithm and alpha beta pruning online should also get you good results.
I think the only circumstance that would make Minimax a poor option for you is if the game state is only partially observable to any given player (they don't know everything about what's going on), or if the game is non-deterministic (it has random elements). Because neither of these are the case for Blokus, I think you made an excellent choice with Minimax.
The area of AI is called Adversarial Search in the textbook (Chapter 5: Adversarial Search), so looking up more info online with that term may get you more helpful information, or help you find an example Java implementation. I do not consider this a beginner's task, but it sounds like you are up to it, if you made the game and can pick random valid moves. Keep up the good work!

In 2011, with many updates since then, a program called Pentobi
was released, and it is a very strong Blokus playing program.
The only one known to date, in fact, which is any good at all, and it
surpasses all the others by a great deal. It will beat many good human players and gives even the best a run for their money.
Its main algorithm is Monte Carlo Search Tree, but it also uses a "book" of openings and some heuristics.
There is documentation and download information at
http://pentobi.sourceforge.net/

I found that using a very simple heuristic provides a fairly intelligent player even using only 1-step look ahead. I implemented what I called "space heuristic" which takes the board state and floods it by coloring all squares adjacent to each placed piece the color of that piece. Then, the total number of colored squares is counted once the flooding terminates. The space heuristic gives a rough estimate of how much a play claims or occupies board space, and way outperforms random play. Could be combined with minimax or MCTS to get way stronger as well.

Need help on chess game evaluation function

I am developing a chess game and at the moment I'm trying to implement a minimax algorithm. I haven't done this before, also the little i known about how to programmatically represent and implement the following evaluation function features(material, mobility, piece square table, centre control, trapped piece, king safety, tempo and pawn structure) is not quite clear to me (I will be grateful if someone can explain to me in detail). I have been able to assign values to each chess pieces, piece action values and a square table for each piece. The problem am having at the moment is how to generate Piece attacked and defended values which will be added or subtracted from the score. The idea here is that i want to reward the AI agent for protecting its pieces and penalize it for having the pieces attacked. thanks in advances.

Each of the evaluation features you mentioned will take up compute time. As you may already be aware, playing strength of a chess engine comes from two sources:
Search
Evaluation
And both contend for the same valuable resource, compute time. Evaluation tends to be heuristics based and hence a bit fuzzy, whereas search tends to yield more concrete and relevant results. If you are starting to build an engine then I would recommend focusing on search while keeping evaluation basic (but not weak!). That way you will be able to tell exactly where something went wrong and hence avoid possible early disappointments. Moreover, popular engines like Stockfish also started out by first building a strong search algorithm.
If you've been patient enough to read this far, let me point you to two useful resources for evaluation:
Chess Programming Wiki's evaluation page: This website is probably the best online resource for chess engine development in general.
Link to a basic but not weak evaluation function: This is C# code. Unfortunately I can't find the original article that I based this evaluation on.
Hope it helps :)

I think that you shouldn't include the computation on attack and defended pieces. That functionality is already taken into account by the minmax algorithm in a more efficient way.
A piece in under attack if at the following move the opponent can take it. If you try to evaluate this possibility in a static evaluation function you will get into troubles if you want to do it correctly. If my protected pawn is taken by the opponent queen that is not an issue. How do you take this into account? If my queen is taken by the opposite pawn but moving the pawn puts the king under attack?
These considerations are better managed by the minmax algorithm, not the evaluator. Consider that to know how many pieces you can eat/can be eaten, you should take into account all possible moves and you probably would spend the same time that would be used to go one level deeper in the minmax algorithm. Moreover that time is wasted if you later decide to indeed proceed one step further in the minmax.

Concurrently search a game tree using minimax and AB pruning. Is that possible?

I'm going be competing in a board game AI competition at my school and am trying to come up with some ideas for concurrency to gain an edge. I will most likely be at a disadvantage because I will be implementing it in java and I understand c or c++ would be much faster.
It doesn't seem like you could just split the game tree in half because of the move ordering which should leave the best moves first and it seems that it would be difficult or maybe even impossible to communicate the current alpha/beta at a given depth. I'm going to be using transposition tables as well which would need to be synchronized.
Besides searching, is there something that a second thread could be doing which could aid in the search or provide some type of speed increase. Each AI will have 5 seconds to make a move and your program can be working while the opponent is thinking.
Any input, no matter how obscure, would be appreciated.

An overview can be found in the Chess Programming Wiki's parallel search article. Even if your actual game is not chess, many concepts will also apply. The site also covers sophisticated solutions for shared transposition tables.
However, when you don't have much time, I would not start with a parallel search. You are correct that parallelism can increase the strength of the search algorithm. It is very difficult to get it right, though, and the benefits are way lower than one would expect.
If you want to experiment with parallelism, go ahead. It is an interesting topic. However, if you just want to get the best results in a limited amount of time, I would recommend to stick with a sequential search, and instead focus on move ordering and correctness.

It is possible. You have to make communication between threads to have AB prunning help. Also, move ordering must be tweaked, it doesn't help if one thread has the best-rated moves to analyze while the others not.

Want to implement a reinforcement learning connect four agent

I want to implement a reinforcement learning connect four agent.
I am unsure how to do so and how it should look. I am familiar with the theoretical aspects of reinforcement learning but don't know how they should be implemented.
How should it be done?
Should I use TD(lambda) or Q-learning, and how do MinMax trees come in to this?
How does my Q and V functions work (Quality of action and Value of state). How do I score those things? What is my base policy which I improve, and what is my model?
Another thing is how should I save the states or statesXactions (depending on the learning algorithm). Should I use neural networks or not? And if yes, how?
I am using JAVA.
Thanks.

This might be a more difficult problem than you think, and here is why:
The action space for the game is the choice of column to drop a piece into. The state space for the game is an MxN grid. Each column contains up to M pieces distributed among the 2 players.This means there are (2M+1-1)N states. For a standard 6x7 board, this comes out to about 1015. It follows that you cannot apply reinforement learning to the problem directly. The state value function is not smooth, so naíve function approximation would not work.
But not all is lost. For one thing, you could simplify the problem by separating the action space. If you consider the value of each column separately, based on the two columns next to it, you reduce N to 3 and the state space size to 106. Now, this is very manageable. You can create an array to represent this value function and update it using a simple RL algorithm, such as SARSA.
Note, that the payoff for the game is very delayed, so you might want to use eligibility traces to accelerate learning.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.