I recently implemented a simple Deep Q-Learning agent in Processing for Frozen Lake (a game from OpenAI Gym). The agent has to find the shortest path between the start and the goal while avoiding obstacles (holes in the ice) and without leaving the map.
This is the code that generates the state passed to the Neural Network:
//Return an array of doubles containing all 0s except for the cell the agent is on, which is 1.
private double[] getState()
{
  double[] state = new double[cellNum];
  // Index loop instead of indexOf(), which would rescan the list for every cell.
  for(int i = 0; i < lake.cells.size(); i++)
  {
    Cell cell = lake.cells.get(i);
    // The agent's x and y are offset by half a cell from the cell's own coordinates.
    boolean agentOnCell = (x - cellDim/2) == cell.x && (y - cellDim/2) == cell.y;
    state[i] = agentOnCell ? 1 : 0;
  }
  return state;
}
Here lake is the environment object, cells is an ArrayList field of lake containing all the squares of the map, and x and y are the agent's coordinates on the map.
All of this works well, but the agent only learns the best path for a single map; if the map changes, the agent must be trained all over again.
I wanted the agent to learn how to play the game and not how to play a single map.
So, instead of setting every map square to 0 except the agent's square (set to 1), I tried assigning arbitrary numbers to each kind of square (Goal:1, Ice:8, Hole:0, Goal:3, Agent:7) and feeding those as the input, but it didn't work at all.
Then I tried converting the color of each square into a grayscale value (from 0 to 255), so that the different squares were mapped (roughly) as: Goal:45, Ice:243.37, Hole:34.57, Goal:70.8, Agent:150.
That didn't work either, so I scaled all the grayscale values to the range 0 to 1.
That gave no result either.
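One widely used alternative to giving each tile type an arbitrary number (or a grayscale value) is to one-hot encode the whole map, extending getState()'s idea from "where is the agent" to "what is on every cell". Each cell gets one input per tile type plus one input for the agent, so the network cannot read a spurious ordering into the values (for example, that Ice = 8 is "larger" than Hole = 0). This is a sketch, not code from the linked project: the Tile enum and the row-major tile array are hypothetical stand-ins for the project's Cell objects.

```java
import java.util.Arrays;

class OneHotMapEncoder {
    // Hypothetical tile types; adapt to however the project stores a cell's kind.
    enum Tile { START, ICE, HOLE, GOAL }

    // One input per tile type, plus one extra input marking the agent's cell.
    static final int CHANNELS = Tile.values().length + 1;

    /**
     * Encodes a map as a flat vector of length tiles.length * CHANNELS.
     * Block i describes cell i: a 1 in the slot of its tile type, and a 1 in
     * the last slot if the agent is standing on that cell.
     */
    static double[] encode(Tile[] tiles, int agentCell) {
        double[] state = new double[tiles.length * CHANNELS];
        for (int i = 0; i < tiles.length; i++) {
            int base = i * CHANNELS;
            state[base + tiles[i].ordinal()] = 1.0;
            if (i == agentCell) {
                state[base + CHANNELS - 1] = 1.0;
            }
        }
        return state;
    }

    public static void main(String[] args) {
        // A 2x2 map in row-major order (S H / I G) with the agent on the start cell.
        Tile[] map = { Tile.START, Tile.HOLE, Tile.ICE, Tile.GOAL };
        System.out.println(Arrays.toString(encode(map, 0)));
    }
}
```

With this encoding the network's input layer needs cellNum × CHANNELS neurons instead of cellNum. Presumably the agent would also have to be trained on many different maps for it to learn the game rather than one layout.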
For reference, this is the code the neural network uses to compute its output:
public Layer[] estimateOutput(double[] input)
{
Layer[] neurons = new Layer[2]; //Hidden neurons [0] and Output neurons [1].
neurons[0] = new Layer(input); //To be transformed into Hidden neurons.
//Hidden neurons values calculation.
neurons[0] = neurons[0].dotProduct(weightsHiddenNeurons).addition(biasesHiddenNeurons).sigmoid();
//Output neurons values calculation.
neurons[1] = neurons[0].dotProduct(weightsOutputNeurons).addition(biasesOutputNeurons);
if(gameData.trainingGames == gameData.gamesThreshold)
{
//this.render(new Layer(input), neurons[0], neurons[1].sigmoid()); //Draw Agent's Neural Network.
}
return neurons;
}
And this is the code for learning:
public void learn(Layer inputNeurons, Layer[] neurons, Layer desiredOutput)
{
Layer hiddenNeurons = neurons[0];
Layer outputNeurons = neurons[1];
Layer dBiasO = (outputNeurons.subtraction(desiredOutput)).valueMultiplication(2);
Layer dBiasH = (dBiasO.dotProduct(weightsOutputNeurons.transpose())).layerMultiplication((inputNeurons.dotProduct(weightsHiddenNeurons).addition(biasesHiddenNeurons)).sigmoidDerivative());
Layer dWeightO = (hiddenNeurons.transpose()).dotProduct(dBiasO);
Layer dWeightH = (inputNeurons.transpose()).dotProduct(dBiasH);
//Set new values for Weights and Biases
weightsHiddenNeurons = weightsHiddenNeurons.subtraction(dWeightH.valueMultiplication(learningRate));
biasesHiddenNeurons = biasesHiddenNeurons.subtraction(dBiasH.valueMultiplication(learningRate));
weightsOutputNeurons = weightsOutputNeurons.subtraction(dWeightO.valueMultiplication(learningRate));
biasesOutputNeurons = biasesOutputNeurons.subtraction(dBiasO.valueMultiplication(learningRate));
}
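The gradients in learn() correspond to a squared-error loss on the (linear) output layer. Writing x for the input row vector, W_h, b_h, W_o, b_o for the weights and biases, y for desiredOutput and η for learningRate, and assuming Layer.sigmoidDerivative() computes σ'(z) = σ(z)(1 − σ(z)) from the pre-activation z (learn() passes it input·W_h + b_h), each line of the method matches one term below:

```latex
\begin{aligned}
z_h &= x W_h + b_h, \qquad h = \sigma(z_h), \qquad q = h W_o + b_o,\\
L &= \lVert q - y \rVert^2,\\
\delta_o &= \frac{\partial L}{\partial b_o} = 2\,(q - y) && \text{(dBiasO)}\\
\frac{\partial L}{\partial W_o} &= h^\top \delta_o && \text{(dWeightO)}\\
\delta_h &= \frac{\partial L}{\partial b_h} = \left(\delta_o W_o^\top\right) \odot \sigma'(z_h) && \text{(dBiasH)}\\
\frac{\partial L}{\partial W_h} &= x^\top \delta_h && \text{(dWeightH)}\\
\theta &\leftarrow \theta - \eta\,\frac{\partial L}{\partial \theta}, \qquad \theta \in \{W_h, b_h, W_o, b_o\}.
\end{aligned}
```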
Anyway, the whole project is available on GitHub, where the code is better commented: https://github.com/Nyphet/Frozen-Lake-DQL
What am I doing wrong when setting the input? How can I get the agent to "learn the game" instead of "learning the map"?
Thanks in advance.
I'm creating a game where you pick a nation and have to manage it, but I can't find a way to load the map without the program crashing, which I assume is due to the heavy computation (a performance problem).
I made an algorithm that loops through every pixel of an image of the map's provinces (the game's spatial unit). Each province has its own color, so when I encounter a pixel whose color I haven't seen yet, I know it's a new province, and I can therefore create a new Province() instance for it with the information loaded from a file.
All of the above works fine and takes almost no time, but to update the map when nations attack each other, I need a way to render each province individually so I can give it its nation's color with a shader.
I've added the code below, which takes the current pixel position, scales it down to OpenGL coordinates and saves it in an ArrayList (currVertices). Once a new province is found, that list is converted to a float[] and added to another ArrayList (provinceVertices).
(I know the code isn't pretty and I'm not an expert programmer (I'm 14), so please be kind when telling me what I did wrong. I've tried storing only one vertex every 4 pixels to make the list smaller, but it still crashes.)
List<Float> currVertices = new ArrayList<Float>(); // the vertices of the current province
for (int y = 0; y < worldImage.getHeight(); y++) {
    for (int x = 0; x < worldImage.getWidth(); x++) {
        if (!currColors.contains(worldImage.getRGB(x, y))) {
            if (!currVertices.isEmpty())
                provinceVertices.add(Utils.toFloatArray(currVertices)); // store the current province's vertices into the total database
            currVertices.clear();
        }
        // Keep one vertex every 4 pixels; x and y are added together so they stay paired.
        if (x % 4 == 0 && y % 4 == 0) {
            currVertices.add((float) x / EngineManager.getWindowWidth());
            currVertices.add((float) y / EngineManager.getWindowHeight());
        }
    }
}
I've only included the code that loads the vertices. This is the helper that converts the list to a float[]:
public static float[] toFloatArray(List<Float> list) {
float[] array = new float[list.size()];
ListIterator<Float> iterator = list.listIterator();
while (iterator.hasNext()) {
array[iterator.nextIndex()] = list.get(iterator.nextIndex());
}
return array;
}
The goal is for the second ArrayList to hold all the vertices in the right order, but when I try to add currVertices to provinceVertices, the game just crashes with no error message, which is why I'm guessing the problem is performance-related.
(The vertices load fine into the currVertices list.)
Calling nextIndex() doesn't advance the iterator, so hasNext() stays true and the while loop never ends. Use next() instead:
while (iterator.hasNext()) {
array[iterator.nextIndex()] = iterator.next();
}
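A complete version of the helper with that fix, plus an index-based variant that has no iterator cursor to forget to advance. Both are sketches; the class name ArrayUtils is a hypothetical stand-in for the project's Utils class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class ArrayUtils {
    // Iterator version: next() returns the element AND moves the cursor forward,
    // so the loop ends once every element has been copied.
    static float[] toFloatArray(List<Float> list) {
        float[] array = new float[list.size()];
        ListIterator<Float> iterator = list.listIterator();
        while (iterator.hasNext()) {
            int index = iterator.nextIndex(); // read the index before next() advances
            array[index] = iterator.next();
        }
        return array;
    }

    // Index-based alternative; get(i) is O(1) for an ArrayList.
    static float[] toFloatArrayIndexed(List<Float> list) {
        float[] array = new float[list.size()];
        for (int i = 0; i < array.length; i++) {
            array[i] = list.get(i); // auto-unboxing: a null element would throw NullPointerException
        }
        return array;
    }

    public static void main(String[] args) {
        List<Float> vertices = new ArrayList<>(List.of(0.25f, 0.5f, 0.75f));
        System.out.println(Arrays.toString(toFloatArray(vertices)));
    }
}
```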
I've tried to create a basic neural network using the book "Make Your Own Neural Network" by Tariq Rashid and The Coding Train videos:
https://www.youtube.com/watch?v=XJ7HLz9VYz0&list=PLRqwX-V7Uu6aCibgK1PTWWu9by6XFdCfh
with the nn.js class from The Coding Train's GitHub repository as a reference:
https://github.com/shiffman/Neural-Network-p5/blob/master/nn.js
I'm writing the network in Java and, just like in the playlist, after getting a single perceptron working I tried to train the network on XOR.
But for some reason it doesn't learn, even though my code is similar to the book's and to the videos' (which use JavaScript).
I train the network around 500,000 times on a data set of the XOR inputs (4 in total: [1,0], [0,1], [0,0], [1,1]).
When I then ask it to guess all 4 cases, I get results closer to 0.5 than to the expected 1, 1, 0, 0 (the test inputs are in the order [1,0], [0,1], [0,0], [1,1]).
This is my training function:
public void train(double [] inputs, double[] target) {
//generates the Hidden layer values
this.input = Matrix.fromArrayToMatrix(inputs);
feedForward(inputs);
//convert to matrices
Matrix targets = Matrix.fromArrayToMatrix(target);
//calculate the output error
Matrix outputErrors = Matrix.subtract(targets, output);
//calculate the Gradient
Matrix outputGradient = Matrix.map(output, NeuralNetwork::sigmoidDerivative);
outputGradient = Matrix.matrixMultiplication(outputGradient, outputErrors);
outputGradient.multiply(this.learningRate);
//adjust the output layer bias
this.bias_Output.add(outputGradient);
//calculate the hidden layer weights delta
Matrix hiddenT = Matrix.Transpose(hidden);
Matrix hiddenToOutputDelta = Matrix.matrixMultiplication(outputGradient, hiddenT);
//adjust the hidden layer weights
this.weightsHiddenToOutput.add(hiddenToOutputDelta);
//calculate the hidden layer error
Matrix weightsHiddenToOutputT = Matrix.Transpose(weightsHiddenToOutput);
Matrix hiddenErrors = Matrix.matrixMultiplication(weightsHiddenToOutputT, outputErrors);
//calculate the hidden gradient
Matrix hiddenGradient = Matrix.map(this.hidden, NeuralNetwork::sigmoidDerivative);
hiddenGradient = Matrix.matrixMultiplication(hiddenGradient, hiddenErrors);
hiddenGradient.multiply(this.learningRate);
//adjust the hidden layer bias
this.bias_Hidden.add(hiddenGradient);
//calculate the input layer weights delta
Matrix inputT = Matrix.Transpose(this.input);
Matrix inputToHiddenDelta = Matrix.matrixMultiplication(hiddenGradient, inputT);
//adjust the hidden layer weights
this.weightsInputToHidden.add(inputToHiddenDelta);
}
These are the sigmoid functions:
private static double sigmoid(double x) {
return 1d / (1d+ Math.exp(-x));
}
private static double sigmoidDerivative(double x) {
return (x * (1d - x));
}
I calculate the derivative this way because the values have already gone through the sigmoid function during the feed-forward pass.
And this is my guess (feed-forward) function:
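The shortcut relies on the identity σ'(z) = σ(z)(1 − σ(z)), so sigmoidDerivative() must receive the sigmoid's output, not its input. A quick numeric check of that identity against a central finite difference (a sketch; the class and helper names are illustrative):

```java
class SigmoidDerivativeCheck {
    static double sigmoid(double x) {
        return 1d / (1d + Math.exp(-x));
    }

    // Expects s = sigmoid(z), i.e. a value that has already been through the activation.
    static double sigmoidDerivativeFromOutput(double s) {
        return s * (1d - s);
    }

    // Central finite-difference approximation of d sigmoid / dz.
    static double numericDerivative(double z) {
        double h = 1e-5;
        return (sigmoid(z + h) - sigmoid(z - h)) / (2 * h);
    }

    public static void main(String[] args) {
        for (double z : new double[] {-2, 0, 0.5, 3}) {
            double analytic = sigmoidDerivativeFromOutput(sigmoid(z));
            System.out.printf("z=%.1f analytic=%.8f numeric=%.8f%n", z, analytic, numericDerivative(z));
        }
    }
}
```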
public double[] feedForward(double [] inputs) {
double[] guess;
//generates the Hidden layer values
input = Matrix.fromArrayToMatrix(inputs);
hidden = Matrix.matrixMultiplication(weightsInputToHidden, input);
hidden.add(bias_Hidden);
//activation function
hidden.map(NeuralNetwork::sigmoid);
//Generates the output layer values
output = Matrix.matrixMultiplication(weightsHiddenToOutput, hidden);
output.add(bias_Output);
//activation function
output.map(NeuralNetwork::sigmoid);
guess = Matrix.fromMatrixToArray(output);
return guess;
}
This is the data set I'm giving it in the main class:
NeuralNetwork nn = new NeuralNetwork(2,2,1);
double [] label0 = {0};
double [] label1 = {1};
Literal l1 = new Literal(label1,0,1);
Literal l2 = new Literal(label1,1,0);
Literal l3 = new Literal(label0,0,0);
Literal l4 = new Literal(label0,1,1);
Literal[] arr = {l1, l2, l3, l4};
Random random = new Random();
for(int i = 0 ; i<500000 ; i++) {
Literal l = arr[i%4];
nn.train(l.getTruthValue(), l.getLabel());
}
System.out.println(Arrays.toString(nn.feedForward(l1.getTruthValue())));
System.out.println(Arrays.toString(nn.feedForward(l2.getTruthValue())));
System.out.println(Arrays.toString(nn.feedForward(l3.getTruthValue())));
System.out.println(Arrays.toString(nn.feedForward(l4.getTruthValue())));
But for some reason the outputs look like this:
[0.47935468493879807]
[0.5041956026507048]
[0.4575246472403595]
[0.5217568912941623]
I've also tried subtracting instead of adding in every bias and weight update (because you need the negative gradient, although both the book and the videos add rather than subtract), i.e. changing these 4 lines to subtract:
this.bias_Output.subtract(outputGradient);
this.weightsHiddenToOutput.subtract(hiddenToOutputDelta);
this.bias_Hidden.subtract(hiddenGradient);
this.weightsInputToHidden.subtract(inputToHiddenDelta);
These are the two main outputs I get:
[0.9999779359460259]
[0.9999935716126019]
[0.9999860145346924]
[0.999990155468117]
or
[1.7489664881918983E-5]
[6.205315404676972E-6]
[8.41530873105465E-6]
[1.1853929628341918E-5]
I'm fairly sure the problem isn't in my Matrix class, because I tested add, subtract, multiply and transpose beforehand and they all worked fine.
I would really appreciate it if someone could look at this code and help me figure out the problem.
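For comparison, here is a self-contained sketch of the same kind of network trained on XOR, using plain arrays instead of the Matrix class. Because the error is computed as target − output, adding the deltas already moves down the gradient of the squared error, so the add version has the right sign. The step worth double-checking is the product of derivative and error: nn.js multiplies them element by element (its instance multiply() with a Matrix argument), whereas train() calls Matrix.matrixMultiplication(hiddenGradient, hiddenErrors), and for a 2×1 gradient and a 2×1 error a matrix product is not defined, so it is worth checking what the Matrix class returns there. Unlike the book and nn.js, this sketch propagates the output delta (error × σ′) rather than the raw error to the hidden layer, which is the exact gradient, and computes it before the output weights change. The class name, 4 hidden nodes, learning rate 0.5 and seed are arbitrary assumptions.

```java
import java.util.Random;

/** 2-H-1 sigmoid network trained by SGD on the loss 0.5 * (target - output)^2. */
class XorNetSketch {
    final double[][] wIH; // [hidden][input]
    final double[] bH;    // one bias per hidden node
    final double[] wHO;   // hidden -> single output
    double bO;
    final double lr;

    XorNetSketch(int inputs, int hiddenNodes, double lr, long seed) {
        Random r = new Random(seed);
        this.lr = lr;
        wIH = new double[hiddenNodes][inputs];
        bH = new double[hiddenNodes];
        wHO = new double[hiddenNodes];
        for (int j = 0; j < hiddenNodes; j++) {
            for (int i = 0; i < inputs; i++) wIH[j][i] = r.nextDouble() * 2 - 1;
            bH[j] = r.nextDouble() * 2 - 1;
            wHO[j] = r.nextDouble() * 2 - 1;
        }
        bO = r.nextDouble() * 2 - 1;
    }

    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    double[] hidden(double[] x) {
        double[] h = new double[bH.length];
        for (int j = 0; j < h.length; j++) {
            double z = bH[j];
            for (int i = 0; i < x.length; i++) z += wIH[j][i] * x[i];
            h[j] = sigmoid(z);
        }
        return h;
    }

    double output(double[] h) {
        double z = bO;
        for (int j = 0; j < h.length; j++) z += wHO[j] * h[j];
        return sigmoid(z);
    }

    double predict(double[] x) { return output(hidden(x)); }

    void train(double[] x, double target) {
        double[] h = hidden(x);
        double out = output(h);
        double error = target - out;               // same sign convention as the question
        double deltaOut = error * out * (1 - out); // element-wise: error times sigmoid'(z_out)
        for (int j = 0; j < h.length; j++) {
            // Hidden delta uses wHO[j] BEFORE it is updated on the next line.
            double deltaHid = deltaOut * wHO[j] * h[j] * (1 - h[j]);
            wHO[j] += lr * deltaOut * h[j];        // "+=" is descent because error = target - output
            for (int i = 0; i < x.length; i++) wIH[j][i] += lr * deltaHid * x[i];
            bH[j] += lr * deltaHid;
        }
        bO += lr * deltaOut;
    }

    public static void main(String[] args) {
        double[][] xs = {{1, 0}, {0, 1}, {0, 0}, {1, 1}};
        double[] ys = {1, 1, 0, 0};
        XorNetSketch nn = new XorNetSketch(2, 4, 0.5, 42);
        for (int i = 0; i < 500_000; i++) nn.train(xs[i % 4], ys[i % 4]);
        for (double[] x : xs) System.out.println(nn.predict(x));
    }
}
```

Running main() prints the four predictions in the question's test order; how close they get to 1, 1, 0, 0 can depend on the seed and the number of hidden nodes.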
I've made an app that implements augmented reality based on POIs, and all the functionality works for one POI, but I would now like to support multiple points. Can anyone give me advice on how to do this? Can I create an array of POIs? I've posted my relevant code below, but I don't really know where to go from here.
private void setAugmentedRealityPoint() {
    homePoi = new AugmentedPOI(
        "Home",
        "Latitude, longitude",
        28.306802, -81.601358
    );
}
This is how it's currently set, and I then use it in other areas, as shown below:
public double calculateAngle() {
double dX = homePoi.getPoiLatitude() - myLatitude;
double dY = homePoi.getPoiLongitude() - myLongitude;
}
and here:
private boolean isWithinDistance(double myLatitude, double myLongitude) {
    Location my1 = new Location("One");
    my1.setLatitude(myLatitude);
    my1.setLongitude(myLongitude);
    Location target = new Location("Two");
    target.setLatitude(homePoi.getPoiLatitude());
    target.setLongitude(homePoi.getPoiLongitude());
    double range = my1.distanceTo(target); // distance in metres
    double zone = 20;
    return range < zone;
}
Any help would be appreciated.
Using a List would be a smart idea. You could add all entries into it in code, or you could pull them in from a JSON file. When you're rendering them, you could check if they are in range.
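A minimal sketch of that idea in plain Java, assuming a hypothetical Poi record in place of the app's AugmentedPOI class (records need Java 16 or newer). To keep it runnable off-device it uses the haversine formula instead of Android's Location.distanceTo(); in the app you would keep calling distanceTo() inside the loop. The 20 m zone comes from the question.

```java
import java.util.ArrayList;
import java.util.List;

class PoiListSketch {
    // Hypothetical stand-in for AugmentedPOI.
    record Poi(String name, double latitude, double longitude) {}

    static final double EARTH_RADIUS_M = 6_371_000; // mean Earth radius in metres

    // Great-circle distance in metres (haversine formula).
    static double distanceMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    // Returns every POI closer than zoneMeters to the user's position.
    static List<Poi> poisInRange(List<Poi> pois, double myLat, double myLon, double zoneMeters) {
        List<Poi> result = new ArrayList<>();
        for (Poi poi : pois) {
            if (distanceMeters(myLat, myLon, poi.latitude(), poi.longitude()) < zoneMeters) {
                result.add(poi);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Poi> pois = List.of(
            new Poi("Home", 28.306802, -81.601358),
            new Poi("Far away", 28.400000, -81.500000)); // hypothetical second point
        System.out.println(poisInRange(pois, 28.306850, -81.601358, 20));
    }
}
```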
If you have a lot of these POIs, you should divide them into smaller and smaller regions, and only load what you need. For example, structure them like this:
- CountryA
  + County 1
    * POI
    * POI
- CountryB
  + County 1
    * POI
    * POI
  + County 2
    * POI
Get the country and county of the user, and only load what you really need. I assume this is a multiplayer game, so I'll share some of my code.
On the server side, I have 3 objects: Country, County and POI.
First I discover all countries on the disk and create an object for each one. Inside my Country object I have a list of all its counties, and inside my County object I have a list of POIs. When a player joins, they send a packet with their country and county, so I can select the appropriate POIs for them. Storing POIs in smaller regions is essential; otherwise your server will struggle if it goes through all of the POIs for every player.
Here is my method for discovering data: Server.java#L311-L385
Code for selecting POIs for a player: Server.java#L139-L181
And how you can render it: PlayScreen.java#L209-L268
You need to port it to your own app, and I'm probably horrible at explaining, but I hope you got something out of it.
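A rough sketch of the Country → County → POI lookup described above, using nested maps instead of the answer's Country and County classes (which aren't shown here). All names are hypothetical; the point is that serving one player only touches the list for their own county rather than every POI.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RegionIndexSketch {
    record Poi(String name, double latitude, double longitude) {}

    // country -> county -> POIs in that county
    private final Map<String, Map<String, List<Poi>>> index = new HashMap<>();

    void add(String country, String county, Poi poi) {
        index.computeIfAbsent(country, c -> new HashMap<>())
             .computeIfAbsent(county, c -> new ArrayList<>())
             .add(poi);
    }

    // Only the player's own county is looked up; other regions are never scanned.
    List<Poi> poisFor(String country, String county) {
        List<Poi> pois = index.getOrDefault(country, Map.of()).getOrDefault(county, List.of());
        return Collections.unmodifiableList(pois);
    }

    public static void main(String[] args) {
        RegionIndexSketch idx = new RegionIndexSketch();
        idx.add("CountryA", "County 1", new Poi("Fountain", 10.0, 20.0));
        idx.add("CountryB", "County 2", new Poi("Bridge", 30.0, 40.0));
        System.out.println(idx.poisFor("CountryA", "County 1"));
    }
}
```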
I'm working on a 2D game for Android, so performance is a real concern. In this game many collisions can occur between any objects, and I don't want to brute-force check every game object against every other one in O(n^2). To reduce the number of collision checks, I decided to use spatial hashing as the broad-phase algorithm, because it seems quite simple and efficient: divide the scene into rows and columns and check collisions only between objects residing in the same grid cell.
Here's the basic concept I quickly sketched:
public class SpatialHashGridElement
{
HashSet<GameObject> gameObjects = new HashSet<GameObject>();
}
static final int SPATIAL_HASH_GRID_ROWS = 4;
static final int SPATIAL_HASH_GRID_COLUMNS = 5;
static SpatialHashGridElement[] spatialHashGrid = new SpatialHashGridElement[SPATIAL_HASH_GRID_ROWS * SPATIAL_HASH_GRID_COLUMNS];
static
{
    // A new object array is full of nulls, so every cell has to be created once.
    for(int i = 0; i < spatialHashGrid.length; i++)
        spatialHashGrid[i] = new SpatialHashGridElement();
}
void updateGrid()
{
    float spatialHashGridElementWidth = screenWidth / SPATIAL_HASH_GRID_COLUMNS;
    float spatialHashGridElementHeight = screenHeight / SPATIAL_HASH_GRID_ROWS;
    for(SpatialHashGridElement e : spatialHashGrid)
        e.gameObjects.clear();
    for(GameObject go : displayList)
    {
        // vertices are stored as consecutive (x, y, z) triples
        for(int i = 0; i < go.vertices.length/3; i++)
        {
            int row = (int) Math.abs(((go.vertices[i*3 + 1] / spatialHashGridElementHeight) % SPATIAL_HASH_GRID_ROWS));
            int col = (int) Math.abs(((go.vertices[i*3 + 0] / spatialHashGridElementWidth) % SPATIAL_HASH_GRID_COLUMNS));
            // HashSet.add() already ignores duplicates, so no contains() check is needed.
            spatialHashGrid[row * SPATIAL_HASH_GRID_COLUMNS + col].gameObjects.add(go);
        }
    }
}
The code probably isn't of the highest quality, so if you spot anything to improve, please tell me. The most worrying problem right now, though, is that the same collision pair may be checked in more than one grid cell. Worst-case example (assuming no object spans more than 2 cells in either direction):
Here we have two game objects (red and blue) colliding. Each of them resides in 4 cells, so the same pair will be checked in each of those cells.
I can't come up with an efficient way to rule out duplicate pairs without filtering the grid after building it in updateGrid(). Is there a clever way to detect that a collision pair has already been inserted while updateGrid() is still running? I'd be very grateful for any tips!
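One common way to avoid testing a pair twice is to give every object a unique integer id and record each candidate pair under a key built from the ordered pair (smaller id, larger id); a pair that shows up again in another cell is then skipped. The sketch assumes such an id exists (the question's GameObject doesn't show one) and represents each grid cell simply as a list of ids.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BroadPhasePairsSketch {
    // Packs an unordered pair of non-negative ids into one long: (min << 32) | max.
    static long pairKey(int a, int b) {
        int lo = Math.min(a, b), hi = Math.max(a, b);
        return ((long) lo << 32) | (hi & 0xFFFFFFFFL);
    }

    /** Returns each candidate pair exactly once, however many cells the two objects share. */
    static List<int[]> uniquePairs(List<List<Integer>> cells) {
        Set<Long> seen = new HashSet<>();
        List<int[]> pairs = new ArrayList<>();
        for (List<Integer> cell : cells) {
            for (int i = 0; i < cell.size(); i++) {
                for (int j = i + 1; j < cell.size(); j++) {
                    int a = cell.get(i), b = cell.get(j);
                    // Set.add() returns false when the key is already present.
                    if (seen.add(pairKey(a, b))) {
                        pairs.add(new int[] {Math.min(a, b), Math.max(a, b)});
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Objects 0 and 1 share four cells, as in the worst case above; object 2 sits in one of them.
        List<List<Integer>> cells = List.of(
            List.of(0, 1), List.of(0, 1), List.of(0, 1, 2), List.of(0, 1));
        for (int[] p : uniquePairs(cells)) System.out.println(p[0] + "-" + p[1]);
    }
}
```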
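I'll try to explain my idea with some Java code (the "// ..." comments stand for the rest of your existing GameObject class):
import java.util.HashSet;
import java.util.Set;
public class GameObject {
    // ...
    Set<GameObject> collidedSinceLastTick = new HashSet<GameObject>();
    public boolean collidesWith(GameObject other) {
        if (collidedSinceLastTick.contains(other)) {
            return true; // or even false, see below
        }
        boolean collided = false;
        // TODO: your costly logic here
        if (collided) {
            collidedSinceLastTick.add(other);
            other.collidedSinceLastTick.add(this); // record the pair on both objects, so other.collidesWith(this) is cached too
            // maybe return false if other actions depend on a GameObject just colliding once per tick
        }
        return collided;
    }
    // Hypothetical helper: call once at the start of every tick so the cache expires.
    public void resetCollisions() {
        collidedSinceLastTick.clear();
    }
    // ...
}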
HashSet (for example its initial capacity and load factor) and hashCode() can both be tuned in some cases. You could perhaps even remove displayList and keep everything in spatialHashGrid to reduce the memory footprint a little. Of course, only do that if you don't need special access through displayList: in an XML Document Object Model (DOM), for example, objects can be reached by a path through the tree, and "hot spots" can be reached by an ID that has to be assigned explicitly. For serializing (saving the game state or whatever), iterating through spatialHashGrid should not be a performance issue. It's a bit slower than serializing the set of game objects because you may have to suppress duplicates, although with the default settings Java serialization doesn't write the same object twice anyway: after its first occurrence it only writes a reference to it.
I have created a gameboard (5x5) and I now want to decide when a move is legal as fast as possible. For example a piece at (0,0) wants to go to (1,1), is that legal? First I tried to find this out with computations but that seemed bothersome. I would like to hard-code the possible moves based on a position on the board and then iterate through all the possible moves to see if they match the destinations of the piece. I have problems getting this on paper. This is what I would like:
//game piece is at 0,0 now, decide if 1,1 is legal
Point destination = new Point(1,1);
destination.findIn(legalMoves[0][0]);
The first problem I face is that I don't know how to put a list of possible moves in an array at, for example, index [0][0]. This must be fairly obvious, but I have been stuck on it for some time. I would like to create an array in which each entry holds a list of Point objects. So in semi-code: legalMoves[0][0] = {Point(1,1),Point(0,1),Point(1,0)}
I am not sure if this is efficient, but it makes more sense logically than something like [[1,1],[0,1],[1,0]], though I am not sold on it.
The second problem is that, instead of creating the legalMoves object (an instance variable) at the start of every game, I would rather read it from disk. I think that should be quicker? Is implementing Serializable the way to go?
My third, smaller problem is that the legal moves are unbalanced across the 25 positions: some have 8 possible legal moves, others have 3. Maybe this is not a problem at all.
You are looking for a structure that will give you the candidate for a given point, i.e. Point -> List<Point>.
Typically, I would go for a Map<Point, List<Point>>.
You can initialise this structure statically at program start or lazily when needed. For instance, here I use two helper arrays that contain the possible offsets from a point; adding them to the point yields its neighbours.
// (-1 1) (0 1) (1 1)
// (-1 0) (----) (1 0)
// (-1 -1) (0 -1) (1 -1)
// from (1 0) anti-clockwise:
static int[] xOffset = {1,1,0,-1,-1,-1,0,1};
static int[] yOffset = {0,1,1,1,0,-1,-1,-1};
The following Map stores the actual neighbours of a Point, together with a function that computes, stores and returns them. You could instead initialise all neighbours in one pass, but given the small board, I don't think either choice matters performance-wise.
static Map<Point, List<Point>> neighbours = new HashMap<>();
static List<Point> getNeighbours(Point a) {
List<Point> nb = neighbours.get(a);
if (nb == null) {
nb = new ArrayList<>(xOffset.length); // size the list
for (int i=0; i < xOffset.length; i++) {
int x = a.getX() + xOffset[i];
int y = a.getY() + yOffset[i];
if (x>=0 && y>=0 && x < 5 && y < 5) {
nb.add(new Point(x, y));
}
}
neighbours.put(a, nb);
}
return nb;
}
Now checking a legal move is a matter of finding the point in the neighbours:
static boolean isLegalMove(Point from, Point to) {
boolean legal = false;
for (Point p : getNeighbours(from)) {
if (p.equals(to)) {
legal = true;
break;
}
}
return legal;
}
Note: the class Point must define equals() and hashCode() for the map to behave as expected.
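If Point is your own class, a Java record (Java 16 or newer) is one compact way to get equals() and hashCode() generated from the coordinates. Note that a record's accessors are x() and y(), so getNeighbours() would use a.x() and a.y() instead of getX() and getY(). A minimal sketch:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PointRecordSketch {
    // equals() and hashCode() are generated from x and y, so two Points with the
    // same coordinates are interchangeable as HashMap keys.
    record Point(int x, int y) {}

    public static void main(String[] args) {
        Map<Point, List<Point>> neighbours = new HashMap<>();
        neighbours.put(new Point(0, 0), List.of(new Point(1, 0), new Point(1, 1), new Point(0, 1)));

        // A different Point instance with the same coordinates finds the same entry.
        System.out.println(neighbours.get(new Point(0, 0)).contains(new Point(1, 1))); // true
    }
}
```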
The first problem I face is that I don't know how to put a list of possible moves in an array at for example index [0][0]
Since the board is 2D, and the number of legal moves could generally be more than one, you would end up with a 3D data structure:
Point[][][] legalMoves = new Point[5][5][];
legalMoves[0][0] = new Point[] {new Point(1,1), new Point(0,1), new Point(1,0)};
instead of creating the object at every start of the game with an instance variable legalMoves, I would rather have it read from disk. I think that it should be quicker this way? Is the serializable class the way to go?
This cannot be answered without profiling. I cannot imagine that computing legal moves of any kind for a 5x5 board could be so intense computationally as to justify any kind of additional I/O operation.
for the 25 positions the legal moves are unbalanced. Some have 8 possible legal moves, others have 3. Maybe this is not a problem at all.
This can be handled nicely with a 3D "jagged array" described above, so it is not a problem at all.
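A sketch that fills the whole jagged array once, reusing the eight offsets from the first answer. It assumes a simple Point record (Java 16 or newer) and the "any adjacent square" rule implied by the question's (0,0) example, which may differ from your game's actual rules.

```java
import java.util.ArrayList;
import java.util.List;

class LegalMovesTable {
    record Point(int x, int y) {}

    static final int SIZE = 5;
    static final int[] X_OFFSET = {1, 1, 0, -1, -1, -1, 0, 1};
    static final int[] Y_OFFSET = {0, 1, 1, 1, 0, -1, -1, -1};

    // LEGAL_MOVES[x][y] holds every square reachable from (x, y); inner arrays differ in length.
    static final Point[][][] LEGAL_MOVES = build();

    static Point[][][] build() {
        Point[][][] table = new Point[SIZE][SIZE][];
        for (int x = 0; x < SIZE; x++) {
            for (int y = 0; y < SIZE; y++) {
                List<Point> moves = new ArrayList<>();
                for (int i = 0; i < X_OFFSET.length; i++) {
                    int nx = x + X_OFFSET[i], ny = y + Y_OFFSET[i];
                    if (nx >= 0 && ny >= 0 && nx < SIZE && ny < SIZE) moves.add(new Point(nx, ny));
                }
                table[x][y] = moves.toArray(new Point[0]);
            }
        }
        return table;
    }

    static boolean isLegalMove(Point from, Point to) {
        for (Point p : LEGAL_MOVES[from.x()][from.y()]) {
            if (p.equals(to)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(LEGAL_MOVES[0][0].length); // corner: 3
        System.out.println(LEGAL_MOVES[2][2].length); // centre: 8
        System.out.println(isLegalMove(new Point(0, 0), new Point(1, 1))); // true
    }
}
```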