I have implemented a general neural network framework from scratch and checked that it is ostensibly doing the right thing by solving the XOR problem (with one hidden layer and with several, to make sure there is no bug in my multi-hidden-layer implementation).
I've also implemented my own matrix math library from scratch (more as an exercise in matrix math, as a precursor to understanding the NN math, than to keep things lightweight).
Now I am trying to confirm that the framework works by solving MNIST with a vector-input network with layer sizes {784, 100, 50, 10}, a learning rate of 0.2, weights initialised to random floats in the range [-1, 1], and biases initialised to zero.
After training the network on all 60,000 training examples (running backprop for each one, with no mini-batching), I'm getting rubbish out. I've verified that I'm not putting rubbish in, although, since I've written the framework in Java, the training data needs quite a bit of preparation before it is usable.
I map each pixel value from [0, 255] into [0.01, 1] using the transformation x -> x * (0.99 / 255) + 0.01, so that no input is exactly zero (a zero input would make the adjustment of the weights attached to it zero).
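The scaling step above can be sketched as a small helper (PixelScaling and scale are hypothetical names; the question applies the same expression inline while reading each CSV row). It maps 0 to 0.01 and 255 to approximately 1.0, so a black pixel still contributes a non-zero input:

```java
public class PixelScaling {
    // Linearly maps a raw pixel value in [0, 255] onto [0.01, 1.0]:
    // 0 -> 0.01 and 255 -> 0.01 + 0.99 = 1.0 (up to float rounding).
    static float scale(int pixel) {
        return pixel * (0.99f / 255) + 0.01f;
    }

    public static void main(String[] args) {
        System.out.println(scale(0));   // prints 0.01
        System.out.println(scale(128)); // a mid-grey pixel, roughly 0.507
        System.out.println(scale(255)); // close to 1.0
    }
}
```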
That's the background. The code is below; hopefully it isn't too terribly written:
Network class
public class Network {
int inputNum, outputNum;
Layer[] layers;
float cost;
float learning_rate;
public Network(int input, int[] hidden, int output, float learning_rate, float weight_bound) {
this.inputNum = input;
this.outputNum = output;
this.layers = new Layer[hidden.length + 1];
this.learning_rate = learning_rate;
this.layers[0] = new Layer(input, hidden[0], weight_bound);
this.layers[this.layers.length - 1] = new Layer(hidden[hidden.length - 1], output, weight_bound);
for (int i = 1; i < hidden.length; i++) {
layers[i] = new Layer(hidden[i - 1], hidden[i], weight_bound);
}
}
public Matrix predict(Matrix input) throws DimensionException {
Matrix currentIn = input;
for (Layer l : this.layers) {
currentIn = l.forwardPropogate(currentIn);
}
return currentIn;
}
public void train(Matrix[] inputs, Matrix[] labels) throws DimensionException {
for (int i = 0; i < inputs.length; i++) {
Matrix currentIn = inputs[i];
for (Layer l : this.layers) {
currentIn = l.forwardPropogate(currentIn);
}
Matrix error = Matrix.sub(labels[i], currentIn);
this.cost = Matrix.apply(error, x -> 0.5f * x * x).sum();
for (int l = this.layers.length - 1; l >= 0; l--) {
error = this.layers[l].backwardPropogate(error, learning_rate);
}
}
}
public static float sigmoid(float x) {
return 1 / (1 + (float) Math.exp(-x));
}
public static float dsigmoid(float y) {
return y * (1 - y);
}
}
Layer class
import java.util.concurrent.ThreadLocalRandom;
public class Layer {
Matrix weights;
Matrix biases;
Matrix inValues;
Matrix outValues;
public Layer(int input_size, int output_size, float init_bound) {
this.weights = new Matrix(output_size, input_size);
this.weights.apply(x -> (float) ThreadLocalRandom.current().nextDouble(-init_bound, init_bound));
this.biases = new Matrix(output_size, 1);
}
public Matrix forwardPropogate(Matrix in) throws DimensionException {
this.inValues = in;
this.outValues = this.weights.multiply(this.inValues);
this.outValues.add(this.biases);
this.outValues.apply(x -> Network.sigmoid(x));
return outValues;
}
public Matrix backwardPropogate(Matrix out_error, float learning_rate) throws DimensionException {
Matrix errorByDSig = Matrix.apply(this.outValues, x -> Network.dsigmoid(x)).multiply(out_error);
Matrix weightAdjustment = errorByDSig.multiply(Matrix.transpose(this.inValues));
weightAdjustment.multiply(learning_rate);
Matrix biasAdjustment = errorByDSig.copy();
biasAdjustment.multiply(learning_rate);
Matrix newError = Matrix.transpose(this.weights).multiply(errorByDSig);
this.weights.add(weightAdjustment);
this.biases.add(biasAdjustment);
return newError;
}
}
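For reference, the per-example step that train and backwardPropogate appear to perform can be written as follows, for a layer with input x, weights W, biases b, output y = σ(Wx + b) and incoming error e (for the output layer e = t - y, since train computes Matrix.sub(labels[i], currentIn)). Here ⊙ is element-wise multiplication, which is what Matrix.multiply does when both operands have the same dimensions, and η is learning_rate:

$$\delta = e \odot y \odot (1 - y)$$
$$W \leftarrow W + \eta\,\delta\,x^{\top}, \qquad b \leftarrow b + \eta\,\delta$$
$$e_{\text{prev}} = W^{\top}\delta \quad \text{(with } W \text{ taken before the update)}$$

For the output layer, with cost $C = \tfrac{1}{2}\sum (t - y)^2$, the gradient is $\partial C / \partial W = -\delta\,x^{\top}$, so this is an ordinary gradient-descent step; the hidden layers follow the same pattern with e replaced by the back-propagated error.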
Matrix library
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;
class Matrix {
int rows, cols;
float[][] matrix;
public Matrix (int rows, int cols) {
this.rows = rows;
this.cols = cols;
this.matrix = new float[this.rows][this.cols];
}
public void multiply(float n) {
this.apply(x -> (float)x * n);
}
public Matrix multiply(Matrix m) throws DimensionException {
Matrix newM;
if (this.cols == m.cols && this.rows == m.rows) {
newM = this.copy();
for (int r = 0; r < m.rows; r++) {
for (int c = 0; c < m.cols; c++) {
newM.matrix[r][c] *= m.matrix[r][c];
}
}
return newM;
}
if (this.cols == m.rows) {
newM = new Matrix(this.rows, m.cols);
for (int r = 0; r < this.rows; r++) {
for (int c = 0; c < m.cols; c++) {
float sum = 0;
for (int s = 0; s < m.rows; s++) {
sum += this.matrix[r][s] * m.matrix[s][c];
}
newM.matrix[r][c] = sum;
}
}
return newM;
}
throw new DimensionException("Matrix dimensions are incorrect");
}
public void add(float n) {
this.apply(x -> x + n);
}
public void add(Matrix m) throws DimensionException{
if (this.cols != m.cols || this.rows != m.rows) {
throw new DimensionException("Matrix dimensions are incorrect");
}
for (int r = 0; r < this.rows; r++) {
for (int c = 0; c < this.cols; c++) {
this.matrix[r][c] += m.matrix[r][c];
}
}
}
public void sub(float n) {
this.add(-n);
}
public void sub(Matrix m) throws DimensionException {
this.add(Matrix.apply(m, x -> -x));
}
public void apply(Function<Float, Float> f) {
for (int r = 0; r < this.rows; r++) {
for (int c = 0; c < this.cols; c++) {
this.matrix[r][c] = f.apply(this.matrix[r][c]);
}
}
}
public Matrix copy() {
Matrix m = new Matrix(this.rows, this.cols);
for (int r = 0; r < this.rows; r++) {
for (int c = 0; c < this.cols; c++) {
m.matrix[r][c] = this.matrix[r][c];
}
}
return m;
}
public String toString() {
String s = "";
for (int r = 0; r < this.rows; r++) {
for (int c = 0; c < this.cols; c++) {
s += Float.toString(this.matrix[r][c]) + " ";
}
s += "\n";
}
return s;
}
public float sum() {
float total = 0;
for (int r = 0; r < this.rows; r++) {
for (int c = 0; c < this.cols; c++) {
total += this.matrix[r][c];
}
}
return total;
}
public static Matrix multiply(Matrix m, float n) {
Matrix newM = m.copy();
newM.multiply(n); // scale every element by n
return newM;
}
public static Matrix multiply(Matrix m, Matrix n) throws DimensionException {
Matrix newM = m.copy();
return newM.multiply(n);
}
public static Matrix add(Matrix m, float n) {
Matrix newM = m.copy();
newM.apply(x -> x + n);
return newM;
}
public static Matrix add(Matrix m, Matrix n) throws DimensionException {
Matrix newM = m.copy();
newM.add(n);
return newM;
}
public static Matrix sub(Matrix m, float n) {
Matrix newM = m.copy();
newM.sub(n);
return newM;
}
public static Matrix sub(Matrix m, Matrix n) throws DimensionException {
Matrix newM = m.copy();
newM.sub(n);
return newM;
}
public static Matrix apply(Matrix m, Function<Float, Float> f) {
Matrix newM = m.copy();
newM.apply(f);
return newM;
}
public static Matrix random(int rows, int cols, int bound) {
Matrix m = new Matrix(rows, cols);
m.apply(x -> x + ThreadLocalRandom.current().nextInt(bound));
return m;
}
public static Matrix fromArray(int rows, int cols, float[] array) {
Matrix newMatrix = new Matrix(rows, cols);
for (int r = 0; r < rows; r++) {
for (int c = 0; c < cols; c++) {
newMatrix.matrix[r][c] = array[r * cols + c];
}
}
return newMatrix;
}
public static Matrix transpose(Matrix m) {
Matrix newMatrix = new Matrix(m.cols, m.rows);
for (int r = 0; r < m.rows; r++) {
for (int c = 0; c < m.cols; c++) {
newMatrix.matrix[c][r] = m.matrix[r][c];
}
}
return newMatrix;
}
}
Training / getting an output
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
public class tests {
public static void main(String[] args) throws DimensionException, FileNotFoundException, IOException {
File training_files = new File("C:/Users/2001b/OneDrive/Desktop/data/data/training");
File label_file = new File("C:/Users/2001b/OneDrive/Desktop/data/data/labels.csv");
Matrix[] training_data = new Matrix[60000];
Matrix[] training_labels = new Matrix[60000];
int file_count = 0;
for (File f : training_files.listFiles()) {
float[] pixels = new float[784];
int p_count = 0;
BufferedReader br = new BufferedReader(new FileReader(f));
String line;
while ((line = br.readLine()) != null) {
double[] values = Arrays.stream(line.split(",")).mapToDouble(Double::parseDouble).toArray();
for (double v : values) {
pixels[p_count++] = ((float) v * (0.99f / 255) + 0.01f);
}
}
training_data[file_count++] = Matrix.fromArray(784, 1, pixels);
br.close();
}
BufferedReader br = new BufferedReader(new FileReader(label_file));
String line;
int count = 0;
while ((line = br.readLine()) != null) {
int value = Integer.valueOf(line);
Matrix answerMatrix = new Matrix(10, 1);
answerMatrix.matrix[value][0] = 1;
training_labels[count++] = answerMatrix;
}
br.close();
System.out.println(training_labels[5]);
Network network = new Network(784, new int[] {100, 50}, 10, 0.2f, 1);
network.train(training_data, training_labels);
System.out.println(network.predict(training_data[5]));
}
}
After training with one-hot labels (10-element vectors of zeros with a single 1 at the expected digit), I'm getting output vectors like this:
0.0629406
0.09087993
0.09197301
0.08965302
0.052927334
0.08770021
0.11267567
0.071576655
0.12798244
0.09146147
Any help would be much appreciated.
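To read a predicted digit off an output vector like the one above, the usual approach is to take the index of the largest activation (argmax). A minimal sketch over a plain float[] (Argmax and argmax are hypothetical names; with the question's Matrix class the values would come from matrix[r][0] of the 10x1 result of predict):

```java
public class Argmax {
    // Returns the index of the largest value; for a network trained on one-hot
    // labels this index is the predicted class (here, a digit 0-9).
    static int argmax(float[] outputs) {
        int best = 0;
        for (int i = 1; i < outputs.length; i++) {
            if (outputs[i] > outputs[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The output vector reported in the question.
        float[] out = {0.0629406f, 0.09087993f, 0.09197301f, 0.08965302f, 0.052927334f,
                       0.08770021f, 0.11267567f, 0.071576655f, 0.12798244f, 0.09146147f};
        System.out.println(argmax(out)); // prints 8
    }
}
```

Comparing the argmax with the label's index over many examples gives an accuracy figure, which is easier to judge than a single raw vector.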
Related
I'm trying to implement a neural network with:
5 input nodes (+1 bias)
1 hidden layer with 1 hidden node (+1 bias)
1 output unit.
The training data I'm using is the disjunction (logical OR) of the 5 inputs. The overall error oscillates instead of decreasing, and reaches very high values.
package neuralnetworks;
import java.io.File;
import java.io.FileNotFoundException;
import java.math.*;
import java.util.Random;
import java.util.Scanner;
public class NeuralNetworks {
private double[] weightslayer1;
private double[] weightslayer2;
private int[][] training;
public NeuralNetworks(int inputLayerSize, int weights1, int weights2) {
weightslayer1 = new double[weights1];
weightslayer2 = new double[weights2];
}
public static int[][] readCSV() {
Scanner readfile = null;
try {
readfile = new Scanner(new File("disjunction.csv"));
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Scanner delimit;
int[][] train = new int[32][6];
int lines = 0;
while (readfile.hasNext()) {
String line = readfile.nextLine();
delimit = new Scanner(line);
delimit.useDelimiter(",");
int features = 0;
while (delimit.hasNext() && lines > 0) {
train[lines - 1][features] = Integer.parseInt(delimit.next());
features++;
}
lines++;
}
return train;
}
public double linearcomb(double[] input, double[] weights) { //calculates the sum of the multiplication of weights and inputs
double sigma = 0;
for (int i = 0; i < input.length; i++) {
sigma += (input[i] * weights[i]);
}
return sigma;
}
public double hiddenLayerOutput(int[] inputs) { //calculates the output of the hiddenlayer
double[] formattedInput = new double[6]; //adds the bias unit
formattedInput[0] = 1;
for (int i = 1; i < formattedInput.length; i++)
formattedInput[i] = inputs[i - 1];
double hlOutput = linearcomb(formattedInput, weightslayer1);
return hlOutput;
}
public double feedForward(int[] inputs) { //calculates the output
double hlOutput = hiddenLayerOutput(inputs);
double[] olInput = new double[2];
olInput[0] = 1;
olInput[1] = hlOutput;
double output = linearcomb(olInput, weightslayer2);
return output;
}
public void backprop(double predoutput, double targetout, double hidout, double learningrate, int[] input) {
double outputdelta = predoutput * (1 - predoutput) * (targetout - predoutput);
double hiddendelta = hidout * (1 - hidout) * (outputdelta * weightslayer2[1]);
updateweights(learningrate, outputdelta, hiddendelta, input);
}
public void updateweights(double learningrate, double outputdelta, double hiddendelta, int[] input) {
for (int i = 0; i < weightslayer1.length; i++) {
double deltaw1 = learningrate * hiddendelta * input[i];
weightslayer1[i] += deltaw1;
}
for (int i = 0; i < weightslayer2.length; i++) {
double deltaw2 = learningrate * outputdelta * hiddenLayerOutput(input);
weightslayer2[i] += deltaw2;
}
}
public double test(int[] inputs) {
return feedForward(inputs);
}
public void train() {
double learningrate = 0.01;
double output;
double hiddenoutput;
double error = 100;
do {
error = 0;
for (int i = 0; i < training.length; i++) {
output = feedForward(training[i]);
error += (training[i][5] - output) * (training[i][5] - output) / 2;
hiddenoutput = hiddenLayerOutput(training[i]);
backprop(output, training[i][5], hiddenoutput, learningrate, training[i]);
}
//System.out.println(error);
}while(error>1);
}
public static void main(String[] args) {
NeuralNetworks nn = new NeuralNetworks(6, 6, 2);
Random rand = new Random();
nn.weightslayer2[0] = (rand.nextDouble() - 0.5);
nn.weightslayer2[1] = (rand.nextDouble() - 0.5);
for (int i = 0; i < nn.weightslayer1.length; i++)
nn.weightslayer1[i] = (rand.nextDouble() - 0.5);
nn.training = readCSV();
/*for (int i = 0; i < nn.training.length; i++) {
for (int j = 0; j < nn.training[i].length; j++)
System.out.print(nn.training[i][j] + ",");
System.out.println();
}*/
nn.train();
int[] testa = { 0, 0, 0, 0, 0 };
System.out.println(nn.test(testa));
}
}
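For comparison, a minimal sketch of the same 5-1-1 architecture. It assumes sigmoid activations on both the hidden and the output unit: the backprop method above uses sigmoid derivatives (y * (1 - y)), while hiddenLayerOutput and feedForward return raw weighted sums, so the sketch applies the sigmoid in the forward pass to keep the two consistent. It also pairs the bias weight at index 0 with a constant input of 1. The class name, seed, learning rate and epoch limit are illustrative choices, not values from the question.

```java
import java.util.Random;

// Minimal 5-1-1 network with sigmoid hidden and output units, trained on the
// 5-input OR truth table with per-pattern gradient descent on 0.5 * (t - o)^2.
public class DisjunctionNet {
    final double[] w1 = new double[6]; // w1[0]: hidden bias, w1[1..5]: input-to-hidden weights
    final double[] w2 = new double[2]; // w2[0]: output bias, w2[1]: hidden-to-output weight

    DisjunctionNet(long seed) {
        Random rand = new Random(seed);
        for (int i = 0; i < w1.length; i++) w1[i] = rand.nextDouble() - 0.5;
        for (int i = 0; i < w2.length; i++) w2[i] = rand.nextDouble() - 0.5;
    }

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    double hidden(int[] x) {
        double z = w1[0]; // bias input is 1
        for (int i = 0; i < 5; i++) z += w1[i + 1] * x[i];
        return sigmoid(z);
    }

    double predict(int[] x) {
        return sigmoid(w2[0] + w2[1] * hidden(x));
    }

    // One backprop step on a single pattern; returns the error before the update.
    double step(int[] x, double t, double lr) {
        double h = hidden(x);
        double o = sigmoid(w2[0] + w2[1] * h);
        double dOut = o * (1 - o) * (t - o);      // output delta
        double dHid = h * (1 - h) * dOut * w2[1]; // hidden delta, using w2[1] before it changes
        w2[0] += lr * dOut;                       // output bias (input 1)
        w2[1] += lr * dOut * h;
        w1[0] += lr * dHid;                       // hidden bias (input 1)
        for (int i = 0; i < 5; i++) w1[i + 1] += lr * dHid * x[i];
        return 0.5 * (t - o) * (t - o);
    }

    // Trains over all 32 patterns per epoch until the summed error drops to targetError.
    double train(int maxEpochs, double lr, double targetError) {
        double error = Double.MAX_VALUE;
        for (int epoch = 0; epoch < maxEpochs && error > targetError; epoch++) {
            error = 0;
            for (int p = 0; p < 32; p++) {
                int[] x = new int[5];
                for (int b = 0; b < 5; b++) x[b] = (p >> b) & 1; // bits of p form one input pattern
                error += step(x, p == 0 ? 0 : 1, lr);            // OR is 0 only for all zeros
            }
        }
        return error;
    }

    public static void main(String[] args) {
        DisjunctionNet net = new DisjunctionNet(42);
        System.out.println("final error: " + net.train(20000, 0.5, 0.01));
        System.out.println("OR(0,0,0,0,0) ~ " + net.predict(new int[] {0, 0, 0, 0, 0}));
        System.out.println("OR(1,0,0,0,0) ~ " + net.predict(new int[] {1, 0, 0, 0, 0}));
    }
}
```

Running main prints the summed error and the predictions for two patterns; the exact numbers depend on the seed.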
I'm working on a brute-force approach to the traveling salesman problem. One particular line throws an ArrayIndexOutOfBoundsException, even though all the arrays it uses seem to have more than enough space. The line is:
testCity[0][a] = cities[0][(int) cityList[a]];
This is where I initialize testCity:
int[][] testCity = new int[2][CITIES+10];
cities:
public static int[][] cities = new int[2][CITIES+10];
And, finally, cityList:
Object[] cityList = new Integer[CITIES+10];
This is the entire error message:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 4
at BruteF.permute(BruteF.java:39)
at BruteF.permute(BruteF.java:30)
at BruteF.permute(BruteF.java:30)
at BruteF.permute(BruteF.java:30)
at BruteF.main(BruteF.java:11)
And here is the code:
public class BruteF {
public static final int CITIES = 5;
public static int[][] cities = new int[2][CITIES+10];
public static int[][] bestCity = new int[2][CITIES+10];
public static double bestDistance = 1000;
public static int[][] testCity = new int[2][CITIES+10];
public static Object[] cityList = new Integer[CITIES+10];
public static void main(String[] args)
{
permute(java.util.Arrays.asList(1,2,3,4), 0);
for (int i = 0;i < CITIES;i++)
{
System.out.println(bestCity[0][i] + "," + bestCity[1][i]);
}
}
static void permute(java.util.List<Integer> arr, int k){
cities[0][0] = 1;
cities[1][0] = 1;
cities[0][1] = 2;
cities[1][1] = 5;
cities[0][2] = 3;
cities[1][2] = 2;
cities[0][3] = 4;
cities[1][3] = 3;
int originalX = cities[0][0];
int originalY = cities[1][0];
for(int i = k; i < arr.size(); i++){
java.util.Collections.swap(arr, i, k);
permute(arr, k+1);
java.util.Collections.swap(arr, k, i);
}
if (k == arr.size() -1){
for (int i = 0;i < CITIES;i++)
{
cityList = arr.toArray();
for (int a = 0;a < CITIES;a++)
{
testCity[0][a] = cities[0][(int) cityList[a]];
}
if (distance(testCity,CITIES,originalX, originalY) < bestDistance)
{
bestCity = testCity;
bestDistance = distance(testCity,CITIES, originalX, originalY);
}
}
}
}
static double distance (int[][] cities, int CITIES, int originalX, int originalY)
{
int[][] taken = new int[2][CITIES+1];
int takenCounter = 0;
double distance = 0;
cities[0][CITIES] = cities[0][0];
cities[1][CITIES] = cities[1][0];
for (int i = 0;i <= CITIES;i++)
{
for (int z = 0;z <= CITIES;z++)
{
if (cities[0][i] == taken[0][z] && cities[1][i] == taken[1][z])
{
return CITIES*1000; //possible error here
}
else {
taken[0][takenCounter] = cities[0][i];
taken[1][takenCounter] = cities[1][i];
}
}
if (cities[0][0] != originalX && cities[1][0] != originalY)
{
return CITIES*1000; //POSSIBLE BUG HERE
}
distance = distance + Math.sqrt(Math.pow(cities[0][i+1]-cities[0][i],2) + Math.pow(cities[1][i+1]-cities[1][i],2));
}
return distance;
}
}
Why is this happening? What can I do to fix it?
You get ArrayIndexOutOfBoundsException: 4 because of how cityList is initialised.
cityList = arr.toArray(); does not fill the array you allocated with new Integer[CITIES+10]; it replaces it with a new array holding the elements of arr, i.e. some ordering of {1, 2, 3, 4}. That array has size 4, so its valid indices are 0 to 3.
Your loop
for (int a = 0;a < CITIES;a++)
runs from a = 0 while a < CITIES (CITIES is 5), so as soon as a reaches 4, cityList[a] is out of bounds.
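A minimal sketch of the fix, assuming the only goal is to stop indexing past the end of cityList: bound the loop by the array's own length rather than by CITIES. The class and method names (PermutationLookup, xsInOrder) and the coordinates are hypothetical, and the distance and best-tour logic from the question are left out.

```java
import java.util.Arrays;
import java.util.List;

public class PermutationLookup {
    // Returns the x-coordinates of the cities in the order given by permutation.
    // The loop is bounded by cityList.length, the size of the array toArray()
    // actually returned, so cityList[a] can never go out of range.
    static int[] xsInOrder(int[] cityXs, List<Integer> permutation) {
        Object[] cityList = permutation.toArray(); // new array, length == permutation.size()
        int[] xs = new int[cityList.length];
        for (int a = 0; a < cityList.length; a++) {
            xs[a] = cityXs[(Integer) cityList[a]];
        }
        return xs;
    }

    public static void main(String[] args) {
        int[] cityXs = {1, 2, 3, 4, 5}; // hypothetical x-coordinates, indexed by city number
        int[] xs = xsInOrder(cityXs, Arrays.asList(4, 3, 2, 1));
        System.out.println(Arrays.toString(xs)); // prints [5, 4, 3, 2]
    }
}
```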
I am working on a Matrix program in Eclipse and I am stuck on two methods that I thought would be the simplest of all. The first method takes the sum of two different 2D arrays from the @Test cases and returns the sum in a new 2D array. I already have a 2D array instance variable. I am stuck because of the method's parameter: it doesn't give me anything other than the type (Matrix) and the variable name (other). So I was wondering how to get this method started and, most importantly, how to return the sum array.
The other method I am stuck on is transpose, which must flip the rows and columns of the given 2D array. I know I must create a temporary 2D array in order to store the content back into the original 2D array, but for some reason it is not passing the test cases. If someone could please help me with these two methods, it would be much appreciated.
import java.util.Arrays;
public class Matrix {
private int[][] array;
private int[][] array2;
private int theRow;
private int theCol;
public Matrix(int[][] arrayOfArrays) {
// TODO Auto-generated constructor stub
array = new int[arrayOfArrays.length][arrayOfArrays[0].length];
for (int r = 0; r < arrayOfArrays.length; r++) {
for (int c = 0; c < arrayOfArrays[r].length; c++) {
array[r][c] = arrayOfArrays[r][c];
}
}
}
public int get(int row, int column) {
return array[row][column];
}
public int getNumberOfRows() {
int nRows = array.length;
return nRows;
}
public int getNumberOfColumns() {
int nCols = array[0].length;
return nCols;
}
public String toString() {
String res = "";
for (int r = 0; r < array.length; r++) {
for (int c = 0; c < array[r].length; c++)
res = res + array[r][c];
}
return res;
}
public Matrix sum(Matrix other) {
return sum;
}
public void scalarMultiply(int scalar) {
for (int r = 0; r < array.length; r++) {
for (int c = 0; c < array[0].length; c++) {
array[r][c] = array[r][c] * scalar;
}
}
}
public void transpose() {
int m = array.length;
int n = array[0].length;
int[][] transpose = new int [n][m];
int temp;
for (int r = 0; r < m; r++) {
for (int c = 0; c < n; c++) {
transpose[c][r] = array[r][c];
array[r][c] = array[c][r];
array[c][r] = transpose[c][r];
}
}
}
//The test cases for sum method and transpose method
@Test
public void testSum() {
int[][] a1 = { { 1, 2, 3 },
{ 5, 6, 7 } };
Matrix a = new Matrix(a1);
int[][] a2 = { { -2, -2, -2 },
{ 4, 4, 4 } };
Matrix b = new Matrix(a2);
Matrix c = a.sum(b);
assertEquals(-1, c.get(0, 0));
assertEquals(0, c.get(0, 1));
assertEquals(1, c.get(0, 2));
assertEquals(9, c.get(1, 0));
assertEquals(10, c.get(1, 1));
assertEquals(11, c.get(1, 2));
}
@Test
public void testTranspose() {
int[][] a1 = { { 1, 3, 5 },
{ 2, 4, 6 } };
Matrix a = new Matrix(a1);
a.transpose();
assertEquals(1, a.get(0, 0));
assertEquals(2, a.get(0, 1));
assertEquals(3, a.get(1, 0));
assertEquals(4, a.get(1, 1));
assertEquals(5, a.get(2, 0));
assertEquals(6, a.get(2, 1));
}
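You need to change the dimensions, for example from 2x3 to 3x2, so swapping elements in place cannot work for a non-square array. Build the transposed array with the swapped dimensions and then assign it to the field: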
import java.util.Arrays;
public class Matrix {
private int[][] array;
private int[][] array2;// remove this
private int theRow;// remove this
private int theCol;// remove this
public void transpose() {
int m = array.length;
int n = array[0].length;
int[][] transpose = new int [n][m];
for (int r = 0; r < m; r++) {
for (int c = 0; c < n; c++) {
transpose[c][r] = array[r][c];
}
}
array = transpose;
}
}
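The answer above covers transpose only. For sum, which the question also asks about, a minimal sketch: because private fields are accessible from any instance of the same class, other.array can be read directly (other.get(r, c) would work equally well), and the result is wrapped in a new Matrix so neither operand changes. The class below is a trimmed, hypothetical stand-in for the question's Matrix, keeping only the array field, the constructor, get and the row/column counts.

```java
public class Matrix {
    private int[][] array;

    public Matrix(int[][] arrayOfArrays) {
        array = new int[arrayOfArrays.length][];
        for (int r = 0; r < arrayOfArrays.length; r++) {
            array[r] = arrayOfArrays[r].clone(); // copy each row so callers can't alias it
        }
    }

    public int get(int row, int column) {
        return array[row][column];
    }

    public int getNumberOfRows() {
        return array.length;
    }

    public int getNumberOfColumns() {
        return array[0].length;
    }

    // Returns a new Matrix whose entries are this[r][c] + other[r][c];
    // neither operand is modified.
    public Matrix sum(Matrix other) {
        if (getNumberOfRows() != other.getNumberOfRows()
                || getNumberOfColumns() != other.getNumberOfColumns()) {
            throw new IllegalArgumentException("Matrix dimensions must match");
        }
        int[][] result = new int[getNumberOfRows()][getNumberOfColumns()];
        for (int r = 0; r < result.length; r++) {
            for (int c = 0; c < result[r].length; c++) {
                // other.array is visible here: private access is per class, not per object
                result[r][c] = array[r][c] + other.array[r][c];
            }
        }
        return new Matrix(result);
    }

    public static void main(String[] args) {
        Matrix a = new Matrix(new int[][] {{1, 2, 3}, {5, 6, 7}});
        Matrix b = new Matrix(new int[][] {{-2, -2, -2}, {4, 4, 4}});
        Matrix c = a.sum(b);
        System.out.println(c.get(0, 0) + " " + c.get(1, 2)); // prints -1 11
    }
}
```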
I'm trying to implement a feed-forward neural network in Java.
I've created three classes: NNeuron, NLayer and NNetwork. The "simple" calculations seem fine (I get correct sums, activations and outputs), but the training process doesn't give correct results. Can anyone please tell me what I'm doing wrong?
The whole code for the NNetwork class is quite long, so I'm posting the part that is causing the problem:
[EDIT]: this is actually pretty much all of the NNetwork class
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class NNetwork
{
public static final double defaultLearningRate = 0.4;
public static final double defaultMomentum = 0.8;
private NLayer inputLayer;
private ArrayList<NLayer> hiddenLayers;
private NLayer outputLayer;
private ArrayList<NLayer> layers;
private double momentum = NNetwork.defaultMomentum; // alpha: momentum, default! 0.3
private ArrayList<Double> learningRates;
public NNetwork (int nInputs, int nOutputs, Integer... neuronsPerHiddenLayer)
{
this(nInputs, nOutputs, Arrays.asList(neuronsPerHiddenLayer));
}
public NNetwork (int nInputs, int nOutputs, List<Integer> neuronsPerHiddenLayer)
{
// the number of neurons on the last layer build so far (i.e. the number of inputs for each neuron of the next layer)
int prvOuts = 1;
this.layers = new ArrayList<>();
// input layer
this.inputLayer = new NLayer(nInputs, prvOuts, this);
this.inputLayer.setAllWeightsTo(1.0);
this.inputLayer.setAllBiasesTo(0.0);
this.inputLayer.useSigmaForOutput(false);
prvOuts = nInputs;
this.layers.add(this.inputLayer);
// hidden layers
this.hiddenLayers = new ArrayList<>();
for (int i=0 ; i<neuronsPerHiddenLayer.size() ; i++)
{
this.hiddenLayers.add(new NLayer(neuronsPerHiddenLayer.get(i), prvOuts, this));
prvOuts = neuronsPerHiddenLayer.get(i);
}
this.layers.addAll(this.hiddenLayers);
// output layer
this.outputLayer = new NLayer(nOutputs, prvOuts, this);
this.layers.add(this.outputLayer);
this.initCoeffs();
}
private void initCoeffs ()
{
this.learningRates = new ArrayList<>();
// learning rates of the hidden layers
for (int i=0 ; i<this.hiddenLayers.size(); i++)
this.learningRates.add(NNetwork.defaultLearningRate);
// learning rate of the output layer
this.learningRates.add(NNetwork.defaultLearningRate);
}
public double getLearningRate (int layerIndex)
{
if (layerIndex > 0 && layerIndex <= this.hiddenLayers.size()+1)
{
return this.learningRates.get(layerIndex-1);
}
else
{
return 0;
}
}
public ArrayList<Double> getLearningRates ()
{
return this.learningRates;
}
public void setLearningRate (int layerIndex, double newLearningRate)
{
if (layerIndex > 0 && layerIndex <= this.hiddenLayers.size()+1)
{
this.learningRates.set(
layerIndex-1,
newLearningRate);
}
}
public void setLearningRates (Double... newLearningRates)
{
this.setLearningRates(Arrays.asList(newLearningRates));
}
public void setLearningRates (List<Double> newLearningRates)
{
int len = (this.learningRates.size() <= newLearningRates.size())
? this.learningRates.size()
: newLearningRates.size();
for (int i=0; i<len; i++)
this.learningRates
.set(i,
newLearningRates.get(i));
}
public double getMomentum ()
{
return this.momentum;
}
public void setMomentum (double momentum)
{
this.momentum = momentum;
}
public NNeuron getNeuron (int layerIndex, int neuronIndex)
{
if (layerIndex == 0)
return this.inputLayer.getNeurons().get(neuronIndex);
else if (layerIndex == this.hiddenLayers.size()+1)
return this.outputLayer.getNeurons().get(neuronIndex);
else
return this.hiddenLayers.get(layerIndex-1).getNeurons().get(neuronIndex);
}
public ArrayList<Double> getOutput (ArrayList<Double> inputs)
{
ArrayList<Double> lastOuts = inputs; // the last computed outputs of the last 'called' layer so far
// input layer
//lastOuts = this.inputLayer.getOutput(lastOuts);
lastOuts = this.getInputLayerOutputs(lastOuts);
// hidden layers
for (NLayer layer : this.hiddenLayers)
lastOuts = layer.getOutput(lastOuts);
// output layer
lastOuts = this.outputLayer.getOutput(lastOuts);
return lastOuts;
}
public ArrayList<ArrayList<Double>> getAllOutputs (ArrayList<Double> inputs)
{
ArrayList<ArrayList<Double>> outs = new ArrayList<>();
// input layer
outs.add(this.getInputLayerOutputs(inputs));
// hidden layers
for (NLayer layer : this.hiddenLayers)
outs.add(layer.getOutput(outs.get(outs.size()-1)));
// output layer
outs.add(this.outputLayer.getOutput(outs.get(outs.size()-1)));
return outs;
}
public ArrayList<ArrayList<Double>> getAllSums (ArrayList<Double> inputs)
{
//*
ArrayList<ArrayList<Double>> sums = new ArrayList<>();
ArrayList<Double> lastOut;
// input layer
sums.add(inputs);
lastOut = this.getInputLayerOutputs(inputs);
// hidden nodes
for (NLayer layer : this.hiddenLayers)
{
sums.add(layer.getSums(lastOut));
lastOut = layer.getOutput(lastOut);
}
// output layer
sums.add(this.outputLayer.getSums(lastOut));
return sums;
}
public ArrayList<Double> getInputLayerOutputs (ArrayList<Double> inputs)
{
ArrayList<Double> outs = new ArrayList<>();
for (int i=0 ; i<this.inputLayer.getNeurons().size() ; i++)
outs.add(this
.inputLayer
.getNeuron(i)
.getOutput(inputs.get(i)));
return outs;
}
public void changeWeights (
ArrayList<ArrayList<Double>> deltaW,
ArrayList<ArrayList<Double>> inputSet,
ArrayList<ArrayList<Double>> targetSet,
boolean checkError)
{
for (int i=0 ; i<deltaW.size()-1 ; i++)
this.hiddenLayers.get(i).changeWeights(deltaW.get(i), inputSet, targetSet, checkError);
this.outputLayer.changeWeights(deltaW.get(deltaW.size()-1), inputSet, targetSet, checkError);
}
public int train2 (
ArrayList<ArrayList<Double>> inputSet,
ArrayList<ArrayList<Double>> targetSet,
double maxError,
int maxIterations)
{
ArrayList<Double>
input,
target;
ArrayList<ArrayList<ArrayList<Double>>> prvNetworkDeltaW = null;
double error;
int i = 0, j = 0, traininSetLength = inputSet.size();
do // during each iteration...
{
error = 0.0;
for (j = 0; j < traininSetLength; j++) // ... for each training element...
{
input = inputSet.get(j);
target = targetSet.get(j);
prvNetworkDeltaW = this.train2_bp(input, target, prvNetworkDeltaW); // ... do backpropagation, and return the new weight deltas
error += this.getInputMeanSquareError(input, target);
}
i++;
} while (error > maxError && i < maxIterations); // iterate as much as necessary/possible
return i;
}
public ArrayList<ArrayList<ArrayList<Double>>> train2_bp (
ArrayList<Double> input,
ArrayList<Double> target,
ArrayList<ArrayList<ArrayList<Double>>> prvNetworkDeltaW)
{
ArrayList<ArrayList<Double>> layerSums = this.getAllSums(input); // the sums for each layer
ArrayList<ArrayList<Double>> layerOutputs = this.getAllOutputs(input); // the outputs of each layer
// get the layer deltas (inc the input layer that is null)
ArrayList<ArrayList<Double>> layerDeltas = this.train2_getLayerDeltas(layerSums, layerOutputs, target);
// get the weight deltas
ArrayList<ArrayList<ArrayList<Double>>> networkDeltaW = this.train2_getWeightDeltas(layerOutputs, layerDeltas, prvNetworkDeltaW);
// change the weights
this.train2_updateWeights(networkDeltaW);
return networkDeltaW;
}
public void train2_updateWeights (ArrayList<ArrayList<ArrayList<Double>>> networkDeltaW)
{
for (int i=1; i<this.layers.size(); i++)
this.layers.get(i).train2_updateWeights(networkDeltaW.get(i));
}
public ArrayList<ArrayList<ArrayList<Double>>> train2_getWeightDeltas (
ArrayList<ArrayList<Double>> layerOutputs,
ArrayList<ArrayList<Double>> layerDeltas,
ArrayList<ArrayList<ArrayList<Double>>> prvNetworkDeltaW)
{
ArrayList<ArrayList<ArrayList<Double>>> networkDeltaW = new ArrayList<>(this.layers.size());
ArrayList<ArrayList<Double>> layerDeltaW;
ArrayList<Double> neuronDeltaW;
for (int i=0; i<this.layers.size(); i++)
networkDeltaW.add(new ArrayList<ArrayList<Double>>());
double
deltaW, x, learningRate, prvDeltaW, d;
int i, j, k;
for (i=this.layers.size()-1; i>0; i--) // for each layer
{
learningRate = this.getLearningRate(i);
layerDeltaW = new ArrayList<>();
networkDeltaW.set(i, layerDeltaW);
for (j=0; j<this.layers.get(i).getNeurons().size(); j++) // for each neuron of this layer
{
neuronDeltaW = new ArrayList<>();
layerDeltaW.add(neuronDeltaW);
for (k=0; k<this.layers.get(i-1).getNeurons().size(); k++) // for each weight (i.e. each neuron of the previous layer)
{
d = layerDeltas.get(i).get(j);
x = layerOutputs.get(i-1).get(k);
prvDeltaW = (prvNetworkDeltaW != null)
? prvNetworkDeltaW.get(i).get(j).get(k)
: 0.0;
deltaW = -learningRate * d * x + this.momentum * prvDeltaW;
neuronDeltaW.add(deltaW);
}
// the bias !!
d = layerDeltas.get(i).get(j);
x = 1;
prvDeltaW = (prvNetworkDeltaW != null)
? prvNetworkDeltaW.get(i).get(j).get(prvNetworkDeltaW.get(i).get(j).size()-1)
: 0.0;
deltaW = -learningRate * d * x + this.momentum * prvDeltaW;
neuronDeltaW.add(deltaW);
}
}
return networkDeltaW;
}
ArrayList<ArrayList<Double>> train2_getLayerDeltas (
ArrayList<ArrayList<Double>> layerSums,
ArrayList<ArrayList<Double>> layerOutputs,
ArrayList<Double> target)
{
// get output deltas
ArrayList<Double> outputDeltas = new ArrayList<>(); // the output layer deltas
double
oErr, // output error given a target
s, // sum
o, // output
d; // delta
int
nOutputs = target.size(), // #TODO ?== this.outputLayer.size()
nLayers = this.hiddenLayers.size()+2; // #TODO ?== layerOutputs.size()
for (int i=0; i<nOutputs; i++) // for each neuron...
{
s = layerSums.get(nLayers-1).get(i);
o = layerOutputs.get(nLayers-1).get(i);
oErr = (target.get(i) - o);
d = -oErr * this.getNeuron(nLayers-1, i).sigmaPrime(s); // #TODO "s" or "o" ??
outputDeltas.add(d);
}
// get hidden deltas
ArrayList<ArrayList<Double>> hiddenDeltas = new ArrayList<>();
for (int i=0; i<this.hiddenLayers.size(); i++)
hiddenDeltas.add(new ArrayList<Double>());
NLayer nextLayer = this.outputLayer;
ArrayList<Double> nextDeltas = outputDeltas;
int
h, k,
nHidden = this.hiddenLayers.size(),
nNeurons = this.hiddenLayers.get(nHidden-1).getNeurons().size();
double
wdSum = 0.0;
for (int i=nHidden-1; i>=0; i--) // for each hidden layer
{
hiddenDeltas.set(i, new ArrayList<Double>());
for (h=0; h<nNeurons; h++)
{
wdSum = 0.0;
for (k=0; k<nextLayer.getNeurons().size(); k++)
{
wdSum += nextLayer.getNeuron(k).getWeight(h) * nextDeltas.get(k);
}
s = layerSums.get(i+1).get(h);
d = this.getNeuron(i+1, h).sigmaPrime(s) * wdSum;
hiddenDeltas.get(i).add(d);
}
nextLayer = this.hiddenLayers.get(i);
nextDeltas = hiddenDeltas.get(i);
}
ArrayList<ArrayList<Double>> deltas = new ArrayList<>();
// input layer deltas: void
deltas.add(null);
// hidden layers deltas
deltas.addAll(hiddenDeltas);
// output layer deltas
deltas.add(outputDeltas);
return deltas;
}
public double getInputMeanSquareError (ArrayList<Double> input, ArrayList<Double> target)
{
double diff, mse=0.0;
ArrayList<Double> output = this.getOutput(input);
for (int i=0; i<target.size(); i++)
{
diff = target.get(i) - output.get(i);
mse += (diff * diff);
}
mse /= 2.0;
return mse;
}
}
Most method names (and their return types) are self-explanatory: this.getAllSums returns, for each layer, the weighted sums (sum(x_i*w_i)) of its neurons; this.getAllOutputs returns, for each layer, the outputs (sigmoid(sum)) of its neurons; and this.getNeuron(i, j) returns the j-th neuron of the i-th layer.
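Reading train2_getLayerDeltas and train2_getWeightDeltas, the rule the code appears to aim for is standard backpropagation with momentum. Here s is a neuron's weighted sum, o = σ(s) its output, t the target and L the output layer; in the second line h ranges over the neurons of hidden layer l and k over those of layer l+1; in the third line x_i is the i-th output of layer l-1 (x = 1 for the bias weight), η_l is the layer's learning rate from getLearningRate, α the momentum and n the training step:

$$d^{(L)}_j = -\left(t_j - o^{(L)}_j\right)\sigma'\!\left(s^{(L)}_j\right)$$
$$d^{(l)}_h = \sigma'\!\left(s^{(l)}_h\right)\sum_k w^{(l+1)}_{kh}\, d^{(l+1)}_k$$
$$\Delta w^{(l)}_{ji}(n) = -\eta_l\, d^{(l)}_j\, x^{(l-1)}_i + \alpha\,\Delta w^{(l)}_{ji}(n-1)$$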
Thank you in advance for your help :)
Here is a very simple Java implementation, with tests in the main method:
import java.util.Arrays;
import java.util.Random;
public class MLP {
public static class MLPLayer {
float[] output;
float[] input;
float[] weights;
float[] dweights;
boolean isSigmoid = true;
public MLPLayer(int inputSize, int outputSize, Random r) {
output = new float[outputSize];
input = new float[inputSize + 1];
weights = new float[(1 + inputSize) * outputSize];
dweights = new float[weights.length];
initWeights(r);
}
public void setIsSigmoid(boolean isSigmoid) {
this.isSigmoid = isSigmoid;
}
public void initWeights(Random r) {
for (int i = 0; i < weights.length; i++) {
weights[i] = (r.nextFloat() - 0.5f) * 4f;
}
}
public float[] run(float[] in) {
System.arraycopy(in, 0, input, 0, in.length);
input[input.length - 1] = 1;
int offs = 0;
Arrays.fill(output, 0);
for (int i = 0; i < output.length; i++) {
for (int j = 0; j < input.length; j++) {
output[i] += weights[offs + j] * input[j];
}
if (isSigmoid) {
output[i] = (float) (1 / (1 + Math.exp(-output[i])));
}
offs += input.length;
}
return Arrays.copyOf(output, output.length);
}
public float[] train(float[] error, float learningRate, float momentum) {
int offs = 0;
float[] nextError = new float[input.length];
for (int i = 0; i < output.length; i++) {
float d = error[i];
if (isSigmoid) {
d *= output[i] * (1 - output[i]);
}
for (int j = 0; j < input.length; j++) {
int idx = offs + j;
nextError[j] += weights[idx] * d;
float dw = input[j] * d * learningRate;
weights[idx] += dweights[idx] * momentum + dw;
dweights[idx] = dw;
}
offs += input.length;
}
return nextError;
}
}
MLPLayer[] layers;
public MLP(int inputSize, int[] layersSize) {
layers = new MLPLayer[layersSize.length];
Random r = new Random(1234);
for (int i = 0; i < layersSize.length; i++) {
int inSize = i == 0 ? inputSize : layersSize[i - 1];
layers[i] = new MLPLayer(inSize, layersSize[i], r);
}
}
public MLPLayer getLayer(int idx) {
return layers[idx];
}
public float[] run(float[] input) {
float[] actIn = input;
for (int i = 0; i < layers.length; i++) {
actIn = layers[i].run(actIn);
}
return actIn;
}
public void train(float[] input, float[] targetOutput, float learningRate, float momentum) {
float[] calcOut = run(input);
float[] error = new float[calcOut.length];
for (int i = 0; i < error.length; i++) {
error[i] = targetOutput[i] - calcOut[i]; // negative error
}
for (int i = layers.length - 1; i >= 0; i--) {
error = layers[i].train(error, learningRate, momentum);
}
}
public static void main(String[] args) throws Exception {
float[][] train = new float[][]{new float[]{0, 0}, new float[]{0, 1}, new float[]{1, 0}, new float[]{1, 1}};
float[][] res = new float[][]{new float[]{0}, new float[]{1}, new float[]{1}, new float[]{0}};
MLP mlp = new MLP(2, new int[]{2, 1});
mlp.getLayer(1).setIsSigmoid(false);
Random r = new Random();
int en = 500;
for (int e = 0; e < en; e++) {
for (int i = 0; i < res.length; i++) {
int idx = r.nextInt(res.length);
mlp.train(train[idx], res[idx], 0.3f, 0.6f);
}
if ((e + 1) % 100 == 0) {
System.out.println();
System.out.printf("%d epoch\n", e + 1); // print the epoch once, not once per sample
for (int i = 0; i < res.length; i++) {
float[] t = train[i];
System.out.printf("%.1f, %.1f --> %.3f\n", t[0], t[1], mlp.run(t)[0]);
}
}
}
}
}
I tried going over your code, but as you stated, it was pretty long.
Here's what I suggest:
To verify that your network is learning properly, try to train a simple network, like a network that recognizes the XOR operator. This shouldn't take all that long.
Use the simplest backpropagation algorithm. Stochastic backpropagation (where the weights are updated after each training input is presented) is the easiest. Implement the algorithm without the momentum term at first, and with a constant learning rate (i.e., don't start with adaptive learning rates). Once you're satisfied that the algorithm is working, you can introduce the momentum term. Doing too many things at the same time increases the chance that more than one thing goes wrong, which makes it harder to see where you went wrong.
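As a minimal sketch of that progression, assuming the per-weight gradients dE/dw are already available from backprop: the class SgdUpdate below is hypothetical and not taken from Backpropagator.java. With momentum set to 0 it is plain stochastic gradient descent with a constant learning rate; a non-zero momentum adds the classical term v = momentum * v - learningRate * grad, then w = w + v.

```java
// Hypothetical per-weight update: constant learning rate, optional classical momentum.
public class SgdUpdate {
    private final double[] velocity; // previous update applied to each weight

    public SgdUpdate(int numWeights) {
        velocity = new double[numWeights];
    }

    // grad[i] = dE/dw_i for the current single training example.
    // With momentum = 0 this reduces to plain stochastic gradient descent.
    public void apply(double[] weights, double[] grad, double learningRate, double momentum) {
        for (int i = 0; i < weights.length; i++) {
            velocity[i] = momentum * velocity[i] - learningRate * grad[i];
            weights[i] += velocity[i];
        }
    }

    public static void main(String[] args) {
        double[] w = {1.0};
        SgdUpdate opt = new SgdUpdate(1);
        opt.apply(w, new double[]{2.0}, 0.25, 0.0); // no momentum: 1.0 - 0.25 * 2.0
        System.out.println(w[0]); // 0.5
    }
}
```

The MLP code above stores only the latest learning-rate-scaled step in dweights rather than the full velocity, which is a slightly different momentum variant; either is fine once the momentum-free version is known to work.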
If you want to look at some code, you can check out code that I wrote; the file to look at is Backpropagator.java. I've basically implemented the stochastic backpropagation algorithm with a momentum term. I also have a video where I give a quick explanation of my implementation of the backpropagation algorithm.
Hopefully this is of some help!
I am having some problems getting inheritance to work. In the parent class, the array coefficients is private. I have some accessor methods, but I still can't get it to work.
import java.util.ArrayList;
public class Poly {
private float[] coefficients;
public static void main (String[] args){
float[] fa = {3, 2, 4};
Poly test = new Poly(fa);
}
public Poly() {
coefficients = new float[1];
coefficients[0] = 0;
}
public Poly(int degree) {
coefficients = new float[degree+1];
for (int i = 0; i <= degree; i++)
coefficients[i] = 0;
}
public Poly(float[] a) {
coefficients = new float[a.length];
for (int i = 0; i < a.length; i++)
coefficients[i] = a[i];
}
public int getDegree() {
return coefficients.length-1;
}
public float getCoefficient(int i) {
return coefficients[i];
}
public void setCoefficient(int i, float value) {
coefficients[i] = value;
}
public Poly add(Poly p) {
int n = getDegree();
int m = p.getDegree();
Poly result = new Poly(Poly.max(n, m));
int i;
for (i = 0; i <= Poly.min(n, m); i++)
result.setCoefficient(i, coefficients[i] + p.getCoefficient(i));
if (i <= n) {
//we have to copy the remaining coefficients from this object
for ( ; i <= n; i++)
result.setCoefficient(i, coefficients[i]);
} else {
// we have to copy the remaining coefficients from p
for ( ; i <= m; i++)
result.setCoefficient(i, p.getCoefficient(i));
}
return result;
}
public void displayPoly () {
for (int i=0; i < coefficients.length; i++)
System.out.print(" "+coefficients[i]);
System.out.println();
}
private static int max (int n, int m) {
if (n > m)
return n;
return m;
}
private static int min (int n, int m) {
if (n > m)
return m;
return n;
}
public Poly multiplyCon (double c){
int n = getDegree();
Poly results = new Poly(n);
for (int i =0; i <= n; i++){ // can work when multiplying only 1 coefficient
results.setCoefficient(i, (float)(coefficients[i] * c)); // errors ArrayIndexOutOfBounds for setCoefficient
}
return results;
}
public Poly multiplyPoly (Poly p){
int n = getDegree();
int m = p.getDegree();
Poly result = null;
for (int i = 0; i <= n; i++){
Poly tmpResult = p.multiByConstantWithDegree(coefficients[i], i); //Calls new method
if (result == null){
result = tmpResult;
} else {
result = result.add(tmpResult);
}
}
return result;
}
public void leadingZero() {
int degree = getDegree();
if ( degree == 0 ) return;
if ( coefficients[degree] != 0 ) return;
// find the highest degree with a non-zero coefficient (always keep degree 0)
int highestDegree = degree;
for ( int i = degree; i > 0; i--) { // was "i <= 0", so the loop never ran
if ( coefficients[i] == 0 ) {
highestDegree = i -1;
} else {
// if the value is non-zero
break;
}
}
float[] newCoefficients = new float[highestDegree + 1];
for ( int i=0; i<= highestDegree; i++ ) {
newCoefficients[i] = coefficients[i];
}
coefficients = newCoefficients;
}
public Poly differentiate(){
int n = getDegree();
if (n > 0) { // checking if it has a degree
Poly newResult = new Poly(n - 1); // the derivative has degree n-1
for (int i = 1; i <= n; i++) {
newResult.coefficients[i - 1] = coefficients[i] * i; // shift degree down by 1 and multiply
}
return newResult;
} else {
return new Poly(); // derivative of a constant is 0
}
}
public Poly multiByConstantWithDegree(double c, int degree){ //used specifically for multiply poly
int oldPolyDegree = this.getDegree();
int newPolyDegree = oldPolyDegree + degree;
Poly newResult = new Poly(newPolyDegree);
//set all coeff to zero
for (int i = 0; i<= newPolyDegree; i++){
newResult.coefficients[i] = 0;
}
//shift by n degree
for (int j = 0; j <= oldPolyDegree; j++){
newResult.coefficients[j+degree] = coefficients[j] * (float)c;
}
return newResult;
}
}
Can anyone help me fix my second class, which inherits from the one above? I can't seem to get the multiply and add methods in the second class to work properly.
public class QuadPoly extends Poly
{
private float [] quadcoefficients;
public QuadPoly() {
super(2);
}
public QuadPoly(int degree) {
super(2);
}
public QuadPoly(float [] f) {
super(f);
if (getDegree() > 2){
throw new IllegalArgumentException ("Must be Quadratic");
}
}
public QuadPoly(Poly p){
super(p.coefficients);
for (int i = 0; i < coefficients.length; i++){
if (coefficients[i] < 0){
throw new Exception("Expecting positive coefficients!");
}
}
}
// public QuadPoly(Poly p){
// super(p.coefficients);
//}
public QuadPoly addQuad (QuadPoly p){
return new QuadPoly(super.add(p));
}
public QuadPoly multiplyQuadPoly (QuadPoly f){
if (quadcoefficients.length > 2){
throw new IllegalArgumentException ("Must be Quadratic");
}
return new QuadPoly(super.multiplyPoly(f));
}
}
I would make the coefficients protected or use an accessor method.
I wouldn't throw a plain checked Exception. An IllegalArgumentException would be a better choice.
What is quadcoefficients? It doesn't appear to be set anywhere, so multiplyQuadPoly will throw a NullPointerException when it reads quadcoefficients.length.
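Putting those suggestions together, a minimal sketch could look like the following. To stay self-contained it reproduces only a stripped-down Poly; the QuadPolyDemo wrapper and the array cloning in the constructor and getter are my additions for illustration. QuadPoly reads its argument through getCoefficients() and rejects higher degrees with an IllegalArgumentException.

```java
import java.util.Arrays;

public class QuadPolyDemo {

    // Stripped-down Poly: private data plus a public accessor.
    public static class Poly {
        private final float[] coefficients;

        public Poly(float[] a) {
            coefficients = a.clone(); // defensive copy
        }

        public int getDegree() {
            return coefficients.length - 1;
        }

        // Accessor so subclasses (and other classes) can read the coefficients.
        public float[] getCoefficients() {
            return coefficients.clone();
        }

        public Poly add(Poly p) {
            int n = Math.max(getDegree(), p.getDegree());
            float[] sum = new float[n + 1];
            float[] other = p.getCoefficients();
            for (int i = 0; i < coefficients.length; i++) sum[i] += coefficients[i];
            for (int i = 0; i < other.length; i++) sum[i] += other[i];
            return new Poly(sum);
        }
    }

    public static class QuadPoly extends Poly {
        public QuadPoly(float[] f) {
            super(f);
            if (getDegree() > 2) {
                throw new IllegalArgumentException("Must be quadratic");
            }
        }

        // Converting constructor: uses the accessor instead of p.coefficients.
        public QuadPoly(Poly p) {
            this(p.getCoefficients());
        }

        public QuadPoly addQuad(QuadPoly p) {
            return new QuadPoly(add(p)); // sum of two quadratics is at most quadratic
        }
    }

    public static void main(String[] args) {
        QuadPoly a = new QuadPoly(new float[]{3, 2, 4});
        QuadPoly b = new QuadPoly(new float[]{1, 1});
        System.out.println(Arrays.toString(a.addQuad(b).getCoefficients())); // [4.0, 3.0, 4.0]
    }
}
```

multiplyQuadPoly has a further problem: with this implementation, multiplying two polynomials stored as degree 2 yields a degree-4 result, so the QuadPoly constructor will reject it.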
You made coefficients private. I wouldn't change that, but I would add a getter method to the Poly class:
public class Poly {
//somecode here
public float[] getCoefficients(){
return this.coefficients;
}
}
Then use the getter wherever you need the coefficients, for example in the QuadPoly constructor:
public QuadPoly(Poly p){
super(p.getCoefficients());
//some more code here
}
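Even if you make coefficients protected, you are trying to reach the coefficients field of another object, the parameter p, whose declared type is Poly. Outside Poly's package, a subclass can only access a protected field through a reference of its own type (QuadPoly or a subclass), so p.coefficients still would not compile there. This is a question of access rules rather than of inheritance itself, and the getter sidesteps it.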