Delta training rule for perceptron training - java

I'm trying to train a perceptron for the AND boolean function using the delta training rule. But even after convergence, it's wrongly classifying the inputs (1 input, actually). Could you please tell me where I am wrong: http://ideone.com/CDgTQE
This is the training function used:
public void trianWithDelta(Example[] examples){
    for (int i = 0; i < 1000; ++i) {
        dw1 = 0;
        dw2 = 0;
        for (Example ex : examples) {
            double o = computeOutput(ex);
            double t = ex.o;
            dw1 = dw1 + n * (t - o) * ex.x1;
            dw2 = dw2 + n * (t - o) * ex.x2;
        }
        w1 += dw1;
        w2 += dw2;
    }
}
The training examples (boolean AND):
Example[] examples = new Example[]{
    new Example(-1, -1, -1),
    new Example(-1,  1, -1),
    new Example( 1, -1, -1),
    new Example( 1,  1,  1)
};
Results:
w1 : 0.49999999999999994 w2 : 0.5000000000000002
Tests using the training examples after training:
-1
1 (incorrect)
-1
1

Your code is actually correct; the problem lies in your understanding of what can be learned with an unbiased perceptron and what cannot.
If you do not have a bias, then learning AND is nearly impossible, because:
there is exactly one direction separating your data, realized by the line y = -x; in your code this would mean w1 = w2, and even the slightest difference between their values (such as 1e-20) will break the classifier
your classifier actually answers three values (as you use the sign function): -1, 0 and 1, while it is impossible to separate AND without a bias in such a setting, as you need to answer -1 when the activation is 0.
Try to draw the correct separator on a piece of paper; you will notice that without a bias your line has to cross (0,0), thus it has to be y = -x, and consequently for (-1,1) and (1,-1) the activation is 0.
Both problems can be solved by just adding a bias node (and this is what you should do); a sketch of this is shown below.
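For reference, a minimal sketch of the same batch delta rule with a bias weight added. This assumes an extra field w0, updated like w1 and w2 but with a constant input of 1, and that computeOutput includes w0 in the weighted sum before taking the sign:

// Minimal sketch: batch delta rule with an extra bias weight w0
// whose "input" is always the constant 1. Assumes computeOutput(ex)
// returns sign(w0 + w1*ex.x1 + w2*ex.x2).
public void trainWithDeltaAndBias(Example[] examples) {
    for (int i = 0; i < 1000; ++i) {
        double dw0 = 0, dw1 = 0, dw2 = 0;
        for (Example ex : examples) {
            double o = computeOutput(ex);
            double t = ex.o;
            dw0 += n * (t - o);          // bias input is 1
            dw1 += n * (t - o) * ex.x1;
            dw2 += n * (t - o) * ex.x2;
        }
        w0 += dw0;
        w1 += dw1;
        w2 += dw2;
    }
}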
You can also change the definition of AND "a bit" - for example by encoding "False" as -2:
Example[] examples = new Example[]{
    new Example(-2, -2, -2),
    new Example(-2,  1, -2),
    new Example( 1, -2, -2),
    new Example( 1,  1,  1)
};
And when running your code it behaves as expected:
Trained weights : 0.6363636363636364 0.6363636363636364
-1
-1
-1
1

Related

How can I fix this dot product method without adjusting the number of neurons? (Java)

I'm testing my neural network for XOR comparisons, and I've encountered an error I'd like to fix without altering the number of neurons in the first hidden layer. The code causing the error is:
public double dotProduct(int[][] a, double[][] ds)
{
    int i;
    double sum = 0;
    for (i = 0; i < a.length; i++)
    {
        int j;
        for (j = 0; j < a[i].length; j++)
        {
            sum += a[i][j] * ds[i][j];
        }
    }
    return sum;
}
It is giving me a null pointer exception. The dot product calculation itself is used to generate the dot product from an input set my neural net has been provided with.
The input set is this:
int inputSets[][] =
{
    {0, 0, 1},
    {0, 1, 1},
    {1, 0, 1},
    {0, 1, 0},
    {1, 0, 0},
    {1, 1, 1},
    {0, 0, 0}
};
It's a multidimensional array containing 7 arrays. It is then used in this:
public double think(int[][] input)
{
    output_from_layer1 = sigmoid(dotProd.dotProduct(input, layer1.getWeights()));
    return output_from_layer1;
}
The sigmoid part of the function isn't an issue, as it takes a double, and
dotProduct is supposed to output a double. The issue, as far as I'm aware, is that the dotProduct function is taking a larger multidimensional array and then attempting to cross it with a smaller one (the layer1.getWeights() getter that returns the weights array for that layer).
The weights of a layer are defined as such:
layerWeights = new double[numNeurons][inpNum];
and the layer that's being used in the dot product is:
XORlayer layer1 = new XORlayer(4, 3);
So 4 neurons with 3 inputs each. The issue stems from the fact that there aren't enough neurons in this layer for the number of inputs, as far as I'm aware, which is generating the null pointer exception when there isn't anything further to multiply against the input values.
We have 12 inputs in the neurons, and 21 input values.
My main question is: is there a way to solve this issue so the dot product operation is completed successfully, without simply expanding the number of neurons the layer contains to 7?
This discussion might help. As suggested there, since you're using a 2D array, matrix multiplication (instead of dot product) would likely be more appropriate.
Of course, similar to the dot product, the dimensions must be aligned for matrix multiplication.
inputSets is a 7x3 matrix and layerWeights is a 4x3 matrix. The transpose of layerWeights is a 3x4 matrix. Now the dimensions are aligned, and the matrix multiplication results in a 7x4 matrix.
Based on the posted code, I would suggest something like this:
output_from_layer1 = sigmoid(matrixMult.multiply(input, transpose(layer1.getWeights())));
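A rough sketch of what those two helpers might look like (the class name MatrixMult and the method signatures are assumptions matching the suggested call, not code from the question):

// Sketch of the two helpers assumed by the suggested call above.
public class MatrixMult {

    // Transpose of an m x n matrix is an n x m matrix.
    public static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[i].length; j++)
                t[j][i] = m[i][j];
        return t;
    }

    // (7 x 3) inputs times (3 x 4) transposed weights gives a 7 x 4 result.
    public static double[][] multiply(int[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int k = 0; k < a[i].length; k++)
                for (int j = 0; j < b[0].length; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }
}

Note that sigmoid would then have to be applied element-wise to the resulting 7x4 matrix rather than to a single double.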

Finding higher values from arrays that are all closer than predefined distances

I have arrays a1 to an, each containing m elements. I have another symmetric n x n matrix b containing the allowed index distances between the arrays. I want to select one element from each array, x1 to xn, subject to the following constraint (a1 is an array and x1 a single value taken from a1):
For every xi (originally at index u of ai) and xj (originally at index v of aj), where i is not the same as j, we have |u - v| <= b_ij.
The total sum of x1 to xn is the maximum over all possible such selections.
An example:
a1 = [1, 2, 3, 8, -1, -1, 0, -1]
a2 = [1, 2, 4, 0, -1, 1, 10, 11]
b = |0, 2|
    |2, 0|
The selected values are x1 = 8 and x2 = 4. One can notice that we didn't select 10 or 11 from the second array, because the best value within range in the first array is just 0.
Now, when I have only two arrays, I can do the following in Java in O(n^2) time, I guess, and find the maximum sum, which is 12 in this case. How can I achieve a better solution for more than 2 arrays?
int[][] a = new int[][]{{1, 2, 3, 8, -1, -1, 1, -1}, {1, 2, 4, 0, -1, 1, 10, 11}};
int[][] b = new int[][]{{0, 2}, {2, 0}};
int maxVal = Integer.MIN_VALUE;
for (int i = 0; i < a[0].length; i++) {
    for (int j = Math.max(i - b[0][1], 0); j <= Math.min(a[1].length - 1, i + b[0][1]); j++) {
        maxVal = Math.max(maxVal, a[0][i] + a[1][j]);
    }
}
System.out.println("The max val: " + maxVal);
You can't use dynamic programming here, because there is no optimal substructure: the b_1n entry can ruin a highly valuable path from x_1 to x_{n-1}. So it's probably hard to avoid exponential time in general. However, for a set of b_ij that do reasonably restrict the choices, there is a straightforward backtracking approach that should have reasonable performance:
At each step, a value has been selected from some of the a_i, but no choice has yet been made from the others. (The arrays selected need not be a prefix of the list, or even contiguous.)
If a choice has been made for every array, return (from this recursive call) the score obtained.
Consider, for each pair of a chosen array and a remaining array, the interval of indices available for selection in the latter given the restriction on distance from the choice made in the former.
Intersect these intervals for each remaining array. If any intersection is empty, reject this proposed set of choices and backtrack.
Otherwise, select the remaining array with the smallest set of choices available. Add each choice in turn to the set of proposed choices and recurse. Return the best score found and the choice made to obtain it, if any, or reject and backtrack (see the sketch below).
The identification of the most-constrained array is critical to performance: it constitutes a form of fuzzy belief propagation, efficiently pruning future choices incompatible with present choices necessitated by prior choices. Depending on the sort of input you expect, there might be value in doing further prioritization/pruning based on achievable scores.
My 35-line Python implementation, given a 10x10 random matrix of small integers and b_ij a constant 2, ran in a few seconds. b_ij=3 (which allows up to 7 of the 10 values for each pair of arrays!) took about a minute.
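For concreteness, here is a rough Java sketch of the backtracking described above (class and variable names are mine, not from the Python implementation; it assumes all arrays have the same length m and that b[i][j] gives the allowed index distance):

import java.util.Arrays;

public class MaxConstrainedSum {
    static int[][] a;      // the n arrays, each of length m
    static int[][] b;      // symmetric index-distance limits
    static int n, m;
    static int[] chosen;   // chosen index per array, or -1 if not chosen yet
    static long best;

    public static long solve(int[][] arrays, int[][] limits) {
        a = arrays; b = limits; n = a.length; m = a[0].length;
        chosen = new int[n];
        Arrays.fill(chosen, -1);
        best = Long.MIN_VALUE;
        backtrack(0L);
        return best;
    }

    static void backtrack(long sum) {
        // Find the remaining array with the fewest feasible indices.
        int bestArr = -1, bestCount = Integer.MAX_VALUE, bestLo = 0, bestHi = 0;
        for (int i = 0; i < n; i++) {
            if (chosen[i] != -1) continue;
            // Intersect the index intervals allowed by every array already chosen.
            int lo = 0, hi = m - 1;
            for (int j = 0; j < n; j++) {
                if (chosen[j] == -1) continue;
                lo = Math.max(lo, chosen[j] - b[i][j]);
                hi = Math.min(hi, chosen[j] + b[i][j]);
            }
            int count = hi - lo + 1;
            if (count <= 0) return;           // empty intersection: prune and backtrack
            if (count < bestCount) {
                bestCount = count; bestArr = i; bestLo = lo; bestHi = hi;
            }
        }
        if (bestArr == -1) {                  // a value was chosen from every array
            best = Math.max(best, sum);
            return;
        }
        for (int u = bestLo; u <= bestHi; u++) {
            chosen[bestArr] = u;
            backtrack(sum + a[bestArr][u]);
        }
        chosen[bestArr] = -1;
    }
}

For the two arrays in the question this should return 12, matching the brute-force loop above.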

Learning the boolean AND function using a perceptron

I'm new to machine learning. I've written this code http://ideone.com/t9VOag for training a perceptron to learn the boolean AND function using the perceptron training rule.
The perceptron never learns the correct weights. Errors for the inputs (1, -1) and (-1, 1) make the weights oscillate between 0.7999999999999999, 0.20000000000000004 and 0.7, 0.300000000000000, which is obvious since:
For input (1, -1):
    target output - output given = 0 - 1 = -1
    w1 = w1 + n*(t-o)*1 = w1 - n
    w2 = w2 + n*(t-o)*(-1) = w2 + n
For input (-1, 1):
    t - o = 0 - 1 = -1
    w1 = w1 + n*(t-o)*(-1) = w1 + n
    w2 = w2 + n*(t-o)*(1) = w2 - n
The weights are getting increased and decreased by the same amounts, so they just keep oscillating.
If I include the weight w0 in the updates during learning, it reaches a solution (but w0 isn't supposed to be updated, is it?).
What is the correct implementation?
Take w0 out of the code altogether. Your perceptron should have 2 input nodes and 1 output node, with a single weight connecting each input node to the output node. Like this (excuse the bad ASCII art):
I1
\
\W1
\
Out
/
/W2
/
I2
You are effectively feeding in a strong bias by setting W0 to 1.
In contrast to your statement "but w0 isn't supposed to be updated", w0 is supposed to be updated; however, the input to w0 is always your unchangeable bias input (for example 1).
Intuition: look at your problem; you have two inputs that could be either 1 or -1, and they could change their positions without affecting the result. This is the nature of the "and" operator. Therefore, w1 should be equal to w2 and the bias weight (w0) should be zero.
Briefly, your code is correct and you should just uncomment updating w0.
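In code, the per-example update with the bias input fixed to 1 would look roughly like this (a sketch assuming fields w0, w1, w2, a learning rate n, and a sign-style threshold, as in the linked code):

// Perceptron training rule update for one example; the bias weight w0 is
// updated like any other weight, with its input fixed to the constant 1.
double o = Math.signum(w0 * 1 + w1 * x1 + w2 * x2);  // thresholded output
w0 += n * (t - o) * 1;
w1 += n * (t - o) * x1;
w2 += n * (t - o) * x2;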

Most efficient way of changing an int to an X and Y direction

I have a set of key codes, with values (mod 4, of course) 0 to 3 corresponding to the keys down, left, up, right, in that order. I need to convert these key codes into x and y directions, with a positive x indicating a location left of the origin and a positive y indicating a location below the origin. The way I see it, I have two ways of doing this:
using arrays:
int [] dx = {0, -1, 0, 1};
int [] dy = {1, 0, -1, 0};
int x = dx[kc];
int y = dy[kc];
or using arithmetic:
int x = (kc%2)*(((kc/2)%2)*2 - 1);
int y = ((kc+1)%2)*(((kc/2)%2)*-2 + 1);
Which would be more efficient?
It probably depends on the language. I would think the integer representation would be more efficient. Or better yet, if you need space, you could represent directions with bit strings. You would need 4 bits for the four directions; most ints are 4 bytes, which is 8x the storage! Then again, this probably doesn't affect anything unless you are storing a LOT of these.
I would abstract away the representation with direction methods (getDirection(), setDirection(), etc.) and then try running your program with several different kinds, as in the sketch below.
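For instance, that abstraction could be an enum that hides whichever representation you end up picking (a sketch; the constant order matches key codes 0-3 = down, left, up, right and the dx/dy values from the arrays above):

// Sketch: hide the representation behind an enum; swap the internals
// (array lookup, arithmetic, bit tricks) without touching callers.
enum Direction {
    DOWN(0, 1), LEFT(-1, 0), UP(0, -1), RIGHT(1, 0);

    private final int dx;
    private final int dy;

    Direction(int dx, int dy) {
        this.dx = dx;
        this.dy = dy;
    }

    static Direction fromKeyCode(int kc) {
        return values()[kc & 3];  // kc mod 4
    }

    int getX() { return dx; }
    int getY() { return dy; }
}

Callers would then write Direction d = Direction.fromKeyCode(kc); int x = d.getX(); int y = d.getY();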
Profiling would be your friend, but I would separate your constants out in a different way. Consider:
private static final int[][] directions = {
    {0, 1},
    {-1, 0},
    {0, -1},
    {1, 0}
};
Then you can do it as simply as:
x = directions[kc][0];
y = directions[kc][1];
First of all, I wouldn't really worry about the efficiency of either approach, since it's very unlikely that this code will be the bottleneck in any real-world application. I do, however, think that the first approach is much more readable. So if you value your maintenance and debugging time, that's the way to go.
If performance is that important, and this piece of code is critical, you should actually benchmark the two approaches. Use something like Google Caliper for that.
Second, you can optimize the second approach by replacing the (somewhat slow) modulus operation with a bitwise AND (kc & 1 is the same as kc % 2 for non-negative ints, only faster) and by replacing the multiplication by 2 with a left shift (kc << 1 instead of kc * 2).
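With those replacements, the arithmetic version becomes something like this (a sketch, assuming kc is a non-negative int in the range 0-3):

// Bitwise version of the arithmetic approach:
// & 1 replaces % 2, >> 1 replaces / 2, and << 1 replaces * 2.
int x = (kc & 1) * ((((kc >> 1) & 1) << 1) - 1);
int y = ((kc & 1) ^ 1) * (1 - (((kc >> 1) & 1) << 1));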
Here's yet another way to do the conversion.
package com.ggl.testing;

import java.awt.Point;

public class Convert {

    public Point convertDirection(int index) {
        // 0 to 3 corresponds to the keys down, left, up, right
        Point[] directions = { new Point(0, 1), new Point(-1, 0),
                new Point(0, -1), new Point(1, 0) };
        return directions[index];
    }
}

Liblinear usage format

I am using the .NET implementation of liblinear in my C# code via the following NuGet package:
https://www.nuget.org/packages/Liblinear/
But in the readme file of liblinear, the format for x is:
struct problem describes the problem:
struct problem
{
    int l, n;
    int *y;
    struct feature_node **x;
    double bias;
};
where `l` is the number of training data. If bias >= 0, we assume
that one additional feature is added to the end of each data
instance. `n` is the number of feature (including the bias feature
if bias >= 0). `y` is an array containing the target values. (integers
in classification, real numbers in regression) And `x` is an array
of pointers, each of which points to a sparse representation (array
of feature_node) of one training vector.
For example, if we have the following training data:
LABEL  ATTR1  ATTR2  ATTR3  ATTR4  ATTR5
-----  -----  -----  -----  -----  -----
  1      0     0.1    0.2     0      0
  2      0     0.1    0.3   -1.2     0
  1     0.4     0      0      0      0
  2      0     0.1     0     1.4    0.5
  3    -0.1   -0.2    0.1    1.1    0.1
and bias = 1, then the components of problem are:
l = 5
n = 6
y -> 1 2 1 2 3
x -> [ ] -> (2,0.1) (3,0.2) (6,1) (-1,?)
     [ ] -> (2,0.1) (3,0.3) (4,-1.2) (6,1) (-1,?)
     [ ] -> (1,0.4) (6,1) (-1,?)
     [ ] -> (2,0.1) (4,1.4) (5,0.5) (6,1) (-1,?)
     [ ] -> (1,-0.1) (2,-0.2) (3,0.1) (4,1.1) (5,0.1) (6,1) (-1,?)
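To make the sparse layout concrete, the first row of that table could be built roughly like this in code. This is only a sketch, using a FeatureNode(index, value) style constructor like the one in the example below; zero-valued attributes are skipped and the bias is appended as a final feature at index 6:

// Sparse encoding of the first training row: only non-zero attributes
// are stored (1-based indices), plus the bias feature (6, 1).
FeatureNode[] row1 = new FeatureNode[] {
    new FeatureNode(2, 0.1),
    new FeatureNode(3, 0.2),
    new FeatureNode(6, 1.0)
};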
But in the following example implementation:
https://gist.github.com/hodzanassredin/6682771
problem.x <- [|
[|new FeatureNode(1,0.); new FeatureNode(2,1.)|]
[|new FeatureNode(1,2.); new FeatureNode(2,0.)|]
|]// feature nodes
problem.y <- [|1.;2.|] // target values
which means his data set is:
1 0 1
2 2 0
So he is not storing the nodes in the sparse format of liblinear. Does anyone know the correct format of x for the liblinear implementation?
Though it doesn't address exactly the library you mentioned, I can offer you an alternative. The Accord.NET Framework has recently incorporated all of LIBLINEAR's algorithms in its machine learning namespaces. It is also available through NuGet.
In this library, the direct syntax to create a linear support vector machine from in-memory data is
// Create a simple binary AND
// classification problem:
double[][] problem =
{
    //              a  b  a + b
    new double[] { 0, 0, 0 },
    new double[] { 0, 1, 0 },
    new double[] { 1, 0, 0 },
    new double[] { 1, 1, 1 },
};
// Get the two first columns as the problem
// inputs and the last column as the output
// input columns
double[][] inputs = problem.GetColumns(0, 1);
// output column
int[] outputs = problem.GetColumn(2).ToInt32();
// However, SVMs expect the output value to be
// either -1 or +1. As such, we have to convert
// it so the vector contains { -1, -1, -1, +1 }:
//
outputs = outputs.Apply(x => x == 0 ? -1 : 1);
After the problem is created, one can learn a linear SVM using
// Create a new linear-SVM for two inputs (a and b)
SupportVectorMachine svm = new SupportVectorMachine(inputs: 2);
// Create a L2-regularized L2-loss support vector classification
var teacher = new LinearDualCoordinateDescent(svm, inputs, outputs)
{
    Loss = Loss.L2,
    Complexity = 1000,
    Tolerance = 1e-5
};
// Learn the machine
double error = teacher.Run(computeError: true);
// Compute the machine's answers for the learned inputs
int[] answers = inputs.Apply(x => Math.Sign(svm.Compute(x)));
This assumes, however, that your data is already in memory. If you wish to load your data from disk, from a file in libsvm sparse format, you can use the framework's SparseReader class. An example of how to use it can be found below:
// Suppose we are going to read a sparse sample file containing
// samples which have an actual dimension of 4. Since the samples
// are in a sparse format, each entry in the file will probably
// have a much smaller number of elements.
//
int sampleSize = 4;
// Create a new Sparse Sample Reader to read any given file,
// passing the correct dense sample size in the constructor
//
SparseReader reader = new SparseReader(file, Encoding.Default, sampleSize);
// Declare a vector to obtain the label
// of each of the samples in the file
//
int[] labels = null;
// Declare a vector to obtain the description (or comments)
// about each of the samples in the file, if present.
//
string[] descriptions = null;
// Read the sparse samples and store them in a dense vector array
double[][] samples = reader.ReadToEnd(out labels, out descriptions);
Afterwards, one can use the samples and labels vectors as the inputs and outputs of the problem, respectively.
I hope it helps.
Disclaimer: I am the author of this library. I am answering this question in the sincere hope it can be useful for the OP, since not long ago I also faced the same problems. If a moderator thinks this looks like spam, feel free to delete. However, I am only posting this because I think it might help others. I even came across this question by mistake while searching for existing C# implementations of LIBSVM, not LIBLINEAR.
