I am working on a feed-forward neural network (used for classification problems) that I am training with a PSO. It has a single hidden layer, and I can vary the number of neurons in that layer.
My problem is that the network learns linearly separable problems quite easily, but cannot learn problems that are not linearly separable (like XOR), which it should be able to do.
I believe my PSO is working correctly, because I can see that it minimises the error function of each particle (using mean squared error over the training set).
I have tried both sigmoid and linear activation functions, with similarly bad results. I also have a bias unit (which doesn't help much either).
What I want to know is whether there is something specific I might be doing wrong that could cause this kind of problem, or just some places I should look for the error.
I am a bit lost at the moment.
Thanks
PSO can train a neural network to solve non-linearly-separable problems like XOR. I've done this before; my algorithm takes about 50 or so iterations at most. Sigmoid is a good activation function for XOR. If the network doesn't converge on non-separable problems, then my guess is that your hidden layer is somehow not having an effect, or is being bypassed, as the hidden layer is what typically makes non-separable problems learnable.
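As a quick sanity check (this sketch is mine, not from the question): a 2-2-1 sigmoid network with hand-picked weights is known to solve XOR, so if your forward pass can't reproduce this, the problem is in the network code rather than in the trainer:

    public class XorCheck {
        static double sigmoid(double s) { return 1.0 / (1.0 + Math.exp(-s)); }

        static double xorNet(double x1, double x2) {
            double h1 = sigmoid(10 * (x1 + x2 - 0.5)); // approximately OR
            double h2 = sigmoid(10 * (x1 + x2 - 1.5)); // approximately AND
            return sigmoid(10 * (h1 - h2 - 0.5));      // OR and not AND = XOR
        }

        public static void main(String[] args) {
            System.out.println(xorNet(0, 0)); // ~0
            System.out.println(xorNet(0, 1)); // ~1
            System.out.println(xorNet(1, 0)); // ~1
            System.out.println(xorNet(1, 1)); // ~0
        }
    }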
When I debug AI, I find it often useful to first determine whether my training code or my evaluation code (the neural network in this case) is at fault. You might want to create a second trainer for your network; then you can make sure your network code is calculating the output correctly. You could even use a simple "hill climber": pick a random weight and change it by a small random amount (up or down). Did your error get better? Keep the weight change and repeat. Did your error get worse? Drop the change and try again.
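A minimal sketch of such a hill climber (the class and method names are placeholders, not from the original post; the error function is whatever MSE-over-the-training-set routine your network already has):

    import java.util.Random;
    import java.util.function.ToDoubleFunction;

    public class HillClimber {
        // Perturb one random weight at a time; keep the change only if the error improves.
        public static double climb(double[] weights,
                                   ToDoubleFunction<double[]> error,
                                   int iterations) {
            Random rnd = new Random();
            double best = error.applyAsDouble(weights);
            for (int iter = 0; iter < iterations; iter++) {
                int i = rnd.nextInt(weights.length);           // pick a random weight
                double delta = (rnd.nextDouble() - 0.5) * 0.1; // small random step up or down
                weights[i] += delta;
                double e = error.applyAsDouble(weights);
                if (e < best) best = e;                        // better: keep the change
                else weights[i] -= delta;                      // worse: revert and try again
            }
            return best;
        }
    }

If this hill climber can drive the MSE down on XOR but your PSO can't, the bug is in the PSO; if neither can, look at the network's forward pass.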
My problem:
I implemented a feed forward model and a recurrent model with deeplearning4j to detect anomalies in a 1D signal.
Maybe I'm missing an abstraction but I thought I could solve this problem the following way:
Preprocess the data. I have 5 different failure categories with roughly 40 examples each.
Each failure has its own "structure".
Build a neural net with 5 output neurons, one for each failure.
Train and evaluate.
Now I want to test my net on real data; it should detect the anomalies in a very long 1D signal. The idea was that the net should somehow "iterate" over the signal and detect these failures in it.
Is this approach even possible? Do you have any ideas?
Thanks in advance!
It depends on what the structure of those defects looks like.
Given that you have a 1D signal, I expect that your examples are sequences of data, effectively windows over your continuous signal.
There are multiple ways to model that problem:
Sliding window
This works if all of your examples have the same length. In that case, you can build a normal feed-forward network that takes a fixed number of steps as input and returns a single classification.
If your real data is shorter than the example length, you can pad it; if it is longer, you slide over the sequence (e.g. with a window size of 2, the sequence abcd turns into [ab], [bc], [cd] and you get 3 classifications).
As far as I know there is nothing in DL4J out of the box that implements this solution. On the other hand, it shouldn't be too hard to implement yourself, using RecordConverter.toRecord and RecordConverter.toArray to transform your real data into NDArrays.
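For illustration, a minimal sketch of the windowing itself in plain Java (the names are illustrative); each window can then be converted to an NDArray, e.g. via RecordConverter, and classified by the feed-forward net:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class SlidingWindow {
        // Splits a long signal into fixed-size windows,
        // e.g. [a, b, c, d] with window size 2 -> [a, b], [b, c], [c, d].
        public static List<double[]> windows(double[] signal, int windowSize) {
            List<double[]> result = new ArrayList<>();
            for (int start = 0; start + windowSize <= signal.length; start++) {
                result.add(Arrays.copyOfRange(signal, start, start + windowSize));
            }
            return result;
        }
    }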
Recurrent Network
Using a recurrent network, you can apply a neural network to any length of sequence data. This will likely be your choice if the faults you are looking for can have different lengths in the signal.
The recurrent network can keep an internal state that is updated on each call during inference, and it will produce a classification after each step of your signal.
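To make the internal-state idea concrete, here is a minimal Elman-style step in plain Java (a sketch of the concept, not the DL4J API; all names are illustrative):

    public class ElmanCell {
        private final double[][] wIn;   // input -> hidden weights
        private final double[][] wRec;  // hidden -> hidden (recurrent) weights
        private double[] state;         // hidden state carried between steps

        public ElmanCell(double[][] wIn, double[][] wRec) {
            this.wIn = wIn;
            this.wRec = wRec;
            this.state = new double[wRec.length];
        }

        // Consumes one step of the signal and updates the hidden state;
        // an output layer would read a per-step classification off this state.
        public double[] step(double[] x) {
            double[] next = new double[state.length];
            for (int h = 0; h < next.length; h++) {
                double sum = 0.0;
                for (int i = 0; i < x.length; i++) sum += wIn[h][i] * x[i];
                for (int j = 0; j < state.length; j++) sum += wRec[h][j] * state[j];
                next[h] = Math.tanh(sum);
            }
            state = next;
            return state;
        }
    }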
Which solution is right for you will depend entirely on your actual, concrete use case.
I am creating an evolution simulator in Java. The simulation consists of a map with cold/hot regions, high/low elevation, etc.
I want the creatures in the world to evolve in two ways: every single creature will evolve its AI during the course of its lifetime, and when a creature reproduces there is a chance of mutation.
I thought it would be good to make the brain of the creatures a neural network that takes the sensors' data as input (only eyes at the moment) and produces commands to the thrusters (which move the creature around).
However, I only have experience with basic neural networks that receive desired outputs from the user and calculate the error accordingly. In this simulator there is no optimal result: results can be rated by a fitness function I have created (which takes into account energy changes, number of offspring, etc.), but it is unknown which output node is wrong and which is right.
Am I using the correct approach for this problem? Or perhaps neural networks are not the best solution for it?
If it is a viable way to achieve what I desire, how can I make the neural network adjust the correct weights if I do not know them?
Thanks in advance, and sorry for any English mistakes.
You ran into a common problem with neural networks and games. As mentioned in the comments, a genetic algorithm is often used when there is no 'correct' solution.
So your goal is basically to combine neural networks and genetic algorithms. Luckily, somebody has done this before and described the process in this paper.
Since the paper is relatively complex and it is very time consuming to implement the algorithm, you should consider using a library.
Since I couldn't find a suitable library myself, I decided to write my own; you can find it here.
The library should work well enough for 'smaller' problems like yours. You will find some example code in the Main class.
Combine networks using

    Network.breedWith(Network other);

Create networks using

    Network net = new Network(int inputs, int outputs);

Mutate networks using

    Network.innovate();
As you will see in the example code, it is important to always apply an initial number of mutations to each new network. This is because a newly created network has no connections, so innovation (a fancy word for mutation) is needed to create them.
If needed, you can always create copies of networks (Network.getCopy()). The Network class and all of its attributes implement Serializable, so you can save/load a network using an ObjectOutputStream.
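A short usage sketch based on the calls above (I'm guessing at reasonable arguments and return types here; the Main class has the authoritative example):

    Network parent1 = new Network(4, 2);   // 4 inputs, 2 outputs
    Network parent2 = new Network(4, 2);
    for (int i = 0; i < 10; i++) {         // initial mutations: a new network has no connections yet
        parent1.innovate();
        parent2.innovate();
    }
    Network child = parent1.breedWith(parent2); // assuming breedWith returns the offspring
    Network backup = child.getCopy();           // keep a copy before further mutation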
If you decide to use my library please let me know what results you got!
OK, so first off: hi.
Secondly, it's 3am and I promise you I am fried after spending 3 solid days on this. I understand RNNs and RTRL, but my calculus brain has run away from me at this point in time.
Basically I'm at the stage where I need to calculate the RTRL weight sensitivities; more specifically, the partial derivative d y_k(t) / d w_ij.
I used a variety of sites and my textbook (which had nothing on this subject), but the rest of my primary source is willamette.edu.
The issue I'm having is how to programmatically (in Java) partially differentiate y_k with respect to w_ij. I can't wrap my head around how to go about that.
NOTE: I do actually understand how RNNs and RTRL work, confidently.
You do not "programmatically (Java) partially differentiate"; you do this analytically, and then implement the resulting expression. All these operations are described in any neural-network book.
In particular, consider d y_k(t)/d w_ij. Assuming (for simplicity) that this is a one-layer network, y_k(t) is of the form y_k(t) = f( sum_j w_kj x_j(t) ) (I represent the bias as an extra neuron), so y_k depends only on the weights w_kj feeding neuron k. Calculating the partial derivative with respect to w_kj gives f'( sum_j w_kj x_j(t) ) * x_j(t), and the derivative with respect to w_ij for i != k is simply zero.
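As a minimal sketch of what "implement the analytical solution" means in Java (assuming a sigmoid activation; the names are illustrative):

    public class RtrlDerivative {
        static double sigmoid(double s) { return 1.0 / (1.0 + Math.exp(-s)); }

        // The sigmoid's derivative expressed via its own output: f'(s) = f(s) * (1 - f(s)).
        static double sigmoidPrime(double s) {
            double f = sigmoid(s);
            return f * (1 - f);
        }

        // Partial derivative of neuron k's output with respect to weight w[k][j]
        // for y_k = f( sum_i w[k][i] * x[i] ).
        static double dYkdWkj(double[][] w, double[] x, int k, int j) {
            double net = 0.0;
            for (int i = 0; i < x.length; i++) net += w[k][i] * x[i]; // net input to neuron k
            return sigmoidPrime(net) * x[j];                          // f'(net) * x_j
        }
    }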
Before going to RTRL you should first understand simple backpropagation through time, which in turn requires understanding plain backpropagation. Then you can move on to RTRL; I recommend the following tutorial.
The problem: I have 10 cards with values 1 to 10. I have to arrange the cards so that the sum of 5 of them is 36 and the product of the remaining 5 is 360.
I successfully wrote a GA to solve this cards problem in Java. Now I am thinking of solving the same problem with a neural network. Is it possible to solve this with a NN? What approach should I take?
This problem is hard to solve directly with a neural network. Neural networks have no concept of sum or product, so they won't be able to tell the difference between a valid and an invalid solution directly.
If you created enough examples and labelled them, then the neural network might be able to learn to tell the "good" and "bad" arrangements apart just by memorising them all. But it would be a very inefficient and inaccurate way of doing this, and somewhat pointless: you'd need a separate program that already knew how to solve the problem in order to create the training data for the neural network.
P.S. I think you are a bit lucky that you managed to get the GA to work as well. I suspect it only worked because the problem is small enough for the GA to try most of the possible solutions in the vicinity of the answer(s), so it stumbles upon a correct answer by chance before too long.
To follow up on @mikera's comments about why neural networks (NNs) might not be best for this task, it is useful to consider how NNs are usually used.
A NN is usually used in a supervised learning task. That is, the implementer provides many examples of input and the correct output that goes with that input. The NN then finds a general function which captures the provided input/output pairs and hopefully captures many other previously unseen input/output pairs as well.
In your problem you are solving a particular optimization, so there isn't much training to be done; there is just one right answer (or a few). NNs aren't really designed for such problems.
Note that the lack of a built-in sum/product concept doesn't necessarily hurt a NN. You can create your own input layer with sum and product features so that the NN can learn directly from these features. But in this problem it won't help very much.
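A sketch of that feature idea (mine, with illustrative names): rather than feeding raw cards to the NN, feed it the sum of one group and the product of the other, so the relevant quantities are direct inputs:

    public class CardFeatures {
        // Builds the two-feature input vector {sum of group A, product of group B}.
        static double[] features(int[] sumGroup, int[] productGroup) {
            double sum = 0;
            for (int c : sumGroup) sum += c;
            double product = 1;
            for (int c : productGroup) product *= c;
            return new double[] { sum, product };
        }
    }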
Note also that your problem is so small that even a naive enumeration of all 10! = 3,628,800 orderings of the numbers should be achievable in a few seconds at most.
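For example, a brute-force sketch (mine, not from the question) that checks every 5-card subset of {1..10} for sum 36, with the remaining 5 cards checked for product 360:

    public class Cards {
        public static void main(String[] args) {
            for (int mask = 0; mask < (1 << 10); mask++) {
                if (Integer.bitCount(mask) != 5) continue; // exactly 5 cards in the sum group
                int sum = 0;
                long product = 1;
                for (int card = 1; card <= 10; card++) {
                    if ((mask & (1 << (card - 1))) != 0) sum += card; // card is in the sum group
                    else product *= card;                             // card is in the product group
                }
                if (sum == 36 && product == 360) {
                    System.out.println("sum-group bitmask: " + Integer.toBinaryString(mask));
                }
            }
        }
    }

This only has to test the 252 five-card subsets, which is even cheaper than enumerating all 10! orderings.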
I'm trying to add to the code for a single-layer neural network that takes a bitmap as input and has 26 outputs, one for the likelihood of each letter of the alphabet.
The first question I have is regarding the single hidden layer that is being added. Am I correct in thinking that the hidden layer will have its own set of output values and weights only? It doesn't need to have its own biases?
Can I also confirm that I'm thinking about the feedforward aspect correctly? Here's some pseudocode:
// input => hidden
for j in 0..hiddenOutput.length:
    sum = 0
    for i in 0..inputs.length:
        sum += inputs[i] * hiddenWeights[j][i]
    hiddenOutput[j] = activationFunction(sum)

// hidden => output
for j in 0..output.length:
    sum = 0
    for i in 0..hiddenOutput.length:
        sum += hiddenOutput[i] * outputWeights[j][i]
    output[j] = activationFunction(sum)
Assuming that is correct, would the training be something like this?
def train(input[], desired[]):
    iterate through output and determine errors[]
    update weights & bias accordingly
    iterate through hiddenOutput and determine hiddenErrors[]
    update hiddenWeights & (same bias?) accordingly
Thanks in advance for any help, I've read so many examples and tutorials and I'm still having trouble determining how to do everything correctly.
Dylan, this is probably long after your homework assignment was due, but I do have a few thoughts about what you've posted.
Make the hidden layer much bigger than the size of the input bitmaps.
You should have different weights and biases from input -> hidden than from hidden -> output (see the sketch after this list).
Spend a lot of time on your error function (discriminator).
Understand that neural nets have a tendency to quickly get locked into a set of weights (usually an incorrect one). If that happens, you'll need to start over and train in a different order.
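To make the second point concrete, a declaration sketch (the sizes and names are illustrative, not from the question):

    public class LetterNet {
        static final int INPUT_SIZE = 64, HIDDEN_SIZE = 128;            // illustrative sizes

        double[][] hiddenWeights = new double[HIDDEN_SIZE][INPUT_SIZE]; // input -> hidden
        double[] hiddenBiases = new double[HIDDEN_SIZE];                // one bias per hidden neuron
        double[][] outputWeights = new double[26][HIDDEN_SIZE];         // hidden -> output
        double[] outputBiases = new double[26];                         // one bias per output letter
    }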
The thing I learned about neural nets is that you never know why they're working (or not working). That alone is reason to keep them out of the realms of medicine and finance.
You might want to read http://www.ai-junkie.com/ann/evolved/nnt1.html; it covers something very close to what you are doing, and it provides code along with a (mostly) simple explanation of how the network learns. Although that learning approach is completely different from feed-forward training, it should hopefully give you some ideas about the nature of NNs.
It is my belief that the hidden and output layers should each have their own bias.
Also, NNs can be tricky. Try first identifying only one letter: get a consistent high/low signal from a single output, then try to keep that signal across different variations of the same letter. Then you can progress and add more. You might do that by training 26 different networks that each give an output only on a match, or by making one large NN with 26 outputs. Two different approaches.
As far as the use of bias terms is concerned, I found the section "Why use a bias/threshold?" in the comp.ai.neural-nets FAQ very useful. I highly recommend reading that FAQ.