LSTM in DL4J - All output values are the same - java

I'm trying to create a simple LSTM using DeepLearning4J, with 2 input features and a timeseries length of 1. I'm having a strange issue, however: after training the network, inputting test data yields the same arbitrary result regardless of the input values. My code is shown below.
(UPDATED)
public class LSTMRegression {
    public static final int inputSize = 2,
                            lstmLayerSize = 4,
                            outputSize = 1;

    public static final double learningRate = 0.0001;

    public static void main(String[] args) {
        int miniBatchSize = 99;

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .miniBatch(false)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .updater(new Adam(learningRate))
                .list()
                .layer(0, new LSTM.Builder().nIn(inputSize).nOut(lstmLayerSize)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.TANH).build())
//                .layer(1, new LSTM.Builder().nIn(lstmLayerSize).nOut(lstmLayerSize)
//                        .weightInit(WeightInit.XAVIER)
//                        .activation(Activation.SIGMOID).build())
//                .layer(2, new LSTM.Builder().nIn(lstmLayerSize).nOut(lstmLayerSize)
//                        .weightInit(WeightInit.XAVIER)
//                        .activation(Activation.SIGMOID).build())
                .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.IDENTITY)
                        .nIn(lstmLayerSize).nOut(outputSize).build())
                .backpropType(BackpropType.TruncatedBPTT)
                .tBPTTForwardLength(miniBatchSize)
                .tBPTTBackwardLength(miniBatchSize)
                .build();

        final var network = new MultiLayerNetwork(conf);
        final DataSet train = getTrain();
        final INDArray test = getTest();

        final DataNormalization normalizer = new NormalizerMinMaxScaler(0, 1);
        //                                 = new NormalizerStandardize();

        normalizer.fitLabel(true);
        normalizer.fit(train);
        normalizer.transform(train);
        normalizer.transform(test);

        network.init();

        for (int i = 0; i < 100; i++)
            network.fit(train);

        final INDArray output = network.output(test);
        normalizer.revertLabels(output);

        System.out.println(output);
    }

    public static INDArray getTest() {
        double[][][] test = new double[][][]{
                {{20}, {203}},
                {{16}, {183}},
                {{20}, {190}},
                {{18.6}, {193}},
                {{18.9}, {184}},
                {{17.2}, {199}},
                {{20}, {190}},
                {{17}, {181}},
                {{19}, {197}},
                {{16.5}, {198}},
                ...
        };

        INDArray input = Nd4j.create(test);

        return input;
    }

    public static DataSet getTrain() {
        double[][][] inputArray = {
                {{18.7}, {181}},
                {{17.4}, {186}},
                {{18}, {195}},
                {{19.3}, {193}},
                {{20.6}, {190}},
                {{17.8}, {181}},
                {{19.6}, {195}},
                {{18.1}, {193}},
                {{20.2}, {190}},
                {{17.1}, {186}},
                ...
        };

        double[][] outputArray = {
                {3750},
                {3800},
                {3250},
                {3450},
                {3650},
                {3625},
                {4675},
                {3475},
                {4250},
                {3300},
                ...
        };

        INDArray input = Nd4j.create(inputArray);
        INDArray labels = Nd4j.create(outputArray);

        return new DataSet(input, labels);
    }
}
Here's an example of the output:
(UPDATED)
00:06:04.554 [main] WARN o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.554 [main] WARN o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.555 [main] WARN o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.555 [main] WARN o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.555 [main] WARN o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.555 [main] WARN o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.555 [main] WARN o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.555 [main] WARN o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.555 [main] WARN o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
00:06:04.555 [main] WARN o.d.nn.multilayer.MultiLayerNetwork - Cannot do truncated BPTT with non-3d inputs or labels. Expect input with shape [miniBatchSize,nIn,timeSeriesLength], got [99, 2, 1] and labels with shape [99, 1]
[[[3198.1614]],
[[2986.7781]],
[[3059.7017]],
[[3105.3828]],
[[2994.0127]],
[[3191.4468]],
[[3059.7017]],
[[2962.4341]],
[[3147.4412]],
[[3183.5991]]]
So far I've tried changing a number of hyperparameters, including the updater (previously Adam), the activation function in the hidden layers (previously ReLU), and the learning rate; none of these fixed the issue.
Thank you.

This is always either a tuning issue or an input data issue. In your case, your input data is wrong.
You almost always need to normalize your input data, or your network won't learn anything. This is also true for your outputs: your output labels should be normalized as well.
Snippets below:
//Normalize data, including labels (fitLabel=true)
NormalizerMinMaxScaler normalizer = new NormalizerMinMaxScaler(0, 1);
normalizer.fitLabel(true);
normalizer.fit(trainData); //Collect training data statistics
normalizer.transform(trainData);
normalizer.transform(testData);
Here's how to revert:
//Revert data back to original values for plotting
normalizer.revert(trainData);
normalizer.revert(testData);
normalizer.revertLabels(predicted);
There are different kinds of normalizers; the one below just scales to the range 0 to 1. Sometimes NormalizerStandardize could be better here. That will normalize the data by subtracting the mean and dividing by the standard deviation.
That will be something like this:
NormalizerStandardize myNormalizer = new NormalizerStandardize();
myNormalizer.fitLabel(true);
myNormalizer.fit(sampleDataSet);
Afterwards your network should train normally.
Edit: If that doesn't work: due to the size of your dataset, DL4J also has a knob (I explained this in my comment below) that is normally true, where we assume your data is a minibatch sample. On most reasonable problems (read: not 10 data points) this works. Otherwise the training can be all over the place. You can turn off the minibatch assumption with:
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
.miniBatch(false)
The same is true for MultiLayerNetwork as well.
Also of note: your architecture is vastly overkill for what is a VERY small, unrealistic problem for DL. DL usually requires a lot more data to work properly; that is why you see layers stacked multiple times. For a problem like this I would suggest reducing the number of layers to 1.
At each layer, what's essentially happening is a form of compression of information. When your number of data points is small, you eventually lose signal through the network once you've saturated it. Subsequent layers tend not to learn very well in that case.
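To make that concrete, here's a minimal sketch of what a reduced, single-LSTM-layer configuration could look like (illustrative values only; it assumes the same DL4J imports as the code in the question and is not meant as a drop-in fix):
// Sketch: one LSTM layer feeding an RnnOutputLayer, with minibatch mode off for a tiny dataset
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .miniBatch(false)                         // tiny dataset: don't treat it as a minibatch sample
        .updater(new Adam(1e-3))                  // illustrative learning rate
        .weightInit(WeightInit.XAVIER)
        .list()
        .layer(0, new LSTM.Builder()
                .nIn(2).nOut(4)                   // 2 input features, small hidden size
                .activation(Activation.TANH)
                .build())
        .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MSE)
                .activation(Activation.IDENTITY)
                .nIn(4).nOut(1)                   // single regression target
                .build())
        .build();

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();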

Related

Dense and Sparse Tensor Input to TensorFlow Model in Java

After training the model in Python and loading it in Java for making predictions, how is it possible to create sparse tensors for categorical inputs?
I could successfully create tensors for numeric values as:
Tensor x =
    Tensor.create(
        new long[] {2, 4},
        FloatBuffer.wrap(
            new float[] {
                6.4f, 3.2f, 4.5f, 1.5f,
                5.8f, 3.1f, 5.0f, 1.7f
            }));
But for categorical data we need sparse tensors; how can we create them?
Please find my input_fn() below:
def input_fn(df):
    # Creates a dictionary mapping from each continuous feature column name (k) to
    # the values of that column stored in a constant Tensor.
    continuous_cols = {k: tf.constant(df[k].values)
                       for k in CONTINUOUS_COLUMNS}
    # Creates a dictionary mapping from each categorical feature column name (k)
    categorical_cols = {k: tf.SparseTensor(
        indices=[[i, 0] for i in range(df[k].size)],
        values=df[k].values,
        dense_shape=[df[k].size, 1])
        for k in CATEGORICAL_COLUMNS}
    # Merges the two dictionaries into one.
    feature_cols = dict(continuous_cols.items() + categorical_cols.items())
    # Converts the label column into a constant Tensor.
    label = tf.constant(df[LABEL_COLUMN].values)
    # Returns the feature columns and the label.
    return feature_cols, label
So what if I have input like below:
age  workclass  fnlwgt  education  education_num  marital_status  occupation    race   LABEL(Income_bracket)
39   State-gov  77516   Bachelors  13             Never-married   Adm-clerical  White  3
How can I create a tensor for continuous and categorical values and merge them, to be provided as the input to TensorFlow in Java?
Find the code for training the model in python - https://gist.github.com/gaganmalhotra/cd6a5898b9caf9005a05c8831a9b9153
#ash

commons-math differentiation result is 0

I'm trying to use commons-math library for some numerical differentiation task. I've built a very simple function using DerivativeStructures which I thought would work; apparently I was wrong.
public static void main(String[] args) {
    DerivativeStructure x0 = new DerivativeStructure(2, 2, 2.0);
    DerivativeStructure y0 = new DerivativeStructure(2, 2, 4.0);

    DerivativeStructure xi = x0.pow(2);
    DerivativeStructure yi = y0.pow(2);
    DerivativeStructure f = xi.add(yi);

    System.out.println(f.getValue());
    System.out.println(f.getPartialDerivative(1, 0)); // (?)
    System.out.println(f.getPartialDerivative(0, 1)); // (?)
}
I'm trying to get the 1st and 2nd order partial derivatives of the multivariate function f(x, y) = x^2 + y^2 at the point (2.0, 4.0). As first order partials I'd expect 4.0 for df/dx and 8.0 for df/dy, and 2.0 for the second order partials. I am getting the correct f(x, y) value, however the partial derivatives come out wrong, and I can't make sense of it from this javadoc. I saw a couple of questions here on Stack Overflow with some comments about the opaque documentation for commons-math, but not a working example for multivariate functions. Univariate I can work out, but not this...
Any tips would be appreciated!
In your code you haven't really specified 2 independent variables x0 and y0, but only 1. With DerivativeStructure, x0 and y0 are actually seen as functions themselves, depending on an implicit vector of variables p. For each independent variable you have to give a different index into the p vector of independent variables. What you need to do is:
DerivativeStructure x0 = new DerivativeStructure(2, 2, 0, 2.0);
DerivativeStructure y0 = new DerivativeStructure(2, 2, 1, 4.0);
The third parameters, 0 and 1, indicate two different indexes into the p vector and therefore two different independent variables. If you omit this parameter when creating a DerivativeStructure, 0 is assumed, so in your code x0 and y0 are not independent.
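For reference, here is a complete minimal sketch of the corrected program (assuming commons-math3 on the classpath), together with the values one would expect at the point (2.0, 4.0):
import org.apache.commons.math3.analysis.differentiation.DerivativeStructure;

public class PartialDerivativesExample {
    public static void main(String[] args) {
        // 2 free parameters, derivatives up to order 2;
        // x0 is parameter index 0, y0 is parameter index 1
        DerivativeStructure x0 = new DerivativeStructure(2, 2, 0, 2.0);
        DerivativeStructure y0 = new DerivativeStructure(2, 2, 1, 4.0);

        DerivativeStructure f = x0.pow(2).add(y0.pow(2)); // f(x, y) = x^2 + y^2

        System.out.println(f.getValue());                 // 20.0
        System.out.println(f.getPartialDerivative(1, 0)); // df/dx   = 2x = 4.0
        System.out.println(f.getPartialDerivative(0, 1)); // df/dy   = 2y = 8.0
        System.out.println(f.getPartialDerivative(2, 0)); // d2f/dx2 = 2.0
        System.out.println(f.getPartialDerivative(0, 2)); // d2f/dy2 = 2.0
    }
}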
Further Reading

Detecting all abrupt changes in array

How do you find abrupt changes in an array? For example, if you have the following array:
1,3,8,14,58,62,69
In this case, there is a jump from 14 to 58.
OR
79,77,68,61,9,3,1
In this case, there is a drop from 61 to 9.
In both examples, there are small and big jumps. For example, in the 2nd case, there is a small drop from 77 to 68. However, this must be ignored if a larger jump/drop is found. I have the following algorithm in mind, but I am not sure if it will cover all possible cases:
ALGO
1. Iterate over the array
2. Compute the difference arr[i+1] - arr[i]
3. Store the first difference in a variable
4. If the next diff is bigger than the previous one, overwrite it
This algo will not work for the following case:
1, 2, 4, 6, 34, 38, 41, 67, 69, 71
There are two jumps in this array. So it should be arranged like
[1, 2, 4, 6], [34, 38, 41], [67, 69, 71]
In the end, this is pure statistics. You have a data set, and you are looking for certain forms of outliers. In that sense, your requirement to detect "abrupt changes" is not very precise.
I think you should step back here and have a deeper look into the mathematics behind your problem - to come up with clear "semantics" and crisp definitions for your actual problem (for example, based on average, deviation, etc.). The wikipedia link I gave above should be a good starting point for that part.
From there on, to get to a Java implementation, you might start looking here.
I would look into using a Moving Average; this involves looking at an average of the last X values. Do this based on the change in value (Y1 - Y2). Any large deviations from the average could be seen as a big shift.
However, given how small your datasets are, a moving average would likely yield bad results. With such a small sample size, it might be better to take an average of all values in the array instead:
double[] nums = new double[] {79, 77, 68, 61, 9, 3, 1};
double[] deltas = new double[nums.length - 1];
double advDelta = 0;
for (int i = 0; i < nums.length - 1; i++) {
    deltas[i] = nums[i + 1] - nums[i];
    advDelta += deltas[i] / deltas.length;
}

// search for deltas > average
for (int i = 0; i < deltas.length; i++) {
    if (Math.abs(deltas[i]) > Math.abs(advDelta)) {
        System.out.println("Big jump between " + nums[i] + " " + nums[i + 1]);
    }
}
This problem doesn't have an absolute solution; you'll have to determine thresholds for the context in which the solution is to be applied.
No algorithm can give us the rule for the jump. We as humans are able to determine these changes because we can see the entire data set at a glance. But if the data set is large enough, it would be difficult for us to say which jumps should be considered. For example, if on average the differences between consecutive numbers are 10, then any difference above that would be considered a jump. However, in a large data set there could be differences that are isolated spikes, or that establish a new normal difference, such as the differences suddenly going from 10 to 100. We would have to decide whether to detect jumps based on the difference average of 10 or of 100.
If we are interested in local spikes only, then it's possible to use a moving average as suggested by #ug_.
However, a moving average has to be moving, meaning we maintain a local set of numbers with a fixed size. On that set we calculate the average of the differences and then compare it to the local differences.
However, here we again face the problem of determining the size of the local set. This threshold determines the granularity of the jumps that we capture. A very large set will tend to ignore the closer jumps, while a smaller one will tend to produce false positives.
Following is a simple solution where you can try setting the thresholds. The local set size in this case is 3; that's the minimum that can be used, as it gives us the minimum number of differences required, which is 2.
public class TestJump {
    public static void main(String[] args) {
        int[] arr = {1, 2, 4, 6, 34, 38, 41, 67, 69, 71};
        //int[] arr = {1, 4, 8, 13, 19, 39, 60, 84, 109};
        double thresholdDeviation = 50; // percent jump to detect, set for your requirement
        double thresholdDiff = 3;       // minimum difference between consecutive differences, to avoid false positives like 1,2,4
        System.out.println("Started");
        for (int i = 1; i < arr.length - 1; i++) {
            double diffPrev = Math.abs(arr[i] - arr[i - 1]);
            double diffNext = Math.abs(arr[i + 1] - arr[i]);
            double deviation = Math.abs(diffNext - diffPrev) / diffPrev * 100;
            if (deviation > thresholdDeviation && Math.abs(diffNext - diffPrev) > thresholdDiff) {
                System.out.printf("Abrupt change # %d: (%d, %d, %d)%n", i, arr[i - 1], arr[i], arr[i + 1]);
                i++;
            }
            //System.out.println(deviation + " : " + Math.abs(diffNext - diffPrev));
        }
        System.out.println("Finished");
    }
}
Output
Started
Abrupt change # 3: (4, 6, 34)
Abrupt change # 6: (38, 41, 67)
Finished
If you're trying to solve a larger problem than just arrays, like finding spikes in medical data or images, then you should check out neural networks.

Learning the boolean AND function using a perceptron

I'm new to machine learning. I've written this code http://ideone.com/t9VOag for training a perceptron to learn the boolean AND function using the perceptron training rule.
The perceptron never learns the correct weights. Errors for inputs (1, -1) and (-1, 1) make the weights oscillate between 0.7999999999999999, 0.20000000000000004 and 0.7, 0.300000000000000, which is obvious since:
For input (1, -1):
target output - output given: t - o = 0 - 1 = -1
w1 = w1 + n*(t-o)*1    = w1 - n
w2 = w2 + n*(t-o)*(-1) = w2 + n
For input (-1, 1):
t - o = 0 - 1 = -1
w1 = w1 + n*(t-o)*(-1) = w1 + n
w2 = w2 + n*(t-o)*1    = w2 - n
The weights are getting increased and decreased by the same amount
If I include the weight w0 in the updates during learning, it reaches a solution (but w0 isn't supposed to be updated, is it?).
What is the correct implementation?
Take w0 out of the code altogether. Your perceptron should have 2 input nodes and 1 output node, with a single weight connecting each input node to the output node. Like this (excuse the bad ASCII art):
I1
  \
   \ W1
    \
     Out
    /
   / W2
  /
I2
You are effectively feeding in a strong bias by setting W0 to 1.
In contrast to your statement "but w0 isn't supposed to be updated": w0 is supposed to be updated; however, the input to w0 is always your unchangeable bias (for example 1).
Intuition: look at your problem; you have two inputs that could be either 1 or -1, and they could swap positions without affecting the result. This is the nature of the "and" operator. Therefore, w1 should be equal to w2, and the bias weight (w0) should be zero.
Briefly, your code is correct and you should just uncomment the w0 update.
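The linked ideone code isn't reproduced here, but as a rough sketch of the idea: a perceptron that treats w0 as a bias weight with a constant input of 1, and updates it like any other weight, could look like this (names, encoding, and learning rate are illustrative):
// Sketch: perceptron training rule for boolean AND, inputs/targets encoded as -1/+1,
// with a bias weight w0 whose input is always 1
public class PerceptronAnd {
    public static void main(String[] args) {
        double[][] x = {{-1, -1}, {-1, 1}, {1, -1}, {1, 1}};
        double[] t = {-1, -1, -1, 1};   // AND targets
        double w0 = 0, w1 = 0, w2 = 0;  // w0 is the bias weight
        double n = 0.1;                 // learning rate

        for (int epoch = 0; epoch < 100; epoch++) {
            for (int i = 0; i < x.length; i++) {
                double o = Math.signum(w0 * 1 + w1 * x[i][0] + w2 * x[i][1]);
                // perceptron rule: w <- w + n * (t - o) * input
                w0 += n * (t[i] - o) * 1;
                w1 += n * (t[i] - o) * x[i][0];
                w2 += n * (t[i] - o) * x[i][1];
            }
        }
        System.out.println("w0 = " + w0 + ", w1 = " + w1 + ", w2 = " + w2);
    }
}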

Delta training rule for perceptron training

I'm trying to train a perceptron for the AND boolean function using the delta training rule. But even after convergence, it's wrongly classifying the inputs (1 input, actually). Could you please tell me where I am going wrong: http://ideone.com/CDgTQE
This is the training function used:
public void trianWithDelta(Example[] examples) {
    for (int i = 0; i < 1000; ++i) {
        dw1 = 0;
        dw2 = 0;
        for (Example ex : examples) {
            double o = computeOutput(ex);
            double t = ex.o;
            dw1 = dw1 + n * (t - o) * ex.x1;
            dw2 = dw2 + n * (t - o) * ex.x2;
        }
        w1 += dw1;
        w2 += dw2;
    }
}
The training examples (boolean AND):
Example[] examples = new Example[]{
    new Example(-1, -1, -1),
    new Example(-1,  1, -1),
    new Example( 1, -1, -1),
    new Example( 1,  1,  1)
};
Results :
w1 : 0.49999999999999994 w2 : 0.5000000000000002
Tests using the training examples after training :
-1
1 (incorrect)
-1
1
Your code is actually correct; the problem lies in your understanding of what can be learned with an unbiased perceptron and what can't.
If you do not have a bias, then learning AND is nearly impossible because:
- there is exactly one angle separating your data, which is realized by the line y = -x; in your code it would mean that w1 = w2, and even the slightest difference between their values (such as 1e-20) will break the classifier
- your classifier actually answers three values (as you use the sign function): -1, 0, 1, while it is impossible to separate AND without a bias in such a setting, as you need to answer -1 when the activation is 0
Try to draw the correct separator on a piece of paper; you will notice that without a bias your line has to cross (0,0), so it has to be y = -x, and consequently for (-1,1) and (1,-1) the activation is 0.
Both problems can be solved by just adding a bias node (and this is what you should do); see the sketch below.
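As an illustration only (it borrows the names n, w1, w2, Example, and computeOutput from the question's code, adds a new field w0, and assumes computeOutput includes the bias term), the batch delta-rule update with a bias weight might look roughly like this:
public void trainWithDeltaAndBias(Example[] examples) {
    for (int i = 0; i < 1000; ++i) {
        double dw0 = 0, dw1 = 0, dw2 = 0;
        for (Example ex : examples) {
            double o = computeOutput(ex); // assumed to compute sign(w0*1 + w1*x1 + w2*x2)
            double t = ex.o;
            dw0 += n * (t - o) * 1;       // bias input is always 1
            dw1 += n * (t - o) * ex.x1;
            dw2 += n * (t - o) * ex.x2;
        }
        w0 += dw0;
        w1 += dw1;
        w2 += dw2;
    }
}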
You can also change "a bit" definition of AND - for example by encoding "False" as -2
Example[] examples = new Example[]{
    new Example(-2, -2, -2),
    new Example(-2,  1, -2),
    new Example( 1, -2, -2),
    new Example( 1,  1,  1)
};
And running your code then behaves as expected:
Trained weights : 0.6363636363636364 0.6363636363636364
-1
-1
-1
1
