I'm trying to do a simple prediction in DL4J (I plan to use it later for a large dataset with n features), but no matter what I do my network doesn't learn and behaves very strangely. I studied the tutorials and followed the same steps shown in the dl4j repo, but somehow it doesn't work for me.
For dummy feature data I use:
double[val][x] features, where val = linspace(-10, 10)... and x = Math.sqrt(Math.abs(val)) * val;
My y is: double[y] labels, where y = Math.sin(val) / val
DataSetIterator dataset_train_iter = getTrainingData(x_features, y_outputs_train, batchSize, rnd);
DataSetIterator dataset_test_iter = getTrainingData(x_features_test, y_outputs_test, batchSize, rnd);
// Normalize data, including labels (fitLabel=true)
NormalizerMinMaxScaler normalizer = new NormalizerMinMaxScaler(0, 1);
normalizer.fitLabel(false);
normalizer.fit(dataset_train_iter);
normalizer.fit(dataset_test_iter);
// Use the .transform function only if you are working with a small dataset and no iterator
normalizer.transform(dataset_train_iter.next());
normalizer.transform(dataset_test_iter.next());
dataset_train_iter.setPreProcessor(normalizer);
dataset_test_iter.setPreProcessor(normalizer);
//DataSet setNormal = dataset.next();
//Create the network
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(seed)
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
.weightInit(WeightInit.XAVIER)
//.miniBatch(true)
//.l2(1e-4)
//.activation(Activation.TANH)
.updater(new Nesterovs(0.1,0.3))
.list()
.layer(new DenseLayer.Builder().nIn(numInputs).nOut(20).activation(Activation.TANH)
.build())
.layer(new DenseLayer.Builder().nIn(20).nOut(10).activation(Activation.TANH)
.build())
.layer( new DenseLayer.Builder().nIn(10).nOut(6).activation(Activation.TANH)
.build())
.layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
.activation(Activation.IDENTITY)
.nIn(6).nOut(1).build())
.build();
//Train and fit network
final MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
net.setListeners(new ScoreIterationListener(100));
//Train the network on the full data set, and evaluate it periodically
final INDArray[] networkPredictions = new INDArray[nEpochs / plotFrequency];
for (int i = 0; i < nEpochs; i++) {
//fit already performs backpropagation. See the deeplearning4j release notes:
// https://deeplearning4j.konduit.ai/release-notes/1.0.0-beta3
net.fit(dataset_train_iter);
dataset_train_iter.reset();
if((i+1) % plotFrequency == 0) networkPredictions[i/ plotFrequency] = net.output(x_features, false);
}
// evaluate and plot
dataset_test_iter.reset();
dataset_train_iter.reset();
INDArray predicted = net.output(dataset_test_iter, false);
System.out.println("PREDICTED ARRAY " + predicted);
INDArray output_train = net.output(dataset_train_iter, false);
//Revert data back to original values for plotting
// normalizer.revertLabels(predicted);
normalizer.revertLabels(output_train);
normalizer.revertLabels(predicted);
PlotUtil.plot(om, y_outputs_train, networkPredictions);
My output then looks very weird (see picture below), even when I use mini-batches (1, 20, or 100 samples per batch), change the number of epochs, or add hidden nodes and hidden layers (I tried adding 1000 nodes and 5 layers). The network outputs either very noisy values or one constant y. I can't figure out what is going wrong here. Why doesn't the network even approximate the training function?
Another question: what does iter.reset() do exactly? Does it move the iterator's pointer back to batch 0 in the DataSetIterator?
A pretty common problem for people doing toy problems like this is dl4j's assumption of minibatches (which is what 99% of problems use). You aren't actually doing minibatch learning, which defeats the point of using an iterator: an iterator is meant to iterate through slices of a dataset, not a small in-memory dataset. A small recommendation is to just use the normal dataset api (which is what's returned from dataset.next()).
Ensure you turn off the minibatch penalty dl4j assigns to all losses with:
.miniBatch(false) - you can see that configuration here:
https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/NeuralNetConfiguration.java#L434
A unit test testing this behavior can be found here:
https://github.com/eclipse/deeplearning4j/blob/b4047006ac8175df295c2f3c008e7601437ea4dc/deeplearning4j/deeplearning4j-core/src/test/java/org/deeplearning4j/gradientcheck/GradientCheckTests.java#L94
For posterity, here is the relevant configuration:
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder().miniBatch(false)
.dataType(DataType.DOUBLE)
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).updater(new NoOp())
.list()
.layer(0,
new DenseLayer.Builder().nIn(4).nOut(3)
.dist(new NormalDistribution(0, 1))
.activation(Activation.TANH)
.build())
.layer(1, new OutputLayer.Builder(LossFunction.MCXENT)
.activation(Activation.SOFTMAX).nIn(3).nOut(3).build())
.build();
You'll notice 2 things: first, minibatch is set to false, and second, the data type is configured as double. You are also welcome to try that for your problem.
To save memory, dl4j also tends to assume float as the default data type.
This is a reasonable assumption when working on larger problems, but may not work well for toy problems.
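To get a feel for the difference between the two data types, here is a small sketch in plain Java (not dl4j code; the class and method names are made up for illustration). It accumulates the same value many times in a float and in a double, which is roughly the kind of rounding drift that can matter on small, precise toy problems:

```java
// Illustrative only: plain Java arithmetic, no dl4j involved.
final class PrecisionDemo {

    /** Adds 0.1 to a float accumulator n times. */
    static float sumFloat(int n) {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            sum += 0.1f;
        }
        return sum;
    }

    /** Adds 0.1 to a double accumulator n times. */
    static double sumDouble(int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += 0.1;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // The mathematically exact result would be 100000.
        System.out.println("float  sum: " + sumFloat(n));
        System.out.println("double sum: " + sumDouble(n));
    }
}
```

The float result drifts visibly away from 100000, while the double result stays very close to it.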
For reference, you can find the application of the minibatch math here:
https://github.com/eclipse/deeplearning4j/blob/fc735d30023981ebbb0fafa55ea9520ec44292e0/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/updater/BaseMultiLayerUpdater.java#L332
This affects the gradient updates.
The score penalty can be found in the output layer:
https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/layers/BaseOutputLayer.java#L84
Essentially, both of these automatically apply the minibatch penalty to your dataset, and it shows up in both the reported loss and the gradient updates.
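As a rough sketch of what that scaling amounts to (plain Java, not the dl4j source; all names are hypothetical, and it assumes the linked code divides by the number of examples in the batch when minibatch is on):

```java
// Illustrative only: a one-parameter model y = w * x with a squared-error loss.
final class MiniBatchScaling {

    /** Sum of squared errors over all examples in the batch. */
    static double summedLoss(double w, double[] x, double[] y) {
        double loss = 0.0;
        for (int i = 0; i < x.length; i++) {
            double error = w * x[i] - y[i];
            loss += error * error;
        }
        return loss;
    }

    /** Derivative of the summed loss with respect to w. */
    static double summedGradient(double w, double[] x, double[] y) {
        double gradient = 0.0;
        for (int i = 0; i < x.length; i++) {
            gradient += 2.0 * (w * x[i] - y[i]) * x[i];
        }
        return gradient;
    }

    /** Assumed behaviour: divide by the batch size only when miniBatch is true. */
    static double scale(double value, int batchSize, boolean miniBatch) {
        return miniBatch ? value / batchSize : value;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {2, 4, 6, 8};
        double loss = summedLoss(1.0, x, y);
        double gradient = summedGradient(1.0, x, y);
        System.out.println("miniBatch=true : loss=" + scale(loss, x.length, true)
                + ", gradient=" + scale(gradient, x.length, true));
        System.out.println("miniBatch=false: loss=" + scale(loss, x.length, false)
                + ", gradient=" + scale(gradient, x.length, false));
    }
}
```

With the whole (small) dataset passed in as one batch, the same learning rate therefore produces a much smaller step when the division is applied.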
I have this code,
public class TfIdfExample {
public static void main(String[] args){
JavaSparkContext sc = SparkSingleton.getContext();
SparkSession spark = SparkSession.builder()
.config("spark.sql.warehouse.dir", "spark-warehouse")
.getOrCreate();
JavaRDD<List<String>> documents = sc.parallelize(Arrays.asList(
Arrays.asList("this is a sentence".split(" ")),
Arrays.asList("this is another sentence".split(" ")),
Arrays.asList("this is still a sentence".split(" "))), 2);
HashingTF hashingTF = new HashingTF();
documents.cache();
JavaRDD<Vector> featurizedData = hashingTF.transform(documents);
// alternatively, CountVectorizer can also be used to get term frequency vectors
IDF idf = new IDF();
IDFModel idfModel = idf.fit(featurizedData);
featurizedData.cache();
JavaRDD<Vector> tfidfs = idfModel.transform(featurizedData);
System.out.println(tfidfs.collect());
KMeansProcessor kMeansProcessor = new KMeansProcessor();
JavaPairRDD<Vector,Integer> result = kMeansProcessor.Process(tfidfs);
result.collect().forEach(System.out::println);
}
}
I need to get vectors for k-means, but I'm getting odd vectors:
[(1048576,[489554,540177,736740,894973],[0.28768207245178085,0.0,0.0,0.0]),
(1048576,[455491,540177,736740,894973],[0.6931471805599453,0.0,0.0,0.0]),
(1048576,[489554,540177,560488,736740,894973],[0.28768207245178085,0.0,0.6931471805599453,0.0,0.0])]
After k-means runs, I get this:
((1048576,[489554,540177,736740,894973],[0.28768207245178085,0.0,0.0,0.0]),1)
((1048576,[489554,540177,736740,894973],[0.28768207245178085,0.0,0.0,0.0]),0)
((1048576,[489554,540177,736740,894973],[0.28768207245178085,0.0,0.0,0.0]),1)
((1048576,[455491,540177,736740,894973],[0.6931471805599453,0.0,0.0,0.0]),1)
((1048576,[489554,540177,560488,736740,894973],[0.28768207245178085,0.0,0.6931471805599453,0.0,0.0]),1)
((1048576,[455491,540177,736740,894973],[0.6931471805599453,0.0,0.0,0.0]),0)
((1048576,[455491,540177,736740,894973],[0.6931471805599453,0.0,0.0,0.0]),1)
((1048576,[489554,540177,560488,736740,894973],[0.28768207245178085,0.0,0.6931471805599453,0.0,0.0]),0)
((1048576,[489554,540177,560488,736740,894973],[0.28768207245178085,0.0,0.6931471805599453,0.0,0.0]),1)
But I think it doesn't work correctly, because TF-IDF output should look different.
I think mllib has ready-made methods for this, but I tried the documentation examples and didn't get what I need. I haven't found a custom solution for Spark. Maybe somebody has worked with this and can tell me what I'm doing wrong? Maybe I'm not using the mllib functionality correctly?
What you are getting after TF-IDF is a SparseVector.
To understand the values better, let me start with TF vectors:
(1048576,[489554,540177,736740,894973],[1.0,1.0,1.0,1.0])
(1048576,[455491,540177,736740,894973],[1.0,1.0,1.0,1.0])
(1048576,[489554,540177,560488,736740,894973],[1.0,1.0,1.0,1.0,1.0])
For instance, the TF vector corresponding to the first sentence is a 1048576 (= 2^20) component vector with 4 non-zero values at indices 489554, 540177, 736740 and 894973. All other values are zero and therefore not stored in the sparse vector representation.
The dimensionality of the feature vectors is equal to the number of buckets you hash into: 1048576 = 2^20 buckets in your case.
For a corpus of this size, you should consider reducing the number of buckets:
HashingTF hashingTF = new HashingTF(32);
Powers of 2 are recommended so that terms are spread evenly over the buckets, which helps keep the number of hash collisions down.
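To make the bucket idea concrete, here is a minimal sketch of the hashing trick in plain Java. It is not Spark's implementation: String.hashCode stands in for whatever hash function Spark actually uses, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a simplified hashing-TF, not Spark's HashingTF.
final class HashingTfSketch {

    /** Maps a term to a bucket in [0, numFeatures). Assumption: String.hashCode as the hash. */
    static int bucket(String term, int numFeatures) {
        return Math.floorMod(term.hashCode(), numFeatures);
    }

    /** Sparse term-frequency vector: bucket index -> number of terms hashed into it. */
    static Map<Integer, Double> termFrequencies(String[] terms, int numFeatures) {
        Map<Integer, Double> tf = new TreeMap<>();
        for (String term : terms) {
            tf.merge(bucket(term, numFeatures), 1.0, Double::sum);
        }
        return tf;
    }

    public static void main(String[] args) {
        String[] sentences = {
            "this is a sentence", "this is another sentence", "this is still a sentence"
        };
        for (String sentence : sentences) {
            // Only the non-zero buckets are stored, as in a sparse vector.
            System.out.println(termFrequencies(sentence.split(" "), 32));
        }
    }
}
```

Two different terms that land in the same bucket are simply counted together, which is why a very small number of buckets makes collisions more likely.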
Next, you apply IDF weights:
(1048576,[489554,540177,736740,894973],[0.28768207245178085,0.0,0.0,0.0])
(1048576,[455491,540177,736740,894973],[0.6931471805599453,0.0,0.0,0.0])
(1048576,[489554,540177,560488,736740,894973],[0.28768207245178085,0.0,0.6931471805599453,0.0,0.0])
If we look at the first sentence again, we get 3 zeros. This is expected, since the terms "this", "is", and "sentence" appear in every document of the corpus, so by the definition of IDF their weights are equal to zero.
Why are the zero values still in the (sparse) vector? Because in the current implementation the size of the vector is kept the same and only the values are multiplied by the IDF.
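The numbers above can be reproduced by hand. A small sketch, using the smoothed formula from the Spark mllib documentation, idf = log((m + 1) / (d(t) + 1)), where m is the number of documents and d(t) the number of documents containing the term (the class name is hypothetical):

```java
// Illustrative only: recomputes the IDF weights for the 3-document corpus above.
final class IdfSketch {

    /** Smoothed inverse document frequency: log((m + 1) / (d(t) + 1)). */
    static double idf(long numDocs, long docFreq) {
        return Math.log((numDocs + 1.0) / (docFreq + 1.0));
    }

    public static void main(String[] args) {
        long numDocs = 3;
        // "this", "is", "sentence" appear in all 3 documents.
        System.out.println("in 3 of 3 docs: " + idf(numDocs, 3));
        // "a" appears in 2 documents.
        System.out.println("in 2 of 3 docs: " + idf(numDocs, 2));
        // "another" and "still" appear in 1 document each.
        System.out.println("in 1 of 3 docs: " + idf(numDocs, 1));
    }
}
```

Since every TF value here is 1.0, the TF-IDF entries are just these weights: zero for the terms shared by all documents, about 0.288 for "a", and about 0.693 for "another" and "still".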
I'm new to the deeplearning4j library, but I've got some experience with neural networks in general.
I'm trying to train a recurrent neural network (an LSTM in particular) which is supposed to detect beats in music in real time. All examples for using recurrent neural nets with deeplearning4j that I've found so far use a reader which reads the training data from a file. As I want to record music in real time via a microphone, I can't read some pregenerated file, so the data which is fed into the neural network is generated in real time by my application.
This is the code that I'm using to generate my network:
NeuralNetConfiguration.ListBuilder builder = new NeuralNetConfiguration.Builder()
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
.learningRate(0.1)
.rmsDecay(0.95)
.regularization(true)
.l2(0.001)
.weightInit(WeightInit.XAVIER)
.updater(Updater.RMSPROP)
.list();
int nextIn = hiddenLayers.length > 0 ? hiddenLayers[0] : numOutputs;
builder = builder.layer(0, new GravesLSTM.Builder().nIn(numInputs).nOut(nextIn).activation("softsign").build());
for(int i = 0; i < hiddenLayers.length - 1; i++){
nextIn = hiddenLayers[i + 1];
builder = builder.layer(i + 1, new GravesLSTM.Builder().nIn(hiddenLayers[i]).nOut(nextIn).activation("softsign").build());
}
builder = builder.layer(hiddenLayers.length, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT).nIn(nextIn).nOut(numOutputs).activation("softsign").build());
MultiLayerConfiguration conf = builder.backpropType(BackpropType.TruncatedBPTT).tBPTTForwardLength(DEFAULT_RECURRENCE_DEPTH).tBPTTBackwardLength(DEFAULT_RECURRENCE_DEPTH)
.pretrain(false).backprop(true)
.build();
net = new MultiLayerNetwork(conf);
net.init();
In this case I'm using about 700 inputs (which is mostly FFT-data of the recorded audio), 1 output (which is supposed to output a number between 0 [no beat] and 1 [beat]) and my hiddenLayers array consists of the ints {50, 25, 10}.
For getting the output of the network I'm using this code:
double[] output = new double[]{net.rnnTimeStep(Nd4j.create(netInputData)).getDouble(0)};
where netInputData is the data I want to input into the network as a one-dimensional double array.
I'm relatively sure that this code is working fine, since I get some output for an untrained network which looks something like this when I plot it.
However, once I try to train a network (even if I train it just for a short time, which should alter the weights of the network just a little bit, so that the output should be very similar to the untrained network), I get an output which looks like a constant.
This is the code which I'm using to train the network:
for(int timestep = 0; timestep < trainingData.length - DEFAULT_RECURRENCE_DEPTH; timestep++){
INDArray inputDataArray = Nd4j.create(new int[]{1, numInputs, DEFAULT_RECURRENCE_DEPTH},'f');
for(int inputPos = 0; inputPos < trainingData[timestep].length; inputPos++)
for(int inputTimeWindowPos = 0; inputTimeWindowPos < DEFAULT_RECURRENCE_DEPTH; inputTimeWindowPos++)
inputDataArray.putScalar(new int[]{0, inputPos, inputTimeWindowPos}, trainingData[timestep + inputTimeWindowPos][inputPos]);
INDArray desiredOutputDataArray = Nd4j.create(new int[]{1, numOutputs, DEFAULT_RECURRENCE_DEPTH},'f');
for(int outputPos = 0; outputPos < desiredOutputData[timestep].length; outputPos++)
for(int inputTimeWindowPos = 0; inputTimeWindowPos < DEFAULT_RECURRENCE_DEPTH; inputTimeWindowPos++)
desiredOutputDataArray.putScalar(new int[]{0, outputPos, inputTimeWindowPos}, desiredOutputData[timestep + inputTimeWindowPos][outputPos]);
net.fit(new DataSet(inputDataArray, desiredOutputDataArray));
}
Once again, I've got my data for the input and for the desired output as a double array. This time the two arrays are two-dimensional. The first index represents the time (where index 0 is the first audio data of the recorded audio) and the second index represents the input (or respectively the desired output) for this time step.
Given the shown output after training a network, I tend to think that there must be something wrong with my code used for creating the INDArrays from my data. Am I missing some important step for initializing these arrays or did I mess up the order I need to put my data into these arrays?
Thank you for any help in advance.
I'm not sure, but perhaps 99.99% of your training examples are 0, with only an occasional 1 exactly where the beat occurs. This might be too imbalanced to learn. Good luck.
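A quick way to check this, as a sketch: count how many time steps in your desired output are actually labelled as a beat. The class and method names are hypothetical; the [time][output] layout is the one described in the question, and only the first output is inspected.

```java
// Illustrative only: measures how imbalanced the beat labels are.
final class LabelBalance {

    /** Fraction of time steps whose first output value is above the threshold. */
    static double positiveFraction(double[][] labels, double threshold) {
        if (labels.length == 0) {
            return 0.0;
        }
        int positives = 0;
        for (double[] step : labels) {
            if (step[0] > threshold) {
                positives++;
            }
        }
        return (double) positives / labels.length;
    }

    public static void main(String[] args) {
        // Made-up labels: one beat in eight time steps.
        double[][] desiredOutputData = {{0}, {0}, {0}, {1}, {0}, {0}, {0}, {0}};
        System.out.println("fraction of beats: " + positiveFraction(desiredOutputData, 0.5));
    }
}
```

If the fraction is tiny, a network that always outputs "no beat" already gets a very low loss, which would match the constant output you see.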
I have a dense symmetric matrix of size about 30000 X 30000 that contains distances between strings. Since the distance is symmetric, the upper triangle of the matrix is stored in a tab-separated 3-column file of the form
stringA<tab>stringB<tab>distance
I am using HashMap and org.javatuples.Pair to create a map to quickly look up distances for given pairs of strings, as follows:
import org.javatuples.Pair;
HashMap<Pair<String,String>,Double> pairScores = new HashMap<Pair<String,String>,Double>();
BufferedReader bufferedReader = new BufferedReader(new FileReader("data.txt"));
String line = null;
while((line = bufferedReader.readLine()) != null) {
String [] parts = line.split("\t");
String d1 = parts[0];
String d2 = parts[1];
Double score = Double.parseDouble(parts[2]);
Pair<String,String> p12 = new Pair<String,String>(d1,d2);
Pair<String,String> p21 = new Pair<String,String>(d2,d1);
pairScores.put(p12, score);
pairScores.put(p21, score);
}
data.txt is very big (~400M lines) and the process eventually slows down to a crawl with most time being spent in java.util.HashMap.put.
I don't think there should be (m)any hash code collisions on pairs, but I might be wrong. How can I verify this? Is it enough to simply look at how unique p12.hashCode() and p21.hashCode() are?
If there are no collisions, what else could be causing the slowdown?
Is there a better way to construct this matrix for quick lookup?
I am now using Guava's Table<Integer, Integer, Double>, after also realizing that my strings are unique enough that I could use their hashes, instead of the strings themselves, as keys to reduce memory requirements. The creation of the table runs in reasonable time; however, there are issues with serializing and deserializing the resulting objects: I ran into out-of-memory errors even with the move from String to Integer. It seems to be working after I decided not to store both a-b and b-a pairs, but I might be balancing on the edge of what my machine can handle.
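For anyone trying the same thing, here is a minimal sketch of the "store only one of a-b and b-a" idea using only the JDK. It assumes each string has already been reduced to an int id (for example its hash, with the caveat that distinct strings can share a hash); the class and method names are hypothetical, and this is not the Guava-based code I actually run.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: symmetric lookup with a single entry per unordered pair.
final class SymmetricDistances {

    private final Map<Long, Double> scores = new HashMap<>();

    /** Packs two int ids into one long, smaller id first, so (a, b) and (b, a) share a key. */
    static long key(int a, int b) {
        int lo = Math.min(a, b);
        int hi = Math.max(a, b);
        return ((long) lo << 32) | (hi & 0xFFFFFFFFL);
    }

    void put(int a, int b, double distance) {
        scores.put(key(a, b), distance);
    }

    /** Returns the stored distance, or null if the pair is unknown. */
    Double get(int a, int b) {
        return scores.get(key(a, b));
    }

    int size() {
        return scores.size();
    }

    public static void main(String[] args) {
        SymmetricDistances distances = new SymmetricDistances();
        distances.put("stringA".hashCode(), "stringB".hashCode(), 0.42);
        // Lookup works in either order, with only one entry stored.
        System.out.println(distances.get("stringB".hashCode(), "stringA".hashCode()));
        System.out.println(distances.size());
    }
}
```

The canonical ordering halves the number of entries compared with inserting both p12 and p21, and a single long key avoids allocating a Pair object per entry.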