weka GUI and Java code give different results - java

I'm using the Weka Java API to run a grid search for the optimal parameters of a MultilayerPerceptron. However, the RMSE (this is a regression task) reported by my Java code differs from the one reported by the Weka GUI. Here is the code:
public class ANN {
/**
* @param args
*/
public static void main(String[] args) throws Exception{
DataSource source = new DataSource("/home/yongfeng/ML/Project/choose_openning_price/holdout.arff");
Instances raw = source.getDataSet();
int trainSize = (int) Math.round(raw.numInstances()*0.666666666);
int testSize = raw.numInstances() - trainSize;
Instances train = new Instances(raw, 0, trainSize);
Instances test = new Instances(raw, trainSize, testSize);
train.setClassIndex(0);
test.setClassIndex(0);
final int sizeOfSearch = 15;
double[][] resultsArray = new double[sizeOfSearch][sizeOfSearch];
for (int i=0;i < sizeOfSearch;i++){
for (int j=0;j < sizeOfSearch;j++){
double m = i;
double k = j;
double learningRate = (m+1)/1000;
double momentum = (k+1)/100;
MultilayerPerceptron ann = new MultilayerPerceptron();
String options = String.format("-L %f -M %f -N 500 -V 0 -S 0 -E 20 -H a", learningRate, momentum);
ann.setOptions(weka.core.Utils.splitOptions(options));
ann.buildClassifier(train);
Evaluation eval = new Evaluation(train);
eval.evaluateModel(ann, test);
double error = eval.rootMeanSquaredError();
System.out.println("learningRate: " + learningRate + "\tMomentum: " + momentum + "\tError: " + error);
printOptions(ann.getOptions());
resultsArray[i][j] = error;
ann = null;
eval = null;
}
}
}
}
I even printed the options in each iteration, and they are the same as those in the Weka GUI. The attribute to be predicted is the first one, hence setClassIndex(0), and I used a train/test split for the evaluation. Can anybody help? Many thanks!

Use the weka.jar from your Weka installation folder in your Java code, so that the API and the GUI run the same Weka version.
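Besides the jar version, it may be worth checking that both runs evaluate on the same rows. The code in the question takes the first two thirds of the file, in file order, as the training set. If the GUI's percentage split shuffles the data before splitting (the Explorer has a "Preserve order for % Split" option for this), the two test sets would differ and so would the RMSE. The sketch below is only an illustration with plain java.util: row indices stand in for instances, and the class and method names (HoldoutSplit, order, trainSize) are hypothetical. In Weka itself, the corresponding shuffle would be Instances.randomize(Random).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class HoldoutSplit {
    /** Returns the row indices 0..n-1, optionally shuffled with a fixed seed. */
    static List<Integer> order(int n, boolean shuffle, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            idx.add(i);
        }
        if (shuffle) {
            Collections.shuffle(idx, new Random(seed));
        }
        return idx;
    }

    /** Size of the training part, computed the same way as in the question. */
    static int trainSize(int n, double fraction) {
        return (int) Math.round(n * fraction);
    }

    public static void main(String[] args) {
        int n = 12; // illustrative number of rows
        int trainSize = trainSize(n, 0.666666666);
        List<Integer> inOrder = order(n, false, 1L);
        List<Integer> shuffled = order(n, true, 1L);
        // Same split sizes, but different rows end up in the test set.
        System.out.println("train (file order): " + inOrder.subList(0, trainSize));
        System.out.println("test  (file order): " + inOrder.subList(trainSize, n));
        System.out.println("train (shuffled):   " + shuffled.subList(0, trainSize));
        System.out.println("test  (shuffled):   " + shuffled.subList(trainSize, n));
    }
}
```

If the two setups turn out to use different rows, making them agree (either shuffling in code with the same seed or preserving order in the GUI) is a reasonable first step before comparing errors.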

Related

How to generate random numbers with uniform distribution in Java?

I'm having trouble generating random numbers with a uniform distribution in Java, given the maximum and minimum values of the attributes in a data set (Iris from the UCI Machine Learning Repository). I have the Iris data set in a 2-D array called samples. I put random values, bounded by the maximum and minimum of each attribute in the Iris data set (excluding the class attribute), in a 2-D array called gworms (which has some extra fields for other values used by the algorithm).
So far, the full algorithm is not working properly, and I suspect that the gworms (the points in 4-D space) are not being generated correctly or with good randomness. I think the points are too close to each other (I suspect this because of results obtained later, whose code is not shown here). So I'm asking for your help to validate this code, in which I implement a "uniform distribution" for the gworms (for the first 4 positions):
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package glowworms;
import java.lang.Math;
import java.util.ArrayList;
import java.util.Random;
import weka.core.AttributeStats;
import weka.core.Instances;
/**
*
* @author oscareduardo937
*/
public class GSO {
/* ************ Initializing parameters of CGSO algorithm ******************** */
int swarmSize = 1000; // Swarm size m
int maxIte = 200;
double stepSize = 0.03; // Step size for the movements
double luciferin = 5.0; // Initial luciferin level
double rho = 0.4; // Luciferin decay parameter
double gamma = 0.6; // Luciferin reinforcement parameter
double rs = 0.38; // Initial radial sensor range. This parameter depends on the data set and needs to be found by running experiments
double gworms[][] = null; // Glowworms of the swarm.
/* ************ Initializing parameters of clustering problem and data set ******************** */
int numAtt; // Dimension of the position vector
int numClasses; // Number of classes
int total_data; //Number of instances
int threshold = 5;
int runtime = 1;
/*Algorithm can be run many times in order to see its robustness*/
double minValuesAtts[] = new double[this.numAtt]; // Minimum values for all attributes
double maxValuesAtts[] = new double[this.numAtt]; // Maximum values for all attributes
double samples[][] = new double[this.total_data][this.numAtt]; //Samples of the selected dataset.
ArrayList<Integer> candidateList;
double r;
/*a random number in the range [0,1)*/
/* *********** Method to put the instances in a matrix and get max and min values for attributes ******************* */
public void instancesToSamples(Instances data) {
this.numAtt = data.numAttributes();
System.out.println("********* NumAttributes: " + this.numAtt);
AttributeStats attStats = new AttributeStats();
if (data.classIndex() == -1) {
//System.out.println("reset index...");
data.setClassIndex(data.numAttributes() - 1);
}
this.numClasses = data.numClasses();
this.minValuesAtts = new double[this.numAtt];
this.maxValuesAtts = new double[this.numAtt];
System.out.println("********* NumClasses: " + this.numClasses);
this.total_data = data.numInstances();
samples = new double[this.total_data][this.numAtt];
double[] values = new double[this.total_data];
for (int j = 0; j < this.numAtt; j++) {
values = data.attributeToDoubleArray(j);
for (int i = 0; i < this.total_data; i++) {
samples[i][j] = values[i];
}
}
for(int j=0; j<this.numAtt-1; j++){
attStats = data.attributeStats(j);
this.maxValuesAtts[j] = attStats.numericStats.max;
this.minValuesAtts[j] = attStats.numericStats.min;
//System.out.println("** Min Value Attribute " + j + ": " + this.minValuesAtts[j]);
//System.out.println("** Max Value Attribute " + j + ": " + this.maxValuesAtts[j]);
}
//Checking
/*for(int i=0; i<this.total_data; i++){
for(int j=0; j<this.numAtt; j++){
System.out.print(samples[i][j] + "** ");
}
System.out.println();
}*/
} // End of method InstancesToSamples
public void initializeSwarm(Instances data) {
this.gworms = new double[this.swarmSize][this.numAtt + 2]; // D-dimensional vector plus luciferin, fitness and intradistance.
double intraDistance = 0;
Random r = new Random(); //Random r;
for (int i = 0; i < this.swarmSize; i++) {
for (int j = 0; j < this.numAtt - 1; j++) {
//Uniform randomization of d-dimensional position vector
this.gworms[i][j] = this.minValuesAtts[j] + (this.maxValuesAtts[j] - this.minValuesAtts[j]) * r.nextDouble();
}
this.gworms[i][this.numAtt - 1] = this.luciferin; // Initial luciferin level for all swarm
this.gworms[i][this.numAtt] = 0; // Initial fitness for all swarm
this.gworms[i][this.numAtt + 1] = intraDistance; // Intra-distance for gworm i
}
//Checking gworms
/*for(int i=0; i<this.swarmSize; i++){
for(int j=0; j<this.numAtt+2; j++){
System.out.print(gworms[i][j] + "** ");
}
System.out.println();
}*/
} // End of method initializeSwarm
}
The main class is this one:
package uniformrandomization;
/**
*
* @author oscareduardo937
*/
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileNotFoundException;
import weka.core.Instances;
import glowworms.GSO;
public class UniformRandomization {
public UniformRandomization(){
super();
}
//Loading the data from the filename file to the program. It can be .arff or .csv
public static BufferedReader readDataFile(String filename) {
BufferedReader inputReader = null;
try {
inputReader = new BufferedReader(new FileReader(filename));
} catch (FileNotFoundException ex) {
System.err.println("File not found: " + filename);
}
return inputReader;
}
/**
* @param args the command line arguments
*/
public static void main(String[] args) throws Exception {
// TODO code application logic here
BufferedReader datafile1 = readDataFile("src/data/iris.arff");
Instances data = new Instances(datafile1);
GSO gso = new GSO();
gso.instancesToSamples(data);
gso.initializeSwarm(data);
System.out.println("Fin...");
}
}
So I want to know whether, with this code, the value at position ij of gworms is generated within the range between the minimum and maximum values of attribute j.
Thanks so much in advance.
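The expression used in initializeSwarm, min + (max - min) * r.nextDouble(), maps a uniform draw from [0, 1) onto the attribute's range, so each coordinate should stay between the minimum and the maximum. A minimal sketch to check this in isolation is given below. The class and method names (UniformInRange, sample, initialize, allInRange) are hypothetical, and the ranges are illustrative Iris-like values typed in by hand rather than read from the ARFF file.

```java
import java.util.Random;

class UniformInRange {
    /** Draws one value from [min, max] as min + (max - min) * U, with U uniform in [0, 1). */
    static double sample(Random rng, double min, double max) {
        return min + (max - min) * rng.nextDouble();
    }

    /** Creates `size` points with one uniformly drawn coordinate per attribute range. */
    static double[][] initialize(int size, double[] mins, double[] maxs, long seed) {
        Random rng = new Random(seed);
        double[][] points = new double[size][mins.length];
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < mins.length; j++) {
                points[i][j] = sample(rng, mins[j], maxs[j]);
            }
        }
        return points;
    }

    /** True when every coordinate lies inside its attribute range. */
    static boolean allInRange(double[][] points, double[] mins, double[] maxs) {
        for (double[] p : points) {
            for (int j = 0; j < p.length; j++) {
                if (p[j] < mins[j] || p[j] > maxs[j]) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative per-attribute ranges (assumed values, four numeric attributes).
        double[] mins = {4.3, 2.0, 1.0, 0.1};
        double[] maxs = {7.9, 4.4, 6.9, 2.5};
        double[][] points = initialize(1000, mins, maxs, 42L);
        System.out.println("all in range: " + allInRange(points, mins, maxs));
        // For a uniform draw, each column mean should be close to the range midpoint.
        for (int j = 0; j < mins.length; j++) {
            double sum = 0;
            for (double[] p : points) {
                sum += p[j];
            }
            System.out.println("attribute " + j + ": mean " + (sum / points.length)
                    + ", midpoint " + ((mins[j] + maxs[j]) / 2));
        }
    }
}
```

Comparing each column mean with the midpoint of its range is a quick sanity check for uniformity. If the points still look too close together, the cause is more likely elsewhere in the algorithm than in this initialization.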

Vectors in one plot in ImageJ (Java)

I am writing a plugin for ImageJ and need some ideas.
I have a plugin that generates a plot for every image in a stack. So if I have 4 images in a stack, it generates 4 plots, each from one vector.
But I need a single plot with 4 curves. Please help me.
This is the code:
public void run(String arg){
openImage();
if (cancel==false){
options();
}
if (cancel==false){
for (int k=0;k<imp.getStackSize();k++){
imp.setSlice(k+1);
generateESFArray("ESF Plot",imp,roi);
generateLSFArray("LSF Plot",ESFArray);
calculateMax();
ESFArrayF=alignArray(ESFArray);
if (cancel==false){
LSFArrayF=alignArray(LSFArray);
}
if (cancel==false){
ESFVector=averageVector(ESFArrayF);
}
if (cancel==false){
LSFVector=averageVector(LSFArrayF);
int aura = (LSFVector.length * 2);
LSFDVector = new double [aura];
int j = 0;
int aura2 = (LSFVector.length);
for(i=0;i<(LSFDVector.length-3); i++){
if(i % 2 == 0) {
LSFDVector[i]= LSFVector[j];
j=j+1;
}else {
LSFDVector[i]= ((0.375*LSFVector[(j-1)]) + (0.75*LSFVector[(j)]) - (0.125*LSFVector[(j+1)]));
}
}
LSFDVector[i] = ((LSFVector[j-1] + LSFVector[j])*0.5);
LSFDVector[i+1] = LSFVector[j];
LSFDVector[i+2] = LSFVector[j];
int indexMax = 0;
double valorMax = LSFDVector[0];
for(int i=0;i<LSFDVector.length;i++){
if(valorMax < LSFDVector[i]){
indexMax = i;
valorMax = LSFDVector[i];
}
}
i=indexMax;
LSFDVector[i-1]=((LSFDVector[i-2] + LSFDVector[i])*0.5);
MTFVector=fftConversion(LSFDVector, "MTF");
Max=obtenerMax();
SPPVector=fftConversion(Max,"SPP");
LSFArrayF=alignArray(LSFArray);
if (MTFButton.isSelected()){
generatePlot (MTFVector,"MTF");
...
}
void generatePlot(double[] Vector, String plot){
double[]xValues;
String ejeX="pixel";
String ejeY="";
String allTitle="";
ImageProcessor imgProc;
xValues=calculateXValues(Vector,plot);
//plot titles
if ("ESF".equals(plot)){ // compare string contents, not references
ejeY="Grey Value";
...
allTitle=plot + "_" + title;
plotResult = new Plot(allTitle, ejeX, ejeY, xValues, Vector);
//plot limits
if ("ESF".equals(plot)){ // compare string contents, not references
plotResult.setLimits(1,Vector.length,0,yMax);
}
plotResult.draw();
plotResult.show();
}
The ij.gui.Plot class has an addPoints method allowing you to add multiple data series to a plot. The Groovy script below illustrates its usage. Just paste the code into ImageJ's script editor, choose Language > Groovy and press Run to try it.
import ij.gui.Plot
plot = new Plot("Multiple Line Plot", "x values", "y values", (double[])[0,1,2,3,4], (double[])[0.1,0.3,0.5,0.6,0.7])
plot.addPoints((double[])[0,1,2,3,4], (double[])[0.2,0.15,0.1,0.05,0.05], Plot.LINE)
plot.setLimits(0, 4, 0, 1)
plot.draw()
plot.show()
For any further questions regarding the usage of the ImageJ API, you might get better help on the ImageJ forum.

Math calculation in Java

I am working on a simple program to evaluate a mathematical equation, but there is a problem that I cannot find. Any help would be greatly appreciated.
The problem seems to be with
alpha[j] = (double)(j-1)*2*Math.PI/(double)rotationNum;
A NullPointerException is thrown. There must be some silly mistake here.
import java.util.*;
import java.io.*;
//import Jama.Matrix;
class efun {
static double epso;
static double sigma;
static double alpha[];
static double charge;
static double axisR;
static double axisZ;
//static Random randGen;
static int numPoints = -1;
static int rotationNum;
public static void main (String[] args) {
try {
sigma = 300e-6*1e2;
epso = 8.854e-12;
/*Input arguments*/
numPoints = Integer.parseInt (args[0]);
FileReader fr = new FileReader(args[1]);
rotationNum = Integer.parseInt (args[2]);
BufferedReader br = new BufferedReader(fr);
double pointsR[] = new double[numPoints];
double pointsZ[] = new double[numPoints];
double chargeDensity[] = new double[numPoints];
double electricField = 0.0;
double ER = 0.0;
double EZ = 0.0;
double EY = 0.0;
for (int id = 0; id < numPoints; id++) {
// read file
String line; // holds the current input line
while ( (line = br.readLine() )!= null) {
StringTokenizer stk = new StringTokenizer(line);
axisR = Double.parseDouble(stk.nextToken());
axisZ = Double.parseDouble(stk.nextToken());
charge = Double.parseDouble(stk.nextToken());
pointsR[id] = axisR;
pointsZ[id] = axisZ;
chargeDensity[id] = charge;
System.out.println("axisR: "+pointsR[id]+" and axisZ: "+ pointsZ[id]+"; its corresponding charge density is: "+ chargeDensity[id]);
double rotatedR[] = new double[numPoints];
double rotatedZ[] = new double[numPoints];
double rotatedY[] = new double[numPoints];
double sumSquarePoints[] = new double[numPoints];
for (int j = 1; j < rotationNum+1; j++) {
alpha[j] = (double)(j-1)*2*Math.PI/(double)rotationNum;
System.out.println("print alpha: "+alpha[j]);
rotatedR[id] = pointsR[id] - Math.cos(alpha[j])*pointsR[id];
rotatedZ[id] = pointsZ[id];
rotatedY[id] = pointsR[id] - Math.sin(alpha[j])*pointsR[id];
sumSquarePoints[id] = Math.sqrt(rotatedR[id]*rotatedR[id] + rotatedZ[id]*rotatedZ[id] + rotatedY[id]*rotatedY[id]);
ER += chargeDensity[id]*rotatedR[id]/(sumSquarePoints[id]*sumSquarePoints[id]*sumSquarePoints[id]);
EZ += chargeDensity[id]*rotatedZ[id]/(sumSquarePoints[id]*sumSquarePoints[id]*sumSquarePoints[id]);
EY += chargeDensity[id]*rotatedY[id]/(sumSquarePoints[id]*sumSquarePoints[id]*sumSquarePoints[id]);
System.out.println ("ER is: "+ ER);
System.out.println ("EZ is: "+ EZ);
System.out.println ("EY is: "+ EY);
}
}
}
electricField = sigma/(4*Math.PI*epso)*Math.sqrt(ER*ER + EZ*EZ + EY*EY);
System.out.println("electricField is: " + electricField);
}
catch (Exception e) {
e.printStackTrace();
}
}
}
You never initialized alpha, you only declared it, so you can't access alpha[j]. Initialize it and make sure that its size is large enough for every j:
alpha = new double[MY_SIZE];
Also, make sure that you're passing in at least 3 arguments to main so that rotationNum is assigned correctly.
Your class variable alpha is declared, but not initialized, so Java gives it the default value of null. The variable was never initialized to any array.
static double alpha[];
However, it doesn't look like you're using any other intended value in the array except for the current value. Just declare it to be a local double (not an array), and use it as a normal variable.
double alpha = (double)(j-1)*2*Math.PI/(double)rotationNum;
And use alpha instead of alpha[j] a few lines down from there.
You've never initialized alpha[]. Just like pointsR, pointsZ, and chargeDensity, you need to point alpha at a new array of doubles before you can use it.
Add alpha = new double[rotationNum + 1]; after reading rotationNum. The loop indexes alpha from 1 to rotationNum, so the array needs rotationNum + 1 elements.
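The answers above can be condensed into a small sketch: a static array field that is only declared holds null, so indexing it throws a NullPointerException until an array is assigned. The class and helper names (AlphaDemo, angle, fill) are hypothetical. The formula and the 1-based loop bounds are taken from the question.

```java
class AlphaDemo {
    // Declared only: the reference is null until an array is assigned to it.
    static double[] alpha;

    /** Rotation angle for step j (1-based), using the formula from the question. */
    static double angle(int j, int rotationNum) {
        return (double) (j - 1) * 2 * Math.PI / (double) rotationNum;
    }

    /** Allocates alpha so that indices 1..rotationNum are valid, then fills it. */
    static double[] fill(int rotationNum) {
        alpha = new double[rotationNum + 1];
        for (int j = 1; j < rotationNum + 1; j++) {
            alpha[j] = angle(j, rotationNum);
        }
        return alpha;
    }

    public static void main(String[] args) {
        try {
            alpha[1] = 0.0; // alpha is still null here
        } catch (NullPointerException e) {
            System.out.println("alpha is null before allocation");
        }
        double[] filled = fill(4);
        for (int j = 1; j < filled.length; j++) {
            System.out.println("alpha[" + j + "] = " + filled[j]);
        }
    }
}
```

When only the current angle is needed inside the loop, a local double computed with angle(j, rotationNum) avoids the array altogether.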

libsvm classifying - at bad stage so far

Problem: I have a set of data to be classified as Useful (1) or Useless (0). I will provide the full data set as training input for the classifier and test with a different data set.
To do this, I am trying to convert my data to LIBSVM format. Before doing anything else, I tried feeding in a single numeric vector to check the result.
Input:
Training: 1 1 2 (the first 1 is the class label, Useful, followed by the numeric input)
Testing: 1 1 2 (I am not sure of the input data format)
Output:
(0:0.9982708183417436)(1:0.0017291816582564153)(Actual:1.0 Prediction:0.0)
I don't have class 0 in the training set, but the output has a probability estimate for class 0.
I am not really sure how to convert my data to numeric vector input, or how to map the numeric test results back to the original data. Any help in this regard is highly appreciated.
Planned tasks:
1. Load all the data to Hash tables and get the keys to be saved in data sets with respective classifier - USEFUL(1).
2. Supply the data set to the svmTrain and get the model.
3. Prepare the test data set (convert each word/phrase to the numeric value saved in the training set, if found; otherwise assign a new value).
4. Supply the test set and model to the SVM's EVALUATE method.
5. Get the resultant vectors from the USEFUL class and re-map to the data.
Code: used from different sources.
public class Datatosvmformat {
static double[][] train = new double[1000][3];
public static void main(String[] args) {
// TODO Auto-generated method stub
HashMap<String, Integer> dataSet = new HashMap<String, Integer>();
double[][] test = new double[10][3];
train[1][0] = 1;
train[1][1] = 1;
train[1][2] = 2;
svm_model model = svmTrain();
//Test Data Set
double[] test1 = new double[3];
test1[0] = 1;
test1[1] = 1;
test1[2] = 2;
evaluate(test1,model);
}
private static svm_model svmTrain() {
svm_problem prob = new svm_problem();
int dataCount = train.length;
prob.y = new double[dataCount];
prob.l = dataCount;
prob.x = new svm_node[dataCount][];
for (int i = 0; i <dataCount; i++){
double[] features = train[i];
//System.out.println("Features "+features[i]);
prob.x[i] = new svm_node[features.length-1];
for (int j = 1; j < features.length; j++){
svm_node node = new svm_node();
node.index = j;
node.value = features[j];
prob.x[i][j-1] = node;
}
prob.y[i] = features[0];
}
svm_parameter param = new svm_parameter();
param.probability = 1;
param.gamma = 0.5;
param.nu = 0.5;
param.C = 1;
param.svm_type = svm_parameter.C_SVC;
param.kernel_type = svm_parameter.LINEAR;
param.cache_size = 20000;
param.eps = 0.001;
svm_model model = svm.svm_train(prob, param);
return model;
}
public static double evaluate(double[] features, svm_model model)
{
svm_node[] nodes = new svm_node[features.length-1];
for (int i = 1; i < features.length; i++)
{
svm_node node = new svm_node();
node.index = i;
node.value = features[i];
nodes[i-1] = node;
}
int totalClasses = 2;
int[] labels = new int[totalClasses];
svm.svm_get_labels(model,labels);
double[] prob_estimates = new double[totalClasses];
double v = svm.svm_predict_probability(model, nodes, prob_estimates);
for (int i = 0; i < totalClasses; i++){
System.out.print("(" + labels[i] + ":" + prob_estimates[i] + ")");
}
System.out.println("(Actual:" + features[0] + " Prediction:" + v + ")");
return v;
}
}
I'm not completely sure, but the problem could be that you need to mark positive examples with +1 and negative examples with -1.
Otherwise, the libsvm software could assign an arbitrary class (e.g. 0) to the training vector 1 1 2, since it interprets the first element of the feature vector as a feature value (and not as the class label).
So try changing the class label from 1 to +1 for positive examples (and use -1 for negative examples).
Usually, the data format for libsvm is the following:
<label> <index1>:<value1> <index2>:<value2>
where:
label is the class label (e.g. +1/-1)
indexN is the feature Id (i.e. the number that identifies a certain feature)
valueN is the feature value (i.e. the value assigned to the specified feature: 0/1 for binary features or 0,1,2,... for categorical features)
An example of the data format accepted by the libsvm tool can be found at this page:
LIBSVM Data: Classification (Binary Class)
There are many datasets that you can explore in order to understand the data format accepted by the libsvm tool.
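As a rough illustration of the format described above, the sketch below turns a row laid out as {label, feature1, feature2, ...} (the layout used by the train array in the question) into a "<label> <index>:<value>" line. It also counts labels in an array allocated as new double[1000][3]. Java zero-fills such an array, so every row that is never assigned carries label 0.0, which may be worth checking when a class 0 shows up unexpectedly. The class and method names (LibsvmFormat, toLine, countLabel) are hypothetical and not part of the libsvm API.

```java
class LibsvmFormat {
    /** Formats {label, f1, f2, ...} as "<label> 1:<f1> 2:<f2> ...". */
    static String toLine(double[] row) {
        StringBuilder sb = new StringBuilder();
        sb.append((int) row[0]); // class label, e.g. 1 or 0
        for (int j = 1; j < row.length; j++) {
            // Feature indices start at 1; values are written with Double.toString.
            sb.append(' ').append(j).append(':').append(row[j]);
        }
        return sb.toString();
    }

    /** Counts the rows whose label (column 0) equals the given value. */
    static int countLabel(double[][] data, double label) {
        int count = 0;
        for (double[] row : data) {
            if (row[0] == label) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[][] train = new double[1000][3]; // same shape as in the question, zero-filled
        train[1][0] = 1;
        train[1][1] = 1;
        train[1][2] = 2;
        System.out.println(toLine(train[1]));
        System.out.println("rows with label 1: " + countLabel(train, 1.0));
        System.out.println("rows with label 0: " + countLabel(train, 0.0));
    }
}
```

Sizing the training array to the number of real examples, instead of a fixed 1000 rows, keeps the untouched zero rows out of the training set.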

Understanding multithreading with a for loop Java

I've got some code that I don't think can be multithreaded, but perhaps I'm wrong. I'd like to execute this code on a clustered system, but I'm unsure how to scale it for such a deployment.
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
public class Coord {
public int a,b,c,d,e,f;
public static void main(String[] args) throws IOException {
FileOutputStream out = new FileOutputStream("/Users/evanlivingston/2b.txt");
PrintStream pout = new PrintStream(out);
Scanner sc = new Scanner(new File("/Users/evanlivingston/1.txt"));
List<Coord> coords = new ArrayList<Coord>();{
// for each line in the file
while(sc.hasNextLine()) {
String[] numstrs = sc.nextLine().split("\\s+");
Coord c = new Coord();
c.a = Integer.parseInt(numstrs[1]);
c.b = Integer.parseInt(numstrs[2]);
c.c = Integer.parseInt(numstrs[3]);
c.d = Integer.parseInt(numstrs[4]);
c.e = Integer.parseInt(numstrs[5]);
c.f = Integer.parseInt(numstrs[6]);
coords.add(c);
}
// now you have all coords in memory
{
for(int i=0; i<coords.size(); i++ )
for( int j=0; j<coords.size(); j++)
{
Coord c1 = coords.get(i);
Coord c2 = coords.get(j);
double foo = ((c1.a - c2.a) * (c1.a - c2.a)) *1 ;
double goo = ((c1.b - c2.b) * (c1.b - c2.b)) *1 ;
double hoo = ((c1.c - c2.c) * (c1.c - c2.c)) *2 ;
double joo = ((c1.d - c2.d) * (c1.d - c2.d)) *2 ;
double koo = ((c1.e - c2.e) * (c1.e - c2.e)) *4 ;
double loo = ((c1.f - c2.f) * (c1.f - c2.f)) *4 ;
double zoo = Math.sqrt(foo + goo + hoo + joo + koo + loo);
DecimalFormat df = new DecimalFormat("#.###");
pout.println(i + " " + j + " " + df.format(zoo));
System.out.println(i);
}
pout.flush();
pout.close();
}
}
}
}
I appreciate any help anyone can offer.
Running the inner for loop for each value of i as a separate task looks like a good way to make this process multithreaded. Here is one way this could be done with an ExecutorService and Futures:
final ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
final List<Future<String>> results = new LinkedList<Future<String>>();
// now you have all coords in memory
for (int i = 0; i < coords.size(); i++) {
final int index = i;
final Coord c1 = coords.get(index);
results.add(executor.submit(new Callable<String>() {
public String call() {
final StringBuilder stringBuilder = new StringBuilder();
for (int j = 0; j < coords.size(); j++) {
final Coord c2 = coords.get(j);
final double foo = ((c1.a - c2.a) * (c1.a - c2.a)) * 1;
final double goo = ((c1.b - c2.b) * (c1.b - c2.b)) * 1;
final double hoo = ((c1.c - c2.c) * (c1.c - c2.c)) * 2;
final double joo = ((c1.d - c2.d) * (c1.d - c2.d)) * 2;
final double koo = ((c1.e - c2.e) * (c1.e - c2.e)) * 4;
final double loo = ((c1.f - c2.f) * (c1.f - c2.f)) * 4;
final double zoo = Math.sqrt(foo + goo + hoo + joo + koo + loo);
final DecimalFormat df = new DecimalFormat("#.###");
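// End each record with a line break, as println did in the original loop.
stringBuilder.append(index + " " + j + " " + df.format(zoo)).append(System.lineSeparator());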
System.out.println(index);
}
return stringBuilder.toString();
}
}));
}
for (Future<String> result : results) {
pout.print(result.get());
}
pout.flush();
pout.close();
executor.shutdown();
For clustering, I think Hazelcast offers a good solution that will allow you to define a shared ExecutorService and shared Collections. You would need two flavors of nodes: a single node responsible for all I/O, for creating the list of Coords, and for submitting the tasks, and processing nodes that simply execute the tasks. That is only my opinion of how I might do it. However, if your dataset is small enough to fit in memory, it is likely not worth the effort to split up the processing this much.
It looks very parallelizable to me. Why don't you have threads process one row of data at a time? You could use an AtomicInteger to keep a count of how many rows have been claimed by worker threads. Each thread would do a counter.getAndIncrement to get a row to work on (if it returns coords.size() or higher, the thread should terminate), then do all the math for that row, and repeat.
The printing would be out of order, but you could instead fill some buffers with the results, then quickly print everything at the end.
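The row-claiming idea can be sketched as follows, assuming the coordinates are held as int arrays of six components rather than Coord objects. Each worker thread takes the next unclaimed row from a shared AtomicInteger, fills that row of a result buffer, and stops once the counter reaches the number of rows. The results are printed in order after all workers have finished. The class and method names (RowWorkers, distance, allPairs) are hypothetical. The weights 1, 1, 2, 2, 4, 4 come from the question's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RowWorkers {
    /** Weighted distance between two 6-component rows (weights 1, 1, 2, 2, 4, 4). */
    static double distance(int[] p, int[] q) {
        int[] weights = {1, 1, 2, 2, 4, 4};
        double sum = 0;
        for (int k = 0; k < weights.length; k++) {
            double d = p[k] - q[k];
            sum += weights[k] * d * d;
        }
        return Math.sqrt(sum);
    }

    /** Computes all pairwise distances; worker threads claim whole rows via a shared counter. */
    static double[][] allPairs(int[][] coords, int threads) {
        final int n = coords.length;
        final double[][] out = new double[n][n];
        final AtomicInteger next = new AtomicInteger(0);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                int i;
                // getAndIncrement hands each row index to exactly one thread.
                while ((i = next.getAndIncrement()) < n) {
                    for (int j = 0; j < n; j++) {
                        out[i][j] = distance(coords[i], coords[j]);
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // after join, the worker's writes are visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] coords = {{0, 0, 0, 0, 0, 0}, {3, 4, 0, 0, 0, 0}, {1, 1, 1, 1, 1, 1}};
        double[][] result = allPairs(coords, Runtime.getRuntime().availableProcessors());
        // Printing happens once, in order, after all rows are done.
        for (int i = 0; i < result.length; i++) {
            for (int j = 0; j < result.length; j++) {
                System.out.println(i + " " + j + " " + result[i][j]);
            }
        }
    }
}
```

Because each row is written by exactly one thread, the result buffer needs no locking. The n-by-n buffer does grow quadratically, so for large inputs writing finished rows out in batches may be preferable.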
