How can I predict unlabeled data from an .arff file in Weka? - java

I have training and test "labeled.arff" files. I build a classifier from them and write it to a "modelFile.model" file.
I also have an "unlabeled.arff" file in which the last attribute of each row is "?".
How can I make the predictions in Java or C#?
I have some code, but it is not right: it always gives me the same prediction.
Thank you.
// Write to Model
public static void Classify()
{
    Instances train = new Instances(new java.io.FileReader(dirTrain + "labeled.arff"));
    Instances test = new Instances(new java.io.FileReader(dirTest + "labeled.arff"));
    train.setClassIndex(train.numAttributes() - 1);
    test.setClassIndex(test.numAttributes() - 1);

    // train Classifier
    Classifier cl = new J48();

    // Randomize the order of the instances in the dataset
    weka.filters.Filter myRandom = new weka.filters.unsupervised.instance.Randomize();
    myRandom.setInputFormat(train);
    train = weka.filters.Filter.useFilter(train, myRandom);

    // Build the classifier
    cl.buildClassifier(train);

    // evaluate classifier and print some statistics
    Evaluation eval = new Evaluation(train);
    eval.evaluateModel(cl, test);
    Console.WriteLine(eval.toSummaryString("\nResults Decision Tree\n======\n", false));

    SerializationHelper.write(dirModel + "modelFile.model", cl);
}

// Make predictions
public void Predictions()
{
    Classifier cl = (Classifier)SerializationHelper.read(dirModel + "modelFile.model");

    // load unlabeled data
    Instances unlabeled = new Instances(new java.io.FileReader(pathFeatures + "unlabeled.arff"));
    // set class attribute
    unlabeled.setClassIndex(unlabeled.numAttributes() - 1);

    // create copy
    Instances labeled = new Instances(unlabeled);

    // label instances
    for (int i = 0; i < unlabeled.numInstances(); i++)
    {
        double clsLabel = cl.classifyInstance(unlabeled.instance(i));
        labeled.instance(i).setClassValue(clsLabel);
    }

    int numCorrect = 0;
    for (int i = 0; i < unlabeled.numInstances(); i++)
    {
        double pred = cl.classifyInstance(unlabeled.instance(i));
        Console.Write("ID: " + unlabeled.instance(i).value(i));
        //Console.Write(", actual: " + unlabeled.classAttribute().value((int)unlabeled.instance(i).classValue()));
        Console.WriteLine(", predicted: " + unlabeled.classAttribute().value((int)pred));
    }
    Console.WriteLine("Correct predictions: " + numCorrect);
}
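For reference, here is a minimal sketch of the usual Weka prediction loop in plain Java (assuming the last attribute is the class and attribute 0 holds an ID; the file and model names are placeholders). One thing that stands out in the posted code is that the ID is printed with unlabeled.instance(i).value(i), so the loop variable is also used as the attribute index; a fixed index is probably intended. If every row really receives the same predicted class, it is also worth checking that the unlabeled ARFF declares exactly the same attributes and nominal values, in the same order, as the training file.

// Minimal sketch: load a saved model and predict each row of an unlabeled ARFF.
// Assumes the class is the last attribute and attribute 0 holds an ID.
import java.io.FileReader;
import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.SerializationHelper;

public class PredictUnlabeled {
    public static void main(String[] args) throws Exception {
        Classifier cl = (Classifier) SerializationHelper.read("modelFile.model");
        Instances unlabeled = new Instances(new FileReader("unlabeled.arff"));
        unlabeled.setClassIndex(unlabeled.numAttributes() - 1);

        for (int i = 0; i < unlabeled.numInstances(); i++) {
            double pred = cl.classifyInstance(unlabeled.instance(i));
            // Use a fixed attribute index (0) for the ID, not the loop variable.
            System.out.println("ID: " + unlabeled.instance(i).value(0)
                    + ", predicted: " + unlabeled.classAttribute().value((int) pred));
        }
    }
}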

Related

"Invalid classification data: expect label value"

I'm trying to train a model using deep learning in Java. When I start training on the training data it gives an error:
Invalid classification data: expect label value (at label index column = 0) to be in range 0 to 1 inclusive (0 to numClasses-1, with numClasses=2); got label value of 2
I don't understand the error, since I am a beginner with DeepLearning4J. I am using a data set that describes the relationship between two people (if there is a relationship between the two people, the class label is 1, otherwise 0).
The Java code
public class SNA {
private static Logger log = LoggerFactory.getLogger(SNA.class);
public static void main(String[] args) throws Exception {
int seed = 123;
double learningRate = 0.01;
int batchSize = 50;
int nEpochs = 30;
int numInputs = 2;
int numOutputs = 2;
int numHiddenNodes = 20;
//load the training data
RecordReader rr = new CSVRecordReader(0,",");
rr.initialize(new FileSplit(new File("C:\\Users\\GTS\\Desktop\\SNA project\\experiments\\First experiment\\train\\slashdotTrain.csv")));
DataSetIterator trainIter = new RecordReaderDataSetIterator(rr, batchSize,0, 2);
// load test data
RecordReader rrTest = new CSVRecordReader();
rr.initialize(new FileSplit(new File("C:\\Users\\GTS\\Desktop\\SNA project\\experiments\\First experiment\\test\\slashdotTest.csv")));
DataSetIterator testIter = new RecordReaderDataSetIterator(rrTest, batchSize,0, 2);
log.info("**** Building Model ****");
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(seed)
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
.iterations(1)
.learningRate(learningRate)
.updater(Updater.NESTEROVS).momentum(0.9)
.list()
.layer(0, new DenseLayer.Builder()
.nIn(numInputs)
.nOut(numHiddenNodes)
.activation("relu")
.weightInit(WeightInit.XAVIER)
.build())
.layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
.activation("softmax")
.weightInit(WeightInit.XAVIER)
.nIn(numHiddenNodes)
.nOut(numOutputs)
.build())
.pretrain(false).backprop(true)
.build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
// Listener to show how the network is training in the log
model.setListeners(new ScoreIterationListener(10));
log.info(" **** Train Model **** ");
for (int i = 0; i < nEpochs; i++) {
model.fit(trainIter);
}
System.out.println("**** Evaluate Model ****");
Evaluation evaluation = new Evaluation(numOutputs);
while (testIter.hasNext()) {
DataSet t = testIter.next();
INDArray feature = t.getFeatureMatrix();
INDArray labels = t.getLabels();
INDArray predicted = model.output(feature, false);
evaluation.eval(labels, predicted);
}
System.out.println(evaluation.stats());
}
}
Any help please? Thanks a lot.
Problem solved:
Change the third parameter (the label index) of RecordReaderDataSetIterator from 0 to 2 for both the train and test iterators, because the data set has three columns and the class label is the third column, i.e. index 2.
solution:
DataSetIterator trainIter = new RecordReaderDataSetIterator(rr, batchSize,2, 2);
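Spelled out for both iterators (reusing batchSize and the file names from the question, with the paths shortened; label index 2 and numClasses = 2 as described above), the setup would look roughly like the sketch below. Note that in the original code the test reader is initialized through rr rather than rrTest, which is probably also worth fixing.

// Sketch: both iterators with the label in column index 2 and two classes.
RecordReader rr = new CSVRecordReader(0, ",");
rr.initialize(new FileSplit(new File("slashdotTrain.csv")));
DataSetIterator trainIter = new RecordReaderDataSetIterator(rr, batchSize, 2, 2);

RecordReader rrTest = new CSVRecordReader(0, ",");
rrTest.initialize(new FileSplit(new File("slashdotTest.csv")));   // initialize rrTest, not rr
DataSetIterator testIter = new RecordReaderDataSetIterator(rrTest, batchSize, 2, 2);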

Learning curve from a weka experiment java

I am trying to get a learning curve from an automated Weka experiment. I currently have the following Java code.
public static void EvaluateModel(AbstractClassifier cl, String datapath, String outfile) throws Exception {
Experiment exp = new Experiment();
ClassifierSplitEvaluator se = new ClassifierSplitEvaluator();
se.setClassifier(cl);
Classifier sec = ((ClassifierSplitEvaluator) se).getClassifier();
CrossValidationResultProducer cvrp = new CrossValidationResultProducer();
cvrp.setNumFolds(10);
cvrp.setSplitEvaluator(se);
PropertyNode[] propertyPath = new PropertyNode[2];
try {
propertyPath[0] = new PropertyNode(
se,
new PropertyDescriptor("splitEvaluator",
CrossValidationResultProducer.class),
CrossValidationResultProducer.class);
propertyPath[1] = new PropertyNode(sec,
new PropertyDescriptor("classifier", se.getClass()),
se.getClass());
} catch (IntrospectionException e) {
e.printStackTrace();
}
exp.setResultProducer(cvrp);
exp.setPropertyPath(propertyPath);
exp.setPropertyArray(new Classifier[]{cl});
DefaultListModel model = new DefaultListModel();
model.addElement(new File(datapath));
exp.setDatasets(model);
InstancesResultListener irl = new InstancesResultListener();
irl.setOutputFile(new File(outfile));
exp.setResultListener(irl);
System.out.println("Initializing...");
exp.initialize();
System.out.println("Running...");
exp.runExperiment();
System.out.println("Finishing...");
exp.postProcess();
System.out.println("Evaluating...");
PairedTTester tester = new PairedCorrectedTTester();
FileReader reader = new FileReader(irl.getOutputFile());
Instances result = new Instances(reader);
tester.setInstances(result);
tester.setSortColumn(-1);
tester.setRunColumn(result.attribute("Key_Run").index());
tester.setFoldColumn(result.attribute("Key_Fold").index());
tester.setDatasetKeyColumns(
new Range(
""
+ (result.attribute("Key_Dataset").index() + 1)));
tester.setResultsetKeyColumns(
new Range(
""
+ (result.attribute("Key_Scheme").index() + 1)
+ ","
+ (result.attribute("Key_Scheme_options").index() + 1)
+ ","
+ (result.attribute("Key_Scheme_version_ID").index() + 1)));
tester.setResultMatrix(new ResultMatrixPlainText());
tester.setDisplayedResultsets(null);
tester.setSignificanceLevel(0.05);
tester.setShowStdDevs(true);
// fill result matrix (but discarding the output)
tester.multiResultsetFull(0, result.attribute("Percent_correct").index());
// output results for reach dataset
System.out.println("\nResult:");
ResultMatrix matrix = tester.getResultMatrix();
for (int i = 0; i < matrix.getColCount(); i++) {
System.out.println(matrix.getColName(i));
System.out.println(" Perc. correct: " + matrix.getMean(i, 0));
System.out.println(" StdDev: " + matrix.getStdDev(i, 0));
}
}
What I would like to do is either save or display the learning curve in this method. I cannot find information on how to do this programmatically.
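One option, sketched below under the assumption that your Weka version ships weka.experiment.LearningRateResultProducer (the result producer the Experimenter GUI uses for learning-curve experiments) with lowerSize/upperSize/stepSize properties, is to wrap the cross-validation producer so the experiment is repeated with growing training-set sizes. The output file written by the InstancesResultListener then contains one row per training-set size, and plotting Percent_correct against that size column (with any charting library, e.g. JFreeChart) gives the learning curve. Depending on the version, the propertyPath built above may also need to point at the wrapped producer.

// Sketch: wrap the CrossValidationResultProducer from the method above so
// each run is repeated with an increasing number of training instances.
// Requires: import weka.experiment.LearningRateResultProducer;
LearningRateResultProducer lrrp = new LearningRateResultProducer();
lrrp.setResultProducer(cvrp);   // the CrossValidationResultProducer built earlier
lrrp.setStepSize(100);          // add 100 training instances per step (adjust to taste)
lrrp.setLowerSize(0);           // starting size (0 should mean "one step size"; check the Javadoc)
lrrp.setUpperSize(-1);          // -1 = grow up to the full training set
exp.setResultProducer(lrrp);    // use this instead of exp.setResultProducer(cvrp)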

classifyInstance() method error

For the code below, the classifyInstance() line gives an error:
Exception in thread "main" java.lang.NullPointerException
at weka.classifiers.functions.LinearRegression.classifyInstance(LinearRegression.java:272)
at LR.main(LR.java:45)
I tried to debug it but without success. How can I use my saved model to predict the class attribute of my test file? This is a regression problem.
for (int i = 0; i < unlabeled.numInstances(); i++) {
double clsLabel = cls.classifyInstance(unlabeled.instance(i));
labeled.instance(i).setClassValue(clsLabel);
System.out.println(clsLabel + " -> " + unlabeled.classAttribute().value((int) clsLabel));
}
This is the actual code:
public class LR{
public static void main(String[] args) throws Exception
{
BufferedReader datafile = new BufferedReader(new FileReader("C:\\dataset.arff"));
Instances data = new Instances(datafile);
data.setClassIndex(data.numAttributes()-1); //setting class attribute
datafile.close();
LinearRegression lr = new LinearRegression(); //build model
int folds=10;
Evaluation eval = new Evaluation(data);
eval.crossValidateModel(lr, data, folds, new Random(1));
System.out.println(eval.toSummaryString());
//save the model
weka.core.SerializationHelper.write("C:\\lr.model", lr);
//load the model
Classifier cls = (Classifier)weka.core.SerializationHelper.read("C:\\lr.model");
Instances unlabeled = new Instances(new BufferedReader(new FileReader("C:\\testfile.arff")));
// set class attribute
unlabeled.setClassIndex(unlabeled.numAttributes() - 1);
// create copy
Instances labeled = new Instances(unlabeled);
double clsLabel;
// label instances
for (int i = 0; i < unlabeled.numInstances(); i++)
{
clsLabel = cls.classifyInstance(unlabeled.instance(i));
labeled.instance(i).setClassValue(clsLabel);
System.out.println(clsLabel + " -> " + unlabeled.classAttribute().value((int) clsLabel));
}
// save labeled data
BufferedWriter writer = new BufferedWriter(new FileWriter("C:\\final.arff"));
writer.write(labeled.toString());
writer.newLine();
writer.flush();
writer.close();
}
}
Did you train your classifier?
It looks to me like you are trying to classify without having trained your classifier.
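For what it's worth, Evaluation.crossValidateModel() trains and evaluates internal copies of the classifier, so the lr object itself is still untrained when it is serialized; that would explain the NullPointerException inside classifyInstance(). A minimal sketch of the fix (reusing data and the paths from the question) is to build the classifier on the full data set before saving it:

// Sketch: cross-validate for the statistics, then actually train lr before saving.
LinearRegression lr = new LinearRegression();
Evaluation eval = new Evaluation(data);
eval.crossValidateModel(lr, data, 10, new Random(1)); // evaluation only, lr stays untrained
System.out.println(eval.toSummaryString());
lr.buildClassifier(data);                             // train the model that will be saved
weka.core.SerializationHelper.write("C:\\lr.model", lr);

Also, since this is regression the class attribute is numeric, so unlabeled.classAttribute().value((int) clsLabel) will not give a meaningful label; printing clsLabel itself is enough.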

creating a model using naive bayes

I am using the LingPipe tool for the naive Bayes algorithm. I trained it on my training data and it successfully tests my test data, but every time I run the program it trains again. I don't want to train it each time; instead I want to build and save a model to which I can apply the test data.
public class ClassifyNews {
private static File TRAINING_DIR= new File("train");
private static File TESTING_DIR= new File("test");
private static String[] CATEGORIES
= { "c1",
"c2",
"c3"};
private static int NGRAM_SIZE = 6;
public static void main(String[] args)throws ClassNotFoundException, IOException
{
DynamicLMClassifier<NGramProcessLM> classifier
=DynamicLMClassifier.createNGramProcess(CATEGORIES,NGRAM_SIZE);
for(int i=0; i<CATEGORIES.length; ++i)
{
File classDir = new File(TRAINING_DIR,CATEGORIES[i]);
if (!classDir.isDirectory())
{
String msg = "Could not find training directory="+ classDir
+ "\nTraining directory not found";
System.out.println(msg);
throw new IllegalArgumentException(msg);
}
String[] trainingFiles = classDir.list();
for (int j = 0; j < trainingFiles.length; ++j)
{
File file = new File(classDir,trainingFiles[j]);
String text = Files.readFromFile(file,"ISO-8859-1");
System.out.println("Training on " + CATEGORIES[i] + "/" + trainingFiles[j]);
Classification classification= new Classification(CATEGORIES[i]);
Classified<CharSequence> classified= new Classified<CharSequence>(text,classification);
classifier.handle(classified);}
}
System.out.println("Compiling");
JointClassifier<CharSequence> compiledClassifier
= (JointClassifier<CharSequence>)
AbstractExternalizable.compile(classifier);
boolean storeCategories = true;
JointClassifierEvaluator<CharSequence> evaluator =
new JointClassifierEvaluator
<CharSequence> (compiledClassifier,CATEGORIES,storeCategories);
for(int i = 0; i < CATEGORIES.length; ++i)
{
File classDir = new File(TESTING_DIR,CATEGORIES[i]);
String[] testingFiles = classDir.list();
for (int j=0; j<testingFiles.length; ++j)
{
String text= Files.readFromFile(new File(classDir,testingFiles[j]),"ISO-8859-1");
System.out.print("\nTesting on " + CATEGORIES[i] + "/" + testingFiles[j] + " ");
Classification classification= new Classification(CATEGORIES[i]);
Classified<CharSequence> classified= new Classified<CharSequence>(text,classification);
evaluator.handle(classified);
JointClassification jc =compiledClassifier.classify(text);
String bestCategory = jc.bestCategory();
String details = jc.toString();
System.out.println("\tGot best category of: " + bestCategory);
System.out.println(jc.toString());
}}
}
}
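Assuming LingPipe's AbstractExternalizable.compileTo(...) and readObject(...) utilities are available (the model file name below is just a placeholder), a minimal sketch is to compile the trained classifier to disk once and, on later runs, skip the training loop and read the compiled model back:

// Sketch: save the compiled classifier once, reload it in later runs.
File modelFile = new File("newsClassifier.model");      // placeholder name

// After the training loop:
AbstractExternalizable.compileTo(classifier, modelFile);

// In a later run, instead of training again:
@SuppressWarnings("unchecked")
JointClassifier<CharSequence> compiledClassifier =
        (JointClassifier<CharSequence>) AbstractExternalizable.readObject(modelFile);
JointClassification jc = compiledClassifier.classify("some test text");
System.out.println(jc.bestCategory());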

Java Reading in Images Through a Loop

I'm reading a couple of images that have the exact same name but are numbered from 1-6, so I've used an array to read in the images, for example AstroWalkLeft1, AstroWalkLeft2 into arimgAstroWalkLeft[]. This is what I have:
public void GetImages() {
imgMonster = new ImageIcon("Assets\\MonsterSingle.png").getImage();
for (int i = 1; i <= nASTROIMGMAX; i++) {
arimgAstroWalkLeft[i] = new ImageIcon("Assets\\AstroWalkLeft" + i + ".png").getImage();
arimgAstroWalkRight[i] = new ImageIcon("Assets\\AstroWalkRight" + i + ".png").getImage();
}
imgAstroStandLeft = new ImageIcon("Assets\\AstroStandLeft.png").getImage();
imgAstroStandRight = new ImageIcon("Assets\\AstroStandRight.png").getImage();
imgBackground1 = new ImageIcon("Assets\\Hallway.png").getImage();
imgBackground2 = new ImageIcon("Assets\\Observation Room.png").getImage();
}
My problem is replacing the numbers in the image names with the variable from my loop. I'm wondering how to put that variable in where the number once was.
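The string concatenation in the loop already does that: "Assets\\AstroWalkLeft" + i + ".png" produces AstroWalkLeft1.png through AstroWalkLeft6.png as i runs from 1 to nASTROIMGMAX. Below is a small sketch of the same idea with a 0-based array, since Java arrays start at index 0 and a loop from 1 to nASTROIMGMAX would overrun an array of length nASTROIMGMAX:

// Sketch: 0-based arrays, file names built from the loop variable.
Image[] arimgAstroWalkLeft = new Image[nASTROIMGMAX];
Image[] arimgAstroWalkRight = new Image[nASTROIMGMAX];
for (int i = 0; i < nASTROIMGMAX; i++) {
    String n = String.valueOf(i + 1);   // AstroWalkLeft1 .. AstroWalkLeft6
    arimgAstroWalkLeft[i]  = new ImageIcon("Assets\\AstroWalkLeft"  + n + ".png").getImage();
    arimgAstroWalkRight[i] = new ImageIcon("Assets\\AstroWalkRight" + n + ".png").getImage();
}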
