Evaluate the class of a sample using WEKA - java

I have created a model in Weka using the SMO algorithm. I am trying to evaluate a test sample using the mentioned model to classify it in my two-class problem. I am a bit confused on how to evaluate the sample using Weka Smo code. I have built an empty arff file which contains only the meta-data of the file. I calculate the sample features and I add the vector in arff file. I have created the following function Evaluate in order to evaluate a sample. File template.arff is the template which contains the meta-data of a arff file and models/smo my model.
public static void Evaluate(ArrayList<Float> temp) throws Exception {
temp.add(Float.parseFloat("1"));
System.out.println(temp.size());
double dt[] = new double[temp.size()];
for (int index = 0; index < temp.size(); index++) {
dt[index] = temp.get(index);
}
double data[][] = new double[1][];
data[0] = dt;
weka.classifiers.Classifier c = loadModel(new File("models/"), "/smo"); // loads smo model
File tmp = new File("template.arff"); //loads data template
Instances dataset = new weka.core.converters.ConverterUtils.DataSource(tmp.getAbsolutePath()).getDataSet();
int numInstances = data.length;
for (int inst = 0; inst < numInstances; inst++) {
dataset.add(new Instance(1.0, data[inst]));
}
dataset.setClassIndex(dataset.numAttributes() - 1);
Evaluation eval = new Evaluation(dataset);
//returned evaluated index
double a = eval.evaluateModelOnceAndRecordPrediction(c, dataset.instance(0));
double arr[] = c.distributionForInstance(dataset.instance(0));
System.out.println(" Confidence Scores");
for (int idx = 0; idx < arr.length; idx++) {
System.out.print(arr[idx] + " ");
}
System.out.println();
}
I am not sure if I am right here. I create the sample file. Afterwards I am loading my model. I am wandering if my code is what I need in order to evaluate the class of sample temp. If this code is ok, how can I extract the confidence score and not the binary decision about the class? The structure of template.arff file is:
#relation Dataset
#attribute Attribute0 numeric
#attribute Attribute1 numeric
#attribute Attribute2 numeric
...
#ATTRIBUTE class {1, 2}
#data
Moreover loadModel function is the following:
public static SMO loadModel(File path, String name) throws Exception {
SMO classifier;
FileInputStream fis = new FileInputStream(path + name + ".model");
ObjectInputStream ois = new ObjectInputStream(fis);
classifier = (SMO) ois.readObject();
ois.close();
return classifier;
}
I found this post here which suggest to locate the SMO.java file and change the following line smo.buildClassifier(train, cl1, cl2, true, -1, -1); // from false to true.
However it seems when I did so, I got the same binary output.
My training function:
public void weka_train(File input, String[] options) throws Exception {
long start = System.nanoTime();
File tmp = new File("data.arff");
TwitterTrendSetters obj = new TwitterTrendSetters();
Instances data = new weka.core.converters.ConverterUtils.DataSource(
tmp.getAbsolutePath()).getDataSet();
data.setClassIndex(data.numAttributes() - 1);
Classifier c = null;
String ctype = null;
boolean newmodel = false;
ctype = "SMO";
c = new SMO();
for (int i = 0; i < options.length; i++) {
System.out.print(options[i]);
}
c.setOptions(options);
c.buildClassifier(data);
newmodel = true;
if (newmodel) {
obj.saveModel(c, ctype, new File("models"));
}
}

I have some suggestions but I have no idea whether they will work. Let me know if this works for you.
First use SMO not just the parent object Classifier class. I created a new method loadModelSMO as an example of this.
SMO Class
public static SMO loadModelSMO(File path, String name) throws Exception {
SMO classifier;
FileInputStream fis = new FileInputStream(path + name + ".model");
ObjectInputStream ois = new ObjectInputStream(fis);
classifier = (SMO) ois.readObject();
ois.close();
return classifier;
}
and then
SMO c = loadModelSMO(new File("models/"), "/smo");
...
I found a article that might help you out from the mailing list subject titled
I used SMO with logistic regression but I always get a confidence of 1.0
It suggest to set use the -M to fit your logistics model which can be used through the method
setOptions(java.lang.String[] options)
Also maybe you need to set your build logistics model to true
Confidence score in SMO
c.setBuildLogisticModels(true);
Let me know if this helped at all.

Basically you should try to use the option "-M" for SMO to fit logistic models, in training process. Check the solution proposed here. It should work!

Related

Load Naïve Bayes model in java code using weka jar

I have used weka and made a Naive Bayes classifier, by using weka GUI. Then I have saved this model by following this tutorial. Now I want to load this model through Java code but I am unable to find any way to load a saved model using weka.
This is my requirement that I have to made model separately and then use it in a separate program.
If anyone can guide me in this regard I will be thankful to you.
You can easily load a saved model in java using this command:
Classifier myCls = (Classifier) weka.core.SerializationHelper.read(pathToModel);
For a complete workflow in Java I wrote the following article in SO Documentation, now copied here:
Text Classification in Weka
Text Classification with LibLinear
Create training instances from .arff file
private static Instances getDataFromFile(String path) throws Exception{
DataSource source = new DataSource(path);
Instances data = source.getDataSet();
if (data.classIndex() == -1){
data.setClassIndex(data.numAttributes()-1);
//last attribute as class index
}
return data;
}
Instances trainingData = getDataFromFile(pathToArffFile);
Use StringToWordVector to transform your string attributes to number representation:
Important features of this filter:
tf-idf representation
stemming
lowercase words
stopwords
n-gram representation*
StringToWordVector() filter = new StringToWordVector();
filter.setWordsToKeep(1000000);
if(useIdf){
filter.setIDFTransform(true);
}
filter.setTFTransform(true);
filter.setLowerCaseTokens(true);
filter.setOutputWordCounts(true);
filter.setMinTermFreq(minTermFreq);
filter.setNormalizeDocLength(new SelectedTag(StringToWordVector.FILTER_NORMALIZE_ALL,StringToWordVector.TAGS_FILTER));
NGramTokenizer t = new NGramTokenizer();
t.setNGramMaxSize(maxGrams);
t.setNGramMinSize(minGrams);
filter.setTokenizer(t);
WordsFromFile stopwords = new WordsFromFile();
stopwords.setStopwords(new File("data/stopwords/stopwords.txt"));
filter.setStopwordsHandler(stopwords);
if (useStemmer){
Stemmer s = new /*Iterated*/LovinsStemmer();
filter.setStemmer(s);
}
filter.setInputFormat(trainingData);
Apply the filter to trainingData: trainingData = Filter.useFilter(trainingData, filter);
Create the LibLinear Classifier
SVMType 0 below corresponds to the L2-regularized logistic regression
Set setProbabilityEstimates(true) to print the output probabilities
Classifier cls = null;
LibLINEAR liblinear = new LibLINEAR();
liblinear.setSVMType(new SelectedTag(0, LibLINEAR.TAGS_SVMTYPE));
liblinear.setProbabilityEstimates(true);
// liblinear.setBias(1); // default value
cls = liblinear;
cls.buildClassifier(trainingData);
Save model
System.out.println("Saving the model...");
ObjectOutputStream oos;
oos = new ObjectOutputStream(new FileOutputStream(path+"mymodel.model"));
oos.writeObject(cls);
oos.flush();
oos.close();
Create testing instances from .arff file
Instances trainingData = getDataFromFile(pathToArffFile);
Load classifier
Classifier myCls = (Classifier) weka.core.SerializationHelper.read(path+"mymodel.model");
Use the same StringToWordVector filter as above or create a new one for testingData, but remember to use the trainingData for this command:filter.setInputFormat(trainingData); This will make training and testing instances compatible.
Alternatively you could use InputMappedClassifier
Apply the filter to testingData: testingData = Filter.useFilter(testingData, filter);
Classify!
1.Get the class value for every instance in the testing set
for (int j = 0; j < testingData.numInstances(); j++) {
double res = myCls.classifyInstance(testingData.get(j));
}
res is a double value that corresponds to the nominal class that is defined in .arff file. To get the nominal class use : testintData.classAttribute().value((int)res)
2.Get the probability distribution for every instance
for (int j = 0; j < testingData.numInstances(); j++) {
double[] dist = first.distributionForInstance(testInstances.get(j));
}
dist is a double array that contains the probabilities for every class defined in .arff file
Note. Classifier should support probability distributions and enable them with: myClassifier.setProbabilityEstimates(true);

weka.core.UnassignedClassException: Class index is negative (not set)!

I try to implement linear regression over an csv file. Here is the content of the csv file:
X1;X2;X3;X4;X5;X6;X7;X8;Y1;Y2;
0.98;514.50;294.00;110.25;7.00;2;0.00;0;15.55;21.33;
0.98;514.50;294.00;110.25;7.00;3;0.00;0;15.55;21.33;
0.98;514.50;294.00;110.25;7.00;4;0.00;0;15.55;21.33;
0.98;514.50;294.00;110.25;7.00;5;0.00;0;15.55;21.33;
0.90;563.50;318.50;122.50;7.00;2;0.00;0;20.84;28.28;
0.90;563.50;318.50;122.50;7.00;3;0.00;0;21.46;25.38;
0.90;563.50;318.50;122.50;7.00;4;0.00;0;20.71;25.16;
0.90;563.50;318.50;122.50;7.00;5;0.00;0;19.68;29.60;
0.86;588.00;294.00;147.00;7.00;2;0.00;0;19.50;27.30;
0.86;588.00;294.00;147.00;7.00;3;0.00;0;19.95;21.97;
0.86;588.00;294.00;147.00;7.00;4;0.00;0;19.34;23.49;
0.86;588.00;294.00;147.00;7.00;5;0.00;0;18.31;27.87;
0.82;612.50;318.50;147.00;7.00;2;0.00;0;17.05;23.77;
...
0.71;710.50;269.50;220.50;3.50;2;0.40;5;12.43;15.59;
0.71;710.50;269.50;220.50;3.50;3;0.40;5;12.63;14.58;
0.71;710.50;269.50;220.50;3.50;4;0.40;5;12.76;15.33;
0.71;710.50;269.50;220.50;3.50;5;0.40;5;12.42;15.31;
0.69;735.00;294.00;220.50;3.50;2;0.40;5;14.12;16.63;
0.69;735.00;294.00;220.50;3.50;3;0.40;5;14.28;15.87;
0.69;735.00;294.00;220.50;3.50;4;0.40;5;14.37;16.54;
0.69;735.00;294.00;220.50;3.50;5;0.40;5;14.21;16.74;
0.66;759.50;318.50;220.50;3.50;2;0.40;5;14.96;17.64;
0.66;759.50;318.50;220.50;3.50;3;0.40;5;14.92;17.79;
0.66;759.50;318.50;220.50;3.50;4;0.40;5;14.92;17.55;
0.66;759.50;318.50;220.50;3.50;5;0.40;5;15.16;18.06;
0.64;784.00;343.00;220.50;3.50;2;0.40;5;17.69;20.82;
0.64;784.00;343.00;220.50;3.50;3;0.40;5;18.19;20.21;
0.64;784.00;343.00;220.50;3.50;4;0.40;5;18.16;20.71;
0.64;784.00;343.00;220.50;3.50;5;0.40;5;17.88;21.40;
0.62;808.50;367.50;220.50;3.50;2;0.40;5;16.54;16.88;
0.62;808.50;367.50;220.50;3.50;3;0.40;5;16.44;17.11;
0.62;808.50;367.50;220.50;3.50;4;0.40;5;16.48;16.61;
0.62;808.50;367.50;220.50;3.50;5;0.40;5;16.64;16.03;
I read this csv file and implement linear regression implementation. Here is the source code in java:
public static void main(String[] args) throws IOException
{
String csvFile = null;
CSVLoader loader = null;
Remove remove =null;
Instances data =null;
LinearRegression model = null;
int numberofFeatures = 0;
try
{
csvFile = "C:\\Users\\Taha\\Desktop/ENB2012_data.csv";
loader = new CSVLoader();
// load CSV
loader.setSource(new File(csvFile));
data = loader.getDataSet();
//System.out.println(data);
numberofFeatures = data.numAttributes();
System.out.println("number of features: " + numberofFeatures);
data.setClassIndex(data.numAttributes() - 2);
//remove last attribute Y2
remove = new Remove();
remove.setOptions(new String[]{"-R", data.numAttributes()+""});
remove.setInputFormat(data);
data = Filter.useFilter(data, remove);
// data.setClassIndex(data.numAttributes() - 2);
model = new LinearRegression();
model.buildClassifier(data);
System.out.println(model);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
I am getting an error, weka.core.UnassignedClassException: Class index is negative (not set)! at the line model.buildClassifier(data); Number of features is 1, however, it is expected to be 9.They are X1;X2;X3;X4;X5;X6;X7;X8;Y1;Y2 What am I missing?
Thanks in advance.
You can add after the line data=loader.getDataSet(), the next lines which will resolve your exception:
if (data.classIndex() == -1) {
System.out.println("reset index...");
instances.setClassIndex(data.numAttributes() - 1);
}
This worked for me.
Since I can not find any solution to that problem, I decided to position data into Oracle database and I read data from Oracle. There is an import utility in Oracle Sql Developer and I used it. That solves my problem. I write this article for people who has the same problem.
Here is the detailed information about connecting an Oracle database for weka.
http://tahasozgen.blogspot.com.tr/2016/10/connection-to-oracle-database-in-weka.html

How to write and retrieve objects arraylist to file?

I have an object arraylist, can someone please help me by telling me the most efficient way to write AND retrieve an object from file?
Thanks.
My attempt
public static void LOLadd(String ab, String cd, int ef) throws IOException {
MyShelf newS = new MyShelf();
newS.Fbooks = ab;
newS.Bbooks = cd;
newS.Cbooks = ef;
InfoList.add(newS);
FileWriter fw;
fw = new FileWriter("UserInfo.out.txt");
PrintWriter outt = new PrintWriter(eh);
for (int i = 0; i <InfoList.size(); i++)
{
String ax = InfoList.get(i).Fbooks;
String ay = InfoList.get(i).Bbooks;
int az = InfoList.get(i).Cbooks;
output.print(ax + " " + ay + " " + az); //Output all the words to file // on the same line
output.println(""); //Make space
}
fw.close();
output.close();
}
My attempt to retrieve file. Also, after retrieving file, how can I read each column of Objects?? For example, if I have ::::: Fictions, Dramas, Plays --- How can I read, get, replace, delete, and add values to Dramas column?
public Object findUsername(String a) throws FileNotFoundException, IOException, ClassNotFoundException
{
ObjectInputStream sc = new ObjectInputStream(new FileInputStream("myShelf.out.txt"));
//ArrayList<Object> List = new ArrayList<Object>();
InfoList = null;
Object obj = (Object) sc.readObject();
InfoList.add((UserInfo) obj);
sc.close();
for (int i=0; i <InfoList.size(); i++) {
if (InfoList.get(i).user.equals(a)){
return "something" + InfoList.get(i);
}
}
return "doesn't exist";
}
public static String lbooksMatching(String b) {
//Retrieve data from file
//...
for (int i=0; i<myShelf.size(); i++) {
if (myShelf.get(i).user.equals (b))
{
return b;
}
else
{
return "dfbgregd";
}
}
return "dfgdfge";
}
public static String matching(String qp) {
for (int i=0; i<myShelf.size(); i++) {
if (myShelf.get(i).pass.equals (qp))
{
return c;
}
else
{
return "Begegne";
}
}
return "Bdfge";
}
Thanks!!!
It seems like you want to serialize an object and persist that serialized form to some kind of storage (in this case a file).
2 important remarks here :
Serialization
Internal java serialization
Java provides automatic serialization which requires that the object be marked by implementing the java.io.Serializable interface. Implementing the interface marks the class as "okay to serialize," and Java then handles serialization internally.
See this post for a code sample on how to serialize /
deserialize an object to/from bytes.
This might nog always be the ideal way to persist an object, as you have no control over the format (handled by java), it's not human readable, and you can versioning issues if your objects change.
Marshalling to JSON or XML
A better way to seralize an object to disk is to use another data format like XML or JSON.
A sample on how to convert an object to/from a JSON structure can be found here.
Important : I would not do the kind of serialization in code like you're doing unless there is a very good reason (that I don't see here). It quickly becomes messy and is subject to change when your objects change. I would opt for a more automated way of serializing. Also, when using a format like JSON / XML, you know that there are tons of APIs available to read/write to that format, so all of that serialization / deserialization logic doesn't need to be implemented by you anymore.
Persistence
Writing your serialized object to a file isn't always a good idea for various reasons (no versioning / concurrency issues / .....).
A better approach is to use a database. If it's a hierarchical database, take a look at Hibernate or JPA to persist your objects with very little code.
If it's a document database like MongoDB, you can persist your JSON serialized representation.
There are tons of resources available on persisting objects to databases in Java. I would suggest checking out JPA, the the standard API for persistence and object/relational mapping .
Here is another basic example, which will give you insight into Arraylist,constructor and writing output to file:
After running this, if you are using IDE go to project folder, there you will file *.txt file.
import java.io.*;
import java.util.List;
import java.util.ArrayList;
import java.util.logging.Level;
import java.util.logging.Logger;
public class ListOfNumbers {
private List<Integer> list;
private static final int SIZE = 10;//whatever size you wish
public ListOfNumbers () {
list = new ArrayList<Integer>(SIZE);
for (int i = 0; i < SIZE; i++) {
list.add(new Integer(i));
}
}
public void writeList() {
PrintWriter out = null;
try {
out = new PrintWriter(new FileWriter("ManOutFile.txt"));
for (int i = 0; i < SIZE; i++) {
out.println("Value at: " + i + " = " + list.get(i));
}
out.close();
} catch (IOException ex) {
Logger.getLogger(ListOfNumbers.class.getName()).log(Level.SEVERE, null, ex);
} finally {
out.close();
}
}
public static void main(String[] args)
{
ListOfNumbers lnum=new ListOfNumbers();
lnum.writeList();
}
}

How to test a Weka Text Classification (FilteredClassifier)

Looked at lots of examples for this, and so far no luck. I'd like to classify free text.
Configure a text classifier. (FilteredClassifier using StringToWordVector and LibSVM)
Train the classifier (add in lots of documents, train on filtered text)
Serialize the FilteredClassifier to disk, quit the app
Then later
Load up the serialized FilteredClassifier
Classify stuff!
It goes ok up to when I try to read from disk and classify things. All the documents and examples show the training list and testing list being built at the same time, and in my case, I'm trying to build a testing list after the fact.
A FilteredClassifier alone is not enough to create a testing Instance with the same "dictionary" as the original training set, so how do I save everything I need to classify at a later date?
http://weka.wikispaces.com/Use+WEKA+in+your+Java+code just says "Instances loaded from somewhere" and doesn't say anything about using a similar dictionary.
ClassifierFramework cf = new WekaSVM();
if (!cf.isTrained()) {
train(cf); // Train, save to disk
cf = new WekaSVM(); // reloads from file
}
cf.test("this is a test");
Ends up throwing
java.lang.ArrayIndexOutOfBoundsException: 2
at weka.core.DenseInstance.value(DenseInstance.java:332)
at weka.filters.unsupervised.attribute.StringToWordVector.convertInstancewoDocNorm(StringToWordVector.java:1587)
at weka.filters.unsupervised.attribute.StringToWordVector.input(StringToWordVector.java:688)
at weka.classifiers.meta.FilteredClassifier.filterInstance(FilteredClassifier.java:465)
at weka.classifiers.meta.FilteredClassifier.distributionForInstance(FilteredClassifier.java:495)
at weka.classifiers.AbstractClassifier.classifyInstance(AbstractClassifier.java:70)
at ratchetclassify.lab.WekaSVM.test(WekaSVM.java:125)
Serialize your Instances which holds the definition of the trained data -similar dictionary?- while you are serializing your classifier:
Instances trainInstances = ... //
Instances trainHeader = new Instances(trainInstances, 0);
trainHeader.setClassIndex(trainInstances .classIndex());
OutputStream os = new FileOutputStream(fileName);
ObjectOutputStream objectOutputStream = new ObjectOutputStream(os);
objectOutputStream.writeObject(classifier);
if (trainHeader != null)
objectOutputStream.writeObject(trainHeader);
objectOutputStream.flush();
objectOutputStream.close();
To desialize:
Classifier classifier = null;
Instances trainHeader = null;
InputStream is = new BufferedInputStream(new FileInputStream(fileName));
ObjectInputStream objectInputStream = new ObjectInputStream(is);
classifier = (Classifier) objectInputStream.readObject();
try { // see if we can load the header
trainHeader = (Instances) objectInputStream.readObject();
} catch (Exception e) {
}
objectInputStream.close();
Use trainHeader to create new Instance:
int numAttributes = trainHeader.numAttributes();
double[] vals = new double[numAttributes];
for (int i = 0; i < numAttributes - 1; i++) {
Attribute attribute = trainHeader.attribute(i);
//If your attribute is nominal or string:
double value = attribute.indexOfValue(myStrVal); //get myStrVal from your source
//If your attribute is numeric
double value = myNumericVal; //get myNumericVal from your source
vals[i] = value;
}
vals[numAttributes] = Instance.missingValue();
Instance instance = new Instance(1.0, vals);
instance.setDataset(trainHeader);
return instance;

Read From Binary File on Android

I have some data that I have saved into a file using Matlab. I have saved this data in Matlab as follows:
fwrite(fid,numImg2,'integer*4');
fwrite(fid,y,'integer*4');
fwrite(fid,imgName,'char*1');
fwrite(fid,a,'integer*4');
fwrite(fid,img.imageData,'double');
I read this data back into Matlab using the following code
fread(fid,1,'integer*4');// Returns numImg2
fread(fid,1,'integer*4');// Returns y which is the number of cha rectors in the image name, i use in the following line to read the image name, say for example if the image name is 1.jpg, then using the following will return the image name
fread(fid,5,'char*1');
fread(fid,1);
etc...
I want to be able to read this data on an android phone. This is the code I have at the moment.
DataInputStream ds = new DataInputStream(new FileInputStream(imageFile));
//String line;
// Read the first byte to find out how many images are stored in the file.
int x = 0;
byte numberOfImages;
int numImages = 0;
while(x<1)
{
numberOfImages = ds.readByte();
numImages = (int)numberOfImages;
x++;
}
int lengthName = 0;
String imgName = "";
for(int y=1; y<=numImages; y++)
{
lengthName = ds.readInt();
byte[] nameBuffer = new byte[lengthName];
char[] name = new char[lengthName];
for(int z = 1; z<=5;z++)
{
nameBuffer[z-1] = ds.readByte();
//name[z-1] = ds.readChar();
}
imgName = new String(nameBuffer);
//imgName = name.toString();
}
text.append(imgName);
I cannot seem to retrieve the image name as a string from the binary file data. Any help is much appreciated.
I'm not sure it will work but anyway:
byte[] nameBuffer = new byte[lengthName];
if(ds.read(nameBuffer) != lengthName) {
// error handling here
return;
}
imgName = new String(nameBuffer, "ISO-8859-1");

Categories