I want to make a list of all the predictions.
I have this code:
//Get File
BufferedReader reader = new BufferedReader(new FileReader(PATH + "TempArffFile.arff"));
//Get the data
Instances data = new Instances(reader);
reader.close();
//Setting class attribute
data.setClassIndex(data.numAttributes() - 1);
//Make tree
J48 tree = new J48();
String[] options = new String[1];
options[0] = "-U";
tree.setOptions(options);
tree.buildClassifier(data);
//Print tree
System.out.println(tree);
It works fine and I can see the tree printed, but I don't know how to work with it from here.
I want to make a list for each root; how can I do that?
If you would like a list of all the test predictions, you could use the following sample code:
import weka.core.Instances;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
...
Instances train = ... // from somewhere
Instances test = ... // from somewhere
// train classifier
Classifier cls = new J48();
cls.buildClassifier(train);
// evaluate classifier and print some statistics
Evaluation eval = new Evaluation(train);
eval.evaluateModel(cls, test);
System.out.println(eval.toSummaryString("\nResults\n======\n", false));
You could also call classifyInstance() on the trained J48 to predict a single instance, if you prefer to go that way.
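If you specifically want the individual predictions rather than summary statistics, Evaluation also collects them, so you can iterate over the list after evaluateModel(). A minimal sketch, assuming a Weka version where predictions() returns a list of Prediction objects:
import weka.classifiers.evaluation.NominalPrediction;
import weka.classifiers.evaluation.Prediction;
...
for (Prediction p : eval.predictions()) {
    NominalPrediction np = (NominalPrediction) p;
    // actual() and predicted() are class indices encoded as doubles
    System.out.println("actual=" + test.classAttribute().value((int) np.actual())
            + " predicted=" + test.classAttribute().value((int) np.predicted()));
}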
Related
I have used Weka and made a Naive Bayes classifier using the Weka GUI. Then I saved this model by following this tutorial. Now I want to load this model through Java code, but I am unable to find any way to load a saved model using Weka.
My requirement is that I have to build the model separately and then use it in a separate program.
If anyone can guide me in this regard, I will be thankful.
You can easily load a saved model in Java using this command:
Classifier myCls = (Classifier) weka.core.SerializationHelper.read(pathToModel);
For a complete workflow in Java I wrote the following article in SO Documentation, now copied here:
Text Classification in Weka
Text Classification with LibLinear
Create training instances from .arff file
private static Instances getDataFromFile(String path) throws Exception {
    DataSource source = new DataSource(path);
    Instances data = source.getDataSet();
    if (data.classIndex() == -1) {
        // use the last attribute as the class index
        data.setClassIndex(data.numAttributes() - 1);
    }
    return data;
}
Instances trainingData = getDataFromFile(pathToArffFile);
Use StringToWordVector to transform your string attributes into a numeric representation:
Important features of this filter:
tf-idf representation
stemming
lowercase words
stopwords
n-gram representation
StringToWordVector filter = new StringToWordVector();
filter.setWordsToKeep(1000000);
if (useIdf) {
    filter.setIDFTransform(true);
}
filter.setTFTransform(true);
filter.setLowerCaseTokens(true);
filter.setOutputWordCounts(true);
filter.setMinTermFreq(minTermFreq);
filter.setNormalizeDocLength(new SelectedTag(StringToWordVector.FILTER_NORMALIZE_ALL, StringToWordVector.TAGS_FILTER));
NGramTokenizer t = new NGramTokenizer();
t.setNGramMaxSize(maxGrams);
t.setNGramMinSize(minGrams);
filter.setTokenizer(t);
WordsFromFile stopwords = new WordsFromFile();
stopwords.setStopwords(new File("data/stopwords/stopwords.txt"));
filter.setStopwordsHandler(stopwords);
if (useStemmer) {
    Stemmer s = new /*Iterated*/LovinsStemmer();
    filter.setStemmer(s);
}
filter.setInputFormat(trainingData);
Apply the filter to trainingData: trainingData = Filter.useFilter(trainingData, filter);
Create the LibLinear Classifier
SVMType 0 below corresponds to the L2-regularized logistic regression
Set setProbabilityEstimates(true) to print the output probabilities
Classifier cls = null;
LibLINEAR liblinear = new LibLINEAR();
liblinear.setSVMType(new SelectedTag(0, LibLINEAR.TAGS_SVMTYPE));
liblinear.setProbabilityEstimates(true);
// liblinear.setBias(1); // default value
cls = liblinear;
cls.buildClassifier(trainingData);
Save model
System.out.println("Saving the model...");
ObjectOutputStream oos;
oos = new ObjectOutputStream(new FileOutputStream(path+"mymodel.model"));
oos.writeObject(cls);
oos.flush();
oos.close();
Create testing instances from .arff file
Instances testingData = getDataFromFile(pathToArffFile); // point this at the testing .arff
Load classifier
Classifier myCls = (Classifier) weka.core.SerializationHelper.read(path+"mymodel.model");
Use the same StringToWordVector filter as above, or create a new one for testingData, but remember to use the trainingData for this command: filter.setInputFormat(trainingData); This makes the training and testing instances compatible.
Alternatively you could use InputMappedClassifier
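For illustration, that alternative could look roughly like this (a sketch assuming your Weka version provides InputMappedClassifier.setModelPath(); check weka.classifiers.misc.InputMappedClassifier in your API docs):
InputMappedClassifier mapped = new InputMappedClassifier();
mapped.setModelPath(path + "mymodel.model"); // loads the saved model together with its training header
mapped.setSuppressMappingReport(true);
// incoming instances are mapped onto the training structure automatically
double res = mapped.classifyInstance(testingData.get(0));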
Apply the filter to testingData: testingData = Filter.useFilter(testingData, filter);
Classify!
1. Get the class value for every instance in the testing set
for (int j = 0; j < testingData.numInstances(); j++) {
    double res = myCls.classifyInstance(testingData.get(j));
}
res is a double value that corresponds to the nominal class defined in the .arff file. To get the nominal class use: testingData.classAttribute().value((int)res)
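For example, to collect the readable labels in one pass, a small sketch reusing the loop above (imports java.util.List and java.util.ArrayList):
List<String> predictedLabels = new ArrayList<String>();
for (int j = 0; j < testingData.numInstances(); j++) {
    double res = myCls.classifyInstance(testingData.get(j));
    predictedLabels.add(testingData.classAttribute().value((int) res));
}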
2. Get the probability distribution for every instance
for (int j = 0; j < testingData.numInstances(); j++) {
    double[] dist = myCls.distributionForInstance(testingData.get(j));
}
dist is a double array that contains the probabilities for every class defined in the .arff file.
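For example, to print each class label together with its probability (a small sketch using the class names from the header):
for (int k = 0; k < dist.length; k++) {
    System.out.println(testingData.classAttribute().value(k) + ": " + dist[k]);
}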
Note: the classifier should support probability distributions; enable them with myClassifier.setProbabilityEstimates(true);
I am trying to implement a document classifier with Mallet in Java. I already have a file that essentially contains feature values, so I don't want to run through an entire raw-text processing pipeline.
A line in my feature file currently looks like this (2 features, ID and NrOfTokens; the document label is "A"):
ID=3 NrofTokens=279.0 A
I try to read in this file and put it into a classifier like this:
Pipe instancePipe = new SerialPipes(new Pipe[] {
    new CharSequence2TokenSequence(),
    new TokenSequence2FeatureSequence(),
    new Target2Label(),
});
InstanceList trainData = new InstanceList(instancePipe);
InstanceList testData = new InstanceList(instancePipe);
Reader trainFileReader = new InputStreamReader(new FileInputStream(fileTrain), "UTF-8");
trainData.addThruPipe(new LineGroupIterator(trainFileReader, Pattern.compile("^\\s*$"), true));
Reader testFileReader = new InputStreamReader(new FileInputStream(fileTest), "UTF-8");
testData.addThruPipe(new LineGroupIterator(testFileReader, Pattern.compile("^\\s*$"), true));
// Create a classifier trainer, and use it to create a classifier
@SuppressWarnings("rawtypes")
ClassifierTrainer naiveBayesTrainer = new NaiveBayesTrainer();
Classifier classifier = naiveBayesTrainer.train(trainData);
At the moment I get this exception:
java.lang.IllegalArgumentException: Alphabets don't match: Instance: [6, null], InstanceList: [6, 0]
at cc.mallet.types.InstanceList.add(InstanceList.java:335)
at cc.mallet.types.InstanceList.addThruPipe(InstanceList.java:267)
at
Does anyone have an idea why the Alphabet is breaking?
This is not really an answer, but I have found the exceptions in Mallet not very informative so far. I also got this error; changing the regex that parses the data lines and removing an empty line at the end of the file made it go away.
That is, the regex in this part:
CsvIterator reader = new CsvIterator(new FileReader(tempTrainPath), "(\\w+)\\s+(\\S+)\\s+(.*)", 3, 2, 1);
testInstances.addThruPipe(reader);
After a whole day of debugging, I was too annoyed to try out which of the two was the actual culprit. But maybe this info helps other people.
I had the same error when trying to evaluate a classifier from the command line. Adding the --use-pipe-from train_input.mallet option, as described at https://mallet-dev.cs.umass.narkive.com/NFtumW1r/mallet-2-0-7-ge-maxent-alphabets-don-t-match, solved the problem.
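For reference, the import step with that option looks something like this (file names here are illustrative):
bin/mallet import-file --input test_input.txt --output test_input.mallet --use-pipe-from train_input.mallet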
I have followed the example code on the Weka website, but I still keep getting an error about not finding the class label:
weka.core.WekaException: weka.classifiers.bayes.NaiveBayesMultinomialUpdateable: Not enough training instances with class labels (required: 1, provided: 0)!
I have tried this file with the Weka Explorer and it works fine.
ArffLoader loader = new ArffLoader();
loader.setFile(new File("")); // file is valid
Instances structure = loader.getStructure();
structure.setClassIndex(structure.numAttributes() - 1);
// train NaiveBayes
NaiveBayesMultinomialUpdateable n = new NaiveBayesMultinomialUpdateable();
FilteredClassifier f = new FilteredClassifier();
StringToWordVector s = new StringToWordVector();
f.setFilter(s);
f.setClassifier(n);
f.buildClassifier(structure);
Instance current;
while ((current = loader.getNextInstance(structure)) != null)
    n.updateClassifier(current);
// output generated model
System.out.println(n);
The problem lies in the class index: it is at position 0, not the last attribute.
ArffLoader loader = new ArffLoader();
loader.setFile(new File("")); // file is valid
Instances structure = loader.getStructure();
structure.setClassIndex(0);
// train NaiveBayes
NaiveBayesMultinomialUpdateable n = new NaiveBayesMultinomialUpdateable();
FilteredClassifier f = new FilteredClassifier();
StringToWordVector s = new StringToWordVector();
f.setFilter(s);
f.setClassifier(n);
f.buildClassifier(structure);
Instance current;
while ((current = loader.getNextInstance(structure)) != null)
    n.updateClassifier(current);
// output generated model
System.out.println(n);
I am creating a prototype for my thesis using a model trained in Weka. My thesis is about emotion analysis on text. Now I have the test data that I want to classify using the trained model.
This is my partial code that reads the arff file and applies a filter (StringToWordVector):
Classify ct = new Classify("TextJ48.model"); // loads model
string sample = getARFFile();
StringBuilder buffer = new StringBuilder(sample);
BufferedReader reader = new BufferedReader(new java.io.StringReader(buffer.ToString()));
weka.core.converters.ArffLoader.ArffReader arff = new weka.core.converters.ArffLoader.ArffReader(reader);
Instances dataRaw = arff.getData();
StringToWordVector filter = new StringToWordVector();
filter.setInputFormat(dataRaw);
Instances dataFiltered = Filter.useFilter(dataRaw, filter);
When I display dataFiltered, it has successfully been filtered from words to numeric values.
This is the Classify class:
public Classify(string filename)
{
    try
    {
        classifier = (Classifier) weka.core.SerializationHelper.read(filename);
    }
    catch (java.lang.Exception ex)
    {
        lblProgress.Text = ex.getMessage();
    }
    loadAttributes();
    this.fileName = filename;
}
I don't know what to do in loadAttributes(). My plan is to add all the attributes to a FastVector. I saw in some sources that they add attributes easily because they have a fixed number of attributes, but in my case the number of attributes varies with the text.
Now, how do I classify the text that I input using the model?
I've looked at lots of examples for this, and so far no luck. I'd like to classify free text:
Configure a text classifier. (FilteredClassifier using StringToWordVector and LibSVM)
Train the classifier (add in lots of documents, train on filtered text)
Serialize the FilteredClassifier to disk, quit the app
Then later
Load up the serialized FilteredClassifier
Classify stuff!
It goes OK up to the point where I try to read from disk and classify things. All the documents and examples show the training list and testing list being built at the same time; in my case, I'm trying to build a testing list after the fact.
A FilteredClassifier alone is not enough to create a testing Instance with the same "dictionary" as the original training set, so how do I save everything I need to classify at a later date?
http://weka.wikispaces.com/Use+WEKA+in+your+Java+code just says "Instances loaded from somewhere" and doesn't say anything about using a similar dictionary.
ClassifierFramework cf = new WekaSVM();
if (!cf.isTrained()) {
    train(cf); // train, save to disk
    cf = new WekaSVM(); // reloads from file
}
cf.test("this is a test");
Ends up throwing
java.lang.ArrayIndexOutOfBoundsException: 2
at weka.core.DenseInstance.value(DenseInstance.java:332)
at weka.filters.unsupervised.attribute.StringToWordVector.convertInstancewoDocNorm(StringToWordVector.java:1587)
at weka.filters.unsupervised.attribute.StringToWordVector.input(StringToWordVector.java:688)
at weka.classifiers.meta.FilteredClassifier.filterInstance(FilteredClassifier.java:465)
at weka.classifiers.meta.FilteredClassifier.distributionForInstance(FilteredClassifier.java:495)
at weka.classifiers.AbstractClassifier.classifyInstance(AbstractClassifier.java:70)
at ratchetclassify.lab.WekaSVM.test(WekaSVM.java:125)
Serialize your Instances object, which holds the definition of the trained data (the "similar dictionary" you mention), at the same time as you serialize your classifier:
Instances trainInstances = ... // from somewhere
Instances trainHeader = new Instances(trainInstances, 0); // header only, no data rows
trainHeader.setClassIndex(trainInstances.classIndex());
OutputStream os = new FileOutputStream(fileName);
ObjectOutputStream objectOutputStream = new ObjectOutputStream(os);
objectOutputStream.writeObject(classifier);
if (trainHeader != null)
    objectOutputStream.writeObject(trainHeader);
objectOutputStream.flush();
objectOutputStream.close();
To deserialize:
Classifier classifier = null;
Instances trainHeader = null;
InputStream is = new BufferedInputStream(new FileInputStream(fileName));
ObjectInputStream objectInputStream = new ObjectInputStream(is);
classifier = (Classifier) objectInputStream.readObject();
try {
    // see if we can load the header
    trainHeader = (Instances) objectInputStream.readObject();
} catch (Exception e) {
    // no header was serialized with this model
}
objectInputStream.close();
Use trainHeader to create a new Instance:
int numAttributes = trainHeader.numAttributes();
double[] vals = new double[numAttributes];
for (int i = 0; i < numAttributes - 1; i++) {
    Attribute attribute = trainHeader.attribute(i);
    if (attribute.isNominal() || attribute.isString()) {
        // nominal or string attribute: store the index of the value
        vals[i] = attribute.indexOfValue(myStrVal); // get myStrVal from your source
    } else {
        // numeric attribute: store the value directly
        vals[i] = myNumericVal; // get myNumericVal from your source
    }
}
// the class value is unknown at prediction time; note the index is numAttributes - 1, not numAttributes
vals[numAttributes - 1] = Instance.missingValue();
Instance instance = new Instance(1.0, vals);
instance.setDataset(trainHeader);
return instance;
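With the instance built this way, classification is the usual call; a short sketch using the objects from the deserialization step above:
double pred = classifier.classifyInstance(instance);
String label = trainHeader.classAttribute().value((int) pred);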