Does libsvm work for multi output regression? - java

I have been trying to use jlibsvm, and I want to use it for multi-output regression.
For example, my feature set / inputs would be x1, x2, x3, and my outputs / target values would be y1, y2.
Is this possible using the libSVM library?
The API docs are not clear and there is no example app showing the use of jlibsvm, so I tried to modify the code inside legacyexec/svm_train.java.
The author originally created the app to use only one output/target value.
This can be seen in the part where the author reads the training file:
private void read_problem() throws IOException
{
    BufferedReader fp = new BufferedReader(new FileReader(input_file_name));
    Vector<Float> vy = new Vector<Float>();               // one target value per line
    Vector<SparseVector> vx = new Vector<SparseVector>(); // one sparse feature vector per line
    int max_index = 0;
    while (true)
    {
        String line = fp.readLine();
        if (line == null)
        {
            break;
        }
        StringTokenizer st = new StringTokenizer(line, " \t\n\r\f:");
        // the first token on each line is the single target value
        vy.addElement(Float.parseFloat(st.nextToken()));
        int m = st.countTokens() / 2;
        SparseVector x = new SparseVector(m);
        for (int j = 0; j < m; j++)
        {
            x.indexes[j] = Integer.parseInt(st.nextToken());
            x.values[j] = Float.parseFloat(st.nextToken());
        }
        if (m > 0)
        {
            max_index = Math.max(max_index, x.indexes[m - 1]);
        }
        vx.addElement(x);
    }
    // ... (the rest of the method builds the problem from vy and vx)
}
I tried to modify it so that the vector vy accepts a sparse vector with 2 values.
The program executes, but the model file seems to be wrong.
Can anyone verify whether they have used jlibsvm for multi-output SVM regression?
If yes, can someone explain how they achieved this?
If not, does someone know of a similar SVM implementation in Java?

The classic SVM algorithm does not support multi-dimensional outputs. One common workaround is to train a separate SVM model for each output dimension.
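For example, with the plain libsvm Java API (the libsvm.* classes used elsewhere on this page, not jlibsvm), a minimal sketch of that per-output workaround could look like the following. The class and method names (MultiOutputSvr, trainPerOutput, toNodes) are mine, and the parameter values are placeholders:

import libsvm.*;

public class MultiOutputSvr {

    // train one independent epsilon-SVR model per output dimension
    static svm_model[] trainPerOutput(double[][] X, double[][] Y) {
        svm_parameter param = new svm_parameter();
        param.svm_type = svm_parameter.EPSILON_SVR;
        param.kernel_type = svm_parameter.RBF;
        param.gamma = 1.0 / X[0].length;
        param.C = 1;
        param.p = 0.1;        // epsilon in the epsilon-insensitive loss
        param.eps = 0.001;
        param.cache_size = 100;

        int numOutputs = Y[0].length;
        svm_model[] models = new svm_model[numOutputs];
        for (int out = 0; out < numOutputs; out++) {
            svm_problem prob = new svm_problem();
            prob.l = X.length;
            prob.x = new svm_node[X.length][];
            prob.y = new double[X.length];
            for (int i = 0; i < X.length; i++) {
                prob.x[i] = toNodes(X[i]);
                prob.y[i] = Y[i][out]; // this model only sees output dimension `out`
            }
            models[out] = svm.svm_train(prob, param);
        }
        return models;
    }

    // predict all outputs (y1, y2, ...) for one input vector (x1, x2, x3, ...)
    static double[] predict(svm_model[] models, double[] features) {
        double[] y = new double[models.length];
        for (int out = 0; out < models.length; out++) {
            y[out] = svm.svm_predict(models[out], toNodes(features));
        }
        return y;
    }

    static svm_node[] toNodes(double[] features) {
        svm_node[] nodes = new svm_node[features.length];
        for (int j = 0; j < features.length; j++) {
            nodes[j] = new svm_node();
            nodes[j].index = j + 1; // libsvm feature indices are 1-based
            nodes[j].value = features[j];
        }
        return nodes;
    }
}

Note that this treats the outputs as independent: if y1 and y2 are correlated, that correlation is not modeled.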

Java read text file with maze and get all possible paths [closed]

EDIT: I have tried to store the lines character by character into a 2D array.
The problem is to get all possible paths through a maze, read from a text file, from 0 to 1. The asterisks are the walls/obstacles.
Maze looks like this
8,8
********
*0     *
*      *
*  **  *
*  **  *
*      *
*     1*
********
I'm not sure whether it's feasible to put it into a two-dimensional string array and then use recursion or dynamic programming on it.
Note that the only movements allowed are right and down. Also, the 0 start could be anywhere on the 2nd, 3rd, and so on column, and likewise for the 1 destination.
Any tips or suggestions will be appreciated, thank you in advance!
Yep, this is fairly easy to do:
Read the first line of the text file and parse out the dimensions.
Create an array of length n.
For every (blank) item in the array:
Create a new length-n array as the data.
Parse the next line of the text file as individual characters into the array.
After this, you'll have your n x n data structure to complete your game with; a sketch of these steps follows.
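For illustration, a minimal sketch of those steps (the file name and surrounding class are assumptions; each maze row ends up as a row of a char[][]):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

static char[][] readMaze(String path) throws IOException {
    try (BufferedReader br = new BufferedReader(new FileReader(path))) {
        String[] dims = br.readLine().split(",");  // header line, e.g. "8,8"
        int rows = Integer.parseInt(dims[0].trim());
        char[][] grid = new char[rows][];
        for (int r = 0; r < rows; r++) {
            grid[r] = br.readLine().toCharArray(); // one maze row per line
        }
        return grid;
    }
}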
Using a Map to store this file seems like a good idea, and I don't think reading the file character by character would be an issue:
BufferedReader br = new BufferedReader(new FileReader(file));
String line = br.readLine();
You have specified the grid dimensions, say n x n.
A simple way I can visualize this is by generating a unique key for every coordinate, using a small parser method to build the keys for the map:
public String parseCoordinate(int x, int y) {
    // use a separator so coordinates like (1, 23) and (12, 3) don't both become "123"
    return x + "," + y;
}

Map<String, Boolean> gridMap = new HashMap<>();
So when you read the file by characters, you can put the parsed coordinates as keys in the map:
gridMap.put(parseCoordinate(lineCount, characterCount), line.charAt(characterCount) == '*');
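Putting it together, the read loop might look like this (a sketch; the file variable and the surrounding method are assumptions):

BufferedReader br = new BufferedReader(new FileReader(file));
br.readLine();                                  // skip the "8,8" dimensions header
String line;
int lineCount = 0;
while ((line = br.readLine()) != null) {
    for (int characterCount = 0; characterCount < line.length(); characterCount++) {
        // true = wall ('*'), false = walkable
        gridMap.put(parseCoordinate(lineCount, characterCount),
                line.charAt(characterCount) == '*');
    }
    lineCount++;
}
br.close();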
I'm assuming the only problem you are facing is deciding how to read the file correctly for processing, before applying the algorithm to determine the number of unique paths in the given maze.
private static int[][] getMatrixFromFile(File f) throws IOException {
    // read the input file as a list of String lines
    List<String> lines = Files.lines(f.toPath())
            .collect(Collectors.toList());
    // get the dimensions of the maze from the first line, e.g. "8,8"
    String[] dimensions = lines.get(0).split(",");
    // initialize a sub-matrix of just the inner maze, ignoring the boundary walls
    int[][] mat = new int[Integer.parseInt(dimensions[0]) - 2][Integer.parseInt(dimensions[1]) - 2];
    // for each line in the maze excluding the boundaries:
    // if you encounter a '*', encode it as 0, else as 1
    for (int i = 2; i < lines.size() - 1; i++) {
        String currLine = lines.get(i);
        for (int j = 1; j < currLine.length() - 1; j++) {
            mat[i - 2][j - 1] = (currLine.charAt(j) == '*') ? 0 : 1;
        }
    }
    return mat;
}
With this in place you can now focus on the algorithm for actually traversing the matrix to determine the number of unique paths from top-left to bottom-right.
Having said that, once you have the above matrix you are not limited to traversing just top-left to bottom-right; any arbitrary points in your maze can serve as start and end points.
If you require help with figuring out the number of unique paths, I can edit to include that bit, but dynamic programming should help in getting the same.
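For reference, a sketch of that dynamic-programming step over the 0/1 matrix built above, assuming only right and down moves and that 0-cells are walls (start and end coordinates are parameters rather than fixed corners):

// counts paths from (startR, startC) to (endR, endC), moving only right or down
static long countPaths(int[][] mat, int startR, int startC, int endR, int endC) {
    long[][] dp = new long[mat.length][mat[0].length];
    dp[startR][startC] = mat[startR][startC]; // 0 paths if the start itself is a wall
    for (int r = startR; r <= endR; r++) {
        for (int c = startC; c <= endC; c++) {
            if (mat[r][c] == 0 || (r == startR && c == startC)) {
                continue; // wall, or the already-seeded start cell
            }
            long fromTop = (r > startR) ? dp[r - 1][c] : 0;
            long fromLeft = (c > startC) ? dp[r][c - 1] : 0;
            dp[r][c] = fromTop + fromLeft; // every path enters from above or from the left
        }
    }
    return dp[endR][endC];
}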
private char[][] maze;

private void read() {
    final InputStream inputStream = YourClass.class.getResourceAsStream(INPUT_PATH);
    final BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
    try {
        final String header = reader.readLine();
        final String[] tokens = header.split(",");
        if (tokens.length < 2) {
            throw new RuntimeException("Invalid header"); // use a dedicated exception
        }
        final int dimX = Integer.parseInt(tokens[0]);
        final int dimY = Integer.parseInt(tokens[1]);
        maze = new char[dimY][dimX]; // one row per line read below
        for (int i = 0; i < dimY; i++) {
            final String line = reader.readLine();
            maze[i] = line.toCharArray();
        }
    } catch (final IOException e) {
        // handle exception
    } finally {
        try {
            reader.close();
        } catch (IOException e) {
            // handle exception
        }
    }
}
Now, some assumptions: I assumed the first line contains the declaration of the maze size, so it is used to initialize the two-dimensional array. The other assumption is that you can make use of a char array, but that's pretty easy to change if you want.
From here you can start working on your path-finding algorithm.
By the way, the thing you're trying to implement reminds me a lot of a challenge from the Advent of Code series. There are a lot of people discussing their solutions to those challenges; just have a look on Reddit, for instance, and you'll find plenty of tips on how to go on with your little experiment.
Have fun!

How to express the reasoning for Weka instance classification?

Background:
If I open the Weka Explorer GUI, train a J48 tree, and test using the NSL-KDD training and testing datasets, a pruned tree is produced. The Weka Explorer GUI expresses the algorithm's reasoning for stating whether something would be classified as an anomaly or not in terms of queries such as src_bytes <= 28.
[Screenshot of the Weka Explorer GUI showing the pruned tree]
Question:
Referring to the pruned tree example produced by the Weka Explorer GUI, how can I programmatically have Weka express the reasoning for each instance classification in Java?
i.e. Instance A was classified as an anomaly because src_bytes < 28 && dst_host_srv_count < 88 && dst_bytes < 3, etc.
So far I've been able to:
Train and test a J48 tree on the NSL-KDD dataset.
Output a description of the J48 tree within Java.
Return the J48 tree as an if-then statement.
But I simply have no idea how, whilst iterating through each instance during the testing phase, to express the reasoning for each classification, without manually outputting the J48 tree as an if-then statement each time and adding numerous printlns expressing when each branch was triggered (which I'd really rather not do, as this would dramatically increase the human-intervention requirements in the long term).
Additional Screenshots:
[Screenshot of the description of the J48 tree within Java]
[Screenshot of the J48 tree as an if-then statement]
Code:
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.Remove;

public class Junction_Tree {

    String train_path = "KDDTrain+.arff";
    String test_path = "KDDTest+.arff";
    double accuracy;
    double recall;
    double precision;
    int correctPredictions;
    int incorrectPredictions;
    int numAnomaliesDetected;
    int numNetworkRecords;

    public void run() {
        try {
            Instances train = DataSource.read(train_path);
            Instances test = DataSource.read(test_path);
            train.setClassIndex(train.numAttributes() - 1);
            test.setClassIndex(test.numAttributes() - 1);
            if (!train.equalHeaders(test))
                throw new IllegalArgumentException("datasets are not compatible..");

            Remove rm = new Remove();
            rm.setAttributeIndices("1");
            J48 j48 = new J48();
            j48.setUnpruned(true);
            FilteredClassifier fc = new FilteredClassifier();
            fc.setFilter(rm);
            fc.setClassifier(j48);
            fc.buildClassifier(train);

            numAnomaliesDetected = 0;
            numNetworkRecords = 0;
            int n_ana_p = 0; // false positives: normal records predicted as anomalies
            int ana_p = 0;   // true positives: anomalies predicted as anomalies
            correctPredictions = 0;
            incorrectPredictions = 0;
            for (int i = 0; i < test.numInstances(); i++) {
                double pred = fc.classifyInstance(test.instance(i));
                String a = "anomaly";
                String actual = test.classAttribute().value((int) test.instance(i).classValue());
                String predicted = test.classAttribute().value((int) pred);
                if (actual.equalsIgnoreCase(a))
                    numAnomaliesDetected++;
                if (actual.equalsIgnoreCase(predicted))
                    correctPredictions++;
                else
                    incorrectPredictions++;
                if (actual.equalsIgnoreCase(a) && predicted.equalsIgnoreCase(a))
                    ana_p++;
                if (!actual.equalsIgnoreCase(a) && predicted.equalsIgnoreCase(a))
                    n_ana_p++;
                numNetworkRecords++;
            }
            // use floating-point arithmetic so the percentages aren't truncated
            accuracy = (correctPredictions * 100.0) / (correctPredictions + incorrectPredictions);
            recall = ana_p * 100.0 / numAnomaliesDetected;
            precision = ana_p * 100.0 / (ana_p + n_ana_p);
            System.out.println("\n\naccuracy: " + accuracy + ", Correct Predictions: " + correctPredictions
                    + ", Incorrect Predictions: " + incorrectPredictions);
            // writeFile is a helper defined elsewhere; toSource() takes the generated class name
            writeFile(j48.toSource("J48_if_then"));
            writeFile(j48.toString());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        Junction_Tree JT1 = new Junction_Tree();
        JT1.run();
    }
}
I have never used it myself, but according to the WEKA documentation the J48 class includes a getMembershipValues method, which should return an array indicating the node membership of an instance. One of the few mentions of this method appears to be in a thread on the WEKA forums.
Other than this, I can't find any information on possible alternatives beyond the one you mentioned.
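I haven't run this, but if getMembershipValues behaves as documented, a usage sketch might look like the following. It relies on J48 implementing Weka's PartitionGenerator interface (generatePartition, getMembershipValues, numElements), whose methods declare throws Exception; treat the details as assumptions to verify against your Weka version:

J48 j48 = new J48();
j48.generatePartition(train); // builds the tree and prepares the partition
for (int i = 0; i < test.numInstances(); i++) {
    // one entry per tree node; non-zero entries mark the nodes this instance reaches
    double[] membership = j48.getMembershipValues(test.instance(i));
    StringBuilder path = new StringBuilder("instance " + i + " -> nodes:");
    for (int node = 0; node < membership.length; node++) {
        if (membership[node] > 0) {
            path.append(' ').append(node);
        }
    }
    System.out.println(path);
}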

How to train data correctly using libsvm?

I want to use an SVM (support vector machine) in my program, but I cannot get correct results.
I want to know how data must be trained for an SVM.
What I am doing:
Say we have 5 documents (the numbers are just an example), 3 of them in the first category and the other 2 in the second category. I merge each category's documents together (meaning the 3 docs in the first category are merged into one document), and after that I build a train array like this:
double[][] train = new double[cat1.getDocument().getAttributes().size() + cat2.getDocument().getAttributes().size()][];
and I fill the array like this:
int i = 0;
Iterator<String> iteraitor = cat1.getDocument().getAttributes().keySet().iterator();
Iterator<String> iteraitor2 = cat2.getDocument().getAttributes().keySet().iterator();
while (i < train.length) {
    if (i < cat2.getDocument().getAttributes().size()) {
        while (iteraitor2.hasNext()) {
            String key = (String) iteraitor2.next();
            Long value = cat2.getDocument().getAttributes().get(key);
            double[] vals = { 0, value };
            train[i] = vals;
            i++;
            System.out.println(vals[0] + "," + vals[1]);
        }
    } else {
        while (iteraitor.hasNext()) {
            String key = (String) iteraitor.next();
            Long value = cat1.getDocument().getAttributes().get(key);
            double[] vals = { 1, value };
            train[i] = vals;
            i++;
            System.out.println(vals[0] + "," + vals[1]);
        }
        i++;
    }
}
and I continue like this to get the model:
svm_problem prob = new svm_problem();
int dataCount = train.length;
prob.y = new double[dataCount];
prob.l = dataCount;
prob.x = new svm_node[dataCount][];

for (int k = 0; k < dataCount; k++) {
    double[] features = train[k];
    prob.x[k] = new svm_node[features.length - 1];
    for (int j = 1; j < features.length; j++) {
        svm_node node = new svm_node();
        node.index = j;
        node.value = features[j];
        prob.x[k][j - 1] = node;
    }
    prob.y[k] = features[0];
}

svm_parameter param = new svm_parameter();
param.probability = 1;
param.gamma = 0.5;
param.nu = 0.5;
param.C = 1;
param.svm_type = svm_parameter.C_SVC;
param.kernel_type = svm_parameter.LINEAR;
param.cache_size = 20000;
param.eps = 0.001;

svm_model model = svm.svm_train(prob, param);
Is this way correct? If not, please help me to get it right.
EDIT: these two answers are correct: answer one, answer two.
Even without examining the code, one can find conceptual errors:

"Say we have 5 documents, 3 of them in the first category and the other 2 in the second category. I merge each category's documents together (meaning the 3 docs in the first category are merged into one document), and after that I build a train array"

So:
Training on 5 documents won't give any reasonable results with any machine learning model. These are statistical models, and there are no reasonable statistics to extract from 5 points in R^n, where n ~ 10,000.
You should not merge anything. Such an approach can work for Naive Bayes, which does not really treat documents as a "whole" but rather as probabilistic dependencies between features and classes. In an SVM, each document should be a separate point in the R^n space, where n can be the number of distinct words (for a bag-of-words/set-of-words representation).
A problem might be that you are not terminating each set of features in a training example with an index of -1, which you should do according to the README.
I.e. if you have one example with two features, I think you should do:

Index[0]: 0
Value[0]: 22
Index[1]: 1
Value[1]: 53
Index[2]: -1
Good luck!
Using SVMs to classify text is a common task. You can check out the research papers by Joachims [1] regarding SVM text classification.
Basically you have to:
1. Tokenize your documents
2. Remove stopwords
3. Apply a stemming technique
4. Apply a feature selection technique (see [2])
5. Transform your documents using the features selected in step 4 (a simple representation is binary: 0 = feature is absent, 1 = feature is present; other measures such as TFC also work)
6. Train your SVM and be happy :)
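To make steps 5 and 6 concrete, here is a minimal sketch (one way among several) that turns tokenized documents into binary feature vectors for libsvm. The vocab, docs, and labels variables are assumptions standing in for the output of steps 1-4:

import java.util.*;
import libsvm.*;

// binary bag-of-words: one svm_node per vocabulary word present in the document;
// absent features are simply omitted, matching libsvm's sparse format
static svm_node[] toBinaryVector(Set<String> docTokens, Map<String, Integer> vocab) {
    List<svm_node> nodes = new ArrayList<>();
    for (Map.Entry<String, Integer> e : vocab.entrySet()) {
        if (docTokens.contains(e.getKey())) {
            svm_node n = new svm_node();
            n.index = e.getValue(); // fixed, 1-based feature index
            n.value = 1.0;          // feature present
            nodes.add(n);
        }
    }
    nodes.sort((a, b) -> Integer.compare(a.index, b.index)); // libsvm expects ascending indices
    return nodes.toArray(new svm_node[0]);
}

// usage: one point per document, exactly as the previous answer recommends
// (docs: List<Set<String>> of tokens; labels: double[] of category labels)
svm_problem prob = new svm_problem();
prob.l = docs.size();
prob.x = new svm_node[prob.l][];
prob.y = new double[prob.l];
for (int d = 0; d < prob.l; d++) {
    prob.x[d] = toBinaryVector(docs.get(d), vocab);
    prob.y[d] = labels[d];
}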
[1] T. Joachims: Text Categorization with Support Vector Machines: Learning with Many Relevant Features; Springer: Heidelberg, Germany, 1998, doi:10.1007/BFb0026683.
[2] Y. Yang, J. O. Pedersen: A Comparative Study on Feature Selection in Text Categorization. International Conference on Machine Learning, 1997, 412-420.

How to compute the probability of a multi-class prediction using libsvm?

I'm using libsvm, and the documentation leads me to believe that there's a way to output the estimated probability of a predicted classification. Is this so? And if so, can anyone provide a clear example of how to do it in code?
Currently, I'm using the Java libraries in the following manner
SvmModel model = Svm.svm_train(problem, parameters);
SvmNode x[] = getAnArrayOfSvmNodesForProblem();
double predictedValue = Svm.svm_predict(model, x);
Given your code-snippet, I'm going to assume you want to use the Java API packaged with libSVM, rather than the more verbose one provided by jlibsvm.
To enable prediction with probability estimates, train a model with the svm_parameter field probability set to 1. Then, just change your code so that it calls the svm method svm_predict_probability rather than svm_predict.
Modifying your snippet, we have:
parameters.probability = 1;
svm_model model = svm.svm_train(problem, parameters);
svm_node x[] = problem.x[0]; // let's try the first data pt in problem
double[] prob_estimates = new double[NUM_LABEL_CLASSES];
svm.svm_predict_probability(model, x, prob_estimates);
It's worth knowing that training with multiclass probability estimates can change the predictions made by the classifier. For more on this, see the question Calculating Nearest Match to Mean/Stddev Pair With LibSVM.
The accepted answer worked like a charm. Make sure to set probability = 1 during training.
If you are trying to drop predictions when the confidence does not meet a threshold, here is a code sample:
// model:        a trained svm_model (with param.probability = 1)
// svmVector:    the svm_node[] for the instance being classified
// classLoadMap: maps libsvm label numbers back to class names
double[] confidenceScores = new double[model.nr_class];
svm.svm_predict_probability(model, svmVector, confidenceScores);

/*
System.out.println("text=" + text);
for (int i = 0; i < model.nr_class; i++) {
    System.out.println("i=" + i + ", labelNum:" + model.label[i]
            + ", name=" + classLoadMap.get(model.label[i])
            + ", score=" + confidenceScores[i]);
}
*/

// finding max confidence
int maxConfidenceIndex = 0;
double maxConfidence = confidenceScores[maxConfidenceIndex];
for (int i = 1; i < confidenceScores.length; i++) {
    if (confidenceScores[i] > maxConfidence) {
        maxConfidenceIndex = i;
        maxConfidence = confidenceScores[i];
    }
}

double threshold = 0.3; // set this based on the data & no. of classes
int labelNum = model.label[maxConfidenceIndex];
// reverse-map number to name
String targetClassLabel = classLoadMap.get(labelNum);
LOG.info("classNumber:{}, className:{}; confidence:{}; for text:{}",
        labelNum, targetClassLabel, maxConfidence, text);
if (maxConfidence < threshold) {
    LOG.info("Not enough confidence; threshold={}", threshold);
    targetClassLabel = null;
}
return targetClassLabel;

Java Buffered Reader Text File Parsing

I am really struggling with parsing a text file. I have a text file in the following format:
ID
Float Float
Float Float
.... // variable number of floats
END
ID
Float Float
Float Float
....
END
etc. However, the ID can take one of two values: 0, which means it is a new field, or -1, which means it is related to the last new field. The number of times a related field can repeat itself is unlimited, which is where the problem occurs.
I have a method in a library which takes an ArrayList of the new floats, then an ArrayList of ArrayLists of the related floats.
When I try to code the logic for this I just keep ending up with deeper and deeper embedded while loops.
I would really appreciate any suggestions as to how I should go about this. Thanks in advance.
Here is the code I have so far.
BufferedReader br = new BufferedReader(new FileReader(buildingsFile));
String[] line = br.readLine().trim().split(" ");
boolean original = true;
while (true)
{
    if (line[0].equals("END"))
        break;
    startCoordinate = new Coordinate(Double.parseDouble(line[0]), Double.parseDouble(line[1]));
    while (true)
    {
        line = br.readLine().trim().split(" ");
        if (!line[0].equals("END") && original == true)
            polypoints.add(new Coordinate(Double.parseDouble(line[0]), Double.parseDouble(line[1])));
        else if (!line[0].equals("END") && original == false)
            cutout.add(new Coordinate(Double.parseDouble(line[0]), Double.parseDouble(line[1])));
        else if (line[0].equals("END") && original == false)
        {
            cutouts.add(cutout);
            cutout.clear();
        }
        else if (line[0].equals("-99999"))
            original = false;
        else if (line[0].equals("0"))
            break;
    }
    buildingDB.addBuilding(mapName, startCoord, polypoints, cutouts);
}
New Code
int i = 0;
BufferedReader br = new BufferedReader(new FileReader(buildingsFile));
String[] line;
while (true)
{
    line = br.readLine().trim().split(" ");
    if (line[0].equals("END"))
        break;
    polygons.add(new Polygon(line));
    while (true)
    {
        line = br.readLine().trim().split(" ");
        if (line[0].equals("END"))
            break;
        polygons.get(i).addCoord(new Coordinate(Double.parseDouble(line[0]), Double.parseDouble(line[1])));
    }
    i++;
}
System.out.println(polygons.size());

int j = 0;
for (i = 0; i < polygons.size(); i++)
{
    Building newBuilding = new Building();
    if (polygons.get(i).isNew == true)
    {
        newBuilding = new Building();
        newBuilding.startCoord = new Coordinate(polygons.get(i).x, polygons.get(i).y);
    }
    while (polygons.get(i).isNew == false)
        newBuilding.cutouts.add(polygons.get(i).coords);
    buildings.add(newBuilding);
}

for (i = 0; i < buildings.size(); i++)
{
    System.out.println(i);
    buildingDB.addBuilding(mapName, buildings.get(i).startCoord, buildings.get(i).polypoint, buildings.get(i).cutouts);
}
Maybe you should use a map for the new floats and related floats. If I got your question right, it should help. Example:
HashMap<String, Double> hm = new HashMap<>();
hm.put("Rohit", 3434.34);
I assume that a "field" means an ID and a variable number of coordinates (pairs of floats), which, judging from your code, in fact represents a polygon.
I would first load all the polygons, each into a separate Polygon object:
class Polygon {
    boolean isNew;
    List<Coordinate> coordinates;
}
and store the polygons in another list. Then, in a second pass, go through all the polygons and group them according to their IDs into something like
class Building {
    Polygon polygon;
    List<Polygon> cutouts;
}
I think this would be fairly simple to code; a sketch of the second pass is below.
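For illustration, a sketch of that second pass, assuming isNew was set from the ID (0 = new polygon, -1 = related/cutout) while reading:

// group polygons into buildings: each new polygon starts a building,
// and the related polygons that follow become its cutouts
List<Building> buildings = new ArrayList<>();
Building current = null;
for (Polygon p : polygons) {
    if (p.isNew) {
        current = new Building();
        current.polygon = p;
        current.cutouts = new ArrayList<>();
        buildings.add(current);
    } else if (current != null) {
        current.cutouts.add(p); // related field: attach to the last new polygon
    }
}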
On the other hand, if you have a huge amount of data in the file, and/or you prefer processing the read data little by little, you could simply read a polygon and all its associated cutouts until you find the next polygon (ID of 0), at which point you pass the stuff read so far to the building DB and start reading the next polygon.
You can try using ANTLR here. The grammar defines the format of the text you are expecting, and then you can wrap the contents in a Java object. The * and + wildcards will resolve the complexity of the while and for loops. It's very simple and easy to use; you don't have to construct an AST, as you can take the parsed content from the Java objects directly. The only overhead is that you have to add the ANTLR jar to your path.
