I have the following code for classification using decision trees. I need to get the predictions for the test dataset into a Java array and print them. Can someone help me extend this code to do that? I need a 2D array of predicted label and actual label, and I need to print the predicted labels.
public class DecisionTreeClass {
public static void main(String args[]){
SparkConf sparkConf = new SparkConf().setAppName("DecisionTreeClass").setMaster("local[2]");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
// Load and parse the data file.
String datapath = "/home/thamali/Desktop/tlib.txt";
JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(jsc.sc(), datapath).toJavaRDD();//A training example used in supervised learning is called a “labeled point” in MLlib.
// Split the data into training and test sets (30% held out for testing)
JavaRDD<LabeledPoint>[] splits = data.randomSplit(new double[]{0.7, 0.3});
JavaRDD<LabeledPoint> trainingData = splits[0];
JavaRDD<LabeledPoint> testData = splits[1];
// Set parameters.
// Empty categoricalFeaturesInfo indicates all features are continuous.
Integer numClasses = 12;
Map<Integer, Integer> categoricalFeaturesInfo = new HashMap<>();
String impurity = "gini";
Integer maxDepth = 5;
Integer maxBins = 32;
// Train a DecisionTree model for classification.
final DecisionTreeModel model = DecisionTree.trainClassifier(trainingData, numClasses,
categoricalFeaturesInfo, impurity, maxDepth, maxBins);
// Evaluate model on test instances and compute test error
JavaPairRDD<Double, Double> predictionAndLabel =
testData.mapToPair(new PairFunction<LabeledPoint, Double, Double>() {
@Override
public Tuple2<Double, Double> call(LabeledPoint p) {
return new Tuple2<>(model.predict(p.features()), p.label());
}
});
Double testErr =
1.0 * predictionAndLabel.filter(new Function<Tuple2<Double, Double>, Boolean>() {
@Override
public Boolean call(Tuple2<Double, Double> pl) {
return !pl._1().equals(pl._2());
}
}).count() / testData.count();
System.out.println("Test Error: " + testErr);
System.out.println("Learned classification tree model:\n" + model.toDebugString());
}
}
You basically have exactly that in the predictionAndLabel variable. If you really need a list of 2-element double arrays, you could change the method that you use to:
JavaRDD<double[]> valuesAndPreds = testData.map(point -> new double[]{model.predict(point.features()), point.label()});
and run collect on that RDD to get a list of double arrays:
List<double[]> values = valuesAndPreds.collect();
I would take a look at the docs here: https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html . You can also get additional statistical performance measurements for your model with classes like MulticlassMetrics. This requires changing the mapToPair call to a map call and changing the generics to Object. So something like:
JavaRDD<Tuple2<Object, Object>> valuesAndPreds = testData.map(point -> new Tuple2<>(model.predict(point.features()), point.label()));
Then running:
MulticlassMetrics multiclassMetrics = new MulticlassMetrics(JavaRDD.toRDD(valuesAndPreds));
All of this is very well documented in Spark's MLlib documentation. Also, you mentioned needing to print the results. If this is homework, I will let you figure out that part, since it would be a good exercise to learn how to do that from a list.
Edit:
ALSO, I noticed that you are using Java 7, and what I have above is Java 8. To answer your main question of how to turn the result into a 2D double array, you would do:
JavaRDD<double[]> valuesAndPreds = testData.map(new org.apache.spark.api.java.function.Function<LabeledPoint, double[]>() {
@Override
public double[] call(LabeledPoint point) {
return new double[]{model.predict(point.features()), point.label()};
}
});
Then run collect to get a list of two-element double arrays. Also, as a hint for the printing part, take a look at the java.util.Arrays.toString implementation.
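To sketch that printing step without a Spark cluster, here is a pure-JDK stand-in (the hard-coded values replace what valuesAndPreds.collect() would return):

```java
import java.util.Arrays;
import java.util.List;

public class PrintPredictions {
    // Format one {prediction, label} pair the way the question asks for
    static String format(double[] pair) {
        return "predicted=" + pair[0] + " actual=" + pair[1];
    }

    public static void main(String[] args) {
        // Stand-in for valuesAndPreds.collect(); each entry is {prediction, label}
        List<double[]> values = Arrays.asList(new double[]{1.0, 1.0}, new double[]{2.0, 3.0});
        for (double[] pair : values) {
            System.out.println(format(pair));
        }
        // Or print a whole pair at once:
        System.out.println(Arrays.toString(values.get(0))); // [1.0, 1.0]
    }
}
```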
I am trying to solve the following:
Given a list of Data objects, try, in a 'one shot' like operation, to stream the list such that the end result is a generic object or a Data object where each property gets its own sum/max/min:
class Data {
int prop1;
int prop2;
...
// constructor
// getters and setters
}
For example, given a list of 2 Data objects as follows:
List<Data> list = Arrays.asList(new Data(1,2), new Data(3,4));
If I apply max to the first property and sum to the second one, the result is an object with prop1=3 and prop2=6, or Data(3,6).
Thanks for helping!
I am trying, in a 'one shot' like operation
You are looking for the Teeing Collector introduced in Java 12. Given a list of Data, where a Data class is something like:
@AllArgsConstructor
@Getter
@ToString
public class Data {
int prop1;
int prop2;
}
and a list:
List<Data> data = List.of(
new Data(1,10),
new Data(2,20),
new Data(3,30),
new Data(4,40)
);
the end result will be a generic object or a data object
if I apply max to the first prop and sum to the second
You can use Collectors.teeing to get a new Data object with the result of your operations
Data result =
data.stream().collect(Collectors.teeing(
Collectors.reducing(BinaryOperator.maxBy(Comparator.comparing(Data::getProp1))),
Collectors.summingInt(Data::getProp2),
(optionalMax, sum) -> new Data(optionalMax.get().getProp1(), sum)
));
Or something else, for example a Map<String,Integer>
Map<String,Integer> myMap =
data.stream().collect(Collectors.teeing(
Collectors.reducing(BinaryOperator.maxBy(Comparator.comparing(Data::getProp1))),
Collectors.summingInt(Data::getProp2),
(optionalMax, sum) -> {
HashMap<String, Integer> map = new HashMap<>();
map.put("max_prop1", optionalMax.get().getProp1());
map.put("sum_prop2", sum);
return map;
}
));
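For reference, here is a self-contained, runnable version of the first teeing example (Java 16+ for records; the record stands in for the Lombok-annotated class):

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

public class TeeingDemo {
    record Data(int prop1, int prop2) {}

    // One pass over the list: max of prop1 and sum of prop2, merged into a new Data
    static Data maxAndSum(List<Data> data) {
        return data.stream().collect(Collectors.teeing(
            Collectors.reducing(BinaryOperator.maxBy(Comparator.comparing(Data::prop1))),
            Collectors.summingInt(Data::prop2),
            (optionalMax, sum) -> new Data(optionalMax.orElseThrow().prop1(), sum)));
    }

    public static void main(String[] args) {
        // The question's example: max(1, 3) = 3 and 2 + 4 = 6
        System.out.println(maxAndSum(List.of(new Data(1, 2), new Data(3, 4)))); // Data[prop1=3, prop2=6]
    }
}
```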
EDIT
After reading your comments I better understood what you meant and need. If your final goal is to create a Data which holds the result of multiple computations, then the stream operation teeing is still a viable solution.
However, since teeing accepts only two downstreams and a merger BiFunction to merge their results, you need to nest your teeing calls: the first downstream of each call performs one of the operations you need, while the second downstream is another teeing call. Every nested call works this way until only two computations remain. The merger function of each outer call then takes the first downstream's result and the nested call's result, merges them together, and creates a new Data object from them.
Here is an example with a hypothetical Data class whose properties represent: min value, max value, average and sum:
@lombok.Data
class Data {
private @NonNull int prop1Min;
private @NonNull int prop2Max;
private @NonNull int prop3Avg;
private @NonNull int prop4Sum;
}
public class Main {
public static void main(String[] args) {
List<Data> data = List.of(
new Data(1, 10, 100, 1000),
new Data(2, 20, 200, 2000),
new Data(3, 30, 300, 3000),
new Data(4, 40, 400, 4000)
);
Data result = data.stream().collect(Collectors.teeing(
Collectors.minBy(Comparator.comparing(Data::getProp1Min)),
Collectors.teeing(
Collectors.maxBy(Comparator.comparing(Data::getProp2Max)),
Collectors.teeing(
Collectors.averagingInt(Data::getProp3Avg),
Collectors.summingInt(Data::getProp4Sum),
(avg, sum) -> new Data(0, 0, avg.intValue(), sum)),
(max, d) -> new Data(0, max.get().getProp2Max(), d.getProp3Avg(), d.getProp4Sum())
),
(min, d) -> new Data(min.get().getProp1Min(), d.getProp2Max(), d.getProp3Avg(), d.getProp4Sum())
));
System.out.println(result);
}
}
Output
Data(prop1Min=1, prop2Max=40, prop3Avg=250, prop4Sum=10000)
Previous Answer
It sounds like you're trying to retrieve some statistics from a stream of elements.
I am trying [...] to stream the list, such that the end result will be a generic object or a data object where each prop get its own sum/max/min etc.
For this purpose, there is already the IntSummaryStatistics class which includes a set of statistics gathered from a set of int elements. To obtain this information, you just need to stream your elements and invoke the collect operation by supplying Collectors.summarizingInt(); this will return the statistics of your elements. Moreover, Java also provides LongSummaryStatistics and DoubleSummaryStatistics to retrieve statistics of long and double types.
List<Integer> list = new ArrayList<>(List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9));
IntSummaryStatistics stats = list.stream()
.collect(Collectors.summarizingInt(Integer::intValue));
System.out.println("Count: " + stats.getCount());
System.out.println("Sum: " + stats.getSum());
System.out.println("Min Value: " + stats.getMin());
System.out.println("Max Value: " + stats.getMax());
System.out.println("Average: " + stats.getAverage());
// If Data is an actual required class rather than a stand-in for IntSummaryStatistics,
// you can set the properties you need from the IntSummaryStatistics:
Data d = new Data();
d.setMinProp(stats.getMin());
d.setMaxProp(stats.getMax());
d.setSumProp(stats.getSum());
//--------- Data class ---------
class Data {
private int minProp, maxProp, sumProp;
//... rest of the implementation ...
public void setMinProp(int minProp) {
this.minProp = minProp;
}
public void setMaxProp(int maxProp) {
this.maxProp = maxProp;
}
public void setSumProp(int sumProp) {
this.sumProp = sumProp;
}
}
Output
Count: 10
Sum: 45
Min Value: 0
Max Value: 9
Average: 4.5
Well, if they're only reducing functions, then you could utilize reduce as well:
Data someNewData = someData.stream()
.reduce((Data l, Data r) -> {
int a = l.prop1() + r.prop1(); // Find sum of prop1
int b = Math.max(l.prop2(), r.prop2()); // Find max value of prop2
int c = Math.min(l.prop3(), r.prop3()); // Find min value of prop3
return new Data(a, b, c);
})
.orElseThrow();
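A self-contained version of that reduce, with a hypothetical three-property record so the snippet compiles on its own:

```java
import java.util.List;

public class ReduceDemo {
    record Data(int prop1, int prop2, int prop3) {}

    // One pass over the list: sum prop1, max prop2, min prop3
    static Data combine(List<Data> someData) {
        return someData.stream()
            .reduce((l, r) -> new Data(
                l.prop1() + r.prop1(),           // sum of prop1
                Math.max(l.prop2(), r.prop2()),  // max of prop2
                Math.min(l.prop3(), r.prop3()))) // min of prop3
            .orElseThrow();
    }

    public static void main(String[] args) {
        Data result = combine(List.of(new Data(1, 5, 9), new Data(2, 7, 3)));
        System.out.println(result); // Data[prop1=3, prop2=7, prop3=3]
    }
}
```

Note that each lambda here must be associative for reduce to be correct; sum, max, and min all are.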
I recently asked about converting JSON using Gson into something I can sort values into, and the best option was using a LinkedHashMap.
List<String> stringList = Arrays.asList(tm.split(" |,")); // split into pair key : value
Map<String, List<String>> mapString = new LinkedHashMap<>();
stringList.forEach(s1 -> {
String[] splitedStrings = s1.split(": "); //split into key : value
String key = splitedStrings[0].replaceAll("[^A-Za-z0-9]",""); // remove non alphanumeric from key, like {
String value = splitedStrings[1];
if (mapString.get(key) == null) {
List<String> values = new ArrayList<>();
values.add(value);
mapString.put(key, values);
} else {
mapString.get(key).add(value);
}
});
When this code is run, a map with keys for frequency, magnitude, and other attributes of my data is created. This is the original Json Message compared to the resulting map value for the same set of data (Formatted to make it easier to understand and look better)
{"groupId":"id3_x_","timestamp":1.591712740507E9,"tones":
[{"frequency":1.074,"level":3.455,"bw":0.34,"snr":3.94,"median":0.877},
{"frequency":14.453,"level":2.656,"bw":0.391,"snr":2.324,"median":1.143},
{"frequency":24.902,"level":0.269,"bw":0.282,"snr":2.216,"median":0.121},
{"frequency":22.607,"level":0.375,"bw":0.424,"snr":2.034,"median":0.184},
{"frequency":9.863,"level":2.642,"bw":0.423,"snr":1.92,"median":1.376}]}
To Map values:
Message Received
Group ID: id3_x_
Message Topic: pi7/digest/tonals
Time of Arrival: 1.591712740507E9
---------------DATA---------------
Frequency: [1.07, 14.45, 24.90, 22.61, 9.86]
Magnitude: [3.46, 2.66, 0.27, 0.38, 2.64]
Bandwidth: [0.34, 0.39, 0.28, 0.42, 0.42]
SNR: [3.94, 2.32, 2.22, 2.03, 1.92]
Median: [0.88, 1.14, 0.12, 0.18, 1.38]
While this is very useful for analyzing the data, the information is stored as strings. What I would like to do is transform each of the values in the map (example: Frequency 1.07, 14.45, etc.) into doubles that I can then run through additional programs and calculations, such as an average. I have looked around online and haven't found what I am looking for, so I'm wondering if there is a way to transform these strings into doubles using an array, a list, or any other means.
I am an intern at a tech company, so I am still trying to get the hang of Java and of describing what I am talking about. If there are any questions about what I am asking, please let me know, and thanks in advance!
You could get a Map from the JSON file, then extract the values from the Map with yourMap.values(), and then parse each one of these elements and cast it into a double.
Example : Frequency: [1.07, 14.45, 24.90, 22.61, 9.86]
for (String f : frequency) {
double fDouble = Double.parseDouble(f); // turns a String into a double
}
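Extending that loop into something runnable that actually uses the parsed values (the variable names and the average step are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class ParseFrequencies {
    // Parse each String to a double and average them
    static double average(List<String> frequencies) {
        double sum = 0;
        for (String f : frequencies) {
            sum += Double.parseDouble(f); // turns a String into a double
        }
        return sum / frequencies.size();
    }

    public static void main(String[] args) {
        List<String> frequencies = Arrays.asList("1.07", "14.45", "24.90", "22.61", "9.86");
        System.out.println(average(frequencies)); // ≈ 14.578
    }
}
```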
You can do this with another class that stores duplicate attribute values in arrays. You can then simply get them through a.getValues(). This is just a concept, and you should extend it however is convenient for you.
import java.util.*;
public class Main {
public static void main(String[] args) {
Map<String, List<Attribute>> map = new LinkedHashMap<>();
List<Attribute> attributes = new ArrayList<>();
attributes.add(new Attribute("frequency", 3.46, 5.11, 6.12));
attributes.add(new Attribute("magnitude", 3.46, 10.22, 10.54));
//and so on
map.put("idString1", attributes);
//printing double values
for (String key : map.keySet()) {
for (Attribute a : map.get(key)) {
System.out.println(a.getName() + " " +Arrays.toString(a.getValues()));
//a.getValues() to get all of doubles
}
}
}
private static class Attribute {
private String name;
private double[] values;
Attribute(String name, double... values) {
this.name = name;
this.values = values;
}
String getName() {
return name;
}
double[] getValues() {
return values;
}
}
}
The result will be:
frequency [3.46, 5.11, 6.12]
magnitude [3.46, 10.22, 10.54]
Your question:
I would like to be able to do is transform each of the String values in the
map (Example: Frequency 1.07, 14.45, etc.) into doubles and run calculations with, such as an average.
Yes, it is possible to transform your String array in a double array using Stream like below:
String[] frequencies = { "1.07", "14.45", "24.90", "22.61", "9.86" };
double[] arr = Stream.of(frequencies)
.mapToDouble(Double::parseDouble)
.toArray();
If you use the DoubleSummaryStatistics class you have already available ops like avg, sum :
String[] frequencies = { "1.07", "14.45", "24.90", "22.61", "9.86" };
DoubleSummaryStatistics statistics = Stream.of(frequencies)
.mapToDouble(Double::parseDouble)
.summaryStatistics();
System.out.println(statistics.getAverage()); //<-- 14.578
System.out.println(statistics.getMax()); //<-- 24.9
I am currently learning sets and maps through university (still using Java 7).
They have given us a half finished to-do list app to complete. Currently the to-do list takes three String local variables to allow the user to state a job (aJob), a time to do it (aTime) and a date to do it (aDate).
The app also has an instance variable (today) that holds todays date.
I need to come up with a way to check the HashMap for any tasks that are due today. So I need to be able to query just the HashMap values attributed by the aDate local variable.
I know that to iterate Maps that I can place the keys or the values into a Set and then iterate over the set - not a problem. But if I use the values() method (within the Map class) to put these into a set - it places all three Strings per key into the set. I just want to move the aDate values into a set.
Any ideas?
I only seem to be able to find examples where the Maps have just a single Key and Single Value. This list has a single key and three values per key.
Any pointers would be good?
Kind Regards
Edit.....
Just thought I would add some code to help, as there have been several different approaches - which I am very grateful for - but I am not sure they suit my needs...
The Job Class is constructed as such...
public Job(String aJob, String aDate, String aTime)
{
job = aJob;
date = aDate;
time = aTime;
}
I then create the map within the instance declarations for the To Do List class....
Map<Integer, Job> toDoList = new HashMap<>();
So I need to know the best way to iterate over this map, but it is only the Job attribute 'aDate' that is possibly going to hold the value I am after.
Not sure if that helps at all?
Kind Regards
If really the only structure you're allowed to use is a Map where each key has 3 values (which is the case if I understand correctly), of which only one is a Date, you technically could do the following:
map.values()
.stream()
.filter(Date.class::isInstance)
...whatever else you want to do
The other suggested solutions are far better though, design wise.
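With the Map<Integer, Job> structure from the question's edit, a minimal sketch (getter names assumed, dates as plain strings) that pulls out only the jobs due today could look like this:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TodaysJobs {
    static class Job {
        final String job, date, time;
        Job(String aJob, String aDate, String aTime) { job = aJob; date = aDate; time = aTime; }
        String getDate() { return date; }
    }

    // Iterate the map's values and keep only jobs whose date matches today
    static List<Job> dueToday(Map<Integer, Job> toDoList, String today) {
        List<Job> due = new ArrayList<>();
        for (Job j : toDoList.values()) {
            if (today.equals(j.getDate())) {
                due.add(j);
            }
        }
        return due;
    }

    public static void main(String[] args) {
        Map<Integer, Job> toDoList = new HashMap<>();
        toDoList.put(1, new Job("Buy milk", "2024-01-01", "09:00"));
        toDoList.put(2, new Job("Email Bob", "2024-01-02", "10:00"));
        System.out.println(dueToday(toDoList, "2024-01-01").size()); // 1
    }
}
```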
If you can't use a custom class, as suggested by Toisen, maybe HashMap<String, HashMap<String, ArrayList<String>>> could do the trick for you.
I've added a sample of how to use it (as well as populating it with some random data)
public class FunkyMap {
private HashMap<String, HashMap<String, ArrayList<String>>> jobs;
// For random data
private String[] job = {"EAT", "SLEEP", "FART", "RELAX", "WORK"};
private String[] time = {"MORNING", "BEFORENOON", "NOON", "AFTERNOON", "EVENING", "MIDNIGHT"};
private String[] date = {"FIRST", "SECOND", "THIRD", "FOURTH"};
public FunkyMap() {
jobs = new HashMap<>();
// To populate some random data
Random r = new Random();
for(int i = 0; i < 20; i++) {
String d = date[r.nextInt(date.length)];
if(jobs.containsKey(d)) {
HashMap<String, ArrayList<String>> inner = jobs.get(d);
String t = time[r.nextInt(time.length)];
if(inner.containsKey(t)) {
inner.get(t).add(job[r.nextInt(job.length)]);
} else {
List<String> s = Arrays.asList(job[r.nextInt(job.length)]);
inner.put(t, new ArrayList<String>(s));
}
} else {
// Also record the first job for this date, so it isn't lost
HashMap<String, ArrayList<String>> newInner = new HashMap<>();
ArrayList<String> s = new ArrayList<>();
s.add(job[r.nextInt(job.length)]);
newInner.put(time[r.nextInt(time.length)], s);
jobs.put(d, newInner);
}
}
// Actual iteration over date => time => jobs
Iterator<String> i = jobs.keySet().iterator();
while(i.hasNext()) {
String iKey = i.next();
HashMap<String, ArrayList<String>> inner = jobs.get(iKey);
System.out.println("Jobs scheduled for " + iKey);
Iterator<String> j = inner.keySet().iterator();
while(j.hasNext()) {
String jKey = j.next();
ArrayList<String> actualJobs = inner.get(jKey);
System.out.println("\tAt " + jKey);
for(String s : actualJobs) {
System.out.println("\t\tDo " + s);
}
}
}
}
public static void main(String[] args) {
new FunkyMap();
}
}
I took the liberty to assume that dates were unique, and time was unique per date, while a time could hold any number of jobs including duplicates. If the last assumption with jobs is not true, you could swap ArrayList<String> with Set<String>.
Just create a class that holds all data that you need. E.g.
If you need something strange like Map<String, Tuple<String, Integer, Date>> just make a new class that holds the Tuple:
class TupleHolder {
private String firstValue;
private Integer secondValue;
private Date thirdValue;
// get/set here...
}
and use it: Map<String, TupleHolder>
I am having a hard time understanding the right syntax to sort Maps whose values aren't simply one type but can be nested again.
I'll try to come up with a fitting example here:
Let's make a random class for that first:
class NestedFoo{
int valA;
int valB;
String textA;
public NestedFoo(int a, int b, String t){
this.valA = a;
this.valB = b;
this.textA = t;
}
}
Alright, that is our class.
Here comes the list:
HashMap<Integer, ArrayList<NestedFoo>> sortmePlz = new HashMap<>();
Let's create 3 entries to start with, that should show sorting works already.
ArrayList<NestedFoo> l1 = new ArrayList<>();
NestedFoo n1 = new NestedFoo(3, 2, "a");
NestedFoo n2 = new NestedFoo(2, 2, "a");
NestedFoo n3 = new NestedFoo(1, 4, "c");
l1.add(n1);
l1.add(n2);
l1.add(n3);
ArrayList<NestedFoo> l2 = new ArrayList<>();
n1 = new NestedFoo(3,2,"a");
n2 = new NestedFoo(2,2,"a");
n3 = new NestedFoo(2,2,"b");
NestedFoo n4 = new NestedFoo(1, 4, "c");
l2.add(n1);
l2.add(n2);
l2.add(n3);
l2.add(n4);
ArrayList<NestedFoo> l3 = new ArrayList<>();
n1 = new NestedFoo(3,2,"a");
n2 = new NestedFoo(2,3,"b");
n3 = new NestedFoo(2,2,"b");
n4 = new NestedFoo(5,4,"c");
l3.add(n1);
l3.add(n2);
l3.add(n3);
l3.add(n4);
Sweet, now put them in our Map.
sortmePlz.put(5,l1);
sortmePlz.put(2,l2);
sortmePlz.put(1,l3);
What I want now is to sort the entire Map first by its keys, so the order should be l3, l2, l1.
Then, I want the lists inside each key to be sorted by the following Order:
valA, valB, textA (all ascending)
I have no idea how to do this, especially since Java 8 with all those lambdas; I tried to read up on the subject but feel overwhelmed by the code there.
Thanks in advance!
I hope the code has no syntactical errors; I made it up on the go.
You can use TreeMap instead of a regular HashMap and your entries will be automatically sorted by key:
Map<Integer, ArrayList<NestedFoo>> sortmePlz = new TreeMap<>();
The second step I'm a little confused about.
to be sorted by the following order: valA, valB, textA (all ascending)
I suppose you want to sort the list by comparing first the valA values, then, if they are equal, comparing by valB, and so on. If I understand you correctly, you can use a Comparator with comparing and thenComparing.
sortmePlz.values().forEach(list -> list
.sort(Comparator.comparing(NestedFoo::getValA)
.thenComparing(NestedFoo::getValB)
.thenComparing(NestedFoo::getTextA)));
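As a runnable illustration of that comparator chain (using a record for brevity, so the accessors are valA() rather than getValA()):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortDemo {
    record NestedFoo(int valA, int valB, String textA) {}

    // Sort ascending by valA, then valB, then textA
    static List<NestedFoo> sorted(List<NestedFoo> in) {
        List<NestedFoo> out = new ArrayList<>(in);
        out.sort(Comparator.comparingInt(NestedFoo::valA)
                .thenComparingInt(NestedFoo::valB)
                .thenComparing(NestedFoo::textA));
        return out;
    }

    public static void main(String[] args) {
        List<NestedFoo> sorted = sorted(List.of(
            new NestedFoo(3, 2, "a"),
            new NestedFoo(2, 2, "b"),
            new NestedFoo(2, 2, "a")));
        // (2,2,"a") comes first: equal valA and valB, so textA breaks the tie
        System.out.println(sorted.get(0)); // NestedFoo[valA=2, valB=2, textA=a]
    }
}
```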
I'm sure there are ways of doing it with lambdas, but that is not actually required. See the answer from Schidu Luca for a lambda-like solution.
Keep reading if you want an 'old school' solution.
You cannot sort a map. It does not make sense because there is no notion of order in a map. That said, some map implementations store keys in sorted order (like the TreeMap).
You can order a list. In your case, make the class NestedFoo comparable (https://docs.oracle.com/javase/8/docs/api/java/lang/Comparable.html). Then you can invoke Collections.sort (https://docs.oracle.com/javase/8/docs/api/java/util/Collections.html#sort-java.util.List-) on your lists.
Use TreeMap instead of HashMap, it solves the 1st problem: ordering entries by key.
After getting the needed list from the Map, you can sort the ArrayList by valA, valB, text:
l1.sort(
Comparator.comparing(NestedFoo::getValA).thenComparing(NestedFoo::getValB).thenComparing(NestedFoo::getTextA)
);
And change your NestedFoo class definition like this:
class NestedFoo {
int valA;
int valB;
String textA;
public NestedFoo(int a, int b, String t) {
this.valA = a;
this.valB = b;
this.textA = t;
}
public int getValA() {
return valA;
}
public void setValA(int valA) {
this.valA = valA;
}
public int getValB() {
return valB;
}
public void setValB(int valB) {
this.valB = valB;
}
public String getTextA() {
return textA;
}
public void setTextA(String textA) {
this.textA = textA;
}
}
When using a TreeMap for sorting, keep in mind that a TreeMap uses compareTo instead of equals both to sort and to detect duplicates. compareTo should be consistent with equals and hashCode when implemented for any object that will be used as a key. You can find a detailed example at this link: https://codingninjaonline.com/2017/09/29/unexpected-results-for-treemap-with-inconsistent-compareto-and-equals/
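BigDecimal is the classic illustration of this pitfall: "1.0" and "1.00" are equal by compareTo but not by equals, so a TreeMap collapses them into one key while a HashMap keeps both:

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class TreeMapConsistency {
    // Insert two keys that are compareTo-equal but not equals()-equal,
    // and report how many distinct keys the map ends up with
    static int size(Map<BigDecimal, String> map) {
        map.put(new BigDecimal("1.0"), "a");
        map.put(new BigDecimal("1.00"), "b");
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(size(new HashMap<>())); // 2 - distinct by equals/hashCode
        System.out.println(size(new TreeMap<>())); // 1 - compareTo treats them as duplicates
    }
}
```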
Can anybody help me by providing a libsvm Java example for training and testing? I am new to machine learning and need help with this. The example provided earlier by @Machine Learner has an error: it gives only one class as the result. I don't want to use Weka, as suggested in the earlier post.
Or can you fix the error in this code? It always predicts one class in the result (I want to perform multiclass classification). This example was given by "Machine Learner":
import java.io.*;
import java.util.*;
import libsvm.*;
public class Test{
public static void main(String[] args) throws Exception{
// Preparing the SVM param
svm_parameter param=new svm_parameter();
param.svm_type=svm_parameter.C_SVC;
param.kernel_type=svm_parameter.RBF;
param.gamma=0.5;
param.nu=0.5;
param.cache_size=20000;
param.C=1;
param.eps=0.001;
param.p=0.1;
HashMap<Integer, HashMap<Integer, Double>> featuresTraining=new HashMap<Integer, HashMap<Integer, Double>>();
HashMap<Integer, Integer> labelTraining=new HashMap<Integer, Integer>();
HashMap<Integer, HashMap<Integer, Double>> featuresTesting=new HashMap<Integer, HashMap<Integer, Double>>();
HashSet<Integer> features=new HashSet<Integer>();
//Read in training data
BufferedReader reader=null;
try{
reader=new BufferedReader(new FileReader("a1a.train"));
String line=null;
int lineNum=0;
while((line=reader.readLine())!=null){
featuresTraining.put(lineNum, new HashMap<Integer,Double>());
String[] tokens=line.split("\\s+");
int label=Integer.parseInt(tokens[0]);
labelTraining.put(lineNum, label);
for(int i=1;i<tokens.length;i++){
String[] fields=tokens[i].split(":");
int featureId=Integer.parseInt(fields[0]);
double featureValue=Double.parseDouble(fields[1]);
features.add(featureId);
featuresTraining.get(lineNum).put(featureId, featureValue);
}
lineNum++;
}
reader.close();
}catch (Exception e){
e.printStackTrace(); // don't silently swallow file-reading errors
}
//Read in test data
try{
reader=new BufferedReader(new FileReader("a1a.t"));
String line=null;
int lineNum=0;
while((line=reader.readLine())!=null){
featuresTesting.put(lineNum, new HashMap<Integer,Double>());
String[] tokens=line.split("\\s+");
for(int i=1; i<tokens.length;i++){
String[] fields=tokens[i].split(":");
int featureId=Integer.parseInt(fields[0]);
double featureValue=Double.parseDouble(fields[1]);
featuresTesting.get(lineNum).put(featureId, featureValue);
}
lineNum++;
}
reader.close();
}catch (Exception e){
e.printStackTrace(); // don't silently swallow file-reading errors
}
//Train the SVM model
svm_problem prob=new svm_problem();
int numTrainingInstances=featuresTraining.keySet().size();
prob.l=numTrainingInstances;
prob.y=new double[prob.l];
prob.x=new svm_node[prob.l][];
for(int i=0;i<numTrainingInstances;i++){
HashMap<Integer,Double> tmp=featuresTraining.get(i);
prob.x[i]=new svm_node[tmp.keySet().size()];
int indx=0;
for(Integer id:tmp.keySet()){
svm_node node=new svm_node();
node.index=id;
node.value=tmp.get(id);
prob.x[i][indx]=node;
indx++;
}
prob.y[i]=labelTraining.get(i);
}
svm_model model=svm.svm_train(prob,param);
for(Integer testInstance:featuresTesting.keySet()){
HashMap<Integer, Double> tmp=new HashMap<Integer, Double>();
int numFeatures=tmp.keySet().size();
svm_node[] x=new svm_node[numFeatures];
int featureIndx=0;
for(Integer feature:tmp.keySet()){
x[featureIndx]=new svm_node();
x[featureIndx].index=feature;
x[featureIndx].value=tmp.get(feature);
featureIndx++;
}
double d=svm.svm_predict(model, x);
System.out.println(testInstance+"\t"+d);
}
}
}
It is because your featuresTesting is never used, HashMap<Integer, Double> tmp=new HashMap<Integer, Double>(); should be HashMap<Integer, Double> tmp=featuresTesting.get(testInstance);
You could use the Java-ML library to classify your data.
Here is a sample with Java-ML:
Classifier clas = new LibSVM();
clas.buildClassifier(data);
Dataset dataForClassification= FileHandler.loadDataset(new File(.), 0, ",");
/* Counters for correct and wrong predictions. */
int correct = 0, wrong = 0;
/* Classify all instances and check with the correct class values */
for (Instance inst : dataForClassification) {
Object predictedClassValue = clas.classify(inst);
Map<Object,Double> map = clas.classDistribution(inst);
Object realClassValue = inst.classValue();
if (predictedClassValue.equals(realClassValue))
correct++;
else
wrong++;
}
It seems like you're having trouble understanding what you're doing and are just copying code from here and there. It would help you to understand basic machine learning first. For example, you should probably read the practical guide to SVM classification from the authors of LIBSVM (the library you use). The advice you got here, that you should take an introductory machine learning course online, is probably even better.
Let me also give you two big tips that may save you time if you're getting all results of the same class:
Are you normalizing your data, making all values lie between 0 and 1 (or between -1 and 1), either linearly or using the mean and the standard deviation? It doesn't seem from your code like you are.
Are you searching for a good value of the parameter C (or C and gamma in the case of an RBF kernel), doing cross-validation or using a hold-out set? It doesn't seem from your code that you are.
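On the normalization point, here is a minimal min-max scaling sketch for one feature column (a linear rescale to [0, 1]; assumes the column is non-empty and max > min):

```java
import java.util.Arrays;

public class MinMaxScale {
    // Linearly rescale values so the minimum maps to 0 and the maximum to 1
    static double[] scale(double[] values) {
        double min = values[0], max = values[0];
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (values[i] - min) / (max - min);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] scaled = scale(new double[]{2.0, 4.0, 6.0});
        System.out.println(Arrays.toString(scaled)); // [0.0, 0.5, 1.0]
    }
}
```

Remember to apply the training set's min and max to the test set as well, rather than recomputing them on the test data.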
A) No one knows what you are referencing. Give links if you want people to understand what you are referring to.
B) You need to take a course on machine learning; there is a free one on Coursera. The output of a model depends on the data itself and is heavily influenced by the model parameters. The model parameters are affected by scaling, and you generally need to do a search for them. Your code incorporates none of this, and you have made it clear you are new to machine learning. You will waste hours, days, and weeks on what could be done in a few minutes by obtaining the necessary background knowledge.
C) There are numerous versions of LIBSVM for Java, and you have provided no indication of which one you are using. Each one works somewhat differently.