LIBSVM Java training and testing example (also in real time)

Can anybody help me by providing a LIBSVM Java example for training and testing? I am new to machine learning and need help with this. The example provided earlier by "Machine learner" has an error: it gives only one class as the result. I don't want to use Weka, as suggested in an earlier post.
Alternatively, can you fix the error in this code? It always predicts a single class (I want to perform multiclass classification). This example was given by "Machine learner":
import java.io.*;
import java.util.*;
import libsvm.*;
public class Test{
public static void main(String[] args) throws Exception{
// Preparing the SVM param
svm_parameter param=new svm_parameter();
param.svm_type=svm_parameter.C_SVC;
param.kernel_type=svm_parameter.RBF;
param.gamma=0.5;
param.nu=0.5;
param.cache_size=20000;
param.C=1;
param.eps=0.001;
param.p=0.1;
HashMap<Integer, HashMap<Integer, Double>> featuresTraining=new HashMap<Integer, HashMap<Integer, Double>>();
HashMap<Integer, Integer> labelTraining=new HashMap<Integer, Integer>();
HashMap<Integer, HashMap<Integer, Double>> featuresTesting=new HashMap<Integer, HashMap<Integer, Double>>();
HashSet<Integer> features=new HashSet<Integer>();
//Read in training data
BufferedReader reader=null;
try{
reader=new BufferedReader(new FileReader("a1a.train"));
String line=null;
int lineNum=0;
while((line=reader.readLine())!=null){
featuresTraining.put(lineNum, new HashMap<Integer,Double>());
String[] tokens=line.split("\\s+");
int label=Integer.parseInt(tokens[0]);
labelTraining.put(lineNum, label);
for(int i=1;i<tokens.length;i++){
String[] fields=tokens[i].split(":");
int featureId=Integer.parseInt(fields[0]);
double featureValue=Double.parseDouble(fields[1]);
features.add(featureId);
featuresTraining.get(lineNum).put(featureId, featureValue);
}
lineNum++;
}
reader.close();
}catch (Exception e){
throw e; // don't swallow read errors: training on an empty problem gives meaningless results
}
//Read in test data
try{
reader=new BufferedReader(new FileReader("a1a.t"));
String line=null;
int lineNum=0;
while((line=reader.readLine())!=null){
featuresTesting.put(lineNum, new HashMap<Integer,Double>());
String[] tokens=line.split("\\s+");
for(int i=1; i<tokens.length;i++){
String[] fields=tokens[i].split(":");
int featureId=Integer.parseInt(fields[0]);
double featureValue=Double.parseDouble(fields[1]);
featuresTesting.get(lineNum).put(featureId, featureValue);
}
lineNum++;
}
reader.close();
}catch (Exception e){
throw e; // don't swallow read errors
}
//Train the SVM model
svm_problem prob=new svm_problem();
int numTrainingInstances=featuresTraining.keySet().size();
prob.l=numTrainingInstances;
prob.y=new double[prob.l];
prob.x=new svm_node[prob.l][];
for(int i=0;i<numTrainingInstances;i++){
HashMap<Integer,Double> tmp=featuresTraining.get(i);
prob.x[i]=new svm_node[tmp.keySet().size()];
int indx=0;
for(Integer id:new TreeSet<Integer>(tmp.keySet())){ // LIBSVM expects feature indices in ascending order
svm_node node=new svm_node();
node.index=id;
node.value=tmp.get(id);
prob.x[i][indx]=node;
indx++;
}
prob.y[i]=labelTraining.get(i);
}
svm_model model=svm.svm_train(prob,param);
for(Integer testInstance:featuresTesting.keySet()){
HashMap<Integer, Double> tmp=new HashMap<Integer, Double>();
int numFeatures=tmp.keySet().size();
svm_node[] x=new svm_node[numFeatures];
int featureIndx=0;
for(Integer feature:tmp.keySet()){
x[featureIndx]=new svm_node();
x[featureIndx].index=feature;
x[featureIndx].value=tmp.get(feature);
featureIndx++;
}
double d=svm.svm_predict(model, x);
System.out.println(testInstance+"\t"+d);
}
}
}

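It is because your featuresTesting map is never used. In the prediction loop, HashMap<Integer, Double> tmp=new HashMap<Integer, Double>(); creates an empty map, so every test vector is empty and the model returns the same class for all of them. That line should be HashMap<Integer, Double> tmp=featuresTesting.get(testInstance);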
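Here is a minimal sketch of the parsing side of the fix. It uses only the standard library, so the libsvm calls are left as a comment, and the class name SparseLine is hypothetical. Each line in the `label index:value ...` format is parsed into a TreeMap, so the features come out in ascending index order, which the LIBSVM README requires for svm_node arrays. A HashMap's iteration order is unspecified. Unlike the question's test-reading loop, the label of each test line is kept, so it can be compared with the prediction.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class SparseLine {
    final double label;     // first token of the line (the class label)
    final int[] indices;    // feature indices, ascending
    final double[] values;  // feature values, aligned with indices

    SparseLine(double label, int[] indices, double[] values) {
        this.label = label;
        this.indices = indices;
        this.values = values;
    }

    // Parse one line of LIBSVM sparse format: "label idx:val idx:val ...".
    static SparseLine parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        double label = Double.parseDouble(tokens[0]);
        // TreeMap keeps the features sorted by index, unlike HashMap.
        TreeMap<Integer, Double> features = new TreeMap<>();
        for (int i = 1; i < tokens.length; i++) {
            String[] fields = tokens[i].split(":");
            features.put(Integer.parseInt(fields[0]), Double.parseDouble(fields[1]));
        }
        int[] indices = new int[features.size()];
        double[] values = new double[features.size()];
        int k = 0;
        for (Map.Entry<Integer, Double> e : features.entrySet()) {
            indices[k] = e.getKey();
            values[k] = e.getValue();
            k++;
        }
        // With libsvm on the classpath: x[k] = new svm_node(); x[k].index = indices[k];
        // x[k].value = values[k]; then svm.svm_predict(model, x) for each test line.
        return new SparseLine(label, indices, values);
    }

    public static void main(String[] args) {
        SparseLine s = parse("2 7:0.5 3:1 12:-0.25");
        System.out.println(s.label + " " + Arrays.toString(s.indices));
    }
}
```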

You could use the Java-ML library to classify your data.
Here is sample code with Java-ML:
Classifier clas = new LibSVM();
clas.buildClassifier(data);
Dataset dataForClassification= FileHandler.loadDataset(new File(.), 0, ",");
/* Counters for correct and wrong predictions. */
int correct = 0, wrong = 0;
/* Classify all instances and check with the correct class values */
for (Instance inst : dataForClassification) {
Object predictedClassValue = clas.classify(inst);
Map<Object,Double> map = clas.classDistribution(inst);
Object realClassValue = inst.classValue();
if (predictedClassValue.equals(realClassValue))
correct++;
else
wrong++;
}
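You can extend the correct/wrong counters above into a confusion matrix to see whether a model is stuck on one class. Here is a minimal stdlib sketch. It assumes numeric labels as in LIBSVM files, and ConfusionMatrix and its methods are hypothetical names, not part of Java-ML or LIBSVM.

```java
import java.util.Map;
import java.util.TreeMap;

public class ConfusionMatrix {
    // counts.get(actual).get(predicted) = number of instances with that pair
    private final TreeMap<Double, TreeMap<Double, Integer>> counts = new TreeMap<>();
    private int correct = 0;
    private int total = 0;

    void add(double actual, double predicted) {
        TreeMap<Double, Integer> row = counts.get(actual);
        if (row == null) {
            row = new TreeMap<>();
            counts.put(actual, row);
        }
        Integer old = row.get(predicted);
        row.put(predicted, old == null ? 1 : old + 1);
        if (actual == predicted) correct++;
        total++;
    }

    double accuracy() {
        return total == 0 ? 0.0 : (double) correct / total;
    }

    int count(double actual, double predicted) {
        TreeMap<Double, Integer> row = counts.get(actual);
        Integer c = row == null ? null : row.get(predicted);
        return c == null ? 0 : c;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Double, TreeMap<Double, Integer>> e : counts.entrySet()) {
            sb.append("actual ").append(e.getKey()).append(" -> predicted ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ConfusionMatrix m = new ConfusionMatrix();
        // Hypothetical (actual, predicted) pairs from a model that always answers 1.
        double[][] pairs = {{1, 1}, {2, 1}, {3, 1}, {1, 1}};
        for (double[] p : pairs) m.add(p[0], p[1]);
        System.out.print(m);
        System.out.println("accuracy = " + m.accuracy());
    }
}
```

If every actual class has all its counts under the same predicted label, the model is effectively predicting a single class.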

It seems like you're having trouble understanding what you're doing and are just copying code from here and there. It may help you to learn basic machine learning. For example, you should probably read the practical guide to SVM classification written by the authors of LIBSVM (the library you are using). The advice you got here to take an introductory machine learning course online is probably even better.
Let me also give you two big tips that may save you time if you're getting all results of the same class:
1. Are you normalizing your data, making all values lie between 0 and 1 (or between -1 and 1), either linearly or using the mean and the standard deviation? It doesn't seem from your code like you are.
2. Are you searching for a good value of C (or C and gamma in the case of an RBF kernel), using cross-validation or a hold-out set? It doesn't seem from your code that you are.
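Both tips can be sketched in plain Java. This sketch assumes dense double[] rows for brevity (the question's data is sparse), and the names are hypothetical. It learns each feature's min and max on the training data only, maps values linearly to [0, 1], and reuses the same min/max for test rows, which may therefore fall slightly outside [0, 1]. It also builds the exponentially spaced grid suggested in the LIBSVM practical guide: C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScaleAndGrid {
    // Per-feature min/max, learned from the training rows only.
    final double[] min;
    final double[] max;

    ScaleAndGrid(double[][] train) {
        int d = train[0].length;
        min = new double[d];
        max = new double[d];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : train) {
            for (int j = 0; j < d; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }
    }

    // Linear scaling to [0, 1] with the training min/max; apply the same object to test rows.
    double[] scale(double[] row) {
        double[] out = new double[row.length];
        for (int j = 0; j < row.length; j++) {
            double range = max[j] - min[j];
            // a constant feature carries no information; map it to 0
            out[j] = range == 0 ? 0.0 : (row[j] - min[j]) / range;
        }
        return out;
    }

    // 2^from, 2^(from + step), ..., up to 2^to.
    static List<Double> powersOfTwo(int from, int to, int step) {
        List<Double> values = new ArrayList<>();
        for (int e = from; e <= to; e += step) {
            values.add(Math.pow(2, e));
        }
        return values;
    }

    public static void main(String[] args) {
        double[][] train = {{1, 200}, {3, 100}, {5, 300}};
        ScaleAndGrid scaler = new ScaleAndGrid(train);
        System.out.println(Arrays.toString(scaler.scale(new double[] {3, 300})));
        // Each (C, gamma) pair would be scored with cross-validation
        // (e.g. libsvm's svm.svm_cross_validation) and the best pair kept.
        List<Double> cs = powersOfTwo(-5, 15, 2);
        List<Double> gammas = powersOfTwo(-15, 3, 2);
        System.out.println(cs.size() * gammas.size() + " (C, gamma) pairs to try");
    }
}
```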

A) No one knows what you are referencing. Give links if you want people to understand what you are referring to.
B) You need to take a course on machine learning. There is a free one on Coursera. The output of a model depends on the data itself, and it is heavily influenced by the model parameters. Good parameter values are affected by scaling, and you generally need to search for them. Your code incorporates none of this, and you have made it clear you are new to machine learning. You will waste hours, days and weeks on what could be done in a few minutes by obtaining the necessary background knowledge.
C) There are numerous versions of LIBSVM for Java, and you have provided no indication of which one you are using. Each one works somewhat differently.

Related

Saving a HashMap piece by piece to File

I am running a large loop in Java where, in every pass, data is added to a HashMap.
The loop is very long, so I cannot hold the complete HashMap in memory. I need a way to export the HashMap to a file after every 1000 iterations or so.
I was thinking about serializing the HashMap to a file every 1000 steps, clearing the HashMap, and repeating the process, appending each chunk to the same file. But the problem would then occur while retrieving the complete HashMap from the file, because serialization metadata would be appended to the file every time I export. Is there any other way to do this?
Edit:
The HashMap structure is given below:
HashMap<Key, double[]>
class Key {
String name;
BitSet set;
}
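Yes. Your idea of clearing the map every N iterations is a good one, and it would look something like this:
public void exportHashTable() {
    HashMap<String, Object> map = new HashMap<>();
    for (int i = 1; i <= totalIterations; i++) { // totalIterations: however many passes your loop makes
        // Some logic that fills the map ..
        if (i % 1000 == 0) {
            appendToFile(map); // append this chunk as text lines (e.g. CSV or JSON), not as a serialized object
            map.clear();
        }
    }
    appendToFile(map); // write whatever is left after the last full chunk
}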
To import, you don't have to read the entire file at once. You can read it line by line, provided you exported it as text rather than serializing it. Let's say you export it as CSV or maybe even JSON. In that case, you can import N rows into the HashMap, process them, then clear the map and proceed further.
public void importHashTable(File file) throws IOException {
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String line;
        while ((line = br.readLine()) != null) {
            // process the line, add to hashmap or do some other operation
        }
    }
}
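Here is a runnable sketch of the export/import idea. It assumes String keys for brevity (the question's Key with a BitSet would need its own text encoding), and those keys must not contain tab or newline characters. Each chunk is appended as plain text lines, so there is no per-chunk serialization header and the file can be read back line by line. ChunkedMapFile and its methods are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

public class ChunkedMapFile {
    // Append one chunk as lines "key<TAB>v1,v2,...", so chunks simply concatenate.
    static void appendChunk(Path file, Map<String, double[]> chunk) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (Map.Entry<String, double[]> e : chunk.entrySet()) {
                StringBuilder sb = new StringBuilder(e.getKey()).append('\t');
                double[] values = e.getValue();
                for (int i = 0; i < values.length; i++) {
                    if (i > 0) sb.append(',');
                    sb.append(values[i]); // Double.toString round-trips through parseDouble
                }
                w.write(sb.toString());
                w.newLine();
            }
        }
    }

    // Parse the value part of one line written by appendChunk.
    static double[] parseValues(String line) {
        String rest = line.substring(line.indexOf('\t') + 1);
        if (rest.isEmpty()) return new double[0];
        String[] parts = rest.split(",");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) values[i] = Double.parseDouble(parts[i]);
        return values;
    }

    // Reads everything for demonstration; for a file too large for memory, handle
    // each line (or each batch of N lines) inside the loop instead of collecting them.
    static Map<String, double[]> readAll(Path file) throws IOException {
        Map<String, double[]> result = new HashMap<>();
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = r.readLine()) != null) {
                result.put(line.substring(0, line.indexOf('\t')), parseValues(line));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("map-chunks", ".tsv");
        Map<String, double[]> chunk = new HashMap<>();
        for (int i = 1; i <= 2500; i++) {
            chunk.put("key" + i, new double[] {i, i * 0.5});
            if (i % 1000 == 0) { // export and clear every 1000 iterations
                appendChunk(file, chunk);
                chunk.clear();
            }
        }
        appendChunk(file, chunk); // the last, partial chunk
        Map<String, double[]> back = readAll(file);
        System.out.println(back.size() + " entries, key2500 -> " + back.get("key2500")[1]);
        Files.delete(file);
    }
}
```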

Is there a Map object that takes an index, a key and an object? (Java)

I'm trying to emulate a rotor of an enigma machine in Java.
I need an object which takes an index, a key and an object, because I unsuccessfully tried HashMaps like this:
private HashMap<Integer,Integer> rotorWiring = new HashMap<Integer, Integer>();
private HashMap<Integer,Integer> reverseRotorWiring = new HashMap<Integer, Integer>();
//The "wiring" of the rotor is set from a String
public void setRotorWiring(String Wiring) {
if (Wiring.length()==26) {
for (int i=0; i<Wiring.length();i++ ) {
char tempChar = Wiring.charAt(i);
int valueOfChar = (int)tempChar-64;
if (valueOfChar<=26){
this.rotorWiring.put(i+1,valueOfChar);
this.reverseRotorWiring.put(valueOfChar,i+1);
}
}
}
}
So far so good: this allows me to translate, e.g., an A to an E. However, once I tried to simulate a turn of the rotor like this:
//It should be mentioned that I am designing the program to only accept the characters A to Z inclusive.
public void turn() {
for (int i=1;i<=rotorWiring.size();i++) {
if (i!=26) {
rotorWiring.replace(i, rotorWiring.get(i+1));
}
else {
rotorWiring.replace(i, rotorWiring.get(1));
}
}
for (int i=1;i<=rotorWiring.size();i++) {
if (i!=26) {
reverseRotorWiring.replace(i, rotorWiring.get(i+1));
}
}
}
However, I noticed that this simulates an offset of the rotor's internal wiring rather than a turn. I'm asking for a "Map"-like solution with an index, a key and an object, because that would allow me to shift the index of all the keys and objects by 1, thus simulating a turn.
I am, however, open to suggestions for different solutions to this problem.
I should mention that I'm a bit of a novice, and therefore I would appreciate in-depth explanations.
Many thanks.
Welcome to Stack Overflow. There is no implementation of what you have described in the JDK. However, there are several ways to store Integer-String-Object triples. Note that both the index and the key are unique by definition, and that the index and the key are tightly coupled. You might want to put a Map inside another Map:
Map<Integer, Map<String, MyObject>> map;
Or use a collection that is indexed by nature, such as a List:
List<Map<String, MyObject>>
Be careful when removing items, since removal changes the index of all subsequent elements. Set the entry to null instead to keep the indices stable. Alternatively, you can create a decorator for your object that carries the index/key:
Map<Integer, MyDecoratedObject> map;
Where the MyDecoratedObject would look like:
public class MyDecoratedObject {
private final String key; // or int index
private final MyObject delegate;
// Full-args constructor, getters
}
Finally, it's up to you to pick the approach that best satisfies your requirements.
A map of maps was the solution! It was solved like this:
private HashMap<Integer,HashMap<Integer,Integer>> rotorWiring = new HashMap<Integer, HashMap<Integer,Integer>>();
private HashMap<Integer,HashMap<Integer,Integer>> reverseRotorWiring = new HashMap<Integer, HashMap<Integer,Integer>>();
public void setRotorWiring(String Wiring) {
if (Wiring.length()==26) {
for (int i=0; i<Wiring.length();i++ ) {
HashMap<Integer, Integer> wire = new HashMap<Integer, Integer>();
HashMap<Integer, Integer> reverseWire = new HashMap<Integer, Integer>();
char tempChar = Wiring.charAt(i);
int valueOfChar = (int)tempChar-64;
if (valueOfChar<=26){
wire.put(i+1,valueOfChar);
reverseWire.put(valueOfChar,i+1);
rotorWiring.put(i, wire);
reverseRotorWiring.put(i, reverseWire);
}
}
}
}
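Here is a different approach, offered as a sketch rather than as the accepted solution. Instead of rewiring maps on every turn, keep the wiring fixed as two int arrays (forward and inverse) and track the rotor position as an offset. The offset is added on the way in and subtracted on the way out. This assumes A to Z wiring strings and ignores ring settings, notches and the stepping of neighboring rotors. The wiring string in main is the historical Enigma I rotor I, which maps A to E in its starting position.

```java
public class Rotor {
    private final int[] forward = new int[26];  // forward[i]: contact that input contact i is wired to
    private final int[] backward = new int[26]; // inverse of forward, for the return path
    private int position = 0;                   // 0..25, number of steps the rotor has turned

    public Rotor(String wiring) {
        if (wiring.length() != 26) {
            throw new IllegalArgumentException("wiring must contain 26 letters");
        }
        for (int i = 0; i < 26; i++) {
            int out = Character.toUpperCase(wiring.charAt(i)) - 'A';
            forward[i] = out;
            backward[out] = i;
        }
    }

    // Turning never rewires anything; it only changes the offset used below.
    public void turn() {
        position = (position + 1) % 26;
    }

    // Signal entering at contact 0..25 (0 = A), passing through the rotor forwards.
    public int encodeForward(int contact) {
        int wire = forward[(contact + position) % 26];
        return (wire - position + 26) % 26;
    }

    // Same, for the signal coming back from the reflector.
    public int encodeBackward(int contact) {
        int wire = backward[(contact + position) % 26];
        return (wire - position + 26) % 26;
    }

    public char forwardLetter(char letter) {
        return (char) ('A' + encodeForward(Character.toUpperCase(letter) - 'A'));
    }

    public static void main(String[] args) {
        Rotor rotor = new Rotor("EKMFLGDQVZNTOWYHXUSPAIBRCJ");
        System.out.println(rotor.forwardLetter('A')); // E
        rotor.turn();
        System.out.println(rotor.forwardLetter('A')); // J
    }
}
```

After 26 turns the position wraps back to 0, and encodeBackward always undoes encodeForward at the same position.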

Apache Spark Decision Tree Predictions

I have the following code for classification using decision trees. I need to get the predictions for the test dataset into a Java array and print them. Can someone help me extend this code to do that? I need a 2D array of predicted and actual labels, and I need to print the predicted labels.
public class DecisionTreeClass {
public static void main(String args[]){
SparkConf sparkConf = new SparkConf().setAppName("DecisionTreeClass").setMaster("local[2]");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
// Load and parse the data file.
String datapath = "/home/thamali/Desktop/tlib.txt";
JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(jsc.sc(), datapath).toJavaRDD();//A training example used in supervised learning is called a “labeled point” in MLlib.
// Split the data into training and test sets (30% held out for testing)
JavaRDD<LabeledPoint>[] splits = data.randomSplit(new double[]{0.7, 0.3});
JavaRDD<LabeledPoint> trainingData = splits[0];
JavaRDD<LabeledPoint> testData = splits[1];
// Set parameters.
// Empty categoricalFeaturesInfo indicates all features are continuous.
Integer numClasses = 12;
Map<Integer, Integer> categoricalFeaturesInfo = new HashMap<Integer, Integer>();
String impurity = "gini";
Integer maxDepth = 5;
Integer maxBins = 32;
// Train a DecisionTree model for classification.
final DecisionTreeModel model = DecisionTree.trainClassifier(trainingData, numClasses,
categoricalFeaturesInfo, impurity, maxDepth, maxBins);
// Evaluate model on test instances and compute test error
JavaPairRDD<Double, Double> predictionAndLabel =
testData.mapToPair(new PairFunction<LabeledPoint, Double, Double>() {
@Override
public Tuple2<Double, Double> call(LabeledPoint p) {
return new Tuple2<Double, Double>(model.predict(p.features()), p.label());
}
});
Double testErr =
1.0 * predictionAndLabel.filter(new Function<Tuple2<Double, Double>, Boolean>() {
@Override
public Boolean call(Tuple2<Double, Double> pl) {
return !pl._1().equals(pl._2());
}
}).count() / testData.count();
System.out.println("Test Error: " + testErr);
System.out.println("Learned classification tree model:\n" + model.toDebugString());
}
}
You basically already have that in the predictionAndLabel variable. If you really need a list of two-element double arrays, you could change the method you use to:
JavaRDD<double[]> valuesAndPreds = testData.map(point -> new double[]{model.predict(point.features()), point.label()});
and run collect on that RDD to get a list of {prediction, label} double arrays:
List<double[]> values = valuesAndPreds.collect();
I would take a look at the docs here: https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html . You can also get additional statistical performance measurements of your model with classes like MulticlassMetrics. This requires changing the mapToPair function to a map function and changing the generic types to Object, so something like:
JavaRDD<Tuple2<Object, Object>> valuesAndPreds = testData.map(point -> new Tuple2<>(model.predict(point.features()), point.label()));
Then running:
MulticlassMetrics multiclassMetrics = new MulticlassMetrics(JavaRDD.toRDD(valuesAndPreds));
All of this is very well documented in Spark's MLlib documentation. Also, you mentioned needing to print the results. If this is homework, I will let you figure out that part, since learning to do that from a list is a good exercise.
Edit:
Also, I noticed that you are using Java 7, while what I wrote above uses Java 8 lambdas. To answer your main question of how to turn the results into a 2D double array, you would do:
JavaRDD<double[]> valuesAndPreds = testData.map(new org.apache.spark.api.java.function.Function<LabeledPoint, double[]>() {
@Override
public double[] call(LabeledPoint point) {
return new double[]{model.predict(point.features()), point.label()};
}
});
Then run collect to get a list of two-element double arrays. Also, as a hint for the printing part, take a look at java.util.Arrays.toString.
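If you need an actual double[][], as the question asks, you can convert the list returned by collect() with List.toArray. Here is a small sketch in which hypothetical values stand in for valuesAndPreds.collect(). It also computes the same test error as the Spark filter above, and it avoids lambdas, so it compiles on Java 7.

```java
import java.util.ArrayList;
import java.util.List;

public class PredictionArray {
    // Row i of the result is {prediction, label}, as produced by the map function above.
    static double[][] toMatrix(List<double[]> collected) {
        return collected.toArray(new double[0][]);
    }

    // Fraction of rows where prediction != label (the quantity the Spark filter computes).
    static double testError(double[][] rows) {
        if (rows.length == 0) return 0.0;
        int wrong = 0;
        for (double[] row : rows) {
            if (row[0] != row[1]) wrong++;
        }
        return (double) wrong / rows.length;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for valuesAndPreds.collect().
        List<double[]> collected = new ArrayList<>();
        collected.add(new double[] {1.0, 1.0});
        collected.add(new double[] {3.0, 2.0});
        collected.add(new double[] {0.0, 0.0});
        collected.add(new double[] {5.0, 5.0});
        double[][] matrix = toMatrix(collected);
        System.out.println(matrix.length + " rows, test error = " + testError(matrix));
    }
}
```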

Search multiple HashMaps at the same time

TL;DR: How can I search for an entry in multiple (read-only) Java HashMaps at the same time?
The long version:
I have several dictionaries of various sizes stored as HashMap<String, String>. Once they are read in, they are never changed (strictly read-only).
I want to check whether and which dictionary had stored an entry with my key.
My code was originally looking for a key like this:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
if (map.containsKey(key))
return new DictionaryEntry(map.get(key), i);
}
return null;
}
Then it got a little more complicated: my search string could contain typos, or was a variant of the stored entry. Like, if the stored key was "banana", it is possible that I'd look up "bannana" or "a banana", but still would like the entry for "banana" returned. Using the Levenshtein-Distance, I now loop through all dictionaries and each entry in them:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
for (Map.Entry<String, String> entry : map.entrySet()) {
// Calculate Levenshtein distance, store closest match etc.
}
}
// return closest match or null.
}
So far everything works as it should and I'm getting the entry I want. Unfortunately I have to look up around 7000 strings, in five dictionaries of various sizes (~ 30 - 70k entries) and it takes a while. From my processing output I have the strong impression my lookup dominates overall runtime.
My first idea to improve the runtime was to search all dictionaries in parallel. Since none of the dictionaries is ever changed and no more than one thread accesses a dictionary at the same time, I don't see any safety concerns.
The question is just: how do I do this? I have never used multithreading before. My search only came up with ConcurrentHashMap (which, to my understanding, I don't need) and the Runnable interface, where I'd have to put my processing into the run() method. I think I could rewrite my current class to fit into a Runnable, but I was wondering whether there is a simpler way to do this (or how to do it simply with Runnable; with my limited understanding, I think I would have to restructure a lot).
Since I was asked to share the Levenshtein logic: it's really nothing fancy, but here you go:
private int _maxLSDistance = 10;
public Map.Entry<String, String> getClosestMatch(String key) {
    if (key == null) {
        return null;
    }
    Map.Entry<String, String> closestMatch = null;
    int closestDist = Integer.MAX_VALUE; // distance of the best match found so far
    for (Map.Entry<String, String> entry : _dictionary.entrySet()) {
        // Perfect match
        if (entry.getKey().equals(key)) {
            return entry;
        }
        // Similar match
        int dist = StringUtils.getLevenshteinDistance(entry.getKey(), key);
        // Keep the entry if "dist" is below the threshold and smaller than the best distance so far
        if (dist < _maxLSDistance && dist < closestDist) {
            closestMatch = entry;
            closestDist = dist;
        }
    }
    return closestMatch;
}
To use multithreading in your case, you could do something like this:
The "monitor" class, which stores the results and coordinates the threads:
public class Results {
private int nrOfDictionaries = 4; // number of dictionary threads still running
private ArrayList<String> results = new ArrayList<String>();
public void prepare() {
nrOfDictionaries = 4;
results = new ArrayList<String>();
}
public synchronized void oneDictionaryFinished() {
nrOfDictionaries--;
System.out.println("one dictionary finished");
notifyAll();
}
public synchronized boolean isReady() throws InterruptedException {
while (nrOfDictionaries != 0) {
wait();
}
return true;
}
public synchronized void addResult(String result) {
results.add(result);
}
public ArrayList<String> getAllResults() {
return results;
}
}
The thread itself, which can be set to search a specific dictionary:
public class ThreadDictionarySearch extends Thread {
// the actual dictionary
private String dictionary;
private Results results;
public ThreadDictionarySearch(Results results, String dictionary) {
this.dictionary = dictionary;
this.results = results;
}
@Override
public void run() {
for (int i = 0; i < 4; i++) {
// search dictionary;
results.addResult("result of " + dictionary);
System.out.println("adding result from " + dictionary);
}
results.oneDictionaryFinished();
}
}
And the main method for demonstration:
public static void main(String[] args) throws Exception {
Results results = new Results();
ThreadDictionarySearch threadA = new ThreadDictionarySearch(results, "dictionary A");
ThreadDictionarySearch threadB = new ThreadDictionarySearch(results, "dictionary B");
ThreadDictionarySearch threadC = new ThreadDictionarySearch(results, "dictionary C");
ThreadDictionarySearch threadD = new ThreadDictionarySearch(results, "dictionary D");
threadA.start();
threadB.start();
threadC.start();
threadD.start();
// isReady() blocks here until all dictionaries have been searched,
// because "Results" calls wait() while any thread is still running
if (results.isReady()) {
for (String string : results.getAllResults()) {
System.out.println("RESULT: " + string);
}
}
}
I think the easiest would be to use a stream over the entry set:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
map.entrySet().parallelStream().forEach(entry ->
{
// Calculate Levenshtein distance, store closest match etc.
// (forEach runs on several threads, so shared "closest match" state must be synchronized)
}
);
}
// return closest match or null.
}
Provided you are using Java 8, of course. You could also wrap the outer loop in an IntStream, and you could use Stream.reduce (or Stream.min) directly to get the entry with the smallest distance.
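Here is a sketch of that idea. It assumes Java 8, and it uses a plain dynamic-programming Levenshtein implementation in place of StringUtils.getLevenshteinDistance so the example has no external dependency. Each dictionary's entries are scored with a parallel stream and reduced with Stream.min, instead of mutating shared state inside forEach. Match and closest are hypothetical names.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ParallelFuzzyLookup {
    // A candidate: source dictionary, the entry, and its edit distance to the query.
    static final class Match {
        final int dictionary;
        final String key;
        final String value;
        final int distance;

        Match(int dictionary, String key, String value, int distance) {
            this.dictionary = dictionary;
            this.key = key;
            this.value = value;
            this.distance = distance;
        }
    }

    // Two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Best match below maxDistance across all dictionaries; ties are broken arbitrarily.
    static Optional<Match> closest(List<Map<String, String>> dictionaries, String query, int maxDistance) {
        Optional<Match> best = Optional.empty();
        for (int i = 0; i < dictionaries.size(); i++) {
            final int dict = i;
            Optional<Match> candidate = dictionaries.get(i).entrySet().parallelStream()
                    .map(e -> new Match(dict, e.getKey(), e.getValue(), levenshtein(e.getKey(), query)))
                    .filter(m -> m.distance < maxDistance)
                    .min(Comparator.comparingInt((Match m) -> m.distance));
            if (candidate.isPresent()
                    && (!best.isPresent() || candidate.get().distance < best.get().distance)) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> fruit = new HashMap<>();
        fruit.put("banana", "fruit");
        fruit.put("apple", "fruit");
        Map<String, String> veg = new HashMap<>();
        veg.put("tomato", "vegetable");
        List<Map<String, String>> dictionaries = Arrays.asList(fruit, veg);
        closest(dictionaries, "bannana", 10).ifPresent(m ->
                System.out.println(m.key + " from dictionary " + m.dictionary + ", distance " + m.distance));
    }
}
```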
Maybe try thread pools:
ExecutorService es = Executors.newFixedThreadPool(_numDictionaries);
for (int i = 0; i < _numDictionaries; i++) {
//prepare a Runnable implementation that contains a logic of your search
es.submit(prepared_runnable);
}
You could also use a quick check to rule out strings that cannot match (for example, a large difference in length) and skip straight to the next candidate as soon as possible.
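Here is a sketch of the thread-pool approach combined with the length check. It submits one Callable per dictionary (a Callable rather than a Runnable, since each task returns its best match). The pruning relies on the fact that the difference in length between two strings is a lower bound on their Levenshtein distance. The class and method names and the inline Levenshtein implementation are assumptions, not taken from the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PooledFuzzyLookup {
    static final class Result {
        final int dictionary;
        final String key;
        final int distance;

        Result(int dictionary, String key, int distance) {
            this.dictionary = dictionary;
            this.key = key;
            this.distance = distance;
        }
    }

    // Two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Best key in one dictionary with distance < maxDistance, or null.
    static Result closestIn(int dictIndex, Map<String, String> dictionary, String query, int maxDistance) {
        String bestKey = null;
        int bestDist = maxDistance;
        for (String key : dictionary.keySet()) {
            // Cheap lower bound: skip keys whose length alone rules them out.
            if (Math.abs(key.length() - query.length()) >= bestDist) continue;
            int d = levenshtein(key, query);
            if (d < bestDist) {
                bestKey = key;
                bestDist = d;
            }
        }
        return bestKey == null ? null : new Result(dictIndex, bestKey, bestDist);
    }

    // One task per dictionary on a fixed pool; the overall best result wins.
    static Result search(List<Map<String, String>> dictionaries, String query, int maxDistance)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, dictionaries.size()));
        try {
            List<Future<Result>> futures = new ArrayList<>();
            for (int i = 0; i < dictionaries.size(); i++) {
                final int dict = i;
                futures.add(pool.submit(() -> closestIn(dict, dictionaries.get(dict), query, maxDistance)));
            }
            Result best = null;
            for (Future<Result> f : futures) {
                Result r = f.get(); // blocks until that dictionary has been searched
                if (r != null && (best == null || r.distance < best.distance)) {
                    best = r;
                }
            }
            return best;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> fruit = new HashMap<>();
        fruit.put("banana", "fruit");
        Map<String, String> veg = new HashMap<>();
        veg.put("tomato", "vegetable");
        Result r = search(Arrays.asList(fruit, veg), "a banana", 10);
        System.out.println(r.key + " from dictionary " + r.dictionary + ", distance " + r.distance);
    }
}
```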
I have strong doubts that HashMaps are a suitable solution here, especially if you want fuzzy matching and stop words. You should use a proper full-text search solution like Elasticsearch or Apache Solr, or at least an embeddable search library like Apache Lucene.
That being said, you can use a poor man's version. Create a list of your maps and a SortedMap index, iterate over the list, take the keys of each HashMap and store them in the SortedMap together with the position of their HashMap. To retrieve a key, you first search the SortedMap for that key, then get the respective HashMap from the list using the stored position, and look up the key in only that one HashMap. This should be fast enough without the need for multiple threads to dig through the HashMaps. Note that it only finds exact (case-insensitive) matches, not Levenshtein-based ones. However, you could turn the code below into a Runnable if you want to run multiple lookups in parallel.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
public class Search {
public static void main(String[] arg) {
if (arg.length == 0) {
System.out.println("Must give a search word!");
System.exit(1);
}
String searchString = arg[0].toLowerCase();
/*
* Populating our HashMaps.
*/
HashMap<String, String> english = new HashMap<String, String>();
english.put("banana", "fruit");
english.put("tomato", "vegetable");
HashMap<String, String> german = new HashMap<String, String>();
german.put("Banane", "Frucht");
german.put("Tomate", "Gemüse");
/*
* Now we create our ArrayList of HashMaps for fast retrieval
*/
List<HashMap<String, String>> maps = new ArrayList<HashMap<String, String>>();
maps.add(english);
maps.add(german);
/*
* This is our index. Its comparator ignores case, so keys are stored
* with their original spelling ("Banane") but can be found as "banane".
*/
SortedMap<String, Integer> index = new TreeMap<String, Integer>(String.CASE_INSENSITIVE_ORDER);
/*
* Populating the index:
*/
for (int i = 0; i < maps.size(); i++) {
// We iterate through our HashMaps...
HashMap<String, String> currentMap = maps.get(i);
for (String key : currentMap.keySet()) {
/* ...and populate our index with the keys as stored,
* referencing the position in maps from which the key originates.
*/
index.put(key, i);
}
}
// In case our index contains our search string (ignoring case)...
if (index.containsKey(searchString)) {
/*
* ... we find out in which map of the ones stored in maps
* the word in the index originated from.
*/
Integer mapIndex = index.get(searchString);
/*
* The HashMaps are case-sensitive, so we need the key exactly as stored:
* tailMap(searchString) starts at the entry that compares equal ignoring case.
*/
String storedKey = index.tailMap(searchString).firstKey();
/*
* Next, we look up said map.
*/
HashMap<String, String> origin = maps.get(mapIndex);
/*
* Last, we retrieve the value from the origin map
*/
String result = origin.get(storedKey);
/*
* The above steps can be shortened to
* String result = maps.get(index.get(searchString)).get(index.tailMap(searchString).firstKey());
*/
System.out.println(result);
} else {
System.out.println("\"" + searchString + "\" is not in the index!");
}
}
}
Please note that this is a rather naive implementation, provided only for illustration purposes. It doesn't address several problems (for example, you can't have duplicate index entries).
With this solution, you are basically trading startup speed for query speed.
Since your concern is getting a faster response,
I would suggest dividing the work between threads.
Let's say you have 5 dictionaries: you could give three dictionaries to one thread and the remaining two to another thread.
Then whichever thread finds the match halts or terminates the other thread.
You may need some extra logic to divide the work, but that won't affect your performance.
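For exact lookups, the "first thread to find it stops the other" idea maps onto ExecutorService.invokeAny. That method returns the result of the first task that completes successfully and cancels the tasks that have not completed. This sketch assumes exact key matches. For the fuzzy closest match, every dictionary must still be searched, because the first hit is not necessarily the closest one. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FirstMatchLookup {
    // Each inner list is the set of dictionaries one thread is responsible for.
    static String find(List<List<Map<String, String>>> groups, String key) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, groups.size()));
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (final List<Map<String, String>> group : groups) {
                tasks.add(() -> {
                    for (Map<String, String> dictionary : group) {
                        String value = dictionary.get(key);
                        if (value != null) {
                            return value;
                        }
                    }
                    // Throwing marks this task as unsuccessful, so invokeAny keeps waiting for the others.
                    throw new NoSuchElementException(key);
                });
            }
            // Returns the first successful result; unfinished tasks are cancelled.
            return pool.invokeAny(tasks);
        } catch (ExecutionException e) {
            return null; // every task threw: no group contains the key
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Map<String, String>> dicts = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            dicts.add(new HashMap<String, String>());
        }
        dicts.get(4).put("banana", "fruit");
        // Three dictionaries for the first thread, the remaining two for the second.
        List<List<Map<String, String>>> groups = Arrays.asList(dicts.subList(0, 3), dicts.subList(3, 5));
        System.out.println(find(groups, "banana"));
        System.out.println(find(groups, "cherry"));
    }
}
```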
You may also need a few more changes in your code to get your closest match:
for (Map.Entry entry : _dictionary.entrySet()) {
You are iterating over the entry set, but if you only need the keys, you could iterate over just the key set instead:
for (String key : _dictionary.keySet()) {
For more details on the performance of maps, please read this link: Map performances
From the LinkedHashMap Javadoc: Iteration over the collection-views of a LinkedHashMap requires time proportional to the size of the map, regardless of its capacity. Iteration over a HashMap is likely to be more expensive, requiring time proportional to its capacity.

Store associative array of strings with length as keys

I have this input:
5
it
your
reality
real
our
The first line is the number of strings that follow, and I should store them this way (pseudocode):
associative_array = [ 2 => ['it'], 3 => ['our'], 4 => ['real', 'your'], 7 => ['reality']]
As you can see the keys of associative array are the length of strings stored in inner array.
So how can I do this in Java? I come from the PHP world, so a comparison with PHP would be very helpful.
MultiMap<Integer, String> m = new MultiHashMap<Integer, String>();
for(String item : originalCollection) {
m.put(item.length(), item);
}
djechlin already posted a better version, but here's a complete standalone example using just JDK classes:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
public class Main {
public static void main(String[] args) throws Exception{
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
String firstLine = reader.readLine();
int numOfRowsToFollow = Integer.parseInt(firstLine);
Map<Integer,Set<String>> stringsByLength = new HashMap<>(numOfRowsToFollow); //worst-case size
for (int i=0; i<numOfRowsToFollow; i++) {
String line = reader.readLine();
int length = line.length();
Set<String> alreadyUnderThatLength = stringsByLength.get(length); //int boxed to Integer
if (alreadyUnderThatLength==null) {
alreadyUnderThatLength = new HashSet<>();
stringsByLength.put(length, alreadyUnderThatLength);
}
alreadyUnderThatLength.add(line);
}
System.out.println("results: "+stringsByLength);
}
}
Its output looks like this:
3
bob
bart
brett
results: {4=[bart], 5=[brett], 3=[bob]}
Java doesn't have associative arrays, but it does have HashMaps, which mostly accomplish the same goal. In your case, you can have multiple values for any given key, so you could make each entry in the HashMap an array or a collection of some kind. ArrayList is a likely choice. That is:
HashMap<Integer,ArrayList<String>> words=new HashMap<Integer,ArrayList<String>>();
I'm not going to go through the code to read your list from a file or whatever, that's a different question. But just to give you the idea of how the structure would work, suppose we could hard-code the list. We could do it something like this:
ArrayList<String> set=new ArrayList<String>();
set.add("it");
words.put(Integer.valueOf(2), set);
set=new ArrayList<String>(); // a new list: set.clear() would also empty the list already stored under key 2
set.add("your");
set.add("real");
words.put(Integer.valueOf(4), set);
Etc.
In practice, you probably would regularly be adding words to an existing set. I often do that like this:
void addWord(String word)
{
Integer key=Integer.valueOf(word.length());
ArrayList<String> set=words.get(key);
if (set==null)
{
set=new ArrayList<String>();
words.put(key,set);
}
// either way we now have a set
set.add(word);
}
Side note: I often see programmers end a block like this by putting "set" back into the HashMap, i.e. "words.put(key,set)" at the end. This is unnecessary because it's already there. When you get "set" from the HashMap, you're getting a reference, not a copy, so any updates you make are just "there"; you don't have to put it back.
Disclaimer: This code is off the top of my head. No warranties expressed or implied. I haven't written any Java in a while so I may have syntax errors or wrong function names. :-)
As your key appears to be a small integer, you could use a list of lists. In any case, the simplest solution is a multimap-style Map of Sets like:
Map<Integer, Set<String>> stringsByLength = new LinkedHashMap<>();
for(String s: strings) {
Integer len = s.length();
Set<String> set = stringsByLength.get(len);
if(set == null)
stringsByLength.put(len, set = new LinkedHashSet<>());
set.add(s);
}
private HashMap<Integer, List<String>> map = new HashMap<Integer, List<String>>();
void addStringToMap(String s) {
int length = s.length();
if (map.get(length) == null) {
map.put(length, new ArrayList<String>());
}
map.get(length).add(s);
}
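In PHP, $a[strlen($w)][] = $w; creates the inner array automatically. In Java 8, Map.computeIfAbsent does the same job in one line. Here is a sketch that assumes the input format from the question. TreeMap keeps the lengths sorted, and TreeSet keeps each group alphabetical (it also drops duplicate words, so use a List if duplicates matter). The class and method names are hypothetical. For the sample input it prints {2=[it], 3=[our], 4=[real, your], 7=[reality]}.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class GroupByLength {
    // Group words by length: sorted lengths, alphabetical words within each length.
    static TreeMap<Integer, SortedSet<String>> groupByLength(List<String> words) {
        TreeMap<Integer, SortedSet<String>> byLength = new TreeMap<>();
        for (String w : words) {
            // Creates the inner set the first time a length is seen, like PHP's $a[$len][] = $w;
            byLength.computeIfAbsent(w.length(), len -> new TreeSet<>()).add(w);
        }
        return byLength;
    }

    // The question's input format: first line is the count, then one word per line.
    static List<String> readWords(BufferedReader reader) throws IOException {
        int count = Integer.parseInt(reader.readLine().trim());
        List<String> words = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            words.add(reader.readLine().trim());
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Replace the StringReader with new InputStreamReader(System.in) to read real input.
        String input = "5\nit\nyour\nreality\nreal\nour\n";
        List<String> words = readWords(new BufferedReader(new StringReader(input)));
        System.out.println(groupByLength(words));
    }
}
```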
