The data structure design I've chosen is proving very awkward to execute, so rather than ask for your expert opinion on how to execute it, I'm hoping you can suggest a more natural data structure for what I'm trying to do, which is as follows. I'm reading in rows of data. Each column is a single variable (Animal, Color, Crop, ... - there are 45 of them). Each row of data has a value for the variable of that column - you don't know the values or the number of rows in advance.
Animal Color Crop ...
-------------------------------------
cat red oat
cat blue hay
dog blue oat
bat blue corn
cat red corn
dog gray corn
... ... ...
When I'm done reading, it should capture each Variable, each value that variable took, and how many times that variable took that value, like so:
Animal [cat, 3][dog, 2][bat, 1]...
Color [blue, 3][red, 2][gray, 1]...
Crop [corn, 3][oat, 2][hay, 1]...
...
I've tried several approaches; the closest I've gotten is with a Guava Multimap of HashMaps, like so:
Map<String, Integer> eqCnts = new HashMap<String, Integer>();
Multimap<String, Map> ed3Dcnt = HashMultimap.create();
for (int i = 0; i + 1 < header.length; i++) {
System.out.format("Got a variable of %s\n", tmpStrKey = header[i]);
ed3Dcnt.put(tmpStrKey, new HashMap<String, Integer>());
}
It seems I've created exactly what I want just fine, but it's extremely awkward and tedious to work with, and also it behaves in mysterious ways (for one thing, even though the "ed3Dcnt.put()" inserted a HashMap, the corresponding ".get()" does not return a HashMap, but rather a Collection, which creates a whole new set of problems.) Note that I'd like to sort the result on the values, from highest to lowest, but I think I can do that easily enough.
So if you please, a suggestion on a better choice of data structure design? If there isn't a clearly better design choice, how do I use the Collection that the .get() returns, when all I want is the single HashMap that I put in that slot?
Thanks very much - Ed
You can remove some of the oddity by replacing your Map<String, Integer> by a Multiset.
A multiset (or a bag) is a set that allows duplicate elements - and counts them. You throw in an apple, a pear, and an apple again. It remembers that it has two apples and a pear. Basically, it's what you imagine under a Map<String, Integer> which you just used.
Multiset<String> eqCounts = HashMultiset.create();
the corresponding ".get()" does not return a HashMap, but rather a
Collection
This is because you used a generic 'Multimap' interface. The docs say:
You rarely use the Multimap interface directly, however; more often
you'll use ListMultimap or SetMultimap, which map keys to a List or a
Set respectively.
So, to stick to your original design:
Each column will be a Multiset<String> which will store and count your values.
You'll have a Map<String, Multiset<String>> (key is a header, value is the column) where you'll put the columns like this:
Map<String, Multiset<String>> columns = Maps.newHashMap();
for (int i = 0; i < headers.length; i++) {
System.out.format("Got a variable of %s\n", headers[i]);
columns.put(headers[i], HashMultiset.<String>create());
}
Read a line and put the values where they belong:
String[] values = line.split(" ");
for (int i = 0; i < headers.length; i++) {
columns.get(headers[i]).add(values[i]);
}
All that said, you can see that the outer HashMap is kind of redundant and the whole thing still could be improved (though it's good enough, I think). To improve it more, you can try one of these:
Use an array of Multiset instead of a HashMap. After all, you know the number of columns beforehand.
If you're uncomfortable with creating generic arrays, use a List.
And probably the best: Create a class Column like this:
private static class Column {
private final String header;
private final Multiset<String> values;
private Column(String header) {
this.header = header;
this.values = HashMultiset.create();
}
}
And instead of using String[] for headers and a Map<String, Multiset<String>> for their values, use a Column[]. You can create this array in place of creating the headers array.
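For readers without Guava on the classpath, the same Column idea can be sketched with plain JDK collections. This is only an illustration under assumptions: the class name ColumnCounter and the countAll helper are hypothetical, and a LinkedHashMap with Map.merge stands in for the Multiset.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical JDK-only variant of the Column class: a LinkedHashMap
// stands in for Guava's Multiset<String>.
final class ColumnCounter {
    final String header;
    final Map<String, Integer> counts = new LinkedHashMap<>();

    ColumnCounter(String header) {
        this.header = header;
    }

    // Record one more occurrence of value in this column.
    void add(String value) {
        counts.merge(value, 1, Integer::sum);
    }

    // Build one counter per header, then feed every row into the counters.
    static List<ColumnCounter> countAll(String[] headers, List<String[]> rows) {
        List<ColumnCounter> columns = new ArrayList<>();
        for (String header : headers) {
            columns.add(new ColumnCounter(header));
        }
        for (String[] row : rows) {
            for (int i = 0; i < headers.length; i++) {
                columns.get(i).add(row[i]);
            }
        }
        return columns;
    }
}
```

Each row is a String[] with one value per header, in header order, so the column position replaces the lookup by header name.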
Seems to me that the best fit is:
HashMap<String, HashMap<String, Integer>> map= new HashMap<String, HashMap<String, Integer>>();
Now, to add header inner maps:
for (int i = 0; i + 1 < header.length; i++) {
System.out.format("Got a variable of %s\n", tmpStrKey = header[i]);
map.put(tmpStrKey, new HashMap<String, Integer>());
}
And to increment a value in the inner map:
//we are in some for loop
for ( ... ) {
String columnKey = "animal"; //lets say we are here in the for loop
for ( ... ) {
String columnValue = "cat"; //assume we are here
HashMap<String, Integer> innerMap = map.get(columnKey);
//increment occurrence
Integer count = innerMap.get(columnValue);
if (count == null) {
count = 0;
}
innerMap.put(columnValue, ++count);
}
}
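The null check above can be folded into Map.merge, and the question's "sort from highest to lowest" step can be done on a copy of the entries. The following is a minimal sketch, assuming Java 8 or later; CountSorter and its method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CountSorter {
    // Increment the count for value, starting at 1 when the value is new.
    static void increment(Map<String, Integer> innerMap, String value) {
        innerMap.merge(value, 1, Integer::sum);
    }

    // Copy the entries into a list and order them from highest to lowest count.
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> innerMap) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(innerMap.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()));
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> animals = new HashMap<>();
        for (String animal : new String[] {"cat", "cat", "dog", "bat", "cat", "dog"}) {
            increment(animals, animal);
        }
        System.out.println(sortedByCount(animals));
    }
}
```

Sorting a copy leaves the inner map itself untouched, so counting can continue after a sorted snapshot has been taken.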
1) The map inside your multimap is commonly referred to as a cardinality map. For creating a cardinality map from a collection of values, I usually use CollectionUtils.getCardinalityMap from Apache Commons Collections, although that isn't generified so you'll need one unsafe (but known to be safe) cast. If you want to build the map using Guava I think you should first put the values for a variable in a Set<String> (to get the set of unique values) and then use Iterables.frequency() for each value to get the count. (EDIT: or even easier: use ImmutableMultiset.copyOf(collection) to get the cardinality map as a Multiset) Anyway, the resulting cardinality map is a Map<String, Integer> such as you're already using.
2) I don't see why you need a Multimap. After all you want to map each variable to a cardinality map, so I'd use Map<String, Map<String, Integer>>.
EDIT: or use Map<String, Multiset<String>> if you decide to use a Multiset as your cardinality map.
Is it possible to iterate over a certain range of keys from a HashMap?
My HashMap contains key-value pairs where the key denotes a certain row-column in Excel (e.g. "BM" or "AT") and the value is the value in this cell.
For example, my table import is:
startH = {
BQ=2019-11-04,
BU=2019-12-02,
BZ=2020-01-06,
CD=2020-02-03,
CH=2020-03-02,
CM=2020-04-06
}
endH = {
BT=2019-11-25,
BY=2019-12-30,
CC=2020-01-27,
CG=2020-02-24,
CL=2020-03-30,
CP=2020-04-27
}
I need to iterate over those two hashmap using a key-range in order to extract the data in the correct order. For example from "BQ" to "BT".
Explanation
Is it possible to iterate over hashmap but using its index?
No.
A HashMap has no indices. Internally it is a hash table, and Java's implementation may convert crowded buckets into red-black trees; neither structure offers positional access. So no, not possible.
There is another fundamental flaw in this approach. HashMap does not maintain any order. Iterating it yields an unspecified order that may change as the map grows or between Java versions. But for this approach you would need insertion order. Fortunately LinkedHashMap provides this. It still does not provide index-based access though.
Solutions
Generation
But you actually do not even want index-based access. You want to retrieve a certain key range, for example from "BA" to "BM". A good approach that works with HashMap is to generate your key range and simply use Map#get to retrieve the data:
char row = 'B';
char columnStart = 'A';
char columnEnd = 'M';
for (char column = columnStart; column <= columnEnd; column++) {
String key = Character.toString(row) + column;
String data = map.get(key);
...
}
You might need to fine-tune it a bit if you need proper edge case handling, like wrapping around the alphabet (use 'A' + (column % alphabetSize)) and maybe it needs some char to int casting and vice versa for the additions, did not test it.
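The wrap-around case mentioned above (for example going from "BZ" to "CA") can be handled by converting labels to numbers and back. This is a hedged sketch, not tested against real spreadsheet data; ColumnLabels, toIndex, toLabel and range are hypothetical names, and labels are assumed to consist of uppercase A-Z only.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnLabels {
    // "A" -> 1, "Z" -> 26, "AA" -> 27, ... (bijective base 26).
    static int toIndex(String label) {
        int index = 0;
        for (char c : label.toCharArray()) {
            index = index * 26 + (c - 'A' + 1);
        }
        return index;
    }

    // Inverse of toIndex: 1 -> "A", 27 -> "AA".
    static String toLabel(int index) {
        StringBuilder sb = new StringBuilder();
        while (index > 0) {
            int remainder = (index - 1) % 26;
            sb.append((char) ('A' + remainder));
            index = (index - 1) / 26;
        }
        return sb.reverse().toString();
    }

    // All labels from 'from' to 'to', both inclusive.
    static List<String> range(String from, String to) {
        List<String> labels = new ArrayList<>();
        int last = toIndex(to);
        for (int i = toIndex(from); i <= last; i++) {
            labels.add(toLabel(i));
        }
        return labels;
    }
}
```

Each generated label can then be passed to Map#get; labels that are not present in the map simply return null and can be skipped.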
NavigableMap
There is actually a variant of map that offers pretty much what you want out of the box, but at a higher performance cost compared to a simple HashMap. The interface is called NavigableMap. The class TreeMap is a good implementation. The problem is that it requires an explicit order. The good thing though is that you actually want String's natural order, which is lexicographical.
So you can simply use it with your existing data and then use the method NavigableMap#subMap:
NavigableMap<String, String> map = new TreeMap<>(...);
String startKey = "BA";
String endKey = "BM";
// The two-argument subMap excludes endKey; this overload includes both ends.
Map<String, String> subMap = map.subMap(startKey, true, endKey, true);
for (Entry<String, String> entry : subMap.entrySet()) {
...
}
If you have to do this kind of request more than once, this will definitely pay off and it is the perfect data structure for this use case.
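One caveat worth illustrating: String's natural order matches spreadsheet column order only while all labels have the same length, since "Z" sorts after "AA" lexicographically. The sketch below uses the question's startH data and a comparator that orders by length first; RangeLookup, COLUMN_ORDER and between are illustrative names, not part of any library.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

final class RangeLookup {
    // Shorter labels first, then alphabetical: A..Z, AA..AZ, BA..
    static final Comparator<String> COLUMN_ORDER = (a, b) ->
            a.length() != b.length()
                    ? Integer.compare(a.length(), b.length())
                    : a.compareTo(b);

    // The startH table from the question, loaded into a sorted map.
    static NavigableMap<String, String> sample() {
        NavigableMap<String, String> map = new TreeMap<>(COLUMN_ORDER);
        map.put("BQ", "2019-11-04");
        map.put("BU", "2019-12-02");
        map.put("BZ", "2020-01-06");
        map.put("CD", "2020-02-03");
        map.put("CH", "2020-03-02");
        map.put("CM", "2020-04-06");
        return map;
    }

    // View of all entries whose key lies between from and to, both inclusive.
    static NavigableMap<String, String> between(
            NavigableMap<String, String> map, String from, String to) {
        return map.subMap(from, true, to, true);
    }
}
```

The returned map is a view backed by the original TreeMap, so no entries are copied; the start and end keys do not need to be present in the map themselves.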
Linked iteration
As explained before, it is also possible (although not as efficient) to instead have a LinkedHashMap (to maintain insertion order) and then simply iterate over the key range. This has some major drawbacks though: for example, it first needs to locate the start of the range by iterating over every entry before it. And it relies on the fact that you inserted the entries in the correct order.
LinkedHashMap<String, String> map = ...
String startKey = "BA";
String endKey = "BM";
boolean isInRange = false;
for (Entry<String, String> entry : map.entrySet()) {
String key = entry.getKey();
if (!isInRange) {
if (key.equals(startKey)) {
isInRange = true;
} else {
continue;
}
}
...
if (key.equals(endKey)) {
break;
}
}
// rangeLower and rangeUpper can be arguments
int i = 0;
for (Object mapKey : map.keySet()) {
    if (i < rangeLower || i > rangeUpper) {
        i++;
        continue;
    }
    // Do something with mapKey
    i++; // advance the index for in-range keys too
}
The above code iterates by getting keyset and explicitly maintaining index and incrementing it in each loop. Another option is to use LinkedHashMap, which maintains a doubly linked list for maintaining insertion order.
I don't believe you can. The algorithm you propose assumes that the keys of a HashMap are ordered and they are not. Order of keys is not guaranteed, only the associations themselves are guaranteed.
You might be able to change the structure of your data to something like this:
ranges = {
BQ=BT,
BU=BY,
....
}
Then the iteration over the HashMap keys (start cells) would easily find the matching end cells.
for example:
public static LinkedList<String, Double> ll = new LinkedList<String, Double>;
from your question, I think (not 100% sure) you are looking for
java.util.LinkedHashMap<K, V>
in your case, it would be LinkedHashMap<String, Double>
from java doc:
Hash table and linked list implementation of the Map interface, with
predictable iteration order. This implementation differs from HashMap
in that it maintains a doubly-linked list running through all of its
entries.
if you do want to get element by list.get(5), you could :
LinkedList<Entry<String, Double>>
so you can get an Entry element by Entry entry = list.get(5), then entry.getKey() gives you the String, and entry.getValue() gives you the Double.
Reading all your comments, I suggest you do something like this:
public class StringAndDouble {
private String str;
private double dbl;
// add constructor
// add getters, setters and other methods as needed.
// override equals() and hashCode()
}
Now you can use:
List<StringAndDouble> list = new LinkedList<>(); // or
List<StringAndDouble> list = new ArrayList<>(); // better in most cases
Now you can access your objects by index.
This answer creates a new class, to fit your needs. The class has two fields, one String, one double. This doesn't make the class two dimensional. I think you have a misunderstanding there. When there are n dimensions, you need n indexes to access an element. You were talking of accessing by index, so I assume you're looking for a one dimensional list holding the objects, that have more than one field.
Do you mean like this?
HashMap<String, Double> hm = new HashMap<String, Double>();
Since OP in a comment to @Kent says he wants to be able to get items by index...
Note that a LinkedList (and LinkedHashMap) are inefficient at that. He may prefer an ArrayList. So I would suggest that his "2D" implementation be a
ArrayList<Map.Entry<String, Double>>
which will efficiently support a get by index.
As for the normal get(String key), you'd have to do a linear search of all the entries, which would be inefficient.
So, you have a decision: which way of accessing (by a key or by an index) is more important?
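The trade-off can be made concrete with a small wrapper around an ArrayList of entries. This is only a sketch: IndexedPairs is a hypothetical class, and java.util.AbstractMap.SimpleEntry is used as a ready-made Map.Entry implementation.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class IndexedPairs {
    private final List<Map.Entry<String, Double>> entries = new ArrayList<>();

    void add(String key, double value) {
        entries.add(new SimpleEntry<String, Double>(key, Double.valueOf(value)));
    }

    // Constant-time access by position.
    Map.Entry<String, Double> get(int index) {
        return entries.get(index);
    }

    // Linear scan by key; returns null when the key is absent.
    Double get(String key) {
        for (Map.Entry<String, Double> entry : entries) {
            if (entry.getKey().equals(key)) {
                return entry.getValue();
            }
        }
        return null;
    }
}
```

If lookups by key dominate, a LinkedHashMap is the better fit; if lookups by position dominate, the list of entries is.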
You can actually use Linked Lists within each other...
For Example:
public LinkedList<LinkedList<Integer>> twoDimLinkedList = new LinkedList<LinkedList<Integer>>();
Then:
int value = twoDimLinkedList.get(3).get(4);
or (if you were planning on iterating over it):
for (LinkedList<Integer> row : twoDimLinkedList) {
    for (int cell : row) {
        // do something with cell
    }
}
So what I have been trying to do is use a TreeMap I previously had and apply it to this method in which I convert it into a set and have it go through a Map Entry Loop. What I wish to do is invert my previous TreeMap into the opposite (flipped) TreeMap
When I run my code, it gives me a Comparable error. Does this mean I have to implement the Comparable interface? I converted the ArrayList into an Integer, so I thought Comparable would support it. Or is something just wrong with my code?
Error: Exception in thread "main" java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.Comparable
Overview: Originally, my intended purpose for the program was to make a Treemap that read from a text document and specifically found all the words and the index/rows of where the words were located. Now I wish to make a "top ten" list that contains the most used words. I wanted to "flip" my treemap so that the integer values would be what would be put in order and the string would follow
public static void getTopTenWords(TreeMap<String, ArrayList<Integer>> map) {
Set<Map.Entry<String, ArrayList<Integer>>> set = map.entrySet();
TreeMap<Integer, String> temp = new TreeMap<Integer, String>();
int count = 1;
for(Map.Entry<String, ArrayList<Integer>> entry : set){
if(temp.containsKey(entry.getValue())) {
Integer val = entry.getValue().get(count);
val++;
temp.put(val, entry.getKey());
}
else {
temp.put(entry.getValue().get(count), entry.getKey());
}
count++;
}
}
Now I wish to make a "top ten" list that contains the most used words.
I wanted to "flip" my treemap so that the integer values would be what
would be put in order and the string would follow
Note that a Map contains only unique keys. If you use the count as the key, two words with the same count collide: the second put replaces the first. Wrapping the count in new Integer(count) does not help, because TreeMap compares keys with compareTo (and HashMap with equals), not by object identity, so two Integer objects holding 2 are the same key either way. To keep every word, map each count to a collection of words, for example TreeMap<Integer, List<String>>.
Secondly, in your code:
if (temp.containsKey(entry.getValue()))
With the above if statement, you are looking up an ArrayList in a map whose keys are Integers. TreeMap has to compare the argument with its keys, so it casts the ArrayList to Comparable, which fails at runtime with the ClassCastException you see. Also, your original Map contains just the locations of each word in the text file, so all you need to do is take the size of the ArrayList for each word and make that the key.
You would need to modify your code a little bit.
public static void getTopTenWords(TreeMap<String, ArrayList<Integer>> map) {
    Set<Map.Entry<String, ArrayList<Integer>>> set = map.entrySet();
    TreeMap<Integer, String> temp = new TreeMap<Integer, String>();
    for (Map.Entry<String, ArrayList<Integer>> entry : set) {
        int size = entry.getValue().size();
        String word = entry.getKey();
        // Words with the same count overwrite each other here (see the note above).
        temp.put(size, word);
    }
}
So, you can see that I just used the size of the values in your entry set and put it as a key in your TreeMap. Autoboxing converts the int size to an Integer.
Also note that your TreeMap sorts its Integer keys in ascending order, so your most frequent words will be at the end; temp.descendingMap() iterates from the highest count down.
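An alternative that avoids using counts as keys altogether is to sort the entries by list size and take the first ten. The sketch below assumes Java 8 streams; TopWords and topWords are hypothetical names, and this is one possible approach rather than the answer's own method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class TopWords {
    // Returns up to 'limit' words, most frequent first. A word's frequency
    // is the number of positions recorded for it in the index.
    static List<String> topWords(Map<String, ArrayList<Integer>> index, int limit) {
        return index.entrySet().stream()
                .sorted((a, b) -> Integer.compare(b.getValue().size(), a.getValue().size()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Because every word stays its own entry, words with equal counts no longer overwrite each other. When the input is a TreeMap, the sort is stable on its ordered stream, so ties remain in alphabetical order.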
Say I have a LinkedHashMap containing 216 entries, how would I get the first 100 values (here, of type Object) from a LinkedHashMap<Integer, Object>.
Well to start with, doing this for HashMap as per your title, doesn't make much sense - HashMap has no particular order, and the order may change between calls. It makes more sense for LinkedHashMap though.
There, I'd use Guava's Iterables.limit method:
Iterable<Object> first100Values = Iterables.limit(map.values(), 100);
or
// Or whatever type you're interested in...
Iterable<Map.Entry<Integer, Object>> firstEntries =
Iterables.limit(map.entrySet(), 100);
You can then create a list from that, or iterate over it, or whatever you want to do.
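If Guava is not available, the same "take the first N" idea can be sketched with the JDK's stream API (Java 8 or later). FirstN and firstValues are illustrative names only.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class FirstN {
    // Collects at most n values in the map's iteration order.
    // For a LinkedHashMap that is insertion order.
    static <K, V> List<V> firstValues(Map<K, V> map, int n) {
        return map.values().stream()
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

limit stops pulling elements once n have been seen, so only the first n values are copied; a map with fewer than n entries simply yields all of them.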
Ugly One-Liner
This ugly one-liner would do (and return a List<Object> in the question's case):
Collections.list(Collections.enumeration(lhMap.values())).subList(0, 100)
This would compile for a HashMap as well; however, since HashMap does not keep insertion order, there's no guarantee that you will get the first 100 values that were entered. It would work on other types, with similar limitations.
Notes:
relatively inefficient (read the Javadoc to know why - though there's worse!),
careful when using views (read the Javadoc to know more),
I did mention it was ugly.
Step-By-Step Usage Example
(as per the OP's comment)
Map<Integer, Pair<Double, SelectedRoad>> hashmap3 =
    new LinkedHashMap<Integer, Pair<Double, SelectedRoad>>();
// [...] add 216 elements to hashmap3 here somehow
// subList returns a List view, not an ArrayList
List<Pair<Double, SelectedRoad>> firstPairs =
    Collections.list(Collections.enumeration(hashmap3.values())).subList(0, 100);
// you can then print each Pair's Double and SelectedRoad with:
// (assuming that:
// - your Pair class comes from Apache Commons Lang 3.0
// - your SelectedRoad class implements a decent toString() )
for (final Pair<Double, SelectedRoad> p : firstPairs) {
    System.out.println("double: " + p.getLeft());
    System.out.println("road : " + p.getRight());
}
You can do:
Map<Integer, Object> records;
List<Entry<Integer, Object>> firstHundredRecords
= new ArrayList<Entry<Integer, Object>>(records.entrySet()).subList(0, 100);
Although note that this will copy all the entries from the map.
To copy only the records you need, without using a library:
Map<Integer, Object> records;
List<Entry<Integer, Object>> firstHundredRecords = new ArrayList<>();
for(Entry<Integer, Object> entry : records.entrySet()) {
firstHundredRecords.add(entry);
if (firstHundredRecords.size()>=100) break;
}
You can use a counter. Your for-each loop will exit when the counter reaches 100.
Write a loop which calls Iterator.next() 100 times, and then stops.
I was going to say something about NavigableMap and SortedMap - but their interfaces are defined in terms of keys, not indexes. But they may be useful nevertheless, depending on what your actual underlying problem is.
I have data stored in a HashMap, which I want to access via multiple threads simultaneously, to split the work done on the items.
Normally (with a List for example) I would just give each thread an index to start with and could easily split the work like this:
for(int i = startIndex; i < startIndex+batchSize && i < list.size(); i++)
{
Item a = list.get(i);
// do stuff with the Item
}
Of course this doesn't work with a HashMap, because I can't access it via an index.
Is there an easy way to iterate only over a part of the map? Should I rather use another data structure for this case?
I read about SortedMap, but it has too much overhead I don't need (sorting the items). I have a lot of data and performance is crucial.
Any tips would be highly appreciated.
Firstly, you shouldn't be using a HashMap, because its iteration order is undefined. Either use a LinkedHashMap, whose iteration order is the same as insertion order (at least it's defined), or use a TreeMap, whose iteration order is the natural sorting order of the keys. I would recommend the LinkedHashMap: a new entry always goes at the end, whereas inserting into a TreeMap can land anywhere in the order and shift the slice boundaries unpredictably.
To carve up a map, use this code:
LinkedHashMap<Integer, String> map = new LinkedHashMap<Integer, String>();
for (Map.Entry<Integer, String> entry : new ArrayList<Map.Entry<Integer,String>>(map.entrySet()).subList(start, end)) {
Integer key = entry.getKey();
String value = entry.getValue();
// Do something with the entry
}
I have in-lined the code, but expanded out it is equivalent to:
List<Map.Entry<Integer, String>> entryList = new ArrayList<Map.Entry<Integer,String>>();
entryList.addAll(map.entrySet());
entryList = entryList.subList(start, end); // You provide the start and end index
for (Map.Entry<Integer, String> entry : entryList) ...
If you only do the traversal a few times, or if the map doesn't change, you could get a Set of keys and then copy that to an array. From there it's pretty much your normal method. But obviously if the HashMap changed then you would have to do those two operations over again, which could get very costly.
With HashMap#keySet -> Set#toArray you would get an array of the keys.
With this array you could proceed as before: keep the array of keys and pass them to your threads. Then each thread would access only the keys it had been assigned, and finally you could access the entries of a given partition of the HashMap with only those keys.
Unless your map is enormous, the cost of iterating over a map is small compared with the cost of starting a task on another thread and trivial compared with the work you intend to do.
For this reason, the simplest way to divide up your work is likely to be turn the Map into an Array and break that up.
final Map<K, V> map =
final ExecutorService es =
final int portions = Runtime.getRuntime().availableProcessors();
final Map.Entry<K,V>[] entries = (Map.Entry<K,V>[]) map.entrySet().toArray(new Map.Entry[map.size()]);
final int portionSize = (map.size() + portions-1)/ portions;
for(int i = 0; i < portions; i++) {
final int start = i * portionSize;
final int end = Math.min(map.size(), (i + 1) * portionSize);
es.submit(new Runnable() {
public void run() {
for(int j=start; j<end;j++) {
Map.Entry<K,V> entry = entries[j];
// process entry.
}
}
});
}
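A self-contained variant of the same partitioning idea might look as follows. It is a sketch under assumptions: the map is fixed to Map<Integer, String>, the "work" is just summing value lengths, and MapPartitioner and totalValueLength are made-up names. The map must not be modified while the tasks run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class MapPartitioner {
    // Sums the lengths of all values, processing the entries in 'portions' slices.
    static long totalValueLength(Map<Integer, String> map, int portions) {
        // Snapshot the entries once so every task can index into the same list.
        final List<Map.Entry<Integer, String>> entries = new ArrayList<>(map.entrySet());
        final int portionSize = (entries.size() + portions - 1) / portions;
        List<Callable<Long>> tasks = new ArrayList<>();
        for (int i = 0; i < portions; i++) {
            final int start = Math.min(entries.size(), i * portionSize);
            final int end = Math.min(entries.size(), (i + 1) * portionSize);
            tasks.add(() -> {
                long sum = 0;
                for (int j = start; j < end; j++) {
                    sum += entries.get(j).getValue().length();
                }
                return sum;
            });
        }
        ExecutorService es = Executors.newFixedThreadPool(portions);
        try {
            long total = 0;
            // invokeAll blocks until every slice has been processed.
            for (Future<Long> result : es.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a slice failed", e.getCause());
        } finally {
            es.shutdown();
        }
    }
}
```

A typical call would pass Runtime.getRuntime().availableProcessors() as the number of portions, as in the snippet above. Returning a partial result from each task and combining the results afterwards avoids shared mutable state between threads.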