Only iterate over a part of a Map - java

I have data stored in a HashMap, which I want to access via multiple threads simultaneously, to split the work done on the items.
Normally (with a List for example) I would just give each thread an index to start with and could easily split the work like this:
for(int i = startIndex; i < startIndex+batchSize && i < list.size(); i++)
{
Item a = list.get(i);
// do stuff with the Item
}
Of course this doesn't work with a HashMap, because I can't access it by index.
Is there an easy way to iterate only over a part of the map? Should I rather use another data structure for this case?
I read about SortedMap, but it has overhead I don't need (sorting the items). I have a lot of data and performance is crucial.
Any tips would be highly appreciated.

Firstly, you shouldn't be using a HashMap, because its iteration order is undefined. Either use a LinkedHashMap, whose iteration order is the same as insertion order (at least it's defined), or use a TreeMap, whose iteration order is the natural sorting order of the keys. I would recommend the LinkedHashMap: with a TreeMap, a newly inserted entry can land anywhere in the sort order and shift the positions of existing entries, which makes slicing the map up unpredictable.
To carve up a map, use this code:
LinkedHashMap<Integer, String> map = new LinkedHashMap<Integer, String>();
for (Map.Entry<Integer, String> entry : new ArrayList<Map.Entry<Integer,String>>(map.entrySet()).subList(start, end)) {
Integer key = entry.getKey();
String value = entry.getValue();
// Do something with the entry
}
I have in-lined the code, but expanded out it is equivalent to:
List<Map.Entry<Integer, String>> entryList = new ArrayList<Map.Entry<Integer,String>>();
entryList.addAll(map.entrySet());
entryList = entryList.subList(start, end); // You provide the start and end index
for (Map.Entry<Integer, String> entry : entryList) ...
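A self-contained version of the copy-then-subList idea might look like the sketch below. The helper name `slice` is hypothetical; it additionally clamps both bounds to the map size so the last batch can simply be shorter, and it copies the sublist so each caller gets an independent list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MapSlices {
    // Hypothetical helper: entries at positions [start, end) in the map's
    // iteration order (insertion order for a LinkedHashMap).
    static <K, V> List<Map.Entry<K, V>> slice(Map<K, V> map, int start, int end) {
        // Copy the entries once so they can be addressed by index.
        List<Map.Entry<K, V>> entries = new ArrayList<>(map.entrySet());
        // Clamp both bounds so the last batch may simply be shorter.
        int safeEnd = Math.min(end, entries.size());
        int safeStart = Math.min(start, safeEnd);
        // subList is a view; copying it gives each caller an independent list.
        return new ArrayList<>(entries.subList(safeStart, safeEnd));
    }
}
```

Thread number t with batch size b would call slice(map, t * b, (t + 1) * b). Since every call copies all entries, for many threads it is cheaper to build the entry list once up front and hand out subList ranges of that single copy.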

If you only do the traversal a few times, or if the map doesn't change, you could get the Set of keys and then convert it to an array. From there it's pretty much your normal index-based method. But if the HashMap changes, you would have to redo both operations, which could get very costly.

With HashMap#keySet -> Set#toArray you would get an array of the keys.
With this array you could proceed as before: keep the array of keys and pass it to your threads. Each thread would then access only the keys it has been assigned, so it reaches the entries of its partition of the HashMap through those keys alone.
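A minimal sketch of the key-array approach with plain threads. It assumes String keys and Integer values and that "doing stuff" means summing the values; the class and method names are hypothetical. Reading a HashMap from several threads is only safe while no thread modifies it.

```java
import java.util.Map;

class KeyArrayPartition {
    // Processes the keys at positions [start, end) of the shared key array.
    // It only reads the map, so several threads may run it at once as long
    // as nobody modifies the map meanwhile.
    static long sumRange(Map<String, Integer> map, String[] keys, int start, int end) {
        long sum = 0;
        for (int i = start; i < end && i < keys.length; i++) {
            sum += map.get(keys[i]); // "do stuff" with the value for this key
        }
        return sum;
    }

    // Splits the key array into `threads` batches (threads >= 1 assumed).
    static long parallelSum(Map<String, Integer> map, int threads) {
        // Snapshot of the keys; redo this whenever the map changes.
        String[] keys = map.keySet().toArray(new String[0]);
        int batchSize = (keys.length + threads - 1) / threads; // ceiling division
        long[] partial = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            final int start = t * batchSize;
            workers[t] = new Thread(() -> partial[id] = sumRange(map, keys, start, start + batchSize));
            workers[t].start();
        }
        long total = 0;
        for (int t = 0; t < threads; t++) {
            try {
                workers[t].join(); // join() also makes partial[t] visible here
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            }
            total += partial[t];
        }
        return total;
    }
}
```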

Unless your map is enormous, the cost of iterating over a map is small compared with the cost of starting a task on another thread and trivial compared with the work you intend to do.
For this reason, the simplest way to divide up your work is likely to be to turn the Map into an array and break that up.
final Map<K, V> map = ... // the map to process
final ExecutorService es = ... // the executor that runs the portions
final int portions = Runtime.getRuntime().availableProcessors();
final Map.Entry<K,V>[] entries = (Map.Entry<K,V>[]) map.entrySet().toArray(new Map.Entry[map.size()]);
final int portionSize = (map.size() + portions-1)/ portions;
for(int i = 0; i < portions; i++) {
final int start = i * portionSize;
final int end = Math.min(map.size(), (i + 1) * portionSize);
es.submit(new Runnable() {
public void run() {
for(int j=start; j<end;j++) {
Map.Entry<K,V> entry = entries[j];
// process entry.
}
}
});
}
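For reference, a runnable variant of the code above with the blanks filled in by assumptions: String keys, Integer values, a fixed thread pool with one thread per portion, and summing the values as the "process entry" step. It also collects the Futures so the caller waits for every portion, and shuts the executor down afterwards. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class EntryArrayPartition {
    // Splits the entries into `portions` chunks (portions >= 1 assumed),
    // processes each chunk on the pool and combines the per-chunk results.
    static long process(Map<String, Integer> map, int portions) {
        ExecutorService es = Executors.newFixedThreadPool(portions);
        try {
            @SuppressWarnings("unchecked")
            Map.Entry<String, Integer>[] entries = map.entrySet().toArray(new Map.Entry[0]);
            int portionSize = (entries.length + portions - 1) / portions;
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < portions; i++) {
                int start = Math.min(entries.length, i * portionSize);
                int end = Math.min(entries.length, (i + 1) * portionSize);
                results.add(es.submit(() -> {
                    long sum = 0;
                    for (int j = start; j < end; j++) {
                        sum += entries[j].getValue(); // process entry
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // blocks until that portion is done
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            es.shutdown();
        }
    }
}
```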

Related

Iterate over key-range of HashMap

Is it possible to iterate over a certain range of keys from a HashMap?
My HashMap contains key-value pairs where the key denotes a certain row-column in Excel (e.g. "BM" or "AT") and the value is the value in this cell.
For example, my table import is:
startH = {
BQ=2019-11-04,
BU=2019-12-02,
BZ=2020-01-06,
CD=2020-02-03,
CH=2020-03-02,
CM=2020-04-06
}
endH = {
BT=2019-11-25,
BY=2019-12-30,
CC=2020-01-27,
CG=2020-02-24,
CL=2020-03-30,
CP=2020-04-27
}
I need to iterate over those two HashMaps using a key range in order to extract the data in the correct order, for example from "BQ" to "BT".
Explanation
Is it possible to iterate over hashmap but using its index?
No.
A HashMap has no indices. Depending on the underlying implementation, index-based access would not even be possible: since Java 8, a HashMap bucket with many colliding keys is converted into a red-black tree, and neither the buckets nor the trees provide direct positional access. So no, not possible.
There is another fundamental flaw in this approach. HashMap does not maintain any order: its iteration order is unspecified and can change, for example when the map grows and rehashes, or between Java versions. But for this approach you would need insertion order. Fortunately, LinkedHashMap maintains it. It still does not provide index-based access, though.
Solutions
Generation
But you actually do not even want index-based access. You want to retrieve a certain key range, for example from "BA" to "BM". A good approach that works with HashMap would be to generate your key range and simply use Map#get to retrieve the data:
char row = 'B';
char columnStart = 'A';
char columnEnd = 'M';
for (char column = columnStart; column <= columnEnd; column++) {
    String key = Character.toString(row) + column; // "BA", "BB", ..., "BM"
    String data = map.get(key);
    ...
}
You might need to fine-tune it a bit if you need proper edge-case handling, like wrapping around the alphabet (use (char) ('A' + (column - 'A') % 26)), and it may need some char-to-int casting and vice versa for the additions. I did not test it.
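A sketch of key generation that handles the wrap-around for multi-letter columns (e.g. "BY", "BZ", "CA"). Instead of char arithmetic it converts each column label to a number and back, which is a swapped-in technique; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ExcelColumns {
    // Converts a column label such as "A", "Z", "AA" or "BQ" to its 1-based number.
    static int toNumber(String label) {
        int n = 0;
        for (char c : label.toCharArray()) {
            n = n * 26 + (c - 'A' + 1);
        }
        return n;
    }

    // Converts a 1-based column number back to its label (1 -> "A", 27 -> "AA").
    static String toLabel(int n) {
        StringBuilder sb = new StringBuilder();
        while (n > 0) {
            n--; // shift to 0-based so that 26 maps to 'Z' instead of carrying over
            sb.append((char) ('A' + n % 26));
            n /= 26;
        }
        return sb.reverse().toString();
    }

    // All labels from startKey to endKey (inclusive), e.g. "BY", "BZ", "CA", "CB".
    static List<String> keyRange(String startKey, String endKey) {
        List<String> keys = new ArrayList<>();
        for (int n = toNumber(startKey); n <= toNumber(endKey); n++) {
            keys.add(toLabel(n));
        }
        return keys;
    }
}
```

Usage would be along the lines of `for (String key : ExcelColumns.keyRange("BQ", "BT")) { String data = startH.get(key); }`; map.get returns null for generated keys that are not in the map, so those need to be skipped.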
NavigableMap
There is actually a variant of Map that offers pretty much what you want out of the box, but at a higher performance cost than a simple HashMap. The interface is called NavigableMap, and the class TreeMap is a good implementation. The catch is that it requires an explicit order. The good thing, though, is that you actually want String's natural order, which is lexicographical. (This matches Excel's column order only as long as all keys have the same length: "Z" sorts after "AA".)
So you can simply use it with your existing data and then use the method NavigableMap#subMap:
NavigableMap<String, String> map = new TreeMap<>(...);
String startKey = "BA";
String endKey = "BM";
Map<String, String> subMap = map.subMap(startKey, true, endKey, true); // both ends inclusive
for (Entry<String, String> entry : subMap.entrySet()) {
...
}
If you have to do those kind of requests more than once, this will definitely pay off and it is the perfect data-structure for this use-case.
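A small sketch using some of the question's cells. It assumes the startH and endH entries are merged into one TreeMap, which works here because their keys do not overlap; the helper names are hypothetical, and `range` uses the inclusive four-argument subMap overload.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class CellRanges {
    // Merges the question's startH and endH maps into one sorted map.
    static NavigableMap<String, String> merge(Map<String, String> startH, Map<String, String> endH) {
        NavigableMap<String, String> cells = new TreeMap<>(startH);
        cells.putAll(endH); // keys are distinct, so nothing is overwritten
        return cells;
    }

    // Cells whose keys lie between startKey and endKey, both inclusive.
    // The result is a view backed by `cells`, not a copy.
    static NavigableMap<String, String> range(NavigableMap<String, String> cells,
                                              String startKey, String endKey) {
        // Lexicographic String order equals Excel column order only while all
        // keys have the same length (two letters here).
        return cells.subMap(startKey, true, endKey, true);
    }
}
```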
Linked iteration
As explained before, it is also possible (although not as efficient) to use a LinkedHashMap instead (to maintain insertion order) and then simply iterate over the key range. This has some major drawbacks, though: for example, it first needs to locate the start of the range by iterating all the way to it, and it relies on the entries having been inserted in the correct order.
LinkedHashMap<String, String> map = ...
String startKey = "BA";
String endKey = "BM";
boolean isInRange = false;
for (Entry<String, String> entry : map.entrySet()) {
String key = entry.getKey();
if (!isInRange) {
if (key.equals(startKey)) {
isInRange = true;
} else {
continue;
}
}
...
if (key.equals(endKey)) {
break;
}
}
// rangeLower and rangeUpper can be arguments (both inclusive)
int i = 0;
for (Object mapKey : map.keySet()) {
    if (i > rangeUpper) {
        break; // past the range, no need to look further
    }
    if (i >= rangeLower) {
        // Do something with mapKey
    }
    i++;
}
The above code iterates over the key set, explicitly maintaining an index and incrementing it in each iteration. Another option is to use LinkedHashMap, which maintains a doubly linked list to preserve insertion order.
I don't believe you can. The algorithm you propose assumes that the keys of a HashMap are ordered and they are not. Order of keys is not guaranteed, only the associations themselves are guaranteed.
You might be able to change the structure of your data to something like this:
ranges = {
BQ=BT,
BU=BY,
....
}
Then the iteration over the HashMap keys (start cells) would easily find the matching end cells.
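A minimal sketch of that lookup. It assumes `ranges` is a LinkedHashMap filled in the desired order (so its iteration order is well-defined) and that startH and endH are the question's two maps; the method name and the "start -> end" output format are purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class StartEndPairs {
    // For every start cell in `ranges`, looks up the start value in startH and
    // the value of the matching end cell in endH.
    static List<String> pair(Map<String, String> ranges,
                             Map<String, String> startH, Map<String, String> endH) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> r : ranges.entrySet()) {
            String startValue = startH.get(r.getKey()); // e.g. BQ -> 2019-11-04
            String endValue = endH.get(r.getValue());   // e.g. BT -> 2019-11-25
            result.add(startValue + " -> " + endValue);
        }
        return result;
    }
}
```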

Getting the indices of an unsorted double array after sorting

This question is a companion to this one, which asked about the fastest way to sort a double array.
Now I want to get the top-k indices corresponding to the unsorted array.
I have implemented this version which (unfortunately) uses autoboxing and HashMap as proposed in some answers including this one:
HashMap<Double, Integer> map = new HashMap<Double, Integer>();
for(int i = 0; i < numClusters; i++) {
map.put(scores[i], i);
}
Arrays.sort(scores);
HashSet<Integer> topPossibleClusters = new HashSet<Integer>();
for(int i = 0; i < numClusters; i++) {
topPossibleClusters.add(map.get(scores[numClusters - (i+1)]));
}
As you can see this uses a HashMap with keys the Double values of the original array and as values the indices of the original array.
So, after sorting the original array I just retrieve it from the map.
I also use a HashSet because I want to check whether an int is included in this set, using the .contains() method. (I don't know if this makes a difference since, as I mentioned in the other question, my arrays are small: 50 elements.) If it does not make a difference, please point that out.
I am not interested in the value per se, only the indices.
My question is whether there is a faster approach to go with it?
This sort of interlinking/interlocking collections lends itself to fragile, easily broken, hard to debug, unmaintainable code.
Instead create an object:
class Data {
double value;
int originalIndex;
}
Create an array of Data objects storing the original value and index.
Sort them using a custom comparator that looks at data.value and sorts descending.
Now the top X items in your array are the ones you want and you can just look at the value and originalIndex as you need them.
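A minimal sketch of this suggestion, assuming "top" means the largest scores (as in the question's code). The class and method names are illustrative. Arrays.sort on objects is stable, so equal scores keep their original relative order, and duplicates are not lost.

```java
import java.util.Arrays;
import java.util.Comparator;

class TopKIndices {
    // Pairs each score with the index it had in the unsorted array.
    static final class Data {
        final double value;
        final int originalIndex;

        Data(double value, int originalIndex) {
            this.value = value;
            this.originalIndex = originalIndex;
        }
    }

    // Returns the original indices of the k largest scores, largest first.
    static int[] topK(double[] scores, int k) {
        Data[] data = new Data[scores.length];
        for (int i = 0; i < scores.length; i++) {
            data[i] = new Data(scores[i], i);
        }
        // Sort descending by value.
        Arrays.sort(data, Comparator.comparingDouble((Data d) -> d.value).reversed());
        int n = Math.min(k, data.length);
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = data[i].originalIndex;
        }
        return result;
    }
}
```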
As Tim points out, linking multiple collections is rather error-prone. I would suggest using a TreeMap, as this would allow for a standalone solution.
Let's say you have double[] data; first copy it to a TreeMap:
final TreeMap<Double, Integer> dataWithIndex = new TreeMap<>();
for(int i = 0; i < data.length; ++i) {
dataWithIndex.put(data[i], i);
}
N.B. You can declare dataWithIndex as a NavigableMap to be less specific, but it's so much longer and it doesn't really add much, as TreeMap is the only non-concurrent implementation in the JDK (ConcurrentSkipListMap being the concurrent one).
This will populate the Map in O(n lg n) time, as each put is O(lg n); this is the same complexity as sorting. In reality it will probably be a little slower, but it will scale in the same way.
Now, say you need the first k elements, you need to first find the kth element - this is O(k):
final Iterator<Double> keyIter = dataWithIndex.keySet().iterator();
double kthKey = 0; // assigned in the loop below (assumes k >= 1)
for (int i = 0; i < k; ++i) {
kthKey = keyIter.next();
}
Now you just need to get the sub-map that has all the entries up to and including the kth entry:
final Map<Double, Integer> topK = dataWithIndex.headMap(kthKey, true);
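The headMap call above keeps the k smallest keys. Since the question asks for the top scores, the sketch below uses the TreeMap's descendingMap() view instead, which walks the keys from largest to smallest; that switch and the helper names are assumptions, not part of the answer. As with any map keyed by the value, duplicate scores collapse into one entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TreeMapTopK {
    // Builds the value -> index map once; for equal values the later index wins.
    static TreeMap<Double, Integer> index(double[] data) {
        TreeMap<Double, Integer> dataWithIndex = new TreeMap<>();
        for (int i = 0; i < data.length; ++i) {
            dataWithIndex.put(data[i], i);
        }
        return dataWithIndex;
    }

    // Original indices of the k largest distinct values, largest first.
    // descendingMap() is a reverse-ordered view, so nothing is re-sorted.
    static List<Integer> largestK(TreeMap<Double, Integer> dataWithIndex, int k) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Double, Integer> e : dataWithIndex.descendingMap().entrySet()) {
            if (result.size() == k) {
                break;
            }
            result.add(e.getValue());
        }
        return result;
    }
}
```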
If you only need to do this once, then with Java 8 you can do something like this:
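// Needs: import static java.util.Comparator.comparing; import static java.util.stream.Collectors.toList;
List<Entry<Double, Integer>> topK = IntStream.range(0, data.length).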
mapToObj(i -> new SimpleEntry<>(data[i], i)).
sorted(comparing(Entry::getKey)).
limit(k).
collect(toList());
i.e. take an IntStream for the indices of data and mapToObj to an Entry of data[i] => i (using the AbstractMap.SimpleEntry implementation). Now sort that using Entry::getKey and limit the size of the Stream to k entries. Finally, collect the result to a List. This has the advantage of not clobbering duplicate entries in the data array.
It is almost exactly what Tim suggests in his answer, but using an existing JDK class.
This method is also O(n lg n). The catch is that if the TreeMap approach is reused then it's O(n lg n) to build the Map but only O(k) to reuse it. If you want to use the Java 8 solution with reuse then you can do:
List<Entry<Double, Integer>> sorted = IntStream.range(0, data.length).
mapToObj(i -> new SimpleEntry<>(data[i], i)).
sorted(comparing(Entry::getKey)).
collect(toList());
i.e. don't limit the size to k elements. Now, to get the first k elements you just need to do:
List<Entry<Double, Integer>> subList = sorted.subList(0, k);
The magic of this is that it's O(1).
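A related variant, sketched here as an assumption rather than the answer's exact code: sort the indices themselves by their score, largest first, which yields the top-k indices directly and, like the entry-based stream, keeps duplicate scores. Stream.sorted is stable for ordered streams, so equal scores stay in index order. The class name is illustrative.

```java
import java.util.stream.IntStream;

class StreamTopK {
    // Sorts the indices 0..n-1 by their score, largest first, and keeps k of them.
    static int[] topKIndices(double[] data, int k) {
        return IntStream.range(0, data.length)
                .boxed()
                .sorted((a, b) -> Double.compare(data[b], data[a]))
                .limit(k)
                .mapToInt(Integer::intValue)
                .toArray();
    }
}
```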

Better data structure for a multi map of a hash map

The data structure design I've chosen is proving very awkward to execute, so rather than ask for your expert opinion on how to execute it, I'm hoping you can suggest a more natural data structure for what I'm trying to do, which is as follows. I'm reading in rows of data. Each column is a single variable (Animal, Color, Crop, ...; there are 45 of them). Each row of data has a value for the variable of that column; you don't know the values or the number of rows in advance.
Animal Color Crop ...
-------------------------------------
cat red oat
cat blue hay
dog blue oat
bat blue corn
cat red corn
dog gray corn
... ... ...
When I'm done reading, it should capture each Variable, each value that variable took, and how many times that variable took that value, like so:
Animal [cat, 3][dog,2][bat, 1]...
Color [blue, 3][red,2][gray,1]...
Crop [corn,3][oat, 2][hay,1]...
...
I've tried several approaches, the closest I've gotten is with a GUAVA multi map of hash maps, like so:
Map<String, Integer> eqCnts = new HashMap<String, Integer>();
Multimap<String, Map> ed3Dcnt = HashMultimap.create();
for (int i = 0; i + 1 < header.length; i++) {
System.out.format("Got a variable of %s\n", tmpStrKey = header[i]);
ed3Dcnt.put(tmpStrKey, new HashMap<String, Integer>());
}
It seems I've created exactly what I want just fine, but it's extremely awkward and tedious to work with, and also it behaves in mysterious ways (for one thing, even though the "ed3Dcnt.put()" inserted a HashMap, the corresponding ".get()" does not return a HashMap, but rather a Collection, which creates a whole new set of problems.) Note that I'd like to sort the result on the values, from highest to lowest, but I think I can do that easily enough.
So if you please, a suggestion on a better choice of data structure design? If there isn't a clearly better design choice, how do I use the Collection that the .get() returns, when all I want is the single HashMap that I put in that slot?
Thanks very much - Ed
You can remove some of the oddity by replacing your Map<String, Integer> by a Multiset.
A multiset (or bag) is a set that allows duplicate elements and counts them. You throw in an apple, a pear, and an apple again; it remembers that it has two apples and a pear. It is basically what you were building with your Map<String, Integer>.
Multiset<String> eqCounts = HashMultiset.create();
the corresponding ".get()" does not return a HashMap, but rather a Collection
This is because you used a generic 'Multimap' interface. The docs say:
You rarely use the Multimap interface directly, however; more often you'll use ListMultimap or SetMultimap, which map keys to a List or a Set respectively.
So, to stick to your original design:
Each column will be a Multiset<String> which will store and count your values.
You'll have a Map<String, Multiset<String>> (key is a header, value is the column) where you'll put the columns like this:
Map<String, Multiset<String>> columns = Maps.newHashMap();
for (int i = 0; i < headers.length; i++) {
System.out.format("Got a variable of %s\n", headers[i]);
columns.put(headers[i], HashMultiset.<String>create());
}
Read a line and put the values where they belong:
String[] values = line.split(" ");
for (int i = 0; i < headers.length; i++) {
columns.get(headers[i]).add(values[i]);
}
All that said, you can see that the outer HashMap is kind of redundant and the whole thing could still be improved (though it's good enough, I think). To improve it further, you can try one of these:
Use an array of Multisets instead of a HashMap. After all, you know the number of columns beforehand.
If you're uncomfortable with creating generic arrays, use a List.
And probably the best: Create a class Column like this:
private static class Column {
private final String header;
private final Multiset<String> values;
private Column(String header) {
this.header = header;
this.values = HashMultiset.create();
}
}
And instead of using String[] for headers and a Map<String, Multiset<String>> for their values, use a Column[]. You can create this array in place of creating the headers array.
Seems to me that the best fit is:
HashMap<String, HashMap<String, Integer>> map= new HashMap<String, HashMap<String, Integer>>();
Now, to add an inner map for each header:
for (int i = 0; i + 1 < header.length; i++) {
System.out.format("Got a variable of %s\n", tmpStrKey = header[i]);
map.put(tmpStrKey, new HashMap<String, Integer>());
}
And to increment a value in the inner map:
//we are in some for loop
for ( ... ) {
String columnKey = "animal"; //lets say we are here in the for loop
for ( ... ) {
String columnValue = "cat"; //assume we are here
HashMap<String, Integer> innerMap = map.get(columnKey);
// increment the occurrence count
Integer count = innerMap.get(columnValue);
if (count == null) {
count = 0;
}
innerMap.put(columnValue, ++count);
}
}
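If you would rather avoid Guava, the nested-map design can be written with the plain JDK (Java 8 or later), using Map.merge to do the increment in a single call. This sketch assumes the rows are already split into String[] arrays in header order; the class and method names are hypothetical. It also covers the highest-to-lowest sort the question mentions, by copying one inner map into a LinkedHashMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ColumnCounts {
    // header -> (value -> number of rows in which that column had that value)
    static Map<String, Map<String, Integer>> count(String[] headers, List<String[]> rows) {
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        for (String header : headers) {
            counts.put(header, new HashMap<>());
        }
        for (String[] row : rows) {
            for (int i = 0; i < headers.length; i++) {
                // merge() stores 1 for a new value, otherwise adds 1 to the old count
                counts.get(headers[i]).merge(row[i], 1, Integer::sum);
            }
        }
        return counts;
    }

    // Copies one inner map into a LinkedHashMap ordered from highest to lowest count.
    static Map<String, Integer> sortedByCount(Map<String, Integer> inner) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(inner.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        Map<String, Integer> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }
}
```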
1) The map inside your multimap is commonly referred to as a cardinality map. For creating a cardinality map from a collection of values, I usually use CollectionUtils.getCardinalityMap from Apache Commons Collections, although that isn't generified, so you'll need one unsafe (but known to be safe) cast. If you want to build the map using Guava, I think you should first put the values for a variable in a Set<String> (to get the set of unique values) and then use Iterables.frequency() for each value to get the count. (EDIT: or even easier: use ImmutableMultiset.copyOf(collection) to get the cardinality map as a Multiset.) Anyway, the resulting cardinality map is a Map<String, Integer> such as you're already using.
2) I don't see why you need a Multimap. After all you want to map each variable to a cardinality map, so I'd use Map<String, Map<String, Integer>>.
EDIT: or use Map<String, Multiset<String>> if you decide to use a Multiset as your cardinality map.

How to make two dimensional LinkedList in java?

for example:
public static LinkedList<String, Double> ll = new LinkedList<String, Double>;
from your question, I think (not 100% sure) you are looking for
java.util.LinkedHashMap<K, V>
in your case, it would be LinkedHashMap<String, Double>
From the Javadoc:
Hash table and linked list implementation of the Map interface, with predictable iteration order. This implementation differs from HashMap in that it maintains a doubly-linked list running through all of its entries.
If you do want to get an element by index with list.get(5), you could use:
LinkedList<Entry<String, Double>>
so you can get an Entry element with Entry<String, Double> entry = list.get(5); then entry.getKey() gives you the String, and entry.getValue() gives you the Double.
Reading all your comments, I suggest you do something like this:
public class StringAndDouble {
private String str;
private double dbl;
// add constructor
// add getters, setters and other methods as needed.
// override equals() and hashCode()
}
Now you can use:
List<StringAndDouble> list = new LinkedList<>(); // or
List<StringAndDouble> list = new ArrayList<>(); // better in most cases
Now you can access your objects by index.
This answer creates a new class, to fit your needs. The class has two fields, one String, one double. This doesn't make the class two dimensional. I think you have a misunderstanding there. When there are n dimensions, you need n indexes to access an element. You were talking of accessing by index, so I assume you're looking for a one dimensional list holding the objects, that have more than one field.
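A filled-in version of the class above, as a sketch. The constructor and getters are illustrative, and making the fields final and dropping the setters is a design choice, not a requirement. equals uses Double.compare so that it agrees with the Double-based hashCode.

```java
import java.util.Objects;

class StringAndDouble {
    private final String str;
    private final double dbl;

    StringAndDouble(String str, double dbl) {
        this.str = str;
        this.dbl = dbl;
    }

    String getStr() { return str; }

    double getDbl() { return dbl; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StringAndDouble)) return false;
        StringAndDouble other = (StringAndDouble) o;
        return Double.compare(dbl, other.dbl) == 0 && Objects.equals(str, other.str);
    }

    @Override
    public int hashCode() {
        return Objects.hash(str, dbl); // dbl is boxed to Double here
    }
}
```

Usage: `List<StringAndDouble> list = new ArrayList<>(); list.add(new StringAndDouble("a", 1.5)); String s = list.get(0).getStr();`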
Do you mean like this?
HashMap<String, Double> hm = new HashMap<String, Double>();
Since the OP, in a comment to @Kent, says he wants to be able to get items by index...
Note that a LinkedList (and LinkedHashMap) are inefficient at that. He may prefer an ArrayList. So I would suggest that his "2D" implementation be a
ArrayList<Map.Entry<String, Double>>
which will efficiently support a get by index.
As for the normal get(String key), you'd have to do a linear search of all the entries, which would be inefficient.
So, you have a decision: which way of accessing (by a key or by an index) is more important?
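A sketch of that trade-off with a hypothetical wrapper class: access by index is a constant-time ArrayList lookup, while access by key has to scan every entry.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class IndexedPairs {
    private final List<Map.Entry<String, Double>> entries = new ArrayList<>();

    void add(String key, double value) {
        entries.add(new AbstractMap.SimpleEntry<String, Double>(key, value));
    }

    // O(1): ArrayList supports random access by index.
    Map.Entry<String, Double> get(int index) {
        return entries.get(index);
    }

    // O(n): without a separate index, finding a key needs a linear scan.
    Double getByKey(String key) {
        for (Map.Entry<String, Double> e : entries) {
            if (e.getKey().equals(key)) {
                return e.getValue();
            }
        }
        return null; // not found
    }
}
```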
You can actually nest LinkedLists within each other...
For Example:
public LinkedList<LinkedList<Integer>> twoDimLinkedList = new LinkedList<LinkedList<Integer>>();
Then:
int value = twoDimLinkedList.get(3).get(4);
or (if you were planning on using it for iterative purposes):
// numRows and numCols are the desired dimensions (assumed to be defined)
for (int i = 0; i < numRows; i++) {
    LinkedList<Integer> twoDimLinkedListRow = new LinkedList<Integer>();
    for (int m = 0; m < numCols; m++) {
        twoDimLinkedListRow.add(value);
    }
    twoDimLinkedList.add(twoDimLinkedListRow);
}

How to sort a map

I have a Map to sort as follows:
Map<String, String> map = new HashMap();
It contains the following String keys:
String key = "key1.key2.key3.key4"
It contains the following String values:
String value = "value1.value2"
where the key and value can vary by their number of dot sections from key1/value1 to key1.key2.key3.key4.key5/value1.value2.value3.value4.value5 non-homogeneously
I need to compare them by the number of dots in the keys or in the values, depending on how the method is called:
sortMap(Map map, int byKey);
or
sortMap(Map map, int byValue);
The methods of course will return a sorted map.
Any help would be appreciated.
There is no way to impose any sort of order on HashMap.
If you want to order elements by some comparison on the keys, then use a TreeMap with some Comparator on the keys, or just use their default Comparable ordering.
If you want to order by the values, the only real option is to use a LinkedHashMap, which preserves the order that entries were put into the map, and then to sort the entries before inserting them into the map, or perhaps some non-JDK Map implementation. There are dirty hacks that make a key comparator that actually secretly compares the values, but these are dangerous and frequently lead to unpredictable behavior.
For starters, you will need to be using an instance of SortedMap. If the map doesn't implement that interface, then it has an undefined/arbitrary iteration order and you can't control it. (Generally this is the case, since a map is a way of associating values with keys; ordering is an auxiliary concern.)
So I'll assume you're using TreeMap, which is the canonical sorted map implementation. This sorts its keys according to a Comparator which you can supply in the constructor. So if you can write such a comparator that determines which is the "lower" of two arbitrary keys (spoiler alert: you can), this will be straightforward to implement.
This will, however, only work when sorting by key. I don't know if it makes much sense to sort a map by value, and I'm not aware of any straightforward way to do this. The best I can think of is to write a Comparator<Map.Entry> that sorts on values, call Map.entrySet() and push all the entries into a list, then call Collections.sort on the list. It's not very elegant or efficient, but it should get the job done if performance isn't your primary concern.
(Note also that if your keys aren't immutable, you will run into a lot of trouble, as they won't be re-sorted when they are changed externally.)
You should use a TreeMap and implement a ValueComparator or make the key and value objects that implement Comparable.
Must be a duplicate here...
edit: duplicate of (to name just one) Sort a Map<Key, Value> by values (Java)
I did it by the following:
@SuppressWarnings({ "unchecked", "rawtypes" })
public static Map sortMap(Map unsortedMap) {
List list = new LinkedList(unsortedMap.entrySet());
// sort list based on comparator
Collections.sort(list, new Comparator() {
public int compare(Object o1, Object o2) {
String value1 = (String)((Map.Entry) (o1)).getValue();
String value2 = (String)((Map.Entry) (o2)).getValue();
// declare the count
int count1 = findOccurances(value1, '.');
int count2 = findOccurances(value2, '.');
// compare: more dots sorts first (descending)
if(count1 > count2){
return -1;
}
if(count1 < count2){
return 1;
}
return 0;
}
});
// put the sorted list into map again
Map sortedMap = new LinkedHashMap();
for (Iterator it = list.iterator(); it.hasNext();) {
Map.Entry entry = (Map.Entry) it.next();
sortedMap.put(entry.getKey(), entry.getValue());
}
return sortedMap;
}
With the following helper method:
private static int findOccurances(String s, char chr) {
final char[] chars = s.toCharArray();
int count = 0;
for (int i = 0; i < chars.length; i++) {
if (chars[i] == chr) {
count++;
}
}
return count;
}
Here, I can put some switch on the comparing part with an additional int argument to change between asc/desc.
I can change between values and keys through a switch of another int argument value to get my answer.
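For comparison, a generic sketch of the same idea without raw types (Java 8 or later). Boolean flags stand in for the int switches mentioned above; the class name, method names and parameters are assumptions. It still returns a LinkedHashMap whose iteration order carries the sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DotSort {
    // Number of '.' characters in s.
    static int dots(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '.') {
                count++;
            }
        }
        return count;
    }

    // Orders the entries by the number of dots in the keys (byKey == true)
    // or in the values (byKey == false), ascending unless `descending` is set.
    static Map<String, String> sortMap(Map<String, String> map, boolean byKey, boolean descending) {
        Comparator<Map.Entry<String, String>> cmp = Comparator.comparingInt(
                (Map.Entry<String, String> e) -> dots(byKey ? e.getKey() : e.getValue()));
        if (descending) {
            cmp = cmp.reversed();
        }
        List<Map.Entry<String, String>> entries = new ArrayList<>(map.entrySet());
        entries.sort(cmp);
        Map<String, String> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }
}
```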
