When to use TreeMap instead of HashMap - java

I require a map that supports 3 operations: "insert", "remove" and "iterate in sorted order". This is exactly the interface of TreeMap in Java. That being said, it can also be implemented by using a HashMap and sorting it every time before iteration. To analyze the different approaches, let's say I perform n inserts, m removes and r reads, and then iterate.
With TreeMap we have the following implementation:
TreeMap<Integer, Integer> tm = Maps.newTreeMap();
for (int i=0;i<n;++i) {tm.put(i, 2*i);} // O(n*log(n))
for (int i=0;i<m;++i) {tm.remove(i);} // O(m*log(m))
for (int i=0;i<r;++i) {tm.get(i);} // O(r*log(n-m))
for (Integer i : tm.keySet()) {print(i);} // O(n-m)
All told we have a total run time of O(n*log(n) + m*log(m) + r*log(n-m))
With HashMap we have the following implementation:
HashMap<Integer, Integer> hm = Maps.newHashMap();
for (int i=0;i<n;++i) {hm.put(i, 2*i);} // O(n)
for (int i=0;i<m;++i) {hm.remove(i);} // O(m)
for (int i=0;i<r;++i) {hm.get(i);} // O(r)
List<Integer> sortedList = Lists.newArrayList(hm.keySet()); // O(n-m)
Collections.sort(sortedList); // O((n-m)*log(n-m))
for (Integer i : sortedList) {print(i);} // O(n-m)
All told we have a total run time of O((n-m)*log(n-m)).
For all n,m O(n*log(n) + m*log(m) + r*log(n-m)) > O((n-m)*log(n-m)).
My question therefore is, what is the use case where a TreeMap is better than a HashMap? Is it only better if you need to iterate over the map many (let's say k) times (in which case, if k is >> log(n) the run time for TreeMap will be O(k*(n-m)) whereas for HashMap will be O(k*(n-m)*log(n-m)))? Regardless, if you are only performing O(log(n)) iterations (this does sound like such a sane use case), HashMap will outperform TreeMap. Am I missing something?

Of course there exist such use cases. In all read-heavy settings you have the advantage of sorting only once, during insertion. The majority of use cases are read-heavy, contrary to the assumptions of your question.
An even greater advantage is offered by the TreeMap when you need to extract submaps with an upper or lower bound on the key, find the minimum or maximum keys, or find keys closest to a given key. The interface NavigableMap is dedicated to these operations.
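As a minimal sketch of the NavigableMap operations mentioned above (the class name, the `sample` helper and the keys are made up for illustration), each of these calls is answered from the sorted structure without any extra sorting step:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class NavigableMapDemo {
    /** Builds a small sorted map with the keys 10, 20, 30, 40 and 50. */
    static NavigableMap<Integer, String> sample() {
        NavigableMap<Integer, String> map = new TreeMap<>();
        for (int key = 10; key <= 50; key += 10) {
            map.put(key, "v" + key);
        }
        return map;
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> map = sample();
        System.out.println(map.firstKey());                  // smallest key
        System.out.println(map.lastKey());                   // largest key
        System.out.println(map.floorKey(25));                // greatest key <= 25
        System.out.println(map.ceilingKey(25));              // least key >= 25
        System.out.println(map.subMap(20, true, 40, false)); // keys in [20, 40)
        System.out.println(map.headMap(30, true));           // keys <= 30
    }
}
```

A HashMap would have to scan or sort all of its keys to answer any of these queries.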

An obvious use case is when you want to sort the map according to some Comparator definition. It's not always about performance.
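To illustrate the Comparator point, here is a small sketch (the `orderedKeys` helper and the sample keys are hypothetical) showing that the same keys iterate in a different order depending on the Comparator handed to the TreeMap:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ComparatorOrderDemo {
    /** Returns the keys in the iteration order of a TreeMap built with the given comparator. */
    static List<String> orderedKeys(Comparator<String> order, String... keys) {
        Map<String, Integer> map = new TreeMap<>(order);
        for (String key : keys) {
            map.put(key, key.length());
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Natural String order puts uppercase letters before lowercase ones.
        System.out.println(orderedKeys(Comparator.naturalOrder(), "banana", "Cherry", "apple"));
        // A case-insensitive comparator gives plain alphabetical order.
        System.out.println(orderedKeys(String.CASE_INSENSITIVE_ORDER, "banana", "Cherry", "apple"));
    }
}
```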

Related

Iterate over key-range of HashMap

Is it possible to iterate over a certain range of keys from a HashMap?
My HashMap contains key-value pairs where the key denotes a certain column in Excel (e.g. "BM" or "AT") and the value is the value in this cell.
For example, my table import is:
startH = {
BQ=2019-11-04,
BU=2019-12-02,
BZ=2020-01-06,
CD=2020-02-03,
CH=2020-03-02,
CM=2020-04-06
}
endH = {
BT=2019-11-25,
BY=2019-12-30,
CC=2020-01-27,
CG=2020-02-24,
CL=2020-03-30,
CP=2020-04-27
}
I need to iterate over those two hashmap using a key-range in order to extract the data in the correct order. For example from "BQ" to "BT".
Explanation
Is it possible to iterate over hashmap but using its index?
No.
A HashMap has no indices, and its implementation could not offer them anyway. Java's HashMap stores entries in hash buckets, and a bucket with many collisions can be converted into a red-black tree; neither structure provides direct positional access. So no, it is not possible.
There is another fundamental flaw in this approach. HashMap does not maintain any order. Iterating it yields random orders that can change each time you start the program. But for this approach you would need insertion order. Fortunately LinkedHashMap does this. It still does not provide index-based access though.
Solutions
Generation
But you actually do not even want index-based access. You want to retrieve a certain key-range, for example from "BA" to "BM". A good approach that works with HashMap would be to generate your key-range and simply use Map#get to retrieve the data:
char row = 'B';
char columnStart = 'A';
char columnEnd = 'M';
for (char column = columnStart; column <= columnEnd; column++) {
String key = Character.toString(row) + column;
String data = map.get(key);
...
}
You might need to fine-tune it if you need proper edge-case handling, such as wrapping around the alphabet (use 'A' + (column % alphabetSize)). It may also need some char-to-int casting and back for the additions; I did not test it.
NavigableMap
There is actually a variant of Map that offers pretty much what you want out of the box, though at a higher performance cost than a simple HashMap. The interface is called NavigableMap, and the class TreeMap is a good implementation. The catch is that it requires an explicit order on the keys. The good news is that you actually want String's natural order, which is lexicographical.
So you can simply use it with your existing data and then use the method NavigableMap#subMap:
NavigableMap<String, String> map = new TreeMap<>(...);
String startKey = "BA";
String endKey = "BM";
Map<String, String> subMap = map.subMap(startKey, true, endKey, true); // both bounds inclusive
for (Entry<String, String> entry : subMap.entrySet()) {
...
}
If you have to do this kind of request more than once, this will definitely pay off, and it is the perfect data structure for this use case.
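One caveat with the natural String order: it only matches spreadsheet column order while all keys have the same length, since "AA" sorts before "Z" lexicographically. The following is a hedged sketch, assuming the keys are pure column letters as in the question; the `COLUMN_ORDER` comparator, the `range` helper and the sample cells are made up for illustration.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ColumnRangeDemo {
    // Assumption: keys are spreadsheet column letters such as "Z" or "BQ".
    // Shorter names come first; names of equal length are ordered alphabetically.
    static final Comparator<String> COLUMN_ORDER =
            Comparator.comparingInt(String::length)
                      .thenComparing(Comparator.<String>naturalOrder());

    /** Builds a small sample map ordered by COLUMN_ORDER. */
    static NavigableMap<String, String> sample() {
        NavigableMap<String, String> cells = new TreeMap<>(COLUMN_ORDER);
        cells.put("BQ", "2019-11-04");
        cells.put("BT", "2019-11-25");
        cells.put("BU", "2019-12-02");
        cells.put("Z", "header");
        return cells;
    }

    /** Returns the entries whose column lies between from and to, both inclusive. */
    static NavigableMap<String, String> range(NavigableMap<String, String> cells,
                                              String from, String to) {
        return cells.subMap(from, true, to, true);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> cells = sample();
        System.out.println(range(cells, "BQ", "BT").keySet());
        System.out.println(cells.firstKey()); // "Z" precedes every two-letter column
    }
}
```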
Linked iteration
As explained before, it is also possible (although not as efficient) to instead have a LinkedHashMap (to maintain insertion order) and then simply iterate over the key range. This has some major drawbacks though, for example it first needs to locate the start of the range by fully iterating to there. And it relies on the fact that you inserted them correctly.
LinkedHashMap<String, String> map = ...
String startKey = "BA";
String endKey = "BM";
boolean isInRange = false;
for (Entry<String, String> entry : map.entrySet()) {
String key = entry.getKey();
if (!isInRange) {
if (key.equals(startKey)) {
isInRange = true;
} else {
continue;
}
}
...
if (key.equals(endKey)) {
break;
}
}
// rangeLower and rangeUpper can be arguments
int i = 0;
for (Object mapKey : map.keySet()) {
    if (i >= rangeLower && i <= rangeUpper) {
        // Do something with mapKey
    }
    i++; // advance the position for every key, whether in range or not
}
The above code iterates over the key set while explicitly maintaining an index that is incremented on every pass. Note that the positions are only meaningful if the map has a defined iteration order. Another option is to use LinkedHashMap, which maintains a doubly linked list to preserve insertion order.
I don't believe you can. The algorithm you propose assumes that the keys of a HashMap are ordered and they are not. Order of keys is not guaranteed, only the associations themselves are guaranteed.
You might be able to change the structure of your data to something like this:
ranges = {
BQ=BT,
BU=BY,
....
}
Then the iteration over the HashMap keys (start cells) would easily find the matching end cells.

Finding the maximum and minimum values in a HashMap of ArrayLists with semi-known key - Java

I have a HashMap of ArrayLists as follows:
HashMap<String, ArrayList<Double>> Flkn = new HashMap<String, ArrayList<Double>>();
Flkn.put("T_"+l+"_"+k+"_"+n, new ArrayList<>());
l, k and n take their values based on several loops and hence their values change depending on the parameters.
Under these circumstances, I want to know, for a given value of k, how the minimum and maximum values of the elements can be found in their relevant ArrayLists. (Please note that the length of the ArrayLists is also dependent on the parameters.)
For instance, let's say that I am wanting to know the minimum and maximum values within the ArrayList for k=3. Then what I am looking for would be all the ArrayLists that have the key ("T_"+l+"_"+3+"_"+n) for every value of l and n. The problem here is that there is no way I can predict the values of l and n because they are totally dependent on the code. Another inconvenient thing is that I am wanting to get the minimum and maximum values out of the loops where l and n get their values, hence using these variables directly isn't feasible.
What would be an efficient way to get Java to call every value of l and n and fetch the values in the ArrayList in order to find the minimum and maximum of these values?
If you absolutely have to deal with such "smart keys", for any kind of processing based on its parts you first need functions to extract values of those parts:
final static Function<String, Integer> EXTRACT_K = s -> Integer.parseInt(s.replaceAll("T_\\d+_(\\d+)_\\d+", "$1"));
final static Function<String, Integer> EXTRACT_L = s -> Integer.parseInt(s.replaceAll("T_(\\d+)_\\d+_\\d+", "$1"));
final static Function<String, Integer> EXTRACT_N = s -> Integer.parseInt(s.replaceAll("T_\\d+_\\d+_(\\d+)", "$1"));
These functions, when applied to a key, return k, l or n, respectively (if you know a more elegant way to do this, please comment or edit).
To be as efficient as possible (iterating over only part of the map rather than all of it), I suggest switching from HashMap to any implementation of SortedMap, with the ordering based on the values stored in the smart key:
final static Comparator<String> CMP
= Comparator.comparing(EXTRACT_K)
.thenComparing(EXTRACT_L)
.thenComparing(EXTRACT_N);
SortedMap<String, List<Double>> map = new TreeMap<>(CMP);
This gives you a map whose entries are sorted first by k, then by l and finally by n. Now it is possible to get all lists mapped to a given k using:
int k = 1;
Collection<List<Double>> lists
= map.subMap(String.format("T_0_%s_0", k), String.format("T_0_%s_0", k + 1)).values();
To get the max and min over all of those lists, stream them, flatten the result to a DoubleStream and use its .summaryStatistics() as follows:
DoubleSummaryStatistics s
= lists.stream()
.flatMapToDouble(vs -> vs.stream().mapToDouble(Double::doubleValue))
.summaryStatistics();
The final part is to check whether values exist:
if (s.getCount() > 0) {
    max = s.getMax();
    min = s.getMin();
} else {
    // no values exist for a given k, thus max and min are undefined
}
In Java 8 you could use DoubleSummaryStatistics and do something like this:
final DoubleSummaryStatistics stats =
Flkn.entrySet().stream().filter(e -> e.getKey().matches("T_[0-9]+_" + k + "_[0-9]+"))
.flatMapToDouble(e -> e.getValue().stream().mapToDouble(Double::doubleValue))
.summaryStatistics();
System.out.println(stats.getMax());
System.out.println(stats.getMin());
filter to keep only the entries you need; flatMapToDouble to merge your lists; and summaryStatistics to get both the minimum and maximum.
I'll simplify this a bit. Suppose you have a key that depends on an Integer k and a String s. It might seem a good idea to use a
Map<String, Object>
where the keys are k + " " + s (or something similar).
This is a terrible idea because, as you have realised, you have to iterate over the entire map and use String.split in order to find entries for a particular k value. This is extremely inefficient.
One common solution is to use a Map<Integer, Map<String, Object>> instead. You can get the object associated to k = 3, s = "foo" by doing map.get(3).get("foo"). You can also get all objects associated to 3 by doing map.get(3).values().
The downside to this approach is that it is a bit cumbersome to add to the map. In Java 8 you can do
map.computeIfAbsent(3, k -> new HashMap<String, Object>()).put("foo", "bar");
Google Guava's Table interface takes the pain out of using a data structure like this.
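The nested-map idea described above can be sketched with plain JDK classes as follows. This is only an illustration under assumptions: the class name, the `add` and `statsFor` methods and the "l_n"-style inner labels are hypothetical and not part of the original question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.DoubleSummaryStatistics;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NestedMapStats {
    // Outer key: k. Inner key: a hypothetical label built from l and n. Value: the measurements.
    private final Map<Integer, Map<String, List<Double>>> byK = new HashMap<>();

    /** Appends one value to the list stored under (k, label), creating both levels on demand. */
    void add(int k, String label, double value) {
        byK.computeIfAbsent(k, key -> new HashMap<>())
           .computeIfAbsent(label, key -> new ArrayList<>())
           .add(value);
    }

    /** Summarises every value stored for the given k; getCount() is 0 when there are none. */
    DoubleSummaryStatistics statsFor(int k) {
        Map<String, List<Double>> inner =
                byK.getOrDefault(k, Collections.<String, List<Double>>emptyMap());
        return inner.values().stream()
                    .flatMapToDouble(list -> list.stream().mapToDouble(Double::doubleValue))
                    .summaryStatistics();
    }

    public static void main(String[] args) {
        NestedMapStats stats = new NestedMapStats();
        stats.add(3, "1_2", 4.0);
        stats.add(3, "5_6", -1.5);
        stats.add(2, "1_1", 100.0);
        DoubleSummaryStatistics forK3 = stats.statsFor(3);
        System.out.println(forK3.getMin() + " .. " + forK3.getMax());
    }
}
```

Looking up all values for one k touches only that k's inner map, so no key parsing or full-map scan is needed.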

Getting the indices of an unsorted double array after sorting

This question comes as a companion of this one that regarded fastest sorting of a double array.
Now I want to get the top-k indices corresponding to the unsorted array.
I have implemented this version which (unfortunately) uses autoboxing and HashMap as proposed in some answers including this one:
HashMap<Double, Integer> map = new HashMap<Double, Integer>();
for(int i = 0; i < numClusters; i++) {
map.put(scores[i], i);
}
Arrays.sort(scores);
HashSet<Integer> topPossibleClusters = new HashSet<Integer>();
for(int i = 0; i < numClusters; i++) {
topPossibleClusters.add(map.get(scores[numClusters - (i+1)]));
}
As you can see this uses a HashMap with keys the Double values of the original array and as values the indices of the original array.
So, after sorting the original array I just retrieve it from the map.
I also use HashSet as I am interested in deciding if an int is included in this set, using .contains() method. (I don't know if this makes a difference since as I mentioned in the other question my arrays are small -50 elements-). If this does not make a difference point it out though.
I am not interested in the value per se, only the indices.
My question is whether there is a faster approach to go with it?
This sort of interlinking/interlocking collections lends itself to fragile, easily broken, hard to debug, unmaintainable code.
Instead create an object:
class Data {
double value;
int originalIndex;
}
Create an array of Data objects storing the original value and index.
Sort them using a custom comparator that looks at data.value and sorts descending.
Now the top X items in your array are the ones you want and you can just look at the value and originalIndex as you need them.
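The steps above could look roughly like this. It is a sketch rather than a tuned implementation; the `topK` helper and the class name are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

public class TopKIndices {
    /** Pairs a score with the position it had in the original array. */
    static final class Data {
        final double value;
        final int originalIndex;

        Data(double value, int originalIndex) {
            this.value = value;
            this.originalIndex = originalIndex;
        }
    }

    /** Returns the original indices of the k largest scores, largest first. */
    static int[] topK(double[] scores, int k) {
        Data[] data = new Data[scores.length];
        for (int i = 0; i < scores.length; i++) {
            data[i] = new Data(scores[i], i);
        }
        // Sort descending by value; the input array itself is left untouched.
        Arrays.sort(data, Comparator.comparingDouble((Data d) -> d.value).reversed());

        int[] result = new int[Math.min(k, data.length)];
        for (int i = 0; i < result.length; i++) {
            result[i] = data[i].originalIndex;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] scores = {0.3, 0.9, 0.1, 0.5};
        System.out.println(Arrays.toString(topK(scores, 2)));
    }
}
```

Unlike a map keyed by score, this keeps duplicate scores, because each Data object carries its own index.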
As Tim points out, linking multiple collections is rather error-prone. I would suggest using a TreeMap, as this would allow for a standalone solution.
Let's say you have double[] data; first copy it to a TreeMap:
final TreeMap<Double, Integer> dataWithIndex = new TreeMap<>();
for(int i = 0; i < data.length; ++i) {
dataWithIndex.put(data[i], i);
}
N.B. You can declare dataWithIndex as a NavigableMap to be less specific, but it's so much longer and it doesn't really add much, as TreeMap is the only non-concurrent implementation in the JDK (the other is ConcurrentSkipListMap).
This will populate the Map in O(n lg n) time as each put is O(lg n) - this is the same complexity as sorting. In reality it will probably be a little slower, but it will scale in the same way.
Now, say you need the first k elements, you need to first find the kth element - this is O(k):
final Iterator<Double> keyIter = dataWithIndex.keySet().iterator();
double kthKey = Double.NaN; // assigned in the loop below; assumes 1 <= k <= dataWithIndex.size()
for (int i = 0; i < k; ++i) {
kthKey = keyIter.next();
}
Now you just need to get the sub-map that has all the entries up to the kth entry:
final Map<Double, Integer> topK = dataWithIndex.headMap(kthKey, true);
If you only need to do this once, then with Java 8 you can do something like this:
List<Entry<Double, Integer>> topK = IntStream.range(0, data.length).
mapToObj(i -> new SimpleEntry<>(data[i], i)).
sorted(comparing(Entry::getKey)).
limit(k).
collect(toList());
i.e. take an IntStream for the indices of data and mapToObj to an Entry of the data[i] => i (using the AbstractMap.SimpleEntry implementation). Now sort that using Entry::getKey and limit the size of the Stream to k entries. Now simply collect the result to a List. This has the advantage of not clobbering duplicate entries in the data array.
It is almost exactly what Tim suggests in his answer, but using an existing JDK class.
This method is also O(n lg n). The catch is that if the TreeMap approach is reused then it's O(n lg n) to build the Map but only O(k) to reuse it. If you want to use the Java 8 solution with reuse then you can do:
List<Entry<Double, Integer>> sorted = IntStream.range(0, data.length).
mapToObj(i -> new SimpleEntry<>(data[i], i)).
sorted(comparing(Entry::getKey)).
collect(toList());
i.e. don't limit the size to k elements. Now, to get the first k elements you just need to do:
List<Entry<Double, Integer>> subList = sorted.subList(0, k);
The magic of this is that it's O(1).

Find number of distinct elements in a linked list

I have a LinkedList that contains many objects. How can I find the number and frequency of the distinct elements in the LinkedList?
You can iterate the list with a for-each loop while maintaining a histogram.
The histogram will actually be a Map<T,Integer> where T is the type of the elements in the linked list.
If you use a HashMap, this will get you O(n) average case algorithm for it - be sure you override equals() and hashCode() for your T elements. [if T is a built-in class [like Integer or String], you shouldn't be worried about this, they already override these methods].
The idea is simple: iterate the list, and for each element search for it in the histogram. If it is not there, insert it with value 1 [since you just saw it for the first time]. If it is in the histogram already, extract the value and re-insert the element with the same key and with value + 1.
It should look something like this: [list is of type LinkedList<Integer>]
Map<Integer,Integer> histogram = new HashMap<Integer, Integer>();
for (Integer x : list) {
Integer value = histogram.get(x);
if (value == null) histogram.put(x,1);
else histogram.put(x, value + 1);
}
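On Java 8 and later, the same histogram loop can be written with Map.merge, which handles the "absent" and "present" cases in a single call. This is a small sketch; the generic `histogram` helper and the class name are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class HistogramDemo {
    /** Counts how often each distinct element occurs in items. */
    static <T> Map<T, Integer> histogram(Iterable<T> items) {
        Map<T, Integer> counts = new HashMap<>();
        for (T item : items) {
            // Stores 1 for a new key, otherwise adds 1 to the existing count.
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>(Arrays.asList(1, 2, 2, 3, 3, 3));
        Map<Integer, Integer> counts = histogram(list);
        System.out.println("distinct elements: " + counts.size());
        System.out.println("frequencies: " + counts);
    }
}
```

The number of distinct elements is the size of the map, and each value is the frequency of its key.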
A simpler variation of the histogram solution with a Guava Multiset:
Multiset<Integer> multiset = HashMultiset.create();
multiset.addAll(linkedList);
int count = multiset.count(element); // number of occurrences of element
Set<Integer> distinctElements = multiset.elementSet();
// set of all the unique elements seen
(Disclosure: I work on Guava.)
@amit's answer is good, but I want to share a slight variation (and can't format a block of code in a comment - otherwise this would just be a comment). I like to make two passes, one to create the histogram elements and the second to populate them. This feels cleaner to me, although it may be less efficient.
Map<Integer,Integer> histogram = new HashMap<Integer, Integer>();
for (Integer n : list)
histogram.put(n, 0);
for (Integer n : list)
histogram.put(n, histogram.get(n) + 1);
The LambdaJ Library offers a few interesting methods to query collections very easily as well:
List<Jedi> jedis = asList(
new Jedi("Luke"), new Jedi("Obi-wan"), new Jedi("Luke"),
new Jedi("Yoda"), new Jedi("Mace-Windu"),new Jedi("Luke"),
new Jedi("Obi-wan")
);
Group<Jedi> byName = with(jedis).group(Groups.by(on(Jedi.class).getName()));
System.out.println(byName.find("Luke").size()); //output 3
System.out.println(byName.find("Obi-wan").size()); //output 2
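I have just learned about HashSet and have no idea about maps yet, so let me suggest a solution based on HashSet. It counts the distinct elements only, not their frequencies:
for (String a : Linklist1) {
    if (Hashset1.add(a)) { // add returns true only the first time a value is seen
        count++;
    }
}
System.out.println(count);
Hope this helps.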

Improving performance of merging lots of sorted maps into one sorted map - java

I have a method that gets a SortedMap as input, this map holds many SortedMap objects, the output of this method should be one SortedMap containing all elements of the maps held in the input map. the method looks like this:
private SortedMap mergeSamples(SortedMap map){
SortedMap mergedMap = new TreeMap();
Iterator sampleIt = map.values().iterator();
while(sampleIt.hasNext())
{
SortedMap currMap = (SortedMap) sampleIt.next();
mergedMap.putAll(currMap);
}
return mergedMap;
}
This is a performance killer, what can I improve here?
I don't see anything wrong with your code; all you can really do is try alternative implementations of SortedMap. First one would be ConcurrentSkipListMap and then look at Commons Collections, Google Collections and GNU Trove. The latter can yield very good results especially if your maps' keys and values are primitive types.
Is it a requirement for the input to be a SortedMap? To me it would seem easier if the input was just a Collection or List. That might speed up creating the input, and might make iteration over all contained maps faster.
Other than that, I believe the most likely way to improve the performance of this code is to improve the speed of the compareTo() implementation of the keys in the sorted maps being merged.
Your code is as good as it gets. However, it seems to me that the overall design of the data structure needs some overhaul: you are using SortedMap<?, SortedMap<?, ?>>, yet the keys of the parent map are not used.
Do you want to express a tree with nested elements, and is your task to flatten that tree? If so, either create a Tree class that supports your approach, or use an intelligent way to merge the keys:
public class NestedKey implements Comparable<NestedKey> {
private Comparable[] entries;
public NestedKey(Comparable... entries) {
assert entries != null;
this.entries = entries;
}
public int compareTo(NestedKey other) {
for(int i = 0; i < other.entries.length; i++) {
if (i == entries.length)
return -1; // other is longer than self <=> self is smaller than other
int cmp = entries[i].compareTo(other.entries[i]);
if (cmp != 0)
return cmp;
}
if (entries.length > other.entries.length)
return 1; // self is longer than others <=> self is larger than other
else
return 0;
}
}
A NestedKey used as a key for a SortedMap compares to other NestedKey objects by comparing each of its entries in turn. A NestedKey that matches another on all shared entries but has more entries is considered larger. Thus, you have a relationship like this:
NestedKey(1, 2, 3) < NestedKey(1, 2, 4)
NestedKey(1, 3, 3) < NestedKey(2, 1, 1)
NestedKey(1, 2, 3) < NestedKey(2)
If you use only one SortedMap that uses NestedKey as its keys, then its .values() collection automatically returns all entries, flattened out. However, if you want to use only parts of the SortedMap, then you must use .subMap. For example, if you want all entries with NestedKeys between 2 and 3, use .subMap(new NestedKey(2), new NestedKey(3))
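To make the subMap range concrete, here is a self-contained sketch. As an assumption for brevity it uses a simplified NestedKey holding int entries instead of Comparable[], and the sample keys and values are made up; the ordering rules are the same as described above.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class NestedKeyRangeDemo {
    /** Simplified key with int entries: compared entry by entry, a shorter prefix sorts first. */
    static final class NestedKey implements Comparable<NestedKey> {
        private final int[] entries;

        NestedKey(int... entries) {
            this.entries = entries.clone();
        }

        @Override
        public int compareTo(NestedKey other) {
            int shared = Math.min(entries.length, other.entries.length);
            for (int i = 0; i < shared; i++) {
                int cmp = Integer.compare(entries[i], other.entries[i]);
                if (cmp != 0) {
                    return cmp;
                }
            }
            // All shared entries are equal: the key with more entries is larger.
            return Integer.compare(entries.length, other.entries.length);
        }
    }

    /** Builds one flat map standing in for the nested maps. */
    static SortedMap<NestedKey, String> sample() {
        SortedMap<NestedKey, String> map = new TreeMap<>();
        map.put(new NestedKey(1, 9), "a");
        map.put(new NestedKey(2), "b");
        map.put(new NestedKey(2, 1), "c");
        map.put(new NestedKey(2, 5, 1), "d");
        map.put(new NestedKey(3), "e");
        map.put(new NestedKey(3, 1), "f");
        return map;
    }

    public static void main(String[] args) {
        // Half-open range: includes NestedKey(2) and everything below NestedKey(3).
        System.out.println(sample().subMap(new NestedKey(2), new NestedKey(3)).values());
    }
}
```

Because subMap excludes its upper bound, the entry stored under NestedKey(3) itself is not part of the result.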
