Fastest way to determine the lowest available key in Java HashMap? - java

Imagine a situation like this:
I have a HashMap<Integer, String>, in which I store the connected clients. It is HashMap, because the order does not matter and I need speed. It looks like this:
{
3: "John",
528: "Bob",
712: "Sue"
}
Most of the clients disconnected, so this is why I have the large gap.
If I want to add a new client, I need a key and obviously the usage of _map.size() to get a key is incorrect.
So, currently I use this function to get the lowest available key:
private int lowestAvailableKey(HashMap<?, ?> _map) {
if (_map.isEmpty() == false) {
for (int i = 0; i <= _map.size(); i++) {
if (_map.containsKey(i) == false) {
return i;
}
}
}
return 0;
}
In some cases, this is really slow.
Is there any faster or more professional way to get the lowest free key of a HashMap?

Any reason to use a HashMap? If you used TreeMap instead, the map would be ordered by key automatically. Yes, you end up with O(log n) access instead of O(1), but it's the most obvious approach.
Of course you could always maintain both a HashMap and a TreeSet, making sure you add entries and remove entries from both together, if you really needed to. The TreeSet would just act as an ordered set of keys for the map.
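A minimal sketch of the TreeMap idea, assuming the keys are non-negative integers counted from 0 as in the question's loop. The class and method names are made up for illustration. Because a TreeMap hands back its keys in ascending order, the first position where the key differs from a running counter is the lowest free key. This is still a linear scan in the worst case, but it avoids one hash lookup per candidate.

```java
import java.util.TreeMap;

class LowestFreeKeyDemo {
    // Assumes all keys are >= 0. Walks the keys in ascending order; the first
    // key that does not match the running counter marks the lowest free slot.
    static int lowestFreeKey(TreeMap<Integer, ?> map) {
        int expected = 0;
        for (int key : map.keySet()) {
            if (key != expected) {
                return expected;
            }
            expected++;
        }
        return expected; // no gap found: the next key after the largest one
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> clients = new TreeMap<>();
        clients.put(3, "John");
        clients.put(528, "Bob");
        clients.put(712, "Sue");
        System.out.println(lowestFreeKey(clients)); // prints 0
        clients.put(0, "Ann");
        clients.put(1, "Tom");
        System.out.println(lowestFreeKey(clients)); // prints 2
    }
}
```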

Related

Iterate over key-range of HashMap

Is it possible to iterate over a certain range of keys from a HashMap?
My HashMap contains key-value pairs where the key denotes a certain column in Excel (e.g. "BM" or "AT") and the value is the value in this cell.
For example, my table import is:
startH = {
BQ=2019-11-04,
BU=2019-12-02,
BZ=2020-01-06,
CD=2020-02-03,
CH=2020-03-02,
CM=2020-04-06
}
endH = {
BT=2019-11-25,
BY=2019-12-30,
CC=2020-01-27,
CG=2020-02-24,
CL=2020-03-30,
CP=2020-04-27
}
I need to iterate over those two hashmap using a key-range in order to extract the data in the correct order. For example from "BQ" to "BT".
Explanation
Is it possible to iterate over hashmap but using its index?
No.
A HashMap has no indices, and its internal layout does not lend itself to them either. Entries live in hash buckets, and since Java 8 a bucket with many collisions can be converted from a linked list into a red-black tree; neither structure offers positional access. So no, not possible.
There is another fundamental flaw in this approach. HashMap does not maintain any order. Its iteration order is unspecified and may change, for example when the map is resized. But for this approach you would need insertion order. Fortunately LinkedHashMap does this. It still does not provide index-based access though.
Solutions
Generation
But you actually do not even want index-based access. You want to retrieve a certain key range, for example from "BA" to "BM". A good approach that works with HashMap is to generate your key range and simply use Map#get to retrieve the data:
char row = 'B';
char columnStart = 'A';
char columnEnd = 'M';
for (char column = columnStart; column <= columnEnd; column++) {
String key = Character.toString(row) + column;
String data = map.get(key);
...
}
You might need to fine-tune it a bit if you need proper edge case handling, like wrapping around the alphabet (use 'A' + (column % alphabetSize)) and maybe it needs some char to int casting and vice versa for the additions, did not test it.
NavigableMap
There is actually a variant of map that offers pretty much what you want out of the box, although at a higher performance cost than a simple HashMap. The interface is called NavigableMap, and the class TreeMap is a good implementation. It requires an explicit order on the keys; the good thing is that you actually want String's natural order, which is lexicographical.
So you can simply use it with your existing data and then use the method NavigableMap#subMap. The two-argument form excludes the end key, so use the four-argument overload to include it:
NavigableMap<String, String> map = new TreeMap<>(...);
String startKey = "BA";
String endKey = "BM";
Map<String, String> subMap = map.subMap(startKey, true, endKey, true);
for (Entry<String, String> entry : subMap.entrySet()) {
...
}
If you have to do this kind of request more than once, this will definitely pay off and it is the perfect data structure for this use case.
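As a rough, self-contained illustration of the NavigableMap approach, the sketch below loads a few of the cells from the question into a TreeMap and pulls out the range "BQ" to "BT" with both ends included. The class name and the helper method are hypothetical. One caveat to keep in mind: lexicographic order only matches Excel column order when all keys have the same length, since "Z" sorts after "AA" as a String.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class KeyRangeDemo {
    // Returns a view of all entries whose keys lie in [from, to], both ends included.
    static NavigableMap<String, String> range(NavigableMap<String, String> map,
                                              String from, String to) {
        return map.subMap(from, true, to, true);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> cells = new TreeMap<>();
        cells.put("BQ", "2019-11-04");
        cells.put("BT", "2019-11-25");
        cells.put("BU", "2019-12-02");
        cells.put("BY", "2019-12-30");

        // Iterates in ascending key order over BQ and BT only
        for (Map.Entry<String, String> e : range(cells, "BQ", "BT").entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```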
Linked iteration
As explained before, it is also possible (although not as efficient) to instead have a LinkedHashMap (to maintain insertion order) and then simply iterate over the key range. This has some major drawbacks though, for example it first needs to locate the start of the range by fully iterating to there. And it relies on the fact that you inserted them correctly.
LinkedHashMap<String, String> map = ...
String startKey = "BA";
String endKey = "BM";
boolean isInRange = false;
for (Entry<String, String> entry : map.entrySet()) {
String key = entry.getKey();
if (!isInRange) {
if (key.equals(startKey)) {
isInRange = true;
} else {
continue;
}
}
...
if (key.equals(endKey)) {
break;
}
}
// rangeLower and rangeUpper can be arguments
int i = 0;
for (Object mapKey : map.keySet()) {
int index = i++; // advance the counter on every pass, not only when skipping
if (index < rangeLower || index > rangeUpper) {
continue;
}
// Do something with mapKey
}
The above code iterates by getting keyset and explicitly maintaining index and incrementing it in each loop. Another option is to use LinkedHashMap, which maintains a doubly linked list for maintaining insertion order.
I don't believe you can. The algorithm you propose assumes that the keys of a HashMap are ordered and they are not. Order of keys is not guaranteed, only the associations themselves are guaranteed.
You might be able to change the structure of your data to something like this:
ranges = {
BQ=BT,
BU=BY,
....
}
Then the iteration over the HashMap keys (start cells) would easily find the matching end cells.

How to select a random key from a HashMap in Java?

I'm working with a large ArrayList<HashMap<A,B>>, and I would repeatedly need to select a random key from a random HashMap (and do some stuff with it). Selecting the random HashMap is trivial, but how should I select a random key from within this HashMap?
Speed is important (as I need to do this 10000 times and the hashmaps are large), so just selecting a random number k in [0,9999] and then doing .next() on the iterator k times, is really not an option. Similarly, converting the HashMap to an array or ArrayList on every random pick is really not an option. Please, read this before replying.
Technically I feel that this should be possible, since the HashMap stores its keys in an Entry[] internally, and selecting at random from an array is easy, but I can't figure out how to access this Entry[]. So any ideas to access the internal Entry[] are more than welcome. Other solutions (as long as they don't consume linear time in the hashmap size) are also welcome of course.
Note: heuristics are fine, so if there's a method that excludes 1% of the elements (e.g. because of multi-filled buckets) that's no problem at all.
Off the top of my head:
List<A> keysAsArray = new ArrayList<A>(map.keySet());
Random r = new Random();
then just
map.get(keysAsArray.get(r.nextInt(keysAsArray.size())));
I managed to find a solution without performance loss. I will post it here since it may help other people -- and potentially answer several open questions on this topic (I'll search for these later).
What you need is a second custom Set-like data structure to store the keys, not a list as some suggested here. List-like data structures are too expensive to remove items from. The operations needed are adding/removing elements in constant time (to keep it up to date with the HashMap) and a procedure to select a random element. The following class MySet does exactly this:
class MySet<A> {
ArrayList<A> contents = new ArrayList<A>();
HashMap<A,Integer> indices = new HashMap<A,Integer>();
Random R = new Random();
//selects random element in constant time
A randomKey() {
return contents.get(R.nextInt(contents.size()));
}
//adds new element in constant time
void add(A a) {
indices.put(a,contents.size());
contents.add(a);
}
//removes element in constant time
void remove(A a) {
int index = indices.get(a);
contents.set(index,contents.get(contents.size()-1));
indices.put(contents.get(index),index);
contents.remove((int)(contents.size()-1));
indices.remove(a);
}
}
You need access to the underlying entry table.
// defined statically
Field table = HashMap.class.getDeclaredField("table");
table.setAccessible(true);
Random rand = new Random();
public Entry randomEntry(HashMap map) {
Entry[] entries = (Entry[]) table.get(map);
int start = rand.nextInt(entries.length);
for(int i=0;i<entries.length;i++) {
int idx = (start + i) % entries.length;
Entry entry = entries[idx];
if (entry != null) return entry;
}
return null;
}
This still has to traverse the entries to find one which is there so the worst case is O(n) but the typical behaviour is O(1).
Sounds like you should consider either an ancillary List of keys or a real object, not a Map, to store in your list.
As @Alberto Di Gioacchino pointed out, there is a bug in the accepted solution with the removal operation. This is how I fixed it.
class MySet<A> {
ArrayList<A> contents = new ArrayList<A>();
HashMap<A,Integer> indices = new HashMap<A,Integer>();
Random R = new Random();
//selects random element in constant time
A randomKey() {
return contents.get(R.nextInt(contents.size()));
}
//adds new element in constant time
void add(A item) {
indices.put(item,contents.size());
contents.add(item);
}
//removes element in constant time
void remove(A item) {
int index = indices.get(item);
contents.set(index,contents.get(contents.size()-1));
indices.put(contents.get(index),index);
contents.remove(contents.size()-1);
indices.remove(item);
}
}
I'm assuming you are using HashMap as you need to look something up at a later date?
If not the case, then just change your HashMap to an Array/ArrayList.
If this is the case, why not store your objects in a Map AND an ArrayList so you can look up randomly or by key.
Alternatively, could you use a TreeMap instead of HashMap? I don't know what type your key is, but you could use TreeMap.floorKey() in conjunction with some key randomizer.
After spending some time, I came to the conclusion that you need to create a model backed by a List<Map<A, B>> and a List<A> that maintains your keys. Keep the List<Map<A, B>> and the List<A> private and expose only operations/methods to the caller. This way you have full control over the implementation, and the actual objects are safer from external changes.
Btw, your questions lead me to,
Why does the java.util.Set<V> interface not provide a get(Object o) method?, and
Bimap: I was trying to be clever but, of course, its values() method also returns Set.
This example, IndexedSet, may give you an idea about how-to.
[edited]
This class, SetUniqueList, might help you if you decide to create your own model. It explicitly states that it wraps the list, not copies. So, I think, we can do something like,
List<A> list = new ArrayList(map.keySet());
SetUniqueList unikList = new SetUniqueList(list, map.keySet());
// Now unikList should reflect all the changes to the map keys
...
// Then you can do
unikList.get(i);
Note: I didn't try this myself. Will do that later (rushing to home).
Since Java 8, there is an O(log(N)) approach with O(log(N)) additional memory: create a Spliterator via map.entrySet().spliterator(), make log(map.size()) trySplit() calls and choose either the first or the second half randomly. When there are say less than 10 elements left in a Spliterator, dump them into a list and make a random pick.
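The Spliterator idea above can be sketched roughly as follows. This is an illustration under assumptions, not a tuned implementation: the class and method names are hypothetical, the threshold of 10 is taken from the description, and the pick is only approximately uniform because HashMap splits by bucket range rather than by element count. A slice can therefore turn out to be empty, in which case the sketch simply tries again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Spliterator;

class SpliteratorPickDemo {
    // Repeatedly halves the key spliterator, keeping a random half each time,
    // then picks at random from the small remainder. Approximate, not uniform.
    static <K> K randomKey(Map<K, ?> map, Random rnd) {
        if (map.isEmpty()) {
            throw new IllegalArgumentException("map must not be empty");
        }
        while (true) {
            Spliterator<K> part = map.keySet().spliterator();
            while (part.estimateSize() > 10) {
                Spliterator<K> prefix = part.trySplit();
                if (prefix == null) {
                    break; // cannot be split any further
                }
                if (rnd.nextBoolean()) {
                    part = prefix; // keep the split-off half instead of the rest
                }
            }
            List<K> rest = new ArrayList<>();
            part.forEachRemaining(rest::add);
            if (!rest.isEmpty()) {
                return rest.get(rnd.nextInt(rest.size()));
            }
            // The chosen slice contained no keys; start over with a fresh spliterator.
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(i, "value" + i);
        }
        System.out.println(randomKey(map, new Random()));
    }
}
```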
If you absolutely need to access the Entry array in HashMap, you can use reflection. But then your program will be dependent on that concrete implementation of HashMap.
As proposed, you can keep a separate list of keys for each map. You would not keep deep copies of the keys, so the actual memory denormalisation wouldn't be that big.
Third approach is to implement your own Map implementation, the one that keeps keys in a list instead of a set.
How about wrapping HashMap in another implementation of Map? The other map maintains a List, and on put() it does:
if (inner.put(key, value) == null) listOfKeys.add(key);
(I assume that nulls for values aren't permitted, if they are use containsKey, but that's slower)

Java - TreeMap Solution

I haven't done Java in a while and I need some suggestions and ideas regarding data structures.
Currently I am using a TreeMap to map String values to Integer values. I now need to do some calculations: divide the Integer value of each map entry by the size of the whole map and store the result for each entry. I was thinking about using a Map,Integer> but is there a 3-way generic data structure in Java?
My current solution for this is this ..
int treeSize = occurrence.size();
String [][] weight = new String[treeSize][2];
int counter=0;
double score =0;
for(Entry<String, Integer> entry : occurrence.entrySet()) {
weight[counter][0]=entry.getKey();
score = (double) entry.getValue() / treeSize; // cast first, otherwise this is integer division
weight[counter][1]= Double.toString(score);
counter++;
}
I would use another object to hold this data:
public class Data {
private int value;
private double score;
...
}
And then type the map as Map<String, Data>. After inserting all the values, you can iterate over the values and update the score property for each value in the map. For example:
double size = myMap.size();
for(Map.Entry<String, Data> entry : myMap.entrySet()) {
Data data = entry.getValue();
data.setScore(data.getValue() / size);
}
EDIT
Another thought just came to mind. Instead of calculating the values after you have inserted it, you should probably calculate it as you are inserting it; it's more efficient that way. Of course, you can only do this if you know the total number of values beforehand.
An even better way is to perform the calculation only when you retrieve a value from the map. There are two advantages to this:
You don't need a separate object. Just abstract the access of the value from the map inside another function which returns the value associated with the key, divided by the size of the map.
Since you don't have a separate object to maintain the calculated value, you don't need to update it every time you add or delete a new value.
You could use a Map.Entry<Integer, Double> to hold the two values. (Ultimately, you'd use either AbstractMap.SimpleEntry or AbstractMap.SimpleImmutableEntry)
So your TreeMap would be TreeMap<String, Map.Entry<Integer, Double>>
However, unless you have a good reason to do otherwise, I'd strongly suggest that you do the calculation on the fly. Recalculating every fraction every time anything is inserted or deleted is time consuming, and churns small little objects, so it's likely to be slower than just doing the calculation. Also, recalculation will cause threading issues if multiple threads access the TreeMap. Instead, something like
public synchronized double getFraction(String key) {
Integer value = theTreeMap.get(key);
if (value == null)
return 0.0; // or throw an exception if you prefer...
// note, since the Map has at least one entry, no need to check for div by zero
return value.doubleValue() / theTreeMap.size();
}

Java: Getting the 500 most common words in a text via HashMap

I'm storing my wordcount into the value field of a HashMap, how can I then get the 500 top words in the text?
public ArrayList<String> topWords (int numberOfWordsToFind, ArrayList<String> theText) {
//ArrayList<String> frequentWords = new ArrayList<String>();
ArrayList<String> topWordsArray= new ArrayList<String>();
HashMap<String,Integer> frequentWords = new HashMap<String,Integer>();
int wordCounter=0;
for (int i=0; i<theText.size();i++){
if(frequentWords.containsKey(theText.get(i))){
//find value and increment
wordCounter=frequentWords.get(theText.get(i));
wordCounter++;
frequentWords.put(theText.get(i),wordCounter);
}
else {
//new word
frequentWords.put(theText.get(i),1);
}
}
for (int i=0; i<theText.size();i++){
if (frequentWords.containsKey(theText.get(i))){
// what to write here?
frequentWords.get(theText.get(i));
}
}
return topWordsArray;
}
One other approach you may wish to look at is to think of this another way: is a Map really the right conceptual object here? It may be good to think of this as being a good use of a much-neglected-in-Java data structure, the bag. A bag is like a set, but allows an item to be in the set multiple times. This simplifies the 'adding a found word' very much.
Google's guava-libraries provides a Bag structure, though there it's called a Multiset. Using a Multiset, you could just call .add() once for each word, even if it's already in there. Even easier, though, you could throw your loop away:
Multiset<String> words = HashMultiset.create(theText);
Now you have a Multiset, what do you do? Well, you can call entrySet(), which gives you a collection of Multiset.Entry objects. You can then stick them in a List (they come in a Set), and sort them using a Comparator. Full code might look like (using a few other fancy Guava features to show them off):
Multiset<String> words = HashMultiset.create(theText);
List<Multiset.Entry<String>> wordCounts = Lists.newArrayList(words.entrySet());
Collections.sort(wordCounts, new Comparator<Multiset.Entry<String>>() {
public int compare(Multiset.Entry<String> left, Multiset.Entry<String> right) {
// Note reversal of 'right' and 'left' to get descending order.
// getCount() returns a primitive int, so compare with Integer.compare.
return Integer.compare(right.getCount(), left.getCount());
}
});
// wordCounts now contains all the words, sorted by count descending
// Take the first 50 entries (alternative: use a loop; this is simple because
// it copes easily with < 50 elements)
Iterable<Multiset.Entry<String>> first50 = Iterables.limit(wordCounts, 50);
// Guava-ey alternative: use a Function and Iterables.transform, but in this case
// the 'manual' way is probably simpler:
for (Multiset.Entry<String> entry : first50) {
topWordsArray.add(entry.getElement());
}
and you're done!
Here you can find a guide how to sort a HashMap by the values. After the sorting you can just iterate over the first 500 entries.
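For reference, a minimal sketch of "sort by value, then take the first N" using only the JDK. The class and method names are hypothetical, and the alphabetical tie-break is an assumption added to keep the result deterministic; it is not part of the original suggestion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopWordsDemo {
    // Copies the entries into a list, sorts them by count (highest first),
    // and returns the words of the first n entries.
    static List<String> topWords(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            // Tie-break alphabetically so equal counts come out in a stable order
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(n, entries.size()))) {
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : "a b a c b a".split(" ")) {
            counts.merge(word, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        System.out.println(topWords(counts, 2)); // prints [a, b]
    }
}
```

Sorting all entries costs O(n log n) in the number of distinct words, which is usually acceptable when only a few hundred top entries are needed from a single text.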
Take a look at the TreeBidiMap provided by the Apache Commons Collections package. http://commons.apache.org/collections/api-release/org/apache/commons/collections/bidimap/TreeBidiMap.html
It allows you to sort the map according to both the key or the value set.
Hope it helps.
Zhongxian

Improving performance of merging lots of sorted maps into one sorted map - java

I have a method that gets a SortedMap as input. This map holds many SortedMap objects, and the output of the method should be one SortedMap containing all elements of the maps held in the input map. The method looks like this:
private SortedMap mergeSamples(SortedMap map){
SortedMap mergedMap = new TreeMap();
Iterator sampleIt = map.values().iterator();
while(sampleIt.hasNext())
{
SortedMap currMap = (SortedMap) sampleIt.next();
mergedMap.putAll(currMap);
}
return mergedMap;
}
This is a performance killer, what can I improve here?
I don't see anything wrong with your code; all you can really do is try alternative implementations of SortedMap. First one would be ConcurrentSkipListMap and then look at Commons Collections, Google Collections and GNU Trove. The latter can yield very good results especially if your maps' keys and values are primitive types.
Is it a requirement for the input to be a SortedMap? To me it would seem easier if the input was just a Collection or List. That might speed up creating the input, and might make iteration over all contained maps faster.
Other than that, I believe the most likely way to improve the performance of this code is to speed up the compareTo() implementation of the keys in the sorted maps being merged.
Your code is as good as it gets. However, it seems to me that the overall design of the data structure needs some overhaul: you are using SortedMap<?, SortedMap<?, ?>>, yet the keys of the parent map are not used.
Do you want to express a tree with nested elements this way, and is your task to flatten that tree? If so, either create a Tree class that supports your approach, or use an intelligent way to merge the keys:
public class NestedKey implements Comparable<NestedKey> {
private Comparable[] entries;
public NestedKey(Comparable... entries) {
assert entries != null;
this.entries = entries;
}
public int compareTo(NestedKey other) {
for(int i = 0; i < other.entries.length; i++) {
if (i == entries.length)
return -1; // other is longer than self <=> self is smaller than other
int cmp = entries[i].compareTo(other.entries[i]);
if (cmp != 0)
return cmp;
}
if (entries.length > other.entries.length)
return 1; // self is longer than others <=> self is larger than other
else
return 0;
}
}
A NestedKey used as a key for a SortedMap compares to other NestedKey objects by comparing their entries one by one. A NestedKey that matches another in all shared entries but has more entries is considered larger. Thus, you have a relationship like this:
NestedKey(1, 2, 3) < NestedKey(1, 2, 4)
NestedKey(1, 3, 3) < NestedKey(2, 1, 1)
NestedKey(1, 2, 3) < NestedKey(2)
If you use only one SortedMap that uses NestedKey as its keys, then its .values() collection automatically returns all entries, flattened out. However, if you want to use only parts of the SortedMap, then you must use .subMap. For example, if you want all entries with NestedKeys between 2 and 3, use .subMap(new NestedKey(2), new NestedKey(3))
