Retrieve N most relevant objects in Java TreeMap - java

According to this question I have ordered a Java Map, as follows:
ValueComparator bvc = new ValueComparator(originalMap);
Map<String,Integer> sortedMap = new TreeMap<String,Integer>(bvc);
sortedMap.putAll(originalMap);
Now, I would like to extract the K most relevant values from the map, in top-K fashion. Is there a highly efficient way of doing it without iterating through the map?
P.S., some similar questions (e.g., this) ask for a solution to the top-1 retrieval problem.

No, not if you use a Map. You'd have to iterate over it.
Have you considered using a PriorityQueue? It's Java's implementation of a heap. It has efficient operations for insertion of arbitrary elements and for removal of the "minimum". You might think about doing this here. Instead of a Map, you could put them into a PriorityQueue ordered by relevance, with the most relevant as the root. Then, to extract the K most relevant, you'd just pop K elements from the PriorityQueue.
If you need the map-like property (mapping from String to Integer), then you could write a class that internally keeps everything in both a PriorityQueue and a HashMap. When you insert, you insert into both; when you remove the minimal element, you pop from the PriorityQueue, and that then tells you which element you also need to remove from your HashMap. This will still give you log-time inserts and min-removals.
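The PriorityQueue suggestion above could be sketched as follows. This is only a minimal illustration, not the asker's code: the class name `TopK` and the method `topK` are hypothetical, and it assumes that "most relevant" means "largest Integer value".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopK {
    // Hypothetical helper: returns the keys of the k entries with the
    // largest values, most relevant first.
    static List<String> topK(Map<String, Integer> scores, int k) {
        // Reversed value comparator, so the entry with the largest value is the root.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue().reversed());
        heap.addAll(scores.entrySet());

        List<String> result = new ArrayList<>();
        for (int i = 0; i < k && !heap.isEmpty(); i++) {
            result.add(heap.poll().getKey()); // each poll costs O(log n)
        }
        return result;
    }
}
```

Calling `TopK.topK(originalMap, 3)` would return up to three keys. Building the heap still touches every entry once; the saving is that only K removals are needed afterwards instead of a full sort. The order among entries with equal values is unspecified.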

Related

LinkedHashMap Implementation in Java

I cannot understand the use of HashFunction in LinkedHashMap.
In the HashMap implementation, the hash function is used to find the index into the internal array, which makes sense given the hashCode contract (equal keys must have the same hash code, but distinct keys may share one).
My questions are:
1) What is the use of hashfunction in LinkedHashMap?
2) How do the put and get methods work for LinkedHashMap?
3) Why does it maintain a doubly linked list internally?
What's wrong with using a HashMap as the internal implementation (just like HashSet) and maintaining a separate Array/List of indexes into the Entry array in insertion order?
Appreciate useful response and references.
1) LinkedHashMap extends HashMap, so the hash function is the same as in HashMap (if you check the code, the hash function is inherited from HashMap). That is, the function computes the hash of the inserted key and uses it to store the entry in a data structure together with the other entries whose keys have the same hash; the hash function is also used in the get method to retrieve the object for the key passed as a parameter.
2) The put and get methods behave the same way as in HashMap and additionally track the insertion order of the elements, so when you iterate over the key set you get the keys in the order in which you inserted them into the map (see here for more details).
3) LinkedHashMap uses a doubly linked list instead of an array because it's more compact; a doubly linked list is the most efficient data structure for a list where you insert and remove items; if you mostly insert/append elements then an array-based implementation may be better. Since the map semantics are key-value based and removing elements from the map could be a frequent operation, a doubly linked list is a better fit. The internal implementation could be made with a LinkedList, but my opinion is that using a low-level data structure is more efficient and decouples LinkedHashMap from other classes.
A LinkedHashMap does use a HashMap (in fact it extends from it), so the hashCode is used to identify the right hash bucket in the array of hash buckets, just as for HashMap. put and get work just as for HashMap (except that the before and after references for iterating over the entries are updated differently for the two implementations).
The reason insertion order is not kept by keeping an Array or ArrayList is that addition or removal in the middle of an ArrayList is an O(n) operation because you have to move all subsequent items along one place. You could do this with a LinkedList because addition and removal in the middle of a LinkedList is O(1) (all you have to do is break a few links and make a few new ones). However there's no point using a separate LinkedList because you may as well make the Map.Entry objects reference the previous and next Entry objects, which is exactly how LinkedHashMap works.
LinkedHashMap is a good choice for a data structure where you want to be able to put and get entries with O(1) running time, but you also need the behavior of a LinkedList. The internal hashing function is what allows you to put and get entries in constant time.
Here is how you use LinkedHashMap:
Map<String, String> linkedHashMap = new LinkedHashMap<String, String>();
linkedHashMap.put("today", "Wednesday");
linkedHashMap.put("tomorrow", "Thursday");
String today = linkedHashMap.get("today"); // today is 'Wednesday'
There are several arguments against using a simple HashMap and maintaining a separate List for the insertion order. First, if you go this route it means you will have to maintain 2 data structures instead of one. This is error prone, and makes maintaining your code more difficult. Second, if you have to make your data structure Thread-safe, this would be complex for 2 data structures. On the other hand, if you use a LinkedHashMap you only would have to worry about making this single data structure thread-safe.
As for implementation details, when you do a put into a LinkedHashMap, the map takes your key's hashCode() and uses a (non-cryptographic) hash function to pick the bucket in its internal table where the entry will be stored. Doing a get with a given key uses the same function to find the bucket that holds the value. The entrySet() method returns a Set view of all the key-value mappings in the LinkedHashMap. Sets in general are not ordered, but iterating over this view follows the map's linked-list order. The entrySet() is not guaranteed to be thread-safe.
Ans. 2)
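When we call put(key, value) on a LinkedHashMap, it internally calls createEntry:
void createEntry(int hash, K key, V value, int bucketIndex) {
    // Current head of the bucket's collision chain (may be null).
    HashMap.Entry<K,V> old = table[bucketIndex];
    // The new entry becomes the head of the chain, pointing to the old head.
    Entry<K,V> e = new Entry<K,V>(hash, key, value, old);
    table[bucketIndex] = e;
    // Link the entry just before the header, i.e. at the end of the doubly linked list.
    e.addBefore(header);
    size++;
}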
Ans 3)
To efficiently maintain a linkedHashmap, you actually need a doubly linked list.
Consider three entries in order
A ---> B ---> C
Suppose you want to remove B. Obviously A should now point to C. But unless you know the entry before B, you cannot efficiently say which entry should now point to C. To fix this, you need entries to point in both directions, like this:
---> --->
A B C
<--- <---
This way, when you remove B you can look at the entries before and after B (A and C) and update so that A and C point to each other.
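The removal step described above can be sketched with a hypothetical `Node` class. This is not the actual LinkedHashMap entry type, which also carries the hash, key and value; it only illustrates why the `before` and `after` links make unlinking an O(1) operation.

```java
class LinkedEntries {
    // Hypothetical node with links in both directions.
    static final class Node {
        final String name;
        Node before;
        Node after;

        Node(String name) {
            this.name = name;
        }
    }

    // Appends node after the current tail (which may be null) and returns the new tail.
    static Node append(Node tail, Node node) {
        if (tail != null) {
            tail.after = node;
            node.before = tail;
        }
        return node;
    }

    // Unlinks node in O(1): its neighbours are reachable directly from the node itself.
    static void unlink(Node node) {
        if (node.before != null) {
            node.before.after = node.after;
        }
        if (node.after != null) {
            node.after.before = node.before;
        }
        node.before = null;
        node.after = null;
    }
}
```

With a singly linked list, `unlink` would first have to walk from the head to find the predecessor of the node being removed.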
A similar question was discussed earlier: why LinkedHashMap maintains a doubly linked list for iteration.

Occurred order in the iteration at run-time in a map

In the piece of code similar to
//something before
Iterator<String> iterator = hashMap.keySet().iterator(); // HashMap<String, Document>
while(iterator.hasNext()){
System.out.println(iterator.next());
}
//something after
I know that the printed order can differ from the order in which the key-value entries were inserted; that is fine.
But if I run this code again at a later time, re-creating the hashMap variable and putting the same elements into it, can the second printout differ from the first?
My question arose from a problem with a web app: a JSP shows a list of Strings, and after some years the customer called because the order of the Strings was different in the morning, although the usual order was back in the afternoon.
The problem happened on only one day. The web app uses the code above to read a Map and populate an ArrayList.
This ArrayList is never explicitly reordered (no Comparator or similar classes).
I think (hope) that the different print order comes from a different iteration sequence over the same HashMap at run time, and I am looking for confirmation from other people.
On the web, I read that the iteration order of a HashMap changes if the HashMap is modified, but what happens if the HashMap stays the same?
The HashMap documentation says that HashMap makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
That explains why the order cannot be guaranteed even though the HashMap is the same. For an ordered map you can use TreeMap or LinkedHashMap.
TreeMap API says The map is sorted according to the natural ordering of its keys, or by a Comparator provided at map creation time, depending on which constructor is used.
HashMap API documentation states that
This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
For a Map that keeps its keys in original insertion order, use LinkedHashMap.
For a Map that keeps its keys in sorted order (either natural order or by you passing a Comparator), use either TreeMap or ConcurrentSkipListMap. If multi-threaded, use the second.
For a Map where the key is an enum, use EnumMap if you want the entries ordered by the definition order of the enum's objects.
The other six Map implementations bundled with Java 11 do not promise any order to their entries.
See this graphic table of mine as an overview.
Use a LinkedHashMap instead, to preserve insertion order. From the javadoc: "Hash table and linked list implementation of the Map interface, with predictable iteration order."
If you just want a Map with predictable ordering, then you can also use TreeMap. However, a LinkedHashMap is faster, as seen here: "TreeMap has O(log n) performance for containsKey, get, put, and remove, according to the Javadocs, while LinkedHashMap is O(1) for each."
As Octopus mentioned, HashMap "makes no guarantees as to the order of the map," and you shouldn't use it if order must remain consistent.
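As a small illustration of the two ordered alternatives mentioned in the answers above, the sketch below collects the key order of a LinkedHashMap and of a TreeMap. The class and method names are made up for the example, and the stored values (the key lengths) are arbitrary placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OrderedKeys {
    // Keys come back in the order in which they were first inserted.
    static List<String> insertionOrder(String... keys) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (String key : keys) {
            map.put(key, key.length());
        }
        return new ArrayList<>(map.keySet());
    }

    // Keys come back in their natural (here: lexicographic) order.
    static List<String> sortedOrder(String... keys) {
        Map<String, Integer> map = new TreeMap<>();
        for (String key : keys) {
            map.put(key, key.length());
        }
        return new ArrayList<>(map.keySet());
    }
}
```

For the keys "pear", "apple", "fig", the first method keeps that sequence, while the second returns them sorted. Either way the result no longer depends on hash values or table size, which is what a plain HashMap cannot promise.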

Searching LinkedHashMap, faster method than sequential?

I am wondering if there is a more efficient method for getting objects out of my LinkedHashMap with timestamps greater than a specified time. I.e. something better than the following:
Iterator<Foo> it = foo_map.values().iterator();
Foo foo;
while(it.hasNext()){
foo = it.next();
if(foo.get_timestamp() < minStamp) continue;
break;
}
In my implementation, each of my objects has essentially three values: an "id," "timestamp," and "data." The objects are inserted in order of their timestamps, so when I call an iterator over the set, I get ordered results (as required by the linked hashmap contract). The map is keyed to the object's id, so I can quickly look them up by id.
When I look them up by a timestamp condition, however, I get an iterator with sorted results. This is an improvement over a generic hashmap, but I still need to iterate sequentially over much of the range until I find the next entry with a higher timestamp than the specified one.
Since the results are already sorted, is there any algorithm I can pass the iterator (or collection to), that can search it faster than sequential? If I went with a treemap as an alternative, would it offer overall speed advantages, or is it doing essentially the same thing in the background? Since the collection is sorted by insertion order already, I'm thinking tree map has a lot more overhead I don't need?
There is no faster way ... if you just use a LinkedHashMap.
If you want faster access, you need to use a different data structure. For example, a TreeSet with an appropriate comparator might be a better solution for this aspect of your problem. For example if your TreeSet is ordered by date, then calling tailSet with an appropriate dummy value can give you all elements greater or equal to a given date.
Since the results are already sorted, is there any algorithm I can pass the iterator (or collection to), that can search it faster than sequential?
Not for a LinkedHashMap.
However, if the ordered list was an ArrayList instead, then you could use "binary search" on the list ... provided that you could lock it to prevent concurrent modifications while you are searching. (Actually, concurrency is a potential issue to consider no matter how you implement this ... including your current linear search.)
If you want to keep the ability to do id lookups, then you need two data structures; e.g. a TreeSet and a HashMap which share their element objects. A TreeSet will probably be more efficient than trying to maintain an ArrayList in order assuming that there are random insertions and/or random deletions.
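The two-structure idea could look roughly like the sketch below. Everything here is an assumption made for illustration: the `Foo` fields mirror the "id", "timestamp" and "data" described in the question, `FooIndex` is a hypothetical wrapper, and ids are assumed to be non-empty strings so that an empty id can serve as the dummy lower bound for tailSet.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

class FooIndex {
    // Hypothetical element type with the three values described in the question.
    static final class Foo {
        final String id;
        final long timestamp;
        final String data;

        Foo(String id, long timestamp, String data) {
            this.id = id;
            this.timestamp = timestamp;
            this.data = data;
        }
    }

    private final Map<String, Foo> byId = new HashMap<>();
    // Ordered by timestamp; the id breaks ties so equal timestamps are not dropped.
    private final NavigableSet<Foo> byTime = new TreeSet<>(
            Comparator.comparingLong((Foo f) -> f.timestamp).thenComparing(f -> f.id));

    void add(Foo foo) {
        Foo old = byId.put(foo.id, foo);
        if (old != null) {
            byTime.remove(old); // keep both structures in sync when an id is replaced
        }
        byTime.add(foo);
    }

    Foo get(String id) {
        return byId.get(id); // O(1) lookup by id
    }

    // View of all elements with timestamp >= minStamp; the start is found in O(log n).
    NavigableSet<Foo> since(long minStamp) {
        Foo probe = new Foo("", minStamp, null); // "" sorts before every non-empty id
        return byTime.tailSet(probe, true);
    }
}
```

Neither collection is thread-safe, so the concurrency caveat from the answer applies here as well.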

Java HashMap and underlying values() collection

I was wondering if the Collection view of the values contained in a HashMap is kept ordered when the HashMap changes.
For example if I have a HashMap whose values() method returns L={a, b, c}
What happens to L if I add a new element "d" to the map?
Is it added at the end, i.e. if I iterate through the elements, is the order kept?
In particular, if the addition of the new element "d" causes a rehash, will the order be kept in L?
Many thanks!
I was wondering if the Collection view of the values contained in a HashMap is kept ordered when the HashMap changes.
No, there is no such guarantee.
If this were the case, then the following program would output an ordered sequence from 0 to 99
HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
for (int i = 0; i < 100; i++)
map.put(i, i);
System.out.println(map.values());
(and it doesn't).
There is a class that does precisely what you're asking for, and that is LinkedHashMap:
Hash table and linked list implementation of the Map interface, with predictable iteration order. This implementation differs from HashMap in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order).
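To make the quoted behaviour concrete, here is a minimal sketch of the scenario from the question using a LinkedHashMap. The class name, the keys and the deliberately small initial capacity are arbitrary choices for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ValuesOrder {
    // Inserts "a", "b", "c" and then newValue, and returns a snapshot of values().
    static List<String> valuesAfterAdding(String newValue) {
        // Small initial capacity, so the table has to grow while entries are added.
        Map<Integer, String> map = new LinkedHashMap<>(2);
        map.put(100, "a");
        map.put(7, "b");
        map.put(42, "c");
        map.put(-1, newValue);
        // values() iterates along the linked list, i.e. in insertion order.
        return new ArrayList<>(map.values());
    }
}
```

Under these assumptions, `valuesAfterAdding("d")` returns the values in the order a, b, c, d, regardless of how the keys hash or how often the table was resized.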
If it doesn't say it in the JavaDoc then there are no guarantees about it. Different versions of Java could do different things. Don't depend on undocumented behaviour.
You might want to look at LinkedHashMap.
HashMaps in Java aren't ordered, so I think it is safe to say that values() won't return an ordered Collection.
LinkedHashMap is an ordered version of HashMap (insertion order), but I don't know if values() will return an ordered Collection. I think the best thing is to try.
Generally there is no guarantee of order when you are using a HashMap. It might be in the order in which you added the elements for a few elements, but it will get reshuffled when a collision is possible and a collision resolution strategy has to be applied.

What data structure in Java supports sort/order

I use a HashMap to store some data, but I need to keep it in ascending order whenever new data is saved to the HashMap or old data is moved out of it. HashMap itself doesn't support ordering, so what data structure can I use to support order? Thanks
TreeMap would be the canonical sorted map implementation. Note that this is sorted on the keys, which I presume is what you're after, but if not it won't be suitable.
Since Java 6 also comes with a SortedMap interface, you can look at the list of classes that implement it (on the linked Javadoc page) and choose between those. Implementing this interface only guarantees that they have some sort of defined iteration order; you'd have to read the description of each class to see if it's what you need.
TreeMap isn't a hashmap, in that it isn't backed by a hashtable to provide amortised O(1) inserts. However, it's not possible to maintain a sorted map with O(1) inserts anyway (since you have to inspect at least some of the existing elements to work out where the new element should go), and hence the O(lg n) performance of TreeMap is as good as you'll get in this case.
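A minimal sketch of the TreeMap approach, assuming Integer keys and String values purely for illustration (`SortedStore` and its method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class SortedStore {
    private final TreeMap<Integer, String> data = new TreeMap<>();

    // O(log n) insert; the map stays sorted by key.
    void save(int key, String value) {
        data.put(key, value);
    }

    // O(log n) removal; the remaining keys stay sorted.
    void evict(int key) {
        data.remove(key);
    }

    // Snapshot of the keys in ascending order.
    List<Integer> keysAscending() {
        return new ArrayList<>(data.keySet());
    }

    // Smallest key currently stored, or null if the store is empty.
    Integer smallestKey() {
        return data.isEmpty() ? null : data.firstKey();
    }
}
```

The ascending order is maintained on every save and evict, so no separate sorting step is needed before iterating.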
LinkedHashMap may be what you're looking for.
http://download.oracle.com/javase/1.4.2/docs/api/java/util/LinkedHashMap.html
