HashMap has containsValue method, but not getValue method - java

I am curious that in the Java collections library, HashMap has a method that searches for the existance of a particular object value called containsValue(Object value) returing a boolean, but no method exists to get the value object by value object directly like you do by providing a key via the get(Object key) method. Now, I know that the purpose of HashMap is to access object values via the keys, but in exceptional cases may want retrieve via the object value, so why is there not a getValue(Object value) method? I ask this, because the algorithm that the method containsValue() implements to search for the object value is faster than my custom search (see below). Also, is there a better way to accomplish this search using HashMap in Java 7 ?
Code Snippet:
// Custom Search
MyCustomer findCust = new MyCustomer(50000, "Joe Bloggs", "London");
for (MyCustomer value : hashMap.values()) {
if (value.equals(findCust)) { // found
cust = value;
break;
}
}

The basic assumption of the collections framework is that if two objects are .equals, they are interchangeable in every way. Given that assumption, there's no reason to get out the value from a Map, because you already have one that is equals and interchangeable. As far as the Collections Framework is concerned, these two methods are fully equivalent:
for (V value : map.values()) {
if (value.equals(myValue)) {
return value;
}
}
and
if (map.containsValue(myValue)) {
return myValue;
}
This assumption is built into the Collections Framework in many places, and this is one of many examples.

hashMap.values().contains(findCust)
You will need equals and hashCode on Customer based on your "business rules" (for example, are two customers with the same "id" but with different other values "equal"????... Obviously you are already doing that because you are using equals...)

HashMap is designed to aid constant lookups using hashcode() and equals() of the key you use to put some value into map.
If you look at the internal structure of HashMap, it's nothing but an array. Each index is called a bucket which can be obtained by normalizing current array's length and the hashcode of the key you pass. Once you find the bucket, it will store the element at that particular index. But if there's already some element stored in that index, they will form a LinkedList of these elements chaining all the values having same hashcode() but different equals() criteria.
In Java 8, this linked list is even changed to TreeMap if the number of elements in that linked list reaches some threshold (8) for improving performance.
Coming to your question, containsValue() basically iterates over all the buckets in the array and again through all the elements in the linked list of each bucket
// iterate through buckets
for (int i = 0; i < table.length; ++i) {
// iterate through each element in linked list at each bucket
for (Node<K,V> e = table[i]; e != null; e = e.next) {
if ((v = e.value) == value ||
(value != null && value.equals(v)))
return true;
}
}
HashMap.values() returns a Collection with the iterator implemented to traverse each element in HashMap providing access to Value object in each iteration.
containsValue() is used when you want to do something if some value is already there in the map but you don't need that value to proceed with your flow.This is merely a convenience method because if you're using values, you will be creating a Collection object and an iterator object to iterate over them but using containsValue(), you just have two nested for loops. I think the reason for not having a getValue() is to encourage the purpose HashMap is intended for - near constant time look ups using hashcode & equals of some key.
values() is used when you basically need to iterate over all the values. This is different from calling map.get(key) in a loop because you don't have to normalize the hashcode, find the bucket, then find the element in the linked list in each iteration, you just loop in the natural order, the way the elements are laid out in the array.
If you're doing this value lookup way too many times, you lose the advantage of constant lookups offered by HashMap. If you're only going to skim through the values searching for some value, I suggest you use an ArrayList. If there are too many elements in that list, and you need to search for some random value quite often, sort the list and use Binary Search.

Related

Extend HashMap to get all objects with the specified hashcode?

From what I understand, when two objects are put in a HashMap that have the same hashcode, they are put in a LinkedList (I think) of objects with the same hash code. I am wondering if there is a way to either extend HashMap or manipulate the existing methods to return a list or array of objects that share a hash code instead of going into equals to see if they are the same object.
The reasoning is that I'm trying to optimize a part of a code that, currently, is just a while loop that finds the first object with that hashcode and stores/removes it. This would be a lot faster if I could just return the full list in one go.
Here's the bit of code I'd like to replace:
while (WorkingMap.containsKey(toSearch)) {
Occurences++;
Possibles.add(WorkingMap.get(toSearch));
WorkingMap.remove(toSearch);
}
The keys are Chunk objects and the values are Strings. Here are the hashcode() and equals() functions for the Chunk class:
/**
* Returns a string representation of the ArrayList of words
* thereby storing chunks with the same words but with different
* locations and next words in the same has bucket, triggering the
* use of equals() when searching and adding
*/
public int hashCode() {
return (Words.toString()).hashCode();
}
#Override
/**
* result depends on the value of location. A location of -1 is obviously
* not valid and therefore indicates that we are searching for a match rather
* than adding to the map. This allows multiples of keys with matching hashcodes
* to be considered unequal when adding to the hashmap but equal when searching
* it, which is integral to the MakeMap() and GetOptions() methods of the
* RandomTextGenerator class.
*
*/
public boolean equals(Object obj) {
Chunk tempChunk = (Chunk)obj;
if (LocationInText == -1 && Words.size() == tempChunk.GetText().size())
{
for (int i = 0; i < Words.size(); i++) {
if (!Words.get(i).equals(tempChunk.GetText().get(i))) {
return false;
}
}
return true;
}
else {
if (tempChunk.GetLocation() == LocationInText) {
return true;
}
return false;
}
}
Thanks!
HashMap does not expose any way to do this, but I think you're misunderstanding how HashMap works in the first place.
The first thing you need to know is that if every single object had exactly the same hash code, HashMap would still work. It would never "mix up" keys. If you call get(key), it will only return the value associated with key.
The reason this works is that HashMap only uses hashCode as a first grouping, but then it checks the object you passed to get against the keys stored in the map using the .equals method.
There is no way, from the outside, to tell that HashMap uses linked lists. (In fact, in more recent versions of Java, it doesn't always use linked lists.) The implementation doesn't provide any way to look at hash codes, to find out how hash codes are grouped, or anything along those lines.
while (WorkingMap.containsKey(toSearch)) {
Occurences++;
Possibles.add(WorkingMap.get(toSearch));
WorkingMap.remove(toSearch);
}
This code does not "find the first object with that hashcode and store/remove it." It finds the one and only object equal to toSearch according to .equals, stores and removes it. (There can only be one such object in a Map.)
Your while isn't really going. It makes max one turn, if the WorkingMap is a plain Java HashMap. .get(key) return the last saved Object in the HashMap that is saved on 'key'. If it matched toSearch, than it going once.
I'm not sure about that many open questions here. But if you need that one and your farther code is understanding
What kind of type is class Possibles? ArrayList?
// this one should make the same as your while
if(WorkingMap.containsKey(toSearch)) {
Possibles.add(WorkingMap.get(toSearch));
WorkingMap.remove(toSearch);
}
// farher: expand your Possibles to get that LinkedList what you want to have.
public class possibilities {
// List<LinkedList<String>> container = new ArrayList<LinkedList<String>>();
public Map<Chunk, LinkedList<String>> container2 = new HashMap<Chunk, LinkedList<String>>();
public void put(Chunk key, String value) {
if(!this.container2.containsKey(key)) {
this.container2.put(key, new LinkedList<String>());
}
this.container2.get(key).add(value);
}
}
// this one works with updated Possibles
if(WorkingMap.containsKey(toSearch)) {
Possibles.put(toSearch, WorkingMap.get(toSearch));
WorkingMap.remove(toSearch);
}
//---
How ever, yes it can go like that, but keys should not be a complex object.
Notice: That LinkedLists takes memory and how big are chunks? check Memory Usage
Possibles.(get)container2.keySet();
Good Look
Sail
From what I understand, when two objects are put in a HashMap that have the same hashcode, they are put in a LinkedList (I think) of objects with the same hash code.
Yes, but it's more complicated than that. It often needs to put objects in linked lists even when they have differing hash codes, since it only uses some bits of the hash codes to choose which bucket to store objects in; the number of bits it uses depends on the current size of the internal hash table, which approximately depends on the number of things in the map. And when a bucket needs to contain multiple objects it will also try to use binary trees like a TreeMap if possible (if objects are mutually Comparable), rather than linked lists.
Anyway.....
I am wondering if there is a way to either extend HashMap or manipulate the existing methods to return a list or array of objects that share a hash code instead of going into equals to see if they are the same object.
No.
A HashMap compares keys for equality according to the equals method. Equality according to the equals method is the only valid way to set, replace, or retrieve values associated with a particular key.
Yes, it also uses hashCode as a way to arrange objects in a structure that allows for far faster location of potentially equal objects. Still, the contract for matching keys is defined in terms of equals, not hashCode.
Note that it is perfectly legal for every hashCode method to be implemented as return 0; and the map will still work just as correctly (but very slowly). So any idea that involves getting a list of objects sharing a hash code is either impossible or pointless or both.
I'm not 100% sure what you're doing in your equals method with the LocationInText variable, but it looks dangerous, as it violates the contract of the equals method. It is required that the equals method be symmetric, transitive, and consistent:
Symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
Transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
Consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
And the hashCode method is required to always agree with equals about equal objects:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
The LocationInText variable is playing havoc with those rules, and may well break things. If not today, then some day. Get rid of it!
Here's the bit of code I'd like to replace:
while (WorkingMap.containsKey(toSearch)) {
Occurences++;
Possibles.add(WorkingMap.get(toSearch));
WorkingMap.remove(toSearch);
}
Something that jumps out at me is that you only need to do the key lookup once, instead of doing it three times, since Map.remove returns the removed value or null if the key is not present:
for (;;) {
String s = WorkingMap.remove(toSearch);
if (s == null) break;
Occurences++;
Possibles.add(s);
}
Either way, the loop is still faulty, since it is supposed to be impossible for a map to contain more than one key equal to toSearch. I can't overstate that the LocationInText variable as you're using it is not a good idea.
I agree with the other commenters it looks like you're looking for a map-of-list structure. Some Java libraries like Guava offer a Multimap for this, but you can do it manually pretty easily. I think the declaration you want is:
Map<Chunk,List<String>> map = new HashMap<>();
To add a new chunk-string pair to the map, do:
void add(Chunk chunk, String string) {
map.computeIfAbsent(chunk, k -> new ArrayList<>()).add(string);
}
That method puts a new ArrayList in the map if the chunk is new, or fetches the existing ArrayList if there is one for that chunk. Then it adds the string to the list that it fetched or created.
To retrieve the list of all strings for a particular chunk value is as simple as map.get(chunkToSearch), which you can add to your Possibles list as Possibles.addAll(map.get(chunkToSearch));.
Other potential optimizations I'd point out:
In your Chunk.hashCode method, consider caching the hash code instead of recomputing it every time the method is called. If Chunk is mutable (which is not a good idea for a map key, but vaguely allowed so long as you're careful) then recompute the hash code only after the Chunk's value has changed. Also, if Words is a List, which it seems to be, it would likely be faster to use its hash code than convert it to a string and use the string's hash code, but I'm not sure.
In your Chunk.equals method, you can return true immediately if the instances are the same (which they often will be). Also, if GetText returns a copy of the data, then don't call it; you can access the private Words list of the other Chunk since you are in the same class, and finally, you can just defer to the List.equals method:
#Override
public boolean equals(Object o) {
return (this == o) || (o instanceof Chunk && this.Words.equals(((Chunk)o).Words));
}
Simple! Fast!

How to efficiently implement array element lookup and removal in Java?

Given a sorted array of objects, while the order is based on some object attribute. (Sorting is done via a List using Collections.sort() with a custom Comparator and then calling toArray()).
Duplicate instances of SomeObject are not allowed ("duplicates" in this regard depends on multiple attribute value in SomeObject), but it's possible that multiple instances of SomeObject have the same value for attribute1, which is used for sorting.
public SomeObject {
public attribute1;
public attribute2;
}
List<SomeObject> list = ...
Collections.sort(list, new Comparator<SomeObject>() {
#Override
public int compare(SomeObject v1, SomeObject v2) {
if (v1.attribute1 > v2.attribute1) {
return 1;
} else if (v1.attribute1 < v2.attribute1) {
return -1;
} else
return 0;
}
});
SomeObject[] array = list.toArray(new SomeObject[0]);
How to efficiently check whether a certain object based on some attribute is in that array while also being able to "mark" objects already found in some previous look up (e.g. simply by removing them from the array; already found objects don't need to be accessed at later time).
Without the later requirement, one could do a Arrays.binarySearch() with custom Comparator. But obviously it's not working when one want to remove objects already found.
Use a TreeSet (or TreeMultiset).
You can initialize it with your comparator; it sorts itself; look-up and removal are in logarithmic time.
You can also check for existence and remove in one step, because remove returns a boolean.
Building on Arian's answer, you can also use TreeBag from Apache Commons' Collections library. This is backed by a TreeMap, and maintains a count for repeated elements.
If you want you can put all the elements into some sort of linked list whose nodes are also connected in a heap form when you sort them. That way, finding an element would be log n and you can still delete the nodes in place.

Collection with index and hash access

I need a collection class which has both: quick index and hash access.
Now I have ArrayList. It has good index acces, but his contains method is not performant. HashSet has good contains implementation but no indexed acces. Which collection has both? Probably something from Apache?
Or should I create my own collection class which has both: ArrayList for indexed acces and HashSet for contains check?
Just for clarification: i need both get(int index) and contains(Object o)
If indexed access performance is not a problem the closest match is LinkedHashSet whose API says that it is
Hash table and linked list implementation of the Set interface, with predictable iteration order.
at least I dont think that the performance will be be worse than that of LinkedListPerformance. Otherwise I cannot see no alternative but your ArrayList + HashTable solution
If you are traversing the index from start to finish, I think this might satisfy your needs: LinkedHashSet
If you need to randomly access via the index, as well as hash access, if no-one else has a better suggestion I guess you can make your own collection which does both.
Do like this; use combination of Hash technique as well as list to get best of both worlds :)
class DataStructure<Integer>{
Hash<Integer,Integer> hash = new HashMap<Integer, Integer>();
List<Integer> list = new ArrayList<Integer>();
public void add(Integer i){
hash.add(i,i);
list.add(i);
}
public Integer get(int index){
return list.get(index);
}
...
} //used Integers to make it simpler
So object; you keep in HashMap/HashSet as well as ArrayList.
So if you want to use
contains method : call hashed contains method.
get an object with index: use array to return the value
Just make sure that you have both these collections in sync. And take care of updation/deletion in both data structures.
I don't know the exact lookup times, but maybe you could use some implementation of the Map interface. You could store you objects with map.put(objectHash, obj).
Then you could verify that you have a certain object with:
boolean contained = map.containsValue(obj);
And you can use the hash to lookup an object in the map:
MyObject object = map.get(objectHash);
Though, the only downfall is that you would need to know your hashes on this lookup call which may not be probably in your implementation.

Searching an object in a java map

I am new to Java, I was working with Map class and its derivatives.
I was just wondering about how elements are found inside them. Is only a pointer/reference check performed?
Let's say I have a TreeMap<MyObject, Integer>. If I have an object x i would like you to search an integer v such that its key is "equal" to x even if they are 2 separate instances of the class MyObject, hence 2 different pointers.
Is there any method (of an interface/superclass too) which can it do such operation?
Thanks in advance.
All the methods that involve comparisons in Map and its implementations make use of the 'equals' method for the objects. If you attempt to add a key+value to a Map which already contains aentry with a key that would compare equals to it, then the new key+value replaces the old one.
See the documentation:
For example, the specification for the containsKey(Object key) method says: "returns true if and only if this map contains a mapping for a key k such that (key==null ? k==null : key.equals(k))."
The implementation may not execute any equals comparison if it can determine that the keys are 'unequal' through some other means, such as comparing hashcodes.
In your example, you would do
TreeMap<MyObject, Integer> tree = ...
Integer i = tree.get(x);
The get(x) will iterate over your keys() and returning the integer value for the key matching aKey.equals(x).
In most cases, Maps are backed by a hash table, and are very similar to a HashMap. TreeMap gives a bit more information with each node by having pointers up and down the tree, but lookups are still done via hashes (I believe)

How does Java implement hash tables?

Does anyone know how Java implements its hash tables (HashSet or HashMap)? Given the various types of objects that one may want to put in a hash table, it seems very difficult to come up with a hash function that would work well for all cases.
HashMap and HashSet are very similar. In fact, the second contains an instance of the first.
A HashMap contains an array of buckets in order to contain its entries. Array size is always powers of 2. If you don't specify another value, initially there are 16 buckets.
When you put an entry (key and value) in it, it decides the bucket where the entry will be inserted calculating it from its key's hashcode (hashcode is not its memory address, and the the hash is not a modulus). Different entries can collide in the same bucket, so they'll be put in a list.
Entries will be inserted until they reach the load factor. This factor is 0.75 by default, and is not recommended to change it if you are not very sure of what you're doing. 0.75 as load factor means that a HashMap of 16 buckets can only contain 12 entries (16*0.75). Then, an array of buckets will be created, doubling the size of the previous. All entries will be put again in the new array. This process is known as rehashing, and can be expensive.
Therefore, a best practice, if you know how many entries will be inserted, is to construct a HashMap specifying its final size:
new HashMap(finalSize);
You can check the source of HashMap, for example.
Java depends on each class' implementation of the hashCode() method to distribute the objects evenly. Obviously, a bad hashCode() method will result in performance problems for large hash tables. If a class does not provide a hashCode() method, the default in the current implementation is to return some function (i.e. a hash) of the the object's address in memory. Quoting from the API doc:
As much as is reasonably practical,
the hashCode method defined by class
Object does return distinct integers
for distinct objects. (This is
typically implemented by converting
the internal address of the object
into an integer, but this
implementation technique is not
required by the JavaTM programming
language.)
There are two general ways to implement a HashMap. The difference is how one deals with collisions.
The first method, which is the one Java users, makes every bucket in a the HashMap contain a singly linked list. To accomplish this, each bucket contains an Entry type, which caches the hashCode, has a pointer to the key, pointer to the value, and a pointer to the next entry. When a collision occurs in Java, another entry is added to the list.
The other method for handling collisions, is to simply put the item into the next empty bucket. The advantage of this method is it requires less space, however, it complicates removals, as if the bucket following the removed item is not empty, one has to check to see if that item is in the right or wrong bucket, and shift the item if it originally has collided with the item being removed.
In my own words:
An Entry object is created to hold the reference of the Key and Value.
The HashMap has an array of Entry's.
The index for the given entry is the hash returned by key.hashCode()
If there is a collision ( two keys gave the same index ) , the entry is stored in the .next attribute of the existing entry.
That's how two objects with the same hash could be stored into the collection.
From this answer we get:
public V get(Object key) {
if (key == null)
return getForNullKey();
int hash = hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
return e.value;
}
return null;
}
Let me know if I got something wrong.

Categories