I am new to Java and have been working with the Map interface and its implementations.
I was wondering how elements are found inside them. Is only a pointer/reference check performed?
Let's say I have a TreeMap<MyObject, Integer>. Given an object x, I would like to find the integer v whose key is "equal" to x, even if the two are separate instances of the class MyObject, hence two different references.
Is there any method (possibly from an interface or superclass) that can do this?
Thanks in advance.
The methods that involve comparisons in Map and its hash-based implementations (such as HashMap) use the keys' 'equals' method, together with hashCode. Sorted maps such as TreeMap compare keys with compareTo or with the Comparator supplied to the map instead. If you attempt to add a key+value to a Map which already contains an entry with a key that compares equal to it, then the new value replaces the old one.
See the documentation:
For example, the specification for the containsKey(Object key) method says: "returns true if and only if this map contains a mapping for a key k such that (key==null ? k==null : key.equals(k))."
The implementation may not execute any equals comparison if it can determine that the keys are 'unequal' through some other means, such as comparing hashcodes.
In your example, you would do
TreeMap<MyObject, Integer> tree = ...
Integer i = tree.get(x);
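The get(x) call does not scan all the keys. In a TreeMap it walks down the tree, comparing x with the keys it meets via compareTo (or the map's Comparator), and returns the integer value of the key for which the comparison returns 0. So MyObject must implement Comparable, or the TreeMap must be given a Comparator. For a HashMap, MyObject would instead need consistent equals() and hashCode() methods.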
Many Maps (HashMap, LinkedHashMap, Hashtable) are backed by a hash table. TreeMap is not: it is a red-black tree whose nodes hold pointers up and down the tree, and lookups are done by comparing keys, not via hashes.
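To make the lookup rules concrete, here is a minimal sketch. MyObject is a hypothetical stand-in for the class in the question, assumed to be identified by a single int id; the class and method names are made up for illustration. The same lookup works in a TreeMap (through compareTo) and in a HashMap (through hashCode and equals), even though the probe is a different instance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class KeyLookupDemo {
    // Hypothetical stand-in for the question's MyObject, identified by an int id.
    static final class MyObject implements Comparable<MyObject> {
        final int id;

        MyObject(int id) { this.id = id; }

        // Used by TreeMap to locate keys.
        @Override public int compareTo(MyObject other) { return Integer.compare(id, other.id); }

        // Used by HashMap to locate keys; kept consistent with compareTo.
        @Override public boolean equals(Object o) { return o instanceof MyObject && ((MyObject) o).id == id; }
        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    // Stores a value under one instance and reads it back with another.
    static Integer lookup(Map<MyObject, Integer> map) {
        map.put(new MyObject(7), 42);
        return map.get(new MyObject(7)); // a different instance with the same id
    }

    public static void main(String[] args) {
        System.out.println(lookup(new TreeMap<>())); // 42
        System.out.println(lookup(new HashMap<>())); // 42
    }
}
```

If MyObject implemented neither Comparable nor equals/hashCode, the TreeMap would throw a ClassCastException on put and the HashMap would fall back to identity, so the second instance would not be found.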
Related
In my project I use a HashMap to store some data, and I've recently discovered that when I mutate the keys of the HashMap, unexpected wrong results may occur. For example:
HashMap<ArrayList,Integer> a = new HashMap<>();
ArrayList list1 = new ArrayList<>();
a.put(list1, 1);
System.out.println(a.containsKey(new ArrayList<>())); // true
list1.add(5);
ArrayList list2 = new ArrayList<>();
list2.add(5);
System.out.println(a.containsKey(list2)); // false
Note that both a.keySet().iterator().next().hashCode() == list2.hashCode() and a.keySet().iterator().next().equals(list2) are true.
I cannot understand why this happens, given that the two objects are equal and have the same hash code. Does anyone know the cause, and whether there is a similar structure that allows mutation of the keys? Thanks.
Mutable keys are always a problem. Keys are to be considered mutable if the mutation could change their hashcode and/or the result of equals(). That being said, lists often generate their hashcodes and check equality based on their elements so they almost never are good candidates for map keys.
What is the problem in your example? When the key is added it is an empty list and thus produces a different hashcode than when it contains an element. Hence even though the hashcode of the key and list2 are the same after changing the key list you'll not find the element. Why? Simply because the map looks in the wrong bucket.
Example (simplified):
Let's start with a few assumptions:
an empty list returns a hashcode of 0
if the list contains the element 5 it returns the hashcode 5
our map has 16 buckets (default)
the bucket index is determined by hashcode % 16 (the number of our buckets)
If you now add the empty list it gets inserted into bucket 0 due to its hashcode.
When you do the lookup with list2 (or with the mutated list1) the map will look in bucket 5 due to the hashcode of 5. Since that bucket is empty nothing will be found.
The problem is that your key list changes its hashcode and thus should be put into a different bucket but the map doesn't know this should happen (and doing so would probably cause a bunch of other problems).
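A small sketch of the effect described above, plus one way around it. The class and method names are made up for illustration. The workaround shown (remove the entry, mutate the key, put it back) is only one option and assumes you still hold the key before it is mutated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MutatedKeyDemo {
    // Mutates the key while it is in the map, then probes with an equal list.
    static boolean lookupAfterMutatingInPlace() {
        Map<List<Integer>, Integer> map = new HashMap<>();
        List<Integer> key = new ArrayList<>();
        map.put(key, 1);   // filed under the hash code of the empty list
        key.add(5);        // the key's hash code changes, the entry does not move

        List<Integer> probe = new ArrayList<>();
        probe.add(5);
        return map.containsKey(probe);
    }

    // Removes the entry first, mutates the key, then puts it back.
    static boolean lookupAfterReinserting() {
        Map<List<Integer>, Integer> map = new HashMap<>();
        List<Integer> key = new ArrayList<>();
        map.put(key, 1);

        Integer value = map.remove(key); // still findable: the key is unchanged so far
        key.add(5);
        map.put(key, value);             // filed again under the new hash code

        List<Integer> probe = new ArrayList<>();
        probe.add(5);
        return map.containsKey(probe);
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutatingInPlace()); // false
        System.out.println(lookupAfterReinserting());     // true
    }
}
```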
According to the javadocs for Map:
Note: great care must be exercised if mutable objects are used as map
keys. The behavior of a map is not specified if the value of an object
is changed in a manner that affects equals comparisons while the
object is a key in the map. A special case of this prohibition is that
it is not permissible for a map to contain itself as a key. While it
is permissible for a map to contain itself as a value, extreme caution
is advised: the equals and hashCode methods are no longer well defined
on such a map.
Your lists are the keys and you're changing them. It would not be a problem if the contents of the list were not what determine the values for hash code and what is equal, however that is not your case. If you think about it, it doesn't make much sense to change the key of a map. The key is what identifies the value, and if that key changes, all bets are off.
The map places the entry according to the key's hash code at insertion time. When you search for it later, it uses the hash code of the argument to decide where to look. I think you'd find that, had you inserted list1 with the element already added, you would see "true" printed out, since list2.hashCode() would then produce the same hash code that list1 had when it was inserted.
That's because a HashMap uses the key's hashCode() method in combination with equals(Object obj) to check whether the map contains a key.
See:
ArrayList<Integer> a = new ArrayList<>();
a.add(1);
System.out.println(a.hashCode());
a.add(2);
System.out.println(a.hashCode());
This example shows that the hashCode of your ArrayList changes when its contents change.
You should never use a mutable object as a key in your hashmap.
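One way to follow that advice when the data starts out in a mutable list is to store an unmodifiable copy as the key. This is only a sketch: the snapshot helper is a hypothetical name, and it assumes the list elements themselves are immutable (as Integer is).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SnapshotKeyDemo {
    // Hypothetical helper: copies the list and wraps the copy so it cannot change.
    static <T> List<T> snapshot(List<T> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    static String lookupAfterOriginalChanges() {
        List<Integer> working = new ArrayList<>();
        working.add(5);

        Map<List<Integer>, String> map = new HashMap<>();
        map.put(snapshot(working), "five");

        working.add(6); // the working list changes, the stored key does not

        List<Integer> probe = new ArrayList<>();
        probe.add(5);
        return map.get(probe);
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterOriginalChanges()); // five
    }
}
```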
So what is basically going on: when you put list1 as a key (line 3 of your snippet), the map calculates its hashCode and files the entry under it, and containsKey(someKey) later compares against that stored hash.
But when you mutate list1 (line 5), its hashCode changes.
So if you now do
System.out.println(a.containsKey(list1));
after line 5, it prints false,
and if you do System.out.println(a.get(list1));
it prints null, because it is comparing two different hash codes.
Probably you didn't override equals() and hashCode() methods.
I'm a bit confused about the internal implementation of HashSet and HashMap in java.
This is my understanding, so please correct me if I'm wrong:
Neither HashSet nor HashMap allows duplicate elements.
HashSet is backed by a HashMap, so in a HashSet when we call .add(element), we are calling the hashCode() method on the element and internally doing a put(k,v) to the internal HashMap, where the key is the hashCode and the value is the actual object. So if we try to add the same object to the Set, it will see that the hashCode is already there, and then replace the old value by the new one.
But then, this seems inconsistent to me when I read how a HashMap works when storing our own objects as keys in a HashMap.
In this case we must override the hashCode() and equals() methods and make them consistent between each other, because, if we find keys with the same hashCode, they will go to the same bucket, and then to distinguish between all the entries with the same hashCode we have to iterate over the list of entries to call the method equals() on each key and find a match.
So in this case, we allow to have the same hashCode and we create a bucket containing a list for all the objects with the same hashCode, however using a HashSet, if we find already a hashCode, we replace the old value by the new value.
I'm a bit confused, could someone clarify this to me please?
You are correct regarding the behavior of HashMap, but you are wrong about the implementation of HashSet.
HashSet is backed by a HashMap internally, but the element you are adding to the HashSet is used as the key in the backing HashMap. For the value, a dummy value is used. Therefore the HashSet's contains(element) simply calls the backing HashMap's containsKey(element).
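A quick way to see that the element itself (not its hash code) is the key: the strings "Aa" and "BB" happen to have the same hashCode(), yet a HashSet keeps both, and only rejects an element that is actually equal to one already present. The class name is made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class CollidingElementsDemo {
    static Set<String> build() {
        Set<String> set = new HashSet<>();
        set.add("Aa");
        set.add("BB");              // same hash code as "Aa", but not equal: kept
        set.add(new String("Aa"));  // equal to an existing element: ignored
        return set;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println(build().size());                     // 2
    }
}
```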
The value we insert into a HashSet acts as a key in the backing map object, and for its value Java uses a constant. So in the key-value pairs, all the keys have the same value.
You can refer to this link:
https://www.geeksforgeeks.org/hashset-in-java/
HashMap: A HashMap stores data as key-value pairs. When we insert or look up an entry, it internally relies on three things:
1. hashCode()
2. equals()
3. ==
When we put an entry, the map uses the key's hash code to choose a bucket. If two keys land in the same bucket, that is a hash collision, and the map has to decide whether they are the same key. It first checks the references with ==, which compares object identity, and if that fails it calls equals(), which compares content. If either check says the keys are the same, the value of the existing entry is replaced by the new value. If not, a new entry is added to the same bucket.
HashSet: A HashSet stores a collection of unique objects. Internally it also uses a HashMap: the add method calls put on the backing map with the element as the key. Because map keys are unique, the elements of the set are unique too. Adding a duplicate does not throw an exception; add simply returns false and the set is unchanged. The value stored for every key is the constant PRESENT.
You can observe that internal hashmap object contains the element of hashset as keys and constant “PRESENT” as their value.
Where PRESENT is a constant which is defined as
private static final Object PRESENT = new Object();
You can find answers everywhere about what the differences are:
Map stores key-value pairs, is not synchronized (not thread safe), allows null values and only one null key, is faster to get a value from because every value has a unique key, etc.
Set is not sorted, is slower to get a value from, stores only values, and does not allow duplicates or null values, I guess.
BUT what does the word "Hash" mean (that is what they have in common)? Is it something about hashing values, or what? I hope you can answer clearly.
Both use the hash value of the object to decide where to store it, which internally uses the hashCode() method of the Object class.
So if you are storing instances of your custom class, then you need to override the hashCode() method (and equals() along with it).
HashSet and HashMap have a number of things in common:
The start of their name - which is a clue to the real similarity.
They use Hash Codes (from the hashCode method built into all Java objects) to quickly process and organize Objects.
They are both unordered collections - but both provide ordered variants (LinkedHashX to store objects in the order of addition)
There is also TreeSet/TreeMap to sort all objects present in the collection and keep them sorted. A comparison of TreeSet to TreeMap will find very similar differences and similarities to one between HashSet and HashMap.
They are also both impacted by the strengths and limitations of Hash algorithms in general.
Hashing is only effective if the objects have well behaved hash functions.
Hashing breaks entirely if equals and hashCode do not follow the correct contract.
Key objects in maps and objects in sets should be immutable (or at least their hashCode and equals return values should never change), as otherwise behavior becomes undefined.
If you look at the Map API you can also see a number of other interesting connections - such as the fact that keySet and entrySet both return a Set.
Most of the java.util collections, including HashMap and HashSet, are not thread safe. Some of the older classes (Vector, Hashtable) are synchronized, but they have mostly been retired. For thread safety, look at the java.util.concurrent package; for non-thread-safe collections, look at java.util.
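As a small illustration of the thread-safe side, here is a sketch that counts from several threads using ConcurrentHashMap, whose merge method is atomic. The class name, method name and the "hits" key are made up for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCountDemo {
    // Starts the given number of threads; each one increments the same counter.
    static int count(int threads, int incrementsPerThread) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counts.merge("hits", 1, Integer::sum); // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(count(4, 1000)); // 4000
    }
}
```

With a plain HashMap in place of the ConcurrentHashMap, the same code could lose updates or corrupt the map, because nothing coordinates the concurrent writes.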
Just look into HashSet source code and you will see that it uses HashMap. So they have the same properties of null-safety, synchronization etc:
public class HashSet<E>
...
private transient HashMap<E,Object> map;
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
/**
* Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
* default initial capacity (16) and load factor (0.75).
*/
public HashSet() {
map = new HashMap<>();
}
...
public boolean contains(Object o) {
return map.containsKey(o);
}
...
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
...
}
HashSet is like a HashMap where you don't care about the values but about the keys only.
So you care only whether a given key K is in the set, not about the value V to which it is mapped (you can think of it as if V is a constant, e.g. V=Boolean.TRUE for all keys in the HashSet). So a HashSet has no values of its own. This is the whole difference from a structural point of view. The hash part means that when putting elements into the structure Java first calls the hashCode method. See also http://en.wikipedia.org/wiki/Open_addressing for one general approach to what happens under the hood (note that Java's HashMap resolves collisions by chaining entries inside a bucket rather than by open addressing).
The hash value is used to check faster if two objects are the same. If two objects have same hash, they can be equal or not equal (so they are then compared for equality with the equals method). But if they have different hashes they are different for sure and the check for equality is not needed. This doesn't mean that if two objects have same hash values they overwrite each other when they are stored in the HashSet or in the HashMap.
Neither is thread safe, and both store their elements using hashCode(). Those are the common facts. Another is that both are members of the Java Collections Framework. But there are lots of differences between the two.
"Hash" refers to the technique used to convert the key to an index. Back in data structures class we learned how to construct a hash table: to do that, you take the strings that are inserted and convert them to a number used to index an array, which serves internally as the storage structure.
One much-discussed problem was finding a hashing function that incurs as few collisions as possible, so that two different objects with different keys rarely share the same position.
So, the hash is about how the keys are processed to be stored. If we think about it for a while, there isn't a (real) way to index memory with strings, only with numbers, so to have a 2d structure like a table that is indexed by a string (or an object as you wish) you need to generate a number (or a hash) for that string and store the value in an array in this index. However, if you need the key "name" you would need a different array to, in the same index, store the key "name".
Cheers
The word "Hash" is common because both use a hashing mechanism. HashSet is actually implemented using HashMap, with the same dummy object instance as the value of every entry of the Set. That costs one extra reference (typically 4 bytes) per entry.
I am trying to figure something out about hashing in java.
If I want to store some data in a HashMap, for example, will it have some kind of underlying hash table with the hash values?
Or if someone could give a good and simple explanation of how hashing works, I would really appreciate it.
HashMap is basically implemented internally as an array of entries (Entry[]). If you know what a linked list is, this Entry type is nothing but a linked list node. This type actually stores both the key and the value.
To insert an element into the array, you need an index. How do you calculate the index? This is where the hashing function (hashFunction) comes into the picture. You pass an integer to this hash function. To get this integer, Java calls the hashCode method of the object that is being added as a key in the map. This concept is called preHashing.
Once the index is known, you place the element at this index. This slot is called a BUCKET, so if the element is inserted at Entry[0], you say that it falls into bucket 0.
Now assume that the hash function returns the same index, say 0, for another object that you want to insert as a key in the map. This is where the equals method is called. If equals returns true, the key is already present and its value is simply replaced. If equals returns false, there is a hash collision: since Entry is a linked list implementation, one more node (Entry) is added to the list already sitting at this index. Bottom line: on a hash collision, more than one element lives at a particular index, chained through the linked list.
The same applies when getting a value from the map. Based on the index returned by the hash function, the map goes to that bucket and calls equals on the entries in the linked list until it finds the one whose key matches.
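The steps above can be sketched as a tiny chained hash table. This is a teaching sketch, not the real java.util.HashMap: TinyHashMap and its members are made-up names, the table is fixed at 16 buckets and never resizes, and the bit-spreading step in indexFor is only modelled on what OpenJDK's HashMap does.

```java
import java.util.Objects;

class TinyHashMap<K, V> {
    // One node of a bucket's linked list; holds both key and value.
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] buckets = (Node<K, V>[]) new Node[16];
    private int size;

    // hashCode() -> bucket index. The length is a power of two, so masking works.
    private int indexFor(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        h ^= (h >>> 16); // mix the high bits into the low bits
        return h & (buckets.length - 1);
    }

    // Returns the previous value for the key, or null if there was none.
    V put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) { // same key: replace the value
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        buckets[i] = new Node<>(key, value, buckets[i]); // new key: chain a node
        size++;
        return null;
    }

    V get(Object key) {
        for (Node<K, V> n = buckets[indexFor(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        TinyHashMap<String, Integer> map = new TinyHashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hashCode as "Aa": lands in the same bucket
        map.put("Aa", 3); // equal key: value replaced, size unchanged
        System.out.println(map.get("Aa") + " " + map.get("BB") + " " + map.size()); // 3 2 2
    }
}
```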
Hope this helps with the internals of how it works :)
Hash values in Java are provided by objects through the implementation of public int hashCode() which is declared in Object class and it is implemented for all the basic data types. Once you implement that method in your custom data object then you don't need to worry about how these are used in miscellaneous data structures provided by Java.
A note: implementing that method requires also to have public boolean equals(Object o) implemented in a consistent manner.
If I want to store some data in a HashMap, for example, will it have some kind of underlying hash table with the hash values?
A HashMap is a form of hash table (and Hashtable is another). They work by using the hashCode() and equals(Object) methods provided by the HashMap's key type. Depending on how you want your keys to behave, you can use the hashCode / equals methods implemented by java.lang.Object ... or you can override them.
Or if someone could give a good and simple explanation of how hashing works, I would really appreciate it.
I suggest you read the Wikipedia page on Hash Tables to understand how they work. (FWIW, the HashMap and Hashtable classes use "separate chaining with linked lists", and some other tweaks to optimize average performance.)
A hash function works by turning an object (i.e. a "key") into an integer. How it does this is up to the implementor. But a common approach is to combine hashcodes of the object's fields something like this:
hashcode = (..((field1.hashcode * prime) + field2.hashcode) * prime + ...)
where prime is a smallish prime number like 31. The key is that you get a good spread of hashcode values for different keys. What you DON'T want is lots of keys all hashing to the same value. That causes "collisions" and is bad for performance.
When you implement the hashcode and equals methods, you need to do it in a way that satisfies the following constraints for the hash table to work correctly:
1. o1.equals(o2) => o1.hashCode() == o2.hashCode()
2. o1.equals(o2) == o2.equals(o1)
3. The hashcode of an object doesn't change while it is a key in a hash table.
It is also worth noting that the default hashCode and equals methods provided by Object are based on the target object's identity.
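As an illustration of the formula and the constraints, here is a sketch of a small key class. GridPoint is a made-up name; its fields are final, so the hash code cannot change while an instance is a key in a table.

```java
import java.util.HashMap;
import java.util.Map;

class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint other = (GridPoint) o;
        return x == other.x && y == other.y;
    }

    // Combines the fields that equals() looks at: (x * 31) + y.
    @Override
    public int hashCode() {
        int result = Integer.hashCode(x);
        result = 31 * result + Integer.hashCode(y);
        return result;
    }

    public static void main(String[] args) {
        Map<GridPoint, String> labels = new HashMap<>();
        labels.put(new GridPoint(1, 2), "start");
        System.out.println(labels.get(new GridPoint(1, 2))); // start
        System.out.println(new GridPoint(1, 2).hashCode());  // 33
    }
}
```

java.util.Objects.hash(x, y) is a shorter way to write a similar combination, at the cost of boxing the values into an array on each call.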
"But where is the hash values stored then? It is not a part of the HashMap, so is there an array assosiated to the HashMap?"
The hash values are typically not stored. Rather they are calculated as required.
In the case of the HashMap class, the hashcode for each key is actually cached in the entry's Node.hash field. But that is a performance optimization ... to make hash chain searching faster, and to avoid recalculating hashes if / when the hash table is resized. But if you want this level of understanding, you really need to read the source code rather than asking Questions.
This is the most fundamental contract in Java: the .equals()/.hashCode() contract.
The most important part of it here is that two objects which are considered .equals() should return the same .hashCode().
The reverse is not true: objects not considered equal may return the same hash code. But it should be as rare an occurrence as possible. Consider the following .hashCode() implementation, which, while perfectly legal, is as broken an implementation as can exist:
@Override
public int hashCode() { return 42; } // legal!!
While this implementation obeys the contract, it is pretty much useless... Hence the importance of a good hash function to begin with.
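To see that such a hashCode() is legal but only costs performance, here is a sketch with a hypothetical Key class that always returns 42. Every entry lands in the same bucket, yet the map still tells the keys apart through equals().

```java
import java.util.HashMap;
import java.util.Map;

class ConstantHashDemo {
    // Hypothetical key type with a legal but useless hash function.
    static final class Key {
        final String name;

        Key(String name) { this.name = name; }

        @Override public boolean equals(Object o) { return o instanceof Key && ((Key) o).name.equals(name); }
        @Override public int hashCode() { return 42; }
    }

    static Map<Key, Integer> build() {
        Map<Key, Integer> map = new HashMap<>();
        map.put(new Key("a"), 1);
        map.put(new Key("b"), 2);
        map.put(new Key("c"), 3);
        map.put(new Key("b"), 20); // equal key: replaces the value 2
        return map;
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = build();
        System.out.println(map.size());            // 3
        System.out.println(map.get(new Key("b"))); // 20
    }
}
```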
Now: the Set contract stipulates that a Set should not contain duplicate elements; however, the strategy of a Set implementation is left... Well, to the implementation. You will notice, if you look at the javadoc of Map, that its keys can be retrieved by a method called .keySet(). Therefore, Map and Set are very closely related in this regard.
If we take the case of a HashSet (and, ultimately, HashMap), it relies on .equals() and .hashCode(): when adding an item, it first calculates this item's hash code, and according to this hash code, attempts to insert the item into a given bucket. In contrast, a TreeSet (and TreeMap) relies on the natural ordering of elements (see Comparable) or on a Comparator supplied at construction.
However, if an object is to be inserted and the hash code of this object would trigger its insertion into a non empty hash bucket (see the legal, but broken, .hashCode() implementation above), then .equals() is used to determine whether that object is really unique.
Note that, internally, a HashSet is a HashMap...
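The difference between the hash-based and the tree-based strategies shows up with a class whose equals() and compareTo() disagree. BigDecimal is the usual example: 1.0 and 1.00 are not equals() (different scale) but compare as 0. The class name below is made up for illustration.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.TreeSet;

class EqualsVersusCompareDemo {
    static final List<BigDecimal> VALUES =
            Arrays.asList(new BigDecimal("1.0"), new BigDecimal("1.00"));

    // HashSet decides uniqueness with hashCode()/equals().
    static int hashSetSize() {
        return new HashSet<>(VALUES).size();
    }

    // TreeSet decides uniqueness with compareTo().
    static int treeSetSize() {
        return new TreeSet<>(VALUES).size();
    }

    public static void main(String[] args) {
        System.out.println(hashSetSize()); // 2
        System.out.println(treeSetSize()); // 1
    }
}
```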
Hashing is a way to derive a code for any variable/object by applying a function/algorithm to its properties. The code is not guaranteed to be unique; different objects may share it.
HashMap stores each key-value pair in a static nested class that implements Map.Entry.
HashMap works on a hashing algorithm and uses the hashCode() and equals() methods in its put and get methods.
When we call put with a key-value pair, HashMap uses the key's hashCode() with hashing to find
the index at which to store the pair. Entries at the same index are kept in a linked list, so if there is already
an entry there, it uses equals() to check whether the passed key already exists. If so, it overwrites
the value; otherwise it creates a new entry and stores this key-value pair.
When we call get with a key, it again uses hashCode() to find the index
in the array and then uses equals() to find the correct entry and return its value.
Other important things to know about HashMap are capacity, load factor, and threshold resizing.
HashMap's default initial capacity is 16 and its load factor is 0.75. The threshold is the capacity multiplied
by the load factor, and whenever we add an entry, if the map size becomes greater than the threshold,
HashMap rehashes the contents of the map into a new array with a larger capacity.
The capacity is always a power of 2, so if you know that you need to store a large number of key-value pairs,
for example when caching data from a database, it's a good idea to initialize the HashMap with a suitable capacity
and load factor.
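The arithmetic above can be sketched as follows. The helper names are made up for illustration; the idea is simply to ask for a capacity large enough that the expected number of entries stays at or below the threshold, so no rehash is triggered while filling the map.

```java
import java.util.HashMap;
import java.util.Map;

class CapacityDemo {
    static final float LOAD_FACTOR = 0.75f; // HashMap's default load factor

    // Smallest capacity whose threshold is at least the expected entry count.
    static int initialCapacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / (double) LOAD_FACTOR);
    }

    // Threshold = capacity * load factor.
    static int threshold(int capacity) {
        return (int) (capacity * LOAD_FACTOR);
    }

    // Creates a map sized up front for the expected number of entries.
    static <K, V> Map<K, V> presized(int expectedEntries) {
        return new HashMap<>(initialCapacityFor(expectedEntries), LOAD_FACTOR);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16));           // 12
        System.out.println(initialCapacityFor(100)); // 134
        Map<String, Integer> cache = presized(100);
        cache.put("answer", 42);
        System.out.println(cache.get("answer"));     // 42
    }
}
```

HashMap rounds the requested capacity up to the next power of two internally, so asking for 134 should give a table of 256 buckets.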
I want a map indexed by two keys (a map in which you put AND retrieve values using two keys) in Java. Just to be clear, I'm looking for the following behavior:
map.put(key1, key2, value);
map.get(key1, key2); // returns value
map.get(key2, key1); // returns null
map.get(key1, key1); // returns null
What's the best way to do it? More specifically, should I use:
Map<K1,Map<K2,V>>
Map<Pair<K1,K2>, V>
Other?
(where K1,K2,V are the types of first key, second key and value respectively)
You should use Map<Pair<K1,K2>, V>
It will only contain one map, instead of N+1 maps
Key construction will be obvious (creation of the Pair)
Nobody will get confused as to the meaning of the Map, as its programmer-facing API won't have changed.
Dwell time in the data structure would be shorter, which is good if you find you need to synchronize it later.
If you're willing to bring in a new library (which I recommend), take a look at Table in Guava. This essentially does exactly what you're looking for, also possibly adding some functionality where you may want all of the entries that match one of your two keys.
interface Table<R,C,V>
A collection that associates an ordered pair of keys, called a row key and a column key, with a single value. A table may be sparse, with only a small fraction of row key / column key pairs possessing a corresponding value.
I'd recommend going for the second option
Map<Pair<K1,K2>,V>
The first one will generate more overhead when retrieving data, and even more when inserting/removing data from the Map. Every time that you put a new value V, you'll need to check if the Map for K1 exists, if not create it and put it inside the main Map, and then put the value with K2.
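For comparison, this is roughly what the bookkeeping for the first option looks like. It is a sketch with made-up helper names; computeIfAbsent (Java 8+) hides the "check, create, put" steps, but the extra inner map per K1 is still there.

```java
import java.util.HashMap;
import java.util.Map;

class NestedMapDemo {
    // Creates the inner map for k1 on first use, then stores the value under k2.
    static <K1, K2, V> void put(Map<K1, Map<K2, V>> outer, K1 k1, K2 k2, V value) {
        outer.computeIfAbsent(k1, k -> new HashMap<>()).put(k2, value);
    }

    // Two lookups: first the inner map, then the value.
    static <K1, K2, V> V get(Map<K1, Map<K2, V>> outer, K1 k1, K2 k2) {
        Map<K2, V> inner = outer.get(k1);
        return inner == null ? null : inner.get(k2);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> map = new HashMap<>();
        put(map, "key1", "key2", 7);
        System.out.println(get(map, "key1", "key2")); // 7
        System.out.println(get(map, "key2", "key1")); // null
        System.out.println(get(map, "key1", "key1")); // null
    }
}
```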
If you want the interface you showed initially, wrap your Map<Pair<K1,K2>,V> in your own "DoubleKeyMap".
(And don't forget to properly implement the hashCode and equals methods in the Pair class!!)
While I also am on board with what you proposed (a pair of values to use as the key), you could also consider making a wrapper which can hold/match both keys. This might get somewhat confusing since you would need to override the equals and hashCode methods and make that work, but it could be a straightforward way of indicating to the next person using your code that the key must be of a special type.
Searching a little bit, I found this post which may be of use to you. In particular, out of the Apache Commons Collection, MultiKeyMap. I've never used this before, but it looks like a decent solution and may be worth exploring.
I would opt for the Map<Pair<K1,K2>, V> solution, because:
it directly expresses what you want to do
is potentially faster because it uses fewer indirections
simplifies the client code (the code that uses the Map afterwards)
Logically, your pair (key1, key2) corresponds to something, since it is the key of your map. Therefore you may consider writing your own class having K1 and K2 as parameters and overriding the hashCode() and equals() methods (plus maybe other methods for more convenience).
This clearly appears to be a "clean" way to solve your problem.
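A minimal sketch of that idea, wrapped so that it offers the put/get interface from the question. DoubleKeyMap and its nested Pair are hypothetical names, and the wrapper only covers put and get.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class DoubleKeyMap<K1, K2, V> {
    // Immutable ordered pair; (a, b) and (b, a) are different keys.
    private static final class Pair<A, B> {
        private final A first;
        private final B second;

        Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Pair)) return false;
            Pair<?, ?> other = (Pair<?, ?>) o;
            return Objects.equals(first, other.first) && Objects.equals(second, other.second);
        }

        @Override
        public int hashCode() {
            return 31 * Objects.hashCode(first) + Objects.hashCode(second);
        }
    }

    private final Map<Pair<K1, K2>, V> map = new HashMap<>();

    V put(K1 key1, K2 key2, V value) {
        return map.put(new Pair<>(key1, key2), value);
    }

    V get(K1 key1, K2 key2) {
        return map.get(new Pair<>(key1, key2));
    }

    public static void main(String[] args) {
        DoubleKeyMap<String, String, Integer> map = new DoubleKeyMap<>();
        map.put("key1", "key2", 7);
        System.out.println(map.get("key1", "key2")); // 7
        System.out.println(map.get("key2", "key1")); // null
        System.out.println(map.get("key1", "key1")); // null
    }
}
```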
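I have used an array for the key, like this (note that Java arrays compare by identity, so a lookup only succeeds with the very same array instance that was inserted):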
Map<Array[K1,K2], V>
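A short sketch of why a plain array is a risky key and what a close alternative looks like. Arrays inherit equals() and hashCode() from Object, so two arrays with the same contents are different keys; a List built from the same two elements compares by content. The class name is made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ArrayKeyDemo {
    // Looks up with a second array that has the same contents.
    static boolean arrayKeyFound() {
        Map<String[], Integer> map = new HashMap<>();
        map.put(new String[] {"key1", "key2"}, 7);
        return map.containsKey(new String[] {"key1", "key2"});
    }

    // Looks up with a second list that has the same contents.
    static boolean listKeyFound() {
        Map<List<String>, Integer> map = new HashMap<>();
        map.put(Arrays.asList("key1", "key2"), 7);
        return map.containsKey(Arrays.asList("key1", "key2"));
    }

    public static void main(String[] args) {
        System.out.println(arrayKeyFound()); // false
        System.out.println(listKeyFound());  // true
    }
}
```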