Old values in hash map being overwritten by new values? - java

I have one hash map. I'm storing 12 different key/value pairs in it.
The first 8 values are stored fine, but when I try to put the 9th value it overwrites an old value. But the size increases.
If I try to get the old values, I get nulls. I have also checked the hash map's table. Only 8 values are there. The old values are overwritten.
Here the table has only 7 values but the size is 9. How is that possible?
What could I be doing wrong?

Make sure you use different keys. If that's the case, make sure equals and hashCode for your key class work as required, i.e. when two objects are equal, their hash codes must be the same. And of course, equals for different key values (or what you'd expect to be distinct keys) must return false.
If that doesn't help, post a minimal, yet complete (compilable) example that demonstrates your problem.

As for the size=9 but only 7 values in the table, you are misunderstanding the internal workings of the HashMap. All values are not stored in the top-level table. The table is more like "buckets" that store entries grouped by certain hashcode ranges. Each "bucket" holds a chain of linked entries so what you are seeing in the table are just the first entries in each particular range chain. The size is always correct though, in terms of total number of entries in the map.
As for entries overwriting each other, that happens only when you put an entry with a key that is identical (hashCode and equals) to an existing entry. So you are either adding with an existing key, or you are adding with null as key (null is permissible as key, but you can only have one entry with the key null).
Check your code, are you adding with null keys? If you are using instances of a custom class (one you created yourself) as key, have you implemented hashCode() and equals() according to the specifications (see http://download.oracle.com/javase/6/docs/api/java/lang/Object.html#hashCode%28%29)? Are you making sure that you are really using unique keys for all 12 put operations?
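To make the checklist above concrete, here is a minimal sketch of a key class that follows the equals/hashCode contract. OrderKey and its fields are hypothetical names chosen for illustration; they are not taken from the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class KeyContractDemo {
    static Map<OrderKey, String> build() {
        Map<OrderKey, String> map = new HashMap<>();
        map.put(new OrderKey("EU", 1), "first");
        map.put(new OrderKey("EU", 2), "second");
        map.put(new OrderKey("EU", 1), "replaced");  // equal key: value overwritten, size unchanged
        map.put(null, "null key");                    // a single null key is allowed
        map.put(null, "null key again");              // same (null) key: overwritten
        return map;
    }

    public static void main(String[] args) {
        Map<OrderKey, String> map = build();
        System.out.println(map.size());                     // 3
        System.out.println(map.get(new OrderKey("EU", 1))); // replaced
    }
}

// Hypothetical key class, used only for illustration.
final class OrderKey {
    private final String region;
    private final int number;

    OrderKey(String region, int number) {
        this.region = region;
        this.number = number;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof OrderKey)) return false;
        OrderKey other = (OrderKey) o;
        return number == other.number && region.equals(other.region);
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields that equals() compares.
        return Objects.hash(region, number);
    }
}
```

If equals or hashCode were left out of OrderKey, the lookup with a fresh but "equal" instance would return null, which matches the symptom described in the question.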

Related

How to calculate the complexity of the HashMap search algorithm? [duplicate]

This question already has answers here:
Is a Java hashmap search really O(1)?
(15 answers)
Closed 4 years ago.
How to calculate the complexity of the HashMap search algorithm? I'm googling result of this calculation - O(1), but I don't understand how they arrived at these findings.
HashMap works on the hashing principle. It is a data structure that allows you to store and retrieve data in O(1) time on average, provided we know the key.
In hashing, hash functions are used to link key and value in HashMap. Objects are stored by calling the put(key, value) method of HashMap and retrieved by calling the get(key) method. When we call put, the hashCode() method of the key object is called so that the hash function of the map can find a bucket location to store the value object, which is actually an index of the internal array, known as the table. HashMap internally stores each mapping in the form of a Map.Entry object which contains both the key and the value object. When you want to retrieve the object, you call the get() method and again pass the key object. This time the key object again generates the same hash code (it must do so for the object to be found, which is why HashMap keys should be immutable, e.g. String) and we end up at the same bucket location. If there is only one object there, it is returned, and that's the value object you stored earlier. Things get a little tricky when collisions occur.
Collision: Since the internal array of HashMap has a limited size at any given time, if you keep storing objects, at some point the hash function will return the same bucket location for two different keys. This is called a collision in HashMap. In this case, a linked list is formed at that bucket location and the new entry is stored as the next node.
If we try to retrieve an object from this linked list, we need an extra check to find the correct value; this is done by the equals() method. Since each node contains an entry, HashMap keeps comparing each entry's key object with the passed key using equals(), and when it returns true, the Map returns the corresponding value.
Since searching a linked list is an O(n) operation, in the worst case hash collisions reduce a map to a linked list. This issue was addressed in Java 8 by converting long bucket chains into balanced trees, which can be searched in O(log n) time.
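The put/get flow described above can be sketched as a tiny separate-chaining table. This is a simplified illustration, not the real java.util.HashMap: the class name ChainedMapSketch is made up, the table never resizes, and there is no tree conversion.

```java
import java.util.Objects;

class ChainedMapSketch<K, V> {
    // One node of a bucket's linked list.
    private static final class Node<K, V> {
        final K key;
        V value;
        final Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] table;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedMapSketch(int capacity) {
        table = (Node<K, V>[]) new Node[capacity];
    }

    private int indexFor(Object key) {
        // Objects.hashCode returns 0 for null; floorMod keeps the index non-negative.
        return Math.floorMod(Objects.hashCode(key), table.length);
    }

    V put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {   // same key: replace the value
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node<>(key, value, table[i]); // new key: prepend to the chain
        size++;
        return null;
    }

    V get(Object key) {
        for (Node<K, V> n = table[indexFor(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) return n.value;
        }
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        ChainedMapSketch<String, Integer> m = new ChainedMapSketch<>(4);
        m.put("Aa", 1);
        m.put("BB", 2);   // "Aa" and "BB" have the same hashCode, so they share a bucket
        m.put("Aa", 3);   // same key: replaces 1
        System.out.println(m.size() + " " + m.get("Aa") + " " + m.get("BB")); // 2 3 2
    }
}
```

With a good hash function each chain stays short, which is where the average O(1) figure comes from; if every key landed in one bucket, get would walk the whole chain and cost O(n).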
By the way, you can easily verify how HashMap works by looking at the code of HashMap.java in your Eclipse IDE if you are keenly interested in the code, otherwise the logic is explained above.
Information On Buckets : An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
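As a rough worked example of the capacity and load factor arithmetic, the sketch below computes the resize threshold and a bucket index. The helper names are made up; the index computation mirrors the bit-spreading and masking approach found in the OpenJDK 8+ HashMap source, but treat it as an illustration rather than the exact implementation.

```java
class ResizeThresholdSketch {
    // Number of entries after which the table would be grown.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Maps a hash code to a bucket; assumes capacity is a power of two.
    static int bucketIndex(int hashCode, int capacity) {
        int h = hashCode ^ (hashCode >>> 16); // fold the high bits into the low bits
        return h & (capacity - 1);            // equivalent to h mod capacity for powers of two
    }

    public static void main(String[] args) {
        // Documented defaults: initial capacity 16, load factor 0.75.
        System.out.println(threshold(16, 0.75f));  // 12
        System.out.println(threshold(32, 0.75f));  // 24
        System.out.println(bucketIndex(10, 16));   // 10
    }
}
```

So with the defaults, the table holds up to 12 entries in 16 buckets; exceeding 12 entries triggers a rehash into roughly twice as many buckets.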

Questions about Java's library map classes?

The answers are (2) and (4) but not sure why. I don't have much foundation on these topics. Could someone please explain why these are the correct answers and why the others are incorrect.
Thank you
A HashMap is a data structure that consists of keys and values. Values are stored in the HashMap with an associated key. The values can then be retrieved by recalling from the HashMap with the same key you used when putting the value in.
1
TreeMaps and LinkedHashMaps are different versions of a Map. A HashMap uses hashing to store its keys, whereas a TreeMap uses a balanced binary search tree (a red-black tree) to store its keys, and a LinkedHashMap uses a hash table plus a linked list running through its entries. If you iterate over a HashMap, the keys will be returned in bucket order (unpredictable in most cases), because that's how they were stored. The TreeMap, however, has a tree of all the keys, so when you iterate over the tree, you'll get all the keys in actual sorted order. A LinkedHashMap keeps its entries in an ordered list, so an iterator will return the keys in the same order in which you inserted them.
2, 3, and 5
In a HashMap, values are looked up using their keys. If you had duplicate keys, the HashMap couldn't know which value to return. Therefore, every key in a HashMap must be unique, but the values do not have to be.
4
In a normal HashMap, the key is hashed and then inserted in the appropriate spot. With a TreeMap and a LinkedHashMap, you have the additional overhead of inserting the key into the tree or linked list which will take up additional time and memory.
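The ordering differences described above can be checked with a short program. This is a minimal sketch; the class name MapOrderDemo and the sample keys are arbitrary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapOrderDemo {
    // Inserts the same entries into whichever Map is passed in and returns the iteration order.
    static List<String> keysOf(Map<String, Integer> map) {
        map.put("pear", 1);
        map.put("apple", 2);
        map.put("fig", 3);
        map.put("apple", 4); // duplicate key: replaces the value, adds no new entry
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysOf(new HashMap<>()));       // order not specified
        System.out.println(keysOf(new TreeMap<>()));       // [apple, fig, pear]
        System.out.println(keysOf(new LinkedHashMap<>())); // [pear, apple, fig]
    }
}
```

In all three maps the keys are unique: putting "apple" a second time only changes its value.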

Get Value of HashMap using part of the key

I have a HashMap(String,Object). The key is a combination of more than one unique ID. I have an input, a string which is part of the key (one unique ID). I need to get the value from the HashMap using the part of the key I have, without iterating over thousands of values in the HashMap.
Can we achieve it using any Regex statement in HashMap.get()?
My key is xxx.yyy.zzz, where the combination xxx.zzz is unique throughout the Map. I have xxx and zzz as input. I also have the set of possible values of yyy (5-6 possibilities, which may increase as well) for a given zzz.
I have two options to solve this now.
Iterate over Map.Entry to check whether the key starts and ends with xxx and zzz respectively
Trial and Error Method
i. Form the key xxx.yyy.zzz with all possible yyys and check whether the key is present or not using .contains()
ii. But this way, if I do .contains() 5-6 times for each call, won't it loop through 5-6 times in the worst case?
iii. Also, I am creating more strings in the string pool.
Which one should I prefer?
The only way to retrieve a value from a HashMap without iterating over the entries/keys (which you don't want) is by searching for the full key.
If you require efficient search via a partial key, you should consider having a HashMap whose key is that partial key.
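One way to follow this suggestion is to keep a second HashMap keyed by the partial key. The sketch below assumes, as the question states, that the xxx.zzz combination is unique, and additionally assumes that the individual parts contain no dots. The class and method names are made up.

```java
import java.util.HashMap;
import java.util.Map;

class PartialKeyIndex {
    private final Map<String, Object> byFullKey = new HashMap<>();
    private final Map<String, Object> byOuterParts = new HashMap<>();

    // Stores the value under both the full key and the xxx.zzz partial key.
    void put(String xxx, String yyy, String zzz, Object value) {
        byFullKey.put(xxx + "." + yyy + "." + zzz, value);
        byOuterParts.put(xxx + "." + zzz, value);
    }

    Object findByFullKey(String fullKey) {
        return byFullKey.get(fullKey);
    }

    // Single hash lookup, no iteration and no guessing of yyy.
    Object findByOuterParts(String xxx, String zzz) {
        return byOuterParts.get(xxx + "." + zzz);
    }

    public static void main(String[] args) {
        PartialKeyIndex index = new PartialKeyIndex();
        index.put("u1", "typeA", "r9", "first");
        index.put("u2", "typeB", "r9", "second");
        System.out.println(index.findByOuterParts("u2", "r9")); // second
    }
}
```

The cost is one extra map entry per value; both maps must be updated together whenever entries are added or removed.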
No, it's not possible to use partial keys with a HashMap.
With TreeMap this can be achieved with a partial prefix of the wanted key, as it allows you to use tailMap(String key) to return a part of the map that would follow a specific key (i.e. your keypart). You'd still need to process the entries to see which ones would match the partial key.
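A prefix lookup on a TreeMap could look like the sketch below. It uses subMap rather than tailMap so that the view ends where the prefix ends; the upper bound trick (appending Character.MAX_VALUE) assumes no key contains that character right after the prefix. The class name, method names, and sample keys are made up.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class PrefixLookup {
    // View of all entries whose key starts with the prefix, located via the sorted order.
    static SortedMap<String, Integer> withPrefix(TreeMap<String, Integer> map, String prefix) {
        return map.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    // Narrow down by prefix xxx., then check the few remaining keys for the .zzz suffix.
    static Integer find(TreeMap<String, Integer> map, String xxx, String zzz) {
        for (Map.Entry<String, Integer> e : withPrefix(map, xxx + ".").entrySet()) {
            if (e.getKey().endsWith("." + zzz)) return e.getValue();
        }
        return null;
    }

    static TreeMap<String, Integer> sample() {
        TreeMap<String, Integer> map = new TreeMap<>();
        map.put("a.1.z", 1);
        map.put("a.2.q", 2);
        map.put("ab.1.z", 3);
        map.put("b.1.z", 4);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(withPrefix(sample(), "a.").keySet()); // [a.1.z, a.2.q]
        System.out.println(find(sample(), "a", "q"));             // 2
    }
}
```

Only the entries sharing the prefix are scanned, so the work is proportional to the number of keys under that prefix rather than to the size of the whole map.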
If your keys are like xxx.yyy.zzz and you want to use xxx.* type access then you could consider my MapFilter class.
It allows you to take a Map and filter it on a certain key prefix. This will do the searching for specific prefixes and retain the results of that search for later.
Can we achieve it using any Regex statement in HashMap.get()?
No, you can't. You need to pass the exact key to get the associated value.
Alternatively, you can iterate over the keys and collect the values for those that match. Then you can use a regex to match your input string against each key.
You cannot do this using a HashMap. However, you can use a TreeMap, which will internally store the keys according to their natural order. You can write a custom search method which will find the matching key, if it exists, in the sorted key set. If written correctly, this will take O(log N) time, which is substantially better than linear. The problem reduces to searching for a String in an ordered list of Strings.
As @Thilo pointed out, this solution assumes that you are trying to match a fragment of a key which starts at the beginning, and not anywhere else.
HashMap works on a hashing algorithm that maintains hash buckets based on the hash codes of keys, and based on that hash code the hash map retrieves the corresponding value. For that you need to override the equals() and hashCode() methods for custom objects.
So
If you try to get the value for a key, the key's hash code is generated and the fetch operation then happens based on that hash code.
If you do not give an exact match of the key, how would HashMap find the right bucket with a wrong hash code?

Collision resolution in Java HashMap

Java HashMap uses the put method to insert a K/V pair into the HashMap.
Let's say I have used the put method and now HashMap<Integer, Integer> has one entry with key 10 and value 17.
If I insert 10,20 into this HashMap, it simply replaces the previous entry with this entry due to a collision caused by the same key 10.
If the key collides, HashMap replaces the old K/V pair with the new K/V pair.
So my question is: when does the HashMap use the chaining collision resolution technique?
Why did it not form a linked list with key 10 and values 17, 20?
When you insert the pair (10, 17) and then (10, 20), there is technically no collision involved. You are just replacing the old value with the new value for a given key 10 (since in both cases, 10 is equal to 10 and also the hash code for 10 is always 10).
Collision happens when multiple keys hash to the same bucket. In that case, you need to make sure that you can distinguish between those keys. Chaining collision resolution is one of those techniques which is used for this.
As an example, let's suppose that two strings "abra ka dabra" and "wave my wand" yield hash codes 100 and 200 respectively. Assuming the total array size is 10, both of them end up in the same bucket (100 % 10 and 200 % 10). Chaining ensures that whenever you do map.get( "abra ka dabra" );, you end up with the correct value associated with the key. In the case of hash map in Java, this is done by using the equals method.
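The difference between replacing a value and resolving a collision can be seen with two real strings that share a hash code: "Aa" and "BB" both hash to 2112 under String.hashCode(). The class name in this sketch is made up.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionVsReplacement {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 17);
        map.put("BB", 20); // different key with the same hash code: kept as a separate entry
        map.put("Aa", 99); // same key: the old value 17 is replaced
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        Map<String, Integer> map = build();
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // 99
        System.out.println(map.get("BB")); // 20
    }
}
```

Both keys sit in the same bucket, and equals is what lets get return the right value for each.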
In a HashMap the key is an object that provides hashCode() and equals(Object) methods.
When you insert a new entry into the Map, it checks whether the hashCode is already known. Then, it will iterate through all entries with this hash code and test their keys for equality with .equals(). If an equal key is found, the new value replaces the old one. If not, it will create a new entry in the map.
Usually, when talking about maps, you speak of a collision when two objects have the same hashCode but are different. They are internally stored in a list.
It could have formed a linked list, indeed. It's just that Map contract requires it to replace the entry:
V put(K key, V value)
Associates the specified value with the specified key in this map (optional operation). If the map previously contained a mapping for the key, the old value is replaced by the specified value. (A map m is said to contain a mapping for a key k if and only if m.containsKey(k) would return true.)
http://docs.oracle.com/javase/6/docs/api/java/util/Map.html
For a map to store lists of values, it'd need to be a Multimap. Here's Google's: http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Multimap.html
A collection similar to a Map, but which may associate multiple values with a single key. If you call put(K, V) twice, with the same key but different values, the multimap contains mappings from the key to both values.
Edit: Collision resolution
That's a bit different. A collision happens when two different keys happen to have the same hash code, or two keys with different hash codes happen to map into the same bucket in the underlying array.
Consider HashMap's source (bits and pieces removed):
public V put(K key, V value) {
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    // i is the index where we want to insert the new element
    addEntry(hash, key, value, i);
    return null;
}
void addEntry(int hash, K key, V value, int bucketIndex) {
    // take the entry that's already in that bucket
    Entry<K,V> e = table[bucketIndex];
    // and create a new one that points to the old one = linked list
    table[bucketIndex] = new Entry<>(hash, key, value, e);
}
For those who are curious how the Entry class in HashMap comes to behave like a list, it turns out that HashMap defines its own static Entry class which implements Map.Entry. You can see for yourself by viewing the source code:
GrepCode for HashMap
First of all, you have got the concept of hashing a little wrong, and it has been rectified by @Sanjay.
And yes, Java does indeed implement a collision resolution technique. When two keys get hashed to the same value (the internal array is finite in size, so at some point two different keys will end up with the same hash value), a linked list is formed at the bucket location, where each item is stored as a Map.Entry object that contains a key-value pair. Accessing an object via a key will at worst require O(n) if the entry is present in such a list. The key you passed is compared with each key in the list using the equals() method.
From Java 8 onwards, however, long linked lists are replaced with trees (O(log n)).
Your case is not talking about collision resolution, it is simply replacement of older value with a new value for the same key because Java's HashMap can't contain duplicates (i.e., multiple values) for the same key.
In your example, the value 17 will be simply replaced with 20 for the same key 10 inside the HashMap.
If you are trying to put a different/new value for the same key, it is not the concept of collision resolution, rather it is simply replacing the old value with a new value for the same key. It is how HashMap has been designed and you can have a look at the below API (emphasis is mine) taken from here.
public V put(K key, V value)
Associates the specified value with the specified key in this map. If the map previously contained a mapping for the key, the old value is replaced.
On the other hand, collision resolution techniques come into play only when multiple keys end up with the same hash code (i.e., they fall in the same bucket location) where an entry is already stored. HashMap handles collision resolution by chaining, i.e., it stores the entries in a linked list (or a balanced tree since Java 8, depending on the number of entries).
Chaining is used when multiple keys end up with the same hash code and therefore land in the same bucket.
When the same key is put with a different value, the old value is replaced with the new value.
From Java 8 onwards, the linked list is converted to a balanced binary tree in the worst-case scenario.
A collision happens when 2 distinct keys generate the same hashCode() value.
The more collisions there are, the worse the performance of the HashMap.
Objects which are equal according to the equals method must return the same hashCode value.
When both objects return the same hash code, they will be placed in the same bucket.
There is a difference between collision and duplication.
Collision means the hash code and bucket are the same but the keys are not equal. Duplication means the same hash code and the same bucket, and in addition the equals method reports the keys as equal.
On a collision, the new entry is added to the existing bucket. On duplication, the new value replaces the old one.
It isn't defined to do so. In order to achieve this functionality, you need to create a map that maps keys to lists of values:
Map<Foo, List<Bar>> myMap;
Or, you could use the Multimap from google collections / guava libraries
There is no collision in your example. You use the same key, so the old value gets replaced with the new one. Now, if you used two different keys that map to the same hash code, then you'd have a collision. Even in that case HashMap does not chain values under one key: both entries are kept in the same bucket, each under its own key. If you want several values stored under a single key, you have to do it yourself, e.g. by using a list as a value.
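A minimal sketch of the list-as-value approach, using the question's key 10 with values 17 and 20. The class name is made up; computeIfAbsent is the standard java.util.Map method available since Java 8.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ListValuedMap {
    static Map<Integer, List<Integer>> build() {
        Map<Integer, List<Integer>> map = new HashMap<>();
        // Create the list on first use of a key, then append to it.
        map.computeIfAbsent(10, k -> new ArrayList<>()).add(17);
        map.computeIfAbsent(10, k -> new ArrayList<>()).add(20);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(build().get(10)); // [17, 20]
    }
}
```

The map still holds exactly one entry for key 10; the "chaining" of values lives in the list, not in the HashMap's buckets.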

What does it mean by "the hash table is open" in Java?

I was reading the Java api docs on Hashtable class and came across several questions. In the doc, it says "Note that the hash table is open: in the case of a "hash collision", a single bucket stores multiple entries, which must be searched sequentially. " I tried the following code myself
Hashtable<String, Integer> me = new Hashtable<String, Integer>();
me.put("one", new Integer(1));
me.put("two", new Integer(2));
me.put("two", new Integer(3));
System.out.println(me.get("one"));
System.out.println(me.get("two"));
The output was
1
3
Is this what it means by "open"?
What happened to the Integer 2? Was it collected as garbage?
Is there a "closed" example?
No, this is not what is meant by "open".
Note the difference between a key collision and a hash collision.
The Hashtable will not allow more than one entry with the same key (as in your example, you put two entries with the key "two", the second one (3) replaced the first one (2), and you were left with only the second one in the Hashtable).
A hash collision is when two different keys have the same hashcode (as returned by their hashCode() method). Different hash table implementations could treat this in different ways, mostly in terms of low-level implementation. Being "open", Hashtable will store a linked list of entries whose keys hash to the same value. This can cause, in the worst case, O(N) performance for simple operations, that normally would be O(1) in a hash map where the hashes mostly were different values.
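As for what happens to the replaced value: put hands it back to the caller as its return value, and after that the table no longer references it. A minimal sketch (the class name is made up):

```java
import java.util.Hashtable;

class PutReturnsOldValue {
    static Integer replaceTwo() {
        Hashtable<String, Integer> me = new Hashtable<>();
        me.put("one", 1);
        me.put("two", 2);
        // Same key: the mapping is replaced and the previous value is returned.
        return me.put("two", 3);
    }

    public static void main(String[] args) {
        System.out.println(replaceTwo()); // 2
    }
}
```

If the caller ignores the returned value and nothing else refers to the old object, it becomes eligible for garbage collection like any other unreferenced object.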
It means that two items with different keys that have the same hashcode end up in the same bucket.
In your case the keys "two" are the same and so the second put overwrites the first one.
But assuming that you have your own class
class Thingy {
    private final String name;
    public Thingy(String name) {
        this.name = name;
    }
    @Override
    public boolean equals(Object o) {
        ...
    }
    @Override
    public int hashCode() {
        // not the world's best idea
        return 1;
    }
}
And created multiple instances of it. i.e.
Thingy a = new Thingy("a");
Thingy b = new Thingy("b");
Thingy c = new Thingy("c");
And inserted them into a map. Then one bucket i.e. the bucket containing the stuff with hashcode 1 will contain a list (chain) of the three items.
Map<Thingy, Thingy> map = new HashMap<Thingy, Thingy>();
map.put(a, a);
map.put(b, b);
map.put(c, c);
So getting an item by any Thingy key would result in a lookup of the hashcode O(1) followed by a linear search O(n) on the list of items in the bucket with hashcode 1.
Also be careful to ensure that you obey the correct relationship when implementing hashCode and equals. Namely, if two objects are equal then they should have the same hash code, but not necessarily the other way round, as multiple keys are likely to get the same hash code.
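A runnable variant of the constant-hash idea is sketched below. The equals body is an assumption (comparison by name), since the original leaves it elided; the wrapper class name is made up.

```java
import java.util.HashMap;
import java.util.Map;

class ConstantHashDemo {
    static final class Thingy {
        private final String name;

        Thingy(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            // Assumed for illustration: two Thingys are equal when their names match.
            return o instanceof Thingy && ((Thingy) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return 1; // legal, but every key lands in the same bucket
        }
    }

    static Map<Thingy, String> build() {
        Map<Thingy, String> map = new HashMap<>();
        map.put(new Thingy("a"), "a");
        map.put(new Thingy("b"), "b");
        map.put(new Thingy("c"), "c");
        return map;
    }

    public static void main(String[] args) {
        Map<Thingy, String> map = build();
        System.out.println(map.size());               // 3
        System.out.println(map.get(new Thingy("b"))); // b
    }
}
```

The map stays correct because equals still distinguishes the keys; only performance suffers, since every lookup has to search the single shared bucket.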
Oh and for the full definitions of Open hashing and Closed hash tables look here http://www.c2.com/cgi/wiki?HashTable
Open means that if two keys are not equal, but have the same hash value, then they will be stored in the same "bucket". In this case, you can think of each bucket as a linked list, so if many things are stored in the same bucket, search performance will decrease.
Bucket 0: Nothing
Bucket 1: Item 1
Bucket 2: Item 2 -> Item 3
Bucket 3: Nothing
Bucket 4: Item 4
In this case, if you search for a key that hashes to bucket 2, you have to then perform an O(n) search on the list to find the key that equals what you're searching for. If the key hashes to Bucket 0, 1, 3, or 4, then you get an O(1) search performance.
It means that Hashtable uses open hashing (also known as separate chaining) to deal with hash collisions. If two separate keys have the same hashcode, both of them will be stored in the same bucket (in a list).
A hash is a computed function that maps one object ("one" or "two" in your sample) to (in this case) an integer. This means that there may be multiple values that map to the same integer ( an integer has a finite number of permitted values while there may be an infinite number of inputs) . In this case "equals" must be able to tell these two apart. So your code example is correct, but there may be some other key that has the same hashcode (and will be put in the same bucket as "two")
Warning: there are contradictory definitions of "open hashing" in common usage:
Quoting from http://www.c2.com/cgi/wiki?HashTable cited in another answer:
Caution: some people use the term "open hashing" to mean what I've called "closed hashing" here! The usage here is in accordance with that in TheArtOfComputerProgramming and IntroductionToAlgorithms, both of which are recommended references if you want to know more about hash tables.
For example, the above page defines "open hashing" as follows:
There are two main strategies. Open hashing, also called open addressing, says: when the table entry you need for a new key/value pair is already occupied, find another unused entry somehow and put it there. Closed hashing says: each entry in the table is a secondary data structure (usually a linked list, but there are other possibilities) containing the actual data, and this data structure can be extended without limit.
By contrast, the definition supplied by Wikipedia is:
In the strategy known as separate chaining, direct chaining, or simply chaining, each slot of the bucket array is a pointer to a linked list that contains the key-value pairs that hashed to the same location. Lookup requires scanning the list for an entry with the given key. Insertion requires appending a new entry record to either end of the list in the hashed slot. Deletion requires searching the list and removing the element. (The technique is also called open hashing or closed addressing, which should not be confused with 'open addressing' or 'closed hashing'.)
If so-called "experts" cannot agree what the term "open hashing" means, it is best to avoid using it.
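Whatever name is used, the alternative to chaining is easy to show in code. The sketch below illustrates the "find another unused entry" strategy (called open addressing in the Wikipedia quote) with linear probing. It is a deliberately minimal, made-up class: fixed capacity, String keys, int values, no removal and no resizing.

```java
class LinearProbingSketch {
    private final String[] keys;
    private final int[] values;
    private int size;

    LinearProbingSketch(int capacity) {
        keys = new String[capacity];
        values = new int[capacity];
    }

    // Returns the slot holding the key, or the first empty slot on its probe path.
    // Terminates because put() always leaves at least one slot empty.
    private int slotFor(String key) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length; // occupied by another key: try the next slot
        }
        return i;
    }

    void put(String key, int value) {
        int i = slotFor(key);
        if (keys[i] == null) {
            if (size == keys.length - 1) {
                throw new IllegalStateException("table full");
            }
            keys[i] = key;
            size++;
        }
        values[i] = value; // new key stored, or existing key's value replaced
    }

    Integer get(String key) {
        int i = slotFor(key);
        if (keys[i] == null) {
            return null;
        }
        return values[i];
    }

    // Exposed only to show where a key ended up; -1 if absent.
    int slotOf(String key) {
        int i = slotFor(key);
        return keys[i] == null ? -1 : i;
    }

    public static void main(String[] args) {
        LinearProbingSketch t = new LinearProbingSketch(8);
        t.put("Aa", 1);
        t.put("BB", 2); // same hash code as "Aa": probes on to the next free slot
        System.out.println(t.slotOf("Aa") + " " + t.slotOf("BB")); // 0 1
    }
}
```

Here every entry lives directly in the array and a colliding key is pushed to a neighbouring slot, whereas Hashtable and HashMap keep colliding entries together in one bucket.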
