How to calculate the complexity of the HashMap search algorithm? [duplicate]


This question already has answers here:
Is a Java hashmap search really O(1)?
(15 answers)
Closed 4 years ago.
How to calculate the complexity of the HashMap search algorithm? Everything I find online gives the result O(1), but I don't understand how that result is derived.

HashMap works on the principle of hashing. It is a data structure that lets you store and retrieve data in O(1) time on average, provided you know the key.
In hashing, a hash function links a key to its value. Objects are stored by calling the put(key, value) method of HashMap and retrieved by calling get(key). When put is called, the hashCode() method of the key object is called so that the map's hash function can find a bucket location for the entry; that location is an index into the internal array, known as the table. HashMap stores each mapping internally as a Map.Entry object that contains both the key and the value. To retrieve the value, you call get() and pass the key object again. The key must produce the same hash code as before so that we end up at the same bucket location; this is why objects used as HashMap keys should not change while they are in the map, and why immutable types such as String make good keys. If the bucket holds only one entry and its key matches, that entry's value is returned. Things get a little tricky when collisions occur.
Collision : The internal array of a HashMap has a finite size, so if you keep storing objects, at some point the hash function will return the same bucket location for two different keys. This is called a collision. In that case a linked list is formed at that bucket location and the new entry is stored as the next node.
When we retrieve an object from such a linked list, an extra check is needed to find the correct entry, and this is done with the equals() method. Each node contains an entry, so HashMap compares each entry's key with the passed key using equals(), and when it returns true, the map returns the corresponding value.
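The put/get flow described above can be sketched in a few lines. This is a teaching sketch only, not the java.util.HashMap source: the class name ChainedMapSketch, the fixed capacity and the floorMod-based index are all assumptions made for illustration, and there is no resizing.

```java
import java.util.Objects;

/** Hypothetical teaching sketch of a chained hash map; not the real HashMap. */
class ChainedMapSketch<K, V> {
    /** One node of a bucket's singly linked list. */
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] table;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedMapSketch(int capacity) {
        table = (Node<K, V>[]) new Node[capacity];
    }

    /** Maps a key's hash code to a bucket index (floorMod keeps it non-negative). */
    private int indexFor(Object key) {
        return Math.floorMod(Objects.hashCode(key), table.length);
    }

    /** Stores the mapping; returns the previous value for an equal key, or null. */
    public V put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) { // same key: replace the value
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        table[i] = new Node<>(key, value, table[i]); // different key: chain a new node
        size++;
        return null;
    }

    /** Walks only the one bucket the key hashes to, comparing keys with equals(). */
    public V get(Object key) {
        for (Node<K, V> n = table[indexFor(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        ChainedMapSketch<Integer, String> map = new ChainedMapSketch<>(4);
        map.put(1, "a");
        map.put(5, "b"); // 1 and 5 share bucket 1 when the capacity is 4
        System.out.println(map.get(1) + " " + map.get(5)); // a b
    }
}
```

The cost of get is one index computation plus a walk along a single bucket, which is why the lookup is O(1) as long as the chains stay short.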
Since searching a linked list is an O(n) operation, in the worst case hash collisions reduce the map to a linked list. Java 8 addressed this by converting a bucket's linked list into a balanced tree once the bucket grows large, so that it can be searched in O(log n) time.
By the way, you can verify how HashMap works by reading HashMap.java in your IDE if you are interested in the code; otherwise the logic is as explained above.
Information On Buckets : An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
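As a worked example of the rule just quoted, here is the arithmetic for the documented defaults of java.util.HashMap (initial capacity 16, load factor 0.75). The class and method names are made up for this sketch, and the real implementation may round or cap capacities differently.

```java
/** Hypothetical sketch of the "entries > loadFactor * capacity" resize rule. */
class ResizeThresholdSketch {
    /** Number of entries above which a table of the given capacity is grown. */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    /** Capacity reached by repeated doubling until `entries` no longer exceeds the threshold. */
    static int capacityAfterInserting(int entries, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        while (entries > threshold(capacity, loadFactor)) {
            capacity *= 2; // the table is rebuilt with twice the number of buckets
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));                    // 12
        System.out.println(capacityAfterInserting(13, 16, 0.75f));   // 32
        System.out.println(capacityAfterInserting(1000, 16, 0.75f)); // 2048
    }
}
```

Because the capacity doubles each time, the number of buckets stays proportional to the number of entries, and that keeps the average bucket short.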

Related

Why does Java HashMap contain null values in the debug view

While debugging I found a strange behavior.
I got a HashMap<Integer, Set<Term>> (Term is a class which only contains a String) the normal toString() shows this:
When I click the table property of the HashMap I get this:
My Question now, why are there null values in the table toString() ?
Edit: Thanks for your fast answers! If I could, I would accept all of them...

HashMap is a Map implementation whose crucial feature is constant-time, O(1), lookup.
The only data structure in computer science with constant time lookup is an array of fixed length. When you initialise the HashMap, it creates a fixed-length array that it will replace with a larger one when your entries exceed the current array's size.
Edit: @kutschkem has pointed out that java.util.HashMap expands its fixed-length array when the number of entries reaches around 75% of the current array's size (the default load factor is 0.75), rather than when the entries exceed the current array's size.

Because the Map implementation you are using starts with a set of hash buckets, some of which are null at the beginning (how many there are is determined by initialCapacity). If the number of entries grows past what the current table allows, it will create more hash buckets / slots for your objects. Think of this as a growth reserve the HashMap automatically creates for you.
Read more:
https://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html

The HashMap stores its entries in a hashtable. That is an array, and the hash function maps the key to one of the array entries (also called hash buckets).
With the default load factor of 0.75, at least 25% of the hash buckets are always empty. If they are not, then the array is resized to make sure there is enough free space.
The reason is that as the hash table gets filled up, collisions between hashes get more and more likely. You lose all advantages of the HashMap if collisions are too frequent. Too full, and your HashMap would be no better than a LinkedList (yes, LinkedList, not ArrayList). It would probably be even worse.

That is how a hash map works: a large array (table), and for some key the following table entry is tried:
table[key.hashCode() % table.length]
That table slot is then used. If the slot already holds a key that is not equals(key), the collision has to be resolved; java.util.HashMap does this by chaining the entries within the same bucket.
So initially the table contains only nulls, and has size initialCapacity. The array is grown when the hash map becomes too full (loadFactor).

The HashMap uses internally an array to store the entries. Very much simplified, it does something like array_index = hashcode % array_length (again: very simplified, as it also needs to take care of hash collisions etc). This internal array is typically larger than the number of elements you store in the HashMap -- otherwise, the array would have to be resized every time you add an element to it. So what you see as null are the yet unused slots in the array.
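To see where the nulls come from, here is a tiny sketch of such a backing array. The class NullSlotsSketch is hypothetical and ignores collisions entirely; it only shows that slots no key maps to stay null.

```java
import java.util.Arrays;

/** Hypothetical sketch: a backing array in which unused slots remain null. */
class NullSlotsSketch {
    /** Places each key in slot floorMod(hashCode, capacity); collisions are ignored here. */
    static Object[] fill(int capacity, Object... keys) {
        Object[] table = new Object[capacity]; // every slot starts out as null
        for (Object key : keys) {
            table[Math.floorMod(key.hashCode(), capacity)] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        // An Integer's hash code is its own value, so 1, 4 and 6 land in slots 1, 4 and 6.
        System.out.println(Arrays.toString(fill(8, 1, 4, 6)));
        // [null, 1, null, null, 4, null, 6, null]
    }
}
```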

This is normal behavior.
There are null values because the table array was initialized as being filled with nulls, and uses null to indicate that there are no values stored in that hash bucket.
The toString() function provided doesn't skip over them because seeing them was useful to the folks debugging the HashMap implementation.
If you want to see the contents without the nulls, you'll have to write your own display function, either by subclassing HashMap and overriding toString() or by providing a convenience function somewhere in your code.

Collision resolution in Java HashMap

Java HashMap uses the put method to insert a K/V pair into the HashMap.
Let's say I have used the put method and now a HashMap<Integer, Integer> has one entry with key 10 and value 17.
If I insert 10,20 into this HashMap, it simply replaces the previous entry with this entry, due to a collision caused by the same key 10.
If the key collides, HashMap replaces the old K/V pair with the new K/V pair.
So my question is: when does the HashMap use the chaining collision resolution technique?
Why did it not form a linked list with key 10 and values 17,20?

When you insert the pair (10, 17) and then (10, 20), there is technically no collision involved. You are just replacing the old value with the new value for a given key 10 (since in both cases, 10 is equal to 10 and also the hash code for 10 is always 10).
Collision happens when multiple keys hash to the same bucket. In that case, you need to make sure that you can distinguish between those keys. Chaining collision resolution is one of those techniques which is used for this.
As an example, let's suppose that two strings "abra ka dabra" and "wave my wand" yield hash codes 100 and 200 respectively. Assuming the total array size is 10, both of them end up in the same bucket (100 % 10 and 200 % 10). Chaining ensures that whenever you do map.get( "abra ka dabra" );, you end up with the correct value associated with the key. In the case of hash map in Java, this is done by using the equals method.
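Both situations are easy to try against the real java.util.HashMap. The sketch below relies on the fact that the strings "Aa" and "BB" are different but have the same hashCode() (2112); the class and method names are just for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch: replacing a value versus a genuine hash collision. */
class ReplaceVersusCollide {
    /** Same key twice: the second put replaces the value, no chaining involved. */
    static Map<Integer, Integer> sameKeyTwice() {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(10, 17);
        map.put(10, 20);
        return map;
    }

    /** Two different keys with equal hash codes: both entries are kept. */
    static Map<String, Integer> collidingKeys() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(sameKeyTwice());                      // {10=20}
        System.out.println("Aa".hashCode() == "BB".hashCode());  // true
        Map<String, Integer> colliding = collidingKeys();
        System.out.println(colliding.size());                    // 2
        System.out.println(colliding.get("Aa") + " " + colliding.get("BB")); // 1 2
    }
}
```

In the second map, equals() is what tells the two keys apart once they have landed in the same bucket.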

In a HashMap the key is an object, that contains hashCode() and equals(Object) methods.
When you insert a new entry into the Map, it checks whether the hashCode is already known. Then, it will iterate through all objects with this hashcode, and test their equality with .equals(). If an equal object is found, the new value replaces the old one. If not, it will create a new entry in the map.
Usually, when talking about maps, you speak of a collision when two objects have the same hashCode but are different. They are internally stored in a list.

It could have formed a linked list, indeed. It's just that Map contract requires it to replace the entry:
V put(K key, V value)
Associates the specified value with the specified key in this map (optional operation). If the map previously contained a mapping for the key, the old value is replaced by the specified value. (A map m is said to contain a mapping for a key k if and only if m.containsKey(k) would return true.)
http://docs.oracle.com/javase/6/docs/api/java/util/Map.html
For a map to store lists of values, it'd need to be a Multimap. Here's Google's: http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Multimap.html
A collection similar to a Map, but which may associate multiple values with a single key. If you call put(K, V) twice, with the same key but different values, the multimap contains mappings from the key to both values.
Edit: Collision resolution
That's a bit different. A collision happens when two different keys happen to have the same hash code, or two keys with different hash codes happen to map into the same bucket in the underlying array.
Consider HashMap's source (bits and pieces removed):
public V put(K key, V value) {
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
// i is the index where we want to insert the new element
addEntry(hash, key, value, i);
return null;
}
void addEntry(int hash, K key, V value, int bucketIndex) {
// take the entry that's already in that bucket
Entry<K,V> e = table[bucketIndex];
// and create a new one that points to the old one = linked list
table[bucketIndex] = new Entry<>(hash, key, value, e);
}
For those who are curious how the Entry class in HashMap comes to behave like a list, it turns out that HashMap defines its own static Entry class which implements Map.Entry. You can see for yourself by viewing the source code:
GrepCode for HashMap

First of all, you have got the concept of hashing a little wrong, and it has been rectified by @Sanjay.
And yes, Java does implement a collision resolution technique. When two keys are hashed to the same bucket (the internal array is finite in size, so at some point two different keys will map to the same location), a linked list is formed at that bucket location, and each key-value pair is stored in it as a Map.Entry object. Accessing an object via its key will at worst require O(n) time if the entry is present in such a list. The key you pass is compared with each key in the list using the equals() method.
However, from Java 8 on, long linked lists are replaced with trees (O(log n)).

Your case is not talking about collision resolution, it is simply replacement of older value with a new value for the same key because Java's HashMap can't contain duplicates (i.e., multiple values) for the same key.
In your example, the value 17 will be simply replaced with 20 for the same key 10 inside the HashMap.
If you are trying to put a different/new value for the same key, it is not the concept of collision resolution, rather it is simply replacing the old value with a new value for the same key. It is how HashMap has been designed and you can have a look at the below API (emphasis is mine) taken from here.
public V put(K key, V value)
Associates the specified value with the specified key in this map. If the map previously contained a mapping for the key, the old value is replaced.
On the other hand, collision resolution techniques come into play only when multiple distinct keys fall in the same bucket location (for example because they have the same hash code) where an entry is already stored. HashMap handles collision resolution by chaining, i.e., it stores the entries in a linked list (or in a balanced tree since Java 8, depending on the number of entries).

Collision resolution applies when multiple distinct keys produce the same hash code and therefore land in the same bucket.
When the same key is put again with a different value, the old value is replaced with the new value.
From Java 8 onwards, in the worst-case scenario the linked list in a bucket is converted to a balanced binary tree.
A collision happens when 2 distinct keys generate the same hashCode() value.
The more collisions there are, the worse the performance of the HashMap.
Objects which are equal according to the equals method must return the same hashCode value.
When two objects return the same hash code, they are placed in the same bucket.

There is a difference between collision and duplication.
Collision means the hash code and bucket are the same but the keys are not equal. Duplication also means the same hash code and the same bucket, but here the equals method comes into the picture and reports the keys as equal.
When a collision is detected, the new entry is added alongside the existing one in the bucket. In the case of duplication, the new value replaces the old one.

It isn't defined to do so. In order to achieve this functionality, you need to create a map that maps keys to lists of values:
Map<Foo, List<Bar>> myMap;
Or, you could use the Multimap from google collections / guava libraries
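A minimal sketch of the first option, using only the JDK (Java 8 or later for computeIfAbsent). The helper name add and the class name are assumptions; Integer stands in for the Foo and Bar types above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: keeping several values per key with a Map of Lists. */
class MultiValueSketch {
    /** Appends the value to the key's list, creating the list on first use. */
    static <K, V> void add(Map<K, List<V>> map, K key, V value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> myMap = new HashMap<>();
        add(myMap, 10, 17);
        add(myMap, 10, 20);
        System.out.println(myMap); // {10=[17, 20]}
    }
}
```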

There is no collision in your example. You use the same key, so the old value gets replaced with the new one. Now, if you used two different keys that map to the same hash code, then you'd have a collision. But even in that case, HashMap would not chain several values under one key: each distinct key keeps its own entry, and putting an existing key again still replaces its value. If you want several values to be kept for one key, you have to do it yourself, e.g. by using a list as the value.

How are HashMap et al really O(1)? [duplicate]

This question already has answers here:
Is a Java hashmap search really O(1)?
(15 answers)
Closed 9 years ago.
I'm studying Java collection performance characteristics, Big O notation and complexity, etc. There's a real-world part I can't wrap my head around, and that's why HashMap and other hash containers are considered O(1), which should mean that finding an entry by key in a 1,000 entry table should take about the same time as a 1,000,000 entry table.
Let's say you have HashMap myHashMap, stored with a key of first name + last name. If you call myHashMap.get("FredFlinstone"), how can it instantly find Fred Flinstone's Person object? How can it not have to iterate through the set of keys stored in the HashMap to find the pointer to the object? If there were 1,000,000 entries in the HashMap, the list of keys would also be 1,000,000 long (assuming no collision), which must take more time to go through than a list of 1,000, even if it were sorted. So how can the get() or containsKey() time not change with n?
Note: I thought my question would be answered in Is a Java hashmap really O(1)? but the answers didn't really address this point. My question is also not about collisions.

"My question is also not about collisions." - Actually it is indirectly. No collision = O(1) ...
In the worst case (pathological case), there would be one bucket with N items hanging off it, then it would be O(N)

Let's take a look at a very simple example of a hash map and a hash function. To keep things simple, let's say that your hash map has 10 buckets and that it uses integers as keys. For the purposes of this example we shall use the following hash function:
public int hash(int key) {
return key % 10;
}
Now, when we want to store an entry in the map, we hash the key, get an integer between 0 and 9, and then put that entry in the corresponding bucket. Then, when we need to look up a key, we just have to compute its hash and we know exactly which bucket it is in (or would be in) without having to look in any of the others.
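To make "without having to look in any of the others" concrete, the sketch below stores keys in ten buckets with the hash function above and counts how many stored keys a lookup has to examine. The class name, the list-of-lists layout and the restriction to non-negative keys are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the ten-bucket example; keys are non-negative ints. */
class TenBucketSketch {
    static final int BUCKETS = 10;

    /** The hash function from the example above. */
    static int hash(int key) {
        return key % BUCKETS;
    }

    /** Stores the keys 0 .. count-1, each in the bucket chosen by hash(). */
    static List<List<Integer>> build(int count) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < BUCKETS; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int key = 0; key < count; key++) {
            buckets.get(hash(key)).add(key);
        }
        return buckets;
    }

    /** Counts the stored keys examined before `key` is found or its bucket is exhausted. */
    static int comparisons(List<List<Integer>> buckets, int key) {
        int examined = 0;
        for (int candidate : buckets.get(hash(key))) {
            examined++;
            if (candidate == key) {
                break;
            }
        }
        return examined;
    }

    public static void main(String[] args) {
        List<List<Integer>> buckets = build(25);
        System.out.println(buckets.get(7));           // [7, 17]
        System.out.println(comparisons(buckets, 17)); // 2, not 25
    }
}
```

With a fixed ten buckets the chains would still grow as n/10, which is why real implementations add buckets as the map fills up.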

Computing the hash function on a given key takes constant time. Looking up whether there is a value stored for that key is a random access operation, because the hashmap is backed by an array. The only problem is making sure that different keys with the same hash value (a hash collision) don't occur too often. If it happens about once in n, that's enough for constant time in the average case.

How does hashing have an O(1) search time? [duplicate]

This question already has answers here:
Can hash tables really be O(1)?
(10 answers)
Closed 5 years ago.
When we use a HashTable for storing data, it is said that searching takes O(1) time. I am confused, can anybody explain?

Well it's a little bit of a lie -- it can take longer than that, but it usually doesn't.
Basically, a hash table is an array containing all of the keys to search on. The position of each key in the array is determined by the hash function, which can be any function which always maps the same input to the same output. We shall assume that the hash function is O(1).
So when we insert something into the hash table, we use the hash function (let's call it h) to find the location where to put it, and put it there. Now we insert another thing, hashing and storing. And another. Each time we insert data, it takes O(1) time to insert it (since the hash function is O(1)).
Looking up data is the same. If we want to find a value, x, we have only to find out h(x), which tells us where x is located in the hash table. So we can look up any hash value in O(1) as well.
Now to the lie: The above is a very nice theory with one problem: what if we insert data and there is already something in that position of the array? There is nothing which guarantees that the hash function won't produce the same output for two different inputs (unless you have a perfect hash function, but those are tricky to produce). Therefore, when we insert we need to take one of two strategies:
Store multiple values at each spot in the array (say, each array slot has a linked list). Now when you do a lookup, it is still O(1) to arrive at the correct place in the array, but potentially a linear search down a (hopefully short) linked list. This is called "separate chaining".
If you find something is already there, hash again and find another location. Repeat until you find an empty spot, and put it there. The lookup procedure can follow the same rules to find the data. Now it's still O(1) to get to the first location, but there is a potentially (hopefully short) linear search to bounce around the table till you find the data you are after. This is called "open addressing".
Basically, both approaches are still mostly O(1) but with a hopefully-short linear sequence. We can assume for most purposes that it is O(1). If the hash table is getting too full, those linear searches can become longer and longer, and then it is time to "re-hash" which means to create a new hash table of a much bigger size and insert all the data back into it.
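For the second strategy, here is a minimal sketch that uses linear probing, one simple form of open addressing: instead of hashing again, it just tries the next slot. Everything here (class name, String keys, int values, fixed capacity, no removal) is an assumption made for illustration.

```java
/** Hypothetical sketch of open addressing with linear probing; no resizing, no removal. */
class LinearProbingSketch {
    private final String[] keys;
    private final int[] values;
    private int size;

    LinearProbingSketch(int capacity) {
        keys = new String[capacity];
        values = new int[capacity];
    }

    /** Returns the slot holding `key`, else the first empty slot on its probe path, else -1. */
    private int slotFor(String key) {
        int start = Math.floorMod(key.hashCode(), keys.length);
        for (int step = 0; step < keys.length; step++) {
            int i = (start + step) % keys.length; // occupied by another key: try the next slot
            if (keys[i] == null || keys[i].equals(key)) {
                return i;
            }
        }
        return -1; // every slot is taken by other keys
    }

    void put(String key, int value) {
        int i = slotFor(key);
        if (i < 0) {
            throw new IllegalStateException("table full; a real table would re-hash into a bigger array");
        }
        if (keys[i] == null) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    /** Follows the same probe path as put; returns null if the key is absent. */
    Integer get(String key) {
        int i = slotFor(key);
        if (i < 0 || keys[i] == null) {
            return null;
        }
        return values[i];
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        LinearProbingSketch table = new LinearProbingSketch(4);
        table.put("Aa", 1); // "Aa" and "BB" have the same hash code,
        table.put("BB", 2); // so "BB" is pushed to the neighbouring slot
        System.out.println(table.get("Aa") + " " + table.get("BB")); // 1 2
    }
}
```

Removal is left out on purpose: with probing, clearing a slot can break the probe path of other keys, so real tables need extra bookkeeping for it.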

Searching takes O(1) time if there are no collisions in the hashtable, so it is incorrect to say that searching in a hashtable always takes O(1) or constant time.
See how Hashtable works on MSDN?
EDIT
mgiuca explains it beautifully and I am just adding one more collision-handling technique, which is called chaining.
In this technique, we maintain a linked list of values at each location, so when you have a collision, your value is added to the linked list at that position. When you search for a value, there may be a scenario in which you need to search the whole linked list, and in that case searching will not be an O(1) operation.

What does it mean by "the hash table is open" in Java?

I was reading the Java api docs on Hashtable class and came across several questions. In the doc, it says "Note that the hash table is open: in the case of a "hash collision", a single bucket stores multiple entries, which must be searched sequentially. " I tried the following code myself
Hashtable<String, Integer> me = new Hashtable<String, Integer>();
me.put("one", new Integer(1));
me.put("two", new Integer(2));
me.put("two", new Integer(3));
System.out.println(me.get("one"));
System.out.println(me.get("two"));
The output was
1
3
Is this what it means by "open"?
What happened to the Integer 2? Was it collected as garbage?
Is there a "closed" example?

No, this is not what is meant by "open".
Note the difference between a key collision and a hash collision.
The Hashtable will not allow more than one entry with the same key (as in your example, you put two entries with the key "two", the second one (3) replaced the first one (2), and you were left with only the second one in the Hashtable).
A hash collision is when two different keys have the same hashcode (as returned by their hashCode() method). Different hash table implementations could treat this in different ways, mostly in terms of low-level implementation. Being "open", Hashtable will store a linked list of entries whose keys hash to the same value. This can cause, in the worst case, O(N) performance for simple operations, that normally would be O(1) in a hash map where the hashes mostly were different values.
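As for what happened to the 2: put hands the replaced value back to the caller, which the short sketch below shows (class and method names are made up for illustration). After that the table no longer refers to the old value; whether the object is then garbage collected depends on whether anything else still references it.

```java
import java.util.Hashtable;

/** Illustrative sketch: put() returns the value it replaced. */
class PutReturnsPrevious {
    /** Puts "two" twice and returns what the second put reported as the old value. */
    static Integer replacedValue() {
        Hashtable<String, Integer> me = new Hashtable<>();
        me.put("one", 1);
        me.put("two", 2);
        return me.put("two", 3); // same key: 3 replaces 2, and 2 is returned
    }

    public static void main(String[] args) {
        System.out.println(replacedValue()); // 2
    }
}
```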

It means that two items with different keys that have the same hashcode end up in the same bucket.
In your case the keys "two" are the same and so the second put overwrites the first one.
But assuming that you have your own class
class Thingy {
private final String name;
public Thingy(String name) {
this.name = name;
}
@Override
public boolean equals(Object o) {
...
}
@Override
public int hashCode() {
// not the world's best idea: every instance lands in the same bucket
return 1;
}
}
And created multiple instances of it. i.e.
Thingy a = new Thingy("a");
Thingy b = new Thingy("b");
Thingy c = new Thingy("c");
And inserted them into a map. Then one bucket i.e. the bucket containing the stuff with hashcode 1 will contain a list (chain) of the three items.
Map<Thingy, Thingy> map = new HashMap<Thingy, Thingy>();
map.put(a, a);
map.put(b, b);
map.put(c, c);
So getting an item by any Thingy key would result in a lookup of the hashcode O(1) followed by a linear search O(n) on the list of items in the bucket with hashcode 1.
Also be careful to obey the correct relationship when implementing hashCode and equals. Namely, if two objects are equal then they must have the same hash code, but not necessarily the other way round, as multiple keys are likely to get the same hash code.
Oh and for the full definitions of Open hashing and Closed hash tables look here http://www.c2.com/cgi/wiki?HashTable

Open means that if two keys are not equal, but have the same hash value, then they will be stored in the same "bucket". In this case, you can think of each bucket as a linked list, so if many things are stored in the same bucket, search performance will decrease.
Bucket 0: Nothing
Bucket 1: Item 1
Bucket 2: Item 2 -> Item 3
Bucket 3: Nothing
Bucket 4: Item 4
In this case, if you search for a key that hashes to bucket 2, you have to then perform an O(n) search on the list to find the key that equals what you're searching for. If the key hashes to Bucket 0, 1, 3, or 4, then you get an O(1) search performance.

It means that Hashtable uses open hashing (also known as separate chaining) to deal with hash collisions. If two separate keys have the same hashcode, both of them will be stored in the same bucket (in a list).

A hash is a computed function that maps one object ("one" or "two" in your sample) to (in this case) an integer. This means that there may be multiple values that map to the same integer ( an integer has a finite number of permitted values while there may be an infinite number of inputs) . In this case "equals" must be able to tell these two apart. So your code example is correct, but there may be some other key that has the same hashcode (and will be put in the same bucket as "two")

Warning: there are contradictory definitions of "open hashing" in common usage:
Quoting from http://www.c2.com/cgi/wiki?HashTable cited in another answer:
Caution: some people use the term "open hashing" to mean what I've called "closed hashing" here! The usage here is in accordance with that in TheArtOfComputerProgramming and IntroductionToAlgorithms, both of which are recommended references if you want to know more about hash tables.
For example, the above page defines "open hashing" as follows:
There are two main strategies. Open hashing, also called open addressing, says: when the table entry you need for a new key/value pair is already occupied, find another unused entry somehow and put it there. Closed hashing says: each entry in the table is a secondary data structure (usually a linked list, but there are other possibilities) containing the actual data, and this data structure can be extended without limit.
By contrast, the definition supplied by Wikipedia is:
In the strategy known as separate chaining, direct chaining, or simply chaining, each slot of the bucket array is a pointer to a linked list that contains the key-value pairs that hashed to the same location. Lookup requires scanning the list for an entry with the given key. Insertion requires appending a new entry record to either end of the list in the hashed slot. Deletion requires searching the list and removing the element. (The technique is also called open hashing or closed addressing, which should not be confused with 'open addressing' or 'closed hashing'.)
If so-called "experts" cannot agree what the term "open hashing" means, it is best to avoid using it.
