I've heard in my degree classes that a HashTable will place a new entry into the 'next available' bucket if the new Key entry collides with another.
How would the HashTable still return the correct Value if this collision occurs when calling for one back with the colliding key?
I'm assuming that the Keys are String type and the hashCode() returns the default generated by say Java.
If I implement my own hashing function and use it as part of a look-up table (i.e. a HashMap or Dictionary), what strategies exist for dealing with collisions?
I've even seen notes relating to prime numbers! The information from a Google search wasn't very clear.
Hash tables deal with collisions in one of two ways.
Option 1: By having each bucket contain a linked list of elements that are hashed to that bucket. This is why a bad hash function can make lookups in hash tables very slow.
Option 2: If the hash table entries are all full, the hash table can increase the number of buckets it has and then redistribute all the elements in the table. The hash function returns an integer, and the hash table takes the result of the hash function modulo the size of the table so that it is sure to arrive at a bucket. By increasing the size, it will rehash and rerun the modulo calculations, which, if you are lucky, might send the objects to different buckets.
Java uses both option 1 and 2 in its hash table implementations.
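To make the modulo/rehash point concrete, here is a tiny hedged sketch (the class and method names are invented for the example; this is not JDK code):

public class BucketDemo {
    // floorMod avoids a negative index when the hash code is negative
    static int bucketIndex(int hash, int capacity) {
        return Math.floorMod(hash, capacity);
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex(3, 8));   // 3
        System.out.println(bucketIndex(11, 8));  // 3  -> collides at capacity 8
        System.out.println(bucketIndex(3, 16));  // 3
        System.out.println(bucketIndex(11, 16)); // 11 -> separated after a resize to 16
    }
}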
When you talk about "a hash table placing a new entry into the 'next available' bucket if the new key entry collides with another", you are describing the open addressing strategy of collision resolution for hash tables.
There are several strategies for a hash table to resolve collisions.
The first broad category requires that the keys (or pointers to them) be stored in the table, together with the associated values. It includes:
Separate chaining
Open addressing
Coalesced hashing
Cuckoo hashing
Robin Hood hashing
2-choice hashing
Hopscotch hashing
Another important way to handle collisions is dynamic resizing, which itself comes in several flavors:
Resizing by copying all entries
Incremental resizing
Monotonic keys
EDIT: the above is borrowed from wiki_hash_table, which is worth a look for more info.
There are multiple techniques available to handle collisions. I will explain some of them.
Chaining:
In chaining we use array indexes to store the values. If the hash code of a second value also points to the same index, then we replace that index's value with a linked list: all values pointing to that index are stored in the linked list, and the array index itself points to the head of the linked list.
But if there is only one hash code pointing to an index of the array, then the value is stored directly in that index. The same logic is applied while retrieving the values. This is used in Java HashMap/Hashtable to avoid collisions.
Linear probing: This technique is used when we have more indexes in the table than values to be stored. The linear probing technique works on the idea of incrementing until you find an empty slot. The pseudocode looks like this:
index = h(k)
while (val(index) is occupied)
    index = (index + 1) mod n
Double hashing technique: In this technique we use two hashing functions, h1(k) and h2(k). If the slot at h1(k) is occupied, then the second hashing function h2(k) is used to increment the index. The pseudocode looks like this:
index = h1(k)
while (val(index) is occupied)
    index = (index + h2(k)) mod n
Linear probing and double hashing are part of the open addressing technique, and they can only be used if the available slots outnumber the items to be added. Open addressing takes less memory than chaining because no extra structure is used, but it is slower because of all the movement involved in finding an empty slot. Also, in open addressing, when an item is removed from a slot we put a tombstone there to indicate that an item was removed (which is why the slot merely looks empty).
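To make the probing and tombstone ideas concrete, here is a minimal hedged sketch of an open-addressing table with linear probing; all names are invented for the example, there is no resizing, and it assumes fewer live entries (and tombstones) than slots:

// Minimal open-addressing table with linear probing and tombstones.
// Illustrative sketch only: fixed capacity, no resizing.
public class ProbingTable<K, V> {
    private static final Object TOMBSTONE = new Object();
    private final Object[] keys = new Object[16];
    private final Object[] vals = new Object[16];

    public void put(K key, V val) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        // probe past occupied slots and tombstones until an empty slot
        // or the same key is found (simplification: tombstones are not reused)
        while (keys[i] != null && !key.equals(keys[i])) {
            i = (i + 1) % keys.length;
        }
        keys[i] = key;
        vals[i] = val;
    }

    @SuppressWarnings("unchecked")
    public V get(K key) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        // a tombstone must NOT stop the search; only a truly empty slot does
        for (int probes = 0; probes < keys.length && keys[i] != null; probes++) {
            if (key.equals(keys[i])) {
                return (V) vals[i];
            }
            i = (i + 1) % keys.length;
        }
        return null;
    }

    public void remove(K key) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        for (int probes = 0; probes < keys.length && keys[i] != null; probes++) {
            if (key.equals(keys[i])) {
                keys[i] = TOMBSTONE; // mark rather than null out, so later probes continue past
                vals[i] = null;
                return;
            }
            i = (i + 1) % keys.length;
        }
    }
}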
For more information see this site.
I strongly suggest you read this blog post, which appeared on HackerNews recently:
How HashMap works in Java
In short, the answer is:

What will happen if two different HashMap key objects have the same hashcode?

They will be stored in the same bucket, as the next node of a linked list. And the keys' equals() method will be used to identify the correct key-value pair in the HashMap.
I've heard in my degree classes that a HashTable will place a new entry into the 'next available' bucket if the new Key entry collides with another.
This is actually not true, at least for the Oracle JDK (it is an implementation detail that could vary between different implementations of the API). Instead, each bucket contains a linked list of entries prior to Java 8, and a balanced tree in Java 8 or above.
then how would the HashTable still return the correct Value if this collision occurs when calling for one back with the colliding key?
It uses the equals() to find the actually matching entry.
If I implement my own hashing function and use it as part of a look-up table (i.e. a HashMap or Dictionary), what strategies exist for dealing with collisions?
There are various collision handling strategies with different advantages and disadvantages.
Wikipedia's entry on hash tables gives a good overview.
Update since Java 8: Java 8 uses a self-balanced tree for collision handling, improving the worst case from O(n) to O(log n) for lookup. The use of a self-balanced tree was introduced in Java 8 as an improvement over chaining (used until Java 7), which uses a linked list and has a worst case of O(n) for lookup (as it needs to traverse the list).
To answer the second part of your question: insertion is done by mapping a given element to a given index in the underlying array of the hashmap. However, when a collision occurs, all elements must still be preserved (stored in a secondary data structure, not just replaced in the underlying array). This is usually done by making each array component (slot) a secondary data structure (aka a bucket), and adding the element to the bucket residing at the given array index (unless the key already exists in the bucket, in which case it is replaced).
During lookup, the key is hashed to its corresponding array index, and a search is performed for an element matching the (exact) key in the given bucket. Because the bucket compares keys directly for equality, it does not need to handle collisions itself; this solves the problem, but at the cost of having to perform insertion and lookup on the secondary data structure. The key point is that a hashmap stores both the key and the value, so even if the hashes collide, keys are compared directly for equality (in the bucket) and can thus be uniquely identified.
Collision handling brings the worst-case performance of insertion and lookup from O(1) (no collision handling) up to O(n) for chaining (a linked list as the secondary data structure) and O(log n) for a self-balanced tree.
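As a small demo of that point (the BadKey class is invented for the example), two distinct keys with an identical hash code are still resolved correctly, because equals() disambiguates them inside the bucket:

import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // A deliberately bad key: every instance has the same hash code,
    // so all entries collide into the same bucket.
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, String> map = new HashMap<>();
        map.put(new BadKey("a"), "first");
        map.put(new BadKey("b"), "second");
        // Both lookups succeed despite the hash collision:
        System.out.println(map.get(new BadKey("a"))); // first
        System.out.println(map.get(new BadKey("b"))); // second
    }
}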
Java 8 has come with the following improvements/changes to HashMap objects in the case of high collisions.
The alternative String hash function added in Java 7 has been removed.
Buckets containing a large number of colliding keys will store their entries in a balanced tree instead of a linked list after a certain threshold is reached.
The above changes ensure performance of O(log(n)) in worst-case scenarios (https://www.nagarro.com/en/blog/post/24/performance-improvement-for-hashmap-in-java-8).
It will use the equals method to see if the key is present, even (and especially) if there is more than one element in the same bucket.
As there is some confusion about which algorithm Java's HashMap is using (in the Sun/Oracle/OpenJDK implementation), here are the relevant source code snippets (from OpenJDK, 1.6.0_20, on Ubuntu):
/**
 * Returns the entry associated with the specified key in the
 * HashMap. Returns null if the HashMap contains no mapping
 * for the key.
 */
final Entry<K,V> getEntry(Object key) {
    int hash = (key == null) ? 0 : hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}
This method (quoted from lines 355 to 371) is called when looking up an entry in the table, for example from get(), containsKey() and some others. The for loop here goes through the linked list formed by the entry objects.
Here is the code for the entry objects (lines 691-705 + 759):

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    final int hash;

    /**
     * Creates new entry.
     */
    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }

    // (methods left away, they are straight-forward implementations of Map.Entry)
}
Right after this comes the addEntry() method:
/**
 * Adds a new entry with the specified key, value and hash code to
 * the specified bucket. It is the responsibility of this
 * method to resize the table if appropriate.
 *
 * Subclass overrides this to alter the behavior of put method.
 */
void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}
This adds the new Entry at the front of the bucket, with a link to the old first entry (or null, if there was none). Similarly, the removeEntryForKey() method goes through the list and takes care of deleting only one entry, leaving the rest of the list intact.
So, there is a linked entry list for each bucket, and I very much doubt that this changed from _20 to _22, since it has been like this since 1.2.
(This code is (c) 1997-2007 Sun Microsystems, and available under GPL, but if you want to copy it, better use the original file, contained in src.zip in each JDK from Sun/Oracle, and also in OpenJDK.)
Here's a very simple hash table implementation in Java. It only implements put() and get(), but you can easily add whatever you like. It relies on Java's hashCode() method, which is implemented by all objects. You could easily create your own interface,
interface Hashable {
    int getHash();
}
and force it to be implemented by the keys if you like.
import java.util.ArrayList;
import java.util.List;

public class Hashtable<K, V> {

    private static class Entry<K, V> {
        private final K key;
        private final V val;

        Entry(K key, V val) {
            this.key = key;
            this.val = val;
        }
    }

    private static final int BUCKET_COUNT = 13;

    @SuppressWarnings("unchecked")
    private final List<Entry<K, V>>[] buckets = new List[BUCKET_COUNT];

    public Hashtable() {
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = new ArrayList<Entry<K, V>>();
        }
    }

    // floorMod keeps the index non-negative even when hashCode() is negative
    private int bucketOf(K key) {
        return Math.floorMod(key.hashCode(), BUCKET_COUNT);
    }

    public V get(K key) {
        for (Entry<K, V> e : buckets[bucketOf(key)]) {
            if (e.key.equals(key)) {
                return e.val;
            }
        }
        return null;
    }

    public void put(K key, V val) {
        List<Entry<K, V>> entries = buckets[bucketOf(key)];
        // replace an existing mapping for this key, if present
        for (int i = 0; i < entries.size(); i++) {
            if (entries.get(i).key.equals(key)) {
                entries.set(i, new Entry<K, V>(key, val));
                return;
            }
        }
        entries.add(new Entry<K, V>(key, val));
    }
}
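A quick usage sketch of the class above (values invented for the example):

Hashtable<String, Integer> table = new Hashtable<>();
table.put("one", 1);
table.put("two", 2);
table.put("one", 100);                 // replaces the earlier mapping
System.out.println(table.get("one"));  // 100
System.out.println(table.get("two"));  // 2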
There are various methods for collision resolution. Some of them are separate chaining, open addressing, Robin Hood hashing, Cuckoo hashing, etc.
Java uses separate chaining for resolving collisions in hash tables. Here is a great link to how it happens:
http://javapapers.com/core-java/java-hashtable/
If a HashMap gets two puts with the same key, for example:

HashMap map = new HashMap();
map.put("a", "abc");
map.put("a", "xyz");

So here we put the key "a" twice; suppose the first yields bucketindex = 1 and the second bucketindex = 9.
So my question is: if the bucket index for the two comes out different after applying the hashing algorithm, how is the duplicate key kept from being inserted, given that it is already present and a HashMap cannot have duplicate keys?
Please suggest your view on this.
There won't be any such thing as "second bucket index".
I suggest you add something like System.out.println(map.toString()) in order to see what that second put() has done to your map.
EDIT:
In the method put(key, value), the "bucket index" is computed as a function of the key element's value, not the value element's value (so "a" and "a" give the same index for the bucket). This function is supposed to be deterministic, so feeding it the same value ("a" in your case) will produce the same hashCode() and, subsequently, the same bucket index.
In Java, if the hashing function returns the same hash for two keys, equality of the two objects is determined by the equals() method. And if the objects are found equal, the old one is simply replaced by the new one.
Instead, if the objects are not equal, they just get chained in a linked list (or a balanced tree) and the map contains both objects, because they are different.
So, back to your question: "if bucket index for both is coming different after applying hashing algorithm" - this is impossible for equal objects. Equal objects must have the same hash code.
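A small demo of the behaviour described above (expected output in comments):

import java.util.HashMap;
import java.util.Map;

public class DuplicateKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", "abc");
        map.put("a", "xyz");             // same key: overwrites, no second bucket
        System.out.println(map);         // {a=xyz}
        System.out.println(map.size());  // 1
    }
}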
To make @Erwin's answer more clear, here's the source code of HashMap from the JDK:
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Digging deeper, you will find that the bucket index is calculated from the key's hash code.
To make it simple and straightforward: putting a duplicate key with a different value into the same HashMap will result in just one single entry, with the second put simply overwriting the value of that entry.
If your question is how to create a hashmap that can handle more than one value for the same key, what you need is a Map<K, List<V>>, so a new value is added to the list every time the key is the same.
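Here is a hedged sketch of that idea using computeIfAbsent (the names are invented for the example):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiValueDemo {
    public static void main(String[] args) {
        Map<String, List<String>> multi = new HashMap<>();
        // computeIfAbsent creates the list on first use, then we append to it
        multi.computeIfAbsent("a", k -> new ArrayList<>()).add("abc");
        multi.computeIfAbsent("a", k -> new ArrayList<>()).add("xyz");
        System.out.println(multi); // {a=[abc, xyz]}
    }
}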
I'm new to using hash table structures. I'm using LinkedHashMap (e.g. cache = new LinkedHashMap<K,V>(...)) to implement my own cache. I have a list of questions about this data structure:
I set a parameter capacity = 100 (e.g.), meaning the number of items in the bucket is limited to 100. Then if I insert a new item into this cache (when cache size = 100), am I correct in thinking the evict policy will happen?
In my implementation, keys are composite objects that include two items, like this:
class Key {
    public String a;
    public String b;

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((a == null) ? 0 : a.hashCode());
        result = prime * result + ((b == null) ? 0 : b.hashCode());
        return result;
    }
}
With this hashCode(), suppose the bucket already has 100 items. When I insert a new item, assuming that hashCode() returns a value duplicating that of a previous item, my understanding is that the LinkedHashMap will remove the eldest item using the evict policy and use a linked list to handle the collision for the new item, so the number of items in the bucket will be 99. Is that right?
Is there any way to identify which entries in the bucket currently contain a chain for handling collisions?
Answering question one:
You need to explicitly override the method removeEldestEntry to make the eviction work.
The default implementation returns false, so it won't remove any element:
protected boolean removeEldestEntry(Map.Entry<K,V> eldest) {
    return false;
}
Question two: nothing will be removed in your case if you don't override the method removeEldestEntry.
Question three: I don't think there is a way to handle such a situation.
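For illustration, here is a minimal LRU-style cache built on removeEldestEntry; the class name and MAX_ENTRIES value are invented for the sketch:

import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final int MAX_ENTRIES = 100;

    public LruCache() {
        // accessOrder = true: iteration order becomes least- to most-recently used
        super(16, 0.75f, true);
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the least-recently-used entry once we exceed the cap
        return size() > MAX_ENTRIES;
    }
}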
Please read this useful article to become more familiar with eviction algorithms based on LinkedHashMap:
http://javarticles.com/2012/06/lru-cache.html
For complementary reading, see also LFU eviction: http://javarticles.com/2012/06/lfu-cache.html
I set a parameter capacity = 100 (e.g.), it means that the number of items in the bucket is limited to 100. Then if I insert a new item into this cache (when cache size = 100), the evict policy will happen, right?
No, the capacity parameter is a hint to the constructor of how large you expect the map to become. It uses this to attempt to avoid needlessly resizing the map as you add elements. If you add more than the specified capacity it will just resize the map to fit more elements efficiently.
when I insert a new item, assuming that hashCode() returns a duplicate key for one of the previous items, then the LinkedHashMap will remove the eldest item as the evict policy and use a linked list to handle the collision for the new item, so the number of items in the bucket will be 99, is that right?
No, if two non-equal elements are inserted with the same hash code they will simply be placed in the same bucket, but both will still exist and be accessible. Of course if you specify a key that is equal to a key that currently exists in the map, that entry will be overwritten.
Is there any way to identify which entries in the bucket currently contain a chain for handling collisions?
Generally no. You could use reflection, but that would be arduous at best. What are you trying to accomplish that makes you think you'd need to do this?
The caching behavior provided by LinkedHashMap depends on you extending the class and implementing removeEldestEntry(). As you can see in the example in that method, you can add a check such as size() > MAX_ENTRIES to instruct the map to remove the oldest element when put() or putAll() is called.
If you need a more powerful cache you might like Guava's Cache and LoadingCache classes.
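As a hedged sketch of the Guava route (the cache size and names are invented for the example):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class GuavaCacheDemo {
    public static void main(String[] args) {
        // evicts entries once more than 100 are present
        Cache<String, String> cache = CacheBuilder.newBuilder()
                .maximumSize(100)
                .build();
        cache.put("key", "value");
        System.out.println(cache.getIfPresent("key")); // value
    }
}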
Capacity is not fixed. It will dynamically change based on the map usage.
From javadocs:
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
So the map will not remove items based on the number of entries.
A simple cache to use is provided by the Guava library.
The code I used originally, which threw a NullPointerException in some places for any HashMap containing 16 or more elements:
for (Entry<Integer, String> e : myHashMap.entrySet()) {
    System.out.println(e.getKey() + ": " + e.getValue());
}
The code I am now using, which works on the same HashMap, regardless of size:
int i = 0; // variable to show the index
int c = 0; // variable to count the number of items found
while (c < myHashMap.size()) {
    if (myHashMap.containsKey(i)) { // if the HashMap contains the key i
        System.out.println(i + ": " + myHashMap.get(i)); // print the found item
        c++; // count the number of objects found
    }
    i++; // iterate to the next key
}
What is the difference between the two? Why does the first one iterate over null values? And, more importantly, why does the first one iterate out of order if there are 16 or more items (i.e. 12, 13, 17, 15, 16, 19, 18 instead of the neat 12, 13, 14, 15, 16, 17, 18, 19 in the second)?
I think I am just starting to scratch the surface of Java, so I would like to understand why it was designed this way. Any book recommendations on this kind of thing are welcome.
You should read the documentation of a class and try to understand its purpose before starting to use it. HashMap provides efficient storage but no guaranteed order. It's just a coincidence that you didn't discover this with smaller HashMap sizes: the default capacity is 16, and the hash codes of contiguous Integer objects are contiguous too. But that is not a property you can rely on. You always have to assume no guaranteed order for a HashMap.
If you need the insertion order you can use a LinkedHashMap, if you need ascending order of the keys you can use a TreeMap. If you have a contiguous range of Integer keys and want ascending order you can simply use an array as well.
The foreach loop for(Entry<Integer, String> e : myHashMap.entrySet()) does not "iterate over null values". It iterates over the values contained in the HashMap, which are the values you added before. There can be at most one null key in the map, if you added one. You might see null values in the debugger when looking at the internal array of a HashMap; these are unused slots, as a HashMap has a capacity which can be larger than its size.
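A small demo of the three orderings mentioned above (the keys are invented for the example):

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class OrderDemo {
    public static void main(String[] args) {
        Integer[] keys = {19, 12, 17, 15};
        Map<Integer, String> hash = new HashMap<>();
        Map<Integer, String> linked = new LinkedHashMap<>();
        Map<Integer, String> tree = new TreeMap<>();
        for (Integer k : keys) {
            hash.put(k, "v" + k);
            linked.put(k, "v" + k);
            tree.put(k, "v" + k);
        }
        System.out.println(hash.keySet());   // no guaranteed order
        System.out.println(linked.keySet()); // [19, 12, 17, 15] -> insertion order
        System.out.println(tree.keySet());   // [12, 15, 17, 19] -> ascending key order
    }
}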
Java's HashMap uses the put method to insert K/V pairs into the map.
Let's say I have used the put method, and the HashMap<Integer, Integer> now has one entry with key 10 and value 17.
If I insert (10, 20) into this HashMap, it simply replaces the previous entry with this entry, due to collision because of the same key 10.
If the key collides, HashMap replaces the old K/V pair with the new K/V pair.
So my question is: when does the HashMap use the chaining collision resolution technique?
Why did it not form a linked list with key 10 and values 17 and 20?
When you insert the pair (10, 17) and then (10, 20), there is technically no collision involved. You are just replacing the old value with the new value for a given key 10 (since in both cases, 10 is equal to 10 and also the hash code for 10 is always 10).
Collision happens when multiple keys hash to the same bucket. In that case, you need to make sure that you can distinguish between those keys. Chaining collision resolution is one of those techniques which is used for this.
As an example, let's suppose that two strings "abra ka dabra" and "wave my wand" yield hash codes 100 and 200 respectively. Assuming the total array size is 10, both of them end up in the same bucket (100 % 10 and 200 % 10). Chaining ensures that whenever you do map.get( "abra ka dabra" );, you end up with the correct value associated with the key. In the case of hash map in Java, this is done by using the equals method.
In a HashMap the key is an object that provides the hashCode() and equals(Object) methods.
When you insert a new entry into the map, it checks whether the hashCode is already known. Then, it iterates through all objects with this hashcode and tests their equality with .equals(). If an equal object is found, the new value replaces the old one. If not, it creates a new entry in the map.
Usually, talking about maps, we say there is a collision when two objects have the same hashCode but are different. Such entries are internally stored in a list.
It could have formed a linked list, indeed. It's just that Map contract requires it to replace the entry:
V put(K key, V value)
Associates the specified value with the specified key in this map (optional operation). If the map previously contained a mapping for the key, the old value is replaced by the specified value. (A map m is said to contain a mapping for a key k if and only if m.containsKey(k) would return true.)
http://docs.oracle.com/javase/6/docs/api/java/util/Map.html
For a map to store lists of values, it'd need to be a Multimap. Here's Google's: http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Multimap.html
A collection similar to a Map, but which may associate multiple values with a single key. If you call put(K, V) twice, with the same key but different values, the multimap contains mappings from the key to both values.
Edit: Collision resolution
That's a bit different. A collision happens when two different keys happen to have the same hash code, or two keys with different hash codes happen to map into the same bucket in the underlying array.
Consider HashMap's source (bits and pieces removed):
public V put(K key, V value) {
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    // i is the index where we want to insert the new element
    addEntry(hash, key, value, i);
    return null;
}

void addEntry(int hash, K key, V value, int bucketIndex) {
    // take the entry that's already in that bucket
    Entry<K,V> e = table[bucketIndex];
    // and create a new one that points to the old one = linked list
    table[bucketIndex] = new Entry<>(hash, key, value, e);
}
For those who are curious how the Entry class in HashMap comes to behave like a list, it turns out that HashMap defines its own static Entry class which implements Map.Entry. You can see for yourself by viewing the source code:
GrepCode for HashMap
First of all, you have got the concept of hashing a little wrong, and that has been rectified by @Sanjay.
And yes, Java does implement a collision resolution technique. When two keys get hashed to the same value (because the internal array used is finite in size, and at some point the hashCode() method will return the same hash value for two different keys), a linked list is formed at the bucket location, where all the information is entered as Map.Entry objects that each contain a key-value pair. Accessing an object via a key will at worst require O(n) if the entry is present in such a list. Comparison between the key you passed and each key in such a list will be done by the equals() method.
Although, from Java 8, the linked lists are replaced with trees (O(log n)).
Your case is not talking about collision resolution; it is simply the replacement of the older value with a new value for the same key, because Java's HashMap can't contain duplicates (i.e., multiple values) for the same key.
In your example, the value 17 will simply be replaced with 20 for the same key 10 inside the HashMap.
If you are trying to put a different/new value for the same key, it is not the concept of collision resolution; rather, it is simply replacing the old value with a new value for the same key. It is how HashMap has been designed, and you can have a look at the below API (emphasis mine) taken from here.
public V put(K key, V value)
Associates the specified value with the specified key in this map. If the map previously contained a mapping for the key, the old value is replaced.
On the other hand, collision resolution techniques come into play only when multiple keys end up with the same hashcode (i.e., they fall in the same bucket location), where an entry is already stored. HashMap handles collision resolution using the concept of chaining, i.e., it stores the values in a linked list (or a balanced tree since Java 8, depending on the number of entries).
A collision happens when multiple keys end up with the same hash code and land in the same bucket.
When the same key is given different values, the old value is replaced with the new value.
The linked list is converted to a balanced binary tree from Java 8 onwards in the worst-case scenario.
A collision happens when two distinct keys generate the same hashCode() value. Many collisions lead to the worst-case performance of the hashmap.
Objects which are equal according to the equals method must return the same hashCode value. When both objects return the same hash code, they are moved into the same bucket.
There is a difference between collision and duplication. A collision means the hashcode and bucket are the same, while for a duplicate it is the same hashcode and the same bucket, and additionally the equals method comes into the picture.
When a collision is detected, the element is added alongside the existing key; but in the case of duplication, the new value replaces the old one.
It isn't defined to do so. In order to achieve this functionality, you need to create a map that maps keys to lists of values:
Map<Foo, List<Bar>> myMap;
Or, you could use the Multimap from the Google Collections / Guava libraries.
There is no collision in your example. You use the same key, so the old value gets replaced with the new one. Now, if you used two different keys that map to the same hash code, then you'd have a collision; but even then, HashMap would keep both entries, using equals() to tell the keys apart. If you want multiple values to accumulate for one key, you have to do it yourself, e.g. by using a list as the value.
I was reading the Java API docs on the Hashtable class and came across several questions. The doc says: "Note that the hash table is open: in the case of a 'hash collision', a single bucket stores multiple entries, which must be searched sequentially." I tried the following code myself:
Hashtable<String, Integer> me = new Hashtable<String, Integer>();
me.put("one", new Integer(1));
me.put("two", new Integer(2));
me.put("two", new Integer(3));
System.out.println(me.get("one"));
System.out.println(me.get("two"));
the output was
1
3
Is this what it means by "open"?
what happened to the Integer 2? collected as garbage?
Is there an "closed" example?
No, this is not what is meant by "open".
Note the difference between a key collision and a hash collision.
The Hashtable will not allow more than one entry with the same key (as in your example, you put two entries with the key "two", the second one (3) replaced the first one (2), and you were left with only the second one in the Hashtable).
A hash collision is when two different keys have the same hashcode (as returned by their hashCode() method). Different hash table implementations could treat this in different ways, mostly in terms of low-level implementation. Being "open", Hashtable will store a linked list of entries whose keys hash to the same value. This can cause, in the worst case, O(N) performance for simple operations, that normally would be O(1) in a hash map where the hashes mostly were different values.
It means that two items with different keys that have the same hashcode end up in the same bucket.
In your case the keys "two" are the same and so the second put overwrites the first one.
But assuming that you have your own class
class Thingy {
    private final String name;

    public Thingy(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        ...
    }

    @Override
    public int hashCode() {
        // not the world's best idea
        return 1;
    }
}
And created multiple instances of it. i.e.
Thingy a = new Thingy("a");
Thingy b = new Thingy("b");
Thingy c = new Thingy("c");
And inserted them into a map. Then one bucket, i.e. the bucket containing the entries with hashcode 1, will contain a list (chain) of the three items.
Map<Thingy, Thingy> map = new HashMap<Thingy, Thingy>();
map.put(a, a);
map.put(b, b);
map.put(c, c);
So getting an item by any Thingy key would result in a lookup of the hashcode O(1) followed by a linear search O(n) on the list of items in the bucket with hashcode 1.
Also be careful to ensure that you obey the correct relationship when implementing hashCode and equals. Namely, if two objects are equal, then they should have the same hashcode, but not necessarily the other way round, as multiple keys are likely to get the same hashcode.
Oh and for the full definitions of Open hashing and Closed hash tables look here http://www.c2.com/cgi/wiki?HashTable
Open means that if two keys are not equal, but have the same hash value, then they will be stored in the same "bucket". In this case, you can think of each bucket as a linked list, so if many things are stored in the same bucket, search performance will decrease.
Bucket 0: Nothing
Bucket 1: Item 1
Bucket 2: Item 2 -> Item 3
Bucket 3: Nothing
Bucket 4: Item 4
In this case, if you search for a key that hashes to bucket 2, you have to then perform an O(n) search on the list to find the key that equals what you're searching for. If the key hashes to Bucket 0, 1, 3, or 4, then you get an O(1) search performance.
It means that Hashtable uses open hashing (also known as separate chaining) to deal with hash collisions. If two separate keys have the same hashcode, both of them will be stored in the same bucket (in a list).
A hash is a computed function that maps one object ("one" or "two" in your sample) to (in this case) an integer. This means that there may be multiple values that map to the same integer (an integer has a finite number of permitted values, while there may be an infinite number of inputs). In this case "equals" must be able to tell these two apart. So your code example is correct, but there may be some other key that has the same hashcode (and would be put in the same bucket as "two").
Warning: there are contradictory definitions of "open hashing" in common usage:
Quoting from http://www.c2.com/cgi/wiki?HashTable cited in another answer:
Caution: some people use the term "open hashing" to mean what I've called "closed hashing" here! The usage here is in accordance with that in TheArtOfComputerProgramming and IntroductionToAlgorithms, both of which are recommended references if you want to know more about hash tables.
For example, the above page defines "open hashing" as follows:
There are two main strategies. Open hashing, also called open addressing, says: when the table entry you need for a new key/value pair is already occupied, find another unused entry somehow and put it there. Closed hashing says: each entry in the table is a secondary data structure (usually a linked list, but there are other possibilities) containing the actual data, and this data structure can be extended without limit.
By contrast, the definition supplied by Wikipedia is:
In the strategy known as separate chaining, direct chaining, or simply chaining, each slot of the bucket array is a pointer to a linked list that contains the key-value pairs that hashed to the same location. Lookup requires scanning the list for an entry with the given key. Insertion requires appending a new entry record to either end of the list in the hashed slot. Deletion requires searching the list and removing the element. (The technique is also called open hashing or closed addressing, which should not be confused with 'open addressing' or 'closed hashing'.)
If so-called "experts" cannot agree what the term "open hashing" means, it is best to avoid using it.