What does it mean by "the hash table is open" in Java?

What does it mean by "the hash table is open" in Java? - java

I was reading the Java api docs on Hashtable class and came across several questions. In the doc, it says "Note that the hash table is open: in the case of a "hash collision", a single bucket stores multiple entries, which must be searched sequentially. " I tried the following code myself
Hashtable<String, Integer> me = new Hashtable<String, Integer>();
me.put("one", new Integer(1));
me.put("two", new Integer(2));
me.put("two", new Integer(3));
System.out.println(me.get("one"));
System.out.println(me.get("two"));
the out put was
1
3
Is this what it means by "open"?
what happened to the Integer 2? collected as garbage?
Is there an "closed" example?

No, this is not what is meant by "open".
Note the difference between a key collision and a hash collision.
The Hashtable will not allow more than one entry with the same key (as in your example, you put two entries with the key "two", the second one (3) replaced the first one (2), and you were left with only the second one in the Hashtable).
A hash collision is when two different keys have the same hashcode (as returned by their hashCode() method). Different hash table implementations could treat this in different ways, mostly in terms of low-level implementation. Being "open", Hashtable will store a linked list of entries whose keys hash to the same value. This can cause, in the worst case, O(N) performance for simple operations, that normally would be O(1) in a hash map where the hashes mostly were different values.

It means that two items with different keys that have the same hashcode end up in the same bucket.
In your case the keys "two" are the same and so the second put overwrites the first one.
But assuming that you have your own class
class Thingy {
private final String name;
public Thingy(String name) {
this.name = name;
}
public boolean equals(Object o) {
...
}
public int hashcode() {
//not the worlds best idea
return 1;
}
}
And created multiple instances of it. i.e.
Thingy a = new Thingy("a");
Thingy b = new Thingy("b");
Thingy c = new Thingy("c");
And inserted them into a map. Then one bucket i.e. the bucket containing the stuff with hashcode 1 will contain a list (chain) of the three items.
Map<Thingy, Thingy> map = new HashMap<Thingy, Thingy>();
map.put(a, a);
map.put(b, b);
map.put(c, c);
So getting an item by any Thingy key would result in a lookup of the hashcode O(1) followed by a linear search O(n) on the list of items in the bucket with hashcode 1.
Also be careful to ensure that you obey the correct relationship when implementing hashcode and equals. Namely if two objects are equal then they should have the same hascode, but not necessarily the otherway round as multiple keys are likely to get the same hashcode.
Oh and for the full definitions of Open hashing and Closed hash tables look here http://www.c2.com/cgi/wiki?HashTable

Open means that if two keys are not equal, but have the same hash value, then they will be stored in the same "bucket". In this case, you can think of each bucket as a linked list, so if many things are stored in the same bucket, search performance will decrease.
Bucket 0: Nothing
Bucket 1: Item 1
Bucket 2: Item 2 -> Item 3
Bucket 3: Nothing
Bucket 4: Item 4
In this case, if you search for a key that hashes to bucket 2, you have to then perform an O(n) search on the list to find the key that equals what you're searching for. If the key hashes to Bucket 0, 1, 3, or 4, then you get an O(1) search performance.

It means that Hashtable uses open hashing (also known as separate chaining) to deal with hash collisions. If two separate keys have the same hashcode, both of them will be stored in the same bucket (in a list).

A hash is a computed function that maps one object ("one" or "two" in your sample) to (in this case) an integer. This means that there may be multiple values that map to the same integer ( an integer has a finite number of permitted values while there may be an infinite number of inputs) . In this case "equals" must be able to tell these two apart. So your code example is correct, but there may be some other key that has the same hashcode (and will be put in the same bucket as "two")

Warning: there are contradictory definitions of "open hashing" in common usage:
Quoting from http://www.c2.com/cgi/wiki?HashTable cited in another answer:
Caution: some people use the term
"open hashing" to mean what I've
called "closed hashing" here! The
usage here is in accordance with that
in TheArtOfComputerProgramming and
IntroductionToAlgorithms, both of
which are recommended references if
you want to know more about hash
tables.
For example, the above page defines "open hashing" as follows:
There are two main strategies. Open
hashing, also called open addressing,
says: when the table entry you need
for a new key/value pair is already
occupied, find another unused entry
somehow and put it there. Closed
hashing says: each entry in the table
is a secondary data structure (usually
a linked list, but there are other
possibilities) containing the actual
data, and this data structure can be
extended without limit.
By contrast, the definition supplied by Wikipedia is:
In the strategy known as separate
chaining, direct chaining, or simply
chaining, each slot of the bucket
array is a pointer to a linked list
that contains the key-value pairs that
hashed to the same location. Lookup
requires scanning the list for an
entry with the given key. Insertion
requires appending a new entry record
to either end of the list in the
hashed slot. Deletion requires
searching the list and removing the
element. (The technique is also called
open hashing or closed addressing,
which should not be confused with
'open addressing' or 'closed
hashing'.)
If so-called "experts" cannot agree what the term "open hashing" means, it is best to avoid using it.

Related

Why `floorEntry` and other methods are not accessible in PatriciaTrie?

While implementing an ip-lookup structure, I was trying to maintain a set of keys in a trie-like structure that allows me to search the "floor" of a key (that is, the largest key that is less or equal to a given key). I decided to use Apache Collections 4 PatriciaTrie but sadly, I found that the floorEntry and related methods are not public. My current "dirty" solution is forcing them with reflection (in Scala):
val pt = new PatriciaTrie[String]()
val method = pt.getClass.getSuperclass.getDeclaredMethod("floorEntry", classOf[Object])
method.setAccessible(true)
// and then for retrieving the entry for floor(key)
val entry = method.invoke(pt, key).asInstanceOf[Entry[String, String]]
Is there any clean way to have the same functionality? Why this methods are not publicly available?

Why those methods are not public, I don't know. (Maybe it's because you can achieve what you want with common Map API).
Here's a way to fulfil your requirement:
PatriciaTrie<String> trie = new PatriciaTrie<>();
trie.put("a", "a");
trie.put("b", "b");
trie.put("d", "d");
String floorKey = trie.headMap("d").lastKey(); // d
According to the docs, this is very efficient, since it depends on the number of bits of the largest key of the trie.
EDIT: As per the comment below, the code above has a bounds issue: headMap() returns a view of the map whose keys are strictly lower than the given key. This means that, i.e. for the above example, trie.headMap("b").lastKey() will return "a", instead of "b" (as needed).
In order to fix this bounds issue, you can use the following trick:
String cFloorKey = trie.headMap("c" + "\uefff").lastKey(); // b
String dFloorKey = trie.headMap("d" + "\uefff").lastKey(); // d
Now everything works as expected, since \uefff is the highest unicode character. Actually, searching for key + "\uefff", whatever key is, will always return key if it belongs to the trie, or the element immediately prior to key, if key is not present in the trie.
Now, this trick works for String keys, but is extensible to other types as well. i.e. for Integer keys you could search for key + 1, for Date keys you could add 1 millisecond, etc.

Collision resolution in Java HashMap

Java HashMap uses put method to insert the K/V pair in HashMap.
Lets say I have used put method and now HashMap<Integer, Integer> has one entry with key as 10 and value as 17.
If I insert 10,20 in this HashMap it simply replaces the the previous entry with this entry due to collision because of same key 10.
If the key collides HashMap replaces the old K/V pair with the new K/V pair.
So my question is when does the HashMap use Chaining collision resolution technique?
Why it did not form a linkedlist with key as 10 and value as 17,20?

When you insert the pair (10, 17) and then (10, 20), there is technically no collision involved. You are just replacing the old value with the new value for a given key 10 (since in both cases, 10 is equal to 10 and also the hash code for 10 is always 10).
Collision happens when multiple keys hash to the same bucket. In that case, you need to make sure that you can distinguish between those keys. Chaining collision resolution is one of those techniques which is used for this.
As an example, let's suppose that two strings "abra ka dabra" and "wave my wand" yield hash codes 100 and 200 respectively. Assuming the total array size is 10, both of them end up in the same bucket (100 % 10 and 200 % 10). Chaining ensures that whenever you do map.get( "abra ka dabra" );, you end up with the correct value associated with the key. In the case of hash map in Java, this is done by using the equals method.

In a HashMap the key is an object, that contains hashCode() and equals(Object) methods.
When you insert a new entry into the Map, it checks whether the hashCode is already known. Then, it will iterate through all objects with this hashcode, and test their equality with .equals(). If an equal object is found, the new value replaces the old one. If not, it will create a new entry in the map.
Usually, talking about maps, you use collision when two objects have the same hashCode but they are different. They are internally stored in a list.

It could have formed a linked list, indeed. It's just that Map contract requires it to replace the entry:
V put(K key, V value)
Associates the specified value with the specified key in this map
(optional operation). If the map previously contained a mapping for
the key, the old value is replaced by the specified value. (A map m is
said to contain a mapping for a key k if and only if m.containsKey(k)
would return true.)
http://docs.oracle.com/javase/6/docs/api/java/util/Map.html
For a map to store lists of values, it'd need to be a Multimap. Here's Google's: http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Multimap.html
A collection similar to a Map, but which may associate multiple values
with a single key. If you call put(K, V) twice, with the same key but
different values, the multimap contains mappings from the key to both
values.
Edit: Collision resolution
That's a bit different. A collision happens when two different keys happen to have the same hash code, or two keys with different hash codes happen to map into the same bucket in the underlying array.
Consider HashMap's source (bits and pieces removed):
public V put(K key, V value) {
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
// i is the index where we want to insert the new element
addEntry(hash, key, value, i);
return null;
}
void addEntry(int hash, K key, V value, int bucketIndex) {
// take the entry that's already in that bucket
Entry<K,V> e = table[bucketIndex];
// and create a new one that points to the old one = linked list
table[bucketIndex] = new Entry<>(hash, key, value, e);
}
For those who are curious how the Entry class in HashMap comes to behave like a list, it turns out that HashMap defines its own static Entry class which implements Map.Entry. You can see for yourself by viewing the source code:
GrepCode for HashMap

First of all, you have got the concept of hashing a little wrong and it has been rectified by #Sanjay.
And yes, Java indeed implement a collision resolution technique. When two keys get hashed to a same value (as the internal array used is finite in size and at some point the hashcode() method will return same hash value for two different keys) at this time, a linked list is formed at the bucket location where all the informations are entered as an Map.Entry object that contains a key-value pair. Accessing an object via a key will at worst require O(n) if the entry in present in such a lists. Comparison between the key you passed with each key in such list will be done by the equals() method.
Although, from Java 8 , the linked lists are replaced with trees (O(log n))

Your case is not talking about collision resolution, it is simply replacement of older value with a new value for the same key because Java's HashMap can't contain duplicates (i.e., multiple values) for the same key.
In your example, the value 17 will be simply replaced with 20 for the same key 10 inside the HashMap.
If you are trying to put a different/new value for the same key, it is not the concept of collision resolution, rather it is simply replacing the old value with a new value for the same key. It is how HashMap has been designed and you can have a look at the below API (emphasis is mine) taken from here.
public V put(K key, V value)
Associates the specified value with the
specified key in this map. If the map previously contained a mapping
for the key, the old value is replaced.
On the other hand, collision resolution techniques comes into play only when multiple keys end up with the same hashcode (i.e., they fall in the same bucket location) where an entry is already stored. HashMap handles the collision resolution by using the concept of chaining i.e., it stores the values in a linked list (or a balanced tree since Java8, depends on the number of entries).

When multiple keys end up in same hash code which is present in same bucket.
When the same key has different values then the old value will be replaced with new value.
Liked list converted to balanced Binary tree from java 8 version on wards in worst case scenario.
Collision happen when 2 distinct keys generate the same hashcode() value.
When there are more collisions then there it will leads to worst performance of hashmap.
Objects which are are equal according to the equals method must return the same hashCode value.
When both objects return the same has code then they will be moved into the same bucket.

There is difference between collision and duplication.
Collision means hashcode and bucket is same, but in duplicate, it will be same hashcode,same bucket, but here equals method come in picture.
Collision detected and you can add element on existing key. but in case of duplication it will replace new value.

It isn't defined to do so. In order to achieve this functionality, you need to create a map that maps keys to lists of values:
Map<Foo, List<Bar>> myMap;
Or, you could use the Multimap from google collections / guava libraries

There is no collision in your example. You use the same key, so the old value gets replaced with the new one. Now, if you used two keys that map to the same hash code, then you'd have a collision. But even in that case, HashMap would replace your value! If you want the values to be chained in case of a collision, you have to do it yourself, e.g. by using a list as a value.

Why does toString function of a HashMap prints itself with a different order?

I have this very simple piece of code, and I was just trying to play a bit with different kind of objects inside a Map.
//There's a bit of spanish, sorry about that
//just think 'persona1' as an object with
//a string and an int
Map mapa = new HashMap();
mapa.put('c', 12850);
mapa.put(38.6, 386540);
mapa.put("Andrés", 238761);
mapa.put(14, "Valor de 14");
mapa.put("p1", persona1);
mapa.put("Andrea", 34500);
System.out.println(mapa.toString());
And then I expect from console something like:
{c=12850, 38.6=386540, Andrés=238761, 14=Valor de 14, p1={nombre: Andres Perea, edad: 10}, Andrea=34500}
But susprisingly for me I got same data in different order:
{38.6=386540, Andrés=238761, c=12850, p1={nombre: Andres Perea, edad: 10}, Andrea=34500, 14=Valor de 14}
It doesn't matter if I try other kind of objects, even just Strings or numeric types, it always does the same, it makes a different without-apparently-any-sense order.
Can someone give me a hint why this happens? Or may be something too obvious I'm missing?
I'm using Java 1.7 and Eclipse Juno.

As per Oracle's documentation
The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls. This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
Refer to HashMap JavaDocs.

There are 3 class which implements map interface in java.
1. hashMap: Id does not guarantee any order.
2. Linked HashMap:It will store them in insertion order.
3. TreeMap: It will store in ascending order.(ASCII value)
So As per your requirement you can use Linked HashMap instead of HashMap.so instead of writing
Map mapa = new HashMap();
create object of Linked HashMap
Map mapa = new LinkedHashMap();
follow below link for more info.
http://docs.oracle.com/javase/tutorial/collections/interfaces/map.html

HashMap not guaranteed the order of element. If you want to keep order use LinkedHashMap.
See following case
Map<Integer,String> unOrderedMap=new HashMap<>();
unOrderedMap.put(1,"a");
unOrderedMap.put(3,"a");
unOrderedMap.put(2,"a");
System.out.println("HashMap output: "+unOrderedMap.toString());
Map<Integer,String> orderedMap=new LinkedHashMap<>();
orderedMap.put(1,"a");
orderedMap.put(3,"a");
orderedMap.put(2,"a");
System.out.println("LinkedHashMap output: "+orderedMap.toString());
Output:
HashMap output: {1=a, 2=a, 3=a}
LinkedHashMap output: {1=a, 3=a, 2=a}

Maps does not maintain the order the order in which elements were added, List will maintain the order of elements
"The order of a map is defined as the order in which the iterators on the map's collection views return their elements. Some map implementations, like the TreeMap class, make specific guarantees as to their order; others, like the HashMap class, do not."

This is how a hashmap works: (citing from another source)
It has a number of "buckets" which it uses to store key-value pairs in. Each bucket has a unique number - that's what identifies the bucket. When you put a key-value pair into the map, the hashmap will look at the hash code of the key, and store the pair in the bucket of which the identifier is the hash code of the key. For example: The hash code of the key is 235 -> the pair is stored in bucket number 235. (Note that one bucket can store more then one key-value pair).
When you lookup a value in the hashmap, by giving it a key, it will first look at the hash code of the key that you gave. The hashmap will then look into the corresponding bucket, and then it will compare the key that you gave with the keys of all pairs in the bucket, by comparing them with equals().
Now you can see how this is very efficient for looking up key-value pairs in a map: by the hash code of the key the hashmap immediately knows in which bucket to look, so that it only has to test against what's in that bucket.
Looking at the above mechanism, you can also see what requirements are necessary on the hashCode() and equals() methods of keys:
If two keys are the same (equals() returns true when you compare them), their hashCode() method must return the same number. If keys violate this, then keys that are equal might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it's going to look in the same bucket).
If two keys are different, then it doesn't matter if their hash codes are the same or not. They will be stored in the same bucket if their hash codes are the same, and in this case, the hashmap will use equals() to tell them apart.
Now, when you put all your "key-value" pairs in the hashmap, and print them, it prints them in some random order of the keys which got generated by hashing the value you supplied for keys.
If your requirement is still to maintain the ordering, you can use the LinkedHashMap in Java.
Hope this helps :-)
Edit: Original Post: How does a Java HashMap handle different objects with the same hash code?

Storing a dictionary in a hashtable

I have an assignment that I am working on, and I can't get a hold of the professor to get clarity on something. The idea is that we are writing an anagram solver, using a given set of words, that we store in 3 different dictionary classes: Linear, Binary, and Hash.
So we read in the words from a textfile, and for the first 2 dictionary objects(linear and binary), we store the words as an ArrayList...easy enough.
But for the HashDictionary, he want's us to store the words in a HashTable. I'm just not sure what the values are going to be for the HashTable, or why you would do that. The instructions say we store the words in a Hashtable for quick retrieval, but I just don't get what the point of that is. Makes sense to store words in an arraylist, but I'm just not sure of how key/value pairing helps with a dictionary.
Maybe i'm not giving enough details, but I figured maybe someone would have seen something like this and its obvious to them.
Each of our classes has a contains method, that returns a boolean representing whether or not a word passed in is in the dictionary, so the linear does a linear search of the arraylist, the binary does a binary search of the arraylist, and I'm not sure about the hash....

The difference is speed. Both methods work, but the hash table is fast.
When you use an ArrayList, or any sort of List, to find an element, you must inspect each list item, one by one, until you find the desired word. If the word isn't there, you've looped through the entire list.
When you use a HashTable, you perform some "magic" on the word you are looking up known as calculating the word's hash. Using that hash value, instead of looping through a list of values, you can immediately deduce where to find your word - or, if your word doesn't exist in the hash, that your word isn't there.
I've oversimplified here, but that's the general idea. You can find another question here with a variety of explanations on how a hash table works.
Here is a small code snippet utilizing a HashMap.
// We will map our words to their definitions; word is the key, definition is the value
Map<String, String> dictionary = new HashMap<String, String>();
map.put("hello","A common salutation");
map.put("chicken","A delightful vessel for protein");
// Later ...
map.get("chicken"); // Returns "A delightful vessel for protein";
The problem you describe asks that you use a HashMap as the basis for a dictionary that fulfills three requirements:
Adding a word to the dictionary
Removing a word from the dictionary
Checking if a word is in the dictionary
It seems counter-intuitive to use a map, which stores a key and a value, since all you really want to is store just a key (or just a value). However, as I described above, a HashMap makes it extremely quick to find the value associated with a key. Similarly, it makes it extremely quick to see if the HashMap knows about a key at all. We can leverage this quality by storing each of the dictionary words as a key in the HashMap, and associating it with a garbage value (since we don't care about it), such as null.
You can see how to fulfill the three requirements, as follows.
Map<String, Object> map = new HashMap<String, Object>();
// Add a word
map.put('word', null);
// Remove a word
map.remove('word');
// Check for the presence of a word
map.containsKey('word');
I don't want to overload you with information, but the requirements we have here align with a data structure known as a Set. In Java, a commonly used Set is the HashSet, which is almost exactly what you are implementing with this bit of your homework assignment. (In fact, if this weren't a homework assignment explicitly instructing you to use a HashMap, I'd recommend you instead use a HashSet.)

Arrays are hard to find stuff in. If I gave you array[0] = "cat"; array[1] = "dog"; array[2] = "pikachu";, you'd have to check each element just to know if jigglypuff is a word. If I gave you hash["cat"] = 1; hash["dog"] = 1; hash["pikachu"] = 1;", instant to do this in, you just look it up directly. The value 1 doesn't matter in this particular case although you can put useful information there, such as how many times youv'e looked up a word, or maybe 1 will mean real word and 2 will mean name of a Pokemon, or for a real dictionary it could contain a sentence-long definition. Less relevant.

It sounds like you don't really understand hash tables then. Even Wikipedia has a good explanation of this data structure.
Your hash table is just going to be a large array of strings (initially all empty). You compute a hash value using the characters in your word, and then insert the word at that position in the table.
There are issues when the hash value for two words is the same. And there are a few solutions. One is to store a list at each array position and just shove the word onto that list. Another is to step through the table by a known amount until you find a free position. Another is to compute a secondary hash using a different algorithm.
The point of this is that hash lookup is fast. It's very quick to compute a hash value, and then all you have to do is check that the word at that array position exists (and matches the search word). You follow the same rules for hash value collisions (in this case, mismatches) that you used for the insertion.
You want your table size to be a prime number that is larger than the number of elements you intend to store. You also need a hash function that diverges quickly so that your data is more likely to be dispersed widely through your hash table (rather than being clustered heavily in one region).
Hope this is a help and points you in the right direction.

Old values in hash map being overwritten by new values?

I have one hash map. I'm storing 12 different key,values pairs in it.
The first 8 values are stored fine, but when I try to put the 9th value it overwrites the old value. But the size increases.
If I try to get the old values, I get nulls. I have also checked the hash map table. Only 8 values are there. The old values are overwritten.
here have only 7 values but size is 9 . how it's possible ?
What could I be doing wrong?

Make sure you use different keys. If that's the case, make sure equals and hashcode for your key class work as required, i.e. when two objects are equal, their hashcodes must be same. And of course, equals for different key values (or what you'd expect to be distinct keys) must return false.
If that doesn't help, post a minimal, yet complete (compilable) example that demonstrates your problem.

As for the size=9 but only 7 values in the table, you are misunderstanding the internal workings of the HashMap. All values are not stored in the top-level table. The table is more like "buckets" that store entries grouped by certain hashcode ranges. Each "bucket" holds a chain of linked entries so what you are seeing in the table are just the first entries in each particular range chain. The size is always correct though, in terms of total number of entries in the map.
As for entries overwriting eachother, that happens only when you put en entry with a key that is identical (hashCode and equals) to en existing entry. So you are either adding with an existing key, or you are adding with null as key (null is permissible as key, but you can only have one entry with the key null).
Check your code, are you adding with null keys? If you are using instances of a custom class (one you created yourself) as key, have you implemented hashCode() and equals() according to the specifications (see http://download.oracle.com/javase/6/docs/api/java/lang/Object.html#hashCode%28%29)? Are you making sure that you are really using unique keys for all 12 put operations?

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.