Java Hash Table Implementation - java

In my implementation of a hash table, my hash function simply takes the item I pass, calls hashCode (inherited from the Object class) on it, and takes the result modulo the size of the internal array. This internal array is an array of LinkedLists. If my LinkedLists become too long (and lookups begin to slip from O(1) toward O(n)), I figured it would make sense to simply grow the array. But that is where my problem lies, since I hash each item modulo the size of the array, which has just changed. If I were to continue, wouldn't the hashes point to different indices in the array, so that I lose the ability to find items already in my hash table? How could I solve this?

You need the actual hash values for each of the items so that you can put them into the correct hash chain in the resized table. (Otherwise, as you observed, the items are liable to end up on the wrong chain and to not be locatable as a result.)
There are two ways to deal with this:
You could simply recalculate the hash value for each item as you add it to the new table.
You could keep a copy of the original hash values for each item in the hash chain. This is what the standard Java HashMap implementation does ... at least in the versions I've looked at.
(The latter is a time vs space trade-off that could pay off big time if your items have an expensive hashcode method. However, if you amortize over the lifetime of a hash table, this optimization does not alter the "big O" complexity of any of the public API methods ... assuming that your hash table resizing is exponential; e.g. you roughly double the table size each time.)
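The second option above (keeping each item's hash in its chain node) can be sketched roughly as follows. This is a minimal illustration, not the asker's code: the class name ResizingHashTable, the initial capacity of 4, and the 0.75 load factor are all assumptions chosen for the example.

```java
import java.util.LinkedList;

// Hypothetical class; a sketch of rehash-on-resize with cached hashes.
class ResizingHashTable<K, V> {
    // Each node caches the key's hash so resizing never calls hashCode() again.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;

        Node(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    private LinkedList<Node<K, V>>[] buckets = newTable(4);
    private int size;

    @SuppressWarnings("unchecked")
    private static <K, V> LinkedList<Node<K, V>>[] newTable(int capacity) {
        return (LinkedList<Node<K, V>>[]) new LinkedList[capacity];
    }

    // floorMod keeps the index non-negative even when the hash is negative.
    private static int indexFor(int hash, int capacity) {
        return Math.floorMod(hash, capacity);
    }

    public void put(K key, V value) {
        int hash = key.hashCode();
        Node<K, V> existing = find(key, hash);
        if (existing != null) {
            existing.value = value;
            return;
        }
        if (size + 1 > buckets.length * 3 / 4) {
            resize(); // assumed load factor of 0.75
        }
        insert(buckets, new Node<>(hash, key, value));
        size++;
    }

    public V get(K key) {
        Node<K, V> node = find(key, key.hashCode());
        return node == null ? null : node.value;
    }

    public int capacity() {
        return buckets.length;
    }

    private Node<K, V> find(K key, int hash) {
        LinkedList<Node<K, V>> bucket = buckets[indexFor(hash, buckets.length)];
        if (bucket == null) return null;
        for (Node<K, V> node : bucket) {
            if (node.hash == hash && node.key.equals(key)) return node;
        }
        return null;
    }

    private static <K, V> void insert(LinkedList<Node<K, V>>[] table, Node<K, V> node) {
        int index = indexFor(node.hash, table.length);
        if (table[index] == null) table[index] = new LinkedList<>();
        table[index].add(node);
    }

    // Double the array and move every node to the chain its cached hash selects.
    private void resize() {
        LinkedList<Node<K, V>>[] bigger = newTable(buckets.length * 2);
        for (LinkedList<Node<K, V>> bucket : buckets) {
            if (bucket == null) continue;
            for (Node<K, V> node : bucket) insert(bigger, node);
        }
        buckets = bigger;
    }
}
```

Because every node is re-inserted using the new array length, lookups after a resize compute the same index that the resize used, so no item becomes unreachable.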

package com.codewithsouma.hashtable;
import java.util.LinkedList;
public class HashTable {
private class Entry {
private int key;
private String value;
public Entry(int key, String value) {
this.key = key;
this.value = value;
}
}
LinkedList<Entry>[] entries = new LinkedList[5];
public void put(int key, String value) {
var entry = getEntry(key);
if (entry != null){
entry.value = value;
return;
}
getOrCreateBucket(key).add(new Entry(key,value));
}
public String get(int key) {
var entry = getEntry(key);
return (entry == null) ? null : entry.value;
}
public void remove(int key) {
var entry = getEntry(key);
if (entry == null)
throw new IllegalStateException();
getBucket(key).remove(entry);
}
private LinkedList<Entry> getBucket(int key){
return entries[hash(key)];
}
private LinkedList<Entry> getOrCreateBucket(int key){
var index = hash(key);
var bucket = entries[index];
if (bucket == null) {
entries[index] = new LinkedList<>();
bucket = entries[index];
}
return bucket;
}
private Entry getEntry(int key) {
var bucket = getBucket(key);
if (bucket != null) {
for (var entry : bucket) {
if (entry.key == key) return entry;
}
}
return null;
}
private int hash(int key) {
return Math.floorMod(key, entries.length); // unlike %, floorMod never yields a negative index
}
}

Related

how to get a linkedhashmap value using index?

I'm new to Java. I've made a LinkedHashMap like this:
Map<String, Double> MonthlyCPIMenu = new LinkedHashMap<String, Double>();
MonthlyCPIMenu.put("1394/10", 0.0);
MonthlyCPIMenu.put("1394/09", 231.6);
MonthlyCPIMenu.put("1394/08", 228.7);
MonthlyCPIMenu.put("1394/07", 227.0);
MonthlyCPIMenu.put("1394/06", 225.7);
I know how to find each item's index using (for example):
String duemonth="1394/08";
int indexduemonth = new ArrayList<String>(MonthlyCPIMenu.keySet()).indexOf(duemonth);
but I don't know how to find the value using the index. (I know how to get the value using the key, but in this case I have to use the index.)
A crude way to do it would be
new ArrayList<String>(MonthlyCPIMenu.keySet()).get(index);
but LinkedHashMap generally doesn't support efficient indexed retrieval, and it doesn't provide any API for the purpose. The best algorithm to do it is just to take MonthlyCPIMenu.keySet().iterator(), call next() index times, and then return the result of one final next():
<K, V> K getKey(LinkedHashMap<K, V> map, int index) {
Iterator<K> itr = map.keySet().iterator();
for (int i = 0; i < index; i++) {
itr.next();
}
return itr.next();
}
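Since the question asks for the value rather than the key, the same iterator walk can be applied to entrySet(). The following is a small sketch under that assumption; the class name IndexedLookup and the method name getValueAt are made up for this example.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class IndexedLookup {
    // Returns the value at the given insertion-order position; O(index) time.
    static <K, V> V getValueAt(LinkedHashMap<K, V> map, int index) {
        if (index < 0 || index >= map.size()) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + map.size());
        }
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        for (int i = 0; i < index; i++) {
            it.next(); // skip the entries before the requested position
        }
        return it.next().getValue();
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Double> monthlyCpi = new LinkedHashMap<>();
        monthlyCpi.put("1394/10", 0.0);
        monthlyCpi.put("1394/09", 231.6);
        monthlyCpi.put("1394/08", 228.7);
        System.out.println(getValueAt(monthlyCpi, 2)); // prints 228.7
    }
}
```

If indexed access is needed often, copying the values into an ArrayList once is likely cheaper than walking the iterator on every call.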
First, do you have a specific reason for using a LinkedHashMap? Generally speaking, iterating over keys is cheap and lookups are O(1). Why does the order of the values matter?
You can retrieve values from a map using the get(key) method.
Map.get(key);
You can protect against nulls with:
Map.get(key) != null ? Map.get(key) : "";
This will return the value if the key is found, else return an empty string. You can replace the empty string with whatever you want.
If you are set on getting the value by index, then create your own type and store it in a List:
public class MyValue {
String date;
String value;
public MyValue(String d, String v) {
this.date = d;
this.value = v;
}
public String getDate() {
return date;
}
public String getValue() {
return value;
}
}
Then use the List interface:
List<MyValue> list = new ArrayList<>();
// put all your values in the list
// get the values out by index in the list

how to create a linked-list in a hash-table with the specific index?

public boolean isCollide(String key, String value){
int index = key.hashCode();
if (this.key_array[index]==null) return false;
else return true;
}
public void addValue(String key, String value){
Hashtable hashtable = new Hashtable(key,value);
int index = key.hashCode();
if (isCollide(key,value)) {
hashtable.key_array[index]=key;
hashtable.value_array[index]=value;
}
else{
LinkedList<String> linkedList = new LinkedList<>();
linkedList.add(value); //how to create a linkedlist on a hashtable?
}
}
I'm implementing Hashtable from scratch. I was wondering how to create a linked-list in a hashtable? The code above is pretty much wrong but I hope it could illustrate what I'm thinking. So if there is a collision, then I would like to create a linked list starting from that collided index. Could anyone give me some guidance pls? Thanks!
Here is how Java HashMap does it internally:
class Entry<K,V> implements Map.Entry<K,V> {
final K key;
V value;
Entry<K,V> next;
int hash;
/**
* Creates new entry.
*/
Entry(int h, K k, V v, Entry<K,V> n) {
value = v;
next = n;
key = k;
hash = h;
}
// rest of methods here...
}
The Entry class maintains an internal "next" reference, which is what builds the linked list of colliding keys.
Basically, each key and value pair is stored internally as an instance of the Entry class. If a collision happens, the new Entry instance is linked into the chain of entries already in that slot. Pseudocode for the simplest case, where the slot holds a single entry:
table[i].next = newEntry;
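To make the idea concrete, here is a minimal sketch of a table whose slots are chains of hand-written nodes. It is not HashMap's actual source: the names ChainedTable and Node, the fixed size of 8, and the choice to insert new nodes at the head of the chain are all assumptions made to keep the example short.

```java
class ChainedTable {
    // One node per key/value pair; `next` links nodes that share a slot.
    private static final class Node {
        final String key;
        String value;
        Node next;

        Node(String key, String value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node[] table = new Node[8]; // fixed size keeps the sketch short

    private int indexFor(String key) {
        // floorMod avoids a negative index when hashCode() is negative.
        return Math.floorMod(key.hashCode(), table.length);
    }

    public void put(String key, String value) {
        int i = indexFor(key);
        for (Node n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                n.value = value; // key already present: overwrite
                return;
            }
        }
        table[i] = new Node(key, value, table[i]); // new node becomes the head of the chain
    }

    public String get(String key) {
        for (Node n = table[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }
}
```

The strings "Aa" and "BB" have the same hashCode in Java, so putting both exercises the collision path: they land in the same slot and are told apart by equals.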

Java HashMap containsKey function

Could somebody please explain how the containsKey() function of HashMap works internally? Does it use equals or hashCode to match the key? I am using String keys for a HashMap, and when I build the key dynamically, containsKey returns false. For example (just sample code, not the original code I am using in my application):
class employee {
employee(String name) {
return name;
}
}
class test {
HashMap hm = new HashMap();
hm.put("key1",new Employee("emp1"));
hm.put("key2",new Employee("emp2"));
hm.put("key3","emp4");
hm.put("new Employee("emp5")","emp4");
System.out.println(hm.containsKey("emp5"));
}
The key is an Employee object, not a string, in containsKey you have a string. That comparison will return false, because string "emp5" is not equal to an object Employee.
Here is a quote from containsKey doc:
Returns true if this map contains a mapping for the specified key. More formally, returns true if and only if this map contains a mapping for a key k such that (key==null ? k==null : key.equals(k))
Since in your case key is a string, 'equals' will return 'true' only if k is a string as well and its content is the same as that of key.
Your code has many errors. For example, this is invalid: hm.put("new Employee("emp5")","emp4");
Also use generic types with collections
HashMap<String,employee> hm = new HashMap<String,employee>();
And name your class Employee, not employee; class names begin with a capital letter. You are also calling new Employee whereas your class name is employee.
According to the source for HashMap, it calls equals() on the keys internally (in your case that means equals for String):
public boolean containsKey(Object key)
{
int idx = hash(key);
HashEntry<K, V> e = buckets[idx];
while (e != null)
{
if (equals(key, e.key))
return true;
e = e.next;
}
return false;
}
Your valid code (assuming you are not trying to achieve something unusual) should look like this :-
class Employee {
String name;
Employee(String name) {
this.name = name;
}
}
class Test {
public void hello() {
HashMap<String,Employee> hm = new HashMap<String,Employee>();
hm.put("key1", new Employee("emp1"));
hm.put("key2", new Employee("emp2"));
hm.put("key3", new Employee("emp4"));
hm.put("key4", new Employee("emp5"));
System.out.println(hm.containsKey("key4"));
}
}
Corrected Code:
HashMap hm= new HashMap();
hm.put("key1",new Employee("emp1"));
hm.put("key2",new Employee("emp2"));
hm.put("key3","emp4");
System.out.println(hm.containsKey("key1"));
This will return true.
You are saving Employee object against String keys. So you need to check the valid key. In your case emp5 is not used as a key while adding elements to hashmap.
For your second question:
It internally checks hashcode of the key first. If hashcodes are same it will check equals method.
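A small sketch of that behaviour with java.util.HashMap, assuming a hypothetical EmployeeKey class that overrides both equals and hashCode based on its name field:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class ContainsKeyDemo {
    // Hypothetical key type; equals and hashCode are both based on name.
    static final class EmployeeKey {
        private final String name;

        EmployeeKey(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof EmployeeKey && name.equals(((EmployeeKey) other).name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name);
        }
    }

    public static void main(String[] args) {
        Map<Object, String> map = new HashMap<>();
        map.put(new EmployeeKey("emp5"), "emp4");

        // A different but equal EmployeeKey instance is found.
        System.out.println(map.containsKey(new EmployeeKey("emp5"))); // true
        // A String is never equal to an EmployeeKey, so this lookup fails.
        System.out.println(map.containsKey("emp5"));                  // false
    }
}
```

If EmployeeKey did not override equals and hashCode, the first lookup would also fail, because two separately constructed objects would be compared by identity.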
Assuming
employee(String name) {
return name;
}
is not a constructor but some method, this piece of code will not compile: you are returning a String, but you didn't specify a return type for the method.
Moreover, in this line:
hm.put("new Employee("emp5")","emp4");
you have specified the key as new Employee("emp5"), but you are searching with the key emp5 in containsKey(). It will obviously return false, because containsKey() returns true only if this map contains a mapping for the specified key.
Internally, a hash map can be implemented with an array of linked lists.
The key is passed to a routine (the hash function), which gives back a number. The number is then divided by the size of the array, giving a remainder. That remainder is the index of the linked list you then traverse to see if any of the nodes exactly matches the key.
The advantage is that if you have a properly balanced hash function and (let's say) an array of 32 lists, you can skip searching 31/32 (about 97%) of your possible values in a constant-time operation.
Other means of implementation exist; however, they are computationally similar.
An example of a (very bad) hash algorithm for Strings might be to simply add up all the ASCII character values. Very good hash algorithms tend to give back an evenly distributed number based on the expected inputs, where incremental inputs do not incrementally fill adjacent buckets.
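To illustrate why summing character values is weak, the sketch below compares it with String.hashCode on two anagrams. The class name HashQuality and the method name sumOfChars are invented for this example.

```java
class HashQuality {
    // Deliberately weak: character order is ignored, so anagrams always collide.
    static int sumOfChars(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Same letters, different order: the sums are identical.
        System.out.println(sumOfChars("listen") == sumOfChars("silent")); // true
        // String.hashCode weights each position differently, so these two differ.
        System.out.println("listen".hashCode() == "silent".hashCode());   // false
    }
}
```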
So, to find out if a hash map contains a key, get the result of the hash function on the key, and then walk down the correct linked list checking each entry for the key's presence.
In C, a "node" in the linked list.
struct Node {
char* key;
char* value;
struct Node* next;
};
In C, the "hashmap"
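struct HashMap {
int size;
struct Node** listRoots; /* array of pointers, one per list head */
};
The algorithm
#include <string.h> /* for strcmp */
int containsKey(struct HashMap* hashMap, char* key) {
unsigned int hash = hashFunc(key); /* hashFunc is assumed to be defined elsewhere */
struct Node* head = hashMap->listRoots[hash % hashMap->size];
while (head != 0) {
if (strcmp(head->key, key) == 0) { /* strcmp returns 0 when the strings match */
return 1;
}
head = head->next; /* advance, otherwise the loop never ends */
}
return 0;
}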
Keep in mind my C is a bit rusty, but hopefully, you'll get the idea.

Questions about implementing my own HashMap in Java

I am working on an assignment where I have to implement my own HashMap. In the assignment text it is being described as an Array of Lists, and whenever you want to add an element the place it ends up in the Array is determined by its hashCode. In my case it is positions from a spreadsheet, so I have just taken columnNumber + rowNumber and then converted that to a String and then to an int, as the hashCode, and then I insert it that place in the Array. It is of course inserted in the form of a Node(key, value), where the key is the position of the cell and the value is the value of the cell.
But I must say I do not understand why we need an Array of Lists, because if we then end up with a list with more than one element, will it not increase the look up time quite considerably? So should it not rather be an Array of Nodes?
Also I have found this implementation of a HashMap in Java:
public class HashEntry {
private int key;
private int value;
HashEntry(int key, int value) {
this.key = key;
this.value = value;
}
public int getKey() {
return key;
}
public int getValue() {
return value;
}
}
public class HashMap {
private final static int TABLE_SIZE = 128;
HashEntry[] table;
HashMap() {
table = new HashEntry[TABLE_SIZE];
for (int i = 0; i < TABLE_SIZE; i++)
table[i] = null;
}
public int get(int key) {
int hash = (key % TABLE_SIZE);
while (table[hash] != null && table[hash].getKey() != key)
hash = (hash + 1) % TABLE_SIZE;
if (table[hash] == null)
return -1;
else
return table[hash].getValue();
}
public void put(int key, int value) {
int hash = (key % TABLE_SIZE);
while (table[hash] != null && table[hash].getKey() != key)
hash = (hash + 1) % TABLE_SIZE;
table[hash] = new HashEntry(key, value);
}
}
So is it correct that the put method first looks at table[hash], and if that slot is not empty and does not hold the key passed to put, it moves on to table[(hash + 1) % TABLE_SIZE]? And if it is the same key, it simply overwrites the value? Is that correctly understood? And is it because the get and put methods use the same way of finding the place in the array that, given the same key, they end up at the same place in the array?
I know these questions might be a bit basic, but I have spent quite some time trying to get this sorted out, so any help would be much appreciated!
Edit
So now I have tried implementing the HashMap myself via a Node class, which just constructs a node with a key and a corresponding value. It also has a getHashCode method, where I just concatenate the two values.
I have also constructed a SinglyLinkedList (part of a previous assignment), which I use as the bucket.
And my Hash function is simply hashCode % hashMap.length.
Here is my own implementation, so what do you think of it?
package spreadsheet;
public class HashTableMap {
private SinglyLinkedListMap[] hashArray;
private int size;
public HashTableMap() {
hashArray = new SinglyLinkedListMap[64];
size = 0;
}
public void insert(final Position key, final Expression value) {
Node node = new Node(key, value);
int hashNumber = node.getHashCode() % hashArray.length;
SinglyLinkedListMap bucket = new SinglyLinkedListMap();
bucket.insert(key, value);
if(hashArray[hashNumber] == null) {
hashArray[hashNumber] = bucket;
size++;
}
if(hashArray[hashNumber] != null) {
SinglyLinkedListMap bucket2 = hashArray[hashNumber];
bucket2.insert(key, value);
hashArray[hashNumber] = bucket2;
size++;
}
if (hashArray.length == size) {
SinglyLinkedListMap[] newhashArray = new SinglyLinkedListMap[size * 2];
for (int i = 0; i < size; i++) {
newhashArray[i] = hashArray[i];
}
hashArray = newhashArray;
}
}
public Expression lookUp(final Position key) {
Node node = new Node(key, null);
int hashNumber = node.getHashCode() % hashArray.length;
SinglyLinkedListMap foundBucket = hashArray[hashNumber];
return foundBucket.lookUp(key);
}
}
The look up time should be around O(1), so I would like to know if that is the case? And if not how can I improve it, in that regard?
You have to have some plan to deal with hash collisions, in which two distinct keys fall in the same bucket, the same element of your array.
One of the simplest solutions is to keep a list of entries for each bucket.
If you have a good hashing algorithm, and make sure the number of buckets is bigger than the number of elements, you should end up with most buckets having zero or one items, so the list search should not take long. If the lists are getting too long it is time to rehash with more buckets to spread the data out.
It really depends on how good your hashcode method is. Let's say you tried to make it as bad as possible: you made hashcode return 1 every time. If that were the case, you'd have an array of lists, but only 1 element of the array would have any data in it. That element would just grow to have a huge list in it.
If you did that, you'd have a really inefficient hashmap. But, if your hashcode were a little better, it'd distribute the objects into many different array elements and as a result it'd be much more efficient.
The ideal case (which often isn't achievable) is to have a hashcode method that returns a unique number no matter what object you put into it. If you could do that, you wouldn't ever need an array of lists. You could just use an array. But since your hashcode isn't "perfect", it's possible for two different objects to have the same hashcode. You need to be able to handle that scenario by putting them in a list at the same array element.
But, if your hashcode method was "pretty good" and rarely had collisions, you rarely would have more than 1 element in the list.
The lists are often referred to as buckets and are a way of dealing with collisions. When two data elements have the same hash code mod TABLE_SIZE they collide, but both must be stored.
A worse kind of collision is two different data points having the same key. This is disallowed in hash tables, and one will overwrite the other. If you just add row to column, then (2,1) and (1,2) will both have a key of 3, which means they cannot both be stored in the same hash table. If you concatenate the strings without a separator, the problem is (12,1) versus (1,21): both have the key "121". With a separator (such as a comma), all the keys will be distinct.
Distinct keys can land in the same bucket if their hash codes are the same mod TABLE_SIZE. The lists are one way to store both values in the same bucket.
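One way to keep the keys themselves distinct is to compare both coordinates in equals, rather than collapsing them into a single number or string. The sketch below assumes a hypothetical Position class and uses java.util.HashMap only to demonstrate the effect.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical spreadsheet key; equality uses both coordinates.
final class Position {
    private final int column;
    private final int row;

    Position(int column, int row) {
        this.column = column;
        this.row = row;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Position)) return false;
        Position p = (Position) other;
        return column == p.column && row == p.row;
    }

    // Hash codes may still collide; equals() is what keeps the keys distinct.
    @Override
    public int hashCode() {
        return Objects.hash(column, row);
    }
}

class PositionDemo {
    public static void main(String[] args) {
        Map<Position, String> cells = new HashMap<>();
        cells.put(new Position(2, 1), "B1");
        cells.put(new Position(1, 2), "A2");
        cells.put(new Position(12, 1), "L1");
        cells.put(new Position(1, 21), "A21");
        System.out.println(cells.size());                  // 4
        System.out.println(cells.get(new Position(1, 2))); // A2
    }
}
```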
class SpreadSheetPosition {
int column;
int row;
@Override
public int hashCode() {
return column + row;
}
}
class HashMap {
private List[] buckets = new List[N];
public void put(Object key, Object value) {
int keyHashCode = key.hashCode();
int bucketIndex = keyHashCode % N;
...
}
}
Compare having N lists with having just one list or array. To search a list, one may have to traverse the entire list. Using an array of lists at least shortens the individual lists, possibly down to one or zero elements (null).
If hashCode() is as unique as possible, the chance of an immediate hit is high.

HashTable Collision Where is it using LinkedList to store multiple values which is having same keys

I have read many times that when a collision arises in a hashtable (one key with multiple values), the entries are stored in a linked list, and equals is then called to check which key maps to the required value. But when I look at the Hashtable code, there is no linked list code in the put or get method. It uses an Entry[] array, and I don't understand how this can be used as a linked list.
for (Entry<K,V> e = tab[index] ; e != null ; e = e.next) {
if ((e.hash == hash) && e.key.equals(key)) {
V old = e.value;
e.value = value;
return old;
}
}
Kindly guide and clear my doubt.
I think the implementation may differ between JVMs, but from my understanding a linked list is used (though not necessarily java.util.LinkedList). This is how put is implemented in Hashtable in the JVM I use:
public Object put(Object key, Object value) {
// Make sure the value is not null
if (value == null) throw new NullPointerException();
// Makes sure the key is not already in the hashtable.
HashtableEntry e;
HashtableEntry tab[] = table;
int hash = key.hashCode();
int index = (hash & 0x7FFFFFFF) % tab.length;
for (e = tab[index] ; e != null ; e = e.next) {
if ((e.hash == hash) && e.key.equals(key)) {
Object old = e.value;
e.value = value;
return old;
}
}
There is some difference between this version and the one you posted but I think logic behind them is the same.
The HashtableEntry looks like this:
class HashtableEntry {
int hash;
Object key;
Object value;
HashtableEntry next;
(...)
The "HashtableEntry next" reference is what makes a HashtableEntry a linked list node (a linked list is a structure in which each element holds a reference to another element of the same type, unless it is the last element in the list).
I think what you were looking for was java.util.LinkedList, but Hashtable implements the linked list structure in its own way.
