Questions about implementing my own HashMap in Java - java

I am working on an assignment where I have to implement my own HashMap. The assignment text describes it as an Array of Lists: whenever you add an element, the place it ends up in the Array is determined by its hashCode. In my case the keys are positions in a spreadsheet, so I have just taken columnNumber + rowNumber, converted that to a String and then to an int, and used that as the hashCode; I then insert the element at that place in the Array. It is of course inserted in the form of a Node(key, value), where the key is the position of the cell and the value is the value of the cell.
But I must say I do not understand why we need an Array of Lists, because if we then end up with a list with more than one element, will it not increase the look up time quite considerably? So should it not rather be an Array of Nodes?
Also I have found this implementation of a HashMap in Java:
public class HashEntry {
private int key;
private int value;
HashEntry(int key, int value) {
this.key = key;
this.value = value;
}
public int getKey() {
return key;
}
public int getValue() {
return value;
}
}
public class HashMap {
private final static int TABLE_SIZE = 128;
HashEntry[] table;
HashMap() {
table = new HashEntry[TABLE_SIZE];
for (int i = 0; i < TABLE_SIZE; i++)
table[i] = null;
}
public int get(int key) {
int hash = (key % TABLE_SIZE);
while (table[hash] != null && table[hash].getKey() != key)
hash = (hash + 1) % TABLE_SIZE;
if (table[hash] == null)
return -1;
else
return table[hash].getValue();
}
public void put(int key, int value) {
int hash = (key % TABLE_SIZE);
while (table[hash] != null && table[hash].getKey() != key)
hash = (hash + 1) % TABLE_SIZE;
table[hash] = new HashEntry(key, value);
}
}
So is it correct that the put method first looks at table[hash], and if that slot is not empty and does not hold the key passed to put, it moves on to table[(hash + 1) % TABLE_SIZE]? And if it is the same key, it simply overwrites the value. Is that correctly understood? And is it because the get and put methods use the same way of finding the place in the Array that, given the same key, they end up at the same place in the Array?
I know these questions might be a bit basic, but I have spent quite some time trying to get this sorted out, so any help would be much appreciated!
Edit
So now I have tried implementing the HashMap myself via a Node class, which just constructs a node with a key and a corresponding value. It also has a getHashCode method, where I just concatenate the two values.
I have also constructed a SinglyLinkedList (part of a previous assignment), which I use as the bucket.
And my Hash function is simply hashCode % hashMap.length.
Here is my own implementation, so what do you think of it?
package spreadsheet;
public class HashTableMap {
private SinglyLinkedListMap[] hashArray;
private int size;
public HashTableMap() {
hashArray = new SinglyLinkedListMap[64];
size = 0;
}
public void insert(final Position key, final Expression value) {
Node node = new Node(key, value);
int hashNumber = node.getHashCode() % hashArray.length;
SinglyLinkedListMap bucket = new SinglyLinkedListMap();
bucket.insert(key, value);
if(hashArray[hashNumber] == null) {
hashArray[hashNumber] = bucket;
size++;
}
if(hashArray[hashNumber] != null) {
SinglyLinkedListMap bucket2 = hashArray[hashNumber];
bucket2.insert(key, value);
hashArray[hashNumber] = bucket2;
size++;
}
if (hashArray.length == size) {
SinglyLinkedListMap[] newhashArray = new SinglyLinkedListMap[size * 2];
for (int i = 0; i < size; i++) {
newhashArray[i] = hashArray[i];
}
hashArray = newhashArray;
}
}
public Expression lookUp(final Position key) {
Node node = new Node(key, null);
int hashNumber = node.getHashCode() % hashArray.length;
SinglyLinkedListMap foundBucket = hashArray[hashNumber];
return foundBucket.lookUp(key);
}
}
The look up time should be around O(1), so I would like to know if that is the case? And if not how can I improve it, in that regard?

You have to have some plan to deal with hash collisions, in which two distinct keys fall in the same bucket, the same element of your array.
One of the simplest solutions is to keep a list of entries for each bucket.
If you have a good hashing algorithm, and make sure the number of buckets is bigger than the number of elements, you should end up with most buckets having zero or one items, so the list search should not take long. If the lists are getting too long it is time to rehash with more buckets to spread the data out.
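The list-per-bucket idea can be sketched roughly as follows. This is only an illustration under assumptions: the class name ChainedMap, the String/int key and value types, and the fixed bucket count of 16 are made up here and are not part of the assignment.

```java
import java.util.LinkedList;

// Hypothetical illustration: a fixed-size hash table with one list per bucket.
class ChainedMap {
    private static class Node {
        final String key;
        int value;
        Node(String key, int value) { this.key = key; this.value = value; }
    }

    @SuppressWarnings("unchecked")
    private final LinkedList<Node>[] buckets = new LinkedList[16];

    // floorMod keeps the index non-negative even when hashCode() is negative.
    private int indexFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        int i = indexFor(key);
        if (buckets[i] == null) buckets[i] = new LinkedList<>();
        for (Node n : buckets[i]) {
            if (n.key.equals(key)) { n.value = value; return; } // same key: overwrite
        }
        buckets[i].add(new Node(key, value)); // new key (possibly a collision): append
    }

    // Returns null when the key is absent.
    Integer get(String key) {
        LinkedList<Node> bucket = buckets[indexFor(key)];
        if (bucket == null) return null;
        for (Node n : bucket) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }
}
```

The strings "Aa" and "BB" have the same hashCode in Java, so they always share a bucket here; both are still retrievable because the list is searched with equals.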

It really depends on how good your hashcode method is. Let's say you tried to make it as bad as possible: you made hashcode return 1 every time. If that were the case, you'd have an array of lists, but only 1 element of the array would have any data in it. That element would just grow to have a huge list in it.
If you did that, you'd have a really inefficient hashmap. But, if your hashcode were a little better, it'd distribute the objects into many different array elements and as a result it'd be much more efficient.
The ideal case (which often isn't achievable) is to have a hashcode method that returns a unique number for every distinct object you put into it. If you could do that, you wouldn't ever need an array of lists. You could just use an array. But since your hashcode isn't "perfect", it's possible for two different objects to have the same hashcode. You need to be able to handle that scenario by putting them in a list at the same array element.
But, if your hashcode method was "pretty good" and rarely had collisions, you rarely would have more than 1 element in the list.

The lists are often referred to as buckets and are a way of dealing with collisions. When two data elements have the same hash code mod TABLE_SIZE they collide, but both must be stored.
A worse kind of collision is two different data points having the same key. This is disallowed in hash tables, and one will overwrite the other. If you just add row to column, then (2,1) and (1,2) will both have a key of 3, which means they cannot both be stored in the same hash table. If you concatenate the strings together without a separator, then the problem is with (12,1) versus (1,21): both have the key "121". With a separator (such as a comma) all the keys will be distinct.
Distinct keys can land in the same bucket if their hash codes are the same mod TABLE_SIZE. Those lists are one way to store the two values in the same bucket.
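One way to keep keys distinct without building strings is to make the position itself the key and compare both coordinates in equals. The sketch below assumes a made-up CellPosition class with int column and row fields; the asker's real Position class may look different.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cell position used directly as a map key.
final class CellPosition {
    final int column;
    final int row;

    CellPosition(int column, int row) { this.column = column; this.row = row; }

    // Two positions are the same key only if both coordinates match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CellPosition)) return false;
        CellPosition p = (CellPosition) o;
        return column == p.column && row == p.row;
    }

    // Weights the column, so (2,1) and (1,2) hash differently, unlike column + row.
    @Override
    public int hashCode() { return 31 * column + row; }
}

class CellPositionDemo {
    public static void main(String[] args) {
        Map<CellPosition, String> cells = new HashMap<>();
        cells.put(new CellPosition(2, 1), "B1");
        cells.put(new CellPosition(1, 2), "A2");
        System.out.println(cells.size());                       // 2
        System.out.println(cells.get(new CellPosition(1, 2)));  // A2
    }
}
```

Even when two different positions happen to share a hash code, equals still tells them apart, so neither entry overwrites the other.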

class SpreadSheetPosition {
int column;
int row;
@Override
public int hashCode() {
return column + row;
}
}
class HashMap {
private List[] buckets = new List[N];
public void put(Object key, Object value) {
int keyHashCode = key.hashCode();
int bucketIndex = keyHashCode % N;
...
}
}
Compare having N lists with having just one list/array. To search a list, one may have to traverse the entire list. By using an array of lists, one at least shortens the individual lists, possibly even getting a list of one or zero elements (null).
If the hashCode() is as unique as possible, the chance of an immediate hit is high.

Related

Changing complexity from O(n) to o(logn)

We have a linked list called ratings whose elements each contain 3 integers:
userId, itemId and the value of the actual rating (for example from 0 to 10).
This method returns the rating of user i for item j, which the program reads from a file, and returns -1 if there is no rating.
The method, which is O(n):
public int getRating(int i, int j){
ratings.findFirst();
while(!ratings.empty()){
if(ratings.retrieve().getUserId() == i && ratings.retrieve().getItemId() == j)
return ratings.retrieve().getValue();
else
ratings.findNext();
}
return -1;
}
How can I do this in O(log n)?
Or is there any way I can solve it using a binary search tree?
The short answer is: use a different data structure. Linked lists aren't capable of doing searches in anything other than linear time, since the elements are linked together without any real semblance of order (and even if the list were sorted, you'd still have to traverse it element by element to reach a given position).
One data structure that you could use would be a Table from Guava. With this data structure, you'd have to do more work to add an element in...
Table<Integer, Integer, Rating> ratings = HashBasedTable.create();
ratings.put(rating.getUserId(), rating.getItemId(), rating);
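...but you can retrieve very quickly, in roughly O(1) time, since HashBasedTable is backed by nested linked hash maps (essentially a LinkedHashMap<Integer, Map<Integer, Rating>> here; LinkedHashSet takes only one type parameter and is not what is used).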
ratings.get(i, j);
You can use hashing to achieve your task in O(1). Please read this article to gain a deeper understanding about hashing.
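Since you are using Java, you can use HashMap to accomplish your task. Note that the worst-case time complexity of a hash table lookup is O(n) in general, when many keys land in the same bucket (since Java 8, HashMap converts long buckets into balanced trees, which can bring this down to about O(log n)), but on average it is O(1). If you are more interested to know about hash tables and amortized analysis, please go through this article.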
Code Example: You can create a Class with the required attributes and implement equals and hashCode method as follows. [read Java collections - hashCode() and equals()]
class Rating {
public int user_id; // id of the user who rated
public int item_id; // id of the item being rated
public Rating(int user_id, int item_id) {
this.user_id = user_id;
this.item_id = item_id;
}
@Override
public boolean equals(Object o) {
if (o == this) {
return true;
}
if (!(o instanceof Rating)) {
return false;
}
Rating ratingObj = (Rating) o;
return ratingObj.user_id == user_id
&& ratingObj.item_id == item_id;
}
@Override
public int hashCode() {
int result = 17;
result = 31 * result + user_id;
result = 31 * result + item_id;
return result;
}
}
Then store values in HashMap as follows:
public static void main(String[] args) {
HashMap<Rating, Integer> ratingMap = new HashMap<>();
Rating rt = new Rating(1, 5); // user id = 1, item id = 5
ratingMap.put(rt, 3);
rt = new Rating(1, 2); // user id = 1, item id = 2
ratingMap.put(rt, 4);
rt = new Rating(1, 3); // user id = 1, item id = 3
ratingMap.put(rt, 5);
// now search in HashMap
System.out.println(ratingMap.get(new Rating(1, 3))); // prints 5
}
As presented, this could hardly be done in O(log n). You're looking through elements until you find the one you need. In the worst case, you won't find the element you want until the end of the loop, thus making it O(n).
Of course, if ratings were a dictionary you'd retrieve the value in almost O(1): user ids as keys and for example a list of ratings as value. Insertion would be a bit slower but not much.

How to add into an arraylist and lookup using binarysearch? PhoneBook ArrayList clas

I have to do a PhoneBook class. I have done all but two methods. I need some help on how to do them.
1. A PhoneEntry class exists. =
a. Write a PhoneBook class that stores PhoneEntry objects. Add the following methods.
iii. addInOrder(PhoneEntry a) This is the insertion Sort
v. lookup(String name)- return the phone number associated with name. Use binary search
I don't know how to do both of them. I have some of addInOrder done but none of lookup. This is what I have so far:
public void addInOrder(PhoneEntry a) {
for (int outer = 1; outer < book.size(); outer++) {
int position = outer;
String key = (a.get(position);
// shift larger values to right
while (position > 0 && a.get(position - 1).compareTo(key) > 0) {
(a.get(position)).equals(a.get(position - 1));
position--;
}
a.get(position).equals(key);
}
}
public String lookUp(String name) {
}
addInOrder
Just add at the end of the entries then sort if performance is not relevant or you have few elements (you can easily sort using Collections.sort). A faster approach is to keep the entries sorted from the beginning so finding where to add is faster.
lookup
Iterate over all the entries, check each name and return the phone if equal. A faster approach is to keep a Map where the key is the name.
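Since the assignment asks specifically for an insertion-sort style addInOrder and a binary-search lookup, here is a rough sketch of both. The PhoneEntry class below is a hypothetical stand-in (the real one is not shown in the question), and getName/getPhone are assumed accessor names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the assignment's PhoneEntry class.
class PhoneEntry {
    private final String name;
    private final String phone;
    PhoneEntry(String name, String phone) { this.name = name; this.phone = phone; }
    String getName() { return name; }
    String getPhone() { return phone; }
}

class PhoneBook {
    private final List<PhoneEntry> book = new ArrayList<>();

    // Insertion step of insertion sort: walk left past larger names, insert there.
    void addInOrder(PhoneEntry a) {
        int position = book.size();
        while (position > 0
                && book.get(position - 1).getName().compareTo(a.getName()) > 0) {
            position--;
        }
        book.add(position, a); // ArrayList shifts the larger entries to the right
    }

    // Binary search over the sorted list; returns null when the name is absent.
    String lookup(String name) {
        int low = 0;
        int high = book.size() - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = book.get(mid).getName().compareTo(name);
            if (cmp == 0) return book.get(mid).getPhone();
            if (cmp < 0) low = mid + 1; else high = mid - 1;
        }
        return null;
    }
}
```

Binary search only works because addInOrder keeps the list sorted by name at all times; entries added any other way would break lookup.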

Java HashMap containsKey function

Could somebody please tell me how the containsKey() function of HashMap works internally? Does it use the equals or the hashCode function to match the key? I am using String keys for a HashMap, and when I build the key dynamically, containsKey returns false. E.g. (just sample code, not the original code I am using in my application):
class employee {
employee(String name) {
return name;
}
}
class test {
HashMap hm = new HashMap();
hm.put("key1",new Employee("emp1"));
hm.put("key2",new Employee("emp2"));
hm.put("key3","emp4");
hm.put("new Employee("emp5")","emp4");
System.out.println(hm.containsKey("emp5"));
}
The key is an Employee object, not a string, in containsKey you have a string. That comparison will return false, because string "emp5" is not equal to an object Employee.
Here is a quote from containsKey doc:
Returns true if this map contains a mapping for the specified key. More formally, returns true if and only if this map contains a mapping for a key k such that (key==null ? k==null : key.equals(k))
Since in your case key is a string, 'equals' will return 'true' only if k is a string as well and its content is the same as that of key.
Your code has many errors; for example, this is invalid: hm.put("new Employee("emp5")","emp4");
Also use generic types with collections:
HashMap<String,employee> hm = new HashMap<String,employee>();
And name your class Employee, not employee: begin class names with a capital letter. Also, you are calling new Employee whereas your class name is employee.
According to the source for HashMap, it calls equals() on the keys (which in your case would mean equals for String) internally:
public boolean containsKey(Object key)
{
int idx = hash(key);
HashEntry<K, V> e = buckets[idx];
while (e != null)
{
if (equals(key, e.key))
return true;
e = e.next;
}
return false;
}
Your valid code (assuming you are not trying to achieve something unusual) should look like this :-
class Employee {
String name;
Employee(String name) {
this.name = name;
}
}
class Test {
public void hello() {
HashMap<String,Employee> hm = new HashMap<String,Employee>();
hm.put("key1", new Employee("emp1"));
hm.put("key2", new Employee("emp2"));
hm.put("key3", new Employee("emp4"));
hm.put("key4", new Employee("emp5"));
System.out.println(hm.containsKey("key4"));
}
}
Corrected Code:
HashMap hm= new HashMap();
hm.put("key1",new Employee("emp1"));
hm.put("key2",new Employee("emp2"));
hm.put("key3","emp4");
System.out.println(hm.containsKey("key1"));
This will return true.
You are saving Employee object against String keys. So you need to check the valid key. In your case emp5 is not used as a key while adding elements to hashmap.
For your second question:
It internally checks hashcode of the key first. If hashcodes are same it will check equals method.
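The "hash code first, then equals" behaviour can be observed with a small experiment. This is a sketch, not the asker's code: EmployeeKey is a hypothetical key class whose hashCode deliberately equals that of its name string.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key class; equals and hashCode are both based on the name.
final class EmployeeKey {
    private final String name;
    EmployeeKey(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof EmployeeKey && name.equals(((EmployeeKey) o).name);
    }

    @Override
    public int hashCode() { return name.hashCode(); }
}

class ContainsKeyDemo {
    public static void main(String[] args) {
        Map<Object, String> hm = new HashMap<>();
        hm.put(new EmployeeKey("emp5"), "emp4");

        // Same hash code and equals() returns true: the key is found.
        System.out.println(hm.containsKey(new EmployeeKey("emp5"))); // true
        // Same hash code, but a String never equals an EmployeeKey: not found.
        System.out.println(hm.containsKey("emp5")); // false
    }
}
```

The second lookup reaches the right bucket, yet still fails, because a matching hash code alone is not enough; equals has the final say.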
Assuming
employee(String name) {
return name;
}
is not a constructor but a method, this piece of code will not compile, because you are returning a String but didn't specify a return type for the method.
Moreover, in this line: hm.put("new Employee("emp5")","emp4");
you have specified the key as new Employee("emp5"), and you are searching with the key emp5 in containsKey(). Obviously it will return false, because
containsKey() returns true only if this map contains a mapping for the specified key.
Internally, a hash map can be implemented with an array of linked lists.
The key is passed to a routine (the hash) which gives back a number. The number is then divided by the size of the array, giving a remainder. That remainder is the index of the linked list you then traverse to see if any of the nodes exactly matches the key.
The advantages are that if you have a properly balanced hash function, and (let's say) an array of 32 items, you can quickly discard the searching of 31/32 (or +90%) of your possible values in a constant time operation.
Other means of implementation exist; however, they are computationally similar.
An example of a (very bad) hash algorithm for Strings might be to simply add up all the ASCII character values. Very good hash algorithms tend to give back an evenly distributed number based on the expected inputs, where incremental inputs do not incrementally fill adjacent buckets.
So, to find out if a hash map contains a key, get the result of the hash function on the key, and then walk down the correct linked list checking each entry for the key's presence.
In C, a "node" in the linked list.
struct Node {
char* key;
char* value;
struct Node* next;
};
In C, the "hashmap"
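struct HashMap {
int size;
struct Node** listRoots; /* array of pointers to the first node of each list */
};
The algorithm
int containsKey(struct HashMap* hashMap, char* key) {
unsigned int hash = hashFunc(key); /* hashFunc is assumed to be defined elsewhere */
struct Node* head = hashMap->listRoots[hash % hashMap->size];
while (head != 0) {
if (strcmp(head->key, key) == 0) { /* strcmp returns 0 on a match */
return 1;
}
head = head->next; /* advance, otherwise the loop never ends */
}
return 0;
}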
Keep in mind my C is a bit rusty, but hopefully, you'll get the idea.

Java Hash Table Implementation

In my implementation of a Hash Table, my hash function simply takes the value of the item I pass, call hashCode (inherited from the Object class), and modulo the size of the internal array. This internal array is an array of LinkedLists. Now if my LinkedLists become too long (and my efficiency begins to slip from O(1) to O(n)), I figured it would make sense to simply grow the size of the array. But that's where my problems lie, since I stated I hash the items I pass and modulo the size of the array (which has just changed). If I were to continue, wouldn't the hashes point to different indices in the array, thus lose the ability to refer to items in my hash table? How could I solve this?
You need the actual hash values for each of the items so that you can put them into the correct hash chain in the resized table. (Otherwise, as you observed, the items are liable to end up on the wrong chain and to not be locatable as a result.)
There are two ways to deal with this:
You could simply recalculate the hash value for each item as you add it to the new table.
You could keep a copy of the original hash values for each item in the hash chain. This is what the standard Java HashMap implementation does ... at least in the versions I've looked at.
(The latter is a time vs space trade-off that could pay off big time if your items have an expensive hashcode method. However, if you amortize over the lifetime of a hash table, this optimization does not alter the "big O" complexity of any of the public API methods ... assuming that your hash table resizing is exponential; e.g. you roughly double the table size each time.)
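The second option (caching each item's hash) might look roughly like the sketch below. The class name ResizingTable, the initial capacity of 4 and the "resize when size exceeds the bucket count" rule are assumptions made for illustration; this is not how java.util.HashMap is actually written.

```java
// Hypothetical chained table that stores each entry's hash code so a resize
// can re-bucket entries without calling hashCode() again.
class ResizingTable {
    private static class Entry {
        final Object key;
        final int hash;   // cached key.hashCode()
        Object value;
        Entry next;
        Entry(Object key, int hash, Object value, Entry next) {
            this.key = key; this.hash = hash; this.value = value; this.next = next;
        }
    }

    private Entry[] buckets = new Entry[4];
    private int size;

    void put(Object key, Object value) {
        int hash = key.hashCode();
        int i = Math.floorMod(hash, buckets.length);
        for (Entry e = buckets[i]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) { e.value = value; return; }
        }
        buckets[i] = new Entry(key, hash, value, buckets[i]);
        if (++size > buckets.length) resize(); // keep the average chain length near 1
    }

    Object get(Object key) {
        int hash = key.hashCode();
        for (Entry e = buckets[Math.floorMod(hash, buckets.length)]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) return e.value;
        }
        return null;
    }

    // Double the array and move every entry to the index computed
    // from its cached hash and the NEW length.
    private void resize() {
        Entry[] old = buckets;
        buckets = new Entry[old.length * 2];
        for (Entry head : old) {
            while (head != null) {
                Entry next = head.next;
                int i = Math.floorMod(head.hash, buckets.length);
                head.next = buckets[i];
                buckets[i] = head;
                head = next;
            }
        }
    }

    int capacity() { return buckets.length; }
}
```

Because get always takes the hash modulo the current array length, and resize re-buckets every entry with that same rule, items stay locatable after the array grows.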
package com.codewithsouma.hashtable;
import java.util.LinkedList;
public class HashTable {
private class Entry {
private int key;
private String value;
public Entry(int key, String value) {
this.key = key;
this.value = value;
}
}
LinkedList<Entry>[] entries = new LinkedList[5];
public void put(int key, String value) {
var entry = getEntry(key);
if (entry != null){
entry.value = value;
return;
}
getOrCreateBucket(key).add(new Entry(key,value));
}
public String get(int key) {
var entry = getEntry(key);
return (entry == null) ? null : entry.value;
}
public void remove(int key) {
var entry = getEntry(key);
if (entry == null)
throw new IllegalStateException();
getBucket(key).remove(entry);
}
private LinkedList<Entry> getBucket(int key){
return entries[hash(key)];
}
private LinkedList<Entry> getOrCreateBucket(int key){
var index = hash(key);
var bucket = entries[index];
if (bucket == null) {
entries[index] = new LinkedList<>();
bucket = entries[index];
}
return bucket;
}
private Entry getEntry(int key) {
var bucket = getBucket(key);
if (bucket != null) {
for (var entry : bucket) {
if (entry.key == key) return entry;
}
}
return null;
}
private int hash(int key) {
return Math.floorMod(key, entries.length); // floorMod avoids a negative index for negative keys
}
}

Java: Composite key in hashmaps

I would like to store a group of objects in a HashMap, where the key is a composite of two string values. Is there a way to achieve this?
I can simply concatenate the two strings, but I'm sure there is a better way to do this.
You could have a custom object containing the two strings:
class StringKey {
private String str1;
private String str2;
}
Problem is, you need to determine the equality test and the hash code for two such objects.
Equality could be the match on both strings and the hashcode could be the hashcode of the concatenated members (this is debatable):
class StringKey {
private String str1;
private String str2;
@Override
public boolean equals(Object obj) {
if(obj != null && obj instanceof StringKey) {
StringKey s = (StringKey)obj;
return str1.equals(s.str1) && str2.equals(s.str2);
}
return false;
}
@Override
public int hashCode() {
return (str1 + str2).hashCode();
}
}
You don't need to reinvent the wheel. Simply use Guava's HashBasedTable<R,C,V> implementation of the Table<R,C,V> interface for your need. Here is an example:
Table<String, String, Integer> table = HashBasedTable.create();
table.put("key-1", "lock-1", 50);
table.put("lock-1", "key-1", 100);
System.out.println(table.get("key-1", "lock-1")); //prints 50
System.out.println(table.get("lock-1", "key-1")); //prints 100
table.put("key-1", "lock-1", 150); //replaces 50 with 150
public int hashCode() {
return (str1 + str2).hashCode();
}
This seems to be a terrible way to generate the hashCode: Creating a new string instance every time the hash code is computed is terrible! (Even generating the string instance once and caching the result is poor practice.)
There are a lot of suggestions here:
How do I calculate a good hash code for a list of strings?
public int hashCode() {
final int prime = 31;
int result = 1;
for ( String s : strings ) {
result = result * prime + s.hashCode();
}
return result;
}
For a pair of strings, that becomes:
return string1.hashCode() * 31 + string2.hashCode();
That is a very basic implementation. Lots of advice through the link to suggest better tuned strategies.
Why not create a (say) Pair object, which contains the two strings as members, and then use this as the key ?
e.g.
public class Pair {
private final String str1;
private final String str2;
// this object should be immutable to reliably perform subsequent lookups
}
Don't forget about equals() and hashCode(). See this blog entry for more on HashMaps and keys, including a background on the immutability requirements. If your key isn't immutable, then you can change its components and a subsequent lookup will fail to locate it (this is why immutable objects such as String are good candidates for a key)
You're right that concatenation isn't ideal. For some circumstances it'll work, but it's often an unreliable and fragile solution (e.g. is AB/C a different key from A/BC ?).
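A complete version of such a key might look like the sketch below. The name StringPair and its field names are made up for illustration; Objects.hash and Objects.requireNonNull are standard java.util.Objects helpers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical immutable two-string key.
final class StringPair {
    private final String first;
    private final String second;

    StringPair(String first, String second) {
        this.first = Objects.requireNonNull(first);
        this.second = Objects.requireNonNull(second);
    }

    // Equal only when both components match, so ("AB","C") != ("A","BC").
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StringPair)) return false;
        StringPair p = (StringPair) o;
        return first.equals(p.first) && second.equals(p.second);
    }

    @Override
    public int hashCode() { return Objects.hash(first, second); }
}

class StringPairDemo {
    public static void main(String[] args) {
        Map<StringPair, Integer> map = new HashMap<>();
        map.put(new StringPair("AB", "C"), 1);
        map.put(new StringPair("A", "BC"), 2);
        System.out.println(map.get(new StringPair("AB", "C"))); // 1
        System.out.println(map.get(new StringPair("A", "BC"))); // 2
    }
}
```

With plain concatenation both keys would have been "ABC" and the second put would have replaced the first; here they remain separate entries.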
I have a similar case. All I do is concatenate the two strings separated by a tilde ( ~ ).
So when the client calls the service function to get the object from the map, it looks like this:
MyObject getMyObject(String key1, String key2) {
String cacheKey = key1 + "~" + key2;
return map.get(cacheKey);
}
It is simple, but it works.
I see that many people use nested maps. That is, to map Key1 -> Key2 -> Value (I use the computer science, aka Haskell currying, notation for a (Key1 x Key2) -> Value mapping, which takes two arguments and produces a value), you first supply the first key. This returns a (partial) map Key2 -> Value, which you then apply to the second key in the next step.
For instance,
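Map<File, Map<Integer, String>> table = new HashMap<>(); // maps (File, Integer) -> String
void add(File k1, Integer k2, String value) {
// fetch the inner map for k1, creating it on first use
Map<Integer, String> table2 = table.computeIfAbsent(k1, k -> new HashMap<>());
table2.put(k2, value);
}
String get(File k1, Integer k2) {
Map<Integer, String> table2 = table.get(k1);
return table2 == null ? null : table2.get(k2); // null when either key is missing
}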
I am not sure whether it is better than the plain composite key construction. You may comment on that.
Reading about the spaghetti/cactus stack I came up with a variant which may serve for this purpose, including the possibility of mapping your keys in any order so that map.lookup("a","b") and map.lookup("b","a") return the same element. It also works with any number of keys, not just two.
I use it as a stack for experimenting with dataflow programming, but here is a quick and dirty version which works as a multi-key map (it should be improved: Sets instead of arrays should be used to avoid looking up duplicated occurrences of a key)
public class MultiKeyMap <K,E> {
class Mapping {
E element;
int numKeys;
public Mapping(E element,int numKeys){
this.element = element;
this.numKeys = numKeys;
}
}
class KeySlot{
Mapping parent;
public KeySlot(Mapping mapping) {
parent = mapping;
}
}
class KeySlotList extends LinkedList<KeySlot>{}
class MultiMap extends HashMap<K,KeySlotList>{}
class MappingTrackMap extends HashMap<Mapping,Integer>{}
MultiMap map = new MultiMap();
public void put(E element, K ...keys){
Mapping mapping = new Mapping(element,keys.length);
for(int i=0;i<keys.length;i++){
KeySlot k = new KeySlot(mapping);
KeySlotList l = map.get(keys[i]);
if(l==null){
l = new KeySlotList();
map.put(keys[i], l);
}
l.add(k);
}
}
public E lookup(K ...keys){
MappingTrackMap tmp = new MappingTrackMap();
for(K key:keys){
KeySlotList l = map.get(key);
if(l==null)return null;
for(KeySlot keySlot:l){
Mapping parent = keySlot.parent;
Integer count = tmp.get(parent);
if(parent.numKeys!=keys.length)continue;
if(count == null){
count = parent.numKeys-1;
}else{
count--;
}
if(count == 0){
return parent.element;
}else{
tmp.put(parent, count);
}
}
}
return null;
}
public static void main(String[] args) {
MultiKeyMap<String,String> m = new MultiKeyMap<String,String>();
m.put("brazil", "yellow", "green");
m.put("canada", "red", "white");
m.put("USA", "red" ,"white" ,"blue");
m.put("argentina", "white","blue");
System.out.println(m.lookup("red","white")); // canada
System.out.println(m.lookup("white","red")); // canada
System.out.println(m.lookup("white","red","blue")); // USA
}
}
public static String fakeMapKey(final String... arrayKey) {
String[] keys = arrayKey;
if (keys == null || keys.length == 0)
return null;
if (keys.length == 1)
return keys[0];
String key = "";
for (int i = 0; i < keys.length; i++)
key += "{" + i + "}" + (i == keys.length - 1 ? "" : "{" + keys.length + "}");
keys = Arrays.copyOf(keys, keys.length + 1);
keys[keys.length - 1] = FAKE_KEY_SEPARATOR;
return MessageFormat.format(key, (Object[]) keys);
}
public static String FAKE_KEY_SEPARATOR = "~";
INPUT:
fakeMapKey("keyPart1","keyPart2","keyPart3");
OUTPUT: keyPart1~keyPart2~keyPart3
I’d like to mention two options that I don’t think were covered in the other answers. Whether they are good for your purpose you will have to decide yourself.
Map<String, Map<String, YourObject>>
You may use a map of maps, using string 1 as key in the outer map and string 2 as key in each inner map.
I do not think it’s a very nice solution syntax-wise, but it’s simple and I have seen it used in some places. It’s also supposed to be efficient in time and memory, while this shouldn’t be the main reason in 99 % of cases. What I don’t like about it is that we’ve lost the explicit information about the type of the key: it’s only inferred from the code that the effective key is two strings, it’s not clear to read.
Map<YourObject, YourObject>
This is for a special case. I have had this situation more than once, so it’s not more special than that. If your objects contain the two strings used as key and it makes sense to define object equality based on the two, then define equals and hashCode in accordance and use the object as both key and value.
One would have wished to use a Set rather than a Map in this case, but a Java HashSet doesn't provide any method to retrieve an object from a set based on an equal object. So we do need the map.
One liability is that you need to create a new object in order to do lookup. This goes for the solutions in many of the other answers too.
Link
Jerónimo López: Composite key in HashMaps on the efficiency of the map of maps.
