I am looking for a hash function for an integer array containing about 17 integers each. There are about 1000 items in the HashMap and I want the computation to be as fast as possible.
I am now kind of confused by so many hash functions to choose and I notice that most of them are designed for strings with different characters. So is there a hash function designed for strings with only numbers and quick to run?
Thanks for your patience!
You did not specify any requirements (except speed of calculation), but take a look at java.util.Arrays#hashCode. It should be fast, too, just iterating once over the array and combining the elements in an int calculation.
Returns a hash code based on the contents of the specified array. For any two non-null int arrays a and b such that Arrays.equals(a, b), it is also the case that Arrays.hashCode(a) == Arrays.hashCode(b).
The value returned by this method is the same value that would be obtained by invoking the hashCode method on a List containing a sequence of Integer instances representing the elements of a in the same order. If a is null, this method returns 0.
And the hashmap accepts an array of integer as the key.
Actually, no!
You could technically use int[] as a key in a HashMap in Java (you can use any kind of Object), but that won't work well, as arrays don't define a useful hashCode method (or a useful equals method). So the key will use object identity. Two arrays with identical content will be considered to be different from each-other.
You could use List<Integer>, which does implement hashCode and equals. But keep in mind that you must not mutate the list after setting it as a key. That would break the hashtable.
hashmap functions can be found in
https://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html
Creating a hashmap is easy.. it goes as
HashMap<Object, Integer> map = new HashMap<Object, Integer>();
Related
in my project I use HashMap in order to store some data, and I've recently discovered that when I mutate the keys of the HashMap, some unexpected wrong results may happen. For Example:
HashMap<ArrayList,Integer> a = new HashMap<>();
ArrayList list1 = new ArrayList<>();
a.put(list1, 1);
System.out.println(a.containsKey(new ArrayList<>())); // true
list1.add(5);
ArrayList list2 = new ArrayList<>();
list2.add(5);
System.out.println(a.containsKey(list2)); // false
Note that both a.keySet().iterator().next().hashCode() == list2.hashCode() and a.keySet().iterator().next().equals(list2) are true.
I cannot understand why it happens, referring to the fact that the two objects are equal and have the same hash-code. Do anyone know what is the cause of that, and if there is any other similar structure that allows mutation of the keys? Thanks.
Mutable keys are always a problem. Keys are to be considered mutable if the mutation could change their hashcode and/or the result of equals(). That being said, lists often generate their hashcodes and check equality based on their elements so they almost never are good candidates for map keys.
What is the problem in your example? When the key is added it is an empty list and thus produces a different hashcode than when it contains an element. Hence even though the hashcode of the key and list2 are the same after changing the key list you'll not find the element. Why? Simply because the map looks in the wrong bucket.
Example (simplified):
Let's start with a few assumptions:
an empty list returns a hashcode of 0
if the list contains the element 5 it returns the hashcode 5
our map has 16 buckets (default)
the bucket index is determined by hashcode % 16 (the number of our buckets)
If you now add the empty list it gets inserted into bucket 0 due to its hashcode.
When you do the lookup with list1 it will look in bucket 5 due to the hashcode of 5. Since that bucket is empty nothing will be found.
The problem is that your key list changes its hashcode and thus should be put into a different bucket but the map doesn't know this should happen (and doing so would probably cause a bunch of other problems).
According to the javadocs for Map:
Note: great care must be exercised if mutable objects are used as map
keys. The behavior of a map is not specified if the value of an object
is changed in a manner that affects equals comparisons while the
object is a key in the map. A special case of this prohibition is that
it is not permissible for a map to contain itself as a key. While it
is permissible for a map to contain itself as a value, extreme caution
is advised: the equals and hashCode methods are no longer well defined
on such a map.
Your lists are the keys and you're changing them. It would not be a problem if the contents of the list were not what determine the values for hash code and what is equal, however that is not your case. If you think about it, it doesn't make much sense to change the key of a map. The key is what identifies the value, and if that key changes, all bets are off.
The map inserts the value given the hash code upon insertion. When you search for it later, it uses the hash code of the parameter to determine if it is a hit. I think you'd find that had you inserted list1 with the value already inserted that you would see "true" printed out since list2.hashCode() would produce the same hash code as list1 when it was inserted.
That's because a HashMap uses the hashCode() Method of Object in combination with equals(Object obj) to check if this map contains an object.
See:
ArrayList<Integer> a = new ArrayList<>();
a.add(1);
System.out.println(a.hashCode());
a.add(2);
System.out.println(a.hashCode());
This example shows, that the hashCode of your ArrayList has changed.
You should never use a mutable object as a key in your hashmap.
So what basically going on when u put the list1 as key in line 3 is that the map calculates its hashCode which it would later compare in containsKey(someKey) .
but when u mutated the list1 in line 5 its hashCode is essentially changed.
so if u now do
System.out.println(a.containsKey(list1));
after line 5 it would say false
and if u do System.out.println(a.get(list1));
it would say null as its comparing two different hashCodes
Probably you didn't override equals() and hashCode() methods.
I have a question. It says that hashset in java doesn't mantain an order but looking at my program
public static void main(String[] args) {
HashSet<Integer> io=new HashSet<Integer>();
Integer io1=new Integer(4);
Integer io2=new Integer(5);
Integer io3=new Integer(6);
io.add(io2);
io.add(io3);
io.add(io1);
System.out.println(io);
}
and execute it it give me an ordered set everytime i run it. Why this is happening?
Another question is: if i implement a treeset (like i did in the previous program but instead of hashset using treeset and intead of Integer using my class) i have to implement compareto ?
HashSet doesn't maintain order, but it has to iterate over the elements in some order when you print them. HashSet is backed by a HashMap, which iterates over the elements in the order of the bins in which they are stored. In your simple example, 4,5,6 are mapped to bins 4,5,6 (since the hashCode of an integer is the value of the integer) so they are printed in ascending order.
If you tried to add 40,50,60 you would see a different order ([50, 40, 60]), since the default initial number of bins is 16, so the hash codes 40,50,60 will be mapped to bins 40%16 (8),50%16 (2) ,60%16 (12), so 50 is the first element iterated, followed by 50 and 60.
As for TreeSet<SomeCostumClass>, you can either implement Comparable<SomeCostumClass> in SomeCostumClass, or pass a Comparator<SomeCostumClass> to the constructor.
As per oracle docs, there is no guarantee that you will get the same order all the time.
This class implements the Set interface, backed by a hash table
(actually a HashMap instance). It makes no guarantees as to the
iteration order of the set; in particular, it does not guarantee that
the order will remain constant over time.
A HashSet keeps an internal hash table (https://en.wikipedia.org/wiki/Hash_table), which is driven by the result of the respective objects hashCode(). For most objects, the hashCode() function is deterministic, so the results of iterating a HashSet of the same elements will likely be the same. That does not mean it will be ordered. However, for Integer specifically, the hashCode() of the function returns the very integer itself, therefore, for a single-level hash table it will be ordered.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
Everywhere you can find answer what are differences:
Map is storing keys-values, it is not synchronized(not a thread safe), allows null values and only one null key, faster to get value because all values have unique key, etc.
Set - not sorted, slower to get value, storing only value, does not allow duplicates or null values I guess.
BUT what means Hash word (that is what they have the same). Is it something about hashing values or whatever I hope you can answer me clearly.
Both use hash value of the Object to store which internally uses hashCode(); method of Object class.
So if you are storing instances of your custom class then you need to override hashCode(); method.
HashSet and HashMap have a number of things in common:
The start of their name - which is a clue to the real similarity.
They use Hash Codes (from the hashCode method built into all Java objects) to quickly process and organize Objects.
They are both unordered collections - but both provide ordered varients (LinkedHashX to store objects in the order of addition)
There is also TreeSet/TreeMap to sort all objects present in the collection and keep them sorted. A comparison of TreeSet to TreeMap will find very similar differences and similarities to one between HashSet and HashMap.
They are also both impacted by the strengths and limitations of Hash algorithms in general.
Hashing is only effective if the objects have well behaved hash functions.
Hashing breaks entirely if equals and hashCode do not follow the correct contract.
Key objects in maps and objects in set should be immutable (or at least their hashCode and equals return values should never change) as otherwise behavior becomes undefined.
If you look at the Map API you can also see a number of other interesting connections - such as the fact that keySet and entrySet both return a Set.
None of the Java Collections are thread safe. Some of the older classes from other packages were but they have mostly been retired. For thread-safety look at the concurrent package for non-thread-safety look at the collections package.
Just look into HashSet source code and you will see that it uses HashMap. So they have the same properties of null-safety, synchronization etc:
public class HashSet<E>
...
private transient HashMap<E,Object> map;
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
/**
* Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
* default initial capacity (16) and load factor (0.75).
*/
public HashSet() {
map = new HashMap<>();
}
...
public boolean contains(Object o) {
return map.containsKey(o);
}
...
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
...
}
HashSet is like a HashMap where you don't care about the values but about the keys only.
So you care only if a given key K is in the set but not about the value V to which it is mapped (you can think of it as if V is a constant e.g. V=Boolean.TRUE for all keys in the HashSet). So HashSet has no values (V set). This is the whole difference from structural point of view. The hash part means that when putting elements into the structure Java first calls the hashCode method. See also http://en.wikipedia.org/wiki/Open_addressing to understand in general what happens under the hood.
The hash value is used to check faster if two objects are the same. If two objects have same hash, they can be equal or not equal (so they are then compared for equality with the equals method). But if they have different hashes they are different for sure and the check for equality is not needed. This doesn't mean that if two objects have same hash values they overwrite each other when they are stored in the HashSet or in the HashMap.
Both are not Thread safe and store values using hashCode(). Those are common facts. And another one is both are member of Java collection framework. But there are lots of variations between those two.
Hash regards the technique used to convert the key to an index. Back in the data strucutures class we used to learn how to construct a hash table, to do that you would need to get the strings that were inserted as values and convert them to a number to index an array used internally as the storing data structure.
One problem that was also very discussed was to find a hashing function that would incurr in minimum colision so that we won't have two different objects, with different keys sharing the same position.
So, the hash is about how the keys are processed to be stored. If we think about it for a while, there isn't a (real) way to index memory with strings, only with numbers, so to have a 2d structure like a table that is indexed by a string (or an object as you wish) you need to generate a number (or a hash) for that string and store the value in an array in this index. However, if you need the key "name" you would need a different array to, in the same index, store the key "name".
Cheers
The "HASH" word is common because both uses hashing mechanism. HashSet is actually implemented using HashMap, using dummy object instance on every entry of the Set. And thereby a wastage of 4 bytes for each entry.
I am trying to figure something out about hashing in java.
If i want to store some data in a hashmap for example, will it have some kind of underlying hashtable with the hashvalues?
Or if someone could give a good and simple explanation of how hashing work, I would really appreciate it.
HashMap is basically implemented internally as an array of Entry[]. If you understand what is linkedList, this Entry type is nothing but a linkedlist implementation. This type actually stores both key and value.
To insert an element into the array, you need index. How do you calculate index? This is where hashing function(hashFunction) comes into picture. Here, you pass an integer to this hashfunction. Now to get this integer, java gives a call to hashCode method of the object which is being added as a key in the map. This concept is called preHashing.
Now once the index is known, you place the element on this index. This is basically called as BUCKET , so if element is inserted at Entry[0], you say that it falls under bucket 0.
Now assume that the hashFunction returns you same index say 0, for another object that you wanted to insert as a key in the map. This is where equals method is called and if even equals returns true, it simple means that there is a hashCollision. So under this case, since Entry is a linkedlist implmentation, on this index itself, on the already available entry at this index, you add one more node(Entry) to this linkedlist. So bottomline, on hashColission, there are more than one elements at a perticular index through linkedlist.
The same case is applied when you are talking about getting a key from map. Based on index returned by hashFunction, if there is only one entry, that entry is returned otherwise on linkedlist of entries, equals method is called.
Hope this helps with the internals of how it works :)
Hash values in Java are provided by objects through the implementation of public int hashCode() which is declared in Object class and it is implemented for all the basic data types. Once you implement that method in your custom data object then you don't need to worry about how these are used in miscellaneous data structures provided by Java.
A note: implementing that method requires also to have public boolean equals(Object o) implemented in a consistent manner.
If i want to store some data in a hashmap for example, will it have some kind of underlying hashtable with the hashvalues?
A HashMap is a form of hash table (and HashTable is another). They work by using the hashCode() and equals(Object) methods provided by the HashMaps key type. Depending on how you want you keys to behave, you can use the hashCode / equals methods implemented by java.lang.Object ... or you can override them.
Or if someone could give a good and simple explanation of how hashing work, I would really appreciate it.
I suggest you read the Wikipedia page on Hash Tables to understand how they work. (FWIW, the HashMap and HashTable classes use "separate chaining with linked lists", and some other tweaks to optimize average performance.)
A hash function works by turning an object (i.e. a "key") into an integer. How it does this is up to the implementor. But a common approach is to combine hashcodes of the object's fields something like this:
hashcode = (..((field1.hashcode * prime) + field2.hashcode) * prime + ...)
where prime is a smallish prime number like 31. The key is that you get a good spread of hashcode values for different keys. What you DON'T want is lots of keys all hashing to the same value. That causes "collisions" and is bad for performance.
When you implement the hashcode and equals methods, you need to do it in a way that satisfies the following constraints for the hash table to work correctly:
1. O1.equals(o2) => o1.hashcode() == o2.hashcode()
2. o2.equals(o2) == o2.equals(o1)
3. The hashcode of an object doesn't change while it is a key in a hash table.
It is also worth noting that the default hashCode and equals methods provided by Object are based on the target object's identity.
"But where is the hash values stored then? It is not a part of the HashMap, so is there an array assosiated to the HashMap?"
The hash values are typically not stored. Rather they are calculated as required.
In the case of the HashMap class, the hashcode for each key is actually cached in the entry's Node.hash field. But that is a performance optimization ... to make hash chain searching faster, and to avoid recalculating hashes if / when the hash table is resized. But if you want this level of understanding, you really need to read the source code rather than asking Questions.
This is the most fundamental contract in Java: the .equals()/.hashCode() contract.
The most important part of it here is that two objects which are considered .equals() should return the same .hashCode().
The reverse is not true: objects not considered equal may return the same hash code. But it should be as rare an occurrence as possible. Consider the following .hashCode() implementation, which, while perfectly legal, is as broken an implementation as can exist:
#Override
public int hashCode() { return 42; } // legal!!
While this implementation obeys the contract, it is pretty much useless... Hence the importance of a good hash function to begin with.
Now: the Set contract stipulates that a Set should not contain duplicate elements; however, the strategy of a Set implementation is left... Well, to the implementation. You will notice, if you look at the javadoc of Map, that its keys can be retrieved by a method called .keySet(). Therefore, Map and Set are very closely related in this regard.
If we take the case of a HashSet (and, ultimately, HashMap), it relies on .equals() and .hashCode(): when adding an item, it first calculates this item's hash code, and according to this hash code, attemps to insert the item into a given bucket. In contrast, a TreeSet (and TreeMap) relies on the natural ordering of elements (see Comparable).
However, if an object is to be inserted and the hash code of this object would trigger its insertion into a non empty hash bucket (see the legal, but broken, .hashCode() implementation above), then .equals() is used to determine whether that object is really unique.
Note that, internally, a HashSet is a HashMap...
Hashing is a way to assign a unique code for any variable/object after applying any function/algorithm on its properties.
HashMap stores key-value pair in Map.Entry static nested class implementation.
HashMap works on hashing algorithm and uses hashCode() and equals() method in put and get methods.
When we call put method by passing key-value pair, HashMap uses Key hashCode() with hashing to find out
the index to store the key-value pair. The Entry is stored in the LinkedList, so if there are already
existing entry, it uses equals() method to check if the passed key already exists, if yes it overwrites
the value else it creates a new entry and store this key-value Entry.
When we call get method by passing Key, again it uses the hashCode() to find the index
in the array and then use equals() method to find the correct Entry and return it’s value.
Below image will explain these detail clearly.
The other important things to know about HashMap are capacity, load factor, threshold resizing.
HashMap initial default capacity is 16 and load factor is 0.75. Threshold is capacity multiplied
by load factor and whenever we try to add an entry, if map size is greater than threshold,
HashMap rehashes the contents of map into a new array with a larger capacity.
The capacity is always power of 2, so if you know that you need to store a large number of key-value pairs,
for example in caching data from database, it’s good idea to initialize the HashMap with correct capacity
and load factor.
I have been reading/researching the reason why HashMapis faster than HashSet.
I am not quite understanding the following statements:
HashMap is faster than HashSet because the values are associated to a unique key.
In HashSet, member object is used for calculating hashcode value which can be same for two objects so equals() method is used to check for equality. If it returns false, that means the two objects are different. In HashMap, the hashcode value is calculated using the key object.
The HashMap hashcode value is calculated using the key object. Here, the member object is used to calculate the hashcode, which can be the same for two objects, so equals() method is used to check for equality. If it returns false, that means the two objects are different.
To conclude my question:
I thought HashMap and HashSet calculate the hashcode in the same way. Why are they different?
Can you provide a concrete example how HashSet and HashMap calculating the hashcode differently?
I know what a "key object" is, but what does it mean by "member object"?
HashMap can do the same things as HashSet, and faster. Why do we need HashSet? Example:
HashMap <Object1, Boolean>= new HashMap<Object1, boolean>();
map.put("obj1",true); => exist
map.get("obj1"); =>if null = not exist, else exist
Performance:
If you look at the source code of HashSet (at least JDK 6, 7 and 8), it uses HashMap internally, so it basically does exactly what you are doing with sample code.
So, if you need a Set implementation, you use HashSet, if you need a Map - HashMap. Code using HashMap instead of HashSet will have exactly the same performance as using HashSet directly.
Choosing the right collection
Map - maps keys to values (associative array) - http://en.wikipedia.org/wiki/Associative_array.
Set - a collection that contains no duplicate elements - http://en.wikipedia.org/wiki/Set_(computer_science).
If the only thing you need your collection for is to check if an element is present in there - use Set. Your code will be cleaner and more understandable to others.
If you need to store some data for your elements - use Map.
None of these answers really explain why HashMap is faster than HashSet. They both have to calculate the hashcode, but think about the nature of the key of a HashMap - it is typically a simple String or even a number. Calculating the hashcode of that is much faster than the default hashcode calculation of an entire object. If the key of the HashMap was the same object as that stored in a HashSet, there would be no real difference in performance. The difference comes in the what sort of object is the HashMap's key.