Combining Hash of String and Hash of Long

Combining Hash of String and Hash of Long - java

I have the following java class:
public class Person{
String name; //a unique name
Long DoB; //a unique time
.
.
.
#Override
public int hashCode(){
return name.hashCode() + DoB.hashCode();
}
}
Is my hashCode method correct (i.e. would it return a unique number of all combinations.
I have a feeling I'm missing something here.

You could let java.util.Arrays do it for you:
return Arrays.hashCode(new Object[]{ name, DoB });

You might also want to use something more fluent and more NPE-bulletproof like Google Guava:
#Override
public int hashCode(){
return Objects.hashCode(name, DoB);
}
#Override
public boolean equals(Object o) {
if ( this == o ) {
return true;
}
if ( o == null || o.getClass() != Person.class ) {
return false;
}
final Person that = (Person) o;
return Objects.equal(name, that.name) && Objects.equal(DoB, that.DoB);
}
Edit:
IntelliJ IDEA and Eclipse can generate more efficient hashCode() and equals().

Aside for the obvious, which is, you might want to implement the equals method as well...
Summing two hash codes has the very small risk of overflowing int
The sum itself seems like a bit of a weak methodology to provide unique hash codes. I would instead try some bitwise manipulation and use a seed.

See Bloch's Effective Java #9.
But you should start with an initial value (so that subsequent zero values are significant), and combine the fields that apply to the result along with a multiplier so that order is significant (so that similar classes will have much different hashes.)
Also, you will have to treat things like long fields and Strings a little different. e.g., for longs:
(int) (field ^ (field>>>32))
So, this means something like:
#Override public int hashCode() {
int result = 17;
result += name.hashCode() == null ? 0 : name.hashCode();
result = 31 * result + (int) (DoB ^ (DoB >>> 32));
return result;
}
31 is slightly magic, but odd primes can make it easier for the compiler to optimize the math to shift-subtraction. (Or you can do the shift-subtraction yourself, but why not let the compiler do it.)

usually a hashcode is build like so:
#Override
public int hashCode(){
return name.hashCode() ^ DoB.hashCode();
}
but the important thing to remember when doing a hashcode method is the use of it. the use of hashcode method is to put different object in different buckets in a hashtable or other collection using hashcode. as such, it's impotent to have a method that gives different answers to different objects at a low run time but doesn't have to be different for every item, though it's better that way.
This hash is used by other code when storing or manipulating the
instance – the values are intended to be evenly distributed for varied
inputs in order to use in clustering. This property is important to
the performance of hash tables and other data structures that store
objects in groups ("buckets") based on their computed hash values
and
The general contract for overridden implementations of this method is
that they behave in a way consistent with the same object's equals()
method: that a given object must consistently report the same hash
value (unless it is changed so that the new version is no longer
considered "equal" to the old), and that two objects which equals()
says are equal must report the same hash value.

Your hash code implementation is fine and correct. It could be better if you follow any of the suggestions other people have made, but it satisfies the contract for hashCode, and collisions aren't particularly likely, though they could be made less likely.

Related

Uses of hashcode in Java apart from hashing collections [duplicate]

In Java, obj.hashCode() returns some value. What is the use of this hash code in programming?

hashCode() is used for bucketing in Hash implementations like HashMap, HashTable, HashSet, etc.
The value received from hashCode() is used as the bucket number for storing elements of the set/map. This bucket number is the address of the element inside the set/map.
When you do contains() it will take the hash code of the element, then look for the bucket where hash code points to. If more than 1 element is found in the same bucket (multiple objects can have the same hash code), then it uses the equals() method to evaluate if the objects are equal, and then decide if contains() is true or false, or decide if element could be added in the set or not.

From the Javadoc:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)

hashCode() is a function that takes an object and outputs a numeric value. The hashcode for an object is always the same if the object doesn't change.
Functions like HashMap, HashTable, HashSet, etc. that need to store objects will use a hashCode modulo the size of their internal array to choose in what "memory position" (i.e. array position) to store the object.
There are some cases where collisions may occur (two objects end up with the same hashcode), and that, of course, needs to be solved carefully.

The value returned by hashCode() is the object's hash code, which is the object's memory address in hexadecimal.
By definition, if two objects are equal, their hash code must also be equal. If you override the equals() method, you change the way two objects are equated and Object's implementation of hashCode() is no longer valid. Therefore, if you override the equals() method, you must also override the hashCode() method as well.
This answer is from the java SE 8 official tutorial documentation

A hashcode is a number generated from any object.
This is what allows objects to be stored/retrieved quickly in a Hashtable.
Imagine the following simple example:
On the table in front of you. you have nine boxes, each marked with a number 1 to 9. You also have a pile of wildly different objects to store in these boxes, but once they are in there you need to be able to find them as quickly as possible.
What you need is a way of instantly deciding which box you have put each object in. It works like an index. you decide to find the cabbage so you look up which box the cabbage is in, then go straight to that box to get it.
Now imagine that you don't want to bother with the index, you want to be able to find out immediately from the object which box it lives in.
In the example, let's use a really simple way of doing this - the number of letters in the name of the object. So the cabbage goes in box 7, the pea goes in box 3, the rocket in box 6, the banjo in box 5 and so on.
What about the rhinoceros, though? It has 10 characters, so we'll change our algorithm a little and "wrap around" so that 10-letter objects go in box 1, 11 letters in box 2 and so on. That should cover any object.
Sometimes a box will have more than one object in it, but if you are looking for a rocket, it's still much quicker to compare a peanut and a rocket, than to check a whole pile of cabbages, peas, banjos, and rhinoceroses.
That's a hash code. A way of getting a number from an object so it can be stored in a Hashtable. In Java, a hash code can be any integer, and each object type is responsible for generating its own. Lookup the "hashCode" method of Object.
Source - here

Although hashcode does nothing with your business logic, we have to take care of it in most cases. Because when your object is put into a hash based container(HashSet, HashMap...), the container puts/gets the element's hashcode.

hashCode() is a unique code which is generated by the JVM for every object creation.
We use hashCode() to perform some operation on hashing related algorithm like Hashtable, Hashmap etc..
The advantages of hashCode() make searching operation easy because when we search for an object that has unique code, it helps to find out that object.
But we can't say hashCode() is the address of an object. It is a unique code generated by JVM for every object.
That is why nowadays hashing algorithm is the most popular search algorithm.

One of the uses of hashCode() is building a Catching mechanism.
Look at this example:
class Point
{
public int x, y;
public Point(int x, int y)
{
this.x = x;
this.y = y;
}
#Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Point point = (Point) o;
if (x != point.x) return false;
return y == point.y;
}
#Override
public int hashCode()
{
int result = x;
result = 31 * result + y;
return result;
}
class Line
{
public Point start, end;
public Line(Point start, Point end)
{
this.start = start;
this.end = end;
}
#Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Line line = (Line) o;
if (!start.equals(line.start)) return false;
return end.equals(line.end);
}
#Override
public int hashCode()
{
int result = start.hashCode();
result = 31 * result + end.hashCode();
return result;
}
}
class LineToPointAdapter implements Iterable<Point>
{
private static int count = 0;
private static Map<Integer, List<Point>> cache = new HashMap<>();
private int hash;
public LineToPointAdapter(Line line)
{
hash = line.hashCode();
if (cache.get(hash) != null) return; // we already have it
System.out.println(
String.format("%d: Generating points for line [%d,%d]-[%d,%d] (no caching)",
++count, line.start.x, line.start.y, line.end.x, line.end.y));
}

How are hashcode() and compareTo() related?

I've found this class definition:
class TwoTuple28<A,B> implements Comparable {
// ...
public int hashCode() {
int result = 17;
result = result * 37 + first.hashCode();
result = result * 37 + second.hashCode();
return result;
}
public int compareTo(Object o) {
if(!(o instanceof TwoTuple28)) throw new ClassCastException();
TwoTuple28 t = (TwoTuple28)o;
return (this.hashCode() - t.hashCode() < 0) ? -1 :
((this.hashCode() - t.hashCode() > 0 ? 1 : 0));
}
Could you please explain me, why did the developer use hashCode() into compareTo()? How are they related? Isn't it a wrong way?

In short, this is a very bad idea. It sort of works, but will fail in ways which would be easily missed in testing.
The purpose of comparison is to say when one object is higher, lower or equal to another. When two object are equal, they are a considered a duplicate for ConcurrentSkipListMap, TreeMap and TreeSet which means that two objects with the same hashCode in this case would be considered a duplicate and ignored. How likely is two object with the same hashCode? if you have a collection of tens of thousands, you are highly likely to have duplicates even with a much better hashCode than the one above.
BTW, HashMap and HashSet now use compareTo when there are collisions and even in these collections a poor compareTo can mean keys/elements disappear.
The safe way to implement this method is to assume the first and second fields are Comparable, otherwise you don't have a basis to compare them.
On a related note, a puzzle for you.
Write a program to print strings of words which have a hashCode() of 0. You should be able to generate thousands in less than ten seconds.

Hashset allows duplicates?

This question surely isn't a new one, but I didn't find any helpful answer anywhere.
As you can see in the code below, the equals and hashcode methods are overriden, but it still allows duplicates. The Hashcode has been generated automatically by Netbeans.
#Override
public boolean equals(Object o)
{
TaskDetails other = (TaskDetails) o;
if ( (id_subtask == other.id_subtask)
&& ((date.compareTo(other.date)) == 0) )
{
System.err.println("Duplicate Entry"+id_subtask+" + "+other.id_subtask);
return true;
}
else
{
System.out.println("Good!" +id_subtask+" + "+other.id_subtask);
return false;
}
}
#Override
public int hashCode() {
int hash = 7;
hash = 71 * hash + this.id_subtask;
hash = 71 * hash + this.id_team_member;
hash = 71 * hash + Float.floatToIntBits(this.nb_hours);
hash = 71 * hash + (this.date != null ? this.date.hashCode() : 0);
hash = 71 * hash + (this.comment != null ? this.comment.hashCode() : 0);
hash = 71 * hash + (this.subtask_name != null ? this.subtask_name.hashCode() : 0);
System.out.println("Hash : "+hash + "Subtask : " + id_subtask);
return hash;
}
This the code used to add an entry into the hashset :
TaskDetails newTaskDetails = new TaskDetails
(
s.getId_subtask(),
mus.teamMember.getId_team_member(),
f,
mysqlFormat.format(caldate),
c.substring(0, Math.min(c.length(), 100)),
s.getName_subtask()
);
allTasks.add(newTaskDetails);
(allTasks being the Hashset)
This code is used in function A and B.
If only function A is executed, it works fine. If function B is executed after function A (so the code above is executed twice), then the hashset suddenly accepts duplicates, even though system.err is triggered saying there is a duplicate entry?
Is there a flaw in the code, or am I just missing something?
Thanks for the help!

you are using 2 fields to consider 2 objects to be "equal", but you are using more than 2 fields to construct the hashcode. your hashCode() method cannot be more specific than your equals() method. as a good rule of thumb, your hashCode() method should not use any fields that your equals() method does not use (it can use fewer however). to put it more technically, if 2 objects are "equal" they must have the same hashcode (the reverse is not required).

You are violating the consistency requirement between hashCode() and equals(). If two objects are equal according to equals(), they must also have the same hash. Because your equals only considers two fields, and hashCode considers more, this requirement is not met.

Your problem is that the implementation of hashCode() does not match equals(). Both methods must use the same attributes of your object.
It's likely in your implementation that the hashCode() is different even if equals() evaluates to true. In this case (different hashCodes) the objects are different for the HashMap.
Please correct your implementations to use the same attributes. Then the error should vanish.

From the javadoc of Object
If two objects are equal according to the equals(Object) method, then
calling the hashCode method on each of the two objects must produce the
same integer result.
Your hashcodes are different for two objects that are equal according to the equals(Object) method, so the other code HashSet is going to make the wrong assumptions, and return the wrong results.
Some code is written in a manner that they depend on other objects honoring "contracts". Your class doesn't honor the Object contract, so nothing in collections can be assumed to work, as collections requires that the Object contracts not be broken.

This is a duplicate question, see my previous answer.
The behaviour where a java.util.HashSet allows duplicates is caused when the hash code of the objects in the java.util.HashSet can change.
This typically happens when an object's hash code is constructed from mutable fields.

What is the use of hashCode in Java?

In Java, obj.hashCode() returns some value. What is the use of this hash code in programming?

hashCode() is used for bucketing in Hash implementations like HashMap, HashTable, HashSet, etc.
The value received from hashCode() is used as the bucket number for storing elements of the set/map. This bucket number is the address of the element inside the set/map.
When you do contains() it will take the hash code of the element, then look for the bucket where hash code points to. If more than 1 element is found in the same bucket (multiple objects can have the same hash code), then it uses the equals() method to evaluate if the objects are equal, and then decide if contains() is true or false, or decide if element could be added in the set or not.

From the Javadoc:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)

hashCode() is a function that takes an object and outputs a numeric value. The hashcode for an object is always the same if the object doesn't change.
Functions like HashMap, HashTable, HashSet, etc. that need to store objects will use a hashCode modulo the size of their internal array to choose in what "memory position" (i.e. array position) to store the object.
There are some cases where collisions may occur (two objects end up with the same hashcode), and that, of course, needs to be solved carefully.

The value returned by hashCode() is the object's hash code, which is the object's memory address in hexadecimal.
By definition, if two objects are equal, their hash code must also be equal. If you override the equals() method, you change the way two objects are equated and Object's implementation of hashCode() is no longer valid. Therefore, if you override the equals() method, you must also override the hashCode() method as well.
This answer is from the java SE 8 official tutorial documentation

Although hashcode does nothing with your business logic, we have to take care of it in most cases. Because when your object is put into a hash based container(HashSet, HashMap...), the container puts/gets the element's hashcode.

hashCode() is a unique code which is generated by the JVM for every object creation.
We use hashCode() to perform some operation on hashing related algorithm like Hashtable, Hashmap etc..
The advantages of hashCode() make searching operation easy because when we search for an object that has unique code, it helps to find out that object.
But we can't say hashCode() is the address of an object. It is a unique code generated by JVM for every object.
That is why nowadays hashing algorithm is the most popular search algorithm.

One of the uses of hashCode() is building a Catching mechanism.
Look at this example:
class Point
{
public int x, y;
public Point(int x, int y)
{
this.x = x;
this.y = y;
}
#Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Point point = (Point) o;
if (x != point.x) return false;
return y == point.y;
}
#Override
public int hashCode()
{
int result = x;
result = 31 * result + y;
return result;
}
class Line
{
public Point start, end;
public Line(Point start, Point end)
{
this.start = start;
this.end = end;
}
#Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Line line = (Line) o;
if (!start.equals(line.start)) return false;
return end.equals(line.end);
}
#Override
public int hashCode()
{
int result = start.hashCode();
result = 31 * result + end.hashCode();
return result;
}
}
class LineToPointAdapter implements Iterable<Point>
{
private static int count = 0;
private static Map<Integer, List<Point>> cache = new HashMap<>();
private int hash;
public LineToPointAdapter(Line line)
{
hash = line.hashCode();
if (cache.get(hash) != null) return; // we already have it
System.out.println(
String.format("%d: Generating points for line [%d,%d]-[%d,%d] (no caching)",
++count, line.start.x, line.start.y, line.end.x, line.end.y));
}

In Java, why must equals() and hashCode() be consistent?

If I override either method on a class, it must make sure that if A.equals(B) == true then A.hashCode() == B.hashCode must also be true.
Can someone show me a simple example where if this is violated, it'll cause a problem? I think it has something to do with if you use that class as the type of keys to Hashmap?

Sure:
public class Test {
private final int m, n;
public Test(int m, int n) {
this.m = m;
this.n = n;
}
public int hashCode() { return n * m; }
public boolean equals(Object ob) {
if (ob.getClass() != Test.class) return false;
Test other = (Test)ob;
return m == other.m;
}
}
with:
Set<Test> set = new HashSet<Test>();
set.put(new Test(3,4));
boolean b = set.contains(new Test(3, 10)); // false
Technically that should be true because m == 3 in both cases.
In general a HashMap works like this: it has a variable number of what are commonly called "buckets". The number of buckets can change over time (as entries are added and removed) but it is always a power of 2.
Let's say a given HashMap has 16 buckets. When you call put() to add an entry, the hashCode() of the key is calculated and then a mask is taken depending on the size of the buckets. If you (bitwise) AND the hashCode() with 15 (0x0F) you will get the last 4 bits, equaling a number between 0 and 15 inclusive:
int factor = 4;
int buckets = 1 << (factor-1) - 1; // 16
int mask = buckets - 1; // 15
int code = key.hashCode();
int dest = code & mask; // a number from 0 to 15 inclusive
Now if there is already an entry in that bucket you have what's called a collision. There are multiple ways of dealing with this but the one used by HashMap (and is probably the most common overall) is bucketing. All the entries with the same masked hashCode are put in a list of some kind.
So to find if a given key is in the map already:
Calculate the masked hash code;
Find the appropriate bucket;
If it's empty, key not found;
If is isn't empty, loop through all entries in the bucket checking equals().
Looking through a bucket is a linear (O(n)) operation but it's on a small subset. The hashcode bucket determination is essentially constant (O(1)). If buckets are sufficiently small then access to a HashMap is usually described as "near O(1)".
You can make a couple of observations about this.
Firstly, if you have a bunch of objects that all return 42 as their hash code a HashMap will still work but it will operate as an expensive list. Access will be O(n) (as everything will be in the same bucket regardless of the number of buckets). I've actually been asked this in an interview.
Secondly, returning to your original point, if two objects are equal (meaning a.equals(b) == b.equals(a) == true) but have different hash codes then the HashMap will go looking in (probably) the wrong bucket resulting in unpredictable and undefined behaviour.

This is discussed in the Item 8: Always override hashCode when you override equals of Joshua Bloch's Effective Java:
A common source of bugs is the failure to override the hashCode method. You must
override hashCode in every class that overrides equals. Failure to do so will
result in a violation of the general contract for Object.hashCode, which will pre-
vent your class from functioning properly in conjunction with all hash-based collec-
tions, including HashMap, HashSet, and Hashtable.
Here is the contract, copied from the
java.lang.Object specification:
Whenever it is invoked on the same object more than once during an execution of an application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
The key provision that is violated when you fail to override hashCode is
the second one: Equal objects must have equal hash codes. Two distinct
instances may be logically equal according to the class’s equals method, but to
the Object class’s hashCode method, they’re just two objects with nothing much
in common. Therefore object’s hashCode method returns two seemingly random
numbers instead of two equal numbers as required by the contract.
For example, consider the following simplistic PhoneNumber class, whose
equals method is constructed according to the recipe in Item 7:
public final class PhoneNumber {
private final short areaCode;
private final short exchange;
private final short extension;
public PhoneNumber(int areaCode, int exchange,
int extension) {
rangeCheck(areaCode, 999, "area code");
rangeCheck(exchange, 999, "exchange");
rangeCheck(extension, 9999, "extension");
this.areaCode = (short) areaCode;
this.exchange = (short) exchange;
this.extension = (short) extension;
}
private static void rangeCheck(int arg, int max,
String name) {
if (arg < 0 || arg > max)
throw new IllegalArgumentException(name +": " + arg);
}
public boolean equals(Object o) {
if (o == this)
return true;
if (!(o instanceof PhoneNumber))
return false;
PhoneNumber pn = (PhoneNumber)o;
return pn.extension == extension &&
pn.exchange == exchange &&
pn.areaCode == areaCode;
}
// No hashCode method!
... // Remainder omitted
}
Suppose you attempt to use this class
with a HashMap:
Map m = new HashMap();
m.put(new PhoneNumber(408, 867, 5309), "Jenny");
At this point, you might expect
m.get(new PhoneNumber(408 , 867,
5309)) to return "Jenny", but it
returns null. Notice that two PhoneNumber instances are
involved: One is used for insertion
into the HashMap, and a second, equal,
instance is used for (attempted)
retrieval. The PhoneNumber class’s
failure to override hashCode causes
the two equal instances to have
unequal hash codes, in violation of
the hashCode contract. Therefore the
get method looks for the phone number
in a different hash bucket from the
one in which it was stored by the put
method. Fixing this problem is as
simple as providing a proper hashCode
method for the PhoneNumber class.
[...]
See the Chapter 3 for the full content.

Containers like HashSet rely on the hash function to determine where to put it, and where to get it from when asked for it. If A.equals(B), then a HashSet is expecting A to be in the same place as B. If you put A in with value V, and look up B, you should expect to get V back (since you've said A.equals(B)). But if A.hashcode() != B.hashcode(), then the hashset may not find where you put it.

Here's a little example:
Set<Foo> myFoos = new HashSet<Foo>();
Foo firstFoo = new Foo(123,"Alpha");
myFoos.add(firstFoo);
// later in the processing you get another Foo from somewhere
Foo someFoo = //use imagination here...;
// maybe you get it from a database... and it's equal to Foo(123,"Alpha)
if (myFoos.contains(someFoo)) {
// maybe you win a million bucks.
}
So, imagine that the hashCode that gets created for firstFoo is 99999 and it winds up at a specific spot in the myFoos HashSet. Later when you get the someFoo and you look for it in the myFoos HashSet, it needs to generate the same hashCode so you can find it.

It's exactly because of hash tables.
Because of the possibility of hash code collisions, hash tables need to check identity as well, otherwise the table can't determine if it found the object it was looking for, or one with the same hash code. So every get() in a hash table calls key.equals(potentialMatch) before returning a value.
If equals() and hashCode() are inconsistent you can get very inconsistent behavior. Say for two objects, a and b, a.equals(b) returns true, but a.hashCode() != b.hashCode(). Insert a and a HashSet will return false for .contains(b), but a List created from that set will return true (because the list doesn't use hash codes).
HashSet set = new HashSet();
set.add(a);
set.contains(b); // false
new ArrayList(set).contains(b); // true
Obviously, that could be bad.

The idea behind this is that two objects are "equal" if all of their fields have equal values. If all of fields have equal values, the two objects should have the same hash value.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.