I have three fields, namely
Number1
Number2
Time
I am trying to write a function in Java that returns a unique hash value (long needs to be the return type of the hash) for the above fields. This hash would then be used to store database rows corresponding to the above-mentioned fields in a HashSet. I am new to writing hash code functions; can someone please review what I have?
public class HashCode {
    private long Number1;
    private long Number2;
    String Time;

    public HashCode(long Number1, long Number2, String Time) {
        this.Number1 = Number1;
        this.Number2 = Number2;
        this.Time = Time;
    }

    public long getHashCode() {
        long hash = 3;
        hash = 47 * hash + (long) (this.Number1 ^ (this.Number1 >>> 32));
        hash = 47 * hash + (long) (this.Number2 ^ (this.Number2 >>> 32));
        hash = 47 * hash + (this.Time != null ? this.Time.hashCode() : 0);
        return hash;
    }
}
You can just use HashCodeBuilder from commons-lang and not have to worry about doing this by hand anymore.
@Override
public int hashCode() {
    // you pick a hard-coded, randomly chosen, non-zero, odd number
    // ideally different for each class
    return new HashCodeBuilder(17, 37).
            append(Number1).
            append(Number2).
            append(Time).
            toHashCode();
}
btw, it's convention in Java for variable names to start with a lowercase letter. You're going to find it confusing to name variables things like Number1, Number2, etc., as people will confuse them with the names of types (such as String, Number, Long, etc.).
I take it this is meant to be a special version of hashCode. Otherwise you would need to override hashCode() rather than define a new method; containers like HashSet will not call your own hash method.
For your specialized long version, you do not need the xor (^) and shift, because the hash is already a long. Just use the long value directly.
String.hashCode() returns an int, not a long, so it will not "use" all of your long's range. You could duplicate the hashCode of String with long arithmetic for your purpose.
Everything else looks good.
(By the way, member names should start with a lowercase letter, and Time should be private as well.)
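A minimal sketch of what that could look like, reusing the field names from the question; the long-based string hash below is just one way to "duplicate the hashCode of String with longs" and is not an existing library method:

public long getHashCode() {
    long hash = 3;
    hash = 47 * hash + this.Number1;               // long field used directly, no xor/shift needed
    hash = 47 * hash + this.Number2;
    hash = 47 * hash + longHashOfString(this.Time);
    return hash;
}

// Mirrors the polynomial form of String.hashCode() but with long arithmetic,
// so the full 64-bit range is used (hypothetical helper, not part of the JDK).
private static long longHashOfString(String s) {
    if (s == null) {
        return 0L;
    }
    long h = 0L;
    for (int i = 0; i < s.length(); i++) {
        h = 31 * h + s.charAt(i);
    }
    return h;
}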
The accepted answer in Best implementation for hashCode method gives a seemingly good method for finding hash codes, but I'm new to hash codes, so I don't quite know what to do.
For 1), does it matter what nonzero value I choose? Is 1 just as good as other numbers such as the prime 31?
For 2), do I add each value to c? What if I have two fields that are both a long, int, double, etc?
Did I interpret it right in this class:
public class MyClass {
    long a, b, c; // these are the only fields
    // some code and methods

    public int hashCode() {
        return 37 * (37 * ((int) (a ^ (a >>> 32))) + (int) (b ^ (b >>> 32)))
                + (int) (c ^ (c >>> 32));
    }
}
The value is not important; it can be whatever you want. Prime numbers will result in a better distribution of the hashCode values, so they are preferred.
You do not necessarily have to add them; you are free to implement whatever algorithm you want, as long as it fulfills the hashCode contract:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
There are some algorithms which can be considered bad hashCode implementations, and simply adding up the attribute values is one of them. The reason is that if you have a class with two fields, Integer a and Integer b, and your hashCode() just sums up these values, then the distribution of the hashCode values is highly dependent on the values your instances store. For example, if most values of a are between 0-10 and most values of b are between 0-10, then the hashCode values are between 0-20. This implies that if you store instances of this class in e.g. a HashMap, numerous instances will be stored in the same bucket (because numerous instances with different a and b values but the same sum will end up in the same bucket). This has a bad impact on the performance of operations on the map, because during a lookup all the elements in the bucket will be compared using equals().
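A rough, self-contained demonstration of that distribution problem (a hypothetical two-field example with small value ranges, not code from the question):

import java.util.HashSet;
import java.util.Set;

public class SumVsMultiplyDemo {
    public static void main(String[] args) {
        Set<Integer> sumHashes = new HashSet<>();
        Set<Integer> primeHashes = new HashSet<>();
        for (int a = 0; a <= 10; a++) {
            for (int b = 0; b <= 10; b++) {
                sumHashes.add(a + b);        // plain sum collapses the 121 pairs into 0..20
                primeHashes.add(31 * a + b); // multiply-accumulate keeps all 121 distinct
            }
        }
        System.out.println("distinct sums:   " + sumHashes.size());   // 21
        System.out.println("distinct 31*a+b: " + primeHashes.size()); // 121
    }
}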
Regarding the algorithm, it looks fine; it is very similar to the one that Eclipse generates, except that Eclipse uses a different prime number, 31 instead of 37:
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + (int) (a ^ (a >>> 32));
    result = prime * result + (int) (b ^ (b >>> 32));
    result = prime * result + (int) (c ^ (c >>> 32));
    return result;
}
A well-behaved hashCode method already exists for long values; don't reinvent the wheel:
int hashCode = Long.hashCode((a * 31 + b) * 31 + c);           // Java 8+
int hashCode = Long.valueOf((a * 31 + b) * 31 + c).hashCode(); // Java < 8
Multiplying by a prime number (usually 31 in JDK classes) and accumulating the sum is a common method of creating a "unique" number from several numbers.
The hashCode() method of Long keeps the result properly distributed across the int range, making the hash "well behaved" (basically pseudo random).
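For context, here is the same one-liner wrapped in an overridden hashCode() for a class with three long fields a, b and c (the field names are taken from the earlier example):

@Override
public int hashCode() {
    // combine the three longs, then let Long.hashCode() fold the result into an int
    return Long.hashCode((a * 31 + b) * 31 + c); // Java 8+
}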
I created a class "Book":
public class Book {
    public static int idCount = 1;

    private int id;
    private String title;
    private String author;
    private String publisher;
    private int yearOfPublication;
    private int numOfPages;
    private Cover cover;
    ...
}
And then I need to override the hashCode() and equals() methods.
@Override
public int hashCode() {
    int result = id; // !!!
    result = 31 * result + (title != null ? title.hashCode() : 0);
    result = 31 * result + (author != null ? author.hashCode() : 0);
    result = 31 * result + (publisher != null ? publisher.hashCode() : 0);
    result = 31 * result + yearOfPublication;
    result = 31 * result + numOfPages;
    result = 31 * result + (cover != null ? cover.hashCode() : 0);
    return result;
}
There's no problem with equals(). I'm just wondering about one thing in the hashCode() method.
Note: IntelliJ IDEA generated that hashCode() method.
So, is it OK to set the result variable to id, or should I use some prime number?
What is the better choice here?
Thanks!
Note that only the initial value of the result is set to id, not the final one. The final value is calculated by combining that initial value with the hash codes of other parts of the object, each multiplied by a power of a small prime number (in this case, 31). Using id rather than an arbitrary prime is definitely right in this context.
In general, there is no advantage to hash code being prime (it's the number of hash buckets that needs to be prime). Using an int as its own hash code (in your case, that's id and numOfPages) is a valid approach.
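That also matches what the JDK itself does for boxed ints; for example (Java 8+):

int id = 42;
// Integer's hash code is simply the int value itself
assert Integer.hashCode(id) == 42;
assert Integer.valueOf(id).hashCode() == 42;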
It helps to know what the hashCode is used for. It's supposed to help you map a theoretically infinite set of objects to fitting in a small number of "bins", with each bin having a number, and each object saying which bin it wants to go in based on its hashCode. The question is not whether it's okay to do one thing or another, but whether what you want to do matches what the hashCode function is for.
As per http://docs.oracle.com/javase/6/docs/api/java/lang/Object.html#hashCode(), it's not about the number you return, it's about how it behaves for different objects of the same class.
If the object doesn't change, the hashCode must be the same value every time you call the hashCode() function.
Two objects that are equal according to .equals, must have the same hashCode.
Two objects that are not equal may have the same hashCode. (if this wasn't the case, there would be no point in using the hashCode at all, because every object already has a unique object pointer)
If you're reimplementing the hashCode function, the most important thing is to either rely on a tool to generate it for you, or to use code you understand that obeys those rules. The basic Java hashCode function uses an incredibly well-researched, seemingly simple bit of code for String hashing, so the code you see is based on turning everything into Strings and falling back to that.
If you don't know why that works, don't touch it. Just rely on it working and move on. That 31 is ridiculously important and ensures an even hashing distribution. See Why does Java's hashCode() in String use 31 as a multiplier? for the why on that one.
However, this might also be way more than you need. You could use id, but then you're basically negating the reason to use a hashCode (because now every object will want to be in a bin on its own, turning any hashed collection into a flat array. Kind of silly).
If you know the distribution of your id values, there are far easier hash codes to come up with. Say you know they are always between 0 and Integer.MAX_VALUE, and you know there are never any gaps between ids; then you could simply generate a hashCode like

final int modulus = Integer.MAX_VALUE / 255;

int hashCode() {
    return this.id % modulus;
}

Now you have a hashCode optimised for 255 bins, fulfilling the necessary requirements for an acceptable hashCode function.
Note: In my answer I am assuming that you know how a hash code is meant to be used. The following just talks about any potential optimization that using a non-zero constant for the initial value of result may produce.
If id is rarely 0 then it's fine to use it. However, if it's 0 frequently, you should use some constant instead (just using 1 should be fine). The reason you want it to be non-zero is so that the 31 * result part always adds some value to the hash. That way, if, say, object A has all fields null or 0 except for yearOfPublication = 1, and object B has all fields null or 0 except for numOfPages = 1, the hash codes will be:
A.hashCode() => initialValue * 31 ^ 4 + 1
B.hashCode() => initialValue * 31 ^ 5 + 1
As you can see if initialValue is 0 then both hash codes are the same, however if it's not 0 then they will be different. It is preferable for them to be different so as to reduce collisions in data structures that use the hash code like HashMap.
That said, in your example of the Book class it is likely that id will never be 0. In fact, if id uniquely identifies the Book then you can have the hashCode() method just return the id.
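A sketch of that simplest variant, assuming id really does uniquely identify a Book and that equals() compares books by id accordingly:

@Override
public int hashCode() {
    // safe as long as equal books always have equal ids
    return id;
}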
I need a hashCode implementation in Java which ignores the order of the fields in my class Edge. It should be possible that Node first could be Node second, and second could be Node first.
Here is my method, which depends on the order:
public class Edge {
    private Node first, second;

    @Override
    public int hashCode() {
        int hash = 17;
        int hashMultiplikator = 79;
        hash = hashMultiplikator * hash + first.hashCode();
        hash = hashMultiplikator * hash + second.hashCode();
        return hash;
    }
}
Is there a way to compute a hash which is the same for the following Edges but otherwise still reasonably unique?
Node n1 = new Node("a");
Node n2 = new Node("b");
Edge ab = new Edge(n1,n2);
Edge ba = new Edge(n2,n1);
ab.hashCode() == ba.hashCode() should be true.
You can use some sort of commutative operation instead of what you have now, like addition:
@Override
public int hashCode() {
    int hash = 17;
    int hashMultiplikator = 79;
    int hashSum = first.hashCode() + second.hashCode();
    hash = hashMultiplikator * hash * hashSum;
    return hash;
}
I'd recommend that you still use the multiplier since it provides some entropy to your hash code. See my answer here, which says:
Some good rules to follow for hashing are:
Mix up your operators. By mixing your operators, you can cause the results to vary more. Using simply x * y in this test, I had a very large number of collisions.
Use prime numbers for multiplication. Prime numbers have interesting binary properties that cause multiplication to be more volatile.
Avoid using shift operators (unless you really know what you're doing). They insert lots of zeroes or ones into the binary of the number, decreasing volatility of other operations and potentially even shrinking your possible number of outputs.
To solve your problem you have to combine the hash codes of both components.
An example could be:
@Override
public int hashCode() {
    int prime = 17;
    return prime * (first.hashCode() + second.hashCode());
}
Please check whether this matches your requirements. A multiplication or an XOR instead of an addition would also be possible.
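For example, the XOR variant mentioned above could look like the sketch below; note that XOR maps any edge whose two nodes happen to have equal hash codes to the same value, so the addition-based version is usually the safer choice:

@Override
public int hashCode() {
    int prime = 17;
    // XOR is commutative, so (first, second) and (second, first) hash the same
    return prime * (first.hashCode() ^ second.hashCode());
}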
How do you, in a general (and performant) way, implement hashCode while minimizing collisions for objects with 2 or more integers?
Update: as many have stated, you of course can't eliminate collisions entirely (I honestly didn't think about that). So my question should be: how do you minimize collisions in a proper way? Edited to reflect that.
Using NetBeans' autogeneration fails; for example:
import java.util.HashSet;

import org.junit.Test;

public class HashCodeTest {

    @Test
    public void testHashCode() {
        int loopCount = 0;
        HashSet<Integer> hashSet = new HashSet<Integer>();
        for (int outer = 0; outer < 18; outer++) {
            for (int inner = 0; inner < 2; inner++) {
                loopCount++;
                hashSet.add(new SimpleClass(inner, outer).hashCode());
            }
        }
        org.junit.Assert.assertEquals(loopCount, hashSet.size());
    }

    private class SimpleClass {
        int int1;
        int int2;

        public SimpleClass(int int1, int int2) {
            this.int1 = int1;
            this.int2 = int2;
        }

        @Override
        public int hashCode() {
            int hash = 5;
            hash = 17 * hash + this.int1;
            hash = 17 * hash + this.int2;
            return hash;
        }
    }
}
Can you in a general (and performant) way implement hashCode without collisions for objects with 2 or more integers?
It is technically impossible to have zero collisions when hashing something made of more than 32 bits (like 2 or more integers) down to 32 bits (one integer).
This is what eclipse auto-generates:
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + getOuterType().hashCode();
    result = prime * result + int1;
    result = prime * result + int2;
    return result;
}
And with this code your testcase passes...
PS: And don't forget to implement equals()!
There is no way to eliminate hash collisions entirely. Your approach is basically the preferred one to minimize collisions.
Creating a hash method with zero collisions is impossible. The idea of a hash method is you're taking a large set of objects and mapping it to a smaller set of integers. The best you can do is minimize the number of collisions you get within a subset of your objects.
As others have said, it's more important to minimize collisions than to eliminate them -- especially since you didn't say how many buckets you're aiming for. It's going to be much easier to have zero collisions with 5 items in 1000 buckets than if you have 5 items in 2 buckets! And even if there are plenty of buckets, your collisions could look very different with 1000 buckets vs 1001.
Another thing to note is that there's a good chance that the hash you provide won't even be the one the HashMap eventually uses. If you take a look at the OpenJDK HashMap code, for instance, you'll see that your keys' hashCodes are put through a private hash method (line 264 in that link) which re-hashes them. So, if you're going through the trouble of creating a carefully constructed custom hash function to reduce collisions (rather than just a simple, auto-generated one), make sure you also understand who's going to use it, and how.
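For illustration, the re-hashing step in JDK 8's java.util.HashMap looks roughly like the sketch below; the exact code differs between JDK versions, and the helper name here is made up:

// Spreads the high bits of the key's hashCode() into the low bits,
// since HashMap only uses the low bits to pick a bucket.
static int spread(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}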
I've recently encountered an odd situation when computing the hash code of tuples of doubles in Java. Suppose that you have the two tuples (1.0,1.0) and (Double.POSITIVE_INFINITY,Double.POSITIVE_INFINITY). Using the idiom stated in Joshua Bloch's Effective Java (Item 7), these two tuples would not be considered equal (imagine that these tuples are objects). However, using the formula stated in Item 8 to compute hashCode() for each tuple evaluates to the same value.
So my question is: is there something strange about this formula that I missed out on when I was writing my formulas, or is it just an odd case of hash-code collisions?
Here is my short, comparative method to illustrate the situation (I wrote it as a JUnit4 test, but it should be pretty easily converted to a main method).
@Test
public void testDoubleHashCodeAndInfinity() {
    double a = 1.0;
    double b = 1.0;
    double c = Double.POSITIVE_INFINITY;
    double d = Double.POSITIVE_INFINITY;
    int prime = 31;
    int result1 = 17;
    int result2 = 17;
    long temp1 = Double.doubleToLongBits(a);
    long temp2 = Double.doubleToLongBits(c);
    // this assertion passes successfully
    assertTrue("Double.doubleToLongBits(Double.POSITIVE_INFINITY" +
            "==Double.doubleToLongBits(1.0)", temp1 != temp2);
    result1 = prime * result1 + (int) (temp1 ^ (temp1 >>> 32));
    result2 = prime * result2 + (int) (temp2 ^ (temp2 >>> 32));
    // this assertion passes successfully
    assertTrue("Double.POSITIVE_INFINITY.hashCode()" +
            "==(1.0).hashCode()", result1 != result2);
    temp1 = Double.doubleToLongBits(b);
    temp2 = Double.doubleToLongBits(d);
    // this assertion should pass successfully
    assertTrue("Double.doubleToLongBits(Double.POSITIVE_INFINITY" +
            "==Double.doubleToLongBits(1.0)", temp1 != temp2);
    result1 = prime * result1 + (int) (temp1 ^ (temp1 >>> 32));
    result2 = prime * result2 + (int) (temp2 ^ (temp2 >>> 32));
    // this assertion fails!
    assertTrue("(1.0,1.0).hashCode()==" +
            "(Double.POSITIVE_INFINITY,Double.POSITIVE_INFINITY).hashCode()",
            result1 != result2);
}
It's just a coincidence. However, it's an interesting one. Try this:
Double d1 = 1.0;
Double d2 = Double.POSITIVE_INFINITY;
int hash1 = d1.hashCode();
int hash2 = d2.hashCode();
// These both print -1092616192
// This was me using the wrong hash combinator *and*
// the wrong tuples... but it's interesting
System.out.println(hash1 * 17 + hash2);
System.out.println(hash2 * 17 + hash1);
// These both print -33554432
System.out.println(hash1 * 31 + hash1);
System.out.println(hash2 * 31 + hash2);
Basically the bit patterns of the hash determine this. hash1 (1.0's hash code) is 0x3ff00000 and hash2 (infinity's hash code) is 0x7ff00000. That sort of hash and those sort of multipliers produces that sort of effect...
Executive summary: it's a coincidence, but don't worry about it :)
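To spell out why the questioner's two-field case collides: the per-field hashes of 1.0 and POSITIVE_INFINITY differ by exactly 2^30, and folding in two fields with result = 31 * result + fieldHash multiplies that difference by 32, giving 2^35, which overflows to 0 in 32-bit int arithmetic:

int diff = 0x7FF00000 - 0x3FF00000;  // per-field hash difference, == 2^30
int combinedDiff = 31 * diff + diff; // difference after the second field: 32 * 2^30 == 2^35
System.out.println(combinedDiff);    // prints 0, because 2^35 wraps to 0 in an int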
It may be a coincidence, but that sure does not help when you are trying to use the hashCode in a Map to cache objects that have doubles in tuples. I ran into this when creating a map of Thermostat temp settings classes. Then other tests were failing because I was getting the wrong object out of the Map when using the hashCode as the key.
The solution I found to fix this was to create an appended String of the 2 double parameters and call hashCode() on the String. To avoid the String overhead I cached the hash code.
private volatile int hashCode;

@Override
public int hashCode() {
    int result = hashCode;
    if (result == 0) {
        String value = new StringBuilder().append(d1).append(d2).toString();
        result = value.hashCode();
        hashCode = result;
    }
    return result;
}