I've recently encountered an odd situation when computing the hash code of tuples of doubles in Java. Suppose that you have the two tuples (1.0, 1.0) and (Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY). Using the idiom stated in Joshua Bloch's Effective Java (Item 7), these two tuples would not be considered equal (imagine that these tuples are objects). However, using the formula stated in Item 8 to compute hashCode() for each tuple yields the same value.
So my question is: is there something about this formula that I missed when writing my implementation, or is this just an odd case of a hash-code collision?
Here is a short comparative method to illustrate the situation (I wrote it as a JUnit 4 test, but it should be easy to convert to a main method).
@Test
public void testDoubleHashCodeAndInfinity() {
    double a = 1.0;
    double b = 1.0;
    double c = Double.POSITIVE_INFINITY;
    double d = Double.POSITIVE_INFINITY;
    int prime = 31;
    int result1 = 17;
    int result2 = 17;
    long temp1 = Double.doubleToLongBits(a);
    long temp2 = Double.doubleToLongBits(c);
    // this assertion passes successfully
    assertTrue("Double.doubleToLongBits(Double.POSITIVE_INFINITY)" +
            "==Double.doubleToLongBits(1.0)", temp1 != temp2);
    result1 = prime * result1 + (int) (temp1 ^ (temp1 >>> 32));
    result2 = prime * result2 + (int) (temp2 ^ (temp2 >>> 32));
    // this assertion passes successfully
    assertTrue("Double.POSITIVE_INFINITY.hashCode()" +
            "==(1.0).hashCode()", result1 != result2);
    temp1 = Double.doubleToLongBits(b);
    temp2 = Double.doubleToLongBits(d);
    // this assertion passes successfully as well
    assertTrue("Double.doubleToLongBits(Double.POSITIVE_INFINITY)" +
            "==Double.doubleToLongBits(1.0)", temp1 != temp2);
    result1 = prime * result1 + (int) (temp1 ^ (temp1 >>> 32));
    result2 = prime * result2 + (int) (temp2 ^ (temp2 >>> 32));
    // this assertion fails!
    assertTrue("(1.0,1.0).hashCode()==" +
            "(Double.POSITIVE_INFINITY,Double.POSITIVE_INFINITY).hashCode()",
            result1 != result2);
}
It's just a coincidence. However, it's an interesting one. Try this:
Double d1 = 1.0;
Double d2 = Double.POSITIVE_INFINITY;
int hash1 = d1.hashCode();
int hash2 = d2.hashCode();
// These both print -1092616192
// This was me using the wrong hash combinator *and*
// the wrong tuples... but it's interesting
System.out.println(hash1 * 17 + hash2);
System.out.println(hash2 * 17 + hash1);
// These both print -33554432
System.out.println(hash1 * 31 + hash1);
System.out.println(hash2 * 31 + hash2);
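Basically the bit patterns of the hashes determine this. hash1 (1.0's hash code) is 0x3ff00000 and hash2 (infinity's hash code) is 0x7ff00000. The two differ only in bit 30, so any combination that effectively multiplies that difference by 16 or 32 (as both pairs above do) pushes it past bit 31, where int arithmetic discards it.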
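To make the bit-pattern argument concrete, here is a minimal sketch that reproduces the collision from the question. The class and method names (InfinityCollisionDemo, hashOf, tupleHash) are made up for illustration; the arithmetic is the 17/31 formula from the question. When both fields are equal, the formula reduces to 16337 + 32 * h, and multiplying by 32 shifts the only differing bit (bit 30) out of the int.

```java
class InfinityCollisionDemo {
    // Bloch-style hash of a single double: fold the 64-bit pattern into 32 bits.
    static int hashOf(double value) {
        long bits = Double.doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));
    }

    // Two-field combination as in the question: 31 * (31 * 17 + h(x)) + h(y).
    static int tupleHash(double x, double y) {
        int result = 17;
        result = 31 * result + hashOf(x);
        result = 31 * result + hashOf(y);
        return result;
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        int h1 = hashOf(1.0);
        int h2 = hashOf(inf);
        System.out.println(Integer.toHexString(h1));      // 3ff00000
        System.out.println(Integer.toHexString(h2));      // 7ff00000
        System.out.println(Integer.toHexString(h1 ^ h2)); // 40000000 (only bit 30 differs)
        System.out.println(32 * h1 == 32 * h2);           // true: bit 30 is shifted out
        System.out.println(tupleHash(1.0, 1.0) == tupleHash(inf, inf)); // true
    }
}
```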
Executive summary: it's a coincidence, but don't worry about it :)
It may be a coincidence, but that does not help when you are trying to use the hashCode in a Map to cache objects that hold doubles in tuples. I ran into this when creating a map of thermostat temperature-setting classes. Other tests then failed because I was getting the wrong object out of the Map when using the hashCode as the key.
The fix I found was to build a String by appending the two double parameters and to call hashCode() on that String. To avoid the String overhead, I cached the hash code.
private volatile int hashCode;
@Override public int hashCode()
{
    int result = hashCode;
    if (result == 0) {
        String value = new StringBuilder().append(d1).append(d2).toString();
        result = value.hashCode();
        hashCode = result;
    }
    return result;
}
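For comparison, a HashMap only uses hashCode() to pick a bucket and then falls back on equals(), so two keys with the same hash code can coexist as long as the object itself (not its hash code) is the key. Here is a minimal sketch of that, assuming a hypothetical Setting class standing in for the thermostat settings class; the names Setting and SettingCacheDemo are not from the original code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value class holding two doubles.
final class Setting {
    private final double low;
    private final double high;

    Setting(double low, double high) {
        this.low = low;
        this.high = high;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Setting)) return false;
        Setting other = (Setting) o;
        return Double.compare(low, other.low) == 0
                && Double.compare(high, other.high) == 0;
    }

    @Override
    public int hashCode() {
        // Same 17/31 formula as in the question, so (1.0, 1.0) and
        // (infinity, infinity) still collide here.
        int result = 17;
        result = 31 * result + Double.hashCode(low);
        result = 31 * result + Double.hashCode(high);
        return result;
    }
}

class SettingCacheDemo {
    static Map<Setting, String> buildCache() {
        Map<Setting, String> cache = new HashMap<>();
        cache.put(new Setting(1.0, 1.0), "ones");
        cache.put(new Setting(Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY), "infinities");
        return cache;
    }
}
```

Both entries survive and each lookup returns its own value, because equals() separates the colliding keys. A Map<Integer, ...> keyed on the hash code itself would overwrite one entry with the other.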
So for a given prime number 31, how can I write a hash function for a string parameter?
Here is my attempt.
private int hash(String key) {
    int c = 31;
    int hash = 0;
    for (int i = 0; i < key.length(); i++) {
        int ascii = key.charAt(i);
        hash += c * hash + ascii;
    }
    // sizetable is an integer which is declared outside. You can see it as a table.length().
    return (hash % sizetable);
}
Since I cannot use any other function in my work and need to be sure about the process, I would appreciate your answers and help. Thank you so much.
Your implementation looks quite similar to the documented standard String.hashCode() implementation, which also uses 31 as its prime factor, so it should be good enough.
I just would not assign 31 to a local variable; instead, declare a private static final field or use it directly as a magic number (not OK in general, but might be OK in this case).
Additionally, you should add some tests (if you already know the concept of unit tests) to show that your method gives different hashes for different strings. Pick the samples cleverly, so that they really are different (for the case of the homework ;)
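Two details are worth checking in such tests. First, `hash += c * hash + ascii` multiplies the running value by 32 rather than 31, since it adds 31 * hash onto the existing hash. Second, `%` can return a negative index once the int overflows. Here is a minimal sketch of the String.hashCode()-style recurrence with a non-negative bucket index; the names StringHashSketch, polynomialHash and bucketIndex are made up for illustration, and tableSize plays the role of sizetable.

```java
class StringHashSketch {
    // Same recurrence as the documented String.hashCode(): h = 31 * h + c.
    static int polynomialHash(String key) {
        int hash = 0;
        for (int i = 0; i < key.length(); i++) {
            hash = 31 * hash + key.charAt(i);
        }
        return hash;
    }

    // Math.floorMod keeps the index in [0, tableSize) even when the hash is negative.
    static int bucketIndex(String key, int tableSize) {
        return Math.floorMod(polynomialHash(key), tableSize);
    }

    public static void main(String[] args) {
        System.out.println(polynomialHash("hello") == "hello".hashCode()); // true
        System.out.println(bucketIndex("hello", 16));
    }
}
```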
I need a hashCode implementation in Java that ignores the order of the fields in my class Edge. It should be possible to swap the two nodes: Node first could be Node second, and Node second could be Node first.
Here is my current method, which depends on the order:
public class Edge {
    private Node first, second;
    @Override
    public int hashCode() {
        int hash = 17;
        int hashMultiplikator = 79;
        hash = hashMultiplikator * hash
                + first.hashCode();
        hash = hashMultiplikator * hash
                + second.hashCode();
        return hash;
    }
}
Is there a way to compute a hash that is the same for the following two Edges but otherwise unique?
Node n1 = new Node("a");
Node n2 = new Node("b");
Edge ab = new Edge(n1,n2);
Edge ba = new Edge(n2,n1);
ab.hashCode() == ba.hashCode() should be true.
You can use some sort of commutative operation instead of what you have now, like addition:
@Override
public int hashCode() {
    int hash = 17;
    int hashMultiplikator = 79;
    int hashSum = first.hashCode() + second.hashCode();
    hash = hashMultiplikator * hash * hashSum;
    return hash;
}
I'd recommend that you still use the multiplier since it provides some entropy to your hash code. See my answer here, which says:
Some good rules to follow for hashing are:
Mix up your operators. By mixing your operators, you can cause the results to vary more. Using simply x * y in this test, I had a very large number of collisions.
Use prime numbers for multiplication. Prime numbers have interesting binary properties that cause multiplication to be more volatile.
Avoid using shift operators (unless you really know what you're doing). They insert lots of zeroes or ones into the binary of the number, decreasing volatility of other operations and potentially even shrinking your possible number of outputs.
To solve your problem, you have to combine the hash codes of both components with an operation that does not depend on their order.
An example could be:
@Override
public int hashCode() {
    int prime = 17;
    return prime * (first.hashCode() + second.hashCode());
}
Please check whether this matches your requirements. A multiplication or an XOR instead of the addition would also be possible.
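An order-independent hashCode() only helps in a HashSet or HashMap if equals() ignores the order too. Here is a minimal sketch of both together, using a hypothetical UndirectedEdge class with String endpoints as a stand-in for Node (neither name is from the question):

```java
import java.util.HashSet;
import java.util.Set;

final class UndirectedEdge {
    // String is used here as a stand-in for the Node class from the question.
    private final String first;
    private final String second;

    UndirectedEdge(String first, String second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UndirectedEdge)) return false;
        UndirectedEdge other = (UndirectedEdge) o;
        // Equal if the endpoints match in either order.
        return (first.equals(other.first) && second.equals(other.second))
                || (first.equals(other.second) && second.equals(other.first));
    }

    @Override
    public int hashCode() {
        // Addition is commutative, so swapping the endpoints gives the same value.
        return 17 * (first.hashCode() + second.hashCode());
    }

    public static void main(String[] args) {
        Set<UndirectedEdge> edges = new HashSet<>();
        edges.add(new UndirectedEdge("a", "b"));
        edges.add(new UndirectedEdge("b", "a"));
        System.out.println(edges.size()); // 1
    }
}
```

The hash is the same for ab and ba, but it is not unique: any two edges whose endpoint hash codes have the same sum will also collide, which equals() then resolves.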
I have a class that is identified by three integers: a serverID, a streamID and a messageID.
I have some HashSets that are small but on which I do lots of operations such as set intersection, and others that hold 10K+ elements.
There are only a handful of values for serverID, but they are truly random numbers with a full 32-bits of randomness. Often there is only one serverID for a whole hashtable; other times just a couple of serverIDs.
The streamID is a small number, typically 0 but may be 1 or 2 sometimes.
The messageID is sequentially increasing for each serverID/streamID pair.
I currently have:
(-messageID << 24) ^ messageID ^ serverID ^ streamID
I want to understand whether this is a good hash function, given a sequentially increasing messageID and not many other bits to mix in.
What makes a good hashCode and how can I best mix these three numbers?
I personally always use the strategy implemented in java.lang.String:
for (int i = 0; i < len; i++) {
    h = 31*h + val[off++];
}
So, in your case I'd use the following: 31 * (31 * messageID + serverID) + streamID
Eclipse can generate a good hashCode for you:
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + messageID;
    result = prime * result + serverID;
    result = prime * result + streamID;
    return result;
}
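The same 31-based combination is available through java.util.Objects.hash (Java 7+), at the cost of boxing the ints on each call. Here is a minimal sketch, assuming a hypothetical MessageKey class for the three IDs. With serverID and streamID fixed, sequential messageIDs map to distinct hash codes here, because the remaining terms are constant.

```java
import java.util.Objects;

final class MessageKey {
    private final int serverID;
    private final int streamID;
    private final int messageID;

    MessageKey(int serverID, int streamID, int messageID) {
        this.serverID = serverID;
        this.streamID = streamID;
        this.messageID = messageID;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MessageKey)) return false;
        MessageKey other = (MessageKey) o;
        return serverID == other.serverID
                && streamID == other.streamID
                && messageID == other.messageID;
    }

    @Override
    public int hashCode() {
        // Equivalent to 31 * (31 * (31 * 1 + serverID) + streamID) + messageID.
        return Objects.hash(serverID, streamID, messageID);
    }
}
```

If the boxing shows up in a profile for the large sets, the hand-written version above computes the same kind of value without allocations.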
I created a program that evaluates this series, starting at k=1 and ending at k=100:
Here's the code that I've created:
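The constants used in the code (1103, 26390, 396, 9801 and 2√2) match Ramanujan's series for 1/π, so presumably the formula in question is the following. This is an inference from the code, not something stated in the post.

```latex
\frac{1}{\pi} = \frac{2\sqrt{2}}{9801} \sum_{k=0}^{\infty} \frac{(4k)!\,(1103 + 26390k)}{(k!)^{4}\, 396^{4k}}
```

Note that the sum conventionally starts at k = 0, whose term is 1103.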
public static void calculatePi() {
    BigInteger firstFactorial;
    BigInteger secondFactorial;
    BigInteger firstMultiplication;
    BigInteger firstExponent;
    BigInteger secondExponent;
    int firstNumber = 1103;
    BigInteger firstAddition;
    double summationPi = 3.0;
    double currentPi = 3.0;
    double pi = 3.0;
    int secondNumber = 2;
    double thirdNumber = Math.sqrt(2.0);
    int fourthNumber = 9801;
    double prefix = 1;
    for (int i = 1; i < 101; i++) {
        firstFactorial = factorial(4*i);
        secondFactorial = factorial(i);
        firstMultiplication = BigInteger.valueOf(26390*i);
        firstExponent = exponent(secondFactorial, 4);
        secondExponent = exponent(BigInteger.valueOf(396), 4*i);
        firstAddition = BigInteger.valueOf(firstNumber).add(firstMultiplication);
        summationPi = firstFactorial.intValue()*firstAddition.intValue();
        summationPi /= firstExponent.intValue()*secondExponent.intValue();
        currentPi += summationPi;
    }
    prefix = secondNumber*thirdNumber;
    prefix = prefix/fourthNumber;
    summationPi = summationPi*prefix;
    pi = 1/summationPi;
    System.out.println("Pi is: " + pi);
    return;
}
The function exponent(a, b) returns the result of a^b. The function factorial(a) returns the factorial of a. I have verified that both of these functions work correctly. However, the code mysteriously returns "NaN". I understand that this happens when something is divided by zero, but I have not been able to find any point at which something is divided by zero. Is there anything else that would cause this, or something else I'm doing wrong?
Note: In the for statement, I'm using i as k in the algorithm.
Thanks in advance!
Problem:
These lines are likely where the error is happening:
summationPi = firstFactorial.intValue()*firstAddition.intValue();
summationPi /= firstExponent.intValue()*secondExponent.intValue();
The reason is that you are calling intValue() on a BigInteger, which is not guaranteed to return the full value, since an int can only hold 32 bits of data. (The same concern applies to storing the result as a double instead of a BigDecimal.)
You then use that truncated value, which may be 0, as the divisor in your division; in double arithmetic, dividing 0.0 by 0 yields NaN.
Solution:
BigDecimal currentPi = BigDecimal.ONE;
currentPi = currentPi.add(
new BigDecimal(firstFactorial.multiply(firstAddition))
.divide(new BigDecimal(firstExponent.multiply(secondExponent)), new MathContext(10000)));
Notice that I am able to eliminate summationPi by combining multiple lines into one. Also, the MathContext passed to divide() is set to a precision of 10000 digits; this can be changed to any precision you want.
For more information on BigDecimal, check the API.
The cause of this problem is this line:
summationPi /= firstExponent.intValue()*secondExponent.intValue();
Here the value of secondExponent becomes so large as i increases that intValue(), which keeps only the low-order 32 bits, returns 0. Since 396 = 4 * 99, the value 396^(4i) is divisible by 2^(8i), so its low 32 bits are all zero once i >= 4.
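Putting both answers together, here is a minimal sketch that keeps every intermediate value in BigInteger/BigDecimal, so nothing is truncated to 32 bits. It assumes the series in question is Ramanujan's 1/π series (inferred from the constants 1103, 26390, 396 and 9801), starts the sum at k = 0, and needs Java 9+ for BigDecimal.sqrt. The names RamanujanPiSketch and estimatePi are made up for illustration.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

class RamanujanPiSketch {
    // Sums the first `terms` terms (k = 0 .. terms-1) with `digits` + 10 digits of working precision.
    static BigDecimal estimatePi(int terms, int digits) {
        MathContext mc = new MathContext(digits + 10);
        BigInteger fact4k = BigInteger.ONE;              // (4k)!
        BigInteger factK = BigInteger.ONE;               // k!
        BigInteger pow396 = BigInteger.ONE;              // 396^(4k)
        BigInteger step = BigInteger.valueOf(396).pow(4);
        BigDecimal sum = BigDecimal.ZERO;
        for (int k = 0; k < terms; k++) {
            if (k > 0) {
                // Update the running factorials and power incrementally.
                for (int j = 4 * k - 3; j <= 4 * k; j++) {
                    fact4k = fact4k.multiply(BigInteger.valueOf(j));
                }
                factK = factK.multiply(BigInteger.valueOf(k));
                pow396 = pow396.multiply(step);
            }
            BigInteger numerator = fact4k.multiply(BigInteger.valueOf(1103 + 26390L * k));
            BigInteger denominator = factK.pow(4).multiply(pow396);
            sum = sum.add(new BigDecimal(numerator).divide(new BigDecimal(denominator), mc), mc);
        }
        // prefix = 2 * sqrt(2) / 9801; the series gives 1/pi = prefix * sum.
        BigDecimal prefix = BigDecimal.valueOf(2)
                .multiply(BigDecimal.valueOf(2).sqrt(mc), mc)
                .divide(BigDecimal.valueOf(9801), mc);
        return BigDecimal.ONE.divide(prefix.multiply(sum, mc), mc);
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(6, 30).toPlainString());
    }
}
```

Each term adds roughly eight correct digits, so a handful of terms is enough for most purposes; 100 terms only pays off if the working precision is raised accordingly.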
I have three fields, namely
Number1
Number2
Time
I am trying to write a function in Java that returns a unique hash value (the return type needs to be long) for the above fields. This hash would then be used to store the database rows corresponding to these fields in a HashSet. I am new to writing hash code functions; can someone please review what I have?
public class HashCode {
    private long Number1;
    private long Number2;
    String Time;
    public HashCode(long Number1, long Number2, String Time) {
        this.Number1 = Number1;
        this.Number2 = Number2;
        this.Time = Time;
    }
    public long getHashCode() {
        long hash = 3;
        hash = 47 * hash + (long) (this.Number1 ^ (this.Number1 >>> 32));
        hash = 47 * hash + (long) (this.Number2 ^ (this.Number2 >>> 32));
        hash = 47 * hash + (this.Time != null ? this.Time.hashCode() : 0);
        return hash;
    }
}
You can just use HashCodeBuilder from commons-lang and not have to worry about doing this by hand anymore.
@Override
public int hashCode() {
    // you pick a hard-coded, randomly chosen, non-zero, odd number
    // ideally different for each class
    return new HashCodeBuilder(17, 37).
        append(Number1).
        append(Number2).
        append(Time).
        toHashCode();
}
By the way, it's convention in Java for variable names to start with a lowercase letter. You're going to find it confusing to name variables things like Number1, Number2, etc., as people will confuse these with the names of types (such as String, Number, Long, etc.).
I take it this is a special version of hashCode. Otherwise you would need to override hashCode() rather than define a new method; containers like HashSet never call your own hash method.
For your specialized long version, you do not need the XOR fold (the ^ with >>> 32), because the result is already a long. Just use the long value directly.
String.hashCode() returns an int, not a long, so it will not use all of your 64-bit space. You could reimplement String's hash algorithm with long arithmetic for your purpose.
Everything else looks good.
(By the way, member names should start with a lowercase letter, and Time should be private as well.)
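To illustrate both points, here is a minimal sketch of a class that works in a HashSet (by overriding equals() and hashCode()) and also offers a separate long-valued hash without the XOR fold. The class name Row and the method name longHash are made up for illustration; note that even a 64-bit hash is not guaranteed to be unique.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class Row {
    private final long number1;
    private final long number2;
    private final String time;

    Row(long number1, long number2, String time) {
        this.number1 = number1;
        this.number2 = number2;
        this.time = time;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Row)) return false;
        Row other = (Row) o;
        return number1 == other.number1
                && number2 == other.number2
                && Objects.equals(time, other.time);
    }

    // This is the method HashSet actually calls.
    @Override
    public int hashCode() {
        int hash = 3;
        hash = 47 * hash + Long.hashCode(number1);
        hash = 47 * hash + Long.hashCode(number2);
        hash = 47 * hash + Objects.hashCode(time);
        return hash;
    }

    // Separate 64-bit variant: no fold needed, the longs are used directly.
    long longHash() {
        long hash = 3;
        hash = 47 * hash + number1;
        hash = 47 * hash + number2;
        hash = 47 * hash + (time != null ? time.hashCode() : 0);
        return hash;
    }

    public static void main(String[] args) {
        Set<Row> rows = new HashSet<>();
        rows.add(new Row(1L, 2L, "10:00"));
        rows.add(new Row(1L, 2L, "10:00"));
        System.out.println(rows.size()); // 1
    }
}
```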