Why is BigDecimal natural ordering inconsistent with equals? - java

From the Javadoc for BigDecimal:
Note: care should be exercised if BigDecimal objects are used as keys in a SortedMap or elements in a SortedSet since BigDecimal's natural ordering is inconsistent with equals.
For example, if you create a HashSet and add new BigDecimal("1.0") and new BigDecimal("1.00") to it, the set will contain two elements (because the values have different scales, so are non-equal according to equals and hashCode), but if you do the same thing with a TreeSet, the set will contain only one element, because the values compare as equal when you use compareTo.
Is there any specific reason behind this inconsistency?
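To make this concrete, here is a minimal sketch of the behaviour in question (assuming the usual java.math and java.util imports):

Set<BigDecimal> hashSet = new HashSet<>();
hashSet.add(new BigDecimal("1.0"));
hashSet.add(new BigDecimal("1.00"));
System.out.println(hashSet.size()); // 2: equals/hashCode see different scales

Set<BigDecimal> treeSet = new TreeSet<>();
treeSet.add(new BigDecimal("1.0"));
treeSet.add(new BigDecimal("1.00"));
System.out.println(treeSet.size()); // 1: compareTo sees equal values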

From the OpenJDK implementation of BigDecimal:
/**
 * Compares this {@code BigDecimal} with the specified
 * {@code Object} for equality. Unlike {@link
 * #compareTo(BigDecimal) compareTo}, this method considers two
 * {@code BigDecimal} objects equal only if they are equal in
 * value and scale (thus 2.0 is not equal to 2.00 when compared by
 * this method).
 *
 * @param  x {@code Object} to which this {@code BigDecimal} is
 *         to be compared.
 * @return {@code true} if and only if the specified {@code Object} is a
 *         {@code BigDecimal} whose value and scale are equal to this
 *         {@code BigDecimal}'s.
 * @see    #compareTo(java.math.BigDecimal)
 * @see    #hashCode
 */
@Override
public boolean equals(Object x) {
    if (!(x instanceof BigDecimal))
        return false;
    BigDecimal xDec = (BigDecimal) x;
    if (x == this)
        return true;
    if (scale != xDec.scale)
        return false;
    long s = this.intCompact;
    long xs = xDec.intCompact;
    if (s != INFLATED) {
        if (xs == INFLATED)
            xs = compactValFor(xDec.intVal);
        return xs == s;
    } else if (xs != INFLATED)
        return xs == compactValFor(this.intVal);
    return this.inflate().equals(xDec.inflate());
}
More from the implementation:
* <p>Since the same numerical value can have different
* representations (with different scales), the rules of arithmetic
* and rounding must specify both the numerical result and the scale
* used in the result's representation.
Which is why the implementation of equals takes scale into consideration. The constructor that takes a string as a parameter is implemented like this:
public BigDecimal(String val) {
    this(val.toCharArray(), 0, val.length());
}
where the third parameter is the length of the character array; the actual parsing happens in another constructor, which sets the scale to the number of digits after the decimal point. This is why the strings 1.0 and 1.00 create different BigDecimals (with different scales).
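A quick check confirms the scales recorded by the String constructor (a sketch; the hashCode observation assumes a typical JDK):

System.out.println(new BigDecimal("1.0").scale());  // 1
System.out.println(new BigDecimal("1.00").scale()); // 2
// different scales also mean different hash codes on typical JDKs
System.out.println(new BigDecimal("1.0").hashCode() ==
                   new BigDecimal("1.00").hashCode()); // false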
From Effective Java By Joshua Bloch:
The final paragraph of the compareTo contract, which is a strong
suggestion rather than a true provision, simply states that the
equality test imposed by the compareTo method should generally return
the same results as the equals method. If this provision is obeyed,
the ordering imposed by the compareTo method is said to be consistent
with equals. If it’s violated, the ordering is said to be inconsistent
with equals. A class whose compareTo method imposes an order that is
inconsistent with equals will still work, but sorted collections
containing elements of the class may not obey the general contract of
the appropriate collection interfaces (Collection, Set, or Map). This
is because the general contracts for these interfaces are defined in
terms of the equals method, but sorted collections use the equality
test imposed by compareTo in place of equals. It is not a catastrophe
if this happens, but it’s something to be aware of.

The behaviour seems reasonable in the context of arithmetic precision where trailing zeros are significant figures and 1.0 does not carry the same meaning as 1.00. Making them unequal seems to be a reasonable choice.
However, from a comparison perspective neither of the two is greater or less than the other, and the Comparable interface requires a total order (i.e. each BigDecimal must be comparable with any other BigDecimal). The only reasonable option here was to define a total order such that compareTo considers the two numbers equal.
Note that the inconsistency between equals and compareTo is not a problem as long as it's documented. It is even sometimes exactly what one needs.
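If equals-consistent keys are needed, one option (an assumption about the use case, not the only approach) is to normalize values with stripTrailingZeros() before storing them:

BigDecimal a = new BigDecimal("1.0");
BigDecimal b = new BigDecimal("1.00");
System.out.println(a.compareTo(b)); // 0: the total order treats them as equal

// stripTrailingZeros() normalizes the representation, making equals agree
System.out.println(a.stripTrailingZeros().equals(b.stripTrailingZeros())); // true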

BigDecimal works by storing two numbers: an integer (the unscaled value) and a scale. The scale is the number of digits to the right of the decimal place. Basically, it is a base-10 floating-point number.
When you say "1.0" and "1.00", these are technically different values in BigDecimal notation:

1.0
  integer: 10
  scale: 1
  precision: 2
  = 10 x 10^-1

1.00
  integer: 100
  scale: 2
  precision: 3
  = 100 x 10^-2

In scientific notation you wouldn't write either of those; it should be 1 x 10^0 or just 1, but BigDecimal allows it.
In compareTo the scales are normalized away and the values are evaluated as ordinary numbers, so 1 == 1. In equals the integer and scale values are compared: 10 != 100 and 1 != 2. Note that in the implementation above the x == this shortcut runs only after the instanceof check; presumably the intent is that each BigDecimal is treated as a kind of number rather than as an object identity.
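The "integer" above is exposed through the public API as unscaledValue(); a quick sketch:

BigDecimal d1 = new BigDecimal("1.0");
BigDecimal d2 = new BigDecimal("1.00");
System.out.println(d1.unscaledValue()); // 10
System.out.println(d1.scale());         // 1
System.out.println(d1.precision());     // 2
System.out.println(d2.unscaledValue()); // 100
System.out.println(d2.scale());         // 2
System.out.println(d2.precision());     // 3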
I would liken it to this:
// same number, different types
float floatOne = 1.0f;
double doubleOne = 1.0;
// true: 1 == 1
System.out.println( (double) floatOne == doubleOne );

// also compare a float to a double
Float boxFloat = floatOne;
Double boxDouble = doubleOne;
// false: one is 32-bit and the other is 64-bit
System.out.println( boxFloat.equals(boxDouble) );

// BigDecimal should behave essentially the same way
BigDecimal bdOne1 = new BigDecimal("1.0");
BigDecimal bdOne2 = new BigDecimal("1.00");
// true: 1 == 1
System.out.println( bdOne1.compareTo(bdOne2) == 0 );
// false: 10 != 100 and 1 != 2, ensuring 2 digits != 3 digits
System.out.println( bdOne1.equals(bdOne2) );
Because BigDecimal allows for a specific "precision", comparing both the integer and the scale is more or less the same as comparing both the number and the precision.
Although there is a semi-caveat to that: BigDecimal's precision() method always returns 1 when the BigDecimal is 0. In that case compareTo reports equality and the precisions match, yet equals returns false. But 0 x 10^-1 should not equal 0 x 10^-2, because the former is the two-digit number 0.0 and the latter is the three-digit number 0.00. The equals method is comparing both the value and the number of digits.
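A quick sketch of that caveat:

BigDecimal z1 = new BigDecimal("0.0");  // unscaled value 0, scale 1
BigDecimal z2 = new BigDecimal("0.00"); // unscaled value 0, scale 2
System.out.println(z1.precision());     // 1 (precision() always returns 1 for zero)
System.out.println(z2.precision());     // 1
System.out.println(z1.compareTo(z2));   // 0
System.out.println(z1.equals(z2));      // false: the scales differ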
I suppose it is weird that BigDecimal allows trailing zeroes, but this is basically necessary: a mathematical operation like "1.1" + "1.01" requires a scale conversion, while "1.10" + "1.01" doesn't.
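The alignment is visible in add(): the result takes the larger of the two scales.

System.out.println(new BigDecimal("1.1").add(new BigDecimal("1.01")));         // 2.11
System.out.println(new BigDecimal("1.10").add(new BigDecimal("1.01")));        // 2.11
System.out.println(new BigDecimal("1.1").add(new BigDecimal("1.01")).scale()); // 2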
So compareTo compares BigDecimals as numbers and equals compares BigDecimals as BigDecimals.
If the comparison is unwanted, use a List or array where this doesn't matter. HashSet and TreeSet are of course designed specifically for holding unique elements.

The answer is pretty short. The equals() method compares objects, while compareTo() compares values. In the case of BigDecimal, different objects can represent the same value. That's why equals() might return false while compareTo() returns 0.
equal objects => equal values
equal values =/> equal objects
An object is just a computer representation of some real-world value. For example, the same picture might be represented in GIF and JPEG formats. That is very much like BigDecimal, where the same value can have distinct representations.

Related

Bad Hash Function [duplicate]

The accepted answer in Best implementation for hashCode method gives a seemingly good method for finding Hash Codes. But I'm new to Hash Codes, so I don't quite know what to do.
For 1), does it matter what nonzero value I choose? Is 1 just as good as other numbers such as the prime 31?
For 2), do I add each value to c? What if I have two fields that are both a long, int, double, etc?
Did I interpret it right in this class:
public class MyClass {
    long a, b, c; // these are the only fields

    // some code and methods

    public int hashCode() {
        return 37 * (37 * ((int) (a ^ (a >>> 32))) + (int) (b ^ (b >>> 32)))
                + (int) (c ^ (c >>> 32));
    }
}
The value is not important; it can be whatever you want. Prime numbers will result in a better distribution of the hashCode values, therefore they are preferred.
You do not necessarily have to add them; you are free to implement whatever algorithm you want, as long as it fulfills the hashCode contract:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
There are some algorithms which can be considered bad hashCode implementations, simple adding of the attribute values being one of them. The reason is that if you have a class with two fields, Integer a and Integer b, and your hashCode() just sums up these values, then the distribution of the hashCode values is highly dependent on the values your instances store. For example, if most of the values of a are between 0-10 and b are between 0-10, then the hashCode values will be between 0 and 20. This implies that if you store instances of this class in e.g. a HashMap, numerous instances will end up in the same bucket (because numerous instances with different a and b values but with the same sum will be put in the same bucket). This will have a bad impact on the performance of operations on the map, because when doing a lookup all the elements in the bucket will be compared using equals().
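A small experiment (a sketch with made-up value ranges, assuming java.util imports) makes the difference in distribution visible:

Set<Integer> sumHashes = new HashSet<>();
Set<Integer> primeHashes = new HashSet<>();
for (int a = 0; a <= 10; a++) {
    for (int b = 0; b <= 10; b++) {
        sumHashes.add(a + b);        // naive: just add the fields
        primeHashes.add(31 * a + b); // prime multiply-accumulate
    }
}
// 121 (a, b) pairs in total
System.out.println(sumHashes.size());   // 21 distinct hash values
System.out.println(primeHashes.size()); // 121 distinct hash values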
Regarding the algorithm, it looks fine; it is very similar to the one that Eclipse generates, which uses a different prime number, 31 instead of 37:
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + (int) (a ^ (a >>> 32));
    result = prime * result + (int) (b ^ (b >>> 32));
    result = prime * result + (int) (c ^ (c >>> 32));
    return result;
}
A well-behaved hashcode method already exists for long values - don't reinvent the wheel:
int hashCode = Long.hashCode((a * 31 + b) * 31 + c);           // Java 8+
int hashCode = Long.valueOf((a * 31 + b) * 31 + c).hashCode(); // Java <8
Multiplying by a prime number (usually 31 in JDK classes) and accumulating the sum is a common way of creating a "unique" number from several numbers.
The hashCode() method of Long keeps the result properly distributed across the int range, making the hash "well behaved" (basically pseudo random).
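Putting it together, a complete class in this style might look like the following sketch (class and field names are illustrative; Long.hashCode requires Java 8+):

public final class Triple {
    private final long a, b, c;

    Triple(long a, long b, long c) { this.a = a; this.b = b; this.c = c; }

    @Override
    public int hashCode() {
        // fold the three longs into one, then let Long.hashCode mix it down to an int
        return Long.hashCode((a * 31 + b) * 31 + c);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Triple)) return false;
        Triple t = (Triple) o;
        return a == t.a && b == t.b && c == t.c;
    }
}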

Does Java Float.compare() Always Produce The Correct Result?

This seems like a question that I could easily find an answer for but I couldn't see any entry on this. I know how the floating-point arithmetic works and in order to compare floating numbers I need to use an epsilon check. When I shared this with my team, one of my colleagues asked me this and I couldn't answer.
Does the compare method in Java always produce the correct result, i.e., the result an epsilon check for f1 and f2 would yield?
Float.compare(float f1, float f2);
Note: Especially consider this question for the equality case.
No, Float.compare does not use any kind of epsilon checking.
Here's for example the OpenJDK 13 implementation of the method:
/**
 * Compares the two specified {@code float} values. The sign
 * of the integer value returned is the same as that of the
 * integer that would be returned by the call:
 * <pre>
 *    new Float(f1).compareTo(new Float(f2))
 * </pre>
 *
 * @param   f1 the first {@code float} to compare.
 * @param   f2 the second {@code float} to compare.
 * @return  the value {@code 0} if {@code f1} is
 *          numerically equal to {@code f2}; a value less than
 *          {@code 0} if {@code f1} is numerically less than
 *          {@code f2}; and a value greater than {@code 0}
 *          if {@code f1} is numerically greater than
 *          {@code f2}.
 * @since 1.4
 */
public static int compare(float f1, float f2) {
    if (f1 < f2)
        return -1;           // Neither val is NaN, thisVal is smaller
    if (f1 > f2)
        return 1;            // Neither val is NaN, thisVal is larger

    // Cannot use floatToRawIntBits because of possibility of NaNs.
    int thisBits    = Float.floatToIntBits(f1);
    int anotherBits = Float.floatToIntBits(f2);

    return (thisBits == anotherBits ?  0 : // Values are equal
            (thisBits < anotherBits ? -1 : // (-0.0, 0.0) or (!NaN, NaN)
             1));                          // (0.0, -0.0) or (NaN, !NaN)
}
Source
If you read the Javadoc of Float.compare(), it talks about values being "numerically equal". The primitive < and > tests in the first two lines follow IEEE 754 semantics, under which 0.0f == -0.0f is true. Note, however, that the bit-level tie-break in the code above deliberately distinguishes the two zeros (it orders -0.0f before 0.0f) and makes NaN compare equal to itself, so that compare() defines a consistent total order.
Epsilon checks (where you check whether two numbers are within a small range of one another) are not a de facto standard, and they have a lot of practical issues (like which epsilon to choose when you don't know the magnitude of the numbers, or what to do when the two numbers have vastly different magnitudes). For that reason, Java only implements exact operations.
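A short sketch of both failure modes, using a fixed epsilon of 1e-6:

float big = 1.0e8f;
float bigNext = Math.nextUp(big); // the adjacent representable float
System.out.println(bigNext - big);                   // 8.0: one ulp at this magnitude
System.out.println(Math.abs(bigNext - big) < 1e-6f); // false: rejected despite being 1 ulp apart

float tiny1 = 1.0e-30f;
float tiny2 = 2.0e-30f; // twice as large, a huge relative difference
System.out.println(Math.abs(tiny1 - tiny2) < 1e-6f); // true: accepted despite the 2x difference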
No, java.lang.Float.compare(float, float) does not return 0 (equal) for values within some hidden epsilon of each other.
However, you can easily do such a comparison by hand, or with one of a few common libraries.
By hand, you can write a function that returns zero if a fuzzy check for equality passes, else returns Float.compare(). The fuzzy check for equality can be something like Math.abs( f1 - f2 ) < epsilon.
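A sketch of that hand-rolled comparison (the epsilon value is whatever your domain requires):

static int fuzzyCompare(float f1, float f2, float epsilon) {
    if (Math.abs(f1 - f2) < epsilon) {
        return 0;                 // treat near-equal values as equal
    }
    return Float.compare(f1, f2); // otherwise fall back to the exact ordering
}

Note that such a comparison is not transitive (a can be "equal" to b and b to c while a and c are not), so it does not define a valid total order for sorted collections.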
Alternatively, in Apache Commons Math, the Precision class provides compareTo() and equals() methods for floats and doubles that accept an epsilon value. The class also provides a value for epsilon in the constant EPSILON.
Alternatively, in Google Guava, the DoubleMath class provides fuzzyCompare() and fuzzyEquals() methods that accept doubles and an epsilon value.
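Assuming commons-math3 and Guava are on the classpath, usage looks roughly like this:

import org.apache.commons.math3.util.Precision;
import com.google.common.math.DoubleMath;

double a = 0.1 + 0.2, b = 0.3;
System.out.println(a == b);                              // false
System.out.println(Precision.equals(a, b, 1e-9));        // true
System.out.println(Precision.compareTo(a, b, 1e-9));     // 0
System.out.println(DoubleMath.fuzzyEquals(a, b, 1e-9));  // true
System.out.println(DoubleMath.fuzzyCompare(a, b, 1e-9)); // 0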
Sounds like you need to check that two float values are equal within the given non-negative delta, correct?
You can use JUnit's Assertions.assertEquals() method to achieve that:
assertEquals(float f1, float f2, float delta)
This is commonly done with a test like if (Math.abs(a - b) < DELTA) ..., where DELTA is globally declared as some very small numeric constant, for consistency throughout the program. Since the test is so simple, I like to state it explicitly.

Collision strength of Java's Arrays.hashCode

How strong is the hashing mechanism used in the Arrays.hashCode methods against collisions? What is the probability of two different arrays (of, say, double) having the exact same hash value calculated with these methods?
Arrays.hashCode(double[]) is specified to return the same value as a List containing Double objects representing the same numeric values.
List.hashCode in turn is specified with a fairly simple algorithm:
int hashCode = 1;
for (E e : list)
    hashCode = 31 * hashCode + (e == null ? 0 : e.hashCode());
In general the multiplication with a prime number is a good practice for general-purpose hash functions, but it's far from a cryptographically strong hash function.
This means that while collisions are unlikely in the general (effectively random) case, they can usually be constructed quite easily if you can influence (or select) the hashCode of the items in the List.
As a constructed example consider these two statements:
System.out.println(Arrays.hashCode(new double[] {4.753E-321d}));
System.out.println(Arrays.hashCode(new double[] {4.9E-324d, 4.9E-324d}));
Both of these will output 993, despite being clearly different arrays.
This is the implementation of Arrays.hashCode (the int[] overload; the other overloads follow the same pattern):
public static int hashCode(int a[]) {
    if (a == null)
        return 0;

    int result = 1;
    for (int element : a)
        result = 31 * result + element;

    return result;
}
If your values happen to be smaller than 31, they are treated like digits of distinct numbers in base 31, so each combination results in a different number (if we ignore overflows for now). Let's call those "pure hashes".
Now of course 31^11 is way larger than the number of ints in Java, so we will get tons of overflows. But since the powers of 31 and the maximum integer are "very different", you don't get an almost random distribution, but a very regular uniform distribution.
Let's consider a smaller example. Assume you have only 2 elements in your array, each in the range from 0 to 4. I try to create a "hashCode" between 0 and 37 by taking the pure hash modulo 38. The result is that I get streaks of 5 integers with small gaps in between, and not a single collision.
val hashes = for {
  i <- 0 to 4
  j <- 0 to 4
} yield (i * 31 + j) % 38
println(hashes.size) // prints 25
println(hashes.toSet.size) // prints 25
To verify whether this is what happens to your numbers, you can create a graph as follows: for each hash, take the first 16 bits for x and the second 16 bits for y, then color that dot black. I bet you will see an extremely regular pattern.
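A rough sketch of that experiment (the inputs are made up; the point is only the shape of the output, assuming java.util.Arrays is imported):

for (int i = 0; i < 65536; i++) {
    int hash = Arrays.hashCode(new int[] { i, i * 31 });
    int x = hash >>> 16;   // upper 16 bits
    int y = hash & 0xFFFF; // lower 16 bits
    System.out.println(x + " " + y); // plotting these reveals a very regular striped pattern
}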

Meaning of Double.doubleToLongBits(x)

I am writing a class Vec2D, representing a 2 dimensional vector. I store x and y in doubles.
When asked to generate equals(Object obj) and hashCode(), Eclipse generated this:
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    long temp;
    temp = Double.doubleToLongBits(x);
    result = prime * result + (int) (temp ^ (temp >>> 32));
    temp = Double.doubleToLongBits(y);
    result = prime * result + (int) (temp ^ (temp >>> 32));
    return result;
}
@Override
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    Vec2D other = (Vec2D) obj;
    if (Double.doubleToLongBits(x) != Double.doubleToLongBits(other.x))
        return false;
    if (Double.doubleToLongBits(y) != Double.doubleToLongBits(other.y))
        return false;
    return true;
}
What is the significance of Double.doubleToLongBits(x) in this context? Can I not simply write x != other.x?
Short answer: Eclipse uses Double.doubleToLongBits because that's what Double.equals does:
The result is true if and only if the argument is not null and is a Double object that represents a double that has the same value as the double represented by this object. For this purpose, two double values are considered to be the same if and only if the method doubleToLongBits(double) returns the identical long value when applied to each.
Long answer: the JLS specifies a few differences between Double.equals and ==. One difference, specified in JLS 4.2.3 and JLS 15.21.1, concerns zeros:
Positive zero and negative zero compare equal; thus the result of the expression 0.0==-0.0 is true and the result of 0.0>-0.0 is false. But other operations can distinguish positive and negative zero; for example, 1.0/0.0 has the value positive infinity, while the value of 1.0/-0.0 is negative infinity.
Another regards NaN:
If either operand is NaN, then the result of == is false but the result of != is true.
Indeed, the test x!=x is true if and only if the value of x is NaN.
As you can see, it's possible for two double values to compare equal with == yet behave differently in math and in hash tables. Thus, when writing a generated equality method, Eclipse assumes that two doubles are equal if and only if all operations that can be done on them behave identically, or (equivalently) if they would compare equal after autoboxing, via their equals methods. This is particularly important when switching between double and Double: it would be unexpected for the equality properties to differ there.
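A quick sketch of the behaviours described above:

System.out.println(Double.NaN == Double.NaN);                      // false
System.out.println(Double.valueOf(Double.NaN).equals(Double.NaN)); // true: boxed comparison uses doubleToLongBits
System.out.println(0.0 == -0.0);                                   // true
System.out.println(Double.valueOf(0.0).equals(-0.0));              // false
System.out.println(Double.doubleToLongBits(0.0) ==
                   Double.doubleToLongBits(-0.0));                 // false: different bit patterns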
Of course, you're free to drift from that assumption: Regardless of whether it's a good idea, you may assign special cases to any of the many possible NaN representations, in which case Double.doubleToRawLongBits() would be a better match for your equals and hashCode methods. By the same token, your use case might treat objects with +0.0 and -0.0 as equivalent and guarantee that NaN values are not possible, in which case a raw == comparison may work better for equals (but at which point emulating the same criteria for hashCode becomes difficult).
Because == and != follow IEEE-754 semantics for doubles, Double.NaN != Double.NaN and 0.0 == -0.0. These behaviors may not be what you want, so Double.doubleToLongBits() converts the 64 bits of double data to 64 bits of long data so that operations like bit shifts and XOR work.
Honestly, though, I would say that the use of doubleToLongBits is a bug here, since if you care about exact equality you should be using Double.doubleToRawLongBits() (which does not perform any translations on the double data at all) instead.
A quick glance at the online javadoc yields this:
Returns a representation of the specified floating-point value according to the IEEE 754 floating-point "double format" bit layout.
...
In all cases, the result is a long integer that, when given to the longBitsToDouble(long) method, will produce a floating-point value the same as the argument to doubleToLongBits (except all NaN values are collapsed to a single "canonical" NaN value).
So it's probably a way of standardizing the double representations of x and y, as NaN can have multiple double representations
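The canonicalization can be observed directly; the bit pattern below is one arbitrary non-canonical NaN:

long oddNanBits = 0x7ff0000000000001L; // a non-canonical NaN bit pattern
double oddNan = Double.longBitsToDouble(oddNanBits);
System.out.println(Double.isNaN(oddNan)); // true

// doubleToLongBits collapses every NaN to the canonical pattern 0x7ff8000000000000L...
System.out.println(Long.toHexString(Double.doubleToLongBits(oddNan))); // 7ff8000000000000
// ...while doubleToRawLongBits preserves the stored bits
// (typically 7ff0000000000001, though some platforms may quieten signaling NaNs)
System.out.println(Long.toHexString(Double.doubleToRawLongBits(oddNan)));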

Hashcode generated by Eclipse

In SO I have read several answers related to the implementation of hashcode and the suggestion to use the XOR operator. (E.g. Why are XOR often used in java hashCode() but another bitwise operators are used rarely?).
When I use Eclipse to generate the hashcode function where field is an object and timestamp a long, the output is:
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((field == null) ? 0 : field.hashCode());
    result = prime * result + (int) (timestamp ^ (timestamp >>> 32));
    return result;
}
Is there any reason for not combining the fields with the XOR operator, as is done for the long below?
result = prime * result + (int) (timestamp ^ (timestamp >>> 32));
Eclipse takes the safe way out. Although the calculation method that uses a prime, a multiplication, and an addition is slower than a single XOR, it gives you an overall better hash code in situations when you have multiple fields.
Consider a simple example - a class with two Strings, a and b. You can use
a.hashCode() ^ b.hashCode()
or
a.hashCode() * 31 + b.hashCode()
Now consider two objects:
a = "ABC"; b = "XYZ"
and
a = "XYZ"; b = "ABC"
The first method will produce identical hash codes for them, because XOR is symmetric; the second method will produce different hash codes, which is good, because the objects are not equal. In general, you want non-equal objects to have different hash codes as often as possible, to improve performance of hash-based containers of these objects. The 31*a+b method achieves this goal better than XOR.
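A quick sketch of that collision:

String a1 = "ABC", b1 = "XYZ";
String a2 = "XYZ", b2 = "ABC";
// true: XOR is symmetric, so swapped fields collide
System.out.println((a1.hashCode() ^ b1.hashCode()) ==
                   (a2.hashCode() ^ b2.hashCode()));
// false: the multiply-accumulate form keeps them apart
System.out.println((a1.hashCode() * 31 + b1.hashCode()) ==
                   (a2.hashCode() * 31 + b2.hashCode()));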
Note that when you are dealing with portions of the same object, as in
timestamp ^ (timestamp >>> 32)
the above argument is much weaker: encountering two timestamps such that the only difference between them is that their upper and lower parts are swapped is harder to imagine than two objects with swapped a and b field values.
