Robust Map<Double, sth> in Java - java

I am looking for a robust Map in Java, where the key lookup would take into account that Double has a limited precision (something around 1e-15 or 1e-16). Where could I find such a thing?
EDIT: Following Jon's advice I think it would make sense to define equivalence. One idea would be to center these at numbers rounded to 15 most relevant decimal digits. Other numbers would be rounded (in any consistent way - the fastest to implement). Would this make sense? What would be the best implementation?

I'd suggest using a TreeMap with your own custom comparator that compares two double values, taking the required precision into account.

IMHO the best approach is to normalise the keys, e.g. by rounding, before adding or looking up values.
BTW: You can use Trove's TDoubleObjectHashMap, which supports custom hash strategies and uses primitive double keys.
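The normalise-before-use idea can be sketched with a plain HashMap. This is only an illustration: the class name RoundedKeyMap and the method normalise are hypothetical, and it assumes that rounding to 15 significant decimal digits (done here through BigDecimal) is an acceptable definition of "the same key".

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper: every key is rounded before it reaches the HashMap.
class RoundedKeyMap<V> {
    // Assumption: 15 significant decimal digits, as proposed in the question.
    private static final MathContext PRECISION = new MathContext(15);
    private final Map<Double, V> delegate = new HashMap<>();

    // Rounds to 15 significant digits; NaN and infinities pass through unchanged,
    // because BigDecimal cannot represent them.
    static double normalise(double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            return value;
        }
        return new BigDecimal(value).round(PRECISION).doubleValue();
    }

    public V put(double key, V value) {
        return delegate.put(normalise(key), value);
    }

    public V get(double key) {
        return delegate.get(normalise(key));
    }
}
```

With this sketch, a value stored under 0.1 + 0.2 is found again under 0.3. Two values that are very close but sit on opposite sides of a rounding boundary still end up under different keys, so rounding defines fixed buckets rather than a true "within epsilon" lookup.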

I'm not completely sure what you need it for, but you can implement a wrapper around Double and override its hashCode() and equals() methods to meet your "limited precision" lookup. Any Map implementation will then be robust, because it relies on hashCode() and equals() for key lookup.
Of course, your map will be in a form Map<DoubleWrapper, smth>.

Summing up the answers and comments above, I ended up with the following wrapper (which probably doesn't handle NaN at the moment, nor zero, since log10(0) is -Infinity):
public static class DoubleWrapper {
private static final int PRECISION = 15;
private final Double roundedValue;
public DoubleWrapper(double value) {
final double d = Math.ceil(Math.log10(value < 0 ? -value: value));
final int power = PRECISION - (int) d;
final double magnitude = Math.pow(10, power);
final long shifted = Math.round(value*magnitude);
roundedValue = shifted/magnitude;
}
public double getDouble() {
return roundedValue;
}
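@Override
public boolean equals(Object obj) {
// Compare the rounded values; comparing the Double against the wrapper itself is always false.
return obj instanceof DoubleWrapper
&& roundedValue.equals(((DoubleWrapper) obj).roundedValue);
}
@Override
public int hashCode() {
return roundedValue.hashCode();
}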
}

Related

How to efficiently store a set of tuples/pairs in Java

I need to perform a check if the combination of a long value and an integer value were already seen before in a very performance-critical part of an application. Both values can become quite large, at least the long will use more than MAX_INT values in some cases.
Currently I have a very simple implementation using a Set<Pair<Integer, Long>>, however this will require too many allocations, because even when the object is already in the set, something like seen.add(Pair.of(i, l)) to add/check existence would allocate the Pair for each call.
Is there a better way in Java (without libraries like Guava, Trove or Apache Commons), to do this check with minimal allocations and in good O(?)?
Two ints would be easy because I could combine them into one long in the Set, but the long cannot be avoided here.
Any suggestions?
Here are two possibilities.
One thing in both of the following suggestions is to store a bunch of pairs together as triple ints in an int[]. The first int would be the int and the next two ints would be the upper and lower half of the long.
If you didn't mind a 33% extra space disadvantage in exchange for an addressing speed advantage, you could use a long[] instead and store the int and long in separate indexes.
You'd never call an equals method. You'd just compare the three ints with three other ints, which would be very fast. You'd never call a compareTo method. You'd just do a custom lexicographic comparison of the three ints, which would be very fast.
B* tree
If memory usage is the ultimate concern, you can make a B* tree using an int[][] or an ArrayList<int[]>. B* trees are relatively quick and fairly compact.
There are also other types of B-trees that might be more appropriate to your particular use case.
Custom hash set
You can also implement a custom hash set with a custom, fast-calculated hash function (perhaps XOR the int and the upper and lower halves of the long together, which will be very fast) rather than relying on the hashCode method.
You'd have to figure out how to implement the int[] buckets to best suit the performance of your application. For example, how do you want to convert your custom hash code into a bucket number? Do you want to rebucket everything when the buckets start getting too many elements? And so on.
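One possible shape for such a custom hash set is sketched below. It is a variation on the suggestion above rather than the answerer's own design: it keeps the int and the long in two parallel primitive arrays instead of packed int triples, uses open addressing with linear probing, and adds one multiplicative mixing step on top of the XOR. The class name IntLongSet is hypothetical.

```java
// Hypothetical allocation-free set of (int, long) pairs backed by primitive arrays.
class IntLongSet {
    private int[] ints;
    private long[] longs;
    private boolean[] used;   // marks occupied slots
    private int size;

    IntLongSet(int expectedSize) {
        int capacity = 16;
        // Capacity is a power of two and at least twice the expected size.
        while (capacity < expectedSize * 2L && capacity < (1 << 30)) {
            capacity <<= 1;
        }
        allocate(capacity);
    }

    private void allocate(int capacity) {
        ints = new int[capacity];
        longs = new long[capacity];
        used = new boolean[capacity];
        size = 0;
    }

    // XOR of the int with both halves of the long, followed by a mixing step
    // (the mixing step is an addition of this sketch, not part of the answer).
    private static int hash(int i, long l) {
        int h = i ^ (int) l ^ (int) (l >>> 32);
        h *= 0x9E3779B9;
        return h ^ (h >>> 16);
    }

    // Returns the slot holding the pair, or the empty slot where it would go.
    private int find(int i, long l) {
        int mask = used.length - 1;
        int slot = hash(i, l) & mask;
        while (used[slot] && !(ints[slot] == i && longs[slot] == l)) {
            slot = (slot + 1) & mask;   // linear probing
        }
        return slot;
    }

    boolean contains(int i, long l) {
        return used[find(i, l)];
    }

    // Returns true if the pair was not present before, like Set.add.
    boolean add(int i, long l) {
        if ((size + 1) * 2L > used.length) {
            grow();   // keep the load factor at or below 0.5
        }
        int slot = find(i, l);
        if (used[slot]) {
            return false;
        }
        used[slot] = true;
        ints[slot] = i;
        longs[slot] = l;
        size++;
        return true;
    }

    private void grow() {
        int[] oldInts = ints;
        long[] oldLongs = longs;
        boolean[] oldUsed = used;
        allocate(oldUsed.length * 2);
        for (int s = 0; s < oldUsed.length; s++) {
            if (oldUsed[s]) {
                add(oldInts[s], oldLongs[s]);
            }
        }
    }

    int size() {
        return size;
    }
}
```

A call such as seen.add(i, l) then both checks and records the pair without allocating per call. The sketch has no removal and no guard against growing past 2^30 slots, and it is not thread-safe.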
How about creating a class that holds two primitives instead? You would drop at least 24 bytes just for the headers of Integer and Long in a 64 bit JVM.
Under these conditions you are looking for a pairing function, which generates a unique number from two numbers. The Wikipedia page on pairing functions has a very good (and simple) example of one such possibility.
How about
class Pair {
int v1;
long v2;
@Override
public boolean equals(Object o) {
// instanceof also rejects null, so the casts below are safe
return o instanceof Pair && v1 == ((Pair) o).v1 && v2 == ((Pair) o).v2;
}
@Override
public int hashCode() {
return 31 * (31 + Integer.hashCode(v1)) + Long.hashCode(v2);
}
}
class Store {
// initial capacity should be tweaked
private static final Set<Pair> store = new HashSet<>(100*1024);
private static final ThreadLocal<Pair> threadPairUsedForContains = new ThreadLocal<>();
void init() { // each thread has to call init() first
threadPairUsedForContains.set(new Pair());
}
boolean contains(int v1, long v2) { // zero allocation contains()
Pair pair = threadPairUsedForContains.get();
pair.v1 = v1;
pair.v2 = v2;
return store.contains(pair);
}
void add(int v1, long v2) {
Pair pair = new Pair();
pair.v1 = v1;
pair.v2 = v2;
store.add(pair);
}
}

Sum of BigDecimal(s) created from possible null Double

In order to avoid possible loss of precision in Java operation on Double objects i.e.:
Double totalDouble = new Double(1590.0);
Double taxesDouble = new Double(141.11);
Double totalwithTaxes = Double.sum(totalDouble,taxesDouble);
//KO: 1731.1100000000001
System.out.println(totalwithTaxes); // 1731.1100000000001
I wrote this code, where totalDouble and taxesDouble could be also null:
Double totalDouble = myObject.getTotalDouble();
Double taxesDouble = myObject.getTaxesDouble();
BigDecimal totalBigDecimalNotNull = (totalDouble==null) ? BigDecimal.valueOf(0d):BigDecimal.valueOf(totalDouble);
BigDecimal taxesBigDecimalNotNull = (taxesDouble==null) ? BigDecimal.valueOf(0d):BigDecimal.valueOf(taxesDouble);
BigDecimal totalWithTaxesBigDecimal = totalBigDecimalNotNull.add(taxesBigDecimalNotNull);
System.out.println(totalWithTaxesBigDecimal);
Is there a better way (possibly with third-party libraries, e.g. Guava) to initialize a BigDecimal in these cases (zero if the Double is null, the Double's value otherwise)?
Not really. That is to say, you're still going to need to make a decision based on whether or not the value is null, but you can do it cleaner if you use the Optional pattern.
You can change your getTotalDouble and getTaxesDouble methods to return Optional<Double> instead, to mitigate having to do the ternary...
public Optional<Double> getTotalDouble() {
return Optional.ofNullable(totalDouble);
}
public Optional<Double> getTaxesDouble() {
return Optional.ofNullable(taxesDouble);
}
...then, you can use the conditional evaluation provided by Optional itself to evaluate and return a default value.
BigDecimal totalBigDecimalNotNull =
BigDecimal.valueOf(myObject.getTotalDouble().orElse(0d));
A simplification would be to return Optional<BigDecimal> instead, as opposed to transforming the value that you want in this fashion.
As an addendum, be careful when talking about precision. The standing advice for monetary amounts is to store them as int or long counts of the smallest currency unit (e.g. cents), so that no precision is lost.
Whether you use Optional or not I recommend creating a static helper method so that you don't have to repeat yourself. e.g.:
public static BigDecimal bigDecimalValueOfOrZero(Double val) {
return val == null ? BigDecimal.ZERO : BigDecimal.valueOf(val);
}
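As a sketch of how such a helper might be used for the sum in the question, the class below combines the null check with a varargs sum. The names NullSafeSum, toBigDecimalOrZero and sumOrZero are hypothetical. It relies on BigDecimal.valueOf(double), which goes through Double.toString and therefore yields 141.11 rather than the long binary expansion that new BigDecimal(141.11) would produce.

```java
import java.math.BigDecimal;
import java.util.Optional;

// Hypothetical helper class; not part of any library.
class NullSafeSum {
    // Null becomes zero; any other value is converted with BigDecimal.valueOf.
    static BigDecimal toBigDecimalOrZero(Double value) {
        return Optional.ofNullable(value)
                .map(d -> BigDecimal.valueOf(d))
                .orElse(BigDecimal.ZERO);
    }

    // Sums any number of possibly-null Doubles as BigDecimals.
    static BigDecimal sumOrZero(Double... values) {
        BigDecimal total = BigDecimal.ZERO;
        for (Double value : values) {
            total = total.add(toBigDecimalOrZero(value));
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOrZero(1590.0, 141.11)); // prints 1731.11
        System.out.println(sumOrZero(1590.0, null));   // prints 1590.0
    }
}
```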

Overriding HashCode() in Java with double values

I know this is asked a lot, and I don't know if I quite understand hash codes, but it's supposed to be the address and so how do I fix my particular example? If my understanding is correct, I have doubles in my class, but I can't add them to the hash code, because of
possible loss of precision
found : double
required: int
return this.area();
Here is my Shape class:
abstract class Shape implements Comparable<Shape>
{
abstract double area();
public int compareTo(Shape sh){
return Double.compare(this.area(),sh.area());
}
public int hashCode() {
return this.area();
}
public boolean equals(Shape sh) {
if ( sh instanceof Shape && this.area()==sh.area() ) {
return true;
} else {
return false ;
}
}
}
Is area() the only value I need to worry about in hashCode()?
You can use the example of hashCode in Double class :
public int hashCode() {
long bits = doubleToLongBits(value);
return (int)(bits ^ (bits >>> 32));
}
This would avoid the loss of precision caused by simply casting the double to int.
If the area is the only property that determines the hashCode, you can use the exact same code, replacing value with area.
However, I'm not sure area is a good candidate for hashCode calculation, since it is itself calculated from properties of the sub-classes of Shape. You should probably implement hashCode in each sub-class of Shape based on the specific properties of that sub-class.
Don't just add the numbers together to produce a hash code; that's very likely to get duplicate hash codes for unequal objects close together. Instead, I recommend using either the Objects.hash method from the standard Java API or the more expressive and efficient HashCodeBuilder from Apache Commons-Lang. You should include in the hashCode calculation exactly the same fields that you use to determine equals.
Of course, as @khelwood pointed out, you very likely don't want to implement equals and hashCode on this abstract object, since a 1x4 rectangle and a 2x2 rectangle probably aren't equal. Instead, you can re-declare those methods as abstract on Shape to force subclasses to implement them:
public abstract class Shape {
@Override
public abstract int hashCode();
}
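A subclass could then look roughly like this. The Rectangle class and its width and height fields are assumptions made for illustration (the question does not show any subclass), and equals is declared abstract here as well so that both methods are forced together.

```java
import java.util.Objects;

abstract class Shape {
    abstract double area();

    // Re-declared as abstract so every concrete shape must define both.
    @Override
    public abstract boolean equals(Object o);

    @Override
    public abstract int hashCode();
}

// Hypothetical subclass: equality is based on its own fields, not on area().
final class Rectangle extends Shape {
    private final double width;
    private final double height;

    Rectangle(double width, double height) {
        this.width = width;
        this.height = height;
    }

    @Override
    double area() {
        return width * height;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Rectangle)) {
            return false;
        }
        Rectangle other = (Rectangle) o;
        // Double.compare agrees with Double.equals, which Objects.hash relies on.
        return Double.compare(width, other.width) == 0
                && Double.compare(height, other.height) == 0;
    }

    @Override
    public int hashCode() {
        // Uses exactly the fields that equals compares.
        return Objects.hash(width, height);
    }
}
```

Here a 1x4 and a 2x2 rectangle have the same area but are not equal, while two 2x2 rectangles are equal and share a hash code.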
You could do:
public int hashCode() {
return Double.valueOf(this.area()).hashCode();
}
Edit: corrected the valueOf method name. Thank you @user4254704.

How can I effectively use floats in equals()?

I have the following immutable HSL class that I use to represent a colour and to aid in calculations on RGBA colours with a little more finesse.
public class HSL {
protected final float hue;
protected final float saturation;
protected final float lightness;
public HSL(float hue, float saturation, float lightness) {
this.hue = hue;
this.saturation = saturation;
this.lightness = lightness;
}
// [snip] Removed some calculation helper functions
@Override
public String toString() {
return "HSL(" + hue + ", " + saturation + ", " + lightness + ")";
}
@Override
public int hashCode() {
return
37 * Float.floatToIntBits(hue) +
37 * Float.floatToIntBits(saturation) +
37 * Float.floatToIntBits(lightness);
}
@Override
public boolean equals(Object o) {
if (o == null || !(o instanceof HSL)) {
return false;
}
HSL hsl = (HSL)o;
// We're only worried about 4 decimal places of accuracy. That's more than the 24b RGB space can represent anyway.
return
Math.abs(this.hue - hsl.hue) < 0.0001f &&
Math.abs(this.lightness - hsl.lightness) < 0.0001f &&
Math.abs(this.saturation - hsl.saturation) < 0.0001f;
}
}
While I don't foresee using this particular class in a HashMap or similar, as it's an intermediary between a similar RGBA class which uses ints for its internal storage, I'm somewhat concerned about the general case of using floats in equality. You can only make the epsilon for comparison so small, and even if you could make it arbitrarily small there are many float values that can be represented internally in multiple ways leading to different values returned from Float.floatToIntBits. What is the best way to get around all this? Or is it not an issue in reality and I'm just overthinking things?
The other answers do a great job of explaining why your current design may cause problems. Rather than repeat that, I'll propose a solution.
If you care about the accuracy only to a certain number of decimal places (it appears 4), you could multiply all incoming float values by 10,000 and manage them as long values. Then your equality and hashcode calculations are exact.
I assume you've omitted the getters from your class for brevity. If so, ensure your Javadocs clearly explain the loss of precision that will occur when passing a float into your class constructor and retrieving it from a getter.
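A minimal sketch of that fixed-point idea, assuming four decimal places are enough; the class name FixedHSL and its getters are hypothetical and only illustrate the approach.

```java
import java.util.Objects;

// Hypothetical fixed-point variant of HSL: components are stored as
// longs scaled by 10,000, so equals() and hashCode() are exact.
final class FixedHSL {
    private static final double SCALE = 10_000.0; // four decimal places

    private final long hue;
    private final long saturation;
    private final long lightness;

    FixedHSL(float hue, float saturation, float lightness) {
        // Precision beyond four decimal places is deliberately dropped here.
        this.hue = Math.round(hue * SCALE);
        this.saturation = Math.round(saturation * SCALE);
        this.lightness = Math.round(lightness * SCALE);
    }

    /** Returns the hue rounded to four decimal places. */
    float getHue() {
        return (float) (hue / SCALE);
    }

    /** Returns the saturation rounded to four decimal places. */
    float getSaturation() {
        return (float) (saturation / SCALE);
    }

    /** Returns the lightness rounded to four decimal places. */
    float getLightness() {
        return (float) (lightness / SCALE);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FixedHSL)) {
            return false;
        }
        FixedHSL other = (FixedHSL) o;
        return hue == other.hue
                && saturation == other.saturation
                && lightness == other.lightness;
    }

    @Override
    public int hashCode() {
        return Objects.hash(hue, saturation, lightness);
    }
}
```

Because the comparison is on integers, equals() is transitive and always agrees with hashCode(). The trade-off is that two inputs which are very close but round to different fourth decimals are treated as different colours.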
This definition of equals() is not transitive. It really shouldn't be called equals(). This and the hashCode() issue would most likely bite silently when the class is used with the Collections API: collections such as HashSet, and methods such as remove(), would not work as expected. For purposes here you should test for exact equality.
I think you are correct to be concerned about the general case of hashCode() diverging heavily from equals().
Violating the general convention that two "equal" objects must have the same hashCode() will most likely lead to all sorts of unexpected behavior if this object is used in the future.
For starters, any library code that makes the usual assumption that unequal hashCodes imply unequal objects will find a lot of unequal objects where it should have found equal objects, because the hashCode check usually comes first, for performance.
hashcode is like defining a bucket that an object should reside in. With hashing, if an object is not in its right bucket, you cannot find that with the equals() method.
Therefore, all the equal objects must reside in the same bucket (i.e., their hashcode() method should return the same result) or the results would be unpredictable.
I think you could try to weaken the hashCode implementation to prevent it from violating contract with equals - I mean ensure that when equals returns true, then hashCode returns the same value but not necessarily the other way around.

Creating a HashSet for Doubles

I wish to create a HashSet for real numbers (at present Doubles) using a defined tolerance (epsilon), cf. Assert.assertEquals(double, double, double).
Since using Double.equals() only works for exact equality and Double is a final class I can't use it. My initial idea is to extend HashSet (e.g. to DoubleHashSet), with a setEpsilon(double) method and create a new class ComparableDouble where equals() uses this value from DoubleHashSet. However I'd like to check whether there are existing solutions already and existing F/OSS libraries.
(In the future I shall want to extend this to tuples of real numbers, e.g. rectangles and cubes, so a generic approach is preferable.)
NOTE: @NPE has suggested it's impossible. Unfortunately I suspect this is formally correct :-) So I'm wondering if there are approximate methods ... Others must have had this problem and solved it approximately. (I already regularly use a tool Real.isEqual(a, b, epsilon) and it's very useful.) I am prepared to accept some infrequent errors of transitivity.
NOTE: I shall use a TreeSet as that solves the problem of "nearly equals()". Later I shall be comparing complexNumbers, rectangles (and more complex objects) and it's really useful to be able to set a limit within which 2 things are equal. There is no simple natural ordering of complexNumbers (perhaps a Cantor approach would work), but we can tell whether they are nearly equal.
There are some fundamental flaws in this approach.
HashSet uses equals() to check two elements for equality. The contract on equals() has the following among its requirements:
It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
Now consider the following example:
x = 0.0
y = 0.9 * epsilon
z = 1.8 * epsilon
It is clear that your proposed comparison scheme would break the transitivity requirement (x equals y and y equals z, yet x doesn't equal z). In these circumstances, HashSet cannot function correctly.
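The example can be checked directly. The snippet below is only a demonstration with an arbitrarily chosen epsilon of 0.01; the class name TransitivityDemo and the method nearlyEqual are hypothetical stand-ins for the proposed tolerance-based equals().

```java
// Demonstrates that "equal within epsilon" is not transitive.
class TransitivityDemo {
    static boolean nearlyEqual(double a, double b, double epsilon) {
        return Math.abs(a - b) <= epsilon;
    }

    public static void main(String[] args) {
        double epsilon = 0.01; // arbitrary tolerance for the demonstration
        double x = 0.0;
        double y = 0.9 * epsilon;
        double z = 1.8 * epsilon;

        System.out.println(nearlyEqual(x, y, epsilon)); // true
        System.out.println(nearlyEqual(y, z, epsilon)); // true
        System.out.println(nearlyEqual(x, z, epsilon)); // false
    }
}
```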
Furthermore, hashCode() will produce additional challenges, due to the following requirement:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
The hashCode() requirement can be sidestepped by using a TreeSet instead of HashSet.
What I would do is round the doubles before using them (assuming this is appropriate)
e.g.
public static double roundByFactor(double d, long factor) {
return (double) Math.round(d * factor) / factor;
}
TDoubleHashSet set = new TDoubleHashSet(); // more efficient than HashSet<Double>
set.add(roundByFactor(1.001, 100));
set.add(roundByFactor(1.005, 100));
set.add(roundByFactor(1.01, 100));
// set has two elements.
You can wrap this behaviour in your own DoubleHashSet. If you want to preserve the original value you can use a HashMap or TDoubleDoubleHashMap where the key is the rounded value and the value is the original.
I have implemented @NPE's approach (I have accepted his/her answer so s/he gets the points :-) and give the code here
//Create a comparator:
public class RealComparator implements Comparator<Double> {
private double epsilon = 0.0d;
public RealComparator(double eps) {
this.setEpsilon(eps);
}
/**
* Returns 0 if Math.abs(d0 - d1) <= epsilon, otherwise orders by value.
* Returns -1 if either arg is null.
*/
public int compare(Double d0, Double d1) {
if (d0 == null || d1 == null) {
return -1;
}
double delta = Math.abs(d0 - d1);
if (delta <= epsilon) {
return 0;
}
return (d0 < d1) ? -1 : 1;
}
/** set the tolerance
* negative values are converted to positive
* @param epsilon
*/
public void setEpsilon(double epsilon) {
this.epsilon = Math.abs(epsilon);
}
}
and test it
public final static Double ONE = 1.0;
public final static Double THREE = 3.0;
@Test
public void testTreeSet(){
RealComparator comparator = new RealComparator(0.0);
Set<Double> set = new TreeSet<Double>(comparator);
set.add(ONE);
set.add(ONE);
set.add(THREE);
Assert.assertEquals(2, set.size());
}
@Test
public void testTreeSet1(){
RealComparator comparator = new RealComparator(0.0);
Set<Double> set = new TreeSet<Double>(comparator);
set.add(ONE);
set.add(ONE-0.001);
set.add(THREE);
Assert.assertEquals(3, set.size());
}
@Test
public void testTreeSet2(){
RealComparator comparator = new RealComparator(0.01);
Set<Double> set = new TreeSet<Double>(comparator);
set.add(ONE);
set.add(ONE - 0.001);
set.add(THREE);
Assert.assertEquals(2, set.size());
}
@Test
public void testTreeSet3(){
RealComparator comparator = new RealComparator(0.01);
Set<Double> set = new TreeSet<Double>(comparator);
set.add(ONE - 0.001);
set.add(ONE);
set.add(THREE);
Assert.assertEquals(2, set.size());
}
