Accurate geometry in Java

I'm coding an application in Java that requires quite a lot of geometry. I have made heavy use of existing classes, and my calculations so far have been in double precision (so I'm using, for instance, Point2D.Double and Line2D.Double, and I coded a convex polygon class using the latter...).
I ran into several issues relating to double-precision calculations that make my application unstable at times, and I considered switching to BigDecimal, but that would imply creating my own Point2D and Line2D classes with BigDecimals, etc., and rewriting several functions. Another solution would be to accept the imprecision and deal with it; i.e. a point is actually a small square, a line is an infinite band, a point lies on a line if the square and the band intersect, and so on. Although this solution could be implemented quickly, my code would be disfigured by statements like (Math.abs(x) < precision) (to signify that x == 0) scattered here and there.
Is someone aware of a nice, clean way to do accurate geometry in Java?

I tried to squeeze (parts of) this into a comment, but it didn't fit. You should not consider this as "THE" answer, but there are some points that I would like to list here.
The recommendation to use BigDecimal is annoyingly common whenever someone mentions precision problems with float or double - and yet it is usually inappropriate in cases like this one. In all but the fewest cases, the limited precision of double is simply not relevant.
Unless, maybe, you are writing software that should compute the trajectory of a manned spacecraft that is about to be sent to Mars, or doing other highly scientific computations.
Additionally, replacing double with BigDecimal tends to only replace one small problem with several larger ones. For example, you'll have to think about the RoundingMode and "scale", which can be tricky. And eventually, you will notice that a simple value like 1.0/3.0 can't be represented with BigDecimal either.
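To make that concrete, here is a minimal sketch (the class name is made up for illustration):

import java.math.BigDecimal;
import java.math.RoundingMode;

public class ThirdDemo {
    public static void main(String[] args) {
        // Throws ArithmeticException ("Non-terminating decimal expansion;
        // no exact representable decimal result."):
        // BigDecimal bad = BigDecimal.ONE.divide(new BigDecimal(3));

        // With an explicit scale and RoundingMode it works, but the result
        // is again only an approximation of 1/3 - just like with double:
        BigDecimal third = BigDecimal.ONE.divide(new BigDecimal(3), 20, RoundingMode.HALF_UP);
        System.out.println(third); // 0.33333333333333333333
    }
}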
For your particular application case, there are more caveats:
Even with a BigDecimal-based implementation of Point2D, the data would still be exposed as double, via the getX()/getY() methods. For example, a method like Line2D#ptLineDistSq will still use the double values. This could only be avoided if you wrote everything that is related to your computations, from scratch, using BigDecimal really everywhere.
But even if you did this: You cannot exactly compute the slope of a line from the point (-1,0) to the point (2,1), and you cannot say exactly where this line intersects the y-axis (both involve the value 1/3, which has no finite decimal representation). You might try some rational number representation here, but there's still this issue with the length of the diagonal of a unit square - which is an irrational number.
The imprecisions of double are annoying. You can compute whether a point is left of a line or right of a line. And due to the precision issues, it may well be that it is both. Doing computations with points that should "mathematically" be equal, but differ by some small floating-point error can lead to bogus results (I also stumbled over this in one of my libraries).
As you already mentioned in the question: Some concepts that work in pure mathematics have to be rethought when they should be implemented with limited precision. Any == comparison is a no-go, and other comparisons should be carefully validated, taking the possible rounding errors into account.
But using some "epsilon"-based comparisons is the usual way to deal with this. Of course, they make the code a bit more clumsy. But compare this to some "arbitrary precision" code with BigDecimal:
BigDecimal computeArea(BigDecimal radius) {
    // Let's be very precise here....
    BigDecimal pi = new BigDecimal("3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067982148086513282306647093844609550582231725359408128481117450284102701938521105559644622948954930381964428810975665933446128475648233786783165271201909145648566923460348610454326648213393607260249141273724587006606315588174881520920962829254091715364367892590360011330530548820466521384146951941511609433057270365759591953092186117381932611793105118548074462379962749567351885752724891227938183011949129833673362440656643086021394946395224737190702179860943702770539217176293176752384674818467669405132000568127145263560827785771342757789609173637178721468440901224953430146549585371050792279689258923542019956112129021960864034418159813629774771309960518707211349999998372978049951059731732816096318595024459455346908302642522308253344685035261931188171010003137838752886587533208381420617177669147303598253490428755468731159562863882353787593751957781857780532171226806613001927876611195909216420198938095257201065485863278865936153381827968230301952035301852968995773622599413891249721775283479131515574857242454150695950829533116861727855889075098381754637464939319");
    BigDecimal radiusSquared = radius.multiply(radius);
    BigDecimal area = radiusSquared.multiply(pi);
    return area;
}
Vs.
double computeArea(double radius) {
    return Math.PI * radius * radius;
}
Also, the epsilon-based comparisons are still error-prone and raise some questions. Most prominently: How large should this "epsilon" be? Where should the epsilon-based comparison take place? However, existing implementations, like the geometric algorithms in http://www.geometrictools.com/ might give some ideas of how this can be done (even though they are implemented in C++, and became a bit less readable in the latest versions). They are time-tested and already show how to cope with many of the precision-related problems.
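To make the shape of such comparisons concrete, here is a minimal sketch of an epsilon-based orientation test (the method name and the epsilon value are assumptions; a good epsilon depends on the magnitudes of your coordinates):

import java.awt.geom.Point2D;

public class GeomUtil {
    static final double EPSILON = 1e-9; // assumption: tune to your coordinate scale

    // Returns +1 if p is left of the directed line a->b, -1 if it is right of it,
    // and 0 if the cross product is too small to decide reliably ("on the line").
    static int sideOfLine(Point2D a, Point2D b, Point2D p) {
        double cross = (b.getX() - a.getX()) * (p.getY() - a.getY())
                     - (b.getY() - a.getY()) * (p.getX() - a.getX());
        if (cross > EPSILON) return 1;
        if (cross < -EPSILON) return -1;
        return 0;
    }
}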

Related

Why does Math.tan() return different values for different levels of precision?

I'm in the middle of testing a program that reads mathematical input, and calculates the answer according to order of operations. I've come across a problem. When calculating the tangent of Math.PI / 2, it returns the value 1.633123935319537E16.
However, somewhere along the way in my program, the value of Math.PI / 2 can get shortened to 1.5707964, instead of 1.5707963267948966. When I call Math.tan(1.5707964), it returns a value of -1.3660249798894601E7.
I'm not asking for help in figuring out the shortening, but rather, I want to understand the divergent answers, and any other things I should watch out for when calculating trigonometric functions.
I want to understand the divergent answers
tan(π/2) is undefined
tan(π/2 - tiny amount) is very large in magnitude and positive
tan(π/2 + tiny amount) is very large in magnitude and negative
The numbers that you are passing in are not exactly π/2:
1.5707963267948966192313216... is a more precise value of π/2 (more decimal places aren't necessary to illustrate the point).
1.5707963267948966 is just smaller.
1.5707964 is just larger.
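The divergent values are easy to reproduce (a minimal sketch; the class name is made up, and the printed values are the ones from the question):

public class TanDemo {
    public static void main(String[] args) {
        // Math.PI / 2 == 1.5707963267948966, slightly *below* the true pi/2:
        System.out.println(Math.tan(Math.PI / 2)); // 1.633123935319537E16
        // 1.5707964 is slightly *above* the true pi/2:
        System.out.println(Math.tan(1.5707964));   // -1.3660249798894601E7
    }
}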
To illustrate, picture the graph of tan(x) (such as the one on Math Is Fun): it climbs toward +∞ as x approaches π/2 from below, and comes back from -∞ just above π/2.

How to deal with float rounding errors

I'm trying to implement basic 2D vector math functions for a game, in Java. They will be intensively used by the game, so I want them to be as fast as possible.
I started with integers as the vector coordinates because the game needs nothing more precise for the coordinates, but for all calculations I still would have to change to double vectors to get a clear result (eg. intersection between two lines).
Using doubles, there are rounding errors. I could simply ignore them and use something like
Math.abs(d1 - d2) <= 0.0001
to compare the values, but I assume that with further calculations the error could add up until it becomes significant. So I thought I could round them after every possibly imprecise operation, but that turned out to produce much worse results, presumably because the program also rounds inexact values (e.g. 0.33333333... -> 0.3333300...).
Using BigDecimal would be far too slow.
What is the best way to solve this problem?
Inaccurate Method
When you are using numbers that require precise calculations, you need to be sure that you aren't doing something like the following (and this is what it seems like you are currently doing):
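(A minimal sketch; the numbers and the four-decimal rounding are made up purely for illustration.)

double a = 1.0 / 3.0;                       // 0.3333333333333333
double r1 = Math.round(a * 1e4) / 1e4;      // 0.3333 (1st rounding)
double r2 = Math.round(r1 * 7 * 1e4) / 1e4; // 2.3331 (2nd rounding, applied to a rounded value)
double r3 = Math.round(r2 / 3 * 1e4) / 1e4; // 0.7777 (3rd rounding)
double r4 = Math.round(r3 * 9 * 1e4) / 1e4; // 6.9993 (4th rounding)
// The mathematically exact value of (1/3) * 7 / 3 * 9 is 7.0 - the chain has drifted.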
This will result in the accumulation of rounding errors as the process continues, giving you extremely inaccurate data in the long run. In the above example, you are actually rounding off the starting value 4 times, and each time it becomes more and more inaccurate!
Accurate Method
A better and more accurate way of obtaining numbers is to do this:
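(Again a sketch with the same illustrative numbers: keep the full-precision value, and derive each rounded number from it directly.)

double exact = 1.0 / 3.0 * 7 / 3 * 9;            // full double precision: ~7.0
double display = Math.round(exact * 1e4) / 1e4;  // 7.0 - rounded once, for output only
double next = exact * 2;                         // further math uses 'exact', never 'display'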
This will help you to avoid the accumulation of rounding errors because each calculation is based off of only 1 conversion and the results from that conversion are not compounded into the next calculation.
The best method of attack would be to start at the highest precision that is necessary, then convert on an as-needed basis, but leave the original intact. I would suggest that you follow the second process shown above.
I started with integers as the vector coordinates because the game needs nothing more precise for the coordinates, but for all calculations I still would have to change to double vectors to get a clear result (eg. intersection between two lines).
It's important to note that you should not attempt to perform any kind of rounding of your values if there is no noticeable impact on your end result; you will simply be doing more work for little to no gain, and may even suffer a performance decrease if it is done often enough.
This is a minor addition to the prior answer. When converting the float to an integer, it is important to round rather than just casting. In the following program, d is the largest double that is strictly less than 1.0. It could easily arise as the result of a calculation that would have result 1.0 in infinitely precise real number arithmetic.
The simple cast gets result 0. Rounding first gets result 1.
public class Test {
    public static void main(String[] args) {
        double d = Math.nextDown(1.0);
        System.out.println(d);
        System.out.println((int) d);
        System.out.println((int) Math.round(d));
    }
}
Output:
0.9999999999999999
0
1

Java resources to do number crunching?

What are the best resources for learning 'number crunching' in Java? I am referring to things like correct methods of decimal number processing, best practices, APIs, notable idioms for performance, and common pitfalls (and their solutions) when coding for number processing in Java.
This question seems a bit open ended and open to interpretation. As such, I will just give two short things.
1) Decimal precision - never assume that two floating point (or double) numbers are equal, even if you went through the exact same steps to calculate them both. Due to a number of issues with rounding in various situations, you often cannot be certain that a decimal number is exactly what you expect. If you do double myNumber = calculateMyNumber(), then do a bunch of things, and then come back and check if(myNumber == calculateMyNumber()), that evaluation could be false, even if you have not changed the calculations done in calculateMyNumber().
2) There are limitations in the size and precision of numbers that you can keep track of. If you have int myNumber = 2000000000 and if(myNumber*2 < myNumber), that will actually evaluate to true, as myNumber*2 will result in a number less than myNumber, because the memory allocated for the number isn't big enough to hold a number that large and it will overflow, becoming smaller than it was before. Look into classes that encapsulate large numbers, such as BigInteger and BigDecimal.
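A minimal sketch of that overflow, together with the BigInteger alternative (the class name is illustrative):

import java.math.BigInteger;

public class OverflowDemo {
    public static void main(String[] args) {
        int myNumber = 2000000000;
        System.out.println(myNumber * 2);            // -294967296: the int wrapped around
        System.out.println(myNumber * 2 < myNumber); // true, counter-intuitively

        BigInteger big = BigInteger.valueOf(myNumber);
        System.out.println(big.multiply(BigInteger.valueOf(2))); // 4000000000
    }
}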
You will figure stuff like this out as a side effect if you study the computer representations of numbers, or binary representations of numbers.
First, you should learn about floating point math. This is not specific to java, but it will allow you to make informed decisions later about, for example, when it's OK to use Java primitives such as float and double. Relevant topics include (copied from a course that I took on scientific computing):
Sources of error: roundoff, truncation error, incomplete convergence, statistical error, program bug.
Computer floating point arithmetic and the IEEE standard.
Error amplification through cancellation.
Conditioning, condition number, and error amplification.
This leads you to decisions about whether to use Java's BigDecimal, BigInteger, etc. There are lots of questions and answers about this already.
Next, you're going to hit performance, including both CPU and memory. You probably will find various rules of thumb, such as "autoboxing a lot is a serious performance problem." But as always, the best thing to do is profile your own application. For example, let's say somebody insists that some optimization is important, even though it affects legibility. If that optimization doesn't matter for your application, then don't do it!
Finally, where possible, it often pays to use a library for numerical stuff. Your first attempt probably will be slower and more buggy than existing libraries. For example, for goodness sake, don't implement your own linear programming routine.

Is it better to cast a whole expression, or just the different type variable

I am using floats for some Android Java game graphics, but the Math library trig functions all return double, so I have to explicitly cast them.
I understand that floats are quicker to process than doubles, and I do not need high precision answers.
e.g. which is better:
screenXf = (float) (shipXf + offsetXf * Math.sin(headingf) - screenMinXf);
or
screenXf = shipXf + offsetXf * (float) (Math.sin(headingf)) - screenMinXf;
I suppose other questions would be 'how can I test this on an emulator without other factors (e.g. PC services) confusing the issue?' and 'Is it going to be different on different hardware anyway?'
Oh dear, that's three questions. Life is never simple :-(
-Frink
Consider using FloatMath.sin() instead.
FloatMath
Math routines similar to those found in Math. Performs computations on float values directly without incurring the overhead of conversions to and from double.
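Applied to the expression from the question, that might look like the following sketch (android.util.FloatMath is an Android framework class, not standard Java; the variables are the question's own):

screenXf = shipXf + offsetXf * FloatMath.sin(headingf) - screenMinXf; // stays in float throughout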
But note this blurb in the android docs:
http://developer.android.com/guide/practices/design/performance.html#avoidfloat
Designing for Performance
...
In speed terms, there's no difference between float and double on the more modern hardware. Space-wise, double is 2x larger. As with desktop machines, assuming space isn't an issue, you should prefer double to float.
Although this guy, @fadden, purportedly one of the people who wrote the VM, says:
Why are there so many floats in the Android API?
On devices without an FPU, the single-precision floating point ops are much faster than the double-precision equivalents. Because of this, the Android framework provides a FloatMath class that replicates some java.lang.Math functions, but with float arguments instead of double.
On recent Android devices with an FPU, the time required for single- and double-precision operations is about the same, and is significantly faster than the software implementation. (The "Designing for Performance" page was written for the G1, and needs to be updated to reflect various changes.)
His last sentence ("page ... need to be updated") refers to the page I referenced above, so I wonder if he is referring to that sentence about "no difference" that I quoted above.
This is definitely dependent on the HW. I know nothing about the target platforms, but on a current PC floats and doubles take the same amount of time, while floats were about twice as fast as doubles on an i386.
Unless your emulator can report the cycle count, you can't find this out, as the HW of your PC has little in common with the HW of the target platform. If the target platform were your PC, then I'd recommend http://code.google.com/p/caliper/ for this microbenchmark.

How to handle multiplication of numbers close to 1

I have a bunch of floating point numbers (Java doubles), most of which are very close to 1, and I need to multiply them together as part of a larger calculation. I need to do this a lot.
The problem is that while Java doubles have no problem with a number like:
0.0000000000000000000000000000000001 (1.0E-34)
they can't represent something like:
1.0000000000000000000000000000000001
As a consequence, I lose precision rapidly (the limit seems to be around 1.000000000000001 for Java's doubles).
I've considered just storing the numbers with 1 subtracted, so for example 1.0001 would be stored as 0.0001 - but the problem is that to multiply them together again I have to add 1 and at this point I lose precision.
To address this I could use BigDecimals to perform the calculation (convert to BigDecimal, add 1.0, then multiply), and then convert back to doubles afterwards, but I have serious concerns about the performance implications of this.
Can anyone see a way to do this that avoids using BigDecimal?
Edit for clarity: This is for a large-scale collaborative filter, which employs a gradient descent optimization algorithm. Accuracy is an issue because often the collaborative filter is dealing with very small numbers (such as the probability of a person clicking on an ad for a product, which may be 1 in 1000, or 1 in 10000).
Speed is an issue because the collaborative filter must be trained on tens of millions of data points, if not more.
Yep: because
(1 + x) * (1 + y) = 1 + x + y + x*y
In your case, x and y are very small, so x*y is going to be far smaller - way too small to influence the results of your computation. So as far as you're concerned,
(1 + x) * (1 + y) = 1 + x + y
This means you can store the numbers with 1 subtracted, and instead of multiplying, just add them up. As long as the results are always much less than 1, they'll be close enough to the mathematically precise results that you won't care about the difference.
EDIT: Just noticed: you say most of them are very close to 1. Obviously this technique won't work for numbers that are not close to 1 - that is, if x and y are large. But if one is large and one is small, it might still work; you only care about the magnitude of the product x*y. (And if both numbers are not close to 1, you can just use regular Java double multiplication...)
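A minimal Java sketch of that idea (the offset values here are made up for illustration):

// Each factor is stored as its offset from 1; e.g. 1.0001 is stored as 1e-4.
double[] offsets = { 1e-4, 2e-7, -3e-9 };
double sum = 0.0;
for (double x : offsets) {
    sum += x; // the product of all (1 + x_i) is ~ 1 + sum while cross terms are negligible
}
System.out.println("product is approximately 1 + " + sum);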
Perhaps you could use logarithms?
Logarithms conveniently reduce multiplication to addition.
Also, to take care of the initial precision loss, there is the function log1p (at least, it exists in C/C++), which returns log(1+x) without any precision loss. (e.g. log1p(1e-30) returns 1e-30 for me)
Then you can use expm1 to get the decimal part of the actual result.
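Java has both functions in java.lang.Math (Math.log1p and Math.expm1), so a minimal sketch of this approach could be:

double x = 1e-15, y = 2e-15;                       // the factors are (1 + x) and (1 + y)
double logProduct = Math.log1p(x) + Math.log1p(y); // log((1+x)*(1+y)) without losing x and y
double offset = Math.expm1(logProduct);            // (1+x)*(1+y) - 1, still about 3e-15
System.out.println(offset);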
Isn't this sort of situation exactly what BigDecimal is for?
Edited to add:
"Per the second-last paragraph, I would prefer to avoid BigDecimals if possible for performance reasons." – sanity
"Premature optimization is the root of all evil" - Knuth
There is a simple solution practically made to order for your problem. You are concerned it might not be fast enough, so you want to do something complicated that you think will be faster. The Knuth quote gets overused sometimes, but this is exactly the situation he was warning against. Write it the simple way. Test it. Profile it. See if it's too slow. If it is then start thinking about ways to make it faster. Don't add all this additional complex, bug-prone code until you know it's necessary.
Depending on where the numbers are coming from and how you are using them, you may want to use rationals instead of floats. Not the right answer for all cases, but when it is the right answer there's really no other.
If rationals don't fit, I'd endorse the logarithms answer.
Edit in response to your edit:
If you are dealing with numbers representing low response rates, do what scientists do:
Represent them as the excess / deficit (normalize out the 1.0 part)
Scale them. Think in terms of "parts per million" or whatever is appropriate.
This will leave you dealing with reasonable numbers for calculations.
It's worth noting that you are testing the limits of your hardware rather than of Java. Java uses the 64-bit floating point of your CPU.
I suggest you test the performance of BigDecimal before you assume it won't be fast enough for you. You can still do tens of thousands of calculations per second with BigDecimal.
As David points out, you can just add the offsets up.
(1+x) * (1+y) = 1 + x + y + x*y
However, it seems risky to choose to drop out the last term. Don't. For example, try this:
x = 1e-8
y = 2e-6
z = 3e-7
w = 4e-5
What is (1+x)*(1+y)*(1+z)*(1+w)? In double precision, I get:
(1+x)*(1+y)*(1+z)*(1+w)
ans =
1.00004231009302
However, see what happens if we just do the simple additive approximation.
1 + (x+y+z+w)
ans =
1.00004231
We lost the low order bits that may have been important. This is only an issue if some of the differences from 1 in the product are at least sqrt(eps), where eps is the precision you are working in.
Try this instead:
f = @(u,v) u + v + u*v;
result = f(x,y);
result = f(result,z);
result = f(result,w);
1+result
ans =
1.00004231009302
As you can see, this gets us back to the double precision result. In fact, it is a bit more accurate, since the internal value of result is 4.23100930230249e-05.
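For reference, the snippets above use MATLAB syntax; the same pairwise accumulation in Java might look like this sketch:

double x = 1e-8, y = 2e-6, z = 3e-7, w = 4e-5;
double result = x;                        // work with offsets from 1 throughout
for (double v : new double[] { y, z, w }) {
    result = result + v + result * v;     // f(u,v) = u + v + u*v, i.e. (1+u)*(1+v) - 1
}
System.out.println(1 + result);           // ~1.00004231009302, matching the result above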
If you really need the precision, you will have to use something like BigDecimal, even if it's slower than double.
If you don't really need the precision, you could perhaps go with David's answer. But even if you use multiplications a lot, avoiding BigDecimal might be premature optimization, so BigDecimal might be the way to go anyway.
When you say "most of which are very close to 1", how many, exactly?
Maybe you could have an implicit offset of 1 in all your numbers and just work with the fractions.
