How to deal with float rounding errors

I'm trying to implement basic 2D vector math functions for a game, in Java. They will be intensively used by the game, so I want them to be as fast as possible.
I started with integers as the vector coordinates because the game needs nothing more precise for the coordinates, but for all calculations I still would have to change to double vectors to get a clear result (eg. intersection between two lines).
Using doubles, there are rounding errors. I could simply ignore them and use something like
Math.abs(d1 - d2) <= 0.0001
to compare the values, but I assume that with further calculations the error could add up until it becomes significant. So I thought I could round them after every possibly imprecise operation, but that turned out to produce much worse results, presumably because the program then also rounds values that cannot be represented exactly (e.g. 0.33333333... -> 0.3333300...).
Using BigDecimal would be far too slow.
What is the best way to solve this problem?

Inaccurate Method
When you are working with numbers that require precise calculations, make sure you aren't doing something like the following (which is what you currently seem to be doing):
This results in an accumulation of rounding errors as the process continues, giving you extremely inaccurate data in the long term. In the above example, you are actually rounding off the starting float 4 times, and each time it becomes more inaccurate!
Accurate Method
A better and more accurate way of obtaining numbers is to do this:
This helps you avoid the accumulation of rounding errors, because each calculation is based on only one conversion and the results of that conversion are not compounded into the next calculation.
The best method of attack is to start at the highest precision that is necessary, then convert on an as-needed basis, but leave the original intact. I suggest you follow the process from the second picture that I posted.
I started with integers as the vector coordinates because the game needs nothing more precise for the coordinates, but for all calculations I still would have to change to double vectors to get a clear result (eg. intersection between two lines).
It's important to note that you should not round your values at all if doing so has no noticeable impact on your end result; you would simply be doing more work for little to no gain, and may even suffer a performance decrease if it is done often enough.
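A minimal sketch of the difference between the two approaches, in Java. The `roundTo` helper, the two-decimal precision and the divide-then-multiply steps are assumptions chosen for illustration; they are not taken from the question:

```java
final class RoundingAccumulation {
    // Hypothetical helper: rounds v to the given number of decimal places.
    static double roundTo(double v, int decimals) {
        double scale = Math.pow(10, decimals);
        return Math.round(v * scale) / scale;
    }

    // Inaccurate: every intermediate result is rounded before the next step,
    // so the error of the first rounding is carried into the second step.
    static double roundEveryStep(double start) {
        double third = roundTo(start / 3.0, 2);
        return roundTo(third * 3.0, 2);
    }

    // More accurate: keep full double precision and round once at the end.
    static double roundOnlyAtEnd(double start) {
        double third = start / 3.0;
        return roundTo(third * 3.0, 2);
    }

    public static void main(String[] args) {
        System.out.println(roundEveryStep(1.0));
        System.out.println(roundOnlyAtEnd(1.0));
    }
}
```

With a start value of 1.0, the first variant drifts by about 0.01 while the second stays at 1.0 to within double precision.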

This is a minor addition to the prior answer. When converting the floating-point value to an integer, it is important to round rather than just cast. In the following program, d is the largest double that is strictly less than 1.0. It could easily arise as the result of a calculation that would yield 1.0 in infinitely precise real-number arithmetic.
The simple cast gives 0. Rounding first gives 1.
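public class Test {
    public static void main(String[] args) {
        // Largest double strictly less than 1.0
        double d = Math.nextDown(1.0);
        System.out.println(d);
        // A plain cast truncates toward zero
        System.out.println((int)d);
        // Rounding first gives the nearest integer
        System.out.println((int)Math.round(d));
    }
}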
Output:
0.9999999999999999
0
1

Related

Why does Math.tan() return different values for different levels of precision?

I'm in the middle of testing a program that reads mathematical input, and calculates the answer according to order of operations. I've come across a problem. When calculating the tangent of Math.PI / 2, it returns the value 1.633123935319537E16.
However, somewhere along the way in my program, that value can get shortened to 1.5707964, instead of 1.5707963267948966. When I call Math.tan(1.5707964), it returns a value of -1.3660249798894601E7.
I'm not asking for help in figuring out the shortening, but rather, I want to understand the divergent answers, and any other things I should watch out for when calculating trigonometric functions.

I want to understand the divergent answers
tan(π/2) is undefined
tan(π/2 - tiny amount) is very large in magnitude and positive
tan(π/2 + tiny amount) is very large in magnitude and negative
The numbers that you are passing in are not exactly π/2:
1.5707963267948966192313216... is a slightly more precise value of π/2 (more decimal places aren't necessary to illustrate the point).
1.5707963267948966 is just smaller
1.5707964 is just larger.
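To illustrate, the original answer showed a graph of the tangent function from Math Is Fun: the curve shoots up towards +∞ just below π/2 and comes back from -∞ just above it.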
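A small sketch that reproduces the sign flip with the two inputs from the question. The class and constant names are made up for illustration:

```java
final class TangentNearPole {
    // Nearest double to pi/2; it is slightly smaller than the true value.
    static final double JUST_BELOW = Math.PI / 2;
    // The shortened value from the question; it is slightly larger than pi/2.
    static final double JUST_ABOVE = 1.5707964;

    public static void main(String[] args) {
        // Left of the pole: very large and positive.
        System.out.println(Math.tan(JUST_BELOW));
        // Right of the pole: very large in magnitude and negative.
        System.out.println(Math.tan(JUST_ABOVE));
    }
}
```

Neither input is exactly π/2, so neither call hits the undefined point; they land on opposite sides of it.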

Accurate geometry in Java

I'm coding an application in Java that requires quite a lot of geometry. I made heavy use of existing classes and my calculations so far have been in double precision (so I'm using for instance, Point2D.Double, Line2D.Double and coded a convex polygon class using the latter...).
I ran into several issues relating to double precision calculations that make my application unstable at times, and I considered switching to BigDecimal, but that would imply creating my own Point2D and Line2D classes with BigDecimals etc., and rewriting several functions. Another solution would be to accept the imprecisions and deal with them; i.e. a point is actually a small square, a line is an infinite band, a point lies on a line if the square and the band intersect, and so on. Although this solution can be implemented quickly, my code would be disfigured by statements like (Math.abs(x) < precision) (to signify that x == 0) scattered here and there.
Is someone aware of nice clean way to do accurate geometry in Java?

I tried to squeeze (parts of) this into a comment, but it didn't fit. You should not consider this as "THE" answer, but there are some points that I would like to list here.
The recommendation to use BigDecimal is annoyingly common whenever someone mentions precision problems with float or double, and yet it is inappropriate in cases like this one. In all but a very few cases, the limited precision of double is simply not relevant.
Unless, maybe, you are writing software that should compute the trajectory of a manned spacecraft that is about to be sent to Mars, or doing other highly scientific computations.
Additionally, replacing double with BigDecimal tends to only replace one small problem with several larger ones. For example, you'll have to think about the RoundingMode and "scale", which can be tricky. And eventually, you will notice that a simple value like 1.0/3.0 can't be represented with BigDecimal either.
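The point about 1.0/3.0 can be checked directly. The following is only a sketch; the class and method names are invented, and `MathContext.DECIMAL64` is just one possible choice of precision:

```java
import java.math.BigDecimal;
import java.math.MathContext;

final class BigDecimalThird {
    // An exact quotient of 1/3 has no finite decimal expansion,
    // so divide() without a MathContext throws ArithmeticException.
    static boolean exactDivisionFails() {
        try {
            BigDecimal.ONE.divide(BigDecimal.valueOf(3));
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // With an explicit precision the division works, but the result is rounded.
    static BigDecimal roundedThird() {
        return BigDecimal.ONE.divide(BigDecimal.valueOf(3), MathContext.DECIMAL64);
    }

    public static void main(String[] args) {
        System.out.println(exactDivisionFails());
        System.out.println(roundedThird().multiply(BigDecimal.valueOf(3)));
    }
}
```

Multiplying the rounded third by 3 does not give back exactly 1, so the rounding problem has only moved, not disappeared.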
For your particular application case, there are more caveats:
Even with a BigDecimal-based implementation of Point2D, the data would still be exposed as double, via the getX()/getY() methods. For example, a method like Line2D#ptLineDistSq will still use the double values. This could only be avoided if you wrote everything that is related to your computations, from scratch, using BigDecimal really everywhere.
But even if you did this: You cannot compute the slope of a line from the point (-1,0) to the point (2,1), and you cannot say where this line intersects the y-axis. You might try some rational number representation here, but there's still this issue with the length of the diagonal of a unit square - which is an irrational number.
The imprecisions of double are annoying. You can compute whether a point is left of a line or right of a line. And due to the precision issues, it may well be that it is both. Doing computations with points that should "mathematically" be equal, but differ by some small floating-point error can lead to bogus results (I also stumbled over this in one of my libraries).
As you already mentioned in the question: Some concepts that work in pure mathematics have to be rethought when they should be implemented with limited precision. Any == comparison is a no-go, and other comparisons should be carefully validated, taking the possible rounding errors into account.
But using some "epsilon"-based comparisons is the usual way to deal with this. Of course, they make the code a bit more clumsy. But compare this to some "arbitrary precision" code with BigDecimal:
BigDecimal computeArea(BigDecimal radius) {
// Let's be very precise here....
BigDecimal pi = new BigDecimal("3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067982148086513282306647093844609550582231725359408128481117450284102701938521105559644622948954930381964428810975665933446128475648233786783165271201909145648566923460348610454326648213393607260249141273724587006606315588174881520920962829254091715364367892590360011330530548820466521384146951941511609433057270365759591953092186117381932611793105118548074462379962749567351885752724891227938183011949129833673362440656643086021394946395224737190702179860943702770539217176293176752384674818467669405132000568127145263560827785771342757789609173637178721468440901224953430146549585371050792279689258923542019956112129021960864034418159813629774771309960518707211349999998372978049951059731732816096318595024459455346908302642522308253344685035261931188171010003137838752886587533208381420617177669147303598253490428755468731159562863882353787593751957781857780532171226806613001927876611195909216420198938095257201065485863278865936153381827968230301952035301852968995773622599413891249721775283479131515574857242454150695950829533116861727855889075098381754637464939319");
BigDecimal radiusSquared = radius.multiply(radius);
BigDecimal area = radiusSquared.multiply(pi);
return area;
}
Vs.
double computeArea(double radius) {
return Math.PI * radius * radius;
}
Also, the epsilon-based comparisons are still error-prone and raise some questions. Most prominently: How large should this "epsilon" be? Where should the epsilon-based comparison take place? However, existing implementations, like the geometric algorithms in http://www.geometrictools.com/ might give some ideas of how this can be done (even though they are implemented in C++, and became a bit less readable in the latest versions). They are time-tested and already show how to cope with many of the precision-related problems.
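One way to keep the epsilon checks from being scattered all over the code is to collect them in a single helper class. This is a rough sketch, not a recommendation of specific values: the class name, the method names and the tolerance passed in by the caller are all assumptions, and choosing a suitable epsilon remains the open question described above.

```java
final class EpsilonGeometry {
    // Compares two values with a tolerance that scales with their magnitude.
    static boolean nearlyEqual(double a, double b, double eps) {
        double scale = Math.max(1.0, Math.max(Math.abs(a), Math.abs(b)));
        return Math.abs(a - b) <= eps * scale;
    }

    // Classifies point p relative to the directed line a -> b:
    // +1 = left, -1 = right, 0 = on the line within the given tolerance.
    static int side(double ax, double ay, double bx, double by,
                    double px, double py, double eps) {
        double cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax);
        if (Math.abs(cross) <= eps) {
            return 0;
        }
        return cross > 0 ? 1 : -1;
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2 == 0.3);
        System.out.println(nearlyEqual(0.1 + 0.2, 0.3, 1e-9));
        System.out.println(side(0, 0, 1, 1, 0.5, 0.5 + 1e-12, 1e-9));
    }
}
```

Treating "on the line" as a third outcome avoids the situation where a point is reported as both left and right of the same line.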

Setting Spark RDD sizes: Casting long to Double inside 10^9+ for loop, really bad idea?

(EDIT: Looking at where this question started, it really ended up in a much better place. It wound up being a nice resource on the limits of RDD sizes in Spark when set through SparkContext.parallelize() vs. the actual size limits of RDDs. Also uncovered some arguments to parallelize() not found in user docs. Look especially at zero323's comments and his accepted answer.)
Nothing new under the sun but I can't find this question already asked ... the question is about how wrong/inadvisable/improper it might be to run a cast inside a large for loop in Java.
I want to run a for loop to initialize an ArrayList before passing it to a SparkContext.parallelize() method. I have found passing an uninitialized array to Spark can cause an empty collection error.
I have seen many posts about how floats and doubles are bad ideas as counters, I get that, just seems like this is a bad idea too? Like there must be a better way?
numListLen will be 10^6 * 10^3 for now, maybe as large as 10^12 at some point.
List<Double> numList = new ArrayList<Double>(numListLen);
for (long i = 0; i < numListLen; i++) {
    numList.add((double) i);
}
I would love to hear where specifically this code falls down and can be improved. I'm a junior-level CS student so I haven't seen all the angles yet haha. Here's a CMU page seemingly approving this approach in C using implicit casting.
Just for background, numList is going to be passed to Spark to tell it how many times to run a simulation and create a RDD with the results, like this:
JavaRDD dataSet = jsc.parallelize(numList,SLICES_AKA_PARTITIONS);
// the function will be applied to each member of dataSet
Double count = dataSet.map(new Function<Double, Double>() {...
(Actually I'd love to run this ArrayList creation through Spark, but it doesn't seem to take enough time to warrant that: 5 seconds on my i5 dual-core, but if boosted to 10^12 then ... longer.)

davidstenberg and Konstantinos Chalkias already covered problems related to using Doubles as counters, and radiodef pointed out an issue with creating objects in the loop, but at the end of the day you simply cannot allocate an ArrayList with more than Integer.MAX_VALUE elements. On top of that, even with 2^31 elements, this is a pretty large object, and serialization and network traffic can add a substantial overhead to your job.
There are a few ways you can handle this:
using SparkContext.range method:
range(start: Long, end: Long,
step: Long = 1, numSlices: Int = defaultParallelism)
initializing the RDD using a range object. In PySpark you can use range (xrange in Python 2), in Scala Range:
val rdd = sc.parallelize(1L to Long.MaxValue)
It requires constant memory on the driver and constant network traffic per executor (all you have to transfer is just a beginning and an end).
In Java 8 LongStream.range could work the same way but it looks like JavaSparkContext doesn't provide required constructors yet. If you're brave enough to deal with all the singletons and implicits you can use Scala Range directly and if not you can simply write a Java friendly wrapper.
initialize RDD using emptyRDD method / small number of seeds and populate it using mapPartitions(WithIndex) / flatMap. See for example Creating array per Executor in Spark and combine into RDD
With a little bit of creativity you can actually generate an infinite number of elements this way (Spark FlatMap function for huge lists).
given your particular use case, you should also take a look at mllib.random.RandomRDDs. It provides a number of useful generators for different distributions.
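To illustrate why a range is cheaper than a materialized list, here is a plain-Java sketch using LongStream.range. It deliberately does not touch any Spark API; the class and method names are invented for the example:

```java
import java.util.stream.DoubleStream;
import java.util.stream.LongStream;

final class LazyIndexRange {
    // Describes the values 0.0, 1.0, ... count-1 without storing them.
    // Only the two bounds are kept in memory until the stream is consumed.
    static DoubleStream indices(long count) {
        return LongStream.range(0, count).mapToDouble(i -> (double) i);
    }

    // Pulls just the first few values; the rest of the range is never generated.
    static double[] firstValues(long count, int howMany) {
        return indices(count).limit(howMany).toArray();
    }

    public static void main(String[] args) {
        // 10^12 elements would not fit in an ArrayList, but the range is fine.
        double[] head = firstValues(1_000_000_000_000L, 3);
        System.out.println(java.util.Arrays.toString(head));
    }
}
```

The same idea is what makes a range-based RDD cheap on the driver: only the start and end are stored, and the elements are produced where they are needed.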

The problem is using a double or float as the loop counter. In your case the loop counter is a long and does not suffer the same problem.
One problem with a double or float as a loop counter is that the floating point precision will leave gaps in the series of numbers represented. It is possible to get to a place within the valid range of a floating point number where adding one falls below the precision of the number being represented (requires 16 digits when the floating point format only supports 15 digits for example). If your loop went through such a point in normal execution it would not increment and continue in an infinite loop.
The other problem with doubles as loop counters is the ability to compare two floating points. Rounding means that to compare the variables successfully you need to look at values within a range. While you might find 1.0000000 == 0.999999999 your computer would not. So rounding might also make you miss the loop termination condition.
Neither of these problems occurs with your long as the loop counter. So enjoy having done it right.
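Both failure modes described above can be reproduced in a few lines. This is a sketch with made-up names; the specific values (2^24 for float, ten steps of 0.1) are chosen only because they make the effect easy to see:

```java
final class FloatCounterPitfalls {
    // True when adding 1 no longer changes the value, i.e. a loop
    // incrementing this counter by 1 would never advance.
    static boolean incrementIsLost(float f) {
        return f + 1.0f == f;
    }

    // Adds 0.1 to an accumulator n times, the way a fractional loop step would.
    static double sumOfTenths(int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += 0.1;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(incrementIsLost(16777216f)); // 2^24
        System.out.println(sumOfTenths(10) == 1.0);
    }
}
```

A loop written as `for (double d = 0; d != 1.0; d += 0.1)` would therefore skip past its termination condition, while a long counter has neither problem.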

Although I don't recommend using floating-point values (either single or double precision) as for-loop counters, in your case the step is not a fractional number (you use 1 as the step), so everything depends on your largest expected number versus the fraction part of the double representation (52 bits).
Doubles in the range 2^52..2^53 still represent every integer exactly, but beyond 2^53 not every integer can be represented.
In practice, because your loop step is 1, you would not experience any problems up to 9,007,199,254,740,992 (2^53) if you used a double as the counter, which would also avoid the cast (you can't avoid boxing from double to Double, though).
Perform a simple increment test; you will see that 9,007,199,254,740,993 (2^53 + 1) is the first integer that a double cannot represent exactly.
FYI: for float numbers, you are safe incrementing up to 2^24 = 16777216 (the article you provided uses the number 100000001.0f, which is greater than 16777216, to present the problem).
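The limit can be checked with a round trip through double. The following sketch uses hypothetical names and scans only a small window around 2^53:

```java
final class ExactIntegerLimit {
    // 2^53: up to here every long value has an exact double representation.
    static final long DOUBLE_LIMIT = 1L << 53;

    // True if casting to double and back returns the same value.
    static boolean survivesRoundTrip(long value) {
        return (long) (double) value == value;
    }

    // Returns the first value in [from, to] that does not survive the
    // round trip, or -1 if every value in the window is exact.
    static long firstInexact(long from, long to) {
        for (long v = from; v <= to; v++) {
            if (!survivesRoundTrip(v)) {
                return v;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstInexact(DOUBLE_LIMIT - 5, DOUBLE_LIMIT + 5));
    }
}
```

For list sizes around 10^9 or even 10^12, the cast from long to double in the question is therefore exact.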

Dealing with Roundoff Error when writing RREF

I'm trying to write a program that computes the reduced row echelon form of a given matrix. Basically, I'm writing a program that solves systems of equations. However, division sometimes produces repeating decimals (such as 2/3, which is .66666...), and Java rounds them off at a certain digit, so a pivot that should be 0 (meaning no pivot) sometimes ends up as something like .0000001, and that messes up my whole program.
My first question is if I were to have some sort of if statement, what is the best way to write something like "if this number is less than .00001 away from being an integer, then round to that closest integer".
My second question is does anyone have any ideas on more optimal ways of handling this situation rather than just put if statements rounding numbers all over the place.
Thank you very much.

You say that you are writing a program that solves systems of equations. This is quite a complicated problem. If you only want to use such a program, you are better off using a library written by somebody else. I will assume that you really want to write the program yourself, for fun and/or education.
You identified the main problem: using floating point numbers leads to rounding and thus to inexact results. There are two solutions for this.
The first solution is not to use floating point numbers. Use only integers and reduce the matrix to row echelon form (not reduced); this can be done without divisions. Since all computations with integers are exact, a pivot that should be 0 will be exactly 0 (actually, there may be a problem with overflow). Of course, this will only work if the matrix you start with consists of integers. You can generalize this approach by working with fractions instead of integers.
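A minimal sketch of the integer-only approach, assuming the matrix fits in `long` values. Rows are combined by cross-multiplication instead of division, and `Math.multiplyExact` / `Math.subtractExact` are used so that the overflow problem mentioned above shows up as an exception rather than as a silently wrong result. The class and method names are invented for illustration:

```java
final class IntegerRowEchelon {
    // Reduces m in place to (non-reduced) row echelon form without division.
    static void reduce(long[][] m) {
        int rows = m.length;
        int cols = m[0].length;
        int pivotRow = 0;
        for (int col = 0; col < cols && pivotRow < rows; col++) {
            // Find a row at or below pivotRow with a non-zero entry in this column.
            int selected = -1;
            for (int r = pivotRow; r < rows; r++) {
                if (m[r][col] != 0) {
                    selected = r;
                    break;
                }
            }
            if (selected < 0) {
                continue; // no pivot in this column
            }
            long[] tmp = m[selected];
            m[selected] = m[pivotRow];
            m[pivotRow] = tmp;

            long pivot = m[pivotRow][col];
            for (int r = pivotRow + 1; r < rows; r++) {
                long factor = m[r][col];
                if (factor == 0) {
                    continue;
                }
                // row_r = pivot * row_r - factor * pivot_row (exact integer arithmetic)
                for (int c = 0; c < cols; c++) {
                    m[r][c] = Math.subtractExact(
                            Math.multiplyExact(pivot, m[r][c]),
                            Math.multiplyExact(factor, m[pivotRow][c]));
                }
            }
            pivotRow++;
        }
    }

    public static void main(String[] args) {
        long[][] m = {{2, 1, 3}, {4, 2, 6}, {1, 0, 1}};
        reduce(m);
        System.out.println(java.util.Arrays.deepToString(m));
    }
}
```

In this example the second row is a multiple of the first, and it comes out as exactly zero rather than as a tiny non-zero remainder.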
The second solution is to use floating point numbers and be very careful. This is a topic of a whole branch of mathematics / computer science called numerical analysis. It is too complicated to explain in an answer here, so you have to get a book on numerical analysis. In simple terms, what you want to do is to say that if Math.abs(pivot) < some small value, then you assume that the pivot should be zero, but that it is something like .0000000001 because of rounding errors, so you just act as if the pivot is zero. The problem is finding out what "some small value" is.
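For the floating-point route, the two checks asked about in the question could be kept in one place like this. It is only a sketch: the names are hypothetical, and the tolerance of 1e-5 is simply the ".00001" from the question, not a value derived from any error analysis.

```java
final class PivotTolerance {
    // Assumption: the tolerance from the question; a real solver would
    // pick this based on the size of the matrix entries.
    static final double TOLERANCE = 1e-5;

    // Returns the nearest integer if x is within tol of it, otherwise x unchanged.
    static double snapToInteger(double x, double tol) {
        double nearest = Math.rint(x);
        return Math.abs(x - nearest) < tol ? nearest : x;
    }

    // Treats a pivot as zero when its magnitude is below tol.
    static boolean isZeroPivot(double pivot, double tol) {
        return Math.abs(pivot) < tol;
    }

    public static void main(String[] args) {
        System.out.println(snapToInteger(2.9999999, TOLERANCE));
        System.out.println(isZeroPivot(0.0000001, TOLERANCE));
    }
}
```

Calling these from the pivot-selection step only, instead of after every arithmetic operation, avoids the "if statements all over the place" problem.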

How to handle multiplication of numbers close to 1

I have a bunch of floating point numbers (Java doubles), most of which are very close to 1, and I need to multiply them together as part of a larger calculation. I need to do this a lot.
The problem is that while Java doubles have no problem with a number like:
0.0000000000000000000000000000000001 (1.0E-34)
they can't represent something like:
1.0000000000000000000000000000000001
Consequently, I lose precision rapidly (the limit seems to be around 1.000000000000001 for Java's doubles).
I've considered just storing the numbers with 1 subtracted, so for example 1.0001 would be stored as 0.0001 - but the problem is that to multiply them together again I have to add 1 and at this point I lose precision.
To address this I could use BigDecimals to perform the calculation (convert to BigDecimal, add 1.0, then multiply), and then convert back to doubles afterwards, but I have serious concerns about the performance implications of this.
Can anyone see a way to do this that avoids using BigDecimal?
Edit for clarity: This is for a large-scale collaborative filter, which employs a gradient descent optimization algorithm. Accuracy is an issue because often the collaborative filter is dealing with very small numbers (such as the probability of a person clicking on an ad for a product, which may be 1 in 1000, or 1 in 10000).
Speed is an issue because the collaborative filter must be trained on tens of millions of data points, if not more.

Yep: because
(1 + x) * (1 + y) = 1 + x + y + x*y
In your case, x and y are very small, so x*y is going to be far smaller - way too small to influence the results of your computation. So as far as you're concerned,
(1 + x) * (1 + y) ≈ 1 + x + y
This means you can store the numbers with 1 subtracted, and instead of multiplying, just add them up. As long as the results are always much less than 1, they'll be close enough to the mathematically precise results that you won't care about the difference.
EDIT: Just noticed: you say most of them are very close to 1. Obviously this technique won't work for numbers that are not close to 1 - that is, if x and y are large. But if one is large and one is small, it might still work; you only care about the magnitude of the product x*y. (And if both numbers are not close to 1, you can just use regular Java double multiplication...)

Perhaps you could use logarithms?
Logarithms conveniently reduce multiplication to addition.
Also, to take care of the initial precision loss, there is the function log1p (at least, it exists in C/C++), which returns log(1+x) without any precision loss. (e.g. log1p(1e-30) returns 1e-30 for me)
Then you can use expm1 to get the decimal part of the actual result.
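Java also provides these functions as Math.log1p and Math.expm1. A sketch of the idea, assuming the numbers are stored as offsets from 1 (the class and method names are invented):

```java
final class LogSpaceProduct {
    // Each factor is (1 + offsets[i]). Returns the product minus 1,
    // computed as expm1(sum of log1p(offset)) so the leading 1 is never formed.
    static double productMinusOne(double[] offsets) {
        double logSum = 0.0;
        for (double x : offsets) {
            logSum += Math.log1p(x);
        }
        return Math.expm1(logSum);
    }

    public static void main(String[] args) {
        double[] offsets = {1e-8, 2e-6, 3e-7, 4e-5};
        System.out.println(productMinusOne(offsets));
    }
}
```

Whether this is fast enough depends on the workload, since each factor costs one log1p call; that would need to be measured.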

Isn't this sort of situation exactly what BigDecimal is for?
Edited to add:
"Per the second-last paragraph, I would prefer to avoid BigDecimals if possible for performance reasons." – sanity
"Premature optimization is the root of all evil" - Knuth
There is a simple solution practically made to order for your problem. You are concerned it might not be fast enough, so you want to do something complicated that you think will be faster. The Knuth quote gets overused sometimes, but this is exactly the situation he was warning against. Write it the simple way. Test it. Profile it. See if it's too slow. If it is then start thinking about ways to make it faster. Don't add all this additional complex, bug-prone code until you know it's necessary.

Depending on where the numbers are coming from and how you are using them, you may want to use rationals instead of floats. Not the right answer for all cases, but when it is the right answer there's really no other.
If rationals don't fit, I'd endorse the logarithms answer.
Edit in response to your edit:
If you are dealing with numbers representing low response rates, do what scientists do:
Represent them as the excess / deficit (normalize out the 1.0 part)
Scale them. Think in terms of "parts per million" or whatever is appropriate.
This will leave you dealing with reasonable numbers for calculations.

It's worth noting that you are testing the limits of your hardware rather than Java. Java uses the 64-bit floating point in your CPU.
I suggest you test the performance of BigDecimal before you assume it won't be fast enough for you. You can still do tens of thousands of calculations per second with BigDecimal.

As David points out, you can just add the offsets up.
(1+x) * (1+y) = 1 + x + y + x*y
However, it seems risky to choose to drop out the last term. Don't. For example, try this:
x = 1e-8
y = 2e-6
z = 3e-7
w = 4e-5
What is (1+x)*(1+y)*(1+z)*(1+w)? In double precision, I get:
(1+x)*(1+y)*(1+z)*(1+w)
ans =
1.00004231009302
However, see what happens if we just do the simple additive approximation.
1 + (x+y+z+w)
ans =
1.00004231
We lost the low order bits that may have been important. This is only an issue if some of the differences from 1 in the product are at least sqrt(eps), where eps is the precision you are working in.
Try this instead:
f = @(u,v) u + v + u*v;
result = f(x,y);
result = f(result,z);
result = f(result,w);
1+result
ans =
1.00004231009302
As you can see, this gets us back to the double precision result. In fact, it is a bit more accurate, since the internal value of result is 4.23100930230249e-05.
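The same update rule translates directly to Java. This sketch uses the four example values from above; the class and method names are assumptions made for illustration:

```java
final class OffsetProduct {
    // (1 + u) * (1 + v) - 1, computed without ever adding the 1.
    static double combine(double u, double v) {
        return u + v + u * v;
    }

    // Product of all (1 + offset) factors, minus 1, keeping the cross terms.
    static double productMinusOne(double... offsets) {
        double result = 0.0;
        for (double x : offsets) {
            result = combine(result, x);
        }
        return result;
    }

    // The simple additive approximation that drops the cross terms.
    static double additiveOnly(double... offsets) {
        double sum = 0.0;
        for (double x : offsets) {
            sum += x;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(productMinusOne(1e-8, 2e-6, 3e-7, 4e-5));
        System.out.println(additiveOnly(1e-8, 2e-6, 3e-7, 4e-5));
    }
}
```

The cost is one extra multiplication and addition per factor compared with the purely additive version.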

If you really need the precision, you will have to use something like BigDecimal, even if it's slower than double.
If you don't really need the precision, you could perhaps go with David's answer. But even if you use multiplications a lot, avoiding BigDecimal might be premature optimization, so BigDecimal might be the way to go anyway.

When you say "most of which are very close to 1", how many, exactly?
Maybe you could have an implicit offset of 1 in all your numbers and just work with the fractions.
