Principle of java float number calculation error [duplicate] - java

How do you explain floating point inaccuracy to fresh programmers and laymen who still think computers are infinitely wise and accurate?
Do you have a favourite example or anecdote which seems to get the idea across much better than a precise, but dry, explanation?
How is this taught in Computer Science classes?

There are basically two major pitfalls people stumble into with floating-point numbers.
The problem of scale. Each FP number has an exponent which determines the overall “scale” of the number, so you can represent either really small values or really large ones, though the number of digits you can devote to that is limited. Adding two numbers of very different scale will sometimes result in the smaller one being “eaten”, since there is no way to fit it into the larger scale.
PS> $a = 1; $b = 0.0000000000000000000000001
PS> Write-Host a=$a b=$b
a=1 b=1E-25
PS> $a + $b
1
As an analogy for this case you could picture a large swimming pool and a teaspoon of water. Both are of very different sizes, but individually you can easily grasp how much they roughly are. Pouring the teaspoon into the swimming pool, however, will leave you still with roughly a swimming pool full of water.
(If the people learning this have trouble with exponential notation, one can also use the values 1 and 100000000000000000000 or so.)
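The same absorption happens in Java with `double`, and at much smaller scales with `float`, whose significand has only 24 bits. A minimal sketch; the class and method names are made up for illustration:

```java
class Absorption {
    // True when adding 'small' to 'big' leaves 'big' unchanged (double arithmetic).
    static boolean absorbed(double big, double small) {
        return big + small == big;
    }

    // Same check in float arithmetic, which has only a 24-bit significand.
    static boolean absorbed(float big, float small) {
        return big + small == big;
    }

    public static void main(String[] args) {
        // 1e-25 is far below half the spacing of doubles near 1 (about 1.1e-16)
        System.out.println(absorbed(1.0, 1e-25));  // true
        // floats near 1e8 are spaced 8 apart, so adding 1 cannot change the value
        System.out.println(absorbed(1e8f, 1f));    // true
        // 1e-15 is a few ulps of 1.0, big enough to register
        System.out.println(absorbed(1.0, 1e-15));  // false
    }
}
```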
Then there is the problem of binary vs. decimal representation. A number like 0.1 can't be represented exactly with a limited amount of binary digits. Some languages mask this, though:
PS> "{0:N50}" -f 0.1
0.10000000000000000000000000000000000000000000000000
But you can “amplify” the representation error by repeatedly adding the numbers together:
PS> $sum = 0; for ($i = 0; $i -lt 100; $i++) { $sum += 0.1 }; $sum
9,99999999999998
I can't think of a nice analogy to properly explain this, though. It's basically the same reason you can represent 1/3 only approximately in decimal: to get the exact value you would need to repeat the 3 indefinitely at the end of the decimal fraction.
Similarly, binary fractions are good for representing halves, quarters, eighths, etc. but things like a tenth will yield an infinitely repeating stream of binary digits.
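A Java version of the repeated-addition experiment, as a minimal sketch (the class name is illustrative). It contrasts 0.1, whose binary expansion never ends, with halves and eighths, which are finite binary fractions and therefore add up exactly:

```java
class RepeatedSum {
    // Adds the same increment n times, accumulating any rounding error at each step.
    static double sum(double increment, int n) {
        double total = 0;
        for (int i = 0; i < n; i++) {
            total += increment;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(0.1, 100));   // close to, but not exactly, 10
        System.out.println(sum(0.5, 20));    // 10.0: 1/2 is a finite binary fraction
        System.out.println(sum(0.125, 80));  // 10.0: so is 1/8
    }
}
```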
Then there is another problem, though most people don't stumble into it unless they're doing huge amounts of numerical work, and those people already know about it. Since many floating-point numbers are merely approximations of the exact value, for a given approximation f of a real number r there are infinitely many other real numbers r1, r2, ... which map to exactly the same approximation. Those numbers lie in a certain interval. Let rmin be the smallest value of r that results in f and rmax the largest; then you get an interval [rmin, rmax] where any number in that interval can be your actual number r.
Now, if you perform calculations on that number—adding, subtracting, multiplying, etc.—you lose precision. Every number is just an approximation, therefore you're actually performing calculations with intervals. The result is an interval too and the approximation error only ever gets larger, thereby widening the interval. You may get back a single number from that calculation. But that's merely one number from the interval of possible results, taking into account precision of your original operands and the precision loss due to the calculation.
That sort of thing is called Interval arithmetic and at least for me it was part of our math course at the university.
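A toy sketch of interval arithmetic in Java, implementing only addition. The `Interval` class is hypothetical, and it approximates outward rounding with `Math.nextDown`/`Math.nextUp`; real interval libraries typically use directed rounding modes instead, which is tighter:

```java
class Interval {
    final double lo;
    final double hi;

    Interval(double lo, double hi) {
        this.lo = lo;
        this.hi = hi;
    }

    // A correctly rounded literal is within half an ulp of the intended real
    // number, so widening by one ulp on each side is sure to enclose it.
    static Interval around(double x) {
        return new Interval(Math.nextDown(x), Math.nextUp(x));
    }

    // Outward rounding: each computed bound is pushed one ulp further out,
    // because the rounded sum may land on the wrong side of the exact bound.
    Interval plus(Interval other) {
        return new Interval(Math.nextDown(lo + other.lo), Math.nextUp(hi + other.hi));
    }

    static Interval repeatSum(Interval x, int n) {
        Interval total = x;
        for (int i = 1; i < n; i++) {
            total = total.plus(x);
        }
        return total;
    }

    boolean contains(double v) {
        return lo <= v && v <= hi;
    }

    double width() {
        return hi - lo;
    }

    public static void main(String[] args) {
        Interval sum = repeatSum(around(0.1), 10);
        // encloses the exact sum 1, and is much wider than the input interval
        System.out.println("[" + sum.lo + ", " + sum.hi + "]");
    }
}
```

Each addition widens the result, which is the growing uncertainty described above made visible.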

Show them that the base-10 system suffers from exactly the same problem.
Try to represent 1/3 as a decimal representation in base 10. You won't be able to do it exactly.
So if you write "0.3333", you will have a reasonably exact representation for many use cases.
But if you move that back to a fraction, you will get "3333/10000", which is not the same as "1/3".
Other fractions, such as 1/2 can easily be represented by a finite decimal representation in base-10: "0.5"
Now base-2 and base-10 suffer from essentially the same problem: both have some numbers that they can't represent exactly.
While base-10 has no problem representing 1/10 as "0.1", in base-2 you'd need an infinite representation starting with "0.000110011...".
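The binary expansion of a fraction can be computed with ordinary long division in base 2. A minimal sketch (class and method names are illustrative) that prints where 1/10 starts repeating and where 1/2 terminates:

```java
class BinaryFraction {
    // Binary long division: the first maxBits binary digits of num/den,
    // for 0 <= num < den, stopping early if the expansion terminates.
    static String digits(long num, long den, int maxBits) {
        StringBuilder sb = new StringBuilder("0.");
        long rem = num;
        for (int i = 0; i < maxBits && rem != 0; i++) {
            rem *= 2;                 // shift one binary place to the left
            if (rem >= den) {
                sb.append('1');
                rem -= den;
            } else {
                sb.append('0');
            }
        }
        if (sb.length() == 2) {
            sb.append('0');           // num == 0
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(digits(1, 10, 24)); // 1/10: "0011" repeats forever, cut off after 24 bits
        System.out.println(digits(1, 2, 24));  // 0.1: terminates immediately
    }
}
```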

How's this for an explanation to the layman? One way computers represent numbers is by counting discrete units. These are digital computers. For whole numbers, those without a fractional part, modern digital computers count powers of two: 1, 2, 4, 8, ... Place value, binary digits, blah, blah, blah. For fractions, digital computers count inverse powers of two: 1/2, 1/4, 1/8, ... The problem is that many numbers can't be represented by a sum of a finite number of those inverse powers. Using more place values (more bits) will increase the precision of the representation of those 'problem' numbers, but never make it exact, because there is only a limited number of bits. Some numbers would need an infinite number of bits to be represented exactly.
Snooze...
OK, you want to measure the volume of water in a container, and you only have 3 measuring cups: full cup, half cup, and quarter cup. After counting the last full cup, let's say there is one third of a cup remaining. Yet you can't measure that because it doesn't exactly fill any combination of available cups. It doesn't fill the half cup, and the overflow from the quarter cup is too small to fill anything. So you have an error - the difference between 1/3 and 1/4. This error is compounded when you combine it with errors from other measurements.
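The measuring-cup picture maps directly onto truncating a fraction to a fixed number of binary places. A minimal sketch with exact integer arithmetic (the `MeasuringCups` class is hypothetical): with cups of 1/2 and 1/4, a third of a cup leaves 1/12 unmeasured, and adding smaller cups shrinks the leftover without ever making it zero.

```java
class MeasuringCups {
    // Fills as much of num/den of a cup (num < den) as possible with cups of size
    // 1/2, 1/4, ..., 1/2^bits, each used at most once and never overfilling.
    // Returns the unmeasured leftover as {numerator, denominator}.
    static long[] leftover(long num, long den, int bits) {
        long scale = 1L << bits;                    // smallest cup holds 1/scale
        long smallestCups = num * scale / den;      // floor: this is the binary truncation
        long remNum = num * scale - smallestCups * den;
        return new long[] {remNum, den * scale};
    }

    public static void main(String[] args) {
        for (int bits = 2; bits <= 10; bits += 4) {
            long[] r = leftover(1, 3, bits);
            System.out.println(bits + " cup sizes: leftover " + r[0] + "/" + r[1]);
        }
    }
}
```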

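In Python 2.6 and earlier (Python 2.7 and 3.1+ print the shortest string that round-trips, 0.1, for the same value):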
>>> 1.0 / 10
0.10000000000000001
Explain how some fractions cannot be represented precisely in binary, just like some fractions (like 1/3) cannot be represented precisely in base 10.

Another example, in C
printf (" %.20f \n", 3.6);
incredibly gives
3.60000000000000008882

Here is my simple understanding.
Problem:
The value 0.45 cannot be represented exactly by a float and is rounded down to 0.449999988. Why is that?
Answer:
An int value of 45 is represented by the binary value 101101.
To get the value 0.45 exactly, you would compute 45 x 10^-2 (= 45 / 10^2).
But that's impossible, because you must use base 2 instead of 10.
So the closest power of two to 10^2 = 100 would be 128 = 2^7. The total number of bits you need is 9: 6 for the value 45 (101101) plus 3 bits for the value 7 (111).
Then the value 45 x 2^-7 = 0.3515625. Now you have a serious inaccuracy problem: 0.3515625 is nowhere near 0.45.
How do we improve this inaccuracy? Well, we could change the values 45 and 7 to something else.
How about 460 x 2^-10 = 0.44921875? You are now using 9 bits for 460 and 4 bits for 10. That's a bit closer, but still not that close. However, if your initial desired value was 0.44921875, you would get an exact match with no approximation.
So the formula for your value would be X = A x 2^B, where A and B are integer values, positive or negative.
Obviously, the larger A and B are allowed to be, the higher your accuracy can become, but the number of bits available to represent A and B is limited. A float has 32 bits in total, a double has 64, and .NET's Decimal has 128.
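Java exposes the stored bits, so the actual A and B for a float can be read out directly. A minimal sketch (class and method names are illustrative) that only handles positive, normal floats, not zero, subnormals, NaN or infinity:

```java
import java.math.BigDecimal;

class FloatParts {
    // Splits a positive, normal float into integers A and B with f == A * 2^B,
    // where A is the 24-bit significand including the implicit leading 1.
    static int[] split(float f) {
        int bits = Float.floatToIntBits(f);
        int exponentField = (bits >>> 23) & 0xFF;   // biased exponent
        int mantissa = bits & 0x7FFFFF;             // 23 stored fraction bits
        int a = mantissa | 0x800000;                // add the implicit leading 1
        int b = exponentField - 127 - 23;           // remove the bias, then scale A to an integer
        return new int[] {a, b};
    }

    public static void main(String[] args) {
        int[] ab = split(0.45f);
        System.out.println(ab[0] + " x 2^" + ab[1]);  // 15099494 x 2^-25
        System.out.println(new BigDecimal(0.45f));     // exact value of that float
    }
}
```

With 24 bits for A, the closest the float can get to 0.45 is 15099494 x 2^-25, which is slightly below 0.45.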

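A cute piece of numerical weirdness may be observed if one converts the double literal 9999999.4999999999 to a float and back to a double. The result is reported as 10000000, even though that value is obviously closer to 9999999, and even though 9999999.499999999 correctly rounds to 9999999. The cause is double rounding: the first literal is already rounded to the double 9999999.5 exactly, and converting that tie to float rounds it to the even neighbour, 10000000.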
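The effect is easy to reproduce in Java; a minimal sketch (the class name is illustrative):

```java
class DoubleRounding {
    public static void main(String[] args) {
        double d1 = 9999999.4999999999;  // rounds to the double 9999999.5 exactly
        double d2 = 9999999.499999999;   // rounds to a double just below 9999999.5
        System.out.println(d1 == 9999999.5);      // true
        System.out.println((double) (float) d1);  // 1.0E7: the tie goes to the even float
        System.out.println((double) (float) d2);  // 9999999.0
    }
}
```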

Related

Java - How to reduce float number precision? [duplicate]

This question already has answers here:
Java float 123.129456 to 123.12 without rounding
How to round a number to n decimal places in Java
Can I reduce the precision of a float number?
In all the searching I've been doing I saw only how to reduce the precision for printing the number. I do not need to print it.
I want, for example, to convert 13.2836 to 13.28. Without even rounding it.
Is it possible?
The suggested answer from the system is not what I am looking for. It also deals with printing the value and I want to have a float.
There isn't really a way to do it, with good reason. While john16384's answer alludes to this, his answer doesn't make the problem clear... so probably you'll try it, it won't do what you want, and perhaps you still won't know why...
The problem is that while we think in decimal and expect that the decimal point is controlled by a power-of-10 exponent, typical floating point implementations (including Java float) use a power-of-2 exponent. Why does it matter?
You know that to represent 1/3 in decimal you'd say 0.3(repeating) - so if you have a limited number of decimal digits, you can't really represent 1/3. When the exponent is 2 instead of 10, you can't really represent 1/5 either, or a lot of other numbers that you could represent exactly in decimal.
As it happens .28 is one of those numbers. So you could multiply by 100, pass the result to floor, and divide by 100, but when this gets converted back to a float, the resulting value will be a little different from .28 and so, if you then check its value, you'll still see more than 2 decimal places.
The solution would be to use something like BigDecimal that can exactly represent decimal values of a given precision.
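A minimal sketch contrasting the two approaches (class and method names are illustrative). Printing the float shows "13.28" because `Float.toString` picks the shortest string that identifies the float, but its exact value is not 13.28; the `BigDecimal` result is:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class TruncateTwoDecimals {
    // Multiply, floor, divide: yields a float close to 13.28, not 13.28 itself.
    static float floorToHundredths(float f) {
        return (float) (Math.floor(f * 100) / 100);
    }

    // Exact decimal digits; RoundingMode.DOWN truncates toward zero instead of rounding.
    static BigDecimal truncateToHundredths(String decimal) {
        return new BigDecimal(decimal).setScale(2, RoundingMode.DOWN);
    }

    public static void main(String[] args) {
        float f = floorToHundredths(13.2836f);
        System.out.println(f);                                // 13.28
        System.out.println(new BigDecimal(f));                // the float's exact value: not 13.28
        System.out.println(truncateToHundredths("13.2836"));  // 13.28
    }
}
```

Note that `Math.floor` rounds toward negative infinity while `RoundingMode.DOWN` truncates toward zero, so the two differ for negative inputs.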
The standard warnings about doing precision arithmetic with floats apply, but you can do this:
float f = 13.2836f;
f = (float) (Math.floor(f * 100) / 100);
If you need to save memory in some part of your calculation, and your numbers are smaller than 2^15/100 (the range of short), you can do the following.
Part of this is taken from this post https://stackoverflow.com/a/25201407/7256243.
float number = 1.2345667f;
short scaled = (short) (100 * number); // 123, stored in 2 bytes instead of 4
float restored = scaled / 100f;        // back to a float, approximately 1.23
You only need to remember that the shorts are 100 times larger.
Most answers went straight to how to represent floats more accurately, which is strange because you're asking:
Can I reduce the precision of a float number
Which is the exact opposite. So I'll try to answer this.
However, there are several ways to "reduce precision":
Reduce precision to gain performance
Reduce memory footprint
Round / floor arbitrarily
Make the number more "fuzzy"
Reduce the number of digits after the decimal point
I'll tackle those separately.
Reduce precision to gain performance
Just to get it out of the way: simply because you're dropping precision off of your calculations on a float doesn't mean it'll be any faster. Quite the contrary. This answer by @john16384:
f = (float) (Math.floor(f * 100) / 100);
only adds computation time. If you know the number of significant digits in the result is low, don't bother removing them; just carry that information along with the number:
public class NumberWithSignificantDigits {
    private float value;
    private int significantDigits;
    // implement basic operations here, but don't floor/round anywhere
}
If you're doing this because you're worried about performance: stop it now, just use the full precision. If not, read on.
Reduce memory footprint
To actually store a number with less precision, you need to move away from float.
One such representation is using an int with a fixed-point convention (i.e. the last 2 digits are past the decimal point).
If you're trying to save on storage space, do this. If not, read on.
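A minimal fixed-point sketch in Java. The `FixedPoint` class and its "count hundredths" convention are hypothetical; it uses `long` to make overflow less likely, though an `int` works the same way for small ranges. Adding ten times 0.10 gives exactly 1.00, because only integers are involved:

```java
class FixedPoint {
    // Hypothetical convention: a long counts hundredths, so 13.28 is stored as 1328.
    static String format(long hundredths) {
        String sign = hundredths < 0 ? "-" : "";
        long abs = Math.abs(hundredths);
        long fraction = abs % 100;
        return sign + (abs / 100) + "." + (fraction < 10 ? "0" : "") + fraction;
    }

    public static void main(String[] args) {
        long tenCents = 10;          // 0.10
        long total = 0;
        for (int i = 0; i < 10; i++) {
            total += tenCents;       // exact integer addition, no binary fractions involved
        }
        System.out.println(format(total)); // 1.00
        System.out.println(format(1328));  // 13.28
    }
}
```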
Round / floor arbitrarily
To keep using float, but drop its precision, several options exist:
@john16384 proposed:
f = (float) (Math.floor(f * 100) / 100);
Or even
f = ((int) (f * 100)) / 100f; // truncates toward zero, unlike floor for negative f
If the answer is this, your question is a duplicate. If not, read on.
Make the number more "fuzzy"
Since you just want to lose precision, but haven't stated how much, you could do it with bit manipulation, clearing the lowest bits of the mantissa:
float v = 13.2836f;
int bits = Float.floatToIntBits(v);
bits = bits & ~((1 << 7) - 1); // precision lost here: the 7 lowest mantissa bits are cleared
float truncated = Float.intBitsToFloat(bits);
Clearing 7 bits keeps 16 of the 23 stored mantissa bits, a relative precision of about 1/65536 (2^-16).
Clearing 10 bits keeps 13 mantissa bits, a relative precision of about 1/8192 (2^-13).
I haven't tested the performance of those, but if you're reading this, you don't care.
If you want to lose precision, and you don't care about formatting (numbers may still have a large number of digits after the decimal point, like 0.9765625 instead of 1), do this. If you care about formatting and want a limited number of digits after the decimal point, read on.
Reduce the number of digits after the decimal point
For this you can:
Follow @Mark Adelsberger's suggestion of BigDecimal, or
Store as a String (yuk)
Because floats or doubles won't let you do this in most cases.

How to round a double/float to BINARY precision?

I am writing tests for code performing calculations on floating-point numbers. Quite expectedly, the results are rarely exact, and I would like to set a tolerance between the calculated and expected result. I have verified that in practice, with double precision, the results are always correct after rounding off the last two significant decimal digits, and usually already after rounding off the last one. I am aware of the format in which doubles and floats are stored, as well as the two main methods of rounding (precise via BigDecimal, and faster via multiplication, Math.round and division). As the mantissa is stored in binary, however, is there a way to perform rounding using base 2 rather than 10?
Just clearing the last 3 bits almost always yields equal results, but if I could push it and instead 'add 2' to the mantissa if its second least significant bit is set, I could probably reach the limit of accuracy. This would be easy enough, except I have no idea how to handle overflow (when all bits 52-1 are set).
A Java solution would be preferred, but I could probably port one for another language if I understood it.
EDIT:
As part of the problem was that my code was generic with regards to arithmetic (relying on scala.Numeric type class), what I did was an incorporation of rounding suggested in the answer into a new numeric type, which carried the calculated number (floating point in this case) and rounding error, essentially representing a range instead of a point. I then overrode equals so that two numbers are equal if their error ranges overlap (and they share arithmetic, i.e. the number type).
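A related way to set a tolerance in binary rather than decimal terms is to count how many representable doubles lie between two values ("ulps apart"). This is a different technique from the questioner's range type and from rounding; a minimal sketch with hypothetical names, valid only for finite values of the same sign:

```java
class UlpCompare {
    // Number of representable doubles from a to b. Only meaningful when a and b
    // are finite and have the same sign: then the IEEE 754 bit patterns,
    // read as longs, are ordered the same way as the values.
    static long ulpsApart(double a, double b) {
        return Math.abs(Double.doubleToLongBits(a) - Double.doubleToLongBits(b));
    }

    static boolean nearlyEqual(double a, double b, long maxUlps) {
        return ulpsApart(a, b) <= maxUlps;
    }

    public static void main(String[] args) {
        double computed = 0.1 + 0.2;
        System.out.println(computed == 0.3);                // false
        System.out.println(ulpsApart(computed, 0.3));       // 1
        System.out.println(nearlyEqual(computed, 0.3, 4));  // true
    }
}
```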
Yes, rounding off binary digits makes more sense than going through BigDecimal and can be implemented very efficiently if you are not worried about being within a small factor of Double.MAX_VALUE.
You can round a floating-point double value x with the following sequence in Java (untested):
double t = 9 * x; // beware: this overflows if x is too close to Double.MAX_VALUE
double y = x - t + t;
After this sequence, y should contain the rounded value. Adjust the distance between the two set bits in the constant 9 in order to adjust the number of bits that are rounded off. The value 3 rounds off one bit. The value 5 rounds off two bits. The value 17 rounds off four bits, and so on.
This sequence of instructions is attributed to Veltkamp and is typically used in “Dekker multiplication”. This page has some references.
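A runnable version of the sequence above, as a sketch (class and method names are illustrative). It generalises the constant 9 to 2^s + 1 and assumes 2 <= s <= 51 and no overflow in the multiplication:

```java
class VeltkampRound {
    // Rounds x to a nearest value with s fewer significant bits (53 - s remain),
    // using Veltkamp's splitting. The constant 2^s + 1 has two set bits s
    // positions apart: 9 = 0b1001 for s = 3. Assumes c * x does not overflow.
    static double roundOffBits(double x, int s) {
        double c = (1L << s) + 1;
        double t = c * x;
        return x - t + t;   // Java evaluates (x - t) + t and never reassociates it
    }

    public static void main(String[] args) {
        double y = roundOffBits(0.1, 3);
        System.out.println(Long.toBinaryString(Double.doubleToLongBits(0.1)));
        System.out.println(Long.toBinaryString(Double.doubleToLongBits(y))); // last 3 bits are 0
    }
}
```

Because Java never rewrites `x - t + t` into `x`, the trick is safe here; in languages with "fast math" compiler options it can be optimised away.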

Weird floor rounding [duplicate]

Possible Duplicate:
Moving decimal places over in a double
I am having this weird problem in Java, I have following code:
double velocity = -0.07;
System.out.println("raw value " + velocity*200 );
System.out.println("floored value " + Math.floor(velocity*200) );
I have following output:
raw value -14.000000000000002
floored value -15.0
Those trailing 0002 screw everything up, and BTW there should not be that trailing 2; I think it should be all zeroes after the decimal point. Can I get rid of that 2?
Update: Thanks for the help. Do you know any way to do floor rounding on a BigDecimal object without calling its doubleValue method?
Because floor(-14.000000000000002) is indeed -15!
You see, floor is defined as the largest whole number less than or equal to the argument. As -14.000000000000002 is not a whole number, the closest whole number downwards is -15.
Now let's clarify why -0.07 * 200 is not exactly -14. This is because the internal representation of floating-point numbers is in base 2, so fractions whose denominator is not a power of 2 cannot be represented with 100% precision. (In the same way, you cannot represent 1/3 as a decimal fraction with a finite number of decimal places.) So the value of velocity is not exactly -0.07. (When the compiler sees the constant -0.07, it silently replaces it with a binary fraction which is quite close to -0.07, but not actually equal to it.) This is why velocity * 200 is not exactly -14.
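For the update in the question (flooring a BigDecimal without going through doubleValue), `setScale(0, RoundingMode.FLOOR)` does the job. A minimal sketch with an illustrative class and method name; the value must be built from the String "-0.07", because `new BigDecimal(-0.07)` would carry over the binary error:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class FloorVelocity {
    // Multiplies in decimal and floors the result, never going through double.
    static BigDecimal floorProduct(String value, long factor) {
        BigDecimal exact = new BigDecimal(value).multiply(BigDecimal.valueOf(factor));
        return exact.setScale(0, RoundingMode.FLOOR);
    }

    public static void main(String[] args) {
        double velocity = -0.07;
        System.out.println(Math.floor(velocity * 200));  // -15.0: the double product is -14.000000000000002
        System.out.println(floorProduct("-0.07", 200));  // -14: the decimal product is exactly -14.00
    }
}
```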
From The Floating-Point Guide:
Why don’t my numbers, like 0.1 + 0.2 add up to a nice round 0.3, and instead I get a weird result like 0.30000000000000004?
Because internally, computers use a format (binary floating-point)
that cannot accurately represent a number like 0.1, 0.2 or 0.3 at all.
When the code is compiled or interpreted, your “0.1” is already
rounded to the nearest number in that format, which results in a small
rounding error even before the calculation happens.
If you need numbers that exactly add up to specific expected values, you cannot use double. Read the linked-to site for details.
Use BigDecimal... The problem above is a well-known rounding problem with the representation schemes used on a computer with finite memory. The problem is that the answer is repetitive in the binary (that is, base 2) system (i.e. like 1/3 = 0.33333333... in decimal) and cannot be represented exactly. A good example of this is 1/10 = 0.1, which is 0.000110011001100110011001100110011... in binary. After some point the 1s and 0s have to end, causing the perceived error.
Hope you're not working on life-critical stuff... for example http://www.ima.umn.edu/~arnold/disasters/patriot.html. 28 people lost their lives due to a rounding error.
Java doubles follow the IEEE 754 floating-point arithmetic, which can't represent every single real number with infinite accuracy. This round up is normal. You can't get rid of it in the internal representation. You can of course use String.format to print the result.

Weird Java behavior: How come adding doubles with EXACTLY two decimal places result to a double with MORE THAN two decimal places?

If I have an array of doubles that each have EXACTLY two decimal places, add them up altogether via a loop, and print out the total, what comes out is a number with MORE THAN two decimal places. Which is weird, because theoretically, adding two numbers that each have 2 and only 2 decimal places will NEVER produce a number that has a non-zero digit beyond the hundredths place.
Try executing this code:
double[] d = new double[2000];
for (int i = 0; i < d.length; i++) {
d[i] = 9.99;
}
double total = 0.00;
for (int i = 0; i < d.length; i++) {
total += d[i];
if (("" + total).matches("[0-9]+\\.[0-9]{3,}")) { // if there are 3 or more decimal places in the total
System.out.println("total: " + total + ", " + i); // print the total and the iteration when it occurred
}
}
In my computer, this prints out:
total: 59.940000000000005, 5
If I round off the total to two decimal places then I'd get the same number as I would if I manually added 9.99 six times on a calculator. But how come this is happening and where are the extra decimal places coming from? Am I doing something wrong or (I doubt this is likely) is this a Java bug?
Are you familiar with base 10 to base 2 conversion (decimal to binary) for fractions? If not, look it up.
Then you'll see that although 9.99 looks pretty normal in base 10, it doesn't really look that nice in binary: it looks like a repeating decimal, but in binary. I'm sure you've seen a repeating decimal before, right? It doesn't end. But Java (or any language for that matter) has to save that infinite sequence of digits into a limited number of bytes. And that's when the extra digits appear. When you convert that truncated binary back to decimal, you're really dealing with a different number. The number stored in the variable isn't 9.99 exactly; it's actually 9.99000000000000021316..., very slightly more.
But you're probably interested in how to solve this, right? Look up the BigDecimal class. That's what you want to use for your calculations, especially when dealing with currency. Also, look up DecimalFormat, which is a class for writing a number as a properly formatted string. I think it does rounding for you when you want to show only 2 decimal digits and your number has a lot more, for example.
If I have an array of doubles that each have EXACTLY two decimal places
Let's stop right there, because I suspect you don't. For example, you give 9.99 in your sample code. That isn't really 9.99. That's "the closest double to 9.99" as 9.99 itself can't be exactly represented in binary floating point.
At that point, the rest of your reasoning goes out of the window.
If you want values with an exact number of decimal digits, you should use a type which stores values in a decimal-centric manner, such as BigDecimal. Alternatively, store everything as integers and "know" that you're actually remembering "the value * 100" instead.
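A minimal sketch of the BigDecimal alternative applied to the question's loop (class and method names are illustrative). The price is parsed from a String so that it is exactly 9.99:

```java
import java.math.BigDecimal;

class SumPrices {
    static double sumDouble(double price, int n) {
        double total = 0;
        for (int i = 0; i < n; i++) {
            total += price;           // each 9.99 is really the nearest double to 9.99
        }
        return total;
    }

    static BigDecimal sumDecimal(String price, int n) {
        BigDecimal exactPrice = new BigDecimal(price); // parsed from text: exactly 9.99
        BigDecimal total = BigDecimal.ZERO;
        for (int i = 0; i < n; i++) {
            total = total.add(exactPrice);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumDouble(9.99, 6));     // shows extra digits, as in the question
        System.out.println(sumDecimal("9.99", 6));  // 59.94
    }
}
```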
Doubles are represented in a binary format on the computer. This means that certain numbers cannot be represented accurately, so the computer will use the closest number that can be represented.
E.g. 10.5 = 2^3+2+2^(-1) = 1.0101 * 2^3 (here the mantissa is in binary)
but 10.1 = 2^3+2+2^(-4)+2^(-5)+(infinite series here) = 1.0100001... * 2^3
9.99 is such a number with an infinite representation. Thus when you add them together, the finite representation used by the computer is used in the calculation, and the result will be even further away from the mathematical sum than the originals were from their true values. This is why you see more digits displayed than were used in the original numbers.
This is because of floating-point arithmetic.
Doubles and floats are not exactly real numbers: there is a finite number of bits to represent them, while there are infinitely many real numbers [in any range], so you cannot represent all real numbers. You are getting the closest number you can have with the floating-point representation.
Whenever you deal with floating-point numbers, remember that they are only an approximation of the number you are seeking. You might want to use BigDecimal if you want the exact number [or at least control over the error].
More info can be found in this article.
Use BigDecimal to perform floating point calculations with precision. It's a must when it comes to money.
This is a known issue that stems from the fact that binary floating-point numbers cannot represent most decimal fractions exactly. Look up "floating-point arithmetic" for more details.
This is due to inaccuracies when it comes to representing decimal numbers using a binary floating-point value. In other words, the double literal 9.99 does not actually represent the mathematical value 9.99.
To reveal exactly which number a value such as 9.99 represents, you can let BigDecimal print it.
Code to reveal the exact value:
System.out.println(new BigDecimal(9.99));
Output:
9.9900000000000002131628207280300557613372802734375
Btw, your reasoning would be completely accurate if you were talking about binary places instead of decimal places, since a number with two binary places can be exactly represented by a binary floating-point value.

BigDecimal.divide(...) - appropriate scale for quotients with non-terminating expansions

I'm using BigDecimal for some floating-point math. If you divide 5 by 4.2 you'll get an exception (as the result has a non-terminating expansion which cannot be represented by a BigDecimal) i.e.
BigDecimal five = new BigDecimal("5");
BigDecimal fourPointTwo = new BigDecimal("4.2");
five.divide(fourPointTwo); // ArithmeticException: Non-terminating decimal expansion; no exact representable decimal result.
I'm prepared to lose some precision in this case, so I am going to use the divide(...) method which allows a scale for the result to be provided:
five.divide(fourPointTwo, 2, RoundingMode.HALF_UP); //Fine, but obviously not 100% accurate
What scale should I pass to this method so that the result is as accurate as if I had performed the calculation using two doubles?
From the javadocs of the BigDecimal class:
If zero or positive, the scale is the number of digits to the right of the decimal point. If negative, the unscaled value of the number is multiplied by ten to the power of the negation of the scale. The value of the number represented by the BigDecimal is therefore (unscaledValue × 10^-scale).
The precision of double varies according to the order of magnitude of the value. According to this, it uses 52 bits to store the mantissa (plus an implicit leading 1 bit), so any integer that fits in 53 bits is represented exactly. That is roughly 16 significant decimal digits; 17 digits are always enough to identify a double uniquely.
Further, double uses 11 bits to store the exponent, so a value can be such an integer multiplied by a power of 2, covering magnitudes from about 10^-308 to 10^308. Beyond that, you start to lose precision.
The remaining bit of double stores the sign.
Since the BigDecimal scale counts digits after the decimal point rather than significant digits, the scale you need depends on the magnitude of the quotient. For a quotient around 1, such as 5 / 4.2, a scale of 17 is already at least as precise as double, and 22 leaves a comfortable margin; very small quotients need a larger scale.
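A sketch comparing a fixed scale with a fixed number of significant digits via `MathContext`; the precision-based variant is a swapped-in alternative, not part of the answer above, and the class and method names are illustrative. The last line shows why scale alone can fail: a tiny quotient loses every digit.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

class DivideLikeDouble {
    // Fixed scale: a set number of digits after the decimal point, whatever the magnitude.
    static BigDecimal byScale(BigDecimal a, BigDecimal b, int scale) {
        return a.divide(b, scale, RoundingMode.HALF_UP);
    }

    // Fixed precision: a set number of significant digits; 17 are enough to
    // distinguish any two doubles.
    static BigDecimal byPrecision(BigDecimal a, BigDecimal b, int digits) {
        return a.divide(b, new MathContext(digits, RoundingMode.HALF_EVEN));
    }

    public static void main(String[] args) {
        BigDecimal five = new BigDecimal("5");
        BigDecimal fourPointTwo = new BigDecimal("4.2");
        System.out.println(byScale(five, fourPointTwo, 22));      // 1.1904761904761904761905
        System.out.println(byPrecision(five, fourPointTwo, 17));  // 1.1904761904761905
        System.out.println(5 / 4.2);                              // the double quotient, for comparison
        System.out.println(byScale(new BigDecimal("1e-30"), fourPointTwo, 22)); // 0E-22: all digits lost
    }
}
```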
Your problem is called "round-off error" or "rounding error". Example:
You have two numbers a and b. You know that each has a certain precision (i.e. number of digits that you're confident of) which means that every other digit is "random" noise.
Imagine b has a precision of two digits. The result of (b*100)-int(b*100) will be random since the operation cuts away all "correct" digits.
These errors propagate depending on the mathematical operation. Some examples:
Error margins add when the numbers are added. If a and b have a precision of two, adding them might turn the second digit of the fraction into garbage: 0.003 + 0.008 = 0.011
Multiplication grows the error fast and exponential functions grow it even faster.
Division by a number larger than 1 shrinks the absolute error margin (0.003 / 3 = 0.001); division by a small number grows it.
So if you want a correct answer, you must calculate the error margins of all operations in your code following the rules outlined above. Link anyone?
Of course, this is usually not an option. So you need to think what amount of error you can live with. For example if you do math on financial data, a precision of 10 or 20 is generally enough because you have enough bits to "waste" for several mathematical operations before the error grows into significant parts of the value.
Example: You start with 10.500 000 000 and 3.100 000 000. If you divide the two, you get 3.387 096 774. From that, you only need 3.39 (rounded to two decimal places); the rest is spare precision which you can use up in further operations until you round the last result to two digits and save it back in the database.
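The example above, written with BigDecimal as a minimal sketch (class and method names are illustrative): intermediate results keep a generous working scale, and only the final value is rounded to the stored precision.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class SparePrecision {
    // Intermediate results keep a generous working scale...
    static BigDecimal divide(String a, String b, int workingScale) {
        return new BigDecimal(a).divide(new BigDecimal(b), workingScale, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        BigDecimal quotient = divide("10.500000000", "3.100000000", 9);
        System.out.println(quotient);                                      // 3.387096774
        // ...and only the final result is rounded to the stored precision.
        System.out.println(quotient.setScale(2, RoundingMode.HALF_EVEN));  // 3.39
    }
}
```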
