Why is float value "++" operation different for 3.14? [duplicate] - java

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 4 years ago.
Code
public class test {
    public static void main(String[] args) {
        double first = 3.14;
        first++;
        System.out.println(first);
    }
}
Result
ubuntu@john:~/Desktop$ javac test.java
ubuntu@john:~/Desktop$ java test
Output : 4.140000000000001
I am getting the expected answer for almost every other case...
E.g., for 4.14 the result is 5.14...
Why is this case special?

There are lots of numbers that can be expressed exactly in decimal, but not exactly in binary. That is, they have a terminating decimal representation, but no terminating binary representation.
To understand this, consider the number 1/3. It doesn't have a terminating decimal representation - we can keep writing 0.3333333333333 for a while, but sooner or later, we have to stop, and we still haven't quite written 1/3.
The same thing happens when we try to write 2.14 in binary. It's 10.001000111... and a bunch more 0s and 1s that eventually start repeating, in the same way as 0.333333 repeats in decimal.
Now a double is just a binary number with 53 significant figures. So it can't store exactly 2.14, but it can get very close. Now see what happens when we start incrementing it.
2.14 = 10.001000111 .... (53 significant figures, 51 of them after the dot)
3.14 = 11.001000111 .... (53 significant figures, 51 of them after the dot)
4.14 = 100.001000111 ... (53 significant figures, 50 of them after the dot)
5.14 = 101.001000111 ... (53 significant figures, 50 of them after the dot)
So we didn't lose any accuracy when we went from 2.14 to 3.14, because the part after the dot didn't change. Likewise when we went from 4.14 to 5.14.
But when we went from 3.14 to 4.14, we lost accuracy, because we needed one extra digit before the dot, so we lost a digit after the dot.
Now Java has a complicated algorithm for figuring out how to display a floating point number. Basically, it picks the shortest decimal representation that is closer to the floating point number you're trying to print than to any other floating point number. That way, if you write double d = 2.14;, then you'll get a floating point number that's SO CLOSE to 2.14 that it will always show up as 2.14 when you print it out.
But as soon as you start messing with the digits after the dot, the complexity of Java's printing algorithm kicks in - and the number can end up printed differently from how you expect.
So this won't happen when you increment a double without changing the number of digits before the dot. It can only happen when incrementing a double carries it past a power of 2, because that changes the number of digits before the dot.
To illustrate this, I ran this code.
for (int i = 0; i < 1000000000; i++) {
    if (i + 1 + 0.14 != i + 0.14 + 1) {
        System.out.println(i + 0.14 + 1);
    }
}
and got this output.
4.140000000000001
1024.1399999999999
2048.1400000000003
4096.139999999999
1048576.1400000001
2097152.1399999997
4194304.140000001
Observe that all these discrepant values are just past a power of two.
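A quick way to see the spacing change directly is Math.ulp, which reports the gap between adjacent doubles near its argument (a minimal sketch, not part of the answer above; the class name UlpDemo is just for illustration):
public class UlpDemo {
    public static void main(String[] args) {
        System.out.println(Math.ulp(3.14)); // about 4.44e-16: spacing of doubles just below 4
        System.out.println(Math.ulp(4.14)); // about 8.88e-16: twice as coarse just above 4
        System.out.println(3.14 + 1);       // 4.140000000000001 -- a bit of the fraction was lost
        System.out.println(4.14 + 1);       // 5.14 -- same exponent, nothing lost
    }
}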

A double increment does add exactly 1, but the sum then has to be rounded to the nearest representable double, and that rounded result can be a tiny bit off.
So it will go fine for a while, but sooner or later the printed answer will not look right.
This is because of round-off 'problems' with floating-point increments/decrements.
It's also bad style: ++ and -- are intended to step a value to its next or previous value, such as the next or previous integer, or (for pointers) the next or previous element in an array.
'Next' and 'previous' values are not well-defined for floats.
see: Is floating point math broken?

Related

When using doubles, why isn't (x / (y * z)) the same as (x / y / z)? [duplicate]

This question already has answers here:
How to avoid floating point precision errors with floats or doubles in Java?
(12 answers)
Double calculation producing odd result [duplicate]
(3 answers)
Closed 7 years ago.
This is partly academic, as for my purposes I only need it rounded to two decimal places; but I am keen to know what is going on to produce two slightly different results.
This is the test that I wrote to narrow it to the simplest implementation:
@Test
public void shouldEqual() {
    double expected = 450.00d / (7d * 60); // 1.0714285714285714
    double actual = 450.00d / 7d / 60;     // 1.0714285714285716
    assertThat(actual).isEqualTo(expected);
}
But it fails with this output:
org.junit.ComparisonFailure:
Expected :1.0714285714285714
Actual :1.0714285714285716
Can anyone explain in detail what is going on under the hood that results in these two values differing in the last decimal place?
Some of the points I'm looking for in an answer are:
Where is the precision lost?
Which method is preferred, and why?
Which is actually correct? (In pure maths, both can't be right. Perhaps both are wrong?)
Is there a better solution or method for these arithmetic operations?
I see a bunch of questions that tell you how to work around this problem, but not one that really explains what's going on, other than "floating-point roundoff error is bad, m'kay?" So let me take a shot at it. Let me first point out that nothing in this answer is specific to Java. Roundoff error is a problem inherent to any fixed-precision representation of numbers, so you get the same issues in, say, C.
Roundoff error in a decimal data type
As a simplified example, imagine we have some sort of computer that natively uses an unsigned decimal data type, let's call it float6d. The length of the data type is 6 digits: 4 dedicated to the mantissa, and 2 dedicated to the exponent. For example, the number 3.142 can be expressed as
3.142 x 10^0
which would be stored in 6 digits as
503142
The first two digits are the exponent plus 50, and the last four are the mantissa. This data type can represent any number from 0.001 x 10^-50 to 9.999 x 10^+49.
Actually, that's not quite true. It can't store just any number. What if you want to represent 3.141592? Or 3.1412034? Or 3.141488906? Tough luck, the data type can't store more than four digits of precision, so the compiler has to round anything with more digits to fit into the constraints of the data type. If you write
float6d x = 3.141592;
float6d y = 3.1412034;
float6d z = 3.141488906;
then the compiler converts each of these three values to the same internal representation, 3.142 x 10^0 (which, remember, is stored as 503142), so that x == y == z will hold true.
The point is that there is a whole range of real numbers which all map to the same underlying sequence of digits (or bits, in a real computer). Specifically, any x satisfying 3.1415 <= x <= 3.1425 (assuming half-even rounding) gets converted to the representation 503142 for storage in memory.
This rounding happens every time your program stores a floating-point value in memory. The first time it happens is when you write a constant in your source code, as I did above with x, y, and z. It happens again whenever you do an arithmetic operation that increases the number of digits of precision beyond what the data type can represent. Either of these effects is called roundoff error. There are a few different ways this can happen:
Addition and subtraction: if one of the values you're adding has a different exponent from the other, you will wind up with extra digits of precision, and if there are enough of them, the least significant ones will need to be dropped. For example, 2.718 and 121.0 are both values that can be exactly represented in the float6d data type. But if you try to add them together:
1.210 x 10^2
+ 0.02718 x 10^2
-------------------
1.23718 x 10^2
which gets rounded off to 1.237 x 10^2, or 123.7, dropping two digits of precision.
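If it helps to see this concretely, the hypothetical float6d type can be imitated in Java with BigDecimal and a 4-significant-digit MathContext (a sketch of my own, not part of the answer; the class name is illustrative):
import java.math.BigDecimal;
import java.math.MathContext;

public class Float6dAddition {
    public static void main(String[] args) {
        MathContext fourDigits = new MathContext(4); // mimic the 4-digit mantissa
        BigDecimal sum = new BigDecimal("121.0").add(new BigDecimal("2.718"), fourDigits);
        System.out.println(sum); // 123.7 -- the trailing digits are dropped, just as above
    }
}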
Multiplication: the number of digits in the result is approximately the sum of the number of digits in the two operands. This will produce some amount of roundoff error, if your operands already have many significant digits. For example, 121 x 2.718 gives you
1.210 x 10^2
x 0.02718 x 10^2
-------------------
3.28878 x 10^2
which gets rounded off to 3.289 x 10^2, or 328.9, again dropping two digits of precision.
However, it's useful to keep in mind that, if your operands are "nice" numbers, without many significant digits, the floating-point format can probably represent the result exactly, so you don't have to deal with roundoff error. For example, 2.3 x 140 gives
1.40 x 10^2
x 0.023 x 10^2
-------------------
3.22 x 10^2
which has no roundoff problems.
Division: this is where things get messy. Division will pretty much always result in some amount of roundoff error unless the number you're dividing by happens to be a power of the base (in which case the division is just a digit shift, or bit shift in binary). As an example, take two very simple numbers, 3 and 7, divide them, and you get
3. x 10^0
/ 7. x 10^0
----------------------------
0.428571428571... x 10^0
The closest value to this number which can be represented as a float6d is 4.286 x 10^-1, or 0.4286, which distinctly differs from the exact result.
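The multiplication and division cases can be reproduced the same way (again a sketch with an illustrative class name, using BigDecimal's 4-significant-digit MathContext to stand in for float6d):
import java.math.BigDecimal;
import java.math.MathContext;

public class Float6dMulDiv {
    public static void main(String[] args) {
        MathContext fourDigits = new MathContext(4);
        // 121 x 2.718 -> 328.878, rounded to 328.9
        System.out.println(new BigDecimal("121").multiply(new BigDecimal("2.718"), fourDigits));
        // 3 / 7 -> 0.428571..., rounded to 0.4286
        System.out.println(new BigDecimal("3").divide(new BigDecimal("7"), fourDigits));
    }
}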
As we'll see in the next section, the error introduced by rounding grows with each operation you do. So if you're working with "nice" numbers, as in your example, it's generally best to do the division operations as late as possible because those are the operations most likely to introduce roundoff error into your program where none existed before.
Analysis of roundoff error
In general, if you can't assume your numbers are "nice", roundoff error can be either positive or negative, and it's very difficult to predict which direction it will go just based on the operation. It depends on the specific values involved. Consider a plot of the roundoff error for 2.718 z as a function of z (still using the float6d data type; the plot itself is not reproduced here, but its key features are described below).
In practice, when you're working with values that use the full precision of your data type, it's often easier to treat roundoff error as a random error. Looking at the plot, you might be able to guess that the magnitude of the error depends on the order of magnitude of the result of the operation. In this particular case, when z is of the order 10^-1, 2.718 z is also on the order of 10^-1, so it will be a number of the form 0.XXXX. The maximum roundoff error is then half of the last digit of precision; in this case, by "the last digit of precision" I mean 0.0001, so the roundoff error varies between -0.00005 and +0.00005. At the point where 2.718 z jumps up to the next order of magnitude, which is 1/2.718 = 0.3679, you can see that the roundoff error also jumps up by an order of magnitude.
You can use well-known techniques of error analysis to analyze how a random (or unpredictable) error of a certain magnitude affects your result. Specifically, for multiplication or division, the "average" relative error in your result can be approximated by adding the relative error in each of the operands in quadrature - that is, square them, add them, and take the square root. With our float6d data type, the relative error varies between 0.0005 (for a value like 0.101) and 0.00005 (for a value like 0.995).
Let's take 0.0001 as a rough average for the relative error in values x and y. The relative error in x * y or x / y is then given by
sqrt(0.0001^2 + 0.0001^2) = 0.0001414
which is a factor of sqrt(2) larger than the relative error in each of the individual values.
When it comes to combining operations, you can apply this formula multiple times, once for each floating-point operation. So for instance, for z / (x * y), the relative error in x * y is, on average, 0.0001414 (in this decimal example) and then the relative error in z / (x * y) is
sqrt(0.0001^2 + 0.0001414^2) = 0.0001732
Notice that the average relative error grows with each operation, specifically as the square root of the number of multiplications and divisions you do.
Similarly, for z / x * y, the average relative error in z / x is 0.0001414, and the relative error in z / x * y is
sqrt(0.0001414^2 + 0.0001^2) = 0.0001732
So, the same, in this case. This means that for arbitrary values, on average, the two expressions introduce approximately the same error. (In theory, that is. I've seen these operations behave very differently in practice, but that's another story.)
Gory details
You might be curious about the specific calculation you presented in the question, not just an average. For that analysis, let's switch to the real world of binary arithmetic. Floating-point numbers in most systems and languages are represented using IEEE standard 754. For 64-bit numbers, the format specifies 52 bits dedicated to the mantissa, 11 to the exponent, and one to the sign. In other words, when written in base 2, a floating point number is a value of the form
1.1100000000000000000000000000000000000000000000000000 x 2^00000000010
(52 bits of mantissa, 11 bits of exponent)
The leading 1 is not explicitly stored, and constitutes a 53rd bit. Also, you should note that the 11 bits stored to represent the exponent are actually the real exponent plus 1023. For example, this particular value is 7, which is 1.75 x 2^2. The mantissa is 1.75 in binary, or 1.11, and the exponent is 1023 + 2 = 1025 in binary, or 10000000001, so the content stored in memory is
0 10000000001 1100000000000000000000000000000000000000000000000000
(sign) (exponent) (mantissa)
but that doesn't really matter.
Your example also involves 450,
1.1100001000000000000000000000000000000000000000000000 x 2^00000001000
and 60,
1.1110000000000000000000000000000000000000000000000000 x 2^00000000101
You can play around with these values using this converter or any of many others on the internet.
When you compute the first expression, 450/(7*60), the processor first does the multiplication, obtaining 420, or
1.1010010000000000000000000000000000000000000000000000 x 2^00000001000
Then it divides 450 by 420. This produces 15/14, which is
1.0001001001001001001001001001001001001001001001001001001001001001001001...
in binary. Now, the Java language specification says that
Inexact results must be rounded to the representable value nearest to the infinitely precise result; if the two nearest representable values are equally near, the one with its least significant bit zero is chosen. This is the IEEE 754 standard's default rounding mode known as round to nearest.
and the nearest representable value to 15/14 in 64-bit IEEE 754 format is
1.0001001001001001001001001001001001001001001001001001 x 2^00000000000
which is approximately 1.0714285714285714 in decimal. (More precisely, this is the least precise decimal value that uniquely specifies this particular binary representation.)
On the other hand, if you compute 450 / 7 first, the result is 64.2857142857..., or in binary,
1000000.01001001001001001001001001001001001001001001001001001001001001001...
for which the nearest representable value is
1.0000000100100100100100100100100100100100100100100101 x 2^00000000110
which is 64.28571428571429180465... Note the change in the last digit of the binary mantissa (compared to the exact value) due to roundoff error. Dividing this by 60 gives you
1.000100100100100100100100100100100100100100100100100110011001100110011...
Look at the end: the pattern is different! It's 0011 that repeats, instead of 001 as in the other case. The closest representable value is
1.0001001001001001001001001001001001001001001001001010 x 2^00000000000
which differs from the other order of operations in the last two bits: they're 10 instead of 01. The decimal equivalent is 1.0714285714285716.
The specific rounding that causes this difference should be clear if you look at the exact binary values:
1.0001001001001001001001001001001001001001001001001001001001001001001001...
1.0001001001001001001001001001001001001001001001001001100110011001100110...
(the 52nd bit after the binary point is the last bit of the mantissa; the two exact quotients agree up to it and first differ in the very next bit)
It works out in this case that the former result, numerically 15/14, happens to be the most accurate representation of the exact value. This is an example of how leaving division until the end benefits you. But again, this rule only holds as long as the values you're working with don't use the full precision of the data type. Once you start working with inexact (rounded) values, you no longer protect yourself from further roundoff errors by doing the multiplications first.
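To see the two stored values from the question in full, you can ask BigDecimal for the exact decimal expansion of each double (a sketch of my own, not part of the answer above; new BigDecimal(double) preserves the exact binary value):
import java.math.BigDecimal;

public class ExactExpansion {
    public static void main(String[] args) {
        double multiplyFirst = 450.00d / (7d * 60); // prints as 1.0714285714285714
        double divideTwice   = 450.00d / 7d / 60;   // prints as 1.0714285714285716
        // The two exact expansions differ only in the last couple of bits, as shown above
        System.out.println(new BigDecimal(multiplyFirst));
        System.out.println(new BigDecimal(divideTwice));
    }
}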
It has to do with how the double type is implemented and the fact that the floating-point types don't make the same precision guarantees as other simpler numerical types. Although the following answer is more specifically about sums, it also answers your question by explaining how there is no guarantee of infinite precision in floating-point mathematical operations: Why does changing the sum order returns a different result?. Essentially you should never attempt to determine the equality of floating-point values without specifying an acceptable margin of error. Google's Guava library includes DoubleMath.fuzzyEquals(double, double, double) to determine the equality of two double values within a certain precision. If you wish to read up on the specifics of floating-point equality this site is quite useful; the same site also explains floating-point rounding errors. In summation: the expected and actual values of your calculation differ because of the rounding differing between the calculations due to the order of operations.
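A plain-Java version of the comparison-with-a-margin idea (a sketch; the 1e-9 tolerance is an arbitrary choice for illustration, and Guava's DoubleMath.fuzzyEquals does roughly this for finite values):
public class FuzzyCompare {
    public static void main(String[] args) {
        double expected = 450.00d / (7d * 60);
        double actual = 450.00d / 7d / 60;
        double tolerance = 1e-9; // acceptable margin of error
        System.out.println(Math.abs(expected - actual) <= tolerance); // true
    }
}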
Let's simplify things a bit. What you want to know is why 450d / 420 and 450d / 7 / 60 (specifically) give different results.
Let's see how division is performed in IEEE double-precision floating point format. Without going deep into implementation details, it's basically XOR-ing the sign bits, subtracting the exponent of the divisor from the exponent of the dividend, dividing the mantissas, and normalizing the result.
First, we should represent our numbers in the proper format for double:
450 is 0 10000000111 1100001000000000000000000000000000000000000000000000
420 is 0 10000000111 1010010000000000000000000000000000000000000000000000
7 is 0 10000000001 1100000000000000000000000000000000000000000000000000
60 is 0 10000000100 1110000000000000000000000000000000000000000000000000
Let's first divide 450 by 420
First comes the sign bit, it's 0 (0 xor 0 == 0).
Then comes the exponent. 10000000111b - 10000000111b + 1023 == 10000000111b - 10000000111b + 01111111111b == 01111111111b
Looking good, now the mantissa:
1.1100001000000000000000000000000000000000000000000000 / 1.1010010000000000000000000000000000000000000000000000 == 1.1100001 / 1.101001. There are a couple of different ways to do this, I'll talk a bit about them later. The result is 1.0(001) (you can verify it here).
Now we should normalize the result. Let's see the guard, round and sticky bit values:
0001001001001001001001001001001001001001001001001001 0 0 1
Guard bit's 0, we don't do any rounding. The result is, in binary:
0 01111111111 0001001001001001001001001001001001001001001001001001
Which gets represented as 1.0714285714285714 in decimal.
Now let's divide 450 by 7 by analogy.
Sign bit = 0
Exponent = 10000000111b - 10000000001b + 01111111111b == 10000000101b (in decimal: 1031 - 1025 + 1023 = 1029)
Mantissa = 1.1100001 / 1.11 == 1.00000(001)
Rounding:
0000000100100100100100100100100100100100100100100100 1 0 1
The guard bit is set, and because the quotient keeps repeating beyond it, the sticky bit is set as well. The exact result therefore lies above the halfway point between the two nearest representable values, so rounding to nearest (the default mode for IEEE) tells us to add 1 to the last bit. This gives us the rounded mantissa:
0000000100100100100100100100100100100100100100100101
The result is
0 10000000101 0000000100100100100100100100100100100100100100100101
Which gets represented as 64.28571428571429 in decimal.
Now we will have to divide it by 60... But you already know that we have lost some precision. Dividing 450 by 420 didn't require rounding at all, but here, we already had to round the result at least once. But, for completeness's sake, let's finish the job:
Dividing 64.28571428571429 by 60
Sign bit = 0
Exponent = 10000000101b - 10000000100b + 01111111111b == 10000000000b (in decimal: 1029 - 1028 + 1023 = 1024)
Mantissa = 1.0000000100100100100100100100100100100100100100100101 / 1.111 == 0.10001001001001001001001001001001001001001001001001001100110011
Round and shift:
0.1000100100100100100100100100100100100100100100100100 1 1 0 0 ...
1.0001001001001001001001001001001001001001001001001001 1 0 0 1 ...
The guard bit is set and non-zero bits follow it (the sticky bit is set), so just as in the previous case the exact result lies above the halfway point and we round up, getting the mantissa: 0001001001001001001001001001001001001001001001001010.
As we shifted the mantissa left by one place, we subtract 1 from the exponent, getting
Exponent = 01111111111b
So, the result is:
0 01111111111 0001001001001001001001001001001001001001001001001010
Which gets represented as 1.0714285714285716 in decimal.
Tl;dr:
The first division gave us:
0 01111111111 0001001001001001001001001001001001001001001001001001
And the last division gave us:
0 01111111111 0001001001001001001001001001001001001001001001001010
The difference is in the last 2 bits only, but we could have lost more - after all, to get the second result we had to round twice instead of not at all!
Now, about mantissa division. Floating-point division is implemented in two major ways.
The first is long division (here are some good examples); it's basically the regular long division you learned in school, but in binary instead of decimal, and it's pretty slow. That is what your computer did, and it produces the correctly rounded result the IEEE standard requires.
There is also a faster but less accurate option: multiplication by the inverse. First a reciprocal of the divisor is found, and then a multiplication is performed.
That's because double division often leads to a loss of precision. That loss can vary depending on the order of the divisions.
When you divide by 7d, you have already lost some precision in that intermediate result. Only then do you divide that slightly-off result by 60.
When you divide by 7d * 60, you only have to use division once, thus losing precision only once.
Note that double multiplication can sometimes fail too, but that's much less common.
It is certainly caused by the order of the operations, combined with the fact that doubles aren't exact:
450.00d / (7d * 60) --> a = 7d * 60 --> result = 450.00d / a
vs
450.00d / 7d / 60 --> a = 450.00d /7d --> result = a / 60

Why do floating-point numbers have roundoff errors? [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 8 years ago.
I have been struggling to understand the concept of roundoff errors in Java with floating-point. While I understand that double is not supposed to be used for financial calculations, I don't understand why the 'd' variable does not come out to 0.0. How can I get this to print out the first println?
package zetcom;

public class floatingComparison {
    public static void main(String[] args) {
        double r = Math.sqrt(2);
        double d = r * r - 2;
        if (d == 0) {
            System.out.println("sqrt(2) squared minus 2 is 0");
        } else {
            System.out.println("sqrt(2) squared minus 2 is " + d);
        }
    }
}
Any explanation would be appreciated.
Short answer:
The square root of 2 requires an infinite number of digits -- that is, infinite precision. doubles have 53 significant bits of precision (52 of them stored explicitly). That's a lot, but it's far, far short of infinite.
If you tried to represent 1/3 with two digits (0.33), you wouldn't be surprised at rounding errors, right? You'd multiply it by 3 and be not-at-all-surprised to get an answer of 0.99 instead of 1.0. It's the same thing exactly.
Digging a bit further...
What's a bit more unintuitive is that numbers that can be represented with a finite number of digits in base 10 might not be representable by a finite number of digits in base 2 (which is what doubles and floats use). For instance, 1/10 is 0.1 in base 10, but it's 0.0001100110011... in base 2. So it will also be rounded off when you store it as a double, for the same reason as above: storing 1/10 in a finite number of digits in binary is as impossible as storing 1/3 in a finite number of digits in decimal.
Digging in even more...
And finally, you can look at it the other way around, too. While 1/3 is impossible to write in decimal with finite precision, it's just 0.1 in base 3.
Floating point numbers in virtually all languages are approximate (apart from values that happen to have an exact binary representation, such as powers of 2), because they cannot be represented exactly in binary. This is due to how computers process information: in bits.
For instance, how do you represent 0.3 in binary? You're always going to get a roundoff error if you try to achieve full precision with floating point numbers, because they have to be represented in binary.
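To get the question's first println to fire, the usual approach is to compare against a small tolerance instead of testing for exact equality (a sketch based on the question's code; the class name and the 1e-9 epsilon are my own illustrative choices):
public class FloatingComparisonFixed {
    public static void main(String[] args) {
        double r = Math.sqrt(2);
        double d = r * r - 2;
        double epsilon = 1e-9; // acceptable margin of error
        if (Math.abs(d) < epsilon) {
            System.out.println("sqrt(2) squared minus 2 is 0");
        } else {
            System.out.println("sqrt(2) squared minus 2 is " + d);
        }
    }
}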

Java 1029 / 9.8 = 104.999 [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 8 years ago.
double test = 1029 / 9.8; // = 104.99999...
int inttest1 = (int) test; // 104
int inttest2 = (int)Math.floor(test); // 104
double testtt = 9.8 * 105; // 1029.0
1029 / 9.8 equals 105
but Java returns 104.9999...
A more serious problem is that the integer cast result is 104, not 105.
Why does this happen, and how can I avoid this result?
There are an infinite number of numbers, even in the limited range represented by Java. That's because mathematically, if you give me any two distinct numbers, I can average them to get a number between them. No matter how close they are.
And there are only a limited number of bits available to represent those numbers.
Hence, something has to give. What gives is the precision of the numbers. Not all numbers can be represented exactly, so some (the vast majority actually) are approximations.
For example, 0.1 cannot be represented exactly with IEEE754 encoding, even with a billion bits available to you.
See this answer for more information on the inherent imprecision of limited-storage floating point numbers.
Casting to an int implicitly drops any decimal. No need to call Math.floor() (assuming positive numbers)
To avoid this behavior use BigDecimal;
http://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html
Standard floating point variables are in binary floating point. Many decimal floating point values (which are the ones you type in your code) have no exact representation in binary floating point. So it isn't doing the calculation with the exact numbers you entered but with some values very close to it. You can use Math.round to round the result to the precision you need and most likely the small error will disappear.
If you really need exact decimal calculation use BigDecimal but note that it is much slower.
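Putting both suggestions together for the question's numbers (a sketch of my own; the class name is illustrative):
import java.math.BigDecimal;

public class RoundOrExact {
    public static void main(String[] args) {
        double test = 1029 / 9.8; // 104.99999999999999
        System.out.println(Math.round(test)); // 105 -- rounding hides the tiny error
        // Exact decimal arithmetic: 1029 / 9.8 is exactly 105, so no rounding is even needed
        System.out.println(new BigDecimal("1029").divide(new BigDecimal("9.8"))); // 105
    }
}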

Principle of java float number calculation error [duplicate]

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
How do you explain floating point inaccuracy to fresh programmers and laymen who still think computers are infinitely wise and accurate?
Do you have a favourite example or anecdote which seems to get the idea across much better than a precise, but dry, explanation?
How is this taught in Computer Science classes?
There are basically two major pitfalls people stumble in with floating-point numbers.
The problem of scale. Each FP number has an exponent which determines the overall “scale” of the number, so you can represent either really small values or really large ones, though the number of digits you can devote to that is limited. Adding two numbers of different scale will sometimes result in the smaller one being “eaten” since there is no way to fit it into the larger scale.
PS> $a = 1; $b = 0.0000000000000000000000001
PS> Write-Host a=$a b=$b
a=1 b=1E-25
PS> $a + $b
1
As an analogy for this case you could picture a large swimming pool and a teaspoon of water. Both are of very different sizes, but individually you can easily grasp how much they roughly are. Pouring the teaspoon into the swimming pool, however, will leave you still with roughly a swimming pool full of water.
(If the people learning this have trouble with exponential notation, one can also use the values 1 and 100000000000000000000 or so.)
Then there is the problem of binary vs. decimal representation. A number like 0.1 can't be represented exactly with a limited amount of binary digits. Some languages mask this, though:
PS> "{0:N50}" -f 0.1
0.10000000000000000000000000000000000000000000000000
But you can “amplify” the representation error by repeatedly adding the numbers together:
PS> $sum = 0; for ($i = 0; $i -lt 100; $i++) { $sum += 0.1 }; $sum
9,99999999999998
I can't think of a nice analogy to properly explain this, though. It's basically the same reason you can represent 1/3 only approximately in decimal: to get the exact value you would need to repeat the 3 indefinitely at the end of the decimal fraction.
Similarly, binary fractions are good for representing halves, quarters, eighths, etc. but things like a tenth will yield an infinitely repeating stream of binary digits.
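For Java readers, the same amplification as the PowerShell loop above can be reproduced with a plain double loop (a sketch, not part of the original answer):
public class AddTenthsDemo {
    public static void main(String[] args) {
        double sum = 0;
        for (int i = 0; i < 100; i++) {
            sum += 0.1; // each 0.1 is only a binary approximation
        }
        System.out.println(sum); // prints something like 9.99999999999998, not 10.0
    }
}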
Then there is another problem, though most people don't stumble into that, unless they're doing huge amounts of numerical stuff. But then, those already know about the problem. Since many floating-point numbers are merely approximations of the exact value this means that for a given approximation f of a real number r there can be infinitely many more real numbers r1, r2, ... which map to exactly the same approximation. Those numbers lie in a certain interval. Let's say that rmin is the minimum possible value of r that results in f and rmax the maximum possible value of r for which this holds, then you got an interval [rmin, rmax] where any number in that interval can be your actual number r.
Now, if you perform calculations on that number—adding, subtracting, multiplying, etc.—you lose precision. Every number is just an approximation, therefore you're actually performing calculations with intervals. The result is an interval too and the approximation error only ever gets larger, thereby widening the interval. You may get back a single number from that calculation. But that's merely one number from the interval of possible results, taking into account precision of your original operands and the precision loss due to the calculation.
That sort of thing is called Interval arithmetic and at least for me it was part of our math course at the university.
Show them that the base-10 system suffers from exactly the same problem.
Try to represent 1/3 as a decimal representation in base 10. You won't be able to do it exactly.
So if you write "0.3333", you will have a reasonably exact representation for many use cases.
But if you move that back to a fraction, you will get "3333/10000", which is not the same as "1/3".
Other fractions, such as 1/2 can easily be represented by a finite decimal representation in base-10: "0.5"
Now base-2 and base-10 suffer from essentially the same problem: both have some numbers that they can't represent exactly.
While base-10 has no problem representing 1/10 as "0.1", in base-2 you'd need an infinite representation starting with "0.000110011...".
How's this for an explanation to the layman? One way computers represent numbers is by counting discrete units. These are digital computers. For whole numbers, those without a fractional part, modern digital computers count powers of two: 1, 2, 4, 8, ... (place value, binary digits, and so on). For fractions, digital computers count inverse powers of two: 1/2, 1/4, 1/8, ... The problem is that many numbers can't be represented by a sum of a finite number of those inverse powers. Using more place values (more bits) will increase the precision of the representation of those 'problem' numbers, but never get it exactly, because some numbers can't be represented without an infinite number of bits.
Snooze...
OK, you want to measure the volume of water in a container, and you only have 3 measuring cups: full cup, half cup, and quarter cup. After counting the last full cup, let's say there is one third of a cup remaining. Yet you can't measure that because it doesn't exactly fill any combination of available cups. It doesn't fill the half cup, and the overflow from the quarter cup is too small to fill anything. So you have an error - the difference between 1/3 and 1/4. This error is compounded when you combine it with errors from other measurements.
In python:
>>> 1.0 / 10
0.10000000000000001
Explain how some fractions cannot be represented precisely in binary. Just like some fractions (like 1/3) cannot be represented precisely in base 10.
Another example, in C
printf (" %.20f \n", 3.6);
incredibly gives
3.60000000000000008882
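The Java analogue shows the same thing (a sketch; the extra digits are the exact stored value of the nearest double to 3.6 showing through):
public class PrintMoreDigits {
    public static void main(String[] args) {
        // ask for more decimal digits than the stored value "really" has
        System.out.printf("%.20f%n", 3.6); // e.g. 3.60000000000000008882
    }
}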
Here is my simple understanding.
Problem:
The value 0.45 cannot be accurately represented by a float and is rounded up to 0.450000018. Why is that?
Answer:
An int value of 45 is represented by the binary value 101101.
In order to make the value 0.45, it would be accurate if you could take 45 x 10^-2 (= 45 / 10^2).
But that’s impossible because you must use the base 2 instead of 10.
So the closest you can get to 10^2 = 100 is 128 = 2^7. The total number of bits you need is 9: 6 for the value 45 (101101) + 3 bits for the value 7 (111).
Then the value 45 x 2^-7 = 0.3515625. Now you have a serious inaccuracy problem: 0.3515625 is nowhere near 0.45.
How do we improve this inaccuracy? Well, we could change the values 45 and 7 to something else.
How about 460 x 2^-10 = 0.44921875? You are now using 9 bits for 460 and 4 bits for 10. That's a bit closer, but still not that close. However, if your initial desired value was 0.44921875, then you would get an exact match with no approximation.
So the formula for your value would be X = A x 2^B. Where A and B are integer values positive or negative.
Obviously, the larger A and B can be, the better your accuracy becomes; however, as you know, the number of bits available to represent A and B is limited. For float you have a total of 32 bits. Double has 64 and Decimal has 128.
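The A x 2^B form can be played with directly in Java via Math.scalb (a sketch; the class name is just for illustration):
public class ScaleByPowerOfTwo {
    public static void main(String[] args) {
        System.out.println(Math.scalb(45.0, -7));   // 45 x 2^-7   = 0.3515625
        System.out.println(Math.scalb(460.0, -10)); // 460 x 2^-10 = 0.44921875
    }
}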
A cute piece of numerical weirdness may be observed if one converts 9999999.4999999999 to a float and back to a double. The result is reported as 10000000, even though that value is obviously closer to 9999999, and even though 9999999.499999999 correctly rounds to 9999999.
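A short demo of that observation (a sketch; the printed forms are what a typical JVM shows):
public class DoubleRoundingDemo {
    public static void main(String[] args) {
        // 9999999.4999999999 is so close to 9999999.5 that the double literal already stores 9999999.5;
        // converting that to float then resolves the tie to 10000000 (ties-to-even)
        System.out.println((float) 9999999.4999999999); // 1.0E7
        // with one fewer trailing 9 the stored double falls below the halfway point,
        // so ordinary rounding gives 9999999
        System.out.println(Math.round(9999999.499999999)); // 9999999
    }
}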

Weird Java behavior: How come adding doubles with EXACTLY two decimal places result to a double with MORE THAN two decimal places?

If I have an array of doubles that each have EXACTLY two decimal places, add them up altogether via a loop, and print out the total, what comes out is a number with MORE THAN two decimal places. Which is weird, because theoretically, adding two numbers that each have 2 and only 2 decimal places will NEVER produce a number that has a non-zero digit beyond the hundredths place.
Try executing this code:
double[] d = new double[2000];
for (int i = 0; i < d.length; i++) {
    d[i] = 9.99;
}
double total = 0.00;
for (int i = 0; i < d.length; i++) {
    total += d[i];
    if (("" + total).matches("[0-9]+\\.[0-9]{3,}")) { // if there are 3 or more decimal places in the total
        System.out.println("total: " + total + ", " + i); // print the total and the iteration when it occurred
    }
}
In my computer, this prints out:
total: 59.940000000000005, 5
If I round off the total to two decimal places then I'd get the same number as I would if I manually added 9.99 six times on a calculator. But how come this is happening and where are the extra decimal places coming from? Am I doing something wrong or (I doubt this is likely) is this a Java bug?
Are you familiar with base 10 to base 2 conversion (decimal to binary) for fractions? If not, look it up.
Then you'll see that although 9.99 looks pretty normal in base 10, it doesn't really look that nice in binary; it looks like a repeating decimal, but in binary. I'm sure you've seen a repeating decimal before, right? It doesn't end. But Java (or any language for that matter) has to save that infinite sequence of digits into a limited number of bytes. And that's when the extra digits appear. When you convert that truncated binary back to decimal, you're really dealing with a different number. The number stored in the variable isn't 9.99 exactly, it's something like 9.9999999991 (just an example, I didn't work out the math).
But you're probably interested on how to solve this, right? Look up the BigDecimal class. That's what you want to use for your calculations, especially when dealing with currency. Also, look up DecimalFormat, which is a class for writing a number as a properly formatted string. I think it does rounding for you when you want to show only 2 decimal digits and your number has a lot more, for example.
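A sketch of the BigDecimal approach applied to the question's 9.99 values (constructing from Strings so the inputs really do have exactly two decimal places; the class name is my own):
import java.math.BigDecimal;

public class ExactTotal {
    public static void main(String[] args) {
        BigDecimal total = BigDecimal.ZERO;
        for (int i = 0; i < 6; i++) {
            total = total.add(new BigDecimal("9.99"));
        }
        System.out.println(total); // 59.94 -- exactly two decimal places
    }
}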
If I have an array of doubles that each have EXACTLY two decimal places
Let's stop right there, because I suspect you don't. For example, you give 9.99 in your sample code. That isn't really 9.99. That's "the closest double to 9.99" as 9.99 itself can't be exactly represented in binary floating point.
At that point, the rest of your reasoning goes out of the window.
If you want values with an exact number of decimal digits, you should use a type which stores values in a decimal-centric manner, such as BigDecimal. Alternatively, store everything as integers and "know" that you're actually remembering "the value * 100" instead.
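A sketch of the "remember the value * 100" idea using long cents (illustrative only):
public class CentsTotal {
    public static void main(String[] args) {
        long priceInCents = 999; // 9.99 remembered as 999
        long totalInCents = 0;
        for (int i = 0; i < 6; i++) {
            totalInCents += priceInCents;
        }
        // convert back to a decimal string only for display
        System.out.printf("%d.%02d%n", totalInCents / 100, totalInCents % 100); // 59.94
    }
}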
Doubles are represented in a binary format on the computer. This means that certain numbers cannot be represented accurately, so the computer will use the closest number that can be represented.
E.g. 10.5 = 2^3+2+2^(-1) = 1.0101 * 2^3 (here the mantissa is in binary)
but 10.1 = 2^3+2+2^(-4)+2^(-5)+(infinite series here) = 1.0100001... * 2^3
9.99 is such a number with infinite representation. Thus when you add them together, the finite representation used by the computer is used in the calculation and the result will be even more further away from the mathematical sum than the originals were from their true representation. This is why you see more digits displayed than used in the original numbers.
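The finite representation the computer actually uses can be inspected directly (a sketch of my own; Long.toBinaryString drops the leading zero sign bit):
public class BitPatterns {
    public static void main(String[] args) {
        // sign, exponent and mantissa bits run together
        System.out.println(Long.toBinaryString(Double.doubleToLongBits(10.5))); // terminating mantissa
        System.out.println(Long.toBinaryString(Double.doubleToLongBits(10.1))); // truncated repeating pattern
        System.out.println(Long.toBinaryString(Double.doubleToLongBits(9.99))); // truncated repeating pattern
    }
}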
This is because of floating-point arithmetic.
doubles and floats are not exact real numbers: there is only a finite number of bits to represent them, while there are infinitely many real numbers in any range, so you cannot represent them all - you are getting the closest number the floating point representation can hold.
Whenever you deal with floating point, remember that it is only an approximation of the number you are seeking. You might want to use BigDecimal if you want the exact number [or at least control the error].
More info can be found at this article
Use BigDecimal to perform floating point calculations with precision. It's a must when it comes to money.
This is a known issue that stems from the fact that binary calculations don't allow for precise floating point operations. Look at "floating point arithmetic" for more details.
This is due to inaccuracies when it comes to representing decimal numbers using a binary floating point value. In other words, the double literal 9.99 does not actually represent the mathematical value 9.99.
To reveal exactly what number a value such as 9.99 represents, you can let BigDecimal print the value.
Code to reveal the exact value:
System.out.println(new BigDecimal(9.99));
Output:
9.9900000000000002131628207280300557613372802734375
Btw, your reasoning would be completely accurate if you were talking about binary places instead of decimal places, since a number with two binary places can be exactly represented by a binary floating point value.
