Java: Is (int) double reliable? - java

When I perform simple math in Java with doubles and other numeric types, the double values seem to vary slightly and at random from the expected result, giving something like 5,59999999997 or 6,0000000002. When I cast to int, the double value is obviously rounded down to the next whole number. Does this mean the result could be either 5 or 6? Or does that "5,999999999997" still count as 6, depending on the binary float value? If not, is there a way to have values that fall slightly short be rounded up, but not lower values from 5,5 to 5,999999999996?
I mean, I don't really want to round the value as described in my last sentence. I'd like to always round down to the next whole number, but I don't want an extra decrement caused by inexact double math results.

Converting a double to an int always truncates toward zero: positive values are rounded down and negative values are rounded up. You can round to the nearest whole integer via Math.round(double). The double is varying from what you expect because of floating point error.

If you want to round, you can use the round() method.
double d = 6 +/- some small error
long l = Math.round(d);
Or you can add 0.5 for positive numbers
long l = (long) (d + 0.5);
or
long l = (long) (d + (d < 0 ? -0.5 : 0.5));

I'm not sure I understand the question. Usually when you cast a double to int you add 0.5 to have a nice round.

From the Java Language Specification:
The Java programming language uses round toward zero when converting a floating value to an
integer (§5.1.3), which acts, in this case, as though the number were truncated, discarding
the mantissa bits. Rounding toward zero chooses as its result the format's value closest to
and no greater in magnitude than the infinitely precise result.
So 5,999999999997 when cast to an int will be 5 and 6,0000000002 will be 6. If I understand what you are asking about negative versions of the values (e.g. -5.97), I fail to see how Math.round() does not suffice. -6,0000000002 will be rounded to -6, as will -5,999999999997 and every other value up to (but not including) -5.5.
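The behaviours described in these answers can be checked with a minimal sketch. The helper floorWithTolerance and the tolerance 1e-9 are assumptions made for illustration and do not come from the answers: they cover the case of wanting to round down without an extra decrement when a result lands a hair below a whole number.

```java
class CastDemo {
    // Hypothetical helper: floor after nudging the value upward by a small
    // tolerance, so 5.999999999997 counts as 6 while 5.5 still becomes 5.
    static int floorWithTolerance(double d, double eps) {
        return (int) Math.floor(d + eps);
    }

    public static void main(String[] args) {
        System.out.println((int) 5.999999999997);        // 5: the cast truncates toward zero
        System.out.println((int) 6.0000000002);          // 6
        System.out.println((int) -5.97);                 // -5: toward zero, not downward
        System.out.println(Math.round(5.999999999997));  // 6: nearest whole number
        System.out.println(floorWithTolerance(5.999999999997, 1e-9)); // 6
        System.out.println(floorWithTolerance(5.5, 1e-9));            // 5
    }
}
```

The tolerance has to be larger than the error your calculation can accumulate and smaller than any genuine gap below a whole number that you still want rounded down.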

Related

Java Float multiplication giving wrong result

Multiplication using FLOAT is giving noticeable difference.
public static void main(String[] args) {
// using string and parsing instead of actual data type is part of use case, that is why representing the same here
double v1 = parseDouble("590.0");
double v2 = parseDouble("490.0");
double v3 = parseDouble("391.0");
float v4 = parseFloat("590.0");
float v5 = parseFloat("490.0");
float v6 = parseFloat("391.0");
System.out.println(new BigDecimal(v1 * v2 * v3));
System.out.println(new BigDecimal(v4 * v5 * v6));
System.out.println(BigDecimal.valueOf(Float.parseFloat("289100.0") * Float.parseFloat("391.0")));
System.out.println(BigDecimal.valueOf(Double.parseDouble("289100.0") * Double.parseDouble("391.0")));
}
Output:
113038100 // double multiplication
113038096 // float multiplication
113038096
113038100
For above code,
(590.0 * 490.0 * 391.0) gives 113038100 using double
(590.0 * 490.0 * 391.0) gives 113038096 using float (113038100 - 113038096 = 4 // difference)
I have read through https://floating-point-gui.de/basic/ and I understand how floating-point calculation works, but a difference of 4 is unexpected.
Please help me understand the following:
Is this correct first
Does always float gives wrong numbers ??
As I can see double also use same technique, so how much guarantee we have to get correct result if we use double
Does always float gives wrong numbers ??
It depends on the number: if the number can be represented exactly within float precision, the result will be fine.
"As I can see double also use same technique, so how much guarantee we
have to get correct result if we use double"
double has the same issue, but since double has more precision it is less likely to show up; it still happens.
So when you need a very precise result, as in a scientific or financial app, you will need to use BigDecimal.
Watch this video; it explains how floating-point numbers work:
https://www.youtube.com/watch?v=ajaHQ9S4uTA
Is this correct first
The Java float format is IEEE-754 binary32. In this format, every finite number is represented as a sign, a 24-bit integer, and a scaling by a power of two from 2^-149 to 2^104. The integer part is called the significand. (The format is often described as a sign, a 24-bit number with a binary point after the first bit, so it has a value in [0, 2), and a scaling from 2^-126 to 2^127. These are mathematically equivalent, and the format used here is noted in the IEEE-754 standard as an option.) In normal form, the 24-bit integer is 2^23 or greater. (Representable numbers less than 2^-126 cannot be represented in normal form and are necessarily subnormal.)
In this format, 590 can be represented as +590•2^0 or +9,666,560•2^-14. 490 is +490•2^0 or +16,056,320•2^-15.
Their product is +289,100•2^0 or +9,251,200•2^-5.
391 is +391•2^0 or +12,812,288•2^-15.
The ordinary arithmetic product of +289,100•2^0 and +391•2^0 is +113,038,100•2^0. However, 113,038,100 is not a 24-bit number; it is a 27-bit number. To get it under 2^24, we can adjust the scaling, multiplying the significand by ⅛ and multiplying the scaling by 8 = 2^3.
That gives us +14,129,762.5•2^3. However, now the significand is not an integer. This result is not representable in the float format. To produce a result, the operation of multiplying in the float format is defined to round the ordinary arithmetic result to the nearest representable value. In this case, there is a tie: we could round the .5 up or down. Ties are resolved by rounding to make the low digit even, so we round to +14,129,762•2^3.
+14,129,762•2^3 is 113,038,096. That is the result you got, so it is correct.
Does always float gives wrong numbers ??
This is not wrong; the computer behaved according to its specification.
Observe float is a 32-bit format, but there are infinitely many real numbers. There are even infinitely many rational numbers. It is impossible for a 32-bit format to produce the same results as theoretical real-number arithmetic or rational-number arithmetic. There are simply more possible results than there are representable values.
This is true of the 64-bit double format as well. It is also true of integer formats, fixed-precision formats, and all numerical formats with a fixed number of bits. A fixed number of bits cannot represent infinitely many values.
Your comments suggest you thought floating-point would produce approximate results for fractional values, numbers less than one. But the limitation on how many values can be represented applies at all scales. At each scale (each power of two), only 2^24 values are representable (2^23 in normal form). For scale 2^0, all the non-negative integers below 2^24 are representable. But, above that, only some of the integers are representable. At first, we have to skip every second integer, then every fourth, then every eighth, and so on.
Floating-point arithmetic is designed to approximate real-number arithmetic. It should be used when you want to approximate real-number arithmetic. It should not be used, with rare exceptions, when you want exact arithmetic.
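The spacing argument above can be observed directly. This is a small sketch, not part of the original answer; the class and method names are made up for illustration. It uses Math.ulp to show the gap between neighbouring float values near the product.

```java
class FloatSpacingDemo {
    static float productAsFloat(float a, float b, float c) {
        return a * b * c;
    }

    static double productAsDouble(double a, double b, double c) {
        return a * b * c;
    }

    public static void main(String[] args) {
        float f = productAsFloat(590.0f, 490.0f, 391.0f);
        double d = productAsDouble(590.0, 490.0, 391.0);
        System.out.println((long) f);     // 113038096
        System.out.println((long) d);     // 113038100
        // Distance to the next representable float at this magnitude.
        System.out.println(Math.ulp(f));  // 8.0
        // Converting the exact integer to float lands on the same neighbour.
        System.out.println((float) 113038100L == f); // true
    }
}
```

At this magnitude only every eighth integer is a float, so 113038100 has no float representation and the nearest candidates are 113038096 and 113038104.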

When using doubles, why isn't (x / (y * z)) the same as (x / y / z)? [duplicate]

This question already has answers here:
How to avoid floating point precision errors with floats or doubles in Java?
(12 answers)
Double calculation producing odd result [duplicate]
(3 answers)
Closed 7 years ago.
This is partly academic, as for my purposes I only need it rounded to two decimal places; but I am keen to know what is going on to produce two slightly different results.
This is the test that I wrote to narrow it to the simplest implementation:
@Test
public void shouldEqual() {
double expected = 450.00d / (7d * 60); // 1.0714285714285714
double actual = 450.00d / 7d / 60; // 1.0714285714285716
assertThat(actual).isEqualTo(expected);
}
But it fails with this output:
org.junit.ComparisonFailure:
Expected :1.0714285714285714
Actual :1.0714285714285716
Can anyone explain in detail what is going on under the hood to result in the value at 1.000000000000000X being different?
Some of the points I'm looking for in an answer are:
Where is the precision lost?
Which method is preferred, and why?
Which is actually correct? (In pure maths, both can't be right. Perhaps both are wrong?)
Is there a better solution or method for these arithmetic operations?
I see a bunch of questions that tell you how to work around this problem, but not one that really explains what's going on, other than "floating-point roundoff error is bad, m'kay?" So let me take a shot at it. Let me first point out that nothing in this answer is specific to Java. Roundoff error is a problem inherent to any fixed-precision representation of numbers, so you get the same issues in, say, C.
Roundoff error in a decimal data type
As a simplified example, imagine we have some sort of computer that natively uses an unsigned decimal data type, let's call it float6d. The length of the data type is 6 digits: 4 dedicated to the mantissa, and 2 dedicated to the exponent. For example, the number 3.142 can be expressed as
3.142 x 10^0
which would be stored in 6 digits as
503142
The first two digits are the exponent plus 50, and the last four are the mantissa. This data type can represent any number from 0.001 x 10^-50 to 9.999 x 10^+49.
Actually, that's not true. It can't store any number. What if you want to represent 3.141592? Or 3.1412034? Or 3.141488906? Tough luck, the data type can't store more than four digits of precision, so the compiler has to round anything with more digits to fit into the constraints of the data type. If you write
float6d x = 3.141592;
float6d y = 3.1412034;
float6d z = 3.141488906;
then the compiler converts each of these three values to the same internal representation, 3.142 x 10^0 (which, remember, is stored as 503142), so that x == y == z will hold true.
The point is that there is a whole range of real numbers which all map to the same underlying sequence of digits (or bits, in a real computer). Specifically, any x satisfying 3.1415 <= x <= 3.1425 (assuming half-even rounding) gets converted to the representation 503142 for storage in memory.
This rounding happens every time your program stores a floating-point value in memory. The first time it happens is when you write a constant in your source code, as I did above with x, y, and z. It happens again whenever you do an arithmetic operation that increases the number of digits of precision beyond what the data type can represent. Either of these effects is called roundoff error. There are a few different ways this can happen:
Addition and subtraction: if one of the values you're adding has a different exponent from the other, you will wind up with extra digits of precision, and if there are enough of them, the least significant ones will need to be dropped. For example, 2.718 and 121.0 are both values that can be exactly represented in the float6d data type. But if you try to add them together:
1.210 x 10^2
+ 0.02718 x 10^2
-------------------
1.23718 x 10^2
which gets rounded off to 1.237 x 10^2, or 123.7, dropping two digits of precision.
Multiplication: the number of digits in the result is approximately the sum of the number of digits in the two operands. This will produce some amount of roundoff error, if your operands already have many significant digits. For example, 121 x 2.718 gives you
1.210 x 10^2
x 0.02718 x 10^2
-------------------
3.28878 x 10^2
which gets rounded off to 3.289 x 10^2, or 328.9, again dropping two digits of precision.
However, it's useful to keep in mind that, if your operands are "nice" numbers, without many significant digits, the floating-point format can probably represent the result exactly, so you don't have to deal with roundoff error. For example, 2.3 x 140 gives
1.40 x 10^2
x 0.023 x 10^2
-------------------
3.22 x 10^2
which has no roundoff problems.
Division: this is where things get messy. Division will pretty much always result in some amount of roundoff error unless the number you're dividing by happens to be a power of the base (in which case the division is just a digit shift, or bit shift in binary). As an example, take two very simple numbers, 3 and 7, divide them, and you get
3. x 10^0
/ 7. x 10^0
----------------------------
0.428571428571... x 10^0
The closest value to this number which can be represented as a float6d is 4.286 x 10^-1, or 0.4286, which distinctly differs from the exact result.
As we'll see in the next section, the error introduced by rounding grows with each operation you do. So if you're working with "nice" numbers, as in your example, it's generally best to do the division operations as late as possible because those are the operations most likely to introduce roundoff error into your program where none existed before.
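The float6d examples can be reproduced approximately with BigDecimal and a four-digit MathContext. This is only a sketch under stated assumptions: float6d is the answer's imaginary type, the class below is hypothetical, and it models the four significant digits with half-even rounding but not the two-digit exponent range.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

class Float6dSketch {
    // Four significant decimal digits, ties rounded to even.
    static final MathContext FOUR_DIGITS = new MathContext(4, RoundingMode.HALF_EVEN);

    static BigDecimal add(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b), FOUR_DIGITS);
    }

    static BigDecimal multiply(String a, String b) {
        return new BigDecimal(a).multiply(new BigDecimal(b), FOUR_DIGITS);
    }

    static BigDecimal divide(String a, String b) {
        return new BigDecimal(a).divide(new BigDecimal(b), FOUR_DIGITS);
    }

    public static void main(String[] args) {
        System.out.println(add("121.0", "2.718"));      // 123.7  (exact sum 123.718)
        System.out.println(multiply("121.0", "2.718")); // 328.9  (exact product 328.878)
        System.out.println(multiply("2.3", "140"));     // 322.0  (no roundoff)
        System.out.println(divide("3", "7"));           // 0.4286 (exact 0.428571...)
    }
}
```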
Analysis of roundoff error
In general, if you can't assume your numbers are "nice", roundoff error can be either positive or negative, and it's very difficult to predict which direction it will go just based on the operation. It depends on the specific values involved. Look at this plot of the roundoff error for 2.718 z as a function of z (still using the float6d data type):
In practice, when you're working with values that use the full precision of your data type, it's often easier to treat roundoff error as a random error. Looking at the plot, you might be able to guess that the magnitude of the error depends on the order of magnitude of the result of the operation. In this particular case, when z is of the order 10^-1, 2.718 z is also on the order of 10^-1, so it will be a number of the form 0.XXXX. The maximum roundoff error is then half of the last digit of precision; in this case, by "the last digit of precision" I mean 0.0001, so the roundoff error varies between -0.00005 and +0.00005. At the point where 2.718 z jumps up to the next order of magnitude, which is 1/2.718 = 0.3679, you can see that the roundoff error also jumps up by an order of magnitude.
You can use well-known techniques of error analysis to analyze how a random (or unpredictable) error of a certain magnitude affects your result. Specifically, for multiplication or division, the "average" relative error in your result can be approximated by adding the relative error in each of the operands in quadrature - that is, square them, add them, and take the square root. With our float6d data type, the relative error varies between 0.0005 (for a value like 0.101) and 0.00005 (for a value like 0.995).
Let's take 0.0001 as a rough average for the relative error in values x and y. The relative error in x * y or x / y is then given by
sqrt(0.0001^2 + 0.0001^2) = 0.0001414
which is a factor of sqrt(2) larger than the relative error in each of the individual values.
When it comes to combining operations, you can apply this formula multiple times, once for each floating-point operation. So for instance, for z / (x * y), the relative error in x * y is, on average, 0.0001414 (in this decimal example) and then the relative error in z / (x * y) is
sqrt(0.0001^2 + 0.0001414^2) = 0.0001732
Notice that the average relative error grows with each operation, specifically as the square root of the number of multiplications and divisions you do.
Similarly, for z / x * y, the average relative error in z / x is 0.0001414, and the relative error in z / x * y is
sqrt(0.0001414^2 + 0.0001^2) = 0.0001732
So, the same, in this case. This means that for arbitrary values, on average, the two expressions introduce approximately the same error. (In theory, that is. I've seen these operations behave very differently in practice, but that's another story.)
Gory details
You might be curious about the specific calculation you presented in the question, not just an average. For that analysis, let's switch to the real world of binary arithmetic. Floating-point numbers in most systems and languages are represented using IEEE standard 754. For 64-bit numbers, the format specifies 52 bits dedicated to the mantissa, 11 to the exponent, and one to the sign. In other words, when written in base 2, a floating point number is a value of the form
1.1100000000000000000000000000000000000000000000000000 x 2^00000000010
52 bits 11 bits
The leading 1 is not explicitly stored, and constitutes a 53rd bit. Also, you should note that the 11 bits stored to represent the exponent are actually the real exponent plus 1023. For example, this particular value is 7, which is 1.75 x 2^2. The mantissa is 1.75 in binary, or 1.11, and the exponent is 1023 + 2 = 1025 in binary, or 10000000001, so the content stored in memory is
01000000000111100000000000000000000000000000000000000000000000000
^ ^
exponent mantissa
but that doesn't really matter.
Your example also involves 450,
1.1100001000000000000000000000000000000000000000000000 x 2^00000001000
and 60,
1.1110000000000000000000000000000000000000000000000000 x 2^00000000101
You can play around with these values using this converter or any of many others on the internet.
When you compute the first expression, 450/(7*60), the processor first does the multiplication, obtaining 420, or
1.1010010000000000000000000000000000000000000000000000 x 2^00000001000
Then it divides 450 by 420. This produces 15/14, which is
1.0001001001001001001001001001001001001001001001001001001001001001001001...
in binary. Now, the Java language specification says that
Inexact results must be rounded to the representable value nearest to the infinitely precise result; if the two nearest representable values are equally near, the one with its least significant bit zero is chosen. This is the IEEE 754 standard's default rounding mode known as round to nearest.
and the nearest representable value to 15/14 in 64-bit IEEE 754 format is
1.0001001001001001001001001001001001001001001001001001 x 2^00000000000
which is approximately 1.0714285714285714 in decimal. (More precisely, this is the least precise decimal value that uniquely specifies this particular binary representation.)
On the other hand, if you compute 450 / 7 first, the result is 64.2857142857..., or in binary,
1000000.01001001001001001001001001001001001001001001001001001001001001001...
for which the nearest representable value is
1.0000000100100100100100100100100100100100100100100101 x 2^00000000110
which is 64.28571428571429180465... Note the change in the last digit of the binary mantissa (compared to the exact value) due to roundoff error. Dividing this by 60 gives you
1.000100100100100100100100100100100100100100100100100110011001100110011...
Look at the end: the pattern is different! It's 0011 that repeats, instead of 001 as in the other case. The closest representable value is
1.0001001001001001001001001001001001001001001001001010 x 2^00000000000
which differs from the other order of operations in the last two bits: they're 10 instead of 01. The decimal equivalent is 1.0714285714285716.
The specific rounding that causes this difference should be clear if you look at the exact binary values:
1.0001001001001001001001001001001001001001001001001001001001001001001001...
1.0001001001001001001001001001001001001001001001001001100110011001100110...
^ last bit of mantissa
It works out in this case that the former result, numerically 15/14, happens to be the most accurate representation of the exact value. This is an example of how leaving division until the end benefits you. But again, this rule only holds as long as the values you're working with don't use the full precision of the data type. Once you start working with inexact (rounded) values, you no longer protect yourself from further roundoff errors by doing the multiplications first.
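To see the two results and their bit patterns on your own machine, a minimal sketch along these lines may help. The class and method names are invented for illustration; Double.doubleToLongBits exposes the raw IEEE 754 encoding.

```java
class DivisionOrderDemo {
    static double multiplyThenDivide(double x, double y, double z) {
        return x / (y * z);   // one rounding in the division; 7 * 60 is exact
    }

    static double divideTwice(double x, double y, double z) {
        return x / y / z;     // rounds after each of the two divisions
    }

    // 64-character binary string of the raw encoding, zero-padded on the left.
    static String toBits(double v) {
        String s = Long.toBinaryString(Double.doubleToLongBits(v));
        return "0".repeat(64 - s.length()) + s;
    }

    public static void main(String[] args) {
        double a = multiplyThenDivide(450.0, 7.0, 60.0);
        double b = divideTwice(450.0, 7.0, 60.0);
        System.out.println(a);          // 1.0714285714285714
        System.out.println(b);          // 1.0714285714285716
        System.out.println(toBits(a));
        System.out.println(toBits(b));
        // The encodings are adjacent: they differ by one unit in the last place.
        System.out.println(Double.doubleToLongBits(b) - Double.doubleToLongBits(a)); // 1
    }
}
```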
It has to do with how the double type is implemented and the fact that the floating-point types don't make the same precision guarantees as other simpler numerical types. Although the following answer is more specifically about sums, it also answers your question by explaining how there is no guarantee of infinite precision in floating-point mathematical operations: Why does changing the sum order returns a different result?. Essentially you should never attempt to determine the equality of floating-point values without specifying an acceptable margin of error. Google's Guava library includes DoubleMath.fuzzyEquals(double, double, double) to determine the equality of two double values within a certain precision. If you wish to read up on the specifics of floating-point equality this site is quite useful; the same site also explains floating-point rounding errors. In summation: the expected and actual values of your calculation differ because of the rounding differing between the calculations due to the order of operations.
Let's simplify things a bit. What you want to know is why 450d / 420 and 450d / 7 / 60 (specifically) give different results.
Let's see how division is performed in IEEE double-precision floating point format. Without going deep into implementation details, it's basically XOR-ing the sign bit, subtracting the exponent of the divisor from the exponent of the dividend, dividing the mantissas, and normalizing the result.
First, we should represent our numbers in the proper format for double:
450 is 0 10000000111 1100001000000000000000000000000000000000000000000000
420 is 0 10000000111 1010010000000000000000000000000000000000000000000000
7 is 0 10000000001 1100000000000000000000000000000000000000000000000000
60 is 0 10000000100 1110000000000000000000000000000000000000000000000000
Let's first divide 450 by 420
First comes the sign bit, it's 0 (0 xor 0 == 0).
Then comes the exponent. 10000000111b - 10000000111b + 1023 == 10000000111b - 10000000111b + 01111111111b == 01111111111b
Looking good, now the mantissa:
1.1100001000000000000000000000000000000000000000000000 / 1.1010010000000000000000000000000000000000000000000000 == 1.1100001 / 1.101001. There are a couple of different ways to do this, I'll talk a bit about them later. The result is 1.0(001) (you can verify it here).
Now we should normalize the result. Let's see the guard, round and sticky bit values:
0001001001001001001001001001001001001001001001001001 0 0 1
Guard bit's 0, we don't do any rounding. The result is, in binary:
0 01111111111 0001001001001001001001001001001001001001001001001001
Which gets represented as 1.0714285714285714 in decimal.
Now let's divide 450 by 7 by analogy.
Sign bit = 0
Exponent = 10000000111b - 10000000001b + 01111111111b == -01111111001b + 01111111111b + 01111111111b == 10000000101b
Mantissa = 1.1100001 / 1.11 == 1.00000(001)
Rounding:
0000000100100100100100100100100100100100100100100100 1 0 1
Guard bit is set, round bit is not, and the sticky bit is set because the repeating pattern leaves nonzero bits further out. We are rounding to-nearest (default mode for IEEE), and the discarded part is more than half of the last kept bit, so we round up by adding 1. This gives us the rounded mantissa:
0000000100100100100100100100100100100100100100100101
The result is
0 10000000101 0000000100100100100100100100100100100100100100100101
Which gets represented as 64.28571428571429 in decimal.
Now we will have to divide it by 60... But you already know that we have lost some precision. Dividing 450 by 420 didn't require rounding at all, but here, we already had to round the result at least once. But, for completeness's sake, let's finish the job:
Dividing 64.28571428571429 by 60
Sign bit = 0
Exponent = 10000000101b - 10000000100b + 01111111111b == 01111111110b
Mantissa = 1.0000000100100100100100100100100100100100100100100101 / 1.111 == 0.10001001001001001001001001001001001001001001001001001100110011
Round and shift:
0.1000100100100100100100100100100100100100100100100100 1 1 0 0
1.0001001001001001001001001001001001001001001001001001 1 0 0
Rounding just as in the previous case, we get the mantissa: 0001001001001001001001001001001001001001001001001010.
As we shifted by 1, we add that to the exponent, getting
Exponent = 01111111111b
So, the result is:
0 01111111111 0001001001001001001001001001001001001001001001001010
Which gets represented as 1.0714285714285716 in decimal.
Tl;dr:
The first division gave us:
0 01111111111 0001001001001001001001001001001001001001001001001001
And the last division gave us:
0 01111111111 0001001001001001001001001001001001001001001001001010
The difference is in the last 2 bits only, but we could have lost more - after all, to get the second result, we had to round two times instead of none!
Now, about mantissa division. Floating-point division is implemented in two major ways.
The way mandated by IEEE is long division (here are some good examples; it's basically the regular long division, but with binary instead of decimal), and it's pretty slow. That is what your computer did.
There is also a faster but less accurate option, multiplication by the inverse. First, a reciprocal of the divisor is found, and then multiplication is performed.
That's because double division often leads to a loss of precision. The loss can vary depending on the order of the divisions.
When you divide by 7d, you already lose some precision in the intermediate result. Only then do you divide that inexact result by 60.
When you divide by 7d * 60, you only have to use division once, thus losing precision only once.
Note that double multiplication can sometimes lose precision too, but that's much less common.
Certainly the order of the operations mixed with the fact that doubles aren't precise :
450.00d / (7d * 60) --> a = 7d * 60 --> result = 450.00d / a
vs
450.00d / 7d / 60 --> a = 450.00d /7d --> result = a / 60

How to actually avoid floating point errors when you need to use float?

I am trying to affect the translation of a 3D model using some UI buttons to shift the position by 0.1 or -0.1.
My model position is a three dimensional float so simply adding 0.1f to one of the values causes obvious rounding errors. While I can use something like BigDecimal to retain precision, I still have to convert it from a float and back to a float at the end and it always results in silly numbers that are making my UI look like a mess.
I could just pretty the displayed values but the rounding errors will only get worse with more editing and they make my save files rather hard to read.
So how do I actually avoid these errors when I need to use a float?
The Kahan summation and pairwise summation algorithms help to reduce floating point errors. Here's some Java code for the Kahan algorithm.
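The code that answer linked to is not reproduced here, so the following is a generic sketch of Kahan (compensated) summation and not the linked implementation. The class and method names are assumptions. It adds the same float step many times, once naively and once with compensation.

```java
class KahanSumDemo {
    // Plain accumulation: every addition can discard low-order bits.
    static float naiveSum(float step, int count) {
        float sum = 0f;
        for (int i = 0; i < count; i++) {
            sum += step;
        }
        return sum;
    }

    // Kahan summation: carry the bits lost by each addition into the next one.
    static float kahanSum(float step, int count) {
        float sum = 0f;
        float compensation = 0f;               // estimate of what was lost so far
        for (int i = 0; i < count; i++) {
            float y = step - compensation;     // corrected addend
            float t = sum + y;                 // low-order bits of y may be lost here
            compensation = (t - sum) - y;      // recover what was lost
            sum = t;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(naiveSum(0.1f, 10_000));
        System.out.println(kahanSum(0.1f, 10_000));
    }
}
```

Compensation reduces the drift that builds up over many additions, but it cannot make 0.1f itself exact; the step is still the nearest float to 0.1.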
I would use a Rational class. There are many out there - this one looks like it should work.
One significant cost will be when the Rational is rendered into a float and one when the denominator is reduced to the gcd. The one I posted keeps the numerator and denominator in fully reduced state at all times which should be quite efficient if you are always adding or subtracting 1/10.
This implementation holds the values normalised (i.e. with consistent sign) but unreduced.
You should choose your implementation to best fit your usage.
A simple solution is to use fixed precision, i.e. an integer that is 10x or 100x the value you want.
float f = 10;
f += 0.1f;
becomes
int i = 100;
i += 1; // use as many times as you like
// use i / 10.0 as required.
I wouldn't use float in any case, as you get more rounding errors than with double for next to no benefit (unless you have millions of float values). double gives you 8 more digits of precision, and with sensible rounding you won't see those errors.
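Applied to the model-position problem, the fixed-precision idea might look like the sketch below. The class TenthsPosition is hypothetical: it keeps one coordinate as an integer count of tenths and converts to float only when the renderer or the UI needs a value.

```java
class TenthsPosition {
    private int tenths;   // coordinate stored as a whole number of 0.1 units

    TenthsPosition(int tenths) {
        this.tenths = tenths;
    }

    // Move by a whole number of 0.1 steps; integer addition is exact.
    void nudge(int steps) {
        tenths += steps;
    }

    // Single conversion at the point of use, so errors never accumulate.
    float toFloat() {
        return tenths / 10.0f;
    }

    public static void main(String[] args) {
        TenthsPosition x = new TenthsPosition(100);   // 10.0
        for (int i = 0; i < 7; i++) {
            x.nudge(1);
        }
        System.out.println(x.toFloat());              // 10.7
    }
}
```

Saving the integer count instead of the float also keeps the save files readable.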
If you stick with floats:
The easiest way to avoid the error is using floats which are exact, but
near the desired value which is
round(2^n * value) * 1/2^n.
n is the number of bits, value the number to use (in your case 0.1)
In your case with increasing precision:
n = 4 => 0.125
n = 8 (byte) => 0.1015625
n = 16 (short)=> 0.100006103516....
The long number chains are artefacts of the binary conversion,
the real number has much less bits.
As the floats are exact, addition and subtraction will
not introduce offset errors, but will always be
predictable as long as the number of bits is
not longer than the float value holds.
If you fear that your display will be compromised by
using this solution (because they are odd floats), use
and store only integers (step increase -1/1).
The final value which is internally set is
x = value * step.
As the step increases or decreases by an amount of 1,
precision will be retained.
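The formula round(2^n * value) * 1/2^n from this answer can be tried out with a short sketch. The class and method names are assumptions; Math.scalb(1.0, n) is used to build 2^n.

```java
class DyadicStepDemo {
    // Nearest multiple of 1/2^n to the requested value.
    static double dyadicStep(double value, int n) {
        double scale = Math.scalb(1.0, n);          // 2^n
        return Math.round(value * scale) / scale;
    }

    public static void main(String[] args) {
        System.out.println(dyadicStep(0.1, 4));     // 0.125
        System.out.println(dyadicStep(0.1, 8));     // 0.1015625
        System.out.println(dyadicStep(0.1, 16));    // 0.100006103515625

        // Such steps are exact in binary, so repeated addition does not drift.
        float step = (float) dyadicStep(0.1, 4);
        float position = 0f;
        for (int i = 0; i < 8; i++) {
            position += step;
        }
        System.out.println(position == 1.0f);       // true
    }
}
```

The trade-off is that the step is no longer exactly 0.1, which is why the answer suggests storing an integer step count when the displayed value matters.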

Java sum of all double does not return expected result [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Moving decimal places over in a double
Why is the following sum of numbers not equal to 0.4622? but 0.46219999999999994
Double total = new Double(0.08) + new Double(0.0491) + new Double(0.3218) +
new Double(0.0113) + new Double(0.0); // = 0.46219999999999994
I have an application that checks the users input.
The user inputs 5 decimal numbers and a total number. The application checks if the sum of all 5 numbers, capped at 4 decimals after the decimal point, is equal to the total number.
Capping it gives me 0.4621 which is not equal to 0.4622. I can't use DecimalFormat because it rounds it up. And if I explicitly say round down, then it will fail for this situation.
Any suggestion for how I can solve this?
Try java.math.BigDecimal. Arithmetic on Double rounds the result. You will just have to use the add method, not the + operator.
Avoid using float and double if exact answers are required-- Item 48 -- Effective Java Second edition
Use BigDecimal instead.
Looks like a classic case of floating point arithmetic. If you want exact calculations, use java.math.BigDecimal. Have a look at What Every Computer Scientist Should Know About Floating-Point Arithmetic
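A minimal sketch of the BigDecimal approach for this check follows. The class name is made up, and it assumes the user input is available as strings, which matters: constructing a BigDecimal from a String keeps the decimal digits exactly, whereas constructing it from a double would carry the binary approximation along.

```java
import java.math.BigDecimal;
import java.util.List;

class BigDecimalSumDemo {
    // Exact decimal sum of the user's inputs, parsed from their text form.
    static BigDecimal sum(List<String> inputs) {
        BigDecimal total = BigDecimal.ZERO;
        for (String s : inputs) {
            total = total.add(new BigDecimal(s));
        }
        return total;
    }

    public static void main(String[] args) {
        BigDecimal total = sum(List.of("0.08", "0.0491", "0.3218", "0.0113", "0.0"));
        System.out.println(total);                                       // 0.4622
        // compareTo ignores differences in scale, unlike equals.
        System.out.println(total.compareTo(new BigDecimal("0.4622")) == 0); // true
    }
}
```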
When you use floating point arithmetic you must also use appropriate rounding.
BTW: Don't use an object when a primitive will do.
double total = 0.08 + 0.0491 + 0.3218 + 0.0113 + 0.0;
System.out.printf("%.4f%n", total);
double rounded = Math.round(total * 1e4) / 1e4;
if (rounded == 0.4622)
System.out.println("rounded matched");
prints
0.4622
rounded matched
as expected.
Double and float in Java are internally represented as binary fractions and can therefore be imprecise when representing decimal fractions (IEEE standard 754). If your decimal number calculations require precision, use java.math.BigDecimal.
Floating point representation is a close approximation so you will have these little rounding errors when you use float and double. If you try to convert 0.08 to binary for instance you will realize that you cannot actually do it exactly. You need to consider this whenever you use double and float in calculations.
0.08 (base 10) = 0.00010100011110101110... (base 2)
a repeating pattern. So no matter how many bits you use this will have a rounding error.
That is yet another rounding issue. You should never compare doubles and expect them to be exactly equal. Instead, define a small epsilon and expect the result to be within epsilon of the expected answer.
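An epsilon comparison could be sketched as follows. The helper name nearlyEqual and the tolerance 1e-9 are assumptions chosen for this example; the tolerance only needs to be far smaller than the 0.0001 granularity the application cares about.

```java
class ToleranceCompare {
    // True when a and b differ by no more than epsilon.
    static boolean nearlyEqual(double a, double b, double epsilon) {
        return Math.abs(a - b) <= epsilon;
    }

    public static void main(String[] args) {
        double total = 0.08 + 0.0491 + 0.3218 + 0.0113 + 0.0;
        System.out.println(total);                             // the question reports 0.46219999999999994
        System.out.println(nearlyEqual(total, 0.4622, 1e-9));  // true
        System.out.println(nearlyEqual(0.4621, 0.4622, 1e-9)); // false: a real difference
    }
}
```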
Any floating point value is inexact. The solution is to use DecimalFormat when you have to display the values. And no, it doesn't round up but to the nearest value.
From the javadoc :
DecimalFormat uses half-even rounding (see ROUND_HALF_EVEN) for
formatting.
The internal representation of floating point numbers like Double is often not an exact one. This is why such errors can occur during calculations.
It is always suggested to format such a result to a specific number of digits after the decimal point, so your result would be correctly displayed as "0.4622" with anywhere from 4 to 15 digits.
Perhaps checking the string input directly would be more feasible for you. That is check the length of characters after the decimal place.

Loss of precision after subtracting double from double [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Retain precision with Doubles in java
Alright so I've got the following chunk of code:
int rotation = e.getWheelRotation();
if(rotation < 0)
zoom(zoom + rotation * -.05);
else if(zoom - .05 > 0)
zoom(zoom - rotation * .05);
System.out.println(zoom);
Now, the zoom variable is of type double, initially set to 1. So, I would expect the results to be like 1 - .05 = .95; .95 - .05 = .9; .9 - .05 = .85; etc. This appears to be not the case though when I print the result as you can see below:
0.95
0.8999999999999999
0.8499999999999999
0.7999999999999998
0.7499999999999998
0.6999999999999997
Hopefully someone is able to clearly explain. I searched the internet and I read it has something to do with some limitations when we're storing floats in binary but I still don't quite understand. A solution to my problem is not shockingly important but I would like to understand this kind of behavior.
Java uses IEEE-754 floating point numbers. They're not perfectly precise. The famous example is:
System.out.println(0.1d + 0.2d);
...which outputs 0.30000000000000004.
What you're seeing is just a symptom of that imprecision. You can improve the precision by using double rather than float.
If you're dealing with financial calculations, you might prefer BigDecimal to float or double.
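To see what is actually stored, you can ask BigDecimal for the exact decimal expansion of a double. This is an illustrative sketch with a made-up class name; new BigDecimal(double) converts the binary value exactly, without rounding.

```java
import java.math.BigDecimal;

class ExactValueDemo {
    // Exact decimal expansion of the binary value held in a double.
    static String exactDecimal(double d) {
        return new BigDecimal(d).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(exactDecimal(0.5));        // 0.5, a power of two, stored exactly
        System.out.println(exactDecimal(0.1));        // slightly more than 0.1
        System.out.println(exactDecimal(0.05));       // the step used in the zoom code
        System.out.println(exactDecimal(0.1 + 0.2));  // not the same value as the next line
        System.out.println(exactDecimal(0.3));
    }
}
```

Each subtraction of .05 in the zoom code works with such an approximation and rounds again, which is how the trailing 9s creep in.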
float and double have limited precision because their fractional part is represented as a sum of powers of 2, e.g. 1/2 + 1/4 + 1/8 ... If you have a number like 1/10, it has to be approximated.
For this reason, whenever you deal with floating point you must use reasonable rounding or you can see small errors.
e.g.
System.out.printf("%.2f%n", zoom);
To minimise rounding errors, you could count the number of rotations instead and divide this int value by 20.0. You won't see a rounding error this way, and it will be faster, with fewer magic numbers.
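A sketch of that counting approach is below. The class name and the idea of starting at 20 steps (zoom 1.0) are assumptions; how wheel rotation maps to steps would follow your existing handler.

```java
class ZoomSteps {
    // Zoom level derived from an integer step count; 20 steps correspond to 1.0.
    static double zoomFor(int steps) {
        return steps / 20.0;   // one division per value, so nothing accumulates
    }

    public static void main(String[] args) {
        // Prints 1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7
        for (int steps = 20; steps >= 14; steps--) {
            System.out.println(zoomFor(steps));
        }
    }
}
```

Because each zoom value comes from a single correctly rounded division of two exact integers, it is always the double closest to the intended decimal.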
float and double have precision issues. I would recommend you take a look at the BigDecimal Class. That should take care of precision issues.
Since decimal numbers (and integer numbers as well) can have an infinite number of possible values, they are impossible to map precisely to bits using a standard format. Computers circumvent this problem by limiting the range the numbers can assume.
For example, an int in Java can represent nothing larger than Integer.MAX_VALUE or 2^31 - 1.
For decimal numbers, there is also a problem with the digits after the decimal point, which also might be infinite. This is solved by not allowing all decimal values, but limiting to a (smartly chosen) number of possibilities, based on powers of 2. This happens automatically but is often nothing to worry about; you can interpret your result of 0.899999 as 0.9. In case you do need explicit precision, you will have to resort to other data types, which might have other limitations.
