Math in Java - different results with different objects - java

I'm getting some strange results doing a calculation for the application I'm working on and I thought someone on here might be able to help figure out what's going on.
The requirements for this particular calculation state that the calculation should look like this:
A and B are known
A * B = C
For this particular calculation
A = 0.0410
B = 123456789010
Here are the results I'm seeing:
Calculator:
0.0410 * 123456789010 = 5061728349.41
Java:
B is a double:
0.0410f * 123456789010d = 5.061728489223363E9 = 5061728489.223363
B is a long:
0.0410f * 123456789010l = 5.0617288E9
The loss of precision is of less importance to me (I only need 9 digits of precision anyway) than the difference in the 10s and 1s spot. Why does doing the calculation using the double give me the "wrong" result?
Incidentally, I tried doing the calculation using BigDecimal and got the same result as I did using a double.

The various type conversions that happen are specified by the JLS #5.6.2. In your case (extract):
If either operand is of type double, the other is converted to double.
Otherwise, if either operand is of type float, the other is converted to float.
In 0.0410f * 123456789010d = 5061728489.223363, 0.0410f is first converted to a double, which is not necessarily equal to 0.0410d. In fact, you can try it and see that it is not:
double d1 = 0.041d;
double d2 = 0.041f;
System.out.println(new BigDecimal(d1));
System.out.println(new BigDecimal(d2));
outputs:
0.041000000000000001720845688168992637656629085540771484375
0.041000001132488250732421875
In your next example:
0.0410f * 123456789010L = 5061728768
the long is converted to a float, which you can verify with this example:
float f1 = 0.0410f;
float f2 = 123456789010L;
System.out.println(new BigDecimal(f1)); // 0.041000001132488250732421875
System.out.println(new BigDecimal(f2)); // 123456790528
System.out.println(new BigDecimal(0.0410f * 123456789010L)); // 5061728768
System.out.println(new BigDecimal(f1 * f2)); // 5061728768
As for the precision of float / double operations in general, check this question.
Finally, if you use a BigDecimal, you get the correct answer:
BigDecimal a = new BigDecimal("0.041");
BigDecimal b = new BigDecimal("123456789010");
System.out.println(a.multiply(b)); // outputs 5061728349.410
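To reproduce all three numbers from the question in one place, here is a minimal runnable sketch (the class and method names are illustrative, not from the answer). It also suggests a likely explanation for the asker's BigDecimal result matching the double: BigDecimal has no float constructor, so new BigDecimal(0.0410f) widens the float to double and captures the float's inexact value exactly, whereas new BigDecimal("0.041") captures the decimal value.

```java
import java.math.BigDecimal;

class ProductComparison {
    static final float A = 0.0410f;       // A as a float, as in the question
    static final long B = 123456789010L;  // B as a long

    // float * double: A is widened to double, so 64-bit arithmetic is used,
    // but on the float's inexact value 0.041000001132...
    static double floatTimesDouble() {
        return A * (double) B;
    }

    // float * long: B is converted to float, so 32-bit arithmetic is used
    static float floatTimesLong() {
        return A * B;
    }

    // BigDecimal built from the float: exact arithmetic on an inexact input
    static BigDecimal bigDecimalFromFloat() {
        return new BigDecimal(A).multiply(BigDecimal.valueOf(B));
    }

    // BigDecimal built from decimal strings: exact arithmetic on exact inputs
    static BigDecimal bigDecimalFromString() {
        return new BigDecimal("0.041").multiply(new BigDecimal("123456789010"));
    }

    public static void main(String[] args) {
        System.out.println(floatTimesDouble());                // 5.061728489223363E9
        System.out.println(new BigDecimal(floatTimesLong()));  // 5061728768
        System.out.println(bigDecimalFromFloat());             // 5061728489.2233630... (matches the double)
        System.out.println(bigDecimalFromString());            // 5061728349.410
    }
}
```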

TL;DR: A float cannot represent the 'correct' answer any more exactly. Use a double instead. Also, without an explicit cast the multiplication itself is done inexactly, in float arithmetic.
Answers I get using http://www.ideone.com:
Type of A   Type of B   Type of C   Value of C
float       long        float       5061728768.000000
double      long        double      5061728489.223363
The problem is that the precision of a float is much lower than that of a double, so when it is multiplied by a large number (here about 1.2e11) you lose this precision in the multiplication. If we explicitly cast A to a double for the multiplication:
double C = ((double)A)*B; //=5061728489.223363
Then we get back the additional precision. If we cast the double answer back to a float:
float C = (float)(((double)A)*B); //=5061728256.000000
You see that the answer is different again. The result type of the multiplication is used, in this instance double, but the cast back to float drops precision. Without an explicit cast to double (double C=A*B), the float type is used. With both casts, the multiplication is done in double, and the precision is lost only after the multiplication.
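The three variants above can be compared side by side in a small sketch (the class and method names are hypothetical). The two float results differ because in one case the product is rounded directly to float, while in the other it is first rounded to double and then rounded a second time to float.

```java
import java.math.BigDecimal;

class CastOrder {
    static final float A = 0.0410f;
    static final long B = 123456789010L;

    // Multiply in float: B is converted to float, the product is rounded to float
    static float multiplyAsFloat() {
        return A * B;
    }

    // Multiply in double: A is widened to double, the product is rounded to double
    static double multiplyAsDouble() {
        return ((double) A) * B;
    }

    // Multiply in double, then round the double result to float
    static float multiplyAsDoubleThenNarrow() {
        return (float) (((double) A) * B);
    }

    public static void main(String[] args) {
        System.out.println(new BigDecimal(multiplyAsFloat()));            // 5061728768
        System.out.println(multiplyAsDouble());                           // 5.061728489223363E9
        System.out.println(new BigDecimal(multiplyAsDoubleThenNarrow())); // 5061728256
    }
}
```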

The first calculation is using double (64 bits), the second float (32 bits). What you are seeing is "rounding errors".
In both cases it is a floating-point calculation, but in the second case no double operands are involved, so it just uses 32-bit arithmetic.
Quoting the Java language spec:
If at least one of the operands to a binary operator is of floating-point type, then the operation is a floating-point operation, even if the other is integral.
If at least one of the operands to a numerical operator is of type double, then the operation is carried out using 64-bit floating-point arithmetic, and the result of the numerical operator is a value of type double. If the other operand is not a double, it is first widened (§5.1.5) to type double by numeric promotion (§5.6).
Otherwise, the operation is carried out using 32-bit floating-point arithmetic, and the result of the numerical operator is a value of type float. (If the other operand is not a float, it is first widened to type float by numeric promotion.)

The answer to your question is probably in the Floating-Point Operations section of the Java Language Specification and in this older post. You are probably experiencing rounding errors due to the implicit conversion that is occurring.
The quotes that apply to your situation are these.
For both Java products (float * double and float * long):
If at least one of the operands to a binary operator is of floating-point type, then the operation is a floating-point operation, even if the other is integral.
For float * double:
If at least one of the operands to a numerical operator is of type double, then the operation is carried out using 64-bit floating-point arithmetic, and the result of the numerical operator is a value of type double. If the other operand is not a double, it is first widened (§5.1.5) to type double by numeric promotion (§5.6).
For float * long:
Otherwise, the operation is carried out using 32-bit floating-point arithmetic, and the result of the numerical operator is a value of type float. (If the other operand is not a float, it is first widened to type float by numeric promotion.)
So there is nothing to worry about; just decide what precision you need and use the appropriate casts, if necessary.

32-bit IEEE floating-point numbers have about seven significant decimal digits of precision; 64-bit numbers have about 16. That's all you get. If neither of those is sufficient, you have to use BigDecimal.
This is true in every language that implements the IEEE 754 standard, not just Java.
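The "seven digits" and "16 digits" figures come from the significand widths: 24 bits for float and 53 bits for double. A quick way to see the limits (a sketch; the class and method names are illustrative) is to find the first integers each type cannot hold exactly, 2^24 + 1 and 2^53 + 1, and to check the gap between adjacent floats at the magnitude used in the question.

```java
class PrecisionLimits {
    // Implicit int -> float conversion; rounds to the nearest float
    static float intToFloat(int value) {
        return value;
    }

    // Implicit long -> double conversion; rounds to the nearest double
    static double longToDouble(long value) {
        return value;
    }

    public static void main(String[] args) {
        // 2^24 + 1 = 16777217 is the smallest positive int a float cannot hold
        System.out.println((int) intToFloat(16_777_217));                // 16777216
        // 2^53 + 1 is the smallest positive long a double cannot hold
        System.out.println((long) longToDouble(9_007_199_254_740_993L)); // 9007199254740992
        // Gap between adjacent floats and doubles around 5e9, the question's magnitude
        System.out.println(Math.ulp(5.0e9f)); // 512.0
        System.out.println(Math.ulp(5.0e9));
    }
}
```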

Related

Java implicit conversion

With the following code:
Float a = 1.2;
there is an error because the decimal literal is treated as a double value, and double is a bigger data type than float.
Similarly, an integer literal has type int by default. So why does the following code not give any error?
Byte b = 20;
The compiler is smart enough to figure out that the bit representation of 20 (an int value) can fit into a byte with no loss of data. From the Java Language Specification §5.1.3:
A narrowing primitive conversion from double to float is governed by the IEEE 754 rounding rules (§4.2.4). This conversion can lose precision, but also lose range, resulting in a float zero from a nonzero double and a float infinity from a finite double. A double NaN is converted to a float NaN and a double infinity is converted to the same-signed float infinity.
A narrowing conversion of a signed integer to an integral type T simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the sign of the resulting value to differ from the sign of the input value.
See also this thread.
There are no implicit narrowing conversions in general - constant expressions are the only exception, and they are explicitly allowed by JLS 5.2:
In addition, if the expression is a constant expression (§15.28) of type byte, short, char, or int:
* A narrowing primitive conversion may be used if the type of the variable is byte, short, or char, and the value of the constant expression is representable in the type of the variable.
There is no mention of implicit narrowing conversions being allowed for floating point numbers, so they are forbidden as per the general rule.
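The assignment rules discussed above can be checked with a small sketch (class and method names are hypothetical). Lines that would not compile are kept as comments so the class itself still compiles.

```java
class ConstantNarrowing {
    static final int TWENTY = 20; // a constant variable: final, initialized with a constant expression

    static byte fromLiteral() {
        byte b = 20;            // constant int 20 fits in a byte: implicit narrowing is allowed
        return b;
    }

    static Byte boxed() {
        Byte b = 20;            // narrowing followed by boxing is also allowed for constants
        return b;
    }

    static byte fromConstantVariable() {
        byte b = TWENTY;        // TWENTY is a constant expression, so this compiles too
        return b;
    }

    static float fromDoubleLiteral() {
        // float f = 1.2;       // does not compile: no implicit narrowing from double to float
        float f = (float) 1.2;  // an explicit cast (or the literal 1.2f) is required
        return f;
    }

    // These would also fail to compile:
    //   byte tooBig = 200;       // 200 is not representable in a byte
    //   int n = 20; byte c = n;  // n is not a constant expression
    //   Float boxedFloat = 1.2;  // a double cannot be boxed to Float

    public static void main(String[] args) {
        System.out.println(fromLiteral() + " " + boxed() + " " + fromConstantVariable() + " " + fromDoubleLiteral());
    }
}
```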

java- issue in getting correct value with decimal

I am trying to get the exact calculated value (in my case it should be 66.66..., not 66.0), but it prints 66.0.
If I get 66.66, then I can use Math.round(66.66) so that I get 67.
The code below should return 66.66 after execution, but it returns 66.0:
double d = (2*100)/3;
Please suggest..
Regards
(2*100)/3 performs integer multiplication and division, which results in an integer.
You need to force floating-point calculation by changing one of the operands to a double (or float):
double d = (2.0*100)/3;
As other answers suggested, you can make at least one operand a floating-point number. Or, if you don't want to change the numbers, you can add a cast to the numerator:
double d= (double)(2*100)/3;
System.out.println(d);
prints
66.66666666666667
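The fixes suggested in these answers can be compared directly in a short sketch (class and method names are illustrative). The last variant is an extra pitfall not mentioned in the answers: casting after the division is too late, because the integer division has already discarded the fraction.

```java
class IntegerDivision {
    // All operands are int: 200 / 3 is integer division (66), then widened to 66.0
    static double allInts() {
        return (2 * 100) / 3;
    }

    // A double literal makes the whole expression floating-point
    static double doubleLiteral() {
        return (2.0 * 100) / 3;
    }

    // The cast applies to (2 * 100) only, so the division is done in double
    static double castNumerator() {
        return (double) (2 * 100) / 3;
    }

    // Casting after the division is too late: the fraction is already gone
    static double castTooLate() {
        return (double) ((2 * 100) / 3);
    }

    public static void main(String[] args) {
        System.out.println(allInts());                   // 66.0
        System.out.println(doubleLiteral());             // 66.66666666666667
        System.out.println(castNumerator());             // 66.66666666666667
        System.out.println(castTooLate());               // 66.0
        System.out.println(Math.round(castNumerator())); // 67
    }
}
```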
As Eran says, your expression performs integer arithmetic; you need to write it like this:
double d = (double) (2*100)/3;
and format the result if needed.
You are using only int values in your arithmetic operation, so the result is an int (66), and when you assign it to a double it becomes 66.0. Because the original result is an int, the fractional part has already been lost.
You can use double d = (2.0*100.0)/3.0; or have at least one value with decimal point, so that you can get expected decimal points.
The digits after the decimal point are sometimes loosely called precision. Note, though, that the problem here is not floating-point imprecision: the fractional part is discarded by integer division before the result is ever converted to double.
Rule of thumb:
If either operand is of type double, the other is converted to double for the arithmetic operation and the final result will be a double.
Otherwise, if either is a float, the other is converted to float for the arithmetic operation and the final result will be a float.
Otherwise, if either is of type long, the other is converted to long for the arithmetic operation and the final result will be a long.
Otherwise, both operands are converted to int for the arithmetic operation and the final result will be an int.

type calculation in math expressions

I am preparing for the OCPJP.
I know that:
byte * byte = int
long * int = long
But I was surprised that float * float = float.
float times float may be a very large number, so logically it should be converted to double.
Anyway, to pass the exam I need to know all the rules about it.
Please explain these rules or just point me to the relevant JLS section.
I suppose the relevant part of the JLS is 4.2.4 Floating-Point Operations:
If at least one of the operands to a numerical operator is of type double, then the operation is carried out using 64-bit floating-point arithmetic, and the result of the numerical operator is a value of type double. If the other operand is not a double, it is first widened (§5.1.5) to type double by numeric promotion (§5.6).
Otherwise, the operation is carried out using 32-bit floating-point arithmetic, and the result of the numerical operator is a value of type float. (If the other operand is not a float, it is first widened to type float by numeric promotion.)
So if you multiply two float values, a 32-bit operation is performed. An overflow to infinity will occur if the values are too large.
If you multiply a float by a double, the result will be a double and the float will be expanded to a 64-bit value during the operation.
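Both points can be seen in a minimal sketch (the class and method names are illustrative): float * float stays in 32-bit arithmetic even when the true result exceeds Float.MAX_VALUE, while widening one operand to double gives the operation the much larger double range.

```java
class FloatOverflow {
    // float * float is computed in 32-bit arithmetic; a result beyond Float.MAX_VALUE overflows
    static float floatProduct() {
        return Float.MAX_VALUE * 2f;
    }

    // Widening one operand to double makes it a 64-bit operation with far more range
    static double doubleProduct() {
        return (double) Float.MAX_VALUE * 2f;
    }

    public static void main(String[] args) {
        System.out.println(floatProduct());  // Infinity
        System.out.println(doubleProduct()); // a finite value, about 6.8E38
    }
}
```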
As a result of my investigation I created the following rule:
if (statement contains double)
result - double
else if (statement contains float)
result - float
else if (statement contains long)
result - long
else
result - int
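One way to check these rules yourself (a sketch; the overloaded typeOf helper is hypothetical, not a JDK method) is to let overload resolution report the static type of an expression. Note that the promotion is applied to each binary operator separately, not to the statement as a whole: in 5 / 2 * 1.0 the division is still integer division.

```java
class PromotionRules {
    // Overloads that report the static type the compiler chose for an expression
    static String typeOf(int x)    { return "int"; }
    static String typeOf(long x)   { return "long"; }
    static String typeOf(float x)  { return "float"; }
    static String typeOf(double x) { return "double"; }

    // Per-operator promotion: 5 / 2 is int division (2), only then is 2 * 1.0 done in double
    static double perOperator() {
        return 5 / 2 * 1.0;
    }

    public static void main(String[] args) {
        byte b = 3; short s = 4; char c = 'a';
        int i = 5; long l = 6L; float f = 7f; double d = 8.0;
        System.out.println(typeOf(b * b)); // int
        System.out.println(typeOf(s + c)); // int
        System.out.println(typeOf(l * i)); // long
        System.out.println(typeOf(f * f)); // float
        System.out.println(typeOf(f * l)); // float
        System.out.println(typeOf(d * f)); // double
        System.out.println(perOperator()); // 2.0
    }
}
```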

Java literal value assignment behaviour

In Kathy Sierra's SCJP study guide, in the assignments chapter, we learn that we can declare something like byte b = 7;. Behind the scenes the code is byte b = (byte) 7;. This is because in Java the number 7 is an int literal, so it has to be cast to byte.
Now another situation. A double can hold any float value, as it is a bigger data type. So can we write float f = 10.543;? 10.543 is quite a small value and should fit within a float. Also, a literal like this has type double, so the compiler should implicitly cast it to float. But it doesn't; the compiler stops us. We have to append an F or f after the value.
Why are these two conflicting behaviour there for literal value assignment? In short if byte b = 7 is possible. Why is float f = 10.543 not possible?
You can read JLS 5.2 Assignment Conversion
The compile-time narrowing of constants means that code such as:
byte theAnswer = 42;
is allowed. Without the narrowing, the fact that the integer literal 42 has type int would mean that a cast to byte would be required:
byte theAnswer = (byte)42; // cast is permitted but not required
If the type of the expression cannot be converted to the type of the variable by a conversion permitted in an assignment context, then a compile-time error occurs.
If the type of the variable is float or double, then value set conversion (§5.1.13) is applied to the value v
JLS #3.10.2. Floating-Point Literals
A floating-point literal is of type float if it is suffixed with an ASCII letter F or f; otherwise its type is double and it can optionally be suffixed with an ASCII letter D or d
5.1.3. Narrowing Primitive Conversion
A narrowing primitive conversion from double to float is governed by the IEEE 754 rounding rules (§4.2.4). This conversion can lose precision, but also lose range, resulting in a float zero from a nonzero double and a float infinity from a finite double. A double NaN is converted to a float NaN and a double infinity is converted to the same-signed float infinity.
I hope above clarifies your doubt.
To add to the previous answers, the actual representation of 10.543 is:
float: 10.54300022125244140625
double: 10.5429999999999992610355548095
Since you are actually specifying two different numbers, it makes sense to require an explicit declaration.
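The two values above can be printed exactly with BigDecimal, as in this sketch (the class name is illustrative). BigDecimal has no float constructor, so the float is widened to double first, which is exact.

```java
import java.math.BigDecimal;

class LiteralRepresentation {
    static BigDecimal exactFloat() {
        return new BigDecimal(10.543f); // float widened to double (exactly), then expanded
    }

    static BigDecimal exactDouble() {
        return new BigDecimal(10.543);
    }

    public static void main(String[] args) {
        System.out.println(exactFloat());      // 10.54300022125244140625
        System.out.println(exactDouble());     // slightly below 10.543
        System.out.println(10.543f == 10.543); // false: two different numbers
    }
}
```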
A difference:
when you're "converting" from int to byte, you're just truncating: the high-order bytes are discarded
when you're converting from double to float, you're performing a non-trivial rounding operation, which can't be implicit
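The difference between the two kinds of narrowing shows up clearly when an explicit cast is used, as in this sketch (method names are illustrative): int to byte simply keeps the low 8 bits, while double to float rounds and can even overflow.

```java
class NarrowingKinds {
    // int -> byte keeps only the low 8 bits: 500 is 0x1F4, the low byte 0xF4 reads as -12
    static byte toByte(int value) {
        return (byte) value;
    }

    // double -> float rounds to the nearest float and can overflow to infinity
    static float toFloat(double value) {
        return (float) value;
    }

    public static void main(String[] args) {
        System.out.println(toByte(20));      // 20
        System.out.println(toByte(500));     // -12
        System.out.println(toFloat(10.543)); // 10.543
        System.out.println(toFloat(1e300));  // Infinity
    }
}
```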
Assigning a double to a float can cause precision loss, so Java requires you to say explicitly how you want the assignment performed. Simple truncation could result in significant rounding errors.
Consider that a terminating decimal fraction in base 10 may be a non-terminating fraction in binary, the base used by floating point. Making conversions between floating-point types explicit is therefore a good rule of thumb, useful in the vast majority of cases.
For integral types like byte the situation is slightly different: integral types differ only in their size, and they all have the same decimal precision, which is zero. So there's no ambiguity in assigning a fitting value of a bigger integral type to a smaller integral variable.
"Behind the scene the code is byte b = (byte) 7".
That's not correct. See JLS #5.2 as referred to in several other answers. It says "A narrowing primitive conversion may be used if the type of the variable is byte, short, or char, and the value of the constant expression is representable in the type of the variable."
Nothing there about a typecast.

Why does Java implicitly (without cast) convert a `long` to a `float`?

Every time I think I understand about casting and conversions, I find another strange behavior.
long l = 123456789L;
float f = l;
System.out.println(f); // outputs 1.23456792E8
Given that a long has greater bit-depth than a float, I would expect that an explicit cast would be required in order for this to compile. And not surprisingly, we see that we have lost precision in the result.
Why is a cast not required here?
The same question could be asked of long to double - both conversions may lose information.
Section 5.1.2 of the Java Language Specification says:
Widening primitive conversions do not lose information about the overall magnitude of a numeric value. Indeed, conversions widening from an integral type to another integral type do not lose any information at all; the numeric value is preserved exactly. Conversions widening from float to double in strictfp expressions also preserve the numeric value exactly; however, such conversions that are not strictfp may lose information about the overall magnitude of the converted value.
Conversion of an int or a long value to float, or of a long value to double, may result in loss of precision, that is, the result may lose some of the least significant bits of the value. In this case, the resulting floating-point value will be a correctly rounded version of the integer value, using IEEE 754 round-to-nearest mode (§4.2.4).
In other words even though you may lose information, you know that the value will still be in the overall range of the target type.
The choice could certainly have been made to require all implicit conversions to lose no information at all - so int and long to float would have been explicit and long to double would have been explicit. (int to double is okay; a double has enough precision to accurately represent all int values.)
In some cases that would have been useful - in some cases not. Language design is about compromise; you can't win 'em all. I'm not sure what decision I'd have made...
The Java Language Specification, Chapter 5: Conversion and Promotion addresses this issue:
5.1.2 Widening Primitive Conversion
The following 19 specific conversions on primitive types are called the widening primitive conversions:
byte to short, int, long, float, or double
short to int, long, float, or double
char to int, long, float, or double
int to long, float, or double
long to float or double
float to double
Widening primitive conversions do not lose information about the overall magnitude of a numeric value.
...
Conversion of an int or a long value to float, or of a long value to double, may result in loss of precision, that is, the result may lose some of the least significant bits of the value. In this case, the resulting floating-point value will be a correctly rounded version of the integer value
To put it another way, the JLS distinguishes between a loss of magnitude and a loss of precision.
int to byte for example is a (potential) loss of magnitude because you can't store 500 in a byte.
long to float is a potential loss of precision but not magnitude because the value range for floats is larger than that for longs.
So the rule is:
Loss of magnitude: explicit cast required;
Loss of precision: no cast required.
Subtle? Sure. But I hope that clears that up.
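The magnitude-versus-precision distinction can be observed directly in a sketch like this (class and method names are illustrative): long to float compiles without a cast and keeps the magnitude even for Long.MAX_VALUE, but drops low-order digits, while int to double is always exact.

```java
class MagnitudeVsPrecision {
    // long -> float is a widening conversion: no cast needed, low-order digits may be rounded away
    static float longToFloat(long value) {
        float f = value;
        return f;
    }

    // long -> double is also widening and may also round
    static double longToDouble(long value) {
        double d = value;
        return d;
    }

    // int -> double is widening and always exact (a 53-bit significand holds any 32-bit int)
    static double intToDouble(int value) {
        double d = value;
        return d;
    }

    public static void main(String[] args) {
        System.out.println(longToFloat(123456789L));        // 1.23456792E8 (precision lost)
        System.out.println(longToFloat(Long.MAX_VALUE));    // still about 9.22E18 (magnitude kept)
        System.out.println(longToDouble(Long.MAX_VALUE));
        System.out.println(intToDouble(Integer.MAX_VALUE)); // 2.147483647E9 (exact)
        // byte tooBig = 500; // would not compile: int -> byte can lose magnitude, so a cast is required
    }
}
```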
Though you're correct that a long uses more bits internally than a float, the Java language works on a widening path:
byte -> short -> int -> long -> float -> double
To convert from left to right (a widening conversion), there is no cast necessary (which is why long to float is allowed). To convert right to left (a narrowing conversion) an explicit cast is necessary.
I heard this explanation somewhere: a float is stored in exponential form, much as we would write it in scientific notation; for example, 23500000000 is stored as 2.35e10. So a float has room to cover the range of values of a long. Storing numbers in exponential form is also the reason for the precision loss.
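The exponential-form picture is right in spirit, but IEEE 754 stores a binary exponent rather than a decimal one: 2.35e10 lies between 2^34 and 2^35, so its stored exponent is 34. The sketch below (the field-extraction helpers are illustrative; Float.floatToIntBits and Math.getExponent are standard JDK methods) pulls the three fields out of a float.

```java
class FloatLayout {
    // IEEE 754 binary32 layout: 1 sign bit, 8 exponent bits (bias 127), 23 fraction bits
    static int signBit(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    // Unbiased exponent; valid for normal (non-zero, non-subnormal, finite) numbers only
    static int exponent(float f) {
        return ((Float.floatToIntBits(f) >>> 23) & 0xFF) - 127;
    }

    // The 23 stored fraction bits; the leading 1 of the significand is implicit
    static int fraction(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float f = 23_500_000_000f;
        System.out.println(signBit(f));          // 0
        System.out.println(exponent(f));         // 34, since 2^34 <= 2.35e10 < 2^35
        System.out.println(Math.getExponent(f)); // 34, the same value via the JDK
        System.out.println(Integer.toBinaryString(fraction(f)));
        // The float range is far larger than the long range, at the cost of precision
        System.out.println(Float.MAX_VALUE > Long.MAX_VALUE); // true
    }
}
```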
