I am new to Java and wondering how a double-to-int cast works. I understand that long to int is simple: it keeps the low 32 bits. But what about double (64 bits) to int (32 bits)? The 64 bits of a double are in double-precision floating-point format (sign, exponent, and mantissa), so how is it converted to int internally?
It's all documented in section 5.1.3 of the JLS.
In the first step, the floating-point number is converted either to a long, if T is long, or to an int, if T is byte, short, char, or int, as follows:
If the floating-point number is NaN (§4.2.3), the result of the first step of the conversion is an int or long 0.
Otherwise, if the floating-point number is not an infinity, the floating-point value is rounded to an integer value V, rounding toward zero using IEEE 754 round-toward-zero mode (§4.2.3). Then there are two cases:
If T is long, and this integer value can be represented as a long, then the result of the first step is the long value V.
Otherwise, if this integer value can be represented as an int, then the result of the first step is the int value V.
Otherwise, one of the following two cases must be true:
The value must be too small (a negative value of large magnitude or negative infinity), and the result of the first step is the smallest representable value of type int or long.
The value must be too large (a positive value of large magnitude or positive infinity), and the result of the first step is the largest representable value of type int or long.
(The second step here is irrelevant, when T is int.)
In most cases I'd expect this to be implemented using hardware support - converting floating point numbers to integers is something which is usually handled by CPUs.
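The rules quoted above can be checked with a small sketch. The class and method names here are illustrative only; the cast itself is all that is needed to trigger the JLS 5.1.3 behavior.

```java
// Illustrative sketch; NarrowingDemo and toInt are made-up names.
final class NarrowingDemo {
    // A plain cast applies the JLS 5.1.3 narrowing rules.
    static int toInt(double d) {
        return (int) d;
    }

    public static void main(String[] args) {
        System.out.println(toInt(Double.NaN));               // 0 (NaN maps to 0)
        System.out.println(toInt(2.9));                      // 2 (rounds toward zero)
        System.out.println(toInt(-2.9));                     // -2 (rounds toward zero)
        System.out.println(toInt(1e10));                     // 2147483647 (too large)
        System.out.println(toInt(Double.NEGATIVE_INFINITY)); // -2147483648 (too small)
    }
}
```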
Java truncates its value if you use (int) cast, as you may notice:
double d = 2.4d;
int i = (int) d;
System.out.println(i);
d = 2.6;
i = (int) d;
System.out.println(i);
Output:
2
2
Unless you use Math.round, Math.ceil, Math.floor...
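As a rough comparison of the alternatives just mentioned, the sketch below applies the cast and the three Math methods to a positive and a negative value. The class and method names are illustrative.

```java
// Illustrative sketch comparing truncation with the Math rounding helpers.
final class RoundingDemo {
    static int truncate(double d) { return (int) d; }       // toward zero
    static long nearest(double d) { return Math.round(d); } // to nearest, returns long
    static double up(double d)    { return Math.ceil(d); }  // toward +infinity
    static double down(double d)  { return Math.floor(d); } // toward -infinity

    public static void main(String[] args) {
        for (double d : new double[] {2.6, -2.6}) {
            System.out.println(d + " -> cast " + truncate(d)
                    + ", round " + nearest(d)
                    + ", ceil " + up(d)
                    + ", floor " + down(d));
        }
        // 2.6 -> cast 2, round 3, ceil 3.0, floor 2.0
        // -2.6 -> cast -2, round -3, ceil -2.0, floor -3.0
    }
}
```

Note that Math.ceil and Math.floor return a double, so a cast is still needed to get an int.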
You may want to read the Java specification:
http://docs.oracle.com/javase/specs/jls/se7/html/jls-5.html#jls-5.1.3
The relevant section is 5.1.3 - "Narrowing Primitive Conversion".
I have been exploring how Java deals with integer overflow and underflow and I came across these 2 situations:
If an out-of-range value is converted to an int, it may or may not wrap around, depending on the source type:
long tooBigLong=2147483648L;
int integerL=(int)tooBigLong;
double tooBigDouble=Math.pow(2, 31);
int integerD=(int)tooBigDouble;
results:
Converted from Long to int: -2147483648
Converted from Double to int: 2147483647
It seems that integer wrap-around only occurs when I convert from long to int.
Why doesn't it happen for a double-to-int conversion?
Thanks for the help!
The JLS specifies the rules for 5.1.3. Narrowing Primitive Conversion:
When casting long to int:
A narrowing conversion of a signed integer to an integral type T simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the sign of the resulting value to differ from the sign of the input value.
Hence, dropping the top 32 bits of the original long and keeping the bottom 32 bits results in a negative int in your example.
When casting double to int:
A narrowing conversion of a floating-point number to an integral type T takes two steps:
In the first step, the floating-point number is converted either to a long, if T is long, or to an int, if T is byte, short, char, or int, as follows:
If the floating-point number is NaN (§4.2.3), the result of the first step of the conversion is an int or long 0.
Otherwise, if the floating-point number is not an infinity, the floating-point value is rounded to an integer value V, rounding toward zero using IEEE 754 round-toward-zero mode (§4.2.3). Then there are two cases:
a. If T is long, and this integer value can be represented as a long, then the result of the first step is the long value V.
b. Otherwise, if this integer value can be represented as an int, then the result of the first step is the int value V.
Otherwise, one of the following two cases must be true:
a. The value must be too small (a negative value of large magnitude or negative infinity), and the result of the first step is the smallest representable value of type int or long.
b. The value must be too large (a positive value of large magnitude or positive infinity), and the result of the first step is the largest representable value of type int or long.
Case b of the last rule determines that (int)tooBigDouble results in Integer.MAX_VALUE. You would get the same result for larger double values, for example with double tooBigDouble=Math.pow(2, 39);
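The difference between the two conversions can be seen side by side in a short sketch. The class and method names are illustrative; the values are the ones from the question plus a couple of extra cases.

```java
// Illustrative sketch: long-to-int keeps the low 32 bits, double-to-int clamps.
final class WrapVersusClamp {
    static int fromLong(long v)     { return (int) v; } // discards the high 32 bits
    static int fromDouble(double v) { return (int) v; } // clamps to the int range

    public static void main(String[] args) {
        System.out.println(fromLong(2147483648L));          // -2147483648 (wraps)
        System.out.println(fromLong(4294967297L));          // 1 (2^32 + 1, low 32 bits)
        System.out.println(fromDouble(Math.pow(2, 31)));    // 2147483647 (clamped)
        System.out.println(fromDouble(Math.pow(2, 39)));    // 2147483647 (clamped)
        System.out.println(fromDouble(-Math.pow(2, 39)));   // -2147483648 (clamped)
    }
}
```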
I've started reading through the Java 8 documentation and trying out sample code. I found the strange behavior below.
Sample1
Double di = new Double(Math.pow(2,32-1));
System.out.printf("%f\n",di.doubleValue()); //2147483648.000000
int a= di.intValue();
System.out.println(a); //2147483647
Sample2
Double di = new Double(Math.pow(2,32-1)) - 1.0;
System.out.printf("%f\n",di.doubleValue()); //2147483647.000000
int a= di.intValue();
System.out.println(a); //2147483647
Why is the int value the same in both cases?
Please see https://docs.oracle.com/javase/specs/jls/se7/html/jls-5.html#jls-5.1.3 The important part is the last case at the end:
A narrowing conversion of a floating-point number to an integral type T takes two steps:
In the first step, the floating-point number is converted either to a long, if T is long, or to an int, if T is byte, short, char, or int, as follows:
If the floating-point number is NaN (§4.2.3), the result of the first step of the conversion is an int or long 0.
Otherwise, if the floating-point number is not an infinity, the floating-point value is rounded to an integer value V, rounding toward zero using IEEE 754 round-toward-zero mode (§4.2.3). Then there are two cases:
If T is long, and this integer value can be represented as a long, then the result of the first step is the long value V.
Otherwise, if this integer value can be represented as an int, then the result of the first step is the int value V.
Otherwise, one of the following two cases must be true:
The value must be too small (a negative value of large magnitude or negative infinity), and the result of the first step is the smallest representable value of type int or long.
The value must be too large (a positive value of large magnitude or positive infinity), and the result of the first step is the largest representable value of type int or long.
That said, your double value is 2147483648 (you can try a higher number too). The highest representable int value is 2147483647. That's why you end up with 2147483647.
An int value cannot exceed Integer.MAX_VALUE, which is exactly 2147483647.
You voluntarily give up everything that does not fit when you call intValue().
This is because the maximum value of the int type in Java is 2147483647. When you invoke Double::intValue, it performs a narrowing conversion, which clamps a value that is out of bounds, as stated in the method's docs:
Returns the value of this Double as an int after a narrowing primitive conversion.
And it even points to JLS 5.1.3 where this narrowing conversion is described.
Converting floating-point values takes two steps. The reason you see this value is explained in the first step, third rule, option b:
The value must be too large (a positive value of large magnitude or positive infinity), and the result of the first step is the largest representable value of type int or long.
A narrowing conversion of a floating-point number to an integral type T takes two steps:
In the first step, the floating-point number is converted either to a long, if T is long, or to an int, if T is byte, short, char, or int, as follows:
If the floating-point number is NaN (§4.2.3), the result of the first step of the conversion is an int or long 0.
Otherwise, if the floating-point number is not an infinity, the floating-point value is rounded to an integer value V, rounding toward zero using IEEE 754 round-toward-zero mode (§4.2.3). Then there are two cases:
a If T is long, and this integer value can be represented as a long, then the result of the first step is the long value V.
b Otherwise, if this integer value can be represented as an int, then the result of the first step is the int value V.
Otherwise, one of the following two cases must be true:
a The value must be too small (a negative value of large magnitude or negative infinity), and the result of the first step is the smallest representable value of type int or long.
b The value must be too large (a positive value of large magnitude or positive infinity), and the result of the first step is the largest representable value of type int or long.
In the second step:
If T is int or long, the result of the conversion is the result of the first step.
If T is byte, char, or short, the result of the conversion is the result of a narrowing conversion to type T (§5.1.3) of the result of the first step.
So the result will be the largest representable value of type int, in this case.
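The second step only matters for byte, short, and char targets. A minimal sketch of what it does, with illustrative names: the double is first converted to an int (with clamping), and that int is then narrowed by discarding the high bits.

```java
// Illustrative sketch of the two-step conversion for byte, short, and char.
final class SecondStepDemo {
    static byte toByte(double d)   { return (byte) d; }
    static short toShort(double d) { return (short) d; }
    static char toChar(double d)   { return (char) d; }

    public static void main(String[] args) {
        // Step 1: 300.7 -> int 300. Step 2: low 8 bits of 300 (0x12C) -> 0x2C = 44.
        System.out.println(toByte(300.7));      // 44
        // Step 1: 1e10 -> Integer.MAX_VALUE (0x7FFFFFFF). Step 2: low bits are all ones.
        System.out.println(toByte(1e10));       // -1
        System.out.println(toShort(1e10));      // -1
        System.out.println((int) toChar(1e10)); // 65535
        // Step 1: -1e10 -> Integer.MIN_VALUE (0x80000000). Step 2: low 8 bits are zero.
        System.out.println(toByte(-1e10));      // 0
    }
}
```

So for the smaller types an out-of-range double does not end up at the target type's own maximum or minimum.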
I am facing issues while converting from double to long:
double power = Math.pow(2, 63);
long powerInLong = (long) power;
The above code returns:
9223372036854775807
while it should return:
-9223372036854775808
I am confused about why this is happening.
Thanks in advance!
You are doing a conversion from double to long, which is a narrowing primitive conversion.
This conversion is clearly specified in the JLS §5.1.3 (emphasis mine):
A narrowing conversion of a floating-point number to an integral type T takes two steps:
In the first step, the floating-point number is converted either to a long, if T is long, or to an int, if T is byte, short, char, or int, as follows:
If the floating-point number is NaN (§4.2.3), the result of the first step of the conversion is an int or long 0.
Otherwise, if the floating-point number is not an infinity, the floating-point value is rounded to an integer value V, rounding toward zero using IEEE 754 round-toward-zero mode (§4.2.3). Then there are two cases:
a. If T is long, and this integer value can be represented as a long, then the result of the first step is the long value V.
b. Otherwise, if this integer value can be represented as an int, then the result of the first step is the int value V.
Otherwise, one of the following two cases must be true:
a. The value must be too small (a negative value of large magnitude or negative infinity), and the result of the first step is the smallest representable value of type int or long.
b. The value must be too large (a positive value of large magnitude or positive infinity), and the result of the first step is the largest representable value of type int or long.
Because 2^63 exceeds the range of long (but not the range of double), converting it to a long will result in the largest long. This can be demonstrated also by converting Math.pow(2, 64) to long, which results in the same long.
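A short sketch of this clamping for long follows. The names are illustrative. The wrapToLong helper is not part of the answer above: it is one possible way, assuming a finite input, to get the wrap-around result the question expected, by going through BigDecimal and BigInteger, whose longValue() keeps only the low 64 bits.

```java
import java.math.BigDecimal;

// Illustrative sketch: the cast clamps; a BigInteger detour wraps instead.
final class LongClampDemo {
    // The plain cast follows JLS 5.1.3 and clamps to the long range.
    static long clampToLong(double d) {
        return (long) d;
    }

    // Assumption: d is finite (new BigDecimal(double) throws NumberFormatException
    // for NaN and infinities). BigInteger.longValue() keeps the low 64 bits.
    static long wrapToLong(double d) {
        return new BigDecimal(d).toBigInteger().longValue();
    }

    public static void main(String[] args) {
        System.out.println(clampToLong(Math.pow(2, 63)));  // 9223372036854775807
        System.out.println(clampToLong(Math.pow(2, 64)));  // 9223372036854775807
        System.out.println(clampToLong(-Math.pow(2, 63))); // -9223372036854775808
        System.out.println(wrapToLong(Math.pow(2, 63)));   // -9223372036854775808
    }
}
```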
I am trying to convert a double value of 2147483648 to an int. After casting it, I get 2147483647 as output; the number is reduced by 1. I know that this is happening because of overflow, but is there a way to convert it to an integer type without losing precision?
As mentioned in other answers, you can use the long primitive data type or the BigInteger reference type. Using integral data types rather than floating-point types is better for representing integers exactly.
As a side note, since Java SE 8, one can use the Integer class to use int for unsigned arithmetic. Per the Oracle doc - "Use the Integer class to use int data type as an unsigned integer". Unsigned integers have a greater maximum value than int. This question can be of use: Declaring an unsigned int in Java.
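A minimal sketch of the unsigned helpers on Integer (available since Java 8) is shown below. The class name is illustrative. The int still holds 32 bits; only the helper methods interpret those bits as unsigned.

```java
// Illustrative sketch of treating an int as unsigned via Integer's static helpers.
final class UnsignedIntDemo {
    static int parse(String s) {
        return Integer.parseUnsignedInt(s); // accepts values up to 4294967295
    }

    public static void main(String[] args) {
        int bits = parse("2147483648");
        System.out.println(bits);                             // -2147483648 (signed view)
        System.out.println(Integer.toUnsignedString(bits));   // 2147483648
        System.out.println(Integer.toUnsignedLong(bits));     // 2147483648
        System.out.println(Integer.divideUnsigned(bits, 2));  // 1073741824
    }
}
```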
This is my first answer, hope you solved your problem! :)
I can think of 2 options.
Long
Unlike int, which is a 32-bit signed integer data type, long is a 64-bit signed integer data type. This means that the largest value it can store is 9223372036854775807. The number that you are trying to store, 2147483648, is well inside that range.
BigInteger
This is a reference type that represents an "immutable arbitrary-precision integer". You can create a BigInteger instance that represents 2147483648 by doing
new BigInteger("2147483648")
Learn more about it here: https://docs.oracle.com/javase/7/docs/api/java/math/BigInteger.html
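If the value starts out as a double rather than as text, one possible route to a BigInteger is through BigDecimal, sketched below. The class and method names are illustrative, and the helper assumes a finite input.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Illustrative sketch: converting a double to BigInteger without clamping.
final class BigIntegerDemo {
    // Assumption: d is finite. new BigDecimal(double) is exact but throws
    // NumberFormatException for NaN and infinities. toBigInteger() drops any fraction.
    static BigInteger toBigInteger(double d) {
        return new BigDecimal(d).toBigInteger();
    }

    public static void main(String[] args) {
        BigInteger fromText = new BigInteger("2147483648");
        BigInteger fromDouble = toBigInteger(2147483648.0);
        System.out.println(fromText.equals(fromDouble));    // true
        System.out.println(fromDouble.add(BigInteger.ONE)); // 2147483649
    }
}
```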
Max Values
The maximum integer, as indicated in the Integer class by the static final int MAX_VALUE, is 2^31-1 (2,147,483,647). This value is the maximum integer as it is the largest 32-bit signed integer.
The maximum double, as indicated in the Double class by the static final double MAX_VALUE, is (2 - 2^(-52)) * 2^1023. The double data type follows the double-precision 64-bit IEEE 754 floating-point format to express a wide dynamic range of numerical values.
Narrowing Primitive Conversion
In your observation, you have a double with the value 2,147,483,648, which you attempt to convert to an int by casting.
double d = 2147483648d; // the d suffix is required: 2147483648 is too large for an int literal
int i = (int) d;
Casting between primitive types allows you to convert the value of one primitive type to another. Converting from a double to an int is known as a Narrowing Primitive Conversion, wherein you:
"may lose information about the overall magnitude of a numeric value and may also lose precision and range."
The narrowing conversion of a floating-point number to an int is as follows:
If the floating-point number is NaN, the result is an int of 0.
Otherwise, if the floating-point number is not an infinity, the floating-point number is rounded to an integer value V, rounding toward zero using IEEE 754 round-toward-zero mode. If this integer value can be represented as an int, then the result is V.
Otherwise, one of the following two cases must be true: a. The value must be too small and the result is the smallest representable value of type int or b. The value must be too large and the result is the largest representable value of type int.
Because the integer value of the double is larger than Integer.MAX_VALUE, the cast yields Integer.MAX_VALUE, the largest value representable as an int.
Avoid Loss
To avoid the loss of precision, convert to a primitive type or numeric object whose maximum value is at least 2147483648 (and whose resolution keeps the value exact).
The long primitive type has a maximum value of 2^63-1 (9.223372e+18), which would be a good choice if you want to use numbers within the numeric integer space. Note that while the Long.MAX_VALUE is very large, Double.MAX_VALUE is MUCH larger due to the floating-point format.
double d = 2147483648d; // the d suffix is required: 2147483648 is too large for an int literal
long i = (long) d;
Converting from a double to an int will lose precision for numbers greater than Integer.MAX_VALUE or less than Integer.MIN_VALUE. There is no way to represent numbers outside that range as an int. It is mathematically impossible.
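When silent clamping is not acceptable, one option is to reject such values explicitly. The helper below is a hypothetical sketch, not a standard library method; it throws when the double is NaN, has a fractional part, or lies outside the int range.

```java
// Illustrative sketch of a checked double-to-int conversion (hypothetical helper).
final class CheckedNarrowing {
    static int toIntExact(double d) {
        // NaN fails every comparison, so it is tested explicitly.
        // Math.rint(d) != d detects a fractional part.
        if (Double.isNaN(d)
                || d < Integer.MIN_VALUE
                || d > Integer.MAX_VALUE
                || d != Math.rint(d)) {
            throw new ArithmeticException("not exactly representable as int: " + d);
        }
        return (int) d;
    }

    public static void main(String[] args) {
        System.out.println(toIntExact(2147483647.0)); // 2147483647
        try {
            toIntExact(2147483648.0);
        } catch (ArithmeticException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```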
Converting from a double to a long will also lose precision. This will occur for all values outside of Long.MIN_VALUE through Long.MAX_VALUE. But a second problem is that double itself is not able to represent all integers in that range, so there can be loss of precision before the conversion (see note 1).
Moral:
Don't use floating point numbers if you need to represent integers precisely. Use an integral type (byte, short, char, int or long) ... or BigInteger.
1 - A double is a 64-bit IEEE 754 floating-point number with a 52-bit stored mantissa (53 bits of precision counting the implicit leading bit), an 11-bit exponent, and a sign bit. By contrast, a long has 64 bits of precision (including sign).
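The point at which a double stops representing every integer can be probed with a round trip through double, sketched here with illustrative names. Above 2^53, not every long survives.

```java
// Illustrative sketch: integers above 2^53 are not all representable as double.
final class DoublePrecisionLimit {
    static boolean roundTrips(long v) {
        return (long) (double) v == v;
    }

    public static void main(String[] args) {
        long limit = 1L << 53;                       // 9007199254740992
        System.out.println(roundTrips(limit));       // true
        System.out.println(roundTrips(limit + 1));   // false
        System.out.println((long) (double) (limit + 1)); // 9007199254740992
        System.out.println(roundTrips(limit + 2));   // true (even, still representable)
    }
}
```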
I'm getting some strange results doing a calculation for the application I'm working on and I thought someone on here might be able to help figure out what's going on.
The requirements for this particular calculation state that the calculation should look like this:
A and B are known
A * B = C
For this particular calculation
A = 0.0410
B = 123456789010
Here are the results I'm seeing:
Calculator:
0.0410 * 123456789010 = 5061728349.41
Java:
B is a double:
0.0410f * 123456789010d = 5.061728489223363E9 = 5061728489.223363
B is a long:
0.0410f * 123456789010l = 5.0617288E9
The loss of precision is of less importance to me (I only need 9 digits of precision anyway) than the difference in the 10s and 1s spot. Why does doing the calculation using the double give me the "wrong" result?
Incidentally, I tried doing the calculation using BigDecimal and got the same result as I did using a double.
The various type conversions that happen are specified by the JLS #5.6.2. In your case (extract):
If either operand is of type double, the other is converted to double.
Otherwise, if either operand is of type float, the other is converted to float.
In 0.0410f * 123456789010d = 5061728489.223363, 0.0410f is first converted to a double, which is not necessarily equal to 0.0410d. You can try it and see that it is not:
double d1 = 0.041d;
double d2 = 0.041f;
System.out.println(new BigDecimal(d1));
System.out.println(new BigDecimal(d2));
outputs:
0.041000000000000001720845688168992637656629085540771484375
0.041000001132488250732421875
In your next example:
0.0410f * 123456789010L = 5.0617288E9
the long is converted to a float, which you can verify with this example:
float f1 = 0.0410f;
float f2 = 123456789010L;
System.out.println(new BigDecimal(f1)); // 0.041000001132488250732421875
System.out.println(new BigDecimal(f2)); // 123456790528
System.out.println(new BigDecimal(0.0410f * 123456789010L)); // 5061728768
System.out.println(new BigDecimal(f1 * f2)); // 5061728768
As for the precision of float / double operations in general, check this question.
Finally, if you use a BigDecimal, you get the correct answer:
BigDecimal a = new BigDecimal("0.041");
BigDecimal b = new BigDecimal("123456789010");
System.out.println(a.multiply(b)); // outputs 5061728349.410
TLDR answer: a float cannot represent the 'correct' answer any more exactly. Use a double instead. Note that without an explicit cast, the multiplication itself is also carried out in float precision.
Answers I get using http://www.ideone.com
A       B     C (type)  C (value)
float   long  float     5061728768.000000
double  long  double    5061728489.223363
The problem is that the precision of a float is much less than that of a double, so when it is multiplied by a large number (e.g. your B value, roughly 10^11) the error is scaled up with it. If we explicitly cast A to a double for the multiplication:
double C = ((double)A)*B; //=5061728489.223363
Then we get back the additional precision. If we cast the double answer back to a float:
float C = (float)((double)((double)A)*B); //=5061728256.000000
You see that the answer is different again. The result type of the multiply is used, so in this instance double, but the cast back to float drops precision. Without an explicit cast to double (double C=A*B), the float type is used for the multiplication. With both casts, the multiply is done as a double, and the precision is lost after the multiplication.
The first calculation is using double (64 bits), the second float (32 bits). What you are seeing is "rounding errors".
In both cases it is a floating-point calculation, but in the second case, no "double" arguments are involved, so it just uses 32 bit arithmetic.
Quoting the Java language spec:
If at least one of the operands to a binary operator is of floating-point type, then the operation is a floating-point operation, even if the other is integral.
If at least one of the operands to a numerical operator is of type double, then the operation is carried out using 64-bit floating-point arithmetic, and the result of the numerical operator is a value of type double. If the other operand is not a double, it is first widened (§5.1.5) to type double by numeric promotion (§5.6).
Otherwise, the operation is carried out using 32-bit floating-point arithmetic, and the result of the numerical operator is a value of type float. (If the other operand is not a float, it is first widened to type float by numeric promotion.)
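The promotion rules quoted above can be observed by boxing the result of each multiplication and asking for its runtime class. This is only an illustrative sketch; the class and method names are made up, and the values are the ones from the question.

```java
// Illustrative sketch: the static type of a product depends on its operands.
final class PromotionDemo {
    // Boxing turns a float result into Float and a double result into Double.
    static String typeOf(Object boxed) {
        return boxed.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        float a = 0.0410f;
        long bLong = 123456789010L;
        double bDouble = 123456789010d;

        System.out.println(typeOf(a * bLong));          // Float  (long widened to float)
        System.out.println(typeOf(a * bDouble));        // Double (float widened to double)
        System.out.println(typeOf((double) a * bLong)); // Double (long widened to double)
    }
}
```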
The answer to your question is probably in the Floating point operation section of the Java Language Specification and in this older post. You are probably experiencing rounding errors due to the implicit conversion that is occurring.
The quote that applies to your situation is
Third operation:
If at least one of the operands to a binary operator is of floating-point type, then the operation is a floating-point operation, even if the other is integral.
Second operation:
If at least one of the operands to a numerical operator is of type double, then the operation is carried out using 64-bit floating-point arithmetic, and the result of the numerical operator is a value of type double. If the other operand is not a double, it is first widened (§5.1.5) to type double by numeric promotion (§5.6).
First operation:
Otherwise, the operation is carried out using 32-bit floating-point arithmetic, and the result of the numerical operator is a value of type float. (If the other operand is not a float, it is first widened to type float by numeric promotion.)
Hence, you should not be worried, but decide what is the precision you desire and use the appropriate casting, if necessary.
32-bit IEEE floating-point numbers have about seven decimal digits of precision; 64-bit ones have about 16. That's all you get. If neither of those is sufficient, you have to use BigDecimal.
This is true in every language that implements the IEEE standard, not just Java.
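A quick way to see the digit limits is to push a nine-digit integer through each type and back, as in the sketch below. The names are illustrative.

```java
// Illustrative sketch: a nine-digit integer survives double but not float.
final class PrecisionDemo {
    static long viaFloat(long v)  { return (long) (float) v; }
    static long viaDouble(long v) { return (long) (double) v; }

    public static void main(String[] args) {
        System.out.println(viaFloat(123456789L));  // 123456792 (nearest float)
        System.out.println(viaDouble(123456789L)); // 123456789 (exact)
    }
}
```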