Why is converting a long (64) to float (32) considered widening?

The Oracle documentation states:
Widening Primitive Conversion
19 specific conversions on primitive types are called the widening primitive conversions:
byte to short, int, long, float, or double
short to int, long, float, or double
char to int, long, float, or double
int to long, float, or double
long to float or double?
float to double
If a float has 32 bits and a long has 64 how is that considered widening? Shouldn't this be considered narrowing?

The range of values that can be represented by a float or double is much larger than the range that can be represented by a long. Although one might lose significant digits when converting from a long to a float, it is still a "widening" operation because the range is wider.
From the Java Language Specification, §5.1.2:
A widening conversion of an int or a long value to float, or of a long value to double, may result in loss of precision - that is, the result may lose some of the least significant bits of the value. In this case, the resulting floating-point value will be a correctly rounded version of the integer value, using IEEE 754 round-to-nearest mode (§4.2.4).
Note that a double can exactly represent every possible int value.
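The rounding described in the quote can be observed directly. The following is only a small sketch; the class and method names are made up for illustration. It widens a value and converts it back so the lost low-order bits become visible.

```java
final class WideningPrecisionDemo {
    // Widens a long to float (no cast needed) and converts back for comparison.
    static long roundTripThroughFloat(long value) {
        float widened = value;   // implicit widening primitive conversion
        return (long) widened;   // explicit narrowing back to long
    }

    // Widens an int to double and back; every int survives this unchanged.
    static int roundTripThroughDouble(int value) {
        double widened = value;
        return (int) widened;
    }

    public static void main(String[] args) {
        // 2^24 + 1 needs 25 significant bits; a float keeps only 24.
        System.out.println(roundTripThroughFloat(16_777_217L));        // 16777216
        System.out.println(roundTripThroughFloat(123_456_789L));       // 123456792
        System.out.println(roundTripThroughDouble(Integer.MAX_VALUE)); // 2147483647
    }
}
```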

It is considered widening because the range of numbers that can be represented by a float is larger than the range that can be represented by a long. Just because float uses 32 bits does not mean the numbers it can represent are limited to 2^32.
For instance, the float (float)Long.MAX_VALUE+(float)Long.MAX_VALUE is larger than Long.MAX_VALUE, even though the float has less precision than the long.
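A quick way to check that claim is sketched below (the class and method names are illustrative only). The sum is still a finite float, which is the sense in which the target type is "wider".

```java
final class FloatRangeDemo {
    // Adds Long.MAX_VALUE to itself in float arithmetic.
    static float doubledMax() {
        float max = Long.MAX_VALUE;  // rounds to 2^63, the nearest float
        return max + max;            // 2^64, far beyond any long
    }

    public static void main(String[] args) {
        System.out.println(doubledMax() > Long.MAX_VALUE);     // true
        System.out.println(Float.isInfinite(doubledMax()));    // false
        System.out.println(Float.MAX_VALUE > Long.MAX_VALUE);  // true
    }
}
```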

It's considered widening because float and double can represent larger values than long. You may lose precision, but it will be possible to represent the value (at least approximately).

If you look at the matter in simple terms, it is about how the data is represented by the original designers.
Admittedly, the bit depth of long (64) is larger than that of float (32). But float data is represented using scientific notation,
which allows it to represent a considerably larger range.
Ex: 300 [original number] : 3×10^2 [scientific representation]
Long : -2^63 to 2^63-1 (about ±9.2×10^18)
Float : (-3.4)*10^38 to (3.4)*10^38
Notice the difference in representation here: a long spends all of its bits on digits, while a float spends some of its bits on an exponent, which is what allows float to have the higher range.
hope this is helpful
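For reference, an IEEE 754 float stores a binary exponent rather than a decimal one: 1 sign bit, 8 exponent bits and 23 fraction bits. The sketch below (helper names are hypothetical, and it only handles normal, non-zero values) pulls those fields apart for the number 300, which is 1.171875 × 2^8.

```java
final class FloatLayoutDemo {
    // Sign bit: 0 for positive, 1 for negative.
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    // Exponent field with the bias of 127 removed (valid for normal values only).
    static int unbiasedExponent(float f) {
        return ((Float.floatToIntBits(f) >>> 23) & 0xFF) - 127;
    }

    // The 23 stored fraction bits (the leading 1 is implicit).
    static int fraction(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        System.out.println(sign(300f));                        // 0
        System.out.println(unbiasedExponent(300f));            // 8
        System.out.println(fraction(300f));                    // 1441792, i.e. 0.171875 * 2^23
        System.out.println(unbiasedExponent(Float.MAX_VALUE)); // 127
    }
}
```

An exponent of up to 127 is what lets a 32-bit float reach about 3.4×10^38, at the cost of keeping only 24 significant bits.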

Related

Type Conversion in Java Double to Int

I am trying to convert a double value of 2147483648 to integer. After typecasting it, I get output as 2147483647; the number is reduced by 1. I know that this is happening because of overflow, but is there a way I can convert it to int type without losing its precision?
As aforementioned in previous answers, you can use the long primitive data type, or the BigInteger reference type. Using integral data types rather than floating points would be better for representing integers exactly.
As a side note, since Java SE 8, one can use the Integer class to use int for unsigned arithmetic. Per the Oracle doc - "Use the Integer class to use int data type as an unsigned integer". Unsigned integers have a greater maximum value than int. This question can be of use: Declaring an unsigned int in Java.
This is my first answer, hope you solved your problem! :)
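As a rough illustration of the unsigned approach mentioned above, the Integer class (Java 8 and later) has static helpers that treat the 32 bits of an int as an unsigned value. The class name below is made up; the Integer methods are the standard ones.

```java
final class UnsignedIntDemo {
    // Parses a decimal string in the range 0 .. 2^32-1 into the 32 bits of an int.
    static int parse(String text) {
        return Integer.parseUnsignedInt(text);
    }

    public static void main(String[] args) {
        int bits = parse("2147483648");
        System.out.println(bits);                           // -2147483648 when read as signed
        System.out.println(Integer.toUnsignedString(bits)); // 2147483648
        System.out.println(Integer.toUnsignedLong(bits));   // 2147483648
    }
}
```

The variable is still an ordinary int, so plain comparison, division and printing treat it as signed unless the unsigned helper methods are used.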
I can think of 2 options.
Long
Unlike int, which is a 32-bit signed integer data type, long is a 64-bit signed integer data type. This means that the largest value it can store is 9223372036854775807. The number that you are trying to store, 2147483648, is well inside that range.
BigInteger
This is a reference type that represents an "immutable arbitrary-precision integer". You can create a BigInteger instance that represents 2147483648 by doing
new BigInteger("2147483648")
Learn more about it here: https://docs.oracle.com/javase/7/docs/api/java/math/BigInteger.html
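Both options can be sketched in a few lines. The class and helper names here are hypothetical; the BigInteger calls are from the standard library.

```java
import java.math.BigInteger;

final class BeyondIntDemo {
    // True when the value fits in a 32-bit signed int (bitLength excludes the sign bit).
    static boolean fitsInInt(BigInteger value) {
        return value.bitLength() <= 31;
    }

    // The first integer past Integer.MAX_VALUE, built without any floating point.
    static BigInteger firstPastIntMax() {
        return BigInteger.valueOf(Integer.MAX_VALUE).add(BigInteger.ONE);
    }

    public static void main(String[] args) {
        long asLong = 2147483648L;  // the L suffix makes this a long literal
        BigInteger asBig = new BigInteger("2147483648");
        System.out.println(asBig.equals(firstPastIntMax()));   // true
        System.out.println(asBig.longValueExact() == asLong);  // true
        System.out.println(fitsInInt(asBig));                  // false
    }
}
```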
Max Values
The maximum integer, as indicated in the Integer class by the static final int MAX_VALUE, is 2^31-1 (2,147,483,647). This value is the maximum integer as it is the largest 32-bit signed integer.
The maximum double, as indicated in the Double class by the static final double MAX_VALUE, is (2-2^(-52))·2^1023 (about 1.8×10^308). The double data type follows the double-precision 64-bit IEEE 754 floating point format to express a wide range of dynamic numerical values.
Narrowing Primitive Conversion
In your observation, you have a double with the value 2,147,483,648, which you attempt to convert to an integer by type casting.
double d = 2147483648.0; // a bare 2147483648 is an out-of-range int literal and does not compile
int i = (int) d;
Casting between primitive types allows you to convert the value of one primitive type to another. Converting from a double to an integer is known as a Narrowing Primitive Conversion, wherein you:
"may lose information about the overall magnitude of a numeric value and may also lose precision and range."
The narrowing conversion of a floating-point number to an int is as follows:
If the floating-point number is NaN, the result is an int of 0.
Otherwise, if the floating-point number is not an infinity, the floating-point number is rounded to an integer value V, rounding toward zero using IEEE 754 round-toward-zero mode. If this integer value can be represented as an int, then the result is V.
Otherwise, one of the following two cases must be true: a. The value must be too small and the result is the smallest representable value of type int or b. The value must be too large and the result is the largest representable value of type int.
Because the integer value of the double (2147483648) is larger than Integer.MAX_VALUE, the cast falls under case b, and the result is Integer.MAX_VALUE (2147483647).
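The three rules can be exercised with a small sketch like the following (the class and method names are only for illustration):

```java
final class SaturatingCastDemo {
    // Plain narrowing cast from double to int, as described by the rules above.
    static int toInt(double d) {
        return (int) d;
    }

    public static void main(String[] args) {
        System.out.println(toInt(Double.NaN));               // 0
        System.out.println(toInt(-2.9));                     // -2 (rounded toward zero)
        System.out.println(toInt(2147483648.0));             // 2147483647 (too large)
        System.out.println(toInt(-1e20));                    // -2147483648 (too small)
        System.out.println(toInt(Double.POSITIVE_INFINITY)); // 2147483647
    }
}
```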
Avoid Loss
To avoid the loss of precision you will need to either cast to a primitive type or a numeric Object wherein the maximum value is greater than 2147483648 (and resolution allows for accuracy to be maintained).
The long primitive type has a maximum value of 2^63-1 (9.223372e+18), which would be a good choice if you want to use numbers within the numeric integer space. Note that while the Long.MAX_VALUE is very large, Double.MAX_VALUE is MUCH larger due to the floating-point format.
double d = 2147483648.0;
long i = (long) d;
Converting from a double to an int will lose precision for numbers greater than Integer.MAX_VALUE or less than Integer.MIN_VALUE. There is no way to represent numbers outside that range as an int. It is mathematically impossible.
Converting from a double to a long will also lose precision. This will occur for all values outside of Long.MIN_VALUE through Long.MAX_VALUE. But a second problem is that double itself is not able to represent all integers in that range ... so there will be loss of precision before the conversion [1].
Moral:
Don't use floating point numbers if you need to represent integers precisely. Use an integral type (byte, short, char, int or long) ... or BigInteger.
[1] A double is a 64-bit IEEE floating point number, which has 53 bits of precision (52 of them stored explicitly) plus a sign bit. By contrast, a long has 64 bits of precision (including sign).
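The gap between what a long and a double can hold shows up just past 2^53. A minimal sketch, with an illustrative class name:

```java
final class DoubleGapDemo {
    // Converts a long to double and back again.
    static long roundTrip(long value) {
        return (long) (double) value;
    }

    public static void main(String[] args) {
        long limit = 1L << 53;                     // 9007199254740992
        System.out.println(roundTrip(limit));      // 9007199254740992 (exact)
        System.out.println(roundTrip(limit + 1));  // 9007199254740992 (the +1 is lost)
        System.out.println(roundTrip(limit + 2));  // 9007199254740994 (exact again)
    }
}
```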

Java implicit conversion

With the following code:
Float a = 1.2;
there is an error because it takes the decimal as double value and double is a bigger datatype than float.
Now, it takes integer as default int type. So, why is the following code not giving any error?
Byte b = 20;
The compiler is smart enough to figure out that the bit representation of 20 (an int value) can fit into a byte with no loss of data. From the Java Language Specification §5.1.3:
A narrowing primitive conversion from double to float is governed by the IEEE 754 rounding rules (§4.2.4). This conversion can lose precision, but also lose range, resulting in a float zero from a nonzero double and a float infinity from a finite double. A double NaN is converted to a float NaN and a double infinity is converted to the same-signed float infinity.
A narrowing conversion of a signed integer to an integral type T simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the sign of the resulting value to differ from the sign of the input value.
See also this thread.
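Both quoted rules are easy to observe with explicit casts. This is just an illustrative sketch with made-up names:

```java
final class BitDiscardDemo {
    // Integral narrowing: keeps only the low 8 bits of the int.
    static byte narrow(int value) {
        return (byte) value;
    }

    // Floating-point narrowing: IEEE 754 rounding from double to float.
    static float narrow(double value) {
        return (float) value;
    }

    public static void main(String[] args) {
        System.out.println(narrow(20));     // 20
        System.out.println(narrow(200));    // -56 (sign changed)
        System.out.println(narrow(300));    // 44  (magnitude lost)
        System.out.println(narrow(1e40));   // Infinity (finite double, infinite float)
        System.out.println(narrow(1e-50));  // 0.0 (nonzero double, zero float)
    }
}
```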
There are no implicit narrowing conversions in general - constant expressions are the only exception, and they are explicitly allowed by JLS 5.2:
In addition, if the expression is a constant expression (§15.28) of type byte, short, char, or int:
* A narrowing primitive conversion may be used if the type of the variable is byte, short, or char, and the value of the constant expression is representable in the type of the variable.
There is no mention of implicit narrowing conversions being allowed for floating point numbers, so they are forbidden as per the general rule.
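To make the constant-expression exception concrete, here is a short sketch (names are hypothetical). The commented-out lines are the ones a compiler would be expected to reject.

```java
final class ConstantNarrowingDemo {
    static final int SMALL = 20;  // a compile-time constant of type int

    // Returning a constant that fits in a byte needs no cast.
    static byte fromConstant() {
        return SMALL;
    }

    // A value only known at run time always needs the explicit cast.
    static byte fromRuntime(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        byte a = 20;       // int literal representable as byte: allowed
        Byte boxed = 20;   // narrowing followed by boxing: allowed
        float f = 1.2f;    // float literal: allowed
        // byte tooBig = 200;   // constant, but not representable in byte
        // float g = 1.2;       // double constant: no implicit narrowing
        System.out.println(a + " " + boxed + " " + f);              // 20 20 1.2
        System.out.println(fromConstant() + " " + fromRuntime(20)); // 20 20
    }
}
```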

How does double to int cast work in Java

I am new to Java and wondering how the double to int cast works. I understand that long to int is simple (it takes the low 32 bits), but what about double (64 bits) to int (32 bits)? The 64 bits of a double are in double-precision floating-point format (with a mantissa), so how is it converted to int internally?
It's all documented in section 5.1.3 of the JLS.
In the first step, the floating-point number is converted either to a long, if T is long, or to an int, if T is byte, short, char, or int, as follows:
If the floating-point number is NaN (§4.2.3), the result of the first step of the conversion is an int or long 0.
Otherwise, if the floating-point number is not an infinity, the floating-point value is rounded to an integer value V, rounding toward zero using IEEE 754 round-toward-zero mode (§4.2.3). Then there are two cases:
If T is long, and this integer value can be represented as a long, then the result of the first step is the long value V.
Otherwise, if this integer value can be represented as an int, then the result of the first step is the int value V.
Otherwise, one of the following two cases must be true:
The value must be too small (a negative value of large magnitude or negative infinity), and the result of the first step is the smallest representable value of type int or long.
The value must be too large (a positive value of large magnitude or positive infinity), and the result of the first step is the largest representable value of type int or long.
(The second step here is irrelevant, when T is int.)
In most cases I'd expect this to be implemented using hardware support - converting floating point numbers to integers is something which is usually handled by CPUs.
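The second step only matters for targets smaller than int. A small sketch of both steps, using illustrative names:

```java
final class TwoStepNarrowingDemo {
    // double -> byte: first to int (step 1), then the low 8 bits (step 2).
    static byte toByte(double d) {
        return (byte) d;
    }

    // double -> char: first to int, then the low 16 bits as an unsigned value.
    static char toChar(double d) {
        return (char) d;
    }

    // double -> long: step 1 only, saturating at the long limits.
    static long toLong(double d) {
        return (long) d;
    }

    public static void main(String[] args) {
        System.out.println(toByte(300.7));       // 44: int 300, then low 8 bits
        System.out.println(toByte(1e10));        // -1: saturates to 0x7FFFFFFF, then 0xFF
        System.out.println((int) toChar(-1.0));  // 65535: int -1, then low 16 bits
        System.out.println(toLong(1e30));        // 9223372036854775807
    }
}
```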
Java truncates its value if you use (int) cast, as you may notice:
double d = 2.4d;
int i = (int) d;
System.out.println(i);
d = 2.6;
i = (int) d;
System.out.println(i);
Output:
2
2
Unless you use Math.round, Math.ceil, Math.floor...
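For comparison, here is how the cast and the Math methods differ on the same inputs (a sketch; the class name is made up):

```java
final class RoundingDemo {
    // The (int) cast simply drops the fractional part.
    static int truncate(double d) {
        return (int) d;
    }

    public static void main(String[] args) {
        System.out.println(truncate(2.6));     // 2
        System.out.println(truncate(-2.5));    // -2 (toward zero)
        System.out.println(Math.floor(-2.5));  // -3.0 (toward negative infinity)
        System.out.println(Math.ceil(2.4));    // 3.0 (toward positive infinity)
        System.out.println(Math.round(2.6));   // 3 (nearest, returned as a long)
        System.out.println(Math.round(-2.5));  // -2 (ties go toward positive infinity)
        System.out.println(Math.rint(2.5));    // 2.0 (ties go to the even neighbour)
    }
}
```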
You may want to read the Java specification:
http://docs.oracle.com/javase/specs/jls/se7/html/jls-5.html#jls-5.1.3
The relevant section is 5.1.3 - "Narrowing Primitive Conversion".

Math in Java - different results with different objects

I'm getting some strange results doing a calculation for the application I'm working on and I thought someone on here might be able to help figure out what's going on.
The requirements for this particular calculation state that the calculation should look like this:
A and B are known
A * B = C
For this particular calculation
A = 0.0410
B = 123456789010
Here are the results I'm seeing:
Calculator:
0.0410 * 123456789010 = 5061728349.41
Java:
B is a double:
0.0410f * 123456789010d = 5.061728489223363E9 = 5061728489.223363
B is a long:
0.0410f * 123456789010l = 5.0617288E9
The loss of precision is of less importance to me (I only need 9 digits of precision anyway) than the difference in the 10s and 1s spot. Why does doing the calculation using the double give me the "wrong" result?
Incidentally, I tried doing the calculation using BigDecimal and got the same result as I did using a double.
The various type conversions that happen are specified by the JLS #5.6.2. In your case (extract):
If either operand is of type double, the other is converted to double.
Otherwise, if either operand is of type float, the other is converted to float.
In 0.0410f * 123456789010d = 5061728489.223363, 0.0410f is first converted to a double, which is not necessarily equal to 0.0410d. Actually you can try it and see that it is not:
double d1 = 0.041d;
double d2 = 0.041f;
System.out.println(new BigDecimal(d1));
System.out.println(new BigDecimal(d2));
outputs:
0.041000000000000001720845688168992637656629085540771484375
0.041000001132488250732421875
In your next example:
0.0410f * 123456789010L = 5.0617288E9
the long is converted to a float, which you can verify with this example:
float f1 = 0.0410f;
float f2 = 123456789010L;
System.out.println(new BigDecimal(f1)); // 0.041000001132488250732421875
System.out.println(new BigDecimal(f2)); // 123456790528
System.out.println(new BigDecimal(0.0410f * 123456789010L)); // 5061728768
System.out.println(new BigDecimal(f1 * f2)); // 5061728768
As for the precision of float / double operations in general, check this question.
Finally, if you use a BigDecimal, you get the correct answer:
BigDecimal a = new BigDecimal("0.041");
BigDecimal b = new BigDecimal("123456789010");
System.out.println(a.multiply(b)); // outputs 5061728349.410
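The question mentions getting the double result even with BigDecimal. One plausible cause (an assumption, since that code is not shown) is constructing the BigDecimal from a float or double variable rather than from a decimal string. The sketch below contrasts the two; the class and method names are illustrative.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class BigDecimalInputDemo {
    // Built from a binary float: the representation error is carried into the BigDecimal.
    static BigDecimal fromBinaryFloat() {
        return new BigDecimal(0.0410f).multiply(new BigDecimal(123456789010L));
    }

    // Built from decimal text: the inputs are exactly 0.0410 and 123456789010.
    static BigDecimal fromDecimalText() {
        return new BigDecimal("0.0410").multiply(new BigDecimal("123456789010"));
    }

    // Rounds to two decimal places for display.
    static String twoPlaces(BigDecimal value) {
        return value.setScale(2, RoundingMode.HALF_UP).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(twoPlaces(fromBinaryFloat()));  // 5061728489.22
        System.out.println(twoPlaces(fromDecimalText()));  // 5061728349.41
    }
}
```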
TL;DR: A float cannot represent the 'correct' answer any more exactly; use a double instead. Without an explicit cast, the multiplication itself is also carried out in float precision.
Answers I get using http://www.ideone.com
A type   B type   C type   C value
float    long     float    5061728768.000000
double   long     double   5061728489.223363
The problem is that the precision of a float is much less than a double, so when multiplied up by a large number (e.g. your 10^10 value) you lose this precision in the multiplication. If we explicitly cast A to a double for the multiplication:
double C = ((double)A)*B; //=5061728489.223363
Then we get back the additional precision. If we cast the double answer back to a float:
float C = (float)((double)((double)A)*B); //=5061728256.000000
You see that the answer is different again. The result type of the multiply is used, so in this instance double, but the cast back to float drops precision. Without an explicit cast to double (double C=A*B), the float type is used. With both casts, the multiply is done as a double, and the precision is lost after the multiplication.
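Putting the three variants side by side (a sketch using the A and B from the question; the class and method names are made up):

```java
final class CastPlacementDemo {
    static final float A = 0.0410f;
    static final long B = 123456789010L;

    // No cast: B is widened to float and the product is computed in 32 bits.
    static float allFloat() {
        return A * B;
    }

    // A is cast first: B is widened to double and the product is computed in 64 bits.
    static double widenedFirst() {
        return (double) A * B;
    }

    // Computed in 64 bits, then narrowed; the precision is lost after the multiply.
    static float narrowedAfter() {
        return (float) ((double) A * B);
    }

    public static void main(String[] args) {
        System.out.printf("%.6f%n", allFloat());       // 5061728768.000000
        System.out.printf("%.6f%n", widenedFirst());   // 5061728489.223363
        System.out.printf("%.6f%n", narrowedAfter());  // 5061728256.000000
    }
}
```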
The first calculation is using double (64 bits), the second float (32 bits). What you are seeing is "rounding errors".
In both cases it is a floating-point calculation, but in the second case, no "double" arguments are involved, so it just uses 32 bit arithmetic.
Quoting the Java language spec:
If at least one of the operands to a binary operator is of floating-point type, then the operation is a floating-point operation, even if the other is integral.
If at least one of the operands to a numerical operator is of type double, then the operation is carried out using 64-bit floating-point arithmetic, and the result of the numerical operator is a value of type double. If the other operand is not a double, it is first widened (§5.1.5) to type double by numeric promotion (§5.6).
Otherwise, the operation is carried out using 32-bit floating-point arithmetic, and the result of the numerical operator is a value of type float. (If the other operand is not a float, it is first widened to type float by numeric promotion.)
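One way to see which type a mixed expression ends up with is to box the result and ask for its class. This is only a sketch; typeOf is a hypothetical helper.

```java
final class PromotionTypeDemo {
    // The argument is autoboxed, so the wrapper class reveals the expression's static type.
    static String typeOf(Object boxed) {
        return boxed.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(typeOf(0.0410f * 123456789010L)); // Float
        System.out.println(typeOf(0.0410f * 123456789010d)); // Double
        System.out.println(typeOf(3 * 4L));                  // Long
    }
}
```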
The answer to your question is probably in the Floating point operation section of the Java Language Specification and in this older post. You are probably experiencing rounding errors due to the implicit conversion that is occurring.
The quote that applies to your situation is
Third operation:
If at least one of the operands to a binary operator is of
floating-point type, then the operation is a floating-point operation,
even if the other is integral.
Second operation:
If at least one of the operands to a numerical operator is of type
double, then the operation is carried out using 64-bit floating-point
arithmetic, and the result of the numerical operator is a value of
type double. If the other operand is not a double, it is first widened
(§5.1.5) to type double by numeric promotion (§5.6).
First operation
Otherwise, the operation is carried out using 32-bit floating-point
arithmetic, and the result of the numerical operator is a value of
type float. (If the other operand is not a float, it is first widened
to type float by numeric promotion.)
Hence, you should not be worried, but decide what is the precision you desire and use the appropriate casting, if necessary.
32-bit IEEE floating point numbers have seven digits of precision; 64-bit allows 16. That's all you get. If neither of those is sufficient, you have to use BigDecimal.
This is true in every language that implements the IEEE standard, not just Java.
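Math.ulp reports the spacing between adjacent representable values, which makes the digit limits concrete for the numbers in the question. A brief sketch with illustrative names:

```java
final class UlpDemo {
    // Distance from a float to the next larger representable float.
    static float floatSpacing(float value) {
        return Math.ulp(value);
    }

    // Distance from a double to the next larger representable double.
    static double doubleSpacing(double value) {
        return Math.ulp(value);
    }

    public static void main(String[] args) {
        System.out.println(floatSpacing(123456789010f));       // 8192.0
        System.out.println(floatSpacing(5.0617288E9f));        // 512.0
        System.out.println(doubleSpacing(5061728489.223363));  // 9.5367431640625E-7
    }
}
```

With neighbouring floats 512 apart near 5×10^9, the tens and ones digits of a float result in that range carry no information.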

Why does Java implicitly (without cast) convert a `long` to a `float`?

Every time I think I understand about casting and conversions, I find another strange behavior.
long l = 123456789L;
float f = l;
System.out.println(f); // outputs 1.23456792E8
Given that a long has greater bit-depth than a float, I would expect that an explicit cast would be required in order for this to compile. And not surprisingly, we see that we have lost precision in the result.
Why is a cast not required here?
The same question could be asked of long to double - both conversions may lose information.
Section 5.1.2 of the Java Language Specification says:
Widening primitive conversions do not lose information about the overall magnitude of a numeric value. Indeed, conversions widening from an integral type to another integral type do not lose any information at all; the numeric value is preserved exactly.
Conversions widening from float to double in strictfp expressions also preserve the numeric value exactly; however, such conversions that are not strictfp may lose information about the overall magnitude of the converted value.
Conversion of an int or a long value to float, or of a long value to double, may result in loss of precision - that is, the result may lose some of the least significant bits of the value. In this case, the resulting floating-point value will be a correctly rounded version of the integer value, using IEEE 754 round-to-nearest mode (§4.2.4).
In other words even though you may lose information, you know that the value will still be in the overall range of the target type.
The choice could certainly have been made to require all implicit conversions to lose no information at all - so int and long to float would have been explicit and long to double would have been explicit. (int to double is okay; a double has enough precision to accurately represent all int values.)
In some cases that would have been useful - in some cases not. Language design is about compromise; you can't win 'em all. I'm not sure what decision I'd have made...
The Java Language Specification, Chapter 5: Conversion and Promotion addresses this issue:
5.1.2 Widening Primitive Conversion
The following 19 specific conversions on primitive types are called the widening primitive conversions:
byte to short, int, long, float, or double
short to int, long, float, or double
char to int, long, float, or double
int to long, float, or double
long to float or double
float to double
Widening primitive conversions do not lose information about the overall magnitude of a numeric value.
...
Conversion of an int or a long value to float, or of a long value to double, may result in loss of precision - that is, the result may lose some of the least significant bits of the value. In this case, the resulting floating-point value will be a correctly rounded version of the integer value
To put it another way, the JLS distinguishes between a loss of magnitude and a loss of precision.
int to byte for example is a (potential) loss of magnitude because you can't store 500 in a byte.
long to float is a potential loss of precision but not magnitude because the value range for floats is larger than that for longs.
So the rule is:
Loss of magnitude: explicit cast required;
Loss of precision: no cast required.
Subtle? Sure. But I hope that clears that up.
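The difference between the two kinds of loss can be put in numbers. In this sketch (names are hypothetical, and relativeError assumes a nonzero input), the widened float stays within a tiny relative error of the original, while the narrowed byte bears no resemblance to it.

```java
final class MagnitudeVsPrecisionDemo {
    // Relative error introduced by the implicit long -> float widening.
    static double relativeError(long value) {
        float widened = value;  // no cast required
        return Math.abs((double) widened - (double) value) / Math.abs((double) value);
    }

    // int -> byte must be requested explicitly because the magnitude can be lost.
    static byte narrow(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        // Round-to-nearest keeps the relative error at or below 2^-24 (about 6e-8).
        System.out.println(relativeError(123456789L) < Math.pow(2, -24)); // true
        System.out.println(narrow(500));                                  // -12
    }
}
```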
Though you're correct that a long uses more bits internally than a float, the Java language works on a widening path:
byte -> short -> int -> long -> float -> double
To convert from left to right (a widening conversion), there is no cast necessary (which is why long to float is allowed). To convert right to left (a narrowing conversion) an explicit cast is necessary.
I heard this explanation somewhere: a float stores its value in exponential form, much as we write it. '23500000000' is stored as '2.35e10'. So a float has room to cover the range of values of a long. Storing in exponential form is also the reason for the precision loss.
