Typecasting Float in Java

A.) Does precision loss happen when one converts a float to a double in Java?
B.) If I typecast a float to a float, does that result in any precision loss, or does Java simply ignore such a typecast?

Question A is covered by the Java Language Specification, Widening Primitive Conversion.
"Conversions widening from float to double in strictfp expressions also preserve the numeric value exactly; however, such conversions that are not strictfp may lose information about the overall magnitude of the converted value."
The non-strictfp issue relates to numbers with extreme exponent values. The issue is discussed in FP-strict Expressions.
"Within an FP-strict expression, all intermediate values must be elements of the float value set or the double value set, implying that the results of all FP-strict expressions must be those predicted by IEEE 754 arithmetic on operands represented using single and double formats. Within an expression that is not FP-strict, some leeway is granted for an implementation to use an extended exponent range to represent intermediate results; the net effect, roughly speaking, is that a calculation might produce "the correct answer" in situations where exclusive use of the float value set or double value set might result in overflow or underflow."
If you want to be sure that conversions from float to double will be exact, use strictfp.
Question B is a question about Identity Conversions. I'm not sure whether an identity conversion can trigger a change in value set, involving the same strictfp vs. non-strict issue as for Question A.

Assume a double is a 64-bit bucket and a float a 32-bit bucket. If you empty the contents of a float into the double bucket, it will fit with space to spare. Converting from double to float is not the same, because the float bucket is too small to hold the bigger content of a double. In the case of float to float, both buckets are the same size, so you won't lose anything and the contents fit perfectly.
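The bucket analogy can be checked with a small sketch. The class and method names below (WideningDemo, roundTripsExactly, survivesNarrowing) are made up for illustration and do not come from the question or the JLS.

```java
class WideningDemo {
    // Widening a float to double and narrowing it back yields the same float
    // (for any non-NaN value).
    static boolean roundTripsExactly(float f) {
        double widened = f;            // implicit widening conversion
        return (float) widened == f;   // explicit narrowing back
    }

    // Narrowing a double to float and widening again may change the value.
    static boolean survivesNarrowing(double d) {
        return (double) (float) d == d;
    }

    public static void main(String[] args) {
        System.out.println(roundTripsExactly(0.1f)); // true
        System.out.println(survivesNarrowing(0.1));  // false: 0.1 needs more bits than a float has
        System.out.println(survivesNarrowing(0.5));  // true: 0.5 is exact in both formats
        System.out.println((float) 1e40);            // Infinity: out of float range
    }
}
```

A float-to-float cast is an identity conversion, so there is nothing to demonstrate for it here.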

Related

Is it possible to add a float to a double?

Our teacher asked us to search about this and what I kept on getting from the net are explanations stating what double and float means.
Can you tell me whether it is possible or not, and explain why or why not?
Simple answer: yes, although the float's contribution can be lost if the double is very large.
floats are single-precision floating point numbers, meaning they use a 23-bit mantissa and 8-bit exponent, corresponding to ~6/7 s.f. precision and ~ 10^38 range.
doubles are double-precision, with a 52-bit mantissa and 11-bit exponent, corresponding to ~15/16 s.f. precision and ~ 10^308 range.
Since a double has only finite precision, adding a float to a very large double can leave the double unchanged: the smaller value is lost in rounding (this is usually called absorption; it is not underflow). Of course this can happen when adding two doubles as well.
https://en.wikipedia.org/wiki/Floating_point
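As a rough illustration of a small value being lost when added to a very large double, here is a minimal sketch. The class name AbsorptionDemo and the helper add are hypothetical; the magnitudes were picked only to sit on either side of the point where 1.0 stops being representable as a difference.

```java
class AbsorptionDemo {
    // The float operand is widened to double before the addition happens.
    static double add(double big, float small) {
        return big + small;
    }

    public static void main(String[] args) {
        System.out.println(Math.ulp(1e17));          // 16.0: spacing between adjacent doubles near 1e17
        System.out.println(add(1e17, 1.0f) == 1e17); // true: the 1.0 is lost in rounding
        System.out.println(add(1e15, 1.0f) == 1e15); // false: near 1e15 the spacing is below 1.0
    }
}
```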
Can you add two numbers with varying decimal places (e.g. 432.54385789364 + 432.1)? Yes you can.
In Java, it is the same idea.
From the Java Tutorials:
float: The float data type is a single-precision 32-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. As with the recommendations for byte and short, use a float (instead of double) if you need to save memory in large arrays of floating point numbers. This data type should never be used for precise values, such as currency. For that, you will need to use the java.math.BigDecimal class instead. Numbers and Strings covers BigDecimal and other useful classes provided by the Java platform.
double: The double data type is a double-precision 64-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. For decimal values, this data type is generally the default choice. As mentioned above, this data type should never be used for precise values, such as currency.
Basically, they both hold decimal numbers. They differ in how precise they can be. A float is 32 bits in size, compared to a double which is 64 bits in size. A float has a precision of around 6 or 7 significant decimal digits, and a double has a precision of around 15 or 16 significant decimal digits.
Basically... a double can store a decimal better than a float... but takes up more space.
To answer your question, you can add a float to a double and vice versa. Generally, the result will be made into a double, and you will have to cast it back to a float if that is what you want.
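A minimal sketch of that promotion rule, assuming nothing beyond the standard language rules. The names MixedAddition, sum and sumAsFloat are illustrative only.

```java
class MixedAddition {
    // float + double: the float is promoted, and the result type is double.
    static double sum(float f, double d) {
        return f + d;
    }

    // To get a float back, an explicit (possibly lossy) cast is needed.
    static float sumAsFloat(float f, double d) {
        return (float) (f + d);
    }

    public static void main(String[] args) {
        // float bad = 0.5f + 0.25;   // does not compile: the sum is a double
        System.out.println(sum(0.5f, 0.25));             // 0.75
        System.out.println(sumAsFloat(0.5f, 0.25));      // 0.75
        // The float 0.1f is not the same value as the double 0.1,
        // so mixing them gives a different sum than pure double arithmetic.
        System.out.println(sum(0.1f, 0.2) == 0.1 + 0.2); // false
    }
}
```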
If you want to go deeper, you can say that yes, it is possible due to value coercion (the float is promoted to double), but that it opens the door for precision errors to accumulate invisibly to the compiler. float has substantially less precision than double. Note that an unsuffixed floating-point literal in Java source is already a double; only literals with the f suffix are float. In practice, avoid the f suffix (or add the d suffix to be explicit) so that your literals are double if you have to use floating point.
These precision errors can lead to serious harm and even loss of life in sensitive systems.
Floating point is very hard to use correctly and should be avoided if possible. One extremely obvious thing not to do that is commonly mistakenly done is representing currency as a float or double. This can cause real money to be effectively given to or stolen from people.
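To make the currency point concrete, here is a small sketch using java.math.BigDecimal, which the Java Tutorials quoted above recommend for precise values. The class name CurrencyDemo and the helper total are made up for this example, and constructing amounts from strings is one common approach rather than the only one.

```java
import java.math.BigDecimal;

class CurrencyDemo {
    // Sums decimal amounts given as strings, so no binary rounding is involved.
    static BigDecimal total(String... amounts) {
        BigDecimal sum = BigDecimal.ZERO;
        for (String amount : amounts) {
            sum = sum.add(new BigDecimal(amount));
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2);              // 0.30000000000000004
        System.out.println(total("0.10", "0.20"));  // 0.30
    }
}
```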
Floating point (preferring double) is appropriate for approximate calculations and certain high performance scientific computing applications. However it is still extremely important to be aware of the precision loss characteristics particularly when a resulting floating point value is fed into further floating-point calculations.
This more generally leads in Numerical Computing and now I've really gone afield :)
SAS has a decent paper on this:
http://support.sas.com/resources/papers/proceedings11/275-2011.pdf

Java implicit conversion

With the following code:
Float a = 1.2;
there is an error because it takes the decimal as double value and double is a bigger datatype than float.
Now, it takes integer as default int type. So, why is the following code not giving any error?
Byte b = 20;
The compiler is smart enough to figure out that the bit representation of 20 (an int value) can fit into a byte with no loss of data. From the Java Language Specification §5.1.3:
A narrowing primitive conversion from double to float is governed by the IEEE 754 rounding rules (§4.2.4). This conversion can lose precision, but also lose range, resulting in a float zero from a nonzero double and a float infinity from a finite double. A double NaN is converted to a float NaN and a double infinity is converted to the same-signed float infinity.
A narrowing conversion of a signed integer to an integral type T simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the sign of the resulting value to differ from the sign of the input value.
See also this thread.
There are no implicit narrowing conversions in general - constant expressions are the only exception, and they are explicitly allowed by JLS 5.2:
In addition, if the expression is a constant expression (§15.28) of type byte, short, char, or int:
* A narrowing primitive conversion may be used if the type of the variable is byte, short, or char, and the value of the constant expression is representable in the type of the variable.
There is no mention of implicit narrowing conversions being allowed for floating point numbers, so they are forbidden as per the general rule.
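The rule can be seen in a short sketch. The question uses the wrapper types Float and Byte; primitives are used here to keep the focus on the narrowing rule itself. The class name ConstantNarrowing and the helper narrow are hypothetical.

```java
class ConstantNarrowing {
    // A non-constant int needs an explicit cast; high-order bits are discarded.
    static byte narrow(int n) {
        return (byte) n;
    }

    public static void main(String[] args) {
        byte b = 20;          // OK: constant expression of type int, representable in byte
        final int k = 100;
        byte c = k;           // OK: k is a constant variable, so this is a constant expression
        // byte d = 200;      // error: 200 is not representable in byte
        int n = 20;
        // byte e = n;        // error: n is not a constant expression
        byte e = narrow(n);   // explicit cast inside narrow()

        float f = 1.2f;       // OK: float literal
        float g = (float) 1.2; // OK: explicit narrowing from double
        // float h = 1.2;     // error: no implicit narrowing for floating-point constants

        System.out.println(b + " " + c + " " + e); // 20 100 20
        System.out.println(narrow(200));           // -56
        System.out.println(f == g);
    }
}
```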

Floating Point Casting in Java

Casting for integers is very straightforward, the extra bits simply disappear.
But, is it important to understand what is happening under the hood for casting floating point? I've tried to read information on how floating point is calculated, but I have yet to find one that explains it well. At least that's my excuse. I get the basic idea although the calculation of the mantissa is a bit difficult.
At least up to Java 7, I understand that floating points cannot be used in bitwise operations. Which makes sense because of how they are stored internally. Is there anything important that is needed to know on how floating points operate or are cast?
So, to Summarize:
Is it important to understand the internal workings of floating point like integers?
What is the internal process of casting a floating point to an integer?
What is the internal process of casting a floating point to an integer?
Java calls the machine code instruction which does this in compliance with the IEEE-754 standard. There is nothing for Java to do as such. If you want to know how casting works I suggest you read the standard.
Basically, the mantissa is shifted by the exponent and the sign applied, i.e. a floating point number is sign * 2^exponent * mantissa, and all the cast does is perform this calculation and drop any fractional part.
First, you need to understand that a floating point number is essentially an approximation. You can put in, say, 1.23 and get out 1.229998 (or some such), because 1.23 cannot be represented exactly in binary. Regardless of whether you will be doing any casts, you need to understand this, and how it affects computations (and especially comparisons).
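A quick way to see the approximation is to print the exact value that is actually stored. This is only a sketch: the class name ApproximationDemo, the helper nearlyEqual and the tolerance 1e-6 are arbitrary choices for illustration, and a suitable tolerance depends on the application.

```java
import java.math.BigDecimal;

class ApproximationDemo {
    // Compares two doubles with an absolute tolerance instead of ==.
    static boolean nearlyEqual(double a, double b, double eps) {
        return Math.abs(a - b) < eps;
    }

    public static void main(String[] args) {
        // BigDecimal(double) shows the exact binary value, not the rounded decimal.
        System.out.println(new BigDecimal(1.23f));
        System.out.println(new BigDecimal(1.23));

        // The float and double approximations of 1.23 are different numbers.
        System.out.println(1.23f == 1.23);                  // false
        System.out.println(nearlyEqual(1.23f, 1.23, 1e-6)); // true
    }
}
```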
From the standpoint of cast, casting a float to a double causes no loss of information, since a double can contain every value that a float can contain. But casting from double to float can cause loss of precision (and, for very large or small numbers, exponent overflow/underflow), since there's simply more information in a 64-bit value than in a 32-bit one, so some data's going to end up "on the floor".
Similarly, casting from an int to a double causes no loss of information, since a double can contain every value an int can contain and then some. But casting from int to float or from long to double or float can result in loss of precision (though there can never be an exponent overflow/underflow).
Casting from float or double to int or long can easily result in major loss of data: if the value has a large positive exponent it is out of range for the integer type, and if it has a negative exponent (magnitude below 1) the result is 0. And, of course, when you cast from floating-point to an integer type the fractional part of the number is truncated toward zero (the same as a "floor" operation for positive values, but not for negative ones).
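The cases above can be sketched as follows. In Java specifically, out-of-range values are clamped to the minimum or maximum of the integer type and NaN becomes 0 (JLS §5.1.3). The class name FloatToIntDemo and the helper toInt are illustrative.

```java
class FloatToIntDemo {
    // Narrowing primitive conversion from double to int.
    static int toInt(double d) {
        return (int) d;
    }

    public static void main(String[] args) {
        System.out.println(toInt(3.99));       // 3: fraction dropped
        System.out.println(toInt(-3.99));      // -3: truncation is toward zero...
        System.out.println(Math.floor(-3.99)); // -4.0: ...which differs from floor
        System.out.println(toInt(0.001));      // 0: negative exponent, magnitude below 1
        System.out.println(toInt(1e20));       // 2147483647: clamped to Integer.MAX_VALUE
        System.out.println(toInt(-1e20));      // -2147483648: clamped to Integer.MIN_VALUE
        System.out.println(toInt(Double.NaN)); // 0
    }
}
```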

Floating point arithmetics restricted to integers

I use doubles for a uniform implementation of some arithmetic calculations. These calculations may be actually applied to integers too, but there are no C++-like templates in Java and I don't want to duplicate the implementation code, so I simply use "double" version for ints.
Does the JVM spec guarantee the correctness of integer operations such as <=, >=, +, -, *, and / (in case of remainder==0) when the operations are emulated as the corresponding floating point ops?
(Any integer, of course, has reasonable size to be represented in double's mantissa)
According to the Java Language Specification:
Operators on floating-point numbers behave as specified by IEEE 754 (with the exception of the remainder operator (§15.17.3)).
So you're guaranteed uniform behaviour, and while I don't have access to the official IEEE standard document, I'm pretty sure that it implicitly guarantees that operations on integers that can be represented exactly as a float/double work as expected.
Briefly, yes.
double a = 3.0;
double b = 2.0;
System.out.println(a*b); // 6.0
System.out.println(a+b); // 5.0
System.out.println(a-b); // 1.0
System.out.println(a/b); // 1.5 // if you want to get 1 here you should cast it to `integer (int)`
System.out.println(a>=b); // true
System.out.println(a<=b); // false
But be careful with multiplication (*), because a*b can overflow when the result is cast back to an integer type. The same applies to + and -.
Indeed, I've found the standard and it says "yes".
JVM spec:
The rounding operations of the Java virtual machine always use IEEE 754 round to nearest mode. Inexact results are rounded to the nearest representable value, with ties going to the value with a zero least-significant bit. This is the IEEE 754 default mode. But Java virtual machine instructions that convert values of floating-point types to values of integral types round toward zero. The Java virtual machine does not give any means to change the floating-point rounding mode.
ANSI/IEEE Std 754-1985 5.
... Except for binary <---> decimal conversion, each of the operations shall be performed as if it first produced an intermediate result correct to infinite precision and with unbounded range, and then coerced this intermediate result to fit in the destination’s format
ANSI/IEEE Std 754-1985 5.4.
Conversions between floating-point integers and integer formats shall be exact unless an exception arises as specified in 7.1.
Summary
1) exact operations are always exact if the result fits the double format (and, therefore, integer result is always floating-point integer).
2) int <--> double conversions are always exact for floating point integers.
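The "fits the double format" condition in the summary can be made concrete: integers of magnitude up to 2^53 are all exactly representable in a double. The sketch below assumes that bound; the names ExactIntegersInDouble, MAX_EXACT and inSafeRange are hypothetical.

```java
class ExactIntegersInDouble {
    // Every integer with magnitude up to 2^53 has an exact double representation.
    static final long MAX_EXACT = 1L << 53; // 9007199254740992

    static boolean inSafeRange(long n) {
        return n >= -MAX_EXACT && n <= MAX_EXACT;
    }

    public static void main(String[] args) {
        double a = 3.0, b = 2.0;
        System.out.println((int) (a * b) == 6);                // true: exact operands, exact result

        System.out.println(inSafeRange(Integer.MAX_VALUE));    // true: every int fits
        System.out.println((double) MAX_EXACT + 1 == (double) MAX_EXACT); // true: 2^53 + 1 is not representable

        // The product of two large ints can need more than 53 bits.
        long exact = (long) Integer.MAX_VALUE * Integer.MAX_VALUE;
        double viaDouble = (double) Integer.MAX_VALUE * Integer.MAX_VALUE;
        System.out.println(inSafeRange(exact));                // false
        System.out.println((long) viaDouble == exact);         // false: the double result was rounded
    }
}
```

So sums, differences and products stay exact only as long as every intermediate result stays within that range.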

Why does Java implicitly (without cast) convert a `long` to a `float`?

Every time I think I understand about casting and conversions, I find another strange behavior.
long l = 123456789L;
float f = l;
System.out.println(f); // outputs 1.23456792E8
Given that a long has greater bit-depth than a float, I would expect that an explicit cast would be required in order for this to compile. And not surprisingly, we see that we have lost precision in the result.
Why is a cast not required here?
The same question could be asked of long to double - both conversions may lose information.
Section 5.1.2 of the Java Language Specification says:
Widening primitive conversions do not lose information about the overall magnitude of a numeric value. Indeed, conversions widening from an integral type to another integral type do not lose any information at all; the numeric value is preserved exactly.
Conversions widening from float to double in strictfp expressions also preserve the numeric value exactly; however, such conversions that are not strictfp may lose information about the overall magnitude of the converted value.
Conversion of an int or a long value to float, or of a long value to double, may result in loss of precision-that is, the result may lose some of the least significant bits of the value. In this case, the resulting floating-point value will be a correctly rounded version of the integer value, using IEEE 754 round-to-nearest mode (§4.2.4).
In other words even though you may lose information, you know that the value will still be in the overall range of the target type.
The choice could certainly have been made to require all implicit conversions to lose no information at all - so int and long to float would have been explicit and long to double would have been explicit. (int to double is okay; a double has enough precision to accurately represent all int values.)
In some cases that would have been useful - in some cases not. Language design is about compromise; you can't win 'em all. I'm not sure what decision I'd have made...
The Java Language Specification, Chapter 5: Conversion and Promotion addresses this issue:
5.1.2 Widening Primitive Conversion
The following 19 specific conversions
on primitive types are called the
widening primitive conversions:
byte to short, int, long, float, or double
short to int, long, float, or double
char to int, long, float, or double
int to long, float, or double
long to float or double
float to double
Widening primitive conversions do not lose information about the overall magnitude of a numeric value.
...
Conversion of an int or a long value to float, or of a long value to double, may result in loss of precision-that is, the result may lose some of the least significant bits of the value. In this case, the resulting floating-point value will be a correctly rounded version of the integer value
To put it another way, the JLS distinguishes between a loss of magnitude and a loss of precision.
int to byte for example is a (potential) loss of magnitude because you can't store 500 in a byte.
long to float is a potential loss of precision but not magnitude because the value range for floats is larger than that for longs.
So the rule is:
Loss of magnitude: explicit cast required;
Loss of precision: no cast required.
Subtle? Sure. But I hope that clears that up.
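The magnitude versus precision distinction can be sketched with the value from the question. The class name LongToFloatDemo and the helper roundTrip are made up for illustration.

```java
class LongToFloatDemo {
    // Widens a long to float implicitly, then narrows it back with a cast.
    static long roundTrip(long l) {
        float f = l;      // widening: no cast required, may lose low-order bits
        return (long) f;  // narrowing: cast required
    }

    public static void main(String[] args) {
        long l = 123456789L;
        float f = l;
        System.out.println(f);                 // 1.23456792E8
        System.out.println(roundTrip(l));      // 123456792
        System.out.println(roundTrip(l) - l);  // 3: precision lost, magnitude kept

        // The float range comfortably covers the long range.
        System.out.println(Float.MAX_VALUE > Long.MAX_VALUE); // true

        // Loss of magnitude: int to byte needs a cast and can wrap around.
        int big = 500;
        byte b = (byte) big;
        System.out.println(b);                 // -12
    }
}
```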
Though you're correct that a long uses more bits internally than a float, the Java language works on a widening path:
byte -> short -> int -> long -> float -> double
To convert from left to right (a widening conversion), there is no cast necessary (which is why long to float is allowed). To convert right to left (a narrowing conversion) an explicit cast is necessary.
I heard this somewhere: a float stores its value in exponential form, much as we would write it. '23500000000' is stored as '2.35e10'. So a float has enough range to cover the values of a long. Storing in exponential form is also the reason for the precision loss.
