Loss of precision - int -> float or double

Loss of precision - int -> float or double - java

I am revising for an exam, and one question is worth 4 marks.
"In Java we can assign an int to a double or a float." Will this ever lose information, and why?
My answer so far: ints have a fixed size, so the precision for storing data is finite, whereas a floating-point value can store information with infinite precision, so we lose information because of this.
I am not sure I am hitting the right areas here. I am fairly sure the conversion can lose precision, but I can't put my finger on exactly why. Can I get some help, please?

In Java Integer uses 32 bits to represent its value.
In Java a FLOAT uses a 23 bit mantissa, so integers greater than 2^23 will have their least significant bits truncated. For example, 33554435 (or 0x2000003) will be stored as 33554436, the nearest multiple of 4.
In Java a DOUBLE uses a 52 bit mantissa, so it can represent any 32-bit integer without loss of data.
See also "Floating Point" on wikipedia

It's not necessary to know the internal layout of floating-point numbers. All you need is the pigeonhole principle and the knowledge that int and float are the same size.
int is a 32-bit type, for which every bit pattern represents a distinct integer, so there are 2^32 int values.
float is a 32-bit type, so it has at most 2^32 distinct values.
Some floats represent non-integers, so there are fewer than 2^32 float values that represent integers.
Therefore, different int values will be converted to the same float (=loss of precision).
Similar reasoning can be used with long and double.
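The pigeonhole argument can be checked directly. The following is a minimal sketch; the class name PigeonholeDemo and the helper collide are made up for illustration. It looks for two different int values that widen to the same float.

```java
public class PigeonholeDemo {
    // Hypothetical helper: true when two distinct ints widen to the same float.
    static boolean collide(int a, int b) {
        return a != b && (float) a == (float) b;
    }

    public static void main(String[] args) {
        int a = 16_777_216; // 2^24, exactly representable as a float
        int b = a + 1;      // 2^24 + 1 needs 25 significant bits
        System.out.println(collide(a, b)); // true: both become 1.6777216E7
        System.out.println(collide(1, 2)); // false: small ints stay distinct
    }
}
```

Two pigeons in one hole is all the argument needs: once a single collision exists, the conversion cannot be lossless.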

Here's what JLS has to say about the matter (in a non-technical discussion).
JLS 5.1.2 Widening primitive conversion
The following 19 specific conversions on primitive types are called the widening primitive conversions:
int to long, float, or double
(rest omitted)
Conversion of an int or a long value to float, or of a long value to double, may result in loss of precision -- that is, the result may lose some of the least significant bits of the value. In this case, the resulting floating-point value will be a correctly rounded version of the integer value, using IEEE 754 round-to-nearest mode.
Despite the fact that loss of precision may occur, widening conversions among primitive types never result in a run-time exception.
Here is an example of a widening conversion that loses precision:
class Test {
    public static void main(String[] args) {
        int big = 1234567890;
        float approx = big;
        System.out.println(big - (int)approx);
    }
}
which prints:
-46
thus indicating that information was lost during the conversion from type int to type float because values of type float are not precise to nine significant digits.

No, float and double are fixed-length too; they just use their bits differently. Read more about how exactly they work in the Floating-Point Guide.
Basically, you cannot lose precision when assigning an int to a double, because double has 53 bits of precision (52 stored plus one implicit), which is enough to hold all int values. But float only has 24 bits of precision (23 stored plus one implicit), so it cannot exactly represent every int value whose magnitude is larger than 2^24.
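One way to test this claim is a round trip through the floating-point type. This is a sketch, with RoundTripDemo, exactAsFloat and exactAsDouble as invented names. The comparison is done in long on purpose: casting an out-of-range float back to int saturates at Integer.MAX_VALUE, which would hide the loss for the largest values.

```java
public class RoundTripDemo {
    // Hypothetical helpers: compare in long so that (float) Integer.MAX_VALUE,
    // which is 2^31, is not clamped back to Integer.MAX_VALUE by an int cast.
    static boolean exactAsFloat(int v) {
        return (long) (float) v == v;
    }

    static boolean exactAsDouble(int v) {
        return (long) (double) v == v;
    }

    public static void main(String[] args) {
        int[] samples = {0, 16_777_216, 16_777_217, 1_234_567_890,
                         Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int v : samples) {
            System.out.println(v + " float-exact=" + exactAsFloat(v)
                    + " double-exact=" + exactAsDouble(v));
        }
    }
}
```

Every sample is double-exact, while 16_777_217, 1_234_567_890 and Integer.MAX_VALUE are not float-exact.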

Your intuition is correct: you MAY lose precision when converting int to float. However, it is not as simple as presented in most other answers.
In Java a FLOAT uses a 23 bit mantissa, so integers greater than 2^23 will have their least significant bits truncated. (from a post on this page)
Not true.
Example: here is an integer that is greater than 2^23 that converts to a float with no loss:
int i = 33_554_430 * 64; // is greater than 2^23 (and also greater than 2^24); i = 2_147_483_520
float f = i;
System.out.println("result: " + (i - (int) f)); // Prints: result: 0
System.out.println("with i:" + i + ", f:" + f); // Prints: with i:2147483520, f:2.14748352E9
Therefore, it is not true that integers greater than 2^23 will have their least significant bits truncated.
The best explanation I found is here:
A float in Java is 32-bit and is represented by:
sign * mantissa * 2^exponent
sign * (0 to 33_554_431) * 2^(-125 to +127)
Source: http://www.ibm.com/developerworks/java/library/j-math2/index.html
Why is this an issue?
It leaves the impression that you can determine whether there is a loss of precision from int to float just by looking at how large the int is.
I have especially seen Java exam questions where one is asked whether a large int would convert to a float with no loss.
Also, people sometimes think that there will be a loss of precision from int to float:
when an int is larger than 1_234_567_890: not true (see counter-example above)
when an int is larger than 2^23 (8_388_608): not true
when an int is larger than 2^24 (16_777_216): not true
Conclusion
Conversions from sufficiently large ints to floats MAY lose precision.
It is not possible to determine whether there will be loss just by looking at how large the int is (i.e. without trying to go deeper into the actual float representation).

Possibly the clearest explanation I've seen:
http://www.ibm.com/developerworks/java/library/j-math2/index.html
the ULP or unit of least precision defines the precision available between any two float values. As these values increase the available precision decreases.
For example: between 1.0 and 2.0 inclusive there are 8,388,609 floats, between 1,000,000 and 1,000,001 there are 17. At 10,000,000 the ULP is 1.0, so above this value you soon have multiple integral values mapping to each available float, hence the loss of precision.
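The growth of the ULP can be observed with the standard Math.ulp(float) method. A small sketch (the class name UlpDemo and the helper gap are illustrative only):

```java
public class UlpDemo {
    // Hypothetical helper: distance from x to the next larger float.
    static float gap(float x) {
        return Math.ulp(x);
    }

    public static void main(String[] args) {
        float[] samples = {1.0f, 1_000_000f, 10_000_000f, 100_000_000f};
        for (float x : samples) {
            // The gap is 2^-23 at 1.0, 1/16 at one million,
            // 1 at ten million and 8 at one hundred million.
            System.out.println(x + " -> ulp " + gap(x));
        }
    }
}
```

A gap of 1/16 at one million is where the figure of 17 floats between 1,000,000 and 1,000,001 comes from, and a gap of 8 at one hundred million means eight consecutive integers compete for each float.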

There are two reasons that assigning an int to a double or a float might lose precision:
There are certain numbers that just can't be represented as a double/float, so they end up approximated
Large integer numbers may contain too much precision in the least-significant digits

For these examples, I'm using Java.
Use a function like this to check for loss of precision when casting from int to float
static boolean checkPrecisionLossToFloat(int val)
{
    if(val < 0)
    {
        val = -val;
    }
    // 8 = 32 - 24: an int fits if at most 24 significant bits remain
    // (8 is also the bit-width of the exponent for single-precision)
    return Integer.numberOfLeadingZeros(val) + Integer.numberOfTrailingZeros(val) < 8;
}
Use a function like this to check for loss of precision when casting from long to double
static boolean checkPrecisionLossToDouble(long val)
{
    if(val < 0)
    {
        val = -val;
    }
    // 11 = 64 - 53: a long fits if at most 53 significant bits remain
    // (11 is also the bit-width of the exponent in double-precision)
    return Long.numberOfLeadingZeros(val) + Long.numberOfTrailingZeros(val) < 11;
}
Use a function like this to check for loss of precision when casting from long to float
static boolean checkPrecisionLossToFloat(long val)
{
    if(val < 0)
    {
        val = -val;
    }
    // 40 = 8 + 32 = 64 - 24: a long fits if at most 24 significant bits remain
    return Long.numberOfLeadingZeros(val) + Long.numberOfTrailingZeros(val) < 40;
}
For each of these functions, returning true means that casting that integral value to the floating point value will result in a loss of precision.
Casting to float will lose precision if the integral value has more than 24 significant bits.
Casting to double will lose precision if the integral value has more than 53 significant bits.
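As a sanity check, the bit-counting rule can be compared against an actual round trip. The sketch below restates the rule in terms of significant bits; SignificantBitsDemo and its method names are invented for this illustration and are not part of the answer above.

```java
public class SignificantBitsDemo {
    // Hypothetical helper: bits from the highest set bit down to the lowest set bit.
    static int significantBits(long val) {
        if (val == 0) {
            return 0;
        }
        long abs = Math.abs(val);
        return 64 - Long.numberOfLeadingZeros(abs) - Long.numberOfTrailingZeros(abs);
    }

    // Rule from the text: more than 24 significant bits do not fit in a float.
    static boolean losesPrecisionAsFloat(int val) {
        return significantBits(val) > 24;
    }

    // Ground truth: convert to float and back, comparing in long to avoid clamping.
    static boolean roundTripFails(int val) {
        return (long) (float) val != val;
    }

    public static void main(String[] args) {
        int[] samples = {16_777_215, 16_777_216, 16_777_217, 2_147_483_520, Integer.MAX_VALUE};
        for (int v : samples) {
            System.out.println(v + ": rule=" + losesPrecisionAsFloat(v)
                    + ", roundTrip=" + roundTripFails(v));
        }
    }
}
```

For every sample the two columns agree, including 2_147_483_520, which is large but has only 24 significant bits.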

You can assign an int to a double without losing precision.

Related

Adding Float to long makes value decrease [duplicate]

if you call the following method of Java
void processIt(long a) {
float b = a; /*do I have loss here*/
}
do I have information loss when I assign the long variable to the float variable?
The Java language Specification says that the float type is a supertype of long.

Do I have information loss when I assign the long variable to the float variable?
Potentially, yes. That should be fairly clear from the fact that long has 64 bits of information, whereas float has only 32.
More specifically, as float values get bigger, the gap between successive values becomes more than 1 - whereas with long, the gap between successive values is always 1.
As an example:
long x = 100000000L;
float f1 = (float) x;
float f2 = (float) (x + 1);
System.out.println(f1 == f2); // true
In other words, two different long values have the same nearest representation in float.
This isn't just true of float though - it can happen with double too. In that case the numbers have to be bigger (as double has more precision) but it's still potentially lossy.
Again, it's reasonably easy to see that it has to be lossy - even though both long and double are represented in 64 bits, there are obviously double values which can't be represented as long values (trivially, 0.5 is one such) which means there must be some long values which aren't exactly representable as double values.
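For the long to double case, the first collision appears just above 2^53. Here is a minimal sketch, with LongToDoubleDemo and exactAsDouble as made-up names. The extra range check is there because casting the double 2^63 back to long saturates at Long.MAX_VALUE.

```java
public class LongToDoubleDemo {
    // Hypothetical helper: true when v survives a trip through double unchanged.
    static boolean exactAsDouble(long v) {
        double d = v;
        // 0x1p63 is 2^63; (long) 2^63 would clamp to Long.MAX_VALUE and hide the loss.
        return d < 0x1p63 && (long) d == v;
    }

    public static void main(String[] args) {
        long a = 1L << 53; // 9007199254740992
        long b = a + 1;    // needs 54 significant bits
        System.out.println((double) a == (double) b); // true: same nearest double
        System.out.println(exactAsDouble(a));         // true
        System.out.println(exactAsDouble(b));         // false
    }
}
```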

Yes, this is possible: if only for the reason that float has too few (typically 6-7) significant digits to deal with all possible numbers that long can represent (19 significant digits). This is in part due to the fact that float has only 32 bits of storage, and long has 64 (the other part is float's storage format † ). As per the JLS:
A widening conversion of an int or a long value to float, or of a long value to double, may result in loss of precision - that is, the result may lose some of the least significant bits of the value. In this case, the resulting floating-point value will be a correctly rounded version of the integer value, using IEEE 754 round-to-nearest mode (§4.2.4).
By example:
long i = 1000000001; // 10 significant digits
float f = i;
System.out.printf(" %d %n %.1f", i, f);
This prints (with the difference highlighted):
1000000001
1000000000.0
~ ← lost the number 1
It is worth noting this is also the case with int to float and long to double (as per that quote). In fact the only integer → floating point conversion that won't lose precision is int to double.
~~~~~~
† I say in part as this is also true for int widening to float which can also lose precision, despite both int and float having 32-bits. The same sample above but with int i has the same result as printed. This is unsurprising once you consider the way that float is structured; it uses some of the 32-bits to store the mantissa, or significand, so cannot represent all integer numbers in the same range as that of int.

Yes you will, for example...
public static void main(String[] args) {
long g = 2;
g <<= 48;
g++;
System.out.println(g);
float f = (float) g;
System.out.println(f);
long a = (long) f;
System.out.println(a);
}
... prints...
562949953421313
5.6294995E14
562949953421312

Long is of size 8 bytes then how can it be 'promoted' to float(4 bytes) in JAVA?

I read that in Java the long type can be promoted to float and double ( http://www.javatpoint.com/method-overloading-in-java ). A long takes 8 bytes of memory in Java and a float takes 4 bytes, so how does this promotion work? Couldn't we face some data loss if we promote this way? It is also noticeable that all other type promotions go from a smaller primitive type to a type of the same or larger size.
byte to short, int, long, float, or double
short to int, long, float, or double
char to int, long, float, or double
int to long, float, or double
long to float or double (float is the exceptional case)
float to double

long uses more bytes, but it has a smaller range: while long cannot go above 2^63, float can go to about 2^128. Obviously, the expansion of range comes at the price of lower precision, but since the range of float is larger, the conversion from long to float is a promotion.

float is represented in a different way than integral types. For further information on the floating-point format, read this: https://en.wikipedia.org/wiki/Single-precision_floating-point_format . Boiled down, it says: the floating-point format consists of a sign bit, 8 bits for the exponent and 23 bits for the fractional part of the value. The value is calculated like this: (-1)^signbit * 1.fractionalpart * 2^(exponent - 127). This scheme therefore allows the representation of larger values than a 64-bit integral type.

This quick test should show why:
public class Main {
public static void main(String[] args) {
System.out.println("Float: " + Float.MAX_VALUE);
System.out.println("Long: " + Long.MAX_VALUE);
}
}
Output:
Float: 3.4028235E38
Long: 9223372036854775807
Note the scientific notation in the Float line. The Float takes up less space, but due to its representation, it can hold up to a larger number than a Long.

Difference between number/10 and number*0.1 in java

I've been working on an interview question for 1.5 hours and could not find the bug in my Java program.
And then I found what the problem was, which I don't understand (don't pay attention to the values, there were others, it's about the types):
int size=100;
Integer a=12;
if(a >= size/10)...
//didn't work
is different than
if(a >= size*0.1)...
//worked
I understand that there is a conversion, but still, how is it possible that with a=12, if(a>=size/10) returns false?
Why is that?

/10 is integer division, while *0.1 first converts the first operand to a double and performs a floating-point multiplication.
If you use /10 and the operand is 14, the result is 1: mathematically 14/10 = 1.4, but integer division discards the fractional part. Likewise, 29/10 = 2.
If you use *0.1, the Java compiler first converts the value of size to a double, thus 14.0, and then multiplies it by 0.1, resulting in approximately 1.4.
On the other hand, floating point is not all beauty. float and double can't represent every integer, and results are rounded after each computation.
For the given value of size, however, both expressions give the same result, because 100 is a multiple of 10 and a float or double can represent every integer value in the range from zero to one hundred.
Finally, /10 is not always an integer division: if the first operand is a floating-point value (e.g. 14.0d/10), the compiler will turn this into a floating-point division.
Short version:
int/int is an integer division that truncates toward zero (rounding down for positive results).
int*double is a double multiplication that, up to rounding error, yields the floating-point value nearest to the correct result (with decimal digits).
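The difference between the two operations can be seen side by side in a short sketch. DivisionDemo, intTenth and doubleTenth are illustrative names, not taken from the question.

```java
public class DivisionDemo {
    // int / int: the fractional part is discarded (truncation toward zero).
    static int intTenth(int n) {
        return n / 10;
    }

    // int * double: n is widened to double first, so the fraction survives.
    static double doubleTenth(int n) {
        return n * 0.1;
    }

    public static void main(String[] args) {
        System.out.println(intTenth(14));        // 1
        System.out.println(intTenth(29));        // 2
        System.out.println(intTenth(-14));       // -1, not -2: truncation, not floor
        System.out.println(doubleTenth(14));     // approximately 1.4
        System.out.println(12 >= intTenth(100)); // true, as in the question's values
    }
}
```

With size = 100 both forms compare 12 against 10, which matches the other answer's observation that the condition in the question should not have failed.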

I just tested here:
public class a {
public static void main(String[] args) {
int size = 100;
int a = 12;
System.out.println((a >= size / 10) ? "OK" : "Failed?");
}
}
And it worked. I don't think this is your real problem. Probably it's in another part of your code.

Float and double datatype in Java

The float data type is a single-precision 32-bit IEEE 754 floating point and the double data type is a double-precision 64-bit IEEE 754 floating point.
What does it mean? And when should I use float instead of double or vice-versa?

The Wikipedia page on it is a good place to start.
To sum up:
float is represented in 32 bits, with 1 sign bit, 8 bits of exponent, and 23 bits of the significand (or what follows from a scientific-notation number: 2.33728*10^12; 33728 is the significand).
double is represented in 64 bits, with 1 sign bit, 11 bits of exponent, and 52 bits of significand.
By default, Java uses double to represent its floating-point numerals (so a literal 3.14 is typed double). It's also the data type that will give you a much larger number range, so I would strongly encourage its use over float.
There may be certain libraries that actually force your usage of float, but in general - unless you can guarantee that your result will be small enough to fit in float's prescribed range, then it's best to opt with double.
If you require accuracy - for instance, you can't have a decimal value that is inaccurate (like 1/10 + 2/10), or you're doing anything with currency (for example, representing $10.33 in the system), then use a BigDecimal, which can support an arbitrary amount of precision and handle situations like that elegantly.
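The 1/10 + 2/10 case mentioned above makes a compact illustration. This is only a sketch; the class name MoneyDemo and its helpers are invented here. Note that the BigDecimal values are built from strings, since new BigDecimal(0.1) would capture the binary approximation rather than the decimal value.

```java
import java.math.BigDecimal;

public class MoneyDemo {
    // Binary floating point: neither 0.1 nor 0.2 is stored exactly.
    static boolean doubleSumIsExact() {
        return 0.1 + 0.2 == 0.3;
    }

    // Decimal arithmetic: the string constructor keeps the exact decimal value.
    static BigDecimal decimalSum() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2"));
    }

    public static void main(String[] args) {
        System.out.println(doubleSumIsExact());                             // false
        System.out.println(decimalSum());                                   // 0.3
        System.out.println(decimalSum().compareTo(new BigDecimal("0.3")));  // 0
    }
}
```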

A float gives you approx. 6-7 decimal digits precision while a double gives you approx. 15-16. Also the range of numbers is larger for double.
A double needs 8 bytes of storage space while a float needs just 4 bytes.

Floating-point numbers, also known as real numbers, are used when evaluating expressions that require fractional precision. For example, calculations such as square root, or transcendentals such as sine and cosine, result in a value whose precision requires a floating-point type. Java implements the standard (IEEE-754) set of floating-point types and operators. There are two kinds of floating-point types, float and double, which represent single- and double-precision numbers, respectively. Their width and ranges are shown here:
Name      Width in Bits   Range
double    64              1.7e-308 to 1.7e+308
float     32              3.4e-038 to 3.4e+038
float
The type float specifies a single-precision value that uses 32 bits of storage. Single precision is faster on some processors and takes half as much space as double precision, but will become imprecise when the values are either very large or very small. Variables of type float are useful when you need a fractional component, but don't require a large degree of precision.
Here are some example float variable declarations:
float hightemp, lowtemp;
double
Double precision, as denoted by the double keyword, uses 64 bits to store a value. Double precision is actually faster than single precision on some modern processors that have been optimized for high-speed mathematical calculations. All transcendental math functions, such as sin( ), cos( ), and sqrt( ), return double values. When you need to maintain accuracy over many iterative calculations, or are manipulating large-valued numbers, double is the best choice.

This will give an error:
public class MyClass {
public static void main(String args[]) {
float a = 0.5;
}
}
/MyClass.java:3: error: incompatible types: possible lossy conversion from double to float
float a = 0.5;
This will work perfectly fine
public class MyClass {
public static void main(String args[]) {
double a = 0.5;
}
}
This will also work perfectly fine
public class MyClass {
public static void main(String args[]) {
float a = (float)0.5;
}
}
Reason: by default, Java treats a floating-point literal such as 0.5 as a double to ensure higher precision.
double takes more space but is more precise during computation; float takes less space but is less precise.
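Besides the cast, the f (or F) suffix makes a literal a float directly. The sketch below uses invented names (LiteralDemo, half, sameAfterWidening) and also shows that a float literal and a double literal with the same decimal text need not hold the same value.

```java
public class LiteralDemo {
    // The f suffix marks the literal as float, so no cast is needed.
    static float half() {
        return 0.5f;
    }

    // Hypothetical helper: does the float, widened to double, equal the double?
    static boolean sameAfterWidening(float f, double d) {
        return (double) f == d;
    }

    public static void main(String[] args) {
        System.out.println(half());                        // 0.5
        System.out.println(sameAfterWidening(0.5f, 0.5));  // true: 0.5 is exact in binary
        System.out.println(sameAfterWidening(0.1f, 0.1));  // false: 0.1f carries fewer bits
    }
}
```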

Java seems to have a bias towards using double for computations nonetheless:
Case in point: in the program I wrote earlier today, the methods didn't work when I used float, but they work fine now that I have replaced float with double (in the NetBeans IDE):
package palettedos;
import java.util.*;
class Palettedos{
private static Scanner Z = new Scanner(System.in);
public static final double pi = 3.142;
public static void main(String[]args){
Palettedos A = new Palettedos();
System.out.println("Enter the base and height of the triangle respectively");
int base = Z.nextInt();
int height = Z.nextInt();
System.out.println("Enter the radius of the circle");
int radius = Z.nextInt();
System.out.println("Enter the length of the square");
long length = Z.nextInt();
double tArea = A.calculateArea(base, height);
double cArea = A.calculateArea(radius);
long sqArea = A.calculateArea(length);
System.out.println("The area of the triangle is\t" + tArea);
System.out.println("The area of the circle is\t" + cArea);
System.out.println("The area of the square is\t" + sqArea);
}
double calculateArea(int base, int height){
double triArea = 0.5*base*height;
return triArea;
}
double calculateArea(int radius){
double circArea = pi*radius*radius;
return circArea;
}
long calculateArea(long length){
long squaArea = length*length;
return squaArea;
}
}

According to the IEEE standard, float is a 32-bit representation of a real number, while double is a 64-bit representation.
In Java programs we mostly see the double data type used. One reason is to avoid overflow, as the range of numbers that double can accommodate is larger than the range of float.
Also, when high precision is required, the use of double is encouraged. A few library methods that were implemented a long time ago still require the float data type (only because they were implemented using float, nothing else!).
But if you are certain that your program deals with small numbers and that no overflow will occur with float, then using float will reduce your memory use considerably, as a float requires half the memory of a double.

This example illustrates how to extract the sign (the leftmost bit), exponent (the 8 following bits) and mantissa (the 23 rightmost bits) from a float in Java.
int bits = Float.floatToIntBits(-0.005f);
int sign = bits >>> 31;
int exp = (bits >>> 23 & ((1 << 8) - 1)) - ((1 << 7) - 1);
int mantissa = bits & ((1 << 23) - 1);
System.out.println(sign + " " + exp + " " + mantissa + " " +
Float.intBitsToFloat((sign << 31) | (exp + ((1 << 7) - 1)) << 23 | mantissa));
The same approach can be used for doubles (11-bit exponent and 52-bit mantissa).
long bits = Double.doubleToLongBits(-0.005);
long sign = bits >>> 63;
long exp = (bits >>> 52 & ((1 << 11) - 1)) - ((1 << 10) - 1);
long mantissa = bits & ((1L << 52) - 1);
System.out.println(sign + " " + exp + " " + mantissa + " " +
Double.longBitsToDouble((sign << 63) | (exp + ((1 << 10) - 1)) << 52 | mantissa));
Credit: http://s-j.github.io/java-float/

You should use double instead of float for precise calculations, and float instead of double when less accuracy is acceptable. float holds an IEEE 754 single-precision floating-point number, whereas double holds an IEEE 754 double-precision floating-point number, which lets it store and compute numbers more accurately. Hope this helps.

In regular programming calculations, we don't use float. If we can ensure that the result is within the range of the float data type, then we can choose float to save memory. Generally, we use double for two reasons:
If we want to use a floating-point literal as a float, the programmer must explicitly add the suffix F or f, because by default every floating-point literal is treated as a double. This increases the burden on the programmer. If we use a floating-point literal as a double, we don't need to add any suffix.
float is a single-precision data type, which means it occupies 4 bytes. Hence, in large computations, the result may lose digits. If we choose the double data type, it occupies 8 bytes and the result keeps more digits.
Both the float and double data types were designed especially for scientific calculations, where approximation errors are acceptable. If accuracy is the top priority, it is recommended to use the BigDecimal class instead of the float or double data types. Source: Float and double datatypes in Java

loss of precision during widening conversions

Consider the code below:
package com.ajay.compoitepattern;
class Test {
public static void main(String[] args) {
int big = 1234567890;
float approx = big;
System.out.println(big - (approx));
System.out.println(big - (int)(approx));
}
}
The output of this program is
0.0
-46
My question is: if precision was lost in the widening conversion, the first println should also have printed -46. Why is the first output 0.0?

The first output is 0.0 because a float is subtracted from an int, which turns the whole expression into a float expression.
That means int big is also converted to a float value. Basically, you are computing (approx - approx), since (float)big = approx.
This is why you are getting zero.
If you want, try this one too:
System.out.println(big/(approx));
Operations between an int and a float convert the whole expression to float.

This is covered by JLS §4.2.4:
If at least one of the operands to a binary operator is of
floating-point type, then the operation is a floating-point operation,
even if the other is integral.
In your first example, big - approx, since approx is a float, this counts as a floating-point operation. big is widened to a float, and since approx is itself big widened to a float, both operands carry the same rounding error and it cancels out, netting you an answer of zero.
In your second example, big - (int) approx, neither operand is a floating-point type since you cast approx to int. The loss of precision is now visible, and your answer is no longer zero.
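A third variant makes the same point without an int cast: if approx is widened to double before subtracting, big is promoted to double as well, and double can hold both values exactly. The following is a sketch with made-up names (PromotionDemo, diffAsFloat, diffAsDouble).

```java
public class PromotionDemo {
    // int - float: big is first rounded to float, so the error cancels out.
    static float diffAsFloat(int big, float approx) {
        return big - approx;
    }

    // int - double: big is promoted to double exactly, so the error shows up.
    static double diffAsDouble(int big, float approx) {
        return big - (double) approx;
    }

    public static void main(String[] args) {
        int big = 1234567890;
        float approx = big; // stored as 1234567936
        System.out.println(diffAsFloat(big, approx));  // 0.0
        System.out.println(diffAsDouble(big, approx)); // -46.0
    }
}
```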

float in Java uses 23 bits for the mantissa, plus 1 implicit bit, hence 24 significant bits are available, whereas int is represented by 32 bits. So precision can be lost for int values above 2^24. You should use double here, as it has 53 significant bits, so the result will be 0 in the second case.
public static void main(String[] args) {
int big = 1234567890;
double approx = big;
System.out.println(big - (approx)); // --> 0.0
System.out.println(big - (int) (approx)); // --> 0
}

Reading this may help you understand. Read this SO post too.
Let's consider your code:
int big = 1234567890; // here your int big = 1234567890
float approx = big;
BigDecimal bd = BigDecimal.valueOf(approx); // requires import java.math.BigDecimal
System.out.println(bd); // shows that int to float produced approx = 1234567936
System.out.println(big - (approx)); // big is converted to float again because the other operand is a float, so the result is zero
System.out.println(big - (int)(approx)); // here big = 1234567890 and (int) approx = 1234567936, so the difference is -46
