Java Conversion into Double

class Rextester
{
    public static void main(String args[])
    {
        double b = 1.13f * 100;
        System.out.println(b);
    }
}
In the above code, when f is not appended to 1.13 the output is 112.99999999999999, but when f is appended to 1.13 the value is 113. Why does this happen?

The f suffix tells Java that the number is a single-precision floating-point number instead of a double-precision one.
The problem with floating-point numbers in general is that certain numbers cannot be represented exactly. Each bit of the mantissa of the internal representation stands for a fraction with a power of 2 in the denominator: 1/2, 1/4, 1/8, 1/16, etc. The computer then selects the closest representable number to the one you want.
What is happening in your case is that when you leave out the f, the full double-precision bits are used, and the closest double to the true product is 112.99999999999999. When you add the f, you are telling the program to round to the closest single-precision value, so the first 9 that doesn't fit gets rounded and the carry propagates up, giving 113.
It is a bit of a matter of coincidence for this specific number. Don't assume that using single precision floating point will always give you the expected result. Floating point arithmetic is always a bit messy in computing.
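To see what each literal actually stores, BigDecimal's double constructor prints the exact binary value. A minimal sketch (the class name is illustrative; output values are abbreviated in the comments):
import java.math.BigDecimal;

public class ExactLiterals {
    public static void main(String[] args) {
        System.out.println(new BigDecimal(1.13));            // 1.1299999999999998934... (just below 1.13)
        System.out.println(new BigDecimal((double) 1.13f));  // 1.12999999523162841796875 (further below 1.13)
        System.out.println(1.13 * 100);   // 112.99999999999999
        System.out.println(1.13f * 100);  // 113.0 -- the float's larger error happens to round back to 113
    }
}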

When you add f to the literal it becomes a float constant, which has only about 6-7 significant decimal digits of precision. This makes representation error much more likely and much bigger.
When you drop the f, the literal is a double, which has roughly half a billion (2^29) times the precision. This makes the representation error much smaller, and when it is printed as a double you are less likely to see it.
When you print a double, the library expects there to be some representation error and shows you the simplest/shortest decimal string that maps to the same double. (Infinitely many real numbers map to each representable double.)
However, this implicit rounding only corrects a very small amount and is unlikely to mask the representation error of a float. Note: if you print using a float instead of a double, greater rounding is performed, hiding the error.
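A small sketch of that last note: printing the same noisy value through a float applies the shorter float round-trip and hides the error.
double b = 1.13 * 100;
System.out.println(b);          // 112.99999999999999 -- error visible at double precision
System.out.println((float) b);  // 113.0 -- the float round-trip needs fewer digits and hides it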

Related

Java Float multiplication giving wrong result

Multiplication using float gives a noticeably different result.
import java.math.BigDecimal;

import static java.lang.Double.parseDouble;
import static java.lang.Float.parseFloat;

public class FloatMultiplication {
    public static void main(String[] args) {
        // Using strings and parsing instead of the actual data types is part of
        // the use case, which is why it is represented the same way here.
        double v1 = parseDouble("590.0");
        double v2 = parseDouble("490.0");
        double v3 = parseDouble("391.0");
        float v4 = parseFloat("590.0");
        float v5 = parseFloat("490.0");
        float v6 = parseFloat("391.0");
        System.out.println(new BigDecimal(v1 * v2 * v3));
        System.out.println(new BigDecimal(v4 * v5 * v6));
        System.out.println(BigDecimal.valueOf(Float.parseFloat("289100.0") * Float.parseFloat("391.0")));
        System.out.println(BigDecimal.valueOf(Double.parseDouble("289100.0") * Double.parseDouble("391.0")));
    }
}
Output:
113038100 // double multiplication
113038096 // float multiplication
113038096
113038100
For the above code,
(590.0 * 490.0 * 391.0) gives 113038100 using double
(590.0 * 490.0 * 391.0) gives 113038096 using float (113038100 - 113038096 = 4, the difference)
I have read through https://floating-point-gui.de/basic/ and understand how floating-point calculation happens, but a difference of 4 is unexpected.
Please help me understand the following:
Is this result correct in the first place?
Does float always give wrong numbers?
As I can see, double uses the same technique, so what guarantee do we have of getting a correct result if we use double?
Does float always give wrong numbers?
It depends on the number: if the number can be represented exactly at float precision, the result will be fine.
"As I can see double also use same technique, so how much guarantee we
have to get correct result if we use double"
double has the same issue, but since double has more precision the chance of error is lower; it still happens, though.
So when you need a very precise result, as in scientific or financial applications, you will need to use BigDecimal.
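For example, the multiplication from the question stays exact if the strings are fed straight into BigDecimal (a small sketch; import java.math.BigDecimal is assumed):
BigDecimal exact = new BigDecimal("590.0")
        .multiply(new BigDecimal("490.0"))
        .multiply(new BigDecimal("391.0"));
System.out.println(exact); // 113038100.000 -- exact; the trailing zeros come from the scales adding up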
Watch this video; it explains how floating-point numbers work:
https://www.youtube.com/watch?v=ajaHQ9S4uTA
Is this result correct in the first place?
The Java float format is IEEE-754 binary32. In this format, every finite number is represented as a sign, a 24-bit integer, and a scaling by a power of two from 2^−149 to 2^104. The integer part is called the significand. (The format is often described as a sign, a 24-bit number with a binary point after the first bit, so it has a value in [0, 2), and a scaling from 2^−126 to 2^127. These are mathematically equivalent, and the format used here is noted in the IEEE-754 standard as an option.) In normal form, the 24-bit integer is 2^23 or greater. (Representable numbers less than 2^−126 cannot be represented in normal form and are necessarily subnormal.)
In this format, 590 can be represented as +590•2^0 or +9,666,560•2^−14. 490 is +490•2^0 or +16,056,320•2^−15.
Their product is +289,100•2^0 or +9,251,200•2^−5.
391 is +391•2^0 or +12,812,288•2^−15.
The ordinary arithmetic product of +289,100•2^0 and +391•2^0 is +113,038,100•2^0. However, 113,038,100 is not a 24-bit number; it is a 27-bit number. To get it under 2^24, we can adjust the scaling, multiplying the significand by ⅛ and multiplying the scaling by 8 = 2^3.
That gives us +14,129,762.5•2^3. However, now the significand is not an integer. This result is not representable in the float format. To produce a result, the operation of multiplying in the float format is defined to round the ordinary arithmetic result to the nearest representable value. In this case there is a tie; we could round the .5 up or down. Ties are resolved by rounding to make the low digit even, so we round to +14,129,762•2^3.
+14,129,762•2^3 is 113,038,096. That is the result you got, so it is correct.
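A short sketch confirming this walkthrough: 590 × 490 = 289,100 is exact in float, and only the final multiplication rounds.
import java.math.BigDecimal;

float product = 590.0f * 490.0f * 391.0f;
System.out.println(product);                           // 1.13038096E8
System.out.println(new BigDecimal((double) product));  // 113038096, i.e. 14,129,762 * 2^3 exactly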
Does float always give wrong numbers?
This is not wrong; the computer behaved according to its specification.
Observe float is a 32-bit format, but there are infinitely many real numbers. There are even infinitely many rational numbers. It is impossible for a 32-bit format to produce the same results as theoretical real-number arithmetic or rational-number arithmetic. There are simply more possible results than there are representable values.
This is true of the 64-bit double format as well. It is also true of integer formats, fixed-precision formats, and all numerical formats with a fixed number of bits. A fixed number of bits cannot represent infinitely many values.
Your comments suggest you thought floating-point would produce approximate results only for fractional values, numbers less than one. But the limitation on how many values can be represented applies at all scales. At each scale (each power of two), only 2^24 values are representable (2^23 in normal form). For scale 2^0, all the non-negative integers below 2^24 are representable. But, above that, only some of the integers are representable. At first, we have to skip every second integer, then every fourth, then every eighth, and so on.
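This is easy to observe directly, since above 2^24 the gap between consecutive floats exceeds 1:
System.out.println(16777216f == 16777217f);  // true -- 16,777,217 is not representable; it rounds to 16,777,216
System.out.println(16777218f);               // 1.6777218E7 -- the next representable integer
System.out.println(Math.ulp(33554432f));     // 4.0 -- above 2^25, only every fourth integer is representable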
Floating-point arithmetic is designed to approximate real-number arithmetic. It should be used when you want to approximate real-number arithmetic. It should not be used, with rare exceptions, when you want exact arithmetic.

Appropriate scale for converting via BigDecimal to floating point

I've written an arbitrary precision rational number class that needs to provide a way to convert to floating-point. This can be done straightforwardly via BigDecimal:
return new BigDecimal(num).divide(new BigDecimal(den), 17, RoundingMode.HALF_EVEN).doubleValue();
but this requires a value for the scale parameter when dividing the decimal numbers. I picked 17 as the initial guess because that is approximately the precision of a double precision floating point number, but I don't know whether that's actually correct.
What would be the correct number to use, defined as, the smallest number such that making it any larger would not make the answer any more accurate?
Introduction
No finite precision suffices.
The problem posed in the question is equivalent to:
What precision p guarantees that converting any rational number x to p decimal digits and then to floating-point yields the floating-point number nearest x (or, in case of a tie, either of the two nearest x)?
To see this is equivalent, observe that the BigDecimal divide shown in the question returns num/div to a selected number of decimal places. The question then asks whether increasing that number of decimal places could increase the accuracy of the result. Clearly, if there is a floating-point number nearer x than the result, then the accuracy could be improved. Thus, we are asking how many decimal places are needed to guarantee the closest floating-point number (or one of the tied two) is obtained.
Since BigDecimal offers a choice of rounding methods, I will consider whether any of them suffices. For the conversion to floating-point, I presume round-to-nearest-ties-to-even is used (which BigDecimal appears to use when converting to Double or Float). I give a proof using the IEEE-754 binary64 format, which Java uses for Double, but the proof applies to any binary floating-point format by changing the 2^52 used below to 2^(w−1), where w is the number of bits in the significand.
Proof
One of the parameters to a BigDecimal division is the rounding method. Java’s BigDecimal has several rounding methods. We only need to consider three, ROUND_UP, ROUND_HALF_UP, and ROUND_HALF_EVEN. Arguments for the others are analogous to those below, by using various symmetries.
In the following, suppose we convert to decimal using any large precision p. That is, p is the number of decimal digits in the result of the conversion.
Let m be the rational number 2^52 + 1 + ½ − 10^−p. The two binary64 numbers neighboring m are 2^52+1 and 2^52+2. m is closer to the first one, so that is the result we require from converting m first to decimal and then to floating-point.
In decimal, m is 4503599627370497.4999…, where there are p−1 trailing 9s. When rounded to p significant digits with ROUND_UP, ROUND_HALF_UP, or ROUND_HALF_EVEN, the result is 4503599627370497.5 = 2^52 + 1 + ½. (Recognize that, at the position where rounding occurs, there are 16 trailing 9s being discarded, effectively a fraction of .9999999999999999 relative to the rounding position. In ROUND_UP, any non-zero discarded amount causes rounding up. In ROUND_HALF_UP and ROUND_HALF_EVEN, a discarded amount greater than ½ at that position causes rounding up.)
2^52 + 1 + ½ is equally close to the neighboring binary64 numbers 2^52+1 and 2^52+2, so the round-to-nearest-ties-to-even method produces 2^52+2.
Thus, the result is 2^52+2, which is not the binary64 value closest to m.
Therefore, no finite precision p suffices to round all rational numbers correctly.
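The counterexample is straightforward to reproduce. A sketch, assuming p = 30 (any large p behaves the same way; the class name is illustrative):
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class ScaleCounterexample {
    public static void main(String[] args) {
        int p = 30;
        // m = 2^52 + 1 + 1/2 - 10^-p, just below the midpoint between 2^52+1 and 2^52+2
        BigDecimal m = new BigDecimal("4503599627370497.5")
                .subtract(BigDecimal.ONE.movePointLeft(p));
        System.out.println(m.doubleValue());        // 4.503599627370497E15 -- correctly rounded: 2^52+1
        BigDecimal rounded = m.round(new MathContext(p, RoundingMode.HALF_EVEN));
        System.out.println(rounded.doubleValue());  // 4.503599627370498E15 -- via p decimal digits: 2^52+2
    }
}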

Is it possible that a number exactly represented as float can NOT be exactly represented as double?

I have a question which arose from another question about precision of floating numbers.
Now, I know that floating points can not always be represented accurately and hence they are stored as the closest possible floating number that can be represented.
My question is actually about the difference in representation of float and double.
Where does this question arise from?
Suppose I do:
System.out.println(.475d+.075d);
then the output is not 0.55 but 0.5499999999999999 (on my machine)
However, when I do :
System.out.println(.475f+.075f);
I get the correct answer, i.e. 0.55 (a little unexpected for me)
Till now I was under the impression that double has more precision (double will be accurate to a longer number of decimal places) than float. So, if a double cannot be represented precisely, then its equivalent float representation will also be stored inaccurately.
However the results I got are a little disturbing for me. I am confused if:
I have an incorrect understanding of what precision means?
float and double are represented differently, apart from the fact that double has more bits?
A number that can be represented as a float can be represented as a double too.
What you see is just formatted output; it does not show the actual binary representation.
System.out.println(Long.toBinaryString(Double.doubleToRawLongBits(.475d + .075d)));
// 11111111100001100110011001100110011001100110011001100110011001
System.out.println(Integer.toBinaryString(Float.floatToRawIntBits(.475f + .075f)));
// 111111000011001100110011001101
double d = .475d + .075d;
System.out.println(d);
// 0.5499999999999999
System.out.println((float)d);
// 0.55 (as expected)
System.out.println((double)(float)d);
// 0.550000011920929
System.out.println( .475f + .075f == 0.550000011920929d);
// true
Precision just means more bits. A number that cannot be represented as a float may have an exact representation as a double, but the number of such cases is vanishingly small relative to the total number of possible cases.
For simple cases like 0.1, the value is not representable as a finite binary floating-point number, no matter how many bits are available. This is the same as saying that a fraction such as 1/7 cannot be represented exactly in decimal, regardless of the number of digits you are allowed to use (as long as the number of digits is finite). You can approximate it as 0.142857142857142857... repeating over and over again, but you will never be able to write it EXACTLY no matter how long you go on.
Conversely, if a number is representable exactly as a float, it will also be representable exactly as a double. A double has a larger exponent range and more mantissa bits.
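This is easy to check: widening float to double is exact, so narrowing back always recovers the original float. A small sketch:
import java.math.BigDecimal;

float f = 0.1f;
double d = f;  // widening conversion: exact, no rounding
System.out.println(f == (float) d);     // true -- holds for every float
System.out.println(new BigDecimal(d));  // 0.100000001490116119384765625, the exact value of 0.1f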
For your example, the cause of the apparent discrepancy is that in float, the difference between 0.475 and its float representation was in the 'right' direction, so that when truncation occurred it went the way you expected. When the available precision increases, the representation is 'closer' to 0.475 but now on the opposite side. As a gross example, suppose the closest possible float were 0.475006 but the closest possible double were 0.474999. This would give you the results you see.
Edit: Here are the results of a quick experiment:
public class Test {
    public static void main(String[] args) {
        float f = 0.475f;
        double d = 0.475d;
        System.out.printf("%20.16f", f);
        System.out.printf("%20.16f", d);
    }
}
Output:
0.4749999940395355 0.4750000000000000
What this means is that the floating-point representation of the number 0.475, if you had a huge number of bits, would be just a tiny bit less than 0.475. This is seen in the double representation. However, the first 'wrong' bit occurs so far to the right that when the value is truncated to fit in a float, it happens to work out to 0.475. This is purely an accident.
If one regards floating-point types as actually representing ranges of values, rather than discrete values (e.g., 0.1f doesn't represent 13421773/134217728, but rather "something between 13421772.5/134217728 and 13421773.5/134217728"), conversions from double to float will usually be accurate, while conversions from float to double will usually not. Unfortunately, Java allows the usually-inaccurate conversions to be performed implicitly, while requiring a typecast in the usually-accurate direction.
For every value of type float, there exists a value of type double whose range is centered about the center of the float's range. That does not mean the double is an accurate representation of the value in the float. For example, converting 0.1f to double yields a value meaning "something between 13421772.9999999/134217728 and 13421773.0000001/134217728", a value which is off by over a million times the implied tolerance.
For almost every value of type double, there exists a value of type float whose range completely includes the range implied by the double. The only exceptions are values whose range is centered precisely on the boundary between two float values. Converting such values to float would require the system to choose one range or the other; if the system rounds up when the double actually represented a number below the center of its range, or vice versa, the range of the float would not totally encompass that of the double. In practical terms, though, this is a non-issue, since it means that instead of a float cast from a double representing a range like (13421772.5/134217728 to 13421773.5/134217728) it would represent a range like (13421772.4999999/134217728 to 13421773.5000001/134217728). Compared with the horrendous imprecision resulting from a float to double cast, that tiny imprecision is nothing.
BTW, returning to the particular numbers you are using, when you do your calculations as float, the computations are:
0.075f = 20132660±½ / 268435456
0.475f = 31876710±½ / 67108864
Sum = 18454938±½ / 33554432
In other words, the sum represents a number somewhere between roughly 0.54999999701 and 0.55000002682. The most natural representation is 0.55 (since the actual value could be more or less than that, additional digits would be meaningless).
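Those fractions can be verified by widening each float to double (an exact conversion) and printing with BigDecimal; the decimal values in the comments are abbreviated:
import java.math.BigDecimal;

System.out.println(new BigDecimal((double) 0.075f));             // 0.07500000298... = 20132660/268435456
System.out.println(new BigDecimal((double) 0.475f));             // 0.47499999403... = 31876710/67108864
System.out.println(new BigDecimal((double) (0.075f + 0.475f)));  // 0.55000001192... = 18454938/33554432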

losing precision converting from java BigDecimal to double

I am working with an application that is based entirely on doubles, and am having trouble in one utility method that parses a string into a double. I've found a fix where using BigDecimal for the conversion solves the issue, but raises another problem when I go to convert the BigDecimal back to a double: I'm losing several places of precision. For example:
import java.math.BigDecimal;
import java.text.DecimalFormat;
public class test {
    public static void main(String[] args) {
        String num = "299792.457999999984";
        BigDecimal val = new BigDecimal(num);
        System.out.println("big decimal: " + val.toString());
        DecimalFormat nf = new DecimalFormat("#.0000000000");
        System.out.println("double: " + val.doubleValue());
        System.out.println("double formatted: " + nf.format(val.doubleValue()));
    }
}
This produces the following output:
$ java test
big decimal: 299792.457999999984
double: 299792.458
double formatted: 299792.4580000000
The formatted double demonstrates that it's lost the precision after the third place (the application requires those lower places of precision).
How can I get BigDecimal to preserve those additional places of precision?
Thanks!
Update after catching up on this post. Several people mention this is exceeding the precision of the double data type. Unless I'm reading this reference incorrectly:
http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.3
then the double primitive has a maximum exponent value of Emax = 2^(K−1) − 1, and the standard implementation has K = 11. So, the max exponent should be 511, no?
You've reached the maximum precision for a double with that number. It can't be done. The value gets rounded up in this case. The conversion from BigDecimal is unrelated and the precision problem is the same either way. See this for example:
System.out.println(Double.parseDouble("299792.4579999984"));
System.out.println(Double.parseDouble("299792.45799999984"));
System.out.println(Double.parseDouble("299792.457999999984"));
Output is:
299792.4579999984
299792.45799999987
299792.458
For these cases double has more than 3 digits of precision after the decimal point. They just happen to be zeros for your number, and that's the closest representation you can fit into a double. It was closer to round up in this case, so your 9's seem to disappear. If you try this:
System.out.println(Double.parseDouble("299792.457999999924"));
You'll notice that it keeps your 9's because it was closer to round down:
299792.4579999999
If you require that all of the digits in your number be preserved then you'll have to change your code that operates on double. You could use BigDecimal in place of them. If you need performance then you might want to explore BCD as an option, although I'm not aware of any libraries offhand.
In response to your update: the maximum exponent for a double-precision floating-point number is actually 1023. That's not your limiting factor here though. Your number exceeds the precision of the 52 fractional bits that represent the significand, see IEEE 754-1985.
Use a floating-point converter to see your number in binary. The exponent is 18, since 262144 (2^18) is nearest. If you take the fraction bits and go up or down one in binary, you can see there's not enough precision to represent your number:
299792.457999999900 // 0010010011000100000111010100111111011111001110110101
299792.457999999984 // here's your number that doesn't fit into a double
299792.458000000000 // 0010010011000100000111010100111111011111001110110110
299792.458000000040 // 0010010011000100000111010100111111011111001110110111
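If you don't have a binary converter handy, the Math helpers expose the same neighborhood directly (a small sketch; printed values abbreviated in the comments):
double d = Double.parseDouble("299792.457999999984");
System.out.println(d);                 // 299792.458 -- the nearest representable double
System.out.println(Math.ulp(d));       // ~5.8e-11 -- the gap between adjacent doubles at this magnitude
System.out.println(Math.nextDown(d));  // the neighbor just below
System.out.println(Math.nextUp(d));    // the neighbor just above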
The problem is that a double can hold only about 15-17 significant digits, while a BigDecimal can hold an arbitrary number of them. When you call doubleValue(), it attempts to apply a rounding mode to remove the excess digits. However, since you have a lot of 9's in the value, they keep getting rounded up to 0, with a carry to the next-highest digit.
To keep as much precision as you can, you need to change the BigDecimal's rounding mode so that it truncates:
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

BigDecimal bd1 = new BigDecimal("12345.1234599999998");
System.out.println(bd1.doubleValue());  // 12345.12346 -- the trailing 9's round up

BigDecimal bd2 = new BigDecimal("12345.1234599999998", new MathContext(15, RoundingMode.FLOOR));
System.out.println(bd2.doubleValue());  // 12345.1234599999 -- truncated to 15 significant digits
Only as many digits are printed as are needed so that parsing the string back to a double yields the exact same value.
Some detail can be found in the javadoc for Double#toString
How many digits must be printed for the fractional part of m or a? There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type double. That is, suppose that x is the exact mathematical value represented by the decimal representation produced by this method for a finite nonzero argument d. Then d must be the double value nearest to x; or if two double values are equally close to x, then d must be one of them and the least significant bit of the significand of d must be 0.
If it's entirely based on doubles ... why are you using BigDecimal? Wouldn't Double make more sense? If the value is too large (or carries too much precision) for that, then ... you can't convert it; that would be the reason to use BigDecimal in the first place.
As to why it's losing precision, from the javadoc
Converts this BigDecimal to a double. This conversion is similar to the narrowing primitive conversion from double to float as defined in the Java Language Specification: if this BigDecimal has too great a magnitude to represent as a double, it will be converted to Double.NEGATIVE_INFINITY or Double.POSITIVE_INFINITY as appropriate. Note that even when the return value is finite, this conversion can lose information about the precision of the BigDecimal value.
You've hit the maximum possible precision for the double. If you would still like to store the value in primitives... one possible way is to store the part before the decimal point in a long
long l = 299792;
double d = 0.457999999984;
Since the integer part no longer consumes any of the double's precision, more digits remain available for the fractional component. This should be easy enough to do with some rounding etc.
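A hypothetical sketch of that split (the names are illustrative, and it assumes a non-negative value containing a decimal point):
String num = "299792.457999999984";
int dot = num.indexOf('.');
long integerPart = Long.parseLong(num.substring(0, dot));
double fractionPart = Double.parseDouble("0" + num.substring(dot));
System.out.println(integerPart);   // 299792
System.out.println(fractionPart);  // 0.457999999984 -- only 12 significant digits, well within a double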
