I've written an arbitrary precision rational number class that needs to provide a way to convert to floating-point. This can be done straightforwardly via BigDecimal:
return new BigDecimal(num).divide(new BigDecimal(den), 17, RoundingMode.HALF_EVEN).doubleValue();
but this requires a value for the scale parameter when dividing the decimal numbers. I picked 17 as the initial guess because that is approximately the precision of a double precision floating point number, but I don't know whether that's actually correct.
What would be the correct number to use, defined as the smallest number such that making it any larger would not make the answer any more accurate?
Introduction
No finite precision suffices.
The problem posed in the question is equivalent to:
What precision p guarantees that converting any rational number x to p decimal digits and then to floating-point yields the floating-point number nearest x (or, in case of a tie, either of the two nearest x)?
To see this is equivalent, observe that the BigDecimal divide shown in the question returns num/den to a selected number of decimal places. The question then asks whether increasing that number of decimal places could increase the accuracy of the result. Clearly, if there is a floating-point number nearer x than the result, then the accuracy could be improved. Thus, we are asking how many decimal places are needed to guarantee the closest floating-point number (or one of the tied two) is obtained.
Since BigDecimal offers a choice of rounding methods, I will consider whether any of them suffices. For the conversion to floating-point, I presume round-to-nearest-ties-to-even is used (which BigDecimal appears to use when converting to Double or Float). I give a proof using the IEEE-754 binary64 format, which Java uses for Double, but the proof applies to any binary floating-point format by changing the 2^52 used below to 2^(w−1), where w is the number of bits in the significand.
Proof
One of the parameters to a BigDecimal division is the rounding method. Java’s BigDecimal has several rounding methods. We only need to consider three, ROUND_UP, ROUND_HALF_UP, and ROUND_HALF_EVEN. Arguments for the others are analogous to those below, by using various symmetries.
In the following, suppose we convert to decimal using any large precision p. That is, p is the number of decimal digits in the result of the conversion.
Let m be the rational number 2^52 + 1 + ½ − 10^−p. The two binary64 numbers neighboring m are 2^52 + 1 and 2^52 + 2. m is closer to the first one, so that is the result we require from converting m first to decimal and then to floating-point.
In decimal, m is 4503599627370497.4999…, where there are p−1 trailing 9s. When rounded to p significant digits with ROUND_UP, ROUND_HALF_UP, or ROUND_HALF_EVEN, the result is 4503599627370497.5 = 2^52 + 1 + ½. (Recognize that, at the position where rounding occurs, there are 16 trailing 9s being discarded, effectively a fraction of .9999999999999999 relative to the rounding position. In ROUND_UP, any non-zero discarded amount causes rounding up. In ROUND_HALF_UP and ROUND_HALF_EVEN, a discarded amount greater than ½ at that position causes rounding up.)
2^52 + 1 + ½ is equally close to the neighboring binary64 numbers 2^52 + 1 and 2^52 + 2, so the round-to-nearest-ties-to-even method produces 2^52 + 2.
Thus, the result is 2^52 + 2, which is not the binary64 value closest to m.
Therefore, no finite precision p suffices to round all rational numbers correctly.
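To make the counterexample concrete, here is a small demonstration (the class name and the specific inputs are my choices; the expected outputs are noted in comments) using p = 17, the scale from the question's code. The fraction encodes m = 2^52 + 1 + ½ − 10^−18; dividing with scale 17 rounds up to …497.5, which the conversion to double then ties to even as …498, while allowing one more decimal digit yields the correctly rounded …497.

import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

public class NoFiniteScaleSuffices {
    public static void main(String[] args) {
        // m = 2^52 + 1 + 1/2 - 10^-18, written as num/den
        BigInteger tenTo18 = BigInteger.TEN.pow(18);
        BigInteger num = BigInteger.ONE.shiftLeft(53).add(BigInteger.valueOf(3))
                                       .multiply(tenTo18).subtract(BigInteger.valueOf(2));
        BigInteger den = tenTo18.multiply(BigInteger.valueOf(2));

        // The question's conversion, scale 17:
        double viaScale17 = new BigDecimal(num)
                .divide(new BigDecimal(den), 17, RoundingMode.HALF_EVEN)
                .doubleValue();

        // The same conversion with scale 18; the extra digit changes the result:
        double viaScale18 = new BigDecimal(num)
                .divide(new BigDecimal(den), 18, RoundingMode.HALF_EVEN)
                .doubleValue();

        System.out.println(viaScale17); // 4.503599627370498E15 (one ulp too high)
        System.out.println(viaScale18); // 4.503599627370497E15 (correctly rounded)
    }
}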
Related
I am working with an application that is based entirely on doubles, and am having trouble in one utility method that parses a string into a double. I've found a fix where using BigDecimal for the conversion solves the issue, but raises another problem when I go to convert the BigDecimal back to a double: I'm losing several places of precision. For example:
import java.math.BigDecimal;
import java.text.DecimalFormat;

public class test {
    public static void main(String[] args) {
        String num = "299792.457999999984";
        BigDecimal val = new BigDecimal(num);
        System.out.println("big decimal: " + val.toString());
        DecimalFormat nf = new DecimalFormat("#.0000000000");
        System.out.println("double: " + val.doubleValue());
        System.out.println("double formatted: " + nf.format(val.doubleValue()));
    }
}
This produces the following output:
$ java test
big decimal: 299792.457999999984
double: 299792.458
double formatted: 299792.4580000000
The formatted double demonstrates that it's lost the precision after the third place (the application requires those lower places of precision).
How can I get BigDecimal to preserve those additional places of precision?
Thanks!
Update after catching up on this post. Several people mention this is exceeding the precision of the double data type. Unless I'm reading this reference incorrectly:
http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.3
then the double primitive has a maximum exponent value of Emax = 2^(K−1) − 1, and the standard implementation has K = 11. So, the max exponent should be 511, no?
You've reached the maximum precision for a double with that number. It can't be done. The value gets rounded up in this case. The conversion from BigDecimal is unrelated and the precision problem is the same either way. See this for example:
System.out.println(Double.parseDouble("299792.4579999984"));
System.out.println(Double.parseDouble("299792.45799999984"));
System.out.println(Double.parseDouble("299792.457999999984"));
Output is:
299792.4579999984
299792.45799999987
299792.458
For these cases double has more than 3 digits of precision after the decimal point. They just happen to be zeros for your number and that's the closest representation you can fit into a double. It's closer for it to round up in this case, so your 9's seem to disappear. If you try this:
System.out.println(Double.parseDouble("299792.457999999924"));
You'll notice that it keeps your 9's because it was closer to round down:
299792.4579999999
If you require that all of the digits in your number be preserved then you'll have to change your code that operates on double. You could use BigDecimal in place of them. If you need performance then you might want to explore BCD as an option, although I'm not aware of any libraries offhand.
In response to your update: the maximum exponent for a double-precision floating-point number is actually 1023. That's not your limiting factor here though. Your number exceeds the precision of the 52 fractional bits that represent the significand, see IEEE 754-1985.
Use a binary floating-point converter to see your number in binary. The exponent is 18, since 262144 (2^18) is the nearest power of two below it. If you take the fractional bits and go up or down one in binary, you can see there's not enough precision to represent your number:
299792.457999999900 // 0010010011000100000111010100111111011111001110110101
299792.457999999984 // here's your number that doesn't fit into a double
299792.458000000000 // 0010010011000100000111010100111111011111001110110110
299792.458000000040 // 0010010011000100000111010100111111011111001110110111
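If you want to check the bit patterns yourself, a minimal sketch (class name is mine) using the standard Double.doubleToLongBits conversion is below; note that the middle two strings parse to the identical double, which is why the trailing 9s appear to vanish.

public class ShowBits {
    public static void main(String[] args) {
        String[] inputs = {
            "299792.457999999900",
            "299792.457999999984",   // the value from the question
            "299792.458000000000",
            "299792.458000000040",
        };
        for (String s : inputs) {
            long bits = Double.doubleToLongBits(Double.parseDouble(s));
            // 64 bits: 1 sign bit, 11 exponent bits, 52 significand bits
            System.out.println(s + " -> "
                    + String.format("%64s", Long.toBinaryString(bits)).replace(' ', '0'));
        }
    }
}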
The problem is that a double can hold only about 15 significant decimal digits, while a BigDecimal can hold an arbitrary number. When you call doubleValue(), the excess digits have to be rounded away. Since your value has a long run of 9s, they get rounded up to 0s, with a carry into the next-higher digit.
To keep as much precision as you can, you need to change the BigDecimal's rounding mode so that it truncates:
BigDecimal bd1 = new BigDecimal("12345.1234599999998");
System.out.println(bd1.doubleValue());
BigDecimal bd2 = new BigDecimal("12345.1234599999998", new MathContext(15, RoundingMode.FLOOR));
System.out.println(bd2.doubleValue());
Only as many digits are printed as are needed so that parsing the string back to a double yields exactly the same value.
Some detail can be found in the javadoc for Double#toString
How many digits must be printed for the fractional part of m or a? There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type double. That is, suppose that x is the exact mathematical value represented by the decimal representation produced by this method for a finite nonzero argument d. Then d must be the double value nearest to x; or if two double values are equally close to x, then d must be one of them and the least significant bit of the significand of d must be 0.
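A quick way to see that quoted guarantee in action (the value is the one from the question): the shortest decimal string that Double.toString produces parses back to the identical double.

double d = Double.parseDouble("299792.457999999984");
String s = Double.toString(d);
System.out.println(s);                          // 299792.458 (the shortest distinguishing string)
System.out.println(Double.parseDouble(s) == d); // true: the string round-trips exactly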
If it's entirely based on doubles ... why are you using BigDecimal? Wouldn't Double make more sense? If the value is too large (or has too much precision) for that, then ... you can't convert it; that would be the reason to use BigDecimal in the first place.
As to why it's losing precision, from the javadoc
Converts this BigDecimal to a double. This conversion is similar to the narrowing primitive conversion from double to float as defined in the Java Language Specification: if this BigDecimal has too great a magnitude to represent as a double, it will be converted to Double.NEGATIVE_INFINITY or Double.POSITIVE_INFINITY as appropriate. Note that even when the return value is finite, this conversion can lose information about the precision of the BigDecimal value.
You've hit the maximum possible precision for the double. If you would still like to store the value in primitives... one possible way is to store the part before the decimal point in a long
long l = 299792;
double d = 0.457999999984;
Since the double no longer has to spend any of its precision on the integer part, it can hold more digits of precision for the fractional component. This should be easy enough to do with some rounding etc.
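A rough sketch of that split (class and variable names are mine), assuming the value is non-negative and its integer part fits in a long:

import java.math.BigDecimal;
import java.math.RoundingMode;

public class SplitValue {
    public static void main(String[] args) {
        BigDecimal val = new BigDecimal("299792.457999999984");
        // Integer part, truncated toward zero, stored exactly in a long
        long integerPart = val.setScale(0, RoundingMode.DOWN).longValueExact();
        // Fractional part alone, so the double's full precision goes to the fraction
        double fraction = val.subtract(BigDecimal.valueOf(integerPart)).doubleValue();
        System.out.println(integerPart); // 299792
        System.out.println(fraction);    // 0.457999999984
    }
}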
class Rextester {
    public static void main(String[] args) {
        double b = 1.13f * 100;
        System.out.println(b);
    }
}
In the above code, when f is not appended to 1.13 the output is 112.99999999999999, but when f is appended the value is 113. Why is this the behaviour?
The f suffix is telling Java that the number is a single-precision floating-point number, instead of a double-precision floating-point number.
The problem with floating-point numbers in general is that certain numbers cannot be represented precisely. Each bit of the significand of the internal representation stands for a fraction with a power of 2 in the denominator: 1/2, 1/4, 1/8, 1/16 and so on. The computer then selects the closest representable number to the one you want.
What is happening in your case is that when you leave out the f, it uses the full double-precision significand and gets the closest double, 112.99999999999999. When you add the f, you are telling the program to round to the closest single-precision float, so the first 9 that doesn't fit gets rounded up and the carry propagates all the way up to 113.
It is a bit of a matter of coincidence for this specific number. Don't assume that using single precision floating point will always give you the expected result. Floating point arithmetic is always a bit messy in computing.
When you add f to the literal, it becomes a float constant, which has only about 6 decimal digits of precision. This makes representation error much more likely and much bigger.
When you drop the f, the literal is a double, which has roughly half a billion times the precision (29 extra bits of significand). This makes the representation error much smaller, and when it is printed as a double you are less likely to see it.
When you print a double, the library expects there to be some representation error and shows you the simplest/shortest number which has the same representation as the double. (There is actually an infinite number of decimal values which map to the same representation.)
However, this implicit rounding will only correct a very small amount and is unlikely to correct for the representation error of a float. Note: if you print using a float instead of a double it will perform greater rounding, hiding the error.
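To make the two explanations above concrete, here is a small sketch (class name is mine) that prints the exact values behind the two literals, using the BigDecimal(double) constructor, which preserves the binary value exactly:

import java.math.BigDecimal;

public class FloatVsDouble {
    public static void main(String[] args) {
        System.out.println(new BigDecimal(1.13f)); // exact value of the float literal, slightly below 1.13
        System.out.println(new BigDecimal(1.13));  // exact value of the double literal, also slightly below 1.13
        System.out.println(1.13f * 100);           // 113.0: the float product rounds up to exactly 113
        System.out.println(1.13 * 100);            // 112.99999999999999
    }
}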
I am writing tests for code performing calculations on floating point numbers. Quite expectedly, the results are rarely exact and I would like to set a tolerance between the calculated and expected result. I have verified that in practice, with double precision, the results are always correct after rounding of last two significant decimals, but usually after rounding the last decimal. I am aware of the format in which doubles and floats are stored, as well as the two main methods of rounding (precise via BigDecimal and faster via multiplication, math.round and division). As the mantissa is stored in binary however, is there a way to perform rounding using base 2 rather than 10?
Just clearing the last 3 bits almost always yields equal results, but if I could push it and instead 'add 2' to the mantissa if its second least significant bit is set, I could probably reach the limit of accuracy. This would be easy enough, except I have no idea how to handle overflow (when all bits 52 through 1 are set).
A Java solution would be preferred, but I could probably port one for another language if I understood it.
EDIT:
As part of the problem was that my code was generic with regard to arithmetic (relying on the scala.Numeric type class), what I did was incorporate the rounding suggested in the answer into a new numeric type, which carries the calculated number (floating point in this case) together with the rounding error, essentially representing a range instead of a point. I then overrode equals so that two numbers are equal if their error ranges overlap (and they share arithmetic, i.e. the number type).
Yes, rounding off binary digits makes more sense than going through BigDecimal and can be implemented very efficiently if you are not worried about being within a small factor of Double.MAX_VALUE.
You can round a floating-point double value x with the following sequence in Java (untested):
double t = 9 * x; // beware: this overflows if x is too close to Double.MAX_VALUE
double y = x - t + t;
After this sequence, y should contain the rounded value. Adjust the distance between the two set bits in the constant 9 in order to adjust the number of bits that are rounded off. The value 3 rounds off one bit. The value 5 rounds off two bits. The value 17 rounds off four bits, and so on.
This sequence of instructions is attributed to Veltkamp and is typically used in “Dekker multiplication”. This page has some references.
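A quick demonstration (class and method names are mine) of the sequence above, using the constant 9 = 2^3 + 1 to round off the last three significand bits of a value near 1:

public class RoundBits {
    static double roundOffThreeBits(double x) {
        double t = 9 * x;   // overflows if x is within roughly a factor of 9 of Double.MAX_VALUE
        return x - t + t;
    }

    public static void main(String[] args) {
        double x = 1.0 + 7 * Math.ulp(1.0);   // 1 with the three lowest significand bits set
        double y = roundOffThreeBits(x);
        System.out.println(Long.toHexString(Double.doubleToLongBits(x))); // 3ff0000000000007
        System.out.println(Long.toHexString(Double.doubleToLongBits(y))); // 3ff0000000000008 (rounded up)
    }
}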
I have the following statement:
float diff = tempVal - m_constraint.getMinVal();
tempVal is declared as a float and the getMinVal() returns a float value.
I have the following print out:
diff=0.099999905, tempVal=5.1, m_constraint.getMinVal()=5.0
I expect diff to be 0.1, not the number above. How can I achieve that?
Floats use IEEE 754 to represent numbers, and that representation introduces rounding errors.
Floating point guide
What Every Computer Scientist Should Know About Floating-Point Arithmetic
Wikipedia on IEEE 754
Bottom line: if you are doing arithmetic that needs to be exact, don't use float or double; use BigDecimal.
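For this particular subtraction, a minimal sketch of the BigDecimal approach (values taken from your printout; assumes java.math.BigDecimal is imported):

BigDecimal diff = new BigDecimal("5.1").subtract(new BigDecimal("5.0"));
System.out.println(diff); // 0.1, exactly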
Because of the way they store values internally, floats and doubles can only store with complete accuracy those numbers which can be decomposed into a sum of powers of 2 (and then only within certain constraints on their absolute and relative magnitude).
So as soon as you attempt to store, or perform a calculation involving, a number which cannot be stored exactly, you are going to get an error in the final digits.
Usually this isn't a problem provided you use floats and doubles with some precaution:
use a size of floating point primitive which has "spare" digits of precision beyond what you need;
for many applications, this probably means don't use float at all (use double instead): it has very poor precision and, with the exception of division, has no performance benefit on many processors;
when printing FP numbers, only actually print and consider the number of digits of precision that you need, and certainly don't include the final digit (use String.format to help you; see the sketch after this list);
if you need an arbitrary number of digits of precision, use BigDecimal instead.
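As a small illustration of the formatting advice (values taken from the question; the chosen number of digits is arbitrary):

float tempVal = 5.1f;
float minVal = 5.0f;
float diff = tempVal - minVal;
System.out.println(diff);                        // 0.099999905, as in the question
System.out.println(String.format("%.4f", diff)); // 0.1000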
You cannot get exact results with floating-point numbers. You might need to use a fixed-point library for that. See: http://sourceforge.net/projects/jmfp/
Java encodes real numbers using the binary floating-point representations defined in IEEE 754. Like all finite representations, it cannot represent all real numbers exactly, because there are far more real numbers than possible representations. Numbers which cannot be represented exactly (like 0.1 in your case) are rounded to the nearest representable number.
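As an illustration of that rounding, you can print the exact value the literal 0.1 is stored as (the BigDecimal(double) constructor preserves the double's exact binary value):

System.out.println(new java.math.BigDecimal(0.1));
// 0.1000000000000000055511151231257827021181583404541015625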