The documentation of the SE Math library is, thankfully, very transparent about rounding errors:
If a method always has an error less than 0.5 ulps, the method always returns the floating-point number nearest the exact result; such a method is correctly rounded. A correctly rounded method is generally the best a floating-point approximation can be; however, it is impractical for many floating-point methods to be correctly rounded. Instead, for the Math class, a larger error bound of 1 or 2 ulps is allowed for certain methods. Informally, with a 1 ulp error bound, when the exact result is a representable number, the exact result should be returned as the computed result; otherwise, either of the two floating-point values which bracket the exact result may be returned.
And every floating-point method mentions its error bounds in ulps. In particular, for Math.log():
Returns the natural logarithm (base e) of a double value...The computed result must be within 1 ulp of the exact result
Therefore, Math.log() will possibly round to the nearest representable value in the wrong direction.
I need a correctly rounded implementation of base-e log. Where might I find one?
Related
This question already has an answer here:
Why is pow(-infinity, positive non-integer) +infinity?
(1 answer)
Closed 5 years ago.
I tried two different ways to find the square root in Java:
Math.sqrt(Double.NEGATIVE_INFINITY); // NaN
Math.pow(Double.NEGATIVE_INFINITY, 0.5); // Infinity
Why doesn't the second way return the expected answer which is NaN (same as with the first way)?
A NaN is returned (under IEEE 754) in order to continue a computation when a truly undefined (intermediate) result has been obtained. An infinity is returned in order to continue a computation after an overflow has occurred.
Thus the behaviour
Math.sqrt(Double.NEGATIVE_INFINITY); // NaN
is specified because it is known (easily and quickly) that an undefined value has been generated; based solely on the sign of the argument.
However evaluation of the expression
Math.pow(Double.NEGATIVE_INFINITY, 0.5); // Infinity
encounters both an overflow AND an invalid operation. However the invalid operation recognition is critically dependent on how accurate the determination of the second argument is. If the second argument is the result of a prior rounding operation, then it may not be exactly 0.5. Thus the less serious determination, recognition of an overflow, is returned in order to avoid critical dependence of the result on the accuracy of the second argument.
Additional details on some of the reasoning behind the IEEE 754 standard, including the reasoning behind returning flag values instead of generating exceptions, is available in
What Every Computer Scientist Should Know About Floating-Point Arithmetic (1991, David Goldberg),
which is Appendix D of
Sun Microsystems Numerical Computation Guide.
It is just acting as is described in the documentation of Math.
For Math.sqrt:
If the argument is NaN or less than zero, then the result is NaN.
For Math.pow:
If
the first argument is negative zero and the second argument is less than zero but not a finite odd integer, or
the first argument is negative infinity and the second argument is greater than zero but not a finite odd integer,
then the result is positive infinity.
As to why they made that design choice - you'll have to ask the authors of java.
I am writing tests for code performing calculations on floating point numbers. Quite expectedly, the results are rarely exact and I would like to set a tolerance between the calculated and expected result. I have verified that in practice, with double precision, the results are always correct after rounding of last two significant decimals, but usually after rounding the last decimal. I am aware of the format in which doubles and floats are stored, as well as the two main methods of rounding (precise via BigDecimal and faster via multiplication, math.round and division). As the mantissa is stored in binary however, is there a way to perform rounding using base 2 rather than 10?
Just clearing the last 3 bits almost always yields equal results, but if I could push it and instead 'add 2' to the mantissa if its second least significast bit is set, I could probably reach the limit of accuracy. This would be easy enough, expect I have no idea how to handle overflow (when all bits 52-1 are set).
A Java solution would be preferred, but I could probably port one for another language if I understood it.
EDIT:
As part of the problem was that my code was generic with regards to arithmetic (relying on scala.Numeric type class), what I did was an incorporation of rounding suggested in the answer into a new numeric type, which carried the calculated number (floating point in this case) and rounding error, essentially representing a range instead of a point. I then overrode equals so that two numbers are equal if their error ranges overlap (and they share arithmetic, i.e. the number type).
Yes, rounding off binary digits makes more sense than going through BigDecimal and can be implemented very efficiently if you are not worried about being within a small factor of Double.MAX_VALUE.
You can round a floating-point double value x with the following sequence in Java (untested):
double t = 9 * x; // beware: this overflows if x is too close to Double.MAX_VALUE
double y = x - t + t;
After this sequence, y should contain the rounded value. Adjust the distance between the two set bits in the constant 9 in order to adjust the number of bits that are rounded off. The value 3 rounds off one bit. The value 5 rounds off two bits. The value 17 rounds off four bits, and so on.
This sequence of instruction is attributed to Veltkamp and is typically used in “Dekker multiplication”. This page has some references.
This question already has answers here:
Is floating point math broken?
(31 answers)
Rounding oddity - what is special about "100"? [duplicate]
(2 answers)
Closed 9 years ago.
As I understand this, some numbers can't be represented with exactitude in binary, and that's why floating-point arithmetic sometimes gives us unexpected results; like 4.35 * 100 = 434.99999999999994. Something similar to what happens with 1/3 in decimal.
That makes sense, but this induces another question. Seems that in binary both 4.35 and 435 can be represented with exactitude. That's when it stops making sense to me. Why does 4.35 * 100 evaluates to 434.99999999999994? 435 and 4.35 have an exact representation in the double type dynamics:
double number1 = 4.35;
double number2 = 435;
double number3 = 100;
System.out.println(number1); // 4.35
System.out.println(number2); // 435.0
System.out.println(number3); // 100.0
// So far so good. Everything ok.
System.out.println(number1 * number3); // 434.99999999999994 !!!
// But 4.35 * 100 evaluates to 434.99999999999994
Why?
Edit: this question was marked as duplicate, and it is not. As you can see in the accepted answer, my confusion was regarding the discrepancy between the actual value and the printed value.
Seems that in binary both 4.35 and 435 can be represented with exactitude.
I see that you understand how the floating point numbers are internally represented. As for your doubt, no 4.35 does not have an exact binary representation. So the issue is, why the 1st print statement prints 4.35.
That is happening because System.out.println() invokes the Double.toString(double) method, which in turns uses FloatingDecimal#toJavaFormatString() method, which performs some rounding internally on the passed double argument. You can go through the source code I linked.
For seeing the actual value of 4.35, try using this:
BigDecimal bd = new BigDecimal(number1);
System.out.println(bd);
This will print:
4.3499999999999996447286321199499070644378662109375
In this case, rather than printing the double value, you create a BigDecimal object passing double value as argument. BigDecimal represents arbitrary precision signed decimal number. So it gives you the exact value of 4.35.
You are right in that sometimes floating-point arithmetic gives unexpected results.
Your assertion that 4.35 can be represented exactly in floating-point is incorrect, because it can't be represented as a terminating binary decimal. 100 can obviously be represented exactly, so for the result to be 434.99999999999994, `4.35 must not be represented exactly.
To be represented exactly in floating-point, a number must be able to be converted to a fraction where the denominator is a power of two only (and it must not be so precise that it exceeds the maximum precision of the floating-point type you're using). In this case, 4.35 is 4 7/20, and the denominator has a factor of 5, so the number can't be represented exactly in binary.
Although from a hardware perspective each floating-point number represents some exact value of the form M * 2^E (where M and E are integers in a certain range), from a software perspective it is more helpful to think of each floating-point number as representing "Something for which M * 2^E has been deemed the best representation, and which is hopefully close to that". Given a floating-point value (M * 2^E), one should figure that the actual number it's intended to represent may very easily be anywhere from (N - 1/2) * 2^E to (N + 1/2) * 2^E and in practice may extend a bit further beyond.
As a simple example, with type float, the value of M is limited to the range 0-16777215. The best representation of 2000000.1f is thus 16000001 * 2^-3 [i.e. 16000001/8]. Although exact decimal value of 16000001/8 is 2000000.125, the last digit isn't necessary to define the value of the number, since 16000001/8 would the best representation of 2000000.120 and 2000000.129 (or, for that matter, all values between 2000000.0625 and 2000000.1875, non-inclusive). Because the number of digits that would required to display the exact decimal value of a number of the form M * 2^E would often far exceed the number of meaningful digits, it is common to limit number of displayed digits to roughly those necessary to uniquely define the value.
Note that if one regards floating-point numbers as representing ranges, one will observe that casts from double to float--even though they must be explicitly specified--are actually safe since converting the double that best represents a particular value to float will yield either the best float representation of that value or something very close to it. Conversely, conversion from float to double, even though it's allowed implicitly, is dangerous because such conversion is very unlikely to select the double which would best represent the number that the float was supposed to represent.
it is a bit hard to explain in English, because I have learned computer number representation in Hungarian. In short, 4.35, 435 nor 100 is not exactly these numbers, but mantissa * 2^k (k-characteristic from -k to +k, and t - is the length of the mantissa in the M = (t,-k,+k) ) although the print call does some rounding. So the number-line is not continuous, but near some famous points, denser ).
So as I think these numbers are not exactly what you expect, and after the operation (I suppose this is one or two simple binary operation) you get the multiple of error distance of the two float point number representation.
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Moving decimal places over in a double
I am having this weird problem in Java, I have following code:
double velocity = -0.07;
System.out.println("raw value " + velocity*200 );
System.out.println("floored value " + Math.floor(velocity*200) );
I have following output:
raw value -14.000000000000002
floored value -15.0
Those traling 0002 screw everything up, and BTW there should not be that traling 2, I think it should be all zeroes after decimal point, can I get rid of that 2?
Update: Thanks for help, Guys do you know any way to make floor rounding on BigDecimal object without calling doubleValue method?
Because floor(-14.000000000000002) is indeed -15!
You see, floor is defined as the maximal whole number less or equal to the argument. As -14.000000000000002 is not a whole number, the closest whole number downwards is -15.
Well, now let's clear why -0.07 * 200 is not exactly -14. This is because the inner representation of floating-point numbers is in base 2, so the fractions where the denominator is not a power of 2 cannot be represented with 100% precision. (The same way as you cannot represent 1/3 as the decimal fraction with finite amount of decimal places.) So, the value of velocity is not exactly -0.07. (When the compiler sees the constant -0.07, it silently replaces it with a binary fraction which is quite close to -0.07, but not actually equal to.) This is why velocity * 200 is not exactly -14.
From The Floating-Point Guide:
Why don’t my numbers, like 0.1 + 0.2 add up to a nice round 0.3, and instead I get a weird result like 0.30000000000000004?
Because internally, computers use a format (binary floating-point)
that cannot accurately represent a number like 0.1, 0.2 or 0.3 at all.
When the code is compiled or interpreted, your “0.1” is already
rounded to the nearest number in that format, which results in a small
rounding error even before the calculation happens.
If you need numbers that exactly add up to specific expected values, you cannot use double. Read the linked-to site for details.
Use BigDecimal... The problem above is a well-known rounding problem with the representation schemes used on a computer with finite-memory. The problem is that the answer is repetitive in the binary (that is, base 2) system (i.e. like 1/3 = 0.33333333... with decimal) and cannot be presented correctly. A good example of this is 1/10 = 0.1 which is 0.000110011001100110011001100110011... in binary. After some point the 1s and 0s have to end, causing the perceived error.
Hope you're not working on life-critical stuff... for example http://www.ima.umn.edu/~arnold/disasters/patriot.html. 28 people lost their lives due to a rounding error.
Java doubles follow the IEEE 754 floating-point arithmetic, which can't represent every single real number with infinite accuracy. This round up is normal. You can't get rid of it in the internal representation. You can of course use String.format to print the result.
double d=1/0.0;
System.out.println(d);
It prints Infinity , but if we will write double d=1/0; and print it we'll get this exception: Exception
in thread "main" java.lang.ArithmeticException: / by zero
at D.main(D.java:3) Why does Java know in one case that diving by zero is infinity but for the int 0 it is not defined?
In both cases d is double and in both cases the result is infinity.
Floating point data types have a special value reserved to represent infinity, integer values do not.
In your code 1/0 is an integer division that, of course, fails. However, 1/0.0 is a floating point division and so results in Infinity.
strictly speaking, 1.0/0.0 isn't infinity at all, it's undefined.
As David says in his answer, Floats have a way of expressing a number that is not in the range of the highest number it can represent and the lowest. These values are collectively known as "Not a number" or just NaNs. NaNs can also occur from calculations that really are infinite (such as limx -> 0 ln2 x), values that are finite but overflow the range floats can represent (like 10100100), as well as undefined values like 1/0.
Floating point numbers don't quite clearly distinguish among undefined values, overflow and infinity; what combination of bits results from that calculation depends. Since just printing "NaN" or "Not a Number" is a bit harder to understand for folks that don't know how floating point values are represented, that formatter just prints "Infinity" or sometimes "-Infinity" Since it provides the same level of information when you do know what FP NaN's are all about, and has some meaning when you don't.
Integers don't have anything comparable to floating point NaN's. Since there's no sensible value for an integer to take when you do 1/0, the only option left is to raise an exception.
The same code written in machine language can either invoke an interrupt, which is comparable to a Java exception, or set a condition register, which would be a global value to indicate that the last calculation was a divide by zero. which of those are available varies a bit by platform.