I use doubles for a uniform implementation of some arithmetic calculations. These calculations may actually be applied to integers too, but there are no C++-like templates in Java and I don't want to duplicate the implementation code, so I simply use the "double" version for ints.
Does the JVM spec guarantee the correctness of integer operations such as <=, >=, +, -, *, and / (when the remainder is 0) if the operations are emulated as the corresponding floating-point ops?
(Any integer involved is, of course, small enough to be represented exactly in a double's mantissa.)
According to the Java Language Specification:
Operators on floating-point numbers behave as specified by IEEE 754 (with the exception of the remainder operator (§15.17.3)).
So you're guaranteed uniform behaviour, and while I don't have access to the official IEEE standard document, I'm pretty sure that it implicitly guarantees that operations on integers that can be represented exactly as a float/double work as expected.
Briefly: yes.
double a = 3.0;
double b = 2.0;
System.out.println(a*b); // 6.0
System.out.println(a+b); // 5.0
System.out.println(a-b); // 1.0
System.out.println(a/b); // 1.5; cast to int, e.g. (int) (a / b), if you want the integer result 1
System.out.println(a>=b); // true
System.out.println(a<=b); // false
But be careful with multiplication (*): a*b may overflow the integer range when you cast the result back to an int. The same applies to + and -.
Indeed, I've found the standard and it says "yes".
JVM spec:
The rounding operations of the Java virtual machine always use IEEE 754 round to
nearest mode. Inexact results are rounded to the nearest representable value, with ties going to the value with a zero least-significant bit. This is the IEEE 754 default mode. But Java virtual machine instructions that convert values of floating-point types to values of integral types round toward zero. The Java virtual machine does not give any means to change the floating-point rounding mode.
ANSI/IEEE Std 754-1985 5.
... Except for binary <---> decimal conversion, each of the operations shall be performed as if it first produced an intermediate result correct to infinite precision and with unbounded range, and then coerced this intermediate result to fit in the destination’s format
ANSI/IEEE Std 754-1985 5.4.
Conversions between floating-point integers and integer formats shall be exact unless an exception arises as specified in 7.1.
Summary
1) The basic operations are exact whenever the exact result fits in the double format (and, therefore, an integer result is always a floating-point integer).
2) int <--> double conversions are always exact for floating-point integers.
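To convince yourself, here is a minimal sketch (not a proof) that brute-force checks a few million random cases; the class name and the 2^20 bound are arbitrary choices so that every product fits comfortably in the 53-bit significand:
// Integer-valued doubles behave exactly for +, -, * and comparisons
// as long as every operand and result fits in double's 53-bit significand.
import java.util.Random;
public class IntAsDoubleCheck {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            int a = rnd.nextInt(1 << 20);   // products stay below 2^40 < 2^53
            int b = rnd.nextInt(1 << 20);
            double da = a, db = b;
            if ((double) (a + b) != da + db) throw new AssertionError("+");
            if ((double) (a - b) != da - db) throw new AssertionError("-");
            if ((double) ((long) a * b) != da * db) throw new AssertionError("*");
            if ((a <= b) != (da <= db)) throw new AssertionError("<=");
        }
        System.out.println("All checks passed");
    }
}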
I ran exactly the same code in Java and C# and it gave two different results.
Why? The doubles in both languages have the exact same specification:
double is a type that represents 64-bit IEEE 754 floating-point number
in Java
double is a type that represents 64-bit double-precision number in
IEEE 754 format in C#.
Java
double a = Math.pow(Math.sin(3), 2);
double b = Math.pow(Math.cos(3), 2);
System.out.println(a); // 0.01991485667481699
System.out.println(b); // 0.9800851433251829
System.out.println(a+b); // 0.9999999999999999
C#
double a = Math.Pow(Math.Sin(3), 2);
double b = Math.Pow(Math.Cos(3), 2);
Console.WriteLine(a); // 0.019914856674817
Console.WriteLine(b); // 0.980085143325183
Console.WriteLine(a+b); // 1
It's just the precision that C# is using with the WriteLine method. See https://msdn.microsoft.com/en-us/library/dwhawy9k.aspx#GFormatString where it specifies that the G format specifier gives 15-digit precision.
If you write:
Console.WriteLine(a.ToString("R"));
it prints 0.019914856674816989.
The root of it is that floating-point numbers are imprecise, and the result depends on exactly how a computation is carried out, so you can't rely on two different implementations producing bit-identical output. But they're close.
Most likely the difference is because the CLR is allowed to work with doubles as 80-bit numbers internally. You never see more than 64 bits, but the processor may work with 80. I'm unsure how Java handles floating-point numbers internally; it could possibly be the same.
There's tons on the topic, but here's some random light reading from Google which may be of interest.
IEEE 754 makes a distinction between required operations and optional operations:
required operations, like addition, subtraction, etc., must be correctly rounded
optional operations are not required to be correctly rounded; the list of these operations includes all trigonometric functions (and others), and they are left to the implementation
So you have no guarantee from the standard that sin and cos implementations will match between platforms.
More information here or here.
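If you want identical trigonometric results on every JVM and platform, one option (a minimal sketch, not something the question requires) is StrictMath, which is specified to reproduce the fdlibm reference algorithms:
// Math.sin/cos may use platform-specific intrinsics, so results can differ
// slightly across JVMs and architectures. StrictMath results are defined
// by the specification and are therefore reproducible.
double a = StrictMath.pow(StrictMath.sin(3), 2);
double b = StrictMath.pow(StrictMath.cos(3), 2);
System.out.println(a + b);   // same result on every conforming JVM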
Our teacher asked us to research this, and all I kept finding on the net were explanations of what double and float mean.
Can you tell me whether it is possible or not, and explain why or why not?
Simple answer: yes, but only if the double is not too large.
floats are single-precision floating-point numbers, meaning they use a 23-bit mantissa and 8-bit exponent, corresponding to ~6/7 s.f. of precision and a range of ~10^38.
doubles are double-precision, with a 52-bit mantissa and 11-bit exponent, corresponding to ~15/16 s.f. of precision and a range of ~10^308.
Since doubles have a much larger range than floats, adding a float to a very large double will nullify the float's effect (the small value is simply absorbed by rounding). Of course this can happen for two doubles as well.
https://en.wikipedia.org/wiki/Floating_point
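A tiny illustration of that absorption effect (the specific values here are just an arbitrary example):
// Adding a small float to a sufficiently large double leaves it unchanged:
// 1.0f is far smaller than the spacing between doubles near 1e20, so the
// sum rounds right back to 1e20.
double big = 1.0e20;
float small = 1.0f;
System.out.println(big + small == big);   // true
System.out.println(big + small);          // 1.0E20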
Can you add two numbers with varying decimal places (e.g. 432.54385789364 + 432.1)? Yes you can.
In Java, it is the same idea.
From the Java Tutorials:
float: The float data type is a single-precision 32-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. As with the recommendations for byte and short, use a float (instead of double) if you need to save memory in large arrays of floating point numbers. This data type should never be used for precise values, such as currency. For that, you will need to use the java.math.BigDecimal class instead. Numbers and Strings covers BigDecimal and other useful classes provided by the Java platform.
double: The double data type is a double-precision 64-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. For decimal values, this data type is generally the default choice. As mentioned above, this data type should never be used for precise values, such as currency.
Basically, they both hold decimal values. The way they differ is in how precise they can be. A float is 32 bits in size, compared to a double, which is 64 bits in size. A float gives you roughly 6-7 significant decimal digits of precision, and a double roughly 15-16.
Basically... a double can store a decimal value more accurately than a float... but takes up more space.
To answer your question, you can add a float to a double and vice versa. Generally, the result will be made into a double, and you will have to cast it back to a float if that is what you want.
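A short sketch of what that mixing looks like in practice (the variable names are arbitrary):
// Mixing float and double: the float operand is widened to double, the
// whole expression evaluates as double, and narrowing back needs a cast.
float f = 1.1f;
double d = 2.2;
double sum = f + d;          // f is promoted to double before the addition
float back = (float) sum;    // explicit cast required; may lose precision
System.out.println(sum);
System.out.println(back);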
If you want to be really deep about it, you should say yes, it is possible due to value coercion, but that it opens the door for more severe precision errors to accumulate invisibly to the compiler. float has substantially less precision than double. Note that unsuffixed floating-point literals in Java are already double; a float literal needs an explicit f suffix, so prefer double unless you genuinely need float.
These precision errors can lead to serious harm and even loss of life in sensitive systems.
Floating point is very hard to use correctly and should be avoided if possible. One extremely obvious thing not to do that is commonly mistakenly done is representing currency as a float or double. This can cause real money to be effectively given to or stolen from people.
Floating point (preferring double) is appropriate for approximate calculations and certain high performance scientific computing applications. However it is still extremely important to be aware of the precision loss characteristics particularly when a resulting floating point value is fed into further floating-point calculations.
This leads more generally into numerical computing, and now I've really gone afield :)
SAS has a decent paper on this:
http://support.sas.com/resources/papers/proceedings11/275-2011.pdf
I am executing the following code in Java, but I get two different answers for what should mathematically be the same number.
public class TestClass {
    public static void main(String[] args) {
        double a = 0.01;
        double b = 4.5;
        double c = 789;
        System.out.println("Value1---->" + (a * b * c));
        System.out.println("Value2---->" + (b * c * a));
    }
}
Output:
Value1---->35.504999999999995
Value2---->35.505
Floating-point numbers have a certain precision. Some fractions cannot be represented exactly as floating-point numbers, which is why rounding errors occur.
The results are different because of the precedence of the calculations. Each of your calculations consists of two multiplications. The multiply * operator in Java has a left to right associativity. That means that in (a*b*c), a*b is calculated first and then multiplied by c as in ((a*b)*c). One of those calculation chains happens to produce a rounding error because a number in it simply can't be represented as a floating point number.
Essentially, Java uses binary floating-point values to handle all of its decimal-based operations. As mentioned in another answer, here is a link to IEEE 754, which addresses the issue you've encountered. And as also mentioned in Joshua Bloch's Effective Java, refer to Item 48, "Avoid float and double if exact answers are required":
In summary, don’t use float or double for any calculations that require an
exact answer. Use BigDecimal if you want the system to keep track of the decimal
point and you don’t mind the inconvenience and cost of not using a primitive type.
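For completeness, here is what the same computation looks like with BigDecimal (assuming import java.math.BigDecimal); constructing the values from strings keeps the inputs exact, and the result is the same in either order:
// Exact decimal arithmetic: no rounding error, so order does not matter.
BigDecimal a = new BigDecimal("0.01");
BigDecimal b = new BigDecimal("4.5");
BigDecimal c = new BigDecimal("789");
System.out.println(a.multiply(b).multiply(c));   // 35.505
System.out.println(b.multiply(c).multiply(a));   // 35.505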
It is because the type double is an approximation.
double in Java is the IEEE 754 binary64 format (a binary format, not a decimal one).
To work around this problem, use Math.round() or the BigDecimal class.
Multiplication of floating points uses a process that introduces precision errors.
To quote Wikipedia:
"To multiply, the significands are multiplied while the exponents are added, and the result is rounded and normalized."
Java multiplies from left to right. In your example, the first parts (a * b and b * c) actually produce no precision errors.
So your final multiplications end up as:
System.out.println("Value1---->" + (0.045 * 789));
System.out.println("Value2---->" + (3550.5 * 0.01));
Now, 0.045 * 789 produces a precision error due to that floating point multiplication process. Whereas 3550.5 * 0.01 does not.
Because double * double yields a double, and that is not totally precise.
Try the following code:
System.out.println(1.0 - 0.9 - 0.1); // -2.7755575615628914E-17
If you want totally precise real numbers, use BigDecimal instead!
This is because double has finite precision. A binary representation can't store the value of, for example, 0.01 exactly. See also the Wikipedia entry on double-precision floating-point numbers.
Order of multiplication can change the way that representation errors are accumulated.
Consider using BigDecimal class, if you need precision.
As the JavaDoc says, double is a floating-point type, and it's imprecise by nature. That's why two mathematically equivalent computations can yield different results: the floating-point type (double) is an approximation.
See http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html :
double: The double data type is a double-precision 64-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. For decimal values, this data type is generally the default choice. As mentioned above, this data type should never be used for precise values, such as currency.
See also the wikipedia http://en.wikipedia.org/wiki/Floating_point :
The floating-point representation is by far the most common way of
representing in computers an approximation to real numbers.
http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
Quote: "double: The double data type is a double-precision 64-bit IEEE 754 floating point."
When you dig into IEEE 754 you will understand how doubles are stored in memory.
For such calculations I would recommend http://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html
See this reply from 2011: Java: Why should we use BigDecimal instead of Double in the real world?
It's called loss of precision and is very noticeable when working with either very big numbers or very small numbers.
See the section "Decimal numbers are approximations" and read down.
As mentioned, there is an issue with floating-point precision. You can either use printf, or you can use Math.round() like so (change the number of zeros to adjust the precision):
System.out.println("Value 1 ----> " + (double) Math.round((a*b*c) * 100000) / 100000);
System.out.println("Value 2 ----> " + (double) Math.round((b*c*a) * 100000) / 100000);
Output
Value 1 ----> 35.505
Value 2 ----> 35.505
When I write something like
double a = 0.0;
double b = 0.0;
double c = a/b;
The result is Double.NaN, but when I try the same for integers, it produces an ArithmeticException. So, why isn't there an Integer.NaN?
The answer has very little to do with Java. Infinity and undefined values are not part of the integer set, so they are excluded from Integer, whereas floating-point arithmetic can produce results that are undefined (such as 0.0/0.0) or out of range, so NaN and the infinities were included in the floating-point types to represent them.
For the same reason that there is no integer NaN in any other language.
Modern computers use 2's complement binary representation for integers, and that representation doesn't have a NaN value. (All values in the domain of the representation type represent definite integers.)
It follows that computer integer arithmetic hardware does not recognize any NaN representation.
In theory, someone could invent an alternative representation for integers that includes NaN (or INF, or some other exotic value). However, arithmetic using such a representation would not be supported by the hardware. While it would be possible to implement it in software, it would be prohibitively expensive[1]... and it would also be undesirable in other respects to include this support in the Java language.
[1] It is of course relative, but I'd anticipate that a software implementation of NaNs would be (at least) an order of magnitude slower than hardware. If you actually, really needed this, then that would be acceptable. But the vast majority of integer arithmetic code doesn't need it. In most cases throwing an exception for "divide by zero" is just fine, and an order-of-magnitude slowdown in all integer arithmetic operations is... not acceptable.
By contrast:
the "unused" values in the representation space already exist
NaN and INF values are part of the IEEE floating-point standard, and
they are (typically) implemented by the native hardware implementation of floating point arithmetic
As noted in other comments, it's largely because NaN is a standard value for floating point numbers. You can read about the reasons NaN would be returned on Wikipedia here:
http://en.wikipedia.org/wiki/NaN
Notice that only one of these reasons applies to integers (divide by zero). There are also positive and negative infinity values for floating-point numbers, which integers don't have and which are closely linked to NaN in the floating-point specification.
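A few of those floating-point behaviours, side by side (just an illustrative snippet, not part of the original answers):
// Results that have no integer counterpart; int division by zero throws instead.
System.out.println(0.0 / 0.0);                   // NaN
System.out.println(1.0 / 0.0);                   // Infinity
System.out.println(-1.0 / 0.0);                  // -Infinity
System.out.println(Double.NaN == Double.NaN);    // false: NaN is not equal to anything, even itself
System.out.println(Double.isNaN(0.0 / 0.0));     // true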
What is the inclusive range of float and double in Java?
Why are you not recommended to use float or double for anything where precision is critical?
Java's Primitive Data Types
boolean:
Represents one bit of information; may take on the values true and false only (its actual storage size is not precisely defined).
byte:
1 signed byte (two's complement). Covers values from -128 to 127.
short:
2 bytes, signed (two's complement), -32,768 to 32,767
int:
4 bytes, signed (two's complement). -2,147,483,648 to 2,147,483,647.
long:
8 bytes signed (two's complement). Ranges from -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807.
float:
4 bytes, IEEE 754. Covers a range from 1.40129846432481707e-45 to 3.40282346638528860e+38 (positive or negative).
double:
8 bytes IEEE 754. Covers a range from 4.94065645841246544e-324d to 1.79769313486231570e+308d (positive or negative).
char:
2 bytes, unsigned, Unicode, 0 to 65,535
Java's Double class has members containing the Min and Max value for the type.
2^-1074 <= x <= (2-2^-52)·2^1023 // where x is the double.
Check out the MIN_VALUE and MAX_VALUE static final members of Double.
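Printed directly (note that MIN_VALUE is the smallest positive value, 2^-1074, not the most negative one):
System.out.println(Double.MIN_VALUE);   // 4.9E-324
System.out.println(Double.MAX_VALUE);   // 1.7976931348623157E308
System.out.println(Float.MIN_VALUE);    // 1.4E-45
System.out.println(Float.MAX_VALUE);    // 3.4028235E38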
(Some) people advise against using floating-point types for things where accuracy and precision are critical, because rounding errors can throw off calculations by measurable (small) amounts.
Binary floating-point numbers have interesting precision characteristics, since the value is stored as a binary integer raised to a binary power. When dealing with sub-integer values (that is, values between 0 and 1), negative powers of two "round off" very differently than negative powers of ten.
For example, the number 0.1 can be represented as 1 × 10^-1, but there is no combination of base-2 exponent and mantissa that can precisely represent 0.1 -- the closest you get is 0.10000000000000001.
So if you have an application where you are working with values like 0.1 or 0.01 a great deal, but where small (less than 0.000000000000001%) errors cannot be tolerated, then binary floating-point numbers are not for you.
Conversely, if powers of ten are not "special" to your application (powers of ten are important in currency calculations, but not in, say, most applications of physics), then you are actually better off using binary floating-point, since it's usually at least an order of magnitude faster, and it is much more memory efficient.
The article from the Python documentation on floating point issues and limitations does an excellent job of explaining this issue in an easy to understand form. Wikipedia also has a good article on floating point that explains the math behind the representation.
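One way to see the exact binary value that the literal 0.1 actually stores (the BigDecimal(double) constructor preserves it digit for digit):
System.out.println(new java.math.BigDecimal(0.1));
// 0.1000000000000000055511151231257827021181583404541015625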
From Primitives Data Types:
float: The float data type is a single-precision 32-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in section 4.2.3 of the Java Language Specification. As with the recommendations for byte and short, use a float (instead of double) if you need to save memory in large arrays of floating point numbers. This data type should never be used for precise values, such as currency. For that, you will need to use the java.math.BigDecimal class instead. Numbers and Strings covers BigDecimal and other useful classes provided by the Java platform.
double: The double data type is a double-precision 64-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in section 4.2.3 of the Java Language Specification. For decimal values, this data type is generally the default choice. As mentioned above, this data type should never be used for precise values, such as currency.
For the range of values, see the section 4.2.3 Floating-Point Types, Formats, and Values of the JLS.
Of course you can use floats or doubles for "critical" things... Many applications do nothing but crunch numbers using these data types.
You might have misunderstood some of the various caveats regarding floating-point numbers, such as the recommendation to never compare for exact equality, and so on.
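For instance, the usual advice is to compare floating-point values against a tolerance rather than with ==; a minimal sketch (the 1e-9 tolerance is an arbitrary choice that depends on your application):
double x = 0.1 + 0.2;
double y = 0.3;
System.out.println(x == y);                     // false: x is 0.30000000000000004
System.out.println(Math.abs(x - y) < 1e-9);     // true: equal within the tolerance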