Upon checking Float.compare(f1, f2) I found that it compares f1 to f2
and returns -1, 0 or 1.
It also returns -1, 0 or 1 when the values involved are -0.0, 0.0 or NaN.
What does -0.0 mean?
I would have expected something like
return (Math.abs(f1 - f2) - 0.001f) > 0;
where 0.001 is a given epsilon value.
Thanks.
-0.0 is the negative zero, as specified by the IEEE 754 standard.
If you're curious about how such a value might arise, the following article does a good job of explaining it: http://www.savrola.com/resources/negative_zero.html
As to not taking an epsilon value, this is how Float.compare is designed to work (it's an exact comparison, not an approximate one). There's nothing to stop you from writing another comparison function that does take an epsilon and does perform an approximate comparison.
Both exact and approximate comparisons of floating-point numbers have their uses.
As to your actual code, it suffers from a number of issues:
it isn't a three-way comparison like Float.compare;
it doesn't handle NaNs;
it is generally better to specify the epsilon as a relative value, not as an absolute one, so that it scales with f1 and f2 (see this article for a discussion).
My point here isn't to criticise your code but to show that writing good floating-point code is harder than it first looks.
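For example, here is a minimal sketch of what an approximate, three-way comparison might look like; the method name approxCompare and the relative-epsilon parameter are purely illustrative and are not part of the Float API:

public final class ApproxFloats {
    static int approxCompare(float f1, float f2, float relEps) {
        if (Float.isNaN(f1) || Float.isNaN(f2)) {
            return Float.compare(f1, f2);   // fall back to exact semantics for NaN
        }
        // scale the tolerance with the operands' magnitudes (a relative epsilon)
        float tolerance = relEps * Math.max(Math.abs(f1), Math.abs(f2));
        if (Math.abs(f1 - f2) <= tolerance) {
            return 0;                       // "close enough" counts as equal
        }
        return f1 < f2 ? -1 : 1;
    }

    public static void main(String[] args) {
        System.out.println(approxCompare(5.1f - 5.0f, 0.1f, 1e-5f)); // 0: approximately equal
        System.out.println(Float.compare(5.1f - 5.0f, 0.1f));        // negative: exactly, they differ
    }
}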
Floating point arithmetic is tricky. This article throws some light on the basics.
-0 is signed zero:
In ordinary arithmetic, −0 = +0 = 0. However, in computing, some number representations allow for the existence of two zeros, often denoted by −0 (negative zero) and +0 (positive zero).
[...]
The IEEE 754 standard for floating point arithmetic (presently used by most computers and programming languages that support floating point numbers) requires both +0 and −0. The zeroes can be considered as a variant of the extended real number line such that 1/−0 = −∞ and 1/+0 = +∞, division by zero is only undefined for ±0/±0 and ±∞/±∞.
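To make this concrete, a small sketch (the class name is illustrative) showing one way a negative zero can arise and how == and Float.compare treat it differently:

public class SignedZeroDemo {
    public static void main(String[] args) {
        float negZero = -1.0f * 0.0f;   // one way a negative zero can arise
        float posZero = 0.0f;

        System.out.println(negZero);                          // -0.0
        System.out.println(negZero == posZero);               // true: == follows IEEE 754 and treats them as equal
        System.out.println(Float.compare(negZero, posZero));  // -1: compare orders -0.0 below 0.0
        System.out.println(1.0f / negZero);                   // -Infinity
        System.out.println(1.0f / posZero);                   // Infinity
    }
}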
Our teacher asked us to research whether you can add a float and a double, and all I kept getting from the net are explanations stating what double and float mean.
Can you tell me whether it is possible or not, and explain why or why not?
Simple answer: yes, but only if the double is not too large.
floats are single-precision floating-point numbers, meaning they use a 23-bit mantissa and an 8-bit exponent, corresponding to ~6-7 significant figures of precision and a range of roughly 10^38.
doubles are double-precision, with a 52-bit mantissa and an 11-bit exponent, corresponding to ~15-16 significant figures of precision and a range of roughly 10^308.
Since doubles have a far larger range and precision than floats, adding a small float to a very large double will nullify the float's effect: its contribution is simply lost to rounding (absorption). Of course this can happen with two doubles as well.
https://en.wikipedia.org/wiki/Floating_point
Can you add two numbers with varying decimal places (e.g. 432.54385789364 + 432.1)? Yes you can.
In Java, it is the same idea.
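To illustrate the absorption effect described above, a minimal sketch (the values are arbitrary):

public class AbsorptionDemo {
    public static void main(String[] args) {
        double huge = 1e20;        // far beyond the point where a double can resolve a change of 1.0
        float small = 1.0f;

        double sum = huge + small; // the float is widened to double, then added
        System.out.println(sum == huge); // true: 1.0 is too small to change 1e20's representation
    }
}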
From the Java Tutorials:
float: The float data type is a single-precision 32-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. As with the recommendations for byte and short, use a float (instead of double) if you need to save memory in large arrays of floating point numbers. This data type should never be used for precise values, such as currency. For that, you will need to use the java.math.BigDecimal class instead. Numbers and Strings covers BigDecimal and other useful classes provided by the Java platform.
double: The double data type is a double-precision 64-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. For decimal values, this data type is generally the default choice. As mentioned above, this data type should never be used for precise values, such as currency.
Basically, they are both holders for decimal values. The way they differ is in how precise they can be. A float is 32 bits in size, compared to a double, which is 64 bits. A float can hold roughly 6-7 significant decimal digits, and a double roughly 15-16.
Basically... a double can store a decimal better than a float... but takes up more space.
To answer your question, you can add a float to a double and vice versa. The result will be a double (the float is promoted before the addition), and you will have to cast it back to a float if that is what you want.
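A short sketch of the promotion and the narrowing cast, reusing the numbers from the example above (class and variable names are illustrative):

public class MixedAdditionDemo {
    public static void main(String[] args) {
        float f = 432.1f;
        double d = 432.54385789364;

        // float + double: the float is promoted to double, so the result is a double
        double result = f + d;
        System.out.println(result);

        // casting back to float is allowed, but it narrows the value and may lose precision
        float narrowed = (float) (f + d);
        System.out.println(narrowed);
    }
}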
If you want to be really deep about it you should say yes, it is possible due to value coercion, but that it opens the door for more severe precision errors to accumulate without the compiler noticing. float has substantially less precision than double. (Note that, contrary to what is sometimes assumed, an unsuffixed floating-point literal in Java source is a double, not a float; you need the f suffix to get a float.) In practice, prefer double if you have to use floating point.
These precision errors can lead to serious harm and even loss of life in sensitive systems.
Floating point is very hard to use correctly and should be avoided if possible. One obvious thing not to do, yet commonly done by mistake, is representing currency as a float or double. This can cause real money to be effectively given to or taken from people.
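A small sketch of the classic currency pitfall and the BigDecimal alternative (the amounts are arbitrary):

import java.math.BigDecimal;

public class CurrencyDemo {
    public static void main(String[] args) {
        // summing ten dimes with double does not give exactly one dollar
        double total = 0.0;
        for (int i = 0; i < 10; i++) {
            total += 0.10;
        }
        System.out.println(total);        // 0.9999999999999999
        System.out.println(total == 1.0); // false

        // BigDecimal built from strings keeps exact decimal values
        BigDecimal exact = BigDecimal.ZERO;
        for (int i = 0; i < 10; i++) {
            exact = exact.add(new BigDecimal("0.10"));
        }
        System.out.println(exact);                                // 1.00
        System.out.println(exact.compareTo(BigDecimal.ONE) == 0); // true
    }
}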
Floating point (preferring double) is appropriate for approximate calculations and certain high performance scientific computing applications. However it is still extremely important to be aware of the precision loss characteristics particularly when a resulting floating point value is fed into further floating-point calculations.
This leads more generally into numerical computing, and now I've really gone afield :)
SAS has a decent paper on this:
http://support.sas.com/resources/papers/proceedings11/275-2011.pdf
When I write something like
double a = 0.0;
double b = 0.0;
double c = a/b;
The result is Double.NaN, but when I try the same with integers, it produces an ArithmeticException. So, why isn't there an Integer.NaN?
The answer has very little to do with Java. Infinities and undefined results are not part of the integer set, so they are excluded from Integer, whereas the floating-point types model the extended real number line, which includes infinities, and NaN was added to floating point to represent undefined results.
For the same reason that there is no integer NaN in any other language.
Modern computers use 2's complement binary representation for integers, and that representation doesn't have a NaN value. (All values in the domain of the representation type represent definite integers.)
It follows that computer integer arithmetic hardware does not recognize any NaN representation.
In theory, someone could invent an alternative representation for integers that includes NaN (or INF, or some other exotic value). However, arithmetic using such a representation would not be supported by the hardware. While it would be possible to implement it in software, it would be prohibitively expensive¹, and undesirable in other respects too, to include this support in the Java language.
1 - It is of course relative, but I'd anticipate that a software implementation of NaNs would be (at least) an order of magnitude slower than hardware. If you actually, really needed this, then that would be acceptable. But the vast majority of integer arithmetic code doesn't need it. In most cases throwing an exception for divide-by-zero is just fine, and an order-of-magnitude slowdown in all integer arithmetic operations is ... not acceptable.
By contrast:
the "unused" values in the representation space already exist
NaN and INF values are part of the IEEE floating point standard, and
they are (typically) implemented by the native hardware implementation of floating point arithmetic
As noted in other comments, it's largely because NaN is a standard value for floating point numbers. You can read about the reasons NaN would be returned on Wikipedia here:
http://en.wikipedia.org/wiki/NaN
Notice that only one of these reasons (divide by zero) applies to integers. Floating-point numbers also have positive and negative infinity values, which integers lack and which are closely linked to NaN in the floating-point specification.
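To tie the answers together, a short sketch contrasting the floating-point operations that produce NaN with the integer case (the class name is illustrative):

public class NanDemo {
    public static void main(String[] args) {
        System.out.println(0.0 / 0.0);        // NaN
        System.out.println(Math.sqrt(-1.0));  // NaN
        System.out.println(Double.POSITIVE_INFINITY - Double.POSITIVE_INFINITY); // NaN

        double nan = 0.0 / 0.0;
        System.out.println(nan == nan);        // false: NaN is not equal to anything, even itself
        System.out.println(Double.isNaN(nan)); // true: the reliable way to test for NaN

        try {
            int bad = 1 / 0;                   // integer arithmetic has no NaN to return
        } catch (ArithmeticException e) {
            System.out.println("integer division by zero throws: " + e.getMessage());
        }
    }
}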
I have the following statement:
float diff = tempVal - m_constraint.getMinVal();
tempVal is declared as a float and the getMinVal() returns a float value.
I have the following print out:
diff=0.099999905, tempVal=5.1, m_constraint.getMinVal()=5.0
I expected diff to be 0.1, not the number above. How can I get that?
Floats use IEEE 754 to represent numbers, and that representation introduces rounding errors.
Floating point guide
What Every Computer Scientist Should Know About Floating-Point Arithmetic
Wikipedia on IEEE 754
Bottom line: if you are doing arithmetic and it needs to be exact, don't use float or double; use BigDecimal instead.
Because of the way they store values internally, floats and doubles can only store with complete accuracy those numbers which can be decomposed into a sum of powers of 2 (and then only within certain constraints relating to their absolute and relative magnitude).
So as soon as you attempt to store, or perform a calculation involving, a number which cannot be stored exactly, you are going to get an error in the final digit.
Usually this isn't a problem provided you use floats and doubles with some precaution:
use a size of floating point primitive which has "spare" digits of precision beyond what you need;
for many applications, this probably means don't use float at all (use double instead): it has very poor precision and, with the exception of division, has no performance benefit on many processors;
when printing FP numbers, only actually print and consider the number of digits of precision that you need, and certainly don't include the final digit (use String.format to help you);
if you need an arbitrary number of digits of precision, use BigDecimal instead.
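Putting the last two points together, a minimal sketch using the numbers from the question (the %.2f precision and the class name are just illustrative choices):

import java.math.BigDecimal;

public class DiffDemo {
    public static void main(String[] args) {
        float tempVal = 5.1f;
        float minVal = 5.0f;

        float diff = tempVal - minVal;
        System.out.println(diff);                         // 0.099999905

        // print only the precision you actually need
        System.out.println(String.format("%.2f", diff));  // 0.10

        // or do the arithmetic in BigDecimal if it must be exact
        BigDecimal exact = new BigDecimal("5.1").subtract(new BigDecimal("5.0"));
        System.out.println(exact);                        // 0.1
    }
}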
You cannot get exact results with floating point numbers. You might need to use a FixedPoint library for that. See : http://sourceforge.net/projects/jmfp/
Java encodes real numbers using the binary floating-point representation defined in IEEE 754. Like all finite representations it cannot represent every real number exactly, because there are far more real numbers than possible representations. Numbers which cannot be represented exactly (like 0.1 in your case) are rounded to the nearest representable number.
I use doubles for a uniform implementation of some arithmetic calculations. These calculations may be actually applied to integers too, but there are no C++-like templates in Java and I don't want to duplicate the implementation code, so I simply use "double" version for ints.
Does the JVM spec guarantee the correctness of integer operations such as <=, >=, +, -, * and / (in the case of remainder == 0) when the operations are emulated as the corresponding floating-point ops?
(Every integer involved is, of course, small enough to be represented exactly in a double's mantissa.)
According to the Java Language Specification:
Operators on floating-point numbers behave as specified by IEEE 754 (with the exception of the remainder operator (§15.17.3)).
So you're guaranteed uniform behaviour, and while I don't have access to the official IEEE standard document, I'm pretty sure that it implicitly guarantees that operations on integers that can be represented exactly as a float/double work as expected.
Briefly, yes.
double a = 3.0;
double b = 2.0;
System.out.println(a*b); // 6.0
System.out.println(a+b); // 5.0
System.out.println(a-b); // 1.0
System.out.println(a/b); // 1.5 (cast the result to int if you want to get 1 here)
System.out.println(a>=b); // true
System.out.println(a<=b); // false
But be careful with multiplication (*): a*b can overflow the int range when you cast the result back to an integer. The same applies to + and -.
Indeed, I've found the standard and it says "yes".
JVM spec:
The rounding operations of the Java virtual machine always use IEEE 754 round to nearest mode. Inexact results are rounded to the nearest representable value, with ties going to the value with a zero least-significant bit. This is the IEEE 754 default mode. But Java virtual machine instructions that convert values of floating-point types to values of integral types round toward zero. The Java virtual machine does not give any means to change the floating-point rounding mode.
ANSI/IEEE Std 754-1985 5.
... Except for binary <---> decimal conversion, each of the operations shall be performed as if it first produced an intermediate result correct to infinite precision and with unbounded range, and then coerced this intermediate result to fit in the destination’s format
ANSI/IEEE Std 754-1985 5.4.
Conversions between floating-point integers and integer formats shall be exact unless an exception arises as specified in 7.1.
Summary
1) The basic operations are exact whenever the result fits in the double format (and, therefore, an integer result is always a floating-point integer).
2) int <--> double conversions are always exact for floating point integers.
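A small sketch that spot-checks this for a few arbitrary values, including the 2^53 boundary where exactness ends (class name and values are illustrative):

public class IntAsDoubleDemo {
    public static void main(String[] args) {
        int a = 123_456;
        int b = 789;

        // every int fits exactly in a double's mantissa, so these all agree
        System.out.println((double) (a + b) == (double) a + (double) b); // true
        System.out.println((double) (a - b) == (double) a - (double) b); // true
        System.out.println((double) (a * b) == (double) a * (double) b); // true: the product is still well below 2^53
        System.out.println((a / b) == (int) ((double) a / (double) b));  // true: the (int) cast truncates toward zero, matching int division here

        // the guarantee breaks once values exceed 2^53
        long big = 1L << 53;
        System.out.println((double) (big + 1) == (double) big);          // true: big + 1 is no longer representable as a double
    }
}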
Can anyone shed some light on why Double.MIN_VALUE is not actually the minimum value that Doubles can take? It is a positive value, and a Double can of course be negative.
I understand why it's a useful number, but it seems a very unintuitive name, especially when compared to Integer.MIN_VALUE. Calling it Double.SMALLEST_POSITIVE or MIN_INCREMENT or similar would have clearer semantics.
Also, what is the minimum value that Doubles can take? Is it -Double.MAX_VALUE? The docs don't seem to say.
The IEEE 754 format has one bit reserved for the sign and the remaining bits representing the magnitude. This means that it is "symmetrical" around the origin (as opposed to the Integer values, which have one more negative value). Thus the minimum value is simply the same as the maximum value with the sign bit flipped, so yes, -Double.MAX_VALUE is the lowest actual number you can represent with a double.
I suppose Double.MAX_VALUE should be seen as the maximum magnitude, in which case it actually makes sense to simply write -Double.MAX_VALUE. It also explains why Double.MIN_VALUE is the least positive value (since that represents the least possible magnitude).
But sure, I agree that the naming is a bit misleading. Being used to the meaning of Integer.MIN_VALUE, I too was a bit surprised when I read that Double.MIN_VALUE was the smallest absolute value that could be represented. Perhaps they thought it was superfluous to have a constant representing the least possible value, as it is simply a - away from MAX_VALUE :-)
(Note, there is also Double.NEGATIVE_INFINITY, but I'm disregarding it here, as it is to be seen as a "special case" and does not in fact represent any actual number.)
Here is a good text on the subject.
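A minimal sketch that demonstrates the point (the class name is illustrative):

public class DoubleExtremesDemo {
    public static void main(String[] args) {
        System.out.println(Double.MIN_VALUE);   // 4.9E-324: smallest positive value, not the most negative
        System.out.println(Double.MAX_VALUE);   // 1.7976931348623157E308
        System.out.println(-Double.MAX_VALUE);  // the most negative finite double

        System.out.println(Double.MIN_VALUE > 0);                          // true
        System.out.println(-Double.MAX_VALUE < Double.MIN_VALUE);          // true
        System.out.println(Double.NEGATIVE_INFINITY < -Double.MAX_VALUE);  // true, but infinity is a special value, not a finite number
    }
}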
These constants have nothing to do with sign. This makes more sense if you consider a double as a composite of three parts: Sign, Exponent and Mantissa.
Double.MIN_VALUE is actually the smallest value the mantissa can assume when the exponent is at its minimum value, before a flush to zero occurs. Likewise MAX_VALUE can be understood as the largest value the mantissa can assume when the exponent is at its maximum value, before a flush to infinity occurs.
A more descriptive name for these two could be Largest Absolute Value (add "non-infinite" for verbosity) and Smallest Absolute Value (add "non-zero" for verbosity).
Check out the IEEE 754 (1985) standard for details. There is a revised (2008) version, but that only introduces more formats which aren't even supported by Java (strictly speaking, Java even lacks support for some mandatory features of IEEE 754-1985, like many other high-level languages).
I assume the confusing names can be traced back to C, which defined FLT_MIN as the smallest positive number.
Like in Java, where you have to use -Double.MAX_VALUE, you have to use -FLT_MAX to get the smallest float in C.
The minimum value for a double is Double.NEGATIVE_INFINITY; that's why Double.MIN_VALUE isn't really the minimum for a Double.
Since doubles are floating-point numbers, the extremes worth naming are the biggest representable number (held with lower precision) and the number closest to 0 (held with the greatest precision).
If you really want a minimal value for a double that isn't infinity then you can use -Double.MAX_VALUE.
Because with floating-point numbers, what matters is the precision at a given magnitude; there is no single exact range.
/**
* A constant holding the smallest positive nonzero value of type
* <code>double</code>, 2<sup>-1074</sup>. It is equal to the
* hexadecimal floating-point literal
* <code>0x0.0000000000001P-1022</code> and also equal to
* <code>Double.longBitsToDouble(0x1L)</code>.
*/
But I agree that it should probably have been named something better :)
As it says in the documentation,
Double.MIN_VALUE is a constant holding the smallest POSITIVE nonzero value of type double, 2^(-1074).
The trick here is that we are talking about a floating-point representation. The double data type is a double-precision 64-bit IEEE 754 floating point. Floating-point formats represent numbers from around 1,000,000,000,000 down to 0.0000000000000001 with ease, while maximizing precision (the number of digits) at both ends of the scale. (For more, refer to this.)
The mantissa, always a positive number, holds the significant digits of the floating-point number. The exponent indicates the positive or negative power of the radix by which the mantissa and sign should be multiplied. The four components are combined as sign × mantissa × radix^exponent to get the floating-point value.
Think of MIN_VALUE as the minimum value the mantissa can represent: the minimum value of a floating-point representation is the minimum magnitude that can be represented by it. (A better name could have been chosen to avoid this confusion, though.)
123 > 10 > 1 > 0.12 > 0.012 > 0.0000123 > 0.000000001 > 0.0000000000000001
Below is just FYI.
Double-precision floating-point can represent 2,098 powers of two, from 2^-1074 through 2^1023. Denormalized powers of two are those from 2^-1074 through 2^-1023; normalized powers of two are those from 2^-1022 through 2^1023. Refer to this and this.
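Also FYI, a small sketch of the subnormal end of that range; Double.MIN_NORMAL marks where the normalized range begins (the class name is illustrative):

public class SubnormalDemo {
    public static void main(String[] args) {
        // Double.MIN_VALUE is the smallest subnormal, 2^-1074, the last stop before zero
        System.out.println(Double.MIN_VALUE == Double.longBitsToDouble(0x1L)); // true (see the javadoc quoted above)
        System.out.println(Double.MIN_VALUE / 2);                              // 0.0: halving it underflows to zero

        // Double.MIN_NORMAL (2^-1022) is where the normalized range begins
        System.out.println(Double.MIN_NORMAL);                                 // 2.2250738585072014E-308
        System.out.println(Double.MIN_VALUE < Double.MIN_NORMAL);              // true
    }
}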