What is the inclusive range of float and double in Java?
Why are you not recommended to use float or double for anything where precision is critical?
Java's Primitive Data Types
boolean:
1-bit. May take on the values true and false only.
byte:
1 signed byte (two's complement). Covers values from -128 to 127.
short:
2 bytes, signed (two's complement). Covers values from -32,768 to 32,767.
int:
4 bytes, signed (two's complement). -2,147,483,648 to 2,147,483,647.
long:
8 bytes signed (two's complement). Ranges from -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807.
float:
4 bytes, IEEE 754. Covers a range from 1.40129846432481707e-45 to 3.40282346638528860e+38 (positive or negative).
double:
8 bytes IEEE 754. Covers a range from 4.94065645841246544e-324d to 1.79769313486231570e+308d (positive or negative).
char:
2 bytes, unsigned, Unicode. Covers values from 0 to 65,535.
Java's Double class has members containing the Min and Max value for the type.
2^-1074 <= x <= (2-2^-52)·2^1023 // where x is the double.
Check out the MIN_VALUE and MAX_VALUE static final members of Double.
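For example, printing the constants shows the bounds listed above:
System.out.println(Float.MIN_VALUE);   // 1.4E-45   (smallest positive float)
System.out.println(Float.MAX_VALUE);   // 3.4028235E38
System.out.println(Double.MIN_VALUE);  // 4.9E-324  (smallest positive double)
System.out.println(Double.MAX_VALUE);  // 1.7976931348623157E308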
Some people advise against using floating-point types for anything where accuracy and precision are critical, because rounding errors can throw off calculations by measurable (if small) amounts.
Binary floating-point numbers have interesting precision characteristics, since the value is stored as a binary integer raised to a binary power. When dealing with sub-integer values (that is, values between 0 and 1), negative powers of two "round off" very differently than negative powers of ten.
For example, the number 0.1 can be represented by 1 x 10^-1, but there is no combination of base-2 exponent and mantissa that can precisely represent 0.1 -- the closest you get is 0.10000000000000001.
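A quick way to see this from Java itself (BigDecimal's double constructor exposes the exact value a double stores):
System.out.println(0.1 + 0.2);                      // 0.30000000000000004, not 0.3
System.out.println(new java.math.BigDecimal(0.1));  // 0.1000000000000000055511151231257827021181583404541015625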
So if you have an application where you are working with values like 0.1 or 0.01 a great deal, but where small (less than 0.000000000000001%) errors cannot be tolerated, then binary floating-point numbers are not for you.
Conversely, if powers of ten are not "special" to your application (powers of ten are important in currency calculations, but not in, say, most applications of physics), then you are actually better off using binary floating-point, since it's usually at least an order of magnitude faster, and it is much more memory efficient.
The article from the Python documentation on floating point issues and limitations does an excellent job of explaining this issue in an easy to understand form. Wikipedia also has a good article on floating point that explains the math behind the representation.
From Primitive Data Types:
float: The float data type is a single-precision 32-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in section 4.2.3 of the Java Language Specification. As with the recommendations for byte and short, use a float (instead of double) if you need to save memory in large arrays of floating point numbers. This data type should never be used for precise values, such as currency. For that, you will need to use the java.math.BigDecimal class instead. Numbers and Strings covers BigDecimal and other useful classes provided by the Java platform.
double: The double data type is a double-precision 64-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in section 4.2.3 of the Java Language Specification. For decimal values, this data type is generally the default choice. As mentioned above, this data type should never be used for precise values, such as currency.
For the range of values, see the section 4.2.3 Floating-Point Types, Formats, and Values of the JLS.
Of course you can use floats or doubles for "critical" things ... Many applications do nothing but crunch numbers using these datatypes.
You might have misunderstood some of the various caveats regarding floating-point numbers, such as the recommendation to never compare for exact equality, and so on.
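For example, the usual advice is to compare with a tolerance instead of with ==:
double a = 0.1 + 0.2;
double b = 0.3;
System.out.println(a == b);                 // false, because of rounding in the last bit
System.out.println(Math.abs(a - b) < 1e-9); // true: compare against a tolerance suited to your data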
Related
Our teacher asked us to search about this and what I kept on getting from the net are explanations stating what double and float means.
Can you tell me whether it is possible or not, and explain why or why not?
Simple answer: yes, but only if the double is not too large.
floats are single-precision floating-point numbers, with a 23-bit mantissa and an 8-bit exponent, corresponding to roughly 6-7 significant figures of precision and a range of about 10^38.
doubles are double-precision, with a 52-bit mantissa and an 11-bit exponent, corresponding to roughly 15-16 significant figures of precision and a range of about 10^308.
Since doubles have a much larger range than floats, adding a small float to a very large double can have no effect at all: the float's contribution is smaller than the gap between adjacent doubles at that magnitude, so it is lost to rounding. Of course this can happen with two doubles as well.
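A small sketch of that absorption effect:
double big = 1.0e20;
float small = 1.0f;
System.out.println(big + small == big); // true: 1.0 is smaller than the gap between adjacent doubles near 1e20
System.out.println(1.0e15 + 1.0f);      // 1.000000000000001E15: at this magnitude the float still contributes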
https://en.wikipedia.org/wiki/Floating_point
Can you add two numbers with varying decimal places (e.g. 432.54385789364 + 432.1)? Yes you can.
In Java, it is the same idea.
From the Java Tutorials:
float: The float data type is a single-precision 32-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. As with the recommendations for byte and short, use a float (instead of double) if you need to save memory in large arrays of floating point numbers. This data type should never be used for precise values, such as currency. For that, you will need to use the java.math.BigDecimal class instead. Numbers and Strings covers BigDecimal and other useful classes provided by the Java platform.
double: The double data type is a double-precision 64-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. For decimal values, this data type is generally the default choice. As mentioned above, this data type should never be used for precise values, such as currency.
Basically, they are both containers for decimal values. The way that they differ is in how precise they can be. A float is 32 bits in size, compared to a double, which is 64 bits. A float can hold roughly 6-7 significant decimal digits, and a double roughly 15-16.
Basically... a double can store a decimal better than a float... but takes up more space.
To answer your question, you can add a float to a double and vice versa. Generally, the result will be made into a double, and you will have to cast it back to a float if that is what you want.
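For example, using the values from the question:
float f = 432.1f;
double d = 432.54385789364;
double sum = f + d;            // the float operand is widened to double before the addition
float narrowed = (float) sum;  // an explicit cast is needed to get back to float
System.out.println(sum);       // slightly more than 864.64385789364, because 432.1f is not exactly 432.1
System.out.println(narrowed);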
If you want to be really thorough about it, you should say yes, it is possible due to value coercion, but that it opens the door for more severe precision errors to accumulate invisibly to the compiler. float has substantially less precision than double. In practice, if you have to use floating point, prefer double; note that floating-point literals in Java source are double by default, and a float literal needs an explicit f suffix.
These precision errors can lead to serious harm and even loss of life in sensitive systems.
Floating point is very hard to use correctly and should be avoided if possible. One common mistake is representing currency as a float or double. This can cause real money to be effectively given to or stolen from people.
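A minimal sketch of the difference, using a made-up price calculation (the values are only for illustration):
double asDouble = 0.10 * 3;                              // binary floating point
java.math.BigDecimal asDecimal =
        new java.math.BigDecimal("0.10")                 // construct from a String, not from a double
                .multiply(new java.math.BigDecimal(3));  // exact decimal arithmetic
System.out.println(asDouble);   // 0.30000000000000004
System.out.println(asDecimal);  // 0.30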
Floating point (preferring double) is appropriate for approximate calculations and certain high performance scientific computing applications. However it is still extremely important to be aware of the precision loss characteristics particularly when a resulting floating point value is fed into further floating-point calculations.
This leads more generally into numerical computing, and now I've really gone afield :)
SAS has a decent paper on this:
http://support.sas.com/resources/papers/proceedings11/275-2011.pdf
When I write something like
double a = 0.0;
double b = 0.0;
double c = a/b;
The result is Double.NaN, but when I try the same for integers, it produces an ArithmeticException. So, why isn't there an Integer.NaN?
The answer has very little to do with Java. Infinity and undefined values are not part of the integer set, so they are excluded from Integer, whereas floating-point types are designed to approximate the real numbers and include special values such as the infinities, so NaN has been included with the floating-point types to deal with these cases.
For the same reason that there is no integer NaN in any other language.
Modern computers use 2's complement binary representation for integers, and that representation doesn't have a NaN value. (All values in the domain of the representation type represent definite integers.)
It follows that computer integer arithmetic hardware does not recognize any NaN representation.
In theory, someone could invent an alternative representation for integers that includes NaN (or INF, or some other exotic value). However, arithmetic using such a representation would not be supported by the hardware. While it would be possible to implement it in software, it would be prohibitively expensive [1]... and it would be undesirable in other respects to include this support in the Java language.
[1] It is of course relative, but I'd anticipate that a software implementation of NaNs would be (at least) an order of magnitude slower than hardware. If you actually, really, needed this, then that would be acceptable. But the vast majority of integer arithmetic codes don't need this. In most cases throwing an exception for "divide by zero" is just fine, and an order of magnitude slowdown in all integer arithmetic operations is ... not acceptable.
By contrast:
the "unused" values in the representation space already exist
NaN and INF values are part of the IEEE floating point standard, and
they are (typically) implemented by the native hardware implementation of floating point arithmetic
As noted in other comments, it's largely because NaN is a standard value for floating point numbers. You can read about the reasons NaN would be returned on Wikipedia here:
http://en.wikipedia.org/wiki/NaN
Notice that only one of these reasons (divide by zero) applies to integer numbers. There are also positive and negative infinity values for floating-point numbers that integers don't have, and these are closely linked to NaN in the floating-point specification.
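A brief illustration of the asymmetry:
System.out.println(0.0 / 0.0);                // NaN
System.out.println(1.0 / 0.0);                // Infinity
System.out.println(-1.0 / 0.0);               // -Infinity
System.out.println(Double.isNaN(0.0 / 0.0));  // true
// int i = 1 / 0;                             // the integer equivalent throws ArithmeticException instead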
In Java I am using float to store the numbers. I chose float because I am working with both integers and decimal numbers; the numbers vary, and there can be large integers or large decimal numbers with different numbers of decimal places. But when I insert these numbers into the database, the wrong number is stored. For example:
float value = 0f;
value = 67522665;
System.out.println(value);
Printed: 6.7522664E7 and it is stored in the database as 67522664 not as 67522665
Floating point numbers have limited resolution — roughly 7 significant digits. You are seeing round-off error. You can use a double for more resolution or, for exact arithmetic, use BigDecimal.
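A quick demonstration with the value from the question:
float f = 67522665f;
double d = 67522665d;
System.out.println(f);  // 6.7522664E7: the nearest value a float can hold
System.out.println(d);  // 6.7522665E7: a double stores this integer exactly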
Suggested reading: What Every Computer Scientist Should Know About Floating-Point Arithmetic
Doubles and floats have storage issues.
How is floating point stored?
"The float and double types are designed primarily for scientific and engineering
calculations. They perform binary floating-point arithmetic, which was carefully
designed to furnish accurate approximations quickly over a broad range of magnitudes.
They do not, however, provide exact results and should not be used where
exact results are required."
Don't use float. Use BigDecimal instead. And in my experience with databases, they return their NUMBER-typed elements as BigDecimal. When I fetch them using JDBC, they are BigDecimal objects.
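A rough sketch of what that looks like with plain JDBC; the table prices, column amount, and the url/user/password variables here are all made up, and the surrounding method is assumed to declare throws SQLException:
try (java.sql.Connection con = java.sql.DriverManager.getConnection(url, user, password);
     java.sql.Statement st = con.createStatement();
     java.sql.ResultSet rs = st.executeQuery("SELECT amount FROM prices")) {
    while (rs.next()) {
        java.math.BigDecimal amount = rs.getBigDecimal("amount"); // no binary rounding on the way out
        System.out.println(amount);
    }
}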
As far as I understand it, this is about the gap size (or ULP, units in the last place) in the binary representation, that is, the spacing between contiguous floating-point values.
This value is equal to:
2^(e+1-p)
where e is the actual exponent of the number and p is the precision.
Note that the spacing (or gap) increases as the value of the represented number increases:
In IEEE-754 single precision, the precision p is 24, so you can see that when e >= 23 the spacing between contiguous values is at least 1, and we can start talking of integer spacing in the floating-point world.
2^23 = 8388608 --> 8388608 actually stored IEEE-754
8388608.2 --> 8388608 actually stored IEEE-754
Things get worse as numbers get bigger. For example:
164415560 --> 164415552 actually stored IEEE-754
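In Java you can inspect this spacing directly with Math.ulp; a quick check of the examples above:
System.out.println(Math.ulp(8388608f));   // 1.0: at 2^23 the gap between neighbouring floats is already 1
System.out.println((float) 8388608.2);    // 8388608.0: rounds to the nearest representable float
System.out.println(Math.ulp(164415560f)); // 16.0: the gap at this magnitude
System.out.println((long) 164415560f);    // 164415552: the nearest representable float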
Ref: The Spacing of Binary Floating-Point Numbers
I use doubles for a uniform implementation of some arithmetic calculations. These calculations may be actually applied to integers too, but there are no C++-like templates in Java and I don't want to duplicate the implementation code, so I simply use "double" version for ints.
Does the JVM spec guarantee the correctness of integer operations such as <=, >=, +, -, *, and / (in the case of remainder == 0) when the operations are emulated as the corresponding floating-point ops?
(Any integer, of course, has reasonable size to be represented in double's mantissa)
According to the Java Language Specification:
Operators on floating-point numbers behave as specified by IEEE 754 (with the exception of the remainder operator (§15.17.3)).
So you're guaranteed uniform behaviour, and while I don't have access to the official IEEE standard document, I'm pretty sure that it implicitly guarantees that operations on integers that can be represented exactly as a float/double work as expected.
Briefly: yes.
double a = 3.0;
double b = 2.0;
System.out.println(a*b); // 6.0
System.out.println(a+b); // 5.0
System.out.println(a-b); // 1.0
System.out.println(a/b); // 1.5 (if you want to get 1 here, cast the result to int)
System.out.println(a>=b); // true
System.out.println(a<=b); // false
But be careful with multiplication (*): a*b can overflow when cast back to an integer. The same applies to + and -.
Indeed, I've found the standard and it says "yes":
JVM spec:
The rounding operations of the Java virtual machine always use IEEE 754 round to nearest mode. Inexact results are rounded to the nearest representable value, with ties going to the value with a zero least-significant bit. This is the IEEE 754 default mode. But Java virtual machine instructions that convert values of floating-point types to values of integral types round toward zero. The Java virtual machine does not give any means to change the floating-point rounding mode.
ANSI/IEEE Std 754-1985 5.
... Except for binary <---> decimal conversion, each of the operations shall be performed as if it first produced an intermediate result correct to infinite precision and with unbounded range, and then coerced this intermediate result to fit in the destination’s format
ANSI/IEEE Std 754-1985 5.4.
Conversions between floating-point integers and integer formats shall be exact unless an exception arises as specified in 7.1.
Summary
1) Operations on exactly represented values are exact if the result fits in the double format (and, therefore, an integer result is always a floating-point integer).
2) int <--> double conversions are always exact for floating point integers.
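A small sanity check of that (any integer arithmetic whose operands and result stay below 2^53 in magnitude is exact in a double):
double x = 123456789.0;
double y = 987654.0;
System.out.println(x * y == 123456789L * 987654L);  // true: the product is far below 2^53, so it is exact
System.out.println((x * y) % 1.0 == 0.0);           // true: the result is a floating-point integer
System.out.println((double) (1L << 53) + 1 == (double) (1L << 53)); // true: above 2^53 exactness is lost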
Can anyone shed some light on why Double.MIN_VALUE is not actually the minimum value that Doubles can take? It is a positive value, and a Double can of course be negative.
I understand why it's a useful number, but it seems a very unintuitive name, especially when compared to Integer.MIN_VALUE. Calling it Double.SMALLEST_POSITIVE or MIN_INCREMENT or similar would have clearer semantics.
Also, what is the minimum value that Doubles can take? Is it -Double.MAX_VALUE? The docs don't seem to say.
The IEEE 754 format has one bit reserved for the sign and the remaining bits representing the magnitude. This means that it is "symmetrical" around zero (as opposed to the Integer values, which have one more negative value). Thus the minimum value is simply the same as the maximum value with the sign bit flipped, so yes, -Double.MAX_VALUE is the lowest actual number you can represent with a double.
I suppose the Double.MAX_VALUE should be seen as maximum magnitude, in which case it actually makes sense to simply write -Double.MAX_VALUE. It also explains why Double.MIN_VALUE is the least positive value (since that represents the least possible magnitude).
But sure, I agree that the naming is a bit misleading. Being used to the meaning Integer.MIN_VALUE, I too was a bit surprised when I read that Double.MIN_VALUE was the smallest absolute value that could be represented. Perhaps they thought it was superfluous to have a constant representing the least possible value as it is simply a - away from MAX_VALUE :-)
(Note, there is also Double.NEGATIVE_INFINITY, but I'm disregarding it here, as it is to be seen as a "special case" and does not in fact represent any actual number.)
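For example:
System.out.println(Double.MIN_VALUE);                              // 4.9E-324: the smallest positive value
System.out.println(-Double.MAX_VALUE);                             // -1.7976931348623157E308: the most negative finite value
System.out.println(Double.NEGATIVE_INFINITY < -Double.MAX_VALUE);  // true, but infinity is not a finite number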
Here is a good text on the subject.
These constants have nothing to do with sign. This makes more sense if you consider a double as a composite of three parts: Sign, Exponent and Mantissa.
Double.MIN_VALUE is actually the smallest value the Mantissa can assume when the Exponent is at its minimum value, before a flush to zero occurs. Likewise MAX_VALUE can be understood as the largest value the Mantissa can assume when the Exponent is at its maximum value, before a flush to infinity occurs.
A more descriptive name for these two could be Largest Absolute Value (add non-infinite for verbosity) and Smallest Absolute Value (add non-zero for verbosity).
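Both can be read straight off the bit pattern:
System.out.println(Double.longBitsToDouble(0x1L));                // 4.9E-324: smallest mantissa, smallest exponent (Double.MIN_VALUE)
System.out.println(Double.longBitsToDouble(0x7fefffffffffffffL)); // 1.7976931348623157E308: largest mantissa, largest finite exponent (Double.MAX_VALUE)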
Check out the IEEE 754 (1985) standard for details. There is a revised (2008) version, but that only introduces more formats, which aren't even supported by Java (strictly speaking, Java even lacks support for some mandatory features of IEEE 754-1985, like many other high-level languages).
I assume the confusing names can be traced back to C, which defined FLT_MIN as the smallest positive number.
Like in Java, where you have to use -Double.MAX_VALUE, you have to use -FLT_MAX to get the smallest float in C.
The minimum value for a double is Double.NEGATIVE_INFINITY; that's why Double.MIN_VALUE isn't really the minimum for a Double.
Since doubles are floating-point numbers, the constants describe magnitude: you can have the biggest representable number (with the lowest precision) or the number closest to 0 (with the greatest precision).
If you really want a minimal value for a double that isn't infinity, then you can use -Double.MAX_VALUE.
Because with floating-point numbers, precision is what matters; there is no exact range in the integer sense.
/**
* A constant holding the smallest positive nonzero value of type
* <code>double</code>, 2<sup>-1074</sup>. It is equal to the
* hexadecimal floating-point literal
* <code>0x0.0000000000001P-1022</code> and also equal to
* <code>Double.longBitsToDouble(0x1L)</code>.
*/
But I agree that it should probably have been named something better :)
As it says in the documentation,
Double.MIN_VALUE is a constant holding the smallest POSITIVE nonzero value of type double, 2^(-1074).
The trick here is that we are talking about a floating-point number representation. The double data type is a double-precision 64-bit IEEE 754 floating point. Floating points represent numbers from 1,000,000,000,000 down to 0.0000000000000001 with ease, while maximizing precision (the number of digits) at both ends of the scale.
The mantissa, always a positive number, holds the significant digits of the floating-point number. The exponent indicates the positive or negative power of the radix that the mantissa and sign should be multiplied by. The components (sign, mantissa, radix and exponent) are combined as sign x mantissa x radix^exponent to get the floating-point value.
Think of MIN_VALUE as the minimum value that the mantissa can represent: the minimum value of a floating-point representation is the minimum magnitude that can be represented using it. (A better name could have been used to avoid this confusion, though.)
123 > 10 > 1 > 0.12 > 0.012 > 0.0000123 > 0.000000001 > 0.0000000000000001
Below is just FYI.
Double-precision floating-point can represent 2,098 powers of two, from 2^-1074 through 2^1023. Denormalized powers of two are those from 2^-1074 through 2^-1023; normalized powers of two are those from 2^-1022 through 2^1023.
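A quick check of those boundaries:
System.out.println(Double.MIN_VALUE);      // 4.9E-324: 2^-1074, the smallest denormalized value
System.out.println(Double.MIN_NORMAL);     // 2.2250738585072014E-308: 2^-1022, the smallest normalized value
System.out.println(Double.MIN_VALUE / 2);  // 0.0: halving the smallest value underflows to zero
System.out.println(Double.MAX_VALUE * 2);  // Infinity: doubling the largest value overflows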