Floating Point Casting in Java - java

Casting for integers is very straightforward, the extra bits simply disappear.
But, is it important to understand what is happening under the hood for casting floating point? I've tried to read information on how floating point is calculated, but I have yet to find one that explains it well. At least that's my excuse. I get the basic idea although the calculation of the mantissa is a bit difficult.
At least up to Java 7, I understand that floating points cannot be used in bitwise operations. Which makes sense because of how they are stored internally. Is there anything important that is needed to know on how floating points operate or are cast?
So, to Summarize:
Is it important to understand the internal workings of floating point like integers?
What is the internal process of casting a floating point to an integer?

What is the internal process of casting a floating point to an integer?
Java calls the machine code instruction which does this in compliance with the IEEE-754 standard. There is nothing for Java to do as such. If you want to know how casting works I suggest you read the standard.
Basically, the mantissa is shifted by the exponent and the sign applied. i.e. a floating point number is sign * 2^exponent * mantissa and all it does is perform this calculation and drop and fractional parts.

First, you need to understand that a floating point number is essentially an approximation. You can put in, say 1.23 and get out 1.229998 (or some such), because 1.23 is represented exactly. Regardless of whether you will be doing any casts, you need to understand this, and how it affects computations (and especially comparisons).
From the standpoint of cast, casting a float to a double causes no loss of information, since a double can contain every value that a float can contain. But casting from double to float can cause loss of precision (and, for very large or small numbers, exponent overflow/underflow), since there's simply more information in a 64-bit value than in a 32-bit one, so some data's going to end up "on the floor".
Similarly, casting from an int to a double causes no loss of information, since a double can contain every value an int can contain and then some. But casting from int to float or from long to double or float can result in loss of precision (though there can never be an exponent overflow/underflow).
Casting from float or double to int or long can easily result in overflow/underflow and major loss of data, if the float or double value has a large positive exponent or any negative exponent. And, of course, when you cast from floating-point to fixed the fractional part of the number is truncated (essentially a "floor" operation).

Related

Is it possible to add a float to a double?

Our teacher asked us to search about this and what I kept on getting from the net are explanations stating what double and float means.
Can you tell me whether it is possible or not, and explain why or why not?
Simple answer: yes, but only if the double is not too large.
float's are single-precision floating point numbers, meaning they use a 23-bit mantissa and 8-bit exponent, corresponding to ~6/7 s.f. precision and ~ 10^38 range.
double's are double-precision - with 52-bit mantissa and 11-bit exponent, corresponding to ~14/15 s.f. precision and ~ 10^308 range.
Since double's have larger range than floats, adding a float to a very large double will nullify the float's effects (called underflow). Of course this can happen for two double types as well.
https://en.wikipedia.org/wiki/Floating_point
Can you add two numbers with varying decimal places (e.g. 432.54385789364 + 432.1)? Yes you can.
In Java, it is the same idea.
From the Java Tutorials:
float: The float data type is a single-precision 32-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. As with the recommendations for byte and short, use a float (instead of double) if you need to save memory in large arrays of floating point numbers. This data type should never be used for precise values, such as currency. For that, you will need to use the java.math.BigDecimal class instead. Numbers and Strings covers BigDecimal and other useful classes provided by the Java platform.
double: The double data type is a double-precision 64-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. For decimal values, this data type is generally the default choice. As mentioned above, this data type should never be used for precise values, such as currency.
Basically, they are both holders to decimals. The way that they are different is how precise they can be. A float can only be 32 bits in size, compared to a double which is 64 bits in size. A float can have precision up to around 5 or 6 float point numbers, and a double can have precision up to around 10 floating point numbers.
Basically... a double can store a decimal better than a float... but takes up more space.
To answer your question, you can add a float to a double and vice versa. Generally, the result will be made into a double, and you will have to cast it back to a float if that is what you want.
If you want to be really deep about it you should say yes it is possible due to value coercion, but that it opens the door for more severe precision errors to accumulate invisibly to the compiler. float has substantially precision than double and is very regrettably the default type of literal floating-point numbers in Java source. In practice make sure to use the d suffix on literals to make sure theh are double if you have to use floating point.
These precision errors can lead to serious harm and even loss of life in sensitive systems.
Floating point is very hard to use correctly and should be avoided if possible. One extremely obvious thing not to do that is commonly mistakenly done is representing currency as a float or double. This can cause real money to be effectively given to or stolen from people.
Floating point (preferring double) is appropriate for approximate calculations and certain high performance scientific computing applications. However it is still extremely important to be aware of the precision loss characteristics particularly when a resulting floating point value is fed into further floating-point calculations.
This more generally leads in Numerical Computing and now I've really gone afield :)
SAS has a decent paper on this:
http://support.sas.com/resources/papers/proceedings11/275-2011.pdf

The accuracy of a double in general programming and Java

I understand that due to the nature of a float/double one should not use them for precision important calculations. However, i'm a little confused on their limitations due to mixed answers on similar questions, whether or not floats and doubles will always be inaccurate regardless of significant digits or are only inaccurate up to the 16th digit.
I've ran a few examples in Java,
System.out.println(Double.parseDouble("999999.9999999999");
// this outputs correctly w/ 16 digits
System.out.println(Double.parseDouble("9.99999999999999");
// This also outputs correctly w/ 15 digits
System.out.println(Double.parseDouble("9.999999999999999");
// But this doesn't output correctly w/ 16 digits. Outputs 9.999999999999998
I can't find the link to another answer that stated that values like 1.98 and 2.02 would round down to 2.0 and therefore create inaccuracies but testing shows that the values are printed correctly. So my first question is whether or not floating/double values will always be inaccurate or is there a lower limit where you can be assured of precision.
My second question is in regards to using BigDecimal. I know that I should be using BigDecimal for precision important calculations. Therefore I should be using BigDecimal's methods for arithmetic and comparing. However, BigDecimal also includes a doubleValue() method which will convert the BigDecimal to a double. Would it be safe for me to do a comparison between double values that I know for sure have less than 16 digits? There will be no arithmetic done on them at all so the inherent values should not have changed.
For example, is it safe for me to do the following?
BigDecimal myDecimal = new BigDecimal("123.456");
BigDecimal myDecimal2 = new BigDecimal("234.567");
if (myDecimal.doubleValue() < myDecimal2.doubleValue()) System.out.println("myDecimal is smaller than myDecimal2");
Edit: After reading some of the responses to my own answer i've realized my understanding was incorrect and have deleted it. Here are some snippets from it that might help in the future.
"A double cannot hold 0.1 precisely. The closest representable value to 0.1 is 0.1000000000000000055511151231257827021181583404541015625. Java Double.toString only prints enough digits to uniquely identify the double, not the exact value." - Patricia Shanahan
Sources:
https://stackoverflow.com/a/5749978 - States that a double can hold up to 15 digits
I suggest you read this page:
https://en.wikipedia.org/wiki/Double-precision_floating-point_format
Once you've read and understood it, and perhaps converted several examples to their binary representations in the 64 bit floating point format, then you'll have a much better idea of what significant digits a Double can hold.
As a side note, (perhaps trivial) a nice and reliable way to store a known precision of value is to simply multiply it by the relevant factor and store as some integral type, which are completely precise.
For example:
double costInPounds = <something>; //e.g. 3.587
int costInPence = (int)(costInPounds * 100 + 0.5); //359
Plainly some precision can be lost, but if a required/desired precision is known, this can save a lot of bother with floating point values, and once this has been done, no precision can be lost by further manipulations.
The + 0.5 is to ensure that rounding works as expected. (int) takes the 'floor' of the provided double value, so adding 0.5 makes it round up and down as expected.

Why doesn't Integer represent NaN in Java?

When I write something like
double a = 0.0;
double b = 0.0;
double c = a/b;
The result is Double.NaN, but when I try the same for integers, it produces an ArithmeticException. So, why isn't there a Integer.NaN?
The answer has very little to do with Java. Infinity or undefined numbers are not a part of the integer set, so they are excluded from Integer, whereas floating point types represent real numbers as well as complex numbers, so to deal with these, NaN has been included with floating point types.
For the same reason that there is no integer NaN in any other language.
Modern computers use 2's complement binary representation for integers, and that representation doesn't have a NaN value. (All values in the domain of the representation type represent definite integers.)
It follows that computer integer arithmetic hardware does not recognize any NaN representation.
In theory, someone could invent an alternative representation for integers that includes NaN (or INF, or some other exotic value). However, arithmetic using such a representation would not be supported by the hardware. While it would be possible to implement it in software, it would be prohibitively expensive1... and undesirable in other respects too to include this support in the Java language.
1 - It is of course relative, but I'd anticipate that a software implementation of NaNs would be (at least) an order of magnitude slower than hardware. If you actually, really, needed this, then that would be acceptable. But the vast majority of integer arithmetic codes don't need this. In most cases throwing an exception for "divide by zero" is just fine, and an order of magnitude slow down in all integer arithmetic operations is ... not acceptable.
By contrast:
the "unused" values in the representation space already exist
NaN and INF values are part of the IEE floating point standard, and
they are (typically) implemented by the native hardware implementation of floating point arithmetic
As noted in other comments, it's largely because NaN is a standard value for floating point numbers. You can read about the reasons NaN would be returned on Wikipedia here:
http://en.wikipedia.org/wiki/NaN
Notice that only one of these reasons exists for integer numbers (divide by zero). There is also both a positive and a negative infinity value for floating point numbers that integers don't have and is closely linked to NaN in the floating point specification.

Double Precision when a float value is passed in double

I have on question regarding double precision.When a float value is passed into double then I get some different result. For e.g.
float f= 54.23f;
double d1 = f;
System.out.println(d1);
The output is 54.22999954223633. Can someone explain the reason behind this behaviour. Is it like double defaults to 14 places of decimal precision.
The same value is printed differently for float and double because the Java specification requires printing as many digits as needed to distinguish the value from adjacent representable values in the same type (per my answer here, and see the linked documentation for more precision in the definition).
Since float has fewer bits to represent values, and hence fewer values, they are spaced more widely apart, and you do not need as many digits to distinguish them. When you put the value into a double and print it, the Java rules require that more digits be printed so that the value is distinguished from nearby double values. The println function does not know that the value originally came from a float and does not contain as much information as can fit into a double.
54.23f is exactly 54.229999542236328125 (in hexadecimal, 0x1.b1d70ap+5). The float values just below and just above this are 54.2299957275390625 (0x1.b1d708p+5) and 54.23000335693359375 (0x1.b1d70cp+5). As you can see, printing “54.229999” would distinguish the value from 54.229995… and from 54.23…. However, the double values just below and just above 54.23f are 54.22999954223632101957264239899814128875732421875 and 54.22999954223633523042735760100185871124267578125. To distinguish the value, you need “54.22999954223633”.
This is because the float hides the extra decimals and double shows them. The double will represent the actual number quite precisely and shows more digits.
Try this:
System.out.println(f.doubleValue()); (need to make it a Float first ofcourse)
So as you can see, the information is there, it is just rounded.
Hope this helps
This is due to the Internal Representation.
Floating-point numbers are typically packed into a computer datum as the sign bit, the exponent field, and the significand (mantissa), from left to right.
This is called as Accuracy Problems.
The fact that floating-point numbers cannot precisely represent all real numbers, and that floating-point operations cannot precisely represent true arithmetic operations, leads to many surprising situations. This is related to the finite precision with which computers generally represent numbers.
It is not a problem. It is how double works. You do not have to handle it and care about it. The precision of double is enough. Think, the difference between you number and the expected result is in the 14 position after decimal point.
If you need arbitrarily good precision, use the java.math.BigDecimal class.
Or if you still want to use double. Do like this:
double d = 5.5451521841;
NumberFormat nf = new DecimalFormat("##.###");
System.out.println(nf.format(d));
Please let me know in case of any doubt.
Actually this is only about different visual representation or converting float / double to String. Let's take a look at internal binary representation
float f = 0.23f;
double d = f;
System.out.println(Integer.toBinaryString(Float.floatToIntBits(f)));
System.out.println(Long.toBinaryString(Double.doubleToLongBits(d)));
output
111110011010111000010100011111
11111111001101011100001010001111100000000000000000000000000000
it means that f was converted to d1 without any distortion, significant digits are the same
double and float represent numbers in different formats.
Because of this you are bound to find certain numbers that store perfectly in one format but not in the other. You happen to have found one that correctly fits in a float but does not fit exactly in a `double.
This problem can also show itself when two different formatters are used.

float number is not the expected number after subtraction

I have the following statement:
float diff = tempVal - m_constraint.getMinVal();
tempVal is declared as a float and the getMinVal() returns a float value.
I have the following print out:
diff=0.099999905, tempVal=5.1, m_constraint.getMinVal()=5.0
I expect the diff is 0.1 but not the above number. how to do that?
Floats use the IEEE754 to represent numbers, and that system has some rounding errors.
Floating point guide
What Every Computer Scientist Should Know About Floating-Point Arithmetic
Wikipedia on IEE754
Bottom-line if you are doing arithmetic and it needs to be exact don't use float or double but us BigDecimal
Because of the way they store values internally, floats and doubles can only store completely accurately numbers which can be decomposed into a sum of powers of 2 (and then, within certain constraints relating to their absolute and relative magnitude).
So as soon as you attempt to store, or perform a calculating involving, a number which cannot be stored exactly, you are going to get an error in the final digit.
Usually this isn't a problem provided you use floats and doubles with some precaution:
use a size of floating point primitive which has "spare" digits of precision beyond what you need;
for many applications, this probably means don't use float at all (use double instead): it has very poor precision and, with the exception of division, has no performance benefit on many processors;
when printing FP numbers, only actually print and consider the number of digits of precision that you need, and certainly don't include the final digit (use String.format to help you);
if you need arbitrary number of digits of precision, use BigDecimal instead.
You cannot get exact results with floating point numbers. You might need to use a FixedPoint library for that. See : http://sourceforge.net/projects/jmfp/
Java encodes real numbers using binary floating point representations defined in IEEE 754. Like all finite representations it cannot accurately represent all real numbers because there is far more real numbers than potential representations. Numbers which cannot be represented exactly (like 0.1 in your case) are rounded to the nearest representable number.

Categories