In Java I am using float to store numbers. I chose float because I work with both integers and decimal numbers of varying size: there can be large integers or large decimal numbers with different numbers of decimal places. But when I insert these numbers into the database, the wrong number is stored. For example:
float value = 0f;
value = 67522665;
System.out.println(value);
This prints 6.7522664E7, and the value is stored in the database as 67522664, not 67522665.
Floating point numbers have limited resolution — roughly 7 significant digits. You are seeing round-off error. You can use a double for more resolution or, for exact arithmetic, use BigDecimal.
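A minimal sketch of the difference, using the value from the question. The class and method names are illustrative only and do not come from the original post.

```java
import java.math.BigDecimal;

class FloatStorageDemo {
    // Returns the integer that is actually held after storing n in a float.
    static long asStoredInFloat(int n) {
        float f = n;      // int -> float conversion rounds to 24 significant bits
        return (long) f;
    }

    public static void main(String[] args) {
        System.out.println(asStoredInFloat(67522665));   // 67522664
        System.out.println((long) (double) 67522665);    // 67522665 (double has 53 significant bits)
        System.out.println(new BigDecimal("67522665"));  // 67522665 (exact decimal)
    }
}
```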
Suggested reading: What Every Computer Scientist Should Know About Floating-Point Arithmetic
Doubles and floats have storage issues.
How is floating point stored?
"The float and double types are designed primarily for scientific and engineering
calculations. They perform binary floating-point arithmetic, which was carefully
designed to furnish accurate approximations quickly over a broad range of magnitudes.
They do not, however, provide exact results and should not be used where
exact results are required."
Don't use float. Use BigDecimal instead. And in my experience with databases, they return their NUMBER-typed elements as BigDecimal. When I fetch them using JDBC, they are BigDecimal objects.
As far as I understand it, this is about the gap size (or ULP, unit in the last place) in the binary representation, that is, the spacing between consecutive floating-point values.
This value is equal to:
2^(e+1-p)
where e is the actual exponent of the number and p is the precision.
Note that the spacing (or gap) increases as the value of the represented number increases:
In IEEE-754 single precision, the precision is p = 24, so when e >= 23 the spacing is at least 1 and only integers can be represented.
2^23 = 8388608 --> 8388608 actually stored IEEE-754
8388608.2 --> 8388608 actually stored IEEE-754
Things get worse as numbers get bigger. For example:
164415560 --> 164415552 actually stored IEEE-754
Ref: The Spacing of Binary Floating-Point Numbers
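The spacing formula can be checked against the standard library. This is a sketch for normal (non-subnormal) float values only; the class name and the `spacing` helper are made up for illustration, while `Math.getExponent`, `Math.scalb` and `Math.ulp` are standard `java.lang.Math` methods.

```java
class UlpDemo {
    static final int P = 24; // precision of float in bits

    // Spacing 2^(e+1-p) computed from the exponent, for normal float values.
    static float spacing(float x) {
        int e = Math.getExponent(x);       // unbiased binary exponent
        return Math.scalb(1.0f, e + 1 - P);
    }

    public static void main(String[] args) {
        System.out.println(spacing(8388608f));       // 1.0
        System.out.println(Math.ulp(8388608f));      // 1.0
        System.out.println(spacing(164415560f));     // 16.0
        System.out.println((long) 164415560f);       // 164415552
    }
}
```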
Related
Multiplication using FLOAT is giving noticeable difference.
public static void main(String[] args) {
// using string and parsing instead of actual data type is part of use case, that is why representing the same here
double v1 = Double.parseDouble("590.0");
double v2 = Double.parseDouble("490.0");
double v3 = Double.parseDouble("391.0");
float v4 = Float.parseFloat("590.0");
float v5 = Float.parseFloat("490.0");
float v6 = Float.parseFloat("391.0");
System.out.println(new BigDecimal(v1 * v2 * v3));
System.out.println(new BigDecimal(v4 * v5 * v6));
System.out.println(BigDecimal.valueOf(Float.parseFloat("289100.0") * Float.parseFloat("391.0")));
System.out.println(BigDecimal.valueOf(Double.parseDouble("289100.0") * Double.parseDouble("391.0")));
}
Output:
113038100 // double multiplication
113038096 // float multiplication
113038096
113038100
For above code,
(590.0 * 490.0 * 391.0) gives 113038100 using double
(590.0 * 490.0 * 391.0) gives 113038096 using float (113038100 - 113038096 = 4 // difference)
I have read through https://floating-point-gui.de/basic/ and understand how floating-point calculation works, but a difference of 4 is unexpected.
Please help me understand the following:
Is this correct?
Does float always give wrong numbers?
As far as I can see, double uses the same technique, so what guarantee do we have of getting a correct result if we use double?
Does float always give wrong numbers?
It depends on the number. If the number can be represented exactly within float's precision, the result will be fine.
"As far as I can see, double uses the same technique, so what guarantee do we have of getting a correct result if we use double?"
double has the same issue, but because it has more precision the problem occurs less often. It can still happen.
So when you need a very precise result, as in a scientific or financial application, you will need to use BigDecimal.
This video explains how floating-point numbers work:
https://www.youtube.com/watch?v=ajaHQ9S4uTA
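To illustrate that double only moves the limit rather than removing it, here is a small sketch. The helper name `exactInDouble` is hypothetical; the idea is simply to round-trip a long through double and see whether it survives.

```java
class DoubleLimitDemo {
    // True if n survives a round trip through double unchanged.
    static boolean exactInDouble(long n) {
        return (long) (double) n == n;
    }

    public static void main(String[] args) {
        System.out.println(exactInDouble(113038100L));         // true
        System.out.println(exactInDouble(9007199254740992L));  // true  (2^53)
        System.out.println(exactInDouble(9007199254740993L));  // false (2^53 + 1)
    }
}
```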
Is this correct?
The Java float format is IEEE-754 binary32. In this format, every finite number is represented as a sign, a 24-bit integer, and a scaling by a power of two from 2^−149 to 2^104. The integer part is called the significand. (The format is often described as a sign, a 24-bit number with a binary point after the first bit, so it has a value in [0, 2), and a scaling from 2^−126 to 2^127. These are mathematically equivalent, and the format used here is noted in the IEEE-754 standard as an option.) In normal form, the 24-bit integer is 2^23 or greater. (Representable numbers less than 2^−126 cannot be represented in normal form and are necessarily subnormal.)
In this format, 590 can be represented as +590•2^0 or +9,666,560•2^−14. 490 is +490•2^0 or +16,056,320•2^−15.
Their product is +289,100•2^0 or +9,251,200•2^−5.
391 is +391•2^0 or +12,812,288•2^−15.
The ordinary arithmetic product of +289,100•2^0 and +391•2^0 is +113,038,100•2^0. However, 113,038,100 is not a 24-bit number; it is a 27-bit number. To get it under 2^24, we can adjust the scaling, multiplying the significand by ⅛ and multiplying the scaling by 8 = 2^3.
That gives us +14,129,762.5•2^3. However, now the significand is not an integer. This result is not representable in the float format. To produce a result, the operation of multiplying in the float format is defined to round the ordinary arithmetic result to the nearest representable value. In this case, there is a tie: we could round the .5 up or down. Ties are resolved by rounding to make the low digit even, so we round to +14,129,762•2^3.
+14,129,762•2^3 is 113,038,096. That is the result you got, so it is correct.
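The walkthrough can be reproduced in a few lines. This is only a sketch: the class and method names are invented for illustration, and the exact product is computed in long arithmetic for comparison.

```java
class FloatProductDemo {
    // Multiplies in float and returns the (integer-valued) result as a long.
    static long floatProduct(float a, float b) {
        return (long) (a * b);
    }

    public static void main(String[] args) {
        long exact = 289100L * 391L;
        float rounded = 289100f * 391f;
        System.out.println(exact);                        // 113038100
        System.out.println(floatProduct(289100f, 391f));  // 113038096
        System.out.println(Math.ulp(rounded));            // 8.0, spacing of floats at this magnitude
        System.out.println(exact % 8);                    // 4, exactly halfway between two floats
    }
}
```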
Does float always give wrong numbers?
This is not wrong; the computer behaved according to its specification.
Observe float is a 32-bit format, but there are infinitely many real numbers. There are even infinitely many rational numbers. It is impossible for a 32-bit format to produce the same results as theoretical real-number arithmetic or rational-number arithmetic. There are simply more possible results than there are representable values.
This is true of the 64-bit double format as well. It is also true of integer formats, fixed-precision formats, and all numerical formats with a fixed number of bits. A fixed number of bits cannot represent infinitely many values.
Your comments suggest you thought floating-point would produce approximate results for fractional values, numbers less than one. But the limitation on how many values can be represented applies at all scales. At each scale (each power of two), only 2^24 values are representable (2^23 in normal form). For scale 2^0, all the non-negative integers below 2^24 are representable. But, above that, only some of the integers are representable. At first, we have to skip every second integer, then every fourth, then every eighth, and so on.
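A short sketch of the skipping behaviour around 2^24 = 16777216. The `roundTrip` helper is hypothetical; it converts an int to float and back so that the nearest representable integer becomes visible.

```java
class IntegerGapDemo {
    // Converts n to float and back, yielding the nearest integer a float can hold.
    static int roundTrip(int n) {
        return (int) (float) n;
    }

    public static void main(String[] args) {
        for (int n = 16777215; n <= 16777220; n++) {
            System.out.println(n + " -> " + roundTrip(n));
        }
        // 16777215 -> 16777215
        // 16777216 -> 16777216
        // 16777217 -> 16777216  (tie, rounded to even significand)
        // 16777218 -> 16777218
        // 16777219 -> 16777220  (tie, rounded to even significand)
        // 16777220 -> 16777220
    }
}
```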
Floating-point arithmetic is designed to approximate real-number arithmetic. It should be used when you want to approximate real-number arithmetic. It should not be used, with rare exceptions, when you want exact arithmetic.
I've written an arbitrary precision rational number class that needs to provide a way to convert to floating-point. This can be done straightforwardly via BigDecimal:
return new BigDecimal(num).divide(new BigDecimal(den), 17, RoundingMode.HALF_EVEN).doubleValue();
but this requires a value for the scale parameter when dividing the decimal numbers. I picked 17 as the initial guess because that is approximately the precision of a double precision floating point number, but I don't know whether that's actually correct.
What would be the correct number to use, defined as, the smallest number such that making it any larger would not make the answer any more accurate?
Introduction
No finite precision suffices.
The problem posed in the question is equivalent to:
What precision p guarantees that converting any rational number x to p decimal digits and then to floating-point yields the floating-point number nearest x (or, in case of a tie, either of the two nearest x)?
To see this is equivalent, observe that the BigDecimal divide shown in the question returns num/den to a selected number of decimal places. The question then asks whether increasing that number of decimal places could increase the accuracy of the result. Clearly, if there is a floating-point number nearer x than the result, then the accuracy could be improved. Thus, we are asking how many decimal places are needed to guarantee the closest floating-point number (or one of the tied two) is obtained.
Since BigDecimal offers a choice of rounding methods, I will consider whether any of them suffices. For the conversion to floating-point, I presume round-to-nearest-ties-to-even is used (which BigDecimal appears to use when converting to Double or Float). I give a proof using the IEEE-754 binary64 format, which Java uses for Double, but the proof applies to any binary floating-point format by changing the 2^52 used below to 2^(w−1), where w is the number of bits in the significand.
Proof
One of the parameters to a BigDecimal division is the rounding method. Java’s BigDecimal has several rounding methods. We only need to consider three, ROUND_UP, ROUND_HALF_UP, and ROUND_HALF_EVEN. Arguments for the others are analogous to those below, by using various symmetries.
In the following, suppose we convert to decimal using any large precision p. That is, p is the number of decimal digits in the result of the conversion.
Let m be the rational number 2^52+1+½−10^−p. The two binary64 numbers neighboring m are 2^52+1 and 2^52+2. m is closer to the first one, so that is the result we require from converting m first to decimal and then to floating-point.
In decimal, m is 4503599627370497.4999…, where there are p−1 trailing 9s. When rounded to p significant digits with ROUND_UP, ROUND_HALF_UP, or ROUND_HALF_EVEN, the result is 4503599627370497.5 = 2^52+1+½. (Recognize that, at the position where rounding occurs, there are 16 trailing 9s being discarded, effectively a fraction of .9999999999999999 relative to the rounding position. In ROUND_UP, any non-zero discarded amount causes rounding up. In ROUND_HALF_UP and ROUND_HALF_EVEN, a discarded amount greater than ½ at that position causes rounding up.)
2^52+1+½ is equally close to the neighboring binary64 numbers 2^52+1 and 2^52+2, so the round-to-nearest-ties-to-even method produces 2^52+2.
Thus, the result is 2^52+2, which is not the binary64 value closest to m.
Therefore, no finite precision p suffices to round all rational numbers correctly.
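The counterexample can be checked numerically for one concrete choice. In this sketch m is built with 30 decimal places and then rounded to 17 places, the scale from the question; both numbers are arbitrary picks for illustration, and the class and method names are invented. It assumes BigDecimal.doubleValue() is correctly rounded, as it is in current JDKs.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class TwoStepRoundingDemo {
    // m = 2^52 + 1 + 1/2 - 10^-places, built exactly as a BigDecimal.
    static BigDecimal m(int places) {
        return new BigDecimal("4503599627370497.5")
                .subtract(BigDecimal.ONE.scaleByPowerOfTen(-places));
    }

    // Rounds to `scale` decimal places first, then converts to double.
    static double twoStep(BigDecimal x, int scale) {
        return x.setScale(scale, RoundingMode.HALF_EVEN).doubleValue();
    }

    public static void main(String[] args) {
        BigDecimal x = m(30);
        System.out.println((long) x.doubleValue());  // 4503599627370497, the nearest double
        System.out.println((long) twoStep(x, 17));   // 4503599627370498, off by one ULP
    }
}
```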
I am writing tests for code performing calculations on floating point numbers. Quite expectedly, the results are rarely exact and I would like to set a tolerance between the calculated and expected result. I have verified that in practice, with double precision, the results are always correct after rounding of last two significant decimals, but usually after rounding the last decimal. I am aware of the format in which doubles and floats are stored, as well as the two main methods of rounding (precise via BigDecimal and faster via multiplication, math.round and division). As the mantissa is stored in binary however, is there a way to perform rounding using base 2 rather than 10?
Just clearing the last 3 bits almost always yields equal results, but if I could push it and instead 'add 2' to the mantissa if its second least significant bit is set, I could probably reach the limit of accuracy. This would be easy enough, except I have no idea how to handle overflow (when all bits 52-1 are set).
A Java solution would be preferred, but I could probably port one for another language if I understood it.
EDIT:
As part of the problem was that my code was generic with regards to arithmetic (relying on scala.Numeric type class), what I did was an incorporation of rounding suggested in the answer into a new numeric type, which carried the calculated number (floating point in this case) and rounding error, essentially representing a range instead of a point. I then overrode equals so that two numbers are equal if their error ranges overlap (and they share arithmetic, i.e. the number type).
Yes, rounding off binary digits makes more sense than going through BigDecimal and can be implemented very efficiently if you are not worried about being within a small factor of Double.MAX_VALUE.
You can round a floating-point double value x with the following sequence in Java (untested):
double t = 9 * x; // beware: this overflows if x is too close to Double.MAX_VALUE
double y = x - t + t;
After this sequence, y should contain the rounded value. Adjust the distance between the two set bits in the constant 9 in order to adjust the number of bits that are rounded off. The value 3 rounds off one bit. The value 5 rounds off two bits. The value 17 rounds off four bits, and so on.
This sequence of instructions is attributed to Veltkamp and is typically used in “Dekker multiplication”. This page has some references.
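A sketch of the same sequence wrapped in a method, with the constant derived from the number of bits to round off. The method name and the `bits` parameter are assumptions added for illustration; like the original sequence, it assumes the multiplication does not overflow.

```java
class BinaryRoundingDemo {
    // Rounds off the lowest `bits` bits of x's significand (round to nearest).
    // Assumes bits is small (well below 52) and (2^bits + 1) * x does not overflow.
    static double roundOffBits(double x, int bits) {
        double c = (1L << bits) + 1;   // 9 when bits == 3
        double t = c * x;
        return (x - t) + t;            // same evaluation order as "x - t + t"
    }

    public static void main(String[] args) {
        double u = Math.ulp(1.0);                  // 2^-52
        double y = roundOffBits(1.0 + 7 * u, 3);
        System.out.println((y - 1.0) / u);         // 8.0: 1 + 7u was rounded to 1 + 8u
        System.out.println(roundOffBits(1.0 + u, 3) == 1.0);  // true
    }
}
```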
Our teacher asked us to search about this and what I kept on getting from the net are explanations stating what double and float means.
Can you tell me whether it is possible or not, and explain why or why not?
Simple answer: yes, but only if the double is not too large.
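floats are single-precision floating-point numbers, meaning they use a 23-bit mantissa and 8-bit exponent, corresponding to ~6-7 significant figures of precision and a range of ~10^38.
doubles are double-precision, with a 52-bit mantissa and 11-bit exponent, corresponding to ~15-16 significant figures of precision and a range of ~10^308.
Since doubles have a larger range than floats, adding a float to a very large double can leave the double unchanged: the float's contribution is smaller than the spacing between adjacent doubles at that magnitude and is rounded away. (This is often called absorption; it is not underflow, which refers to results too close to zero to be represented normally.) Of course this can happen when adding two doubles as well.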
https://en.wikipedia.org/wiki/Floating_point
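A small sketch of a float being swallowed by a large double. The values 1e20 and 1e6 and the helper name `absorbed` are chosen only for illustration.

```java
class AbsorptionDemo {
    // True if adding `small` leaves `big` unchanged.
    static boolean absorbed(double big, float small) {
        return big + small == big;
    }

    public static void main(String[] args) {
        System.out.println(Math.ulp(1e20));       // 16384.0, spacing of doubles near 1e20
        System.out.println(absorbed(1e20, 1f));   // true: 1 is far below half the spacing
        System.out.println(absorbed(1e6, 1f));    // false
    }
}
```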
Can you add two numbers with varying decimal places (e.g. 432.54385789364 + 432.1)? Yes you can.
In Java, it is the same idea.
From the Java Tutorials:
float: The float data type is a single-precision 32-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. As with the recommendations for byte and short, use a float (instead of double) if you need to save memory in large arrays of floating point numbers. This data type should never be used for precise values, such as currency. For that, you will need to use the java.math.BigDecimal class instead. Numbers and Strings covers BigDecimal and other useful classes provided by the Java platform.
double: The double data type is a double-precision 64-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. For decimal values, this data type is generally the default choice. As mentioned above, this data type should never be used for precise values, such as currency.
Basically, they both hold decimal numbers. They differ in how precise they can be. A float is 32 bits in size, compared to a double, which is 64 bits. A float has a precision of around 6 to 7 significant decimal digits, and a double around 15 to 16.
Basically... a double can store a decimal better than a float... but takes up more space.
To answer your question, you can add a float to a double and vice versa. Generally, the result will be made into a double, and you will have to cast it back to a float if that is what you want.
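As a sketch of what that looks like in code (class and method names are illustrative): the float operand is widened to double before the addition, and narrowing the result back to float needs an explicit cast.

```java
class MixedAdditionDemo {
    // The float argument is widened to double, so the sum is a double.
    static double add(float f, double d) {
        return f + d;
    }

    public static void main(String[] args) {
        float f = 1.5f;
        double d = 2.25;
        double sum = add(f, d);
        float narrowed = (float) (f + d);   // explicit cast required
        // float bad = f + d;               // does not compile: possible lossy conversion
        System.out.println(sum);            // 3.75
        System.out.println(narrowed);       // 3.75
        System.out.println(0.1f + 0.1 == 0.2);  // false: 0.1f widened to double is not 0.1
    }
}
```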
If you want to go deeper, you can say that yes, it is possible due to value coercion, but that it opens the door for more severe precision errors to accumulate invisibly to the compiler. float has substantially less precision than double. In Java source, a floating-point literal without a suffix is a double, and a float literal needs the f suffix; the d suffix is optional but makes the intent explicit. In practice, make sure your literals are double if you have to use floating point.
These precision errors can lead to serious harm and even loss of life in sensitive systems.
Floating point is very hard to use correctly and should be avoided if possible. One extremely obvious thing not to do that is commonly mistakenly done is representing currency as a float or double. This can cause real money to be effectively given to or stolen from people.
Floating point (preferring double) is appropriate for approximate calculations and certain high performance scientific computing applications. However it is still extremely important to be aware of the precision loss characteristics particularly when a resulting floating point value is fed into further floating-point calculations.
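A minimal sketch of error accumulating across repeated additions, compared with BigDecimal. The amount 0.10 and the repeat count are arbitrary, and the names are illustrative.

```java
import java.math.BigDecimal;

class AccumulationDemo {
    // Adds `amount` to itself `times` times using double arithmetic.
    static double sumDouble(double amount, int times) {
        double total = 0.0;
        for (int i = 0; i < times; i++) total += amount;
        return total;
    }

    // Same sum using exact decimal arithmetic.
    static BigDecimal sumDecimal(String amount, int times) {
        BigDecimal total = BigDecimal.ZERO;
        BigDecimal step = new BigDecimal(amount);
        for (int i = 0; i < times; i++) total = total.add(step);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumDouble(0.1, 10));      // 0.9999999999999999
        System.out.println(sumDecimal("0.10", 10));  // 1.00
    }
}
```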
More generally, this leads into numerical computing, and now I've really gone afield :)
SAS has a decent paper on this:
http://support.sas.com/resources/papers/proceedings11/275-2011.pdf
I have the following statement:
float diff = tempVal - m_constraint.getMinVal();
tempVal is declared as a float and the getMinVal() returns a float value.
I have the following print out:
diff=0.099999905, tempVal=5.1, m_constraint.getMinVal()=5.0
I expect diff to be 0.1, not the number above. How can I do that?
Floats use IEEE 754 to represent numbers, and that system has some rounding errors.
Floating point guide
What Every Computer Scientist Should Know About Floating-Point Arithmetic
Wikipedia on IEEE 754
Bottom line: if you are doing arithmetic and it needs to be exact, don't use float or double; use BigDecimal.
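Applied to the subtraction from the question, a sketch might look like this. It assumes the values are available as decimal strings, since constructing a BigDecimal from a float would carry the float's rounding error along; the class and method names are invented.

```java
import java.math.BigDecimal;

class ExactDiffDemo {
    // Subtracts two decimal values given as strings, without binary rounding.
    static BigDecimal diff(String a, String b) {
        return new BigDecimal(a).subtract(new BigDecimal(b));
    }

    public static void main(String[] args) {
        System.out.println(5.1f - 5.0f);         // 0.099999905
        System.out.println(diff("5.1", "5.0"));  // 0.1
    }
}
```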
Because of the way they store values internally, floats and doubles can only store completely accurately numbers which can be decomposed into a sum of powers of 2 (and then, within certain constraints relating to their absolute and relative magnitude).
So as soon as you attempt to store, or perform a calculation involving, a number which cannot be stored exactly, you are going to get an error in the final digit.
Usually this isn't a problem provided you use floats and doubles with some precaution:
use a size of floating point primitive which has "spare" digits of precision beyond what you need;
for many applications, this probably means don't use float at all (use double instead): it has very poor precision and, with the exception of division, has no performance benefit on many processors;
when printing FP numbers, only actually print and consider the number of digits of precision that you need, and certainly don't include the final digit (use String.format to help you);
if you need arbitrary number of digits of precision, use BigDecimal instead.
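For the printing advice, a short sketch using String.format with a fixed number of decimal places. Locale.ROOT is used here only so the decimal separator is predictable; the helper name is hypothetical.

```java
import java.util.Locale;

class PrintDigitsDemo {
    // Formats a value with two decimal places, hiding the noisy trailing digits.
    static String twoDecimals(float value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    public static void main(String[] args) {
        float diff = 5.1f - 5.0f;
        System.out.println(diff);               // 0.099999905
        System.out.println(twoDecimals(diff));  // 0.10
    }
}
```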
You cannot get exact results with floating point numbers. You might need to use a FixedPoint library for that. See : http://sourceforge.net/projects/jmfp/
Java encodes real numbers using binary floating-point representations defined in IEEE 754. Like all finite representations, it cannot accurately represent all real numbers, because there are far more real numbers than potential representations. Numbers which cannot be represented exactly (like 0.1 in your case) are rounded to the nearest representable number.
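One way to see the value that is actually stored is to convert the float to a BigDecimal, which expands the binary value exactly. A small sketch, with illustrative names:

```java
import java.math.BigDecimal;

class StoredValueDemo {
    // Returns the exact decimal expansion of the value held in the float.
    static String exact(float f) {
        return new BigDecimal(f).toString();   // f is widened to double without loss
    }

    public static void main(String[] args) {
        System.out.println(exact(0.1f));  // 0.100000001490116119384765625
        System.out.println(exact(0.5f));  // 0.5 (a power of two, stored exactly)
    }
}
```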