How do you get the mantissa of a float in java? - java

I'm trying to get the mantissa of a float (just to learn), but it isn't working as expected.
The mantissa of say 5.3 is 53, right? I tried this code:
System.out.println(Float.floatToIntBits(5.3f) & 0x7FFFFF);
It printed 2726298. Shouldn't it remove the exponent bits and leave 53? I tried plenty of things, but this always happens. What am I doing wrong here?

The formula for single precision following the IEEE standard is:
(-1)^sign * 1.Mantissa * 2^(Exponent - Bias)
So 5.3 base 10 is 101.01001100110011001100110011... base 2 (the fraction repeats forever)
101.01001100110011001100110011... = 1.0101001100110011001100110011... * 2^2
2^2 = 2^(exp - bias) having bias = 127 (according to the IEEE standard for single precision)
so: exp - 127 = 2 => exp = 129 base 10 or 10000001 base 2
The repeating fraction is rounded to the nearest 23-bit mantissa, which rounds the last bit up. Single precision table:
0 | 10000001 | 01010011001100110011010
Sign = 0
Exp = 129
Mantissa = 2726298, which is exactly the number your code printed, so your code is working as expected

According to the IBM developerWorks article "Java's new math: Floating-point numbers" (the linked copy is in Russian), the simplest way to get the mantissa is:
public static double getMantissa(double x) {
    int exponent = Math.getExponent(x);
    return x / Math.pow(2, exponent);
}
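To confirm the accepted explanation at the bit level, here is a small sketch (my addition, not part of the original answer) that pulls the sign, biased exponent, and mantissa out of 5.3f and reassembles the value:

```java
public class MantissaDemo {
    public static void main(String[] args) {
        int bits = Float.floatToIntBits(5.3f);
        int sign = bits >>> 31;                 // 1 sign bit
        int exponent = (bits >>> 23) & 0xFF;    // 8 biased exponent bits
        int mantissa = bits & 0x7FFFFF;         // 23 mantissa bits
        System.out.println(sign);               // 0
        System.out.println(exponent - 127);     // 2 (unbiased)
        System.out.println(mantissa);           // 2726298
        // Reassemble: (1 + mantissa/2^23) * 2^(exponent - 127)
        double value = (1.0 + mantissa / (double) (1 << 23)) * Math.pow(2, exponent - 127);
        System.out.println((float) value);      // 5.3
    }
}
```

The mantissa comes back as 2726298 because the repeating binary fraction of 5.3 is rounded to the nearest 23-bit value before being stored.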

Related

Why `2.0 - 1.1` and `2.0F - 1.1F` produce different results?

I am working on code where I am comparing double and float values:
class Demo {
    public static void main(String[] args) {
        System.out.println(2.0 - 1.1);           // 0.8999999999999999
        System.out.println(2.0 - 1.1 == 0.9);    // false
        System.out.println(2.0F - 1.1F);         // 0.9
        System.out.println(2.0F - 1.1F == 0.9F); // true
        System.out.println(2.0F - 1.1F == 0.9);  // false
    }
}
Output is given below:
0.8999999999999999
false
0.9
true
false
I believe a double can hold more precision than a float.
Please explain this: it looks like the float result does not lose precision, but the double result does. Why?
Edit:
@goodvibration I'm aware that 0.9 cannot be stored exactly in any computer language; I'm just confused about how Java handles this in detail: why 2.0F - 1.1F == 0.9F, but 2.0 - 1.1 != 0.9. Another interesting finding may help:
class Demo {
    public static void main(String[] args) {
        System.out.println(2.0 - 0.9);           // 1.1
        System.out.println(2.0 - 0.9 == 1.1);    // true
        System.out.println(2.0F - 0.9F);         // 1.1
        System.out.println(2.0F - 0.9F == 1.1F); // true
        System.out.println(2.0F - 0.9F == 1.1);  // false
    }
}
I know I can't count on float or double precision, but I can't figure this out and it's driving me crazy. What's really going on here? Why does 2.0 - 0.9 == 1.1 hold, while 2.0 - 1.1 != 0.9?
The difference between float and double:
IEEE 754 single-precision binary floating-point format
IEEE 754 double-precision binary floating-point format
Let's run your numbers in a simple C program, in order to get their binary representations:
#include <stdio.h>

typedef union {
    float val;
    struct {
        unsigned int fraction : 23;
        unsigned int exponent : 8;
        unsigned int sign : 1;
    } bits;
} F;

typedef union {
    double val;
    struct {
        unsigned long long fraction : 52;
        unsigned long long exponent : 11;
        unsigned long long sign : 1;
    } bits;
} D;

int main() {
    F f = {(float)(2.0 - 1.1)};
    D d = {(double)(2.0 - 1.1)};
    printf("%d %d %d\n", f.bits.sign, f.bits.exponent, f.bits.fraction);
    printf("%lld %lld %lld\n", d.bits.sign, d.bits.exponent, d.bits.fraction);
    return 0;
}
The printout of this code is:
0 126 6710886
0 1022 3602879701896396
Based on the two format specifications above, let's convert these numbers to rational values.
In order to achieve high accuracy, let's do this in a simple Python program:
from decimal import Decimal
from decimal import getcontext

getcontext().prec = 100
TWO = Decimal(2)

def convert(sign, exponent, fraction, e_len, f_len):
    return (-1) ** sign * TWO ** (exponent - 2 ** (e_len - 1) + 1) * (1 + fraction / TWO ** f_len)

def toFloat(sign, exponent, fraction):
    return convert(sign, exponent, fraction, 8, 23)

def toDouble(sign, exponent, fraction):
    return convert(sign, exponent, fraction, 11, 52)

f = toFloat(0, 126, 6710886)
d = toDouble(0, 1022, 3602879701896396)
print('{:.40f}'.format(f))
print('{:.40f}'.format(d))
The printout of this code is:
0.8999999761581420898437500000000000000000
0.8999999999999999111821580299874767661094
If we print these two values while specifying between 8 and 15 decimal digits, then we shall experience the same thing that you have observed (the double value printed as 0.9, while the float value printed as close to 0.9):
In other words, this code:
for n in range(8, 15 + 1):
    string = '{:.' + str(n) + 'f}'
    print(string.format(f))
    print(string.format(d))
Gives this printout:
0.89999998
0.90000000
0.899999976
0.900000000
0.8999999762
0.9000000000
0.89999997616
0.90000000000
0.899999976158
0.900000000000
0.8999999761581
0.9000000000000
0.89999997615814
0.90000000000000
0.899999976158142
0.900000000000000
Our conclusion is therefore that Java prints decimals with a precision of between 8 and 15 digits by default.
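As a cross-check from the Java side (a sketch of mine, not part of the original answer): Float.toString and Double.toString actually print the shortest decimal that round-trips back to the same value, which is why the digit counts land in that 8-15 range:

```java
public class DefaultFormatting {
    public static void main(String[] args) {
        float f = 2.0f - 1.1f;
        double d = 2.0 - 1.1;
        System.out.println(Float.toString(f));  // 0.9
        System.out.println(Double.toString(d)); // 0.8999999999999999
        // Parsing the printed string recovers exactly the same value
        System.out.println(Float.parseFloat(Float.toString(f)) == f);    // true
        System.out.println(Double.parseDouble(Double.toString(d)) == d); // true
    }
}
```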
Nice question BTW...
Pop quiz: Represent 1/3rd, in decimal.
Answer: You can't; not precisely.
Computers count in binary, and there are many more numbers that 'cannot be completely represented'. Just as in the decimal question, where with only a small piece of paper to write on you might go with 0.3333333 and call it a day, ending up with a number quite close to, but not exactly, 1/3, so computers approximate fractions.
Or, think about it this way: a float occupies 32-bits; a double occupies 64. There are only 2^32 (about 4 billion) different numbers that a 32-bit value can represent. And yet, even between 0 and 1 there are an infinite amount of numbers. So, given that there are at most 2^32 specific, concrete numbers that are 'representable precisely' as a float, any number that isn't in that blessed set of about 4 billion values, is not representable. Instead of just erroring out, you simply get the one in this pool of 4 billion values that IS representable, and is the closest number to the one you wanted.
In addition, because computers count in binary and not decimal, your sense of what is 'representable' and what isn't, is off. You may think that 1/3 is a big problem, but surely 1/10 is easy, right? That's simply 0.1 and that is a precise representation. Ah, but, a tenth works well in decimal. After all, decimal is based around the number 10, no surprise there. But in binary? a half, a fourth, an eighth, a sixteenth: Easy in binary. A tenth? That is as difficult as a third: NOT REPRESENTABLE.
0.9 is, itself, not a representable number. And yet, when you printed your float, that's what you got.
The reason is, printing floats/doubles is an art, more than a science. Given that only a few numbers are representable, and given that these numbers don't feel 'natural' to humans due to the binary v. decimal thing, you really need to add a 'rounding' strategy to the number or it'll look crazy (nobody wants to read 0.899999999999999999765). And that is precisely what System.out.println and co do.
But you really should take control of the rounding: never use System.out.println to print doubles and floats. Use System.out.printf("%.6f", yourDouble); instead, and in this case both would print 0.9. Neither type can represent 0.9 precisely. For float, take the representable value closest to 2.0 (which is 2.0 itself) and the one closest to 1.1 (which is not exactly 1.1), subtract them, and round to the nearest representable float: that result happens to print as 0.9. The corresponding double result does not print as 0.9.
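To see what is actually stored behind those comparisons, here is a sketch of mine (not part of the original answer) that prints the exact decimal expansion of each value with BigDecimal, plus the controlled rounding recommended above:

```java
import java.math.BigDecimal;
import java.util.Locale;

public class ExactValues {
    public static void main(String[] args) {
        // Exact decimal expansions of the stored binary values
        System.out.println(new BigDecimal(2.0f - 1.1f)); // 0.89999997615814208984375
        System.out.println(new BigDecimal(0.9f));        // 0.89999997615814208984375 (the same float!)
        System.out.println(new BigDecimal(2.0 - 1.1));   // 0.899999999999999911182158029987...
        System.out.println(new BigDecimal(0.9));         // 0.900000000000000022204460492503...
        // Controlled rounding makes both look alike
        System.out.printf(Locale.US, "%.6f%n", (double) (2.0f - 1.1f)); // 0.900000
        System.out.printf(Locale.US, "%.6f%n", 2.0 - 1.1);              // 0.900000
    }
}
```

The float subtraction happens to land exactly on the float nearest 0.9, so the == comparison succeeds; the double subtraction lands one step away from the double nearest 0.9, so it fails.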

Java Math.IEEERemainder confusing result

Java's Math.IEEERemainder function states:
The remainder value is mathematically equal to f1 - f2 × n, where n is
the mathematical integer closest to the exact mathematical value of
the quotient f1/f2, and if two mathematical integers are equally close
to f1/f2, then n is the integer that is even
For the following:
double f1 = 0.1;
double f2 = 0.04;
System.out.println(Math.IEEEremainder(f1, f2));
The output is -0.019999999999999997
However, 0.1/0.04 = 2.5 which is equidistant from both the integers 2 and 3. Shouldn't we pick n = 2 here, resulting in 0.1 - 0.04*2 = 0.02, instead of -0.02 ?
See: Is floating point math broken?
You would think that the quotient of 0.1 / 0.04 is exactly 2.5, but that's not true. According to this article, 0.1 cannot be accurately represented using IEEE 754, and is actually stored as 0.100000000000000005551....
IEEEremainder works with the exact mathematical quotient of the stored values; due to that minuscule offset it is slightly above 2.5, so it is no longer equidistant between 2 and 3, and n becomes 3.
Computing it results in the following:
0.1 - 0.04 * 3 = 0.1 - 0.12 = -0.02 ~= -0.019999999999999997
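This can be checked from Java itself. The following sketch (an illustration of mine, not part of the original answer) prints the exact stored values, whose quotient is slightly above 2.5:

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class RemainderQuotient {
    public static void main(String[] args) {
        BigDecimal f1 = new BigDecimal(0.1);   // exact stored value of 0.1
        BigDecimal f2 = new BigDecimal(0.04);  // exact stored value of 0.04
        System.out.println(f1); // 0.1000000000000000055511151231257827...
        System.out.println(f2); // 0.0400000000000000008326672684688674...
        // The exact quotient is just above 2.5, so the nearest integer n is 3
        System.out.println(f1.divide(f2, MathContext.DECIMAL64));
        System.out.println(Math.IEEEremainder(0.1, 0.04)); // -0.019999999999999997
    }
}
```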

Why double has specific range of values? (Java)

Why does double in Java have the specific range of values from ±5.0*10^(-324) to ±1.7*10^308? I mean, why isn't it something like ±5.0*10^(-324) to ±5.0*10^308, or ±1.7*10^(-324) to ±1.7*10^308?
Answer to your question is subnormal numbers, check following link
https://en.wikipedia.org/wiki/Denormal_number
Double floating point numbers in Java are based on the format defined in IEEE 754.
See this link for the explanation.
https://en.wikipedia.org/wiki/Double-precision_floating-point_format
Following is a simple set of rules
Floating point number is represented in 64 bits
64 bits are divided in following
Sign bit: 1 bit (sign of the number)
Exponent: 11 bits (signed)
Significand precision (Fraction): 52 bits
Number range that we get from this setup is
-1022 <= Exponent <= 1023 (total 2046) (excluding 0 and 2047, they have special meanings)
000 (0 in base 16) is used to represent a signed zero (if F=0) and subnormals (if F≠0); and
7ff (2047 in base 16) is used to represent ∞ (if F=0) and NaNs (if F≠0),
https://en.wikipedia.org/wiki/Exponent_bias
and
0 <= Fraction <= 2^52 - 1
So the minimum and maximum positive numbers that can be represented as normal doubles are
Min positive normal double = 1 * 2^(-1022) ≈ 2.225 * 10^(-308)
Note: 1022 * Math.log(2) / Math.log(10) = 307.652
and Math.pow(10, 1 - 0.652) = 2.228 (0.652 is an approximation)
Max positive double = (2 - 2^(-52)) * 2^1023 ≈ 1.798 * 10^308
So the magnitude range of normal doubles is [2.225 * 10^(-308), 1.798 * 10^308]
This range changes due to subnormal numbers
Subnormal number is a number that is smaller than the minimum normal
number defined by the specification.
If I have the number 0.00123, it would be represented as 1.23 * 10^(-3). Normalized floating point numbers by specification don't have leading zeros in the significand: leading zeros are instead absorbed into a smaller (more negative) exponent. Subnormal numbers are the exception: once the exponent has hit its minimum, the significand itself is allowed to carry leading zeros.
There are 52 bits for the significand (fraction), so there can be at most 51 leading zeros in binary, which effectively produces the following number:
Min positive subnormal = 1 * 2^(-52) * 2^(-1022) = 2^(-1074) ≈ 4.9 * 10^(−324)
Note: 1074 * Math.log(2) / Math.log(10) = 323.306
Math.pow(10, 1 - 0.306) = 4.943
So there you have it, range is now
[min subnormal magnitude, max normal magnitude]
or
±[4.9 * 10^(−324), 1.79769 * 10^308]
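The limits derived above are exposed as constants on Double, so they can be sanity-checked directly (a quick sketch of mine, not part of the original answer):

```java
public class DoubleRange {
    public static void main(String[] args) {
        // Smallest positive subnormal: 2^(-1074)
        System.out.println(Double.MIN_VALUE);     // 4.9E-324
        // Smallest positive normal: 2^(-1022)
        System.out.println(Double.MIN_NORMAL);    // 2.2250738585072014E-308
        // Largest finite: (2 - 2^(-52)) * 2^1023
        System.out.println(Double.MAX_VALUE);     // 1.7976931348623157E308
        // Halving the smallest subnormal underflows to zero (round half to even)
        System.out.println(Double.MIN_VALUE / 2); // 0.0
    }
}
```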

Why maximum integer number multiplication gives 1 as result

I don't think much explanation is needed: why does the calculation below give 1 as the result?
int a = 2147483647;
int b = 2147483647;
int c = a * b;
long d = a * b;
double e = a * b;
System.out.println(c); //1
System.out.println(d); //1
System.out.println(e); //1.0
The binary representation of the integer number 2147483647 is as following:
01111111 11111111 11111111 11111111
Multiplying this with itself results in the number 4611686014132420609 whose binary representation is:
00111111 11111111 11111111 11111111 00000000 00000000 00000000 00000001
This is too large for the int type which has only 32 bits. The multiplication of a * b is done as an integer multiplication only, regardless of the variable's type to which the result is assigned (which might do a widening conversion, but only after the multiplication).
So, the result simply cuts off all bits that do not fit into the 32 bits, leaving only the following result:
00000000 00000000 00000000 00000001
And this simply is the value 1.
If you want to keep the information, you must do the multiplication with the long type, which has 64 bits:
long a = 2147483647;
long b = 2147483647;
long mult = a * b;
System.out.println((int) mult); // 1
System.out.println(mult); // 4611686014132420609
System.out.println((double) mult); // 4.6116860141324206E18
If you need more bits for a calculation, you might consider BigInteger (for integral numbers) or BigDecimal (for decimal numbers).
2147483647 * 2147483647 = 4611686014132420609
Which in hex is 3FFFFFFF 00000001; after truncation only 00000001 remains, which represents 1.
First of all, the reason that the three attempts all give the same answer is that they are all performing 32 bit multiplications and the multiplication overflows, resulting in "loss of information". The overflow / loss of information happens before the value of the RHS1 expression is assigned to the variable on the LHS.
In the 2nd and 3rd case you could cause the expression to be evaluated using 64 bit or floating point:
int c = a * b;
long d = ((long) a) * b;
double e = ((double) a) * b;
and you would not get overflow.
As to why you get overflow in the 32 bit case, that is simple. The result is larger than 32 bits. The other answers explain why the answer is 1.
Just for fun, here is an informal proof.
Assume that we are talking about a modular number system with numbers in the range -2^(N-1) to 2^(N-1) - 1. In such a number system, X * 2^N maps to zero ... for all integers X.
If we multiply the max value by itself we get
(2^(N-1) - 1) * (2^(N-1) - 1)
-> 2^(2N-2) - 2 * 2^(N-1) + 1
-> 2^(2N-2) - 2^N + 1
Now map that into the original range:
2^(2N-2) maps to 0
2^N maps to 0
1 maps to 1
0 - 0 + 1 -> 1
1 - LHS == left hand side, RHS == right hand side.
What you are seeing is simply the result of integer overflow, which follows this rule:
Integer.MAX_VALUE + 1 = Integer.MIN_VALUE
One way to see what is happening is to imagine a tiny version of Java's int type ranging from -8 to 7, with the same wrap-around rule still applying. Let's see what happens when we compute 7*7 by repeated addition:
7 + 7 = 14 -> -2 (7 x 2)
-2 + 7 = 5 (7 x 3)
5 + 7 = 12 -> -4 (7 x 4)
-4 + 7 = 3 (7 x 5)
3 + 7 = 10 -> -6 (7 x 6)
-6 + 7 = 1 (7 x 7, one is leftover)
The same thing is happening in your code, with 2147483647 overflowing according to:
2147483647 + 1 = -2147483648
Because:
2147483647 * 2147483647 = 4611686014132420609
Integer capacity = 4294967296 (2^32 distinct values)
4611686014132420609 % 4294967296 = 1
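To see the truncation from Java directly, here is a short sketch (my addition, not from the answers above); Math.multiplyExact, available since Java 8, turns the silent wrap-around into an exception:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        long exact = 2147483647L * 2147483647L;      // 64-bit multiply, no overflow
        System.out.println(exact);                   // 4611686014132420609
        System.out.println(Long.toHexString(exact)); // 3fffffff00000001
        System.out.println((int) exact);             // 1 (keeps only the low 32 bits)
        try {
            Math.multiplyExact(2147483647, 2147483647); // int overload overflows
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```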
An int has only 4 bytes of memory, while a long and a double have 8. Your multiplication result is too large for an int, so only the low-order bits survive, which are (0000 0001)2 = 1.

convert float to double losing precision [duplicate]

I've been trying to find out the reason, but I couldn't.
Can anybody help me?
Look at the following example.
float f = 125.32f;
System.out.println("value of f = " + f);
double d = (double) 125.32f;
System.out.println("value of d = " + d);
This is the output:
value of f = 125.32
value of d = 125.31999969482422
The value of a float does not change when converted to a double. There is a difference in the displayed numerals because more digits are required to distinguish a double value from its neighbors, which is required by the Java documentation. That is the documentation for toString, which is referred (through several links) from the documentation for println.
The exact value for 125.32f is 125.31999969482421875. The two neighboring float values are 125.3199920654296875 and 125.32000732421875. Observe that 125.32 is closer to 125.31999969482421875 than to either of the neighbors. Therefore, by displaying “125.32”, Java has displayed enough digits so that conversion back from the decimal numeral to float reproduces the value of the float passed to println.
The two neighboring double values of 125.31999969482421875 are 125.3199996948242045391452847979962825775146484375 and 125.3199996948242329608547152020037174224853515625.
Observe that 125.32 is closer to the latter neighbor than to the original value (125.31999969482421875). Therefore, printing “125.32” does not contain enough digits to distinguish the original value. Java must print more digits in order to ensure that a conversion from the displayed numeral back to double reproduces the value of the double passed to println.
When you convert a float into a double, there is no loss of information. Every float can be represented exactly as a double.
On the other hand, neither decimal representation printed by System.out.println is the exact value for the number. An exact decimal representation could require up to about 760 decimal digits. Instead, System.out.println prints exactly the number of decimal digits that allow to parse the decimal representation back into the original float or double. There are more doubles, so when printing one, System.out.println needs to print more digits before the representation becomes unambiguous.
The conversion from float to double is a widening conversion, as specified by the JLS. A widening conversion is defined as an injective mapping of a smaller set into its superset. Therefore the number being represented does not change after a conversion from float to double.
More information regarding your updated question
In your update you added an example which is supposed to demonstrate that the number has changed. However, it only shows that the string representation of the number has changed, which indeed it has due to the additional precision acquired through the conversion to double. Note that your first output is just a rounding of the second output. As specified by Double.toString,
There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type double.
Since the adjacent values in the type double are much closer than in float, more digits are needed to comply with that ruling.
The 32bit IEEE-754 floating point number closest to 125.32 is in fact 125.31999969482421875. Pretty close, but not quite there (that's because 0.32 is repeating in binary).
When you cast that to a double, it's the value 125.31999969482421875 that will be made into a double (125.32 is nowhere to be found at this point, the information that it should really end in .32 is completely lost) and of course can be represented exactly by a double. When you print that double, the print routine thinks it has more significant digits than it really has (but of course it can't know that), so it prints to 125.31999969482422, which is the shortest decimal that rounds to that exact double (and of all decimals of that length, it is the closest).
The issue of the precision of floating-point numbers is really language-agnostic, so I'll be using MATLAB in my explanation.
The reason you see a difference is that certain numbers are not exactly representable in fixed number of bits. Take 0.1 for example:
>> format hex
>> double(0.1)
ans =
3fb999999999999a
>> double(single(0.1))
ans =
3fb99999a0000000
So the error in the approximation of 0.1 in single-precision gets bigger when you cast it as double-precision floating-point number. The result is different from its approximation if you started directly in double-precision.
>> double(single(0.1)) - double(0.1)
ans =
1.490116113833651e-09
As already explained, all floats can be exactly represented as a double and the reason for your issue is that System.out.println performs some rounding when displaying the value of a float or double but the rounding methodology is not the same in both cases.
To see the exact value of the float, you can use a BigDecimal:
float f = 125.32f;
System.out.println("value of f = " + new BigDecimal(f));
double d = (double) 125.32f;
System.out.println("value of d = " + new BigDecimal(d));
which outputs:
value of f = 125.31999969482421875
value of d = 125.31999969482421875
A side note: Java treats real literals as double by default. If you declare a float variable but write the literal without the float suffix f (as in 123.45f), the literal is taken as a double and the assignment causes a compile-time error about a possible loss of precision.
The representation of the values changes due to the contracts of the methods that convert numerical values to a String, namely java.lang.Float#toString(float) and java.lang.Double#toString(double), while the actual value remains the same. The Javadoc of both methods shares a common passage that spells out the requirements for a value's String representation:
There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values
To illustrate the similarity of significant parts for values of both types, the following snippet can be run:
package com.my.sandbox.numbers;

public class FloatToDoubleConversion {

    public static void main(String[] args) {
        float f = 125.32f;
        floatToBits(f);
        double d = (double) f;
        doubleToBits(d);
    }

    private static void floatToBits(float floatValue) {
        System.out.println();
        System.out.println("Float.");
        System.out.println("String representation of float: " + floatValue);
        int bits = Float.floatToIntBits(floatValue);
        int sign = bits >>> 31;
        int exponent = (bits >>> 23 & ((1 << 8) - 1)) - ((1 << 7) - 1);
        int mantissa = bits & ((1 << 23) - 1);
        System.out.println("Bytes: " + Long.toBinaryString(Float.floatToIntBits(floatValue)));
        System.out.println("Sign: " + Long.toBinaryString(sign));
        System.out.println("Exponent: " + Long.toBinaryString(exponent));
        System.out.println("Mantissa: " + Long.toBinaryString(mantissa));
        System.out.println("Back from parts: " + Float.intBitsToFloat((sign << 31) | (exponent + ((1 << 7) - 1)) << 23 | mantissa));
    }

    private static void doubleToBits(double doubleValue) {
        System.out.println();
        System.out.println("Double.");
        System.out.println("String representation of double: " + doubleValue);
        long bits = Double.doubleToLongBits(doubleValue);
        long sign = bits >>> 63;
        long exponent = (bits >>> 52 & ((1 << 11) - 1)) - ((1 << 10) - 1);
        long mantissa = bits & ((1L << 52) - 1);
        System.out.println("Bytes: " + Long.toBinaryString(Double.doubleToLongBits(doubleValue)));
        System.out.println("Sign: " + Long.toBinaryString(sign));
        System.out.println("Exponent: " + Long.toBinaryString(exponent));
        System.out.println("Mantissa: " + Long.toBinaryString(mantissa));
        System.out.println("Back from parts: " + Double.longBitsToDouble((sign << 63) | (exponent + ((1 << 10) - 1)) << 52 | mantissa));
    }
}
In my environment, the output is:
Float.
String representation of float: 125.32
Bytes: 1000010111110101010001111010111
Sign: 0
Exponent: 110
Mantissa: 11110101010001111010111
Back from parts: 125.32
Double.
String representation of double: 125.31999969482422
Bytes: 100000001011111010101000111101011100000000000000000000000000000
Sign: 0
Exponent: 110
Mantissa: 1111010101000111101011100000000000000000000000000000
Back from parts: 125.31999969482422
This way, you can see that the values' sign and exponent are the same, while the mantissa was extended (zero-padded) but retained its significant part (11110101010001111010111) exactly.
The extraction logic for the floating point number's parts follows 1 and 2.
Both float and double are what Microsoft refers to as "approximate number data types".
There's a reason. A float has a precision of about 7 decimal digits, and a double about 15. But I have seen it happen many times that 8.0 - 1.0 yields something like 6.999999999. This is because these types are not guaranteed to represent a decimal fraction exactly.
If you need absolute, invariable precision, go with a decimal or an integral type.
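A brief illustration of that last point (my addition, not part of the original answer): BigDecimal built from String literals does exact decimal arithmetic where float and double cannot:

```java
import java.math.BigDecimal;

public class ExactDecimal {
    public static void main(String[] args) {
        System.out.println(8.0 - 1.0);   // 7.0 (this one happens to be exact in binary)
        System.out.println(0.3 - 0.1);   // 0.19999999999999998
        // String-based BigDecimals carry the decimal values exactly
        BigDecimal a = new BigDecimal("0.3");
        BigDecimal b = new BigDecimal("0.1");
        System.out.println(a.subtract(b)); // 0.2
    }
}
```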
