Offset binary format for float in java

When I look at the Wikipedia page for Offset Binary, I can't follow this sentence:
Unusually however, instead of using "excess 2^(n-1)" it uses "excess
2^(n-1)-1" which means that inverting the leading (high-order) bit of
the exponent will not convert the exponent to correct twos' complement
notation.
Can anyone explain it in detail and give me some examples?

This is the range of exponents which allows you to calculate 1/Float.MAX_VALUE and 1/Float.MIN_NORMAL without going to zero or Infinity. If there was one more negative exponent and one less positive exponent, (with an offset of 128) 1/Float.MIN_NORMAL would be Infinity.
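The exponent in floating point is stored with an offset (bias) rather than in plain two's complement.
For example, for double the exponent field is 11 bits with an offset of 1023: an exponent of 0 is stored as 1023 or 0b01111111111, -1 as 1022 or 0b01111111110, and +1 as 1024 or 0b10000000000.
In two's complement the same exponents would be 0b00000000000 for 0, 0b11111111111 for -1, and 0b00000000001 for +1.
With an offset of 1024 (excess 2^(n-1)) the stored value and the two's complement value would differ only in the leading bit. With an offset of 1023 they do not: flipping the leading bit of 0b01111111111 (exponent 0) gives 0b11111111111, which is -1 in two's complement, so the result is off by one.
A property of this offset is that the range is shifted by one: for 11 bits the raw range is -1023 to 1024 instead of -1024 to 1023 (the two extreme field values are in fact reserved for zero/subnormals and Infinity/NaN).
Another property is that non-negative numbers can be compared by just comparing the integer values of their bits.
i.e.
long l1 = Double.doubleToRawLongBits(d1);
long l2 = Double.doubleToRawLongBits(d2);
if (l1 > l2) // like d1 > d2 when both are non-negative
For negative values the order of the bit patterns is reversed, because the format is sign-and-magnitude, and NaN and -0.0 need special handling. Double.compare(double, double) falls back on comparing the bits to order -0.0, 0.0 and NaN.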
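To make the off-by-one concrete, here is a minimal sketch. The class and method names (ExponentBiasDemo, rawExponentField, flipTopBitAsTwosComplement) are made up for illustration; only Double.doubleToRawLongBits and Math.getExponent are real library calls.

```java
final class ExponentBiasDemo {
    // Raw 11-bit exponent field of a double (bits 62..52).
    static int rawExponentField(double d) {
        return (int) ((Double.doubleToRawLongBits(d) >>> 52) & 0x7FF);
    }

    // Flip the high-order bit of the 11-bit field and read the result
    // as an 11-bit two's complement number.
    static int flipTopBitAsTwosComplement(int field) {
        int flipped = field ^ 0x400;      // invert bit 10
        return (flipped << 21) >> 21;     // sign-extend from 11 bits
    }

    public static void main(String[] args) {
        for (double d : new double[] {0.5, 1.0, 2.0}) {
            int field = rawExponentField(d);
            System.out.println(d + ": field=" + field
                    + " field-1023=" + (field - 1023)
                    + " Math.getExponent=" + Math.getExponent(d)
                    + " flippedTopBit=" + flipTopBitAsTwosComplement(field));
        }
    }
}
```

For normal numbers, field-1023 agrees with Math.getExponent, while the flipped-top-bit reading is always one too small. That is what the Wikipedia sentence is pointing out.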

Related

Long Representation vs Double representation of positive and negative zero in java

I was wondering about the differences between positive and negative zero in different numeric types.
I understand the IEEE-754 for floating point arithmetic and bit representation in double precision so the following didn't come as a surprise
double posz = 0.0;
double negz = -0.0;
System.out.println(Long.toBinaryString(Double.doubleToLongBits(posz)));
System.out.println(Long.toBinaryString(Double.doubleToLongBits(negz)));
// output
>>> 0
>>> 1000000000000000000000000000000000000000000000000000000000000000
What did surprise me, and showed me that I'm clueless about the bit representation of the long type in Java, is that even if I shift right (unsigned >>>) the binary representation of positive and negative zero is the same
long posz = 0L;
long negz = -0L;
for (int i = 63; i >= 0; i--) {
System.out.print((posz >>> i) & 1);
}
System.out.println();
for (int i = 63; i >= 0; i--) {
System.out.print((negz >>> i) & 1);
}
// output
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> 0000000000000000000000000000000000000000000000000000000000000000
So I am wondering what Java does at the bit level when I write the following
long posz = 0L;
long negz = -0L;
Does the compiler understand that they are both zero and disregard the sign (and so assign 0 to the sign bit) or is there other magic here?

or is there other magic here?
Yes. 2's complement.
2's complement is a bit magical. It accomplishes 2 major objectives. Before getting into that, let's first stew on the notion of negative zero for a moment.
Negative zero is kinda weird. Why does it exist at all?
Negative zero isn't actually a thing. Ask any mathematician "Hey, so, what's up with negative zero?" and they'll just look at you in befuddlement. It's not a thing. Mathematically, 0 and -0 are utterly identical. Not just 'nearly identical', but 100%, fully, in all possible ways, identical. We don't generally want our numbers to be capable of representing both 5.0 as well as 5.00 - as those two are entirely, 100%, identical. If you don't think that a value system ought to waste bits trying to differentiate between 5.0 and 5.00, then it's equally bizarro to want the ability to represent -0.0 and +0.0 as distinct entities.
So, wanting -0 in the first place is kinda weird. None of the integral primitives (long, int, short, byte, and I guess char which is technically numeric too) can represent this number. Instead, long z = -0 boils down to:
Take the constant "0".
Apply the 'negate' operation to this number (- is a unary operator). Just like 2+5 makes the system calculate the binary operation of "addition" on elements 2 and 5, -x makes the system calculate the unary operation of "negation" on element x. Applying the negation operation to 0 produces 0. It's no different from writing, say, int x = 5 + 0;. That +0 part doesn't do anything. The - in front of -0 doesn't do anything. In contrast to -0.0 where it does do something (gets you negative zero, the double value, instead of positive zero).
Store this result in z (so, just 0 then).
There is no way to tell if that minus is there. They both result in ALL ZERO bits, and hence, there is no way for the computer to tell if you initialized that variable with the expression -0 or with +0. Again in contrast to double where, as you noticed, there is one bit that differs.
So why does double have it then?
Let's stew a bit on the notion of doubles and IEEE-754 math.
A double takes 64 bits. From basic pure mathematical principles then, a double is as incapable of representing more than 2^64 different possible values as you are capable of breaking the speed of light or making 1+1=3.
And yet, a double aims to represent all numbers. There are way more numbers between 0 and 1 than 2^64 options (in fact, an infinite amount of numbers exist between 0 and 1), and that's just 0 to 1.
So, how doubles actually work is different. A few less than 2^64 numbers are chosen from the entire number line. Let's call these the blessed numbers.
The blessed numbers are not equally distributed. The closer you are to 0, the more blessed numbers exist. In other words, the distance between 2 blessed numbers increases as you move away from 0. For example, if you start from, say, 1e100 (a 1 with a hundred zeroes) and want to find the next blessed number, it's quite a ways. The gap is in fact far larger than 1.0! So 1e100+1 is in fact 1e100 again, because the way double math works is that after every single mathematical operation, the end result is rounded to the nearest blessed number.
Let's try it!
double d = 1e100;
System.out.println(d);
System.out.println(d + 1);
// prints: 1.0E100
// 1.0E100
But that means.. double values don't actually represent a single number! What any given double represents is in fact this concept:
An unknown number whose value lies between [D - 𝛿, D + 𝛿], where D is the blessed number that is closest to the unknown number this value represents, and 𝛿 is half of the distance between D and the nearest blessed number on either side.
Given that usually 𝛿 is incredibly small, this is 'good enough'. But this weirdness does explain why you really, really do not want any business at all with double if accuracy is important (such as with currencies. Don't store those in doubles, ever!)
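A small sketch of how the gap between neighbouring doubles can be inspected, using the real library call Math.ulp. The class name UlpDemo and the helper absorbsOne are hypothetical names chosen for this example.

```java
final class UlpDemo {
    // True when d + 1 rounds straight back to d.
    static boolean absorbsOne(double d) {
        return d + 1 == d;
    }

    public static void main(String[] args) {
        // Math.ulp(d) is the distance from d to the next larger double in magnitude.
        System.out.println(Math.ulp(1.0));     // gap right above 1.0
        System.out.println(Math.ulp(1e100));   // gap right above 1e100
        System.out.println(absorbsOne(1.0));
        System.out.println(absorbsOne(1e100));
    }
}
```

The gap near 1.0 is 2^-52, while the gap near 1e100 is far above 1, which is why adding 1 to 1e100 changes nothing.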
Given that, what does -0.0 represent? Not actually just 0. It represents, specifically: An unknown number whose value lies between [-𝛿, 0] where 0 is real zero (and thus has no sign), and 𝛿 is Double.MIN_VALUE: the smallest non-zero positive number representable with a double.
That's why -0.0 and +0.0 both exist: They are in fact different concepts. Rarely relevant, but sometimes it is. In contrast to e.g. long where 5 just means 5 and not "between 4.5 and 5.5", because longs fundamentally don't recognize that fractional parts exist in the first place. Given that 5 just means 5, then 0 just means 0, and there is no such thing as negative zero in the first place.
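The difference between the two zeros can be observed directly. The following is a minimal sketch; SignedZeroDemo and isNegativeZero are names invented for the example, the rest is standard java.lang.

```java
final class SignedZeroDemo {
    // -0.0 is the bit pattern with only the sign bit set, which as a long is Long.MIN_VALUE.
    static boolean isNegativeZero(double d) {
        return Double.doubleToRawLongBits(d) == Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(0.0 == -0.0);                 // numerically equal
        System.out.println(Double.compare(0.0, -0.0));   // but ordered by compare
        System.out.println(1.0 / 0.0);                   // Infinity
        System.out.println(1.0 / -0.0);                  // -Infinity
        System.out.println(isNegativeZero(-0.0));
        System.out.println(Long.toBinaryString(-0L));    // long has only one zero
    }
}
```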
Now we get to 2's complement
2's complement is a cool system. It has two neat properties:
It only has the one zero.
It does not matter if you treat the bit sequence as signed-by-way-of-2s-complement or as unsigned, for the purposes of the operations: Addition, Subtraction, Increment, Decrement, zero-check. The modifications you do to the bits to implement those operations are identical.
It DOES matter for greater than, less than, and divide.
2's complement works like this: To negate a number, take all bits and flip them (i.e. do a NOT operation on the bits). Then, add 1.
Let's try it!
int x = 5;
int y = -x;
for (int i = 31; i >= 0; i--) {
System.out.print((x >>> i) & 1);
}
System.out.println();
for (int i = 31; i >= 0; i--) {
System.out.print((y >>> i) & 1);
}
System.out.println();
// prints 00000000000000000000000000000101
// 11111111111111111111111111111011
As we can see, the 'flip all bits and add 1' algorithm was applied.
2s complement is, of course, reversible: If you do 'flip all bits and add 1' twice in a row you get the same number out.
Now let's try -0. 0 is 32 0 bits, then flip them all, then add 1:
00000000000000000000000000000000
11111111111111111111111111111111 // flip all
100000000000000000000000000000000 // add 1
00000000000000000000000000000000 // that 1 fell off
and because ints can only store 32 bits, that final '1' falls off of the end. And we're left with zero again.
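The 'flip all bits and add 1' rule can be checked against Java's own unary minus. This is a sketch only; NegateDemo and negate are hypothetical names.

```java
final class NegateDemo {
    // Two's complement negation done by hand: NOT, then add 1.
    static int negate(int x) {
        return ~x + 1;
    }

    public static void main(String[] args) {
        System.out.println(negate(5) == -5);
        System.out.println(Integer.toBinaryString(negate(5)));
        System.out.println(negate(0));                  // the carry falls off, 0 stays 0
        System.out.println(negate(negate(12345)));      // applying it twice is the identity
    }
}
```

One corner case worth knowing: negate(Integer.MIN_VALUE) is Integer.MIN_VALUE again, because +2147483648 does not fit in an int.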
Now let's go with bytes (a bit smaller) and try to add, say, 200 and 50 together.
11001000 // 200 in binary
00110010 // 50 in binary
-------- +
11111010 // 250 in binary.
Now let's instead go: Oh wait, whoops, that was an error, actually these numbers are in 2s complement. That wasn't 200, nono. 11001000 is a bit sequence that actually means (let's apply the 'flip all bits, add 1' scheme: 00111000, which is 56) -56. So the operation was meant to represent '-56 + 50'. Which is -6. -6 in binary is (write out 6, flip bits, add 1):
00000110
11111001
11111010
hey now, look at that, nothing changed! It's the same result! So, when the computer does x + y, where x and y are numbers, the computer does not care. Whether x is "an unsigned number" or "a signed with 2s complement number", the operation is identical.
That's why 2s complement is applied. It makes math MUCH faster. The CPU doesn't have to futz about with branching out to deal with sign bits.
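The same byte addition can be reproduced in Java. This is a sketch with made-up names (ByteAddDemo, add); Byte.toUnsignedInt is a real method (Java 8+).

```java
final class ByteAddDemo {
    // Add two bytes and keep only the low 8 bits of the result.
    static byte add(byte a, byte b) {
        return (byte) (a + b);
    }

    public static void main(String[] args) {
        byte a = (byte) 200;   // bit pattern 11001000, which Java prints as -56
        byte b = 50;           // bit pattern 00110010
        byte sum = add(a, b);
        System.out.println(Integer.toBinaryString(sum & 0xFF)); // the resulting bits
        System.out.println(sum);                                // signed reading
        System.out.println(Byte.toUnsignedInt(sum));            // unsigned reading
    }
}
```

One addition produces one bit pattern, 11111010. Whether that reads as -6 or as 250 is decided only when the bits are interpreted.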
In this sense it is more correct to say that in java, int, long, char, byte and short are neither signed nor unsigned, they just are. At least for the purposes of +, -, ++, and --. The idea that int is signed is fundamentally a property of e.g. System.out.println(int) - that method chooses to render the bit sequence 11111111111111111111111111111111 as "-1" instead of as 4294967295.
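Since Java 8 the Integer class offers methods that read the same 32 bits as unsigned, which makes the point easy to try out. A minimal sketch follows; the class name UnsignedViewDemo is invented for illustration.

```java
final class UnsignedViewDemo {
    public static void main(String[] args) {
        int allOnes = -1;   // 32 one bits
        System.out.println(allOnes);                               // signed rendering
        System.out.println(Integer.toUnsignedString(allOnes));     // unsigned rendering
        // The operations where signedness matters have separate unsigned variants:
        System.out.println(Integer.compareUnsigned(allOnes, 1));   // positive: "greater than"
        System.out.println(Integer.divideUnsigned(allOnes, 2));
    }
}
```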

long has no such thing as negative zero. Only float and double have a different representation of positive and negative zero.

What explains the range of a double in java?

I'm taking an introductory course in programming (in java) and I'm interested in understanding why the ranges of the primitive data types in java are as they are. When it comes to integer-like data types like a byte, it seems easy to understand why it only accepts values from -128 to 127; a byte has a size of 8 bits, which can take up to 256 values, so we can assign to each of these 256 values a natural number, which I believe is what is happening behind the curtain. It also explains why 128 is excluded: we have 128 negative numbers, 127 positives, and zero. These add up to 256 already. There is no bijection between [1,256] and [-128,128].
This idea makes sense in regards to the ranges of all the integer-like data types, but when it comes to floating-points, one sees some strange ranges. For example the range of a double is [2^{-1074},(2-2^{-52})2^{1023}]. Why is this the case?
I apologize for the cumbersome notation, apparently I can't use latex here.

Java uses the IEEE-754 binary64 format. In this format, one bit represents a sign (+ or − according to whether the bit is 0 or 1), eleven bits are used for an exponent, and 52 bits are used for the primary encoding of the significand. One bit of the 53-bit significand is encoded via the exponent.
The values of the exponent field range from 0 to 2047. 2047 is reserved for use with infinities and NaNs. 0 is used for subnormal numbers. The values 1 to 2046 are used for normal numbers. In this range, an exponent field value of E represents an exponent e = E−1023. So the lowest value of e is 1−1023 = −1022, and the highest value is 2046−1023 = 1023. An exponent field value of 0 also represents the lowest value of e, −1022.
The 53-bit significand represents a binary numeral d.ddd…ddd, where the first d is 0 if the exponent field is 0 and 1 if the exponent field is 1 to 2046. There are 52 bits after the “.”, and they are given by the primary significand field.
Let S be the sign bit field, E be the value of the exponent field, and F be the value of the primary significand field as an integer. Then the value represented is:
if E is 1 to 2046, (−1)^S • 2^(E−1023) • (1 + F•2^−52),
if E is 0, (−1)^S • 2^−1022 • (0 + F•2^−52).
Now we can see the smallest positive number is represented when S is 0, E is 0, and F is 1 (in binary, 51 zero bits followed by a single 1). Then the value represented is (−1)^0 • 2^−1022 • (0 + 1•2^−52) = +1 • 2^−1022 • 2^−52 = 2^−1074.
The greatest finite number is represented when S is 0, E is 2046, and F is 2^52−1 (in binary, 52 one bits). Then the value represented is (−1)^0 • 2^(2046−1023) • (1 + (2^52−1)•2^−52) = +1 • 2^1023 • (1 + 1 − 2^−52) = 2^1023 • (2^1 − 2^−52) = 2^1024 − 2^971.
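The two-case formula can be turned into a small decoder and checked against the library constants. This is a sketch under the assumptions stated in the comments; DecodeDouble and decode are hypothetical names, while Math.scalb and Double.doubleToRawLongBits are real.

```java
final class DecodeDouble {
    // Rebuild the value of a finite double from its S, E and F fields.
    static double decode(long bits) {
        int s = (int) (bits >>> 63);              // sign bit
        int e = (int) ((bits >>> 52) & 0x7FF);    // 11-bit exponent field E
        long f = bits & ((1L << 52) - 1);         // 52-bit significand field F
        if (e == 2047) {
            throw new IllegalArgumentException("infinity or NaN");
        }
        double sign = (s == 0) ? 1.0 : -1.0;
        double fraction = f * 0x1p-52;            // F * 2^-52, exact because F < 2^52
        if (e == 0) {
            return sign * Math.scalb(fraction, -1022);      // subnormal: leading digit 0
        }
        return sign * Math.scalb(1.0 + fraction, e - 1023); // normal: leading digit 1
    }

    public static void main(String[] args) {
        System.out.println(decode(1L) == Double.MIN_VALUE);
        System.out.println(decode(Double.doubleToRawLongBits(Double.MAX_VALUE)) == Double.MAX_VALUE);
        System.out.println(decode(Double.doubleToRawLongBits(-1.5)));
    }
}
```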

It's the same thing for floating-point values as for integer values - the number of bits available.
An IEEE double-length floating point value has 64 bits, used as follows:
Sign bit: 1 bit
Exponent: 11 bits
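Significand precision: 52 bits stored (53 bits including the implicit leading 1)
The stored significand is a binary fraction; together with the implicit leading 1 the maximum value (all bits set) is therefore somewhat less than two.
The exponent field has value 0 to 2047, which with the offset of 1023 and the two reserved field values excluded means exponents -1022 to +1023. That gives you the approximate range of 2 to the -1022 to 2 to the +1024 for normal numbers (subnormals extend the low end down to 2 to the -1074).
See Wikipedia for more exact details.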

Why does Oracle claim java.util.Random.nextFloat() generates 2^24 possibilities and not 2^23?

According to the documentation for java.util.Random, the Java class implements nextFloat by next(24) / ((float)(1 << 24)) (i.e. a random non-negative integer with 24 bits divided by 2^24). The documentation claims that all 2^24 possible values can be returned. They are between 0 (inclusive) and 1 (exclusive). However, I don't think that is true.
First, note that according to the IEC 559 (IEEE 754) standard, float has 23 fraction bits. But there is an implicit leading bit (to the left of the binary point) with value 1, unless the exponent is stored with all zeros. Therefore, it is true that there are a total of 2^24 values of type float that are between 0 (inclusive - not counting negative zero) and 1 (exclusive), but exactly half of these numbers are subnormal (all bits in the exponent are 0), which makes them all less than 2^-126. Therefore, none of these numbers can be generated by the implementation. This is because they are all strictly smaller than 2^-24 which is used in the implementation.
The layout of float can be found at Single-precision floating-point format.
So what is it that I am missing?

In the interval [+0, 1), there are 127•2^23 values representable in float, not 2^24. There is one for each combination of an exponent encoding field from 0 to 126, inclusive, with each value of 23 bits in the significand encoding field.
Every value in the form m•2^−24, with 0 ≤ m < 2^24, is representable. The smallest non-zero value in this form is 2^−24, which is represented with an exponent code of 103 and a significand code of 0. The mathematical exponent is the code, 103, minus the bias, 127, which equals −24, and the mathematical significand is 1.
For any such m other than zero, let b be the position number of its leading 1 bit (numbering from 0 for the low bit). Then m•2^−24 is encoded in float with an exponent code of b+103 and a significand code of m•2^(23−b) − 2^23. For m = 0, it is encoded with all zero bits.
None of the numbers in this form are subnormal.
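The encoding claim can be verified exhaustively, since there are only 2^24 candidates. In this sketch the names NextFloatEncoding, fromM, exponentCode, significandCode and countMismatches are invented for illustration. The significand code is computed as (m•2^−b − 1)•2^23, i.e. the fraction bits that follow the leading 1.

```java
final class NextFloatEncoding {
    // The value nextFloat would produce for the 24-bit integer m.
    static float fromM(int m) {
        return m / (float) (1 << 24);
    }

    static int exponentCode(float f) {
        return (Float.floatToRawIntBits(f) >>> 23) & 0xFF;
    }

    static int significandCode(float f) {
        return Float.floatToRawIntBits(f) & 0x7FFFFF;
    }

    // Count the m in [1, 2^24) whose encoding differs from the predicted one.
    static int countMismatches() {
        int mismatches = 0;
        for (int m = 1; m < (1 << 24); m++) {
            int b = 31 - Integer.numberOfLeadingZeros(m);   // position of the leading 1 bit
            float f = fromM(m);
            boolean ok = exponentCode(f) == b + 103
                    && significandCode(f) == (m << (23 - b)) - (1 << 23);
            if (!ok) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(exponentCode(fromM(1)));   // smallest non-zero value, 2^-24
        System.out.println(countMismatches());
    }
}
```

Every exponent code is at least 103, so none of these values has an all-zero exponent field, and hence none of them is subnormal.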

Every number of type float has 23 fraction bits and 8 exponent bits.
We show how to use float to precisely represent every number n of the form 0.b_1 b_2 b_3 ... b_24 (in binary).
Let i be the smallest number such that b_i is non-zero.
If no such i exists then n = 0, which can be represented in float. Otherwise,
n = 2^(-i) • (2^i • 0.b_1 b_2 b_3 ... b_24).
The part 2^(-i) can clearly be represented by the 8 bits of exponent in float.
Furthermore, the part
2^i • 0.b_1 b_2 b_3 ... b_24
is of the form 1.b_(i+1) b_(i+2) ... b_24, which has at most 23 fraction bits.
Therefore, n can be precisely represented in float.
All that remains is to uniformly generate random integers of the form b_1 b_2 b_3 ... b_24 and then divide them by 2^24, which is exactly what Oracle is suggesting.

How are NaN and Infinity of a float or double stored in memory?

As I understand it java will store a float in memory as a 32 bit integer with the following properties:
The first bit is used to determine the sign
The next 8 bits represent the exponent
The final 23 bits are used to store the fraction
This leaves no spare bits for the three special cases:
NaN
Positive Infinity
Negative Infinity
I can guess that negative 0 could be used to store one of these.
How are these actually represented in memory?

Java specifies that floating point numbers follow the IEEE 754 standard.
This is how it's stored:
bit 0 : sign bit
bits 1 to 11 : exponent
bits 12 to 63 : fraction
Now, I executed the method below with different double values:
public static void print(double d){
System.out.println(Long.toBinaryString(Double.doubleToRawLongBits(d)));
}
I executed with these values:
print(Double.NaN);
print(Double.NEGATIVE_INFINITY);
print(Double.POSITIVE_INFINITY);
print(-Double.MAX_VALUE);
print(Double.MAX_VALUE);
And got the following output for the values above (formatted for readability):
NaN: 0111111111111000000000000000000000000000000000000000000000000000
-Inf: 1111111111110000000000000000000000000000000000000000000000000000
+Inf: 0111111111110000000000000000000000000000000000000000000000000000
-Max: 1111111111101111111111111111111111111111111111111111111111111111
+Max: 0111111111101111111111111111111111111111111111111111111111111111
Wikipedia explains that when the exponent field is all-bits-1, the number is either Inf or NaN. Inf has all bits of the mantissa zero; NaN has at least one bit in the mantissa set to 1. The sign bit retains its normal meaning for Inf but is not meaningful for NaN. Java's Double.NaN is one particular value that will be interpreted as NaN, but there are 2^53−3 others.
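A sketch of how these rules look in code, including a NaN with a different payload than Double.NaN. The names NaNBits, isNaNPattern and isInfinityPattern are hypothetical; the bit constants are built with shifts so the field positions stay visible.

```java
final class NaNBits {
    static final long EXPONENT_ALL_ONES = 0x7FFL << 52;
    static final long FRACTION_MASK = (1L << 52) - 1;

    static boolean isNaNPattern(long bits) {
        return (bits & EXPONENT_ALL_ONES) == EXPONENT_ALL_ONES && (bits & FRACTION_MASK) != 0;
    }

    static boolean isInfinityPattern(long bits) {
        return (bits & EXPONENT_ALL_ONES) == EXPONENT_ALL_ONES && (bits & FRACTION_MASK) == 0;
    }

    public static void main(String[] args) {
        long canonical = Double.doubleToRawLongBits(Double.NaN);
        System.out.println(Long.toHexString(canonical));
        // A different bit pattern that is also a NaN: lowest fraction bit set.
        long other = EXPONENT_ALL_ONES | 1L;
        System.out.println(Double.isNaN(Double.longBitsToDouble(other)));
        System.out.println(isInfinityPattern(Double.doubleToRawLongBits(Double.NEGATIVE_INFINITY)));
    }
}
```

Note that Double.doubleToLongBits (without Raw) collapses every NaN to the canonical pattern, so doubleToRawLongBits is the method to use when inspecting payloads.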

From here:
Q. How are zero, infinity and NaN represented using IEEE 754?
A. By setting all the exponent bits to 1. Positive infinity =
0x7ff0000000000000 (all exponent bits 1, sign bit 0 and all mantissa
bits 0), negative infinity = 0xfff0000000000000 (all exponent bits 1,
sign bit 1 and all mantissa bits 0), NaN = 0x7ff8000000000000 (all
exponent bits 1, at least one mantissa bit set). Positive zero = all
bits 0. Negative zero = all bits 0, except sign bit which is 1.
Also refer to the Javadocs for NaN, POSITIVE_INFINITY and NEGATIVE_INFINITY.

As described in Wikipedia, the exponent with all bits set to 1 is used to identify those numbers. The fraction field set to 0 is used to identify infinity (positive or negative, as identified by the sign), and a non-zero fraction field identifies a NaN value.

Java uses IEEE 754 floating point.
Most numbers are expressed in a sign-exponent-mantissa format with the mantissa having an implicit leading 1.
The extreme values of the exponent field (all zeros and all ones) are not used as normal exponent values. Instead they are used to represent special cases.
All zeros in the exponent field is used to represent numbers (including both positive and negative zero) that are too small to represent in the normal format.
All ones in the exponent field is used to represent special values. If all the bits in the mantissa are zero then the value is plus or minus infinity (sign indicated by the sign bit). Otherwise the value is NaN.

First of all we have to look at how a number is represented as a float and as a double in memory.
A normal number is of the form: 1.M * 2^e.
(where M is called the mantissa and e is the exponent, stored in excess-127 for float)
In float
The MSB (most significant bit, bit 31) is used as the sign bit, bits 23 to 30 hold the exponent in excess-127 form, and bits 0 to 22 store the mantissa.
In double
The MSB (bit 63) is used as the sign bit, bits 52 to 62 hold the exponent in excess-1023 form, and bits 0 to 51 store the mantissa.
So now we are in a position to understand the NaN and Infinity representations in float and double.
NaN (Not a Number)
In the representation of NaN all the exponent bits are 1 and the mantissa bits can be anything except all zeros; this is the same for float and double.
Infinity
In the representation of Infinity all the exponent bits are 1 and all the mantissa bits are 0; this is the same for float and double.
Positive infinity has the sign bit 0 and negative infinity has the sign bit 1.
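For float the same fields can be pulled apart with a few shifts. This sketch assumes the standard IEEE 754 single-precision layout (sign in bit 31, exponent in bits 30..23, mantissa in bits 22..0); FloatFields and its methods are names made up for the example.

```java
final class FloatFields {
    static int sign(float f) {
        return Float.floatToRawIntBits(f) >>> 31;
    }

    static int exponentField(float f) {
        return (Float.floatToRawIntBits(f) >>> 23) & 0xFF;
    }

    static int mantissaField(float f) {
        return Float.floatToRawIntBits(f) & 0x7FFFFF;
    }

    // Classify a float purely from its exponent and mantissa fields.
    static String classify(float f) {
        int e = exponentField(f);
        int m = mantissaField(f);
        if (e == 0xFF) {
            return m == 0 ? "Infinity" : "NaN";
        }
        if (e == 0) {
            return m == 0 ? "zero" : "subnormal";
        }
        return "normal";
    }

    public static void main(String[] args) {
        float[] samples = {Float.NaN, Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY,
                           -0.0f, Float.MIN_VALUE, 1.0f};
        for (float f : samples) {
            System.out.println(f + " -> sign=" + sign(f) + " exponent=" + exponentField(f)
                    + " mantissa=" + mantissaField(f) + " (" + classify(f) + ")");
        }
    }
}
```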

Floating point notation representation in java specification

Here: http://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.2.3
it says that:
The finite nonzero values of any floating-point value set can all be expressed in the form s · m · 2^(e - N + 1), where s is +1 or -1, m is a positive integer less than 2^N, and e is an integer between Emin = -(2^(K-1)-2) and Emax = 2^(K-1)-1, inclusive, and where N and K are parameters that depend on the value set.
and there is a table below:
Parameter float
N 24
K 8
So let's say N = 24 and K = 8 then we can have the following value from the formula:
s · 2^N · 2^(2^(K-1)-1 - N + 1) which gives us according to values specified in the table:
s * 2^24 * 2^(127 - 24) which is equal to s * 2^127. But float has only 32 bits so it's not possible to store in it such a big number.
So it's obvious that initial formula should be read in a different way. How then?
Also in javadoc for Float max value: http://docs.oracle.com/javase/7/docs/api/java/lang/Float.html#MAX_VALUE
it says:
A constant holding the largest positive finite value of type float, (2-2^-23)·2^127
This also doesn't make sense, as the resulting value is much larger than 2^32 - which is possibly the biggest value that can be stored in a float variable. So again, I'm misreading this notation. So how should it be read?

The idea of floating point notation is to store a much larger range of numbers than can be stored in the same space (bytes) with an integer representation. So, for example, you say that the "resulting value is much larger than 2^32". But that would only be a problem if we were storing a plain binary integer, as one computes in a typical math class.
Instead, floating point representations break those 32 bits into two main parts:
- significand
- exponent
For simplicity, imagine that 3 bytes are used for the significand and 1 byte for the exponent. Also assume that each of these is your typical binary integer style of representation. So, the three bytes can hold values below 2^24, or below 2^23 if you want to keep one bit for the sign.
However, the other byte can store values below 2^7 (if you want a sign there too).
So, you could express 500 · 2^100 by storing the 500 in the three bytes and the 100 in the one byte.
Essentially, one cannot store every number precisely. One changes it into significand-and-exponent form, and one can store as many significant digits as fit in the portion reserved for the significand (3 bytes in this example).
Rather than try to explain the complications, check this Wikipedia article for more.
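To connect this back to the formula quoted from the specification, here is a small sketch that evaluates s · m · 2^(e - N + 1) with the float parameters N = 24 and K = 8. The class JlsFloatFormula and the method value are hypothetical names; the arithmetic is done in double, which holds all of these values exactly.

```java
final class JlsFloatFormula {
    static final int N = 24;
    static final int K = 8;
    static final int E_MAX = (1 << (K - 1)) - 1;      // 127
    static final int E_MIN = -((1 << (K - 1)) - 2);   // -126

    // s is +1 or -1, m is a positive integer below 2^N, e lies in [E_MIN, E_MAX].
    static double value(int s, int m, int e) {
        return s * Math.scalb((double) m, e - N + 1);
    }

    public static void main(String[] args) {
        // Largest m with the largest e gives the largest finite float.
        System.out.println(value(1, (1 << N) - 1, E_MAX) == Float.MAX_VALUE);
        // Smallest m with the smallest e gives the smallest positive float.
        System.out.println(value(1, 1, E_MIN) == Float.MIN_VALUE);
    }
}
```

Note that m must stay below 2^N, so the largest significand is 2^24 - 1 rather than 2^24. The 32 bits store m and e, not the magnitude written out as a plain binary integer.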
