I'm taking an introductory course in programming (in java) and I'm interested in understanding why the ranges of the primitive data types in java are as they are. When it comes to integer-like data types like a byte, it seems easy to understand why it only accepts values from -128 to 127; a byte has a size of 8 bits, which can take up to 256 values, so we can assign to each of these 256 values a natural number, which I believe is what is happening behind the curtain. It also explains why 128 is excluded: we have 128 negative numbers, 127 positives, and zero. These add up to 256 already. There is no bijection between [1,256] and [-128,128].
This idea makes sense in regards to the ranges of all the integer-like data types, but when it comes to floating-points, one sees some strange ranges. For example the range of a double is [-2^{1074},(2-2^{-52})2^{1023}]. Why is this the case?
I apologize for the cumbersome notation, apparently I can't use latex here.
Java uses the IEEE-754 binary64 format. In this format, one bit represents a sign (+ or − according to whether the bit is 0 or 1), eight bits are used for an exponent, and 52 bits are used for the primary encoding of the significant. One bit of the 53-bit significand is encoded via the exponent.
The values of the exponent field range from 0 to 2047. 2047 is reserved for use with infinities and NaNs. 0 is used for subnormal numbers. The values 1 to 2046 are used for normal numbers. In this range, an exponent field value of E represents an exponent e = E−1023. So the lowest value of e is 1−1023 = −1022, and the highest value is 2046−1023 = 1023. An exponent field value of 0 also represents the lowest value of E, −1022.
The 53-bit significand represents a binary numeral d.ddd…ddd2, where the first d is 0 if the exponent field is 0 and 1 if the exponent field is 1 to 2046. There are 52 bits after the “.”, and they are given by the primary significand field.
Let S be the sign bit field, E be the value of the exponent field, and F be the value of the primary significand field as an integer. Then the value represented is:
if E is 1 to 2046, (−1)S • 2E−1023 • (1 + F•2−52),
if E is 0, (−1)S • 2−1022 • (0 + F•2−52).
Now we can see the smallest positive number is represented when S is 1, E is 0, and F is 1 (00000000000000000000000000000000000000000000000000012). Then the value represented is (−1)0 • 2−1022 • (0 + 1•2−52) = +1 • 2−1022 • 2−52 = 2−1074.
The greatest finite number is represented when S is 1, E is 2046, and F is 252−1 (11111111111111111111111111111111111111111111111111112). Then the value represented is (−1)0 • 22046−1023 • (1 + (252−1)•2−52 = +1 • 21023 • (1 + 1 − 2−52) = 21023 • (21 − 2−52) = 21024 − 2971.
It's the same thing for floating-point values as for integer values - the number of bits available.
An IEEE double-length floating point value has 64 bits, used as follows:
Sign bit: 1 bit
Exponent: 11 bits
Significand precision: 52 bits
The significand is a binary fraction, the maximum value (all bits set) is therefore somewhat less than one.
The exponent has value 0 to 2047, or -1024 to +1023. That gives you the approximate range of 2 to the -1024 to 2 to the +1023 (it's actually less since a couple of values are reserved for specific use).
Wikiipedia for more exact details
Related
According to the documentation for java.util.Random, the Java class
implements nextFloat by next(24) / ((float)(1 << 24)) (ie. a random non-negative integer with 24 bits divided by 224). The documentation claims that all 224 possible values can be returned. They are between 0 (inclusive) and 1 (exclusive). However, I don't think that is true.
First, note that according to the IEC 559 (IEEE 754) standard, float has 23 fraction bits. But there is an implicit leading bit (to the left of the binary point) with value 1, unless the exponent is stored with all zeros. Therefore, it is true that there are a total of 224 values of type float that are between 0 (inclusive - not counting negative zero) and 1 (exclusive), but exactly half of these numbers are subnormal (all bits in the exponent are 0), which makes them all less than 2-126. Therefore, none of these numbers can be generated by the implementation. This is because they are all strictly smaller than 2-24 which is used in the implementation.
The layout of float can be found at Single-precision floating-point format.
So what is it that I am missing?
In the interval [+0, 1), there are 127•223 values representable in float, not 224. There is one for each combination of an exponent encoding field from 0 to 126, inclusive, with each value of 23 bits in the significand encoding field.
Every value in the form m•2−24, with 0 ≤ m < 224, is representable. The smallest non-zero value in this form is 2−24, which is represented with an exponent code of 103 and a significand code of 0. The mathematical exponent is the code, 103, minus the bias, 127, which equals −24, and the mathematical significand is 1.
For any such m other than zero, let b be the position number of its leading 1 bit (numbering from 0 for the low bit). Then m•2−24 is encoded in float with an exponent code of b+103 and a significand code of m•224− b−224. For m = 0, it is encoded with all zero bits.
None of the numbers in this form are subnormal.
Every number of type float has 23 fraction bits and 8 exponent bits.
We show how to use float to precisely represent every number n of the form 0.b1b2b3...b24.
Let i be the smallest number such that bi is non-zero.
If no such i exists then n=0, which can be represented in float. Otherwise,
n = 2-i2i0.b1b2b3... b24.
The part 2-i can clearly be represented by the 8 bits
of exponents in float.
Furthermore, the part
2i0.b1b2b3... b24
is of the form 1.bi+1bi+2...b24 which has at most 23 fraction bits.
Therefore, n can be precisely represented in float.
All that remains is to uniformly generate random integers of the form
b1b2b3... b24 and then divide them by 224 which is exactly what Oracle is suggesting.
As I understand it java will store a float in memory as a 32 bit integer with the following properties:
The first bit is used to determine the sign
The next 8 bits represent the exponent
The final 23 bits are used to store the fraction
This leaves no spare bits for the three special cases:
NaN
Positive Infinity
Negative Infinity
I can guess that negative 0 could be used to store one of these.
How are these actually represented in memory?
Java specifies that floating point numbers follow the IEEE 754 standard.
This is how it's stored:
bit 0 : sign bit
bits 1 to 11 : exponent
bits 12 to 63 : fraction
Now, I have executed below method with different double values:
public static void print(double d){
System.out.println(Long.toBinaryString(Double.doubleToRawLongBits(d)));
}
I executed with these values:
print(Double.NaN);
print(Double.NEGATIVE_INFINITY);
print(Double.POSITIVE_INFINITY);
print(-Double.MAX_VALUE);
print(Double.MAX_VALUE);
And got the following output for the values above (formatted for readability):
NaN: 0111111111111000000000000000000000000000000000000000000000000000
-Inf: 1111111111110000000000000000000000000000000000000000000000000000
+Inf: 0111111111110000000000000000000000000000000000000000000000000000
-Max: 1111111111101111111111111111111111111111111111111111111111111111
+Max: 0111111111101111111111111111111111111111111111111111111111111111
Wikipedia explains that when the exponent field is all-bits-1, the number is either Inf or NaN. Inf has all bits of the mantissa zero; NaN has at least one bit in the mantissa set to 1. The sign bit retains its normal meaning for Inf but is not meaningful for NaN. Java's Double.NaN is one particular value that will be interpreted as NaN, but there are 253−3 others.
From here:
Q. How are zero, infinity and NaN represented using IEEE 754?
A. By setting all the exponent bits to 1. Positive infinity =
0x7ff0000000000000 (all exponent bits 1, sign bit 0 and all mantissa
bits 0), negative infinity = 0xfff0000000000000 (all exponent bits 1,
sign bit 1 and all mantissa bits 0), NaN = 0x7ff8000000000000 (all
exponent bits 1, at least one mantissa bit set). Positive zero = all
bits 0. Negative zero = all bits 0, except sign bit which is 1.
Also refer the Javadocs about NAN, Positive Infinity and Negative Infinity.
As described in Wikipedia, the exponent with all bits set to 1 is used to identify those numbers. The fraction field set to 0 is used to identify infinity (positive or negative, as identified by the sign), and a non-zero fraction field identifies a NaN value.
Java uses IEEE 754 floating point.
Most numbers are expressed in an sign-exponent-mantissa format with the mantissa having an implicit leading 1.
The extreme values of the exponent (all zeros and all ones) field are not used as normal exponent values. Instead they are used to represent special cases.
All zeros in the exponent feild is used to represent numbers (including both positive and negative zero) that are too small to represent in the normal format.
All ones in the exponenent is used to represent special values. If all the bits in the mantissa are zero then the value is plus or minus infinity (sign indicated by the sign bit). Otherwise the value is NaN.
First of all we have to learn how the number is represented as the float point and double in the memory.
The general number is of the form: 1.M * 2^e.
(where the M is called mantissa and the e is the exponent in the excess-127)
In floating point
The MSB(Most significant bit) is used as sign bit and the bit number from 23 to 31 is used for the exponential value in the form of excess-127 and the bit number from 0 to 30 is used for storing the mantissa.
In Double
The MSB(Most significant bit) is used as sign bit and the bit number from 52 to 63 is used for the exponential value in the form of excess-127 and the bit number from 0 to is used for storing the mantissa.
so now we are in position to understand the NaN, Infinity representation in the float or double.
NaN(Not an Number)
In the representation of the NaN all the Exponent bits are 1 and the Mantissa bits can be anything and it does not matter that it is in float or decimal.
Infinity
In the representation of the Infinity all the Exponent bits are 1 and the Mantissa bits are 0 and it does not matter that it is in float or decimal.
The positive Infinity is represent just by same as above but the sign bit is 0 and the negative infinity is represented also just by same but the sign bit is here 1.
Here: http://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.2.3
it says that:
The finite nonzero values of any floating-point value set can all be expressed in the form s · m · 2^(e - N + 1), where s is +1 or -1, m is a positive integer less than 2^N, and e is an integer between Emin = -(2^(K-1)-2) and Emax = 2^(K-1)-1, inclusive, and where N and K are parameters that depend on the value set.
and there is a table below:
Parameter float
N 24
K 8
So let's say N = 24 and K = 8 then we can have the following value from the formula:
s · 2^N · 2^(2^(K-1)-1 - N + 1) which gives us according to values specified in the table:
s * 2^24 * 2^(127 - 24) which is equal to s * 2^127. But float has only 32 bits so it's not possible to store in it such a big number.
So it's obvious that initial formula should be read in a different way. How then?
Also in javadoc for Float max value: http://docs.oracle.com/javase/7/docs/api/java/lang/Float.html#MAX_VALUE
it says:
A constant holding the largest positive finite value of type float, (2-2^-23)·2^127
This also doesn't make sense, as resulting value is much larger than 2^32 - which is possible the biggest value that can be stored in float variable. So again, I'm misreading this notation. So how it should be read?
The idea with the floating point notation is to store a much larger range of numbers than can be stored in the same space (bytes) with the integer representation. So, for example, you say that the "resulting value is much larger than 2^32". But, that would only be a problem if we're storing a typical binary number as one computes in a typical math class.
Instead, floating point representations break those 32 bytes into two main parts:
- significand
- exponent
For simplicity, imagine that 3 bytes are used for the significand and 1 byte for the exponent. Also assume that each of these is your typical binary integer style of representation. So, the three bytes can have a value 2^24, or 2^23 if you want to keep one bit for the sign.
However, the other byte can store up to 2^7 (if you want a sign there too).
So, you could express 500^100, by storing the 500 in the three bytes and the 100 in the 1 byte.
Essentially, one cannot store every number precisely. One changes it into significant form and one can store as many significant digits as the portion reserved for the significand (3 bytes in this example).
Rather than try to explain the complications, check this Wikipedia article for more.
I was looking into 32-bit and 64-bit. I noticed that the range of integer values that can stored in 32 bits is ±4,294,967,295 but the Java int is also 32-bit (If I am not mistaken) and it stores values up to ±2 147 483 648. Same thing for long, it stores values from 0 to ±2^63 but 64-bit stores ±2^64 values. How come these values are different?
Integers in Java are signed, so one bit is reserved to represent whether the number is positive or negative. The representation is called "two's complement notation." With this approach, the maximum positive value represented by n bits is given by
(2 ^ (n - 1)) - 1
and the corresponding minimum negative value is given by
-(2 ^ (n - 1))
The "off-by-one" aspect to the positive and negative bounds is due to zero. Zero takes up a slot, leaving an even number of negative numbers and an odd number of positive numbers. If you picture the represented values as marks on a circle—like hours on a clock face—you'll see that zero belongs more to the positive range than the negative range. In other words, if you count zero as sort of positive, you'll find more symmetry in the positive and negative value ranges.
To learn this representation, start small. Take, say, three bits and write out all the numbers that can be represented:
0
1
2
3
-4
-3
-2
-1
Can you write the three-bit sequence that defines each of those numbers? Once you understand how to do that, try it with one more bit. From there, you imagine how it extends up to 32 or 64 bits.
That sequence forms a "wheel," where each is formed by adding one to the previous, with noted wraparound from 3 to -4. That wraparound effect (which can also occur with subtraction) is called "modulo arithemetic."
In 32 bit you can store 2^32 values. If you call these values 0 to 4294967295 or -2147483648 to +2147483647 is up to you. This difference is called "signed type" versus "unsigned type". The language Java supports only signed types for int. Other languages have different types for an unsigned 32bit type.
NO laguage will have a 32bit type for ±4294967295, because the "-" part would require another bit.
That's because Java ints are signed, so you need one bit for the sign.
when i am looking at the wikipedia page for Offset Binary, i cant follow the following sentence:
Unusually however, instead of using "excess 2^(n-1)" it uses "excess
2^(n-1)-1" which means that inverting the leading (high-order) bit of
the exponent will not convert the exponent to correct twos' complement
notation.
can anyone explain it in details and give me some examples?
This is the range of exponents which allows you to calculate 1/Float.MAX_VALUE and 1/Float.MIN_NORMAL without going to zero or Infinity. If there was one more negative exponent and one less positive exponent, (with an offset of 128) 1/Float.MIN_NORMAL would be Infinity.
The exponent in floating point is offset rather than plain two-complement.
e.g. for double a 0 exponent is 11-bit value for 1023 or 0b0111111111, -1 is 1022 or 0b0111111110, +1 is 1024 0b10000000000.
In twos-complement, the number would be 0 is 0b00000000000, -1 is 0b11111111111, and +1 is 0b00000000001
A property of using an offset, is that the maximum is half the possible number of values. i.e. for 11 bits the range is -1023 to 1024 instead of -1024 to 1023.
Another property is that number can be compared by just comparing the integer values.
i.e.
long l1 = Double.doubleToRawLongBits(d1);
long l2 = Double.doubleToRawLongBits(d2);
if (l1 > l2) // like d1 > d2
The only difference is in the handling of NaN and -0.0 The approach Double.compare(double, double) uses is based on this.