Floating point notation representation in java specification

Floating point notation representation in java specification - java

Here: http://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.2.3
it says that:
The finite nonzero values of any floating-point value set can all be expressed in the form s · m · 2^(e - N + 1), where s is +1 or -1, m is a positive integer less than 2^N, and e is an integer between Emin = -(2^(K-1)-2) and Emax = 2^(K-1)-1, inclusive, and where N and K are parameters that depend on the value set.
and there is a table below:
Parameter float
N 24
K 8
So let's say N = 24 and K = 8 then we can have the following value from the formula:
s · 2^N · 2^(2^(K-1)-1 - N + 1) which gives us according to values specified in the table:
s * 2^24 * 2^(127 - 24) which is equal to s * 2^127. But float has only 32 bits so it's not possible to store in it such a big number.
So it's obvious that initial formula should be read in a different way. How then?
Also in javadoc for Float max value: http://docs.oracle.com/javase/7/docs/api/java/lang/Float.html#MAX_VALUE
it says:
A constant holding the largest positive finite value of type float, (2-2^-23)·2^127
This also doesn't make sense, as resulting value is much larger than 2^32 - which is possible the biggest value that can be stored in float variable. So again, I'm misreading this notation. So how it should be read?

The idea with the floating point notation is to store a much larger range of numbers than can be stored in the same space (bytes) with the integer representation. So, for example, you say that the "resulting value is much larger than 2^32". But, that would only be a problem if we're storing a typical binary number as one computes in a typical math class.
Instead, floating point representations break those 32 bytes into two main parts:
- significand
- exponent
For simplicity, imagine that 3 bytes are used for the significand and 1 byte for the exponent. Also assume that each of these is your typical binary integer style of representation. So, the three bytes can have a value 2^24, or 2^23 if you want to keep one bit for the sign.
However, the other byte can store up to 2^7 (if you want a sign there too).
So, you could express 500^100, by storing the 500 in the three bytes and the 100 in the 1 byte.
Essentially, one cannot store every number precisely. One changes it into significant form and one can store as many significant digits as the portion reserved for the significand (3 bytes in this example).
Rather than try to explain the complications, check this Wikipedia article for more.

Related

Assign negative int to long in java

I've 2 integer values stored in a bytebuffer, in little-endian format. These integers are actually the 32-bit pieces of a long. I've to store them as a class' member variables, loBits and hiBits.
This is what I did:
long loBits = buffer.getInt(offset);
long hiBits = buffer.getInt(offset + Integer.BYTES);
I want to know why directly assigning signed int to long is wrong. I kind of know what's going on, but would really appreciate an explanation.
The int I read from the buffer is signed (because Java). If it is negative then directly assigning it to a long value (or casting it like (long)) would change all the higher order bits in the long to the signed bit value.
For e.g. Hex representation of an int, -1684168480 is 9b9da0e0. If I assign this int to a long, all higher order 32 bits would become F.
int negativeIntValue = -1684168480;
long val1 = negativeIntValue;
long val2 = (long) negativeIntValue;
Hex representation of:
negativeIntValue is 0x9b9da0e0
val1 is 0xffffffff9b9da0e0
val2 is 0xffffffff9b9da0e0
However, if I mask the negativeIntValue with 0x00000000FFFFFFFFL, I get a long which has the same hex representation as negativeIntValue and a positive long value of 2610798816.
So my questions are:
Is my understanding correct?
Why does this happen?

Yes, your understanding is correct (at least if I understood your understanding correctly).
The reason this happens is because (most) computers use 2's complement to store signed values. So when assigning a smaller datatype to a larger one, the value is sign extended meaning that the excess part of the datatype is filled with 0 or 1 bits depending on whether the original value was positive or negative.
Also related is the difference between >> and >>> operators in Java. The first one performs sign extending (keeping negative values negative) the second one does not (shifting a negative value makes it positive).

The reason for this is that negative values are stored as two's complement.
Why do we use two's complement?
In a fixed width numbering system what happens, if you substract 1 from 0?
0000b - 0001b -> 1111b
and what is the next lesser number to 0? It is -1.
Therfore we thread a binary number with all bits set (for a signed datatype) as -1
The big advantage is that the CPU does not need to do any special operation when changing from positive to negative numbers. It handles 5 - 3 the same as 3 - 5

Calculating number of bits and number of words BigInteger

While converting a String into BigInteger, Java internally calculates the number of bits and then the number of words(each word is a group of 9 integers i think) in a BigInteger as can be seen here from Line 325 to Line 327. numWords is used then to create an array that can accomodate that BigInteger.
I don't understand the logic used for calculating numBits in line 325 and then the logic for numWords in Line 326.
Logically i think that for the string "123456789", numWords should be 1 and for "12345678912",numWords should be 2 , but that's not always the case. For example for "12345678912345678912", numWords should be 3, but it comes out to be 2.
Can anyone please explain the logic used in line 325 and 326?

To represent decimal number of numDigits as binary number, it requires
numDigits * Math.log(10) / Math.log(2)
bits.
int numBits = (int)(((numDigits * bitsPerDigit[radix]) >>> 10) + 1);
In the calculation above bitsPerDigit[10] is 3402.
Math.log(10) / Math.log(2) * Math.pow(2, 10) = 3401.6543691646593

In Java, BigIntegers are not stored as strings or bytes with a digit each. They are stored as an array of 32-bit integers, which together form the so-called magnitude of the BigInteger. There can be no leading zero integers(*), so the BigInteger is stored as compactly as possible.
The "words" mentioned are these 32-bit integers. They are not groups of 9 digits, they are used in full, so each bit counts.
So you just have to know how many 32-bit integers are stored, which is the length of the internal array times 32. But the top integer can still have leading zeroes, so you must get the number of leading zeroes of that top integer and subtract them from the obtained product, in pseudo-code:
numBits = internalArray.length * 32 - numberOfLeadingZeroBits(internalArray[0]);
Note that the internal array is stored with the top integer at the lowest address (I have no idea why that is), so the top integer is at index 0 of the array.
(*) In reality, the above is a little more complicated, since the top item may be stored at an offset from the start of the array (probably to make certain calculations easier), but to understand the mechanism, you can pretend there are no extra integers.

Words doesn't refer to words as you know it - it's referring to words as memory blocks.
https://en.wikipedia.org/wiki/Word_(computer_architecture)

Why is the maximum capacity of a Java HashMap 1<<30 and not 1<<31?

Why is the maximum capacity of a Java HashMap 1<<30 and not 1<<31, even though the max value of an int is 231-1? The maximum capacity is initialized as static final int MAXIMUM_CAPACITY = 1 << 30;

Java uses signed integers which means the first bit is used to store the sign of the number (positive/negative).
A four byte integer has 32 bits in which the numerical portion may only span 31 bits due to the signing bit. This limits the range of the number to 2^31 - 1 (due to inclusion of 0) to - (2^31).

While it would be possible for a hash map to handle quantities of items between 2^30 and 2^31-1 without having to use larger integer types, writing code which works correctly even near the upper limits of a language's integer types is difficult. Further, in a language which treats integers as an abstract algebraic ring that "wraps" on overflow, rather than as numbers which should either yield numerically-correct results or throw exceptions when they cannot do so, it may be hard to ensure that there aren't any cases where overflows would cause invalid operations to go undetected.
Specifying an upper limit of 2^30 or even 2^29, and ensuring correct behavior on things no larger than that, is often much easier than trying to ensure correct behavior all the way up to 2^31-1. Absent a particular reason to squeeze out every last bit of range, it's generally better to use the simpler approach.

By default, the int data type is a 32-bit signed two's complement integer, which has a minimum value of -2^31 and a maximum value of (2^31)-1, ranges from –2,147,483,648 to 2,147,483,647.
The first bit is reserved for the sign bit — it is 1 if the number is negative and 0 if it is positive.
1 << 30 is equal to 1,073,741,824
it's two's complement binary integer is 01000000-00000000-00000000-00000000.
1 << 31 is equal to -2,147,483,648.
it's two's complement binary integer is 10000000-00000000-00000000-00000000.
It says the maximum size to which hash-map can expand is 1,073,741,824 = 2^30.

You are thinking of unsigned, with signed upper range is (2^31)-1

Calculate lat/long

Can any please help me to find out the algorithm of calculating latitude and longitude from 8 byte value?
8- byte value - A027AFDF5D984840 and It's longitude is 49.1903648
8- byte value - 3AC7253383DD4B40 and It's latitude is 55.7305664
Also if possible then please tell me how to calculate floating point value of 0000000000805A40 as 106.0

This isn't specific to longitude and latitude calculation. Given that you have 8 bytes to work with in your examples, what you are looking to do is convert a byte array to double precision floating point values. The process is the same for all three examples.
To find the floating point value, you need to deal with the byte array at the bit level. An explanation of the process is here. Or check out Wikipedia.
To get you started, in your first example, you will be dealing with the bytes in the order of 40, 48, 98, 5D and so on. Breaking down the first two bytes, 40 and 48 (0100 0000 0100 1000), you have enough information to get the sign bit (0) and the exponent bit range (100 0000 0100 -> 1028). From here, continue listing out the 52 remaining bits to determine the fractional portion of the number. If you follow the calculations at the links above, you will see the floating point values you expect from your examples.
As a side note, some programming languages provide a method of doing this conversion for you. Depending on what language you are using, there is no need to reinvent the wheel here.
EDIT: Example
To convert the byte array to a double precision floating point value, we will need three pieces of information from the byte array: the sign bit (S), the exponent (E) and the fraction (F). The first thing to do is create your 64 bit representation of the 8 byte array. As I mentioned above, you will use the bytes in "reverse" order (little-endian if you want to do reading on the topic). I will use only the first four bytes as they will be enough to illustrate the process.
40 48 98 5D ==> 01000000 01001000 10011000 01011101
I will refer to the bits above in the order of 0 being leftmost.
Sign bit:
This is the 0th bit in the array above. In this case, S is 0. The sign bit is exactly what you'd think it would be, it determines whether the floating point result will be negative or positive.
Exponent:
Bits 1-11 determine the exponent. In the example, the exponent E is 100 0000 0100 ==> 1028.
Fraction:
The remaining bits, 12-63 determine the fraction. I only illustrated bits 12-31 to show how the process works: 1000 10011000 01011101....
The fractional bits are not converted to a decimal value. Instead, you need to iterate through them paying attention to the bits which are set (1, not 0). The index of the bits is what is important here. Consider the fraction bits indexed starting at 1 increasing from left to right to 20. Again, in the full example (bits 12-63), this indexing would be 1 to 52.
The fraction is found by summing each i'th bit that is set using this expression: 2^-i. For this example this means we are dealing with indexes 1,5,8,9,14,16,17,18,20.
The first four indexes will give us enough precision for the purposes of the example:
F = (2^-1) + (2^-5) + (2^-8) + (2^-9) + ... = 0.5 + 0.03125 + 0.00390625 + 0.001953125 = 0.537109375
The final value V is found by applying the formula:
V = -1^S * 2^(E-1023) * (1 + F) = -1^0 + 2^(1028-1023) * 1.537109375 = 49.1875
This is a good approximation of your goal value of 49.1903648. If you were to continue using the full fractional bit range in the manner I showed you, your final value will match.
Lastly, since you mentioned you are using Java, have you taken a look at using the ByteBuffer class and the getDouble function here?

Why do integers in Java integer not use all the 32 or 64 bits?

I was looking into 32-bit and 64-bit. I noticed that the range of integer values that can stored in 32 bits is ±4,294,967,295 but the Java int is also 32-bit (If I am not mistaken) and it stores values up to ±2 147 483 648. Same thing for long, it stores values from 0 to ±2^63 but 64-bit stores ±2^64 values. How come these values are different?

Integers in Java are signed, so one bit is reserved to represent whether the number is positive or negative. The representation is called "two's complement notation." With this approach, the maximum positive value represented by n bits is given by
(2 ^ (n - 1)) - 1
and the corresponding minimum negative value is given by
-(2 ^ (n - 1))
The "off-by-one" aspect to the positive and negative bounds is due to zero. Zero takes up a slot, leaving an even number of negative numbers and an odd number of positive numbers. If you picture the represented values as marks on a circle—like hours on a clock face—you'll see that zero belongs more to the positive range than the negative range. In other words, if you count zero as sort of positive, you'll find more symmetry in the positive and negative value ranges.
To learn this representation, start small. Take, say, three bits and write out all the numbers that can be represented:
0
1
2
3
-4
-3
-2
-1
Can you write the three-bit sequence that defines each of those numbers? Once you understand how to do that, try it with one more bit. From there, you imagine how it extends up to 32 or 64 bits.
That sequence forms a "wheel," where each is formed by adding one to the previous, with noted wraparound from 3 to -4. That wraparound effect (which can also occur with subtraction) is called "modulo arithemetic."

In 32 bit you can store 2^32 values. If you call these values 0 to 4294967295 or -2147483648 to +2147483647 is up to you. This difference is called "signed type" versus "unsigned type". The language Java supports only signed types for int. Other languages have different types for an unsigned 32bit type.
NO laguage will have a 32bit type for ±4294967295, because the "-" part would require another bit.

That's because Java ints are signed, so you need one bit for the sign.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Floating point notation representation in java specification - java

Related

Assign negative int to long in java

Calculating number of bits and number of words BigInteger

Why is the maximum capacity of a Java HashMap 1<<30 and not 1<<31?

Calculate lat/long

Why do integers in Java integer not use all the 32 or 64 bits?

Categories

Resources