Long Representation vs Double representation of positive and negative zero in java - java

I was wondering about the differences between positive and negative zero in different numeric types.
I understand the IEEE-754 for floating point arithmetic and bit representation in double precision so the following didn't come as a surprise
double posz = 0.0;
double negz = -0.0;
System.out.println(Long.toBinaryString(Double.doubleToLongBits(posz)));
System.out.println(Long.toBinaryString(Double.doubleToLongBits(negz)));
// output
>>> 0
>>> 1000000000000000000000000000000000000000000000000000000000000000
What did surprise me and showed me that im clueless about the bit representation of long type in java is that even if i shift right (unsigned >>>) then the binary representation of both positive and negative zero is the same
long posz = 0L;
long negz = -0L;
for (int i = 63; i >= 0; i--) {
System.out.print((posz >>> i) & 1);
}
System.out.println();
for (int i = 63; i >= 0; i--) {
System.out.print((negz >>> i) & 1);
}
// output
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> 0000000000000000000000000000000000000000000000000000000000000000
so i am wondering what does java do from a bit representation when i write the following
long posz = 0L;
long negz = -0L;
Does the compiler understand that they are both zero and disregards the sign (and so assignes 0 to the sign bit) or is there other magic here?

or is there other magic here?
Yes. 2's complement.
2's complement is a bit magical. It accomplishes 2 major objectives. Before getting into that, let's first stew on the notion of negative zero for a moment.
Negative zero is kinda weird. Why does it exist at all?
Negative zero isn't actually a thing. Ask any mathematician "Hey, so, what's up with negative zero?" and they'll just look at you in befuddlement. It's not a thing. Mathematically, 0 and -0 are utterly identical. Not just 'nearly identical', but 100%, fully, in all possible ways, identical. We don't generally want our numbers to be capable of representing both 5.0 as well as 5.00 - as those two are entirely, 100%, identical. If you don't think that a value system ought to waste bits trying to differentiate between 5.0 and 5.00, then it's equally bizarro to want the ability to represent -0.0 and +0.0 as distinct entities.
So, wanting -0 in the first place is kinda weird. All the numeric primitives (long, int, short, byte, and I guess char which is technically numeric too) all cannot represent this number. Instead, long z = -0 boils down to:
Take the constant "0".
Apply the 'negate' operation to this number (- is a unary operator. Just like 2+5 makes the system calculate the binary operation of "addition" on elements 2 and 5, -x makes the system calculate the unary operation of "negation" on element x. Applying the negation operation to 0 produces 0. It's no different from writing, say, int x = 5 + 0;. That +0 part doesn't do anything. The - in front of -0 doesn't do anything. In contrast to -0.0 where it does do something (gets you negative zero, the double value, instead of positive zero).
Store this result in z (so, just 0 then).
There is no way to tell if that minus is there. They both result in ALL ZERO bits, and hence, there is no way for the computer to tell if you initialized that variable with the expression -0 or with +0. Again in contrast to double where as you noticed there's a bit different.
So why does double have it then?
Let's stew a bit on the notion of doubles and IEEE-754 math.
A double takes 64 bits. From basic pure mathematical principles then, a double is as incapable of representing more than 2^64 different possible values you are capable of breaking the speed of light or making 1+1=3.
And yet, a double aims to represent all numbers. There are way more numbers between 0 and 1 than 2^64 options (in fact, an infinite amount of numbers exist between 0 and 1), and that's just 0 to 1.
So, how doubles actually work is different. A few less than 2^64 numbers are chosen from the entire number line. Let's call these the blessed numbers.
The blessed numbers are not equally distributed. The closer you are to 1, the more blessed numbers exist. In other words, the distance between 2 blessed numbers increases as you move away from 1. For example, if you go from, say, 1e100 (a 1 with a hundred zeroes) and want to find the next blessed number, it's quite a ways. It's in fact higher than 1.0! - 1e100+1 is in fact 1e100 again, because the way double math works is that after every single last mathematical operation you to do them, the end result is rounded to the nearest blessed number.
Let's try it!
double d = 1e100;
System.out.println(d);
System.out.println(d + 1);
// prints: 1.0E100
// 1.0E100
But that means.. double values don't actually represent a single number!!. What any given double represents is in fact this concept:
An unknown number whose value lies between [D - 𝛿, D + 𝛿], where D is the blessed number that is closed to this unknown number this value represents, and, and 𝛿 is half of the distance between D and the next nearest blessed number on either side.
Given that usually 𝛿 is incredibly small, this is 'good enough'. But this weirdness does explain why you really, really do not want any business at all with double if accuracy is important (such as with currencies. Don't store those in doubles, ever!)
Given that, what does -0.0 represent? not actually just 0. It represents, specifically: An unknown number whose value lies between [-𝛿, 0] where 0 is real zero (and this, has no sign), and 𝛿 is Double.MIN_VALUE: the smallest non-zero positive number representable with a double.
That's why -0.0 and +0.0 both exist: They are in fact different concepts. Rarely relevant, but sometimes it is. In contrast to e.g. long where 5 just means 5 and not "between 4.5 and 5.5", because longs fundamentally don't recognize that fractional parts exist in the first place. Given that 5 just means 5, then 0 just means 0, and there is no such thing as negative zero in the first place.
Now we get to 2's complement
2's complement is a cool system. It has two neat properties:
It only has the one zero.
It does not matter if you treat the bit sequence as signed-by-way-of-2s-complement or as unsigned, for the purposes of the operations: Addition, Substraction, Increment, Decrement, zero-check. The modifications you do to the bits to implement those operations is identical.
It DOES matter for greater than, less than, and divide.
2's complement works like this: To negate a number, take all bits and flip them (i.e. do a NOT operation on the bits). Then, add 1.
Let's try it!
int x = 5;
int y = -x;
for (int i = 31; i >= 0; i--) {
System.out.print((x >>> i) & 1);
}
System.out.println();
for (int i = 31; i >= 0; i--) {
System.out.print((y >>> i) & 1);
}
System.out.println();
// prints 00000000000000000000000000000101
// 11111111111111111111111111111011
As we can see, the 'flip all bits and add 1' algorithm was applied.
2s complement is, of course, reversible: If you do 'flip all bits and add 1' twice in a row you get the same number out.
Now let's try -0. 0 is 32 0 bits, then flip them all, then add 1:
00000000000000000000000000000000
11111111111111111111111111111111 // flip all
100000000000000000000000000000000 // add 1
00000000000000000000000000000000 // that 1 fell off
and because ints can only store 32 bits, that final '1' falls off of the end. And we're left with zero again.
Now let's go with bytes ( abit smaller) and try to add, say, 200 and 50 together.
11001000 // 200 in binary
00110010 // 50 in binary
-------- +
11111010 // 250 in binary.
now let's instead go: Oh wait, whoops, that was an error, actually these numbers are in 2s complement. That wasn't 200, nono. 11001000 is a bit sequence that actually means (let's apply the 'flip all bits, add 1' scheme: 00111000 - it's actually -56. So the operation was meant to represent '-56 + 50'. Which is -6. -6 in binary is (write out 6, flip bits, add 1):
00000110
11111001
11111010
hey now, look at that, nothing changed! It's the same result! So, when the computer does x + y, where x and y are numbers, the computer does not care. Whether x is "an unsigned number" or "a signed with 2s complement number", the operation is identical.
That's why 2s complement is applied. It makes math MUCH faster. The CPU doesn't have to futz about with branching out to deal with sign bits.
In this sense it is more correct to say that in java, int, long, char, byte and short are neither signed nor unsigned, they just are. At least for the purposes of +, -, ++, and --. No the idea that int is signed is fundamentally a property of e.g. System.out.println(int) - that method chooses to render the bitsequence 11111111111111111111111111111111 as "-1" instead of as 4294967296.

long has no such thing as negative zero. Only float and double have a different representation of positive and negative zero.

Related

Convert an int to hex in Java

I need to find out if int n is a power of two and my approach is to convert n to a hex number and check each bit (0 or 1). However, I've never used hex numbers in Java, could anyone help me out here?
Both converting to a String and replacing using a regular expression is expensive.
A simple way to check for a (positive) power of two is to check the number bits set.
if (x > 0 && Long.bitCount(x) == 1)
While Long.bitCount looks complicated, the JVM can replace it with a single machine code instruction.
The canonical way (which you will encounter a lot when googling how to test for powers of two, and you will probably encounter it in code) to test for powers of two is
x != 0 && (x & x - 1) == 0
It is often encountered with more parenthesis (precedence paranoia?)
x != 0 && (x & (x - 1)) == 0
It is common to skip the x != 0 check and guarantee zero can't be an input, or (often enough) zero is "as good as" a power of two in some sense.
You can of course change the condition to x > 0 if you don't want to consider -2147483648 a power of two, though in many cases it would be (because it's congruent to 231 modulo 232, so interpreted unsigned it's a PoT, and anyway it only has one bit set so it was a PoT all along)
So what's the deal with x & x - 1. It unsets the lowest set bit, but let's look at it in the context of PoTs.
Let's ignore x<2 and say x has the form a10k (ie an arbitrary string of bits followed by a one followed by k zeroes), subtracting 1 from it gives a01k because the -1 borrows through all the trailing zeroes until it reaches the lowest set bit, unsets that bit, then the borrow dies and the top bits are unchanged.
If you take the bitwise AND of a10k and a01k you get a00k because something ANDed with itself is itself again (the a), and in the tail there's always a 0 involved in the AND so it's all zeroes.
Now there are two cases. 0) a=0. Then the result is zero, and we know we started out with x of the form 10k which is a power of two. And 1) a!=0, then the result isn't zero either because a still appears in it, and x wasn't a power of two because it has at least two set bits (the lowest set bit which we explicitly looked at, and at least one other somewhere in a).
Well actually there are two more cases, x=0 and x=1 which were ignored. If we allow k to be zero then x=1 is also included. x=0 is annoying though, in x & x - 1 there is that x as the left operand of &, so the result must be zero no matter what happens in the right operand. So it falls out as a special case.
There is no reason to convert to hex to check bits. In fact, that actually makes it harder. The following line produces a binary String for an int variable "num":
String binary = Integer.toString(num, 2)
That String will only have 1s and 0s in it. Then eliminate any 0s:
String ones = binary.replace("0", "");
If the length of the String is more than one, there there is more than one bit, which means it is not a power of 2. If the length is 0, it means the number had no bits, which means it is 0.

Check division by 3 with binary operations?

I've read this interesting answer about "Checking if a number is divisible by 3"
Although the answer is in Java , it seems to work with other languages also.
Obviously we can do :
boolean canBeDevidedBy3 = (i % 3) == 0;
But the interesting part was this other calculation :
boolean canBeDevidedBy3 = ((int) (i * 0x55555556L >> 30) & 3) == 0;
For simplicity :
0x55555556L = "1010101010101010101010101010110"
Nb
There's also another method to check it :
One can determine if an integer is divisible by 3 by counting the 1
bits at odd bit positions, multiply this number by 2, add the number
of 1-bits at even bit positions add them to the result and check if
the result is divisible by 3
For example :
9310 ( is divisible by 3)
010111012
It has 2 bits in the odd places and 4 bits at the even places ( place is the zero based of the base 2 digit location)
So 2*1 + 4 = 6 which is divisible by 3.
At first I thought those 2 methods are related but I didn't find how.
Question
How does
boolean canBeDevidedBy3 = ((int) (i * 0x55555556L >> 30) & 3) == 0;
— actually determines if i%3==0 ?
Whenever you add 3 to a number, what you do is to add binary 11. Whatever the original value of the number, this will maintain the invariant that twice the number of 1 bits at odd positions, plus the number of 1 bits at even positions, will also be divisible by 3.
You can see that in this way. Let's call the value of the above expression c. You're adding 1 to an odd position, and 1 to an even position. When you add 1 to an even position, either the bit you've added 1 to was set or unset. If it was unset, you'll increase the value of c by 1, because you've added a new 1 in an odd position. If it was previously set, you'll flip that bit, but add a 1 in an even position (from the carry). This means that you initially decrease c by 1, but now when you add the 1 in the even position, you increase c by 2, so overall you've increased c by 2.
Of course, this carry bit might also get added to a bit that's already set, in which case we need to check that this part still increases c by 2: you'll remove a 1 in an even position (decreasing c by 2), and then add a 1 in an odd position (increasing c by 1), meaning that we've in fact decreased c by 1. That is the same as increasing c by 2, though, if we're working modulo 3.
A more formal version of this would be structured as a proof by induction.
The two methods do not appear to be related. The bit-wise method seems to be related to certain methods for the efficient computation of modulo b-1 when using digit base b, known in decimal arithmetic as "casting out nines".
The multiplication-based method is directly based on the definition of division when accomplished by multiplication with the reciprocal. Letting / denote mathematical division, we have
int_quot = (int)(i / 3)
frac_quot = i / 3 - int_quot = i / 3 - (int)(i / 3)
i % 3 = 3 * frac_quot = 3 * (i / 3 - (int)(i / 3))
The fractional portion of the mathematical quotient translates directly into the remainder of integer division: If the fraction is 0, the remainder is 0, if the fraction is 1/3 the remainder is 1, if the fraction is 2/3 the remainder is 2. This means we only need to examine the fractional portion of the quotient.
Instead of dividing by 3, we can multiply by 1/3. If we perform the computation in a 32.32 fixed-point format, 1/3 corresponds to 232*1/3 which is a number between 0x55555555 and 0x55555556. For reasons that will become apparent shortly, we use the overestimation here, that is the rounded-up result 0x555555556.
When we multiply 0x55555556 by i, the most significant 32 bits of the full 64-bit product will contain the integral portion of the quotient (int)(i * 1/3) = (int)(i / 3). We are not interested in this integral portion, so we neither compute nor store it. The lower 32-bits of the product is one of the fractions 0/3, 1/3, 2/3 however computed with a slight error since our value of 0x555555556 is slightly larger than 1/3:
i = 1: i * 0.55555556 = 0.555555556
i = 2: i * 0.55555556 = 0.AAAAAAAAC
i = 3: i * 0.55555556 = 1.000000002
i = 4: i * 0.55555556 = 1.555555558
i = 5: i * 0.55555556 = 1.AAAAAAAAE
If we examine the most significant bits of the three possible fraction values in binary, we find that 0x5 = 0101, 0xA = 1010, 0x0 = 0000. So the two most significant bits of the fractional portion of the quotient correspond exactly to the desired modulo values. Since we are dealing with 32-bit operands, we can extract these two bits with a right shift by 30 bits followed by a mask of 0x3 to isolate two bits. I think the masking is needed in Java as 32-bit integers are always signed. For uint32_t operands in C/C++ the shift alone would suffice.
We now see why choosing 0x55555555 as representation of 1/3 wouldn't work. The fractional portion of the quotient would turn into 0xFFFFFFF*, and since 0xF = 1111 in binary, the modulo computation would deliver an incorrect result of 3.
Note that as i increases in magnitude, the accumulated error from the imprecise representation of 1/3 affects more and more bits of the fractional portion. In fact, exhaustive testing shows that the method only works for i < 0x60000000: beyond that limit the error overwhelms the most significant fraction bits which represent our result.

When using doubles, why isn't (x / (y * z)) the same as (x / y / z)? [duplicate]

This question already has answers here:
How to avoid floating point precision errors with floats or doubles in Java?
(12 answers)
Double calculation producing odd result [duplicate]
(3 answers)
Closed 7 years ago.
This is partly academic, as for my purposes I only need it rounded to two decimal places; but I am keen to know what is going on to produce two slightly different results.
This is the test that I wrote to narrow it to the simplest implementation:
#Test
public void shouldEqual() {
double expected = 450.00d / (7d * 60); // 1.0714285714285714
double actual = 450.00d / 7d / 60; // 1.0714285714285716
assertThat(actual).isEqualTo(expected);
}
But it fails with this output:
org.junit.ComparisonFailure:
Expected :1.0714285714285714
Actual :1.0714285714285716
Can anyone explain in detail what is going on under the hood to result in the value at 1.000000000000000X being different?
Some of the points I'm looking for in an answer are:
Where is the precision lost?
Which method is preferred, and why?
Which is actually correct? (In pure maths, both can't be right. Perhaps both are wrong?)
Is there a better solution or method for these arithmetic operations?
I see a bunch of questions that tell you how to work around this problem, but not one that really explains what's going on, other than "floating-point roundoff error is bad, m'kay?" So let me take a shot at it. Let me first point out that nothing in this answer is specific to Java. Roundoff error is a problem inherent to any fixed-precision representation of numbers, so you get the same issues in, say, C.
Roundoff error in a decimal data type
As a simplified example, imagine we have some sort of computer that natively uses an unsigned decimal data type, let's call it float6d. The length of the data type is 6 digits: 4 dedicated to the mantissa, and 2 dedicated to the exponent. For example, the number 3.142 can be expressed as
3.142 x 10^0
which would be stored in 6 digits as
503142
The first two digits are the exponent plus 50, and the last four are the mantissa. This data type can represent any number from 0.001 x 10^-50 to 9.999 x 10^+49.
Actually, that's not true. It can't store any number. What if you want to represent 3.141592? Or 3.1412034? Or 3.141488906? Tough luck, the data type can't store more than four digits of precision, so the compiler has to round anything with more digits to fit into the constraints of the data type. If you write
float6d x = 3.141592;
float6d y = 3.1412034;
float6d z = 3.141488906;
then the compiler converts each of these three values to the same internal representation, 3.142 x 10^0 (which, remember, is stored as 503142), so that x == y == z will hold true.
The point is that there is a whole range of real numbers which all map to the same underlying sequence of digits (or bits, in a real computer). Specifically, any x satisfying 3.1415 <= x <= 3.1425 (assuming half-even rounding) gets converted to the representation 503142 for storage in memory.
This rounding happens every time your program stores a floating-point value in memory. The first time it happens is when you write a constant in your source code, as I did above with x, y, and z. It happens again whenever you do an arithmetic operation that increases the number of digits of precision beyond what the data type can represent. Either of these effects is called roundoff error. There are a few different ways this can happen:
Addition and subtraction: if one of the values you're adding has a different exponent from the other, you will wind up with extra digits of precision, and if there are enough of them, the least significant ones will need to be dropped. For example, 2.718 and 121.0 are both values that can be exactly represented in the float6d data type. But if you try to add them together:
1.210 x 10^2
+ 0.02718 x 10^2
-------------------
1.23718 x 10^2
which gets rounded off to 1.237 x 10^2, or 123.7, dropping two digits of precision.
Multiplication: the number of digits in the result is approximately the sum of the number of digits in the two operands. This will produce some amount of roundoff error, if your operands already have many significant digits. For example, 121 x 2.718 gives you
1.210 x 10^2
x 0.02718 x 10^2
-------------------
3.28878 x 10^2
which gets rounded off to 3.289 x 10^2, or 328.9, again dropping two digits of precision.
However, it's useful to keep in mind that, if your operands are "nice" numbers, without many significant digits, the floating-point format can probably represent the result exactly, so you don't have to deal with roundoff error. For example, 2.3 x 140 gives
1.40 x 10^2
x 0.23 x 10^2
-------------------
3.22 x 10^2
which has no roundoff problems.
Division: this is where things get messy. Division will pretty much always result in some amount of roundoff error unless the number you're dividing by happens to be a power of the base (in which case the division is just a digit shift, or bit shift in binary). As an example, take two very simple numbers, 3 and 7, divide them, and you get
3. x 10^0
/ 7. x 10^0
----------------------------
0.428571428571... x 10^0
The closest value to this number which can be represented as a float6d is 4.286 x 10^-1, or 0.4286, which distinctly differs from the exact result.
As we'll see in the next section, the error introduced by rounding grows with each operation you do. So if you're working with "nice" numbers, as in your example, it's generally best to do the division operations as late as possible because those are the operations most likely to introduce roundoff error into your program where none existed before.
Analysis of roundoff error
In general, if you can't assume your numbers are "nice", roundoff error can be either positive or negative, and it's very difficult to predict which direction it will go just based on the operation. It depends on the specific values involved. Look at this plot of the roundoff error for 2.718 z as a function of z (still using the float6d data type):
In practice, when you're working with values that use the full precision of your data type, it's often easier to treat roundoff error as a random error. Looking at the plot, you might be able to guess that the magnitude of the error depends on the order of magnitude of the result of the operation. In this particular case, when z is of the order 10-1, 2.718 z is also on the order of 10-1, so it will be a number of the form 0.XXXX. The maximum roundoff error is then half of the last digit of precision; in this case, by "the last digit of precision" I mean 0.0001, so the roundoff error varies between -0.00005 and +0.00005. At the point where 2.718 z jumps up to the next order of magnitude, which is 1/2.718 = 0.3679, you can see that the roundoff error also jumps up by an order of magnitude.
You can use well-known techniques of error analysis to analyze how a random (or unpredictable) error of a certain magnitude affects your result. Specifically, for multiplication or division, the "average" relative error in your result can be approximated by adding the relative error in each of the operands in quadrature - that is, square them, add them, and take the square root. With our float6d data type, the relative error varies between 0.0005 (for a value like 0.101) and 0.00005 (for a value like 0.995).
Let's take 0.0001 as a rough average for the relative error in values x and y. The relative error in x * y or x / y is then given by
sqrt(0.0001^2 + 0.0001^2) = 0.0001414
which is a factor of sqrt(2) larger than the relative error in each of the individual values.
When it comes to combining operations, you can apply this formula multiple times, once for each floating-point operation. So for instance, for z / (x * y), the relative error in x * y is, on average, 0.0001414 (in this decimal example) and then the relative error in z / (x * y) is
sqrt(0.0001^2 + 0.0001414^2) = 0.0001732
Notice that the average relative error grows with each operation, specifically as the square root of the number of multiplications and divisions you do.
Similarly, for z / x * y, the average relative error in z / x is 0.0001414, and the relative error in z / x * y is
sqrt(0.0001414^2 + 0.0001^2) = 0.0001732
So, the same, in this case. This means that for arbitrary values, on average, the two expressions introduce approximately the same error. (In theory, that is. I've seen these operations behave very differently in practice, but that's another story.)
Gory details
You might be curious about the specific calculation you presented in the question, not just an average. For that analysis, let's switch to the real world of binary arithmetic. Floating-point numbers in most systems and languages are represented using IEEE standard 754. For 64-bit numbers, the format specifies 52 bits dedicated to the mantissa, 11 to the exponent, and one to the sign. In other words, when written in base 2, a floating point number is a value of the form
1.1100000000000000000000000000000000000000000000000000 x 2^00000000010
52 bits 11 bits
The leading 1 is not explicitly stored, and constitutes a 53rd bit. Also, you should note that the 11 bits stored to represent the exponent are actually the real exponent plus 1023. For example, this particular value is 7, which is 1.75 x 22. The mantissa is 1.75 in binary, or 1.11, and the exponent is 1023 + 2 = 1025 in binary, or 10000000001, so the content stored in memory is
01000000000111100000000000000000000000000000000000000000000000000
^ ^
exponent mantissa
but that doesn't really matter.
Your example also involves 450,
1.1100001000000000000000000000000000000000000000000000 x 2^00000001000
and 60,
1.1110000000000000000000000000000000000000000000000000 x 2^00000000101
You can play around with these values using this converter or any of many others on the internet.
When you compute the first expression, 450/(7*60), the processor first does the multiplication, obtaining 420, or
1.1010010000000000000000000000000000000000000000000000 x 2^00000001000
Then it divides 450 by 420. This produces 15/14, which is
1.0001001001001001001001001001001001001001001001001001001001001001001001...
in binary. Now, the Java language specification says that
Inexact results must be rounded to the representable value nearest to the infinitely precise result; if the two nearest representable values are equally near, the one with its least significant bit zero is chosen. This is the IEEE 754 standard's default rounding mode known as round to nearest.
and the nearest representable value to 15/14 in 64-bit IEEE 754 format is
1.0001001001001001001001001001001001001001001001001001 x 2^00000000000
which is approximately 1.0714285714285714 in decimal. (More precisely, this is the least precise decimal value that uniquely specifies this particular binary representation.)
On the other hand, if you compute 450 / 7 first, the result is 64.2857142857..., or in binary,
1000000.01001001001001001001001001001001001001001001001001001001001001001...
for which the nearest representable value is
1.0000000100100100100100100100100100100100100100100101 x 2^00000000110
which is 64.28571428571429180465... Note the change in the last digit of the binary mantissa (compared to the exact value) due to roundoff error. Dividing this by 60 gives you
1.000100100100100100100100100100100100100100100100100110011001100110011...
Look at the end: the pattern is different! It's 0011 that repeats, instead of 001 as in the other case. The closest representable value is
1.0001001001001001001001001001001001001001001001001010 x 2^00000000000
which differs from the other order of operations in the last two bits: they're 10 instead of 01. The decimal equivalent is 1.0714285714285716.
The specific rounding that causes this difference should be clear if you look at the exact binary values:
1.0001001001001001001001001001001001001001001001001001001001001001001001...
1.0001001001001001001001001001001001001001001001001001100110011001100110...
^ last bit of mantissa
It works out in this case that the former result, numerically 15/14, happens to be the most accurate representation of the exact value. This is an example of how leaving division until the end benefits you. But again, this rule only holds as long as the values you're working with don't use the full precision of the data type. Once you start working with inexact (rounded) values, you no longer protect yourself from further roundoff errors by doing the multiplications first.
It has to do with how the double type is implemented and the fact that the floating-point types don't make the same precision guarantees as other simpler numerical types. Although the following answer is more specifically about sums, it also answers your question by explaining how there is no guarantee of infinite precision in floating-point mathematical operations: Why does changing the sum order returns a different result?. Essentially you should never attempt to determine the equality of floating-point values without specifying an acceptable margin of error. Google's Guava library includes DoubleMath.fuzzyEquals(double, double, double) to determine the equality of two double values within a certain precision. If you wish to read up on the specifics of floating-point equality this site is quite useful; the same site also explains floating-point rounding errors. In summation: the expected and actual values of your calculation differ because of the rounding differing between the calculations due to the order of operations.
Let's simplify things a bit. What you want to know is why 450d / 420 and 450d / 7 / 60 (specifically) give different results.
Let's see how division is performed in IEE double-precision floating point format. Without going deep into implementation details, it's basically XOR-ing the sign bit, subtracting the exponent of the divisor from the exponent of the dividend, dividing the mantissas, and normalizing the result.
First, we should represent our numbers in the proper format for double:
450 is 0 10000000111 1100001000000000000000000000000000000000000000000000
420 is 0 10000000111 1010010000000000000000000000000000000000000000000000
7 is 0 10000000001 1100000000000000000000000000000000000000000000000000
60 is 0 10000000100 1110000000000000000000000000000000000000000000000000
Let's first divide 450 by 420
First comes the sign bit, it's 0 (0 xor 0 == 0).
Then comes the exponent. 10000000111b - 10000000111b + 1023 == 10000000111b - 10000000111b + 01111111111b == 01111111111b
Looking good, now the mantissa:
1.1100001000000000000000000000000000000000000000000000 / 1.1010010000000000000000000000000000000000000000000000 == 1.1100001 / 1.101001. There are a couple of different ways to do this, I'll talk a bit about them later. The result is 1.0(001) (you can verify it here).
Now we should normalize the result. Let's see the guard, round and sticky bit values:
0001001001001001001001001001001001001001001001001001 0 0 1
Guard bit's 0, we don't do any rounding. The result is, in binary:
0 01111111111 0001001001001001001001001001001001001001001001001001
Which gets represented as 1.0714285714285714 in decimal.
Now let's divide 450 by 7 by analogy.
Sign bit = 0
Exponent = 10000000111b - 10000000001b + 01111111111b == -01111111001b + 01111111111b + 01111111111b == 10000000101b
Mantissa = 1.1100001 / 1.11 == 1.00000(001)
Rounding:
0000000100100100100100100100100100100100100100100100 1 0 0
Guard bit is set, round and sticky bits are not. We are rounding to-nearest (default mode for IEEE), and we're stuck right between the two possible values which we could round to. As the lsb is 0, we add 1. This gives us the rounded mantissa:
0000000100100100100100100100100100100100100100100101
The result is
0 10000000101 0000000100100100100100100100100100100100100100100101
Which gets represented as 64.28571428571429 in decimal.
Now we will have to divide it by 60... But you already know that we have lost some precision. Dividing 450 by 420 didn't require rounding at all, but here, we already had to round the result at least once. But, for completeness's sake, let's finish the job:
Dividing 64.28571428571429 by 60
Sign bit = 0
Exponent = 10000000101b - 10000000100b + 01111111111b == 01111111110b
Mantissa = 1.0000000100100100100100100100100100100100100100100101 / 1.111 == 0.10001001001001001001001001001001001001001001001001001100110011
Round and shift:
0.1000100100100100100100100100100100100100100100100100 1 1 0 0
1.0001001001001001001001001001001001001001001001001001 1 0 0
Rounding just as in the previous case, we get the mantissa: 0001001001001001001001001001001001001001001001001010.
As we shifted by 1, we add that to the exponent, getting
Exponent = 01111111111b
So, the result is:
0 01111111111 0001001001001001001001001001001001001001001001001010
Which gets represented as 1.0714285714285716 in decimal.
Tl;dr:
The first division gave us:
0 01111111111 0001001001001001001001001001001001001001001001001001
And the last division gave us:
0 01111111111 0001001001001001001001001001001001001001001001001010
The difference is in the last 2 bits only, but we could have lost more - after all, to get the second result, we had to round two times instead of none!
Now, about mantissa division. Floating-point division is implemented in two major ways.
The way mandated by the IEEE long division (here are some good examples; it's basically the regular long division, but with binary instead of decimal), and it's pretty slow. That is what your computer did.
There is also a faster but less accrate option, multiplication by inverse. First, a reciprocal of the divisor is found, and then multiplication is performed.
That's because double division often lead to a loss of precision. Said loss can vary depending on the order of the divisions.
When you divide by 7d, you already lost some precision with the actual result. Then only you divide an erroneous result by 60.
When you divide by 7d * 60, you only have to use division once, thus losing precision only once.
Note that double multiplication can sometimes fail too, but that's much less common.
Certainly the order of the operations mixed with the fact that doubles aren't precise :
450.00d / (7d * 60) --> a = 7d * 60 --> result = 450.00d / a
vs
450.00d / 7d / 60 --> a = 450.00d /7d --> result = a / 60

Why does this random value have a 25/75 distribution instead of 50/50?

Edit: So basically what I'm trying to write is a 1 bit hash for double.
I want to map a double to true or false with a 50/50 chance. For that I wrote code that picks some random numbers (just as an example, I want to use this on data with regularities and still get a 50/50 result), checks their last bit and increments y if it is 1, or n if it is 0.
However, this code constantly results in 25% y and 75% n. Why is it not 50/50? And why such a weird, but straight-forward (1/3) distribution?
public class DoubleToBoolean {
#Test
public void test() {
int y = 0;
int n = 0;
Random r = new Random();
for (int i = 0; i < 1000000; i++) {
double randomValue = r.nextDouble();
long lastBit = Double.doubleToLongBits(randomValue) & 1;
if (lastBit == 1) {
y++;
} else {
n++;
}
}
System.out.println(y + " " + n);
}
}
Example output:
250167 749833
Because nextDouble works like this: (source)
public double nextDouble()
{
return (((long) next(26) << 27) + next(27)) / (double) (1L << 53);
}
next(x) makes x random bits.
Now why does this matter? Because about half the numbers generated by the first part (before the division) are less than 1L << 52, and therefore their significand doesn't entirely fill the 53 bits that it could fill, meaning the least significant bit of the significand is always zero for those.
Because of the amount of attention this is receiving, here's some extra explanation of what a double in Java (and many other languages) really looks like and why it mattered in this question.
Basically, a double looks like this: (source)
A very important detail not visible in this picture is that numbers are "normalized"1 such that the 53 bit fraction starts with a 1 (by choosing the exponent such that it is so), that 1 is then omitted. That is why the picture shows 52 bits for the fraction (significand) but there are effectively 53 bits in it.
The normalization means that if in the code for nextDouble the 53rd bit is set, that bit is the implicit leading 1 and it goes away, and the other 52 bits are copied literally to the significand of the resulting double. If that bit is not set however, the remaining bits must be shifted left until it becomes set.
On average, half the generated numbers fall into the case where the significand was not shifted left at all (and about half those have a 0 as their least significant bit), and the other half is shifted by at least 1 (or is just completely zero) so their least significant bit is always 0.
1: not always, clearly it cannot be done for zero, which has no highest 1. These numbers are called denormal or subnormal numbers, see wikipedia:denormal number.
From the docs:
The method nextDouble is implemented by class Random as if by:
public double nextDouble() {
return (((long)next(26) << 27) + next(27))
/ (double)(1L << 53);
}
But it also states the following (emphasis mine):
[In early versions of Java, the result was incorrectly calculated as:
return (((long)next(27) << 27) + next(27))
/ (double)(1L << 54);
This might seem to be equivalent, if not better, but in fact it introduced a large nonuniformity because of the bias in the rounding of floating-point numbers: it was three times as likely that the low-order bit of the significand would be 0 than that it would be 1! This nonuniformity probably doesn't matter much in practice, but we strive for perfection.]
This note has been there since Java 5 at least (docs for Java <= 1.4 are behind a loginwall, too lazy to check). This is interesting, because the problem apparently still exists even in Java 8. Perhaps the "fixed" version was never tested?
This result doesn't surprise me given how floating-point numbers are represented. Let's suppose we had a very short floating-point type with only 4 bits of precision. If we were to generate a random number between 0 and 1, distributed uniformly, there would be 16 possible values:
0.0000
0.0001
0.0010
0.0011
0.0100
...
0.1110
0.1111
If that's how they looked in the machine, you could test the low-order bit to get a 50/50 distribution. However, IEEE floats are represented as a power of 2 times a mantissa; one field in the float is the power of 2 (plus a fixed offset). The power of 2 is selected so that the "mantissa" part is always a number >= 1.0 and < 2.0. This means that, in effect, the numbers other than 0.0000 would be represented like this:
0.0001 = 2^(-4) x 1.000
0.0010 = 2^(-3) x 1.000
0.0011 = 2^(-3) x 1.100
0.0100 = 2^(-2) x 1.000
...
0.0111 = 2^(-2) x 1.110
0.1000 = 2^(-1) x 1.000
0.1001 = 2^(-1) x 1.001
...
0.1110 = 2^(-1) x 1.110
0.1111 = 2^(-1) x 1.111
(The 1 before the binary point is an implied value; for 32- and 64-bit floats, no bit is actually allocated to hold this 1.)
But looking at the above should demonstrate why, if you convert the representation to bits and look at the low bit, you will get zero 75% of the time. This is due to all values less than 0.5 (binary 0.1000), which is half the possible values, having their mantissas shifted over, causing 0 to appear in the low bit. The situation is essentially the same when the mantissa has 52 bits (not including the implied 1) as a double does.
(Actually, as #sneftel suggested in a comment, we could include more than 16 possible values in the distribution, by generating:
0.0001000 with probability 1/128
0.0001001 with probability 1/128
...
0.0001111 with probability 1/128
0.001000 with probability 1/64
0.001001 with probability 1/64
...
0.01111 with probability 1/32
0.1000 with probability 1/16
0.1001 with probability 1/16
...
0.1110 with probability 1/16
0.1111 with probability 1/16
But I'm not sure it's the kind of distribution most programmers would expect, so it probably isn't worthwhile. Plus it doesn't gain you much when the values are used to generate integers, as random floating-point values often are.)

What is the purpose of the unsigned right shift operator ">>>" in Java?

I understand what the unsigned right shift operator ">>>" in Java does, but why do we need it, and why do we not need a corresponding unsigned left shift operator?
The >>> operator lets you treat int and long as 32- and 64-bit unsigned integral types, which are missing from the Java language.
This is useful when you shift something that does not represent a numeric value. For example, you could represent a black and white bit map image using 32-bit ints, where each int encodes 32 pixels on the screen. If you need to scroll the image to the right, you would prefer the bits on the left of an int to become zeros, so that you could easily put the bits from the adjacent ints:
int shiftBy = 3;
int[] imageRow = ...
int shiftCarry = 0;
// The last shiftBy bits are set to 1, the remaining ones are zero
int mask = (1 << shiftBy)-1;
for (int i = 0 ; i != imageRow.length ; i++) {
// Cut out the shiftBits bits on the right
int nextCarry = imageRow & mask;
// Do the shift, and move in the carry into the freed upper bits
imageRow[i] = (imageRow[i] >>> shiftBy) | (carry << (32-shiftBy));
// Prepare the carry for the next iteration of the loop
carry = nextCarry;
}
The code above does not pay attention to the content of the upper three bits, because >>> operator makes them
There is no corresponding << operator because left-shift operations on signed and unsigned data types are identical.
>>> is also the safe and efficient way of finding the rounded mean of two (large) integers:
int mid = (low + high) >>> 1;
If integers high and low are close to the the largest machine integer, the above will be correct but
int mid = (low + high) / 2;
can get a wrong result because of overflow.
Here's an example use, fixing a bug in a naive binary search.
Basically this has to do with sign (numberic shifts) or unsigned shifts (normally pixel related stuff).
Since the left shift, doesn't deal with the sign bit anyhow, it's the same thing (<<< and <<)...
Either way I have yet to meet anyone that needed to use the >>>, but I'm sure they are out there doing amazing things.
As you have just seen, the >> operator automatically fills the
high-order bit with its previous contents each time a shift occurs.
This preserves the sign of the value. However, sometimes this is
undesirable. For example, if you are shifting something that does not
represent a numeric value, you may not want sign extension to take
place. This situation is common when you are working with pixel-based
values and graphics. In these cases you will generally want to shift a
zero into the high-order bit no matter what its initial value was.
This is known as an unsigned shift. To accomplish this, you will use
java’s unsigned, shift-right operator,>>>, which always shifts zeros
into the high-order bit.
Further reading:
http://henkelmann.eu/2011/02/01/java_the_unsigned_right_shift_operator
http://www.java-samples.com/showtutorial.php?tutorialid=60
The signed right-shift operator is useful if one has an int that represents a number and one wishes to divide it by a power of two, rounding toward negative infinity. This can be nice when doing things like scaling coordinates for display; not only is it faster than division, but coordinates which differ by the scale factor before scaling will differ by one pixel afterward. If instead of using shifting one uses division, that won't work. When scaling by a factor of two, for example, -1 and +1 differ by two, and should thus differ by one afterward, but -1/2=0 and 1/2=0. If instead one uses signed right-shift, things work out nicely: -1>>1=-1 and 1>>1=0, properly yielding values one pixel apart.
The unsigned operator is useful either in cases where either the input is expected to have exactly one bit set and one will want the result to do so as well, or in cases where one will be using a loop to output all the bits in a word and wants it to terminate cleanly. For example:
void processBitsLsbFirst(int n, BitProcessor whatever)
{
while(n != 0)
{
whatever.processBit(n & 1);
n >>>= 1;
}
}
If the code were to use a signed right-shift operation and were passed a negative value, it would output 1's indefinitely. With the unsigned-right-shift operator, however, the most significant bit ends up being interpreted just like any other.
The unsigned right-shift operator may also be useful when a computation would, arithmetically, yield a positive number between 0 and 4,294,967,295 and one wishes to divide that number by a power of two. For example, when computing the sum of two int values which are known to be positive, one may use (n1+n2)>>>1 without having to promote the operands to long. Also, if one wishes to divide a positive int value by something like pi without using floating-point math, one may compute ((value*5468522205L) >>> 34) [(1L<<34)/pi is 5468522204.61, which rounded up yields 5468522205]. For dividends over 1686629712, the computation of value*5468522205L would yield a "negative" value, but since the arithmetically-correct value is known to be positive, using the unsigned right-shift would allow the correct positive number to be used.
A normal right shift >> of a negative number will keep it negative. I.e. the sign bit will be retained.
An unsigned right shift >>> will shift the sign bit too, replacing it with a zero bit.
There is no need to have the equivalent left shift because there is only one sign bit and it is the leftmost bit so it only interferes when shifting right.
Essentially, the difference is that one preserves the sign bit, the other shifts in zeros to replace the sign bit.
For positive numbers they act identically.
For an example of using both >> and >>> see BigInteger shiftRight.
In the Java domain most typical applications the way to avoid overflows is to use casting or Big Integer, such as int to long in the previous examples.
int hiint = 2147483647;
System.out.println("mean hiint+hiint/2 = " + ( (((long)hiint+(long)hiint)))/2);
System.out.println("mean hiint*2/2 = " + ( (((long)hiint*(long)2)))/2);
BigInteger bhiint = BigInteger.valueOf(2147483647);
System.out.println("mean bhiint+bhiint/2 = " + (bhiint.add(bhiint).divide(BigInteger.valueOf(2))));

Categories