I'm writing code in Java using short-typed variables. Shorts are normally 16 bits, but unfortunately Java doesn't have unsigned primitive types, so I'm using the 15 lower bits instead and ignoring the sign bit. Please don't suggest changes to this part, as I'm already quite far into this implementation. Here is my question:
I have a variable which I need to XOR.
In C++ I would just write
myunsignedshort = myunsignedshort ^ 0x2000;
0x2000 (hex) = 0010000000000000 (binary)
However, in Java, I have to deal with the sign bit also so I'm trying to change my mask so that it doesn't affect the xor...
mysignedshort = mysignedshort ^ 0xA000;
0xA000 (hex) = 1010000000000000 (binary)
This isn't having the desired effect and I'm not sure why. Can anyone see where I'm going wrong?
Regards.
EDIT: OK, you guys are right, that bit wasn't causing the issue.
The issue comes when I'm shifting bits to the left:
I accidentally shift bits into the sign bit.
mysignedshort = mysignedshort << 1;
Any ideas how to avoid this new problem, so that if a shift would reach the MSB then nothing happens at all? Or should I just do a manual test? There's a lot of this shifting in the code, though, so I would prefer a more terse solution.
Regards.
Those operations don't care about signedness, as mentioned in the comments. But I can expand on that.
Operations for which the signed and unsigned versions are the same:
addition/subtraction
and/or/xor
multiplication
left shift
equality testing
Operations for which they are different:
division/remainder
right shift, there's >> and >>>
ordered comparison, you can make a < b as (a ^ 0x80000000) < (b ^ 0x80000000) to change from signed to unsigned, or unsigned to signed.
You can also use (a & 0xffffffffL) < (b & 0xffffffffL) to get an unsigned comparison, but that doesn't generalize to longs.
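A quick sketch of the sign-bit-flip trick (note that Integer.compareUnsigned, available since Java 8, does the same job without the XOR gymnastics):

```java
public class UnsignedCompare {
    // Signed < becomes unsigned < by flipping the sign bit of both operands.
    static boolean unsignedLess(int a, int b) {
        return (a ^ 0x80000000) < (b ^ 0x80000000);
    }

    public static void main(String[] args) {
        int big = 0xFFFFFFFF;                        // -1 signed, 4294967295 unsigned
        System.out.println(unsignedLess(1, big));    // true: 1 < 4294967295 unsigned
        System.out.println(1 < big);                 // false: 1 < -1 signed
        System.out.println(Integer.compareUnsigned(1, big) < 0); // true, built-in
    }
}
```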
Related
Hello everyone, I'm learning programming in Java and I wanted to know: why is sign-bit propagation done by ">>" and not ">>>"?
I would assume << and >> should have the same implementation.
Sorry if it sounds like a silly question :)
Thanks in advance!
The reason it works this way is because C and C++ used << for left shift and >> for right shift long before Java. Those languages have both signed and unsigned types, and for signed types the sign bit was propagated in the right-shift case.
Java does not have unsigned types, so they kept the behavior of C and C++ so as not to sow confusion and incur the undying wrath of developers the world over. Then they included >>> to provide a right-shift that treated the bit value as unsigned.
This question is really about reading James Gosling's mind :-). But my guess is that << and >> both make sense mathematically: << causes a number to be multiplied by 2 (barring overflow), and >> causes a number to be divided by 2--when you have sign propagation, this works whether the number is positive or negative. Perhaps the language designers thought this would be a more common use of the right shift than the operator that propagates 0's, which is more useful when integers are treated as strings of bits rather than actual numbers. Neither way is "right" or "wrong", and it's possible that if Gosling had had something different for breakfast that morning, he might have seen things your way instead...
Let's start with the questions that you didn't ask :-)
Q: Why is there no <<<?
A1: Because << performs the appropriate reverse operation for both >> and >>>.
>> N is equivalent to dividing by 2^N for a signed integer
>>> N is equivalent to dividing by 2^N for an unsigned integer
<< N is equivalent to multiplying by 2^N for both signed and unsigned integers
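A small demonstration of those equivalences (the values are arbitrary):

```java
public class ShiftEquivalence {
    public static void main(String[] args) {
        // >> N divides a signed value by 2^N (rounding toward negative infinity)
        System.out.println(-24 >> 2);    // -6
        // >>> N divides the same bit pattern interpreted as unsigned:
        // -24 is 0xFFFFFFE8, and its top four bits are 0xF
        System.out.println(-24 >>> 28);  // 15
        // << N multiplies by 2^N regardless of sign (barring overflow)
        System.out.println(-3 << 2);     // -12
        System.out.println(3 << 2);      // 12
    }
}
```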
A2: Because the sign bit is on the left hand end, so "extending" it when you shift leftwards is nonsensical. (Rotating would make sense, but Java doesn't have any "rotate" operators. Reasons: C precedent, lack of hardware support in some instruction sets, rotation is rarely needed in Java code.)
Q: Why does only one of >> and >>> do sign extension?
A: Because if they both did (or neither did) then you wouldn't need two operators.
Now for your questions (I think):
Q: Why did they choose >> to do sign extension and not >>>?
A: This is really unanswerable. As far as we know, there is no extant publicly available contemporaneous record of the original Oak / Java language design decisions. At best, we have to rely on the memory of James Gosling ... and his willingness to answer questions. AFAIK, the question has not been asked.
But my conjecture is that since Java integer types are (mostly) signed, it was thought that the >> operator would be used more often. In hindsight, I think Gosling et al got that right.
But this was NOT about copying C or C++. In those languages there is only one right-shift operator (>>), and its behavior for negative signed integers is implementation-defined! The >>> operator in Java was designed to fix that problem; i.e. to remove the portability problem of C / C++ >>.
(Reference: Section 6.5.7 of the draft C11 Language spec.)
Next your comment:
I would assume << and >> should have the same implementation. By same implementation I mean same process but opposite direction.
That is answered above. From the perspective of useful functionality, >> and << do perform the same process for signed numbers, just in opposite directions; i.e. division versus multiplication. And for unsigned numbers, << corresponds to >>> in the same way.
Why the difference? It is basically down to the mathematics of 2's complement and unsigned binary representations.
Note that you cannot perform the mathematical inverse of >> or >>>. Intuitively, these operators throw away the bits on the right end. Once thrown away, those bits cannot be recovered.
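For example:

```java
public class LostBits {
    public static void main(String[] args) {
        int x = 5;                         // 0b101
        int shifted = x >> 1;              // 0b10 = 2; the low bit is discarded
        System.out.println(shifted << 1);  // 4, not 5: >> has no inverse
    }
}
```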
Q: So why don't they make << (or a hypothetical <<<) "extend" the right hand bit?
A: Because:
It is not useful. (I cannot think of any mainstream use-cases for extending the right hand bits of a number.)
There is typically no hardware support (... because it is not useful!)
I'm trying to implement a class that stores a 32-bit number without using the int primitive type. To do so, I'm using two short variables, msbs and lsbs, to store the 32 bits of the number, 16 bits in each variable.
The variable msbs will store the first 16 bits of the number and the lsbs variable the remaining 16 bits.
When it comes to saving the given bytes to the variables, I apply the following formula (the byte order is given in little-endian notation):
Input -> byte[] n = {0b00110101, -3, 0b1001, 0b0}; for the number 0b00000000 00001001 11111101 00110101 (654645)
msbs = ((n[3] << 8) | n[2]);
lsbs = ((n[1] << 8) | n[0]);
As shown below
private void saveNumber(byte[] n) {
msbs = (byte)((n[3] << 8) | n[2]);
lsbs = (byte)((n[1] << 8) | n[0]);
System.out.println(Integer.toBinaryString((n[1] << 8) | n[0]));//Prints 11111111111111111111110100110101
System.out.println("msbs -> " + Integer.toBinaryString(msbs));
System.out.println("lsbs -> " + Integer.toBinaryString(lsbs));//Prints 110101
}
The line
System.out.println(Integer.toBinaryString((n[1] << 8) | n[0]));//Prints 11111111111111111111110100110101
prints exactly what I need, despite the huge amount of useless 1 bits at the beginning (which I can get rid of just by casting to short)
But when I print lsbs, where I store the exact same value (apparently), it outputs 110101 when it should be 0b1111110100110101
Why does this behavior occur? I understand it must be something to do with the "internal" casting performed by Java when storing the value 11111111111111111111110100110101 into a 16-bit primitive type (which, personally, I think shouldn't be happening, because I am shifting an 8-bit number 8 bits to the left, which should give me a 16-bit number)
As a side note, the msbs variable is doing exactly what I want it to do, so the problem is probably related to the way Java represents negative numbers
Btw, I know Java isn't exactly the best language to have fun with bits.
Why does this behavior occur?
In Java, all bitwise operations are 32 or 64 bit operations. This is different from some other languages, and can be unexpected. But it is what it is.
I understand It must be something with the "internal" casting performed by Java ....
Java doesn't do an implicit narrowing cast in any of your examples1. In fact, I think that the cause of the unexpected behaviour is an explicit narrowing cast in your code:
msbs = (byte)((n[3] << 8) | n[2]);
You have explicitly cast a 32 bit value from ((n[3] << 8) | n[2]) to a byte. Based on what you say you expect, you should be casting to short.
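A corrected sketch of the method (assuming msbs and lsbs are short fields; the & 0xFF masks are an extra safety measure beyond the cast fix, so that a negative byte in n[0] or n[2] cannot sign-extend over its neighbour's bits):

```java
public class Num32 {
    short msbs, lsbs;

    // Cast to short, not byte, and mask each byte with 0xFF so that
    // sign extension of negative bytes (e.g. -3) cannot leak 1-bits
    // into the other byte's positions.
    void saveNumber(byte[] n) {
        msbs = (short) (((n[3] & 0xFF) << 8) | (n[2] & 0xFF));
        lsbs = (short) (((n[1] & 0xFF) << 8) | (n[0] & 0xFF));
    }

    public static void main(String[] args) {
        Num32 num = new Num32();
        num.saveNumber(new byte[]{0b00110101, -3, 0b1001, 0b0});
        // Mask with 0xFFFF when printing so the short is shown unsigned.
        System.out.println(Integer.toBinaryString(num.lsbs & 0xFFFF)); // 1111110100110101
        System.out.println(Integer.toBinaryString(num.msbs & 0xFFFF)); // 1001
    }
}
```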
Aside: When you write things like "Which personally, I think sholdn't be happening ...", it implies that you are doubting the correctness of the Java compiler. In fact, in 99.999% of cases2, the real problem is that someone does not understand what the compiler should do with their code; i.e. their knowledge of the language is too shallow3. In most cases, there is a specification of the programming language that says precisely what a particular construct means. In the Java case, it is the Java Language Specification.
1 - In fact, the only cases I can think of where implicit narrowing happens with primitive types are in the compound assignment operators (e.g. += and <<=).
2 - I made that number up, but the point is that compiler bugs are rarely the cause of unexpected application behaviour.
3 - Or maybe it is just an application bug that the programmer has missed. Tiredness can do bad things to the brain ...
I need to extract an exact range of bits from an existing long, specifically I need bits 51:12 from a 64 bit value.
The value is:
0x0000000415B2C01E
So the value of bits 51:12 should be:
0x0000415B2C
I'm a bit confused as to how to actually extract that range, or any range for that matter. I've been told to simply left shift by 12 (value << 12) to obtain the bits I need, but that gives me the value of:
0x415B2C01E000
Now I might be completely misunderstanding how bit shifting works, but I can't get my head around how to extract bit ranges. I've found a lot of existing stuff on it, but I'm even more confused about it all now.
If anyone could help me out, it would certainly be appreciated.
Thanks
Shift and mask:
answer = value >> 12 & 0xFFFFFFFFFFL;
The mask has ten F's (40 one-bits), one for each bit in the range 51:12, and needs the L suffix because it doesn't fit in an int.
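A generalized sketch (bits is a hypothetical helper, assuming 0 <= lo <= hi < 63):

```java
public class BitRange {
    // Extract bits hi..lo (inclusive) from a long: shift right by lo,
    // then mask off everything above bit (hi - lo).
    static long bits(long value, int hi, int lo) {
        return (value >>> lo) & ((1L << (hi - lo + 1)) - 1);
    }

    public static void main(String[] args) {
        long value = 0x0000000415B2C01EL;
        System.out.printf("0x%X%n", bits(value, 51, 12)); // 0x415B2C
    }
}
```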
We can shift using the >> operator, and we can use '/' to divide in Java. What I am asking is what really happens behind the scenes when we do these operations: are they exactly the same or not?
No, absolutely not the same.
You can use >> to divide, yes, but only by powers of 2, because >> shifts all the bits to the right, which has the effect of dividing the number by 2 for each position shifted.
This follows from how binary positional notation works. It holds for unsigned numbers; for signed ones it depends on which encoding is used and what kind of shift it is.
eg.
122 = 01111010 >> 1 = 00111101 = 61
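For signed values the two genuinely differ: >> rounds toward negative infinity, while / rounds toward zero. For example:

```java
public class ShiftVsDivide {
    public static void main(String[] args) {
        // For non-negative values the results agree:
        System.out.println(122 >> 1);  // 61
        System.out.println(122 / 2);   // 61
        // For negative values they round differently:
        System.out.println(-7 >> 1);   // -4 (rounds toward negative infinity)
        System.out.println(-7 / 2);    // -3 (rounds toward zero)
    }
}
```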
Check this out for an explanation on bit shifting:
What are bitwise shift (bit-shift) operators and how do they work?
Once you understand that, you should understand the difference between that and the division operation.
Example code:
int a = 255;
byte b = (byte) a;
int c = b & 0xff; // Here be dragons
System.out.println(a);
System.out.println(b);
System.out.println(c);
So we start with an integer value of 255, convert it to a byte (becoming -1) and then converting it back to an int by using a magic formula. The expected output is:
255
-1
255
I'm wondering if this b & 0xff is the most elegant way to do this conversion. Checkstyle, for example, complains about a magic number at this place, and it's not a good idea to suppress this check for the value, because in other places 255 may really be a magic number which should be avoided. And it's quite annoying to define a constant for stuff like this on my own. So I wonder: is there a standard method in the JRE which does this conversion instead? Or maybe an already-defined constant with the highest unsigned byte value (similar to Byte.MAX_VALUE, which is the highest signed value)?
So to keep the question short: How can I convert a byte to an int without using a magic number?
Ok, so far the following possibilities were mentioned:
Keep using & 0xff and suppress the magic number 255 in checkstyle. Disadvantage: other places that use this number in some other context (not bit operations) are then no longer checked either. Advantage: short and easy to read.
Define my own constant for it and then use code like & SomeConsts.MAX_UNSIGNED_BYTE_VALUE. Disadvantage: If I need it in different classes then I have to define my own constant class just for this darn constant. Advantage: No magic numbers here.
Do some clever math like b & ((1 << Byte.SIZE) - 1). The compiler output is most likely the same because it gets optimized to a constant value. Disadvantage: quite a lot of code, difficult to read. Advantage: as long as 1 is not defined as a magic number (checkstyle ignores it by default) we have no magic number here, and we don't need to define custom constants. And if bytes are redefined to be 16 bits some day (just kidding), it still works, because then Byte.SIZE will be 16 and not 8.
Are there more ideas? Maybe some other clever bit-wise operation which is shorter than the one above and only uses numbers like 0 and 1?
This is the standard way to do that transformation. If you want to get rid of the checkstyle complaints, try defining a constant, it could help:
public final static int MASK = 0xff;
BTW - keep in mind, that it is still a custom conversion. byte is a signed datatype so a byte can never hold the value 255. A byte can store the bit pattern 1111 1111 but this represents the integer value -1.
So in fact you're doing bit operations - and bit operations always require some magic numbers.
BTW-2: Yes, there is a Byte.MAX_VALUE constant, but this is - because byte is signed - defined as 2^7 - 1 (= 127). So it won't help in your case. You would need a byte constant for -1.
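You can verify this directly:

```java
public class ByteSign {
    public static void main(String[] args) {
        byte b = (byte) 255;                 // bit pattern 1111 1111
        System.out.println(b);               // -1: same bits, signed interpretation
        System.out.println(b & 0xFF);        // 255: masked back to unsigned
        System.out.println(Byte.MAX_VALUE);  // 127 = 2^7 - 1
    }
}
```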
Ignore checkstyle. 0xFF is not a magic number. If you define a constant for it, the constant is a magic constant, which is much less understandable than 0xFF itself. Every programmer educated in the recent centuries should be more familiar with 0xFF than with his girlfriend, if any.
should we write code like this?
for(int i = Math.ZERO; ... )
Guava to the rescue.
com.google.common.primitives.UnsignedBytes.toInt
Java 8 provides Byte.toUnsignedInt and Byte.toUnsignedLong (probably for really big bytes) methods:
byte b = (byte)255;
int c = Byte.toUnsignedInt(b); // 255
long asLong = Byte.toUnsignedLong(b); // 255
I wrote a method for this like
public static int unsigned(byte x) {
    return x & 0xFF;
}
which is overloaded for short and int parameters, too (where int gets extended to long).
Instead of 0xFF you could use Byte.MAX_VALUE + Byte.MAX_VALUE + 1 to keep FindBugs quiet, but I'd consider that to be obfuscation. And it's too easy to get it wrong (see previous versions of this answer).