I am working on narrowing conversions and tried the following code:
int i = 131072;
short s = (short)i;
System.out.println(s); //giving 0
This narrowing outputs 0, and I am not able to work out the logic behind it.
131072 int is 00000000 00000010 00000000 00000000 in binary.
When you cast it to short, only the lowest 16 bits remain - 00000000 00000000.
When you cast a primitive to a smaller primitive, the top bits are dropped.
Written another way, you can see what is happening:
int i = 0x20000;
short s = (short) (i & 0xFFFF);
Note: the lower 16 bits of your integer are all zero, so the answer is 0.
In binary: casting to (short) keeps only the lower 16 bits (the part shown in parentheses below).
00000000 00000010 (00000000 00000000)
If you were to cast a longer number, it would still take the lower bits in each case. Note: the & in each case is redundant and is there only for clarity.
long l = 0x0FEDCBA987654321L;
// i = 0x87654321
int i = (int) (l & 0xFFFFFFFFL);
// c = \u4321
char c = (char) (l & 0xFFFF);
// s = 0x4321
short s = (short) (l & 0xFFFF);
// b = 0x21
byte b = (byte) (l & 0xFF);
Primitive values (int, short, ...) are stored as binary values. An int uses more bits than a short. When you try to downcast, you're cutting away bits, which truncates (and potentially ruins) the value.
This is not a downcast (which refers to objects); it's a narrowing cast, or truncation. When you perform such a cast, you just copy the two least significant bytes of the int to your short. If the integer is smaller than 2^15, you'd only be discarding bytes containing zeroes, so it would just work.
This is not the case here, however. If you examine the binary representation of 131072 you'll see it's 100000000000000000. So, the two least significant bytes are clearly 0, which is exactly what you're getting.
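To see this concretely, here is a quick verification sketch you can run (illustrative only):
int i = 131072;                                 // 0x00020000
System.out.println(Integer.toBinaryString(i));  // 100000000000000000 (1 followed by 17 zeros)
short s = (short) i;                            // keeps only the lowest 16 bits, which are all zero
System.out.println(s);                          // 0
short t = (short) 131073;                       // low 16 bits are 0000000000000001
System.out.println(t);                          // 1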
Related
While using bit-shifting on byte values, I noticed I was getting weird results with the unsigned right shift (>>>). With int, both right shifts (signed >> and unsigned >>>) behave as expected:
int min1 = Integer.MIN_VALUE>>31; //min1 = -1
int min2 = Integer.MIN_VALUE>>>31; //min2 = 1
But when I do the same with byte, strange things happen with unsigned right shift:
byte b1 = Byte.MIN_VALUE; //b1 = -128
b1 >>= 7; //b1 = -1
byte b2 = Byte.MIN_VALUE; //b2 = -128
b2 >>>= 7; //b2 = -1; NOT 1!
b2 >>>= 8; //b2 = -1; NOT 0!
I figured that it could be that the compiler is converting the byte to int internally, but that alone does not seem sufficient to explain this behaviour.
Why is bit-shifting behaving that way with byte in Java?
This happens precisely because byte is promoted to int before bitwise operations are performed. The int -128 is represented as:
11111111 11111111 11111111 10000000
Thus, shifting right by 7 or 8 bits still leaves bit 7 set to 1, so the result is narrowed to a negative byte value.
Compare:
System.out.println((byte) (b >>> 7)); // -1
System.out.println((byte) ((b & 0xFF) >>> 7)); // 1
With b & 0xFF, all the high bits are cleared before the shift, so the result comes out as expected.
Shift operators for byte, short and char are always done on int.
Therefore, the value really being shifted is the int value -128, which looks like this
int b = 0b11111111_11111111_11111111_10000000;
When you do b2 >>>= 7; what you are really doing is shifting the above int value 7 places to the right (filling with zeroes from the left), then casting back to a byte by only considering the last 8 bits.
After shifting 7 places to the right we get
0b00000001_11111111_11111111_11111111;
When we convert this back to a byte, we keep just the last 8 bits, 11111111, which is -1, because the byte type is signed.
If you want to get the answer 1, you could shift 31 places, so that only the top bit of the promoted int remains.
byte b2 = Byte.MIN_VALUE; //b2 = -128
b2 >>>= 31;
System.out.println(b2); // 1
Refer to JLS 15.19 Shift Operators:
Unary numeric promotion (§5.6.1) is performed on each operand separately.
and in 5.6.1 Unary Numeric Promotion:
if the operand is of compile-time type byte, short, or char, it is promoted to a value of type int by a widening primitive conversion
So, your byte operands are promoted to int before shifting. The value -128 is 11111111 11111111 11111111 10000000.
After shifting 7 or 8 times, the lowest 8 bits are all 1s; when that result is assigned back to a byte, a narrowing primitive conversion occurs. Refer to JLS 5.1.3 Narrowing Primitive Conversion:
A narrowing conversion of a signed integer to an integral type T simply discards all but the n lowest order bits, where n is the number of bits used to represent type T.
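As a quick check of what the two quoted rules do in this case, here is a minimal sketch showing the promoted int, the unsigned shift, and the narrowing back to byte:
byte b = Byte.MIN_VALUE;                              // -128
int promoted = b;                                     // widening: 0xFFFFFF80
System.out.println(Integer.toBinaryString(promoted)); // 11111111111111111111111110000000
int shifted = promoted >>> 7;                         // 0x01FFFFFF
System.out.println(Integer.toBinaryString(shifted));  // 1111111111111111111111111
System.out.println((byte) shifted);                   // -1 (narrowing keeps the lowest 8 bits: 11111111)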
I am trying to construct an IP header.
An IP header has the following fields: Version, IHL, DSCP, etc. I would like to populate a byte array so that I can store the information in bytes.
Where I get confused, however, is that the Version field is only 4 bits wide, and IHL is also only 4 bits wide. How do I fit the values of both of those fields into a single byte? Do I need to do bit-shifting?
E.g. Version = 4, IHL = 5. I would need to create a byte that would equal 0100 0101 = 45h or 69 decimal.
(byte) ((4 << 4) | 5)
This shifts the value 4 four bits to the left, then sets the lower 4 bits to the value 5.
00000100 A value (4)
01000000 After shifting left 4 bits (<< 4)
00000101 Another value (5)
01000101 The result of a bitwise OR (|) of #2 and #3
Because the operands are int types (and even if they were byte values, they'd be promoted to int when operators like | act on them), the final result needs a cast to be stored in a byte.
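For example, a quick check of the resulting value (a small sketch; the variable name is illustrative):
byte header = (byte) ((4 << 4) | 5);                    // Version = 4, IHL = 5
System.out.println(Integer.toHexString(header & 0xFF)); // 45
System.out.println(header);                             // 69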
If you are using byte values as operands in any bitwise operations, the implicit conversion to int can cause unexpected results. If you want to treat a byte as if it were unsigned in that conversion, use a bitwise AND (&):
byte b = -128; // The byte value 0x80, -128d
int uint8 = b & 0xFF; // The int value 0x00000080, 128d
int i = b; // The int value 0xFFFFFF80, -128d
int uintr = (b & 0xFF) | 0x04; // 0x00000084
int sintr = b | 0x04; // 0xFFFFFF84
You can do something like this:
int a = 0x04;
a <<= 4;
a |= 0x05;
System.out.println(a);
which essentially turns 0b00000100 into 0b01000000, then into 0b01000101.
https://docs.oracle.com/javase/tutorial/java/nutsandbolts/op3.html
To make a compact field containing both Version and IHL in one byte, try doing
byte b = (byte)((Version << 4) + IHL);
This will only work if Version and IHL are numbers from 0 to 15.
The fact that a byte is 8 bits while your values need at most 4 is not a problem. The extra 4 bits will just always be zeroes.
So if you were storing 1 for example:
0000 0001
or 15 (which is the maximum value right?):
0000 1111
Shifting a byte and getting a byte back directly is not possible in Java; byte operands are promoted to int before the shift. See:
How does bitshifting work in Java?
However, as far as the logic is concerned, if you want the Version and IHL in one byte, you could do it using the following:
byte value = (byte) (IHL | VERSION << 4);
I have the following C code (from FFmpeg):
static inline av_const int sign_extend(int val, unsigned bits)
{
    unsigned shift = 8 * sizeof(int) - bits;
    union { unsigned u; int s; } v = { (unsigned) val << shift };
    return v.s >> shift;
}
I'm trying to reproduce this in Java, but I have difficulty understanding it. No matter how I toss the bits around, I don't get very close.
As for the val parameter: it takes an unsigned byte value as an int.
Bits parameter: 4
If the value is 255 and bits is 4, it returns -1. I can't reproduce this in Java. Sorry for such a fuzzy question, but can you help me understand this code?
The big picture is that I'm trying to encode EA ADPCM audio in Java. In FFmpeg:
https://gitorious.org/ffmpeg/ffmpeg/source/c60caa5769b89ab7dc4aa41a21f87d5ee177bd30:libavcodec/adpcm.c#L981
Strictly speaking, the result of running this code with this input is implementation-defined, because signed right shift in C is only fully defined in circumstances that this scenario does not meet. From the C99 standard:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has unsigned type or if E1 has signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and negative value, the resulting value is implementation-defined.
(Emphasis mine)
But let's assume that your implementation defines signed right shift to extend the sign, meaning that the space on the left will be filled with ones if the sign bit is set and zeroes otherwise; the ffmpeg code clearly expects this to be the case. The following is happening: shift has the value of 28 (assuming 32-bit integers). In binary notation:
00000000 00000000 00000000 11111111 = val
11110000 00000000 00000000 00000000 = (unsigned) val << shift
Note that when interpreting (unsigned) val << shift as a signed integer, as the code proceeds to do (assuming two's complement representation, as today's computers all use [1]), the sign bit of that integer is set, so a signed shift to the right fills up with ones from the left, and we get
11110000 00000000 00000000 00000000 = v.s
11111111 11111111 11111111 11111111 = v.s >> shift
...and in two's complement representation, that is -1.
In Java, this trick works the same way -- except better, because there the behavior is actually guaranteed. Simply:
public static int sign_extend(int val, int bits) {
    int shift = 32 - bits; // int always has 32 bits in Java
    int s = val << shift;
    return s >> shift;
}
Or, if you prefer:
public static int sign_extend(int val, int bits) {
    int shift = 32 - bits;
    return val << shift >> shift;
}
[1] Strictly speaking, this conversion does not have a well-defined value in the C standard either, for historical reasons. There used to be computers that used different representations, and the same bit pattern with a set sign bit has a completely different meaning in (for example) signed magnitude representation.
The reason the code looks so odd is that the C language is full of undefined behaviours that are well-defined in Java. For example, in C, bit-shifting a signed integer left so that the sign bit changes is undefined behaviour, and at that point the program can do anything - crash, print 42, make true equal false - and the compiler has still compiled it correctly.
Now the code uses a trick to shift the integer left: it uses a union that lays the bytes of its members on top of each other, making an unsigned and a signed integer occupy the same bytes. The left shift is well-defined for the unsigned integer, so the code shifts through the unsigned member, then shifts back using the signed member. (The code assumes that a right shift of a negative number produces a properly sign-extended negative result, which is also not guaranteed by the standard, but libraries like this usually have a configuration step that refuses to build on such an esoteric platform. Likewise, the program assumes that CHAR_BIT is 8, whereas C only guarantees that a char is at least 8 bits wide.)
In Java, you do not need anything like a union to accomplish this; instead you do:
static int signExtend(int val, int bits) {
    int shift = 32 - bits; // fixed size
    int v = val << shift;
    return v >> shift;
}
In Java the width of an int is always 32 bits; << can be used for both signed and unsigned shifts, and there is no undefined behaviour when shifting into the sign bit; >> is the signed shift (>>> would be the unsigned one).
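Using the question's example values with the signExtend method above (a quick sanity check):
System.out.println(signExtend(255, 4));  // -1, matching the C code
System.out.println(signExtend(7, 4));    // 7 (the 4-bit field's sign bit is 0)
System.out.println(signExtend(0xEA, 8)); // -22, same as (byte) 0xEA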
Given this code:
static inline av_const int sign_extend(int val, unsigned bits)
{
    unsigned shift = 8 * sizeof(int) - bits;
    union { unsigned u; int s; } v = { (unsigned) val << shift };
    return v.s >> shift;
}
The 'static' modifier says the function is not visible outside the current file.
The 'inline' modifier is a 'request' to the compiler to place the code inline wherever the function is called, rather than having a separate function with the associated call/return sequences.
'sign_extend' is the name of the function.
In C, a right shift of a signed value will (on typical implementations) propagate the sign bit.
In C, a right shift of an unsigned value will zero-fill.
It looks like your Java code is doing the zero fill.
regarding this line:
unsigned shift = 8 * sizeof(int) - bits;
on a 32-bit machine, an int is 32 bits and sizeof(int) is 4,
so the variable 'shift' will contain (8*4) - bits, i.e. 32 - bits
regarding this line:
union { unsigned u; int s; } v = { (unsigned) val << shift };
a left shift of an unsigned value shifts the bits left,
with the upper bits being dropped (into the bit bucket)
and the lower bits being zero-filled.
regarding this line:
return v.s >> shift;
this shifts the bits back to their original position,
while propagating the (new) sign bit, which is what performs the sign extension.
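In Java terms, the intermediate values for the question's example (val = 255, bits = 4) would look like this (just an illustrative trace):
int val = 255;
int shift = 32 - 4;                            // 28
int up = val << shift;                         // only the low 4 bits of val survive, at the top
System.out.println(Integer.toHexString(up));   // f0000000
int down = up >> shift;                        // signed shift propagates the (new) sign bit
System.out.println(Integer.toHexString(down)); // ffffffff
System.out.println(down);                      // -1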
int i = 234;
byte b = (byte) i;
System.out.println(b); // -22
int i2 = b & 0xFF;
System.out.println(i2); // 234
I was looking at this code and was confused about how the values are stored. The first int is stored as 32 bits (4 bytes). b keeps only the low 8 bits and stores them as a signed value. Does i2 store it as an 8-bit unsigned representation, or does it convert it back to 32 bits?
Java does not have unsigned primitive types. All byte variables are signed 8-bit values.
Whether or not the most significant bit is interpreted as a sign bit, when you do bit-wise operations all the bits that are present are used by the operator. Note that b is promoted to int (with sign extension) before the &, so masking with the int literal 0xFF is what throws away the extended sign bits and keeps only the low 8:
i2 = b & 0xFF;        // 234
i2 = b & 0x000000FF;  // the same thing written out in full
i2 is declared as an int, so it is a full 32-bit value. It will contain the bitwise AND of b (promoted to int) and 0xFF, which in this case leaves just the lower 8 bits.
Things aren't "stored as" an unsigned or signed representation. This is a nitpicky distinction, but it seems to be where you're getting confused. The only difference is how the bits are interpreted. What's happening here is the following:
i stores the value 11101010 (plus some number of higher order bytes). Under Java's convention for integer storage format, this represents the value 234.
b stores the value 11101010, since ints are converted to bytes by simply truncating them. Under Java's convention for numerical byte storage format, this represents the value -22.
i2 stores the value 11101010, with all higher-order bytes zero, because a bitwise operation was applied to b and the & 0xFF mask cleared the sign-extended high bits. (Had you written int i2 = b or int i2 = b + 0, the byte would have been converted to its numerical value -22, and then stored as the int representation of -22.)
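A quick way to see all three interpretations side by side (just a verification sketch):
int i = 234;
byte b = (byte) i;
int i2 = b & 0xFF;
System.out.println(Integer.toBinaryString(i));        // 11101010
System.out.println(Integer.toBinaryString(b & 0xFF)); // 11101010 (same bit pattern)
System.out.println(b);   // -22 (bits read as a signed byte)
System.out.println(i2);  // 234 (higher-order bits cleared by the mask)
int i3 = b;              // plain widening sign-extends instead
System.out.println(i3);  // -22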
I've been reading the book TCP/IP Sockets in Java, 2nd Edition. I was hoping to get more clarity on something, but since the book's website doesn't have a forum or anything, I thought I'd ask here.
In several places, the book uses a byte mask to avoid sign extension. Here's an example:
private final static int BYTEMASK = 0xFF; // 8 bits
public static long decodeIntBigEndian(byte[] val, int offset, int size) {
    long rtn = 0;
    for (int i = 0; i < size; i++) {
        rtn = (rtn << Byte.SIZE) | ((long) val[offset + i] & BYTEMASK);
    }
    return rtn;
}
So here's my guess of what's going on. Let me know if I'm right.
BYTEMASK in binary should look like 00000000 00000000 00000000 11111111.
To make things easy, let's just say the val byte array only contains one short, so the offset is 0. So let's set the byte array to val[0] = 11111111, val[1] = 00001111. At i = 0, rtn is all 0's, so rtn << Byte.SIZE just keeps the value the same. Then there's (long) val[0], making it 8 bytes with all 1's due to sign extension. But when you use & BYTEMASK, it sets all those extra 1's to 0's, leaving the last byte all 1's. Then rtn | val[0] basically turns on any 1's in the last byte of rtn. For i = 1, (rtn << Byte.SIZE) pushes that byte up by one and leaves all 0's in the least-significant byte. Then (long) val[1] makes a long with all 0's plus 00001111 in the least-significant byte, which is what we want, so using & BYTEMASK doesn't change it. Then rtn | val[1] sets the low 4 bits of rtn's least-significant byte. The final return value is now rtn = 00000000 00000000 00000000 00000000 00000000 00000000 11111111 00001111.
So, I hope this wasn't too long and that it was understandable. I just want to know if the way I'm thinking about this is correct, and not just completely wacked-out logic. Also, one thing that confuses me is that BYTEMASK is 0xFF. In binary, this would be 11111111, so if it's being implicitly cast to an int, wouldn't it actually be 11111111 11111111 11111111 11111111 due to sign extension? If that's the case, then it doesn't make sense to me how BYTEMASK would even work. Thank you for reading.
Everything is right except for the last point:
0xFF is already an int (0x000000FF), so it won't be sign-extended. In general, integer literals in Java are ints unless they end with an L or l, in which case they are longs.
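A small illustration of that difference (a sketch, not from the book):
int mask = 0xFF;              // an int literal: 0x000000FF, i.e. 255
byte b = (byte) 0xFF;         // the same bit pattern stored in a byte: -1
System.out.println(mask);     // 255
System.out.println(b);        // -1
System.out.println(b & mask); // 255 - the mask undoes the sign extension of the promoted byte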