I have an int and a short like this:
int a = //...
short b = //..
What is the fastest way to craft an int c with the following bit representation?
The 2nd and 3rd most significant bytes of c consist of the byte representation of b.
The rest of the bytes of a are left unchanged.
Maybe bitwise OR will help here, but I still don't see how.
For example:
a = 01010101 01010101 01010101 01010101
b = 11111111 11111111
Then we have
c = 01010101 11111111 11111111 01010101
Remove what used to be in those bytes, then put in b:
c = (a & 0xFF0000FF) | ((b << 8) & 0x00FFFF00);
The extra & after the shift is to counteract the sign-extension, which would otherwise overwrite the top byte with 1's whenever b is negative.
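Here is a quick, self-contained check of that formula with the example values above (the class name is just for illustration):
public class SpliceDemo {
    public static void main(String[] args) {
        int a = 0b01010101_01010101_01010101_01010101;
        short b = (short) 0b11111111_11111111; // -1 as a short
        // keep a's top and bottom bytes, splice b into the middle two
        int c = (a & 0xFF0000FF) | ((b << 8) & 0x00FFFF00);
        System.out.println(Integer.toBinaryString(c));
        // prints 1010101111111111111111101010101, i.e.
        // 01010101 11111111 11111111 01010101 with the leading 0 dropped
    }
}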
I don't understand why there's a difference between this code:
byte b = (byte) (0xff >> 1);
(so now b = 01111111),
and this code:
byte b = (byte) 0xff;
b >>= 1;
(but now b = 11111111).
Thanks in advance for your help!
In the first code, (0xff >> 1) is 255 >> 1, which is 127. That is calculated with ints and then you cast it to a byte. 127 as a byte is 01111111 bin.
In the second code, you start with (byte) 0xff, which is 11111111 bin, which is the two's complement representation of -1 in 8 bits. So (byte) 0xff is -1.
When you perform shifting, the byte value -1 is promoted to the int value -1. That's 11111111 11111111 11111111 11111111 bin.
Shifting it right one place with the arithmetic right shift operator, (-1) >> 1 gives you 11111111 11111111 11111111 11111111 again, because the >> operator on a negative number moves the bits to the right and fills in the left with ones instead of zeroes.
Then, since you're using >>=, the result is cast back to a byte to be stored in b. That only retains the last 8 bits, which are 11111111.
Alternatively, if you used the logical right shift operator, (-1) >>> 1 would give you 01111111 11111111 11111111 11111111 in binary (a zero followed by 31 ones). Since the last 8 bits are the same, this would still give you 11111111 when it is cast back to a byte.
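Both cases are easy to reproduce in a short test:
byte b1 = (byte) (0xff >> 1); // shift happens on the int 255, then the cast
byte b2 = (byte) 0xff;        // the byte 11111111, i.e. -1
b2 >>= 1;                     // promoted to int -1; arithmetic shift keeps it -1
System.out.println(b1);       // 127
System.out.println(b2);       // -1
System.out.println((byte) (-1 >>> 1)); // still -1: the low 8 bits of 0111...1 are all 1's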
I'm implementing variable length encoding and reading the Wikipedia article about it. Here is what I found:
0x00000080 → 0x81 0x00
It means the int 0x80 is encoded as the 2 bytes 0x81 0x00. That is what I cannot understand. Okay, following the algorithm listed there, we have:
Binary 0x80: 00000000 00000000 00000000 10000000
We move the high bit to the next octet and set it to 1 (indicating that we have more octets):
00000000 00000000 00000001 10000000, which is not equal to 0x81 0x00. I tried to write a program for that:
byte[] ba = new byte[]{(byte) 0x81, (byte) 0x00};
int first = (ba[0] & 0xFF) & 0x7F;
int second = ((ba[1] & 0xFF) & 0x7F) << 7;
int result = first | second;
System.out.println(result); //prints 1, not 0x80
What did I miss?
Let's review the algorithm from the Wikipedia page:
Take the binary representation of the integer
Split it into groups of 7 bits; the group holding the most significant bits may have fewer than 7
Take these seven bits as a byte, setting the MSB (most significant bit) to 1 for all but the last; leave it 0 for the last one
We can implement the algorithm like this:
public static byte[] variableLengthInteger(int input) {
    // first find out how many bytes we need to represent the integer
    int numBytes = ((32 - Integer.numberOfLeadingZeros(input)) + 6) / 7;
    // if the integer is 0, we still need 1 byte
    numBytes = numBytes > 0 ? numBytes : 1;
    byte[] output = new byte[numBytes];
    // for each byte of output ...
    for (int i = 0; i < numBytes; i++) {
        // ... take the least significant 7 bits of input and set the MSB to 1 ...
        output[i] = (byte) ((input & 0b1111111) | 0b10000000);
        // ... shift the input right by 7 places, discarding the 7 bits we just used
        input >>= 7;
    }
    // finally reset the MSB on the last byte (output[0] holds the least significant
    // group, so the array is the reverse of the wire order shown on Wikipedia)
    output[0] &= 0b01111111;
    return output;
}
You can see it working for the examples from the Wikipedia page here; you can also plug in your own values and try it online.
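For example, printing the encoding of 0x80 (remembering that output[0] is the least significant group, so the array is read back-to-front to get the wire order):
byte[] enc = variableLengthInteger(0x80);
for (int i = enc.length - 1; i >= 0; i--) {
    System.out.printf("0x%02X ", enc[i]); // prints 0x81 0x00
}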
Other variable length encodings of integers exist and are widely used. For example, ASN.1 from 1984 defines the "length" field as:
The encoding of length can take two forms: short or long. The short
form is a single byte, between 0 and 127.
The long form is at least two bytes long, and has bit 8 of the first
byte set to 1. Bits 7-1 of the first byte indicate how many more bytes
are in the length field itself. Then the remaining bytes specify the
length itself, as a multi-byte integer.
This encoding is used, for example, in the DLMS COSEM protocol or in HTTPS certificates. For simple code, you can have a look at an ASN.1 Java library.
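As a rough sketch of the rules quoted above (not a full DER parser; the method name and the buf/offset parameters are just for illustration), decoding such a length field could look like this:
static long asn1Length(byte[] buf, int offset) {
    int first = buf[offset] & 0xFF;
    if ((first & 0x80) == 0) {
        return first; // short form: the byte itself is the length (0..127)
    }
    int numLengthBytes = first & 0x7F; // long form: bits 7-1 count the length bytes
    long length = 0;
    for (int i = 1; i <= numLengthBytes; i++) {
        length = (length << 8) | (buf[offset + i] & 0xFF); // big-endian multi-byte integer
    }
    return length;
}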
I am trying to construct an IP header.
An IP header has the following fields: Version, IHL, DSCP etc. I would like to populate a Byte Array such that I can store the information in bytes.
Where I get confused however is that the Version field is only 4 bits wide. IHL is also only 4 bits wide. How do I fit the values of both of those fields to be represented as a byte? Do I need to do bitshifting?
E.g. Version = 4, IHL = 5. I would need to create a byte that would equal 0100 0101 = 45h or 69 decimal.
(byte) ((4 << 4) | 5)
This shifts the value 4 to the left by 4 bits, then sets the lower 4 bits to the value 5.
00000100 A value (4)
01000000 After shifting left 4 bits (<< 4)
00000101 Another value (5)
01000101 The result of a bitwise OR (|) of #2 and #3
Because the operands are int types (and even if they were byte values, they'd be promoted to int when operators like | act on them), the final result needs a cast to be stored in a byte.
If you are using byte values as operands in any bitwise operations, the implicit conversion to int can cause unexpected results. If you want to treat a byte as if it were unsigned in that conversion, use a bitwise AND (&):
byte b = -128; // The byte value 0x80, -128d
int uint8 = b & 0xFF; // The int value 0x00000080, 128d
int i = b; // The int value 0xFFFFFF80, -128d
int uintr = (b & 0xFF) | 0x04; // 0x00000084
int sintr = b | 0x04; // 0xFFFFFF84
You can do something like this:
int a = 0x04;
a <<= 4;
a |= 0x05;
System.out.println(a);
which essentially turns 0b00000100 into 0b01000000, then into 0b01000101.
https://docs.oracle.com/javase/tutorial/java/nutsandbolts/op3.html
To make a compact field containing both Version and IHL in one byte, try doing
byte b = (byte)((Version << 4) + IHL);
This will only work if Version and IHL are numbers from 0 to 15.
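If you can't guarantee that range, a minimal safeguard (illustrative only) is to mask each field to its nibble first, so stray high bits can't corrupt the other half:
byte b = (byte) (((Version & 0x0F) << 4) | (IHL & 0x0F));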
Just because a byte is 8 bits and your values only need a maximum of 4 is not a problem. The extra 4 bits will just always be zeroes.
So if you were storing 1 for example:
0000 0001
or 15 (which is the maximum value right?):
0000 1111
Shifting a byte directly is not possible in Java; the operands of a shift are promoted to int first. See:
How does bitshifting work in Java?
However, as far as the logic is concerned, if you want the Version and IHL in one byte, you could do it using the following:
byte value = (byte) (IHL | VERSION << 4);
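Going the other way, you can unpack the two nibbles again like this:
int version = (value >> 4) & 0x0F; // upper 4 bits
int ihl = value & 0x0F;            // lower 4 bits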
I'm trying to decode somebody's byte array and I'm stuck at this part:
<state> ::= "01" (2 bits) for A
            "10" (2 bits) for B
            "11" (2 bits) for C
I think this wants me to look at the next 2 bits of the next byte. Would that mean the least or most significant bits of the byte? I suppose I would just throw away the last 6 bits if it means the least significant?
I found this code for looking at the bits of a byte:
for (int i = 0; i < byteArray.Length; i++)
{
    byte b = byteArray[i];
    byte mask = 0x01;
    for (int j = 0; j < 8; j++)
    {
        bool value = (b & mask) != 0; // true if bit j of b is set
        mask <<= 1;                   // move the mask one bit to the left
    }
}
Can someone expand on what this does exactly?
Just to give you a start:
To extract individual bits of a byte, you use &, called the bitwise AND operator. The bitwise AND operation means "preserve all bits which are set on both sides". For example, the bitwise AND of the two bytes 00000011 & 00000010 is 00000010, because only the bit in the second-to-last position is set on both sides.
In the Java programming language, the very same example looks like this:
int a = 3;
int b = 2;
int bitwiseAndResult = a & b; // bitwiseAndResult will be equal to 2 after this
Now to examine if the n'th bit of some int is set, you can do this:
int intToExamine = ...;
if (((intToExamine >> n) & 1) != 0) {
// here we know that the n'th bit was set
}
The >> is called the bitshift operator. It simply shifts the bits to the right, like this: 00011010 >> 2 has the result 00000110.
So from the above you can see that for extracting the n'th bit of some value, you first shift the n'th bit to position 0 (note that the first bit is bit 0, not bit 1), and then you use the bitwise and operator (&) to only keep that bit 0.
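Putting the answer's shift-and-mask approach into the same loop shape as the code you found, a Java version could look like this (the example byte is arbitrary):
byte[] byteArray = { (byte) 0b10110010 };
for (byte b : byteArray) {
    for (int j = 0; j < 8; j++) {
        int bit = (b >> j) & 1; // bit 0 is the least significant bit
        System.out.println("bit " + j + " = " + bit);
    }
}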
Here are some simple examples of bitwise and bit shift operators:
http://www.tutorialspoint.com/java/java_bitwise_operators_examples.htm
I've been reading the book TCP/IP Sockets in Java, 2nd Edition. I was hoping to get more clarity on something, but since the book's website doesn't have a forum or anything, I thought I'd ask here.
In several places, the book uses a byte mask to avoid sign extension. Here's an example:
private final static int BYTEMASK = 0xFF; //8 bits
public static long decodeIntBigEndian(byte[] val, int offset, int size) {
long rtn = 0;
for(int i = 0; i < size; i++) {
rtn = (rtn << Byte.SIZE) | ((long) val[offset + i] & BYTEMASK);
}
return rtn;
}
So here's my guess of what's going on. Let me know if I'm right.
BYTEMASK in binary should look like 00000000 00000000 00000000 11111111.
To make things easy, let's just say the val byte array only contains 1 short, so the offset is 0. So let's set the byte array to val[0] = 11111111, val[1] = 00001111.

At i = 0, rtn is all 0's, so rtn << Byte.SIZE just keeps the value the same. Then there's (long) val[0], making it 8 bytes with all 1's due to sign extension. But when you use & BYTEMASK, it sets all those extra 1's to 0's, leaving that last byte all 1's. Then you get rtn | val[0], which turns on any 1's in the last byte of rtn.

For i = 1, (rtn << Byte.SIZE) pushes the least significant byte over and leaves all 0's in its place. Then (long) val[1] makes a long with all zeros plus 00001111 for the least significant byte, which is what we want, so using & BYTEMASK doesn't change it. Then when rtn | val[1] is used, it sets rtn's least significant byte to 00001111. The final return value is now rtn = 00000000 00000000 00000000 00000000 00000000 00000000 11111111 00001111.
So, I hope this wasn't too long, and that it was understandable. I just want to know if the way I'm thinking about this is correct, and not just completely wacked-out logic. Also, one thing that confuses me is that BYTEMASK is 0xFF. In binary, this would be 11111111, so if it's being implicitly cast to an int, wouldn't it actually be 11111111 11111111 11111111 11111111 due to sign extension? If that's the case, then it doesn't make sense to me how BYTEMASK would even work. Thank you for reading.
Everything is right except for the last point:
0xFF is already an int (0x000000FF), so it won't be sign-extended. In general, integer literals in Java are ints unless they end with an L or l, in which case they are longs.
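You can confirm both points with a quick test, reusing decodeIntBigEndian and the example bytes from the question:
byte[] val = { (byte) 0xFF, 0x0F };
System.out.println(Integer.toHexString(0xFF)); // "ff" - the literal is already a non-negative int
System.out.println(Long.toHexString(decodeIntBigEndian(val, 0, 2))); // "ff0f" = 65295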