Variable length-encoding of int to 2 bytes - java

I'm implementing variable-length encoding and reading Wikipedia about it. Here is what I found:
0x00000080 0x81 0x00
This means the int 0x80 is encoded as the 2 bytes 0x81 0x00. That is what I cannot understand. Following the algorithm listed there, we have:
Binary 0x80: 00000000 00000000 00000000 10000000
We move the sign bit to the next octet and set it to 1 (indicating that we have more octets):
00000000 00000000 00000001 10000000, which is not equal to 0x81 0x00. I tried to write a program for that:
byte[] ba = new byte[]{(byte) 0x81, (byte) 0x00};
int first = (ba[0] & 0xFF) & 0x7F;
int second = ((ba[1] & 0xFF) & 0x7F) << 7;
int result = first | second;
System.out.println(result); //prints 1, not 0x80
What did I miss?

Let's review the algorithm from the Wikipedia page:
Take the binary representation of the integer
Split it into groups of 7 bits, starting from the least significant bit; the most significant group may have fewer than 7 bits
Take these seven bits as a byte, setting the MSB (most significant bit) to 1 for all but the last; leave it 0 for the last one
We can implement the algorithm like this:
public static byte[] variableLengthInteger(int input) {
    // first find out how many bytes we need to represent the integer
    int numBytes = ((32 - Integer.numberOfLeadingZeros(input)) + 6) / 7;
    // if the integer is 0, we still need 1 byte
    numBytes = numBytes > 0 ? numBytes : 1;
    byte[] output = new byte[numBytes];
    // fill the output from the last byte to the first, so the most significant group ends up first ...
    for (int i = numBytes - 1; i >= 0; i--) {
        // ... take the least significant 7 bits of input and set the MSB to 1 ...
        output[i] = (byte) ((input & 0b1111111) | 0b10000000);
        // ... shift the input right by 7 places (unsigned), discarding the 7 bits we just used
        input >>>= 7;
    }
    // finally reset the MSB on the last byte
    output[numBytes - 1] &= 0b01111111;
    return output;
}
You can see it working for the examples from the Wikipedia page here, you can also plug in your own values and try it online.
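As for the decoding attempt in the question: it shifts the second byte by 7 bits, but in this encoding the first byte holds the most significant group. A minimal decoding sketch, assuming the big-endian byte order of the Wikipedia example (0x81 0x00); the class and method names are made up for illustration:

```java
class VlqDecodeSketch {
    // Decodes a variable-length quantity whose first byte is the most significant group.
    // Each byte contributes its low 7 bits; a set MSB means another byte follows.
    static int decode(byte[] bytes) {
        int result = 0;
        for (byte b : bytes) {
            result = (result << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                break; // MSB clear: this was the last byte
            }
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] encoded = {(byte) 0x81, (byte) 0x00};
        System.out.println(decode(encoded)); // prints 128
    }
}
```

The sketch does no overflow checking, so it is only meant for values that fit into an int.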

Other variable-length encodings of integers exist and are widely used. For example, ASN.1 (from 1984) defines the "length" field as:
The encoding of length can take two forms: short or long. The short
form is a single byte, between 0 and 127.
The long form is at least two bytes long, and has bit 8 of the first
byte set to 1. Bits 7-1 of the first byte indicate how many more bytes
are in the length field itself. Then the remaining bytes specify the
length itself, as a multi-byte integer.
This encoding is used, for example, in the DLMS/COSEM protocol and in HTTPS certificates. For simple code, you can have a look at an ASN.1 Java library.
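The quoted rule can be sketched in a few lines. This is only an illustration, not taken from any particular ASN.1 library: the class and method names are hypothetical, and the long form here always uses the fewest bytes possible.

```java
class Asn1LengthSketch {
    // Encodes a non-negative length in the short or long form described above.
    static byte[] encodeLength(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("length must not be negative");
        }
        if (length < 128) {
            return new byte[] {(byte) length}; // short form: a single byte, 0..127
        }
        // long form: count how many bytes the value itself needs
        int numBytes = (32 - Integer.numberOfLeadingZeros(length) + 7) / 8;
        byte[] out = new byte[1 + numBytes];
        out[0] = (byte) (0x80 | numBytes); // bit 8 set, bits 7-1 hold the byte count
        for (int i = numBytes; i >= 1; i--) {
            out[i] = (byte) length; // most significant byte ends up first
            length >>>= 8;
        }
        return out;
    }

    public static void main(String[] args) {
        for (byte b : encodeLength(300)) {
            System.out.printf("%02X ", b); // prints 82 01 2C
        }
        System.out.println();
    }
}
```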

Related

Unexpected result after addition

I'm coding a personal project in Java right now and have recently been using bit operations for the first time. I was trying to convert two bytes into a short, with one byte being the upper 8 bits and the other being the lower 8 bits.
I ran into an error when running the first line of code below.
Incorrect Results
short regPair = (short) ( (byte1 << 8) + (byte2) );
Correct Results
short regPair = (short) ( (byte1 << 8) + (byte2 & 0xFF) );
The expected results were: AAAAAAAABBBBBBBB, where A represents bits from byte1 and B represents bits from byte2.
Using the 1st line of code I would get the typical addition between a bit-shifted byte1 with byte 2 added to it.
Example of incorrect results
byte1 = 11, byte2 = -72
result = 2816 -72
= 2744
When using the line of code which produces the expected results I can get the proper answer of 3000. I am curious as to why the bit-masking is needed for byte2. My thoughts are that it converts byte2 into binary before the addition and then performs binary addition with both bytes.
In the incorrect case, byte2 is promoted to an int because of the + operator. This doesn't just mean adding some zeros to the start of the binary representation of byte2. Since Java's integer types are signed two's complement values, the promotion sign-extends: byte2 is negative, so the upper 24 bits are filled with 1s. After the promotion, byte2 becomes:
1111 1111 1111 1111 1111 1111 1011 1000
By doing & 0xFF, you force the promotion to int first, then you keep the least significant 8 bits: 1011 1000 and make everything else 0.
Print the intermediate value directly to see what is going on. Like,
System.out.printf("%d %s%n", ((byte) -72) & 0xFF, Integer.toBinaryString(((byte) -72) & 0xFF));
I get
184 10111000
So the correct code is actually adding 184 (not subtracting 72).
So I totally forgot that byte is signed in Java; therefore, when performing math with a variable of this data type, it takes the signed interpretation and not the direct value of the bits. By performing byte2 & 0xFF, Java converts the signed byte value into a non-negative int with all but the lowest 8 bits set to 0. Therefore you can perform binary addition correctly.
signed byte value 0b11111111 = -1
unsigned byte value 0b11111111 = 255
In both cases byte values are promoted to int in the expression when
it is evaluated.
byte byte1 = 11, byte2 = -72;
short regPair = (short) ( (byte1 << 8) + (byte2) );
(2816) + (-72) = 2744
And even in below expression byte is promoted to int
short regPair = (short) ( (byte1 << 8) + (byte2 & 0xFF) );
2816 + 184 = 3000
In this expression there is no concatenation of two bytes as described in the question (AAAAAAAABBBBBBBB, where A represents bits from byte1 and B represents bits from byte2).
Actually, -72 & 255 gives 184, which is added to 2816 to give the output 3000.
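To see both variants side by side, here is a small sketch using the values from the question (byte1 = 11, byte2 = -72). The class and method names are just for illustration:

```java
class CombineBytesSketch {
    // Masks the low byte so sign extension cannot leak into the upper 8 bits.
    static short combine(byte high, byte low) {
        return (short) ((high << 8) | (low & 0xFF));
    }

    // Same as the first line of code in the question: low is sign-extended before the addition.
    static short combineUnmasked(byte high, byte low) {
        return (short) ((high << 8) + low);
    }

    public static void main(String[] args) {
        byte byte1 = 11;
        byte byte2 = -72;
        System.out.println(combineUnmasked(byte1, byte2)); // prints 2744
        System.out.println(combine(byte1, byte2));         // prints 3000
    }
}
```

Once the low byte is masked, | and + give the same result, because the two operands no longer share any set bits.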

Java integer/double to unsigned byte

Now I understand java doesn't have unsigned bytes, but I'm not sure how to solve this if not.
I'm trying to implement SHA256 hashing in Java, and I'm in the process of converting the message to 512-bit blocks.
int l = bytes.length; //total amount of bytes in the original message
int k = 0;
while((l+1+k) % 512 != 448) {
k++;
}
//k is the total amount of 0's to be padded
int rest = k % 8; //get the amount of 0's to be added in the byte with the 1
byte tmp =(byte) Math.pow(2, rest);
So the key instruction is the last row, if rest = 7 the resulting int is 128, but the bytes are signed in java and so the byte becomes 0x80 instead of 0xF0.
How can I achieve this in Java?
If anyone has an idea on how to implement this part, please let me know.
Starting from the assumption that your message consists of bytes, the padding always works out as a multiple of 8 bits, i.e. whole bytes. This ensures the most significant pad bit is always located in bit 7 of the first padding byte following the message; thus the padding, if any, always starts with 0x80, followed by as many 0x00 bytes as needed.
This can be implemented in a very simple manner:
// requires: import java.util.Arrays;
public static byte[] padMsg(byte[] rawMsg) {
    int rawLen = rawMsg.length;
    int padLen = (64 - (rawLen & 0x3F)) & 0x3F;
    if (padLen == 0)
        return rawMsg;
    // all extra bytes in padded msg are zeros.
    byte[] paddedMsg = Arrays.copyOf(rawMsg, rawLen + padLen);
    // ensure topmost pad bit is a one
    paddedMsg[rawLen] = (byte) 0x80;
    return paddedMsg;
}
This takes the message length and gets the remainder. The remainder of a power of two (in this case 64), is most effectively gotten by simply and-masking with (power - 1), and this is where the 0x3F in the code comes from (= 64 - 1).
The remainder is taken again after calculating (64 - remainder) as the preliminary padding length, to catch the special case where the remainder is 0, which would otherwise lead to a wrong padding length of 64 bytes (it should be 0 bytes of padding).
Once the padding length in bytes is known, the case padding = 0 is caught. In any other case the message length is increased (with 0x00 bytes, Arrays.copyOf does this automatically). Then the first padding byte is replaced with 0x80 and the padded message that is now guaranteed to be a multiple of 64 bytes long is returned.
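For comparison, the padding defined in the SHA-256 specification (FIPS 180-4) goes one step further than the method above: the 0x80 byte is always appended, even if the message is already a multiple of 64 bytes long, and the last 8 bytes of the padded message hold the original length in bits as a big-endian 64-bit integer. A minimal sketch of that variant, assuming a byte-aligned message; the class and method names are hypothetical:

```java
import java.util.Arrays;

class Sha256PadSketch {
    // Pads a byte-aligned message: 0x80, then zeros, then the 64-bit big-endian bit length.
    static byte[] pad(byte[] msg) {
        long bitLen = (long) msg.length * 8;
        // room for the 0x80 byte and the 8 length bytes, rounded up to a multiple of 64
        int paddedLen = ((msg.length + 1 + 8 + 63) / 64) * 64;
        byte[] out = Arrays.copyOf(msg, paddedLen); // new bytes are 0x00
        out[msg.length] = (byte) 0x80;
        for (int i = 0; i < 8; i++) {
            out[paddedLen - 1 - i] = (byte) (bitLen >>> (8 * i));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] padded = pad(new byte[] {'a', 'b', 'c'});
        System.out.println(padded.length);       // prints 64
        System.out.println(padded[3] & 0xFF);    // prints 128 (the 0x80 marker)
        System.out.println(padded[63] & 0xFF);   // prints 24 (3 bytes = 24 bits)
    }
}
```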

How to use bitshifting in Java

I am trying to construct an IP header.
An IP header has the following fields: Version, IHL, DSCP etc. I would like to populate a Byte Array such that I can store the information in bytes.
Where I get confused however is that the Version field is only 4 bits wide. IHL is also only 4 bits wide. How do I fit the values of both of those fields to be represented as a byte? Do I need to do bitshifting?
E.g. Version = 4, IHL = 5. I would need to create a byte that would equal 0100 0101 = 45h or 69 decimal.
(byte) ((4 << 4) | 5)
This shifts the value 4 to the left, then sets the lower 4 bits to the value 5.
1. 00000100 A value (4)
2. 01000000 After shifting left 4 bits (<< 4)
3. 00000101 Another value (5)
4. 01000101 The result of a bitwise OR (|) of #2 and #3
Because the operands are int types (and even if they were byte values, they'd be promoted to int when operators like | act on them), the final result needs a cast to be stored in a byte.
If you are using byte values as operands in any bitwise operations, the implicit conversion to int can cause unexpected results. If you want to treat a byte as if it were unsigned in that conversion, use a bitwise AND (&):
byte b = -128; // The byte value 0x80, -128d
int uint8 = b & 0xFF; // The int value 0x00000080, 128d
int i = b; // The int value 0xFFFFFF80, -128d
int uintr = (b & 0xFF) | 0x04; // 0x00000084
int sintr = b | 0x04; // 0xFFFFFF84
You can do something like this:
int a = 0x04;
a <<= 4;
a |= 0x05;
System.out.println(a);
which essentially turns 0b00000100 into 0b01000000, then into 0b01000101.
https://docs.oracle.com/javase/tutorial/java/nutsandbolts/op3.html
To make a compact field containing both Version and IHL in one byte, try doing
byte b = (byte)((Version << 4) + IHL);
This will only work if Version and IHL are numbers from 0 to 15
Just because a byte is 8 bits and your values can only be a maximum of 4 is not a problem. The extra 4 bits will just always be zeroes.
So if you were storing 1 for example:
0000 0001
or 15 (which is the maximum value right?):
0000 1111
Shifting a byte directly is not possible in Java: the shift operators promote a byte operand to int before shifting.
How does bitshifting work in Java?
However, as far as the logic is concerned, if you want the version and IHL in one byte, you could do it using the following
byte value = (byte) (IHL | VERSION << 4);
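Putting the pieces from the answers above together, here is a small sketch that packs Version and IHL into one byte and reads them back. The helper names are made up for this example, and the & 0x0F masks simply discard anything outside the 4-bit range:

```java
class IpFirstByteSketch {
    // Packs two 4-bit values into one byte: version in the high nibble, IHL in the low nibble.
    static byte pack(int version, int ihl) {
        return (byte) (((version & 0x0F) << 4) | (ihl & 0x0F));
    }

    // Reads the high nibble; the & 0xFF undoes the sign extension of the byte.
    static int version(byte b) {
        return (b & 0xFF) >>> 4;
    }

    // Reads the low nibble.
    static int ihl(byte b) {
        return b & 0x0F;
    }

    public static void main(String[] args) {
        byte first = pack(4, 5);
        System.out.println(first);          // prints 69 (0x45)
        System.out.println(version(first)); // prints 4
        System.out.println(ihl(first));     // prints 5
    }
}
```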

How to convert negative byte value to either short or integer?

We have a file which contains byte array in particular format like header and then followed by data. This file is generated by another java program and then we are reading that same file through another java program in the same format in which it was written.
The program which is making that file, populates lengthOfKey as byte which is right as that's what we need as shown below.
for (Map.Entry<byte[], byte[]> entry : holder.entrySet()) {
byte typeKey = 0;
// getting the key length as byte (that's what we need to do)
byte lengthOfKey = (byte) entry.getKey().length;
byte[] actualKey = entry.getKey();
}
Since a byte can only store a maximum value of 127, for some of our records lengthOfKey comes back as negative when we read the file, as shown below:
Program which is reading the file:
byte keyType = dis.readByte();
// for some record this is coming as negative. For example -74
byte lengthOfKey = dis.readByte();
Now my question is: is there any way I can find out what the actual key length was before it got converted to -74 while writing? I mean, it must have been greater than 127, and that's why it got converted to -74, but what was the actual value?
I think the question would be: how do I convert a negative byte value to either a short or an integer? I just want to verify what the actual key length was before it got converted to a negative value.
If the original length from entry.getKey().length is greater than 255, then the actual length information is lost. However, if the original length was between 128 and 255, then it can be retrieved.
The narrowing conversion of casting to byte keeps only the least significant 8 bits, but the 8th bit is now interpreted as -128 instead of 128.
You can perform a bit-and operation with 0xFF, which will retain all bits, but that implicitly widens the value back to an int.
int length = lengthOfKey & 0xFF;
lengthOfKey (byte = -74): 10110110
Widening it to an int, with sign extension:
lengthOfKey (int = -74): 11111111 11111111 11111111 10110110
Masking to keep only the last 8 bits as an int:
length (int = 182): 00000000 00000000 00000000 10110110
That will convert a negative byte back to a number between 128 and 255.
If you use Guava, it provides a number of unsigned math utilities, including UnsignedBytes.
UnsignedBytes.toInt(lengthOfKey);
Notice their implementation of toInt() is exactly what @rgettman suggests.
To convert an assumed unsigned byte to an int.
int value = (byteValue >= (byte) 0) ? (int) byteValue : 256 + (int) byteValue;
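Since Java 8 the standard library covers this as well: Byte.toUnsignedInt does the same masking, and DataInputStream.readUnsignedByte() returns the value as an int in the range 0 to 255 directly. A short sketch, where the class name and the readLength helper are just for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class UnsignedByteSketch {
    // Reads the first byte of the data as an unsigned value (0..255).
    static int readLength(byte[] data) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data))) {
            return dis.readUnsignedByte();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte lengthOfKey = (byte) 182; // what the writer stored for a 182-byte key
        System.out.println(lengthOfKey);                          // prints -74
        System.out.println(Byte.toUnsignedInt(lengthOfKey));      // prints 182
        System.out.println(readLength(new byte[] {lengthOfKey})); // prints 182
    }
}
```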

How to convert two's complement binary byte[] to decimal?

I receive data from an RS422 link in a byte array (byte[]).
Some of my data are in two's complement binary with the following rules :
Significant bits    Two's complement
(MSB ... LSB)
00000000            0
00000001            + LSB
01111111            + MSB - LSB
10000000            - MSB
10000001            - MSB + LSB
11111111            - LSB
To convert byte[] data to decimal, in pure binary, I use the following code :
Byte b05 = new Byte(new Integer(0x7A).byteValue()); // I use those bytes for my test
Byte b06 = new Byte(new Integer(0x00).byteValue());
Byte[] byteTabDay = new Byte[2] ;
byteTabDay[0] = b05 ;
byteTabDay[1] = b06 ;
int valueDay = byteTabDay[1] << 8 | byteTabDay[0] ;
System.out.println("day :" + valueDay); // print 122
But I don't know how to convert, like previously, byte[] that contain two's complement binary data like that:
Byte b20 = new Byte(new Integer(0x00).byteValue());
Byte b21 = new Byte(new Integer(0xFF).byteValue());
Byte b22 = new Byte(new Integer(0x3C).byteValue());
In theory, that data contains the value (more or less) 1176.
So I need help, because I don't understand how to convert byte data containing two's complement binary to decimal.
Two's complement binary is the standard for representing numbers with negative numbers included.
For three-bit numbers:
Bits   Base 2   One's complement   Two's complement
000    0
001    1
010    2
011    3
100    4        -3                 -4
101    5        -2                 -3
110    6        -1                 -2
111    7        -0                 -1
Java assumes two's complement: the most significant bit being 1 means negative.
Also, in Java, byte, short, int and long are all signed.
As an aside, you used the Object wrappers for the primitive types. Primitive types are more direct and avoid the boxing.
byte b05 = (byte) 0x7A;
byte b06 = (byte) 0x00; // MSB, positive as < 0x80
byte[] byteTabDay = new byte[2];
byteTabDay[0] = b05;
byteTabDay[1] = b06;
int valueDay = (((int) byteTabDay[1]) << 8) | (0xFF & byteTabDay[0]);
System.out.println("day :" + valueDay); // print 122
What one has to do: keep the sign extension of the most significant byte, but for other bytes keep them to 8 bits by masking them with 0xFF.
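The same rule generalizes to arrays of other lengths. A minimal sketch, assuming little-endian order as in the question (least significant byte first) and at most 8 bytes so the result fits into a long; the class and method names are hypothetical:

```java
class TwosComplementSketch {
    // Decodes a little-endian two's complement value of 1 to 8 bytes.
    static long decodeLittleEndian(byte[] data) {
        if (data.length == 0 || data.length > 8) {
            throw new IllegalArgumentException("expected 1 to 8 bytes");
        }
        int last = data.length - 1;
        long value = data[last]; // most significant byte: keep its sign extension
        for (int i = last - 1; i >= 0; i--) {
            value = (value << 8) | (data[i] & 0xFF); // other bytes: mask to 8 bits
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(decodeLittleEndian(new byte[] {0x7A, 0x00}));               // prints 122
        System.out.println(decodeLittleEndian(new byte[] {(byte) 0xFF, (byte) 0xFF})); // prints -1
    }
}
```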
The easiest way to convert an arbitrary byte array containing a two’s complement value to a decimal representation is new BigInteger(bytearray).toString().
This, however, expects the data to be in the big endian byte order. If the data is in little endian order as it seems in your question, you have to reverse it for the use with BigInteger.
byte[] bytearray={ 0x7A, 0x00 };
ByteBuffer b=ByteBuffer.allocate(bytearray.length);
for(int ix=bytearray.length; ix>0; ) b.put(bytearray[--ix]);
System.out.println(new BigInteger(b.array()).toString()); // print 122
If the length of the array matches a standard primitive value type size, you can use ByteBuffer to get the value directly:
byte[] bytearray={ 0x7A, 0x00 };
System.out.println(ByteBuffer.wrap(bytearray)
.order(ByteOrder.LITTLE_ENDIAN).getShort()); // print 122
However, don’t expect the value 1176.61254 as a result. That’s impossible.
If you think that this is the encoded value you will have to adapt your specification. There is no standard three-byte format for floating point values.
