Java integer/double to unsigned byte

I understand Java doesn't have unsigned bytes, but I'm not sure how to solve this without them.
I'm trying to implement SHA-256 hashing in Java, and I'm in the process of padding the message to a multiple of 512 bits.
int l = bytes.length; //total amount of bytes in the original message
int k = 0;
while((l+1+k) % 512 != 448) {
k++;
}
//k is the total amount of 0's to be padded
int rest = k % 8; //get the amount of 0's to be added in the byte with the 1
byte tmp =(byte) Math.pow(2, rest);
The key instruction is the last line: if rest = 7 the resulting int is 128, but bytes are signed in Java, so the byte becomes 0x80 instead of 0xF0.
How can I achieve this in Java?
If anyone has an idea on how to implement this part, please let me know.

Assuming your message consists of whole bytes, the padding always works out to a multiple of 8 bits, i.e. whole bytes. This ensures the most significant pad bit is always bit 7 of the first padding byte following the message, so the padding, if any, always starts with 0x80, followed by as many 0x00 bytes as needed.
This can be implemented in a very simple manner:
// requires: import java.util.Arrays;
public static byte[] padMsg(byte[] rawMsg) {
    int rawLen = rawMsg.length;
    int padLen = (64 - (rawLen & 0x3F)) & 0x3F;
    if (padLen == 0)
        return rawMsg;
    // all extra bytes in the padded msg are zeros
    byte[] paddedMsg = Arrays.copyOf(rawMsg, rawLen + padLen);
    // ensure the topmost pad bit is a one
    paddedMsg[rawLen] = (byte) 0x80;
    return paddedMsg;
}
This takes the message length and computes its remainder modulo 64. The remainder modulo a power of two (here 64) is most efficiently obtained by AND-masking with (power - 1), which is where the 0x3F in the code comes from (64 - 1 = 0x3F).
The remainder is taken again after calculating (64 - remainder) as the preliminary padding length, to catch the special case where the remainder is 0, which would otherwise give a wrong padding length of 64 bytes (it should be 0).
Once the padding length in bytes is known, the case padding = 0 is caught. In any other case the message length is increased (with 0x00 bytes, Arrays.copyOf does this automatically). Then the first padding byte is replaced with 0x80 and the padded message that is now guaranteed to be a multiple of 64 bytes long is returned.
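Note that the FIPS 180-4 padding for SHA-256 does a little more than padMsg: the single 1 bit (the 0x80 byte) is always appended, even when the message is already a multiple of 64 bytes, zero bytes follow until the length is 56 mod 64, and the original message length in bits is appended as a 64-bit big-endian integer. A minimal sketch of that full step, assuming the message is a whole number of bytes (the class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Sha256Padding {
    // Full SHA-256 message padding (FIPS 180-4, section 5.1.1) for a whole-byte message.
    public static byte[] pad(byte[] msg) {
        long bitLen = (long) msg.length * 8;               // original length in bits
        // number of zero bytes after 0x80 so that (length + 1 + zeros) % 64 == 56
        int zeros = (55 - msg.length % 64 + 64) % 64;
        // copyOf fills the new tail with 0x00 bytes
        byte[] padded = Arrays.copyOf(msg, msg.length + 1 + zeros + 8);
        padded[msg.length] = (byte) 0x80;                   // the single 1 bit, always added
        // last 8 bytes: bit length as a 64-bit big-endian integer
        for (int i = 0; i < 8; i++) {
            padded[padded.length - 1 - i] = (byte) (bitLen >>> (8 * i));
        }
        return padded;
    }

    public static void main(String[] args) {
        byte[] padded = pad("abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(padded.length);                  // 64
        System.out.println(padded[padded.length - 1]);      // 24 (3 bytes * 8 bits)
    }
}
```

A 56-byte message no longer fits the 8-byte length into the same block, so it is padded to 128 bytes.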

Related

Variable length-encoding of int to 2 bytes

I'm implementing variable-length encoding and reading Wikipedia about it. Here is what I found:
0x00000080 → 0x81 0x00
This means the int 0x80 is encoded as the two bytes 0x81 0x00, and that is what I cannot understand. Following the algorithm listed there, we have:
Binary 0x80: 00000000 00000000 00000000 10000000
We move the sign bit to the next octet and set it to 1 (indicating that more octets follow), so we have:
00000000 00000000 00000001 10000000, which is not equal to 0x81 0x00. I tried to write a program for that:
byte[] ba = new byte[]{(byte) 0x81, (byte) 0x00};
int first = (ba[0] & 0xFF) & 0x7F;
int second = ((ba[1] & 0xFF) & 0x7F) << 7;
int result = first | second;
System.out.println(result); //prints 1, not 0x80
What did I miss?
Let's review the algorithm from the Wikipedia page:
Take the binary representation of the integer
Split it into groups of 7 bits (the group with the highest value may have fewer than 7 bits)
Take these seven bits as a byte, setting the MSB (most significant bit) to 1 for all but the last; leave it 0 for the last one
We can implement the algorithm like this:
public static byte[] variableLengthInteger(int input) {
    // first find out how many bytes we need to represent the integer
    int numBytes = ((32 - Integer.numberOfLeadingZeros(input)) + 6) / 7;
    // if the integer is 0, we still need 1 byte
    numBytes = numBytes > 0 ? numBytes : 1;
    byte[] output = new byte[numBytes];
    // fill the output from the end, least significant group first ...
    for (int i = 0; i < numBytes; i++) {
        // ... take the least significant 7 bits of input and set the MSB to 1 ...
        output[numBytes - 1 - i] = (byte) ((input & 0b1111111) | 0b10000000);
        // ... shift the input right (unsigned) by 7 places, discarding the 7 bits we just used
        input >>>= 7;
    }
    // finally reset the MSB on the last byte
    output[numBytes - 1] &= 0b01111111;
    return output;
}
You can check it against the examples on the Wikipedia page (for example, 0x80 gives 0x81 0x00) and plug in your own values.
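For the decoding side, note that in the Wikipedia scheme the first byte carries the most significant 7-bit group, whereas the program in the question treats the first byte as the least significant one, which is why it prints 1. A minimal decoder sketch (hypothetical class and method names), assuming well-formed input whose value fits in an int:

```java
public class VlqDecode {
    // Most significant 7-bit group first; MSB set means "more bytes follow"
    public static int decode(byte[] bytes) {
        int result = 0;
        for (byte b : bytes) {
            result = (result << 7) | (b & 0x7F);   // append the next 7 data bits
            if ((b & 0x80) == 0) {                 // MSB clear: this was the last byte
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {(byte) 0x81, (byte) 0x00})); // 128
    }
}
```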
Another variable-length encoding of integers exists and is widely used. For example, ASN.1 (from 1984) defines the "length" field as:
The encoding of length can take two forms: short or long. The short
form is a single byte, between 0 and 127.
The long form is at least two bytes long, and has bit 8 of the first
byte set to 1. Bits 7-1 of the first byte indicate how many more bytes
are in the length field itself. Then the remaining bytes specify the
length itself, as a multi-byte integer.
This encoding is used, for example, in the DLMS/COSEM protocol and in HTTPS certificates. For simple code, have a look at an ASN.1 Java library.
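The short and long forms quoted above can be sketched as follows for a non-negative int length. This is a hypothetical helper covering definite lengths only; it uses the minimal number of length bytes, as DER requires:

```java
import java.util.Arrays;

public class Asn1Length {
    // Encodes a non-negative length using the short/long definite-length forms
    public static byte[] encodeLength(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("length must be non-negative");
        }
        if (length < 128) {
            return new byte[] {(byte) length};                  // short form: one byte, 0..127
        }
        int numBytes = (32 - Integer.numberOfLeadingZeros(length) + 7) / 8; // bytes for the value
        byte[] out = new byte[1 + numBytes];
        out[0] = (byte) (0x80 | numBytes);                      // bit 8 set, bits 7-1 = byte count
        for (int i = 0; i < numBytes; i++) {
            out[numBytes - i] = (byte) (length >>> (8 * i));    // value bytes, big-endian
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeLength(127)));  // [127]
        System.out.println(Arrays.toString(encodeLength(200)));  // [-127, -56]
        System.out.println(Arrays.toString(encodeLength(1000))); // [-126, 3, -24]
    }
}
```

Printed in hex, 200 becomes 0x81 0xC8 and 1000 becomes 0x82 0x03 0xE8.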

How can I concatenate two bytes in java?

I have an integer called writePos that takes a value between [0,1023]. I need to store it in the last two bytes of a byte array called bucket. So, I figure I need to represent it as a concatenation of the array's last two bytes.
How would I go about breaking down writePos into two bytes that, when concatenated and cast into an int, produces writePos again?
How would I go about concatenating once I get it broken down into the bytes?
This would be covered high-level by a ByteBuffer.
// requires: import java.nio.ByteBuffer; import java.nio.ByteOrder;
short loc = (short) writeLocation;
byte[] bucket = ...
int index = bucket.length - 2;
ByteBuffer buf = ByteBuffer.wrap(bucket);
buf.order(ByteOrder.LITTLE_ENDIAN); // Optional
buf.putShort(index, loc);
writeLocation = buf.getShort(index);
The order can be specified, or left to the default (BIG_ENDIAN).
The ByteBuffer wraps the original byte array, so changes made through the ByteBuffer affect the byte array too.
One can use sequential writing and reading and positioning (seek), but here I use the overloaded methods that take an explicit index.
putShort writes to the byte array, modifying two bytes, a short.
getShort reads a short from the byte array, which can be put in an int.
Explanation
A short in Java is a two-byte (signed) integral number, which is exactly the two bytes needed here. The order is either LITTLE_ENDIAN, least significant byte first (n % 256, then n / 256), or BIG_ENDIAN, most significant byte first.
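To see what the byte order changes, here is a small sketch (the class and helper names are hypothetical) that writes 511 (0x01FF) with both orders and reads it back:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class ShortByteOrder {
    // Writes a short into a fresh 2-byte array using the given byte order
    public static byte[] toBytes(short value, ByteOrder order) {
        return ByteBuffer.allocate(2).order(order).putShort(value).array();
    }

    public static void main(String[] args) {
        short writePos = 511; // 0x01FF, inside the question's range [0, 1023]
        System.out.println(Arrays.toString(toBytes(writePos, ByteOrder.BIG_ENDIAN)));    // [1, -1]
        System.out.println(Arrays.toString(toBytes(writePos, ByteOrder.LITTLE_ENDIAN))); // [-1, 1]
        // Reading back with the same order restores the value
        System.out.println(ByteBuffer.wrap(toBytes(writePos, ByteOrder.LITTLE_ENDIAN))
                .order(ByteOrder.LITTLE_ENDIAN).getShort());                             // 511
    }
}
```

Whichever order you pick, the writer and the reader must agree on it.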
Bitwise operations.
To byte:
byte[] bytes = new byte[2];
// This uses a bitwise AND (&) to take only the lowest 8 bits of i
bytes[0] = (byte) (i & 0xff);
// This uses a bitwise AND (&) to take the 9th to 16th bits of i,
// then a right shift (>>) to move them down by 8 bits
bytes[1] = (byte) ((i & 0xff00) >> 8);
From byte (to go back the other way):
// This reverses the shift. The & 0xFF masks are needed because Java bytes are
// signed: without them the sign bit would be extended into the high bits
// when each byte is converted to an int before the bytes are combined
i = (bytes[0] & 0xFF) | ((bytes[1] & 0xFF) << 8);
There is a working example here of some of the conversions that you can play around with:
http://ideone.com/eRzsun
You need to split the integer into two bytes, the high byte and the low byte. Following your description, it's stored big-endian in the array.
// requires: import java.util.Arrays;
int writeLocation = 511;
byte[] bucket = new byte[10];
// range checks must be done before
// bitwise right shift by 8 bits
bucket[8] = (byte) (writeLocation >> 8); // the high byte
bucket[9] = (byte) (writeLocation & 0xFF); // the low byte
System.out.println("bytes = " + Arrays.toString(bucket));
// convert back the integer value 511 from the two bytes
bucket[8] = 1;
bucket[9] = (byte) (0xFF);
// the high byte is shifted left by 8 bits;
// the low byte is converted to an int and masked with 0xFF,
// so only its 8 bits are added (no sign extension)
writeLocation = ((bucket[8] & 0xFF) << 8) + (bucket[9] & 0xFF);
System.out.println("writeLocation = " + writeLocation);

What kind of padding should AES use?

I have implemented AES encryption (homework), but I've run into the problem of padding the messages.
If my messages are arrays of bytes as such:
public byte[] encrypt(byte[] message) {
int size = (int) Math.ceil(message.length / 16.0);
byte[] result = new byte[size * 16];
for (int i = 0; i < size; i++) {
if ((i+1) * 16 > message.length){
//padding here????
} else {
byte[] block = Arrays.copyOfRange(message, i * 16, (i + 1) * 16);
byte[] encryptedBlock = encryptBlock(block);
System.arraycopy(encryptedBlock, 0, result, i*16, 16);
}
}
return result;
}
How can I pad such a message?
I cannot use zero padding, because any byte of the message could be zero, so the padding would be ambiguous for a message with trailing zeros.
I cannot find any reference to how this is done, not even here (the paper describing AES encryption).
There are a number of methods you can use, from simple to advanced. Bruce Schneier suggests two rather simple methods:
One is to pad the last block with n bytes all with value n, which is what Alex Wien suggested. This has issues (including restricting you to block sizes that are less than 256 bytes long). This padding mode is known as PKCS#7 padding (for 16 byte blocks) or PKCS#5 padding (for 8 byte blocks).
The other is to append a byte with value 0x80 (1000 0000 in binary) followed by as many zero bytes as needed to fill the last block. This method is known as ISO padding, short for ISO/IEC 9797-1 padding method 2. It is really bit-level padding: a single bit valued 1 is added, followed by 0-valued bits until you reach the block size.
As for how to know whether a message is padded, the answer is a message will always be padded: even if the last chunk of the message fits perfectly inside a block (i.e. the size of the message is a multiple of the block size), you will have to add a dummy last block.
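Both schemes can be sketched in Java as follows. This is a minimal, hypothetical helper (class and method names are my own) for 16-byte AES blocks; in line with the point above, both pad functions always add between 1 and 16 bytes, so an already aligned message gets a full extra block. The unpad methods do only basic sanity checks:

```java
import java.util.Arrays;

public class BlockPadding {
    static final int BLOCK = 16; // AES block size in bytes

    // PKCS#7: append n bytes, each with value n (1..16); a full block if already aligned
    public static byte[] pkcs7Pad(byte[] msg) {
        int n = BLOCK - (msg.length % BLOCK);                // 1..16, never 0
        byte[] out = Arrays.copyOf(msg, msg.length + n);
        Arrays.fill(out, msg.length, out.length, (byte) n);
        return out;
    }

    public static byte[] pkcs7Unpad(byte[] padded) {
        if (padded.length == 0 || padded.length % BLOCK != 0) {
            throw new IllegalArgumentException("not a whole number of blocks");
        }
        int n = padded[padded.length - 1] & 0xFF;            // last byte = padding length
        if (n < 1 || n > BLOCK) {
            throw new IllegalArgumentException("bad padding");
        }
        for (int i = padded.length - n; i < padded.length; i++) {
            if ((padded[i] & 0xFF) != n) {
                throw new IllegalArgumentException("bad padding");
            }
        }
        return Arrays.copyOf(padded, padded.length - n);
    }

    // ISO/IEC 9797-1 method 2: append 0x80, then 0x00 bytes up to the block boundary
    public static byte[] isoPad(byte[] msg) {
        int n = BLOCK - (msg.length % BLOCK);                // 1..16, including the 0x80
        byte[] out = Arrays.copyOf(msg, msg.length + n);     // new bytes start out as 0x00
        out[msg.length] = (byte) 0x80;
        return out;
    }

    public static byte[] isoUnpad(byte[] padded) {
        int i = padded.length - 1;
        while (i >= 0 && padded[i] == 0) {
            i--;                                             // skip the trailing zero bytes
        }
        if (i < 0 || padded[i] != (byte) 0x80) {
            throw new IllegalArgumentException("bad padding");
        }
        return Arrays.copyOf(padded, i);                     // drop the 0x80 and the zeros
    }

    public static void main(String[] args) {
        byte[] msg = {1, 0, 0};                              // trailing zeros survive both schemes
        System.out.println(Arrays.toString(pkcs7Unpad(pkcs7Pad(msg)))); // [1, 0, 0]
        System.out.println(Arrays.toString(isoUnpad(isoPad(msg))));     // [1, 0, 0]
    }
}
```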
If you are interested in researching some of the more advanced methods, look up a technique called ciphertext stealing on wikipedia: http://en.wikipedia.org/wiki/Ciphertext_stealing
There is a trick with padding:
Pad with bytes whose value is the length of the padding (each digit below stands for one byte):
333
or
4444
or
999999999
Then, when you later read the last byte, you know how many padding bytes to remove.

Get byte representation of int, using only 3 bytes

What's a nice, readable way of getting the byte representation (i.e. a byte[]) of an int, but only using 3 bytes (instead of 4)? I'm using Hadoop/HBase, and their Bytes utility class has a toBytes function, but that will always use 4 bytes.
Ideally, I'd also like a nice, readable way of encoding to as few bytes as possible, i.e. if the number fits in one byte then only use one.
Please note that I'm storing this in a byte[], so I know the length of the array and thus variable length encoding is not necessary. This is about finding an elegant way to do the cast.
A general solution for this is impossible.
If it were possible, you could apply the function iteratively to obtain unlimited compression of data.
Your domain might have some constraints on the integers that allow them to be compressed to 24-bits. If there are such constraints, please explain them in the question.
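If your domain does guarantee such a constraint, for example that every value is non-negative and below 2^24, a fixed 3-byte encoding is straightforward. A minimal sketch under that assumption (class and method names hypothetical), big-endian, rejecting values that do not fit:

```java
import java.util.Arrays;

public class ThreeBytes {
    // Assumes the domain guarantees 0 <= value < 2^24; anything else is rejected
    public static byte[] toThreeBytes(int value) {
        if ((value >>> 24) != 0) {                 // also catches negative values
            throw new IllegalArgumentException("does not fit in 24 bits: " + value);
        }
        return new byte[] {
            (byte) (value >>> 16),                 // high byte
            (byte) (value >>> 8),                  // middle byte
            (byte) value                           // low byte
        };
    }

    public static int fromThreeBytes(byte[] b) {
        // mask with 0xFF so the sign of each byte is not extended into the int
        return ((b[0] & 0xFF) << 16) | ((b[1] & 0xFF) << 8) | (b[2] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] enc = toThreeBytes(0xABCDEF);
        System.out.println(Arrays.toString(enc));            // [-85, -51, -17]
        System.out.println(fromThreeBytes(enc) == 0xABCDEF); // true
    }
}
```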
A common variable size encoding is to use 7 bits of each byte for data, and the high bit as a flag to indicate when the current byte is the last.
You can predict the number of bytes needed to encode an int with a utility method on Integer:
int n = 4 - Integer.numberOfLeadingZeros(x) / 8;
byte[] enc = new byte[n];
while (n-- > 0)
enc[n] = (byte) ((x >>> (n * 8)) & 0xFF);
Note that this will encode 0 as an empty array, and other values in little-endian format. These aspects are easily modified with a few more operations.
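To read such an array back, reassemble the bytes in the same little-endian order, masking each one with 0xFF so Java's sign extension does not leak into the result. A sketch (hypothetical class name) that wraps the snippet above as encode and adds the matching decode:

```java
import java.util.Arrays;

public class MinimalBytes {
    // The encoding from the answer: little-endian, high-order zero bytes dropped, 0 -> empty array
    public static byte[] encode(int x) {
        int n = 4 - Integer.numberOfLeadingZeros(x) / 8;
        byte[] enc = new byte[n];
        while (n-- > 0)
            enc[n] = (byte) ((x >>> (n * 8)) & 0xFF);
        return enc;
    }

    // Inverse: byte i contributes bits 8*i .. 8*i+7; missing high bytes count as zero
    public static int decode(byte[] enc) {
        int x = 0;
        for (int i = 0; i < enc.length; i++) {
            x |= (enc[i] & 0xFF) << (i * 8);
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(300))); // [44, 1]
        System.out.println(decode(encode(300)));          // 300
    }
}
```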
If you need to represent all 2^32 possible 4-byte integers, you need to choose between:
fixed-size representation, using 4 bytes always; or
variable-size representation, using at least 5 bytes for some numbers.
Take a look at how UTF-8 encodes Unicode characters; you might get some insights. (A short prefix describes how many bytes must be read for that Unicode character; you then read that many bytes and interpret them.)
Try using ByteBuffer. You can even set little endian mode if required:
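// requires: import java.nio.ByteBuffer;
int exampleInt = 0x11FFFFFF;
ByteBuffer buf = ByteBuffer.allocate(Integer.SIZE / Byte.SIZE);
final byte[] threeByteBuffer = new byte[3];
buf.putInt(exampleInt);
// with the default big-endian order, skip the most significant byte and copy the other 3
buf.position(1);
buf.get(threeByteBuffer);
Or the shortest signed, big-endian representation:
// requires: import java.math.BigInteger;
BigInteger bi = BigInteger.valueOf(exampleInt);
final byte[] shortestSigned = bi.toByteArray();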
Convert your int to a 4-byte array and iterate over it; if a high-order byte is zero, remove it from the array.
Something like:
byte[] bytes = toBytes(myInt); // assumed little-endian: least significant byte first
int neededBytes = 4;
for (; neededBytes > 1; neededBytes--) {
    if (bytes[neededBytes - 1] != 0) {
        break;
    }
}
// copy the first neededBytes bytes (requires: import java.util.Arrays;)
byte[] result = Arrays.copyOf(bytes, neededBytes);
You can start with something like this:
byte[] Convert(int i)
{ // warning: untested
if (i == 0)
return new byte[0];
if (i > 0 && i < 256)
return new byte[]{(byte)i};
if (i > 0 && i < 256 * 256)
return new byte[]{(byte)i, (byte)(i >> 8)};
if (i > 0 && i < 256 * 256 * 256)
return new byte[]{(byte)i, (byte)(i >> 8), (byte)(i >> 16)};
return new byte[]{(byte)i, (byte)(i >> 8), (byte)(i >> 16), (byte)(i >> 24)};
}
You'll need to decide if you want to be little-endian or big-endian. Note that negative numbers are encoded in 4 bytes.
If I understand correctly that you really, desperately want to save space, even at the expense of arcane bit shuffling: any array type is an unnecessary luxury, because its length field takes at least one whole byte (an address space of 256) while you know that at most 4 bytes will be needed. So I would reserve 4 bits for the length and a sign flag, and cram the rest of the value in, aligned to that number of bytes. You might even save one more byte if your MSB is less than 128. I find the sign flag useful because it lets you represent negative numbers in fewer than 4 bytes too; it is better to have that bit there every time (even for positive numbers) than to pay 4 bytes to represent -1.
Anyway, all of this is of little value until you gather some statistics on your data set: how many integers are actually compressible, and whether the compression overhead is worth the effort.

Java - converting int to byte array without considering sign

To convert an int into a byte array, I'm using the following code:
int a = 128;
byte[] b = convertIntValueToByteArray(a);
private static byte[] convertIntValueToByteArray(int intValue){
BigInteger bigInteger = BigInteger.valueOf(intValue);
byte[] origByteArray = bigInteger.toByteArray();
byte[] noSignByteArray = new byte[bigInteger.bitLength()/8];
if(bigInteger.bitLength()%8!=0){
noSignByteArray = origByteArray;
}else{
System.arraycopy(origByteArray,1,noSignByteArray,0,noSignByteArray.length);
}
return noSignByteArray;
}
There are two things which I'm attempting to do.
1) I need to know the number of bytes (rounded up to the nearest byte) of the original integer. However, I don't need the additional bit that toByteArray() adds for the sign. This is why I have the helper method: without it, converting 128 to a byte array gives a length of 2 octets because of the sign bit, but I expect only one octet.
2) I need the positive representation of the number. In this example, if I print the first element of array b, I get -128. However, I will only be using positive numbers, so what I actually want is 128. I'm limited to using a byte array. Is there a way to accomplish this?
Updated Post
Thank you for the responses. I haven't found the exact answer I was looking for so I'll attempt to give more details. Ultimately, I want to write values of different types over a data output stream. In this post, I'd like to clarify what happens when ints are written to a data output stream. I've come across two scenarios.
1)
DataOutputStream os = new DataOutputStream(this.socket.getOutputStream());
byte[] b = BigInteger.valueOf(128).toByteArray();
os.write(b);
2)
DataOutputStream os = new DataOutputStream(this.socket.getOutputStream());
os.write(128);
In the first scenario, when the bytes are read from a data input stream, the first element of the byte array seems to be 0, representing the MSB, and the second element contains -128. Since the MSB is 0, we can determine that the number is meant to be positive. In the second scenario there is no MSB, and the only element in the byte array read from the input stream is -128. I expected the write() method of the data output stream to convert the int into a byte array the same way toByteArray() does on a BigInteger, but that doesn't seem to be the case, since the MSB is not present. So my question is: in the second scenario, how are we supposed to know that 128 is meant to be positive and not negative if there is no MSB?
As you probably already know
In an octet, the pattern 10000000 can be interpreted as either 128 or -128, depending on the, um, outside interpretation
Java's byte type interprets octets as values in -128...127 only.
If you are building an application in which the entire world consists of nonnegative integers only, then you could simply do all of your work under the assumption that the byte value -128 will mean 128 and -127 will mean 129 and ... and -1 will mean 255. This is certainly doable but it takes work.
Dealing with the notion of an "unsigned byte" like this is normally done by expanding the byte into a short or int with the higher order bits all set to zero and then performing arithmetic or displaying your values. You will need to decide whether such an approach is more to your liking than just representing 128 as two octets in your array.
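To make the "outside interpretation" concrete: DataOutputStream.write(int) writes only the low 8 bits of its argument, so the stream itself carries no sign information and the reader decides. A small sketch (class and method names hypothetical) using an in-memory stream instead of the question's socket, showing `& 0xFF`, `Byte.toUnsignedInt` (Java 8+) and `DataInputStream.readUnsignedByte`, all of which read the octet 10000000 as 128:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class UnsignedByteDemo {
    // Writes one int through DataOutputStream.write(int), which keeps only the low 8 bits
    public static byte[] writeLowByte(int value) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(sink);
            out.write(value);
            out.flush();
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the first octet back as an unsigned value in 0..255
    public static int readUnsigned(byte[] raw) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return in.readUnsignedByte();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = writeLowByte(128);                 // one octet: 10000000
        System.out.println(raw.length);                 // 1
        System.out.println(raw[0]);                     // -128, Java's signed view of the octet
        System.out.println(raw[0] & 0xFF);              // 128, widened with the high bits zeroed
        System.out.println(Byte.toUnsignedInt(raw[0])); // 128, same thing (Java 8+)
        System.out.println(readUnsigned(raw));          // 128, the reader picks the interpretation
    }
}
```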
I think the following code might be sufficient.
In Java, int is a two's-complement binary number:
-1 = 111...111
its ones' complement = 000...000; adding 1 gives
1 = 000...001
So I don't understand the concern about the sign bit. If it matters, you could use Math.abs(n).
A byte ranges from -128 to 127, but the interpretation is a matter of masking, as below.
// requires: import java.nio.ByteBuffer;
public static void main(String[] args) {
int n = 128;
byte[] bytes = intToFlexBytes(n);
for (byte b: bytes)
System.out.println("byte " + (((int)b) & 0xFF));
}
public static byte[] intToFlexBytes(int n) {
// Convert int to byte[4], via a ByteBuffer:
byte[] bytes = new byte[4];
ByteBuffer bb = ByteBuffer.allocateDirect(4);
bb.asIntBuffer().put(n);
bb.position(0);
bb.get(bytes);
// Leading bytes with 0:
int i = 0;
while (i < 4 && bytes[i] == 0)
++i;
// Shorten bytes array if needed:
if (i != 0) {
byte[] shortenedBytes = new byte[4 - i];
for (int j = i; j < 4; ++j) {
shortenedBytes[j - i] = bytes[j]; // System.arraycopy would also work, but isn't needed
}
bytes = shortenedBytes;
}
return bytes;
}
To answer your first question (how many bytes are required to represent a nonnegative integer using an unsigned representation), consider the following functions I wrote in Common Lisp.
(defconstant +bits-per-byte+ 8)
(defun bit-length (n)
(check-type n (integer 0) "a nonnegative integer")
(if (zerop n)
1
(1+ (floor (log n 2)))))
(defun bytes-for-bits (n)
(check-type n (integer 1) "a positive integer")
(values (ceiling n +bits-per-byte+)))
These highlight the mathematical underpinnings of the problem: namely, the logarithm tells you how many powers of two (as provided by bits) it takes to dominate a given nonnegative integer, adjusted to be a step function with floor, and the number of bytes it takes to hold that number of bits again as a step function, this time adjusted with ceiling.
Note that the number zero is intolerable as input to a logarithm function, so we avoid it explicitly. You may observe that the bit-length function could also be written with a slight transformation of the core expression:
(defun bit-length-alt (n)
(check-type n (integer 0) "a nonnegative integer")
(values (ceiling (log (1+ n) 2))))
Unfortunately, as the logarithm of one is always zero, regardless of the base, this version says that the integer zero can be represented by zero bits, which isn't the answer we want.
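Since the question is about Java, here is a sketch of the same two functions in Java (names hypothetical). For an int, Integer.numberOfLeadingZeros gives the bit length exactly without any floating-point logarithm, and bytes-for-bits becomes an integer ceiling division:

```java
public class BitLength {
    // Counterpart of bit-length: zero still needs 1 bit; otherwise the index of the
    // highest set bit plus one
    public static int bitLength(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("expected a nonnegative integer");
        }
        return n == 0 ? 1 : Integer.SIZE - Integer.numberOfLeadingZeros(n);
    }

    // Counterpart of bytes-for-bits: ceiling of bits / 8 in integer arithmetic
    public static int bytesForBits(int bits) {
        if (bits < 1) {
            throw new IllegalArgumentException("expected a positive integer");
        }
        return (bits + Byte.SIZE - 1) / Byte.SIZE;
    }

    public static void main(String[] args) {
        System.out.println(bitLength(128));               // 8
        System.out.println(bytesForBits(bitLength(128))); // 1: one octet, no sign bit
        System.out.println(bytesForBits(bitLength(256))); // 2
    }
}
```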
For your second goal, you can use the functions I've defined above to allocate the required number of bytes, and incrementally set the bits you need, ignoring sign. It's hard to tell whether you're having trouble getting the proper bits set in the byte vector, or whether your problem is in interpreting the bits in a way that avoids treating the high bit as a sign bit (that is, two's-complement representation). Please elaborate on what kind of push you need to get moving again.
