Convert char[] to byte[] without losing 'bits' - java

I'm developing an Android 2.3.3 application with Java.
This app is a Java port of iOS code that uses unsigned data types.
On iOS it works with UInt16 and UInt8. In one case I'm using char[] instead of byte[].
But now I have to send that char[] as a byte[] using a DatagramPacket.
If one element of the char[] is 128, how can I put it into a byte[] so that the receiver gets 128? I don't know what happens if I do this:
char c = (char)128;
byte b = (byte)c;
What will the value of b be?
128 = 10000000. Will b be -1 or 127?
How can I convert char[] to byte[] without losing any bits?

In Java char is an unsigned 16-bit quantity. So you can directly convert your uint16 to char without doing anything else.
For unsigned 8-bit quantity you have 2 options:
Use byte. It also holds 8 bits. You don't lose any bits just because it is signed. However, if you do arithmetic with it you need to remember that Java automatically promotes byte to int and sign-extends it. To prevent this, always mask it like this:
byte b;
int foo = 5 * (b & 0xFF);
Use char. It is unsigned and can hold 16 bits so the 8 bits will fit in there quite nicely. To put a byte into a char just do this:
byte b;
char c = (char)(b & 0xFF); // Keep only the low-order 8 bits
To put a char into a byte just do:
char c;
byte b = (byte)c; // Truncates to 8 bits
Be aware that byte in Java is signed, so that whenever you do arithmetic with it you need to mask the low-order 8 bits only (to prevent sign-extension). Like this:
byte b;
int foo = (b & 0xFF);
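You can do most of the normal bitwise operations with a byte without having to mask. The compound assignment operators narrow the result back to byte for you; a plain shift produces an int and needs a cast, and an unsigned right shift needs the mask because the byte is sign-extended to int before it is shifted:
byte b;
if ((b & 0x80) != 0) ... // Test a flag
b |= 0x40; // Set a flag
b ^= 0x20; // Flip a flag from 0 to 1 or 1 to 0
b &= ~0x10; // Clear a flag
byte x = (byte) (b << 3); // Shift left 3 bits and assign
byte y = (byte) ((b & 0xFF) >>> 4); // Shift right 4 bits and assign (WITHOUT sign extension)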
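To make the answer to the original cast question concrete, here is a minimal sketch. The class and method names (UnsignedByteDemo, lowBytes, unsigned) are made up for illustration, and it assumes every char in the array holds a value in the range 0..255, as UInt8 data would.

```java
// Hypothetical demo class; none of these names come from the original posts.
final class UnsignedByteDemo {

    /** Narrows each char to its low-order 8 bits; values above 255 would lose their high byte. */
    static byte[] lowBytes(char[] chars) {
        byte[] out = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            out[i] = (byte) chars[i];
        }
        return out;
    }

    /** Reads a byte back as an unsigned value in the range 0..255. */
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        char c = (char) 128;
        byte b = (byte) c;               // bit pattern 1000 0000 is kept
        System.out.println(b);           // prints -128 (neither -1 nor 127)
        System.out.println(unsigned(b)); // prints 128
    }
}
```

The bits on the wire are the same either way; only the receiver's interpretation (masking with 0xFF) decides whether it sees -128 or 128.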

I think you need to rethink your approach so you don't end up needing to convert char[] to byte[].
If your data really is characters, then you want to look at various serialization techniques, such as using new String(char[]) to create a string and then using getBytes(Charset) to get the bytes as encoded by a given Charset (because, of course, the same characters result in different bytes when encoded in ASCII or UTF-8 or UTF-16, etc.).
But from your question, it sounds like you're not really using characters, you're just using char as a 16-bit type. If so, doing the conversion isn't difficult, something along these lines:
byte[] toBytes(char[] chars) {
    byte[] bytes = new byte[chars.length * 2];
    int ci, bi;
    char ch;
    bi = 0;
    for (ci = 0; ci < chars.length; ++ci) {
        ch = chars[ci];
        bytes[bi++] = (byte)((ch & 0xFF00) >> 8);
        bytes[bi++] = (byte)(ch & 0x00FF);
    }
    return bytes;
}
Swap the two assignments (low byte first, then high byte) if you want the result to be little-endian instead.
But again, I would look at your overall approach and try to avoid this.
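The receiving side needs the inverse of the toBytes method above. The following is a sketch only: the class name CharBytes and the method toChars are assumptions, and it expects the same big-endian layout (high byte first).

```java
// Hypothetical helper class; toChars is an illustrative name, not part of the original answer.
final class CharBytes {

    /** Big-endian packing, same byte layout as the toBytes method in the answer. */
    static byte[] toBytes(char[] chars) {
        byte[] bytes = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++) {
            bytes[2 * i] = (byte) (chars[i] >> 8); // high byte
            bytes[2 * i + 1] = (byte) chars[i];    // low byte
        }
        return bytes;
    }

    /** Inverse: rebuilds each char from a high byte followed by a low byte. */
    static char[] toChars(byte[] bytes) {
        if (bytes.length % 2 != 0) {
            throw new IllegalArgumentException("odd number of bytes: " + bytes.length);
        }
        char[] chars = new char[bytes.length / 2];
        for (int i = 0; i < chars.length; i++) {
            int hi = bytes[2 * i] & 0xFF;     // mask so the sign bit is not extended
            int lo = bytes[2 * i + 1] & 0xFF;
            chars[i] = (char) ((hi << 8) | lo);
        }
        return chars;
    }
}
```

Masking each byte with 0xFF before combining is what keeps values such as 128 or 0xFFFF intact on the way back.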

Related

How can I concatenate two bytes in java?

I have an integer called writePos that takes a value between [0,1023]. I need to store it in the last two bytes of a byte array called bucket. So, I figure I need to represent it as a concatenation of the array's last two bytes.
How would I go about breaking down writePos into two bytes that, when concatenated and cast into an int, produce writePos again?
How would I go about concatenating once I get it broken down into the bytes?
This would be covered high-level by a ByteBuffer.
short loc = (short) writeLocation;
byte[] bucket = ...
int index = bucket.length - 2;
ByteBuffer buf = ByteBuffer.wrap(bucket);
buf.order(ByteOrder.LITTLE_ENDIAN); // Optional
buf.putShort(index, loc);
writeLocation = buf.getShort(index);
The order can be specified, or left at the default (BIG_ENDIAN).
The ByteBuffer wraps the original byte array, so changes made through the ByteBuffer affect the byte array too.
One can use sequential writing and reading with positioning (seek), but here I use the overloaded methods that take an explicit index.
putShort writes a short to the byte array, modifying two bytes.
getShort reads a short from the byte array, which can be put in an int.
Explanation
A short in Java is a two-byte (signed) integral number, which is what is needed here. The order says whether the bytes are stored LITTLE_ENDIAN, least significant byte first (n % 256, n / 256), or BIG_ENDIAN, most significant byte first (n / 256, n % 256).
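The ByteBuffer approach described above can be wrapped into two small helpers. This is a sketch under assumptions: the class name BucketTail and the method names are invented here, and big-endian order is chosen arbitrarily. The & 0xFFFF on the way back matters only for values above 32767, where getShort would otherwise return a negative number.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper; names are illustrative only.
final class BucketTail {

    /** Stores writePos (0..65535) in the last two bytes of bucket, high byte first. */
    static void storeWritePos(byte[] bucket, int writePos) {
        if (writePos < 0 || writePos > 0xFFFF) {
            throw new IllegalArgumentException("out of range: " + writePos);
        }
        ByteBuffer.wrap(bucket)
                .order(ByteOrder.BIG_ENDIAN)
                .putShort(bucket.length - 2, (short) writePos);
    }

    /** Reads the last two bytes of bucket back as an unsigned 16-bit value. */
    static int loadWritePos(byte[] bucket) {
        short raw = ByteBuffer.wrap(bucket)
                .order(ByteOrder.BIG_ENDIAN)
                .getShort(bucket.length - 2);
        return raw & 0xFFFF; // undo sign extension
    }
}
```

Because wrap does not copy, storeWritePos modifies the caller's array directly.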
Bitwise operations.
To bytes:
byte[] bytes = new byte[2];
// This uses a bitwise and (&) to take only the last 8 bits of i
bytes[0] = (byte)(i & 0xff);
// This uses a bitwise and (&) to take the 9th to 16th bits of i
// It then uses a right shift (>>) to move them right 8 bits
bytes[1] = (byte)((i & 0xff00) >> 8);
From bytes, to go back the other way:
// This reverses the shift.
// The & here is used to handle complications coming from the sign bit that
// will otherwise be carried along as the bytes are widened to int and combined.
// The parentheses around the shift matter: + binds tighter than <<.
i = (bytes[0] & 0xFF) + ((bytes[1] & 0xFF) << 8);
There is a working example here of some of the conversions that you can play around with:
http://ideone.com/eRzsun
You need to split the integer into two bytes: the high byte and the low byte. Following your description, it's stored big-endian in the array.
int writeLocation = 511;
byte[] bucket = new byte[10];
// range checks must be done before
// bitwise right shift by 8 bits
bucket[8] = (byte) (writeLocation >> 8); // the high byte
bucket[9] = (byte) (writeLocation & 0xFF); // the low byte
System.out.println("bytes = " + Arrays.toString(bucket));
// convert back the integer value 511 from the two bytes
bucket[8] = 1;
bucket[9] = (byte) (0xFF);
// the high byte is masked to 8 bits and shifted left by 8 bits
// the low byte is widened to an int
// and only its last 8 bits are added
writeLocation = ((bucket[8] & 0xFF) << 8) + (bucket[9] & 0xFF);
System.out.println("writeLocation = " + writeLocation);

Equivalent in Java

I've already spent hours googling this and I think I need help. I have a snippet that doesn't behave like its Objective-C equivalent, so I'd appreciate some expert help.
What is the Java equivalent of the Objective-C snippets below?
unsigned char mByte[CC_SHA1_DIGEST_LENGTH];
uint64_t tBytes = 0xFFFFFFFF;
Well there is no absolute equivalent, if all languages were the same why would we need more than one?
The closest thing is probably:
final int CC_SHA1_DIGEST_LENGTH = 1024; //some length
char[] mByte = new char[CC_SHA1_DIGEST_LENGTH];
//there is no unsigned keyword in java
//The long data type is a 64-bit two's complement integer
long tBytes = 0xFFFFFFFFL;
long tBytes = 0xFFFFFFFF; would also compile, but it yields -1: the literal is treated as an int, not a long, and is sign-extended when it is widened. If you want the 64-bit value 4294967295 you need to add L (0xFFFFFFFFL) at the end. Be careful!
More info on the primitive datatypes can be found here.
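The effect of the L suffix is easy to check. A minimal sketch, with a made-up class name and constant names:

```java
// Hypothetical demo class; names are illustrative only.
final class LiteralDemo {

    // The literal is an int with all 32 bits set (-1); widening to long sign-extends it.
    static final long WITHOUT_SUFFIX = 0xFFFFFFFF;

    // With the L suffix the literal is a long from the start, so the upper 32 bits stay zero.
    static final long WITH_SUFFIX = 0xFFFFFFFFL;

    public static void main(String[] args) {
        System.out.println(WITHOUT_SUFFIX); // prints -1
        System.out.println(WITH_SUFFIX);    // prints 4294967295
    }
}
```

Only the second form matches what uint64_t tBytes = 0xFFFFFFFF; holds in C.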
unsigned char mByte[CC_SHA1_DIGEST_LENGTH];
equivalent in java will be
final int CC_SHA1_DIGEST_LENGTH=1024;
byte[] mByte = new byte[CC_SHA1_DIGEST_LENGTH];
though Java's byte is signed, so when you work with the mByte array you have to convert byte values to unsigned using 0xFF.
for example,
mByte[0]=(byte)(0x88 & 0xFF);
uint64_t tBytes = 0xFFFFFFFF;
equivalent in java will be
long tBytes = 0xFFFFFFFFL;
Well, the rule of thumb would be to match the types' lengths in bits.
In C unsigned char is 8 bits wide, so the Java equivalent would be byte.
In C uint64_t is 64 bits wide, so the Java equivalent would be long.
So the Java equivalent would be:
//SHA1 length would be 20 bytes
public static final int CC_SHA1_DIGEST_LENGTH = 20;
byte mBytes[] = new byte[CC_SHA1_DIGEST_LENGTH];
long tBytes = 0xFFFFFFFFL;
@user3200809
mByte[0]=(0x88 & 0xFF); - the mask in this operation is pointless unless the value is wider than 8 bits.
See the same operation in binary:
1000 1000
&
1111 1111
=
1000 1000
When assigning to mByte[i] you have to cast to byte anyway, so any excess bits are chopped off.
In memory there is no difference between the representation of signed and unsigned types.
I'm guessing from the author's post that he is working with the SHA1 algorithm, which is a bunch of binary operations (XORs, ANDs, shifts).
For bitwise operations such as XOR and AND it makes no difference whether the type is signed or unsigned (right shifts are the exception: use >>> and, for a byte, mask with 0xFF first):
byte b = (byte) 0x80; //-128 in signed decimal
b ^= (byte) 0x01;
System.out.printf("0x%x", b); // prints 0x81 which is -127 in signed decimal
BUT you could run into problems if you're doing e.g. division.
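To illustrate where signedness does start to matter, here is a small sketch of division on the same bit pattern. The class and method names are made up, and Long.divideUnsigned assumes Java 8 or later (it is not available on old Android API levels).

```java
// Hypothetical demo class; names are illustrative only.
final class UnsignedDivision {

    /** Divides the byte as Java sees it: a signed value in -128..127. */
    static int signedHalf(byte b) {
        return b / 2;
    }

    /** Divides the byte as C's unsigned char would: a value in 0..255. */
    static int unsignedHalf(byte b) {
        return (b & 0xFF) / 2;
    }

    /** Same idea for 64-bit values, using the helper added in Java 8. */
    static long unsignedHalf(long x) {
        return Long.divideUnsigned(x, 2);
    }
}
```

For the byte 0x80, signedHalf gives -64 while unsignedHalf gives 64, although XOR, AND and OR on that byte behave identically in both interpretations.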

Is there a way to represent int value in byte over 127 without doing bitwise operations?

I am really short on time and can't afford to learn bitwise operations right now.
I want to convert large integer (>127) values without using '<<' or anything similar.
I need a byte representation of the integer values that identify the sequence numbers of packets, in a header sent over UDP. If there is no solution I will introduce two bytes.
Something like: 1, 1 ; 1,2 ; 1,3 ; packet lost ; 1,4 ; packet lost; 2,1 ,2,2
and then reset it upon reaching 127; 127
I can introduce third, but this is rather ugly.
It would be really useful to have a black box in the Java API that does all that byte conversion for me. Is there one?
Thanks,
To pack an unsigned 8-bit value into a byte:
static byte toByte(int i) {
    if ((i < 0) || (i > 255))
        throw new IllegalArgumentException(String.valueOf(i));
    return (byte) i;
}
To convert back:
static int toInt(byte b) {
    return (b < 0) ? (b + 256) : b;
}
After reading your comments on other answers, it sounds like you might want something like this:
byte[] b = BigInteger.valueOf(counter).toByteArray();
and
long counter = new BigInteger(b).longValue();
Since the length of the array would vary as the counter grows, you'd need some way to indicate its length or delimit it. But this technique will convert any integer value to an array of bytes.
Is the problem that you want unsigned bytes, as in, numbers between 128 and 255 inclusive?
That's...tricky. The Java language won't let you directly treat bytes as unsigned...but with library support it gets a little easier. Guava provides an UnsignedBytes utility class for some of these needs. Addition, multiplication, and subtraction are all exactly the same on signed and unsigned bytes.
EDIT: Judging from your additional comments, you might be interested in Ints.toByteArray(int) and the like, which work on types between byte and BigInteger.
According to my understanding, you want to separate an int into 4 bytes. If so, then just copy paste this code:
int i = /* your int */;
int[] b = { (i >> 24) & 0xff, (i >> 16) & 0xff, (i >> 8) & 0xff, i & 0xff };
Indices 0-3 are each of the 4 bytes in the int.
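For the "black box in the Java API" the question asks for, java.nio.ByteBuffer can do the same split without any visible shifting. A minimal sketch, with invented class and method names; ByteBuffer's default big-endian order is assumed, so index 0 holds the most significant byte, as in the snippet above.

```java
import java.nio.ByteBuffer;

// Hypothetical helper; names are illustrative only.
final class IntBytes {

    /** Splits an int into 4 bytes, most significant byte first. */
    static byte[] toBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    /** Rebuilds the int from the first 4 bytes of the array. */
    static int fromBytes(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }
}
```

A sequence number sent as IntBytes.toBytes(n) in a UDP header can be recovered with IntBytes.fromBytes on the receiving side; fromBytes throws BufferUnderflowException if fewer than 4 bytes are supplied.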

unsigned long in java, using BigInteger for arithmetics, but BigInteger.toByteArray returns 14 bytes instead of 8

I have the following C code which I'd like to port to Java:
unsigned long long* data=(unsigned long long*)pBuffer; // file data
unsigned long long crypt = 0x0000;
unsigned long long next_crypt;
unsigned int len = size >> 3;
for(unsigned int i=0; i<len;i++) {
next_crypt = crypt+data[i]-0xCAFEBABE;
data[i] = ((data[i]<<0x1d)|(data[i]>>0x23))+0xCAFEBABE;
data[i] = (data[i]<<0x0e)|(data[i]>>0x32);
data[i] = data[i] - crypt;
crypt = next_crypt;
}
I tried to port this to Java using long, but that resulted in negative values. Therefore I switched to BigInteger, since I have to do arithmetic (bit shifting etc.).
I got the desired 64-bit unsigned long value using BigInteger, but when I converted it to bytes (BigInteger.toByteArray) the result was 14 bytes long instead of 8, so I can no longer modify my array/file. I tried using longValue() but the data was incorrect.
Thanks
Your C code is relying on bits being shifted off the high-order end of the unsigned long long. (These are rotated around to the other end by the other shift.) BigInteger is arbitrary precision and hence has no end, so left-shifted bits are never shifted off.
You could construct a 64-bit BigInteger bitwise AND mask and AND it after the left shifts. This is an intuitive solution.
You could also just simply ignore the high-order bytes.
byte[] bar = foo.toByteArray();
if (bar.length > 8) {
    // keep only the 8 low-order bytes (requires import java.util.Arrays)
    bar = Arrays.copyOfRange(bar, bar.length - 8, bar.length);
}
If len is large, then this simple solution would be wasteful of memory.
In any case there is a saner and higher-performing solution. Java's signed integer types are all guaranteed to have two's complement semantics. The bit semantics for arithmetic with two's complement integers and unsigned integers are identical--the difference is only in the interpretation of the value! So just use the original C code (substituting in Java's long) and at the end, interpret them your way.
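byte[] longToByteArray(long x) {
    byte[] array = new byte[8];
    // fill from the last index backwards, so array[0] ends up as the most significant byte
    for (int i = 7; i >= 0; i--) {
        array[i] = (byte)x;
        x >>>= 8;
    }
    return array;
}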
By the way, be sure you replace the >> operator in the C code with Java's >>> operator.
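Putting that advice together, a port of the loop from the question might look like the sketch below. It rests on several assumptions: the file data has already been decoded into a long[] (the byte order of that decoding is left to the caller), the class and method names are invented, and the shift pairs (<<0x1d | >>0x23 and <<0x0e | >>0x32) are written as Long.rotateLeft by 29 and 14, which is equivalent for 64-bit values. The constant needs the L suffix so it is not sign-extended.

```java
// Hypothetical port; CryptPort and transform are illustrative names.
final class CryptPort {

    private static final long MAGIC = 0xCAFEBABEL; // L suffix keeps the upper 32 bits zero

    /** Transforms the 64-bit words in place, mirroring the C loop from the question. */
    static void transform(long[] data) {
        long crypt = 0L;
        for (int i = 0; i < data.length; i++) {
            long nextCrypt = crypt + data[i] - MAGIC;
            // (x << 29) | (x >>> 35) on a 64-bit value is a left rotation by 29
            long v = Long.rotateLeft(data[i], 29) + MAGIC;
            // (x << 14) | (x >>> 50) is a left rotation by 14
            v = Long.rotateLeft(v, 14);
            data[i] = v - crypt;
            crypt = nextCrypt;
        }
    }
}
```

Addition and subtraction wrap modulo 2^64 for long exactly as they do for unsigned long long, so the resulting bit patterns match; they only print as negative numbers when the top bit is set.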
The nice thing about Java is that it's guaranteed to be twos-complement, so provided you use >>> instead of >> and avoid % and / and inequalities, arithmetic is effectively unsigned anyway.

Read structured data from binary file -?

I know the file structure, suppose this structure is this:
[3-bytes long int],[1-byte long unsigned integer],[4-bytes long unsigned integer]
So the file contains chains of such records.
What is the most elegant way to parse such a file in Java?
Supposedly, we can define a byte[] array of the overall length and read it with an InputStream, but how do we then convert its subelements into correct integer values?
First, a byte value in Java is signed, and we need unsigned values in our case.
Next, are there useful methods that convert a sub-array of bytes, say, the 1st to the 4th byte, into a correct integer value?
I know for sure, there are functions pack & unpack in Perl, that allow you to represent a string of bytes as an expression, let's say "VV" means 2 unsigned long int values. You define such a string and provide it as an argument to a pack or unpack functions, along with the bytes to be packed/unpacked. Are there such things in Java / Apache libs etc ?
Like @Bryan Kyle's example, but shorter. I like shorter, but that doesn't mean clearer; you decide. ;) Note: readByte() is signed and will give unexpected results if not masked with 0xFF.
DataInputStream dis = ...
// assuming BIG_ENDIAN format
int a = dis.read() << 16 | dis.read() << 8 | dis.read();
short b = (short) dis.read();
long c = dis.readInt() & 0xFFFFFFFFL;
or
ByteBuffer bb = ...
bb.position(a_random_position);
int a = (bb.get() & 0xFF) << 16 | (bb.get() & 0xFF) << 8 | (bb.get() & 0xFF);
short b = (short) (bb.get() & 0xFF);
long c = bb.getInt() & 0xFFFFFFFFL;
You may take a look at this sample BinaryReader class which is based on the DataInputStream class.
You should be able to do this using a DataInputStream. It's been a while since I've done much development like this, but the trick I seem to remember is that if there's an impedance mismatch between your input format and the language's data types, you'll need to construct the data byte by byte. In this case, it looks like you'll need to do that because the record has oddly sized fields.
To give you an example to read the first record you might need to do something like this (I'm using a, b, and c for the attributes of the record)
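DataInputStream dis = ...
// readUnsignedByte() returns 0..255, so no sign extension leaks into the result
int a = 0;
a = dis.readUnsignedByte();
a = a << 8;
a = a | dis.readUnsignedByte();
a = a << 8;
a = a | dis.readUnsignedByte();
short b = 0;
b = (short) dis.readUnsignedByte();
long c = 0;
c = dis.readUnsignedByte();
c = c << 8;
c = c | dis.readUnsignedByte();
c = c << 8;
c = c | dis.readUnsignedByte();
c = c << 8;
c = c | dis.readUnsignedByte();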
Obviously, this code could be tightened up by compounding some of the statements, but you get the general idea. What you might notice is that for each of the attributes being read I have to use a primitive that's larger than needed so there aren't any overflow errors. For reference, in Java:
byte = 8 bits, 1 byte
short = 16 bits, 2 bytes
int = 32 bits, 4 bytes
long = 64 bits, 8 bytes
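As a compact variant of the same idea, the 8-byte record from the question can be parsed out of a byte[] with a ByteBuffer. This is only a sketch: the class name RecordParser, the method parse and the choice to return the three fields as a long[] are assumptions, as are big-endian byte order and treating the 3-byte field as unsigned.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical parser; names and the long[] return shape are illustrative only.
final class RecordParser {

    static final int RECORD_SIZE = 8; // 3 + 1 + 4 bytes

    /** Returns {a, b, c} for the record starting at offset, each field read as unsigned. */
    static long[] parse(byte[] data, int offset) {
        ByteBuffer bb = ByteBuffer.wrap(data, offset, RECORD_SIZE).order(ByteOrder.BIG_ENDIAN);
        // 3-byte field: assemble byte by byte, masking each one to avoid sign extension
        long a = ((bb.get() & 0xFFL) << 16) | ((bb.get() & 0xFFL) << 8) | (bb.get() & 0xFFL);
        // 1-byte unsigned field
        long b = bb.get() & 0xFFL;
        // 4-byte unsigned field: read as int, then mask into a long
        long c = bb.getInt() & 0xFFFFFFFFL;
        return new long[] {a, b, c};
    }
}
```

To walk a whole file, call parse with offsets 0, 8, 16 and so on; for little-endian data the 3-byte field would have to be assembled in the opposite order as well as switching the ByteOrder.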
