Read structured data from binary file - java

I know the file structure, suppose this structure is this:
[3-bytes long int],[1-byte long unsigned integer],[4-bytes long unsigned integer]
So the file contains chains of such records.
What is the most elegant way to parse such a file in Java?
Supposedly, we can define a byte[] array of overall length and read it with InputStream, but how then convert its subelements into correct integer values?
First thing, byte value in java is signed, we need unsigned value in our case.
Next thing, are there useful methods that allow to convert a sub-array of bytes, say, bytes from 1-st to 4-th into a correct integer value?
I know for sure, there are functions pack & unpack in Perl, that allow you to represent a string of bytes as an expression, let's say "VV" means 2 unsigned long int values. You define such a string and provide it as an argument to a pack or unpack functions, along with the bytes to be packed/unpacked. Are there such things in Java / Apache libs etc ?

Like Bryan Kyle's example, but shorter. Shorter doesn't necessarily mean clearer; you decide. ;) Note: readByte() is signed and will give unexpected results if not masked with 0xFF.
DataInputStream dis = ...
// assuming BIG_ENDIAN format
int a = dis.read() << 16 | dis.read() << 8 | dis.read();
short b = (short) dis.read();
long c = dis.readInt() & 0xFFFFFFFFL;
or
ByteBuffer bb = ...
bb.position(a_random_position);
int a = (bb.get() & 0xFF) << 16 | (bb.get() & 0xFF) << 8 | (bb.get() & 0xFF);
short b = (short) (bb.get() & 0xFF);
long c = bb.getInt() & 0xFFFFFFFFL; // ByteBuffer has getInt(), not readInt()
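To show the ByteBuffer variant end to end, here is a minimal sketch that walks a whole byte[] of such records. The names RecordParser and Record are made up for illustration, and it assumes big-endian data and that the 3-byte field is unsigned; adjust if your format differs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

public class RecordParser {
    // Hypothetical holder for one 8-byte record.
    public static final class Record {
        public final int a;   // 3-byte value, 0..16777215 (assumed unsigned)
        public final int b;   // 1-byte unsigned value, 0..255
        public final long c;  // 4-byte unsigned value, 0..4294967295
        Record(int a, int b, long c) { this.a = a; this.b = b; this.c = c; }
    }

    public static List<Record> parse(byte[] data) {
        // BIG_ENDIAN is the ByteBuffer default; set explicitly to document the assumption.
        ByteBuffer bb = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        List<Record> records = new ArrayList<>();
        while (bb.remaining() >= 8) {
            // Operands are evaluated left to right, so the bytes are consumed in file order.
            int a = (bb.get() & 0xFF) << 16 | (bb.get() & 0xFF) << 8 | (bb.get() & 0xFF);
            int b = bb.get() & 0xFF;
            long c = bb.getInt() & 0xFFFFFFFFL;
            records.add(new Record(a, b, c));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = {0x01, 0x02, 0x03, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE};
        for (Record r : parse(data)) {
            System.out.println(r.a + " " + r.b + " " + r.c);
        }
    }
}
```

Any trailing bytes that do not make up a full record are silently ignored here; a real parser would probably want to report them.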

You may take a look at this sample BinaryReader class which is based on the DataInputStream class.

You should be able to do this using a DataInputStream. It's been a while since I've done much development like this, but the trick I seem to remember is that if there's an impedance mis-match between your input format and the language's data types you'll need to construct the data byte by byte. In this case, it looks like you'll need to do that because the data structure has oddly sized structures.
To give you an example to read the first record you might need to do something like this (I'm using a, b, and c for the attributes of the record)
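DataInputStream dis = ...
// readUnsignedByte() returns 0..255, so no sign extension can corrupt the result
int a = 0;
a = dis.readUnsignedByte();
a = a << 8;
a = a | dis.readUnsignedByte();
a = a << 8;
a = a | dis.readUnsignedByte();
short b = 0;
b = (short) dis.readUnsignedByte();
long c = 0;
c = dis.readUnsignedByte();
c = c << 8;
c = c | dis.readUnsignedByte();
c = c << 8;
c = c | dis.readUnsignedByte();
c = c << 8;
c = c | dis.readUnsignedByte();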
Obviously, this code could be tightened up by compounding some of the statements, but you get the general idea. What you might notice is that for each of the attributes being read I have to use a primitive that's larger than needed so there aren't any overflow errors. For reference, in Java:
byte = 8 bits, 1 byte
short = 16 bits, 2 bytes
int = 32 bits, 4 bytes
long = 64 bits, 8 bytes
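The point about needing a wider primitive can be checked with a small sketch. StreamRecordReader and its methods are hypothetical names, the data is assumed big-endian, and the in-memory stream stands in for a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class StreamRecordReader {
    // Reads one record and returns {a, b, c}; long is wide enough for all three fields.
    public static long[] readRecord(DataInputStream dis) throws IOException {
        int a = dis.readUnsignedByte() << 16 | dis.readUnsignedByte() << 8 | dis.readUnsignedByte();
        int b = dis.readUnsignedByte();          // 0..255 does not fit in a signed byte
        long c = dis.readInt() & 0xFFFFFFFFL;    // 0..4294967295 does not fit in a signed int
        return new long[] {a, b, c};
    }

    // Convenience wrapper for in-memory data (illustration only).
    public static long[] parseFirst(byte[] data) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data))) {
            return readRecord(dis);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0x80, (byte) 0x80, 0x00, 0x00, 0x00};
        long[] r = parseFirst(data);
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

readUnsignedByte() and readInt() throw EOFException when the stream ends mid-record, which is one way to detect a truncated file.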

Related

Convert char[] to byte[] without losing 'bits'

I'm developing an Android 2.3.3 application with Java.
This app is a port to Java of iOS code that uses unsigned data types.
On iOS it works with UInt16 and UInt8. In one case, instead of using byte[], I'm using char[].
But now I have to send that char[] as a byte[] using a DatagramPacket.
If one element of the char[] is 128, how can I insert it into a byte[] so that the receiver gets 128? I don't know what happens if I do this:
char c = (char)128;
byte b = (byte)c;
What will the value of b be?
128 = 10000000. b = -1 or b = 127?
How can I convert char[] to byte[] without losing any bits?
In Java char is an unsigned 16-bit quantity. So you can directly convert your uint16 to char without doing anything else.
For unsigned 8-bit quantity you have 2 options:
Use byte. It also holds 8 bits. You don't lose any bits just because it is signed. However, if you do arithmetic with it you need to remember that Java will scale byte up automatically to an int and sign-extend it. To prevent this just always mask it like this:
byte b;
int foo = 5 * (b & 0xFF);
Use char. It is unsigned and can hold 16 bits so the 8 bits will fit in there quite nicely. To put a byte into a char just do this:
byte b;
char c = (char)(b & 0xFF); // Mask off the low-order 8 bits
To put a char into a byte just do:
char c;
byte b = (byte)c; // Truncates to 8 bits
Be aware that byte in Java is signed, so that whenever you do arithmetic with it you need to mask the low-order 8 bits only (to prevent sign-extension). Like this:
byte b;
int foo = (b & 0xFF);
You can do most of the normal bitwise operations with a byte without having to mask; only a right shift needs the mask, and a result assigned back to a byte needs a cast:
byte b;
if ((b & 0x80) != 0) ... // Test a flag
b |= 0x40; // Set a flag
b ^= 0x20; // Flip a flag from 0 to 1 or 1 to 0
b &= ~0x10; // Clear a flag
byte x = (byte) (b << 3); // Shift left 3 bits and assign
byte y = (byte) ((b & 0xFF) >>> 4); // Shift right 4 bits and assign (WITHOUT sign extension)
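The original question (what does (byte) 128 give?) can be answered by running the cast. This is a small sketch; CastDemo and its method names are made up for illustration.

```java
public class CastDemo {
    // Narrowing cast: keeps the low 8 bits, which Java then interprets as signed.
    public static byte toByte(char c) {
        return (byte) c;
    }

    // Recovers the unsigned value 0..255 from those same 8 bits.
    public static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = toByte((char) 128);
        System.out.println(b);              // prints -128
        System.out.println(toUnsigned(b));  // prints 128
    }
}
```

So b is neither -1 nor 127: the bit pattern 10000000 is preserved and reads as -128 when signed. The receiver gets the same 8 bits and can mask with 0xFF to see 128 again.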
I think you need to rethink your approach so you don't end up needing to convert char[] to byte[].
If your data really is characters, then you want to look at various serialization techniques, such as using new String(char[]) to create a string and then using getBytes(Charset) to get the bytes as encoded by a given Charset (because, of course, the same characters result in different bytes when encoded in ASCII or UTF-8 or UTF-16, etc.).
But from your question, it sounds like you're not really using characters, you're just using char as a 16-bit type. If so, doing the conversion isn't difficult, something along these lines:
byte[] toBytes(char[] chars) {
byte[] bytes = new byte[chars.length * 2];
int ci, bi;
char ch;
bi = 0;
for (ci = 0; ci < chars.length; ++ci) {
ch = chars[ci];
bytes[bi++] = (byte)((ch & 0xFF00) >> 8);
bytes[bi++] = (byte)(ch & 0x00FF);
}
return bytes;
}
Swap the order of the two assignments (low byte first, then high byte) if you want the result to be little-endian instead.
But again, I would look at your overall approach and try to avoid this.
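If you do go this way, the receiver needs the inverse conversion. A minimal sketch, assuming the same big-endian layout as toBytes above; the class name CharBytes and the method toChars are hypothetical.

```java
import java.util.Arrays;

public class CharBytes {
    // Same big-endian layout as the answer's toBytes: high byte first.
    public static byte[] toBytes(char[] chars) {
        byte[] bytes = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++) {
            bytes[2 * i] = (byte) (chars[i] >> 8);
            bytes[2 * i + 1] = (byte) chars[i];
        }
        return bytes;
    }

    // Inverse: mask each byte so sign extension does not leak into the high bits.
    public static char[] toChars(byte[] bytes) {
        char[] chars = new char[bytes.length / 2];
        for (int i = 0; i < chars.length; i++) {
            chars[i] = (char) (((bytes[2 * i] & 0xFF) << 8) | (bytes[2 * i + 1] & 0xFF));
        }
        return chars;
    }

    public static void main(String[] args) {
        char[] original = {(char) 128, (char) 0xFFFF, 'A'};
        char[] roundTrip = toChars(toBytes(original));
        System.out.println(Arrays.equals(original, roundTrip)); // prints true
    }
}
```

Both sides have to agree on the byte order; nothing in the byte[] itself records which one was used.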

short to byte and byte to short conversion in Android

I am developing a software in Android. In a particular portion of software, I need to convert short to byte and re-convert to it to short. I tried below code but values are not same after conversion.
short n, n1;
byte b1, b2;
n = 1200;
// short to bytes conversion
b1 = (byte)(n & 0x00ff);
b2 = (byte)((n >> 8) & 0x00ff);
// bytes to short conversion
n1 = (short)((short)(b1) | (short)(b2 << 8));
After executing the code, the values of n and n1 are not the same. Why?
I did not get Graham's solution to work. This, however, does work:
n1 = (short)((b1 & 0xFF) | b2<<8);
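Why the mask matters: 1200 is 0x04B0, so the low byte is 0xB0, which is negative as a Java byte. Without the mask it is sign-extended and its 1 bits overwrite the high byte. A small sketch (ShortRoundTrip and its method names are made up for illustration):

```java
public class ShortRoundTrip {
    // The question's version: lo is sign-extended before the OR.
    public static short joinUnmasked(byte lo, byte hi) {
        return (short) ((short) lo | (short) (hi << 8));
    }

    // The working version: mask lo so only its 8 bits take part in the OR.
    public static short joinMasked(byte lo, byte hi) {
        return (short) ((lo & 0xFF) | (hi << 8));
    }

    public static void main(String[] args) {
        short n = 1200;
        byte b1 = (byte) (n & 0xFF);
        byte b2 = (byte) ((n >> 8) & 0xFF);
        System.out.println(joinUnmasked(b1, b2)); // prints -80
        System.out.println(joinMasked(b1, b2));   // prints 1200
    }
}
```

The high byte does not need a mask here because the shift moves its sign-extension bits above bit 15, and the final cast to short discards them.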
You can use a ByteBuffer:
final ByteBuffer buf = ByteBuffer.allocate(2);
buf.putShort(shortValue);
buf.position(0);
// Read back bytes
final byte b1 = buf.get();
final byte b2 = buf.get();
// Put them back...
buf.position(0);
buf.put(b1);
buf.put(b2);
// ... Read back a short
buf.position(0);
final short newShort = buf.getShort();
edit: fixed API usage. Gah.

How do you append two bytes to an int

I'm trying to append two bytes that have hex values and store them into an integer. So obviously everything will be unsigned values.
I'll provide an example since that is much easier to see.
two bytes
0x20 0x07
Integer
Edit: Oops I made a huge mistake here. Sorry for all the confusion.
I want integer to store 2007 not 0x2007. I'm really sorry about that.
Is there way to do this without converting the byte to String and append and switch to int?
or is converting to String is the only way?
You can try
byte b1 = (byte) 0x90;
byte b2 = (byte) 0xF7;
int i = ((b1 & 0xFF) << 8) | (b2 & 0xFF);
However, if you are using a DataInputStream or a ByteBuffer you usually don't need to do this. Just use readShort (or readUnsignedShort) on the stream, or getShort on the buffer.
Yes, just shift b1 by 8 bits and add it to b2:
byte b1 = 0x20;
byte b2 = 0x07;
int i1 = ((b1 & 0xFF) << 8) + (b2 & 0xFF); // gives 0x2007; the masks keep it correct for bytes >= 0x80
// alternatively
int sameInt = (b1 & 0xFF) * 256 + (b2 & 0xFF); // gives 0x2007
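For the edited question (0x20 0x07 should become decimal 2007, without going through a String), one reading is that the bytes are packed BCD, i.e. each hex digit is a decimal digit. That is an assumption about the data, not something the question states. Under it, a sketch could look like this (BcdDecode is a hypothetical name):

```java
public class BcdDecode {
    // Treats each nibble as one decimal digit: 0x20 -> 20, 0x07 -> 7.
    static int twoDigits(byte b) {
        return ((b >> 4) & 0x0F) * 10 + (b & 0x0F);
    }

    // Combines two packed-BCD bytes into a four-digit decimal number.
    public static int decode(byte hi, byte lo) {
        return twoDigits(hi) * 100 + twoDigits(lo);
    }

    public static void main(String[] args) {
        System.out.println(decode((byte) 0x20, (byte) 0x07)); // prints 2007
    }
}
```

There is no validation here: a nibble above 9 (for example 0x2A) is not valid BCD and would just produce a meaningless number.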

send a int from objective-c to java by socket,but the value changed in java side

I send an int in an NSData like this:
NSData* dataLength = [[NSData alloc] initWithBytes:&theInt length:sizeof(theInt)];
then in java side, I get a int like this:
int theInt = aInputStreamOfSocket.readInt();
But the value changed! In my case, I send 1225516 and get 749933056.
What's the problem?
Your trouble is a difference in endianness. Intel based processors use little-endian byte order while network based transports are almost always big-endian. Java thus expects big-endian for readInt(). Ideally you find a way to send the int as big-endian to conform to expected behavior. I however don't have that code offhand, so here's how to read little-endian on the Java side:
int ch1 = aInputStreamOfSocket.read();
int ch2 = aInputStreamOfSocket.read();
int ch3 = aInputStreamOfSocket.read();
int ch4 = aInputStreamOfSocket.read();
if ((ch1 | ch2 | ch3 | ch4) < 0) {
throw new EOFException();
}
int theInt = ch1 + (ch2 << 8) + (ch3 << 16) + (ch4 << 24);
Let's look at the hex for both of those numbers
1225516 = 0x0012B32C
749933056 = 0x2CB31200
You can see that the byte order (a.k.a. endianness) is reversed.
Generally, if you're sending data over a socket, you convert from the local byte order to network byte order with the functions htonl, htons, etc. On the receiving end, you convert from network byte order back to the local byte order. In java, you can do this by setting the byte order on the buffer with ByteBuffer#order(ByteOrder)
See this question also.
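The ByteBuffer#order approach can be sketched as follows, using the four bytes from the hex dump above. LittleEndianInt and its method names are hypothetical; in real code the bytes would come from the socket rather than a literal array.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianInt {
    // Interprets four bytes as a little-endian int (what the sender wrote).
    public static int readLittleEndian(byte[] fourBytes) {
        return ByteBuffer.wrap(fourBytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // Interprets the same bytes as big-endian (what readInt() does).
    public static int readBigEndian(byte[] fourBytes) {
        return ByteBuffer.wrap(fourBytes).order(ByteOrder.BIG_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        byte[] wire = {0x2C, (byte) 0xB3, 0x12, 0x00};
        System.out.println(readLittleEndian(wire)); // prints 1225516
        System.out.println(readBigEndian(wire));    // prints 749933056
    }
}
```

Converting on the sending side to network byte order is still the more conventional fix, since then readInt() works unchanged.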

Converting Little Endian to Big Endian

All,
I have been practicing coding problems online. Currently I am working on a problem where we need to convert between big endian and little endian. But I am not able to work out the steps for the given example:
123456789 converts to 365779719
The logic I am considering is :
1 > Get the integer value (Since I am on Windows x86, the input is Little endian)
2 > Generate the hex representation of the same.
3 > Reverse the representation and generate the big endian integer value
But I am obviously missing something here.
Can anyone please guide me. I am coding in Java 1.5
Since a great part of writing software is about reusing existing solutions, the first thing should always be a look into the documentation for your language/library.
reverse = Integer.reverseBytes(x);
I don't know how efficient this function is, but for toggling lots of numbers, a ByteBuffer should offer decent performance.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
...
int[] myArray = aFountOfIntegers();
ByteBuffer buffer = ByteBuffer.allocate(myArray.length*Integer.BYTES);
buffer.order(ByteOrder.LITTLE_ENDIAN);
for (int x:myArray) buffer.putInt(x);
buffer.order(ByteOrder.BIG_ENDIAN);
buffer.rewind();
// relative getInt() reads the next value; getInt(x) would treat x as a byte index
for (int i = 0; i < myArray.length; i++) myArray[i] = buffer.getInt();
As eversor pointed out in the comments, ByteBuffer.putInt() is documented as an optional operation: it throws ReadOnlyBufferException if the buffer is read-only.
The DIY Approach
Stacker's answer is pretty neat, but it is possible to improve upon it.
reversed = (i&0xff)<<24 | (i&0xff00)<<8 | (i&0xff0000)>>8 | (i>>24)&0xff;
We can get rid of the parentheses by adapting the bitmasks. E. g., (a & 0xFF)<<8 is equivalent to a<<8 & 0xFF00. The rightmost parentheses were not necessary anyway.
reversed = i<<24 & 0xff000000 | i<<8 & 0xff0000 | i>>8 & 0xff00 | i>>24 & 0xff;
Since the left shift shifts in zero bits, the first mask is redundant. We can get rid of the rightmost mask by using the logical shift operator, which shifts in only zero bits.
reversed = i<<24 | i>>8 & 0xff00 | i<<8 & 0xff0000 | i>>>24;
Operator precedence matters here (shifts bind tighter than &, which binds tighter than |); the gritty details on shift operators are in the Java Language Specification.
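Since the final expression relies on precedence and on the difference between >> and >>>, it is worth checking it against Integer.reverseBytes, including for a value with the sign bit set. A small sketch (SwapCheck is a made-up name):

```java
public class SwapCheck {
    // The fully reduced expression from above, unchanged.
    public static int swap(int i) {
        return i << 24 | i >> 8 & 0xff00 | i << 8 & 0xff0000 | i >>> 24;
    }

    public static void main(String[] args) {
        int[] samples = {123456789, 0x80000001, -1, 0, 27};
        for (int x : samples) {
            // Each line should end in "true" if the DIY version matches the library.
            System.out.println(Integer.toHexString(x) + " -> "
                    + Integer.toHexString(swap(x)) + " "
                    + (swap(x) == Integer.reverseBytes(x)));
        }
    }
}
```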
Check this out
int little2big(int i) {
return (i&0xff)<<24 | (i&0xff00)<<8 | (i&0xff0000)>>8 | (i>>24)&0xff;
}
The thing you need to realize is that endian swaps deal with the bytes that represent the integer. So the 4 byte number 27 looks like 0x0000001B. To convert that number, it needs to go to 0x1B000000... With your example, the hex representation of 123456789 is 0x075BCD15 which needs to go to 0x15CD5B07 or in decimal form 365779719.
The function Stacker posted is moving those bytes around by bit shifting them; more specifically, the statement i&0xff takes the lowest byte from i, the << 24 then moves it up 24 bits, so from positions 1-8 to 25-32. So on through each part of the expression.
For example code, take a look at this utility.
Java primitive wrapper classes support byte reversing since 1.5 using reverseBytes method.
Short.reverseBytes(short i)
Integer.reverseBytes(int i)
Long.reverseBytes(long i)
Just a contribution for those who are looking for this answer in 2018.
I think this can also help:
int littleToBig(int i)
{
int b0,b1,b2,b3;
b0 = (i&0x000000ff)>>0;
b1 = (i&0x0000ff00)>>8;
b2 = (i&0x00ff0000)>>16;
b3 = (i&0xff000000)>>>24; // unsigned shift: >> would sign-extend when the top bit is set
return ((b0<<24)|(b1<<16)|(b2<<8)|(b3<<0));
}
Just use the static function (reverseBytes(int i)) in java which is under Integer Wrapper class
Integer i=Integer.reverseBytes(123456789);
System.out.println(i);
output:
365779719
The following method reverses the order of the bits in a byte value:
public static byte reverseBitOrder(byte b) {
int converted = 0x00;
converted ^= (b & 0b1000_0000) >> 7;
converted ^= (b & 0b0100_0000) >> 5;
converted ^= (b & 0b0010_0000) >> 3;
converted ^= (b & 0b0001_0000) >> 1;
converted ^= (b & 0b0000_1000) << 1;
converted ^= (b & 0b0000_0100) << 3;
converted ^= (b & 0b0000_0010) << 5;
converted ^= (b & 0b0000_0001) << 7;
return (byte) (converted & 0xFF);
}
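A shorter alternative, if you prefer the library: Integer.reverse reverses all 32 bits, so the byte's 8 bits end up in the top byte and an unsigned shift brings them back down. This is only a sketch; BitReverse is a hypothetical name.

```java
public class BitReverse {
    // The low 8 bits of b land in bits 24..31 after Integer.reverse; >>> 24 extracts them.
    // Sign-extension bits of b end up in the low 24 bits and are shifted out.
    public static byte reverseBits(byte b) {
        return (byte) (Integer.reverse(b) >>> 24);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(reverseBits((byte) 0b0000_0110) & 0xFF)); // prints 1100000
    }
}
```

Note that this reverses bits within a byte, which is a different operation from the byte-order swap discussed in the rest of this thread.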
