Efficient way of processing byte array which contains a mixture of encodings - java

I have some data in a byte array, retrieved earlier from a network session using non-blocking IO (to facilitate multiple channels).
The format of the data is essentially
varint: length of text
UTF-8: the text
I am trying to figure out a way of efficiently extracting the text, given that its starting position is undetermined (as a varint is variable in length). I have something that's really close but for one small niggle, here goes:
import com.clearspring.analytics.util.Varint;
// Some fields for your info
private final byte replyBuffer[] = new byte[32768];
private static final Charset UTF8 = Charset.forName ("UTF-8");
// ...
// Code which extracts the text
ByteArrayInputStream byteInputStream = new ByteArrayInputStream(replyBuffer);
DataInputStream inputStream = new DataInputStream(byteInputStream);
int textLengthBytes;
try {
textLengthBytes = Varint.readSignedVarInt (inputStream);
}
catch (IOException e) {
// I don't think we should ever get an IOException when using the
// ByteArrayInputStream class
throw new RuntimeException ("Unexpected IOException", e);
}
int offset = byteInputStream.pos(); // ** Here lies the problem **
String textReceived = new String (replyBuffer, offset, textLengthBytes, UTF8);
The idea being that the text offset in the buffer is indicated by byteInputStream.pos(). However that method is protected.
It seems to me that the only way to get the "rest" of the text after decoding the varint is to use something that copies it all into another buffer, but that seems rather wasteful to me.
Constructing the string directly from the underlying buffer should be fine, because after this I don't care anymore for the state of byteInputStream or inputStream. So I am trying to figure out a way to calculate offset, or, put another way, how many bytes Varint.readSignedVarInt consumed. Perhaps there is an efficient method of converting from the integer value returned by Varint.readSignedVarInt to the number of bytes that would have taken up in the encoding?

There are a few ways you can find the offset of the string in the byte array:
You can create a subclass of ByteArrayInputStream that gives you access to the pos field. It has protected access so that subclasses can use it.
If you want something more generally applicable, create a subclass of FilterInputStream that counts the number of bytes that have been read. This is more work and probably not worth the effort though.
Count the number of bytes that encode the varint. There are at most 5.
int offset = 0; while (replyBuffer[offset++] < 0);
Calculate the number of bytes needed to encode a varint. Each byte encodes 7 bits, so you can take the position of the highest 1 bit, divide by 7 (rounding down) and add 1.
// "zigzag" encoding required since you store the length as signed
int textLengthUnsigned = (textLengthBytes << 1) ^ (textLengthBytes >> 31);
int offset = (31 - Integer.numberOfLeadingZeros(textLengthUnsigned)) / 7 + 1;
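The first and last options can be sketched together as follows. This is a minimal illustration, not the library's code: PositionedByteArrayInputStream, VarintTextSketch and its methods are hypothetical names, and readSignedVarInt here is a hand-rolled stand-in for Varint.readSignedVarInt that assumes the usual zigzag plus base-128 encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical subclass that exposes the protected pos field. */
class PositionedByteArrayInputStream extends ByteArrayInputStream {
    PositionedByteArrayInputStream(byte[] buf) {
        super(buf);
    }

    int position() {
        return pos;
    }
}

class VarintTextSketch {
    /** Stand-in for the library call: reads a zigzag-encoded base-128 varint. */
    static int readSignedVarInt(ByteArrayInputStream in) {
        int raw = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new IllegalArgumentException("truncated varint");
            }
            raw |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return (raw >>> 1) ^ -(raw & 1); // undo the zigzag mapping
            }
        }
        throw new IllegalArgumentException("varint longer than 5 bytes");
    }

    /** Number of bytes (1 to 5) that the zigzag varint encoding of value occupies. */
    static int encodedSize(int value) {
        int unsigned = (value << 1) ^ (value >> 31);
        return (31 - Integer.numberOfLeadingZeros(unsigned)) / 7 + 1;
    }

    /** Decodes "varint length + UTF-8 text" straight from the buffer. */
    static String extractText(byte[] buffer) {
        PositionedByteArrayInputStream in = new PositionedByteArrayInputStream(buffer);
        int length = readSignedVarInt(in);
        int offset = in.position(); // number of bytes the varint consumed
        return new String(buffer, offset, length, StandardCharsets.UTF_8);
    }

    /** Helper for the demo: builds a message in the same format. */
    static byte[] encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        int raw = (utf8.length << 1) ^ (utf8.length >> 31);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((raw & ~0x7F) != 0) {
            out.write((raw & 0x7F) | 0x80); // low 7 bits, continuation bit set
            raw >>>= 7;
        }
        out.write(raw);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] message = encode("h\u00e9llo");
        System.out.println(extractText(message));
        System.out.println("varint bytes: " + encodedSize(6));
    }
}
```

With the subclass, the offset comes from the stream itself; encodedSize gives the same number from the decoded length alone, which can be handy when the stream is not under your control.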

Related

Structure data communication between Java and C socket programming

I have a Java socket channel; I'm sending object data through it and receiving it on a C socket.
Java Code::
//structure
class data
{
public String jobtype;
public String budget;
public String time ;
}
//creating a Socket Channel and sending data through it in java
Selector incomingMessageSelector = Selector.open();
SocketChannel sChannel = SocketChannel.open();
sChannel.configureBlocking(false);
sChannel.connect(new InetSocketAddress("localhost", 5000));
sChannel.register(incomingMessageSelector, SelectionKey.OP_CONNECT);
if(sChannel.finishConnect()==true)
{
sChannel.register(incomingMessageSelector, SelectionKey.OP_WRITE);
}
int len = 256;
ByteBuffer buf = ByteBuffer.allocate(len);
buf.putInt(len);
// Writing object of data in socket
buf.put(obj.jobtype.getBytes("US-ASCII"));
buf.put(obj.budget.getBytes("US-ASCII"));
buf.put(obj.time.getBytes("US-ASCII"));
buf.put((byte) 0);
buf.flip();
sChannel.write(buf);
C Code ::
struct data
{
char time[50];
char jobtype[50];
char budget[50];
};
n = read(newsockfd, &size, sizeof(size));
struct data *result = malloc(size);
n = read(newsockfd, result, size);
printf("\njobtype :: %s\nbudget :: %s\ntime :: %s\n",result->jobtype,result->budget,result->time);
After giving input in Java as:
jobtype = h1
budget = 20
time = 12
I'm getting these output in C:
jobtype ::
budget ::
time :: h1
The buffer which you are sending from Java to C needs to have exactly the same definition (from a byte point of view) in both languages. In your code that is not the case. The buffer you construct in Java does not have the same format as the struct you are using in C to interpret that buffer. Both the length of the strings and order of the strings do not match between sender (Java) and receiver (C). In addition, the size of the buffer sent does not match the size of the buffer expected based on the length information sent (i.e. you are not sending the correct length of your buffer).
In C you have defined a structure that is 150 bytes long, containing 3 char arrays (strings) of 50 bytes each, in the order: time, jobtype, budget.
In Java you have created a buffer of variable length with strings of variable length in the order: jobtype, budget, time. Fundamentally, the Java code is creating a variable length buffer where the C code is expecting to map this to a fixed length structure.
While it is not what you desire, your C program is obtaining the jobtype string which you placed first in the buffer and assigns it to time. This is how it is currently written.
Assuming that you leave the C program the same, the portion of your Java code which creates and fills the buffer could look something like:
// Requires: import java.nio.ByteBuffer; import java.nio.charset.StandardCharsets;
public ByteBuffer createFixedLengthCString(String src, int len) {
    // If the string is longer than len-1 it is truncated.
    // StandardCharsets avoids the checked UnsupportedEncodingException
    // that getBytes("US-ASCII") would throw.
    byte[] ascii = src.getBytes(StandardCharsets.US_ASCII);
    ByteBuffer cString = ByteBuffer.allocate(len);
    // Using len-1 prevents the last 0 in the ByteBuffer from being
    // overwritten. A final 0 is needed: C uses null (0) terminated strings.
    cString.put(ascii, 0, Math.min(ascii.length, len - 1));
    // Already have null termination. Do not want to flip (would change length).
    // Reset the position to 0.
    cString.position(0);
    return cString;
}
int maxBufLen = 256;
int payloadLen = 150;
int cStringLen = 50;
ByteBuffer buf = ByteBuffer.allocate(maxBufLen);
//Tell C that the payload is 150 bytes long.
buf.putInt(payloadLen);
// Writing object data in the buffer
buf.put(createFixedLengthCString(obj.time, cStringLen));
buf.put(createFixedLengthCString(obj.jobtype, cStringLen));
buf.put(createFixedLengthCString(obj.budget, cStringLen));
//Use flip() here as it changes the length of bytes sent to the correct
// number (an int plus 150) and sets the position to 0, ready for reading.
buf.flip();
while(buf.hasRemaining()) {
//There is the possibility that a single call to write() will not
// write the entire buffer. Thus, loop until all data is written.
//There should be other conditions which cause us to break out of
// this loop (e.g. a maximum number of write attempts). Without such,
// if the channel is hung this code will hang in this loop; effectively
// a blocking (for this code) write loop.
sChannel.write(buf);
}
This answer is only intended to address the specific malfunction you have identified in the question. However, the code as presented is really only appropriate as an example/test of transmitting limited data from one process to another on the same machine. Even for that there should be exception and error handling which is not included here.
As EJP implied in his comment, it is often better/easier to use already existing protocols when communicating over a bit pipe. These protocols are designed to address many different issues which can become relevant, even in simple inter-process communications.

Converting byte to a single int Android

I am implementing a Bluetooth Android application, where an image is sent from one device to another. The bitmap's byte array is sent and successfully reconstructed at the receiver end. However, I need to send a single integer value together with the bitmap as an index (so the receiver knows what to do with the received bitmap). So basically I want to send this in a byte stream:
int|bitmap
Since I need to transfer an int up to 27, that means it fits into a single byte, right? My current code looks like this:
byte[] ba = new byte[1];
ba[0] = Integer.valueOf(drawableNumber).byteValue(); // drawableNumber value is between 1 and 27
ByteArrayOutputStream bs = new ByteArrayOutputStream(); // create new output stream
try {
    bs.write(ba); // bytes of the integer
    bs.write(bitmapdata); // bytes of the bitmap
} catch (IOException e) {
    // not expected for an in-memory stream
}
mChatService.write(bs.toByteArray()); // that is where bytes are sent to another device
And at the receiver end:
case MESSAGE_READ:
readBuf = (byte[]) msg.obj; // readBuf contains ALL the received bytes using .read method
So my question is, how can I reconstruct the integer and the image I have sent (basically a single byte to a single integer)? I manage to reconstruct the bitmap alone, but I need this additional integer value to know what to do with the received image. The integer value will always be between 0 and 27. I have checked all other answers, but could not find a proper solution.
EDIT: The main question is how to separate the integer bytes from the bitmap bytes in the byte array, because at the receiving end I want to reconstruct the sent integer AND the bitmap separately.
Since converting a byte to an int is a widening conversion, you can just assign the byte to an int variable.
int myInt = ba[0];
When I try this in Java, it confirms what I had in mind when I commented. Your integer fits in a single byte, so in your case it is simply ba[0], or at least it should be, based on your code. A value outside the byte range would need more than one byte. That means it is also the first byte that will be read out of your buffer (or should be).
import java.io.ByteArrayOutputStream;
public class TestClass {
public static void main(String args[]){
byte[] ba = new byte[10];
int myInt = 13;
ByteArrayOutputStream bs = new ByteArrayOutputStream(); //create new output stream
try {
bs.write(myInt); //bytes of the integer
ba = bs.toByteArray(); // put everything into byte array
} finally{};
for(int i = 0; i < ba.length; i++){
System.out.println(i);
System.out.println(ba[i]);
}
}
}
Again I realize this isn't exactly an answer, just too much for a comment.
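To address the EDIT directly, here is a minimal sketch of splitting the received array into the index and the bitmap bytes. It assumes the whole message arrives in one array (as readBuf is described in the question); IndexedPayload and its method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class IndexedPayload {
    /** Sender side: one index byte followed by the image bytes. */
    static byte[] pack(int index, byte[] bitmapData) {
        if (index < 0 || index > 255) {
            throw new IllegalArgumentException("index must fit in one unsigned byte");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(1 + bitmapData.length);
        out.write(index); // write(int) stores only the low 8 bits
        out.write(bitmapData, 0, bitmapData.length);
        return out.toByteArray();
    }

    /** Receiver side: the first byte is the index; mask so values above 127 stay positive. */
    static int index(byte[] received) {
        return received[0] & 0xFF;
    }

    /** Receiver side: everything after the first byte is the image. */
    static byte[] bitmapBytes(byte[] received) {
        return Arrays.copyOfRange(received, 1, received.length);
    }

    public static void main(String[] args) {
        byte[] message = pack(27, new byte[] {10, 20, 30});
        System.out.println("index = " + index(message));
        System.out.println("image bytes = " + Arrays.toString(bitmapBytes(message)));
    }
}
```

On Android the copy can probably be skipped, since BitmapFactory.decodeByteArray accepts an offset and a length: passing readBuf, 1 and readBuf.length - 1 should decode the image in place.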

How to parse byte stream in Java properly

Hello boys and girls.
I'm developing a terminal-based client application which communicates over TCP/IP with a server and sends and receives an arbitrary number of raw bytes. Each byte represents a command which I need to parse into Java classes representing these commands, for further use.
My question is how I should parse these bytes efficiently. I don't want to end up with a bunch of nested ifs and switch-cases.
I have the data classes for these commands ready to go. I just need to figure out the proper way of doing the parsing.
Here's some sample specifications:
Byte stream can be for example in
integers:[1,24,2,65,26,18,3,0,239,19,0,14,0,42,65,110,110,97,32,109,121,121,106,228,42,15,20,5,149,45,87]
First byte is 0x01 which is start of header containing only one byte.
Second one is the length which is the number of bytes in certain
commands, only one byte here also.
The next can be any command where the first byte is the command, 0x02
in this case, and it follows n number of bytes which are included in
the command.
So on. In the end there are checksum related bytes.
Sample class representing the set_cursor command:
/**
* Sets the cursor position.
* Syntax: 0x0E | position
*/
public class SET_CURSOR {
    private final int hexCommand = 0x0e;
    private int position;
    public SET_CURSOR(int position) {
        this.position = position;
    }
    public int getPosition() {
        return position;
    }
    public int getHexCommand() {
        return hexCommand;
    }
}
When parsing byte streams like this the best Design Pattern to use is the Command Pattern. Each of the different Commands will act as handlers to process the next several bytes in the stream.
interface Command {
    // Depending on your situation, either use an InputStream if the
    // commands use an unknown number of bytes, or so many bytes that
    // copying everything would affect performance...
    void execute(InputStream in);
    // ...or use an array if the number of bytes is known and small.
    void execute(byte[] data);
}
Then you can have a map containing each Command object for each of the byte "opcodes".
Map<Byte, Command> commands = ...
commands.put((byte) 0x0e, new SetCursorCommand()); // Byte.parseByte("0x0e", 16) would throw: the "0x" prefix is not accepted
...
Then you can parse the message and act on the Commands:
InputStream in = ... //our byte array as inputstream
byte header = (byte)in.read();
int length = in.read();
byte commandKey = (byte)in.read();
byte[] data = new byte[length];
in.read(data);
Command command = commands.get(commandKey);
command.execute(data);
Can you have multiple Commands in the same byte message? If so you could then easily wrap the Command fetching and parsing in a loop until the EOF.
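If several commands can share one message, the loop could look roughly like this. It is a sketch under assumptions: each handler consumes its own argument bytes from the stream, the 0x02 command and its two arguments are invented for the example, and header, length and checksum handling are left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CommandLoopSketch {
    /** A handler that reads its own argument bytes from the stream. */
    interface Command {
        void execute(InputStream in, List<String> log) throws IOException;
    }

    static final Map<Byte, Command> COMMANDS = new HashMap<>();

    static {
        // 0x0E | position, as in the SET_CURSOR sample from the question
        COMMANDS.put((byte) 0x0e, (in, log) -> log.add("SET_CURSOR " + in.read()));
        // Hypothetical two-argument command, invented for this sketch
        COMMANDS.put((byte) 0x02, (in, log) -> log.add("CMD_02 " + in.read() + " " + in.read()));
    }

    /** Dispatches opcodes until the end of the data and returns what was parsed. */
    static List<String> parse(byte[] body) {
        List<String> log = new ArrayList<>();
        InputStream in = new ByteArrayInputStream(body);
        try {
            int opcode;
            while ((opcode = in.read()) != -1) {
                Command command = COMMANDS.get((byte) opcode);
                if (command == null) {
                    throw new IllegalArgumentException(
                            "unknown opcode 0x" + Integer.toHexString(opcode));
                }
                command.execute(in, log);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(parse(new byte[] {0x0e, 5, 0x02, 65, 26}));
    }
}
```

Adding a command then means adding one map entry; the loop itself never grows a switch statement.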
You can try the JBBP library for that: https://github.com/raydac/java-binary-block-parser
@Bin class Parsed { byte header; byte command; byte[] data; int checksum; }
Parsed parsed = JBBPParser.prepare("byte header; ubyte len; byte command; byte [len] data; int checksum;").parse(theArray).mapTo(Parsed.class);
This is a huge and complex subject.
It depends on the type of the data that you will read.
Is it a looooong stream ?
Is it a lot of small independent structures/objects ?
Do you have some references between structures/objects of your flow ?
I recently wrote a byte serialization/deserialization library for a proprietary software.
I took a visitor-like approach with type conversion, the same way JAXB works.
I define my object as a Java class. Initialize the parser on the class, and then pass it the bytes to unserialize or the Java object to serialize.
The type detection (based on the first byte of your flow) is done forward with a simple case matching mechanism (1 => ClassA, 15 => ClassF, ...).
EDIT: It may be complex or overloaded with code (embedding objects), but keep in mind that nowadays Java optimizes this well, and it keeps the code clear and understandable.
ByteBuffer can be used for parsing a byte stream (see "What is the use of ByteBuffer in Java?"):
byte[] bytesArray = {4, 2, 6, 5, 3, 2, 1};
ByteBuffer bb = ByteBuffer.wrap(bytesArray);
int intFromBB = bb.order(ByteOrder.LITTLE_ENDIAN).getInt();
byte byteFromBB = bb.get();
short shortFromBB = bb.getShort();

Reading a double from a connected thread in Android/Java

I am currently using the GraphView from the developer jjoe64 on GitHub and I was wondering how I would pass the double I created in my BT connected thread class to the GraphView class. This is the original function that generates random data, but I want the serial data from my Bluetooth class.
The current function in this realtime graph is:
private double getRandom() {
double high = 3;
double low = 0.5;
return Math.random() * (high - low) + low;
}
In my Bluetooth class, I have the command ConnectedThread.read(), but it's not really working. Here it is:
public static double read() {
try {
byte[] buffer = new byte[1024];
double bytes = mmInStream.read(buffer);
return bytes;
} catch(IOException e) {
return 5;
}
}
I am not sure if it's just my phone that's too slow, it's running Android 2.3 (Desire HD), but my professor at my school said it should work fine if I just call ConnectedThread.read() and have it equal a double. Any advice?
You haven't provided enough information for an out-of-the-box solution, but I'll give it a shot anyway.
First of all, I presume that mmInStream is an InputStream or its subclass. Look at the API of int InputStream.read(byte[] b):
Reads some number of bytes from the input stream and stores them into the buffer array b. The number of bytes actually read is returned as an integer. This method blocks until input data is available, end of file is detected, or an exception is thrown.
This means that what you're returning from your read() method is just the number of bytes that have been written to the buffer from mmInStream. That is probably not what you want to do. What you probably want to do is read just the value from this stream. To do that you should:
wrap your mmInStream in a DataInputStream just after the mmInStream is created:
mmInStream = yourMethodCreatingInputStream();
dataInStream = new DataInputStream(mmInStream);
read the double value from the dataInStream. But as in all computer systems you must be aware of the exact format that your input value comes in. You must refer to the specification of the device you're using to fetch the input data.
Now the dataInStream comes in handy because it abstracts the necessary low-level IO operations and lets you focus on the data. It will automatically translate your queries for the data to the IO operations. For example:
If your data is in double format (and I believe that is the case according to the words of your professor), your read() method is as simple as:
public static double read() throws IOException {
    return dataInStream.readDouble();
}
And in case the data is coming in the float format:
public static double read() throws IOException {
    return (double) dataInStream.readFloat();
}
But again, be sure to consult the specification of the device you're using for the exact format. Some devices may pass you data in exotic formats like for example: "first 2 bytes are the integer part of the resulting value, second 2 bytes are the fractional part". It is up to you as a consumer of the data to follow its format.
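Both cases can be tried without a device by feeding known bytes through a DataInputStream. The sketch below is only illustrative: SensorValueSketch is a made-up name, and the fixed-point layout (unsigned 16-bit integer part followed by a 16-bit fraction in 1/65536 units) is an assumed interpretation of the "exotic" format described above, not any real device's specification.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class SensorValueSketch {
    /** Reads one big-endian IEEE 754 double (8 bytes), as readDouble() expects. */
    static double decodeDouble(byte[] raw) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return in.readDouble();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Assumed layout: 2 bytes unsigned integer part, then 2 bytes holding
     * the fraction in units of 1/65536.
     */
    static double decodeFixedPoint(byte[] raw) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            int integerPart = in.readUnsignedShort();
            int fraction = in.readUnsignedShort();
            return integerPart + fraction / 65536.0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] asDouble = {0x3F, (byte) 0xF8, 0, 0, 0, 0, 0, 0};
        byte[] asFixedPoint = {0, 3, (byte) 0x80, 0};
        System.out.println(decodeDouble(asDouble));
        System.out.println(decodeFixedPoint(asFixedPoint));
    }
}
```

If the stream ends before enough bytes have arrived, readDouble() and readUnsignedShort() throw an EOFException, which is one quick sign that the assumed format does not match what the device sends.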

Attempt at parsing packets: Is there a Java equivalent to Python's "unpack"?

Is there an equivalent function to this Python function in Java?
struct.unpack(fmt, string)
I'm trying to port a parser written in Python to Java, and I'm looking for a way to implement the following line of code:
handle, msgVer, source, startTime, dataFormat, sampleCount, sampleInterval, physDim, digMin, digMax, physMin, physMax, freq, = unpack(self.headerFormat,self.unprocessed[pos:pos+calcsize(self.headerFormat)])
I'm using this in the context of a project where I receive bytes from the network and need to extract a specific part of the bytes to display them.
[EDIT 2]
The conclusion I had posted as an update was wrong. I deleted it to avoid misleading others.
I am not aware of any real equivalent to Python's unpack in Java.
The conventional approach would be to read the data from a stream (originating either from a socket, or from a byte array read from the socket, via a ByteArrayInputStream) using a DataInputStream. That class has a set of methods for reading various primitives.
In your case, you would do something like:
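DataInputStream in = ...; // wraps the socket stream or a ByteArrayInputStream
byte[] handle = new byte[6]; in.readFully(handle); // readFully takes a byte[], not a char[]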
byte messageVersion = in.readByte();
byte source = in.readByte();
int startTime = in.readInt();
byte dataFormat = in.readByte();
byte sampleCount = in.readByte();
int sampleInterval = in.readInt();
short physDim = in.readShort();
int digMin = in.readInt();
int digMax = in.readInt();
float physMin = in.readFloat();
float physMax = in.readFloat();
int freq = in.readInt();
And then turn those variables into a suitable object.
Note that I've opted to pack each field into the smallest primitive which will hold it; that means putting unsigned values into signed types of the same size. You might prefer to put them in bigger types, so that they keep their sign (e.g. putting an unsigned short into an int); DataInputStream has a set of readUnsignedXXX() methods that you can use for that.
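A small sketch of the "bigger types" variant follows. The three-field layout is hypothetical (it would correspond to the Python format string ">BHI") and UnsignedFieldsSketch is a made-up class. DataInputStream reads big-endian values, and since it has no unsigned 32-bit read, that field is widened to a long by masking.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class UnsignedFieldsSketch {
    final int version;     // unsigned 8-bit value, held in an int
    final int sampleCount; // unsigned 16-bit value, held in an int
    final long startTime;  // unsigned 32-bit value, held in a long

    private UnsignedFieldsSketch(int version, int sampleCount, long startTime) {
        this.version = version;
        this.sampleCount = sampleCount;
        this.startTime = startTime;
    }

    /** Parses the hypothetical 7-byte big-endian layout: u8, u16, u32. */
    static UnsignedFieldsSketch parse(byte[] raw) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            int version = in.readUnsignedByte();
            int sampleCount = in.readUnsignedShort();
            long startTime = in.readInt() & 0xFFFFFFFFL; // widen without sign extension
            return new UnsignedFieldsSketch(version, sampleCount, startTime);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFE,
                      (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        UnsignedFieldsSketch header = parse(raw);
        System.out.println(header.version + " " + header.sampleCount + " " + header.startTime);
    }
}
```

If the Python format string uses little-endian order ("<"), a ByteBuffer with ByteOrder.LITTLE_ENDIAN is probably the easier tool, since DataInputStream cannot change its byte order.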
