Efficiently send large int[] over sockets in Java - java

I am working on a Java application where I need to send an array of 500,000 integers from one Android phone to another Android phone over a socket connection as quickly as possible. The main bottleneck seems to be converting the integers so the socket can take them, whether I use ObjectOutputStreams, ByteBuffers, or a low level mask-and-shift conversion. What is the fastest way to send an int[] over a socket from one Java app to another?
Here is the code for everything I've tried so far, with benchmarks on the LG Optimus V I'm testing on (600 MHz ARM processor, Android 2.2).
Low level mask-and-shift: 0.2 seconds
public static byte[] intToByte(int[] input)
{
byte[] output = new byte[input.length*4];
for(int i = 0; i < input.length; i++) {
output[i*4] = (byte)(input[i] & 0xFF);
output[i*4 + 1] = (byte)((input[i] & 0xFF00) >>> 8);
output[i*4 + 2] = (byte)((input[i] & 0xFF0000) >>> 16);
output[i*4 + 3] = (byte)((input[i] & 0xFF000000) >>> 24);
}
return output;
}
Using ByteBuffer and IntBuffer: 0.75 seconds
public static byte[] intToByte(int[] input)
{
ByteBuffer byteBuffer = ByteBuffer.allocate(input.length * 4);
IntBuffer intBuffer = byteBuffer.asIntBuffer();
intBuffer.put(input);
byte[] array = byteBuffer.array();
return array;
}
ObjectOutputStream: 3.1 seconds (I tried variations of this using DataOutputStream and writeInt() instead of writeObject(), but it didn't make much of a difference)
public static void sendSerialDataTCP(String address, int[] array) throws IOException
{
Socket senderSocket = new Socket(address, 4446);
OutputStream os = senderSocket.getOutputStream();
BufferedOutputStream bos = new BufferedOutputStream (os);
ObjectOutputStream oos = new ObjectOutputStream(bos);
oos.writeObject(array);
oos.flush();
bos.flush();
os.flush();
oos.close();
os.close();
bos.close();
senderSocket.close();
}
Lastly, the code I used to send the byte[]: this takes an additional 0.2 seconds on top of the intToByte() functions
public static void sendDataTCP(String address, byte[] data) throws IOException
{
Socket senderSocket = new Socket(address, 4446);
OutputStream os = senderSocket.getOutputStream();
os.write(data, 0, data.length);
os.flush();
senderSocket.close();
}
I'm writing the code on both sides of the socket so I can try any kind of endianness, compression, serialization, etc. There's got to be a way to do this conversion more efficiently in Java. Please help!

As I noted in a comment, I think you're banging against the limits of your processor. As this might be helpful to others, I'll break it down. Here's your loop to convert integers to bytes:
for(int i = 0; i < input.length; i++) {
output[i*4] = (byte)(input[i] & 0xFF);
output[i*4 + 1] = (byte)((input[i] & 0xFF00) >>> 8);
output[i*4 + 2] = (byte)((input[i] & 0xFF0000) >>> 16);
output[i*4 + 3] = (byte)((input[i] & 0xFF000000) >>> 24);
}
This loop executes 500,000 times. Your 600 MHz processor can process roughly 600,000,000 operations per second. So each operation in the loop body, repeated across all 500,000 iterations, will consume roughly 1/1200 of a second.
Again, using very rough numbers (I don't know the ARM instruction set, so there may be more or less per action), here's an operation count:
Test/branch: 5 (retrieve counter, retrieve array length, compare, branch, increment counter)
Mask and shift: 10 x 4 (retrieve counter, retrieve input array base, add, retrieve mask, and, shift, multiply counter, add offset, add to output base, store)
OK, so in rough numbers, this loop takes at best 45/1200 of a second (5 + 4 x 10 = 45 operations per iteration), or about 0.04 seconds. However, you're not dealing with the best-case scenario. For one thing, with an array this large you're not going to benefit from a processor cache, so you'll introduce wait states into every array store and load.
Plus, the basic operations that I described may or may not translate directly into machine code. If not (and I suspect not), the loop will cost more than I've described.
Finally, if you're really unlucky, the JVM hasn't JIT-ed your code, so for some portion (or all) of the loop it's interpreting bytecode rather than executing native instructions. I don't know enough about Dalvik to comment on that.
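The back-of-the-envelope estimate above can be reproduced in a few lines. This is only a sketch of the arithmetic: the per-iteration count (5 + 4 x 10) and the one-operation-per-cycle assumption are the rough figures from this answer, not measurements, and the class and method names are made up for illustration.

```java
final class LoopCostEstimate {
    /** Best-case seconds for a loop, assuming one operation per clock cycle. */
    static double estimateSeconds(long iterations, int opsPerIteration, double clockHz) {
        return iterations * (double) opsPerIteration / clockHz;
    }

    public static void main(String[] args) {
        // Loop overhead (5) plus four mask-and-shift stores (10 each).
        int opsPerIteration = 5 + 4 * 10;
        double seconds = estimateSeconds(500_000, opsPerIteration, 600e6);
        System.out.printf("best case: %.4f s%n", seconds);
    }
}
```

Comparing this figure with the measured 0.2 seconds gives a feel for how much of the time goes to memory stalls and interpretation rather than to the arithmetic itself.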

Java was IMO never intended to be able to efficiently reinterpret a memory region from int[] to byte[] like you could do in C. It doesn't even have such a memory address model.
You either need to go native to send the data or you can try to find some micro optimizations. But I doubt you will gain a lot.
E.g. this could be slightly faster than your version (if it works at all)
public static byte[] intToByte(int[] input)
{
byte[] output = new byte[input.length*4];
for(int i = 0; i < input.length; i++) {
int position = i << 2;
output[position | 0] = (byte)((input[i] >> 0) & 0xFF);
output[position | 1] = (byte)((input[i] >> 8) & 0xFF);
output[position | 2] = (byte)((input[i] >> 16) & 0xFF);
output[position | 3] = (byte)((input[i] >> 24) & 0xFF);
}
return output;
}
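Since you control both ends, it is worth checking any hand-written conversion with a round trip before benchmarking it. The following is a minimal sketch under that assumption: it uses the same least-significant-byte-first layout as the loop above, and byteToInt is a hypothetical inverse for the receiving side (it does not appear anywhere in the question).

```java
final class IntByteRoundTrip {
    /** Packs each int into 4 bytes, least significant byte first. */
    static byte[] intToByte(int[] input) {
        byte[] output = new byte[input.length * 4];
        for (int i = 0; i < input.length; i++) {
            int v = input[i];
            int p = i << 2;
            output[p]     = (byte) v;          // the cast keeps only the low 8 bits
            output[p + 1] = (byte) (v >> 8);
            output[p + 2] = (byte) (v >> 16);
            output[p + 3] = (byte) (v >> 24);
        }
        return output;
    }

    /** Inverse of intToByte: reassembles ints from the same byte layout. */
    static int[] byteToInt(byte[] input) {
        int[] output = new int[input.length / 4];
        for (int i = 0; i < output.length; i++) {
            int p = i << 2;
            // Mask with 0xFF so negative bytes do not sign-extend.
            output[i] = (input[p] & 0xFF)
                    | (input[p + 1] & 0xFF) << 8
                    | (input[p + 2] & 0xFF) << 16
                    | (input[p + 3] & 0xFF) << 24;
        }
        return output;
    }
}
```

Feeding a few edge values (0, -1, Integer.MIN_VALUE, Integer.MAX_VALUE) through intToByte and back through byteToInt should return the original array unchanged.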

I would do it like this:
Socket senderSocket = new Socket(address, 4446);
OutputStream os = senderSocket.getOutputStream();
BufferedOutputStream bos = new BufferedOutputStream(os);
DataOutputStream dos = new DataOutputStream(bos);
dos.writeInt(array.length);
for(int i : array) dos.writeInt(i);
dos.close();
On the other side, read it like:
Socket recieverSocket = ...;
InputStream is = recieverSocket.getInputStream();
BufferedInputStream bis = new BufferedInputStream(is);
DataInputStream dis = new DataInputStream(bis);
int length = dis.readInt();
int[] array = new int[length];
for(int i = 0; i < length; i++) array[i] = dis.readInt();
dis.close();
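To try the length-prefixed framing above without two devices, the same calls can be pointed at in-memory streams. This is a sketch only: writeInts and readInts are hypothetical helper names, and a ByteArrayOutputStream stands in for the socket stream.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class IntArrayFraming {
    /** Writes the array length, then every element, as big-endian ints. */
    static void writeInts(OutputStream out, int[] array) throws IOException {
        DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(out));
        dos.writeInt(array.length);
        for (int v : array) dos.writeInt(v);
        dos.flush(); // push buffered bytes through; the caller decides when to close
    }

    /** Reads back an array written by writeInts. */
    static int[] readInts(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(new BufferedInputStream(in));
        int length = dis.readInt();
        int[] array = new int[length];
        for (int i = 0; i < length; i++) array[i] = dis.readInt();
        return array;
    }
}
```

With a real connection you would pass senderSocket.getOutputStream() and the receiving socket's getInputStream() instead.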

If you're not averse to using a library, you might want to check out Protocol Buffers from Google. It's built for much more complex object serialization, but I'd bet that they worked hard to figure out how to quickly serialize an array of integers in Java.
EDIT: I looked in the Protobuf source code, and it uses something similar to your low-level mask and shift.

Related

How do I pack an INT32 into my MessagePack using Java?

I'm having an issue where I need to send an INT32 to another application, but from what I've read, messagepack.putInt and messagepack.putLong will try to optimize this into UINT32 which is causing problems for the receiving application.
The receiving application is giving me the error message
decode error, skipping message. msgp: attempted to decode type "uint" with method for "int"
I am using maven with the dependency
<dependency>
<groupId>org.msgpack</groupId>
<artifactId>msgpack-core</artifactId>
<version>0.8.13</version>
</dependency>
Someone else had this same issue and stated the solution to it was as follows
"OK, so we found the problem, it seems like metricTank requires the time property of the message object to be serialized as INT32, however the packInt (or packLong) will always try to optimize it into UINT32 which metricTank doesnt like. so we had to use addPayload and serialize MessagePacker.Code.INT32, and then the actual 4 bytes of the time property."
But I am unsure what to do and I am unable to contact the OP.
I have tried the following but it does not work
ByteBuffer buf = ByteBuffer.allocate(1 + Long.BYTES);
buf.put(MessagePack.Code.INT32);
buf.putLong(md.time);
packer.addPayload(buf.array());
The bytes array needs to be 5 in length, first byte is the header, being 0xd2 and the other 4 bytes need to be the value
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(baos);
dos.writeLong(md.time);
dos.close();
byte[] longBytes = baos.toByteArray();
ByteBuffer lBytes = ByteBuffer.allocate(4);
for (int i = 0; i < 4; i++) {
lBytes.put(longBytes[i]);
}
ByteBuffer buf = ByteBuffer.allocate(5);
buf.put((byte) 0xd2);
buf.put(lBytes.array());
This produces no error, but the time value is incorrect when received.
Could someone show me how I can pack an INT32 into my MessagePack rather than UINT32 or show me how I can pack the data in the correct way so it is unpacked correctly on the receiving application?
The receiving application is written in Go and uses the tinylib msgp library to decode the data
// ReadInt64Bytes tries to read an int64
// from 'b' and return the value and the remaining bytes.
// Possible errors:
// - ErrShortBytes (too few bytes)
// - TypeError (not a int)
func ReadInt64Bytes(b []byte) (i int64, o []byte, err error) {
l := len(b)
if l < 1 {
return 0, nil, ErrShortBytes
}
lead := b[0]
if isfixint(lead) {
i = int64(rfixint(lead))
o = b[1:]
return
}
if isnfixint(lead) {
i = int64(rnfixint(lead))
o = b[1:]
return
}
switch lead {
case mint8:
if l < 2 {
err = ErrShortBytes
return
}
i = int64(getMint8(b))
o = b[2:]
return
case mint16:
if l < 3 {
err = ErrShortBytes
return
}
i = int64(getMint16(b))
o = b[3:]
return
case mint32:
if l < 5 {
err = ErrShortBytes
return
}
i = int64(getMint32(b))
o = b[5:]
return
case mint64:
if l < 9 {
err = ErrShortBytes
return
}
i = getMint64(b)
o = b[9:]
return
default:
err = badPrefix(IntType, lead)
return
}
}
This checks the first byte, and if it is equal to mint32 (0xd2), the next four bytes are read as the value using getMint32:
func getMint32(b []byte) int32 {
return (int32(b[1]) << 24) | (int32(b[2]) << 16) | (int32(b[3]) << 8) | (int32(b[4]))
}
In this particular case, the receiving application had to receive an INT32, so the byte array needs to be 5 bytes long. The first byte is the header 0xd2, as seen in the OP, which tells the decoding method that the value is an INT32. The next four bytes are the time value.
I was forgetting that a long is 8 bytes, so taking the first four bytes of the written long sent only its high half. The fix is to send the epoch time in seconds, which fits in an int.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(baos);
dos.writeInt((int)(md.time/1000));
dos.close();
byte[] timeBytes = baos.toByteArray();
ByteBuffer buf = ByteBuffer.allocate(5);
buf.put((byte) 0xd2);//header
buf.put(timeBytes);//time value (int32 bytes not uint32)
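The same five bytes can be built in one step with a ByteBuffer, which is big-endian by default and so matches what getMint32 expects. This is a sketch rather than part of the msgpack-core API: encodeInt32 and encodeEpochSeconds are hypothetical helper names, and it assumes md.time is in milliseconds, as the division by 1000 above suggests.

```java
import java.nio.ByteBuffer;

final class MsgpackInt32 {
    /** MessagePack type byte for a signed 32-bit integer. */
    static final byte INT32_HEADER = (byte) 0xd2;

    /** Returns the header byte followed by the value in big-endian order. */
    static byte[] encodeInt32(int value) {
        return ByteBuffer.allocate(5) // big-endian by default
                .put(INT32_HEADER)
                .putInt(value)
                .array();
    }

    /** Converts epoch milliseconds to seconds; throws if the result does not fit in an int. */
    static byte[] encodeEpochSeconds(long epochMillis) {
        return encodeInt32(Math.toIntExact(epochMillis / 1000));
    }
}
```

The resulting array is what you would hand to packer.addPayload(...) in place of the packInt call.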

Python byte array for Serial communication

I have a program that needs to send a byte array over a serial connection, and I have no idea how to do that in Python.
I found a c/c++/java function which creates the needed byte array:
byte[] floatArrayToByteArray(float[] input)
{
int len = 4*input.length;
int index=0;
byte[] b = new byte[4];
byte[] out = new byte[len];
ByteBuffer buf = ByteBuffer.wrap(b);
for(int i=0;i<input.length;i++)
{
buf.position(0);
buf.putFloat(input[i]);
for(int j=0;j<4;j++) out[j+i*4]=b[3-j];
}
return out;
}
But how can I translate that to Python code?
Edit: the serial data is sent to a device whose firmware I cannot change.
thanks
Put your data in an array (here [0, 1, 2]) and send it with serial.write(). I assume you have already opened the serial port. The array is already converted to raw bytes, so there is nothing to encode (on Python 2, use tostring() instead of tobytes()).
>>> import array
>>> tmp = array.array('B', [0x00, 0x01, 0x02]).tobytes()
>>> ser.write(tmp)
Answered using: Binary data with pyserial(python serial port)
and this: pySerial write() won't take my string
It depends on whether you are sending signed or unsigned values, among other parameters. There is plenty of documentation on this. This is an example I have used in the past.
import struct
x1 = 0x04
x2 = 0x03
x3 = 0x02
x4 = x1 + x2 + x3
input_array = [x1, x2, x3, x4]
write_bytes = struct.pack('<' + 'B' * len(input_array), *input_array)
ser.write(write_bytes)
To understand why I used 'B' and '<', refer to the struct documentation:
https://docs.python.org/2/library/struct.html
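For the float array in the question, it helps to pin down the layout first. The Java function writes each float big-endian and then reverses the four bytes, so its output is little-endian IEEE 754 single precision. The sketch below produces the same bytes more directly; the class name is made up, and the method is intended as an equivalent rewrite, not the original code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class FloatsLittleEndian {
    /** Encodes each float as 4 bytes, least significant byte first. */
    static byte[] floatArrayToByteArray(float[] input) {
        ByteBuffer buf = ByteBuffer.allocate(4 * input.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float f : input) buf.putFloat(f);
        return buf.array();
    }
}
```

In Python's struct module that layout corresponds to the 'f' format character with the '<' byte-order prefix, one 'f' per value.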

processing bytes received from UDP port in java

I am trying to read data from UDP port on localhost using Java. I'm pretty good with Java, but I can't solve this for quite a while now...
The thing is, after I connect using DatagramSocket and receive a packet with DatagramPacket, I get some bytes that make no sense; I can't see any connection with the data I expect. The printout looks like this:
$őZAŇ"¤E€^ĽxΕ’M#ŢúCîS5;Ń8†8Ŕ$5»ôxŕ¸Ţf+?’Ť;Ů%>ż?>żA€ĹĽ‘_
So I'm obviously handling something the wrong way. I've also read about signed/unsigned data problems with Java.
About a year ago I've created a similar app using C#, everything went pretty smooth.
Really hope someone can help.
Here is the code (one of the versions, I've tried a lot of different solutions)
DatagramSocket mySocket = new DatagramSocket(null);
InetSocketAddress addr = new InetSocketAddress("127.0.0.1", 20777);
mySocket.bind(addr);
byte[] receiveData = new byte[152];
while(true)
{
DatagramPacket receivePacket = new DatagramPacket(receiveData, 0, receiveData.length);
mySocket.receive(receivePacket);
byte[] barray = receivePacket.getData();
ByteArrayInputStream inputStream = new ByteArrayInputStream(barray);
DataInputStream dInputStream = new DataInputStream(inputStream);
float a = dInputStream.readFloat();
System.out.println(a);
}
Using this method you can convert a byte array to hexadecimal string representation.
private String bytesToHex(byte[] bytes) {
char[] hexArray = "0123456789ABCDEF".toCharArray();
char[] hexChars = new char[bytes.length * 2];
for ( int j = 0; j < bytes.length; j++ ) {
int v = bytes[j] & 0xFF;
hexChars[j * 2] = hexArray[v >>> 4];
hexChars[j * 2 + 1] = hexArray[v & 0x0F];
}
return new String(hexChars);
}
Hope it helps.
I won't flag your question as a duplicate because it is your first one, but I think you should refer to this other exchange. A very elegant and clear solution to your problem is available.
By the way, a citation of the code reading the section you printed would have been welcome. Good luck...
You need:
1. A specification of the packet format you are receiving.
2. A DataInputStream wrapped around a ByteArrayInputStream wrapped around the byte array you used to build the DatagramPacket, not forgetting to use the constructor that takes an offset and length, which you get from the DatagramPacket.
3. Code that calls the appropriate DataInputStream methods corresponding to (1).
At the moment you don't even appear to have (1). Without that, you haven't got a hope. Just trying to 'make sense' of binary data, especially by just printing it, is a complete waste of your time.
EDIT If, as per your comment, all the fields are floats, just loop over the datagram calling DataInputStream.readFloat() until it throws EOFException:
try
{
while (true)
{
float f = dataInputStream.readFloat();
System.out.println(f);
}
}
catch (EOFException exc)
{
// expected
}
If that doesn't work (i.e. doesn't produce recognizable values), you will have to switch to DatagramChannel and ByteBuffer and experiment with the different byte-order possibilities.
Why you were trying to print floating-point data as though it was text remains a mystery.
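If byte order turns out to be the problem, a ByteBuffer wrapped around the received bytes lets you try both orders without changing the socket code. This is a minimal sketch assuming the payload is nothing but consecutive 4-byte floats, as the comment suggests; the class and method names are made up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class DatagramFloats {
    /** Decodes consecutive 4-byte floats from the valid region of a datagram buffer. */
    static float[] decode(byte[] data, int offset, int length, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(data, offset, length).order(order);
        float[] values = new float[length / 4]; // trailing partial float is ignored
        for (int i = 0; i < values.length; i++) values[i] = buf.getFloat();
        return values;
    }
}
```

With the code from the question this would be called as DatagramFloats.decode(receivePacket.getData(), receivePacket.getOffset(), receivePacket.getLength(), ByteOrder.LITTLE_ENDIAN), switching to ByteOrder.BIG_ENDIAN if the values still look wrong.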

How to read raw bytes from socket in Python?

I have an android java app sending bytes over a socket which is connected to a host machine running a server in Python. I need to receive these bytes as they were sent from the python socket. I see that in Python 'socket.recv' only returns a string. When I send an ASCII string from the java app, I am able to receive the data correctly in the python server, but when I send binary data using java byte, I see the data received is not same. I need to receive raw bytes in Python for my protocol to work correctly. Please point me in right direction.
Code snippet for Sending data on socket:
private void sendFrameMessage(byte[] data) {
byte[] lengthInfo = new byte[4];
Log.v(TAG, "sendFrameMessage");
for(int i=0; i<data.length; i++) {
Log.v(TAG, String.format("data[%d] = %d", i, data[i]));
}
try {
lengthInfo[0] = (byte) data.length;
lengthInfo[1] = (byte) (data.length >> 8);
lengthInfo[2] = (byte) (data.length >> 16);
lengthInfo[3] = (byte) (data.length >> 24);
DataOutputStream dos;
dos = new DataOutputStream(mSocket.getOutputStream());
dos.write(lengthInfo, 0, 4);
dos.write(data, 0, data.length);
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
Python Code on receiver side
def recvFrameMessage(self, s):
    recv_count = 4;
    data = s.recv(recv_count)
    if data == 0:
        return None
    total_rx = len(data)
    lenInfo = data
    while total_rx < recv_count:
        data = s.recv(recv_count - total_rx)
        if data == 0:
            return None
        total_rx += len(data)
        lenInfo = lenInfo + data
    recv_count = self.decodeFrameLen(lenInfo)
    logger.info("length = %d" % recv_count)
    data = s.recv(recv_count)
    total_rx = len(data)
    msg = data
    while total_rx < recv_count:
        data = s.recv(recv_count - total_rx)
        if data == 0:
            return None
        total_rx += len(data)
        msg = msg + data
    logger.info("msg = " + msg)
    for i in range(0, len(msg)-1):
        logger.info("msg[%d] = %s" % (i, msg[i]))
    return msg
@SteveP makes good points for binary data "with some structure", but if this is a plain stream of bytes, in Python 2 simply apply the ord() function to each "character" you get from the socket. For example, if the Java end sends a NUL byte, that will show up on the Python end as the character "\x00", and then:
>>> ord("\x00")
0
To convert a whole string s,
map(ord, s)
returns a list of the corresponding 8-bit unsigned integers.
I'm assuming Python 2 here.
Reading binary data is perfectly doable, but what if the binary representation from your android app is different than the byte representation on the Python server? From the Python documentation:
It is perfectly possible to send binary data over a socket. The major
problem is that not all machines use the same formats for binary data.
For example, a Motorola chip will represent a 16 bit integer with the
value 1 as the two hex bytes 00 01. Intel and DEC, however, are
byte-reversed - that same 1 is 01 00. Socket libraries have calls for
converting 16 and 32 bit integers - ntohl, htonl, ntohs, htons where
“n” means network and “h” means host, “s” means short and “l” means
long. Where network order is host order, these do nothing, but where
the machine is byte-reversed, these swap the bytes around
appropriately.
Without code and example input/output, this question is going to be really difficult to answer. I assume the issue is that the representation is different. The most likely issue is that Java uses big endian, whereas Python adheres to whatever machine you are running it off of. If your server uses little endian, then you need to account for that. See here for a more thorough explanation on endianness.
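To make the byte-order point concrete for the 4-byte length prefix: the sender in the question writes the length least significant byte first, whereas DataOutputStream.writeInt would write the most significant byte first. The sketch below shows both encodings side by side. The class and method names are made up, and how decodeFrameLen interprets the bytes on the Python side is unknown here, so treat this as an illustration of the mismatch rather than a diagnosis.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class LengthPrefix {
    /** Encodes a frame length as 4 bytes in the given byte order. */
    static byte[] encode(int length, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(length).array();
    }

    /** Decodes a 4-byte frame length in the given byte order. */
    static int decode(byte[] prefix, ByteOrder order) {
        return ByteBuffer.wrap(prefix).order(order).getInt();
    }
}
```

Decoding with the wrong order does not fail; it silently yields a different, often very large, length. That is why both ends have to agree on one order explicitly.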

Trying to upload in chunks

I am trying to accomplish a large file upload on a BlackBerry. I am successfully able to upload a file, but only if I read the file and upload it 1 byte at a time. For large files I think this is decreasing performance. I want to be able to read and write something more like 128 KB at a time. If I try to initialise my buffer to anything other than 1, then I never get a response back from the server after writing everything.
Any ideas why I can upload using only 1 byte at a time?
z.write(boundaryMessage.toString().getBytes());
DataInputStream fileIn = fc.openDataInputStream();
boolean isCancel = false;
byte[]b = new byte[1];
int num = 0;
int left = buffer;
while((fileIn.read(b)>-1))
{
num += b.length;
left = buffer - num * 1;
Log.info(num + "WRITTEN");
if (isCancel == true)
{
break;
}
z.write(b);
}
z.write(endBoundary.toString().getBytes());
It's a bug in BlackBerry OS that appeared in OS 5.0, and persists in OS 6.0. If you try using a multi-byte read before OS 5, it will work fine. OS5 and later produce the behavior you have described.
You can also get around the problem by creating a secure connection, as the bug doesn't manifest itself for secure sockets, only plain sockets.
Most input streams aren't guaranteed to fill a buffer on every read. (DataInputStream has a special method for this, readFully(), which will throw an EOFException if there aren't enough bytes left in the stream to fill the buffer.) And unless the file is a multiple of the buffer length, no stream will fill the buffer on the final read. So, you need to store the number of bytes read and use it during the write:
while(!isCancel)
{
int n = fileIn.read(b);
if (n < 0)
break;
num += n;
Log.info(num + "WRITTEN");
z.write(b, 0, n);
}
Your loop isn't correct. You need to use the return value of read: it returns how many bytes were actually read, which isn't always the same as the buffer size.
Edit:
This is how you usually write loops that do what you want:
OutputStream z = null; //Shouldn't be null
InputStream in = null; //Shouldn't be null
byte[] buffer = new byte[1024 * 32];
int len = 0;
while ((len = in.read(buffer)) > -1) {
z.write(buffer, 0, len);
}
Note that you might want to use buffered streams instead of unbuffered streams.
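Here is the same loop as a self-contained method that can be exercised with in-memory streams. It is a sketch in plain Java SE, so it does not address the BlackBerry-specific behaviour mentioned in the other answer; the class name and the bufferSize parameter are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {
    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int len;
        while ((len = in.read(buffer)) > -1) {
            // Write only the bytes this read delivered, not the whole buffer.
            out.write(buffer, 0, len);
            total += len;
        }
        out.flush();
        return total;
    }
}
```

Wrapping the source in a BufferedInputStream and the destination in a BufferedOutputStream before calling copy gives the buffered variant mentioned above.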
