Transmitting files of 64k size through sockets fails - java

I am developing an application that uses Java sockets between a server app and a client app. I need to send files of about 64KB from the client to the server through these sockets. When I run the whole system locally (both server and client) everything works fine, but when I run the server and client on different machines it fails.
I am using JSON to process the file content, so the exception thrown on the server is "net.sf.json.util.JSONTokener.syntaxError". However, the problem is not JSON, it is the size of the file. When I send files smaller than 8KB everything works, but bigger sizes truncate the sent information, so the server throws a JSONTokener.syntaxError when it tries to interpret the truncated data it received.
I am defining a socket buffer of 64KB as follows (I am using the NIO API):
SocketChannel sc;
private static final int BUFFER_SIZE = (int) Math.pow(2, 16);
.....
sc.socket().setReceiveBufferSize(BUFFER_SIZE);
sc.socket().setSendBufferSize(BUFFER_SIZE);
What do I need to do to enlarge the network packet size when I run my system remotely? Do you have any idea what the problem is?
Thank you very much in advance.
Oscar

The buffer sizes are typically over 64KB, so you could actually be shrinking them.
The MTU for TCP packets is typically 1.5 KB, and changing it is highly unlikely to help you. In any case, 9000 bytes is the most I have ever seen it set to.
The problem could be in the way you are writing the data, but I suspect it's in the way you are reading it. It is a common mistake to assume you will receive the same size of data you sent, or that you will receive it all at once.
Streams do not "know" what size of data you wrote, and you don't know when all the data sent has been received unless you have a protocol which includes the length.
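For illustration, here is a minimal sketch of such a length-prefixed protocol over plain blocking streams; the method names and the 4-byte length prefix are assumptions for the example, not something from the question:
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;

// Sender: write the payload length first, then the payload itself.
static void sendMessage(Socket socket, byte[] payload) throws IOException {
    DataOutputStream out = new DataOutputStream(socket.getOutputStream());
    out.writeInt(payload.length);    // 4-byte length prefix
    out.write(payload);
    out.flush();
}

// Receiver: read the length, then keep reading until exactly that many bytes arrive.
static byte[] receiveMessage(Socket socket) throws IOException {
    DataInputStream in = new DataInputStream(socket.getInputStream());
    int length = in.readInt();
    byte[] payload = new byte[length];
    in.readFully(payload);           // loops internally until the full payload is read
    return payload;
}
With something like this in place, a 64KB JSON payload arrives intact regardless of how TCP fragments it on the wire.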

Related

How to speed up data transfer over socket?

Currently I am using this code on both the server and client side. The client is an Android device.
BufferedOutputStream os = new BufferedOutputStream(socket.getOutputStream(), 10000000);
BufferedInputStream sin = new BufferedInputStream(socket.getInputStream(), 10000000);
os.write("10000000\n".getBytes());
os.flush();
for (int i = 0; i < 10000000; i++) {
    os.write((sampleRead[i] + " ").getBytes());
}
os.flush();
The problem is that this code takes about 80 seconds to transfer the data from the Android client to the server, while it takes only 8 seconds to transfer the data back from server to client. The code is the same on both sides and the buffer is also the same. I also tried different buffer sizes, but the problem is with this segment:
for (int i = 0; i < 10000000; i++) {
    os.write((sampleRead[i] + " ").getBytes());
}
The buffering takes most of the time, while the actual transfer takes only about 6-7 seconds on a 150 Mbps hotspot connection. What could be the problem and how can I solve it?
First of all, as a commenter has already noted, using a monstrously large buffer is likely to be counterproductive. Once your stream buffer is bigger than the size of a network packet, app-side buffering loses its effectiveness. (The data in your "big" buffer needs to be split into packet-sized chunks by the TCP/IP stack before it goes onto the network.) Indeed, if the app-side buffer is really large, you may find that your data gets stuck in the buffer for a long time waiting for the buffer to fill ... while the network is effectively idle.
(The Buffered... readers, writers and streams are primarily designed to avoid lots of syscalls that transfer tiny amounts of data. Above 10KB or so, the buffering doesn't help performance much.)
The other thing to note is that in a lot of OS environments, the network throughput is actually limited by virtualization and default network stack tuning parameters. To get better throughput, you may need to tune at the OS level.
Finally, if your data is going over a network path that is congested, has a high end-to-end latency, or has links with constrained data rates, then you are unlikely to get fast transfers no matter how you tune things.
(Compression might help ... if you can afford the CPU overhead at both ends ... but some data links already do compression transparently.)
You could compress the data before transferring it; it will save a lot of memory, and transferring a compressed stream is cheaper. For that you need to implement compression logic on the client side and decompression logic on the server side; see GZIPOutputStream and GZIPInputStream. Also try reducing the buffer size, which is huge for a mobile device.
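As a rough illustration, here is a minimal sketch of wrapping the socket streams in GZIP streams with a much smaller buffer; the 8KB buffer size and the 'data' array are assumptions for the example, not taken from the question:
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Client side: compress what you write.
GZIPOutputStream gzOut = new GZIPOutputStream(
        new BufferedOutputStream(socket.getOutputStream(), 8 * 1024));
gzOut.write(data);       // 'data' is the byte[] to send (assumed for this sketch)
gzOut.finish();          // push out the remaining compressed bytes

// Server side: decompress what you read.
GZIPInputStream gzIn = new GZIPInputStream(
        new BufferedInputStream(socket.getInputStream(), 8 * 1024));
byte[] chunk = new byte[8 * 1024];
int n;
while ((n = gzIn.read(chunk)) != -1) {
    // process n bytes from 'chunk'
}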

How can I increase the throughput in this simple Java-based TCP server application?

I am writing a very basic TCP server. The server keeps track of the state it receives from clients. I documented the message format and published the source. On a 2009 MacBookPro (2.26 GHz Core 2 Duo, 4 GB RAM), the throughput is very low - 1 MB/s if server and client run on the same machine. I am looking for ways to dramatically increase the throughput.
Both the server's main loop and the client's are fairly straightforward. After establishing the connection with the server, the client creates instances of UpdateOneMessage and sends their byte[] representation to the server. From Client.run():
for (int i = 0; i < maxMessageCount; i++) {
    send(new UpdateOneMessage(1 + i, id, "updatedState"));
    // .. read response
}
Client.send() serializes the message and writes to the DataOutputStream.
private int send(final Message message) throws Exception {
    final byte[] bytes = message.serialize();
    out.write(bytes);
    out.flush();
    return bytes.length;
}
Profiling client and server with JVM Monitor showed that CPU time was dominated by reading from the InputStreamReader and writing to the DataOutputStream. But at 1 MB/s, this application is not even close to being IO-bound.
What throughput can I expect from my app, considering that each message is fairly small (55 bytes on average)?
What else can I do to find the bottlenecks in this simple application?
The following code
send(new UpdateOneMessage(1 + i, id, "updatedState"));
// .. read response
suggests that you switch the direction of the traffic on each message. That is, you wait for the response to each request before sending the next one. This architecture puts a hard constraint on how fast you can go: the latency each message experiences limits the overall throughput of your server.
If you move the client and server to two different locations with some distance between them, you will see an even slower transfer rate.
With, say, 1500 km of network between them, the speed of light alone caps you at roughly 100 round trips per second (a 3000 km round trip at 300,000 km/s is about 10 ms, and real networks are slower than that). With 55 bytes per message, that's only about 5.5 KB per second.
If you need faster transfers you can do several things.
The most obvious fix is to increase the message size. This gains the most over longer distances.
Don't wait for responses before sending the next message. This can increase throughput tremendously; see the sketch after this list.
Use a new connection + thread for each request. This way you can have several parallel requests underway at the same time.
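A rough sketch of the second idea, pipelining requests instead of strictly alternating send/receive. The readResponse() helper and the separate reader thread are assumptions for this example; send(), out, id and maxMessageCount are from the original code:
// Pipelined variant of the client loop: one thread keeps writing requests while
// another drains responses, so no round trip is spent waiting.
void runPipelined() throws Exception {
    Thread responseReader = new Thread(() -> {
        try {
            for (int i = 0; i < maxMessageCount; i++) {
                readResponse();    // hypothetical helper: consumes one response from the input stream
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    });
    responseReader.start();

    for (int i = 0; i < maxMessageCount; i++) {
        send(new UpdateOneMessage(1 + i, id, "updatedState"));
    }
    out.flush();
    responseReader.join();         // wait until every response has been read
}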
For speed you are better off using another protocol that saves you both on the number of bytes sent and on processing time.
For example, Google Protocol Buffers are fast and bandwidth-efficient.
http://code.google.com/p/protobuf/
Or, if the objects really are as small as you say, just hand-encode them with a custom protocol.
The aim is to get both the processing needed and the number of bytes sent over the network down to the minimum possible.
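For instance, a hand-rolled binary encoding might look something like this; the field types are assumptions about what UpdateOneMessage contains, since its definition isn't shown here:
import java.io.DataOutputStream;
import java.io.IOException;

// Compact binary encoding: fixed-width integers plus one length-prefixed string,
// instead of a general-purpose serialization format.
void writeMessage(DataOutputStream out, int sequence, long clientId, String state)
        throws IOException {
    out.writeInt(sequence);
    out.writeLong(clientId);
    out.writeUTF(state);    // writes a 2-byte length followed by modified UTF-8 bytes
}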

Recommended TCP buffer size? Recommended to break it up?

I am writing an application which grabs an XML file from the server and then works with the data inside. My question is: because TCP ensures that all packets arrive, and it is beyond my control how it breaks the data apart, does it make sense to cap the buffer size? If so, I can send the data over in chunks and reassemble them on the client side. Obviously I cannot make an infinite buffer. The XML can get fairly large, up to 256KB, and I am a bit worried about reserving a buffer of that size. The data is pulled by an Android device, but we can assume the device has 1 GB of RAM.
The TCP receive buffer size has nothing to do with the size of the data being transferred. Obviously, you can transport gigabytes of data over TCP streams and that doesn't require the buffer to be of the same size. The buffer size generally has to do with performance (both network and processor on the endpoints) and can be small - you probably don't have to change the default settings in most cases.
You don't need to reassemble it at the client side yourself. Just attach an XML parser directly to the socket InputStream.
The default buffers in the network stack are generally tuned to be good on average. Unless your application is particularly unusual (which it does not sound like), you are better off not changing the buffer size. The fact that the two endpoints are different also creates a tension that prevents easily choosing anything more optimal for both simultaneously.
As suggested, if you use a streaming parser on the receiving side, the buffer size does not really matter. Send the messages as soon as you have them ready to reduce the latency caused by batching the entire document.
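A minimal sketch of attaching a streaming (SAX) parser directly to the socket, as suggested above; the host, port and element handling are placeholders, and exception handling is omitted:
import java.io.InputStream;
import java.net.Socket;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Parse the XML as it streams in from the socket; the whole 256KB document
// never has to sit in one buffer on the client.
Socket socket = new Socket("example.com", 8080);    // hypothetical host and port
InputStream in = socket.getInputStream();
SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
parser.parse(in, new DefaultHandler() {
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // handle each element as it arrives
    }
});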

ReadableByteChannel.read(ByteBuffer dest) reads capped at 8KB. Why?

I've got some code that:
reads from a ReadableByteChannel into a ByteBuffer,
takes note of the bytes transfered,
pauses a few tens to hundreds of milliseconds,
passes the ByteBuffer onto a WritableByteChannel.
Some details:
Both Channels are TCP/IP sockets.
The total connection read size is in the tens of megabytes.
The source socket (which the ReadableByteChannel is getting bytes from) is on the same machine.
Debian Lenny 64-bit on HP DL380s
Sun Java 1.6.0 update 20
The problem is, no matter how large a ByteBuffer is allocated, either with .allocate() or .allocateDirect(), the number of bytes read into the ByteBuffer maxes out at 8KB. My target ByteBuffer size is 256KB, of which only a tiny fraction (1/32nd) is being used. About 10% of the time only 2896 bytes are read in.
I've checked the OS TCP buffer settings, and they look fine. This is confirmed by watching netstat's report on how many bytes are in the buffers: both sockets have data in the socket buffers exceeding 8KB.
tcp 0 192384 1.2.3.4:8088 1.2.3.4:53404 ESTABLISHED
tcp6 110144 0 1.2.3.4:53404 1.2.3.4:8088 ESTABLISHED
One thing that stands out here is the mix of TCP and TCP6, but that should not be a problem, I think. My Java client is on port 53404 in the above output.
I've tried setting the socket properties to favor bandwidth over latency, but no change.
Socket socket = new Socket(host.getHostName(), host.getPort());
socket.setPerformancePreferences(1, 0, 2); //bw > connection time > latency
When I log the value of socket.getReceiveBufferSize(), it consistently reports a mere 43856 bytes. While that is smaller than I would like, it is still more than 8KB. (It is also not a very round number, which I would have expected.)
I'm really stumped as to what the problem is here. In theory, AFAIK, this should not be happening. It would not be desirable to 'downgrade' to a stream-based solution, although that is where we are going next if a solution cannot be found.
What am I missing? What can I do to correct it?
OK, I've found the issue! (And am answering my own question in case someone else has the same problem.)
I was creating the ReadableByteChannel not directly from the Socket instance, but from the InputStream returned by HttpEntity.getContent() (Apache HTTP Commons Client). The HTTP Commons client had been handed the socket early on via the DefaultHttpClientConnection.bind() method. What I had not understood is that the Channel was wrapping a BufferedInputStream instance buried inside the HTTP Commons Client implementation. (8KB just happens to be its default buffer size in Java 6.)
My solution, therefore, was to grab the ReadableByteChannel off the raw Socket instance.
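In code, that roughly means something like the following; this is only a sketch of one way to do it, assuming you have access to the underlying Socket, and the 256KB buffer size is from the question:
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

// Wrap the raw socket stream rather than the HttpEntity content stream,
// so no 8KB BufferedInputStream sits in between.
ReadableByteChannel channel = Channels.newChannel(socket.getInputStream());
ByteBuffer buffer = ByteBuffer.allocateDirect(256 * 1024);
int bytesRead = channel.read(buffer);   // can now fill well past 8KB per call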

Finding server internet bandwidth thru java for streaming

Following this thread.
Streaming large files in a java servlet.
Is it possible to find the total internet bandwidth available on the current machine through Java?
What I am trying to do is, while streaming large files through the servlet, reduce the BUFFER_SIZE of the stream for each request based on the number of parallel requests and the total bandwidth. Does that make sense?
Is there any pure Java way (without JNI)?
Maybe you can time how long the app needs to send one chunk (the buffer), and if that takes longer than x milliseconds, make your buffer smaller. You can use other values for the original bufferSize and for the threshold in if (stop - start > 700).
This is based on the thread you mentioned:
ServletOutputStream out = response.getOutputStream();
InputStream in = [ code to get source input stream ];
String mimeType = [ code to get mimetype of data to be served ];
int bufferSize = 1024 * 4;
byte[] bytes = new byte[bufferSize];
int bytesRead;
response.setContentType(mimeType);
while ((bytesRead = in.read(bytes)) != -1) {
    long start = System.currentTimeMillis();
    out.write(bytes, 0, bytesRead);
    long stop = System.currentTimeMillis();
    if (stop - start > 700) {
        bufferSize /= 2;
        bytes = new byte[bufferSize];
    }
}
// do the following in a finally block:
in.close();
out.close();
The only way to find the available bandwidth is to monitor / measure it. On Windows you have access to Net.exe and can get the throughput on each NIC.
If you're serving the content through a servlet, then you could calculate how fast each servlet output stream is going. Collect that data for all streams for a user/session, and you could determine at least what the current bandwidth usage is.
A possible way to calculate the rate: instead of writing the large files directly through the servlet output stream, write to a new FilterOutputStream that keeps track of your download rates.
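A minimal sketch of such a rate-tracking wrapper; the class and method names here are made up for illustration:
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Counts bytes and elapsed time so the caller can compute bytes/second per stream.
class RateTrackingOutputStream extends FilterOutputStream {
    private long bytesWritten;
    private final long startMillis = System.currentTimeMillis();

    RateTrackingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        bytesWritten += len;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        bytesWritten++;
    }

    double bytesPerSecond() {
        long elapsed = Math.max(1, System.currentTimeMillis() - startMillis);
        return bytesWritten * 1000.0 / elapsed;
    }
}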
The concept of "total internet bandwidth available in current machine" is really hard to define. However, tweaking the local buffer size will not affect how much data you can push through to an individual client.
The rate at which a given client can take data from your server will vary with the client, and with time. For any given connection, you might be limited by your local upstream connection to the Internet (e.g., server on DSL) or you might be limited somewhere in the core (unlikely) or the remote end (e.g., server in a data center, client on a dialup line). When you have many connections, each individual connection may have a different bottleneck. Measuring this available bandwidth is a hard problem; see for example this list of research and tools on the subject.
In general, TCP will handle using all the available bandwidth fairly for any given connection (though sometimes it may react to changes in available bandwidth slower than you like). If the client can't handle more data, the write call will block.
You should only need to tweak the buffer size in the linked question if you find that you are seeing low bandwidth and the cause is insufficient data buffered for writing to the network. Another reason you might tweak the buffer size is if you have so many active connections that you are running low on memory.
In any case, the real answer may be to not buffer at all but instead put your static files on a separate server and use something like thttpd to serve them (using a system call like sendfile) instead of a servlet. This helps ensure that the bottleneck is not on your server, but somewhere out in the Internet, beyond your control.
EDIT: Re-reading this, it's a little muddled because it's late here. Basically, you shouldn't have to do this from scratch; use one of the existing highly scalable java servers, since they'll do it better and easier.
You're not going to like this, but it actually doesn't make sense, and here's why:
Total bandwidth is independent of the number of connections (though there is some small overhead), so messing with buffer sizes won't help much
Your chunks of data are being broken into variable-sized packets anyway. Your network card and protocol will deal with this better than your servlet can
Resizing buffers regularly is expensive; it is far better to re-use fixed-size buffers from a pool and have all connections queue up for I/O rights
There are a billion and a half libraries that assist with this sort of server
Were this me, I would start looking at multiplexed I/O using NIO. You can almost certainly find a library to do this for you. The IBM article here may be a useful starting point.
I think the smart money gives you one network I/O thread, and one disk I/O thread, with multiplexing. Each connection requests a buffer from a pool, fills it with data (from a shared network or disk Stream or Channel), processes it, then returns the buffer to the pool for re-use. No re-sizing of buffers, just a bit of a wait for each chunk of data. If you want latency to stay short, then limit how many transfers can be active at a time, and queue up the others.
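As a rough sketch, a fixed-size ByteBuffer pool along the lines described above might look like this; the pool count and buffer size are arbitrary choices for illustration:
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A bounded pool of reusable buffers: connections block here when all buffers
// are in use, which naturally limits how many transfers are active at a time.
class BufferPool {
    private final BlockingQueue<ByteBuffer> pool;

    BufferPool(int count, int bufferSize) {
        pool = new ArrayBlockingQueue<>(count);
        for (int i = 0; i < count; i++) {
            pool.add(ByteBuffer.allocateDirect(bufferSize));
        }
    }

    ByteBuffer acquire() throws InterruptedException {
        return pool.take();                 // waits if every buffer is checked out
    }

    void release(ByteBuffer buffer) {
        buffer.clear();                     // reset position/limit before re-use
        pool.offer(buffer);
    }
}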
