Fastest way to copy a file over socket - java

Which would be the fastest way to copy a file over a socket? I have tried several ways, but I am not convinced I have found the fastest one in terms of transfer rate and CPU usage. (Best result so far: 175 Mbit/s on an SSD over a gigabit network.)
Server:
ByteBuffer bb = ByteBuffer.allocate(packet_size);
DataOutputStream data_out = new DataOutputStream(socket.getOutputStream());
while (working) {
    int count = 0;
    packet_size = in.readInt();          // requested packet size
    long requested_pos = in.readLong();  // requested file position
    if (filechannel.position() != requested_pos) {
        filechannel.position(requested_pos);
    }
    bb.limit(packet_size);
    bb.position(0);
    if ((count = filechannel.read(bb)) > 0) { // FileInputStream.getChannel()
        data_out.writeInt(count);
        data_out.write(bb.array(), 0, count);
    } else {
        working = false;
    }
}
Client:
for (long i = 0; i <= steps; i++) {
    data_out.writeInt(packet_size);      // requested packet size
    data_out.writeLong(i * packet_size); // requested file position
    count = in.readInt();
    bb.clear();
    bb.limit(count);
    lastRead = 0;
    while (lastRead < count) {
        lastRead += in.read(bytes, lastRead, count - lastRead);
    }
    bb.put(bytes, 0, count);
    bb.position(0);
    filechannel.write(bb); // filechannel over RandomAccessFile
}
any suggestions?

You are looking at only half the issue. The code used to send/receive is only one factor. No matter how hard you optimize it, if you set up your socket with unsuitable parameters, performance takes a big hit.
For large data transfers, ensure the sockets have reasonably large buffers. I'd choose at least 64 KB, possibly larger. Send and receive buffers can be set up independently; for the sender you want a large(r) send buffer, and for the receiver a large(r) receive buffer.
socket.setReceiveBufferSize(int);
socket.setSendBufferSize(int);
socket.setTcpNoDelay(false);
Leave TCP_NODELAY off, unless you know what you're doing and have confirmed you really, absolutely need it. It will never improve throughput; on the contrary, it may sacrifice throughput in favor of reduced latency.
The next thing is to tailor your sender code to do its best to keep that buffer full at all times. For maximum speed, reading from the file and writing to the socket should be separated into two independent threads, communicating with each other through some kind of queue. Chunks in the queue should be reasonably large (at least a few KB).
Likewise the receiving code should do its best to keep the receive buffer as empty as possible. Again, for maximum speed this requires two threads, one reading the socket and another processing the data. Queue in between like the sender.
The job of the queues is to decouple stalls in reading data from file/writing to file from the actual network transfer, and vice versa.
The above is the generic pattern for getting maximum throughput, regardless of the transmission channels. The slower channel will be kept completely saturated, be it file reading/writing or the network transfer.
Buffer sizes can be tweaked to squeeze out the last few percent of possible performance (I'd start with 64 KB for the socket and 8 KB chunks in the queue, with a maximum queue size of 1 MB; this should deliver performance reasonably close to the maximum possible).
Another limiting factor you may run into is TCP transfer window scaling (especially over a high-bandwidth, high-latency connection). Aside from ensuring the receiver empties the receive buffer as fast as possible, there isn't anything you can do from the Java side. Tweaking options exist at the OS level.
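To make the two-thread/queue pattern concrete, here is a minimal sketch of the sender side. It assumes a file channel and socket output stream are already open; the class name, chunk size, and queue capacity are illustrative, not taken from the question.
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: decouple file reading from socket writing with a bounded queue so a stall
// on either side never starves the other. Names and sizes are illustrative.
class QueuedSender {
    private static final byte[] EOF = new byte[0];  // poison pill marking end of file
    private final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(128); // ~1 MB at 8 KB chunks

    void send(FileChannel fileChannel, OutputStream socketOut) throws Exception {
        Thread reader = new Thread(() -> {
            try {
                ByteBuffer buf = ByteBuffer.allocate(8 * 1024);
                int n;
                while ((n = fileChannel.read(buf)) > 0) {
                    byte[] chunk = new byte[n];
                    buf.flip();
                    buf.get(chunk);
                    buf.clear();
                    queue.put(chunk);            // blocks when the queue is full
                }
                queue.put(EOF);
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        reader.start();

        byte[] chunk;
        while ((chunk = queue.take()) != EOF) {  // writer: keep the socket send buffer full
            socketOut.write(chunk);
        }
        socketOut.flush();
        reader.join();
    }
}
The receiver mirrors this: one thread reads the socket into the queue, another drains the queue to the file, so neither disk stalls nor network stalls hold the other side up.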

You want to use NIO.
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class FileServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        try (final InputStream is = new BufferedInputStream((InputStream) <YOUR INPUT STREAM TO A FILE HERE>);
             final OutputStream os = new BufferedOutputStream(response.getOutputStream())) {
            fastCopy(is, os);
        }
    }

    public static void fastCopy(final InputStream src, final OutputStream dest) throws IOException {
        fastCopy(Channels.newChannel(src), Channels.newChannel(dest));
    }

    public static void fastCopy(final ReadableByteChannel src, final WritableByteChannel dest) throws IOException {
        final ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024);
        while (src.read(buffer) != -1) {
            buffer.flip();
            dest.write(buffer);
            buffer.compact();
        }
        buffer.flip();
        while (buffer.hasRemaining()) {
            dest.write(buffer);
        }
    }
}
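To tie this back to the original socket question, a usage sketch (the file name and socket variable are assumptions, not from the answer): fastCopy can stream a file straight into a socket's output stream.
// Usage sketch with assumed names: stream a local file to an already connected socket.
try (InputStream fileIn = new FileInputStream("data.bin");
     OutputStream sockOut = socket.getOutputStream()) {
    FileServlet.fastCopy(fileIn, sockOut);
}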

Related

Accumulating Netty direct IO buffer to CompositeByteBuf

We are migrating our application from Servlet API 2.5 (thread per request) to Netty. One of the cases involves a blocking-style ini file parser that we have had for many years. The current approach is to accumulate incoming ByteBufs into a CompositeByteBuf and feed it to the parser after wrapping it with a ByteBufInputStream.
In the real application we are using HTTP, and the ini is sent to the server as the HTTP request body. But in the snippet below it is assumed that all inbound content is the transferred file.
class AccumulatingChannelHandler extends ChannelInboundHandlerAdapter {

    final BlockingIniFileParser parser = new BlockingIniFileParser();
    CompositeByteBuf accumulator;

    @Override
    public void channelActive(ChannelHandlerContext ctx) {
        accumulator = ctx.alloc().compositeBuffer(Integer.MAX_VALUE);
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        ByteBuf ioBuffer = (ByteBuf) msg;
        accumulator.addComponent(true, ioBuffer);
    }

    @Override
    public void channelInactive(ChannelHandlerContext ctx) {
        IniFile iniFile = parser.parse(new ByteBufInputStream(accumulator));
        accumulator.release();
        ByteBuf result = process(iniFile);
        ctx.writeAndFlush(result);
        ctx.close();
    }

    private ByteBuf process(IniFile iniFile) {...}
}

class BlockingIniFileParser {
    IniFile parse(InputStream in) {...}
}

interface IniFile {
    String getSetting(String section, String entry);
}
By default, pooled direct buffers arrive at the channelRead method, and with such a strategy we risk uncontrolled consumption of direct memory. So, I would like to understand:
Is it rational to accumulate IO buffers in such fashion?
Is there any best practice for integrating Netty IO with blocking parsers?
What is the best practice for parsing input (in some structured format) with Netty?
You have not mentioned the size of a typical uploaded ini file, but assuming they're large enough to cause some concern, I would consider ditching the CompositeByteBuf. Allocate an un-pooled [optionally direct] buffer and as pooled buffers come in the door, write them to the un-pooled buffer, then release them. When you're done reading, continue with your use of a ByteBufInputStream around the un-pooled buffer. Once complete, the un-pooled buffer will be GCed.
You will still be allocating chunks of memory, but it won't be drawn from your buffer pools, and if you use a direct un-pooled buffer, it will not impact your heap as much.
Ultimately, if the size of the un-pooled buffer remains a concern, I would bite the bullet and write the incoming pooled buffers out to disk and on completion, read them back in using a FileInputStream.
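A minimal sketch of that suggestion, reusing the handler shape from the question; the buffer's initial size and the class name are illustrative assumptions, not a definitive implementation:
// Sketch: copy each incoming pooled buffer into one unpooled buffer, then release it,
// so the pooled direct memory is returned to Netty right away.
class UnpooledAccumulatingHandler extends ChannelInboundHandlerAdapter {

    private ByteBuf accumulator;

    @Override
    public void channelActive(ChannelHandlerContext ctx) {
        // Unpooled, optionally direct; 64 KB initial capacity is an arbitrary starting point.
        accumulator = Unpooled.directBuffer(64 * 1024);
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        ByteBuf ioBuffer = (ByteBuf) msg;
        try {
            accumulator.writeBytes(ioBuffer);   // copy out of the pool (grows as needed)
        } finally {
            ioBuffer.release();                 // hand the pooled buffer back immediately
        }
    }

    @Override
    public void channelInactive(ChannelHandlerContext ctx) {
        IniFile iniFile = new BlockingIniFileParser().parse(new ByteBufInputStream(accumulator));
        accumulator.release();                  // direct memory is still freed via release()
        // ... process(iniFile), write the response and close, as in the question
    }
}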

Java NIO. SocketChannel.read method all time return 0. Why?

I am trying to understand how Java NIO works, in particular SocketChannel.
I wrote the code below:
import java.io.*;
import java.net.*;
import java.nio.*;
import java.nio.channels.*;

public class Test {
    public static void main(String[] args) throws IOException {
        SocketChannel socketChannel = SocketChannel.open();
        socketChannel.configureBlocking(false);
        socketChannel.connect(new InetSocketAddress("google.com", 80));
        while (!socketChannel.finishConnect()) {
            // wait, or do something else...
        }
        String newData = "Some String...";
        ByteBuffer buf = ByteBuffer.allocate(48);
        buf.clear();
        buf.put(newData.getBytes());
        buf.flip();
        while (buf.hasRemaining()) {
            System.out.println(socketChannel.write(buf));
        }
        buf.clear().flip();
        int bytesRead;
        while ((bytesRead = socketChannel.read(buf)) != -1) {
            System.out.println(bytesRead);
        }
    }
}
I try to:
connect to the Google server;
send a request to the server;
read the answer from the server.
But the socketChannel.read(buf) method returns 0 every time and loops infinitely.
Where did I make a mistake?
Because a non-blocking NIO SocketChannel will not block until data is available for reading; i.e., a non-blocking channel can return 0 from a read() operation.
That is why, when using NIO, you should use java.nio.channels.Selector, which notifies you when a channel has data available to read.
On the other hand, a blocking channel will wait until data is available and return how much was read; i.e., a blocking channel will never return 0 from a read() operation.
You can read more about NIO here:
http://tutorials.jenkov.com/java-nio/index.html
http://java.sun.com/developer/technicalArticles/releases/nio/
https://www.ibm.com/developerworks/java/tutorials/j-nio/section2.html
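For illustration, a minimal Selector sketch continuing from the question's setup (it assumes the channel is already connected and non-blocking, plus a java.util.Iterator import; the buffer size and loop structure are just one way to do it):
// Sketch: register the non-blocking channel with a Selector and only call read()
// when the selector reports the channel is readable.
Selector selector = Selector.open();
socketChannel.register(selector, SelectionKey.OP_READ);

ByteBuffer readBuf = ByteBuffer.allocate(4096);
readLoop:
while (true) {
    selector.select();                                   // blocks until some channel is ready
    Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
    while (keys.hasNext()) {
        SelectionKey key = keys.next();
        keys.remove();
        if (key.isReadable()) {
            readBuf.clear();
            int n = ((SocketChannel) key.channel()).read(readBuf);
            if (n == -1) {                               // remote side closed the connection
                key.cancel();
                break readLoop;
            }
            System.out.println("read " + n + " bytes");
        }
    }
}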
You have specifically configured
socketChannel.configureBlocking(false);
If you had left the default, which is blocking, you would never get a return value of 0.
While reading the title I was about to say: because there is no room in the buffer.
Unlike the answers so far, the real culprit is buf.clear().flip(); this effectively leaves the buffer with position = limit = 0, hence no data can be read into it. Change it to buf.clear() and you are good to go.
Have fun with NIO, and use a debugger before posting a question.
If you really want to use non-blocking I/O, then either use a Selector, or use NIO.2 (AsynchronousSocketChannel et al.) from Java 7. Using a Selector is tricky; it is better to find working examples. Using NIO.2 is relatively easy.
The technical explanation is that you didn't send a complete HTTP request to the Google web server; therefore it won't send you a reply until it receives a complete request.
This explains why you are reading 0 bytes which means that no data is available for reading.
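For illustration, a complete minimal request looks like this; the sketch continues the question's code, the exact header set is an assumption, and StandardCharsets needs a java.nio.charset import:
// Sketch: send a complete HTTP request - the blank line after the headers is what
// tells the server the request is finished, so it will actually reply.
String request = "GET / HTTP/1.0\r\n"
               + "Host: google.com\r\n"
               + "\r\n";
ByteBuffer out = ByteBuffer.wrap(request.getBytes(StandardCharsets.US_ASCII));
while (out.hasRemaining()) {
    socketChannel.write(out);
}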

Why does my UART performance in Java vary?

I'm using an ordinary serial port on a PC to send and receive data in a Java application. The PC runs Windows XP SP3 with java 1.6.0. Here is the code:
import gnu.io.CommPortIdentifier;
import gnu.io.SerialPort;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.util.concurrent.ArrayBlockingQueue;
// Open the serial port.
CommPortIdentifier portId;
SerialPort serialPort;
portId = CommPortIdentifier.getPortIdentifier("COM1");
serialPort = (SerialPort) portId.open("My serial port", 1000 /* 1 second timeout */);
serialPort.setSerialPortParams(115200, SerialPort.DATABITS_8, SerialPort.STOPBITS_1, SerialPort.PARITY_NONE);
// Set up input and output streams which will be used to receive and transmit data on the UART.
InputStream input;
OutputStream output;
input = serialPort.getInputStream();
output = serialPort.getOutputStream();
// Wrap the input and output streams in buffers to improve performance. 1024 is the buffer size in bytes.
input = new BufferedInputStream(input, 1024);
output = new BufferedOutputStream(output, 1024);
// Sync connection.
// Validate connection.
// Start Send- and Receive threads (see below).
// Send a big chunk of data.
To send data I've set up a thread that takes packets from a queue (ArrayBlockingQueue) and sends them on the UART, and similarly for receive. Other parts of the application can simply insert packets into the send queue and then poll the receive queue to get the reply.
private class SendThread extends Thread {
    public void run() {
        try {
            SendPkt pkt = SendQueue.take();
            // Register Time1.
            output.write(pkt.data);
            output.flush();
            // Register Time2.
            // Put the data length and Time2-Time1 into an array.

            // Receive Acknowledge.
            ResponsePkt RspPkt = new ResponsePkt();
            RspPkt.data = receive(); // This function calls "input.read" and checks for errors.
            ReceiveQueue.put(RspPkt);
        } catch (IOException e) { ... }
    }
}
Each send packet is at most 256 bytes, which should take 256 * 8 bits / 115200 bits/s = 17.7 ms to transfer.
I put measurements of Time2-Time1, i.e. the send time, in an array and check it later. It turns out that sometimes a transfer of 256 bytes takes 15 ms, which seems good since it's close to the theoretical minimum (I'm not sure, though, why it's faster in practice than in theory). However, the problem is that sometimes a transfer of 256 bytes takes 32 ms, i.e. twice as long as needed. What could be causing this?
/Henrik
A (Windows) PC is not a real-time machine. That means that whenever your application has to access a hardware layer, it may be delayed. You have no control over this, and there is no fixed amount of time between entering and exiting your function, due to how the system (kernel) works.
Most Linux machines behave the same way. There are other tasks (applications) running in the background which consume processing power and thus your application might be moved around a bit in the process queue before sending the real data.
Even in the sending process there might be delays between each byte being send. This is all handled by the kernel/hardware layer and your software can't change that.
If you do need real-time execution, then you'll have to look for a real-time operating system.
This line sums it up pretty nicely:
A key characteristic of an RTOS is the level of its consistency concerning the amount of time it takes to accept and complete an application's task; the variability is jitter.
Where with a RTOS this jitter is known/defined and with a normal OS this jitter is unknown/undefined.
Do you measure the time with System.nanoTime()?
The Windows clock resolution used by System.currentTimeMillis() is by default around 15 ms, so perhaps in real time each transfer takes 20 ms but some are spread over two ticks instead of one.
See System.currentTimeMillis vs System.nanoTime for more info.
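A small sketch of the suggested measurement, using the Time1/Time2 points from the question's send thread (variable names are assumed):
// Sketch: measure with System.nanoTime(), which is monotonic and not limited to the
// ~15 ms tick of System.currentTimeMillis() on Windows.
long t1 = System.nanoTime();   // Register Time1.
output.write(pkt.data);
output.flush();
long t2 = System.nanoTime();   // Register Time2.
long elapsedMs = (t2 - t1) / 1000000L;
System.out.println(pkt.data.length + " bytes took " + elapsedMs + " ms");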

Slow transfers in Jetty with chunked transfer encoding at certain buffer size

I'm investigating a performance problem with Jetty 6.1.26. Jetty appears to use Transfer-Encoding: chunked, and depending on the buffer size used, this can be very slow when transferring locally.
I've created a small Jetty test application with a single servlet that demonstrates the issue.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.mortbay.jetty.Server;
import org.mortbay.jetty.nio.SelectChannelConnector;
import org.mortbay.jetty.servlet.Context;

public class TestServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        final int bufferSize = 65536;
        resp.setBufferSize(bufferSize);
        OutputStream outStream = resp.getOutputStream();
        FileInputStream stream = null;
        try {
            stream = new FileInputStream(new File("test.data"));
            int bytesRead;
            byte[] buffer = new byte[bufferSize];
            while ((bytesRead = stream.read(buffer, 0, bufferSize)) > 0) {
                outStream.write(buffer, 0, bytesRead);
                outStream.flush();
            }
        } finally {
            if (stream != null)
                stream.close();
            outStream.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Server server = new Server();
        SelectChannelConnector ret = new SelectChannelConnector();
        ret.setLowResourceMaxIdleTime(10000);
        ret.setAcceptQueueSize(128);
        ret.setResolveNames(false);
        ret.setUseDirectBuffers(false);
        ret.setHost("0.0.0.0");
        ret.setPort(8080);
        server.addConnector(ret);
        Context context = new Context();
        context.setDisplayName("WebAppsContext");
        context.setContextPath("/");
        server.addHandler(context);
        context.addServlet(TestServlet.class, "/test");
        server.start();
    }
}
In my experiment, I'm using a 128MB test file that the servlet returns to the client, which connects using localhost. Downloading this data using a simple test client written in Java (using URLConnection) takes 3.8 seconds, which is very slow (yes, it's 33MB/s, which doesn't sound slow, except that this is purely local and the input file was cached; it should be much faster).
Now here's where it gets strange. If I download the data with wget, which is an HTTP/1.0 client and therefore doesn't support chunked transfer encoding, it only takes 0.1 seconds. That's a much better figure.
Now when I change bufferSize to 4096, the Java client takes 0.3 seconds.
If I remove the call to resp.setBufferSize entirely (which appears to use a 24KB chunk size), the Java client now takes 7.1 seconds, and wget is suddenly equally slow!
Please note I'm not in any way an expert with Jetty. I stumbled across this problem while diagnosing a performance problem in Hadoop 0.20.203.0 with reduce task shuffling, which transfers files using Jetty in a manner much like the reduced sample code, with a 64KB buffer size.
The problem reproduces both on our Linux (Debian) servers and on my Windows machine, and with both Java 1.6 and 1.7, so it appears to depend solely on Jetty.
Does anyone have any idea what could be causing this, and if there's something I can do about it?
I believe I have found the answer myself, by looking through the Jetty source code. It's actually a complex interplay of the response buffer size, the size of the buffer passed to outStream.write, and whether or not outStream.flush is called (in some situations). The issue is with the way Jetty uses its internal response buffer, and how the data you write to the output is copied to that buffer, and when and how that buffer is flushed.
If the size of the buffer used with outStream.write is equal to the response buffer (I think a multiple also works), or less and outStream.flush is used, then performance is fine. Each write call is then flushed straight to the output, which is fine. However, when the write buffer is larger and not a multiple of the response buffer, this seems to cause some weirdness in how the flushes are handled, causing extra flushes, leading to bad performance.
In the case of chunked transfer encoding, there's an extra kink in the cable. For all but the first chunk, Jetty reserves 12 bytes of the response buffer to contain the chunk size. This means that in my original example with a 64KB write and response buffer, the actual amount of data that fit in the response buffer was only 65524 bytes, so again, parts of the write buffer were spilling into multiple flushes. Looking at a captured network trace of this scenario, I see that the first chunk is 64KB, but all subsequent chunks are 65524 bytes. In this case, outStream.flush makes no difference.
When using a 4KB buffer I was seeing fast speeds only when outStream.flush was called. It turns out that resp.setBufferSize will only increase the buffer size, and since the default size is 24KB, resp.setBufferSize(4096) is a no-op. However, I was now writing 4KB pieces of data, which fit in the 24KB buffer even with the reserved 12 bytes, and are then flushed as a 4KB chunk by the outStream.flush call. However, when the call to flush is removed, it will let the buffer fill up, again with 12 bytes spilling into the next chunk because 24 is a multiple of 4.
In conclusion
It seems that to get good performance with Jetty, you must either:
Call setContentLength (no chunked transfer encoding) and use a write buffer that's the same size as the response buffer, or
When using chunked transfer encoding, use a write buffer that's at least 12 bytes smaller than the response buffer size, and call flush after each write.
Note that the performance of the "slow" scenario is still such that you'll likely only see the difference on the local host or very fast (1Gbps or more) network connection.
I guess I should file issue reports against Hadoop and/or Jetty for this.
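A hedged sketch of the first option applied to the test servlet above (it assumes the file size fits in an int content length and nothing has been written to the response yet):
// Sketch: avoid chunked transfer encoding by declaring the length up front,
// and write in pieces that match the response buffer size.
final int bufferSize = 65536;
resp.setBufferSize(bufferSize);

File file = new File("test.data");
resp.setContentLength((int) file.length());   // no chunked encoding once the length is known

OutputStream outStream = resp.getOutputStream();
FileInputStream stream = null;
try {
    stream = new FileInputStream(file);
    byte[] buffer = new byte[bufferSize];     // same size as the response buffer
    int bytesRead;
    while ((bytesRead = stream.read(buffer, 0, bufferSize)) > 0) {
        outStream.write(buffer, 0, bytesRead);
    }
} finally {
    if (stream != null)
        stream.close();
    outStream.close();
}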
Yes, Jetty will default to Transfer-Encoding: chunked if the size of the response cannot be determined.
If you know what the size of the response is going to be, you need to call resp.setContentLength(135*1000*1000*1000); in this case instead of resp.setBufferSize(); actually, setting resp.setBufferSize is immaterial.
Before opening the OutputStream, that is, before this line:
OutputStream outStream = resp.getOutputStream();
you need to call
resp.setContentLength(135*1000*1000*1000);
(the line above).
Give it a spin and see if that works.
Those are my guesses from theory.
This is pure speculation, but I'm guessing this is some sort of Garbage Collector issue. Does the performance of the Java client improve when you run the JVM with more heap like...
java -Xmx128m
I don't recall the JVM switch to turn on GC logging, but figure that out and see if GC kicks in just as you are getting into your doGet.
My 2 cents.

OutputStream OutOfMemoryError when sending HTTP

I am trying to publish a large video/image file from the local file system to an HTTP path, but I run into an out of memory error after some time...
here is the code
public boolean publishFile(URI publishTo, String localPath) throws Exception {
    InputStream istream = null;
    OutputStream ostream = null;
    boolean isPublishSuccess = false;

    URL url = makeURL(publishTo.getHost(), this.port, publishTo.getPath());
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    if (conn != null) {
        try {
            conn.setDoOutput(true);
            conn.setDoInput(true);
            conn.setRequestMethod("PUT");
            istream = new FileInputStream(localPath);
            ostream = conn.getOutputStream();

            int n;
            byte[] buf = new byte[4096];
            while ((n = istream.read(buf, 0, buf.length)) > 0) {
                ostream.write(buf, 0, n); //<--- ERROR happens on this line.......???
            }

            int rc = conn.getResponseCode();
            if (rc == 201) {
                isPublishSuccess = true;
            }
        } catch (Exception ex) {
            log.error(ex);
        } finally {
            if (ostream != null) {
                ostream.close();
            }
            if (istream != null) {
                istream.close();
            }
        }
    }
    return isPublishSuccess;
}
Here is the error I am getting...
Exception in thread "Thread-8773" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
at sun.net.www.http.PosterOutputStream.write(PosterOutputStream.java:61)
at com.test.HTTPClient.publishFile(HTTPClient.java:110)
at com.test.HttpFileTransport.put(HttpFileTransport.java:97)
The HttpURLConnection is buffering the data so that it can set the Content-Length header (per the HTTP spec).
One alternative, if your destination server supports it, is to use "chunked" transfers. This will buffer only a small portion of data at a time. However, not all services support it (Amazon S3, for example, doesn't).
Another alternative (and imo a better one) is to use Jakarta HttpClient. You can set the "entity" in a request from a file, and the connection code will set request headers appropriately.
Edit: nos commented that the OP could call HttpURLConnection.setFixedLengthStreamingMode(long length). I was unaware of this method; it was added in 1.5, and I haven't used this class since then.
However, I still suggest using Jakarta HttpClient, for the simple reason that it reduces the amount of code that the OP has to maintain. Code that is boilerplate, yet still has the potential for errors:
The OP correctly handles the loop to copy between input and output. Usually when I see an example of this, the poster either doesn't properly check the returned buffer size, or keeps re-allocating the buffers. Congratulations, but you now have to ensure that your successors take as much care.
The exception handling isn't quite so good. Yes, the OP remembers to close the connections in a finally block, and again, congratulations on that. Except that either of the close() calls could throw IOException, keeping the other from executing. And the method as a whole throws Exception, so that the compiler isn't going to help catch similar errors.
I count 31 lines of code to setup and execute the response (excluding the response code check and the URL computation, but including the try/catch/finally). With HttpClient, this would be somewhere in the range of a half dozen LOC.
Even if the OP had written this code perfectly, and refactored it into methods similar to those in Jakarta Commons IO, s/he shouldn't do that. This code has been written and tested by others. I know that it's a waste of my time to rewrite it, and suspect that it's a waste of the OP's time as well.
conn.setFixedLengthStreamingMode((int) new File(localPath).length());
And for buffering, you could wrap your streams in a BufferedOutputStream and a BufferedInputStream.
A good example of chunked uploading can be found here: gdata-java-client
The problem is that the HttpURLConnection class is using a byte array to store your data. Presumably this video you are pushing is taking more memory than available. You have a few options here:
Increase the memory to your application. You can use the -Xmx1024m option to give 1GB of memory to your application. This will increase the amount of data you can store in memory.
If you still run out of memory, you might want to consider trying another library to push the video up that does not store the data all in memory at once. The Apache Commons HttpClient has such a feature. See this site for more information: http://hc.apache.org/httpclient-3.x/features.html. See this section for multi-part form upload of large files: http://hc.apache.org/httpclient-3.x/methods/multipartpost.html
For anything other than basic GET operations, the built-in java.net HTTP stuff isn't very good. Using Apache Commons HttpClient is recommended for this. It lets you do much more intuitive stuff like this:
HttpClient client = new HttpClient();
PutMethod put = new PutMethod(url);
put.setRequestEntity(new FileRequestEntity(localFile, contentType));
int responseCode = client.executeMethod(put);
which replaces a lot of your boiler-plate code.
HttpsURLConnection#setChunkedStreamingMode(1024 * 1024 * 10); //10MB chunk
This ensures that a file of any size is streamed over an HTTP(S) connection without internal buffering. It should be used when the file size or content length is unknown.
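Applied to the publishFile method from the question, a minimal sketch; the 10 MB chunk size is just an example value:
// Sketch: enable chunked streaming before the output stream is opened, so
// HttpURLConnection streams the body instead of buffering it all in memory.
conn.setDoOutput(true);
conn.setRequestMethod("PUT");
conn.setChunkedStreamingMode(10 * 1024 * 1024); // must be called before getOutputStream()
ostream = conn.getOutputStream();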
Your problem is that you're trying to fit X bytes of video into X/N bytes of RAM, where N > 1.
You either need to read the video into a smaller buffer and write it out as you go, make the file smaller, or increase the memory available to your process.
Check your heap size. You can use -Xmx to increase it if you've taken the default.