We are using Java NIO's AsynchronousServerSocketChannel with completion handlers to write to a socket channel.
The sockets are used to communicate locally between two processes running on the same system.
We transfer quite a large amount of data, using a buffer size of 16384 bytes to send it in chunks. Sending over UDP is not an option.
Is there anything else that can be done to improve the performance of the socket channel or to reduce the payload transferred?
Best Regards,
Saurav
There are a number of alternatives you may consider. I expect that you will need to implement each and test the actual performance on your hardware with your application in order to choose the right one.
You can try to tweak your current approach. Some thoughts: a much larger buffer; double buffering (use two sockets, so the writer always has a socket available for writing and the reader can always be reading); only sending differences (if you are continuously sending an updated version of the same data); compression (sketched below); etc.
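For the compression point, here is a minimal sketch of what the writer side could do, assuming the payload is compressible (text, JSON, ...) and that java.util.zip.Deflater is applied per chunk. The helper name is made up for illustration:

    import java.io.ByteArrayOutputStream;
    import java.util.zip.Deflater;

    // Compress one chunk before handing it to the completion-handler write.
    // Only worthwhile if the payload is compressible; for already-compressed
    // data this just burns CPU. The reader inflates with java.util.zip.Inflater.
    static byte[] compressChunk(byte[] chunk) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED); // favour speed over ratio for local IPC
        deflater.setInput(chunk);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(chunk.length);
        byte[] tmp = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(tmp);
            out.write(tmp, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }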
Or use a completely different approach, such as shared memory or a memory-mapped file; a sketch of the latter follows. A couple of questions with lots of good answers that may get you started: this and that.
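A sketch of the memory-mapped file mechanics, assuming a shared file path and a crude length-prefix handshake (both are assumptions; a real implementation needs proper synchronisation between the two processes, and would map the region once rather than per chunk):

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    // Hypothetical writer side: both processes map the same file and exchange
    // data through it, skipping the socket entirely. This only shows the
    // mapping mechanics; write ordering and reader signalling need real care.
    static void writeChunk(byte[] payload) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile("/tmp/ipc-region", "rw");
             FileChannel channel = file.getChannel()) {
            MappedByteBuffer region = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4 + payload.length);
            region.position(4);
            region.put(payload);               // copy the chunk into shared memory
            region.putInt(0, payload.length);  // publish the length last, as a crude "ready" signal
        }
    }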
While the details depend on your specific environment, you probably can speed up the communication by 10x (or maybe significantly more) over your current socket implementation.
I need to pull files concurrently from a remote server over a single SFTP connection in Java code.
I've already found a few links describing how to pull the files one by one over a single connection.
Like:
Use sftpChannel.ls("Path to dir"), which returns the list of files in the given path as a Vector; you then iterate over the Vector and download each file with sftpChannel.get().
But I want to pull multiple files concurrently, e.g. two files at a time, over a single connection.
Thank You!
The ChannelSftp.get method returns an InputStream.
So you can call get multiple times, acquiring a stream for each download, and then keep polling the streams until all of them reach end-of-file.
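A sketch of that, assuming sftp is an already-connected com.jcraft.jsch.ChannelSftp and the local file names are placeholders. Note that a blocking read on one stream can stall the others, so this really is the polling idea in its simplest form:

    import com.jcraft.jsch.ChannelSftp;
    import java.io.FileOutputStream;
    import java.io.InputStream;

    // Open several downloads on one ChannelSftp and drain them round-robin.
    static void downloadConcurrently(ChannelSftp sftp, String[] remotePaths) throws Exception {
        InputStream[] streams = new InputStream[remotePaths.length];
        FileOutputStream[] outs = new FileOutputStream[remotePaths.length];
        for (int i = 0; i < remotePaths.length; i++) {
            streams[i] = sftp.get(remotePaths[i]);           // starts the i-th download
            outs[i] = new FileOutputStream("download-" + i); // local target (placeholder name)
        }
        byte[] buf = new byte[32 * 1024];
        boolean anyOpen = true;
        while (anyOpen) {
            anyOpen = false;
            for (int i = 0; i < streams.length; i++) {
                if (streams[i] == null) continue;
                int n = streams[i].read(buf);                // read whatever has arrived for this file
                if (n < 0) {                                 // this download reached end-of-file
                    streams[i].close();
                    outs[i].close();
                    streams[i] = null;
                } else {
                    outs[i].write(buf, 0, n);
                    anyOpen = true;
                }
            }
        }
    }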
Though I do not see what advantage this gives you over a sequential download.
If you want to improve performance, you first need to know what the bottleneck is.
The typical bottlenecks are:
Network speed: If you are saturating the network speed already, you cannot improve anything.
Network latency: If the latency is the bottleneck, increasing the size of the SFTP request queue may help. Use the ChannelSftp.setBulkRequests method (the default is 16, so use a higher number); see the snippet after this list.
CPU: If the CPU is the bottleneck, you either have to improve the efficiency of the encryption implementation, or spread the load across CPU cores. Spreading the encryption load of a single session/connection is tricky and would have to be supported by the low-level SSH implementation. I do not think JSch or any other implementation supports that.
Disk: If a disk drive (local or remote) is the bottleneck (unlikely), the parallel transfers shown above may help, even over a single connection, if the parallel transfers each use a different disk drive.
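The latency tweak from the list above is a one-liner (connection setup elided; 64 is an arbitrary example value to tune against your round-trip time):

    // sftp is an already-connected com.jcraft.jsch.ChannelSftp.
    // Allow more outstanding read-ahead requests on a high-latency link.
    sftp.setBulkRequests(64);
    sftp.get("/remote/bigfile", "/local/bigfile");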
For more in-depth information, see my answers to:
Why is FileZilla SFTP file transfer max capped at 1.3MiB/sec instead of saturating available bandwidth? rsync and WinSCP are even slower
Why is FileZilla so much faster than PSFTP?
I know there are many open source server programs that leverage java.nio's non-blocking I/O, such as Mina. Many implementations use multiple selectors and multi-threading to handle selected events. It seems like a perfect design.
Is it? What is the bottleneck for an NIO-based server? It seems like there wouldn't be any?
Is there any need to control the number of connections? How would one do so?
With traditional blocking I/O, each connection must be handled by one or more dedicated threads. As the number of connections grows so does the number of required threads. This model works reasonably well with connection numbers into the hundreds or low thousands, but it doesn't scale well past that.
Multiplexing and non-blocking I/O invert the model, allowing one thread to service many different connections. It does so by selecting the active connections and only performing I/O when it's guaranteed the sockets are ready.
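In code, the inverted model is the classic single-threaded Selector loop; a minimal sketch (port number and buffer size are arbitrary):

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.*;
    import java.util.Iterator;

    // One Selector watches every channel; I/O is only attempted on channels
    // the kernel reports ready, so a single thread serves many connections.
    public static void serve() throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buf = ByteBuffer.allocate(16 * 1024);
        while (true) {
            selector.select();                       // blocks until at least one channel is ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buf.clear();
                    int n = client.read(buf);        // will not block: the socket is ready
                    if (n < 0) { key.cancel(); client.close(); }
                    // else: hand buf off for processing (kept trivial here)
                }
            }
        }
    }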
This is a much more scalable solution because now you're not having hordes of mostly-inactive threads sitting around twiddling their thumbs. Instead you have one or a few very active threads shuttling between all of the sockets. So what's the bottleneck here?
An NIO-based server is still limited by its CPU. As the number of sockets and the amount of I/O each does grows the CPU will be more and more busy.
The multiplexing threads need to service the sockets as quickly as possible. They can only work with one at a time. If the server isn't careful, there might be too much work going on in these threads. When that happens it can take some careful, perhaps difficult programming to move the work off-thread.
If the incoming data can't be processed immediately, it may be prudent to copy it off to a separate memory buffer so it's not sitting in the operating system's queue. That copying takes both time and additional memory.
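A sketch of that hand-off, copying the bytes out of the I/O buffer and pushing the work onto a worker pool so the selector thread can return to select() immediately (the class and method names are made up):

    import java.nio.ByteBuffer;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Copy received bytes out of the shared I/O buffer and process them on a
    // worker pool, keeping the multiplexing thread free for more selection.
    class OffThreadHandler {
        private final ExecutorService workers =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        void onRead(ByteBuffer buf) {
            buf.flip();
            byte[] copy = new byte[buf.remaining()];
            buf.get(copy);                        // the copy costs time and memory, as noted above
            workers.submit(() -> process(copy));  // selector thread is now free again
        }

        private void process(byte[] data) {
            // placeholder: parse / handle the message here
        }
    }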
Programs can't have an infinite number of file descriptors / kernel handles open. Every socket has associated read and write buffers in the OS, so you'll eventually run into the operating system's limits.
Obviously, you're still limited by your hardware and network infrastructure. A server is limited by its NIC, by the bandwidth and latency of other network hops, etc.
This is a very generic answer. To really answer this question you need to examine the particular server program in question. Every program is different. There may be bottlenecks in any particular program that aren't Java NIO's "fault", so to speak.
What is the bottleneck for an NIO-based server?
The network, memory, CPU, all the usual things.
It seems like there wouldn't be any?
Why?
Is there any need to control the number of connections?
Not really.
How would one do so?
Count them in and out, and deregister OP_ACCEPT while you're at the maximum.
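A sketch of that counting, with the maximum and the bookkeeping being assumptions for illustration:

    import java.io.IOException;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;

    // Stop accepting while at the limit, resume when a client leaves.
    class AcceptLimiter {
        private static final int MAX_CONNECTIONS = 10_000; // assumed limit, tune to taste
        private int connections = 0;

        void onAcceptable(SelectionKey serverKey) throws IOException {
            SocketChannel client = ((ServerSocketChannel) serverKey.channel()).accept();
            client.configureBlocking(false);
            client.register(serverKey.selector(), SelectionKey.OP_READ);
            if (++connections >= MAX_CONNECTIONS) {
                serverKey.interestOps(0);                      // deregister OP_ACCEPT at the maximum
            }
        }

        void onClientClosed(SelectionKey serverKey) {
            if (connections-- >= MAX_CONNECTIONS) {
                serverKey.interestOps(SelectionKey.OP_ACCEPT); // start accepting again
            }
        }
    }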
Please note: Although the library I'm using is called the Java Simple Serial Connector, this question is really more about Java coding against serial ports in general, and any strategies associated with doing so.
I'm using a Java library (JSSC as mentioned above) to read data from a serial port. The library requires you to poll the port for all available bytes. But this has me worried, because in between 2 different poll attempts, data could be streamed to the port (from the serial device) and therefore perhaps "lost".
Unless there is some kind of buffering/caching mechanism at the hardware layer that buffers data coming in to the serial port. In that case, the library's API makes sense, as it probably consults the buffer and reads anything that's been queueing up inside it. So I ask:
Is there such a "serial port buffer"? If so what is it? If not, then are there any strategies to "lossless" serial port reads?
If there is such a buffer, how does it work? What happens when it fills up?
The Java lib I'm using reads serial port data as byte[]; does it make sense to then construct a ByteArrayInputStream from these byte[] chunks? What benefits would one gain from doing so?
Here is the original link that spawned this question.
So, apparently DatagramSocket will not queue received packets. So if two systems send concurrently, one of the packets will get lost (using the code from the link). I'm looking for ways to avoid dropping packets in this case.
If you want to be sure no data (packet) gets lost, use TCP!
An advantage of UDP is that it has less overhead and is thus used for congested, high-traffic connections, like video or game streams. A reason for the lower overhead is the lack of guarantees about data not going missing during transmission.
From your question it seems that you do care about missing data, so you need to build in measures to detect this. And if you detect a loss, you probably want the data to be resent until it arrives properly? That is what TCP offers you..!
If it is really Java that is dropping the data, it is probably due to queues being full. The UDP transmission itself might be 'over' by then, but Java implements the UDP protocol with all of its 'consequences'. As UDP is designed for high throughput, the Java parts are designed to the same requirement. Queuing everything would cause (massive) overhead, which is contrary to the UDP design, so unbounded queuing is highly unlikely. Furthermore, dropping data from a queue is no different from losing data during transmission (IMHO), so it does not surprise me that Java drops data!
If you want to prevent this, you need bigger queues (although they might fill up as well) and, more importantly, faster handling of the queued data (to prevent the queues from filling); a sketch of both follows.
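A sketch along those lines, asking the OS for a larger receive buffer (a request it may cap) and draining the socket on a dedicated thread into an in-process queue (sizes are arbitrary examples):

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Drain the socket as fast as possible so slow processing does not cause
    // kernel-level drops; workers consume from the in-process queue instead.
    class FastDrainReceiver {
        private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>(10_000);

        void run(int port) throws Exception {
            DatagramSocket socket = new DatagramSocket(port);
            socket.setReceiveBufferSize(4 * 1024 * 1024); // ask the OS for a bigger queue
            byte[] buf = new byte[65_507];                // max UDP payload
            while (true) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);                   // return to receive() as quickly as possible
                byte[] copy = java.util.Arrays.copyOf(packet.getData(), packet.getLength());
                if (!queue.offer(copy)) {
                    // our own queue is full: we drop too, just later than the kernel would
                }
            }
        }
    }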
But the most important thing to do is to accept data loss! If your application/server cannot handle this, do not use UDP.
There are queues, but they are always limited, and you can always reach the limit. There is no way to avoid this issue completely. You can minimise the impact by keeping the load on your servers low, so they can drain the queues quickly, and by using a dedicated network for UDP traffic.
Generally you have to build in some allowance for lost packets and you have to make the protocol reliable.
I'm currently translating an API from C# to Java which has a network component.
The C# version seems to keep the input and output streams and the socket open for as long as its classes are in use.
Is this correct?
Bearing in mind that the application is sending commands and receiving events based on user input, is it more sensible to open a new socket stream for each "message"?
I'm maintaining a ServerSocket for listening to the server throwing events but I'm not so sure that maintaining a Socket and output stream for outbound comms is such a good idea.
I'm not really used to Socket programming. As with many developers I usually work at the application layer when I need to do networking and not at the socket layer, and it's been 5 or 6 years since I did this stuff at university.
Cheers for the help. I guess this is more asking for advice than for a definitive answer.
There is a trade off between the cost of keeping the connections open and the cost of creating those connections.
Creating connections costs time and bandwidth. You have to do the 3-way TCP handshake, launch a new server thread, ...
Keeping connections open costs mainly memory and connections. Network connections are a resource limited by the OS. If you have too many clients connected, you might run out of available connections. It will cost memory as you will have one thread open for each connection, with its associated state.
The right balance will be different based on the usage you expect. If you have a lot of clients connecting for short periods of time, it's probably going to be more efficient to close the connections. If you have a few clients connecting for long periods of time, you should probably keep the connections open.
If you've only got a single socket on the client and the server, you should keep it open for as long as possible.
If your application and the server it talks to are close, network-wise, it MAY be sensible to close the connection, but if they're distant, network-wise, you are probably better off letting the socket live for the duration.
Guillaume mentioned the 3-way handshake, which basically means that opening a socket takes a minimum of 3 times the shortest packet transit time. That can be approximated by "half the ping round-trip" and can easily reach 60-100 ms for long distances. If you end up with an additional 300 ms wait for each command, will that impact the user experience?
Personally, I would leave the socket open; it's easier and doesn't cost time for every instance of "need to send something", and the relative cost is small (one file descriptor, a bit of memory for the data structures in user-space, and some extra storage in the kernel).
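A sketch of the long-lived variant, with placeholder host/port and a simple length-prefixed framing that is an assumption, not part of the original API:

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.net.Socket;

    // One long-lived connection reused for every command. setKeepAlive lets TCP
    // detect a silently dead peer during long idle periods; setTcpNoDelay avoids
    // Nagle-induced latency on small command packets.
    class CommandConnection {
        private final Socket socket;
        private final DataOutputStream out;

        CommandConnection(String host, int port) throws IOException {
            socket = new Socket(host, port);   // placeholder host/port
            socket.setKeepAlive(true);
            socket.setTcpNoDelay(true);
            out = new DataOutputStream(socket.getOutputStream());
        }

        void sendCommand(byte[] command) throws IOException {
            out.writeInt(command.length);      // length-prefixed framing (an assumption)
            out.write(command);
            out.flush();
        }
    }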
It depends on how frequently you expect the user to type in commands. If it happens quite infrequently, you could perhaps close the sockets. If it is frequent, creating sockets repeatedly can be an expensive operation.
Now having said that, how expensive, in terms of machine resources, is it to have a socket connection open for infrequent data? Why exactly do you think that "maintaining a Socket and output stream for outbound comms is not such a good idea" (even though it seems the right thing to do)? On the other hand, this is different for file streams if you expect that other processes might want to use the same file. Closing the file stream quickly in this case would be the way to go.
How likely is it that you are going to run out of the many TCP connections you can create, which other processes making outbound connections might want to use? Or do you expect to have a large number of clients connecting to your server at a time?
You can also look at DatagramSocket and DatagramPacket. The advantage is lower overhead; the disadvantage is that you lose the delivery guarantees that a regular (TCP) Socket provides.
I suggest you look at using an existing messaging solution like ActiveMQ or Netty. This will handle a lot of the issues you may find with messaging.
I am coming a bit late, but I didn't see anyone suggest that.
I think it would be wise to consider pooling your connections (no matter if Socket or TCP): being able to keep a couple of connections open and quickly reuse them in your code base would be optimal for performance; see the sketch after the link below.
In fact, the Roslyn compiler uses this technique extensively in a lot of places.
https://github.com/dotnet/roslyn/search?l=C%23&q=pooled&type=&utf8=%E2%9C%93
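In Java terms, a minimal pooling sketch might look like this (host, port, and pool size are placeholders; a real pool would also validate and replace broken sockets):

    import java.io.IOException;
    import java.net.Socket;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // A handful of pre-opened sockets handed out and returned via a queue.
    class SocketPool {
        private final BlockingQueue<Socket> pool = new LinkedBlockingQueue<>();

        SocketPool(String host, int port, int size) throws IOException {
            for (int i = 0; i < size; i++) {
                pool.add(new Socket(host, port)); // pay the handshake cost once, up front
            }
        }

        Socket acquire() throws InterruptedException {
            return pool.take();                   // blocks if every socket is in use
        }

        void release(Socket socket) {
            if (!socket.isClosed()) pool.offer(socket); // broken sockets should be replaced instead
        }
    }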