About Java ServerSocket Accept: busy-waiting? - java

I'm reading TCP/IP Sockets in Java. About ServerSocket, it says:
When we call accept() on that ServerSocket instance, if a new connection is pending, accept() returns immediately; otherwise it blocks until either a connection comes in or the timer expires, whichever comes first. This allows a single thread to handle multiple connections. Unfortunately, the approach requires that we constantly poll all sources of I/O, and that kind of “busy waiting” approach again introduces a lot of overhead from cycling through connections just to find out that they have nothing to do.
As I understand it, shouldn't the thread be "notified" when a connection comes in, and therefore not be "busy waiting"? Did I misunderstand something?
-----------------EDIT----------------------
The whole paragraph is as below:
Because of these complications, some programmers prefer to stick with a single-threaded approach, in which the server has only one thread, which deals with all clients—not sequentially, but all at once. Such a server cannot afford to block on an I/O operation with any one client, and must use nonblocking I/O exclusively. Recall that with nonblocking I/O, we specify the maximum amount of time that a call to an I/O method may block (including zero). We saw an example of this in Chapter 4, where we set a timeout on the accept operation (via the setSoTimeout() method of ServerSocket). When we call accept() on that ServerSocket instance, if a new connection is pending, accept() returns immediately; otherwise it blocks until either a connection comes in or the timer expires, whichever comes first. This allows a single thread to handle multiple connections. Unfortunately, the approach requires that we constantly poll all sources of I/O, and that kind of “busy waiting” approach again introduces a lot of overhead from cycling through connections just to find out that they have nothing to do.

It's mostly nonsense, even in the entire quotation. Either you are using blocking I/O, in which case you need a thread per connection, and another per accept() loop, or you are using non-blocking I/O, in which case you have select(), or, from Java 7, you are using Asynchronous I/O, in which case it is all callbacks into completion handlers. In none of these cases do you need to poll or busy-wait.
I think he must be referring to using blocking mode with very short timeouts, but it's really most unclear.
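To make the pattern the book seems to describe concrete, here is a minimal sketch, assuming the "blocking mode with short timeouts" reading above: setSoTimeout() caps how long accept() blocks, and a SocketTimeoutException means the timer expired with nothing pending. The names TimedAccept, tryAccept, idleWakeUps and demo are made up for illustration. Every idle round is a wake-up that found nothing to do, which is the overhead the book calls "busy waiting".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.Optional;

class TimedAccept {
    // accept() blocks for at most timeoutMillis (via setSoTimeout), then gives up
    // so that the same thread can go and check its other I/O sources.
    static Optional<Socket> tryAccept(ServerSocket server, int timeoutMillis) {
        try {
            server.setSoTimeout(timeoutMillis);
            return Optional.of(server.accept());
        } catch (SocketTimeoutException e) {
            return Optional.empty(); // timer expired, nothing pending
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The single-threaded loop: the thread wakes up every timeoutMillis whether or not
    // anything happened. Each round would also poll every client socket (not shown).
    static int idleWakeUps(ServerSocket server, int rounds, int timeoutMillis) {
        int idle = 0;
        for (int i = 0; i < rounds; i++) {
            Optional<Socket> accepted = tryAccept(server, timeoutMillis);
            if (!accepted.isPresent()) {
                idle++; // woke up, found nothing to do
                continue;
            }
            try {
                accepted.get().close(); // a real server would handle the client here
            } catch (IOException ignored) {
            }
        }
        return idle;
    }

    // Opens a loopback listener, optionally connects one client, and counts idle rounds.
    static int demo(int rounds, boolean connectOneClient) {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, lo);
             Socket client = connectOneClient ? new Socket(lo, server.getLocalPort()) : null) {
            return idleWakeUps(server, rounds, 20);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

demo(3, false) should count 3 idle wake-ups, and demo(3, true) should count 2, because the one connected client is already waiting in the backlog on the first round. With select() or asynchronous I/O, none of those idle wake-ups happen.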

Related

Passing a Java socket from thread A to B

In a server, there is a thread A listening for incoming connections, typically looping forever. When a connection is accepted, thread A creates a task (say, class Callable in Java) and submits it to an Executor.
All this really means is that A lost the reference to the socket, and that now there’s a thread B (created by the Executor) that manages the socket. If B experiences any exception, it would close the socket, and there is no risk that the socket, as an operating system resource, will not be reclaimed.
This is all fine if thread B starts. But what if the executor was shut down before B had a chance to get scheduled?
Does anyone think this is an issue? If the reference to the socket is lost due to this, would the garbage collector close the socket?
Yes, it sounds like an issue.
The OS will probably eventually free up the socket (at least if it's TCP, as far as I can tell) but it will probably take a relatively long time.
I don't think the garbage collector plays a role in this case. At least not for threads, which after having been started will usually keep running even if there is no reference to them in the code (this is true at least for non-daemon threads). Sockets may behave in a similar manner.
If you cannot guarantee the connection is going to be processed (by starting the handling Thread instance as soon as it is established), then you should keep a reference to the socket and make sure you close all of them as soon as possible, which probably means right after ExecutorService.shutdown() or a similar method has been called.
Please note that, depending on how you ask the executor to shut down, it will either still run or discard tasks that have already been submitted for execution but haven't started yet. So be sure to make your code behave accordingly.
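A small sketch of that difference using plain JDK executors (the class ShutdownModes and its methods are hypothetical): shutdown() stops accepting new tasks but still runs the ones already queued, while shutdownNow() interrupts running tasks and returns the queued ones without ever running them. If those queued tasks own sockets, their finally blocks never execute, which is exactly the leak the question is about.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ShutdownModes {
    // One slow task occupies the only worker; three quick tasks wait in the queue.
    // Returns how many of the queued tasks actually ran after the pool was shut down.
    static int queuedTasksThatRan(boolean useShutdownNow) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicInteger ran = new AtomicInteger();
        pool.execute(() -> sleepQuietly(200));
        for (int i = 0; i < 3; i++) {
            pool.execute(() -> { ran.incrementAndGet(); }); // imagine each one owns a Socket
        }
        if (useShutdownNow) {
            // Interrupts the running task and hands back the queued ones unstarted:
            // any finally { socket.close(); } inside them will never execute.
            List<Runnable> neverStarted = pool.shutdownNow();
            System.out.println(neverStarted.size() + " task(s) were dropped");
        } else {
            pool.shutdown(); // rejects new tasks but still drains the queue
        }
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ran.get();
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shutdownNow() lands here
        }
    }
}
```

queuedTasksThatRan(false) should return 3 and queuedTasksThatRan(true) should return 0.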
Also if you have limited resources (available threads) to process incoming socket connections and don't want them to grow too much, consider closing them immediately after having been accepted so they don't pile up in the unprocessed wait queue, if this is feasible in your project. The client can then retry connecting at a later time. If you still need to consume connections as soon as they come in, consider a non-blocking I/O approach, which will tend to scale better (and up to a point).
If the reference to the socket is lost due to this, would the garbage collector close the socket?
Probably. But the garbage collector may not run until literally the end of next week: You can't rely on the GC running, pretty much ever, just because 'hey, java has a garbage collector'. It does, and it won't kick in until needed. It may simply never be needed.
Depending on the GC to close resources is a fine way to get your VM killed by the OS for using up too many system resources.
The real question is: What is the causal process that results in shutting down the executor?
If there is some sort of 'cancel all open connections' button, and you implemented that as a one-liner: queue.shutdown(), then, no - that is not a good idea: You'll now be leaning on the GC to clean up those sockets which is bad.
I assume your callables look like:
Socket socket = ....; // obtained from queue
Callable<Void> socketHandler = () -> {
    try {
        // all actual handling code is here.
    } finally {
        socket.close();
    }
    return null;
};
then yeah that is a problem: If the callable is never even started, that finally block won't run. (If you don't have finally you have an even bigger problem - that socket won't get cleaned up if an exception occurs during the handling of it!).
One way out is to have a list of sockets, abstract away the queue itself, and have that abstraction have a shutdown method which both shuts down the queue and closes every socket, guarding every step (both the queue shutdown as well as all the socket.close commands) with a try/catch block to ensure that a single exception in one of these steps won't just stop the shutdown process on the spot.
Note that a bunch of handlers are likely to still be chugging away, so closing the socket 'out from under them' like this will cause exceptions in the handlers. If you don't want that, shut down the queue, then await termination (guarded with try/catch stuff), and then close all the sockets.
You can close a closed socket; that is a no-op, so there is no need to check first and no need to worry about the impact of closing a ton of already-closed sockets.
But do worry about keeping an object reference to an ever-growing list of sockets. Once a socket is completely done with, get rid of it, also from this curated list of 'stuff you need to close if the queue is terminated'.
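A minimal sketch of that abstraction, under stated assumptions: SocketTaskQueue, submit, shutdown and demo are hypothetical names, handlers are modelled as Consumer<Socket>, and the accept thread ("thread A") would call queue.submit(serverSocket.accept(), handler). It shuts the pool down, waits a bounded time for running handlers, then closes every socket still tracked, each close in its own try/catch. Finished sockets remove themselves from the set, so it does not grow without bound.

```java
import java.io.IOException;
import java.net.Socket;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical wrapper: owns both the executor and every socket handed to it.
class SocketTaskQueue {
    private final ExecutorService pool;
    private final Set<Socket> open = ConcurrentHashMap.newKeySet();

    SocketTaskQueue(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    // Called by the accept thread for every accepted socket.
    void submit(Socket socket, Consumer<Socket> handler) {
        open.add(socket); // tracked before the task could possibly be dropped
        try {
            pool.execute(() -> {
                try {
                    handler.accept(socket);
                } finally {
                    closeQuietly(socket);
                    open.remove(socket); // finished: stop tracking it
                }
            });
        } catch (RejectedExecutionException e) { // queue already shut down: refuse the client
            closeQuietly(socket);
            open.remove(socket);
        }
    }

    // Stop accepting work, let running handlers finish for a while, then close leftovers.
    void shutdown(long timeout, TimeUnit unit) {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(timeout, unit)) {
                pool.shutdownNow(); // drops queued tasks; their sockets stay in `open`
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        for (Socket s : open) {
            closeQuietly(s); // each close guarded, so one failure cannot stop the rest
        }
        open.clear();
    }

    private static void closeQuietly(Socket s) {
        try {
            s.close(); // closing an already-closed socket is a no-op
        } catch (IOException | RuntimeException e) {
            // log and keep going in real code
        }
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // One handler stuck for 5 s, one queued behind it; both sockets should end up closed.
    static boolean demo() {
        SocketTaskQueue q = new SocketTaskQueue(1);
        Socket busy = new Socket();   // unconnected stand-ins for accepted sockets
        Socket queued = new Socket();
        q.submit(busy, s -> sleepQuietly(5_000));
        q.submit(queued, s -> { });
        q.shutdown(100, TimeUnit.MILLISECONDS);
        return busy.isClosed() && queued.isClosed();
    }
}
```

Unconnected new Socket() instances stand in for accepted connections so the sketch runs without a network; demo() should return true. A socket is always closed before it is removed from the set, so any socket no longer tracked is already closed.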
Of course, if the only process that leads to early queue termination is that you want to shut down the VM, don't worry about it. The sockets go away with the VM. In fact, there is no need to shut down the queue. If you intend to end the VM, just end it, immediately: System.exit(0) is what you want. There is no such thing as 'but.. I should ask all the things to shut down nicely!'. That IS how you ask. Systems that need to clean up resources are mostly badly designed (design them so that they don't need cleanup on VM shutdown; all the resources work that way, for example), and if you must, register a shutdown hook.

Number of threads for NioEventLoopGroup with persistent connections

I would like to use Java Netty to create a TCP server for a large number of persistent connections from clients. In other words, imagine that there are 1000 client devices out there, and all of them create and maintain a persistent connection to the TCP server. There will be a reasonable amount of traffic (mostly lines of text) going back and forth across each of these persistent connections. How can I determine the best number of threads to use in the boss and worker groups for NioEventLoopGroup?
My understanding is that when the connection is created, Netty creates a SimpleChannelInboundHandler<String> object to handle the connection. When the connection is created, the handler's channelActive method is called, and every time it gets a new message from the client, the messageReceived method gets called (or the channelRead0 method in Netty 4.0.24).
Is my understanding correct?
What happens if I have long-running code to run in messageReceived? Do I need to launch this code in yet another thread (java.util.Thread)?
What happens if my messageReceived method blocks on something or takes a long time to complete? Does that bring Netty to a grinding halt?
Basically I need to write a TCP socket server that can serve a large number of persistent connections as quickly as possible.
Is there any guidance available on number of threads for NioEventLoopGroup and on how to use any threads inside the handler?
Any help would be greatly appreciated.
How can I determine the best number of threads to use in the boss and worker groups for NioEventLoopGroup?
About the boss thread: if you need persistent connections, there is no sense in using a lot of boss threads, because boss threads are only responsible for accepting new connections. So I would use only one boss thread.
The number of worker threads should depend on the number of processor cores.
Don't forget to add -XmsYYYYM and -XmxYYYYM as your VM options, because without them you can run into the case where your JVM is not using all cores.
What happens if I have long running code to run in messageReceived - do I need to launch this code in yet another thread (java.util.Thread)?
Do you really need to? Perhaps you should implement your logic another way; if not, you should probably consider OIO (blocking I/O) with a new thread for each connection.
What happens if my messageReceived method blocks on something or takes a long time to complete?
You should avoid using thread blocking actions in your handlers.
Does that bring Netty to a grinding halt?
Yep, it does.
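To see why a blocking handler stalls things, here is a JDK-only sketch. It is not Netty code: a single-threaded ExecutorService stands in for one event loop (an assumption made so it runs without Netty on the classpath), and every message assigned to that loop runs on that one thread. A 500 ms blocking call in one "handler" delays the next message by about the same amount; handing the slow part to a separate pool keeps the loop responsive. In Netty itself, a handler can also be added to the pipeline with its own EventExecutorGroup (such as DefaultEventExecutorGroup) for this purpose.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class EventLoopStall {
    // How long (ms) a quick message waits when it shares a one-thread "event loop"
    // with a message whose handling needs 500 ms of blocking work.
    static long quickTaskLatencyMillis(boolean offloadSlowWork) {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor(); // stand-in for one event loop
        ExecutorService blockingPool = Executors.newCachedThreadPool();  // separate pool for slow work
        try {
            long start = System.nanoTime();
            eventLoop.submit(() -> { // handler for message #1
                Runnable slowWork = () -> sleepQuietly(500);
                if (offloadSlowWork) {
                    blockingPool.submit(slowWork); // loop thread is free again at once
                } else {
                    slowWork.run();                // loop thread is stuck for 500 ms
                }
            });
            // Message #2, from another connection served by the same loop.
            Future<Long> quick = eventLoop.submit(() -> (System.nanoTime() - start) / 1_000_000);
            return quick.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            eventLoop.shutdownNow();
            blockingPool.shutdownNow();
        }
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

quickTaskLatencyMillis(false) should come out at around 500 ms, and quickTaskLatencyMillis(true) at a few milliseconds.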

Java NIO - non-blocking channels vs AsynchronousChannels

Java NIO offers SocketChannel and ServerSocketChannel, which can be set to non-blocking mode (asynchronous). Most of the operations return a value that indicates either success or that the operation is not yet done. What is the purpose of AsynchronousSocketChannel and AsynchronousServerSocketChannel then, apart from the callback functionality?
which can be set to non-blocking mode (asynchronous)
There's your misapprehension, right there. Non-blocking mode is different from asynchronous mode.
A non-blocking operation either transfers data or it doesn't. In either case there is no blocking, and the operation is complete once it returns. This mode is supported by SocketChannel, DatagramChannel, and Selector.
An asynchronous operation starts when you call the method and continues in the background, with the result becoming available at a later time via a callback or a Future. This mode is supported by the AsynchronousSocketChannel etc classes you mention in your question.
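The distinction shows up clearly with accept(). A minimal sketch, with made-up method names: in non-blocking mode ServerSocketChannel.accept() returns immediately and yields null when nothing is pending, so the operation is finished either way; AsynchronousServerSocketChannel.accept() also returns immediately, but with a Future for an operation that completes later, in the background, once a client connects.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class NonBlockingVsAsync {
    static final InetSocketAddress ANY_LOOPBACK_PORT =
            new InetSocketAddress(InetAddress.getLoopbackAddress(), 0);

    // Non-blocking: the call is complete when it returns; null means "nothing pending".
    static boolean nonBlockingAcceptReturnsNullWhenIdle() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(ANY_LOOPBACK_PORT);
            server.configureBlocking(false);
            return server.accept() == null; // no client has connected
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Asynchronous: the call returns at once, but the operation keeps running in the
    // background and its result arrives later through the Future.
    static boolean asyncAcceptCompletesLater() {
        try (AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel.open()) {
            server.bind(ANY_LOOPBACK_PORT);
            Future<AsynchronousSocketChannel> pending = server.accept();
            boolean doneBeforeAnyClient = pending.isDone(); // expected: false
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 AsynchronousSocketChannel accepted = pending.get(5, TimeUnit.SECONDS)) {
                return !doneBeforeAnyClient && accepted.isOpen();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both methods should return true.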
The AsynchronousSocketChannel and AsynchronousServerSocketChannel come into their own when using the methods that take a CompletionHandler.
For example the code in a server might look like this:
asynchronousServerSocketChannel.accept(null, new ConnectionHandler());
Where ConnectionHandler is an implementation of CompletionHandler that deals with client connections (the first argument is the attachment, which can be null).
The thread that makes the accept call can then continue doing other work, and the NIO API will deal with scheduling the callback to the CompletionHandler when a client connection is made (I believe this is an OS-level interrupt).
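A sketch of what such a handler could look like. ConnectionHandler and acceptsAll are hypothetical names, and closing each client immediately stands in for real handling. The usual pattern is to call accept() again from completed(), because only one accept may be outstanding on the channel at a time; none of your own threads ever sits blocked in accept().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ConnectionHandlerDemo {
    // Hypothetical handler; the NIO.2 runtime calls it on one of its own threads.
    static class ConnectionHandler implements CompletionHandler<AsynchronousSocketChannel, Void> {
        private final AsynchronousServerSocketChannel server;
        private final CountDownLatch accepted;

        ConnectionHandler(AsynchronousServerSocketChannel server, CountDownLatch accepted) {
            this.server = server;
            this.accepted = accepted;
        }

        @Override
        public void completed(AsynchronousSocketChannel client, Void attachment) {
            server.accept(null, this); // re-arm first: ask for the next connection
            accepted.countDown();
            try {
                client.close(); // real code would start an asynchronous read here instead
            } catch (IOException ignored) {
            }
        }

        @Override
        public void failed(Throwable exc, Void attachment) {
            // e.g. AsynchronousCloseException once the server channel is closed
        }
    }

    // Accepts `clients` connections without any of our threads blocking in accept().
    static boolean acceptsAll(int clients) {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(lo, 0));
            CountDownLatch accepted = new CountDownLatch(clients);
            server.accept(null, new ConnectionHandler(server, accepted)); // returns immediately
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
            for (int i = 0; i < clients; i++) {
                new Socket(lo, port).close(); // connect, then hang up
            }
            return accepted.await(5, TimeUnit.SECONDS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

acceptsAll(3) should return true once the handler has run for all three connections.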
The alternative code might look like this:
SocketChannel socketChannel = serverSocketChannel.accept();
Depending on the mode, the calling thread is now either blocked until a client connection is made, or gets null back immediately, leaving you to poll. In both cases, it's you that has to deal with the threads, which generally means more work.
At the end of the day, you take your pick based on your particular use case, though I've generally found that the former produces clearer, more reliable code.

Must a listening socket run in its own thread?

I have some problems understanding how a socket should be handled. I get that the server socket must run in its own thread, because it must check whether there are new connections. Now, I'm not sure whether every socket opened by a new connection should run in its own thread.
What I have in mind is checking the state of the sockets every x units of time. If one has something to read, then read it. If not, check the next socket. I see some examples where this process is done in a thread, but I don't want a socket to do stuff; I just want to read any data it has and process it.
The answer is no, you don't need to listen in a separate thread. But realize that while you are "listening", your entire program will be waiting for that to complete before moving on.
So unless you are fine with your entire program waiting, I would suggest a separate thread.
You can also have one thread which communicates with all sockets in a round-robin manner. It checks each socket if it has new data, and when it hasn't it checks the next.
Another alternative is to use NIO (New Input/Output).
The idea behind NIO is that you have a thread with one Selector which owns multiple Channels (a channel can be a network socket or any other IO interface). You then call selector.select() in a loop. This method blocks until one or more channels have data, and then returns a set of these channels. You can then process the data the channels delivered.
Here is a tutorial.
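A minimal sketch of the Selector loop described above, as a single-threaded echo server. The names SelectorEcho, serve and echoOnce are made up, the buffer size is arbitrary, and partial writes are ignored for brevity. select() blocks until at least one registered channel is ready, so nothing is polled in a loop; note that the selected-key set has to be cleared by hand.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

class SelectorEcho {
    // One thread, one Selector, any number of connections.
    static void serve(Selector selector, ServerSocketChannel server) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        while (!Thread.currentThread().isInterrupted()) {
            selector.select(); // sleeps until a channel is ready (or wakeup/interrupt)
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove(); // the selected-key set is never cleared automatically
                if (!key.isValid()) continue;
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept(); // may be null in non-blocking mode
                    if (client != null) {
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    }
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buf.clear();
                    if (client.read(buf) < 0) { // peer closed the connection
                        key.cancel();
                        client.close();
                        continue;
                    }
                    buf.flip();
                    client.write(buf); // sketch only: a partial write would lose the rest
                }
            }
        }
    }

    // Runs the loop on one background thread, sends one message, returns the echo.
    static String echoOnce(String message) {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(lo, 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            Thread loop = new Thread(() -> {
                try {
                    serve(selector, server);
                } catch (IOException ignored) {
                    // e.g. ClosedByInterruptException when the loop is stopped below
                }
            });
            loop.setDaemon(true);
            loop.start();
            int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
            try (Socket client = new Socket(lo, port)) {
                byte[] out = message.getBytes(StandardCharsets.UTF_8);
                client.getOutputStream().write(out);
                return new String(client.getInputStream().readNBytes(out.length), StandardCharsets.UTF_8);
            } finally {
                loop.interrupt(); // wakes select(); the loop sees the flag and exits
                loop.join(1_000);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

echoOnce("hello") should return "hello".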
The problems with round-robin using available() are many.
It assumes that available() actually works, which isn't guaranteed.
It assumes that all clients need the same amount of service.
N-1 clients wait while one client is serviced.
A non-responsive client can block not only your application but all the other clients.
I'm sure there are more.
Don't do this. Use threads or NIO.

Socket vs SocketChannel

I am trying to understand SocketChannels, and NIO in general. I know how to work with regular sockets and how to make a simple thread-per-client server (using the regular blocking sockets).
So my questions:
What is a SocketChannel?
What extra do I get when working with a SocketChannel instead of a Socket?
What is the relationship between a channel and a buffer?
What is a selector?
The first sentence in the documentation is "A selectable channel for stream-oriented connecting sockets." What does that mean?
I have also read this documentation, but somehow I am not getting it...
A Socket is a blocking input/output device. It makes the thread that is using it block on reads, and potentially also on writes if the underlying buffer is full. Therefore, you have to create a bunch of different threads if your server has a bunch of open Sockets.
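For comparison with the NIO answers, a minimal thread-per-client echo server using plain blocking Sockets (class and method names are illustrative): one thread blocks in accept(), and each accepted client gets its own thread that blocks in read().

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class ThreadPerClientEcho {
    // Accept loop: blocks in accept(), then hands each client to a thread of its own.
    static void serve(ServerSocket server) {
        while (true) {
            try {
                Socket client = server.accept();
                Thread worker = new Thread(() -> echo(client));
                worker.setDaemon(true);
                worker.start();
            } catch (IOException e) {
                return; // the server socket was closed
            }
        }
    }

    // Per-client thread: read() blocks this thread only, not the others.
    static void echo(Socket client) {
        try (Socket c = client;
             InputStream in = c.getInputStream();
             OutputStream out = c.getOutputStream()) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException ignored) {
            // connection reset or closed; the thread simply ends
        }
    }

    // Starts the server, sends one message as a client, and returns what came back.
    static String echoOnce(String message) {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, lo)) {
            Thread acceptor = new Thread(() -> serve(server));
            acceptor.setDaemon(true);
            acceptor.start();
            try (Socket s = new Socket(lo, server.getLocalPort())) {
                byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
                s.getOutputStream().write(bytes);
                return new String(s.getInputStream().readNBytes(bytes.length), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The code is short and linear, which is the appeal of this style; the cost is one thread (and its stack) per open connection.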
A SocketChannel can be put into non-blocking mode, so that one thread can communicate with a bunch of open connections at once. This works by registering a bunch of SocketChannels with a Selector, then looping on the selector's select() method, which can notify you when connections are ready to be accepted, have received data, or have been closed. This allows you to communicate with multiple clients in one thread without the overhead of multiple threads and synchronization.
Buffers are another feature of NIO that allows you to access the underlying data from reads and writes to avoid the overhead of copying data into new arrays.
By now NIO is so old that few remember what Java was like before 1.4, which is what you need to know in order to understand the "why" of NIO.
In a nutshell, up to Java 1.3, all I/O was of the blocking type. And worse, there was no analog of the select() system call to multiplex I/O. As a result, a server implemented in Java had no choice but to employ a "one-thread-per-connection" service strategy.
The basic point of NIO, introduced in Java 1.4, was to make the functionality of traditional UNIX-style multiplexed non-blocking I/O available in Java. If you understand how to program with select() or poll() to detect I/O readiness on a set of file descriptors (sockets, usually), then you will find the services you need for that in NIO: you will use SocketChannels for non-blocking I/O endpoints, and Selectors for fdsets or pollfd arrays. Servers with threadpools, or with threads handling more than one connection each, now become possible. That's the "extra".
A Buffer is the kind of byte array you need for non-blocking socket I/O, especially on the output/write side. If only part of a buffer can be written immediately, with blocking I/O your thread will simply block until the entirety can be written. With non-blocking I/O, your thread gets a return value of how much was written, leaving it up to you to handle the leftover for the next round. A Buffer takes care of such mechanical details by explicitly implementing a producer/consumer pattern for filling and draining, it being understood that your threads and the kernel will not be in sync.
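The fill/flip/drain/compact cycle with short writes can be shown without a network. In this sketch, TrickleChannel is a hypothetical WritableByteChannel that accepts at most a few bytes per write(), mimicking a non-blocking SocketChannel whose send buffer is nearly full; the leftover simply stays in the ByteBuffer between rounds, with position() tracking how far draining has got.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

class PartialWriteDemo {
    // Hypothetical stand-in for a non-blocking SocketChannel with a nearly full send
    // buffer: each write() takes at most maxPerCall bytes and reports how many it took.
    static class TrickleChannel implements WritableByteChannel {
        final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        final int maxPerCall;

        TrickleChannel(int maxPerCall) {
            this.maxPerCall = maxPerCall;
        }

        @Override
        public int write(ByteBuffer src) {
            int n = Math.min(maxPerCall, src.remaining());
            for (int i = 0; i < n; i++) {
                sink.write(src.get()); // each get() advances src.position()
            }
            return n;
        }

        @Override
        public boolean isOpen() {
            return true;
        }

        @Override
        public void close() {
        }
    }

    // Keeps calling write() until the buffer is drained; returns the number of rounds.
    static int drainInRounds(TrickleChannel ch, ByteBuffer out) {
        int rounds = 0;
        while (out.hasRemaining()) {
            ch.write(out); // short write: unwritten bytes stay between position and limit
            rounds++;      // a selector loop would wait for OP_WRITE between rounds
        }
        return rounds;
    }

    // Assumes text fits in the 64-byte buffer.
    static String demo(String text, int maxPerCall) {
        TrickleChannel ch = new TrickleChannel(maxPerCall);
        ByteBuffer out = ByteBuffer.allocate(64);
        out.put(text.getBytes(StandardCharsets.US_ASCII)); // fill (producer side)
        out.flip();                                         // switch to draining
        int rounds = drainInRounds(ch, out);
        out.compact();                                      // ready to be filled again
        return rounds + " rounds: " + new String(ch.sink.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

demo("hello, world", 3) returns "4 rounds: hello, world": 12 bytes at 3 bytes per write() takes four rounds.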
Even though you are using SocketChannels, it's necessary to employ a thread pool to process channels.
Think about the scenario where you use only one thread, which is responsible for both polling select() and processing the SocketChannels selected from the Selector. If one channel takes 1 second to process and there are 10 channels in the queue, you have to wait 10 seconds before the next poll, which is intolerable. So there should be a thread pool for processing channels.
In this sense, I don't see a tremendous difference from the thread-per-client blocking-socket pattern. The major difference is that in the NIO pattern the task is smaller; it's more like thread-per-task, and tasks could be read, write, business processing, etc. For more detail, you can take a look at Netty's implementation of NioServerSocketChannelFactory, which uses one boss thread to accept connections and dispatches tasks to a pool of worker threads for processing.
If you really fancy a single thread, the bottom line is that you should at least have pooled I/O threads, because I/O operations are often orders of magnitude slower than instruction-processing cycles, and you would not want the precious single thread to be blocked by I/O. This is exactly what NodeJS does: it uses one thread to accept connections, and all I/O is asynchronous and processed in parallel by a back-end pool of I/O threads.
Is the old-style thread-per-client model dead?
I don't think so. NIO programming is complex, and multithreading is not inherently evil. Keep in mind that modern operating systems and CPUs keep getting better at multitasking, so the overhead of multithreading becomes smaller over time.
