Netty client uses only one thread - java

I'm implementing a binary protocol on top of TCP/IP, using Netty. My problem is that performance is rather poor (600 msg/s). I connect as a client to a server over a single connection. When I inspected the running instance with JTop, I saw that Netty was using one worker thread very heavily while the other five worker threads were doing nothing (0% usage). Searching the web, all I found were mentions of ExecutionHandler. But why should I need that if those six worker threads ought to be enough? Or am I misunderstanding how Netty uses these threads?
My Netty init code:
this.channelFactory = new NioClientSocketChannelFactory(this.executors, DaemonExecutors.newCachedDaemonThreadPool(), 1, 6);
this.clientBootstrap = new ClientBootstrap(channelFactory);
this.channelGroupHandler = new ClientChannelGroupHandler(this.channels);
this.clientBootstrap.getPipeline().addLast("ChannelGroupHandler", this.channelGroupHandler);
Thanks for any hints
Matous

NIO, or rather the non-blocking mode of NIO ("New" I/O), allows you to use a single thread for multiple connections, since the thread doesn't block (hence the name) on read/write operations. Blocking I/O requires a thread for each connection, because a thread blocked on one connection cannot service traffic on any other.
This makes communication more efficient, since, for one thing, you no longer pay the overhead of a thread per connection.
A decent tutorial is available here (the original Oracle tutorial seems to have vanished from the face of the Google).

The reason you only see one worker thread being used is because you are making only a single connection to the server. Had you made multiple connections, more worker threads would have been used.
If each connection's work is suited to parallelization, you can implement a handler that uses threads internally, but Netty won't do that for you.
As for the NIO/OIO distinction, it's true that the idea of NIO is to have one thread handle the events for multiple connections. However, this doesn't mean that one thread will handle all the work. The "single thread" only dispatches work to other (i.e. worker) threads.
Here is an excerpt from the Netty doc:
How threads work
There are two types of threads in a NioServerSocketChannelFactory; one is boss thread and the other is worker thread.
Boss threads
Each bound ServerSocketChannel has its own boss thread. For example, if you opened two server ports such as 80 and 443, you will have two boss threads. A boss thread accepts incoming connections until the port is unbound. Once a connection is accepted successfully, the boss thread passes the accepted Channel to one of the worker threads that the NioServerSocketChannelFactory manages.
Worker threads
One NioServerSocketChannelFactory can have one or more worker threads. A worker thread performs non-blocking read and write for one or more Channels in a non-blocking mode.
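To make the earlier point about a handler that uses threads internally more concrete, here is a minimal JDK-only sketch. It assumes nothing about Netty's API: OffloadingHandler and messageReceived are hypothetical names standing in for whatever handler you write, and doubling the message is a placeholder for real per-message work. The only idea shown is that the single I/O thread hands each message to a pool and returns at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a pipeline handler; this is not Netty API.
class OffloadingHandler {
    private final ExecutorService pool;

    OffloadingHandler(int threads) {
        // Daemon threads so a forgotten shutdown() cannot keep the JVM alive.
        this.pool = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r, "handler-pool");
            t.setDaemon(true);
            return t;
        });
    }

    // Called on the one I/O thread; queues the work and returns immediately.
    Future<String> messageReceived(int message) {
        return pool.submit(() -> {
            int result = message * 2; // placeholder for the real, expensive work
            return Thread.currentThread().getName() + ":" + result;
        });
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        OffloadingHandler handler = new OffloadingHandler(4);
        for (int i = 1; i <= 3; i++) {
            // Each result is computed on a pool thread, not on the caller.
            System.out.println(handler.messageReceived(i).get());
        }
        handler.shutdown();
    }
}
```

Note that a plain pool like this does not preserve the order of messages from the same connection, so it only fits work where ordering does not matter or is restored elsewhere.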

Related

Reactor pattern how it works with threads

I started reading the Vert.x documentation, but I didn't understand how it works or what the Reactor pattern is. I read this article, https://dzone.com/articles/understanding-reactor-pattern-thread-based-and-eve, and noticed that instead of the usual servlet-based approach (one request, one thread), the Reactor pattern uses an event-driven architecture: a single thread, called the event loop, takes a request, puts it on some sort of job queue, and registers a handler to be executed once the task has finished. The code in the handler is executed by the event loop itself, so the golden rule is: don't block the event loop.
What I don't understand is this, from the article:
Those handlers/callbacks may utilize a thread pool in multi-core environments.
So handlers use a thread pool. How is this pool different from a standard thread pool, for example the one in a servlet container such as Tomcat? How do the two concepts differ for an HTTP server if both use a thread pool to handle requests?
Thanks in advance
Forget that DZone article. Forget the Reactor pattern. Learn about asynchronous procedure calls.
There are two ways to split the work in a computer into parts: threads and tasks (in Java, tasks are Runnables). Tasks execute on a thread pool when they are ready. While they are not ready, they do not occupy a thread with its huge stack, so we can afford millions of tasks in a single JVM instance, whereas 10000 threads in a single JVM instance is problematic.
The main problem with tasks arises when a task needs data that is not ready (not yet calculated by another task, or not yet arrived over the network). In the thread world, the thread waiting for data executes a blocking operation like InputStream.read(), but tasks are not allowed to do this: they would occupy too many threads from the thread pool and all the advantages of task-based programming would be lost. So tasks are augmented with a mechanism that submits the task to the thread pool exactly when all of its parameters have arrived. A task with such a mechanism is called an asynchronous procedure call. All the event-driven architectures are variants of the asynchronous procedure call: Vert.x, RxJava, Project Reactor, Akka Actors, etc. They just present themselves as something original and do not always mention this.
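As a rough illustration of a task that is submitted to the pool only once all of its parameters have arrived, here is a small sketch using the JDK's CompletableFuture. CompletableFuture is chosen here only because it ships with the JDK; it is not claimed to be what Vert.x or the other frameworks use internally, and AsyncCallSketch and addWhenReady are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncCallSketch {
    // The addition is handed to the pool only after both inputs have arrived;
    // until then it occupies no thread.
    static CompletableFuture<Integer> addWhenReady(CompletableFuture<Integer> left,
                                                   CompletableFuture<Integer> right,
                                                   Executor pool) {
        return left.thenCombineAsync(right, Integer::sum, pool);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CompletableFuture<Integer> left = new CompletableFuture<>();
        CompletableFuture<Integer> right = new CompletableFuture<>();
        CompletableFuture<Integer> sum = addWhenReady(left, right, pool);

        left.complete(2);                 // first parameter arrives; task still not runnable
        System.out.println(sum.isDone()); // false
        right.complete(40);               // second parameter arrives; task is submitted
        System.out.println(sum.join());   // 42
        pool.shutdown();
    }
}
```

No thread waits between the two complete() calls; the only blocking call is the final join() on the main thread, which is there just to print the result.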

Blocking vs Non blocking main thread in tomcat

Normally in Tomcat, a main thread is running, and when a request comes in, it hands responsibility for servicing the request to a thread from the thread pool.
Does it matter if that main thread is blocking or non blocking in terms of scalability?
Non Blocking IO has the following advantages:
Highly scalable: you no longer need one thread per client, so the server can effectively support a larger number of clients.
High keep-alive: with blocking I/O, a thread has to stay blocked for the whole keep-alive period while waiting for the next request. Non-blocking I/O is a notification model, so it can support long keep-alive times.
Better performance under high load: blocking I/O uses one thread per connection, so it requires n threads for n connections. As n increases, performance degrades because of the additional thread context switching.
When Tomcat processes an incoming request, it assigns the connection to a thread from its thread pool.
What matters here is to finish the work on that thread as quickly as possible. You typically run blocking I/O calls on this thread, for file I/O, database access and so on.
You need to adjust the size of this thread pool appropriately to handle your expected traffic.
Essentially, when using the Java EE servlet spec, you are forced into handling your requests in a one-thread-per-incoming-connection manner.
There are a few non blocking frameworks out there. Check out http://www.playframework.org/ and Jetty ( Jetty nonblocking by default? )

Do slow connections affect Netty performance?

CODE-1
new NioServerSocketChannelFactory(Executors.newCachedThreadPool(), Executors.newCachedThreadPool(),WORKER_SIZE)
CODE-2
OrderedMemoryAwareThreadPoolExecutor executor = new OrderedMemoryAwareThreadPoolExecutor(48, 0, 0, 1, TimeUnit.SECONDS);
pipeline.addLast("executor", new ExecutionHandler(executor));
If the I/O worker thread pool size (default: 2 * number of CPUs) can be set in CODE-1, what is the purpose of adding an executor (a thread pool) to the pipeline in CODE-2?
I/O operations are done by worker threads. Does that mean a client with a slow connection or a bad network keeps an I/O worker thread busy until the data has been completely sent? If so, would increasing WORKER_SIZE help me prevent latency?
Slow connections do not normally affect Netty's threads in NIO (see the update note).
Some points about Netty's internal server threads:
By default there is only one boss thread per server port; it accepts connections and hands each one over to a worker thread.
To be precise, WORKER_SIZE is the maximum number of NioWorker runnables a server can have. For example, if the server has only one connection, there will be one worker thread. Once the number of connections grows to the point where a new connection cannot be given a worker of its own (active connections > WORKER_SIZE), connections are assigned to the existing workers in round-robin fashion.
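The round-robin assignment described above can be sketched in a few lines. This is an illustration of the idea only, not Netty's actual implementation; RoundRobinAssigner and nextWorker are hypothetical names, and a worker is represented simply by its index.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only; not Netty's actual implementation.
class RoundRobinAssigner {
    private final int workerCount;
    private final AtomicInteger nextIndex = new AtomicInteger();

    RoundRobinAssigner(int workerCount) {
        if (workerCount <= 0) {
            throw new IllegalArgumentException("workerCount must be positive");
        }
        this.workerCount = workerCount;
    }

    // Returns the index of the worker that should own the next connection.
    int nextWorker() {
        // floorMod keeps the index non-negative even after the counter overflows.
        return Math.floorMod(nextIndex.getAndIncrement(), workerCount);
    }

    public static void main(String[] args) {
        RoundRobinAssigner assigner = new RoundRobinAssigner(3);
        for (int connection = 0; connection < 5; connection++) {
            // Workers are handed out in the order 0, 1, 2, 0, 1.
            System.out.println("connection " + connection + " -> worker " + assigner.nextWorker());
        }
    }
}
```

With a single connection only worker 0 is ever returned, which matches the one busy worker thread a single-connection client would see.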
If the I/O worker thread pool size (default: 2 * number of CPUs) can be set in CODE-1, what is the purpose of adding an executor (a thread pool) to the pipeline in CODE-2?
If your upstream tasks are blocking, you should execute them in a separate thread pool using an execution handler. Otherwise NIO reads and writes will not be serviced on time (added latency). I think an execution handler will do more to reduce latency than setting WORKER_SIZE to a large value.
I/O operations are done by worker threads. Does that mean a client with a slow connection or a bad network keeps an I/O worker thread busy until the data has been completely sent? If so, would increasing WORKER_SIZE help me prevent latency?
Generally speaking, raising WORKER_SIZE beyond 2 * number of CPUs does not help, because NIO is non-blocking and, if I am not mistaken, CPU-intensive. For CPU-intensive work, a thread count of about 2 * number of CPUs is the usual choice.
Update:
NioWorker runs a loop that calls selector.select(500ms) to receive OP_READ. selector.select with a timeout is a blocking call, so if most of the connections are slow, performance may suffer(?). You can reduce the timeout in org.jboss.netty.channel.socket.nio.SelectorUtil.java and test.
The thread pools you are adding in CODE-1 are for the boss threads and the worker threads. The boss threads accept connections and pass them on to worker threads to handle.
The executor you add in CODE-2 is for handling the messages read by the worker threads.
Slow connections will not affect performance since you are using a non-blocking architecture (NIO) - which is set in Netty to not block (it could if it wanted to)

Netty threads being blocked

I have 3 ThreadPoolExecutors in my system.
One is for Netty's boss (master) threads, another is for Netty's worker threads, and the last one is for ad-hoc processing (sending requests to a mail server).
ExecutorService bossExecutors = Executors.newFixedThreadPool(1,
        new ServerThreadFactory("netty-boss"));
ExecutorService workerExecutors = Executors.newFixedThreadPool(10,
        new ServerThreadFactory("netty-worker"));
ChannelFactory factory = new NioServerSocketChannelFactory(
        bossExecutors,
        workerExecutors,
        Runtime.getRuntime().availableProcessors());
ExecutorService mailExecutor = Executors.newFixedThreadPool(40);
This works perfectly well until mailExecutor starts making requests to the mail server. Until that batch, typically 5000+ requests to the mail server, has completed, the Netty threads are blocked.
I don't understand why the Netty threads seem to be blocked during that time, since I have allocated dedicated thread pools. During that time, Netty can't process even a single request.
Any idea why it's happening or what I'm doing wrong?
Can you provide a thread-dump ?
jstack <pid>
Also, you should never use a fixed thread pool for the worker / boss thread pool. Use a cached one; this way you can be sure you never run into starvation. You should specify the worker count with the third argument of the constructor.
It sounds like a scheduling issue. You have 40 threads under heavy load, vs the availableProcessors number of threads for handling Netty work (what is your availableProcessors() count at the time you create your factory?).
So it could just be that the Netty threads are too few and are being starved since they never happen to be picked for execution compared to the 40 threads handling the mail work.
It may also be that for some reason, your worker threads are blocked on the mail threads finishing, perhaps due to some shared object that is being synchronized on (is there some queue or list of mail to be sent that the netty threads need to write to, and which the mail threads have locked while they send?).

Should any threads reside outside of the thread pool?

When using a thread pool, is it still beneficial to use individual thread objects for a specific task? I'm wondering, for a server in Java, whether the thread that listens for connections should share its resources with the other threads that are then allocated from this one listening thread. I may also be missing the point, as I'm not familiar with this concept.
Yes, singular tasks that have to run concurrently can have their own threads outside of the thread pool. Forcing every thread to be part of the pool might obscure your design because you need all kinds of machinery to make concurrent tasks look like worker threads.
I'd create two pools, one for listening and one for internal tasks. This way you're never putting your server at risk of not being able to listen for connections.
The pool for internal tasks can be small if it's only a thread now and then, but at least it's safely isolated.
Resource sharing might be necessary in cases where your server needs to maintain global application state (e.g. using an AtomicLong for the number of requests served by your server). Your main thread would typically wait, ready to accept incoming connections/requests. You then update the global state (such as a hit counter), create a new "job" based on the new request (typically a Runnable or Callable) and submit it to a thread pool (java.util.concurrent provides them).
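A minimal sketch of that flow, under simplifying assumptions: a request is modelled as a plain String rather than a socket connection, and RequestDispatcher, dispatch and hitCount are hypothetical names. The listening thread updates the shared counter, wraps the request in a Callable and submits it to a pool from java.util.concurrent.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical dispatcher; a request is modelled as a plain String here.
class RequestDispatcher {
    private final AtomicLong hits = new AtomicLong(); // shared global state
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Runs on the listening thread: count the request, wrap it in a job, submit it.
    Future<String> dispatch(String request) {
        hits.incrementAndGet();
        Callable<String> job = () -> "handled " + request;
        return pool.submit(job);
    }

    long hitCount() {
        return hits.get();
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        RequestDispatcher dispatcher = new RequestDispatcher();
        Future<String> first = dispatcher.dispatch("GET /a");
        Future<String> second = dispatcher.dispatch("GET /b");
        System.out.println(first.get());
        System.out.println(second.get());
        System.out.println("hits: " + dispatcher.hitCount());
        dispatcher.shutdown();
    }
}
```

AtomicLong is used so that the counter stays correct even if worker threads, and not only the listener, were to update it.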
The purpose of a thread pool is just to help you manage your threads. In other words, a thread pool handles the creation and termination of threads for you as well as giving work to idle threads. Threads that are blocked or waiting will not receive new tasks.
Your connection listener will probably be in an infinite loop waiting for connections and thus never be idle (although it could be in a wait state). Since this is the case, the connection listener thread will never be able to receive new tasks so it wouldn't make sense to pool it with the other threads.
Connection listening and connection handling are also two different things. From that perspective the connection listener shouldn't be pooled with the connection handlers either.
Similar to @larsman's comment, I would do whatever you feel is simpler and clearer. I have tended to use one thread pool for everything because it appeared easier to manage. You don't have to do it that way, and the listening task can be its own thread.
