After reading about the Tomcat NIO connector I still don't get one thing: is the nio connector beneficial if the application code is blocking, i.e. it blocks on reading from the database, on reading the file system, on calling external web services?
So, for example, you have a REST-like API that receives a request, reads something from the database, and returns a response. It doesn't use servlet 3 async, it just writes to the response.
I didn't find a full description of the thread pools used by the NIO connector, but I imagine it has a thread pool for handling the requests, so each request ends up in its own thread, which it can block.
If that's the case, are the benefits of NIO still there, or does the blocking code diminish the benefits of NIO (in terms of resource utilization)?
Is the nio connector beneficial if the application code is blocking...?
Yes, the NIO connector is built with the assumption that your app will block somewhere. The NIO connector basically has several socket placeholders and responds to new incoming requests until information starts getting written back.
I didn't find a full description of the thread pools used by the NIO connector
I think this is the start of your confusion. Tomcat NIO has a selector pool, not a thread pool (reference). The connector code polls each selector to see if it has incoming or outgoing bytes to send. In this sense, the selector for a given request will continue to receive information until there is enough to process the request with a Request/Response object that bridges the gap between synchronous I/O and asynchronous I/O (reference).
The polling code never blocks longer than the time it takes to serialize a packet of information, so it's free to handle new requests. The only real limitation is the amount of memory available to Tomcat. While there is a thread pool, the number of threads actually in use is much lower than the number of connections the application can handle (reference).
While there are performance differences between Tomcat Connectors (reference), the difference in raw request/response time is pretty small when the servlet itself blocks. However, the difference in the number of simultaneous requests that Tomcat can handle is vastly different when you use non-blocking I/O.
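As a rough illustration of why idle connections are cheap under NIO, here is a minimal sketch that uses the JDK's java.nio.channels API directly. It is not Tomcat's connector code; the class and method names are made up for this example. A single thread accepts and tracks every connection through one Selector, and no per-connection thread exists while the clients stay silent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; not part of Tomcat.
class IdleConnectionsDemo {

    // Opens `clients` loopback connections and accepts all of them on the
    // calling thread with a single Selector. Returns how many were accepted.
    static int acceptOnOneThread(int clients) {
        List<SocketChannel> channels = new ArrayList<>();
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0), clients);
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            // Clients connect but never send anything: they are "idle".
            for (int i = 0; i < clients; i++) {
                channels.add(SocketChannel.open(server.getLocalAddress()));
            }

            int accepted = 0;
            long deadline = System.nanoTime() + 5_000_000_000L; // give up after 5 s
            while (accepted < clients && System.nanoTime() < deadline) {
                selector.select(200); // wait up to 200 ms for readiness events
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (!key.isAcceptable()) {
                        continue;
                    }
                    SocketChannel ch;
                    while ((ch = server.accept()) != null) {
                        ch.configureBlocking(false);
                        // Only register interest in reads; a worker thread would
                        // be involved later, once bytes actually arrive.
                        ch.register(selector, SelectionKey.OP_READ);
                        channels.add(ch);
                        accepted++;
                    }
                }
            }
            return accepted;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (SocketChannel ch : channels) {
                try {
                    ch.close();
                } catch (IOException ignored) {
                    // best-effort cleanup in a sketch
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("accepted on one thread: " + acceptOnOneThread(20));
    }
}
```

A blocking connector would need one thread parked on each of those sockets. Here a worker is only needed once a request is actually readable, which is where a blocking servlet then occupies it.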
Related
I have read a lot of material to try and clearly understand the gains a Jetty Non Blocking Web Application Server can or can't offer.
So far what I understand (in part by referring to this: How do Jetty and other containers leverage NIO while sticking to the Servlet specification?) is that with a non blocking IO model a web server like Jetty runs a single (or one per CPU core) thread - the Selector thread - that determines connections that are ready for some I/O. Connections that are ready with some I/O are dispatched for processing on to an internal thread pool to process the request.
I can see how such an architecture could allow you to serve many more connections with far fewer resources. However, what I am not clear about is this:
If I wrote a servlet that ran a long running database operation using a standard JDBC driver performing blocking I/O, wouldn't the handler thread dispatched from the pool to handle this request block?
And if requests came through faster than database requests are fulfilled, the handler thread pool would exhaust at some point?
And so with an application such as this is there any benefit to be run on a Non Blocking Jetty webserver? Is the non-blocking benefit only truly accrued if the servlet itself used another layer of non-blocking access to the database? Or is there something I am missing?
Please do explain if there's some magic through which Jetty will pay less of a price for the blocking database operations than say, a blocking web server.
P.S.: For contrast, I read about Node.js here - How the single threaded non blocking IO model works in Node.js - and it seems to suggest that Node uses libuv underneath and applies other techniques to translate all blocking operations in code (such as database access and sleep()) into event callbacks, ensuring the event loop and the internal thread pool never get blocked in a blocking callback. It's still a little gobbledygook to me, but assuming that's true for Node, can Jetty promise the same, even for servlets and the like that are not written in a non-blocking way?
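The exhaustion scenario raised in this question can be reproduced without any server. The following is a minimal simulation, assuming a fixed handler pool and using Thread.sleep as a stand-in for a blocking JDBC call; the class name and parameters are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical simulation; Thread.sleep stands in for a blocking database call.
class BlockingPoolDemo {

    // Runs `requests` simulated requests, each blocking for `blockMillis`,
    // on a pool of `poolSize` handler threads. Returns wall-clock milliseconds.
    static long elapsedMillis(int poolSize, int requests, long blockMillis) {
        ExecutorService handlers = Executors.newFixedThreadPool(poolSize);
        CountDownLatch done = new CountDownLatch(requests);
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            handlers.execute(() -> {
                try {
                    Thread.sleep(blockMillis); // the handler thread is stuck here
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await(); // wait until every simulated request has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            handlers.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("2 handlers: " + elapsedMillis(2, 8, 100) + " ms");
        System.out.println("8 handlers: " + elapsedMillis(8, 8, 100) + " ms");
    }
}
```

With 8 requests of 100 ms each on 2 handlers, the batch cannot finish in much less than 400 ms, whatever connector sits in front. So a handler blocked on the database does stay blocked; what a non-blocking connector saves is the threads that would otherwise sit on idle or slow connections.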
I am new to the RESTful Webservices world and I have a question regarding how WS works.
Context:
I am developing a RESTful WS that will have a high load; at one given time I can have let's say up to 10 clients sending multiple requests. All the requests will be sent to port 80.
I am developing the WS with Jersey (Java) and deploying on a Tomcat Webserver.
Question:
Let's say we have 5 clients that send requests at the same time; each one sends 2 requests to port 80; will they be treated in FIFO order? Can we have some sort of multi-threading if let's say we don't care about the order?
It all depends on what server you use and how it is configured. The standard configuration (you have to work hard to make it non-standard) is to have multiple threads. In other words, the server usually creates or reuses a thread for each new request, so requests will almost certainly be processed in parallel.
You can actually see it inside your running code by calling java.lang.Thread.currentThread(): print the name of the current thread along with the REST request and you will see.
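A minimal sketch of that experiment, assuming the JDK's built-in com.sun.net.httpserver.HttpServer instead of Jersey on Tomcat so that it runs standalone. The /whoami path, the pool size of 4 and the "http-worker-" thread names are all made up for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo; uses the JDK's small HTTP server, not Tomcat or Jersey.
class ThreadNameDemo {

    // Sends `requests` GET requests and returns the handler thread name
    // that served each one.
    static List<String> handlerThreadNames(int requests) {
        AtomicInteger ids = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(
                4, r -> new Thread(r, "http-worker-" + ids.incrementAndGet()));
        HttpServer server = null;
        try {
            // Port 0 asks the OS for any free port.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.setExecutor(pool);
            server.createContext("/whoami", exchange -> {
                // The response body is simply the name of the handling thread.
                byte[] body = Thread.currentThread().getName()
                        .getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();

            URI uri = URI.create(
                    "http://127.0.0.1:" + server.getAddress().getPort() + "/whoami");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                try (InputStream in = uri.toURL().openStream()) {
                    names.add(new String(in.readAllBytes(), StandardCharsets.UTF_8));
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        handlerThreadNames(5).forEach(System.out::println);
    }
}
```

Which worker serves which request is up to the pool, so the exact names vary from run to run. In a Jersey resource the same Thread.currentThread().getName() call shows the container's worker thread names instead.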
To answer your question, a thread will be fetched from the thread pool to serve every request you send. The server does not care about the order; the request that arrives first is served first.
More about the servers:
I suggest you use Nginx or Apache as a reverse proxy for high performance. A thread will be fetched from the thread pool to serve each request. To improve performance, you can increase the thread pool size. However, too many threads will reduce performance, because the server spends more time switching between threads. You don't want a very large thread pool.
If you are using Apache + Tomcat, you are basically in the same situation as with Tomcat alone. But Apache is more suitable than Tomcat as the web server. In real life, companies use Apache as a reverse proxy that dispatches requests to Tomcat.
Apache and Tomcat are multithreaded servers; their performance degrades when there are too many requests. If you have to handle a lot of requests, you can use Nginx.
Nginx is an event-based server; it uses a queue to store requests and dispatches them in FIFO order. It can handle a lot of requests with far fewer threads, so its performance stays more stable as the number of requests grows. However, with an extremely large number of requests, Nginx will also be overwhelmed, as its event loop has no room for extra requests.
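The queue-plus-single-loop idea can be sketched in a few lines. This only illustrates the concept as described in this answer; it is not how Nginx is implemented (Nginx is written in C and relies on OS readiness notifications), and the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration of one event-loop thread draining a FIFO queue.
class EventLoopDemo {

    // Queues all requests, then lets a single "event-loop" thread handle
    // them one at a time in arrival order.
    static List<String> process(List<String> requests) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(requests);
        List<String> handled = new ArrayList<>();
        Thread loop = new Thread(() -> {
            String request;
            // poll() returns null once the queue is empty, which ends the loop.
            while ((request = queue.poll()) != null) {
                // A handler must be short and non-blocking: while it runs,
                // every request queued behind it waits.
                handled.add("handled " + request);
            }
        }, "event-loop");
        loop.start();
        try {
            loop.join(); // joining also makes the loop's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("a", "b", "c")));
    }
}
```

The trade-off is visible in the loop body: one thread can serve many requests only as long as no handler blocks.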
Companies deal with this situation by using distributed-system concepts, for example a load balancer. But that goes a little beyond your question. Check this article and this article to get a better idea of Nginx and Apache.
Writing any kind of web server in Java (be it a webserver, RESTful webapp or a microservice) you get to use Sockets for dual channel communication between client and server.
Using the common Socket and ServerSocket classes is trivial, but since Sockets are blocking, you end up creating a thread for each request. With this threaded approach, your server will work perfectly but won't scale very well.
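For reference, a minimal sketch of that thread-per-connection approach using only java.net.ServerSocket and Socket. The class name and the one-line echo protocol are invented for illustration, and a real server would need timeouts, limits and error handling.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical one-line echo server: one new thread per accepted connection.
class ThreadPerConnectionEcho {

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(
                new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        // autoFlush = true so println() pushes the line out immediately
        return new PrintWriter(
                new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Runs on its own thread: blocks in readLine() until the client sends a line.
    static void handle(Socket socket) {
        try (Socket s = socket; BufferedReader in = reader(s); PrintWriter out = writer(s)) {
            out.println(in.readLine());
        } catch (IOException e) {
            // connection dropped; nothing more to do in this sketch
        }
    }

    // Starts the server on a free loopback port, sends one line, returns the echo.
    static String echoOnce(String message) {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            Thread acceptor = new Thread(() -> {
                try {
                    while (true) {
                        Socket socket = server.accept();          // blocks
                        new Thread(() -> handle(socket)).start(); // thread per connection
                    }
                } catch (IOException closed) {
                    // the server socket was closed; stop accepting
                }
            });
            acceptor.setDaemon(true);
            acceptor.start();

            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
                 BufferedReader in = reader(client);
                 PrintWriter out = writer(client)) {
                out.println(message);
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("ping"));
    }
}
```

Every connected client costs a thread here, even while it sends nothing, which is the scaling problem the question refers to.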
The alternative is using Streams by means of SocketChannel, ServerSocketChannel and Selector, and is clearly not as trivial as common Sockets.
My question is: which of these two systems is used in production-ready code? I'm talking about medium to big projects like Tomcat, Jetty, Sparkjava and the like.
I suppose they all use the Stream approach, right?
To make a web server really scalable, you'll have to implement it with non-blocking I/O - which means that you should make it in such a way that threads will never get blocked waiting for I/O operations to complete.
Threads are relatively expensive objects. For example, each thread needs memory for its call stack. By default this is on the order of one or a few MB, which means that if you create 1000 threads, the call stacks alone will already cost you around 1 GB of memory.
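The arithmetic behind that estimate, as a small sketch. The 1 MiB figure is an assumption taken from the paragraph above; the actual default depends on the JVM and platform and can be changed with the -Xss option. The class and method names are made up.

```java
// Hypothetical helper; the per-thread stack size is an assumed input,
// not something read from the running JVM.
class ThreadStackEstimate {

    // Upper bound on memory set aside for call stacks.
    static long reservedStackBytes(int threads, long stackBytesPerThread) {
        return threads * stackBytesPerThread;
    }

    public static void main(String[] args) {
        long oneMiB = 1024L * 1024L;
        long total = reservedStackBytes(1000, oneMiB);
        System.out.println(total / oneMiB + " MiB for 1000 threads");
    }
}
```

This is a ceiling rather than a measurement: JVMs generally reserve the stack's address space up front and only commit pages as the stack grows, so real usage is often lower.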
In a naïve server application, you might create a thread for each accepted connection (each client). This won't scale very well if you have many concurrent users.
I don't know the implementation details of servers like Tomcat and Jetty, but they are most likely implemented using non-blocking I/O.
Some info about non-blocking I/O in Tomcat: Understanding the Tomcat NIO Connector
One of the most well-known non-blocking I/O libraries in Java is Netty.
I noticed a major difference in processing time between two servlets in the same Tomcat and two servlets in separate Tomcats on the same host. The servlets communicate using HTTP. Does Tomcat or Java have some mechanism that optimizes HTTP communication within the same Tomcat or JVM? I'm trying to confirm this observation is not related to the host I'm running on.
It could be the difference between blocking and non-blocking I/O.
Tomcat uses the multi-thread model: have a pool of threads for processing requests and a queue for incoming requests. The server assigns a thread to an incoming request for processing, performs the task, sends back the response, and returns the thread to the pool. The queue handles requests that back up.
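The pool-plus-queue model described here can be imitated with java.util.concurrent.ThreadPoolExecutor. This is a sketch of the general idea, not Tomcat's executor; the class name and sizes are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo of "fixed worker pool + queue for requests that back up".
class PoolAndQueueDemo {

    // Submits `requests` blocking tasks to `workers` threads and returns how
    // many are waiting in the queue while every worker is busy.
    static int queuedWhileBusy(int workers, int requests) {
        BlockingQueue<Runnable> backlog = new ArrayBlockingQueue<>(Math.max(1, requests));
        ThreadPoolExecutor pool =
                new ThreadPoolExecutor(workers, workers, 0L, TimeUnit.SECONDS, backlog);
        // Counts the workers that have picked up a request.
        CountDownLatch busy = new CountDownLatch(Math.min(workers, requests));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < requests; i++) {
                pool.execute(() -> {
                    busy.countDown();
                    try {
                        release.await(); // simulate a request that blocks
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            busy.await();          // every worker now holds one request
            return backlog.size(); // the rest are backed up in the queue
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            release.countDown();   // let all requests finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("queued: " + queuedWhileBusy(2, 5));
    }
}
```

In Tomcat, the Connector attributes maxThreads and acceptCount play roughly these two roles (worker limit and backlog length), though the internals differ from this sketch.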
Non-blocking IO, as employed by Netty, is something different.
Perhaps the two requests are being queued up when they are processed by the same Tomcat.
Some more information about these tests. Both tests are run on SunOS 5.10 with Apache Tomcat Version 6.0.20 and jdk1.6.0_23. The HTTP transfers can involve fairly large files (5 MB). Thread handling might explain it, but timing differences of more than a factor of 10 make me suspect that no data has to transfer out of the JVM. Some form of blocking vs non-blocking might fit the timing difference.
Does any of you understand what weblogic.socket.Muxer is used for in WebLogic 8.1?
Often in thread dumps I see stack traces similar to this:
"ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" id=20 idx=0x68 tid=26709 prio=5 alive, in native, blocked, daemon
-- Blocked trying to get lock: java/lang/String#0x2b673d373c50[fat lock]
at jrockit/vm/Threads.waitForUnblockSignal()V(Native Method)
at jrockit/vm/Locks.fatLockBlockOrSpin(Locks.java:1675)[optimized]
at jrockit/vm/Locks.lockFat(Locks.java:1776)[optimized]
at jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1312)[optimized]
at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1259)[optimized]
at jrockit/vm/Locks.monitorEnter(Locks.java:2439)[optimized]
at weblogic/socket/EPollSocketMuxer.processSockets(EPollSocketMuxer.java:153)
at weblogic/socket/SocketReaderRequest.run(SocketReaderRequest.java:29)
at weblogic/socket/SocketReaderRequest.execute(SocketReaderRequest.java:42)
at weblogic/kernel/ExecuteThread.execute(ExecuteThread.java:145)
at weblogic/kernel/ExecuteThread.run(ExecuteThread.java:117)
at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
-- end of trace
It's not that I have any problems with that; it is just interesting to understand:
1) what is it doing?
2) can it affect any performance?
From the documentation (http://download.oracle.com/docs/cd/E13222_01/wls/docs100/perform/WLSTuning.html#wp1152246):
WebLogic Server uses software modules called muxers to read incoming requests on the server and incoming responses on the client. These muxers are of two primary types: the Java muxer or native muxer.
A Java muxer has the following characteristics:
Uses pure Java to read data from sockets. It is also the only muxer available for RMI clients.
Blocks on reads until there is data to be read from a socket. This behavior does not scale well when there are a large number of sockets and/or when data arrives infrequently at sockets. This is typically not an issue for clients, but it can create a huge bottleneck for a server.
Native muxers use platform-specific native binaries to read data from sockets. The majority of all platforms provide some mechanism to poll a socket for data. For example, Unix systems use the poll system and the Windows architecture uses completion ports. Native provide superior scalability because they implement a non-blocking thread model. When a native muxer is used, the server creates a fixed number of threads dedicated to reading incoming requests. BEA recommends using the default setting of selected for the Enable Native IO parameter which allows the server automatically selects the appropriate muxer for the server to use.
If the Enable Native IO parameter is not selected, the server instance exclusively uses the Java muxer. This maybe acceptable if there are a small number of clients and the rate at which requests arrive at the server is fairly high. Under these conditions, the Java muxer performs as well as a native muxer and eliminate Java Native Interface (JNI) overhead. Unlike native muxers, the number of threads used to read requests is not fixed and is tunable for Java muxers by configuring the Percent Socket Readers parameter setting in the Administration Console. See Changing the Number of Available Socket Readers. Ideally, you should configure this parameter so the number of threads roughly equals the number of remote concurrently connected clients up to 50% of the total thread pool size. Each thread waits for a fixed amount of time for data to become available at a socket. If no data arrives, the thread moves to the next socket.
Then, for those reasons, it is obviously better to use native muxers.
Here, it looks like you are using the default native muxer (weblogic.socket.EPollSocketMuxer), not the Java muxer (weblogic.socket.SocketMuxer).
I have found this link, which explains the situation pretty well:
The socket Muxer manages the server’s existing socket connections. It first determines which sockets have incoming requests waiting to be processed. It then reads enough data to determine the protocol and dispatches the socket to an appropriate runtime layer based on the protocol. In the runtime layer, the socket muxer threads determine which execute thread queue to be used and delegates the request accordingly.
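To make the "read enough to determine the protocol, then dispatch to a queue" step concrete, here is a purely illustrative sketch. It is not WebLogic code: the class, the two protocol labels and the sniffing rule (looking for an HTTP method at the start of the data) are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of protocol sniffing and per-protocol queues.
class MuxerDispatchSketch {

    // One queue per protocol, standing in for "execute thread queues".
    private final Map<String, Queue<byte[]>> executeQueues = new HashMap<>();

    // Looks only at the first few bytes, just enough to guess the protocol.
    static String sniffProtocol(byte[] firstBytes) {
        String head = new String(
                firstBytes, 0, Math.min(firstBytes.length, 8), StandardCharsets.US_ASCII);
        if (head.startsWith("GET ") || head.startsWith("POST ")
                || head.startsWith("PUT ") || head.startsWith("HEAD ")) {
            return "http";
        }
        return "other"; // anything the sketch does not recognize
    }

    // Puts the request on the queue for its protocol and returns the label.
    String dispatch(byte[] firstBytes) {
        String protocol = sniffProtocol(firstBytes);
        executeQueues.computeIfAbsent(protocol, p -> new ArrayDeque<>()).add(firstBytes);
        return protocol;
    }

    // Number of requests currently waiting on the queue for `protocol`.
    int queued(String protocol) {
        Queue<byte[]> queue = executeQueues.get(protocol);
        return queue == null ? 0 : queue.size();
    }

    public static void main(String[] args) {
        MuxerDispatchSketch muxer = new MuxerDispatchSketch();
        byte[] request = "GET / HTTP/1.1\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(muxer.dispatch(request));
    }
}
```

The muxer threads in the stack trace do the socket-reading half of this job; the queued work is then picked up by execute threads.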
For any given application server, a thread dump will show you hundreds, if not thousands, of background threads. These servers are complex beasts, and these threads are just the background plumbing doing its job.
A "muxer" is a multiplexer, which is a mechanism for combining several streams of data onto a single channel. WebLogic will be using these to exchange data with itself, or with other nodes in the cluster. At any given time, a number of those will be "blocked", since they have nothing to do.
It's almost certainly no cause for concern. If you look under the rock, you're bound to find a few ugly things underneath blinking up at you in the sunlight.