When writing any kind of web server in Java (be it a web server, a RESTful web app or a microservice), you end up using sockets for two-way communication between client and server.
Using the common Socket and ServerSocket classes is trivial, but since these sockets are blocking, you end up creating a thread for each request. With this threaded approach your server will work perfectly well, but it won't scale very well.
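To make the thread-per-request model concrete, here is a minimal sketch using only the standard `java.net` classes. The class name `ThreadPerConnectionEchoServer` and the echo behaviour are illustrative assumptions, not code taken from any real server.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical thread-per-connection echo server built on blocking java.net sockets. */
class ThreadPerConnectionEchoServer {
    private final ServerSocket serverSocket;

    ThreadPerConnectionEchoServer(int port) throws IOException {
        serverSocket = new ServerSocket(port); // port 0 lets the OS pick a free port
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    void start() {
        Thread acceptor = new Thread(() -> {
            while (!serverSocket.isClosed()) {
                try {
                    Socket client = serverSocket.accept(); // blocks until a client connects
                    // One new thread per connection: simple, but every thread reserves its own stack.
                    Thread worker = new Thread(() -> handle(client));
                    worker.setDaemon(true);
                    worker.start();
                } catch (IOException e) {
                    return; // the server socket was closed
                }
            }
        }, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    private static void handle(Socket client) {
        try (client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(client.getOutputStream(), true, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) { // this thread sits blocked while the client is idle
                out.println(line);
            }
        } catch (IOException ignored) {
            // client disconnected
        }
    }

    void stop() throws IOException {
        serverSocket.close();
    }
}
```

Every idle client pins one thread inside `readLine()`, which is the scaling cost the question describes.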
The alternative is non-blocking I/O (NIO) by means of SocketChannel, ServerSocketChannel and Selector, which is clearly not as trivial as plain sockets.
My question is: which of these two approaches is used in production-ready code? I'm talking about medium-to-large projects like Tomcat, Jetty, Sparkjava and the like.
I suppose they all use the NIO approach, right?
To make a web server really scalable, you'll have to implement it with non-blocking I/O, which means designing it so that threads never get blocked waiting for I/O operations to complete.
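A minimal sketch of that non-blocking style, assuming a simple echo protocol (the class `NonBlockingEchoServer` is hypothetical): a single thread uses a `Selector` to wait on all connections at once and only touches a channel after the OS reports it ready, so no thread is parked per connection. It deliberately skips the partial-write handling and error recovery a production server would need.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

/** Hypothetical single-threaded echo server: one Selector watches every connection. */
class NonBlockingEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;

    NonBlockingEchoServer(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(port));
        server.configureBlocking(false); // must be non-blocking to register with a Selector
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    @Override
    public void run() {
        try {
            while (selector.isOpen()) {
                selector.select(); // the only call where this thread waits
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        accept();
                    } else if (key.isReadable()) {
                        echo(key);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // selector or server channel closed: stop the loop
        }
    }

    private void accept() throws IOException {
        SocketChannel client = server.accept(); // null if no connection is actually pending
        if (client == null) return;
        client.configureBlocking(false);
        // Each connection keeps its own buffer as the key's attachment.
        client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(1024));
    }

    private void echo(SelectionKey key) {
        SocketChannel client = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        try {
            if (client.read(buf) == -1) { // returns at once; -1 means the peer closed
                client.close();           // closing the channel also cancels its key
                return;
            }
            buf.flip();
            client.write(buf); // simplification: a full server would use OP_WRITE for partial writes
            buf.compact();     // keep any unwritten bytes for the next round
        } catch (IOException e) {
            try { client.close(); } catch (IOException ignored) { }
        }
    }

    void stop() throws IOException {
        selector.close();
        server.close();
    }
}
```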
Threads are relatively expensive objects. For example, memory must be allocated for each thread's call stack. By default this is on the order of one or a few MB, which means that if you create 1000 threads, the call stacks alone will already cost you about 1 GB of memory.
In a naïve server application, you might create a thread for each accepted connection (each client). This won't scale very well if you have many concurrent users.
I don't know the implementation details of servers like Tomcat and Jetty, but they are most likely implemented using non-blocking I/O.
Some info about non-blocking I/O in Tomcat: Understanding the Tomcat NIO Connector
One of the most well-known non-blocking I/O libraries in Java is Netty.
Related
I have read a lot of material trying to clearly understand the gains a non-blocking Jetty web application server can or can't offer.
So far, what I understand (in part by referring to this: How do Jetty and other containers leverage NIO while sticking to the Servlet specification?) is that with a non-blocking I/O model, a web server like Jetty runs a single thread (or one per CPU core), the selector thread, that determines which connections are ready for some I/O. Connections that are ready for I/O are dispatched to an internal thread pool, which processes the request.
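The selector-plus-pool architecture described here can be sketched roughly as follows. This is not Jetty's code; `SelectorWithWorkerPool` is a hypothetical name and the echo logic stands in for request processing. The selector thread only detects readiness and hands the connection to a fixed `ExecutorService`, clearing the key's interest set until the worker is done so two workers never read the same connection at once.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical: one selector thread finds ready connections; a fixed pool does the work. */
class SelectorWithWorkerPool implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    private final ExecutorService workers;

    SelectorWithWorkerPool(int port, int workerThreads) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(port));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        workers = Executors.newFixedThreadPool(workerThreads);
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    @Override
    public void run() {
        try {
            while (selector.isOpen()) {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client == null) continue;
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        key.interestOps(0);                  // don't select this key while a worker owns it
                        workers.execute(() -> process(key)); // the selector thread never runs request code
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // selector closed: exit the loop
        }
    }

    /** Runs on a pool thread; a blocking call here ties up this worker, not the selector. */
    private void process(SelectionKey key) {
        SocketChannel client = (SocketChannel) key.channel();
        ByteBuffer buf = ByteBuffer.allocate(1024);
        try {
            if (client.read(buf) == -1) {
                client.close();
                return;
            }
            buf.flip();
            while (buf.hasRemaining()) {
                client.write(buf); // may spin if the socket buffer is full; acceptable for a sketch
            }
            key.interestOps(SelectionKey.OP_READ); // re-arm the key
            selector.wakeup();                     // let select() pick up the new interest set
        } catch (IOException | CancelledKeyException e) {
            try { client.close(); } catch (IOException ignored) { }
        }
    }

    void stop() throws IOException {
        selector.close();
        server.close();
        workers.shutdownNow();
    }
}
```

If `process` made a blocking JDBC call, it would occupy a pool thread for the duration of that call, which is exactly the concern raised below.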
I can see how such an architecture could allow you to serve many more connections with far fewer resources. However, what I am not clear about is this:
If I wrote a servlet that ran a long-running database operation using a standard JDBC driver performing blocking I/O, wouldn't the handler thread dispatched from the pool to handle this request block?
And if requests came in faster than the database requests are fulfilled, wouldn't the handler thread pool be exhausted at some point?
So for an application such as this, is there any benefit to running it on a non-blocking Jetty web server? Is the non-blocking benefit only truly realized if the servlet itself uses another layer of non-blocking access to the database? Or is there something I am missing?
Please do explain if there's some magic through which Jetty pays less of a price for blocking database operations than, say, a blocking web server.
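The exhaustion scenario in this question can be reproduced in isolation with a small sketch: a fixed handler pool where each task stands in for a blocking JDBC query via `Thread.sleep`. The names `BlockingHandlerDemo`, `slowQuery` and `serve` are hypothetical. With 2 threads, 6 requests of 100 ms each cannot finish in less than about 300 ms, because requests wait for a free thread regardless of how their connections were accepted.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BlockingHandlerDemo {
    /** Hypothetical stand-in for a blocking JDBC query: it just sleeps. */
    static String slowQuery(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "row";
    }

    /** Runs `requests` blocking handlers on `poolSize` threads and returns wall-clock ms. */
    static long serve(int poolSize, int requests, long queryMillis) throws InterruptedException {
        ExecutorService handlers = Executors.newFixedThreadPool(poolSize);
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            // Each task holds its pool thread for the whole "query"; extra tasks queue up.
            handlers.execute(() -> slowQuery(queryMillis));
        }
        handlers.shutdown();
        handlers.awaitTermination(1, TimeUnit.MINUTES);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

Calling `serve(6, 6, 100)` instead should finish in roughly 100 ms; in this sketch the limit is the number of handler threads available to block.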
P.S.: For contrast, I read about Node.js here (How the single threaded non blocking IO model works in Node.js). It seems to suggest that Node uses libuv underneath and applies other techniques to translate all blocking operations in code (such as database access and sleep()) into event callbacks, ensuring that the event loop and the internal thread pool never get blocked in a blocking callback. While it's still a little gobbledygook to me, assuming that's true for Node, can Jetty promise the same? Even for servlets etc. that are not written in a non-blocking way?
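The technique the question attributes to libuv (running a blocking operation on a separate thread pool and continuing via a callback) can be expressed in plain Java with `CompletableFuture`. This is a generic sketch, not something Jetty does automatically for servlets; `OffloadBlockingCall` and `findUserName` are hypothetical, and the 50 ms sleep stands in for a database call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OffloadBlockingCall {
    // Separate pool reserved for blocking work, so I/O threads are never parked inside it.
    private final ExecutorService blockingPool = Executors.newFixedThreadPool(4);

    /** Hypothetical blocking database call. */
    static String findUserName(String id) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "user-" + id;
    }

    /** Returns immediately; the result is produced later by a continuation. */
    CompletableFuture<String> handle(String id) {
        return CompletableFuture
                .supplyAsync(() -> findUserName(id), blockingPool) // blocking part runs on blockingPool
                .thenApply(name -> "{\"name\":\"" + name + "\"}");  // runs once the value is ready
    }

    void shutdown() {
        blockingPool.shutdown();
    }
}
```

The blocking call still occupies a thread; it just occupies one from a pool dedicated to blocking work rather than the thread that serves other connections.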
After reading about the Tomcat NIO connector, I still don't get one thing: is the NIO connector beneficial if the application code is blocking, i.e. it blocks on reading from the database, reading the file system, or calling external web services?
So, for example, you have a REST-like API that receives a request, reads something from the database, and returns a response. It doesn't use Servlet 3 async processing; it just writes to the response.
I didn't find a full description of the thread pools used by the NIO connector, but I imagine it has a thread pool for handling the requests, so each request ends up in its own thread, which it can block.
If that's the case, are the benefits of NIO still there, or does the blocking code diminish the benefits of NIO (in terms of resource utilization)?
Is the nio connector beneficial if the application code is blocking...?
Yes, the NIO connector is built with the assumption that your app will block somewhere. The NIO connector basically has several socket placeholders and responds to new incoming requests until information starts getting written back.
I didn't find a full description of the thread pools used by the NIO connector
I think this is the start of your confusion. Tomcat NIO has a selector pool, not a thread pool (reference). The connector code polls each selector to see if it has incoming or outgoing bytes to send. In this sense, the selector for a given request will continue to receive information until there is enough to process the request with a Request/Response object that bridges the gap between synchronous I/O and asynchronous I/O (reference).
The polling code never blocks longer than the time it takes to serialize a packet of information, so it's free to handle new requests. The only real limitation is the amount of memory available to Tomcat. While there is a thread pool, the number of threads actually used is much lower than the number of connections the application can handle (reference).
While there are performance differences between Tomcat connectors (reference), the difference in raw request/response time is pretty small when the servlet itself blocks. However, the number of simultaneous requests that Tomcat can handle is vastly larger when you use non-blocking I/O.
In our new project we need to implement a server application. This server receives connection requests from 50,000+ clients. The problem is that these connections have to remain open and have to be managed somewhere. The application should work like a telephone exchange: it receives requests from connected clients and connects them to other (possibly several) clients, but only if those clients are also connected. A proprietary protocol is used. My questions are:
How (and where) should I manage the open sockets? Should I put them in a HashMap or something? This seems odd to me, but I don't have experience with so many open connections.
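One common way to keep track of open connections is a concurrent map keyed by client ID. Below is a minimal sketch of a telephone-exchange style registry, assuming each client identifies itself with a string ID; the `Exchange` class and its `Connection` interface are hypothetical, and in a real server `Connection.send` would write to the client's socket or channel. `ConcurrentHashMap` is used instead of a plain `HashMap` because many threads register, unregister and look up clients at the same time.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical telephone-exchange style registry of currently connected clients. */
class Exchange {
    /** Hypothetical handle for one open connection (would wrap a Socket or SocketChannel). */
    interface Connection {
        void send(String message);
    }

    // Thread-safe: connection threads register, unregister and look up clients concurrently.
    private final Map<String, Connection> connected = new ConcurrentHashMap<>();

    void register(String clientId, Connection connection) {
        connected.put(clientId, connection);
    }

    void unregister(String clientId) {
        connected.remove(clientId);
    }

    /** Forwards a message only if both the sender and the target are currently connected. */
    boolean route(String fromId, String toId, String message) {
        Connection target = connected.get(toId);
        if (target == null || !connected.containsKey(fromId)) {
            return false;
        }
        target.send(fromId + ": " + message);
        return true;
    }

    /** Connects one sender to several targets; returns how many were reachable. */
    int routeToAll(String fromId, List<String> toIds, String message) {
        int delivered = 0;
        for (String toId : toIds) {
            if (route(fromId, toId, message)) {
                delivered++;
            }
        }
        return delivered;
    }
}
```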
Are there any frameworks available that support these connection requirements?
Thank you for your help!
How (and where) to manage the open sockets? Should I put them in a HashMap or something?
Typically each socket will be managed by a thread that will be responsible for reading and writing to the socket. You would also have a master thread that is responsible for receiving all connection requests at a predefined network interface & port (using the ServerSocket API class), which may then hand off the actual processing work to the worker/slave threads. In this case, you ought to be looking at a thread pool for the worker threads, because creating 50k threads will most likely overwhelm your OS and the hardware.
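The master/worker arrangement described above might look roughly like this with `ServerSocket` and a fixed thread pool. The class name `PooledAcceptorServer`, the pool size and the one-line upper-casing protocol are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical: one acceptor thread hands each accepted socket to a bounded worker pool. */
class PooledAcceptorServer {
    private final ServerSocket serverSocket;
    private final ExecutorService workers;

    PooledAcceptorServer(int port, int poolSize) throws IOException {
        serverSocket = new ServerSocket(port);
        workers = Executors.newFixedThreadPool(poolSize);
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    void start() {
        Thread acceptor = new Thread(() -> {
            while (!serverSocket.isClosed()) {
                try {
                    Socket client = serverSocket.accept();
                    workers.execute(() -> handle(client)); // queued if all workers are busy
                } catch (IOException e) {
                    return; // the server socket was closed
                }
            }
        }, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    /** Hypothetical request handling: read one line, reply in upper case, close. */
    private static void handle(Socket client) {
        try (client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(client.getOutputStream(), true, StandardCharsets.UTF_8)) {
            String line = in.readLine();
            if (line != null) {
                out.println(line.toUpperCase(Locale.ROOT));
            }
        } catch (IOException ignored) {
            // client disconnected
        }
    }

    void stop() throws IOException {
        serverSocket.close();
        workers.shutdown();
    }
}
```

Because the I/O is blocking, a worker stays occupied for as long as its client's request is in progress; with long-lived connections, a pool of N threads can serve only N clients at a time, which is why NIO is suggested for 50k concurrent sockets.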
Also, if you are indeed managing 50k concurrent sockets, using the NIO API (java.nio.*) instead of Java's plain I/O API is highly recommended, although I haven't seen many projects requiring more than 2-5k concurrent connections. There are at least two well-known NIO-based frameworks in the Java world: Apache MINA and JBoss Netty. I would, however, recommend reading the well-written NIO tutorial before moving on to the NIO API or the NIO frameworks.
I noticed a major difference in processing time between two servlets in the same Tomcat and two servlets in separate Tomcats on the same host. The servlets communicate using HTTP. Does Tomcat or Java have some mechanism that optimizes HTTP communication within the same Tomcat or JVM? I'm trying to confirm that this observation is not related to the host I'm running on.
It could be the difference between blocking and non-blocking I/O.
Tomcat uses the multi-thread model: have a pool of threads for processing requests and a queue for incoming requests. The server assigns a thread to an incoming request for processing, performs the task, sends back the response, and returns the thread to the pool. The queue handles requests that back up.
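The pool-plus-queue model described here can be illustrated with a plain `ThreadPoolExecutor` and a bounded queue. This is a generic sketch, not Tomcat's internal executor; `PoolWithQueue.fill` is a hypothetical helper that submits tasks which stay busy until released and reports how many were accepted, queued and rejected.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolWithQueue {
    /** Submits `tasks` long-running tasks and returns {accepted, queued, rejected}. */
    static int[] fill(int threads, int queueCapacity, int tasks) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1); // keeps every task busy until released
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity)); // default policy rejects when the queue is full
        int accepted = 0;
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // simulates a request that occupies its thread
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                accepted++;
            } catch (RejectedExecutionException e) {
                rejected++;
            }
        }
        int queued = pool.getQueue().size(); // requests waiting for a free thread
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return new int[] {accepted, queued, rejected};
    }
}
```

`PoolWithQueue.fill(2, 2, 5)` returns `{4, 2, 1}`: two tasks run, two wait in the queue, and the fifth is rejected because both the threads and the queue are full.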
Non-blocking IO, as employed by Netty, is something different.
Perhaps the two requests are being queued up when they are processed by the same Tomcat.
Some more information about these tests. Both tests are run on SunOS 5.10 with Apache Tomcat version 6.0.20 and jdk1.6.0_23. The HTTP transfers can involve fairly large files (5 MB). Thread handling might explain it, but timing differences of more than a factor of 10 make me suspect that no data has to be transferred out of the JVM. Some form of blocking vs. non-blocking I/O might fit the timing difference.
Does any of you understand what weblogic.socket.Muxer is used for in WebLogic 8.1?
Often in thread dumps I see stack traces similar to this:
"ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" id=20 idx=0x68 tid=26709 prio=5 alive, in native, blocked, daemon
-- Blocked trying to get lock: java/lang/String#0x2b673d373c50[fat lock]
at jrockit/vm/Threads.waitForUnblockSignal()V(Native Method)
at jrockit/vm/Locks.fatLockBlockOrSpin(Locks.java:1675)[optimized]
at jrockit/vm/Locks.lockFat(Locks.java:1776)[optimized]
at jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1312)[optimized]
at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1259)[optimized]
at jrockit/vm/Locks.monitorEnter(Locks.java:2439)[optimized]
at weblogic/socket/EPollSocketMuxer.processSockets(EPollSocketMuxer.java:153)
at weblogic/socket/SocketReaderRequest.run(SocketReaderRequest.java:29)
at weblogic/socket/SocketReaderRequest.execute(SocketReaderRequest.java:42)
at weblogic/kernel/ExecuteThread.execute(ExecuteThread.java:145)
at weblogic/kernel/ExecuteThread.run(ExecuteThread.java:117)
at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
-- end of trace
It's not that I have any problems with that; it is just interesting to understand:
1) what is it doing?
2) can it affect any performance?
From the documentation (http://download.oracle.com/docs/cd/E13222_01/wls/docs100/perform/WLSTuning.html#wp1152246):
WebLogic Server uses software modules called muxers to read incoming requests on the server and incoming responses on the client. These muxers are of two primary types: the Java muxer or native muxer.
A Java muxer has the following characteristics:
Uses pure Java to read data from sockets.
It is also the only muxer available for RMI clients.
Blocks on reads until there is data to be read from a socket. This behavior does not scale well when there are a large number of sockets and/or when data arrives infrequently at sockets. This is typically not an issue for clients, but it can create a huge bottleneck for a server.
Native muxers use platform-specific native binaries to read data from sockets. The majority of all platforms provide some mechanism to poll a socket for data. For example, Unix systems use the poll system and the Windows architecture uses completion ports. Native muxers provide superior scalability because they implement a non-blocking thread model. When a native muxer is used, the server creates a fixed number of threads dedicated to reading incoming requests. BEA recommends using the default setting of selected for the Enable Native IO parameter, which allows the server to automatically select the appropriate muxer to use.
If the Enable Native IO parameter is not selected, the server instance exclusively uses the Java muxer. This may be acceptable if there are a small number of clients and the rate at which requests arrive at the server is fairly high. Under these conditions, the Java muxer performs as well as a native muxer and eliminates Java Native Interface (JNI) overhead. Unlike native muxers, the number of threads used to read requests is not fixed and is tunable for Java muxers by configuring the Percent Socket Readers parameter setting in the Administration Console. See Changing the Number of Available Socket Readers. Ideally, you should configure this parameter so the number of threads roughly equals the number of remote concurrently connected clients, up to 50% of the total thread pool size. Each thread waits for a fixed amount of time for data to become available at a socket. If no data arrives, the thread moves to the next socket.
Then, for those reasons, it is obviously better to use native muxers.
Here, it looks like you are using the default native muxer (weblogic.socket.EPollSocketMuxer), not the Java muxer (weblogic.socket.SocketMuxer).
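The Java muxer behaviour quoted above (each reader thread waits a fixed time per socket, then moves on) can be approximated with blocking sockets and `Socket.setSoTimeout`. This only illustrates the idea and is not WebLogic's implementation; `RoundRobinSocketReader` and its `Result` record (Java 16+) are hypothetical.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.List;

/** Hypothetical: one reader thread visiting blocking sockets in turn, waiting a fixed time on each. */
class RoundRobinSocketReader {
    record Result(int socketIndex, int bytesRead) { }

    private final int waitMillis;

    RoundRobinSocketReader(int waitMillis) {
        this.waitMillis = waitMillis;
    }

    /** One pass over all sockets; returns the first socket that had data, or null. */
    Result pollOnce(List<Socket> sockets, byte[] buf) throws IOException {
        for (int i = 0; i < sockets.size(); i++) {
            Socket s = sockets.get(i);
            s.setSoTimeout(waitMillis); // bound how long read() may block on this socket
            try {
                int n = s.getInputStream().read(buf);
                if (n > 0) {
                    return new Result(i, n);
                }
            } catch (SocketTimeoutException e) {
                // no data within waitMillis: the socket stays usable, move on to the next one
            }
        }
        return null;
    }
}
```

A full pass costs up to `waitMillis` for every idle socket: with 1,000 idle sockets and a 50 ms wait, one pass can take 50 seconds, which is why the quoted documentation calls this a bottleneck for servers.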
I found this link, which explains the situation pretty well:
The socket Muxer manages the server’s existing socket connections. It first determines which sockets have incoming requests waiting to be processed. It then reads enough data to determine the protocol and dispatches the socket to an appropriate runtime layer based on the protocol. In the runtime layer, the socket muxer threads determine which execute thread queue to be used and delegates the request accordingly.
For any given application server, a thread dump will show you hundreds, if not thousands, of background threads. These servers are complex beasts, and these threads are just the background plumbing doing its job.
A "muxer" is a multiplexer, which is a mechanism for combining several streams of data onto a single channel. WebLogic will be using these to exchange data with itself, or with other nodes in the cluster. At any given time, a number of those will be "blocked", since they have nothing to do.
It's almost certainly no cause for concern. If you look under the rock, you're bound to find a few ugly things underneath blinking up at you in the sunlight.