I have a general question. If the CPU has only one core and I run multiple threads on it, each thread handling a GET request, how does the network connection survive thread switching?
What happens if one thread starts receiving a response from the server and a thread switch suddenly happens? Considering that HTTP uses TCP for communication, how would things end up?
Thanks.
TL;DR The connection will survive unless the thread gets control back too late, i.e. after the server has already terminated it by timeout.
To understand why it works this way, consider how data gets from a wire (or air) to an application.
The network interface collects data from the medium (wire) into an internal hardware buffer, and when a chunk of data is complete it raises a so-called hardware interrupt (which is just a low-level event). The OS handles the interrupt using the network interface's driver, and that chunk of data is copied into a buffer in the computer's main memory. That buffer is managed by the OS. When the application reads data from the connection, it actually reads data from that buffer.
When a thread switch happens, the contents of main memory are not lost. So when the thread gets control back, it simply proceeds with its task from the point at which it was suspended.
If the thread gets back to work only after the server has already closed the connection by timeout, an I/O error (in Java, an IOException) is thrown by the method that tries to read data from the connection.
This explanation is oversimplified and may even be wrong in the details, but it should give an overall impression of how things work.
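As a rough sketch of the scenario in the question (assuming plain Java sockets rather than any particular HTTP client; the host, request line and buffer size are placeholders), each thread issues its own GET and reads from its own socket. A thread switch in the middle of the loop only delays the reads, because already-received bytes wait in the OS buffer:

    // imports: java.io.*, java.net.Socket, java.nio.charset.StandardCharsets
    Runnable get = () -> {
        try (Socket socket = new Socket("example.com", 80)) {
            OutputStream out = socket.getOutputStream();
            out.write("GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"
                    .getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = socket.getInputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                // process buf[0..n); bytes that arrived while this thread was
                // switched out are simply waiting in the OS buffer
            }
        } catch (IOException e) {
            // thrown if the server has already reset/closed the connection by timeout
        }
    };
    new Thread(get).start();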
Related
In a server, there is a thread A listening for incoming connections, typically looping forever. When a connection is accepted, thread A creates a task (say, class Callable in Java) and submits it to an Executor.
All this really means is that A lost the reference to the socket, and that now there’s a thread B (created by the Executor) that manages the socket. If B experiences any exception, it would close the socket, and there is no risk that the socket, as an operating system resource, will not be reclaimed.
This is all fine if thread B starts. But what if the executor was shut down before B had a chance to get scheduled?
Does anyone think this is an issue? If the reference to the socket is lost due to this, would the garbage collector close the socket?
Yes, it sounds like an issue.
The OS will probably eventually free up the socket (at least if it's TCP, as far as I can tell) but it will probably take a relatively long time.
I don't think the garbage collector plays a role in this case. At least not for threads, which after having been started will usually keep running even if there is no reference to them in the code (this is true at least for non-daemon threads). Sockets may behave in a similar manner.
If you cannot guarantee that the connection is going to be processed (by starting the handling thread as soon as it is established), then you should keep a reference to each socket and make sure you close all of them as soon as possible, which in practice means right after Executor.shutdown() or a similar method has been called.
Please note that, depending on how you ask the Executor to shut down (shutdown() vs. shutdownNow()), it will either run or discard tasks that have already been submitted but haven't started yet, so be sure to make your code behave accordingly.
Also, if you have limited resources (available threads) to process incoming socket connections and don't want the backlog to grow too much, consider closing connections immediately after they are accepted so they don't pile up in the unprocessed wait queue, if that is feasible in your project. The client can then retry connecting at a later time. If you still need to consume connections as soon as they come in, consider a non-blocking I/O approach, which tends to scale better (up to a point).
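A minimal sketch of that "refuse what you can't handle" idea, assuming a plain blocking accept loop. The port, the pool size of 32 and the handle(Socket) method (assumed here to throw IOException) are invented for illustration, and the snippet belongs inside a method declared to throw IOException:

    // imports: java.io.IOException, java.net.*, java.util.concurrent.*
    ExecutorService executor = Executors.newFixedThreadPool(32);   // limited handler pool
    Semaphore handlerSlots = new Semaphore(32);                    // mirrors the pool size

    try (ServerSocket server = new ServerSocket(8080)) {
        while (true) {
            Socket client = server.accept();
            if (!handlerSlots.tryAcquire()) {
                client.close();            // refuse immediately; the client can retry later
                continue;
            }
            executor.submit(() -> {
                try {
                    handle(client);        // hypothetical handler for the accepted connection
                } catch (IOException e) {
                    // log and fall through to cleanup
                } finally {
                    try { client.close(); } catch (IOException ignored) { }
                    handlerSlots.release();
                }
            });
        }
    }

Refusing early keeps memory bounded at the cost of making the client retry, which matches the trade-off described above.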
If the reference to the socket is lost due to this, would the garbage collector close the socket?
Probably. But the garbage collector may not run until literally the end of next week: you can't rely on the GC running, pretty much ever, just because 'hey, Java has a garbage collector'. It does, and it won't kick in until needed. It may simply never be needed.
Depending on the GC to close resources is a fine way to get your VM killed by the OS for using up too many system resources.
The real question is: What is the causal process that results in shutting down the executor?
If there is some sort of 'cancel all open connections' button, and you implemented that as a one-liner: queue.shutdown(), then, no - that is not a good idea: You'll now be leaning on the GC to clean up those sockets which is bad.
I assume your callables look like:
    Socket socket = ....; // obtained from queue
    Callable<Void> socketHandler = () -> {
        try {
            // all actual handling code is here.
        } finally {
            socket.close();
        }
        return null;
    };
then yeah, that is a problem: if the callable is never even started, that finally block won't run. (If you don't have the finally you have an even bigger problem - that socket won't get cleaned up if an exception occurs while handling it!)
One way out is to keep a list of sockets, abstract away the queue itself, and give that abstraction a shutdown method which both shuts down the queue and closes every socket, guarding every step (the queue shutdown as well as each socket.close call) with a try/catch block to ensure that a single exception in one of these steps won't just stop the shutdown process on the spot.
Note that a bunch of handlers are likely to still be chugging away, so closing the socket 'out from under them' like this will cause exceptions in the handlers. If you don't want that, shut down the queue, then await termination (guarded with try/catch stuff), and then close all the sockets.
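A sketch of what that abstraction could look like. The class name ConnectionDispatcher, the pool size and the 10-second wait are invented for illustration, and shutdownNow() followed by awaitTermination() is just one reasonable combination of the options discussed above:

    // imports: java.net.Socket, java.util.Set, java.util.concurrent.*
    class ConnectionDispatcher {
        private final ExecutorService pool = Executors.newFixedThreadPool(32);
        private final Set<Socket> openSockets = ConcurrentHashMap.newKeySet();

        void dispatch(Socket socket, Callable<Void> handler) {
            openSockets.add(socket);
            pool.submit(() -> {
                try {
                    return handler.call();
                } finally {
                    socket.close();              // closing it again later is a no-op
                    openSockets.remove(socket);  // done: drop it from the curated list
                }
            });
        }

        void shutdown() {
            try {
                pool.shutdownNow();                          // queued-but-not-started tasks are discarded
                pool.awaitTermination(10, TimeUnit.SECONDS); // give running handlers a chance to finish
            } catch (Exception e) {
                // log and keep going; the sockets still need to be closed
            }
            for (Socket s : openSockets) {
                try {
                    s.close();
                } catch (Exception e) {
                    // log and continue with the remaining sockets
                }
            }
        }
    }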
You can close an already-closed socket; that is a no-op, so there is no need to check first and no need to worry about the impact of closing a ton of already-closed sockets.
But do worry about keeping an object reference to an infinitely growing list of sockets. Once a socket is completely done with, get rid of it - also from this curated list of 'stuff you need to close if the queue is terminated'.
Of course, if the only process that leads to early queue termination is that you want to shut down the VM, don't worry about it: the sockets go away with the VM. In fact, there is no need to shut down the queue at all. If you intend to end the VM, just end it immediately: System.exit(0) is what you want. There is no such thing as 'but... I should ask all the things to shut down nicely!' - that IS how you ask. Systems that need to clean up resources on VM shutdown are mostly badly designed (design them so that they don't need cleanup on VM shutdown; sockets and most other OS resources already work that way), and if you must clean something up, register a shutdown hook.
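For completeness, registering a shutdown hook is a one-liner; the dispatcher here refers to the hypothetical wrapper sketched above:

    ConnectionDispatcher dispatcher = new ConnectionDispatcher();  // hypothetical wrapper from the sketch above
    // the hook runs on normal VM exit (System.exit, SIGTERM), not on a hard kill
    Runtime.getRuntime().addShutdownHook(new Thread(dispatcher::shutdown));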
I would like to use Java Netty to create a TCP server for a large number of persistent connections from clients. In other words, imagine that there are 1000 client devices out there, and all of them create and maintain a persistent connection to the TCP server. There will be a reasonable amount of traffic (mostly lines of text) that goes back and forth across each of these persistent connections. How can I determine the best number of threads to use in the boss and worker groups for NioEventLoopGroup?
My understanding is that when the connection is created, Netty creates a SimpleChannelInboundHandler<String> object to handle the connection. When the connection is established, the handler's channelActive method is called, and every time a new message arrives from the client, the method messageReceived gets called (or the channelRead0 method in Netty 4.0.24).
Is my understanding correct?
What happens if I have long-running code to run in messageReceived - do I need to launch this code in yet another thread (java.lang.Thread)?
What happens if my messageReceived method blocks on something or takes a long time to complete? Does that bring Netty to a grinding halt?
Basically I need to write a TCP socket server that can serve a large number of persistent connections as quickly as possible.
Is there any guidance available on number of threads for NioEventLoopGroup and on how to use any threads inside the handler?
Any help would be greatly appreciated.
How can I determine the best number of threads to use in the boss and worker groups for NioEventLoopGroup?
About boss threads: if you are saying that you need persistent connections, there is no point in using a lot of boss threads, because boss threads are only responsible for accepting new connections. So I would use only one boss thread.
The number of worker threads should depend on the number of processor cores.
Don't forget to add -XmsYYYYM and -XmxYYYYM as JVM options, because without them you can run into a situation where your JVM is not making use of all cores.
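As a sketch of how those numbers plug into the bootstrap (the port, the 8192-byte line limit and the LineHandler class, which stands in for your real SimpleChannelInboundHandler<String>, are assumptions; note that a no-arg NioEventLoopGroup defaults to roughly twice the number of available processors):

    // imports: io.netty.bootstrap.ServerBootstrap; io.netty.channel.*;
    //          io.netty.channel.nio.NioEventLoopGroup; io.netty.channel.socket.SocketChannel;
    //          io.netty.channel.socket.nio.NioServerSocketChannel;
    //          io.netty.handler.codec.LineBasedFrameDecoder; io.netty.handler.codec.string.*;
    public class TextLineServer {

        // placeholder per-connection handler; your SimpleChannelInboundHandler<String> goes here
        static class LineHandler extends SimpleChannelInboundHandler<String> {
            @Override
            protected void channelRead0(ChannelHandlerContext ctx, String msg) {
                ctx.writeAndFlush("ack: " + msg + "\n");
            }
        }

        public static void main(String[] args) throws Exception {
            EventLoopGroup bossGroup = new NioEventLoopGroup(1);            // one acceptor thread is enough
            EventLoopGroup workerGroup = new NioEventLoopGroup(
                    Runtime.getRuntime().availableProcessors());            // roughly one I/O thread per core
            try {
                ServerBootstrap b = new ServerBootstrap();
                b.group(bossGroup, workerGroup)
                 .channel(NioServerSocketChannel.class)
                 .childHandler(new ChannelInitializer<SocketChannel>() {
                     @Override
                     protected void initChannel(SocketChannel ch) {
                         ch.pipeline().addLast(
                                 new LineBasedFrameDecoder(8192),   // frame the incoming lines of text
                                 new StringDecoder(),
                                 new StringEncoder(),
                                 new LineHandler());                // new instance per connection
                     }
                 });
                b.bind(9000).sync().channel().closeFuture().sync();
            } finally {
                bossGroup.shutdownGracefully();
                workerGroup.shutdownGracefully();
            }
        }
    }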
What happens if I have long-running code to run in messageReceived - do I need to launch this code in yet another thread (java.lang.Thread)?
Do you really need to? You should probably think about structuring your logic another way; if you can't, then consider OIO (blocking I/O) with a new thread for each connection.
What happens if my messageReceived method blocks on something or takes a long time to complete?
You should avoid using thread blocking actions in your handlers.
Does that bring Netty to a grinding halt?
Yep, it does.
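If the blocking work really can't be avoided, one common pattern is to hand it off to a separate executor so the event loop threads stay free, and write the response back from that executor, which Netty allows from any thread. This is only a sketch: the handler name, the pool size of 16 and blockingLookup() are invented for illustration.

    // imports: io.netty.channel.ChannelHandlerContext; io.netty.channel.SimpleChannelInboundHandler;
    //          java.util.concurrent.ExecutorService; java.util.concurrent.Executors;
    public class OffloadingHandler extends SimpleChannelInboundHandler<String> {

        // separate pool for blocking work so the Netty event loop threads stay free
        private static final ExecutorService BLOCKING_POOL = Executors.newFixedThreadPool(16);

        @Override
        protected void channelRead0(ChannelHandlerContext ctx, String msg) {
            BLOCKING_POOL.submit(() -> {
                String reply = blockingLookup(msg);   // the slow/blocking part (DB call, file I/O, ...)
                ctx.writeAndFlush(reply + "\n");      // writing from another thread is fine; Netty
                                                      // schedules the write back onto the event loop
            });
        }

        // placeholder for the long-running work described in the question
        private String blockingLookup(String msg) {
            return "result for " + msg;
        }
    }

Alternatively, Netty lets you bind a handler to a separate EventExecutorGroup when adding it to the pipeline (pipeline.addLast(group, handler)), which moves all of that handler's callbacks off the I/O threads.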
We are facing an unusual problem in our application: in the last month, our application reached an unrecoverable state. It recovered after an application restart.
Background: Our application makes a DB query to fetch some information, and this database is hosted on a separate node.
Problematic case: When the thread dump was analyzed, we saw that all the threads were in the RUNNABLE state fetching data from the database, but they hadn't finished even after 20 minutes.
After the application restart, as expected, all threads recovered, and CPU usage was also normal.
Below is the thread dump:
ThreadPool:2:47" prio=3 tid=0x0000000007334000 nid=0x5f runnable
[0xfffffd7fe9f54000] java.lang.Thread.State: RUNNABLE at
oracle.jdbc.driver.T2CStatement.t2cParseExecuteDescribe(Native Method)
at
oracle.jdbc.driver.T2CPreparedStatement.executeForDescribe(T2CPreparedStatement.java:518)
at
oracle.jdbc.driver.T2CPreparedStatement.executeForRows(T2CPreparedStatement.java:764)
at ora
All threads in the same state.
Questions:
What could be the reason for this state?
How can we recover in this case?
It's probably waiting for network data from the database server. Java threads waiting (blocked) on I/O are described by the JVM as being in the state RUNNABLE even though from the program's point of view they're blocked.
As others have already mentioned, native methods always show as RUNNABLE, since the JVM doesn't know or care what they are doing.
The Oracle drivers on the client side have no socket timeout by default. This means that if you have network issues, the client's low-level socket may get stuck there forever, resulting in a maxed-out connection pool. You could also check the network traffic towards the Oracle server to see whether it is transmitting any data at all.
When using the thin client, you can set oracle.jdbc.ReadTimeout, but I don't know how to do that for the thick (OCI) client you use; I'm not familiar with it.
What to do? Research how you can specify a read timeout for the thick ojdbc driver, and watch for exceptions related to connection timeouts, which will clearly signal network issues. If you can change the source, you can wrap the calls and retry the session when you catch timeout-related SQLExceptions.
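For reference, this is roughly what the thin-driver setting mentioned above looks like (the URL, credentials, 60-second value and query are placeholders; the thick/OCI driver used in the question is configured differently):

    // imports: java.sql.*, java.util.Properties
    Properties props = new Properties();
    props.setProperty("user", "app_user");
    props.setProperty("password", "secret");
    // milliseconds; a socket read that stalls longer than this fails instead of hanging forever
    props.setProperty("oracle.jdbc.ReadTimeout", "60000");

    try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dbhost:1521/SERVICE", props);
         PreparedStatement ps = conn.prepareStatement("SELECT 1 FROM dual");
         ResultSet rs = ps.executeQuery()) {
        // ... use the result set
    } catch (SQLException e) {
        // timeouts surface as SQLExceptions; log them and retry the session if appropriate
    }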
To quickly address the issue, terminate the connection on the Oracle server manually.
It is also worth checking for session contention; maybe a query is blocking these sessions. If you find one, you'll see which database object is causing the problem.
Does your code handle transactions manually? If so, maybe some of the code didn't commit() after changing data. Or maybe someone ran a data-modification query directly through PL/SQL or something similar and didn't commit, and that left all reading operations hanging.
When you experienced that hang and the DB recovered from that state, did you check whether some of the data had been rolled back? I'm asking because you said "It was recovered post application restart." This happens when the JDBC driver changed data but didn't commit and a timeout occurred; the DB operation will be rolled back (this can differ depending on the configuration, though).
Native methods always remain in the RUNNABLE state (OK, unless you change the state from within the native method itself, but that doesn't count).
The method can be blocked on I/O, waiting for some other event, running a long CPU-intensive task... or stuck in an endless loop.
You can make your own pick.
How can we recover in this case?
Drop the connection from the Oracle side.
Is the system or the JVM hanging?
If it is configurable and possible, reduce the number of threads / parallel connections.
The threads simply waste CPU cycles while waiting for I/O.
Yes, your CPU is unfortunately kept busy by the threads that are awaiting a response from the DB.
In a servlet-based app I'm currently writing, we have separate thread classes for readers and writers. Data is transmitted from a writer to multiple readers via a LinkedBlockingQueue<byte[]>, so a reader safely blocks if there is no new data to get from the writer. The problem is that if the remote clients served by these reader threads terminate the connection, Tomcat won't report a broken pipe unless the writer sends new data and attempts to transmit this new chunk to the remote clients. In other words, the following attack can be performed against our service:
Start a streaming write request and don't write any data to it at all.
Keep creating and dropping read connections. Since the writer doesn't produce any data, reading threads attached to it remain blocked and consume memory and other resources.
Observe the server run out of RAM quickly.
Should I create a single maintenance thread that monitors the sockets belonging to blocked reader threads and calls interrupt() on the threads whose clients appear to have disconnected? Are there any major flaws in the architecture described above? Thank you.
Sounds to me that the vulnerability lies in the fact that your readers wait forever, regardless of the state of the incoming connection (which, of course, you can't know about).
Thus a straightforward way to address this, if appropriate, would be to use the poll method on BlockingQueue rather than take. Calling poll allows you to specify a timeout, after which poll returns null if no data has been added to the queue.
In this way the readers won't stay blocked forever, and should relatively quickly fall back into the normal processing loop, allowing their resources to be freed as appropriate.
(This isn't a panacea of course; while the timeout is still running, the readers will consume resources. But ultimately a server with finite resources will have some vulnerability to a DDOS attack - and this reduces its impact to a customisably small window, instead of leaving your server permanently crippled, at least.)
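A minimal sketch of a reader loop using that approach; the 30-second window and the isClientStillConnected() check are assumptions for illustration, not part of the original design:

    // imports: java.io.*, java.util.concurrent.*
    void streamToClient(LinkedBlockingQueue<byte[]> queue, OutputStream out)
            throws InterruptedException {
        while (true) {
            byte[] chunk = queue.poll(30, TimeUnit.SECONDS); // null if nothing arrived in time
            try {
                if (chunk == null) {
                    if (!isClientStillConnected()) {
                        return;        // give up instead of blocking forever in take()
                    }
                    continue;          // client still there, keep waiting
                }
                out.write(chunk);      // throws IOException once the pipe is broken
                out.flush();
            } catch (IOException e) {
                return;                // client dropped the connection; free the thread
            }
        }
    }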
The approach I have taken in the past is to have blocking connections and only reader threads. When you want to write to multiple connections, you do that in the current thread. If you are concerned about a write blocking forever, you can have a single monitoring thread which closes blocked connections.
You can still have resources tied up in unused sockets, but I would have another thread which finds unused sockets and closes them.
This leaves you with one thread per connection, plus a couple of monitoring threads.
I have a situation where a thread opens a telnet connection to a target machine and reads data from a program which spits out all the data in its buffer. After all the data is flushed out, the target program prints a marker. My thread keeps looking for this marker to close the connection (a successful read).
Sometimes the target program does not print any marker; it keeps on dumping data and my thread keeps on reading it (no marker is printed by the target program).
So I want to read the data only for a specific period of time (say 15 minutes, configurable). Is there any way to do this at the Java API level?
Use another thread to close the connection after 15 minutes. Alternatively, you could check after each read whether 15 minutes have passed and then simply stop reading and clean up the connection, but this would only work if you're sure the remote server will continue to send data (if it doesn't, the read will block indefinitely).
Generally, no. Input streams don't provide timeout functionality.
However, in your specific case, that is, reading data from a socket, yes. What you need to do is set SO_TIMEOUT on your socket to a non-zero value (the timeout you need, in milliseconds). Any read operation that blocks for longer than the specified time will throw a SocketTimeoutException.
Watch out though: even though your socket connection is still valid after this, continuing to read from it may bring unexpected results, as you've already half consumed your data. The easiest way to handle this is to close the connection, but if you keep track of how much you've read already, you can choose to recover and continue reading.
If you're using a Java Socket for your communication, you should have a look at the setSoTimeout(int) method.
The read() operation on the socket will block only for the specified time. After that, if no data is received, a java.net.SocketTimeoutException will be raised and, if handled correctly, execution will continue.
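A sketch of that approach, assuming a plain telnet-style socket (the host parameter, port 23, the 15-minute value and the "MARKER" string are placeholders):

    // imports: java.io.*, java.net.*
    void readUntilMarker(String host) throws IOException {
        try (Socket socket = new Socket(host, 23)) {
            socket.setSoTimeout(15 * 60 * 1000);           // SO_TIMEOUT in milliseconds
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream()));
            String line;
            try {
                while ((line = in.readLine()) != null) {
                    if (line.contains("MARKER")) {          // placeholder for the real marker
                        return;                             // successful read
                    }
                    // process the line ...
                }
            } catch (SocketTimeoutException e) {
                // a single read waited too long: give up on the marker and clean up
            }
        }
    }

Note that SO_TIMEOUT bounds each individual read, not the total duration of the transfer.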
If the server really dumps data forever, the client will never be blocked in a read operation. You might thus regularly check (between reads) if the current time minus the start time has exceeded your configurable delay, and stop reading if it has.
If the client can be blocked in a synchronous read, waiting for the server to output something, then you might use a SocketChannel, and start a timer thread that interrupts the main reading thread, or shuts down its input, or closes the channel.
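And a sketch of the "another thread closes the connection" variant from the first answer above, using a scheduled task as the timer; the host parameter, port, marker string and deadline parameter are illustrative:

    // imports: java.io.*, java.net.*, java.util.concurrent.*
    void readWithDeadline(String host, long minutes) throws IOException {
        Socket socket = new Socket(host, 23);
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
        // after the deadline, closing the socket forces the pending read to fail
        ScheduledFuture<?> killSwitch = watchdog.schedule(() -> {
            try {
                socket.close();
            } catch (IOException ignored) {
            }
        }, minutes, TimeUnit.MINUTES);

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.contains("MARKER")) {          // placeholder for the real marker
                    return;                             // marker seen: normal completion
                }
            }
        } catch (SocketException e) {
            // either a real network error or the watchdog closed the socket at the deadline
        } finally {
            killSwitch.cancel(false);
            watchdog.shutdown();
            socket.close();                             // closing twice is a no-op
        }
    }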