I am trying to figure out why the close function below is being called in Netty's (3.10.5) NioWorker. The full class can be found here
if (ret < 0 || failure) {
    k.cancel(); // Some JDK implementations run into an infinite loop without this.
    close(channel, succeededFuture(channel));
    return false;
}
I am thinking there may be a couple of reasons, such as the host going down or the host closing the channel after some time, but I was hoping someone who has worked with NioWorker might know better.
I have two different systems. On one, this code is called a few times per day; on the other, it is called about 100 times per day with the same amount of traffic. I am trying to find out why this might be the case.
There are 2 conditions for when this function is called:
There was an error condition when reading from the channel, this is signalled by the failure variable
The channel has reached end of stream; this happens if the other side closes the channel cleanly (i.e., using the TCP FIN flag)
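The two conditions above can be reproduced outside Netty. The following is a minimal sketch, not Netty's actual code: the class and method names (ReadOutcomeSketch, shouldClose, peerCloseTriggersClose) are made up for illustration, and it uses a blocking loopback connection so the result is deterministic.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

final class ReadOutcomeSketch {
    /** Evaluates the same condition as the snippet: end of stream (ret < 0) or a read error. */
    static boolean shouldClose(SocketChannel ch, ByteBuffer buf) {
        int ret = 0;
        boolean failure = false;
        try {
            ret = ch.read(buf);
        } catch (IOException e) {
            failure = true; // e.g. a connection reset by the peer
        }
        return ret < 0 || failure;
    }

    /** Opens a loopback connection, closes one side cleanly, and checks what the other side sees. */
    static boolean peerCloseTriggersClose() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                client.close(); // clean close: the peer receives a FIN
                return shouldClose(accepted, ByteBuffer.allocate(64)); // read() returns -1
            }
        }
    }
}
```

Logging which of the two branches fired (ret < 0 versus failure, together with the exception message) on both systems would probably show whether the busier system sees peer-initiated closes or read errors.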
Suppose a simple network model: A has successfully created a TCP connection to B, and they are communicating with each other like this
A <----------> B
I know that if the program on A dies (such as core dump), that will cause a RST packet to B. So any read attempt of B will lead to an EOF, and any write attempt of B will lead to SIGPIPE. Am I right?
If, however, the network has broken down (such as a cable/router failure) on A, what happens to the read/write attempts of B? In my situation, all the sockets have been set to non-blocking. As a result, is it impossible for me to detect a network error?
By the way, I notice that there is a socket option SO_KEEPALIVE which may be useful to me http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/. But I wonder how much it will cost if I set the probing interval to 2~3 seconds (the default is 75 seconds)? And it seems the interval configuration is a global one, so is this going to affect all the sockets on the machine?
Final question...
Say the network has broken down and any write attempt would cause EPIPE some time later. If, however, instead of trying to write, I put this socket into an epoll device, what will happen then? Will epoll_wait return an EPOLLHUP or EPOLLERR event?
There are numerous other ways a TCP connection can go dead undetected:
Someone yanks out a network cable in between.
The computer at the other end gets nuked.
A NAT gateway in between silently drops the connection.
The OS at the other end crashes hard.
The FIN packets get lost.
Undetectable errors: a router in between the endpoints may drop packets (including control packets).
In all of these cases you find out about it only when you try to write to the socket: the write raises SIGPIPE, which terminates your program unless the signal is handled.
With read() alone you cannot know whether the other side is still alive. That is why SO_KEEPALIVE is useful. Keepalive is non-invasive, and in most cases, if you're in doubt, you can turn it on without the risk of doing something wrong. But do remember that it generates extra network traffic, which can have an impact on routers and firewalls.
The system-wide interval setting also affects all sockets on your machine (you are correct). Because SO_KEEPALIVE increases traffic and consumes CPU, it is best to install a SIGPIPE handler if there is a chance the application will ever write to a broken connection.
Also use SO_KEEPALIVE only where it is reasonable in the application. It is poor practice to use it for the whole duration of a connection (i.e. do use SO_KEEPALIVE when the server works on a client query for a long time).
The right probing interval depends on your application, or rather on its application-layer protocol.
With TCP keepalive enabled you will detect a dead connection eventually, though with the default settings it can take a couple of hours.
Say the network has broken down and, instead of trying to write, the socket is put into an epoll device.
The second argument of epoll_wait receives the events that occurred:
n = epoll_wait (efd, events, MAXEVENTS, -1);
It is good practice to check the event flags of each entry, as follows.
n = epoll_wait (efd, events, MAXEVENTS, -1);
for (i = 0; i < n; i++)
{
    if ((events[i].events & EPOLLERR) ||
        (events[i].events & EPOLLHUP) ||
        (!(events[i].events & EPOLLIN)))
    {
        /* An error has occurred on this fd, or the socket is not
           ready for reading (why were we notified then?) */
        fprintf (stderr, "epoll error\n");
        close (events[i].data.fd);
        continue;
    }
    else if (sfd == events[i].data.fd)
    {
        /* We have a notification on the listening socket, which
           means one or more incoming connections. */
        /* Do what you want here. */
    }
}
The related flag EPOLLRDHUP means:
Stream socket peer closed connection, or shut down writing half of connection. (This flag is especially useful for writing simple code to detect peer shutdown when using Edge Triggered monitoring.)
I know that if the program on A dies (such as core dump), that will cause a RST packet to B. So any read attempt of B will lead to an EOF, and any write attempt of B will lead to SIGPIPE. Am I right?
Partially. RST causes an ECONNRESET, not an EOF, when reading, and EPIPE when writing.
If, however, the network has broken down (such as a cable/router failure) on A, what happens to the read/write attempts of B? In my situation, all the sockets have been set to non-blocking. As a result, is it impossible for me to detect a network error?
Impossible on read alone, unless you use a read timeout, e.g. via select(), and take a timeout as a failure, which it mightn't be. On write you will eventually get an EPIPE, but it could take some time and several attempts due to buffering and retries.
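The read-timeout idea can be sketched as follows. The answer mentions select(); this sketch swaps in Java's Socket.setSoTimeout, which gives the same "no data within N milliseconds" signal. The class and method names (ReadTimeoutSketch, timedOut, silentPeerTimesOut) and the 200 ms value are assumptions for illustration only.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ReadTimeoutSketch {
    /** Returns true if no byte (and no end of stream) arrived within timeoutMillis. */
    static boolean timedOut(Socket socket, int timeoutMillis) throws IOException {
        socket.setSoTimeout(timeoutMillis);
        try {
            socket.getInputStream().read();
            return false; // data or end of stream arrived in time
        } catch (SocketTimeoutException e) {
            // Only silence was observed. Whether that means "dead" is an application decision.
            return true;
        }
    }

    /** Connects over loopback to a peer that never sends anything. */
    static boolean silentPeerTimesOut() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            return timedOut(accepted, 200);
        }
    }
}
```

A timeout here is indistinguishable from a peer that is merely quiet, which is why the answer says a timeout "mightn't be" a failure.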
This problem has been bugging me for a while.
In an application that I work on, I use SocketChannel in non-blocking mode to communicate with embedded devices.
Now I sporadically receive corrupted data.
On some PCs it does not happen; now it happens on mine.
But when I change too much in the program, the problem disappears.
So many things might have an effect: the timing, the network interface hardware, Win7, the Java version, the company firewall, ...
The data reading boils down to this code:
byteBuffer.compact();
socketChannel.read(byteBuffer); // <<< problem here ?
byteBuffer.flip();
if( byteBuffer.hasRemaining() ){
    handleData( byteBuffer );
}
This is run in the same thread as the writing, when the selector wakes up and the interest op OP_READ is set.
This code is the only place where byteBuffer is referenced. socketChannel is used only from the same thread when writing.
I instrumented the code so that I can print out the content of the last few read() calls when the error happens. At the same time I analyze the network traffic in Wireshark. I added lots of asserts to check the ByteBuffer integrity.
In Wireshark, the received stream looks good. No DUP-ACK or anything else suspicious. The last read() calls match exactly with the data in Wireshark.
In Wireshark, I see many small TCP frames with 90 bytes of payload data arriving at intervals of about 10 ms. Normally the Java thread also reads the data every 10 ms, just after it has arrived.
When the problem occurs, the Java thread is a bit delayed: the read happens after 300 ms and returns about 3000 bytes, which is plausible. But the data is corrupted.
The data looks as if it was copied into the buffer and then concurrently received data overwrote the first data.
Now I don't know how to proceed. I cannot create a small example, as this only rarely happens and I don't know the exact condition which is needed.
Can someone give a hint?
How can I prove, it is the Java lib or not?
What conditions may be important to look at, too?
thanks
Frank
29-June-2015:
I have now been able to build an example for reproduction.
There is one Sender and one Receiver program.
The Sender uses blocking IO, first waiting for a connection, then sending 90-byte blocks every 2 ms. The first 4 bytes are a running counter; the rest is not set. The Sender uses setTcpNoDelay(true).
The Receiver uses non-blocking IO. First it connects to the Sender, then it reads the channel whenever the selection key is ready for it. Sometimes, the read loop does a Thread.sleep(300).
If they run on the same PC over the loopback, this works for me all the time. If I put the Sender onto another PC, directly connected over LAN, it triggers the error. Checking with Wireshark, the traffic and the data sent look good.
To run, first start the Sender on one PC, then (after editing the host address) start the Receiver.
As long as it works, it prints a line about every 2 seconds. If it fails, it prints information about the last 5 read() calls.
What I found to be the trigger:
The sender has configured setTcpNoDelay(true)
The receiver sometimes has a Thread.sleep(300) before doing the read().
thanks
Frank
buf.order(ByteOrder.BIG_ENDIAN);
This is the default. Remove this.
buf.clear();
The buffer is already empty, because you just allocated it. Remove this.
buf.limit(0);
The limit is already zero after the clear() and also after the initial allocation. Remove this.
while( true ) {
There should be a select() call here.
Iterator<SelectionKey> it = selector.selectedKeys().iterator();
// ...
if( key == keyData && key.isConnectable() ) {
ch.finishConnect();
This method can return false. You're not handling that case.
// ...
if( key == keyData && key.isReadable() ) {
// ...
readPos += ch.read(buf);
Completely incorrect. You are entirely overlooking the case where read() returns -1, meaning the peer has disconnected. In this case you must close the channel.
// without this Thread.sleep, it would not trigger the error
So? Hasn't the penny dropped? Remove the sleep. It is completely and utterly pointless. The select() will block until data arrives. It doesn't need your help. This sleep is literally a waste of time.
if( rnd.nextInt(20) == 0 ) {
Thread.sleep(300);
}
Remove this.
selector.select();
This should be at the top of the loop, not the bottom.
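Putting the review comments above together, a receive loop could look roughly like the sketch below. It is an illustration under assumptions, not the poster's program: the names SelectLoopSketch, readUntilEof and demo are hypothetical, connect handling (finishConnect) is left out, and the received data is only counted rather than parsed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

final class SelectLoopSketch {
    /** Reads from ch until the peer disconnects; returns the total number of bytes read. */
    static long readUntilEof(SocketChannel ch) throws IOException {
        long total = 0;
        ByteBuffer buf = ByteBuffer.allocate(4096);
        try (Selector selector = Selector.open()) {
            ch.configureBlocking(false);
            ch.register(selector, SelectionKey.OP_READ);
            while (true) {
                selector.select(); // top of the loop: blocks until a key is ready, no sleep needed
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove(); // otherwise the key stays in the selected set
                    if (key.isValid() && key.isReadable()) {
                        buf.clear();
                        int n = ch.read(buf);
                        if (n < 0) { // -1 means the peer has disconnected
                            ch.close();
                            return total;
                        }
                        total += n;
                    }
                }
            }
        }
    }

    /** Loopback demo: a sender writes one 90-byte block and closes. */
    static long demo() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel sender = SocketChannel.open(server.getLocalAddress());
                 SocketChannel receiver = server.accept()) {
                ByteBuffer out = ByteBuffer.wrap(new byte[90]);
                while (out.hasRemaining()) {
                    sender.write(out);
                }
                sender.close();
                return readUntilEof(receiver);
            }
        }
    }
}
```

The return value of read() is checked before it is added to any position counter, which is the case the original readPos += ch.read(buf) line overlooks.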
It turned out to be a driver problem, or at least it seems so.
I used the USB to Ethernet adapter "D-Link E-DUB100 Rev A".
Because Wireshark showed the correct data, I thought I had eliminated the hardware as a possible cause of failure.
But meanwhile I tried a "D-Link E-DUB100 Rev C1" and the problem disappeared.
So I assume it is a problem in the drivers delivered by D-Link for Rev A. With Rev C1 it might use a system driver that does not have this problem.
Thanks to all for taking the time to read my question.
I am kind of upset that this cannot be handled in an elegant way. After trying different solutions (this, this and several others) mentioned in answers to several SO questions, I still cannot detect a socket disconnection (caused by unplugging the cable).
I am using a non-blocking NIO socket, and everything works perfectly except that I find no way of detecting a server disconnection.
I have the following code:
while (true) {
    handlePendingChanges();
    int selectedNum = selector.select(3000);
    if (selectedNum > 0) {
        SelectionKey key = null;
        try {
            Iterator<SelectionKey> keyIterator = selector.selectedKeys().iterator();
            while (keyIterator.hasNext()) {
                key = keyIterator.next();
                if (!key.isValid())
                    continue;
                System.out.println("key state: " + key.isReadable() + ", " + key.isWritable());
                if (key.isConnectable()) {
                    finishConnection(key);
                } else if (key.isReadable()) {
                    onRead(key);
                } else if (key.isWritable()) {
                    onWrite(key);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
            System.err.println("I am happy that I can catch some errors.");
        } finally {
            selector.selectedKeys().clear();
        }
    }
}
While the SocketChannels are being read, I unplug the cable, and Selector.select() starts spinning and returning 0. Now I have no chance to read or write the channels, because the main reading and writing code is guarded by if (selectedNum > 0). This is my first point of confusion: according to this answer, when the channel is broken, select() will return and the selection key for the channel will indicate readable/writable. That is apparently not the case here: the keys are not selected, and select() still returns 0.
Also, from EJP's answer to a similar question:
If the peer closes the socket:
read() returns -1
readLine() returns null
readXXX() throws EOFException, for any other X.
That is not the case here either. I tried commenting out if (selectedNum > 0) and using selector.keys().iterator() to get all the keys regardless of whether they are selected. Reading from those keys does not return -1 (it returns 0 instead), and writing to them does not throw an EOFException. I noticed only one thing: even when the keys are not selected, key.isReadable() returns true while key.isWritable() returns false (I guess this might be because I didn't register the keys for OP_WRITE).
My question is: why is the Java socket behaving like this, or is there something I did wrong?
You have discovered that you need timers and a heartbeat on a TCP connection.
If you unplug the network cable, the TCP connection may not be broken. If you have nothing to send, the TCP/IP stack has nothing to send; it doesn't know that a cable is gone somewhere, or that the peer PC suddenly burst into flames. That TCP connection could be considered open until you reboot your server years later.
Think of it this way; how can the TCP connection know that the other end dropped off the network - it's off the network so it can't tell you that fact.
Some systems can detect this if you unplug the cable going into your server, and some will not. If you unplug the cable at the other end of e.g. an ethernet switch, that will not be detected.
That's why one always needs supervisor timers for a TCP connection (timers that e.g. send a heartbeat message to the peer, or close the connection after no activity for a given amount of time).
One very cheap way to at least prevent TCP connections that you only read from, and never write to, from staying up for years on end is to enable TCP keepalive on the socket. Be aware that the default timeout for TCP keepalive is often 2 hours.
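The supervisor timer described above can be kept as plain bookkeeping next to the select loop. The following is only a sketch: IdleWatchdog and its method names are invented for illustration, the heartbeat message itself is left to the application protocol, and timestamps are passed in explicitly (for example from System.nanoTime()) so the logic stays easy to test.

```java
import java.util.concurrent.TimeUnit;

final class IdleWatchdog {
    private final long heartbeatNanos; // send a heartbeat after this much outbound silence
    private final long timeoutNanos;   // give up after this much inbound silence
    private long lastSent;
    private long lastReceived;

    IdleWatchdog(long heartbeatMillis, long timeoutMillis, long nowNanos) {
        this.heartbeatNanos = TimeUnit.MILLISECONDS.toNanos(heartbeatMillis);
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        this.lastSent = nowNanos;
        this.lastReceived = nowNanos;
    }

    /** Call whenever any bytes were written to the peer. */
    void onSent(long nowNanos) {
        lastSent = nowNanos;
    }

    /** Call whenever any bytes were read from the peer. */
    void onReceived(long nowNanos) {
        lastReceived = nowNanos;
    }

    /** True when nothing has been sent for a full heartbeat interval. */
    boolean heartbeatDue(long nowNanos) {
        return nowNanos - lastSent >= heartbeatNanos;
    }

    /** True when nothing has been received for the whole timeout; the caller should close. */
    boolean peerPresumedDead(long nowNanos) {
        return nowNanos - lastReceived >= timeoutNanos;
    }
}
```

In the loop from the question, selector.select(3000) already returns periodically, so both checks could run after every select() call, including the ones that return 0.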
Neither of those answers applies. The first one concerns the case when the connection is broken, and the second one (mine) concerns the case where the peer closes the connection.
In a TCP connection, unless data is being sent or received, there is in principle nothing about pulling a cable that should break a connection, as TCP is deliberately designed to be robust across this sort of thing, and there is certainly nothing about it that should look to the local application like the peer closing.
The only way to detect a broken connection in TCP is to attempt to send data across it, or to interpret a read timeout as a lost connection after a suitable interval, which is an application decision.
You can also set TCP keep-alive on to enable detection of broken connections, and in some systems you can even control the timeout per socket. Not via Java however, so you are stuck with the system default, which should be two hours unless it has been modified.
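Turning keep-alive on for an NIO channel is a one-line option. The sketch below only enables the option and reads it back over a loopback connection; the class and method names (KeepAliveSketch, enableKeepAlive, demo) are assumptions, and the probe timing still comes from the operating system's defaults.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

final class KeepAliveSketch {
    /** Turns on SO_KEEPALIVE and returns the value the channel reports afterwards. */
    static boolean enableKeepAlive(SocketChannel channel) throws IOException {
        channel.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
        return channel.getOption(StandardSocketOptions.SO_KEEPALIVE);
    }

    /** Connects to a loopback listener and enables keep-alive on the client side. */
    static boolean demo() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                return enableKeepAlive(client);
            }
        }
    }
}
```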
Your code should call keyIterator.remove() after calling keyIterator.next().
I am currently trying to write a very simple chat application to introduce myself to Java socket programming and multithreading. It consists of 2 modules, a pseudo-server and a pseudo-client; however, my design has led me to believe that I'm trying to implement an impossible concept.
The Server
The server waits on localhost port 4000 for a connection, and when it receives one, it starts 2 threads, a listener thread and a speaker thread. The speaker thread constantly waits for user input on the console, and sends it to the client when it receives said input. The listener thread blocks on the ObjectInputStream of the socket for any messages sent by the client, and then prints the message to the console.
The Client
The client connects the user to the server on port 4000, and then starts 2 threads, a listener and a speaker. These threads have the same functionality as the server's threads, but, for obvious reasons, handle input/output in the opposite way.
The First Problem
The problem I am running into is that in order to end the chat, a user must type "Bye". Now, since my threads have been looped to block for input:
while(connected()){
    //block for input
    //do something with this input
    //determine if the connection still exists (was the message "Bye"?)
}
Then it becomes a really interesting scenario when trying to exit the application. If the client types "Bye", then the sending thread returns, and the thread that listened for the "Bye" on the server also returns. This leaves us with the problem that the client-side listener and the server-side speaker do not know that "Bye" has been typed, and thus continue execution.
I resolved this issue by creating a class Synchronizer that holds a boolean variable that both threads access in a synchronized manner:
public class Synchronizer {
    boolean chatting;

    public Synchronizer(){
        chatting = true;
        onChatStatusChanged();
    }

    synchronized void stopChatting(){
        chatting = false;
        onChatStatusChanged();
    }

    synchronized boolean chatting(){
        return chatting;
    }

    public void onChatStatusChanged(){
        System.out.println("Chat status changed!: " + chatting);
    }
}
I then passed the same instance of this class into the thread as it was created. There was still one issue though.
The Second Problem
This is where I deduced that what I am trying to do is impossible using the methods I am currently employing. Given that one user has to type "Bye" to exit the chat, the other 2 threads that aren't being utilized still go on to pass the check for a connection and begin blocking for I/O. While they are blocking, the original 2 threads realize that the connection has been terminated, but even though they change the boolean value, the other 2 threads have already passed the check, and are already blocking for I/O.
This means that even though the thread will terminate on the next iteration of the loop, it will still be trying to receive input from the other threads that have been properly terminated. This led me to my final conclusion and question.
My Question
Is it possible to asynchronously receive and send data in the manner which I am trying to do? (2 threads per client/server that both block for I/O) Or must I send a heartbeat every few milliseconds back and forth between the server and client that requests for any new data and use this heartbeat to determine a disconnect?
The problem seems to reside in the fact that my threads are blocking for I/O before they realize that the partner thread has disconnected. This leads to the main issue, how would you then asynchronously stop a thread blocking for I/O?
I feel as though this is something that should be able to be done as the behavior is seen throughout social media.
Any clarification or advice would be greatly appreciated!
I don't know Java, but if it has threads, the ability to invoke functions on threads, and the ability to kill threads, then even if it doesn't have tasks, you can add tasks, which is all you need to start building your own ASync interface.
For that matter, if you can kill threads, then the exiting threads could just kill the other threads.
Also, a "Bye" (or some other code) should be sent in any case where the window is closing and the connection is open - If Java has Events, and the window you're using has a Close event, then that's the place to put it.
Alternately, you could test for a valid/open window, and send the "Bye" if the window is invalid/closed. Think of that like a poor man's event handler.
Also, make sure you know how to (and have permission to) manually add exceptions to your network's firewall(s).
Also, always test it over a live network. Just because it works in a loopback, doesn't mean it'll work over the network. Although you probably already know that.
Just to clarify for anyone who might stumble upon this post in the future, I ended up solving this problem by tweaking the syntax of my threads a bit. First of all, I had to remove my old threads, and replace them with AsyncSender and AsyncReader, respectively. These threads constantly send and receive regardless of user input. When there is no user input, it simply sends/receives a blank string and only prints it to the console if it is anything but a blank string.
The Workaround
try{
    if((obj = in.readObject()) != null){
        // Only String objects are treated as chat messages.
        if(obj instanceof String){
            output = (String) obj;
            if(output.equalsIgnoreCase("Bye"))
                s.stop();
        }
    }
}
catch(ClassNotFoundException e){
    e.printStackTrace();
}
catch(IOException e){
    e.printStackTrace();
}
In this iteration of the receiver thread, it does not block for input, but rather tests if the object read was null (no object was in the stream). The same is done in the sender thread.
This successfully bypasses the problem of having to stop a thread that is blocking for I/O.
Note that there are still other ways to work around this issue, such as using an InterruptibleChannel.
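One of those other ways, for plain blocking sockets, is to close the socket from a different thread: a thread blocked in read() on that socket then gets a SocketException instead of waiting forever. The sketch below demonstrates this over loopback; the class name UnblockByCloseSketch, the method demo and the 200 ms pause are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicReference;

final class UnblockByCloseSketch {
    /** Returns a short description of how the blocked listener thread ended. */
    static String demo() throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            AtomicReference<String> outcome = new AtomicReference<>("still blocked");
            Thread listener = new Thread(() -> {
                try {
                    int b = accepted.getInputStream().read(); // blocks: the peer sends nothing
                    outcome.set(b < 0 ? "end of stream" : "data");
                } catch (IOException e) {
                    outcome.set("unblocked by close"); // the close below lands here
                }
            });
            listener.start();
            Thread.sleep(200);   // give the listener time to enter read()
            accepted.close();    // closing from another thread wakes the blocked reader
            listener.join(2000);
            return outcome.get();
        }
    }
}
```

In the chat design above, the thread that sees "Bye" could close the shared socket, which ends the partner thread's blocking read without any polling.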
I write and read in my function using the Socket class. I used
synchronized(socket){
    // write;
    // read;
}
I repeat this every 50-1000 ms. The problem arises when somebody (for an unknown reason) unplugs the cable (I get a SocketTimeoutException). When the cable is plugged in again, I need to continue.
What should I do? Do I need to close this socket in the catch block and create a new one? Or something else?
You don't have to do anything. Just continue. If you get any other exception, close the Socket and restart (if appropriate).
I'd create a Decorator implementation that was willing to catch a SocketTimeoutException and retry. It could retry a certain number of times, over a certain interval before actually passing the exception along to indicate a "true" (to be defined) error condition. It could even retry indefinitely if you want. The retry logic may even encapsulate re-establishing the socket, though a timeout isn't enough to require that.
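A minimal sketch of such a decorator is shown below. It wraps any Callable and retries only on SocketTimeoutException; the class name RetryOnTimeout, the attempt limit and the pause between attempts are assumptions for illustration, and re-establishing the socket is deliberately left out.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

final class RetryOnTimeout<T> implements Callable<T> {
    private final Callable<T> delegate;
    private final int maxAttempts;
    private final long pauseMillis;

    RetryOnTimeout(Callable<T> delegate, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.delegate = delegate;
        this.maxAttempts = maxAttempts;
        this.pauseMillis = pauseMillis;
    }

    @Override
    public T call() throws Exception {
        SocketTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return delegate.call(); // any other exception propagates immediately
            } catch (SocketTimeoutException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // wait before the next attempt
                }
            }
        }
        throw last; // every attempt timed out: report it as a real error
    }
}
```

The write/read exchange from the question would be passed in as the delegate, for example as a lambda that performs both steps inside the synchronized block.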
Another option is the CircuitBreaker Pattern, though it's not quite designed for what you are describing. CircuitBreaker is a bit better for avoiding costly errors that may be occurring over a period of time.