I am kinda upset that this cannot be handled in an elegant way. After trying different solutions mentioned in answers to several SO questions, I still cannot detect a socket disconnection (caused by unplugging the cable).
I am using NIO non-blocking sockets. Everything works perfectly, except that I can find no way of detecting server disconnection.
I have the following code:
while (true) {
    handlePendingChanges();
    int selectedNum = selector.select(3000);
    if (selectedNum > 0) {
        SelectionKey key = null;
        try {
            Iterator<SelectionKey> keyIterator = selector.selectedKeys().iterator();
            while (keyIterator.hasNext()) {
                key = keyIterator.next();
                if (!key.isValid())
                    continue;
                System.out.println("key state: " + key.isReadable() + ", " + key.isWritable());
                if (key.isConnectable()) {
                    finishConnection(key);
                } else if (key.isReadable()) {
                    onRead(key);
                } else if (key.isWritable()) {
                    onWrite(key);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
            System.err.println("I am happy that I can catch some errors.");
        } finally {
            selector.selectedKeys().clear();
        }
    }
}
While the SocketChannels are being read, I unplug the cable, and Selector.select() starts spinning and returning 0. Now I have no chance to read or write the channels, because the main reading and writing code is guarded by if (selectedNum > 0). This is my first point of confusion: according to one answer, when the channel is broken, select() will return and the selection key for the channel will indicate readable/writable. That is apparently not the case here: the keys are not selected, and select() still returns 0.
Also, from EJP's answer to a similar question:
If the peer closes the socket:
read() returns -1
readLine() returns null
readXXX() throws EOFException, for any other X.
Not the case here either. I tried commenting out if (selectedNum > 0) and using selector.keys().iterator() to get all the keys regardless of whether they are selected. Reading from those keys does not return -1 (it returns 0 instead), and writing to those keys does not throw an EOFException. I noticed only one thing: even though the keys are not selected, key.isReadable() returns true while key.isWritable() returns false (I guess this might be because I didn't register the keys for OP_WRITE).
My question is: why is the Java socket behaving like this, or is there something I did wrong?
You have discovered that you need timers and a heartbeat on a TCP connection.
If you unplug the network cable, the TCP connection may not be broken. If you have nothing to send, the TCP/IP stack has nothing to send, and it doesn't know that a cable is gone somewhere, or that the peer PC suddenly burst into flames. That TCP connection could be considered open until you reboot your server years later.
Think of it this way: how can the TCP connection know that the other end dropped off the network? It's off the network, so it can't tell you that fact.
Some systems can detect this if you unplug the cable going into your server, and some will not. If you unplug the cable at the other end of e.g. an ethernet switch, that will not be detected.
That's why one always needs supervisor timers (that e.g. send a heartbeat message to the peer, or close a TCP connection based on no activity for a given amount of time) for a TCP connection.
One very cheap way to at least keep TCP connections that you only read from, and never write to, from staying up for years on end is to enable TCP keepalive on the socket. Be aware that the default timeout for TCP keepalive is often 2 hours.
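The supervisor-timer idea can be sketched roughly as follows. This is only a minimal illustration, not code from the question: IdleTracker and its method names are made up for the example, and the caller is assumed to pass in the current time (e.g. System.currentTimeMillis()).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: remembers when each connection last showed activity. */
final class IdleTracker<K> {
    private final Map<K, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long idleLimitMillis;

    IdleTracker(long idleLimitMillis) {
        this.idleLimitMillis = idleLimitMillis;
    }

    /** Call whenever bytes (or a heartbeat reply) arrive on a connection. */
    void touch(K connection, long nowMillis) {
        lastSeenMillis.put(connection, nowMillis);
    }

    /** Removes and returns every connection that has been silent for longer than the limit. */
    List<K> reapExpired(long nowMillis) {
        List<K> expired = new ArrayList<>();
        for (Map.Entry<K, Long> entry : lastSeenMillis.entrySet()) {
            if (nowMillis - entry.getValue() > idleLimitMillis) {
                expired.add(entry.getKey());
            }
        }
        for (K connection : expired) {
            lastSeenMillis.remove(connection);
        }
        return expired;
    }
}
```

In a select loop like the one in the question, touch() would be called from onRead(), and reapExpired() after every return of select(3000), including the returns with 0 ready keys; the channels it hands back are the ones to close and reconnect.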
Neither of those answers applies. The first one concerns the case when the connection is broken, and the second one (mine) concerns the case where the peer closes the connection.
In a TCP connection, unless data is being sent or received, there is in principle nothing about pulling a cable that should break a connection, as TCP is deliberately designed to be robust across this sort of thing, and there is certainly nothing about it that should look to the local application like the peer closing.
The only way to detect a broken connection in TCP is to attempt to send data across it, or to interpret a read timeout as a lost connection after a suitable interval, which is an application decision.
You can also set TCP keep-alive on to enable detection of broken connections, and in some systems you can even control the timeout per socket. Not via Java however, so you are stuck with the system default, which should be two hours unless it has been modified.
Your code should call keyIterator.remove() after calling keyIterator.next().
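The remove-after-next idiom looks roughly like this. It is a sketch only: SelectedKeyLoop, drain() and demo() are names invented for the example, and demo() uses a loopback connection purely to make one key ready.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

final class SelectedKeyLoop {
    /** Visits every ready key once, removing it from the selected set; returns how many valid keys were seen. */
    static int drain(Selector selector) {
        int handled = 0;
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove(); // the selector never removes keys from the selected set by itself
            if (!key.isValid()) {
                continue;
            }
            handled++; // a real loop would dispatch on isConnectable()/isReadable()/isWritable() here
        }
        return handled;
    }

    /** Loopback demo: one pending accept makes exactly one key ready. */
    static int demo() {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                selector.select(2000);
                int handled = drain(selector);
                if (!selector.selectedKeys().isEmpty()) {
                    throw new IllegalStateException("selected set was not drained");
                }
                return handled;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Removing each key as it is visited means a key can never be handled twice, even if an exception cuts the iteration short.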
Related
I have a java application which manages several socket connections to devices. I have no control over the protocol which these devices implement, and now I want my java application to send heartbeats for each device. The devices do not send data, but only respond to commands.
The javadoc for InputStream.read() states that if the end of the stream is reached, it will return -1. So that seems like a reasonable way to check if the connection is open. But when I implement this solution, there are no bytes available (since the device only responds to commands), and since the connection is open, it will hang at the read call forever. For example, I peek at one byte, and if that were -1 the heartbeat would be "unhealthy":
public static void main(final String[] args) throws IOException {
    try (Socket socket = new Socket()) {
        socket.connect(new InetSocketAddress("192.168.30.99", 25901), 1000);
        System.out.println("Connected");
        final BufferedInputStream bis = new BufferedInputStream(socket.getInputStream());
        bis.mark(1);
        System.out.println(bis.read()); // Stalls forever here
        bis.reset();
        System.out.println("Done");
    }
}
Is it reasonable to say that, if no byte is received within x milliseconds, the device is connected?
Is there any surefire way to check socket connectivity without heartbeats where the ip and port is important?
Is there any surefire way to check socket connectivity without
heartbeats where the ip and port is important?
No, you can't reliably know if the other end is alive unless you try to communicate with it.
If the other end doesn't have a no-op ping function, you're pretty much out of luck. Waiting in a blocking read() call won't help you if the connection gets cut off.
Is it reasonable to say that, if no byte is received within x
milliseconds, the device is connected?
No. It means that the device hasn't sent anything in x milliseconds. Which is normal, as it only responds to commands.
When the other end of the socket does not write any bytes and waits to read from the socket first, blocking on read is the default behavior.
With no control over the protocol, little can be done.
It is reasonable to say that a successful connect is a weak heartbeat.
You don't have to wait for x milliseconds, which makes no difference with such a protocol.
Another, trickier way: you can try to send a few bytes that are most unlikely to be a valid command, for example '\0' or '\n', hoping that this does no harm to the device and that the device actively closes the socket on such an invalid command.
When the other end actively closes the socket, a read call on that socket should return -1.
A better heartbeat always has something to do with the protocol, such as the no-op ping command suggested by @Kayaman.
Maybe TCP-level keep-alive is a solution for you.
You can turn it on with:
socket.setKeepAlive(true);
It sets the SO_KEEPALIVE socket option. Quote from the SocketOptions Java API documentation:
When the keepalive option is set for a TCP socket and no data has been
exchanged across the socket in either direction for 2 hours (NOTE: the
actual value is implementation dependent), TCP automatically sends a
keepalive probe to the peer. This probe is a TCP segment to which the
peer must respond. One of three responses is expected: 1. The peer
responds with the expected ACK. The application is not notified (since
everything is OK). TCP will send another probe following another 2
hours of inactivity. 2. The peer responds with an RST, which tells the
local TCP that the peer host has crashed and rebooted. The socket is
closed. 3. There is no response from the peer. The socket is closed.
The purpose of this option is to detect if the peer host crashes.
Valid only for TCP socket: SocketImpl
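If the two-hour default is too slow, newer JDKs offer a way to shorten it per socket. The sketch below assumes Java 11 or later and a platform that supports the jdk.net.ExtendedSocketOptions keep-alive options (it checks supportedOptions() and otherwise falls back to plain SO_KEEPALIVE); the class name and parameter values are just examples.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.net.SocketOption;
import java.util.Set;
import jdk.net.ExtendedSocketOptions;

final class KeepAliveTuning {
    /**
     * Enables SO_KEEPALIVE and, where the platform supports it, sets the per-socket probe schedule.
     * Returns true if the timings were applied, false if only SO_KEEPALIVE could be set.
     */
    static boolean enable(Socket socket, int idleSeconds, int intervalSeconds, int probeCount) {
        try {
            socket.setKeepAlive(true);
            Set<SocketOption<?>> supported = socket.supportedOptions();
            if (supported.contains(ExtendedSocketOptions.TCP_KEEPIDLE)
                    && supported.contains(ExtendedSocketOptions.TCP_KEEPINTERVAL)
                    && supported.contains(ExtendedSocketOptions.TCP_KEEPCOUNT)) {
                // idle time before the first probe, gap between probes, probes before giving up
                socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSeconds);
                socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSeconds);
                socket.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, probeCount);
                return true;
            }
            return false; // the system-wide keep-alive defaults stay in effect
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, KeepAliveTuning.enable(socket, 60, 10, 3) would ask for a first probe after 60 seconds of silence and give up after three unanswered probes sent 10 seconds apart, where the options are available.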
You could also use SO_TIMEOUT by using:
socket.setSoTimeout(timeout);
Enable/disable SO_TIMEOUT with the specified timeout, in milliseconds.
With this option set to a non-zero timeout, a read() call on the
InputStream associated with this Socket will block for only this
amount of time. If the timeout expires, a
java.net.SocketTimeoutException is raised, though the Socket is still
valid. The option must be enabled prior to entering the blocking
operation to have effect. The timeout must be > 0. A timeout of zero
is interpreted as an infinite timeout.
Call those right after the connect() or accept() calls, before the program enters the
'no control of the underlying protocol' state.
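To make the difference between "nothing arrived" and "the peer closed" concrete, here is a small sketch built on setSoTimeout(). ReadProbe, Result and demo() are names invented for this example; demo() talks to a loopback ServerSocket only so that it is self-contained.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ReadProbe {
    enum Result { DATA, IDLE, CLOSED }

    /** Waits up to timeoutMillis for one byte (which is consumed if it arrives) and reports what happened. */
    static Result probe(Socket socket, int timeoutMillis) throws IOException {
        socket.setSoTimeout(timeoutMillis);
        InputStream in = socket.getInputStream();
        try {
            return in.read() == -1 ? Result.CLOSED : Result.DATA;
        } catch (SocketTimeoutException e) {
            return Result.IDLE; // nothing arrived in time; the socket itself is still valid
        }
    }

    /** Loopback demo: a silent peer yields IDLE, a peer that closes yields CLOSED. */
    static Result[] demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peer = server.accept()) {
            Result whileSilent = probe(client, 200);
            peer.close();
            Result afterClose = probe(client, 2000);
            return new Result[] { whileSilent, afterClose };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that IDLE says nothing about whether the connection is alive: a device that only answers commands and a device whose cable was pulled both look exactly the same from here.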
Suppose a simple network model: A has successfully created a TCP connection to B, and they are communicating with each other like this
A <----------> B
I know that if the program on A dies (such as core dump), that will cause a RST packet to B. So any read attempt of B will lead to an EOF, and any write attempt of B will lead to SIGPIPE. Am I right?
If, however, suppose the network has broken down (such as cable/router failure) on A, what happens to the read/write attempt of B? In my situation, all the sockets have been set to non-blocking. As a result, is it impossible for me to detect network error?
By the way, I notice that there is an option SO_KEEPALIVE in socket which may be useful to me http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/. But I wonder how much the cost will be if I set the probing interval to 2~3 seconds (which by default is 75 seconds)? And it seems interval configuration is a global one, so is this gonna affect all the sockets on the machine?
Final question...
Say the network has broken down and any write attempt would cause EPIPE some time later. If, however, instead of trying to write, I put this socket into an epoll device, what will happen then? Will epoll_wait return an EPOLLHUP or EPOLLERR event?
There are numerous other ways a TCP connection can go dead undetected:
someone yanks out a network cable in between.
the computer at the other end gets nuked.
a NAT gateway in between silently drops the connection.
the OS at the other end crashes hard.
the FIN packets get lost.
undetectable errors: a router in between the endpoints may drop packets (including control packets).
In all of these cases you find out when you try to write to the socket: the write eventually fails, and unless SIGPIPE is handled or ignored, the signal terminates your program.
With read() alone you cannot know whether the other side is alive or not. That's why SO_KEEPALIVE is useful. Keepalive is non-invasive, and in most cases, if you're in doubt, you can turn it on without the risk of doing something wrong. But do remember that it generates extra network traffic, which can have an impact on routers and firewalls.
And this affects all sockets on your machine too (you are correct). Because SO_KEEPALIVE increases traffic and consumes CPU, it's best to install a SIGPIPE handler if there is a chance the application will ever write to a broken connection.
Also use SO_KEEPALIVE at a reasonable place in the application. It's poor practice to use it for the whole connection duration (i.e. do use SO_KEEPALIVE when the server works for a long time on a client query).
Setting the probing interval depends on your application, or rather on the
application-layer protocol.
With TCP keepalive enabled, you'll detect it eventually, at least within a couple of hours.
Say the network has broken down and, instead of trying to write, the socket is put into an epoll device:
The second argument of epoll_wait receives the events:
n = epoll_wait (efd, events, MAXEVENTS, -1);
Each entry is filled in with the corresponding event flags. It is good practice to check
these flags, as follows:
n = epoll_wait (efd, events, MAXEVENTS, -1);
for (i = 0; i < n; i++)
{
    if ((events[i].events & EPOLLERR) ||
        (events[i].events & EPOLLHUP) ||
        (!(events[i].events & EPOLLIN)))
    {
        /* An error has occurred on this fd, or the socket is not
           ready for reading (why were we notified then?) */
        fprintf (stderr, "epoll error\n");
        close (events[i].data.fd);
        continue;
    }
    else if (sfd == events[i].data.fd)
    {
        /* We have a notification on the listening socket, which
           means one or more incoming connections. */
        // Do what you want
    }
}
There is also EPOLLRDHUP, which means:
Stream socket peer closed connection, or shut down writing half of connection. (This flag is especially useful for writing simple code to detect peer shutdown when using Edge Triggered monitoring.)
I know that if the program on A dies (such as core dump), that will cause a RST packet to B. So any read attempt of B will lead to an EOF, and any write attempt of B will lead to SIGPIPE. Am I right?
Partially. RST causes an ECONNRESET, not an EOF, when reading, and EPIPE when writing.
If, however, suppose the network has broken down (such as cable/router failure) on A, what happens to the read/write attempt of B? In my situation, all the sockets have been set to non-blocking. As a result, is it impossible for me to detect network error?
Impossible on read alone, unless you use a read timeout, e.g. via select(), and take a timeout as a failure, which it mightn't be. On write you will eventually get an EPIPE, but it could take some time and several attempts due to buffering and retries.
Using the App Engine Trusted Tester Sockets to connect to APNS. Writing to socket works fine.
But the problem is that the Socket gets reclaimed after 2 minutes of inactivity. The Trusted Tester website says that any socket operation keeps the socket alive for a further 2 minutes. It would be nicer to keep the socket open until APNS decides to close the connection.
After trying pretty much all of the Socket API methods short of writing to the Output Stream, Socket gets closed after 2 minutes no matter what. What have I missed?
Deployed on java backend.
You can't keep a socket connected to APNS artificially open without sending actual push notifications. The only way to keep it open would be to send some arbitrary data/bytes, but that would result in an immediate closure of the socket: APNS closes the connection as soon as it detects something that does not conform to the protocol, i.e. something that is not an actual push notification.
SO_KEEPALIVE
What about SO_KEEPALIVE? App Engine explicitly says it is supported. I think it just means it won't throw an exception when you call Socket.setKeepAlive(true); previously, calls that tried to set socket options raised Not Implemented exceptions. Even if you enable keep-alive, your socket will be reclaimed (closed) if you don't send something for more than 2 minutes, at least on App Engine as of now.
Actually, it's not a big surprise. RFC 1122, which specifies TCP keep-alive, explicitly states that the keep-alive interval must default to no less than two hours, and probes are only sent if there was no other traffic. Although it also says that this interval must be configurable, there is no API on java.net.Socket you could use to configure it (most probably because it's highly OS dependent), and I doubt it would be set to 2 minutes on App Engine.
SO_TIMEOUT
What about SO_TIMEOUT? It is for something else entirely. The javadoc of Socket.setSoTimeout() states:
Enable/disable SO_TIMEOUT with the specified timeout, in milliseconds. With this option set to a non-zero timeout, a read() call on the InputStream associated with this Socket will block for only this amount of time. If the timeout expires, a java.net.SocketTimeoutException is raised, though the Socket is still valid. The option must be enabled prior to entering the blocking operation to have effect. The timeout must be > 0. A timeout of zero is interpreted as an infinite timeout.
That is, when read() is blocking for too long because there's nothing to read you can say "ok, I don't want to wait (block) anymore; let's do something else instead". It's not going to help with our "2 minutes" problem.
What then?
The only way you can work around this problem is this: detect when a connection is reclaimed/closed then throw it away and open a new connection. And there is a library which supports exactly that.
Check out java-apns-gae.
It's an open-source Java APNS library that was specifically designed to work (and be used) on Google App Engine.
https://github.com/ZsoltSafrany/java-apns-gae
Did you try getSoLinger()? That may be the getSocketOpt that (kind of) works currently, and it may reset the 2-minute timeout. In theory, a zero-byte read would do so as well, but I'm not sure it does. If you try that, use this method on the InputStream:
public int read(byte b[], int off, int len)
If these suggestions don't work, please file an issue with the App Engine issue tracker.
There will be some other fixes coming, e.g. using socket options etc.
Use getpeername().
From https://developers.google.com/appengine/docs/java/sockets/overview ...
Sockets may be reclaimed after 2 minutes of inactivity; any socket
operation (e.g. getpeername) keeps the socket alive for a further 2
minutes. (Notice that you cannot Select between multiple available
sockets because that requires java.nio.SocketChannel which is not
currently supported.)
We have a simple client server architecture between our mobile device and our server both written in Java. An extremely simple ServerSocket and Socket implementation. However one problem is that when the client terminates abruptly (without closing the socket properly) the server does not know that it is disconnected. Furthermore, the server can continue to write to this socket without getting any exceptions. Why?
According to documentation Java sockets should throw exceptions if you try to write to a socket that is not reachable on the other end!
The connection will eventually be timed out by Retransmit Timeout (RTO). However, the RTO is calculated using a complicated algorithm based on network latency (RTT), see this RFC,
http://www.ietf.org/rfc/rfc2988.txt
So on a mobile network, this can be minutes. Wait 10 minutes to see if you can get a timeout.
The solution to this kind of problem is to add a heart-beat in your own application protocol and tear down connection when you don't get ACK for the heartbeat.
The key phrase here is "without closing the socket properly".
Sockets should always be acquired and disposed of in this way:
final Socket socket = ...; // connect code
try
{
    use( socket ); // use socket
}
finally
{
    socket.close( ); // dispose
}
Even with these precautions you should specify application timeouts, specific to your protocol.
My experience has shown that, unfortunately, you cannot use any of the Socket timeout functionality reliably (e.g. there is no timeout for write operations, and even read operations may sometimes hang forever).
That's why you need a watchdog thread that enforces your application timeouts and disposes of sockets that have been unresponsive for a while.
One convenient way of doing this is by initializing Socket and ServerSocket through the corresponding channels in java.nio. The main advantage of such sockets is that they are interruptible: you can simply interrupt the thread that runs the socket protocol and be sure that the socket is properly disposed of.
Notice that you should enforce application timeouts on both sides, as it is only a matter of time and bad luck when you may experience unresponsive sockets.
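A watchdog along those lines might look like the sketch below. It is one possible shape, not a definitive implementation: SocketWatchdog and touch() are names made up here, and it closes the resource directly instead of interrupting a thread.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: closes a resource unless touch() is called at least once per timeout period. */
final class SocketWatchdog implements AutoCloseable {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "socket-watchdog");
        t.setDaemon(true); // never keep the JVM alive just for the watchdog
        return t;
    });
    private final Closeable target;
    private final long timeoutMillis;
    private ScheduledFuture<?> pending;

    SocketWatchdog(Closeable target, long timeoutMillis) {
        this.target = target;
        this.timeoutMillis = timeoutMillis;
        touch(); // start the first countdown
    }

    /** Restarts the countdown; call after every successful read or write. */
    synchronized void touch() {
        if (pending != null) {
            pending.cancel(false);
        }
        pending = timer.schedule(this::expire, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    private void expire() {
        try {
            target.close(); // closing a Socket makes a read or write blocked in another thread throw
        } catch (IOException ignored) {
            // nothing useful to do; the resource is being discarded anyway
        }
    }

    /** Stops the watchdog without closing the target. */
    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

Typical use would be new SocketWatchdog(socket, 30_000) right after connecting, with touch() called from the protocol code whenever the peer shows signs of life.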
TCP/IP communications can be very strange. TCP will retry for quite a while at the bottom layers of the stack without ever letting the upper layers know that anything happened.
I would fully expect that after some time period (30 seconds to a few minutes) you should see an error, but I haven't tested this; I'm just going off how TCP apps tend to work.
You might be able to tighten the TCP specs (retry, timeout, etc) but again, haven't messed with it much.
Also, it may be that I'm totally wrong and the implementation of Java you are using is just flaky.
To answer the first part of the question (about not knowing that the client has disconnected abruptly), in TCP, you can't know whether a connection has ended until you try to use it.
The notion of guaranteed delivery in TCP is quite subtle: delivery isn't actually guaranteed to the application at the other end (it depends on what guaranteed means really). Section 2.6 of RFC 793 (TCP) gives more details on this topic. This thread on the Restlet-discuss list and this thread on the Linux kernel list might also be of interest.
For the second part (not detecting when you write to this socket), this is probably a question of buffer and timeout (as others have already suggested).
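The buffering effect is easy to reproduce on loopback. In the sketch below (WriteAfterClose is a name made up for the example) the peer closes cleanly, yet the first write still succeeds because it only has to reach the local send buffer; the failure shows up on a later write, once the peer's reset has come back. With an abruptly vanished peer no reset comes back at all, so writes can keep succeeding until the send buffer fills or retransmission gives up.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class WriteAfterClose {
    /**
     * Returns how many writes succeeded although the peer had already closed,
     * or -1 if no write failed within the attempt limit.
     */
    static int successfulWritesAfterPeerClose() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            server.accept().close(); // the peer goes away
            OutputStream out = client.getOutputStream();
            for (int ok = 0; ok < 100; ok++) {
                try {
                    out.write('x'); // only has to fit into the local send buffer
                    out.flush();
                } catch (IOException e) {
                    return ok; // the stack finally reported the dead peer
                }
                Thread.sleep(50); // give the peer's reset time to arrive
            }
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```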
I am facing the same problem.
I think when you register the socket with a selector it doesn't throw any exception.
Are you using a selector with your socket?
My chat application connects to a server and information is sent/received by the user. When the connection changes, such as 3g->wifi, wifi->3g, losing a data connection, etc, the socket sometimes stays connected for ages before disconnecting. During this time, it's impossible to tell if the connection is still active, it seems as if messages are being sent just fine. Other times, when sending a message, it will throw an IO error and disconnect.
Apart from implementing code to detect connection changes and reconnecting appropriately, is it possible to have the socket immediately throw an IO exception when connectivity changes?
Edit: I'm connecting using the following code:
Socket sock = new Socket();
sock.connect(new InetSocketAddress(getAddress(), getPort()), getTimeout());
//get bufferedReader and read until BufferedReader#readLine() returns null
I'm not using setSoTimeout as data may not be transferred for long periods of time depending on the remote server's configuration.
Are you talking about a java.net.Socket connection? Then try setSoTimeout(). Otherwise specify how you're connecting.
This is an old problem that I've seen a few times before in the database world.
The solution I used there was to manage the connection at the application level. I'd explicitly send a no-op message of some sort (i.e. SELECT 1 WHERE FALSE) over the connection every so often as a ping, and if this failed I would tear down and re-establish the connection, possibly to a failover server if the original wasn't accepting connections.
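The ping-and-failover approach can be sketched as follows. Everything here is hypothetical scaffolding for illustration: PingedConnection, the connector function and the ping predicate stand in for whatever driver or protocol calls your application really uses.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical wrapper: hands out a connection only after it has answered a no-op ping. */
final class PingedConnection<C> {
    private final List<String> servers;          // primary first, then failover targets
    private final Function<String, C> connector; // opens a connection, or throws a RuntimeException
    private final Predicate<C> ping;             // sends the no-op message; true if it was answered
    private C current;

    PingedConnection(List<String> servers, Function<String, C> connector, Predicate<C> ping) {
        this.servers = servers;
        this.connector = connector;
        this.ping = ping;
    }

    /** Returns a connection that just answered a ping, reconnecting (with failover) if needed. */
    C healthy() {
        if (current != null && answers(current)) {
            return current;
        }
        current = null; // a real implementation would also close the discarded connection here
        for (String server : servers) {
            try {
                C candidate = connector.apply(server);
                if (answers(candidate)) {
                    current = candidate;
                    return candidate;
                }
            } catch (RuntimeException e) {
                // this server is not accepting connections; try the next one
            }
        }
        throw new IllegalStateException("no server reachable");
    }

    private boolean answers(C connection) {
        try {
            return ping.test(connection);
        } catch (RuntimeException e) {
            return false; // a failed ping counts as a dead connection
        }
    }
}
```

Calling healthy() from a timer gives the periodic ping; calling it before each request instead trades the background traffic for one extra round trip per request.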
As previous answers already pointed out, this is a common problem. Even after sending a custom "ping" it might take some time until the socket realizes that the underlying connection is broken. Plus, regular pings are quite energy-demanding on 3G/4G mobile networks, due to their tail states. Don't do that!
What you can do, however, is request to be informed when the connectivity changes, and close/reconnect the socket manually in the corresponding broadcast receiver. (EDIT: I see you already found out about this; just keeping it here for completeness)