We have a simple client-server architecture between our mobile device and our server, both written in Java: an extremely simple ServerSocket and Socket implementation. However, one problem is that when the client terminates abruptly (without closing the socket properly), the server does not know that it is disconnected. Furthermore, the server can continue to write to this socket without getting any exceptions. Why?
According to the documentation, Java sockets should throw exceptions if you try to write to a socket that is not reachable on the other end!
The connection will eventually time out after repeated retransmissions, each governed by the retransmission timeout (RTO). However, the RTO is computed by an adaptive algorithm based on the measured round-trip time (RTT); see this RFC:
http://www.ietf.org/rfc/rfc2988.txt
On a mobile network, where RTTs are long and variable, this can add up to minutes. Wait 10 minutes to see whether you get a timeout.
The solution to this kind of problem is to add a heartbeat to your own application protocol and tear down the connection when you don't get an ACK for the heartbeat.
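A minimal sketch of the bookkeeping side of such a heartbeat, assuming a protocol in which the peer answers each ping with some reply. HeartbeatTracker is a hypothetical class, not part of any library; it takes timestamps as arguments so it can be driven from whichever threads send pings and read replies, and so the logic can be checked without a network.

```java
/**
 * Hypothetical heartbeat bookkeeping: records when a ping was sent, clears it
 * when a reply arrives, and reports the peer as dead once a ping has gone
 * unanswered for longer than ackTimeoutMs. Times are in milliseconds, e.g.
 * System.nanoTime() / 1_000_000.
 */
final class HeartbeatTracker {
    private final long ackTimeoutMs;
    private long pingSentAtMs = -1;   // -1 means no ping is outstanding

    HeartbeatTracker(long ackTimeoutMs) {
        this.ackTimeoutMs = ackTimeoutMs;
    }

    /** Call right after writing a ping to the socket. */
    synchronized void onPingSent(long nowMs) {
        if (pingSentAtMs < 0) {
            pingSentAtMs = nowMs;     // keep the oldest unanswered ping
        }
    }

    /** Call when the peer's ACK (or any reply) is read from the socket. */
    synchronized void onAck() {
        pingSentAtMs = -1;
    }

    /** True if a ping has been outstanding for longer than the ACK timeout. */
    synchronized boolean isDead(long nowMs) {
        return pingSentAtMs >= 0 && nowMs - pingSentAtMs > ackTimeoutMs;
    }
}
```

In a real client you would call onPingSent() after writing the ping, onAck() from the reader thread, and close the socket when isDead() returns true, which also unblocks any thread still reading from it.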
The key phrase here is "without closing the socket properly".
Sockets should always be acquired and disposed of in this way:
final Socket socket = ...; // connect code
try
{
use( socket ); // use socket
}
finally
{
socket.close( ); // dispose
}
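On Java 7 and later, try-with-resources expresses the same acquire/dispose pattern more compactly. A small sketch, where SocketDisposal and its SocketUser callback are hypothetical names standing in for use( socket ):

```java
import java.io.IOException;
import java.net.Socket;

final class SocketDisposal {
    /** Hypothetical callback standing in for the protocol code ("use( socket )"). */
    interface SocketUser {
        void use(Socket socket) throws IOException;
    }

    /** Runs the protocol and closes the socket on every exit path, normal or exceptional. */
    static void withSocket(Socket socket, SocketUser user) throws IOException {
        try (Socket s = socket) {
            user.use(s);
        }
    }
}
```

One difference from the try/finally version: if close() itself throws while another exception is already propagating, the close failure is attached as a suppressed exception instead of replacing the original one.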
Even with these precautions, you should specify application-level timeouts specific to your protocol.
My experience has shown that, unfortunately, you cannot use any of the Socket timeout functionality reliably (e.g. there is no timeout for write operations, and even read operations may sometimes hang forever).
That's why you need a watchdog thread that enforces your application timeouts and disposes of sockets that have been unresponsive for a while.
One convenient way of doing this is to create the Socket and ServerSocket through the corresponding java.nio channels (SocketChannel and ServerSocketChannel). The main advantage of such sockets is that they are interruptible: you can simply interrupt the thread that runs the socket protocol and be sure that the socket is properly disposed of.
Note that you should enforce application timeouts on both sides, as running into unresponsive sockets is only a matter of time and bad luck.
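A sketch of the interruptible-channel approach, assuming a blocking java.nio SocketChannel. WatchdogRead and readWithTimeout are hypothetical names. A shared scheduler thread acts as the watchdog and interrupts the reader if a read outlives its deadline; per the InterruptibleChannel contract, interrupting a thread blocked in an I/O operation on such a channel closes the channel and makes the operation throw ClosedByInterruptException.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class WatchdogRead {
    // One shared daemon thread acts as the watchdog for all readers.
    private static final ScheduledExecutorService WATCHDOG =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "socket-watchdog");
                t.setDaemon(true);
                return t;
            });

    /**
     * Reads from a blocking SocketChannel; if the read takes longer than
     * timeoutMs, the watchdog interrupts this thread, which closes the channel
     * and makes read() throw ClosedByInterruptException.
     */
    static int readWithTimeout(SocketChannel ch, ByteBuffer buf, long timeoutMs)
            throws IOException {
        final Thread reader = Thread.currentThread();
        final Object lock = new Object();
        final boolean[] state = new boolean[2];   // [0] = read finished, [1] = watchdog fired
        ScheduledFuture<?> alarm = WATCHDOG.schedule(() -> {
            synchronized (lock) {
                if (!state[0]) {
                    state[1] = true;
                    reader.interrupt();
                }
            }
        }, timeoutMs, TimeUnit.MILLISECONDS);
        try {
            return ch.read(buf);
        } finally {
            synchronized (lock) {
                state[0] = true;                  // the watchdog can no longer fire
                if (state[1]) {
                    Thread.interrupted();         // clear the interrupt the watchdog caused
                }
            }
            alarm.cancel(false);
        }
    }
}
```

The lock and flags make sure the watchdog cannot interrupt the thread after the read has already completed, and that the interrupt status it sets is cleared again, so the calling thread can carry on (for example, to reconnect).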
TCP/IP communications can be very strange. TCP will retry for quite a while at the bottom layers of the stack without ever letting the upper layers know that anything happened.
I would fully expect that after some time period (30 seconds to a few minutes) you should see an error, but I haven't tested this; I'm just going off how TCP apps tend to work.
You might be able to tighten the TCP settings (retries, timeouts, etc.), but again, I haven't experimented with that much.
Also, it may be that I'm totally wrong and the implementation of Java you are using is just flaky.
To answer the first part of the question (about not knowing that the client has disconnected abruptly), in TCP, you can't know whether a connection has ended until you try to use it.
The notion of guaranteed delivery in TCP is quite subtle: delivery isn't actually guaranteed to the application at the other end (it depends on what "guaranteed" really means). Section 2.6 of RFC 793 (TCP) gives more details on this topic. This thread on the Restlet-discuss list and this thread on the Linux kernel list might also be of interest.
For the second part (not getting an error when writing to this socket), this is probably a question of buffering and timeouts (as others have already suggested).
I am facing the same problem.
I think when you register the socket with a selector it doesn't throw any exception.
Are you using a selector with your socket?
Suppose a simple network model: A has successfully created a TCP connection to B, and they are communicating with each other like this:
A <----------> B
I know that if the program on A dies (such as core dump), that will cause a RST packet to B. So any read attempt of B will lead to an EOF, and any write attempt of B will lead to SIGPIPE. Am I right?
If, however, the network breaks down (e.g. a cable or router failure) on A, what happens to B's read/write attempts? In my situation, all the sockets have been set to non-blocking. As a result, is it impossible for me to detect a network error?
By the way, I notice that there is a socket option SO_KEEPALIVE which may be useful to me: http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/. But I wonder what the cost will be if I set the probing interval to 2~3 seconds (the default is 75 seconds)? And it seems the interval configuration is global, so will this affect all the sockets on the machine?
Final question...
Say the network has broken down and any write attempt would cause EPIPE some time later. If, however, instead of trying to write, I add this socket to an epoll instance, what will happen then? Will epoll_wait return an EPOLLHUP or EPOLLERR event?
There are numerous other ways a TCP connection can go dead undetected:
someone yanks out a network cable in between.
the computer at the other end gets nuked.
a NAT gateway in between silently drops the connection.
the OS at the other end crashes hard.
the FIN packet gets lost.
undetectable errors: a router between the endpoints may drop packets (including control packets).
In all cases you will eventually find out when you try to write to the socket: the write fails, which raises SIGPIPE in your program and terminates it unless you handle or ignore the signal.
With read() alone you cannot tell whether the other side is alive or not. That's why SO_KEEPALIVE is useful. Keepalive is non-invasive, and in most cases, if you're in doubt, you can turn it on without the risk of doing something wrong. But do remember that it generates extra network traffic, which can have an impact on routers and firewalls.
The system-wide keepalive timing settings are indeed global, as you suspected, but on Linux they only act as defaults for sockets that enable SO_KEEPALIVE and can be overridden per socket with the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options. Also, SO_KEEPALIVE increases traffic and consumes CPU. Independently of keepalive, it's best to install a SIGPIPE handler (or ignore SIGPIPE and check for EPIPE) if there is any chance the application will ever write to a broken connection.
Also, use SO_KEEPALIVE only where it makes sense in the application. It's wasteful to enable it for the whole lifetime of every connection; do use it, for example, when the server works on a client query for a long time.
The right probing interval depends on your application, or rather on your application-layer protocol.
With TCP keepalive enabled, you'll detect the failure eventually: with the default settings, within a couple of hours.
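In Java, a per-socket override of the probe timing is possible on JDK 11 and later through jdk.net.ExtendedSocketOptions, where the platform supports it (e.g. Linux and macOS); otherwise only Socket.setKeepAlive(true) is available and the system-wide defaults apply. A minimal sketch; KeepAlive.enable is a hypothetical helper, and the values you pass are illustrative rather than recommendations:

```java
import java.io.IOException;
import java.net.Socket;
import jdk.net.ExtendedSocketOptions;

final class KeepAlive {
    /**
     * Turns on SO_KEEPALIVE and, where the JDK and OS support it, overrides
     * the system-wide probe timing for this socket only.
     * Returns false if only the OS defaults apply.
     */
    static boolean enable(Socket s, int idleSeconds, int intervalSeconds, int probes)
            throws IOException {
        s.setKeepAlive(true);
        if (!s.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPIDLE)) {
            return false;   // per-socket tuning unavailable; system defaults are used
        }
        s.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSeconds);         // idle time before the first probe
        s.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSeconds); // time between probes
        s.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, probes);             // unanswered probes before giving up
        return true;
    }
}
```

For example, enable(socket, 10, 3, 3) would start probing after 10 idle seconds and give up after 3 unanswered probes sent 3 seconds apart, roughly 20 seconds after the peer goes silent. The main cost of short intervals is the extra probe traffic on every idle connection that uses them.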
Now suppose the network has broken down and, instead of trying to write, you add the socket to an epoll instance. epoll_wait() fills its second argument (the events array) with the ready events, and good practice is to check each event's flags for errors, as follows:
n = epoll_wait (efd, events, MAXEVENTS, -1);
for (i = 0; i < n; i++)
{
    if ((events[i].events & EPOLLERR) ||
        (events[i].events & EPOLLHUP) ||
        (!(events[i].events & EPOLLIN)))
    {
        /* An error has occurred on this fd, or the socket is not
           ready for reading (why were we notified then?) */
        fprintf (stderr, "epoll error\n");
        close (events[i].data.fd);
        continue;
    }
    else if (sfd == events[i].data.fd)
    {
        /* We have a notification on the listening socket (sfd), which
           means one or more incoming connections. */
        /* accept() the new connection(s) here */
    }
}
It is also worth requesting EPOLLRDHUP when registering the fd with epoll_ctl(); the epoll_ctl(2) man page describes it as:
Stream socket peer closed connection, or shut down writing half of connection. (This flag is especially useful for writing simple code to detect peer shutdown when using Edge Triggered monitoring.)
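For comparison, a Java sketch: java.nio.channels.Selector uses epoll on Linux but does not expose EPOLLERR, EPOLLHUP or EPOLLRDHUP. As far as I know, an error or hang-up makes the channel show up as ready for the operations you registered, and the actual condition then surfaces from read(): -1 for an orderly close (FIN), or an IOException such as a connection reset. SelectorPeerCheck.pollOnce is a hypothetical helper that assumes non-blocking SocketChannels registered for OP_READ.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

final class SelectorPeerCheck {
    /** Waits up to timeoutMs for readiness; returns how many channels were closed because the peer went away. */
    static int pollOnce(Selector selector, long timeoutMs) throws IOException {
        int closed = 0;
        selector.select(timeoutMs);
        ByteBuffer buf = ByteBuffer.allocate(4096);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (!key.isValid() || !key.isReadable()) {
                continue;
            }
            SocketChannel ch = (SocketChannel) key.channel();
            buf.clear();
            int n;
            try {
                n = ch.read(buf);
            } catch (IOException e) {   // e.g. connection reset (RST) by the peer
                n = -1;
            }
            if (n < 0) {                // EOF or error: the peer is gone
                key.cancel();
                ch.close();
                closed++;
            }
            // otherwise: hand the bytes in buf to the application protocol
        }
        return closed;
    }
}
```

As with epoll, this only reports what TCP already knows; a peer that vanished silently still needs a timeout or keepalive before anything shows up here.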
I know that if the program on A dies (such as core dump), that will cause a RST packet to B. So any read attempt of B will lead to an EOF, and any write attempt of B will lead to SIGPIPE. Am I right?
Partially. RST causes an ECONNRESET, not an EOF, when reading, and EPIPE when writing.
If, however, the network breaks down (e.g. a cable or router failure) on A, what happens to B's read/write attempts? In my situation, all the sockets have been set to non-blocking. As a result, is it impossible for me to detect a network error?
Impossible on read alone, unless you use a read timeout, e.g. via select(), and treat a timeout as a failure, which it might not be. On write you will eventually get an EPIPE, but it could take some time and several attempts due to buffering and retries.
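The Java counterpart of "select() with a timeout" is Socket.setSoTimeout(). A sketch (TimedRead is a hypothetical helper) that keeps the three outcomes apart, so the caller can decide how many timeouts in a row count as a dead peer:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class TimedRead {
    enum Result { DATA, EOF, TIMEOUT }

    /**
     * Blocking read bounded by SO_TIMEOUT. TIMEOUT only means "nothing arrived
     * in time"; the peer may still be alive. A reset connection surfaces as an
     * IOException from read().
     */
    static Result readSome(Socket s, byte[] buf, int timeoutMs) throws IOException {
        s.setSoTimeout(timeoutMs);
        InputStream in = s.getInputStream();
        try {
            return in.read(buf) < 0 ? Result.EOF : Result.DATA;
        } catch (SocketTimeoutException e) {
            return Result.TIMEOUT;   // the socket remains valid after this
        }
    }
}
```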
I'm using the App Engine Trusted Tester Sockets API to connect to APNS. Writing to the socket works fine.
But the problem is that the socket gets reclaimed after 2 minutes of inactivity. The Trusted Tester website says that any socket operation keeps the socket alive for a further 2 minutes. It would be nicer to keep the socket open until APNS decides to close the connection.
After trying pretty much all of the Socket API methods short of writing to the output stream, the socket gets closed after 2 minutes no matter what. What have I missed?
Deployed on a Java backend.
You can't artificially keep a socket connected to APNS open without sending actual push notifications. The only other way to keep it open would be to send arbitrary data, but that would get the socket closed immediately: APNS closes the connection as soon as it detects something that does not conform to the protocol, i.e. something that is not an actual push notification.
SO_KEEPALIVE
What about SO_KEEPALIVE? App Engine explicitly says it is supported. I think that just means it won't throw an exception when you call Socket.setKeepAlive(true); calls that set socket options used to raise Not Implemented exceptions. Even if you enable keep-alive, your socket will be reclaimed (closed) if you don't send anything for more than 2 minutes, at least on App Engine as of now.
Actually, that's not a big surprise. RFC 1122, which specifies TCP keep-alives, states that they must only be sent when no data or acknowledgement has been received for an interval, and that this interval must default to no less than two hours. Although it also says the interval must be configurable, there is no portable API on java.net.Socket to configure it (most probably because it's highly OS-dependent), and I doubt it would be set to 2 minutes on App Engine.
SO_TIMEOUT
What about SO_TIMEOUT? It is for something completely different. The javadoc of Socket.setSoTimeout() states:
Enable/disable SO_TIMEOUT with the specified timeout, in milliseconds. With this option set to a non-zero timeout, a read() call on the InputStream associated with this Socket will block for only this amount of time. If the timeout expires, a java.net.SocketTimeoutException is raised, though the Socket is still valid. The option must be enabled prior to entering the blocking operation to have effect. The timeout must be > 0. A timeout of zero is interpreted as an infinite timeout.
That is, when read() is blocking for too long because there's nothing to read you can say "ok, I don't want to wait (block) anymore; let's do something else instead". It's not going to help with our "2 minutes" problem.
What then?
The only way to work around this problem is to detect when a connection has been reclaimed/closed, throw it away and open a new connection. And there is a library that supports exactly that.
Check out java-apns-gae.
It's an open-source Java APNS library that was specifically designed to work (and be used) on Google App Engine.
https://github.com/ZsoltSafrany/java-apns-gae
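The detect-and-reconnect idea itself can be sketched in a few lines. This is not java-apns-gae's implementation; ReconnectingSender and its Connector interface are hypothetical, and in practice Connector would open a fresh (TLS) socket to the server. Keep in mind that a successful write only means the bytes were buffered locally, so this catches connections that are already known to be closed, not ones that died silently.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

final class ReconnectingSender {
    /** Hypothetical factory that opens a fresh connection to the server. */
    interface Connector {
        Socket connect() throws IOException;
    }

    private final Connector connector;
    private Socket socket;
    private int reconnects;

    ReconnectingSender(Connector connector) {
        this.connector = connector;
    }

    /** Writes the message, reopening the connection once if the old one turns out to be dead. */
    synchronized void send(byte[] message) throws IOException {
        try {
            write(message);
        } catch (IOException firstFailure) {
            closeQuietly();       // throw the dead connection away
            reconnects++;
            write(message);       // a single retry on a fresh connection
        }
    }

    private void write(byte[] message) throws IOException {
        if (socket == null) {
            socket = connector.connect();
        }
        // A socket closed underneath us surfaces here as an IOException.
        OutputStream out = socket.getOutputStream();
        out.write(message);
        out.flush();
    }

    private void closeQuietly() {
        try {
            if (socket != null) {
                socket.close();
            }
        } catch (IOException ignored) {
            // nothing useful to do with a broken socket
        } finally {
            socket = null;
        }
    }

    synchronized Socket currentSocket() { return socket; }

    synchronized int reconnectCount() { return reconnects; }
}
```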
Did you try getSoLinger()? That may be the getSocketOpt call that (kind of) works currently, and it may reset the 2-minute timeout. In theory, a zero-byte read would as well, but I'm not sure; if you try that, use this method on the InputStream:
public int read(byte b[], int off, int len)
If these suggestions don't work, please file an issue with the App Engine issue tracker.
There will be some other fixes coming, e.g. using socket options etc.
Use getpeername().
From https://developers.google.com/appengine/docs/java/sockets/overview ...
Sockets may be reclaimed after 2 minutes of inactivity; any socket operation (e.g. getpeername) keeps the socket alive for a further 2 minutes. (Notice that you cannot Select between multiple available sockets because that requires java.nio.SocketChannel which is not currently supported.)
I am building a client-server application where I have to implement a keepalive mechanism in order to detect whether the client has crashed. I have separate threads on both the client and server side. The client thread sends a "ping" and then sleeps for 3 seconds, while the server reads the BufferedInputStream and checks whether a ping was received: if so, it resets the ping counter to zero, otherwise it increments the counter by 1. The server thread then sleeps for 3 seconds; if the ping counter reaches 3, it declares the client dead.
The problem is that when the server reads the input stream, it's a blocking call, and it blocks until the next ping is received, however late that is, so the server never detects a missed ping.
Any suggestions on how I can read whatever is currently on the stream without blocking when there is nothing on the incoming stream?
Thanks,
Java 1.4 introduced the idea of non-blocking I/O, represented by the java.nio package. This is probably what you need.
See this tutorial for how to use non-blocking I/O.
Also, assuming this isn't homework or a learning exercise, then I recommend using a more robust protocol framework such as Apache Mina or JBoss Netty, rather than building this stuff from scratch. See this comparison between them, and why you'd want to use them.
You can have a separate monitoring thread that monitors all the blocking connections. When a connection receives anything, it resets a counter (I would treat any packet as being as good as a heartbeat). The monitoring thread increments each counter every time it runs, and when a counter reaches a limit (i.e. because it wasn't reset to zero), it closes that connection. You only need one such thread. The thread blocked on the connection you just closed will throw an IOException, which wakes it up.
On the other side, a heartbeat can be sent whenever no packet has been sent for some period of time. This means a busy connection doesn't send any heartbeats; it shouldn't need to.
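A sketch of that single monitoring thread, assuming each connection has its own reader thread blocked in read(). ConnectionMonitor is a hypothetical class; tick() would be scheduled with something like ScheduledExecutorService.scheduleAtFixedRate, for example every 3 seconds as in the question.

```java
import java.io.IOException;
import java.net.Socket;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class ConnectionMonitor {
    private final Map<Socket, AtomicInteger> missed = new ConcurrentHashMap<>();
    private final int maxMissed;

    ConnectionMonitor(int maxMissed) {
        this.maxMissed = maxMissed;
    }

    void register(Socket s) {
        missed.put(s, new AtomicInteger());
    }

    /** Called by a reader thread whenever anything arrives (any data counts as a heartbeat). */
    void touch(Socket s) {
        AtomicInteger counter = missed.get(s);
        if (counter != null) {
            counter.set(0);
        }
    }

    /** Called periodically by the single monitoring thread. */
    void tick() {
        missed.entrySet().removeIf(e -> {
            if (e.getValue().incrementAndGet() < maxMissed) {
                return false;                 // still within the allowed silence
            }
            try {
                e.getKey().close();           // the blocked reader now gets an IOException
            } catch (IOException ignored) {
                // already broken; nothing more to do
            }
            return true;                      // stop tracking the dead connection
        });
    }
}
```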
I want to detect when the internet connection goes down. Can I capture that event? I can't find a suitable API or any example that explains this.
I am using a socket for (TCP) communication, and I open the socket when the network is available. I have observed that the socket does not throw any exception when the network goes down.
If anyone has done this, any examples or links would be really helpful. Thanks in advance.
The problem is that no "network down" event exists in TCP connections; they just go down.
As suggested by Jerome, you should check whether a timeout has been reached.
Of course, if the network goes down you can neither receive nor send packets, so the underlying InputStream and OutputStream will throw an IOException, but only once they realize that the network is not working properly, which can take minutes depending on how the TCP layer is configured.
Look at the TCP state diagram yourself.
What typically happens in the ESTABLISHED state is that your socket sends data and waits for the ACK from the destination. The ACK won't come because the network went down, so the socket's send window fills up and it keeps retransmitting until the real timeout intervenes and the exception is thrown.
Another case is when the network goes down and your socket realizes that it cannot write to the channel anymore: it will throw an exception immediately upon calling outStream.write(...).
It's not that easy to tell whether the network is off or just slow.
If you set timeouts, an exception will be thrown when an operation takes too long (note that for sockets, setSoTimeout() only limits blocking reads, not writes):
For sockets:
socket.setSoTimeout(CONNECTION_TIMEOUT);
For HttpURLConnections:
HttpURLConnection con = (HttpURLConnection)url.openConnection();
con.setConnectTimeout(CONNECTION_TIMEOUT);
con.setReadTimeout(CONNECTION_TIMEOUT);
TCP is designed to be quiet when idle. There are no administrative packets on the wire when there is no pending data. If the connection dies while idle, you will not know, no matter what the timeout setting is. TCP does have keepalives, but they're pretty much useless at the recommended interval of 2 hours or longer.
You need to build a heartbeat or keepalive into your application protocol to detect stale connections. The keepalive is nothing but a no-op packet sent at a regular interval to trigger a TCP timeout when the connection is down. In my app, I do this every 10 seconds.
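A sketch of such a no-op keepalive in Java, assuming the protocol reserves a single byte that the peer ignores (NOOP = 0 here is a hypothetical value). NoopKeepAlive.start is a hypothetical helper; other writers should synchronize on the same OutputStream so that a NOOP never lands in the middle of a message.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class NoopKeepAlive {
    static final int NOOP = 0;   // hypothetical "ignore me" byte in the application protocol

    /** Writes NOOP every periodMs; when a write fails, closes the socket and runs onDead. */
    static ScheduledExecutorService start(Socket s, OutputStream out, long periodMs, Runnable onDead) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            try {
                synchronized (out) {      // other writers must lock the same stream
                    out.write(NOOP);
                    out.flush();
                }
            } catch (IOException e) {
                try {
                    s.close();
                } catch (IOException ignored) {
                    // already broken
                }
                onDead.run();
                timer.shutdown();         // stop pinging a dead connection
            }
        }, periodMs, periodMs, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

With the 10-second interval mentioned above, this would be start(socket, socket.getOutputStream(), 10_000, onDead). The returned executor should be shut down when the connection is closed normally. As other answers point out, a write can still succeed for a while after the network is gone, because the data is only buffered; the failure shows up once TCP gives up retransmitting.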
Why don't you try pinging www.google.com?
See http://java.sun.com/j2se/1.5.0/docs/guide/nio/example/Ping.java
My chat application connects to a server and information is sent/received by the user. When the connection changes, such as 3g->wifi, wifi->3g, losing a data connection, etc, the socket sometimes stays connected for ages before disconnecting. During this time, it's impossible to tell if the connection is still active, it seems as if messages are being sent just fine. Other times, when sending a message, it will throw an IO error and disconnect.
Apart from implementing code to detect connection changes and reconnecting appropriately, is it possible to have the socket immediately throw an IO exception when connectivity changes?
Edit: I'm connecting using the following code:
Socket sock = new Socket();
sock.connect(new InetSocketAddress(getAddress(), getPort()), getTimeout());
//get bufferedReader and read until BufferedReader#readLine() returns null
I'm not using setSoTimeout as data may not be transferred for long periods of time depending on the remote server's configuration.
Are you talking about a java.net.Socket connection? Then try setSoTimeout(). Otherwise specify how you're connecting.
This is an old problem that I've seen a few times before in the database world.
The solution I used there was to manage the connection at the application level. I'd explicitly send a no-op message of some sort (e.g. SELECT 1 WHERE FALSE) over the connection every so often as a ping, and if this failed I would tear down and re-establish the connection, possibly to a failover server if the original wasn't accepting connections.
As previous answers already pointed out, this is a common problem. Even after sending a custom "ping", it might take some time until the socket realizes that the underlying connection is broken. Plus, regular pings are quite energy-demanding on 3G/4G mobile networks due to the radio's tail states. Don't do that!
What you can do, however, is requesting to get informed when the connectivity changes (last section), and close/reconnect the socket manually in the according broadcast receiver. (EDIT: I see you already found out about this; just keeping it here for completeness)