Suppose a simple network setup: A has successfully established a TCP connection to B, and they are communicating with each other like this:
A <----------> B
I know that if the program on A dies (e.g. a core dump), an RST packet will be sent to B. So any read attempt on B will lead to an EOF, and any write attempt on B will lead to SIGPIPE. Am I right?
If, however, the network breaks down (e.g. a cable/router failure) on A's side, what happens to read/write attempts on B? In my situation, all the sockets have been set to non-blocking. As a result, is it impossible for me to detect the network error?
By the way, I notice that there is a socket option, SO_KEEPALIVE, which may be useful to me: http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/. But I wonder how much it would cost if I set the probing interval to 2-3 seconds (the default is 75 seconds)? And it seems the interval configuration is global, so will this affect all the sockets on the machine?
Final question...
Say the network has broken down, and any write attempt would cause EPIPE some time later. If, however, instead of trying to write, I put this socket into an epoll instance, what will happen then? Will epoll_wait return an EPOLLHUP or EPOLLERR event?
There are numerous other ways a TCP connection can go dead undetected:
someone yanks out a network cable in between;
the computer at the other end gets nuked;
a NAT gateway in between silently drops the connection;
the OS at the other end crashes hard;
the FIN packets get lost;
a router between the endpoints silently drops packets (including control packets).
In all these cases you only find out when you next try to write to the socket; by default this raises SIGPIPE in your program and terminates it (unless you handle or ignore the signal, in which case write() fails with EPIPE).
A read() alone cannot tell you whether the other side is still alive. That is why SO_KEEPALIVE is useful. Keepalive is non-invasive, and in most cases, if you are in doubt, you can turn it on without the risk of doing something wrong. But remember that it generates extra network traffic, which can have an impact on routers and firewalls.
And yes, the probe interval setting affects all sockets on your machine (you are correct). Because SO_KEEPALIVE increases traffic and consumes CPU, it is best to also handle SIGPIPE if there is any chance the application will ever write to a broken connection.
Also, enable SO_KEEPALIVE only where it is reasonable in the application. It is wasteful to keep it on for the whole connection duration (e.g. enable it only while the server is working on a long-running client query).
The right probing interval depends on your application, or rather on your application-layer protocol.
With TCP keepalive enabled you will detect the failure eventually, although by default it can take a couple of hours.
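For illustration only, here is a minimal sketch of enabling keepalive per connection using Java's standard socket API (Java because most of the rest of this page is Java; the host and port are hypothetical). Note that plain java.net.Socket only lets you switch keepalive on or off; the probe timing itself comes from the OS configuration (e.g. tcp_keepalive_time on Linux):

import java.io.IOException;
import java.net.Socket;

public class KeepAliveExample {
    public static void main(String[] args) throws IOException {
        // Hypothetical peer; substitute your own host and port.
        try (Socket socket = new Socket("server.example", 9000)) {
            // SO_KEEPALIVE is off by default; this enables it for this socket only.
            socket.setKeepAlive(true);
            // The probe interval/idle time is an OS-level setting and is not
            // configurable through java.net.Socket.
            System.out.println("keepalive enabled: " + socket.getKeepAlive());
        }
    }
}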
Now say the network has broken down and, instead of trying to write, the socket has been put into an epoll instance.
epoll_wait() fills in its second argument, the events array, with the events that fired:
n = epoll_wait (efd, events, MAXEVENTS, -1);
Good practice is to check the error-related flags on each returned event, as follows:
n = epoll_wait (efd, events, MAXEVENTS, -1);
for (i = 0; i < n; i++)
  {
    if ((events[i].events & EPOLLERR) ||
        (events[i].events & EPOLLHUP) ||
        (!(events[i].events & EPOLLIN)))
      {
        /* An error has occurred on this fd, or the socket is not
           ready for reading (why were we notified then?) */
        fprintf (stderr, "epoll error\n");
        close (events[i].data.fd);
        continue;
      }
    else if (sfd == events[i].data.fd)
      {
        /* We have a notification on the listening socket, which
           means one or more incoming connections. */
        /* Accept the connection(s) and do what you want with them. */
      }
  }
For reference, EPOLLRDHUP means:
Stream socket peer closed connection, or shut down writing half of connection. (This flag is especially useful for writing simple code to detect peer shutdown when using Edge Triggered monitoring.)
I know that if the program on A dies (e.g. a core dump), an RST packet will be sent to B. So any read attempt on B will lead to an EOF, and any write attempt on B will lead to SIGPIPE. Am I right?
Partially. RST causes an ECONNRESET, not an EOF, when reading, and EPIPE when writing.
If, however, the network breaks down (e.g. a cable/router failure) on A's side, what happens to read/write attempts on B? In my situation, all the sockets have been set to non-blocking. As a result, is it impossible for me to detect the network error?
Impossible on read alone, unless you use a read timeout, e.g. via select(), and take a timeout as a failure, which it mightn't be. On write you will eventually get an EPIPE, but it could take some time and several attempts due to buffering and retries.
Related
I am writing a program that uses Java server/client sockets. There will be many messages sent back and forth, and in some situations a client will send a message to the server and then wait for a period of time until the server sends back an "execute" message.
Here is what I have planned:
1 Server (machine could possibly have antivirus security on it)
3 Clients (with room for more clients in future)
Parallel and interleaved synchronization carried out on the server side, based upon the clients' output to the server.
When all machines are ready (in sync): in parallel mode, all clients are sent an "execute" message at once; in interleaved mode, clients are sent the "execute" command sequentially, one by one.
I have started to build the program with the setup above; once a message is received on the server, the server performs actions based upon the input and then sends a message back to the client. I have had problems in the past where messages were not sent or received properly, so my question is:
Do I keep the socket alive until the end of my program?
Or do I keep the socket open only until a successful transmission (a full handshake) has taken place and then close the socket, leaving the client to connect again the next time it wants to send a message?
You should certainly keep TCP connections open for as long as possible, but be prepared to create a new one on failure. You will need to use read timeouts at both ends to detect those failures.
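A minimal sketch of that idea (host, port, and the line-based protocol are assumptions made for the example): set a read timeout on the socket, and on a timeout or I/O error discard the socket and create a new one.

import java.io.*;
import java.net.*;

public class ResilientClient {
    private static final String HOST = "server.example"; // hypothetical
    private static final int PORT = 9000;                // hypothetical

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            try (Socket socket = new Socket(HOST, PORT)) {
                socket.setSoTimeout(10_000); // read timeout: 10 seconds
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(socket.getInputStream()));
                String line;
                while ((line = in.readLine()) != null) { // blocks for at most 10 s
                    System.out.println("got: " + line);
                }
                // null means the peer closed the connection in an orderly way.
            } catch (SocketTimeoutException e) {
                // Nothing arrived within the timeout; this example treats that as a
                // dead connection and falls through to reconnect.
            } catch (IOException e) {
                // Connection refused, reset, broken, etc. -- also reconnect.
            }
            Thread.sleep(1000); // back off a little before creating a new socket
        }
    }
}

Whether a read timeout really means the connection is dead is a protocol-level decision, as noted above; for request/response traffic it usually is.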
Q: Should I open a new socket for each connection, or keep one around and re-use it for subsequent exchanges?
A: "It depends".
I would encourage you to "keep it simple" and just open a new socket as needed ... until you find that you need to do otherwise.
By all means keep the socket open for as long as you reasonably expect a "dialog" between your client and server. Don't force the client to establish a new connection if it is likely to want to talk again reasonably soon.
Finally, take a look at these links regarding "Connection Pooling":
http://www.javacodegeeks.com/2013/08/simple-and-lightweight-pool-implementation.html
http://tutorials.jenkov.com/java-multithreaded-servers/thread-pooled-server.html
Whether or not you close the socket after a message depends on the protocol that you use between the server and the clients. Probably you define this yourself.
What is probably more important, is that you are able to serve multiple clients in parallel. Therefore, you need to start a separate thread for every client that requests a connection.
Personally, I have made some applications with socket communication. To avoid holding resources for too long when they are not used, but also to avoid constantly closing and reopening a heavily used connection, I added a connection supervisor. This is yet another thread that is started when a connection is opened and simply counts down from a predefined value (e.g. counting down from 60, decreasing the value every second, for a supervision time of 1 minute). When the counter reaches zero, it closes the socket and terminates that particular thread.
Whenever an open socket receives a new message, the supervision counter is reset, so the socket remains open as long as the time between messages is less than 1 minute. A sketch of this idea follows.
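A minimal, hypothetical version of such a supervisor (the class name and the 60-second limit are choices made for this sketch, not taken from the post above):

import java.io.IOException;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

/** Closes the supervised socket after roughly 60 seconds without activity. */
public class IdleSupervisor extends Thread {
    private static final int LIMIT_SECONDS = 60;

    private final Socket socket;
    private final AtomicInteger countdown = new AtomicInteger(LIMIT_SECONDS);

    public IdleSupervisor(Socket socket) {
        this.socket = socket;
        setDaemon(true);
    }

    /** Call this every time a message is received on the connection. */
    public void messageReceived() {
        countdown.set(LIMIT_SECONDS);
    }

    @Override
    public void run() {
        try {
            while (countdown.decrementAndGet() > 0) {
                Thread.sleep(1000); // tick once per second
            }
            socket.close(); // idle too long: closing also unblocks any blocked read
        } catch (InterruptedException | IOException ignored) {
            // interrupted (connection torn down elsewhere) or close failed: just exit
        }
    }
}

The thread that handles the connection would create and start an IdleSupervisor when the socket is opened and call messageReceived() after every successful read.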
I am using the App Engine Trusted Tester Sockets to connect to APNS. Writing to the socket works fine.
The problem is that the socket gets reclaimed after 2 minutes of inactivity. The Trusted Tester website says that any socket operation keeps the socket alive for a further 2 minutes. It would be nicer to keep the socket open until APNS decides to close the connection.
After trying pretty much all of the Socket API methods short of writing to the output stream, the socket still gets closed after 2 minutes no matter what. What have I missed?
This is deployed on the Java backend.
You can't keep a socket connected to APNS artificially open without sending actual push notifications. The only way to keep it open would be to send some arbitrary data/bytes, but that would result in an immediate closure of the socket: APNS closes the connection as soon as it detects something that does not conform to the protocol, i.e. something that is not an actual push notification.
SO_KEEPALIVE
What about SO_KEEPALIVE? App Engine explicitly says it is supported. I think that just means it won't throw an exception when you call Socket.setKeepAlive(true); calls that tried to set socket options used to raise Not Implemented exceptions. Even if you enable keep-alive, your socket will be reclaimed (closed) if you don't send something for more than 2 minutes, at least on App Engine as of now.
Actually, that's not a big surprise. RFC 1122, which specifies TCP keep-alive, explicitly states that keep-alives must not be sent more than once every two hours, and even then only if there was no other traffic. Although it also says that this interval must be configurable, there is no API on java.net.Socket you could use to configure it (most probably because it is highly OS dependent), and I doubt it is set to 2 minutes on App Engine.
SO_TIMEOUT
What about SO_TIMEOUT? It is for something completely different. The javadoc of Socket.setSoTimeout() states:
Enable/disable SO_TIMEOUT with the specified timeout, in milliseconds. With this option set to a non-zero timeout, a read() call on the InputStream associated with this Socket will block for only this amount of time. If the timeout expires, a java.net.SocketTimeoutException is raised, though the Socket is still valid. The option must be enabled prior to entering the blocking operation to have effect. The timeout must be > 0. A timeout of zero is interpreted as an infinite timeout.
That is, when read() is blocking for too long because there's nothing to read you can say "ok, I don't want to wait (block) anymore; let's do something else instead". It's not going to help with our "2 minutes" problem.
What then?
The only way to work around this problem is to detect when a connection has been reclaimed/closed, throw it away, and open a new connection. And there is a library which supports exactly that.
Check out java-apns-gae.
It's an open-source Java APNS library that was specifically designed to work (and be used) on Google App Engine.
https://github.com/ZsoltSafrany/java-apns-gae
Did you try getSoLinger()? That may be the getsockopt-style call that works (kind of) at the moment, and it may reset the 2-minute timeout. In theory, a zero-byte read would do so as well, but I'm not sure that it would; if you try that, use this method on the input stream:
public int read(byte b[], int off, int len)
If these suggestions don't work, please file an issue with the App Engine issue tracker.
There will be some other fixes coming, e.g. using socket options etc.
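If you want to experiment with the getSoLinger() suggestion above, a sketch could look like this (purely the untested workaround proposed in this thread; whether getSoLinger() actually counts as a socket operation that resets App Engine's 2-minute timer is not something this example can guarantee):

import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SocketToucher {
    /** Periodically performs a harmless socket-option call as a keep-alive "touch". */
    public static ScheduledExecutorService touchEvery(Socket socket, long seconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                socket.getSoLinger(); // the suggested "touch"; the returned value is ignored
            } catch (Exception e) {
                scheduler.shutdown(); // socket already reclaimed/closed: stop touching
            }
        }, seconds, seconds, TimeUnit.SECONDS);
        return scheduler;
    }
}

For example, SocketToucher.touchEvery(apnsSocket, 90) would touch the socket well inside the 2-minute window.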
Use getpeername().
From https://developers.google.com/appengine/docs/java/sockets/overview ...
Sockets may be reclaimed after 2 minutes of inactivity; any socket operation (e.g. getpeername) keeps the socket alive for a further 2 minutes. (Notice that you cannot Select between multiple available sockets because that requires java.nio.SocketChannel which is not currently supported.)
I am somewhat upset that this cannot be handled in an elegant way. After trying different solutions (this, this and several others) mentioned in answers to several SO questions, I still could not manage to detect a socket disconnection (caused by unplugging the cable).
I am using NIO non-blocking sockets; everything works perfectly except that I can find no way of detecting server disconnection.
I have the following code:
while (true) {
    handlePendingChanges();

    int selectedNum = selector.select(3000);
    if (selectedNum > 0) {
        SelectionKey key = null;
        try {
            Iterator<SelectionKey> keyIterator = selector.selectedKeys().iterator();
            while (keyIterator.hasNext()) {
                key = keyIterator.next();
                if (!key.isValid())
                    continue;

                System.out.println("key state: " + key.isReadable() + ", " + key.isWritable());
                if (key.isConnectable()) {
                    finishConnection(key);
                } else if (key.isReadable()) {
                    onRead(key);
                } else if (key.isWritable()) {
                    onWrite(key);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
            System.err.println("I am happy that I can catch some errors.");
        } finally {
            selector.selectedKeys().clear();
        }
    }
}
While the SocketChannels are being read, I unplug the cable, and Selector.select() starts spinning and returning 0. Now I have no chance to read or write the channels, because the main reading and writing code is guarded by if (selectedNum > 0). This is the first thing that confuses me: according to this answer, when the channel is broken, select() will return and the selection key for the channel will indicate readable/writable, but that is apparently not the case here; the keys are not selected, and select() still returns 0.
Also, from EJP's answer to a similar question:
If the peer closes the socket:
read() returns -1
readLine() returns null
readXXX() throws EOFException, for any other X.
That is not the case here either. I tried commenting out if (selectedNum > 0) and using selector.keys().iterator() to get all the keys regardless of whether they are selected; reading from those keys does not return -1 (0 is returned instead), and writing to those keys does not throw an EOFException. I noticed only one thing: even though the keys are not selected, key.isReadable() returns true while key.isWritable() returns false (I guess this might be because I did not register the keys for OP_WRITE).
My question is: why is the Java socket behaving like this, or is there something I did wrong?
You have discovered that you need timers and a heartbeat on a TCP connection.
If you unplug the network cable, the TCP connection may not be broken. If you have nothing to send, the TCP/IP stack has nothing to send; it doesn't know that a cable is gone somewhere, or that the peer PC suddenly burst into flames. That TCP connection could be considered open until you reboot your server years later.
Think of it this way: how can the TCP connection know that the other end has dropped off the network? It's off the network, so it can't tell you that fact.
Some systems can detect this if you unplug the cable going into your server, and some will not. If you unplug the cable at the other end of e.g. an Ethernet switch, that will not be detected.
That's why one always needs supervisor timers for a TCP connection (which e.g. send a heartbeat message to the peer, or close the connection when there has been no activity for a given amount of time).
One very cheap way to at least prevent TCP connections that you only read data from, and never write to, from staying up for years on end is to enable TCP keepalive on the socket; be aware that the default timeout for TCP keepalive is often 2 hours.
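As an illustration of the heartbeat idea (a sketch only: the one-byte HEARTBEAT opcode and the 5-second interval are invented for this example, not part of any real protocol): each side writes a heartbeat periodically, and a failed write, or a read timeout spanning several missed heartbeats, is treated as a dead connection.

import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

/** Periodically writes a heartbeat byte; a failed write marks the connection dead. */
public class Heartbeat implements Runnable {
    private static final byte HEARTBEAT = 0x00;   // made-up opcode for this sketch
    private static final long INTERVAL_MS = 5000; // send every 5 seconds

    private final Socket socket;

    public Heartbeat(Socket socket) {
        this.socket = socket;
    }

    @Override
    public void run() {
        try {
            OutputStream out = socket.getOutputStream();
            while (!socket.isClosed()) {
                out.write(HEARTBEAT);
                out.flush();            // push the byte towards the wire
                Thread.sleep(INTERVAL_MS);
            }
        } catch (IOException | InterruptedException e) {
            // broken pipe / reset / interrupt: tear the connection down
            try { socket.close(); } catch (IOException ignored) {}
        }
    }
}

The reading side would combine this with a read timeout of roughly three heartbeat intervals and treat a timeout as a lost peer. Note that a write may still appear to succeed for a while after the cable is pulled, because it only hands data to the local TCP buffers; the failure surfaces once retransmissions give up.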
Neither of those answers applies. The first one concerns the case where the connection is broken, and the second one (mine) concerns the case where the peer closes the connection.
In a TCP connection, unless data is being sent or received, there is in principle nothing about pulling a cable that should break a connection, as TCP is deliberately designed to be robust across this sort of thing, and there is certainly nothing about it that should look to the local application like the peer closing.
The only way to detect a broken connection in TCP is to attempt to send data across it, or to interpret a read timeout as a lost connection after a suitable interval, which is an application decision.
You can also set TCP keep-alive on to enable detection of broken connections, and in some systems you can even control the timeout per socket. Not via Java however, so you are stuck with the system default, which should be two hours unless it has been modified.
Your code should call keyIterator.remove() after calling keyIterator.next().
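That is, inside the loop shown in the question, the usual selection pattern is (handling logic elided):

Iterator<SelectionKey> keyIterator = selector.selectedKeys().iterator();
while (keyIterator.hasNext()) {
    SelectionKey key = keyIterator.next();
    keyIterator.remove(); // the selector never removes keys from the selected set itself
    if (!key.isValid())
        continue;
    // ... handle connect / read / write as before ...
}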
We have a simple client/server architecture between our mobile device and our server, both written in Java: an extremely simple ServerSocket and Socket implementation. However, one problem is that when the client terminates abruptly (without closing the socket properly), the server does not know that it has disconnected. Furthermore, the server can continue to write to this socket without getting any exceptions. Why?
According to the documentation, Java sockets should throw exceptions if you try to write to a socket that is no longer reachable on the other end!
The connection will eventually be timed out by the retransmission timeout (RTO). However, the RTO is calculated using a complicated algorithm based on the measured network latency (RTT); see this RFC:
http://www.ietf.org/rfc/rfc2988.txt
So on a mobile network, this can take minutes. Wait 10 minutes to see whether you get a timeout.
The solution to this kind of problem is to add a heartbeat to your own application protocol and tear down the connection when you don't get an ACK for the heartbeat.
The key phrase here is "without closing the socket properly".
Sockets should always be acquired and disposed of in this way:
final Socket socket = ...; // connect code
try
{
    use( socket ); // use socket
}
finally
{
    socket.close( ); // dispose
}
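On Java 7 and later, the same acquire/dispose pattern can be written with try-with-resources, since Socket implements Closeable:

try ( final Socket socket = ... ) // connect code, as above
{
    use( socket ); // use socket
} // socket.close() is called automatically, even if use() throws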
Even with these precautions, you should specify application-level timeouts specific to your protocol.
My experience has shown that, unfortunately, you cannot rely on any of the Socket timeout functionality (e.g. there is no timeout for write operations, and even read operations may sometimes hang forever).
That's why you need a watchdog thread that enforces your application timeouts and disposes of sockets that have been unresponsive for a while.
One convenient way of doing this is to initialize the Socket and ServerSocket through the corresponding channels in java.nio. The main advantage of such sockets is that they are interruptible; that way you can simply interrupt the thread that runs the socket protocol and be sure that the socket is properly disposed of.
Notice that you should enforce application timeouts on both sides, as it is only a matter of time and bad luck before you experience unresponsive sockets.
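A hedged sketch of that approach (the 30-second deadline, host, and port are invented for the example; a real watchdog would typically track inactivity rather than a fixed overall deadline): the I/O runs on its own thread over a SocketChannel, and the watchdog interrupts that thread if it has not finished in time. Interrupting a thread that is blocked on an interruptible channel closes the channel and raises ClosedByInterruptException, so the socket is reliably disposed of.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.SocketChannel;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class WatchdogExample {
    public static void main(String[] args) throws Exception {
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();

        Thread worker = new Thread(() -> {
            // Blocking I/O over a channel-backed socket: interruptible.
            try (SocketChannel channel = SocketChannel.open(
                    new InetSocketAddress("server.example", 9000))) { // hypothetical peer
                ByteBuffer buffer = ByteBuffer.allocate(4096);
                while (channel.read(buffer) != -1) {
                    buffer.clear(); // this sketch just discards whatever arrives
                }
            } catch (ClosedByInterruptException e) {
                System.err.println("application timeout: channel closed by watchdog");
            } catch (IOException e) {
                System.err.println("I/O error: " + e);
            }
        });
        worker.start();

        // If the worker is still blocked after 30 seconds, interrupt it; that
        // closes the underlying channel and unblocks the read.
        watchdog.schedule(worker::interrupt, 30, TimeUnit.SECONDS);

        worker.join();
        watchdog.shutdownNow();
    }
}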
TCP/IP communications can be very strange. TCP will retry for quite a while at the bottom layers of the stack without ever letting the upper layers know that anything happened.
I would fully expect that after some time period (30 seconds to a few minutes) you should see an error, but I haven't tested this; I'm just going off how TCP applications tend to work.
You might be able to tighten the TCP parameters (retries, timeouts, etc.), but again, I haven't messed with that much.
Also, it may be that I'm totally wrong and the implementation of Java you are using is just flaky.
To answer the first part of the question (about not knowing that the client has disconnected abruptly), in TCP, you can't know whether a connection has ended until you try to use it.
The notion of guaranteed delivery in TCP is quite subtle: delivery isn't actually guaranteed to the application at the other end (it depends on what guaranteed means really). Section 2.6 of RFC 793 (TCP) gives more details on this topic. This thread on the Restlet-discuss list and this thread on the Linux kernel list might also be of interest.
For the second part (not detecting when you write to this socket), this is probably a question of buffer and timeout (as others have already suggested).
I am facing the same problem.
I think when you register the socket with a selector it doesn't throw any exception.
Are you using a selector with your socket?
If I am only WRITING to a socket on an output stream, will it ever block? Only reads can block, right? Someone told me writes can block but I only see a timeout feature for the read method of a socket - Socket.setSoTimeout().
It doesn't make sense to me that a write could block.
A write on a Socket can block too, especially if it is a TCP Socket. The OS will only buffer a certain amount of untransmitted (or transmitted but unacknowledged) data. If you write stuff faster than the remote app is able to read it, the socket will eventually back up and your write calls will block.
It doesn't make sense to me that a write could block.
An OS kernel is unable to provide an unlimited amount of memory for buffering unsent or unacknowledged data. Blocking in write is the simplest way to deal with that.
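A small, self-contained demonstration of this over the loopback interface (the 64 KB chunk size is an arbitrary choice for the sketch): the accepted connection is never read from, so the writer fills the send and receive buffers and then blocks.

import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class WriteBlockDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {          // any free port
            Socket client = new Socket("localhost", server.getLocalPort());
            Socket accepted = server.accept();
            System.out.println("accepted " + accepted.getRemoteSocketAddress()
                    + " but will never read from it");

            OutputStream out = client.getOutputStream();
            byte[] chunk = new byte[64 * 1024];
            long total = 0;
            while (true) {
                out.write(chunk); // blocks once the send/receive buffers are full
                total += chunk.length;
                System.out.println("written so far: " + total + " bytes");
            }
        }
    }
}

On a typical machine this prints progress for somewhere between a few hundred kilobytes and a few megabytes (depending on the socket buffer sizes) and then stops: that last write() call is blocked.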
Responding to these followup questions:
So is there a mechanism to set a timeout for this? I'm not sure what behavior it'd have... maybe throw away data if buffers are full? Or possibly delete older data in the buffer?
There is no mechanism to set a write timeout on a java.net.Socket. There is a Socket.setSoTimeout() method, but it affects accept() and read() calls ... and not write() calls. Apparently, you can get write timeouts if you use NIO, non-blocking mode, and a Selector, but this is not as useful as you might imagine.
A properly implemented TCP stack does not discard buffered data unless the connection is closed. However, when you get a write timeout, it is uncertain whether the data that is currently in the OS-level buffers has been received by the other end ... or not. The other problem is that you don't know how much of the data from your last write was actually transferred to OS-level TCP stack buffers. Absent some application level protocol for resyncing the stream*, the only safe thing to do after a timeout on write is to shut down the connection.
By contrast, if you use a UDP socket, write() calls won't block for any significant length of time. But the downside is that if there are network problems or the remote application is not keeping up, messages will be dropped on the floor with no notification to either end. In addition, you may find that messages are sometimes delivered to the remote application out of order. It will be up to you (the developer) to deal with these issues.
* It is theoretically possible to do this, but for most applications it makes no sense to implement an additional resyncing mechanism on top of an already reliable (to a point) TCP/IP stream. And if it did make sense, you would also need to deal with the possibility that the connection closed ... so it would be simpler to assume it closed.
The only way to do this is to use NIO and selectors.
See the writeup from the Sun/Oracle engineer in this bug report:
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4031100
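A hedged sketch of that technique (this is not the code from the bug report; the stall-based deadline is just one possible policy): put the channel in non-blocking mode, and whenever write() accepts zero bytes, wait on a Selector for OP_WRITE with a timeout, giving up if nothing becomes writable in time.

import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class TimedWriter {
    /** Writes the whole buffer, or throws SocketTimeoutException if progress stalls too long. */
    public static void writeWithTimeout(SocketChannel channel, ByteBuffer data, long timeoutMillis)
            throws IOException {
        channel.configureBlocking(false);
        try (Selector selector = Selector.open()) {
            SelectionKey key = channel.register(selector, SelectionKey.OP_WRITE);
            while (data.hasRemaining()) {
                int written = channel.write(data); // may write 0 bytes when buffers are full
                if (written > 0) {
                    continue; // made progress, try again immediately
                }
                // No progress: wait until the socket becomes writable, but not forever.
                if (selector.select(timeoutMillis) == 0) {
                    throw new SocketTimeoutException("write stalled for " + timeoutMillis + " ms");
                }
                selector.selectedKeys().clear();
            }
            key.cancel();
        }
    }
}

Note that this timeout bounds each stall rather than the whole buffer, and, as explained above, after such a timeout you generally cannot know how much of the data reached the peer, so closing the connection is usually the only safe follow-up.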