Meaning of "java.io.IOException: Connection timed out" after connect phase - java

Could be related: Difference between Connection timed out and Read timed out
I have written a Java server application using NIO.
I connected a client to my server application and unplugged the client's network cable. On the server side, I didn't get an exception immediately, but after some time (8 minutes or so) I got an "IOException: Connection timed out".
Here is a partial stack trace:
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
at sun.nio.ch.IOUtil.read(IOUtil.java:198)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
........
Until then, the netstat output showed the socket state of this particular client connection as ESTABLISHED.
Questions are:
Is this timeout configurable?
Why does the netstat output show the socket state as ESTABLISHED? Ideally it should be CLOSE_WAIT (as the client got disconnected)

No, it is not configurable. It is the result of retransmit timeouts. It wouldn't happen at all unless the application kept writing, or had pending writes when the disconnect happened.
It shouldn't be CLOSE_WAIT, as no FIN had been received. Ergo it should be ESTABLISHED.

That timeout is generally not configurable, as it depends on what the operating system offers. Unix in general does not allow a process to set the connection timeout, and it is usually fixed at around two minutes. Some versions of Linux/BSD systems may allow it to be configured, but that is not portable and is normally reserved for the administrator rather than the ordinary user. The timeout is determined by the number of retransmissions and the timeout used for each attempt, and is under the exclusive control of the TCP implementation.
When you close a connection you pass through two states (FIN_WAIT and TIME_WAIT) that are not timeout states. The first waits for the other end's response (you can close your side of the connection, telling the other side you are not going to send more data, but you have to wait for the other end to do the same thing). TIME_WAIT is a special state that the kernel maintains for a closed connection so it can process (and discard) any retransmissions of the last segments that may still be in flight after the connection is closed. Neither has anything to do with timeouts.
A TCP connection has no implicit timeout. Two machines can go weeks without exchanging any data if they have nothing to transmit. You can enable a kind of heartbeat on idle connections to check their liveness with one socket option (SO_KEEPALIVE). This option makes the TCP stack exchange empty packets with the peer to find out whether the other side is still alive. Again, you can only control whether these packets are used, not their frequency or the number of lost probes that closes the connection (this can be configured in Linux, but only by changing kernel settings as administrator).
Note 1 (answer to @Krishna Chaitanya P)
If you unplugged the cable and got an exception some time later, there are two possible reasons:
You continued writing to that connection and the send buffer filled up without being acknowledged in time (this is rare, as normally your process gets blocked in the write(2) system call when this happens), and some timeout (in the Java socket implementation) occurred.
Your Java TCP socket implementation uses the SO_KEEPALIVE option (the most probable cause). As I said before, you have boolean control over whether it is used, but you cannot adjust the time between keepalives or the number of missed ones that drops your connection. Try calling the getKeepAlive()/setKeepAlive(boolean) methods on the Socket class to control this feature. I have not seen in the documentation whether a connected socket has keepalive enabled by default. This is a very commonly used option in servers, as it lets them disconnect clients that lose their connection without telling the server.
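The getKeepAlive()/setKeepAlive(boolean) calls mentioned above can be tried out with a small sketch like the following. The class and method names (KeepAliveDemo, toggleKeepAlive) are made up for illustration, and the loopback connection only stands in for a real client. Printing the "before" value shows what your own JVM uses as the default. JDK 11 and later also expose TCP_KEEPIDLE, TCP_KEEPINTERVAL and TCP_KEEPCOUNT through jdk.net.ExtendedSocketOptions on some platforms; this sketch does not rely on them.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not taken from the question's code.
class KeepAliveDemo {
    // Opens a loopback connection, turns SO_KEEPALIVE on for the client
    // socket, and returns the flag as seen before and after the change.
    static boolean[] toggleKeepAlive() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            boolean before = client.getKeepAlive();
            client.setKeepAlive(true); // only on/off; probe timing stays with the OS
            return new boolean[] { before, client.getKeepAlive() };
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] flags = toggleKeepAlive();
        System.out.println("keepalive before=" + flags[0] + ", after=" + flags[1]);
    }
}
```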

In my experience, the cause of this exception on a connected socket was always a firewall closing connections that had been idle for too long. I've seen it happen in cloud environments (AWS, Rackspace) in particular, but it's not limited to those. Most likely, you have some kind of firewall between the two peers, which closes idle connections after some time.
The best fix in an ideal world is to change the firewall configuration, provided you or an operations team has access to it. In any case, it's better if you can handle that use case in your code and gracefully terminate the communication with the other peer.

Because CLOSE_WAIT is the state a socket enters after it has received a FIN from the peer, and that is not the case here.
This timeout is most probably configurable.

Related

Apache Mina, How to detect when you're sending messages using an invalid socket to the client side?

I have a server setup using MINA version 2.
I don't have much experience with sockets and tcp.
The problem is if I make a connection to my server, and then unplug my internet and close the connection, (Server doesn't get notification of the connection being closed) the server will forever think that my connection is still active and valid.
The server will continue to send messages to my connection, and doesn't throw any exceptions even though there is nothing on my computer bound to the local port.
How can I test that the connection still exists?
I've tried running MINA logging in debug mode, and logging IoSession.isConnected(), IoSession.isActive() and IoSession.isClosing().
They always return true, true, false. Also, in debug mode, there was no useful information stating that the connection was lost. It just logged the regular "sent message" stuff, as if there was nothing wrong.
From using Flash actionscript, I have had experiences where flash will throw errors that it's operating on an invalid socket. That leads me to believe that it's saying the socket on the server is no longer valid for the connection. So in other words if flash can detect invalid sockets, a Java server should be able to detect it too correct?
If there is truly no way to detect dead connections, I can always set up a connection keep-alive routine where the client constantly sends an "I'm here" message to the server, and the server closes sessions that haven't had an incoming message for a period of seconds.
EDIT: After learning that "sockets" are private and never shared over the network I managed to find better results for my issue and I found this SO thread.
Java socket API: How to tell if a connection has been closed?
Unfortunately, IOException 'Connection reset by peer' doesn't occur when I write to the IoSession in MINA.
Edit:
Is there any way at all in Java to detect when an ACK to a TCP packet was not received after sending a packet? An ACK Timeout?
Edit:
Yet apparently, my computer should send a RST to the server, according to this answer: https://stackoverflow.com/a/1434592/4425643
But that seems like a bad way of port scanning. Is this how port scanning works? Port scanners send data to a port and the victim's service responds with a RST? Sorry I think I need a new question for all this. But it's odd that MINA doesn't throw connection reset by peer when it sends data. So then my computer doesn't send a RST.
The concept of socket or connection in Internet protocols is an illusion. It's a convenient abstraction that is provided to you by the operating system and the TCP stack, but in reality, it's all fake.
Under the hood, everything on the Internet takes the form of individual packets.
From the perspective of a computer sending packets to another computer, there is no built-in way to know whether that computer is actually receiving the packets, unless that computer (or some other computer in between, like a router) tells you that the packets were, or were not, received.
From the perspective of a computer expecting to receive packets from another computer, there is no way to know in advance whether any packets are coming, will ever come, or in what order -- until they actually arrive. And once they arrive, just the fact that you received one packet does not mean you'll receive any more in the future.
That's why I say connections or sockets are an illusion. The way that the operating system determines whether a connection is "alive" or not, is simply by waiting an arbitrary amount of time. After that amount of time -- called a timeout -- if one side of the TCP connection doesn't hear back from the other side, it will just assume that the other end has been disconnected, and arbitrarily set the connection status to "closed", "dead" or "terminated" ("timed out").
So:
Your server has no clue that you've pulled the plug on your Internet connection. It has no way of knowing that.
Your server's TCP stack has been configured a certain way to wait an arbitrary amount of time before "giving up" on the other end if no response is received. If this timeout is set to a very large period of time, it may appear to you that your server is hanging on to connections that are no longer valid. If this bothers you, you should look into ways to decrease the timeout interval.
Analogy: If you are on a phone call with someone, and there's a very real risk of them being hurt or killed, and you are talking to them and getting them to answer, and then the phone suddenly goes dead..... Well, how long do you wait? At what point do you assume the other person has been hurt or killed? If you wait a couple milliseconds, in most cases that's too short of a "timeout", because the other person could just be listening and thinking of how to respond. If you wait for 50 years, the person might be long dead by then. So you have to set a reasonable timeout value that makes sense.
What you want is a KeepAlive, heartbeat, or ping.
As per @allquicatic's answer, there's no completely reliable built-in method to do this in TCP. You'll have to implement a method to explicitly ask the client "Are you still there?" and await an answer for a specified amount of time.
https://en.wikipedia.org/wiki/Keepalive
A keepalive (KA) is a message sent by one device to another to check that the link between the two is operating, or to prevent this link from being broken.
https://en.wikipedia.org/wiki/Heartbeat_(computing)
In computer science, a heartbeat is a periodic signal generated by hardware or software to indicate normal operation or to synchronize other parts of a system. Usually a heartbeat is sent between machines at a regular interval in the order of seconds. If a heartbeat isn't received for a time (usually a few heartbeat intervals), the machine that should have sent the heartbeat is assumed to have failed.
The easiest way to implement one is to periodically send an arbitrary piece of data, e.g. a null command. A properly programmed TCP stack will time out if an ACK is not received within its specified timeout period, and then you'll get an IOException 'Connection reset by peer'.
You may have to manually tune the TCP parameters, or implement your own functionality if you want more fine-grained control than the default timeout.
The TCP framework is not exposed to Java. And Java does not provide a means to edit TCP configuration that exists on the OS level.
This means we cannot use TCP keep alive in Java efficiently because we can't change its default configuration values. Furthermore we can't set the timeout for not receiving an ACK for a message sent. (Learn about TCP to discover that every message sent will wait for an ACK (acknowledgement) from the peer that the message has been successfully delivered.)
Java can only throw exceptions for cases such as a timeout for not completing the TCP handshake in a custom amount of time, a 'Connection Reset by Peer' exception when a RST is received from the peer, and an exception for an ACK timeout after whatever period of time that may be.
To dependably track connection status, you must implement your own Ping/Pong, Keep Alive, or Heartbeat system as @Dog suggested in his answer. (The server must poll the client to see if it's still there, or the client has to continuously let the server know it's still there.)
For example, configure your client to send a small packet every 10 seconds.
In MINA, you can set a session reader idle timeout, which will send an event when a session reader has been idle for a period of time. You can terminate that connection on delivery of this event. Setting the reader timeout to be a bit longer than the small packet interval will account for random high latency between the client and server. For example, a reader idle timeout of 15 seconds would be lenient in this case.
If your server will rarely experience session idling, and you think you can save bandwidth by polling the client when the session has gone idle, look into using the Apache MINA Keep Alive Filter.
https://mina.apache.org/mina-project/apidocs/org/apache/mina/filter/keepalive/KeepAliveFilter.html
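The 10-second heartbeat with a 15-second reader idle limit described above can be sketched in plain Java, independent of MINA. ReaderIdleTracker is a hypothetical helper, not a MINA class; in a real server you would call onRead whenever bytes arrive and check isIdle from a periodic task. Timestamps are passed in explicitly so the logic can be exercised with a simulated clock.

```java
import java.time.Duration;

// Hypothetical helper, not part of MINA: tracks when data last arrived and
// reports whether the peer has been silent for longer than the idle limit.
class ReaderIdleTracker {
    private final long idleLimitNanos;
    private long lastReadNanos;

    ReaderIdleTracker(Duration idleLimit, long nowNanos) {
        this.idleLimitNanos = idleLimit.toNanos();
        this.lastReadNanos = nowNanos;
    }

    // Call whenever any bytes (payload or heartbeat) arrive from the peer.
    void onRead(long nowNanos) {
        lastReadNanos = nowNanos;
    }

    // True once the peer has been silent for longer than the idle limit.
    boolean isIdle(long nowNanos) {
        return nowNanos - lastReadNanos > idleLimitNanos;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        ReaderIdleTracker tracker = new ReaderIdleTracker(Duration.ofSeconds(15), start);
        // Simulated clock: one heartbeat arrives at 10 s, then nothing.
        tracker.onRead(start + Duration.ofSeconds(10).toNanos());
        System.out.println(tracker.isIdle(start + Duration.ofSeconds(20).toNanos())); // false
        System.out.println(tracker.isIdle(start + Duration.ofSeconds(26).toNanos())); // true
    }
}
```

When isIdle returns true, the server would close the session, which is roughly what the idle event does for you in MINA.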

How to identify a broken socket connection in Java immediately?

I have a typical java client and a server. The client sends some request to the server and waits for the response. The client reads up to say 100 bytes of data from the contained input stream into an array of bytes. It waits for the complete response of 100 bytes to be read within a specified timeout period of say 3 secs. The problem here is to identify if the server went down or crashed while/before writing the response. Basically, we need to identify if the socket was broken or the peer disconnected for some reason. Is there a way to identify this?
How to identify a broken socket connection in Java immediately?
You can't detect it immediately, in Java or any other language. TCP/IP doesn't know, so Java can't know. The only sure way to detect a broken TCP connection is by writing to it and catching IOExceptions, and they won't happen immediately.
The best way to identify that the connection is down is to time out the connection, i.e. you expect a response in a given amount of time and flag it if that response does not come as expected.
When you have a graceful disconnection (e.g. the other end calls close()), the read on the connection will let you know once the buffer has been drained.
However, if there is some other type of failure, you might not be notified until the OS times out the connection (e.g. after 3 minutes), and indeed you may want to keep the connection. E.g. if you pull the network cable out for 10 seconds and put it back in, that doesn't need to be a failure.
EDIT: I don't believe it's a good idea to be too aggressive in automatically handling connection/service "failures". This is usually better handled by a planned fix to the system, based on investigation of the true cause, e.g. increased bandwidth, redundant connectivity, faster servers, code fixes.
If the connection is broken abnormally, you will receive an IOException when reading; that normally happens quite fast, but there are no guarantees about timing: it all depends on the OS, network hardware, etc. If the remote end gracefully closes the socket, you'll read -1 as the next byte.
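The graceful-close case can be seen in a minimal loopback sketch. EndOfStreamDemo and readAfterPeerClose are illustrative names, and the "server" here does nothing but accept and close.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class: shows what a read returns after a graceful close.
class EndOfStreamDemo {
    // The peer accepts the connection and closes it straight away; the
    // client's next read then reports end-of-stream instead of throwing.
    static int readAfterPeerClose() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            server.accept().close(); // graceful close: the peer sends a FIN
            return client.getInputStream().read();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read() returned " + readAfterPeerClose());
    }
}
```

An abnormal break (cable pulled, host powered off) produces no FIN, so the same read would simply keep blocking unless a timeout is set.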
Assuming everything else works, if the remote peer - the TCP server - was killed then the TCP client will normally receive a TCP RST (reset) and you'll get an IOException in your client application.
However, there are lots of other things that can go wrong besides a process being killed. Basically anything on the network path between the two processes: a cable is yanked, a router dies, a firewall dies, etc. All of this will not immediately be detected.
For the above reasons the general rule is - as pointed out in the answer from EJP - that a broken connection can only be detected by writing to it. This is why it is always recommended that a TCP client and TCP server exchange some type of heartbeat messages at regular intervals. There are different ways to do this. I like best the method where the TCP client will - in the absence of data being received from the TCP server - send a heartbeat message to the server and expect a reply back within a certain time period. This way heartbeat messages will only be sent when really needed.
A sub-optimal approach - if you cannot implement true heartbeating - is to always read with a timeout. Set the timeout on the socket and then catch java.net.SocketTimeoutException. This will allow you to know that no data has been received on socket during x milliseconds.
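Reading with a timeout might look like the sketch below. The names ReadTimeoutDemo and readTimesOut are invented for the example, the 200 ms value is arbitrary, and the loopback peer is deliberately silent so that the timeout fires.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class: a blocking read that gives up after a timeout.
class ReadTimeoutDemo {
    // Connects to a loopback peer that never sends anything and reports
    // whether the read ended with SocketTimeoutException.
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket silentPeer = server.accept()) {
            client.setSoTimeout(timeoutMillis); // applies to blocking reads on this socket
            InputStream in = client.getInputStream();
            try {
                in.read();
                return false; // data or end-of-stream arrived in time
            } catch (SocketTimeoutException e) {
                return true;  // nothing received in time; the socket itself is still open
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

A timeout only tells you that nothing arrived; the application still has to decide whether to retry, send a heartbeat, or close the socket.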
It should be mentioned that there's one scenario where you need neither heartbeating nor the socket timeout: if the TCP client and the TCP server communicate over a loopback interface, then a broken connection will always be propagated to both the TCP client application and the TCP server application. This is because, in this case, there's really no network infrastructure between the two processes. So if you have an existing application which isn't well-designed with respect to its TCP communication (i.e. it doesn't implement some form of heartbeating or at least reading with a timeout), then as a last resort you may 'fix' the problem by moving the two applications onto the same host and letting them communicate over the loopback interface.

Why am I getting a SocketException in a long running application?

I have written a Java socket server application which gives me an error if I run it for a long time, say 4-8 hours. Below is the error I get:
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:130)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:282)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:324)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:176)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:153)
at java.io.BufferedReader.readLine(BufferedReader.java:316)
at java.io.BufferedReader.readLine(BufferedReader.java:379)
at LiveRate.processData(LiveRate.java:224)
at LiveRate.mainLiveRate(LiveRate.java:265)
at LiveRate.liveRate(LiveRate.java:126)
at LiveRate.run(LiveRate.java:119)
at java.lang.Thread.run(Thread.java:636)
My socket application reads some values from another TCP/IP server, stores them temporarily, and offers them to other clients. I'm not sure if these errors are caused by heavy load on the system or by memory issues. Please help.
It is probably neither (directly) load or memory related. Instead, it is more likely to be one of the following:
the remote service is shut down / falls over and is restarted on a regular basis,
the remote service has decided to close its end of the connection because it is "idle",
network connectivity is intermittent and you are occasionally encountering an outage or congestion-induced "brownout" that is too long,
you are using NAT or similar, and the port number that was being used for the connection has been reclaimed by the NAT gateway, or
something is enforcing some policy about TCP/IP connections being open for too long.
The bottom line is that your client software needs to be able to cope with lost connections if you want it to run for extended periods of time. This is the way that the internet works.
I'd say it's because your connection gets reset by your Internet provider every 24 hours.

Socket open and close on 1 sec or to hold open

I need to communicate periodically with a PLC (every 1 sec): I send a message and I receive a message. I use the Socket class for this communication. Do I need to open a connection every second (socket = new Socket(ipaddress, port)), send the message, then call socket.close(), and so on, or should I hold the socket open all the time?
I'll assume you're talking about TCP sockets here...
Apart from the obvious inefficiencies involved in setting up a TCP connection every second you're also likely to end up accumulating sockets in TIME_WAIT (hopefully on your client).
I've written about TIME_WAIT and the problems it causes with regards to server scalability and stability here on my blog: http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html
Given the rate that you are opening and closing sockets (once a second would result in 240 (60*4) sockets sitting in TIME_WAIT in the normal (4 minute) 2MSL TIME_WAIT period) this shouldn't prove too much of a problem assuming the TIME_WAIT sockets ARE ending up on the client and not on the server and assuming you're not connecting to lots of servers every second, but... If you have many clients connecting to your server every second and you are not making sure that your server doesn't accumulate sockets in TIME_WAIT state then you may limit your server's scalability.
The alternative is to hold the socket connection open and only reopen it if and when it gets disrupted. This may prove slightly more complex for you to initially program but pooling the connection in this way is likely to be considerably more efficient (when you DO need to send data you just send the data and don't need to go through the TCP handshake to set the connection up) and much more resource efficient on the client; you're not perpetually holding 240 sockets in TIME_WAIT...
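Holding the socket open and reopening it only after a failure could be sketched as follows. PersistentConnection, its send method and connectCount are hypothetical names for this example, and the retry policy (one reconnect, then give up) is an assumption rather than a recommendation. A failed write may only surface some time after the link actually broke.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical wrapper: keeps one socket open and reopens it only after a failure.
class PersistentConnection implements AutoCloseable {
    private final InetAddress host;
    private final int port;
    private Socket socket;
    private int connectCount;

    PersistentConnection(InetAddress host, int port) {
        this.host = host;
        this.port = port;
    }

    // Sends one message on the existing socket; if that fails, drops the
    // socket, reconnects once and retries. A second failure propagates.
    synchronized void send(byte[] message) throws IOException {
        try {
            write(message);
        } catch (IOException e) {
            close();
            write(message);
        }
    }

    // Connects lazily, so the TCP handshake happens only when needed.
    private void write(byte[] message) throws IOException {
        if (socket == null || socket.isClosed()) {
            socket = new Socket(host, port);
            connectCount++;
        }
        OutputStream out = socket.getOutputStream();
        out.write(message);
        out.flush();
    }

    // Number of times a new TCP connection had to be opened.
    synchronized int connectCount() {
        return connectCount;
    }

    @Override
    public synchronized void close() {
        if (socket != null) {
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful to do if closing fails
            }
            socket = null;
        }
    }

    public static void main(String[] args) throws IOException {
        // A loopback listener stands in for the real device.
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             PersistentConnection conn =
                     new PersistentConnection(server.getInetAddress(), server.getLocalPort())) {
            for (int i = 0; i < 3; i++) {
                conn.send(new byte[] { (byte) i });
            }
            System.out.println("connections opened: " + conn.connectCount());
        }
    }
}
```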
Keeping the socket always connected will reduce network traffic and computation time of the client. However, if the server uses blocking I/O it may run out of connection threads if many clients are remaining connected. You will also have to deal with dropped connections due to timeout, network issues and server downtime.
You can have perpetually connected clients, where the client is always connected to the server. But the performance of this approach depends on how the server is implemented. If it uses a threaded model (one thread per client connection), you might find yourself running out of resources when handling a lot of client connections. You should be good to go with the perpetual client approach if your server uses an event-based approach for handling requests, as long as the "computation" isn't long-lived.
As always, benchmark based on your use cases and you should be good to go.

Detecting TCP dropout over an unreliable network

I am doing some experimentation over an unreliable radio network (home brewed) using very rudimentary java socket programming to transfer messages back and forth between the end nodes.
The setup is as follows:
Node A --- Relay Node --- Node B
One problem I am constantly running into is that somehow the connection drops out and neither Node A nor B knows that the link is dead, and both continue to transmit data. The TCP connection does not time out either. I have added a heartbeat message that causes a timeout after a while, but I would still like to know the underlying reason why TCP does not time out.
Here are the options I am enabling when setting up a socket:
channel.socket().setKeepAlive(false);
channel.socket().setTrafficClass(0x08); // for max throughput
This behavior is strange since it is totally different from what I see on a wired network. On a wired network, I can simulate a broken connection by pulling out the ethernet cord; however, once I plug the cord back in, the connection becomes reestablished and messages begin to be passed through once more.
On the radio network, the connection is never reestablished and once it silently dies, the messages never resume.
Is there some other Java implementation or socket setting that I can use? Also, why am I seeing this behavior in the first place?
And yes, before anyone says anything, I know TCP is not the preferred choice over an unreliable network, but in this case I wanted to ensure no packet loss.
The TCP protocol was designed to be quiet. The RFC requires that keepalive probes be sent no more often than every 2 hours by default. Unless you have control over the systems on both ends to change the default 2-hour interval (sometimes this requires a kernel rebuild), you have to add a heartbeat in your own app.
If you send a heartbeat, you still need to wait for the retransmit timeout, which varies depending on the RTT. On a high-latency network, the timeout can be very high, but it should be within minutes.
You get notification on local network because the system can detect link-down status and drop all connections on that network.
BTW, you want to set keepalive to TRUE instead of false. With keepalive, you at least get the slow heartbeat.
In the OSI 7-layer model, the first two layers are physical and data link. Your physical hardware running the data link protocol on wired ethernet can detect when the cable is pulled. Your wireless hardware, and corresponding protocol, probably not so much. The TCP stack can't do anything to time out if the layer 1/2 stuff isn't signaling that it is disconnected.
Define 'never'?
I expect you will be notified by a send failing eventually. You're probably just expecting to be notified sooner than you will be. The TCP stack will be retransmitting segments that it doesn't get ACKs for and the timeout before retransmission for each attempt is doubled each time it retransmits. Depending on how the stack is working out when to retransmit it's probably going to be longer than you're expecting before the stack will decide that the connection is broken and only then will it let you know.
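To get a feel for how long the doubling described above can take, here is a small arithmetic sketch. RetransmitBackoff and totalWaitMillis are illustrative names, and the parameters (a 200 ms initial timeout, a 120 s cap, 16 waits) are assumptions chosen for the example. Real stacks derive the initial timeout from the measured round-trip time and apply their own limits.

```java
// Hypothetical helper: sums the waits of a capped exponential backoff.
class RetransmitBackoff {
    // Adds up `attempts` waits, starting at initialRtoMillis and doubling
    // after each one, never exceeding maxRtoMillis per wait.
    static long totalWaitMillis(long initialRtoMillis, long maxRtoMillis, int attempts) {
        long total = 0;
        long rto = initialRtoMillis;
        for (int i = 0; i < attempts; i++) {
            total += rto;
            rto = Math.min(rto * 2, maxRtoMillis);
        }
        return total;
    }

    public static void main(String[] args) {
        // 1 + 2 + 4 + 8 + 16 seconds
        System.out.println(totalWaitMillis(1_000, 60_000, 5));    // 31000
        // Ten doubling waits from 200 ms, then six waits capped at 120 s
        System.out.println(totalWaitMillis(200, 120_000, 16));    // 924600
    }
}
```

With the second set of numbers the sender would wait about 15 minutes before reporting the failure, which gives an idea of why the notification can arrive far later than expected.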
See here: http://www.ietf.org/rfc/rfc2988.txt, here: http://msdn.microsoft.com/en-us/library/ms819737.aspx, etc.
You're used to having a wired network where the drivers can notify higher level layers that the connection has been physically broken. If you were to configure a wired network to route via a router which you then deliberately set up to not route correctly then you'd probably see similar behaviour....
