Why am I getting a SocketException in a long running application?

I have written a Java socket server application that throws an error if I run it for a long time, say 4-8 hours. Below is the error I get:
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:130)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:282)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:324)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:176)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:153)
at java.io.BufferedReader.readLine(BufferedReader.java:316)
at java.io.BufferedReader.readLine(BufferedReader.java:379)
at LiveRate.processData(LiveRate.java:224)
at LiveRate.mainLiveRate(LiveRate.java:265)
at LiveRate.liveRate(LiveRate.java:126)
at LiveRate.run(LiveRate.java:119)
at java.lang.Thread.run(Thread.java:636)
My socket application reads values from another TCP/IP server, stores them temporarily, and offers them to other clients. I am not sure whether these errors are caused by heavy load on the system or by memory issues. Please help.

It is probably neither (directly) load nor memory related. Instead, it is more likely to be one of the following:
the remote service is shut down / falls over and is restarted on a regular basis,
the remote service has decided to close its end of the connection because it is "idle",
network connectivity is intermittent and you are occasionally encountering an outage or congestion-induced "brownout" that is too long,
you are using NAT or similar, and the port number that was being used for the connection has been reclaimed by the NAT gateway, or
something is enforcing some policy about TCP/IP connections being open for too long.
The bottom line is that your client software needs to be able to cope with lost connections if you want it to run for extended periods of time. This is the way that the internet works.
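As a rough illustration of "coping with lost connections", here is a minimal sketch of a line-oriented reader that reconnects after any failure. The class name ReconnectingReader, the 5-second connect timeout, the doubling delay, and the 60-second cap are all assumptions made for this example, not anything from the question's LiveRate code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical helper: keeps a line-oriented feed alive across dropped connections.
class ReconnectingReader {

    // Opens one connection and reads lines until the peer closes it (readLine() returns null).
    // A "Connection reset" surfaces here as an IOException thrown by readLine().
    static void readOnce(String host, int port, Consumer<String> handler) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 5_000); // connect timeout is an assumed value
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            String line;
            while ((line = in.readLine()) != null) {
                handler.accept(line);
            }
        }
    }

    // Makes up to maxAttempts connections, sleeping between them with a doubling delay.
    static void run(String host, int port, Consumer<String> handler,
                    int maxAttempts, long initialDelayMs) throws InterruptedException {
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                readOnce(host, port, handler);
            } catch (IOException e) {
                // Lost connection: log it and fall through to the reconnect delay.
                System.err.println("Connection lost: " + e.getMessage());
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delay);
                delay = Math.min(delay * 2, 60_000); // cap is an assumed value
            }
        }
    }
}
```

A long-running service would typically pass a large maxAttempts and re-publish the last known values to its own clients while the upstream connection is being re-established.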

I'd say it's because your connection gets reset by your Internet provider every 24 hours.

Related

Meaning of "java.io.IOException: Connection timed out" after connect phase

Could be related: Difference between Connection timed out and Read timed out
I have written a Java server application using NIO.
I connected a client to my server application and unplugged the client's network cable. On the server side, I didn't get any exception immediately, but after some time (8 minutes or so) I got an "IOException: Connection timed out".
Here is a partial stack trace:
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
at sun.nio.ch.IOUtil.read(IOUtil.java:198)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
........
Until that point, the netstat output showed the socket state of this particular client connection as ESTABLISHED.
Questions are:
Is this timeout configurable?
Why does the netstat output show the socket state as ESTABLISHED? Ideally it should be CLOSE_WAIT (as the client got disconnected).

No, it is not configurable. It is the result of retransmit timeouts. It wouldn't happen at all unless the application kept writing, or had pending writes when the disconnect happened.
It shouldn't be CLOSE_WAIT, as no FIN had been received. Ergo it should be ESTABLISHED.

That timeout is generally not configurable, as it depends on what the operating system offers. Unix in general does not let a process set the connection timeout, and it is generally fixed at around two minutes. Some versions of Linux/BSD systems may allow this to be configured, but that is not portable, and normally only the administrator (not an ordinary user) is allowed to change it. It depends on the number of retransmissions and the timeout used for each try, and is under the exclusive control of the TCP implementation.
When you close a connection you pass through two states (FIN_WAIT and TIME_WAIT) that are not timeout states. The first is for getting the other end's response (you can close your side of the connection, telling the other side you are not going to send more data, but you have to wait for the other end to do the same thing). TIME_WAIT is a special state that the kernel maintains for a closed connection so that it can process (and discard) any retransmissions of the last frames that may still be in flight after the connection is closed. They have nothing to do with timeouts.
A TCP connection has no implicit timeout. Two machines can go weeks without exchanging any data if they have nothing to transmit. You can enable a kind of heartbeat on silent connections to check their liveness with one socket option (SO_KEEPALIVE). This option makes the TCP stacks on both sides exchange empty packets to find out whether the other side is still alive. Again, you can only control whether these packets are used, not their frequency or the number of lost frames that closes the connection (this can be configured on Linux, but only by changing the kernel configuration as administrator).
Note 1 (answer to #Krishna Chaitanya P)
If you unplugged the cable and got an exception some time later, there are two possible reasons:
You continued writing to that connection, and the send buffer filled up without being acknowledged in time (this is rare, as normally your process gets blocked in the write(2) system call when this happens), and some timeout (in the Java socket implementation) occurred.
Your Java TCP socket implementation uses the SO_KEEPALIVE option (the most probable explanation). As I said before, you have boolean control over whether it is used, but you cannot adjust the time between keepalives or the number of missed keepalives that drops your connection. Try calling the getKeepAlive()/setKeepAlive(boolean) methods on the Socket class to control this feature. I have not seen in the documentation whether a connected socket has keepalive enabled by default. This is a very commonly used option in servers, as it lets them disconnect clients that lose their connection without telling the server.
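For reference, a minimal sketch of turning the option on for each accepted connection, using the setKeepAlive(boolean) method mentioned above. The class and method names (KeepAliveServer, acceptWithKeepAlive) are made up for this example; how often probes are sent is still decided by the operating system.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper showing where SO_KEEPALIVE would be enabled in a server.
class KeepAliveServer {

    // Accepts one connection and asks the OS to send keepalive probes on it.
    static Socket acceptWithKeepAlive(ServerSocket server) throws IOException {
        Socket client = server.accept();
        client.setKeepAlive(true); // boolean switch only; timing comes from OS settings
        return client;
    }
}
```

Calling getKeepAlive() on the returned socket reports whether the option is currently set.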

In my experience, the cause of this exception on a connected socket has always been a firewall closing connections that had been idle for too long. I've seen it happen in cloud environments (AWS, Rackspace) in particular, but it's not limited to them. Most likely, you have some kind of firewall between the two connection peers that closes idle connections after some time.
The best fix in an ideal world is to change the firewall configuration, provided you or an operations team has access to it. In any case, it's better if you can handle that use case in your code and gracefully terminate the communication with the other peer.

Because the CLOSE_WAIT state is for a FIN waiting for its corresponding FIN from the peer, and that is not the case here.
This timeout is most probably configurable.

Handling "Connection reset by peer" error in an FTP client

I have a Java program that calculates some stats daily and uploads the file on a server through FTP. However, I get "Connection reset by peer" errors way too often.
Since I cannot change the server configurations, what are the recommended ways to handle such types of errors? How can I make sure that the whole file is transferred to the server?

The message "Connection reset by peer" means the server closed the connection. The cause could be a TCP timeout, a lack of disk space, etc.
Try transferring the file over FTP without using Java, using a command-line utility. If the same problem occurs, it is definitely not the Java program.
Make sure the network is not sensitive to the size of the file(s) being transferred.
Make sure the server is not blocking connections from your client after it has already made "N" previous connections or after a certain length of time, e.g. 20 minutes.
See if your client can establish a persistent TCP connection using another protocol: SSH, etc. If the problem occurs with the other protocol also, it's likely to be the network.
If you find the issue is caused by a timeout that would only happen if your connection was idle too long, then check this URL:
FTP: "Connection reset by peer"
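If the server side cannot be changed, the client can at least retry and verify. The sketch below is only an outline under stated assumptions: UploadAttempt is a hypothetical callback in which you would perform the upload with your FTP library of choice and then return true only after confirming the transfer (for example by comparing the remote file size with the local one). No real FTP API is used here.

```java
import java.io.IOException;

// Hypothetical callback: perform one upload and report whether it was verified as complete.
interface UploadAttempt {
    boolean uploadAndVerify() throws IOException;
}

class RetryingUpload {

    // Repeats the attempt until it reports success or maxAttempts is reached.
    // Returns true if some attempt succeeded, false if all of them failed.
    static boolean retry(UploadAttempt attempt, int maxAttempts, long delayMs)
            throws InterruptedException {
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                if (attempt.uploadAndVerify()) {
                    return true;
                }
            } catch (IOException e) {
                // "Connection reset by peer" ends up here; try again after a pause.
                System.err.println("Upload attempt " + i + " failed: " + e.getMessage());
            }
            if (i < maxAttempts) {
                Thread.sleep(delayMs);
            }
        }
        return false;
    }
}
```

Each attempt should open a fresh connection, since a connection that has been reset cannot be reused.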

C# / Java Socket Connection Lost?

I am making a C# client / Java server chatroom, and everything works fine now, except for one thing:
After some time (an hour or so) of not using the application (or using it, I don't know) it gives me a SocketException at the C# client at the socket.EndReceive() function:
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
Do Java or C# socket connections close after some time of idling? Or is it just the TCP protocol?
What would be the best method to fix it?
Thanks all!
Bas

Do Java or C# socket connections close after some time of idling?
No.
However, firewalls and especially NAT gateways do, often silently.
What would be the best method to fix it?
Implement a heartbeat procedure, i.e. the client and/or server periodically sends (e.g. every 10 or 30 seconds or so) a special message that is used only to keep the connection alive and to detect a failed peer faster.
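On the Java side, a heartbeat sender could look roughly like the sketch below. The "PING" line is an invented message for illustration; the real chat protocol would need its own heartbeat message that the peer recognizes and ignores or answers. The class name Heartbeat and the onFailure callback are likewise assumptions.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical heartbeat sender for a line-based protocol.
class Heartbeat {

    // Writes "PING\n" every periodMs milliseconds until a write fails.
    // The caller is expected to shut the returned scheduler down when the socket is closed.
    static ScheduledExecutorService start(Socket socket, long periodMs, Runnable onFailure) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                // If other threads also write to this socket, writes must be serialized by the application.
                OutputStream out = socket.getOutputStream();
                out.write("PING\n".getBytes(StandardCharsets.US_ASCII));
                out.flush();
            } catch (IOException e) {
                // A failed write is the signal that the connection is gone.
                onFailure.run();
                scheduler.shutdown();
            }
        }, periodMs, periodMs, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

The regular traffic keeps firewall and NAT idle timers from expiring, and a failed write gives the sender a reason to reconnect.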

How to identify a broken socket connection in Java immediately?

I have a typical java client and a server. The client sends some request to the server and waits for the response. The client reads up to say 100 bytes of data from the contained input stream into an array of bytes. It waits for the complete response of 100 bytes to be read within a specified timeout period of say 3 secs. The problem here is to identify if the server went down or crashed while/before writing the response. Basically, we need to identify if the socket was broken or the peer disconnected for some reason. Is there a way to identify this?

How to identify a broken socket connection in Java immediately?
You can't detect it immediately, in Java or any other language. TCP/IP doesn't know, so Java can't know. The only sure way to detect a broken TCP connection is by writing to it and catching IOExceptions, and they won't happen immediately.

The best way to identify that the connection is down is to time out the connection, i.e. you expect a response within a given amount of time and flag it if that response does not arrive as expected.
When you have a graceful disconnection (e.g. the other end calls close()), the read on the connection will let you know once the buffer has been drained.
However, if there is some other type of failure, you might not be notified until the OS times out the connection (e.g. after 3 minutes), and indeed you may want to keep the connection. For example, if you pull the network cable out for 10 seconds and put it back in, that doesn't need to be a failure.
EDIT: I don't believe it's a good idea to be too aggressive in automatically handling connection/service "failures". This is usually better handled by a planned fix to the system, based on investigation of the true cause, e.g. increased bandwidth, redundant connectivity, faster servers, code fixes.

If the connection is broken abnormally, you will receive an IOException when reading; that normally happens quite fast, but there are no guarantees about timing, as it all depends on the OS, network hardware, etc. If the remote end closes the socket gracefully, you'll read -1 as the next byte.

Assuming everything else works, if the remote peer - the TCP server - was killed then the TCP client will normally receive a TCP RST (reset) and you'll get an IOException in your client application.
However, there are lots of other things that can go wrong besides a process being killed. Basically anything on the network path between the two processes: a cable is yanked, a router dies, a firewall dies, etc. All of this will not immediately be detected.
For the above reasons the general rule is - as pointed out in the answer from EJP - that a broken connection can only be detected by writing to it. This is why it is always recommended that a TCP client and TCP server exchange some type of heartbeat messages at regular intervals. There are different ways to do this. I like best the method where the TCP client will - in the absence of data being received from the TCP server - send a heartbeat message to the server and expect a reply back within a certain time period. This way heartbeat messages will only be sent when really needed.
A sub-optimal approach - if you cannot implement true heartbeating - is to always read with a timeout. Set the timeout on the socket and then catch java.net.SocketTimeoutException. This will allow you to know that no data has been received on socket during x milliseconds.
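A minimal sketch of the read-with-timeout approach, applied to the question's "100 bytes within 3 seconds" scenario. The class name TimedRead and the Outcome enum are invented for this example. Note that Socket.setSoTimeout limits each blocking read() call, so the sketch recomputes the remaining time before every read to enforce an overall deadline.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: fill a buffer from a socket within an overall deadline.
class TimedRead {

    enum Outcome { COMPLETE, TIMED_OUT, PEER_CLOSED, BROKEN }

    static Outcome readFully(Socket socket, byte[] buf, int timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            InputStream in = socket.getInputStream();
            int off = 0;
            while (off < buf.length) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return Outcome.TIMED_OUT; // also avoids setSoTimeout(0), which means "wait forever"
                }
                socket.setSoTimeout((int) remainingMs);
                int n = in.read(buf, off, buf.length - off);
                if (n == -1) {
                    return Outcome.PEER_CLOSED; // graceful close by the remote end
                }
                off += n;
            }
            return Outcome.COMPLETE;
        } catch (SocketTimeoutException e) {
            return Outcome.TIMED_OUT; // no data arrived in time; the socket itself is still usable
        } catch (IOException e) {
            return Outcome.BROKEN; // e.g. connection reset
        }
    }
}
```

A call such as TimedRead.readFully(socket, new byte[100], 3000) then distinguishes a complete response from a timeout, a graceful close, and a reset. A TIMED_OUT result only says that nothing arrived in time; it does not prove the peer is dead.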
It should be mentioned that there's one scenario where you need neither heartbeating nor the socket timeout: if the TCP client and the TCP server communicate over a loopback interface, then a broken connection will always be propagated to both the TCP client application and the TCP server application. This is because, in this case, there's really no network infrastructure between the two processes. So if you have an existing application which isn't well designed with respect to its TCP communication (i.e. it doesn't implement some form of heartbeating or at least reading with a timeout), then as a last resort you may 'fix' the problem by moving the two applications onto the same host and letting them communicate over the loopback interface.

Sockets in CLOSE_WAIT from Jersey Client

I am using Jersey 1.4, the ApacheHttpClient, and the Apache MultiThreadedHttpConnectionManager class to manage connections. For the HttpConnectionManager, I set staleCheckingEnabled to true, maxConnectionsPerHost to 1000 and maxTotalConnections to 1000. Everything else is default. We are running in Tomcat and making connections out to multiple external hosts using the Jersey client.
I have noticed that after a short period of time I begin to see sockets in a CLOSE_WAIT state that are associated with the Tomcat process. Some monitoring with tcpdump shows that the external hosts appear to be closing the connection after some time, but it's not getting closed on our end. Usually there is some data in the socket read queue, often 24 bytes. The connections are using HTTPS and the data seems to be encrypted, so I'm not sure what it is.
I have checked to be sure that the ClientRequest objects that get created are closed. The sockets in CLOSE_WAIT do seem to get recycled and we're not running out of any resources, at least at this time. I'm not sure what's happening on the external servers.
My question is, is this normal and should I be concerned?
Thanks,
John

This is likely to be a device such as the firewall or the remote server timing out the TCP session. You can analyze packet captures of HTTPS using Wireshark as described on their SSL page:
http://wiki.wireshark.org/SSL
The staleCheckingEnabled flag only issues the check when you go to actually use the connection so you aren't using network resources (TCP sessions) when they aren't needed.
