Handling "Connection reset by peer" error in an FTP client - java

I have a Java program that calculates some stats daily and uploads the resulting file to a server over FTP. However, I get "Connection reset by peer" errors far too often.
Since I cannot change the server configurations, what are the recommended ways to handle such types of errors? How can I make sure that the whole file is transferred to the server?

The message "Connection reset by peer" means the remote end closed the connection abruptly (your side received a TCP RST). The cause could be a TCP timeout, a lack of disk space on the server, etc.
Try transferring the file over FTP without Java, using a command-line utility. If the same problem occurs, the Java program is not the cause.
Make sure the network is not sensitive to the size of file(s) being transferred.
Make sure the server is not blocking connections from your client after it has already made "N" previous connections or after a certain length of time, e.g. 20 minutes.
See if your client can establish a persistent TCP connection using another protocol: SSH, etc. If the problem occurs with the other protocol also, it's likely to be the network.
If you find the issue is caused by a timeout that only happens when your connection has been idle too long, see the related question FTP: "Connection reset by peer".
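Whatever the underlying cause turns out to be, the client can usually be made more tolerant by retrying the whole upload. The following is a minimal sketch only: `RetryingUpload`, `Attempt`, and `runWithRetry` are hypothetical names, and the `Attempt` callback stands in for "connect, log in, upload the file, check the result, disconnect" with whatever FTP library is in use. Checking that the remote file size matches the local one after each attempt would be a natural addition, but it is not shown here.

```java
import java.io.IOException;

/** Hypothetical helper: retries an I/O action a bounded number of times. */
final class RetryingUpload {

    /** Stand-in for one complete upload attempt (connect, upload, disconnect). */
    interface Attempt {
        void run() throws IOException;
    }

    /**
     * Runs the attempt up to maxAttempts times, doubling the pause after each
     * failure. Returns the number of attempts used, or rethrows the last
     * IOException if every attempt failed.
     */
    static int runWithRetry(Attempt attempt, int maxAttempts, long initialDelayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        IOException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                attempt.run();
                return i;                 // success
            } catch (IOException e) {     // e.g. "Connection reset by peer"
                last = e;
                if (i < maxAttempts) {
                    Thread.sleep(delay);  // back off before reconnecting
                    delay *= 2;
                }
            }
        }
        throw last;
    }
}
```

Each attempt should start from a fresh connection, since a socket that has been reset cannot be reused.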

Related

Controlling java socket's connection

I'm writing a client-server application using Java TCP sockets.
The client and server are connected by a socket.
Sometimes the server has to write a reply message for the client on this socket.
But at that moment the client's socket may already be closed, not through the close() method but because the client application itself was closed.
How can the server recognize this situation and avoid writing its reply message to this socket?
This is impossible to do reliably. If you establish that a connection is open, by the time you get around to writing to it, it may have been closed. The reliable solution is to attempt the write, and handle any errors that may result.
Note that if you do get an error indication, there is no saying how much data got to the remote peer. If you perform two writes, and the second write gets an error indication, it is quite possible that the remote peer shut down before the first write but the local peer only noticed it during the second write.
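A minimal sketch of "attempt the write and handle the error" might look like the following. The class and method names (`ReplyWriter`, `tryReply`) are made up for illustration; only the `java.net.Socket` calls are real API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

/** Hypothetical helper illustrating "just write, and handle the failure". */
final class ReplyWriter {

    /**
     * Tries to send a reply. Returns true if the bytes were handed to the
     * local TCP stack without error, false if the write failed.
     * A true result does NOT prove that the peer received the data.
     */
    static boolean tryReply(Socket socket, byte[] message) {
        try {
            OutputStream out = socket.getOutputStream();
            out.write(message);
            out.flush();
            return true;
        } catch (IOException e) {
            // The connection is unusable; release the local resources.
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing more can be done here
            }
            return false;
        }
    }
}
```

As the answer notes, the first write after the peer has gone away may still appear to succeed, so a `true` result here should be read as "no error yet" rather than "delivered".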

Meaning of "java.io.IOException: Connection timed out" after connect phase

Could be related: Difference between Connection timed out and Read timed out
I have written a Java server application using NIO.
I connected a client to my server application and unplugged the network cable of the client. On the server side, I didn't get any exception immediately, but after some time (8 minutes or so) I got an "IOException: Connection timed out".
Here is a partial stack trace:
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
at sun.nio.ch.IOUtil.read(IOUtil.java:198)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
........
Until then, the netstat output showed the socket state of this particular client connection as ESTABLISHED.
Questions are:
Is this timeout configurable?
Why does the netstat output show the socket state as ESTABLISHED? Ideally it should be CLOSE_WAIT (as the client got disconnected)
No, it is not configurable. It is the result of retransmit timeouts. It wouldn't happen at all unless the application kept writing, or had pending writes when the disconnect happened.
It shouldn't be CLOSE_WAIT, as no FIN had been received. Ergo it should be ESTABLISHED.
That timeout is generally not configurable, as it depends on what the operating system offers. Unix in general does not let a process set the connection timeout, which is usually fixed at around two minutes. Some versions of Linux/BSD may allow it to be configured, but that is not portable, and normally only the administrator, not an ordinary user, is allowed to change it. It is determined by the number of retransmissions and the timeout used for each try, and is under the exclusive control of the TCP implementation.
When you close a connection you pass through two states (FIN_WAIT and TIME_WAIT) that are not timeout states. The first waits for the other end's response (you can close your side of the connection, telling the other side you are not going to send more data, but you have to wait for the other end to do the same). TIME_WAIT is a special state that the kernel maintains for a closed connection so that it can process (and discard) any retransmissions of the last segments that may still be in flight after the connection is closed. Neither has anything to do with timeouts.
A TCP connection has no implicit timeout. Two machines can go weeks without exchanging any data if they have nothing to transmit. You can enable a kind of heartbeat on silent connections to check their liveness with one socket option (SO_KEEPALIVE). This option makes the TCP stack send empty probe packets, which the other side acknowledges if it is still alive. Again, you can only control whether these packets are used, not their frequency or the number of lost probes that closes the connection (this can be configured in Linux, but only by changing kernel settings as administrator).
Note 1 (answer to #Krishna Chaitanya P)
If you unplugged the cable and got an exception some time later, it can be one of two reasons for that to happen:
You continued writing to that connection and the send buffer filled up without being acknowledged in time (this is rare, as normally your process gets blocked in the write(2) system call when this happens), and some timeout (in the Java socket implementation) occurred.
Your Java TCP socket implementation uses the SO_KEEPALIVE option (the most probable explanation). As I said before, you have boolean control over whether it is used, but you cannot adjust the time between keepalives or the number of missed keepalives that drops your connection. Try calling the getKeepAlive()/setKeepAlive(boolean) methods of the Socket class to control this feature. I have not seen in the documentation whether a connected socket has keepalive enabled by default. This is a very commonly used option in servers, as it lets them drop clients that lose their connection without telling the server.
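For reference, toggling the option from Java could be sketched as below. `KeepAliveConfig` and `enableKeepAlive` are illustrative names; `setKeepAlive` and `getKeepAlive` are the standard `java.net.Socket` methods. The probe interval and count still come from the operating system here. Newer JDKs (11 and later) also expose platform-specific tuning through `jdk.net.ExtendedSocketOptions` on some operating systems, which is not shown.

```java
import java.net.Socket;
import java.net.SocketException;

/** Hypothetical helper showing the boolean keepalive switch on a socket. */
final class KeepAliveConfig {

    /** Turns SO_KEEPALIVE on and returns the value the socket reports afterwards. */
    static boolean enableKeepAlive(Socket socket) throws SocketException {
        socket.setKeepAlive(true);
        return socket.getKeepAlive();
    }
}
```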
In my experience, the cause for this exception occurring for a connected socket was always due to a firewall closing connections that had been idle for too long. I've seen it happen in cloud environments (AWS, Rackspace) in particular, but it's not limited to that. Most likely, you have some kind of firewall between the 2 connection peers, which closes idle connections after some time.
The best fix in an ideal world is to change the firewall configuration, provided you or an operations team has access to it. In any case, it's better if you can handle that use case in your code and gracefully terminate the communication with the other peer.
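One possible way to handle that case in code is to put a read timeout on the socket and treat timeouts, orderly closes, and I/O errors as separate outcomes. This is a sketch under assumptions: `IdleAwareReader`, `Result`, and `readOnce` are hypothetical names, and a real server would decide for itself what to do on each outcome (send a heartbeat, reconnect, or give up).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical helper that classifies the outcome of a single read. */
final class IdleAwareReader {

    enum Result { DATA, PEER_CLOSED, TIMED_OUT, FAILED }

    /** Performs one blocking read that waits at most timeoutMillis. */
    static Result readOnce(Socket socket, byte[] buffer, int timeoutMillis) {
        try {
            socket.setSoTimeout(timeoutMillis);
            InputStream in = socket.getInputStream();
            int n = in.read(buffer);
            // -1 means the peer closed its side in an orderly way (FIN).
            return n < 0 ? Result.PEER_CLOSED : Result.DATA;
        } catch (SocketTimeoutException e) {
            // Nothing arrived in time; the socket itself is still usable.
            return Result.TIMED_OUT;
        } catch (IOException e) {
            // e.g. "Connection reset" or "Connection timed out": give up cleanly.
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing more can be done here
            }
            return Result.FAILED;
        }
    }
}
```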
Because the CLOSE_WAIT state is entered only after a FIN has been received from the peer, and that is not the case here.
This timeout is most probably configurable.

FTP: When to send the code 226 after initiating a file transfer?

From an FTP server's perspective, if a client requests a file with the RETR command, the server opens a data connection (socket) to the client on the specified port and starts the transfer by writing to the output stream. The server is written in Java in such a way that, once the write to the socket is complete, the output stream is flushed and the socket is closed. After this, the code "226" is sent to the client on the control channel.
Since the connection runs over a very slow network, the 226 message arrives before the actual data transfer is complete. This is a tricky situation: the client code cannot be changed, so the server has to make sure that the 226 is sent only after the client has received the data.
I searched the internet and found a few suggestions, but I am not sure which one is standard:
1. to use setSoLinger() method to turn on SO_LINGER and to set timeout.
2. to introduce a delay after writing each byte in to socket (performance will be impacted for fast connections).
Are there any options other than the above to solve this? Does anyone know the standard practice for sending 226 in Linux/Solaris/Windows FTP servers?
I found a similar thread on Stack Overflow, "When should 226 be sent from the FTP server?", but could not find much there related to my question.
Any help is really appreciated. Thanks.
Definitely do not go with the delay. The only thing I can think of is to build a proxy layer that intercepts the acknowledgement code, checks for the file, and then forwards the code to the application, much as Telerik Fiddler does as an application.
I used the same concept before with the JMS acknowledge mode when delivering messages to the server, where I had to implement the same thing.
Wish you all luck my friend
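For comparison, option 1 from the question (SO_LINGER) could be sketched roughly as follows. This is not presented as what real FTP servers do. `LingeringSender` and `sendAndClose` are hypothetical names, and the sketch assumes a blocking socket, where close() with SO_LINGER enabled typically waits until queued data has been sent and acknowledged or the linger time expires. Even then it only tells the server that the peer's TCP stack acknowledged the data, not that the client application has read it.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

/** Hypothetical helper: write the payload, then close with SO_LINGER enabled. */
final class LingeringSender {

    /**
     * Sends the payload on the data socket and closes it. With SO_LINGER on,
     * close() may block for up to lingerSeconds while queued data drains.
     */
    static void sendAndClose(Socket dataSocket, byte[] payload, int lingerSeconds)
            throws IOException {
        try {
            dataSocket.setSoLinger(true, lingerSeconds);
            OutputStream out = dataSocket.getOutputStream();
            out.write(payload);
            out.flush();
        } finally {
            dataSocket.close();
        }
        // Only after this method returns would the caller send "226"
        // on the control channel.
    }
}
```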

Commons FTPClient hangs after uploading a large file

I'm using Apache Commons FTPClient 3.1 to do a simple file upload.
storeFile() works fine for smaller files (under 100MB), but when I try uploading something larger than 100MB, it finishes uploading and then just hangs.
I've tried entering passive mode like others have suggested, but it doesn't seem to fix the problem. I've tried multiple FTP servers with the same results, so I'm guessing it's not the host.
Here's the gist of what I'm doing:
ftpClient.connect(...);
ftpClient.login(...);
ftpClient.enterLocalPassiveMode();
boolean success = ftpClient.storeFile(...);
if(success)
...
The program hangs at line 4 for large files, but does successfully upload the file.
It's timing out. The FTPClient documentation may help: https://commons.apache.org/proper/commons-net/apidocs/org/apache/commons/net/ftp/FTPClient.html
Control channel keep-alive feature:
During file transfers, the data connection is busy, but the control connection is idle. FTP servers know that the control connection is in use, so won't close it through lack of activity, but it's a lot harder for network routers to know that the control and data connections are associated with each other. Some routers may treat the control connection as idle, and disconnect it if the transfer over the data connection takes longer than the allowable idle time for the router.
One solution to this is to send a safe command (i.e. NOOP) over the control connection to reset the router's idle timer. This is enabled as follows:
ftpClient.setControlKeepAliveTimeout(300); // set timeout to 5 minutes
This will cause the file upload/download methods to send a NOOP approximately every 5 minutes.

Why am I getting a SocketException in a long running application?

I have written a Java socket server application that gives me an error if I run it for a long time, say 4-8 hours. Below is the error I get:
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:130)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:282)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:324)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:176)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:153)
at java.io.BufferedReader.readLine(BufferedReader.java:316)
at java.io.BufferedReader.readLine(BufferedReader.java:379)
at LiveRate.processData(LiveRate.java:224)
at LiveRate.mainLiveRate(LiveRate.java:265)
at LiveRate.liveRate(LiveRate.java:126)
at LiveRate.run(LiveRate.java:119)
at java.lang.Thread.run(Thread.java:636)
My socket application reads some values from another TCP/IP server, stores them temporarily, and offers them to other clients. I am not sure whether these errors are caused by heavy load on the system or by memory issues. Please help.
It is probably neither (directly) load nor memory related. Instead, it is more likely to be one of the following:
the remote service is shut down / falls over and is restarted on a regular basis,
the remote service has decided to close its end of the connection because it is "idle",
network connectivity is intermittent and you are occasionally encountering an outage or congestion-induced "brownout" that is too long,
you are using NAT or similar, and the port number that was being used for the connection has been reclaimed by the NAT gateway, or
something is enforcing some policy about TCP/IP connections being open for too long.
The bottom line is that your client software needs to be able to cope with lost connections if you want it to run for extended periods of time. This is the way that the internet works.
I'd say it's because your connection gets reset by your Internet provider every 24 hours.
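Coping with lost connections usually comes down to a reconnect loop around the read loop. The sketch below is one way to structure it under assumptions: `ReconnectingLineReader` and `readLines` are hypothetical names, the upstream service is assumed to send newline-terminated text, and the connection count is bounded only so the example terminates (a long-running service would more likely loop until shut down).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Hypothetical helper: keeps reading lines, reconnecting when the link drops. */
final class ReconnectingLineReader {

    /**
     * Opens up to maxConnects connections to host:port in turn, passing every
     * line received to handler. Returns the total number of lines handled.
     */
    static int readLines(String host, int port, int maxConnects, long pauseMillis,
                         Consumer<String> handler) throws InterruptedException {
        int lines = 0;
        for (int attempt = 1; attempt <= maxConnects; attempt++) {
            try (Socket socket = new Socket(host, port);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    handler.accept(line);
                    lines++;
                }
                // readLine() returned null: the peer closed the connection in an orderly way.
            } catch (IOException e) {
                // e.g. "Connection reset" or a refused connect: fall through and reconnect.
            }
            if (attempt < maxConnects) {
                Thread.sleep(pauseMillis); // brief pause before reconnecting
            }
        }
        return lines;
    }
}
```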
