Why does DefaultHttpClient send data over a half-closed socket?


I'm using DefaultHttpClient with a ThreadSafeClientConnManager on Android (2.3.x) to send HTTP requests to my REST server (embedded Jetty).
After ~200 seconds of idle time, the server closes the TCP connection with a [FIN]. The Android client responds with an [ACK]. This should and does leave the socket in a half-closed state (server is still listening, but can't send data).
I would expect that when the client tries to use that connection again (via HttpClient.execute), DefaultHttpClient would detect the half-closed state, close the socket on the client side (thus sending its [FIN/ACK] to finalize the close), and open a new connection for the request. But there's the rub.
Instead, it sends the new HTTP request over the half-closed socket. Only after sending is the half-closed state detected and the socket closed on the client-side (with the [FIN] sent to the server). Of course, the server can't respond to the request (it had already sent its [FIN]), so the client thinks the request failed and automatically retries via a new socket/connection.
The end result is that the server sees and processes two copies of the request.
Any ideas on how to fix this? (My server does the correct thing with the second copy, but I'm annoyed that the payload is transmitted twice.)
Shouldn't DefaultHttpClient detect that the socket was closed when it first tries to write the new HTTP packet, close that socket immediately, and start a new one? I'm baffled as to how a new HTTP request is sent on a socket minutes after the server sent a [FIN].

This is a general limitation of blocking I/O in Java. There is simply no way of finding out whether the opposite endpoint has closed the connection other than by attempting to read from the socket. Apache HttpClient works around this problem with the so-called stale connection check, which is essentially a very brief read operation. However, the check can be, and often is, disabled. In fact it is often advisable to disable it because of the extra latency it introduces. I have no idea how exactly the version of HttpClient shipped with Android behaves in this regard, but you could try explicitly enabling the check with the appropriate config parameter.
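To make the "very brief read" idea concrete, here is a minimal sketch using plain java.net sockets rather than HttpClient itself. The class name StaleCheckSketch and the methods isStale and probe are made up for illustration; this is not how HttpClient implements its check, only the same principle. It assumes the connection is idle, so no response bytes are expected.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class StaleCheckSketch {

    /**
     * Returns true if the peer has closed (or reset) the connection.
     * probeMillis must be greater than 0 (0 would mean "block forever").
     * Assumption: the connection is idle; if a byte does arrive, this sketch consumes it.
     */
    static boolean isStale(Socket socket, int probeMillis) {
        try {
            int previousTimeout = socket.getSoTimeout();
            socket.setSoTimeout(probeMillis);
            try {
                return socket.getInputStream().read() == -1; // -1: the peer sent its FIN
            } catch (SocketTimeoutException e) {
                return false; // nothing to read and no FIN seen: still usable
            } finally {
                socket.setSoTimeout(previousTimeout);
            }
        } catch (IOException e) {
            return true; // reset or otherwise broken
        }
    }

    /** Loopback demo: the server side either closes right away or stays open. */
    static boolean probe(boolean serverCloses) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            if (serverCloses) {
                accepted.close(); // server sends FIN; the client socket is now half-closed
            }
            return isStale(client, 500);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("server closed -> stale? " + probe(true));
        System.out.println("server open   -> stale? " + probe(false));
    }
}
```

The open case costs the full probe timeout, which is exactly the extra latency mentioned above.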
A better solution to this problem might be to evict connections from the connection pool once they have been idle for longer than a particular period of time (say 150 seconds).
http://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html#d5e652
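As a rough illustration of the eviction idea, here is a self-contained sketch of a pool that closes connections idle beyond a threshold. IdleEvictionSketch and its methods are hypothetical names, and timestamps are passed in explicitly to keep the example deterministic. As far as I know, HttpClient 4.x connection managers offer closeExpiredConnections() and closeIdleConnections(long, TimeUnit) for the same purpose; I have not verified how the Android-bundled version behaves.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

final class IdleEvictionSketch {
    // Pooled connection -> time (ms) at which it was returned to the pool.
    private final Map<Closeable, Long> idleSince = new LinkedHashMap<>();

    /** Called when a connection is handed back to the pool. */
    synchronized void release(Closeable connection, long nowMillis) {
        idleSince.put(connection, nowMillis);
    }

    /** Closes and removes every pooled connection idle for longer than maxIdleMillis. */
    synchronized int closeIdle(long maxIdleMillis, long nowMillis) {
        int evicted = 0;
        Iterator<Map.Entry<Closeable, Long>> it = idleSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Closeable, Long> entry = it.next();
            if (nowMillis - entry.getValue() > maxIdleMillis) {
                try {
                    entry.getKey().close();
                } catch (IOException ignored) {
                    // the connection is being discarded anyway
                }
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }

    /** Demo with the 150 s threshold from the text (below the server's ~200 s idle timeout). */
    static int demo() {
        IdleEvictionSketch pool = new IdleEvictionSketch();
        pool.release(() -> { }, 0L);        // idle since t = 0 s
        pool.release(() -> { }, 100_000L);  // idle since t = 100 s
        return pool.closeIdle(150_000L, 160_000L); // at t = 160 s only the first is too old
    }
}
```

In practice a background thread would call closeIdle every few seconds, so a connection the server is about to drop never gets reused for a request.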

Related

Jersey/Servlet Serverside handling of network failures

-- EDIT: --
To rephrase the question.
Does HTTP know anything about the status of the underlying TCP connection?
TCP is a reliable protocol. When the server sends data to the client it expects an acknowledgment signal from that client. What happens in HTTP when the underlying server side TCP connection fails to receive the ACK signal?
-- ORIGINAL Question: --
I am trying to solve a design issue on our HTTP client/server app.
Here is the situation:
The server runs on Tomcat, and we are somewhat limited to using Jersey or Servlets for the server side implementation.
The client requests data from the Server, which once read is deleted.
Data must not be deleted if the client has not received it.
There is no confirmation from the client if the data is received or not.
The client impl cannot be changed in any way.
The network connection is unstable and can be interrupted for long periods of time (e.g. 30 sec.) and also often.
The problem: if the client made a request and shortly after lost connection to the server, the server will not recognize this and it will delete and send the data to the client over the dead connection.
Ideally, we want to get an IOException when flushing the data stream to the client and handle it accordingly:
try (ServletOutputStream outputStream = httpServletResponse.getOutputStream()) {
    outputStream.write(bytes);
    outputStream.flush();
} catch (Exception e) {
    // TODO: do something ...
}
I simulated this locally by killing the client shortly after sending the request or by setting a very low client read timeout value. In both cases I got a server side exception (with both Jersey and Servlets).
The last test was sending a request over a network and pulling the network cable in the process.
Unfortunately I did not get the expected result. The server streamed the data back without recognizing the interrupted connection.
So, does anyone have an idea how to force a Server side exception when the connection to the client is broken?
Any other ideas that don't involve using Sockets or confirmation calls from the client?
Thanks in advance!

Instead of deleting the file in real time, you can write a message to a queue in order to delete it later. The delete would have to check a database where you record whether the client received the file completely.

I don't think there's a way to know for certain whether the data reached the client unless the client sends an acknowledgement message.
The only solution seems to be not actually deleting the data, but keeping it and setting a 'deleted' flag. But since I don't know the particular use case, I'm not sure if this helps...
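A minimal sketch of the "flag instead of delete" idea, using an in-memory map in place of a real database. SoftDeleteStore, its methods, and the grace period before physical removal are all assumptions made for illustration; the grace period in particular is not part of the suggestion above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class SoftDeleteStore {
    private static final class Entry {
        final byte[] payload;
        volatile long sentAtMillis = -1; // the "deleted" flag: -1 means never handed to a client
        Entry(byte[] payload) { this.payload = payload; }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    void put(String id, byte[] payload) {
        entries.put(id, new Entry(payload));
    }

    /** Returns the payload and flags it as sent; nothing is removed, so a retry still works. */
    Optional<byte[]> read(String id, long nowMillis) {
        Entry e = entries.get(id);
        if (e == null) {
            return Optional.empty();
        }
        e.sentAtMillis = nowMillis;
        return Optional.of(e.payload);
    }

    /** Physically removes entries that were flagged as sent at least graceMillis ago. */
    int purge(long graceMillis, long nowMillis) {
        int before = entries.size();
        entries.values().removeIf(
                e -> e.sentAtMillis >= 0 && nowMillis - e.sentAtMillis >= graceMillis);
        return before - entries.size();
    }
}
```

A client that lost the response can simply ask again within the grace period; data that was never requested is never purged.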

TCP is a two way protocol.
If you set up an input stream and call InputStream.read(), this should return -1 if the client has disconnected.
More detail here:
Java Sockets: check if client is able to receive message from server

Apache HTTP client timeout

I am using Apache HTTP client to contact an external service. The service can take a few hours, if not longer, to generate its response. I've tried a few different things but have either ended up with socket or read timeouts. I've just tried using the RequestConfig to set the socket and connection timeout to 0 which according to the documentation should be infinite but the request always returns after exactly 1 hour. Any thoughts?

I agree with the general sentiment about not trying to keep HTTP connections alive for so long. However, if your hands are tied, you may be hitting timeouts at the TCP level, and TCP-level keep-alives may save the day.
See this link for help setting up TCP keep-alive. You cannot do it in HttpClient; it's an OS thing. This will send ACKs regularly, so your TCP connection is never idle even if nothing is going on in the HTTP stream.
Apache HttpClient TCP Keep-Alive (socket keep-alive)
Holding TCP connections for a long time even if they are active is hard. YMMV.
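For reference, at the plain socket level Java can at least switch keep-alive on via Socket.setKeepAlive(true). The sketch below (class and method names are made up) only shows that switch on a loopback connection. How often probes are sent is governed by OS settings, and the default idle time before the first probe is often two hours, so it would likely need lowering at the OS level to help with a one-hour cutoff.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class KeepAliveSketch {

    /** Opens a loopback connection and enables SO_KEEPALIVE on the client side. */
    static boolean enableKeepAlive() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            client.setKeepAlive(true); // ask the OS to probe the peer while the connection is idle
            return client.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SO_KEEPALIVE enabled: " + enableKeepAlive());
    }
}
```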

Ideally, any service that takes more than a few minutes (2-3 minutes or so) should be handled asynchronously instead of keeping a connection open for an hour or longer. That is a waste of resources on both the client and the server side.
Alternative approaches could solve this kind of problem:
You call the service to trigger processing (to prepare the response). It may return some unique request ID.
Then after an hour or so (once the response is ready), the client makes another request, passing the request ID, and the server returns the response.
Another approach: once the response is ready, the server pushes it to a callback URL, where the client hosts another service specifically for receiving the response prepared by the server in step 1.
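The submit-then-poll variant can be sketched in a few lines. This is an in-process illustration only: AsyncJobSketch, submit and poll are hypothetical names, and in a real service they would sit behind two HTTP endpoints, with a 200 ms sleep standing in for the hours-long job.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class AsyncJobSketch {
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    private final Map<String, Future<String>> jobs = new ConcurrentHashMap<>();

    /** Step 1: start the work and return a request ID immediately. */
    String submit(Callable<String> work) {
        String requestId = UUID.randomUUID().toString();
        jobs.put(requestId, workers.submit(work));
        return requestId;
    }

    /** Step 2: empty while the job is still running (or the ID is unknown), else the response. */
    Optional<String> poll(String requestId) {
        Future<String> job = jobs.get(requestId);
        if (job == null || !job.isDone()) {
            return Optional.empty();
        }
        try {
            return Optional.of(job.get()); // does not block: the job is already done
        } catch (Exception e) {
            return Optional.of("FAILED: " + e.getMessage());
        }
    }

    void shutdown() {
        workers.shutdown();
    }

    static String demo() {
        AsyncJobSketch service = new AsyncJobSketch();
        try {
            String id = service.submit(() -> { Thread.sleep(200); return "report"; });
            Optional<String> result = service.poll(id);
            while (!result.isPresent()) { // each poll is a short request, no long-lived connection
                Thread.sleep(50);
                result = service.poll(id);
            }
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        } finally {
            service.shutdown();
        }
    }
}
```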

HTTP POST in Flash - TCP connection closed by client before response

I am encountering an interesting issue wherein a TCP connection for a HTTP 1.1 POST request is being closed immediately following the request (ie, before the response can be sent by the server).
A few details about the test environment:
Client - Windows XP, Internet Explorer 8, Flash player 12.
Server - Java 7
Prior to the aforementioned behaviour, we have several longstanding TCP connections, each being reused for multiple HTTP requests; we open a long poll and when this poll completes, open another. We see several hours of well behaved and reused TCP connections opening polls as the previous poll closes.
Eventually -- sometimes after 12 or more hours of normal behaviour -- a poll on a long standing connection will send the HTTP POST and immediately send a TCP FIN before the server can write the response.
The client behaviour is to keep a poll open at all times, so at this point we try to open a new poll.
A new TCP connection is then opened by the client sending another HTTP POST, with the same behaviour; the request is sent, followed by a FIN from the client.
This behaviour can continue for several minutes, until the server can finally respond to kill the client. (The server detects the initial closed connection by encountering an IO Exception. The next time it can communicate with the client, its response is to tell the client to close.)
Edit: We are opening connections only through the Flash client, and are not delving into low level TCP code. While Steffen Ullrich is correct, and the single sided shutdown is possible and should be dealt with, what is not clear is why a single sided shutdown is occurring at this (seemingly arbitrary) point. We are not calling close from the application to instigate this behaviour.
My questions are:
Under what circumstances would a TCP connection for a HTTP request be terminated prior to the response being received? I understand this is bad behaviour, and an incomplete HTTP transaction, so presumably something lower down is terminating the connection for an unknown reason.
Are there any diagnostics that could be used to help understand the problem? (We are currently monitoring server and client side activity with Wireshark.)
Notes:
In Wireshark, the behaviour we see is:
1. Longstanding TCP connection (#1) serving multiple HTTP requests.
2. HTTP request is made over #1.
3. Server ACKs the request.
4. Client sends FIN to close connection #1. Server responds with FIN,ACK. (The expected traffic would be the server sending the HTTP response). Around this point the server experiences an IO Exception.
5. Client opens connection #2 and sends HTTP request.
6. Behaviour continues as from 3.

Sending a request immediately followed by a FIN is not a connection close but a shutdown of the write side, i.e. shutdown(socket, SHUT_WR). This way the client tells the server that it will not send any more data, but it might still receive data. It's not that uncommon.
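Since the server in the question is Java: the Java counterpart of shutdown(socket, SHUT_WR) is Socket.shutdownOutput(). The following loopback sketch (HalfCloseSketch is a made-up name) shows a client that sends a request, half-closes, and still receives the full response, which is how a server should treat such a FIN.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class HalfCloseSketch {

    private static String readToEof(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    static String demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            client.getOutputStream().write("ping".getBytes(StandardCharsets.UTF_8));
            client.shutdownOutput(); // sends FIN: "no more data from me"; reading still works

            // Server side: EOF on read only means the client finished sending.
            String request = readToEof(accepted.getInputStream());
            accepted.getOutputStream().write(("echo:" + request).getBytes(StandardCharsets.UTF_8));
            accepted.close();

            // Client side: the response still arrives on the half-closed connection.
            return readToEof(client.getInputStream());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```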

How to disconnect a socket once the socket.emit is done

I am using socket.io-java-client to connect my Java class on the server side to node.js and emit some events.
Since I am running this on the server, I don't want the socket thread to be running always.
As soon as my emit is done I want to disconnect the socket.
I tried
SocketIO socket=new SocketIO("http://IP:9001");
socket.emit("EVENT", "data");
socket.disconnect();
but this fails because we are closing the socket even before it has sent the message.
Is there any handler for emit success? How can I close the socket after the emit is successful?

After you've sent the message to the server, the server can drop the connection from its side: on receiving that specific message, it can simply disconnect the client socket.
Alternatively, the server can send a response and the client can close itself on receiving it. But the server should protect itself with a timeout in order to close idle clients that did not close themselves.
I recommend doing this on the server side; never trust the client side with such decisions.
The client can additionally do it after some timeout.
If you use Socket.IO just to send one message and then close, there is no point in using Socket.IO, as the handshaking process adds overhead; you might consider using a plain HTTP request to send single messages to the server.

Detecting loss of connection between server and client

How does the server learn that a client connection was lost? Does this trigger an event? Is it possible to register code (server side) that executes before the connection loss happens?
This connection loss can happen if:
being idle for too long.
client side terminated.
etc.
I am asking this in particular about JSP and PHP.

It depends on the protocol you're talking about, but a "connection" is typically established through a three-way handshake, which causes both parties to simply agree that they're "connected" now. This means both parties remember in a table that there's an open connection to IP a.b.c.d on port x and what context this "connection" is associated with. All incoming data from that "connection" is then passed to the associated context.
That's all there is to it, there's no real "physical" connection; it's just an agreed upon state between two parties.
Depending on the protocol, a connection can be formally terminated with an appropriate packet. One party sends this packet to the other, telling it that the "connection" is terminated; both parties remove the table entries and that's that.
If the connection is interrupted without this packet being sent, neither party will know about it. Only the next time one party tries to send data to the other will this problem become apparent.
Depending on the protocol a connection may automatically be considered stale and terminated if no data was received for a certain amount of time. In this case, a dead connection will be noticed sooner, but requires a constant back and forth of some sort between both parties.
So in short: yes, there is a server event that can be triggered, but it is not guaranteed to be triggered.

When you close a socket, the socket on the other end is notified. However, if the connection is lost ungracefully (e.g. a network cable is unplugged, or a computer loses power), then you probably will not find out.
To deal with this, you can send periodic messages just to check the connection. If the send fails, then the connection has been interrupted. Make sure you set up your sockets to only wait for a reasonable amount of time, though.
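On the receiving side, the periodic-message approach boils down to remembering when each peer was last heard from and treating prolonged silence as a lost connection. A minimal sketch follows; HeartbeatMonitor and its methods are hypothetical names, and the clock is passed in explicitly so the example is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class HeartbeatMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Record any message (heartbeat or payload) received from a client. */
    void onMessage(String clientId, long nowMillis) {
        lastSeen.put(clientId, nowMillis);
    }

    /** Removes and returns the clients that have been silent for longer than the timeout. */
    List<String> reapLost(long nowMillis) {
        List<String> lost = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                lost.add(e.getKey());
            }
        }
        lastSeen.keySet().removeAll(lost);
        return lost; // the caller runs its "connection lost" cleanup for each of these
    }

    /** Demo: with a 30 s timeout, "a" (last seen at 0 s) is lost at 40 s, "b" (20 s) is not. */
    static String demo() {
        HeartbeatMonitor monitor = new HeartbeatMonitor(30_000L);
        monitor.onMessage("a", 0L);
        monitor.onMessage("b", 20_000L);
        return String.join(",", monitor.reapLost(40_000L));
    }
}
```

This cannot run code before the loss happens; it only bounds how long the server takes to notice.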

If you are talking about a typical client-server architecture, the server shouldn't bother about the connection to the client. Only the client should care about the connection to the server. The client should take measures to avoid the connection being dropped, such as periodically sending a keep-alive message to avoid a timeout.

Why does the server need to bother about connection loss or termination?
The server's job is to serve the requests that come from the client. That's it. If the client doesn't receive the data it expected from the server, it can take appropriate action. If the connection drops while the server is processing a request, the server can't do much either, as the HTTP request is initiated by the client.
So the client can make a new request if for some reason it didn't get a response.
