I'm working on a browser/proxy oriented project where I need to download webpages. After sending a custom HTTP request to a web server I start listening for a server response.
When reading the response, I check the response headers for a Content-Length row. If I get one, it's easy to determine when the server is done sending data, since I always know how many bytes I have received.
The problem occurs when the server doesn't include the Content-Length header and also keeps the connection open for further requests. For example, the Google server responds with gzipped content but doesn't include a content length. How do I know when to stop waiting for more data and close the connection?
I have considered using a timeout value to close the connection when no data has been received for a while, but this seems like the wrong way to do it. Chrome, for example, can download the same pages as me and always seems to know exactly when to close the connection.
Have a look at IETF RFC 2616; search for chunked encoding and Content-Range.
HTTP is designed to return content of unknown length, as in:
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

25
This is the data in the first chunk
1C
and this is the second one
3
con
8
sequence
0
(source: Wikipedia)
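To make the framing concrete: once you see Transfer-Encoding: chunked, you read a hex size line, then that many bytes, then a CRLF, and repeat until a zero-size chunk. Here is a minimal decoding sketch (my own illustration, assuming the response headers have already been consumed from the stream; trailers are not handled):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedBodyReader {

    // Reads "<hex size>[;extensions]\r\n<data>\r\n" blocks until the
    // terminating zero-size chunk. Trailer headers are not consumed here.
    public static byte[] readChunkedBody(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);                 // e.g. "25" or "1C"
            int size = Integer.parseInt(sizeLine.split(";")[0].trim(), 16);
            if (size == 0) {                                // "0" marks the last chunk
                break;
            }
            byte[] chunk = new byte[size];
            int read = 0;
            while (read < size) {                           // read() may return fewer bytes
                int n = in.read(chunk, read, size - read);
                if (n == -1) throw new IOException("connection closed mid-chunk");
                read += n;
            }
            body.write(chunk);
            readLine(in);                                   // consume the CRLF after the data
        }
        return body.toByteArray();
    }

    // Reads one CRLF-terminated line, byte by byte.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') sb.append((char) c);
        }
        return sb.toString();
    }
}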
I would suggest forcing a Connection: close header, so you can be sure the server closes the connection after the output is finished, whether or not a Content-Length is set. Performance will be partially affected by this.
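A sketch of what that looks like on a raw socket (host and path are placeholders):

import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// With "Connection: close", end-of-stream reliably marks the end of
// the response, at the cost of a new connection per request.
try (Socket socket = new Socket("www.example.com", 80)) {
    OutputStream out = socket.getOutputStream();
    String request = "GET / HTTP/1.1\r\n"
            + "Host: www.example.com\r\n"
            + "Connection: close\r\n"
            + "\r\n";
    out.write(request.getBytes(StandardCharsets.US_ASCII));
    out.flush();

    InputStream in = socket.getInputStream();
    byte[] buf = new byte[8192];
    for (int n; (n = in.read(buf)) != -1; ) {
        // append buf[0..n) to the response; -1 arrives once the
        // server has sent everything and closed the connection
    }
}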
There are two cases you can expect:
1. socket-close
2. socket-timeout
Usually the socket will be closed, but it also makes sense to set a socket timeout.
Remember that

int n = stream.read(buffer);

returns the number of bytes actually read into the buffer (possibly fewer than its size), returns -1 once the socket has been closed by the peer, and throws a SocketTimeoutException when the timeout elapses.
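A sketch combining both cases (assuming socket is a connected java.net.Socket and the 5-second timeout is a placeholder value):

socket.setSoTimeout(5000);               // read() gives up after 5 s of silence
InputStream in = socket.getInputStream();
byte[] buf = new byte[8192];
int n;
try {
    while ((n = in.read(buf)) != -1) {   // n = bytes actually read; -1 = peer closed
        // process buf[0..n)
    }
} catch (SocketTimeoutException e) {
    // no data for 5 s: treat as end of data (a heuristic only)
}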
Regards.
Related
I am in the process of sending a HTTP chunked request to an internal system. I've confirmed other factors are not at play by ensuring that I can send small messages without chunk encoding.
My process was basically to change the Transfer-Encoding header to be chunked and I've removed the Content-Length header. Additionally, I am utilising an in-house ChunkedOutputStream which has been around for quite some time.
I am able to connect, obtain an output stream and send the data. The recipient then returns a 200 response so it seems the request was received and successfully handled. The endpoint receives the HTTP Request, and streams the data straight into a table (using HttpServletRequest.getInputStream()).
On inspecting the streamed data I can see that the chunk encoding information in the stream has not been unwrapped/decoded by the Tomcat container automatically. I've been trawling the Tomcat HTTP Connector documentation and can't find anything about how a chunk-encoded message should be handled within an HttpServlet. I can't see other Stack Overflow questions about this, so I suspect I am missing something basic.
My question boils down to:
Should Tomcat automatically decode the chunked encoding from my request and give me a "clean" InputStream when I call HttpServletRequest.getInputStream()?
If yes, is there configuration that needs to be updated to enable this functionality? Am I sending something wrong in the headers that is causing it to return the non-decoded stream?
If no, is it common practice to wrap the input stream in a ChunkedInputStream or something similar when the Transfer-Encoding header is present?
This is solved. As expected it was basic in my case.
The legacy system I was using provided hand-rolled methods to simplify the process of opening an HTTP connection, sending headers, and then using an OutputStream to send the content via a POST. I didn't realise, because it was in a rather obscure location, that the behind-the-scenes helpers were detecting that I had not specified a Content-Length, so they added the Transfer-Encoding: chunked header and wrapped the OutputStream in a ChunkedOutputStream. This resulted in me double-encoding the contents, hence my endpoint's (seeming) inability to decode it.
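For anyone landing here with the same symptom: make sure only one layer applies the chunk framing. A sketch with plain HttpURLConnection (the endpoint URL and payload are placeholders); Tomcat then hands the servlet an already-decoded stream from getInputStream():

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

byte[] payload = "...".getBytes();           // placeholder body
HttpURLConnection conn = (HttpURLConnection)
        new URL("http://example.com/endpoint").openConnection();
conn.setRequestMethod("POST");
conn.setDoOutput(true);
conn.setChunkedStreamingMode(0);             // 0 = default chunk size; sends Transfer-Encoding: chunked
try (OutputStream out = conn.getOutputStream()) {
    out.write(payload);                      // write the raw body only; no manual chunk framing
}
int status = conn.getResponseCode();         // 200 expected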
Case closed.
I want to retrieve the server's response as is, with all headers. The first thing that comes to mind is to use raw sockets. As I have learned from the search, there are 3 ways to indicate the end of response:
(1) closing the connection;
(2) examining Content-Length;
(3) getting all chunks in the case of Transfer-Encoding: Chunked.
There is also
(4) the timeout method: assume that a timeout means the end of the data, but that is not really reliable.
I want a general-case solution and do not want to
add a Connection: close line to the request itself.
In addition, it is recommended to use an existing library rather than re-invent the wheel.
Question:
How do I use an existing package, preferably, something already present in Android, to detect the end of HTTP response while having access (without interference) to the raw data stream?
UPD: forgot to mention that the HTTP request is given to me as a sequence of bytes. Yes, it is for testing.
PS
relevant reading:
End of an HTTP Response
Detect the end of an HTTP Request in Java
Detect end of HTTP request body
How HTTP Server inform its clients that the response has ended
Proper handling of chunked Http Response within Socket
Detect the end of a HTTP packet
Android socket & HTTP response headers
Java HTTP GET response waits until timeout
I suggest using the Apache HttpClient package (http://hc.apache.org/httpclient-3.x/) so you don't need to implement all the finicky details of the HTTP protocol.
Apache HttpClient will give you access to the headers and their content, which may be enough for you.
If you really need access to the actual character sequence sent by the server (e.g. for debugging purposes), you could then intercept the communication by replacing the connection socket factory with your own to create "intercepting" sockets which store all transferred data in a buffer where your code can access it later on. See http://hc.apache.org/httpcomponents-client-4.3.x/tutorial/html/connmgmt.html#d5e418
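The interception itself can be as simple as wrapping the socket's streams. A sketch of a tee for the read side (my own illustration, not an HttpClient API):

import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Copies every byte read from the wrapped stream into `captured`,
// so the raw bytes from the wire can be inspected later.
class TeeInputStream extends FilterInputStream {
    final ByteArrayOutputStream captured = new ByteArrayOutputStream();

    TeeInputStream(InputStream in) { super(in); }

    @Override public int read() throws IOException {
        int b = super.read();
        if (b != -1) captured.write(b);
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) captured.write(buf, off, n);
        return n;
    }
}

Every byte the HTTP code consumes also lands in captured, which you can dump for debugging.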
I'm building an HTTP server as part of my academic Java course. The server should only support basic GET and POST requests.
I was wondering if there's an elegant way to handle an error which occurs in the middle of streaming an HTML file's content (and after I've already sent the response headers) into the HttpServer output stream.
By an elegant way I mean showing or redirecting the user to an "Internal Server Error" error page.
I tried re-sending the HTTP response headers with a 501 error code, but Java throws an exception which says that the headers were already sent...
One fix would be to read the file's contents into memory and only then send the headers and the content, but other problems can arise, and furthermore I don't want to load huge files into memory before sending them out as a response.
Once the response status is sent on the wire, it cannot be changed. So if you sent a 200 OK response, you cannot change your mind afterwards. As you found, this presents a problem in case of errors that occur mid response.
As far as I know, the only thing you can do is to send a chunked response. See section 3.6.1 of RFC 2616:
The chunked encoding modifies the body of a message in order to
transfer it as a series of chunks, each with its own size indicator,
followed by an OPTIONAL trailer containing entity-header fields. This
allows dynamically produced content to be transferred along with the
information necessary for the recipient to verify that it has received
the full message.
The purpose of this trailer is to give information about the entity body that cannot be calculated before the entity body is sent. However, section 7.1 allows any header to be included in this trailer:
The extension-header mechanism allows additional entity-header fields
to be defined without changing the protocol, but these fields cannot
be assumed to be recognizable by the recipient. Unrecognized header
fields SHOULD be ignored by the recipient and MUST be forwarded by
transparent proxies.
So while you can signal that an error has occurred mid-response, how this is signaled must be agreed between the two parties. There is, in general, no method you can assume the client will understand as signaling an error condition.
Ending the connection prematurely in a message with a Content-Length header is an option, but one that is explicitly forbidden:
When a Content-Length is given in a message where a message-body is
allowed, its field value MUST exactly match the number of OCTETs in
the message-body. HTTP/1.1 user agents MUST notify the user when an
invalid length is received and detected.
That said, while the server must not send a message shorter than it advertises, the client must check for this error condition and report it as such (and proxies may even cache this partial response).
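As an illustration of such a convention on the wire (X-Error is a made-up, application-specific trailer, not a standard header), a response could look like:

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
Trailer: X-Error

5
Hello
0
X-Error: backend failed mid-stream

A client that doesn't know to look for X-Error will simply ignore it, which is exactly the limitation described above.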
By elegant way I refer to showing or redirecting the user to a
"Internal Server Error" error page.
If you can't send the 'success' response how are you going to send a different response? All you can do is log it and forget about it.
I want to know the length of the file, so I tried getContentLength(). It works fine over a mobile connection (EDGE/3G) but returns -1 over WiFi.
Why? The WiFi connection is good and the file is found; it can be downloaded, but getContentLength() always returns -1. I don't understand. file is a Google Documents file.
Is there another way to get the length?
My code is:
URL url = new URL(file);
URLConnection conexion = url.openConnection();
conexion.connect();
int poids = conexion.getContentLength();
It may well be the mobile network changing things for you. For example, the mobile network I use shrinks image downloads automatically (and annoyingly). If the network is "transparently" performing the full download before giving you any data, it can fill in the content length for you.
However, you basically shouldn't rely on having the content length... there's nothing to guarantee that it'll be available to you.
The server is probably sending back a HTTP response that is chunked.
The behavior of the getContentLength() method is to return the length of the content, if it is available to it. When the client receives a chunked HTTP response, the length of the response is not known, and hence the content length is reported as -1.
The chunked nature of the response can determined by the Transfer-Encoding header value; chunked responses have a value of chunked. HTTP servers need not provide a Content-Length header value if the response is sent via chunked encoding; in fact, servers are encouraged to not send the Content-Length header for a chunked response, for the client is supposed to ignore the Content-Length header.
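If you need the actual size, the only reliable option is to count the bytes yourself. A sketch reusing the question's variables (file is the URL string; URLConnection has already de-chunked the stream by the time you read it):

URLConnection conexion = new URL(file).openConnection();
conexion.connect();
int declared = conexion.getContentLength();       // -1 for chunked responses
java.io.InputStream in = conexion.getInputStream();
java.io.ByteArrayOutputStream body = new java.io.ByteArrayOutputStream();
byte[] buf = new byte[8192];
int n;
while ((n = in.read(buf)) != -1) {
    body.write(buf, 0, n);
}
in.close();
int poids = body.size();                          // the real length, unlike declared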
As for the actual reason on why the server is responding differently in two networks, well it depends on various factors. Usually servers will opt for a more optimal delivery mode, depending on the nature of the client. For some reason, it has detected that it is better off sending chunked responses for one type of a connection. The answer might lie in the HTTP request headers, but not necessarily so.
Let's say I have a java program that makes an HTTP request on a server using HTTP 1.1 and doesn't close the connection. I make one request, and read all data returned from the input stream I have bound to the socket. However, upon making a second request, I get no response from the server (or there's a problem with the stream - it doesn't provide any more input). If I make the requests in order (Request, request, read) it works fine, but (request, read, request, read) doesn't.
Could someone shed some insight onto why this might be happening? (Code snippets follow). No matter what I do, the second read loop's isr_reader.read() only ever returns -1.
try {
    connection = new Socket("SomeServer", port);
    con_out = connection.getOutputStream();
    con_in = connection.getInputStream();

    PrintWriter out_writer = new PrintWriter(con_out, false);
    out_writer.print("GET http://somesite HTTP/1.1\r\n");
    out_writer.print("Host: thehost\r\n");
    //out_writer.print("Content-Length: 0\r\n");
    out_writer.print("\r\n");
    out_writer.flush();

    // If we were not interpreting this data as a character stream, we might
    // need to adjust byte ordering here.
    InputStreamReader isr_reader = new InputStreamReader(con_in);
    char[] streamBuf = new char[8192];
    int amountRead;
    StringBuilder receivedData = new StringBuilder();
    while ((amountRead = isr_reader.read(streamBuf)) > 0) {
        receivedData.append(streamBuf, 0, amountRead);
    }
    // Response is processed here.

    if (connection != null && !connection.isClosed()) {
        //System.out.println("Connection Still Open...");

        out_writer.print("GET http://someSite2 HTTP/1.1\r\n");
        out_writer.print("Host: somehost\r\n");
        out_writer.print("Connection: close\r\n");
        out_writer.print("\r\n");
        out_writer.flush();

        streamBuf = new char[8192];
        receivedData.setLength(0);
        while ((amountRead = isr_reader.read(streamBuf)) != -1) {
            receivedData.append(streamBuf, 0, amountRead);
        }
    }
    // Process response here
}
Responses to questions:
Yes, I'm receiving chunked responses from the server.
I'm using raw sockets because of an outside restriction.
Apologies for the mess of code - I was rewriting it from memory and seem to have introduced a few bugs.
So the consensus is I have to either do (request, request, read) and let the server close the stream once I hit the end, or, if I do (request, read, request, read) stop before I hit the end of the stream so that the stream isn't closed.
According to your code, the only time you'll even reach the statements dealing with sending the second request is when the server closes the output stream (your input stream) after receiving/responding to the first request.
The reason for that is that your code that is supposed to read only the first response
while ((amountRead = isr_reader.read(streamBuf)) > 0) {
    receivedData.append(streamBuf, 0, amountRead);
}
will block until the server closes the output stream (i.e., when read returns -1) or until the read timeout on the socket elapses. In the case of the read timeout, an exception will be thrown and you won't even get to sending the second request.
The problem with HTTP responses is that they don't tell you how many bytes to read from the stream until the end of the response. This is not a big deal for HTTP 1.0 responses, because the server simply closes the connection after the response thus enabling you to obtain the response (status line + headers + body) by simply reading everything until the end of the stream.
With HTTP 1.1 persistent connections you can no longer simply read everything until the end of the stream. You first need to read the status line and the headers, line by line, and then, based on the status code and the headers (such as Content-Length) decide how many bytes to read to obtain the response body (if it's present at all). If you do the above properly, your read operations will complete before the connection is closed or a timeout happens, and you will have read exactly the response the server sent. This will enable you to send the next request and then read the second response in exactly the same manner as the first one.
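A sketch of that framing logic for the simple Content-Length case (chunked handling omitted; in is the socket's raw InputStream, and readLine is a helper that reads one CRLF-terminated line byte by byte, like the one sketched earlier on this page):

String statusLine = readLine(in);                  // e.g. "HTTP/1.1 200 OK"
int contentLength = -1;
String header;
while (!(header = readLine(in)).isEmpty()) {       // headers end at an empty line
    if (header.toLowerCase().startsWith("content-length:")) {
        contentLength = Integer.parseInt(
                header.substring("content-length:".length()).trim());
    }
}
// Read exactly contentLength body bytes; the connection then stays usable
// for the next request instead of being read to end-of-stream.
byte[] body = new byte[contentLength];
int read = 0;
while (read < contentLength) {
    int n = in.read(body, read, contentLength - read);
    if (n == -1) throw new IOException("server closed the connection mid-body");
    read += n;
}

Note that the sketch reads bytes straight from the socket stream; wrapping the stream in a buffered Reader can consume bytes that belong to the next response.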
P.S. Request, request, read might be "working" in the sense that your server supports request pipelining and thus receives and processes both requests, and you, as a result, read both responses into one buffer as your "first" response.
P.P.S. Make sure your PrintWriter is using the US-ASCII encoding. Otherwise, depending on your system encoding, the request line and headers of your HTTP requests might be malformed (wrong encoding).
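For example (using java.io.OutputStreamWriter and java.nio.charset.StandardCharsets):

PrintWriter out_writer = new PrintWriter(
        new OutputStreamWriter(con_out, StandardCharsets.US_ASCII), false);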
Writing a simple HTTP/1.1 client that respects the RFC is not such a difficult task.
To solve the problem of blocking I/O when reading a socket in Java, you must use the java.nio classes.
SocketChannels give you the possibility of performing non-blocking I/O.
This is necessary for sending HTTP requests over a persistent connection.
Furthermore, the nio classes give better performance.
My stress tests gave the following results:
HTTP/1.0 (java.io) -> HTTP/1.0 (java.nio) = +20% faster
HTTP/1.0 (java.io) -> HTTP/1.1 (java.nio with persistent connection) = +110% faster
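A minimal sketch of the non-blocking read side (host, port, and request are placeholders; a real client would drive this from a Selector rather than spinning):

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

SocketChannel channel = SocketChannel.open(new InetSocketAddress("www.example.com", 80));
String request = "GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n";
channel.write(ByteBuffer.wrap(request.getBytes(StandardCharsets.US_ASCII)));

channel.configureBlocking(false);             // reads now return immediately
ByteBuffer buf = ByteBuffer.allocate(8192);
int n;
while ((n = channel.read(buf)) != -1) {       // 0 = nothing available yet; -1 = closed
    if (n > 0) {
        buf.flip();
        // feed buf to an incremental HTTP parser here, then reuse it
        buf.clear();
    }
    // a real client would block on a Selector until the channel is readable
}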
Make sure you have a Connection: keep-alive in your request. This may be a moot point though.
What kind of response is the server returning? Are you using chunked transfer? If the server doesn't know the size of the response body, it can't provide a Content-Length header and has to close the connection at the end of the response body to indicate to the client that the content has ended. In this case, keep-alive won't work. If you're generating content on the fly with PHP, JSP, etc., you can enable output buffering, check the size of the accumulated body, set the Content-Length header, and flush the output buffer.
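In servlet terms the buffering idea looks like this (a sketch; renderPage is a hypothetical method that produces the body and may fail part-way, and response is an HttpServletResponse):

import java.io.ByteArrayOutputStream;

// Buffer the whole body first: any error in renderPage() happens before
// the status line goes out, and Content-Length enables keep-alive.
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
renderPage(buffer);                               // hypothetical body generator
response.setContentLength(buffer.size());
response.getOutputStream().write(buffer.toByteArray());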
Is there a particular reason you're using raw sockets and not Java's URLConnection or Commons HttpClient?
HTTP isn't easy to get right. I know Commons HttpClient can re-use connections the way you're trying to do.
If there isn't a specific reason for you to be using sockets, this is what I would recommend :)
Writing your own correct client HTTP/1.1 implementation is nontrivial; historically most people who I've seen attempt it have got it wrong. Their implementation usually ignores the spec and just does what appears to work with one particular test server - in particular, they usually ignore the requirement to be able to handle chunked responses.
Writing your own HTTP client is probably a bad idea, unless you have some VERY strange requirements.