HttpServletRequest.getInputStream() does not unwrap chunked HTTP request


I am in the process of sending an HTTP chunked request to an internal system. I've confirmed other factors are not at play by ensuring that I can send small messages without chunked encoding.
My process was basically to change the Transfer-Encoding header to be chunked and I've removed the Content-Length header. Additionally, I am utilising an in-house ChunkedOutputStream which has been around for quite some time.
I am able to connect, obtain an output stream and send the data. The recipient then returns a 200 response so it seems the request was received and successfully handled. The endpoint receives the HTTP Request, and streams the data straight into a table (using HttpServletRequest.getInputStream()).
On inspecting the streamed data I can see that the chunk encoding information in the stream has not been unwrapped/decoded by the Tomcat container automatically. I've been trawling the Tomcat HTTPConnector documentation and can't find anything that alludes to the chunked encoding w.r.t how a chunk encoded message should be handled within a HttpServlet. I can't see other StackOverflow questions querying this so I suspect I am missing something basic.
My question boils down to:
Should Tomcat automatically decode the chunked encoding from my request and give me a "clean" InputStream when I call HttpServletRequest.getInputStream()?
If yes, is there configuration that needs to be updated to enable this functionality? Am I sending something wrong in the headers that is causing it to return the non-decoded stream?
If no, is it common practice to wrap the input stream in a ChunkedInputStream or something similar when the Transfer-Encoding header is present?

This is solved. As expected it was basic in my case.
The legacy system I was using provided hand-rolled methods to simplify the process of opening an HTTP connection, sending headers and then using an OutputStream to send the content via a POST. I didn't realise, and it was in a rather obscure location, but the behind-the-scenes helpers were detecting that I had not specified a Content-Length, so they added the Transfer-Encoding: chunked header and wrapped the OutputStream in a ChunkedOutputStream. This resulted in me double encoding the contents, hence my endpoint's (seeming) inability to decode it.
Case closed.
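To illustrate the double encoding described above, here is a minimal sketch. It assumes a simplified single-chunk encoder (the `chunk` method is hypothetical and stands in for the in-house ChunkedOutputStream, which is not shown in the question). The container removes exactly one layer of chunking, so the servlet ends up reading the inner layer with its framing still in place.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class DoubleChunkingDemo {

    // Frames the payload as one chunk followed by the terminating zero-length chunk.
    static byte[] chunk(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (payload.length > 0) {
            writeAscii(out, Integer.toHexString(payload.length) + "\r\n");
            out.write(payload, 0, payload.length);
            writeAscii(out, "\r\n");
        }
        writeAscii(out, "0\r\n\r\n");
        return out.toByteArray();
    }

    private static void writeAscii(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.US_ASCII);
        byte[] once = chunk(payload);  // produced by the application-level chunked stream
        byte[] twice = chunk(once);    // what goes on the wire after the helper chunks it again
        // The container decodes one layer of `twice`, so the servlet reads `once`,
        // chunk sizes and CRLFs included.
        System.out.println(new String(once, StandardCharsets.US_ASCII).replace("\r\n", "\\r\\n"));
        System.out.println(twice.length + " bytes on the wire");
    }
}
```

If the data stored by the endpoint starts with a hexadecimal length line, as `once` does here, a second layer of chunking on the sending side is a likely cause.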

Related

How to know when HTTP-server is done sending data

I'm working on a browser/proxy oriented project where I need to download webpages. After sending a custom HTTP request to a web server I start listening for a server response.
When reading the response, I check the response headers for a Content-Length:-row. If I get one of those, it's easy to determine when the server is done sending data since I always know how many bytes of data I have received.
The problem occurs when the server doesn't include the Content-Length header and also keeps the connection open for further requests. For example, the Google server responds with gzipped content, but doesn't include a content length. How do I know when to stop waiting for more data and close the connection?
I have considered using a timeout value to close the connection when no data has been received for a while, but this seems like the wrong way to do it. Chrome, for example, can download the same pages as me and always seems to know exactly when to close the connection.

Have a look at IETF RFC 2616, search for chunked encoding and Content-Range.
HTTP is designed to return content of unknown length, as in:
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
25
This is the data in the first chunk
1C
and this is the second one
3
con
8
sequence
0
(source: Wikipedia)
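A client can decide how the end of the body is signaled by looking at the response headers. The sketch below is a simplified version of that decision, assuming the caller has already parsed the headers into a map with lower-cased names (the class and method names are made up for illustration). It leaves out responses that never carry a body, such as 204, 304 and replies to HEAD.

```java
import java.util.Locale;
import java.util.Map;

final class BodyFraming {

    enum Mode { CHUNKED, CONTENT_LENGTH, UNTIL_CLOSE }

    // headers: response headers keyed by lower-cased name (an assumption of this sketch).
    static Mode modeFor(Map<String, String> headers) {
        String transferEncoding = headers.get("transfer-encoding");
        if (transferEncoding != null
                && transferEncoding.toLowerCase(Locale.ROOT).trim().endsWith("chunked")) {
            // Chunked framing wins; any Content-Length sent alongside it is ignored.
            return Mode.CHUNKED;
        }
        if (headers.containsKey("content-length")) {
            return Mode.CONTENT_LENGTH;
        }
        // Neither header present: the body runs until the server closes the connection.
        return Mode.UNTIL_CLOSE;
    }
}
```

With `CHUNKED`, the body ends at the zero-length chunk, so the connection can stay open for further requests without any timeout guesswork.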

I would suggest forcing a Connection: close header, so you can be sure that the server closes the connection once the output is finished, whether or not Content-Length is set. Performance will suffer somewhat, since the connection cannot be reused.

There are two cases you can expect:
1. socket-close
2. socket-timeout
Usually the socket will be closed; it also makes sense to set a socket timeout.
Remember that
int count = stream.read(buffer, offset, length);
returns the number of bytes actually read, which may be fewer than length. It returns -1 once the socket has been closed, and throws a SocketTimeoutException if the timeout expires first.
Regards.

Error in a middle of writing to outputstream handling

I'm building an HTTP server as part of my academic Java course.
The server should only support basic GET and POST requests.
I was wondering if there's an elegant way to handle an error which occurs in the middle of writing the content of an HTML file stream (after I've already sent the response headers) into the HttpServer output stream.
By elegant way I refer to showing or redirecting the user to a "Internal Server Error" error page.
I tried re-sending the HTTP response headers with a 501 error code, but Java throws an exception which claims that the headers were already sent...
One fix would be to read the file's contents into memory, and only then send the headers and the content, but other problems can arise, and furthermore, I don't want to load huge files into memory before sending them out as a response.

Once the response status is sent on the wire, it cannot be changed. So if you sent a 200 OK response, you cannot change your mind afterwards. As you found, this presents a problem in case of errors that occur mid response.
As far as I know, the only thing you can do is to send a chunked response. See section 3.6.1 of RFC 2616:
The chunked encoding modifies the body of a message in order to
transfer it as a series of chunks, each with its own size indicator,
followed by an OPTIONAL trailer containing entity-header fields. This
allows dynamically produced content to be transferred along with the
information necessary for the recipient to verify that it has received
the full message.
The purpose of this trailer is to give information about the entity body that cannot be calculated before the entity body is sent. However, section 7.1 allows any header to be included in this trailer:
The extension-header mechanism allows additional entity-header fields
to be defined without changing the protocol, but these fields cannot
be assumed to be recognizable by the recipient. Unrecognized header
fields SHOULD be ignored by the recipient and MUST be forwarded by
transparent proxies.
So while you can signal that an error has occurred mid-response, how it is signaled must be agreed between the two parties. You cannot, in general, rely on any method that an arbitrary client will understand as signaling an error condition.
Ending the connection prematurely in a message with a Content-length header is an option, but one that is explicitly forbidden:
When a Content-Length is given in a message where a message-body is
allowed, its field value MUST exactly match the number of OCTETs in
the message-body. HTTP/1.1 user agents MUST notify the user when an
invalid length is received and detected.
That said, while the server must not send a message shorter than it advertises, the client must check for this error condition and report it as such (and proxies may even cache the partial response).
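As a rough sketch of the trailer idea above: the method below copies a source stream to the wire as chunks and, if reading fails part-way, finishes the body with a trailer field instead of a bare terminator. The field name `X-Stream-Error` and the class are hypothetical; as the answer says, the client must already know to look for it. The sketch assumes the response headers (including `Transfer-Encoding: chunked`, and ideally a `Trailer` header naming the field) have been written beforehand.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class ChunkedBodyWriter {

    // Hypothetical trailer name; client and server must agree on it beforehand.
    static final String ERROR_TRAILER = "X-Stream-Error";

    // Copies source to wire as chunks. Returns true if the whole source was sent.
    static boolean copy(InputStream source, OutputStream wire) throws IOException {
        byte[] buffer = new byte[8192];
        String failure = null;
        while (true) {
            int n;
            try {
                n = source.read(buffer);
            } catch (IOException e) {
                failure = "read failed";  // fixed text keeps CR/LF out of the trailer value
                break;
            }
            if (n == -1) break;
            if (n == 0) continue;         // a zero-length chunk would terminate the body
            ascii(wire, Integer.toHexString(n) + "\r\n");
            wire.write(buffer, 0, n);
            ascii(wire, "\r\n");
        }
        ascii(wire, "0\r\n");             // last chunk
        if (failure != null) {
            ascii(wire, ERROR_TRAILER + ": " + failure + "\r\n");
        }
        ascii(wire, "\r\n");              // end of trailer section
        wire.flush();
        return failure == null;
    }

    private static void ascii(OutputStream out, String s) throws IOException {
        out.write(s.getBytes(StandardCharsets.US_ASCII));
    }
}
```

A browser will not turn such a trailer into an error page, so this only helps when you also control the client.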

By elegant way I refer to showing or redirecting the user to a
"Internal Server Error" error page.
If you can't send the 'success' response how are you going to send a different response? All you can do is log it and forget about it.

How to configure HTTPServer to use content length and not transfer encoding: chunked?

I'm using Java's HttpServer object with a web service implemented by WebServiceProvider.
I see that regardless of the client request, the answer is chunked, and I need it to use Content-Length.
So I'm assuming the problem is in the server and not the web service provider, right?
And how can I configure the HTTP header to use Content-Length and not chunked?
HttpServer m_server = HttpServer.create();
Endpoint ep= Endpoint.create(new ep());
HttpContext epContext = m_server.createContext("/DownloadFile");
ep.publish(epContext);

I assume you're talking about the com.sun.net.httpserver HttpServer. I further assume that you're connecting the server to the service with a call to Endpoint.publish, using some service provider which supports HttpServer.
The key is in the HttpExchange.sendResponseHeaders method:
If the response length parameter is greater than zero, this specifies an exact number of bytes to send and the application must send that exact amount of data. If the response length parameter is zero, then chunked transfer encoding is used and an arbitrary amount of data may be sent. The application terminates the response body by closing the OutputStream.
So, as long as the handler is passing a positive value for responseLength, Content-Length is used. Of course, to do that, it will have to know how much data it is going to send ahead of time, which it might well not. Whether it does or not depends entirely on the implementation of the binding, I'm afraid. I don't believe this is standardised; indeed, I don't believe that the WebServiceProvider/HttpServer binding is standardised at all.
However, even if your provider is uncooperative, you have a recourse: write a Filter which adds buffering, and add it to the HttpContext which you are using to publish the service. I think that to do this, you would have to write an implementation of HttpExchange which buffers the data written to it, pass that down the filter chain for the handler to write its response to, then when it comes back, write the buffered content, setting the responseLength when it does so.
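The Filter approach above needs a full HttpExchange wrapper, which is too long to show here. The sketch below illustrates only the underlying buffering idea in a plain HttpHandler, which is a simpler variation and not the filter itself: the body is collected in memory first, so the exact length can be passed to sendResponseHeaders. The class name and the `writeBody` hook are hypothetical.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Buffers the whole body so its size is known before the headers are sent.
abstract class BufferingHandler implements HttpHandler {

    // Hypothetical hook: subclasses write the response body into `body`.
    protected abstract void writeBody(HttpExchange exchange, OutputStream body) throws IOException;

    @Override
    public final void handle(HttpExchange exchange) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeBody(exchange, buffer);
        byte[] bytes = buffer.toByteArray();
        // A positive length selects Content-Length; 0 would select chunked encoding,
        // so an empty body is sent with -1 (no response body).
        exchange.sendResponseHeaders(200, bytes.length == 0 ? -1 : bytes.length);
        if (bytes.length > 0) {
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        }
        exchange.close();
    }
}
```

The cost is that the whole response is held in memory, which may not be acceptable for large downloads.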

Get content from HTTP request even if there is no contentlength header

I am testing with a client that sends me an HTTP request with no Content-Length header but with content.
How do I extract this content without the help of the Content-Length header?

I've kept the original answer for completeness, but I've just been looking in the HTTP RFC (2616) section 4.3:
The presence of a message-body in a request is signaled by the inclusion of a Content-Length or Transfer-Encoding header field in the request's message-headers. A message-body MUST NOT be included in a request if the specification of the request method (section 5.1.1) does not allow sending an entity-body in requests. A server SHOULD read and forward a message-body on any request; if the request method does not include defined semantics for an entity-body, then the message-body SHOULD be ignored when handling the request.
So if you haven't got a content length, you must have a Transfer-Encoding (and if you haven't, you should respond with a 400 status to indicate a bad request or 411 ("length required")). At that point, you do what the Transfer-Encoding tells you :)
Now if you're dealing with a servlet API (or a similar HTTP API) it may well handle all this for you, at which point you may be able to use the technique below to read from the stream until it yields no more data, as the API will take care of it (i.e. it won't just be a raw socket stream).
If you could give us more information about your context, that would help.
Original answer
If there's no content length, that means the content continues until the end of the data (when the socket closes).
Keep reading from the input stream (e.g. writing it to a ByteArrayOutputStream to store it, or possibly a file) until InputStream.read returns -1. For example:
byte[] buffer = new byte[8192];
ByteArrayOutputStream output = new ByteArrayOutputStream();
int bytesRead;
while ((bytesRead = inputStream.read(buffer)) != -1)
{
output.write(buffer, 0, bytesRead);
}
// Now use the data in "output"
EDIT: As has been pointed out in comments, the client could be using a chunked encoding. Normally the HTTP API you're using should deal with this for you, but if you're dealing with a raw socket you'd have to handle it yourself.
The point about this being a request (and therefore the client not being able to close the connection) is an interesting one - I thought the client could just shut down the sending part, but I don't see how that maps to anything in TCP at the moment. My low-level networking knowledge isn't what it might be.
If this answer turns out to be "definitely useless" I'll delete it...

If this were a response then the message could be terminated by closing the connection. But that's not an option here because the client still needs to read the response.
Apart from Content-Length:, the other methods of determining content length are:
Transfer-Encoding: chunked
guesswork
Hopefully it's the former, in which case the request should look something like this:
POST /some/path HTTP/1.1
Host: www.example.com
Content-Type: text/plain
Transfer-Encoding: chunked
25
This is the data in the first chunk
1C
and this is the second one
3
con
8
sequence
0
(shamelessly stolen from the Wikipedia article and modified for a request)
each chunk is of the form: hex-encoded length, CRLF, data, CRLF
after the final data-carrying chunk comes a zero-length chunk with no data
after the zero-length chunk come optional extra HTTP headers (the trailer)
after the optional HTTP headers comes another CRLF
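The four rules above translate fairly directly into a decoder. This is a minimal sketch for the case where you are reading from a raw socket stream and have to handle chunking yourself; the class name is made up, chunk extensions are dropped, and trailer headers are skipped without being returned.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ChunkedDecoder {

    // Reads a chunked body from `in` (positioned just after the headers)
    // and returns the decoded payload.
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');  // drop any chunk extension
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size < 0) throw new IOException("negative chunk size");
            if (size == 0) break;              // zero-length chunk ends the data
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n == -1) throw new IOException("stream ended inside a chunk");
                off += n;
            }
            payload.write(chunk, 0, size);
            if (!readLine(in).isEmpty()) throw new IOException("missing CRLF after chunk data");
        }
        // Skip optional trailer headers up to the final empty line.
        while (!readLine(in).isEmpty()) {
            // trailer ignored in this sketch
        }
        return payload.toByteArray();
    }

    // Reads up to the next LF and returns the line without CR/LF.
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        while (true) {
            int b = in.read();
            if (b == -1) throw new IOException("unexpected end of stream");
            if (b == '\n') break;
            if (b != '\r') line.write(b);
        }
        return new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

With a servlet container or another HTTP API this should not be needed, since the stream handed to the application is normally already decoded.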

See HTTPbis Part1, Section 3.3.

getContentLength() return -1 only in WiFi?

I want to know the length of the file, so I tried getContentLength(). It works fine with a mobile network connection (EDGE/3G) but returns -1 over WiFi.
Why? The WiFi connection is good and the file is found and can be downloaded, but getContentLength() always returns -1. I don't understand. The file is a Google Documents file.
Is there another way to get the length?
My code is:
URL url = new URL(file);
URLConnection conexion = url.openConnection();
conexion.connect();
int poids = conexion.getContentLength();

It may well be the mobile network changing things for you. For example, the mobile network I use shrinks image downloads automatically (and annoyingly). If the network is "transparently" performing the full download before giving you any data, it can fill in the content length for you.
However, you basically shouldn't rely on having the content length... there's nothing to guarantee that it'll be available to you.

The server is probably sending back an HTTP response that is chunked.
The behavior of the getContentLength() method is to return the 'internal' value of the length of the content that is available to it. When the client receives a chunked HTTP response, the length of the response is not known, and hence the content length value is reported as -1.
The chunked nature of the response can be determined from the Transfer-Encoding header value; chunked responses have a value of chunked. HTTP servers need not provide a Content-Length header value if the response is sent via chunked encoding; in fact, servers are encouraged not to send the Content-Length header for a chunked response, because the client is supposed to ignore it.
As for the actual reason on why the server is responding differently in two networks, well it depends on various factors. Usually servers will opt for a more optimal delivery mode, depending on the nature of the client. For some reason, it has detected that it is better off sending chunked responses for one type of a connection. The answer might lie in the HTTP request headers, but not necessarily so.
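If the length is needed and the server does not declare one, the only reliable fallback is to read the body and count. A minimal sketch, assuming it is acceptable to consume the stream just to measure it (the helper class and method names are made up):

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

final class ContentLengths {

    // Returns the declared length, or counts the bytes when none is declared.
    // Counting consumes the body, so the data must be fetched again (or stored
    // while counting) if it is needed afterwards.
    static long lengthOf(URLConnection connection) throws IOException {
        int declared = connection.getContentLength();
        if (declared >= 0) {
            return declared;
        }
        long total = 0;
        byte[] buffer = new byte[8192];
        try (InputStream in = connection.getInputStream()) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }
}
```

For a download with a progress bar, it is usually simpler to treat -1 as "unknown size" and show indeterminate progress than to read the data twice.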
