Am testing with a client who send me a HTTP request with no content length header but has a content.
How do I extract this content without the help of contentlength header?
I've kept the original answer for completeness, but I've just been looking in the HTTP RFC (2616) section 4.3:
The presence of a message-body in a request is signaled by the inclusion of a Content-Length or Transfer-Encoding header field in the request's message-headers. A message-body MUST NOT be included in a request if the specification of the request method (section 5.1.1) does not allow sending an entity-body in requests. A server SHOULD read and forward a message-body on any request; if the request method does not include defined semantics for an entity-body, then the message-body SHOULD be ignored when handling the request.
So if you haven't got a content length, you must have a Transfer-Encoding (and if you haven't, you should respond with a 400 status to indicate a bad request or 411 ("length required")). At that point, you do what the Transfer-Encoding tells you :)
Now if you're dealing with a servlet API (or a similar HTTP API) it may well handle all this for you - at which point you may be able to use the techique below to read from the stream until it yields no more data, as the API will take care of it (i.e. it won't just be a raw socket stream).
If you could give us more information about your context, that would help.
Original answer
If there's no content length, that means the content continues until the end of the data (when the socket closes).
Keep reading from the input stream (e.g. writing it to a ByteArrayOutputStream to store it, or possibly a file) until InputStream.read returns -1. For example:
byte[] buffer = new byte[8192];
ByteArrayOutputStream output = new ByteArrayOutputStream();
int bytesRead;
while ((bytesRead = inputStream.read(buffer)) != -1)
{
output.write(buffer, 0, bytesRead);
}
// Now use the data in "output"
EDIT: As has been pointed out in comments, the client could be using a chunked encoding. Normally the HTTP API you're using should deal with this for you, but if you're dealing with a raw socket you'd have to handle it yourself.
The point about this being a request (and therefore the client not being able to close the connection) is an interesting one - I thought the client could just shut down the sending part, but I don't see how that maps to anything in TCP at the moment. My low-level networking knowledge isn't what it might be.
If this answer turns out to be "definitely useless" I'll delete it...
If this were a response then the message could be terminated by closing the connection. But that's not an option here because the client still needs to read the response.
Apart from Content-Length:, the other methods of determining content length are:
Transfer-Encoding: chunked
guesswork
Hopefully it's the former, in which case the request should look something like this:
POST /some/path HTTP/1.1
Host: www.example.com
Content-Type: text/plain
Transfer-Encoding: chunked
25
This is the data in the first chunk
1C
and this is the second one
3
con
8
sequence
0
(shamelessly stolen from the Wikipedia article and modified for a request)
each chunk is of the form: hex-encoded length, CRLF, data, CRLF
after the final data-carrying chunk comes a zero-length chunk with no data
after the zero-length chunk comes optional extra HTTP headers
after the optional HTTP headers comes another CRLF
See HTTPbis Part1, Section 3.3.
Related
I am in the process of sending a HTTP chunked request to an internal system. I've confirmed other factors are not at play by ensuring that I can send small messages without chunk encoding.
My process was basically to change the Transfer-Encoding header to be chunked and I've removed the Content-Length header. Additionally, I am utilising an in-house ChunkedOutputStream which has been around for quite some time.
I am able to connect, obtain an output stream and send the data. The recipient then returns a 200 response so it seems the request was received and successfully handled. The endpoint receives the HTTP Request, and streams the data straight into a table (using HttpServletRequest.getInputStream()).
On inspecting the streamed data I can see that the chunk encoding information in the stream has not been unwrapped/decoded by the Tomcat container automatically. I've been trawling the Tomcat HTTPConnector documentation and can't find anything that alludes to the chunked encoding w.r.t how a chunk encoded message should be handled within a HttpServlet. I can't see other StackOverflow questions querying this so I suspect I am missing something basic.
My question boils down to:
Should Tomcat automatically decode the chunked encoding from my request and give me a "clean" InputStream when I call HttpServletRequest.getInputStream()?
If yes, is there configuration that needs to be updated to enable this functionality? Am I sending something wrong in the headers that is causing it to return the non-decoded stream?
If no, is it common practice to wrap input stream in a ChunkedInputStream or something similar when the Transfer-Encoding header is present ?
This is solved. As expected it was basic in my case.
The legacy system I was using provided handrolled methods to simplify the process of opening a HTTP Connection, sending headers and then using an OutputStream to send the content via a POST. I didn't realise, and it was in a rather obscure location, but the behind-the-scenes helper's we're identifying that I was not specifying a Content-Length thus added the TRANSFER_ENCODING=chunked header and wrapped the OutputStream in a ChunkedOutputStream. This resulted in me double encoding the contents, hence my endpoints (seeming) inability to decode it.
Case closed.
I have a Java Servlet that responds to the Twilio API. It appears that Twilio does not support the chunked transfer that my responses are using. How can I avoid using Transfer-Encoding: chunked?
Here is my code:
// response is HttpServletResponse
// xml is a String with XML in it
response.getWriter().write(xml);
response.getWriter().flush();
I am using Jetty as the Servlet container.
I believe that Jetty will use chunked responses when it doesn't know the response content length and/or it is using persistent connections. To avoid chunking you either need to set the response content length or to avoid persistent connections by setting "Connection":"close" header on the response.
Try setting the Content-length before writing to the stream. Don't forget to calculate the amount of bytes according to the correct encoding, e.g.:
final byte[] content = xml.getBytes("UTF-8");
response.setContentLength(content.length);
response.setContentType("text/xml"); // or "text/xml; charset=UTF-8"
response.setCharacterEncoding("UTF-8");
final OutputStream out = response.getOutputStream();
out.write(content);
The container will decide itself to use Content-Length or Transfer-Encoding basing on the size of data to be written by using Writer or outputStream. If the size of the data is larger than the HttpServletResponse.getBufferSize(), then the response will be trunked. If not, Content-Length will be used.
In your case, just remove the 2nd flushing code will solve your problem.
I'm writing a simple web server,code snippet:
ServerSocket server = new ServerSocket(80);
Socket client=server.accept();
InputStream in=client.getInputStream();
OutputStream out=client.getOutputStream();
int val = -1;
while ((val = in.read()) != -1) {
System.out.print((char) val);
}
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter( out));
writer.write("HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\n\r\nhello world!");
writer.close();
out.close();
in.close();
I run it on my computer,then visit http://127.0.0.1 in Firefox. The page hangs and could'nt show "hello world".I think the problem occurs in while ((val = in.read()) != -1),how to solve it?
HTTP (at least the 1.1 version) allows letting the connection open. The request is then ended by an empty line (i.e. "\r\n\r\n"), if it is not a POST or PUT request with contents. After this, the client can send the next request on the same connection.
So you have to read the input at least for scanning for your empty line.
Edit: To clarify this a bit, some quotes from RFC 2616 (which defines HTTP 1.1).
Section 4.1, Message Types:
Request (section 5) and Response (section 6) messages use the generic
message format of RFC 822 [9] for transferring entities (the payload
of the message). Both types of message consist of a start-line, zero
or more header fields (also known as "headers"), an empty line (i.e.,
a line with nothing preceding the CRLF) indicating the end of the
header fields, and possibly a message-body.
generic-message = start-line
*(message-header CRLF)
CRLF
[ message-body ]
start-line = Request-Line | Status-Line
So, message header and message body are delimited by an empty line (the first one after the start line).
Section 4.3 Message Body:
The message-body (if any) of an HTTP message is used to carry the
entity-body associated with the request or response. [...]
The rules for when a message-body is allowed in a message differ for
requests and responses.
The presence of a message-body in a request is signaled by the
inclusion of a Content-Length or Transfer-Encoding header field in
the request's message-headers. A message-body MUST NOT be included in
a request if the specification of the request method (section 5.1.1)
does not allow sending an entity-body in requests. A server SHOULD
read and forward a message-body on any request; if the request method
does not include defined semantics for an entity-body, then the
message-body SHOULD be ignored when handling the request.
So in principle, clients must only send a body when the method allows it, but servers should ignore superfluous message bodies, if they are sent on a method that does not support it. And the presence of a body is indicated by the Content-Length or Transfer-Encoding header fields.
The subsections of section 9 define the individual methods.
9.2 OPTIONS
can contain a body, but meaning is not defined
9.3 GET
can contain no body
9.4 HEAD
can contain no body
9.5 POST
should (or must?) contain a body
9.6 PUT
should (or must?) contain a body
9.7 DELETE
can contain no body
9.8 TRACE
can contain no body
9.9 CONNECT
(this method is reserved)
Anyway, independently of whether the client sends a body or not, and whether it reuses the connection for the next request or not, it will normally not close the connection before the response is read, since otherwise your server could not resend the response at all. So you simply can't wait on a close when reading the request, but have to somehow know when it ended to send your response.
For your simple hello world server which only handles get requests you can simply say "read until the first empty line".
For a real server (i.e. one that is visible to the outside world), you should at least parse the request, ignore any body, and handle HEAD differently than GET (that is, not sending any body back), and send error responses for unsupported methods.
I want to know the length of the file, so I tried getContentLength(). It works fine with network connection (edge/3g) but returns -1 with WiFi?
Why? The WiFi is good and the file was found, it can be downloaded but the return of getContentLength() is always "-1". I dont understand. file is a google documents file.
Is there an other way to get the length?
My code is:
URL url = new URL(file);
URLConnection conexion = url.openConnection();
conexion.connect();
int poids = conexion.getContentLength();
It may well be the mobile network changing things for you. For example, the mobile network I use shrinks image downloads automatically (and annoyingly). If the network is "transparently" performing the full download before giving you any data, it can fill in the content length for you.
However, you basically shouldn't rely on having the content length... there's nothing to guarantee that it'll be available to you.
The server is probably sending back a HTTP response that is chunked.
The behavior of the getContentLength() method is to return the 'internal' value of the length of the content, that is available to it. When the client receives a HTTP chunked response, the length of the response is not known, and hence the content length value is marked as -1.
The chunked nature of the response can determined by the Transfer-Encoding header value; chunked responses have a value of chunked. HTTP servers need not provide a Content-Length header value if the response is sent via chunked encoding; in fact, servers are encouraged to not send the Content-Length header for a chunked response, for the client is supposed to ignore the Content-Length header.
As for the actual reason on why the server is responding differently in two networks, well it depends on various factors. Usually servers will opt for a more optimal delivery mode, depending on the nature of the client. For some reason, it has detected that it is better off sending chunked responses for one type of a connection. The answer might lie in the HTTP request headers, but not necessarily so.
Let's say I have a java program that makes an HTTP request on a server using HTTP 1.1 and doesn't close the connection. I make one request, and read all data returned from the input stream I have bound to the socket. However, upon making a second request, I get no response from the server (or there's a problem with the stream - it doesn't provide any more input). If I make the requests in order (Request, request, read) it works fine, but (request, read, request, read) doesn't.
Could someone shed some insight onto why this might be happening? (Code snippets follow). No matter what I do, the second read loop's isr_reader.read() only ever returns -1.
try{
connection = new Socket("SomeServer", port);
con_out = connection.getOutputStream();
con_in = connection.getInputStream();
PrintWriter out_writer = new PrintWriter(con_out, false);
out_writer.print("GET http://somesite HTTP/1.1\r\n");
out_writer.print("Host: thehost\r\n");
//out_writer.print("Content-Length: 0\r\n");
out_writer.print("\r\n");
out_writer.flush();
// If we were not interpreting this data as a character stream, we might need to adjust byte ordering here.
InputStreamReader isr_reader = new InputStreamReader(con_in);
char[] streamBuf = new char[8192];
int amountRead;
StringBuilder receivedData = new StringBuilder();
while((amountRead = isr_reader.read(streamBuf)) > 0){
receivedData.append(streamBuf, 0, amountRead);
}
// Response is processed here.
if(connection != null && !connection.isClosed()){
//System.out.println("Connection Still Open...");
out_writer.print("GET http://someSite2\r\n");
out_writer.print("Host: somehost\r\n");
out_writer.print("Connection: close\r\n");
out_writer.print("\r\n");
out_writer.flush();
streamBuf = new char[8192];
amountRead = 0;
receivedData.setLength(0);
while((amountRead = isr_reader.read(streamBuf)) > 0 || amountRead < 1){
if (amountRead > 0)
receivedData.append(streamBuf, 0, amountRead);
}
}
// Process response here
}
Responses to questions:
Yes, I'm receiving chunked responses from the server.
I'm using raw sockets because of an outside restriction.
Apologies for the mess of code - I was rewriting it from memory and seem to have introduced a few bugs.
So the consensus is I have to either do (request, request, read) and let the server close the stream once I hit the end, or, if I do (request, read, request, read) stop before I hit the end of the stream so that the stream isn't closed.
According to your code, the only time you'll even reach the statements dealing with sending the second request is when the server closes the output stream (your input stream) after receiving/responding to the first request.
The reason for that is that your code that is supposed to read only the first response
while((amountRead = isr_reader.read(streamBuf)) > 0) {
receivedData.append(streamBuf, 0, amountRead);
}
will block until the server closes the output stream (i.e., when read returns -1) or until the read timeout on the socket elapses. In the case of the read timeout, an exception will be thrown and you won't even get to sending the second request.
The problem with HTTP responses is that they don't tell you how many bytes to read from the stream until the end of the response. This is not a big deal for HTTP 1.0 responses, because the server simply closes the connection after the response thus enabling you to obtain the response (status line + headers + body) by simply reading everything until the end of the stream.
With HTTP 1.1 persistent connections you can no longer simply read everything until the end of the stream. You first need to read the status line and the headers, line by line, and then, based on the status code and the headers (such as Content-Length) decide how many bytes to read to obtain the response body (if it's present at all). If you do the above properly, your read operations will complete before the connection is closed or a timeout happens, and you will have read exactly the response the server sent. This will enable you to send the next request and then read the second response in exactly the same manner as the first one.
P.S. Request, request, read might be "working" in the sense that your server supports request pipelining and thus, receives and processes both request, and you, as a result, read both responses into one buffer as your "first" response.
P.P.S Make sure your PrintWriter is using the US-ASCII encoding. Otherwise, depending on your system encoding, the request line and headers of your HTTP requests might be malformed (wrong encoding).
Writing a simple http/1.1 client respecting the RFC is not such a difficult task.
To solve the problem of the blocking i/o access where reading a socket in java, you must use java.nio classes.
SocketChannels give the possibility to perform a non-blocking i/o access.
This is necessary to send HTTP request on a persistent connection.
Furthermore, nio classes will give better performances.
My stress test give to following results :
HTTP/1.0 (java.io) -> HTTP/1.0 (java.nio) = +20% faster
HTTP/1.0 (java.io) -> HTTP/1.1 (java.nio with persistent connection) = +110% faster
Make sure you have a Connection: keep-alive in your request. This may be a moot point though.
What kind of response is the server returning? Are you using chunked transfer? If the server doesn't know the size of the response body, it can't provide a Content-Length header and has to close the connection at the end of the response body to indicate to the client that the content has ended. In this case, the keep-alive won't work. If you're generating content on-the-fly with PHP, JSP etc., you can enable output buffering, check the size of the accumulated body, push the Content-Length header and flush the output buffer.
Is there a particular reason you're using raw sockets and not Java's URL Connection or Commons HTTPClient?
HTTP isn't easy to get right. I know Commons HTTP Client can re-use connections like you're trying to do.
If there isn't a specific reason for you using Sockets this is what I would recommend :)
Writing your own correct client HTTP/1.1 implementation is nontrivial; historically most people who I've seen attempt it have got it wrong. Their implementation usually ignores the spec and just does what appears to work with one particular test server - in particular, they usually ignore the requirement to be able to handle chunked responses.
Writing your own HTTP client is probably a bad idea, unless you have some VERY strange requirements.