I want to know the length of a file, so I tried getContentLength(). It works fine over a mobile connection (EDGE/3G) but returns -1 over WiFi.
Why? The WiFi connection is good and the file is found and can be downloaded, but getContentLength() always returns -1. I don't understand. The file is a Google Documents file.
Is there another way to get the length?
My code is:
URL url = new URL(file);
URLConnection conexion = url.openConnection();
conexion.connect();
int poids = conexion.getContentLength();
It may well be the mobile network changing things for you. For example, the mobile network I use shrinks image downloads automatically (and annoyingly). If the network is "transparently" performing the full download before giving you any data, it can fill in the content length for you.
However, you basically shouldn't rely on having the content length... there's nothing to guarantee that it'll be available to you.
The server is probably sending back an HTTP response that is chunked.
The behaviour of the getContentLength() method is to return the content length value that is available to it. When the client receives a chunked HTTP response, the length of the response is not known, and hence the content length is reported as -1.
Whether a response is chunked can be determined from the Transfer-Encoding header value; chunked responses have a value of chunked. HTTP servers need not provide a Content-Length header if the response is sent via chunked encoding; in fact, servers are encouraged not to send the Content-Length header for a chunked response, since the client is supposed to ignore it.
As for why the server responds differently on the two networks, that depends on various factors. Servers usually opt for the delivery mode they consider optimal for the nature of the client, and for some reason this one has decided it is better off sending chunked responses for one type of connection. The answer might lie in the HTTP request headers, but not necessarily so.
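If you need a byte count either way, one approach is to use Content-Length when the server provides one and otherwise count the bytes yourself while downloading. A minimal sketch (the URL is a placeholder):

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class LengthCheck {
    public static void main(String[] args) throws IOException {
        URLConnection conn = new URL("http://example.com/somefile").openConnection();
        conn.connect();
        long length = conn.getContentLength(); // -1 when the response is chunked
        if (length == -1 && "chunked".equalsIgnoreCase(conn.getHeaderField("Transfer-Encoding"))) {
            // No declared length: count the bytes while reading instead.
            long count = 0;
            byte[] buffer = new byte[8192];
            try (InputStream in = conn.getInputStream()) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    count += n;
                }
            }
            length = count;
        }
        System.out.println("Length: " + length);
    }
}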
Related
I am in the process of sending a chunked HTTP request to an internal system. I've confirmed other factors are not at play by checking that I can send small messages without chunked encoding.
My process was basically to change the Transfer-Encoding header to chunked and remove the Content-Length header. Additionally, I am using an in-house ChunkedOutputStream which has been around for quite some time.
I am able to connect, obtain an output stream and send the data. The recipient then returns a 200 response so it seems the request was received and successfully handled. The endpoint receives the HTTP Request, and streams the data straight into a table (using HttpServletRequest.getInputStream()).
On inspecting the streamed data, I can see that the chunked encoding information in the stream has not been unwrapped/decoded by the Tomcat container automatically. I've been trawling the Tomcat HTTP connector documentation and can't find anything about how a chunk-encoded message should be handled within an HttpServlet. I can't see other Stack Overflow questions asking this, so I suspect I am missing something basic.
My question boils down to:
Should Tomcat automatically decode the chunked encoding from my request and give me a "clean" InputStream when I call HttpServletRequest.getInputStream()?
If yes, is there configuration that needs to be updated to enable this functionality? Am I sending something wrong in the headers that is causing it to return the non-decoded stream?
If no, is it common practice to wrap the input stream in a ChunkedInputStream or something similar when the Transfer-Encoding header is present?
This is solved. As expected, it was something basic in my case.
The legacy system I was using provided hand-rolled methods to simplify opening an HTTP connection, sending headers and then sending the content via a POST through an OutputStream. I didn't realise it (the code was in a rather obscure location), but the behind-the-scenes helpers were detecting that I had not specified a Content-Length, so they added a Transfer-Encoding: chunked header and wrapped the OutputStream in a ChunkedOutputStream. This resulted in me double-encoding the contents, hence my endpoint's (seeming) inability to decode it.
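For what it's worth, the standard HttpURLConnection applies the same either/or logic, and choosing a mode explicitly up front avoids this kind of accidental double-wrapping. A minimal sketch (the URL is a placeholder, not the legacy helper itself):

import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class PostModes {
    public static void main(String[] args) throws IOException {
        byte[] body = "some payload".getBytes("UTF-8");
        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://example.com/upload").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);

        // Either declare the exact length up front, so Content-Length is sent...
        conn.setFixedLengthStreamingMode(body.length);
        // ...or let the connection chunk the body itself (mutually exclusive):
        // conn.setChunkedStreamingMode(0); // 0 = default chunk size

        try (OutputStream out = conn.getOutputStream()) {
            out.write(body); // write the raw body; do not chunk it yourself
        }
        System.out.println("Response: " + conn.getResponseCode());
    }
}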
Case closed.
I'm working on a browser/proxy oriented project where I need to download webpages. After sending a custom HTTP request to a web server I start listening for a server response.
When reading the response, I check the response headers for a Content-Length header. If I get one, it's easy to determine when the server is done sending data, since I always know how many bytes I have received.
The problem occurs when the server doesn't include the Content-Length header and keeps the connection open for further requests. For example, the Google server responds with gzipped content but doesn't include a content length. How do I know when to stop waiting for more data and close the connection?
I have considered using a timeout value to close the connection when no data has been received for a while, but this seems like the wrong way to do it. Chrome, for example, can download the same pages as me and always seems to know exactly when to close the connection.
Have a look at IETF RFC 2616; search for chunked encoding and Content-Range.
HTTP is designed to return content of unknown length, as in:
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

25
This is the data in the first chunk
1C
and this is the second one
3
con
8
sequence
0

(source: Wikipedia)
I would suggest forcing a Connection: close header, so you can be sure the server closes the connection once the output is finished, whether or not Content-Length is set. Performance will be somewhat affected by this.
There are two cases you can expect:
1. socket close
2. socket timeout
Usually the socket will be closed; it also makes sense to set a socket timeout.
Remember that
int read(byte[] b, int off, int len)
returns the number of bytes actually read into the byte[] argument before the socket was closed or timed out (or before len bytes were reached).
Regards.
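A minimal sketch of that pattern (the 15-second timeout is an arbitrary choice): read until the server closes the socket, and treat a timeout as a fallback end-of-data signal:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadUntilClose {
    static byte[] readAll(Socket socket) throws IOException {
        socket.setSoTimeout(15000); // give up if no data arrives for 15 seconds
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        InputStream in = socket.getInputStream();
        try {
            int n;
            while ((n = in.read(buffer)) != -1) { // -1 means the server closed the socket
                out.write(buffer, 0, n);
            }
        } catch (SocketTimeoutException e) {
            // Fallback: no data for a while, so assume the response is complete.
        }
        return out.toByteArray();
    }
}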
I'm using Java's HttpServer with a web service implemented via a WebServiceProvider.
I see that no matter what the client requests, the response is chunked, and I need it to have a content length.
So I'm assuming the problem is in the server and not in the web service provider, right?
And how can I configure the HTTP response to use Content-Length instead of chunked encoding?
HttpServer m_server = HttpServer.create();
Endpoint ep = Endpoint.create(new DownloadFileProvider()); // DownloadFileProvider: the Provider implementation (name assumed)
HttpContext epContext = m_server.createContext("/DownloadFile");
ep.publish(epContext);
I assume you're talking about the com.sun.net.httpserver HttpServer. I further assume that you're connecting the server to the service with a call to Endpoint.publish, using some service provider which supports HttpServer.
The key is in the HttpExchange.sendResponseHeaders method:
If the response length parameter is greater than zero, this specifies an exact number of bytes to send and the application must send that exact amount of data. If the response length parameter is zero, then chunked transfer encoding is used and an arbitrary amount of data may be sent. The application terminates the response body by closing the OutputStream.
So, as long as the handler passes a positive value for responseLength, Content-Length is used. Of course, to do that, it has to know how much data it is going to send ahead of time, which it might well not. Whether it does or not depends entirely on the implementation of the binding, I'm afraid. I don't believe this is standardised; indeed, I don't believe the WebServiceProvider/HttpServer combination is standardised at all.
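To make the distinction concrete, here is a minimal hand-written handler (the port, path and payload are placeholders) showing the positive-length call that produces a Content-Length header:

import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class FixedLengthExample {
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8000), 0);
        server.createContext("/DownloadFile", new HttpHandler() {
            public void handle(HttpExchange exchange) throws IOException {
                byte[] body = "hello".getBytes("US-ASCII");
                // A positive length produces a Content-Length header;
                // a length of 0 would switch to chunked encoding instead.
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            }
        });
        server.start();
    }
}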
However, even if your provider is uncooperative, you have a recourse: write a Filter which adds buffering, and add it to the HttpContext you are using to publish the service. I think that to do this, you would have to write an implementation of HttpExchange which buffers the data written to it, pass that down the filter chain for the handler to write its response to, and then, when it comes back, write the buffered content, setting responseLength as you do so.
I am testing with a client that sends me an HTTP request with no Content-Length header, but which does have a body.
How do I extract this content without the help of the Content-Length header?
I've kept the original answer for completeness, but I've just been looking in the HTTP RFC (2616) section 4.3:
The presence of a message-body in a request is signaled by the inclusion of a Content-Length or Transfer-Encoding header field in the request's message-headers. A message-body MUST NOT be included in a request if the specification of the request method (section 5.1.1) does not allow sending an entity-body in requests. A server SHOULD read and forward a message-body on any request; if the request method does not include defined semantics for an entity-body, then the message-body SHOULD be ignored when handling the request.
So if you haven't got a content length, you must have a Transfer-Encoding (and if you haven't, you should respond with a 400 status to indicate a bad request or 411 ("length required")). At that point, you do what the Transfer-Encoding tells you :)
Now if you're dealing with a servlet API (or a similar HTTP API) it may well handle all this for you - at which point you may be able to use the technique below to read from the stream until it yields no more data, as the API will take care of it (i.e. it won't just be a raw socket stream).
If you could give us more information about your context, that would help.
Original answer
If there's no content length, that means the content continues until the end of the data (when the socket closes).
Keep reading from the input stream (e.g. writing it to a ByteArrayOutputStream to store it, or possibly a file) until InputStream.read returns -1. For example:
byte[] buffer = new byte[8192];
ByteArrayOutputStream output = new ByteArrayOutputStream();
int bytesRead;
while ((bytesRead = inputStream.read(buffer)) != -1)
{
output.write(buffer, 0, bytesRead);
}
// Now use the data in "output"
EDIT: As has been pointed out in comments, the client could be using a chunked encoding. Normally the HTTP API you're using should deal with this for you, but if you're dealing with a raw socket you'd have to handle it yourself.
The point about this being a request (and therefore the client not being able to close the connection) is an interesting one - I thought the client could just shut down the sending part, but I don't see how that maps to anything in TCP at the moment. My low-level networking knowledge isn't what it might be.
If this answer turns out to be "definitely useless" I'll delete it...
If this were a response then the message could be terminated by closing the connection. But that's not an option here because the client still needs to read the response.
Apart from Content-Length:, the other methods of determining content length are:
Transfer-Encoding: chunked
guesswork
Hopefully it's the former, in which case the request should look something like this:
POST /some/path HTTP/1.1
Host: www.example.com
Content-Type: text/plain
Transfer-Encoding: chunked

25
This is the data in the first chunk
1C
and this is the second one
3
con
8
sequence
0
(shamelessly stolen from the Wikipedia article and modified for a request)
each chunk is of the form: hex-encoded length, CRLF, data, CRLF
after the final data-carrying chunk comes a zero-length chunk with no data
after the zero-length chunk comes optional extra HTTP headers
after the optional HTTP headers comes another CRLF
See HTTPbis Part1, Section 3.3.
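If you do end up decoding the chunks yourself on a raw stream, a minimal sketch of the format described above (discarding chunk extensions and ignoring optional trailers) might look like this:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedDecoder {
    // Read one CRLF-terminated line (used for chunk-size lines).
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\r') {
            sb.append((char) b);
        }
        in.read(); // consume the '\n' after the '\r'
        return sb.toString();
    }

    // Decode a chunked body; assumes the headers have already been consumed.
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            // Hex length, possibly followed by ";extension", which we discard.
            int size = Integer.parseInt(readLine(in).split(";")[0].trim(), 16);
            if (size == 0) {
                readLine(in); // final CRLF (optional trailers are ignored here)
                return body.toByteArray();
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) { // loop until the whole chunk has been read
                int n = in.read(chunk, off, size - off);
                if (n == -1) throw new IOException("stream ended mid-chunk");
                off += n;
            }
            body.write(chunk, 0, size);
            readLine(in); // CRLF that terminates the chunk data
        }
    }
}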
Let's say I have a java program that makes an HTTP request on a server using HTTP 1.1 and doesn't close the connection. I make one request, and read all data returned from the input stream I have bound to the socket. However, upon making a second request, I get no response from the server (or there's a problem with the stream - it doesn't provide any more input). If I make the requests in order (Request, request, read) it works fine, but (request, read, request, read) doesn't.
Could someone shed some light on why this might be happening? (Code snippets follow.) No matter what I do, the second read loop's isr_reader.read() only ever returns -1.
try {
    connection = new Socket("SomeServer", port);
    con_out = connection.getOutputStream();
    con_in = connection.getInputStream();

    PrintWriter out_writer = new PrintWriter(con_out, false);
    out_writer.print("GET http://somesite HTTP/1.1\r\n");
    out_writer.print("Host: thehost\r\n");
    //out_writer.print("Content-Length: 0\r\n");
    out_writer.print("\r\n");
    out_writer.flush();

    // If we were not interpreting this data as a character stream, we might
    // need to adjust byte ordering here.
    InputStreamReader isr_reader = new InputStreamReader(con_in);
    char[] streamBuf = new char[8192];
    int amountRead;
    StringBuilder receivedData = new StringBuilder();
    while ((amountRead = isr_reader.read(streamBuf)) > 0) {
        receivedData.append(streamBuf, 0, amountRead);
    }
    // Response is processed here.

    if (connection != null && !connection.isClosed()) {
        //System.out.println("Connection Still Open...");

        out_writer.print("GET http://someSite2 HTTP/1.1\r\n");
        out_writer.print("Host: somehost\r\n");
        out_writer.print("Connection: close\r\n");
        out_writer.print("\r\n");
        out_writer.flush();

        streamBuf = new char[8192];
        receivedData.setLength(0);
        while ((amountRead = isr_reader.read(streamBuf)) != -1) {
            if (amountRead > 0) {
                receivedData.append(streamBuf, 0, amountRead);
            }
        }
    }
    // Process response here
}
Responses to questions:
Yes, I'm receiving chunked responses from the server.
I'm using raw sockets because of an outside restriction.
Apologies for the mess of code - I was rewriting it from memory and seem to have introduced a few bugs.
So the consensus is I have to either do (request, request, read) and let the server close the stream once I hit the end, or, if I do (request, read, request, read) stop before I hit the end of the stream so that the stream isn't closed.
According to your code, the only time you'll even reach the statements dealing with sending the second request is when the server closes the output stream (your input stream) after receiving/responding to the first request.
The reason for that is that your code that is supposed to read only the first response
while((amountRead = isr_reader.read(streamBuf)) > 0) {
receivedData.append(streamBuf, 0, amountRead);
}
will block until the server closes the output stream (i.e., when read returns -1) or until the read timeout on the socket elapses. In the case of the read timeout, an exception will be thrown and you won't even get to sending the second request.
The problem with HTTP responses is that they don't tell you how many bytes to read from the stream until the end of the response. This is not a big deal for HTTP 1.0 responses, because the server simply closes the connection after the response thus enabling you to obtain the response (status line + headers + body) by simply reading everything until the end of the stream.
With HTTP 1.1 persistent connections you can no longer simply read everything until the end of the stream. You first need to read the status line and the headers, line by line, and then, based on the status code and the headers (such as Content-Length) decide how many bytes to read to obtain the response body (if it's present at all). If you do the above properly, your read operations will complete before the connection is closed or a timeout happens, and you will have read exactly the response the server sent. This will enable you to send the next request and then read the second response in exactly the same manner as the first one.
P.S. Request, request, read might be "working" in the sense that your server supports request pipelining and thus receives and processes both requests, and you, as a result, read both responses into one buffer as your "first" response.
P.P.S. Make sure your PrintWriter uses the US-ASCII encoding. Otherwise, depending on your system encoding, the request line and headers of your HTTP requests might be malformed (wrong encoding).
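To illustrate, here is a minimal sketch of reading exactly one response without over-reading (it assumes the response has a Content-Length and is not chunked; a real client must also handle Transfer-Encoding: chunked and responses with no body at all):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ResponseReader {
    // Read one CRLF-terminated line using the HTTP header charset (ISO-8859-1).
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\r') {
            line.write(b);
        }
        in.read(); // consume the '\n'
        return new String(line.toByteArray(), "ISO-8859-1");
    }

    static byte[] readResponse(InputStream in) throws IOException {
        String statusLine = readLine(in); // e.g. "HTTP/1.1 200 OK"
        Map<String, String> headers = new HashMap<String, String>();
        String line;
        while (!(line = readLine(in)).isEmpty()) { // headers end at a blank line
            int colon = line.indexOf(':');
            headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                        line.substring(colon + 1).trim());
        }
        int length = Integer.parseInt(headers.get("content-length"));
        byte[] bodyBytes = new byte[length];
        int off = 0;
        while (off < length) { // read exactly Content-Length bytes, no more
            int n = in.read(bodyBytes, off, length - off);
            if (n == -1) throw new IOException("connection closed mid-body");
            off += n;
        }
        return bodyBytes; // the stream is now positioned at the next response
    }
}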
Writing a simple HTTP/1.1 client that respects the RFC is not such a difficult task.
To deal with blocking I/O when reading a socket in Java, you can use the java.nio classes.
SocketChannels give you the ability to perform non-blocking I/O.
This is useful when sending HTTP requests on a persistent connection.
Furthermore, the nio classes give better performance.
My stress tests gave the following results:
HTTP/1.0 (java.io) -> HTTP/1.0 (java.nio) = +20% faster
HTTP/1.0 (java.io) -> HTTP/1.1 (java.nio with persistent connection) = +110% faster
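For illustration, a minimal sketch of a non-blocking read with a SocketChannel and a Selector (the host, port and hard-coded request are placeholders; real code would parse the response to find the message end rather than dumping raw bytes):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class NioSketch {
    public static void main(String[] args) throws IOException {
        SocketChannel ch = SocketChannel.open(new InetSocketAddress("example.com", 80));
        ch.write(ByteBuffer.wrap(
                "GET / HTTP/1.1\r\nHost: example.com\r\nConnection: keep-alive\r\n\r\n"
                        .getBytes("US-ASCII")));
        ch.configureBlocking(false); // reads will no longer block
        Selector selector = Selector.open();
        ch.register(selector, SelectionKey.OP_READ);
        ByteBuffer buf = ByteBuffer.allocate(8192);
        while (selector.select(5000) > 0) { // wait up to 5s for readable data
            selector.selectedKeys().clear();
            int n = ch.read(buf);
            if (n == -1) break; // server closed the connection
            buf.flip();
            while (buf.hasRemaining()) {
                System.out.print((char) buf.get()); // dump the raw response
            }
            buf.clear();
        }
        ch.close();
    }
}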
Make sure you have a Connection: keep-alive in your request. This may be a moot point though.
What kind of response is the server returning? Are you using chunked transfer? If the server doesn't know the size of the response body, it can't provide a Content-Length header and has to close the connection at the end of the response body to indicate to the client that the content has ended. In that case, keep-alive won't work. If you're generating content on the fly with PHP, JSP, etc., you can enable output buffering, check the size of the accumulated body, set the Content-Length header, and flush the output buffer.
Is there a particular reason you're using raw sockets and not Java's URL Connection or Commons HTTPClient?
HTTP isn't easy to get right. I know Commons HTTP Client can re-use connections like you're trying to do.
If there isn't a specific reason for you using Sockets this is what I would recommend :)
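For reference, a minimal sketch using Apache HttpClient 4.x (the URL is a placeholder); the default client pools connections, so consecutive requests to the same host reuse the socket:

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class PooledClientExample {
    public static void main(String[] args) throws Exception {
        CloseableHttpClient client = HttpClients.createDefault();
        try {
            for (int i = 0; i < 2; i++) { // two requests over one pooled connection
                HttpGet get = new HttpGet("http://example.com/");
                CloseableHttpResponse response = client.execute(get);
                try {
                    // Fully consuming the entity releases the connection
                    // back to the pool so it can be reused.
                    System.out.println(EntityUtils.toString(response.getEntity()));
                } finally {
                    response.close();
                }
            }
        } finally {
            client.close();
        }
    }
}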
Writing your own correct HTTP/1.1 client implementation is nontrivial; historically, most people I've seen attempt it have got it wrong. Their implementations usually ignore the spec and just do what appears to work with one particular test server - in particular, they usually ignore the requirement to be able to handle chunked responses.
Writing your own HTTP client is probably a bad idea, unless you have some VERY strange requirements.