Why does HttpURLConnection.getInputStream() take time?

I have a task to download and upload a file over HTTP on Android (Java platform).
I am using the following code to upload a file:
HttpURLConnection httpURLConnection = (HttpURLConnection) serverUrl.openConnection();
....
httpURLConnection.connect();
OutputStream os = httpURLConnection.getOutputStream();
And the following code to download a file:
HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
...
urlConnection.connect();
DataInputStream stream = new DataInputStream(urlConnection.getInputStream());
As far as I can observe, connect() takes time in both cases because it communicates with the network at that point. For the upload, getOutputStream() executes very quickly, so does that mean it is not communicating with the network?
getInputStream() (in the download), however, takes some time (around 200 to 2500 ms) to execute. Does that mean it is communicating with the network at this point? If so, why?
Please comment on this and correct me if I am wrong anywhere.

HTTP is a request/response protocol. You need a TCP connection; the connect() method creates it. Then you need to send a request. You call getOutputStream() for that, and write the request body to it.
At this point nothing has been written to the network (in normal transfer mode), because the content-length header has to be set, and Java doesn't know when you've finished writing. So when you call getInputStream() (or getResponseCode()), Java sets the content-length header, writes the request, waits for the server to start generating a response, reads all the response headers, and then gives you an input stream positioned at the beginning of the body of the response. All those steps take time.
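The sequence described above can be observed without a packet sniffer. The following is a minimal sketch for the desktop JDK (not Android), assuming the JDK's built-in com.sun.net.httpserver.HttpServer as a throwaway local server; the class name DeferredRequestDemo and the /upload path are illustrative only. It counts how many requests the server has seen after the body is written and again after getInputStream():

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: counts server-side requests while a buffered POST is built.
final class DeferredRequestDemo {

    /** Returns {requests seen after writing the body, requests seen after getInputStream()}. */
    static int[] run() throws Exception {
        AtomicInteger hits = new AtomicInteger();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/upload", exchange -> {
            hits.incrementAndGet();                      // runs only once a full request head has arrived
            exchange.getRequestBody().readAllBytes();    // drain the uploaded body
            byte[] reply = "ok".getBytes(StandardCharsets.US_ASCII);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream body = exchange.getResponseBody()) {
                body.write(reply);
            }
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://127.0.0.1:" + port + "/upload").toURL().openConnection();
            conn.setDoOutput(true);                      // a body will be sent, so this becomes a POST
            try (OutputStream os = conn.getOutputStream()) {
                os.write("file contents".getBytes(StandardCharsets.UTF_8));
            }                                            // buffered mode: the body is only held in memory
            int afterWrite = hits.get();                 // no request on the wire yet
            try (InputStream in = conn.getInputStream()) { // headers + body are sent here, response read
                in.readAllBytes();
            }
            int afterRead = hits.get();
            return new int[] { afterWrite, afterRead };
        } finally {
            server.stop(0);
        }
    }
}
```

In the default buffered mode the first count should be 0 and the second 1: getOutputStream() only hands back an in-memory buffer (the JDK implementation may already open the TCP connection at that point), and the request itself goes out inside getInputStream().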

To avoid this buffering, specify a streaming mode: either give the final length of the upload with the setFixedLengthStreamingMode method, or, if the final length is not known, switch to chunked streaming with the setChunkedStreamingMode method:
// For best performance, you should call either setFixedLengthStreamingMode(int) when the body length is known in advance,
// or setChunkedStreamingMode(int) when it is not. Otherwise HttpURLConnection will be forced to buffer the complete request body in memory
// before it is transmitted, wasting (and possibly exhausting) heap and increasing latency.
//
// see: https://developer.android.com/reference/java/net/HttpURLConnection.html
_connection.setChunkedStreamingMode(1024);
If you don't, the real transfer will occur when you call getInputStream().
See https://developer.android.com/reference/java/net/HttpURLConnection.html
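As a sketch of the fixed-length variant, assuming the body is available as a byte array: the helper names StreamingUpload, upload and roundTrip, and the local test server (the desktop JDK's com.sun.net.httpserver.HttpServer), are hypothetical and not part of any Android API. The HttpURLConnection calls themselves (setDoOutput, setFixedLengthStreamingMode, getOutputStream, getResponseCode) are the standard java.net ones:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

final class StreamingUpload {

    /** Uploads data as a POST body without buffering it all first; returns the status code. */
    static int upload(URL url, byte[] data) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setDoOutput(true);
            conn.setRequestMethod("POST");
            // Length known up front: Content-Length goes out with the headers when
            // getOutputStream() is called, and the body is streamed as it is written.
            conn.setFixedLengthStreamingMode(data.length);
            try (OutputStream os = conn.getOutputStream()) {
                os.write(data);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Self-check against a throwaway local server; returns the body the server received. */
    static String roundTrip(String text) throws Exception {
        AtomicReference<String> received = new AtomicReference<>();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/upload", exchange -> {
            received.set(new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8));
            exchange.sendResponseHeaders(200, -1);   // -1: empty response body
            exchange.close();
        });
        server.start();
        try {
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/upload").toURL();
            int status = upload(url, text.getBytes(StandardCharsets.UTF_8));
            return status == 200 ? received.get() : null;
        } finally {
            server.stop(0);
        }
    }
}
```

For a body of unknown length, replacing the setFixedLengthStreamingMode call with setChunkedStreamingMode(1024) gives the chunked variant, provided the server accepts chunked request bodies.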

Related

When does HttpURLConnection on Android really call the request

I have the following code:
HttpURLConnection conn = null;
BufferedReader in = null;
StringBuilder sb = null;
InputStream is = null;
conn = (HttpURLConnection) url.openConnection();
// Break-point A
conn.setDoInput(true);
conn.setDoOutput(true);
conn.setRequestMethod("POST");
// Break-point B
conn.setRequestProperty("X-TP-APP", Constants.X_TP_APP);
conn.setRequestProperty("X-TP-DEVICE", Constants.X_TP_DEVICE);
conn.setRequestProperty("X-TP-LOCALE", Constants.X_TP_LOCALE);
conn.setRequestProperty("Content-Type", contentType);
conn.setRequestProperty("Accept", accept);
conn.setRequestProperty("Authorization", SystemApi.TOKEN_STR);
conn.setUseCaches(false);
conn.setConnectTimeout(30000);
conn.getOutputStream().write(req.getBytes("UTF-8"));
conn.getOutputStream().flush();
conn.getOutputStream().close();
is = conn.getInputStream();
in = new BufferedReader(new InputStreamReader(is));
int statusCode = conn.getResponseCode();
// Break-point C
The code runs fine when breakpoints A and B are disabled.
I tried to find out when HttpURLConnection actually sends the request, so I placed breakpoint A after conn = getConnection(strURL);
and continued execution, but at the end, at breakpoint C, the server returned 401 Unauthorized, which means my Authorization header was not in the request.
It seems as if we open a connection first and then have to set the headers as fast as we can; if we are not fast enough, the request is sent anyway, which doesn't seem right.
My question and concern:
When does HttpURLConnection really call the request?
Is this what is actually happening? Is this the correct way to do so?
Is there a better way to make sure the header is set before calling the request?

Per the docs, the actual connection is made when the connect() method is invoked on the [Http]URLConnection. That may be done manually, or it may be done implicitly by certain other methods. The Javadocs for URLConnection.connect() say, in part:
URLConnection objects go through two phases: first they are created, then they are connected. After being created, and before being connected, various options can be specified (e.g., doInput and UseCaches). After connecting, it is an error to try to set them. Operations that depend on being connected, like getContentLength, will implicitly perform the connection, if necessary.
Note in particular the last sentence. I don't see anything in your code that would require the connection to be established until the first conn.getOutputStream(), and I read the docs as saying that the connection object will not enter the "connected" state until some method is invoked on it that requires that. Until such a time, it is ok to set connection properties.
Moreover, the docs definitely state that methods that set properties on the connection (and setRequestProperty() in particular) will throw an IllegalStateException if invoked when the connection object is already connected.
It is possible that your Java library is buggy in the manner you describe, but that would certainly be in conflict with the API specification. I think it's more likely that the explanation for the behavior you observe is different, and I recommend you capture and analyze the actual HTTP traffic to determine what's really going on.

What actually happened is that, in debug mode, I had conn.getResponseCode() in the debugger's expressions, which forced getResponseCode() to run.
Because the connection was not yet connected, getResponseCode() called connect() itself, before the request had been fully prepared.
Hence the server returned 401.
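The effect described in the last two answers can be reproduced in a few lines. This is a hypothetical sketch (desktop JDK, with com.sun.net.httpserver.HttpServer as a local stand-in server, and a made-up token value): headers set before the first call that needs the response are sent, and once getResponseCode() has implicitly connected, further setRequestProperty() calls throw IllegalStateException, as the Javadocs describe:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo: headers set before the implicit connect are sent;
// after getResponseCode() has connected, the request can no longer be changed.
final class HeaderTimingDemo {

    /** Returns {Authorization seen by the server, status code, whether a late setRequestProperty failed}. */
    static String[] run() throws Exception {
        AtomicReference<String> seenAuth = new AtomicReference<>();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            seenAuth.set(exchange.getRequestHeaders().getFirst("Authorization"));
            byte[] reply = "ok".getBytes(StandardCharsets.US_ASCII);  // small non-empty body
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream body = exchange.getResponseBody()) {
                body.write(reply);
            }
        });
        server.start();
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(
                    "http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL().openConnection();
            // Not connected yet: there is no race, configuration can take as long as needed.
            conn.setRequestProperty("Authorization", "Bearer test-token");
            // Implicitly connects, sends the request and reads the status line
            // (the same thing a debugger watch on getResponseCode() triggers).
            int status = conn.getResponseCode();
            boolean lateSetFailed;
            try {
                conn.setRequestProperty("X-Too-Late", "1");
                lateSetFailed = false;
            } catch (IllegalStateException alreadyConnected) {
                lateSetFailed = true;
            }
            conn.disconnect();
            return new String[] { seenAuth.get(), String.valueOf(status), String.valueOf(lateSetFailed) };
        } finally {
            server.stop(0);
        }
    }
}
```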

Since Android uses the same HttpURLConnection API, I captured the packet exchange to see what happens under the hood.
I described my experiment in this post: "Can you explain the HttpURLConnection connection process?"
Here is an outline of the network activity for your program:
At breakpoint A: no physical connection is made to the remote server. You get a logical handle to a local connection object.
At breakpoint B: you have only configured the local connection object, nothing more.
conn.getOutputStream(): the network connection starts here, but no payload is transferred to the server.
conn.getInputStream(): the payload (HTTP headers and content) is sent to the server, and you get the response (buffered into the input stream, along with the response code etc.).
To answer your questions:
When does HttpURLConnection really call the request?
getInputStream() triggers the network layer to send the application payload and receive the response.
Is this what is actually happening? Is this the correct way to do so?
No. openConnection() does not initiate network activity. You are getting back a local handle for future connection, not an active connection.
Is there a better way to make sure the header is set before calling the request?
You don't need to do anything to make sure the headers are set in time: they are not sent to the server until you ask for the response (for example by getting the response code or opening an InputStream).

How come URL.openConnection() allows me to read headers?

I was recently experimenting with Java networking and found something a bit odd. Suppose you have:
URL url = new URL("http://www.google.com");
URLConnection con = url.openConnection();
Then I can call methods like con.getContentLength() and they give me correct values, even though I never invoked con.connect(). How can that be? Where does URLConnection get those headers from? I haven't invoked con.connect() yet, so no request has been sent and no headers should be available at that point.

The actual TCP connect happens implicitly when you call any method that requires the response, such as getContentLength(), getInputStream() or getResponseCode(), not at openConnection(). The request is sent at that point too.
The exception is when you are using one of the streaming modes for a PUT or POST with request content; in that case the connection is opened when you start writing the request.
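A quick way to confirm this is to count requests on the server side. The sketch below is illustrative only (desktop JDK, a local com.sun.net.httpserver.HttpServer, and a made-up class name ImplicitConnectDemo): nothing reaches the server after openConnection() or setRequestProperty(), and getContentLength() alone is enough to send the GET and read the headers:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

final class ImplicitConnectDemo {

    /** Returns {requests before getContentLength(), requests after, reported content length}. */
    static int[] run() throws Exception {
        AtomicInteger hits = new AtomicInteger();
        byte[] page = "hello, world".getBytes(StandardCharsets.US_ASCII); // 12 bytes
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            hits.incrementAndGet();
            exchange.sendResponseHeaders(200, page.length);       // sends Content-Length: 12
            try (OutputStream body = exchange.getResponseBody()) {
                body.write(page);
            }
        });
        server.start();
        try {
            URLConnection con = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/")
                    .toURL().openConnection();               // local object only, nothing sent
            con.setRequestProperty("Accept", "text/plain");   // still allowed: not connected yet
            int before = hits.get();
            int length = con.getContentLength();              // connects, sends the GET, reads headers
            int after = hits.get();
            try (InputStream in = con.getInputStream()) {     // the already-fetched response, no new request
                in.readAllBytes();
            }
            return new int[] { before, after, length };
        } finally {
            server.stop(0);
        }
    }
}
```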

Apache HTTPClient Streaming HTTP POST Request?

I'm trying to build a "full-duplex" HTTP streaming request using Apache HTTPClient.
In my first attempt, I tried using the following request code:
URL url=new URL(/* code goes here */);
HttpPost request=new HttpPost(url.toString());
request.addHeader("Connection", "close");
PipedOutputStream requestOutput=new PipedOutputStream();
PipedInputStream requestInput=new PipedInputStream(requestOutput, DEFAULT_PIPE_SIZE);
ContentType requestContentType=getContentType();
InputStreamEntity requestEntity=new InputStreamEntity(requestInput, -1, requestContentType);
request.setEntity(requestEntity);
HttpEntity responseEntity=null;
HttpResponse response=getHttpClient().execute(request); // <-- Hanging here
try {
if(response.getStatusLine().getStatusCode() != 200)
throw new IOException("Unexpected status code: "+response.getStatusLine().getStatusCode());
responseEntity = response.getEntity();
}
finally {
if(responseEntity == null)
request.abort();
}
InputStream responseInput=responseEntity.getContent();
ContentType responseContentType;
if(responseEntity.getContentType() != null)
responseContentType = ContentType.parse(responseEntity.getContentType().getValue());
else
responseContentType = DEFAULT_CONTENT_TYPE;
Reader responseStream=decode(responseInput, responseContentType);
Writer requestStream=encode(requestOutput, getContentType());
The request hangs at the line indicated above. It seems that the code is trying to send the entire request before it gets the response. In retrospect, this makes sense. However, it's not what I was hoping for. :)
Instead, I was hoping to send the request headers with Transfer-Encoding: chunked, receive a response header of HTTP/1.1 200 OK with a Transfer-Encoding: chunked header of its own, and then I'd have a full-duplex streaming HTTP connection to work with.
Happily, HttpClient also has an NIO-based asynchronous client with good usage examples (like this one). My questions are:
Is my interpretation of the synchronous HTTPClient behavior correct? Or is there something I can do to continue using the (simpler) synchronous HTTPClient in the manner I described?
Does the NIO-based client wait to send the whole request before seeking a response? Or will I be able to send the request incrementally and receive the response incrementally at the same time?
If HTTPClient will not support this modality, is there another HTTP client library that will? Or should I be planning to write a (minimal) HTTP client to support this modality?

Here is my view after skimming the code:
I don't completely agree that a non-200 response means failure: 2xx responses generally indicate success. Check the wiki for more details.
For any TCP request, I would recommend receiving the entire response to confirm that it is valid, because most client implementations cannot make use of a partial response and will treat it as a bad one. (Imagine a case where the server is sending 2 MB of data and goes down partway through.)

A separate thread must be writing to the OutputStream for your code to work.
The code above provides the HTTPClient with a PipedInputStream.
PipedInputStream makes bytes available as they are written to the corresponding OutputStream.
The code above does not write to the OutputStream (which must be done by a separate thread).
Therefore the code hangs exactly where your comment is.
Under the hood, the Apache client calls inputStream.read(), which, in the case of piped streams, requires that outputStream.write(bytes) was called previously (by a separate thread).
Since you aren't pumping bytes into the associated OutputStream from a separate thread, the InputStream just sits and waits for the OutputStream to be written to by "some other thread."
From the JavaDocs:
A piped input stream should be connected to a piped output stream; the piped input stream then provides whatever data bytes are written to the piped output stream. Typically, data is read from a PipedInputStream object by one thread and data is written to the corresponding PipedOutputStream by some other thread. Attempting to use both objects from a single thread is not recommended, as it may deadlock the thread. The piped input stream contains a buffer, decoupling read operations from write operations, within limits. A pipe is said to be "broken" if a thread that was providing data bytes to the connected piped output stream is no longer alive.
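The rule in the Javadoc translates into the following pattern. This is a plain-Java sketch, not Apache HttpClient code: the calling thread's readAllBytes() stands in for the HTTP client consuming the InputStreamEntity, and PipeDemo/pump are hypothetical names. The writing side runs on its own thread, so the reader can make progress:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class PipeDemo {

    /** Feeds text through a pipe: a producer thread writes, the calling thread reads. */
    static String pump(String text) throws Exception {
        PipedOutputStream sink = new PipedOutputStream();
        // Small buffer so the producer really has to wait for the reader.
        PipedInputStream source = new PipedInputStream(sink, 16);
        Thread producer = new Thread(() -> {
            try (OutputStream out = sink) {              // closing signals end-of-stream to the reader
                out.write(text.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, "pipe-producer");
        producer.start();
        // Stand-in for the HTTP client reading the request entity. Doing the
        // write above on this same thread would block indefinitely once the
        // 16-byte buffer filled up, because nobody would be reading.
        byte[] received = source.readAllBytes();
        producer.join();
        source.close();
        return new String(received, StandardCharsets.UTF_8);
    }
}
```

With HttpClient, this means the writer thread has to be started before execute() is called, because (as observed in the question) execute() blocks while it reads the request entity.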
Note: since piped streams and concurrency were not mentioned in your problem statement, it seems to me that they're not necessary. As a sanity check, try wrapping a ByteArrayInputStream in the entity object first; that should help you narrow down the issue.
Update
Incidentally, I wrote an inversion of Apache's HTTP Client API [PipedApacheClientOutputStream] which provides an OutputStream interface for HTTP POST using Apache Commons HTTP Client 4.3.4. This may be close to what you are looking for...
The calling code looks like this:
// Calling-code manages thread-pool
ExecutorService es = Executors.newCachedThreadPool(
new ThreadFactoryBuilder()
.setNameFormat("apache-client-executor-thread-%d")
.build());
// Build configuration
PipedApacheClientOutputStreamConfig config = new
PipedApacheClientOutputStreamConfig();
config.setUrl("http://localhost:3000");
config.setPipeBufferSizeBytes(1024);
config.setThreadPool(es);
config.setHttpClient(HttpClientBuilder.create().build());
// Instantiate OutputStream
PipedApacheClientOutputStream os = new
PipedApacheClientOutputStream(config);
// Write to OutputStream
os.write(...);
try {
os.close();
} catch (IOException e) {
logger.error(e.getLocalizedMessage(), e);
}
// Do stuff with HTTP response
...
// Close the HTTP response
os.getResponse().close();
// Finally, shut down thread pool
// This must occur after retrieving the response if you are
// interested in the POST result
es.shutdown();
Note - In practice the same client, executor service, and config will likely be reused throughout the life of the application, so the outer prep and close code in the above example will likely live in bootstrap/init and finalization code rather than directly inline with the OutputStream instantiation.

HTTP 1.1 Persistent Connections using Sockets in Java

Let's say I have a java program that makes an HTTP request on a server using HTTP 1.1 and doesn't close the connection. I make one request, and read all data returned from the input stream I have bound to the socket. However, upon making a second request, I get no response from the server (or there's a problem with the stream - it doesn't provide any more input). If I make the requests in order (Request, request, read) it works fine, but (request, read, request, read) doesn't.
Could someone shed some light on why this might be happening? (Code snippets follow.) No matter what I do, the second read loop's isr_reader.read() only ever returns -1.
try{
connection = new Socket("SomeServer", port);
con_out = connection.getOutputStream();
con_in = connection.getInputStream();
PrintWriter out_writer = new PrintWriter(con_out, false);
out_writer.print("GET http://somesite HTTP/1.1\r\n");
out_writer.print("Host: thehost\r\n");
//out_writer.print("Content-Length: 0\r\n");
out_writer.print("\r\n");
out_writer.flush();
// If we were not interpreting this data as a character stream, we might need to adjust byte ordering here.
InputStreamReader isr_reader = new InputStreamReader(con_in);
char[] streamBuf = new char[8192];
int amountRead;
StringBuilder receivedData = new StringBuilder();
while((amountRead = isr_reader.read(streamBuf)) > 0){
receivedData.append(streamBuf, 0, amountRead);
}
// Response is processed here.
if(connection != null && !connection.isClosed()){
//System.out.println("Connection Still Open...");
out_writer.print("GET http://someSite2 HTTP/1.1\r\n");
out_writer.print("Host: somehost\r\n");
out_writer.print("Connection: close\r\n");
out_writer.print("\r\n");
out_writer.flush();
streamBuf = new char[8192];
amountRead = 0;
receivedData.setLength(0);
while((amountRead = isr_reader.read(streamBuf)) > 0){
receivedData.append(streamBuf, 0, amountRead);
}
}
// Process response here
}
Responses to questions:
Yes, I'm receiving chunked responses from the server.
I'm using raw sockets because of an outside restriction.
Apologies for the mess of code - I was rewriting it from memory and seem to have introduced a few bugs.
So the consensus is I have to either do (request, request, read) and let the server close the stream once I hit the end, or, if I do (request, read, request, read) stop before I hit the end of the stream so that the stream isn't closed.

According to your code, the only time you'll even reach the statements dealing with sending the second request is when the server closes the output stream (your input stream) after receiving/responding to the first request.
The reason for that is that your code that is supposed to read only the first response
while((amountRead = isr_reader.read(streamBuf)) > 0) {
receivedData.append(streamBuf, 0, amountRead);
}
will block until the server closes the output stream (i.e., when read returns -1) or until the read timeout on the socket elapses. In the case of the read timeout, an exception will be thrown and you won't even get to sending the second request.
The problem with HTTP responses is that nothing tells you in advance how many bytes you must read from the stream to reach the end of the response. This is not a big deal for HTTP 1.0 responses, because the server simply closes the connection after the response, thus enabling you to obtain the response (status line + headers + body) by simply reading everything until the end of the stream.
With HTTP 1.1 persistent connections you can no longer simply read everything until the end of the stream. You first need to read the status line and the headers, line by line, and then, based on the status code and the headers (such as Content-Length) decide how many bytes to read to obtain the response body (if it's present at all). If you do the above properly, your read operations will complete before the connection is closed or a timeout happens, and you will have read exactly the response the server sent. This will enable you to send the next request and then read the second response in exactly the same manner as the first one.
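The procedure in the previous paragraph can be sketched as a small reader class. This is a minimal sketch under stated assumptions, not a complete HTTP/1.1 implementation: the class name Http11ResponseReader is made up, it assumes the request was not HEAD, and it does not handle 1xx interim responses or repeated header names. It reads bytes directly rather than through an InputStreamReader, because Content-Length and chunk sizes count bytes, not characters, and a Reader may buffer ahead into the next response.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch of HTTP/1.1 response framing on a raw stream (assumes non-HEAD requests,
// no 1xx interim responses, no repeated header names).
final class Http11ResponseReader {

    static final class Response {
        final int status;
        final Map<String, String> headers;   // keys lower-cased
        final byte[] body;
        Response(int status, Map<String, String> headers, byte[] body) {
            this.status = status;
            this.headers = headers;
            this.body = body;
        }
    }

    private final InputStream in;   // e.g. new BufferedInputStream(socket.getInputStream())

    Http11ResponseReader(InputStream in) { this.in = in; }

    /** Reads exactly one response, leaving the stream positioned at the next one. */
    Response read() throws IOException {
        String statusLine = readLine();                         // e.g. "HTTP/1.1 200 OK"
        int status = Integer.parseInt(statusLine.split(" ")[1]);
        Map<String, String> headers = new HashMap<>();
        String line;
        while (!(line = readLine()).isEmpty()) {                // a blank line ends the headers
            int colon = line.indexOf(':');
            headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                        line.substring(colon + 1).trim());
        }
        byte[] body;
        if (status == 204 || status == 304) {
            body = new byte[0];                                 // these never carry a body
        } else if (headers.getOrDefault("transfer-encoding", "").toLowerCase(Locale.ROOT).contains("chunked")) {
            body = readChunked();
        } else if (headers.containsKey("content-length")) {
            body = readExactly(Integer.parseInt(headers.get("content-length")));
        } else {
            body = in.readAllBytes();                           // no framing: body ends when the server closes
        }
        return new Response(status, headers, body);
    }

    private byte[] readChunked() throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine();
            int semi = sizeLine.indexOf(';');                   // drop chunk extensions
            int size = Integer.parseInt((semi < 0 ? sizeLine : sizeLine.substring(0, semi)).trim(), 16);
            if (size == 0) break;
            body.write(readExactly(size));
            readLine();                                         // CRLF after the chunk data
        }
        while (!readLine().isEmpty()) { }                       // skip trailers up to the blank line
        return body.toByteArray();
    }

    private byte[] readExactly(int n) throws IOException {
        byte[] data = in.readNBytes(n);
        if (data.length != n) throw new EOFException("connection closed mid-body");
        return data;
    }

    // Reads one CRLF-terminated line byte by byte, so no body bytes are consumed.
    private String readLine() throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') sb.append((char) b);
        }
        if (b == -1 && sb.length() == 0) throw new EOFException("connection closed");
        return sb.toString();
    }
}
```

On a real connection you would create one reader per socket and call read() once per request sent; because read() stops exactly at the end of each response, the (request, read, request, read) pattern from the question can work without waiting for the server to close the connection.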
P.S. Request, request, read might be "working" in the sense that your server supports request pipelining and thus receives and processes both requests, and you, as a result, read both responses into one buffer as your "first" response.
P.P.S. Make sure your PrintWriter is using the US-ASCII encoding. Otherwise, depending on your system's default encoding, the request line and headers of your HTTP requests might be malformed (wrong encoding).
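One way to follow this advice with the question's PrintWriter-style code is to wrap the socket stream in an OutputStreamWriter with an explicit charset (for example new PrintWriter(new OutputStreamWriter(con_out, StandardCharsets.US_ASCII), false)) and to write CRLFs yourself, since println() would use the platform line separator. A minimal sketch; RequestWriter and writeGet are hypothetical names:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class RequestWriter {

    /** Writes a GET request head with explicit CRLFs in US-ASCII; does not close the stream. */
    static void writeGet(OutputStream out, String host, String path, boolean lastRequest) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.US_ASCII);
        w.write("GET " + path + " HTTP/1.1\r\n");
        w.write("Host: " + host + "\r\n");
        if (lastRequest) {
            w.write("Connection: close\r\n");   // ask the server to close after this response
        }
        w.write("\r\n");                         // blank line ends the request head
        w.flush();                               // flush, but keep the socket stream open
    }
}
```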

Writing a simple http/1.1 client respecting the RFC is not such a difficult task.
To solve the problem of blocking I/O when reading from a socket in Java, you must use the java.nio classes.
SocketChannel makes non-blocking I/O possible.
This is necessary to send HTTP requests on a persistent connection.
Furthermore, the NIO classes give better performance.
My stress tests gave the following results:
HTTP/1.0 (java.io) -> HTTP/1.0 (java.nio) = +20% faster
HTTP/1.0 (java.io) -> HTTP/1.1 (java.nio with persistent connection) = +110% faster

Make sure you have a Connection: keep-alive in your request. This may be a moot point though.
What kind of response is the server returning? Are you using chunked transfer? If the server doesn't know the size of the response body, it can't provide a Content-Length header and has to close the connection at the end of the response body to indicate to the client that the content has ended. In this case, the keep-alive won't work. If you're generating content on-the-fly with PHP, JSP etc., you can enable output buffering, check the size of the accumulated body, push the Content-Length header and flush the output buffer.

Is there a particular reason you're using raw sockets and not Java's URLConnection or Commons HttpClient?
HTTP isn't easy to get right. I know Commons HTTP Client can re-use connections like you're trying to do.
If there isn't a specific reason for you using Sockets this is what I would recommend :)

Writing your own correct client HTTP/1.1 implementation is nontrivial; historically most people who I've seen attempt it have got it wrong. Their implementation usually ignores the spec and just does what appears to work with one particular test server - in particular, they usually ignore the requirement to be able to handle chunked responses.
Writing your own HTTP client is probably a bad idea, unless you have some VERY strange requirements.

Does new URL(...).openConnection() necessarily imply a POST?

If I create an HTTP java.net.URL and then call openConnection() on it, does it necessarily imply that an HTTP POST is going to happen? I know that openStream() implies a GET. If so, how do you perform one of the other HTTP verbs without having to work with the raw socket layer?

If you retrieve the URLConnection object using openConnection(), it doesn't actually start communicating with the server. That doesn't happen until you get a stream from the URLConnection. When you first get the connection, you can add or change headers and other connection properties before actually opening it.
URLConnection's life cycle is a bit odd. It doesn't send the headers to the server until you've gotten one of the streams. If you just get the input stream, then I believe it does a GET, sends the headers, and then lets you read the response. If you get the output stream, then I believe it sends the request as a POST, since it assumes you'll be writing data to it (you need to call setDoOutput(true) first; HttpURLConnection throws a ProtocolException otherwise). As soon as you get the input stream, the output stream is closed and it waits for the response from the server.
For example, this should do a POST:
URL myURL = new URL("http://example.com/my/path");
URLConnection conn = myURL.openConnection();
conn.setDoOutput(true);
conn.setDoInput(true);
OutputStream os = conn.getOutputStream();
os.write("Hi there!".getBytes(StandardCharsets.UTF_8)); // OutputStream takes bytes, not a String
os.close();
InputStream is = conn.getInputStream();
// read stuff here
While this would do a GET:
URL myURL = new URL("http://example.com/my/path");
URLConnection conn = myURL.openConnection();
conn.setDoOutput(false);
conn.setDoInput(true);
InputStream is = conn.getInputStream();
// read stuff here
URLConnection will also do other weird things. If the server specifies a content length, URLConnection will keep the underlying input stream open until it receives that much data, even if you explicitly close it. This caused a lot of problems for us, as it made shutting our client down cleanly a bit hard: the URLConnection would keep the network connection open. This problem probably exists even if you just use openStream(), though.

No, it does not. But if the protocol of the URL is HTTP, you'll get an HttpURLConnection as the return object. This class has a setRequestMethod method to specify which HTTP method you want to use.
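A minimal sketch of both answers, for the desktop JDK and using a throwaway local com.sun.net.httpserver.HttpServer that echoes the method it receives (MethodDemo and call are hypothetical names): reading only produces a GET, writing a body with the default method is sent as a POST, and setRequestMethod() selects any other verb:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class MethodDemo {

    /** Sends one request and returns the method name the server echoed back. */
    static String call(String url, String method, byte[] body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        if (method != null) {
            conn.setRequestMethod(method);           // otherwise the default is GET
        }
        if (body != null) {
            conn.setDoOutput(true);                  // required before getOutputStream()
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body);
            }
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.US_ASCII);
        }
    }

    /** Runs four requests against a local server that echoes the request method. */
    static List<String> run() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            exchange.getRequestBody().readAllBytes();            // drain any request body
            byte[] reply = exchange.getRequestMethod().getBytes(StandardCharsets.US_ASCII);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            byte[] data = "Hi there!".getBytes(StandardCharsets.UTF_8);
            List<String> seen = new ArrayList<>();
            seen.add(call(url, null, null));       // only reads: GET
            seen.add(call(url, null, data));       // writes with the default method: sent as POST
            seen.add(call(url, "PUT", data));      // explicit verb with a body
            seen.add(call(url, "DELETE", null));   // explicit verb without a body
            return seen;
        } finally {
            server.stop(0);
        }
    }
}
```

The GET-to-POST switch is the JDK implementation's behavior when getOutputStream() is called on a connection still set to GET; calling setRequestMethod("POST") explicitly makes the intent clearer.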
If you want to do more sophisticated stuff you're probably better off using a library like Jakarta HttpClient.
