Retrieve redirected URL with Java / HttpURLConnection - java

Given a URL (String ref), I am attempting to retrieve the redirected URL as follows:
HttpURLConnection con = (HttpURLConnection)new URL(ref).openConnection();
con.setInstanceFollowRedirects(false);
con.setRequestProperty("User-Agent","");
int responseType = con.getResponseCode()/100;
while (responseType == 1)
{
    Thread.sleep(10);
    responseType = con.getResponseCode()/100;
}
if (responseType == 3)
    return con.getHeaderField("Location");
return con.getURL().toString();
I am having several (conceptual and technical) problems with it:
Conceptual problem:
It works in most cases, but I don't quite understand how.
All methods of the 'con' instance are called AFTER the connection is opened (when 'con' is instantiated).
So how do they affect the actual result?
How come calling 'setInstanceFollowRedirects' affects the returned value of 'getHeaderField'?
Is there any point calling 'getResponseCode' over and over until the returned value is not 1xx?
Bottom line, my general question here: is there another request/response sent through the connection every time one of these methods is invoked?
Technical problem:
Sometimes the response-code is 3xx, but 'getHeaderField' does not return the "final" URL.
I tried calling my code with the returned value of 'getHeaderField' until the response-code was 2xx.
But in most other cases where the response-code is 3xx, 'getHeaderField' DOES return the "final" URL, and if I call my code with this URL then I get an empty string.
Can you please advise how to approach the two problems above in order to have a "100% proof" code for retrieving the "final" URL?
Please ignore cases where the response-code is 4xx or 5xx (or anything else other than 1xx / 2xx / 3xx for that matter).
Thanks

Conceptual problems:
0.) Can one URLConnection or HttpURLConnection object be reused?
No, you cannot reuse such an object. You can use it to fetch the content of one URL exactly once. You cannot use it to retrieve another URL, nor to fetch the content twice (speaking on the network level).
If you want to fetch another URL, or the same URL a second time, you have to call the openConnection() method of the URL class again to instantiate a new connection object.
1.) When is the URLConnection actually connected?
The method name openConnection() is misleading. It only instantiates the connection object. It does not do anything on the network level.
The interaction on the network level starts in this line, which implicitly connects the connection (= the TCP socket under the hood is opened and data is sent and received):
int responseType = con.getResponseCode()/100;
Alternatively, you can use HttpURLConnection.connect() to explicitly connect the connection.
2.) How does setInstanceFollowRedirects work?
setInstanceFollowRedirects(true) causes the URLs to be fetched "under the hood" again and again until there is a non-redirect response. The response code of the non-redirect response is returned by your call to getResponseCode().
UPDATE:
Yes, this lets you write simple code if you do not want to handle the redirects yourself. You can simply switch on redirect following and then read the final response from the location you were redirected to, as if no redirect had taken place.

I would be more careful in evaluating the response code. Not every 3xx code is a redirection. For example, 304 just means "Not Modified".
Look at the original definitions here.
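Putting the points above together, here is a minimal sketch of following redirects by hand. It is not the asker's code: the class name RedirectResolver, the MAX_HOPS limit, and the exact set of status codes treated as redirects are assumptions made for illustration. It opens a fresh connection per hop (a connection object serves one request), treats only 301, 302, 303, 307 and 308 as redirects so that 304 is not misread, and resolves a relative Location header against the URL that returned it. It assumes every hop is an http or https URL.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class RedirectResolver {
    /** Hypothetical safety limit so a redirect loop cannot run forever. */
    private static final int MAX_HOPS = 10;

    /** True only for status codes that point at another location (304 is excluded). */
    static boolean isRedirect(int code) {
        return code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
    }

    /** Resolves a possibly relative Location value against the URL that returned it. */
    static String resolveLocation(String base, String location) {
        return URI.create(base).resolve(location).toString();
    }

    /** Follows redirects one hop at a time and returns the last URL reached. */
    static String finalUrl(String start) throws IOException {
        String current = start;
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            // A connection object is good for one request, so open a new one per hop.
            HttpURLConnection con =
                    (HttpURLConnection) URI.create(current).toURL().openConnection();
            con.setInstanceFollowRedirects(false);
            try {
                int code = con.getResponseCode(); // the request is sent here
                String location = con.getHeaderField("Location");
                if (!isRedirect(code) || location == null) {
                    return current; // not a redirect: this is the final URL
                }
                current = resolveLocation(current, location);
            } finally {
                con.disconnect();
            }
        }
        throw new IOException("More than " + MAX_HOPS + " redirects, last URL: " + current);
    }
}
```

Calling RedirectResolver.finalUrl(ref) would replace the single-hop check in the question. Polling getResponseCode() in a loop is not needed in this sketch, because each connection object yields one response.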

Related

Sending a zero-length HTTPS PUT?

I'm working with a system that, in order to make a particular service call, requires the following:
Issue an HTTP PUT command
Set the URL to some_url_here
Set the end user certificate.
Ensure that the entity body is empty and set the Content-Length headers to 0.
Here's the method I wrote to build secure connections. I've tested the GETs; they work fine. I know the problem isn't in the certificate.
public HttpsURLConnection getSecureConnection(final URL url, final String method, final int connectTimeout,
        final int readTimeout) throws IOException {
    Validate.notNull(sslContext);
    Validate.notNull(url);
    Validate.notNull(method);
    Validate.isTrue(connectTimeout > 0);
    Validate.isTrue(readTimeout > 0);
    HttpsURLConnection connection;
    try {
        connection = (HttpsURLConnection) url.openConnection();
    } catch (final IOException ioe) {
        LOGGER.error("[CertificateLoader] Unable to open URL connection!", ioe);
        throw new IOException("Unable to open URL connection!", ioe);
    }
    connection.setSSLSocketFactory(sslContext.getSocketFactory());
    connection.setRequestMethod(method);
    connection.setConnectTimeout(connectTimeout);
    connection.setReadTimeout(readTimeout);
    connection.setHostnameVerifier(NoopHostnameVerifier.INSTANCE);
    if (method.equals("PUT")) {
        connection.setRequestProperty("Content-Length", "0");
    }
    if (connection.getContentLength() > 0) {
        Object foo = connection.getContent();
        LOGGER.error("This is what's in here: " + foo.toString());
    }
    return connection;
}
Now, the reason for that funky if at the bottom is that when I go to make the PUT call, even though I'm not putting a body on the call directly, my logs insist I'm getting a non-zero content length. So, I added that little block to try to figure out what's in there, and lo and behold it reports the following:
This is what's in here: sun.net.www.protocol.http.HttpURLConnection$HttpInputStream@70972170
Now, that sucker's in there by default. I didn't put it in there. I didn't create that object to put in there. I just created the object as is from the URL, which I created from a String elsewhere. What I need is a way to remove that HttpInputStream object, or set it to null, or otherwise tell the code that there should be no body to this PUT request, so that my server won't reject my message as being ill-formatted. Suggestions?
Now, the reason for that funky if at the bottom is that when I go to make the PUT call, even though I'm not putting a body on the call directly, my logs insist I'm getting a non-zero content length.
The way to set a zero Content-length is as follows:
connection.setDoOutput(true); // if it's PUT or POST
connection.setRequestMethod(method);
connection.getOutputStream().close(); // send a zero length request body
It is never necessary to call connection.setRequestProperty("Content-Length", "0"). Java sets it for you. Or possibly it is omitted, in which case you may be able to ensure it via
connection.setFixedLengthStreamingMode(0);
So, I added that little block to try to figure out what's in there, and lo and behold it reports the following:
This is what's in here: sun.net.www.protocol.http.HttpURLConnection$HttpInputStream@70972170
Now, that sucker's in there by default. I didn't put it in there.
Java put it there.
I didn't create that object to put in there.
Java put it there.
I just created the object as is from the URL, which I created from a String elsewhere. What I need is a way to remove that HttpInputStream object, or set it to null, or otherwise tell the code that there should be no body to this PUT request, so that my server won't reject my message as being ill-formatted.
No it isn't. It is an input stream, not a piece of content. And it is an input stream to the content of the response, not of the request. And in any case, the server is perfectly entitled to return you content in response to your request.
Your task is to:
Get the response code and log it.
If it is >=200 and <= 299, get the connection's input stream.
Otherwise get the connection's error stream.
Whichever stream you got, read it till end of stream, and log it.
That will tell you what is really happening.
I will add that a PUT without a body is a really strange thing to do. Are you sure you've understood the requirement? 411 means Length required.
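The steps listed above can be sketched roughly as follows. This is an illustration under assumptions, not the asker's method: the class name EmptyPut and its helpers are hypothetical, the response is assumed to be UTF-8 text, and the connection passed in is assumed to be freshly opened and not yet connected. It combines the two suggestions from the answer (closing the output stream without writing, and setFixedLengthStreamingMode(0)), then reads whichever stream matches the status code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

final class EmptyPut {
    /** 2xx responses are read from the input stream, everything else from the error stream. */
    static boolean isSuccess(int code) {
        return code >= 200 && code <= 299;
    }

    /** Reads a stream to its end; a null stream (no body available) yields an empty string. */
    static String readFully(InputStream in) throws IOException {
        if (in == null) {
            return "";
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        // Assumption: the server answers in UTF-8.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Sends a PUT with an empty body and returns "<status> <response body>" for logging. */
    static String send(HttpURLConnection connection) throws IOException {
        connection.setRequestMethod("PUT");
        connection.setDoOutput(true);
        connection.setFixedLengthStreamingMode(0); // announce a zero-length body
        connection.getOutputStream().close();      // write nothing, finish the request body
        int code = connection.getResponseCode();
        InputStream body = isSuccess(code)
                ? connection.getInputStream()
                : connection.getErrorStream();     // may be null if no error body was sent
        try {
            return code + " " + readFully(body);
        } finally {
            if (body != null) {
                body.close();
            }
        }
    }
}
```

Logging the returned string shows both the status code and what the server actually sent back, which is the diagnostic the answer asks for.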

When does HttpURLConnection on Android really call the request

I have the following code:
HttpURLConnection conn = null;
BufferedReader in = null;
StringBuilder sb = null;
InputStream is = null;
conn = (HttpURLConnection) url.openConnection();
// Break-point A
conn.setDoInput(true);
conn.setDoOutput(true);
conn.setRequestMethod("POST");
// Break-point B
conn.setRequestProperty("X-TP-APP", Constants.X_TP_APP);
conn.setRequestProperty("X-TP-DEVICE", Constants.X_TP_DEVICE);
conn.setRequestProperty("X-TP-LOCALE", Constants.X_TP_LOCALE);
conn.setRequestProperty("Content-Type", contentType);
conn.setRequestProperty("Accept", accept);
conn.setRequestProperty("Authorization", SystemApi.TOKEN_STR);
conn.setUseCaches(false);
conn.setConnectTimeout(30000);
conn.getOutputStream().write(req.getBytes("UTF-8"));
conn.getOutputStream().flush();
conn.getOutputStream().close();
is = conn.getInputStream();
in = new BufferedReader(new InputStreamReader(is));
int statusCode = conn.getResponseCode();
// Break-point C
The code runs fine when breakpoints A and B are disabled.
To find out when HttpURLConnection really sends the request, I placed breakpoint A after conn = getConnection(strURL);
and resumed execution, but then at breakpoint C the server returned 401 - Unauthorized, which means my Authorization header was not in the request.
It seems that we open a connection first and then set the headers as fast as we can. If we are not fast enough, the request is sent anyway, which doesn't seem right.
My question and concern:
When does HttpURLConnection really call the request?
Is this what is actually happening? Is this the correct way to do so?
Is there a better way to make sure the header is set before calling the request?
Per the docs, the actual connection is made when the connect() method is invoked on the [Http]URLConnection. That may be done manually, or it may be done implicitly by certain other methods. The Javadocs for URLConnection.connect() say, in part:
URLConnection objects go through two phases: first they are created, then they are connected. After being created, and before being connected, various options can be specified (e.g., doInput and UseCaches). After connecting, it is an error to try to set them. Operations that depend on being connected, like getContentLength, will implicitly perform the connection, if necessary.
Note in particular the last sentence. I don't see anything in your code that would require the connection to be established until the first conn.getOutputStream(), and I read the docs as saying that the connection object will not enter the "connected" state until some method is invoked on it that requires that. Until such a time, it is ok to set connection properties.
Moreover, the docs definitely state that methods that set properties on the connection (and setRequestProperty() in particular) will throw an IllegalStateException if invoked when the connection object is already connected.
It is possible that your Java library is buggy in the manner you describe, but that would certainly be in conflict with the API specification. I think it's more likely that the explanation for the behavior you observe is different, and I recommend you capture and analyze the actual HTTP traffic to determine what's really going on.
What actually happened is that, in debug mode, I had conn.getResponseCode() in the debugger's expressions, which forced conn.getResponseCode() to run.
When the connection is not connected yet, getResponseCode() calls connect() before the request is fully prepared.
Hence it returned 401.
Since Android uses the same HttpURLConnection, I captured the packet exchange to see what is happening under the hood.
I detailed my experiment in this post: Can you explain the HttpURLConnection connection process?
To outline the network activity of your program:
At breakpoint A: no physical connection is made to the remote server. You get a logical handle to a local connection object.
At breakpoint B: you have only configured the local connection object, nothing more.
conn.getOutputStream(): the network connection starts here, but no payload is transferred to the server.
conn.getInputStream(): the payload (HTTP headers, content) is sent to the server, and you get the response (buffered into the input stream, along with the response code etc.).
To answer your questions:
When does HttpURLConnection really call the request?
getInputStream() triggers the network layer to send out the application payload and receive the response.
Is this what is actually happening? Is this the correct way to do so?
No. openConnection() does not initiate network activity. You are getting back a local handle for future connection, not an active connection.
Is there a better way to make sure the header is set before calling the request?
You don't need to do anything special to make sure the headers are set. The header payload isn't sent to the server until you ask for the response (for example by getting the response code or opening an input stream).
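One way to make the two phases explicit, and to avoid a debugger expression or stray call triggering the request early, is to keep all configuration in one method and all response-dependent calls in another. The sketch below assumes hypothetical names (PreparedPost, prepare, send) and a UTF-8 request body; the header names and the timeout mirror the question's code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class PreparedPost {
    /** Phase 1: configure only. Nothing in this method touches the network. */
    static HttpURLConnection prepare(String url, String contentType, String token)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setUseCaches(false);
        conn.setConnectTimeout(30000);
        conn.setRequestProperty("Content-Type", contentType);
        conn.setRequestProperty("Authorization", token);
        return conn;
    }

    /** Phase 2: connect, write the body, then ask for the status code. */
    static int send(HttpURLConnection conn, String body) throws IOException {
        try (OutputStream out = conn.getOutputStream()) { // connects implicitly
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode(); // completes the exchange and reads the status line
    }
}
```

With this split, the connection returned by prepare() can be inspected freely as long as nothing calls a response-dependent method such as getResponseCode() on it before send().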

URLConnection.getURL method

I would like to have a second opinion on a small piece of Java code.
Will the method below always return an output string equal to the input string?
private static String func(final String url)
{
    HttpURLConnection con = (HttpURLConnection)new URL(url).openConnection();
    con.setInstanceFollowRedirects(true);
    ...
    ...
    return con.getURL().toString();
}
The question refers to all possible scenarios, such as automatic redirection, etc.
If you look at the URLConnection.getURL() implementation, you can see that it returns the original URL passed to the constructor.
HttpURLConnection also doesn't change the original url.
To get the destination URL of a redirect you're supposed to call con.getHeaderField("Location"); - see for example: Retrieve the final location of a given URL in Java
So you get the original URL until you call connect() or some other method that results in establishing a connection.
If you set ((HttpURLConnection)con).setInstanceFollowRedirects(true); then after connect(), if it really redirects, you'll get the destination URL.
A redirect may not be followed automatically, for example when the protocol changes (e.g. http -> https).
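A small probe can check this on a given runtime instead of relying on reading the implementation. The sketch below uses hypothetical names (UrlProbe, urlBeforeRequest, urlAfterRequest). The first method never touches the network, so it always returns the input URL. What the second method returns after an automatically followed redirect is implementation behaviour, so treat its result as an observation rather than a guarantee.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class UrlProbe {
    private static HttpURLConnection open(String url) throws IOException {
        return (HttpURLConnection) URI.create(url).toURL().openConnection();
    }

    /** getURL() on a connection that has not sent anything yet: the original URL. */
    static String urlBeforeRequest(String url) throws IOException {
        HttpURLConnection con = open(url);
        con.setInstanceFollowRedirects(true);
        return con.getURL().toString();
    }

    /** getURL() after the response has been fetched with redirect following enabled. */
    static String urlAfterRequest(String url) throws IOException {
        HttpURLConnection con = open(url);
        con.setInstanceFollowRedirects(true);
        try {
            con.getResponseCode(); // sends the request and follows redirects, if any
            return con.getURL().toString();
        } finally {
            con.disconnect();
        }
    }
}
```

Comparing the two return values for a URL known to redirect shows whether func() from the question can return something other than its input.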

How it comes that URL.openConnection() allows me to read header?

I was recently experimenting with Java networking and found something a bit odd. Suppose you have
URL url = new URL("http://www.google.com");
URLConnection con = url.openConnection();
then I can call methods like con.getContentLength() and so on, and they give me correct values even though I didn't invoke con.connect(). How can that be? I mean, where and how does URLConnection get those headers? I didn't invoke con.connect() yet, so no requests were sent and no headers should be available at that moment.
The actual TCP connect happens implicitly when you call any method that requires the response, such as getContentLength(), getInputStream(), or getResponseCode(). It doesn't happen at openConnection(). The request is sent at the point of that implicit connect.
The exception is when you are using one of the streaming modes and doing a PUT or POST with request content, in which case the connection is opened when you start writing the request.

Optimizing HttpURLConnection in Android

This problem is bugging me:
HttpURLConnection con = (HttpURLConnection)new URL(url).openConnection();
con.setRequestMethod("HEAD");
if (con.getResponseCode()!=200 ){dosomething()}
Is this the correct way to set the Request Method, or is it already too late since I called URL.openConnection() and it already made the connection using the default which is GET?
I can't call setRequestMethod("HEAD") in the same line as openConnection() because it returns a URLConnection, not an HttpURLConnection.
So how do I ensure that the method will always be HEAD knowing the default is GET?
Should I just use HttpClient ?
That's the correct method.
Calling openConnection() doesn't actually do anything. The request isn't "committed" (that is, nothing is sent to the server) until you ask for something that is returned in the server's response, like the body of the response (con.getInputStream()), the status (con.getResponseCode()), or some other response header. This gives you time to set options on the HttpURLConnection, like whether you plan to send a request body (i.e., POST), set the request method, etc.
By the way, you could set the method "on the same line," but being on the same line is meaningless: either openConnection() sends the request method, or it doesn't. Method calls that happen after are not a factor, regardless of the line they are on.
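As a minimal sketch of the same idea, with hypothetical names (HeadCheck, prepareHead, isOk): the method is set on the unconnected object, and only the later getResponseCode() call sends anything.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class HeadCheck {
    /** Creates the connection object and switches it to HEAD. No request is sent here. */
    static HttpURLConnection prepareHead(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
        con.setRequestMethod("HEAD"); // still unconnected, so this is not too late
        return con;
    }

    /** Sends the HEAD request and reports whether the server answered 200. */
    static boolean isOk(String url) throws IOException {
        HttpURLConnection con = prepareHead(url);
        try {
            return con.getResponseCode() == HttpURLConnection.HTTP_OK; // request goes out here
        } finally {
            con.disconnect();
        }
    }
}
```

The check in the question then becomes if (!HeadCheck.isOk(url)) { dosomething(); }.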
