I would like to have a second opinion on a small piece of Java code.
Will the method below always return an output string equal to the input string?
private static String func(final String url)
{
HttpURLConnection con = (HttpURLConnection)new URL(url).openConnection();
con.setInstanceFollowRedirects(true);
...
...
return con.getURL().toString();
}
The question refers to all possible scenarios, such as automatic redirection, etc.
If you look at the URLConnection.getURL() implementation, you can see that it returns the original URL passed to the constructor.
HttpURLConnection also doesn't change the original URL.
To get the destination URL of a redirect you're supposed to call con.getHeaderField("Location"); - see for example: Retrieve the final location of a given URL in Java
So you get the original URL until you call connect() or some other method that results in establishing a connection.
If you set ((HttpURLConnection)con).setInstanceFollowRedirects(true); then after connect() if it really redirects you'll get the destination URL.
A redirect may not happen automatically, for example when the protocol changes (e.g. http -> https).
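To see this for yourself, here is a minimal sketch, assuming the stock JDK HttpURLConnection. The class FinalUrlProbe and its method names are made up for illustration, and http://example.com/ is only a placeholder default. It prints getURL() once before anything has touched the network and once after getResponseCode() has forced the connection:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of the original question.
class FinalUrlProbe {

    // No network traffic happens here: openConnection() only creates the object,
    // so getURL() can only echo the URL the connection was created from.
    static String urlBeforeConnect(String url) {
        try {
            HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
            con.setInstanceFollowRedirects(true);
            return con.getURL().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // getResponseCode() implicitly connects; with follow-redirects enabled the
    // value of getURL() may differ afterwards if a redirect was followed.
    static String urlAfterConnect(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
        con.setInstanceFollowRedirects(true);
        try {
            con.getResponseCode();
            return con.getURL().toString();
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String url = args.length > 0 ? args[0] : "http://example.com/";
        System.out.println("before connect: " + urlBeforeConnect(url));
        System.out.println("after connect:  " + urlAfterConnect(url));
    }
}
```

Run it against a URL you know redirects within the same protocol and compare the two lines; what you see for cross-protocol redirects depends on the behaviour mentioned above.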
Related
I have a shortened URL. Now I am using HttpUrlConnection to open the connection with the shortened link.
URL url = new URL(myShortenedUrl);
Now I open the connection by calling:
HttpURLConnection httpurlconnection = (HttpURLConnection) url.openConnection();
Finally I am extracting the location header containing the actual destination URL by calling:
String expandedurl = httpurlconnection.getHeaderField("Location");
At the end I disconnect the httpurlconnection by calling:
httpurlconnection.disconnect();
I want to know if the URL I have used is of a malicious website, can it cause any harm to the calling host? If yes, then what are the possible ways it can attack the calling host?
Edit: I have even disabled redirect by calling:
httpurlconnection.setInstanceFollowRedirects(false);
It depends on what you do with the result. For example, if you use it to query a database, it could be vulnerable to SQL injection.
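In other words, treat the Location value as untrusted input. As one narrow illustration (a sketch, not a complete defence), you could refuse anything that is not an absolute http or https URL before passing it on. LocationValidator and isPlainHttpUrl are hypothetical names introduced here:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: a first sanity check on an untrusted Location header value.
class LocationValidator {

    // Accepts only syntactically valid, absolute http/https URIs that have a host.
    static boolean isPlainHttpUrl(String location) {
        if (location == null) {
            return false;
        }
        try {
            URI uri = new URI(location.trim());
            String scheme = uri.getScheme();
            boolean httpScheme = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Spaces, quotes and similar characters end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {
                "https://example.com/page", "javascript:alert(1)", "x' OR '1'='1"}) {
            System.out.println(candidate + " -> " + isPlainHttpUrl(candidate));
        }
    }
}
```

If the value does go into a database query, a check like this is no substitute for parameterised statements; it only narrows what you accept in the first place.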
I'm working with a system that, in order to make a particular service call, requires the following:
Issue an HTTP PUT command
Set the URL to some_url_here
Set the end user certificate.
Ensure that the entity body is empty and set the Content-Length headers to 0.
Here's the method I wrote to build secure connections. I've tested the GETs; they work fine. I know the problem isn't in the certificate.
public HttpsURLConnection getSecureConnection(final URL url, final String method, final int connectTimeout,
final int readTimeout) throws IOException {
Validate.notNull(sslContext);
Validate.notNull(url);
Validate.notNull(method);
Validate.isTrue(connectTimeout > 0);
Validate.isTrue(readTimeout > 0);
HttpsURLConnection connection;
try {
connection = (HttpsURLConnection) url.openConnection();
} catch (final IOException ioe) {
LOGGER.error("[CertificateLoader] Unable to open URL connection!", ioe);
throw new IOException("Unable to open URL connection!", ioe);
}
connection.setSSLSocketFactory(sslContext.getSocketFactory());
connection.setRequestMethod(method);
connection.setConnectTimeout(connectTimeout);
connection.setReadTimeout(readTimeout);
connection.setHostnameVerifier(NoopHostnameVerifier.INSTANCE);
if (method.equals("PUT")) {
connection.setRequestProperty("Content-Length", "0");
}
if (connection.getContentLength() > 0) {
Object foo = connection.getContent();
LOGGER.error("This is what's in here: " + foo.toString());
}
return connection;
}
Now, the reason for that funky if at the bottom is that when I go to make the PUT call, even though I'm not putting a body on the call directly, my logs insist I'm getting a non-zero content length. So, I added that little block to try to figure out what's in there, and lo and behold it reports the following:
This is what's in here: sun.net.www.protocol.http.HttpURLConnection$HttpInputStream#70972170
Now, that sucker's in there by default. I didn't put it in there. I didn't create that object to put in there. I just created the object as is from the URL, which I created from a String elsewhere. What I need is a way to remove that HttpInputStream object, or set it to null, or otherwise tell the code that there should be no body to this PUT request, so that my server won't reject my message as being ill-formatted. Suggestions?
Now, the reason for that funky if at the bottom is that when I go to make the PUT call, even though I'm not putting a body on the call directly, my logs insist I'm getting a non-zero content length.
The way to set a zero Content-length is as follows:
connection.setDoOutput(true); // if it's PUT or POST
connection.setRequestMethod(method);
connection.getOutputStream().close(); // send a zero length request body
It is never necessary to call connection.setRequestProperty("Content-Length", "0"). Java sets it for you. Or possibly it is omitted, in which case you may be able to ensure it via
connection.setFixedLengthStreamingMode(0);
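Putting the two suggestions above together, a minimal sketch might look like this. EmptyPutSketch, open, prepareEmptyPut and sendEmptyPut are made-up names, and the sketch leaves out the SSL setup from the question. Whether your server accepts the result is something only a test against it can show:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URI;

// Hypothetical sketch of a PUT with an explicitly empty request body.
class EmptyPutSketch {

    // Creates the connection object only; nothing is sent yet.
    static HttpURLConnection open(String url) {
        try {
            return (HttpURLConnection) URI.create(url).toURL().openConnection();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Configures the request. All of this must happen before the connection is made.
    static HttpURLConnection prepareEmptyPut(HttpURLConnection con) {
        try {
            con.setRequestMethod("PUT");
        } catch (ProtocolException e) {
            throw new IllegalStateException(e);
        }
        con.setDoOutput(true);              // this request carries a (zero-length) body
        con.setFixedLengthStreamingMode(0); // declare the body length up front
        return con;
    }

    // Sends the zero-length body and returns the server's status code.
    static int sendEmptyPut(HttpURLConnection con) throws IOException {
        prepareEmptyPut(con);
        con.getOutputStream().close();      // nothing written, so the body is empty
        return con.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java EmptyPutSketch.java <url>");
            return;
        }
        System.out.println("status: " + sendEmptyPut(open(args[0])));
    }
}
```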
So, I added that little block to try to figure out what's in there, and lo and behold it reports the following:
This is what's in here: sun.net.www.protocol.http.HttpURLConnection$HttpInputStream#70972170
Now, that sucker's in there by default. I didn't put it in there.
Java put it there.
I didn't create that object to put in there.
Java put it there.
I just created the object as is from the URL, which I created from a String elsewhere. What I need is a way to remove that HttpInputStream object, or set it to null, or otherwise tell the code that there should be no body to this PUT request, so that my server won't reject my message as being ill-formatted.
No it isn't. It is an input stream, not a piece of content. And it is an input stream to the content of the response, not of the request. And in any case, the server is perfectly entitled to return you content in response to your request.
Your task is to:
Get the response code and log it.
If it is >=200 and <= 299, get the connection's input stream.
Otherwise get the connection's error stream.
Whichever stream you got, read it till end of stream, and log it.
That will tell you what is really happening.
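The steps above can be sketched roughly as follows. ResponseDump and its methods are hypothetical names, and decoding the body as UTF-8 is an assumption; adjust it to whatever your server actually sends:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting what the server really answered.
class ResponseDump {

    static boolean isSuccess(int code) {
        return code >= 200 && code <= 299;
    }

    // Reads the stream to its end and closes it. Assumes UTF-8 text.
    static String readFully(InputStream in) {
        if (in == null) {
            return ""; // getErrorStream() may return null when there is no error body
        }
        try (InputStream stream = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Response code first, then the matching stream, read to the end.
    static String describe(HttpURLConnection con) throws IOException {
        int code = con.getResponseCode();
        InputStream body = isSuccess(code) ? con.getInputStream() : con.getErrorStream();
        return "HTTP " + code + ": " + readFully(body);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ResponseDump.java <url>");
            return;
        }
        HttpURLConnection con = (HttpURLConnection) URI.create(args[0]).toURL().openConnection();
        System.out.println(describe(con));
    }
}
```

In your code you would pass the string from describe(...) to your LOGGER instead of printing it.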
I will add that a PUT without a body is a really strange thing to do. Are you sure you've understood the requirement? 411 means Length required.
I'm learning java and come across something which confuses me quite a bit. I'm watching a video on an explanation of how http requests work...
URL theURL = new URL("http://www.google.com");
URLConnection theConn = theURL.openConnection();
I understand the first line in that it is just creating a URL object with an actual url as an argument. But I don't understand how in the second line, a URLConnection object is being created and being set equal to a method of the other object, or is that method returning something?
The method is returning a URLConnection as documented by the URL.openConnection() Javadoc which says (in part)
Returns a URLConnection instance that represents a connection to the remote object referred to by the URL.
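If it helps, here is a small sketch of the same two lines with the return value made visible. OpenConnectionDemo and open are names invented for this example, and URI.create(...).toURL() is used in place of new URL(...) only because the constructor is deprecated in recent JDKs:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical demo: openConnection() is an ordinary method that returns an object.
class OpenConnectionDemo {

    static URLConnection open(String address) {
        try {
            URL theURL = URI.create(address).toURL();
            // The method runs, builds a URLConnection, and hands it back;
            // the assignment just stores that returned reference.
            URLConnection theConn = theURL.openConnection();
            return theConn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection theConn = open("http://www.google.com");
        System.out.println("runtime class: " + theConn.getClass().getName());
        System.out.println("is an HttpURLConnection: " + (theConn instanceof HttpURLConnection));
    }
}
```

No request is sent by this program; it only creates the connection object and inspects it.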
I am trying to retrieve the final location of a given URL (String ref) as follows:
HttpURLConnection con = (HttpURLConnection)new URL(ref).openConnection();
con.setInstanceFollowRedirects(true);
con.setRequestProperty("User-Agent","");
int responseCode = con.getResponseCode();
return con.getURL().toString();
It works in most cases, but occasionally it returns a URL that itself redirects again.
What am I doing wrong here?
Why do I get responseCode = 3xx, even after calling setInstanceFollowRedirects(true)?
UPDATE:
OK, responseCode can sometimes be 3xx.
If it happens, then I will return con.getHeaderField("Location") instead.
The code now is:
HttpURLConnection con = (HttpURLConnection)new URL(ref).openConnection();
con.setInstanceFollowRedirects(true);
con.setRequestProperty("User-Agent","");
int responseType = con.getResponseCode()/100;
while (responseType == 1)
{
Thread.sleep(10);
responseType = con.getResponseCode()/100;
}
if (responseType == 3)
return con.getHeaderField("Location");
return con.getURL().toString();
I would appreciate comments if anyone sees anything wrong with the code above.
UPDATE
Removed the handling of code 1xx, as according to most commenters it is not necessary.
Testing if the Location header exists before returning it, in order to handle code 304.
HttpURLConnection con = (HttpURLConnection)new URL(ref).openConnection();
con.setInstanceFollowRedirects(true);
con.setRequestProperty("User-Agent","");
if (con.getResponseCode()/100 == 3)
{
String target = con.getHeaderField("Location");
if (target != null)
return target;
}
return con.getURL().toString();
HttpURLConnection will not follow redirects if the protocol changes, such as http to https or https to http. In that case, it will return the 3xx code and you should be able to get the Location header. You may need to open a connection again in case that new URL also redirects. So basically, use a loop and break out of it when you get a non-redirect response code. Also watch out for infinite redirect loops: you could set a limit on the number of iterations or check whether each new URL has already been visited.
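A rough sketch of such a loop, under a few assumptions: RedirectResolver and its methods are hypothetical names, only 301, 302, 303, 307 and 308 are treated as redirects, and a relative Location value is resolved against the URL that produced it. It opens a fresh connection per hop, limits the number of hops and remembers visited URLs:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper that follows redirects by hand, including http <-> https.
class RedirectResolver {

    // 304 (Not Modified) and other 3xx codes are deliberately not treated as redirects.
    static boolean isRedirect(int code) {
        return code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
    }

    // Location may be relative, so resolve it against the URL that returned it.
    static String resolveLocation(String base, String location) {
        return URI.create(base).resolve(location).toString();
    }

    static String resolveFinalUrl(String start, int maxHops) throws IOException {
        Set<String> visited = new HashSet<>();
        String current = start;
        for (int hop = 0; hop <= maxHops; hop++) {
            if (!visited.add(current)) {
                throw new IOException("Redirect cycle at " + current);
            }
            // A connection object cannot be reused, so open a new one per hop.
            HttpURLConnection con = (HttpURLConnection) URI.create(current).toURL().openConnection();
            con.setInstanceFollowRedirects(false);
            try {
                int code = con.getResponseCode();
                String location = con.getHeaderField("Location");
                if (!isRedirect(code) || location == null) {
                    return current; // non-redirect response: this is the final address
                }
                current = resolveLocation(current, location);
            } finally {
                con.disconnect();
            }
        }
        throw new IOException("More than " + maxHops + " redirects starting from " + start);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java RedirectResolver.java <url>");
            return;
        }
        System.out.println(resolveFinalUrl(args[0], 10));
    }
}
```

The limit of 10 hops in main is arbitrary. URI.create will reject a Location value containing illegal characters, so real code may want to catch IllegalArgumentException there.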
If you just want the redirect url, the response header should give you that:
if (con.getResponseCode() == 301) {
String redirectUrl = con.getHeaderField("Location");
}
There can easily be multiple levels of redirection: imagine a bit.ly link pointing to a youtu.be address that in turn points to youtube.com. You probably need to loop until you get a 200 OK or until you hit a redirection cycle.
I have trouble locating the source code to check but I believe what I said is true. See e.g. java urlconnection get the final redirected URL
You also might need to handle protocol redirects, e.g. HTTP -> HTTPS: URLConnection Doesn't Follow Redirect
I think I now understand what you want. I now think that you are trying to retrieve the final address, not the content of the final address. Please correct me if my assumption is wrong.
To do this (get the address, not the content), you need a different approach. You need to switch off follow-redirects and then follow the redirects yourself, one request at a time, until you find a non-redirecting response. Bear in mind that you cannot reuse a URLConnection.
The approaches for finding the final address and the other approach for retrieving the content of the final address are so different, because URLConnection does not reveal the followed-to address if you switch on follow-redirects.
In your code, you seem to expect URLConnection.getURL() to return the followed-to address. This is not the behavior of this method. It returns the original URL which you used to create the URLConnection. It does this no matter if you switch on follow-redirects or not.
However, if you switch it on, you will not be able to get the followed-to URL address. This is because getHeaderField("Location"), with follow-redirects, makes no sense: it returns the redirection-target of the final redirect, which should not exist, since it's the final address.
Sometimes the address ends up in the connection's requestURI field. You can read it with reflection, like this:
val declaredField = con.javaClass.getDeclaredField("requestURI")
declaredField.isAccessible=true
val loc = declaredField.get(con).toString()
Given a URL (String ref), I am attempting to retrieve the redirected URL as follows:
HttpURLConnection con = (HttpURLConnection)new URL(ref).openConnection();
con.setInstanceFollowRedirects(false);
con.setRequestProperty("User-Agent","");
int responseType = con.getResponseCode()/100;
while (responseType == 1)
{
Thread.sleep(10);
responseType = con.getResponseCode()/100;
}
if (responseType == 3)
return con.getHeaderField("Location");
return con.getURL().toString();
I am having several (conceptual and technical) problems with it:
Conceptual problem:
It works in most cases, but I don't quite understand how.
All methods of the 'con' instance are called AFTER the connection is opened (when 'con' is instantiated).
So how do they affect the actual result?
How come calling 'setInstanceFollowRedirects' affects the returned value of 'getHeaderField'?
Is there any point calling 'getResponseCode' over and over until the returned value is not 1xx?
Bottom line, my general question here: is there another request/response sent through the connection every time one of these methods is invoked?
Technical problem:
Sometimes the response-code is 3xx, but 'getHeaderField' does not return the "final" URL.
I tried calling my code with the returned value of 'getHeaderField' until the response-code was 2xx.
But in most other cases where the response-code is 3xx, 'getHeaderField' DOES return the "final" URL, and if I call my code with this URL then I get an empty string.
Can you please advise how to approach the two problems above in order to have a "100% proof" code for retrieving the "final" URL?
Please ignore cases where the response-code is 4xx or 5xx (or anything else other than 1xx / 2xx / 3xx for that matter).
Thanks
Conceptual problems:
0.) Can one URLConnection or HttpURLConnection object be reused?
No, you cannot reuse such an object. You can use it to fetch the content of one URL just once. You cannot use it to retrieve another URL, nor to fetch the content twice (speaking on the network level).
If you want to fetch another URL or to fetch the URL a second time, you have to call the openConnection() method of the URL class again to instantiate a new connection object.
1.) When is the URLConnection actually connected?
The method name openConnection() is misleading. It only instantiates the connection object. It does not do anything on the network level.
The interaction on the network level starts in this line, which implicitly connects the connection (= the TCP socket under the hood is opened and data is sent and received):
int responseType = con.getResponseCode()/100;
Alternatively, you can use HttpURLConnection.connect() to explicitly connect the connection.
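A quick way to convince yourself, sketched under the assumption that host.invalid does not resolve (the .invalid top-level domain is reserved for that purpose): creating the connection object works even for a host that cannot be reached, and any failure only shows up at connect(). LazyConnectDemo and its method names are invented for this example:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical demo of when network I/O actually starts.
class LazyConnectDemo {

    // Only instantiates the connection object; no socket is opened here.
    static boolean opensWithoutNetwork(String url) {
        try {
            URLConnection con = URI.create(url).toURL().openConnection();
            return con != null;
        } catch (IOException e) {
            return false;
        }
    }

    // connect() is where the network is first touched, so this is where it can fail.
    static boolean connects(String url) {
        try {
            HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
            con.setConnectTimeout(2000);
            con.connect();
            con.disconnect();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String url = "http://host.invalid/";
        System.out.println("openConnection() succeeded: " + opensWithoutNetwork(url));
        System.out.println("connect() succeeded: " + connects(url));
    }
}
```

The result of the second line depends on your network setup (a configured proxy, for instance, changes what connect() does), so treat it as something to observe rather than a guaranteed outcome.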
2.) How does setInstanceFollowRedirects work?
setInstanceFollowRedirects(true) causes the URLs to be fetched "under the hood" again and again until there is a non-redirect response. The response code of the non-redirect response is returned by your call to getResponseCode().
UPDATE:
Yes, this lets you write simple code if you do not want to deal with the redirects yourself. You can simply switch on follow-redirects and then read the final response from the location you were redirected to, as if no redirect had taken place.
I would be more careful in evaluating the response code. Not every 3xx-code is automatically a kind of redirection. For example the code 304 just stands for "Not modified."
Look at the original definitions here.