When does URL redirect happen?

I have a URL that redirects to another one, that redirects to another one, and I'm trying to get the end domain. I thought that the redirection would happen when connecting, but running this code tells me otherwise:
public static void main(String[] args)
{
    System.out.println(resolveEndDomain("http://goo.gl/ELHEjl"));
}

private static String resolveEndDomain(String deepLink)
{
    HttpURLConnection httpConnection = null;
    try
    {
        httpConnection = (HttpURLConnection) new URL(deepLink).openConnection();
        //httpConnection.getResponseCode();
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
    return httpConnection.getURL().getHost();
}
So, if I run it, it will just give me goo.gl. But if I uncomment the line to get the response code, it goes and fetches it from the end domain, and prints www.thomann.de. The same occurs if I use getHeaderFields(), getContent() or similar, but connect() doesn't seem to help. I'm not interested in any of those responses so, how can I make it go and resolve the redirections? When does the connection effectively occur?

As for your question, the redirect happens when the server you connect to replies with a 3XX HTTP status code such as 301 or 302. Note that the status code is part of the server's response, so you must actually have requested the resource from it. With HttpURLConnection, connect() only opens the connection; the request is sent and the response is read once you call a method that needs the response, such as getResponseCode(), getHeaderFields() or getInputStream().
As you have already found, HttpURLConnection usually checks for the redirect status code itself and follows the URL given in the redirect response (up to a maximum number of redirections; I am not sure where that limit is defined).
You could call setInstanceFollowRedirects(false) and interpret the HTTP headers yourself to find out whether you are going to be redirected, but until you get a 2XX status code (or 4XX, e.g. not found or forbidden, or 5XX, server error) you cannot be sure that the URL you are being redirected to is the last one.
For example, you could connect to http://goo.gl/ELHEjl and find that it redirects you to http://www.bit.ly/s34314313. But then, unless you connect to that URL and get a non-3XX status code, you cannot be sure whether http://www.bit.ly/s34314313 is the final resource location or whether it will just redirect you again.
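To make the hop-by-hop idea concrete, here is a minimal sketch of following the redirects by hand. The class name RedirectResolver, the helper isRedirect and the cap of 10 hops are my own assumptions for illustration, not part of any API. It sends HEAD requests so that no body is downloaded, which some servers may reject; switching to GET is the obvious fallback.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class RedirectResolver {
    // Assumed cap on the number of hops, so a redirect loop cannot run forever.
    private static final int MAX_HOPS = 10;

    // True for 3XX codes other than 304 (Not Modified), which is not a redirect.
    static boolean isRedirect(int status) {
        return status >= 300 && status < 400 && status != 304;
    }

    // Follows Location headers one hop at a time and returns the last URL reached.
    static URL resolveFinalUrl(String start) throws IOException {
        URL current = new URL(start);
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            HttpURLConnection con = (HttpURLConnection) current.openConnection();
            con.setInstanceFollowRedirects(false); // we inspect each response ourselves
            con.setRequestMethod("HEAD");          // headers only, no body
            try {
                int status = con.getResponseCode(); // the request is actually sent here
                String location = con.getHeaderField("Location");
                if (!isRedirect(status) || location == null) {
                    return current; // 2XX, 4XX, 5XX or no Location: this is the end
                }
                current = new URL(current, location); // also resolves relative Locations
            } finally {
                con.disconnect();
            }
        }
        throw new IOException("Too many redirects starting from " + start);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(resolveFinalUrl("http://goo.gl/ELHEjl").getHost());
    }
}
```

Doing it by hand also lets you see every intermediate URL, and it keeps working when a hop switches between http and https, which the built-in redirect handling does not follow.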

Related

Links give invalid response code from code but valid response code from browser

I'm validating links by requesting them and checking the response codes (in Java). From my code I get error response codes (403 or 404), but in the browser I get a 200 status code when I inspect the network activity. Here's my code that gets the response code. [I do basic validations on URLs beforehand, like making them lowercase, etc.]
static int getResponseCode(String link) throws IOException {
    URL url = new URL(link);
    HttpURLConnection http = (HttpURLConnection) url.openConnection();
    return http.getResponseCode();
}
For a link like http://science.sciencemag.org/content/220/4599/868, I get a 403 status when I run this code, but in the browser (Chrome) I get a 200 status. Also, if I use the curl command below, I get a 200 status code.
curl -Is http://science.sciencemag.org/content/220/4599/868

The only way to overcome that is to:
check which HTTP headers your program sends (for instance, by sending queries to http://scooterlabs.com/echo and checking the response)
check which HTTP headers your browser sends (for instance, by visiting https://www.whatismybrowser.com/detect/what-http-headers-is-my-browser-sending )
spot the differences
change your program to send the same headers as your browser (the ones that work)
I did this analysis for you, and it turns out this website requires an Accept header that resembles the Accept header of an existing browser. By default Java sends something valid, but not resembling that.
You just need to change your program like so:
static int getResponseCode(String link) throws IOException {
    URL url = new URL(link);
    HttpURLConnection http = (HttpURLConnection) url.openConnection();
    http.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    return http.getResponseCode();
}
(Or any other value that an actual browser uses)

Sending a zero-length HTTPS PUT?

I'm working with a system that, in order to make a particular service call, requires the following:
Issue an HTTP PUT command
Set the URL to some_url_here
Set the end user certificate.
Ensure that the entity body is empty and set the Content-Length headers to 0.
Here's the method I wrote to build secure connections. I've tested the GETs; they work fine. I know the problem isn't in the certificate.
public HttpsURLConnection getSecureConnection(final URL url, final String method, final int connectTimeout,
        final int readTimeout) throws IOException {
    Validate.notNull(sslContext);
    Validate.notNull(url);
    Validate.notNull(method);
    Validate.isTrue(connectTimeout > 0);
    Validate.isTrue(readTimeout > 0);
    HttpsURLConnection connection;
    try {
        connection = (HttpsURLConnection) url.openConnection();
    } catch (final IOException ioe) {
        LOGGER.error("[CertificateLoader] Unable to open URL connection!", ioe);
        throw new IOException("Unable to open URL connection!", ioe);
    }
    connection.setSSLSocketFactory(sslContext.getSocketFactory());
    connection.setRequestMethod(method);
    connection.setConnectTimeout(connectTimeout);
    connection.setReadTimeout(readTimeout);
    connection.setHostnameVerifier(NoopHostnameVerifier.INSTANCE);
    if (method.equals("PUT")) {
        connection.setRequestProperty("Content-Length", "0");
    }
    if (connection.getContentLength() > 0) {
        Object foo = connection.getContent();
        LOGGER.error("This is what's in here: " + foo.toString());
    }
    return connection;
}
Now, the reason for that funky if at the bottom is that when I go to make the PUT call, even though I'm not putting a body on the call directly, my logs insist I'm getting a non-zero content length. So, I added that little block to try to figure out what's in there, and lo and behold it reports the following:
This is what's in here: sun.net.www.protocol.http.HttpURLConnection$HttpInputStream@70972170
Now, that sucker's in there by default. I didn't put it in there. I didn't create that object to put in there. I just created the object as is from the URL, which I created from a String elsewhere. What I need is a way to remove that HttpInputStream object, or set it to null, or otherwise tell the code that there should be no body to this PUT request, so that my server won't reject my message as being ill-formatted. Suggestions?

"Now, the reason for that funky if at the bottom is that when I go to make the PUT call, even though I'm not putting a body on the call directly, my logs insist I'm getting a non-zero content length."
The way to send a zero Content-Length is as follows:
connection.setDoOutput(true); // if it's PUT or POST
connection.setRequestMethod(method);
connection.getOutputStream().close(); // send a zero-length request body
It is never necessary to call connection.setRequestProperty("Content-Length", "0"). Java sets it for you. Or possibly it is omitted, in which case you may be able to ensure it via
connection.setFixedLengthStreamingMode(0);
"So, I added that little block to try to figure out what's in there, and lo and behold it reports the following:
This is what's in here: sun.net.www.protocol.http.HttpURLConnection$HttpInputStream@70972170
Now, that sucker's in there by default. I didn't put it in there."
Java put it there.
"I didn't create that object to put in there."
Java put it there.
"I just created the object as is from the URL, which I created from a String elsewhere. What I need is a way to remove that HttpInputStream object, or set it to null, or otherwise tell the code that there should be no body to this PUT request, so that my server won't reject my message as being ill-formatted."
No it isn't. It is an input stream, not a piece of content. And it is an input stream to the content of the response, not of the request. Likewise, getContentLength() reports the Content-Length of the response, and calling it sends the request. And in any case, the server is perfectly entitled to return you content in response to your request.
Your task is to:
Get the response code and log it.
If it is >=200 and <= 299, get the connection's input stream.
Otherwise get the connection's error stream.
Whichever stream you got, read it till end of stream, and log it.
That will tell you what is really happening.
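The steps above could be sketched roughly as follows. This is only an illustration: the class and method names (EmptyPutExample, putEmptyBody, readFully, isSuccess) are made up, it uses a plain HttpURLConnection, and it leaves out the SSL socket factory and timeouts from your method, which you would set before sending.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class EmptyPutExample {

    // 2XX means the server accepted the request.
    static boolean isSuccess(int status) {
        return status >= 200 && status <= 299;
    }

    // Reads a stream until end of stream; a null stream yields an empty string.
    static String readFully(InputStream in) throws IOException {
        if (in == null) {
            return "";
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toString(StandardCharsets.UTF_8.name()); // UTF-8 is an assumption
    }

    // Sends a PUT with an empty body and returns "<status> <response body>" for logging.
    static String putEmptyBody(URL url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("PUT");
        con.setDoOutput(true);
        con.setFixedLengthStreamingMode(0);  // announce a zero-length body
        con.getOutputStream().close();       // write nothing, then finish the body
        int status = con.getResponseCode();
        // getErrorStream() may return null when the server sent no body.
        try (InputStream body = isSuccess(status) ? con.getInputStream() : con.getErrorStream()) {
            return status + " " + readFully(body);
        } finally {
            con.disconnect();
        }
    }
}
```

Logging the returned string for the failing call should show whether the server is really complaining about the length (411) or about something else.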
I will add that a PUT without a body is a really strange thing to do. Are you sure you've understood the requirement? 411 means Length required.

Get the redirected URL of a very specific URL (in Java)

How can I get the redirected URL of http://at.atwola.com/?adlink/5113/1649059/0/2018/AdId=4041444;BnId=872;itime=15692006;impref=13880156912668385284; in Java?
My code (given below) is constructed according to answers to similar questions on stack-overflow (https://stackoverflow.com/a/5270162/1382251 in particular).
But it just yields the original URL. I suspect that there are other similar cases, so I would like to resolve this one in specific and use the solution in general.
String ref = "http://at.atwola.com/?adlink/5113/1649059/0/2018/AdId=4041444;BnId=872;itime=15692006;impref=13880156912668385284;";
try
{
    URLConnection con1 = new URL(ref).openConnection();
    con1.connect();
    InputStream is = con1.getInputStream();
    URL url = con1.getURL();
    is.close();
    String finalPage = url.toString();
    if (finalPage.equals(ref))
    {
        HttpURLConnection con2 = (HttpURLConnection)con1;
        con2.setInstanceFollowRedirects(false);
        con2.connect();
        if (con2.getResponseCode()/100 == 3)
            finalPage = con2.getHeaderField("Location");
    }
    System.out.println(finalPage);
}
catch (Exception error)
{
    System.out.println("error");
}

I played a bit with your URL with telnet, wget, and curl and I noticed that in some cases the server returns response 200 OK, and sometimes 302 Moved Temporarily. The main difference seems to be the request User-agent header. Your code works if you add the following before con1.connect():
con1.setRequestProperty("User-Agent","");
That is, with empty User-Agent (or if the header is not present at all), the server issues a redirect. With the Java User-Agent (in my case User-Agent: Java/1.7.0_45) and with the default curl User-Agent (User-Agent: curl/7.32.0) the server responds with 200 OK.
In some cases you might need to also set:
System.setProperty("http.agent", "");
See Setting user agent of a java URLConnection
The server running the site is the Adtech Adserver and apparently it is doing user agent sniffing. There is a long history of user agent sniffing. So it seems that the safest thing to do would be to set the user agent to Mozilla:
con1.setRequestProperty("User-Agent","Mozilla"); //works with your code for your URL
Maybe the safest option would be to use a user agent used by some of the popular web browsers.

Verify random URLs on a network in Java

This question may be a bit too low-level, but I couldn't find an answer already.
I'm typing this next paragraph so that you can correct me/ explain the things I refer to unwittingly.
You know in a web browser you can type directory paths from your own computer, and it will bring them up? Apparently, it also works with pages within a local network. If there's another page on the same subnet, you can access it with "http://pagename/".
On the network I'm a part of, there are a lot of these pages, and they all (or mostly) have common, single-word names, such as "http://word/" . I want to test, using Java, a dictionary of common words to see which exist as locations on the network. Of course, there's probably an easier way if I know the range of ip addresses on the network, which I do. However, I get the "page not found" page if I try typing the IP address of, say, "http://word/" (which I get from ping), into the address bar. This is true even if "http://word/" works.
So say I loop through my word bank. How can I test if a URL is real?
I've worked out how to load my word bank. Here's what I have right now
URL article=new URL("http://word"); //sample URL
URLConnection myConn=article.openConnection();
Scanner myScan=new Scanner(new InputStreamReader(myConn.getInputStream()));
System.out.println(myScan.hasNext()); //Diagnostic output
This works when the URL is constructed with a valid URL. When it gets passed a bad URL, the program just ignores the System.out.println, not even making a new line. I know that different browsers show different "page not found" screens, and that these have their own html source code. Maybe that's related to my problem?
How can I test if a URL is real using this method?
Is there a way to test it with IP addresses, given my problem? or, why am I having a problem typing in the IP address and not the URL?

You should check the HTTP response code. If the URL is "real" (in your terms), the response code should be 200. Otherwise I believe you will get a different response code.
Do it using HttpURLConnection.getResponseCode();
HttpURLConnection is a subclass of URLConnection. When you are connecting over HTTP, that is actually what you get from openConnection(), so you can say:
URL article=new URL("http://word"); //sample URL
HttpURLConnection myConn = (HttpURLConnection)article.openConnection();

If you are testing only HTTP URLs, you can cast the URLConnection to an HttpURLConnection and check the HTTP response code for 200 = HTTP_OK:
URL article=new URL("http://word"); //sample URL
HttpURLConnection myConn= (HttpURLConnection)article.openConnection();
if (myConn.getResponseCode() == HttpURLConnection.HTTP_OK) {
    // Site exists and has valid content
}
Additionally, if you want to test IP addresses, you can simply use one as the URL:
http://10.0.0.1

I think I've figured it out.
This code wouldn't compile without me catching IOException (Because of URL, URLConnection, and Scanner), so I had to try{/*code*/} catch(IOException oops){}, which I did nothing with. I didn't think that it was important to put the try/catch in my question. UnknownHostException and MalformedURLException extend IOException, so I was already unwittingly triggering one of them with Scanner.hasNext() or with HttpURLConnection.getResponseCode(), catching it, and exiting the try block. Thus, I never got a response code when I had a bad URL. So I need to write
try
{
    URL article=new URL("http://word");
    HttpURLConnection myConn=(HttpURLConnection)article.openConnection();
    int code=myConn.getResponseCode(); //the request is sent here; a bad host throws UnknownHostException
    //code to store "http://word" as a working URL
}
catch (UnknownHostException ex) {/*code if "http://word" is not a working URL*/}
catch (IOException oops) {oops.printStackTrace();}
Thanks for everyone's help, I learned a lot. If you have a different/better answer or if you can answer why using the IP addresses didn't work, I'm still wondering that.
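Putting the answers in this thread together, a probe for a single address might look like the sketch below. The names HostProbe, Result, classify and probe are hypothetical, and the two-second timeouts are an arbitrary assumption so that a loop over a large word bank does not hang on unresponsive hosts.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.UnknownHostException;

final class HostProbe {
    enum Result { REACHABLE, HTTP_ERROR, UNKNOWN_HOST, IO_FAILURE }

    // 200 counts as a working page; any other status is reported as an HTTP error.
    static Result classify(int status) {
        return status == HttpURLConnection.HTTP_OK ? Result.REACHABLE : Result.HTTP_ERROR;
    }

    // Tries one address, e.g. "http://word/", and reports what happened.
    static Result probe(String address) {
        try {
            HttpURLConnection con = (HttpURLConnection) new URL(address).openConnection();
            con.setConnectTimeout(2000); // assumed timeout, in milliseconds
            con.setReadTimeout(2000);
            try {
                return classify(con.getResponseCode()); // the request is sent here
            } finally {
                con.disconnect();
            }
        } catch (UnknownHostException e) {
            return Result.UNKNOWN_HOST; // the name does not resolve on this network
        } catch (IOException e) {
            return Result.IO_FAILURE;   // malformed URL, refused connection, timeout, ...
        }
    }
}
```

Looping over the word bank then comes down to calling probe("http://" + word + "/") for each word and keeping the ones that come back REACHABLE.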

URLConnection FileNotFoundException for non-standard HTTP port sources

I was trying to use the Apache Ant Get task to get a list of WSDLs generated by another team in our company. They have them hosted on a WebLogic 9.x server at http://....com:7925/services/. I am able to get to the page through a browser, but the Get task gives me a FileNotFoundException when trying to copy the page to a local file to parse. I was still able to get (using the Ant task) a URL served on the standard HTTP port 80.
I looked through the Ant source code, and narrowed the error down to the URLConnection. It seems as though the URLConnection doesn't recognize the data is HTTP traffic, since it isn't on the standard port, even though the protocol is specified as HTTP. I sniffed the traffic using WireShark and the page loads correctly across the wire, but still gets the FileNotFoundException.
Here's an example where you will see the error (with the URL changed to protect the innocent). The error is thrown on connection.getInputStream();
import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class TestGet {
    private static URL source;

    public static void main(String[] args) {
        doGet();
    }

    public static void doGet() {
        try {
            source = new URL("http", "test.com", 7925,
                    "/services/index.html");
            URLConnection connection = source.openConnection();
            connection.connect();
            InputStream is = connection.getInputStream();
        } catch (Exception e) {
            System.err.println(e.toString());
        }
    }
}

The response to my HTTP request returned with a status code 404, which resulted in a FileNotFoundException when I called getInputStream(). I still wanted to read the response body, so I had to use a different method: HttpURLConnection#getErrorStream().
Here's a JavaDoc snippet of getErrorStream():
Returns the error stream if the connection failed but the server sent useful data nonetheless. The typical example is when an HTTP server responds with a 404, which will cause a FileNotFoundException to be thrown in connect, but the server sent an HTML help page with suggestions as to what to do.
Usage example:
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils; // Apache Commons IO

public static String httpGet(String url) {
    HttpURLConnection con = null;
    InputStream is = null;
    try {
        con = (HttpURLConnection) new URL(url).openConnection();
        con.connect();
        // 4xx: client error, 5xx: server error. See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.
        boolean isError = con.getResponseCode() >= 400;
        // In HTTP error cases, HttpURLConnection only gives you the response body via #getErrorStream().
        is = isError ? con.getErrorStream() : con.getInputStream();
        if (is == null) {
            return ""; // getErrorStream() returns null when the server sent no body
        }
        // getContentEncoding() returns the Content-Encoding header (e.g. gzip), not a charset, so UTF-8 is assumed here.
        return IOUtils.toString(is, StandardCharsets.UTF_8);
    } catch (Exception e) {
        throw new IllegalStateException(e);
    } finally {
        // Note: Closing the InputStream manually may be unnecessary, depending on the implementation of HttpURLConnection#disconnect(). Sun/Oracle's implementation does close it for you in said method.
        if (is != null) {
            try {
                is.close();
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
        }
        if (con != null) {
            con.disconnect();
        }
    }
}

This is an old thread, but I had a similar problem and found a solution that is not listed here.
I was receiving the page fine in the browser, but got a 404 when I tried to access it via the HttpURLConnection. The URL I was trying to access contained a port number. When I tried it without the port number I successfully got a dummy page through the HttpURLConnection. So it seemed the non-standard port was the problem.
I started thinking the access was restricted, and in a sense it was. My solution was that I needed to tell the server the User-Agent and I also specify the file types I expect. I am trying to read a .json file, so I thought the file type might be a necessary specification as well.
I added these lines and it finally worked:
httpConnection.setRequestProperty("User-Agent","Mozilla/5.0 ( compatible ) ");
httpConnection.setRequestProperty("Accept","*/*");

Check the response code being returned by the server.

I know this is an old thread but I found a solution not listed anywhere here.
I was trying to pull data in json format from a J2EE servlet on port 8080 but was receiving the file not found error. I was able to pull this same json data from a php server running on port 80.
It turns out that in the servlet, I needed to change doGet to doPost.
Hope this helps somebody.

You could use OkHttp:
OkHttpClient client = new OkHttpClient();

String run(String url) throws IOException {
    Request request = new Request.Builder()
            .url(url)
            .build();
    Response response = client.newCall(request).execute();
    return response.body().string();
}

I've tried that locally - using the code provided - and I don't get a FileNotFoundException except when the server returns a status 404 response.
Are you sure that you're connecting to the webserver you intend to be connecting to? Is there any chance you're connecting to a different webserver? (I note that the port number in the code doesn't match the port number in the link)

I have run into a similar issue but the reason seems to be different, here is the exception trace:
java.io.FileNotFoundException: http://myhost1:8081/test/api?wait=1
at sun.reflect.GeneratedConstructorAccessor2.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
at java.security.AccessController.doPrivileged(Native Method)
at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
at com.doitnext.loadmonger.HttpExecution.getBody(HttpExecution.java:85)
at com.doitnext.loadmonger.HttpExecution.execute(HttpExecution.java:214)
at com.doitnext.loadmonger.ClientWorker.run(ClientWorker.java:126)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.io.FileNotFoundException: http://myhost1:8081/test/api?wait=1
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
at com.doitnext.loadmonger.HttpExecution.execute(HttpExecution.java:166)
... 2 more
So it would seem that just getting the response code causes the URL connection to call getInputStream().

I know this is an old thread but just noticed something on this one so thought I will just put it out there.
Like Jessica mentioned, this exception is thrown when using non-standard port.
It only seems to happen when using DNS though. If I use IP number I can specify the port number and everything works fine.
