Java URL Without Protocol


I'm trying to open an InputStream to a certain URL as given by the service's API. However, it does not have a set protocol (it's not http or https) and without one, I am getting the following error.
Is there any way to get a request without a protocol?
Exception:
Exception in thread "main" java.net.MalformedURLException: no protocol.
Code:
String url = "maple.fm/api/2/search?server=1";
InputStream is = new URL(url).openStream();
UPDATE: I now updated the code to:
Code:
String url = "http://maple.fm/api/2/search?server=1";
InputStream is = new URL(url).openStream();
and now I'm getting the following error:
Exception:
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: http://maple.fm/api/2/search?server=1

A URL without a protocol is not a valid URL. It is actually a relative URI, and you can only use a relative URI if you have an absolute URI (or equivalent) to provide the context for resolving it.
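To make the relative-URI point concrete, here is a minimal sketch using java.net.URI. The base address http://maple.fm/ is only an assumption for the sake of the example, and the class and method names are hypothetical.

```java
import java.net.URI;

class RelativeUriExample {
    // True only if the string carries a scheme such as "http:".
    static boolean isAbsolute(String spec) {
        return URI.create(spec).isAbsolute();
    }

    // Resolves a relative reference against an absolute base URI.
    static URI resolve(String base, String relative) {
        return URI.create(base).resolve(relative);
    }

    public static void main(String[] args) {
        // The original string parses as a relative URI: no scheme, no host.
        System.out.println(isAbsolute("maple.fm/api/2/search?server=1"));
        // With an absolute base (assumed here), the relative part can be resolved.
        System.out.println(resolve("http://maple.fm/", "api/2/search?server=1"));
    }
}
```

Note that without a scheme the parser treats "maple.fm" as the first path segment, not as a host name.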
Is there any way to [make] a request without a protocol?
Basically .... No. The protocol tells the client-side libraries how to perform the request. If there is no protocol, the libraries would not know what to do.
The reason that "urls without protocols" work in a web browser's URL bar is that the browser is being helpful, and filling in the missing protocol with "http:" ... on the assumption that that is what the user probably means. (Plus a whole bunch of other stuff, like adding "www.", adding ".com", escaping spaces and other illegal characters, ... or trying a search instead of a normal HTTP GET request.)
Now you could try to do the same stuff in your code before passing the URL string to the URL class. But IMO, the correct solution if you are writing code to talk to a service is to just fix the URL. Put the correct protocol on the front ...
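If you did want to mimic the browser's "fill in the protocol" behaviour, a rough sketch might look like the following. The helper name withDefaultScheme is hypothetical, and defaulting to http is an assumption; fixing the URL at the source is still the better option.

```java
class SchemeDefaults {
    // Returns the URL unchanged if it already starts with "scheme://",
    // otherwise prepends the given default scheme.
    static String withDefaultScheme(String url, String defaultScheme) {
        // A scheme is a letter followed by letters, digits, '+', '.' or '-'.
        if (url.matches("^[A-Za-z][A-Za-z0-9+.-]*://.*")) {
            return url;
        }
        return defaultScheme + "://" + url;
    }

    public static void main(String[] args) {
        System.out.println(withDefaultScheme("maple.fm/api/2/search?server=1", "http"));
    }
}
```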
The 403 error you are now getting means Forbidden. The server is saying "you are not permitted to do this".
Check the documentation for the service you are trying to use. (Perhaps you need to go through some kind of login procedure. Perhaps what you are trying to do is only permitted for certain users, or something.)
Try the example URL on this page ... which incidentally works for me from my web browser.

When you say it does not have a set protocol, I am a little bit suspicious of what that means. If it can use multiple protocols, I would hope the API documentation mentions some way of determining what the protocol should be.
I hit the URL http://maple.fm/api/2/search?server=1 and it is simply returning JSON over http. I think your actual problem is that you are trying to open an InputStream to talk to a web server. I believe the solution to your problem, of trying to handle JSON over http, can be found here.
I decided to dig into this because I was curious. Combining this answer and this answer, we have the following code which will print out the JSON output from your URL. Of course, you still need a JSON library to parse it, but that's a separate problem.
import java.net.*;
import java.io.*;

public class Main {
    public static String getHTML(String urlToRead) {
        URL url;
        HttpURLConnection conn;
        BufferedReader rd;
        String line;
        String result = "";
        try {
            url = new URL(urlToRead);
            conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            // Identify as a browser; some servers reject Java's default User-Agent.
            conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
            rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            // Append the response line by line (line terminators are dropped).
            while ((line = rd.readLine()) != null) {
                result += line;
            }
            rd.close();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return result;
    }

    public static void main(String[] args) {
        String url = "http://maple.fm/api/2/search?server=1";
        System.out.println(getHTML(url));
    }
}

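You need to surround it with a try/catch block:
try {
    // Still throws MalformedURLException: the string has no protocol.
    String url = "maple.fm/api/2/search?world=1";
    InputStream is = new URL(url).openStream();
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    // openStream() can also fail with a general I/O error.
    e.printStackTrace();
}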

Related

Why am I getting content type of a PDF file is returned as HTML?

I am trying to see the content type of a web URL using the following code.
Interestingly, the content type of the given URL (http://www.jbssinc.com/inv_pr_pdf/2007-05-08.pdf) is returned as text/html; charset=iso-8859-1 even though it is a PDF document. I would like to understand why.
Here is my code:
public static void main(String[] args) throws MalformedURLException {
    URLConnection urlConnection = null;
    URL url = new URL("http://www.jbssinc.com/inv_pr_pdf/2007-05-08.pdf");
    try {
        urlConnection = url.openConnection();
        urlConnection.setConnectTimeout(10*1000);
        urlConnection.setReadTimeout(10*1000);
        urlConnection.connect();
    } catch (IOException e) {
        System.out.println("Error in establishing connection.\n");
    }
    String contentType = "";
    /* If we were able to get a connection ---> */
    if (urlConnection != null) {
        contentType = urlConnection.getContentType();
    }
    System.out.println(contentType);
}

When I access this page in Java and attempt to actually load it, I get a 403 Forbidden error. These error pages are HTML pages, not PDF files, which is why you're getting the content type you're seeing.
This site is probably detecting your browser or using some other mechanism to prevent automatic downloads. That's why it works in Chrome, Firefox and IE but not with Java.
Your code works fine with a different URL, such as https://partners.adobe.com/public/developer/en/xml/AdobeXMLFormsSamples.pdf.
In the case of this webserver, if you specify the User-Agent to a typical browser value, it will allow you to make the connection normally.
Try adding this line immediately before urlConnection.connect():
urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
See this answer for more information about setting the User-Agent. You should make sure you are not violating the website's Terms of Service in some way before doing this, though.
Typically, the way to check whether a website explicitly forbids apps from downloading its content is the robots.txt file at the site root (http://example.com/robots.txt). Here, that would be http://www.jbssinc.com/robots.txt. This file doesn't forbid robots (your program) from downloading this particular file, so I think you are okay to spoof your User-Agent. The fact that Java is blocked is more likely to be user error.
Further reading: Is using a faked user agent allowed?

GET request using ajax v java

I'm writing a simple web application that completes just one GET request with custom headers. When I tried making the request using ajax, it gave me a cross domain error like so:
No 'Access-Control-Allow-Origin' header is present on the requested resource.
Origin 'http://localhost:8080' is therefore not allowed access.
When I make the same request in Java using custom headers, it works completely fine.
public static String executeGET() {
    String response = "";
    try {
        URL url = new URL("http://....");
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("GET");
        //set custom headers
        con.setRequestProperty("header1", "2.0");
        con.setRequestProperty("header2", "sellingv2");
        con.connect();
        InputStreamReader reader = new InputStreamReader(con.getInputStream());
        Scanner scanner = new Scanner(reader);
        while (scanner.hasNext()) {
            response += scanner.next();
        }
        scanner.close();
        con.disconnect();
    }
    catch (Exception ex) {
        ex.printStackTrace();
    }
    return response;
}
Why does this work in Java and not with AJAX?

This request works in Java and not with AJAX because AJAX is called from within a web browser. Web browsers enforce a same-origin policy, which prevents front-end scripts from making possibly malicious AJAX requests to other origins. Your Java application is not subject to this limitation, so it can make the request just fine. The Access-Control-Allow-Origin response header can be used to relax this restriction, but your server is not configured to send it. It is most likely the case that the protocol, host, or port in your URL string does not match the origin that is hosting your front-end files. If you change your URL to a relative path, it should work.
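For what "same origin" means in practice, here is a rough sketch of the comparison: two URLs share an origin when protocol, host and port all match. This is only an illustration written in Java, not the browser's actual implementation; the class and method names are hypothetical, and it assumes http or https URLs that include a host.

```java
import java.net.URI;

class OriginCheck {
    // Falls back to the scheme's default port when none is given.
    static int effectivePort(URI u) {
        if (u.getPort() != -1) {
            return u.getPort();
        }
        return "https".equalsIgnoreCase(u.getScheme()) ? 443 : 80;
    }

    // Two URLs share an origin when scheme, host and port all match.
    static boolean sameOrigin(String a, String b) {
        URI x = URI.create(a);
        URI y = URI.create(b);
        return x.getScheme().equalsIgnoreCase(y.getScheme())
                && x.getHost().equalsIgnoreCase(y.getHost())
                && effectivePort(x) == effectivePort(y);
    }

    public static void main(String[] args) {
        // Same host, different port: a different origin, so the browser blocks it.
        System.out.println(sameOrigin("http://localhost:8080/index.html", "http://localhost:9090/api"));
    }
}
```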

URL given have fileNotFoundException

I have the following code. It works fine on my local development server, but after I uploaded it to the deployment server, I always hit a FileNotFoundException.
String urlStr = "http://" + getContext().getRequest().getServerName() +
getContext().getServletContext().getContextPath() + "test.action";
URL url = new URL(urlStr);
InputStream input = url.openStream(); //Error always occurs here, it gives me the correct URL but it says file not found.
Can anyone help me with this?

Because it's an HTTP URL, the correct way would be as follows.
String urlStr = "http://" + getContext().getRequest().getServerName() +
        getContext().getServletContext().getContextPath() + "test.action";
URL url = new URL(urlStr);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
// HTTP_OK is 200; HTTP_ACCEPTED (202) is not what a normal successful GET returns.
if (conn.getResponseCode() == HttpURLConnection.HTTP_OK) {
    InputStream input = conn.getInputStream();
}

I think that @deadlock's comment is probably the key to solving this.
You are getting a FileNotFoundException because the remote server is sending a 404 Not Found response. The most likely explanation is that you are attempting to connect using the wrong URL. Print out the URL string before you try to connect.
All the evidence points to the server sending "404 Not Found" responses for both versions of the code. This normally means that your URL is wrong. But it could also be other things:
You may be using different proxies in the Java and browser cases, resulting in the Java case reaching some server that doesn't understand the URL.
It is conceivable that the server is implementing some anti-web-scraping mechanism and sending you 404 responses because it thinks (rightly) that your requests aren't coming from a web browser.
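Printing the URL is worth doing here because of how the string is assembled. As far as I know, getContextPath() returns a path that starts with "/" but does not end with one (and "" for the root context), and getServerName() does not include the port, so concatenating "test.action" directly can yield something like http://host/myapptest.action. A minimal sketch of a safer join follows; the helper name and the host, port and context values are made up for illustration.

```java
class ServletUrlBuilder {
    // Joins the pieces, adding the port when given and making sure
    // exactly one "/" separates the context path from the resource.
    static String buildUrl(String scheme, String host, int port,
                           String contextPath, String resource) {
        StringBuilder sb = new StringBuilder(scheme).append("://").append(host);
        if (port > 0) {
            sb.append(':').append(port);
        }
        sb.append(contextPath);
        if (!resource.startsWith("/")) {
            sb.append('/');
        }
        sb.append(resource);
        return sb.toString();
    }

    public static void main(String[] args) {
        String urlStr = buildUrl("http", "example.com", 8080, "/myapp", "test.action");
        // Print the URL before connecting so a wrong path is easy to spot.
        System.out.println(urlStr);
    }
}
```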

403 error in accessing an URL but works fine in browsers

String url = "http://maps.googleapis.com/maps/api/directions/xml?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false";
URL google = new URL(url);
HttpURLConnection con = (HttpURLConnection) google.openConnection();
When I use a BufferedReader to print the content, I get a 403 error.
The same URL works fine in the browser. Could anyone suggest why?

The reason it works in a browser but not in Java code is that the browser adds some HTTP headers which your Java code lacks, and the server requires those headers. I've been in the same situation: the URL worked both in Chrome and in the Chrome plugin "Simple REST Client", yet didn't work in Java. Adding this line before the getInputStream() call solved the problem:
connection.addRequestProperty("User-Agent", "Mozilla/4.0");
..even though I have never used Mozilla. Your situation might require a different header. It might be related to cookies ... I was getting text in the error stream advising me to enable cookies.
Note that you might get more information by looking at the error text. Here's my code:
try {
    HttpURLConnection connection = ((HttpURLConnection)url.openConnection());
    connection.addRequestProperty("User-Agent", "Mozilla/4.0");
    InputStream input;
    if (connection.getResponseCode() == 200) // this must be called before 'getErrorStream()' works
        input = connection.getInputStream();
    else
        input = connection.getErrorStream();
    BufferedReader reader = new BufferedReader(new InputStreamReader(input));
    String msg;
    while ((msg = reader.readLine()) != null)
        System.out.println(msg);
} catch (IOException e) {
    System.err.println(e);
}

HTTP 403 is the Forbidden status code. You would have to read HttpURLConnection.getErrorStream() to see the response body from the server, if any, which may tell you why you were given an HTTP 403.

This code should work fine. If you have been making a number of requests, it is possible that Google is just throttling you. I have seen Google do this before. You can try using a proxy to verify.

Most browsers automatically encode URLs when you enter them, but the Java URL class doesn't.
You should encode the URL's parameter values with URLEncoder (java.net.URLEncoder).
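A minimal sketch of that idea, using the directions URL from the question. Note that URLEncoder is meant for individual query parameter values, not for the whole URL (it would also escape the "://" and "&"). The class and method names here are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryEncoding {
    // Encodes a single query parameter value as UTF-8 form data.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Builds the request URL, encoding each value separately.
    static String directionsUrl(String origin, String destination, String waypoints) {
        return "http://maps.googleapis.com/maps/api/directions/xml"
                + "?origin=" + encode(origin)
                + "&destination=" + encode(destination)
                + "&waypoints=" + encode(waypoints)
                + "&sensor=false";
    }

    public static void main(String[] args) {
        // The '|' and the spaces are the characters that need escaping here.
        System.out.println(directionsUrl("Chicago,IL", "Los Angeles,CA",
                "Joplin,MO|Oklahoma City,OK"));
    }
}
```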

I know this is a bit late, but the easiest way to get the contents of a URL is to use the Apache HttpComponents HttpClient project: http://hc.apache.org/httpcomponents-client-ga/index.html

Your original page (the one with the link) and the linked target page are not on the same domain; call them original-domain and target-domain.
I found that the difference is in the request headers.
With the 403 Forbidden error, the request headers contain this line:
Referer: http://original-domain/json2tree/ipfs/ipfsList.html
When I enter the URL directly, there is no 403 Forbidden, and the request headers do NOT contain the Referer line above.
I finally figured out how to fix this error!
On your original-domain web page, you have to add:
<meta name="referrer" content="no-referrer" />
This prevents the browser from sending the Referer header, and it works both for links and for Ajax requests.

URLConnection FileNotFoundException for non-standard HTTP port sources

I was trying to use the Apache Ant Get task to get a list of WSDLs generated by another team in our company. They have them hosted on a WebLogic 9.x server at http://....com:7925/services/. I am able to get to the page through a browser, but the Get task gives me a FileNotFoundException when trying to copy the page to a local file to parse. Using the same Ant task, I was still able to get a URL on the standard HTTP port 80.
I looked through the Ant source code and narrowed the error down to the URLConnection. It seems as though the URLConnection doesn't recognize that the data is HTTP traffic, since it isn't on the standard port, even though the protocol is specified as HTTP. I sniffed the traffic using Wireshark and the page loads correctly across the wire, but I still get the FileNotFoundException.
Here's an example where you will see the error (with the URL changed to protect the innocent). The error is thrown on connection.getInputStream();
import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class TestGet {
    private static URL source;

    public static void main(String[] args) {
        doGet();
    }

    public static void doGet() {
        try {
            source = new URL("http", "test.com", 7925,
                    "/services/index.html");
            URLConnection connection = source.openConnection();
            connection.connect();
            InputStream is = connection.getInputStream();
        } catch (Exception e) {
            System.err.println(e.toString());
        }
    }
}

The response to my HTTP request returned with a status code 404, which resulted in a FileNotFoundException when I called getInputStream(). I still wanted to read the response body, so I had to use a different method: HttpURLConnection#getErrorStream().
Here's a JavaDoc snippet of getErrorStream():
Returns the error stream if the connection failed but the server sent useful data nonetheless. The typical example is when an HTTP server responds with a 404, which will cause a FileNotFoundException to be thrown in connect, but the server sent an HTML help page with suggestions as to what to do.
Usage example:
public static String httpGet(String url) {
    HttpURLConnection con = null;
    InputStream is = null;
    try {
        con = (HttpURLConnection) new URL(url).openConnection();
        con.connect();
        //4xx: client error, 5xx: server error. See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.
        boolean isError = con.getResponseCode() >= 400;
        //In HTTP error cases, HttpURLConnection only gives you the input stream via #getErrorStream().
        is = isError ? con.getErrorStream() : con.getInputStream();
        //Note: getContentEncoding() returns the Content-Encoding header (e.g. "gzip"), not a charset, so UTF-8 is assumed here.
        return IOUtils.toString(is, "UTF-8"); //Apache Commons IO
    } catch (Exception e) {
        throw new IllegalStateException(e);
    } finally {
        //Note: Closing the InputStream manually may be unnecessary, depending on the implementation of HttpURLConnection#disconnect(). Sun/Oracle's implementation does close it for you in said method.
        if (is != null) {
            try {
                is.close();
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
        }
        if (con != null) {
            con.disconnect();
        }
    }
}

This is an old thread, but I had a similar problem and found a solution that is not listed here.
I was receiving the page fine in the browser, but got a 404 when I tried to access it via the HttpURLConnection. The URL I was trying to access contained a port number. When I tried it without the port number I successfully got a dummy page through the HttpURLConnection. So it seemed the non-standard port was the problem.
I started thinking the access was restricted, and in a sense it was. My solution was that I needed to tell the server the User-Agent and I also specify the file types I expect. I am trying to read a .json file, so I thought the file type might be a necessary specification as well.
I added these lines and it finally worked:
httpConnection.setRequestProperty("User-Agent","Mozilla/5.0 ( compatible ) ");
httpConnection.setRequestProperty("Accept","*/*");

Check the response code being returned by the server.

I know this is an old thread but I found a solution not listed anywhere here.
I was trying to pull data in json format from a J2EE servlet on port 8080 but was receiving the file not found error. I was able to pull this same json data from a php server running on port 80.
It turns out that in the servlet, I needed to change doGet to doPost.
Hope this helps somebody.

You could use OkHttp:
OkHttpClient client = new OkHttpClient();

String run(String url) throws IOException {
    Request request = new Request.Builder()
            .url(url)
            .build();
    Response response = client.newCall(request).execute();
    return response.body().string();
}

I've tried that locally - using the code provided - and I don't get a FileNotFoundException except when the server returns a status 404 response.
Are you sure that you're connecting to the webserver you intend to be connecting to? Is there any chance you're connecting to a different webserver? (I note that the port number in the code doesn't match the port number in the link)
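One quick way to check which server and port the code is really targeting is to print the parsed pieces of the URL before connecting. A minimal sketch, with a hypothetical class name and the placeholder host test.com from the question:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class UrlParts {
    // Summarises protocol, host, effective port and file part of a URL.
    static String describe(String spec) {
        try {
            URL url = URI.create(spec).toURL();
            // getPort() is -1 when the URL does not name a port explicitly.
            int port = url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
            return url.getProtocol() + " " + url.getHost() + " " + port + " " + url.getFile();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("http://test.com:7925/services/index.html"));
    }
}
```

Comparing this output with the address that works in the browser makes a host or port mismatch easy to spot.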

I have run into a similar issue, but the reason seems to be different. Here is the exception trace:
java.io.FileNotFoundException: http://myhost1:8081/test/api?wait=1
at sun.reflect.GeneratedConstructorAccessor2.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
at java.security.AccessController.doPrivileged(Native Method)
at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
at com.doitnext.loadmonger.HttpExecution.getBody(HttpExecution.java:85)
at com.doitnext.loadmonger.HttpExecution.execute(HttpExecution.java:214)
at com.doitnext.loadmonger.ClientWorker.run(ClientWorker.java:126)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.io.FileNotFoundException: http://myhost1:8081/test/api?wait=1
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
at com.doitnext.loadmonger.HttpExecution.execute(HttpExecution.java:166)
... 2 more
So it would seem that just getting the response code will cause the URL connection to call getInputStream().

I know this is an old thread but just noticed something on this one so thought I will just put it out there.
Like Jessica mentioned, this exception is thrown when using non-standard port.
It only seems to happen when using DNS though. If I use IP number I can specify the port number and everything works fine.
