using Guava's Resources.readLines to read from an HTTP service - java

I want to readLines from a URL, which resolves to an HTTP service. I can use
Resources.readLines(url, Charsets.SOMETHING)
from com.google.common.io.
This works, but the class javadoc for Resources states the following, without further explanation:
Note that even though these methods use URL parameters, they are usually not appropriate for HTTP or other non-classpath resources.
Why is this method inappropriate for reading from an HTTP service, and what is the recommended approach?

When using URL to send an HTTP request, the typical process is
URL url = new URL(someStringUrl);
HttpUrlConnection con = (HttpUrlConnection) url.openConnection();
// do some stuff with con, add headers, add request body, etc.
con.getInputStream(); // get body of response
The URL given to Resources skips all that. The methods in Resources depend on URL#openStream() which skips any modifications to the URLConnection, ie. is equivalent the url.openConnection().getInputStream(). It's possible you'll get any number of 400 level error codes from the HTTP response because your request wasn't correct.
This won't happen with class path resources because the protocol is simple. You just copy the bytes.

Related

Downloading binary file from url

I am using this code to download files from a url:
FileUtils.copyURLToFile(url, new File("C:/Songs/newsong.mp3"));
When I create the url using for instance,
"https://mjcdn.cc/2/282676442/MjUgU2FhbCAtIFZlZXQgQmFsaml0Lm1wMw==",
this works just fine and the mp3 is downloaded.
However,
if I use another url:
"https://dl.jatt.link/hd.jatt.link/a0339e7c772ed44a770a3fe29e3921a8/uttzv/Hummer-(Mr-Jatt.com).mp3",
the file is 0kb.
I am able to download files from both these urls from within a web browser.
What's wrong here, and how can I fix it.
I noticed a difference between your 2 URLs:
The first one just gives back the file without redirection.
But the second one responds with a redirect (HTTP/1.1 302 Moved Temporarily). It's also a special case, because it's a redirect from HTTPS to HTTP protocol.
Browsers can follow redirects, but your program - for some reason (see below) - can't.
I suggest you to use a HTTP client library (e.g. Apache HTTP client or Jsoup), and configure it to follow redirects (if they don't do it by default).
For example, with Jsoup, you would need a code like this:
String url = "https://dl.jatt.link/hd.jatt.link/a0339e7c772ed44a770a3fe29e3921a8/uttzv/Hummer-(Mr-Jatt.com).mp3";
String filename = "C:/Songs/newsong.mp3";
Response r = Jsoup.connect(url)
//.followRedirects(true) // follow redirects (it's the default)
.ignoreContentType(true) // accept not just HTML
.maxBodySize(10*1000*1000) // accept 10M bytes (default is 1M), or set to 0 for unlimited
.execute(); // send GET request
FileOutputStream out = new FileOutputStream(new File(filename));
out.write(r.bodyAsBytes());
out.close();
Update on #EJP's comment:
I looked up Apache Commons IO's FileUtils class on GitHub. It calls openStream() of the received URL object.
openStream() is a shorthand for openConnection().inputStream().
openConnection() returns an URLConnection object. If there is an appropriate subclass for the protocol used by URL, it will return an instance of that subclass. In this case that's a HttpsURLConnection which is the subclass of HttpURLConnection.
The followRedirects option is defined in HttpURLConnection and it's indeed true by default:
Sets whether HTTP redirects (requests with response code 3xx) should be automatically followed by this class. True by default.
So OP's approach would normally work with redirects too, but it seems that redirection from HTTPS to HTTP is not handled (properly) by HttpsURLConnection. - It's the case that #VGR mentioned in the comments below.
It's possible to handle redirects manually by reading the Location header with HttpsURLConnection, then use it in a new HttpURLConnection. (Example) (I wouldn't be surprised if Jsoup did the same.)
I suggested Jsoup because it already implements a way to handle HTTPS to HTTP redirections correctly and also provides tons of useful features.

How to get http response header in apache jena during calling Method FileManager.get().loadModel(url)

I am loading model in apache jena using function FileManager.get().loadModel(url).And I also know that there may be some URLs in HTTP Response Link Header .I want to load model also from the links(URLs) in link header.How to do that ? Is there any inbuilt fuctionality to get access to header and process link header in Response header?
FileManager.get().loadModel(url) packages up reading a URL and parsing the results into a model. It is packing up a common thing to do; it is not claiming to be comprehensive. It is quite an old interface.
If you wanted detailed control over the HTTP handling, see if HttpOp (a lower level) mechanism helps, otherwise do the handling in the application and hand the input stream for the response directly to the parser.
You may also find it useful to look at the code in RDFDataMgr.process for help with content negotiation.
I don't think that this is supported by Jena. I don't see any reason in doing so. The HTTP request is done to get the data and maybe also to get the response type. If you want to get the URLs in some header fields, why not simply use plain old Java:
URL url = new URL("http://your_ontology.owl");
URLConnection conn = url.openConnection();
Map<String, List<String>> map = conn.getHeaderFields();

Java HTTP Request : get content size

I want to know if it's possible to get the size of a web page with a http request.
I use this to have oracle page length :
URL oracle = new URL("http://www.oracle.com/");
URLConnection yc = oracle.openConnection();
List<String> get = yc.getHeaderFields().get("content-Length");
But when I use this to a google page I do not have content-length in the header.
You must use the correct upper or lower cases. The field name is Content-Length not content-Length. You can also use the getContentLength() method of the URLConnection object. This field should be used by any application that need to send a HTTP body, and google does it so.
Be aware that Google uses secured connections.
Content-Length is only going to be applicable if the response is not chunked.
Generally static content will not be chunked, but in many cases dynamic content will not be chunked.
In the case of www.oracle.com, you will be redirected to www.oracle.com/index.html, which is static content and does provide Content-Length.

How it comes that URL.openConnection() allows me to read header?

I recently was experimenting with java networking and I found a bit odd thing, suppose you have
URL url = new URL("http://www.google.com");
URLConnection con = url.openConnection();
then i can call methods, like con.getContentLength() and so on and they will give me correct values, even despite I didn't envoke con.connect(). How can that be? I mean, where from/how does URLConnection gets those headers, I didn't invoke con.connect() yet, so no requests were sent and so no headers should be available at that moment.
The actual TCP connect happens implicitly when you call any method that requires the response, such as getContentLength(), getInputStream(), getResponseCode(). It doesn't happen at openConnection(). The request is sent at that point.
Unless you are using one of the streaming modes and you're doing a PUT or POST with request content, in which case the connection is opened when you start writing the request.

Getting RDF/XML web page using GET request with Accept header in Java

I want to send a GET requests that accept only results of type application/rdf+xml using the Accept: header. Is the following code right?
URLConnection connection = new URL(url + "?" + query).openConnection();
connection.setRequestProperty("Accept", "application/rdf+xml");
InputStream response = connection.getInputStream();
#gigadot nailed it, the Accept header is a suggestion to the server which the server is free to ignore.
If your application can only accept RDF/XML then you need to add logic to the receipt of the request to enforce this.
You can use the getContentType() method of a URLConnection to see what content type the server returned you and take an appropriate action (e.g. report an error) if it does not match your requirements.

Categories