Java HTTP Request: get content size

I want to know if it's possible to get the size of a web page with an HTTP request.
I use this to get the length of the Oracle page:
URL oracle = new URL("http://www.oracle.com/");
URLConnection yc = oracle.openConnection();
List<String> get = yc.getHeaderFields().get("content-Length");
But when I use this on a Google page, I do not get a Content-Length entry in the headers.

You must use the correct upper and lower case: the key is Content-Length, not content-Length. You can also use the getContentLength() method of the URLConnection object. This header should be sent by any server that needs to send an HTTP body, and Google does so.
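For instance, a minimal sketch of both approaches (note that the map lookup is case-sensitive and must match the casing the server actually sent, and that getContentLength() returns -1 when the header is absent):
import java.net.URL;
import java.net.URLConnection;
import java.util.List;

URL oracle = new URL("http://www.oracle.com/");
URLConnection yc = oracle.openConnection();
// exact casing matters for the map lookup
List<String> lengths = yc.getHeaderFields().get("Content-Length");
// or let URLConnection parse it; returns -1 if the header is missing
int length = yc.getContentLength();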
Be aware that Google uses secured connections.

Content-Length is only going to be applicable if the response is not chunked.
Generally static content will not be chunked, but in many cases dynamic content will be chunked.
In the case of www.oracle.com, you will be redirected to www.oracle.com/index.html, which is static content and does provide Content-Length.

Related

Downloading binary file from url

I am using this code to download files from a url:
FileUtils.copyURLToFile(url, new File("C:/Songs/newsong.mp3"));
When I create the URL using, for instance,
"https://mjcdn.cc/2/282676442/MjUgU2FhbCAtIFZlZXQgQmFsaml0Lm1wMw==",
this works just fine and the mp3 is downloaded.
However,
if I use another url:
"https://dl.jatt.link/hd.jatt.link/a0339e7c772ed44a770a3fe29e3921a8/uttzv/Hummer-(Mr-Jatt.com).mp3",
the file is 0kb.
I am able to download files from both of these URLs from within a web browser.
What's wrong here, and how can I fix it?
I noticed a difference between your 2 URLs:
The first one just gives back the file without redirection.
But the second one responds with a redirect (HTTP/1.1 302 Moved Temporarily). It's also a special case, because it redirects from HTTPS to plain HTTP.
Browsers can follow redirects, but your program - for some reason (see below) - can't.
I suggest using an HTTP client library (e.g. Apache HttpClient or Jsoup) and configuring it to follow redirects (if it doesn't do so by default).
For example, with Jsoup, you would need code like this:
import java.io.File;
import java.io.FileOutputStream;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;

String url = "https://dl.jatt.link/hd.jatt.link/a0339e7c772ed44a770a3fe29e3921a8/uttzv/Hummer-(Mr-Jatt.com).mp3";
String filename = "C:/Songs/newsong.mp3";
Response r = Jsoup.connect(url)
        //.followRedirects(true)      // follow redirects (it's the default)
        .ignoreContentType(true)      // accept more than just HTML
        .maxBodySize(10 * 1000 * 1000) // accept 10 MB (default is 1 MB); set to 0 for unlimited
        .execute();                   // send the GET request
try (FileOutputStream out = new FileOutputStream(new File(filename))) { // auto-close the stream
    out.write(r.bodyAsBytes());
}
Update on @EJP's comment:
I looked up Apache Commons IO's FileUtils class on GitHub. It calls openStream() of the received URL object.
openStream() is a shorthand for openConnection().getInputStream().
openConnection() returns a URLConnection object. If there is an appropriate subclass for the protocol used by the URL, it returns an instance of that subclass. In this case that's an HttpsURLConnection, which is a subclass of HttpURLConnection.
The followRedirects option is defined in HttpURLConnection and it's indeed true by default:
Sets whether HTTP redirects (requests with response code 3xx) should be automatically followed by this class. True by default.
So OP's approach would normally work with redirects too, but it seems that redirection from HTTPS to HTTP is not handled (properly) by HttpsURLConnection. This is the case that @VGR mentioned in the comments below.
It's possible to handle redirects manually by reading the Location header with HttpsURLConnection, then use it in a new HttpURLConnection. (Example) (I wouldn't be surprised if Jsoup did the same.)
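A minimal sketch of that manual approach (it assumes a single redirect hop; real code should also cap the number of redirects it follows):
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
con.setInstanceFollowRedirects(false); // we handle the redirect ourselves
int status = con.getResponseCode();
if (status / 100 == 3) { // any 3xx: follow the Location header manually
    String location = con.getHeaderField("Location");
    con.disconnect();
    con = (HttpURLConnection) new URL(location).openConnection();
}
try (InputStream in = con.getInputStream()) {
    // read the redirected response body from 'in'
}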
I suggested Jsoup because it already implements a way to handle HTTPS to HTTP redirections correctly and also provides tons of useful features.

How to get http response header in apache jena during calling Method FileManager.get().loadModel(url)

I am loading a model in Apache Jena using FileManager.get().loadModel(url). I also know that there may be some URLs in the HTTP response's Link header, and I want to load models from those links as well. How can I do that? Is there any built-in functionality to access the response headers and process the Link header?
FileManager.get().loadModel(url) packages up reading a URL and parsing the results into a model. It is packing up a common thing to do; it is not claiming to be comprehensive. It is quite an old interface.
If you wanted detailed control over the HTTP handling, see if the lower-level HttpOp mechanism helps; otherwise do the handling in the application and hand the input stream for the response directly to the parser.
You may also find it useful to look at the code in RDFDataMgr.process for help with content negotiation.
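A minimal sketch of that application-side route (assuming a plain HttpURLConnection and Jena's Model.read; the package names shown are those of Jena 3.x):
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
con.setRequestProperty("Accept", "application/rdf+xml");
List<String> linkHeaders = con.getHeaderFields().get("Link"); // process these as needed
Model model = ModelFactory.createDefaultModel();
try (InputStream in = con.getInputStream()) {
    model.read(in, null, "RDF/XML"); // hand the stream straight to the parser
}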
I don't think that this is supported by Jena, and I don't see any reason for doing so. The HTTP request is done to get the data, and maybe also to get the response type. If you want to get the URLs from some header fields, why not simply use plain old Java:
URL url = new URL("http://your_ontology.owl");
URLConnection conn = url.openConnection();
Map<String, List<String>> map = conn.getHeaderFields();
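From there, a hypothetical sketch of pulling the target URLs out of the Link header and loading each one; the substring parse below is naive (it assumes values of the common <uri>; rel="..." form), not a full RFC 5988 parser:
import java.util.List;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.util.FileManager;

List<String> links = map.get("Link"); // null if the server sent no Link header
if (links != null) {
    for (String link : links) {
        // a value typically looks like: <http://example.org/extra.owl>; rel="meta"
        String target = link.substring(link.indexOf('<') + 1, link.indexOf('>'));
        Model extra = FileManager.get().loadModel(target);
    }
}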

using Guava's Resources.readLines to read from an HTTP service

I want to readLines from a URL, which resolves to an HTTP service. I can use
Resources.readLines(url, Charsets.SOMETHING)
from com.google.common.io.
This works, but the class javadoc for Resources states the following, without further explanation:
Note that even though these methods use URL parameters, they are usually not appropriate for HTTP or other non-classpath resources.
Why is this method inappropriate for reading from an HTTP service, and what is the recommended approach?
When using URL to send an HTTP request, the typical process is
URL url = new URL(someStringUrl);
HttpURLConnection con = (HttpURLConnection) url.openConnection();
// do some stuff with con: add headers, add a request body, etc.
con.getInputStream(); // get the body of the response
The URL given to Resources skips all that. The methods in Resources depend on URL#openStream(), which skips any modifications to the URLConnection, i.e. it is equivalent to url.openConnection().getInputStream(). It's possible you'll get any number of 400-level error codes in the HTTP response because your request wasn't correct.
This won't happen with classpath resources because the protocol is simple: you just copy the bytes.
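For comparison, a minimal sketch of the more explicit route (the URL and Accept value here are placeholders):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

URL url = new URL("http://example.com/data.txt"); // hypothetical endpoint
HttpURLConnection con = (HttpURLConnection) url.openConnection();
con.setRequestProperty("Accept", "text/plain");
if (con.getResponseCode() != 200) {
    throw new IOException("Unexpected status: " + con.getResponseCode());
}
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
    List<String> lines = reader.lines().collect(Collectors.toList());
}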

Getting RDF/XML web page using GET request with Accept header in Java

I want to send a GET request that accepts only results of type application/rdf+xml, using the Accept header. Is the following code right?
URLConnection connection = new URL(url + "?" + query).openConnection();
connection.setRequestProperty("Accept", "application/rdf+xml");
InputStream response = connection.getInputStream();
@gigadot nailed it: the Accept header is a suggestion to the server, which the server is free to ignore.
If your application can only accept RDF/XML, then you need to add logic on receipt of the response to enforce this.
You can use the getContentType() method of a URLConnection to see what content type the server returned you and take an appropriate action (e.g. report an error) if it does not match your requirements.
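A minimal sketch of that check, building on the snippet above (the error handling is just an illustration):
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

URLConnection connection = new URL(url + "?" + query).openConnection();
connection.setRequestProperty("Accept", "application/rdf+xml");
String contentType = connection.getContentType(); // may include parameters, e.g. ";charset=utf-8"
if (contentType == null || !contentType.startsWith("application/rdf+xml")) {
    throw new IOException("Expected application/rdf+xml but got: " + contentType);
}
InputStream response = connection.getInputStream();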

getContentLength() return -1 only in WiFi?

I want to know the length of the file, so I tried getContentLength(). It works fine over a mobile network connection (EDGE/3G) but returns -1 over WiFi.
Why? The WiFi connection is good and the file is found; it can be downloaded, but getContentLength() always returns -1. I don't understand. The file is a Google Documents file.
Is there another way to get the length?
My code is:
URL url = new URL(file);
URLConnection conexion = url.openConnection();
conexion.connect();
int poids = conexion.getContentLength();
It may well be the mobile network changing things for you. For example, the mobile network I use shrinks image downloads automatically (and annoyingly). If the network is "transparently" performing the full download before giving you any data, it can fill in the content length for you.
However, you basically shouldn't rely on having the content length... there's nothing to guarantee that it'll be available to you.
The server is probably sending back a HTTP response that is chunked.
The behavior of the getContentLength() method is to return the length of the content when it is known to the connection. When the client receives a chunked HTTP response, the length of the response is not known up front, and hence the content length value is reported as -1.
The chunked nature of the response can be determined from the Transfer-Encoding header; chunked responses have a value of chunked. HTTP servers need not provide a Content-Length header if the response is sent with chunked encoding; in fact, servers are encouraged not to send the Content-Length header for a chunked response, since the client is supposed to ignore it.
As for the actual reason why the server responds differently on the two networks, it depends on various factors. Usually servers opt for the most suitable delivery mode depending on the nature of the client; for some reason, this one has decided it is better off sending chunked responses over one type of connection. The answer might lie in the HTTP request headers, but not necessarily.
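A minimal sketch of coping with that (reusing the file variable from the question): detect the chunked transfer and fall back to counting the bytes, which only makes sense if you were going to read the stream anyway:
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

URLConnection conn = new URL(file).openConnection();
conn.connect();
long length = conn.getContentLength(); // -1 for chunked responses
if (length == -1 && "chunked".equalsIgnoreCase(conn.getHeaderField("Transfer-Encoding"))) {
    long total = 0;
    try (InputStream in = conn.getInputStream()) {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n; // the only way to know the size is to read it all
        }
    }
    length = total;
}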
