Checking content type from URL

I asked this question before, and Evgeniy Dorofeev answered it. His answer works for direct links only, but I accepted it anyway. He showed me how to check the content type of a direct link:
String requestUrl = "https://dl-ssl.google.com/android/repository/android-14_r04.zip";
URL url = new URL(requestUrl);
URLConnection c = url.openConnection();
String contentType = c.getContentType();
As far as I know, there are two types of URL for downloading a file:
Direct link. For example: https://dl-ssl.google.com/android/repository/android-14_r04.zip. From this link we can download the data directly and read the file name, including the file extension (.zip in this case), so we know what kind of file will be downloaded. You can try downloading from that link.
Indirect link. For example: http://www.example.com/directory/download?file=52378. Have you ever downloaded a file from Google Drive? It gives you an indirect link like the one above. We cannot tell whether the link points to a file or a web page, and we don't know the file name or extension, because the link itself is opaque.
I need to check whether it is a file or a web page, and download it only if the content type indicates a file.
So my question:
How do I check the content type from an indirect link?
As suggested in the comments on this question, can HTTP redirects solve the problem?
Thanks for your help.

After you open a URLConnection, the response headers become available. They carry some information about the file, and you can read what you need from them. For example:
URLConnection u = url.openConnection();
// getContentLengthLong() returns -1 when the server sends no Content-Length header,
// whereas Long.parseLong(u.getHeaderField("Content-Length")) would throw in that case
long length = u.getContentLengthLong();
String type = u.getHeaderField("Content-Type");
length is the size of the file in bytes (or -1 if unknown); type is something like application/x-dosexec or application/x-rar.

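Such links send browsers to the actual content using HTTP redirects. To get the correct content type, make sure HttpURLConnection follows the redirects: either call the static HttpURLConnection.setFollowRedirects(true), or call setInstanceFollowRedirects(true) on a single connection. Following redirects is already the default, but HttpURLConnection does not follow a redirect that switches protocol, for example from http to https.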
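A minimal sketch of that idea, assuming the server answers HEAD requests (some do not, in which case a GET is needed). The class name ContentTypeProbe and the helpers probeContentType and isWebPage are made up for this illustration; treating only text/html and application/xhtml+xml as "web page" is also an assumption.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

final class ContentTypeProbe {

    /** Sends a HEAD request, follows redirects, and returns the final Content-Type (may be null). */
    static String probeContentType(String address) throws IOException {
        HttpURLConnection c = (HttpURLConnection) new URL(address).openConnection();
        c.setRequestMethod("HEAD");          // headers only, no body download
        c.setInstanceFollowRedirects(true);  // the default, stated explicitly here
        try {
            c.getResponseCode();             // forces the request to be sent
            return c.getContentType();
        } finally {
            c.disconnect();
        }
    }

    /** Hypothetical rule: HTML media types count as a web page, everything else as a file. */
    static boolean isWebPage(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8" before comparing.
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.equals("text/html") || mediaType.equals("application/xhtml+xml");
    }
}
```

With this, `isWebPage(probeContentType(link))` decides whether to skip the link or download it.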

MimeTypeMap.getFileExtensionFromUrl(url)

This worked for me: use Retrofit to check the response headers. First define an endpoint that takes the URL you want to check:
@GET
suspend fun getContentType(@Url url: String): Response<Unit>
Then you call it like this to get the content type header:
api.getContentType(url).headers()["content-type"]

Related

What URL do I use to open a String object in a web browser

If I have an HTML String object, using Selenium in Java, how can I get the browser to open that String as an HTML page? I have seen this done before, but I don't remember the format the URL needs to have.
For this example, let's say the string is :
<h2>This is a <i>test</i></h2>
I looked through this page and couldn't find the answer but I might be overlooking it. For example I tried this URL and it didn't work for me:
data:<h2>This is a <i>test</i></h2>

Here is the documentation: http://en.wikipedia.org/wiki/Data_URI_scheme. You need to specify the MIME type of the data. Try data:text/html,<h2>This is a <i>test</i></h2>
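If the HTML contains characters that are awkward inside a URL (spaces, #, %), a Base64 data URI avoids escaping problems. The following is only a sketch; the class name DataUri and the method htmlDataUri are invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class DataUri {

    /** Wraps an HTML string in a Base64-encoded data: URI with the text/html MIME type. */
    static String htmlDataUri(String html) {
        String encoded = Base64.getEncoder()
                .encodeToString(html.getBytes(StandardCharsets.UTF_8));
        return "data:text/html;charset=utf-8;base64," + encoded;
    }
}
```

With Selenium, the result can then be passed to the driver, for example `driver.get(DataUri.htmlDataUri("<h2>This is a <i>test</i></h2>"))`, assuming `driver` is an existing WebDriver instance.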

Reading data from URL returning strange characters [duplicate]

This question already has answers here:
JSON URL from StackExchange API returning jibberish?
(3 answers)
Closed 9 years ago.
I am trying to grab the data from a JSON file through Java. If I navigate to the URL using my browser, everything displays fine, but if I try to get the data using Java I get a bunch of characters that cannot be interpreted or parsed. Note that this code works with other JSON files. Could this be a server-side issue with the way the JSON file is created? I tried different character sets, and that did not fix the problem.
public static void main(String[] args) throws Exception {
URL url = new URL("http://www.minecraftpvp.com/api/ping.json");
URLConnection connection = url.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
boolean hasLine = true;
while (hasLine) {
String line = in.readLine();
if (line != null) {
System.out.println(line);
} else {
hasLine = false;
}
}
}
The output I get from this is just a ton of strange characters that make no sense at all, whereas if I change the URL to something like google.com, it works fine.
EDIT: JSON URL from StackExchange API returning jibberish? Seemed to have answered my question. I tried searching before I asked to make sure the answer wasn't here and couldn't find anything. Guess I didn't look hard enough.

Yes that URL is returning gzip encoded content by default.
You can do one of three things:
Explicitly set the Accept-Encoding header in your request. A web service should not return gzip-compressed content unless gzip is listed as an accepted encoding in the request, so this website is not being very friendly. I suspect your browser sends gzip as accepted, which is why the page displays fine there. Set the header to an empty value and, per the spec, the server should return a non-encoded response; your mileage may vary on this one.
Or use the answer to "How to handle non-UTF8 html page in Java?", which shows how to decompress the response. This should be preferred over option 1.
And/or ask the person hosting the service to implement the recommended scheme, which is to send compressed responses only if the client says it can handle them, or if that can be inferred from the browser fingerprint with high confidence.
Best of luck C.

You need to inspect the Content-Encoding header. The URL in question improperly returns gzip-compressed content even when you don't ask for it, and you'll need to run it through a decoder.
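One way to do that is to wrap the response stream in a GZIPInputStream whenever Content-Encoding says gzip. This is a sketch rather than a complete client: the class name GzipAwareReader and its methods are made up here, and it assumes the body is UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

final class GzipAwareReader {

    /** Returns a decompressing stream if the Content-Encoding is gzip, else the raw stream. */
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    /** Reads the whole stream as UTF-8 text, one '\n' after every line. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    /** Fetches a URL and transparently handles a gzip-encoded body. */
    static String fetch(String address) throws IOException {
        URLConnection connection = new URL(address).openConnection();
        return readAll(decode(connection.getInputStream(), connection.getContentEncoding()));
    }
}
```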

Uri.parse(), how to get the encoding correct?

I am writing an application where I have to read a URL from a web page as a String (it's not the address of the page). The URL that I will be reading contains a query string, and I specifically need two query parameters from that URL, so I am using the Uri class available in Android. The problem lies in the encoding/format of the URL and the query. One of the parameters that I need is always a URL. Sometimes that URL is %-encoded and sometimes not.
The URLs can be like the following :
Case 1 :
http://www.example.com/example/example.aspx?file=http%3A%2F%2FXX.XXX.XX.XXX%2FExample.file%3Ftoken%3D9dacfc85
Case 2 :
http://www.example.com/example/example.aspx?file=http://XX.XXX.XX.XXX/Example.file?token=9dacfc85
How do I get the correct Url contained in the file= query?
I am using the following [to accomplish the said work universally] :
Uri.decode(urlString.getQueryParameter("file"));
Is this the correct way to do it?
UPDATE
I have decided to first encode the whole URL regardless of its value and then get the query parameter. Theoretically, it should work.

If you are uncertain about which form of URL you will get, I would suggest decoding every URL you get from the parameter, and encoding it again when you need to use it.
As far as I know, you are doing it right.
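To see why a single decode covers both cases, here is a plain-Java sketch (no Android classes) that extracts a parameter and decodes it exactly once. QueryParams and firstQueryParam are hypothetical names, and the code assumes Java 10 or later for the URLDecoder overload that takes a Charset. As far as I know, Android's Uri.getQueryParameter already returns a decoded value, so a second Uri.decode on top of it is usually unnecessary.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class QueryParams {

    /** Returns the decoded value of the first parameter called name, or null if absent. */
    static String firstQueryParam(String url, String name) {
        int q = url.indexOf('?');
        if (q < 0) {
            return null; // no query string at all
        }
        // Everything after the first '?' is the query; a later '?' stays inside a value.
        for (String pair : url.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                // Decoding an already-plain value leaves it unchanged (unless it has '%' or '+').
                return URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            }
        }
        return null;
    }
}
```

Both Case 1 and Case 2 above yield http://XX.XXX.XX.XXX/Example.file?token=9dacfc85. The unencoded form breaks down if the nested URL itself contains an &, because that character ends the parameter.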

Search for redirected website path

I have two websites, A and B. When I open website A, I am redirected to website B automatically.
Which function can I use to find the full path of website A, the page the redirect came from?
I was trying to start with:
logger.info(request.getPathInfo());
logger.info(request.getPathTranslated());
logger.info(request.getServletPath());
logger.info(request.getLocalName());
logger.info(request.getRemoteAddr());
logger.info(request.getRemoteHost());
logger.info(request.getRequestURI());
logger.info(request.getServerName());
but none of them is correct.
For redirecting I use response.sendRedirect inside Controller.
Thanks for help.

You can try using the optional referer header:
request.getHeader("referer");
But it is important to note that this may not always be populated (specifically IE).
A better solution, if you are in control of both of the websites, is to pass the value somehow when you are doing the redirect. For example, as a GET or POST parameter.
Edit:
As suggested above, you can append query strings to your redirect URL. For example, you might try something like this:
String redirectUrl = "http://my.redirect.com/";
redirectUrl += "?referer=";
redirectUrl += URLEncoder.encode(request.getRequestURL().toString(), "UTF-8");
Then you can just pull this out of the request on the other side.
Use this as a starting point. You may need to manually append other query parameters that may not be part of the getRequestURL() output.

None of these would get you the page that redirected you to the current page. What you can try is:
String refererPage = request.getHeader("referer");
However keep in mind that this is also browser dependent and may not always be present.

Try this
request.getHeader("referer");

Please try
request.getHeader("referer");

Java: How to easily check if a URL was already shortened?

If I have a general url (not restricted to twitter or google) like this:
http://t.co/y4o14bI
is there an easy way to check if this url is shortened?
In the above case, I as a human can of course see that it was shortened, but is there an automatic and elegant way?

You could do a request to the URL, look if you get redirected and if so, assume it's a shortening service. For this you'd have to read the HTTP status codes.
On the other hand, you could whitelist some URL shortening services (t.co, bit.ly, and so on) and assume all links to those domains are shortened.
Drawback of the first method is that it isn't certain, some sites use redirects internally. The drawback of the second method is that you'd have to keep adding shortening services, although only a few are used widely.

One signal may be to request the URL and see if it results in a redirect to another domain. However, without a good definition of what "shortened" means, there is no generic way.

If you know all the domains that can be used to shorten your URLs, check whether the URL starts with one of them:
String[] domains = {"bit.ly", "t.co"...};
for(String domain : domains){
if(url.startsWith("http://" + domain)){
return true;
}
}
return false;
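A prefix test like the one above misses https links and would also match an unrelated host such as t.company.com. A slightly stricter sketch compares the parsed host name instead. The class name ShortenerList and the three listed domains are only illustrative, and Set.of assumes Java 9 or later.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

final class ShortenerList {

    // Illustrative list only; a real one would need to be maintained.
    private static final Set<String> KNOWN_SHORTENERS = Set.of("bit.ly", "t.co", "tinyurl.com");

    /** Returns true if the URL's host is exactly one of the known shortener domains. */
    static boolean isKnownShortener(String url) {
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return false; // relative or otherwise host-less input
            }
            host = host.toLowerCase(Locale.ROOT);
            if (host.startsWith("www.")) {
                host = host.substring(4);
            }
            return KNOWN_SHORTENERS.contains(host);
        } catch (URISyntaxException e) {
            return false; // not a parseable URL
        }
    }
}
```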

You can't: You will have to work by assumption.
Assumption:
Does www exist in url.
Does the server name end with a valid domain (e.g. com, edu, etc.), or does it have co.xx, where xx is a valid country or organization code?
You can add more assumptions based on other URL-shortening links.

You can't.
You can only check if you list a couple of shorteners and check if the url starts with it.
You can also try checking whether the url is shorter than a given length (and contains path/query string), but some shorteners (tinyurl for example) may have longer urls than normal sites (aol.com)
I would prefer the list of known shorteners.

Here's what you could do in Java, groovy and the like.
Get the url you want to test;
Open the url with HttpURLConnection
Check the response code
If it is a valid code, 200 for example, then you can retrieve the URL string from the connection object: in long form if it was shortened, or in its original form if it wasn't.
We all love to see some code, don't we? It's crude, but hey!
String addr = "http://t.co/y4o14bI";
URL url = new URL(addr);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
if (connection.getResponseCode() == 200) {
// getURL() reflects the final URL after any redirects that were followed
String longUrl = connection.getURL().toString();
System.out.println(longUrl);
} else {
// You decide what you want to do here!
}

Actually, you as a human can't. The only way you know that it's shortened is that it's a t.co domain. The y4o14bI could be a CMS identifier for all you know.
The best way would be to use a list of known shortener urls, and lookup against that.
And even then you would have problems. I use bit.ly with a personal domain, wtn.gd
So http://wtn.gd/random would also be a shortened URL.
You could maybe do a HTTP HEAD-request, and check for a 301/302 ?
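A rough sketch of that HEAD-request check, with automatic redirect following switched off so the 301/302 itself can be observed. RedirectProbe, isRedirect and redirectTarget are names invented for this example, and counting 303, 307 and 308 as redirects as well is an assumption beyond the 301/302 mentioned above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class RedirectProbe {

    /** True for the HTTP status codes that carry a Location header pointing elsewhere. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Returns the Location header if the URL answers with a redirect, otherwise null. */
    static String redirectTarget(String address) throws IOException {
        HttpURLConnection c = (HttpURLConnection) new URL(address).openConnection();
        c.setRequestMethod("HEAD");            // no body needed
        c.setInstanceFollowRedirects(false);   // look at the redirect instead of following it
        try {
            return isRedirect(c.getResponseCode()) ? c.getHeaderField("Location") : null;
        } finally {
            c.disconnect();
        }
    }
}
```

A non-null result is only a hint, since ordinary sites also redirect (for example from http to https).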

If you request a URL like this, your HttpClient should receive an HTTP redirect instead of an HTML page. This wouldn't be proof, but at least a hint.

Evaluate the URL and look for some clues:
the Path meets certain criteria
only has one step (i.e. not multiple slashes)
does not end with filename extensions
not longer than X characters (would need to evaluate various URL shortening services and adjust the upper bounds for the max token length)
HttpUrlConnection returns a redirect responseCode (i.e. 301, 302)

I would suggest using android.util.Patterns.WEB_URL
public static List<String> findUrls(String input) {
List<String> links = new ArrayList<>();
Matcher m = android.util.Patterns.WEB_URL.matcher(input);
while (m.find()) {
String url = m.group();
links.add(url);
}
return links;
}

Use the unshorten URL service like https://unshorten.me
They have an API as well https://unshorten.me/api
If the URL is shortened it will return the original URL.
If not you will get the same one back.
