fetch domain name from url - java

I have an example url:
www.google.com
I would like to fetch only "com" from this URL, but I have no idea how to do this :(
Maybe somebody has struggled with this problem and found a solution?
We have to keep in mind that the example can be more advanced, for example
www.mydomain.com.pl
and from this we have to fetch "com.pl".
Maybe there is a library that can deal with this very easily...

Use Guava:
import com.google.common.net.InternetDomainName;
import java.net.URI;
URI uri = URI.create("http://www.mydomain.com.pl");
InternetDomainName domainName = InternetDomainName.from(uri.getHost());
System.out.println(domainName.publicSuffix()); // com.pl
You cannot do this correctly without referencing the Public Suffix List (which Guava does).

Related

Java - Retain only the Rest API base URL of a Rest API

Small question regarding how to use java to retain only the base URL of a rest API please.
As input, many strings, all valid rest APIs.
For instance, the inputs:
https://some-host.com/v1/someapi
https://another-host.fr/api/compute
https://somewhere.host.com/public/api/v3/getsomething
I would like to retain only the first part of each URL (shown in bold in the original post): basically the https, the : and the slashes, and the host name. Everything that comes after the host should be discarded.
Currently, I am trying some kind of string.split based on the / character, then trying to re-concat the arrays, but I have a feeling I am not going to the right direction.
What would be the most appropriate way please?
Thank you.
You could just try java.net.URL or java.net.URI. They behave pretty similarly.
For example:
URL url = new URL("http://example.com/a/b/c");
url.getProtocol(); // "http"
url.getHost();     // "example.com"
url.getPath();     // "/a/b/c"
or:
URI uri = new URI("http://example.com/a/b/c");
uri.getScheme();   // "http"
uri.getHost();     // "example.com"
uri.getPath();     // "/a/b/c"
There are several methods in both classes to extract lots of different parts.
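To connect this back to the question, here is a minimal sketch of rebuilding the base URL from the scheme and host. The class and method names (`BaseUrlExample`, `baseUrl`) are made up for illustration, and the sketch assumes the port and any user info can be dropped; use `getAuthority()` instead of `getHost()` if they must be kept.

```java
import java.net.URI;

final class BaseUrlExample {

    // Hypothetical helper: keeps only scheme + "://" + host and discards
    // the path, query and fragment. The port is dropped as well (assumption).
    static String baseUrl(String input) {
        URI uri = URI.create(input); // throws IllegalArgumentException on bad input
        return uri.getScheme() + "://" + uri.getHost();
    }

    public static void main(String[] args) {
        System.out.println(baseUrl("https://some-host.com/v1/someapi"));
        System.out.println(baseUrl("https://another-host.fr/api/compute"));
        System.out.println(baseUrl("https://somewhere.host.com/public/api/v3/getsomething"));
    }
}
```

This avoids splitting on / by hand, so extra path segments or query strings do not need special handling.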

Uri.parse(), how to get the encoding correct?

I am writing an application where I have to read a URL from a webpage as a String (it's not the address of the page). The URL that I will be reading contains a query string, and I specifically need two query parameters from that URL. So I am using the Uri class available in Android. Now, the problem lies in the encoding/format of the URL and the query. One of the parameters that I need is always a URL. Sometimes that URL is %-encoded and sometimes not.
The URLs can be like the following :
Case 1 :
http://www.example.com/example/example.aspx?file=http%3A%2F%2FXX.XXX.XX.XXX%2FExample.file%3Ftoken%3D9dacfc85
Case 2 :
http://www.example.com/example/example.aspx?file=http://XX.XXX.XX.XXX/Example.file?token=9dacfc85
How do I get the correct Url contained in the file= query?
I am using the following [to accomplish the said work universally] :
Uri.decode(urlString.getQueryParameter("file"));
Is this the correct way to do it?
UPDATE
I have decided to first encode the whole URL regardless of its value and then get the query parameter. Theoretically, it should work.
If you are uncertain about the type of URL you will get, then I would suggest decoding every URL you get from the parameter. When you need to use it, you can encode it again.
As per my knowledge, you are doing it right.
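For illustration, here is a plain-Java sketch (not the Android Uri API) showing that both cases from the question end up as the same string once the raw parameter value is decoded. The names `QueryParamExample` and `queryParam` are hypothetical. The sketch assumes the nested URL contains no unencoded `&`, `+` or stray `%`, since those would be split or altered by this approach. As far as I know, Android's `Uri.getQueryParameter` already returns a decoded value, so the extra `Uri.decode` in the question should only matter for values that were encoded twice.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

final class QueryParamExample {

    // Hypothetical helper: finds the first "name=value" pair in the raw query
    // and returns the percent-decoded value, or null if the name is absent.
    static String queryParam(String url, String name) {
        String rawQuery = URI.create(url).getRawQuery();
        if (rawQuery == null) {
            return null;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.startsWith(name + "=")) {
                String rawValue = pair.substring(name.length() + 1);
                try {
                    // Decoding text that contains no escapes leaves it unchanged.
                    return URLDecoder.decode(rawValue, "UTF-8");
                } catch (UnsupportedEncodingException e) {
                    throw new IllegalStateException(e); // UTF-8 is always available
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String encoded = "http://www.example.com/example/example.aspx"
                + "?file=http%3A%2F%2FXX.XXX.XX.XXX%2FExample.file%3Ftoken%3D9dacfc85";
        String plain = "http://www.example.com/example/example.aspx"
                + "?file=http://XX.XXX.XX.XXX/Example.file?token=9dacfc85";
        System.out.println(queryParam(encoded, "file"));
        System.out.println(queryParam(plain, "file"));
    }
}
```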

Not too sure how a URI works regarding absolute paths to files

Simple question: why am I getting new IllegalArgumentException: Path component should be '/' when trying to create a zip filesystem at the following URI:
file:E:/somedirectory/somefile
But this seems to work: file:/somedirectory/somefile
What if I have the same paths on two different drives and I need to access a specific one? Or am I completely missing the point of URIs in the first place?
For paths that use windows volumes use the following format:
file:///e:/somedirectory/somefile
The triple /// results from omitting the URL hostname for local files. Compare: file://someotherhost/e:/somedirectory/somefile, which is valid according to the URI spec, if not actually useful for accessing files on remote volumes.
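One way to see the difference between the two forms is to ask java.net.URI how it parses them. This is only a sketch: the class name `FileUriExample` and the helper `pathOf` are made up for illustration, and the sketch does not claim to show exactly where the zip filesystem raises its exception. It shows that file:E:/... is parsed as an opaque URI with no path component, while file:///e:/... has a proper path.

```java
import java.net.URI;

final class FileUriExample {

    // Hypothetical helper: returns the path component, or null when the URI
    // is opaque (its scheme-specific part does not start with '/').
    static String pathOf(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String noSlashes = "file:E:/somedirectory/somefile";
        String tripleSlash = "file:///e:/somedirectory/somefile";

        System.out.println(URI.create(noSlashes).isOpaque());   // true
        System.out.println(pathOf(noSlashes));                  // null
        System.out.println(URI.create(tripleSlash).isOpaque()); // false
        System.out.println(pathOf(tripleSlash));                // /e:/somedirectory/somefile
    }
}
```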
1. Backslashes are used to point to directories and files
2. Try it this way...
`E:\\somedirectory\\somefile`
Maybe it's easier to do it with the URI builder. I always use it:
// Assumes URIBuilder from Apache HttpClient (org.apache.http.client.utils.URIBuilder)
URIBuilder builder = new URIBuilder();
builder.setScheme("file").setHost("anyhost").setPath("/yourpath/");
URI uri = builder.build(); // throws URISyntaxException
You can check your URI:
System.out.println(uri.toString());
I hope this will help you!

How to retrieve text from url using regex in Android

I have a url and I am trying to extract the text before the third slash. I am quite new to the concept in Android. I believe the Pattern class is used to achieve this. My problem is how to.
Take for instance: http://name.mywebsite.com/images.... I only require everything before images. Could anyone point me in the right direction?
You can use the URI method getHost in Java. This example will help you:
URI uri = new URI("http://name.mywebsite.com/images");
String host = uri.getHost();
/* It returns name.mywebsite.com */
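Since the question asks about Pattern specifically, here is a regex-based sketch as well. The class name, method name and the pattern itself are illustrative assumptions, not something from the answer above. The pattern matches a scheme, the "://" separator, and then everything up to (but not including) the next slash.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrefixRegexExample {

    // Scheme, then "://", then one or more characters that are not '/'.
    private static final Pattern PREFIX =
            Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]*://[^/]+");

    // Hypothetical helper: returns the text before the third slash,
    // or null if the input does not look like scheme://host/...
    static String beforeThirdSlash(String url) {
        Matcher m = PREFIX.matcher(url);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(beforeThirdSlash("http://name.mywebsite.com/images"));
    }
}
```

Unlike getHost, this keeps the scheme in the result. The URI approach is usually the more robust choice when the input might contain a port, user info or unusual characters.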

Java: How to easily check if a URL was already shortened?

If I have a general url (not restricted to twitter or google) like this:
http://t.co/y4o14bI
is there an easy way to check if this url is shortened?
In the above case, I as a human can of course see that it was shortened, but is there an automatic and elegant way?
You could do a request to the URL, look if you get redirected and if so, assume it's a shortening service. For this you'd have to read the HTTP status codes.
On the other hand, you could whitelist some URL shortening services (t.co, bit.ly, and so on) and assume all links to those domains are shortened.
Drawback of the first method is that it isn't certain, some sites use redirects internally. The drawback of the second method is that you'd have to keep adding shortening services, although only a few are used widely.
One signal may be to request the URL and see if it results in a redirect to another domain. However, without a good definition of what "shortened" means, there is no generic way.
If you know all the domains that can be used to shorten your URLs, check whether the URL starts with one of them:
String[] domains = {"bit.ly", "t.co" /* ... */};
for (String domain : domains) {
    if (url.startsWith("http://" + domain)) {
        return true;
    }
}
return false;
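A variation on the same idea, sketched under the assumption that comparing the parsed host is acceptable: it also covers https links and avoids matching hosts such as t.co.example.com, which a plain prefix check would accept. The class name `KnownShorteners` and the two-entry host list are illustrative only; a real list would need to be longer and kept up to date.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class KnownShorteners {

    // Illustrative list only; extend it with the services you care about.
    private static final Set<String> HOSTS =
            new HashSet<>(Arrays.asList("bit.ly", "t.co"));

    // Returns true when the URL's host is exactly one of the known shorteners.
    static boolean isKnownShortener(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return false; // not parseable as a URI
        }
        return host != null && HOSTS.contains(host.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isKnownShortener("http://t.co/y4o14bI"));
        System.out.println(isKnownShortener("http://t.co.example.com/y4o14bI"));
    }
}
```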
You can't: you will have to work by assumption.
Assumptions:
Does www exist in the URL?
Does the server name end with a valid domain (e.g. com, edu, etc.), or does it have co.xx, where xx is a valid country or organization code?
And you can add more assumptions based on other URL shortening links.
You can't.
You can only check if you list a couple of shorteners and check if the url starts with it.
You can also try checking whether the url is shorter than a given length (and contains path/query string), but some shorteners (tinyurl for example) may have longer urls than normal sites (aol.com)
I would prefer the list of known shorteners.
Here's what you could do in Java, Groovy and the like.
Get the URL you want to test;
Open the URL with HttpURLConnection
Check the response code
If it is a valid code, 200 for example, then you can retrieve the URL string in long form from the connection object if it was shortened, or back in its original form if it wasn't.
We all love to see some code, don't we? It's crude, but hey!
String addr = "http://t.co/y4o14bI";
URL url = new URL(addr);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
// Redirects are followed by default (though not from http to https),
// so after the response arrives getURL() reflects the final address.
if (connection.getResponseCode() == 200) {
    String longUrl = connection.getURL().toString();
    System.out.println(longUrl);
} else {
    // You decide what you want to do here!
}
Actually, you as a human can't. The only way you know that it's shortened is that it's a t.co domain. The y4o14bI could be a CMS identifier for all you know.
The best way would be to use a list of known shortener urls, and lookup against that.
And even then you would have problems. I use bit.ly with a personal domain, wtn.gd
So http://wtn.gd/random would also be a shortened URL.
You could maybe do an HTTP HEAD request and check for a 301/302?
If you request a URL like this, your HttpClient should receive an HTTP redirect instead of an HTML page. This wouldn't be proof, but at least a hint.
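A rough sketch of the HEAD-request idea using HttpURLConnection with automatic redirects switched off. The class and method names are hypothetical. The answers above only mention 301 and 302; treating 303, 307 and 308 as redirects too is an assumption added here. As noted, a redirect is only a hint, not proof that the link came from a shortener.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class RedirectProbe {

    // 301/302 come from the answers above; 303, 307 and 308 are an added assumption.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Sends a HEAD request without following redirects and returns the
    // Location header if the server answered with a redirect, else null.
    static String redirectTarget(String address) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(address).openConnection();
        connection.setInstanceFollowRedirects(false);
        connection.setRequestMethod("HEAD");
        try {
            return isRedirect(connection.getResponseCode())
                    ? connection.getHeaderField("Location")
                    : null;
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Requires network access; the result depends on the remote server.
        System.out.println(redirectTarget("http://t.co/y4o14bI"));
    }
}
```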
Evaluate the URL and look for some clues:
the Path meets certain criteria
only has one step (i.e. not multiple slashes)
does not end with filename extensions
not longer than X characters (would need to evaluate various URL shortening services and adjust the upper bounds for the max token length)
HttpUrlConnection returns a redirect responseCode (i.e. 301, 302)
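The path-based clues above could be sketched roughly as follows. The class name, method name and the upper bound of 10 characters are all assumptions for illustration; the right bound would have to be tuned against real shortening services. The redirect check is left out here, and false positives are expected (a short ordinary path such as /images passes all three path checks).

```java
import java.net.URI;

final class ShortUrlHeuristics {

    // Assumed upper bound for the token length ("X" in the list above).
    static final int MAX_TOKEN_LENGTH = 10;

    // Returns true when the path is a single step, has no filename
    // extension, and is no longer than MAX_TOKEN_LENGTH characters.
    static boolean looksShortened(String url) {
        String path;
        try {
            path = URI.create(url).getPath();
        } catch (IllegalArgumentException e) {
            return false; // not parseable as a URI
        }
        if (path == null || path.length() < 2 || path.charAt(0) != '/') {
            return false; // no path token at all
        }
        String token = path.substring(1);
        return token.indexOf('/') < 0          // only one step
                && token.indexOf('.') < 0      // no filename extension
                && token.length() <= MAX_TOKEN_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(looksShortened("http://t.co/y4o14bI"));
        System.out.println(looksShortened("http://example.com/a/b/c"));
    }
}
```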
I would suggest using android.util.Patterns.WEB_URL:
public static List<String> findUrls(String input) {
    List<String> links = new ArrayList<>();
    Matcher m = android.util.Patterns.WEB_URL.matcher(input);
    while (m.find()) {
        String url = m.group();
        links.add(url);
    }
    return links;
}
Use the unshorten URL service like https://unshorten.me
They have an API as well https://unshorten.me/api
If the URL is shortened it will return the original URL.
If not you will get the same one back.
