How to validate the URL using regex in Java?

I need to check whether a URL is valid. The URL should contain certain subdirectories, for example:
example.com/test/test1/example/a.html
The URL should contain subdirectories test, test1 and example. How can I check if the URL is valid using regex in Java?

String url = "example.com/test/test1/example/a.html";
List<String> parts = Arrays.asList(url.split("/"));
return (parts.contains("test") && parts.contains("test1") && parts.contains("example"));

Since you want to do it with a regex, how about this:
Pattern p = Pattern.compile("example\\.com/test/test1/example/[\\w\\W]*");
System.out.println("OK: " + p.matcher("example.com/test/test1/example/a.html").find());
System.out.println("KO: " + p.matcher("example.com/test/test2/example/a.html").find());
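One thing to watch with the snippet above: find() only needs the pattern to occur somewhere in the input. A minimal sketch using matches(), which must consume the whole string, is below. The class name PathPrefixCheck and the exact rule (optional http/https scheme, any host, then the three directories and a non-empty file name) are assumptions made for illustration, not something stated in the question.

```java
import java.util.regex.Pattern;

final class PathPrefixCheck {
    // Assumed rule: optional scheme, any host, then /test/test1/example/ and a non-empty rest.
    private static final Pattern REQUIRED =
            Pattern.compile("(?:https?://)?[^/]+/test/test1/example/.+");

    static boolean isValid(String url) {
        // matches() anchors the pattern to the entire input, unlike find().
        return REQUIRED.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("example.com/test/test1/example/a.html")); // true
        System.out.println(isValid("example.com/test/test2/example/a.html")); // false
    }
}
```

Because the host part is [^/]+, the required directories have to come directly after the host, so a URL that merely contains them deeper in the path is rejected.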

You can simply pass your URL as an argument to the java.net.URL(String) constructor and check whether the constructor throws java.net.MalformedURLException. Note that this only checks the syntax, and that a string without a protocol, such as example.com/test, is rejected.
EDIT: If, however, you simply want to check whether a given string contains a given substring, use the String.contains(CharSequence) method. For example:
String url = "example.com/test/test1/example/a.html";
if (url.contains("/test/test1/")) {
// ...
}
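The constructor check described in this answer could be wrapped roughly like this. The class and method names are made up for the example, and the @SuppressWarnings is only there because the URL(String) constructor is deprecated in recent JDKs.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class UrlSyntaxCheck {
    @SuppressWarnings("deprecation") // URL(String) is deprecated in recent JDKs but still available
    static boolean parses(String candidate) {
        try {
            new URL(candidate); // throws if the string has no known protocol or is otherwise malformed
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("http://example.com/test/test1/example/a.html")); // true
        System.out.println(parses("example.com/test/test1/example/a.html"));        // false, no protocol
    }
}
```

This says nothing about which directories the URL contains, so it would have to be combined with one of the path checks from the other answers.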

This question is answered here using regular expressions:
Regular expression to match URLs in Java
But you can also use the Apache Commons Validator library, which provides tested validators, instead of writing your own.
Here is the library:
http://commons.apache.org/validator/
And here is the Javadoc of the UrlValidator:
http://commons.apache.org/validator/apidocs/org/apache/commons/validator/UrlValidator.html

Related

Getting file extension from http url using Java

I know about FilenameUtils.getExtension() from Apache Commons IO.
But in my case I'm processing extensions from http(s) URLs, so if I have something like
https://your_url/logo.svg?position=5
this method returns svg?position=5
Is there a good way to handle this situation, without writing the logic myself?
You can use the java.net.URL class from the Java standard library. It has several useful accessors for cases like this. You can do something like this:
String url = "https://your_url/logo.svg?position=5";
URL fileIneed = new URL(url);
Then you have several getter methods on the "fileIneed" variable. In your case "getPath()" returns this:
fileIneed.getPath() ---> "/logo.svg"
Then pass that to the Apache method you are already using, and you will get the "svg" string.
FilenameUtils.getExtension(fileIneed.getPath()) ---> "svg"
Java URL class docs:
https://docs.oracle.com/javase/7/docs/api/java/net/URL.html
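If pulling in the Apache library is not an option, the same idea can be sketched with the standard library only. This version uses java.net.URI instead of URL to avoid the checked exception; the class name UrlExtension and the example.com host are placeholders, and returning an empty string when there is no extension is a choice made for the example.

```java
import java.net.URI;

final class UrlExtension {
    static String extensionOf(String url) {
        String path = URI.create(url).getPath();       // query string and fragment are not part of the path
        if (path == null) {
            return "";                                 // opaque URIs such as mailto: have no path
        }
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("https://example.com/logo.svg?position=5")); // svg
    }
}
```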
If you want a brandname® solution, then consider using the Apache method after stripping off the query string, if it exists:
String url = "https://your_url/logo.svg?position=5";
url = url.replaceAll("\\?.*$", "");
String ext = FilenameUtils.getExtension(url);
System.out.println(ext);
If you want a one-liner which does not even require an external library, then consider this option using String#replaceAll:
String url = "https://your_url/logo.svg?position=5";
String ext = url.replaceAll(".*/[^.]+\\.([^?]+)\\??.*", "$1");
System.out.println(ext);
svg
Here is an explanation of the regex pattern used above:
.*/        match everything up to, and including, the LAST path separator
[^.]+      then match one or more non-dot characters, i.e. the filename
\.         match a dot
([^?]+)    match AND capture one or more non-? characters, i.e. the extension
\??.*      match an optional ? followed by the rest of the query string, if present
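One caveat with the replaceAll one-liner is that it hands back the whole input unchanged when the pattern does not match. A small sketch that applies the same pattern through a Matcher and reports "no extension" explicitly is below; the class name and the use of Optional are choices made for this example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExtensionRegex {
    // Same pattern as in the answer above.
    private static final Pattern EXT = Pattern.compile(".*/[^.]+\\.([^?]+)\\??.*");

    static Optional<String> extension(String url) {
        Matcher m = EXT.matcher(url);
        // group(1) is the captured extension; empty when the pattern does not apply.
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extension("https://your_url/logo.svg?position=5")); // Optional[svg]
        System.out.println(extension("https://your_url/logo"));                // Optional.empty
    }
}
```

The pattern can still be fooled when the last path segment has no dot but the host does, for example https://example.com/logo, so parsing the URL first is the safer route for arbitrary input.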

Replace path url placeholder format

What's the best way to replace a path url placeholder. I have the following that needs to be replaced
/user/:name/password/:password
as
/user/{name}/password/{password}
Is there a library that could do this for me in Java?
Since the format is so simple, and a literal : is unlikely to appear anywhere else in such a path, I would just use a basic regex matching : followed by a word, capturing the word so it can be reused in the replacement.
"/user/:name/password/:password".replaceAll(":(\\w+)","{$1}")
String#replaceAll alone can do this.
"/user/:name/password/:password".replaceAll(":(\\w+)","{$1}")
Did you try using replaceAll like this:
String str = "/user/:name/password/:password";
String result = str.replaceAll(":(\\w+)", "{$1}");
Output
/user/{name}/password/{password}
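If the template might also carry a scheme or a port, a slightly stricter variant of the same replaceAll call only rewrites a : that starts a path segment. This is just a sketch; the class name and the lookbehind are additions for illustration, not part of the answers above.

```java
final class PlaceholderFormat {
    static String toBraces(String template) {
        // (?<=/) requires a slash right before the colon, so "http:" and ":8080" are left alone.
        return template.replaceAll("(?<=/):(\\w+)", "{$1}");
    }

    public static void main(String[] args) {
        System.out.println(toBraces("/user/:name/password/:password"));
        // /user/{name}/password/{password}
        System.out.println(toBraces("http://host:8080/user/:id"));
        // http://host:8080/user/{id}
    }
}
```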

find the path param using regex in the url

What is the regular expression to find the path param in a URL?
http://localhost:8080/domain/v1/809pA8
https://localhost:8080/domain/v1/809pA8
I want to retrieve the value (809pA8) from the above URLs using a regular expression, preferably in Java.
I would suggest you do something like
url.substring(url.lastIndexOf('/') + 1);
If you really prefer regexps, you could do
Matcher m = Pattern.compile("/([^/]+)$").matcher(url);
if (m.find())
value = m.group(1);
I would try:
String url = "http://localhost:8080/domain/v1/809pA8";
String value = String.valueOf(url.subSequence(url.lastIndexOf('/'), url.length()-1));
No need for regex here, I think.
EDIT: Sorry, I made a mistake. The code above keeps the slash and drops the last character. Corrected version:
String url = "http://localhost:8080/domain/v1/809pA8";
String value = String.valueOf(url.subSequence(url.lastIndexOf('/')+1, url.length()));
See this code working here: https://ideone.com/E30ddC
For your simple case, regex is overkill, as others noted. But if you have more cases, and this is why you prefer regex, have a look at Spring's AntPathMatcher#extractUriTemplateVariables, if you're using Spring. It's better equipped for extracting path variables than a raw regex.
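Without Spring, a regex with a named group gets reasonably close for a fixed route. The sketch below assumes the route is always /domain/v1/ followed by the value; the class name and the group name id are made up for the example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PathParam {
    // Assumed route: <scheme>://<host>/domain/v1/<id>, with an optional query or fragment after it.
    private static final Pattern ROUTE =
            Pattern.compile("^https?://[^/]+/domain/v1/(?<id>[^/?#]+)");

    static Optional<String> id(String url) {
        Matcher m = ROUTE.matcher(url);
        return m.find() ? Optional.of(m.group("id")) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(id("http://localhost:8080/domain/v1/809pA8"));  // Optional[809pA8]
        System.out.println(id("https://localhost:8080/domain/v1/809pA8")); // Optional[809pA8]
    }
}
```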

How do I split the rest of the URL from the last path of it

I have this file URL: http://xxx.xxx.xx.xx/resources/upload/2014/09/02/new sample.pdf which will be converted to http://xxx.xxx.xx.xx/resources/upload/2014/09/02/new%20sample.pdf later.
Now I can get the last path by:
public static String getLastPathFromUrl(String url) {
return url.replaceFirst(".*/([^/?]+).*", "$1");
}
which will give me new sample.pdf
but how do I get the rest of the URL, i.e. http://xxx.xxx.xx.xx/resources/upload/2014/09/02/ ?
An easier way to get the last path from the URL is to use the String.split function, like this:
String url = "http://xxx.xxx.xx.xx/resources/upload/2014/09/02/new sample.pdf";
String[] urlArray = url.split("/");
String lastPath = urlArray[urlArray.length-1];
This converts your URL into an array, which can then be used in many ways. To get the URL without the last path, you could join the array elements back together, or use lastIndexOf() and substring like this (the + 1 keeps the trailing slash):
String restOfUrl = url.substring(0, url.lastIndexOf("/") + 1);
PS: Although you can learn something by doing this, I think your best option is to replace the space with %20 in the complete URL string. That would be the fastest and would make the most sense.
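Both halves can be taken in one pass with lastIndexOf, roughly as sketched here. The class and method names are placeholders chosen for the example.

```java
final class UrlSplit {
    // Returns { everything up to and including the last '/', everything after it }.
    static String[] splitAtLastSlash(String url) {
        int cut = url.lastIndexOf('/') + 1;
        return new String[] { url.substring(0, cut), url.substring(cut) };
    }

    public static void main(String[] args) {
        String[] parts = splitAtLastSlash("http://xxx.xxx.xx.xx/resources/upload/2014/09/02/new sample.pdf");
        System.out.println(parts[0]); // http://xxx.xxx.xx.xx/resources/upload/2014/09/02/
        System.out.println(parts[1]); // new sample.pdf
    }
}
```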
I am not sure I understood correctly, but when you say
I have this file URL: URL/new sample.pdf which will be converted to URL/new%20sample.pdf later.
it looks like you are trying to replace the space with %20, or in simple words, to take care of unwanted characters in the URL. If that is what you need, use the built-in
URLEncoder.encode(String s, String enc). You can use UTF-8 as the encoding. Note that URLEncoder implements form encoding: it turns a space into + rather than %20 and also escapes / and :, so it should be applied to individual values rather than to a whole URL.
http://docs.oracle.com/javase/7/docs/api/java/net/URLEncoder.html
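To see the difference between form encoding and path encoding, here is a small comparison. The multi-argument java.net.URI constructor percent-encodes characters that are illegal in a path, which is one way to get %20. The class and method names are made up, and example.com stands in for the real host.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

final class SpaceEncoding {
    // Lets URI assemble the parts and quote illegal path characters (space becomes %20).
    static String encodePath(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Form encoding as done by URLEncoder (space becomes +).
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(encodePath("http", "example.com", "/resources/upload/2014/09/02/new sample.pdf"));
        // http://example.com/resources/upload/2014/09/02/new%20sample.pdf
        System.out.println(formEncode("new sample.pdf"));
        // new+sample.pdf
    }
}
```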
If you really need to split it, and assuming you are interested in the part of the URL after http://, remove http:// and store the remainder in a string variable called, say, remainingURL. Then use
List<String> myList = new ArrayList<>(Arrays.asList(remainingURL.split("/")));
You can iterate over myList to get the remaining URL fragments.
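I've found it (a hack: java.io.File is meant for filesystem paths, so depending on the platform it may normalize the // after http: or change the separators, and replaceAll treats the file name as a regex):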
File file=new File("http://xxx.xxx.xx.xx/resources/upload/2014/09/02/new sample.pdf");
System.out.println(file.getPath().replaceAll(file.getName(),""));
Output:
http://xxx.xxx.xx.xx/resources/upload/2014/09/02/
Spring solution:
List<String> pathSegments = UriComponentsBuilder.fromUriString(url).build().getPathSegments();
String lastPath = pathSegments.get(pathSegments.size()-1);

Regular expression Hostname

I am developing an HTTP robot, and I wrote this regular expression
(((?:f|ht)tp(?:s)?\\://)?|www)([^/]+) to detect and extract the hostname from a link (href).
Here are the results of my tests:
String -> http://www.meloteca.com/empresas-editoras.htm
Returns http://www.meloteca.com
String -> www.meloteca.com/empresas-editoras.htm
Returns www.meloteca.com
String -> /empresas-editoras.htm
Returns empresas-editoras.htm (without the slash)
In this case I expected the regular expression not to return any value. Why is this happening?
The same thing happens if I try the following string:
String -> empresas-editoras.htm
Returns empresas-editoras.htm
The code snippet:
Pattern padrao = Pattern.compile("(((?:f|ht)tp(?:s)?\\://)?|www)([^/]+)");
Matcher mat = padrao.matcher("empresas-editoras.htm");
if(mat.find())
System.out.println("Host->"+mat.group());
It'd be better to use the URI class, and its methods like getHost() and getPath(), rather than a regular expression. The rules for constructing URIs are more complex than you probably realize, and your regex is likely to have lots of corner cases that won't be handled correctly.
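A minimal sketch of the URI approach, using the test strings from the question, might look like this. The class and method names are placeholders.

```java
import java.net.URI;

final class HostByUri {
    static String hostOf(String link) {
        // getHost() returns null for relative references, which have no authority part.
        return URI.create(link).getHost();
    }

    public static void main(String[] args) {
        System.out.println(hostOf("http://www.meloteca.com/empresas-editoras.htm")); // www.meloteca.com
        System.out.println(hostOf("/empresas-editoras.htm"));                        // null
        System.out.println(hostOf("www.meloteca.com/empresas-editoras.htm"));        // null
    }
}
```

Note the last case: without a scheme, the string is parsed as a relative path, so a link like that would have to be resolved against the page's base URL before asking for the host.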
If you remove one of the question marks, like this:
(((?:f|ht)tp(?:s)?\\://)|www)([^/]+)
it should work better.
The alternative ((?:f|ht)tp(?:s)?\\://)? is optional, so it can be the empty string, and then ([^/]+) just will match any string not containing /.
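Running the question's test strings against the pattern with the question mark removed shows the difference. This is only a quick check; the class and method names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HostByRegex {
    // The scheme (or "www") is now mandatory, so plain file names no longer match.
    private static final Pattern HOST =
            Pattern.compile("(((?:f|ht)tp(?:s)?\\://)|www)([^/]+)");

    static String hostOf(String link) {
        Matcher m = HOST.matcher(link);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(hostOf("http://www.meloteca.com/empresas-editoras.htm")); // http://www.meloteca.com
        System.out.println(hostOf("www.meloteca.com/empresas-editoras.htm"));        // www.meloteca.com
        System.out.println(hostOf("/empresas-editoras.htm"));                        // null
        System.out.println(hostOf("empresas-editoras.htm"));                         // null
    }
}
```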
