Pattern matching and getting multiple values from a URL using Java

I am using Java 8. I would like to check whether a URL is valid based on a pattern.
If it is valid, I should extract the attributes bookId, authorId, category and mediaId.
Pattern: <basepath>/books/<bookId>/author/<authorId>/<isbn>/<category>/<mediaId>/<filename>
Here is a sample URL:
URL => https://<baseurl>/v1/files/library/books/1234-4567/author/56784589/32475622347586/media/324785643257567/507f1f77bcf86cd799439011_400.png
Here the base path is /v1/files/library.
I have seen some pattern-matching examples, but I couldn't relate them to my use case; I am probably not very good at regex. I am also using apache-common-utils, but I am not sure how to achieve it with that either.
Any help or hint would be much appreciated.

Try this solution (it uses named capture groups in the regex):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Each named group (?<name>...) captures one path segment; [^/]+ stops at the next slash
        Pattern p = Pattern.compile("https?:.+/books/(?<bookId>[^/]+)/author/(?<authorId>[^/]+)/(?<isbn>[^/]+)/media/(?<mediaId>[^/]+)/(?<filename>.+)");
        Matcher m = p.matcher("https://<baseurl>/v1/files/library/books/1234-4567/author/56784589/32475622347586/media/324785643257567/507f1f77bcf86cd799439011_400.png");
        // matches() only succeeds if the whole URL fits the pattern, so it doubles as validation
        if (m.matches()) {
            System.out.println("bookId = " + m.group("bookId"));
            System.out.println("authorId = " + m.group("authorId"));
            System.out.println("isbn = " + m.group("isbn"));
            System.out.println("mediaId = " + m.group("mediaId"));
            System.out.println("filename = " + m.group("filename"));
        }
    }
}
prints:
bookId = 1234-4567
authorId = 56784589
isbn = 32475622347586
mediaId = 324785643257567
filename = 507f1f77bcf86cd799439011_400.png
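The answer above hardcodes the `media` segment, so it does not capture `category` even though the question asks for it. It also accepts any prefix before `/books/`. Below is a sketch that additionally checks the base path and captures the category segment. It assumes that the segment after `<isbn>` is always the category (as `media` is in the sample URL). The class name `LibraryUrlParser`, the method `parse` and the host `example.com` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LibraryUrlParser {
    // Assumption: the base path is fixed and is taken from the question
    private static final String BASE_PATH = "/v1/files/library";

    private static final Pattern URL_PATTERN = Pattern.compile(
            "https?://[^/]+" + Pattern.quote(BASE_PATH)   // scheme, host, literal base path
            + "/books/(?<bookId>[^/]+)"
            + "/author/(?<authorId>[^/]+)"
            + "/(?<isbn>[^/]+)"
            + "/(?<category>[^/]+)"                      // assumed: e.g. "media"
            + "/(?<mediaId>[^/]+)"
            + "/(?<filename>[^/]+)");

    private static final String[] GROUPS =
            {"bookId", "authorId", "isbn", "category", "mediaId", "filename"};

    // Returns the attributes in path order, or an empty map if the URL does not fit the pattern
    static Map<String, String> parse(String url) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = URL_PATTERN.matcher(url);
        if (m.matches()) {
            for (String g : GROUPS) {
                attrs.put(g, m.group(g));
            }
        }
        return attrs;
    }

    public static void main(String[] args) {
        String url = "https://example.com/v1/files/library/books/1234-4567/author/56784589/"
                + "32475622347586/media/324785643257567/507f1f77bcf86cd799439011_400.png";
        System.out.println(parse(url));
        // A URL with a different base path is rejected
        System.out.println(parse("https://example.com/v2/books/1/author/2/3/media/4/f.png").isEmpty());
    }
}
```

Returning an empty map for an invalid URL keeps validation and extraction in one call. An `Optional` or a small value class would work just as well.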

Related

Regex matcher check in if logic not working

Hi, you can see my code below. I have the strings country, Rank and Grank, which are initially null; if the regex matches, each of them should be set to the matched value. But even when the regex matches, the value never changes and stays null. If I remove all the if statements and assign the groups directly, it works fine, but when no match is found it throws an exception. How can I do this check in the if logic?
System.err.println(content);
Pattern c = Pattern.compile("NAME=\"(.*)\" RANK");
Pattern r = Pattern.compile("\" RANK=\"(.*)\"");
Pattern gr = Pattern.compile("\" TEXT=\"(.*)\" SOURCE");
Matcher co = c.matcher(content);
Matcher ra = r.matcher(content);
Matcher gra = gr.matcher(content);
co.find();
ra.find();
gra.find();
String country = null;
String Rank = null;
String Grank = null;
if (co.matches()) {
    country = co.group(1);
}
if (ra.matches()) {
    Rank = ra.group(1);
}
if (gra.matches()) {
    Grank = gra.group(1);
}
You have to escape a single \ by using a double \\; then it should work.
Have you tried this?
while (co.find()) {
    System.out.print("Start index: " + co.start());
    System.out.print(" End index: " + co.end() + " ");
    System.out.println(co.group());
}
Personally, I can't make your program work with or without the if, so it's not a logic problem: the pattern just doesn't match the string for me.
So I changed it to get something working; maybe you can use it :)
String content = "NAME=\"salut\" RANK=\"pouet\" TEXT=\"text\" SOURCE";
System.out.println(content);
System.out.println(content.replaceAll(("NAME=\"(.*)\"\\sRANK=\"(.*)\"\\sTEXT=\"(.*)\" SOURCE"), "$1---$2---$3"));
Output:
NAME="salut" RANK="pouet" TEXT="text" SOURCE
salut---pouet---text
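A likely cause of the original problem is that `Matcher.matches()` only succeeds when the *entire* input matches the pattern, while `find()` looks for the pattern anywhere in the input. Because `content` contains more than just `NAME="..." RANK`, each `matches()` check fails and the variables stay null. (Calling `group()` when there is no match throws `IllegalStateException`, which explains the exception without the if.) The sketch below tests the result of `find()` instead. It also swaps `(.*)` for `([^"]*)`, since a greedy `.*` can run past the closing quote to the last quote on the line. The helper `firstGroup` and the sample `content` string are hypothetical, because the real input is not shown in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeExtractor {
    // Returns group 1 of the first occurrence of the pattern, or null if it does not occur
    static String firstGroup(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        // find() searches anywhere in the text; matches() would need the whole text to match
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input: the real content format is not shown in the question
        String content = "<ITEM NAME=\"India\" RANK=\"3\" TEXT=\"42\" SOURCE=\"web\"/>";

        // [^\"]* cannot cross a quote, so each group ends at its own closing quote
        String country = firstGroup(Pattern.compile("NAME=\"([^\"]*)\" RANK"), content);
        String rank = firstGroup(Pattern.compile("RANK=\"([^\"]*)\""), content);
        String grank = firstGroup(Pattern.compile("TEXT=\"([^\"]*)\" SOURCE"), content);

        System.out.println(country + ", " + rank + ", " + grank);
    }
}
```

A missing attribute simply leaves the corresponding variable null instead of throwing, which matches the intent of the original if checks.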

How to get a part data from a filename in Java?

How do I get part of the data from a string?
csvFile = "c:/Users//PHV/01Surname local.csv"
I need to extract Surname from the string above.
UPDATE:
What do you think about this?
File f = new File(csvFile);
String[] parts = f.getName().split(" ");
String strParts = new String(parts[0]);
String finFileName = strParts.substring(2, strParts.length());
You need a regular expression. Something like:
Pattern p = Pattern.compile("^.*/[0-9]+([a-zA-Z]+) .*");
Matcher m = p.matcher(csvFile);
String surname = null;
if (m.matches()) {
    // group 1 holds the letters between the leading digits and the first space
    surname = m.group(1);
} else {
    System.out.println("filename seems malformed: " + csvFile);
}
UPDATE: Here is a tutorial about regular expressions. I'm not sure it is the best one, but it should work for you: http://docs.oracle.com/javase/tutorial/essential/regex/
I'm not sure I understand your question, but I assume you want to extract "Surname". If that's correct, please try this:
String surname = csvFile.substring(csvFile.lastIndexOf("/") + 3, csvFile.lastIndexOf(" "));
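The sketch below combines the two ideas above: it takes the file name with `File.getName()`, as the question's update does, and then matches only that name with a regex. It assumes that file names always look like `01Surname local.csv`, meaning optional digits, the surname in letters, then a space. That layout is inferred from the single example. `SurnameExtractor` and `surname` are hypothetical names.

```java
import java.io.File;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SurnameExtractor {
    // Assumed file-name layout: optional digits, the surname (letters only), then a space
    private static final Pattern NAME = Pattern.compile("^\\d*([A-Za-z]+) ");

    // Returns the surname, or null if the file name does not follow the assumed layout
    static String surname(String csvFile) {
        String fileName = new File(csvFile).getName();   // e.g. "01Surname local.csv"
        Matcher m = NAME.matcher(fileName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(surname("c:/Users//PHV/01Surname local.csv"));
    }
}
```

Unlike fixed offsets such as `+ 3`, this version does not depend on the digit prefix being exactly two characters long.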

Java: replace relative links with absolute links using regex

I want to replace all relative links with absolute links in a String that represents an HTML file. I wrote the following method, but it does not work: some links end up with a duplicated base URL, like http://www.google.dehttp://www.google.de/resource. Why?
public static String replacePattern(URL targetUrl, String urlAsString, String patternString) throws IOException {
    System.out.println(targetUrl.toString());
    Pattern pattern = Pattern.compile(patternString);
    Matcher matcher = pattern.matcher(urlAsString);
    Set<String> replacedStrings = new TreeSet<String>();
    //return matcher.replaceAll(targetUrl.toString()+"$0");
    while (matcher.find()) {
        String relativeLink = matcher.group(1);
        //System.out.println("Find Link " + relativeLink);
        if (!replacedStrings.contains(relativeLink)) {
            //System.out.println("Relative Link " + relativeLink);
            String newLink = targetUrl.toString() + relativeLink;
            //System.out.println("New Link " + newLink);
            urlAsString = urlAsString.replace(relativeLink, newLink);
            replacedStrings.add(relativeLink);
        }
    }
    return urlAsString;
}
urlAsString is a String that contains the whole content. My patterns are
href=['\"](/[^'\"]+)['\"]
and
src=['\"](/[^'\"]+)['\"]
Use the URL class, which can resolve a relative spec against a base URL:
URL baseUrl = new URL("http://www.domain.com/folder/");
URL url = new URL(baseUrl, "url.html");
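One likely cause of the duplicated base URL is that `String.replace` replaces *every* occurrence. Once `/a` has been replaced everywhere, `/a/b` has already become `http://www.google.de/a/b`. The later replacement of `/a/b` then hits it again and produces `http://www.google.dehttp://www.google.de/a/b`. One way around this is to rewrite each match exactly once with `Matcher.appendReplacement`/`appendTail`, resolving the link with `new URL(base, spec)` as the answer suggests. The combined `href|src` pattern and the names `LinkAbsolutizer`/`absolutize` are assumptions for illustration. `StringBuffer` is used because the `StringBuilder` overloads of `appendReplacement` only exist from Java 9.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkAbsolutizer {
    // Group 1: attribute name, group 2: opening quote, group 3: root-relative link.
    // \2 requires the closing quote to be the same character as the opening one.
    private static final Pattern LINK =
            Pattern.compile("(href|src)=(['\"])(/[^'\"]+)\\2");

    static String absolutize(URL base, String html) throws MalformedURLException {
        Matcher m = LINK.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Resolve this one match against the base URL
            String absolute = new URL(base, m.group(3)).toString();
            String replacement = m.group(1) + "=" + m.group(2) + absolute + m.group(2);
            // quoteReplacement stops '$' or '\' in the URL being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);   // copy the rest of the input after the last match
        return out.toString();
    }

    public static void main(String[] args) throws MalformedURLException {
        URL base = new URL("http://www.google.de/");
        String html = "<a href=\"/a\">x</a><a href=\"/a/b\">y</a><img src='/img.png'>";
        System.out.println(absolutize(base, html));
    }
}
```

Because each match is rewritten in place as the matcher walks the original text, an already-absolute link is never revisited, so the set of replaced strings is no longer needed.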

Java regEx URL matching issue

As usual, thank you in advance.
I am trying to familiarize myself with regEx and I am having an issue matching a URL.
Here is an example URL:
www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
Here is what my regex breakdown looks like:
[site]/[dir]*?/[year]/[month]/[day]/[storyTitle]?/[id]/htmlpage.html
The [id] is a 22-character string made of uppercase letters, lowercase letters and digits. However, I do not want to extract it from the URL; I'm just clarifying.
Now, I need to extract two values from this url.
First,
I need to extract the dir(s). The [dir] is optional, but there can also be as many as needed. In other words, that parameter might not be there at all, or it could be dir1/dir2/dir3, etc. So, going off my first example:
www.examplesite.com/dir1/dir2/dir3/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
Here I would need to extract dir1/dir2/dir3, where a dir is a single all-lowercase word (e.g. sports/mlb/games). There are no numbers in a real dir; I'm only using them as an example.
But in this example of a valid URL:
www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
There is no [dir], so I would not extract anything; thus, the [dir] is optional.
Secondly,
I need to extract the [storyTitle]. Like the [dir] above, the [storyTitle] is optional, but if there is a storyTitle there can only be one.
So, going off my previous examples,
www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
would be valid, and I need to extract 'title-of-some-story'; story titles are dash-separated strings that are always lowercase. The example below is also valid:
www.examplesite.com/dir/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
In the example above there is no [storyTitle], which is allowed because it is optional.
Lastly, just to be thorough, a URL without a [dir] and without a [storyTitle] is also valid. Example:
www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
is a valid URL. Any input would be helpful; I hope I am being clear.
Here is one example that will work.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("(?:http://)?.+?(/.+?)?/\\d+/\\d{2}/\\d{2}(/.+?)?/\\w{22}");
        String[] strings = {
            "www.examplesite.com/dir1/dir2/4444/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/dir/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html"
        };
        for (int idx = 0; idx < strings.length; idx++) {
            Matcher m = p.matcher(strings[idx]);
            if (m.find()) {
                String dir = m.group(1);
                String title = m.group(2);
                if (title != null) {
                    title = title.substring(1); // remove the leading /
                }
                System.out.println(idx + ": Dir: " + dir + ", Title: " + title);
            }
        }
    }
}
Here is an all-regex solution.
Edit: now allows for http://
Java source:
import java.util.regex.*;

class Main
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String url = "http://www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
        String url2 = "www.examplesite.com/dir/dir2/dir3/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
        String url3 = "www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html";
        String patternStr = "(?:http://)?[^/]*[/]?([\\S]*)/[\\d]{4}/[\\d]{2}/[\\d]{2}[/]?([\\S]*)/[\\S]*/[\\S]*";
        // Compile regular expression
        Pattern pattern = Pattern.compile(patternStr);
        // Match 1st url
        System.out.println("Match 1st URL:");
        Matcher matcher = pattern.matcher(url);
        if (matcher.find()) {
            System.out.println("URL: " + matcher.group(0));
            System.out.println("DIR: " + matcher.group(1));
            System.out.println("TITLE: " + matcher.group(2));
        }
        else { System.out.println("No match."); }
        // Match 2nd url
        System.out.println("\nMatch 2nd URL:");
        matcher = pattern.matcher(url2);
        if (matcher.find()) {
            System.out.println("URL: " + matcher.group(0));
            System.out.println("DIR: " + matcher.group(1));
            System.out.println("TITLE: " + matcher.group(2));
        }
        else { System.out.println("No match."); }
        // Match 3rd url
        System.out.println("\nMatch 3rd URL:");
        matcher = pattern.matcher(url3);
        if (matcher.find()) {
            System.out.println("URL: " + matcher.group(0));
            System.out.println("DIR: " + matcher.group(1));
            System.out.println("TITLE: " + matcher.group(2));
        }
        else { System.out.println("No match."); }
    }
}
Output:
Match 1st URL:
URL: http://www.examplesite.com/dir/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: dir
TITLE: title-of-some-story

Match 2nd URL:
URL: www.examplesite.com/dir/dir2/dir3/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR: dir/dir2/dir3
TITLE:

Match 3rd URL:
URL: www.examplesite.com/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html
DIR:
TITLE: title-of-some-story
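Both answers accept more than the question allows; for example, each would also accept a dir containing digits. Below is a stricter sketch with named groups. It applies the constraints stated in the question: dirs are lowercase words, the title is a single lowercase dash-separated string, the id is exactly 22 letters or digits, and the page is literally `htmlpage.html`. `StoryUrl` is a hypothetical class name, and titles containing digits would need `[a-z0-9]` instead of `[a-z]`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StoryUrl {
    // Assumptions taken from the question's description of each segment
    static final Pattern STORY = Pattern.compile(
            "^(?:https?://)?[^/]+"                      // optional scheme, then the site
            + "(?:/(?<dirs>[a-z]+(?:/[a-z]+)*))?"       // optional dir1/dir2/...
            + "/\\d{4}/\\d{2}/\\d{2}"                   // year/month/day
            + "(?:/(?<title>[a-z]+(?:-[a-z]+)*))?"      // optional story title
            + "/[A-Za-z0-9]{22}/htmlpage\\.html$");      // 22-character id, fixed page name

    public static void main(String[] args) {
        String[] urls = {
            "www.examplesite.com/sports/mlb/games/2012/06/19/title-of-some-story/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "www.examplesite.com/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html",
            "http://www.examplesite.com/dir/2012/06/19/FAQKZjC3veXSalP9zxFgZP/htmlpage.html"
        };
        for (String url : urls) {
            Matcher m = STORY.matcher(url);
            if (m.matches()) {
                // group(name) returns null for an optional group that did not take part in the match
                System.out.println("Dir: " + m.group("dirs") + ", Title: " + m.group("title"));
            } else {
                System.out.println("No match: " + url);
            }
        }
    }
}
```

Because the id segment only allows letters and digits and the title requires lowercase words, an all-lowercase 22-character id is not swallowed as a title: the engine backtracks and skips the optional title group instead.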

Get a particular string using regex in Java

I want to know how to get only the link from a string, using regex or any other approach.
Here is the Java code:
String aas = "window.open("+"\""+"http://www.example.com/jscript/jex5.htm"+"\""+")"+"\n"+"window.open("+"\""+"http://www.example.com/jscript/jex5.htm"+"\""+")";
How do I get the link http://www.example.com/jscript/jex5.htm?
Thanks in advance.
The regex
(?<=window.open\(")[^"]*(?="\))
matches the link in the string you have given. Escaped as a Java string literal, it reads:
"(?<=window.open\\(\")[^\"]*(?=\"\\))"
This will print out the first URL contained in the string that starts with "http://":
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) throws Exception {
        String javascriptString = "window.open(" + "\"" + "http://www.example.com/jscript/jex5.htm" + "\"" + ")" + "\n" + "window.open(" + "\""
                + "http://www.example.com/jscript/jex5.htm" + "\"" + ")";
        Pattern pattern = Pattern.compile(".*(http://.*)\".*\n.*");
        Matcher m = pattern.matcher(javascriptString);
        if (m.matches()) {
            System.out.println(m.group(1));
        }
    }
}
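The `matches()` version above only works when the input is exactly two lines, because `.` does not match a line break. The sketch below collects the argument of every `window.open("...")` call with a `find()` loop and a capturing group. It is an alternative to the lookbehind regex above, not a correction of it. `WindowOpenLinks` and `links` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WindowOpenLinks {
    // Group 1 captures whatever sits between window.open(" and the closing ")
    private static final Pattern OPEN_CALL =
            Pattern.compile("window\\.open\\(\"([^\"]*)\"\\)");

    static List<String> links(String script) {
        List<String> result = new ArrayList<>();
        Matcher m = OPEN_CALL.matcher(script);
        // Calling find() repeatedly visits every occurrence, not just the first one
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String aas = "window.open(\"http://www.example.com/jscript/jex5.htm\")\n"
                + "window.open(\"http://www.example.com/jscript/jex5.htm\")";
        System.out.println(links(aas));
    }
}
```

If only the first link is needed, `links(aas).get(0)` (after checking the list is not empty) gives it.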
