Java URL regex not matching

I am trying to count the number of URLs in a Java string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String test = "This http://example.com is a sentence https://secure.whatever.org that contains 2 URLs.";
String urlRegex = "<\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>";
int numUrls = 0;
Pattern pattern = Pattern.compile(urlRegex);
Matcher matcher = pattern.matcher(test);
while (matcher.find())
    numUrls++;
System.err.println("numUrls = " + numUrls);
When I run this it tells me I have zero (not 2) URLs in the string. Any ideas as to why? Thanks in advance!

The < and > characters in urlRegex are matched literally, so the pattern only finds URLs wrapped in angle brackets, and your test String contains none. Removing them gives numUrls = 2, as intended.
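For reference, here is a minimal runnable sketch of that fix, wrapped in a small class (the UrlCounter name and countUrls method are just for illustration). It uses the commonly circulated form of this URL pattern, with @ among the allowed characters, and drops the < and >:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlCounter {
    // The question's URL pattern without the literal < and > around it
    private static final Pattern URL = Pattern.compile(
            "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    static int countUrls(String text) {
        Matcher matcher = URL.matcher(text);
        int count = 0;
        // find() looks for the next match anywhere in the remaining input
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String test = "This http://example.com is a sentence https://secure.whatever.org that contains 2 URLs.";
        System.out.println("numUrls = " + countUrls(test)); // numUrls = 2
    }
}
```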

Try this code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String data = "This http://example.com is a sentence https://secure.whatever.org that contains 2 URLs.";
Pattern pattern = Pattern.compile("[hH][tT]{2}[Pp][sS]?://(\\w+(\\.\\w+?)?)+");
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
    System.out.println(matcher.group());
}
It prints both URLs from the string.

Related

Match everything after and before something regex Java

Here is my code:
String stringToSearch = "https://example.com/excludethis123456/moretext";
Pattern p = Pattern.compile("(?<=.com\\/excludethis).*\\/"); //search for this pattern
Matcher m = p.matcher(stringToSearch); //match pattern in StringToSearch
String store= "";
// print match and store match in String Store
if (m.find())
{
    String theGroup = m.group(0);
    System.out.format("'%s'\n", theGroup);
    store = theGroup;
}
//repeat the process
Pattern p1 = Pattern.compile("(.*)[^\\/]");
Matcher m1 = p1.matcher(store);
if (m1.find())
{
    String theGroup = m1.group(0);
    System.out.format("'%s'\n", theGroup);
}
I want to match everything that is after excludethis and before the / that comes after it.
With the "(?<=.com\\/excludethis).*\\/" regex I match 123456/ and store that in String store. After that, with "(.*)[^\\/]" I exclude the / and get 123456.
Can I do this in one line, i.e. combine these two regexes? I can't figure out how to combine them.
Just like you have used a positive lookbehind, you can use a positive lookahead and change your regex to this:
(?<=.com/excludethis).*(?=/)
Also, / is not a metacharacter in Java regex, so you don't need to escape it.
Your modified code,
String stringToSearch = "https://example.com/excludethis123456/moretext";
Pattern p = Pattern.compile("(?<=.com/excludethis).*(?=/)"); // search for this pattern
Matcher m = p.matcher(stringToSearch); // match pattern in StringToSearch
String store = "";
// print match and store match in String Store
if (m.find()) {
    String theGroup = m.group(0);
    System.out.format("'%s'\n", theGroup);
    store = theGroup;
}
System.out.println("Store: " + store);
Prints:
'123456'
Store: 123456
This captures the value you wanted.
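One thing to watch with the (?<=.com/excludethis).*(?=/) version: .* is greedy, so if the rest of the string contains more than one /, the match runs up to the last one. The sketch below uses a hypothetical input with an extra slash to compare it with the reluctant .*?; it also escapes the dot in .com so it only matches a literal dot. LookaroundDemo and firstMatch are illustrative names, not library API:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Returns the whole first match (group 0), or null if there is none
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical input with two slashes after the prefix
        String input = "https://example.com/excludethis123456/more/text";
        // Greedy .* backtracks only as far as the last '/'
        System.out.println(firstMatch("(?<=\\.com/excludethis).*(?=/)", input));  // 123456/more
        // Reluctant .*? stops at the first '/'
        System.out.println(firstMatch("(?<=\\.com/excludethis).*?(?=/)", input)); // 123456
    }
}
```

For the single-slash input in the question both versions return 123456; the difference only shows up once the URL has more path segments.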
This may be useful for you :)
String stringToSearch = "https://example.com/excludethis123456/moretext";
Pattern pattern = Pattern.compile("excludethis([\\d\\D]+?)/");
Matcher matcher = pattern.matcher(stringToSearch);
if (matcher.find()) {
    String result = matcher.group(1);
    System.out.println(result);
}
If you don't want to use regex, you could just try with String::substring*
String stringToSearch = "https://example.com/excludethis123456/moretext";
String exclusion = "excludethis";
System.out.println(stringToSearch.substring(stringToSearch.indexOf(exclusion)).substring(exclusion.length(), stringToSearch.substring(stringToSearch.indexOf(exclusion)).indexOf("/")));
Output:
123456
* Definitely don't actually use this
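If you do want a non-regex approach, a more readable sketch uses String.indexOf(int ch, int fromIndex) so the search for / starts right after the marker. The between helper is a hypothetical name, and returning null when a marker is missing is just one possible choice:

```java
class SubstringDemo {
    // Returns the text between marker and the next '/', or null if either is missing
    static String between(String input, String marker) {
        int start = input.indexOf(marker);
        if (start == -1) {
            return null;
        }
        start += marker.length();            // skip past the marker itself
        int end = input.indexOf('/', start); // first '/' after the marker
        return end == -1 ? null : input.substring(start, end);
    }

    public static void main(String[] args) {
        String stringToSearch = "https://example.com/excludethis123456/moretext";
        System.out.println(between(stringToSearch, "excludethis")); // 123456
    }
}
```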

How to extract the ID from a Google Sheets URL?

I have the following URLs.
https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258
https://docs.google.com/a/example.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY/edit#gid=1842172258
https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY
For each URL, I need to extract the sheet ID (e.g. 1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY) into a Java String.
I thought of using split, but it doesn't work for all the test cases:
String string = "https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258";
String[] parts = string.split("/");
String res = parts[parts.length-2];
Log.d("hello res",res );
How can I do this?
You can use the regex \/d\/(.*?)(\/|$) to solve your problem. If you look closely, the ID always sits between /d/ and either the next / or the end of the string, so you can capture everything in between:
String[] urls = new String[]{
    "https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258",
    "https://docs.google.com/a/example.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY/edit#gid=1842172258",
    "https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY"
};
String regex = "\\/d\\/(.*?)(\\/|$)";
Pattern pattern = Pattern.compile(regex);
for (String url : urls) {
    Matcher matcher = pattern.matcher(url);
    while (matcher.find()) {
        System.out.println(matcher.group(1));
    }
}
Outputs
1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY
1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY
1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY
It looks like the ID you are looking for always follows "/spreadsheets/d/". If that is the case, you can update your code like this:
String string = "https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258";
String[] parts = string.split("spreadsheets/d/");
String result;
if (parts[1].contains("/")) {
    String[] parts2 = parts[1].split("/");
    result = parts2[0];
}
else {
    result = parts[1];
}
System.out.println("hello "+ result);
Using regex:
Pattern pattern = Pattern.compile("(?<=/d/)[^/]*");
Matcher matcher = pattern.matcher(url);
if (matcher.find()) {
    // The lookbehind is not part of the match, so group() is just the ID
    System.out.println(matcher.group());
}
Using plain String methods:
String result = url.substring(url.indexOf("/d/") + 3);
int slash = result.indexOf("/");
result = slash == -1 ? result
: result.substring(0, slash);
System.out.println(result);
Google uses fixed-length IDs: in your case they are 44 characters long and consist of alphanumerics, - and _, so you can use this regex:
Pattern pattern = Pattern.compile("[\\w-]{44}");
Matcher matcher = pattern.matcher(url);
if (matcher.find()) {
    System.out.println(matcher.group());
}
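Combining two observations from this thread (the ID follows /spreadsheets/d/, and it consists of letters, digits, - and _), a small helper could look like this. It is a sketch under those assumptions; SheetIds and extractSheetId are hypothetical names, and returning Optional is one design choice among several:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SheetIds {
    // Assumption: the ID directly follows "/spreadsheets/d/" and uses only [A-Za-z0-9_-]
    private static final Pattern SHEET_ID = Pattern.compile("/spreadsheets/d/([\\w-]+)");

    static Optional<String> extractSheetId(String url) {
        Matcher m = SHEET_ID.matcher(url);
        // [\w-]+ stops at the next '/' or at the end of the string
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258",
            "https://docs.google.com/a/example.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY/edit#gid=1842172258",
            "https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY"
        };
        for (String url : urls) {
            System.out.println(extractSheetId(url).orElse("<no id>"));
        }
    }
}
```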

How to extract text between two particular strings in Java and make it bold with iText

I am finding the text between 123 and 321 and making it bold.
For that, I used a Pattern to get the text before 123, the text between 123 and 321, and the text after 321.
Could anyone please help me get all the strings between 123 and 321?
The code below only gets the first occurrence of 123 and 321.
Pattern p = Pattern.compile("^.*?(123)");
Matcher m = p.matcher(meredithEditorialSectionSegment);
while (m.find()) {
    String desc = m.group();
    String segDesc = (desc.substring(0, desc.length() - 3));
    segmentDesc.add(new Chunk(segDesc, sectionDescriptionFont));
}
descBold = meredithEditorialSectionSegment.substring(meredithEditorialSectionSegment.indexOf("123") + 3);
descBold = descBold.substring(0, descBold.indexOf("321"));
segmentDesc.add(new Chunk(descBold, sectionDescriptionBoldFont));
Matcher matcher = Pattern.compile("(?<=321).*").matcher(meredithEditorialSectionSegment);
matcher.find();
segmentDesc.add(new Chunk(matcher.group(), sectionDescriptionFont));
This should do the trick.
String str = "Could anyone please help me to get all the" +
        " strings between 123 and 321.\nMaybe 123 and another 321." +
        " Also,123 this must be matched.321\nhey!" +
        " 312 And 123 this must NOT be matched. ";
Pattern pattern = Pattern.compile("(.*?123)(.*?)(321.*?)", Pattern.DOTALL);
Matcher matcher = pattern.matcher(str);
int last = 0;
while (matcher.find()) {
    // group(1): text up to and including 123, group(2): text in between, group(3): 321
    System.out.print(matcher.group(1) + "[" + matcher.group(2) + "]" + matcher.group(3));
    last = matcher.end(3);
}
// the rest of the string
System.out.println(str.substring(last));
Notes:
I've added the DOTALL flag so that . also matches newlines.
In your case, you only have to change what you do with the group(2) string (e.g. add it as a bold Chunk instead of wrapping it in brackets).
Output:
Could anyone please help me to get all the strings between 123[ and ]321.
Maybe 123[ and another ]321. Also,123[ this must be matched.]321
hey! 312 And 123 this must NOT be matched.
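Applied to the original iText question, the same loop can build a list of plain and bold pieces, dropping the 123/321 markers as the question's code does. The sketch below avoids iText so it runs standalone; the Segment class is a hypothetical stand-in, and in real code each segment would presumably become new Chunk(text, bold ? sectionDescriptionBoldFont : sectionDescriptionFont):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoldSegments {
    // Hypothetical stand-in for an iText Chunk: some text plus a bold flag
    static final class Segment {
        final String text;
        final boolean bold;

        Segment(String text, boolean bold) {
            this.text = text;
            this.bold = bold;
        }
    }

    // DOTALL lets . cross newlines; the reluctant .*? stops at the nearest 321
    private static final Pattern BOLD = Pattern.compile("123(.*?)321", Pattern.DOTALL);

    static List<Segment> split(String input) {
        List<Segment> out = new ArrayList<>();
        Matcher m = BOLD.matcher(input);
        int last = 0;
        while (m.find()) {
            out.add(new Segment(input.substring(last, m.start()), false)); // text before 123
            out.add(new Segment(m.group(1), true));                       // text between markers
            last = m.end();                                               // continue after 321
        }
        out.add(new Segment(input.substring(last), false)); // trailing text
        return out;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (Segment s : split("a 123 b 321 c 123 d 321 e")) {
            sb.append(s.bold ? "<b>" + s.text + "</b>" : s.text);
        }
        System.out.println(sb); // a <b> b </b> c <b> d </b> e
    }
}
```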

regex extract string between two characters

I would like to extract the strings between the following characters in the given string using regex in Java:
/*
1) Between \" and \" ===> 12222222222
2) Between :+ and # ===> 12222222222
3) Between # and > ===> 192.168.140.1
*/
String remoteUriStr = "\"+12222222222\" <sip:+12222222222#192.168.140.1>";
String regex1 = "\"(.+?)\"";
String regex2 = ":+(.+?)#";
String regex3 = "#(.+?)>";
Pattern p = Pattern.compile(regex1);
Matcher matcher = p.matcher(remoteUriStr);
if (matcher.matches()) {
    title = matcher.group(1);
}
I am using the code snippet above, but it does not extract the strings I want. Am I doing anything wrong? I am quite new to regex.
The matches() method attempts to match the regular expression against the entire string. If you want to match a part of the string, you want the find() method:
if (matcher.find())
You could, however, build a single regular expression to match all three parts at once:
String regex = "\"(.+?)\" \\<sip:\\+(.+?)#(.+?)\\>";
Pattern p = Pattern.compile(regex);
Matcher matcher = p.matcher(remoteUriStr);
if (matcher.matches()) {
    title = matcher.group(1);
    part2 = matcher.group(2);
    ip = matcher.group(3);
}
Demo: http://ideone.com/8t2EC
If your input always looks like that and you always want the same parts from it you can put that in a single regex (with multiple capturing groups):
"([^"]+)" <sip:([^#]+)#([^>]+)>
So you can then use
Pattern p = Pattern.compile("\"([^\"]+)\" <sip:([^#]+)#([^>]+)>");
Matcher m = p.matcher(remoteUriStr);
if (m.find()) {
    String s1 = m.group(1);
    String s2 = m.group(2);
    String s3 = m.group(3);
}
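Here is a sketch that keeps the three separate patterns from the question but uses find(). Note that + is a quantifier in regex, so :+ means "one or more colons"; it has to be escaped as \\+ to match a literal plus sign. The first pattern also skips an optional leading + so it returns only the digits. SipParts and firstGroup are illustrative names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SipParts {
    // Returns group 1 of the first match anywhere in the input, or null
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null; // find(), not matches()
    }

    public static void main(String[] args) {
        String remoteUriStr = "\"+12222222222\" <sip:+12222222222#192.168.140.1>";
        System.out.println(firstGroup("\"\\+?(.+?)\"", remoteUriStr)); // 12222222222
        System.out.println(firstGroup(":\\+(.+?)#", remoteUriStr));    // 12222222222
        System.out.println(firstGroup("#(.+?)>", remoteUriStr));       // 192.168.140.1
    }
}
```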

Parsing String in Java using a Pattern

I am trying to parse out 3 pieces of information from a String.
Here is my code:
String text = "H:7 E:7 P:10";
String pattern = "[HEP]:";
Pattern p = Pattern.compile(pattern);
String[] attr = p.split(text);
I would like it to return:
String[0] = "7"
String[1] = "7"
String[2] = "10"
But all I am getting is:
String[0] = ""
String[1] = "7 "
String[2] = "7 "
String[3] = "10"
Any suggestions?
A not-so-elegant solution I just devised:
String text = "H:7 E:7 P:10";
String pattern = "[HEP]:";
text = text.replaceAll(pattern, "");
String[] attr = text.split(" ");
From the Javadoc of Pattern.split (http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#split(java.lang.CharSequence)):
The array returned by this method contains each substring of the input
sequence that is terminated by another subsequence that matches this
pattern or is terminated by the end of the input sequence.
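You get an empty string first because the pattern matches at the very beginning of the input, so the substring before that first match is empty. The trailing spaces remain because the space is not part of the delimiter.
If I try your code with String text = "A H:7 E:7 P:10", I indeed get: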
A 7 7 10
Hope it helps.
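If you prefer to stay with split(), one option is to strip the leading label first, so there is no delimiter at index 0, and to include the preceding whitespace in the delimiter. A sketch (SplitDemo and parse are illustrative names):

```java
import java.util.Arrays;

class SplitDemo {
    static String[] parse(String text) {
        // Remove the leading "H:" so split() has no match at index 0,
        // then treat optional whitespace plus a label as the delimiter.
        return text.replaceFirst("^[HEP]:", "").split("\\s*[HEP]:");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("H:7 E:7 P:10"))); // [7, 7, 10]
    }
}
```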
I would write a full regular expression like the following:
Pattern pattern = Pattern.compile("H:(\\d+)\\sE:(\\d+)\\sP:(\\d+)");
Matcher matcher = pattern.matcher("H:7 E:7 P:10");
if (!matcher.matches()) {
    // The input does not have the expected "H:n E:n P:n" format; handle that here
}
String hValue = matcher.group(1);
String eValue = matcher.group(2);
String pValue = matcher.group(3);
Based on your comment, I take it that you only want to get the numbers from that string (in a particular order?).
So I would recommend something like this:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("H:7 E:7 P:10");
while (m.find()) {
    System.out.println(m.group());
}
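If you also need to know which number belongs to which label, a sketch using named groups (supported by java.util.regex since Java 7) can collect the pairs into a map. LabeledNumbers and parse are illustrative names, and LinkedHashMap is used only to keep the input order:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LabeledNumbers {
    // Each match is one "label:number" pair, e.g. "E:7"
    private static final Pattern PAIR = Pattern.compile("(?<label>[HEP]):(?<value>\\d+)");

    static Map<String, Integer> parse(String text) {
        Map<String, Integer> result = new LinkedHashMap<>(); // keeps input order
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            result.put(m.group("label"), Integer.parseInt(m.group("value")));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("H:7 E:7 P:10")); // {H=7, E=7, P=10}
    }
}
```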
