What's wrong with my RegEx pattern - java

I have this pattern:
Pattern.compile(".*?\\[ISOLATION GROUP (^]+)].*");
I assumed this would match, for example, these two strings:
"[ISOLATION GROUP X] blabla"
"[OTHER FLAG][ISOLATION GROUP Y] blabla"
and then with group(1) I could get the name of the isolation group (in the above examples, "X" resp. "Y")
However the matches() is not even returning true. Why do these strings not match that pattern, what is wrong with the pattern?

When using a Matcher in Java, the pattern does not need to match the entire input: find() searches for the pattern anywhere in the string. Instead of wrapping the pattern in .*, just use \[ISOLATION GROUP ([^\]]+) to get all matches:
String input = "[ISOLATION GROUP X] blabla";
input += "[OTHER FLAG][ISOLATION GROUP Y] blabla";
String pattern = "\\[ISOLATION GROUP ([^\\]]+)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
while (m.find()) {
    System.out.println("Found value: " + m.group(1));
}
Found value: X
Found value: Y

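You forgot the square brackets around the character class. Outside a character class, ^ is a start-of-input anchor, so (^]+) can never match in the middle of the string.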
.*?\\[ISOLATION GROUP (^]+)].*
should become
.*?\\[ISOLATION GROUP ([^\\]]+)\\].*
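As a quick check of the corrected pattern, here is a minimal sketch that runs it with matches() against the two strings from the question. The class and method names (IsolationGroupMatches, groupName) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IsolationGroupMatches {
    // Corrected pattern: [^\]]+ is a negated character class (one or more non-']' characters).
    static final Pattern FIXED = Pattern.compile(".*?\\[ISOLATION GROUP ([^\\]]+)\\].*");

    // Returns the isolation group name, or null when the whole input does not match.
    static String groupName(String input) {
        Matcher m = FIXED.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(groupName("[ISOLATION GROUP X] blabla"));             // X
        System.out.println(groupName("[OTHER FLAG][ISOLATION GROUP Y] blabla")); // Y
        System.out.println(groupName("no flag here"));                           // null
    }
}
```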
Positive lookbehind
Maybe try a positive lookbehind? I think it is easier than your solution, and you only have to deal with a single group (the whole match):
(?<=ISOLATION GROUP\s)[^\]]+
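A minimal sketch of the lookbehind idea in Java, assuming the name always follows "ISOLATION GROUP" and one whitespace character. The class and method names are hypothetical; because the lookbehind consumes nothing, the name is the whole match and is read with group() rather than group(1).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IsolationGroupLookbehind {
    // (?<=...) asserts that "ISOLATION GROUP" plus one whitespace character sits just
    // before the match; [^\]]+ then consumes everything up to the closing bracket.
    static final Pattern NAME = Pattern.compile("(?<=ISOLATION GROUP\\s)[^\\]]+");

    // Returns the first isolation group name found anywhere in the input, or null.
    static String groupName(String input) {
        Matcher m = NAME.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(groupName("[ISOLATION GROUP X] blabla"));             // X
        System.out.println(groupName("[OTHER FLAG][ISOLATION GROUP Y] blabla")); // Y
    }
}
```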

This should work
Pattern.compile(".*?\\[ISOLATION GROUP .*\\].*");

Related

How can Java's Matcher.group(int) avoid matching the contents of nested parentheses?

I have a string like
String str = "美国临时申请No.62004615";
And a regex like
String regex = "(((美国|PCT|加拿大){0,1})([\\u4E00-\\u9FA5]{1,8})((NO.|NOS.){1})([\\d]{5,}))";
And other code is
Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    System.out.println("1:" + matcher.group(1) + "\n"
        + "2:" + matcher.group(2) + "\n"
        + "3:" + matcher.group(3) + "\n"
        + "4:" + matcher.group(4) + "\n"
        + "5:" + matcher.group(5) + "\n"
        + "6:" + matcher.group(6) + "\n"
        + "7:" + matcher.group(7));
}
I know parentheses () are used to group parts of a regex, and group 1 is the big outer group.
The second group is ((美国|PCT|加拿大){0,1}), to match "美国", "PCT" or "加拿大".
The third group is ([\u4E00-\u9FA5]{1,8}), to match one to eight Chinese characters.
The fourth group is ((NO.|NOS.){1}), to match NO. or NOS.
The fifth group is ([\d]{5,}), to match the number.
But the console is
1:美国临时申请No.62004615
2:美国
3:美国
4:临时申请
5:No.
6:No.
7:62004615
Group 2 is the same as group 3, and group 5 is the same as group 6.
It seems that group 3 matches the inner parentheses nested inside group 2 again. I wonder if there is a way to capture only the outermost parentheses.
The ideal result should be
1:美国临时申请No.62004615
2:美国
3:临时申请
4:No.
5:62004615
It sounds like you want a non-capturing group. From the Pattern documentation:
(?:X) X, as a non-capturing group
So, change this:
(美国|PCT|加拿大)
to this:
(?:美国|PCT|加拿大)
… and then it will not be represented as a group at all in the Matcher.
Some side notes:
{0,1} is the same as writing ?.
{1} does nothing and can be removed entirely.
[\\d] is the same as just \\d.
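Putting the suggestion together, a sketch of the regex with both inner alternations made non-capturing might look like this. The class and method names are hypothetical, the Chinese literals are written as Unicode escapes only to keep the source ASCII, and escaping the dot in NO. and NOS. is my own adjustment rather than part of the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingGroups {
    // Unicode escapes spell the three prefixes from the question (the middle one is PCT).
    static final String PREFIX = "\u7F8E\u56FD|PCT|\u52A0\u62FF\u5927";

    static final Pattern PATTERN = Pattern.compile(
        "(((?:" + PREFIX + ")?)"       // opens group 1; group 2: optional prefix
        + "([\\u4E00-\\u9FA5]{1,8})"   // group 3: one to eight Chinese characters
        + "((?:NO\\.|NOS\\.))"         // group 4: alternation itself is not captured
        + "(\\d{5,}))",                // group 5: the number; closes group 1
        Pattern.CASE_INSENSITIVE);

    // Returns groups 1..5 of the first match, or null when nothing matches.
    static String[] groups(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Same string as in the question, written with Unicode escapes.
        String str = "\u7F8E\u56FD\u4E34\u65F6\u7533\u8BF7No.62004615";
        String[] g = groups(str);
        for (int i = 0; i < g.length; i++) {
            System.out.println((i + 1) + ":" + g[i]);
        }
    }
}
```

For the string from the question, this produces five groups: the whole match, then 美国, 临时申请, No. and 62004615.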

How do I find the first unescaped quote in Java?

For example, I have the following String:
String str = "te\\\"st\"";
and I must find the index of the first unescaped " (one not preceded by \).
In the example the right index is 9.
Is there any regex or any other solution to resolve this problem?
I have the following code
Pattern pattern = Pattern.compile(HERE A REGEX);
Matcher matcher = pattern.matcher(json);
if (matcher.find()) {
    System.out.println(matcher.start());
}
but I don't know what kind of regexp to use.
You may try this regex to find the first unescaped quote:
^(?:[^"\\]|\\.)*(")
It skips ordinary characters and backslash-escaped pairs from the start of the input, then captures the first unescaped quote in group 1 (\1 or $1).
To find the index of that quote (captured group 1), call
matcher.start(1)
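A minimal sketch of this in Java, assuming every escape is a backslash followed by exactly one character. The pattern used here skips ordinary characters and escaped pairs from the start of the input, then captures the first quote it reaches; the class and method names are hypothetical. For the Java string in the question the result is 6, the index in the runtime string (the question's 9 appears to count characters of the source literal instead).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstUnescapedQuote {
    // (?:[^"\\]|\\.)* skips ordinary characters and backslash-escaped pairs;
    // (") then captures the first quote that is not part of such a pair.
    static final Pattern QUOTE = Pattern.compile("^(?:[^\"\\\\]|\\\\.)*(\")");

    // Returns the index of the first unescaped quote, or -1 if there is none.
    static int indexOf(String s) {
        Matcher m = QUOTE.matcher(s);
        return m.find() ? m.start(1) : -1;
    }

    public static void main(String[] args) {
        String str = "te\\\"st\"";         // runtime value: te\"st"
        System.out.println(indexOf(str));  // 6
    }
}
```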

Java pattern to find two groups of two letters in `ABC`

I have a pattern defined like this:
private static final Pattern PATTERN = Pattern.compile("[a-zA-Z]{2}");
And in my code I'm doing this:
Matcher matcher = PATTERN.matcher(myString);
and using a while loop to find all matches.
while (matcher.find()) {
    // do something here
}
If myString is 12345AB3CD45, the matcher finds those two groups of two letters (AB and CD). The problem is that myString is sometimes 12345ABC356, in which case I would like the matcher to find first AB and then BC (it only finds AB).
Am I doing this wrong, is the regex wrong, or does the matcher not work this way?
You can't match the same position several times with a regex, but you can use a trick.
To do that, enclose your pattern in a lookahead and a capture group:
(?=([A-Za-z]{2})). This works because a lookahead is zero-width: it consumes no characters, so the next search starts just one position later.
The result you are looking for is in the capture group 1.
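A minimal sketch of the lookahead trick, with hypothetical class and method names. Each find() produces an empty match, and the two letters are read from group 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingPairs {
    // The lookahead matches zero characters; the pair itself is captured in group 1.
    static final Pattern PAIR = Pattern.compile("(?=([A-Za-z]{2}))");

    // Collects every pair of adjacent letters, including overlapping ones.
    static List<String> pairs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pairs("12345ABC356"));  // [AB, BC]
        System.out.println(pairs("12345AB3CD45")); // [AB, CD]
    }
}
```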
Text that was consumed as group 0 (the entire match) can't be reused as part of group 0 in the next match.
12345ABC356
     ^^ - AB was placed in standard match (group 0)
      ^^ - B can't be reused here as part of standard match
You can solve this problem with look-around mechanisms such as look-ahead, which don't consume the matched part (they are zero-length); you can still place their content in a separate capturing group and access it.
So your code can look like
private static final Pattern PATTERN = Pattern.compile("[a-zA-Z](?=([a-zA-Z]))");
//                                                      ^^^^^^^^   ^^^^^^^^^^
//                                                      group 0    group 1
//...
Matcher matcher = PATTERN.matcher(myString);
while (matcher.find()) {
    String match = matcher.group() + matcher.group(1);
    //...
}

Find characters that match a regex's set

I have a regex w_p[a-z]
It would match input like w_pa, w_pb ... w_pz. I would like to find out exactly which character was matched, i.e. a, b or z for the above input. Is this possible with Java regex?
Yes, you need to capture:
final Pattern pattern = Pattern.compile("w_p([a-z])");
final Matcher m = pattern.matcher(input);
if (m.find()) {
    // what is matched is in m.group(1)
}
Sure, use regex groups. w_p([a-z]) defines a group for the character you are looking for.
Pattern p = Pattern.compile("w_p([a-z])");
Matcher matcher = p.matcher(input);
if (matcher.find()) {
    String character = matcher.group(1);
}
matcher.group(0) contains all that was matched (w_pa or w_pb etc.)
matcher.group(1) contains what was found in the first () pair.
See the documentation for more information.
The regex will be something like this:
w_p([a-z])
This creates a group from which you can get the value.

Regular Expression in Java: How to refer to "matched patterns"?

I was reading the Java Regular Expression tutorial, and it seems only to teach to test whether a pattern matched or not, but does not tell me how to refer to a matched pattern.
For example, I have a string "My name is xxxxx", and I want to print xxxxx. How would I do that with Java regular expressions?
Thanks.
Which tutorial were you reading? Sun's tutorial covers that topic quite thoroughly, but you have to read it carefully :)
Capturing part of a string is done with parentheses: put the part of the regular expression you want to capture in parentheses. Groups are numbered in the order their opening parentheses appear, and the group with index 0 represents the entire match.
For instance, the regexp "Day ([0-9]+) - Note ([0-9]+)" would define 3 groups:
group(0) : The entire match
group(1) : The first group in the regexp, that is to say the day number
group(2) : The second group in the regexp, that is to say the note number
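To make the three groups concrete, here is a small sketch. The input string, class name and method name are made up for illustration; the input has a prefix on purpose, to show that group(0) is the matched portion rather than the whole input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DayNoteGroups {
    static final Pattern DAY_NOTE = Pattern.compile("Day ([0-9]+) - Note ([0-9]+)");

    // Returns {group(0), group(1), group(2)} for the first match, or null if none.
    static String[] parse(String input) {
        Matcher m = DAY_NOTE.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] g = parse("Report: Day 3 - Note 17");
        System.out.println("group(0): " + g[0]); // group(0): Day 3 - Note 17
        System.out.println("group(1): " + g[1]); // group(1): 3
        System.out.println("group(2): " + g[2]); // group(2): 17
    }
}
```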
As for the actual code and how to retrieve the groups you've defined in your regexp, have a look at the Java documentation, especially the Matcher class and its group method : http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Matcher.html
You can test your regexps with that very useful tool : http://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html
Hope this helped,
Cheers
Note the use of parentheses in the pattern and the group() method on Matcher
import java.util.regex.*;
public class Example {
    public static void main(String[] args) {
        Pattern regex = Pattern.compile("My name is (.*)");
        String s = "My name is Michael";
        Matcher matcher = regex.matcher(s);
        if (matcher.matches()) {
            System.out.println("original string: " + matcher.group(0));
            System.out.println("first group: " + matcher.group(1));
        }
    }
}
Output is:
original string: My name is Michael
first group: Michael
You can use the Matcher group(int) method:
Pattern p = Pattern.compile("My name is (.*)");
Matcher m = p.matcher("My name is akf");
m.find();
String s = m.group(1); //grab the first group*
System.out.println(s);
output:
akf
* look at matching groups
Matcher m = Pattern.compile("name is (.*)").matcher("My name is Ross");
if (m.find()) {
    System.out.println(m.group(0));
    System.out.println(m.group(1));
}
The parens form a capturing group. Group 0 is the entire match and group 1 is the text captured by the first pair of parentheses.
The above program outputs:
name is Ross
Ross
