Regular expression groups with all combinations [duplicate] - java

This question already has answers here:
How to use regex to find all overlapping matches
(5 answers)
Closed 3 years ago.
I am studying regular expression groups and have a simple question about them. Let's say I have a basic regular expression in Java, such as:
Pattern pattern = Pattern.compile("[0-9]{16}");
And I have a matcher:
Matcher matcher = pattern.matcher("111111111111111122");
while (matcher.find()) {
System.out.println(matcher.group());
}
When I loop, I want this to be printed:
1111111111111111
1111111111111112
1111111111111122
I want to get every 16-digit substring, including overlapping ones. But only this is printed:
1111111111111111
Can I solve this issue by only modifying the regexp pattern?

To get the result you want, change your code to:
Pattern pattern = Pattern.compile("(?=([0-9]{16}))");
Matcher matcher = pattern.matcher("111111111111111122");
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Notice the call to group(1), not group(), which is the same as group(0).
Output
1111111111111111
1111111111111112
1111111111111122
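Why this works: a lookahead is zero-width, so each match consumes no input and find() retries one character further along, while the capturing group inside the lookahead still records the 16 digits visible from that position. Below is a minimal, self-contained sketch of the same idea. The class name OverlappingMatches and the helper findOverlapping are illustrative names, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingMatches {

    // Returns every run of `width` digits in `input`, including overlapping runs.
    // The lookahead (?=...) matches the empty string, so the matcher advances
    // one character at a time; group(1) holds the digits seen by the lookahead.
    static List<String> findOverlapping(String input, int width) {
        Pattern pattern = Pattern.compile("(?=([0-9]{" + width + "}))");
        Matcher matcher = pattern.matcher(input);
        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        for (String s : findOverlapping("111111111111111122", 16)) {
            System.out.println(s);
        }
    }
}
```

Calling group() or group(0) here would return an empty string for every match, because the overall match is zero-width.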

Related

How to capture a regex group for below pattern [duplicate]

This question already has answers here:
Regex: match everything but a specific pattern
(6 answers)
Closed 3 years ago.
I am exploring java regex groups and I am trying to replace a string with some characters.
I have a string str = "abXYabcXYZ"; and I am trying to replace all characters except for the pattern group abc in string.
I tried to use str.replaceAll("(^abc)",""), but it did not work. I understand that (abc) will match a group.
You might find it easier to find the parts you want to keep and build a new string from them. This approach does not handle overlapping matches, but it will likely be good enough for your use case. If your pattern really is as simple as "abc", you may instead want to just count the total number of matches.
String str = "abXYabcXYZ";
Pattern patternToKeep = Pattern.compile("abc");
Matcher matcher = patternToKeep.matcher(str);
StringBuilder sb = new StringBuilder();
// find() advances to each successive match; group() returns the matched text
while (matcher.find()) {
    sb.append(matcher.group());
}
System.out.println(sb.toString());
It is easier to keep the matching parts of the pattern and concatenate them. In the following example, the matcher iterates over str with find(), advancing to the next match each time. Inside the loop, your "abc" match is always available as group(0).
String str = "abXYabcXYZabcxss";
Pattern pattern = Pattern.compile("abc");
StringBuilder sb = new StringBuilder();
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
sb.append(matcher.group(0));
}
System.out.println(sb.toString());
For a replace-only approach, the nearest you can get is:
((?!abc).)*
The problem is that the a of each abc is the only part that would not be replaced.
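If the goal is to do this with replaceAll alone, one alternative (not taken from the answers above) is to alternate the wanted text with a catch-all and replace each match with the captured group. This sketch relies on Java appending nothing for $1 when group 1 did not take part in the match. Note that . does not match line terminators by default, so multi-line input would need the (?s) flag. The class and method names are illustrative.

```java
class KeepOnlyPattern {

    // Keeps every occurrence of the literal "abc" and drops all other characters.
    // When the "." branch matches, group 1 is unset and "$1" contributes nothing.
    static String keepAbc(String input) {
        return input.replaceAll("(abc)|.", "$1");
    }

    // The tempered pattern from the answer above, shown for comparison:
    // it stops in front of each "abc" but then resumes one character later.
    static String temperedReplace(String input) {
        return input.replaceAll("((?!abc).)*", "");
    }

    public static void main(String[] args) {
        System.out.println(keepAbc("abXYabcXYZ"));
        System.out.println(temperedReplace("abXYabcXYZ"));
    }
}
```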

How can I get the second matcher in regex in Java? [duplicate]

This question already has answers here:
Match at every second occurrence
(6 answers)
Closed 4 years ago.
I want to extract the second match of a regex pattern between - and _ in this string:
VA-123456-124_VRG.tif
I tried this:
Pattern mpattern = Pattern.compile("-.*?_");
But I get 123456-124 for the above regex in Java.
I need only 124.
How can I achieve this?
If you know that's your format, this will return the requested digits: everything before the underscore that is not a dash.
Pattern pattern = Pattern.compile("([^-]+)_");
I would use a formal pattern matcher here, to be as specific as possible. I would use this pattern:
^[^-]+-[^-]+-([^_]+).*
and then check the first capture group for the possible match. Here is a working code snippet:
String input = "VA-123456-124_VRG.tif";
String pattern = "^[^-]+-[^-]+-([^_]+).*";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
if (m.find()) {
System.out.println("Found value: " + m.group(1) );
}
Found value: 124
By the way, there is a one liner which would also work here:
System.out.println(input.split("[_-]")[2]);
But, the caveat here is that it is not very specific, and might fail for your other data.
You know you want only digits, so be more specific: Pattern.compile("-([0-9]+)_");
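A minimal sketch of how that digits-only pattern could be wrapped in a helper. The class name SecondNumberExtractor and the method extract are hypothetical. The pattern cannot match at the first dash, because 123456 is followed by another dash rather than an underscore, so find() moves on to the second dash.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SecondNumberExtractor {

    // Compiled once; group 1 is the run of digits between '-' and '_'.
    private static final Pattern DIGITS_BETWEEN = Pattern.compile("-([0-9]+)_");

    // Returns the digits between a '-' and the '_' that directly follows them, if any.
    static Optional<String> extract(String fileName) {
        Matcher m = DIGITS_BETWEEN.matcher(fileName);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("VA-123456-124_VRG.tif").orElse("no match"));
    }
}
```

Returning an Optional makes the no-match case explicit instead of relying on an exception from group(1).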
Try using the regex below:
.*-(.*?)_
What this does: .* matches all characters up to a -. Because it is greedy, it takes the last possible -, which is the one just before 124.
Demo: https://regex101.com/r/NWgZoH/1
JShell Output:
jshell> Pattern pattern = Pattern.compile(".*-(.*?)_");
pattern ==> .*-(.*?)_
jshell> Matcher matcher = pattern.matcher("VA-123456-124_VRG.tif");
matcher ==> java.util.regex.Matcher[pattern=.*-(.*?)_ region=0,21 lastmatch=]
jshell> if(matcher.find()){
...> System.out.println(matcher.group(1));
...> }
124
Your test cases are few, but for the one you gave, I think the regex below can help.
-.*-(.*)_
Then extract the first group.
If you just want a simple way to extract it, go with this:
public static void main(String[] args) {
String s = "VA-123456-124_VRG.tif";
System.out.println(s.split("[_-]")[2]);
}

Pattern and Matcher in Java: Matcher only finds one match instead of two [duplicate]

This question already has answers here:
Overlapping matches in Regex
(3 answers)
Closed 5 years ago.
I'm working with Pattern and Matcher in Java. I have the following code:
String searchString = "0,00,0";
String searchInText = "0,00,00,0";
Pattern p = Pattern.compile(searchString);
Matcher m = p.matcher(searchInText);
while (m.find()) {
...
}
My problem is that the Matcher only finds one match, from the first zero to the 4th zero. But there should be another match from the 3rd zero to the last zero.
Can someone help me? Is there a workaround?
Getting overlapping matches with regex is tricky, especially if you're not very familiar with regexes.
If you're not really using regex functionality (like in your example), you could easily do this with an indexOf(String, int) and keep increasing the index from which you're doing the search.
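// text is the string to search in; pattern is the literal string to look for
int index = 0;
while ((index = text.indexOf(pattern, index)) > -1) {
    System.out.println(index + " " + pattern);
    index++; // step forward by one character only, so overlapping matches are found
}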
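The loop above can be sketched as a self-contained helper that collects the start positions. The names OverlappingIndexOf and findAll are illustrative. The guard for an empty search string is an addition: without it, indexOf keeps returning the text length and the loop never ends.

```java
import java.util.ArrayList;
import java.util.List;

class OverlappingIndexOf {

    // Returns the start index of every occurrence of `needle` in `text`,
    // including occurrences that overlap each other.
    static List<Integer> findAll(String text, String needle) {
        List<Integer> starts = new ArrayList<>();
        if (needle.isEmpty()) {
            return starts; // an empty needle would otherwise loop forever
        }
        int index = 0;
        while ((index = text.indexOf(needle, index)) > -1) {
            starts.add(index);
            index++; // advance by one, not by needle.length(), to allow overlaps
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(findAll("0,00,00,0", "0,00,0"));
    }
}
```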

Java split by bracket and keep the delimiter - RegEx [duplicate]

This question already has answers here:
How do I split a string in Java?
(39 answers)
Closed 6 years ago.
I am trying to split a string using a regex with the closing bracket as the delimiter, and I have to keep the bracket.
Input string: (GROUP=test1)(GROUP=test2)(GROUP=test3)(GROUP=test4)
Desired output:
(GROUP=test1)
(GROUP=test2)
(GROUP=test3)
(GROUP=test4)
I am using the Java regex "\([^)]*?\)" and it throws an error. Below is the code I am using; the error is thrown when I try to get the group.
Pattern splitDelRegex = Pattern.compile("\\([^)]*?\\)");
Matcher regexMatcher = splitDelRegex.matcher("(GROUP=test1)(GROUP=test2) (GROUP=test3)(GROUP=test4)");
List<String> matcherList = new ArrayList<String>();
while(regexMatcher.find()){
String perm = regexMatcher.group(1);
matcherList.add(perm);
}
Any help is appreciated. Thanks.
You simply forgot to put capturing parentheses around the entire regex, so there is no group 1 to read; you are not capturing anything at all. Just change the regex to
Pattern splitDelRegex = Pattern.compile("(\\([^)]*?\\))");
Note the added outer parentheses.
I tested this in Eclipse and got your desired output.
You could use
str.split("\\)")
(the closing parenthesis must be escaped because split takes a regex). That returns an array of strings which you know are lacking the closing parentheses, so you can add them back in afterwards. That seems much easier and less error-prone to me.
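A variation on the split idea that avoids re-adding the brackets is to split on a zero-width lookbehind, which finds the position just after each ) without consuming it. This is a sketch, not part of the original answer, and the names SplitKeepBracket and splitGroups are made up for illustration. The trailing empty string that would follow the last bracket is dropped by split's default behaviour.

```java
import java.util.Arrays;
import java.util.List;

class SplitKeepBracket {

    // Splits at the position right after every ')' so each piece keeps its bracket.
    static List<String> splitGroups(String input) {
        return Arrays.asList(input.split("(?<=\\))"));
    }

    public static void main(String[] args) {
        String input = "(GROUP=test1)(GROUP=test2)(GROUP=test3)(GROUP=test4)";
        for (String group : splitGroups(input)) {
            System.out.println(group);
        }
    }
}
```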
You could try changing this line :
String perm = regexMatcher.group(1);
To this :
String perm = regexMatcher.group();
That way you read the entire last match (group 0) rather than a numbered capturing group.
I'm not sure why you need to split the string at all. You can capture each of the bracketed groups with a regex.
Try this regex (\\([a-zA-Z0-9=]*\\)). I have a capturing group () that looks for text that starts with a literal \\(, contains [a-zA-Z0-9=] zero or many times * and ends with a literal \\). This is a pretty loose regex, you could tighten up the match if the text inside the brackets will be predictable.
String input = "(GROUP=test1)(GROUP=test2)(GROUP=test3)(GROUP=test4)";
String regex = "(\\([a-zA-Z0-9=]*\\))";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
while(matcher.find()) { // find the next match
System.out.println(matcher.group()); // print the match
}
Output:
(GROUP=test1)
(GROUP=test2)
(GROUP=test3)
(GROUP=test4)

JAVA matchers group [duplicate]

This question already has answers here:
Java regex capture not working
(4 answers)
Closed 6 years ago.
I'm building a simple twitter user mention finder using regex.
public static Set<String> getMentionedUsers(List<Tweet> tweets) {
Set<String> mentionedUsers = new TreeSet<>();
String regex = "(?<=^|(?<=[^a-zA-Z0-9-_\\\\.]))#([A-Za-z][A-Za-z0-9_]+)";
for(Tweet tweet : tweets){
Matcher matcher = Pattern.compile(regex).matcher(tweet.getText().toLowerCase());
if(matcher.find()) {
mentionedUsers.add(matcher.group(0));
}
}
return mentionedUsers;
}
It fails to find a match when the expression is at the end of the text. For example, for "#glover tell me about #GREG" it returns only "#glover".
You have to keep looping with matcher.find() over a single tweet until you do not find any more matches, you currently check each tweet only once.
(Sidenote: You should compile the pattern outside of your for-loop, even better would be to compile it outside of the method)
public static Set<String> getMentionedUsers(List<Tweet> tweets) {
Set<String> mentionedUsers = new TreeSet<>();
String regex = "(?<=^|(?<=[^a-zA-Z0-9-_\\\\.]))#([A-Za-z][A-Za-z0-9_]+)";
Pattern p = Pattern.compile(regex);
for(Tweet tweet : tweets){
Matcher matcher = p.matcher(tweet.getText().toLowerCase());
while (matcher.find()) {
mentionedUsers.add(matcher.group(0));
}
}
return mentionedUsers;
}
You are adding matcher.group(0) to your Set. Take a look at the Java docs:
Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
Capturing groups start from 1; see the reference:
Group number
Capturing groups are numbered by counting their opening parentheses
from left to right. In the expression ((A)(B(C))), for example, there
are four such groups:
1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)
Group zero always stands for the entire expression.
Capturing groups are so named because, during a match, each
subsequence of the input sequence that matches such a group is saved.
The captured subsequence may be used later in the expression, via a
back reference, and may also be retrieved from the matcher once the
match operation is complete.
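To make the difference between group(0) and group(1) concrete, here is a small sketch. It uses a simplified, hypothetical pattern rather than the one from the question, and the names MentionGroups and names are illustrative. It loops with find() and reads group(1), so the # marker is left out of the results.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MentionGroups {

    // Simplified pattern (an assumption, not the question's regex):
    // '#' followed by a name, with the name captured as group 1.
    private static final Pattern MENTION = Pattern.compile("#([A-Za-z][A-Za-z0-9_]+)");

    // Collects every mentioned name in lower case, without the leading '#'.
    static Set<String> names(String text) {
        Set<String> names = new TreeSet<>();
        Matcher matcher = MENTION.matcher(text.toLowerCase());
        while (matcher.find()) {
            // group(0) would be "#glover"; group(1) is just "glover"
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(names("#glover tell me about #GREG"));
    }
}
```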
