I have a String of the form:
1,2,3,4,5,6,7,8,...
I am trying to find all substrings in this string that contain exactly 4 digits. For this I use the regex [0-9],[0-9],[0-9],[0-9]. Unfortunately, when I match the regex against my String, I only get non-overlapping substrings, not all of the possible ones. For instance, in the example above I only get:
1,2,3,4
5,6,7,8
although I expect to get:
1,2,3,4
2,3,4,5
3,4,5,6
...
How would I go about finding all matches corresponding to my regex?
For reference, I am using Pattern and Matcher to find the matches:
Pattern pattern = Pattern.compile("[0-9],[0-9],[0-9],[0-9]");
Matcher matcher = pattern.matcher(myString);
List<String> matches = new ArrayList<String>();
while (matcher.find())
{
matches.add(matcher.group());
}
By default, each successive call to Matcher.find() starts searching at the end of the previous match, so overlapping matches are skipped.
To search from a specific position, pass it to find(int start); for overlapping matches, that is one character past the start of the previous match.
In your case, probably something like:
while (matcher.find(matcher.start()+1))
This works fine:
Pattern p = Pattern.compile("[0-9],[0-9],[0-9],[0-9]");
public void test(String[] args) throws Exception {
String test = "0,1,2,3,4,5,6,7,8,9";
Matcher m = p.matcher(test);
if(m.find()) {
do {
System.out.println(m.group());
} while(m.find(m.start()+1));
}
}
This prints:
0,1,2,3
1,2,3,4
...
If you are looking for a pure-regex solution, you can use this lookahead-based regex for overlapping matches:
(?=((?:[0-9],){3}[0-9]))
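Note that your matches are available in capture group 1: the lookahead is zero-width, so each match itself is empty and only the captured text holds the four-digit window.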
Code:
final String regex = "(?=((?:[0-9],){3}[0-9]))";
final String string = "0,1,2,3,4,5,6,7,8,9";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Output:
0,1,2,3
1,2,3,4
2,3,4,5
3,4,5,6
4,5,6,7
5,6,7,8
6,7,8,9
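On Java 9 or later, the same lookahead pattern can also be consumed as a stream through Matcher.results(), which yields one MatchResult per successful find(). A minimal sketch; the class and method names are illustrative, not from the answer:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class OverlapStream {
    // Zero-width lookahead: each match consumes no characters, so the next
    // search starts one character later and overlapping windows are all found.
    private static final Pattern WINDOW = Pattern.compile("(?=((?:[0-9],){3}[0-9]))");

    static List<String> overlapping(String input) {
        return WINDOW.matcher(input)
                .results()                    // Java 9+: stream of MatchResult
                .map(r -> r.group(1))         // the window is in capture group 1
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(overlapping("0,1,2,3,4,5,6,7,8,9"));
    }
}
```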
Here is some sample code without regex (a regex does not seem necessary to me, and I would assume it to be slower in this case). As written, it only works as long as each element is exactly one character long.
String s = "a,b,c,d,e,f,g,h";
// a window of 4 one-character elements is 7 characters long; step 2 skips each comma
for (int i = 0; i + 7 <= s.length(); i += 2) {
System.out.println(s.substring(i, i + 7));
}
Output for this string:
a,b,c,d
b,c,d,e
c,d,e,f
d,e,f,g
e,f,g,h
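If the elements can be longer than one character, the fixed-width substring arithmetic stops working. A sketch of one alternative, assuming plain comma-separated input with no empty fields: split on commas and join a sliding window of elements. SlidingWindow and windows are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SlidingWindow {
    // Returns every run of `size` consecutive comma-separated elements, rejoined with commas.
    static List<String> windows(String csv, int size) {
        String[] parts = csv.split(",");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= parts.length; i++) {
            out.add(String.join(",", Arrays.copyOfRange(parts, i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(windows("10,2,345,4,5", 4)); // [10,2,345,4, 2,345,4,5]
    }
}
```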
As @OldCurmudgeon pointed out, find() by default starts looking at the end of the previous match. To restart right after the first matched element, put that element in a capturing group and use its end index as the next start position:
Pattern pattern = Pattern.compile("(\\d,)\\d,\\d,\\d");
Matcher matcher = pattern.matcher("1,2,3,4,5,6,7,8,9");
List<String> matches = new ArrayList<>();
int start = 0;
while (matcher.find(start)) {
start = matcher.end(1);
matches.add(matcher.group());
}
System.out.println(matches);
results in
[1,2,3,4, 2,3,4,5, 3,4,5,6, 4,5,6,7, 5,6,7,8, 6,7,8,9]
This approach also works when the elements are longer than one digit (use \\d+ instead of \\d in the pattern).
I have a regex pattern that I am using to match a screen.
When I test it in Sublime Text, it works just fine,
but when run in Java, the code fails:
System.out.println(Pattern.matches("(B+)?|(R+)?", "RRBRR"));//false
System.out.println(Pattern.matches("(B+)?|(R+)?", "RRRRR"));//true
I expect both calls to print true, but in Java the first one prints false.
My basic requirement is to identify runs of the same character in sequence.
For example, if the String is
RRRRBBBRRBBBRBBBRRR
then it should identify
RRRR BBB RR BBB R BBB RRR
Please help...Thanks in advance
Try this:
String value = "RRRRBBBRRBBBRBBBRRR";
Pattern pattern = Pattern.compile("B+|R+");
Matcher matcher = pattern.matcher(value);
while (matcher.find()) {
System.out.println(matcher.group());
}
The first call returns false because "RRBRR" has a B in the middle of the Rs. Pattern.matches() needs the entire string to match, and your regex only accepts a single run of Bs or a single run of Rs (or the empty string).
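If the input can contain characters other than B and R, a backreference generalizes the B+|R+ idea: (.)\\1* captures one character and then repeats that same character, so each match is one maximal run. A sketch under that assumption (CharRuns and runs are illustrative names; note that . does not match line terminators by default):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharRuns {
    // (.) captures one character; \1* greedily repeats that same character,
    // so each match is one maximal run regardless of which character it is.
    private static final Pattern RUN = Pattern.compile("(.)\\1*");

    static List<String> runs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", runs("RRRRBBBRRBBBRBBBRRR"))); // RRRR BBB RR BBB R BBB RRR
    }
}
```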
matches() behaves as if the pattern were anchored with ^ at the start and $ at the end, so it cannot match just a substring. find() searches for matching substrings instead.
Matcher is best suited for this:
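public static void main (String[] args) throws java.lang.Exception
{
// B+|R+ instead of (B+)?|(R+)?: with the optional groups the first alternative
// can match the empty string, so find() would report empty matches instead of the R runs
String regex = "B+|R+";
Pattern pat = Pattern.compile(regex);
Matcher matcher = pat.matcher("RRBRR");
int count = 0;
// no separate matcher.find() call before the loop: it would consume the first match
while(matcher.find()){
System.out.println(matcher.group());
count++;
}
System.out.println("Count:"+count);
}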
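To see the difference between matches() and find() described above side by side, here is a small sketch that also includes lookingAt(), which only requires the match to start at the beginning of the input. The describe helper is illustrative, not a library method:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchModes {
    // Runs the three Matcher search modes on the same input, each with a fresh Matcher.
    static String describe(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        boolean whole = p.matcher(input).matches();    // entire input must match
        boolean prefix = p.matcher(input).lookingAt(); // match must start at index 0
        Matcher m = p.matcher(input);
        boolean anywhere = m.find();                   // match may start anywhere
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere
                + (anywhere ? " first=" + m.group() : "");
    }

    public static void main(String[] args) {
        System.out.println(describe("R+", "RRBRR")); // matches=false lookingAt=true find=true first=RR
        System.out.println(describe("B+", "RRBRR")); // matches=false lookingAt=false find=true first=B
        System.out.println(describe("R+", "RRRRR")); // matches=true lookingAt=true find=true first=RRRRR
    }
}
```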
I want to check whether a text starts with "what" or "who" and is a question, so I wrote the following code:
private static void startWithQOrIf(String commentstr){
String urlPattern = "(|who|what).*\\?.*$";
Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(commentstr);
if (m.find()) {
System.out.println("yes");
}
}
Everything works well, but, for example, when I try:
whooooooooo is the follower?
it matches as well, but it should not, because I am looking for "who", not "whooooooooo".
Any idea?
You can ensure a whole word using a word boundary \b placed right after the group:
(who|what)\\b.*\\?.*$
If the words in the alternation group are supposed to appear at the start of the string, you can just use matches() and remove the $ anchor:
String urlPattern = "(who|what)\\b.*\\?.*";
Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(commentstr);
if (m.matches()) { // < - Here, matches is used
System.out.println("yes");
}
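Note that (|who|what) matches either an empty string, or who, or what. The empty alternative is tried first, and the start of any word is a word boundary, so with it \b does not reject "whooooooooo is the follower?". Use just (who|what).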
You must use word boundaries.
String urlPattern = "\\b(who|what)\\b.*\\?.*$";
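Putting the answers together: a minimal sketch, assuming the requirement is "starts with who or what as a whole word, and contains a question mark". It uses matches(), drops the empty alternative, and uses a non-capturing group (?:...) since the group is never read back. QuestionCheck and isWhoWhatQuestion are illustrative names:

```java
import java.util.regex.Pattern;

class QuestionCheck {
    // No empty alternative: the text must start with "who" or "what".
    // \b after the group rejects "whooo..." because 'o' continues the word.
    // matches() requires the whole input to match, so no ^ or $ is needed.
    private static final Pattern QUESTION =
            Pattern.compile("(?:who|what)\\b.*\\?.*", Pattern.CASE_INSENSITIVE);

    static boolean isWhoWhatQuestion(String text) {
        return QUESTION.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWhoWhatQuestion("Who is the follower?"));         // true
        System.out.println(isWhoWhatQuestion("whooooooooo is the follower?")); // false
    }
}
```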
I'm trying to replace certain words in a long string. Some words stay the same and some change. The words that don't change seem to get the matcher stuck in an infinite loop, as it keeps repeating the same action on words that are meant to stay the same. Below is an example similar to mine; I couldn't post the exact code I'm using because it's far more detailed and would take up too much space.
public String test() {
String temp = "<p><img src=\"logo.jpg\"/></p>\n<p>CANT TOUCH THIS!</p>";
Pattern pattern = Pattern.compile("(<p(\\s.+)?>(.+)?</p>)");
Matcher matcher = pattern.matcher(temp);
StringBuilder stringBuilder = new StringBuilder(temp);
int start;
int end;
String match;
while (matcher.find()) {
start = matcher.start();
end = matcher.end();
match = temp.substring(start, end);
stringBuilder.replace(start, end, changeWords(match));
temp = stringBuilder.toString();
matcher = pattern.matcher(temp);
System.out.println("This is the word I'm getting stuck on: " + match);
}
return temp;
}
public String changeWords(String words) {
return "<p><img src=\"logo.jpg\"/></p>";
}
Any suggestions as to why this might be happening?
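You reinitialize the matcher in the loop. A new matcher starts searching from the beginning of the string, so when changeWords() returns the match unchanged, find() returns that same match again and again.
Remove the matcher = pattern.matcher(temp); instruction from your while loop and you should not be stuck anymore.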
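Removing that line stops the loop, but one caveat remains: the matcher still reports start() and end() offsets in the original string, so once a replacement changes the length of the text, those offsets no longer line up with the StringBuilder. Matcher.appendReplacement and appendTail are designed for exactly this single-pass rewrite. A sketch, where the lambda is a hypothetical stand-in for the question's changeWords():

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceEach {
    // Replaces every match of pattern in input with change.apply(match), scanning the input once.
    static String replaceEach(Pattern pattern, String input, Function<String, String> change) {
        Matcher matcher = pattern.matcher(input);
        StringBuffer out = new StringBuffer(); // the StringBuffer overloads also work on Java 8
        while (matcher.find()) {
            // quoteReplacement keeps '$' and '\' in the new text from being read as group references
            matcher.appendReplacement(out, Matcher.quoteReplacement(change.apply(matcher.group())));
        }
        matcher.appendTail(out); // copy the text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("<p>(.+?)</p>");
        String html = "<p>keep</p>\n<p>CANT TOUCH THIS!</p>";
        // hypothetical changeWords: leaves some matches unchanged, rewrites others
        System.out.println(replaceEach(p, html, m -> m.contains("CANT") ? "<p>replaced</p>" : m));
    }
}
```

Because the matcher only moves forward, a match that is returned unchanged is simply copied once and never revisited.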
You are using Matcher wrong. Your while loop reads:
while (matcher.find()) {
start = matcher.start();
end = matcher.end();
match = temp.substring(start, end);
stringBuilder.replace(start, end, changeWords(match));
temp = stringBuilder.toString();
matcher = pattern.matcher(temp);
}
It should just be:
temp = matcher.replaceAll("new text");
No while loop; it is unnecessary. A matcher will not replace text it does not match, and it handles details such as not matching twice at the same place, so there is no need to spoon-feed it.
What is more, your regex can do without the capturing parentheses. And if you only want to replace whole "words" (a regex only knows about words through \b), add word boundaries around the text to be matched:
Pattern pattern = Pattern.compile("\\btext\\b");
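A minimal sketch of the single replaceAll call with word boundaries; the input string is made up for illustration. "context" is left alone because there is no word boundary between "con" and "text":

```java
import java.util.regex.Pattern;

class WordBoundaryReplace {
    static String replaceWord(String input) {
        // \b on both sides: only a standalone "text" is replaced, not "context" or "texts"
        return Pattern.compile("\\btext\\b").matcher(input).replaceAll("new text");
    }

    public static void main(String[] args) {
        System.out.println(replaceWord("text context text")); // new text context new text
    }
}
```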
You match the word "text" and then replace it either with "text" again (the if branch in changeWord()) or with "new text" (the else branch). When the replacement is "text" again, the freshly created matcher finds the same word at the same place, which is why the loop never ends.
Why are you using Matcher at all? You don't need regex to replace words, just use replace():
String output = input.replace("oldtext", "newtext"); // returns a new String with all occurrences of old replaced by new
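One difference worth keeping in mind when choosing between the two: String.replace treats its arguments literally, while String.replaceAll compiles the first argument as a regex. A short illustration (the helper names are hypothetical):

```java
class LiteralVsRegex {
    // replace(): the target "." is taken literally
    static String literalDots(String s) { return s.replace(".", "-"); }

    // replaceAll(): "." is a regex that matches any character
    static String regexDots(String s) { return s.replaceAll(".", "-"); }

    public static void main(String[] args) {
        String input = "a.b.c";
        System.out.println(literalDots(input)); // a-b-c
        System.out.println(regexDots(input));   // -----
        System.out.println(input);              // a.b.c (Strings are immutable)
    }
}
```

Note that replace() has no notion of word boundaries either: "context".replace("text", "new") returns "connew".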
I fixed it by re-creating the matcher only when the replacement actually changed the text:
if (!match.equals(changeWords(match))) {
matcher = pattern.matcher(temp);
}
I am trying to parse a String with a regex to extract parameters from it.
As an example:
String: "TestStringpart1 with second test part2"
Result should be: String[] {"part1", "part2"}
Regex: "TestString(.*?) with second test (.*?)"
My test code was:
String regexp = "TestString(.*?) with second test (.*?)";
String res = "TestStringpart1 with second test part2";
Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher(res);
int i = 0;
while(matcher.find()) {
i++;
System.out.println(matcher.group(i));
}
But it only prints "part1".
Could someone give me a hint?
Thanks
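Maybe fix the regex: a lazy (.*?) at the very end of the pattern is satisfied by the empty string, so make the last group greedy:
String regexp = "TestString(.*?) with second test (.*)";
and change the printing code to print every group of the match:
if (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); ++i) {
System.out.println(matcher.group(i));
}
}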
Well, you only ever ask it to. In your original code, each find() call moves the matcher from one match of the entire regular expression to the next, while within the loop body you only ever pull out one group. If there had been multiple matches of the regex in your string, you would have got group 1 from the first match, group 2 from the second match, and an IndexOutOfBoundsException for any later match, since the pattern has only two groups. Print all the groups of each match instead:
while(matcher.find()) {
System.out.print("Part 1: ");
System.out.println(matcher.group(1));
System.out.print("Part 2: ");
System.out.println(matcher.group(2));
System.out.print("Entire match: ");
System.out.println(matcher.group(0));
}
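Since the goal was a String[] such as {"part1", "part2"}, here is a sketch that builds exactly that from a single match, assuming the whole input must match (hence matches() rather than find()). extractGroups is an illustrative name. A side effect of matches() is that even a lazy (.*?) at the very end of the pattern is forced to extend to the end of the input.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupExtract {
    // Returns groups 1..groupCount() of a whole-input match, or an empty array if it does not match.
    static String[] extractGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            // group(0) is the entire match, so start at 1; a group that did not participate is null
            groups[i - 1] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] parts = extractGroups("TestString(.*?) with second test (.*)",
                "TestStringpart1 with second test part2");
        System.out.println(Arrays.toString(parts)); // [part1, part2]
    }
}
```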