Indices of all the overlapping patterns using Pattern & Matcher [duplicate] - java

How do I get all the indices (including overlapping ones) in a string where a pattern matches?
I have this proof-of-concept code:
public static void main(String[] args) {
String input = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
Pattern pattern = Pattern.compile("aaa");
Matcher matcher = pattern.matcher(input);
List<Integer> all = new ArrayList<>();
while (matcher.find()) {
all.add(matcher.start());
}
System.out.println(all);
}
Output:
[0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
It does not consider overlapping matches.
All the matching indices should be:
[0, 1, 2, 3, 4, ..., 27]
I know this is easily doable with KMP, but can it be done using Pattern and Matcher?

You can change your regex so that the entire expression sits inside a lookahead, i.e. change "aaa" to "(?=aaa)". The matcher will then report a match at every position where the pattern occurs, including positions that overlap. Strictly speaking the matches do not overlap, because each actual match is empty; the lookahead only asserts that the pattern follows. You can still use capturing groups inside the lookahead to retrieve the text. A more complex example:
String input = "abab1ab2ab3bcaab4ab5ab6";
Pattern pattern = Pattern.compile("(?=((?:ab.){2}))");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.start() + " " + matcher.group(1));
}
Starting indices and groups are:
2 ab1ab2
5 ab2ab3
14 ab4ab5
17 ab5ab6
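Applied to the input from the question, the lookahead idea could be sketched as follows. The class name `OverlappingIndices` and the helper `overlappingStarts` are made up for illustration; only `Pattern` and `Matcher` come from the JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingIndices {
    // Hypothetical helper: returns the start index of every occurrence of `regex`,
    // including overlapping ones. The regex is wrapped in a lookahead, so each
    // match is empty and the matcher moves on one character at a time.
    static List<Integer> overlappingStarts(String regex, String input) {
        Matcher matcher = Pattern.compile("(?=" + regex + ")").matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Thirty 'a' characters, the same length as the string in the question.
        String input = String.join("", Collections.nCopies(30, "a"));
        System.out.println(overlappingStarts("aaa", input));
    }
}
```

For the 30-character input this yields every index from 0 through 27, which is the list the question asks for.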

Related

Find ALL matches of a regex pattern in Java - even overlapping ones [duplicate]

I have a String of the form:
1,2,3,4,5,6,7,8,...
I am trying to find all substrings in this string that contain exactly 4 digits. For this I have the regex [0-9],[0-9],[0-9],[0-9]. Unfortunately when I try to match the regex against my String, I never obtain all the substrings, only a part of all the possible substrings. For instance, in the example above I would only get:
1,2,3,4
5,6,7,8
although I expect to get:
1,2,3,4
2,3,4,5
3,4,5,6
...
How would I go about finding all matches corresponding to my regex?
For reference, I am using Pattern and Matcher to find the matches:
Pattern pattern = Pattern.compile("[0-9],[0-9],[0-9],[0-9]");
Matcher matcher = pattern.matcher(myString);
List<String> matches = new ArrayList<String>();
while (matcher.find())
{
matches.add(matcher.group());
}
By default, successive calls to Matcher.find() start at the end of the previous match.
To search from a specific location, pass find() a start position that is one character past the start of the previous match.
In your case probably something like:
while (matcher.find(matcher.start()+1))
The first search has to use plain find(), because start() throws an IllegalStateException until a match has succeeded. This works fine:
Pattern p = Pattern.compile("[0-9],[0-9],[0-9],[0-9]");
public void test(String[] args) throws Exception {
String test = "0,1,2,3,4,5,6,7,8,9";
Matcher m = p.matcher(test);
if(m.find()) {
do {
System.out.println(m.group());
} while(m.find(m.start()+1));
}
}
printing
0,1,2,3
1,2,3,4
...
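The same loop can be wrapped in a reusable method. This is only a sketch: `OverlappingFind` and `findOverlapping` are hypothetical names, and the bounds check is there because `find(int)` throws an `IndexOutOfBoundsException` when the start position is past the end of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingFind {
    // Hypothetical helper: restarts the search one character after the start
    // of the previous match, so overlapping matches are reported too.
    static List<String> findOverlapping(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int from = 0;
        // find(int) resets the matcher and searches from the given index.
        while (from <= input.length() && m.find(from)) {
            matches.add(m.group());
            from = m.start() + 1;
        }
        return matches;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("[0-9],[0-9],[0-9],[0-9]");
        for (String match : findOverlapping(p, "0,1,2,3,4,5,6,7,8,9")) {
            System.out.println(match);
        }
    }
}
```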
If you are looking for a pure regex-based solution, you can use this lookahead-based regex for overlapping matches:
(?=((?:[0-9],){3}[0-9]))
Note that your matches are available in capturing group #1.
Code:
final String regex = "(?=((?:[0-9],){3}[0-9]))";
final String string = "0,1,2,3,4,5,6,7,8,9";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
output:
0,1,2,3
1,2,3,4
2,3,4,5
3,4,5,6
4,5,6,7
5,6,7,8
6,7,8,9
Here is some sample code without regex, which does not seem useful to me here. I would also assume regex to be slower in this case. As written, this only works as long as every element is exactly one character long.
String s = "a,b,c,d,e,f,g,h";
// A window of four elements spans 7 characters; step by 2 to skip one element and its comma.
for (int i = 0; i + 7 <= s.length(); i += 2) {
System.out.println(s.substring(i, i + 7));
}
Output for this string:
a,b,c,d
b,c,d,e
c,d,e,f
d,e,f,g
e,f,g,h
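If the elements can be longer than one character, one option is to slide the window over the split tokens rather than over character offsets. The sketch below assumes the input is a plain comma-separated list; `SlidingWindow` and `windows` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SlidingWindow {
    // Hypothetical helper: returns every run of `size` consecutive elements,
    // joined back together with commas.
    static List<String> windows(String csv, int size) {
        String[] tokens = csv.split(",");
        List<String> result = new ArrayList<>();
        for (int i = 0; i + size <= tokens.length; i++) {
            result.add(String.join(",", Arrays.copyOfRange(tokens, i, i + size)));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String window : windows("10,2,33,4,5", 4)) {
            System.out.println(window);
        }
    }
}
```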
As @OldCurmudgeon pointed out, find() by default starts looking from the end of the previous match. To position it right after the first matched element, make the first matched region a capturing group and use its end index:
Pattern pattern = Pattern.compile("(\\d,)\\d,\\d,\\d");
Matcher matcher = pattern.matcher("1,2,3,4,5,6,7,8,9");
List<String> matches = new ArrayList<>();
int start = 0;
while (matcher.find(start)) {
start = matcher.end(1);
matches.add(matcher.group());
}
System.out.println(matches);
results in
[1,2,3,4, 2,3,4,5, 3,4,5,6, 4,5,6,7, 5,6,7,8, 6,7,8,9]
This approach would also work if each number is longer than one digit, provided \\d is replaced by \\d+ in the pattern.
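A sketch of that multi-digit variant follows. It swaps `\\d` for `\\d+` and otherwise keeps the same loop; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiDigitOverlap {
    // Group 1 is the first number plus its comma; the next search starts
    // right after it, i.e. at the beginning of the second number.
    private static final Pattern FOUR_NUMBERS = Pattern.compile("(\\d+,)\\d+,\\d+,\\d+");

    static List<String> windowsOfFour(String input) {
        Matcher matcher = FOUR_NUMBERS.matcher(input);
        List<String> matches = new ArrayList<>();
        int start = 0;
        while (matcher.find(start)) {
            start = matcher.end(1);
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(windowsOfFour("10,2,33,4,5"));
    }
}
```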

Regex How to match 2 any, but different characters

I have a String, String s = "4433334552223";, that I would like to split into an array on every character change (between every pair of different characters), using something like String[] aRay = s.split("IDK");. I want the String array to contain {44,3333,4,55,222,3} after the split().
I know how to do it with a loop, but I was wondering whether there is a simple way to do this with regex.
You can use a backreference to match repeated characters:
String s = "4433334552223";
Matcher m = Pattern.compile("(.)\\1*").matcher(s);
while (m.find()) {
System.out.println(m.group());
}
You can use the following code:
String input ="4433334552223";
final String PATTERN = "(.)(\\1*)";
Matcher m = Pattern.compile(PATTERN).matcher(input);
ArrayList<String> result = new ArrayList<String>();
while(m.find())
{
result.add(m.group(1)+m.group(2));
}
System.out.println(result.toString());
This produces the following output:
[44, 3333, 4, 55, 222, 3]
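Since the question asks specifically about split(), here is a sketch of one way to do it with a zero-width pattern: a lookbehind captures the previous character and a negative lookahead requires that the next character differs. `SplitOnChange` and `splitRuns` are hypothetical names.

```java
import java.util.Arrays;

class SplitOnChange {
    // (?<=(.)) captures the character just before the current position;
    // (?!\1) only lets the split happen when the next character is different.
    static String[] splitRuns(String s) {
        return s.split("(?<=(.))(?!\\1)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitRuns("4433334552223")));
    }
}
```

The pattern also matches at the very end of the input, but split() discards trailing empty strings, so no extra element appears.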

Java - Extract string from pattern

Given some strings that look like this:
(((((((((((((4)+13)*5)/1)+7)+12)*3)-6)-11)+9)*2)/8)-10)
(((((((((((((4)+13)*6)/1)+5)+12)*2)-7)-11)+8)*3)/9)-10)
(((((((((((((4)+13)*6)/1)+7)+12)*2)-8)-11)+5)*3)/9)-10)
(btw, they are solutions for a puzzle which I write a program for :) )
They all share this pattern
"(((((((((((((.)+13)*.)/.)+.)+12)*.)-.)-11)+.)*.)/.)-10)"
For a single solution, how can I get the values using this pattern?
For the first solution I would get a collection (list, array, it doesn't matter) like this:
[4,5,1,7,3,6,9,2,8]
You've done most of the work actually by providing the pattern. All you need to do is use capturing groups where the . are (and escape the rest).
I put your inputs in a String array and collected the results into a List of integers (as you said, you can change it to something else). As for the pattern, you want to capture the dots, which is done by surrounding them with ( and ). The problem in your case is that the whole string is full of parentheses, so we need to quote (escape) them, meaning we tell the regex compiler to treat ( and ) as literal characters. This can be done by putting the part we want to escape between \Q and \E.
The code below shows a straightforward (though perhaps not the most efficient) way to do this. Just be careful to use the right number of \ in the right places:
public class Example {
public static void main(String[] args) {
String[] inputs = new String[3];
inputs[0] = "(((((((((((((4)+13)*5)/1)+7)+12)*3)-6)-11)+9)*2)/8)-10)";
inputs[1] = "(((((((((((((4)+13)*6)/1)+5)+12)*2)-7)-11)+8)*3)/9)-10)";
inputs[2] = "(((((((((((((4)+13)*6)/1)+7)+12)*2)-8)-11)+5)*3)/9)-10)";
List<Integer> results;
String pattern = "(((((((((((((.)+13)*.)/.)+.)+12)*.)-.)-11)+.)*.)/.)-10)"; // Copy-paste from your question.
pattern = pattern.replaceAll("\\.", "\\\\E(.)\\\\Q");
pattern = "\\Q" + pattern;
Pattern p = Pattern.compile(pattern);
Matcher m;
for (String input : inputs) {
m = p.matcher(input);
results = new ArrayList<>();
if (m.matches()) {
for (int i = 1; i < m.groupCount() + 1; i++) {
results.add(Integer.parseInt(m.group(i)));
}
}
System.out.println(results);
}
}
}
Output:
[4, 5, 1, 7, 3, 6, 9, 2, 8]
[4, 6, 1, 5, 2, 7, 8, 3, 9]
[4, 6, 1, 7, 2, 8, 5, 3, 9]
Notes:
You are using a single ., which means
Any character (may or may not match line terminators)
So if one of those positions holds a number with more than one digit, or a single character that is not a digit, something will go wrong, either in matches() or in parseInt. Consider using \\d for a single digit, or \\d+ for a whole number, instead.
See Pattern for more info on regex in Java.
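As a sketch of the \\d+ suggestion, the template can also be turned into a regex by quoting each literal piece with Pattern.quote and placing a (\\d+) group wherever the template has a dot. This is an alternative to the \Q...\E string replacement above, not the same code; `TemplateExtractor`, `compileTemplate` and `extract` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateExtractor {
    // Every '.' in the template becomes a (\d+) capturing group;
    // everything else is quoted so it is matched literally.
    static Pattern compileTemplate(String template) {
        String[] literals = template.split("\\.", -1); // -1 keeps trailing empty pieces
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append("(\\d+)");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the captured numbers, or an empty list if the input does not fit the template.
    static List<Integer> extract(Pattern pattern, String input) {
        List<Integer> values = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                values.add(Integer.parseInt(m.group(i)));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Pattern p = compileTemplate("(((((((((((((.)+13)*.)/.)+.)+12)*.)-.)-11)+.)*.)/.)-10)");
        System.out.println(extract(p, "(((((((((((((4)+13)*5)/1)+7)+12)*3)-6)-11)+9)*2)/8)-10)"));
    }
}
```

Because the groups use \\d+, a value such as 25 in one of the slots is captured whole instead of breaking the match.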

Why my regex isn't working for date?

I've got a problem using a regex to match the date in a string. Actually I've got a lot of "date formats" to match but the first one doesn't work and I don't get why it wouldn't work...
The format is like "September 12, 2013" or "May 6, 2014" or "June 02, 2014"...
In my string text, there is the following date : "July 4, 2014".
Here's my code :
Pattern p = Pattern.compile("[a-zA-Z]+ [0-3]?[0-9], (1|2)\\d{3}", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(text);
System.out.println(m.group(1));
But it fails with this error:
Exception in thread "main" java.lang.IllegalStateException: No match found
I even tried with smaller regex but it still doesn't match anything.
Thank you in advance for the help !
You need to invoke Matcher#find() or Matcher#matches() before invoking Matcher#group.
Otherwise no match has been attempted, so neither the whole match nor any capturing group is populated.
Both methods return a boolean, which tells you whether the match succeeded and therefore whether group() can be called.
A typical idiom would be:
if (matcher.find()) {
// get the group(s)
}
On the other hand, I would recommend using a DateFormat instead of regular expressions for dates.
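A rough sketch of the parser-based alternative is shown below. Note the swap: it uses java.time.format.DateTimeFormatter (Java 8 and later) rather than the older DateFormat classes, and it parses a string that consists only of the date, so it does not search for a date inside longer text. `DateParsing` and `parseOrNull` are hypothetical names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

class DateParsing {
    // Full month name, day of month (one or two digits), four-digit year.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MMMM d, uuuu", Locale.ENGLISH);

    // Returns the parsed date, or null when the text is not in the expected format.
    static LocalDate parseOrNull(String text) {
        try {
            return LocalDate.parse(text, FORMAT);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("July 4, 2014"));
        System.out.println(parseOrNull("June 02, 2014"));
        System.out.println(parseOrNull("not a date"));
    }
}
```

Unlike the regex, a parser also rejects impossible values such as a misspelled month name.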
You need to check m.find() first and print m.group(0) instead of m.group(1).
String text = "July 4, 2014";
String pattern = "\\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?) [0-9]{1,2}, [0-9]{4}";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(text);
if(m.find()){
System.out.println("Found value: " + m.group(0));
}
You need to check if (m.find()) and then print m.group(0). If you print m.group(1), you only get what the capturing group (1|2) matched, which is 1 or 2 depending on your input; since your input contains 2014, m.group(1) prints 2. m.group(0) is not a capturing group: it always refers to the entire match of "[a-zA-Z]+ [0-3]?[0-9], (1|2)\\d{3}", so it prints the full date. (1|2) is the only capturing group in that pattern.
Try this code:
String text="July 4, 2014";
Pattern p = Pattern.compile("[a-zA-Z]+ [0-3]?[0-9], (1|2)\\d{3}", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(text);
if (m.find( )) {
System.out.println(m.group(0));
}else{
System.out.println("No match found");
}
Output
July 4, 2014

Splitting string expression into tokens

My input is like
String str = "-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2";
I want the output to be:
1.33E+4
helloeeee
4
5
2
10
2
5
10
2
But I am getting the output as
1.33, 4, helloeeee, 4, 5, 2, 10, 2, 5, 10, 2
I want the exponent value kept intact after splitting, i.e. "1.33E+4".
Here is my code:
String str = "-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2";
List<String> tokensOfExpression = new ArrayList<String>();
String[] tokens=str.split("[(?!E)+*\\-/()]+");
for(String token:tokens)
{
System.out.println(token);
tokensOfExpression.add(token);
}
if(tokensOfExpression.get(0).equals(""))
{
tokensOfExpression.remove(0);
}
I would first replace the E+ with a symbol that is not ambiguous, such as
str = str.replace("E+", "SCINOT"); // replace() is literal; with replaceAll, "E+" would be a regex meaning one or more E
You can then parse with StringTokenizer, replacing the SCINOT symbol when you need to evaluate the number represented in scientific notation.
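Put together, the placeholder idea might look like the sketch below. It assumes only the E+ form needs protecting (E- is not handled) and that no identifier ends in E right before a plus sign; `SciNotTokenizer`, `tokenize` and the placeholder text are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class SciNotTokenizer {
    private static final String PLACEHOLDER = "SCINOT";

    static List<String> tokenize(String expr) {
        // Hide the exponent sign so that '+' is no longer seen as a delimiter.
        String masked = expr.replace("E+", PLACEHOLDER);
        StringTokenizer st = new StringTokenizer(masked, "+-*/()");
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            // Restore the original notation in each token.
            tokens.add(st.nextToken().replace(PLACEHOLDER, "E+"));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2")) {
            System.out.println(token);
        }
    }
}
```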
You can't do that with a single regular expression, because of the ambiguities introduced by floating-point constants in scientific notation, and in any case you need to know which token is which without having to re-scan them. You've also mis-stated your requirement, as you certainly need the binary operators in the output as well. You need to write both a scanner and a parser. Have a look for 'recursive descent expression parser' and 'Dijkstra shunting-yard algorithm'.
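As a starting point for the scanner half only, here is a minimal sketch that keeps the operators and parentheses as tokens. The token grammar (unsigned numbers with optional fraction and exponent, identifiers, single-character operators) is an assumption, characters outside it are silently skipped, and no parsing is attempted; `ExpressionScanner` and `tokenize` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpressionScanner {
    // Alternatives are tried in order: number, identifier, operator or parenthesis.
    private static final Pattern TOKEN = Pattern.compile(
            "\\d+(?:\\.\\d*)?(?:[eE][+-]?\\d+)?"  // 4, 1.33, 1.33E+4
            + "|[A-Za-z_]\\w*"                    // helloeeee
            + "|[-+*/()]");                       // operators and parentheses

    static List<String> tokenize(String expr) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expr);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2"));
    }
}
```

A leading minus comes out as its own token here; deciding whether it is unary or binary is left to the parser.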
Try this
String[] tokens = str.split("(?<!E)[*\\-/()+]+");
It's easier to achieve the result with Matcher
String str = "-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2";
Matcher m = Pattern.compile("\\d+\\.\\d*E[+-]?\\d+|\\w+").matcher(str);
while(m.find()) {
System.out.println(m.group());
}
prints
1.33E+4
helloeeee
4
5
2
10
2
5
10
2
Note that it needs some testing with different floating-point expressions, but it is easily adjustable.
