I have a String of the form:
1,2,3,4,5,6,7,8,...
I am trying to find all substrings in this string that contain exactly 4 digits. For this I have the regex [0-9],[0-9],[0-9],[0-9]. Unfortunately when I try to match the regex against my String, I never obtain all the substrings, only a part of all the possible substrings. For instance, in the example above I would only get:
1,2,3,4
5,6,7,8
although I expect to get:
1,2,3,4
2,3,4,5
3,4,5,6
...
How would I go about finding all matches corresponding to my regex?
For info, I am using Pattern and Matcher to find the matches:
Pattern pattern = Pattern.compile("[0-9],[0-9],[0-9],[0-9]");
Matcher matcher = pattern.matcher(myString);
List<String> matches = new ArrayList<String>();
while (matcher.find())
{
    matches.add(matcher.group());
}
By default, successive calls to Matcher.find() start at the end of the previous match.
To search from a specific location instead, pass find() a start index; here, that index is one character past the start of the previous match.
In your case, once a first match has been found, probably something like:
while (matcher.find(matcher.start()+1))
This works fine:
Pattern p = Pattern.compile("[0-9],[0-9],[0-9],[0-9]");
public void test(String[] args) throws Exception {
    String test = "0,1,2,3,4,5,6,7,8,9";
    Matcher m = p.matcher(test);
    if (m.find()) {
        do {
            System.out.println(m.group());
        } while (m.find(m.start() + 1));
    }
}
printing
0,1,2,3
1,2,3,4
...
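To make the difference between the two calls concrete, here is a small sketch that runs both loops over the same input. The class and method names (FindRestartDemo, nonOverlapping, overlapping) are made up for illustration; only Matcher.find() and Matcher.find(int) come from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindRestartDemo {
    private static final Pattern FOUR = Pattern.compile("[0-9],[0-9],[0-9],[0-9]");

    // Plain find(): each search resumes at the END of the previous match.
    static List<String> nonOverlapping(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = FOUR.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // find(int): resets the matcher and searches from the given index,
    // here one character past the START of the previous match.
    static List<String> overlapping(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = FOUR.matcher(s);
        int from = 0;
        while (m.find(from)) {
            out.add(m.group());
            from = m.start() + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "1,2,3,4,5,6,7,8";
        System.out.println(nonOverlapping(s)); // two matches
        System.out.println(overlapping(s));    // five matches
    }
}
```

Starting the loop with from = 0 also avoids calling m.start() before any match exists, which would throw an IllegalStateException.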
If you are looking for a pure regex based solution then you may use this lookahead based regex for overlapping matches:
(?=((?:[0-9],){3}[0-9]))
Note that your matches are available in capturing group #1.
Code:
final String regex = "(?=((?:[0-9],){3}[0-9]))";
final String string = "0,1,2,3,4,5,6,7,8,9";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
output:
0,1,2,3
1,2,3,4
2,3,4,5
3,4,5,6
4,5,6,7
5,6,7,8
6,7,8,9
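The lookahead itself consumes no characters, so the matcher advances one position at a time and group 1 can overlap from one match to the next. As a rough sketch of how this might be wrapped for an arbitrary window size, the helper below builds the same regex with a variable repeat count. The class name LookaheadWindows, the method windows and its size parameter are illustrative assumptions, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadWindows {
    // Returns every run of `size` comma-separated single digits, overlaps included.
    static List<String> windows(String s, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        // For size 4 this builds (?=((?:[0-9],){3}[0-9])), the regex from the answer.
        Pattern p = Pattern.compile("(?=((?:[0-9],){" + (size - 1) + "}[0-9]))");
        Matcher m = p.matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1)); // the overall match is empty; the text is in group 1
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(windows("0,1,2,3,4,5", 4));
    }
}
```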
Here is some sample code without regex, which does not seem necessary to me here; I would also assume regex to be slower in this case. As written, it only works as long as every item is exactly 1 character long.
String s = "a,b,c,d,e,f,g,h";
// A window of 4 one-character items plus 3 commas is 7 characters wide.
for (int i = 0; i + 7 <= s.length(); i += 2) {
    System.out.println(s.substring(i, i + 7));
}
Output for this string:
a,b,c,d
b,c,d,e
c,d,e,f
d,e,f,g
e,f,g,h
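The substring loop relies on every item being one character wide. One possible way to lift that restriction, still without a regex for the matching itself, is to split on the commas and re-join each window of items. This is only a sketch; the names SplitWindows and windows are invented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitWindows {
    // Returns every run of `size` consecutive comma-separated items, overlaps included.
    static List<String> windows(String csv, int size) {
        String[] parts = csv.split(",");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= parts.length; i++) {
            // Re-join items i .. i+size-1 with commas.
            out.add(String.join(",", Arrays.copyOfRange(parts, i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(windows("10,2,33,4,5", 4));
    }
}
```

Unlike the regex versions, this does not check that the items are digits; it treats anything between commas as an item.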
As @OldCurmudgeon pointed out, find() by default starts looking from the end of the previous match. To position it right after the first matched element, put the first matched element in a capturing group and use its end index:
Pattern pattern = Pattern.compile("(\\d,)\\d,\\d,\\d");
Matcher matcher = pattern.matcher("1,2,3,4,5,6,7,8,9");
List<String> matches = new ArrayList<>();
int start = 0;
while (matcher.find(start)) {
    start = matcher.end(1);
    matches.add(matcher.group());
}
System.out.println(matches);
results in
[1,2,3,4, 2,3,4,5, 3,4,5,6, 4,5,6,7, 5,6,7,8, 6,7,8,9]
This approach also works if each element is longer than one digit, as long as every \d in the pattern becomes \d+.
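For elements longer than one digit, the same loop can be reused with a pattern that allows several digits per element. The following is a minimal sketch under that assumption; MultiDigitWindows and findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiDigitWindows {
    // Group 1 is the first element plus its comma; its end is where the next search starts.
    private static final Pattern P = Pattern.compile("(\\d+,)\\d+,\\d+,\\d+");

    static List<String> findAll(String s) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = P.matcher(s);
        int start = 0;
        while (matcher.find(start)) {
            matches.add(matcher.group());
            start = matcher.end(1); // first character of the second element
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("10,2,33,4,5"));
    }
}
```

Restarting at end(1) rather than start() + 1 matters here: it keeps the next search from beginning in the middle of a multi-digit element.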
I have a String s = "4433334552223"; that I would like to split into an array at every character change (between every pair of adjacent characters that differ), along the lines of String[] aRay = s.split("IDK");. I want the String array to contain {44,3333,4,55,222,3} after the split().
I know how to do it with a loop and such, but I was wondering whether there is a simple way to do this with regex.
You can use a backreference to match repeated characters:
String s = "4433334552223";
Matcher m = Pattern.compile("(.)\\1*").matcher(s);
while (m.find()) {
    System.out.println(m.group());
}
You can use the following code:
String input = "4433334552223";
final String PATTERN = "(.)(\\1*)";
Matcher m = Pattern.compile(PATTERN).matcher(input);
ArrayList<String> result = new ArrayList<String>();
while (m.find())
{
    result.add(m.group(1) + m.group(2));
}
System.out.println(result.toString());
This produces the following output:
[44, 3333, 4, 55, 222, 3]
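Since the question asked specifically about split(), here is a sketch of a split-based variant. It is not taken from the answers above: it splits at every zero-width position where the previous character, captured inside a lookbehind, is not followed by the same character. The names RunSplitter and splitRuns are made up for the example.

```java
import java.util.Arrays;

class RunSplitter {
    // (?<=(.))  : there is a character just before this position; capture it as group 1
    // (?!\1)    : the character after this position is not that same character
    static String[] splitRuns(String s) {
        return s.split("(?<=(.))(?!\\1)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitRuns("4433334552223")));
    }
}
```

The position at the very end of the string also matches, but split() drops trailing empty strings by default, so no extra element appears.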
Given some strings that look like this:
(((((((((((((4)+13)*5)/1)+7)+12)*3)-6)-11)+9)*2)/8)-10)
(((((((((((((4)+13)*6)/1)+5)+12)*2)-7)-11)+8)*3)/9)-10)
(((((((((((((4)+13)*6)/1)+7)+12)*2)-8)-11)+5)*3)/9)-10)
(By the way, they are solutions to a puzzle that I am writing a program for :) )
They all share this pattern:
"(((((((((((((.)+13)*.)/.)+.)+12)*.)-.)-11)+.)*.)/.)-10)"
For one solution: how can I get the values with this given pattern?
So for the first solution I would get a collection, list, or array (it doesn't matter which) like this:
[4,5,1,7,3,6,9,2,8]
You've done most of the work actually by providing the pattern. All you need to do is use capturing groups where the . are (and escape the rest).
I put your inputs in a String array and collected the results into a List of integers (as you said, you can change it to something else). As for the pattern, you want to capture the dots; this is done by surrounding them with ( and ). The problem in your case is that the whole string is full of parentheses, so we need to quote (escape) them, that is, tell the regex compiler that we mean the literal characters ( and ). This can be done by putting the part we want to escape between \Q and \E.
The code below shows a coherent (though maybe not the most efficient) way to do this. Just be careful to use the right number of \ in the right places:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Example {
    public static void main(String[] args) {
        String[] inputs = new String[3];
        inputs[0] = "(((((((((((((4)+13)*5)/1)+7)+12)*3)-6)-11)+9)*2)/8)-10)";
        inputs[1] = "(((((((((((((4)+13)*6)/1)+5)+12)*2)-7)-11)+8)*3)/9)-10)";
        inputs[2] = "(((((((((((((4)+13)*6)/1)+7)+12)*2)-8)-11)+5)*3)/9)-10)";
        List<Integer> results;
        String pattern = "(((((((((((((.)+13)*.)/.)+.)+12)*.)-.)-11)+.)*.)/.)-10)"; // Copy-paste from your question.
        // Each "." becomes "\E(.)\Q": close the quote, capture one character, reopen the quote.
        pattern = pattern.replaceAll("\\.", "\\\\E(.)\\\\Q");
        pattern = "\\Q" + pattern;
        Pattern p = Pattern.compile(pattern);
        Matcher m;
        for (String input : inputs) {
            m = p.matcher(input);
            results = new ArrayList<>();
            if (m.matches()) {
                for (int i = 1; i < m.groupCount() + 1; i++) {
                    results.add(Integer.parseInt(m.group(i)));
                }
            }
            System.out.println(results);
        }
    }
}
Output:
[4, 5, 1, 7, 3, 6, 9, 2, 8]
[4, 6, 1, 5, 2, 7, 8, 3, 9]
[4, 6, 1, 7, 2, 8, 5, 3, 9]
Notes:
You are using a single ., which means
Any character (may or may not match line terminators)
So if you have a number there which is not a single digit or a single character which is not a number (digit), something will go wrong either in the matches or parseInt. Consider \\d to signify a single digit or \\d+ for a number instead.
See Pattern for more info on regex in Java.
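Following up on the note about \\d+, here is one possible variant that accepts multi-digit numbers. Instead of inserting \Q and \E by hand, it cuts the template at each dot, quotes the literal pieces with Pattern.quote, and joins them with (\\d+). The class TemplateExtractor and the method extract are hypothetical names for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateExtractor {
    // Every "." in the template is a placeholder for one non-negative integer.
    static List<Integer> extract(String template, String input) {
        StringBuilder regex = new StringBuilder();
        String[] literals = template.split("\\.", -1); // -1 keeps a trailing empty piece
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append("(\\d+)"); // one capturing group per dot
            }
            regex.append(Pattern.quote(literals[i]));
        }
        Matcher m = Pattern.compile(regex.toString()).matcher(input);
        List<Integer> values = new ArrayList<>();
        if (m.matches()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                values.add(Integer.parseInt(m.group(g)));
            }
        }
        return values; // empty when the input does not fit the template
    }

    public static void main(String[] args) {
        String template = "(((((((((((((.)+13)*.)/.)+.)+12)*.)-.)-11)+.)*.)/.)-10)";
        String input = "(((((((((((((4)+13)*5)/1)+7)+12)*3)-6)-11)+9)*2)/8)-10)";
        System.out.println(extract(template, input));
    }
}
```

Integer.parseInt can still throw for numbers too large for an int, so very long digit runs would need extra handling.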
I've got a problem using a regex to match the date in a string. Actually I've got a lot of "date formats" to match but the first one doesn't work and I don't get why it wouldn't work...
The format is like "September 12, 2013" or "May 6, 2014" or "June 02, 2014"...
In my string text, there is the following date : "July 4, 2014".
Here's my code :
Pattern p = Pattern.compile("[a-zA-Z]+ [0-3]?[0-9], (1|2)\\d{3}", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(text);
System.out.println(m.group(1));
But it produces this error:
Exception in thread "main" java.lang.IllegalStateException: No match found
I even tried with smaller regex but it still doesn't match anything.
Thank you in advance for the help !
You need to invoke Matcher#find() or Matcher#matches() before invoking Matcher#group.
Otherwise, the match is not performed, hence you have neither the whole group, nor any single back-references populated.
Both methods mentioned above return boolean, which will help you infer whether or not your desired group will contain any text.
A typical idiom would be:
if (matcher.find()) {
// get the group(s)
}
Documentation here.
On the other hand, I would recommend you use DateFormats instead of regular expressions for dates - API here.
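As a sketch of the date-format route, the example below uses a loose regex only to locate a candidate and lets a formatter do the real validation. Note the substitution: it uses java.time.format.DateTimeFormatter rather than the older java.text.DateFormat classes, which is my choice for the example and not something the answer prescribes. The class DateFinder, the method firstDate, and the "MMMM d, yyyy" pattern with an English locale are assumptions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateFinder {
    // Loose shape check only: word, day, comma, four-digit year.
    private static final Pattern CANDIDATE =
            Pattern.compile("[A-Za-z]+ [0-3]?[0-9], [12]\\d{3}");
    // Full English month name, day with or without leading zero, four-digit year.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MMMM d, yyyy", Locale.ENGLISH);

    static Optional<LocalDate> firstDate(String text) {
        Matcher m = CANDIDATE.matcher(text);
        while (m.find()) { // find() must succeed before group() may be called
            try {
                return Optional.of(LocalDate.parse(m.group(), FORMAT));
            } catch (DateTimeParseException e) {
                // Looked like a date but is not one (e.g. "Foo 39, 2014"); keep searching.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstDate("Released on July 4, 2014."));
    }
}
```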
You need to check m.find() first, and print m.group(0) instead of m.group(1).
String text = "July 4, 2014";
String pattern = "\\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?) [0-9]{1,2}, [0-9]{4}";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(text);
if (m.find()) {
    System.out.println("Found value: " + m.group(0));
}
You need to check if (m.find()) and print m.group(0). If you print m.group(1), you get only what the capturing group (1|2) matched, which is 1 or 2 depending on your input; since your input contains 2014, m.group(1) prints 2. m.group(0) is the entire match of "[a-zA-Z]+ [0-3]?[0-9], (1|2)\\d{3}", so it prints the full date. (1|2) is the only capturing group in the pattern, and it is group 1.
Try this code:
String text = "July 4, 2014";
Pattern p = Pattern.compile("[a-zA-Z]+ [0-3]?[0-9], (1|2)\\d{3}", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(text);
if (m.find()) {
    System.out.println(m.group(0));
} else {
    System.out.println("No match found");
}
Output
July 4, 2014
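To see the group numbering side by side, the following small sketch returns both group(0) and group(1) for the pattern from the question. GroupDemo and groups are illustrative names only.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns { whole match, contents of capturing group 1 }, or an empty array if nothing matches.
    static String[] groups(String text) {
        Pattern p = Pattern.compile("[a-zA-Z]+ [0-3]?[0-9], (1|2)\\d{3}");
        Matcher m = p.matcher(text);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(groups("July 4, 2014")));
    }
}
```

If the year's first digit is not needed separately, writing [12] or (?:1|2) instead of (1|2) leaves the pattern with no capturing groups at all.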
My input is like
String str = "-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2";
I want the output as:
1.33E+4
helloeeee
4
5
2
10
2
5
10
2
But I am getting the output as
1.33, 4, helloeeee, 4, 5, 2, 10, 2, 5, 10, 2
I want the number to keep its exponent after splitting, i.e. "1.33E+4" as a single token.
Here is my code:
String str = "-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2";
List<String> tokensOfExpression = new ArrayList<String>();
String[] tokens = str.split("[(?!E)+*\\-/()]+");
for (String token : tokens)
{
    System.out.println(token);
    tokensOfExpression.add(token);
}
if (tokensOfExpression.get(0).equals(""))
{
    tokensOfExpression.remove(0);
}
I would first replace the E+ with a symbol that is not ambiguous, such as:
str = str.replace("E+", "SCINOT");
You can then parse with StringTokenizer, replacing the SCINOT symbol when you need to evaluate the number represented in scientific notation.
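A minimal sketch of that placeholder idea follows. It makes a few assumptions beyond the answer: it also masks "E-" (with a second, invented placeholder), it restores the original text on each token straight away, and it assumes the placeholder strings never occur in the input and that no identifier ends in E directly before a sign. The names PlaceholderTokens and operands are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class PlaceholderTokens {
    static List<String> operands(String expr) {
        // Hide the exponent signs so they are not mistaken for operators.
        String masked = expr.replace("E+", "SCIPOS").replace("E-", "SCINEG");
        List<String> out = new ArrayList<>();
        // StringTokenizer treats every character of the second argument as a delimiter
        // and never returns empty tokens.
        StringTokenizer st = new StringTokenizer(masked, "+-*/()");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken().replace("SCIPOS", "E+").replace("SCINEG", "E-"));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(operands("-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2"));
    }
}
```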
You can't do that with a single regular expression, because of the ambiguities introduced by floating-point constants in scientific notation, and in any case you need to know which token is which without having to re-scan them. You've also mis-stated your requirement, as you certainly need the binary operators in the output as well. You need to write both a scanner and a parser. Have a look for 'recursive descent expression parser' and 'Dijkstra shunting-yard algorithm'.
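To give an idea of the scanner half only, here is a rough sketch that emits numbers, identifiers, and operators as separate tokens, so a parser could consume them afterwards. The token grammar (unsigned numbers with optional fraction and exponent, word-like identifiers, single-character operators) is an assumption made for this example, as are the names ExpressionScanner and scan. It does not parse or evaluate anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpressionScanner {
    private static final Pattern TOKEN = Pattern.compile(
            "\\d+(?:\\.\\d+)?(?:[eE][+-]?\\d+)?" // number, optional fraction and exponent
            + "|[A-Za-z_]\\w*"                   // identifier
            + "|[-+*/()]");                      // operator or parenthesis

    // Returns the tokens in order. Characters that fit no alternative are skipped silently;
    // a real scanner would report them as errors.
    static List<String> scan(String expr) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expr);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2"));
    }
}
```

Because the number alternative is tried first and takes the exponent sign with it, the + in 1.33E+4 stays inside the number, while a leading - is emitted as its own token and left for the parser to interpret as unary or binary.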
Try this:
String[] tokens = str.split("(?<!E)[*\\-/()+]+");
It's easier to achieve the result with Matcher:
String str = "-1.33E+4-helloeeee+4+(5*2(10/2)5*10)/2";
Matcher m = Pattern.compile("\\d+\\.\\d*E[+-]?\\d+|\\w+").matcher(str);
while (m.find()) {
    System.out.println(m.group());
}
prints
1.33E+4
helloeeee
4
5
2
10
2
5
10
2
Note that it needs some testing with different floating-point expressions, but it is easily adjustable.