Java regex pattern query [duplicate] - java

Just a quick question about Java regex patterns. Say I had a method like this:
public void example()
{
    Pattern p = Pattern.compile("\\d*");
    Matcher m = p.matcher("ab34ef");
    boolean b = false;
    while (b = m.find())
    {
        System.out.println(m.start() + " " + m.group());
    }
}
If I ran this, I would get the following output:
0
1
2 34
4
5
6
I understand how this works, apart from how it ends up at 6. I thought it would finish on 5. Could someone please explain this to me? Thanks!

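In your string, "ab34ef", there are 7 positions at which a match can start: one before each character and one after the last, i.e. the location of each | in "|a|b|3|4|e|f|". The matcher attempts a match at each of these positions, not at each character, and \d* can match the empty string "" at any of them. Position 6 is the end of the string, which is why the output ends at 6 rather than 5. (Position 3 is not reported separately because it lies inside the match "34".)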
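To make the positions visible, here is a minimal sketch that prints the start and end offset of every match. The class name EmptyMatchDemo and the helper matchSpans are made up for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {
    // Returns one "start-end:text" entry per match found in the input.
    // An empty match has start == end and no text after the colon.
    static List<String> matchSpans(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            spans.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return spans;
    }

    public static void main(String[] args) {
        String input = "ab34ef";
        System.out.println("length = " + input.length());
        for (String span : matchSpans("\\d*", input)) {
            System.out.println(span);
        }
    }
}
```

The last entry starts and ends at offset 6, which equals the length of the string: the empty match sits after the final character.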

Related

Pattern and Matcher in Java: Matcher only finds one match instead of two [duplicate]

I'm working with Pattern and Matcher in Java. I have the following code:
String searchString = "0,00,0";
String searchInText = "0,00,00,0";
Pattern p = Pattern.compile(searchString);
Matcher m = p.matcher(searchInText);
while (m.find()) {
    ...
}
My problem is that the Matcher finds only one match, from the first zero to the fourth zero. But there should be another match, from the third zero to the last zero.
Can someone help me? Is there a workaround?
Getting overlapping matches with regex is tricky, especially if you're not very familiar with regexes.
If you're not really using regex functionality (like in your example), you could easily do this with an indexOf(String, int) and keep increasing the index from which you're doing the search.
int index = 0;
while ((index = text.indexOf(pattern, index)) > -1) {
    System.out.println(index + " " + pattern);
    index++;
}
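As a self-contained sketch of this idea, the class below runs the indexOf loop on the strings from the question and, for comparison, shows a regex variant based on a zero-width lookahead. The class and method names are invented for the example, and both helpers assume a non-empty search string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlappingMatches {
    // Plain substring search: restart one character after each hit,
    // so a later hit may overlap an earlier one.
    // Assumes needle is not empty.
    static List<Integer> indexOfAll(String text, String needle) {
        List<Integer> starts = new ArrayList<>();
        int index = 0;
        while ((index = text.indexOf(needle, index)) > -1) {
            starts.add(index);
            index++;
        }
        return starts;
    }

    // Regex variant: the lookahead (?=...) consumes no characters,
    // so the matcher can report every position where needle begins.
    static List<Integer> lookaheadAll(String text, String needle) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile("(?=" + Pattern.quote(needle) + ")").matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String searchString = "0,00,0";
        String searchInText = "0,00,00,0";
        System.out.println(indexOfAll(searchInText, searchString));   // [0, 3]
        System.out.println(lookaheadAll(searchInText, searchString)); // [0, 3]
    }
}
```

Both helpers report offsets 0 and 3, i.e. the match from the first zero and the overlapping one from the third zero.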

Regex shows incorrect answer

I have a text file that contains information about a person. I have written a regex to extract the person's age, i.e. X years Y months.
String n = "Mayur is 18 years 4 months old ";
Pattern p = Pattern.compile("[\\d+\\s+years]+[\\d+\\s+months]+", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(n);
while (m.find()) {
    System.out.println(m.group(0));
}
The output I received is:
r
s 18 years 4 months o
I did not want those extra characters in the output, but they are listed anyway.
Expected output is:
18 years 4 months
Please note that there are records with only years and some with only months.
The problem with your regex is that [\d+\s+years] is a character class: it matches any single character found in the list, which is why you got r in the result. You don't need the brackets [].
This is the regex you need: (\\d+\\s*years\\s*)*(\\d+\\s*months)*. Use () for a matching group.
I changed \\s+ to \\s* so that it also matches cases where it's written:
Mayur is 18years 4months old
EDIT:
The empty-string matches are caused by the * quantifier after the groups, since both groups can then match zero times. I fixed it with this new regex:
(\\d+\\s*years\\s*)+|(\\d+\\s*months)+
Note that with the alternation, the years part and the months part are returned as separate matches.
(?:\\d+\\s+(?:years|months)\\s*){1,2}
Use this. [] is not what you think: it is a character class. See the demo:
https://regex101.com/r/uE3cC4/25
Try this:
String n = "Mayur is 18 years 4 months old ";
Pattern p = Pattern.compile("([0-9]+) years ([0-9]+) months", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(n);
while (m.find()) {
    String years = m.group(1);
    String months = m.group(2);
    System.out.println(m.group(0));
}
Group 0 returns the whole match; groups 1 and 2 return the years and months values respectively.
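Since the question mentions records that contain only years or only months, a possible extension is sketched below. It is not taken from any of the answers: the pattern, the class name AgeExtractor, the helper parseAge and the two extra sample records are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AgeExtractor {
    // Either "<n> years" optionally followed by "<n> months" (groups 1 and 2),
    // or "<n> months" on its own (group 3). Whitespace before the unit is optional.
    private static final Pattern AGE = Pattern.compile(
            "(\\d+)\\s*years?(?:\\s*(\\d+)\\s*months?)?|(\\d+)\\s*months?",
            Pattern.CASE_INSENSITIVE);

    // Returns {years, months} for the first age found, using 0 for a missing part,
    // or null when the record contains no age at all.
    static int[] parseAge(String record) {
        Matcher m = AGE.matcher(record);
        if (!m.find()) {
            return null;
        }
        if (m.group(1) != null) {
            int years = Integer.parseInt(m.group(1));
            int months = m.group(2) != null ? Integer.parseInt(m.group(2)) : 0;
            return new int[] {years, months};
        }
        return new int[] {0, Integer.parseInt(m.group(3))};
    }

    public static void main(String[] args) {
        // The first record is from the question; the other two are hypothetical.
        System.out.println(Arrays.toString(parseAge("Mayur is 18 years 4 months old ")));
        System.out.println(Arrays.toString(parseAge("Asha is 7 months old")));
        System.out.println(Arrays.toString(parseAge("Ravi is 30 years old")));
    }
}
```

Because neither alternative can match the empty string, find() never returns the empty matches that the * version produced.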

Understanding regular expression output [duplicate]

I need help to understand the output of the code below. I am unable to figure out the output for System.out.print(m.start() + m.group());. Please can someone explain it to me?
import java.util.regex.*;
class Regex2 {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d*");
        Matcher m = p.matcher("ab34ef");
        boolean b = false;
        while (b = m.find()) {
            System.out.println(m.start() + m.group());
        }
    }
}
Output is:
0
1
234
4
5
6
Note that if I put System.out.println(m.start() );, output is:
0
1
2
4
5
6
Because you have included a * character, your pattern will match empty strings as well. When I change your code as I suggested in the comments, I get the following output:
0 ()
1 ()
2 (34)
4 ()
5 ()
6 ()
So you have a large number of empty matches (one at each location in the string) with the exception of 34, which matches the run of digits. Use \\d+ if you want to match digits without also matching empty strings.
You used this regex - \d* - which basically means zero or more digits. Mind the zero!
So this pattern will match any group of digits, e.g. 34 plus any other position in the string, where the matched sequence will be the empty string.
So, you will have 6 matches, starting at indices 0,1,2,4,5,6. For match starting at index 2, the matched sequence is 34, while for the remaining ones, the match will be the empty string.
If you want to find only digits, you might want to use this pattern: \d+
\d* matches zero or more digits in the expression.
The expression ab34ef has the corresponding indices 012345.
At index 0 there is no digit, so start() prints 0 and group() prints nothing; the same happens at index 1. At index 2 we find a match, so it prints 2 and 34. Next it prints 4 and nothing, and so on.
Another example:
Pattern pattern = Pattern.compile("\\d\\d");
Matcher matcher = pattern.matcher("123ddc2ab23");
while (matcher.find()) {
    System.out.println("start:" + matcher.start() + " end:" + matcher.end() + " group:" + matcher.group() + ";");
}
which prints:
start:0 end:2 group:12;
start:9 end:11 group:23;
You will find more information in the tutorial

Java Regex does not match - groups [duplicate]

I know that this kind of question is asked very often, but I can't figure out why this regex does not match.
I want to check whether there is an "M" at the beginning of the line.
Finally, I want the path at the end of the line.
This is why startsWith() doesn't fit my needs.
String line = "M 72208 70779 aab src\\com\\aut\\testproject\\TestDomainf1.java";
if (line.matches("^(M?)(.*)$")) {}
I've also tried it the other way:
Pattern p = Pattern.compile("(M?)");
Matcher m = p.matcher(line);
if (m.matches()) {
    System.out.println("yay!");
}
if (line.matches("(M?)(.*)")) {}
Thanks
Seems to be simple:
if (line.startsWith("M")) {
    String[] tokens = line.split("\\s+");
    String path = tokens[tokens.length - 1];
}
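If a regex with groups is still preferred, one possible shape is sketched below. Keep in mind that matches() succeeds only when the pattern covers the entire line, which is why a pattern consisting of just (M?) fails on this input. The class name PathLine, the helper parse and the pattern itself are assumptions for illustration, and the line is assumed to have no trailing whitespace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PathLine {
    // Group 1: the optional leading "M" (empty string when absent).
    // Group 2: the last whitespace-free token, taken here to be the path.
    // The lazy .*? skips everything in between.
    private static final Pattern LINE = Pattern.compile("(M?).*?(\\S+)");

    // Returns {flag, path}, or null when the line has no non-blank token.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String line = "M 72208 70779 aab src\\com\\aut\\testproject\\TestDomainf1.java";
        String[] parts = parse(line);
        System.out.println("modified: " + parts[0].equals("M"));
        System.out.println("path: " + parts[1]);
    }
}
```

Calling trim() on the line before parsing would be one way to cope with trailing whitespace.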

Regular expression - Greedy quantifier [duplicate]

I am really struggling with this question:
import java.util.regex.*;
class Regex2 {
    public static void main(String[] args) {
        Pattern p = Pattern.compile(args[0]);
        Matcher m = p.matcher(args[1]);
        boolean b = false;
        while (b = m.find()) {
            System.out.print(m.start() + m.group());
        }
    }
}
When the above program is run with the following command:
java Regex2 "\d*" ab34ef
It outputs 01234456. I don't really understand this output. Consider the following indexes for each of the characters:
a b 3 4 e f
^ ^ ^ ^ ^ ^
0 1 2 3 4 5
Shouldn't the output have been 0123445?
I have been reading around and it looks like the RegEx engine will also read the end of the string but I just don't understand. Would appreciate if someone can provide a step by step guide as to how it is getting that result. i.e. how it is finding each of the numbers.
It is helpful to change
System.out.print(m.start() + m.group());
to
System.out.println(m.start() + ": " + m.group());
This way the output is much clearer:
0:
1:
2: 34
4:
5:
6:
You can see that it matched at 6 different positions: at position 2 it matched the string "34" and at every other position it matched an empty string. The empty string matches at the end of the input as well, which is why you see "6" at the end of your output.
Note that if you run your program like this:
java Regex2 "\d+" ab34ef
it will only output
2: 34
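The two runs can be compared side by side with a small sketch that builds the same concatenated string as the original System.out.print loop. The class name ConcatenatedOutput and the helper run are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConcatenatedOutput {
    // Mimics the original program: for every match, append start()
    // immediately followed by group(), with no separator.
    static String run(String regex, String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.append(m.start()).append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("\\d*", "ab34ef")); // 01234456
        System.out.println(run("\\d+", "ab34ef")); // 234
    }
}
```

Reading 01234456 as 0, 1, 2+"34", 4, 5, 6 shows where the run-together digits come from: the "34" is the matched text, not two extra positions.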
