Trying to understand this Regex code [duplicate] - java

This question already has an answer here:
SCJP6 regex issue
(1 answer)
Closed 7 years ago.
I have the following code. As far as I can see, the program should print 0123445. Instead, it prints 01234456.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex2 {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d*");
        Matcher m = p.matcher("ab34ef");
        boolean b = false;
        while(b=m.find()){
            System.out.print(m.start() + m.group());
        }
        System.out.println();
    }
}
I think the following should happen:
Since the search pattern is \d*,
It finds a hit at position 0, but since the hit is not a digit, it just prints 0.
It finds a hit at position 1, but again, it is not a digit, so it just prints 1.
It finds a hit at position 2 and, since we are looking for \d*, the hit is 34, so it prints 234.
It moves to position 4 and finds a hit, but since the hit is not a digit, it just prints 4.
It moves to position 5 and finds a hit, but since the hit is not a digit, it just prints 5.
At this point, as far as I can see, it should be done. But for some reason, the program also prints a 6.
I would much appreciate it if someone could explain.

\d* matches zero (!) or more digits. That is why it returns an empty string as a match at positions 0 and 1, then matches 34 at position 2, and returns an empty string again at positions 4 and 5. At that point, what is left to match against is an empty string. This empty string also matches \d* (because an empty string contains zero digits), which is why there is another match at position 6.
To contrast this try using \d+ (which matches one or more digits) as the pattern and see what happens then.
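To make the contrast concrete, here is a minimal sketch that collects what the loop from the question would print for each pattern. The class name DigitMatchDemo and the helper describeMatches are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not from the question.
class DigitMatchDemo {
    // Returns one entry per match: the start index followed by the matched text,
    // which is the same string the loop in the question prints for each match.
    static List<String> describeMatches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start() + m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // \d* also matches the empty string, including at the very end of the input.
        System.out.println(describeMatches("\\d*", "ab34ef")); // [0, 1, 234, 4, 5, 6]
        // \d+ needs at least one digit, so only the run 34 at index 2 matches.
        System.out.println(describeMatches("\\d+", "ab34ef")); // [234]
    }
}
```

Concatenating the first list gives 01234456, the output the question asks about.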

Related

Why does this regex fails to check accurately?

I have the following method, which matches a given string against a regex in 3 stages. For some reason, the regex fails to reject some invalid inputs. As far as I know, the patterns look correct. Can someone please point out what I am doing wrong here?
I have the following code:
public class App {
    public static void main(String[] args) {
        String identifier = "urn:abc:de:xyz:234567.1890123";
        if (identifier.matches("^urn:abc:de:xyz:.*")) {
            System.out.println("Match ONE");
            if (identifier.matches("^urn:abc:de:xyz:[0-9]{6,12}.[0-9]{1,7}.*")) {
                System.out.println("Match TWO");
                if (identifier.matches("^urn:abc:de:xyz:[0-9]{6,12}.[a-zA-Z0-9.-_]{1,20}$")) {
                    System.out.println("Match Three");
                }
            }
        }
    }
}
Ideally, this code should generate the output
Match ONE
Match TWO
Match Three
only when the identifier is "urn:abc:de:xyz:234567.1890123.abd12", but it produces the same output even if the identifier does not have the intended format, as with the following inputs:
"urn:abc:de:xyz:234567.1890123"
"urn:abc:de:xyz:234567.1890ANC"
"urn:abc:de:xyz:234567.1890123"
"urn:abc:de:xyz:234567.1890ACB.123"
I do not understand why it allows alphanumeric characters after the first . and why it does not care about the characters after the second .
I would like my regex to check that the string has the following format:
1. The string starts with urn:abc:de:xyz:
2. Then it has 6 to 12 digits [0-9] (234567).
3. Then it has the decimal point .
4. Then it has 1 to 7 digits [0-9] (1890123).
5. Then it has the decimal point .
6. Finally, it has 1 to 20 alphanumeric and special characters (ABC123.-_12).
This is a valid string for my regex: urn:abc:de:xyz:234567.1890123.ABC123.-_12
This is an invalid string for my regex as it misses the elements from point 6:
urn:abc:de:xyz:234567.1890123
This is also an invalid string for my regex as it misses the elements from point 4 (it has ABC instead of decimal numbers).
urn:abc:de:xyz:234567.1890ABC.ABC123.-_12
This part of the regex:
[0-9]{6,12}.[0-9]{1,7} matches 6 to 12 digits followed by any character followed by 1 to 7 digits
To match a dot, it needs to be escaped. Try this:
^urn:abc:de:xyz:[0-9]{6,12}\.[0-9]{1,7}\.[a-zA-Z0-9\-_]{1,20}$
The following variant allows any number of dot-separated alphanumeric segments at the end of the string, as in your examples:
^urn:abc:de:xyz:\d{6,12}\.\d{1,7}(?:\.[\w-]{1,20})+$
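As a quick check, the sketch below runs both patterns from this answer against the inputs from the question. The class name UrnCheck and its method names are hypothetical and only serve the example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not from the question.
class UrnCheck {
    // First pattern from the answer: the dots are escaped, so each one matches
    // a literal '.' instead of any character.
    private static final Pattern SINGLE_SEGMENT = Pattern.compile(
            "^urn:abc:de:xyz:[0-9]{6,12}\\.[0-9]{1,7}\\.[a-zA-Z0-9\\-_]{1,20}$");

    // Second pattern from the answer: one or more dot-separated trailing segments.
    private static final Pattern MULTI_SEGMENT = Pattern.compile(
            "^urn:abc:de:xyz:\\d{6,12}\\.\\d{1,7}(?:\\.[\\w-]{1,20})+$");

    static boolean isValid(String identifier) {
        return SINGLE_SEGMENT.matcher(identifier).matches();
    }

    static boolean isValidMultiSegment(String identifier) {
        return MULTI_SEGMENT.matcher(identifier).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("urn:abc:de:xyz:234567.1890123.abd12")); // true
        System.out.println(isValid("urn:abc:de:xyz:234567.1890123"));       // false
        System.out.println(isValid("urn:abc:de:xyz:234567.1890ANC"));       // false
        System.out.println(isValid("urn:abc:de:xyz:234567.1890ACB.123"));   // false
        // The last segment here contains a '.', which the first pattern's
        // character class does not allow; the second pattern accepts it.
        System.out.println(isValid("urn:abc:de:xyz:234567.1890123.ABC123.-_12"));             // false
        System.out.println(isValidMultiSegment("urn:abc:de:xyz:234567.1890123.ABC123.-_12")); // true
    }
}
```

Note that the first pattern leaves the dot out of the final character class, so the valid example from the question (ending in ABC123.-_12) is only accepted by the second pattern.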

A robot moves any 4 direction up,down,left,right with specific number between 1-9

I'm having a problem fixing this regex.
Problem: Assuming, the robot can move in any one of the four directions (Forward (F), Back (B), Left (L), Right (R)) followed by the number of steps it can move in that direction. The number of steps it can take is between 1 to 9.
Valid operations: F4L1B3, R5F2, B7, L8F2R4B3, L1, R5
Invalid operations: 12, LR, L2J2, K3F5, R12, F6L7R12, B5R8L+, L4-R3
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class extra {
    public static void main(String[] args){
        Scanner scn = new Scanner(System.in);
        System.out.println("Enter the String: ");
        String move = scn.nextLine();
        finalPosition(move);
    }

    static void finalPosition(String move)
    {
        Pattern p = Pattern.compile("([F][1-9]+)|([B][1-9]+)|([L][1-9]+)|([R][1-9])") ;
        Matcher m = p.matcher(move);
        boolean b = m.matches();
        if (b)
        {
            System.out.println("Robot is moving");
        }else
            System.out.println("Invalid input");
    }
}
Four uppercase characters are valid, and each must be followed by a digit 1-9. Such a pair seems to be valid only in a sequence of 1 or more occurrences, with no other non-whitespace characters allowed in the string.
You might shorten the whole alternation to a character class matching 1 of the 4 allowed characters followed by a digit 1-9, and repeat that as a whole 1 or more times.
To keep other characters out of the match, you can use whitespace boundaries on the left and the right.
(?<!\S)(?:[FBLR][1-9])+(?!\S)
(?<!\S) Assert that there is no non-whitespace char directly to the left
(?:[FBLR][1-9])+ Repeat 1 or more times: match one of F, B, L, R followed by a digit 1-9
(?!\S) Assert that there is no non-whitespace char directly to the right
In Java
String regex = "(?<!\\S)(?:[FBLR][1-9])+(?!\\S)";
If this is the only input that should match from the start till the end of the string, you might also use anchors:
^(?:[FBLR][1-9])+$
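Here is a minimal sketch that runs the anchored pattern against the valid and invalid operations listed in the question. The class name RobotMoves and the method isValid are made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not from the question.
class RobotMoves {
    // One or more pairs of a direction letter and a single digit 1-9, nothing else.
    private static final Pattern MOVES = Pattern.compile("^(?:[FBLR][1-9])+$");

    static boolean isValid(String move) {
        // matches() already requires the whole input to match,
        // so the ^ and $ anchors are redundant here but harmless.
        return MOVES.matcher(move).matches();
    }

    public static void main(String[] args) {
        String[] valid = {"F4L1B3", "R5F2", "B7", "L8F2R4B3", "L1", "R5"};
        String[] invalid = {"12", "LR", "L2J2", "K3F5", "R12", "F6L7R12", "B5R8L+", "L4-R3"};
        for (String move : valid) {
            System.out.println(move + " -> " + isValid(move)); // true for each
        }
        for (String move : invalid) {
            System.out.println(move + " -> " + isValid(move)); // false for each
        }
    }
}
```

R12 and F6L7R12 are rejected because each direction may be followed by exactly one digit.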

Java Regex : 4 Letters followed by 2 Integers

Regex beginner here.
I have already visited the following questions; none of them answers mine:
1, 2, 3, 4, 5, 6, etc.
I have a simple regex to check if a string contains 4 chars followed by 2 digits.
[A-Za-z]{4}[0-9]{2}
But when I use it, it doesn't match. Here is the method I use, with an example of input and output:
Input in a JPasswordField
Mypass85
Output
false
Method
public static boolean checkPass(char[] ca){
    String s = new String(ca);
    System.out.println(s); // Prints : Mypass85
    p = Pattern.compile("[A-Za-z]{4}[0-9]{2}");
    return p.matcher(s).matches();
}
Matcher#matches attempts to match the entire input. Use Matcher#find instead:
public static boolean checkPass(String s){
    System.out.println(s); // Prints : Mypass85
    p = Pattern.compile("[A-Za-z]{4}[0-9]{2}");
    return p.matcher(s).find();
}
Promoting a comment to an answer.
It doesn't match because "Mypass85" is 6 letters followed by 2 numbers, but your pattern expects exactly 4 letters followed by 2 numbers.
You can either pass something like "Pass85" to match your existing pattern, or you can get "Mypass85" to match by changing the {4} to {6} or to {4,} (4 or more).
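To compare the two suggestions side by side, here is a small sketch. The class name PassCheckDemo, the constant names and the two helper methods are assumptions made for the example; only the patterns come from the question and the answers.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not from the question.
class PassCheckDemo {
    // Pattern from the question: exactly 4 letters, then exactly 2 digits.
    static final Pattern EXACTLY_FOUR = Pattern.compile("[A-Za-z]{4}[0-9]{2}");
    // Variant from the answer above: 4 or more letters, then exactly 2 digits.
    static final Pattern FOUR_OR_MORE = Pattern.compile("[A-Za-z]{4,}[0-9]{2}");

    // True only if the whole string matches the pattern.
    static boolean wholeInputMatches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    // True if any substring matches the pattern.
    static boolean containsMatch(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInputMatches(EXACTLY_FOUR, "Mypass85")); // false
        System.out.println(containsMatch(EXACTLY_FOUR, "Mypass85"));     // true
        System.out.println(wholeInputMatches(EXACTLY_FOUR, "Pass85"));   // true
        System.out.println(wholeInputMatches(FOUR_OR_MORE, "Mypass85")); // true
        // find() also accepts surrounding characters.
        System.out.println(containsMatch(EXACTLY_FOUR, "!!Mypass85!!")); // true
    }
}
```

With find(), the match on "Mypass85" is the substring pass85. Since find() accepts any input that merely contains a match, changing the quantifier and keeping matches() is probably the safer choice if the whole password has to follow the format.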

Java Regular expressions issue - Can't match two strings in the same line [duplicate]

This question already has answers here:
What do 'lazy' and 'greedy' mean in the context of regular expressions?
(13 answers)
Closed 8 years ago.
I am experiencing some problems with Java regular expressions.
I have a program that reads through an HTML file and replaces any string enclosed in #VR# markers, e.g. #VR#Test1 2 3 4#VR#
However, my issue is that if a line contains more than one string surrounded by #VR#, they are not matched separately. The pattern matches from the leftmost #VR# to the rightmost #VR# in the line and takes whatever is in between.
For example:
#VR#Google#VR#
My code would match
URL-GOES-HERE#VR#" target="_blank" style="color:#f4f3f1; text-decoration:none;" title="ContactUs">#VR#Google
Here is my Java code. I would appreciate it if you could help me solve this:
Pattern p = Pattern.compile("#VR#.*#VR#");
Matcher m;
Scanner scanner = new Scanner(htmlContent);
while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    m = p.matcher(line);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        String match_found = m.group().replaceAll("#VR#", "");
        System.out.println("group: " + match_found);
    }
}
I tried replacing m.group() with m.group(0) and m.group(1) but nothing. Also m.groupCount() always returns zero, even if there are two matches as in my example above.
Thanks, your help will be very much appreciated.
Your problem is that .* is "greedy"; it will try to match as long a substring as possible while still letting the overall expression match. So, for example, in #VR# 1 #VR# 2 #VR# 3 #VR#, it will match 1 #VR# 2 #VR# 3.
The simplest fix is to make it "non-greedy" (matching as little as possible while still letting the expression match), by changing the * to *?:
Pattern p = Pattern.compile("#VR#.*?#VR#");
"Also m.groupCount() always returns zero, even if there are two matches as in my example above."
That's because m.groupCount() returns the number of capture groups in the underlying pattern (parenthesized subexpressions, whose matched substrings are retrieved using m.group(1), m.group(2), and so on), not the number of matches. In your case, your pattern has no capture groups, so m.groupCount() returns 0.
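Putting both points together, here is a minimal sketch that adds one capture group around the text between the markers and compares the greedy and non-greedy quantifiers. The class name VrExtract, the method extract and the sample line are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not from the question.
class VrExtract {
    // Collects group(1) of every match, i.e. the text between the markers
    // without the markers themselves. The regex must contain one capture group.
    static List<String> extract(String regex, String line) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(line);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "#VR#URL-GOES-HERE#VR# and #VR#Google#VR#";
        // Greedy: one match spanning from the first marker to the last one.
        System.out.println(extract("#VR#(.*)#VR#", line));
        // prints [URL-GOES-HERE#VR# and #VR#Google]
        // Non-greedy: one match per marker pair.
        System.out.println(extract("#VR#(.*?)#VR#", line));
        // prints [URL-GOES-HERE, Google]
    }
}
```

With the parentheses in place, m.groupCount() returns 1 for either pattern, regardless of how many matches find() produces.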
You can try the regular expression:
#VR#(((?!#VR#).)+)#VR#
Demo:
private static final Pattern REGEX_PATTERN =
        Pattern.compile("#VR#(((?!#VR#).)+)#VR#");

public static void main(String[] args) {
    String input = "#VR#Google#VR# ";
    System.out.println(
        REGEX_PATTERN.matcher(input).replaceAll("$1")
    ); // prints "Google "
}

Understanding regular expression output [duplicate]

This question already has an answer here:
SCJP6 regex issue
(1 answer)
Closed 7 years ago.
I need help to understand the output of the code below. I am unable to figure out the output for System.out.print(m.start() + m.group());. Please can someone explain it to me?
import java.util.regex.*;

class Regex2 {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d*");
        Matcher m = p.matcher("ab34ef");
        boolean b = false;
        while(b = m.find()) {
            System.out.println(m.start() + m.group());
        }
    }
}
Output is:
0
1
234
4
5
6
Note that if I put System.out.println(m.start() );, output is:
0
1
2
4
5
6
Because you have included a * character, your pattern will match empty strings as well. When I change your code as I suggested in the comments, I get the following output:
0 ()
1 ()
2 (34)
4 ()
5 ()
6 ()
So you have a number of empty matches (one at each position where no run of digits begins), with the exception of 34, which matches the string of digits. Use \\d+ if you want to match digits without also matching empty strings.
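The change suggested in the comments is not shown here. As an assumption, one way to get output in that shape is to print the matched text in parentheses after the start index, as in this sketch (the class name MatchFormatDemo and the method formatMatches are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not from the question.
class MatchFormatDemo {
    // Formats each match as "start (text)", so that empty matches stay visible
    // instead of blending into the index.
    static List<String> formatMatches(String regex, String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            lines.add(m.start() + " (" + m.group() + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : formatMatches("\\d*", "ab34ef")) {
            System.out.println(line);
        }
    }
}
```

The parentheses make it easier to tell the start index apart from the matched text, which is what makes the original output 234 hard to read.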
You used this regex - \d* - which basically means zero or more digits. Mind the zero!
So this pattern will match any group of digits, e.g. 34 plus any other position in the string, where the matched sequence will be the empty string.
So, you will have 6 matches, starting at indices 0,1,2,4,5,6. For match starting at index 2, the matched sequence is 34, while for the remaining ones, the match will be the empty string.
If you want to find only digits, you might want to use this pattern: \d+
\d* matches zero or more digits in the expression.
The expression ab34ef has the corresponding indices 012345.
At index 0 there are no digits, so the match is empty: start() gives 0 and group() gives the empty string. The same happens at index 1, which prints 1 and nothing. At index 2 the match is 34, so it prints 2 and 34. Next it prints 4 and nothing, and so on.
Another example:
Pattern pattern = Pattern.compile("\\d\\d");
Matcher matcher = pattern.matcher("123ddc2ab23");
while(matcher.find()) {
    System.out.println("start:" + matcher.start() + " end:" + matcher.end() + " group:" + matcher.group() + ";");
}
which prints:
start:0 end:2 group:12;
start:9 end:11 group:23;
You will find more information in the tutorial
