Regex not matching all numbers with delimiters - java

Need a single combined regex for the following pattern:
Prefix: 2221-2720 , Length: 16
Prefix: 51-55 , Length: 16
where the delimiters between digits can be a space ( ), minus sign (-), period (.), backslash (\), or equals sign (=). The condition is that at most one delimiter (of any type) may occur between any two digits.
Valid number - 230.293.217.952.148.4
Valid number - 230.293 217-952.148.4
Invalid number - 230..293.217.952.148.4
Invalid number - 230.293.-217. 952.148.4
A valid input is one where you have 16 digits separated by any/no delimiters as long as there are no two delimiters adjacent to each other.
Have come up with the following regex:
(2[\s=\\.-]*2[\s=\\.-]*2[\s=\\.-]*[1-9][\s=\\.-]*|2[\s=\\.-]*2[\s=\\.-]*[3-9][\s=\\.-]*[0-9][\s=\\.-]*|2[\s=\\.-]*[3-6][\s=\\.-]*[0-9](?:[\s=\\.-]*[0-9]){1}|2[\s=\\.-]*7[\s=\\.-]*[01][\s=\\.-]*[0-9][\s=\\.-]*|2[\s=\\.-]*7[\s=\\.-]*2[\s=\\.-]*0[\s=\\.-]*)[0-9](?:[\s=\\.-]*[0-9]){11}|(5[\s=\\.-]*[1-5][\s=\\.-]*)[0-9](?:[\s=\\.-]*[0-9]){13}
It does not match certain patterns. For example:
2 3 0 2 9 3 2 1 7 9 5 2 1 4 8 4
23-02-93-21-79-52-14-84
2 3 0 3 4 5 8 0 9 4 9 3 0 8 2 3
For the same numbers, it matches (as expected) the following patterns:
2302932179521484
230.293.217.952.148.4
2303458094930823
230.345.809.493.082.3
230-345-809-493-082-3
There seems to be an issue with delimiters. Kindly let me know what is wrong with my regex.

For this rule
A valid input is one where you have 16 digits separated by any/no
delimiters as long as there are no two delimiters adjacent to each
other
Prefix: 2221-2720 , Length: 16
Prefix: 51-55 , Length: 16
2221 can also be written as 2.2-2.1
For these rules, it might be easier to write a pattern with 2 capture groups to match the whole string.
Then using some Java code, you can check the value of the capture groups for the ranges.
^((\d[ =\\.-]?\d)[ =\\.-]?\d[ =\\.-]?\d)(?:[ =\\.-]?\d){12}$
The pattern matches:
^ Start of string
( Capture group 1
(\d[ =\\.-]?\d) Capture group 2, match 2 digits with an optional delimiter (space, =, \, . or -) between them
[ =\\.-]?\d[ =\\.-]?\d Match 2 times an optional delimiter followed by a single digit
) Close group 1
(?:[ =\\.-]?\d){12} Repeat 12 times an optional delimiter followed by a single digit
$ End of string
For example
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String[] strings = {
    "2221.7.952.148.412.32",
    "230.293.217.952.148.4",
    "5511111111111111",
    "130.293 217-952.148.4",
    "30..293.217.952.148.4",
    "5..5",
    ".5.5."
};
String regex = "^((\\d[ =\\\\.-]?\\d)[ =\\\\.-]?\\d[ =\\\\.-]?\\d)(?:[ =\\\\.-]?\\d){12}$";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
for (String s : strings) {
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
        // Strip the delimiters, then compare the first 4 and first 2 digits with the ranges
        int grp1 = Integer.parseInt(matcher.group(1).replaceAll("\\D+", ""));
        int grp2 = Integer.parseInt(matcher.group(2).replaceAll("\\D+", ""));
        if ((grp1 >= 2221 && grp1 <= 2720) || (grp2 >= 51 && grp2 <= 55)) {
            System.out.println("Match for " + matcher.group());
        }
    }
}
Output
Match for 2221.7.952.148.412.32
Match for 230.293.217.952.148.4
Match for 5511111111111111
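The same two-step check can be wrapped in a reusable method and run against the inputs from the question. This is only a minimal sketch: the class name PrefixCheck and the helper isValid are hypothetical and not part of the original answer. As far as I can tell, the regex in the question rejects inputs such as 2 3 0 2 9 3 ... because its third alternative (the 23xx to 26xx branch) has no delimiter class between the fourth and fifth digit, and its * quantifier would accept runs of delimiters, which the ? quantifier used here does not.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixCheck {
    // Same pattern as in the answer: 16 digits, at most one delimiter between two digits.
    private static final Pattern SHAPE = Pattern.compile(
            "^((\\d[ =\\\\.-]?\\d)[ =\\\\.-]?\\d[ =\\\\.-]?\\d)(?:[ =\\\\.-]?\\d){12}$");

    // Hypothetical helper: true if the shape matches and the prefix is in one of the ranges.
    static boolean isValid(String input) {
        Matcher m = SHAPE.matcher(input);
        if (!m.matches()) {
            return false;
        }
        int firstFour = Integer.parseInt(m.group(1).replaceAll("\\D", ""));
        int firstTwo = Integer.parseInt(m.group(2).replaceAll("\\D", ""));
        return (firstFour >= 2221 && firstFour <= 2720) || (firstTwo >= 51 && firstTwo <= 55);
    }

    public static void main(String[] args) {
        System.out.println(isValid("2 3 0 2 9 3 2 1 7 9 5 2 1 4 8 4")); // true
        System.out.println(isValid("23-02-93-21-79-52-14-84"));         // true
        System.out.println(isValid("230..293.217.952.148.4"));          // false
        System.out.println(isValid("230.293.-217. 952.148.4"));         // false
    }
}
```

Keeping the range check in plain Java means the regex only has to describe the shape of the input, so new prefix ranges can be added without touching the pattern.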

Related

Validate a state via its pincode in Java [duplicate]

This question already has answers here:
Why doesn't [01-12] range work as expected?
(7 answers)
Closed 2 years ago.
I need to write a regex that validates whether the state is Tamil Nadu, based on the pincode.
The regex I tried fails in some cases:
String regex = "^[60-64]{2}{0-9}{4}$";
See the Tamil Nadu pincode info link. A pincode starts with 60-64 as its first two digits, followed by four digits in the range 0-9. It must have six digits.
The code:
public boolean isHomeState(String state, String zipcode) {
    if (isValidZipCode(zipcode)) {
        // ...
    }
    return true;
}

private boolean isValidZipCode(String zipcode) {
    String regex = "^[60-64]{2}{0-9}{4}$";
    Pattern p = Pattern.compile(regex);
    // If the pin code is empty
    // return false
    if (zipcode == null) {
        return false;
    }
    Matcher m = p.matcher(zipcode);
    return m.matches();
}
Try the following.
Pattern pattern = Pattern.compile("6[0-4]\\d{4}");
In other words, the digit 6 followed by a digit that is either 0 or 1 or 2 or 3 or 4 and ending with exactly four digits.
There are many regex patterns that do this, e.g.
6[0-4][0-9]{4}, which means 6 followed by a digit in the range 0 to 4, which is in turn followed by exactly 4 digits.
6[0-4]\d{4}, which means the same, with \d as shorthand for a digit.
6[0-4][0-9][0-9][0-9][0-9], which means the same, with the 4 trailing digits written out one by one.
However, let's analyze your regex, [60-64]{2}[0-9]{4}, which will help you understand the problem better.
The character class [60-64] matches a single character that is one of the following:
6
A digit in the range 0 to 6
4
And [60-64]{2} means the above repeated exactly two times i.e. it will match with a combination of 2 digits in the range, 0 to 6 e.g. 00, 60, 34, 01, 10, 11 etc.
As a result, the regex, [60-64]{2}[0-9]{4} will match with first 2 digits consisting of digits in the range, 0 to 6 and next 4 digits consisting of digits in the range, 0 to 9 e.g.
012345
123856
234569
101010
202020
303030
404040
505050
606060
111111
222222
333333
which is not something you expect.
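To make the difference concrete, here is a small sketch that runs both patterns against a few sample inputs. The class name PincodeCheck and its two helper methods are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

public class PincodeCheck {
    // The pattern analyzed above: two characters from the class [0-6], then four digits.
    private static final Pattern LOOSE = Pattern.compile("[60-64]{2}[0-9]{4}");
    // The suggested fix: 6, then one of 0-4, then exactly four digits.
    private static final Pattern STRICT = Pattern.compile("6[0-4]\\d{4}");

    static boolean looseMatches(String s) {
        return LOOSE.matcher(s).matches();
    }

    static boolean strictMatches(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"600001", "641234", "012345", "650001", "700001"};
        for (String s : samples) {
            System.out.println(s + " loose=" + looseMatches(s) + " strict=" + strictMatches(s));
        }
        // 600001 and 641234 match both patterns,
        // 012345 and 650001 match only the loose one,
        // 700001 matches neither.
    }
}
```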

In Java with regular expressions, how to capture numbers from a string with unknown length?

My regular expression looks like this: "[a-zA-Z]+[ \t]*(?:,[ \t]*(\\d+)[ \t]*)*"
I can match the lines with this, but I don't know how to capture the numbers. I think it has something to do with grouping.
For example: from the string "asd , 5 ,2,6 ,8", how to capture the numbers 5 2 6 and 8?
A few more examples:
sdfs6df -> no capture
fdg4dfg, 5 -> capture 5
fhhh3 , 6,8 , 7 -> capture 6 8 and 7
asdasd1,4,2,7 -> capture 4 2 and 7
So I can continue my work with these numbers. Thanks in advance.
You could match the leading word characters and use the \G anchor to capture each run of digits that follows a comma.
Pattern
(?:\w+|\G(?!^))\h*,\h*([0-9]+)
Explanation
(?: Non capture group
\w+ Match 1+ word chars
| or
\G(?!^) Assert position at the end of the previous match, not at the start
) Close non capturing group
\h*,\h* Match a comma between horizontal whitespace chars
([0-9]+) Capture group 1, match 1+ digits
In Java with double escaped backslashes:
String regex = "(?:\\w+|\\G(?!^))\\h*,\\h*([0-9]+)";
Example code
String regex = "(?:\\w+|\\G(?!^))\\h*,\\h*([0-9]+)";
String string = "sdfs6df -> no capture\n\n"
        + "fdg4dfg, 5 -> capture 5\n\n"
        + "fhhh3 , 6,8 , 7 -> capture 6 8 and 7\n\n"
        + "asdasd1,4,2,7 -> capture 4 2 and 7";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
Output
5
6
8
7
4
2
7
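To keep working with the numbers, as the question asks, the same pattern can be applied one line at a time and the captures collected into a list. This is a minimal sketch: the class NumberCapture and the helper extractNumbers are hypothetical names, and Integer.parseInt assumes the numbers fit into an int.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberCapture {
    // Same pattern as above: a leading word, or the end of the previous match,
    // followed by a comma and a captured run of digits.
    private static final Pattern NUMBERS =
            Pattern.compile("(?:\\w+|\\G(?!^))\\h*,\\h*([0-9]+)");

    // Hypothetical helper: returns the captured numbers of one input line.
    static List<Integer> extractNumbers(String line) {
        List<Integer> numbers = new ArrayList<>();
        Matcher matcher = NUMBERS.matcher(line);
        while (matcher.find()) {
            numbers.add(Integer.parseInt(matcher.group(1)));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extractNumbers("asd , 5 ,2,6 ,8")); // [5, 2, 6, 8]
        System.out.println(extractNumbers("sdfs6df"));         // []
        System.out.println(extractNumbers("fhhh3 , 6,8 , 7")); // [6, 8, 7]
    }
}
```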

Regex for string with separate sections

I am new to regular expressions in Java and I have to match a pattern of
[0-9999][A-Z][0-9999][-][0-99] for an input from the user. I'm not quite sure how to separate the different sections!
You can use this regex [0-9]{1,4}[A-Z][0-9]{1,4}[-][0-9]{1,2}:
import java.util.Scanner;

public static void main(String[] args) {
    Scanner scan = new Scanner(System.in);
    System.out.println("Please enter a String that matches [0-9999][A-Z][0-9999][-][0-99]");
    String input = scan.nextLine();
    // If the input matches the pattern, print Match, else Not Match
    System.out.println(input.matches("[0-9]{1,4}[A-Z][0-9]{1,4}[-][0-9]{1,2}") ?
            "Match" : "Not Match");
}
Explanation
[0-9]{1,4} # 1 to 4 digits, i.e. a number between 0 and 9999
[A-Z] # An uppercase letter A to Z
[0-9]{1,4} # 1 to 4 digits, i.e. a number between 0 and 9999
[-] # A literal -
[0-9]{1,2} # 1 or 2 digits, i.e. a number between 0 and 99
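You'll have to use groups in the regex as below. Note that [0-9999] is a character class that matches only a single digit, so each number needs a quantifier:
([0-9]{1,4})([A-Z])([0-9]{1,4})-([0-9]{1,2})
Then you'll be able to use Matcher.group(int) to read each group.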
You can see it working here
https://regex101.com/r/hW3O5Z/1
You can read more about it at
https://docs.oracle.com/javase/tutorial/essential/regex/groups.html
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
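To illustrate reading the individual sections through capture groups, here is a minimal sketch. It assumes the quantified pattern [0-9]{1,4}[A-Z][0-9]{1,4}-[0-9]{1,2} with parentheses added around each section; the class SectionGroups and the helper sections are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SectionGroups {
    // One capture group per section: number, letter, number, number.
    private static final Pattern SECTIONS =
            Pattern.compile("([0-9]{1,4})([A-Z])([0-9]{1,4})-([0-9]{1,2})");

    // Hypothetical helper: returns the four sections, or null if the input does not match.
    static String[] sections(String input) {
        Matcher m = SECTIONS.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sections("12A345-6")));  // [12, A, 345, 6]
        System.out.println(Arrays.toString(sections("12345A1-1"))); // null
    }
}
```

Group 0 is the whole match, so the sections start at group 1.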
The regex pattern would look like:
[0-9]{1,4}[A-Z][0-9]{1,4}-[0-9]{1,2}
The square brackets define character sets, so [0-9] matches any single digit from 0 to 9.
The curly braces are quantifiers, so {1,4} matches 1 to 4 repetitions of whatever comes before it.
To match the dash we just type the character.
So the whole regex looks for 1 to 4 digits, then an uppercase letter from A to Z, then 1 to 4 digits, then a dash, then 1 to 2 digits.

Understanding regular expression output [duplicate]

This question already has an answer here:
SCJP6 regex issue
(1 answer)
Closed 7 years ago.
I need help to understand the output of the code below. I am unable to figure out the output for System.out.print(m.start() + m.group());. Please can someone explain it to me?
import java.util.regex.*;

class Regex2 {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d*");
        Matcher m = p.matcher("ab34ef");
        boolean b = false;
        while (b = m.find()) {
            System.out.println(m.start() + m.group());
        }
    }
}
Output is:
0
1
234
4
5
6
Note that if I put System.out.println(m.start() );, output is:
0
1
2
4
5
6
Because you have included a * character, your pattern will match empty strings as well. When I change your code as I suggested in the comments, I get the following output:
0 ()
1 ()
2 (34)
4 ()
5 ()
6 ()
So you have a large number of empty matches (one at each remaining position in the string) with the exception of 34, which matches the string of digits. Use \\d+ if you want to match digits without also matching empty strings.
You used this regex - \d* - which basically means zero or more digits. Mind the zero!
So this pattern will match any group of digits, e.g. 34 plus any other position in the string, where the matched sequence will be the empty string.
So, you will have 6 matches, starting at indices 0,1,2,4,5,6. For match starting at index 2, the matched sequence is 34, while for the remaining ones, the match will be the empty string.
If you want to find only digits, you might want to use this pattern: \d+
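The difference between the two quantifiers can be checked side by side. The sketch below is only an illustration; the class EmptyMatchDemo and the helper findAll are hypothetical names, and each match is rendered as start index, a colon, and the matched text in quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {
    // Hypothetical helper: lists every match as start:'text'.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.start() + ":'" + m.group() + "'");
        }
        return matches;
    }

    public static void main(String[] args) {
        // \d* also succeeds with an empty match wherever there is no digit.
        System.out.println(findAll("\\d*", "ab34ef")); // [0:'', 1:'', 2:'34', 4:'', 5:'', 6:'']
        // \d+ needs at least one digit, so only 34 is found.
        System.out.println(findAll("\\d+", "ab34ef")); // [2:'34']
    }
}
```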
\d* matches zero or more digits in the expression.
The expression ab34ef has the corresponding indices 012345.
At index 0 there are no digits, so start() prints 0 and group() prints nothing; the same happens at index 1. At index 2 we find a match, so it prints 2 and 34. Next it prints 4 and nothing, and so on.
Another example:
Pattern pattern = Pattern.compile("\\d\\d");
Matcher matcher = pattern.matcher("123ddc2ab23");
while (matcher.find()) {
    System.out.println("start:" + matcher.start() + " end:" + matcher.end() + " group:" + matcher.group() + ";");
}
which will print:
start:0 end:2 group:12;
start:9 end:11 group:23;
You will find more information in the tutorial

Regular expressions: some groups missing

I have following Java code:
String s2 = "SUM 12 32 42";
Pattern pat1 = Pattern.compile("(PROD)|(SUM)(\\s+(\\d+))+");
Matcher m = pat1.matcher(s2);
System.out.println(m.matches());
System.out.println(m.groupCount());
for (int i = 1; i <= m.groupCount(); ++i) {
    System.out.println(m.group(i));
}
which produces:
true
4
null
SUM
42
42
I wonder what's a null and why 12 and 32 are missing (I expected to find them amongst groups).
A repeated group will contain the match of the last substring matching the expression for the group.
It would be nice if the regexp engine would give back all substrings that matched a group. Unfortunately this is not supported:
Regular expression with variable number of groups?
Furthermore, groups are static and numbered like this:
            0
 _______________________
/                       \
(PROD)|(SUM)(\\s+(\\d+))+
\____/ \___/|    \____/|
  1      2  |      4   |
            \__________/
                 3
Group X from this part of your regex:
(\\s+(\\d+))+
|          |
+----------+--> X
will first match 12, then 32 and finally 42. Each time X's value gets changed, and replaces the previous one. If you want all values, you'll need a Pattern & Matcher.find() approach:
String s = "SUM 12 32 42 PROD 1 2";
Matcher m = Pattern.compile("(PROD|SUM)((\\s+\\d+)+)").matcher(s);
while (m.find()) {
    System.out.println("Matched : " + m.group(1));
    Matcher values = Pattern.compile("\\d+").matcher(m.group(2));
    while (values.find()) {
        System.out.println(" : " + values.group());
    }
}
which will print:
Matched : SUM
: 12
: 32
: 42
Matched : PROD
: 1
: 2
And you see a null printed because in group 1, there's PROD, which you didn't match.
I wonder what's a null
Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
http://download.oracle.com/javase/1.5.0/docs/api/java/util/regex/Matcher.html#group%28int%29
The given string does not match the first alternative, (PROD), so group 1 is null.
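A group that took no part in the match can also be detected programmatically: group(int) returns null for it and start(int) returns -1. The sketch below shows this for the pattern from the question; the class GroupParticipation and the helper groupsOf are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupParticipation {
    // Hypothetical helper: groups 1..n of a full match, or an empty list if there is no match.
    static List<String> groupsOf(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i)); // null when group i took no part in the match
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String regex = "(PROD)|(SUM)(\\s+(\\d+))+";
        Matcher m = Pattern.compile(regex).matcher("SUM 12 32 42");
        if (m.matches()) {
            System.out.println(m.group(1) == null); // true
            System.out.println(m.start(1));         // -1
        }
        // Group 3 keeps only its last iteration, including the leading space.
        System.out.println(groupsOf(regex, "SUM 12 32 42")); // [null, SUM,  42, 42]
        System.out.println(groupsOf(regex, "PROD"));         // [PROD, null, null, null]
    }
}
```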
