Regex for string with separate sections

I am new to regular expressions in Java and I have to match a pattern of
[0-9999][A-Z][0-9999][-][0-99] for an input from the user. I'm not quite sure how to separate the different sections!

You can use this regex: [0-9]{1,4}[A-Z][0-9]{1,4}[-][0-9]{1,2}
import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        System.out.println("Please enter a String that matches [0-9999][A-Z][0-9999][-][0-99]");
        String input = scan.nextLine();
        // Print "Match" if the whole input matches the pattern, otherwise "Not Match"
        System.out.println(input.matches("[0-9]{1,4}[A-Z][0-9]{1,4}[-][0-9]{1,2}")
                ? "Match" : "Not Match");
    }
}
Explanation
[0-9]{1,4} # One to four digits (0 to 9999, leading zeros allowed)
[A-Z] # One uppercase letter, A to Z
[0-9]{1,4} # One to four digits (0 to 9999, leading zeros allowed)
[-] # A literal dash
[0-9]{1,2} # One or two digits (0 to 99, leading zeros allowed)

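You'll have to use groups in the regex, as below. Note that [0-9999] is a character class that matches a single digit, so each numeric section needs a quantifier.
([0-9]{1,4})([A-Z])([0-9]{1,4})-([0-9]{1,2})
Then you'll be able to use Matcher.group() to read each group.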
You can see it working here
https://regex101.com/r/hW3O5Z/1
You can read more about it at
https://docs.oracle.com/javase/tutorial/essential/regex/groups.html
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
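
As a rough illustration of reading the sections with Matcher.group(), here is a minimal sketch. It assumes the quantified pattern from the first answer with each section wrapped in a capturing group; the class name SectionGroups and the method split are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionGroups {
    // Quantified pattern with one capturing group per section (assumed layout).
    private static final Pattern SECTIONS =
            Pattern.compile("([0-9]{1,4})([A-Z])([0-9]{1,4})-([0-9]{1,2})");

    // Returns the four sections, or null if the whole input does not match.
    static String[] split(String input) {
        Matcher m = SECTIONS.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", split("123A4567-89"))); // 123 | A | 4567 | 89
        System.out.println(split("12345A1-1") == null);               // true (five leading digits)
    }
}
```

matches() is used rather than find() so that extra characters around the pattern cause a rejection instead of a partial match.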

The regex pattern would look like:
[0-9]{1,4}[A-Z][0-9]{1,4}-[0-9]{1,2}
Square brackets define character sets, so [0-9] matches any single digit from 0 to 9.
Curly braces are quantifiers, so {1,4} matches 1 to 4 repetitions of whatever comes before it.
To match the dash, we just type the character.
So the whole regex looks for 1 to 4 digits, then an uppercase letter from A to Z, then 1 to 4 digits, then a dash, then 1 to 2 digits.
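
One detail worth checking in Java is whether the pattern has to cover the whole input or may match anywhere inside it. The sketch below compares Matcher.matches() with Matcher.find() for this pattern; the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class MatchesVersusFind {
    private static final Pattern P =
            Pattern.compile("[0-9]{1,4}[A-Z][0-9]{1,4}-[0-9]{1,2}");

    // True only if the entire input has the required shape.
    static boolean wholeInput(String s) {
        return P.matcher(s).matches();
    }

    // True if the shape occurs anywhere inside the input.
    static boolean anywhere(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInput("12A345-6"));     // true
        System.out.println(wholeInput("id=12A345-6;")); // false
        System.out.println(anywhere("id=12A345-6;"));   // true
        System.out.println(wholeInput("123456A1-1"));   // false (six leading digits)
        System.out.println(anywhere("123456A1-1"));     // true (finds 3456A1-1)
    }
}
```

For validating user input, the whole-input check is usually the one wanted, since find() accepts inputs with too many digits.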

Related

Use regex to get 2 specific groups of substring

String s = #Section250342,Main,First/HS/12345/Jack/M,2000 10.00,
#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,
#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,
#Section251234,Main,First/HS/12345/Jack/M,2000 11.00
Wherever /Jack/M appears in the string, I want to pull the associated section numbers (250342, 251234) and values (10.00, 11.00) using regex.
I tried something like this, https://regex101.com/r/4te0Lg/1, but it still doesn't work.
.Section(\d+(?:\.\d+)?).*/Jack/M

If the only parts of each section that change are the section number, the name of the person and the last value (like in your example) then you can make a pattern very easily by using one of the sections where Jack appears and replacing the numbers you want by capturing groups.
Example:
#Section250342,Main,First/HS/12345/Jack/M,2000 10.00
becomes,
#Section(\d+),Main,First/HS/12345/Jack/M,2000 (\d+\.\d{2})
If the section substring keeps the format but the other parts of it may change then just replace the rest like this:
#Section(\d+),\w+,(?:\w+/)*Jack/M,\d+ (\d+\.\d{2})
I'm assuming that "Main" is a class, "First/HS/..." is a path and that the last value always has exactly 2 decimal places. The dot is escaped so that it matches only a literal decimal point.
\d - A digit: [0-9]
\w - A word character: [a-zA-Z_0-9]
+ - one or more times
* - zero or more times
{2} - exactly 2 times
() - a capturing group
(?:) - a non-capturing group
For reference see: https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/regex/Pattern.html
Simple Java example of how to get the values from the capturing groups using java.util.regex.Pattern and java.util.regex.Matcher:
import java.util.regex.*;

public class GetMatch {
    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        // The dot in the value is escaped so that it matches only a literal '.'
        Pattern p = Pattern.compile("#Section(\\d+),\\w+,(?:\\w+/)*Jack/M,\\d+ (\\d+\\.\\d{2})");
        // Split the sections into separate strings at each comma that precedes a '#'
        String[] tokens = s.split(",(?=#)");
        for (String t : tokens) {
            Matcher m = p.matcher(t);
            // If the section matches the pattern, print the capturing groups
            if (m.matches()) {
                System.out.printf("Section: %s, Value: %s\n", m.group(1), m.group(2));
            }
        }
    }
}

You could use 2 capture groups, and use a tempered greedy token approach to not cross #Section followed by a digit.
#Section(\d+)(?:(?!#Section\d).)*\bJack/M,\d+\h+(\d+(?:\.\d+)?)\b
Explanation
#Section(\d+) Match #Section and capture 1+ digits in group 1
(?:(?!#Section\d).)* Match any character, as long as #Section followed by a digit does not start at that position
\bJack/M, A word boundary, then match Jack/M,
\d+\h+ Match 1+ digits and 1+ horizontal whitespace characters
(\d+(?:\.\d+)?) Capture group 2, match 1+ digits and an optional decimal part
\b A word boundary
In Java:
String regex = "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b";
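
A minimal sketch of how this pattern might be applied to the whole input with a find() loop, with no need to split the string first. It assumes the input is a single line, as in the sample data; the class name JackSections and the "section=value" output format are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JackSections {
    private static final Pattern P = Pattern.compile(
            "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b");

    // Collects "section=value" for every section that contains Jack/M.
    static List<String> extract(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            out.add(m.group(1) + "=" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,"
                + "#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,"
                + "#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,"
                + "#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        System.out.println(extract(s)); // [250342=10.00, 251234=11.00]
    }
}
```

If the sections are separated by line breaks, the dot would stop at each newline unless Pattern.DOTALL is added; that case is not covered here.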

Regex not matching all numbers with delimiters

Need a single combined regex for the following pattern:
Prefix: 2221-2720 , Length: 16
Prefix: 51-55 , Length: 16
where the delimiter between digits can be a space ( ), minus sign (-), period (.), backslash (\), or equals sign (=). The condition is that at most one delimiter (of any type) may occur between any two digits.
Valid number - 230.293.217.952.148.4
Valid number - 230.293 217-952.148.4
Invalid number - 230..293.217.952.148.4
Invalid number - 230.293.-217. 952.148.4
A valid input is one where you have 16 digits separated by any/no delimiters as long as there are no two delimiters adjacent to each other.
I have come up with the following regex:
(2[\s=\\.-]*2[\s=\\.-]*2[\s=\\.-]*[1-9][\s=\\.-]*|2[\s=\\.-]*2[\s=\\.-]*[3-9][\s=\\.-]*[0-9][\s=\\.-]*|2[\s=\\.-]*[3-6][\s=\\.-]*[0-9](?:[\s=\\.-]*[0-9]){1}|2[\s=\\.-]*7[\s=\\.-]*[01][\s=\\.-]*[0-9][\s=\\.-]*|2[\s=\\.-]*7[\s=\\.-]*2[\s=\\.-]*0[\s=\\.-]*)[0-9](?:[\s=\\.-]*[0-9]){11}|(5[\s=\\.-]*[1-5][\s=\\.-]*)[0-9](?:[\s=\\.-]*[0-9]){13}
It does not match certain patterns. For example:
2 3 0 2 9 3 2 1 7 9 5 2 1 4 8 4
23-02-93-21-79-52-14-84
2 3 0 3 4 5 8 0 9 4 9 3 0 8 2 3
For the same numbers, it matches (as expected) the following patterns:
2302932179521484
230.293.217.952.148.4
2303458094930823
230.345.809.493.082.3
230-345-809-493-082-3
There seems to be an issue with delimiters. Kindly let me know what is wrong with my regex.

For this rule
A valid input is one where you have 16 digits separated by any/no
delimiters as long as there are no two delimiters adjacent to each
other
Prefix: 2221-2720 , Length: 16
Prefix: 51-55 , Length: 16
2221 can also be written as 2.2.-2.1
For these rules, it might be easier to write a pattern with 2 capture groups to match the whole string.
Then using some Java code, you can check the value of the capture groups for the ranges.
^((\d[ =\\.-]?\d)[ =\\.-]?\d[ =\\.-]?\d)(?:[ =\\.-]?\d){12}$
The pattern matches:
^ Start of string
( Capture group 1
(\d[ =\\.-]?\d) Capture group 2: match 2 digits with an optional char (space, =, \, . or -) between them
[ =\\.-]?\d[ =\\.-]?\d Match, 2 times, an optional listed char followed by a single digit
) Close group 1
(?:[ =\\.-]?\d){12} Repeat 12 times: an optional listed char followed by a single digit
$ End of string
For example
String[] strings = {
    "2221.7.952.148.412.32",
    "230.293.217.952.148.4",
    "5511111111111111",
    "130.293 217-952.148.4",
    "30..293.217.952.148.4",
    "5..5",
    ".5.5."
};
String regex = "^((\\d[ =\\\\.-]?\\d)[ =\\\\.-]?\\d[ =\\\\.-]?\\d)(?:[ =\\\\.-]?\\d){12}$";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
for (String s : strings) {
    Matcher matcher = pattern.matcher(s);
    if (matcher.find()) {
        int grp1 = Integer.parseInt(matcher.group(1).replaceAll("\\D+", ""));
        int grp2 = Integer.parseInt(matcher.group(2).replaceAll("\\D+", ""));
        if ((grp1 >= 2221 && grp1 <= 2720) || (grp2 >= 51 && grp2 <= 55)) {
            System.out.println("Match for " + matcher.group());
        }
    }
}
Output
Match for 2221.7.952.148.412.32
Match for 230.293.217.952.148.4
Match for 5511111111111111
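
The same idea can be wrapped in a reusable check and tried against the inputs from the question that the original regex missed. This is only a sketch under the rules stated above; the class name PrefixCheck and the method isValid are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixCheck {
    // 16 digits, with at most one delimiter (space = \ . -) between two digits.
    private static final Pattern P = Pattern.compile(
            "^((\\d[ =\\\\.-]?\\d)[ =\\\\.-]?\\d[ =\\\\.-]?\\d)(?:[ =\\\\.-]?\\d){12}$");

    static boolean isValid(String s) {
        Matcher m = P.matcher(s);
        if (!m.matches()) {
            return false;
        }
        // Strip delimiters from the captured prefixes before comparing ranges.
        int first4 = Integer.parseInt(m.group(1).replaceAll("\\D", ""));
        int first2 = Integer.parseInt(m.group(2).replaceAll("\\D", ""));
        return (first4 >= 2221 && first4 <= 2720) || (first2 >= 51 && first2 <= 55);
    }

    public static void main(String[] args) {
        System.out.println(isValid("2 3 0 2 9 3 2 1 7 9 5 2 1 4 8 4")); // true
        System.out.println(isValid("23-02-93-21-79-52-14-84"));         // true
        System.out.println(isValid("230..293.217.952.148.4"));          // false (two adjacent delimiters)
        System.out.println(isValid("130.293 217-952.148.4"));           // false (prefix out of range)
    }
}
```

Keeping the range comparison in Java rather than in the regex avoids spelling out every digit alternative for 2221 to 2720.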

Why does this regex fail to check accurately?

I have the following method, which checks a given string against regexes in 3 stages. For some reason, the regexes fail to reject some inputs, although as far as I know they look correct. Can someone please tell me what I am doing wrong?
I have the following code:
public class App {
    public static void main(String[] args) {
        String identifier = "urn:abc:de:xyz:234567.1890123";
        if (identifier.matches("^urn:abc:de:xyz:.*")) {
            System.out.println("Match ONE");
            if (identifier.matches("^urn:abc:de:xyz:[0-9]{6,12}.[0-9]{1,7}.*")) {
                System.out.println("Match TWO");
                if (identifier.matches("^urn:abc:de:xyz:[0-9]{6,12}.[a-zA-Z0-9.-_]{1,20}$")) {
                    System.out.println("Match Three");
                }
            }
        }
    }
}
Ideally, this code should generate the output
Match ONE
Match TWO
Match Three
Only when the identifier = "urn:abc:de:xyz:234567.1890123.abd12", but it produces the same output even if the identifier does not match the intended format, such as for the following inputs:
"urn:abc:de:xyz:234567.1890123"
"urn:abc:de:xyz:234567.1890ANC"
"urn:abc:de:xyz:234567.1890123"
"urn:abc:de:xyz:234567.1890ACB.123"
I do not understand why it allows alphanumeric characters after the first . and why it ignores the characters after the second one.
I would like my regex to check that the string has the following format:
1. The string starts with urn:abc:de:xyz:
2. Then it has 6 to 12 digits [0-9] (234567).
3. Then it has a decimal point (.).
4. Then it has 1 to 7 digits [0-9] (1890123).
5. Then it has another decimal point (.).
6. Finally, it has 1 to 20 alphanumeric or special characters (ABC123.-_12).
This is a valid string for my regex: urn:abc:de:xyz:234567.1890123.ABC123.-_12
This is an invalid string for my regex, as it is missing the elements from point 6:
urn:abc:de:xyz:234567.1890123
This is also an invalid string for my regex, as it violates point 4 (it has ABC instead of decimal digits):
urn:abc:de:xyz:234567.1890ABC.ABC123.-_12

This part of the regex:
[0-9]{6,12}.[0-9]{1,7} matches 6 to 12 digits followed by any character followed by 1 to 7 digits
To match a dot, it needs to be escaped. Try this:
^urn:abc:de:xyz:[0-9]{6,12}\.[0-9]{1,7}\.[a-zA-Z0-9\-_]{1,20}$
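
To see the effect of escaping, the sketch below runs the asker's third pattern and an escaped variant against the sample identifiers. The escaped variant here also allows a dot in the last section; that is an assumption based on the asker's valid example (ABC123.-_12), not part of the answer above. The names UrnCheck, LOOSE and STRICT are made up.

```java
class UrnCheck {
    // From the question: '.' is unescaped, and ".-_" inside the class is a character range.
    static final String LOOSE = "^urn:abc:de:xyz:[0-9]{6,12}.[a-zA-Z0-9.-_]{1,20}$";

    // Escaped dots; the dash is placed last in the class so it is literal.
    // Allowing '.' in the final section is an assumption (see above).
    static final String STRICT =
            "^urn:abc:de:xyz:[0-9]{6,12}\\.[0-9]{1,7}\\.[a-zA-Z0-9._-]{1,20}$";

    static boolean isValid(String id) {
        return id.matches(STRICT);
    }

    public static void main(String[] args) {
        String shortId = "urn:abc:de:xyz:234567.1890123";
        System.out.println(shortId.matches(LOOSE)); // true, although the third section is missing
        System.out.println(isValid(shortId));       // false
        System.out.println(isValid("urn:abc:de:xyz:234567.1890123.ABC123.-_12")); // true
        System.out.println(isValid("urn:abc:de:xyz:234567.1890ABC.ABC123.-_12")); // false
    }
}
```

The loose pattern accepts the short identifier because the unescaped dot matches the real dot and the final class then consumes the remaining digits.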

This will match one or more dot-separated groups of word characters or dashes at the end of the string, as in your examples:
^urn:abc:de:xyz:\d{6,12}\.\d{1,7}(?:\.[\w-]{1,20})+$

Mask mobile number in Java [duplicate]

I would like to mask the last 4 digits of the identity number (hkid)
A123456(7) -> A123***(*)
I can do this with the following:
hkid.replaceAll("\\d{3}\\(\\d\\)", "***(*)")
However, is there a regular expression that matches only the last 4 digits, so that each one can be replaced by "*"?
hkid.replaceAll(regex, "*")
Please help, thanks.
Jessie

Personally, I wouldn't do it with regular expressions:
char[] cs = hkid.toCharArray();
for (int i = cs.length - 1, d = 0; i >= 0 && d < 4; --i) {
    if (Character.isDigit(cs[i])) {
        cs[i] = '*';
        ++d;
    }
}
String masked = new String(cs);
This goes from the end of the string, looking for digit characters, which it replaces with a *. Once it's found 4 (or reaches the start of the string), it stops iterating, and builds a new string.

While I agree that a non-regex solution is probably the simplest and fastest, here's a regex to catch the last 4 digits regardless of whether there is a grouping or not: \d(?=(?:\D*\d){0,3}\D*$)
This expression is meant to match any digit that is followed by 0 to 3 digits before hitting the end of the input.
A short breakdown of the expression:
\d matches a single digit
\D matches a single non-digit
(?=...) is a positive look-ahead: its contents must match, but they are not consumed and are not part of the match
(?:...){0,3} is a non-capturing group with a quantity of 0 to 3 occurrences given.
$ matches the end of the input
So you could read the expression as follows: "match a single digit if it is followed by a sequence of 0 to 3 times any number of non-digits which are followed by a single digit and that sequence is followed by any number of non-digits and the end of the input" (sounds complicated, no?).
Some results when using input.replaceAll( "\\d(?=(?:\\D*\\d){0,3}\\D*$)", "*" ):
input = "A1234567" -> output = "A123****"
input = "A123456(7)" -> output = "A123***(*)"
input = "A12345(67)" -> output = "A123**(**)"
input = "A1(234567)" -> output = "A1(23****)"
input = "A1234B567" -> output = "A123*B***"
As you can see in the last example the expression will match digits only. If you want to match letters as well either replace \d and \D with \w and \W (note that \w matches underscores as well) or use custom character classes, e.g. [02468] and [^02468] to match even digits only.
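
As a small generalization of the expression above, the number of masked digits can be made a parameter by building the quantifier from it. This is a sketch only; the class MaskLastDigits, the method mask and its parameter count are invented for illustration, and count is assumed to be at least 1.

```java
class MaskLastDigits {
    // Replaces the last `count` digits of `input` with '*', skipping non-digits.
    // Assumes count >= 1; the quantifier {0,count-1} mirrors {0,3} for four digits.
    static String mask(String input, int count) {
        String regex = "\\d(?=(?:\\D*\\d){0," + (count - 1) + "}\\D*$)";
        return input.replaceAll(regex, "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("A123456(7)", 4)); // A123***(*)
        System.out.println(mask("A1234B567", 4));  // A123*B***
        System.out.println(mask("A123456(7)", 2)); // A12345*(*)
    }
}
```

With count set to 4 the generated expression is the same one used in the examples above.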

How to match just 1 or 2 chars with regex

I want a regex to match any word of 1 or 2 characters, for example: is, an, or, if, a.
I tried this:
int scount = 0;
String txt = "hello everyone this is just test aa ";
Pattern p2 = Pattern.compile("\\w{1,2}");
Matcher m2 = p2.matcher(txt);
while (m2.find()) {
    scount++;
}
but I got wrong matches.

You probably want to use word boundary anchors:
Pattern p2 = Pattern.compile("\\b\\w{1,2}\\b");
These anchors match at the start/end of alphanumeric "words", that is, in positions before a \w character if there is no \w character before that, or after a \w character if there is no \w character after that.
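
A minimal sketch of the anchored pattern applied to the text from the question, collecting the matched words rather than only counting them. The class name ShortWords and the method shortWords are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShortWords {
    private static final Pattern SHORT_WORD = Pattern.compile("\\b\\w{1,2}\\b");

    // Returns every whole word made of 1 or 2 word characters.
    static List<String> shortWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = SHORT_WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(shortWords("hello everyone this is just test aa ")); // [is, aa]
    }
}
```

Without the anchors, the same loop also picks up pairs of characters from inside longer words such as "hello".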

I think that you should be a bit more descriptive. Your current code leaves scount at 15, so it does match something, just not what you want.
If you want to count only words of 1 or 2 letters, excluding underscores and digits from the count, I think that you would be better off with negative lookarounds:
Pattern.compile("(?i)(?<![a-z])[a-z]{1,2}(?![a-z])");
With a string input of hello everyone this is just 1 test aa, you get the value of scount as 2 (is and aa) and not 3 (is, 1, aa) as you would have if you were looking for only 1 or 2 consecutive \w.
Also, with hello everyone this is just test aa_, you get a count of 1 with \w (is), but 2 (is, aa) with the lookarounds.
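
The counts quoted above can be reproduced with a short comparison of the two patterns. This is a sketch; the class CountCompare, the constants BOUNDARY and LETTERS, and the method count are names chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountCompare {
    // Word-boundary version: counts digits and underscores as word characters.
    static final String BOUNDARY = "\\b\\w{1,2}\\b";
    // Lookaround version: counts runs of 1 or 2 letters only, case-insensitively.
    static final String LETTERS = "(?i)(?<![a-z])[a-z]{1,2}(?![a-z])";

    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String withDigit = "hello everyone this is just 1 test aa";
        String withUnderscore = "hello everyone this is just test aa_";
        System.out.println(count(BOUNDARY, withDigit));      // 3 (is, 1, aa)
        System.out.println(count(LETTERS, withDigit));       // 2 (is, aa)
        System.out.println(count(BOUNDARY, withUnderscore)); // 1 (is)
        System.out.println(count(LETTERS, withUnderscore));  // 2 (is, aa)
    }
}
```

Which pattern is appropriate depends on whether digits and underscores should count as part of a word.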
