Use regex to get 2 specific groups of substring - java

String s = #Section250342,Main,First/HS/12345/Jack/M,2000 10.00,
#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,
#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,
#Section251234,Main,First/HS/12345/Jack/M,2000 11.00
Wherever /Jack/M appears in the string, I want to pull the section numbers (250342, 251234) and the values (10.00, 11.00) associated with it using regex.
I tried something like this https://regex101.com/r/4te0Lg/1 but it still does not work:
.Section(\d+(?:\.\d+)?).*/Jack/M

If the only parts of each section that change are the section number, the name of the person and the last value (like in your example) then you can make a pattern very easily by using one of the sections where Jack appears and replacing the numbers you want by capturing groups.
Example:
#Section250342,Main,First/HS/12345/Jack/M,2000 10.00
becomes,
#Section(\d+),Main,First/HS/12345/Jack/M,2000 (\d+\.\d{2})
If the section substring keeps the format but the other parts of it may change then just replace the rest like this:
#Section(\d+),\w+,(?:\w+/)*Jack/M,\d+ (\d+\.\d{2})
I'm assuming that "Main" is a class, "First/HS/..." is a path and that the last value always has 2 and only 2 decimal places.
\d - A digit: [0-9]
\w - A word character: [a-zA-Z_0-9]
+ - one or more times
* - zero or more times
{2} - exactly 2 times
() - a capturing group
(?:) - a non-capturing group
For reference see: https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/regex/Pattern.html
Simple Java example on how to get the values from the capturing groups using java.util.regex.Pattern and java.util.regex.Matcher
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class GetMatch {
    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        // The dot before the decimals is escaped so that it matches only a literal '.'
        Pattern p = Pattern.compile("#Section(\\d+),\\w+,(?:\\w+/)*Jack/M,\\d+ (\\d+\\.\\d{2})");
        String[] tokens = s.split(",(?=#)"); // split the sections into separate strings
        for (String t : tokens) { // check every string produced by the split
            Matcher m = p.matcher(t);
            if (m.matches()) { // the whole section matches: print both capturing groups
                System.out.printf("Section: %s, Value: %s%n", m.group(1), m.group(2));
            }
        }
    }
}

You could use 2 capture groups, and use a tempered greedy token approach to not cross #Section followed by a digit.
#Section(\d+)(?:(?!#Section\d).)*\bJack/M,\d+\h+(\d+(?:\.\d+)?)\b
Explanation
#Section(\d+) Match #Section and capture 1+ digits in group 1
(?:(?!#Section\d).)* Match any character, as long as #Section followed by a digit does not start at that position
\bJack/M, Match the word Jack and /M,
\d+\h+ Match 1+ digits and 1+ spaces
(\d+(?:\.\d+)?) Capture group 2, match 1+ digits and an optional decimal part
\b A word boundary
Regex demo
In Java:
String regex = "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b";
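A minimal sketch of how this pattern might be used with Matcher.find(), assuming the input is the single-line string from the question. The class name TemperedTokenDemo and the method extract are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemperedTokenDemo {
    // Same pattern as above, written as a Java string literal.
    private static final Pattern JACK = Pattern.compile(
        "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b");

    // Returns "section=value" for every section that contains Jack/M.
    public static List<String> extract(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = JACK.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + "=" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,"
                 + "#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,"
                 + "#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,"
                 + "#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        System.out.println(extract(s)); // prints [250342=10.00, 251234=11.00]
    }
}
```

Because find() scans the whole input, no split step is needed here; the tempered token keeps each match inside a single section.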

Related

Regex to mask multiple phone numbers (~) separated except last 4 digits

I am trying to find a regex which masks phone numbers except last 4 digits.
example: phone=9988998888~7654321908~6789054321
Desired output : phone=******8888~******1908~******4321
I tried the regex below, but it masks only the first number:
phone=******8888~7654321908~6789054321
^(phone)=(\d(?=\d{4}))*
Use Matcher.replaceAll(Function<MatchResult,String> replacer) (available since Java 9) to replace each digit in the match with "*".
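import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class PhoneNumberMask {
    public static void main(String[] args) {
        String target = "phone=9988998888~7654321908~6789054321";
        // Group 1: a run of digits that is still followed by four more digits
        Pattern pattern = Pattern.compile("(\\d+(?=\\d{4}))");
        Matcher matcher = pattern.matcher(target);
        // Replace every digit of each matched run with '*'
        String result = matcher.replaceAll((matchResult) -> matchResult.group(1).replaceAll("\\d", "*"));
        System.out.println(result);
    }
}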
You could use:
\d(?=\d{4})
See this online demo
\d - Any single digit.
(?=\d{4}) - Positive lookahead for 4 digits.
Replace with *.
See a Java demo
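As a rough sketch, the replacement could look like this in Java with String.replaceAll; the class and method names are placeholders chosen for this example.

```java
public class MaskDigits {
    // Replaces every digit that is followed by at least four more digits.
    public static String mask(String text) {
        return text.replaceAll("\\d(?=\\d{4})", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("phone=9988998888~7654321908~6789054321"));
        // prints phone=******8888~******1908~******4321
    }
}
```

Note that this masks digits anywhere in the input, not only after phone=, so it fits best when the string contains nothing but the phone field.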
Assuming you only want to mask all numbers in a string that starts with phone= separated with ~, you can use a plain regex solution without a lambda in the replacement with
String masked = text.replaceAll("(\\G(?!^)(?:\\d{4}~)?|^phone=)\\d(?=\\d{4})", "$1*");
See the regex demo. Details:
(\G(?!^)(?:\d{4}~)?|^phone=) - Group 1: end of the previous successful match and then an optional sequence of four digits and a ~ or start of string and phone=
\d - a digit
(?=\d{4}) - followed with any four digits.
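A small sketch that wraps the \G-based replacement in a method, to show that it only touches strings that start with phone=. The class name MaskPhoneField and the second sample input are assumptions made for this example.

```java
public class MaskPhoneField {
    // Masks digits only in a string that starts with "phone=".
    // \G continues from the end of the previous match, so masking cannot
    // start anywhere other than right after the "phone=" prefix.
    public static String mask(String text) {
        return text.replaceAll("(\\G(?!^)(?:\\d{4}~)?|^phone=)\\d(?=\\d{4})", "$1*");
    }

    public static void main(String[] args) {
        System.out.println(mask("phone=9988998888~7654321908~6789054321"));
        // prints phone=******8888~******1908~******4321
        System.out.println(mask("id=9988998888"));
        // prints id=9988998888 (unchanged, no phone= prefix)
    }
}
```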

How to write a regex capture group which matches a character 3 or 4 times before a delimiter?

I'm trying to write a regex that splits elements out according to a delimiter. The regex also needs to ensure there are ideally 4, but at least 3 colons : in each match.
Here's an example string:
"Checkers, etc:Blue::C, Backgammon, I say:Green::Pepsi:P, Chess, misc:White:Coke:Florida:A, :::U"
From this, there should be 4 matches:
Checkers, etc:Blue::C
Backgammon, I say:Green::Pepsi:P
Chess, misc:White:Coke:Florida:A
:::U
Here's what I've tried so far:
([^:]*:[^:]*){3,4}(?:, )
Regex 101 at: https://regex101.com/r/O8iacP/8
I tried setting up a non-capturing group for ,
Then I tried matching a group of any character that's not a :, a :, and any character that's not a : 3 or 4 times.
The code I'm using to iterate over these groups is:
String line = "Checkers, etc:Blue::C, Backgammon, I say::Pepsi:P, Chess:White:Coke:Florida:A, :::U";
String pattern = "([^:]*:[^:]*){3,4}(?:, )";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher matcher = r.matcher(line);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Any help is appreciated!
Edit
Using @Casimir's regex, it's working. I had to change the above code to use group(0) like this:
String line = "Checkers, etc:Blue::C, Backgammon, I say::Pepsi:P, Chess:White:Coke:Florida:A, :::U";
String pattern = "(?![\\s,])(?:[^:]*:){3}\\S*(?![^,])";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher matcher = r.matcher(line);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Now prints:
Checkers, etc:Blue::C
Backgammon, I say::Pepsi:P
Chess:White:Coke:Florida:A
:::U
Thanks again!
I suggest this pattern:
(?![\\s,])(?:[^:]*:){3}\\S*(?![^,])
The negative lookaheads prevent the match from starting or ending on a delimiter. The second one in particular forces the match to be followed by the delimiter or the end of the string (not followed by a character that isn't a comma).
demo
Note that the pattern doesn't have capture groups, so the result is the whole match (or group 0).
You might use
(?:[^,:]+, )?[^:,]*(?::+[^:,]+)+
(?:[^,:]+, )? Optionally match 1+ any char except a , or : followed by , and space
[^:,]* Match 0+ any char except : or ,
(?: Non Capturing group
:+[^:,]+ Match 1+ : and 1+ times any char except : and ,
)+ Close group and repeat 1+ times
Regex demo
You seem to be making it harder than it needs to be with the lookahead (which won't be satisfied at end-of-line anyway).
([^:]*:){3}[^:,]*:?[^:,]*
Find the first 3 :'s, then start including , in the negative groupings, with an optional 4th :.

Extracting words with - included upper lowercase not working for words it only extracts chars

I'm trying to extract several words from a string with a regex Matcher and Pattern. I spent some time building the regular expression below, but it doesn't work as expected: I'm able to extract the individual characters of the words I want, but not the entire words. Any help would be much appreciated.
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main (String[] args){
String mebo = "1323 99BIMCP 1 2 BMWQ-CSPS-D1, 0192, '29229'";
Pattern pattern = Pattern.compile("[((a-zA-Z1-9-0)/W)]");
Matcher matcher = pattern.matcher(mebo);
while (matcher.find()) {
System.out.printf("Word is %s %n",matcher.group(0));
}
}
}
This is current output:
Word is 1 Word is 3 Word is 2 Word is 3 Word is 9 Word is 9 Word
is B Word is I Word is M Word is C Word is P Word is 1 Word is 2
Word is B Word is M Word is W Word is Q Word is - Word is C Word
is S Word is P Word is S Word is - Word is D Word is 1 Word is 0
Word is 1 Word is 9 Word is 2 Word is 2 Word is 9 Word is 2 Word
is 2 Word is 9
============
My expectation is to iterate entire words for example:
String mebo = "1323 99BIMCP 1 2 BMWQ-CSPS-D1, 0192, '29229'"
word is 1323 word is 99BIMCP word is 1 word is 2 word is BMWQ-CSPS-D1
word is 0192 word is 29229
Judging from your regex, you want to include letters, digits and - in each match, so you can use:
`[\w-]+`
[\w-]+ - Matches one or more word characters (a-z, A-Z, 0-9, _) or hyphens.
Demo
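For illustration, here is one way the pattern might be plugged into the loop from the question; WordExtractor and words are names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordExtractor {
    private static final Pattern WORD = Pattern.compile("[\\w-]+");

    // Collects every run of word characters and hyphens.
    public static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String mebo = "1323 99BIMCP 1 2 BMWQ-CSPS-D1, 0192, '29229'";
        for (String word : words(mebo)) {
            System.out.printf("Word is %s %n", word);
        }
    }
}
```

The key difference from the original attempt is the + quantifier after the character class, which makes each match span a whole word instead of a single character.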
The easiest solution here seems to be to ditch regex overall and just split the string instead. You want to allow digits, alphabetic characters and - in your words. Consider the following code:
for (String word : mebo.split("[^\\d\\w-]+")) {
System.out.printf("Word is %s %n", word);
}
This should exhibit the desired behaviour. Note that this will generate some empty strings, unless you have the + in the splitting pattern.
What this does is splitting the input string between everything that does not match your desired characters. This is accomplished through using an inverted character class.
I would suggest a regex split, followed by a regex replacement:
String mebo = "1323 99BIMCP 1 2 BMWQ-CSPS-D1, 0192, '29229'";
String[] parts = mebo.split("\\s*,?\\s+");
for (String part : parts) {
System.out.println(part.replaceAll("[']", ""));
}
1323
99BIMCP
1
2
BMWQ-CSPS-D1
0192
29229
The logic here is to split on whitespace, possibly including a comma separator. Then, we can do a regex replacement cleanup to remove stray characters such as single quotes. Double quotes and any other unwanted characters can easily be added to the character class used for replacement.
In general, regex alone may not suffice here, and you may need a parser to cover every edge case. Case in point, consider the following input line:
One, "Two or more", Three
My answer fails here, because it blindly splits on whitespace and does not know that whitespace inside a quoted field is not a separator. A simple regex split would also fail here.

java regular expression to extract uuid within square brackets

I have string inside brackets like following format:
[space string space]
I want to extract the string if the string is in UUID format.
example : [ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]
With java regular expression how can I get d6a413f4-059c-11e8-ba89-0ed5f89f718b ?
For your given example, you could use a lookaround to match what is between the [ and the ]:
(?<=\[ ).*?(?= \])
Explanation
(?<=\[ ) positive lookbehind to assert that what comes before is [ and a space
.*? match any character zero or more times non greedy
(?= \]) positive lookahead to assert that what follows is ]
For example:
String regex = "(?<=\\[ ).*?(?= \\])";
String string = "[ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Java example output
Using regex
\[ ([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}) ]
Regex101
Why you don't want to do this
If you know that your string will definitely have the right format then you can just use substring to get the UUID
class Main {
public static void main(String... args) {
String s = "[ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]";
System.out.println(s.substring(2, s.length()-2));
}
}
Try it online!
This will be faster than using the regex option.
Regex to check if given String contains valid UUID:
"\\[ ([a-f0-9]{8}\\-(?:[a-f0-9]{4}\\-){3}[a-f0-9]{12}) \\]"
So, what is going on in this regex:
\\[ - character ‘[‘ and whitespace after it
[a-f0-9]{8} – characters from ‘a’ to ‘f’ and from ‘0’ to ‘9’ exactly eight times (123e5670 part)
\\- - ‘-‘ character
(?:[a-f0-9]{4}\\-){3} – non-capturing group that you want to be present exactly three times (this non-capturing group should contain exactly 4 characters that are in the range from ‘a’ to ‘f’ or from ‘0’ to ‘9’. After these 4 characters there must be present ‘-‘ character) (a234-b234-c234- part)
[a-f0-9]{12} - characters from ‘a’ to ‘f’ and from ‘0’ to ‘9’ exactly twelve times (d23456789012 part)
 \\] – a space followed by the ‘]’ character
After searching String for match with find() method, you only print capturing group #1 with group(1) method ( capturing group #1 is contained in parenthesis () )
Your UUID is in capture group 1. Here is a simple example how you can get UUID from source String:
String source = "[ 123e5670-a234-b234-c234-d23456789012 ]";
Pattern p = Pattern.compile("\\[ ([a-f0-9]{8}\\-(?:[a-f0-9]{4}\\-){3}[a-f0-9]{12}) \\]");
Matcher m = p.matcher(source);
if(m.find()) {
System.out.println( m.group(1));
}

What regex should I use to check a string only has numbers and 2 special characters ( - and , ) in Java?

Scenario: I want to check whether string contains only numbers and 2 predefined special characters, a dash and a comma.
My string contains numbers (0 to 9) and 2 special characters: a dash (-) defines a range and a comma (,) defines a sequence.
What I tried:
I tried the regex [0-9+-,]+, but it does not work as expected.
Possible inputs :
1-5
1,5
1-5,6
1,3,5-10
1-5,6-10
1,3,5-7,8,10
The regex should not accept these types of strings:
-----
1--4
,1,5
5,6,
5,4,-
5,6-
-5,6
Please can any one help me to create regex for above scenario?
You may use
^\d+(?:-\d+)?(?:,\d+(?:-\d+)?)*$
See the regex demo
Regex details:
^ - start of string
\d+ - 1 or more digits
(?:-\d+)? - an optional sequence of - and 1+ digits
(?:,\d+(?:-\d+)?)* - zero or more sequences of:
, - a comma
\d+(?:-\d+)? - same pattern as described above
$ - end of string.
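A short sketch of how the pattern could be checked against the inputs listed in the question; the class name RangeListValidator and the method isValid are hypothetical.

```java
import java.util.regex.Pattern;

public class RangeListValidator {
    // Numbers or number ranges (a-b), separated by single commas.
    private static final Pattern RANGE_LIST =
        Pattern.compile("^\\d+(?:-\\d+)?(?:,\\d+(?:-\\d+)?)*$");

    public static boolean isValid(String input) {
        return RANGE_LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"1-5", "1,3,5-7,8,10", "-----", "1--4", ",1,5", "5,6,", "5,6-"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValid(sample));
        }
    }
}
```

With matches() the anchors are redundant, since the whole input must match anyway; they are kept here only to mirror the pattern as written above.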
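Change your regex [0-9+-,]+ to [0-9,-]+. In the original class, +-, is read as a range from + to , so the dash itself is never matched. Note that this character class only restricts which characters may appear; it does not reject inputs such as ----- or 5,6,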
final String patternStr = "[0-9,-]+";
final Pattern p = Pattern.compile(patternStr);
String data = "1,3,5-7,8,10";
final Matcher m = p.matcher(data);
if (m.matches()) {
System.out.println("SUCCESS");
}else{
System.out.println("ERROR");
}
