java regular expression to extract uuid within square brackets - java

I have a string inside brackets in the following format:
[space string space]
I want to extract the string if it is in UUID format.
Example: [ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]
How can I get d6a413f4-059c-11e8-ba89-0ed5f89f718b with a Java regular expression?

For your given example, you could use a lookaround to match what is between the [ and the ]:
(?<=\[ ).*?(?= \])
Explanation
(?<=\[ ) positive lookbehind to assert that what is before is [ and a space
.*? match any character zero or more times non greedy
(?= \]) positive lookahead to assert that what follows is ]
For example:
String regex = "(?<=\\[ ).*?(?= \\])";
String string = "[ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}

Using regex
\[ ([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}) ]
Why you don't want to do this
If you know that your string will definitely have the right format then you can just use substring to get the UUID
class Main {
    public static void main(String... args) {
        String s = "[ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]";
        System.out.println(s.substring(2, s.length() - 2));
    }
}
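This skips the regex engine entirely, so it should be faster than the regex option, but it performs no validation: any input is returned minus its first two and last two characters.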
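If the format is not fully guaranteed, the substring shortcut can be guarded with a few cheap checks. The following is only a sketch: the class name BracketUuid and the method extract are made up for this example, and java.util.UUID.fromString is used as a sanity check on the extracted text. Note that fromString may be more lenient than a strict 8-4-4-4-12 regex, so treat it as a rough filter rather than a full validation.

```java
import java.util.Optional;
import java.util.UUID;

public class BracketUuid {
    // Hypothetical helper: returns the text between "[ " and " ]" if it parses as a UUID.
    public static Optional<String> extract(String s) {
        // "[ " + 36 UUID characters + " ]" is exactly 40 characters long.
        if (s == null || s.length() != 40 || !s.startsWith("[ ") || !s.endsWith(" ]")) {
            return Optional.empty();
        }
        String candidate = s.substring(2, s.length() - 2);
        try {
            UUID.fromString(candidate); // throws IllegalArgumentException on malformed input
            return Optional.of(candidate);
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(extract("[ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]").orElse("no UUID"));
        System.out.println(extract("[ hello ]").orElse("no UUID"));
    }
}
```

The length check keeps the common failure cases away from the parser, so the exception path is only taken for inputs that already look roughly right.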

Regex to check if given String contains valid UUID:
"\\[ ([a-f0-9]{8}\\-(?:[a-f0-9]{4}\\-){3}[a-f0-9]{12}) \\]"
So, what is going on in this regex:
\\[ followed by a space - the character '[' and the space after it
[a-f0-9]{8} - a character from 'a' to 'f' or from '0' to '9', exactly eight times (the 123e5670 part)
\\- - the '-' character
(?:[a-f0-9]{4}\\-){3} - a non-capturing group that must be present exactly three times; each repetition is exactly 4 characters from 'a' to 'f' or from '0' to '9', followed by a '-' character (the a234-b234-c234- part)
[a-f0-9]{12} - a character from 'a' to 'f' or from '0' to '9', exactly twelve times (the d23456789012 part)
a space followed by \\] - the space and the ']' character
After searching the String for a match with the find() method, you print only capturing group #1 with the group(1) method (capturing group #1 is the part enclosed in parentheses).
Your UUID is in capture group 1. Here is a simple example of how you can get the UUID from the source String:
String source = "[ 123e5670-a234-b234-c234-d23456789012 ]";
Pattern p = Pattern.compile("\\[ ([a-f0-9]{8}\\-(?:[a-f0-9]{4}\\-){3}[a-f0-9]{12}) \\]");
Matcher m = p.matcher(source);
if (m.find()) {
    System.out.println(m.group(1));
}
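As a possible extension, the same pattern can be wrapped in a small helper that collects every bracketed UUID in a longer string. This is a sketch, not part of the answer above: the class name UuidExtractor and the method extractAll are invented for illustration, and the CASE_INSENSITIVE flag is an added assumption so that upper-case hex digits are accepted as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UuidExtractor {
    // Same shape as the pattern above; CASE_INSENSITIVE is an assumption added here.
    private static final Pattern BRACKETED_UUID = Pattern.compile(
            "\\[ ([a-f0-9]{8}-(?:[a-f0-9]{4}-){3}[a-f0-9]{12}) \\]",
            Pattern.CASE_INSENSITIVE);

    // Collects capture group 1 of every match found in the input.
    public static List<String> extractAll(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = BRACKETED_UUID.matcher(source);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "a [ 123e5670-a234-b234-c234-d23456789012 ] b [ not-a-uuid ] "
                + "c [ D6A413F4-059C-11E8-BA89-0ED5F89F718B ]";
        // prints [123e5670-a234-b234-c234-d23456789012, D6A413F4-059C-11E8-BA89-0ED5F89F718B]
        System.out.println(extractAll(source));
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call.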

Related

Use regex to get 2 specific groups of substring

String s = #Section250342,Main,First/HS/12345/Jack/M,2000 10.00,
#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,
#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,
#Section251234,Main,First/HS/12345/Jack/M,2000 11.00
Wherever /Jack/M appears in the string, I want to pull the section numbers (250342, 251234) and the values (10.00, 11.00) associated with it, using a regex.
I tried something like this https://regex101.com/r/4te0Lg/1 but it still doesn't work:
.Section(\d+(?:\.\d+)?).*/Jack/M
If the only parts of each section that change are the section number, the name of the person and the last value (like in your example) then you can make a pattern very easily by using one of the sections where Jack appears and replacing the numbers you want by capturing groups.
Example:
#Section250342,Main,First/HS/12345/Jack/M,2000 10.00
becomes,
#Section(\d+),Main,First/HS/12345/Jack/M,2000 (\d+\.\d{2})
If the section substring keeps the format but the other parts of it may change then just replace the rest like this:
#Section(\d+),\w+,(?:\w+/)*Jack/M,\d+ (\d+\.\d{2})
I'm assuming that "Main" is a class, "First/HS/..." is a path and that the last value always has 2 and only 2 decimal places.
\d - A digit: [0-9]
\w - A word character: [a-zA-Z_0-9]
+ - one or more times
* - zero or more times
{2} - exactly 2 times
() - a capturing group
(?:) - a non-capturing group
For reference see: https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/regex/Pattern.html
Simple Java example on how to get the values from the capturing groups using java.util.regex.Pattern and java.util.regex.Matcher
import java.util.regex.*;

public class GetMatch {
    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        // the dot in the value is escaped so that it only matches a literal '.'
        Pattern p = Pattern.compile("#Section(\\d+),\\w+,(?:\\w+/)*Jack/M,\\d+ (\\d+\\.\\d{2})");
        String[] tokens = s.split(",(?=#)"); // split the sections into separate strings
        for (String t : tokens) { // check every string produced by the split
            Matcher m = p.matcher(t);
            if (m.matches()) { // if the whole string matches the pattern, print the capturing groups
                System.out.printf("Section: %s, Value: %s\n", m.group(1), m.group(2));
            }
        }
    }
}
You could use 2 capture groups, and use a tempered greedy token approach to not cross #Section followed by a digit.
#Section(\d+)(?:(?!#Section\d).)*\bJack/M,\d+\h+(\d+(?:\.\d+)?)\b
Explanation
#Section(\d+) Match #Section and capture 1+ digits in group 1
(?:(?!#Section\d).)* Match any character if not directly followed by #Section and a digit
\bJack/M, Match the word Jack and /M,
\d+\h+ Match 1+ digits and 1+ spaces
(\d+(?:\.\d+)?) Capture group 2, match 1+ digits and an optional decimal part
\b A word boundary
In Java:
String regex = "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b";
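A minimal sketch of how this pattern could be applied to the whole string from the question with Matcher.find(), without splitting first. The class name JackSections and the method findJack are made up for illustration; the input is the sample data from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JackSections {
    private static final Pattern JACK = Pattern.compile(
            "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b");

    // Returns one "section -> value" entry per section that contains Jack/M.
    public static List<String> findJack(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = JACK.matcher(s);
        while (m.find()) {
            // group 1 is the section number, group 2 is the value
            result.add(m.group(1) + " -> " + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,"
                + "#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,"
                + "#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,"
                + "#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        // prints [250342 -> 10.00, 251234 -> 11.00]
        System.out.println(findJack(s));
    }
}
```

Because the tempered token stops in front of the next #Section, a section without Jack/M cannot borrow the value of a later one.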

How to divide a string with a regex through characters that are outside square brackets?

I have this string:
PARTNER6;PARTNER7[PORTAL4;PORTAL5];PARTNER1[PARTNER1WEB] -> ∞
I want to divide it like this:
PARTNER6
PARTNER7[PORTAL4;PORTAL5]
PARTNER1[PARTNER1WEB]
I tried to use this expression, but it splits everything, including what is inside the brackets:
[\s,;]+
I can't figure out how to divide only what is outside the brackets
You may use this regex to get all of your matches:
\w+(?:\[[^]]*\]\w*)*\w*
RegEx Details:
\w+: Match 1+ word characters
(?:\[[^]]*\]\w*)*: Match [...] string followed by 0 or more word characters. Repeat this group 0 or more times
\w*: Match 0 or more word characters
Code:
jshell> String regex = "\\w+(?:\\[[^]]*\\]\\w*)*\\w*";
regex ==> "\\w+(?:\\[[^]]*\\]\\w*)*\\w*"
jshell> String string = "PARTNER6;PARTNER7[PORTAL4;PORTAL5];PARTNER1[PARTNER1WEB] -> ∞";
string ==> "PARTNER6;PARTNER7[PORTAL4;PORTAL5];PARTNER1[PARTNER1WEB] -> ∞"
jshell> Pattern.compile(regex).matcher(string).results().map(MatchResult::group).collect(Collectors.toList());
$3 ==> [PARTNER6, PARTNER7[PORTAL4;PORTAL5], PARTNER1[PARTNER1WEB]]
I would use a regex find all approach here:
String input = "PARTNER6;PARTNER7[PORTAL4;PORTAL5];PARTNER1[PARTNER1WEB]";
String pattern = "(\\w+(?:\\[.*?\\])?);?";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(input);
while (m.find()) {
    System.out.println(m.group(1));
}
This prints:
PARTNER6
PARTNER7[PORTAL4;PORTAL5]
PARTNER1[PARTNER1WEB]
The regex pattern used above matches a word with \w+, followed by an optional term in square brackets, followed by an optional semicolon.
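Since the question asks about dividing the string, a split-based variant may also be worth a look. This is a different technique from the two answers above: it splits on a semicolon only when the next bracket character ahead is not a closing ]. The sketch assumes brackets are balanced and never nested; the class name SplitOutsideBrackets and the method split are invented for this example.

```java
import java.util.Arrays;
import java.util.List;

public class SplitOutsideBrackets {
    // ';' followed by a negative lookahead: fail if a ']' comes before any other bracket,
    // which would mean the semicolon sits inside [...].
    private static final String SEMICOLON_OUTSIDE_BRACKETS = ";(?![^\\[\\]]*\\])";

    public static List<String> split(String input) {
        return Arrays.asList(input.split(SEMICOLON_OUTSIDE_BRACKETS));
    }

    public static void main(String[] args) {
        String input = "PARTNER6;PARTNER7[PORTAL4;PORTAL5];PARTNER1[PARTNER1WEB]";
        // prints [PARTNER6, PARTNER7[PORTAL4;PORTAL5], PARTNER1[PARTNER1WEB]]
        System.out.println(split(input));
    }
}
```

Unlike the find-all approaches, this keeps everything between the separators, so trailing text such as " -> ∞" would stay attached to the last element.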

How to write a regex capture group which matches a character 3 or 4 times before a delimiter?

I'm trying to write a regex that splits elements out according to a delimiter. The regex also needs to ensure there are ideally 4, but at least 3 colons : in each match.
Here's an example string:
"Checkers, etc:Blue::C, Backgammon, I say:Green::Pepsi:P, Chess, misc:White:Coke:Florida:A, :::U"
From this, there should be 4 matches:
Checkers, etc:Blue::C
Backgammon, I say:Green::Pepsi:P
Chess, misc:White:Coke:Florida:A
:::U
Here's what I've tried so far:
([^:]*:[^:]*){3,4}(?:, )
Regex 101 at: https://regex101.com/r/O8iacP/8
I tried setting up a non-capturing group for ,
Then I tried matching a group of any character that's not a :, a :, and any character that's not a : 3 or 4 times.
The code I'm using to iterate over these groups is:
String line = "Checkers, etc:Blue::C, Backgammon, I say::Pepsi:P, Chess:White:Coke:Florida:A, :::U";
String pattern = "([^:]*:[^:]*){3,4}(?:, )";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher matcher = r.matcher(line);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
Any help is appreciated!
Edit
Using @Casimir's regex, it's working. I had to change the above code to use group(0), like this:
String line = "Checkers, etc:Blue::C, Backgammon, I say::Pepsi:P, Chess:White:Coke:Florida:A, :::U";
String pattern = "(?![\\s,])(?:[^:]*:){3}\\S*(?![^,])";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher matcher = r.matcher(line);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}
Now prints:
Checkers, etc:Blue::C
Backgammon, I say::Pepsi:P
Chess:White:Coke:Florida:A
:::U
Thanks again!
I suggest this pattern:
(?![\\s,])(?:[^:]*:){3}\\S*(?![^,])
The negative lookaheads prevent the match from starting or ending on a delimiter. The second one in particular forces the match to be followed by the delimiter or the end of the string (that is, not followed by a character that isn't a comma).
Note that the pattern doesn't have capture groups, so the result is the whole match (or group 0).
You might use
(?:[^,:]+, )?[^:,]*(?::+[^:,]+)+
(?:[^,:]+, )? Optionally match 1+ any char except a , or : followed by , and space
[^:,]* Match 0+ any char except : or ,
(?: Non Capturing group
:+[^:,]+ Match 1+ : and 1+ times any char except : and ,
)+ Close group and repeat 1+ times
You seem to be making it harder than it needs to be with the lookahead (which won't be satisfied at end-of-line anyway).
([^:]*:){3}[^:,]*:?[^:,]*
Find the first 3 :'s, then start including , in the negative groupings, with an optional 4th :.

regex find string between 2 characters, separated by comma

I am new to regular expressions and I want to find a string between two characters.
I tried the code below, but it always returns false. What is wrong with it?
public static void main(String[] args) {
    String input = "myFunction(hello ,world, test)";
    String patternString = "\\(([^]]+)\\)";
    Pattern pattern = Pattern.compile(patternString);
    Matcher matcher = pattern.matcher(input);
    while (matcher.find()) {
        System.out.println(matcher.group());
    }
}
Input:
myFunction(hello,world,test) where myFunction can be any characters. before starting ( there can be any characters.
Output:
hello
world
test
You could make use of the \G anchor, which asserts the position at the end of the previous match, and capture your values in a group:
(?:\bmyFunction\(|\G(?!^))([^,]+)(?:\h*,\h*)?(?=[^)]*\))
In Java:
String regex = "(?:\\bmyFunction\\(|\\G(?!^))([^,]+)(?:\\h*,\\h*)?(?=[^)]*\\))";
Explanation
(?: Non capturing group
\bmyFunction\( Word boundary to prevent the match being part of a larger word, match myFunction and an opening parenthesis (
| Or
\G(?!^) Assert position at the end of previous match, not at the start of the string
) Close non capturing group
([^,]+) Capture in a group matching 1+ times not a comma
(?:\h*,\h*)? Optionally match a comma surrounded by 0+ horizontal whitespace chars
(?=[^)]*\)) Positive lookahead, assert what is on the right is a closing parenthesis )
For example:
String patternString = "(?:\\bmyFunction\\(|\\G(?!^))([^,]+)(?:\\h*,\\h*)?(?=[^)]*\\))";
String input = "myFunction(hello ,world, test)";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
Result
hello
world
test
I'd suggest doing this in two steps:
Step 1: Capture all the content between ( and )
Use the regex: ^\S+\((.*)\)$
The first and the only capturing group will contain the required text.
Step 2: Split the captured string above on ,, thus yielding all the comma-separated parameters independently.
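The two steps could look roughly like this in Java. This is a sketch under a few assumptions: the class name ArgumentSplitter and the method arguments are invented for the example, and the whitespace trimming around each parameter is an addition that the steps above do not mention.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArgumentSplitter {
    // Step 1: everything between the first '(' after the name and the final ')'.
    private static final Pattern CALL = Pattern.compile("^\\S+\\((.*)\\)$");

    public static List<String> arguments(String input) {
        Matcher m = CALL.matcher(input);
        if (!m.find()) {
            return Collections.emptyList();
        }
        String inside = m.group(1).trim();
        if (inside.isEmpty()) {
            return Collections.emptyList(); // e.g. "f()" has no parameters
        }
        // Step 2: split on commas, swallowing the whitespace around them.
        return Arrays.asList(inside.split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        // prints [hello, world, test]
        System.out.println(arguments("myFunction(hello ,world, test)"));
    }
}
```

Keeping the two steps separate makes each regex easy to read, at the cost of scanning the captured text a second time.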
This pattern may give you an idea:
([\w]+),([\w]+),([\w]+)
DEMO: https://rubular.com/r/9HDIwBTacxTy2O

Regular expression java to extract the balance from a string

I have a String which contains " Dear user BAL= 1,234/ ".
I want to extract 1,234 from the String using the regular expression. It can be 1,23, 1,2345, 5,213 or 500
final Pattern p = Pattern.compile("((BAL)=*(\\s{1}\\w+))");
final Matcher m = p.matcher(text);
if (m.find())
    return m.group(3);
else
    return "";
This returns 3.
What regular expression should I make? I am new to regular expressions.
You search in your regex for word characters \w+ but you should search for digits with \d+.
Additionally there is the comma, so you need to match that as well.
I'd use
/.*BAL=\s([\d,]+(?=/)).*/
as pattern and get only the number in the resulting group.
Explanation:
.* match anything before
BAL= match the string "BAL="
\s match a whitespace
( start matching group
[\d,]+ matches every digit or comma one or more times
(?=/) match the former only if followed by a slash
) end matching group
.* matches anything thereafter
This is untested, but it should work like this:
final Pattern p = Pattern.compile(".*BAL=\\s([\\d,]+(?=/)).*");
final Matcher m = p.matcher(text);
if (m.find())
    return m.group(1);
else
    return "";
According to an online tester, the pattern above matches the text:
BAL= 1,234/
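If it didn't have to be extracted with a regular expression, you could simply do:
// trim first (the text starts with a space), then split on whitespace
// into a 4-element array: "Dear", "user", "BAL=", "1,234/"
String[] foo = text.trim().split("\\s+");
// the last token still carries the trailing slash, so remove it
return foo[3].replace("/", "");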
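For completeness, here is a self-contained sketch of the regex route. It uses a slightly simpler pattern than the one above (no leading or trailing .* and no lookahead for the slash), which is a variation rather than the answer's exact pattern; the class name BalanceExtractor and the method extract are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BalanceExtractor {
    // "BAL=", optional whitespace, then a run of digits and commas captured in group 1.
    private static final Pattern BALANCE = Pattern.compile("BAL=\\s*([\\d,]+)");

    // Returns the balance text, or an empty string if no balance is found.
    public static String extract(String text) {
        Matcher m = BALANCE.matcher(text);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(extract(" Dear user BAL= 1,234/ ")); // prints 1,234
        System.out.println(extract(" Dear user BAL= 500/ "));   // prints 500
    }
}
```

Since find() scans for the pattern anywhere in the text, the surrounding .* parts are not needed.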
