Java regex repeating capture groups


Considering the following string: "${test.one}${test.two}" I would like my regex to return two matches, namely "test.one" and "test.two". To do that I have the following snippet:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTester {
    private static final Pattern pattern = Pattern.compile("\\$\\{((?:(?:[A-z]+(?:\\.[A-z0-9()\\[\\]\"]+)*)+|(?:\"[\\w/?.&=_\\-]*\")+)+)}+$");
    public static void main(String[] args) {
        String testString = "${test.one}${test.two}";
        Matcher matcher = pattern.matcher(testString);
        while (matcher.find()) {
            for (int i = 0; i <= matcher.groupCount(); i++) {
                System.out.println(matcher.group(i));
            }
        }
    }
}
I have some other stuff in there as well, because I want this to also be a valid match ${test.one}${"hello"}.
So, basically, I just want it to match on anything inside of ${} as long as it either follows the format: something.somethingelse (alphanumeric only there) or something.somethingElse() or "something inside of quotations" (alphanumeric plus some other characters). I have the main regex working, or so I think, but when I run the code, it finds two groups,
${test.two}
test.two
I want the output to be
test.one
test.two

Basically, the main problem with your regex is that it matches only at the end of the string (because of the trailing $), and that [A-z] matches many more chars than just letters. Your grouping also seems off.
If you load your regex at regex101, you will see it matches
\$\{
( - start of a capturing group
(?: - start of a non-capturing group
(?:[A-z]+ - start of a non-capturing group, and it matches 1+ chars between A and z (your first mistake)
(?:\.[A-z0-9()\[\]\"]+)* - 0 or more repetitions of a . and then 1+ letters, digits, (, ), [, ], ", \, ^, _, and a backtick
)+ - repeat the non-capturing group 1 or more times
| - or
(?:\"[\w/?.&=_\-]*\")+ - 1 or more occurrences of ", 0 or more word, /, ?, ., &, =, _, - chars and then a "
)+ - repeat the group pattern 1+ times
) - end of the capturing group
}+ - 1+ } chars
$ - end of string.
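The [A-z] point is easy to check in isolation. The following is a minimal sketch (the class and method names are made up for illustration) comparing [A-z] with [A-Za-z] on the six ASCII characters that sit between Z and a:

```java
import java.util.regex.Pattern;

class AzRangeDemo {
    // [A-z] spans ASCII codes 65..122, which also covers [ \ ] ^ _ ` (codes 91..96).
    static final Pattern LOOSE = Pattern.compile("[A-z]");
    // [A-Za-z] covers only the 52 ASCII letters.
    static final Pattern STRICT = Pattern.compile("[A-Za-z]");

    static boolean loose(String s) {
        return LOOSE.matcher(s).matches();
    }

    static boolean strict(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Two letters followed by the six non-letter characters inside the A..z range.
        String[] samples = {"a", "Z", "[", "\\", "]", "^", "_", "`"};
        for (String s : samples) {
            System.out.println(s + "  [A-z]=" + loose(s) + "  [A-Za-z]=" + strict(s));
        }
    }
}
```

Both classes accept a and Z, but only [A-z] accepts the six punctuation characters.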
To match any occurrence of your pattern inside a string, you need to use
\$\{(\"[^\"]*\"|\w+(?:\(\))?(?:\.\w+(?:\(\))?)*)}
Get the Group 1 value after each match is found. Details:
\$\{ - a ${ substring
(\"[^\"]*\"|\w+(?:\(\))?(?:\.\w+(?:\(\))?)*) - Capturing group 1:
\"[^\"]*\" - ", 0+ chars other than " and then a "
| - or
\w+(?:\(\))? - 1+ word chars and an optional () substring
(?:\.\w+(?:\(\))?)* - 0 or more repetitions of . and then 1+ word chars and an optional () substring
} - a } char.
See the Java demo:
String s = "${test.one}${test.two}\n${test.one}${test.two()}\n${test.one}${\"hello\"}";
Pattern pattern = Pattern.compile("\\$\\{(\"[^\"]*\"|\\w+(?:\\(\\))?(?:\\.\\w+(?:\\(\\))?)*)}");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
Output:
test.one
test.two
test.one
test.two()
test.one
"hello"

You could use the regular expression
(?<=\$\{")[a-z]+(?="\})|(?<=\$\{)[a-z]+\.[a-z]+(?:\(\))?(?=\})
which has no capture groups. The character classes [a-z] can be modified as required, provided they do not include a double quote, period, or right brace.
Java's regex engine performs the following operations.
(?<=\$\{") # match '${"' in a positive lookbehind
[a-z]+ # match 1+ lowercase letters
(?="\}) # match '"}' in a positive lookahead
| # or
(?<=\$\{) # match '${' in a positive lookbehind
[a-z]+ # match 1+ lowercase letters
\.[a-z]+ # match '.' followed by 1+ lowercase letters
(?:\(\))? # optionally match `()`
(?=\}) # match '}' in a positive lookahead
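As a rough sketch of how the lookaround version could be used from Java (the class name and the extract helper are hypothetical, not part of the answer), each full match is read with group() because there is no capture group:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // The lookaround-only expression from the answer, escaped as a Java string literal.
    static final Pattern P = Pattern.compile(
        "(?<=\\$\\{\")[a-z]+(?=\"\\})|(?<=\\$\\{)[a-z]+\\.[a-z]+(?:\\(\\))?(?=\\})");

    // Collects every full match; the lookarounds keep ${ and } out of the result.
    static List<String> extract(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("${test.one}${test.two()}${\"hello\"}"));
        // prints [test.one, test.two(), hello]
    }
}
```

Note that for the quoted form this expression returns hello without the surrounding quotes, because the quotes are consumed by the lookarounds.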

Related

Find duplicate char sequences in String by regex in Java

I have an input string and I want to use regex to check if this string has = and $, e.g:
Input:
name=alice$name=peter$name=angelina
Output: true
Input:
name=alicename=peter$name=angelina
Output: false
My regex doesn't work:
Pattern pattern = Pattern.compile("([a-z]*=[0-9]*$])*");
Matcher matcher = pattern.matcher("name=rob$name=bob");

With .matches(), you may use
Pattern pattern = Pattern.compile("\\p{Lower}+=\\p{Lower}+(?:\\$\\p{Lower}+=\\p{Lower}+)*"); // With `matches()` to ensure whole string match
Details
\p{Lower}+ - 1+ lowercase letters (use \p{L} to match any and \p{Alpha} to only match ASCII letters)
= - a = char
\p{Lower}+ - 1+ lowercase letters
(?:\$\p{Lower}+=\p{Lower}+)* - 0 or more occurrences of:
\$ - a $ char
\p{Lower}+=\p{Lower}+ - 1+ lowercase letters, = and 1+ lowercase letters.
See the Java demo:
List<String> strs = Arrays.asList("name=alice$name=peter$name=angelina", "name=alicename=peter$name=angelina");
Pattern pattern = Pattern.compile("\\p{Lower}+=\\p{Lower}+(?:\\$\\p{Lower}+=\\p{Lower}+)*");
for (String str : strs)
    System.out.println("\"" + str + "\" => " + pattern.matcher(str).matches());
Output:
"name=alice$name=peter$name=angelina" => true
"name=alicename=peter$name=angelina" => false

You have an extra ] and you need to escape $ to match it as a literal character. You also need to match the last pair, which is not followed by $, so use
([a-z]*=[a-z0-9]*(\$|$))*
• [a-z]*= : match a-z zero or more times, then a = character
• [a-z0-9]*(\$|$) : match a-z and 0-9 zero or more times, followed by either a $ character or the end of the string.
• ([a-z]*=[a-z0-9]*(\$|$))* : match zero or more occurrences of such pairs.
Note: for stricter matching, use + (one or more) instead of * inside the pairs:
([a-z]+=[a-z0-9]+(\$|$))*
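A small sketch of how the stricter variant behaves with matches() (class and method names are illustrative only). It handles the two inputs from the question, but the last two checks show edge cases that this pattern also accepts, which may or may not be acceptable:

```java
import java.util.regex.Pattern;

class PairListDemo {
    // The stricter variant from the note above, escaped as a Java string literal.
    static final Pattern P = Pattern.compile("([a-z]+=[a-z0-9]+(\\$|$))*");

    // matches() requires the whole input to fit the pattern.
    static boolean check(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(check("name=alice$name=peter$name=angelina")); // true
        System.out.println(check("name=alicename=peter$name=angelina"));  // false
        // Edge cases: the outer * allows zero pairs, and (\$|$) allows a trailing $.
        System.out.println(check(""));            // true
        System.out.println(check("name=alice$")); // true
    }
}
```

If the empty string or a trailing $ must be rejected, a pattern of the form pair(?:\$pair)* avoids both.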

How do I write a multi-regex line?

I'm trying to write a line of regex that performs the following:
A string variable that can contain only:
The letters a to z (upper and lowercase) (zero or many times)
The hyphen character (zero or many times)
The single quote character (zero or one time)
The space character (zero or one time)
Tried searching through many regex websites
.matches("([a-zA-Z_0-9']*(\\s)?)(-)?"))
This gets close to what I want, but you can't type a-z anymore after you have typed a space character, so it's sequential in a way. I want the validation to allow those characters in any order.
Expected:
Allowed to type a string that has any amount of a-zA-Z, zero to one space, zero to one dash, anywhere throughout the string.

This is a validation for that
"^(?!.*\\s.*\\s)(?!.*'.*')[a-zA-Z'\\s-]*$"
Expanded
^ # Begin
(?! .* \s .* \s ) # Max single whitespace
(?! .* ' .* ' ) # Max single, single quote
[a-zA-Z'\s-]* # Optional a-z, A-Z, ', whitespace or - characters
$ # End
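To see the two negative lookaheads at work, here is a minimal sketch (the class name, method name, and sample inputs are invented for illustration) that runs the expression against a few strings:

```java
import java.util.regex.Pattern;

class NameValidatorDemo {
    // At most one whitespace, at most one single quote, otherwise letters and hyphens.
    static final Pattern P = Pattern.compile("^(?!.*\\s.*\\s)(?!.*'.*')[a-zA-Z'\\s-]*$");

    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("O'Neil Smith-Jones")); // true: one quote, one space
        System.out.println(isValid("a--b--c"));            // true: hyphens are unlimited
        System.out.println(isValid("Mary Ann Smith"));     // false: two spaces
        System.out.println(isValid("O'Neil's"));           // false: two single quotes
        System.out.println(isValid("abc1"));               // false: digits are not allowed
    }
}
```

Because every part of the pattern is optional, the empty string is also accepted.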

I guess,
^(?!.*([ ']).*\\1)[A-Za-z' -]*$
might work OK (it is written here as a Java string literal, hence \\1).
Here,
(?!.*([ ']).*\\1)
says that if the same character, either a space or a single quote ('), occurs twice in the string, the match fails, so only strings with zero or one occurrence of each are kept.
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression {
    public static void main(String[] args) {
        final String regex = "^(?!.*([ ']).*\\1)[A-Za-z' -]*$";
        final String string = "abcAbc- ";
        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
Output
Full match: abcAbc-
Group 1: null
If you wish to simplify, modify, or explore the expression, regex101.com explains it in its top right panel and also shows how it matches against sample inputs.

How to find words having given letter using Java Regex

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Homework {
    public static void main(String[] args) {
        String words[] = { "Abendessen", "Affe", "Affen", "aber", "anders", "Attacke", "arrangieren", "Art", "Asien",
                "Bund", "Arten", "Biene", "Abend", "baden", "suchen", "A1rten", "Abend-Essen" };
        Pattern pattern = Pattern.compile("[aA][a-z[n]+a-z]*");
        for (int i = 0; i < words.length; i++) {
            Matcher matcher = pattern.matcher(words[i]);
            if (matcher.find()) {
                System.out.println("OK: " + words[i]);
            }
        }
    }
}
The filter should keep words beginning with a or A and containing an n. The words may only consist of letters, and from the second letter on only lowercase letters are allowed.
These words should be matched: Abendessen, Affen, anders, arrangieren, Asien, Arten, Abend
I've tried this regular expression above carelessly and believe that's wrong too.

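Your current pattern [aA][a-z[n]+a-z]* does not require an n anywhere.
In Java, a nested [...] inside a character class is a union, so the pattern is parsed as the character class [aA] followed by a single character class equivalent to [a-z+], repeated 0+ times. Together with find(), that accepts any word containing an a or A, for example baden.
In flavors without class unions (PCRE, for example) the same pattern reads as the character class [aA], then the character class [a-z[n]+, followed by a-z]* which will match an a, -, z and ] repeated 0+ times. That would for example match Abendessena-z]
What you might do is start the match with a or A, repeat [a-z] 0+ times on both sides, and make sure that there is an n in the middle: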
\b[aA][a-z]*n[a-z]*\b
Explanation
\b Word boundary
[aA] Match a or A
[a-z]* Match 0+ times a-z
n Match n
[a-z]* Match 0+ times a-z
\b Word boundary
You might also use the anchors ^ and $ to assert the start and the end of the string instead of \b
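A sketch of the whole-string variant, using matches() so that no anchors or word boundaries are needed (the class name and the filter helper are hypothetical; the word list is the one from the question). With \b and find(), Abend-Essen would still be reported because Abend alone matches between word boundaries; matching the whole string avoids that:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class WordFilterDemo {
    static final String[] WORDS = {
        "Abendessen", "Affe", "Affen", "aber", "anders", "Attacke", "arrangieren", "Art", "Asien",
        "Bund", "Arten", "Biene", "Abend", "baden", "suchen", "A1rten", "Abend-Essen" };

    // a or A, then lowercase letters only, with at least one n somewhere after the first letter.
    static final Pattern P = Pattern.compile("[aA][a-z]*n[a-z]*");

    // Keeps only the words that match the pattern from start to end.
    static List<String> filter(String[] words) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (P.matcher(word).matches()) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filter(WORDS));
        // prints [Abendessen, Affen, anders, arrangieren, Asien, Arten, Abend]
    }
}
```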

java regular expression to extract uuid within square brackets

I have string inside brackets like following format:
[space string space]
I want to extract the string if the string is in UUID format.
example : [ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]
With java regular expression how can I get d6a413f4-059c-11e8-ba89-0ed5f89f718b ?

For your given example, you could use a lookaround to match what is between the [ and the ]:
(?<=\[ ).*?(?= \])
Explanation
(?<=\[ ) positive lookbehind to assert that what comes before is [ and a space
.*? match any character zero or more times, non-greedy
(?= \]) positive lookahead to assert that what follows is a space and ]
For example:
String regex = "(?<=\\[ ).*?(?= \\])";
String string = "[ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}

Using regex
\[ ([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}) ]
Why you don't want to do this
If you know that your string will definitely have the right format then you can just use substring to get the UUID
class Main {
    public static void main(String... args) {
        String s = "[ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]";
        System.out.println(s.substring(2, s.length()-2));
    }
}
This will be faster than using the regex option.

Regex to check if given String contains valid UUID:
"\\[ ([a-f0-9]{8}\\-(?:[a-f0-9]{4}\\-){3}[a-f0-9]{12}) \\]"
So, what is going on in this regex:
\\[ and a space - the '[' character and the space after it
[a-f0-9]{8} - characters from 'a' to 'f' and from '0' to '9', exactly eight times (the 123e5670 part)
\\- - a '-' character
(?:[a-f0-9]{4}\\-){3} - a non-capturing group that must be present exactly three times; each occurrence is exactly 4 characters in the range 'a' to 'f' or '0' to '9', followed by a '-' character (the a234-b234-c234- part)
[a-f0-9]{12} - characters from 'a' to 'f' and from '0' to '9', exactly twelve times (the d23456789012 part)
a space and \\] - a space and the ']' character
After searching the String for a match with the find() method, you print only capturing group #1 with the group(1) method (capturing group #1 is the part enclosed in parentheses).
Your UUID is in capture group 1. Here is a simple example how you can get UUID from source String:
String source = "[ 123e5670-a234-b234-c234-d23456789012 ]";
Pattern p = Pattern.compile("\\[ ([a-f0-9]{8}\\-(?:[a-f0-9]{4}\\-){3}[a-f0-9]{12}) \\]");
Matcher m = p.matcher(source);
if (m.find()) {
    System.out.println(m.group(1));
}
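If a java.util.UUID object is more useful than the raw text, the captured group can be handed to UUID.fromString. This is only a sketch under that assumption; the class name and the extract helper are made up, and the pattern is the one from the answer above:

```java
import java.util.Optional;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UuidExtractDemo {
    // Same pattern as above: "[ ", a lowercase-hex UUID in group 1, then " ]".
    static final Pattern BRACKETED = Pattern.compile(
        "\\[ ([a-f0-9]{8}\\-(?:[a-f0-9]{4}\\-){3}[a-f0-9]{12}) \\]");

    // Returns the UUID from the first bracketed match, or empty if there is none.
    static Optional<UUID> extract(String source) {
        Matcher m = BRACKETED.matcher(source);
        if (m.find()) {
            return Optional.of(UUID.fromString(m.group(1)));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("[ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]"));
        System.out.println(extract("[ not-a-uuid ]"));
    }
}
```

Returning Optional here is a design choice for the sketch: it makes the "no UUID found" case explicit instead of returning null.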

Regex that allows only single separators between words

I need to construct a regular expression such that it should not allow / at the start or end, and there should not be more than one / in sequence.
Valid Expression is: AB/CD
Valid Expression :AB
Invalid Expression: //AB//CD//
Invalid Expression: ///////
Invalid Expression: AB////////
The / character is just a separator between two words. Its length should not be more than one between words.

Assuming you only want to allow alphanumerics (including underscore) between slashes, it's pretty trivial:
boolean foundMatch = subject.matches("\\w+(?:/\\w+)*");
Explanation:
\w+ # Match one or more alnum characters
(?: # Start a non-capturing group
/ # Match a single slash
\w+ # Match one or more alnum characters
)* # Match that group any number of times
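As a quick check, the expression can be run against the five examples from the question. This is a minimal sketch; the class and method names are invented for illustration:

```java
import java.util.regex.Pattern;

class SeparatorDemo {
    // One or more word characters, then any number of "/word" continuations.
    static final Pattern P = Pattern.compile("\\w+(?:/\\w+)*");

    // matches() must cover the whole string, so leading or trailing slashes fail.
    static boolean isValid(String subject) {
        return P.matcher(subject).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"AB/CD", "AB", "//AB//CD//", "///////", "AB////////"};
        for (String s : samples) {
            System.out.println(s + " => " + isValid(s));
        }
    }
}
```

Only AB/CD and AB are accepted; the three inputs with leading, trailing, or repeated slashes are rejected.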

This regex does it:
^(?!/)(?!.*//).*[^/]$
So in java:
if (str.matches("(?!/)(?!.*//).*[^/]"))
Note that ^ and $ are implied by matches(), because matches must match the whole string to be true.

[a-zA-Z]+(/[a-zA-Z]+)+
It matches
a/b
a/b/c
aa/vv/cc
It does not match
a
/a/b
a//b
a/b/
Demo
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Reg {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("[a-zA-Z]+(/[a-zA-Z]+)+");
        Matcher matcher = pattern.matcher("a/b/c");
        System.out.println(matcher.matches());
    }
}
