Let's say I have the following string:
['json.key']
I want a regex pattern that matches the entire string, because it contains a closing '] that matches the opening ['.
But the [' and '] are optional, so a string without them should match too:
jsonKey
But I don't want strings like these to match:
['jsonKey
jsonKey']
because each one is missing half of the matching [' and '] pair.
The regex pattern I currently have for this is:
(\[')?[\w-]+('])?
But this doesn't quite work, because it also accepts the last two cases.
I need a regex for both Java and JavaScript code. Since they are separate modules, the two patterns can differ.
In Java or JavaScript (lookbehind requires ES2018 or later in JavaScript), you can use alternation and lookarounds like this:
(?<!\S)(?:\['[\w-]+']|[\w-]+)(?!\S)
RegEx Details:
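(?<!\S): Assert that the previous character is not a non-whitespace character (i.e., the match starts at the beginning of the string or after whitespace)
(?:: Start non-capture group
\['[\w-]+']: Match [', then 1+ word characters or hyphens, then ']
|: OR
[\w-]+: Match 1+ word characters or hyphens
): End non-capture group
(?!\S): Assert that the next character is not a non-whitespace character (i.e., the match ends at the end of the string or before whitespace)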
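A minimal Java sketch of the pattern above, checking whole strings with Matcher.matches(); the class and method names are hypothetical. Note that [\w-] does not include a dot, so the question's first example ['json.key'] is rejected; adding . to both classes ([\w.-]) would accept it.

```java
import java.util.regex.Pattern;

public class BracketKeyCheck {
    // Pattern from the answer: either ['key'] or a bare key, where a key is
    // 1+ word characters or hyphens. The lookarounds require the match to be
    // bounded by whitespace or the string edges (relevant when using find()).
    static final Pattern KEY =
            Pattern.compile("(?<!\\S)(?:\\['[\\w-]+']|[\\w-]+)(?!\\S)");

    // matches() requires the whole input to match the pattern.
    static boolean isValid(String s) {
        return KEY.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"['jsonKey']", "jsonKey", "['jsonKey", "jsonKey']", "['json.key']"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

With matches() the lookarounds are redundant but harmless; they matter when the same pattern is used with find() on a longer line of text.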
I need to extract parts of a string using regex, preferably with a single regex, because it is used in a loop with 9 pre-existing regexes (i.e., so I can just add it to the ArrayList of available regexes).
The strings always follow this pattern:
Between 4 and 8 characters from A-Z0-9 ([A-Z0-9]{4,8}), followed by either
[A-Z]{1}, [A-Z0-9]{2}, or another [A-Z0-9]{4,8}
For example:
“A1B1C1 ABCD E FGHI JK X0Y0Z0”
I’d want this to return four matches.
A1B1C1 & ABCD E & FGHI JK & X0Y0Z0
I've been trying to match the first part of {4,8} characters, followed by a non-greedy match for the {1,2} part. For example:
[A-Z0-9]{4,8}(\\s{1}[A-Z0-9]{1,2})*? and [A-Z0-9]{4,8}(\\s{1}[A-Z]{1}|\\s{1}[A-Z0-9]{2})*?
But these never return more than the first {4,8} characters.
Your lazy *? matches zero repetitions, because nothing after it forces more. Instead, you could use an optional part with word boundaries and an alternation to match either [A-Z0-9]{2} or [A-Z]:
\b[A-Z0-9]{4,8}(?:\h+(?:[A-Z0-9]{2}|[A-Z]))?\b
\b Word boundary
[A-Z0-9]{4,8} Match 4 to 8 characters from A-Z0-9
(?: Non-capture group
\h+ Match 1+ horizontal whitespace chars
(?:[A-Z0-9]{2}|[A-Z]) Match either 2 chars from A-Z0-9 or 1 char from A-Z
)? Close non-capture group and make it optional
\b Word boundary
In Java
String regex = "\\b[A-Z0-9]{4,8}(?:\\h+(?:[A-Z0-9]{2}|[A-Z]))?\\b";
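To collect all matches in Java, the pattern can be used with Matcher.find() in a loop. This is a sketch; TokenExtractor and extract are hypothetical names, and the input is the example string from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenExtractor {
    // 4-8 chars from A-Z0-9, optionally followed by horizontal whitespace and
    // either two A-Z0-9 chars or a single A-Z letter (\h needs Java 8+).
    private static final Pattern TOKEN =
            Pattern.compile("\\b[A-Z0-9]{4,8}(?:\\h+(?:[A-Z0-9]{2}|[A-Z]))?\\b");

    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // Each find() resumes scanning after the previous match.
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(extract("A1B1C1 ABCD E FGHI JK X0Y0Z0"));
    }
}
```

For the example string this yields the four matches listed in the question: A1B1C1, ABCD E, FGHI JK and X0Y0Z0.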
I have the following String to be split:
Given String:
[PSR__123456_A,[AgrID=123456,PoolID=A],,Auto,Bank,0,0],[PSR__123456_A,[AgrID=123456,PoolID=A],,Auto,Bank,0,0],[PSR_Net__123456_A,[AgrID=123456,PoolID=A],,Suppress_Collateral,Bank,0,0]
Expected Results: (3 elements)
[PSR__123456_A,[AgrID=123456,PoolID=A],,Auto,Bank,0,0]
[PSR__123456_A,[AgrID=123456,PoolID=A],,Auto,Bank,0,0]
[PSR_Net__123456_A,[AgrID=123456,PoolID=A],,Suppress_Collateral,Bank,0,0]
I have tried the following regular expressions to parse/split the above string:
",(?![^[]*[]])" or ",(?=(((?!]).)*\[)|[^\[\]]*$)"
but I still cannot get the expected result; instead, I get the following 6 elements:
[PSR__123456_A
[AgrID=123456,PoolID=A],,Auto,Bank,0,0]
[PSR__123456_A
[AgrID=123456,PoolID=A],,Auto,Bank,0,0]
[PSR_Net__123456_A
[AgrID=123456,PoolID=A],,Suppress_Collateral,Bank,0,0]
Is there a way to do this in Java (RegEx) without splitting the String character by character?
If you want to match only a comma that is followed by an opening bracket, another opening bracket, a closing bracket and another closing bracket (with other text in between), you might use:
,(?=\[[^[]*\[[^[]*\][^]]*\])
In Java:
String regex = ",(?=\\[[^\\[]*\\[[^\\[]*\\][^]]*\\])";
That will match:
, Match comma
(?= Positive lookahead
\[[^[]*\[[^[]*\][^]]*\] matches:
\[ Match [
[^[]* Match 0+ chars other than [
\[ Match [
[^[]* Match 0+ chars other than [
\] Match ]
[^]]* Match 0+ chars other than ]
\] Match ]
) Close positive lookahead
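A short Java sketch applying this regex with String.split() to the question's input; the class name is hypothetical. The comma itself is consumed by the split, while the lookahead only checks what follows, so each element keeps its brackets.

```java
public class SplitOnOuterComma {
    static final String INPUT =
            "[PSR__123456_A,[AgrID=123456,PoolID=A],,Auto,Bank,0,0],"
            + "[PSR__123456_A,[AgrID=123456,PoolID=A],,Auto,Bank,0,0],"
            + "[PSR_Net__123456_A,[AgrID=123456,PoolID=A],,Suppress_Collateral,Bank,0,0]";

    // The answer's regex: a comma followed by "[ ... [ ... ] ... ]".
    static final String REGEX = ",(?=\\[[^\\[]*\\[[^\\[]*\\][^]]*\\])";

    public static void main(String[] args) {
        String[] parts = INPUT.split(REGEX);
        System.out.println(parts.length + " elements");
        for (String part : parts) {
            System.out.println(part);
        }
    }
}
```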
Assuming that each element starts with [PSR, you can use a regex with a positive lookahead like this:
,(?=\[PSR)
Use \n as the replacement string to put each element on its own line.
Update: as Manish described in his comment, you can also simply replace ],[ with ]\n[.
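Both alternatives can be sketched in Java as follows, assuming (as the answer does) that every element starts with [PSR and that ],[ never occurs inside an element; the class and method names are hypothetical. String.replace treats its arguments literally, so ],[ needs no regex escaping there.

```java
import java.util.Arrays;

public class SplitOnPsr {
    static final String INPUT =
            "[PSR__123456_A,[AgrID=123456,PoolID=A],,Auto,Bank,0,0],"
            + "[PSR__123456_A,[AgrID=123456,PoolID=A],,Auto,Bank,0,0],"
            + "[PSR_Net__123456_A,[AgrID=123456,PoolID=A],,Suppress_Collateral,Bank,0,0]";

    // Split at a comma that is immediately followed by "[PSR".
    static String[] splitByLookahead(String s) {
        return s.split(",(?=\\[PSR)");
    }

    // Alternative from the update: literal replace of "],[" with "]\n[",
    // then split on the inserted newlines.
    static String[] splitByReplace(String s) {
        return s.replace("],[", "]\n[").split("\n");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitByLookahead(INPUT)));
        System.out.println(Arrays.equals(splitByLookahead(INPUT), splitByReplace(INPUT)));
    }
}
```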
I want to check if a string consists of letters and digits only, and allow a - separator:
^[\w\d-]*$
Valid: TEST-TEST123
Now I want to check that the separator never occurs more than once in a row. Thus, the following examples should be invalid:
Invalid: TEST--TEST, TEST------TEST, TEST-TEST--TEST.
Question: how can I restrict the repeated occurrence of a character?
You may use
^(?:[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*)?$
Or, in Java, you may use an alphanumeric \p{Alnum} character class to denote letters and digits:
^(?:\p{Alnum}+(?:-\p{Alnum}+)*)?$
Details
^ - start of the string
(?: - start of an optional non-capturing group (it lets the pattern also match an empty string; if you do not need that, remove this group)
\p{Alnum}+ - 1 or more letters or digits
(?:-\p{Alnum}+)* - zero or more repetitions of
- - a hyphen
\p{Alnum}+ - 1 or more letters or digits
)? - end of the optional non-capturing group
$ - end of string.
In code, you do not need the ^ and $ anchors if you use the pattern with the matches method, since it requires the entire string to match:
boolean valid = s.matches("(?:\\p{Alnum}+(?:-\\p{Alnum}+)*)?");
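A small Java sketch running the question's examples through the same matches() call; the class and method names are hypothetical. By default \p{Alnum} covers only ASCII letters and digits ([a-zA-Z0-9]) unless the UNICODE_CHARACTER_CLASS flag is set.

```java
public class HyphenSeparated {
    // Runs of letters/digits separated by single hyphens; the outer (?:...)?
    // also accepts the empty string. matches() anchors at both ends.
    static boolean isValid(String s) {
        return s.matches("(?:\\p{Alnum}+(?:-\\p{Alnum}+)*)?");
    }

    public static void main(String[] args) {
        String[] samples = {"TEST-TEST123", "TEST--TEST", "TEST------TEST",
                "TEST-TEST--TEST", "-TEST", "TEST-", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

A leading or trailing hyphen is rejected as well, because every hyphen must sit between two alphanumeric runs.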
I am working on a regular expression where the pattern is:
1.0.0[ - optional description]/1.0.0.0[ - optional description].txt
The [ - optional description] part is, of course, optional. So some possible VALID values are:
1.0.0/1.0.0.0.txt
1.0.0/1.0.0.0 - xyz.txt
1.0.0 - abc/1.0.0.0 - xyz.txt
1.0.0 - abc/1.0.0.0.txt
To make the pattern matching a little more robust, I'd like to allow zero or more spaces before and after the "-" character, so all of these would be valid too:
1.0.0 - abc/1.0.0.0 - xyz.txt
1.0.0-abc/1.0.0.0-xyz.txt
1.0.0 -abc/1.0.0.0- xyz.txt
To do this matching, I have the following regular expression (Java code):
String part1 = "((\\d+.{1}\\d+.{1}\\d+)(\\s*-\\s*(.+))?)";
String part2 = "((\\d+.{1}\\d+.{1}\\d+.{1}\\d+)(\\s*-\\s*(.+))?\\.txt)";
Pattern pattern = Pattern.compile(part1 + "/" + part2);
So far this regular expression is working well, but while unit testing I found a case I can't quite figure out yet: the string contains a "-" character surrounded by one or more spaces, but there is no description after it. This would look like:
1.0.0 - /1.0.0.0.txt
1.0.0- /1.0.0.0-xyz.txt
In these cases, I want the match to FAIL, but with my current regular expression it succeeds. I think what I want is: if there is a "-" character surrounded by any number of spaces, like " - ", then it must also be followed by at least one non-space character. But I can't quite figure out the regex for this.
Thanks!
Something like:
^\d+\.\d+\.\d+(?:\s*-\s*\w+)?\/\d+\.\d+\.\d+\.\d+(?:\s*-\s*\w+)?\.txt$
Or you can combine the \.\d+ repetitions:
^\d+(?:\.\d+){2}(?:\s*-\s*\w+)?\/\d+(?:\.\d+){3}(?:\s*-\s*\w+)?\.txt$
Changes
.{1} When you want to match something exactly once, there is no need for {1}; it's implicit.
(?:\s*-\s*\w+) Matches zero or more whitespace characters (\s*), then -, then zero or more whitespace characters again, and then \w+, a description of at least one word character.
The ? at the end of this pattern makes it optional.
This same pattern is repeated again at the end to match the second part.
^ Anchors the regex at the start of the string.
$ Anchors the regex at the end of the string. Together, these two ensure that the string contains nothing else.
Don't group patterns using () unless you need to capture them, since capturing adds some overhead. Use (?:...) if you want to group patterns without capturing them.
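A Java sketch of the second regex above, with the final dot escaped as \.txt so that it matches only a literal dot (an unescaped . would match any character). In Java the / needs no escaping. The class name and sample list are illustrative; note that \w+ allows only a single word, so a description containing spaces would be rejected.

```java
import java.util.regex.Pattern;

public class VersionPathCheck {
    // 1.0.0[ - desc]/1.0.0.0[ - desc].txt, where each desc must be 1+ word chars.
    static final Pattern PATH = Pattern.compile(
            "^\\d+(?:\\.\\d+){2}(?:\\s*-\\s*\\w+)?/\\d+(?:\\.\\d+){3}(?:\\s*-\\s*\\w+)?\\.txt$");

    static boolean isValid(String s) {
        return PATH.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "1.0.0/1.0.0.0.txt", "1.0.0 - abc/1.0.0.0 - xyz.txt",
            "1.0.0-abc/1.0.0.0-xyz.txt", "1.0.0 -abc/1.0.0.0- xyz.txt",
            "1.0.0 - /1.0.0.0.txt", "1.0.0- /1.0.0.0-xyz.txt"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```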
In the group that matches the optional part, you need to replace .+ with \\S+ where \S means any non-whitespace character. This enforces the optional part to include non-whitespace character in order to match the pattern:
String part1 = "((\\d+\\.\\d+\\.\\d+)(\\s*-\\s*(\\S+))?)";
String part2 = "((\\d+\\.\\d+\\.\\d+\\.\\d+)(\\s*-\\s*(\\S+))?\\.txt)";
Also note that .{1} (which is the same as just .) matches any character. From the examples, you want to match a dot, so it should be replaced with \.
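A Java sketch of the corrected part1 and part2 (class name hypothetical), showing which capturing groups hold the versions and descriptions, and that the two problem inputs from the question are now rejected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VersionPathGroups {
    // part1 and part2 with \\S+ for the description and \\. for every literal dot.
    static final String PART1 = "((\\d+\\.\\d+\\.\\d+)(\\s*-\\s*(\\S+))?)";
    static final String PART2 = "((\\d+\\.\\d+\\.\\d+\\.\\d+)(\\s*-\\s*(\\S+))?\\.txt)";
    static final Pattern PATTERN = Pattern.compile(PART1 + "/" + PART2);

    public static void main(String[] args) {
        Matcher m = PATTERN.matcher("1.0.0 - abc/1.0.0.0 - xyz.txt");
        if (m.matches()) {
            // Groups: 2 = first version, 4 = first description,
            //         6 = second version, 8 = second description.
            System.out.println(m.group(2) + " | " + m.group(4) + " | " + m.group(6) + " | " + m.group(8));
        }
        // A "-" with no description after it no longer matches.
        System.out.println(PATTERN.matcher("1.0.0 - /1.0.0.0.txt").matches());
    }
}
```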
Something like
^\d+\.\d+\.\d+(?:\s*-\s*[^\/\s]+)?\/\d+\.\d+\.\d+\.\d+?(?:\s*-\s*[^.\s]+)?\.\w+$