Match text with possible brackets between brackets - java

I need to match text between ${ and }
Example:
${I need to match this text}
The simple regex \$\{(.+?)\} works fine until I put a } inside the text.
Curly brackets inside the text to match are always paired.
Is there any way to solve this with regular expressions?

\$\{((?:\{[^\{\}]*\}|[^\{\}]*)*)\}
If we meet an opening bracket, we look for its pair, and after the closing one we proceed as usual. This can't handle more than one level of nested brackets.
The main building block here is [^\{\}]*, any sequence of non-bracket characters. It can be surrounded by brackets, \{[^\{\}]*\}, but it might not be: (?:\{[^\{\}]*\}|[^\{\}]*). Any number of these sequences can be present, hence the * at the end.
Arbitrary nesting depth would require a recursive regex, which Java does not support. But any fixed depth can be matched by carefully extending this idea.
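The "fixed depth" idea can be sketched in Java by building the pattern recursively. This is an illustration rather than code from the answer: the class and method names (NestedBraces, content, extract) are hypothetical. Each non-bracket step also matches a single [^{}] character instead of a [^{}]* run, so no quantifier sits directly inside another one. That nesting shape can backtrack very heavily when a match fails. Depth 1 corresponds to the answer's one-level pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedBraces {

    // Regex for the text inside ${...} when braces may nest up to `depth` levels.
    // depth 0: no braces at all. Each deeper level allows either one non-brace
    // character or a complete {...} group whose content is one level shallower.
    static String content(int depth) {
        if (depth == 0) {
            return "[^{}]*";
        }
        return "(?:\\{" + content(depth - 1) + "\\}|[^{}])*";
    }

    // Returns the text between ${ and its matching } for every placeholder
    // whose nesting does not exceed `depth`.
    static List<String> extract(String text, int depth) {
        Pattern p = Pattern.compile("\\$\\{(" + content(depth) + ")\\}");
        Matcher m = p.matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "a ${I need {this} text} b ${x{y{z}}w}";
        System.out.println(extract(s, 1)); // [I need {this} text]
        System.out.println(extract(s, 2)); // [I need {this} text, x{y{z}}w]
    }
}
```

With depth 1 the second placeholder is skipped entirely because it nests two levels deep. Raising the depth makes the pattern longer, but it still only handles the depth you ask for.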

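Add a $ to the end of the regex and don't escape it. An unescaped $ is an anchor: the match must end at the very end of the input, so the lazy (.+?) keeps extending until it reaches the final }. This only helps when the placeholder is the last thing in the input. Java also requires the { to be escaped here; otherwise Pattern.compile throws a PatternSyntaxException ("Illegal repetition").
Regex: \$\{(.+?)\}$
Java string literal: "\\$\\{(.+?)\\}$"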
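A short illustration of the anchored pattern, with the braces escaped as Java requires for {. The class and method names are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoredPlaceholder {

    // The final unescaped $ anchors the match to the end of the input, so the
    // lazy (.+?) keeps growing until the } it stops at is the last character.
    private static final Pattern P = Pattern.compile("\\$\\{(.+?)\\}$");

    // Returns the placeholder content, or null if the input does not end with one.
    static String extract(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("${I need {to} match this}")); // I need {to} match this
        System.out.println(extract("${a} and more text"));        // null
    }
}
```

The second call shows the limitation. As soon as anything follows the closing }, there is no match at all.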

Related

A regex that captures all text after a colon up to a comma, where the text can contain nested brackets, commas, etc.

I am trying to redact something that log4j is overriding in general. To do this, I am trying to change a regex so that it captures what I need. An example:
"definition":{"schema":{"columns":[{"dataType":"INT","name":"column_a","description":"description1"},{"dataType":"INT","name":"column_b","description":"description2"}]}}}, "some other stuff": ["SOME_STUFF"], etc.
I am hoping to capture just:
{"schema":{"columns":[{"dataType":"INT","name":"column_a","description":"*** REDACTED ***"},{"dataType":"INT","name":"column_b","description":"description"}]}}}
I have this:
(?<=("definition":{))(\\.|[^\\])*?(?=}})
If I keep adding a } at the end, it keeps highlighting more of what I need. The problem is that the number of nested elements in the list is not fixed.
Is there any way to adjust the above so that I can capture everything within the outer brackets?
If you don't have other brackets after the last one you're trying to match, this regex should work for you:
(?<=\"definition\":)\{.*\}(?:\})
The main difference is moving the brackets from the lookarounds to the matching part.
This regex should work for you if you cannot use a proper JSON parser:
(?<=\"definition\":).+?\}(?=,\h*\")
Breakdown:
(?<=\"definition\":): Lookbehind condition to make sure we have "definition": before the current position
.+?\}: Lazily match one or more of any character, up to a }
(?=,\h*\"): Lookahead to assert that we have a comma then 0 or more spaces followed by a " ahead of the current position
In Java use this regex declaration:
String regex = "(?<=\"definition\":).+?\\}(?=,\\h*\")";
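Here is a sketch of that declaration in use, run against the sample line from the question with the trailing "etc." dropped. The class and method names are hypothetical. Like any regex over JSON, it depends on the exact layout of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DefinitionExtractor {

    // The answer's pattern: a lookbehind for "definition":, then a lazy run of any
    // characters up to the first } that is followed by a comma, optional
    // horizontal whitespace (\h) and a double quote.
    private static final Pattern P =
            Pattern.compile("(?<=\"definition\":).+?\\}(?=,\\h*\")");

    // Returns the matched value, or null when the pattern does not match.
    static String extractDefinition(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "\"definition\":{\"schema\":{\"columns\":["
                + "{\"dataType\":\"INT\",\"name\":\"column_a\",\"description\":\"description1\"},"
                + "{\"dataType\":\"INT\",\"name\":\"column_b\",\"description\":\"description2\"}"
                + "]}}}, \"some other stuff\": [\"SOME_STUFF\"]";
        // Prints everything from {"schema" through the ]}}} before ", "some other stuff"
        System.out.println(extractDefinition(line));
    }
}
```

The },{ between the two column objects does not end the match because a { follows the comma rather than a ".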

Match text between characters (avoid nesting)

Given:
"abc{defghij{kl}mnopqrst}uvwxyz{aaaaaaaaaa}"
I want to match the text between a { and its matching closing }, treating nested pairs as part of the text, i.e. the texts {defghij{kl}mnopqrst} and {aaaaaaaaaa}.
Without the nested {kl}, the regex expression \{[^{}]*\} works just fine. But not with the nested {kl}.
Is there a way to do this? If not, can I say "match text between { and } where the enclosed text is at least, e.g., 3 characters long", so that the nested {kl}, which contains two characters, is not matched? (I'm assuming one level of nesting.)
Editor: https://www.freeformatter.com/java-regex-tester.html
Thanks,
Since in your problem the nesting depth never reaches two, it can be solved with a short, readable regex:
\{(?:\{[^{}]*}|[^{}]+)*}
In Java you have to escape opening braces, as I did.
The regex above matches an opening brace, then repeatedly matches either a run of characters other than { and } ([^{}]+) or a brace-enclosed group ({[^{}]*}) as many times as possible, and finally expects a closing brace.
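Here is a minimal Java sketch running that regex over the example string from the question. The class and method names are hypothetical. Only the opening braces are escaped, since Java treats a lone } as a literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OneLevelBraces {

    // An opening brace, then any mix of non-brace runs ([^{}]+) and complete
    // inner {...} groups, then a closing brace.
    private static final Pattern P = Pattern.compile("\\{(?:\\{[^{}]*}|[^{}]+)*}");

    // Collects every top-level {...} group (at most one nested level).
    static List<String> findGroups(String text) {
        List<String> groups = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            groups.add(m.group());
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(findGroups("abc{defghij{kl}mnopqrst}uvwxyz{aaaaaaaaaa}"));
        // [{defghij{kl}mnopqrst}, {aaaaaaaaaa}]
    }
}
```

If the input may contain an unclosed {, making the run possessive ([^{}]++) is worth considering. A + run inside a * group can be split in many ways, which can make a failing match slow.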

Any suggestions to match and extract the pattern?

I want to match something like this
$(string).not(string).not(string)
The not(string) can repeat zero or more times, after $(string).
Note that string can be anything, except a nested not(string).
I used the regular expression (\\$\\((.*)\\))((\\.not\\((.*?)\\))*?)(?!(\\.not)), thinking that *? would non-greedily match any number of not(string) sequences and that the lookahead would stop the match at anything that is not not(string), so that I extract only the part I want.
However, when I tested it on input like
$(string).not(string).not(string).append(string)
group(0) returns the whole string, whereas I only need $(string).not(string).not(string).
Obviously I am missing or misusing something. Any suggestions?
Try this one (escaped for Java):
(\\$\\(string\\)(?:(?:\\.not\\(.*?\\))+))
It should capture just the part that you are after.
If we assume that parentheses are not nested, you can write something like this:
String p = "\\$\\([^)]*\\)(?:\\.not\\([^)]*\\))*";
There is no need to add a lookahead: the non-capturing group has a greedy quantifier, so it is repeated as many times as possible.
If what you called string in your question may be a quoted string with parentheses inside, as in Pshemo's example $(string).not(".not(foo)").not(string), you can replace each [^)]* with (?:\\s*\"[^\"]*\"\\s*|[^)]*) to ignore characters inside quoted parts.
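A small sketch of the unquoted version of that pattern applied to the input from the question. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NotChain {

    // $( ... ) followed by zero or more .not( ... ); assumes no nested parentheses.
    // The greedy * repeats the group as often as possible and simply stops at the
    // first segment that is not .not(...), so no lookahead is needed.
    private static final Pattern P =
            Pattern.compile("\\$\\([^)]*\\)(?:\\.not\\([^)]*\\))*");

    // Returns the first match (group 0), or null if there is none.
    static String firstMatch(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("$(string).not(string).not(string).append(string)"));
        // $(string).not(string).not(string)
    }
}
```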
From the Matcher.group(int) documentation: "Group zero denotes the entire pattern". Use group(1).
(\$\([\w ]+\))(\.not\([\w ]+\))*
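This will also work. It gives you two groups: group 1 holds the word with the $ sign, and group 2 holds only the last ".not(...)", because a repeated capturing group keeps just its final iteration. To get all the ".not" strings in one group, wrap the whole repetition in the group: ((?:\.not\([\w ]+\))*).
Please note: in a Java string literal, each backslash has to be doubled.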
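The following sketch makes the difference between a repeated group and a group around the repetition concrete. The input string and all names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroup {

    // Group 2 is repeated: after matching, it only holds the last .not(...) iteration.
    static final String REPEATED = "(\\$\\([\\w ]+\\))(\\.not\\([\\w ]+\\))*";

    // Group 2 wraps the whole repetition: it holds every .not(...) together.
    static final String WRAPPED = "(\\$\\([\\w ]+\\))((?:\\.not\\([\\w ]+\\))*)";

    // Returns group n of the first match of regex in input, or null if none.
    static String group(String regex, String input, int n) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        String input = "$(string).not(a).not(b).append(c)";
        System.out.println(group(REPEATED, input, 1)); // $(string)
        System.out.println(group(REPEATED, input, 2)); // .not(b)
        System.out.println(group(WRAPPED, input, 2));  // .not(a).not(b)
    }
}
```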

Regular expressions Groups java

Hi, I am writing a regular expression. I need to get info from this line:
;row:1;field:2;name:3
My regular expression for this line is
.*(row:([0-9]{1,}))?;.*field:([0-9]{1,});.*name:([0-9]{1,});
But the problem is that the word row is optional: I can also get the line without the word row, and in that case my regular expression does not work. How can I write it?
Thanks.
If you are just trying to make the word row optional, you can surround it with parentheses and use the question mark to say you want zero or one occurrence of what is inside them, as shown below:
(row:)?
This will find zero or one "row:". I would also recommend adding ?: right after the opening parenthesis if you are not trying to create capturing groups:
(?:row:)?
The ?: tells the regex engine that the parentheses only group the letters and do not create a capturing group. Putting this into your regular expression gives:
.*((?:row:)?([0-9]{1,}))?;.*field:([0-9]{1,});.*name:([0-9]{1,});
If you want this applied to "field:" and "name:", then you would get this:
.*((?:row:)?([0-9]{1,}))?;.*(?:field:)?([0-9]{1,});.*(?:name:)?([0-9]{1,});
If you do not want the other parentheses in your expression to create capturing groups, I would change them too. Also, to shorten the expression and make it cleaner, I would change [0-9] to \d, which means any digit, replace {1,} with +, which means "one or more", and remove the unnecessary parentheses, which results in this:
.*(?:(?:row:)?\d+)?;.*(?:field:)?\d+;.*(?:name:)?\d+;
I'm not really sure this is the final expression you want, since your question was not very descriptive, but I have shown you how to match zero or one "row:" and how to clean up the expression a little without changing what it matches.
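Here is a sketch of the optional-group idea used to actually extract the values. It uses a simplified pattern of my own rather than the one above. It assumes the parts always appear in this order, separated by ;, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalRow {

    // Hypothetical simplified pattern: an optional ";row:<digits>" block wrapped in a
    // non-capturing group with ?, then the required field and name values.
    private static final Pattern P =
            Pattern.compile("(?:;row:(\\d+))?;field:(\\d+);name:(\\d+)");

    // Returns {row, field, name}; row is null when the line has no row part.
    // Returns null if the whole line does not match.
    static String[] parse(String line) {
        Matcher m = P.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse(";row:1;field:2;name:3"))); // [1, 2, 3]
        System.out.println(Arrays.toString(parse(";field:2;name:3")));       // [null, 2, 3]
    }
}
```

When the optional block is skipped, the capturing group inside it did not participate, so group(1) returns null instead of throwing.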

Regex why does negative lookahead not work when there are two groups here

When I tried this regex
\"(\S\S+)\"(?!;c)
on the string "MM:";d, it matched, as I wanted,
and on the string "MM:";c it did not match, as desired.
But when I add a second group, moving the semicolon inside it and making it optional with |
\"(\S\S+)\"(;|)(?!c)
the string "MM:";c matches, although I expected it not to, as before.
I tried this in Java and then in JavaScript using the regex tool Debuggex.
What am I doing wrong?
Note: the | is there so that the semicolon is not required. Also, the c in the examples is just a stand-in for a word, which is why I am using a negative lookahead.
After following Holger's answer and using a possessive quantifier,
\"(\S\S+)\";?+(?!c)
it worked (I tested it on RegexPlanet).
I believe the regex engine does whatever it can to find a match. Since your expression makes the semicolon optional, the engine found that it could match the entire expression: if the semicolon is not consumed by the (;|) group, the next character is ; rather than c, so the negative lookahead succeeds. This comes from the backtracking way that regex engines work: they keep trying alternatives until they find a match.
In other words, the process goes like this:
"MM:" - matched
(;|) - try the semicolon: matched
(?!c) - oops, the negative lookahead fails. No match; go back
(;|) - try the empty alternative: ';c' is left unconsumed
(?!c) - the next character is ';', not 'c', so the negative lookahead succeeds. We have a match
An update (based on your comment): the following pattern may work better:
\"(\S\S+)\"(;|)((?!c)|(?!;c))
The problem is that you don’t want to make the semicolon optional in the regular-expression sense. An optional semicolon implies that the matcher is allowed to try both, matching with or without it. So even if the semicolon is there, the matcher can ignore it, creating an empty match for the group and letting the lookahead succeed.
But you want to consume the semicolon if it’s there, so it is not allowed to be used to satisfy the negative look-ahead. With Java’s regex engine that’s pretty easy: use ;?+
This is called a “possessive quantifier”. Like with the ? the semicolon doesn’t need to be there but if it’s there it must match and cannot be ignored. So the regex engine has no alternatives any more.
So the entire pattern looks like \"(\S\S+)\";?+(?!c) or \"(\S\S+)\"(;?+)(?!c) if you need the semicolon in a group.
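A minimal Java check of the behaviour described in this thread, comparing the optional (;|) version with the possessive ;?+ version. The class and field names are made up for this sketch.

```java
import java.util.regex.Pattern;

public class PossessiveLookahead {

    // Optional semicolon via (;|): the engine may skip the ; so that (?!c) is
    // tested in front of the ; and succeeds.
    static final Pattern OPTIONAL = Pattern.compile("\"(\\S\\S+)\"(;|)(?!c)");

    // Possessive ;?+ : if a ; is present it is consumed and never given back,
    // so (?!c) is always tested after it.
    static final Pattern POSSESSIVE = Pattern.compile("\"(\\S\\S+)\";?+(?!c)");

    public static void main(String[] args) {
        String blocked = "\"MM:\";c";
        String allowed = "\"MM:\";d";
        System.out.println(OPTIONAL.matcher(blocked).find());   // true (unwanted)
        System.out.println(POSSESSIVE.matcher(blocked).find()); // false
        System.out.println(POSSESSIVE.matcher(allowed).find()); // true
    }
}
```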
