Hi, I am writing a regular expression. I need to get info from this line:
;row:1;field:2;name:3
My regular expression for this line is:
.*(row:([0-9]{1,}))?;.*field:([0-9]{1,});.*name:([0-9]{1,});
But the problem is that the word row is optional: I can also get the line without the word row, and in that case my regular expression does not work. How can I write it?
Thanks.
If you are just trying to make the word row optional, you can surround it with parentheses and use the question mark to say you want zero or one of the parentheses' contents, as shown below:
(row:)?
This will find zero or one "row:". I would also recommend adding ?: just inside the opening parenthesis if you are not trying to create a capturing group:
(?:row:)?
Adding ?: tells the regex engine that the parentheses do not form a capturing group; you are using them only to group the letters. Putting this into your regular expression looks like this:
.*((?:row:)?([0-9]{1,}))?;.*field:([0-9]{1,});.*name:([0-9]{1,});
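To see the difference from Java, Matcher.groupCount() reports how many capturing groups a compiled pattern has (group 0, the whole match, is not counted). A minimal sketch; the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Number of capturing groups in the pattern; group 0 (the whole match) is not counted
    static int capturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(capturingGroups("(row:)?"));   // 1: (row:) captures
        System.out.println(capturingGroups("(?:row:)?")); // 0: (?:row:) only groups

        // The digits stay in group 1 because (?:row:) does not take a group number
        Matcher m = Pattern.compile("(?:row:)?(\\d+)").matcher("row:1");
        if (m.matches()) {
            System.out.println(m.group(1)); // 1
        }
    }
}
```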
If you want this applied to "field:" and "name:", then you would get this:
.*((?:row:)?([0-9]{1,}))?;.*(?:field:)?([0-9]{1,});.*(?:name:)?([0-9]{1,});
If you do not want the other parentheses in your expression to create capturing groups, I would also change them. To shorten the expression a little and make it cleaner, I would also change [0-9] to \d, which means any digit, replace {1,} with +, which means "one or more", and remove the unnecessary parentheses, which results in this:
.*(?:(?:row:)?\d+)?;.*(?:field:)?\d+;.*(?:name:)?\d+;
I'm not really sure that this is the final expression that you want, since your question was not very descriptive, but I have shown you how to match zero or one "row:" and how to clean up the expression a little without changing what it matches.
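A minimal Java sketch of extracting the values with an optional row part, assuming the fields always appear in the order row, field, name and that the line need not end with ; (the sample line does not, while the patterns above all require a ; after the name value). This is a variant, not the answer's exact pattern: it drops the leading .*, because a greedy .* can swallow row:1 and leave the optional group matching nothing. Class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RowFieldName {
    // Assumes the order row, field, name, each preceded by ';'. There is no leading .*
    // and no trailing ';' is required, so the optional row group is tried first.
    static final Pattern LINE =
            Pattern.compile("(?:;row:(\\d+))?;field:(\\d+);name:(\\d+)");

    // Returns {row, field, name}, with row == null when there is no "row:" part,
    // or null when the line does not match at all
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse(";row:1;field:2;name:3"))); // [1, 2, 3]
        System.out.println(Arrays.toString(parse(";field:2;name:3")));       // [null, 2, 3]
    }
}
```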
Hello
I'm trying to create a validation rule that uses a regular expression to accept only specific phrases. The regex is for Java.
Here are examples of correct inputs:
1OR2
2
1 OR 2 OR 15
( 2OR3) AND 1
(12AND13 AND1)OR(4 AND5)
((2AND3 AND 1)OR(4AND5))AND6
but it would be fine if the regex also accepted something like:
())34AND(4
I have no idea how to create a regex that checks whether the brackets open and close correctly (they can be nested). I assumed this is impossible with a regex, so I've already implemented proper bracket validation in code (a stack implementation); it runs as a second validation step on the phrase.
All I need the regex to do is check that the phrase contains only these things:
numbers, round brackets, and the words AND and OR. Multiple occurrences and whitespace are allowed.
It should NOT accept other letters or characters.
All I managed to create so far is this:
^[0-9 \\(][0-9 \\(\\)]*
I also tried adding something like:
\\b(AND|OR)\\b
inside the second pair of brackets, but with no luck.
I cannot figure out how to correct it to allow the words OR and AND.
I used the following, and it matched all the inputs you gave:
^[^\)][0-9 \( (AND|OR)]*$
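I assumed you didn't want to start with ), which is why I included ^[^\)]. Note, though, that inside square brackets (AND|OR) is not a group: [0-9 \( (AND|OR)] is a character class of single characters (digits, space, (, ), A, N, D, O, R and |), so this pattern also accepts strings such as DOA or |||.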
In case you weren't aware, I use https://www.regexpal.com to check my regular expressions for code.
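A sketch of a stricter check, assuming the only allowed tokens are integers, AND, OR, ( and ), separated by optional whitespace. The alternation (?:\d++|AND|OR|[()]) matches whole words rather than single characters, and bracket balance is left to the stack check the asker already has. The possessive quantifiers are a choice of this sketch, to keep rejection of long digit runs linear. Class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class BooleanExprChars {
    // Whole tokens only: integers, AND, OR, '(' and ')', with optional whitespace.
    // Possessive quantifiers (*+, ++) stop the engine from re-splitting digit runs
    // when an input is rejected. Bracket balancing is NOT checked here.
    static final Pattern TOKENS =
            Pattern.compile("(?:\\s*+(?:\\d++|AND|OR|[()]))++\\s*+");

    static boolean isAllowed(String input) {
        return TOKENS.matcher(input).matches(); // matches() requires the whole input to fit
    }

    public static void main(String[] args) {
        String[] inputs = {
            "1OR2", "2", "1 OR 2 OR 15", "( 2OR3) AND 1",
            "(12AND13 AND1)OR(4 AND5)", "((2AND3 AND 1)OR(4AND5))AND6",
            "())34AND(4", "1 XOR 2", "DOA", ""
        };
        for (String s : inputs) {
            System.out.println("\"" + s + "\" -> " + isAllowed(s));
        }
    }
}
```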
Since you can have an arbitrary number of nested elements, it's arguably not possible with a regex, at least not a practical one: Java's regex engine does not support recursion.
For demonstration purposes only, this matches zero or more conjunctions and one set of parentheses:
^\d+(\s*(?:AND|OR)\s*(\d+|\(?\s*\d+(\s*(?:AND|OR)\s*\d+)\s*\)))*$|^(\d+|\(?\s*\d+(\s*(?:AND|OR)\s*\d+)\s*\))\s*(\s*(?:AND|OR)\s*\d+)*$
That's it. Adding more sets and levels of nested parentheses leads to exponentially increasing complexity, until it breaks altogether.
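Since arbitrary nesting is impractical to check with a Java regex, the bracket check belongs in code, as the asker already does with a stack. With a single bracket type a depth counter does the same job; a minimal sketch (class and method names are hypothetical):

```java
public class ParenBalance {
    // True if every ')' closes an earlier '(' and no '(' is left open.
    // With only one bracket type, a depth counter is equivalent to a stack.
    static boolean isBalanced(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // ')' with no matching '('
                }
            }
        }
        return depth == 0; // any '(' still open means unbalanced
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("((2AND3 AND 1)OR(4AND5))AND6")); // true
        System.out.println(isBalanced("())34AND(4"));                   // false
    }
}
```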
Assume I have the following string:
create or replace package test as
-- begin null; end;/
end;
/
I want a regular expression that finds semicolons that are not preceded by "--" (a double dash) on the same line. I'm using the pattern "(?!--.*);", but I still get matches for the two semicolons on the second line.
I feel like I'm missing something about negative look aheads but I can't figure out what.
If you want to match semicolons only on the lines which do not start with --, this regex should do the trick:
^(?!--).*(;)
I only made a few changes from your regex:
Multi-line mode, so we can use ^ and $ and search by line
^ at the beginning to indicate start of a line
.* between the negative lookahead and the semicolon, because otherwise, with the ^ anchor, the pattern would only match a semicolon at the very start of a line (^;), which is wrong
(I also added parentheses around the semicolon so the demo page displays the result more clearly, but this is not necessary and you can change to whatever is most convenient for your program.)
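A minimal Java sketch of this answer's pattern, assuming Java's Pattern.MULTILINE flag plays the role of the demo page's multi-line mode. Because .* is greedy, group 1 is the last semicolon on each qualifying line, and a -- in the middle of a line is not handled. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SemicolonOutsideCommentLines {
    // MULTILINE lets ^ match at the start of every line; (?!--) rejects lines starting
    // with "--"; the greedy .* makes group 1 the LAST ';' on each remaining line
    static final Pattern P = Pattern.compile("^(?!--).*(;)", Pattern.MULTILINE);

    // Index of the matched semicolon on each line that does not start with "--"
    static List<Integer> semicolonPositions(String text) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            positions.add(m.start(1));
        }
        return positions;
    }

    public static void main(String[] args) {
        String text = "create or replace package test as\n"
                + "-- begin null; end;/\n"
                + "end;\n"
                + "/";
        // Only the semicolon in "end;" on the third line is reported
        System.out.println(semicolonPositions(text));
    }
}
```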
First of all, what you need is a negative lookbehind (?<!) and not a negative lookahead (?!) since you want to check what's behind your potential match.
Even then, you won't be able to use a negative lookbehind in your case, since Java's regex engine does not support lookbehinds without a bounded maximum length, such as one containing .*. This means you need to know at most how many characters to look behind your potential match for it to work.
That said, wouldn't it be simpler in your case to just split your String by line feed/carriage return and then remove the lines that start with "--"?
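A minimal sketch of the split-and-filter approach, assuming lines may be separated by \n, \r\n or \r; String.split("\\R") (Java 8+) splits on any of these. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DropCommentLines {
    // Splits on any line break (\R matches \n, \r\n, \r, ...; Java 8+) and keeps
    // only the lines that do not start with "--"
    static List<String> codeLines(String text) {
        List<String> kept = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (!line.startsWith("--")) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "create or replace package test as\r\n"
                + "-- begin null; end;/\r\n"
                + "end;\r\n"
                + "/";
        // Any ';' left in these lines is outside a "--" comment line
        for (String line : codeLines(text)) {
            System.out.println(line);
        }
    }
}
```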
The reason "(?!--.*);" isn't working is that, when positioned before a ;, the negative lookahead asserts that the following text does not start with --. That assertion succeeds every time, because the next character is the ; itself, and ; is never --.
In Java, to match a ; that doesn't have -- anywhere before it:
"\\G(((?<!--)[^;])*);"
To see this in action using a replaceAll() call:
String s = "foo; -- begin null; end;";
s = s.replaceAll("\\G(((?<!--)[^;])*);", "$1!");
System.out.println(s);
Output:
foo! -- begin null; end;
This shows that only the semicolons before the double dash are matched.
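The same pattern can also drive a Matcher.find() loop to locate the semicolons instead of replacing them. A small sketch; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SemicolonsBeforeComment {
    // \G chains each match to the end of the previous one; (?<!--) refuses any
    // non-';' character that directly follows "--", so the chain cannot run past
    // a "--" that is followed by other text
    static final Pattern P = Pattern.compile("\\G(((?<!--)[^;])*);");

    // Index of every semicolon matched by the \G chain
    static List<Integer> positions(String s) {
        List<Integer> result = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            result.add(m.end() - 1); // the semicolon is the last character of each match
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(positions("foo; -- begin null; end;")); // [3]
        System.out.println(positions("a;b;c -- d;"));              // [1, 3]
    }
}
```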
I want to match something like this:
$(string).not(string).not(string)
The .not(string) part can repeat zero or more times after $(string).
Note that string can be anything except a nested not(string).
I used the regular expression (\\$\\((.*)\\))((\\.not\\((.*?)\\))*?)(?!(\\.not)). My idea was that *? lazily matches any number of .not(string) sequences, and the lookahead stops the match at anything that is not .not(string), so that I can extract only the part that I want.
However, when I tested it on input like
$(string).not(string).not(string).append(string)
group(0) returns the whole string, whereas I only need $(string).not(string).not(string).
Obviously I'm missing or misusing something. Any suggestions?
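Try this one (escaped for Java):
(\\$\\(string\\)(?:(?:\\.not\\(.*?\\))+))
It should capture just the part that you are after (note that + requires at least one .not(...); use * if there can be none). You can test it in an online regex tester using the unescaped form, with single backslashes.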
If we assume that parenthesis are not nested, you can write something like this:
String p = "\\$\\([^)]*\\)(?:\\.not\\([^)]*\\))*";
There is no need for a lookahead, since the non-capturing group has a greedy quantifier (so the group is repeated as many times as possible).
If what you called string in your question may be a quoted string with parentheses inside, as in Pshemo's example $(string).not(".not(foo)").not(string), you can replace each [^)]* with (?:\\s*\"[^\"]*\"\\s*|[^)]*) to ignore the characters inside quoted parts.
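A small sketch using this answer's pattern with Matcher.find() and group(); the second pattern just substitutes the quoted-string alternative for each [^)]*. The class name, the extract helper and the inputs are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NotChainExtractor {
    // The answer's pattern: assumes the arguments contain no ')' characters
    static final Pattern SIMPLE = Pattern.compile("\\$\\([^)]*\\)(?:\\.not\\([^)]*\\))*");

    // Same pattern with each [^)]* replaced by the quoted-string alternative
    static final String ARG = "(?:\\s*\"[^\"]*\"\\s*|[^)]*)";
    static final Pattern QUOTED = Pattern.compile(
            "\\$\\(" + ARG + "\\)(?:\\.not\\(" + ARG + "\\))*");

    // Returns the first $(...).not(...)... chain found in input, or null if none
    static String extract(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The greedy (?:...)* stops by itself at ".append(", no lookahead needed
        System.out.println(extract(SIMPLE, "$(string).not(string).not(string).append(string)"));
        System.out.println(extract(QUOTED, "$(string).not(\".not(foo)\").not(string)"));
    }
}
```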
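According to the Matcher documentation, "group zero denotes the entire pattern", so use group(1). (Note that because (.*) in the original pattern is greedy, group(1) still runs to the last ) in the input, so the pattern itself also needs changing, for example by using [^)]* instead of .*.)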
(\$\([\w ]+\))(\.not\([\w ]+\))*
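This will also work. It gives you two groups: group 1 is the part with the $ sign, and group 2 is the last .not(...) string, because a repeated capturing group keeps only its final repetition. To get all the .not strings in one group, wrap the repetition: ((?:\.not\([\w ]+\))*).
Please note: you will have to double the backslashes for a Java string literal.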
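In Java a repeated capturing group keeps only its last repetition, so group 2 of this pattern holds just the final .not(...) part. A quick check, alongside a variant that wraps the whole repetition in one group; the class name, helper and input are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // The answer's pattern, escaped for a Java string literal
    static final Pattern REPEATED = Pattern.compile("(\\$\\([\\w ]+\\))(\\.not\\([\\w ]+\\))*");
    // Same match, but the whole repetition is wrapped in one capturing group
    static final Pattern WRAPPED = Pattern.compile("(\\$\\([\\w ]+\\))((?:\\.not\\([\\w ]+\\))*)");

    // Returns {group 1, group 2} of the first match, or null if there is none
    static String[] groups(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String input = "$(x).not(a).not(b).append(c)";
        String[] r = groups(REPEATED, input);
        String[] w = groups(WRAPPED, input);
        System.out.println(r[0] + " | " + r[1]); // $(x) | .not(b)
        System.out.println(w[0] + " | " + w[1]); // $(x) | .not(a).not(b)
    }
}
```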
I need to match text between ${ and }
Example:
${I need to match this text}
The simple regex \\$\\{(.+?)\\} works fine until I put a } inside the text.
Curly brackets inside the text to match are always paired.
Is there any possibility to solve this by means of Regular Expressions?
\$\{((?:\{[^\{\}]*\}|[^\{\}]*)*)\}
If we meet an opening bracket, we look for its pair, and after the closing one we proceed as usual. This can't handle more than one level of nested brackets.
The main building block here is [^\{\}]*, any non-bracket sequence. It can be surrounded by brackets, \{[^\{\}]*\}, or not, hence (?:\{[^\{\}]*\}|[^\{\}]*). Any number of these sequences can be present, hence the * at the end.
Arbitrary nesting depth would need a recursive regex, which Java does not support. But any fixed depth can be matched by carefully extending this idea.
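A sketch of the one-level pattern used with Matcher.find(). One deliberate change from the answer's pattern: the second alternative uses [^{}]++ (one or more, possessive) and the loop is possessive, because a nested quantifier like (?:[^{}]*)* can backtrack exponentially on inputs that fail to match. It still handles only one level of nesting. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OneLevelBraces {
    // Same shape as \$\{((?:\{[^\{\}]*\}|[^\{\}]*)*)\}, with possessive quantifiers
    // and one-or-more in the second alternative; '{' and '}' are literal inside [...]
    static final Pattern P = Pattern.compile("\\$\\{((?:\\{[^{}]*+\\}|[^{}]++)*+)\\}");

    // The text inside every ${...} found in the input
    static List<String> contents(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(contents("${I need to match this text}")); // [I need to match this text]
        System.out.println(contents("x ${a{b}c} y ${d}"));            // [a{b}c, d]
    }
}
```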
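Add a $ to the end of the regex and don't escape it. An unescaped $ anchors the match to the end of the input, so the closing } must be the last character. This works when the ${...} expression ends the string, but it does not check that the braces are paired.
Regex: \${(.+?)}$
Java formatted: \\$\\{(.+?)\\}$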
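A sketch of the end-anchored pattern in Java, showing where it works and where it does not: because $ only anchors the last }, the lazy group stretches across several ${...} expressions on the same line. The braces are escaped because Java's Pattern rejects an unescaped { that does not start a valid quantifier. Class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndAnchorDemo {
    // \$\{(.+?)\}$ : the final $ requires the closing } to be the last character
    static final Pattern P = Pattern.compile("\\$\\{(.+?)\\}$");

    // Group 1 of the first match, or null if there is no match
    static String inner(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(inner("${a{b}c}"));      // a{b}c : works when ${...} ends the input
        System.out.println(inner("${a} and ${b}")); // a} and ${b : spans two expressions
        System.out.println(inner("${a} tail"));     // null : the } is not at the end
    }
}
```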
When I tried this regex:
\"(\S\S+)\"(?!;c)
on the string "MM:";d, it matched, as I wanted,
and on the string "MM:";c it did not match, as desired.
But when I added a second group, moving the semicolon inside it and making it optional using |:
\"(\S\S+)\"(;|)(?!c)
the string "MM:";c matched, whereas I expected it not to match, as before.
I tried this in Java and then in JavaScript using the regex tool Debuggex.
What am I doing wrong?
Note: the | is there so that the semicolon is not required. Also, the c in the examples is just a stand-in for a word, which is why I am using a negative lookahead.
After following Holger's answer about possessive quantifiers, I changed it to:
\"(\S\S+)\";?+(?!c)
and it worked (I tested it on RegexPlanet).
I believe the regex engine will do what it can to find a match. Since your expression says the semicolon is optional, it finds that it can match the entire expression: if the semicolon is not consumed by the second group, the negative lookahead sees ; instead of c and succeeds. This has to do with the way the engine backtracks: it keeps trying alternatives until it finds a match.
In other words, the process goes like this:
"MM:" - matched
(;|) - try the semicolon? matched
(?!c) - oops, the negative lookahead fails. No match. Go back
(;|) - try nothing. We still have ';c' left
(?!c) - the next character is ;, not c, so the negative lookahead succeeds. We have a match
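The backtracking described above can be observed from Java by checking what group 2 captured: it is empty, even though the input contains a semicolon. A minimal sketch; the class and helper names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroupBacktracking {
    // The question's pattern "(\S\S+)"(;|)(?!c), escaped for a Java string literal
    static final Pattern P = Pattern.compile("\"(\\S\\S+)\"(;|)(?!c)");

    // Returns {whole match, group 2} for the first match, or null if nothing matches
    static String[] firstMatch(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? new String[] {m.group(), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String[] r = firstMatch("\"MM:\";c");
        // Group 2 is empty: after ';' made (?!c) fail, the engine backtracked to the
        // empty alternative, and (?!c) then saw ';' instead of 'c'
        System.out.println("match=" + r[0] + " group2='" + r[1] + "'");
    }
}
```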
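An update (based on your comment). Adding an alternative lookahead such as ((?!c)|(?!;c)) does not help: once the semicolon is consumed, (?!;c) succeeds because only c follows, so "MM:";c still matches. Instead, allow the empty alternative only when no semicolon follows:
\"(\S\S+)\"(;|(?!;))(?!c)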
The problem is that you don’t want to make the semicolon optional in the regular-expression sense. An optional semicolon implies that the matcher is allowed to try both, matching with or without it. So even if the semicolon is there, the matcher can ignore it, creating an empty match for the group and letting the lookahead succeed.
But you want to consume the semicolon if it’s there, so it is not allowed to be used to satisfy the negative look-ahead. With Java’s regex engine that’s pretty easy: use ;?+
This is called a “possessive quantifier”. Like with the ? the semicolon doesn’t need to be there but if it’s there it must match and cannot be ignored. So the regex engine has no alternatives any more.
So the entire pattern looks like \"(\S\S+)\";?+(?!c) or \"(\S\S+)\"(;?+)(?!c) if you need the semicolon in a group.
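A minimal Java sketch of both forms of this pattern. Possessive quantifiers are supported by Java's engine, but JavaScript's RegExp does not support them, so this fix does not carry over to a JavaScript test. Class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveSemicolon {
    // "(\S\S+)";?+(?!c) : ;?+ takes the semicolon if present and never gives it back
    static final Pattern PLAIN = Pattern.compile("\"(\\S\\S+)\";?+(?!c)");
    // "(\S\S+)"(;?+)(?!c) : same, with the semicolon captured in group 2
    static final Pattern GROUPED = Pattern.compile("\"(\\S\\S+)\"(;?+)(?!c)");

    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"\"MM:\";c", "\"MM:\";d", "\"MM:\"d"};
        for (String s : inputs) {
            System.out.println(s + " -> " + found(PLAIN, s)); // false, true, true
        }
        Matcher m = GROUPED.matcher("\"MM:\";d");
        if (m.find()) {
            System.out.println("group 2 = '" + m.group(2) + "'"); // group 2 = ';'
        }
    }
}
```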