Regular Expression Issue in Java

I have searched everywhere and I cannot find what I am doing wrong.
I have this regular expression: ^(\[\[).+(\]\]) that I want to match for this data that starts just at the beginning of the line as shown below (I do not want to match anything but the things starting at the beginning of a line):
[[match this]] [[don't match this]]
{{Link GA|es}}
{{Link FA|ca}}
And for some reason it is not matching anything in Java (or in other regex "testers" such as regexpal.com). By "in Java" I mean with the String.replaceAll(String regex, String replacement) method in the Java String API.
But, if I omit the ^ and just have (\[\[).+(\]\]) it matches fine at the beginning of the line, but also matches inline instances which I do not want.
Can anyone point out what the error is here? Thank you

^ means "start of string", not "start of line", unless you use the Pattern.MULTILINE (or (?m)) option when building the regex. Also, you should be using a lazy quantifier (as pointed out by Dave Newton in his comment).
Finally, don't forget to double the backslashes:
String result = subject.replaceAll("(?m)^\\[\\[.+?\\]\\]", "");
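As a minimal sketch of the line above applied to the sample data from the question (the class and method names are made up for illustration), the following removes only the link that starts a line and leaves the inline one alone:

```java
import java.util.regex.Pattern;

class LineStartLinkDemo {
    // (?m) enables MULTILINE, so ^ also matches right after each line terminator.
    // .+? is lazy, so the match ends at the first "]]" instead of the last one.
    private static final Pattern LEADING_LINK = Pattern.compile("(?m)^\\[\\[.+?\\]\\]");

    // Removes a [[...]] link only when it starts a line.
    static String stripLeadingLinks(String text) {
        return LEADING_LINK.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "[[match this]] [[don't match this]]\n"
                + "{{Link GA|es}}\n"
                + "{{Link FA|ca}}";
        // The first line keeps " [[don't match this]]"; the other lines are unchanged.
        System.out.println(stripLeadingLinks(text));
    }
}
```

Precompiling the Pattern is only a convenience here; String.replaceAll with the same regex string behaves the same way.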

.+ is greedy: it matches as much as it can, so here it runs all the way up to the last \]\] on the line.
To stop this behaviour, just add a ? to make the quantifier non-greedy:
^\[\[.+?\]\]
This will match [[ and then consume as few characters as possible until it finds the first occurrence of ]]
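To see the difference between the greedy and the non-greedy quantifier side by side, here is a small sketch (the helper firstMatch is a hypothetical name, not part of any API) that runs both patterns against the first line of the question's data:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazyDemo {
    // Returns the text of the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "[[match this]] [[don't match this]]";
        // Greedy: .+ runs to the end of the line and backs off to the LAST "]]".
        System.out.println(firstMatch("^\\[\\[.+\\]\\]", line));
        // prints: [[match this]] [[don't match this]]

        // Lazy: .+? grows one character at a time and stops at the FIRST "]]".
        System.out.println(firstMatch("^\\[\\[.+?\\]\\]", line));
        // prints: [[match this]]
    }
}
```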

(\[\[).+(\]\]){1}+ where {1}+ means "exactly one time".

Related

Java Regex Match Pattern Groups unexpectedly matched [duplicate]

I am writing a regex that will be used for recognizing commands in a string. I have three possible words the commands could start with and they always end with a semi-colon.
I believe the regex pattern should look something like this:
(command1|command2|command3).+;
The problem, I have found, is that since . matches any character and + tells it to match one or more, it skips right over the first instance of a semi-colon and continues going.
Is there a way to get it to stop at the first instance of a semi-colon it comes across? Is there something other than . that I should be using instead?
The issue you are facing with this: (command1|command2|command3).+; is that the + is greedy, meaning that it will match as much as possible, up to the last semicolon in the input.
To fix this, you will need to make it non-greedy, and to do that you need to add the ? operator, like so: (command1|command2|command3).+?;
Just as an FYI, the same applies for the * operator. Adding a ? will make it non-greedy.
Tell it to find only non-semicolons.
[^;]+
What you are looking for is a non-greedy match.
.+?
The "?" after your greedy + quantifier will make it match as little as possible, instead of as much as possible, which it does by default.
Your regex would be
'(command1|command2|command3).+?;'
See Python RE documentation
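The three suggestions above (greedy as asked, non-greedy, and the negated character class) can be compared in Java with a short sketch. The input string and the helper findAll are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommandMatchDemo {
    // Collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "command1 foo; command2 bar;";
        // Greedy: one match that swallows both commands.
        System.out.println(findAll("(command1|command2|command3).+;", input));
        // Non-greedy: stops at the first semicolon, so each command is matched separately.
        System.out.println(findAll("(command1|command2|command3).+?;", input));
        // Negated class: [^;]+ cannot cross a semicolon at all.
        System.out.println(findAll("(command1|command2|command3)[^;]+;", input));
    }
}
```

On this input the last two patterns give the same two matches. They can differ on other inputs, for example when a command spans several lines, since . does not match line terminators by default while [^;] does.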

Regex not matching when the start or end are empty

Here is my regex as I have inputted it into my java file.
String myRegex = "(?<=[^a-zA-Z0-9])(target)(?=[^a-zA-Z0-9])";
If I have a string as follows:
.target. - it works.
However, if I have a string that JUST says target it does not work. How can I modify the regex so that if there is nothing at the start or the end of the string, it still matches?
EDIT - Examples.
_target - Should succeed!
target_ - Should succeed!
target - Should succeed!
Currently these examples fail with the current regex.
Add "start of input" to your look behind and add "end of input" to your look ahead using a regex alternation (ie | which is a logical "or"):
String myRegex = "(?<=^|[^a-zA-Z0-9])target(?=[^a-zA-Z0-9]|$)";
The problem with your regex is that your look behind required there to be a preceding character that was not a letter/digit.
These look arounds also match start/end of input.
See live demo.
The problem is that there are two kinds of negation in play here: a lookaround can be negative, and a character class can be negated. Originally, my lookarounds were positive and my character classes were negated, which says: "Look behind and make sure you find something that is not within these classes". So when there is nothing there, it won't find it and will fail. The solution was to make my lookarounds negative and my character classes positive. Now it says: "Look behind and make sure there ISN'T any of these characters". So if it is empty, it won't fail, because the condition is still met.
This is the final regex:
String myRegex = "(?<![a-zA-Z0-9])target(?![a-zA-Z0-9])";
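A quick way to check the negative-lookaround version against the examples from the question is a sketch like the following (the class and method names are hypothetical, and the two rejected inputs are added for contrast):

```java
import java.util.regex.Pattern;

class TargetBoundaryDemo {
    // "target" must not be preceded or followed by a letter or digit.
    // At the very start or end of the input there is no such character,
    // so the negative lookarounds succeed there as well.
    private static final Pattern TARGET =
            Pattern.compile("(?<![a-zA-Z0-9])target(?![a-zA-Z0-9])");

    static boolean containsTarget(String s) {
        return TARGET.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {".target.", "_target", "target_", "target", "xtarget", "target9"};
        for (String s : samples) {
            // The first four print true; "xtarget" and "target9" print false.
            System.out.println(s + " -> " + containsTarget(s));
        }
    }
}
```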
If I'm understanding your question correctly, instead of using the look ahead and look behind, you can just use the ? to indicate that there should be 0 or 1 non-alphanumeric characters before and after "target".
([^a-zA-Z0-9])?(target)([^a-zA-Z0-9])?
You should be able to match target using the * (0 or more) quantifier, which allows zero or more occurrences of the characters you want around target. So:
[_]*(target)[_]*
should match:
_target
target
target_
_target_
Add any element you want to be matched before or after the word to the brackets. Example to match .target. too:
[\._]*(target)[\._]*
This matches the target substring wherever it appears in the string. If you want the rule to match only at the start of the string, add the ^ anchor:
^[\._]*(target)[\._]*
This matches the examples above only if they start the string.

Word that matches ^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$

I am totally confused right now.
What is a word that matches: ^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$
I tried the string 1Test#! at Regex 101. However, that does not work.
I really appreciate your input!
What happens is that your regex seems to be written as a Java string literal (note the \\d),
which is why you have to convert it to work with regex101, which does not work with Java (it only works with PHP, Python, and JavaScript).
See the converted regex:
^.*(?=.*\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$
which will match your string 1Test#!. Demo here: http://regex101.com/r/gE3iQ9
You just want something that matches that regex?
Here:
a1a!
This pattern matches
\dTest#!
If you want a pattern which matches 1Test#!, try this pattern:
^.*(?=.*\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$
Your Java string ^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$ encodes the regular expression ^.*(?=.*\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$.
This is because \ is the escape character in Java string literals, so \\ in the source stands for a single backslash.
The latter matches the string you specified.
If your original string was a regexp, rather than a Java string, it would match strings such as \dTest#!
Also, you should consider removing the first .*, since doing so would make the regexp more efficient. The reason is that regexps are greedy by default. So it will start by matching the whole string to the initial .*, and the lookaheads will then fail. The regexp will backtrack, matching the first .* to all but the last character, and will fail all but one of the lookaheads. This will proceed until it hits a point where the different lookaheads succeed. Dropping the first .* and putting the lookaheads immediately after the start-of-string anchor will avoid this problem, and in this case the set of strings matched will be the same.
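As a rough check of the points above, the sketch below writes the pattern as a Java string literal (so \d becomes \\d), once with the leading .* and once without it. The class name, constants, and sample strings are illustrative assumptions:

```java
import java.util.regex.Pattern;

class LookaheadDemo {
    // The pattern from the question, written as a Java string literal.
    static final Pattern ORIGINAL =
            Pattern.compile("^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$");

    // Same lookaheads placed directly after ^, without the leading .*
    static final Pattern ANCHORED =
            Pattern.compile("^(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$");

    public static void main(String[] args) {
        String[] samples = {"1Test#!", "a1a!", "Test#!", "12345"};
        for (String s : samples) {
            // Both patterns accept "1Test#!" and "a1a!" and reject the other two,
            // which lack a digit or a letter and special character respectively.
            System.out.println(s + " -> " + ORIGINAL.matcher(s).matches()
                    + " / " + ANCHORED.matcher(s).matches());
        }
    }
}
```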

Regex why does negative lookahead not work when there are two groups here

When I tried this regex
\"(\S\S+)\"(?!;c)
on this string "MM:";d it comes as matched as I wanted
and on this string "MM:";c it comes as not matched as desired.
But when I add a second group, by moving the semicolon inside that group and making it optional using |
\"(\S\S+)\"(;|)(?!c)
for this string "MM:";c it comes as matched, when I expected it not to match, as before.
I tried this on Java and then on Javascript using Regex tool debuggex:
This link contains a snippet of the above
What am I doing wrong?
Note that the | is there so that the semicolon is not required. Also, the c in the examples is just a stand-in for a word; that's why I am using negative lookahead.
After following Holger's response of using possessive quantifiers,
\"(\S\S+)\";?+(?!c)
it worked, here is a link to it on RegexPlanet
I believe that the regex engine will do whatever it can to find a match. Since your expression says the semicolon is optional, it found that it could match the entire expression: if the semicolon is not consumed by the optional group, the next character is ; rather than c, so the negative lookahead succeeds. This has to do with the backtracking way that regex matching works: it keeps trying alternatives until it finds a match...
In other words, the process goes like this:
"MM:" - matched
(;|) - try semicolon? matched
(?!c) - oops - the next character is c, so the negative lookahead fails. No match. Go back
(;|) - try nothing. We still have ';c' left to match
(?!c) - the next character is ';', not c, so the negative lookahead succeeds. We have a match
An update (based on your comment). The following code may work better:
\"(\S\S+)\"(;|)((?!c)|(?!;c))
Debuggex Demo
The problem is that you don’t want to make the semicolon optional in the sense of regular expressions. An optional semicolon implies that the matcher is allowed to try both, matching with or without it. So even if the semicolon is there, the matcher can ignore it, creating an empty match for the group and letting the lookahead succeed.
But you want to consume the semicolon if it’s there, so it is not allowed to be used to satisfy the negative look-ahead. With Java’s regex engine that’s pretty easy: use ;?+
This is called a “possessive quantifier”. Like with the ? the semicolon doesn’t need to be there but if it’s there it must match and cannot be ignored. So the regex engine has no alternatives any more.
So the entire pattern looks like \"(\S\S+)\";?+(?!c) or \"(\S\S+)\"(;?+)(?!c) if you need the semicolon in a group.
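To make the difference concrete, here is a small sketch that runs the optional-group pattern and the possessive pattern against the two strings from the question. The class and constant names are made up for illustration:

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // Optional semicolon: the engine may backtrack and match without it.
    static final Pattern OPTIONAL = Pattern.compile("\"(\\S\\S+)\"(;|)(?!c)");

    // Possessive ;?+ : once the semicolon is consumed it is never given back.
    static final Pattern POSSESSIVE = Pattern.compile("\"(\\S\\S+)\";?+(?!c)");

    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String withC = "\"MM:\";c"; // the text "MM:";c
        String withD = "\"MM:\";d"; // the text "MM:";d

        System.out.println(found(OPTIONAL, withC));   // true  (the unwanted match)
        System.out.println(found(POSSESSIVE, withC)); // false (semicolon stays consumed)
        System.out.println(found(OPTIONAL, withD));   // true
        System.out.println(found(POSSESSIVE, withD)); // true
    }
}
```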

Why doesn't this regex work as expected in Java?

Trivial regex question (the answer is most probably Java-specific):
"#This is a comment in a file".matches("^#")
This returns false. As far as I can see, ^ means what it always means and # has no special meaning, so I'd translate ^# as "A '#' at the beginning of the string". Which should match. And so it does, in Perl:
perl -e "print '#This is a comment'=~/^#/;"
prints "1". So I'm pretty sure the answer is something Java specific. Would somebody please enlighten me?
Thank you.
Matcher.matches() checks to see if the entire input string is matched by the regex.
Since your regex only matches the very first character, it returns false.
You'll want to use Matcher.find() instead.
Granted, it can be a bit tricky to find the concrete specification, but it's there:
String.matches() is defined as doing the same thing as Pattern.matches(regex, str).
Pattern.matches() in turn is defined as Pattern.compile(regex).matcher(input).matches().
Pattern.compile() returns a Pattern.
Pattern.matcher() returns a Matcher
Matcher.matches() is documented like this (emphasis mine):
Attempts to match the entire region against the pattern.
The matches method matches your regex against the entire string.
So try adding a .* to match rest of the string.
"#This is a comment in a file".matches("^#.*")
which returns true. One can even drop both anchors (start and end) from the regex, because the matches method behaves as if they were there. So in the above case we could also have used "#.*" as the regex.
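The difference between matches() and find() described in these answers can be checked with a short sketch (the helper names wholeMatch and partialMatch are hypothetical):

```java
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    // True only if the regex matches the ENTIRE input (what String.matches does).
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches ANYWHERE in the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String line = "#This is a comment in a file";
        System.out.println(wholeMatch("^#", line));   // false: only the first character is covered
        System.out.println(partialMatch("^#", line)); // true: a '#' is found at the start
        System.out.println(wholeMatch("^#.*", line)); // true: .* covers the rest of the string
        System.out.println(wholeMatch("#.*", line));  // true: the anchors are not needed with matches()
    }
}
```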
This should meet your expectations:
"#This is a comment in a file".matches("^#.*$")
Now the input String matches the pattern "First char shall be #, the rest shall be any char"
Following Joachim's comment, the following is equivalent:
"#This is a comment in a file".matches("#.*")
