Word that matches ^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$ - java

I am totally confused right now.
What is a word that matches: ^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$
I tried 1Test#! at Regex 101, but that does not work.
I really appreciate your input!

Your regex appears to be written as a Java string literal (note the \\d),
so you have to convert it before testing it on regex101, which does not support Java (it only offers PHP, Python, and JavaScript flavors).
Here is the converted regex:
^.*(?=.*\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$
which will match your string 1Test#!. Demo here: http://regex101.com/r/gE3iQ9

You just want something that matches that regex?
Here:
a1a!

This pattern matches
\dTest#!
If you want a pattern that matches 1Test#!, try this pattern:
^.*(?=.*\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$

Your Java string ^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$ encodes the regular expression ^.*(?=.*\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$.
This is because \ is the escape character in Java string literals, so \\ stands for a single backslash.
The latter matches the string you specified.
If your original string were a regexp rather than a Java string, it would match strings such as \dTest#!
Also, you should consider removing the first .*; doing so would make the regexp more efficient. The reason is that regexp quantifiers are greedy by default, so the initial .* starts by matching the whole string, and the lookaheads then fail. The regexp backtracks, matching the first .* against all but the last character, and the lookaheads are tried again and usually fail again. This continues until the engine reaches a position where all the lookaheads succeed. Dropping the first .*, so that the lookaheads come immediately after the start-of-string anchor, avoids this problem, and in this case the set of strings matched stays the same.
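To make the escaping point concrete, here is a minimal Java sketch. The class and method names (PasswordRegexDemo, matchesOriginal, matchesSimplified) are made up for this example; the first pattern is the one from the question and the second is the same pattern with the leading .* dropped, as suggested above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the two regexes come from the thread.
class PasswordRegexDemo {
    // In Java source, "\\d" is the two-character regex token \d (a digit).
    static final Pattern ORIGINAL =
        Pattern.compile("^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$");

    // Same lookaheads, placed directly after the start anchor.
    static final Pattern SIMPLIFIED =
        Pattern.compile("^(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$");

    static boolean matchesOriginal(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    static boolean matchesSimplified(String s) {
        return SIMPLIFIED.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1Test#!", "a1a!", "Test#!"}) {
            System.out.println(s + " -> " + matchesOriginal(s)
                + " / " + matchesSimplified(s));
        }
        // 1Test#! -> true / true
        // a1a! -> true / true
        // Test#! -> false / false   (no digit)
    }
}
```

On these inputs both patterns agree, which is consistent with the claim that dropping the first .* does not change the set of matched strings.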

Related

Java Regex missing a match in the output

I am currently matching a string against a regular expression. My pattern is:
"(?<=\p{Alnum}|\p{Punct})(\p{Alnum}+\p{Punct}{1})"
I am matching it with the string:
"https://www.google.com/"
My desired result with the above regex and string is:
https:, www., google., com/
I am able to get all the matches successfully except 'https:' one. In that case it is giving out 'ttps:' instead of the required 'https:'
I am not able to understand where I went wrong. Can anyone please help me in figuring this out?
You can use
(?<![^\p{Alnum}\p{Punct}])(\p{Alnum}+\p{Punct})
See the online regex demo.
The (?<![^\p{Alnum}\p{Punct}]) negative lookbehind matches a location that is not immediately preceded by a char other than an alphanumeric or punctuation char, so it also succeeds at the start of the string.
Note that your regex required an alphanumeric or punctuation char immediately on the left, so it was impossible to match the start of string position.
Note that {1} is always redundant, you can see more about regex redundancy in the "Writing cleaner regular expressions" YT video of mine.
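A small Java sketch comparing the two patterns on the URL from the question. The class UrlPieces and the helper findAll are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the two regexes are the ones discussed above.
class UrlPieces {
    // Collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String url = "https://www.google.com/";

        // Positive lookbehind: needs a char on the left, so 'h' is skipped.
        System.out.println(findAll(
            "(?<=\\p{Alnum}|\\p{Punct})(\\p{Alnum}+\\p{Punct}{1})", url));
        // [ttps:, www., google., com/]

        // Negative lookbehind: also succeeds at the start of the string.
        System.out.println(findAll(
            "(?<![^\\p{Alnum}\\p{Punct}])(\\p{Alnum}+\\p{Punct})", url));
        // [https:, www., google., com/]
    }
}
```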

regular expression to remove a string conditionally

I want to remove dd/ or /dd/ or /dd but then if it's /dd/ I want to replace it with / so that it looks like MM/YYYY.
dd/MM/YYYY
MM/dd/YYYY
MM/YYYY/dd
[^\p{Alpha}]*d+[^\p{Alpha}]*
The above is my current regex.
What I want to achieve is either,
MM/YYYY
YYYY/MM
Cause right now, if I replace /dd/, it results in
MMYYYY
or
YYYYMM
The most understandable approach is to spell out the three options as alternatives:
/dd(?=/)|^dd/|/dd$
That is:
/dd(?=/) the string "/dd" anywhere in the text followed by (positive lookahead) a "/"
or ^dd/ the string "dd/" at the beginning of the text
or /dd$ the string "/dd" at the end of the text
The first alternative is written with a lookahead for the ending slash after "/dd" so that this slash is not consumed, and left in the string so that "MM/dd/YYYY" keeps one slash in the middle.
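As a rough sketch of how the three alternatives behave in Java, assuming dd stands for the literal letters "dd" (the class DayStripper and the method stripDay are hypothetical names):

```java
// Hypothetical demo class; the regex is the one spelled out above.
class DayStripper {
    // Removes the day part while leaving exactly one slash between the rest.
    static String stripDay(String format) {
        return format.replaceAll("/dd(?=/)|^dd/|/dd$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripDay("dd/MM/YYYY")); // MM/YYYY  (via ^dd/)
        System.out.println(stripDay("MM/dd/YYYY")); // MM/YYYY  (via /dd(?=/))
        System.out.println(stripDay("MM/YYYY/dd")); // MM/YYYY  (via /dd$)
        System.out.println(stripDay("YYYY/dd/MM")); // YYYY/MM
    }
}
```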
I think what you're looking for are the Pattern and Matcher classes in Java. Create a Matcher for your regular expression and call its replaceAll() or replaceFirst() method.
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
You can achieve this using the regex: dd/|/dd.
Not sure though by d you meant digits or just d.
The regex you have there is more general and matches much more than required.

Regex why does negative lookahead not work when there are two groups here

when I tried this regex
\"(\S\S+)\"(?!;c)
on this string "MM:";d it comes as matched as I wanted
and on this string "MM:";c it comes as not matched as desired.
But when I add a second group, by moving the semicolon inside that group and making it optional using |
\"(\S\S+)\"(;|)(?!c)
for this string "MM:";c it comes as matched when I expected it to not like before.
I tried this on Java and then on Javascript using Regex tool debuggex:
This link contains a snippet of the above
What am I doing wrong?
Note that the | is there so that the semicolon is not required. Also, the c in the examples is just a stand-in for a word, which is why I am using a negative lookahead.
After following Holger's suggestion of using possessive quantifiers,
\"(\S\S+)\";?+(?!c)
it worked, here is a link to it on RegexPlanet
I believe the regex engine will do whatever it can to find a match. Since your expression makes the semicolon optional, the engine finds that it can match the entire expression by leaving the semicolon unconsumed: the negative lookahead then sees ; rather than c, and succeeds. This comes from the backtracking way that regex engines work: they keep trying alternatives until they find a match...
In other words, the process goes like this:
"MM:" - matched
(;|) - try semicolon? matched
(?!c) - oops, the next character is 'c', so the negative lookahead fails. No match. Go back
(;|) - try nothing. We still have ';c' left to match
(?!c) - the next character is ';', not 'c', so the negative lookahead succeeds. We have a match
An update (based on your comment). The following code may work better:
\"(\S\S+)\"(;|)((?!c)|(?!;c))
The problem is that you don’t want the semicolon to be optional in the regular-expression sense. An optional semicolon means the matcher is allowed to try both matching with it and matching without it. So even if the semicolon is there, the matcher can skip it, producing an empty match for the group and letting the lookahead succeed.
But you want to consume the semicolon if it’s there, so it is not allowed to be used to satisfy the negative look-ahead. With Java’s regex engine that’s pretty easy: use ;?+
This is called a “possessive quantifier”. As with ?, the semicolon doesn’t need to be there, but if it is there it must be matched and cannot be skipped, so the regex engine has no alternatives left to try.
So the entire pattern looks like \"(\S\S+)\";?+(?!c) or \"(\S\S+)\"(;?+)(?!c) if you need the semicolon in a group.
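Here is a minimal Java sketch contrasting the two quantifiers on the strings from the question. The class OptionalSemicolon and the helper found are hypothetical names; the patterns are the ones quoted above, written as Java string literals.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing (;|) with the possessive ;?+
class OptionalSemicolon {
    // Ordinary alternation: the engine may backtrack and skip the semicolon.
    static final Pattern BACKTRACKING =
        Pattern.compile("\"(\\S\\S+)\"(;|)(?!c)");

    // Possessive quantifier: once ';' is consumed it is never given back.
    static final Pattern POSSESSIVE =
        Pattern.compile("\"(\\S\\S+)\";?+(?!c)");

    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String withC = "\"MM:\";c";
        String withD = "\"MM:\";d";

        System.out.println(found(BACKTRACKING, withC)); // true  (unwanted)
        System.out.println(found(POSSESSIVE, withC));   // false
        System.out.println(found(BACKTRACKING, withD)); // true
        System.out.println(found(POSSESSIVE, withD));   // true
    }
}
```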

Regular Expression Issue in Java

I have searched everywhere and I cannot find what I am doing wrong.
I have this regular expression: ^(\[\[).+(\]\]) that I want to match for this data that starts just at the beginning of the line as shown below (I do not want to match anything but the things starting at the beginning of a line):
[[match this]] [[don't match this]]
{{Link GA|es}}
{{Link FA|ca}}
And for some reason it is not matching anything in Java (or in other regex "testers" such as regexpal.com). By "in Java" I mean with the String.replaceAll(String regex, String replacement) method in the Java String API.
But, if I omit the ^ and just have (\[\[).+(\]\]) it matches fine at the beginning of the line, but also matches inline instances which I do not want.
Can anyone point out what the error is here? Thank you
^ means "start of string", not "start of line", unless you use the Pattern.MULTILINE (or (?m)) option when building the regex. Also, you should be using a lazy quantifier (as pointed out by Dave Newton in his comment).
Finally, don't forget to double the backslashes:
String result = subject.replaceAll("(?m)^\\[\\[.+?\\]\\]", "");
.+ is greedy, in that it will match everything it can (here, matching everything up to the last \]\]).
To stop this behaviour, just add a ? to make it non-greedy:
^\[\[.+?\]\]
This will match [[ and then any characters until it finds the first occurrence of ]].
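A short Java sketch of both points (multiline ^ and the lazy quantifier). The class LineStartBrackets, its method names, and the arrangement of the sample lines are assumptions made for this illustration; the regex is the one from the answers above.

```java
// Hypothetical demo class; sample text is rearranged from the question.
class LineStartBrackets {
    // Without (?m), ^ only matches at the very start of the input.
    static String stripWithoutMultiline(String text) {
        return text.replaceAll("^\\[\\[.+?\\]\\]", "");
    }

    // With (?m), ^ also matches right after each line terminator.
    static String stripWithMultiline(String text) {
        return text.replaceAll("(?m)^\\[\\[.+?\\]\\]", "");
    }

    public static void main(String[] args) {
        String text = "{{Link GA|es}}\n"
                    + "[[match this]] [[don't match this]]\n"
                    + "{{Link FA|ca}}";

        // Unchanged: the input does not start with [[
        System.out.println(stripWithoutMultiline(text).equals(text)); // true

        // Only the [[...]] at the start of the second line is removed.
        System.out.println(stripWithMultiline(text));
    }
}
```

Because .+? is lazy, the match stops at the first ]] on the line, so the inline [[don't match this]] is left in place.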
(\[\[).+(\]\]){1}+ where {1}+ means exactly one time.

Regex (Java) to remove all characters up to but not including (a number or a letter a-f followed by a number)

I need help constructing the regular expression to remove all characters up to but not including (a number or a letter a-f followed by a number) in Java:
Here's what I came up with (doesn't work):
string.replaceFirst(".+?(\\d|[a-f]\\d)","");
That line of code replaces the entire string with an empty string.
.+? is every character up to either \\d (a digit) or [a-f]\\d (any of the letters a-f followed by a digit).
This doesn't work, however, can I have some help?
Thanks
EDIT: changed replace with replaceFirst
First off, replace() acts on literals, not regexes. You should use replaceFirst or replaceAll depending on what you want. Your regex problem is that you're including the suffix as part of the string to replace. You can give this a try:
input.replaceFirst(".+?(\\d|[a-f]\\d)","$1")
Here I just include the suffix in the replacement string as well. The more correct approach is to make that a zero-width assertion so that it doesn't get included in the region to replace. You can use a positive lookahead:
input.replaceFirst(".+?(?=(\\d|[a-f]\\d))", "")
The other answers given here have the problem that if the string starts with a-f followed by a number, or just a number, they will actually match and replace the first character. Not sure if that's a relevant scenario. This more convoluted pattern should work though:
"([^a-f\\d]|([a-f](?!\\d)))+"
(that is, everything that's not a digit or a-f, or a-f not followed by a digit).
I'd suggest something along the lines of
string.replaceFirst(".*?(?=(\\d|[a-f]\\d))", "");
s = s.replaceFirst(".*?(?=[a-f]?\\d)", "");
Using .*? instead of .+? ensures that the first character gets checked by the lookahead, solving the problem #johusman mentioned. And while your (\\d|[a-f]\\d) isn't causing a problem, [a-f]?\\d is both more efficient and more readable.
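The difference between .+? and .*? can be seen in a small Java sketch. The class PrefixStripper, its method names, and the sample inputs are invented for this example; the two regexes are taken from the answers above.

```java
// Hypothetical demo class comparing the lazy-plus and lazy-star variants.
class PrefixStripper {
    // .+? must consume at least one character before the lookahead is tried.
    static String stripLazyPlus(String s) {
        return s.replaceFirst(".+?(?=(\\d|[a-f]\\d))", "");
    }

    // .*? may match nothing, so the lookahead is checked at position 0 too.
    static String stripLazyStar(String s) {
        return s.replaceFirst(".*?(?=[a-f]?\\d)", "");
    }

    public static void main(String[] args) {
        System.out.println(stripLazyPlus("xyz a1b2")); // a1b2
        System.out.println(stripLazyStar("xyz a1b2")); // a1b2

        // Input that already starts with a-f followed by a digit:
        System.out.println(stripLazyPlus("a1b2"));     // 1b2  (loses the 'a')
        System.out.println(stripLazyStar("a1b2"));     // a1b2
    }
}
```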
