Java Regex won't match, any explanations? - java

The regex String :
"[Ff][uU][Nn][Cc] "
Matches input:
"fUnC "
But not:
"func across( a, b )"
And I don't understand why...
I'm testing my expressions here:
http://www.regexplanet.com/simple/index.html
I figured out that I (dumbly) needed my regex to be "[Ff][uU][Nn][Cc] .*" for a match.
SOLVED: Don't use the convenience method Pattern.matches(regex, input) if you are looking for what amounts to a submatch. Use the Matcher.find() method instead.

When I use the regex tester you link to, I see that your regex works with find(), but not with matches(). This is what I would expect -- find() just looks for a regex hit within the target string, while matches() always tries to match the entire string.
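To make the difference concrete, here is a minimal sketch using the regex and inputs from the question. The class and method names (FindVsMatches, matchesWhole, foundAnywhere) are just made up for illustration.

```java
import java.util.regex.Pattern;

public class FindVsMatches {
    // The pattern from the question, compiled once and reused.
    static final Pattern FUNC = Pattern.compile("[Ff][uU][Nn][Cc] ");

    // True only when the pattern consumes the whole input.
    static boolean matchesWhole(String input) {
        return FUNC.matcher(input).matches();
    }

    // True when the pattern hits anywhere inside the input.
    static boolean foundAnywhere(String input) {
        return FUNC.matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "func across( a, b )";
        System.out.println(matchesWhole("fUnC "));  // true: the input is exactly one hit
        System.out.println(matchesWhole(input));    // false: "across( a, b )" is left over
        System.out.println(foundAnywhere(input));   // true: a hit exists at the start
    }
}
```

If the hit has to be at the start of the input but need not cover all of it, Matcher.lookingAt() is a third option that sits between the two.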

"[Ff][uU][Nn][Cc].*" may help...

It is working fine. Put your strings in there and you'll see that matches() is false, while replaceFirst() and replaceAll() work fine.
If you want matches() to be true,
add .* at the end.

Have you also tried using the regex tester, ignoring case? There should be a way to turn on case insensitivity in the Java regex matcher.
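For what it's worth, case insensitivity can be switched on either with the Pattern.CASE_INSENSITIVE flag or with the inline (?i) flag. A small sketch, assuming you only care whether the input starts with "func " in any mix of cases (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

public class CaseInsensitiveFunc {
    // Same idea as [Ff][uU][Nn][Cc] followed by a space, using a compile flag.
    static final Pattern WITH_FLAG = Pattern.compile("func ", Pattern.CASE_INSENSITIVE);

    // Equivalent pattern using the inline flag instead.
    static final Pattern WITH_INLINE = Pattern.compile("(?i)func ");

    // lookingAt() requires a hit at the start of the input, but not a full match.
    static boolean startsWithFunc(Pattern pattern, String input) {
        return pattern.matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(startsWithFunc(WITH_FLAG, "FuNc across( a, b )"));   // true
        System.out.println(startsWithFunc(WITH_INLINE, "FUNC across( a, b )")); // true
        System.out.println(startsWithFunc(WITH_FLAG, "function"));              // false: no space after "func"
    }
}
```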

Related

String matches() not able to pick ^ [duplicate]

trivial regex question (the answer is most probably Java-specific):
"#This is a comment in a file".matches("^#")
This returns false. As far as I can see, ^ means what it always means and # has no special meaning, so I'd translate ^# as "A '#' at the beginning of the string". Which should match. And so it does, in Perl:
perl -e "print '#This is a comment'=~/^#/;"
prints "1". So I'm pretty sure the answer is something Java specific. Would somebody please enlighten me?
Thank you.
Matcher.matches() checks to see if the entire input string is matched by the regex.
Since your regex only matches the very first character, it returns false.
You'll want to use Matcher.find() instead.
Granted, it can be a bit tricky to find the concrete specification, but it's there:
String.matches() is defined as doing the same thing as Pattern.matches(regex, str).
Pattern.matches() in turn is defined as Pattern.compile(regex).matcher(input).matches().
Pattern.compile() returns a Pattern.
Pattern.matcher() returns a Matcher
Matcher.matches() is documented like this (emphasis mine):
Attempts to match the entire region against the pattern.
The matches method matches your regex against the entire string.
So try adding a .* to match rest of the string.
"#This is a comment in a file".matches("^#.*")
which returns true. You can even drop both anchors (start and end) from the regex, because matches() already requires the whole string to match. So in the above case we could also have used "#.*" as the regex.
This should meet your expectations:
"#This is a comment in a file".matches("^#.*$")
Now the input String matches the pattern "First char shall be #, the rest shall be any char"
Following Joachim's comment, the following is equivalent:
"#This is a comment in a file".matches("#.*")
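Putting the answers above side by side, here is a rough sketch of the three equivalent ways to call matches(), plus find() for comparison. The class name and helper names are invented for this example; the string and the regexes are the ones from the question.

```java
import java.util.regex.Pattern;

public class CommentLine {
    static final String LINE = "#This is a comment in a file";

    // The three spellings below are documented to behave the same way.
    static boolean viaString(String regex) {
        return LINE.matches(regex);
    }

    static boolean viaPattern(String regex) {
        return Pattern.matches(regex, LINE);
    }

    static boolean viaMatcher(String regex) {
        return Pattern.compile(regex).matcher(LINE).matches();
    }

    // find() only needs a hit somewhere in the input.
    static boolean viaFind(String regex) {
        return Pattern.compile(regex).matcher(LINE).find();
    }

    public static void main(String[] args) {
        System.out.println(viaString("^#"));     // false
        System.out.println(viaPattern("^#"));    // false
        System.out.println(viaMatcher("^#"));    // false
        System.out.println(viaFind("^#"));       // true
        System.out.println(viaString("^#.*$"));  // true
        System.out.println(viaString("#.*"));    // true
    }
}
```

For this particular check no regex is needed at all: LINE.startsWith("#") says the same thing.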

Word that matches ^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$

I am totally confused right now.
What is a word that matches: ^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$
I tried 1Test#! at Regex 101, but that does not work.
I really appreciate your input!
Your regex seems to be written as a Java string literal (note the \\d),
so you have to convert it before it will work on regex101, which does not support Java (only PHP, Python, and JavaScript).
see converted regex:
^.*(?=.*\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$
which will match your string 1Test#!. Demo here: http://regex101.com/r/gE3iQ9
You just want something that matches that regex?
Here:
a1a!
Read as a raw regex, your pattern matches
\dTest#!
If you want a pattern that matches 1Test#!, try this pattern:
^.*(?=.*\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$
Your Java string ^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$ encodes the regular expression ^.*(?=.*\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$.
This is because \ is the escape character in Java string literals, so \\ stands for a single backslash.
The latter matches the string you specified.
If your original string were a regexp, rather than a Java string, it would match strings such as \dTest#!
Also, you should consider removing the first .*; doing so would make the regexp more efficient. The reason is that quantifiers are greedy by default. The initial .* starts by matching the whole string, and the lookaheads then fail. The regexp backtracks, matching the first .* to all but the last character, and the lookaheads are tried again. This continues until it reaches a position where all the lookaheads succeed. Dropping the first .* and putting the lookaheads immediately after the start-of-string anchor avoids this problem, and in this case the set of strings matched is the same.
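Here is a small sketch of the points above, checked against the strings mentioned in this thread. The class, constant, and method names are made up for the example; the character class is copied from the question as-is.

```java
import java.util.regex.Pattern;

public class LookaheadCheck {
    // In Java source, "\\d" is the two-character regex \d (a digit).
    static final Pattern ORIGINAL =
        Pattern.compile("^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$");

    // Same lookaheads, without the leading .* that forces extra backtracking.
    static final Pattern SIMPLIFIED =
        Pattern.compile("^(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$");

    // What you get if \\d is taken as raw regex: a literal backslash followed by 'd'.
    // In Java source that needs four backslashes.
    static final Pattern BACKSLASH_D =
        Pattern.compile("^.*(?=.*\\\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$");

    static boolean accepts(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts(ORIGINAL, "1Test#!"));        // true
        System.out.println(accepts(SIMPLIFIED, "1Test#!"));      // true
        System.out.println(accepts(SIMPLIFIED, "a1a!"));         // true
        System.out.println(accepts(SIMPLIFIED, "Test#!"));       // false: no digit
        System.out.println(accepts(BACKSLASH_D, "\\dTest#!"));   // true: input is \dTest#!
        System.out.println(accepts(BACKSLASH_D, "1Test#!"));     // false: no backslash
    }
}
```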

string.matches(regex) returns false, although I think it should be true

I am working with Java regular expressions.
Oh, I really miss Perl!! Java regular expressions are so hard.
Anyway, below is my code.
String oneLine = "{\"kind\":\"list\",\"items\"";
System.out.println(oneLine.matches("kind"));
I expected "true" to be shown on the screen, but I could only see "false".
What's wrong with the code? And how can I fix it?
Thank you in advance!!
String#matches() takes a regex as a parameter, in which anchors are implicit. So your pattern has to match from the beginning to the end of the string.
Since your string is not just "kind", it returns false.
Now, as per your current problem, I think you don't need to use regex here. Simply using String#contains() method will work fine: -
oneLine.contains("kind");
Or, if you want to use matches, then build the regex to match complete string: -
oneLine.matches(".*kind.*");
The .matches method is intended to match the entire string. So you need something like:
.*kind.*
Demo: http://ideone.com/Gb5MQZ
matches() tries to match the whole string (implicit ^ and $ anchors). Use contains() to check for parts of the string.
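As a quick sketch of the options mentioned in these answers, using the string from the question (the class and helper names are only for illustration):

```java
import java.util.regex.Pattern;

public class ContainsKind {
    static final String ONE_LINE = "{\"kind\":\"list\",\"items\"";

    // matches() needs the regex to cover the whole string.
    static boolean wholeMatch(String regex) {
        return ONE_LINE.matches(regex);
    }

    // contains() is a plain substring test, no regex involved.
    static boolean hasSubstring(String text) {
        return ONE_LINE.contains(text);
    }

    // find() is the regex equivalent of a substring search.
    static boolean regexFound(String regex) {
        return Pattern.compile(regex).matcher(ONE_LINE).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("kind"));      // false
        System.out.println(wholeMatch(".*kind.*"));  // true
        System.out.println(hasSubstring("kind"));    // true
        System.out.println(regexFound("kind"));      // true
    }
}
```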

Whitespace in Java's regular expression

I'm trying to write a regular expression to match an IRC PRIVMSG string. It is something like:
:nick!name#some.host.com PRIVMSG #channel :message body
So I wrote the following code:
Pattern pattern = Pattern.compile("^:.*\\sPRIVMSG\\s#.*\\s:");
Matcher matcher = pattern.matcher(msg);
if(matcher.matches()) {
System.out.println(msg);
}
It does not work. I got no matches. When I test the regular expression using online javascript testers, I got matches.
I tried to find the reason, why it doesn't work and I found that there's something wrong with the whitespace symbol. The following pattern will give me some matches:
Pattern.compile("^:.*");
But the pattern with \s will not:
Pattern.compile("^:.*\\s");
It's confusing.
The Java matches() method strikes again! That method only returns true if the entire input matches the pattern. You didn't include anything that matches the message body after the second colon, so the entire string is not a match. It works in testers because a 'normal' regex test counts as a match if any part of the input matches.
Pattern pattern = Pattern.compile("^:.*?\\sPRIVMSG\\s#.*?\\s:.*$");
Should match
If you look at the documentation for matches(), you will notice that it tries to match the entire string. You need to fix your regexp or use find() to iterate through the substring matches.
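Both fixes can be sketched like this, using the sample line from the question as the input. The class and method names are hypothetical; the two patterns are the question's original and the extended one suggested above.

```java
import java.util.regex.Pattern;

public class PrivmsgCheck {
    static final String MSG = ":nick!name#some.host.com PRIVMSG #channel :message body";

    // The pattern from the question: it stops at the colon before the message body.
    static final Pattern ORIGINAL = Pattern.compile("^:.*\\sPRIVMSG\\s#.*\\s:");

    // The extended pattern from the answer: the trailing .*$ covers the message body.
    static final Pattern FIXED = Pattern.compile("^:.*?\\sPRIVMSG\\s#.*?\\s:.*$");

    static boolean originalMatches(String msg) {
        return ORIGINAL.matcher(msg).matches();
    }

    static boolean originalFind(String msg) {
        return ORIGINAL.matcher(msg).find();
    }

    static boolean fixedMatches(String msg) {
        return FIXED.matcher(msg).matches();
    }

    public static void main(String[] args) {
        System.out.println(originalMatches(MSG)); // false: "message body" is not covered
        System.out.println(originalFind(MSG));    // true: a prefix of the line is a hit
        System.out.println(fixedMatches(MSG));    // true: the whole line is covered
    }
}
```

This also explains the whitespace puzzle in the question: "^:.*" can cover the whole line, while "^:.*\\s" can only fully match a line that ends in whitespace.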
