Whitespace in Java's regular expression - java

I'm trying to write a regular expression to match an IRC PRIVMSG string. It looks something like this:
:nick!name@some.host.com PRIVMSG #channel :message body
So I wrote the following code:
Pattern pattern = Pattern.compile("^:.*\\sPRIVMSG\\s#.*\\s:");
Matcher matcher = pattern.matcher(msg);
if (matcher.matches()) {
    System.out.println(msg);
}
It does not work: I get no matches. When I test the regular expression using online JavaScript testers, I do get matches.
I tried to find out why it doesn't work, and it looks as if something is wrong with the whitespace symbol. The following pattern gives me some matches:
Pattern.compile("^:.*");
But the pattern with \s will not:
Pattern.compile("^:.*\\s");
It's confusing.

The Java matches() method strikes again! That method only returns true if the entire input string matches the pattern. Your pattern has nothing that consumes the message body after the second colon, so the string as a whole is not a match. It works in online testers because a 'normal' regex test counts as a 'match' if any part of the input matches.
Pattern pattern = Pattern.compile("^:.*?\\sPRIVMSG\\s#.*?\\s:.*$");
This should match.

If you look at the documentation for matches(), you will notice that it tries to match the entire string. You need to fix your regexp or use find() to iterate through the substring matches.
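The difference between the two answers can be checked with a small sketch. The class and method names here are made up for illustration, and the input is a sample line modelled on the one in the question.

```java
import java.util.regex.Pattern;

class PrivmsgMatchDemo {
    // Pattern from the question: it ends at the colon that introduces the message body.
    static final Pattern PARTIAL = Pattern.compile("^:.*\\sPRIVMSG\\s#.*\\s:");
    // Pattern from the answer: the trailing .*$ also consumes the message body.
    static final Pattern FULL = Pattern.compile("^:.*?\\sPRIVMSG\\s#.*?\\s:.*$");

    // True only if the pattern covers the whole input.
    static boolean matchesWhole(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    // True if the pattern matches some part of the input.
    static boolean findsSomewhere(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String msg = ":nick!name@some.host.com PRIVMSG #channel :message body";
        System.out.println(matchesWhole(PARTIAL, msg));   // false
        System.out.println(findsSomewhere(PARTIAL, msg)); // true
        System.out.println(matchesWhole(FULL, msg));      // true
    }
}
```

So the whitespace class \s is not the culprit; the original pattern simply stops before the end of the input.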

Related

Regular Expression always returns false

I can't get a regular expression to work.
I use an XMLRPC library to get information from a wiki.
So far, so good.
After retrieving the data into a String variable, I would like to search through it with a regular expression, but the matcher always returns "false".
But if I call ....contains("xyz") on the String, the answer is true.
The String looks something like this:
====== Datensicherheit ====== ''Kriterium von Sicherheit'' Typ: technisch Definition: \ //Allgemein.........
String regex = "Definition";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
System.out.println(matcher.matches());
Does anybody know what I'm doing wrong?
This is an issue with your regular expression. If you want to know whether the string contains "Definition" and you keep using matches(), your regex needs to be:
String regex = ".*Definition.*";
Note that matches() returns true if, and only if, the entire region sequence matches this matcher's pattern. See the Javadoc at https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#matches()
So, it will only be true if the entire "text" region matches "Definition", which is unlikely :).
Try find() instead, which returns true if, and only if, a subsequence of the input sequence matches this matcher's pattern.
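As a rough illustration of why find() is usually the safer choice here, the sketch below compares both approaches. The helper names and the shortened sample texts are assumptions, not the asker's real data. One caveat it demonstrates: "." does not match line breaks by default, so ".*Definition.*" with matches() also fails on multi-line text.

```java
import java.util.regex.Pattern;

class ContainsDefinitionDemo {
    // True only if the regex covers the entire text.
    static boolean wholeTextMatches(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    // True if the regex matches anywhere inside the text.
    static boolean textContainsMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String oneLine = "Typ: technisch Definition: Allgemein";
        String twoLines = "Typ: technisch\nDefinition: Allgemein";

        System.out.println(wholeTextMatches("Definition", oneLine));      // false
        System.out.println(wholeTextMatches(".*Definition.*", oneLine));  // true
        System.out.println(wholeTextMatches(".*Definition.*", twoLines)); // false
        System.out.println(textContainsMatch("Definition", twoLines));    // true
    }
}
```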

regex to find matches in a multiline string in Java

I was trying to use a regex to find some matches in a string in Java. The actual regex is
^(interface \X*!)
In Java I write it as
^(interface \\X*!)
This throws "Illegal/unsupported escape sequence near index 13". I searched the boards a bit and found that it should actually be four backslashes to make it work. But if I use
^(interface \\\\X*!)
it returns no matches. Any pointers would be really helpful.
A sample match would look like this:
interface ABC
temp
abc
xyz
!
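The \X construct comes from Perl, and the Javadoc for java.util.regex.Pattern (in the Java versions current at the time) explicitly states in the section Comparison to Perl 5 that it is not supported; \X was only added in Java 9. With four backslashes the regex becomes \\X*, which looks for a literal backslash followed by zero or more X characters, so it finds nothing in your input.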
In Java, you have to use a different construct. But this part is already answered in https://stackoverflow.com/a/39561579.
In order to match the pattern you identify in the comments, using Java, something like this should work:
Pattern p = Pattern.compile("interface[^!]*!", Pattern.DOTALL);
Matcher m = p.matcher("interface ABC\ntemp\nabc\nxyz\n!"); // your test string
if (m.matches()) {
    // the whole input is one interface block
}
This pattern matches any string beginning with "interface", followed by zero or more of any character except "!", followed by "!".
Pattern.DOTALL tells the engine that "." should also match carriage returns and line feeds. Note that this particular pattern contains no ".", so the flag has no effect here: the negated class [^!] already matches line terminators.
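If the interface block is embedded in a larger text, matches() will no longer succeed, and a find() loop is one way to pull out every block. The sketch below assumes a made-up configuration string and adds Pattern.MULTILINE so that ^ anchors at the start of each line; neither is from the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InterfaceBlockDemo {
    // MULTILINE lets ^ match at the start of every line, not just at the start of the input.
    // [^!]* matches any characters except "!", including line breaks.
    static final Pattern BLOCK = Pattern.compile("^interface[^!]*!", Pattern.MULTILINE);

    // Collects every "interface ... !" block found in the text.
    static List<String> blocks(String config) {
        List<String> result = new ArrayList<>();
        Matcher m = BLOCK.matcher(config);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input containing two blocks.
        String config = "hostname r1\ninterface ABC\ntemp\nabc\n!\ninterface XYZ\nxyz\n!\n";
        for (String block : blocks(config)) {
            System.out.println(block);
            System.out.println("----");
        }
    }
}
```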

java regex - not able to retrieve contents from first square brackets

I am pretty new to regular expressions. I need to pick up the contents of the first pair of square brackets. For example, if I have a string like "PORT-OTEF_RA2/6 [Eh0001/001-06] [ignore, test port]",
I need the result to be "Eh0001/001-06".
I am using the following regular expression:
Pattern pattern = Pattern.compile("^PORT.+\\[(.*?)\\]");
Matcher matcher = pattern.matcher("PORT-OTEF_RA2/6 [Eh0001/001-06] [ignore, test port]");
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
but I always get the contents of the second pair of square brackets.
However, if I give the regular expression as
Pattern.compile("\\[(.*?)\\]");
I get the required answer. But I need to make sure the string starts with "PORT". Can someone enlighten me as to where I am going wrong?
Use non-greedy regex after PORT:
^PORT.+?\\[(.*?)\\]
Otherwise .+ will be greedy and match up to the last [...] in the string.
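The effect of greedy versus reluctant quantifiers can be seen side by side in a short sketch. The helper name firstGroup is made up, and the third pattern, which uses negated character classes instead of a reluctant quantifier, is an alternative added here for comparison rather than part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstBracketDemo {
    // Returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "PORT-OTEF_RA2/6 [Eh0001/001-06] [ignore, test port]";

        // Greedy .+ runs to the end and backtracks only as far as the last "[".
        System.out.println(firstGroup("^PORT.+\\[(.*?)\\]", input));        // ignore, test port
        // Reluctant .+? stops at the first "[" that lets the rest match.
        System.out.println(firstGroup("^PORT.+?\\[(.*?)\\]", input));       // Eh0001/001-06
        // Negated classes cannot skip over a bracket at all.
        System.out.println(firstGroup("^PORT[^\\[]*\\[([^\\]]*)\\]", input)); // Eh0001/001-06
    }
}
```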

Why does this regex not match?

I'm sure this type of question gets posted a lot here. I have this regex:
^\[.*\]
which should match
[Test]Hi there
And according to RegexPal, it does. However, in this Java SSCCE it doesn't:
final String pat = "^\\[.*\\]";
final String str = "[Test]Hi there";
System.out.println(pat);
System.out.println(str);
System.out.println(str.matches(pat));
Output:
^\[.*\]
[Test]Hi there
false
Why doesn't it match?
"match" in Java means "matches the whole string":
Attempts to match the entire region against the pattern.
Since your regex doesn't accept any characters after the last ] it will not "match" anything that has characters after the ].
You can use find to see if the string contains something that's matched by your regex (it will still have to be anchored at the beginning, since you use ^).
In other words ^\[.*\] will not match [Test]Hi there, but it will find [Test] within [Test]Hi there.
Because String#matches will try to match your regex against the whole string. What you're looking for is Pattern.compile(pat).matcher(str).find().
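Besides find(), Matcher also offers lookingAt(), which requires the match to begin at the start of the input but, unlike matches(), does not require it to cover the whole input. A minimal sketch using the pattern and string from the question (the class and helper names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixMatchDemo {
    // Returns the text matched at the start of the input, or null if the start does not match.
    static String matchedPrefix(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        String pat = "^\\[.*\\]";
        String str = "[Test]Hi there";

        System.out.println(str.matches(pat));                       // false
        System.out.println(Pattern.compile(pat).matcher(str).find()); // true
        System.out.println(matchedPrefix(pat, str));                // [Test]
    }
}
```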

How to find the exact word using a regex in Java?

Consider the following code snippet:
String input = "Print this";
System.out.println(input.matches("\\bthis\\b"));
Output
false
What could be possibly wrong with this approach? If it is wrong, then what is the right solution to find the exact word match?
PS: I have found a variety of similar questions here but none of them provide the solution I am looking for.
Thanks in advance.
When you use the matches() method, it is trying to match the entire input. In your example, the input "Print this" doesn't match the pattern because the word "Print" isn't matched.
So you need to add something to the regex to match the initial part of the string, e.g.
.*\\bthis\\b
And if you want to allow extra text at the end of the line too:
.*\\bthis\\b.*
Alternatively, use a Matcher object and use Matcher.find() to find matches within the input string:
Pattern p = Pattern.compile("\\bthis\\b");
Matcher m = p.matcher("Print this");
m.find();
System.out.println(m.group());
Output:
this
If you want to find multiple matches in a line, you can call find() and group() repeatedly to extract them all.
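Calling find() repeatedly can be sketched as a simple loop. The helper name allMatches and the sample sentence are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordFinderDemo {
    // Collects the text of every match, in order of appearance.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {          // each call continues after the previous match
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "this and This, not thistle; this";
        // Case-sensitive whole-word search: "This" and "thistle" are skipped.
        System.out.println(allMatches("\\bthis\\b", input));     // [this, this]
        // With the inline (?i) flag, "This" is found as well.
        System.out.println(allMatches("(?i)\\bthis\\b", input)); // [this, This, this]
    }
}
```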
Full example method using String.matches():
public static final String REGEX_FIND_WORD = "(?i).*?\\b%s\\b.*?";

public static boolean containsWord(String text, String word) {
    // Pattern.quote (java.util.regex.Pattern) makes the word match literally
    String regex = String.format(REGEX_FIND_WORD, Pattern.quote(word));
    return text.matches(regex);
}
Explanation:
(?i) - ignore case
.*? - allow (optionally) any characters before
\b - word boundary
%s - variable to be replaced by String.format (quoted to avoid regex errors)
\b - word boundary
.*? - allow (optionally) any characters after
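One limitation of the matches()-based approach is that "." does not match line breaks by default, so it returns false when the text spans several lines. A possible alternative, sketched here as an assumption rather than as the answer's own method, drops the surrounding .*? and uses find() instead (the class name is made up):

```java
import java.util.regex.Pattern;

class WordSearchDemo {
    // Case-insensitive whole-word search that also works across line breaks.
    static boolean containsWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                                    Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("Print this", "THIS"));  // true
        System.out.println(containsWord("Print\nthis", "this")); // true
        System.out.println(containsWord("thistle", "this"));     // false
        // The matches()-based pattern fails on the multi-line input:
        System.out.println("Print\nthis".matches("(?i).*?\\bthis\\b.*?")); // false
    }
}
```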
For a good explanation, see: http://www.regular-expressions.info/java.html
myString.matches("regex") returns true or false depending whether the
string can be matched entirely by the regular expression. It is
important to remember that String.matches() only returns true if the
entire string can be matched. In other words: "regex" is applied as if
you had written "^regex$" with start and end of string anchors. This
is different from most other regex libraries, where the "quick match
test" method returns true if the regex can be matched anywhere in the
string. If myString is abc then myString.matches("bc") returns false.
bc matches abc, but ^bc$ (which is really being used here) does not.
This writes "true":
String input = "Print this";
System.out.println(input.matches(".*\\bthis\\b"));
You may use groups to find the exact word. Regex API specifies groups by parentheses. For example:
A(B(C))D
This statement consists of three groups, which are indexed from 0.
0th group - ABCD
1st group - BC
2nd group - C
So if you need to find a specific word, you can use two methods of the Matcher class: find() to locate the text described by the regex, and then group(int) to get the String captured by a given group number:
String statement = "Hello, my beautiful world";
Pattern pattern = Pattern.compile("Hello, my (\\w+).*");
Matcher m = pattern.matcher(statement);
m.find();
System.out.println(m.group(1));
The code above prints "beautiful".
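The group numbering described for A(B(C))D can be verified with a few lines. The helper below is hypothetical; note that groupCount() reports only the capturing groups, so group 0 (the whole match) is not included in the count.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns group 0 through groupCount() for a full match, or null if the input does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("A(B(C))D", "ABCD");
        System.out.println(g[0]); // ABCD
        System.out.println(g[1]); // BC
        System.out.println(g[2]); // C
    }
}
```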
Is your searchString going to be a regular expression? If not, simply use String.contains(CharSequence s).
System.out.println(input.matches(".*\\bthis$"));
This also works. Here .* matches everything before the word, and \\bthis$ requires "this" to be a whole word at the very end of the input.
