Regular Expression always returns false - java

I have a problem to get a regular expression to get work.
I use an XMLRPC Library to get information from an wiki.
so far so good.
After retrieving the data into a String Variable I would like to search through with a regular expression but the matcher will always return "false".
But if I asking the String ....contains("xyz"); the Answer is true.
The String looks something like this:
====== Datensicherheit ====== ''Kriterium von Sicherheit'' Typ: technisch Definition: \ //Allgemein.........
String regex = "Definition";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
System.out.println(matcher.matches());
Does anybody know what I'm doing wrong?

This is an issue with your regex expression. If you are wanting to know if the string contains "Definition", your regex needs to be:
String regex = ".*Definition.*";

Note that matches() returns true if, and only if, the entire region sequence matches this matcher's pattern. see the java doc # https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#matches()
So, it will only be true if the entire "text" region matches "Definition", which is unlikely :).
Try find() instead which is true if, and only if, a subsequence of the input sequence starting at the given index matches this matcher's pattern.

Related

Print out the string that matched my regular expression in java?

Possible duplicate: Print regex matches in java
I am using Matcher class in java to match a string with a particular regular expression which I converted into a Pattern using the Pattern class. I know my regex works because when I do Matcher.find(), I am getting true values where I am supposed to. But I want to print out the stings that are producing those true values (meaning print out the strings that match my regex) and I don't see a method in the matcher class to achieve that. Please do let me know if anyone has encountered such a problem before. I apologize as this question is fairly rudimentary but I am fairly new to regex and hence am still finding my way around the regex world.
Assuming mis your matcher:
m.group() will return the matched string.
[EDIT] Added info regarding matched groups
Also, if your regex has portions inside parenthesis, m.group(n) will return the string that matches the nth group inside parenthesis;
Pattern p = Pattern.compile("mary (.*) bob");
Matcher m = p.matcher("since that day mary loves bob");
m.group() returns "mary loves bob".
m.group(1) return "loves".

Whitespace in Java's regular expression

I'm trying to write a regular expression to mach an IRC PRIVMSG string. It is something like:
:nick!name#some.host.com PRIVMSG #channel :message body
So i wrote the following code:
Pattern pattern = Pattern.compile("^:.*\\sPRIVMSG\\s#.*\\s:");
Matcher matcher = pattern.matcher(msg);
if(matcher.matches()) {
System.out.println(msg);
}
It does not work. I got no matches. When I test the regular expression using online javascript testers, I got matches.
I tried to find the reason, why it doesn't work and I found that there's something wrong with the whitespace symbol. The following pattern will give me some matches:
Pattern.compile("^:.*");
But the pattern with \s will not:
Pattern.compile("^:.*\\s");
It's confusing.
The java matches method strikes again! That method only returns true if the entire string matches the input. You didn't include anything that captures the message body after the second colon, so the entire string is not a match. It works in testers because 'normal' regex is a 'match' if any part of the input matches.
Pattern pattern = Pattern.compile("^:.*?\\sPRIVMSG\\s#.*?\\s:.*$");
Should match
If you look at the documentation for matches(), uou will notice that it is trying to match the entire string. You need to fix your regexp or use find() to iterate through the substring matches.

How to find the exact word using a regex in Java?

Consider the following code snippet:
String input = "Print this";
System.out.println(input.matches("\\bthis\\b"));
Output
false
What could be possibly wrong with this approach? If it is wrong, then what is the right solution to find the exact word match?
PS: I have found a variety of similar questions here but none of them provide the solution I am looking for.
Thanks in advance.
When you use the matches() method, it is trying to match the entire input. In your example, the input "Print this" doesn't match the pattern because the word "Print" isn't matched.
So you need to add something to the regex to match the initial part of the string, e.g.
.*\\bthis\\b
And if you want to allow extra text at the end of the line too:
.*\\bthis\\b.*
Alternatively, use a Matcher object and use Matcher.find() to find matches within the input string:
Pattern p = Pattern.compile("\\bthis\\b");
Matcher m = p.matcher("Print this");
m.find();
System.out.println(m.group());
Output:
this
If you want to find multiple matches in a line, you can call find() and group() repeatedly to extract them all.
Full example method for matcher:
public static String REGEX_FIND_WORD="(?i).*?\\b%s\\b.*?";
public static boolean containsWord(String text, String word) {
String regex=String.format(REGEX_FIND_WORD, Pattern.quote(word));
return text.matches(regex);
}
Explain:
(?i) - ignorecase
.*? - allow (optionally) any characters before
\b - word boundary
%s - variable to be changed by String.format (quoted to avoid regex
errors)
\b - word boundary
.*? - allow (optionally) any characters after
For a good explanation, see: http://www.regular-expressions.info/java.html
myString.matches("regex") returns true or false depending whether the
string can be matched entirely by the regular expression. It is
important to remember that String.matches() only returns true if the
entire string can be matched. In other words: "regex" is applied as if
you had written "^regex$" with start and end of string anchors. This
is different from most other regex libraries, where the "quick match
test" method returns true if the regex can be matched anywhere in the
string. If myString is abc then myString.matches("bc") returns false.
bc matches abc, but ^bc$ (which is really being used here) does not.
This writes "true":
String input = "Print this";
System.out.println(input.matches(".*\\bthis\\b"));
You may use groups to find the exact word. Regex API specifies groups by parentheses. For example:
A(B(C))D
This statement consists of three groups, which are indexed from 0.
0th group - ABCD
1st group - BC
2nd group - C
So if you need to find some specific word, you may use two methods in Matcher class such as: find() to find statement specified by regex, and then get a String object specified by its group number:
String statement = "Hello, my beautiful world";
Pattern pattern = Pattern.compile("Hello, my (\\w+).*");
Matcher m = pattern.matcher(statement);
m.find();
System.out.println(m.group(1));
The above code result will be "beautiful"
Is your searchString going to be regular expression? if not simply use String.contains(CharSequence s)
System.out.println(input.matches(".*\\bthis$"));
Also works. Here the .* matches anything before the space and then this is matched to be word in the end.

Matching numerical pattern

I'm doing a project which requires numerical pattern matching.
For example i want to know whether Value = 1331 is a part of 680+651 = 1331 or not, i.e. i want to match 1331 with 680+651 = 1331 or any other given string.
I'm trying pattern matching in java for the first time and i could not succeed. Below is my code snippet.
String REGEX1=s1; //s1 is '1331'
pattern = Pattern.compile(REGEX1);
matcher = pattern.matcher(line_out); //line_out is for ex. 680+651 = 1331
System.out.println("lookingAt(): "+matcher.lookingAt());
System.out.println("matches(): "+matcher.matches());
It is returning false all the times.
Pls help me.
matches() requires that the pattern be a complete match, not a partial.
You either need to change your pattern to something like .*= 1331$ or use the find() method which will do a partial match.
The matches method requires a perfect, full exact match. Since there is more text in 680+651=1331 than what is matched by the regex 1331, matches returns false.
As I pointed out in Brian's post, you need to be careful in your regex to ensure that a regex of 1331 does not match the number 213312 unless that is what you want.
matches() is the wrong method for this, use find().
http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html says:
public boolean matches()
Attempts to match the entire input sequence against the pattern.
and
public boolean find()
Attempts to find the next subsequence of the input sequence that matches the pattern.

Regex works in other engines but not Java Pattern/Matcher

I can't figure out why this regex doesn't work, I've tested it in php and other regex engines where it works fine and matches ",AA,".
Pattern p = Pattern.compile("(^|,)AA(,|$)");
Matcher m = p.matcher("A,B,AA,C,D");
//assigns as false
boolean matches = m.matches();
Side note: I have a split/array binary search method for doing an IN_SET / NOT_IN_SET search against the string. This is just an example I need to get working before implementing regex as another comparing option.
matches() validates the entire string. You want to use find() instead.
From the API:
matches()
Attempts to match the entire region against the pattern.
-- http://download.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#matches()
and:
find()
Attempts to find the next subsequence of the input sequence that matches the pattern.
-- http://download.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#find()
Matcher matches the entire region against the pattern. Use find().

Categories