I have a Regex pattern that matches data I need to parse exactly as I need it. Unfortunately with the split method it is deleting the desired data and passing the garbage out to me. Normally I would just try another Regex expression doing the opposite but its not quite as simple as it sounds. It must be in Java as this section is part of a much bigger program/package.
Pattern p = Pattern.compile("/^\{\?|\:|\=|\||(\-
configurationFile)|(isUsingRESTDescription)|(\restURL)=(\s|\w|\.|\-|\:|\/|\;|\[|\]|\'|\})\r/g");
This is the string I'm parsing (there are carriage returns after each section):
SearchResult::getBleh(): {BLEHID=BLEH blehLastmoddate=1-Jul-11 bleh=BLEH; Beh description=blehbleh BlEh=bleh1231bleh bLeH=bleh-blehbleh 1 media=http://bleh.com/13 Date=22-May-12 name=[]} String[] items = p.split(input^);
The above gives me the opposite of what I want.
You'd think someone would have had this problem. Help would be appreciated :).
Use capture groups. You can read about them in the javadoc for Pattern.
An example:
Pattern p = Pattern.compile("[^/]*/([^/]*)/.*");
Matcher m = p.matcher("foo/bar/input");
if (m.find()) {
String captured = m.group(1); // This equals "bar"
String matched = m.group(0); // This equals "foo/bar/input"
}
Anything located inside of parentheses in a Pattern is a capture group. The Matcher indexes the capture groups based on when the opening parentheses is encountered. Group 0 is always the entire matched region.
Consider the following code snippet:
String input = "Print this";
System.out.println(input.matches("\\bthis\\b"));
Output
false
What could be possibly wrong with this approach? If it is wrong, then what is the right solution to find the exact word match?
PS: I have found a variety of similar questions here but none of them provide the solution I am looking for.
Thanks in advance.
When you use the matches() method, it is trying to match the entire input. In your example, the input "Print this" doesn't match the pattern because the word "Print" isn't matched.
So you need to add something to the regex to match the initial part of the string, e.g.
.*\\bthis\\b
And if you want to allow extra text at the end of the line too:
.*\\bthis\\b.*
Alternatively, use a Matcher object and use Matcher.find() to find matches within the input string:
Pattern p = Pattern.compile("\\bthis\\b");
Matcher m = p.matcher("Print this");
m.find();
System.out.println(m.group());
Output:
this
If you want to find multiple matches in a line, you can call find() and group() repeatedly to extract them all.
Full example method for matcher:
public static String REGEX_FIND_WORD="(?i).*?\\b%s\\b.*?";
public static boolean containsWord(String text, String word) {
String regex=String.format(REGEX_FIND_WORD, Pattern.quote(word));
return text.matches(regex);
}
Explain:
(?i) - ignorecase
.*? - allow (optionally) any characters before
\b - word boundary
%s - variable to be changed by String.format (quoted to avoid regex
errors)
\b - word boundary
.*? - allow (optionally) any characters after
For a good explanation, see: http://www.regular-expressions.info/java.html
myString.matches("regex") returns true or false depending whether the
string can be matched entirely by the regular expression. It is
important to remember that String.matches() only returns true if the
entire string can be matched. In other words: "regex" is applied as if
you had written "^regex$" with start and end of string anchors. This
is different from most other regex libraries, where the "quick match
test" method returns true if the regex can be matched anywhere in the
string. If myString is abc then myString.matches("bc") returns false.
bc matches abc, but ^bc$ (which is really being used here) does not.
This writes "true":
String input = "Print this";
System.out.println(input.matches(".*\\bthis\\b"));
You may use groups to find the exact word. Regex API specifies groups by parentheses. For example:
A(B(C))D
This statement consists of three groups, which are indexed from 0.
0th group - ABCD
1st group - BC
2nd group - C
So if you need to find some specific word, you may use two methods in Matcher class such as: find() to find statement specified by regex, and then get a String object specified by its group number:
String statement = "Hello, my beautiful world";
Pattern pattern = Pattern.compile("Hello, my (\\w+).*");
Matcher m = pattern.matcher(statement);
m.find();
System.out.println(m.group(1));
The above code result will be "beautiful"
Is your searchString going to be regular expression? if not simply use String.contains(CharSequence s)
System.out.println(input.matches(".*\\bthis$"));
Also works. Here the .* matches anything before the space and then this is matched to be word in the end.
I'm trying to parse through a string formatted like this, except with more values:
Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value
The Regex
((Key1)=(.*)),((Key2)=(.*)),((Key3)=(.*)),((Key4)=(.*)),((Key5)=(.*)),((Key6)=(.*)),((Key7)=(.*))
In the actual string, there are about double the amount of key/values, but I'm keeping it short for brevity. I have them in parentheses so I can call them in groups. The keys I have stored as Constants, and they will always be the same. The problem is, it never finds a match which doesn't make sense (unless the Regex is wrong)
Judging by your comment above, it sounds like you're creating the Pattern and Matcher objects and associating the Matcher with the target string, but you aren't actually applying the regex. That's a very common mistake. Here's the full sequence:
String regex = "Key1=(.*),Key2=(.*)"; // etc.
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(targetString);
// Now you have to apply the regex:
if (m.find())
{
String value1 = m.group(1);
String value2 = m.group(2);
// etc.
}
Not only do you have to call find() or matches() (or lookingAt(), but nobody ever uses that one), you should always call it in an if or while statement--that is, you should make sure the regex actually worked before you call any methods like group() that require the Matcher to be in a "matched" state.
Also notice the absence of most of your parentheses. They weren't necessary, and leaving them out makes it easier to (1) read the regex and (2) keep track of the group numbers.
Looks like you'd do better to do:
String[] pairs = data.split(",");
Then parse the key/value pairs one at a time
Your regex is working for me...
If you are always getting an IllegalStateException, I would say that you are trying to do something like:
matcher.group(1);
without having invoked the find() method.
You need to call that method before any attempt to fetch a group (or you will be in an illegal state to call the group() method)
Give this a try:
String test = "Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value";
Pattern pattern = Pattern.compile("((Key1)=(.*)),((Key2)=(.*)),((Key3)=(.*)),((Key4)=(.*)),((Key5)=(.*)),((Key6)=(.*)),((Key7)=(.*))");
Matcher matcher = pattern.matcher(test);
matcher.find();
System.out.println(matcher.group(1));
It's not wrong per se, but it requires a lot of backtracking which might cause the regular expression engine to bail. I would try a split as suggested elsewhere, but if you really need to use a regular expression, try making it non-greedy.
((Key1)=(.*?)),((Key2)=(.*?)),((Key3)=(.*?)),((Key4)=(.*?)),((Key5)=(.*?)),((Key6)=(.*?)),((Key7)=(.*?))
To understand why it requires so much backtracking, understand that for
Key1=(.*),Key2=(.*)
applied to
Key1=x,Key2=y
Java's regular expression engine matches the first (.*) to x,Key2=y and then tries stripping characters off the right until it can get a match for the rest of the regular expression: ,Key2=(.*). It effectively ends up asking,
Does "" match ,Key2=(.*), no so try
Does "y" match ,Key2=(.*), no so try
Does "=y" match ,Key2=(.*), no so try
Does "2=y" match ,Key2=(.*), no so try
Does "y2=y" match ,Key2=(.*), no so try
Does "ey2=y" match ,Key2=(.*), no so try
Does "Key2=y" match ,Key2=(.*), no so try
Does ",Key2=y" match ,Key2=(.*), yes so the first .* is "x" and the second is "y".
EDIT:
In Java, the non-greedy qualifier changes things so that it starts off trying to match nothing and then building from there.
Does "x,Key2=(.*)" match ,Key2=(.*), no so try
Does ",Key2=(.*)" match ,Key2=(.*), yes.
So when you've got 7 keys it doesn't need to unmatch 6 of them which involves unmatching 5 which involves unmatching 4, .... It can do it's job in one forward pass over the input.
I'm not going to say that there's no regex that will work for this, but it's most likely more complicated to write (and more importantly, read, for the next person that has to deal with the code) than it's worth. The closest I'm able to get with a regex is if you append a terminal comma to the string you're matching, i.e, instead of:
"Key1=value1,Key2=value2"
you would append a comma so it's:
"Key1=value1,Key2=value2,"
Then, the regex that got me the closest is: "(?:(\\w+?)=(\\S+?),)?+"...but this doesn't quite work if the values have commas, though.
You can try to continue tweaking that regex from there, but the problem I found is that there's a conflict in the behavior between greedy and reluctant quantifiers. You'd have to specify a capturing group for the value that is greedy with respect to commas up to the last comma prior to an non-capturing group comprised of word characters followed by the equal sign (the next value)...and this last non-capturing group would have to be optional in case you're matching the last value in the sequence, and maybe itself reluctant. Complicated.
Instead, my advice is just to split the string on "=". You can get away with this because presumably the values aren't allowed to contain the equal sign character.
Now you'll have a bunch of substrings, each of which that is a bunch of characters that comprise a value, the last comma in the string, followed by a key. You can easily find the last comma in each substring using String.lastIndexOf(',').
Treat the first and last substrings specially (because the first one does not have a prepended value and the last one has no appended key) and you should be in business.
If you know you always have 7, the hack-of-least resistance is
^Key1=(.+),Key2=(.+),Key3=(.+),Key4=(.+),Key5=(.+),Key6=(.+),Key7=(.+)$
Try it out at http://www.fileformat.info/tool/regex.htm
I'm pretty sure that there is a better way to parse this thing down that goes through .find() rather than .matches() which I think I would recommend as it allows you to move down the string one key=value pair at a time. It moves you into the whole "greedy" evaluation discussion.
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems. - Jamie Zawinski
The simplest solution is the most robust.
final String data = "Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value";
final String[] pairs = data.split(",");
for (final String pair: pairs)
{
final String[] keyValue = pair.split("=");
final String key = keyValue[0];
final String value = keyValue[1];
}
I'm doing a project which requires numerical pattern matching.
For example i want to know whether Value = 1331 is a part of 680+651 = 1331 or not, i.e. i want to match 1331 with 680+651 = 1331 or any other given string.
I'm trying pattern matching in java for the first time and i could not succeed. Below is my code snippet.
String REGEX1=s1; //s1 is '1331'
pattern = Pattern.compile(REGEX1);
matcher = pattern.matcher(line_out); //line_out is for ex. 680+651 = 1331
System.out.println("lookingAt(): "+matcher.lookingAt());
System.out.println("matches(): "+matcher.matches());
It is returning false all the times.
Pls help me.
matches() requires that the pattern be a complete match, not a partial.
You either need to change your pattern to something like .*= 1331$ or use the find() method which will do a partial match.
The matches method requires a perfect, full exact match. Since there is more text in 680+651=1331 than what is matched by the regex 1331, matches returns false.
As I pointed out in Brian's post, you need to be careful in your regex to ensure that a regex of 1331 does not match the number 213312 unless that is what you want.
matches() is the wrong method for this, use find().
http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html says:
public boolean matches()
Attempts to match the entire input sequence against the pattern.
and
public boolean find()
Attempts to find the next subsequence of the input sequence that matches the pattern.
I can't figure out why this regex doesn't work, I've tested it in php and other regex engines where it works fine and matches ",AA,".
Pattern p = Pattern.compile("(^|,)AA(,|$)");
Matcher m = p.matcher("A,B,AA,C,D");
//assigns as false
boolean matches = m.matches();
Side note: I have a split/array binary search method for doing an IN_SET / NOT_IN_SET search against the string. This is just an example I need to get working before implementing regex as another comparing option.
matches() validates the entire string. You want to use find() instead.
From the API:
matches()
Attempts to match the entire region against the pattern.
-- http://download.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#matches()
and:
find()
Attempts to find the next subsequence of the input sequence that matches the pattern.
-- http://download.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#find()
Matcher matches the entire region against the pattern. Use find().