I'm writing some simple (I thought) regex in Java to remove an asterisk or ampersand which occurs directly next to some specified punctuation.
This was my original code:
String ptr = "\\s*[\\*&]+\\s*";
String punct1 = "[,;=\\{}\\[\\]\\)]"; //need two because bracket rules different for ptr to left or right
String punct2 = "[,;=\\{}\\[\\]\\(]";
out = out.replaceAll(ptr+"("+punct1+")|("+punct2+")"+ptr,"$1");
Instead of removing only the "ptr" part of the string, this removed the punctuation too! (i.e. it replaced the whole matched string with an empty string)
I examined further by doing:
String ptrStr = ".*"+ptr+"("+punct1+")"+".*|.*("+punct2+")"+ptr+".*";
Matcher m_ptrStr = Pattern.compile(ptrStr).matcher(out);
and found that:
m_ptrStr.matches() //returns true, but...
m_ptrStr.group(1) //returns null??
I have no idea what I'm doing wrong as I've used this exact method before with far more complicated regex and group(1) has always returned the captured group. There must be something I haven't been able to spot, so.. any ideas?
The problem is that you have an alternation with a capturing group on each side:
(regex1)|(regex2)
The matcher first searches for a match using the first alternative; if that fails, it tries the second alternative.
However, those are still two separate groups, and only one of them will take part in the match. The group that did not take part returns null, and this is what happens to you here.
You therefore need to test both groups; since you have a match, one of them will not be null.
When you have | in your pattern, that means that the matcher is allowed to match one of two patterns. Whichever one it matches, any capture groups for the pattern it matches will return the substrings--but any capture groups for the other pattern will return null, because the other pattern wasn't really matched.
It looks like your pattern is
.*\s*[\*&]+\s*([,;=\{}\[\]\)]).*|.*([,;=\{}\[\]\(])\s*[\*&]+\s*.*
------------- left -------------- ------------- right -------------
If matches() returns true, then either your string matched the "left" pattern, in which case group(1) will be non-null and group(2) will be null; or else it matched the "right" pattern, in which case group(1) will be null and group(2) non-null. [Note: The matcher will not try to find out if both sides are successful matches. That is, if the left side matches, it won't check the right side.]
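To make this concrete for the original replaceAll call, here is a minimal sketch. The class and method names are made up for illustration, and it assumes the intended fix is to reference both groups in the replacement. In Java's replaceAll, a group that did not take part in the match contributes nothing to the replacement, so "$1$2" puts back whichever punctuation character was actually captured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationGroups {
    // Same building blocks as in the question.
    static final String PTR = "\\s*[\\*&]+\\s*";
    static final String PUNCT1 = "[,;=\\{}\\[\\]\\)]";
    static final String PUNCT2 = "[,;=\\{}\\[\\]\\(]";
    static final Pattern P =
            Pattern.compile(PTR + "(" + PUNCT1 + ")|(" + PUNCT2 + ")" + PTR);

    // Exactly one of the two groups is set per match; the unset one adds nothing.
    static String stripPointers(String in) {
        return P.matcher(in).replaceAll("$1$2");
    }

    // Reports which alternative produced the first match.
    static String whichSide(String in) {
        Matcher m = P.matcher(in);
        if (!m.find()) {
            return "none";
        }
        return m.group(1) != null ? "left" : "right";
    }

    public static void main(String[] args) {
        System.out.println(stripPointers("foo(int *, char)")); // foo(int, char)
        System.out.println(stripPointers("x = &y;"));          // x =y;
        System.out.println(whichSide("int *, char"));          // left
        System.out.println(whichSide("x = &y"));               // right
    }
}
```

Note that the pointer pattern also swallows the whitespace around the asterisk or ampersand, which is why the second result reads "x =y;".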
I have a regex that has an | (or) in it and I would like to determine what part of the or matched in the regex:
Possible Inputs:
-- Input 1 --
Stuff here to keep.
-- First --
all of this below
gets
deleted
-- Input 2 --
Stuff here to keep.
-- Second --
all of this below
gets
deleted
I want a regex that matches part of an incoming input source and lets me determine which side of the | (or) was matched: "-- First --" or "-- Second --".
Pattern PATTERN = Pattern.compile("^(.*?)-+ *(?:First|Second) *-+", Pattern.DOTALL);
Matcher m = PATTERN.matcher(text);
if (m.find()) {
// How can I tell if the regex matched "First" or "Second"?
}
How can I tell which input was matched (First or Second)?
The regular expression does not contain that information. However, you could use some additional groups to figure it out.
Example pattern: (?:(First)|(Second))
On the string First the second capture group will be empty and with Second the first one will be empty. A simple inspection of the groups returned to Java will tell you which part of the regex matched.
EDIT: I assumed that First and Second were used as placeholders for the sake of simplicity and actually represent more complex expressions. If you are really looking to find which of two strings was matched, then having a single capture group (like this: (First|Second)) and comparing its content with First will do the job just fine.
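As a rough sketch of the single-capture-group variant, applied to the pattern from the question: the only change is that (?:First|Second) becomes (First|Second), so the marker lands in group 2. The class and method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhichAlternative {
    // Group 1: the text to keep. Group 2: whichever marker word matched.
    static final Pattern PATTERN =
            Pattern.compile("^(.*?)-+ *(First|Second) *-+", Pattern.DOTALL);

    // Returns "First", "Second", or null when the text has no marker line.
    static String matchedMarker(String text) {
        Matcher m = PATTERN.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    // Returns the text before the marker line, or null when there is none.
    static String keptPart(String text) {
        Matcher m = PATTERN.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "Stuff here to keep.\n-- Second --\nall of this below\ngets\ndeleted";
        System.out.println(matchedMarker(input)); // Second
        System.out.print(keptPart(input));        // Stuff here to keep.
    }
}
```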
The match result does not record which alternative was taken, so this pattern, which has no capture group around the alternation, cannot tell you on its own.
One solution is to use two different regexes and make a case decision.
However, you can also use group(), which returns the text of the most recent match as a String.
You can test this with .contains("First").
if(m.group().contains("First")) {
// case 1
} else {
// case 2
}
I'm trying to get matches for commands like this;
[AUTR| <version_software> | <version_protocol> | <msg> ]
[PING]
What regular expression finds these matches for the first command?
AUTR
version_software
version_protocol
msg
This is the code that currently parses it:
String[] tokens = msg.replace('<',' ').replace('>',' ').replace('[', ' ').replace(']', ' ').split("\\|");
for (int i=0; i<tokens.length; i++) tokens[i] = tokens[i].trim();
I'm only wondering how it can be done with a regex solution.
EDIT:
I'm trying to match groups with easier expressions, and with this code the call to m.groupCount returns one... but when I try to print it... it throws this exception "java.lang.IllegalStateException: No match found"
Pattern pattern = Pattern.compile("([\\w+])");
Matcher m = pattern.matcher("[AUTR]");
for (int i=0; i<m.groupCount();i++)
{
System.out.println(m.group(i));
}
EDIT:
http://fiddle.re/6ykc
Regular Expression:
\[([\w]+)(\s*\|\s*<([\w. ]+)>\s*)*\]
Java Regex String:
"\\[([\\w]+)(\\s*\\|\\s*<([\\w. ]+)>\\s*)*\\]"
Note that this is for variable commands now and that all extra parameters must match the following character set [a-zA-Z_0-9. ] (Includes periods and spaces).
Issue: with variable-length commands, a repeated (quantified) group cannot capture more than one value; it keeps only the text from its last iteration. The Pattern documentation says:
The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification then its previously-captured value, if any, will be retained if the second evaluation fails. Matching the string "aba" against the expression (a(b)?)+, for example, leaves group two set to "b". All captured input is discarded at the beginning of each match.
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#cg
EDIT 2:
In order to get all of them you can do 2 regular expressions, one to grab the command:
String command_regex = "\\[([\\w]+)";
Find that first, then find the parameters, using <> as the delimiters to select on:
String parameters = "<([\\w. ]+)>";
Pattern pattern = Pattern.compile(parameters);
Matcher matcher = pattern.matcher(string_to_match);
while (matcher.find()) {
    System.out.println(matcher.group(1)); // group(1) is the text between < and >
}
Hope that helps.
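Put together, the two-pass idea might look like the sketch below. The class name, method name, and the decision to return everything in one list are my own choices for illustration; the two patterns are the ones given above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommandParser {
    static final Pattern COMMAND = Pattern.compile("\\[([\\w]+)");
    static final Pattern PARAMETER = Pattern.compile("<([\\w. ]+)>");

    // Returns the command name followed by its parameters,
    // or an empty list when no command is found.
    static List<String> parse(String msg) {
        List<String> tokens = new ArrayList<>();
        Matcher command = COMMAND.matcher(msg);
        if (!command.find()) {
            return tokens;
        }
        tokens.add(command.group(1));

        // Second pass: each find() call moves on to the next <...> parameter.
        Matcher parameter = PARAMETER.matcher(msg);
        while (parameter.find()) {
            tokens.add(parameter.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(parse("[AUTR| <version_software> | <version_protocol> | <msg> ]"));
        System.out.println(parse("[PING]"));
    }
}
```

Because find() is called in a loop, the number of parameters does not have to be known in advance, which sidesteps the repeated-group limitation quoted above.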
ORIGINAL:
Not exactly sure on the formatting, are the "<" and ">" and "|" required? And what are the formats for the command, version_software, version_protocol and message? This is my attempt though for regular expressions (tested in Python)
\[(\w+)\s*\|\s*<([\w.]+)>\s*\|\s*<(\w+)>\s*\|\s*<([\w\s]+)>\s*\]
You need to make sure to escape the brackets and the pipe symbols (I added \s* between tokens because I don't know whether there will be spaces or not). If you do:
>>> import re
>>> search = re.search(r"expression above", line)
>>> search.groups()
It should give all tokens in python at least. I left it more hardcoded to allow room for adjustments on each token you wanted to grab, otherwise you could reduce the last 3 parts by making it a group and saying to repeat 3 times. Let me know results?
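For reference, a Java translation of the same hardcoded expression could look roughly like this (class and method names are placeholders). It also shows that capture groups are numbered from 1 up to and including groupCount(), and that they are only read after find() has succeeded, which is what the loop in the question's edit gets wrong.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedCommand {
    // The expression above, with backslashes doubled for a Java string literal.
    static final Pattern AUTR = Pattern.compile(
            "\\[(\\w+)\\s*\\|\\s*<([\\w.]+)>\\s*\\|\\s*<(\\w+)>\\s*\\|\\s*<([\\w\\s]+)>\\s*\\]");

    // Returns the four captured tokens, or an empty array when the line does not match.
    static String[] tokens(String line) {
        Matcher m = AUTR.matcher(line);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) { // group(0) is the whole match
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : tokens("[AUTR| <version_software> | <version_protocol> | <msg> ]")) {
            System.out.println(token);
        }
    }
}
```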
I'm trying to parse through a string formatted like this, except with more values:
Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value
The Regex
((Key1)=(.*)),((Key2)=(.*)),((Key3)=(.*)),((Key4)=(.*)),((Key5)=(.*)),((Key6)=(.*)),((Key7)=(.*))
In the actual string, there are about double the amount of key/values, but I'm keeping it short for brevity. I have them in parentheses so I can call them in groups. The keys I have stored as Constants, and they will always be the same. The problem is, it never finds a match which doesn't make sense (unless the Regex is wrong)
Judging by your comment above, it sounds like you're creating the Pattern and Matcher objects and associating the Matcher with the target string, but you aren't actually applying the regex. That's a very common mistake. Here's the full sequence:
String regex = "Key1=(.*),Key2=(.*)"; // etc.
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(targetString);
// Now you have to apply the regex:
if (m.find())
{
String value1 = m.group(1);
String value2 = m.group(2);
// etc.
}
Not only do you have to call find() or matches() (or lookingAt(), but nobody ever uses that one), you should always call it in an if or while statement--that is, you should make sure the regex actually worked before you call any methods like group() that require the Matcher to be in a "matched" state.
Also notice the absence of most of your parentheses. They weren't necessary, and leaving them out makes it easier to (1) read the regex and (2) keep track of the group numbers.
Looks like you'd do better to do:
String[] pairs = data.split(",");
Then parse the key/value pairs one at a time
Your regex is working for me...
If you are always getting an IllegalStateException, I would say that you are trying to do something like:
matcher.group(1);
without having invoked the find() method.
You need to call that method before any attempt to fetch a group (or you will be in an illegal state to call the group() method)
Give this a try:
String test = "Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value";
Pattern pattern = Pattern.compile("((Key1)=(.*)),((Key2)=(.*)),((Key3)=(.*)),((Key4)=(.*)),((Key5)=(.*)),((Key6)=(.*)),((Key7)=(.*))");
Matcher matcher = pattern.matcher(test);
if (matcher.find()) { // only read groups after a successful find()
    System.out.println(matcher.group(1));
}
It's not wrong per se, but it requires a lot of backtracking which might cause the regular expression engine to bail. I would try a split as suggested elsewhere, but if you really need to use a regular expression, try making it non-greedy.
((Key1)=(.*?)),((Key2)=(.*?)),((Key3)=(.*?)),((Key4)=(.*?)),((Key5)=(.*?)),((Key6)=(.*?)),((Key7)=(.*?))
To understand why it requires so much backtracking, understand that for
Key1=(.*),Key2=(.*)
applied to
Key1=x,Key2=y
Java's regular expression engine matches the first (.*) to x,Key2=y and then tries stripping characters off the right until it can get a match for the rest of the regular expression: ,Key2=(.*). It effectively ends up asking,
Does "" match ,Key2=(.*), no so try
Does "y" match ,Key2=(.*), no so try
Does "=y" match ,Key2=(.*), no so try
Does "2=y" match ,Key2=(.*), no so try
Does "y2=y" match ,Key2=(.*), no so try
Does "ey2=y" match ,Key2=(.*), no so try
Does "Key2=y" match ,Key2=(.*), no so try
Does ",Key2=y" match ,Key2=(.*), yes so the first .* is "x" and the second is "y".
EDIT:
In Java, the non-greedy (reluctant) quantifier changes things so that it starts off trying to match nothing and then builds up from there.
Does "x,Key2=y" match ,Key2=(.*), no so try
Does ",Key2=y" match ,Key2=(.*), yes.
So when you've got 7 keys it doesn't need to unmatch 6 of them, which involves unmatching 5, which involves unmatching 4, .... It can do its job in one forward pass over the input.
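One thing to watch with the reluctant version: the final (.*?) matches as little as it can, so with find() it captures an empty string, while matches() forces it to run to the end of the input. A small sketch with two keys (class and method names are made up for the example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantGroups {
    static final Pattern P = Pattern.compile("Key1=(.*?),Key2=(.*?)");

    // matches() requires the whole input to match, so the last group is stretched to the end.
    static String[] viaMatches(String s) {
        Matcher m = P.matcher(s);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    // find() stops as soon as the pattern is satisfied, so the last group stays empty.
    static String[] viaFind(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String input = "Key1=x,Key2=y";
        System.out.println(String.join("|", viaMatches(input))); // x|y
        System.out.println(String.join("|", viaFind(input)));    // x|
    }
}
```

Anchoring the pattern with ^ and $ is another way to get the matches() behaviour while still calling find().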
I'm not going to say that there's no regex that will work for this, but it's most likely more complicated to write (and more importantly, read, for the next person that has to deal with the code) than it's worth. The closest I'm able to get with a regex is if you append a terminal comma to the string you're matching, i.e, instead of:
"Key1=value1,Key2=value2"
you would append a comma so it's:
"Key1=value1,Key2=value2,"
Then, the regex that got me the closest is: "(?:(\\w+?)=(\\S+?),)?+"...but this doesn't quite work if the values have commas, though.
You can try to continue tweaking that regex from there, but the problem I found is that there's a conflict in the behavior between greedy and reluctant quantifiers. You'd have to specify a capturing group for the value that is greedy with respect to commas up to the last comma prior to a non-capturing group composed of word characters followed by the equal sign (the next key)...and this last non-capturing group would have to be optional in case you're matching the last value in the sequence, and maybe itself reluctant. Complicated.
Instead, my advice is just to split the string on "=". You can get away with this because presumably the values aren't allowed to contain the equal sign character.
Now you'll have a bunch of substrings, each of which is a run of characters that make up a value, then the last comma in the substring, followed by a key. You can easily find the last comma in each substring using String.lastIndexOf(',').
Treat the first and last substrings specially (because the first one does not have a prepended value and the last one has no appended key) and you should be in business.
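The split-on-"=" approach described above could be sketched as follows. It assumes, as stated, that neither keys nor values contain "="; the class name, the method name, and the choice of a LinkedHashMap for the result are mine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EqualsSplitParser {
    // Parses "Key1=v1,Key2=v2,..." where values may themselves contain commas.
    static Map<String, String> parse(String s) {
        Map<String, String> result = new LinkedHashMap<>();
        // Limit -1 keeps a trailing empty piece, so an empty last value survives.
        String[] parts = s.split("=", -1);
        String key = parts[0]; // the first piece is just the first key
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i];
            if (i == parts.length - 1) {
                result.put(key, part); // the last piece is just the last value
            } else {
                // Middle pieces look like "<value>,<next key>".
                int comma = part.lastIndexOf(',');
                if (comma < 0) {
                    throw new IllegalArgumentException("No comma before next key in: " + part);
                }
                result.put(key, part.substring(0, comma));
                key = part.substring(comma + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("Key1=a,b,Key2=c,Key3=d,e"));
    }
}
```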
If you know you always have 7, the hack-of-least resistance is
^Key1=(.+),Key2=(.+),Key3=(.+),Key4=(.+),Key5=(.+),Key6=(.+),Key7=(.+)$
Try it out at http://www.fileformat.info/tool/regex.htm
I'm pretty sure there is a better way to parse this that goes through .find() rather than .matches(), and I think I would recommend it, as it lets you move down the string one key=value pair at a time. It does, however, bring you into the whole "greedy" evaluation discussion.
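A find()-based loop along those lines might look like the sketch below. The pattern (\w+)=([^,]*) is my own substitution, not something from the answers here: it assumes values never contain commas, which lets a negated character class stand in for .* and avoids the greedy/reluctant question altogether. The class and method names are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairFinder {
    // One key=value pair; the value stops at the next comma (assumption: no commas in values).
    static final Pattern PAIR = Pattern.compile("(\\w+)=([^,]*)");

    static Map<String, String> parse(String data) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(data);
        while (m.find()) { // each call moves on to the next pair
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("Key1=value,Key2=value,Key3=value"));
    }
}
```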
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems. - Jamie Zawinski
The simplest solution is the most robust.
final String data = "Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value";
final String[] pairs = data.split(",");
for (final String pair: pairs)
{
final String[] keyValue = pair.split("=", 2); // limit 2 keeps an empty value, e.g. "Key1="
final String key = keyValue[0];
final String value = keyValue[1];
}
I would like to search a String for an entire match. In other words, if String s = "I am coding", and I type in that I am searching for "am", nothing should get returned. I need the exact String in order to get a match. In other words, I would have to type in "I am coding" exactly in order for a match to be returned.
I need the regex pattern for this, since I am using RowFilter.regexFilter(...).
Have you tried this: ^I am coding$?
If the search text contains no characters that need escaping, the regex is exactly what you are looking for: each character matches itself, and adjacent characters mean concatenation. So, in this case, \AI am coding\z is your answer (as a Java string literal: "\\AI am coding\\z").
On the Regex side of things:
Use the start-of-string anchor ^ and the end-of-string anchor $ at the beginning and the end of your search pattern (respectively) to ensure that the matched string doesn't contain anything else (i.e. it equals the pattern you're trying to match). Regex:
^I am coding$
Ref: http://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm
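If the search text comes from user input, it may contain regex metacharacters. A minimal sketch that quotes the text before anchoring it is shown below. The helper names are hypothetical, and the check uses Matcher.find() because, as far as I know, that is what RowFilter.regexFilter uses internally, which is why the anchors are needed at all.

```java
import java.util.regex.Pattern;

class ExactMatch {
    // Builds a pattern that matches only strings equal to the given literal text.
    static Pattern exact(String literal) {
        return Pattern.compile("^" + Pattern.quote(literal) + "$");
    }

    // Mirrors a find()-based filter: true only when the row equals the search text.
    static boolean found(String search, String row) {
        return exact(search).matcher(row).find();
    }

    public static void main(String[] args) {
        System.out.println(found("I am coding", "I am coding")); // true
        System.out.println(found("am", "I am coding"));          // false
        System.out.println(found("a.c", "abc"));                 // false
    }
}
```

The string returned by ExactMatch.exact(text).pattern() could then be passed on as the regex argument of RowFilter.regexFilter(...).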