Regex pattern to match <c:if> conditional variable names? - java

I need to get the condition variable names for all cases in a particular JSP.
I am reading the JSP line by line and searching for a particular pattern. For example, a matching line can contain either of these two kinds of condition:
<c:if condition="Event ='Confirmation'">
<c:if condition="Event1 = 'Confirmation' or Event2 = 'Action'or Event3 = 'Check'" .....>
The desired result is the names of all condition variables: Event, Event1, Event2, Event3. I have written a parser that handles only the first case; it cannot find the variable names in the second case. I need a pattern that satisfies both of them.
String stringSearch = "<c:if";
while ((line = bf.readLine()) != null) {
// Increment the count and find the index of the word
lineCount++;
int indexfound = line.indexOf(stringSearch);
if (indexfound > -1) {
Pattern pattern = Pattern
.compile("test=\"([\\!\\(]*)(.*?)([\\=\\)\\s\\.\\>\\[\\(]+?)");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
str = matcher.group(1);
hset.add(str);
counter++;
}
}

If I understood your requirement well, this may work :
("|\s+)!?(\w+?)\s*=\s*'.*?'
$2 will give each condition variable name.
What it does is:
("|\s+) a " or one or more spaces
!? an optional !
(\w+?) one or more word characters (letter, digit or underscore); ([A-Za-z]\w*) would be more correct
\s*=\s* an = preceded and followed by zero or more spaces
'.*?' zero or more characters inside ' and '
Second capture group is (\w+?) retrieving the variable name
When you put the pattern in a Java string literal, add the required escaping: each \ becomes \\ and each " becomes \".
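A minimal sketch of how the pattern above could be used from Java, assuming each JSP line is already available as a String. The class name ConditionVariables and the method extract are made up for this illustration; only the regex comes from the answer.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConditionVariables {
    // The pattern from the answer, escaped for a Java string literal.
    private static final Pattern VAR =
            Pattern.compile("(\"|\\s+)!?(\\w+?)\\s*=\\s*'.*?'");

    // Collects group 2 of every match on one line, keeping first-seen order.
    static Set<String> extract(String line) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = VAR.matcher(line);
        while (m.find()) {
            names.add(m.group(2));
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [Event]
        System.out.println(extract("<c:if condition=\"Event ='Confirmation'\">"));
        // prints [Event1, Event2, Event3]
        System.out.println(extract(
                "<c:if condition=\"Event1 = 'Confirmation' or Event2 = 'Action'or Event3 = 'Check'\" .....>"));
    }
}
```

The loop with find() is what makes the second case work: a single find() call, as in the question's code, stops after the first variable.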
Edit: For the additional conditions you specified, the following may suffice:
("|or\s+|and\s+)!?(\w+?)(\[\d+\]|\..*?)?\s*(!?=|>=?|<=?)\s*.*?
("|or\s+|and\s+) A " or an or followed by one or more spaces or an and followed by one or more spaces. (Here, it is assumed that each expression part or variable name is preceded by a " or an or followed by one or more spaces or an and followed by one or more spaces)
!?(\w+?) An optional ! followed by one or more word character
(\[\d+\]|\..*?)? An optional part constituting a number enclosed in square brackets or a dot followed by zero or more characters
(!?=|>=?|<=?) Any of the following relational operators : =,!=,>,<,>=,<=
$2 will give the variable name.
Here second capture group is (\w+?) retrieving variable name and third capture group retrieves any suffix if present (eg:[2] in Event[2]).
For input containing a condition Event.indexOf(2)=something, $2 gives Event only. If you want it to be Event.indexOf(2) use $2$3.
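The difference between $2 and $2$3 can be sketched in Java as follows. The class name, the method extract and the sample line are hypothetical; the regex is the extended one from the edit above, escaped for a string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtendedConditionVariables {
    // Extended pattern from the answer, escaped for a Java string literal.
    private static final Pattern VAR = Pattern.compile(
            "(\"|or\\s+|and\\s+)!?(\\w+?)(\\[\\d+\\]|\\..*?)?\\s*(!?=|>=?|<=?)\\s*.*?");

    // Returns group 2 for each match, or group 2 + group 3 when withSuffix is true.
    static List<String> extract(String line, boolean withSuffix) {
        List<String> names = new ArrayList<>();
        Matcher m = VAR.matcher(line);
        while (m.find()) {
            String name = m.group(2);
            String suffix = m.group(3); // null when the optional group did not take part
            if (withSuffix && suffix != null) {
                name += suffix;
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical sample line with an index suffix and a method-call suffix.
        String line = "<c:if condition=\"Event[2] != 'x' and Event.indexOf(2)=something\">";
        System.out.println(extract(line, false)); // [Event, Event]
        System.out.println(extract(line, true));  // [Event[2], Event.indexOf(2)]
    }
}
```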

This could suit your needs:
"(\\w+)\\s*=\\s*(?!\")"
Which means:
Every word followed by a = that isn't followed by a " (so the condition=" attribute itself is skipped)
For example:
String s = "<c:if condition=\"Event ='Confirmation'\"><c:if condition=\"Event1 = 'Confirmation' or Event2 = 'Action'or Event3 = 'Check'\" .....>";
Pattern p = Pattern.compile("(\\w+)\\s*=\\s*(?!\")");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(1));
}
Prints:
Event
Event1
Event2
Event3

Related

Java Regex. group excluding delimiters

I'm trying to split my string using regex. It should include even zero-length matches before and after every delimiter. For example, if the delimiter is ^ and my string is ^^^, I expect to get 4 zero-length groups.
I can not use just regex = "([^\\^]*)" because it will include extra zero-length matches after every true match between delimiters.
So I decided to match the non-delimiter symbols that follow the beginning of the line or a delimiter. It works perfectly on https://regex101.com/ (sorry, I couldn't find a share option on that site to share my example), but in IntelliJ IDEA it skips one match.
So, now my code is:
final String regex = "(^|\\^)([^\\^]*)";
final String string = "^^^^";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find())
System.out.println("[" + matcher.start(2) + "-" + matcher.end(2) + "]: \"" + matcher.group(2) + "\"");
and I expect 5 empty-string matches. But I have only 4:
[0-0]: ""
[2-2]: ""
[3-3]: ""
[4-4]: ""
The question is why does it skip [1-1] match and how can I fix it?
Your regex matches either the start of the string or a ^ (capturing that into Group 1) and then any 0+ chars other than ^ into Group 2. When the first match is found (at the start of the string), Group 1 holds an empty string (as it matched the start of the string) and Group 2 also holds an empty string (as the first char is ^, and [^^]* can match an empty string before a non-matching char). The whole match is zero-length, so the regex engine advances the search index by one position, from the start of the string to the position after the first ^. Then the second match is found: the second ^ and the empty string after it. Hence, the first ^ is never matched; it is skipped.
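The skip can be made visible by printing the span of the whole match next to the span of Group 2. This is only a diagnostic sketch; the class and method names are invented, and the pattern and flag are the ones from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SkippedDelimiterTrace {
    // Records the span of the whole match and of group 2 for every find().
    static List<String> trace(String input) {
        Matcher m = Pattern.compile("(^|\\^)([^\\^]*)", Pattern.MULTILINE).matcher(input);
        List<String> spans = new ArrayList<>();
        while (m.find()) {
            spans.add("match " + m.start() + "-" + m.end()
                    + ", group 2 " + m.start(2) + "-" + m.end(2));
        }
        return spans;
    }

    public static void main(String[] args) {
        // For "^^^^" the whole-match spans are 0-0, 1-2, 2-3 and 3-4:
        // no match ever consumes the ^ at index 0 as a delimiter.
        for (String span : trace("^^^^")) {
            System.out.println(span);
        }
    }
}
```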
The solution is a simple split one:
String[] result = string.split("\\^", -1);
The negative second argument (the limit) makes split keep the trailing empty strings in the resulting array instead of discarding them.
See a Java demo:
String str = "^^^^";
String[] result = str.split("\\^", -1);
System.out.println("Number of items: " + result.length);
for (String s: result) {
System.out.println("\"" + s+ "\"");
}
Output:
Number of items: 5
""
""
""
""
""

Matcher cannot recognize the second group of regular expression in java

I've got a problem when using Matcher to find a symbol from a group of regular expressions: it cannot recognize the second group. Maybe the code below makes it clear:
public void set(String n){
String pat = "(\\d+)[!##$%^&*()_+-=}]";
Pattern r;
r = Pattern.compile(pat);
System.out.println(r);
Matcher m;
m = r.matcher(n);
if (m.find()) {
JOptionPane.showMessageDialog(null,
"Not a correct form", "ERROR_NAME_MATCH", 0);
}else{
name = n;
}
}
After running the code, the first group is recognized but the second one, [!##$%^&*()_+-=}], is not. I'm sure the expression is correct; I've checked it with RegexBuddy. There must be a problem with concatenating two or more groups in one line.
Thank you for your help.
Your regex - (\d+)[!##$%^&*()_+=}-] - matches a sequence of 1+ digits followed with a symbol from the specified set.
You want to test a string and return true if a single character from the specified set is present in the string.
So, just move \d into the character class, and be sure to move the - to the end of the class (in the middle, +-= is read as a range from + to =):
String pat = "[\\d!##$%^&*()_+=}-]";
               ^^^
If you need to match a digit or special char, use
String pat = "\\d|[!##$%^&*()_+=}-]";
If you need both irrespective of the order:
String pat = "^(?=\\D*\\d)(?=[^!##$%^&*()_+=}-]*[!##$%^&*()_+=}-])";
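As a rough sketch, the first suggestion could replace the check in set() like this. The class name NameCheck and the method isValidName are assumptions made for the example; the character class is copied from the answer. The small RANGE pattern is added only to show why the position of - matters.

```java
import java.util.regex.Pattern;

class NameCheck {
    // '-' sits at the end of the class, so it is a literal hyphen, not a range operator.
    private static final Pattern FORBIDDEN = Pattern.compile("[\\d!##$%^&*()_+=}-]");

    // With '-' between '+' and '=', the class contains every character from '+' to '='.
    private static final Pattern RANGE = Pattern.compile("[+-=]");

    // A name is accepted when it contains no digit and no listed symbol.
    static boolean isValidName(String n) {
        return !FORBIDDEN.matcher(n).find();
    }

    static boolean inRange(String s) {
        return RANGE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("John"));   // true
        System.out.println(isValidName("John1"));  // false: contains a digit
        System.out.println(isValidName("Jo-hn"));  // false: contains a literal '-'
        System.out.println(inRange(","));          // true: ',' lies between '+' and '='
    }
}
```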

Java Regex Word Extract exclude with special char

Below are the string values:
"method" <in> abs
("method") <in> abs
method <in> abs
I want to extract only the word method. I tried the regex below,
"(^[^\\<]*)"
but its result also includes the special characters. Output for the above regex:
"method"
("method")
method
My expected output:
method
method
method
^\\W*(\\w+)
You can use this and grab group 1 (the first capture). See demo.
https://regex101.com/r/sS2dM8/20
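In Java, grabbing group 1 of that pattern might look like the sketch below; the class and method names are invented for the example. Note that \w+ stops at the first non-word character, so a multi-word value such as "My method" would yield only My.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstWord {
    // Skip any leading non-word characters, then capture the first run of word characters.
    private static final Pattern FIRST_WORD = Pattern.compile("^\\W*(\\w+)");

    // Returns the first word of the input, or null when there is none.
    static String firstWord(String input) {
        Matcher m = FIRST_WORD.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "\"method\" <in> abs",
            "(\"method\") <in> abs",
            "method <in> abs"
        };
        for (String s : inputs) {
            System.out.println(firstWord(s)); // method, three times
        }
    }
}
```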
A couple of words on your "(^[^<]*)" regex: it does not match because it has the beginning-of-string anchor ^ after ", which can never match. However, even if you remove the anchor, "([^<]*) will not match the last case, where " and ( are missing. You need to make them optional. Also note that the brackets must be escaped, and that the order of quotes and brackets differs from the one in your input.
So, your regex could be fixed as
^\(?"?(\b[^<]*)\b"?\)?(?=\s+<)
See demo
However, I'd suggest using a replaceAll approach:
String rx = "(?s)\\(?\"?(.*?)\"?\\)?\\s+<.*";
System.out.println("\"My method\" <in> abs".replaceAll(rx, "$1"));
See IDEONE demo
If the strings start with ("My method, you can also add ^ to the beginning of the pattern: String rx = "(?s)^\\(?\"?(.*?)\"?\\)?\\s+<.*";.
The regex (?s)^\\(?\"?(.*?)\"?\\)?\\s+<.* matches:
(?s) makes . match a newline symbol (may not be necessary)
^ - matches the beginning of a string
\\(? - matches an optional (
\"? - matches an optional "
(.*?) - matches and captures into Group 1 any characters as few as possible
\"? - matches an optional "
\\)? - matches an optional )
\\s+ - matches 1 or more whitespace
< - matches a <
.* - matches 0 or more characters to the end of string.
With $1, we restore the group 1 text in the resulting string.
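Applied to all of the inputs from the question, the replaceAll approach could be exercised as in this sketch. The class name QuoteStripper and the method extract are hypothetical; the pattern is the anchored variant described above.

```java
class QuoteStripper {
    // Anchored variant of the pattern, escaped for a Java string literal.
    private static final String RX = "(?s)^\\(?\"?(.*?)\"?\\)?\\s+<.*";

    // Replaces the whole input with the text captured in group 1.
    static String extract(String input) {
        return input.replaceAll(RX, "$1");
    }

    public static void main(String[] args) {
        System.out.println(extract("\"method\" <in> abs"));    // method
        System.out.println(extract("(\"method\") <in> abs"));  // method
        System.out.println(extract("method <in> abs"));        // method
        System.out.println(extract("\"My method\" <in> abs")); // My method
    }
}
```

If the input does not match at all, replaceAll returns it unchanged, so callers may want to check for a match first.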
In fact it is not too complicated.
Here is my answer:
Pattern pattern = Pattern.compile("([a-zA-Z]+)");
String[] myStrs = {
"\"method\"",
"(\"method\")",
"method"
};
for(String s:myStrs) {
Matcher matcher = pattern.matcher(s);
if(matcher.find()) {
System.out.println( matcher.group(0) );
}
}
The output is:
method
method
method
You just need to use:
[a-zA-Z]+

Java Regex to match repeated keywords

I need to filter a document if its caption has the same surname on both sides (e.g., Smith Vs Smith or John Vs John).
I am converting entire document into a string and validating that string against a regular expression.
Could any one help me to write a regular expression for the above case.
Backreferences.
Example: (\w+) Vs \1
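A short Java sketch of the backreference idea. The \b word boundaries are an addition that is not part of the example above: without them, (\w+) Vs \1 also accepts John Vs Johnson, because \1 only has to match the start of the second name. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class SameSurname {
    // The pattern as given in the answer.
    private static final Pattern PLAIN = Pattern.compile("(\\w+) Vs \\1");
    // Assumed refinement: word boundaries force whole-word comparison.
    private static final Pattern WHOLE_WORD = Pattern.compile("\\b(\\w+) Vs \\1\\b");

    static boolean plainMatch(String document) {
        return PLAIN.matcher(document).find();
    }

    static boolean sameSurname(String document) {
        return WHOLE_WORD.matcher(document).find();
    }

    public static void main(String[] args) {
        System.out.println(sameSurname("In the case of Smith Vs Smith, ...")); // true
        System.out.println(sameSurname("Smith Vs Jones"));                      // false
        System.out.println(plainMatch("John Vs Johnson"));                      // true
        System.out.println(sameSurname("John Vs Johnson"));                     // false
    }
}
```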
If I have understood your question correctly, you have a string like "X Vs Y" (where X and Y are two names) and you want to know whether X equals Y.
In this case, a simple (\w+) regex can do it :
String input = "Smith Vs Smith";
// Build the Regex
Pattern p = Pattern.compile("(\\w+)");
Matcher m = p.matcher(input);
// Store the matches in a list
List<String> str = new ArrayList<String>();
while (m.find()) {
if (!m.group().equals("Vs"))
{
str.add(m.group());
}
}
// Test the matches
if (str.size()>1 && str.get(0).equals(str.get(1)))
System.out.println(" The Same ");
else System.out.println(" Not the Same ");
(\w+).*\1
This means: a word of 1 or more characters, captured as group 1, followed by anything, followed by whatever group 1 matched.
In more detail: parentheses group part of the regex, and \1 refers back to the first group defined in the expression.
Example:
String s = "Stewie is a good guy. Stewie does no bad things";
Pattern.compile("(\\w+).*\\1").matcher(s).find(); // true, and group 1 is the duplicated word (note the additional Java escaping)

Can't retrieve data from matched * group in Java

I'm having trouble figuring out the proper regex.
Here is some sample code:
@Test
public void testFindEasyNaked() {
System.out.println("Naked_find");
String arg = "hi mom <us-patent-grant seq=\"002\" image=\"D000001\" >foo<name>Fred</name></us-patent-grant> extra stuff";
String nakedPat = "<(us-patent-grant)((\\s*[\\S&&[^>]])*)*\\s*>(.+?)</\\1>";
System.out.println(nakedPat);
Pattern naked = Pattern.compile(nakedPat, Pattern.MULTILINE + Pattern.DOTALL );
Matcher m = naked.matcher(arg);
if (m.find()) {
System.out.println("found naked");
for (int i = 0; i <= m.groupCount(); i++) {
System.out.printf("%d: %s\n", i, m.group(i));
}
} else {
System.out.println("can't find naked either");
}
System.out.flush();
}
My regex matches the string, but I am not able to pull the repeated pattern.
What I want is to have
seq=\"002\" image=\"D000001\"
pulled out as a group. Here is what the program shows when I execute it.
Naked_find
<(us-patent-grant)((\s*[\S&&[^>]])*)*\s*>(.+?)</\1>
found naked
0: <us-patent-grant seq="002" image="D000001" >foo<name>Fred</name></us-patent-grant>
1: us-patent-grant
2:
3: "
4: foo<name>Fred</name>
The group #4 is fine, but where is the data for #2 and #3, and why is there a double quote in #3?
Thanks
Pat
Even if using an XML parser would be sound, I think I can explain the error in your regular expression:
String nakedPat = "<(us-patent-grant)((\\s*[\\S&&[^>]])*)*\\s*>(.+?)</\\1>";
You try to match the attributes with the part ((\\s*[\\S&&[^>]])*)*. Look at your innermost group: you have \\s* ("zero or more whitespace characters") followed by [\\S&&[^>]] ("one non-whitespace character that is not >"). It means that each repetition of this group matches zero or more spaces followed by a single non-space character.
So this will match any non-space character between "us-patent-grant" and >. Every time the regular expression engine repeats the group, it assigns the new value to group 3, so the previously captured values are lost. That's why you get only the last character of the tag's attributes, that is, ". Judging by your output, group 2 is emptied in a similar way: the outer * makes one final iteration that matches the empty string and overwrites the earlier capture.
You can improve it a bit by adding a + after [\\S&&[^>]], so that each repetition matches a complete run of non-spaces, but you would still obtain only the last attribute in your group. There is a better and simpler way:
Since your goal is to pull out seq="002" image="D000001" as one group, simply match the sequence of all characters that are not > after "us-patent-grant":
"<(us-patent-grant)\\s*([^>]*)\\s*>(.+?)</\\1>"
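This way, you have the following values in your groups (group 2 also keeps the space before >, because the greedy [^>]* matches whitespace too; trim it if needed):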
Group 1: us-patent-grant
Group 2: seq=\"002\" image=\"D000001\"
Group 3: foo<name>Fred</name>
Here is the test on Regexplanet: http://fiddle.re/ezfd6
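For completeness, a small Java sketch of the simplified pattern run against the string from the question. The class name TagAttributes and the method groups are invented for this example, and trim() is an extra step added here to drop the trailing space that [^>]* captures before the >.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagAttributes {
    // Simplified pattern from the answer; DOTALL lets (.+?) span line breaks.
    private static final Pattern TAG = Pattern.compile(
            "<(us-patent-grant)\\s*([^>]*)\\s*>(.+?)</\\1>", Pattern.DOTALL);

    // Returns {tag name, attributes, body}, or null when the tag is not found.
    static String[] groups(String input) {
        Matcher m = TAG.matcher(input);
        if (!m.find()) {
            return null;
        }
        // trim() removes the space that the greedy [^>]* keeps before '>'.
        return new String[] { m.group(1), m.group(2).trim(), m.group(3) };
    }

    public static void main(String[] args) {
        String arg = "hi mom <us-patent-grant seq=\"002\" image=\"D000001\" >"
                + "foo<name>Fred</name></us-patent-grant> extra stuff";
        for (String g : groups(arg)) {
            System.out.println(g);
        }
    }
}
```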
