Best way to detect logical connectors - java

I'm trying to detect logical connectors in a string (AND, OR, NOT) using Java. What I would like to do is:
Given a string (e.g. ((blue) AND (yellow) OR (pink))), separate each word and put them in a List. The result should be something like {"blue","yellow","pink"}.
I know that to match the connectors, I need to use a regex like \b(AND|OR|NOT)\b.
But I don't know how to return each word after or before the connector.
Another question: is it useful to use a regex here, or should I use contains()?

How about this?
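String s = "((blue) AND (yellow) OR (pink))";
s = s.replaceAll("\\(|\\)", "");                           // remove the parentheses: "blue AND yellow OR pink"
String[] words = s.split("\\s*\\b(?:AND|OR|NOT)\\b\\s*");  // split on whole-word connectors and the spaces around them
System.out.println(Arrays.toString(words));                // requires import java.util.Arrays
Output:
[blue, yellow, pink]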
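If you'd rather not rely on split at all, the operands can be collected directly with Matcher.find(). This is a minimal sketch that assumes every operand is a run of word characters (\w+); the class and method names are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractOperands {
    // A whole word that is not exactly AND, OR or NOT. The negative lookahead
    // rejects a connector only when it is a complete word, so "ORANGE" is kept.
    static final Pattern OPERAND = Pattern.compile("\\b(?!(?:AND|OR|NOT)\\b)\\w+\\b");

    static List<String> operands(String expr) {
        List<String> result = new ArrayList<>();
        Matcher m = OPERAND.matcher(expr);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(operands("((blue) AND (yellow) OR (pink))")); // [blue, yellow, pink]
        System.out.println(operands("NOT (ORANGE) AND (blue)"));        // [ORANGE, blue]
    }
}
```

Because the connectors are never matched, there are no empty strings or leftover spaces to clean up afterwards.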

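String s = "((blue) AND (yellow) OR (pink))";
String[] parts = s.split("\\bAND\\b|\\bNOT\\b|\\bOR\\b"); // "\\b" is the regex word boundary; the parentheses stay attached to each part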

You could try using string.split("AND|OR|NOT");.
EDIT: oops, forgot \b (inside a Java string literal it has to be written \\b, since "\b" is a backspace character):
string.split("\\b(AND|OR|NOT)\\b");
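The word boundaries matter as soon as an operand contains a connector's letters. This small comparison (class and method names are made up for the demo) splits the same input with and without \\b. The same problem is why contains() on its own is a poor fit: "(ORANGE)".contains("OR") returns true.

```java
import java.util.Arrays;

public class SplitOnConnectors {
    // Splits on the connector letters wherever they occur, even inside other words.
    static String[] splitNaive(String s) {
        return s.split("AND|OR|NOT");
    }

    // Splits only on whole-word connectors. In a Java string literal "\b" is a
    // backspace character, so the regex word boundary must be written "\\b".
    static String[] splitWholeWord(String s) {
        return s.split("\\b(?:AND|OR|NOT)\\b");
    }

    public static void main(String[] args) {
        String s = "(ORANGE) AND (blue)";
        System.out.println(Arrays.toString(splitNaive(s)));     // [(, ANGE) ,  (blue)]
        System.out.println(Arrays.toString(splitWholeWord(s))); // [(ORANGE) ,  (blue)]
    }
}
```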

Parsing this kind of string is not a task for regular expressions: a regular expression corresponds to a finite automaton, which cannot keep track of arbitrarily deep nesting of parentheses.
You need some sort of pushdown automaton for this task.
http://en.wikipedia.org/wiki/Pushdown_automaton
The easiest way to do this is with recursion that identifies a "logical string structure"
((...) AND (...), (...) OR (...), NOT (...), etc.),
strips the parentheses, and repeats until you reach a string that doesn't match this kind of structure.
That string is what you're looking for.
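The recursive approach can be sketched as a small recursive-descent parser, where the Java call stack plays the role of the pushdown automaton's stack. This is one possible implementation, not the answerer's code: the grammar (NOT binds tighter than AND, which binds tighter than OR), the tokenizer, and the names OperandParser/operands are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OperandParser {
    private final List<String> tokens = new ArrayList<>();
    private final List<String> found = new ArrayList<>();
    private int pos = 0;

    private OperandParser(String input) {
        // Tokens are "(", ")" or a run of word characters; whitespace is skipped.
        Matcher m = Pattern.compile("[()]|\\w+").matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
    }

    /** Returns the operand words of a boolean expression, in order. */
    static List<String> operands(String input) {
        OperandParser p = new OperandParser(input);
        p.expr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos));
        }
        return p.found;
    }

    // expr := term (OR term)*
    private void expr() {
        term();
        while (accept("OR")) term();
    }

    // term := factor (AND factor)*
    private void term() {
        factor();
        while (accept("AND")) factor();
    }

    // factor := NOT factor | "(" expr ")" | word
    private void factor() {
        if (accept("NOT")) {
            factor();
        } else if (accept("(")) {
            expr();      // the recursion handles arbitrarily deep nesting
            expect(")");
        } else {
            String t = next();
            if (t.equals(")") || t.equals("AND") || t.equals("OR")) {
                throw new IllegalArgumentException("Expected an operand, got: " + t);
            }
            found.add(t);
        }
    }

    private boolean accept(String t) {
        if (pos < tokens.size() && tokens.get(pos).equals(t)) {
            pos++;
            return true;
        }
        return false;
    }

    private void expect(String t) {
        if (!accept(t)) throw new IllegalArgumentException("Expected " + t);
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
        return tokens.get(pos++);
    }

    public static void main(String[] args) {
        System.out.println(operands("((blue) AND (yellow) OR (pink))")); // [blue, yellow, pink]
        System.out.println(operands("NOT (red OR (green AND blue))"));  // [red, green, blue]
    }
}
```

Unbalanced input such as "((blue)" throws an IllegalArgumentException instead of silently returning a partial list, which the split-based answers cannot detect.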

Related

Java get pattern of a String

Is it possible to detect the pattern of a String and store it in a variable? For example, if I have the String test1234 and highlight 1234, I'd expect something like \d{4}.
It would require that you find a regular expression that both your highlighted substring and the desired replacement match, and such a regular expression is in no way unique. For example, "1234" could match .{4} or \d{4} or even .+, which does not even fix the length. So, even if you could generate a regular expression from a string, it could turn out to be the string itself or something you didn't want. Maybe you should rethink the general desired outcome of your program and try to come up with a different way of solving the issue at hand.
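To make the non-uniqueness concrete, this sketch checks several different regexes against "1234" (all of them match) and shows one hypothetical heuristic, generalize, that turns digit runs into \d{n} and letter runs into [a-zA-Z]{n}. It is only one arbitrary choice among many, which is exactly the problem described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternGuess {
    // Hypothetical heuristic: each run of ASCII digits becomes \d{n}, each run of
    // ASCII letters becomes [a-zA-Z]{n}, and any other character is quoted literally.
    static String generalize(String s) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile("[0-9]+|[a-zA-Z]+|.", Pattern.DOTALL).matcher(s);
        while (m.find()) {
            String run = m.group();
            char c = run.charAt(0);
            if (c >= '0' && c <= '9') {
                sb.append("\\d{").append(run.length()).append('}');
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
                sb.append("[a-zA-Z]{").append(run.length()).append('}');
            } else {
                sb.append(Pattern.quote(run));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Every one of these different patterns matches "1234".
        for (String regex : new String[] {"\\d{4}", ".{4}", ".+", "\\d+", "1234"}) {
            System.out.println(regex + " -> " + "1234".matches(regex)); // true each time
        }
        System.out.println(generalize("1234"));     // \d{4}
        System.out.println(generalize("test1234")); // [a-zA-Z]{4}\d{4}
    }
}
```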
Hope that helped. Good luck!

Best delimiter to separate multiple regexes

I need to put multiple regular expressions in a single string and then parse them back as separate regexes, like below:
regex1<!>regex2<!>regex3
The problem is that I am not sure which delimiter is best to use in place of the <!> shown in the example, so that I can safely split the string when parsing it back.
The constraints are that I cannot split the string across multiple lines or use an XML or JSON string, because this string of expressions should be easily configurable.
Looking forward to any suggestions.
Edit:
Q: Why does it have to be a single string?
A: The system has a configuration manager that loads config from a properties file, and the properties file contains lines like
com.some.package.Class1.Field1: value
com.some.package.Class1.Expressions: exp1<!>exp2<!>exp3
There is no way to write the value in multiple lines in the properties file. That's why.
The best way would be to use an invalid regex as the delimiter, such as **, because if it appears in a normal regex, that regex won't compile and would throw an exception (note: ++ would not work, because a++ is a valid possessive quantifier).
regex1+"**"+regex2
Now you can split it with this regex (written as a Java string literal):
(?<!\\\\)[*][*](?![*])
Here (?<!\\\\) checks that the first * is not escaped, and (?![*]) avoids matching the wrong pair of stars in a pattern like "A*"+"**"+"n+".
The following are invalid regexes:
[+
(+
[*
(*
[?
*+
** (delimiter would be (?<!\\\\)[*][*](?![*]))
??(delimiter would be (?<!\\\\)[?][?](?![?]))
While splitting, you need to check that the delimiter is not escaped:
(?<!\\\\)delimiter
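A runnable sketch of the ** approach, using the delimiter regex from this answer; the class and method names and the sample expressions are made up for illustration. It splits a config value and compiles each piece, so a bad entry surfaces as a PatternSyntaxException:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class DelimitedRegexes {
    // "**" that is not preceded by a backslash and not followed by a third "*".
    static final Pattern DELIMITER = Pattern.compile("(?<!\\\\)[*][*](?![*])");

    static List<Pattern> parse(String config) {
        List<Pattern> patterns = new ArrayList<>();
        for (String part : DELIMITER.split(config)) {
            patterns.add(Pattern.compile(part)); // throws PatternSyntaxException for a bad entry
        }
        return patterns;
    }

    public static void main(String[] args) {
        // Three sample expressions: "A*", "n+" and "a\*b" (an escaped star).
        String config = "A*" + "**" + "n+" + "**" + "a\\*b";
        for (Pattern p : parse(config)) {
            System.out.println(p.pattern());
        }
        // prints:
        // A*
        // n+
        // a\*b
    }
}
```

In "A***n+" the lookahead rejects the first two stars, so the split happens on the last two and "A*" keeps its quantifier.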
The best delimiter depends on your requirements. But as a best practice, use a sequence of special characters so that the probability of that sequence occurring inside a regex is minimal,
like
$$**##$$
#$%&&%$#
I think this is helpful for you.
First replace the tag-like <!> delimiter with a single special character, and then split on it:
String inputString = "regex1<!>regex2<!>regex3";
String noHTMLString = inputString.replaceAll("\\<.*?>", "-");  // every <...> becomes "-": "regex1-regex2-regex3"
String[] splitString1 = noHTMLString.split("[-]+");             // breaks if a regex itself contains '-' or <...>
for (String string : splitString1) {
    System.out.println(string);
}
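The replace-then-split step is not strictly necessary: String.split takes a regex, and Pattern.quote turns the literal <!> delimiter into a regex that matches it exactly (it makes no difference for <!> itself, but it does for delimiters containing metacharacters). A minimal sketch, assuming none of the configured expressions contains the text <!> (there is no escaping here); the class name is illustrative:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SplitOnLiteralDelimiter {
    static String[] split(String value) {
        // Pattern.quote("<!>") yields a regex that matches the three characters literally.
        return value.split(Pattern.quote("<!>"));
    }

    public static void main(String[] args) {
        String value = "regex1<!>regex2<!>regex3";
        for (String expr : split(value)) {
            try {
                Pattern.compile(expr); // check that each entry really is a valid regex
                System.out.println(expr);
            } catch (PatternSyntaxException e) {
                System.out.println("invalid: " + expr);
            }
        }
    }
}
```

Unlike the replaceAll version, this keeps expressions that contain '-' or a named group such as (?<x>c) intact.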

understanding regex if then statements

So I'm not sure if I understand how this works and would like a simple explanation of how they work. I probably have it way off. A pure regex solution is required, and I don't know if this is possible. If it is, a solution would be awesome too, but a shove in the right direction would be good for my learning process ^_^
This is how I thought the if/then/else option built into my regex engine was formatted:
?(condition)if regex|else regex
I want it to capture a string from a very specific location only when this string exists within a certain section of JavaScript. Because this is how I thought it worked after a decent amount of research, I tried out a few variations of this code, but they all ended up something like this:
((?^view_large$)Tables-137(.*?)search.htm)
Also of relevance: I'm using a Java-based app that has regex searches which pull the data I need, so I cannot write an if statement in Java, which would be my preferred method. It's a pain to have to do it this way, but at the moment I have no other choice. I'm trying really hard to get them to allow Java code functionality instead of pure regex for more versatile options.
So to summarize, is there even a if/then option in regex and if so how is it formatted for what I'm trying to accomplish?
EDIT: The string that I want to be the "if condition" is like this: if the view_large string exists and is not null, then capture the exact string 500/, which is captured within the catch-all group I used: (.*?)
There are no conditionals in Java regular expressions, but you can simulate them by writing two alternatives that include mutually exclusive lookbehind constructs, like this:
((?<=if )then)|((?<!if )end)
This expression will match "then" when it is preceded by "if ", and it will match "end" when it is not preceded by "if ".
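A minimal Java harness for that pattern, recording which branch matched and where; the sample input and the class name are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimulatedConditional {
    // Branch 1 matches "then" only right after "if "; branch 2 matches "end"
    // only when it is NOT right after "if ". The two lookbehinds are mutually exclusive.
    static final Pattern P = Pattern.compile("((?<=if )then)|((?<!if )end)");

    static List<String> matches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            // group(1) is set for the "then" branch, group(2) for the "end" branch
            found.add((m.group(1) != null ? "then@" : "end@") + m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(matches("if then, then, if end, end")); // [then@3, end@23]
    }
}
```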
The Javadoc for java.util.regex.Pattern mentions, in its list of "Perl constructs not supported by this class":
The conditional constructs (?(condition)X) and (?(condition)X|Y).
So, no dice. But you should look through the Javadoc to see if you can achieve what you need by using regex features that it does support. (Or, if you post some more detailed examples, we can try to help.)
Try lookaround assertions.
For example, say you want to capture FOOBAR only if there is a 4+ digit number somewhere:
(?=.*\d{4}).*(FOOBAR)
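Here is that lookahead pattern as Java code, plus a hypothetical adaptation to the question's view_large case. The sample JavaScript text, the Tables-137500/search.htm URL and the class name are invented for the demo; ^ anchors the lookahead so it checks the whole input, and (?s) lets . cross line breaks:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadCondition {
    // Captures FOOBAR only if a 4+ digit number occurs somewhere
    // (in a Java string literal \d has to be written \\d).
    static final Pattern FOOBAR = Pattern.compile("(?=.*\\d{4}).*(FOOBAR)");

    // Hypothetical: capture the text between "Tables-137" and "search.htm",
    // but only if "view_large" occurs anywhere in the input.
    static final Pattern VIEW_LARGE =
            Pattern.compile("(?s)^(?=.*view_large).*?Tables-137(.*?)search\\.htm");

    static String capture(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capture(FOOBAR, "id 12345 FOOBAR")); // FOOBAR
        System.out.println(capture(FOOBAR, "id 12 FOOBAR"));    // null
        String js = "size = 'view_large';\nurl = 'Tables-137500/search.htm';";
        System.out.println(capture(VIEW_LARGE, js));            // 500/
        System.out.println(capture(VIEW_LARGE, js.replace("view_large", "view_small"))); // null
    }
}
```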

Pattern match numbers/operators

Hey, I've been trying to figure out why this regular expression isn't matching correctly.
List<String> l_operators = Arrays.asList(Pattern.compile("(\\d+)").split(rtString.trim()));
The input string is "12+22+3"
The output I get is [, +, +]
There's an empty element at the beginning of the list which shouldn't be there. I really can't see why, and I could use some insight. Thanks.
Well, technically, there is an empty string in front of the first delimiter (first sequence of digits). If you had, say a line of CSV, such as abc,def,ghi and another one ,jkl,mno you would clearly want to know that the first value in the second string was the empty string. Thus the behaviour is desirable in most cases.
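A quick way to see this behaviour: a delimiter match at the very start gives an empty first element, while trailing empty strings are removed unless you pass a negative limit. The helper name show is just for the demo:

```java
import java.util.Arrays;

public class SplitLeadingEmpty {
    // Renders the result of s.split(regex, limit) the way Arrays.toString does.
    static String show(String s, String regex, int limit) {
        return Arrays.toString(s.split(regex, limit));
    }

    public static void main(String[] args) {
        // A delimiter match at the very start yields an empty first element.
        System.out.println(show(",jkl,mno", ",", 0));    // [, jkl, mno]
        System.out.println(show("12+22+3", "\\d+", 0));  // [, +, +]
        // Trailing empty strings are only dropped when limit is 0 (what split(regex) uses).
        System.out.println(show("12+22+3", "\\d+", -1)); // [, +, +, ]
    }
}
```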
For your particular case, you need to deal with it manually, or refine your regular expression somehow. Like this for instance:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(rtString);
if (m.find()) {
List<String> l_operators = Arrays.asList(p.split(rtString.substring(m.end()).trim())); // skip the leading number, then split the rest
// ...
}
Ideally, however, you should be using a parser for this type of string. For instance, you can't deal with parentheses in expressions using just regular expressions.
That's the behavior of split in Java. You just have to accept it (and deal with it) or use another library to split the string. I personally try to avoid Java's split.
An example of one alternative is to look at Splitter from Google Guava.
Try Guava's Splitter.
Splitter.onPattern("\\d+").omitEmptyStrings().split(rtString)

replacing regex in java string

I have this java string:
String bla = "<my:string>invalid_content</my:string>";
How can I replace the "invalid_content" piece?
I know I should use something like this:
bla.replaceAll(regex,"new_content");
in order to have:
"<my:string>new_content</my:string>";
but I can't discover how to create the correct regex
help please :)
You could do something like
String ResultString = subjectString.replaceAll("(<my:string>)(.*)(</my:string>)", "$1whatever$3");
Mark's answer will work, but can be improved with two simple changes:
The central parentheses are redundant if you're not using that group.
Making it non-greedy will help if you have multiple my:string tags to match.
Giving:
String ResultString = SubjectString.replaceAll
( "(<my:string>).*?(</my:string>)" , "$1whatever$2" );
But that's still not how I'd write it - the replacement can be simplified using lookbehind and lookahead, and you can avoid repeating the tag name, like this:
String ResultString = SubjectString.replaceAll
( "(?<=<(my:string)>).*?(?=</\\1>)" , "whatever" );
Of course, this latter one may not be as friendly to those who don't yet know regex - it is however more maintainable/flexible, so worth using if you might need to match more than just my:string tags.
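The three variants from this thread can be compared side by side, including what happens with two my:string tags. A small harness (class and method names are illustrative); note that the back-reference has to be written "\\1" inside a Java string literal, because "\1" is an octal escape:

```java
public class ReplaceTagContent {
    static String greedy(String s) {
        return s.replaceAll("(<my:string>)(.*)(</my:string>)", "$1whatever$3");
    }

    static String lazy(String s) {
        return s.replaceAll("(<my:string>).*?(</my:string>)", "$1whatever$2");
    }

    static String lookaround(String s) {
        // "\\1" is the regex back-reference to the tag name captured in the lookbehind.
        return s.replaceAll("(?<=<(my:string)>).*?(?=</\\1>)", "whatever");
    }

    public static void main(String[] args) {
        String one = "<my:string>invalid_content</my:string>";
        System.out.println(greedy(one));     // <my:string>whatever</my:string>
        System.out.println(lazy(one));       // <my:string>whatever</my:string>
        System.out.println(lookaround(one)); // <my:string>whatever</my:string>

        String two = "<my:string>a</my:string><my:string>b</my:string>";
        System.out.println(greedy(two)); // <my:string>whatever</my:string>
        System.out.println(lazy(two));   // <my:string>whatever</my:string><my:string>whatever</my:string>
    }
}
```

With two tags, the greedy .* runs from the first opening tag to the last closing tag and swallows everything in between, which is the case the non-greedy .*? fixes.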
See the Java regex tutorial and check out character classes and capturing groups.
The PCRE would be /invalid_content/ for a simple substitution. What more do you want?
Is invalid_content a fixed value? If so, you could simply replace it with your new content using:
bla = bla.replaceAll("invalid_content","new_content");
