I am looking for a way to expand a logical expression (in a string) of the form:
'(A or B) and ((C and D) or E)'
in Python to produce a list of all positive sets, i.e.
['A and C and D',
'A and E',
'B and C and D',
'B and E']
but I have been unable to find out how to do this. I have investigated pyparsing, but I cannot work out which example is relevant in this case. This may be very easy with some sort of logic manipulation, but I do not know any formal logic. Any help, or a reference to a resource that might help, would be greatly appreciated.
Here's the pyparsing bit, taken from the example SimpleBool.py. First, use infixNotation (formerly known as operatorPrecedence) to define an expression grammar that supports parenthetical grouping, and recognizes precedence of operations:
from pyparsing import *
term = Word(alphas)
AND = Keyword("and")
OR = Keyword("or")
expr = infixNotation(term,
    [
        (AND, 2, opAssoc.LEFT),
        (OR, 2, opAssoc.LEFT),
    ])
sample = '(A or B) and ((C and D) or E)'
result = expr.parseString(sample)
from pprint import pprint
pprint(result.asList())
prints:
[[['A', 'or', 'B'], 'and', [['C', 'and', 'D'], 'or', 'E']]]
From this, we can see that the expression is at least parsed properly.
Next, we add parse actions to each level of the hierarchy of operations. For parse actions here, we actually pass classes, so that instead of executing functions and returning some value, the parser will call the class constructor and initializer and return a class instance for the particular subexpression:
class Operation(object):
    def __init__(self, tokens):
        self._tokens = tokens[0]
        self.assign()

    def assign(self):
        """
        function to copy tokens to object attributes
        """

    def __repr__(self):
        return self.__class__.__name__ + ":" + repr(self.__dict__)
    __str__ = __repr__

class BinOp(Operation):
    def assign(self):
        self.op = self._tokens[1]
        self.terms = self._tokens[0::2]
        del self._tokens

class AndOp(BinOp):
    pass

class OrOp(BinOp):
    pass

expr = infixNotation(term,
    [
        (AND, 2, opAssoc.LEFT, AndOp),
        (OR, 2, opAssoc.LEFT, OrOp),
    ])
sample = '(A or B) and ((C and D) or E)'
result = expr.parseString(sample)
pprint(result.asList())
prints:
[AndOp:{'terms': [OrOp:{'terms': ['A', 'B'], 'op': 'or'},
OrOp:{'terms': [AndOp:{'terms': ['C', 'D'],
'op': 'and'}, 'E'], 'op': 'or'}],
'op': 'and'}]
Now that the expression has been converted to a data structure of subexpressions, I leave it to you to do the work of adding methods to AndOp and OrOp to generate the various combinations of terms that will evaluate overall to True. (Look at the logic in the invregex.py example that inverts regular expressions for ideas on how to add generator functions to the parsed classes to generate the different combinations of terms that you want.)
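To illustrate the step left as an exercise, here is a minimal sketch (my own, using stand-in classes rather than the actual parse-action results) of a generator that walks the AndOp/OrOp tree and yields the combinations of terms that make the whole expression True:

```python
from itertools import product

# Minimal stand-ins for the AndOp/OrOp classes built by the parse actions;
# in the real answer these instances come from expr.parseString(...).
class AndOp:
    def __init__(self, terms): self.terms = terms
class OrOp:
    def __init__(self, terms): self.terms = terms

def combos(node):
    """Yield lists of terms whose conjunction satisfies `node`."""
    if isinstance(node, str):               # a bare term like 'A'
        yield [node]
    elif isinstance(node, OrOp):            # any one alternative may hold
        for t in node.terms:
            yield from combos(t)
    elif isinstance(node, AndOp):           # every operand must hold
        for parts in product(*(combos(t) for t in node.terms)):
            yield [term for part in parts for term in part]

# (A or B) and ((C and D) or E)
tree = AndOp([OrOp(['A', 'B']), OrOp([AndOp(['C', 'D']), 'E'])])
print([' and '.join(c) for c in combos(tree)])
# prints ['A and C and D', 'A and E', 'B and C and D', 'B and E']
```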
It sounds as if you want to convert these expressions to Disjunctive Normal Form. A canonical algorithm for doing that is the Quine-McCluskey algorithm; you can find some information about Python implementations thereof in the relevant Wikipedia article and in the answers to this SO question.
Related
I've pulled an expression from a SQL db, and that expression is stored as a String in a JMeter Beanshell Assertion.
Now I would like to use that expression to evaluate different values:
String LeftTank_conv = vars.get("Conv_formula_5") + ""; // LeftTank_conv is now something like "{x*100*245/345;}"
How can I use this string as an expression to evaluate for different x values?
TIA
If you really need the possibility to evaluate arbitrary arithmetic formulae, you could use the JEXL engine, which is what JMeter's __jexl3() function uses under the hood.
Since JMeter 3.1 it's recommended to use Groovy for scripting, so I'll provide an example solution using Groovy syntax:
def context = new org.apache.commons.jexl3.MapContext()
context.set('x', 1)
def engine = new org.apache.commons.jexl3.JexlBuilder().create()
def e = engine.createExpression(vars.get('Conv_formula_5'))
def result = e.evaluate(context)
log.info('Execution result: ' + result as String)
TL;DR VERSION: Is there a parser generator that supports the following: when some rule is reduced (I assume an LALR(1) parser), the reduction isn't performed; instead the parser backs off and replaces the input with different code built from the values of this rule, then parses that code. Repeat if needed. So if the code is "i++" and the rule is "expr POST_INCR", I can do more or less:
expr POST_INCR -> "(tmp = expr; expr = expr + 1; tmp)"
So basically, code rewriting using macros?
LONG VERSION:
I wrote yet another simple interpreted language (in Java, for simplicity). It works OK, but it raised some questions. The introduction is pretty long, but simple, and it helps to show my problem clearly (I think).
I have "while" loop. It is pretty simple, given:
WHILE LPAREN boolean_expression RPAREN statement
I generate more or less the following:
new WhileNode(boolean_expression, statement);
This creates a new node that, when visited later, generates code for my virtual machine. But I also have the following:
FOR LPAREN for_init_expr SEMICOLON boolean_expression SEMICOLON for_post_expr RPAREN statement
This is "for loop" known from Java or C. From aforementioned rule I create more or less the following:
new ListNode(
for_init_expr,
new WhileNode(
boolean_expression,
new ListNode(
statement,
new ListNode(for_post_expr, null))))
This is of course simple transformation, from:
for (for_init ; boolean_expression ; for_post_expr)
statement
to:
for_init
while (boolean_expression) {
statement
for_post_expr;
}
All is fine and dandy, but things get hairy for the following:
FOR LPAREN var_decl COLON expression RPAREN statement
This is well known and liked:
for (int x : new int[] { 1, 2 })
print(x);
I refrain from posting the code that generates the AST, since the basic for loop was already a little bit long, and what we get here is even worse. This construction is equivalent to:
int[] tmp = new int[] { 1, 2 };
for (int it = 0 ; it < tmp.length; it = it + 1) {
int x = tmp[it];
print(x);
}
And since I'm not using types, I simply assume that "expression" (the right side, after COLON) is something that I can iterate over (and arrays are not iterable), so I call a function on the result of this "expression" which returns an instance of Iterable. So, in fact, my rewritten code isn't as simple as the one above; it is more or less this:
Iterator it = makeIterable(new int[] { 1, 2 });
while (it.hasNext()) {
int x = it.next();
print(x);
}
It doesn't look THAT bad, but note that the AST for this generates three function calls and a while loop. To show you what a mess it is, I post what I have now:
FOR LPAREN var_decl_name.v PIPE simple_value_field_or_call.o RPAREN statement.s
{: Symbol sv = ext(_symbol_v, _symbol_o);
   String autoVarName = generateAutoVariableName();
   Node iter = new StatementEndNode(sv, "",
       new BinNode(sv, CMD.SET, "=",
           new VarDeclNode(sv, autoVarName),
           new CallNode(sv, "()",
               new BinNode(sv, CMD.DOT, ".",
                   new MkIterNode(sv, o),
                   new PushConstNode(sv, "iterator")))));
   Node varinit = new StatementEndNode(sv, "",
       new BinNode(sv, CMD.SET, "=",
           v,
           new PushConstNode(sv, "null")));
   Node hasnext = new CallNode(sv, "()",
       new BinNode(sv, CMD.DOT, ".",
           new VarNode(sv, autoVarName),
           new PushConstNode(sv, "hasNext")));
   Node vargennext = new StatementEndNode(sv, "",
       new BinNode(sv, CMD.SET, "=",
           new VarNode(sv, v.name),
           new CallNode(sv, "()",
               new BinNode(sv, CMD.DOT, ".",
                   new VarNode(sv, autoVarName),
                   new PushConstNode(sv, "next")))));
   return new ListNode(sv, "",
       new ListNode(sv, "",
           new ListNode(sv, "",
               iter
           ),
           varinit
       ),
       new WhileNode(sv, "while",
           hasnext,
           new ListNode(sv, "",
               new ListNode(sv, "",
                   vargennext
               ),
               s))); :}
To answer your questions: yes, I am ashamed of this code.
QUESTION: Is there a parser generator that lets me do something about it? Namely, given the rule:
FOR LPAREN var_decl COLON expr RPAREN statement
tell the parser to rewrite it as if it were something else. I imagine that this would require some kind of LISP-style macro mechanism (which is easy in Lisp due to its basic lack of grammar whatsoever), maybe similar to this:
FOR LPAREN var_decl COLON expr RPAREN statement =
{ with [ x = generateAutoName(); ]
emit [ "Iterator $x = makeIterable($expr).iterator();"
"while (${x}.hasNext()) {"
"$var_decl = ${x}.next();"
"$statement"
"}"
]
}
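To make the idea concrete, here is a toy sketch (in Python, and not an existing tool) of the kind of expansion I imagine: plain string templating plus a generated variable name:

```python
import itertools

_counter = itertools.count()

def gensym():
    # generate a fresh, collision-free variable name
    return '__tmp%d' % next(_counter)

def expand_foreach(var_decl, expr, statement):
    """Expand 'for (var_decl : expr) statement' into iterator-based source."""
    x = gensym()
    return (
        'Iterator %s = makeIterable(%s).iterator();\n'
        'while (%s.hasNext()) {\n'
        '    %s = %s.next();\n'
        '    %s\n'
        '}' % (x, expr, x, var_decl, x, statement)
    )

print(expand_foreach('int x', 'new int[] { 1, 2 }', 'print(x);'))
```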
I don't know if this is a well-known problem or not; I simply don't even know what to look for. The most similar question that I found is this one: Any software for pattern-matching and -rewriting source code? But it isn't anywhere close to what I need, since it is supposed to work as a separate step and not during compilation, so it doesn't qualify.
Any help will be appreciated.
I think you are trying to bend the parser too much. You can simply build the tree with the macro in it, and then post-process the tree to replace the macros with whatever substitution you want.
You can do this by walking the resulting tree, detecting the macro nodes (or places where you want to do substitutions), and simply splicing in replacements with procedural tree hackery. Not pretty, but workable. You should be able to do this with the result of any parser generator/AST-building machinery.
If you want a more structured approach, you could build your AST and then use source-to-source transformations to "rewrite" the macros to their content.
Our DMS Software Reengineering Toolkit can do this; you can read more details about what the transforms look like.
Using the DMS approach, your concept:
expr POST_INCR -> "(tmp = expr; expr = expr + 1; tmp)"
requires that you parse the original text in the conventional
way with the grammar rule:
term = expr POST_INCR ;
You would give all these grammar rules to DMS and let it
parse the source and build your AST according to the grammar.
Then you apply a DMS rewrite to the resulting tree:
domain MyToyLanguage; -- tells DMS to use your grammar to process the rule below
rule replace_POST_INCR(e: expr): term->term
= "\e POST_INCR" -> " (tmp = \e; \e = \e + 1; tmp) ";
The quote marks here are "domain meta quotes" rather than string literal quotes.
The text outside the double quotes is DMS rule syntax. The text inside the quotes is syntax from your language (MyToyLanguage), and is parsed using the parser you provided, with some special escapes for pattern variables like \e.
(You don't have to do anything to your grammar to get this pattern-parsing capability; DMS takes care of that).
By convention with DMS, we often name literal tokens like POST_INCR
with a quoted equivalent '++' in the lexer, rather than using such a name.
Instead of
#token POST_INCR "\+\+"
the lexer rule then looks like:
#token '++' "\+\+"
If you do that, then your grammar rule reads like:
term = expr '++' ;
and your rewrite rule now looks like:
rule replace_POST_INCR(e: expr): term->term
= "\e++" -> " (tmp = \e; \e = \e + 1; tmp) ";
Following this convention, the grammar (lexer and BNF)
is (IMHO) a lot more readable,
and the rewrite rules are more readable too, since they stay
extremely close to the actual language syntax.
Perhaps you are looking for something like ANTLR's tree-rewriting rules.
You could probably make your AST construction syntax more readable by defining some helper functions. To my eye, there is a lot of redundancy (why do you need both an enumeration and a character string for an operator?) but I'm not a Java programmer.
One approach you might take:
Start with your parser, which already produces an AST. Add a lexical syntax or two to handle template arguments and gensyms. Then write an AST walker which serializes the AST into the code (either Java or bytecode) needed to regenerate the AST. Using that, you can generate the macro templates functions using your own parser, which means that it will automatically stay in sync with any changes you might make to your AST.
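A rough sketch of that walker idea (in Python with assumed class names, not the asker's Node types): serialize an AST into source code that, when evaluated, rebuilds the same AST.

```python
class Node:
    def __init__(self, kind, *children):
        self.kind = kind
        self.children = children

def emit(node):
    """Return constructor-call source text that regenerates `node`."""
    if isinstance(node, str):
        return repr(node)                      # leaf: a plain token
    parts = [repr(node.kind)] + [emit(c) for c in node.children]
    return 'Node(%s)' % ', '.join(parts)

tree = Node('while', Node('call', 'hasNext'), Node('block', 'body'))
code = emit(tree)
print(code)   # Node('while', Node('call', 'hasNext'), Node('block', 'body'))

# evaluating the emitted code reconstructs an equivalent tree
rebuilt = eval(code)
assert rebuilt.kind == 'while' and rebuilt.children[0].kind == 'call'
```

Because the serializer is driven by the same node classes the parser builds, it stays in sync with any changes to the AST, which is the point made above.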
I am aware that you can create global expressions with Esper's Statement Object Model using CreateExpressionClause and ExpressionDeclaration, but I'm not exactly sure how you are able to refer to their aliases when building an EPStatementObjectModel for a pattern. For example, say I have a pattern like this:
every (a=Event(fizz = 3 and buzz = 5) -> b=Event(fizz = 3 and buzz = 5 and foo = 1 and bar = 2))
I would like to declare fizz = 3 and buzz = 5 as a global expression as such:
create expression fizzbuzz alias for {fizz = 3 and buzz = 5}
Therefore, with EPL I could successfully simplify the pattern to the following:
every (a=Event(fizzbuzz) -> b=Event(fizzbuzz and foo = 1 and bar = 2))
I cannot seem to find a method in any of the classes in com.espertech.esper.client.soda in which I can refer to the global expression alias as I build the statement object. The best thing I could think of that would give me a valid pattern when converting the statement object to EPL would involve Expressions.property(alias), but I get the following error when I add the complete statement object to the Esper engine:
Failed to validate filter expression 'fizzbuzz': Property named 'fizzbuzz' is not valid in any stream [every (a=Event(fizzbuzz) -> b=Event(fizzbuzz and foo = 1 and bar = 2))]
Take note that a) the global expressions were already declared at this point, and b) if I add the pattern containing the global expression aliases in EPL form to the Esper engine, it works.
Any ideas? While it's an option, I'd prefer not to convert from EPStatementObjectModel to an EPL string every time I add a new pattern to the engine.
You could inspect a generated object model in a debugger to find out. So, in order to generate one, you could call epadmin.compileEPL("some epl with the expression") and see what comes back.
Following user650839's advice, I found through debugging that the way to go about including the alias to named global expressions is to incorporate a DotExpression into your statement object tree as such:
DotExpression globalExpression = new DotExpression();
globalExpression.add("fizzbuzz", new ArrayList<Expression>(), true);
I want to write a Pig script to perform a GROUP BY and generate the sum of 31 fields, but before that I need to do some custom processing, for which I wrote an eval function. I think I can make it run faster if I can include the GROUP and SUM operations in the UDF. To do this, can I use an Algebraic UDF? If yes, how would the return schemas of getInitial(), getIntermed() and getFinal() look? If no, how else can I implement this? Below is my code. Thanks.
a = LOAD './a' using PigStorage('|') AS (val:int, grp1, grp2, amt1:long, amt2:long, amt3 ... amt31:long);
b = FOREACH a GENERATE myudfs.Custom(val) AS custom_val, grp1, grp2, amt1 ... amt31;
c = GROUP b BY (custom_val,grp1, grp2);
d = FOREACH c GENERATE group, SUM(b.amt1) ... SUM(b.amt31);
store d into './op';
How would it even be possible to perform GROUP within a UDF?
GROUP is translated by Pig into a MapReduce job (the intermediate key of this job is built from custom_val, grp1, grp2).
The ability to iterate (FOREACH) through the entire list of tuples for a certain group is provided in the Reducer.
An Algebraic UDF will not "include the GROUP", but will be executed as part of the GROUP aggregations. So I think that Algebraic is not relevant here.
I guess that the only optimization you might do here is to group on the original val, and to call myudfs.Custom(val) only after the GROUP, assuming that your UDF is an injective function.
a = LOAD './a' using PigStorage('|') AS (val:int, grp1, grp2, amt1:long, amt2:long, amt3 ... amt31:long);
c = GROUP a BY (val, grp1, grp2);
d = FOREACH c GENERATE myudfs.Custom(group.val) AS custom_val, SUM(a.amt1) ... SUM(a.amt31);
store d into './op';
Hello all, I'm trying to parse out a pretty well-formed string into its component pieces. The string is very JSON-like, but it's not JSON strictly speaking. They're formed like so:
createdAt=Fri Aug 24 09:48:51 EDT 2012, id=238996293417062401, text='Test Test', source="Region", entities=[foo, bar], user={name=test, locations=[loc1,loc2], locations={comp1, comp2}}
The desired output is just the chunks of text; nothing special has to be done at this point:
createdAt=Fri Aug 24 09:48:51 EDT 2012
id=238996293417062401
text='Test Test'
source="Region"
entities=[foo, bar]
user={name=test, locations=[loc1,loc2], locations={comp1, comp2}}
Using the following expression I am able to get most of the fields separated out
,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))(?=(?:[^']*'[^']*')*(?![^']*'))
Which will split on all the commas not in quotes of any type, but I can't seem to make the leap to where it splits on commas not in brackets or braces as well.
Because you want to handle nested parens/brackets, the "right" way to handle them is to tokenize them separately, and keep track of your nesting level. So instead of a single regex, you really need multiple regexes for your different token types.
This is Python, but converting to Java shouldn't be too hard.
import re

# just comma
sep_re = re.compile(r',')
# open paren or open bracket
inc_re = re.compile(r'[[(]')
# close paren or close bracket
dec_re = re.compile(r'[)\]]')
# string literal
# (I was lazy with the escaping. Add other escape sequences, or find an
# "official" regex to use.)
chunk_re = re.compile(r'''"(?:[^"\\]|\\")*"|'(?:[^'\\]|\\')*[']''')

# This class could've been just a generator function, but I couldn't
# find a way to manage the state in the match function that wasn't
# awkward.
class tokenizer:
    def __init__(self):
        self.pos = 0

    def _match(self, regex, s):
        m = regex.match(s, self.pos)
        if m:
            self.pos += len(m.group(0))
            self.token = m.group(0)
        else:
            self.token = ''
        return self.token

    def tokenize(self, s):
        field = ''  # the field we're working on
        depth = 0   # how many parens/brackets deep we are
        while self.pos < len(s):
            if not depth and self._match(sep_re, s):
                # In Java, change the "yields" to append to a List, and you'll
                # have something roughly equivalent (but non-lazy).
                yield field
                field = ''
            else:
                if self._match(inc_re, s):
                    depth += 1
                elif self._match(dec_re, s):
                    depth -= 1
                elif self._match(chunk_re, s):
                    pass
                else:
                    # everything else we just consume one character at a time
                    self.token = s[self.pos]
                    self.pos += 1
                field += self.token
        yield field
Usage:
>>> list(tokenizer().tokenize('foo=(3,(5+7),8),bar="hello,world",baz'))
['foo=(3,(5+7),8)', 'bar="hello,world"', 'baz']
This implementation takes a few shortcuts:
The string escapes are really lazy: it only supports \" in double quoted strings and \' in single-quoted strings. This is easy to fix.
It only keeps track of nesting level. It does not verify that parens are matched up with parens (rather than brackets). If you care about that you can change depth into some sort of stack and push/pop parens/brackets onto it.
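For instance, the depth counter could become a stack of open delimiters, so that a ')' closing a '[' is caught as an error. A standalone sketch of that refinement:

```python
# Map each closer to the opener it must pair with.
PAIRS = {')': '(', ']': '['}

def check_nesting(s):
    """Raise ValueError if parens/brackets in `s` are mismatched."""
    stack = []
    for ch in s:
        if ch in '([':
            stack.append(ch)           # push the opener we saw
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                raise ValueError('mismatched %r' % ch)
    if stack:
        raise ValueError('unclosed %r' % stack[-1])

check_nesting('foo=(3,(5+7),8)')       # fine: everything pairs up
# check_nesting('foo=(3,[5+7),8]')     # would raise ValueError
```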
Instead of splitting on the comma, you can use the following regular expression to match the chunks that you want.
(?:^| )(.+?)=(\{.+?\}|\[.+?\]|.+?)(?=,|$)
Python:
import re
text = "createdAt=Fri Aug 24 09:48:51 EDT 2012, id=238996293417062401, text='Test Test', source=\"Region\", entities=[foo, bar], user={name=test, locations=[loc1,loc2], locations={comp1, comp2}}"
re.findall(r'(?:^| )(.+?)=(\{.+?\}|\[.+?\]|.+?)(?=,|$)', text)
>> [
('createdAt', 'Fri Aug 24 09:48:51 EDT 2012'),
('id', '238996293417062401'),
('text', "'Test Test'"),
('source', '"Region"'),
('entities', '[foo, bar]'),
('user', '{name=test, locations=[loc1,loc2], locations={comp1, comp2}}')
]
I've set up grouping so it will separate out the "key" and the "value". It will do the same in Java - See it working in Java here:
http://www.regexplanet.com/cookbook/ahJzfnJlZ2V4cGxhbmV0LWhyZHNyDgsSBlJlY2lwZRj0jzQM/index.html
Regular Expression explained:
(?:^| ) Non-capturing group that matches the beginning of a line, or a space
(.+?) Matches the "key" before the...
= equal sign
(\{.+?\}|\[.+?\]|.+?) Matches either a set of {characters}, [characters], or finally just characters
(?=,|$) Look ahead that matches either a , or the end of a line.