How to reference attribute from .bnf parser in JFlex? - java

I'm using a .bnf parser to detect specific expressions, and I'm using JFlex to detect the different sections of these expressions. My issue is that some of these expressions may contain nested expressions, and I don't know how to handle that.
I've tried to include the .bnf parser in my JFlex by using %include, then referencing the expression in the relative macro using PARAMETERS = ("'"[:jletter:] [:jletterdigit:]*"'") | expression. This fails as JFlex reports the .bnf to be malformed.
Snippet of JFlex:
%{
public Lexer() {
this((java.io.Reader)null);
}
%}
%public
%class Lexer
%implements FlexLexer
%function advance
%type IElementType
%include filename.bnf
%unicode
PARAMETERS= ("'"[:jletter:] [:jletterdigit:]*"'") | <a new expression element>
%%
<YYINITIAL> {PARAMETERS} { return BAD_CHARACTER; } // some random return
Snippet of .bnf parser:
{
//list of classes used.
}
expression ::= (<expression definition>)
Any input would be greatly appreciated. Thanks.

I've found the solution to my issue. In further depth, the problem was in both my grammar file and my flex file. To solve the issue, I recursively called the expression in the grammar file like so:
expression ::= (start value expression? end)
With JFlex, I declared numerous states until I found a way to chain together an endless number of expressions. It looks a little like this:
%state WAITING_EXPRESSION
<WAITING_NEXT> "<something which indicates start of nested expression>" { yybegin(WAITING_EXPRESSION); return EXPRESSION_START; }
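To make the recursive rule above concrete, here is a minimal plain-Java sketch of a recursive-descent parser for expression ::= start value expression? end. It does not use JFlex or the .bnf tooling. The start and end tokens are assumed to be '(' and ')', and a value is assumed to be a run of letters; none of these are given in the question, and the class name is hypothetical.

```java
// Sketch only: assumes start = '(', end = ')', value = one or more letters.
final class NestedExpressionSketch {
    private final String input;
    private int pos = 0;

    private NestedExpressionSketch(String input) {
        this.input = input;
    }

    /** Returns the nesting depth of a well-formed expression, or throws IllegalArgumentException. */
    static int parseDepth(String text) {
        NestedExpressionSketch parser = new NestedExpressionSketch(text);
        int depth = parser.expression();
        if (parser.pos != text.length()) {
            throw new IllegalArgumentException("Unexpected trailing input at index " + parser.pos);
        }
        return depth;
    }

    // expression ::= start value expression? end
    private int expression() {
        expect('(');
        value();
        int innerDepth = 0;
        // The optional nested expression is recognised by its start token.
        if (pos < input.length() && input.charAt(pos) == '(') {
            innerDepth = expression();
        }
        expect(')');
        return innerDepth + 1;
    }

    private void value() {
        int begin = pos;
        while (pos < input.length() && Character.isLetter(input.charAt(pos))) {
            pos++;
        }
        if (pos == begin) {
            throw new IllegalArgumentException("Expected a value at index " + pos);
        }
    }

    private void expect(char expected) {
        if (pos >= input.length() || input.charAt(pos) != expected) {
            throw new IllegalArgumentException("Expected '" + expected + "' at index " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(parseDepth("(a(b(c)))"));
    }
}
```

Because the rule refers to itself, the nesting depth is unbounded, which is the same effect the chained lexer states aim for.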

Related

jsqlparser to evaluate a condition

The following String:
x=92 and y=29
Produces a valid output: x=92 AND y=29 and it works fine with CCJSqlParserUtil.parseCondExpression but shouldn't it throw an exception for the following?
x=92 lasd y=29
But the output is just: x=92
Furthermore, which Expression should I use to implement my own visitor? For example:
CCJSqlParser c= new CCJSqlParser(new StringReader(str));
Expression e = c.Expression(); // or SimpleExpression, etc..
So that when 'lasd' (anything other than not,or,and) is encountered I can throw an exception and not silently ignore the rest of the expression?
Recently a patch of JSqlParser (1.2-SNAPSHOT) was published to provide the needed behaviour:
CCJSqlParserUtil.parseExpression(String expression, boolean allowPartialParse)
and
CCJSqlParserUtil.parseCondExpression(String expression, boolean allowPartialParse)
Setting allowPartialParse to false will result in the mentioned Exception.
The existing behaviour is still needed for on-the-fly interpretation, e.g. to extract expressions from within a text (syntax coloring, context help, ...).
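The difference between the two modes can be illustrated without JSqlParser. The following is a plain-Java sketch, not the library's implementation: it recognises only conditions of the form name=value joined by and/or, and the class and method names are hypothetical. With allowPartialParse set to true it returns the longest valid prefix, and with false it rejects any leftover input (here with an IllegalArgumentException, which is an assumption of this sketch).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: a toy grammar, condition ((and|or) condition)*, where condition is name=value.
final class PartialParseSketch {
    private static final Pattern CONDITION = Pattern.compile("\\s*(\\w+)\\s*=\\s*(\\w+)");
    private static final Pattern CONNECTIVE = Pattern.compile("\\s+(?i:and|or)\\b");

    /** Returns the parsed prefix; throws if input is left over and partial parsing is not allowed. */
    static String parse(String text, boolean allowPartialParse) {
        Matcher condition = CONDITION.matcher(text);
        Matcher connective = CONNECTIVE.matcher(text);
        if (!condition.lookingAt()) {
            throw new IllegalArgumentException("No condition at the start of: " + text);
        }
        int pos = condition.end();
        // Keep consuming "<connective> <condition>" pairs for as long as they match.
        while (connective.region(pos, text.length()).lookingAt()
                && condition.region(connective.end(), text.length()).lookingAt()) {
            pos = condition.end();
        }
        String rest = text.substring(pos).trim();
        if (!allowPartialParse && !rest.isEmpty()) {
            throw new IllegalArgumentException("Unexpected input: " + rest);
        }
        return text.substring(0, pos).trim();
    }

    public static void main(String[] args) {
        System.out.println(parse("x=92 lasd y=29", true));
    }
}
```

With the input x=92 lasd y=29, the lenient call stops after x=92, mirroring the behaviour described in the question, while the strict call fails on lasd.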

Xtext - No viable alternative at input

I'm trying to create a grammar that combines a scripting language with the ability to define methods.
Grammar
grammar org.example.domainmodel.Domainmodel with org.eclipse.xtext.xbase.Xbase
generate domainmodel "http://www.example.org/domainmodel/Domainmodel"
import "http://www.eclipse.org/xtext/xbase/Xbase" as xbase
Model:
imports = XImportSection
methods += XMethodDeclaration*
body = XBlockScriptLanguage;
XMethodDeclaration:
"def" type=JvmTypeReference name=ValidID
'('(params+=FullJvmFormalParameter (',' params+=FullJvmFormalParameter)*)? ')'
body=XBlockExpression
;
XBlockScriptLanguage returns xbase::XExpression:
{xbase::XBlockExpression}
(expressions+=XExpressionOrVarDeclaration ';'?)*
;
At the moment I have the following JvmModelInferrer, which defines the main method for the scripting language.
JvmModelInferrer
def dispatch void infer(Model model, IJvmDeclaredTypeAcceptor acceptor, boolean isPreIndexingPhase) {
acceptor.accept(
model.toClass("myclass")
).initializeLater [
members += model.toMethod("main", model.newTypeRef(Void::TYPE)) [
parameters += model.toParameter("args", model.newTypeRef(typeof(String)).addArrayTypeDimension)
setStatic(true)
body = model.body
]
]
}
When I tried to use my grammar, I got the following errors after writing my method:
no viable alternative at input 'def'
The method mymethod() is undefined
The problem is related only with method declaration, without it myclass.java is created.
Moreover, I get "Warning 200" about an ambiguous grammar. Why?
There are two fixes that appear necessary:
The imports section is not marked as optional. If it was intended to be optional, it should be declared as imports = XImportSection? (note that ?= is the boolean assignment operator, not an optional marker). Or, add the necessary import statements to your JvmModelInferrer example.
The dispatch keyword isn't defined in your grammar. As defined, a method should consist of def, followed by a Java type (the return type), and then the method's name (then the body, etc.). You could add (dispatch ?= 'dispatch')? if you're targeting Xtend and intend to support its multiple dispatch feature (or your own version of it).
HTH

Parsing text in parens using JParsec

I'm writing a parser for a DSL that uses the syntax (nodeHead: nodeBody). The problem is that nodeBody may contain parens in some cases.
The between operator of JParsec should have been a good solution, yet the following code fails:
public void testSample() {
Parser<Pair<String,String>> sut = Parsers.tuple(Scanners.IDENTIFIER.followedBy(Scanners.among(":")),
Scanners.ANY_CHAR.many().source()
).between(Scanners.among("("), Scanners.among(")"));
sut.parse("(hello:world)");
}
It does not fail when I change ANY_CHAR to IDENTIFIER, so I assume the issue here is that the second parser in the tuple is too greedy. Alternatively, can I make JParsec apply the between parsers before it applies the body?
Any ideas are very much appreciated.
At the time I asked, there seemed to be no way to do that. However, one GitHub fork-and-pull later, there is: reluctantBetween().
Big thanks to @abailly for the fast response.
If the syntax rule is that the last character will always be ")", you could probably do:
static <T> Parser<T> reluctantBetween(
Parser<?> begin, Parser<T> parser, Parser<?> end) {
Parser<?> terminator = end.followedBy(eof());
return between(begin, terminator.not().next(parser).many(), terminator);
}
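The idea behind that answer (treat only the closing paren at the very end of the input as the terminator) can be sketched in plain Java without JParsec. This is an illustration under that same assumption, not the library's reluctantBetween(); the class and method names are hypothetical.

```java
// Sketch only: assumes the whole input is "(head:body)" and the last character is the closing paren.
final class NodeSplitSketch {
    /** Splits "(head:body)" into {head, body}; the body may itself contain parentheses. */
    static String[] split(String node) {
        int last = node.length() - 1;
        if (node.length() < 2 || node.charAt(0) != '(' || node.charAt(last) != ')') {
            throw new IllegalArgumentException("Expected input of the form (head:body)");
        }
        // The first colon separates head from body; later colons belong to the body.
        int colon = node.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing ':' between head and body");
        }
        String head = node.substring(1, colon);
        String body = node.substring(colon + 1, last);
        return new String[] { head, body };
    }

    public static void main(String[] args) {
        String[] parts = split("(hello:wor(l)d)");
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```

A greedy body parser fails in the original test because it also consumes the final paren; anchoring the terminator to the end of input avoids that.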

How can users (safely) program their own filter in Java

I want my users to be able to write their own filter when requesting a List in Java.
Option 1) I'm thinking about JavaScript with Rhino.
I get my user's filter as a javascript string. And then call isAccepted(myItem) in this script.
Depending on the reply I accept the element or not.
Option 2) I'm thinking about Groovy.
My user can write a Groovy script in a text field. When my user searches with this filter, the Groovy script is compiled to Java (on the first call) and the Java method isAccepted() is called.
Depending on the reply I accept the element or not.
My application relies heavily on this functionality, and it will be called intensively on my server.
So I believe speed is key.
Option 2 (Groovy) thinking:
Correct me if I'm wrong, but I think the main advantage of Groovy in my case is speed. However, my users could compile and run unwanted code on my server... (any workaround?)
Option 1 (JavaScript) thinking:
I think most people see JavaScript as a toy. I don't share that view, but my customers probably do and may not trust it much. Do you think so?
Another drawback I expect, from what I've read on the web, is speed.
And again, my users could access Java and run unwanted code on my server... (any workaround?)
More info:
I'm running my application on Google App Engine for the main web service of my app.
The filter will be applied 20 times per call.
The filter will be (most of the times) simple.
Any idea to make this filter safe for my server?
Any other approach to make it work?
My thoughts:
You'll have to use your own classloader when compiling your script, to prevent any other classes from being accessible from the script. Not sure if that is possible in GAE.
You'll have to use Java's SecurityManager features to prevent a script from accessing the file system, network, etc. Not sure if that is possible in GAE.
Looking only at the two items above, it looks incredibly complicated and brittle to me. If you can't find existing sandboxing features as an existing project, you should stay away from it.
Designing a Domain Specific Language that will allow the expressions you decide are legal is a lot safer, and looking at the above items, you will have to think very hard anyway at what you want to allow. From there to designing the language is not a big step.
Be careful not to implement the DSL with Groovy closures (an internal DSL), because that is just Groovy and you are still hackable. You need to define an external language and parse it. I recommend the parser combinator library jparsec to define the grammar; no compiler-compiler is needed in that case.
http://jparsec.codehaus.org/
FYI, here's a little parser I wrote with jparsec (groovy code):
//import some static methods, this will allow more concise code
import static org.codehaus.jparsec.Parsers.*
import static org.codehaus.jparsec.Terminals.*
import static org.codehaus.jparsec.Scanners.*
import org.codehaus.jparsec.functors.Map as FMap
import org.codehaus.jparsec.functors.Map4 as FMap4
import org.codehaus.jparsec.functors.Map3 as FMap3
import org.codehaus.jparsec.functors.Map2 as FMap2
/**
* Uses jparsec combinator parser library to construct an external DSL parser for the following grammar:
* <pre>
* pipeline := routingStep*
* routingStep := IDENTIFIER '(' parameters? ')'
* parameters := parameter (',' parameter)*
* parameter := (IDENTIFIER | QUOTED_STRING) ':' QUOTED_STRING
* </pre>
*/
class PipelineParser {
//=======================================================
//Pass 1: Define which terminals are part of the grammar
//=======================================================
//operators
private static def OPERATORS = operators(',', '(', ')', ':')
private static def LPAREN = OPERATORS.token('(')
private static def RPAREN = OPERATORS.token(')')
private static def COLON = OPERATORS.token(':')
private static def COMMA = OPERATORS.token(',')
//identifiers tokenizer
private static def IDENTIFIER = Identifier.TOKENIZER
//single quoted strings tokenizer
private static def SINGLE_QUOTED_STRING = StringLiteral.SINGLE_QUOTE_TOKENIZER
//=======================================================
//Pass 2: Define the syntax of the grammar
//=======================================================
//PRODUCTION RULE: parameter := (IDENTIFIER | QUOTED_STRING) ':' QUOTED_STRING
@SuppressWarnings("GroovyAssignabilityCheck")
private static def parameter = sequence(or(Identifier.PARSER,StringLiteral.PARSER), COLON, StringLiteral.PARSER, new FMap3() {
def map(paramName, colon, paramValue) {
new Parameter(name: paramName, value: paramValue)
}
})
//PRODUCTION RULE: parameters := parameter (',' parameter)*
@SuppressWarnings("GroovyAssignabilityCheck")
private static def parameters = sequence(parameter, sequence(COMMA, parameter).many(), new FMap2() {
def map(parameter1, otherParameters) {
if (otherParameters != null) {
[parameter1, otherParameters].flatten()
} else {
[parameter1]
}
}
})
//PRODUCTION RULE: routingStep := IDENTIFIER '(' parameters? ')'
@SuppressWarnings("GroovyAssignabilityCheck")
private static def routingStep = sequence(Identifier.PARSER, LPAREN, parameters.optional(), RPAREN, new FMap4() {
def map(routingStepName, lParen, parameters, rParen) {
new RoutingStep(
name: routingStepName,
parameters: parameters ?: []
)
}
})
//PRODUCTION RULE: pipeline := routingStep*
@SuppressWarnings("GroovyAssignabilityCheck")
private static def pipeline = routingStep.many().map(new FMap() {
def map(from) {
new Pipeline(
routingSteps: from
)
}
})
//Combine the above tokenizers to create the tokenizer that will parse the stream and spit out the tokens of the grammar
private static def tokenizer = or(OPERATORS.tokenizer(), SINGLE_QUOTED_STRING, IDENTIFIER)
//This parser will be used to define which input sequences need to be ignored
private static def ignored = or(JAVA_LINE_COMMENT, JAVA_BLOCK_COMMENT, WHITESPACES)
/**
* Parser that is used to parse extender pipelines.
* <pre>
* def parser=PipelineParser.parser
* Pipeline pipeline=parser.parse(pipelineStr)
* </pre>
* Returns an instance of {@link Pipeline} containing the AST representation of the parsed string.
*/
//Create a syntactic pipeline parser that will use the given tokenizer to parse the input into tokens, and will ignore sequences that are matched by the given parser.
static def parser = pipeline.from(tokenizer, ignored.skipMany())
}
Some thoughts:
Whether you use JavaScript or Groovy, it will be run in a context that you provide to the script, so the script should not be able to access anything that you don't want it to (but of course, you should test it extensively to be sure if you go this route).
You'd probably be safer by having the filter expression specified as data, rather than as executable code, if possible. Of course, this depends on how complex the filter expressions are. Perhaps you can break up the representation into something like field, comparator, and value, or something similar, that can be treated as data and evaluated in regular way?
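As a rough illustration of the field, comparator, and value idea, here is a minimal Java sketch. Everything in it is an assumption made for the example: the class name, the three operators, and the choice to model an item as a map from field names to numeric values. The point is only that the user supplies data, and no user code is ever executed.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a filter described purely as data (field, operator, value).
final class FieldFilterSketch {
    enum Op { EQ, LT, GT }

    private final String field;
    private final Op op;
    private final double value;

    FieldFilterSketch(String field, Op op, double value) {
        this.field = field;
        this.op = op;
        this.value = value;
    }

    /** Accepts the item if it has the field and the comparison holds; unknown fields are rejected. */
    boolean isAccepted(Map<String, Double> item) {
        Double actual = item.get(field);
        if (actual == null) {
            return false;
        }
        int cmp = Double.compare(actual, value);
        switch (op) {
            case EQ: return cmp == 0;
            case LT: return cmp < 0;
            case GT: return cmp > 0;
            default: throw new IllegalStateException("Unknown operator " + op);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> item = new HashMap<String, Double>();
        item.put("price", 12.5);
        System.out.println(new FieldFilterSketch("price", Op.GT, 10).isAccepted(item));
    }
}
```

Since the set of operators is closed, there is nothing to sandbox, and evaluating such a filter 20 times per call is just a map lookup and a comparison.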
If you're worried about what the user can inject via a scripting language, you're probably safer with JavaScript. I don't think that performance should be a problem, but again, I'd suggest extensive testing to be sure.
I would never let users input arbitrary code. It's brittle, insecure and a bad user experience. Not knowing anything about your users, my guess is that you will spend a lot of time answering questions. If most of your filters are simple, why not create a little filter builder for them instead?
As far as Groovy vs JavaScript goes, I think Groovy is easier to understand and better for scripting, but that's just my opinion.

Equivalent of Java's Matcher.lookingAt() in Objective C

I am porting a framework from Java to Objective C which heavily depends on regular expressions. Unfortunately the Java regular expressions API is a lot different from the Objective C API.
I am trying to use the NSRegularExpression class to evaluate the regular expressions. In Java this is completely different: you have to use the Pattern and Matcher classes.
There is something I can't figure out (among other things). What is the equivalent of Matcher.lookingAt() in Objective C? To put it in code. What would be the Objective C translation of the following code?
Pattern pattern = Pattern.compile("[aZ]");
boolean lookingAt = pattern.matcher("abc").lookingAt();
Thanks to anyone who knows! (btw the above example assigns true to the lookingAt boolean)
I figured it out! This is the NSRegularExpression equivalent of the Java code:
NSError *error = nil;
NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:@"[aZ]" options:0 error:&error];
if (error) {
// Do something when an error occurs
}
NSString *candidate = @"abc";
BOOL lookingAt = [expression numberOfMatchesInString:candidate options:NSMatchingAnchored range:NSMakeRange(0, candidate.length)] > 0;
The emphasis here lies on the NSMatchingAnchored option when executing the expression! The docs say:
NSMatchingAnchored Specifies that matches are limited to those at the
start of the search range. See
enumerateMatchesInString:options:range:usingBlock: for a description
of the constant in context.
That's exactly what I was looking for!
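For reference when porting, this short Java sketch shows the semantics being reproduced: lookingAt() must match at the start of the input but need not consume all of it, unlike matches() (whole input) and find() (anywhere). The class and helper names are hypothetical; the Pattern and Matcher calls are the standard java.util.regex API.

```java
import java.util.regex.Pattern;

// Sketch only: contrasts the three Matcher entry points on the same pattern.
final class LookingAtSketch {
    /** True if the pattern matches a prefix of the input (anchored at the start only). */
    static boolean lookingAt(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    /** True if the pattern matches the entire input. */
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    /** True if the pattern matches anywhere in the input. */
    static boolean find(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(lookingAt("[aZ]", "abc"));
        System.out.println(matches("[aZ]", "abc"));
        System.out.println(find("[aZ]", "bca"));
    }
}
```

An anchored-at-start search over the full range, as with NSMatchingAnchored above, corresponds to the first of these three.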
You may do something like
NSString *regex = @"ObjC";
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF CONTAINS %@", regex];
if ([predicate evaluateWithObject:myString])
NSLog(@"matches");
else
NSLog(@"does not match");
Take a look at the Predicate Format String Syntax guide for further options.
