Trying to understand parsers

I'm trying to use JavaCC to build a simple command line calculator that can handle a variety of expressions. While there are plenty of tutorials out there on how to write grammars, none that I've seen so far explain what happens afterwards.
What I understand right now is that after a string is passed into the parser, it's split into tokens and turned into a parse tree. What happens next? Do I traverse through the parse tree doing a bunch of if-else string comparisons on the contents of each node and then perform the appropriate function?

I highly suggest you watch Scott Stanchfield's ANTLR 3.x tutorials. Even if you don't end up using ANTLR, which may be overkill for your project but I doubt it, you will learn a lot by watching him go through the thought process.
In general the process is...
Build a lexer to understand your tokens
Build a parser that can validate and understand and organize the input into an abstract syntax tree (AST) which should represent a simplified/easy-to-work-with version of your syntax
Run any calculation based on the AST

You need to actually compile or interpret it, depending on what you need.
For a calculator, you just need to visit the tree recursively and evaluate it. With a more complex language, you would have to translate the tree to an intermediate language, which is assembly-like but abstracts away the underlying architecture.
Of course, you could develop your own simple VM that executes the instruction set your language compiles to, but that would be overkill in your case. Just visit the parse tree. Something like:
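enum Operation {
    PLUS, MINUS
}

interface TreeNode {
    float eval();
}

class TreeFloat implements TreeNode {
    float val;

    // Leaf node: its value is the number itself.
    public float eval() { return val; }
}

class TreeBinaryOp implements TreeNode {
    TreeNode first;
    TreeNode second;
    Operation op;

    // Evaluate both subtrees, then combine them according to the operator.
    public float eval() {
        switch (op) {
            case PLUS:  return first.eval() + second.eval();
            case MINUS: return first.eval() - second.eval();
            default:    throw new IllegalStateException("Unknown operation: " + op);
        }
    }
}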
Then you just call eval() on the root of the tree. Some semantic checking may also be needed (along with a symbol table, if you plan to support variables or the like).

Do I traverse through the parse tree doing a bunch of if-else string comparisons on the contents of each node and then perform the appropriate function?
No, there's no need to build a parse tree to implement a calculator. In the parts of the code where you would create a new node object, just do the calculations and return a number.
JavaCC allows you to choose any return type for a production, so just have yours return numbers.
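To illustrate the idea of returning numbers straight from each production, here is a minimal hand-written recursive-descent sketch. It is not JavaCC output: the class name DirectCalc, its method names, and the small grammar (+, -, *, /, parentheses, unsigned decimal numbers) are all assumptions made for illustration. Each method plays the role of one production and returns a double instead of a tree node.

```java
class DirectCalc {
    private final String src;
    private int pos = 0;

    private DirectCalc(String src) {
        this.src = src;
    }

    /** Parses and evaluates the whole input in one pass, without building a tree. */
    static double evaluate(String text) {
        DirectCalc calc = new DirectCalc(text);
        double result = calc.expr();
        if (calc.peek() != '\0') {
            throw new IllegalArgumentException("Unexpected input at position " + calc.pos);
        }
        return result;
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // expr := term (('+' | '-') term)*
    private double expr() {
        double value = term();
        while (true) {
            char c = peek();
            if (c == '+') { pos++; value += term(); }
            else if (c == '-') { pos++; value -= term(); }
            else return value;
        }
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double value = factor();
        while (true) {
            char c = peek();
            if (c == '*') { pos++; value *= factor(); }
            else if (c == '/') { pos++; value /= factor(); }
            else return value;
        }
    }

    // factor := NUMBER | '(' expr ')'
    private double factor() {
        if (peek() == '(') {
            pos++;
            double value = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("')' expected at position " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < src.length()
                && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Number expected at position " + pos);
        }
        return Double.parseDouble(src.substring(start, pos));
    }
}
```

Calling DirectCalc.evaluate("(1 + 2) * 3") computes the result while parsing. In a JavaCC grammar the same shape would come from giving each production a numeric return type, as described above.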

Some parser generators (such as YACC) let you put actions within the grammar so when you apply a certain production you can also apply a defined action during that production.
E.g. in YACC:
E: NUM '+' NUM { $$ = $1.value + $3.value; };
would add the values of the two NUM tokens ($1 and $3; $2 is the '+' token) and return the result to the E non-terminal.
Not sure what JavaCC lets you do.

Related

ANTLR - How to determine what kind of parse tree "best fits" some code

I'm building a program with ANTLR that asks the user to enter some Java code, parses it, and spits out equivalent C# code. Up until now I've been assuming that they will enter something that parses as a valid compilation unit on its own, e.g. something like
package foo;
class A { ... }
class B { ... }
class C { ... }
However, that isn't always the case. They might just enter code from the inside of a class:
public void method1() {
...
}
public void method2() {
...
}
Or the inside of a method:
System.out.print("hello ");
System.out.println("world!");
Or even just an expression:
context.getSystemService(Context.ACTIVITY_SERVICE)
If I try to parse such snippets by calling parser.compilationUnit(), it won't work correctly because most of the code is parsed as error nodes. I need to call the correct method depending on the nature of the code, such as parser.expression() or parser.blockStatements(). However, I don't want to ask the user to explicitly indicate this. What's the best way to infer what kind of code I'm parsing?

Rather than trying to guess a valid grammar rule entry point to parse a language snippet of unknown scope, progressively add scope wrappers to the source text until a valid top-level rule parse is achieved.
That is, with each successive parse failure, progressively add dummy package, class, & method statements as source text wrappers.
Whichever wrapper was added to achieve a successful parse will then be a known quantity. Therefore, the parse tree node representing the original source text can be easily identified.
You will probably want to use a fail-fast parser; construct the parser with the BailErrorStrategy to obtain this behavior.
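The wrapping strategy above can be sketched without committing to a particular grammar. In this sketch the actual parse is hidden behind a Predicate<String>; the class name SnippetScopeDetector, the scope labels, and the dummy wrapper text are all hypothetical. The wrappers are tried from the widest scope to the narrowest, and the first one that parses tells you what kind of snippet you have.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

class SnippetScopeDetector {
    /** Wrapper templates in the order they are tried; %s marks the user's snippet. */
    static Map<String, String> wrappers() {
        Map<String, String> w = new LinkedHashMap<>();
        w.put("compilationUnit", "%s");
        w.put("classBody", "class Dummy { %s }");
        w.put("statements", "class Dummy { void dummy() { %s } }");
        w.put("expression", "class Dummy { Object dummy() { return %s; } }");
        return w;
    }

    /**
     * Returns the label of the first wrapper for which the wrapped text parses
     * as a full compilation unit, or empty if none does.
     */
    static Optional<String> detect(String snippet, Predicate<String> parsesAsCompilationUnit) {
        for (Map.Entry<String, String> entry : wrappers().entrySet()) {
            // replace() rather than String.format(), so '%' in user code is harmless.
            String candidate = entry.getValue().replace("%s", snippet);
            if (parsesAsCompilationUnit.test(candidate)) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }
}
```

With ANTLR 4, the predicate would typically build a lexer and parser for the candidate text, call parser.setErrorHandler(new BailErrorStrategy()), invoke the top-level rule, and return false when a ParseCancellationException is thrown. The rule name to invoke depends on the grammar in use.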

The algorithm in Swiftify tries to select the most suitable parse rule from a defined rule set. This web service converts Objective-C code fragments to Swift, and you can immediately judge the quality of the conversion for yourself.
Algorithm
We use an open-source Objective-C grammar. The detailed steps of the algorithm are:
Parse the input Objective-C code fragment with each of the following rules:
translationUnit
implementationDefinitionList
interfaceDeclarationList
expression
compoundStatement
If the parse result of a rule contains no errors, return that rule at once.
Otherwise, select the rule whose parse error is nearest to the end of the input.
If two or more rules have the same nearest-to-the-end error location, select the one with the fewest syntax errors.
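The selection logic can be sketched as follows. This is not Swiftify's actual code: the class RuleSelector, the Attempt type, and its fields are hypothetical, and the sketch assumes that "nearest to the end" refers to the offset of the first syntax error reported by each trial parse.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class RuleSelector {
    /** Outcome of one trial parse; firstErrorOffset is ignored when errorCount is 0. */
    static final class Attempt {
        final String rule;
        final int errorCount;
        final int firstErrorOffset;

        Attempt(String rule, int errorCount, int firstErrorOffset) {
            this.rule = rule;
            this.errorCount = errorCount;
            this.firstErrorOffset = firstErrorOffset;
        }
    }

    static Optional<Attempt> select(List<Attempt> attempts) {
        // An error-free parse wins immediately, in trial order.
        for (Attempt a : attempts) {
            if (a.errorCount == 0) {
                return Optional.of(a);
            }
        }
        // Otherwise prefer the error closest to the end of the input;
        // break ties by the smaller number of syntax errors.
        return attempts.stream().max(
                Comparator.comparingInt((Attempt a) -> a.firstErrorOffset)
                        .thenComparing(
                                Comparator.comparingInt((Attempt a) -> a.errorCount).reversed()));
    }
}
```

In practice each Attempt would be filled in by an error listener attached to the parser while the corresponding rule is tried.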
Demo
Here are test code samples that were parsed with different parse rules:
translationUnit: http://swiftify.me/clye5z
implementationDefinitionList: http://swiftify.me/fpasza
interfaceDeclarationList: http://swiftify.me/13rv2j
compoundStatement: http://swiftify.me/4cpl9n
Our algorithm is able to detect a suitable parse rule even for incorrect input:
compoundStatement with errors: http://swiftify.me/13rv2j/1

How to parse a mathematical function just once and use its result several times

I'm going to write a program that calculates the zeros of a given function. I decided to write a parser for that function (I have never written one before). It's a real-valued function of a real variable, like "sin(1/x)+exp(x)". I want to use root-finding methods like bisection and Newton's method. Since these methods are iterative, I want to avoid parsing the function again in a loop for each point x. So before I make the effort to write my own parser, I want to know: is it possible to parse just once and then evaluate the function f at points x0, x1, ..., xn without re-parsing f for each x?

There are two standard approaches to this problem:
Parse the formula into what is called an "abstract syntax tree" (AST). This is a compiler data structure that represents the structure of the formula, and it can be inspected quickly by code. It is also possible to evaluate the AST as a formula relatively quickly. See my SO answer on building recursive descent parsers that support the above tasks: Is there an alternative for flex/bison that is usable on 8-bit embedded systems?
Somehow compile the formula into your programming language. This often involves first building the AST, then translating the AST into your programming language's terms, running your compiler on that result, and loading the compiled result. This may be awkward with a compiled language such as C because of the absence of dynamic linking; it is easier with languages such as Java or C# because they sort of encourage dynamic loading.
As you can guess, this approach is more effort, but the payoff is that the formula can now be evaluated as quickly as your programming language can do it. You can do this by ad hoc (e.g., recursive descent) parsing and translation, or you can tackle it in a more regular manner using a tool that "rewrites" one syntax into another: https://softwarerecs.stackexchange.com/a/31379/101

As Ira has already pointed out, you parse your expression into an abstract syntax tree. An abstract syntax tree fitting your problem would look similar to this:
interface AstNode {
    double eval(double x);
}

class ConstantNode implements AstNode {
    double value;

    public double eval(double x) {
        return value;
    }
}

class VariableNode implements AstNode {
    // The variable evaluates to whatever x is passed in.
    public double eval(double x) {
        return x;
    }
}

class OperatorNode implements AstNode {
    char op;
    AstNode left;
    AstNode right;

    public double eval(double x) {
        switch (op) {
            case '+': return left.eval(x) + right.eval(x);
            case '-': return left.eval(x) - right.eval(x);
            case '/': return left.eval(x) / right.eval(x);
            case '*': return left.eval(x) * right.eval(x);
            case '^': return Math.pow(left.eval(x), right.eval(x));
            default:
                throw new RuntimeException("Invalid op " + op);
        }
    }
}
class Function implements AstNode {
...
After you have parsed your expression to the tree, you can just call eval() with the values you are interested in.
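As a sketch of how a tree that is built once can be reused by an iterative method, the following evaluates one fixed tree repeatedly inside a bisection loop. The names Expr, Const, Var, BinOp and Bisection are hypothetical stand-ins for the node types above, and the tree is assembled by hand here where a parser would normally produce it.

```java
interface Expr {
    double eval(double x);
}

final class Const implements Expr {
    private final double value;
    Const(double value) { this.value = value; }
    public double eval(double x) { return value; }
}

final class Var implements Expr {
    public double eval(double x) { return x; }
}

final class BinOp implements Expr {
    private final char op;
    private final Expr left, right;

    BinOp(char op, Expr left, Expr right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }

    public double eval(double x) {
        switch (op) {
            case '+': return left.eval(x) + right.eval(x);
            case '-': return left.eval(x) - right.eval(x);
            case '*': return left.eval(x) * right.eval(x);
            case '/': return left.eval(x) / right.eval(x);
            default: throw new IllegalStateException("Invalid op " + op);
        }
    }
}

class Bisection {
    /** Finds a root of f in [lo, hi]; f(lo) and f(hi) must have opposite signs. */
    static double findRoot(Expr f, double lo, double hi, double tolerance) {
        double fLo = f.eval(lo);
        if (fLo * f.eval(hi) > 0) {
            throw new IllegalArgumentException("f(lo) and f(hi) must have opposite signs");
        }
        while (hi - lo > tolerance) {
            double mid = 0.5 * (lo + hi);
            double fMid = f.eval(mid); // same tree, new x: no re-parsing
            if (fMid == 0.0) {
                return mid;
            }
            if (fLo * fMid < 0) {
                hi = mid;          // sign change is in the left half
            } else {
                lo = mid;          // sign change is in the right half
                fLo = fMid;
            }
        }
        return 0.5 * (lo + hi);
    }
}
```

For example, the tree new BinOp('-', new BinOp('*', new Var(), new Var()), new Const(2)) represents x*x - 2, and Bisection.findRoot on the interval [0, 2] converges toward the square root of 2.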

Maybe you can do that with Java's scripting support (javax.script). For example, it should be possible to evaluate the function in JavaScript (Nashorn).
Pro: you don't need to parse the function yourself. Just use the scripting API.

Edit AST by using visitors in Antlr

I am new to ANTLR and I am struggling to do the following:
After I have parsed a source file (for which I have a valid grammar, of course) and have the AST in memory, I want to change some things and then print it back out through the visitor API.
e.g.
int foo() {
y = x ? 1 : 2;
}
and turn it into:
int foo() {
if (x) {
y = 1;
} else {
y = 2;
}
}
Up to now I have the appropriate grammar to parse such syntax and I have also made some visitor methods that are getting called when I am on the correct position. What baffles me is that during visiting I can't change the text.
Ideally I would like to have something like this:
public Void visitTernExpr(SimpleCParser.TernExprContext ctx) {
ctx.setText("something");
return null;
}
and in my Main I would like to have this AST edited by different visitors that each one of them is specialised in something. Like this:
ANTLRInputStream input = new ANTLRInputStream(new FileInputStream(filename));
SimpleCLexer lexer = new SimpleCLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
SimpleCParser parser = new SimpleCParser(tokens);
ProgramContext ctx = parser.program();
MyChecker1 mc1 = new MyChecker1();
mc1.visit(ctx);
MyChecker2 mc2 = new MyChecker2();
mc2.visit(ctx);
ctx.printToFile("myfile");
Is there any way of doing this in ANTLR, or am I heading in a very wrong direction?

You can do this in ANTLR by smashing the AST nodes and links. You'll have to create all the replacement subtree nodes and splice them in place. Then you'll have to implement the "spit source text" tree walk; I suggest you investigate "string templates" for this purpose.
But ultimately you have to do a lot of work to achieve this effect. This is because the goal of the ANTLR tool is largely focused around "parsing", which pushes the rest on you.
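To give a feel for the two pieces of work described above (splicing replacement nodes into the tree, then walking the tree to print source text), here is a small self-contained sketch. It deliberately does not use ANTLR's API: Stmt, Assign, TernaryAssign, IfElse and TernaryRewriter are hypothetical hand-rolled classes, and expressions are kept as plain strings for brevity.

```java
import java.util.List;

interface Stmt {
    /** Regenerates source text for this node. */
    String toSource();
}

final class Assign implements Stmt {
    final String target, value;

    Assign(String target, String value) {
        this.target = target;
        this.value = value;
    }

    public String toSource() {
        return target + " = " + value + ";";
    }
}

final class TernaryAssign implements Stmt {
    final String target, cond, whenTrue, whenFalse;

    TernaryAssign(String target, String cond, String whenTrue, String whenFalse) {
        this.target = target;
        this.cond = cond;
        this.whenTrue = whenTrue;
        this.whenFalse = whenFalse;
    }

    public String toSource() {
        return target + " = " + cond + " ? " + whenTrue + " : " + whenFalse + ";";
    }
}

final class IfElse implements Stmt {
    final String cond;
    final Stmt thenStmt, elseStmt;

    IfElse(String cond, Stmt thenStmt, Stmt elseStmt) {
        this.cond = cond;
        this.thenStmt = thenStmt;
        this.elseStmt = elseStmt;
    }

    public String toSource() {
        return "if (" + cond + ") { " + thenStmt.toSource()
                + " } else { " + elseStmt.toSource() + " }";
    }
}

class TernaryRewriter {
    /** Replaces every TernaryAssign in the list, in place, with an equivalent IfElse. */
    static void rewrite(List<Stmt> body) {
        for (int i = 0; i < body.size(); i++) {
            if (body.get(i) instanceof TernaryAssign) {
                TernaryAssign t = (TernaryAssign) body.get(i);
                body.set(i, new IfElse(t.cond,
                        new Assign(t.target, t.whenTrue),
                        new Assign(t.target, t.whenFalse)));
            }
        }
    }

    /** The "spit source text" walk: one statement per line. */
    static String print(List<Stmt> body) {
        StringBuilder out = new StringBuilder();
        for (Stmt s : body) {
            out.append(s.toSource()).append('\n');
        }
        return out.toString();
    }
}
```

Even at this toy scale, every node type needs its own construction and printing code, which is the work the answer says a parser generator leaves to you.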
If what you want to do is replace one set of syntax with another, what you really want is a program transformation system. These are tools that are designed to have all of the above built in already, so you don't have to reinvent it all. They also usually have source-to-source transformations, which make tasks like the one you have shown much, much easier to implement.
To accomplish your example with our DMS program transformation engine, you'd write a transformation rule and then apply it:
rule replace_ternary_assignment_by_ifthenelse
(l: left_hand_side, c: expression, e1: expression, e2: expression):
statement -> statement
"\l = \c ? \e1 : \e2;"
=> " if (\c) \l = \e1; else \l = \e2 ";
DMS parses your code, builds ASTs, finds matches for the rewrites, and constructs/splices all those replacement nodes for you. Finally, DMS has built-in prettyprinters to regenerate the text. The point of all this is to let you get on with your task of modifying your code, rather than creating a whole new engineering job before you can do your task. Read my essay, "Life After Parsing", easily found via my bio or with a Google search, for more on this topic.
[If you go to the DMS Wikipedia page, you will amusingly find the inverse of this transform used as an example.]

I would use a listener, and yes you can modify the AST while you are walking through it.
You can create a new instance of the if/else context and then replace the ternary operator context with it. This is possible because you have a reference to the rule's parent and an extensive API for handling each rule's children.

Binary Tree Postfix Calculator

I'm making a Postfix calculator where I must use stack objects and a Binary tree during the translation of an expression from infix to a parse tree during the evaluation of a postfix expression.
Can someone please translate?
I have developed a postfix calculator method and I've developed a method that changes an expression from infix to postfix, but I don't understand what I am being asked to do. I can enter an expression in infix and calculate it fine as well as convert it to postfix, but I cannot determine what exactly I am being asked to create here.
An example of how to essentially do this in pseudocode would be very helpful or just an explanation of how to store a mathematical expression into a binary tree as well as how to evaluate an expression in a binary tree with stack into a parse tree.
I'll also say I'm a little unsure what a parse tree is.
Any explanation would be very much appreciated.
It is an assignment for a class, so it can be seen here if this was inadequate information: http://www.cs.gsu.edu/jbhola/csc3410/Spring13/assign6_expre_tree.html
My main point here is I just don't quite understand what I'm supposed to do or how I'm supposed to do it. We weren't taught how to program any of this and we lack a textbook so I'm just kind of blindly trying to wrap my head around the whole project :/

Imagine you have a node like AddNode which has two values
class AddNode {
final double a, b;
double value() {
return // how could you return the value of this node?
}
}
Making it more generic
interface Node { double value(); }
class AddNode implements Node {
final Node a, b;
double value() {
return // something which gives the value of this node.
}
}
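One possible reading of the assignment, sketched under assumptions: a stack is used to assemble a binary expression tree from a postfix string, and the tree is then evaluated recursively. The names ExprNode, NumberNode, OpNode and PostfixTreeBuilder are hypothetical, and tokens are assumed to be separated by whitespace.

```java
import java.util.ArrayDeque;
import java.util.Deque;

interface ExprNode {
    double value();
}

final class NumberNode implements ExprNode {
    private final double number;
    NumberNode(double number) { this.number = number; }
    public double value() { return number; }
}

final class OpNode implements ExprNode {
    private final char op;
    private final ExprNode left, right;

    OpNode(char op, ExprNode left, ExprNode right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }

    // The value of an operator node is the operator applied to its children's values.
    public double value() {
        switch (op) {
            case '+': return left.value() + right.value();
            case '-': return left.value() - right.value();
            case '*': return left.value() * right.value();
            case '/': return left.value() / right.value();
            default: throw new IllegalStateException("Unknown operator " + op);
        }
    }
}

class PostfixTreeBuilder {
    /** Builds an expression tree from whitespace-separated postfix tokens. */
    static ExprNode build(String postfix) {
        Deque<ExprNode> stack = new ArrayDeque<>();
        for (String token : postfix.trim().split("\\s+")) {
            if (token.length() == 1 && "+-*/".indexOf(token.charAt(0)) >= 0) {
                if (stack.size() < 2) {
                    throw new IllegalArgumentException("Missing operand before " + token);
                }
                // The right operand was pushed last, so it is popped first.
                ExprNode right = stack.pop();
                ExprNode left = stack.pop();
                stack.push(new OpNode(token.charAt(0), left, right));
            } else {
                stack.push(new NumberNode(Double.parseDouble(token)));
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("Malformed postfix expression");
        }
        return stack.pop();
    }
}
```

For instance, PostfixTreeBuilder.build("3 4 + 2 *") yields a tree whose root is the multiplication, with the addition as its left child.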

How do I pass in a polynomial function in java?

For a programming project in Calculus we were instructed to code a program that models the Simpson's 1/3 and 3/8 rule.
We are supposed to take in a polynomial (e.g. 5x^2+7x+10), but I am struggling to conceptualize this. I have begun by using Scanner, but is there a better way to correctly read the polynomial?
Any examples or reference materials will be greatly appreciated.

I'd suggest that you start with a Function interface that takes an input value and returns an output value:
public interface Function {
double evaluate(double x);
}
Write a polynomial implementation:
public class Poly {
public static double evaluate(double x, double [] coeffs) {
double value = 0.0;
if (coeffs != null) {
// Use Horner's method to evaluate.
for (int i = coeffs.length-1; i >= 0; --i) {
value = coeffs[i] + (x*value);
}
}
return value;
}
}
Pass that to your integrator and let it do its thing.
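A minimal sketch of what such an integrator could look like, assuming the composite Simpson's 1/3 rule with an even number of subintervals. The names RealFunction, Polynomial and Simpson are hypothetical (RealFunction plays the role of the Function interface above), and the polynomial stores its coefficients lowest degree first.

```java
interface RealFunction {
    double evaluate(double x);
}

final class Polynomial implements RealFunction {
    private final double[] coeffs; // coeffs[i] multiplies x^i

    Polynomial(double... coeffs) {
        this.coeffs = coeffs.clone();
    }

    // Horner's method: work from the highest-degree coefficient down.
    public double evaluate(double x) {
        double value = 0.0;
        for (int i = coeffs.length - 1; i >= 0; --i) {
            value = coeffs[i] + x * value;
        }
        return value;
    }
}

class Simpson {
    /** Composite Simpson's 1/3 rule on [a, b] with n subintervals; n must be positive and even. */
    static double integrate(RealFunction f, double a, double b, int n) {
        if (n <= 0 || n % 2 != 0) {
            throw new IllegalArgumentException("n must be a positive even number");
        }
        double h = (b - a) / n;
        double sum = f.evaluate(a) + f.evaluate(b);
        for (int i = 1; i < n; i++) {
            // Interior points alternate between weight 4 (odd i) and weight 2 (even i).
            sum += (i % 2 == 1 ? 4.0 : 2.0) * f.evaluate(a + i * h);
        }
        return sum * h / 3.0;
    }
}
```

For 5x^2 + 7x + 10, written as new Polynomial(10, 7, 5), the exact integral over [0, 1] is 91/6. Simpson's 1/3 rule is exact for polynomials up to degree three, so the sketch should reproduce that value up to floating-point rounding.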

A simple way (to get you started) is to use an array.
In your example: 5x^2 + 7x + 10 would be:
{10,7,5}
I.e. index 0 holds the coefficient 10 for x^0, index 1 holds 7 for x^1, and index 2 holds 5 for x^2.
Of course, this is not the best approach. To see why, figure out how you would represent x^20.

In java it would be easiest to pre-format your input and just ask for constants--as in, "Please enter the X^2 term" (and then the X term, and then the constant).
If that's not acceptable, you are going to be quite vulnerable to input style differences. You can separate the terms by String.split[ting] on + and -, that will leave you something like:
[5x^2], [7x], [10]
You could then search for strings containing "x^2" and "x" to differentiate your terms
Remove spaces and .toLowerCase() first to counter user variances, of course.
When you split your string you will need to identify the - cases so you can negate those constants.
You could do two splits, one on + the other on -. You could also use StringTokenizer with the option to keep the "Tokens" which might be more straight-forward but StringTokenizer makes some people a little uncomfortable, so go with whatever works for you.
Note that this will succeed even if the user types "5x^2 + 10 + 7 x", which can be handy.
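The splitting approach can be sketched roughly as follows. This is one possible variation rather than the exact procedure described above: it splits in front of each sign so that every term keeps its own + or -, and it uses a regular expression per term instead of searching for "x^2" and "x". The class name PolynomialParser is hypothetical, and only non-negative integer exponents are supported.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PolynomialParser {
    // Groups: 1 = sign, 2 = coefficient digits, 3 = "x" part, 5 = exponent digits.
    private static final Pattern TERM =
            Pattern.compile("([+-]?)(\\d*\\.?\\d*)(x(\\^(\\d+))?)?");

    /** Returns coefficients indexed by degree, so result[i] multiplies x^i. */
    static double[] parse(String input) {
        // Remove spaces and lowercase first, to counter user variances.
        String text = input.replaceAll("\\s+", "").toLowerCase();
        if (text.isEmpty()) {
            throw new IllegalArgumentException("Empty polynomial");
        }
        double[] coeffs = new double[1];
        // Split in front of every sign so each term keeps its own '+' or '-'.
        for (String term : text.split("(?=[+-])")) {
            Matcher m = TERM.matcher(term);
            boolean hasX = m.matches() && m.group(3) != null;
            if (!m.matches() || (m.group(2).isEmpty() && !hasX)) {
                throw new IllegalArgumentException("Cannot parse term: " + term);
            }
            double sign = "-".equals(m.group(1)) ? -1.0 : 1.0;
            // A bare "x" or "x^2" has an implicit coefficient of 1.
            double coeff = m.group(2).isEmpty() ? 1.0 : Double.parseDouble(m.group(2));
            int degree = !hasX ? 0 : (m.group(5) == null ? 1 : Integer.parseInt(m.group(5)));
            if (degree >= coeffs.length) {
                coeffs = Arrays.copyOf(coeffs, degree + 1);
            }
            // Accumulate, so repeated terms of the same degree are summed.
            coeffs[degree] += sign * coeff;
        }
        return coeffs;
    }
}
```

Because the terms are placed by their exponent rather than by position, input such as "5x^2 + 10 + 7 x" produces the same coefficient array as "5x^2 + 7x + 10".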

I believe parsing is my problem. I am somewhat new to java so this is troubling me.
You should use a parser generator.
A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar. In addition to the parser generator itself, JavaCC provides other standard capabilities related to parser generation such as tree building (via a tool called JJTree included with JavaCC), actions, debugging, etc.
JavaCC's FAQ answers How do I parse arithmetic expressions?
See the examples that come with JavaCC.
See any text on compiling.
See Parsing Expressions by Recursive Descent and a tutorial by Theodore Norvell.
Also, see JavaCC - Parse math expressions into a class structure
