I have defined a grammar through which I produce a series of abstract syntax trees of sorts. For example
x = 7
println + * x 2 5
becomes
(assign)
 /    \
x      7

  (println)
      |
     (+)
    /   \
  (*)    5
  /  \
 x    2
These trees are made from various Node classes representing values and operations. Now, these trees are easy to interpret, but my goal is to generate Java bytecode representing these processes. My question is: what would be the best way to approach that? Should I just write the various bytecode instructions to a .class file by hand, or is there some library or interface that can help with this sort of thing?
The answer is yes. ASM and BCEL are two Java libraries designed to assist with runtime generation of classfiles.
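To see why a library helps here, the sketch below shows what "literally writing bytecode to a .class file" involves, using only java.io. It emits the smallest valid class file (a public class with no fields or methods) and loads it with a throwaway ClassLoader; the class name Gen and the overall shape are illustrative, not anything from ASM or BCEL. ASM hides all of this constant-pool bookkeeping behind its ClassWriter and MethodVisitor types.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hand-rolled minimal class file: "public class Gen extends Object"
// with no constructor, fields, or methods. Every number below is part
// of the class file format a library would otherwise manage for you.
public class RawClassFile {
    public static byte[] emit() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(0xCAFEBABE);                        // magic
        out.writeShort(0);                               // minor version
        out.writeShort(52);                              // major version (Java 8)
        out.writeShort(5);                               // constant pool count = entries + 1
        out.writeByte(7); out.writeShort(2);             // #1 Class -> name at #2
        out.writeByte(1); out.writeUTF("Gen");           // #2 Utf8 "Gen"
        out.writeByte(7); out.writeShort(4);             // #3 Class -> name at #4
        out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 Utf8 superclass name
        out.writeShort(0x0021);                          // ACC_PUBLIC | ACC_SUPER
        out.writeShort(1);                               // this_class  = #1
        out.writeShort(3);                               // super_class = #3
        out.writeShort(0);                               // interfaces count
        out.writeShort(0);                               // fields count
        out.writeShort(0);                               // methods count
        out.writeShort(0);                               // class attributes count
        return bytes.toByteArray();
    }

    // Load the generated bytes so the JVM's verifier checks our work.
    public static Class<?> load() throws IOException {
        byte[] b = emit();
        return new ClassLoader() {
            Class<?> define() { return defineClass("Gen", b, 0, b.length); }
        }.define();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(load().getName()); // prints Gen
    }
}
```

Emitting even one real method body on top of this (a Code attribute, stack-map frames, instruction encoding) is far more involved, which is exactly the work ASM or BCEL takes off your hands.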
I have written a framework that lets you compose code with declarative expression trees and then compile them to bytecode. Though I had developed it to ease runtime code generation, it could just as easily be leveraged as a sort of compiler back end, e.g., by translating your syntax tree into one of my expression trees.
At the very least, you could probably glean some useful bits from my compiler and maybe use my low-level code generation framework.
My repository is here. The expression tree library is procyon-expressions.
Related
In a Java program that has a variable t counting up the time (relative to the program start, not system time), how can I turn a user-input String into a math formula that can be evaluated efficiently whenever needed?
(Basically, the preparation of the formula can be slow, as it happens before run-time, but each stored function may be called several times during run-time and then has to be evaluated efficiently.)
As I could not find a math parser that would keep a formula loaded for later reference, instead of finding a general graph solving the equation y = f(x), I was considering having my Java program generate a script (JS, Python, etc.) out of the input String and then call that script with the current t as an input parameter.
However, I have been told that scripts are rather slow and thus impractical for real-time applications.
Is there a more efficient way of doing this? (I would even consider having my Java application generate and compile C code for every user input if that were viable.)
Edit: A tree construct does work to store expressions, but it is still fairly slow to evaluate: as far as I understand, I would need to traverse the tree object on every evaluation, which takes more calls than solving the equation directly. Instead I will attempt to generate additional Java classes.
What I do is generate Java code at runtime and compile it. There are a number of libraries to help you do this; one I wrote is https://github.com/OpenHFT/Java-Runtime-Compiler. This way it can be as efficient as if you had hand-written the Java code yourself, and if it is called enough times it will be compiled to native code.
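As a minimal sketch of this approach (using the JDK's built-in javax.tools compiler rather than the Java-Runtime-Compiler API, whose details are in its README): the formula string is spliced into a generated source file, compiled to a real class, and loaded. The class name GeneratedFormula and the DoubleUnaryOperator shape are assumptions for illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.DoubleUnaryOperator;
import javax.tools.ToolProvider;

// Turn a formula over t, e.g. "t * t + 1", into a compiled Java class.
public class RuntimeFormula {
    public static DoubleUnaryOperator compile(String body) throws Exception {
        Path dir = Files.createTempDirectory("gen");
        Path src = dir.resolve("GeneratedFormula.java");
        Files.write(src, (
            "import java.util.function.DoubleUnaryOperator;\n" +
            "public class GeneratedFormula implements DoubleUnaryOperator {\n" +
            "    public double applyAsDouble(double t) { return " + body + "; }\n" +
            "}\n").getBytes("UTF-8"));
        // Invoke the system compiler on the generated source file.
        int rc = ToolProvider.getSystemJavaCompiler()
                .run(null, null, null, src.toString());
        if (rc != 0) throw new IllegalStateException("compilation failed");
        // Load the freshly compiled class and instantiate it.
        try (URLClassLoader cl =
                new URLClassLoader(new URL[]{ dir.toUri().toURL() })) {
            return (DoubleUnaryOperator) cl.loadClass("GeneratedFormula")
                    .getDeclaredConstructor().newInstance();
        }
    }

    public static void main(String[] args) throws Exception {
        DoubleUnaryOperator f = compile("t * t + 1");
        System.out.println(f.applyAsDouble(3.0)); // prints 10.0
    }
}
```

After the one-time compilation cost, each call to applyAsDouble is an ordinary Java method call, so the JIT can compile it to native code like any hand-written function of t.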
Can you provide some information on the expected function type and the required performance? It may be enough to use a math parser library that pre-compiles a string containing a math formula with variables just once, and then uses this pre-compiled form of the formula to deliver results even as the variable values change. Solutions of this kind are pretty fast, as they typically avoid repeated string parsing, syntax checking, and so on.
An example of such an open-source math parser, which I recently used for my project, is mXparser:
mXparser on GitHub
http://mathparser.org/
Usage example containing function definition
Function f = new Function("f(x,y) = sin(x) + cos(y)");
double v1 = f.calculate(1,2);
double v2 = f.calculate(3,4);
double v3 = f.calculate(5,6);
In the above code, the actual string parsing is done just once, before calculating v1. The subsequent calculations of v1, v2 (and any further vn) are done in fast mode.
Additionally, you can use the function definition inside a string expression:
Expression e = new Expression("f(1,2)+f(3,4)", f);
double v = e.calculate();
I am trying to convert an ANTLR3 grammar to an ANTLR4 grammar, in order to use it with the antlr4-python2-runtime.
This grammar is a C/C++ fuzzy parser.
After converting it (basically removing tree operators and semantic/syntactic predicates), I generated the Python2 files using:
java -jar antlr4.5-complete.jar -Dlanguage=Python2 CPPGrammar.g4
The code is generated without any error, so I import it into my Python project (I'm using PyCharm) to run some tests:
import sys, time
from antlr4 import *
from parser.CPPGrammarLexer import CPPGrammarLexer
from parser.CPPGrammarParser import CPPGrammarParser

currenttimemillis = lambda: int(round(time.time() * 1000))

def is_string(object):
    return isinstance(object, str)

def parsecommandstringline(argv):
    if(2 != len(argv)):
        raise IndexError("Invalid args size.")
    if(is_string(argv[1])):
        return True
    else:
        raise TypeError("Argument must be str type.")

def doparsing(argv):
    if parsecommandstringline(argv):
        print("Arguments: OK - {0}".format(argv[1]))
        input = FileStream(argv[1])
        lexer = CPPGrammarLexer(input)
        stream = CommonTokenStream(lexer)
        parser = CPPGrammarParser(stream)
        print("*** Parser: START ***")
        start = currenttimemillis()
        tree = parser.code()
        print("*** Parser: END *** - {0} ms.".format(currenttimemillis() - start))

def main(argv):
    tree = doparsing(argv)

if __name__ == '__main__':
    main(sys.argv)
The problem is that the parsing is very slow. With a file containing ~200 lines it takes more than 5 minutes to complete, while the parsing of the same file in antlrworks only takes 1-2 seconds.
Analyzing the antlrworks tree, I noticed that the expr rule and all of its descendants are called very often and I think that I need to simplify/change these rules to make the parser operate faster:
Is my assumption correct or did I make some mistake while converting the grammar? What can be done to make parsing as fast as on antlrworks?
UPDATE:
I exported the same grammar to Java and it took only 795 ms to complete the parsing. The problem seems to be related to the Python implementation rather than to the grammar itself. Is there anything that can be done to speed up the Python parsing?
I've read that Python can be 20-30 times slower than Java, but in my case Python is ~400 times slower!
I can confirm that the Python 2 and Python 3 runtimes have performance issues. With a few patches, I got a 10x speedup on the Python 3 runtime (~5 seconds down to ~400 ms).
https://github.com/antlr/antlr4/pull/1010
I faced a similar problem, so I decided to bump this old post with a possible solution. My grammar ran instantly with the TestRig but was incredibly slow on Python 3.
In my case the fault was the non-greedy token that I was using to match one-line comments (double slash in C/C++, '%' in my case):
TKCOMM : '%' ~[\r\n]* -> skip ;
This is somewhat backed by this post from sharwell in this discussion here: https://github.com/antlr/antlr4/issues/658
When performance is a concern, avoid using non-greedy operators, especially in parser rules.
To test this scenario you may want to remove non-greedy rules/tokens from your grammar.
Posting here since it may be useful to people that find this thread.
Since this was posted, there have been several performance improvements to Antlr's Python target. That said, the Python interpreter will be intrinsically slower than Java or other compiled languages.
I've put together a Python accelerator code generator for Antlr's Python3 target. It uses Antlr C++ target as a Python extension. Lexing & parsing is done exclusively in C++, and then an auto-generated visitor is used to re-build the resulting parse tree in Python. Initial tests show a 5x-25x speedup depending on the grammar and input, and I have a few ideas on how to improve it further.
Here is the code-generator tool: https://github.com/amykyta3/speedy-antlr-tool
And here is a fully-functional example: https://github.com/amykyta3/speedy-antlr-example
Hope this is useful to those who prefer using Antlr in Python!
I use ANTLR with the Python 3 target these days, and a file of ~500 lines takes less than 20 seconds to parse.
So switching to the Python 3 target might help.
For implementation-specific reasons, I have to use Java 1.2. I am trying to parse a String object containing only numbers (I replace variables beforehand to abstract that step) and operators (PEDMAS). I have found a lot of libraries that do this well, but unfortunately nothing compatible with Java 1.2 (even with fiddling, all of them depend on things like generics). Obviously I'm capable of making this myself, but I would certainly prefer not to reinvent the wheel. Are there any libraries that I just haven't found yet that could do this for me? Thanks.
(Requirements: Binary operators and parentheses)
EDIT: As requested, some examples of input and output:
"(10 / 5) + 4.5 - (8)" would give you -1.5
"(1/3) * 4" would give you 1.3333333...
"5^3 + 4 * 2" would give you 133
"-10 + 5" would give you -5
Hopefully that makes sense.
You can write your own recursive descent parser. This Java implementation uses StreamTokenizer, available since 1.0, but you'll have to substitute int constants for the enum tokens and ignore tokenIs(Symbol.WORD) for function identifiers.
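If no library fits, such a recursive descent parser is small enough to write by hand. The sketch below sticks to language features and APIs that already existed in Java 1.2 (no generics, no enums, no autoboxing; Double.parseDouble arrived in 1.2). The class and method names are mine, and it scans characters directly rather than using StreamTokenizer.

```java
// Recursive-descent evaluator for the grammar:
//   expr   := term (('+'|'-') term)*
//   term   := factor (('*'|'/') factor)*
//   factor := base ('^' factor)?        right-associative power
//   base   := '-' base | '(' expr ')' | number
public class TinyCalc {
    private final String src;
    private int pos;

    private TinyCalc(String src) { this.src = src; }

    public static double eval(String s) { return new TinyCalc(s).parse(); }

    private double parse() {
        double v = expr();
        skipSpace();
        if (pos != src.length()) throw new RuntimeException("trailing input at " + pos);
        return v;
    }

    private double expr() {                    // lowest precedence: + and -
        double v = term();
        while (true) {
            skipSpace();
            if (eat('+')) v += term();
            else if (eat('-')) v -= term();
            else return v;
        }
    }

    private double term() {                    // * and /
        double v = factor();
        while (true) {
            skipSpace();
            if (eat('*')) v *= factor();
            else if (eat('/')) v /= factor();
            else return v;
        }
    }

    private double factor() {                  // ^ binds tighter, right-assoc
        double v = base();
        skipSpace();
        if (eat('^')) return Math.pow(v, factor());
        return v;
    }

    private double base() {                    // unary minus, parens, numbers
        skipSpace();
        if (eat('-')) return -base();
        if (eat('(')) {
            double v = expr();
            skipSpace();
            if (!eat(')')) throw new RuntimeException("missing ')' at " + pos);
            return v;
        }
        int start = pos;
        while (pos < src.length()
                && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
        if (start == pos) throw new RuntimeException("number expected at " + pos);
        return Double.parseDouble(src.substring(start, pos));
    }

    private boolean eat(char c) {
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void skipSpace() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
    }

    public static void main(String[] args) {
        System.out.println(eval("(10 / 5) + 4.5 - (8)")); // prints -1.5
        System.out.println(eval("5^3 + 4 * 2"));          // prints 133.0
        System.out.println(eval("-10 + 5"));              // prints -5.0
    }
}
```

Each precedence level gets its own method, so PEDMAS ordering falls out of the call structure rather than needing an explicit operator table.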
I have some classes that I developed and am using in an Android application. There are about 6-7 classes in that core; some of them are abstract classes with abstract methods. Those classes were created to provide an API to extend my Android application.
Now I want to create an extensible system that accepts rewrite rules. Those rules are useful for replacing some components at runtime. Imagine a system with mathematical operations where you see all the sums, multiplications, etc. Now you can zoom out, and I want to simplify some operations depending on the zoom level.
My system was built in Java, but I believe that Scala, with pattern matching, will simplify my problem. However, every time I look at Scala I see a lot of time I would have to spend and a lot of headaches configuring IDEs...
My classes are built to create structures like this one:
I want to be able to write rules that create a block that contains other blocks. Something like:
Integer Provider + Integer Provider -> Sum Provider
Sum Provider + Sum -> Sum Provider
Rules can be created by programmers. Any element of my structure can also be built by programmers. I don't know if Scala simplifies this rule engine system, but I know that this engine, in Java, can be tedious to build (probably a lot of bugs will be created, I will forget some cases, etc.).
Should I convert my whole system to Scala? Or is there a way to use only this feature of Scala? Is it worth it?
PS: For more information on the structure please see this post at User Experience.
Yes, it is easy to write such rules in Scala, and, in fact, there have been some questions on Stack Overflow related to rule-rewriting systems in Scala. Also, there are some libraries that may help you with this, related to strategic programming and NLP, but I haven't used them, so I can't comment much.
Now, I don't know exactly where these classes are coming from. If you are parsing and building them, the parser combinator library can trivially handle it:
sealed trait Expr { def value: Int }
case class Number(value: Int) extends Expr
case class Sum(e1: Expr, e2: Expr) extends Expr { def value = e1.value + e2.value }
object Example extends scala.util.parsing.combinator.RegexParsers {
def number: Parser[Expr] = """\d+""".r ^^ (n => Number(n.toInt))
def sum: Parser[Expr] = number ~ "+" ~ expr ^^ {
case n ~ "+" ~ exp => Sum(n, exp)
}
def expr: Parser[Expr] = sum | number
}
If you have these classes in some other way and are applying simplifications, you could do it like this:
def simplify(exprs: List[Expr]): List[Expr] = exprs match {
case expr :: Nil =>
List(expr) // no further simplification
case (n1: NumberProvider) :: Plus :: (n2: NumberProvider) :: rest =>
simplify(SumProvider(n1, n2) :: rest)
case (n: NumberProvider) :: Plus :: (s: SumProvider) :: rest =>
simplify(SumProvider(n, s) :: rest)
case (s: SumProvider) :: Plus :: (n: NumberProvider) :: rest =>
simplify(SumProvider(s, n) :: rest)
case other => other // no further simplification possible
}
The important elements here are case classes, extractors and pattern matching.
As a lone developer, Scala is expressive and powerful, so once mastered can be satisfying to write less and do more -- less boilerplate code, more compact idioms.
However, the power of Scala does come at a cost: it is a totally different language, with different (and I'd say more complex) idioms and syntax from Java. Java was designed to be intentionally simple, with the idea being that in larger organizations with more code being shared among developers, explicitness and syntax simplicity are more valuable than brilliantly concise code.
Java made an intentional choice to provide a smaller toolset to make it as quick and easy as possible for one developer to pick up where another left off, so in that sense it's geared towards team development. Scala however gives you a bit more rope to make concise but less immediately obvious constructs, which can be a minus for large enterprise environments.
Currently, Scala also has a smaller developer pool than Java, which means fewer examples and a smaller talent pool if you ever intend to hire a development team.
But as a lone developer on a solo project or in a small tight-knit team, Scala can be fun and fast to code with once you get over the significant learning hump.
If you are switching to Scala, switch to it for everything you can. There is hardly a point in using Java.
Is it worth the investment? From what one can read on the web (and my own impression) you won't become faster with Scala, but you will learn a lot.
So if you are only concerned with development speed: ignore Scala.
If you want to learn: Scala is a great choice as the next language to learn and use.
I'm looking for a CFG parser implemented with Java. The thing is I'm trying to parse a natural language. And I need all possible parse trees (ambiguity) not only one of them. I already researched many NLP parsers such as Stanford parser. But they mostly require statistical data (a treebank which I don't have) and it is rather difficult and poorly documented to adapt them in to a new language.
I found some parser generators, such as ANTLR or JFlex, but I'm not sure they can handle ambiguities. So which parser generator or Java library is best for me?
Thanks in advance
You want a parser that uses the Earley algorithm. I haven't used either of these two libraries, but PEN and PEP appear to implement this algorithm in Java.
Another option is Bison, which implements GLR, an LR-type parsing algorithm that supports ambiguous grammars. Bison also generates Java code, in addition to C++.
Take a look at the related discussion here. In my last comment in that discussion I explain that you can make any parser generator produce all of the parse trees by cloning the parse tree derived so far before making the derivation fail.
If your grammar is:
G -> ...
You would augment it like this:
G' -> G {semantic:deal-with-complete-parse-tree} <NOT-VALID-TOKEN>.
The parsing engine will ultimately fail on all derivations, but your program will have either:
Saved clones of all the trees.
Dealt with the semantics of each of the trees as they were found.
Both ANTLR and JavaCC did well when I was teaching. My preference was for ANTLR because of its BNF lexical analysis and its much less convoluted history, vision, and licensing.