My current understanding is that JDT provides two different interface and class hierarchies for representing and manipulating Java code:
Java Model: provides a way of representing a Java project. Fast to
create, but does not contain as much information as the AST class
hierarchy; for example, no information is available about the
exact position of each element in the source file (in the AST that is
available).
AST: a more detailed representation of the source code, which also provides
means for manipulating it.
Is that correct?
Now, there is also a hierarchy of interfaces named I*Binding (starting at IBinding), for example IMethodBinding. So for example, we have 3 different types for dealing with methods:
IMethod (from Java Model)
MethodInvocation (from AST, could get it from IMethod)
IMethodBinding
From the documentation, IMethodBinding seems very similar to MethodInvocation from the AST, but I don't see a clear distinction between them or when I should use each. Could someone please clarify this?
Raw AST nodes do not contain references to one another, e.g. from a variable use back to its declaration, or from a method invocation back to the method declaration. A MethodInvocation object may be inspected for the method name, but you can't immediately learn which method of which class is actually being invoked. Scoping analysis is required to do so.
This analysis is called binding resolution. IBinding objects are attached to AST nodes, and you can use them to find, e.g., the MethodDeclaration AST node for a given MethodInvocation AST node using CompilationUnit.findDeclaringNode(methodInvocationNode.resolveMethodBinding().getKey()).
Or you can use CompilationUnit.findDeclaringNode(method.getKey()) to find which AST node contains the declaration corresponding to a given IMethod object.
MethodInvocation.resolveMethodBinding().getKey() ==
MethodDeclaration.resolveBinding().getKey() ==
IMethod.getKey()
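The role of the shared key can be illustrated with a toy model. This is only a sketch: the classes ToyDeclaration, ToyInvocation and ToyCompilationUnit and the key string format are hypothetical stand-ins, not JDT types, and the "resolution" step is assumed to have already filled in the key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a declaration node; NOT a JDT class.
final class ToyDeclaration {
    final String key;    // plays the role of the binding key
    final String owner;  // class that declares the method

    ToyDeclaration(String key, String owner) {
        this.key = key;
        this.owner = owner;
    }
}

// Hypothetical stand-in for an invocation node; NOT a JDT class.
final class ToyInvocation {
    final String name;         // all a raw node knows: the method name
    final String resolvedKey;  // filled in by binding resolution; null if unresolved

    ToyInvocation(String name, String resolvedKey) {
        this.name = name;
        this.resolvedKey = resolvedKey;
    }
}

// Hypothetical stand-in for a compilation unit that indexes declarations by key.
final class ToyCompilationUnit {
    private final Map<String, ToyDeclaration> declarationsByKey = new HashMap<>();

    void declare(ToyDeclaration declaration) {
        declarationsByKey.put(declaration.key, declaration);
    }

    // Returns the declaration with this key, or null if it is not declared in this unit.
    ToyDeclaration findDeclaringNode(String key) {
        return key == null ? null : declarationsByKey.get(key);
    }
}
```

The name alone ("bar") says nothing about which class is involved; the key is what ties the invocation, the declaration and the model element together.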
I'm trying to programmatically use / embed the Ceylon Typechecker to analyse Ceylon source code. In this use case I want to access all the information that the compiler would normally consume. But I'm not going to compile, and I'm not going to add a dependency on the compiler.
It seems to me that the main.Main entry point in ceylon/typechecker/src/main/Main.java is not the appropriate entry point for this use case (obtaining the typed tree and the attached models), because this information, which is collected by the visitors in the three checker passes, is discarded, and only errors are printed.
So, my question is:
How can I parse and typecheck a compilation unit and then obtain
1. the typed syntax tree, and
2. the associated model objects of the types the analysis visitors encounter in the tree, which are linked to from the tree.
edited:
There was (and is) some confusion on my side about the 3 different ASTs there are.
In the README in ceylon/ceylon.ast it is said:
"...
ceylon.ast.core – the Ceylon classes that represent a Ceylon AST. Pure Ceylon (backend-independent).
...
ceylon.ast.redhat – transforms a ceylon.ast.core AST from + to a RedHat compiler (ceylon-spec) AST, and also contains functions to compile a ceylon.ast.core AST from a code string (using the RedHat compiler)
..."
So there are 3 ASTs: 1. The one generated by ANTLR, 2. ceylon.ast.core, and 3. ceylon.ast.redhat.
Why?
In short, you'll want to:
Configure a TypeCheckerBuilder with the source files you want to typecheck,
Obtain a TypeChecker from the builder (builder.typechecker),
Invoke the typechecker (typeChecker.process()),
Process the results available from typeChecker.phasedUnits. Specifically, typeChecker.getPhasedUnits().getPhasedUnits() will give you a List<PhasedUnit>, and for each PhasedUnit, you can call getCompilationUnit() to obtain its Tree.CompilationUnit, which is the root of the AST. AST nodes also include getters for model objects.
For a detailed example, you can review the code for the Dart backend, working forwards and backwards from the call to process() in the compileDart() function.
See testCompile for example code that calls compileDart().
I want to get a typed AST from JavaParser or another parser of Java code. That is, I would like to be able to get the type of a specific variable, or the parameter and return types of a method. I googled a lot about this feature of JavaParser but didn't find anything; I assume this is because JavaParser produces an untyped AST. So, please advise me how I can get this. But please don't say to parse all the code and build my own set of types: I tried, and this is very hard, I think harder than making my own AST parser.
I am a JavaParser contributor and I just did that in Clojure, on top of JavaParser. I explain how to implement that in the post "How to build a symbol solver for Java, in Clojure".
JavaParser, like any other parser, just builds an Abstract Syntax Tree (AST) of the code; you then have to resolve symbols to understand which references are associated with which declarations.
Suppose you have in your code something like:
a = 1;
Now, to understand the type of a you have to find where it is declared. It could be a reference to a parameter, to a local variable, to a field declared in the current class, or to an inherited field. If it is an inherited field, you have to find the code (or the bytecode) of the parent class and look there for the declaration of a. A parser does not do that; a parser just takes a string (or a file) and builds an AST.
Building a symbol resolver is not rocket science, but it requires a bit of work. The solution I described in the post linked above is available on GitHub, and I would be glad to help you use it if you want (even though it is written in Clojure, you can call it from Java quite easily).
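The lookup order described above (local variable, parameter, field, inherited field) can be sketched as a chain of scopes. This is a minimal illustration in Java, not the Clojure solver from the post; the Scope class and its method names are hypothetical, and real resolution would also have to load parent classes from source or bytecode.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical scope: maps the names declared at one level to a type name.
final class Scope {
    final String kind;   // e.g. "local variable", "parameter", "field"
    final Scope parent;  // enclosing scope, or null for the outermost one
    private final Map<String, String> declaredTypes = new HashMap<>();

    Scope(String kind, Scope parent) {
        this.kind = kind;
        this.parent = parent;
    }

    // Records a declaration and returns this scope so calls can be chained.
    Scope declare(String name, String type) {
        declaredTypes.put(name, type);
        return this;
    }

    // Walks outwards from this scope until a declaration of the name is found.
    Optional<String> resolveType(String name) {
        for (Scope scope = this; scope != null; scope = scope.parent) {
            String type = scope.declaredTypes.get(name);
            if (type != null) {
                return Optional.of(type);
            }
        }
        return Optional.empty();
    }
}

final class SymbolSolverDemo {
    public static void main(String[] args) {
        Scope parentClass = new Scope("inherited field", null).declare("a", "long");
        Scope currentClass = new Scope("field", parentClass).declare("b", "String");
        Scope method = new Scope("parameter", currentClass).declare("c", "int");
        Scope block = new Scope("local variable", method);
        // "a" is not declared locally, so the lookup falls through to the parent class.
        System.out.println(block.resolveType("a"));
    }
}
```

An inner declaration shadows an outer one simply because the walk stops at the first match.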
I need to parse a sequence of Prolog statements, and I've been putting together ad-hoc regexes to handle them, but the result is not very robust. I noticed java.util.regex.Pattern.Prolog, which is a subclass of java.util.regex.Pattern.Node, but I can't seem to find anything that explains what these classes are for or how to use them. The Javadocs are mostly empty. Are there tutorials or fleshed-out documentation of the purpose and usage of these classes? Can they be used to parse Prolog?
Those classes have package access modifiers. For example, Node, in Oracle JDK 7, is declared as
static class Node extends Object {
They can only be accessed from classes in the same package. However, since that package is typically secured by the JVM, you cannot add your classes to it. You'll get an exception like
Exception in thread "main" java.lang.SecurityException: Prohibited package name: java.util.regex
You can find and copy the source code if you want, but you will not be able to use the classes themselves.
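If you want to confirm the access level yourself, reflection can report a class's modifiers without needing access to the class. The sketch below assumes the JDK in use still has a nested class named java.util.regex.Pattern$Node; that is an internal detail and may differ between JDK versions. The RegexNodeAccess class is a made-up name for the example.

```java
import java.lang.reflect.Modifier;

final class RegexNodeAccess {
    // Returns true if the named class is declared public.
    static boolean isPublic(String className) {
        try {
            return Modifier.isPublic(Class.forName(className).getModifiers());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("No such class: " + className, e);
        }
    }

    public static void main(String[] args) {
        // Pattern is part of the public API; its nested Node class is not.
        System.out.println(isPublic("java.util.regex.Pattern"));
        System.out.println(isPublic("java.util.regex.Pattern$Node"));
    }
}
```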
As for their purpose, you have to again go to the source code and look at the comments.
/**
* The following classes are the building components of the object
* tree that represents a compiled regular expression. The object tree
* is made of individual elements that handle constructs in the Pattern.
* Each type of object knows how to match its equivalent construct with
* the match() method.
*/
I am trying to write an AST interpreter / REPL. ANTLRv4 provides two very similar interfaces (ParseTreeVisitor and ParseTreeListener) to walk the parse tree. I cannot seem to find any major differences between them, and the documentation is rather sparse. Is one interface preferable to the other?
The interfaces are used for different purposes. The primary differences are as follows:
ParseTreeListener
Provides separate enter/exit methods for before/after the children of a parse tree node are examined.
All methods return void. Any values collected for "return" by the listener must be held in fields or elsewhere.
Control of which tree nodes are examined is external (via ParseTreeWalker or a derived class).
ParseTreeVisitor
Provides one method which is responsible for all analysis/behavior for each parse tree node.
Each method returns generic type parameter T, which may be Void if the visitor methods do not return a value.
Control of which tree nodes are examined is internal (via visitChildren and/or calls to visit for specific children).
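The two styles can be contrasted on a toy tree. This is a sketch of the patterns only: ToyNode, ToyListener, ToyWalker and ToyVisitor are hypothetical types written for this example, not the ANTLR runtime interfaces.

```java
import java.util.Arrays;
import java.util.List;

// Toy parse-tree node; not an ANTLR class.
final class ToyNode {
    final String label;
    final List<ToyNode> children;

    ToyNode(String label, ToyNode... kids) {
        this.label = label;
        this.children = Arrays.asList(kids);
    }
}

// Listener style: void callbacks before and after a node's children.
interface ToyListener {
    void enter(ToyNode node);
    void exit(ToyNode node);
}

// The traversal lives outside the listener, in a walker.
final class ToyWalker {
    static void walk(ToyListener listener, ToyNode node) {
        listener.enter(node);
        for (ToyNode child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }
}

// Visitor style: one method per node that returns a value.
interface ToyVisitor<T> {
    T visit(ToyNode node);
}

// A listener has to accumulate its result in a field.
final class TraceListener implements ToyListener {
    final StringBuilder trace = new StringBuilder();

    public void enter(ToyNode node) { trace.append('(').append(node.label); }
    public void exit(ToyNode node) { trace.append(')'); }
}

// A visitor returns its result and decides itself which children to visit.
final class DepthVisitor implements ToyVisitor<Integer> {
    public Integer visit(ToyNode node) {
        int deepest = 0;
        for (ToyNode child : node.children) {
            deepest = Math.max(deepest, visit(child));
        }
        return 1 + deepest;
    }
}
```

Because the visitor drives its own recursion, it could just as well skip a subtree entirely, which the walker-driven listener cannot do on its own.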
I'm currently pondering how best to take an AST generated using Antlr and convert it into useful objects which I can use in my program.
The purpose of my grammar (apart from learning) is to create an executable (runtime-interpreted) language.
For example, how would I take an attribute sub-tree and have a specific Attribute class instantiated? E.g.
The following code in my language:
Print(message:"Hello stackoverflow")
would produce the following AST:
My current line of thinking is that a factory class could read the tree and pull out the name (message), type (STRING) and value ("Hello stackoverflow"). Now, knowing the type, I could instantiate the correct class (e.g. a StringAttribute class) and pass in the required attribute data: the name and value.
The same approach could be used for a definition factory, pulling out the definition name (Print), instantiating the Print class, and then passing in the attributes generated from the attribute factory.
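A minimal sketch of such an attribute factory might look like the following. It assumes the name, type tag and raw text have already been pulled out of the attribute sub-tree; the Attribute base class, the NumberAttribute class and the NUMBER tag are assumptions added for illustration, and only StringAttribute and STRING come from the description above.

```java
// Base class for all attribute kinds (hypothetical).
abstract class Attribute {
    final String name;

    Attribute(String name) {
        this.name = name;
    }

    abstract Object value();
}

final class StringAttribute extends Attribute {
    private final String text;

    StringAttribute(String name, String text) {
        super(name);
        this.text = text;
    }

    Object value() { return text; }
}

// Assumed second attribute kind, to show how the factory dispatches.
final class NumberAttribute extends Attribute {
    private final double number;

    NumberAttribute(String name, double number) {
        super(name);
        this.number = number;
    }

    Object value() { return number; }
}

final class AttributeFactory {
    // Chooses the concrete class from the type tag found in the tree.
    static Attribute create(String name, String type, String rawText) {
        switch (type) {
            case "STRING":
                return new StringAttribute(name, rawText);
            case "NUMBER":
                return new NumberAttribute(name, Double.parseDouble(rawText));
            default:
                throw new IllegalArgumentException("Unknown attribute type: " + type);
        }
    }
}
```

For the Print example, AttributeFactory.create("message", "STRING", "Hello stackoverflow") would yield a StringAttribute that a definition factory could then hand to the Print class.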
Things do get a bit more complicated with a more complicated program:
Program(args:[1,2,3,4,5])
{
    If(isTrue:IsInArray(array:{Program.args} value:5))
    {
        Then {
            Print(message:"5 is in the array")
        } Else {
            Print(message:"More complex " + "message")
        }
    }
}
ANY/ALL help or thoughts are very welcome. Many thanks.
Previous related questions by me (Could be useful):
How do I make a tree parser
Solving LL recursion problem
Antrl3 conditional tree rewrites
I recommend reading chapter 9, Building High-Level Interpreters, from Language Implementation Patterns by Terence Parr.
EDIT
Okay, to get you through the time waiting for that book, here's what you're (at least) going to need:
a global memory space;
function spaces (each function space will also have a (local) memory space);
and classes that spring to mind (in UML-ish style):
class Interpreter
    global : MemorySpace
    functions : Stack<Function>
    ...
class MemorySpace
    vars : Map<String, Object>
    ...
class Function
    local : MemorySpace
    execute() : void
    ...
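As a rough Java rendering of the outline above, the three classes might start out like this. It is a sketch under assumptions, not the book's code: the enclosing link that lets a local memory space fall back to the global one, and the define, lookup and call methods, are additions made for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class MemorySpace {
    private final Map<String, Object> vars = new HashMap<>();
    private final MemorySpace enclosing;  // assumed link; null for the global space

    MemorySpace(MemorySpace enclosing) {
        this.enclosing = enclosing;
    }

    void define(String name, Object value) {
        vars.put(name, value);
    }

    // Looks in this space first, then in the enclosing spaces.
    Object lookup(String name) {
        for (MemorySpace space = this; space != null; space = space.enclosing) {
            if (space.vars.containsKey(name)) {
                return space.vars.get(name);
            }
        }
        throw new IllegalStateException("Undefined variable: " + name);
    }
}

abstract class Function {
    final MemorySpace local;

    Function(MemorySpace global) {
        this.local = new MemorySpace(global);
    }

    abstract void execute();
}

final class Interpreter {
    final MemorySpace global = new MemorySpace(null);
    final Deque<Function> functions = new ArrayDeque<>();  // call stack

    // Pushes the function for the duration of its execution.
    void call(Function function) {
        functions.push(function);
        try {
            function.execute();
        } finally {
            functions.pop();
        }
    }
}
```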
Here's one with ANTLR -> LLVM:
Once you have the AST, all you need is an iterator to walk the tree and the template to emit the objects you want.
This tutorial is based on Flex and Bison, but at the end the author details how he converts his AST to LLVM assembly code; it might be helpful.