JavaParser - Get typed AST - java

I want to get a typed AST from JavaParser or another Java parser. That is, I want to be able to get the type of a specific variable, or the parameter types and return type of a method. I googled a lot about this feature of JavaParser but didn't find anything; I assume this is because JavaParser produces an untyped AST. So please advise me how I can get this. But please don't tell me to parse all the code and build my own set of types: I tried, and it is very hard, I think harder than writing my own AST parser.

I am a JavaParser contributor and I just did that in Clojure, on top of JavaParser. I explain how to implement it in a post: How to build a symbol solver for Java, in Clojure.
JavaParser, like any other parser, just builds an Abstract Syntax Tree (AST) of the code; you then have to resolve symbols to understand which references are associated with which declarations.
Suppose you have in your code something like:
a = 1;
Now, to understand the type of a you have to find where it is declared. It could be a reference to a parameter, to a local variable, to a field declared in the current class, or to an inherited field. If it is an inherited field, you have to find the code (or the bytecode) of the parent class and look there for the declaration of a. A parser does not do that; a parser just takes a string (or a file) and builds an AST.
Building a symbol resolver is not rocket science, but it requires a bit of work. The solution I described in the post linked above is available on GitHub, and I would be glad to help you use it if you want (even though it is written in Clojure, you can call it from Java quite easily).
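The lookup order described above (local variable, parameter, field of the current class, inherited field) can be sketched as a chain of scopes. This is only a minimal illustration, not the code from the linked post and not a JavaParser API: the Scope class, its methods, and the type names stored as plain strings are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical scope: maps declared names to type names and knows its enclosing scope.
final class Scope {
    private final Scope parent;
    private final Map<String, String> declaredTypes = new HashMap<>();

    Scope(Scope parent) {
        this.parent = parent;
    }

    void declare(String name, String type) {
        declaredTypes.put(name, type);
    }

    // Walks outwards from the innermost scope; the first declaration found wins,
    // so an inner declaration shadows an outer one with the same name.
    Optional<String> resolve(String name) {
        for (Scope scope = this; scope != null; scope = scope.parent) {
            String type = scope.declaredTypes.get(name);
            if (type != null) {
                return Optional.of(type);
            }
        }
        return Optional.empty();
    }
}

final class SymbolResolutionDemo {
    public static void main(String[] args) {
        Scope parentClass = new Scope(null);          // fields of the parent class
        parentClass.declare("a", "long");
        Scope currentClass = new Scope(parentClass);  // fields of the current class
        currentClass.declare("count", "int");
        Scope method = new Scope(currentClass);       // method parameters
        method.declare("name", "String");
        Scope block = new Scope(method);              // local variables

        // "a" is not declared locally, so the lookup ends at the inherited field.
        System.out.println(block.resolve("a").orElse("unresolved"));
    }
}
```

A real resolver also has to load the parent class from source or bytecode and handle imports, overloads and generics, which is where most of the work goes.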

Related

In ANTLR how do I parse streams against a specific low-level rule?

ANTLR newbie question. Say that for a given grammar the ANTLR Maven plugin has created all the necessary Java classes to traverse and parse a text, and it works just fine when used as prescribed in "The Definitive ANTLR4 Reference".
Now imagine I need to reuse the generated classes to parse an expression which is defined by a rule buried somewhere deep in the grammar file.
However, the Reference doesn't seem to provide a clue as to how to select a specific rule as the starting one; the generated classes always expect the whole grammar tree to be present in the source.
Using the generated classes as-is doesn't work either, because the corresponding listener and parser methods expect a context parameter, which can only be created with a "parent context" and an "invoking state" that I don't know how to define.
The only (and rubbish) solution I have come up with so far is splitting the grammar into two files, so that the low-level rule in question becomes a top-level one, and importing the latter into the first.
Am I missing something obvious here? Any help would be appreciated.
This is very simple. Load your input stream with the text that you want to match against one of the subrules, then call the function for that subrule in the parser, just as you did with the main rule. Each grammar rule is represented by a function, which you can simply call for your subtext; it will then generate a stripped-down parse tree that covers only this subrule (and its children).
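The idea that every rule is a method, and that any of them can serve as the entry point, can be illustrated without ANTLR at all. The sketch below is a hand-written recursive-descent parser, not ANTLR-generated code; MiniParser, its three-rule grammar and the string form of the parse tree are hypothetical and exist only to show the principle.

```java
// Hand-written stand-in for a generated parser: one method per grammar rule.
final class MiniParser {
    private final String input;
    private int pos = 0;

    MiniParser(String input) {
        this.input = input.replace(" ", "");
    }

    // expr : term ('+' term)* ;   (the "main" rule)
    String expr() {
        StringBuilder tree = new StringBuilder("(expr " + term());
        while (peek() == '+') {
            pos++;
            tree.append(" + ").append(term());
        }
        return tree.append(')').toString();
    }

    // term : atom ('*' atom)* ;   (a lower-level rule, callable on its own)
    String term() {
        StringBuilder tree = new StringBuilder("(term " + atom());
        while (peek() == '*') {
            pos++;
            tree.append(" * ").append(atom());
        }
        return tree.append(')').toString();
    }

    // atom : INT | '(' expr ')' ;
    String atom() {
        if (peek() == '(') {
            pos++;
            String inner = expr();
            if (peek() != ')') {
                throw new IllegalStateException("expected ')' at position " + pos);
            }
            pos++;
            return "(atom ( " + inner + " ))";
        }
        int start = pos;
        while (Character.isDigit(peek())) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalStateException("expected a number at position " + pos);
        }
        return "(atom " + input.substring(start, pos) + ")";
    }

    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }
}

final class SubruleDemo {
    public static void main(String[] args) {
        // Start at the lower-level rule instead of the main one.
        System.out.println(new MiniParser("2*3").term());
        // prints (term (atom 2) * (atom 3))
    }
}
```

With an ANTLR-generated parser the call looks the same in spirit: construct the lexer, token stream and parser over the sub-text, then invoke the method named after the rule you want to start from.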

What is the role of I*Binding in Eclipse JDT?

My current understanding is that JDT provides us with two different interface and class hierarchies for representing and manipulating Java code:
Java Model: provides a way of representing a Java project. Fast to create, but does not contain as much information as the AST class hierarchy; for example, there is no information about the exact position of each element in the source file (in the AST that's available).
AST: a more detailed representation of the source code, which also provides means for manipulating it.
Is that correct?
Now, there is also a hierarchy of interfaces named I*Binding (starting at IBinding), for example IMethodBinding. So for example, we have 3 different types for dealing with methods:
IMethod (from Java Model)
MethodInvocation (from AST, could get it from IMethod)
IMethodBinding
From the docs, IMethodBinding seems very similar to MethodInvocation from the AST, but I don't see a clear distinction or when I should use each of them. Could someone please clarify this?
Raw AST nodes do not contain references between them, e.g. from a variable use back to its declaration, or from a method invocation back to the method declaration. A MethodInvocation object may be inspected for the method name, but you can't immediately learn which method of which class is actually being invoked. Scoping analysis is required to do so.
This analysis is called binding resolution. IBinding objects are attached to AST nodes, and you can use them to find e.g. the MethodDeclaration AST node for a given MethodInvocation AST node using CompilationUnit.findDeclaringNode(methodInvocationNode.resolveMethodBinding().getKey())
Or you can use CompilationUnit.findDeclaringNode(method.getKey()) to find which AST node contains declaration corresponding to given IMethod object.
MethodInvocation.resolveMethodBinding().getKey() ==
MethodDeclaration.resolveBinding().getKey() ==
IMethod.getKey()

Java/groovy reflection read method content

I would like to analyse my code, classes and methods: my goal is to create a sequence diagram by reverse-engineering my code.
But I would like to analyse it without running the application.
So far I have already got the names of my classes and methods.
What I am looking for now is a way to read/get the content of a method without using a regular expression to parse my entire file.
Is there a simple way to get it?
Thanks
I think you can use Groovy's Global AST Transforms to analyze your code. They give you access to the abstract syntax tree, and from there you can walk the tree nodes of your code. This works by hooking into the Groovy compilation process.
I'm not sure it will work with Java code. Most Java code is also valid Groovy, so in theory it could work, but the Groovy compiler won't go through .java files.

Converting Antlr syntax tree into useful objects

I'm currently pondering how best to take an AST generated using Antlr and convert it into useful objects which I can use in my program.
The purpose of my grammar (apart from learning) is to create an executable (runtime-interpreted) language.
For example, how would I take an attribute sub-tree and have a specific Attribute class instantiated? E.g.
The following code in my language:
Print(message:"Hello stackoverflow")
would produce the following AST:
My current line of thinking is that a factory class could read the tree and pull out the name (message), type (STRING) and value ("Hello stackoverflow"). Now, knowing the type, I could instantiate the correct class (e.g. a StringAttribute class) and pass in the required attribute data: the name and value.
The same approach could be used for a definition factory, pulling out the definition name (Print), instantiating the Print class, and then passing in the attributes generated by the attribute factory.
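The attribute factory idea could be sketched roughly as follows. Everything here is an assumption made for illustration: Node is a hypothetical stand-in for the tree node type (ANTLR's own tree classes are not used), and the subtree shape (name, then type, then value as children) is guessed, since the AST itself is not shown above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical AST node: a text label plus child nodes.
final class Node {
    final String text;
    final List<Node> children;

    Node(String text, Node... children) {
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

abstract class Attribute {
    final String name;

    Attribute(String name) {
        this.name = name;
    }
}

final class StringAttribute extends Attribute {
    final String value;

    StringAttribute(String name, String value) {
        super(name);
        this.value = value;
    }
}

final class AttributeFactory {
    // Assumes the attribute subtree has exactly three children: name, type, value.
    static Attribute create(Node attribute) {
        String name = attribute.children.get(0).text;
        String type = attribute.children.get(1).text;
        String value = attribute.children.get(2).text;
        switch (type) {
            case "STRING":
                return new StringAttribute(name, value);
            default:
                throw new IllegalArgumentException("unsupported attribute type: " + type);
        }
    }
}
```

Supporting another attribute type would mean adding a subclass of Attribute and one more case in the switch.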
Things do get a bit more complicated with a more complicated program:
Program(args:[1,2,3,4,5])
{
    If(isTrue:IsInArray(array:{Program.args} value:5))
    {
        Then {
            Print(message:"5 is in the array")
        } Else {
            Print(message:"More complex " + "message")
        }
    }
}
ANY/ALL help or thoughts are very welcome. Many thanks.
Previous related questions by me (Could be useful):
How do I make a tree parser
Solving LL recursion problem
Antrl3 conditional tree rewrites
I recommend reading chapter 9, Building High-Level Interpreters, from Language Implementation Patterns by Terence Parr.
EDIT
Okay, to get you through the time waiting for that book, here's what you're (at least) going to need:
a global memory space;
function spaces (each function space will also have a (local) memory space);
and classes that spring to mind (in UML-ish style):
class Interpreter
  global : MemorySpace
  functions : Stack<Function>
  ...
class MemorySpace
  vars : Map<String, Object>
  ...
class Function
  local: MemorySpace
  execute(): void
  ...
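A rough Java rendering of the three classes above might look like this. It is a sketch under assumptions, not the book's implementation: the call and lookup methods, the Print subclass, and passing the interpreter into execute (the outline has execute() without parameters) are additions made here so the example can run.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class MemorySpace {
    private final Map<String, Object> vars = new HashMap<>();

    void put(String name, Object value) {
        vars.put(name, value);
    }

    boolean contains(String name) {
        return vars.containsKey(name);
    }

    Object get(String name) {
        return vars.get(name);
    }
}

abstract class Function {
    final MemorySpace local = new MemorySpace();

    // Assumption: the interpreter is passed in so the body can look up variables.
    abstract void execute(Interpreter interpreter);
}

final class Interpreter {
    final MemorySpace global = new MemorySpace();
    private final Deque<Function> functions = new ArrayDeque<>(); // call stack

    void call(Function function) {
        functions.push(function);
        try {
            function.execute(this);
        } finally {
            functions.pop();
        }
    }

    // Looks in the local space of the running function first, then in the global space.
    Object lookup(String name) {
        Function current = functions.peek();
        if (current != null && current.local.contains(name)) {
            return current.local.get(name);
        }
        if (global.contains(name)) {
            return global.get(name);
        }
        throw new IllegalStateException("undefined variable: " + name);
    }
}

// Hypothetical built-in matching Print(message:"...") from the question.
final class Print extends Function {
    private final StringBuilder out;

    Print(String message, StringBuilder out) {
        local.put("message", message);
        this.out = out;
    }

    @Override
    void execute(Interpreter interpreter) {
        out.append(interpreter.lookup("message")).append('\n');
    }
}
```

Calling new Interpreter().call(new Print("Hello stackoverflow", out)) appends the message and a newline to out; writing to a StringBuilder rather than System.out is only there to make the behaviour easy to check.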
Here's one with ANTLR -> LLVM:
Once you have the AST, all you need is an iterator to walk the tree and the template to emit the objects you want.
This tutorial is based on Flex and Bison, but at the end the author details how he converts his AST to LLVM assembly code; it might be helpful.

How to determine which classes are referenced in a compiled .Net or Java application?

I wonder if there's an easy way to determine which classes from a library are "used" by a compiled .NET or Java application, and I need to write a simple utility to do that (so using any of the available decompilers won't do the job).
I don't need to analyze different inputs to figure out if a class is actually created for this or that input set - I'm only concerned whether or not the class is referenced in the application. Most likely the application would subclass from the class I look for and use the subclass.
I've looked through a bunch of .Net .exe's and Java .classes with a hex editor and it appears that the referenced classes are spelled out in plaintext, but I am not sure if it will always be the case - my knowledge of MSIL/Java bytecode is not enough for that. I assume that even though the application itself can be obfuscated, it'll still have to call the library classes by the original name?
Extending what overslacked said.
EDIT: For some reason I thought you asked about methods, not types.
Types
Like finding methods, this doesn't cover access through the Reflection API.
You have to locate the following in a Reflector plugin to identify referenced types and perform a transitive closure:
Method parameters
Method return types
Custom attributes
Base types and interface implementations
Local variable declarations
Evaluated sub-expression types
Field, property, and event types
If you parse the IL yourself, all you have to process from the main assembly is the TypeRef and TypeSpec metadata, which is pretty easy (of course, I'm speaking from the experience of parsing the entire byte code here). However, the transitive closure would still require you to process the full byte code of each referenced method in the referenced assembly (to get the subexpression types).
Methods
If you can write a plugin for Reflector to handle the task, it will definitely be the easiest way. Parsing the IL is non-trivial, though I've done it now so I would just use that code if I had to (just saying it's not impossible). :D
Keep in mind that you may have method dependencies you don't see on the first pass that neither method mentioned will catch. These are due to indirect dispatch via the callvirt (virtual and interface method calls) and calli (generally delegates) instructions. For each type T created with newobj and for each method M within the type, you'll have to check all callvirt, ldftn, and ldvirtftn instructions to see if the base definition for the target (if the target is a virtual method) is the same as the base method definition for M in T or M is in the type's interface map if the target is an interface method. This is not perfect, but it is about the best you can do for static analysis without a theorem prover. It is a superset of the actual methods that will be called outside of the Reflection API, and a subset of the full set of methods in the assembly(ies).
For .NET: it looks like there's an article on MSDN that should help you get started. For what it's worth, for .NET the magic Google words are ".net assembly references".
In Java, the best mechanism to find class dependencies (in a programmatic fashion) is through bytecode inspection. This can be done with libraries like BCEL or (preferably) ASM. If you wish to parse the class files with your own code, the class file structure is well documented in the Java VM specification.
Note that class inspection won't cover runtime dependencies (like classes loaded using the service API).
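For the "parse the class files with your own code" route, a minimal sketch could read only the constant pool and collect every CONSTANT_Class entry. This is an illustration under assumptions, not a replacement for ASM or BCEL: ClassRefScanner and its method are hypothetical names, the tag sizes follow the class file format in the JVM specification, and array types are left in descriptor form (for example [Ljava.lang.Object;).

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.TreeSet;

final class ClassRefScanner {

    // Returns the names found in the CONSTANT_Class entries of one class file.
    static Set<String> referencedClasses(InputStream classFile) {
        try (DataInputStream in = new DataInputStream(classFile)) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort();             // minor version
            in.readUnsignedShort();             // major version
            int count = in.readUnsignedShort(); // constant_pool_count
            String[] utf8 = new String[count];
            int[] classNameIndex = new int[count];
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1:  // Utf8: u2 length plus modified UTF-8 bytes
                        utf8[i] = in.readUTF();
                        break;
                    case 7:  // Class: index of the Utf8 entry holding the name
                        classNameIndex[i] = in.readUnsignedShort();
                        break;
                    case 5: case 6:  // Long, Double: 8 bytes, occupy two pool slots
                        skip(in, 8);
                        i++;
                        break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        skip(in, 4);  // Integer, Float, member refs, NameAndType, (Invoke)Dynamic
                        break;
                    case 15:  // MethodHandle
                        skip(in, 3);
                        break;
                    case 8: case 16: case 19: case 20:  // String, MethodType, Module, Package
                        skip(in, 2);
                        break;
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            Set<String> names = new TreeSet<>();
            for (int index : classNameIndex) {
                if (index != 0) {
                    names.add(utf8[index].replace('/', '.'));
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void skip(DataInputStream in, int bytes) throws IOException {
        in.readFully(new byte[bytes]);
    }

    public static void main(String[] args) {
        // Scans a JDK class as a demonstration; any .class file stream works the same way.
        InputStream in = ClassLoader.getSystemResourceAsStream("java/util/ArrayList.class");
        referencedClasses(in).forEach(System.out::println);
    }
}
```

The result includes the class itself and its superclass, because both are constant pool entries too. Classes that appear only inside method or field descriptors have no CONSTANT_Class entry of their own, so a complete scan would also parse the descriptor strings.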
