Java API to get metadata about Java Source Code - java

I would like to create some reverse-engineered design docs based on Java source code (not bytecode). Instead of writing my own interpreter, what tools and APIs are available to traverse Java code, using Java code?
Reflection works on bytecode and is limited to the method level; I also want to "objectize" the method bodies.
Javadoc ignores the code itself and is based only on comments, and automatic UML sequence diagrams are too strict.
E.g. an API like this (forgive my ignorance of the official programming-language structure terms):
JavaCodeDom jcd = new JavaCodeDom(new File(pathToJavaSource), CompilerEnum.Java16);
List<ClassSrc> classes = jcd.getClasses();
ClassSrc cls = classes.get(0);
Map<MethodSignatureSrc, MethodSrc> methods = cls.getMethodsMap();
MethodSrc main = methods.get(new MethodSignatureSrc(Modifiers.Public, Modifiers.Static, ReturnTypes.Void, "main", new MethodParams(String[].class)));
List<StatementSrc> statements = main.getStatements();
for (StatementSrc statement : statements) {
    if (statement.getType() == StatementTypes.Assignment()) {
        AssignmentStatementSrc assignment = (AssignmentStatementSrc) statement;
        Identifier src = assignment.getAssigneeVariable();
        ExpressionSrc value = assignment.getAssignmentValue();
    }
}
List<AnnotationsSrc> annotations = cls.getAnnotations();

There are several such APIs in existence (and delivered with the JDK), some of them built into the Java compiler (javac).
The most extensive is the Compiler Tree API, which gives you access to individual expressions (and subexpressions) in the Java source.
The language model API models types and members of types (constructors, methods, fields). It is used by the Compiler Tree API and also for annotation processing. It does not give access to the contents of the methods.
Of course, at runtime you have the Reflection API (java.lang.Class and java.lang.reflect.*, together with java.lang.annotation).
To use the Compiler Tree API, you have to invoke the compiler through the Compiler API.
Additionally, there is the Doclet API for Javadoc, which gives you a view similar to the language model API, but additionally with the documentation comments (and parsed tags).
I once used a combination of Doclet API and Compiler Tree API to format source code beautifully (this is not online, sadly).
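To make the combination of Compiler API and Compiler Tree API more concrete, here is a minimal sketch that parses a source string and lists the assignment expressions found inside each method. The class name AssignmentLister, the method listAssignments, and the "string:///" URI scheme are illustrative choices, not part of any JDK API. The sketch only parses, so it sees syntax but no resolved types.

```java
import com.sun.source.tree.AssignmentTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

final class AssignmentLister {

    /** Parses one compilation unit held in a string and lists "method: target = value". */
    static List<String> listAssignments(String fileName, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null when no javac is present
        if (compiler == null) {
            throw new IllegalStateException("no system Java compiler available");
        }
        // In-memory source file; the URI scheme is arbitrary (an assumption of this sketch).
        JavaFileObject unit = new SimpleJavaFileObject(
                URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null, List.of(unit));

        List<String> found = new ArrayList<>();
        try {
            for (CompilationUnitTree tree : task.parse()) {
                new TreeScanner<Void, String>() {
                    @Override
                    public Void visitMethod(MethodTree method, String enclosing) {
                        // Pass the method name down to everything inside the method.
                        return super.visitMethod(method, method.getName().toString());
                    }

                    @Override
                    public Void visitAssignment(AssignmentTree node, String enclosing) {
                        found.add(enclosing + ": " + node.getVariable() + " = " + node.getExpression());
                        return super.visitAssignment(node, enclosing);
                    }
                }.scan(tree, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String source = "class Demo { int bar; void foo() { this.bar = 100; } }";
        listAssignments("Demo.java", source).forEach(System.out::println);
    }
}
```

Note that a declaration with an initializer such as `int x = 1;` is a VariableTree rather than an AssignmentTree, so a fuller tool would probably override visitVariable as well.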

BCEL supports reading and manipulating Java class files. (I have not used it myself, but saw it used successfully in a third-party product.)
The Byte Code Engineering Library is intended to give users a convenient
possibility to analyze, create, and manipulate (binary) Java class files
(those ending with .class). Classes are represented by objects which
contain all the symbolic information of the given class: methods,
fields and byte code instructions, in particular.
If you're just interested in decompiling, you might find it sufficient to decompile to source code. Here's a comparison of several options for Java.

It seems ANTLR is one option, but I haven't used it.

This seems to answer my question: How to generate AST from Java source-code? ( Spoon )

Related

Using reflection to modify the structure of an object

From wikipedia:
reflection is the ability of a computer program to examine and modify the structure and behavior (specifically the values, meta-data, properties and functions) of an object at runtime.
Can anyone give me a concrete example of modifying the structure of an object? I'm aware of the following example.
Object foo = Class.forName("complete.classpath.and.Foo").newInstance();
Method m = foo.getClass().getDeclaredMethod("hello", new Class<?>[0]);
m.invoke(foo);
There are other ways to get the class and examine its structure, but the question is: how is modification done?
Just an additional hint since the previous answers and comments answer the question concerning reflection.
To really change the structure of a class, and therefore its behaviour, at runtime, look at bytecode instrumentation, in this case the javassist and ASM libraries. In any case this is not a trivial task.
Additionally, you might have a look at aspect-oriented programming, which enables you to enhance methods with additional functionality. It is often used to introduce logging without making your class depend on the logging classes, and without scattering logging calls through the problem-related code.
In English reflection means "mirror image".
So I'd disagree with the Wikipedia definition. For me, reflection is about runtime inspection of code, not manipulation.
In Java, you can modify the bytecode at runtime using bytecode manipulation. One well-known and widely used library is CGLIB.
In Java, reflection is not fully supported as defined by Wikipedia.
Only Field.setAccessible(true) or Method.setAccessible(true) really modifies a class, and even then it only changes security, not behaviour.
Frameworks such as Hibernate use this to add behaviour to a class, e.g. by generating a subclass in bytecode that accesses private fields in the parent class.
Java is still a statically typed language, unlike JavaScript, where you can change any behaviour at runtime.
The only method in reflection (java.lang.reflect) that modifies an object's class behaviour is setAccessible, which changes the accessibility flag of a Constructor, Method or Field, whatever the wiki says. There are, however, libraries like http://ru.wikipedia.org/wiki/Byte_Code_Engineering_Library for decomposing, modifying, and recomposing binary Java classes.
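The effect of setAccessible described above can be shown with a short sketch. The names AccessFlagSketch, Account and setPrivateInt are made up for illustration. The example changes the state held in a private field; the structure of the class (its fields and methods) stays exactly as compiled.

```java
import java.lang.reflect.Field;

final class AccessFlagSketch {

    /** Hypothetical class with a private field, used only for illustration. */
    static final class Account {
        private int balance = 10;

        int balance() {
            return balance;
        }
    }

    /** Writes a private int field through reflection after lifting the access check. */
    static void setPrivateInt(Object target, String fieldName, int value) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // relaxes the access check on this Field object only
            field.setInt(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account account = new Account();
        setPrivateInt(account, "balance", 42);
        System.out.println(account.balance()); // prints 42
    }
}
```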

placeholder for any code

I need to define an object at runtime, like the one below.
Filter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
new RegexStringComparator(".*-.5"));
I am reading a String that contains code like the following:
String _filterString = "RowFilter(CompareFilter.CompareOp.EQUAL, "
        + "new RegexStringComparator(\".*-.5\"))";
Now I need to create a filter object from the above String.
I know that this kind of problem can be solved with reflection, but I am looking for alternatives. Is there a simple way to solve problems like this?
The Java Scripting API allows embedding miscellaneous languages such as JavaScript, with bindings to Java variables and methods. In your case the language BeanShell (a Java subset) can be used.
The Java Compiler API can be used for compiling at runtime, but it requires full source (a compilation unit). I don't think a single expression can be compiled. Maybe you can work from there to get your objects from classes compiled at runtime.
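One way to work from there, sketched under some assumptions: wrap the expression string in a generated compilation unit, compile it into a temporary directory with the Compiler API, then load the class and call it. The names ExpressionCompiler, evaluate and GeneratedExpr are invented for this sketch, and the demo uses a StringBuilder expression instead of the HBase filter so that it runs on a plain JDK. For the filter, the generated source would also need the relevant imports and the library on the class path.

```java
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Supplier;

final class ExpressionCompiler {

    /** Wraps a Java expression in a Supplier class, compiles it, and returns its value. */
    static Object evaluate(String expression) {
        // The compiler needs a full compilation unit, so the expression is embedded in one.
        String source =
                "public class GeneratedExpr implements java.util.function.Supplier<Object> {\n"
              + "    public Object get() { return " + expression + "; }\n"
              + "}\n";
        try {
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("no system Java compiler available");
            }
            Path outDir = Files.createTempDirectory("generated-classes");
            JavaFileObject unit = new SimpleJavaFileObject(
                    URI.create("string:///GeneratedExpr.java"), JavaFileObject.Kind.SOURCE) {
                @Override
                public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                    return source;
                }
            };
            boolean ok = compiler.getTask(
                    null, null, null, List.of("-d", outDir.toString()), null, List.of(unit)).call();
            if (!ok) {
                throw new IllegalArgumentException("expression did not compile: " + expression);
            }
            // Load the freshly written class file and invoke it while the loader is open.
            try (URLClassLoader loader = new URLClassLoader(new URL[] {outDir.toUri().toURL()})) {
                Supplier<?> supplier = (Supplier<?>) loader.loadClass("GeneratedExpr")
                        .getDeclaredConstructor().newInstance();
                return supplier.get();
            }
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("new StringBuilder(\"abc\").reverse().toString()")); // prints cba
    }
}
```

Anything in the string is executed as code, so this approach only suits input from a trusted source.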

Programmatic code modification (e.g. variable extraction) in Java

I know it's possible to do nice stuff with Reflection, such as invoking methods, or altering the values of fields. Is it possible to do heavier code modification, though, at runtime and programmatically?
For instance, if I have a method:
public void foo(){
this.bar = 100;
}
Can I write a program that modifies the innards of this method, notices that it assigns a constant to a field, and turns it into the following:
public int baz = 100;
public void foo(){
this.bar = baz;
}
Perhaps Java isn't really the language to do this kind of thing in - if not, I'm open to suggestions for languages that would allow me to basically reparse or inspect code in this way, and be able to alter it so precisely. I might be pipe dreaming here though, so please tell me if this is the case also.
Just adding a suggestion from a friend - Apache Commons' BCEL looks excellent:
http://commons.apache.org/bcel/manual.html
The Byte Code Engineering Library (Apache Commons BCEL™) is intended to
give users a convenient way to analyze, create, and manipulate (binary)
Java class files (those ending with .class). Classes are represented by
objects which contain all the symbolic information of the given class:
methods, fields and byte code instructions, in particular.
Such objects can be read from an existing file, be transformed by a
program (e.g. a class loader at run-time) and written to a file again.
An even more interesting application is the creation of classes from
scratch at run-time. The Byte Code Engineering Library (BCEL) may be
also useful if you want to learn about the Java Virtual Machine (JVM)
and the format of Java .class files.
You are looking for software that allows you to do bytecode manipulation. There are several frameworks for this, but the two best known currently are:
ASM
javassist
When performing bytecode modifications at runtime on Java classes, keep the following in mind:
If you change a class's bytecode after the class has been loaded by a classloader, you'll have to find a way to reload its class definition (either through classloading tricks or using hotswap functionality).
If you change the class's interface (for example, add new methods or fields), you will only be able to reach them through reflection.
It's probably fair to say that Java wasn't designed with this purpose in mind, but you can do it potentially. How and when depends a little on the ultimate aim of the exercise. A couple of options:
At the source code level, you can use the Java Compiler API to
compile arbitrary code into a class file (which you can then load).
At the bytecode level, you can write an agent that installs a
ClassFileTransformer to arbitrarily alter a class "on the fly"
as it is loaded. In practice, if you do this, you will also probably
make use of a library such as BCEL (Bytecode Engineering
Library) to make manipulating the class easier.
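The agent option can be sketched with the java.lang.instrument API alone. The class names LoggingAgent and RecordingTransformer are illustrative. This transformer only records which classes pass through it and returns null, which tells the JVM to keep the original bytes; the actual rewriting with a bytecode library is deliberately left out.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class LoggingAgent {

    /** Transformer that records matching class names and leaves the bytes untouched. */
    static final class RecordingTransformer implements ClassFileTransformer {
        private final String prefix; // internal-form prefix such as "com/example/"
        final List<String> seen = new CopyOnWriteArrayList<>();

        RecordingTransformer(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            if (className != null && className.startsWith(prefix)) {
                seen.add(className);
                // A real agent would pass classfileBuffer to a bytecode library here
                // and return the rewritten bytes instead of null.
            }
            return null; // null means "no change"
        }
    }

    /** Entry point named by the Premain-Class attribute in the agent jar's manifest. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new RecordingTransformer(agentArgs == null ? "" : agentArgs));
    }
}
```

Packaged in a jar whose manifest has a Premain-Class entry, such an agent is attached with the -javaagent:path/to/agent.jar=com/example/ command-line option, where the text after "=" arrives as agentArgs.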
You want to investigate program transformation systems (PTS), which provide general facilities for parsing and transforming languages at the source level. PTS provide rewrite rules that say in effect, "if you see this pattern, replace it by that pattern" using the surface syntax of the target language. This is done using full parsers so the rewrite rule really operates on language syntax and not text; such rewrite rules obviously won't attempt to modify code-like text in comments, unlike tools based on regexps.
Our DMS Software Reengineering Toolkit is one of these. It provides not only the usual parsing, AST building and prettyprinting (reproducing compilable source code complete with comments), but also supports symbol tables and control and data flow analysis. These are needed for almost any interesting transformations. DMS also has front ends for a variety of dialects of Java as well as many other languages.
Bytecode transformers exist because they are much easier to build; it is pretty easy to "parse" bytecode. Of course, you can't make permanent source changes with a bytecode transformer, so it is a lot less useful.
You mean like this?
String script1 = "println(\"OK!\");";
eval( script1 );
script1 += "println(\"... well, maybe NOT OK after all\");";
eval( script1 );
Output:
OK!
OK!
... well, maybe NOT OK after all
... use a scripting extension to Java. Groovy and other things like that would probably allow you to do what you want. I've written a scripting extension which integrates with Java through reflection almost seamlessly myself; contact me if you're interested in the details.

Which JVM-based language should I use for mapping of one type to another?

I'm currently working with Java to write a program that does an EAI between two applications. One application comes with HL7, which I parse with HAPI. So I get a Java object structure. I want to transform this structure to my own structure that I want to use to generate XML files with JAXB after doing some other work.
In my opinion my current solution is not very nice, because the source code gets very complex:
public NaturalPerson convertPID(PID pid) {
NaturalPerson person = new NaturalPerson();
NameNaturalPerson personsname = new NameNaturalPerson();
personsname.setFamilyName(pid.getPatientName().getFamilyName().getValue());
...
}
Which language is appropriate for such type mappings? (http://en.wikipedia.org/wiki/List_of_JVM_languages)
I think Java is not the best language for doing that. I don't have much time for learning, so I need a language that is easy to learn and has a gentle learning curve at the start. I already have some experience in the functional languages Haskell and F#. At first I thought Groovy would be a good choice, but then I found other opinions that suggest Scala.
Which language would you suggest for doing such type mappings?
Did you look at Dozer? It is a Java library that recursively copies data from one Java object to another. There are several ways to configure the mapping:
XML
Java API providing a DSL
Java annotations
Data in the form of maps and vectors is handled superbly on the JVM using Clojure.
See all the core functions available and this SO Question on which tutorials are good to learn Clojure.

How to determine which classes are referenced in a compiled .Net or Java application?

I wonder if there's an easy way to determine which classes from a library are "used" by a compiled .NET or Java application, and I need to write a simple utility to do that (so using any of the available decompilers won't do the job).
I don't need to analyze different inputs to figure out if a class is actually created for this or that input set - I'm only concerned whether or not the class is referenced in the application. Most likely the application would subclass from the class I look for and use the subclass.
I've looked through a bunch of .Net .exe's and Java .classes with a hex editor and it appears that the referenced classes are spelled out in plaintext, but I am not sure if it will always be the case - my knowledge of MSIL/Java bytecode is not enough for that. I assume that even though the application itself can be obfuscated, it'll still have to call the library classes by the original name?
Extending what overslacked said.
EDIT: For some reason I thought you asked about methods, not types.
Types
Like finding methods, this doesn't cover access through the Reflection API.
You have to locate the following in a Reflector plugin to identify referenced types and perform a transitive closure:
Method parameters
Method return types
Custom attributes
Base types and interface implementations
Local variable declarations
Evaluated sub-expression types
Field, property, and event types
If you parse the IL yourself, all you have to process from the main assembly is the TypeRef and TypeSpec metadata, which is pretty easy (of course, I'm speaking from having parsed the entire byte code here). However, the transitive closure would still require you to process the full byte code of each referenced method in the referenced assembly (to get the subexpression types).
Methods
If you can write a plugin for Reflector to handle the task, it will definitely be the easiest way. Parsing the IL is non-trivial, though I've done it now so I would just use that code if I had to (just saying it's not impossible). :D
Keep in mind that you may have method dependencies you don't see on the first pass that neither method mentioned will catch. These are due to indirect dispatch via the callvirt (virtual and interface method calls) and calli (generally delegates) instructions. For each type T created with newobj and for each method M within the type, you'll have to check all callvirt, ldftn, and ldvirtftn instructions to see if the base definition for the target (if the target is a virtual method) is the same as the base method definition for M in T or M is in the type's interface map if the target is an interface method. This is not perfect, but it is about the best you can do for static analysis without a theorem prover. It is a superset of the actual methods that will be called outside of the Reflection API, and a subset of the full set of methods in the assembly(ies).
For .NET: it looks like there's an article on MSDN that should help you get started. For what it's worth, for .NET the magic Google words are ".net assembly references".
In Java, the best mechanism to find class dependencies (in a programmatic fashion) is through bytecode inspection. This can be done with libraries like BCEL or (preferably) ASM. If you wish to parse the class files with your own code, the class file structure is well documented in the Java VM specification.
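For the "parse the class files with your own code" route, a minimal sketch is to read the constant pool and collect its CONSTANT_Class entries, following the layout in the Java VM specification. The names ConstantPoolSketch and referencedClasses are made up here. Array types show up in descriptor form (starting with "["), and types that occur only inside field or method descriptors are not CONSTANT_Class entries, so a complete tool would parse descriptors too.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.TreeSet;

final class ConstantPoolSketch {

    /** Returns the names of all CONSTANT_Class entries in a class file's constant pool. */
    static Set<String> referencedClasses(InputStream classFile) {
        try (DataInputStream in = new DataInputStream(classFile)) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IOException("not a class file");
            }
            in.readUnsignedShort(); // minor version
            in.readUnsignedShort(); // major version
            int count = in.readUnsignedShort();
            String[] utf8 = new String[count];
            int[] classNameIndex = new int[count];
            for (int i = 1; i < count; i++) { // entry 0 is unused
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                      // Utf8
                    case 7: classNameIndex[i] = in.readUnsignedShort(); break;  // Class
                    case 8: case 16: case 19: case 20:                          // String, MethodType, Module, Package
                        in.readFully(new byte[2]); break;
                    case 15: in.readFully(new byte[3]); break;                  // MethodHandle
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.readFully(new byte[4]); break;                       // Integer, Float, refs, NameAndType, (Invoke)Dynamic
                    case 5: case 6:                                             // Long, Double occupy two slots
                        in.readFully(new byte[8]); i++; break;
                    default: throw new IOException("unknown constant pool tag " + tag);
                }
            }
            Set<String> names = new TreeSet<>();
            for (int index : classNameIndex) {
                if (index != 0) {
                    names.add(utf8[index].replace('/', '.'));
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = java.util.ArrayList.class.getResourceAsStream("ArrayList.class");
        referencedClasses(in).forEach(System.out::println);
    }
}
```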
Note that class inspection won't cover runtime dependencies (like classes loaded using the service API).
