Programatic code modification (e.g. variable extraction) in Java - java

I know it's possible to do nice stuff with Reflection, such as invoking methods, or altering the values of fields. Is it possible to do heavier code modification, though, at runtime and programmatically?
For instance, if I have a method:
public void foo(){
this.bar = 100;
}
Can I write a program that modifies the innards of this method, notices that it assigns a constant to a field, and turns it into the following:
public int baz = 100;
public void foo(){
this.bar = baz;
}
Perhaps Java isn't really the language to do this kind of thing in - if not, I'm open to suggestions for languages that would allow me to basically reparse or inspect code in this way, and be able to alter it so precisely. I might be pipe dreaming here though, so please tell me if this is the case also.

Just adding a suggestion from a friend - Apache Commons' BCEL looks excellent:
http://commons.apache.org/bcel/manual.html
The Byte Code Engineering Library (Apache Commons BCEL™) is intended to
give users a convenient way to analyze, create, and manipulate (binary)
Java class files (those ending with .class). Classes are represented by
objects which contain all the symbolic information of the given class:
methods, fields and byte code instructions, in particular.
Such objects can be read from an existing file, be transformed by a
program (e.g. a class loader at run-time) and written to a file again.
An even more interesting application is the creation of classes from
scratch at run-time. The Byte Code Engineering Library (BCEL) may be
also useful if you want to learn about the Java Virtual Machine (JVM)
and the format of Java .class files.

You are looking for software that allows you to do bytecode manipulation, there are several frameworks to achieve this, but the two most known currently are:
ASM
javassist
When performing bytecode modifications at runtime in Java classes keep in mind the following:
If you change a class's bytecode after a class has been loaded by a classloader, you'll have to find a way to reload it's class definition (either through classloading tricks, or using hotswap functionalities)
If you change the classes interface (example add new methods or fields) you will be able only to reach them through reflection.

It's probably fair to say that Java wasn't designed with this purpose in mind, but you can do it potentially. How and when depends a little on the ultimate aim of the exercise. A couple of options:
At the source code level, you can use the Java Compiler API to
compile arbitrary code into a class file (which you can then load).
At the bytecode level, you can write an agent that installs a
ClassFileTransformer to arbitrarily alter a class "on the fly"
as it is loaded. In practice, if you do this, you will also probably
make use of a library such as BCEL (Bytecode Engineering
Library) to make manipulating the class easier.

You want to investigate program transformation systems (PTS), which provide general facilities for parsing and transforming languages at the source level. PTS provide rewrite rules that say in effect, "if you see this pattern, replace it by that pattern" using the surface syntax of the target language. This is done using full parsers so the rewrite rule really operates on language syntax and not text; such rewrite rules obviously won't attempt to modify code-like text in comments, unlike tools based on regexps.
Our DMS Software Reengineering Toolkit is one of these. It provides not only the usual parsing, AST building and prettyprinting (reproducing compilable source code complete with comments), but also supports symbol tables and control and data flow analysis. These are needed for almost any interesting transformations. DMS also has front ends for a variety of dialects of Java as well as many other languages.
Bytecode transformers exist because they are much easier to build; it is pretty easy to "parse" bytecode. Of course, you can't make permanent source changes with a bytecode transformer, so it is lot less useful.

You mean like this?
String script1 = "println(\"OK!\");";
eval( script1 );
script1 += "println(\"... well, maybe NOT OK after all\");";
eval( script2 );
Output:
OK!
OK!
... well, maybe NOT OK after all
... use a scripting extension to Java. Groovy and other things like that would probably allow you to do what you want. I've written a scripting extension which integrates with Java through reflection almost seamlessly myself; contact me if you're interested in the details.

Related

Best choice? Edit bytecode (asm) or edit java file before compiling

Goal
Detecting where comparisons between and copies of variables are made
Inject code near the line where the operation has happened
The purpose of the code: everytime the class is ran make a counter increase
General purpose: count the amount of comparisons and copies made after execution with certain parameters
2 options
Note: I always have a .java file to begin with
1) Edit java file
Find comparisons with regex and inject pieces of code near the line
And then compile the class (My application uses JavaCompiler)
2)Use ASM Bytecode engineering
Also detecting where the events i want to track and inject pieces into the bytecode
And then use the (already compiled but modified) class
My Question
What is the best/cleanest way? Is there a better way to do this?
If you go for the Java route, you don't want to use regexes -- you want a real java parser. So that may influence your decision. Mind, the Oracle JVM includes one, as part of their internal private classes that implement the java compiler, so you don't actually have to write one yourself if you don't want to. But decoding the Oracle AST is not a 5 minute task either. And, of course, using that is not portable if that's important.
If you go the ASM route, the bytecode will initially be easier to analyze, since the semantics are a lot simpler. Whether the simplicity of analyses outweighs the unfamiliarity is unknown in terms of net time to your solution. In the end, in terms of generated code, neither is "better".
There is an apparent simplicity of just looking at generated java source code and "knowing" that What You See Is What You Get vs doing primitive dumps of class files for debugging and etc., but all that apparently simplicity is there because of your already existing comfortability with the Java lanaguage. Once you spend some time dredging through byte code that, too, will become comfortable. Just a question whether it's worth the time to you to get there in the first place.
Generally it all depends how comfortable you are with either option and how critical is performance aspect. The bytecode manipulation will be much faster and somewhat simpler, but you'll have to understand how bytecode works and how to use ASM framework.
Intercepting variable access is probably one of the simplest use cases for ASM. You could find a few more complex scenarios in this AOSD'07 paper.
Here is simplified code for intercepting variable access:
ClassReader cr = ...;
ClassWriter cw = ...;
cr.accept(new MethodVisitor(cw) {
public void visitVarInsn(int opcode, int var) {
if(opcode == ALOAD) { // loading Object var
... insert method call
}
}
});
If it was me i'd probably use the ASM option.
If you need a tutorial on ASM I stumbled upon this user-written tutorial click here

from java to javascript: the object model

I'm trying to port an application I wrote in java to javascript (actually using coffeescript).
Now, I'm feeling lost.. what do you suggest to do to create class properties? Should I use getter/setters? I don't like to do this:
myObj.prop = "hello"
because I could use non existing properties, and it would be easy to mispell something..
How can I get javascript to be a bit more like java, with private, public final properties etc..? Any suggestion?
If you just translate your Java code into JavaScript, you're going to be constantly fighting JavaScript's object model, which is prototype-based, not class-based. There are no private properties on objects, no final properties unless you're using an ES5-compatible engine (you haven't mentioned what your target runtime environment is; browsers aren't use ES5-compatible, it'll be another couple of years), no classes at all in fact.
Instead, I recommend you thoroughly brief yourself on how object orientation actually works in JavaScript, and then build your application fully embracing how JavaScript does it. This is non-trivial, but rewarding.
Some articles that may be of use. I start with closures because really understanding closures is absolutely essential to writing JavaScript, and most "private member" solutions rely on closures. Then I refer to a couple of articles by Douglas Crockford. Crockford is required reading if you're going to work in JavaScript, even if you end up disagreeing with some of his conclusions. Then I point to a couple of articles specifically addressing doing class-like things.
Closures are not complicated - Me
Prototypical inheritance in JavaScript - Crockford
Private Members in JavaScript - Crockford
Simple, Efficient Supercalls in JavaScript - Me Includes syntactic sugar to make it easier to set up hierarchies of objects (it uses class-based terminology, but actually it's just prototypical inheritance), including calling "superclass" methods.
Private Members in JavaScript - Me Listing Crockford's solution and others
Mythical Methods - Me
You must remember this - Me
Addressing some of your specific questions:
what do you suggest to do to create class properties? Should I use getter/setters? I don't like to do this:
myObj.prop = "hello"
because I could use non existing properties, and it would be easy to mispell something..
I don't, I prefer using TDD to ensure that if I do have a typo, it gets revealed in testing. (A good code-completing editor will also be helpful here, though really good JavaScript code-completing editors are thin on the ground.) But you're right that getters and setters in the Java sense (methods like getFoo and setFoo) would make it more obvious when you're creating/accessing a property that you haven't defined in advance (e.g., through a typo) by causing a runtime error, calling a function that doesn't exist. (I say "in the Java sense" because JavaScript as of ES5 has a different kind of "getters" and "setters" that are transparent and wouldn't help with that.) So that's an argument for using them. If you do, you might look at using Google's Closure compiler for release builds, as it will inline them.
How can I get javascript to be a bit more like java, with private...
I've linked Crockford's article on private members, and my own which lists other ways. The very basic explanation of the Crockford model is: You use a variable in the context created by the call to your constructor function and a function created within that context (a closure) that has access to it, rather than an object property:
function Foo() {
var bar;
function Foo_setBar(b) {
bar = b;
}
function Foo_getBar() {
return bar;
}
this.setBar = Foo_setBar;
this.getBar = Foo_getBar;
}
bar is not an object property, but the functions defined in the context with it have an enduring reference to it. This is totally fine if you're going to have a smallish number of Foo objects. If you're going to have thousands of Foo objects you might want to reconsider, because each and every Foo object has its own two functions (really genuinely different Function instances) for Foo_getBar and Foo_setBar.
You'll frequently see the above written like this:
function Foo() {
var bar;
this.setBar = function(b) {
bar = b;
};
this.getBar = function() {
return bar;
};
}
Yes, it's briefer, but now the functions don't have names, and giving your functions names helps your tools help you.
How can I get javascript to be a bit more like java, with...public final properties
You can define a Java-style getter with no setter. Or if your target environment will be ES5-compliant (again, browsers aren't yet, it'll be another couple of years), you could use the new Object.defineProperty feature that allows you to set properties that cannot be written to.
But my main point is to embrace the language and environment in which you're working. Learn it well, and you'll find that different patterns apply than in Java. Both are great languages (I use them both a lot), but they work differently and lead to different solutions.
You can use module pattern to make private properties and public accessors as one more option.
This doesn't directly answer your question, but I would abandon the idea of trying to make the JavaScript app like Java. They really are different languages (despite some similarities in syntax and in their name). As a general statement, it makes sense to adopt the idioms of the target language when porting something.
Currently there are many choices for you , you can check dojo library. In dojo, you can code mostly like java programming
Class
Javascript doesn’t have a Class system like Java,dojo provide dojo.declare to define a functionality to simulate this. Check this page . There are field variable, constructor method, extend from other class.
JavaScript has a feature that constructor functions may return any object (not necesserily this). So, your constructor function could just return a proxy object, that allows access only to the public methods of your class. Using this method you can create real protected member, just like in Java (with inheritance, super() call, etc.)
I created a little library to streamline this method: http://idya.github.com/oolib/
Dojo is one option. I personally prefer Prototype. It also has a framework and API for creating classes and using inheritance in a more "java-ish" way. See the Class.create method in the API. I've used it on multiple webapps I've worked on.
I mainly agree with #Willie Wheeler that you shouldn't try too hard to make your app like Java - there are ways of using JavaScript to create things like private members etc - Douglas Crockford and others have written about this kind of thing.
I'm the author of the CoffeeScript book from PragProg. Right now, I use CoffeeScript as my primary language; I got fluent in JavaScript in the course of learning CoffeeScript. But before that, my best language was Java.
So I know what you're going through. Java has a very strong set of best practices that give you a clear idea of what good code is: clean encapsulation, fine-grained exceptions, thorough JavaDocs, and GOF design patterns all over the place. When you switch to JavaScript, that goes right out the window. There are few "best practices," and more of a vague sense of "this is elegant." Then when you start seeing bugs, it's incredibly frustrating—there are no compile-time errors, and far fewer, less precise runtime errors. It's like playing without a net. And while CoffeeScript adds some syntactic sugar that might look familiar to Java coders (notably classes), it's really no less of a leap.
Here's my advice: Learn to write good CoffeeScript/JavaScript code. Trying to make it look like Java is the path to madness (and believe me, many have tried; see: just about any JS code released by Google). Good JS code is more minimalistic. Don't use get/set methods; use exceptions sparingly; and don't use classes or design patterns for everything. JS is ultimately a more expressive language than Java is, and CoffeeScript even moreso. Once you get used to the feeling of danger that comes with it, you'll like it.
One note: JavaScripters are, by and large, terrible when it comes to testing. There are plenty of good JS testing frameworks out there, but robust testing is much rarer than in the Java world. So in that regard, there's something JavaScripters can learn from Java coders. Using TDD would also be a great way of easing your concerns about how easy it is to make errors that, otherwise, wouldn't get caught until some particular part of your application runs.

How Do I Place Auto-generated Java Classes in a Single .java File?

As everyone knows - public java classes must be placed in their own file named [ClassName].java
( When java class X required to be placed into a file named X.java? )
However, we are auto-generating 50+ java classes, and I'd like to put them all in the same file for our convenience. This would make it substantially easier to generate the file(s), and copy them around when we need to.
Is there any way I can get around this restriction? It seems like more of a stylistic concern - and something I might be able to disable with a compiler flag.
If not, what would you recommend?
Can you put wrapper class around your classes? Something like:
public class Wrapper {
public static class A {...}
public static class B {...}
....
}
Then you can access them via Wrapper.A, Wrapper.B.
At the .class level, this is a requirement per the Java spec. Even the inner classes get broken out into their own class file in the from Outer$Inner.class. I believe the same is true at the language level.
Your best bet is to generate the files and make your copy script smart. Perhaps generate them and zip them up. Usually, if you have to move these files around then either everyone has the same generator script OR you distribute them as a JAR.
Is there any way I can get around this restriction?
You can change your generated source code to make it acceptable; e.g. by using nested classes, by putting the generated classes into their own package.
It seems like more of a stylistic concern - and something I might be able to disable with a compiler flag.
It is not just a stylistic concern:
The one file per class rule is allowed by the Java Language Specification.
It is implemented by all mainstream Java compilers.
It is implemented by all mainstream JVMs in the form of the default classloader behavior.
It is assumed by 3rd party Java tools; e.g. IDEs, style checkers, bug checkers, code generation frameworks, etc.
In short, while it would theoretically be legal to implement a Java ecosystem that didn't have this restriction, it is impractical. No such compiler switch exists, and implementing one would be impractical for the reasons above.
The nested class solution is a good one. Another alternative would be to put the generated classes into a separate package (but with separate file) to make them easier to manage.

Java API to get metadata about Java Source Code

I would like to create some reverse egineered design docs based on Java code (not bytecode), instead of writing my own interpreter, what tools and APIs are available to traverse Java code, using Java code?
Reflection is on bytecode, and is limited to the method level, I want to "objectize" also the method code.
Java doc is ignoring the code itself and only based on comments, automatic UML sequnces are too strict
E.g. an API like this (forgive my ignorance of official Programming Languages Structure terms):
JavaCodeDom jcd = new JavaCodeDom(new File(pathToJavaSource), CompilerEnum.Java16)
List <ClassSrc> classes = jcd.getClasses();
ClassSrc cls = classes.get(0);
Map<MethodSignatureSrc,MethodSrc> methods = cls.getMethodsMap();
MethodSrc main = mothds.get(new MethodSignatureSrc(Modifiers.Public, Modifiers.Static, ReturnTypes.Void, "main", new MethodParams(String[].class))
List<StatementSrc> statements = main.getStatements();
for(StatementSrc statement : statements){
if(statement.getType()==StatementTypes.Assignment()){
AssignmentStatementSrc assignment = (AssignmentStatementSrc)statement;
Identifier src = assignment.getAssigneeVariable();
ExpressinoSrc = assignment.getAssignmentValue();
}
}
List<AnnotationsSrc> annotations = cls.getAnnotations();
There are several such APIs in existence (and delivered with the JDK), some of them build in in the Java Compiler (javac).
The most extensive is the Compiler Tree API, which gets you access to individual expressions (and subexpressions) in the Java source.
The language model API models types and members of types (constructors, methods, fields) - it is used by the compiler tree API and also for annotation processing. It does not give access to the contents of the methods.
Of course, on runtime you have the Reflection API (java.lang.Class and java.lang.reflect.*, together with java.lang.annotation).
To use the compiler tree API, you have to invoke the compiler, with the compiler API.
Additionally, there is the Doclet API for Javadoc, which gives you a similar view like the language model API, but additionally with the documentation comments (and parsed tags).
I once used a combination of Doclet API and Compiler Tree API to format source code beautifully (this is not online, sadly).
BCEL supports reading an manipulating Java class files. (I have not used it myself, but saw it used successfully in a third-party product.)
The Byte Code Engineering Library is intended to give users a convenient
possibility to analyze, create, and manipulate (binary) Java class files
(those ending with .class). Classes are represented by objects which
contain all the symbolic information of the given class: methods,
fields and byte code instructions, in particular.
If you're just interested in decompiling, you might find it sufficient to decompile to source code. Here's a comparison of several options for Java.
I seems ANTLR is one option, but I haven't used it
This seems to answer my question: How to generate AST from Java source-code? ( Spoon )

Rewriting method calls within compiled Java classes

I want to replace calls to a given class with calls to anther class within a method body whilst parsing compiled class files...
or put another way, is there a method of detecting usages of a given class in a method and replacing just that part of the method using something like javaassist.
for example.. if I had the compiled version of
class A { public int m() { int i = 2; B.multiply(i,i); return i; } }
is there a method of detecting the use of B and then altering the code to perform
class A { public int m() { int i = 2; C.divide(i,i); return i; } }
I know the alternative would be to write a parser to grep the source files for usages but I would prefer a more elegant solution such as using reflection to generate new compiled class files.
Any thoughts ?
As #djna says, it is possible to modify bytecode files before you load them, but you probably do not want to do this:
The code that does the code modification is likely to be complex and hard to maintain.
The code that has been modified is likely to be difficult to debug. For a start, a source level debugger will show you source code that no longer corresponds to the code that you are actually editing.
Bytecode rewriting is useful in certain cases. For example, JDO implementations use bytecode rewriting to replace object member fetches with calls into the persistence libraries. However, if you have access to the source code for these files, you'll get a better (i.e. more maintainable) solution by preprocessing (or generating) the source code.
EDIT: and AOP or Groovy sound like viable alternatives too, depending on the extent of rewriting that you anticipate doing.
BCEL or ASM.
I recently looked at a number of libraries for reading Java class files. BCEL was the fastest, had the least number of dependencies, compiled out of the box, and had a deliciously simple API. I preferred BCEL to ASM because ASM has more dependencies (although the API is reputedly simpler).
AspectJ, as previously mentioned, is another viable option.
BCEL is truly simple. You can get a list of methods in three lines of code:
ClassParser cp = new ClassParser( "A.class" );
JavaClass jc = cp.parse();
Method[] m = jc.getMethods();
There are other API facilities for further introspection, including, I believe, ways to get the instructions in a method. However, this solution will likely be more laborious than AspectJ.
Another possibility is to change the multiply or divide methods themselves, rather than trying to change all instances of the code that calls the operation. That would be an easier road to take with BCEL (or ASM).
The format of byte code for compiled Java is specified and products exist that manipulate it.
This library appears to have the capability you need. I've no idea how easy it is to do these transformations reliably.
If you don't mind using Groovy, you can intercept the call to B.multiply and replace it with C.divide. You can find an example here.
It's much easier to perform these operations ahead-of-time, where the executable on disk is modified before launching the application. Manipulating the code in memory at run time is even more prone to errors than manipulating code in memory in C/C++. Why do you need to do this?

Categories