Programmatically inspect .class files

I'm working on a project where we're doing a lot of remote object transfer between a Java service and clients written in various other languages. Given our current constraints, I've decided to see what it would take to generate code based on an existing Java class. Basically, I need to take a .class file (or a collection of them), parse the bytecode to determine all of the data members and perhaps the getters/setters, and then write something that can output code in a different language to create a class with the same structure.
I'm not looking for standard decompilers such as JAD. I need to be able to take a .class file and create an object model of its data members and methods. Is this possible at all?

I've used BCEL and find it really quite awkward. ASM is much better. It very extensively uses visitors (which can be a little confusing) and does not create an object model. Not creating an object model turns out to be a bonus, as any model you do want to create is unlikely to look like a literal interpretation of all the data.

I have used BCEL in the past and it was pretty easy to use. It was a few years ago so there may be something better now.
Apache Jakarta BCEL

From your description, it sounds like simple reflection would suffice. You can discover all of the static structure of the class, as well as accessing the fields of a particular instance.
I would only move on to BCEL if you are trying to translate method instructions. (And if that's what you're trying to automate, good luck!)
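As a rough sketch of the reflection approach, the following lists the instance fields and the getter/setter-style methods of a class. The Person class and the helper names are hypothetical, made up for illustration, and the class to inspect is assumed to be loadable (for example, on the classpath or via a URLClassLoader).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ClassStructureDump {

    /** Hypothetical sample class standing in for one of the transferred types. */
    static class Person {
        private String name;
        private int age;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
    }

    /** Returns one "Type name" entry per declared instance field. */
    static List<String> describeFields(Class<?> type) {
        List<String> lines = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            // Skip static and compiler-generated fields; they are not data members.
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            lines.add(f.getType().getSimpleName() + " " + f.getName());
        }
        return lines;
    }

    /** Returns the sorted names of declared methods shaped like getters or setters. */
    static List<String> describeAccessors(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            String n = m.getName();
            boolean getter = n.startsWith("get")
                    && m.getParameterCount() == 0
                    && m.getReturnType() != void.class;
            boolean setter = n.startsWith("set") && m.getParameterCount() == 1;
            if (getter || setter) {
                names.add(n);
            }
        }
        // getDeclaredMethods() returns methods in no particular order, so sort.
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(describeFields(Person.class));
        System.out.println(describeAccessors(Person.class));
    }
}
```

From these two lists, an emitter for the target language only needs a mapping from Java type names to its own types.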

I'm shocked that no one has mentioned ASM yet. It's the best bytecode library your money can buy. Well, ok it's free.

JAD is a Java decompiler that doesn't allow programmatic access. It isn't readily available anymore, and probably won't work for newer projects that use Java 7 bytecode.
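To check which Java version a class file targets before picking a tool, the header can be read directly: a class file starts with the 4-byte magic number 0xCAFEBABE, followed by a 2-byte minor version and a 2-byte major version (51 corresponds to Java 7). The sketch below uses only the standard library; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {

    /** Returns the major_version stored in the first 8 bytes of a class file. */
    static int majorVersion(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file: bad magic number");
            }
            in.readUnsignedShort();        // minor_version, ignored here
            return in.readUnsignedShort(); // major_version
        } catch (IOException e) {
            // Includes EOFException for input shorter than 8 bytes.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the runtime exposes Object.class as a resource; if not, report and stop.
        try (InputStream in = Object.class.getResourceAsStream("Object.class")) {
            if (in == null) {
                System.out.println("class bytes not available");
                return;
            }
            byte[] header = new byte[8];
            new DataInputStream(in).readFully(header);
            System.out.println("java.lang.Object major version: " + majorVersion(header));
        }
    }
}
```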

I think javassist might help you too.
http://www.jboss.org/javassist/
I have never needed to use it myself, but if you give it a try, would you let us know what you think of it?
Although I think it is more for bytecode manipulation than .class inspection.

Related

How to Serialize classes, then read them with a modified version of that same class in Java

I am developing a Minecraft plugin which uses a class that I made called customPlayer. When I save the plugin data from a running instance, I put all of these objects into a HashMap<String,customPlayer> and save them with ObjectOutputStream. Loading these classes back into the same version of the plugin works great, but my problem arises when I modify the class and try to read the object using that modified class (usually associated with a new version of my plugin).
I thought about it for a bit, and thought I came up with a clever solution. My idea was to just include the old class files as an External Library inside the new version of the plugin, cross my fingers and hope it worked. It didn't.
Is there a better way to do this? I'm new to serialization and this kind of stuff, so any suggestions would be greatly appreciated. Below I will include a few screenshots of the customPlayer class and the crash log of the server. Ideally, any solution should be easy to reuse with future modifications to the class (updates to the jar downloaded via a GitHub repo).
Instance Variables and Constructor of customPlayer.java

Is there a better way to do this?
There certainly is. Stop using Serialization and ObjectOutputStream. These classes are a disaster (even OpenJDK core team effectively agrees with this assessment). The output they generate is not particularly efficient (it's more bytes than is needed), it is not human readable, nor (easily) read by anything except java code, and it results in such hairy situations as you ran into.
Instead, use e.g. Jackson to turn your objects into JSON, or use Google's protobuf to turn them into efficient binary blobs.
You can read this JSON or these binary blobs in any language you want and you'll have your pick of the litter as far as libraries go. You will need to write some explicit code to 'save' an object (turn it into JSON / protobuf), and to 'read' one, but now you are free to change your code.
If you insist on continuing with serialization, you need to add a field named serialVersionUID, and set up readObject and writeObject. It's convoluted rocket science that's hard to get right. The details are in the javadoc of java.io.Serializable.
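For reference, a minimal sketch of that setup might look like the following. PlayerData is a hypothetical stand-in for customPlayer, and the rank field plays the role of a field added in a later plugin version. With serialVersionUID pinned, a stream written before the field existed deserializes with rank left as null, and readObject can then fill in a default. The round trip below simulates that case by saving a null rank; it does not read a genuinely old stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class VersionedPlayerDemo {

    /** Hypothetical stand-in for the question's customPlayer class. */
    static class PlayerData implements Serializable {
        // Pinned UID: without it, the JVM derives one from the class shape,
        // so almost any edit to the class makes old streams unreadable.
        private static final long serialVersionUID = 1L;

        private String name;
        private int score;
        private String rank; // imagined as added in a later version

        PlayerData(String name, int score, String rank) {
            this.name = name;
            this.score = score;
            this.rank = rank;
        }

        String name() { return name; }
        int score() { return score; }
        String rank() { return rank; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            // Fields missing from the stream keep their default (null) value.
            if (rank == null) {
                rank = "newcomer";
            }
        }
    }

    /** Serializes and deserializes in memory; wraps checked exceptions for brevity. */
    static PlayerData roundTrip(PlayerData player) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(player);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PlayerData) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PlayerData copy = roundTrip(new PlayerData("Steve", 42, null));
        System.out.println(copy.name() + " " + copy.score() + " " + copy.rank());
    }
}
```

This only covers compatible changes such as adding a field; renaming the class, moving it to another package, or changing a field's type still breaks old data.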
Do yourself a favour though. Don't do it.

How do classes in the Java standard API interact with the OS?

I've been trying to find an answer to this for some time, but I think part of my problem is that I don't really know how to phrase my question. I understand that the JVM ultimately performs all the system calls that the Java program needs at run time; my confusion is about how Java classes tell the JVM to do this.
Take, for example, the File class in the standard Java library. As far as I know, this is considered the most fundamental API for opening/creating files in Java. But File is just another class, right? So in theory I should be able to write my own File class from scratch which doesn't utilize the pre-existing one, right? How would I do that? What is happening inside the File class that tells the VM to actually create the file? I looked at the source code for the File class, and it looks like it calls another class called VMFile, but I could find no explanation of what VMFile is. When I looked at the VMFile source code, it just had function declarations with no definitions.
Thank you for your help.

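The Java Native Interface (JNI) is the glue between Java classes and the OS. Methods implemented outside Java are declared with the 'native' modifier and have no body (look it up in the JLS); that is why you saw declarations without definitions.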
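One way to see this for yourself is to ask reflection which methods of a class carry the 'native' modifier. The sketch below is illustrative only; the class and method names are hypothetical, and it assumes an OpenJDK-style runtime, where System.currentTimeMillis and String.intern are declared native.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class NativeMethodLister {

    /** Returns the sorted, de-duplicated names of methods declared native in the given class. */
    static List<String> nativeMethodNames(Class<?> type) {
        TreeSet<String> names = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            // A native method has a Java declaration but its body lives in
            // platform code that the JVM links in (for example via JNI).
            if (Modifier.isNative(m.getModifiers())) {
                names.add(m.getName());
            }
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        System.out.println("System: " + nativeMethodNames(System.class));
        System.out.println("String: " + nativeMethodNames(String.class));
    }
}
```

Writing your own File class from scratch would mean declaring such native methods yourself and supplying their implementation in a native library loaded with System.loadLibrary.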

How to analyze method calls and objects of other classes used in a java class programmatically

I need to detect programmatically whether a class relies on another class, in order to detect the "inappropriate intimacy" code smell (I want to analyze other Java programs using my program). Any directions on how to achieve this would be a great help.
And
How to identify all the objects created in a java program?
How to identify all the called methods in a java program?
Any help would be appreciated.

You might want to use what's already there instead of building something yourself. Especially if you're not very familiar with the internals of Java and the JVM.
Have a look at JDepend: http://clarkware.com/software/JDepend.html

Use a profiler such as JConsole or VisualVM. With a profiler you can see pretty much everything that happens at runtime.

One way I can think of is to use a logger: put log statements in the constructors and in the methods you want to monitor. The logs will then show which objects were created and which methods were called.

I have found the ObjectWeb ASM Java bytecode manipulation and analysis library, distributed as asm-all.jar, very useful.
It allows you to convert any *.jar application into an equivalent XML file. You can fully inspect the application structure, change it in the XML format, and convert it back into a *.jar file.
To use the XML files you'll need to understand what they contain. Oracle's The Java® Virtual Machine Specification is a very good reference to start with.
BTW: one thing you can do with this tool is instrument the bytecode so that it creates runtime profiling information, i.e. which methods were called and by whom (as suggested by @upog).

Is there a Java library to generate class files from an AST?

This page describes how I can use the code generator in javac to generate code given that I can build an AST (using a separate parser which I wrote). The technique involves editing javac's source code to basically bypass the Java parser, so that one could supply his/her own AST to the code generator. This could work, but I was hoping to do it in a slightly cleaner way. I want to include the code generating part of javac as a library in my project so I can use it to generate code, without bringing with it the rest of javac's source.
Is there a way to do this with javac, or is there perhaps a better library?
Also, feel free to change the question's title. I couldn't think of a better one, but it's a little ambiguous. If you suggest an edit for a better title, I'll accept it.

I think what you might be interested in is a Java library like BCEL (the Byte Code Engineering Library).
I played around with it back when I took a class on compiler construction, basically, it has a nice wrapper for generating the constant pool, inserting named bytecode instructions into a method and whatnot, then when you are done, you can either load the class at runtime with a custom classloader, or write it out to a file in the normal way.
With BCEL, it should be relatively easy to go from the syntax tree to the java bytecodes, albeit a bit tedious, but you may want to just use BCEL to generate the raw bytecode without building the tree as well in some cases.

Another cool framework is ASM, a bytecode analysis and manipulation framework.
In case you do not want to use a framework, as of now (2014), it is not possible to generate bytecode from a tree using the arbitrary representations of com.sun.source.tree.* as said here.

Automatically generating Java source code

I'm looking for a way to automatically generate source code for new methods within an existing Java source code file, based on the fields defined within the class.
In essence, I'm looking to execute the following steps:
Read and parse SomeClass.java
Iterate through all fields defined in the source code
Add source code method someMethod()
Save SomeClass.java (Ideally, preserving the formatting of the existing code)
What tools and techniques are best suited to accomplish this?
EDIT
I don't want to generate code at runtime; I want to augment existing Java source code

What you want is a Program Transformation system.
Good ones have parsers for the language you care about, build ASTs representing the program for the parsed code, provide you with access to the AST for analysis and modification, and can regenerate source text from the AST. Your remark about "scanning the fields" is just a kind of traversal of the AST representing the program. For each interesting analysis result you produce, you want to make a change to the AST, perhaps somewhere else, but nonetheless in the AST.
And after all the changes are made, you want to regenerate text with comments (as originally entered, or as you have constructed in your new code).
There are several tools that do this specifically for Java.
Jackpot provides a parser, builds ASTs, and lets you code Java procedures to do what you want with the trees. Upside: easy conceptually. Downside: you write a lot more Java code to climb around/hack at trees than you'd expect. Jackpot only works with Java.
Stratego and TXL parse your code, build ASTs, and let you write "source-to-source" transformations (using the syntax of the target language, e.g., Java in this case) to express patterns and fixes. Additional good news: you can define any programming language you like as the target language to be processed, and both of these have Java definitions.
But they are weak on analysis: often you need symbol tables and data flow analysis to really make the analyses and changes you need. And they insist that everything is a rewrite rule, whether that helps you or not; this is a little like insisting you only need a hammer in your toolbox; after all, everything can be treated like a nail, right?
Our DMS Software Reengineering Toolkit allows the definition of an arbitrary target language (and has many predefined languages including Java), includes all the source-to-source transformation capabilities of Stratego and TXL and the procedural capability of Jackpot, and additionally provides symbol tables and control and data flow analysis information. The compiler guys taught us these things were necessary to build strong compilers (= "analysis + optimizations + refinement") and it is true of code generation systems too, for exactly the same reasons. Using this approach you can generate code and optimize it to the extent you have the knowledge to do so. One example, similar to your serialization ideas, is to generate fast XML readers and writers for specified XML DTDs; we've done that with DMS for Java and COBOL.
DMS has been used to read/modify/write many kinds of source files. A nice example that will make the ideas clear can be found in this technical paper, which shows how to modify code to insert instrumentation probes: Branch Coverage Made Easy.
A simpler, but more complete, example of defining an arbitrary language and transformations to apply to it can be found at How to transform Algebra using the same ideas.

Have a look at Java Emitter Templates. They allow you to create Java source files by using a markup language. It is similar to how you can use a scripting language to spit out HTML, except that you spit out compilable source code. The syntax for JET is very similar to JSP and so isn't too tricky to pick up. However, this may be overkill for what you're trying to accomplish. Here are some resources if you decide to go down that path:
http://www.eclipse.org/articles/Article-JET/jet_tutorial1.html
http://www.ibm.com/developerworks/library/os-ecemf2
http://www.vogella.de/articles/EclipseJET/article.html

Modifying the same Java source file with auto-generated code is a maintenance nightmare. Consider generating a new class that extends your current class and adds the desired method. Use reflection to read the user-defined class, and create Velocity templates for the auto-generated classes. Then, for each user-defined class, generate its extending class. Integrate the code generation phase into your build lifecycle.
Or you may use 'bytecode enhancement' techniques to enhance the classes without having to modify the source code.
Updates:
Mixing auto-generated code with hand-written code always poses the risk that someone will modify the generated part in the future just to tweak a small behavior. Those changes will be lost as soon as the next build runs.
You will have to rely solely on the comments at the top of the auto-generated source to prevent developers from doing so.
Version control: let's say you update the template of someMethod(). Now the version of every source file changes, even though the updates are auto-generated, and you will see redundant history.
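The generate-a-subclass idea can be sketched without any template engine. The example below swaps Velocity for plain string building to stay self-contained; Account, the generated class name, and the body of someMethod() are all hypothetical. It assumes the generated class lives in the same package as the base class and only touches fields a subclass can see.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class SubclassGenerator {

    /** Hypothetical hand-written class that stays under version control. */
    static class Account {
        protected String owner;
        protected long balance;
    }

    /** Builds Java source for a subclass whose someMethod() prints the inherited fields. */
    static String generate(Class<?> base, String generatedName) {
        String baseName = base.getSimpleName();
        StringBuilder src = new StringBuilder();
        src.append("// GENERATED FILE - do not edit; it is rewritten on every build\n");
        src.append("public class ").append(generatedName)
           .append(" extends ").append(baseName).append(" {\n");
        src.append("    public String someMethod() {\n");
        src.append("        return \"").append(baseName).append("{\"");
        String separator = "";
        for (Field f : base.getDeclaredFields()) {
            int mods = f.getModifiers();
            // Private fields are invisible to a subclass; static ones are not instance state.
            if (Modifier.isStatic(mods) || Modifier.isPrivate(mods) || f.isSynthetic()) {
                continue;
            }
            src.append("\n            + \"").append(separator).append(f.getName())
               .append("=\" + ").append(f.getName());
            separator = ", ";
        }
        src.append("\n            + \"}\";\n");
        src.append("    }\n");
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        // A build step would write this text to a generated-sources directory.
        System.out.print(generate(Account.class, "AccountGenerated"));
    }
}
```

Because the output goes to a separate file, the hand-written Account source is never touched and the generated file can be left out of version control.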

You can use cglib to generate code at runtime.

Iterating through the fields and defining someMethod is a pretty vague problem statement, so it's hard to give you a very useful answer, but Eclipse's refactoring support provides some excellent tools. It'll give you constructors which initialize a selected set of the defined members, and it'll also define a toString method for you.
I don't know what other someMethod()'s you'd want to consider, but there's a start for you.

I'd be very wary of injecting generated code into files containing hand-written code. Hand-written code should be checked into revision control, but generated code should not be; the code generation should be done as part of the build process. You'd have to structure your build process so that for each file you make a temporary copy, inject the generated source code into it, and compile the result, without touching the original source file that the developers work on.

Antlr is really a great tool that can be used very easily for transforming Java source code to Java source code.

