Java collect all used classes from bytecode


I am trying to implement a RemoteClassLoader that copies and loads all classes that will be used at runtime. First I need to collect the used classes. I found a solution:
Find out which classes of a given API are used
but this is not exactly what I need. It collects only the "visible" class usages: it loads the class, iterates over all declared fields and methods, and collects their types.
I have a class that contains only static methods. No instance of this class is ever used, so it is never passed to a function or stored in a field, and therefore I can't see that class.
Naturally, the bytecode file contains the name of this class:
strings TestClass.class | grep -i "json"
gives: org/json/JSONObject
And yes, that is the class I am searching for and cannot find.
How can I find it, along with the other classes that I use only inside method bodies?

The easiest, albeit conservative, method is to simply take all of the CONSTANT_Class_info entries from the constant pool. In order to call a method or access a field of a class, there must be a constant pool entry for that class (not counting reflection and not counting overriding methods in subclasses).
There are a number of tools out there that will parse a classfile and give you access to this. Reflection of course is much harder, and in general undecidable.
Edit: This won't include type descriptors, which are just CONSTANT_Utf8_info entries. If you want to find classes used as types as well, there are two approaches. Either you can go through all the Utf8 entries and include everything that looks like a descriptor (which may produce false positives in rare cases), or you can go through the classfile and find all the type descriptor references.
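As a rough illustration of the first approach, here is a minimal sketch that reads only the constant pool and returns the names referenced by CONSTANT_Class entries. The class and method names (ConstantPoolClasses, referencedClasses) are made up for this example, the tag numbers and entry sizes follow the class file format in the JVM specification, and type descriptors are not covered, as noted above.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of any library.
class ConstantPoolClasses {

    /** Returns the internal names (e.g. org/json/JSONObject) of all CONSTANT_Class entries. */
    static Set<String> referencedClasses(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort(); // minor_version
            in.readUnsignedShort(); // major_version
            int count = in.readUnsignedShort(); // constant_pool_count
            String[] utf8 = new String[count];
            int[] nameIndex = new int[count]; // non-zero only for CONSTANT_Class slots
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                 // Utf8
                    case 7: nameIndex[i] = in.readUnsignedShort(); break;  // Class
                    case 8: case 16: case 19: case 20:                     // String, MethodType, Module, Package
                        in.readUnsignedShort(); break;
                    case 15:                                               // MethodHandle
                        in.readUnsignedByte(); in.readUnsignedShort(); break;
                    case 3: case 4: case 9: case 10: case 11:
                    case 12: case 17: case 18:                             // 4-byte entries
                        in.readInt(); break;
                    case 5: case 6:                                        // Long, Double occupy two slots
                        in.readLong(); i++; break;
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            Set<String> names = new TreeSet<>();
            for (int index : nameIndex) {
                if (index != 0) {
                    names.add(utf8[index]);
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

It could be called with the bytes of a file such as TestClass.class, for example via Files.readAllBytes. Array types show up in descriptor form (for instance [Ljava/lang/String;), so a caller would probably want to unwrap those before treating the result as class names.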

Related

When I use java.lang.instrument.Instrumentation#redefineClasses(), should I pass an array to the function?

I used java.lang.instrument.Instrumentation#redefineClasses() to redefine existing classes. Sometimes, I need to redefine several classes.
If I redefine classes one by one, I will know which ones were successful and which ones failed.
But is it better to put classes redefined in an array together to get more correctness?

"If I redefine class one by one, I will know which is success, which is failed."
True.
"But is it better to put classes redefined in an array together to get more correctness?"
I didn't get what you meant by "more correctness". Anyway, as I understand it, passing a set (array) of classes is particularly helpful when one class depends on another. In that case you can redefine both classes in a single call to this method.
Also, the Javadoc of the Instrumentation interface says:
"This method is used to replace the definition of a class without reference to the existing class file bytes, as one might do when recompiling from source for fix-and-continue debugging."
...
"This method operates on a set in order to allow interdependent changes to more than one class at the same time (a redefinition of class A can require a redefinition of class B)."
But do keep in mind:
"If this method throws an exception, no classes have been redefined."

It is much more performant to instrument classes in a batch compared to passing each class individually.
The JVM needs to halt the application for applying the redefinition which is a rather costly operation so it is worth grouping up.
Also, grouping allows for interdependent changes of classes.
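A minimal sketch of the batched call, assuming you already hold an Instrumentation instance (for example from an agent's premain) and the new bytecode for each class. BatchRedefiner and its method names are hypothetical; ClassDefinition and redefineClasses are the standard java.lang.instrument API.

```java
import java.lang.instrument.ClassDefinition;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.util.Map;

// Hypothetical helper, not part of any library.
class BatchRedefiner {

    /** Pairs each class with its replacement bytecode. */
    static ClassDefinition[] toDefinitions(Map<Class<?>, byte[]> newBytecode) {
        return newBytecode.entrySet().stream()
                .map(e -> new ClassDefinition(e.getKey(), e.getValue()))
                .toArray(ClassDefinition[]::new);
    }

    /**
     * Redefines all classes in one call. If this throws, none of the
     * classes have been redefined, so there is no partial state to undo.
     */
    static void redefineAll(Instrumentation inst, Map<Class<?>, byte[]> newBytecode)
            throws ClassNotFoundException, UnmodifiableClassException {
        inst.redefineClasses(toDefinitions(newBytecode));
    }
}
```

The trade-off from the question still applies: a batch gives all-or-nothing behaviour, so finding out which individual class was rejected means inspecting the exception or retrying the classes one by one.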

Creating a unique class at runtime for use as a key in a HashMap

I understand this is a terrible hack, but I'm required to edit an external library to conform to our project's needs. The project I'm changing stores a map of classes to instances of said classes. The project's original intention is to make it impossible to have duplicate classes, but I require them. My solution is to add a UniqueClass field to each relevant object, and each object also has a reference to the class that I need to create an instance of. In this way, what the UniqueClass is doesn't actually matter at all, only that it's unique.
Now, I need a way to create a unique class at runtime to store in this map. Here are the options I see:
1. Generate and compile the actual .java files at runtime. I've actually implemented this and it works, but it's somewhat slow and requires JDK (doesn't work with JRE since it needs access to certain libraries). I don't want to require JDK configuration since non-devs will likely be using this functionality.
2. Generate a unique anonymous class. This works, but only with the first duplicate. Any additional duplicates are treated as the same as the original anonymous class (ClassBuilder$1). I've read here it's possible to have ClassBuilder$2 etc, but I don't know how to do that.
    Object object = new Object(){};
    return object.getClass();
3. Use a Proxy class. I don't really understand these but it had the same results as anonymous class above, since the javadocs state that if a proxy class already exists, it just returns that one.
    Class proxyClass = Proxy.getProxyClass(inter.class.getClassLoader(), new Class[] { inter.class });
    return proxyClass;
4. (Truly the most terrible way) Create a package of classes that are iterated through as each one is used as a UniqueClass. Ugly code, lots of unnecessary classes, and ultimately a limit on the number of duplicates possible.
Is there an elegant solution to this problem?

I am modifying the source code itself.
In that case create a wrapper object to use as the key in the Map. You can override equals() and hashCode() as necessary to achieve your goals. In essence you are adding one more layer of indirection to your existing multi-map of classes and instances.
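One way the wrapper key could look, as a sketch only: the key pairs the class with a discriminator so that several entries for the same class can coexist in the map. InstanceKey and its slot field are invented for this example and would need adapting to the library's actual map type.

```java
import java.util.Objects;

// Hypothetical key type: (class, slot) instead of class alone.
final class InstanceKey {
    private final Class<?> type;
    private final int slot; // distinguishes duplicates of the same class

    InstanceKey(Class<?> type, int slot) {
        this.type = Objects.requireNonNull(type);
        this.slot = slot;
    }

    Class<?> type() {
        return type;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof InstanceKey)) {
            return false;
        }
        InstanceKey that = (InstanceKey) other;
        return type == that.type && slot == that.slot;
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, slot);
    }
}
```

With a Map<InstanceKey, Object>, new InstanceKey(Foo.class, 0) and new InstanceKey(Foo.class, 1) are distinct keys, so no classes have to be generated at runtime. If the keys never need to be looked up by value, leaving equals() and hashCode() at their identity defaults would give the same uniqueness with less code.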

easy way to find references to other classes in class file

The class file format as described in http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html contains all references to other classes in the constant pool as entries of type CONSTANT_Utf8.
But these entries are not only references to classes but also class literals, names of methods, fields and what not.
In a first attempt I thought it would be sufficient to use the constant pool entries referenced by other constant pool entries of type CONSTANT_Class, CONSTANT_NameAndType and CONSTANT_MethodType.
But these don't seem to include type parameters and annotations. Further reading of the specification seems to suggest that I need to parse things like RuntimeVisibleAnnotations and similar constructs in order to identify the relevant constant pool entries. Which means I have to parse more or less the complete class file.
But the whole idea behind parsing the class file myself was that it would be simpler than using a library like ASM, because I thought it would be sufficient to interpret the constant pool.
My question is: Is there a way to reliably identify all classes referenced in a class file by interpreting little more than the constant pool?

Annotation types that cannot be loaded by a class loader are ignored by this class loader and will simply appear to be invisible at runtime. I assume that this is the reason that types that are referenced by an annotation are not stored in the constant pool where the resolution of an unknown type would prohibit successful class loading. Annotations are code attributes, i.e. meta data and they should not be linked deeply into the class by avoiding a constant pool entry.
You are therefore required to also inspect the RuntimeVisibleAnnotations attribute, which names its types through descriptor strings rather than through CONSTANT_Class entries. However, if the constant pool does not contain the string RuntimeVisibleAnnotations, your approach works. ASM has very little overhead, though, so I would use it nevertheless.

How to compare 2 classes which are loaded from 2 different classloader

Here is my case:
Classloader A loaded a class ("Class1").
Then I changed Class1.java and compiled it.
Next I loaded Class1.class again with classloader B.
I want to compare these two classes and check whether the class metadata was changed by someone.
Is there any way to compare the definition data of two classes?

I am not entirely sure what you mean by "the class meta data" beyond what you can find through the reflection APIs. Here is an attempt to answer the question based on my best guess.
By definition data, do you mean their declared fields and method signatures? You can get those with reflection (getDeclaredMethods() and getDeclaredFields()). However, if the two classes are loaded from different class loaders, they will not be equal (see the Class javadocs on equality), even if they are loaded from the same compiled bytecode.
There is other information you can get from the Reflection APIs, including what class it inherits from, what interfaces it implements, and any Annotations that are compiled in with it (assuming 1.5 or higher of course).
You could also potentially do a hash of the Class files (finding them through the classloader is possible) and see if they are different - that would tell you if they had different code in them.
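A sketch of the hashing idea, assuming the class file is still reachable as a resource through the class's own loader. ClassFileDigest and its methods are hypothetical names; MessageDigest and getResourceAsStream are standard JDK calls.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of any library.
class ClassFileDigest {

    /** Reads the .class resource for a class through the loader that defined it. */
    static byte[] classBytes(Class<?> c) {
        String resource = c.getName().replace('.', '/') + ".class";
        ClassLoader loader = c.getClassLoader();
        try (InputStream in = loader == null
                ? ClassLoader.getSystemResourceAsStream(resource) // bootstrap classes
                : loader.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hex-encoded SHA-256 of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }
}
```

Comparing sha256Hex(classBytes(a)) with sha256Hex(classBytes(b)) only tells you whether the files currently visible to the two loaders differ. If both loaders read the same location and the file was recompiled after the first load, the resource no longer matches what the first loader defined, so in that situation the digest would have to be recorded at load time instead.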
Hope that helps.

thanks!
Reflection can collect a class's metadata, but it is hard to use that to check whether a class has changed.
I can locate the class file, but it is still hard to check whether a class has changed.
I assumed there would be a way to check whether two loaded classes have the same data (come from the same Java file).

How to determine which classes are referenced in a compiled .Net or Java application?

I wonder if there's an easy way to determine which classes from a library are "used" by a compiled .NET or Java application, and I need to write a simple utility to do that (so using any of the available decompilers won't do the job).
I don't need to analyze different inputs to figure out if a class is actually created for this or that input set - I'm only concerned whether or not the class is referenced in the application. Most likely the application would subclass from the class I look for and use the subclass.
I've looked through a bunch of .Net .exe's and Java .classes with a hex editor and it appears that the referenced classes are spelled out in plaintext, but I am not sure if it will always be the case - my knowledge of MSIL/Java bytecode is not enough for that. I assume that even though the application itself can be obfuscated, it'll still have to call the library classes by the original name?

Extending what overslacked said.
EDIT: For some reason I thought you asked about methods, not types.
Types
Like finding methods, this doesn't cover access through the Reflection API.
You have to locate the following in a Reflector plugin to identify referenced types and perform a transitive closure:
Method parameters
Method return types
Custom attributes
Base types and interface implementations
Local variable declarations
Evaluated sub-expression types
Field, property, and event types
If you parse the IL yourself, all you have to process from the main assembly is the TypeRef and TypeSpec metadata, which is pretty easy (of course I'm speaking from having parsed the entire byte code here). However, the transitive closure would still require you to process the full byte code of each referenced method in the referenced assembly (to get the subexpression types).
Methods
If you can write a plugin for Reflector to handle the task, it will definitely be the easiest way. Parsing the IL is non-trivial, though I've done it now so I would just use that code if I had to (just saying it's not impossible). :D
Keep in mind that you may have method dependencies you don't see on the first pass that neither method mentioned will catch. These are due to indirect dispatch via the callvirt (virtual and interface method calls) and calli (generally delegates) instructions. For each type T created with newobj and for each method M within the type, you'll have to check all callvirt, ldftn, and ldvirtftn instructions to see if the base definition for the target (if the target is a virtual method) is the same as the base method definition for M in T or M is in the type's interface map if the target is an interface method. This is not perfect, but it is about the best you can do for static analysis without a theorem prover. It is a superset of the actual methods that will be called outside of the Reflection API, and a subset of the full set of methods in the assembly(ies).

For .NET: it looks like there's an article on MSDN that should help you get started. For what it's worth, for .NET the magic Google words are ".net assembly references".

In Java, the best mechanism to find class dependencies (in a programmatic fashion) is through bytecode inspection. This can be done with libraries like BCEL or (preferably) ASM. If you wish to parse the class files with your own code, the class file structure is well documented in the Java VM specification.
Note that class inspection won't cover runtime dependencies (like classes loaded using the service API).
