easy way to find references to other classes in class file


The class file format, as described in http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html, contains all references to other classes in the constant pool as entries of type CONSTANT_Utf8.
But these entries are not only references to classes but also class literals, names of methods and fields, and whatnot.
In a first attempt I thought it would be sufficient to use the constant pool entries referenced by other constant pool entries of type CONSTANT_Class, CONSTANT_NameAndType and CONSTANT_MethodType.
But these don't seem to include type parameters and annotations. Further reading of the specification seems to suggest that I need to parse things like RuntimeVisibleAnnotations and similar constructs in order to identify the relevant constant pool entries, which means I have to parse more or less the complete class file.
But the whole idea behind parsing the class file myself was that it would be simpler than using a library like ASM, because I thought it would be sufficient to interpret the constant pool.
My question is: Is there a way to reliably identify all classes referenced in a class file by interpreting little more than the constant pool?

Annotation types that cannot be loaded by a class loader are ignored by this class loader and simply appear to be invisible at runtime. I assume this is why types referenced by an annotation are not stored as class entries in the constant pool, where the resolution of an unknown type would prohibit successful class loading. Annotations are class file attributes, i.e. metadata, and avoiding a constant pool class entry keeps them from being linked deeply into the class.
You are therefore required to also introspect RuntimeVisibleAnnotations, which lives outside of the constant pool. However, if the constant pool does not contain the string RuntimeVisibleAnnotations, your approach works. ASM has very little overhead, though, so I would use it nevertheless.
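As a rough sketch of that check (not a complete class file parser), the following reads only the constant pool, collects the names behind CONSTANT_Class entries, and reports whether any annotation attribute name occurs among the CONSTANT_Utf8 entries. The class name ConstantPoolScan and its methods are made up for illustration; the tag numbers and entry sizes follow JVMS §4.4. The check is conservative: a string constant that happens to equal an attribute name also triggers it.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: reads nothing but the constant pool of a class file.
final class ConstantPoolScan {
    private static final List<String> ANNOTATION_ATTRIBUTES = Arrays.asList(
            "RuntimeVisibleAnnotations", "RuntimeInvisibleAnnotations",
            "RuntimeVisibleParameterAnnotations", "RuntimeInvisibleParameterAnnotations",
            "RuntimeVisibleTypeAnnotations", "RuntimeInvisibleTypeAnnotations",
            "AnnotationDefault");

    final Set<String> classNames = new TreeSet<>();  // names behind CONSTANT_Class entries
    final Set<String> utf8Strings = new HashSet<>(); // every CONSTANT_Utf8 entry

    static ConstantPoolScan of(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort();             // minor_version
        data.readUnsignedShort();             // major_version
        int count = data.readUnsignedShort(); // constant_pool_count
        String[] utf8 = new String[count];
        int[] classNameIndex = new int[count];
        for (int i = 1; i < count; i++) {
            int tag = data.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = data.readUTF(); break;                      // Utf8
                case 7: classNameIndex[i] = data.readUnsignedShort(); break;  // Class
                case 5: case 6: data.readFully(new byte[8]); i++; break;      // Long/Double use two slots
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    data.readFully(new byte[4]); break;
                case 15: data.readFully(new byte[3]); break;                  // MethodHandle
                case 8: case 16: case 19: case 20:
                    data.readFully(new byte[2]); break;
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }
        ConstantPoolScan scan = new ConstantPoolScan();
        for (int i = 1; i < count; i++) {
            if (utf8[i] != null) scan.utf8Strings.add(utf8[i]);
            if (classNameIndex[i] != 0) scan.classNames.add(utf8[classNameIndex[i]]);
        }
        return scan;
    }

    // Convenience wrapper for resources such as "java/lang/Deprecated.class".
    static ConstantPoolScan ofSystemResource(String resourceName) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resourceName)) {
            if (in == null) throw new IllegalArgumentException("not found: " + resourceName);
            return of(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if any annotation-related attribute name occurs as a Utf8 entry.
    boolean mentionsAnnotations() {
        for (String name : ANNOTATION_ATTRIBUTES) {
            if (utf8Strings.contains(name)) return true;
        }
        return false;
    }
}
```

If mentionsAnnotations() returns false, classNames (plus any descriptors you decide to scan) is all there is to find; otherwise the annotation attributes have to be parsed as well. Note that array classes show up in classNames in descriptor form, e.g. [Ljava/lang/String;.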

Related

It appears that retransformClasses removes user defined attributes(?) How can I add notes to a method that are preserved through retransformation?

As part of our instrumentation tool suite, we have a static prepass that modifies some methods of a class and then marks those methods with a user-defined attribute. When the application is run, if the class file is presented directly to the transform() method, i.e. it is the first load of the class, I can see these attributes. But if I use retransformClasses(), then by the time I get control in the transform() method my attributes have been deleted. I can see why the JVM might discard unknown attributes when recreating the class bytes to pass to transform(), but I cannot find any documentation verifying and/or describing this behavior.
How can I accomplish this goal? I can see no guarantee that the same does not happen for RuntimeVisible annotations. And even if they are preserved, they are so much more difficult to work with than attributes I would like to avoid that approach.
Any ideas on how to add 'notes' to a method that are preserved through retransformClasses()?
Thanks for any suggestions.

Once a class file is loaded, HotSpot JVM does not preserve the original bytecode. Instead, it reconstitutes the bytecode from the internal VM representation when needed. The attributes that the VM does not understand are not restored.
The documentation of retransformClasses explicitly mentions this possibility:
The initial class file bytes represent the bytes passed to
ClassLoader.defineClass or redefineClasses (before any transformations
were applied), however they might not exactly match them. The constant
pool might not have the same layout or contents. The constant pool may
have more or fewer entries. Constant pool entries may be in a
different order; however, constant pool indices in the bytecodes of
methods will correspond. Some attributes may not be present. Where
order is not meaningful, for example the order of methods, order might
not be preserved.
In contrast, the RuntimeVisibleAnnotations attribute is understood by the JVM. Moreover, there is a Java API to access annotations, therefore the JVM cannot throw them away during transformation. HotSpot JVM indeed writes RuntimeVisibleAnnotations when reconstituting the bytecode.
So, your best bet is to use annotations - after all, they are designed exactly for marking members with user-defined metadata.
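A minimal sketch of such a marker, assuming a runtime-retained annotation is acceptable for your tool. The names Instrumented, Worker and InstrumentedNotes are invented for this example, and the annotation is written in source here only for brevity; a static prepass would add it at the bytecode level. The reflective lookup merely shows that the marker is visible to the JVM; it does not exercise retransformClasses itself.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical marker that a prepass could attach to the methods it modified.
// RUNTIME retention is what places it in RuntimeVisibleAnnotations.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Instrumented {
    String note() default "";
}

final class Worker {
    @Instrumented(note = "prepass-v1")
    void modified() { }

    void untouched() { }
}

final class InstrumentedNotes {
    // Returns the note stored on the method, or null if the method is not marked.
    static String noteOf(Class<?> type, String methodName) {
        try {
            Method method = type.getDeclaredMethod(methodName);
            Instrumented marker = method.getAnnotation(Instrumented.class);
            return marker == null ? null : marker.note();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

The note element plays the role of the payload you would otherwise have stored in a custom attribute.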

Mapping of Constant pool and method Area

I am trying to understand how the class file is loaded into the method area and executed. I am very confused about the constant pool.
When is the constant pool initially created: while compiling the class file, or when the class is loaded?
How is the bytecode organized in the method area? What does the method table consist of?
Can anyone sketch a picture of the mapping in the method area for a clear understanding?

Since the literal meaning of “constant pool” is just “pool of constants”, there are different things with that name, which are easy to confuse.
Each class file has a constant pool describing all constants used in that class, which includes constant values but also symbolic references needed for linkage. Some entries fulfill both roles, e.g. class entries may serve as owner declaration for a symbolic reference to a member, needed when accessing a field or invoking a method, but may also be used to get a Class instance, e.g. for a class literal appearing in source code. Since it’s part of the class file, its format is specified within The Java® Virtual Machine Specification, §4 The class File Format, in §4.4. The Constant Pool.
As said by other answers, you can use the command javap -v class.name to inspect the constant pool of a class.
There is a corresponding data structure at runtime, also known as the run-time constant pool. Since certain values are represented as runtime objects (e.g. of type String, Class, MethodType, or MethodHandle), and symbolic references must be resolved to the runtime representation of the denoted classes and members, this structure is not the same as the byte sequence found in the class file. But the entries correspond, so that each time an object is instantiated for a constant or a symbolic reference is resolved, the result can be remembered and reused the next time the same constant entry is accessed.
This doesn’t imply that an implementation must have a 1:1 representation of each class’ constant pool. It’s possible that a specific implementation maps a class’ pool to a shared pool used for all classes of the same class loading context, where each symbolic reference resolves to the same target.
There’s also the string pool, which can be seen as part of the runtime constant pool, holding references to all String instances associated with string constants, to allow resolving all identical string constants of all classes to the same String instance.
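The effect of the string pool can be observed with reference comparisons. This is only an illustrative sketch; the class names Greeting, Farewell and StringPoolDemo are made up, and == is used deliberately to compare identity rather than content.

```java
// Two unrelated classes, each with its own constant pool entry for "hello".
final class Greeting {
    static String text() { return "hello"; }
}

final class Farewell {
    static String text() { return "hello"; }
}

final class StringPoolDemo {
    // Identical string constants of different classes resolve to the same String instance.
    static boolean sameLiteralInstance() {
        return Greeting.text() == Farewell.text();
    }

    // A String created with "new" is a distinct object, even with equal content.
    static boolean freshCopyIsDistinct() {
        return new String("hello") != Greeting.text();
    }

    // intern() maps the copy back to the pooled instance.
    static boolean internedCopyIsShared() {
        return new String("hello").intern() == Greeting.text();
    }
}
```

All three methods return true, which is the behavior the Java Language Specification requires for string literals.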

When a Java file is compiled, all references to variables and methods are stored in the class's constant pool as symbolic references.
Here is a link for your reference: What is the purpose of the Java Constant Pool?

javac creates the constant pool when it compiles your source to a .class file. You can see it by running
javap -v MyClass
on your MyClass.class.
The Java Virtual Machine has a method area that is shared among all Java Virtual Machine threads.
You can see the bytecode of your class file with
javap -c -v Main
The method area is logically a part of the heap, where the JVM keeps all information about the class.

Java collect all used classes from bytecode

I am trying to implement a RemoteClassLoader which copies and loads all classes that will be used at runtime. First I need to collect the used classes. I found a solution:
Find out which classes of a given API are used
but this is not exactly what I need: it collects only the "visible" class usages, just like loading the class, iterating over all declared fields and methods, and collecting all types.
I have a class which contains only static methods. An instance of this class is never used, so it is never passed to a function or stored in a field, and so I can't see that class.
Naturally the bytecode file contains the name of this class:
strings TestClass.class | grep -i "json"
gives: org/json/JSONObject
And yes, that is the class I am searching for and cannot find.
How can I find it? And the others which I use only inside functions?

The easiest, albeit conservative, method is to simply take all of the Class_info entries from the constant pool. In order to call a method or access a field of a class, there must be a constant pool entry for that class (not counting reflection and not counting overriding methods in subclasses).
There are a number of tools out there that will parse a classfile and give you access to this. Reflection of course is much harder, and in general undecidable.
Edit: This won't include type descriptors, which are just Utf8_infos. If you want to find classes used as types as well, there are two approaches. Either you can go through all the Utf8s and include everything that looks like a descriptor (which may have false positives in rare cases), or you can go through the classfile and find all the type descriptor references.
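The first of these two approaches could look roughly like this: a pattern that pulls class names out of any string that looks like a descriptor or generic signature. DescriptorClassNames is a hypothetical helper, the pattern only covers names made of ASCII letters, digits, _, $ and /, it misses inner-class suffixes in generic signatures, and, as said above, it can produce false positives on ordinary string constants.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: heuristic extraction of class names from Utf8 constants.
final class DescriptorClassNames {
    // 'L', an internal class name, then ';' (descriptor) or '<' (generic signature).
    private static final Pattern OBJECT_TYPE = Pattern.compile("L([\\w$/]+)[;<]");

    // Returns the internal names (slash-separated) found in the given string, in order.
    static List<String> classNamesIn(String utf8) {
        List<String> names = new ArrayList<>();
        Matcher matcher = OBJECT_TYPE.matcher(utf8);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }
}
```

For the method descriptor (Ljava/lang/String;[Lorg/json/JSONObject;I)V this yields java/lang/String and org/json/JSONObject. Feeding it every Utf8 entry of the constant pool and merging the result with the Class_info names gives the conservative superset.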

Serialization - What is the advantage of using ObjectStreamField [] serialPersistentFields?

For a class that implements the Serializable interface there are 2 ways to define which specific fields get streamed during serialization:
By default, all non-static, non-transient fields that implement Serializable are preserved.
By defining ObjectStreamField[] serialPersistentFields and explicitly declaring the specific fields to be saved.
I wonder, what is the advantage of the second method except for the ability to define specific fields order?

The 'advantage' is that it does what it says in the Javadoc: defines which fields are serialized. Without it, all non-transient non-static fields are serialized. Your choice.

The advantage is you can conditionally populate ObjectStreamField at runtime albeit only once per JVM lifecycle to determine which fields should be serialized.
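// the serialization mechanism only recognizes the field under this exact name
private static final ObjectStreamField[] serialPersistentFields;
static {
    // code to init serialPersistentFields
}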

Luckily, I'm actually writing this up right now.... Besides the advantages mentioned (and I don't know much about unshared), writing your own output format seems to have the following advantages:
Allows conditional output (different uses for serialization, such as persistence and copying, can serialize different parts of the object).
Should be faster, use less memory, and in some cases use less disk than the default mechanism (this is from Bloch's Effective Java 2).
Allows you to rename variables in a serialized class while maintaining backwards-compatibility.
Allows you to access data from deleted fields in a new version (in other words, change the internal representation of your data while maintaining backwards-compatibility).
I've seen the documentation you're quoting, and mentioning just those 2 options is a bit misleading and leaves quite a bit out: you can customize your serialization format in 2 ways, by using the ObjectOutput/InputStream interface to write and read fields in a particular order (described in Bloch), and using the PutField and GetField classes to write and read fields by name. You can use serialPersistentFields as your quote mentions to extend this second method, but it's not required unless you need to read or write data with a name which is not a member variable name.
There's a 3rd way to control the format as well, using the Externalizable interface, though I haven't explored that much. And some of the advantages can also be gotten through Serialization Proxies (see Bloch).
Anyone feel free to correct me on details if I missed anything.

In serialPersistentFields you can specify fields that are not necessarily present in the class anymore.
See for example the jdk class java.math.BigInteger, where several fields are read and written which don't exist anymore in the class. These obsolete fields are still read and written for compatibility with older versions. The reading and writing of these fields is handled by the readObject() and writeObject() methods.
See also
http://docs.oracle.com/javase/7/docs/platform/serialization/spec/serial-arch.html#6250
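To illustrate the idea on a small scale, here is a sketch of a class whose serialized form keeps a field name that no longer exists in the class. Person, its fields and the roundTrip helper are invented for this example and are not taken from BigInteger; only serialPersistentFields, PutField and GetField are the standard API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;

final class Person implements Serializable {
    private static final long serialVersionUID = 1L;

    // The serialized form still uses the old field name "name",
    // although the class now keeps the value in "fullName".
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("name", String.class),
        new ObjectStreamField("age", int.class)
    };

    private String fullName;
    private int age;

    Person(String fullName, int age) {
        this.fullName = fullName;
        this.age = age;
    }

    String fullName() { return fullName; }
    int age() { return age; }

    // Writes the fields by their serialized names rather than by reflection.
    private void writeObject(ObjectOutputStream out) throws IOException {
        ObjectOutputStream.PutField fields = out.putFields();
        fields.put("name", fullName);
        fields.put("age", age);
        out.writeFields();
    }

    // Reads the serialized names back and maps them onto the current fields.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        ObjectInputStream.GetField fields = in.readFields();
        fullName = (String) fields.get("name", (Object) null);
        age = fields.get("age", 0);
    }

    // Serializes to a byte array and deserializes again (demo helper).
    static Person roundTrip(Person original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Person) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Renaming fullName again later would not change the stream format, as long as writeObject and readObject keep mapping it to "name".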

Storing property value names as String constants - performance and memory usage?

I use around 1000 properties associated with a specific java.util.Properties which is backed by a file. The main reason for the file is to change them without recompiling the program and to allow users to adjust them according to their taste.
Some of the properties are used in only one place in the code, but there are some properties that are used several times in different segments of code and even in different classes.
I have recently got into the habit of declaring all those properties as String constants, usually in a separate interface like this:
public interface AnimalConstants {
    public static final String WEIGHT_PROPERTY = "weight";
    public static final String RUNNING_SPEED_PROPERTY = "speedInKph";
    public static final String HOURS_OF_SLEEP_A_DAY_PROPERTY = "sleepHrs";
    ...
}
When a class needs to access some of the animal properties, I just implement this interface and have access to all the property constants declared there. When I need a specific property I just use the corresponding constant, without thinking about its exact name (since abbreviations are often used) and, more importantly, the risk of mistyping a property name is eliminated this way. Another advantage is that if I later choose to rename a property (to make it clearer to the advanced user who configures those properties), I just need to change that name in the interface where the property constant is declared (and of course in the property file), so there is no need to "search and replace" across the entire project. Finally, I can easily check whether a property is being used or not; I just comment it out, compile the code and see if there is an error.
However, in spite of all these advantages, I am curious what are the disadvantages of this approach. What interests me the most is the following:
What impact does this approach (1000 String constants) have on the String pool? Or are they created on demand when I access those constants? Does this prevent other Strings from being cached in the String pool?
What is the performance cost of this approach compared to the one where I use hard-coded String constants, is it the same (neglecting the cost of accessing field)? Does the String pool behave similarly or a lot different?
What's the average memory increase with this approach, are all those String constants kept in memory all the time?
Any good comment/observation is welcome.

Static fields are initialized during the initialization phase of class loading.
But if a primitive type or a String is defined as a constant and the value is known at compile time, the compiler replaces the constant name everywhere in the code with its value. This is called a compile-time constant. If the value of the constant in the outside world changes (for example, if it is legislated that pi actually should be 3.975), you will need to recompile any classes that use this constant to get the current value.
This is when the String literals for the unique strings defined as constant values are created.
But this is similar to loading the constants from resources into a Properties object (and writing code to do so). Constants definitely consume memory.
String pool behavior won't change.
Some thought on design approach:
It is very easy to put all configuration constants in a class, then refer to them throughout the app.
To change the static configuration constants you modify the source and recompile.
What if at some point in the future your program needs to maintain more than one configuration, or to alternate between configurations as it processes different files, or even to run one thread with one configuration and another thread with a different configuration? You won’t be able to use that technique.
So for better designs you store constants which never change as static constants in the class definition. These are loaded as soon as the class is loaded into memory.
In the other case which I described earlier (resource loading) you keep these in various properties files which can be loaded into a Java Properties object. Examples of such cases are JDBC connection related information, etc.

1) What impact has this approach (1000 String constants) on the String pool?
Declaring property names as String constants is the right decision. But there will be no change in the 'String Literal Pool'. When multiple instances of the same literal are present, all of them are simply linked to the same item in the literal pool (provided the String constructor is not used).
2) Are they created on-demand when I access those constants?
String literals are added to 'String Literal Pool' during class loading.
3) Does this prevent other Strings from being cached in the String pool?
No.
4) What is the performance cost of this approach compared to the one where I use hard-coded String constants, is it the same (neglecting the cost of accessing field)? Does the String pool behave similarly or a lot different?
It is the same.
5) What's the average memory increase with this approach, are all those String constants kept in memory all the time?
I have already answered. :)
Additional notes
a) Constants interface is a Java anti-pattern. Please avoid it.
b) You must read this Javaranch article.
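Regarding note a), one common alternative to the constants interface is a final holder class that is referenced by its qualified name (or via static import) instead of being implemented. The sketch below reuses the property names from the question; the class name AnimalProperties and the parse helper are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Constants holder: not instantiable and not meant to be implemented or extended.
final class AnimalProperties {
    static final String WEIGHT = "weight";
    static final String RUNNING_SPEED = "speedInKph";
    static final String HOURS_OF_SLEEP_A_DAY = "sleepHrs";

    private AnimalProperties() { }

    // Demo helper: loads properties from text instead of a file.
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }
}
```

A caller then writes properties.getProperty(AnimalProperties.WEIGHT), which keeps the protection against mistyped names without adding the constants to the API of every class that uses them.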
