JVM bytecode limitations on class-class interactions - java

I was looking through the JVM bytecode instructions and was surprised to see that all the interactions between classes (e.g. casting, new, etc.) rely upon constant pool lookups for identity of the other classes.
Am I correct in inferring that this means that one class cannot know about the existence of more than 64k others, as it is impossible to refer to them? If one did need to refer to that many, what ought one do--delegate the work to multiple classes each of which could have their own <64k interactions?
(The reason this interests me is that I have a habit of writing code generators, sometimes producing thousands of distinct classes, and that some languages (e.g. Scala) create classes prolifically. So it seems that if true I have to be careful: if I have hundreds of methods in a class each using hundreds of (distinct) classes, I could exceed the constant pool space.)

Am I correct in inferring that this means that one class cannot know about the existence of more than 64k others, as it is impossible to refer to them?
I think you are correct. And don't forget that there are constant pool entries for other things too, e.g. all of the class's method and field names, and all of its literal strings.
If one did need to refer to that many, what ought one do--delegate the work to multiple classes each of which could have their own <64k interactions?
I guess so.
However, I'm not convinced that this concern would ever be realized in practice. It is hard to conceive of a class that needs to directly interact with that many other classes ... unless the code generator is ignoring the structure of its input source code.
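For a rough sense of how close a generated class is to the limit, the pool size can be read straight from the class file header. The `constant_pool_count` field is a 16-bit value, which is where the 65535 ceiling comes from. This is only a minimal sketch, assuming the class file is reachable as a resource; `ConstantPoolCount` and `constantPoolCount` are names made up for this illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ConstantPoolCount {

    /** Reads the u2 constant_pool_count field from the class file of the given class. */
    public static int constantPoolCount(Class<?> type) {
        // Resource name relative to the class's own package, e.g. "ArrayList.class".
        String name = type.getName();
        String resource = name.substring(name.lastIndexOf('.') + 1) + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            if (data.readInt() != 0xCAFEBABE) {   // magic number
                throw new IllegalArgumentException("not a class file: " + resource);
            }
            data.readUnsignedShort();             // minor_version
            data.readUnsignedShort();             // major_version
            return data.readUnsignedShort();      // constant_pool_count (max 65535)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ArrayList constant_pool_count = "
                + constantPoolCount(java.util.ArrayList.class));
    }
}
```

Running this over the output of a code generator would show which classes are approaching the limit, long before the class file writer rejects one.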

It sounds like your problem could be solved via invokedynamic. This is basically a much faster form of reflection designed to ease the implementation of dynamic languages on the JVM.
If you really do have to deal with thousands of automatically generated classes, you probably don't want to statically link it all. Just use invokedynamic. This also has the advantage of letting you defer some code generation to runtime.
Note that you still need a constant pool entry for every dynamic method called by a class, but you no longer need to refer to the actual class and methods being called. In fact, you can create them on demand.
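Java source has no direct syntax for an invokedynamic call site, so the sketch below uses the `java.lang.invoke` method handle API that invokedynamic bootstrap methods are built on. It only illustrates the idea of resolving a target class by name at runtime instead of through a class entry in the caller's constant pool; `DynamicInstantiation` and `instantiate` are hypothetical names, and a real code generator would emit the invokedynamic instruction itself.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class DynamicInstantiation {

    /** Creates an instance of a public class with a public no-arg constructor, by name. */
    public static Object instantiate(String className) {
        try {
            Class<?> type = Class.forName(className);
            // Look up the no-arg constructor as a method handle.
            MethodHandle constructor = MethodHandles.publicLookup()
                    .findConstructor(type, MethodType.methodType(void.class));
            return constructor.invoke();
        } catch (Throwable t) {
            throw new IllegalStateException("cannot instantiate " + className, t);
        }
    }

    public static void main(String[] args) {
        // The class name could just as well come from a data file or a table.
        Object list = instantiate("java.util.ArrayList");
        System.out.println(list.getClass().getName());
    }
}
```

Because the class names are plain data, they can live outside the class entirely, so the calling class does not need one constant pool entry per target class.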

Related

When creating multiple instances of the same object, does Java replicate the method implementations?

I've tried searching around for my answer but can't seem to find one.
I was curious whether Java and/or other modern languages optimize the replication of objects by doing some sort of virtual mapping for the methods. It would seem to be a waste if, every time a new instance of an object is created, it copied the methods associated with it rather than mapping them to one place in memory.
I can see some cases, such as polymorphism, where it might not work.
This might be more of a fundamentals question but I am very curious how the compiler handles this.
Thanks!
Strictly speaking, it's none of your business:
The Java Virtual Machine does not mandate any particular internal structure for objects.
(JVM Spec)
So, if you were to write your own JVM, and for some reason you chose to put a copy of method code into every in-memory representation of an object, you would be free to do so.
However, there are various aspects of how the language is defined, that mean that it's not possible for two objects of the same class to have methods that differ -- even if they're non-static inner classes, dynamic classes, etc.
Therefore you're right that it would be wasteful of space to duplicate the method code for each instance, and no serious implementation of Java does so.
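The part of this that is observable from Java code can be sketched as follows: each instance carries its own field values, while both instances report the very same `Class` object, which is where the method definitions hang. `SharedMethods` and `Counter` are made-up names, and the sketch says nothing about how a particular JVM lays out memory.

```java
public class SharedMethods {

    public static class Counter {
        private int value;              // per-instance state

        public int next() {             // defined once, on the class
            return ++value;
        }
    }

    /** True when both objects are described by the same Class object. */
    public static boolean sameClassObject(Object a, Object b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        Counter first = new Counter();
        Counter second = new Counter();
        first.next();
        first.next();
        second.next();
        System.out.println("same class object: " + sameClassObject(first, second));
        System.out.println("independent state: " + (first.next() != second.next()));
    }
}
```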

JAVA bytecode optimization

This is a basic question.
I have code which shouldn't run on metadata beans. All metadata beans are located under metadata package.
Now,
I use the reflection API to find out whether a class is located in the metadata package.
if (newEntity.getClass().getPackage().getName().contains("metadata"))
I use this if statement in several places within this code.
The question is: Should I do this once with:
boolean isMetadata = false;
if (newEntity.getClass().getPackage().getName().contains("metadata")) {
    isMetadata = true;
}
C++ compilers make optimizations and know that this code was already called, so they won't call it again. Does Java make such optimizations? I know the reflection API is a bit heavy and I'd prefer not to lose expensive runtime.
You should of course check whether there really is a performance issue before putting any work into optimising. getClass() is probably quite fast already (faster than instanceof, anyway). You could probably cache the set of classes that are in the metadata package so you don't need to keep checking the package names.
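One way to cache the result per class is `java.lang.ClassValue`, which computes a value once for each class and then reuses it. This is a minimal sketch that keeps the question's string check; `MetadataCheck` and `isMetadata` are hypothetical names, and whether the cache is worth having should still be measured first.

```java
public class MetadataCheck {

    // Computed at most once per class, then served from the cache.
    private static final ClassValue<Boolean> IN_METADATA_PACKAGE = new ClassValue<Boolean>() {
        @Override
        protected Boolean computeValue(Class<?> type) {
            Package pkg = type.getPackage();   // may be null, e.g. for array classes
            return pkg != null && pkg.getName().contains("metadata");
        }
    };

    /** Same test as in the question, but the package lookup runs once per class. */
    public static boolean isMetadata(Object entity) {
        return IN_METADATA_PACKAGE.get(entity.getClass());
    }

    public static void main(String[] args) {
        System.out.println(isMetadata("a plain string"));
        System.out.println(isMetadata(Integer.valueOf(42)));
    }
}
```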
If you really need to compare packages, you could find the metadata package once, using the Package.getPackage(String name) method, then for each object, call getClass().getPackage() as before, and compare the two package objects.
This is quicker and more elegant than checking for a string in the package name, but would probably not work correctly if there are multiple classloaders, as the Package objects wouldn't be equal (==) and Package doesn't override .equals(). Thinking about it, it may not even be guaranteed to work with a single classloader, but I suspect that in practice you get the same Package instance rather than another copy. It would be wise to check this first, e.g.:
String.class.getPackage() == Integer.class.getPackage() // should be true
Update: if you check the source code for Class.getPackage(), Package.getPackage() and ClassLoader.getPackage(), you can see that they cache the Package objects, so you should be safe comparing them when using a single classloader.
One problem of a package-naming convention is that you have to enforce and maintain it throughout the codebase, which could become a maintenance problem over time. A more explicit way of identifying the classes might be better.
Alternative approaches to identify specific groups of classes include:
Making your metadata beans implement a marker interface
Using Java Annotations to mark metadata beans
Making all beans implement a common interface with a method that can be called to check whether they are in a specific category that you define. This is ugly as it's basically duplicating the type system, but would be fast since it doesn't need reflection.
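The first two alternatives in the list above might look like the following. It is a sketch only: `Metadata`, `MetadataBean`, `ColumnInfo` and `Customer` are invented for the illustration and are not part of the original code base.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class MetadataMarkers {

    /** Marker annotation; RUNTIME retention is needed for the reflective check. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Metadata {}

    /** Marker interface; checked with a plain instanceof. */
    public interface MetadataBean {}

    @Metadata
    public static class ColumnInfo implements MetadataBean {}

    public static class Customer {}

    public static boolean isMarkedByInterface(Object entity) {
        return entity instanceof MetadataBean;
    }

    public static boolean isMarkedByAnnotation(Object entity) {
        return entity.getClass().isAnnotationPresent(Metadata.class);
    }

    public static void main(String[] args) {
        System.out.println(isMarkedByInterface(new ColumnInfo()));
        System.out.println(isMarkedByAnnotation(new Customer()));
    }
}
```

Either check survives a package rename, which the string comparison on the package name does not.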

Should I avoid using STATIC variables

I am designing a Java API (not exactly an API) at my office which will contain 4000+ constants, so that all the teams can use them directly. Initially I thought I would create classes for them according to their type and create their static instances in a separate class, so anybody can use them directly.
But after reading "need of static variable", I am afraid that creating so many static variables could be a problem. Is there any alternative?
*Whoever joins this project after me can also add a constant to my Constant class without worrying about performance. It is possible that many of the constants will rarely be used.
*Every member of the Constant class will represent a class which has its own behavior. It might also become part of some inheritance tree later on. So using an enum might not be a good idea.
You want to create a location where some 4000+ constants will live, and there's the possibility that users of this class may add constants (possibly at runtime)?
Concern about the memory issues of statics is misplaced. If you need 4000 values, they're going to have to live somewhere, right?
If people will be adding values at runtime, this sounds like a singleton Map or Properties (which is really just a kind of map anyway) of some kind. People often use dependency injection frameworks like Spring or Guice to manage this sort of thing.
If you just mean adding compile-time constants, you can make them all static. You'd probably want to make them static final as well; primitive and String constants initialized with constant expressions will then be inlined by the compiler.
It's very likely that 4000 constants is a very bad idea. Where I've seen systems with large numbers of constants (>100, even) defined in one place, what usually happens is that people forget the definitions and end up using their own variants, which sort of defeats the purpose. For example, I've worked on a system with hundreds of SQL queries defined in a "Queries" class. Of course people immediately ignored it, as it's more of a bother to look up whether the exact query you need is in there than to roll your own. The class eventually grew to something like 1500 queries, many exact duplicates and many unused, most used once. Utterly pointless. I can imagine exceptions where naming conventions keep you from "losing" things, but unless you've got a use case like that this seems like a really bad idea.
Breaking out your constants into enums gives you type-safe references. It also makes things conceptually easier to deal with. Compare:
public class Constants {
    String WORK_ADDRESS;
    String WORK_PHONE;
    String HOME_ADDRESS;
    String HOME_PHONE;
}
with
public enum ADDRESS { WORK, HOME }
public enum PHONE { WORK, HOME }
Which would you rather work with?
Performance is highly unlikely to be the problem with this design. RAM is cheap. (Cue the usual quote: Premature optimization is the root of all evil.)
On the other hand, I'm not quite sure how any client developer can remember and use 4000+ constants. Can you give us an idea what sort of object this is?
You may, depending on details you haven't given us, find it useful to collect constants into enums. Stateless enums can be easier to understand than public static final variables if there are some natural groupings you can take advantage of.
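As an illustration of the enum grouping suggested in these answers, the sketch below splits one flat set of names into two small enums, and gives one of them per-constant behavior to show that enum constants can carry their own methods. `ContactBook`, `Place`, `Channel` and `describe` are hypothetical names chosen for the example.

```java
import java.util.Locale;

public class ContactBook {

    public enum Place { WORK, HOME }

    /** Each constant supplies its own behavior through an abstract method. */
    public enum Channel {
        ADDRESS {
            @Override
            public String label() { return "address"; }
        },
        PHONE {
            @Override
            public String label() { return "phone"; }
        };

        public abstract String label();
    }

    /** Two small enums replace four flat constants and cannot be mixed up. */
    public static String describe(Place place, Channel channel) {
        return place.name().toLowerCase(Locale.ROOT) + " " + channel.label();
    }

    public static void main(String[] args) {
        for (Place place : Place.values()) {
            for (Channel channel : Channel.values()) {
                System.out.println(describe(place, channel));
            }
        }
    }
}
```

The compiler rejects `describe(Channel.PHONE, Place.WORK)`, which a set of plain String constants cannot do.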
What happens when you allocate something as static is that it won't be freed for the lifetime of your app.
So what?
If you don't make them static, they'll be duplicated in every instance of your classes.
What you don't want to do is make huge amounts of data static, such as images or GUI components;
an image takes up a lot more than a few fields.
4000 constants, say ints (4 bytes each) = 16000 bytes, not even the size of an icon ^^
I would point to the Javadoc to prove my point:
http://download.oracle.com/javase/1.4.2/docs/api/constant-values.html#java.awt.event.KeyEvent.CHAR_UNDEFINED
This is the KeyEvent constant listing in Java; check out the declarations ^^
Unless you are creating large arrays or very long strings, 4000 data values isn't going to be a lot of memory. I think that post you cited was talking about much larger amounts of data.
Another approach is to read the values from a preferences file.
Perhaps the constants could be modularized into a collection of classes, so the more rarely used ones will be loaded only on demand.
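Splitting the constants across classes works because a class is initialized only when it is first used. A small sketch of that effect, with made-up names (`LazyConstants`, `Common`, `Rare`, `register`); the values are produced by a method call on purpose, since compile-time constants are inlined and would not trigger initialization at all.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class LazyConstants {

    /** Records which constant groups have been initialized so far. */
    public static final Set<String> INITIALIZED = new LinkedHashSet<>();

    static String register(String group, String value) {
        INITIALIZED.add(group);
        return value;
    }

    public static final class Common {
        public static final String TIMEOUT_KEY = register("Common", "timeout");
    }

    public static final class Rare {
        public static final String LEGACY_KEY = register("Rare", "legacy.mode");
    }

    public static void main(String[] args) {
        // Only the group that is actually touched gets initialized.
        System.out.println(Common.TIMEOUT_KEY);
        System.out.println("initialized so far: " + INITIALIZED);
    }
}
```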

Java super-tuning, a few questions

Before I ask my question can I please ask not to get a lecture about optimising for no reason.
Consider the following questions purely academic.
I've been thinking about the efficiency of accesses between root (ie often used and often accessing each other) classes in Java, but this applies to most OO languages/compilers. The fastest way (I'm guessing) that you could access something in Java would be a static final reference. Theoretically, since that reference is available during loading, a good JIT compiler would remove the need to do any reference lookup to access the variable and point any accesses to that variable straight to a constant address. Perhaps for security reasons it doesn't work that way anyway, but bear with me...
Say I've decided that there are some order of operations problems or some arguments to pass at startup that means I can't have a static final reference, even if I were to go to the trouble of having each class construct the other as is recommended to get Java classes to have static final references to each other. Another reason I might not want to do this would be... oh, say, just for example, that I was providing platform specific implementations of some of these classes. ;-)
Now I'm left with two obvious choices. I can have my classes know about each other with a static reference (on some system hub class), which is set after constructing all classes (during which I mandate that they cannot access each other yet, thus doing away with order of operations problems at least during construction). On the other hand, the classes could have instance final references to each other, were I now to decide that sorting out the order of operations was important or could be made the responsibility of the person passing the args - or more to the point, providing platform specific implementations of these classes we want to have referencing each other.
A static variable means you don't have to look up the location of the variable with respect to the class it belongs to, saving you one operation. A final variable means you don't have to look up the value at all but it does have to belong to your class, so you save 'one operation'. OK I know I'm really handwaving now!
Then something else occurred to me: I could have static final stub classes, kind of like a wacky interface where each call was relegated to an 'impl' which can just extend the stub. The performance hit then would be the double function call required to run the functions and possibly I guess you can't declare your methods final anymore. I hypothesised that perhaps those could be inlined if they were appropriately declared, then gave up as I realised I would then have to think about whether or not the references to the 'impl's could be made static, or final, or...
So which of the three would turn out fastest? :-)
Any other thoughts on lowering frequent-access overheads or even other ways of hinting performance to the JIT compiler?
UPDATE: After running several hours of test of various things and reading http://www.ibm.com/developerworks/java/library/j-jtp02225.html I've found that most things you would normally look at when tuning e.g. C++ go out the window completely with the JIT compiler. I've seen it run 30 seconds of calculations once, twice, and on the third (and subsequent) runs decide "Hey, you aren't reading the result of that calculation, so I'm not running it!".
FWIW you can test data structures and I was able to develop an arraylist implementation that was more performant for my needs using a microbenchmark. The access patterns must have been random enough to keep the compiler guessing, but it still worked out how to better implement a generic-ified growing array with my simpler and more tuned code.
As far as the test here was concerned, I simply could not get a benchmark result! My simple test of calling a function and reading a variable from a final vs non-final object reference revealed more about the JIT than the JVM's access patterns. Unbelievably, calling the same function on the same object at different places in the method changes the time taken by a factor of FOUR!
As the guy in the IBM article says, the only way to test an optimisation is in-situ.
Thanks to everyone who pointed me along the way.
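The "you aren't reading the result" effect described in the update is usually countered by making the benchmark consume what it computes and by repeating the measurement so that warm-up runs are visible. The following is a bare-bones sketch of that idea, not a reliable benchmark; `ConsumeResult` and `sumOfSquares` are invented for the example, and a dedicated harness such as JMH handles these pitfalls more rigorously.

```java
public class ConsumeResult {

    /** The work being timed; returns a value so the caller can consume it. */
    public static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        long sink = 0;
        for (int run = 0; run < 5; run++) {
            long start = System.nanoTime();
            sink += sumOfSquares(10_000_000);   // result is kept, not discarded
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            System.out.println("run " + run + ": " + elapsedMicros + " us");
        }
        // Printing the accumulated value keeps the computation observable.
        System.out.println("sink = " + sink);
    }
}
```

Early runs typically differ from later ones once the JIT has compiled the loop, which is one reason single-shot timings mislead.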
It's worth noting that static fields are stored in a special per-class object which contains the static fields for that class. Using static fields instead of object fields is unlikely to be any faster.
See the update, I answered my own question by doing some benchmarking, and found that there are far greater gains in unexpected areas and that performance for simple operations like referencing members is comparable on most modern systems where performance is limited more by memory bandwidth than CPU cycles.
Assuming you found a way to reliably profile your application, keep in mind that it will all go out the window should you switch to another jdk impl (IBM to Sun to OpenJDK etc), or even upgrade version on your existing JVM.
The reason you are having trouble, and would likely have different results with different JVM impls, lies in the Java spec: it explicitly states that it does not define optimizations and leaves it to each implementation to optimize (or not) in any way, so long as execution behavior is unchanged by the optimization.

How are java interfaces implemented internally? (vtables?)

C++ has multiple inheritance. The implementation of multiple inheritance at the assembly level can be quite complicated, but there are good descriptions online on how this is normally done (vtables, pointer fixups, thunks, etc).
Java doesn't have multiple implementation inheritance, but it does have multiple interface inheritance, so I don't think a straight forward implementation with a single vtable per class can implement that. How does java implement interfaces internally?
I realize that contrary to C++, Java is Jit compiled, so different pieces of code might be optimized differently, and different JVMs might do things differently. So, is there some general strategy that many JVMs follow on this, or does anyone know the implementation in a specific JVM?
Also JVMs often devirtualize and inline method calls in which case there are no vtables or equivalent involved at all, so it might not make sense to ask about actual assembly sequences that implement virtual/interface method calls, but I assume that most JVMs still keep some kind of general representation of classes around to use if they haven't been able to devirtualize everything. Is this assumption wrong? Does this representation look in any way like a C++ vtable? If so do interfaces have separate vtables and how are these linked with class vtables? If so can object instances have multiple vtable pointers (to class/interface vtables) like object instances in C++ can? Do references of a class type and an interface type to the same object always have the same binary value or can these differ like in C++ where they require pointer fixups?
(for reference: this question asks something similar about the CLR, and there appears to be a good explanation in this msdn article though that may be outdated by now. I haven't been able to find anything similar for Java.)
Edit:
I mean 'implements' in the sense of "How does the GCC compiler implement integer addition / function calls / etc", not in the sense of "Java class ArrayList implements the List interface".
I am aware of how this works at the JVM bytecode level, what I want to know is what kind of code and datastructures are generated by the JVM after it is done loading the class files and compiling the bytecode.
The key feature of the HotSpot JVM is inline caching.
This doesn't actually mean that the target method is inlined, but means that an assumption
is put into the JIT code that every future call to the virtual or interface method will target
the very same implementation (i.e. that the call site is monomorphic). In this case, a
check is compiled into the machine code whether the assumption actually holds (i.e. whether
the type of the target object is the same as it was last time), and then transfer control
directly to the target method - with no virtual tables involved at all. If the assertion fails, an attempt may be made to convert this to a megamorphic call site (i.e. with multiple possible types); if this also fails (or if it is the first call), a regular long-winded lookup is performed, using vtables (for virtual methods) and itables (for interfaces).
Edit: The Hotspot Wiki has more details on the vtable and itable stubs. In the polymorphic case, it still puts an inline cache version into the call site. However, the code actually is a stub that performs a lookup in a vtable, or an itable. There is one vtable stub for each vtable offset (0, 1, 2, ...). Interface calls add a linear search over an array of itables before looking into the itable (if found) at the given offset.
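The monomorphic fast path described above can be modeled in plain Java. This is a toy model only: a map stands in for the vtable/itable lookup and a field stands in for the type check compiled into the call site. It does not reflect HotSpot's actual data structures or machine code, and every name in it (`InlineCacheSketch`, `CallSite`, `DISPATCH_TABLE`) is made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class InlineCacheSketch {

    public interface Animal {}
    public static class Dog implements Animal {}
    public static class Cat implements Animal {}

    // Stand-in for the slow table lookup: receiver class -> implementation.
    static final Map<Class<?>, Function<Animal, String>> DISPATCH_TABLE = new HashMap<>();
    static {
        DISPATCH_TABLE.put(Dog.class, animal -> "woof");
        DISPATCH_TABLE.put(Cat.class, animal -> "meow");
    }

    /** One call site that remembers the last receiver type and its target. */
    public static class CallSite {
        private Class<?> cachedType;
        private Function<Animal, String> cachedTarget;
        public int slowLookups;

        public String invoke(Animal receiver) {
            if (receiver.getClass() == cachedType) {
                return cachedTarget.apply(receiver);   // fast path: type check only
            }
            slowLookups++;                             // slow path: table lookup
            cachedType = receiver.getClass();
            cachedTarget = DISPATCH_TABLE.get(cachedType);
            return cachedTarget.apply(receiver);
        }
    }

    public static void main(String[] args) {
        CallSite site = new CallSite();
        Animal dog = new Dog();
        for (int i = 0; i < 3; i++) {
            System.out.println(site.invoke(dog));
        }
        System.out.println("slow lookups: " + site.slowLookups);
    }
}
```

As long as the same receiver type keeps arriving, only the first call pays for the lookup; a call site that alternates between types would miss the cache every time, which is the case the real JVM handles with its table stubs.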
