MethodHandle - What is it all about?


I am studying the new features of JDK 1.7, and I just can't figure out what MethodHandle is designed for. I understand the (direct) invocation of a static method (and the use of the Core Reflection API, which is straightforward in this case). I also understand the (direct) invocation of a virtual method (non-static, non-final) (and the use of the Core Reflection API, which requires walking the class hierarchy via obj.getClass().getSuperclass()). Invocation of a non-virtual method can be treated as a special case of the former.
Yes, I am aware that there is an issue with overloading. If you want to invoke a method, you have to supply the exact signature. You can't check for an overloaded method in an easy way.
But what is MethodHandle about? The Reflection API allows you to "look into" an object's internals without any prior assumptions (such as an implemented interface). You can inspect the object for some purpose. But what is MethodHandle designed for? Why and when should I use it?
UPDATE: I am now reading this article: http://blog.headius.com/2008/09/first-taste-of-invokedynamic.html. According to it, the main goal is to simplify life for scripting languages that run atop the JVM, not for the Java language itself.
UPDATE-2: I have finished reading the article linked above. Some quotations from it:
The JVM is going to be the best VM for building dynamic languages, because it already is a dynamic language VM. And InvokeDynamic, by promoting dynamic languages to first-class JVM citizens, will prove it.
Using reflection to invoke methods works great...except for a few problems. Method objects must be retrieved from a specific type, and can't be created in a general way.<...>
...reflected invocation is a lot slower than direct invocation. Over the years, the JVM has gotten really good at making reflected invocation fast. Modern JVMs actually generate a bunch of code behind the scenes to avoid much of the overhead old JVMs dealt with. But the simple truth is that reflected access through any number of layers will always be slower than a direct call, partially because the completely generified "invoke" method must check and re-check receiver type, argument types, visibility, and other details, but also because arguments must all be objects (so primitives get object-boxed) and must be provided as an array to cover all possible arities (so arguments get array-boxed).
The performance difference may not matter for a library doing a few reflected calls, especially if those calls are mostly to dynamically set up a static structure in memory against which it can make normal calls. But in a dynamic language, where every call must use these mechanisms, it's a severe performance hit.
http://blog.headius.com/2008/09/first-taste-of-invokedynamic.html
So, for a Java programmer it is essentially useless. Am I right? From this point of view, it can only be considered an alternative to the Core Reflection API.
UPDATE-2020: Indeed, MethodHandle can be thought of as a more powerful alternative to the Core Reflection API. Starting with JDK 8, there are also Java language features that use it.

With MethodHandles you can curry methods, change the types of their parameters, and change the order of their parameters.
Method handles can refer to both methods and fields.
Another trick MethodHandles offer is working with primitives directly (rather than via wrappers).
MethodHandles can be faster than reflection because there is more direct support for them in the JVM, e.g. they can be inlined. They also underpin the new invokedynamic instruction.
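The tricks listed above can be sketched with the java.lang.invoke API. This is only an illustration: the class HandleTricks, its subtract method, and its counter field are hypothetical names invented for the example. It shows currying (MethodHandles.insertArguments), reordering (MethodHandles.permuteArguments), type adaptation (asType), and field access (findGetter), with primitives passed unboxed through invokeExact.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class HandleTricks {
    int counter = 7; // hypothetical field, read through a getter handle below

    static int subtract(int a, int b) {
        return a - b;
    }

    // Looks up subtract(int, int) as a handle of type (int, int) -> int.
    static MethodHandle subtractHandle() throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(HandleTricks.class, "subtract",
                MethodType.methodType(int.class, int.class, int.class));
    }

    // Currying: bind the first argument to 10, leaving a handle of type (int) -> int.
    static int tenMinus(int b) {
        try {
            MethodHandle curried = MethodHandles.insertArguments(subtractHandle(), 0, 10);
            return (int) curried.invokeExact(b); // the int is passed unboxed
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // Reordering: swap the two arguments, so the call computes b - a.
    static int swapped(int a, int b) {
        try {
            MethodHandle h = MethodHandles.permuteArguments(subtractHandle(),
                    MethodType.methodType(int.class, int.class, int.class), 1, 0);
            return (int) h.invokeExact(a, b);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // Changing types: adapt (int, int) -> int into (short, short) -> long.
    static long widened(short a, short b) {
        try {
            MethodHandle h = subtractHandle().asType(
                    MethodType.methodType(long.class, short.class, short.class));
            return (long) h.invokeExact(a, b);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // Fields: a getter handle of type (HandleTricks) -> int.
    static int readCounter(HandleTricks target) {
        try {
            MethodHandle getter = MethodHandles.lookup()
                    .findGetter(HandleTricks.class, "counter", int.class);
            return (int) getter.invokeExact(target);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

In real code the handles would normally be created once and kept in static final fields; they are looked up on every call here only to keep each method self-contained.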

Think of MethodHandle as a modern, more flexible, more type-safe way of doing reflection.
It's currently in the early stages of its lifecycle, but over time it has the potential to be optimized to become much faster than reflection, to the point that it can become as fast as a regular method call.

java.lang.reflect.Method is relatively slow and expensive in terms of memory. Method handles are supposed to be a "lightweight" way of passing around pointers to functions that the JVM has a chance of optimising. As of JDK8 method handles aren't that well optimised, and lambdas are likely to be initially implemented in terms of classes (as inner classes are).

Almost 9 years have passed since I asked this question.
JDK 14 is the latest stable version, and it makes massive use of MethodHandle...
I've created a mini-series of articles about invokedynamic: https://alex-ber.medium.com/explaining-invokedynamic-introduction-part-i-1079de618512. Below, I'm quoting the relevant parts from them.
MethodHandle can be thought of as a more powerful alternative to the Core Reflection API. A MethodHandle is an object that stores metadata about a method (or constructor, field, or similar low-level operation), such as the method's name and signature. One way to look at it is as the target of a pointer to a method (a de-referenced method, constructor, field, or similar low-level operation).
Java code can create a method handle that directly accesses any method, constructor, or field that is accessible to that code. This is done via a reflective, capability-based API called MethodHandles.Lookup. For example, a static method handle can be obtained from Lookup.findStatic. There are also conversion methods from Core Reflection API objects, such as Lookup.unreflect.
It is important to understand two key differences between the Core Reflection API and MethodHandle.
With MethodHandle, the access check is done only once, at construction time; with the Core Reflection API, it is done on every call to the invoke method (and the Security Manager is invoked each time, which slows things down).
In the Core Reflection API, invoke is a regular method. In MethodHandle, all invoke* variants are signature-polymorphic methods.
Basically, the access check determines whether you can access the method (constructor, field, or similar low-level operation). For example, if the method (constructor, field, or similar low-level operation) is private, you normally can't invoke it (or read the field's value).
Unlike with the Reflection API, the JVM can see completely through MethodHandles and will try to optimize them, hence the better performance.
Note: With MethodHandle you can also generate implementation logic. See Dynamical hashCode implementation. Part V https://alex-ber.medium.com/explaining-invokedynamic-dynamical-hashcode-implementation-part-v-16eb318fcd47 for details.
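The lookup and invocation steps described in this answer can be sketched as follows. The class LookupDemo and its private greet method are hypothetical, made up for the illustration. The sketch obtains the same handle through Lookup.findStatic and through Lookup.unreflect, calls it via the signature-polymorphic invokeExact, and shows that a lookup without sufficient access is refused when the handle is requested rather than when it is called.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

class LookupDemo {
    static final MethodType GREET_TYPE =
            MethodType.methodType(String.class, String.class);

    // Hypothetical private target. The lookups below may see it because they
    // are created inside the same class.
    private static String greet(String name) {
        return "Hello, " + name;
    }

    static String call(MethodHandle handle, String name) throws Throwable {
        // invokeExact is signature polymorphic: the compiler records this call
        // as (String) -> String, which must match the handle's type exactly.
        return (String) handle.invokeExact(name);
    }

    // Handle obtained directly; the access check happens inside findStatic.
    static String callFound(String name) {
        try {
            MethodHandle h = MethodHandles.lookup()
                    .findStatic(LookupDemo.class, "greet", GREET_TYPE);
            return call(h, name);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // Handle converted from a Core Reflection Method object.
    static String callUnreflected(String name) {
        try {
            Method m = LookupDemo.class.getDeclaredMethod("greet", String.class);
            MethodHandle h = MethodHandles.lookup().unreflect(m);
            return call(h, name);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // A lookup without private access is refused when the handle is requested,
    // before any call is attempted.
    static boolean publicLookupIsRefused() {
        try {
            MethodHandles.publicLookup()
                    .findStatic(LookupDemo.class, "greet", GREET_TYPE);
            return false;
        } catch (ReflectiveOperationException e) {
            return true; // an IllegalAccessException is expected here
        }
    }
}
```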

Related

In Java what is the point of invokevirtual, if anyway the index of a method in a method table is known at compile-time?

I am reading about dynamic dispatch, as I have an exam tomorrow.
In C++ we have conforming subclasses, so through the static type of the identifier we know what index to access in the virtual method table of the runtime object.
From what I am reading, Java has conformance for subclasses as well, but instead of including the known index of a method in the virtual method table in the compiled code, it only includes a symbolic reference to the method, that needs to be resolved.
What is the point of this if the static type does not refer to an interface? It could be much faster to do it the C++ way.

The Java platform defines linkage as a step taken at runtime. Virtual method tables aren't even involved in the JVM specification; they are just a typical way to implement linkage.
Note, however, that after the symbolic reference is resolved into a direct reference, there is nothing stopping the runtime from using very fast code paths for method invocation sites. That includes special-case optimizations such as monomorphic call sites, which have a hardwired direct pointer to the method code and are thus faster than vtable lookups. Monomorphic sites then become an easy target for method inlining, which opens a whole new field of applicable optimizations. Another option is an n-polymorphic site, accommodating up to n different target types in an inline cache.
As opposed to C++, all these optimizing decisions happen at runtime, subject to the specific conditions at work: the exact set of loaded classes, profiling data for each individual call site, etc. This gives managed-runtime platforms such as Java advantages of their own.
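As a small illustration of the call sites discussed above, consider the sketch below. The Shape, Square, and Rect classes are hypothetical. The call s.area() compiles to an invokevirtual instruction that carries a symbolic reference to Shape.area rather than a table index. Whether the runtime then treats the site as monomorphic depends on which receiver types it actually observes, which this source code cannot show directly.

```java
abstract class Shape {
    abstract double area();
}

class Square extends Shape {
    final double side;
    Square(double side) { this.side = side; }
    @Override double area() { return side * side; }
}

class Rect extends Shape {
    final double width, height;
    Rect(double width, double height) { this.width = width; this.height = height; }
    @Override double area() { return width * height; }
}

class CallSites {
    // One call site, s.area(). If only Square instances ever reach it, a JIT
    // may treat it as monomorphic; once Rect instances show up as well, the
    // site has two observed receiver types.
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }
}
```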

Bytecode generated access objects vs GeneratedMethodAccessor

I have a bean utility library, and of course we cache the Methods/Fields of properties. Reading and writing go through reflection.
There is an idea to skip reflection and, for each method/field, bytecode-generate a simple object that directly calls the target. For example, if we have a setFoo(String s) method, we would call a set(String s) method of this generated class that internally calls setFoo(). Again, we are replacing the reflection call with a runtime-generated direct call.
I know Java does a similar thing with GeneratedMethodAccessor, but its cache may be limited by a JVM argument.
Does anyone know if it makes sense to roll my own implementation, considering the performance? On the one hand it sounds fine, but on the other, many new classes will be created and will fill the perm gen space.
Any experience on this subject?
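To make the idea concrete, here is a hand-written sketch of what one such generated accessor might look like, next to the reflective call it would replace. All names (Bean, PropertySetter, BeanFooSetter, AccessorDemo) are hypothetical; a real implementation would emit the equivalent of BeanFooSetter as bytecode at runtime instead of having it in source form.

```java
import java.lang.reflect.Method;

// Hypothetical bean with the setFoo(String) method from the question.
class Bean {
    private String foo;
    public void setFoo(String s) { this.foo = s; }
    public String getFoo() { return foo; }
}

// Hypothetical common interface that every generated accessor would implement.
interface PropertySetter {
    void set(Object target, Object value);
}

// What the generator would emit for Bean.setFoo, written by hand here.
class BeanFooSetter implements PropertySetter {
    @Override
    public void set(Object target, Object value) {
        ((Bean) target).setFoo((String) value); // plain direct call, no reflection
    }
}

class AccessorDemo {
    static String viaGeneratedStyleAccessor(String value) {
        Bean bean = new Bean();
        PropertySetter setter = new BeanFooSetter();
        setter.set(bean, value);
        return bean.getFoo();
    }

    static String viaReflection(String value) {
        try {
            Bean bean = new Bean();
            // In the library this Method object would come from the cache.
            Method m = Bean.class.getMethod("setFoo", String.class);
            m.invoke(bean, value);
            return bean.getFoo();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Note that one such class is needed per property, which is exactly the class-count concern raised in the question.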

You are trying to re-invent cglib's FastMethod
In fact, Reflection is not slower at all. See
https://stackoverflow.com/a/23580143/3448419
Reflection can do more than 50,000,000 invocations per second. It is unlikely to be a bottleneck.

Is using Class.getMethods() a heavy operation in Java?

I have a question about the performance of the Java "Class" API. I have a requirement where I have database values which could look like /car or /cars[0]/make. For each of those database values I have to see whether the particular class I am dealing with has a setter method for /car, such as setCar (or, for /cars[0]/make, a setCars method). Currently, I just iterate through all the declared methods of the class (using getMethods) and then do some string checking to see whether the method names match the database value. I do not invoke any method when I do this. Although this uses the Method API, it is not really doing any method invocation. Is this still a heavy operation in terms of Java reflection? To put it another way, is this Java reflection in use?

Yes, you are using reflection when you call getMethods. If you are concerned about performance, you could profile your code with a Java profiler such as JIP.

Yes, Java reflection is used to look up the declared methods, and the lookup is already an expensive operation even without invoking the method afterwards.
However, OpenJDK (the default implementation of Java) uses an internal cache, so subsequent lookups of declared methods on the same class are much faster.
Any major framework for object mapping (to a database or JSON) uses reflection, so as long as your application does not deal with high-frequency trading with sub-millisecond reaction times, you should be fine using reflection here.
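The name-matching scan described in the question can be sketched roughly as follows. The Garage class and the path-to-setter-name rule are assumptions made for the illustration; the real mapping rules of the asker's application are not known. Only Class.getMethods and Method metadata are used, and nothing is invoked.

```java
import java.lang.reflect.Method;
import java.util.List;

class SetterScan {
    // Hypothetical target class with setters for "car" and "cars".
    static class Garage {
        public void setCar(String car) { }
        public void setCars(List<String> cars) { }
        public String getCar() { return null; }
    }

    // Assumed rule: "/cars[0]/make" -> "setCars", "/car" -> "setCar".
    static String setterNameFor(String path) {
        String first = path.split("/")[1];      // first path segment
        int bracket = first.indexOf('[');
        if (bracket >= 0) {
            first = first.substring(0, bracket); // drop an index such as "[0]"
        }
        return "set" + Character.toUpperCase(first.charAt(0)) + first.substring(1);
    }

    // Scans the public methods by name; no method is invoked.
    static boolean hasSetter(Class<?> type, String path) {
        String wanted = setterNameFor(path);
        for (Method m : type.getMethods()) {
            if (m.getName().equals(wanted) && m.getParameterCount() == 1) {
                return true;
            }
        }
        return false;
    }
}
```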

How are java interfaces implemented internally? (vtables?)

C++ has multiple inheritance. The implementation of multiple inheritance at the assembly level can be quite complicated, but there are good descriptions online on how this is normally done (vtables, pointer fixups, thunks, etc).
Java doesn't have multiple implementation inheritance, but it does have multiple interface inheritance, so I don't think a straightforward implementation with a single vtable per class can implement that. How does Java implement interfaces internally?
I realize that, contrary to C++, Java is JIT compiled, so different pieces of code might be optimized differently, and different JVMs might do things differently. So, is there some general strategy that many JVMs follow on this, or does anyone know the implementation in a specific JVM?
Also JVMs often devirtualize and inline method calls in which case there are no vtables or equivalent involved at all, so it might not make sense to ask about actual assembly sequences that implement virtual/interface method calls, but I assume that most JVMs still keep some kind of general representation of classes around to use if they haven't been able to devirtualize everything. Is this assumption wrong? Does this representation look in any way like a C++ vtable? If so do interfaces have separate vtables and how are these linked with class vtables? If so can object instances have multiple vtable pointers (to class/interface vtables) like object instances in C++ can? Do references of a class type and an interface type to the same object always have the same binary value or can these differ like in C++ where they require pointer fixups?
(for reference: this question asks something similar about the CLR, and there appears to be a good explanation in this msdn article though that may be outdated by now. I haven't been able to find anything similar for Java.)
Edit:
I mean 'implements' in the sense of "How does the GCC compiler implement integer addition / function calls / etc", not in the sense of "Java class ArrayList implements the List interface".
I am aware of how this works at the JVM bytecode level, what I want to know is what kind of code and datastructures are generated by the JVM after it is done loading the class files and compiling the bytecode.

The key feature of the HotSpot JVM is inline caching. This doesn't actually mean that the target method is inlined; it means that an assumption is put into the JIT code that every future call to the virtual or interface method will target the very same implementation (i.e. that the call site is monomorphic). In this case, a check is compiled into the machine code to verify that the assumption actually holds (i.e. that the type of the target object is the same as it was last time), and control is then transferred directly to the target method, with no virtual tables involved at all. If the assertion fails, an attempt may be made to convert this to a megamorphic call site (i.e. with multiple possible types); if this also fails (or if it is the first call), a regular long-winded lookup is performed, using vtables (for virtual methods) and itables (for interfaces).
Edit: The HotSpot Wiki has more details on the vtable and itable stubs. In the polymorphic case, it still puts an inline cache version into the call site. However, the code actually is a stub that performs a lookup in a vtable, or an itable. There is one vtable stub for each vtable offset (0, 1, 2, ...). Interface calls add a linear search over an array of itables before looking into the itable (if found) at the given offset.
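As a source-level analogy only (this is not HotSpot's actual machine code), the guard-then-direct-call shape of a monomorphic inline cache can be sketched in Java. The Animal, Dog, and Cat types are hypothetical, and the sketch assumes that Dog is the only receiver type seen so far at the call site.

```java
interface Animal {
    String sound();
}

final class Dog implements Animal {
    @Override public String sound() { return "woof"; }
}

final class Cat implements Animal {
    @Override public String sound() { return "meow"; }
}

class InlineCacheSketch {
    // Analogy for a call site a.sound() whose cached receiver type is Dog.
    static String callSound(Animal a) {
        if (a.getClass() == Dog.class) {
            // Guard passed: the target is known, so the call can go straight
            // to Dog.sound() with no table lookup.
            return ((Dog) a).sound();
        }
        // Guard failed: fall back to the general dispatch, which for an
        // interface call corresponds to the itable lookup described above.
        return a.sound();
    }
}
```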

Why all java methods are implicitly overridable?

In C++, I have to explicitly specify the 'virtual' keyword to make a member function 'overridable', because creating virtual tables and vpointers involves overhead when a member function is made overridable (so every member function is implicitly not overridable for performance reasons).
It also allows a member function to be hidden (if not overridden) when a subclass provides a separate implementation with the same name and signature.
The same technique is used in C# as well. I am wondering why Java moved away from this behavior, made every method overridable by default, and provided the ability to disable overriding through explicit use of the 'final' keyword.

The better question might be "Why does C# have non-virtual methods?" Or at the very least, why aren't they virtual by default with the option to flag them as non-virtual?
In C++, there is the idea (as Brian so nicely pointed out) that if you don't want it, you don't pay for it. The problem is that if you do want it, this usually means you end up paying through the nose for it. Most Java implementations are designed explicitly for lots of virtual calls; the vtable implementations tend to be fast, scarcely more expensive than non-virtual calls, meaning the primary advantage of non-virtual functions is lost. Furthermore, JIT compilers can inline virtual functions at runtime. As such, for efficiency reasons, there is very little reason actually to use non-virtual functions.
Thus, it largely comes down to the principle of least surprise. It tells us that all methods should behave the same way, not half of them being virtual and half of them being non-virtual. Since we need to have at least some virtual methods to achieve this polymorphism thing, it makes sense to have them all be virtual. Furthermore, having two methods with the same signature is just asking to shoot yourself in the foot.
Polymorphism also dictates that the object itself should have control over what it does. Its behavior should not depend on whether the client thinks it's a FooParent or a FooChild.
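A minimal sketch of that last point, using hypothetical FooParent and FooChild classes with a made-up describe method: the object decides what runs, regardless of the static type of the reference.

```java
class FooParent {
    String describe() { return "parent"; }
}

class FooChild extends FooParent {
    @Override
    String describe() { return "child"; }
}

class DispatchDemo {
    static String viaParentReference() {
        FooParent ref = new FooChild(); // the client "thinks" it holds a FooParent
        return ref.describe();          // FooChild's override runs anyway
    }
}
```

In C++ or C#, the same code with a non-virtual describe would run the parent's version here.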
EDIT: So I'm being called on my assertions. This next paragraph is conjecture on my part, not a statement of fact.
An interesting side effect of all this is that Java programmers tend to use interfaces very heavily. Since the virtual method optimizations make the cost of interfaces essentially non-existent, they allow you to use a List (for example) instead of an ArrayList, and switch it out for a LinkedList at some later date with a simple one-line change and no additional penalty.
EDIT: I'll also pony up a couple sources. While not the original sources, they do come from Sun explaining some of the workings on HotSpot.
Inlining
VTable

Taken from here (#34)
There’s no virtual keyword in Java because all non-static methods always use dynamic binding. In Java, the programmer doesn’t have to decide whether to use dynamic binding. The reason virtual exists in C++ is so you can leave it off for a slight increase in efficiency when you’re tuning for performance (or, put another way, "If you don’t use it, you don’t pay for it"), which often results in confusion and unpleasant surprises. The final keyword provides some latitude for efficiency tuning – it tells the compiler that this method cannot be overridden, and thus that it may be statically bound (and made inline, thus using the equivalent of a C++ non-virtual call). These optimizations are up to the compiler.
A bit circular, perhaps.

So Java's rationale is probably something like this: the whole point of an object-oriented language is that things can be extended. So in terms of pure design, it really makes little sense to treat extensible as the "special case".
Remember that Java has the luxury of compiling at runtime. So some of the performance arguments in C++ compilation go out the window. In C++, if a class might be overridden, then the compiler has to take extra steps. In Java, there's no mystery about it: at any given moment in time, the JVM knows whether or not a particular method/class has been overridden, and that's essentially what counts.
Note that the final keyword is essentially about program design, not optimisation. The JVM doesn't need this information to see whether or not a class/method has been overridden!!

If the question is asking which approach is better, Java's or C++/C#'s, then it has already been discussed from the opposite direction in another thread, and many resources are available on the net:
Why C# implements methods as non-virtual by default?
http://www.artima.com/intv/nonvirtual.html
The recent introduction of the @Override annotation and its wide adoption in new code suggest that the exact answer to the question "Why all java methods are implicitly overridable?" is indeed that the designers made a mistake. (And they have already fixed it.)
Oh! I'm going to get a negative vote for this.

Java tries to move closer to a more dynamic language definition, where everything is an object and everything is a virtual method. It also wants to avoid ambiguity and hard-to-understand constructs, which its designers viewed as a flaw in C++; therefore there is no operator overloading and, in this case, no ability to have two public method signatures in one class hierarchy invoking different methods depending on the type of the variable referencing the object.
C# is more concerned about the stability of subclasses and making sure that the subclasses behave predictably. C++ is concerned about performance.
Three different design priorities, leading to different choices.

I would say that in Java the cost of a virtual method is low compared to the overall VM costs. In C++ it is a significant cost, given its assembly-like C background. Nobody would have decided to make all methods called through a pointer by default as part of the C to C++ migration. It would have been too big a change.
