Does Java 8's Method Reference use Reflection? [duplicate] - java

The invokedynamic instruction is used to help the VM determine the method reference at runtime instead of hardwiring it at compile time.
This is useful with dynamic languages where the exact method and argument types aren't known until runtime. But that isn't the case with Java lambdas. They are translated to a static method with well defined arguments. And this method can be invoked using invokestatic.
So then what is the need of invokedynamic for lambdas, especially when there is a performance hit?

Lambdas are not invoked using invokedynamic; their object representation is created using invokedynamic. The actual invocation is a regular invokevirtual or invokeinterface.
For example:
// creates an instance of (a subclass of) Consumer
// with invokedynamic to java.lang.invoke.LambdaMetafactory
something(x -> System.out.println(x));
void something(Consumer<String> consumer) {
// invokeinterface
consumer.accept("hello");
}
Any lambda has to become an instance of some base class or interface. That instance will sometimes contain a copy of the variables captured from the original method and sometimes a pointer to the parent object.
This can be implemented as an anonymous class.
Why invokedynamic
The short answer is: to generate code at runtime.
The Java maintainers chose to generate the implementation class at runtime.
This is done by calling java.lang.invoke.LambdaMetafactory.metafactory.
Since the arguments for that call (return type, interface, and captured parameters) can change, this requires invokedynamic.
Using invokedynamic to construct the anonymous class at runtime allows the JVM to generate that class's bytecode at runtime. Subsequent executions of the same statement use a cached version. The other reason to use invokedynamic is to be able to change the implementation strategy in the future without having to change already compiled code.
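To make the call to the metafactory more concrete, here is a minimal sketch that performs by hand roughly what the invokedynamic bootstrap does for a non-capturing lambda. The class name ManualLambdaFactory and the method shout are made up for illustration; only LambdaMetafactory.metafactory and the java.lang.invoke types are real API.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Function;

class ManualLambdaFactory {
    // Hypothetical stand-in for the method javac would desugar a lambda body into.
    static String shout(String s) {
        return s.toUpperCase() + "!";
    }

    @SuppressWarnings("unchecked")
    static Function<String, String> build() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle impl = lookup.findStatic(ManualLambdaFactory.class, "shout",
                    MethodType.methodType(String.class, String.class));
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "apply",                                            // name of the interface method
                    MethodType.methodType(Function.class),              // factory type: no captured values
                    MethodType.methodType(Object.class, Object.class),  // erased signature of Function.apply
                    impl,                                               // handle to the lambda body
                    MethodType.methodType(String.class, String.class)); // signature after type instantiation
            // Invoking the call site's target creates (or reuses) the Function instance.
            return (Function<String, String>) site.getTarget().invoke();
        } catch (Throwable t) {
            throw new IllegalStateException("could not link lambda", t);
        }
    }
}
```

Calling ManualLambdaFactory.build().apply("hello") goes through an ordinary invokeinterface; the metafactory is only involved in creating the object.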
The road not taken
The other option would be for the compiler to create an inner class for each lambda instantiation, equivalent to translating the above code into:
something(new Consumer<String>() {
public void accept(String x) {
// call to a generated method in the enclosing class
ImplementingClass.this.lambda$1(x);
// or repeating the code (awful, as it would require generating accessors):
System.out.println(x);
}
});
This requires creating the classes at compile time and loading them at runtime. Given the way the JVM works, those classes would reside in the same directory as the original class. The first time you execute the statement that uses the lambda, the anonymous class would have to be loaded and initialized.
About performance
The first call to invokedynamic triggers generation of the anonymous class. After that, the invokedynamic call site is linked to code whose performance is equivalent to instantiating the anonymous class by hand.

Brian Goetz explained the reasons for the lambda translation strategy in one of his papers, which unfortunately now seems to be unavailable. Fortunately I kept a copy:
Translation strategy
There are a number of ways we might represent a lambda expression in
bytecode, such as inner classes, method handles, dynamic proxies, and
others. Each of these approaches has pros and cons. In selecting a
strategy, there are two competing goals: maximizing flexibility for
future optimization by not committing to a specific strategy, vs
providing stability in the classfile representation. We can achieve
both of these goals by using the invokedynamic feature from JSR 292 to
separate the binary representation of lambda creation in the bytecode
from the mechanics of evaluating the lambda expression at runtime.
Instead of generating bytecode to create the object that implements
the lambda expression (such as calling a constructor for an inner
class), we describe a recipe for constructing the lambda, and delegate
the actual construction to the language runtime. That recipe is
encoded in the static and dynamic argument lists of an invokedynamic
instruction.
The use of invokedynamic lets us defer the selection of a translation
strategy until run time. The runtime implementation is free to select
a strategy dynamically to evaluate the lambda expression. The runtime
implementation choice is hidden behind a standardized (i.e., part of
the platform specification) API for lambda construction, so that the
static compiler can emit calls to this API, and JRE implementations
can choose their preferred implementation strategy. The invokedynamic
mechanics allow this to be done without the performance costs that
this late binding approach might otherwise impose.
When the compiler encounters a lambda expression, it first lowers
(desugars) the lambda body into a method whose argument list and
return type match that of the lambda expression, possibly with some
additional arguments (for values captured from the lexical scope, if
any.) At the point at which the lambda expression would be captured,
it generates an invokedynamic call site, which, when invoked, returns
an instance of the functional interface to which the lambda is being
converted. This call site is called the lambda factory for a given
lambda. The dynamic arguments to the lambda factory are the values
captured from the lexical scope. The bootstrap method of the lambda
factory is a standardized method in the Java language runtime library,
called the lambda metafactory. The static bootstrap arguments capture
information known about the lambda at compile time (the functional
interface to which it will be converted, a method handle for the
desugared lambda body, information about whether the SAM type is
serializable, etc.)
Method references are treated the same way as lambda expressions,
except that most method references do not need to be desugared into a
new method; we can simply load a constant method handle for the
referenced method and pass that to the metafactory.
So, the idea here seemed to be to encapsulate the translation strategy and not commit to a particular way of doing things by hiding those details. In the future when type erasure and lack of value types have been solved and maybe Java supports actual function types, they might just as well go there and change that strategy for another one without causing any problems in the users' code.
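The desugaring step described in the quoted paper can be imitated in plain Java. The following is only a sketch: DesugarSketch, lambdaBody and handWritten are hypothetical names, and the real synthetic method name (something like lambda$original$0 with javac) is a compiler detail, not something the specification promises.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class DesugarSketch {
    // What the programmer writes: the lambda captures 'prefix' from the lexical scope.
    static Supplier<String> original(String prefix) {
        return () -> prefix + "!";
    }

    // Roughly what the compiler lowers the body into: the captured value
    // becomes an extra leading argument of a private method.
    private static String lambdaBody(String prefix) {
        return prefix + "!";
    }

    // Hand-written equivalent of the object the lambda factory returns.
    static Supplier<String> handWritten(String prefix) {
        return new Supplier<String>() {
            @Override
            public String get() {
                return lambdaBody(prefix);
            }
        };
    }

    // Lists the synthetic methods the compiler added to this class, which
    // is where the real desugared lambda body lives (names vary by compiler).
    static List<String> syntheticMethodNames() {
        List<String> names = new ArrayList<>();
        for (Method m : DesugarSketch.class.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }
}
```

Both original("a").get() and handWritten("a").get() produce the same string; the difference is only in who creates the implementing class and when.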

Java 8's current lambda implementation combines two decisions:
Compile the lambda body to a private method in the enclosing class (static when the lambda does not capture this), instead of compiling each lambda to a separate inner class file (Scala compiles this way, which generates many $$$ class files).
Introduce a class-file attribute, BootstrapMethods, whose entries describe how to wrap that method in a call site object (which can be cached for later use).
So to answer your question:
The current lambda implementation using invokedynamic can be a little faster than the separate-inner-class approach, because there are no inner class files to load; instead, the bytes of the implementing class (satisfying, for example, the Function interface) are created on the fly and cached for later use.
The JVM team may still choose to generate separate inner class files (referencing the enclosing class's static methods) in the future: the design is flexible.

Related

Java Records Reflection and Synthetic Methods

Based on the older Java (7) Language Specifications (13.1.7):
Any constructs introduced by a Java compiler that do not have a corresponding construct in the source code must be marked as synthetic, except for default constructors, the class initialization method, and the values and valueOf methods of the Enum class.
On newer ones (Java (17) Language Specifications (13.1.7): ) that wording changes to:
A construct emitted by a Java compiler must be marked as synthetic if it does not correspond to a construct declared explicitly or implicitly in source code, unless the emitted construct is a class initialization method (JVMS §2.9).
I wonder how this would apply to the accessor methods created for the components of Java records (JEP 395).
For example
record ARecord(int a){}
would have a method int a(), yet there is no source code representing that method. According to the wording of the older JLS, such a method is added by the compiler, so I would expect it to be synthetic, but it is not, as can be confirmed by running the following two lines in JShell:
jshell
| Welcome to JShell -- Version 17.0.1
| For an introduction type: /help intro
jshell> record ARecord(int a){}
| created record ARecord
jshell> ARecord.class.getDeclaredMethod("a").isSynthetic();
$2 ==> false
jshell>
The reason I ask is that I would like to use reflection (or any other programmatic means at runtime) to determine which elements of the class have a matching code structure, that is, which ones have source code representing them.
For the following code
record ARecord(int a){
public void someMethod() {}
}
the record would have two methods (a and someMethod); a has no code representing it and someMethod does. I need a way to differentiate them based on that criterion.
I wonder if it is because it is considered implicitly declared, its code being implicitly defined as part of the component.
This is exactly it. Note how the old spec only says that "synthetic" should be marked on constructs that
do not have a corresponding construct in the source code
with the exception of the implicitly declared Enum.values and Enum.valueOf. Back then, those were the only two implicitly declared (in the sense that the new spec uses the phrase) things, apparently. :D
On the other hand, the new spec says
does not correspond to a construct declared explicitly or implicitly in source code
Note that this wording automatically handles the Enum exceptions, but also handles the plethora of implicitly declared things that got added since. This includes record components.
From the Java 17 spec §8.10.3. Record Members,
Furthermore, for each record component, a record class has a method with the same name as the record component and an empty formal parameter list. This method, which is declared explicitly or implicitly, is known as an accessor method.
...
If a record class has a record component for which an accessor method is not declared explicitly, then an accessor method for that record component is declared implicitly [...]
The method a is implicitly declared in your component, therefore it is not synthetic.
Generally speaking (there might be exceptions to this that I don't know of), synthetic constructs are constructs that are not specified by the language spec, but are required for a particular implementation of a compiler to work. The spec is basically saying that such constructs must be marked as "synthetic" in the binary. See some examples here.
Any members of a type that have the synthetic flag on are ignored entirely by javac. Javac acts exactly as if those things don't exist at all.
As a consequence, obviously, the 'getters' you get for a record aren't synthetic. If they were, it would be impossible to call them from .java code - the only way to call them is to write a hacky javac clone that does compile access to synthetics, or to use bytecode manipulation to remove the synthetic flag, or to emit bytecode directly, or to use reflection.
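Since isSynthetic() does not help here, one option worth considering is to ask the record itself which methods are accessors. The sketch below uses Class.getRecordComponents() and RecordComponent.getAccessor() (Java 16+); the class RecordAccessorSketch and its helper names are made up for illustration. Note that this identifies accessor methods, not whether an accessor was declared implicitly or written out by hand, which reflection does not appear to expose.

```java
import java.lang.reflect.Method;
import java.lang.reflect.RecordComponent;
import java.util.Set;
import java.util.TreeSet;

class RecordAccessorSketch {
    record ARecord(int a) {
        public void someMethod() {}
    }

    // Names of the accessor methods, one per record component.
    // Assumes recordType is a record; getRecordComponents() returns null otherwise.
    static Set<String> accessorNames(Class<?> recordType) {
        Set<String> names = new TreeSet<>();
        for (RecordComponent component : recordType.getRecordComponents()) {
            names.add(component.getAccessor().getName());
        }
        return names;
    }

    // Declared, non-synthetic methods that are not component accessors.
    // This still includes equals, hashCode and toString, which are also
    // implicitly declared for records.
    static Set<String> otherMethodNames(Class<?> recordType) {
        Set<String> accessors = accessorNames(recordType);
        Set<String> names = new TreeSet<>();
        for (Method m : recordType.getDeclaredMethods()) {
            boolean isAccessor = m.getParameterCount() == 0 && accessors.contains(m.getName());
            if (!m.isSynthetic() && !isAccessor) {
                names.add(m.getName());
            }
        }
        return names;
    }
}
```

For ARecord, accessorNames yields just a, while someMethod ends up in the second set.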

Java Instrumentation: Prevent class reference from loading class

Hello StackOverflow Community,
I recently discovered Java Instrumentation and what great things you can do with it, so I decided to write a small library for me that simplifies some of these things.
I have the following method (simplified):
public static void editClass(Class<?> clazz) {
...
}
It adds a transformer via Instrumentation that transforms the bytecode of loaded classes with the name of clazz.getName().
So in my premain method, I can say
editClass(Foo.class);
My problem is that by specifying the class via a reference to it (.class), the class gets loaded before the transformer is added. After that, I have to retransform the class, which prevents me from adding or removing methods and so on.
So, is there a way to not load the class when using this class reference? Or another way to implement this? I know that I could just pass the class name as an argument, but I would really like to make this whole library type-safe and make refactoring easier.
Thanks in advance!
If you want to call the editClass method from premain only and we assume that the Java Agent itself does not use the class otherwise, so that the class literal inside the editClass call would be the only trigger, you can do the following:
provide both methods, editClass(Class<?> clazz) and editClass(String qualifiedName)
write the premain method (or agent classes in general) using editClass(Class<?>) and enjoy compile-time safety regarding the existence of the classes referenced via literals
perform a static code transformation of the agent classes, replacing all calls of editClass(Class<?>) with editClass(String)
This shouldn’t be too hard, as you only have to replace all sequences of ldc packagename/Foo.class, invokestatic (Ljava/lang/Class;)V with ldc "packagename.Foo", invokestatic (Ljava/lang/String;)V.
It may become even easier when the method editClass(String qualifiedName) can handle the internal class names (using slashes instead of dots).
Since you said you “recently discovered Java Instrumentation”, this might be a good exercise in class file transformations
Use the transformed Agent classes which have no references to the classes to transform anymore, to perform the load time transformations
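The string-based variant mentioned in the steps above could be backed by a transformer that matches on the class name alone, so the target class is never loaded by the agent. This is a minimal sketch: NameMatchingTransformer and rewrite are hypothetical names, and the rewrite step is a placeholder where the real bytecode editing would go.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

class NameMatchingTransformer implements ClassFileTransformer {
    private final String internalName;

    // Accepts either "pkg.Foo" or "pkg/Foo"; the JVM passes names with slashes.
    NameMatchingTransformer(String qualifiedName) {
        this.internalName = qualifiedName.replace('.', '/');
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (!internalName.equals(className)) {
            return null; // null tells the JVM to keep the class unchanged
        }
        return rewrite(classfileBuffer);
    }

    // Placeholder: a real implementation would edit the class file here,
    // for example with ASM or Javassist.
    byte[] rewrite(byte[] original) {
        return original.clone();
    }
}
```

In premain, an instance would be registered with Instrumentation.addTransformer before anything references the target class.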

Can I insert instructions in constructors before calling this() / super() and before initialising any final fields?

Preface
I have been experimenting with ByteBuddy and ASM, but I am still a beginner in ASM and between beginner and advanced in ByteBuddy. This question is about ByteBuddy and about JVM bytecode limitations in general.
Situation
I had the idea of creating global mocks for testing by instrumenting constructors in such a way that instructions like these are inserted at the beginning of each constructor:
if (GlobalMockRegistry.isMock(getClass()))
return;
FYI, the GlobalMockRegistry basically wraps a Set<Class<?>> and if that set contains a certain class, then isMock(Class<?> clazz) would return true. The advantage of that concept is that I can (de)activate global mocking for each class during runtime because if multiple tests run in the same JVM process, one test might need a certain global mock, the next one might not.
What the if(...) return; instructions above want to achieve is that if mocking is active, the constructor should not do anything:
no this() or super() calls, → update: impossible
no field initialisations, → update: possible
no other side effects. → update: might be possible, see my update below
The result would be an object with uninitialised fields that did not create any (possibly expensive) side effects such as resource allocation (database connection, file creation, you name it). Why would I want that? Could I not just create an instance with Objenesis and be happy? Not if I want a global mock, i.e. mock objects I cannot inject because they are created somewhere inside methods or field initialisers I do not have control over. Please do not worry about what method calls on such an object would do if its instance fields are not properly initialised. Just assume I have instrumented the methods to return stub results, too. I know how to do that already; the only problem in the context of this question is constructors.
Questions / problems
Now if I try to simulate the desired result in Java source code, I meet the following limitations:
I cannot insert any code before this() or super(). I could mitigate that by also instrumenting the super class hierarchy with the same if(...) return;, but would like to know if I could in theory use ASM to insert my code before this() or super() using a method visitor. Or would the byte code of the instrumented class somehow be verified during loading or retransformation and then rejected because the byte code is "illegal"? I would like to know before I start learning ASM because I want to avoid wasting time for an idea which is not feasible.
If the class contains final instance fields, I also cannot enter a return before all of those fields have been initialised in the constructor. That might happen at the very end of a complex constructor which performs lots of side effects before actually initialising the last field. So the question is similar to the previous one: Can I use ASM to insert my if(...) return; before any fields (including final ones) are initialised and produce a valid class which I could not produce using javac and will not be rejected when loaded or retransformed?
BTW, if it is relevant, we are talking about Java 8+, i.e. at the time of writing this that would be Java versions 8 to 14.
If anything about this question is unclear, please do not hesitate to ask follow-up questions, so I can improve it.
Update after discussing Antimony's answer
I think this approach could work and avoid side effects, calling the constructor chain but avoiding any side effects and resulting in a newly initialised instance with all fields empty (null, 0, false):
In order to avoid calling this.getClass(), I need to hard-code the mock target's class name directly into all constructors up the parent chain. I.e. if two "global mock" target classes have the same parent class(es), multiple of the following if blocks would be woven into each corresponding parent class, one for each hard-coded child class name.
In order to avoid any side effects from objects being created or methods being called, I need to call a super constructor myself, using null/zero/false values for each argument. That would not matter because the next parent class up the chain would have a similar code block so that the arguments given do not matter anyway.
// Avoid accessing 'this.getClass()'
if (GlobalMockRegistry.isMock(Sub.class)) {
// Identify and call any parent class constructor, ideally a default constructor.
// If none exists, call another one using default values like null, 0, false.
// In the class derived from Object, just call 'Object.<init>'.
super(null, 0, false);
return;
}
// Here follows the original byte code, i.e. the normal super/this call and
// everything else the original constructor does.
Note to myself: Antimony's answer explains "uninitialised this" very nicely. Another related answer can be found here.
Next update after evaluating my new idea
I managed to validate my new idea with a proof of concept. As my JVM byte code knowledge is too limited and I am not used to the way of thinking it requires (stack frames, local variable tables, "reverse" logic of first pushing/popping variables, then applying an operation on them, not being able to easily debug), I just implemented it in Javassist instead of ASM, which in comparison was a breeze after failing miserably with ASM after hours of trial & error.
I can take it from here and I want to thank user Antimony for his very instructive answer + comments. I do know that theoretically the same solution could be implemented using ASM, but it would be exceedingly difficult in comparison because its API is too low level for the task at hand. ByteBuddy's API is too high level, Javassist was just right for me in order to get quick results (and easily maintainable Java code) in this case.
Yes and no. Java bytecode is much less restrictive than Java (source) in this regard. You can put any bytecode you want before the constructor call, as long as you don't actually access the uninitialized object. (The only operations allowed on an uninitialized this value are calling a constructor, setting private fields declared in the same class, and comparing it against null).
Bytecode is also more flexible in where and how you make the constructor call. For example, you can call one of two different constructors in an if statement, or you can wrap the super constructor call in a "try block", both things that are impossible at the Java language level.
Apart from not accessing the uninitialized this value, the only restriction* is that the object has to be definitely initialized along any path that returns from the constructor call. This means the only way to avoid initializing the object is to throw an exception. While being much laxer than Java itself, the rules for Java bytecode were still very deliberately constructed so it is impossible to observe uninitialized objects. In general, Java bytecode is still required to be memory safe and type safe, just with a much looser type system than Java itself. Historically, Java applets were designed to run untrusted code in the JVM, so any method of bypassing these restrictions was a security vulnerability.
* The above is talking about traditional bytecode verification, as that is what I am most familiar with. I believe stackmap verification behaves similarly though, barring implementation bugs in some versions of Java.
P.S. Technically, Java can have code execute before the constructor call. If you pass arguments to the constructor, those expressions are evaluated first, and hence the ability to place bytecode before the constructor call is required in order to compile Java code. Likewise, the ability to set private fields declared in the same class is used to set synthetic variables that arise from the compilation of nested classes.
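The point about argument expressions can be checked in plain Java. In this small sketch (all names are made up for illustration), the argument passed to super(...) is computed before the superclass constructor body runs:

```java
import java.util.ArrayList;
import java.util.List;

class ConstructorOrderSketch {
    static final List<String> LOG = new ArrayList<>();

    static class Base {
        Base(String arg) {
            LOG.add("Base constructor body");
        }
    }

    static class Sub extends Base {
        Sub() {
            // The argument expression is evaluated first, so its bytecode
            // precedes the invokespecial that calls Base.<init>.
            super(computeArg());
            LOG.add("Sub constructor body");
        }

        private static String computeArg() {
            LOG.add("argument expression");
            return "x";
        }
    }

    // Creates a Sub and returns the order in which the three steps ran.
    static List<String> run() {
        LOG.clear();
        new Sub();
        return new ArrayList<>(LOG);
    }
}
```

run() returns the entries in the order: argument expression, Base constructor body, Sub constructor body.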
If the class contains final instance fields, I also cannot enter a return before all of those fields have been initialised in the constructor.
This, however, is eminently possible. The only restriction is that you call some constructor or superconstructor on the uninitialized this value. (Since all constructors recursively have this restriction, this will ultimately result in java.lang.Object's constructor being called). However, the JVM doesn't care what happens after that. In particular, it only cares that the fields have some well typed value, even if it is the default value (null for objects, 0 for ints, etc.) So there is no need to execute the field initializers to give them a meaningful value.
Is there any other way to get the type to be instantiated other than this.getClass() from a super class constructor?
Not as far as I am aware. There's no special opcode for magically getting the Class associated with a given value. Foo.class is just syntactic sugar which is handled by the Java compiler.

Intercepting field access using Javassist or ASM

I'm familiar with various ways of intercepting method invocations using proxies, but I'm wondering if there's a way to detect field access / dereferences on some proxy using a library like Javassist or ASM? For example:
void detectFieldName(Function<Foo, Supplier<String>> f) {
Foo fooProxy = createFooProxy();
f.apply(fooProxy);
}
detectFieldName((Foo foo) -> foo.bar);
Ideally from this I'd like to know that a field named bar was dereferenced.
Looking at your updated use case: lambdas are desugared to synthetic (compiler-generated) methods, with a function object that forwards interface calls through the generated method (I haven't looked into exactly how this is implemented, but I think Brian Goetz has talked about it). You can just look in that method's bytecode (loaded from the class file; some of the ASM sample code does this) and read off the field access. Instrumentation is not required.
Note that you can't create a proxy to see field access; the field access is performed in the lambda method (or more generally, where the field is loaded) without executing any code in Foo. In fact, you don't even need to call the lambda if all you want is to get the field name, and if all you're using the Foo proxy for is the call, you don't need a proxy.
I'm not aware of any way to intercept field accesses as easily as java.lang.reflect.Proxy makes intercepting method calls.
The getfield and putfield bytecodes use symbolic descriptors that encode the class and field name, so you could use a Java agent to add method calls before or after each load and store passing the field name, object and value being loaded/stored. (This works best if you're only interested in a subset of fields, say all fields of a particular class.) Depending on your needs, you may also have to recognize reflective accesses to your fields by instrumenting use of java.lang.reflect.Field, the handle returned by MethodHandles.Lookup.findGetter/Setter etc. (which may involve interprocedural analysis or reasoning about string operations used to build the field name, etc.). You could also try instrumenting "just before" the library calls into some JVM-specific native functionality, but that ties you to one JVM implementation and your instrumentation may be skipped if the JVM intrinsifies (special-cases codegen for) reflective calls.
If you're willing to write C code, you can use the JVM Tool Interface watched field functions. This seems the easiest way to get information, but it's harder to do interesting Java-level things with (though you can call back into your Java support library from the JVMTI).
Without major hacks, this is not possible. Field access in Java is not bound dynamically. This means, any reading or writing to a field is hardcoded into all using classes. With a method proxy, one makes use of the fact you can override a method to determine behavior. For intercepting field access, one would need to intercept the class that is using a field. Some libraries emulate this behavior by replacing field access by synthetic accessor methods. This requires however some build time redefinition of all concerned classes throughout the entire project.
As for your example, you could in theory use a tool like ASM to extract the required information from the lambda expression. However, note that the lambda expression's code will be extracted into a method of the class of the method that uses the lambda expression. You might have trouble finding out which method it actually is that contains your lambda but the byte code for invoking the expression will merely look something like this:
InvokeDynamic #0:accept:(LFoo;)Ljava/util/function/Function;
As you can see, the byte code will only contain a possibly ambiguous signature. Otherwise, you could of course copy the lambda expression's logic into a new class where you changed the logic of a field access. Since lambdas are by definition interfaces, the creation of such a new class would actually be comparably easy. But the problem with the method detection remains.
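For the problem of finding which method holds the lambda body, one commonly used trick is to make the functional interface Serializable and read the lambda's SerializedLambda form, which names the implementing class and method. This is only a sketch under that assumption: LambdaLocator, SerializableFunction and Foo are hypothetical names, and the approach relies on the private writeReplace method that the runtime generates for serializable lambdas.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Function;

class LambdaLocator {
    // Hypothetical serializable variant of Function.
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    static class Foo {
        String bar = "value";
    }

    // Returns "<impl class>.<impl method>" for a serializable lambda.
    static String implMethod(SerializableFunction<?, ?> fn) {
        try {
            Method writeReplace = fn.getClass().getDeclaredMethod("writeReplace");
            writeReplace.setAccessible(true);
            SerializedLambda lambda = (SerializedLambda) writeReplace.invoke(fn);
            return lambda.getImplClass() + "." + lambda.getImplMethodName();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("not a serializable lambda", e);
        }
    }

    static String locate() {
        SerializableFunction<Foo, String> f = foo -> foo.bar;
        return implMethod(f);
    }
}
```

With the method name in hand, a bytecode library such as ASM could then scan that one method for getfield instructions to read off the field name.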

Force compilation event without dependent static function definition in Java

Since a static function call is translated into a static invocation bytecode regardless of how the definition exists... is there some way to force a caller of a static function to compile successfully even when the target function and class don't exist yet?
I want to be able to compile calls to functions that don't exist yet. I need to tell the compiler to trust me that at runtime, I'll have them properly defined and in the classpath so go ahead and compile it for now.
Is there a way to do this?
Reflectively yes, but not via a regular call.
The call requires an entry in the constant pool that includes the method name and parameter types, so the compiler needs to be able to decide on a signature for the method.
invokestatic <method-spec>
<method-spec> is a method specification. It is a single token made up of three parts: a classname, a methodname and a descriptor. e.g.
java/lang/System/exit(I)V
is the method called "exit" in the class called "java.lang.System", and it has the descriptor "(I)V" (i.e. it takes an integer argument and returns no result).
Consider
AClass.aStaticMethod(42)
Without knowing anything about AClass, it could be a call to any of
AClass.aStaticMethod(int)
AClass.aStaticMethod(int...)
AClass.aStaticMethod(long)
AClass.aStaticMethod(long...)
ditto for float and double
AClass.aStaticMethod(Integer)
AClass.aStaticMethod(Number)
AClass.aStaticMethod(Comparable<? extends Integer>)
AClass.aStaticMethod(Object)
AClass.aStaticMethod(Serializable)
and probably a few others that I've missed.
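A small sketch of that ambiguity: the same call text, aStaticMethod(42), compiles against three different declarations and binds to a different descriptor each time. The class names are hypothetical, and each method simply returns the parameter part of its own descriptor.

```java
class OverloadDemo {
    static class WithLong {
        static String aStaticMethod(long x) {
            return "(J)"; // int argument is widened to long
        }
    }

    static class WithObject {
        static String aStaticMethod(Object x) {
            return "(Ljava/lang/Object;)"; // int argument is boxed to Integer
        }
    }

    static class WithVarargs {
        static String aStaticMethod(int... x) {
            return "([I)"; // int argument is wrapped in an int[]
        }
    }

    // The source of each call is identical apart from the class name.
    static String[] resolve() {
        return new String[] {
            WithLong.aStaticMethod(42),
            WithObject.aStaticMethod(42),
            WithVarargs.aStaticMethod(42)
        };
    }
}
```

Without seeing the declaration, the compiler has no way to know which of these descriptors to write into the class file.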
... is there some way to force a caller of a static function to compile successfully even when the target function and class don't exist yet?
No. When compiling a method call, the compiler needs to check the name, argument types, result type, exceptions and so on of the called method. Since you are asking about a static method, this information can only be defined in one place ... the class that declares the static method. There is no work-around for this if you want static type-safety.
I need to tell the compiler to trust me that at runtime ...
It is not that simple:
You haven't told the compiler what the method signature should be. The compiler needs to be told, because it is not possible to accurately infer the signature from the call.
The Java platform is designed to be robust, and "just trust me" could lead to catastrophic runtime failures.
If you are willing to sacrifice compile-time type safety and eschew the convenience / simplicity / readability of statically typed code, then reflection is an option. But I can't think of any other options that would work.
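As a rough illustration of the reflection option, the sketch below looks up a static method by class name, method name and parameter types at runtime, so the caller compiles without the target class being present. LateBoundCall and callStatic are hypothetical names; the caller has to state the signature explicitly, which is exactly the information the compiler was missing.

```java
import java.lang.reflect.Method;

class LateBoundCall {
    // Resolves and invokes a public static method at runtime.
    static Object callStatic(String className, String methodName,
                             Class<?>[] parameterTypes, Object... args) {
        try {
            Class<?> target = Class.forName(className);
            Method method = target.getMethod(methodName, parameterTypes);
            return method.invoke(null, args); // null receiver: static method
        } catch (ReflectiveOperationException e) {
            // Missing class, missing method, or an exception thrown by the target.
            throw new IllegalStateException("call to " + className + "." + methodName + " failed", e);
        }
    }
}
```

For example, callStatic("java.lang.Integer", "parseInt", new Class<?>[] {String.class}, "42") returns the Integer 42; a typo in any of the strings only shows up at runtime.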
No, but you could declare interfaces that have the methods and code against them, then use the Abstract Factory pattern to provide implementations at runtime.
Dependency injection uses this approach.
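A minimal sketch of the interface-plus-factory idea, with made-up names (RuntimeProvidedService, Greeter): the caller compiles against the interface only, and whatever implementation exists at runtime registers itself with the factory.

```java
import java.util.function.Supplier;

class RuntimeProvidedService {
    // The contract the caller compiles against.
    interface Greeter {
        String greet(String name);
    }

    private static Supplier<Greeter> provider;

    // Called at runtime by whatever code supplies the implementation.
    static void register(Supplier<Greeter> implementation) {
        provider = implementation;
    }

    static Greeter create() {
        if (provider == null) {
            throw new IllegalStateException("no Greeter implementation registered");
        }
        return provider.get();
    }

    // Caller code: needs only the interface at compile time.
    static String run() {
        return create().greet("world");
    }
}
```

The compiler still checks every call against Greeter, so type safety is kept; only the choice of implementation is deferred.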
