My question is about handling and life cycle of the anonymous object in JVM 1.8.
As far as I have read, the underlying mechanism of lambda expressions in JDK 1.8 isn't purely function-based: the JVM still creates an anonymous object that carries the method we defined in the code, and calls that method on the anonymous object. Also, because a lambda expression doesn't introduce a new variable scope, "this" inside the lambda expression refers to the enclosing object rather than to the anonymous object.
Naturally, the question follows: how does the JVM handle the life cycle of such an anonymous object? Calling the method that contains the lambda expression the "outer object method", I have at least the following questions.
1. If the outer object method is an instance method, does this anonymous object belong to the instance level or the class level? What if the outer method is static?
2. If the outer object method is called multiple times, is this anonymous object re-used or re-created?
3. Is such an object subject to JVM GC? If yes, are the GC rules the same as for other objects?
4. Is there any tool or API to track the life cycle of such an anonymous object, given that it cannot be referred to directly in code?
Any help or comment or documentation is appreciated.
I don't know what you mean by "belong". An object doesn't "belong" to any level.
If the lambda is a closure, i.e. if it captures one or more variables from the surrounding scope, then each evaluation of the lambda expression will likely have to create a new object. Captured variables include this (which is kind of like an implicit final local variable), OuterClass.this (which is implicitly accessed through a hidden field of this), and unqualified instance variables (which are implicitly accessed through this or OuterClass.this). The values of the captured variables are stored as part of the lambda object, and they can differ between runs of the function (or even between evaluations within one run), so different lambda objects must be created for each lambda to remember its own set of captured values.
However, if the lambda is not a closure, then any two lambda objects created from that lambda expression are semantically indistinguishable, so one object can be re-used for all evaluations of that lambda expression. I believe that in this case the virtual machine will statically allocate one object for that lambda, which lives for the duration of the program.
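The distinction can be observed with a small experiment. The sketch below is only illustrative: the class and method names are made up for this example, and whether a non-capturing lambda is actually re-used is an implementation detail that the language specification leaves open, so the program prints the identity comparison instead of assuming a result.

```java
import java.util.function.Supplier;

public class LambdaIdentityDemo {

    // Non-capturing: the body uses nothing from the enclosing scope.
    public static Supplier<String> nonCapturing() {
        return () -> "constant";
    }

    // Capturing: each evaluation must remember its own copy of 'value'.
    public static Supplier<Integer> capturing(int value) {
        return () -> value;
    }

    public static void main(String[] args) {
        Supplier<String> a = nonCapturing();
        Supplier<String> b = nonCapturing();
        // May print true on JVMs that cache the instance; the spec does not promise it.
        System.out.println("non-capturing re-used: " + (a == b));

        Supplier<Integer> c = capturing(1);
        Supplier<Integer> d = capturing(2);
        // These must be distinct objects, because they hold different captured values.
        System.out.println("capturing re-used: " + (c == d));
        System.out.println(c.get() + ", " + d.get());
    }
}
```

Only the second comparison has a guaranteed answer: two lambdas that return different captured values cannot be the same object.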
Yes. If an object is created when the lambda expression is run, then it is dynamically allocated like other objects in Java, and it is subject to GC. However, if one object is created for the whole life of the program (see discussion for (2) above), then it would not be memory-managed, similar to string literals.
As per the definition in Java, a class doesn't occupy memory and just acts like a template. If so, why are we creating objects (for example in the main method) inside the curly braces of the class? Doesn't that mean the class also occupies memory because of the objects present inside it?
Trying to understand the definition of a class
There are three concepts to keep separate here: the class, the instance, and the stack.
class SomeClass {
    static int staticValue = 0;
    /* non-static */ int instanceValue = 0;

    int someMethod() {
        int stackValue = 42;
        SomeClass instance = new SomeClass();
        // ...
        return stackValue;
    }
}
The class acts as a template, yes. In some languages other than Java, the class takes up no memory of its own: it merely describes the memory layout of the class's instances. For a beginner definition of OOP concepts you can think of that as true.
In Java this is not quite true for three reasons:
1. There is an object instance for SomeClass, accessible via SomeClass.class, which does take up memory. This instance allows you to look up information about the class itself, which is sometimes called "reflection".
2. The static field staticValue is shared among all instances of SomeClass, so in a sense the class takes up a small amount of memory to contain this shared value.
3. SomeClass contains methods like someMethod, and that code has to be in memory in order to run. If you're willing to consider code as requiring memory, and that the code is associated with the class, then the class consumes memory. People talking about OOP concepts aren't usually talking about the memory consumed by the code itself, though.
This can be compared to instances of the class SomeClass, which at a minimum contain a separate value of instanceValue for every instance you create. Instances don't have their own code, and do (in Java) contain a reference to their Class instance accessible via getClass().
Finally, the method someMethod and your main example do use references and local variables that consume memory, but in a different place than the instances or classes. This place is called a "stack", in part because as you call methods and those call further methods, the stack grows like a stack of papers on a desk. This means that there may be many copies of stackValue existing at once, one for each time you have called someMethod that hasn't finished yet. Each value of stackValue is discarded whenever its corresponding invocation of someMethod returns. These aren't directly tied to classes or instances, other than that they are code that might be considered associated with a class as in #3 above. Disregarding the memory consumed by the compiled code itself, the instance does not contribute to SomeClass or its instances consuming any more memory in ways that matter to OOP.
(Instances created with new are not a part of a "stack" but rather are part of the "heap", at this level of explanation, and that includes the SomeClass.class instance and any instances of SomeClass. Some languages require careful management of the heap's memory, but Java manages it for you through a process called garbage collection. Primitives like stackValue and the reference named instance are kept on the stack, though.)
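As a rough illustration of the three places described above, the following sketch (class and method names are made up for this example) shows one shared static value, one value per instance, and one local value per active method call.

```java
public class MemoryKindsDemo {
    static int staticValue = 0;   // one copy, stored with the class
    int instanceValue = 0;        // one copy per instance, stored on the heap

    // Each active call has its own 'stackValue' in its own stack frame.
    public static int sumDown(int n) {
        int stackValue = n;
        if (n == 0) {
            return stackValue;
        }
        return stackValue + sumDown(n - 1);
    }

    public static void main(String[] args) {
        MemoryKindsDemo first = new MemoryKindsDemo();
        MemoryKindsDemo second = new MemoryKindsDemo();

        first.instanceValue = 1;          // affects only 'first'
        MemoryKindsDemo.staticValue = 5;  // shared through the class

        System.out.println(second.instanceValue);                       // 0
        System.out.println(MemoryKindsDemo.staticValue);                // 5
        System.out.println(first.getClass() == MemoryKindsDemo.class);  // true
        System.out.println(sumDown(3));                                 // 6
    }
}
```

During sumDown(3), four separate stackValue variables exist at the deepest point of the recursion, one per unfinished call.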
Preface
I have been experimenting with ByteBuddy and ASM, but I am still a beginner in ASM and between beginner and advanced in ByteBuddy. This question is about ByteBuddy and about JVM bytecode limitations in general.
Situation
I had the idea of creating global mocks for testing by instrumenting constructors in such a way that instructions like these are inserted at the beginning of each constructor:
if (GlobalMockRegistry.isMock(getClass()))
    return;
FYI, the GlobalMockRegistry basically wraps a Set<Class<?>>, and if that set contains a certain class, then isMock(Class<?> clazz) returns true. The advantage of that concept is that I can (de)activate global mocking for each class at runtime: if multiple tests run in the same JVM process, one test might need a certain global mock while the next one might not.
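A minimal sketch of what such a registry might look like is shown below. Only the class name and isMock come from the description above; the activate and deactivate method names and the choice of a concurrent set are assumptions made for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public final class GlobalMockRegistry {
    // Assumption: a concurrent set, so tests can toggle mocks while
    // other threads are constructing objects.
    private static final Set<Class<?>> MOCKED = ConcurrentHashMap.newKeySet();

    private GlobalMockRegistry() {
    }

    // Hypothetical name: switch global mocking on for one class.
    public static void activate(Class<?> clazz) {
        MOCKED.add(clazz);
    }

    // Hypothetical name: switch global mocking off again.
    public static void deactivate(Class<?> clazz) {
        MOCKED.remove(clazz);
    }

    public static boolean isMock(Class<?> clazz) {
        return MOCKED.contains(clazz);
    }
}
```

A test would call activate in its setup and deactivate in its teardown, so that the next test in the same JVM sees the unmocked class again.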
What the if(...) return; instructions above want to achieve is that if mocking is active, the constructor should not do anything:
no this() or super() calls, → update: impossible
no field initialisations, → update: possible
no other side effects. → update: might be possible, see my update below
The result would be an object with uninitialised fields that did not create any (possibly expensive) side effects such as resource allocation (database connection, file creation, you name it). Why would I want that? Could I not just create an instance with Objenesis and be happy? Not if I want a global mock, i.e. mock objects I cannot inject because they are created somewhere inside methods or field initialisers I do not have control over. Please do not worry about what method calls on such an object would do if its instance fields are not properly initialised. Just assume I have instrumented the methods to return stub results, too. I know how to do that already; in the context of this question, the only problem is constructors.
Questions / problems
Now if I try to simulate the desired result in Java source code, I meet the following limitations:
I cannot insert any code before this() or super(). I could mitigate that by also instrumenting the super class hierarchy with the same if(...) return;, but would like to know if I could in theory use ASM to insert my code before this() or super() using a method visitor. Or would the byte code of the instrumented class somehow be verified during loading or retransformation and then rejected because the byte code is "illegal"? I would like to know before I start learning ASM because I want to avoid wasting time for an idea which is not feasible.
If the class contains final instance fields, I also cannot enter a return before all of those fields have been initialised in the constructor. That might happen at the very end of a complex constructor which performs lots of side effects before actually initialising the last field. So the question is similar to the previous one: Can I use ASM to insert my if(...) return; before any fields (including final ones) are initialised and produce a valid class which I could not produce using javac and will not be rejected when loaded or retransformed?
BTW, if it is relevant, we are talking about Java 8+, i.e. at the time of writing this that would be Java versions 8 to 14.
If anything about this question is unclear, please do not hesitate to ask follow-up questions, so I can improve it.
Update after discussing Antimony's answer
I think this approach could work: it calls the constructor chain but avoids any side effects, resulting in a newly initialised instance with all fields empty (null, 0, false):
In order to avoid calling this.getClass(), I need to hard-code the mock target's class name directly into all constructors up the parent chain. I.e. if two "global mock" target classes have the same parent class(es), multiple of the following if blocks would be woven into each corresponding parent class, one for each hard-coded child class name.
In order to avoid any side effects from objects being created or methods being called, I need to call a super constructor myself, using null/zero/false values for each argument. That would not matter because the next parent class up the chain would have a similar code block so that the arguments given do not matter anyway.
// Avoid accessing 'this.getClass()'
if (GlobalMockRegistry.isMock(Sub.class)) {
    // Identify and call any parent class constructor, ideally a default constructor.
    // If none exists, call another one using default values like null, 0, false.
    // In the class derived from Object, just call 'Object.<init>'.
    super(null, 0, false);
    return;
}
// Here follows the original byte code, i.e. the normal super/this call and
// everything else the original constructor does.
Note to myself: Antimony's answer explains "uninitialised this" very nicely. Another related answer can be found here.
Next update after evaluating my new idea
I managed to validate my new idea with a proof of concept. As my JVM byte code knowledge is too limited and I am not used to the way of thinking it requires (stack frames, local variable tables, "reverse" logic of first pushing/popping variables, then applying an operation on them, not being able to easily debug), I just implemented it in Javassist instead of ASM, which in comparison was a breeze after failing miserably with ASM after hours of trial & error.
I can take it from here and I want to thank user Antimony for his very instructive answer + comments. I do know that theoretically the same solution could be implemented using ASM, but it would be exceedingly difficult in comparison because its API is too low level for the task at hand. ByteBuddy's API is too high level, Javassist was just right for me in order to get quick results (and easily maintainable Java code) in this case.
Yes and no. Java bytecode is much less restrictive than Java (source) in this regard. You can put any bytecode you want before the constructor call, as long as you don't actually access the uninitialized object. (The only operations allowed on an uninitialized this value are calling a constructor, setting private fields declared in the same class, and comparing it against null).
Bytecode is also more flexible in where and how you make the constructor call. For example, you can call one of two different constructors in an if statement, or you can wrap the super constructor call in a "try block", both things that are impossible at the Java language level.
Apart from not accessing the uninitialized this value, the only restriction* is that the object has to be definitely initialized along any path that returns from the constructor call. This means the only way to avoid initializing the object is to throw an exception. While being much laxer than Java itself, the rules for Java bytecode were still very deliberately constructed so it is impossible to observe uninitialized objects. In general, Java bytecode is still required to be memory safe and type safe, just with a much looser type system than Java itself. Historically, Java applets were designed to run untrusted code in the JVM, so any method of bypassing these restrictions was a security vulnerability.
* The above is talking about traditional bytecode verification, as that is what I am most familiar with. I believe stackmap verification behaves similarly though, barring implementation bugs in some versions of Java.
P.S. Technically, Java can have code execute before the constructor call. If you pass arguments to the constructor, those expressions are evaluated first, and hence the ability to place bytecode before the constructor call is required in order to compile Java code. Likewise, the ability to set private fields declared in the same class is used to set synthetic variables that arise from the compilation of nested classes.
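The point about argument expressions can be checked in plain Java. In this sketch (all names are made up for the example), a log records the order in which things happen when a subclass constructor passes a computed argument to super(...).

```java
import java.util.ArrayList;
import java.util.List;

public class ArgsBeforeSuperDemo {
    public static final List<String> LOG = new ArrayList<>();

    static String trace(String value) {
        LOG.add("argument evaluated");
        return value;
    }

    static class Base {
        Base(String value) {
            LOG.add("Base constructor");
        }
    }

    static class Sub extends Base {
        Sub() {
            super(trace("x"));   // trace(...) runs before Base's constructor is invoked
            LOG.add("Sub body");
        }
    }

    public static List<String> run() {
        LOG.clear();
        new Sub();
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(run()); // [argument evaluated, Base constructor, Sub body]
    }
}
```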
If the class contains final instance fields, I also cannot enter a return before all of those fields have been initialised in the constructor.
This, however, is eminently possible. The only restriction is that you call some constructor or superconstructor on the uninitialized this value. (Since all constructors recursively have this restriction, this will ultimately result in java.lang.Object's constructor being called). However, the JVM doesn't care what happens after that. In particular, it only cares that the fields have some well typed value, even if it is the default value (null for objects, 0 for ints, etc.) So there is no need to execute the field initializers to give them a meaningful value.
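That fields simply hold their default values until an initializer runs can be seen even at the source level. The sketch below is an analogue in plain Java, not the bytecode technique itself, and all names are made up: a superclass constructor calls an overridden method, which reads the subclass's final fields before their initializers have executed.

```java
public class DefaultValueDemo {

    public static class Base {
        public final String observed;

        public Base() {
            // Runs before the field initializers of any subclass.
            observed = describe();
        }

        public String describe() {
            return "base";
        }
    }

    public static class Sub extends Base {
        // Non-constant initializers, so javac cannot inline the values.
        private final String name = computeName();
        private final int count = Integer.parseInt("7");

        private static String computeName() {
            return "initialized";
        }

        @Override
        public String describe() {
            return name + "/" + count;
        }
    }

    public static void main(String[] args) {
        Sub sub = new Sub();
        System.out.println(sub.observed);   // null/0
        System.out.println(sub.describe()); // initialized/7
    }
}
```

A constructor that returned right after the super call would leave the object in the first state permanently, which is the effect the question is after.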
Is there any other way to get the type to be instantiated other than this.getClass() from a super class constructor?
Not as far as I am aware. There's no special opcode for magically getting the Class associated with a given value. Foo.class is just syntactic sugar which is handled by the Java compiler.
If you search 'java double brace' you'll find strong arguments against using it.
Every time someone uses double brace initialisation, a kitten gets killed.
https://stackoverflow.com/a/27521360/555631
The arguments are you're creating way too many anonymous classes and you're potentially creating a memory leak.
Are lambdas any different? They each create an anonymous inner class, they each reference their enclosing closure.
Lambda expressions are different from an anonymous inner class that happens to implement a functional interface.
Anonymous inner classes will create their own class file at compilation, usually something along the lines of Foo$1.class, if it's contained in the Foo class. It is a fully functional class that implements an interface or subclasses a class. To reference local values outside its scope, it will, behind the scenes, create an instance variable in the anonymous inner class that represents a copy of the value. This is why the variable must be effectively final -- otherwise the actual variable may change and the copy may be stale.
Lambda expressions don't create anonymous inner classes at compile time. They use java.lang.invoke.LambdaMetafactory, which produces a CallSite that can be used later to create the object representing the lambda expression. The lambda body, whether it's a block or an expression, gets converted to a hidden private method within the class in which it's contained (a static method, unless the lambda uses this, in which case it is an instance method). Instead of creating a class with a hidden variable, captured values get translated into parameters of that hidden method. Local values still must be effectively final because the value passed to the method is again a copy. The link between the lambda expression and the hidden method is established by an invokedynamic instruction in the JVM.
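The hidden method can be made visible with reflection. The following is a small sketch under the assumption that the code is compiled with a compiler that desugars lambdas the way javac does; the class name is made up, and the exact name of the generated method is a compiler detail, so the program just prints whatever it finds.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class DesugaredLambdaDemo {
    // The body 's.length()' is compiled into a synthetic method of this class.
    public static final Function<String, Integer> LENGTH = s -> s.length();

    // Collects the names of all compiler-generated methods declared here.
    public static List<String> syntheticMethodNames() {
        List<String> names = new ArrayList<>();
        for (Method method : DesugaredLambdaDemo.class.getDeclaredMethods()) {
            if (method.isSynthetic()) {
                names.add(method.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // With javac the list typically contains a name starting with "lambda$".
        System.out.println(syntheticMethodNames());
    }
}
```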
Sources:
How Lambdas And Anonymous Inner Classes Work
PART 4 – HOW LAMBDA EXPRESSION INTERNALLY WORKS
Yes, they are different.
Lambdas don't actually necessarily create anonymous classes -- they're certainly not just translated into the equivalent anonymous class. Their creation is much more convoluted than that, and often ends up with an anonymous class created at runtime, but not necessarily.
Lambdas specifically do not capture anything except the variables specifically mentioned in them, unlike anonymous inner classes, which do capture the enclosing class object if they're defined in an instance method.
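One language-level consequence of this difference is what this means inside each form. In the sketch below (names are made up for the example), the lambda mentions this and therefore captures the enclosing instance, while this inside the anonymous class refers to the anonymous object itself.

```java
import java.util.function.Supplier;

public class ThisCaptureDemo {

    // 'this' in a lambda is the enclosing ThisCaptureDemo instance.
    public Supplier<Object> viaLambda() {
        return () -> this;
    }

    // 'this' in an anonymous class is the anonymous Supplier object.
    public Supplier<Object> viaAnonymousClass() {
        return new Supplier<Object>() {
            @Override
            public Object get() {
                return this;
            }
        };
    }

    public static void main(String[] args) {
        ThisCaptureDemo demo = new ThisCaptureDemo();
        Supplier<Object> anonymous = demo.viaAnonymousClass();

        System.out.println(demo.viaLambda().get() == demo); // true
        System.out.println(anonymous.get() == anonymous);   // true
        System.out.println(anonymous.get() == demo);        // false
    }
}
```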
According to the following link the java stack frame contains local variables, operand stack and the current class constant pool reference.
http://blog.jamesdbloom.com/JVMInternals.html
Also From Oracle "Structure of JVM" Section 2.6.3. "Dynamic Linking - Each frame (§2.6) contains a reference to the run-time constant pool (§2.5.5) for the type of the current method to support dynamic linking of the method code."
I have also read that the object in the heap also has a pointer/reference to the class data.
https://www.artima.com/insidejvm/ed2/jvm6.html
The stack frame will contain the "current class constant pool reference" and also it will have the reference to the object in heap which in turn will also point to the class data. Is this not redundant??
For example.
public class Honda {
    public void run() {
        System.out.println("honda is running");
    }

    public static void main(String[] args) {
        Honda h = new Honda();
        h.run(); // output: honda is running
    }
}
When h.run() is about to be executed, the JVM will create a new stack frame and push h onto it. h will point to the object in the heap, which in turn will have a pointer to the class data of Honda. The stack frame will also have the current class constant pool reference. Is this correct? If not, please shed some light on this.
Is this not redundant??
Maybe it is redundant for instance methods and constructors.
It isn't redundant for static methods or class initialization pseudo-methods.
It is also possible that the (supposedly) redundant reference gets optimized away by the JIT compiler. (Or maybe it isn't optimized away ... because they have concluded that the redundancy leads to faster execution on average.) Or maybe the actual implementation of the JVM [1] is just different.
Bear in mind that the JVM spec is describing an idealized stack frame. The actual implementation may be different ... provided that it behaves the way that the spec says it should.
On @EJP's point on normativeness: the only normative references for Java are the JLS and JVM specifications, and the Javadoc for the class library. You can also consult the source code of the JVM itself. The specifications say what should happen, and the code (in a sense) says what does happen. Anything you find in a published paper or a web article is not normative, and may well be incorrect or out of date.
[1] The actual implementation may vary from one version to the next, or between vendors. Furthermore, I have heard of a JVM implementation where a bytecode rewriter transformed from standard bytecodes to another abstract machine language at class load time. It wasn't a great idea from a performance perspective ... but it was certainly within the spirit of the JVM spec.
The stack frame will contain the "current class constant pool reference" and also it will have the reference to the object in heap which in turn will also point to the class data. Is this not redundant??
You missed the precondition of that statement, or you misquoted it, or it was just plainly wrong where you saw it.
The "reference to the object in heap" is only added for non-static methods, and it refers to the hidden this parameter.
As it says in section "Local Variables Array":
The array of local variables contains all the variables used during the execution of the method, including a reference to this, all method parameters and other locally defined variables. For class methods (i.e. static methods) the method parameters start from zero, however, for instance method the zero slot is reserved for this.
So, for static methods, there is no redundancy.
Could the constant pool reference be eliminated when this is present? Yes, but then there would need to be a different way to locate the constant pool reference, requiring different bytecode instructions, so that would be a different kind of redundancy.
Always having the constant pool reference available in a well-known location in the stack frame, simplifies the bytecode logic.
There are two points here. First, there are static methods which are invoked without a this reference. Second, the actual class of an object instance is not necessarily the declaring class of the method whose code we are actually executing. The purpose of the constant pool reference is to enable resolving of symbolic references and loading of constants referenced by the code. In both cases, we need the constant pool of the class containing the currently executed code, even if the method might be inherited by the actual class of the this reference (in case of a private method invoked by another inherited method, we have a method invoked with a this instance of a class which formally does not even inherit the method).
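The second point, that the class of this need not be the class whose code is running, can be illustrated with an inherited method. This is only a sketch with made-up names; it cannot show the constant pool directly, but it shows that the executing code is declared in Base while this is an instance of Sub.

```java
public class DeclaringClassDemo {

    public static class Base {
        // The string constant below lives in Base's class file, so it is
        // resolved through Base's constant pool whatever 'this' is.
        public String describe() {
            return "code from Base, this is " + getClass().getSimpleName();
        }
    }

    public static class Sub extends Base {
        // Inherits describe() without overriding it.
    }

    // Looks up which class actually declares the method that Sub exposes.
    public static Class<?> declaringClassOfDescribe() {
        try {
            return Sub.class.getMethod("describe").getDeclaringClass();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Sub().describe());       // code from Base, this is Sub
        System.out.println(declaringClassOfDescribe()); // the Base class
    }
}
```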
It might even be the case that the currently executed code is contained in an interface, so we never have instances of it, but still a class file with a constant pool which must be available when executing the code. This does not only apply to Java 8 and newer, which allow static and default methods in interfaces; earlier versions also might need to execute the <clinit> method of an interface to initialize its static fields.
By the way, even if an instance method is invoked with an object reference associated with this in its first local variable, there is no requirement for the bytecode instructions to keep it there. If not needed, it might get overwritten by an arbitrary value, reusing the variable slot for other purposes. This does not preclude that subsequent instructions need the constant pool, which, as said, does not need to belong to the actual class of this anyway.
Of course, that pool reference is a logical construct anyway. Implementations may transform the code to use a shared pool or not to need a pool at all when all references have been resolved already, etc. After inlining, code may not even have a dedicated stack frame anymore.
The invokedynamic instruction is used to help the VM determine the method reference at runtime instead of hardwiring it at compile time.
This is useful with dynamic languages where the exact method and argument types aren't known until runtime. But that isn't the case with Java lambdas. They are translated to a static method with well defined arguments. And this method can be invoked using invokestatic.
So then what is the need of invokedynamic for lambdas, especially when there is a performance hit?
Lambdas are not invoked using invokedynamic; their object representation is created using invokedynamic, and the actual invocation is a regular invokevirtual or invokeinterface.
For example:
// creates an instance of (a subclass of) Consumer
// with invokedynamic to java.lang.invoke.LambdaMetafactory
something(x -> System.out.println(x));

void something(Consumer<String> consumer) {
    // invokeinterface
    consumer.accept("hello");
}
Any lambda has to become an instance of some base class or interface. That instance will sometimes contain a copy of the variables captured from the original method and sometimes a pointer to the parent object.
This can be implemented as an anonymous class.
Why invokedynamic
The short answer is: to generate code at runtime.
The Java maintainers chose to generate the implementation class at runtime.
This is done by calling java.lang.invoke.LambdaMetafactory.metafactory.
Since the arguments for that call (return type, interface, and captured parameters) can change, this requires invokedynamic.
Using invokedynamic to construct the anonymous class lets the JVM generate that class's bytecode at runtime. Subsequent executions of the same statement use a cached version. The other reason to use invokedynamic is to be able to change the implementation strategy in the future without having to change already compiled code.
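To get a feel for what the bootstrap call does, the metafactory can also be called by hand. This is a sketch, not what javac emits (javac emits an invokedynamic instruction, and the JVM makes the call during linkage); the class name and the shout method are made up, with shout standing in for a desugared lambda body.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Function;

public class MetafactoryDemo {

    // Stands in for the method the compiler would generate from a lambda body.
    static String shout(String s) {
        return s + "!";
    }

    @SuppressWarnings("unchecked")
    public static Function<String, String> build() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle impl = lookup.findStatic(MetafactoryDemo.class, "shout",
                    MethodType.methodType(String.class, String.class));
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "apply",                                            // interface method to implement
                    MethodType.methodType(Function.class),              // factory: no captures, returns Function
                    MethodType.methodType(Object.class, Object.class),  // erased signature of apply
                    impl,                                               // the "lambda body"
                    MethodType.methodType(String.class, String.class)); // signature after generic instantiation
            // Invoking the call site's target creates the functional interface instance.
            return (Function<String, String>) site.getTarget().invoke();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(build().apply("hello")); // hello!
    }
}
```

The call to apply at the end is an ordinary interface call; only the creation of the Function instance went through the metafactory.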
The road not taken
The other option would be the compiler creating an innerclass for each lambda instantiation, equivalent to translating the above code into:
something(new Consumer<String>() {
    public void accept(String x) {
        // call to a generated method in the base class
        ImplementingClass.this.lambda$1(x);
        // or repeating the code (awful as it would require generating accessors):
        System.out.println(x);
    }
});
This requires creating classes at compile time and having to load them at runtime. The way the JVM works, those classes would reside in the same directory as the original class. And the first time you execute the statement that uses that lambda, that anonymous class would have to be loaded and initialized.
About performance
The first call to invokedynamic will trigger the anonymous class generation. After that, the invokedynamic opcode is replaced with code that is equivalent in performance to writing the anonymous instantiation manually.
Brian Goetz explained the reasons for the lambda translation strategy in one of his papers, which unfortunately now seems unavailable. Fortunately I kept a copy:
Translation strategy
There are a number of ways we might represent a lambda expression in
bytecode, such as inner classes, method handles, dynamic proxies, and
others. Each of these approaches has pros and cons. In selecting a
strategy, there are two competing goals: maximizing flexibility for
future optimization by not committing to a specific strategy, vs
providing stability in the classfile representation. We can achieve
both of these goals by using the invokedynamic feature from JSR 292 to
separate the binary representation of lambda creation in the bytecode
from the mechanics of evaluating the lambda expression at runtime.
Instead of generating bytecode to create the object that implements
the lambda expression (such as calling a constructor for an inner
class), we describe a recipe for constructing the lambda, and delegate
the actual construction to the language runtime. That recipe is
encoded in the static and dynamic argument lists of an invokedynamic
instruction.
The use of invokedynamic lets us defer the selection of a translation
strategy until run time. The runtime implementation is free to select
a strategy dynamically to evaluate the lambda expression. The runtime
implementation choice is hidden behind a standardized (i.e., part of
the platform specification) API for lambda construction, so that the
static compiler can emit calls to this API, and JRE implementations
can choose their preferred implementation strategy. The invokedynamic
mechanics allow this to be done without the performance costs that
this late binding approach might otherwise impose.
When the compiler encounters a lambda expression, it first lowers
(desugars) the lambda body into a method whose argument list and
return type match that of the lambda expression, possibly with some
additional arguments (for values captured from the lexical scope, if
any.) At the point at which the lambda expression would be captured,
it generates an invokedynamic call site, which, when invoked, returns
an instance of the functional interface to which the lambda is being
converted. This call site is called the lambda factory for a given
lambda. The dynamic arguments to the lambda factory are the values
captured from the lexical scope. The bootstrap method of the lambda
factory is a standardized method in the Java language runtime library,
called the lambda metafactory. The static bootstrap arguments capture
information known about the lambda at compile time (the functional
interface to which it will be converted, a method handle for the
desugared lambda body, information about whether the SAM type is
serializable, etc.)
Method references are treated the same way as lambda expressions,
except that most method references do not need to be desugared into a
new method; we can simply load a constant method handle for the
referenced method and pass that to the metafactory.
So, the idea here seemed to be to encapsulate the translation strategy and not commit to a particular way of doing things by hiding those details. In the future when type erasure and lack of value types have been solved and maybe Java supports actual function types, they might just as well go there and change that strategy for another one without causing any problems in the users' code.
The current Java 8 lambda implementation is a compound decision:
Compile the lambda expression to a static method in the enclosing class, instead of compiling lambdas to separate inner class files (Scala compiles this way, which generates many $$$ class files).
Introduce a BootstrapMethods attribute in the class file, which ties the static method invocation to a CallSite object (which can be cached for later use).
So to answer your question,
the current lambda implementation using invokedynamic is a little bit faster than the separate inner class approach, because there is no need to load these inner class files; instead, the inner class byte[] is created on the fly (to satisfy, for example, the Function interface) and cached for later use.
The JVM team may still choose to generate separate inner class files (referencing the enclosing class's static methods): the approach is flexible.