This question already has answers here:
How does bytecode get verified in the JVM?
(2 answers)
Closed 8 years ago.
So I am a little confused regarding the verification of bytecode that happens inside a JVM. According to the book by Deitel and Deitel, a Java program goes through five phases (edit, compile, load, verify and execute) (chapter 1). The bytecode verifier verifies the bytecode during the 'verify' stage. Nowhere does the book mention that the bytecode verifier is a part of the classloader.
However, according to the Oracle docs, the classloader performs the tasks of loading, linking and initialization, and during linking it has to verify the bytecode.
Now, is the bytecode verification that Deitel and Deitel talk about the same process as the bytecode verification that this Oracle document talks about?
Or does bytecode verification happen twice, once during the linking process and the other by the bytecode verifier?
Picture describing the phases of a Java program as mentioned in the book by Deitel and Deitel. (I borrowed this pic from one of the answers below, by nobalG :) )
You can understand bytecode verification using this diagram, which is explained in detail in the Oracle docs.
You will find that bytecode verification happens only once, not twice:
The illustration shows the flow of data and control from Java language
source code through the Java compiler, to the class loader and
bytecode verifier and hence on to the Java virtual machine, which
contains the interpreter and runtime system. The important issue is
that the Java class loader and the bytecode verifier make no
assumptions about the primary source of the bytecode stream--the code
may have come from the local system, or it may have travelled halfway
around the planet. The bytecode verifier acts as a sort of gatekeeper:
it ensures that code passed to the Java interpreter is in a fit state
to be executed and can run without fear of breaking the Java
interpreter. Imported code is not allowed to execute by any means
until after it has passed the verifier's tests. Once the verifier is
done, a number of important properties are known:
There are no operand stack overflows or underflows
The types of the parameters of all bytecode instructions are known to always be correct
Object field accesses are known to be legal--private, public, or protected
While all this checking appears excruciatingly detailed, by the time
the bytecode verifier has done its work, the Java interpreter can
proceed, knowing that the code will run securely. Knowing these
properties makes the Java interpreter much faster, because it doesn't
have to check anything. There are no operand type checks and no stack
overflow checks. The interpreter can thus function at full speed
without compromising reliability.
EDIT:-
From Oracle Docs Section 5.3.2:
When the loadClass method of the class loader L is invoked with the
name N of a class or interface C to be loaded, L must perform one of
the following two operations in order to load C:
The class loader L can create an array of bytes representing C as the bytes of a ClassFile structure (§4.1); it then must invoke the
method defineClass of class ClassLoader. Invoking defineClass
causes the Java Virtual Machine to derive a class or interface
denoted by N using L from the array of bytes using the algorithm
found in §5.3.5.
The class loader L can delegate the loading of C to some other class loader L'. This is accomplished by passing the argument N
directly or indirectly to an invocation of a method on L'
(typically the loadClass method). The result of the invocation is
C.
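To make the spec text above more concrete, here is a minimal sketch of a class loader written against that description. It is not the JDK's actual implementation; it simply shows option 1 (building the byte array and handing it to defineClass, after which the JVM verifies the class when it links it), while option 2 (delegation) is what ClassLoader.loadClass already does by default:

import java.io.IOException;
import java.io.InputStream;

// Hypothetical loader illustrating §5.3.2; not part of the JDK.
public class SpecStyleClassLoader extends ClassLoader {

    public SpecStyleClassLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // Option 1: create the array of bytes representing C and invoke defineClass.
        String resource = name.replace('.', '/') + ".class";
        try (InputStream in = getResourceAsStream(resource)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            byte[] bytes = in.readAllBytes();
            // The JVM derives the class from these bytes; its bytecode is verified
            // later, when the class is linked.
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
        // Option 2 (delegating to another loader) is what loadClass does by default:
        // it asks the parent loader first and only calls findClass if that fails.
    }
}

Both roads end at defineClass somewhere down the chain, which is why verification naturally sits inside the loading/linking machinery that the Oracle docs describe.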
As Holger correctly commented, let me try to explain it further with the help of an example:
static int factorial(int n)
{
    int res;
    for (res = 1; n > 0; n--) res = res * n;
    return res;
}
The corresponding byte code would be
method static int factorial(int), 2 registers, 2 stack slots
0: iconst_1 // push the integer constant 1
1: istore_1 // store it in register 1 (the res variable)
2: iload_0 // push register 0 (the n parameter)
3: ifle 14 // if negative or null, go to PC 14
6: iload_1 // push register 1 (res)
7: iload_0 // push register 0 (n)
8: imul // multiply the two integers at top of stack
9: istore_1 // pop result and store it in register 1
10: iinc 0, -1 // decrement register 0 (n) by 1
11: goto 2 // go to PC 2
14: iload_1 // load register 1 (res)
15: ireturn // return its value to caller
Note that most of the instructions in JVM are typed.
Now you should note that proper operation of the JVM is not guaranteed unless the code meets at least the following conditions:
Type correctness: the arguments of an instruction are always of the
types expected by the instruction.
No stack overflow or underflow: an instruction never pops an argument
off an empty stack, nor pushes a result on a full stack (whose size is
equal to the maximal stack size declared for the method).
Code containment: the program counter must always point within the
code for the method, to the beginning of a valid instruction encoding
(no falling off the end of the method code; no branches into the
middle of an instruction encoding).
Register initialization: a load from a register must always follow at
least one store in this register; in other terms, registers that do
not correspond to method parameters are not initialized on method
entrance, and it is an error to load from an uninitialized register.
Object initialization: when an instance of a class C is created, one
of the initialization methods for class C (corresponding to the
constructors for this class) must be invoked before the class
instance can be used.
The purpose of bytecode verification is to check these conditions once and for all, by static analysis of the bytecode at load time. Bytecode that passes verification can then be executed faster.
Also note that the purpose of bytecode verification is to shift the checks listed above from run time to load time.
The above explanation has been taken from Java bytecode verification: algorithms and formalizations
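As a small, hedged illustration of that last point (nothing here is from the paper itself), the class below contains the same factorial method; its bytecode is checked once when the class is loaded and linked, and the repeated calls afterwards run without any further verification work:

public class VerifyOnceDemo {

    static int factorial(int n) {
        int res;
        for (res = 1; n > 0; n--) res = res * n;
        return res;
    }

    public static void main(String[] args) {
        // Verification of factorial's bytecode happened at load/link time,
        // before main ever ran; these calls just execute the already-verified code.
        for (int i = 0; i < 5; i++) {
            System.out.println(i + "! = " + factorial(i));
        }
    }
}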
No.
From the JVM Spec 4.10:
Even though a compiler for the Java programming language must only produce class files that satisfy all the static and structural constraints in the previous sections, the Java Virtual Machine has no guarantee that any file it is asked to load was generated by that compiler or is properly formed.
And then it proceeds to specify the verification process.
And JVM Spec 5.4.1:
Verification (§4.10) ensures that the binary representation of a class or interface is structurally correct (§4.9). Verification may cause additional classes and interfaces to be loaded (§5.3) but need not cause them to be verified or prepared.
The section specifying linking references §4.10 - not as a separate process, but as part of loading the classes.
The JVM spec and the JLS are great documents to consult when you have a question like this.
There is no such two-time verification.
No. As far as verification is concerned, look closely at how a program written in Java goes through the various phases in the following image. You will see that there is no two-time verification; the code is verified just once.
EDIT – The programmer writes the program (typically in a text editor such as Notepad) and saves it as a ‘.java’ file, which is then used for compilation by the compiler.
COMPILE – The compiler takes the ‘.java’ file, compiles it and looks for any errors in the program. If it finds any errors, it reports them to the programmer. If there are no errors, the program is converted into bytecode and saved as a ‘.class’ file.
LOAD – The major purpose of the component called the ‘Class Loader’ is to load the bytecode into the JVM. It doesn’t execute the code yet, but just loads it into the JVM’s memory.
VERIFY – After loading the code, the JVM’s subcomponent called the ‘Byte Code Verifier’ checks the bytecode and verifies it. It also checks whether the bytecode contains any code that might lead to a malicious outcome. This component of the JVM ensures security.
EXECUTE – The next component is the Execution Engine. The execution engine interprets the bytecode and compiles frequently executed parts with the Just In Time (JIT) compiler. The JIT compiler makes execution considerably faster but uses extra memory to cache the compiled code.
The spec lists four phases of bytecode verification. These phases are functionally distinct and should not be mistaken for repetition of the same work. Just as a multi-pass compiler uses each pass to set up for the next, the phases are not repetition; they are orchestrated toward a single overall purpose, and each phase accomplishes certain tasks.
Unless the bytecode is changed, there is no reason to verify it twice.
The verification is described here.
http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.10
Checking of the code happens twice: once during compilation (compilation fails if the code has flaws or threats), and again after the class is loaded into memory during execution (the actual bytecode verification happens here). Yes, this happens as part of the process of loading classes (by class loaders), but the class loaders themselves might not act as verifiers. It's the JVM (or rather the verifier present in the JVM) that does the verification.
Related
I already worked with -XX:+PrintCompilation, and I know the basic techniques of the JIT-compiler and why JIT-compilation is used.
Yet I still have not found out how the JVM decides to JIT-compile a method, i.e. "when the right time has come to JIT-compile a method".
Am I right in the assumption that every method starts being interpreted, and as long as it is not categorized as a "hot method" it will not be compiled? I have something in the back of my head that I read that a method is considered "hot" when it has been executed at least 10,000 times (after interpreting the method 10,000 times, it will be compiled), but I have to admit that I am not sure about this or where I've read it.
So to sum up my question:
(1) Is every method interpreted as long as it has not been categorized as a "hot" method (and therefore has not been compiled), or are there reasons for methods to get compiled even if they are not "hot"?
(2) How does the JVM categorize methods into "non-hot" and "hot" methods? Number of executions? Anything else?
(3) If there are certain thresholds (like the number of executions) for "hot" methods, are there Java flags (-XX:...) to set these thresholds?
HotSpot's compilation policy is rather complex, especially for Tiered Compilation, which is on by default in Java 8. It is neither a simple number of executions nor a matter of the CompileThreshold parameter.
The best explanation (apparently, the only reasonable explanation) can be found in HotSpot sources, see advancedThresholdPolicy.hpp.
I'll summarize the main points of this advanced compilation policy:
Execution starts at tier 0 (interpreter).
The main triggers for compilation are
method invocation counter i;
backedge counter b. Backward branches typically denote a loop in the code.
Every time the counters reach a certain frequency value (TierXInvokeNotifyFreqLog, TierXBackedgeNotifyFreqLog), a compilation policy is called to decide what to do next with the currently running method. Depending on the values of i, b and the current load of the C1 and C2 compiler threads, it can be decided to
continue execution in interpreter;
start profiling in interpreter;
compile method with C1 at tier 3 with full profile data required for further recompilation;
compile method with C1 at tier 2 with no profile but with possibility to recompile (unlikely);
finally compile method with C1 at tier 1 with no profile or counters (also unlikely).
Key parameters here are TierXInvocationThreshold and TierXBackEdgeThreshold. Thresholds can be dynamically adjusted for a given method depending on the length of compilation queue.
The compilation queue is not FIFO, but rather a priority queue.
C1-compiled code with profile data (tier 3) behaves similarly, except that the thresholds for switching to the next level (C2, tier 4) are much bigger. E.g. an interpreted method can be compiled at tier 3 after about 200 invocations, while a C1-compiled method is subject to recompilation at tier 4 after 5000+ invocations.
A special policy is used for method inlining. Tiny methods can be inlined into the caller even if they are not "hot". A bit larger methods can be inlined only if they are invoked frequently (InlineFrequencyRatio, InlineFrequencyCount).
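A rough way to watch this policy in action (the method name and iteration counts below are arbitrary) is to run a small warm-up loop with -XX:+PrintCompilation, which the question already mentions, and watch the tier column in the log as the counters grow:

public class WarmupDemo {

    static long work(long x) {
        long sum = 0;
        for (int i = 0; i < 100; i++) {   // backedge counter also contributes
            sum += (x + i) * 31;
        }
        return sum;
    }

    public static void main(String[] args) {
        long acc = 0;
        for (int i = 0; i < 20_000; i++) {   // enough invocations to climb the tiers
            acc += work(i);
        }
        System.out.println(acc);
    }
}

Run it as java -XX:+PrintCompilation WarmupDemo; lines mentioning work should appear first at a lower tier and later at tier 4 once the thresholds above are crossed.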
The main parameter to control this is -XX:CompileThreshold=10000
HotSpot for Java 8 now uses tiered compilation by default, with a number of compilation stages from level 1 to 4. I believe level 1 is no optimisation. Level 3 is C1 (based on the client compiler) and level 4 is C2 (based on the server compiler).
This means that a little optimisation can happen earlier than you might expect, and it can keep optimising long after it has reached the 10K threshold. The highest I have seen is escape analysis eliminating a StringBuilder after one million calls.
Note: a loop iterating many times can trigger the compiler. e.g. a loop of 10K times can be enough.
1) Until a method is considered hot enough, it is interpreted. However some JVMs (e.g. Azul Zing) can compile methods on start up and you can force the Hotspot JVM to compile a method via an internal API. Java 9 may also have an AOT (Ahead Of Time) compiler but it is still being researched AFAIK
2) Number of calls, or number of iterations.
3) Yes -XX:CompileThreshold= being the main one.
I've recently been looking at the Java Virtual Machine Specification (JVMS) to try to better understand what makes my programs work, but I've found a section that I'm not quite getting...
Section 4.7.4 describes the StackMapTable attribute, and in that section the document goes into detail about stack map frames. The issue is that it's a little wordy, and I learn best by example, not by reading.
I understand that the first stack map frame is derived from the method descriptor, but I don't understand how (which is supposedly explained here.) Also, I don't entirely understand what the stack map frames do. I would assume they're similar to blocks in Java, but it appears as though you can't have stack map frames inside each other.
Anyway, I have two specific questions:
What do the stack map frames do?
How is the first stack map frame created?
and one general question:
Can someone provide an explanation less wordy and easier to understand than the one given in the JVMS?
Java requires all classes that are loaded to be verified, in order to maintain the security of the sandbox and ensure that the code is safe to optimize. Note that this is done on the bytecode level, so the verification does not verify invariants of the Java language, it merely verifies that the bytecode makes sense according to the rules for bytecode.
Among other things, bytecode verification makes sure that instructions are well formed, that all the jumps are to valid instructions within the method, and that all instructions operate on values of the correct type. The last one is where the stack map comes in.
The thing is that bytecode by itself contains no explicit type information. Types are determined implicitly through dataflow analysis. For example, an iconst instruction creates an integer value. If you store it in slot 1, that slot now has an int. If control flow merges from code which stores a float there instead, the slot is now considered to have invalid type, meaning that you can't do anything more with that value until overwriting it.
Historically, the bytecode verifier inferred all the types using these dataflow rules. Unfortunately, it is impossible to infer all the types in a single linear pass through the bytecode because a backwards jump might invalidate already inferred types. The classic verifier solved this by iterating through the code until everything stopped changing, potentially requiring multiple passes.
However, verification makes class loading slow in Java. Oracle decided to solve this issue by adding a new, faster verifier that can verify bytecode in a single pass. To do this, they required all new classes starting in Java 7 (with Java 6 in a transitional state) to carry metadata about their types, so that the bytecode can be verified in a single pass. Since the bytecode format itself can't be changed, this type information is stored separately in an attribute called StackMapTable.
Simply storing the type for every single value at every single point in the code would obviously take up a lot of space and be very wasteful. In order to make the metadata smaller and more efficient, they decided to have it only list the types at positions which are targets of jumps. If you think about it, this is the only time you need the extra information to do a single pass verification. In between jump targets, all control flow is linear, so you can infer the types at in between positions using the old inference rules.
Each position where types are explicitly listed is known as a stack map frame. The StackMapTable attribute contains a list of frames in order, though they are usually expressed as a difference from the previous frame in order to reduce data size. If there are no frames in the method, which occurs when control flow never joins (i.e. the CFG is a tree), then the StackMapTable attribute can be omitted entirely.
So this is the basic idea of how StackMapTable works and why it was added. The last question is how the implicit initial frame is created. The answer of course is that at the beginning of the method, the operand stack is empty and the local variable slots have the types given by the types of the method parameters, which are determined from the method descriptor.
If you're used to Java, there are a few minor differences to how method parameter types work at the bytecode level. First off, virtual methods have an implicit this as first parameter. Second, boolean, byte, char, and short do not exist at the bytecode level. Instead, they are all implemented as ints behind the scenes.
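To make this slightly less abstract, consider the small method below. This is only a sketch of the idea; the exact frames depend on the compiler, but the branch target and the point where the two paths merge are exactly the places where the StackMapTable records frames, so the verifier can check the method in one linear pass:

public class StackMapDemo {

    static int max(int a, int b) {
        int result;
        if (a > b) {
            result = a;   // one path assigns result here...
        } else {
            result = b;   // ...the other path is a jump target, so a frame is recorded around here
        }
        return result;    // the merge point gets a frame too (locals: int, int, int; stack: empty)
    }

    public static void main(String[] args) {
        System.out.println(max(3, 7));   // prints 7
    }
}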
I am a little bit curious about what happens if I manually change something in the bytecode before execution. For instance, suppose I assign an int variable to a byte variable without casting, or remove a semicolon from somewhere in the program, or do anything else that would normally lead to a compile-time error. As I know, all compile-time errors are checked by the compiler before the .class file is made. So what happens when I change the bytecode manually after successfully compiling a program? Is there any mechanism to handle this? If not, how does the program behave when executed?
EDIT :-
Hot Licks, Darksonn and manouti have already given correct, satisfying answers. Now I will just summarize for those readers who are looking for an answer to this type of question:
Every Java virtual machine has a class-file verifier, which ensures that loaded class files have a proper internal structure. If the class-file verifier discovers a problem with a class file, it throws an exception. Because a class file is just a sequence of binary data, a virtual machine can't know whether a particular class file was generated by a well-meaning Java compiler or by shady crackers bent on compromising the integrity of the virtual machine. As a consequence, all JVM implementations have a class-file verifier that can be invoked on untrusted classes, to make sure the classes are safe to use.
Refer to this for more details.
You certainly can use a hex editor (eg, the free "HDD Hex Editor Neo") or some other tool to modify the bytes of a Java .class file. But obviously, you must do so in a way that maintains the file's "integrity" (tables all in correct format, etc). Furthermore (and much trickier), any modification you make must pass muster by the JVM's "verifier", which essentially rechecks everything that javac verified while compiling the program.
The verification process occurs during class loading and is quite complex. Basically, a data flow analysis is done on each procedure to assure that only the correct data types can "reach" a point where the data type is assumed. Eg, you can't change a load operation to load a reference to a HashMap onto the "stack" when the eventual user of the loaded reference will be assuming it's a String. (But enumerating all the checks the verifier does would be a major task in itself. I can't remember half of them, even though I wrote the verifier for the IBM iSeries JVM.)
(If you're asking if one can "jailbreak" a Java .class file to introduce code that does unauthorized things, the answer is no.)
You will most likely get a java.lang.VerifyError:
Thrown when the "verifier" detects that a class file, though well formed, contains some sort of internal inconsistency or security problem.
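To see this for yourself, here is a hedged sketch (the file name Target.class, the class name, and the byte that gets flipped are all arbitrary) that corrupts one byte of a compiled class and asks a throwaway class loader to define it. Depending on which byte is damaged, the JVM rejects it with a ClassFormatError or a VerifyError; if the byte happens to be harmless, the class may even load normally:

import java.nio.file.Files;
import java.nio.file.Paths;

public class CorruptClassDemo {

    // Subclass only to expose the protected defineClass method.
    static class ThrowawayLoader extends ClassLoader {
        Class<?> defineFrom(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get("Target.class"));
        bytes[bytes.length / 2] ^= 0x01;   // flip one bit somewhere in the middle

        try {
            Class<?> c = new ThrowawayLoader().defineFrom("Target", bytes);
            c.getDeclaredConstructor().newInstance();   // force linking/initialization so verification runs
            System.out.println("The damaged byte went unnoticed");
        } catch (LinkageError e) {   // ClassFormatError and VerifyError are both LinkageErrors
            System.out.println("Rejected by the JVM: " + e);
        }
    }
}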
You can certainly do this, and there are even tools to make it easier, like http://set.ee/jbe/. The Java runtime will run your modified bytecode just as it would run the bytecode emitted by the compiler. What you're describing is a Java-specific case of a binary patch.
The semicolon example wouldn't be an issue, since semicolons are only for the convenience of the compiler and don't appear in the bytecode.
Either the bytecode executes normally and performs the instructions given or the jvm rejects them.
I played around with programming directly in Java bytecode some time ago using Jasmin, and I noticed some things.
If the bytecode you edited it into makes sense, it will of course run as expected. However, there are some bytecode patterns that are rejected with a VerifyError.
For the specific example of out-of-bounds access: you can compile code with out-of-bounds accesses just fine. They will get you an ArrayIndexOutOfBoundsException at runtime.
int[] arr = new int[20];
for (int i = 0; i < 100; i++) {
    arr[i] = i;
}
However you can construct bytecode that is more fundamentally flawed than that. To give an example I'll explain some things first.
Java bytecode works with a stack, and instructions work with the top elements on the stack.
The stack naturally has different sizes at different places in the program, but sometimes you might use a goto in the bytecode that causes the stack to look different depending on how you got there.
The stack might contain object, int; then you store the object in an object array and the int in an int array. Then, from somewhere else in the bytecode, you use a goto, but now your stack contains int, object, which would result in an int being passed to an object array and vice versa.
This is just one example of things that could happen which make your bytecode fundamentally flawed. The JVM detects these kinds of flaws when the class is loaded at runtime, and then emits a VerifyError if something doesn't work.
Could someone list the major tasks that the bytecode verifier has to perform to guarantee correctness of the program? Is there a standard, minimal set of responsibilities defined in the JVM specification? I was also wondering whether verification spans other phases such as loading and initializing.
This is specified in the JVM Specification: Chapter 4.10, Verification of class Files.
The bulk of the page describes the various aspects of type safety. To check that the program is type-safe the verifier needs to figure out what types of operands reside in the operand stack at each program point, and make sure that they match the type expected by the respective instruction.
Other things it verifies include, but are not limited to, the following:
Branches must be within the bounds of the code array for the method.
The targets of all control-flow instructions are each the start of an instruction. In the case of a wide instruction, the wide opcode is considered the start of the instruction, and the opcode giving the operation modified by that wide instruction is not considered to start an instruction. Branches into the middle of an instruction are disallowed.
No instruction can access or modify a local variable at an index greater than or equal to the number of local variables that its method indicates it allocates.
All references to the constant pool must be to an entry of the appropriate type. (For example, the instruction getfield must reference a field.)
The code does not end in the middle of an instruction.
Execution cannot fall off the end of the code.
For each exception handler, the starting and ending point of code protected by the handler must be at the beginning of an instruction or, in the case of the ending point, immediately past the end of the code. The starting point must be before the ending point. The exception handler code must start at a valid instruction, and it must not start at an opcode being modified by the wide instruction.
As a final step the verifier also performs a data-flow analysis, which makes sure that no instruction reference any uninitialized local variables.
Alternatively, you might like to take a look at the Java Language Environment white paper by James Gosling.
The bytecode verifier traverses the bytecodes, constructs the type
state information, and verifies the types of the parameters to all the
bytecode instructions.
The illustration shows the flow of data and control from Java language
source code through the Java compiler, to the class loader and
bytecode verifier and hence on to the Java virtual machine, which
contains the interpreter and runtime system. The important issue is
that the Java class loader and the bytecode verifier make no
assumptions about the primary source of the bytecode stream--the code
may have come from the local system, or it may have travelled halfway
around the planet. The bytecode verifier acts as a sort of gatekeeper:
it ensures that code passed to the Java interpreter is in a fit state
to be executed and can run without fear of breaking the Java
interpreter. Imported code is not allowed to execute by any means
until after it has passed the verifier's tests. Once the verifier is
done, a number of important properties are known:
There are no operand stack overflows or underflows
The types of the parameters of all bytecode instructions are known to always be correct
Object field accesses are known to be legal--private, public, or protected
While all this checking appears excruciatingly detailed, by the time
the bytecode verifier has done its work, the Java interpreter can
proceed, knowing that the code will run securely. Knowing these
properties makes the Java interpreter much faster, because it doesn't
have to check anything. There are no operand type checks and no stack
overflow checks. The interpreter can thus function at full speed
without compromising reliability.
It does the following:
There are no operand stack overflows or underflows
The types of the parameters of all bytecode instructions are known to always be correct
Object field accesses are known to be legal--private, public, or protected
Reference:
http://java.sun.com/docs/white/langenv/Security.doc3.html
I keep hearing about all the new cool features that are being added to the JVM and one of those cool features is invokedynamic. I would like to know what it is and how does it make reflective programming in Java easier or better?
It is a new JVM instruction which allows a compiler to generate code which calls methods with a looser specification than was previously possible -- if you know what "duck typing" is, invokedynamic basically allows for duck typing. There's not too much you as a Java programmer can do with it; if you're a tool creator, though, you can use it to build more flexible, more efficient JVM-based languages. Here is a really sweet blog post that gives a lot of detail.
As part of my Java Records article, I articulated the motivation behind Invoke Dynamic. Let's start with a rough definition of Indy.
Introducing Indy
Invoke Dynamic (also known as Indy) was part of JSR 292, intended to enhance the JVM's support for dynamically typed languages. Since its first release in Java 7, the invokedynamic opcode, along with its java.lang.invoke machinery, has been used quite extensively by dynamic JVM-based languages like JRuby.
Although indy was specifically designed to enhance dynamic language support, it offers much more than that. As a matter of fact, it's suitable wherever a language designer needs any form of dynamicity, from dynamic type acrobatics to dynamic strategies!
For instance, the Java 8 Lambda Expressions are actually implemented using invokedynamic, even though Java is a statically typed language!
User-Definable Bytecode
For quite some time the JVM has supported four method invocation types: invokestatic to call static methods, invokeinterface to call interface methods, invokespecial to call constructors, super(), or private methods, and invokevirtual to call instance methods.
Despite their differences, these invocation types share one common trait: we can't enrich them with our own logic. In contrast, invokedynamic enables us to bootstrap the invocation process in any way we want. Then the JVM takes care of calling the bootstrapped method directly.
How Does Indy Work?
The first time the JVM sees an invokedynamic instruction, it calls a special static method called the bootstrap method. The bootstrap method is a piece of Java code that we've written to prepare the actual to-be-invoked logic:
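For illustration, a hand-written bootstrap method can look like the sketch below (the Greeter class and its sayHello method are made up for this example; the bootstrap methods used for records and lambdas are provided by the JDK, as shown further down):

import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class IndyBootstrap {

    public static class Greeter {
        public static String sayHello(String name) {
            return "Hello, " + name;
        }
    }

    // The JVM calls this once for a given invokedynamic call site.
    public static CallSite bootstrap(MethodHandles.Lookup lookup,
                                     String name,        // method name recorded at the call site
                                     MethodType type)    // the call site's signature
            throws NoSuchMethodException, IllegalAccessException {
        // Ordinary Java code decides what the call site should actually invoke.
        MethodHandle target = lookup.findStatic(Greeter.class, "sayHello", type);
        // A ConstantCallSite permanently binds the call site to that target.
        return new ConstantCallSite(target);
    }
}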
Then the bootstrap method returns an instance of java.lang.invoke.CallSite. This CallSite holds a reference to the actual method, i.e. MethodHandle.
From now on, every time the JVM sees this invokedynamic instruction again, it skips the slow path and directly calls the underlying executable. The JVM continues to skip the slow path unless something changes.
Example: Java 14 Records
Java 14 records provide a nice, compact syntax for declaring classes that are supposed to be dumb data holders.
Considering this simple record:
public record Range(int min, int max) {}
The bytecode for this example would be something like:
Compiled from "Range.java"
public java.lang.String toString();
descriptor: ()Ljava/lang/String;
flags: (0x0001) ACC_PUBLIC
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: invokedynamic #18, 0 // InvokeDynamic #0:toString:(LRange;)Ljava/lang/String;
6: areturn
In its Bootstrap Method Table:
BootstrapMethods:
0: #41 REF_invokeStatic java/lang/runtime/ObjectMethods.bootstrap:
(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;
Ljava/lang/invoke/TypeDescriptor;Ljava/lang/Class;
Ljava/lang/String;[Ljava/lang/invoke/MethodHandle;)Ljava/lang/Object;
Method arguments:
#8 Range
#48 min;max
#50 REF_getField Range.min:I
#51 REF_getField Range.max:I
So the bootstrap method for Records is called bootstrap, and it resides in the java.lang.runtime.ObjectMethods class. As you can see, this bootstrap method expects the following parameters:
An instance of MethodHandles.Lookup representing the lookup context
(The Ljava/lang/invoke/MethodHandles$Lookup part).
The method name (i.e. toString, equals, hashCode, etc.) the bootstrap
is going to link. For example, when the value is toString, bootstrap
will return a ConstantCallSite (a CallSite that never changes) that
points to the actual toString implementation for this particular
Record.
The TypeDescriptor for the method (Ljava/lang/invoke/TypeDescriptor
part).
A type token, i.e. Class<?>, representing the Record class type. It’s
Class<Range> in this case.
A semi-colon separated list of all component names, i.e. min;max.
One MethodHandle per component. This way the bootstrap method can
create a MethodHandle based on the components for this particular
method implementation.
The invokedynamic instruction passes all those arguments to the bootstrap method. The bootstrap method, in turn, returns an instance of ConstantCallSite. This ConstantCallSite holds a reference to the requested method implementation, e.g. toString.
Why Indy?
As opposed to the Reflection API, the java.lang.invoke API is quite efficient, since the JVM can completely see through all invocations. Therefore, the JVM may apply all sorts of optimizations, as long as we avoid the slow path as much as possible!
In addition to the efficiency argument, the invokedynamic approach is more reliable and less brittle because of its simplicity.
Moreover, the generated bytecode for Java Records is independent of the number of properties. So, less bytecode and faster startup time.
Finally, let's suppose a new version of Java includes a new and more efficient bootstrap method implementation. With invokedynamic, our app can take advantage of this improvement without recompilation. This way we have some sort of forward binary compatibility. Also, that's the dynamic strategy we were talking about!
Other Examples
In addition to Java Records, invokedynamic has been used to implement features like:
Lambda Expressions in Java 8+: LambdaMetafactory
String Concatenation in Java 9+: StringConcatFactory
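For example, even this tiny class compiles its lambda to a single invokedynamic instruction whose bootstrap method is LambdaMetafactory.metafactory; disassembling the class file shows an entry along the lines of InvokeDynamic #0:run:()Ljava/lang/Runnable; (exact output varies by compiler version):

public class LambdaIndyDemo {
    public static void main(String[] args) {
        // The lambda below is not compiled to an anonymous class; instead the
        // compiler emits an invokedynamic instruction bootstrapped by LambdaMetafactory.
        Runnable r = () -> System.out.println("hello from a lambda");
        r.run();
    }
}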
Some time ago, C# added a cool feature, dynamic syntax within C#
Object obj = ...; // no static type available
dynamic duck = obj;
duck.quack(); // or any method. no compiler checking.
Think of it as syntax sugar for reflective method calls. It can have very interesting applications. see http://www.infoq.com/presentations/Statically-Dynamic-Typing-Neal-Gafter
Neal Gafter, who's responsible for C#'s dynamic type, just defected from SUN to MS. So it's not unreasonable to think that the same things had been discussed inside SUN.
I remember soon after that, some Java dude announced something similar
InvokeDynamic duck = obj;
duck.quack();
Unfortunately, the feature is nowhere to be found in Java 7. Very disappointing. Java programmers have no easy way to take advantage of invokedynamic in their programs.
There are two concepts to understand before continuing to invokedynamic.
1. Static vs. Dynamic Typing
Static - performs type checking at compile time (e.g. Java)
Dynamic - performs type checking at runtime (e.g. JavaScript)
Type checking is the process of verifying that a program is type safe, that is, checking the type information of class and instance variables, method parameters, return values, and other variables.
E.g. Java knows about int, String, etc. at compile time, while the type of an object in JavaScript can only be determined at runtime.
2. Strong vs. Weak typing
Strong - specifies restrictions on the types of values supplied to its operations (e.g. Java)
Weak - converts (casts) arguments of an operation if those arguments have incompatible types (e.g. Visual Basic)
Knowing that Java is statically typed, how do you implement dynamically and strongly typed languages on the JVM?
The invokedynamic implements a runtime system that can choose the most appropriate implementation of a method or function — after the program has been compiled.
Example:
Having (a + b) and not knowing anything about the variables a,b at compile time, invokedynamic maps this operation to the most appropriate method in Java at runtime. E.g., if it turns out a,b are Strings, then call method(String a, String b). If it turns out a,b are ints, then call method(int a, int b).
invokedynamic was introduced with Java 7.
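The java.lang.invoke plumbing that invokedynamic is built on can be used directly from Java to get a feel for this "choose the implementation at runtime" idea. The sketch below (method names are arbitrary) creates a MutableCallSite, which behaves like an invokedynamic call site whose target can be swapped at runtime:

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class DynamicStrategyDemo {

    static String polite(String name) { return "Hello, " + name; }
    static String casual(String name) { return "yo " + name; }

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType type = MethodType.methodType(String.class, String.class);

        // The call site starts out bound to polite(String).
        MutableCallSite site = new MutableCallSite(
                lookup.findStatic(DynamicStrategyDemo.class, "polite", type));
        MethodHandle invoker = site.dynamicInvoker();

        System.out.println((String) invoker.invokeExact("Ada"));   // Hello, Ada

        // Retarget the same call site at runtime.
        site.setTarget(lookup.findStatic(DynamicStrategyDemo.class, "casual", type));
        System.out.println((String) invoker.invokeExact("Ada"));   // yo Ada
    }
}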
The short answer is that invokedynamic is a new opcode in the JVM that didn't exist prior to Java 7.
As far as reflection goes, consider it in the context of this definition: Java reflection is the process of examining or modifying the runtime behavior of a class at run time. However, I believe more explanation is needed.
From the article below:
For example, reflection predates both collections and generics. As a
result, method signatures are represented by Class[] in the Reflection
API. This can be cumbersome and error-prone, and it is hampered by the
verbose nature of Java’s array syntax. It is further complicated by
the need to manually box and unbox primitive types and to work around
the possibility of void methods.
Method handles to the rescue
Instead of forcing the programmer to deal
with these issues, Java 7 introduced a new API, called MethodHandles,
to represent the necessary abstractions. The core of this API is the
package java.lang.invoke and especially the class MethodHandle.
Instances of this type provide the ability to call a method, and they
are directly executable. They are dynamically typed according to their
parameter and return types, which provides as much type safety as
possible, given the dynamic way in which they are used. The API is
needed for invokedynamic, but it can also be used alone, in which case
it can be considered a modern, safe alternative to reflection.
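Here is a small, self-contained example of that standalone use of MethodHandle (no invokedynamic instruction involved): look up String.length and invoke it through a handle instead of through the Reflection API:

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class MethodHandleDemo {
    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType type = MethodType.methodType(int.class);   // returns int, takes no arguments
        MethodHandle length = lookup.findVirtual(String.class, "length", type);

        // invokeExact requires the call's static argument and return types to
        // match the handle's type (String)int exactly.
        int len = (int) length.invokeExact("invokedynamic");
        System.out.println(len);   // prints 13
    }
}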
Quoting from Understanding Java method invocation with invokedynamic
These four are the bytecode representations of the standard forms of
method invocation used in Java 8 and Java 9, and they are
invokevirtual, invokespecial, invokeinterface, and invokestatic.
This raises the question of how the fifth opcode, invokedynamic,
enters the picture. The short answer is that, as of Java 9, there was
no direct support for invokedynamic in the Java language.
In fact, when invokedynamic was added to the runtime in Java 7, the
javac compiler would not emit the new bytecode under any circumstances
whatsoever.
As of Java 8, invokedynamic is used as a primary implementation
mechanism to provide advanced platform features. One of the clearest
and simplest examples of this use of the opcode is in the
implementation of lambda expressions.
So again, invokedynamic is a new opcode in the JVM; among other things, it is what the implementation of lambda expressions in Java is built on.