Why doesn't the JVM cache JIT compiled code?


The canonical JVM implementation from Sun applies some pretty sophisticated optimization to bytecode to obtain near-native execution speeds after the code has been run a few times.
The question is, why isn't this compiled code cached to disk for use during subsequent uses of the same function/class?
As it stands, every time a program is executed, the JIT compiler kicks in afresh, rather than using a pre-compiled version of the code. Wouldn't adding this feature add a significant boost to the initial run time of the program, when the bytecode is essentially being interpreted?

Without resorting to cut'n'paste of the link that @MYYN posted, I suspect this is because the optimisations that the JVM performs are not static, but rather dynamic, based on the data patterns as well as code patterns. It's likely that these data patterns will change during the application's lifetime, rendering the cached optimisations less than optimal.
So you'd need a mechanism to establish whether the saved optimisations were still optimal, at which point you might as well just re-optimise on the fly.

Oracle's JVM is indeed documented to do so. Quoting Oracle:
"the compiler can take advantage of Oracle JVM's class resolution model to optionally persist compiled Java methods across database calls, sessions, or instances. Such persistence avoids the overhead of unnecessary recompilations across sessions or instances, when it is known that semantically the Java code has not changed."
I don't know why all sophisticated VM implementations don't offer similar options.

An update to the existing answers: Java 8 has a JEP dedicated to solving this, JEP 145: Cache Compiled Code.
At a very high level, its stated goal is:
Save and reuse compiled native code from previous runs in order to
improve the startup time of large Java applications.
Hope this helps.

Excelsior JET has had a caching JIT compiler since version 2.0, released back in 2001. Moreover, its AOT compiler may recompile the cache into a single DLL/shared object using all optimizations.

I do not know the actual reasons, not being in any way involved in the JVM implementation, but I can think of some plausible ones:
The idea of Java is to be a write-once-run-anywhere language, and putting precompiled stuff into the class file is kind of violating that (only "kind of" because of course the actual byte code would still be there)
It would increase the class file sizes because you would have the same code there multiple times, especially if you happen to run the same program under multiple different JVMs (which is not really uncommon, when you consider different versions to be different JVMs, which you really have to do)
The class files themselves might not be writable (though it would be pretty easy to check for that)
The JVM optimizations are partially based on run-time information and on other runs they might not be as applicable (though they should still provide some benefit)
But I really am guessing, and as you can see, I don't really think any of my reasons are actual show-stoppers. I figure Sun just doesn't consider this support a priority, and maybe my first reason is close to the truth, as doing this habitually might also lead people into thinking that Java class files really need a separate version for each VM instead of being cross-platform.
My preferred way would actually be to have a separate bytecode-to-native translator that you could use to do something like this explicitly beforehand, creating class files that are explicitly built for a specific VM, with possibly the original bytecode in them so that you can run with different VMs too. But that probably comes from my experience: I've been mostly doing Java ME, where it really hurts that the Java compiler isn't smarter about compilation.

Related

Creating a custom JVM with larger object header

For various reasons I came to the conclusion that creating a custom JVM build might be the easiest option for what I am trying to achieve as there are simply too many things that are affecting performance really badly if done otherwise.
So I have the environment up and running, modified some simple things to generate some callbacks for what I need, played with some intrinsics, so far so good.
What I would like to know though is: What do the JVM experts here think about the feasibility of creating a custom VM that has a larger object header (e.g. 8 bytes more)? markOop.hpp explains the content of the mark word pretty well for the different flavours that exist (32 bits, 64 bits, 64 bits with compressed oops), and I was wondering how hard it would be to extend the header so I can put some extra info on the objects (no, tagging is not an option, see my post here).
So before digging deeper into this I was hoping that someone with experience in this could give some early feedback. Is this a "suicide mission" because there are too many places all over where there are hard-coded assumptions regarding the header size and the offsets? Or is all this fairly centralized, so that it could be accomplished with reasonable effort without risking breaking everything? Any pointers on what might need special care and what consequences this might have (besides the very obvious one: more memory consumption)?

It is definitely possible to enlarge the object header (I've seen such experiments before), though this won't be as easy as just adding a new field to class oopDesc. I believe there are multiple places in the JVM code that rely on the size of the object header, but there should not be too many. The size of the object header already differs depending on the platform and the UseCompressedOops option, so most places in the code already use relative offsets and won't suffer from a new field.
The other option is not to expand the header, but rather to add a new fake field to the java.lang.Object class. HotSpot already has the machinery for adding such fields; look for InjectedField in the sources. However, this won't be trivial either. There are some hardcoded offsets for system classes, see JavaClasses::check_offsets. These need to be fixed, too.
Both approaches are roughly equal in terms of implementation effort. In both cases I suggest starting with debug (not fastdebug) builds of the JVM, as they include many helpful assertions that will catch possible offset problems early.
Having heard of your project, I think you also have a third option: give up the "JVMTI only" requirement and rewrite some parts of the agent in Java, leveraging the power of bytecode instrumentation and JIT compilation. Yes, this may slightly change the Java code being executed and probably result in more classes being loaded, but does this really matter if, from the user's perspective, the impact is even less than with a JVMTI-only agent? I mean, the performance impact could be significantly lower when there are no Java<->native switches, JVMTI overhead and so on. If the agent has low overhead and works with a stock JVM, I guess it's not a big problem to turn it on in production in order to get its cool features, is it?
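As a rough illustration of the third option, here is a minimal sketch of a Java agent built on the standard java.lang.instrument API. The class name CountingAgent and the counting logic are hypothetical and only show where bytecode rewriting would plug in; a real agent would modify classfileBuffer (typically with a bytecode library) instead of returning null.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical agent class; names are illustrative, not taken from the project discussed above.
class CountingAgent {
    // Number of classes offered to the transformer so far.
    static final AtomicInteger SEEN = new AtomicInteger();

    // A transformer that only observes classes and never changes them.
    static final class Observer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            SEEN.incrementAndGet();
            // Returning null tells the JVM to keep the original bytecode.
            // An instrumenting agent would return rewritten bytes here.
            return null;
        }
    }

    static final Observer TRANSFORMER = new Observer();

    // Called by the JVM before main() when started with -javaagent:<agent jar>;
    // the jar manifest must name this class in its Premain-Class attribute.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(TRANSFORMER);
    }
}
```

Because the transformer runs as ordinary Java code, it is itself subject to JIT compilation, which is the point made above about avoiding Java<->native switches.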

If JVM generates machine code, then where are the code files?

I read some materials about the JVM and bytecode. I think it would be more efficient if the JVM translated bytecode into platform-dependent machine code on the first run, instead of interpreting it all the time.
However, I could not find such files in my project folders. There are only bin and src folders, which contain *.class bytecodes and *.java source codes.
So my questions are:
If Java interprets bytecode all the time, why not translate bytecode to machine code after the first run?
If they do generate machine code, where are the files?

Not an option since the environment can change between runs (e.g. upgrade of JVM)
In memory (or serialized to disk when needed)

If Java interprets bytecode all the time, why not translate bytecode
to machine code after the first run?
There are pros and cons to both ahead of time (AOT) and just in time (JIT) compilation.
The main advantage of AOT is that the compiler is generally allowed to take longer, so it can perform more sophisticated analysis and optimization. Another advantage is that the compiler doesn't have to be present at runtime on the target machine. The disadvantages are everything else.
The main advantage of JIT is that the compiler is able to make optimizations based on information known only at runtime. In fact, it is even possible to unoptimize and reoptimize code when conditions change. Furthermore, the JIT doesn't have to waste time optimizing code that is never or rarely run, unlike the AOT compiler.
Some languages are designed to favor one approach over the other. For example, C/C++ are designed for AOT, while Java is designed for JIT (though it can be compiled AOT with some restrictions). For example, Java has a heavy emphasis on virtual getters and setters, possibly for classes not loaded until runtime. But the JIT can see and inline these functions at runtime. By contrast, if you used virtual methods for every field access in C++, you'd pay a huge performance penalty.

It doesn't interpret code all the time. Bytecode that has been interpreted often enough is compiled to native code after some time. You can tweak this "time" using -XX:CompileThreshold= (default is 10000) or you can turn off compilation completely.
In memory. There's a special area in memory called the "Code cache". You can see how methods are compiled into the cache and how they are evicted from it using -XX:+PrintCompilation. The size of the cache is also configurable, see -XX:ReservedCodeCacheSize=.
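To see that the compiled code really lives in memory, a program can ask the JVM about its own code cache through the standard java.lang.management API. This is a minimal sketch; the class name CodeCacheReport is made up, and matching pools by the substring "Code" is an assumption, since pool names are JVM-specific (some HotSpot versions report a single "Code Cache" pool, others several "CodeHeap" pools).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class CodeCacheReport {
    // Returns one line per non-heap memory pool whose name mentions "Code".
    // The name filter is an assumption; other JVMs may name their pools differently.
    static List<String> codePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP || !pool.getName().contains("Code")) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            lines.add(pool.getName() + ": " + (usage.getUsed() / 1024) + " KB used");
        }
        return lines;
    }

    public static void main(String[] args) {
        codePools().forEach(System.out::println);
    }
}
```

Running the same class with -XX:+PrintCompilation additionally logs each method as it is compiled.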

Well, the JVM has preprocessed data but only for its own classes. Given the size of the JRE library and the fact that it usually doesn’t change, it’s a big win (you might look for files called classes.jsa).
However, even these files do not contain native code but only easier-to-process byte code.
The big point about code generation in Hot Spot JVMs is that they don’t compile code on a class or method basis as you seem to think. These JVMs compile code fragments spanning multiple interacting methods as the interaction is discovered during the self-profiling. These code blocks may span methods from the JRE, the extension libraries, 3rd party libraries in your class path and your application classes and hence are only valid for this specific combination.
During the compilation the information gathered about your program’s behavior will be used, e.g. code paths not taken might be elided and conditionals might be asserted to evaluate to a certain result as they did in previous evaluations. This yields high performance, but the JVM may have to drop the code even during the same execution when one of the assertions no longer holds, e.g. the program might take a code path it didn’t take before, or a new class might be loaded into the JVM which extends a class whose code has been optimized as if it had no subclasses.
So if optimized and compiled code might be rendered obsolete even within the same environment, it is even more likely to be obsolete in the next execution. In the end, the JVM would have to check whether the old code is still appropriate, which might turn out to be even costlier than simply gathering the new environment’s data and program behavior.
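The situation described above can be provoked with a small program. This is only a sketch with made-up class names (Shape, Square, Rect, DeoptDemo); whether and when the JVM actually specialises and later discards the compiled loop depends on the JVM version and its settings.

```java
interface Shape {
    double area();
}

class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

class Rect implements Shape {
    final double width, height;
    Rect(double width, double height) { this.width = width; this.height = height; }
    public double area() { return width * height; }
}

class DeoptDemo {
    // Hot loop with a virtual call; the JIT may specialise the call site
    // for the receiver types it has observed so far.
    static double sum(Shape[] shapes, int rounds) {
        double total = 0;
        for (int r = 0; r < rounds; r++) {
            for (Shape s : shapes) {
                total += s.area();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Phase 1: only Square instances ever reach the call site.
        Shape[] onlySquares = { new Square(1), new Square(2) };
        System.out.println(sum(onlySquares, 1_000_000));

        // Phase 2: Rect is used for the first time here, so any code
        // that assumed "always Square" is no longer valid.
        Shape[] mixed = { new Square(1), new Rect(2, 3) };
        System.out.println(sum(mixed, 1_000_000));
    }
}
```

On HotSpot, running this with -XX:+PrintCompilation may show sum being compiled, marked "made not entrant" around the start of phase 2, and compiled again. The results are correct either way; only the generated code changes.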

Compiling Scheme using Java

I was writing a Scheme interpreter (trying to be fully R5RS compatible) and it just struck me that compiling into VM opcodes would make it faster. (Correct me if I am wrong.) I can interpret the Scheme source code in the memory, but I am stuck at understanding code generation.
My question is: What patterns will be required to generate opcodes from a parse tree, for, say, the JVM or any other VM (or even a real machine)? And what, if any, will be the complications, advantages, or disadvantage of doing so?

For Scheme there will be two major complications related to JVM.
First, the JVM does not support explicit tail-call annotations, therefore you won't be able to guarantee proper tail recursion as required by R5RS (3.5) without resorting to an expensive mini-interpreter trick.
The second issue is with continuation support. The JVM does not provide anything useful for implementing continuations, so again you're bound to use a mini-interpreter. I.e., each trivial CPS function should return the next closure, which will then be called by an infinite mini-interpreter loop.
But still there are many interesting optimisation possibilities. I'd recommend taking a look at Bigloo (there is a relatively fast JVM backend) and Kawa. For general compilation techniques take a look at Scheme in 90 minutes.
And still, interpretation is a viable alternative to compilation (at least on JVM, due to its severe limitations and general inefficiency). See how SISC is implemented, it is quite an interesting and innovative approach.
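The mini-interpreter trick mentioned in this answer is often called a trampoline. Below is a minimal sketch in Java; the names Bounce, done, call and run are invented for illustration and do not come from any of the Scheme implementations named above. Each function returns either a final value or a closure for the next step, and a plain loop keeps calling closures so the Java stack never grows.

```java
import java.util.function.Supplier;

// One step of a computation: either a finished value or a thunk for the next step.
final class Bounce<T> {
    final T value;                   // meaningful only when next == null
    final Supplier<Bounce<T>> next;  // non-null while more work remains

    private Bounce(T value, Supplier<Bounce<T>> next) {
        this.value = value;
        this.next = next;
    }

    static <T> Bounce<T> done(T value) {
        return new Bounce<>(value, null);
    }

    static <T> Bounce<T> call(Supplier<Bounce<T>> next) {
        return new Bounce<>(null, next);
    }

    // The mini-interpreter loop: keeps invoking the returned closure
    // until a finished value comes back.
    static <T> T run(Bounce<T> step) {
        while (step.next != null) {
            step = step.next.get();
        }
        return step.value;
    }
}

class TailCallDemo {
    // Tail-recursive sum of 1..n in accumulator style. Instead of calling
    // itself directly, it returns a closure describing the tail call.
    static Bounce<Long> sumTo(long n, long acc) {
        if (n == 0) {
            return Bounce.done(acc);
        }
        return Bounce.call(() -> sumTo(n - 1, acc + n));
    }

    public static void main(String[] args) {
        // A direct recursion this deep would risk a StackOverflowError.
        System.out.println(Bounce.run(sumTo(1_000_000, 0))); // prints 500000500000
    }
}
```

The cost is one closure allocation per call, which is why the answer calls the trick expensive.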

Is it possible to write a decent java optimizer if information is lost in the translation to bytecode?

It occurred to me that when you write a C program, the compiler knows the source and destination platform (for lack of a better term) and can optimize to the machine it is building code for.
But in Java the best the compiler can do is optimize to the bytecode, which may be great, but there's still a layer in the JVM that has to interpret the bytecode, and the farther the bytecode is away translation-wise from the final machine architecture, the more work has to be done to make it go.
It seems to me that a bytecode optimizer wouldn't be nearly as good because it has lost all the semantic information available from the original source code (which may already have been butchered by the Java compiler's optimizer).
So is it even possible to ever approach the efficiency of C with a java compiler?

Actually, a byte-code JIT compiler can exceed the performance of statically compiled languages in many instances because it can evaluate the byte code in real time and in the actual execution context. So the app's performance increases as it continues to run.

What Kevin said. Also, the bytecode optimizer (JIT) can take advantage of runtime information to perform better optimizations. For instance, it knows which code is executing more (hot spots), so it doesn't spend time optimizing code that rarely executes. It can do most of the stuff that profile-guided optimization gives you (branch prediction, etc.), but on the fly for whatever the target processor is. This is why the JVM usually needs to "warm up" before it reaches best performance.
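The warm-up effect can be observed with a crude timing loop. This is only a sketch (the class name WarmupDemo and the arithmetic in work are arbitrary), and it is not a rigorous benchmark: the timings vary between machines, JVM versions and runs.

```java
class WarmupDemo {
    // Some arbitrary integer work for the JIT to chew on.
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (i * 31L) ^ (i >>> 3);
        }
        return acc;
    }

    public static void main(String[] args) {
        // The same call is timed repeatedly; early rounds typically include
        // interpretation and compilation time, later rounds run compiled code.
        for (int round = 1; round <= 10; round++) {
            long start = System.nanoTime();
            long result = work(5_000_000);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("round " + round + ": " + micros + " us (result " + result + ")");
        }
    }
}
```

On a typical HotSpot JVM the first round or two tend to be noticeably slower than the rest, but no particular numbers should be expected.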

In theory both optimizers should behave 'identically', as it is standard practice for C/C++ compilers to perform the optimization on the generated assembly and not the source code, so you've already lost any semantic information.

If you read the byte code, you may see that the compiler doesn't optimise the code very well. However the JIT can optimise the code so this really doesn't matter.
Say you compile the code on an x86 machine and a new architecture comes along, let's call it x64. The same Java binary can take advantage of the new features of that architecture even though it might not have existed when the code was compiled. It means you can take old distributions of libraries and take advantage of the latest hardware-specific optimisations. You cannot do this with C/C++.
Java can inline calls to virtual methods. Say you have a virtual method with many different possible implementations, but in reality one or two implementations are called most of the time. The JIT can detect this and inline up to two method implementations but still behave correctly if you happen to call another implementation. You cannot do this with C/C++.
Java 7 supports escape analysis for locked/synchronised objects, it can detect that an object is only used in a local context and drop synchronization for that object.
In the current versions of Java, it can detect if two consecutive methods lock the same object and keep the lock between them (rather than release and re-acquire the lock)
You cannot do this with C/C++ because there isn't a language level understanding of locking.
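The two locking optimisations described above apply to code shaped like the following sketch. The class and method names are invented, and whether the JIT actually removes or merges the locks depends on the JVM and its settings; the code only shows the patterns that make those optimisations possible.

```java
class LockElisionDemo {
    private static final Object LOCK = new Object();
    private static int counter;

    // StringBuffer's methods are synchronized, but sb is only reachable from
    // this method, so no other thread can ever contend for its monitor.
    // Escape analysis can prove this and the JIT may drop the locking.
    static String join(String a, String b) {
        StringBuffer sb = new StringBuffer();
        sb.append(a).append('-').append(b);
        return sb.toString();
    }

    // Two adjacent synchronized blocks on the same monitor: the JIT may merge
    // them into one lock/unlock pair instead of releasing and re-acquiring.
    static int bumpTwice() {
        synchronized (LOCK) {
            counter++;
        }
        synchronized (LOCK) {
            counter++;
            return counter;
        }
    }
}
```

In both cases the observable behaviour is unchanged; only the cost of the synchronization differs.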

What are advantages of bytecode over native code? [closed]

It seems like anything you can do with bytecode you can do just as easily and much faster in native code. In theory, you could even retain platform and language independence by distributing programs and libraries in bytecode then compiling to native code at installation, rather than JITing it.
So in general, when would you want to execute bytecode instead of native?

Hank Shiffman from SGI said (a long time ago, but it's still true):
There are three advantages of Java using byte code instead of going to the native code of the system:
Portability: Each kind of computer has its unique instruction set. While some processors include the instructions for their predecessors, it's generally true that a program that runs on one kind of computer won't run on any other. Add in the services provided by the operating system, which each system describes in its own unique way, and you have a compatibility problem. In general, you can't write and compile a program for one kind of system and run it on any other without a lot of work. Java gets around this limitation by inserting its virtual machine between the application and the real environment (computer + operating system). If an application is compiled to Java byte code and that byte code is interpreted the same way in every environment then you can write a single program which will work on all the different platforms where Java is supported. (That's the theory, anyway. In practice there are always small incompatibilities lying in wait for the programmer.)
Security: One of Java's virtues is its integration into the Web. Load a web page that uses Java into your browser and the Java code is automatically downloaded and executed. But what if the code destroys files, whether through malice or sloppiness on the programmer's part? Java prevents downloaded applets from doing anything destructive by disallowing potentially dangerous operations. Before it allows the code to run it examines it for attempts to bypass security. It verifies that data is used consistently: code that manipulates a data item as an integer at one stage and then tries to use it as a pointer later will be caught and prevented from executing. (The Java language doesn't allow pointer arithmetic, so you can't write Java code to do what we just described. However, there is nothing to prevent someone from writing destructive byte code themselves using a hexadecimal editor or even building a Java byte code assembler.) It generally isn't possible to analyze a program's machine code before execution and determine whether it does anything bad. Tricks like writing self-modifying code mean that the evil operations may not even exist until later. But Java byte code was designed for this kind of validation: it doesn't have the instructions a malicious programmer would use to hide their assault.
Size: In the microprocessor world RISC is generally preferable over CISC. It's better to have a small instruction set and use many fast instructions to do a job than to have many complex operations implemented as single instructions. RISC designs require fewer gates on the chip to implement their instructions, allowing for more room for pipelines and other techniques to make each instruction faster. In an interpreter, however, none of this matters. If you want to implement a single instruction for the switch statement with a variable length depending on the number of case clauses, there's no reason not to do so. In fact, a complex instruction set is an advantage for a web-based language: it means that the same program will be smaller (fewer instructions of greater complexity), which means less time to transfer across our speed-limited network.
So when considering byte code vs native, consider which trade-offs you want to make between portability, security, size, and execution speed. If speed is the only important factor, go native. If any of the others are more important, go with bytecode.
I'll also add that maintaining a series of OS and architecture-targeted compilations of the same code base for every release can become very tedious. It's a huge win to use the same Java bytecode on multiple platforms and have it "just work."

The performance of essentially any program will improve if it is compiled, executed with profiling, and the results fed back into the compiler for a second pass. The code paths which are actually used will be more aggressively optimized, loops unrolled to exactly the right degree, and the hot instruction paths arranged to maximize instruction cache (I$) hits.
All good stuff, yet it is almost never done because it is annoying to go through so many steps to build a binary.
This is the advantage of running the bytecode for a while before compiling it to native code: profiling information is automatically available. The result after Just-In-Time compilation is highly optimized native code for the specific data the program is processing.
Being able to run the bytecode also enables more aggressive native optimization than a static compiler could safely use. For example, if one of the arguments to a function is noted to always be NULL, all handling for that argument can simply be omitted from the native code. There will be a brief validity check of the arguments in the function prologue; if that argument is not NULL, the VM aborts back to the bytecode and starts profiling again.
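The guard-and-fallback idea in the last paragraph can be modelled by hand in ordinary Java. This is only an analogy with invented names: a real JIT emits the guard in machine code and falls back to the interpreter, whereas here the "fallback" is just a call to the general method.

```java
class SpeculationModel {
    // Counts how often the specialised version had to give up.
    static int fallbacks = 0;

    // General version: handles every case, like the interpreted bytecode would.
    static int lengthGeneric(String prefix, String s) {
        return (prefix == null ? 0 : prefix.length()) + s.length();
    }

    // Specialised version, written as if profiling had shown prefix is always null.
    // All prefix handling is omitted except for a cheap guard at the top.
    static int lengthSpecialised(String prefix, String s) {
        if (prefix != null) {
            // Guard failed: the assumption no longer holds, so fall back
            // to the general version (a stand-in for deoptimization).
            fallbacks++;
            return lengthGeneric(prefix, s);
        }
        return s.length();
    }
}
```

As long as the assumption holds, only the guard and the fast path run; the first call with a non-null prefix takes the fallback and still returns the correct answer.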

Bytecode creates an extra level of indirection.
The advantages of this extra level of indirection are:
Platform independence
Can create any number of programming languages (syntax) and have them compile down to the same bytecode.
Could easily create cross language converters
x86, x64, and IA64 no longer need to be compiled as separate binaries. Only the proper virtual machine needs to be installed.
Each OS simply needs to create a virtual machine and it will have support for the same program.
Just in time compilation allows you to update a program just by replacing a single patched source file. (Very beneficial for web pages)
Some of the disadvantages:
Performance
Easier to decompile

All good answers, but my hot-button has been hit - performance.
If the code being run spends all its time calling library/system routines - file operations, database operations, sending windows messages, then it doesn't matter very much if it's JITted, because most of the clock time is spent waiting for those lower-level operations to complete.
However, if the code contains things we usually call "algorithms", that have to be fast and don't spend much time calling functions, and if those are used often enough to be a performance problem, then JIT is very important.

I think you just answered your own question: platform independence. Platform-independent bytecode is produced and distributed to its target platform. When executed it's quickly compiled to native code either before execution begins, or simultaneously (Just In Time). The Java JVM and presumably the .NET runtimes operate on this principle.

Here: http://slashdot.org/developers/02/01/31/013247.shtml
Go see what the geeks of Slashdot have to say about it! Little dated, but very good comments!

Ideally you would have portable bytecode that compiles Just In Time to native code. I think the reason bytecode interpreters exist without JIT is due primarily to the practical fact that native code compilation adds complexity to a virtual machine. It takes time to build, debug, and maintain that additional component. Not everyone has the time or resources to make that commitment.
A secondary factor is safety. It's much easier to verify an interpreter won't crash than to guarantee the same for native code.
Third is performance. It can often take more time to generate machine code than to interpret bytecode for small pieces of code that only run once.

Portability and platform independence are probably the most notable advantages of bytecode over native code.
