What are the differences in JIT between Java and .Net

What are the differences in JIT between Java and .Net - java

I know Microsoft .NET uses the CLR as a JIT compiler while Java has the Hotspot. What Are the differences between them?

They are very different beasts. As people pointed out, the CLR compiles to machine code before it executes a piece of MSIL. This allows it in addition to the typical dead-code elimination and inlining off privates optimizations to take advantage of the particular CPU architecture of the target machine (though I'm not sure whether it does it). This also incurs a hit for each class (though the compiler is fairly fast and many platform libraries are just a thin layer over the Win32 API).
The HotSpot VM is taking a different approach. It stipulates that most of the code is executed rarely, hence it's not worth to spend time compiling it. All bytecode starts in interpreted mode. The VM keeps statistics at call-sites and tries to identify methods which are called more than a predefined number of times. Then it compiles only these methods with a fast JIT compiler (C1) and swaps the method while it is running (that's the special sauce of HS). After the C1-compiled method has been invoked some more times, the same method is compiled with slow, but sophisticated compiler and the code is swapped again on the fly.
Since HotSpot can swap methods while they are running, the VM compilers can perform some speculative optimizations that are unsafe in statically compiled code. A canonical example is static dispatch / inlining of monomorphic calls (polymorphic method with only one implementation). This is done if the VM sees that this method always resolves to the same target. What used to be complex invocation is reduced to a few CPU instructions guard, which are predicted and pipelined by modern CPUs. When the guard condition stops being true, the VM can take a different code path or even drop back to interpreting mode. Based on statistics and program workload, the generated machine code can be different at different time. Many of these optimizations rely on the information gathered during the program execution and are not possible if you compile once whan you load the class.
This is why you need to warm-up the JVM and emulate realistic workload when you benchmark algorithms (skewed data can lead to unrealistic assesment of the optimizations). Other optimizations are lock elision, adaptive spin-locking, escape analysis and stack allocation, etc.
That said, HotSpot is only one of the VMs. JRockit, Azul, IBM's J9 and the Resettable RVM, - all have different performance profiles.

Related

Does JVM translates the byte code to machine instructions? [duplicate]

I always come across articles which claim that Java is interpreted. I know that Oracle's HotSpot JRE provides just-in-time compilation, however is this the case for a majority of desktop users? For example, if I download Java via: http://www.java.com/en/download, will this include a JIT Compiler?

Yes, absolutely. Articles claiming Java is interpreted are typically written by people who either don't understand how Java works or don't understand what interpreted means.
Having said that, HotSpot will interpret code sometimes - and that's a good thing. There are definitely portions of any application (around startup, usually) which are only executed once. If you can interpret that faster than you can JIT compile it, why bother with the overhead? On the other hand, my experience of "Java is interpreted" articles is that this isn't what they mean :)
EDIT: To take T. J. Crowder's point in: yes, the JVM downloaded from java.com will be HotSpot. There are two different JITs for HotSpot, however - server and desktop. To sum up the differences in a single sentence, the desktop JIT is designed to start apps quickly, whereas the server JIT is more focused on high performance over time: server apps typically run for a very long time, so time spent optimising them really heavily pays off in the long run.

There is nothing in the JVM specification that mandates any particular execution strategy. Some JVMs only interpret, they don't even have a compiler. Some JVMs only JIT compile, they don't even have an interpreter. Some JVMs have both an intepreter and a compiler (or even multiple compilers) and statically choose between the two on startup. Some have both and dynamically switch back and forth during runtime. Some aren't even virtual machines in the usual sense of the word at all, they just statically compile JVM bytecode into native machinecode ahead-of-time.
The particular JVM that you are asking about, Oracle's HotSpot JVM, has one interpreter and two compilers, called the C1 and C2 compiler, also colloquially known as the client and server compilers, after their corresponding commandline options. HotSpot dynamically switches back and forth between the interpreter and one of the compilers at runtime (but it will not switch between the two compilers, you have to specify one of them on the commandline and then only that one will be used for the entire runtime of the JVM).
As per document here Starting with some of the later Java SE 7 releases, a new feature called tiered compilation became available. This feature uses the C1 compiler mode at the start to provide better startup performance. Once the application is properly warmed up, the C2 compiler mode takes over to provide more-aggressive optimizations and, usually, better performance
The C1 compiler is an optimizing compiler which is pretty fast and doesn't use a lot of memory. The C2 compiler is much more aggressively optimizing, but is also slower and uses more memory.
You select between the two by specifying the -client and -server commandline options (-client is the default if you don't specify one), which also sets a couple of other JVM parameters like the default JIT threshold (in -client mode, methods will be compiled after they have been interpreted 1500 times, in -server mode after 10000 times, can be set with the -XX:CompileThreshold commandline argument).
Whether or not "the majority of desktop users" actually will run in compiled or interpreted mode depends largely on what code they are running. My guess is that the vast majority of desktop users run the HotSpot JVM from Oracle's JRE/JDK or one of its forks (e.g. SoyLatte on OSX, IcedTea or OpenJDK on Unix/BSD/Linux) and they don't fiddle with the commandline options, so they will probably get the C1 compiler with the default 1500 JIT threshold. (But applications such as IntelliJ, Eclipse or NetBeans have their own launcher scripts that usually supply different commandline arguments.)
In my case, for example, I often run small scripts which never actually reach the JIT threshold, so they are never compiled. (Nor should they be.)

Some of these links about the Hotspot JVM (what you are downloading in the java.com download link above) might help:
Java SE HotSpot at a Glance
The Java HotSpot Performance Engine Architecture
Frequently Asked Questions About the Java HotSpot VM

Neither of the (otherwise-excellent) answers so far seems to have actually answered your last question, so: Yes, the Java runtime you downloaded from www.java.com is Oracle's (Sun's) Hotspot JVM, and so yes, it will do JIT compilation. HotSpot isn't just for servers or anything like that, it runs on desktops and takes full advantage of its (very mature) optimizing JIT compiler.

Jvm spec never claim how to execute the java bytecode, however, you can specify a JIT compiler if you use the JVM from hotspot VM, JIT is just a technique to optimize byte code execution.

JVM : Bytecode Confusion and JIT

So here's a question that came to my mind when I was studying Java. We know (please correct me if I am wrong!) that the Bytecode runs atop JVM. So does the JVM convert the Bytecode to the native machine code it's(JVM) written for? If that is so, isn't it less secure?
Also what exactly is a just-in-time compiler? It compiles when it is asked to do so...I studied some resources, but still didn't get the just-in-time part clear.
Thanks for any help !

So does the JVM convert the Bytecode to the native machine code
it's(JVM) written for?
No, not necessarily. Though, nowadays it is state of the art to do so by default.
If that is so, isn't it less secure?
Less secure than what?
Just because one can do insecure operations in machine code (like dereferencing an unitialized pointer or accessing unallocated memory) does not mean the JIT generates such insecure code.
Also what exactly is a just-in-time compiler?
It's that part of the JVM that converts bytecode to native machine code.
The name "just in time" means that the code is compiled (in a separate thread) while it is executed. Once completely compiled, the JVM takes notice that certain methods are compiled and can be invoked on the machine level.

So does the JVM convert the Byte-code to the native machine code it's(JVM) written for?
All JVM implementation I have seen so far are converting byte-code to the native machine code VM is written for. Although I can't see how and why doing otherwise would be useful.
Also what exactly is a just-in-time compiler?
It's simply process of converting byte code to native code in run-time. Although for performance improvement it's being done by VM in parallel with your program execution. It also usually including compiled native code caching and some other techniques of performance improvement.
If that is so, isn't it less secure?
Well, to some very small degree it is. VERY VERY SMALL DEGREE. There're some security-related modifications to different OSes eliminating JIT compilation. For example, grsecurity Linux kernel patch is in fact doing JIT impossible (actually doing impossible to execute JIT-compiled code). And another fact is that similar memory protection mechanism (writable memory pages can't be executable) is implemented in iOS which makes impossible to do any JIT compilation in user mode.

Why isn't more Java software compiled natively?

I realize the benefits of bytecode vs. native code (portability).
But say you always know that your code will run on a x86 architecture, why not then compile for x86 and get the performance benefit?
Note that I am assuming there is a performance gain to native code compilation. Some folks have answered that there could in fact be no gain which is news to me..

Because the performance gain (if any) is not worth the trouble.
Also, garbage collection is very important for performance. Chances are that the GC of the JVM is better than the one embedded in the compiled executable, say with GCJ.
And just in time compilation can even result in better performance because the JIT has more information are run-time available to optimize the compilation than the compiler at compile-time. See the wikipedia page on JIT.

"Solaris" is an operating system, not a CPU architecture. The JVM installed on the actual machine will compile to the native CPU instructions. Solaris could be SPARC, x86, or x86-64 architecture.
Also, the JIT compiler can make processor-specific optimisations depending on which actual CPU family you have. For example, different instruction sequences are faster on Intel CPUs than on AMD CPUs, and a JIT compiler for your exact platform can take advantage of this information to produce highly optimised code.

The bytecode runs in a Java Virtual Machine that is compiled for (example) Solaris. It will be optimised like heck for that operating system.
In real-world cases, you see often see equal or better performance from Java code at runtime, by virtue of building on the virtual machine's code for things like memory management - that code will have been evolving and maturing for years.
There's more benefits to building for the JVM than just portability - for example, every time a new JVM is released your compiled bytecode gets any optimisations, algorithmic improvements etc. that come from the best in the business. On the other hand, once you've compiled your C code, that's it.

Because with Just-In-Time compilation, there is trivial performance benefit.
Actually, many things JIT can actually do faster.

It's already will be compiled by JIT into Solaris native code, after run. You can't receive any other benefits if you compile it before uploading at target site.

You may, or may not get a performance benefit. But more likely you would get a performance penalty: JIT optimization is not possible with static compilation, so the performance would be only as good as the compiler can make it "blindfolded" (without actually profiling the program and optimizing it accordingly, which is what JIT compilers such as HotSpot does).
It's intuitively quite surprising how cheap (resource-wise) compiling is, and how much can be automatically optimized by just observing the running program. Black magic, but good for us :-)

All this talk of JITs is about seven years out of date BTW. The technology concerned now is called HotSpot and it isn't just a JIT.

"why not then compile for x86"
Because then you can't take advantage of the specific features of the particular cpu it gets run on. In particular, if we are to read "compile for x86" as "produce native code that can run on a 386 and its descendants", then the resulting code can't rely on even something as old as the mmx instructions.
As such, the end result is that you need to compile for every exact architecture it'll run on (what about those that does not exist yet), and have the installer select which executable to put into place. Or, I hear the intel C++ compiler will produce several versions of the same function, differing only on cpu features used, and pick the right one at run-time based on what the CPU reports as available.
On the other hand, you can view bytecode as a "half-compiled" source, similar to an intermediate format a native compiler will (unless asked) not actually write to disk. The runtime environment can then do the final compilation, knowing exactly what architecture will be used. This is the given reason why some C#/.net code could slightly outperform c++ code on some cpu-intensive tasks in some benchmarks a while ago.
The "final compilation" of bytecode can also make additional optimalization assumptions that are (from a static compilation perspective) distinctly unsafe*, and just recompile if those assumptions are found wrong later.

I guess because JIT (just in time) compilation is very advanced.

Is it possible to write a decent java optimizer if information is lost in the translation to bytecode?

It occurred to me that when you write a C program, the compiler knows the source and destination platform (for lack of a better term) and can optimize to the machine it is building code for.
But in java the best the compiler can do is optimize to the bytecode, which may be great, but there's still a layer in the jvm that has to interpret the bytecode, and the farther the bytecode is away translation-wise from the final machine architecture, the more work has to be done to make it go.
It seems to me that a bytecode optimizer wouldn't be nearly as good because it has lost all the semantic information available from the original source code (which may already have been butchered by the java compiler's optimizer.)
So is it even possible to ever approach the efficiency of C with a java compiler?

Actually, a byte-code JIT compiler can exceed the performance of statically compiled languages in many instances because it can evaluate the byte code in real time and in the actual execution context. So the apps performance increases as it continues to run.

What Kevin said. Also, the bytecode optimizer (JIT) can also take advantage of runtime information to perform better optimizations. For instance, it knows what code is executing more (Hot-spots) so it doesn't spend time optimizing code that rarely executes. It can do most of the stuff that profile-guided optimization gives you (branch prediction, etc), but on-the-fly for whatever the target procesor is. This is why the JVM usually needs to "warm up" before it reaches best performance.

In theory both optimizers should behave 'identically' as it is standard practice for c/c++ compilers to perform the optimization on the generated assembly and not the source code so you've already lost any semantic information.

If you read the byte code, you may see that the compiler doesn't optimise the code very well. However the JIT can optimise the code so this really doesn't matter.
Say you compile the code on an x86 machine and new architecture comes along, lets call it x64, the same Java binary can take advantage of the new features of that architecture even though it might not have existed when the code was compiled. It means you can take old distributions of libraries and take advantage of the latest hardware specific optimisations. You cannot do this with C/C++.
Java can optimise inline calls for virtual methods. Say you have a virtual method with many different possible implementations. However, say one or two implementations are called most of the time in reality. The JIT can detect this and inline up to two method implementations but still behave correctly if you happen to call another implementation. You cannot do this with C/C++
Java 7 supports escape analysis for locked/synchronised objects, it can detect that an object is only used in a local context and drop synchronization for that object.
In the current versions of Java, it can detect if two consecutive methods lock the same object and keep the lock between them (rather than release and re-acquire the lock)
You cannot do this with C/C++ because there isn't a language level understanding of locking.

Does Java save its runtime optimizations?

My professor did an informal benchmark on a little program and the Java times were: 1.7 seconds for the first run, and 0.8 seconds for the runs thereafter.
Is this due entirely to the loading of the runtime environment into the operating environment ?
OR
Is it influenced by Java's optimizing the code and storing the results of those optimizations (sorry, I don't know the technical term for that)?

Okay, I found where I read that. This is all from "Learning Java" (O'Reilly 2005):
The problem with a traditional JIT compilation is that optimizing code takes time. So a JIT compiler can produce decent results but may suffer a significant latency when the application starts up. This is generally not a problem for long-running server-side applications but is a serious problem for client-side software and applications run on smaller devices with limited capabilities. To address this, Sun's compiler technology, called HotSpot, uses a trick called adaptive compilation. If you look at what programs actually spend their time doing, it turns out that they spend almost all their time executing a relatively small part of the code again and again. The chunk of code that is executed repeatedly may be only a small fraction of the total program, but its behavior determines the program's overall performance. Adaptive compilation also allows the Java runtime to take advantage of new kinds of optimizations that simply can't be done in a statically compiled language, hence the claim that Java code can run faster than C/C++ in some cases.
To take advantage of this fact, HotSpot starts out as a normal Java bytecode interpreter, but with a difference: it measures (profiles) the code as it is executing to see what parts are being executed repeatedly. Once it knows which parts of the code are crucial to performance, HotSpot compiles those sections into optimal native machine code. Since it compiles only a small portion of the program into machine code, it can afford to take the time necessary to optimize those portions. The rest of the program may not need to be compiled at all—just interpreted—saving memory and time. In fact, Sun's default Java VM can run in one of two modes: client and server, which tell it whether to emphasize quick startup time and memory conservation or flat out performance.
A natural question to ask at this point is, Why throw away all this good profiling information each time an application shuts down? Well, Sun has partially broached this topic with the release of Java 5.0 through the use of shared, read-only classes that are stored persistently in an optimized form. This significantly reduces both the startup time and overhead of running many Java applications on a given machine. The technology for doing this is complex, but the idea is simple: optimize the parts of the program that need to go fast, and don't worry about the rest.
I'm kind of wondering how far Sun has gotten with it since Java 5.0.

I'm not aware of any virtual machine in widespread use that saves statistical usage data between program invocations -- but it certainly is an interesting possibility for future research.
What you're seeing is almost certainly due to disk caching.

I agree that it's likely the result of disk caching.
FYI, the IBM Java 6 VM does contain an ahead-of-time compiler (AOT). The code isn't quite as optimized as what the JIT would produce, but it is stored across VMs, I believe in some sort of persistent shared memory. Its primary benefit is to improve startup performance. The IBM VM by default JITs a method after it's been called 1000 times. If it knows that a method is going to be called 1000 times just during the VM startup (think a commonly-used method like java.lang.String.equals(...) ), then it's beneficial for it to store that in the AOT cache so that it never has to waste time compiling at runtime.

I agree that the performance difference seen by the poster is most likely caused by disk latency bringing the JRE into memory. The Just In Time compiler (JIT) would not have an impact on performance of a little application.
Java 1.6u10 (http://download.java.net/jdk6/) touches the runtime JARs in a background process (even if Java isn't running) to keep the data in the disk cache. This significantly decreases startup times (Which is a huge benefit to desktop apps, but probably of marginal value to server side apps).
On large, long running applications, the JIT makes a big difference over time - but the amount of time required for the JIT to accumulate sufficient statistics to kick in and optimize (5-10 seconds) is very, very short compared to the overall life of the application (most run for months and months). While storing and restoring the JIT results is an interesting academic exercise, the practical improvement is not very large (Which is why the JIT team has been more focused on things like GC strategies for minimizing memory cache misses, etc...).
The pre-compilation of the runtime classes does help desktop applications quite a bit (as does the aforementioned 6u10 disk cache pre-loading).

You should describe how your Benchmark was done. Especially at which point you start to measure the time.
If you include the JVM startup time (which is useful for Benchmarking the User experience but not so useful to optimize Java code) then it might be a filesystem caching effect or it can be caused by a feature called "Java Class Data Sharing":
For Sun:
http://java.sun.com/j2se/1.5.0/docs/guide/vm/class-data-sharing.html
This is an option where the JVM saves a prepared image of the runtime classes to a file, to allow quicker loading (and sharing) of those at the next start. You can control this with -Xshare:on or -Xshare:off with a Sun JVM. The default is -Xshare:auto which will load the shared classes image if present, and if not present it will write it at first startup if the directory is write able.
With IBM Java 5 this is BTW even more powerful:
http://www.ibm.com/developerworks/java/library/j-ibmjava4/
I don't know of any mainstream JVM which is saving JIT statistics.

Java JVM (actually might change from different implementations of the JVM) when first started out will interpret the byte code. Once it detects that the code will be running enough number of times JITs it to native machine language so it runs faster.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.