I always come across articles which claim that Java is interpreted. I know that Oracle's HotSpot JRE provides just-in-time compilation, but is this the case for the majority of desktop users? For example, if I download Java via http://www.java.com/en/download, will this include a JIT compiler?
Yes, absolutely. Articles claiming Java is interpreted are typically written by people who either don't understand how Java works or don't understand what interpreted means.
Having said that, HotSpot will interpret code sometimes - and that's a good thing. There are definitely portions of any application (around startup, usually) which are only executed once. If you can interpret that faster than you can JIT compile it, why bother with the overhead? On the other hand, my experience of "Java is interpreted" articles is that this isn't what they mean :)
EDIT: To take in T. J. Crowder's point: yes, the JVM downloaded from java.com will be HotSpot. There are two different JITs for HotSpot, however - server and desktop. To sum up the differences in a single sentence: the desktop JIT is designed to start apps quickly, whereas the server JIT is more focused on high performance over time. Server apps typically run for a very long time, so time spent optimising them really heavily pays off in the long run.
There is nothing in the JVM specification that mandates any particular execution strategy. Some JVMs only interpret; they don't even have a compiler. Some JVMs only JIT compile; they don't even have an interpreter. Some JVMs have both an interpreter and a compiler (or even multiple compilers) and statically choose between the two on startup. Some have both and dynamically switch back and forth during runtime. Some aren't even virtual machines in the usual sense of the word at all; they just statically compile JVM bytecode into native machine code ahead of time.
The particular JVM that you are asking about, Oracle's HotSpot JVM, has one interpreter and two compilers, called the C1 and C2 compiler, also colloquially known as the client and server compilers, after their corresponding commandline options. HotSpot dynamically switches back and forth between the interpreter and one of the compilers at runtime (but it will not switch between the two compilers, you have to specify one of them on the commandline and then only that one will be used for the entire runtime of the JVM).
As per the documentation: "Starting with some of the later Java SE 7 releases, a new feature called tiered compilation became available. This feature uses the C1 compiler mode at the start to provide better startup performance. Once the application is properly warmed up, the C2 compiler mode takes over to provide more-aggressive optimizations and, usually, better performance."
The C1 compiler is an optimizing compiler which is pretty fast and doesn't use a lot of memory. The C2 compiler is much more aggressively optimizing, but is also slower and uses more memory.
You select between the two by specifying the -client and -server command-line options (-client is the default if you don't specify one). This also sets a couple of other JVM parameters, such as the default JIT threshold: in -client mode, methods are compiled after they have been interpreted 1500 times; in -server mode, after 10000 times. The threshold can be set explicitly with the -XX:CompileThreshold command-line argument.
Whether or not "the majority of desktop users" actually will run in compiled or interpreted mode depends largely on what code they are running. My guess is that the vast majority of desktop users run the HotSpot JVM from Oracle's JRE/JDK or one of its forks (e.g. SoyLatte on OSX, IcedTea or OpenJDK on Unix/BSD/Linux) and they don't fiddle with the commandline options, so they will probably get the C1 compiler with the default 1500 JIT threshold. (But applications such as IntelliJ, Eclipse or NetBeans have their own launcher scripts that usually supply different commandline arguments.)
In my case, for example, I often run small scripts which never actually reach the JIT threshold, so they are never compiled. (Nor should they be.)
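If you want to watch this happening on your own machine, here is a minimal sketch (the class and method names are mine, but -XX:+PrintCompilation is a standard HotSpot flag). Run it with java -XX:+PrintCompilation HotLoop and you should see compilation lines appear (for square() and/or an OSR compilation of main) once the invocation counts cross the JIT threshold:

// HotLoop.java - a tiny program whose inner method crosses the JIT threshold.
public class HotLoop {
    // Becomes "hot" after enough invocations (1500 in -client mode by default).
    static int square(int x) {
        return x * x;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {   // well past any default threshold
            sum += square(i);
        }
        System.out.println(sum);              // keep the result live so the loop isn't eliminated
    }
}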
Some of these links about the Hotspot JVM (what you are downloading in the java.com download link above) might help:
Java SE HotSpot at a Glance
The Java HotSpot Performance Engine Architecture
Frequently Asked Questions About the Java HotSpot VM
Neither of the (otherwise-excellent) answers so far seems to have actually answered your last question, so: Yes, the Java runtime you downloaded from www.java.com is Oracle's (Sun's) Hotspot JVM, and so yes, it will do JIT compilation. HotSpot isn't just for servers or anything like that, it runs on desktops and takes full advantage of its (very mature) optimizing JIT compiler.
The JVM spec never dictates how to execute Java bytecode. However, if you use the HotSpot VM, you can specify which JIT compiler to use; JIT is just a technique to optimize bytecode execution.
The JVM involves:
1) compiler threads compiling bytecode to native code at runtime, and
2) a linking phase, similar to linking .o files into an executable (as in C).
Compilation and dynamic linking are not a one-time job after the JVM is initially launched; they are part of the life cycle of an executable running in production.
Why is the Java compiler designed to generate platform-independent code (bytecode)? That shifts the performance cost onto the Java runtime, which must compile and link the bytecode on every production machine.

Aren't we spending an extra bunch of CPU cycles on processing (compilation/linking) at runtime, on every production machine the customer owns? The customer does not need to care about "compile once, run everywhere", and the cost could be avoided with 2-3 build machines producing platform-specific binaries.
Edit: Below is an answer to the original question interpreted as "What can be done to minimize the impact of Just-In-Time compilation on the Java Virtual Machine?"
Use interpreter only
If you want to disable compilation altogether you can use Java's -Xint argument. However, I think this will result in a significant performance degradation in almost all cases.
Reduce JIT compiler thread priority
You can reduce the priority of the JIT compiler threads using -XX:CompilerThreadPriority=<n>. The values you can use there are OS-dependent.
Reduce number of JIT compiler threads
If you are concerned by the number of JIT compiler threads, you can use the -XX:CICompilerCount=<n> and -XX:[+|-]CICompilerCountPerCPU flags to control the number of compiler threads.
If CICompilerCountPerCPU is true (-XX:+CICompilerCountPerCPU), HotSpot will use some formula to decide how many compiler threads should be started (~ (log n * log log n) * 3 / 2 where n = # of available CPUs).
If CICompilerCount is set HotSpot will use that many compiler threads.
If you don't set anything, CICompilerCountPerCPU will automatically be set to true.
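For a feel of what that formula yields, here is a quick sketch that just evaluates the expression as quoted above; the exact expression and clamping inside HotSpot may differ, so treat the numbers as illustrative only:

// CompilerThreads.java - evaluate the quoted heuristic:
// threads ~ (log n * log log n) * 3 / 2, n = number of CPUs.
public class CompilerThreads {
    static int threads(int cpus) {
        double v = Math.log(cpus) * Math.log(Math.log(cpus)) * 3.0 / 2.0;
        return Math.max(1, (int) Math.round(v)); // clamp: the VM never starts zero threads
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {2, 4, 8, 16, 32, 64, 128}) {
            System.out.println(cpus + " CPUs -> ~" + threads(cpus) + " compiler threads");
        }
    }
}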
Tweak compilation policies
You can increase the compilation thresholds so that fewer methods get compiled. If you are using tiered compilation (the default nowadays) you can use:
-XX:Tier3CompileThreshold=<n> (defaults to 2000)
-XX:Tier3InvocationThreshold=<n> (defaults to 200)
-XX:Tier4CompileThreshold=<n> (defaults to 15000)
-XX:Tier4InvocationThreshold=<n> (defaults to 5000)
The tiered compilation policy has many other knobs.
In particular, you can restrict compilation to only some of the tiers with -XX:TieredStopAtLevel=<n>, where n is between 1 and 4. Higher tiers generally provide better performance but require longer compilation times.
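A rough way to feel the effect of these knobs is a tiny timing loop like the sketch below (the class name is mine; the timings are only indicative, not a rigorous benchmark). Comparing a plain run against java -Xint TieredDemo or java -XX:TieredStopAtLevel=1 TieredDemo should show clearly different per-round times:

// TieredDemo.java - watch per-round times drop as the JIT kicks in.
public class TieredDemo {
    static long work() {
        long acc = 0;
        for (int i = 0; i < 10_000_000; i++) {
            acc += i % 7;                     // cheap, JIT-friendly arithmetic
        }
        return acc;
    }

    public static void main(String[] args) {
        for (int round = 1; round <= 5; round++) {
            long t0 = System.nanoTime();
            long result = work();
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.println("round " + round + ": " + ms + " ms (result " + result + ")");
        }
    }
}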
The reason Java doesn't generate platform-specific executables is portability. The point of Java is "write once, run anywhere." One should be able to write a Java program on a Windows computer, compile it on Linux, and run it on a Mac, with no problem whatsoever. The person(s) who write the JVM for the target platform are the ones doing the work of porting it to the new platform, not you.
But the JVM is really good at optimizing, and can translate heavily used code from bytecode to platform-specific native code. It only does this where needed, if it will improve performance. Like an optimizing compiler, the JVM knows best; don't undermine its capabilities unless you have a specific use case.
On the EN Wiki I read that both C# and Java are interpreted languages, but at least for C# I think this is not true.
Many interpreted languages are first compiled to some form of virtual machine code, which is then either interpreted or compiled at runtime to native code.
From my understanding, it is compiled into CIL and, when run, it's compiled by the JIT for the target platform. I have also read that the JIT is an interpreter; is that really so?
Or are they called interpreted as they are using intermediate code? I do not understand it.
Thanks
JIT is a form of compilation to native (machine) code. Typically (but not as a necessity), implementations of both the CLI and the JVM compile in two steps:
the language compiler compiles code to something intermediate (IL/bytecode)
the JIT compiles that to native/machine code at runtime
However, interpreters for both do exist. The Micro Framework operates as an IL interpreter, for example. Equally, tools like NGEN (.NET) and "AOT" (Mono) allow compilation to native/machine code up front.
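To make the two steps concrete on the JVM side, here is a trivial example (the class name is mine). Step one, javac, produces stack-machine bytecode you can inspect with javap -c; step two, the JIT, turns that bytecode into machine code at runtime:

// AddDemo.java - step 1: javac compiles this to bytecode.
public class AddDemo {
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
// After "javac AddDemo.java", "javap -c AddDemo" shows add() roughly as
// (exact output varies by javac version):
//   iload_0    // push a
//   iload_1    // push b
//   iadd       // add the two stack operands
//   ireturn    // return the result
// Step 2 happens at runtime: the JIT translates this bytecode into native
// machine code once the method becomes hot.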
They are considered JIT-compiled languages, which is different from being interpreted. The JIT simply compiles to native code when needed during execution. The common strategy is to compile into an intermediate representation (bytecode) beforehand, which makes the JIT faster.
However, there is nothing that prevents them from being interpreted, or even statically compiled. Languages are simply languages - how they are executed is irrelevant from a language perspective.
On the EN Wiki I read that both C# and Java are interpreted languages
Can you please provide the link?
Maybe the word "interpreted" means something different here. Perhaps it means that these languages are first translated from source code into platform-independent (VM-specific) code.
are they called interpreted as they are using intermediate code
I too think so.
I have also read that JIT is an interpreter
JIT is a compiler. See this
Whether something is an "interpreter" or not depends on the context of the discussion.

From a purely abstract view, an interpreter can be defined as any intermediate program, present at runtime, that dynamically translates program code written in one language into the target code of another software or hardware platform. Think about running Java bytecode on x86 hardware, or running Python on the CLR VM, which is exactly what IronPython is. In this view, every virtual machine is an interpreter of some kind. As a program present at runtime, it clearly differs from static compilers or hardware-implemented VMs.

Now there are many different ways to achieve this functionality, where the emphasis is on "dynamically" and "present at runtime".

In discussions where the implementation of the VM matters, people make a clear distinction between a "classical" interpreter and a JIT-ed one. A classical interpreter is something which, for every instruction of the hosted program, executes a routine of target code. This design is simple to build but hard to optimize. A JIT-ed design reads a batch of instructions of the original code and translates all of them into a single natively compiled routine, so it "interprets" faster. It is like a micro static compiler within the VM. There are many different ways to accomplish the behavior labeled as JIT, and then there are other approaches, like tracing compilers.
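To make the distinction concrete, here is a toy "classical" interpreter in Java: one fetch-decode dispatch per instruction, each executing a small routine. The opcodes and the little program are invented purely for illustration; a JIT would instead translate the whole code[] array into one native routine up front, eliminating the per-instruction dispatch:

// ToyInterpreter.java - a switch-dispatch interpreter for a made-up stack machine.
import java.util.ArrayDeque;
import java.util.Deque;

public class ToyInterpreter {
    static final int PUSH = 0, ADD = 1, MUL = 2, PRINT = 3, HALT = 4;

    public static void main(String[] args) {
        // Program: push 2, push 3, add, push 4, mul, print -> (2+3)*4 = 20
        int[] code = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, PRINT, HALT};
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            switch (code[pc++]) {               // one fetch-decode per instruction
                case PUSH:  stack.push(code[pc++]); break;
                case ADD:   stack.push(stack.pop() + stack.pop()); break;
                case MUL:   stack.push(stack.pop() * stack.pop()); break;
                case PRINT: System.out.println(stack.peek()); break;
                case HALT:  return;
            }
        }
    }
}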
Modern VMs like the CLR and the HotSpot and J9 JVMs are even more complex than simple labels like JIT or interpreter can capture. They can be static compilers (AOT execution), classical interpreters and JIT-ed VMs at the same time.

For example, the CLR can compile code ahead of time (as a static compiler) and store the native code as a bunch of more or less executable files on disk, to be used for faster future startups of the hosted program. I believe "ngen" is the AOT process used on Windows for this functionality. If AOT is not used, the CLR behaves as a JIT VM.

J9 and HotSpot are able to switch at runtime between purely interpreted and JIT-ed execution, depending on code analysis and current load. So it is quite a gray area. J9 even has AOT functionality similar to the CLR's.

Some other VMs, like the Maxine JVM or PyPy, are so-called "metacircular" VMs. This means they are (mostly) implemented in the same language they host (Maxine is a JVM written in Java). In order to produce good code, they usually have some JIT-like behavior implemented in the host language, which is then bootstrapped and optimized by a very low-level, close-to-the-machine interpreter.

So the actual definition of "interpreter" varies with the context of the discussion. When labels like JIT are used, the discussion is clearly focused on the implementation details of the VM in question.
My understanding is that the Java bytecode produced by invoking javac is independent of the underlying operating system, but the HotSpot compiler will perform platform-specific JIT optimizations and compilations as the program is running.
However, I compiled code on Windows under a 32 bit JDK and executed it on Solaris under a 32 bit JVM (neither OS is a 64 bit operating system). The Solaris x86 box, to the best of my knowledge (working to confirm the specs on it) should outperform the Windows box in all regards (number of cores, amount of RAM, hard disk latency, processor speed, and so on). However, the same code is running measurably faster on Windows (a single data point would be a 7.5 second operation on Windows taking over 10 seconds on Solaris) on a consistent basis. My next test would be to compile on Solaris and note performance differences, but that just doesn't make sense to me, and I couldn't find any Oracle documentation that would explain what I'm seeing.
Given the same version (major, minor, release, etc.) of the JVM on two different operating systems, would invoking javac on the same source files result in different optimizations within the Java bytecode (the .class files produced)? Is there any documentation that explains this behavior?
No. javac does not perform different optimizations on different platforms.

See the Oracle "tools" page (where javac and other tools are described):
Each of the development tools comes in a Microsoft Windows version (Windows) and a Solaris or Linux version. There is virtually no difference in features between versions. However, there are minor differences in configuration and usage to accommodate the special requirements of each operating system. (For example, the way you specify directory separators depends on the OS.)
(Maybe the Solaris JVM is slower than the windows JVM?)
The compilation output should not depend on the OS on which javac was called.

If you want to verify this, try:
me@windows> javac Main.java
me@windows> javap -c Main > Main.win.txt
me@linux> javac Main.java
me@linux> javap -c Main > Main.lin.txt
me@linux> diff Main.win.txt Main.lin.txt
I decided to google it anyway. ;)
http://java.sun.com/docs/white/platform/javaplatform.doc1.html
The Java Platform is a new software platform for delivering and running highly interactive, dynamic, and secure applets and applications on networked computer systems. But what sets the Java Platform apart is that it sits on top of these other platforms, and executes bytecodes, which are not specific to any physical machine, but are machine instructions for a virtual machine. A program written in the Java Language compiles to a bytecode file that can run wherever the Java Platform is present, on any underlying operating system. In other words, the same exact file can run on any operating system that is running the Java Platform. This portability is possible because at the core of the Java Platform is the Java Virtual Machine.
Written April 30, 1996.
A common mistake, especially if you have developed for C/C++, is to assume that the compiler optimises the code. It does one and only one optimisation, which is to evaluate compile-time-known constants.

It is certainly true that the compiler is nowhere near as powerful as you might imagine, because it just validates the code and produces byte-code which matches your code as closely as possible.

This is because the byte-code is for an idealised virtual machine which, in theory, doesn't need any optimisations. Hopefully, when you think about it that way, it makes sense that the compiler doesn't do much at all: it doesn't know how the code will actually be used.
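A quick illustration of that one optimisation (the class name is mine): javac folds constant expressions at compile time, so the arithmetic below never survives into the class file:

// Fold.java - javac evaluates compile-time constants itself.
public class Fold {
    // Folded by javac: the class file contains the literal 86400,
    // not the multiplications.
    static final int SECONDS_PER_DAY = 60 * 60 * 24;

    public static void main(String[] args) {
        // "javap -c Fold" shows the constant being pushed directly
        // (via ldc or similar, depending on the value and javac version).
        System.out.println(SECONDS_PER_DAY);
    }
}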
Instead, all the optimisation is performed by the JIT in the JVM. This is entirely platform-dependent and can produce 32-bit or 64-bit code using the exact instructions of the processor running the code. It will also optimise the code based on how it is actually used, something a static compiler cannot do. This means the code can be re-compiled more than once based on different usage patterns. ;)
To my understanding, javac only considers the -target argument to decide what bytecode to emit; hence there is nothing platform-specific in the bytecode generation.
All the optimization is done by the JVM, not the compiler, when interpreting the byte codes. This is specific to the individual platform.
Also, I've read somewhere that the Solaris JVM is the reference implementation, which is then ported to Windows. Hence the Windows version is more optimized than the Solaris one.
Does javac perform any bytecode level optimizations depending on the underlying operating system?
No.
Determining why the performance characteristics of your program are different on two platforms requires profiling them under the same workload, and careful analysis of method execution times and memory allocation/GC behavior. Does your program do any I/O?
To extend on dacwe's part "Maybe the Solaris JVM is slower than the windows JVM?"
There are configuration options (e.g., whether to use the client or server vm [link], and probably others as well), whose defaults differ depending on the OS. So that might be a reason why the Solaris VM is slower here.
I know Microsoft .NET uses the CLR as a JIT compiler, while Java has HotSpot. What are the differences between them?
They are very different beasts. As people pointed out, the CLR compiles to machine code before it executes a piece of MSIL. In addition to the typical dead-code elimination and inlining of private methods, this allows it to take advantage of the particular CPU architecture of the target machine (though I'm not sure whether it does). This also incurs a hit for each class (though the compiler is fairly fast, and many platform libraries are just a thin layer over the Win32 API).
The HotSpot VM takes a different approach. It stipulates that most of the code is executed rarely, hence it's not worth spending time compiling it. All bytecode starts in interpreted mode. The VM keeps statistics at call sites and tries to identify methods which are called more than a predefined number of times. Then it compiles only these methods with a fast JIT compiler (C1) and swaps the method in while it is running (that's the special sauce of HotSpot). After the C1-compiled method has been invoked some more times, the same method is compiled with the slow but sophisticated compiler (C2) and the code is swapped again on the fly.
Since HotSpot can swap methods while they are running, the VM's compilers can perform speculative optimizations that are unsafe in statically compiled code. A canonical example is static dispatch / inlining of monomorphic calls (a polymorphic method with only one implementation in use). This is done if the VM sees that the method always resolves to the same target. What used to be a complex invocation is reduced to a guard of a few CPU instructions, which is predicted and pipelined by modern CPUs. When the guard condition stops being true, the VM can take a different code path or even drop back to interpreted mode. Based on statistics and program workload, the generated machine code can differ at different times. Many of these optimizations rely on information gathered during program execution and are not possible if you compile once when you load the class.
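Here is a hedged sketch of such a call site (the class names are invented): area() is polymorphic in the language, but if the VM only ever observes one implementation, it can inline it behind a cheap type guard:

// Devirt.java - a call site HotSpot can speculatively devirtualize.
interface Shape {
    double area();
}

class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

public class Devirt {
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();   // monomorphic in practice -> inlined behind a guard
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[100_000];
        for (int i = 0; i < shapes.length; i++) shapes[i] = new Circle(i % 10);
        System.out.println(total(shapes));
        // If a second Shape implementation is loaded and used later,
        // the guard fails and the VM deoptimizes and recompiles the code.
    }
}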
This is why you need to warm up the JVM and emulate a realistic workload when you benchmark algorithms (skewed data can lead to an unrealistic assessment of the optimizations). Other optimizations include lock elision, adaptive spin-locking, escape analysis and stack allocation, etc.
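In code, that warm-up advice looks something like the sketch below (a hand-rolled illustration only; for real measurements use a proper harness such as JMH):

// WarmupBench.java - run the workload until the JIT has compiled it,
// then take measurements.
public class WarmupBench {
    static long workload() {
        long acc = 0;
        for (int i = 0; i < 5_000_000; i++) {
            acc += Integer.bitCount(i);        // arbitrary CPU-bound work
        }
        return acc;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            workload();                        // warm-up: let the JIT profile and compile
        }
        for (int i = 0; i < 5; i++) {          // measured rounds: compiled code by now
            long t0 = System.nanoTime();
            long r = workload();
            System.out.println(((System.nanoTime() - t0) / 1_000_000) + " ms (result " + r + ")");
        }
    }
}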
That said, HotSpot is only one of the VMs. JRockit, Azul, IBM's J9 and the Resettable RVM all have different performance profiles.
I realize the benefits of bytecode vs. native code (portability).
But say you always know that your code will run on an x86 architecture: why not then compile for x86 and get the performance benefit?
Note that I am assuming there is a performance gain to native code compilation. Some folks have answered that there could in fact be no gain, which is news to me.
Because the performance gain (if any) is not worth the trouble.
Also, garbage collection is very important for performance. Chances are that the GC of the JVM is better than the one embedded in the compiled executable, say with GCJ.
And just-in-time compilation can even result in better performance, because the JIT has more information available at run-time to optimize the compilation than the compiler has at compile-time. See the Wikipedia page on JIT.
"Solaris" is an operating system, not a CPU architecture. The JVM installed on the actual machine will compile to the native CPU instructions. Solaris could be SPARC, x86, or x86-64 architecture.
Also, the JIT compiler can make processor-specific optimisations depending on which actual CPU family you have. For example, different instruction sequences are faster on Intel CPUs than on AMD CPUs, and a JIT compiler for your exact platform can take advantage of this information to produce highly optimised code.
The bytecode runs in a Java Virtual Machine that is compiled for (example) Solaris. It will be optimised like heck for that operating system.
In real-world cases, you often see equal or better performance from Java code at runtime, by virtue of building on the virtual machine's code for things like memory management - that code will have been evolving and maturing for years.
There are more benefits to building for the JVM than just portability - for example, every time a new JVM is released, your compiled bytecode gets any optimisations, algorithmic improvements, etc. that come from the best in the business. On the other hand, once you've compiled your C code, that's it.
Because with just-in-time compilation in place, there is only a trivial performance benefit to be gained.
Actually, there are many things a JIT can do faster.
It will already be compiled by the JIT into Solaris native code as it runs. You won't gain any additional benefit by compiling it before deploying it to the target site.
You may or may not get a performance benefit. But more likely you would get a performance penalty: JIT optimization is not possible with static compilation, so the performance would only be as good as the compiler can make it "blindfolded" (without actually profiling the program and optimizing it accordingly, which is what JIT compilers such as HotSpot do).
It's intuitively quite surprising how cheap (resource-wise) compiling is, and how much can be automatically optimized by just observing the running program. Black magic, but good for us :-)
All this talk of JITs is about seven years out of date, BTW. The technology concerned is now called HotSpot, and it isn't just a JIT.
"why not then compile for x86"
Because then you can't take advantage of the specific features of the particular CPU it runs on. In particular, if we read "compile for x86" as "produce native code that can run on a 386 and its descendants", the resulting code can't rely on even something as old as the MMX instructions.
As such, the end result is that you need to compile for every exact architecture it will run on (what about those that do not exist yet?), and have the installer select which executable to put into place. Alternatively, I hear the Intel C++ compiler will produce several versions of the same function, differing only in the CPU features used, and pick the right one at run-time based on what the CPU reports as available.
On the other hand, you can view bytecode as "half-compiled" source, similar to an intermediate format a native compiler will (unless asked) not actually write to disk. The runtime environment can then do the final compilation, knowing exactly what architecture will be used. This is the given reason why some C#/.NET code could slightly outperform C++ code on some CPU-intensive tasks in some benchmarks a while ago.
The "final compilation" of bytecode can also make additional optimization assumptions that are (from a static compilation perspective) distinctly unsafe*, and just recompile if those assumptions are later found to be wrong.
I guess it's because JIT (just-in-time) compilation is very advanced.