The JVM involves:
1) Compiler threads compiling bytecode to native code at runtime.
2) A linking phase, similar to the linking of .o files into an executable (as in C).
Compilation and dynamic linking are not a one-time job after the JVM is initially launched; they are part of the life-cycle of an executable running in production.
Why was the Java compiler designed to generate platform-independent code (bytecode)? That shifts the performance cost to the Java runtime, which has to compile and link the bytecode on every production machine.
Aren't we spending an extra bunch of CPU cycles on processing (compilation/linking) at runtime on every production machine the customer owns? The customer does not need to be aware of "compile once, run everywhere", which could instead be achieved with 2-3 build machines that produce platform-specific binaries.
Edit: Below is an answer to the original question interpreted as "What can be done to minimize the impact of Just-In-Time compilation on the Java Virtual Machine?"
Use interpreter only
If you want to disable compilation altogether, you can use Java's -Xint argument. However, I think that this will result in a significant performance degradation in almost all cases.
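For example (a hypothetical invocation; MyApp stands in for your main class):

    java -Xint MyApp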
Reduce JIT compiler thread priority
You can reduce the priority of the JIT compiler threads using -XX:CompilerThreadPriority=<n>. The values you can use there are OS-dependent.
Reduce number of JIT compiler threads
If you are concerned by the number of JIT compiler threads, you can use the -XX:CICompilerCount=<n> and -XX:[+|-]CICompilerCountPerCPU flags to control the number of compiler threads.
If CICompilerCountPerCPU is true (-XX:+CICompilerCountPerCPU), HotSpot will use a formula to decide how many compiler threads should be started (~ (log n * log log n) * 3 / 2, where n = # of available CPUs).
If CICompilerCount is set, HotSpot will use that many compiler threads.
If you don't set anything, CICompilerCountPerCPU will automatically be set to true.
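For example, to cap HotSpot at two compiler threads (a sketch; MyApp is a hypothetical main class):

    java -XX:CICompilerCount=2 MyApp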
Tweak compilation policies
You can increase the compilation thresholds so that fewer methods get compiled. If you are using tiered compilation (the default nowadays) you can use
-XX:Tier3CompileThreshold=<n> (defaults to 2000)
-XX:Tier3InvocationThreshold=<n> (defaults to 200)
-XX:Tier4CompileThreshold=<n> (defaults to 15000)
-XX:Tier4InvocationThreshold=<n> (defaults to 5000)
The tiered compilation policy has many other knobs.
In particular you can also only use some of the tiers with -XX:TieredStopAtLevel=<n> where n is between 1 and 4. Higher tiers generally provide better performance but require longer compilation times.
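As a sketch, the first hypothetical invocation below raises the tier 4 (C2) thresholds so fewer methods reach the most expensive compiler, and the second stops compilation at tier 1 entirely:

    java -XX:Tier4InvocationThreshold=10000 -XX:Tier4CompileThreshold=30000 MyApp
    java -XX:TieredStopAtLevel=1 MyApp

You can inspect the values the JVM actually settled on with -XX:+PrintFlagsFinal.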
The reason Java doesn't generate platform-specific executables is portability. The point of Java is "write once, run anywhere." One should be able to write a Java program on a Windows computer, compile it on a Linux machine, and run it on a Mac, with no problem whatsoever. The person(s) who write the JVM for the target platform are the ones doing the work of porting it to the new platform, not you.
But the JVM is really good at optimizing, and can translate heavily used code from bytecode to platform-specific native code. It only does this where needed, if it will improve performance. Like an optimizing compiler, the JVM knows best, so don't undermine its capabilities unless you have a specific use case.
Related
I'm building a java CLI utility application that processes some data from a file.
Apart from reading from a file, all the operations are done in-memory. The in-memory processing part is taking a surprisingly long time, so I tried profiling it but could not pinpoint any specific function that performed particularly badly.
I was afraid that the JIT was not able to optimize the program during a single run, so I benchmarked how the runtime changes across consecutive executions of the function containing all the program logic (including reading the input file), and sure enough, the runtime for the in-memory processing part goes down over several executions and is almost 10 times smaller by the 5th run.
I tried shuffling the input data before every execution, but it doesn't have any visible effect on this. I'm not sure whether some caching is responsible for this improvement or the JIT optimizations done during the program run, but since the program is usually run once at a time, it always shows the worst performance.
Would it be possible to somehow get good performance during the first run? Is there a generic way to optimize performance for short-running Java applications?
You probably cannot optimize startup time and performance by changing your application [1] [2]. And especially not for a small application [3]. And I certainly don't think there are "generic" ways to do it; i.e. optimizations that will work for all cases.
However, there are a couple of JVM features that should improve performance for a short-lived JVM.
Class Data Sharing (CDS) is a feature that allows loaded and verified classes to be cached in the file system (as a CDS archive) and then reused by later runs of your application. This feature has been available since Java 5 (though with limitations in earlier Java releases).
The CDS feature is controlled using the -Xshare JVM option.
-Xshare:dump generates a CDS archive during the run
-Xshare:off, -Xshare:on and -Xshare:auto control whether an existing CDS archive will be used.
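For example, on recent JDKs (13+) you can record an application-specific archive and then reuse it; app.jsa, app.jar and MyApp are hypothetical names in this sketch:

    java -XX:ArchiveClassesAtExit=app.jsa -cp app.jar MyApp
    java -XX:SharedArchiveFile=app.jsa -cp app.jar MyApp

The first run writes the archive on exit; subsequent runs load class metadata from it instead of re-parsing and re-verifying the classes.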
The other way to improve startup times for a HotSpot JVM is (was) to use Ahead Of Time (AOT) compilation. Basically, you compile your application to a native code binary using the jaotc command, and then run the executable it produces rather than the java command. The jaotc command is experimental and was introduced in Java 9.
It appears that jaotc was not included in the Java 16 builds published by Oracle, and is scheduled for removal in Java 17. (See JEP 410: Remove the Experimental AOT and JIT Compiler).
The current recommended way to get AOT compilation for Java is to use the GraalVM AOT Java compiler.
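For example, with GraalVM installed, something like this sketch produces a native executable that needs no JVM warm-up at all (app.jar and myapp are hypothetical names; app.jar is assumed to have a Main-Class entry):

    native-image -jar app.jar myapp
    ./myapp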
1 - You could convert it into a client-server application where the server is "up" all of the time. However, that has other problems, and doesn't eliminate the startup time issue for the client ... assuming that is coded in Java.
2 - According to @apangin, there are some other application tweaks that could make your code more JIT friendly, though it will depend on what your code is currently doing.
3 - It is conceivable that the startup time for a large (long running) monolithic application could be improved by refactoring it so that subsystems of the application can be loaded and initialized only when they are needed. However, it doesn't sound like this would work for your use-case.
You could have the small processing run as a service: when you need to run it, "just" make a network call to that service (easier if it's HTTP because there are easy ways to do it in Java; see the sketch after the list below). That way, the processing itself stays in the same JVM and will eventually get faster when the JIT kicks in.
Of course, because it could require significant development, that is only valid if the processing itself:
is called often
has arguments that are easy to pass to the service (usually serialized as strings)
does not require passing too much data to the service (e.g. several MB of binary content would be a problem)
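A minimal sketch of such a service using the JDK's built-in HTTP server (ProcessingService and process are hypothetical names; your actual logic would go in process):

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;

    public class ProcessingService {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/process", exchange -> {
                byte[] input = exchange.getRequestBody().readAllBytes();
                byte[] result = process(input); // hypothetical: your in-memory logic
                exchange.sendResponseHeaders(200, result.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(result);
                }
            });
            server.start(); // the JVM stays alive, so the JIT's work is not lost
        }

        static byte[] process(byte[] input) {
            return input; // placeholder for the real processing
        }
    }

Each invocation then becomes an HTTP POST to http://localhost:8080/process instead of a fresh JVM launch.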
I am a beginner in a Java programming course and so far this is what I have understood about how a Java program is compiled and executed. Stated briefly:
1) The source code (.java) file is converted into bytecode (.class), which is an intermediate code, by the Java compiler.
2) This bytecode (.class) file is platform independent, so woosh... I can copy it and take it to a machine on a different platform which has a JVM.
3) When I run the bytecode, the JVM (which is part of the JRE) first verifies the bytecode, then calls out to the JIT, which makes optimizations at runtime since it has access to dynamic runtime information.
4) And finally the JVM interprets the intermediate code into a series of machine instructions for the processor to execute. (A processor can't execute the bytecode directly since it is not native code.)
Is my understanding correct? Anything that needs to be added or corrected?
Taking each of your points in turn:
1) This is correct. Java source is compiled by javac (although other tools could do the same thing) and class files are generated.
2) Again, correct. Class files contain platform-neutral bytecodes. These are loosely an instruction set for a 'virtual' machine (i.e. the JVM). This is how Java implements the "write once, run anywhere" idea it's had since it was launched.
3) Partially correct. When the JVM needs to load a class it runs a four-phase verification on the bytecodes of that class to ensure that the format of the bytecodes is legal in terms of the JVM. This is to prevent bytecode sequences being generated that could potentially subvert the JVM (i.e. virus-like behaviour). The JVM does not, however, run the JIT at this point. When bytecodes are executed they start in interpreted mode. Each bytecode is converted on the fly to the required native instructions and OS system calls.
4) This is sort of wrong when combined with point 3.
Here's the process explained briefly:
As the JVM interprets the bytecodes of the application it also profiles which groups of bytecodes are being run frequently. If you have a loop that repeatedly calls a method the JVM will notice this and identify that this is a hotspot in your code (hence the name of the Oracle JVM). Once a method has been called enough times (which is tunable), the JVM will call the Just In Time (JIT) compiler to generate native instructions for that method. When the method is called again the native code is used, eliminating the need for interpreting and thus improving the speed of the application. This profiling phase is what leads to the 'warm-up' behaviour of a Java application where relevant sections of the code are gradually compiled into native instructions.
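You can see this warm-up effect with a toy benchmark like the following sketch (WarmupDemo and work are hypothetical names; timings are illustrative and vary by machine):

    public class WarmupDemo {
        public static void main(String[] args) {
            for (int run = 0; run < 5; run++) {
                long start = System.nanoTime();
                long sum = 0;
                for (int i = 0; i < 10_000_000; i++) {
                    sum += work(i); // hot method: the JVM profiles it and eventually JIT-compiles it
                }
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("run %d: %d ms (sum=%d)%n", run, ms, sum);
            }
        }

        static long work(int i) {
            return (i * 31L) ^ (i >>> 3);
        }
    }

The first run is typically the slowest; later runs speed up as work and the loop around it get compiled to native code.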
For OpenJDK based JVMs there are two JIT compilers, C1 and C2 (sometimes called client and server). The C1 JIT will warm-up more quickly but have a lower optimum level of performance. C2 warms-up more slowly but applies a greater level of optimisation to the code, giving a higher overall performance level.
The JVM can also throw away compiled code, either because it hasn't been used for a long time (like in a cache) or because an assumption that the JIT made (called a speculative optimisation) turns out to be wrong. This is called a deopt and results in the JVM going back to interpreted mode, reprofiling the code and potentially recompiling it with the JIT.
First and foremost, Java is only a programming language. That means you could (theoretically) run a compiler to generate a native binary instead of this bytecode. (See: Compiling a java program into an executable)
The other thing I should mention are Java processors, which are able to execute Java bytecode directly... because it's their native instruction set. (See: https://en.wikipedia.org/wiki/Java_processor)
I always come across articles which claim that Java is interpreted. I know that Oracle's HotSpot JRE provides just-in-time compilation, however is this the case for a majority of desktop users? For example, if I download Java via: http://www.java.com/en/download, will this include a JIT Compiler?
Yes, absolutely. Articles claiming Java is interpreted are typically written by people who either don't understand how Java works or don't understand what interpreted means.
Having said that, HotSpot will interpret code sometimes - and that's a good thing. There are definitely portions of any application (around startup, usually) which are only executed once. If you can interpret that faster than you can JIT compile it, why bother with the overhead? On the other hand, my experience of "Java is interpreted" articles is that this isn't what they mean :)
EDIT: To take T. J. Crowder's point in: yes, the JVM downloaded from java.com will be HotSpot. There are two different JITs for HotSpot, however - server and desktop. To sum up the differences in a single sentence, the desktop JIT is designed to start apps quickly, whereas the server JIT is more focused on high performance over time: server apps typically run for a very long time, so time spent optimising them really heavily pays off in the long run.
There is nothing in the JVM specification that mandates any particular execution strategy. Some JVMs only interpret; they don't even have a compiler. Some JVMs only JIT compile; they don't even have an interpreter. Some JVMs have both an interpreter and a compiler (or even multiple compilers) and statically choose between the two on startup. Some have both and dynamically switch back and forth during runtime. Some aren't even virtual machines in the usual sense of the word at all; they just statically compile JVM bytecode into native machine code ahead-of-time.
The particular JVM that you are asking about, Oracle's HotSpot JVM, has one interpreter and two compilers, called the C1 and C2 compiler, also colloquially known as the client and server compilers, after their corresponding commandline options. HotSpot dynamically switches back and forth between the interpreter and one of the compilers at runtime (but it will not switch between the two compilers, you have to specify one of them on the commandline and then only that one will be used for the entire runtime of the JVM).
As per the documentation here: starting with some of the later Java SE 7 releases, a new feature called tiered compilation became available. This feature uses the C1 compiler mode at the start to provide better startup performance. Once the application is properly warmed up, the C2 compiler mode takes over to provide more aggressive optimizations and, usually, better performance.
The C1 compiler is an optimizing compiler which is pretty fast and doesn't use a lot of memory. The C2 compiler is much more aggressively optimizing, but is also slower and uses more memory.
You select between the two by specifying the -client and -server commandline options (-client is the default if you don't specify one), which also sets a couple of other JVM parameters like the default JIT threshold (in -client mode, methods will be compiled after they have been interpreted 1500 times, in -server mode after 10000 times, can be set with the -XX:CompileThreshold commandline argument).
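For instance, this hypothetical invocation selects the server compiler and lowers its threshold so that methods get compiled sooner (MyApp stands in for your main class):

    java -server -XX:CompileThreshold=5000 MyApp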
Whether or not "the majority of desktop users" actually will run in compiled or interpreted mode depends largely on what code they are running. My guess is that the vast majority of desktop users run the HotSpot JVM from Oracle's JRE/JDK or one of its forks (e.g. SoyLatte on OSX, IcedTea or OpenJDK on Unix/BSD/Linux) and they don't fiddle with the commandline options, so they will probably get the C1 compiler with the default 1500 JIT threshold. (But applications such as IntelliJ, Eclipse or NetBeans have their own launcher scripts that usually supply different commandline arguments.)
In my case, for example, I often run small scripts which never actually reach the JIT threshold, so they are never compiled. (Nor should they be.)
Some of these links about the Hotspot JVM (what you are downloading in the java.com download link above) might help:
Java SE HotSpot at a Glance
The Java HotSpot Performance Engine Architecture
Frequently Asked Questions About the Java HotSpot VM
Neither of the (otherwise-excellent) answers so far seems to have actually answered your last question, so: Yes, the Java runtime you downloaded from www.java.com is Oracle's (Sun's) Hotspot JVM, and so yes, it will do JIT compilation. HotSpot isn't just for servers or anything like that, it runs on desktops and takes full advantage of its (very mature) optimizing JIT compiler.
The JVM spec never dictates how to execute the Java bytecode; however, you can specify a JIT compiler if you use the HotSpot VM. JIT is just a technique to optimize bytecode execution.
I know Microsoft .NET uses the CLR as a JIT compiler, while Java has HotSpot. What are the differences between them?
They are very different beasts. As people pointed out, the CLR compiles to machine code before it executes a piece of MSIL. This allows it, in addition to typical optimizations like dead-code elimination and inlining of private methods, to take advantage of the particular CPU architecture of the target machine (though I'm not sure whether it does). This also incurs a hit for each class (though the compiler is fairly fast and many platform libraries are just a thin layer over the Win32 API).
The HotSpot VM takes a different approach. It stipulates that most of the code is executed rarely, hence it's not worth spending time compiling it. All bytecode starts in interpreted mode. The VM keeps statistics at call-sites and tries to identify methods which are called more than a predefined number of times. Then it compiles only those methods with a fast JIT compiler (C1) and swaps the method in while it is running (that's the special sauce of HotSpot). After the C1-compiled method has been invoked some more times, the same method is compiled with the slow but sophisticated compiler (C2) and the code is swapped in again on the fly.
Since HotSpot can swap methods while they are running, the VM compilers can perform some speculative optimizations that are unsafe in statically compiled code. A canonical example is static dispatch / inlining of monomorphic calls (a polymorphic method with only one implementation). This is done if the VM sees that the method always resolves to the same target. What used to be a complex invocation is reduced to a guard of a few CPU instructions, which are predicted and pipelined by modern CPUs. When the guard condition stops being true, the VM can take a different code path or even drop back to interpreting mode. Based on statistics and program workload, the generated machine code can be different at different times. Many of these optimizations rely on information gathered during the program execution and are not possible if you compile once when you load the class.
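A minimal sketch of the monomorphic case (Shape, Circle and MonomorphicDemo are hypothetical names; actual inlining decisions are up to the JVM):

    interface Shape { double area(); }

    class Circle implements Shape {
        private final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    public class MonomorphicDemo {
        public static void main(String[] args) {
            Shape s = new Circle(2.0);
            double total = 0;
            // As long as Circle is the only Shape implementation ever seen here,
            // HotSpot can devirtualize and inline area() behind a cheap guard.
            for (int i = 0; i < 1_000_000; i++) {
                total += s.area();
            }
            System.out.println(total);
            // Loading and calling through another Shape implementation later
            // could invalidate that assumption and trigger a deoptimization.
        }
    }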
This is why you need to warm up the JVM and emulate a realistic workload when you benchmark algorithms (skewed data can lead to an unrealistic assessment of the optimizations). Other optimizations are lock elision, adaptive spin-locking, escape analysis and stack allocation, etc.
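For example, a harness like JMH does the warm-up for you; this is a minimal sketch assuming the org.openjdk.jmh dependency is on the classpath, with computeSomething standing in for the code under test:

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Warmup;

    public class MyBenchmark {
        @Benchmark
        @Warmup(iterations = 5)       // let the JIT compile the hot path first
        @Measurement(iterations = 10) // only these iterations are reported
        @Fork(1)
        public long measure() {
            return computeSomething();
        }

        static long computeSomething() {
            long sum = 0;
            for (int i = 0; i < 10_000; i++) sum += i * 31L;
            return sum;
        }
    }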
That said, HotSpot is only one of the VMs. JRockit, Azul, IBM's J9 and the Resettable RVM all have different performance profiles.
I realize the benefits of bytecode vs. native code (portability).
But say you always know that your code will run on a x86 architecture, why not then compile for x86 and get the performance benefit?
Note that I am assuming there is a performance gain to native code compilation. Some folks have answered that there could in fact be no gain, which is news to me.
Because the performance gain (if any) is not worth the trouble.
Also, garbage collection is very important for performance. Chances are that the GC of the JVM is better than the one embedded in the compiled executable, say with GCJ.
And just-in-time compilation can even result in better performance, because the JIT has more information available at run-time to optimize the compilation than the compiler has at compile-time. See the Wikipedia page on JIT.
"Solaris" is an operating system, not a CPU architecture. The JVM installed on the actual machine will compile to the native CPU instructions. Solaris could be SPARC, x86, or x86-64 architecture.
Also, the JIT compiler can make processor-specific optimisations depending on which actual CPU family you have. For example, different instruction sequences are faster on Intel CPUs than on AMD CPUs, and a JIT compiler for your exact platform can take advantage of this information to produce highly optimised code.
The bytecode runs in a Java Virtual Machine that is compiled for (example) Solaris. It will be optimised like heck for that operating system.
In real-world cases, you often see equal or better performance from Java code at runtime, by virtue of building on the virtual machine's code for things like memory management - that code will have been evolving and maturing for years.
There's more benefits to building for the JVM than just portability - for example, every time a new JVM is released your compiled bytecode gets any optimisations, algorithmic improvements etc. that come from the best in the business. On the other hand, once you've compiled your C code, that's it.
Because with Just-In-Time compilation, the performance benefit of compiling ahead of time is trivial.
Actually, there are many things a JIT can do faster.
It will already be compiled by the JIT into Solaris native code as it runs. You can't gain any other benefit by compiling it before deploying it to the target site.
You may or may not get a performance benefit. But more likely you would get a performance penalty: JIT optimization is not possible with static compilation, so the performance would be only as good as the compiler can make it "blindfolded" (without actually profiling the program and optimizing it accordingly, which is what JIT compilers such as HotSpot do).
It's intuitively quite surprising how cheap (resource-wise) compiling is, and how much can be automatically optimized by just observing the running program. Black magic, but good for us :-)
All this talk of JITs is about seven years out of date BTW. The technology concerned now is called HotSpot and it isn't just a JIT.
"why not then compile for x86"
Because then you can't take advantage of the specific features of the particular CPU it gets run on. In particular, if we are to read "compile for x86" as "produce native code that can run on a 386 and its descendants", then the resulting code can't rely on even something as old as the MMX instructions.
As such, the end result is that you need to compile for every exact architecture it'll run on (what about those that do not exist yet?), and have the installer select which executable to put into place. Or, I hear the Intel C++ compiler will produce several versions of the same function, differing only in the CPU features used, and pick the right one at run-time based on what the CPU reports as available.
On the other hand, you can view bytecode as a "half-compiled" source, similar to an intermediate format a native compiler will (unless asked) not actually write to disk. The runtime environment can then do the final compilation, knowing exactly what architecture will be used. This is the given reason why some C#/.NET code could slightly outperform C++ code on some CPU-intensive tasks in some benchmarks a while ago.
The "final compilation" of bytecode can also make additional optimization assumptions that are (from a static compilation perspective) distinctly unsafe*, and just recompile if those assumptions are found to be wrong later.
I guess because JIT (just in time) compilation is very advanced.