If JVM generates machine code, then where are the code files?

I read some material about the JVM and bytecode, and I think it would be more efficient if the JVM translated bytecode into platform-dependent machine code on the first run, instead of interpreting the bytecode every time.
However, I could not find any such files in my project folders. There are only the bin and src folders, which contain the *.class bytecode and *.java source files.
So my questions are:
If Java interprets bytecode all the time, why not translate bytecode to machine code after the first run?
If they do generate machine code, where are the files?

Not an option, since the environment can change between runs (e.g. an upgrade of the JVM).
In memory (or serialized to disk when needed).

If Java interprets bytecode all the time, why not translate bytecode to machine code after the first run?
There are pros and cons to both ahead of time (AOT) and just in time (JIT) compilation.
The main advantage of AOT is that the compiler is generally allowed to take longer, so it can perform more sophisticated analysis and optimization. Another advantage is that the compiler doesn't have to be present at runtime on the target machine. The disadvantages are everything else.
The main advantage of JIT is that the compiler is able to make optimizations based on information known only at runtime. In fact, it is even possible to unoptimize and reoptimize code when conditions change. Furthermore, the JIT doesn't have to waste time optimizing code that is never or rarely run, unlike the AOT compiler.
Some languages are designed to favor one approach over the other. For example, C/C++ are designed for AOT, while Java is designed for JIT (though it can be compiled AOT with some restrictions). For example, Java has a heavy emphasis on virtual getters and setters, possibly for classes not loaded until runtime. But the JIT can see and inline these functions at runtime. By contrast, if you used virtual methods for every field access in C++, you'd pay a huge performance penalty.
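As an illustration of the getter point, here is a made-up example: at runtime, HotSpot can observe that getX() has only one loaded implementation and inline it into the hot loop, whereas an AOT compiler would generally have to leave a virtual dispatch in place.

    // Hypothetical illustration: getX() is virtual by default in Java, but
    // once the JIT sees that only one implementation is ever loaded, it can
    // devirtualize and inline the call, so the loop pays no call overhead.
    class Point {
        private int x;
        Point(int x) { this.x = x; }
        int getX() { return x; }
    }

    public class InlineDemo {
        public static void main(String[] args) {
            Point p = new Point(2);
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += p.getX();
            }
            System.out.println(sum);
        }
    }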

It doesn't interpret the bytecode all the time. Frequently executed bytecode is compiled into native machine code after some time. You can tune this "time" using -XX:CompileThreshold= (the default is 10000), or you can turn off JIT compilation completely with -Xint.
In memory. There's a special area of memory called the "code cache". You can see how methods are compiled into the cache, and how they are evicted from it, using -XX:+PrintCompilation. The size of the cache is also configurable; see -XX:ReservedCodeCacheSize=.
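To see this in action, you can run something like the following (the class is a made-up example; the exact log format varies between JVM versions):

    // Run with: java -XX:+PrintCompilation CompileDemo
    // Each output line names a method as it is compiled into the code cache.
    // Lowering -XX:CompileThreshold makes compilation kick in sooner, and
    // -Xint disables the JIT entirely so everything stays interpreted.
    public class CompileDemo {
        static int work(int n) {
            int s = 0;
            for (int i = 0; i < n; i++) s += i * i;
            return s;
        }
        public static void main(String[] args) {
            long total = 0;
            for (int i = 0; i < 20_000; i++) {  // enough calls to cross the threshold
                total += work(100);
            }
            System.out.println(total);
        }
    }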

Well, the JVM does have preprocessed data, but only for its own classes. Given the size of the JRE library and the fact that it usually doesn't change, this is a big win (look for a file called classes.jsa).
However, even that file does not contain native code, only byte code in an easier-to-process form.
The big point about code generation in HotSpot JVMs is that they don't compile code on a class or method basis, as you seem to think. These JVMs compile code fragments spanning multiple interacting methods, as the interactions are discovered during self-profiling. Such a code block may span methods from the JRE, the extension libraries, third-party libraries on your class path, and your application classes, and is therefore only valid for that specific combination.
During compilation, the information gathered about your program's behaviour is used: for example, code paths that were never taken may be elided, and conditionals may be assumed to evaluate to a particular result because they always did so in previous evaluations. This yields high performance, but the JVM may have to drop the compiled code even during the same execution, when one of those assumptions no longer holds: the program might take a code path it didn't take before, or a new class might be loaded which extends a class whose code was optimized as if it had no subclasses, and so on.
So if optimized, compiled code can become obsolete even within the same environment, it is all the more likely to be obsolete in the next execution. In the end, the JVM would have to check whether the old code is still appropriate, which might turn out to be even costlier than simply gathering data about the new environment and program behaviour afresh.
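A sketch of the kind of assumption involved (the class names are invented): while only Base is loaded, the JIT may treat b.work() as a call with a single possible target and inline it; bringing a subclass into play invalidates that compiled code and forces a deoptimization.

    // While only Base exists, HotSpot can devirtualize and inline work().
    // Once a subclass appears at the call site, the "no overrides in use"
    // assumption breaks and the compiled code is dropped and re-profiled.
    class Base {
        int work() { return 1; }
    }

    public class DeoptDemo {
        static int run(Base b) {
            int s = 0;
            for (int i = 0; i < 1_000_000; i++) s += b.work();
            return s;
        }
        public static void main(String[] args) {
            System.out.println(run(new Base()));  // compiled with work() inlined
            Base sub = new Base() { int work() { return 2; } };
            System.out.println(run(sub));         // may trigger a deopt
        }
    }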

Related

The compilation and execution of a Java program?

I am a beginner in a Java programming course, and so far this is what I have understood about how a Java program is compiled and executed. Stating it in brief:
1) The source code (.java) file is converted into bytecode (.class), an intermediate code, by the Java compiler.
2) This bytecode (.class) file is platform independent, so wooosh....I can copy it and take it to a machine on a different platform which has a JVM.
3) When I run the bytecode, the JVM, which is a part of the JRE, first verifies the bytecode, then calls out to the JIT, which makes optimizations at runtime since it has access to dynamic runtime information.
4) And finally the JVM interprets the intermediate code into a series of machine instructions for the processor to execute. (A processor can't execute the bytecode directly since it is not native code.)
Is my understanding correct? Anything that needs to be added or corrected?
Taking each of your points in turn:
1) This is correct. Java source is compiled by javac (although other tools could do the same thing) and class files are generated.
2) Again, correct. Class files contain platform-neutral bytecodes. These are loosely an instruction set for a 'virtual' machine (i.e. the JVM). This is how Java implements the "write once, run anywhere" idea it's had since it was launched.
3) Partially correct. When the JVM needs to load a class it runs a four-phase verification on the bytecodes of that class to ensure that the format of the bytecodes is legal in terms of the JVM. This is to prevent bytecode sequences being generated that could potentially subvert the JVM (i.e. virus-like behaviour). The JVM does not, however, run the JIT at this point. When bytecodes are executed they start in interpreted mode. Each bytecode is converted on the fly to the required native instructions and OS system calls.
4) This is sort of wrong when combined with point 3.
Here's the process explained briefly:
As the JVM interprets the bytecodes of the application it also profiles which groups of bytecodes are being run frequently. If you have a loop that repeatedly calls a method the JVM will notice this and identify that this is a hotspot in your code (hence the name of the Oracle JVM). Once a method has been called enough times (which is tunable), the JVM will call the Just In Time (JIT) compiler to generate native instructions for that method. When the method is called again the native code is used, eliminating the need for interpreting and thus improving the speed of the application. This profiling phase is what leads to the 'warm-up' behaviour of a Java application where relevant sections of the code are gradually compiled into native instructions.
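You can observe this warm-up effect with a crude timing loop (illustrative only; for real measurements use a proper harness such as JMH):

    // Crude illustration of warm-up: early rounds run interpreted, later
    // rounds run JIT-compiled code and are typically much faster.
    public class WarmupDemo {
        static long batch() {
            long s = 0;
            for (int i = 0; i < 5_000_000; i++) s += i % 7;
            return s;
        }
        public static void main(String[] args) {
            for (int round = 1; round <= 10; round++) {
                long t0 = System.nanoTime();
                long r = batch();
                long t1 = System.nanoTime();
                System.out.println("round " + round + ": "
                        + (t1 - t0) / 1_000 + " us (result " + r + ")");
            }
        }
    }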
For OpenJDK-based JVMs there are two JIT compilers, C1 and C2 (sometimes called client and server). The C1 JIT warms up more quickly but reaches a lower peak level of performance. C2 warms up more slowly but applies a greater level of optimisation to the code, giving a higher overall performance level.
The JVM can also throw away compiled code, either because it hasn't been used for a long time (like entries in a cache) or because an assumption the JIT made (called a speculative optimisation) turns out to be wrong. This is called a deopt, and it results in the JVM going back to interpreted mode, re-profiling the code and potentially recompiling it with the JIT.
First and foremost, Java is only a programming language. That means you could (theoretically) run a compiler to generate a native binary instead of this bytecode. (See: Compiling a java program into an executable)
The other thing I should mention is Java processors, which are able to execute Java bytecode directly... because it's their native instruction set. (See: https://en.wikipedia.org/wiki/Java_processor)

Square Root Method Takes Long Time to Execute First Try

I have a Java program where I'm trying new algorithms for square rooting and comparing them to the native Math.sqrt(a) method in Java. What I find weird is that the first time the .sqrt(a) method is called in the program, it takes at least 50,000 ns, whereas later calls only take a few thousand. Does this have to do with how the system time is being calculated during the first few moments of running the program, or are the methods actually executing more slowly for some reason?
There are significant overheads in starting a Java application.
The JVM (the java executable) needs to be loaded.
The JVM needs to bootstrap:
creating and initializing the heap
classloading various system classes
and so on
Your classes need to be classloaded. This typically triggers further classloading of system classes, third party libraries and so on.
After a bit ... the JIT compiler starts to compile methods to native code.
While this is happening, the GC may run to clean up garbage that was created by JIT compilation and classloading.
All of this adds up to significant startup costs ... compared to (say) an application that is implemented in C or C++, compiled and linked to an executable.
However, this should not be relevant to developing and benchmarking algorithms in Java. You simply need to do the benchmarking in a way that eliminates the "JVM warmup" overheads. For more details:
How do I write a correct micro-benchmark in Java?
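A minimal hand-rolled version of that advice, assuming the goal is to time Math.sqrt itself (JMH does this properly; this sketch just shows the idea):

    // Warm-up-aware benchmarking sketch: run the code enough times to let
    // the JIT compile it, then measure only the warmed-up runs.
    public class SqrtBench {
        static double sink;  // keeps the JIT from eliminating the work

        static void run(int iterations) {
            double s = 0;
            for (int i = 1; i <= iterations; i++) s += Math.sqrt(i);
            sink = s;
        }

        public static void main(String[] args) {
            run(1_000_000);                  // warm-up: triggers JIT compilation
            long t0 = System.nanoTime();
            run(1_000_000);                  // measured run uses compiled code
            long t1 = System.nanoTime();
            System.out.println((t1 - t0) / 1_000 + " us");
        }
    }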
@user7859067 comments:
need very awesome performance, go native.
I assume you mean ... implement the code as a Java native method. That doesn't help with JVM bootstrap overheads. And "going native" isn't always a win, since there are overheads when calling a custom native method from Java.
However, it is a fact that the implementations of many Math functions are in native code ... for speed. (The JIT compiler has tweaks to generate special fast calls to "intrinsic" native methods, but (AFAIK) you can't use this yourself without modifying the JRE codebase ...) Anyhow, if you compare your (pure Java) implementation's performance against the standard (native) Math.sqrt method, you are comparing apples and oranges.
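For illustration, here is what such a pure-Java contender might look like (an invented Newton-Raphson square root, not the questioner's actual code). Even once the JIT has compiled it, the intrinsified Math.sqrt usually wins, because it compiles down to a single hardware instruction:

    // A pure-Java Newton-Raphson square root, for comparison against Math.sqrt.
    public class MySqrt {
        static double sqrt(double a) {
            if (a < 0) return Double.NaN;
            if (a == 0) return 0;
            double x = a;
            for (int i = 0; i < 30; i++) {
                x = 0.5 * (x + a / x);  // Newton-Raphson step
            }
            return x;
        }
        public static void main(String[] args) {
            System.out.println(sqrt(2.0) + " vs " + Math.sqrt(2.0));
        }
    }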

Will the JVM make compiled byte code into an executable file

I read the following articles:
http://searchcio-midmarket.techtarget.com/definition/just-in-time-compiler
http://javarevisited.blogspot.in/2011/12/jre-jvm-jdk-jit-in-java-programming.html
I am now really interested in knowing what will happen when I run a class. The JIT compiles the byte code again, and then ???
Will this compiled code be converted into an .exe by the JVM?
Like the others said: JIT does not mean the code is compiled to a binary executable (.exe). However, an interesting application that you may consider is Excelsior JET.
I haven't read too much about it and haven't used it, so I don't know exactly how it works... yet. But according to its webpage, it's an AOT (Ahead-Of-Time) compiler. This means that it will compile your .class files to a system-dependent binary file.
You should give it a try and see how it performs. According to the website, you get a free license if your project is non-commercial in nature.
Java Compiler compiles plain-text Java code into JVM bytecode. http://en.wikipedia.org/wiki/Java_bytecode
The JVM has a HotSpot optimizer that evaluates the code for "hot spots" (basically, the code that is used the most) and pays special attention to those spots. It may also flag those spots for the JVM to recompile to native machine code, and this is called JIT.
JVM is essentially a virtual machine that runs a JVM bytecode interpreter.
There is never a direct .exe; that is mostly a Windows/C/C++ thing.
No, the code is NOT "compiled" into an "exe".
The program is stored in memory as byte code, but the code segment currently running is compiled to physical machine code on the fly in order to run faster.
I'll go out on a limb and say that JIT is a type of interpreter, designed to improve the speed of commonly used branches of code (at least, that was my interpretation 10 years ago).
JIT compilers represent a hybrid approach, with translation occurring continuously, as with interpreters, but with caching of translated code to minimize performance degradation. JIT also offers advantages over code compiled statically at development time, such as the handling of late-bound data types and the ability to enforce security guarantees.

Why doesn't the JVM cache JIT compiled code?

The canonical JVM implementation from Sun applies some pretty sophisticated optimization to bytecode to obtain near-native execution speeds after the code has been run a few times.
The question is, why isn't this compiled code cached to disk for use during subsequent uses of the same function/class?
As it stands, every time a program is executed, the JIT compiler kicks in afresh, rather than using a pre-compiled version of the code. Wouldn't adding this feature add a significant boost to the initial run time of the program, when the bytecode is essentially being interpreted?
Without resorting to cut'n'paste of the link that @MYYN posted, I suspect this is because the optimisations the JVM performs are not static, but rather dynamic, based on data patterns as well as code patterns. It's likely that these data patterns will change during the application's lifetime, rendering the cached optimisations less than optimal.
So you'd need a mechanism to establish whether the saved optimisations were still optimal, at which point you might as well just re-optimise on the fly.
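As a concrete (invented) illustration of what a data-dependent optimisation looks like: the JIT profiles which branch a method actually takes, so a profile gathered against one run's data can be a poor fit for the next run's.

    // The JIT profiles this branch. A run that passes mostly positive values
    // produces compiled code biased toward that path; caching that code for
    // a later run with mostly negative values would be less than optimal.
    public class ProfileDemo {
        static int f(int x) {
            if (x > 0) return x * 2;    // hot path in one run
            else       return -x + 1;   // hot path in another run
        }
        public static void main(String[] args) {
            int bias = args.length > 0 ? Integer.parseInt(args[0]) : 1;
            long s = 0;
            for (int i = 0; i < 1_000_000; i++) s += f(bias * (i % 100));
            System.out.println(s);
        }
    }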
Oracle's JVM is indeed documented to do so -- quoting Oracle,
the compiler can take advantage of Oracle JVM's class resolution model to optionally persist compiled Java methods across database calls, sessions, or instances. Such persistence avoids the overhead of unnecessary recompilations across sessions or instances, when it is known that semantically the Java code has not changed.
I don't know why all sophisticated VM implementations don't offer similar options.
An update to the existing answers - Java 8 has a JEP dedicated to solving this:
JEP 145: Cache Compiled Code.
At a very high level, its stated goal is:
Save and reuse compiled native code from previous runs in order to improve the startup time of large Java applications.
Hope this helps.
Excelsior JET has had a caching JIT compiler since version 2.0, released back in 2001. Moreover, its AOT compiler can recompile the cache into a single DLL/shared object using all optimizations.
I do not know the actual reasons, not being in any way involved in the JVM implementation, but I can think of some plausible ones:
The idea of Java is to be a write-once-run-anywhere language, and putting precompiled stuff into the class file kind of violates that (only "kind of", because of course the actual byte code would still be there)
It would increase the class file sizes because you would have the same code there multiple times, especially if you happen to run the same program under multiple different JVMs (which is not really uncommon, when you consider different versions to be different JVMs, which you really have to do)
The class files themselves might not be writable (though it would be pretty easy to check for that)
The JVM optimizations are partially based on run-time information and on other runs they might not be as applicable (though they should still provide some benefit)
But I am really just guessing, and as you can see, I don't think any of my reasons are actual show-stoppers. I figure Sun just doesn't consider this support a priority, and maybe my first reason is close to the truth, as doing this habitually might also lead people to think that Java class files really need a separate version for each VM, instead of being cross-platform.
My preferred way would actually be to have a separate bytecode-to-native translator that you could use to do something like this explicitly beforehand, creating class files that are explicitly built for a specific VM, with possibly the original bytecode in them so that you can run with different VMs too. But that probably comes from my experience: I've been mostly doing Java ME, where it really hurts that the Java compiler isn't smarter about compilation.

Is it possible to write a decent Java optimizer if information is lost in the translation to bytecode?

It occurred to me that when you write a C program, the compiler knows the source and destination platform (for lack of a better term) and can optimize to the machine it is building code for.
But in Java, the best the compiler can do is optimize to the bytecode. That may be great, but there's still a layer in the JVM that has to interpret the bytecode, and the farther the bytecode is, translation-wise, from the final machine architecture, the more work has to be done to make it go.
It seems to me that a bytecode optimizer wouldn't be nearly as good, because it has lost all the semantic information available in the original source code (which may already have been butchered by the Java compiler's optimizer).
So is it even possible to ever approach the efficiency of C with a Java compiler?
Actually, a byte-code JIT compiler can exceed the performance of statically compiled languages in many instances, because it can evaluate the byte code in real time and in the actual execution context. So the app's performance increases as it continues to run.
What Kevin said. Also, the bytecode optimizer (the JIT) can take advantage of runtime information to perform better optimizations. For instance, it knows what code is executing most often (hot spots), so it doesn't spend time optimizing code that rarely executes. It can do most of the things that profile-guided optimization gives you (branch prediction, etc.), but on the fly, for whatever the target processor is. This is why the JVM usually needs to "warm up" before it reaches its best performance.
In theory, both optimizers should behave 'identically', as it is standard practice for C/C++ compilers to perform optimization on the generated assembly rather than on the source code, so you've already lost any semantic information.
If you read the byte code, you may see that the compiler doesn't optimise it very well. However, the JIT can optimise the code, so this really doesn't matter.
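You can check this yourself with javap, the disassembler that ships with the JDK (the class below is a made-up example):

    // Compile with `javac Adder.java`, then run `javap -c Adder` to see the
    // bytecodes (iload, iadd, istore, ...). They map almost one-to-one to the
    // source; the clever optimisation happens later, in the JIT.
    public class Adder {
        public static int sum(int n) {
            int s = 0;
            for (int i = 0; i < n; i++) s += i;
            return s;
        }
        public static void main(String[] args) {
            System.out.println(sum(10));
        }
    }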
Say you compile the code on an x86 machine and a new architecture comes along; let's call it x64. The same Java binary can take advantage of the new features of that architecture, even though it might not have existed when the code was compiled. It means you can take old distributions of libraries and take advantage of the latest hardware-specific optimisations. You cannot do this with C/C++.
Java can optimise calls to virtual methods by inlining them. Say you have a virtual method with many different possible implementations, but in practice one or two implementations are called most of the time. The JIT can detect this and inline up to two method implementations, while still behaving correctly if you happen to call another implementation. You cannot do this with C/C++.
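A sketch of such a call site (all names invented): if the JIT's profile shows that nearly every receiver is a Circle or a Square, it can inline both implementations behind a cheap type check, keeping a slow-path virtual call for any other type.

    // The JIT profiles the receiver types at the area() call site. Seeing
    // only Circle and Square, it can inline both bodies (bimorphic inlining)
    // while remaining correct if some other Shape ever shows up.
    interface Shape { double area(); }
    class Circle implements Shape {
        public double area() { return Math.PI; }
    }
    class Square implements Shape {
        public double area() { return 1.0; }
    }

    public class InlineVirtualDemo {
        public static void main(String[] args) {
            Shape[] shapes = { new Circle(), new Square() };
            double total = 0;
            for (int i = 0; i < 1_000_000; i++) {
                total += shapes[i % 2].area();  // bimorphic call site
            }
            System.out.println(total);
        }
    }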
Java 7 supports escape analysis for locked/synchronised objects: it can detect that an object is only used in a local context and drop synchronisation on that object.
In current versions of Java, it can also detect when two consecutive methods lock the same object and keep the lock held between them (rather than releasing and re-acquiring it).
You cannot do this with C/C++, because there isn't a language-level understanding of locking.
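A sketch of both effects (the example is invented): the StringBuffer below never escapes its method, so escape analysis lets the JIT drop its internal locks entirely, and the two adjacent synchronized blocks on the same monitor are candidates for being merged into one.

    public class LockDemo {
        // Lock elision: sb never escapes this method, so the JIT can remove
        // the synchronization inside StringBuffer's methods completely.
        static String build(int n) {
            StringBuffer sb = new StringBuffer();  // internally synchronized
            for (int i = 0; i < n; i++) sb.append(i);
            return sb.toString();
        }

        private static final Object LOCK = new Object();
        static int a, b;

        // Lock coarsening: two back-to-back blocks on the same monitor can
        // be merged into a single acquire/release pair.
        static void update() {
            synchronized (LOCK) { a++; }
            synchronized (LOCK) { b++; }
        }

        public static void main(String[] args) {
            System.out.println(build(10).length());
            update();
            System.out.println(a + b);
        }
    }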
