JVM : Bytecode Confusion and JIT - java

So here's a question that came to my mind when I was studying Java. We know (please correct me if I am wrong!) that bytecode runs on top of the JVM. So does the JVM convert the bytecode to native machine code for the platform it (the JVM) is written for? If that is so, isn't it less secure?
Also, what exactly is a just-in-time compiler? It compiles when it is asked to do so... I studied some resources, but the "just-in-time" part is still not clear to me.
Thanks for any help !

So does the JVM convert the Bytecode to the native machine code
it's(JVM) written for?
No, not necessarily, though nowadays JVMs commonly do so by default.
If that is so, isn't it less secure?
Less secure than what?
Just because one can do insecure operations in machine code (like dereferencing an uninitialized pointer or accessing unallocated memory) does not mean the JIT generates such insecure code.
Also what exactly is a just-in-time compiler?
It's that part of the JVM that converts bytecode to native machine code.
The name "just in time" means that the code is compiled while the program is running, typically in a separate thread. Once a method has been completely compiled, the JVM takes note of that and can invoke the method at the machine level instead of interpreting its bytecode.
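To make this concrete, here is a minimal sketch that asks the running JVM about its JIT compiler through the standard `java.lang.management` API. The class name `JitObserver` and the method `sumOfSquares` are made up for illustration; whether and when the hot method actually gets compiled depends on the JVM and its settings.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitObserver {
    // A small method that becomes "hot" when it is called many times.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        // The bean is null when the JVM has no compilation system at all.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            System.out.println("This JVM reports no JIT compiler.");
        } else {
            System.out.println("JIT compiler: " + jit.getName());
        }

        // Call the method often enough that a JIT is likely to consider it hot.
        long checksum = 0;
        for (int round = 0; round < 20_000; round++) {
            checksum += sumOfSquares(100);
        }
        System.out.println("checksum = " + checksum);

        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println("Approximate JIT time so far (ms): "
                    + jit.getTotalCompilationTime());
        }
    }
}
```

On HotSpot, running the same program with `-XX:+PrintCompilation` prints a line for each method as it is compiled, which shows the compilation happening while the program runs.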

So does the JVM convert the Byte-code to the native machine code it's(JVM) written for?
Every JVM implementation I have seen so far converts bytecode to native machine code for the platform the VM is written for, although I can't see how or why doing otherwise would be useful.
Also what exactly is a just-in-time compiler?
It's simply the process of converting bytecode to native code at run time. To improve performance, the VM does this in parallel with your program's execution. It usually also includes caching of the compiled native code and some other performance techniques.
If that is so, isn't it less secure?
Well, to some very small degree it is. VERY VERY SMALL DEGREE. There are some security-related modifications to various operating systems that rule out JIT compilation. For example, the grsecurity Linux kernel patch makes JIT impossible (more precisely, it makes it impossible to execute JIT-compiled code). A similar memory protection mechanism (writable memory pages can't be executable) is implemented in iOS, which makes it impossible to do any JIT compilation in user mode.

Related

Does dynamically generated java byte-code need any optimization?

I did some Java bytecode generation using ASM, by walking the AST of a small DSL with the visitor pattern.
I'm worried that the generated bytecode is too "straightforward", that is, without any compile-time optimization.
In my case it may be fine if the generated bytecode is not optimized, but I can't help asking: do projects that generate bytecode at runtime need to do bytecode optimization?
I know that on the JVM most of the optimization work is done while the program is running, by JIT compilation, so bytecode optimization at compile time may have little effect.
But, really? Is it absolutely meaningless to optimize bytecode that is generated on the fly? Can anyone share some experience about the difference, mainly in runtime performance, between bytecode with and without any form of optimization?
I know at least one JVM-based language, which shall remain nameless, that is slow as hell. It could have used some compile-time optimization.
Javac and the JVM analyze roughly the same programming model, so any optimization technique that javac can employ can be employed by the JVM too. There is therefore not much point in javac duplicating the work. In fact, it's probably preferable for javac to preserve as much of the structure of the source code as possible so that the JVM can reason about the code better.
That doesn't apply if the source language is not a Java-ish language.
Think about it this way: the CPU does a lot of wonderful optimization too, so why does the JVM need to do any optimization? Why not leave it all to the CPU? Because the CPU and the JVM analyze very different code. The CPU analyzes an arbitrary sequence of machine instructions (though it can make assumptions based on the common behavior of high-level languages). The JVM analyzes a very specific, much higher-level language, so it can reason about and transform the code based on knowledge that is almost impossible for the CPU to discover from the machine instructions.
Back to your case: it is possible that you (as the compiler) know a lot more about your even-higher-level source language, so you can perform transformations that are impossible for the JVM.
No, it is not necessary.
If you look at Javac's output, it does virtually no compile time optimization at all. And thanks to Hotspot's JIT, it is difficult to tell what impact changing the bytecode will have on optimization anyway. It's best not to worry about such things unless you can prove there's a real bottleneck and have the time to investigate it.
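As an illustration of the kind of source-level transformation a DSL compiler can do before emitting any bytecode, here is a minimal sketch of constant folding over an AST. The `Expr` node types and the DSL itself are hypothetical, and the `x * 0 => 0` rewrite assumes the DSL has side-effect-free integer expressions; this is knowledge the front end has but a bytecode-level optimizer may not. The actual ASM code generation is left out.

```java
class DslFolding {
    // Hypothetical AST for a tiny arithmetic DSL.
    interface Expr { }

    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
    }

    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
    }

    static final class Add implements Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    static final class Mul implements Expr {
        final Expr left, right;
        Mul(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    static boolean isZero(Expr e) {
        return e instanceof Num && ((Num) e).value == 0;
    }

    // Folds constant subtrees bottom-up and returns a new, simpler tree.
    static Expr fold(Expr e) {
        if (e instanceof Add) {
            Expr l = fold(((Add) e).left);
            Expr r = fold(((Add) e).right);
            if (l instanceof Num && r instanceof Num) {
                return new Num(((Num) l).value + ((Num) r).value);
            }
            return new Add(l, r);
        }
        if (e instanceof Mul) {
            Expr l = fold(((Mul) e).left);
            Expr r = fold(((Mul) e).right);
            if (l instanceof Num && r instanceof Num) {
                return new Num(((Num) l).value * ((Num) r).value);
            }
            // Assumption: expressions have no side effects, so x * 0 is just 0.
            if (isZero(l) || isZero(r)) {
                return new Num(0);
            }
            return new Mul(l, r);
        }
        return e; // Num and Var are already as simple as they get.
    }

    // Renders a tree as text so the effect of folding is visible.
    static String show(Expr e) {
        if (e instanceof Num) return Integer.toString(((Num) e).value);
        if (e instanceof Var) return ((Var) e).name;
        if (e instanceof Add) {
            Add a = (Add) e;
            return "(" + show(a.left) + " + " + show(a.right) + ")";
        }
        Mul m = (Mul) e;
        return "(" + show(m.left) + " * " + show(m.right) + ")";
    }

    public static void main(String[] args) {
        Expr e = new Add(new Var("x"), new Mul(new Num(2), new Num(3)));
        System.out.println(show(e) + " => " + show(fold(e)));
    }
}
```

A visitor that emits bytecode would then walk the folded tree instead of the original one. Whether this makes a measurable difference after JIT compilation is something only profiling can tell.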

Will Jvm make compiled byte code into executable file

I read the following articles:
http://searchcio-midmarket.techtarget.com/definition/just-in-time-compiler
http://javarevisited.blogspot.in/2011/12/jre-jvm-jdk-jit-in-java-programming.html
I am now really interested in knowing what will happen when I run a class. JIT compiles the byte code again and then ???
Will this compiled code be converted into an .exe by the JVM?
Like the others said: JIT does not mean the code is compiled to a binary executable (.exe). However, an interesting application that you may consider is Excelsior JET.
I haven't read too much about it and haven't used it, so I don't know exactly how it works... yet. But according to its webpage, it's an AOT (Ahead-Of-Time) compiler. This means that it will compile your .class files to a system-dependent binary file.
You should give it a try and see how it performs. According to the website, you get a free license if your project is non-commercial in nature.
Java Compiler compiles plain-text Java code into JVM bytecode. http://en.wikipedia.org/wiki/Java_bytecode
The JVM has a HotSpot optimizer that looks for "hot spots" in the code (basically, the code that is executed the most) and pays special attention to those spots. It may flag them for the JVM to compile to native machine code, and this is called JIT.
JVM is essentially a virtual machine that runs a JVM bytecode interpreter.
There is never a direct .exe. It is a Windows/C/C++ thing, mostly.
No, the code is NOT "compiled" into an "exe".
The program is stored in memory as bytecode, but the code segment that is currently running is compiled to physical machine code so that it runs faster.
I'll go out on a limb and say that JIT is a type of interpreter, designed to improve the speed of commonly used branches of code (at least that was my interpretation 10 years ago).
JIT compilers represent a hybrid approach, with translation occurring continuously, as with interpreters, but with caching of translated code to minimize performance degradation. It also offers other advantages over statically compiled code at development time, such as handling of late-bound data types and the ability to enforce security guarantees.

Confused from Wiki: C# and Java are interpreted?

On the EN Wiki I read that both C# and Java are interpreted languages, however at least for C# I think it is not true.
Many interpreted languages are first compiled to some form of virtual
machine code, which is then either interpreted or compiled at runtime
to native code.
From my understanding, it is compiled into CIL and, when run, it is compiled by the JIT for the target platform. I have also read that JIT is an interpreter; is that really so?
Or are they called interpreted because they use intermediate code? I do not understand it.
Thanks
JIT is a form of compilation to native (machine) code. Typically (but not necessarily), code for implementations of both the CLI and the JVM is compiled in two steps:
the language compiler compiles code to something intermediate (IL/bytecode)
the JIT compiles that to native/machine code at runtime
However, interpreters for both do exist. The Micro Framework operates as an IL interpreter, for example. Equally, tools like (looking at .NET here) NGEN and "AOT" (Mono) allow compilation to native/machine code up front.
They are considered JIT languages which is different from interpreting. JIT simply compiles to native code when needed during execution. The common strategy is to compile into an intermediate representation (bytecode) beforehand which makes the JIT faster.
However, there is nothing that prevents them from being interpreted, or even statically compiled. Languages are simply languages - how they are executed is irrelevant from a language perspective.
On the EN Wiki I read that both C# and Java are interpreted languages
Can you please provide the link?
Maybe the word "interpreted" means something different here. Perhaps it means that these languages are first translated from source code into platform-independent (VM-specific) code.
are they called interpreted as they are using intermediate code
I too think so.
I have also read that JIT is an interpreter
JIT is a compiler. See this
Whether something is an "interpreter" or not depends on the context of the discussion.
From a purely abstract view, an interpreter can be defined as any intermediate program present at runtime that dynamically translates program code written in one language into the target code of some other hardware or software. Think about running Java bytecode on x86 hardware, or running Python on the CLR, which is exactly what IronPython does. In this view every virtual machine is an interpreter of some kind. Because it is a program present at runtime, it clearly differs from static compilers and from hardware-implemented VMs.
Now there are many different ways to achieve this functionality, where the emphasis is on "dynamically" and "present at runtime".
In discussions where the implementation of the VM matters, people make a clear distinction between a "classical" interpreter and a JIT-ed one. A classical interpreter is something that, for every instruction of the hosted program, executes a corresponding routine of target code. This design is simple to build but hard to optimize. A JIT-ed design reads a bunch of instructions of the original code and then translates all of them into one natively compiled routine, so it "interprets" faster. It is like a micro static compiler within the VM. There are many different ways to accomplish the behavior labeled as JIT, and then there are other approaches like tracing compilers.
Modern VMs like the CLR and the HotSpot and J9 JVMs are too complex to be tagged with simple labels such as JIT or interpreter. They can be, at the same time, static compilers (AOT execution), classical interpreters and JIT-ed VMs.
For example, the CLR can compile code ahead of time (static compiler) and store the native code as a bunch of more or less executable files on disk, to be used for faster future startups of the hosted program. I believe "ngen" is the AOT process used on Windows for this functionality. If AOT is not used, the CLR behaves as a JIT VM.
J9 and HotSpot are able to switch at runtime between purely interpreted and JIT-ed execution, depending on code analysis and current load. So it is quite a gray area. J9 even has AOT functionality similar to the CLR's.
Some other VMs, like the Maxine JVM or PyPy, are so-called "metacircular" VMs. This means they are (mostly) implemented in the same language they host (Maxine is a JVM written in Java). In order to provide good code they usually have some JIT-like behavior implemented in the host language, which is then bootstrapped and optimized by a very low-level, close-to-the-machine interpreter.
So the actual definition of an interpreter varies with the context of the discussion. When labels like JIT are used, the discussion is clearly focused on the implementation details of the VM in question.
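The difference between the two designs can be sketched with a toy example. The opcodes `ADD` and `MUL` and the class `TinyVm` are invented for illustration. The "translated" version only composes Java lambdas rather than emitting machine code, so it is an analogy for what a JIT does, not a real JIT.

```java
import java.util.function.IntUnaryOperator;

class TinyVm {
    // Hypothetical instruction set: each instruction is an (opcode, argument) pair.
    static final int ADD = 0;
    static final int MUL = 1;

    // "Classical" interpretation: decode and dispatch every instruction on every run.
    static int interpret(int[] code, int x) {
        int acc = x;
        for (int pc = 0; pc < code.length; pc += 2) {
            int op = code[pc];
            int arg = code[pc + 1];
            switch (op) {
                case ADD: acc += arg; break;
                case MUL: acc *= arg; break;
                default: throw new IllegalArgumentException("bad opcode " + op);
            }
        }
        return acc;
    }

    // "JIT-like" translation: decode the whole sequence once and build one
    // reusable routine, so later calls skip the decode-and-dispatch loop.
    static IntUnaryOperator translate(int[] code) {
        IntUnaryOperator routine = IntUnaryOperator.identity();
        for (int pc = 0; pc < code.length; pc += 2) {
            final int op = code[pc];
            final int arg = code[pc + 1];
            switch (op) {
                case ADD: routine = routine.andThen(v -> v + arg); break;
                case MUL: routine = routine.andThen(v -> v * arg); break;
                default: throw new IllegalArgumentException("bad opcode " + op);
            }
        }
        return routine;
    }

    public static void main(String[] args) {
        int[] program = {ADD, 3, MUL, 4}; // computes (x + 3) * 4
        IntUnaryOperator compiled = translate(program);
        System.out.println(interpret(program, 2));
        System.out.println(compiled.applyAsInt(2));
    }
}
```

Both paths compute the same result; the design difference is only in when the decoding work is paid for.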

Why isn't more Java software compiled natively?

I realize the benefits of bytecode vs. native code (portability).
But say you always know that your code will run on a x86 architecture, why not then compile for x86 and get the performance benefit?
Note that I am assuming there is a performance gain from native code compilation. Some folks have answered that there could in fact be no gain, which is news to me.
Because the performance gain (if any) is not worth the trouble.
Also, garbage collection is very important for performance. Chances are that the GC of the JVM is better than the one embedded in the compiled executable, say with GCJ.
And just-in-time compilation can even result in better performance, because the JIT has more information available at run time to optimize the compilation than a static compiler has at compile time. See the Wikipedia page on JIT.
"Solaris" is an operating system, not a CPU architecture. The JVM installed on the actual machine will compile to the native CPU instructions. Solaris could be SPARC, x86, or x86-64 architecture.
Also, the JIT compiler can make processor-specific optimisations depending on which actual CPU family you have. For example, different instruction sequences are faster on Intel CPUs than on AMD CPUs, and a JIT compiler for your exact platform can take advantage of this information to produce highly optimised code.
The bytecode runs in a Java Virtual Machine that is compiled for (example) Solaris. It will be optimised like heck for that operating system.
In real-world cases, you often see equal or better performance from Java code at runtime, by virtue of building on the virtual machine's code for things like memory management; that code will have been evolving and maturing for years.
There are more benefits to building for the JVM than just portability. For example, every time a new JVM is released, your compiled bytecode gets any optimisations, algorithmic improvements etc. that come from the best in the business. On the other hand, once you've compiled your C code, that's it.
Because with Just-In-Time compilation, there is trivial performance benefit.
Actually, many things JIT can actually do faster.
It will already be compiled by the JIT into Solaris native code once it runs. You won't get any additional benefit from compiling it before uploading it to the target site.
You may or may not get a performance benefit. But more likely you would get a performance penalty: JIT optimization is not possible with static compilation, so the performance would only be as good as the compiler can make it "blindfolded" (without actually profiling the program and optimizing it accordingly, which is what JIT compilers such as HotSpot do).
It's intuitively quite surprising how cheap (resource-wise) compiling is, and how much can be automatically optimized by just observing the running program. Black magic, but good for us :-)
All this talk of JITs is about seven years out of date BTW. The technology concerned now is called HotSpot and it isn't just a JIT.
"why not then compile for x86"
Because then you can't take advantage of the specific features of the particular CPU it gets run on. In particular, if we are to read "compile for x86" as "produce native code that can run on a 386 and its descendants", then the resulting code can't rely on even something as old as the MMX instructions.
As such, the end result is that you need to compile for every exact architecture it will run on (what about those that do not exist yet?) and have the installer select which executable to put into place. Or, I hear the Intel C++ compiler will produce several versions of the same function, differing only in the CPU features used, and pick the right one at run time based on what the CPU reports as available.
On the other hand, you can view bytecode as a "half-compiled" source, similar to an intermediate format that a native compiler will (unless asked) not actually write to disk. The runtime environment can then do the final compilation, knowing exactly what architecture will be used. This is the given reason why some C#/.NET code could slightly outperform C++ code on some CPU-intensive tasks in some benchmarks a while ago.
The "final compilation" of bytecode can also make additional optimization assumptions that are (from a static compilation perspective) distinctly unsafe, and just recompile if those assumptions are found to be wrong later.
I guess because JIT (just in time) compilation is very advanced.

Is it possible to write a decent java optimizer if information is lost in the translation to bytecode?

It occurred to me that when you write a C program, the compiler knows the source and destination platform (for lack of a better term) and can optimize to the machine it is building code for.
But in java the best the compiler can do is optimize to the bytecode, which may be great, but there's still a layer in the jvm that has to interpret the bytecode, and the farther the bytecode is away translation-wise from the final machine architecture, the more work has to be done to make it go.
It seems to me that a bytecode optimizer wouldn't be nearly as good because it has lost all the semantic information available from the original source code (which may already have been butchered by the java compiler's optimizer.)
So is it even possible to ever approach the efficiency of C with a java compiler?
Actually, a bytecode JIT compiler can exceed the performance of statically compiled languages in many instances, because it can evaluate the bytecode in real time and in the actual execution context. So the app's performance increases as it continues to run.
What Kevin said. Also, the bytecode optimizer (JIT) can take advantage of runtime information to perform better optimizations. For instance, it knows which code executes the most (hot spots), so it doesn't spend time optimizing code that rarely executes. It can do most of the stuff that profile-guided optimization gives you (branch prediction, etc.), but on the fly and for whatever the target processor is. This is why the JVM usually needs to "warm up" before it reaches its best performance.
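The warm-up effect can be observed with a rough sketch like the following, which times the same work over several rounds. The class `WarmUp` and the method `work` are made up for illustration, and the timings vary from machine to machine and run to run, so no particular numbers should be expected. For serious measurements a benchmark harness is the better tool.

```java
class WarmUp {
    // Counts the integers in [1, n] that are divisible by 3 or by 5.
    static long work(int n) {
        long count = 0;
        for (int i = 1; i <= n; i++) {
            if (i % 3 == 0 || i % 5 == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Later rounds often run faster than the first ones, once the JIT
        // has compiled the hot loop; this is not guaranteed on every JVM.
        for (int round = 1; round <= 10; round++) {
            long start = System.nanoTime();
            long result = 0;
            for (int i = 0; i < 100_000; i++) {
                result += work(200);
            }
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("round " + round + ": " + micros
                    + " us (result " + result + ")");
        }
    }
}
```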
In theory both optimizers should behave 'identically' as it is standard practice for c/c++ compilers to perform the optimization on the generated assembly and not the source code so you've already lost any semantic information.
If you read the byte code, you may see that the compiler doesn't optimise the code very well. However the JIT can optimise the code so this really doesn't matter.
Say you compile the code on an x86 machine and a new architecture comes along, let's call it x64. The same Java binary can take advantage of the new features of that architecture even though it might not have existed when the code was compiled. This means you can take old distributions of libraries and still benefit from the latest hardware-specific optimisations. You cannot do this with C/C++.
Java can inline calls to virtual methods. Say you have a virtual method with many different possible implementations, but in reality one or two implementations are called most of the time. The JIT can detect this and inline up to two method implementations, while still behaving correctly if you happen to call another implementation. You cannot do this with C/C++.
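Here is a small sketch of the kind of call site being described. The `Shape` hierarchy is hypothetical. The code only shows the shape of the program; whether the JIT actually inlines `area()` here is a runtime decision that the source cannot force or observe directly.

```java
class CallSiteDemo {
    interface Shape {
        double area();
    }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    static final class Triangle implements Shape {
        final double base, height;
        Triangle(double base, double height) { this.base = base; this.height = height; }
        public double area() { return 0.5 * base * height; }
    }

    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            // One virtual call site. If the profile shows only one or two
            // receiver classes here, a JIT may inline those implementations
            // behind a type check and fall back to a normal virtual call
            // (or recompile) when a different class shows up.
            total += s.area();
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] common = { new Square(2), new Circle(1), new Square(3) };
        for (int i = 0; i < 50_000; i++) {
            totalArea(common); // the call site sees only Square and Circle
        }
        // A third implementation appears later; the result must still be correct.
        Shape[] rare = { new Square(2), new Triangle(4, 3) };
        System.out.println(totalArea(rare));
    }
}
```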
Java 7 supports escape analysis for locked/synchronised objects: it can detect that an object is only used in a local context and drop synchronization for that object.
Current versions of Java can also detect when two consecutive methods lock the same object and keep the lock between them (rather than releasing and re-acquiring it).
You cannot do this with C/C++ because there isn't a language-level understanding of locking.
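The two locking patterns mentioned above look roughly like this in source form. `LockDemo` and `Counter` are made-up names. The code only shows candidates for these optimizations; whether a given JVM actually removes or merges the locks is an internal decision that does not change the program's results.

```java
class LockDemo {
    // StringBuffer's append methods are synchronized, but this buffer never
    // leaves the method, so no other thread can ever contend for its lock.
    // That makes it a candidate for dropping the synchronization entirely.
    static String joinLocal(String a, String b) {
        StringBuffer sb = new StringBuffer();
        sb.append(a);
        sb.append(b);
        return sb.toString();
    }

    static final class Counter {
        private final Object lock = new Object();
        private int value;

        // Two adjacent synchronized regions on the same lock: a candidate for
        // holding the lock across both instead of releasing and re-acquiring it.
        int bumpTwice() {
            synchronized (lock) {
                value++;
            }
            synchronized (lock) {
                value++;
                return value;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(joinLocal("byte", "code"));
        System.out.println(new Counter().bumpTwice());
    }
}
```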
