I am messing around with a toy interpreter in Java and I was considering trying to write a simple compiler that can generate bytecode for the Java Virtual Machine. Which got me thinking, how much optimization needs to be done by compilers that target virtual machines such as JVM and CLI?
Do Just In Time (JIT) compilers do constant folding, peephole optimizations etc?
I'm just gonna add two links which explain Java's bytecode pretty well, along with some of the various optimizations the JVM performs at runtime.
Optimisation is what makes JVMs viable as environments for long-running applications. You can bet that SUN, IBM and friends are doing their best to ensure they can optimise your bytecode and JIT-compiled code in as efficient a manner as possible.
With that being said, if you think you can pre-optimise your bytecode then it probably won't do much harm.
It is worth being aware, however, that JVMs can tend towards performing better (and not crashing) when presented with just the sort of bytecode the Java compiler tends to construct. It is not unknown for optimisations to be missed or even for the JVM to crash when permutations of bytecode occur that are correct but unlike what would be produced by javac. Hopefully that sort of thing is more in the past now, but it may be something to be aware of.
Optimising bytecode is probably an oxymoron in most cases
I don't think that's true. Optimizations like hoisting loop invariants and propagating constants can never hurt, even if the JVM is smart enough to do them on its own, by simple virtue of making the code do less work.
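To make the two transformations concrete, here is a small Java sketch (the class and method names are illustrative, not from the original thread). The first method recomputes a loop-invariant value on every iteration; the second hoists it out of the loop and uses the folded constant 60 * 60 = 3600. A JIT may well do the same on its own, but the second version does less work even when interpreted.

```java
public final class HoistingExample {

    // Naive version: the scale factor depends only on `base`, yet it is
    // recomputed (two multiplications) on every iteration.
    static long naive(int[] values, int base) {
        long total = 0;
        for (int i = 0; i < values.length; i++) {
            long scale = (long) base * 60 * 60; // loop-invariant
            total += values[i] * scale;
        }
        return total;
    }

    // Hand-optimised version: the invariant is hoisted out of the loop and
    // the constant 60 * 60 has been folded to 3600.
    static long hoisted(int[] values, int base) {
        final long scale = base * 3600L;
        long total = 0;
        for (int value : values) {
            total += value * scale;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(naive(data, 2));   // 72000
        System.out.println(hoisted(data, 2)); // 72000
    }
}
```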
Obfuscators such as ProGuard will perform many static optimisations on your bytecode for you.
The HotSpot compiler will optimize your code at runtime better than is possible at compile time; it has more information to work with, after all. The only time you should be optimizing the bytecode instead of just your algorithm is when you are targeting mobile devices, such as the BlackBerry, where the JVM for that platform is not powerful enough to optimize code at runtime and just executes the bytecode.
Optimising bytecode is probably an oxymoron in most cases. Unless you control the VM, you have no idea what it does to speed up code execution, if anything. The compiler would need to know the details of the VM in order to generate optimised code.
Note to Aseraphim:
It can also be useful to optimise bytecode for non-embedded applications in some limited cases:
When delivering code over the wire, e.g. for WebStart apps, to minimise deliverable/cache size and because you don't necessarily know the capability/speed of the client.
For code that you know is performance critical and used at start-up before (say) HotSpot has had time to gather any stats.
Again, the transformations that a good optimiser/obfuscator performs can be very helpful.
Related
I did some Java bytecode generation using ASM, by walking the AST of a small DSL with the visitor pattern.
I'm worried that the generated bytecode is too 'straightforward', that is, it has no 'compile-time optimization' at all.
Although in my case it could be OK if the generated bytecode is not optimized, I still can't help asking: is there a need for projects that generate bytecode at runtime to do bytecode optimization?
I know that for the JVM, most of the 'optimization' work is done while the program is running, by JIT compilation. So bytecode optimization at compile time may have little effect.
But really? Is it absolutely meaningless to optimize bytecode that is generated on the fly? Can anyone share some experience about the difference, mainly in runtime performance, between bytecode with and without any form of optimization?
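For a code generator like this, the cheapest compile-time optimization is usually a peephole pass. Below is a minimal sketch that folds constant arithmetic, assuming a hypothetical stack-machine IR (the Op/Insn types are invented for illustration; this is neither real JVM bytecode nor the ASM API). It looks at the last three emitted instructions and replaces "PUSH a; PUSH b; ADD|MUL" with a single PUSH.

```java
import java.util.ArrayList;
import java.util.List;

// Toy peephole pass over a hypothetical stack-machine IR (not real JVM
// bytecode and not the ASM API): folds "PUSH a; PUSH b; ADD|MUL" into a
// single "PUSH result".
public final class PeepholeFolder {

    enum Op { PUSH, LOAD, ADD, MUL }

    static final class Insn {
        final Op op;
        final int operand; // constant for PUSH, local slot for LOAD, unused otherwise

        Insn(Op op, int operand) { this.op = op; this.operand = operand; }

        static Insn push(int value) { return new Insn(Op.PUSH, value); }
        static Insn load(int slot) { return new Insn(Op.LOAD, slot); }
        static Insn of(Op op) { return new Insn(op, 0); }

        @Override public String toString() {
            return op == Op.PUSH || op == Op.LOAD ? op + " " + operand : op.toString();
        }
    }

    static List<Insn> fold(List<Insn> code) {
        List<Insn> out = new ArrayList<>();
        for (Insn insn : code) {
            out.add(insn);
            // Re-check the tail after every fold so that results can feed
            // further folds, e.g. (1 + 2) * 3 collapses to a single PUSH 9.
            while (out.size() >= 3) {
                int n = out.size();
                Insn a = out.get(n - 3), b = out.get(n - 2), op = out.get(n - 1);
                boolean foldable = a.op == Op.PUSH && b.op == Op.PUSH
                        && (op.op == Op.ADD || op.op == Op.MUL);
                if (!foldable) {
                    break;
                }
                // int arithmetic wraps on overflow, matching iadd/imul semantics
                int value = op.op == Op.ADD ? a.operand + b.operand : a.operand * b.operand;
                out.subList(n - 3, n).clear(); // drop the three-instruction window
                out.add(Insn.push(value));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // (1 + 2) * 3 + x, where x lives in local slot 0
        List<Insn> code = List.of(
                Insn.push(1), Insn.push(2), Insn.of(Op.ADD),
                Insn.push(3), Insn.of(Op.MUL),
                Insn.load(0), Insn.of(Op.ADD));
        System.out.println(fold(code)); // [PUSH 9, LOAD 0, ADD]
    }
}
```

Whether a pass like this pays off on HotSpot is exactly the open question above; it mostly shrinks the emitted code and helps interpreters or weaker VMs.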
I know at least one JVM-based language, which shall remain nameless, that is slow as hell. It could have used some compile-time optimization.
Javac and JVM are analyzing roughly the same programming model, therefore any optimization techniques that Javac can employ can be employed by JVM too. Then there is not much point for Javac to duplicate the work. Actually it's probably preferred that Javac leaves as much structure of the source code as possible so JVM can better reason about the code.
That doesn't apply if the source language is not a Java-ish language.
Think about this: the CPU does a lot of wonderful optimization too, so why does the JVM need to do any optimization? Why not leave it all to the CPU? Because the CPU and the JVM are analyzing very different code. The CPU is analyzing an arbitrary sequence of machine instructions (though it can make assumptions based on common behaviors of high-level languages). The JVM is analyzing a very specific, much higher-level language, so it can reason about and transform the code based on knowledge that is almost impossible for the CPU to discover from the machine instructions.
Back to your case: it is possible that you (as the compiler) know a lot more about your even-higher-level source language, so you can perform transformations that are impossible for the JVM.
No it is not necessary.
If you look at Javac's output, it does virtually no compile time optimization at all. And thanks to Hotspot's JIT, it is difficult to tell what impact changing the bytecode will have on optimization anyway. It's best not to worry about such things unless you can prove there's a real bottleneck and have the time to investigate it.
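One thing javac does do is fold compile-time constant expressions (JLS §15.28). The sketch below makes that observable without a disassembler: a constant string expression is evaluated by the compiler and interned like a literal, whereas concatenating a non-final local happens at run time and yields a new String object. The class name is illustrative; running `javap -c` on the compiled class shows the folded constants directly.

```java
// Constant folding that javac does perform: PREFIX + "code" is a compile-time
// constant expression, so the compiler evaluates it and interns the result
// like a literal. The non-final local makes the second concatenation a
// run-time operation that produces a new String object.
public final class JavacFolding {

    static final String PREFIX = "byte";          // a constant variable
    static final int SECONDS_PER_HOUR = 60 * 60;  // emitted as the constant 3600

    static boolean constantIsFolded() {
        return (PREFIX + "code") == "bytecode";   // same interned object
    }

    static boolean runtimeConcatIsNewObject() {
        String prefix = "byte";                   // not final: not a constant variable
        return (prefix + "code") != "bytecode";   // new object created at run time
    }

    public static void main(String[] args) {
        System.out.println(constantIsFolded());         // true
        System.out.println(runtimeConcatIsNewObject()); // true
        System.out.println(SECONDS_PER_HOUR);           // 3600
    }
}
```

Beyond this kind of folding, javac leaves loops, calls and allocations essentially as written.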
I would like to know how to check whether the JIT compiler is turned off. I have the following code, which is meant to turn the JIT compiler off. The problem is, I am not sure if it is actually doing that, so I was wondering if there is a way of checking whether the JIT is off.
I looked at the Compiler class but there isn't any method like isDisabled/enabled().
Code:
Compiler.disable();
Any help or direction will be highly appreciated.
(Not a direct answer to your question, since it seems you were trying to turn off the JIT compiler programmatically, but based on your comment, this might be of interest.)
If you want to turn off the JIT compiler on a Sun/Oracle JVM, you should try the -Xint option:
-Xint
Operate in interpreted-only mode. Compilation to native code is disabled, and all bytecodes are executed by the interpreter. The performance benefits offered by the Java HotSpot Client VM's adaptive compiler will not be present in this mode.
I don't believe you can turn the JIT off at runtime.
If you want to seriously benchmark a Java program, you should definitely be ignoring the first few runs. Getting reliable benchmarks in Java is an extremely tricky business, best left to people much smarter than you or me.
I recommend using Caliper, which is used internally at Google for microbenchmarking and is plenty smart about warming up the JIT and stuff. In particular, look at the example here, which shows how to measure the efficiency of an algorithm for different input sizes.
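To see why the first few runs should be ignored, here is a deliberately naive timing loop (a sketch, not a substitute for Caliper or JMH; the names and the number of discarded rounds are arbitrary choices). Early rounds typically include interpretation and JIT-compilation time, so only the later rounds are printed. It makes no attempt to handle GC pauses, on-stack replacement or dead-code elimination properly.

```java
import java.util.function.LongSupplier;

// Deliberately naive illustration of warm-up. Early rounds usually include
// interpretation and JIT-compilation time; later rounds approach steady state.
public final class WarmupSketch {

    static volatile long blackhole; // consumes results so the work stays observable

    static long work() {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += i % 7;
        }
        return sum;
    }

    // Times `rounds` executions of `task`, returning elapsed nanoseconds per round.
    static long[] time(LongSupplier task, int rounds) {
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            blackhole = task.getAsLong();
            nanos[r] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        int warmupRounds = 10; // arbitrary; results from these rounds are discarded
        long[] nanos = time(WarmupSketch::work, 30);
        for (int r = warmupRounds; r < nanos.length; r++) {
            System.out.printf("round %2d: %,d ns%n", r, nanos[r]);
        }
    }
}
```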
From the article Performance Features and Tools:
The JIT compiler was first made available as a performance update in the Java Development Kit (JDK) 1.1.6 software release and is now a standard tool invoked whenever you use the java interpreter command in the Java 2 platform release. You can disable the JIT compiler using the -Djava.compiler=NONE option to the Java VM.
So you can deduce that when the java.compiler system property is not set, or is set to something other than NONE, the JIT is enabled.
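A small sketch that reports what the running VM says about its compiler, using the standard java.lang.management API plus two system properties. Assumptions, worth checking on your own VM: on HotSpot, "java.vm.info" reads "mixed mode" normally and "interpreted mode" under -Xint, and ManagementFactory.getCompilationMXBean() returns null when the VM has no compilation system. This describes how the VM was launched; it may not reflect a later runtime call such as Compiler.disable().

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Sketch: gather the VM's own hints about whether a JIT is present.
// Values are JVM-specific; the comments describe typical HotSpot behaviour.
public final class JitStatus {

    static String describe() {
        StringBuilder sb = new StringBuilder();
        // HotSpot: "mixed mode" normally, "interpreted mode" with -Xint
        sb.append("java.vm.info = ").append(System.getProperty("java.vm.info"));
        // Set only if the VM was launched with -Djava.compiler=...
        sb.append("\njava.compiler = ").append(System.getProperty("java.compiler"));
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            sb.append("\nno JIT compiler reported");
        } else {
            sb.append("\nJIT compiler = ").append(jit.getName());
            if (jit.isCompilationTimeMonitoringSupported()) {
                sb.append(", total compile time = ")
                  .append(jit.getTotalCompilationTime()).append(" ms");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once normally and once with -Xint is a quick way to compare the two outputs on a given JVM.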
IBM JVMs definitely support the Java interface java/lang/Compiler.disable() and .enable() which was introduced in Java 5, I believe. That includes WebSphere Real Time (which is a JVM designed to provide more predictable performance) as well as our "standard" JVMs. If you call disable(), it will prevent JIT compilations until you call enable().
I work for IBM on the JIT compiler team. We don't usually recommend people to use this interface, because interfering with JIT compilation heuristics is generally not a good idea, but there are reasonable real-time scenarios where you would use it.
You can print out methods as they get compiled with `-XX:+PrintCompilation`. If your method isn't printed out, or suddenly gets faster right after it is printed out, you can see the likely cause.
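A minimal program to try this on (names are illustrative). Run it as `java -XX:+PrintCompilation HotLoop`; once `square` has been called often enough, a line mentioning HotLoop::square typically appears. The exact output format and the compilation thresholds vary between JVM versions.

```java
// A small hot loop to observe with HotSpot's -XX:+PrintCompilation output.
public final class HotLoop {

    static long square(long x) {
        return x * x;
    }

    // Calls square() n times, so it quickly becomes a compilation candidate.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += square(i);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000_000));
    }
}
```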
According to the information I could gather on the .NET and Java execution environments, the current state of affairs is as follows:
Modern Java VMs are capable of performing continuous recompilation, which, combined with profiling, can yield great performance improvements. Older JVMs employed JIT.
More information in this article:
http://www.ibm.com/developerworks/library/j-jtp12214/ and especially: Java theory and practice: Dynamic compilation and performance measurement
.NET uses JIT or NGEN to generate native code, but once the native code is generated, no further (runtime) optimizations are performed.
Benchmarks aside and with no intention of escalating holy wars, does this mean that the Java HotSpot VM is one generation ahead of .NET? Will these technologies employed in the Java VM eventually find their way into the .NET runtime?
They follow two different strategies. I do not think one is better than the other.
.NET does not interpret bytecode, so it has to JIT everything as it gets executed and therefore cannot optimise heavily due to time constraints. If you need heavy optimizations in some part of the code, you can always NGEN it manually, or do a fast but unsafe implementation. Furthermore, calling native code is easy. The approach here seems to be to get the runtime good enough and manually optimise bottlenecks.
Modern JVMs will usually interpret most of the code, and then do an optimized compilation of the bottlenecks. This usually gets better results than straight JIT'ing, but if you need more, you don't have unsafe in Java, and calling native code is not nice. So the approach here is to do as much automatic optimising as possible, because the other options are not that good.
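A toy model of that mixed-mode strategy, as a sketch only: calls go through a slow "interpreted" path until an invocation counter crosses a threshold, after which a fast "compiled" path is swapped in. THRESHOLD and both implementations are hypothetical stand-ins; a real JVM does this per method, uses profiling data, and generates native code.

```java
import java.util.function.IntUnaryOperator;

// Toy model of "interpret first, compile the hot spots".
public final class TierUpSketch {

    private static final int THRESHOLD = 1_000; // hypothetical tier-up threshold

    private int calls;
    private boolean promoted;
    private IntUnaryOperator impl = TierUpSketch::slowSum;

    // "Interpreted" path: sums 1..n one step at a time.
    private static int slowSum(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // "Compiled" path: closed-form formula; agrees with slowSum for 0 <= n <= 46340.
    private static int fastSum(int n) {
        return n * (n + 1) / 2;
    }

    int invoke(int n) {
        // Count calls until the method is "hot", then switch implementations once.
        if (!promoted && ++calls >= THRESHOLD) {
            impl = TierUpSketch::fastSum;
            promoted = true;
        }
        return impl.applyAsInt(n);
    }

    boolean isPromoted() {
        return promoted;
    }
}
```

Callers never see the switch: both paths return the same result, which is the property a real VM must also preserve when it replaces interpreted code with compiled code.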
In reality Java applications tend to perform slightly better in time and worse in space when compared to .NET.
I've never benchmarked the two against each other, and since I'm more familiar with the Sun JVM, I can only speak in general terms about JITs.
There are always tradeoffs with optimizations, and not all optimizations work all the time. However, here are some modern JIT techniques. I think this can be the beginning of a good conversation if we stick to the technical stuff:
escape analysis
intrinsics
http://bugs.sun.com/view_bug.do?bug_id=6823354
http://weblog.ikvm.net/CommentView.aspx?guid=0404dd8a-88a8-4d62-9bcb-98324d57a2a9
tail-call optimization
on-stack replacement
lock coarsening
lock elision
multi-threaded garbage collection
low-pause garbage collection
polymorphic method call removal
fast heap allocation
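As an example of the first item, here is a typical escape-analysis candidate (class names are illustrative). The Point allocated in distanceFromOrigin never leaves the method, so an optimising JIT such as HotSpot's C2 may replace it with two local doubles (scalar replacement) and skip the heap allocation entirely. Whether that happens depends on the JVM, its version and inlining decisions; the program's behaviour is identical either way.

```java
// Escape analysis: an object that provably never leaves its method need not
// be heap-allocated at all.
public final class EscapeExample {

    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
        double length() { return Math.sqrt(x * x + y * y); }
    }

    // Candidate for scalar replacement: `p` does not escape this method.
    static double distanceFromOrigin(double x, double y) {
        Point p = new Point(x, y);
        return p.length();
    }

    // Not a candidate: the new Point escapes through the return value.
    static Point escapes(double x, double y) {
        return new Point(x, y);
    }
}
```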
There are also features that are helpful as far as good implementations of a VM go:
being able to pick between GC implementations
customization of each GC
heap allocation parameters (such as growth)
page locking
Based on these features and many more, we can compare VMs, and not just "Java" versus ".NET" but, say, Sun's JVM versus IBM's JVM versus .NET versus Mono.
For example, Sun's JVM doesn't do tail-call optimization, IIRC, but IBM's does.
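For reference, this is what the missing optimization would buy (a sketch with illustrative names). HotSpot does not perform general tail-call elimination, so the tail-recursive form uses one stack frame per call and can throw StackOverflowError for large n; the loop is the rewrite a tail-call-optimizing compiler would perform, and a compiler targeting the JVM can emit it itself.

```java
// A tail-recursive method and the loop it is equivalent to.
public final class TailCallExample {

    // Tail-recursive: the recursive call is the last thing the method does,
    // but each call still pushes a new JVM stack frame.
    static long sumTo(long n, long acc) {
        if (n == 0) {
            return acc;
        }
        return sumTo(n - 1, acc + n); // tail call
    }

    // The same computation after manual tail-call elimination: constant stack.
    static long sumToLoop(long n) {
        long acc = 0;
        while (n != 0) {
            acc += n;
            n--;
        }
        return acc;
    }
}
```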
Apparently someone was working on something similar for Rotor. I don't have access to IEEE so I can't read the abstract.
Dynamic recompilation and profile-guided optimisations for a .NET JIT compiler
Quote from Summary...
An evaluation of the framework using a set of test programs shows that performance can improve by a maximum of 42.3% and by 9% on average. Our results also show that the overheads of collecting accurate profile information through instrumentation to an extent outweigh the benefits of profile-guided optimisations in our implementation, suggesting the need for implementing techniques that can reduce such overheads.
You may be interested in SPUR, which is a tracing JIT compiler. The focus is on JavaScript, but it operates on CIL, not on the language itself. It is a research project based on Bartok, not the standard .NET VM. The paper has some performance benchmarks showing 'it consistently performs faster than SPUR-CLR', which is the standard 3.5 CLR. There haven't been any announcements about its future relating to the current VM, however. Traces can cross method boundaries, which is not something HotSpot does AFAIK; JVM tracing JITs are mentioned here.
I'd be hesitant to say the .NET VM is a generation behind, especially when considering all the subsystems, in particular generics. I'm unsure how the GC and the DLR vs. invokedynamic compare, but there are lots of details about them at places like Channel 9.
I realize the benefits of bytecode vs. native code (portability).
But say you always know that your code will run on a x86 architecture, why not then compile for x86 and get the performance benefit?
Note that I am assuming there is a performance gain to native code compilation. Some folks have answered that there could in fact be no gain, which is news to me.
Because the performance gain (if any) is not worth the trouble.
Also, garbage collection is very important for performance. Chances are that the GC of the JVM is better than the one embedded in the compiled executable, say with GCJ.
And just-in-time compilation can even result in better performance, because the JIT has more information available at run time to optimize the compilation than the compiler has at compile time. See the Wikipedia page on JIT.
"Solaris" is an operating system, not a CPU architecture. The JVM installed on the actual machine will compile to the native CPU instructions. Solaris could be SPARC, x86, or x86-64 architecture.
Also, the JIT compiler can make processor-specific optimisations depending on which actual CPU family you have. For example, different instruction sequences are faster on Intel CPUs than on AMD CPUs, and a JIT compiler for your exact platform can take advantage of this information to produce highly optimised code.
The bytecode runs in a Java Virtual Machine that is compiled for (example) Solaris. It will be optimised like heck for that operating system.
In real-world cases, you often see equal or better performance from Java code at runtime, by virtue of building on the virtual machine's code for things like memory management; that code will have been evolving and maturing for years.
There are more benefits to building for the JVM than just portability. For example, every time a new JVM is released your compiled bytecode gets any optimisations, algorithmic improvements etc. that come from the best in the business. On the other hand, once you've compiled your C code, that's it.
Because with just-in-time compilation, there is only a trivial performance benefit.
In fact, there are many things a JIT can do faster.
It will already be compiled by the JIT into Solaris native code at run time. You can't get any other benefit by compiling it before uploading it to the target site.
You may or may not get a performance benefit. But more likely you would get a performance penalty: JIT optimization is not possible with static compilation, so the performance would only be as good as the compiler can make it "blindfolded" (without actually profiling the program and optimizing it accordingly, which is what JIT compilers such as HotSpot do).
It's intuitively quite surprising how cheap (resource-wise) compiling is, and how much can be automatically optimized by just observing the running program. Black magic, but good for us :-)
All this talk of JITs is about seven years out of date BTW. The technology concerned now is called HotSpot and it isn't just a JIT.
"why not then compile for x86"
Because then you can't take advantage of the specific features of the particular CPU it gets run on. In particular, if we are to read "compile for x86" as "produce native code that can run on a 386 and its descendants", then the resulting code can't rely on even something as old as the MMX instructions.
As such, you end up needing to compile for every exact architecture it'll run on (and what about those that don't exist yet?), and have the installer select which executable to put into place. Alternatively, I hear the Intel C++ compiler will produce several versions of the same function, differing only in the CPU features used, and pick the right one at run time based on what the CPU reports as available.
On the other hand, you can view bytecode as a "half-compiled" source, similar to an intermediate format that a native compiler will not (unless asked) actually write to disk. The runtime environment can then do the final compilation, knowing exactly what architecture will be used. This is the reason given for why some C#/.NET code could slightly outperform C++ code on some CPU-intensive tasks in some benchmarks a while ago.
The "final compilation" of bytecode can also make additional optimization assumptions that are distinctly unsafe from a static compilation perspective, and just recompile if those assumptions are later found to be wrong.
I guess because JIT (just in time) compilation is very advanced.
It occurred to me that when you write a C program, the compiler knows the source and destination platform (for lack of a better term) and can optimize to the machine it is building code for.
But in Java the best the compiler can do is optimize to the bytecode, which may be great, but there's still a layer in the JVM that has to interpret the bytecode, and the farther the bytecode is from the final machine architecture translation-wise, the more work has to be done to make it go.
It seems to me that a bytecode optimizer wouldn't be nearly as good because it has lost all the semantic information available from the original source code (which may already have been butchered by the Java compiler's optimizer).
So is it even possible to ever approach the efficiency of C with a Java compiler?
Actually, a bytecode JIT compiler can exceed the performance of statically compiled languages in many instances because it can evaluate the bytecode in real time and in the actual execution context. So the app's performance increases as it continues to run.
What Kevin said. Also, the bytecode optimizer (JIT) can take advantage of runtime information to perform better optimizations. For instance, it knows which code executes most (hot spots), so it doesn't spend time optimizing code that rarely executes. It can do most of what profile-guided optimization gives you (branch prediction, etc.), but on the fly, for whatever the target processor is. This is why the JVM usually needs to "warm up" before it reaches best performance.
In theory both optimizers should behave 'identically' as it is standard practice for c/c++ compilers to perform the optimization on the generated assembly and not the source code so you've already lost any semantic information.
If you read the byte code, you may see that the compiler doesn't optimise the code very well. However the JIT can optimise the code so this really doesn't matter.
Say you compile the code on an x86 machine and a new architecture comes along; let's call it x64. The same Java binary can take advantage of the new features of that architecture even though it might not have existed when the code was compiled. It means you can take old distributions of libraries and take advantage of the latest hardware-specific optimisations. You cannot do this with C/C++.
Java can optimise inline calls for virtual methods. Say you have a virtual method with many different possible implementations, but in reality one or two implementations are called most of the time. The JIT can detect this and inline up to two method implementations, but still behave correctly if you happen to call another implementation. You cannot do this with C/C++.
Java 7 supports escape analysis for locked/synchronised objects: it can detect that an object is only used in a local context and drop synchronization for that object.
In the current versions of Java, it can detect if two consecutive methods lock the same object and keep the lock held between them (rather than releasing and re-acquiring it).
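A source-level picture of that bimorphic inlining, as a hedged sketch (the Shape classes are invented for illustration). The guarded version checks for the two receiver classes seen most often, runs their bodies inline, and falls back to an ordinary virtual call for anything else. A JIT does this on compiled code from profile data and can deoptimise if the profile changes; writing it by hand is only for illustration.

```java
import java.util.List;

// Illustration of guarded devirtualization (a bimorphic inline cache).
public final class InlineCacheSketch {

    interface Shape { double area(); }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    static final class Triangle implements Shape { // the rarely seen case
        final double base, height;
        Triangle(double base, double height) { this.base = base; this.height = height; }
        public double area() { return 0.5 * base * height; }
    }

    // Plain virtual dispatch at every call.
    static double totalVirtual(List<? extends Shape> shapes) {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }

    // Exact-class guards for the two common receivers, bodies inlined.
    static double totalGuarded(List<? extends Shape> shapes) {
        double total = 0;
        for (Shape s : shapes) {
            Class<?> k = s.getClass();
            if (k == Square.class) {
                double side = ((Square) s).side;
                total += side * side;
            } else if (k == Circle.class) {
                double r = ((Circle) s).radius;
                total += Math.PI * r * r;
            } else {
                total += s.area(); // fallback: ordinary virtual call
            }
        }
        return total;
    }
}
```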
You cannot do this with C/C++ because there isn't a language level understanding of locking.
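The two locking optimizations can be pictured with a short sketch (names are illustrative, and whether a given JVM applies either optimization is not guaranteed). In localBuffer, the StringBuffer never escapes, so the locks taken by its synchronized methods may be elided. In twoCalls, the same monitor is released and immediately re-acquired, which lock coarsening may merge into one lock region, roughly what coarsened writes out by hand.

```java
// Candidates for lock elision and lock coarsening.
public final class LockExamples {

    // Lock elision candidate: StringBuffer's methods are synchronized, but the
    // buffer is purely local, so no other thread can ever contend for its lock.
    static String localBuffer(String a, String b) {
        StringBuffer sb = new StringBuffer();
        sb.append(a);
        sb.append(b);
        return sb.toString();
    }

    static final class Counter {
        private int value;
        synchronized void increment() { value++; }
        synchronized int get() { return value; }
    }

    // Lock coarsening candidate: lock, unlock, lock, unlock on the same object.
    static void twoCalls(Counter c) {
        c.increment();
        c.increment();
    }

    // Hand-coarsened equivalent: a single acquisition covers both updates.
    static void coarsened(Counter c) {
        synchronized (c) {
            c.value++;
            c.value++;
        }
    }
}
```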