Static Java bytecode optimizer (like ProGuard) with escape analysis?

Optimization based on escape analysis is a planned feature for ProGuard. In the meantime, are there any existing tools like ProGuard that already perform optimizations requiring escape analysis?

Yes, I think the Soot framework performs escape analysis.

What do you expect from escape analysis at the compiler level? Java classes are more like object files in C - they are linked inside the JVM - so escape analysis can be performed only at the single-method level, which is of limited use and will hamper debugging (e.g. you will have lines of code through which you cannot step).
In Java's design, the compiler is quite dumb - it checks for correctness (like lint), but doesn't try to optimize. The smart pieces are put in the JVM - it uses multiple optimization techniques to yield well-performing code on the current platform, under the current conditions. Since the JVM knows all the code that is currently loaded, it can assume a lot more than the compiler and perform speculative optimizations which are reverted the moment the assumptions are invalidated. The HotSpot JVM can replace code with a more optimized version on the fly while the function is running (e.g. in the middle of a loop as the code gets 'hotter').
When not running under a debugger, variables with non-overlapping lifetimes are collapsed, invariants are hoisted out of loops, loops are unrolled, etc. All this happens in the JIT-ted code and depends on how much time is spent in a given function (it does not make much sense to spend time optimizing code that never runs). If we perform some of these optimizations upfront, the JIT will have less freedom and the overall result might be a net negative.
Another optimization is stack allocation of objects that do not escape the current method. This is done in certain cases, though I read a paper somewhere suggesting that the time needed for rigorous escape analysis, compared with the time gained by the resulting optimizations, means it is often not worth it, so the current strategy is more heuristic.
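As a hypothetical illustration (not taken from ProGuard or any particular JVM, and all names below are made up), the allocation in this sketch is the classic candidate: the Point never escapes the method, so an escape-analysis pass may stack-allocate it or scalar-replace it with two local doubles instead of heap-allocating a new object on every call.

    class EscapeExample {
        static final class Point {
            final double x, y;
            Point(double x, double y) { this.x = x; this.y = y; }
        }

        // The Point is never stored in a field, returned, or passed to other code,
        // so it does not escape and the heap allocation can (in principle) be
        // replaced by two locals on the stack.
        static double sumCoordinates(double a, double b) {
            Point p = new Point(a, b);
            return p.x + p.y;
        }

        public static void main(String[] args) {
            System.out.println(sumCoordinates(1.5, 2.5)); // prints 4.0
        }
    }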
Overall, the more information the JVM has about your original code, the better it can optimize it. And the optimizations the JVM does are constantly improving, so I would think about compiled-code optimizations only when targeting very restricted and basic JVMs, such as those on mobile phones. In those cases you want to run your application through an obfuscator anyway (to shorten class names, etc.).

Related

Why use JMH if you can switch off JIT?

I wonder why I should use JMH for benchmarking if I can simply switch off the JIT.
Isn't JMH just suppressing optimizations that could also be prevented by disabling the JIT?
TL;DR: That would be like assessing Formula 1 performance by riding a bicycle on the same track.
The question is very odd, especially if you ask yourself a simple follow-up question. What would be the point of running the benchmark in conditions that are drastically different from your production environment? In other words, how would knowledge gained from running in interpreted mode apply to the real world?
The issue is not black and white here: you need optimizations to happen as they happen in the real world, and you need them broken in some carefully selected places to make a good experimental setup. That's what JMH is doing: it provides the means for constructing the experimental setups. JMH samples explain the intricacies and scenarios quite well.
And, well, benchmarking is not about fighting the compiler only. Lots and lots of non-compiler (and non-JVM!) issues need to be addressed. Of course, it can be done by hand (JMH is not magical, it's just a tool that was also written by humans), but you will spend most of your time budget addressing simple issues, while having no time left to address the really important ones, specific to your experiment.
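As a rough sketch of such an experimental setup (the class and field names below are illustrative, not taken from the JMH samples), a minimal JMH benchmark might look like this; returning the computed value keeps the JIT from dead-code-eliminating the measured work while leaving its other real-world optimizations intact.

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Thread)
    public class SumBenchmark {
        int[] data = new int[1024];

        @Benchmark
        public long sum() {
            long total = 0;
            for (int value : data) {
                total += value;
            }
            // Returning the result makes it observable to JMH, so the JIT cannot
            // treat the whole loop as dead code.
            return total;
        }
    }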
The JIT is not bulletproof and almighty. For instance, it will not kick in before a certain piece of code is run a certain number of times, or it will not kick in if a piece of bytecode is too large/too deeply buried/etc. Also, consider live instrumentation which, more often than not, prevents the JIT from operating at all (think profiling).
Therefore the interest remains in being able to either turn it off or on; if a piece of code is indeed time critical, you may want to know how it will perform depending on whether the JIT kicks in.
But those situations are very rare, and are becoming more and more rare as the JVM (therefore the JIT) evolves.

Compiling Scheme using Java

I was writing a Scheme interpreter (trying to be fully R5RS compatible) and it just struck me that compiling into VM opcodes would make it faster. (Correct me if I am wrong.) I can interpret the Scheme source code in memory, but I am stuck on understanding code generation.
My question is: What patterns will be required to generate opcodes from a parse tree, for, say, the JVM or any other VM (or even a real machine)? And what, if any, will be the complications, advantages, or disadvantages of doing so?
For Scheme there will be two major complications related to the JVM.
First, the JVM does not support explicit tail calls, so you won't be able to guarantee proper tail recursion as required by R5RS (section 3.5) without resorting to an expensive mini-interpreter trick.
The second issue is continuation support. The JVM does not provide anything useful for implementing continuations, so again you're bound to use a mini-interpreter: each trivial CPS function returns the next closure, which is then called by an infinite mini-interpreter loop.
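A minimal sketch of that mini-interpreter (trampoline) trick, using hypothetical helper types rather than anything from an actual Scheme compiler: each call returns the next step as a closure instead of invoking it directly, and a driver loop keeps running steps until a final value appears, so the Java call stack never grows.

    public class Trampoline {
        // Either another step to run, or (when next() returns a non-Step) a final value.
        interface Step { Object next(); }

        // The "mini-interpreter" loop: keep invoking steps until a final value appears.
        static Object run(Step step) {
            Object r = step;
            while (r instanceof Step) {
                r = ((Step) r).next();
            }
            return r;
        }

        // A tail-recursive countdown expressed as steps; written as direct recursion,
        // this depth would overflow the Java stack.
        static Step countdown(long n) {
            return () -> (n == 0) ? "done" : countdown(n - 1);
        }

        public static void main(String[] args) {
            System.out.println(run(countdown(1_000_000))); // prints "done"
        }
    }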
But there are still many interesting optimisation possibilities. I'd recommend taking a look at Bigloo (it has a relatively fast JVM backend) and Kawa. For general compilation techniques, take a look at Scheme in 90 minutes.
Still, interpretation is a viable alternative to compilation (at least on the JVM, due to its severe limitations and general inefficiency for this purpose). See how SISC is implemented; it is quite an interesting and innovative approach.

Java, JIT and Garbage Collector efficiency

I want to know about the efficiency of Java and the advantages and disadvantages of the Java Virtual Machine and Android.
By efficiency I mean low memory use, low processor use and fast execution.
Mobile devices are simpler than PCs, so their apps need to be more efficient. Servers receive many connections, so they need to be very efficient too. Many mobile devices run Android and Java apps, and many servers run PHP.
Can Java and interpreted languages, such as JavaScript, Python and PHP, be more efficient than C and C++?
JIT (just-in-time) compilation advantages:
It can optimize better, because it knows the values of some variables and where they are used or changed.
It knows the processor and can optimize with processor-specific instructions.
It can inline functions more easily.
It can remove known conditional tests and remove blocks that will not be run.
Java disadvantages:
When the app runs for the first time, it will be very slow, because the bytecode is interpreted and the JIT compiler does a lot of analysis to find good optimizations. The app cannot use the full power of the hardware. This is a problem for games and real-time apps: performance on the first run is very different from performance on later, optimized runs, so the app cannot be designed around the hardware power that only becomes available after optimization, because it would then be too slow the first time it runs.
Java checks that array indexes are not out of bounds and that references are not null. This adds several internal "if"s to the generated code.
All objects are managed by the garbage collector, including objects that would be very easy to delete manually.
All object instances are created with dynamic memory allocation, including objects that could easily live on the stack. If each loop iteration starts by creating an instance of a class and ends by discarding it, dynamic memory allocation will be inefficient.
The garbage collector needs to stop the app while it cleans memory, which is very undesirable for games, GUI apps and real-time apps. Reference counting is slow and it cannot handle circular references. A multi-threaded garbage collector is slower and uses more CPU.
Can Java and interpreted languages, such as JavaScript, Python and PHP, be more efficient than C and C++?
It's very difficult to get more efficient than the best C and C++ programs. There are a lot of C and C++ programs that are nowhere near as efficient as that, though, and beating them with (modern) Java code is quite practical if you're any good.
I've also heard good things about the current best-of-breed Javascript engines, but I've never studied them in detail.
With Python and PHP (and many other languages besides) it's a bit different. Those languages are implemented in C, so it's obvious they cannot be more efficient than C (it follows by construction). Yet it's much easier to write efficient code in them (i.e., code that uses what is in effect a very well-written C library) than it is to start from scratch. In particular, it reduces the number of defects per program. That's a very important metric in practice; anyone can produce fast code if it's allowed to be wrong.
In general, I advise not worrying about getting maximal efficiency. You run up against the law of diminishing returns. Instead, use sensible overall algorithms (or, as a friend of mine once said to me, “look after the big O()s and let the constant factors look after themselves”) and focus on the question of whether the program is good enough in practice. Once it is, stop fiddling around and ship it!
Let's pick apart your claimed disadvantages:
When the app runs for the first time, it will be very slow, because the bytecode is interpreted and the JIT compiler does a lot of analysis to find good optimizations. The app cannot use the full power of the hardware.
JIT compilation is an implementation issue. Not all platforms do it. Indeed, the Android platform could be modified to 1) do ahead-of-time compilation, or 2) cache the native code produced by the JIT to give faster startup the next time you run the app.
It is interesting that various Java vendors have tried these strategies at various times, and yet the empirical evidence is that plain JIT is the best strategy.
Java checks that array indexes are not out of bounds and that references are not null. This adds several internal "if"s to the generated code.
The JIT compiler can optimize away many of these tests. For the rest, the overheads tend to be relatively small; e.g. a few percent difference ... not a factor of 2.
Note that the alternative to checking is the risk that typical application bugs will crash the android platform. Certainly, garbage collection becomes problematic if applications can trash memory.
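As a hedged illustration (hypothetical code, and exactly which checks are removed depends on the JVM), a simple counting loop over an array is the typical case where HotSpot can prove the index is always in range and drop the per-access checks once the method is hot.

    class BoundsCheckExample {
        // In a loop like this, the JIT can typically prove that i is always within
        // bounds (and that values is non-null after the first access), so the
        // per-element checks can be hoisted out or eliminated once the method is hot.
        static long sum(int[] values) {
            long total = 0;
            for (int i = 0; i < values.length; i++) {
                total += values[i];
            }
            return total;
        }

        public static void main(String[] args) {
            System.out.println(sum(new int[] {1, 2, 3})); // prints 6
        }
    }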
All objects are managed by the garbage collector, including objects that would be very easy to delete manually.
The flip-side is that it is easy to forget to delete objects, delete objects twice, use them after they have been deleted and so on. These mistakes all lead to bugs that tend to be hard to track down.
All object instances are created with dynamic memory allocation, including objects that could easily live on the stack. If each loop iteration starts by creating an instance of a class and ends by discarding it, dynamic memory allocation will be inefficient.
Java dynamic memory allocation and object creation is FAST - faster than in C++, for example.
The garbage collector needs to stop the app while it cleans memory, which is very undesirable for games, GUI apps and real-time apps.
Use a concurrent / low-pause garbage collector then. Another approach is to implement your app to not generate lots of garbage ... and seldom trigger garbage collection.
Reference counting is slow and it cannot handle circular references.
No decent Java GC uses reference counting. (On the other hand, a lot of C / C++ manual memory management schemes do. For instance, so-called smart pointer schemes in C++.)
A multi-threaded garbage collector is slower and uses more CPU.
You actually mean concurrent collection, I think. Yes it does, but that's the penalty you pay for the extra responsiveness that you demand for interactive games / real-time apps.
What you describe as 'efficient' I would describe as 'ideal'. An application that requires little memory, little CPU time and runs quickly, put another way, is one that is good, fast, and cheap all at the same time. Never mind if it does anything useful or interesting.
The only comparison I'd view as reasonable, if all three goals are required, is among applications that produce a common result. In that case, it is unlikely, given a competing group of evenly-capable programmers, that any one implementation would excel on all three counts over the others.
That said, your question leaves out a key criterion in the mobile market: rate of application development. Mobile applications also profit far more from a positive user experience than from back-end optimization. Without that constraint, the question of efficiency as you put it seems to me more of a ponderous consideration than a practical one.
But to the actual question: can a language like Java produce more efficient code than one that compiles statically to the instruction set of the target machine? Probably not. Can it be as efficient, or efficient enough? Absolutely. If we considered an execution platform with fixed, severely constrained resources that changes infrequently, it would be a different matter.
In any language, the way to get fast execution is to do the job with as little execution as possible, and as little garbage collection as possible.
That sounds like a vacuous generality, but what it means in practice, regardless of language, is
For the data structure design, keep it as simple as possible. Stay away from the fancy collection classes full of bells and whistles. Especially stay away from notifications as a way of keeping data consistent. If your data is normalized, it can never be inconsistent. If you can't normalize it, it's better to tolerate temporary inconsistency, than to try to keep it tight with notifications.
Performance problems creep in, even into the best code. You should try not to make them, but you will still make them. Most important is knowing how to find them, once made, and remove them. Here's a blow-by-blow example. If in doing this, you find you need a better big-O algorithm, then put it in. Putting one in without being sure it's needed is a recipe for slowness.
No language can rescue a program from non-removed performance problems. The language and its compiler, JITter, etc. are like a race horse. It's fine to want a good horse, but it's a waste if the jockey isn't as slim as possible.
Your program is the jockey, and it's your job to take it on a weight-loss program.
I will paste an interesting answer given by James Gosling himself in the book Masterminds of Programming.
Well, I’ve heard it said that effectively you have two compilers in the Java world. You have the compiler to Java bytecode, and then you have your JIT, which basically recompiles everything specifically again. All of your scary optimizations are in the JIT.
James: Exactly. These days we’re beating the really good C and C++ compilers pretty much always. When you go to the dynamic compiler, you get two advantages when the compiler’s running right at the last moment. One is you know exactly what chipset you’re running on. So many times when people are compiling a piece of C code, they have to compile it to run on kind of the generic x86 architecture. Almost none of the binaries you get are particularly well tuned for any of them. You download the latest copy of Mozilla, and it’ll run on pretty much any Intel architecture CPU. There’s pretty much one Linux binary. It’s pretty generic, and it’s compiled with GCC, which is not a very good C compiler.
When HotSpot runs, it knows exactly what chipset you’re running on. It knows exactly how the cache works. It knows exactly how the memory hierarchy works. It knows exactly how all the pipeline interlocks work in the CPU. It knows what instruction set extensions this chip has got. It optimizes for precisely what machine you’re on. Then the other half of it is that it actually sees the application as it’s running. It’s able to have statistics that know which things are important. It’s able to inline things that a C compiler could never do. The kind of stuff that gets inlined in the Java world is pretty amazing. Then you tack onto that the way the storage management works with the modern garbage collectors. With a modern garbage collector, storage allocation is extremely fast.

Is it possible to write a decent java optimizer if information is lost in the translation to bytecode?

It occurred to me that when you write a C program, the compiler knows the source and destination platform (for lack of a better term) and can optimize to the machine it is building code for.
But in Java the best the compiler can do is optimize to the bytecode, which may be great, but there's still a layer in the JVM that has to interpret the bytecode, and the farther the bytecode is, translation-wise, from the final machine architecture, the more work has to be done to make it run.
It seems to me that a bytecode optimizer wouldn't be nearly as good, because it has lost all the semantic information available from the original source code (which may already have been butchered by the Java compiler's optimizer).
So is it even possible to ever approach the efficiency of C with a java compiler?
Actually, a bytecode JIT compiler can exceed the performance of statically compiled languages in many instances, because it can evaluate the bytecode in real time and in the actual execution context. So the app's performance increases as it continues to run.
What Kevin said. Also, the bytecode optimizer (JIT) can take advantage of runtime information to perform better optimizations. For instance, it knows which code is executed most (hot spots), so it doesn't spend time optimizing code that rarely executes. It can do most of the things that profile-guided optimization gives you (branch prediction, etc.), but on the fly, for whatever the target processor is. This is why the JVM usually needs to "warm up" before it reaches its best performance.
In theory, both optimizers should behave 'identically', since it is standard practice for C/C++ compilers to perform optimization on the generated assembly rather than on the source code, so the semantic information is already lost there as well.
If you read the bytecode, you may see that the compiler doesn't optimise it very well. However, the JIT can optimise the code, so this really doesn't matter.
Say you compile the code on an x86 machine and a new architecture comes along - let's call it x64. The same Java binary can take advantage of the new features of that architecture even though it didn't exist when the code was compiled. This means you can take old distributions of libraries and take advantage of the latest hardware-specific optimisations. You cannot do this with C/C++.
Java can inline calls to virtual methods. Say you have a virtual method with many different possible implementations, but in reality one or two of them are called most of the time. The JIT can detect this and inline those one or two implementations, while still behaving correctly if you happen to call another implementation. You cannot do this with C/C++.
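A hedged sketch of what such a call site can look like (the types and class names below are illustrative, not from any particular benchmark); whether and how aggressively the JIT inlines here depends on the receiver-type profile it observes at runtime.

    interface Shape { double area(); }

    class Circle implements Shape {
        final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    class AreaSum {
        // s.area() is a virtual call. If the runtime profile shows only one or two
        // receiver types here, the JIT can inline their implementations behind a
        // cheap type check and deoptimize if a different type ever shows up.
        static double totalArea(Shape[] shapes) {
            double sum = 0;
            for (Shape s : shapes) {
                sum += s.area();
            }
            return sum;
        }

        public static void main(String[] args) {
            Shape[] shapes = { new Circle(1.0), new Square(2.0) };
            System.out.println(totalArea(shapes)); // prints roughly 7.14
        }
    }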
Java 7 supports escape analysis for locked/synchronised objects: it can detect that an object is only used in a local context and drop the synchronization on that object.
Current versions of Java can also detect when two consecutive methods lock the same object and keep the lock held between them (rather than releasing and re-acquiring it).
You cannot do this with C/C++ because there isn't a language-level understanding of locking.
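As a hedged illustration of these lock optimizations (the class and method names are made up, and whether the optimizations actually fire depends on the JVM version and flags), a locally used StringBuffer is the textbook case.

    class LockElisionExample {
        // StringBuffer's methods are synchronized, but this instance never escapes
        // the method, so escape analysis may elide the locking entirely; the
        // back-to-back synchronized calls on the same object are also a candidate
        // for lock coarsening into a single lock/unlock pair.
        static String greet(String name) {
            StringBuffer sb = new StringBuffer();
            sb.append("Hello, ");
            sb.append(name);
            return sb.toString();
        }

        public static void main(String[] args) {
            System.out.println(greet("world")); // prints "Hello, world"
        }
    }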

Virtual Machine Optimization

I am messing around with a toy interpreter in Java and I was considering trying to write a simple compiler that can generate bytecode for the Java Virtual Machine. Which got me thinking, how much optimization needs to be done by compilers that target virtual machines such as JVM and CLI?
Do Just In Time (JIT) compilers do constant folding, peephole optimizations etc?
I'm just going to add two links which explain Java's bytecode pretty well, along with some of the various optimizations the JVM performs at runtime.
Optimisation is what makes JVMs viable as environments for long-running applications; you can bet that Sun, IBM and friends are doing their best to ensure they can optimise your bytecode and JIT-compiled code in as efficient a manner as possible.
With that being said, if you think you can pre-optimise your bytecode then it probably won't do much harm.
It is worth being aware, however, that JVMs can tend towards performing better (and not crashing) when presented with just the sort of bytecode the Java compiler tends to construct. It is not unknown for optimisations to be missed or even for the JVM to crash when permutations of bytecode occur that are correct but unlike what would be produced by javac. Hopefully that sort of thing is more in the past now, but may be something to be aware of.
Optimising bytecode is probably an oxymoron in most cases
I don't think that's true. Optimizations like hoisting loop invariants and propagating constants can never hurt, even if the JVM is smart enough to do them on its own, by simple virtue of making the code do less work.
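For instance, here is a hedged before/after sketch of loop-invariant hoisting (hypothetical code, not the output of any particular optimizer); the hoisted version simply does less work per iteration, whether or not the JIT would have performed the same transformation itself.

    class HoistingExample {
        // Before: the loop-invariant subexpression is recomputed on every iteration.
        static double[] scaleBefore(double[] xs, double factor) {
            double[] out = new double[xs.length];
            for (int i = 0; i < xs.length; i++) {
                out[i] = xs[i] * (factor * 0.5 + 1.0);
            }
            return out;
        }

        // After hoisting: the invariant is computed once, outside the loop.
        static double[] scaleAfter(double[] xs, double factor) {
            double[] out = new double[xs.length];
            double k = factor * 0.5 + 1.0;
            for (int i = 0; i < xs.length; i++) {
                out[i] = xs[i] * k;
            }
            return out;
        }

        public static void main(String[] args) {
            double[] scaled = scaleAfter(new double[] {1.0, 2.0}, 4.0);
            System.out.println(java.util.Arrays.toString(scaled)); // prints [3.0, 6.0]
        }
    }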
Obfuscators such as ProGuard will perform many static optimisations on your bytecode for you.
The HotSpot compiler will optimize your code at runtime better than is possible at compile-time - it has more information to work with, after all. The only time you should be optimizing the bytecode instead of just your algorithm is when you are targeting mobile devices, such as the Blackberry, where the JVM for that platform is not powerful enough to optimize code at runtime and just executes the bytecode.
Optimising bytecode is probably an oxymoron in most cases. Unless you control the VM, you have no idea what it does to speed up code execution, if anything. The compiler would need to know the details of the VM in order to generate optimised code.
Note to Aseraphim:
It can also be useful to optimise bytecode for non-embedded applications in some limited cases:
When delivering code over the wire, e.g. for WebStart apps, to minimise the deliverable/cache size and because you don't necessarily know the capability/speed of the client.
For code that you know is performance critical and used at start-up before (say) HotSpot has had time to gather any stats.
Again, the transformations that a good optimiser/obfuscator performs can be very helpful.
