Who has successfully compiled a Java business project to native (e.g. using GCJ or Excelsior JET) and can share the pros and cons?
I imagine the following advantages:
more speed (the user's machine does not need to compile byte code to native code)
harder to hack (e.g. by decompiling)
no separate Java runtime required
and the following disadvantages:
needs a special build for each supported platform (but that's already required for SWT)
some features like reflection might not work?
harder to locate bugs (what about stack traces)?
I've used Excelsior JET to compile an SWT app to native for Windows.
This was some time ago, and they've improved the tool immensely since then, but JVM speed has also improved commensurately. Memory usage has improved less, but even low-end laptops have gigabytes of RAM these days and, although it might offend our sense of frugality, it really doesn't matter if your small GUI app uses 20 MB of memory on a box with at least 50 times that amount, especially when you factor in the advantages of developing in this sort of environment.
The main reasons to compile statically are startup time and memory usage. JET gave me both, but at the expense of a long build cycle, bugs from missing classes where dynamic loading conflicted with static compilation (something I believe they've improved a lot), and platform-specific builds (you must build the Windows distribution on Windows). Eventually Moore's Law and faster JVMs made these trade-offs not worth it, and I dropped this build option.
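Since startup time and memory are the two numbers that justify static compilation, it may be worth measuring them on your own app before committing to a longer build cycle. This is a minimal sketch using only standard JDK APIs (ManagementFactory.getRuntimeMXBean().getUptime() and the Runtime heap counters). Where you call it (e.g. once the first window is shown) is an assumption, and it reports Java heap only, not the whole process footprint.

```java
import java.lang.management.ManagementFactory;

/**
 * Sketch: report JVM uptime and current heap use, so the "startup time and memory"
 * argument for static compilation can be checked on a real application.
 * Note: heap only; the process footprint (JVM code, metaspace, threads) is larger.
 */
public class StartupFootprint {

    /** Milliseconds since the JVM started, from the standard RuntimeMXBean. */
    static long uptimeMillis() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    /** Bytes of heap currently in use: committed heap minus free heap. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Assumption: call this where your app first becomes usable.
        System.out.printf("JVM uptime: %d ms%n", uptimeMillis());
        System.out.printf("Heap in use: %.1f MB%n", usedHeapBytes() / (1024.0 * 1024.0));
    }
}
```

Running the same measurement against the JVM build and a natively compiled build gives a like-for-like comparison of heap use, though only the uptime figure is meaningful for a native executable that does not expose the same management beans.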
FWIW, stack traces and reflection work fine, as does dynamic class loading, provided you make sure the compiler knows about the classes that are not statically referenced. If you're worried about decompilation, an obfuscator will likely give you as much mileage.
I have nothing but good things to say about JET and the people who make it, though. If you need to do it, JET's a solid product.
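The caveat about classes "not statically referenced" is easiest to see in code. In this sketch the class to load is only a string, so a compiler that follows static references cannot discover it; with an ahead-of-time compiler it has to be declared to the tool explicitly (how that is done is tool-specific and not shown here). The default class name is just an example.

```java
/**
 * Sketch of dynamic class loading that a static compiler cannot discover on its own:
 * the class name is only a string, so no code references the class directly.
 */
public class DynamicLoading {

    /** Loads a class by name and creates an instance via its no-arg constructor. */
    static Object instantiate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // In a real app this name might come from a config file or a plugin descriptor.
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        Object obj = instantiate(name);
        // Under an AOT compiler this succeeds only if the named class was included in the build.
        System.out.println("Loaded " + obj.getClass().getName());
    }
}
```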
I can think of two cons:
Compiling the application takes time, which increases software development costs. Java originally used an interpreter to run code, which makes development far easier.
It doesn't have the original Java debugger for debugging your software. You need to use another one, such as GDB, which is far more complicated. Again, this increases your development costs.
I work in a corporate environment with several widely used, infrequently updated Java libraries in its repositories. Some of the libraries are quite old and were compiled using language levels as far back as JDK 1.5 (Java 5).
Nearly all of our actively developed Java projects are using Java 8 or newer, but they are dependent on one or more of the JDK 1.5-level JARs.
Are there performance penalties for using JARs with very old bytecode versions? Can the modern JIT update old, inefficient bytecode on the fly?
Remember: Write Once, Run Anywhere! In your case, the slogan really fits.
Generally, there's no reason to touch old Java code if it's doing its job.
The bytecode hasn't changed much over the years: the bytecode instructions understood by a Java 8 JRE are almost all the same as those already present in Java 2 (invokedynamic, added in Java 7, is the notable addition). There were very few additions, so from that point of view there's no need to update the old bytecode. It will run even faster under Java 8 than under Java 5, as both the JRE with its HotSpot engine and the class library have improved a lot.
The changes in the class file format are mostly about metadata, and the class-file version number also makes sure that you don't run a Java 8 program under a Java 6 JRE, where half of the Java classes and methods are missing.
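If you want to know which of your JARs actually contain Java 5-level bytecode, the class-file version is easy to read yourself: every class file starts with the magic number 0xCAFEBABE, followed by a 16-bit minor and a 16-bit major version (javap -verbose prints the same "major version" field). A minimal sketch, assuming Java 9 or later for readAllBytes:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class ClassFileVersion {

    /** Reads the major version from the first 8 bytes of a class file (magic, minor, major). */
    static int majorVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default, like the class file format
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a class file (bad magic number)");
        }
        buf.getShort();                  // minor_version, ignored here
        return buf.getShort() & 0xFFFF;  // major_version as an unsigned 16-bit value
    }

    /** Maps a major version to a Java release: 49 -> 5, 50 -> 6, 51 -> 7, 52 -> 8, ... */
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        // Inspects this class's own bytecode; for a library, read the entries of its JAR instead.
        try (InputStream in = ClassFileVersion.class.getResourceAsStream("ClassFileVersion.class")) {
            if (in == null) {
                System.out.println("Class file not found on the classpath");
                return;
            }
            int major = majorVersion(in.readAllBytes());
            System.out.println("major version " + major + " = Java " + javaRelease(major));
        }
    }
}
```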
What changed a lot is the Java class library and the source language. And since an old library couldn't know about the changes that came later, it might turn out to be less efficient than a newly written version using all the features of later Java releases. But my guess is that the performance gained by redesigning the old libraries isn't worth the effort.
And finally the general advice on performance questions:
If it ain't broken, don't fix it (= don't optimize before you know that you have a performance problem).
Before optimizing, use a profiler to find out where your bottlenecks are. Believe me, the bottleneck is hardly ever where you expect it.
In order to learn more about testing, we're going to use a profiler on a larger project (to actually get some values and measurements), and since we don't have a large project of our own, we need to use someone else's. Any good suggestions? Maybe testing JUnit itself (as opposed to testing "with" JUnit)?
Edit:
Not looking for any specific data, just... something. The problem is that all of this is so new that it gets kind of confusing. The point is to get slightly accustomed to testing tools such as a profiler. In other words, it shouldn't be necessary to know much about the actual program, since the program doesn't really matter and the data gained isn't very significant either; it is mostly meant to demonstrate that you can actually get something out of testing. So it's a bit confusing how I should proceed, since I'm not used to big real-world programs.
Can I just download ordinary Java files and run/profile them with NetBeans (or similar) without having to do or care about a bunch of stuff?
Well, I've got my standard scenario. It's in C++, but it shouldn't take more than a day or two to recode it in Java.
Caveat: The scenario is not about measuring, per se, but about performance tuning, which is not at all the same thing.
It makes the point that serious code often contains multiple performance problems, and if you're really trying to make it go fast, profilers are not necessarily the best tools.
It depends on what type of data you want to profile. But the best way to get a "larger project" if you don't have one is to find an open source project on the web that fits what you want.
Edit: I have never profiled with NetBeans, so I can't speak for that tool, but if you don't care which tool you use, you can start with VisualVM (bundled with the JDK as jvisualvm from JDK 6 Update 7 through JDK 8; a separate download since JDK 9), a tool for monitoring the JVM. It's very useful, and if you already run Java applications (like NetBeans), you won't need much extra setup.
Description of the tool taken on their website: VisualVM monitors application CPU usage, GC activity, heap and permanent generation memory, number of loaded classes and running threads.
If you really want to profile with some source code, a small Java application with a main method will do the job, but again it depends on what data (and how much of it) you want to profile. Maybe you can find some "test applications" written in Java on the web.
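As an example of such a "little Java application with a main", here is a hypothetical workload written only to give a profiler something to look at: a deliberately naive prime counter as a CPU hotspot, plus lots of short-lived strings for the heap and GC graphs. The class name and loop sizes are arbitrary choices for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** A deliberately inefficient workload to attach VisualVM (or any profiler) to. */
public class ProfilingTarget {

    /** Naive trial-division prime count below limit: a CPU hotspot for the sampler to find. */
    static int countPrimes(int limit) {
        int count = 0;
        for (int n = 2; n < limit; n++) {
            boolean prime = true;
            for (int d = 2; d * d <= n; d++) {
                if (n % d == 0) {
                    prime = false;
                    break;
                }
            }
            if (prime) {
                count++;
            }
        }
        return count;
    }

    /** Builds many short-lived strings so the heap and GC graphs have something to show. */
    static int churnStrings(int n) {
        List<String> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add("item-" + i);
        }
        return list.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // Runs for roughly a minute: long enough to attach a profiler and watch CPU, heap, and GC.
        for (int round = 0; round < 60; round++) {
            int primes = countPrimes(200_000);
            int strings = churnStrings(500_000);
            System.out.printf("round %d: %d primes, %d strings%n", round, primes, strings);
            Thread.sleep(500);
        }
    }
}
```

In a CPU sampling view, countPrimes should stand out as the hotspot, which is a useful sanity check that the profiler is attached to the right process.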
I realize the benefits of bytecode vs. native code (portability).
But say you always know that your code will run on an x86 architecture: why not compile for x86 and get the performance benefit?
Note that I am assuming there is a performance gain from native code compilation. Some folks have answered that there could in fact be no gain, which is news to me.
Because the performance gain (if any) is not worth the trouble.
Also, garbage collection is very important for performance. Chances are that the GC of the JVM is better than the one embedded in the compiled executable, say with GCJ.
And just-in-time compilation can even result in better performance, because the JIT has more information available at run time to optimize the compilation than the compiler has at compile time. See the Wikipedia page on JIT.
"Solaris" is an operating system, not a CPU architecture. The JVM installed on the actual machine will compile to the native CPU instructions. Solaris could be SPARC, x86, or x86-64 architecture.
Also, the JIT compiler can make processor-specific optimisations depending on which actual CPU family you have. For example, different instruction sequences are faster on Intel CPUs than on AMD CPUs, and a JIT compiler for your exact platform can take advantage of this information to produce highly optimised code.
The bytecode runs in a Java Virtual Machine that is compiled for (example) Solaris. It will be optimised like heck for that operating system.
In real-world cases, you often see equal or better performance from Java code at runtime, by virtue of building on the virtual machine's code for things like memory management; that code will have been evolving and maturing for years.
There are more benefits to building for the JVM than just portability: for example, every time a new JVM is released, your compiled bytecode gets any optimisations, algorithmic improvements, etc. that come from the best in the business. On the other hand, once you've compiled your C code, that's it.
Because with just-in-time compilation, the performance benefit is trivial.
In fact, there are many things a JIT can do faster.
It will already be compiled by the JIT into Solaris native code once it runs. You won't gain anything more by compiling it before uploading it to the target site.
You may or may not get a performance benefit. But more likely you would get a performance penalty: JIT optimization is not possible with static compilation, so the performance would only be as good as the compiler can make it "blindfolded" (without actually profiling the program and optimizing it accordingly, which is what JIT compilers such as HotSpot do).
It's intuitively quite surprising how cheap (resource-wise) compiling is, and how much can be automatically optimized by just observing the running program. Black magic, but good for us :-)
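One crude way to watch this happen is to time the same method over several batches: early batches typically run slower while the code is interpreted, and later ones speed up once HotSpot has compiled the hot loop. This is only a sketch; the numbers are noisy, for real measurements a benchmark harness such as JMH is the usual tool, and running with the HotSpot flag -XX:+PrintCompilation shows when methods actually get compiled.

```java
/** Crude sketch for observing JIT warm-up: time the same method over several batches. */
public class WarmupDemo {

    /** A small numeric kernel: the sum of (i * i) % 7 over [0, n). */
    static long kernel(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int batch = 0; batch < 10; batch++) {
            long start = System.nanoTime();
            long result = kernel(1_000_000);
            long micros = (System.nanoTime() - start) / 1_000;
            // Printing the result also stops the JIT from discarding the work as dead code.
            System.out.printf("batch %d: %d us (result %d)%n", batch, micros, result);
        }
    }
}
```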
All this talk of JITs is about seven years out of date BTW. The technology concerned now is called HotSpot and it isn't just a JIT.
"why not then compile for x86"
Because then you can't take advantage of the specific features of the particular CPU it runs on. In particular, if we read "compile for x86" as "produce native code that can run on a 386 and its descendants", then the resulting code can't rely on even something as old as the MMX instructions.
As a result, you need to compile for every exact architecture it'll run on (and what about those that don't exist yet?), and have the installer select which executable to put in place. Alternatively, I hear the Intel C++ compiler will produce several versions of the same function, differing only in the CPU features used, and pick the right one at run time based on what the CPU reports as available.
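The "several versions, picked at run time" idea can be mimicked at the Java level. This sketch is only an analogue: the sole "feature" it checks is the processor count reported by Runtime.availableProcessors(), not a CPU instruction-set extension, and the sum-of-squares function is an arbitrary example. The choice is made once, at class initialization, and stored in a static final field.

```java
import java.util.stream.LongStream;

/**
 * Java-level analogue of function multiversioning: two implementations of the same
 * function, with one chosen once at startup based on what the machine reports.
 * Illustrative only; a native compiler or JIT dispatches on instruction-set features instead.
 */
public class RuntimeDispatch {

    interface SumOfSquares {
        long compute(long n);
    }

    /** Plain loop, suitable for a single core. */
    static final SumOfSquares SEQUENTIAL = n -> {
        long sum = 0;
        for (long i = 1; i <= n; i++) {
            sum += i * i;
        }
        return sum;
    };

    /** Parallel stream version, worthwhile only when several cores are available. */
    static final SumOfSquares PARALLEL =
            n -> LongStream.rangeClosed(1, n).parallel().map(i -> i * i).sum();

    /** Resolved once at class initialization, like a dispatcher picking a version at startup. */
    static final SumOfSquares IMPL =
            Runtime.getRuntime().availableProcessors() > 1 ? PARALLEL : SEQUENTIAL;

    public static void main(String[] args) {
        System.out.println("Using the " + (IMPL == PARALLEL ? "parallel" : "sequential") + " version");
        System.out.println(IMPL.compute(1_000_000));
    }
}
```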
On the other hand, you can view bytecode as a "half-compiled" source, similar to the intermediate format a native compiler will (unless asked) not actually write to disk. The runtime environment can then do the final compilation, knowing exactly what architecture will be used. This is the reason given for why some C#/.NET code could slightly outperform C++ code on some CPU-intensive tasks in some benchmarks a while ago.
The "final compilation" of bytecode can also make additional optimization assumptions that are distinctly unsafe from a static compilation perspective, and simply recompile if those assumptions later turn out to be wrong.
I guess because JIT (just in time) compilation is very advanced.
The canonical JVM implementation from Sun applies some pretty sophisticated optimization to bytecode to obtain near-native execution speeds after the code has been run a few times.
The question is, why isn't this compiled code cached to disk for use during subsequent uses of the same function/class?
As it stands, every time a program is executed, the JIT compiler kicks in afresh, rather than using a precompiled version of the code. Wouldn't adding this feature give a significant boost to the program's initial run, when the bytecode is essentially being interpreted?
Without resorting to cut'n'paste of the link that #MYYN posted, I suspect this is because the optimisations that the JVM performs are not static, but rather dynamic, based on the data patterns as well as code patterns. It's likely that these data patterns will change during the application's lifetime, rendering the cached optimisations less than optimal.
So you'd need a mechanism to establish whether the saved optimisations were still optimal, at which point you might as well just re-optimise on the fly.
Oracle's JVM (the Java VM embedded in the Oracle database) is indeed documented to do so. Quoting Oracle:
the compiler can take advantage of Oracle JVM's class resolution model to optionally persist compiled Java methods across database calls, sessions, or instances. Such persistence avoids the overhead of unnecessary recompilations across sessions or instances, when it is known that semantically the Java code has not changed.
I don't know why all sophisticated VM implementations don't offer similar options.
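The key condition in that quote is "when it is known that semantically the Java code has not changed". One simple way to get that guarantee is to key each cached artifact by a hash of the exact bytecode it came from. This is a sketch under loud assumptions: the "compiler" function and the in-memory map are hypothetical stand-ins for a real code generator and an on-disk cache, and it addresses only code changes, not the point made above that profile-driven optimisations may go stale when the data changes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Sketch of a compiled-code cache keyed by a hash of the bytecode, so changed code
 * can never reuse stale native code. The compiler and the map are hypothetical stand-ins.
 */
public class CompiledCodeCache {

    private final Map<String, String> cache = new HashMap<>();
    private final Function<byte[], String> compiler; // hypothetical: bytecode -> "native code"
    int compilations = 0;

    CompiledCodeCache(Function<byte[], String> compiler) {
        this.compiler = compiler;
    }

    /** Hex-encoded SHA-256 digest; SHA-256 is required on every Java SE platform. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Returns cached code if this exact bytecode was compiled before; compiles it otherwise. */
    String getOrCompile(byte[] bytecode) {
        return cache.computeIfAbsent(sha256Hex(bytecode), key -> {
            compilations++;
            return compiler.apply(bytecode);
        });
    }

    public static void main(String[] args) {
        CompiledCodeCache c = new CompiledCodeCache(b -> "native(" + b.length + " bytes)");
        byte[] v1 = "method body v1".getBytes(StandardCharsets.UTF_8);
        byte[] v2 = "method body v2".getBytes(StandardCharsets.UTF_8);
        c.getOrCompile(v1);
        c.getOrCompile(v1); // hit: identical bytecode
        c.getOrCompile(v2); // miss: the code changed
        System.out.println("compilations: " + c.compilations);
    }
}
```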
An update to the existing answers: there is a JEP dedicated to solving this, dating from Java 8 development:
JEP 145: Cache Compiled Code
At a very high level, its stated goal is:
Save and reuse compiled native code from previous runs in order to improve the startup time of large Java applications.
Hope this helps.
Excelsior JET has a caching JIT compiler since version 2.0, released back in 2001. Moreover, its AOT compiler may recompile the cache into a single DLL/shared object using all optimizations.
I do not know the actual reasons, not being in any way involved in the JVM implementation, but I can think of some plausible ones:
The idea of Java is to be a write-once-run-anywhere language, and putting precompiled stuff into the class file is kind of violating that (only "kind of" because of course the actual byte code would still be there)
It would increase the class file sizes because you would have the same code there multiple times, especially if you happen to run the same program under multiple different JVMs (which is not really uncommon, when you consider different versions to be different JVMs, which you really have to do)
The class files themselves might not be writable (though it would be pretty easy to check for that)
The JVM optimizations are partially based on run-time information and on other runs they might not be as applicable (though they should still provide some benefit)
But I really am guessing, and as you can see, I don't really think any of my reasons are actual show-stoppers. I figure Sun just doesn't consider this support a priority, and maybe my first reason is close to the truth, as doing this habitually might also lead people to think that Java class files really need a separate version for each VM instead of being cross-platform.
My preferred way would actually be to have a separate bytecode-to-native translator that you could use to do something like this explicitly beforehand, creating class files that are explicitly built for a specific VM, with possibly the original bytecode in them so that you can run with different VMs too. But that probably comes from my experience: I've been mostly doing Java ME, where it really hurts that the Java compiler isn't smarter about compilation.
It seems like anything you can do with bytecode you can do just as easily and much faster in native code. In theory, you could even retain platform and language independence by distributing programs and libraries in bytecode then compiling to native code at installation, rather than JITing it.
So in general, when would you want to execute bytecode instead of native?
Hank Shiffman from SGI said (a long time ago, but it's still true):
There are three advantages of Java using byte code instead of going to the native code of the system:
Portability: Each kind of computer has its unique instruction set. While some processors include the instructions for their predecessors, it's generally true that a program that runs on one kind of computer won't run on any other. Add in the services provided by the operating system, which each system describes in its own unique way, and you have a compatibility problem. In general, you can't write and compile a program for one kind of system and run it on any other without a lot of work. Java gets around this limitation by inserting its virtual machine between the application and the real environment (computer + operating system). If an application is compiled to Java byte code and that byte code is interpreted the same way in every environment then you can write a single program which will work on all the different platforms where Java is supported. (That's the theory, anyway. In practice there are always small incompatibilities lying in wait for the programmer.)
Security: One of Java's virtues is its integration into the Web. Load a web page that uses Java into your browser and the Java code is automatically downloaded and executed. But what if the code destroys files, whether through malice or sloppiness on the programmer's part? Java prevents downloaded applets from doing anything destructive by disallowing potentially dangerous operations. Before it allows the code to run it examines it for attempts to bypass security. It verifies that data is used consistently: code that manipulates a data item as an integer at one stage and then tries to use it as a pointer later will be caught and prevented from executing. (The Java language doesn't allow pointer arithmetic, so you can't write Java code to do what we just described. However, there is nothing to prevent someone from writing destructive byte code themselves using a hexadecimal editor or even building a Java byte code assembler.) It generally isn't possible to analyze a program's machine code before execution and determine whether it does anything bad. Tricks like writing self-modifying code mean that the evil operations may not even exist until later. But Java byte code was designed for this kind of validation: it doesn't have the instructions a malicious programmer would use to hide their assault.
Size: In the microprocessor world RISC is generally preferable over CISC. It's better to have a small instruction set and use many fast instructions to do a job than to have many complex operations implemented as single instructions. RISC designs require fewer gates on the chip to implement their instructions, allowing for more room for pipelines and other techniques to make each instruction faster. In an interpreter, however, none of this matters. If you want to implement a single instruction for the switch statement with a variable length depending on the number of case clauses, there's no reason not to do so. In fact, a complex instruction set is an advantage for a web-based language: it means that the same program will be smaller (fewer instructions of greater complexity), which means less time to transfer across our speed-limited network.
So when considering byte code vs native, consider which trade-offs you want to make between portability, security, size, and execution speed. If speed is the only important factor, go native. If any of the others are more important, go with bytecode.
I'll also add that maintaining a series of OS and architecture-targeted compilations of the same code base for every release can become very tedious. It's a huge win to use the same Java bytecode on multiple platforms and have it "just work."
The performance of essentially any program will improve if it is compiled, executed with profiling, and the results fed back into the compiler for a second pass. The code paths which are actually used will be more aggressively optimized, loops unrolled to exactly the right degree, and the hot instruction paths arranged to maximize instruction-cache (I$) hits.
All good stuff, yet it is almost never done because it is annoying to go through so many steps to build a binary.
This is the advantage of running the bytecode for a while before compiling it to native code: profiling information is automatically available. The result after Just-In-Time compilation is highly optimized native code for the specific data the program is processing.
Being able to run the bytecode also enables more aggressive native optimization than a static compiler could safely use. For example, if one of the arguments to a function is observed to always be NULL, all handling for that argument can simply be omitted from the native code. There will be a brief validity check of the arguments in the function prologue; if that argument is not NULL, the VM aborts back to the bytecode and starts profiling again.
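The NULL-argument example can be modelled in plain Java to make the control flow concrete. This is a toy simulation, not how HotSpot is implemented: a real JIT does this in generated machine code. The profiling threshold is a hypothetical number chosen for the sketch. The object profiles calls on a general path, switches to a specialized path guarded by a cheap null check once the argument has always been null, and "deoptimizes" back to profiling when the guard fails.

```java
/**
 * Toy model of speculative optimization: profile, specialize behind a guard,
 * and fall back ("deoptimize") when the speculated condition stops holding.
 */
public class SpeculationModel {

    static final int PROFILE_THRESHOLD = 1_000; // hypothetical: null-only calls before specializing

    private int nullStreak = 0;          // consecutive profiled calls where suffix was null
    private boolean specialized = false; // true while the "optimized code" is installed
    int deoptimizations = 0;

    /** General version: handles both a null and a non-null suffix. */
    private static String general(String base, String suffix) {
        return suffix == null ? base : base + suffix;
    }

    /** Specialized version: assumes suffix == null, so all suffix handling is gone. */
    private static String specializedForNull(String base) {
        return base;
    }

    String call(String base, String suffix) {
        if (specialized) {
            if (suffix == null) {        // the guard in the "prologue"
                return specializedForNull(base);
            }
            specialized = false;         // assumption broken: deoptimize...
            nullStreak = 0;              // ...and start profiling again
            deoptimizations++;
        }
        // "Interpreted" path: run the general code and collect profile data.
        nullStreak = (suffix == null) ? nullStreak + 1 : 0;
        if (nullStreak >= PROFILE_THRESHOLD) {
            specialized = true;
        }
        return general(base, suffix);
    }

    boolean isSpecialized() {
        return specialized;
    }

    public static void main(String[] args) {
        SpeculationModel m = new SpeculationModel();
        for (int i = 0; i < 2_000; i++) {
            m.call("x", null);
        }
        System.out.println("specialized after null-only calls: " + m.isSpecialized());
        System.out.println(m.call("x", "y")); // guard fails, falls back to the general path
        System.out.println("specialized now: " + m.isSpecialized() + ", deopts: " + m.deoptimizations);
    }
}
```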
Bytecode creates an extra level of indirection.
The advantages of this extra level of indirection are:
Platform independence
You can create any number of programming languages (syntaxes) and have them compile down to the same bytecode.
Cross-language converters become easy to create.
x86, x64, and IA64 no longer need to be compiled as separate binaries. Only the proper virtual machine needs to be installed.
Each OS simply needs to create a virtual machine and it will have support for the same program.
Just in time compilation allows you to update a program just by replacing a single patched source file. (Very beneficial for web pages)
Some of the disadvantages:
Performance
Easier to decompile
All good answers, but my hot-button has been hit - performance.
If the code being run spends all its time calling library/system routines - file operations, database operations, sending windows messages, then it doesn't matter very much if it's JITted, because most of the clock time is spent waiting for those lower-level operations to complete.
However, if the code contains things we usually call "algorithms", that have to be fast and don't spend much time calling functions, and if those are used often enough to be a performance problem, then JIT is very important.
I think you just answered your own question: platform independence. Platform-independent bytecode is produced and distributed to its target platform. When executed it's quickly compiled to native code either before execution begins, or simultaneously (Just In Time). The Java JVM and presumably the .NET runtimes operate on this principle.
Here: http://slashdot.org/developers/02/01/31/013247.shtml
Go see what the geeks of Slashdot have to say about it! A little dated, but the comments are very good!
Ideally you would have portable bytecode that compiles Just In Time to native code. I think the reason bytecode interpreters exist without JIT is due primarily to the practical fact that native code compilation adds complexity to a virtual machine. It takes time to build, debug, and maintain that additional component. Not everyone has the time or resources to make that commitment.
A secondary factor is safety. It's much easier to verify an interpreter won't crash than to guarantee the same for native code.
Third is performance. It can often take more time to generate machine code than to interpret bytecode for small pieces of code that only run once.
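To make the "interpreter without a JIT" option concrete, here is a tiny stack-based bytecode interpreter. The opcodes are invented for this sketch and are not JVM bytecodes. It shows both points above: the same int[] program runs unchanged wherever the interpreter runs (the extra level of indirection), and the interpreter itself is just a loop and a switch, far simpler to build and verify than a native code generator.

```java
/**
 * A tiny stack-based bytecode interpreter. The opcodes are hypothetical, invented for
 * this sketch; the point is that the program is data executed by a portable loop.
 */
public class TinyVm {

    static final int PUSH = 0; // PUSH <value>: push a constant
    static final int ADD = 1;  // pop b, pop a, push a + b
    static final int MUL = 2;  // pop b, pop a, push a * b
    static final int HALT = 3; // stop and return the top of the stack

    static int run(int[] code) {
        int[] stack = new int[64];
        int sp = 0; // next free stack slot
        int pc = 0; // program counter
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH:
                    stack[sp++] = code[pc++];
                    break;
                case ADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case MUL: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case HALT:
                    return stack[sp - 1];
                default:
                    throw new IllegalStateException("bad opcode " + op + " at " + (pc - 1));
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 4 encoded as bytecode
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // prints 20
    }
}
```

A JIT would translate a hot program like this into native instructions instead of dispatching through the switch on every step; for code that runs once, the plain loop is usually cheaper than generating machine code first.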
Portability and platform independence are probably the most notable advantages of bytecode over native code.