Java, JIT and Garbage Collector efficiency - java

I want to know about the efficiency of Java, and the advantages and disadvantages of the Java Virtual Machine and Android.
By efficiency I mean low memory use, low processor use and fast execution.
Mobile devices are less powerful than PCs, so their apps need to be more efficient. Servers receive many connections, so they also need to be very efficient. Many mobile devices use Android and Java apps, and many servers use PHP.
Can Java and interpreted languages, such as JavaScript, Python and PHP, be more efficient than C and C++?
JIT (just in time) advantages:
It can optimize better, because it knows the values of some variables and where they are used or changed.
It knows the processor and can optimize with processor-specific instructions.
It is easier to inline functions.
It can remove conditional tests whose outcome is known and remove blocks that will never run.
Java disadvantages:
When the app runs for the first time, it will be very slow, because the bytecode is interpreted and the JIT compiler performs many analyses to find good optimizations. The apps cannot use the maximum of the hardware power. Suppose a game or real-time app is tuned so that its first run succeeds with no delay while using the maximum of the hardware power. On later runs it will no longer use the maximum, because of the optimizations. Conversely, the app cannot be designed to use the maximum of the hardware power after optimization, because it would then be too slow on the first run and would not keep running.
Java checks that array indexes are within bounds and that pointers are not null. This adds several internal "if"s to the generated code.
All objects are managed by the garbage collector, including objects that would be very easy to delete manually.
All instances of objects are created with dynamic memory allocation, including objects that could easily live on the stack. If each loop iteration begins by creating an instance of a class and ends by deleting it, dynamic memory allocation will be inefficient.
The garbage collector needs to stop the app while it cleans up memory, which is very undesirable for games, GUI apps and real-time apps. Reference counting is slow and cannot handle circular references. A multi-threaded garbage collector is slower and uses more CPU.

Can Java and interpreted languages, such as Java Script, Python and PHP, be more efficient than C and C++?
It's very difficult to be more efficient than the best C and C++ programs. There are a lot of C and C++ programs that are nowhere near that efficient, though, and beating them with (modern) Java code is quite practical if you're any good.
I've also heard good things about the current best-of-breed JavaScript engines, but I've never studied them in detail.
With Python and PHP (and many other languages besides) it's a bit different. The standard implementations of those languages are written in C, so they cannot be more efficient than C (this follows by construction). Yet it's much easier to write efficient code in them (i.e., code that uses what is in effect a very well-written C library) than it is to start from scratch. In particular, it reduces the number of defects per program. That's a very important metric in practice; anyone can produce fast code if it's allowed to be wrong.
In general, I advise not worrying about getting maximal efficiency. You run up against the law of diminishing returns. Instead, use sensible overall algorithms (or, as a friend of mine once said to me, “look after the big O()s and let the constant factors look after themselves”) and focus on the question of whether the program is good enough in practice. Once it is, stop fiddling around and ship it!
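To make the "look after the big O()s" advice concrete, here is a minimal sketch in Java. The class and method names (DuplicateCheck, hasDuplicateQuadratic, hasDuplicateLinear) are made up for illustration. Both methods answer the same question, but the second does it in one pass, and for large inputs that difference usually dwarfs any constant-factor gap between languages.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical example class, not taken from any library.
final class DuplicateCheck {

    // O(n^2): compares every pair of elements.
    static boolean hasDuplicateQuadratic(int[] values) {
        for (int i = 0; i < values.length; i++) {
            for (int j = i + 1; j < values.length; j++) {
                if (values[i] == values[j]) {
                    return true;
                }
            }
        }
        return false;
    }

    // O(n) expected time: a single pass that remembers what it has seen.
    static boolean hasDuplicateLinear(int[] values) {
        Set<Integer> seen = new HashSet<>();
        for (int v : values) {
            // add() returns false when v was already in the set
            if (!seen.add(v)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5};
        System.out.println(hasDuplicateQuadratic(data));
        System.out.println(hasDuplicateLinear(data));
    }
}
```

The linear version trades some memory (the set) for time, which is the kind of decision that matters far more than the choice between Java and C++.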

Let's pick apart your claimed disadvantages:
When the app runs for the first time, it will be very slow, because the bytecode is interpreted and the JIT compiler performs many analyses to find good optimizations. The apps cannot use the maximum of the hardware power.
JIT compilation is an implementation issue. Not all platforms do it. Indeed, the Android platform could be modified to 1) do ahead of time compilation, or 2) cache the native code produced by the JIT to give faster startup next time you run the app.
It is interesting that various Java vendors have tried these strategies at various times, and yet the empirical evidence is that plain JIT is the best strategy.
Java checks that array indexes are within bounds and that pointers are not null. This adds several internal "if"s to the generated code.
The JIT compiler can optimize away many of these tests. For the rest, the overheads tend to be relatively small; e.g. a few percent difference ... not a factor of 2.
Note that the alternative to checking is the risk that typical application bugs will crash the Android platform. Certainly, garbage collection becomes problematic if applications can trash memory.
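As a rough illustration of which checks a JIT can and cannot remove, consider the sketch below. The class and method names are hypothetical, and whether a given JVM actually eliminates the check is an implementation detail I am assuming rather than guaranteeing. In sum() the loop condition already proves the index is in range, so the per-access test is redundant. In sumIndirect() the index comes from data, so the test generally has to stay.

```java
// Hypothetical example class, for illustration only.
final class BoundsCheckDemo {

    // The loop condition guarantees 0 <= i < values.length, so a JIT
    // can typically prove every access is in range and drop the check.
    static long sum(int[] values) {
        long total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // Here each index is read from another array, so the runtime cannot
    // know in advance that it is valid and the check remains.
    static long sumIndirect(int[] values, int[] indices) {
        long total = 0;
        for (int idx : indices) {
            // throws ArrayIndexOutOfBoundsException if idx is invalid
            total += values[idx];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] values = {10, 20, 30};
        System.out.println(sum(values));
        System.out.println(sumIndirect(values, new int[] {2, 0}));
    }
}
```

The second method also shows what the check buys you: a bad index becomes a catchable exception instead of a silent read of unrelated memory.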
All objects are managed by the garbage collector, including objects that would be very easy to delete manually.
The flip-side is that it is easy to forget to delete objects, delete objects twice, use them after they have been deleted and so on. These mistakes all lead to bugs that tend to be hard to track down.
All instances of objects are created with dynamic memory allocation, including objects that could easily live on the stack. If each loop iteration begins by creating an instance of a class and ends by deleting it, dynamic memory allocation will be inefficient.
Java dynamic memory allocation and object creation is FAST. Faster than in C++ for example.
The garbage collector needs to stop the app while it cleans up memory, which is very undesirable for games, GUI apps and real-time apps.
Use a concurrent / low-pause garbage collector then. Another approach is to implement your app to not generate lots of garbage ... and seldom trigger garbage collection.
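One way to read "not generate lots of garbage" is to reuse objects across iterations instead of allocating fresh ones. A minimal sketch, with hypothetical class and method names, and with the caveat that a modern JIT may already optimize the first version in some cases:

```java
// Hypothetical example class, for illustration only.
final class LowGarbageDemo {

    // Allocates a new StringBuilder (and its backing array) per iteration.
    static int lengthAllocating(int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            StringBuilder sb = new StringBuilder();
            sb.append("item-").append(i);
            total += sb.length();
        }
        return total;
    }

    // Reuses one builder; setLength(0) empties it but keeps the buffer,
    // so the loop body creates no new objects once the buffer is big enough.
    static int lengthReusing(int n) {
        int total = 0;
        StringBuilder sb = new StringBuilder(32);
        for (int i = 0; i < n; i++) {
            sb.setLength(0);
            sb.append("item-").append(i);
            total += sb.length();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(lengthAllocating(1000) == lengthReusing(1000));
    }
}
```

The same idea applies to buffers, arrays and pooled objects in a game loop: allocate during setup, reuse during play.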
Reference counting is slow and it cannot handle circular references.
No decent Java GC uses reference counting. (On the other hand, a lot of C / C++ manual memory management schemes do. For instance, so-called smart pointer schemes in C++.)
A multi-threaded garbage collector is slower and uses more CPU.
You actually mean concurrent collection I think. Yes it does, but that's the penalty you pay for the extra responsiveness that you demand for interactive games / realtime apps.

What you describe as 'efficient' I would describe as 'ideal'. Put another way, an application that requires little memory and little CPU time and runs quickly is one that is good, fast, and cheap all at the same time. Never mind if it does anything useful or interesting.
The only comparison I'd view as reasonable, if all three goals are required, is among applications that produce a common result. In that case, it is unlikely, given a competing group of evenly-capable programmers, that any one implementation would excel on all three counts over the others.
That said, your question leaves out a key criterion for the mobile market: rate of application development. Mobile applications also profit far more from positive user experience than from back-end optimization. Without that constraint, the question of efficiency as you put it seems to me more of a ponderous consideration than a practical one.
But to the actual question: can a language like Java produce more efficient code than one that compiles statically to the instruction set of the target machine? Probably not. Can it be as efficient, or efficient enough? Absolutely. If we considered an execution platform with fixed, severely constrained resources that changes infrequently, it would be a different matter.

In any language, the way to get fast execution is to do the job with as little execution as possible, and as little garbage collection as possible.
That sounds like a vacuous generality, but what it means in practice, regardless of language, is:
For the data structure design, keep it as simple as possible. Stay away from the fancy collection classes full of bells and whistles. Especially stay away from notifications as a way of keeping data consistent. If your data is normalized, it can never be inconsistent. If you can't normalize it, it's better to tolerate temporary inconsistency, than to try to keep it tight with notifications.
Performance problems creep in, even into the best code. You should try not to make them, but you will still make them. Most important is knowing how to find them, once made, and remove them. Here's a blow-by-blow example. If in doing this, you find you need a better big-O algorithm, then put it in. Putting one in without being sure it's needed is a recipe for slowness.
No language can rescue a program from non-removed performance problems. The language and its compiler, JITter, etc. are like a race horse. It's fine to want a good horse, but it's a waste if the jockey isn't as slim as possible.
Your program is the jockey, and it's your job to take it on a weight-loss program.

I will paste an interesting answer given by James Gosling himself in the book Masterminds of Programming.
Well, I’ve heard it said that effectively you have two compilers in the Java world. You have the compiler to Java bytecode, and then you have your JIT, which basically recompiles everything specifically again. All of your scary optimizations are in the JIT.
James: Exactly. These days we’re beating the really good C and C++ compilers pretty much always. When you go to the dynamic compiler, you get two advantages when the compiler’s running right at the last moment. One is you know exactly what chipset you’re running on. So many times when people are compiling a piece of C code, they have to compile it to run on kind of the generic x86 architecture. Almost none of the binaries you get are particularly well tuned for any of them. You download the latest copy of Mozilla, and it’ll run on pretty much any Intel architecture CPU. There’s pretty much one Linux binary. It’s pretty generic, and it’s compiled with GCC, which is not a very good C compiler.
When HotSpot runs, it knows exactly what chipset you’re running on. It knows exactly how the cache works. It knows exactly how the memory hierarchy works. It knows exactly how all the pipeline interlocks work in the CPU. It knows what instruction set extensions this chip has got. It optimizes for precisely what machine you’re on. Then the other half of it is that it actually sees the application as it’s running. It’s able to have statistics that know which things are important. It’s able to inline things that a C compiler could never do. The kind of stuff that gets inlined in the Java world is pretty amazing. Then you tack onto that the way the storage management works with the modern garbage collectors. With a modern garbage collector, storage allocation is extremely fast.

Related

Measuring overhead of calling through JNI

To see if I can really get any benefit from native code (written in C) by using JNI (instead of writing a complete Java application), I want to measure the overhead of calling through JNI. What is the best way to measure this overhead?
I wouldn't use a profiler to do quantitative performance testing. Profiling tends to introduce distortions into the actual timing numbers.
I'd create a benchmark that performed one of the actual calculations that you are considering doing in C and compare the C + JNI + Java version against a pure Java version. Be sure that you are comparing apples and apples; i.e. profile and optimize both versions before you benchmark them.
To do the actual benchmarking, I'd construct a loop that performed the calculation a large number of times, record the timings over a large number of iterations and compare. Make sure that you take account of JVM warmup effects; e.g. class loading, JIT compilation and heap warmup.
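A bare-bones version of the loop described above might look like the following. This is only a sketch: workload() is a made-up stand-in for whatever calculation you are considering moving to C, and the warmup and iteration counts are arbitrary. A real comparison would call the JNI version from the same harness.

```java
// Hypothetical benchmark harness, for illustration only.
final class LoopBenchmark {

    // Written to after timing so the JIT cannot discard the work as dead code.
    static volatile long sink;

    // Stand-in for the real calculation under test (an assumption).
    static long workload(int n) {
        long acc = 0;
        for (int i = 1; i <= n; i++) {
            acc += (long) i * i % 7;
        }
        return acc;
    }

    // Runs `warmup` untimed calls (class loading, JIT compilation),
    // then returns the average nanoseconds per call over `iterations` calls.
    static double averageNanos(int warmup, int iterations, int n) {
        long acc = 0;
        for (int i = 0; i < warmup; i++) {
            acc += workload(n);
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            acc += workload(n);
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        System.out.printf("avg ns/call: %.1f%n", averageNanos(10_000, 10_000, 1_000));
    }
}
```

Hand-rolled loops like this are easy to get subtly wrong, so treat the numbers as a rough guide and repeat the run several times before drawing conclusions.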
Like Thihara, I doubt that using C + JNI will help much. And even if it does, you need to take account of the downsides of JNI; e.g. C code portability, platform specific build issues ... and possible JVM hard crashes if your native code has bugs.
Measuring the overhead alone may give you strange results. I'd code a small part of the performance-critical code in both Java and C++ and measure the program performance, e.g., using caliper (microbenchmarking is quite a complicated thing and hardly anybody gets it right).
I would not use any profiler, especially C++ profiler, since the performance of the C++ part alone doesn't matter and since profilers may distort the results.
Use a C++ profiler and a Java profiler. Java profilers are available in IDEs; I can only assume the same is true for C++. And whatever test you design, please run it through a substantial number of loops to minimize environmental errors.
Oh, and do post the results back, since I'm also curious to see if there is any improvement from using native code over modern JVMs. Chances are, though, that you won't see a huge performance improvement from native code.

static java bytecode optimizer (like proguard) with escape analysis?

Optimizations based on escape analysis are a planned feature for ProGuard. In the meantime, are there any existing tools like ProGuard that already do optimizations which require escape analysis?
Yes, I think the Soot framework performs escape analysis.
What do you expect from escape analysis at the compiler level? Java classes are more like object files in C: they are linked in the JVM, hence escape analysis can be performed only at the single-method level, which is of limited use and will hamper debugging (e.g. you will have lines of code through which you cannot step).
In Java's design, the compiler is quite dumb: it checks for correctness (like Lint), but doesn't try to optimize. The smart pieces are put in the JVM, which uses multiple optimization techniques to yield well-performing code on the current platform, under the current conditions. Since the JVM knows all the code that is currently loaded, it can assume a lot more than the compiler and perform speculative optimizations which are reverted the moment the assumptions are invalidated. The HotSpot JVM can replace code with a more optimized version on the fly while the function is running (e.g. in the middle of a loop as the code gets 'hotter').
When not running under a debugger, variables with non-overlapping lifetimes are collapsed, invariants are hoisted out of loops, loops are unrolled, etc. All this happens in the JIT-ted code and depends on how much time is spent in the function (it does not make much sense to spend time optimizing code that never runs). If we perform some of these optimizations upfront, the JIT will have less freedom and the overall result might be a net negative.
Another optimization is stack allocation of objects that do not escape the current method - this is done in certain cases, though I read a paper somewhere that the time to perform rigorous escape analysis vs the time gained by optimizations suggests that it's not worth it, so the current strategy is more heuristic.
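To show what "does not escape the current method" means, here is a small sketch with hypothetical names. In sumOfSquares() each Point is created, read and dropped inside the loop, so it is a candidate for this optimization (as far as I know, HotSpot typically replaces such an object with its fields rather than literally placing it on the stack). In collect() the objects are handed back to the caller, so they escape and must live on the heap.

```java
// Hypothetical example class, for illustration only.
final class EscapeDemo {

    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // p is never stored or returned, so it does not escape this method.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            total += (long) p.x * p.x + (long) p.y * p.y;
        }
        return total;
    }

    // Each Point is stored in an array that is returned to the caller,
    // so it escapes and cannot be optimized away.
    static Point[] collect(int n) {
        Point[] points = new Point[n];
        for (int i = 0; i < n; i++) {
            points[i] = new Point(i, i + 1);
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000));
        System.out.println(collect(3).length);
    }
}
```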
Overall, the more information the JVM has about your original code, the better it can optimize it. And the optimizations the JVM does are constantly improving, hence I would think about compiled-code optimizations only for very restricted and basic JVMs like those on mobile phones. In these cases you want to run your application through an obfuscator anyway (to shorten class names, etc.).

why c is preferred instead of java in most of the real time application [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
Why is C preferred over Java in most real-time applications? For example, airline systems. I want some reasons other than that Java is a little slow.
There are a number of reasons:
History - the airline system is older than Java. It might need a rewrite, but it's not in progress today that I know of.
Real time generally steers clear of garbage collection, because you can't have the system waiting at a delicate juncture for the GC thread to finish its work. Things have to be more deterministic in real time control situations. This would be true of Java, C#, and any other language that uses GC.
There is a real time version of Java, but I don't know how widely it's used.
I'm not sure that the conclusion of C/C++ always being faster than Java is still true for JDK 6. A lot has changed since the 1.0 version when a lot of the benchmarks were performed (e.g., faster object creation, new memory model, new generational GC algorithms, revised reflection, etc.).
It's not about slowness at all here. It's about determinism and safety. Java, while normally fast, can stall unpredictably when the garbage collector decides to run and halts execution for a moment. If the garbage collector ran while the landing gear was being extended just before touching down at 150mph, bad things could happen.
C with explicit memory allocation is much more deterministic: the programmer has full control over exactly when and how much memory is allocated or freed, and can decide not to free memory while performing a safety-critical operation.
Currently, new critical systems are not even written in C, but in safer languages like Ada, which is just as deterministic as C and (from a type-safety point of view) much safer even than Java.
One of the biggest reasons is that C does not have garbage collection. Garbage collection is great because you don't have to worry about deallocating memory, and memory leaks are less of a problem, but it may take 100 ms or more to do the garbage collection. In many applications this time is insignificant, but it means that your app can't do anything else during that 100 ms. In certain real-time applications this is unacceptable. For more information read this:
http://java.sun.com/javase/technologies/realtime/faq.jsp#2
There is really not much stopping you from using Java for real-time systems. There are two different approaches:
The RT Java from Sun (JSR-1), which basically gives you a new set of APIs to interact with an RTOS and to do memory management yourself.
Close-to-realtime JVMs like JRockit Realtime, with a real-time GC that takes away the unpredictable nature of a GC, where you might have a 1ms pause at one time but a 10ms pause the next time.
Contrary to what many people think, speed is really not what real-time is about. You want a system that is predictable enough that you know exactly when things happen (to the extent possible) and can trust that your process takes the same amount of time to complete every time you run it. One thing that prevents this with the toolset provided with ordinary Java is the built-in set of collections. Javolution is one approach to solving that (and as far as I can remember it is actually used for airline systems, on the ground).
At the end of the day, however, there is no such thing as a tool that fits everyone. I wouldn't use Java for the actual logic of an embedded autopilot for a lot of reasons, but I don't think the real-time aspects are the primary reasons for that feeling.
There are many different flavors of "Real Time".
For "semi-hard realtime" one can use C to write a kernel module for RTLinux:
http://en.wikipedia.org/wiki/RTLinux
Edited:
You simply cannot do so (write a kernel module) with Java. Also, the kernel module is super-paranoid: you cannot talk to the hard disk in the realtime space.
C works closer to the machine's hardware. Java runs on the JVM.
Moreover, C is compiled ahead of time directly to native machine code, whereas Java files are first compiled into bytecode files (.class), which are then executed by the JVM.
In my opinion, complexity of coding and performance are the trade-offs between the two languages.
Airline systems are written in COBOL, not C.

Alternative to Java

I need an alternative to Java, because I am working on a genetics-calculation project.
It takes a lot of memory and most of the CPU time. Therefore it won't work when I deploy it on a server, because many people use the program at the same time.
Does anybody know another language that is not running in a virtual machine and is similar to Java (object-oriented, using exceptions and type-safety)?
Best regards,
Jonathan
To answer the direct question: there are dozens of languages that fit your explicit requirements. AmmoQ listed a few; Wikipedia has many more.
And I think that you'll be disappointed with every one of them.
Despite what Java haters want you to think, Java's performance is not much different than any other compiled language. Just changing languages won't improve performance much.
You'll probably do better by getting a profiler, and looking at the algorithms that you used.
Good luck!
If your app is consuming most of the CPU and memory on a single-user workstation, I'm skeptical that translating it into some non-VM language is going to help much. With Java, you're depending on the VM for things like memory management; you're going to have to re-implement their equivalents in your non-VM language. Also, Java's memory management is pretty good. Your application probably isn't real-time sensitive, so having it pause once in a while isn't a problem. Besides, you're going to be running this on a multi-user system anyway, right?
Memory usage will have more to do with your underlying data structures and algorithms than with something magical about the language. Unless you've got a really great memory allocator library for your chosen language, you may find you use just as much memory (if not more) due to bugs in your program.
Since your app is compute-intensive, some other language is unlikely to make it less so, unless you insert some strategic sleep() calls throughout the code to deliberately make it yield the CPU more often. This will slow it down, but will be nicer to the other users.
Try running your app with Java's -server option. That will engage a VM designed for long-running programs and includes a JIT that will compile your Java into native code. It may make your program run a bit faster, but it will still be CPU and memory bound.
If you don't like C++, you might consider D, Objective-C or the new Go language from Google.
You may try C++, it satisfies all your requirements.
Use Python along with the numpy, scipy and matplotlib packages. numpy is a Python package which has all the number-crunching code implemented in C. Hence runtime performance (a concern because of the Python virtual machine) won't be an issue.
If you want compiled, statically typed language only, have a look at Haskell.
Can your algorithms be parallelised?
No matter what language you use you may come up against limitations at some point if you use a single process. Using something like Hadoop will mean you can retain Java and ease of use but you can run in parallel across many machines.
On the same theme as Barry Brown's answer:
If your application is compute / memory intensive in Java, it will probably be compute / memory intensive in C++ or any other "more efficient" language. You might get some extra leeway ... but you'll soon run into the same performance wall.
IMO, you need to do the following things:
You need to profile your application, and look for any major performance bottlenecks. You might find some real surprises.
In the light of the previous step, review the design and algorithms, paying attention to space and time complexity issues. Do some research to see if someone has discovered better algorithms for doing the computations that are problematic from a performance perspective.
If the previous steps don't get you ahead of the curve, see if you can upgrade your platform; get a bigger machine with more processors, more memory, etc.
If you are still stuck, your only other option is a scale-out design. Assuming that individual user requests are processed in a single thread, re-architect your system so that you can run "workers" across multiple servers, with a load balancer on the front. If you have a persistent back-end, look into how you can replicate that. And so on.
Figure out if the key algorithms can be parallelized / distributed so that the resource intensive parts of a user request execute in parallel on multiple processors / multiple servers; e.g. using a "map-reduce" framework.
OK, so there is no easy answer. But simply changing programming languages is NOT a good answer.
Regardless of language your program will need to share with others when running in multiple instances on a single machine. That is simply the way computers work.
The best way to allow your current program to scale to use the available hardware resources is to chop your work into small, independent pieces, and make them implement the Callable interface. These can then be executed by a suitable Executor, which can be chosen according to the available hardware. See the Executors class for many preconfigured versions. This is what I would recommend you do here.
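A minimal sketch of the Callable/Executor approach, using only java.util.concurrent. The class name ChunkedSum, the summing workload and the chunk size are stand-ins I made up; in the real program each Callable would wrap one independent piece of the genetics calculation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class, for illustration only.
final class ChunkedSum {

    // Splits data into chunks of chunkSize (must be > 0), sums each chunk
    // in its own Callable, and combines the partial results.
    static long parallelSum(int[] data, int chunkSize) {
        List<Callable<Long>> tasks = new ArrayList<>();
        for (int start = 0; start < data.length; start += chunkSize) {
            final int from = start;
            final int to = Math.min(start + chunkSize, data.length);
            tasks.add(() -> {
                long partial = 0;
                for (int i = from; i < to; i++) {
                    partial += data[i];
                }
                return partial;
            });
        }

        // One worker thread per available processor; other Executors
        // factory methods could be swapped in to suit the hardware.
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            long total = 0;
            // invokeAll blocks until every task has completed
            for (Future<Long> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while summing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a chunk failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 64));
    }
}
```

Because the tasks share no mutable state, the same list of Callables could later be submitted to a different Executor without touching the calculation itself.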
If you want to switch language, then Mac OS X 10.6 allows for programming in the way described above with C and Objective-C, and if you do it properly OS X can distribute the code over all available computing resources (both CPU and GPU and what have you).
If none of the above is interesting to you, then consider one of the Grid frameworks. Terracotta may be a good place to start.
F#, Ruby, or Python: they are very good for calculations, and many other things.
NASA uses Python.
Well.. I think you are looking for C#.
C# is object-oriented and has excellent support for generics. You can use it to write both WinForms and server-side applications.
You can read more about C# generics here: http://msdn.microsoft.com/en-us/library/ms379564(VS.80).aspx
Edit:
My mistake, geneTIcs, not geneRIcs. It does not change the fact C# will do the job, and using generics will reduce load significantly.
You might find the computer language shootout here interesting.
For example, here's Java vs C++.
You might find OCaml (from which F# is derived) worth a look. It meets your requirements for OO, exceptions and static types, and it has a native compiler. However, according to the shootout you may be trading less memory for lower speed.

Performance of Java 1.6 vs C++?

With Java 1.6 out, can we say that the performance of Java 1.6 is almost equivalent to that of C++ code, or is there still a lot to improve on the performance front in Java compared to C++?
Thanks.
Debian likes to conduct benchmarks on this sort of thing. In their case, it appears that Java is about half as fast and consumes 2-18 times as much memory as C++.
A well-written Java program is never going to be as fast as a well-written C or C++ program. The virtual machine is an irreducible overhead. However, most code is not well written.
Java is a simpler language than C++, and offers a more forgiving environment for inexperienced programmers - so if your programmers are inexperienced (and cheap), then Java will probably perform 'better' than C++.
shared_ptrs offer a similarly forgiving environment in C++, so they are very tempting for inexperienced programmers, or those migrating from Java, but their performance overhead is as bad as or worse than that of Java's garbage collection. I've seen large C++ programs where every variable is a shared_ptr, and they perform terribly.
My opinion
Personally, I think that large projects need to choose an 'easy' programming language for the bulk of their code, and a 'fast' one for sections that need optimising. Java may be a good choice for the 'easy' language, especially since there is currently a plentiful supply of Java programmers - in the future, I think even easier languages such as Python will begin to take over.
C++ is a reasonable choice for a 'fast' language if you already know it, but I think its over-complexity will eventually see it fall by the wayside, while C will continue to fulfill this role.
I would expect that most of the time for most applications C++ will be faster than Java.
In some cases there will be some C++ which is slower than Java for a given task. This is pretty rare and always a result of poor implementation or, more commonly, poor refactoring of an application.
In the majority of cases the performance difference is more than offset by the flexibility, ease of use, availability of libraries, and portability that Java provides.
In a very few cases performance is so critical that developing in Java would be a poor choice <opinion><flame off>in these cases plain C is usually a better choice than C++ </flame></opinion>.
Currently the sweet spot in performance/ease of use/ease of development tradeoffs is C#. Portability is a big issue here though.
I find that Java performs very well.
However, why has no one ever fixed my biggest complaint?
Java uses FIVE TIMES as much memory as a C++ program doing the same job. At least!
And once it's used, Java keeps it!
Please, please, why won't anyone write a garbage collector for Java that uses minimum amounts of RAM? It could compact the heap and return the memory to the OS. Instead of ridiculous piles of -Xm* options, use the memory needed and then give it back!
Actually I am sure some of the embedded system JVMs do this, but none of the desktop or server systems do.
This memory piggishness makes Java applications all want to act as if they own the entire computer system, that no one ever wants to run more than one application and that RAM is free and infinitely upgradable.
Therefore, no matter how great the performance, I would never write anything like a utility program in Java. Only gigantic server apps need apply.
What program are you developing?
Comparing C++ to Java speed is like comparing a screwdriver and a hammer, pointless. In the world we live in, where both supercomputers and toasters need to be programmed, you need to focus on your particular requirements.
I use C++ for hard realtime software running on embedded systems. I wouldn't dream of using the awfully broken realtime spec for Java for at least another 5 years, when it will hopefully be mature. I would be equally loath to use C++ for a database, cloud-accessing middleware app (actually I have no idea what I just said, but I know Java is good for 'that sort of stuff').
Would you use a Ferrari with no trunk space to move your belongings? Would you bring a minivan to a drag race?
People have to understand that just because they are programming languages, does not mean they are comparable in a meaningful way.
No. Unless you measure it, you may not say it.
Performance is usually "good enough" for most purposes. The question is what you want to compare exactly, and if you have applied a profiler to find and fix the hotspots in your code.
JVMs based on Sun's code still pay a hefty startup tax (I still wonder why they cannot snapshot that and restart from there), but Sun's approach has been correctness first, speed second, and it's taken them 10 years to get up to par.
So the answer is "It depends" :)
For most applications it is almost certainly possible to write a C++ program which performs considerably better than a program that achieves the same thing in Java.
However, if the program isn't optimised for speed, then Java will likely be just as fast or faster, because the compiler / JIT is able to make optimisations that a C++ environment can't.
Basically, if you are willing to spend considerable time understanding and coding for performance, you can probably eventually do a considerably better job in C++ than you could in Java, but for the same amount of time and effort it is quite likely that Java will "win".
As usual though, algorithmic improvements tend to make as much if not more difference than the language.
