Giving up control: machine code generation vs memory layout? - java

This may be a bit off topic of "right answer, not discussion."
However, I am trying to debug my thought process, so maybe someone can help me:
I use compilers all the time, and the fact that I'm giving up control over machine code generation (the layout of my caches, and the flow of electrons) does not bother me.
However, giving up control of memory layout (being able to place stuff in memory) and memory management (garbage collection) still bothers me these days.
Have others dealt with this? If so, how did you get past it? (In particular, why I often feel "safer" in C++ than in Java.)
Thanks!

Your feeling is, naturally, very subjective.
You might feel comfortable managing your own memory space in C++.
Others might appreciate the ease of Java managing the heap for you, reducing memory management overhead to a minimum.
The programming domain has an influence as well. For example, in an embedded environment, you most likely will not have the privilege of a garbage collection mechanism, leaving you to manage your own memory, whether you like it or not.
Bottom line - subjective and domain-dependent.

Confront your nightmare! Profile a busy application in NetBeans and watch the garbage collector do its job.

If you trust the JVM with code generation, why not trust it with data generation too?
Please note that things like cache sizes on CPU may influence the optimal placement of your objects, and that the JIT basically knows better than you because it can measure and take action in the process.

If you've ever used COM under C++, it's really no different from calling "Release()". The memory may or may not be freed right then, or it may be freed somewhere down the line when whatever is using it has finished with it.
Best thing to do is just assume it works and stop worrying about it.

The original poster asked about (a) memory layout and (b) memory management. The previous answers only talk about memory management.
Regarding memory layout, the keyword to search for seems to be "struct".
C and C++ both have memory layout control. D should as well.
It appears (based on a quick search) Java does not.
C# grants memory layout control via structs. See:
Stack Overflow: incorrect members order in a C# structure
http://www.developerfusion.com/article/84519/mastering-structs-in-c/
Go's data structures are called "structs", but I cannot tell if they grant control over memory layout. (I suspect they do, but have not been able to confirm this.)
I welcome any corrections/additions to the above.
(And regarding memory management, I'm quite happy to let the language/platform do it.)

Related

java eliminating garbage generation

Suppose we have a Java task that is working in isolation and we are able to monitor it using VisualVM... and we notice continuous garbage creation and periodic GC like this.
How do we detect what exactly is causing this issue?
Is there a way to see which method's execution is generating garbage? How do we see where the garbage comes from?
Yes, we can see which objects exactly are allocating memory, but that's not helpful... I believe a lot of objects are created and garbage-collected later, but I can't figure out where that happens and what exactly causes this...
How do we do this usually? What tools should we use? Any links to topics about this are appreciated.
NOTE: the problem here is not GC parameter optimization, but rather code optimization; we want to eliminate unnecessary object creation, maybe use primitives instead, etc...
The easiest way is to use a tool like JProfiler and record allocations. The "Allocation HotSpot" view will show in which methods your application is allocating objects. More details can be found here.
When you cannot use a profiler, another approach is to take a heap dump and investigate the objects it contains. Then, based on this information, infer in which methods they are instantiated.
I would suggest installing the VisualGC plugin in jvisualvm. It will give you a very good idea of the number of minor and full GCs happening.
If you are looking for garbage-collected objects and a possible memory leak, you should inspect heap dumps taken at two different points in your code's workflow.
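If you want to capture those two heap dumps from inside the application rather than from a tool, a minimal sketch (assuming a HotSpot JVM; the class name and file names are just placeholders) using the HotSpotDiagnosticMXBean looks roughly like this:

    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    public class HeapDumper {
        // Dumps the heap to the given .hprof file; "live" restricts the dump to
        // reachable objects. Fails if the file already exists.
        public static void dump(String path, boolean live) throws Exception {
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(path, live);
        }

        public static void main(String[] args) throws Exception {
            dump("before.hprof", true);   // take one dump here...
            // ... exercise the suspicious part of the workflow ...
            dump("after.hprof", true);    // ...and a second one to diff against the first
        }
    }

The two .hprof files can then be opened and compared in VisualVM or Eclipse MAT to see which object counts grew.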

Tracking down a 'GC overhead limit exceeded' error

What is the easiest way to track down (i.e., find the cause of) a 'GC overhead limit exceeded' error?
What I do not consider good options:
Adding the -XX:-UseGCOverheadLimit parameter to the JVM call. That Java error is telling me there is something incredibly inefficient in my implementation, and I want to fix that.
"Go and look carefully at your code". The project is very large, so I need some clues regarding where to look for inefficiencies.
Should I use a profiler? If yes, which one would you suggest?
Should I look into the GC log? I have tried doing that, but I have little understanding of it, and it seems there are no clear pointers to the code (saying which objects are being GC'ed).
Many questions have been asked on SO about this error, but no one seems to answer this specific question.
The simplest tools to start profiling your app:
NetBeans comes with a built-in profiler.
JConsole can also help a bit.
VisualVM can also aid a bit.
A really awesome commercial tool is DynaTrace.
Now for the approach to fix your problem:
Although there can be other ways to tackle it, the following things may help.
1) The symptoms you are seeing are probably the result of your code creating too many short-lived objects. This is not a memory leak situation, but there is too much garbage for the JVM to clean, and the JVM is failing to keep up. You need to check your code to see where these objects are being created.
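As an illustration (not from the original answer), the kind of code that typically produces this flood of short-lived objects is string concatenation or autoboxing inside a hot loop; a hypothetical before/after sketch:

    import java.util.ArrayList;
    import java.util.List;

    public class GarbageDemo {
        // Garbage-heavy version: each "+" allocates a temporary StringBuilder and
        // String, and each add() autoboxes the int into an Integer.
        static String heavy(int n) {
            String csv = "";
            List<Integer> ids = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                csv = csv + i + ",";
                ids.add(i);
            }
            return csv;
        }

        // Lower-garbage version: one StringBuilder, one primitive array, no boxing.
        static String light(int n) {
            StringBuilder sb = new StringBuilder();
            int[] ids = new int[n];
            for (int i = 0; i < n; i++) {
                sb.append(i).append(',');
                ids[i] = i;
            }
            return sb.toString();
        }
    }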
2) The second thing you can do is take several heap dumps at regular intervals between two GC runs and compare them in NetBeans or some other tool of your choice. You need to do this before your app goes into this bad state. The comparison will tell you what has grown in the heap and may give you a pointer to where to look in your code.
I hope this helps in solving your issue. :)

Java Memory aware cache

I am looking for some ideas, and maybe an existing implementation if somebody knows of one, but I am willing to code the wanted cache on my own.
I want to have a cache that caches only as many gigs as I configure. In comparison to the rest of the app, the cache part will use nearly 100% of memory, so we can generalize the memory used by the app as being the cache size (+ garbage).
Are there methods for getting a guess of how much memory is used? Or is it better to rely on soft references? Soft references, combined with always running at the top of the JVM memory limit, might be very inefficient, with lots of CPU cycles spent cleaning memory? Can I do some analysis on existing objects, like a myObject.getMemoryUsage()?
A LinkedHashMap has enough cache hits for my purpose, so I don't have to code some strategic caching monster, but I don't know how to solve this memory issue properly. Any ideas? I don't want OOMEs flying anywhere.
What is best practice?
SoftReferences are not a great idea as they tend to be cleared all at once. This means that when you get a performance hit from a GC, you also get a hit from having to re-build your cache.
You can use Instrumentation.getObjectSize() to get the shallow size of an object, and use reflection to obtain a deep size. However, doing this is relatively expensive and not something you want to be doing very often.
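For context, Instrumentation.getObjectSize() is only reachable through an instrumentation agent; a minimal sketch (the class and jar names are made up, and the jar needs a Premain-Class manifest entry plus -javaagent on the command line):

    import java.lang.instrument.Instrumentation;

    public class SizeAgent {
        private static volatile Instrumentation instrumentation;

        // Called by the JVM before main() when started with -javaagent:size-agent.jar
        public static void premain(String agentArgs, Instrumentation inst) {
            instrumentation = inst;
        }

        // Shallow size in bytes of a single object; does not follow references.
        public static long shallowSizeOf(Object o) {
            return instrumentation.getObjectSize(o);
        }
    }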
Why can't you limit the size to a number of objects? In fact, I would start with the simplest cache you can and only add what you really need.
LRU cache in Java.
EDIT: One way to track how much memory you are using is to serialize the value and store it as a byte[]. This can give you fairly precise control; however, it can slow down your solution by up to 1000x. (Nothing comes for free ;)
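A minimal sketch of the simple, count-limited LRU cache suggested above, built on the LinkedHashMap the question already mentions (the 10,000-entry limit in the usage note is just an illustrative number):

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        public LruCache(int maxEntries) {
            // accessOrder = true makes iteration order least-recently-used first
            super(16, 0.75f, true);
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Evict the least recently used entry once the limit is exceeded
            return size() > maxEntries;
        }
    }

    // Usage: new LruCache<String, byte[]>(10_000)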
I would recommend using the Java Caching System. Though if you wanted to roll your own, I'm not aware of any way to get an object's size in memory. Your best bet would be to extend AbstractMap and wrap the values in SoftReferences. Then you could set the Java heap size to the maximum size you wanted. Your implementation would also have to find and clean out stale data, though. It's probably easier just to use JCS.
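A rough illustration of that SoftReference-wrapping idea (a simplified standalone map rather than an AbstractMap subclass; stale entries are only cleaned out when they are looked up):

    import java.lang.ref.SoftReference;
    import java.util.HashMap;
    import java.util.Map;

    public class SoftCache<K, V> {
        private final Map<K, SoftReference<V>> map = new HashMap<>();

        public void put(K key, V value) {
            map.put(key, new SoftReference<>(value));
        }

        public V get(K key) {
            SoftReference<V> ref = map.get(key);
            if (ref == null) {
                return null;
            }
            V value = ref.get();
            if (value == null) {
                map.remove(key);   // the GC cleared this entry under memory pressure
            }
            return value;
        }
    }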
The problem with SoftReferences is that they give more work to the garbage collector. Although it doesn't meet your requirements, HBase has a very interesting strategy to prevent the cache from contributing to garbage collection pauses: it stores the cache in native memory:
https://issues.apache.org/jira/browse/HBASE-4027
https://issues.apache.org/jira/secure/attachment/12488272/HBase-4027+%281%29.pdf
A good start for your use-case would be to store all your data on disk. It might seem naive, but thanks to the I/O cache, frequently accessed data will reside in memory. I highly recommend reading these architecture notes from the Varnish caching system :
https://www.varnish-cache.org/trac/wiki/ArchitectNotes
The best practice, I find, is to delegate the caching functionality outside of Java if possible. Java may be good at managing memory, but a dedicated caching system should be used for anything more than a simple LRU cache.
There is a large cost with GC when it kicks in.
EHCache is one of the more popular ones I know of. Java Caching System from another answer is good as well.
However, I generally offload that work to an underlying layer (usually the JPA persistence layer provided by the application server); I let it be handled there so I don't have to deal with it on the application tier.
If you are caching other data such as web requests, http://hc.apache.org/httpclient-3.x/ is also another good candidate.
However, just remember you also have a file system; there's absolutely nothing wrong with writing data you have retrieved to the file system. I've used the technique several times to fix out-of-memory errors due to improper use of ByteArrayOutputStreams.

Java, JIT and Garbage Collector efficiency

I want to know about the efficiency of Java and the advantages and disadvantages of the Java Virtual Machine and Android.
Efficiency means low use of memory, low use of the processor, and fast execution.
Mobile devices are simpler than PCs, so the apps need to be more efficient. Servers receive many connections and need to be very efficient. Many mobile devices use Android and Java apps, and many servers use PHP.
Can Java and interpreted languages, such as JavaScript, Python and PHP, be more efficient than C and C++?
JIT (just in time) advantages:
It can optimize better, because it knows the values of some variables and where they are used or changed.
It knows the processor and can optimize with processor specific instructions.
It is easier to inline functions.
It can remove known conditional tests and remove blocks that will not be run.
Java disadvantages:
When the app runs for the first time, it will be very slow, because the bytecode is interpreted and the JIT compiler does a lot of analysis to find good optimizations. The app cannot use the maximum of the hardware's power. For a game or a real-time app this is a real problem: if the first run is to succeed with no delay, the app must not demand the maximum of the hardware's power, and later optimized runs will then leave that power unused as well. The app cannot be designed to use the maximum of the hardware's power after optimization, because it would then be too slow on the first run and would not keep up.
Java checks that array indexes are not out of bounds, and it checks that pointers are not null. This adds several internal "if"s to the generated code.
All objects use the garbage collector, including objects that would be very easy to delete manually.
All instances of objects are created with dynamic memory allocation, including objects that could easily live on the stack. If each loop iteration creates an object at the start and discards it at the end, dynamic memory allocation is inefficient.
The garbage collector needs to stop the app while it cleans the memory, which is very undesirable for games, GUI apps and real-time apps. Reference counting is slow and it cannot handle circular references. A multi-threaded garbage collector is slower and uses more of the CPU.
Can Java and interpreted languages, such as JavaScript, Python and PHP, be more efficient than C and C++?
It's very difficult to get more efficient than the best C and C++ programs. There are a lot of C and C++ programs that are nowhere near as efficient as that, though, and beating them with (modern) Java code is quite practical if you're any good.
I've also heard good things about the current best-of-breed Javascript engines, but I've never studied them in detail.
With Python and PHP (and many other languages besides) it's a bit different. Those languages are implemented in C, so it's obvious they cannot be more efficient than C (it follows by construction). Yet it's much easier to write efficient code in them (i.e., code that uses what is in effect a very well-written C library) than it is to start from scratch. In particular, it reduces the number of defects per program. That's a very important metric in practice; anyone can produce fast code if it's allowed to be wrong.
In general, I advise not worrying about getting maximal efficiency. You run up against the law of diminishing returns. Instead, use sensible overall algorithms (or, as a friend of mine once said to me, “look after the big O()s and let the constant factors look after themselves”) and focus on the question of whether the program is good enough in practice. Once it is, stop fiddling around and ship it!
Let's pick apart your claimed disadvantages:
When the app runs for the first time, it will be very slow, because the bytecode is interpreted and the JIT compiler does a lot of analysis to find good optimizations. The app cannot use the maximum of the hardware's power.
JIT compilation is an implementation issue. Not all platforms do it. Indeed, the Android platform could be modified to 1) do ahead of time compilation, or 2) cache the native code produced by the JIT to give faster startup next time you run the app.
It is interesting that various Java vendors have tried these strategies at various times, and yet the empirical evidence is that plain JIT is the best strategy.
Java checks that array indexes are not out of bounds, and it checks that pointers are not null. This adds several internal "if"s to the generated code.
The JIT compiler can optimize away many of these tests. For the rest, the overheads tend to be relatively small; e.g. a few percent difference ... not a factor of 2.
Note that the alternative to checking is the risk that typical application bugs will crash the android platform. Certainly, garbage collection becomes problematic if applications can trash memory.
All objects use the garbage collector, including objects that would be very easy to delete manually.
The flip-side is that it is easy to forget to delete objects, delete objects twice, use them after they have been deleted and so on. These mistakes all lead to bugs that tend to be hard to track down.
All instances of objects are created with dynamic memory allocation, including objects that could easily live on the stack. If each loop iteration creates an object at the start and discards it at the end, dynamic memory allocation is inefficient.
Java dynamic memory allocation and object creation is FAST. Faster than in C++ for example.
The garbage collector needs to stop the app while it cleans the memory, which is very undesirable for games, GUI apps and real-time apps.
Use a concurrent / low-pause garbage collector, then. Another approach is to implement your app so that it does not generate lots of garbage ... and so seldom triggers garbage collection.
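For reference (not part of the original answer), the low-pause collectors are chosen with JVM flags; for example, CMS on the HotSpot JVMs of that era (the flag was removed in JDK 14), or G1 with a pause-time goal on newer JVMs (MyApp is a placeholder):

    java -XX:+UseConcMarkSweepGC MyApp
    java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 MyApp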
Reference counting is slow and it cannot handle circular references.
No decent Java GC uses reference counting. (On the other hand, a lot of C / C++ manual memory management schemes do. For instance, so-called smart pointer schemes in C++.)
A multi-threaded garbage collector is slower and uses more of the CPU.
You actually mean concurrent collection, I think. Yes, it does, but that's the penalty you pay for the extra responsiveness that you demand for interactive games / real-time apps.
What you describe as 'efficient' I would describe as 'ideal'. An application that requires little memory and little CPU time and runs quickly is, put another way, one that is good, fast, and cheap all at the same time. Never mind whether it does anything useful or interesting.
The only comparison I'd view as reasonable, if all three goals are required, is among applications that produce a common result. In that case, it is unlikely, given a competing group of evenly-capable programmers, that any one implementation would excel on all three counts over the others.
That said, your question leaves out a key criterion in the mobile market: rate of application development. Mobile applications also profit far more from positive user experience than from back-end optimization. Without that constraint, the question of efficiency as you put it seems to me more of a ponderous consideration than a practical one.
But to the actual question: can a language like Java produce more efficient code than one that compiles statically to the instruction set of the target machine? Probably not. Can it be as efficient, or efficient enough? Absolutely. If we considered an execution platform with fixed, severely constrained resources that changes infrequently, it would be a different matter.
In any language, the way to get fast execution is to do the job with as little execution as possible, and as little garbage collection as possible.
That sounds like a vacuous generality, but what it means in practice, regardless of language, is
For the data structure design, keep it as simple as possible. Stay away from the fancy collection classes full of bells and whistles. Especially stay away from notifications as a way of keeping data consistent. If your data is normalized, it can never be inconsistent. If you can't normalize it, it's better to tolerate temporary inconsistency, than to try to keep it tight with notifications.
Performance problems creep in, even into the best code. You should try not to make them, but you will still make them. Most important is knowing how to find them, once made, and remove them. Here's a blow-by-blow example. If in doing this, you find you need a better big-O algorithm, then put it in. Putting one in without being sure it's needed is a recipe for slowness.
No language can rescue a program from non-removed performance problems. The language and its compiler, JITter, etc. are like a race horse. It's fine to want a good horse, but it's a waste if the jockey isn't as slim as possible.
Your program is the jockey, and it's your job to take it on a weight-loss program.
I will paste an interesting answer given by James Gosling himself in the book Masterminds of Programming.
Well, I’ve heard it said that effectively you have two compilers in the Java world. You have the compiler to Java bytecode, and then you have your JIT, which basically recompiles everything specifically again. All of your scary optimizations are in the JIT.
James: Exactly. These days we’re beating the really good C and C++ compilers pretty much always. When you go to the dynamic compiler, you get two advantages when the compiler’s running right at the last moment. One is you know exactly what chipset you’re running on. So many times when people are compiling a piece of C code, they have to compile it to run on kind of the generic x86 architecture. Almost none of the binaries you get are particularly well tuned for any of them. You download the latest copy of Mozilla, and it’ll run on pretty much any Intel architecture CPU. There’s pretty much one Linux binary. It’s pretty generic, and it’s compiled with GCC, which is not a very good C compiler.

When HotSpot runs, it knows exactly what chipset you’re running on. It knows exactly how the cache works. It knows exactly how the memory hierarchy works. It knows exactly how all the pipeline interlocks work in the CPU. It knows what instruction set extensions this chip has got. It optimizes for precisely what machine you’re on. Then the other half of it is that it actually sees the application as it’s running. It’s able to have statistics that know which things are important. It’s able to inline things that a C compiler could never do. The kind of stuff that gets inlined in the Java world is pretty amazing. Then you tack onto that the way the storage management works with the modern garbage collectors. With a modern garbage collector, storage allocation is extremely fast.

Java without gc - io

I would like to run a Java program with garbage collection switched off. Managing memory in my own code is not so difficult.
However the program needs quite a lot of I/O.
Is there any way (short of using JNI for all I/O operations) that I could achieve this using pure Java?
Thanks
Daniel
What you are trying to achieve is frequently done in investment banking to develop low-latency real-time systems.
To avoid GC you simply need to make sure not to allocate memory after the startup and warm-up phase of your application.
As you seem to have noticed Java NIO internally does unwanted memory allocation.
Unfortunately, you have no choice but to write JNI replacements for the problematic calls.
You need at least to write a replacement for the NIO Selector.
You will have to avoid using most of the Java libraries due to similar unwanted memory allocations.
For example, you will have to avoid using immutable objects like String, avoid boxing, and re-implement collections so that they preallocate enough entries for the whole lifetime of your program.
Writing Java code this way is not easy, but certainly possible.
I am developing a platform to do just so.
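As a rough sketch of the allocate-everything-up-front style this answer describes (the Message class and pool size are invented for illustration):

    // All objects are created once at startup; the hot path only reuses them,
    // so the steady state allocates nothing and never triggers GC.
    public class MessagePool {
        public static final class Message {
            final byte[] payload = new byte[512];  // fixed-size, reused buffer
            int length;                            // how many bytes are valid
        }

        private final Message[] pool;
        private int next;

        public MessagePool(int size) {
            pool = new Message[size];
            for (int i = 0; i < size; i++) {
                pool[i] = new Message();           // preallocate during warm-up
            }
        }

        // Hot path: hand out a preallocated message instead of calling "new".
        // Assumes each message is fully processed before the ring wraps around.
        public Message acquire() {
            Message m = pool[next];
            next = (next + 1) % pool.length;
            return m;
        }
    }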
Managing memory in my own code is not so difficult.
It's not difficult - It's impossible. For example:
public void foo() {
    Object o = new Object();
    // free(o); // Doh! No "free" keyword in Java.
}
Without the aid of the garbage collector how can the memory consumed by o be reclaimed?
I'm assuming from your question that you might want to avoid the sporadic pauses caused by garbage collection due to the high level of I/O being performed by your app. If this is the case there are techniques for minimising the number of objects created (e.g. re-using objects from a pool). You could also consider enabling the Concurrent Mark Sweep Collector.
The concurrent mark sweep collector, also known as the concurrent collector or CMS, is targeted at applications that are sensitive to garbage collection pauses.
It's very hard (but not impossible) to disable GC in a JVM.
Look at the JNI "critical" functions for hints.
You can also essentially ensure you don't GC by not allocating any more objects (write a JVMTI agent that slaps you if you do, and instrument your code).
Finally, you can force a fatal OutOfMemoryError by ensuring that every object you allocate is never freed, thus when you hit -Xmx memory used, you'll fall over as GC won't be able to reclaim anything (mind you, you'll GC one or more times at this point before you fall over in a heap).
The real question is why you'd want to? What upside do you see in doing it? Is it for realtime? If so, I'd consider looking at one of the several realtime JVMs available on the market (Oracle, IBM, & others all sell them). I can't honestly think of another reason to do this while still using Java.
The only way you are going to be able to turn off garbage collection is to modify the JVM. This should be feasible with the OpenJDK 6 codebase.
However, what you will get at the end is a JVM that leaks memory like crazy, with no reasonable hope of fixing the leaks. The Java class library APIs are designed and implemented on the assumption that there is a GC taking care of memory management. This is so fundamental that any serious attempt to "fix" it would lead to a language / library that is not recognizable as Java.
If you want a non-garbage collected language, use C or C++.
Modern JVM's are so good at handling short-lived objects that any scheme you devise on your own will be slower.
This is because the objects you handle yourself will become long-lived and receive extra deluxe treatment from the JVM in terms of being moved around, etc. Of course, this is done by the garbage collector, which you want to turn off, but you can do very little without any GC.
So, before you start considering which optimization to use, establish a baseline: take the large, unoptimized program and profile it. Then do your tweaks and see if they help, but you will never know if you do not have a baseline.
As other people have mentioned, you can't disable the GC. However, you can choose to use the experimental 'Epsilon' garbage collector, which never actually performs any garbage collection. Warning: it will crash if your JVM runs out of memory (because it is not doing any garbage collection).
There's more info (including the command-line switch to use) at:
http://openjdk.java.net/jeps/318
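For reference, JEP 318 lists the switches; on a JDK 11 or later HotSpot JVM, Epsilon is enabled roughly like this (MyApp is a placeholder):

    java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC MyApp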
Good luck!
Garbage collection is automated memory management in Java, so you cannot disable GC.
Since you say, "its all about predictability not straight line speed," you should look at using a realtime Java system with deterministic garbage collection.
