Java Garbage Collection, Class Based Tenuring

I have been playing with the parameters of the Java garbage collector, and I'm seeing expensive, frequent minor collections as the eden/survivor space fills up. This is because I allocate a pool of very large objects. I know these objects are "permanent": they are reused but will never be GCed. I'm therefore trying to find a way to "automatically" place objects of these types in the old generation rather than in the new one.
I'm currently working around this issue by allocating a very large new generation (to avoid the very frequent minor GCs); unfortunately, this means that each individual collection is more expensive.
I would like to be able to specify, per class, a tenuring rate, and set it very low for the specific classes of objects which I know will never get GCed (and which are very, very large).
My application is highly latency sensitive.
My current setup uses CMS, with the min and max heap size both set to 48.
Is this possible? I have searched through every JVM flag I could find without seeing anything to that effect, and I cannot see a way to do it with a custom class loader either.

With HotSpot, there is no flag that would allow you to allocate instances of certain classes directly in the old generation.
If the pool is really reused and "permanent", you should be getting frequent minor GCs only while the pool is being allocated. Run your application for a longer period and check whether the pool has indeed been tenured; after that, you should not see any minor GCs caused by the pool usage. One way to get the promotion out of the way before latency-sensitive traffic starts is sketched below.
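A minimal warm-up sketch, assuming a simple pool of byte-array buffers (the pool and buffer sizes are illustrative placeholders, not values from the question): allocate the pool eagerly at startup, then request a full GC so the survivors are promoted outside the latency-critical window.

import java.util.ArrayList;
import java.util.List;

public class PoolWarmup {
    static final int POOL_SIZE = 8;                  // placeholder
    static final int BUFFER_SIZE = 16 * 1024 * 1024; // placeholder: 16 MB each

    static final List<byte[]> POOL = new ArrayList<>();

    public static void main(String[] args) {
        // Allocate the whole pool up front, before latency-sensitive work begins.
        for (int i = 0; i < POOL_SIZE; i++) {
            POOL.add(new byte[BUFFER_SIZE]);
        }
        // System.gc() is only a hint (and a no-op under -XX:+DisableExplicitGC),
        // but when honoured it promotes the surviving pool to the old generation
        // now, rather than during a later, latency-critical minor GC.
        System.gc();
    }
}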

Related

Java: reliably allocate large array on heap

The Task
Allocate X = 4..8 MB as a byte array (on the heap), e.g. using ByteBuffer.allocate(), such that it will not cause an OutOfMemoryError. It is not allowed to split the array and process it in smaller portions. Note that the allocation happens on the heap; this is not a direct ByteBuffer.
The Challenges
Memory can be fragmented, so even if there is enough free memory in total (greater than X), a contiguous region of X bytes may still be unavailable for the array (any API to find out whether a contiguous region of X bytes is available would probably help).
Heap memory is divided into regions that keep objects of different generations, and an object cannot span two or more regions of the heap: see Huge arrays throws out of memory despite enough memory available and Large Array allocation across young and tenured portions of java Heap.
Large objects are immediately allocated in a tenured region, but it is tricky to reliably reason about which region exactly, even using ManagementFactory.getMemoryPoolMXBeans(): see how can I know size of each generation in java heap with jmx. Some JVMs also dynamically adjust their large object areas (LOAs): https://www.ibm.com/docs/en/sdk-java-technology/8?topic=SSYKE2_8.0.0/com.ibm.java.vm.80.doc/docs/mm_allocation_loa.html
Question
Is there a way in Java to code as follows?
if (<I can reliably allocate an array sized X bytes on heap right now>) {
ByteBuffer.allocate(X);
}
There’s a fundamental problem with the idea of doing
if (<I can reliably allocate an array sized X bytes on heap right now>) {
ByteBuffer.allocate(X);
}
known as the “check-then-act” anti-pattern. Regardless of how the check in the if’s condition is supposed to work, you need to ensure that its result doesn’t change between the check and the subsequent action, i.e. the allocation.
To ensure that the result doesn’t change, you’d not only need to stop all other threads of the same JVM from performing allocations (or a concurrent garbage collection from completing) but also prevent all other processes on the same machine from allocating memory, as it is possible that the operating system did not reserve memory for your JVM exclusively but still allows other processes to take it at exactly this point.
The condition itself has the challenges already named in your question and, as you said yourself, all this fiddling with implementation-specific memory regions may be moot when the JVM is capable of reconfiguring them on the fly. Since such reconfiguration is usually done in response to the result of a garbage collection, you’d need to perform a full garbage collection first to determine the resulting situation, and even then you could only be sure that another GC won’t change the situation if you were able to stop all other threads and processes from performing allocations.
And on some JVMs the only way to reliably trigger a garbage collection is to perform an actual allocation.
So you need a way to atomically perform the check, followed by an actual allocation that ensures the memory stays available to you no matter what happens in the environment, or else an answer that the memory is not available. This mechanism does exist: just call ByteBuffer.allocate(X). If it completes normally, the returned reference ensures that the memory stays available as long as you keep it; otherwise, the thrown OutOfMemoryError signals the unavailability of the memory. Since this mechanism exists, there is no reason to provide a second one with the same outcome.
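A minimal sketch of this "just try it" approach (the helper name tryAllocate is made up for illustration):

import java.nio.ByteBuffer;

public class TryAllocate {
    // Returns the buffer if the allocation succeeded, or null if the heap
    // could not provide X contiguous bytes at this moment.
    static ByteBuffer tryAllocate(int x) {
        try {
            return ByteBuffer.allocate(x); // heap allocation, backed by a byte[]
        } catch (OutOfMemoryError e) {
            return null; // the only reliable "not available right now" signal
        }
    }
}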
No, there is no reliable way to do this in Java.
There are several ways to get estimates or best-effort guesses for the available memory, but nothing reliable. Also note that even if there were such a thing, another thread could change the available amount between the condition and the call to allocate.
This related answer contains a way to get such an estimate, and also explains some of the reasons why this can not be reliable.
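For completeness, here is what such a best-effort estimate typically looks like, using the standard Runtime methods; as explained above, the value can be invalidated immediately by other threads, other processes, or the GC:

public class MemoryEstimate {
    // A best-effort estimate only; do not treat this as a guarantee.
    static long estimatedFreeHeap() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used; // bytes that *might* still be allocatable
    }
}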

Does garbage collection change the object addresses in Java?

I read that garbage collection can lead to memory fragmentation problems at run time. To solve this problem, the JVM performs compaction: it takes all the live objects and assigns them contiguous memory.
Does this mean that the object addresses must change from time to time? Also, if this happens:
Are the references to these objects also re-assigned?
Won't this cause significant performance issues? How does Java cope with it?
I read that garbage collection can lead to memory fragmentation problems at run time.
This is not a problem exclusive to garbage-collected heaps. When you have a manually managed heap and free memory in a different order than the preceding allocations, you may get a fragmented heap as well. And being able to have lifetimes other than the last-in-first-out order of automatic storage (a.k.a. stack memory) is one of the main motivations for using heap memory.
To solve this problem, the JVM performs compaction: it takes all the live objects and assigns them contiguous memory.
Not necessarily all objects. Typical implementation strategies divide the memory into logical regions and only move objects from a specific region to another, but not all existing objects at a time. These strategies may incorporate the age of the objects, like generational collectors moving objects of the young generation from the eden space to a survivor space, or the distribution of the remaining objects, like the “Garbage First” collector which, as the name suggests, evacuates the region with the highest garbage ratio first, implying the least work to get a free contiguous memory block.
Does this mean that the object addresses must change from time to time?
Of course, yes.
Also, if this happens,
Are the references to these objects also re-assigned?
The specification does not mandate how object references are implemented. An indirect pointer may eliminate the need to adapt all references (see also this Q&A). However, for JVMs using direct pointers, this does indeed imply that those pointers need to be adapted.
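An illustrative toy version of that indirection idea (not how any real JVM implements it; in Java the analogy is imperfect because references already abstract addresses): with one level of indirection, "moving" an object means updating a single table slot, and every handle stays valid.

public class HandleTable {
    private final Object[] slots;
    private int next;

    HandleTable(int capacity) {
        slots = new Object[capacity];
    }

    // Hands out a stable handle; callers keep the int, never a direct reference.
    int register(Object o) {
        slots[next] = o;
        return next++;
    }

    Object deref(int handle) {
        return slots[handle];
    }

    // After a move, exactly one slot is updated; no handle needs to change.
    void updateAfterMove(int handle, Object newLocation) {
        slots[handle] = newLocation;
    }
}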
Won't this cause significant performance issues? How does Java cope with it?
First, we have to consider what we gain from this. “Eliminating fragmentation” is not an end in itself. If we don’t compact, we have to scan the reachable objects for gaps between them and maintain a data structure with this information, which we would then call “free memory”. We would also need to implement memory allocation as a search for matching chunks in this data structure, splitting chunks when no exact match is found. This is a rather expensive operation compared to allocation from a contiguous free memory block, where we only have to bump the pointer to the next free byte by the required size.
Given that allocations happen much more often than garbage collection, which only runs when the memory is full (or a threshold has been crossed), this alone already justifies the more expensive copy operations. It also implies that just using a larger heap can solve performance issues, as it reduces the number of required garbage collector runs, whereas the number of surviving objects will not scale with the memory (unreachable objects stay unreachable, regardless of how long you defer the collection). In fact, deferring the collection raises the chances that more objects have become unreachable in the meantime. Compare also with this answer.
The costs of adapting references are not much higher than the costs of traversing references in the marking phase. In fact, non-concurrent collectors could even combine these two steps, transferring an object on first encounter and adapting subsequently encountered references, instead of marking the object. The actual copying is the more expensive aspect, but as explained above, it is reduced by not copying all objects but using certain strategies based on typical application behavior, like generational approaches or the “garbage first” strategy, to minimize the required work.
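To make the bump-the-pointer point concrete, here is an illustrative toy allocator (not how any real JVM is implemented) showing why allocation from a compacted, contiguous block is just a bounds check plus one addition:

public class BumpAllocator {
    private final byte[] heap; // one contiguous free block, as after compaction
    private int top;           // index of the next free byte

    BumpAllocator(int size) {
        heap = new byte[size];
    }

    // Returns the offset of the new "object", or -1 if the block is exhausted
    // (a real JVM would trigger a garbage collection at this point).
    int allocate(int size) {
        if (top + size > heap.length) {
            return -1;
        }
        int address = top;
        top += size; // the entire allocation is one pointer bump
        return address;
    }
}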
If you move an object around in memory, its address will change, and therefore references pointing to it will need to be updated. Memory fragmentation occurs when an object in a contiguous (in memory) sequence of objects gets deleted. This creates a hole in the memory space, which is generally bad because contiguous chunks of memory have faster access times and a higher probability of fitting in cache lines, among other things. It should be noted that the use of indirection tables can prevent reference updates, up to the maximum level of indirection used.
Garbage collection has a moderate performance overhead, not just in Java but in other languages as well, such as C#. As for how Java copes with it: the strategies for performing garbage collection and minimizing its impact on performance depend on the particular JVM being used, since each JVM can implement garbage collection however it pleases; the only requirement is that it meets the JVM specification.
However, as a programmer, there are some best practices you should follow to get the most out of garbage collection and to minimize its performance hit on your application. You might also want to check the JVM specification, but it's a bit dense.

Is there any condition when application will never perform Garbage Collection?

Is there any condition when an application will never perform garbage collection? Theoretically, is it possible to have such an application design?
Yes, there is. Read about memory leaks in Java; an example is described in Effective Java, Item 6: "Eliminate obsolete object references".
Garbage collection happens on objects which are not referenced anymore in your application.
Since Java 11, there is a way to purposely never perform garbage collection: run your JVM with the newly introduced Epsilon GC, a garbage collector that handles memory allocation but never releases the allocated memory.
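The flags come from JEP 318 (MyApp is a placeholder main class):

java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC MyApp

Note that once the heap is exhausted, the JVM fails with an OutOfMemoryError rather than collecting.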
There is at least one product on the market that implements high-frequency trading using Java and JVM technology.
Obviously, an application that needs to react in microseconds can't afford a garbage collector to kick in and halt the system for arbitrary periods of time.
In this case, the solution was to write the whole application so that it never creates objects that turn into garbage. For example, all input data is kept in fixed byte arrays (allocated once at start time) which are then used as buffers for all kinds of processing, along the lines of the sketch below.
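A minimal sketch of that zero-garbage style, assuming the data arrives on an InputStream (the class and method names are made up for illustration):

import java.io.IOException;
import java.io.InputStream;

public class ZeroGcReader {
    // Allocated once at start-up and reused for every message.
    private final byte[] inputBuffer = new byte[64 * 1024];

    // Processes one chunk in place; the steady state allocates no objects,
    // so there is never any garbage for the collector to find.
    void onData(InputStream in) throws IOException {
        int n = in.read(inputBuffer);
        for (int i = 0; i < n; i++) {
            // parse the bytes directly out of the reusable buffer
        }
    }
}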
Unless I am mistaken, you can listen to more details on the Software Engineering Radio podcast. I think it should be this episode: http://www.se-radio.net/2016/04/se-radio-episode-255-monica-beckwith-on-java-garbage-collection/
Is there any condition when application will never perform Garbage Collection ?
You can prevent the GC from running by having a thread that never reaches a safepoint.
Unless you use a concurrent collector, a GC is only performed when a memory region, e.g. the eden or tenured space, fills up.
If you make these large enough, and your garbage rate is low enough, the GC won't run for a long time: long enough that you can perform a GC overnight, in a maintenance window, or simply restart the process first.
Theoretically is it possible to have such application design?
I have worked on applications which GC less than once per day (and some of them are restarted every day).
For example, say you produce 300KB of garbage per second, or 1 GB per hour, with a 24 GB Eden size you can run for a whole day without a collection.
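Worked out: 300 KB/s × 3,600 s/h ≈ 1.08 GB/h, and 24 h × roughly 1 GB/h ≈ 24 GB, so a 24 GB eden absorbs about a day of garbage before its first minor collection.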
In reality, if you move most of your data off heap, e.g. into Chronicle Map or Queue, you might find that a 4 GB heap can run for a day or even a week before a minor collection.

Is it a memory leak if the garbage collector runs abnormally?

I have developed a J2ME web browser application, and it is working fine. I am testing its memory consumption, and it seems to me that it has a memory leak: the green curve representing consumed memory in the memory monitor (of the Wireless Toolkit) reaches the maximum allocated memory (687768 bytes) every 7 requests made by the browser (i.e. whenever the end user navigates through 7 pages), after which the garbage collector runs and frees the allocated memory.
My question is:
Is it a memory leak when the garbage collector runs automatically every 7 page navigations?
Do I need to run the garbage collector (System.gc()) manually once per request to prevent the maximum allocated memory from being reached?
Please guide me, thanks
To determine if it is a memory leak, you would need to observe it more.
From your description, i.e. that once the maximum memory is reached the GC kicks in and is able to free memory for your application to continue, it does not sound like there is a leak.
Also, you should not call the GC yourself, since:
a call to System.gc() is only a hint, and
it could potentially interfere with the underlying algorithm, hurting its performance.
You should instead focus on why your application needs so much memory in such a short period.
My question is: is it a memory leak when the garbage collector runs automatically every 7 page navigations?
Not necessarily. It could also be that:
your heap is too small for the size of problem you are trying to solve, or
your application is generating (collectable) garbage at a high rate.
In fact, given the numbers you have presented, I'm inclined to think this is primarily a heap size issue. If the interval between GC runs decreased over time, then THAT would be evidence pointing to a memory leak; but if the rate stays steady on average, it suggests that the rates of memory usage and reclamation are in balance, i.e. no leak.
Do I need to run the garbage collector (System.gc()) manually once per request to prevent the maximum allocated memory from being reached?
No. No. No.
Calling System.gc() won't cure a memory leak. If it is a real memory leak, calling System.gc() will not reclaim the leaked memory. In fact, all you will do is make your application RUN A LOT SLOWER... assuming the JVM doesn't ignore the call entirely.
Direct and indirect evidence that the default behaviour of HotSpot JVMs is to honour System.gc() calls:
"For example, the default setting for the DisableExplicitGC option causes JVM to honor Explicit garbage collection requests." - http://pic.dhe.ibm.com/infocenter/wasinfo/v7r0/topic/com.ibm.websphere.express.doc/info/exp/ae/rprf_hotspot_parms.html
"When JMX is enabled in this way, some JVMs (such as Sun's) that do distributed garbage collection will periodically invoke System.gc, causing a Full GC." - http://static.springsource.com/projects/tc-server/2.0/getting-started/html/ch11s07.html
"It is best to disable explicit GC by using the flag -XX:+DisableExplicitGC." - http://docs.oracle.com/cd/E19396-01/819-0084/pt_tuningjava.html
And from the Java 7 source code:
./openjdk/hotspot/src/share/vm/runtime/globals.hpp
product(bool, DisableExplicitGC, false, \
"Tells whether calling System.gc() does a full GC") \
where false is the default value for the option. (And note that this is in the OS- and machine-independent part of the code tree.)
I wrote a library that makes a good effort to force the GC. As mentioned before, System.gc() is asynchronous and won't do anything by itself. You may want to use this library to profile your application and find the spots where too much garbage is being produced. You can read more about it in this article where I describe the GC problem in detail.
That is (semi) normal behavior. Available (unreferenced) storage is not collected until the size of the heap reaches some threshold, triggering a collection cycle.
You can reduce the frequency of GC cycles by being a bit more "heap aware". E.g., a common error in many programs is to parse a string by using substring() to not only parse off the left-most word, but also to shorten the remaining string by substringing the tail. Creating a new String for the word is not easily avoided, but one can easily avoid repeatedly substringing the "tail" of the original string; a sketch of both versions follows.
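A minimal comparison, assuming process(...) stands in for whatever consumes each word:

public class WordParsing {
    static void process(String word) {
        // placeholder for whatever consumes each word
    }

    // The anti-pattern: every iteration allocates a brand-new tail string.
    static void parseWithTails(String line) {
        String rest = line;
        while (!rest.isEmpty()) {
            int sp = rest.indexOf(' ');
            String word = (sp < 0) ? rest : rest.substring(0, sp);
            process(word);
            rest = (sp < 0) ? "" : rest.substring(sp + 1); // new String each time
        }
    }

    // Heap-friendlier: walk the original string with an index instead.
    static void parseWithIndex(String line) {
        int start = 0;
        while (start < line.length()) {
            int sp = line.indexOf(' ', start);
            int end = (sp < 0) ? line.length() : sp;
            process(line.substring(start, end)); // only the word is allocated
            start = end + 1;
        }
    }
}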
Running System.gc() will accomplish nothing -- on many platforms it's a no-op, since it's so commonly abused.
Note that (outside of brain-dead Android) you can't have a true "memory leak" in Java (unless there's a serious JVM bug). What's commonly referred to as a "leak" in Java is the failure to remove all references to objects that will never be used again. E.g., you might keep putting data into a chain and never clear the references to the stuff at the far end of the chain, even though it is no longer going to be used. The resulting symptom is that the MINIMUM heap used (i.e., the size immediately after the GC runs) keeps rising with each cycle.
Adding to the other excellent answers:
Looks like you are confusing memory leak with garbage collection.
A memory leak is when unused memory cannot be garbage collected because something still holds references to it (although they're not used for anything).
Garbage collection is when a piece of software (the garbage collector) frees unreferenced memory automatically.
You should not call the garbage collector manually because that would affect its performance.

Predicting Java memory

Is there a way to predict how much memory my Java program is going to take? I come from a C++ background where I implemented methods such as "size_in_bytes()" on classes and I could fairly accurately predict the runtime memory footprint of my app. Now I'm in a Java world, and that is not so easy... There are references, pools, immutable objects that are shared... but I'd still like to be able to predict my memory footprint before I look at the process size in top.
You can inspect the size of objects if you use the instrumentation API. It is a bit tricky to use -- it requires a "premain" method and extra VM parameters -- but there are plenty of examples on the web. Searching for "java instrumentation size" should find them.
Note that the default methods will only give you a shallow size. And unless you avoid any object construction outside of the constructor (which is next to impossible), there will be dead objects around waiting to be garbage collected.
But in general, you can use these to estimate the memory requirements of your application, provided you have good control over the number of objects created. A minimal sketch follows.
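A minimal sketch of such an agent (the class name ObjectSizer and the jar name are placeholders):

import java.lang.instrument.Instrumentation;

// Run with: java -javaagent:sizer.jar YourApp
// (the jar's MANIFEST.MF must contain "Premain-Class: ObjectSizer").
public class ObjectSizer {
    private static volatile Instrumentation inst;

    public static void premain(String agentArgs, Instrumentation i) {
        inst = i;
    }

    // Shallow size only: objects referenced by o are not included.
    // Requires the agent to have been attached, or inst will be null.
    public static long sizeOf(Object o) {
        return inst.getObjectSize(o);
    }
}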
You can't predict the amount of memory a program is going to take. However, you can predict how much an object will take. Edit: it turns out I'm almost completely wrong; this document describes the memory usage of objects better: http://www.javamex.com/tutorials/memory/object_memory_usage.shtml
In general, you can predict fairly closely what a given object will require. There's some overhead that is relatively fixed, plus the instance fields in the object, plus a modest amount of padding. But then object size is rounded up to at least (on most JVMs) a 16-byte boundary, and some JVMs round up some object sizes to larger boundaries (to allow the use of standard sized pre-allocated object frames). But all this is relatively fixed for a given JVM.
What varies, of course, is the overhead required for garbage collection. A naive garbage collector requires 100% overhead (at least one free byte for every allocated byte), though certain varieties of "generational" collectors can improve on this to a degree. But how much space is required for GC is highly dependent on the workload (on most JVMs).
The other problem is that when you're running at a relatively low level of allocation (where you're only using maybe 10% of max available heap) then garbage will accumulate. It isn't actively referenced, but the bits of garbage are interspersed with your active objects, so it takes up working set. As a result, your working set tends to be roughly equal to your current overall garbage-collected heap size (plus other system overhead).
You can, of course, "throttle" the heap size so that you run at a higher % utilization, but that increases the frequency of garbage collection (and the overall cost of GC to a lesser degree).
You can use profilers to understand the constant set of objects that are always in memory. Then you should execute all the code paths to check for memory leaks. JProfiler is a good one to start with.
