While reading Oracle's documentation on the G1 Garbage Collector, I noted the following:
When performing garbage collections, G1 operates in a manner similar to the CMS collector. G1 performs a concurrent global marking phase to determine the liveness of objects throughout the heap. After the mark phase completes, G1 knows which regions are mostly empty. It collects in these regions first, which usually yields a large amount of free space. This is why this method of garbage collection is called Garbage-First
As mentioned in the above quote, G1 selects the regions which are mostly empty.
My question is: if a region is mostly empty, how would collecting it yield a large amount of free memory? If it is mostly empty, it is already part of free memory, isn't it?
Could anyone here help me clarify this?
In this sentence:
After the mark phase completes, G1 knows which regions are mostly empty.
"mostly empty" means "contains the most reclaimable garbage". This is clear from the context. The purpose of the mark phase is to determine which objects are definitely or probably reachable. The rest are definitely unreachable, and can be collected.
Collecting regions with the largest amount of reclaimable space is good for two reasons:
You get the most space back soonest.
With a copying collector, there is less work to do if the "from" space mostly contains stuff that you don't need to copy to the "to" space. So you get the most space back efficiently.
In most use-cases, the second reason is more significant. It is rarely important to get space back quickly; you just need the space to be available when the application requests it. (GC pauses are a different matter, but they are caused by other things.)
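To illustrate the second reason above in code, here is a minimal sketch (hypothetical types, not HotSpot internals): the cost of evacuating a region is proportional to the live objects that must be copied, while the garbage is never touched at all.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only (hypothetical types, not HotSpot internals).
// Evacuation cost is proportional to the *live* objects in the region;
// a region that is 90% garbage is roughly 10x cheaper to reclaim than
// one that is 90% live.
class Region {
    final List<Object> live = new ArrayList<>(); // filled in by the mark phase

    void evacuateInto(Region to) {
        to.live.addAll(live); // copy only the survivors to the "to" space
        live.clear();         // everything else was garbage; region is now free
    }
}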
Think of "garbage first" as shorthand for "most garbage-y first." That is, it defines various blocks of memory, and then prioritizes the ones with the most garbage -- thus getting the most bang for its buck.
From the page you cited:
G1 concentrates its collection and compaction activity on the areas of the heap that are likely to be full of reclaimable objects, that is, garbage. (emphasis added)
... as opposed to just treating every block of memory as equally-good for GC, even if 99% of it contains live objects.
I have an Akka HTTP based service written in Scala. This service works as a proxy for an API call. It creates a host connection pool for calling the API using
https://doc.akka.io/docs/akka-http/current/client-side/host-level.html
The service is integrated with NewRelic and has the attached snapshots
I would like to understand the reasons for this kind of zig-zag pattern even when there is no traffic on the service and the connections in the host pool get terminated because of the idle timeout.
Moreover, I would also like to know: will a full GC only occur after the heap reaches a threshold, say 7 GB, or can it also occur at some other time when there is no traffic?
The service has an -Xmx of 8 GB. Moreover, there are also multiple dispatchers (fork-join-executor) which perform multiple tasks.
First, your graphs show a very healthy application. This "chainsaw" pattern is overall seen as a very good thing, without much to worry about.
When exactly a Full GC is going to happen is a bit hard to predict (I would even use the word impossible). When your "live" objects have nowhere to move (because there is simply no space for that), a Full GC may be triggered. There are certain thresholds at which a concurrent phase (marking) is going to be initiated, but whether that results in a Full GC or not is decided later.
Considering that G1 also resizes regions (making them fewer or more numerous) based on heuristics, and the fact that it can also shrink or grow your heap (up to -Xmx), the exact conditions under which a Full GC might happen are not easy to predict (I guess some GC experts who know the exact internal details might be able to do that). Also, G1GC can do partial collections: it collects the young regions plus some of the old regions (not all of them), which still makes it far better than a Full GC time-wise.
Unfortunately, your point about no traffic is correct. When there is very limited traffic, you might not get a Full GC, but as soon as traffic comes in, such a thing might happen: old regions might slowly build up during your "limited traffic" period, and as soon as you have a spike - surprise. There are ways to trigger a Full GC on demand, and though I have heard of applications that do this, I have not worked with one in practice.
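For reference, the usual ways to request a collection on demand are calling System.gc() from code (only a request, which the JVM is free to ignore) or, from outside the process, HotSpot's jcmd diagnostic tool (replace <pid> with the JVM's process id):

jcmd <pid> GC.run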
In general with a GC that's not reference-counting, you'll see that zig-zag pattern because memory is only reclaimed when a GC runs.
G1 normally only collects areas of the heap where it expects to find a lot of garbage relative to live objects. "Garbage collection" is a bit of a misnomer: it actually involves collecting the live objects and (in the case of a relocating garbage collector like G1) moving them to a different area of the heap, which allows the collected area to be declared ready for new allocations. Therefore, the fewer live objects it needs to handle, the less work it needs to do relative to the memory freed up.
At a high level, G1 works by defining an Eden (a young generation) where newly created objects are allocated, and it divides Eden into multiple regions, with each thread being mapped to a region. When a region fills up, only that region is collected, with the survivors being moved into a survivor generation (this is a simplification). This continues until the survivor generation is full, at which point the survivor and Eden generations are collected, with the surviving survivors being promoted to the old generation, and when the old generation fills up, you have a full GC.
So there isn't necessarily a fixed threshold at which a full GC will get triggered, but in general the more heap gets used up, the more likely it becomes that a full GC will run. Beyond that, garbage collectors on the JVM tend to be more or less autonomous: System.gc is only a request, and many will ignore it and/or other attempts to trigger a GC.
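With HotSpot specifically, explicit collection requests can be switched off wholesale, making System.gc() a no-op, via the standard flag:

java -XX:+DisableExplicitGC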
Conceivably with G1, if you allocated a multi-GiB array at startup, threw away the reference, and then after every period of idleness reallocated an array of the same size as the one you allocated at startup and then threw away the reference, you'd have a decent chance of triggering a full GC (a sketch follows the list below). This is because that array is big enough to bypass eden and go straight to the old generation, where it will consume heap until the next full GC. Eventually there won't be enough contiguous free space in the old generation to allocate these arrays, and that will trigger a full GC. The only complications to this approach are that:
You'll eventually have to outsmart the JIT optimizer, which will see that you're allocating this array and throwing it away, and decide that it doesn't actually have to allocate the array.
If you have a long enough busy time that a full GC ran since the last allocate-and-throw-away, there's no guarantee that the allocation of the large array will succeed after a full GC, which will cause an OOM.
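A minimal sketch of that idea (everything here is an assumption, not guaranteed behavior on any particular JVM: the volatile sink is merely an attempt to keep the JIT from eliding the allocation, and since a Java byte array caps out just under 2 GiB, 1 GiB is used here):

// Hypothetical sketch: try to provoke a full GC during idle periods by
// allocating a humongous array that bypasses eden and lands in the old
// generation. Not guaranteed to force a full GC on any particular JVM.
class IdleGcNudger {
    static volatile byte[] sink;  // volatile to discourage dead-code elimination

    static void nudge() {
        sink = new byte[1 << 30]; // 1 GiB allocation, straight to the old gen
        sink = null;              // immediately throw the reference away
    }
}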
In reading about ZGC I notice that it brags about being a 'single-generation garbage collector,' but I rarely see any details on exactly what this term means.
Normal Generational GC
I'm familiar with Eden, the survivor space, the nursery, hospice care, metaspace, permgen, zombified objects, Noah's Ark and the old age home, so I don't need an explanation of how concurrent mark sweep (CMS GC) or the garbage first (G1 GC) algorithms work. My understanding is those are both multi-generational, and I'm good with that.
Personally, I always liked being able to go through Java Mission Control and see how many generations of GC cycles an object has lived through. That was always helpful in troubleshooting a memory leak or GC problem.
Single vs Multi-Generational GC
So what exactly is a 'single-generation' garbage collector, and how does that differ from how objects are currently tracked through multiple garbage collection cycles in CMS and G1?
tl;dr
ZGC has much to brag about, but being single-generation is not one of them.
Adding multi-generational support is planned as a future improvement.
Details
Some garbage collector implementations distinguish between new objects and old objects, the “young generation” versus “old generation”. The goal is to make it cheaper for the programmer to create short-lived objects as they will be more quickly and efficiently disposed of to free up memory.
ZGC does not “brag” about being single-generational. Quite the opposite, the team has indicated their desire to add generational features in future versions of ZGC. See a discussion in this section of a talk by Per Liden of Oracle (wait a few seconds for new slide “Generational ZGC”): https://youtu.be/88E86quLmQA
For more info on generations, see this Question: Java heap terminology: young, old and permanent generations?
You do know the details, but do you know why generational garbage collectors were invented? Specifically, why G1GC is generational?
The idea at the time was that scanning the entire heap was expensive, where expensive meant stop-the-world (STW) pauses. G1, as a matter of fact, started its life as non-generational (or single-generation, as you call it), but the size of the remembered sets and the time needed to go over them pushed it to become generational. The fact that there are "generations", plus the fact that G1 scans all the young regions in a single cycle (or, better said, if it "committed" to scanning some X number of young regions at the beginning of a cycle, it must scan them all), means that the remembered sets do not need to hold connections between regions in the young space (and the card table is a lot smaller).
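As a rough illustration of the card table mentioned above (a deliberately simplified sketch, not HotSpot's actual write-barrier code): the heap is divided into fixed-size "cards", and a write barrier marks the card containing every updated reference field as dirty, so a young collection only needs to re-scan the dirty old-generation cards rather than the whole old generation.

// Simplified card-marking sketch (not HotSpot's real implementation).
class CardTable {
    static final int CARD_SHIFT = 9;   // 512-byte cards
    final byte[] cards;                // one dirty byte per card

    CardTable(long heapSizeBytes) {
        cards = new byte[(int) (heapSizeBytes >>> CARD_SHIFT)];
    }

    // Called by the write barrier after every reference store;
    // heapOffset is the updated field's offset from the heap base.
    void markDirty(long heapOffset) {
        cards[(int) (heapOffset >>> CARD_SHIFT)] = 1;
    }
}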
As to your direct question: neither ZGC nor Shenandoah "tracks" how many cycles an Object has survived; they do not need to, as they have no generations, so there is no need to move Objects somewhere in the "old" space to be scanned "later". They scan the entire heap on each cycle. They do this concurrently (in a very interesting way), so there is simply no need, at the moment, for them to be generational.
Suppose we are using the mark-sweep garbage collection algorithm. If we are able to flag memory allocations as "reclaimable", isn't that enough? Wouldn't the program know that "reclaimable" memory is basically unused memory that can be allocated when requested? What are the physical differences between a "reclaimable" chunk and an "unused" chunk, as shown in this picture:
Marking - During the mark phase all objects that are reachable from Java threads, native handles and other root sources are marked as alive, as well as the objects that are reachable from these objects and so forth. This process identifies and marks all objects that are still used, and the rest can be considered garbage.
Sweeping - During the sweep phase the heap is traversed to find the gaps between the live objects. These gaps are recorded in a free list and are made available for new object allocation.
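Put as code, the two quoted phases look roughly like this (a toy sketch over hypothetical ObjectGraph and FreeList types, not a real collector):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Toy mark-sweep sketch (hypothetical types, not a real collector).
class MarkSweep {

    // Mark phase: everything transitively reachable from the roots is alive.
    static Set<Object> mark(Iterable<Object> roots, ObjectGraph graph) {
        Set<Object> alive = new HashSet<>();
        Deque<Object> pending = new ArrayDeque<>();
        roots.forEach(pending::push);
        while (!pending.isEmpty()) {
            Object obj = pending.pop();
            if (alive.add(obj)) {                       // not seen before
                graph.referencesOf(obj).forEach(pending::push);
            }
        }
        return alive;
    }

    // Sweep phase: any gap left by an unmarked object goes on the free list.
    static void sweep(ObjectGraph graph, Set<Object> alive, FreeList freeList) {
        for (Object obj : graph.allObjects()) {
            if (!alive.contains(obj)) {
                freeList.add(graph.addressOf(obj), graph.sizeOf(obj));
            }
        }
    }

    interface ObjectGraph {
        Iterable<Object> allObjects();
        Iterable<Object> referencesOf(Object obj);
        long addressOf(Object obj);
        long sizeOf(Object obj);
    }

    interface FreeList {
        void add(long address, long size);
    }
}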
Unused vs Reclaimable space -
Unused space is nothing but the gaps between alive blocks; these gaps are created by the garbage collection of unused/reclaimable objects. The compaction stage moves the unused space to the end by sliding the alive blocks together. Compare the two diagrams in the posted image.
The sweep step is just "another" way of delaying the stop-the-world event. Of course, in theory this is a good thing: while sweeping, you keep track of the "free" space where dead objects have been identified, so that you can use the information in these free lists the next time you allocate.
The thing is that the heap is not "compact" when you use sweep only; as such, when there is no contiguous free space for an allocation (based on the free-list information), a compaction phase will still be performed.
My last point is that CMS is deprecated and no one supports it, so understanding what it does might be interesting, but not really valuable, as other GCs do things differently.
I read that garbage collection can lead to memory fragmentation problems at run-time. To solve this problem, compacting is done by the JVM, where it takes all the active objects and assigns them contiguous memory.
This means that the object addresses must change from time to time? Also, if this happens,
Are the references to these objects also re-assigned?
Won't this cause significant performance issues? How does Java cope with it?
I read that garbage collection can lead to memory fragmentation problems at run-time.
This is not a problem exclusive to garbage-collected heaps. When you have a manually managed heap and free memory in a different order than the preceding allocations, you may get a fragmented heap as well. And being able to have lifetimes other than the last-in-first-out order of automatic storage, aka stack memory, is one of the main motivations for using heap memory.
To solve this problem, compacting is done by the JVM where it takes all the active objects and assigns them contiguous memory.
Not necessarily all objects. Typical implementation strategies divide the memory into logical regions and only move objects from a specific region to another, rather than all existing objects at a time. These strategies may incorporate the age of the objects, like generational collectors moving objects of the young generation from the Eden space to a Survivor space, or the distribution of the remaining objects, like the “Garbage First” collector which will, as the name suggests, evacuate the fragment with the highest garbage ratio first, which implies the least work to get a free contiguous memory block.
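For reference, the region-based behaviour described above can be tuned with standard HotSpot flags, for example (the values shown are only examples; the region size is normally chosen automatically):

java -XX:+UseG1GC -XX:G1HeapRegionSize=4m -XX:MaxGCPauseMillis=200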
This means that the object addresses must change from time to time?
Of course, yes.
Also, if this happens,
Are the references to these objects also re-assigned?
The specification does not mandate how object references are implemented. An indirect pointer may eliminate the need to adapt all references, see also this Q&A. However, for JVMs using direct pointers, this does indeed imply that these pointers need to get adapted.
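To make the indirection idea concrete, here is a deliberately small sketch (the Handle type is hypothetical; JVMs that use handles do this at a much lower level): every access goes through a handle slot, so when the collector moves an object it rewrites only that one slot, and all references through the handle remain valid.

// Tiny sketch of handle-based indirection (hypothetical, not a real JVM).
class Handle<T> {
    T target;                         // the single slot holding the object's location

    T get() {
        return target;                // every access pays one extra hop
    }
}

class Collector {
    static <T> void relocate(Handle<T> handle, T copiedObject) {
        handle.target = copiedObject; // one write fixes up every reference
    }
}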
Won't this cause significant performance issues? How does Java cope with it?
First, we have to consider what we gain from that. To “eliminate fragmentation” is not an end in itself. If we don’t do it, we have to scan the reachable objects for the gaps between them and create a data structure maintaining this information, which we would then call “free memory”. We would also need to implement memory allocation as a search for matching chunks in this data structure, or to split chunks if no exact match is found. This is a rather expensive operation compared to allocation from a contiguous free memory block, where we only have to bump the pointer to the next free byte by the required size.
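The difference in allocation cost is easy to see in code (a schematic sketch, not real allocator code): bump-pointer allocation in a compacted heap is a bounds check plus an addition, while allocation in a merely swept heap is a search through the free list.

import java.util.Iterator;
import java.util.List;

// Schematic sketch (not a real allocator): bump-pointer allocation in a
// compacted heap versus first-fit search in a swept, fragmented heap.
class Allocators {
    long top;                    // next free byte of the contiguous block
    final long end;              // end of the contiguous block
    final List<long[]> freeList; // {address, size} chunks of a swept heap

    Allocators(long top, long end, List<long[]> freeList) {
        this.top = top;
        this.end = end;
        this.freeList = freeList;
    }

    // Compacted heap: allocation is just "bump the pointer".
    long bumpAllocate(long size) {
        if (top + size > end) return -1; // out of space: trigger a GC instead
        long address = top;
        top += size;
        return address;
    }

    // Swept heap: search the free list for the first fitting chunk.
    long freeListAllocate(long size) {
        Iterator<long[]> it = freeList.iterator();
        while (it.hasNext()) {
            long[] chunk = it.next();    // chunk = {address, size}
            if (chunk[1] >= size) {
                long address = chunk[0];
                chunk[0] += size;        // split the chunk
                chunk[1] -= size;
                if (chunk[1] == 0) it.remove();
                return address;
            }
        }
        return -1;                       // no fit: GC or grow the heap
    }
}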
Given that allocations happen much more often than garbage collection, which only runs when the memory is full (or a threshold has been crossed), this already justifies the more expensive copy operations. It also implies that just using a larger heap can solve performance issues, as it reduces the number of required garbage collector runs, whereas the number of surviving objects does not scale with the memory (unreachable objects stay unreachable, regardless of how long you defer the collection). In fact, deferring the collection raises the chances that more objects have become unreachable in the meantime. Compare also with this answer.
The costs of adapting references are not much higher than the costs of traversing references in the marking phase. In fact, non-concurrent collectors could even combine these two steps, transferring an object on first encounter and adapting subsequently encountered references, instead of marking the object. The actual copying is the more expensive aspect, but as explained above, it is reduced by not copying all objects but using certain strategies based on typical application behavior, like generational approaches or the “garbage first” strategy, to minimize the required work.
If you move an object around the memory, its address will change. Therefore, references pointing to it will need to be updated. Memory fragmentation occurs when an object in a contiguous (in memory) sequence of objects gets deleted. This creates a hole in the memory space, which is generally bad because contiguous chunks of memory have faster access times and a higher probability of fitting in cache lines, among other things. It should be noted that the use of indirection tables can prevent reference updates up to the maximum level of indirection used.
Garbage collection has a moderate performance overhead, not just in Java but in other languages as well, such as C# for example. As how Java copes with this, the strategies for performing garbage collection and how to minimize its impact on performance depends on the particular JVM being used, since each JVM can implement garbage collection however it pleases; the only requirement is that it meets the JVM specification.
However, as a programmer, there are some best practices you should follow to make the best out of garbage collection and to minimize its performance hit on your application. See this, also this, this, this blog post, and this other blog post. You might want to check the JVM specs but it's a bit dense.
I've often read that in the Sun JVM short-lived objects ("relatively new objects") can be garbage collected more efficiently than long-lived objects ("relatively old objects").
Why is that so?
Is that specific to the Sun JVM or does this result from a general garbage collection principle?
Most Java apps create Java objects and then discard them rather quickly, e.g. you create some objects in a method, then once you exit the method all those objects die. Most apps behave this way and most people tend to code their apps this way. The Java heap is roughly broken up into three parts: the permanent, old (long-lived) generation, and young (short-lived) generation. The young gen is further broken up into S1, S2 and eden. These are just heaps.
Most objects are created in the young gen. The idea here is that, since the mortality rate of objects is high, we quickly create them, use them and then discard them. Speed is of the essence. As you create objects, the young gen fills up, until a minor GC occurs. In a minor GC, all objects that are alive are copied over from eden and, say, S2 to S1. Then the 'pointer' is reset for eden and S2.
Every copy ages the object. By default, if an object survives 32 copies, viz. 32 minor GCs, the GC figures that it is going to be around for a lot longer. So what it does is tenure it, by moving it to the old generation. The old gen is just one big space. When the old gen fills up, a full GC, or major GC, happens in the old gen. Because there is no other space to copy to, the GC has to compact. This is a lot slower than a minor GC, which is why we avoid doing it more frequently.
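In code form, the aging logic amounts to something like this (an illustrative sketch with a made-up ObjectHeader; HotSpot keeps the age in a few bits of the real object header):

// Illustrative tenuring sketch (made-up header, not HotSpot's layout).
class ObjectHeader {
    int age; // number of minor GCs this object has survived so far
}

class Tenuring {
    final int maxTenuringThreshold; // set via -XX:MaxTenuringThreshold, see below

    Tenuring(int maxTenuringThreshold) {
        this.maxTenuringThreshold = maxTenuringThreshold;
    }

    // Called for each object copied between survivor spaces in a minor GC;
    // returns true when the object should be promoted to the old generation.
    boolean survivedMinorGc(ObjectHeader header) {
        header.age++;                              // every copy ages the object
        return header.age >= maxTenuringThreshold;
    }
}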
You can tune the tenuring parameter with
java -XX:MaxTenuringThreshold=16
if you know that you have lots of long-lived objects. You can print the age distribution of your app's objects with
java -XX:+PrintTenuringDistribution
(See the above explanations for more general GC; this answers WHY new objects are cheaper to GC than old ones.)
The reason eden can be cleared faster is simple: the algorithm's cost is proportional to the number of objects that will survive GC in the eden space, not to the number of live objects in the whole heap. I.e. if you have an average object death rate of 99% in eden (i.e. 99% of objects do not survive GC, which is not abnormal), you only need to look at and copy that 1%. For "old" GC, all live objects in the full heap need to be marked/swept. That is significantly more expensive.
This is generational garbage collection. It's used pretty widely these days. See more here: (wiki).
Essentially, the GC assumes that new objects are more likely to become unreachable than older ones.
There is this phenomenon that "most objects die young". Many objects are created inside a method and never stored in a field. Therefore, as soon as the method exits, these objects "die" and thus are candidates for collection at the next collection cycle.
Here is an example:
public String concatenate(int[] arr) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < arr.length; ++i)
        sb.append(i > 0 ? "," : "").append(arr[i]);
    return sb.toString();
}
The sb object will become garbage as soon as the method returns.
By splitting the object space into two (or more) age-based areas, the GC can be more efficient: instead of frequently scanning the entire heap, the GC frequently scans only the nursery (the young objects area) - which, obviously, takes much less time than a full heap scan. The older objects area is scanned less frequently.
Young objects are managed more efficiently (not only collected; accesses to young objects are also faster) because they are allocated in a special area (the "young generation"). That special area is more efficient because it is collected "in one go" (with all threads stopped) and neither the collector nor the applicative code has to deal with concurrent access from the other.
The trade-off, here, is that the "world" is stopped when the "efficient area" is collected. This may induce a noticeable pause. The JVM keeps pause times low by keeping the efficient area small enough. In other words, if there is an efficiently-managed area, then that area must be small.
A very common heuristic, applicable to many programs and programming languages, is that many objects are very short-lived, and most of the write accesses occur in young objects (those which were created recently). It is possible to write application code which does not work that way, but this heuristic will be "mostly true" on "most applications". Thus, it makes sense to store young objects in the efficiently-managed area. Which is what the JVM GC does, and which is why that efficient area is called the "young generation".
Note that there are systems where the whole memory is handled "efficiently". When the GC must run, the application becomes "frozen" for a few seconds. This is harmless for long-run computations, but detrimental to interactivity, which is why most modern GC-enabled programming environments use generational GC with a limited-size young generation.
This is based on the observation that the life-expectancy of an object goes up as it ages. So it makes sense to move objects to a less-frequently collected pool once they reach a certain age.
This isn't a fundamental property of the way programs use memory. You could write a pathological program that kept all objects around for a long time (and the same length of time for all objects), but this tends not to happen by accident.
The JVM (usually) uses a generational garbage collector. This kind of collector separates the heap memory into several pools, according to the age of the objects in there. The reasoning here is based on the observation that most objects are short-lived, so that if you do a garbage collection on an area of memory with "young" objects, you can reclaim relatively more memory than if you do garbage collection across "older" objects.
In the Hotspot JVM, new objects get allocated in the so-called Eden area. When this area fills up, the JVM will sweep the Eden area (which does not take too much time, because it is not so big). Objects that are still alive are moved to the Survivor area, and the rest are discarded, freeing up Eden for the next generation. Only when the Eden collection is not sufficient does the garbage collector move on to the older generations (which takes more work).
All GCs behave that way. The basic idea is that you try to reduce the number of objects that you need to check every time you run the GC, because this is a pretty expensive operation. So if you have millions of objects but just need to check a few, that's way better than having to check all of them. Also, a feature of GC plays into your hands: temporary objects (which can't be reached by anyone anymore) have no cost during the GC run (well, let's ignore the finalize() method for now). Only objects which survive cost CPU time. Next, there is the observation that many objects are short-lived.
Therefore, objects are created in a small space (called "Eden" or "young gen"). After a while, all objects that can still be reached are copied (= expensive) out of this space and the space is then declared empty (so Java effectively forgets about all unreachable objects, which cost nothing since they don't have to be copied). Over time, long-lived objects are moved to "older" spaces, and the older spaces are swept less often to reduce the GC overhead (for example, every N runs, the GC will collect an old space instead of the eden space).
Just to compare: if you allocate an object in C/C++, you need to call free() or delete (plus the destructor) for each of them. This is one reason why GC can be faster than traditional, manual memory management.
Of course, this is a rather simplified look. Today, working on GC is at the level of compiler design (i.e. done by very few people). GCs pull all kinds of tricks to make the whole process efficient and unnoticeable. See the Wikipedia article for some pointers.