Suppose we are using the mark-sweep garbage collection algorithm. If we are able to flag memory allocations as "reclaimable", isn't that enough? Wouldn't the program know that "reclaimable" memory is basically unused memory that can be allocated when requested? What is the physical difference between a "reclaimable" chunk and an "unused" chunk, as shown in this picture:
Marking - During the mark phase all objects that are reachable from Java threads, native handles and other root sources are marked as alive, as well as the objects that are reachable from these objects and so forth. This process identifies and marks all objects that are still used, and the rest can be considered garbage.
Sweeping - During the sweep phase the heap is traversed to find the gaps between the live objects. These gaps are recorded in a free list and are made available for new object allocation.
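To make the two phases concrete, here is a minimal sketch in Java of a mark-sweep pass over a toy heap. HeapObject, ToyHeap and their fields are invented for illustration only; this is not how HotSpot actually represents the heap.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model: each object knows which objects it references and carries a mark bit.
class HeapObject {
    boolean marked;
    List<HeapObject> references = new ArrayList<>();
}

class ToyHeap {
    List<HeapObject> allObjects = new ArrayList<>();   // every allocated object, in address order
    List<HeapObject> freeList = new ArrayList<>();     // gaps found by the sweep phase

    // Mark phase: everything reachable from the roots (threads, native handles, ...) is alive.
    void mark(List<HeapObject> roots) {
        Deque<HeapObject> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            HeapObject obj = pending.pop();
            if (!obj.marked) {
                obj.marked = true;
                pending.addAll(obj.references);        // follow outgoing references transitively
            }
        }
    }

    // Sweep phase: walk the whole heap; unmarked objects become free space for new allocations.
    void sweep() {
        for (HeapObject obj : allObjects) {
            if (obj.marked) {
                obj.marked = false;                    // reset for the next cycle
            } else {
                freeList.add(obj);                     // record the gap in the free list
            }
        }
        allObjects.removeAll(freeList);
    }
}

The key point the sketch illustrates: marking only visits live objects, while sweeping has to walk the entire heap.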
Unused vs Reclaimable space -
Unused space is nothing but the gaps between alive blocks. These gaps are created by garbage-collecting unused/reclaimable objects; the compaction stage then moves the unused space to the end of the heap. Compare the two diagrams in the posted image.
The sweep step is just "another" way of delaying the stop-the-world event, which in theory is a good thing. While marking, the collector also keeps track of the "free" space where dead objects have been identified, and that information (the free lists) can be used the next time an allocation is made.
The thing is that the heap is not "compact" when you sweep only, so when there is no contiguous free space large enough for an allocation (based on the free-list information), a compaction phase will still be performed.
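In the same toy spirit (ToyAllocator and the gap sizes are invented for illustration, not taken from HotSpot), an allocator would first look for a single contiguous gap on the free list and only fall back to compaction when fragmentation leaves no gap large enough:

import java.util.List;

// Hypothetical allocator over a fragmented heap: each entry is the size of one free gap.
class ToyAllocator {
    long allocate(List<Long> freeGapSizes, long requestedSize) {
        for (int i = 0; i < freeGapSizes.size(); i++) {
            if (freeGapSizes.get(i) >= requestedSize) {       // first fit from the free list
                freeGapSizes.set(i, freeGapSizes.get(i) - requestedSize);
                return requestedSize;
            }
        }
        // No single gap is large enough: the heap is fragmented, so compact it,
        // which merges all gaps into one contiguous block at the end of the heap.
        long total = freeGapSizes.stream().mapToLong(Long::longValue).sum();
        freeGapSizes.clear();
        freeGapSizes.add(total);
        return total >= requestedSize ? requestedSize : -1;   // -1 stands in for OutOfMemoryError
    }
}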
My last point is that CMS is deprecated and no longer supported, so understanding what it does might be interesting but is not really valuable, as other GCs do things differently.
I have an Akka HTTP-based service written in Scala. This service works as a proxy for an API call. It creates a host connection pool for calling the API using
https://doc.akka.io/docs/akka-http/current/client-side/host-level.html
The service is integrated with New Relic, and the attached snapshots are taken from it.
I would like to understand the reasons for this kind of zig-zag pattern even when there is no traffic on the service and the connections in the host pool get terminated because of the idle timeout.
Moreover, I would also like to know: does a full GC only occur after the heap reaches a threshold, say 7 GB, or can it also occur at some other time when there is no traffic?
The service has an -Xmx of 8 GB. Moreover, there are also multiple dispatchers (fork-join-executor) which perform multiple tasks.
First, your graphs show a very healthy application. This "chainsaw" pattern is overall seen as a very good thing, without much to worry about.
When exactly a Full GC is going to happen is a bit hard to predict (I would even say impossible). When your "live" objects have nowhere to move (because there is simply no space for that), a Full GC may be triggered. There are certain thresholds for when a concurrent phase (marking) is going to be initiated, but whether that results in a Full GC or not is decided later.
Considering that G1 also re-sizes regions (makes them fewer or more numerous) based on heuristics, and the fact that it can also shrink or grow your heap (up to -Xmx), the exact conditions under which a Full GC might happen are not easy to predict (I guess some GC experts who know the exact internal details might be able to do that). Also, G1GC can do partial collections, where it collects the young regions plus some (not all) of the old regions, which is still far better than a Full GC time-wise.
Unfortunately, your point about no traffic is correct. When there is very limited traffic, you might not get a Full GC, but as soon as traffic comes in, such a thing might happen. Old regions might slowly build up during your "limited traffic" period, and as soon as you have a spike - surprise. There are ways to trigger a Full GC on demand, and though I have heard of applications that do this, I have not worked with one in practice.
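For completeness, one way to ask for a collection from inside the application is System.gc(); it is only a hint that the JVM may ignore (and -XX:+DisableExplicitGC turns it into a no-op). From outside the process, jcmd <pid> GC.run issues the same request. A trivial sketch:

public class ForceGcExample {
    public static void main(String[] args) {
        byte[] temporary = new byte[64 * 1024 * 1024];   // allocate some garbage (64 MB)
        temporary = null;                                 // drop the only reference
        System.gc();                                      // request (not force) a collection
        System.out.println("GC requested; check the GC log to see whether one actually ran");
    }
}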
In general with a GC that's not reference-counting, you'll see that zig-zag pattern because memory is only reclaimed when a GC runs.
G1 normally only collects areas of the heap where it expects to find a lot of garbage relative to live objects. "Garbage collection" is a bit of a misnomer: it actually involves collecting the live objects and, in the case of a relocating collector like G1, moving them to a different area of the heap, which allows the area it collected from to be declared ready for new allocations. Therefore, the fewer live objects it needs to handle, the less work it needs to do relative to the memory freed up.
At a high level, G1 works by defining an Eden (a young generation) where newly created objects are allocated; it divides Eden into multiple regions, with each thread being mapped to a region. When a region fills up, only that region is collected, with the survivors being moved into an older generation (this is a simplification). This continues until the survivor generation is full, at which point the survivor and Eden generations are collected, with the surviving survivors being promoted to the old generation; when the old generation fills up, you have a full GC.
So there isn't necessarily a fixed threshold where a full GC will get triggered, but in general the more heap gets used up, the more likely it becomes that a full GC will run. Beyond that, garbage collectors on the JVM tend to be more or less autonomous: most will ignore System.gc and/or other attempts to trigger a GC.
Conceivably with G1, if you allocated a multi-GiB array at startup, threw away the reference, and then after every period of idleness reallocated an array of the same size and threw away the reference again, you'd have a decent chance of triggering a full GC. This is because such an array is big enough to bypass Eden and go straight to the old generation, where it will consume heap until the next full GC. Eventually there won't be enough contiguous free space in the old generation to allocate these arrays, and that will trigger a full GC. The only complications to this approach (sketched in code after the list below) are that:
You'll eventually have to outsmart the JIT optimizer, which will see that you're allocating this array and throwing it away and decide that it doesn't actually have to allocate the array
If you have a long enough busy time that a full GC ran since the last allocate-and-throw-away, there's no guarantee that the allocation of the large array will succeed after a full GC, which will cause an OOM.
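A minimal sketch of that trick, assuming G1 and a large enough -Xmx; the 2 GiB size and the sleep-based "idle period" are invented for illustration:

// Hypothetical sketch of the allocate-and-discard trick described above.
public class IdleGcNudge {
    public static void main(String[] args) throws InterruptedException {
        allocateAndDiscard();                        // once at startup
        while (true) {
            Thread.sleep(60_000);                    // stand-in for "a period of idleness"
            allocateAndDiscard();                    // may eventually provoke a full GC
        }
    }

    private static void allocateAndDiscard() {
        // A multi-GiB array is a humongous allocation in G1 and lands in the old generation.
        long[] big = new long[256 * 1024 * 1024];    // 256M longs = 2 GiB
        big[0] = 1;                                  // touch it; as noted above, the JIT may still elide the allocation
        // The reference goes out of scope here; the array becomes garbage in the old generation.
    }
}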
In reading about ZGC I notice that it brags about being a 'single-generation garbage collector,' but I rarely see any details on exactly what this term means.
Normal Generational GC
I'm familiar with Eden, the survivor space, the nursery, hospice care, metaspace, permgen, zombified objects, Noah's Ark and the old age home, so I don't need an explanation of how concurrent mark sweep (CMS GC) or the garbage first (G1 GC) algorithms work. My understanding is those are both multi-generational, and I'm good with that.
Personally, I always liked being able to go through Java Mission Control and see how many generations of GC cycles an object has lived through. That was always helpful in troubleshooting a memory leak or GC problem.
Single vs Multi-Generational GC
So what exactly is a 'single-generation' garbage collector, and how does that differ from how objects are currently tracked through multiple garbage collection cycles in CMS and G1?
tl;dr
ZGC has much to brag about, but being single-generation is not one of those things.
Adding multi-generational support is planned as a future improvement.
Details
Some garbage collector implementations distinguish between new objects and old objects, the “young generation” versus “old generation”. The goal is to make it cheaper for the programmer to create short-lived objects as they will be more quickly and efficiently disposed of to free up memory.
ZGC does not “brag” about being single-generational. Quite the opposite, the team has indicated their desire to add generational features in future versions of ZGC. See a discussion in this section of a talk by Per Liden of Oracle (wait a few seconds for new slide “Generational ZGC”): https://youtu.be/88E86quLmQA
For more info on generations, see this Question: Java heap terminology: young, old and permanent generations?
You do know the details, but do you know why generational garbage collectors were invented? Specifically, why G1GC is generational?
The idea at the time was that scanning the entire heap was expensive, where expensive meant a long stop-the-world pause. G1, as a matter of fact, started its life as a non-generational (or single-generation, as you call it) collector, but the size of the remembered sets, and the time it took to go over them, pushed it to become generational. The fact that there are "generations", plus the fact that G1 scans all the young regions in a single cycle (or, better said, if it "committed" to scanning some X number of young regions at the beginning of a cycle, it must scan them all), means that the remembered sets do not need to track connections between regions in the young space (and the card table is a lot smaller).
As to your direct question, ZGC (as well as Shenandoah) does not "track" how many cycles an object has survived. It does not need to, as it has no generations, so there is no need to move objects somewhere in the "old" space to be scanned "later". These collectors scan the entire heap on each cycle, and they do this concurrently (in a very interesting way), so there is simply no need, at the moment, for them to be generational.
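If you want to experiment with this yourself, ZGC is enabled with a single flag on recent JDKs (it is production-ready since JDK 15, so no -XX:+UnlockExperimentalVMOptions is needed there); MyApp is just a placeholder for your main class:

java -XX:+UseZGC MyApp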
At least for the old GCs, it holds true that GC times grow as the heap grows. (I know there are new ones like ZGC and Shenandoah that aim to eliminate that.)
As far as I know, the GC keeps track of living objects, so shouldn't GC times be mostly affected by the number of objects (living or needing to be cleared)?
EDIT:
I meant that the heap grows in terms of capacity: a bigger heap but the same utilization of it by the application.
Didn't you answer your own question?
As far as I know, the GC keeps track of living objects, so shouldn't GC times be mostly affected by the number of objects (living or needing to be cleared)?
The more the heap grows, the more live objects it has, and the slower the GC (I'm sure there are exceptions to this rule, in particular for minor collections, but that's the rough idea). The number of objects to be cleared is irrelevant; what matters most is the total number of live objects. Now if your heap is growing because you're storing long-lived objects, it might be OK as long as you don't keep on adding more and more of them. Eventually, long-lived objects will move through the survivor spaces into the old generation and will only be subject to major collections, not minor ones. As long as the minor GCs always free sufficient memory from the young generation, major GCs won't be triggered on all objects (which include the long-lived ones).
I have also observed a different behaviour with G1. We had a low-latency application (40ms p99), so we attempted to configure G1 to make very short pauses (can't remember how much, maybe 5ms or so). What happened is that G1 was more or less meeting the 5ms target, but it had to run extremely frequently because 5ms was not enough to cope with all the dead objects we had in our heap. Therefore, it's not exactly true to say individual garbage collection runs are going to get slower with increased heap size; however, the average time spent in garbage collection in a given period of time is most likely going to increase.
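For reference, the pause-time target mentioned above is set with -XX:MaxGCPauseMillis; it is a goal, not a guarantee, which is exactly why G1 compensated by running more often. MyApp is a placeholder for your main class:

java -XX:+UseG1GC -XX:MaxGCPauseMillis=5 MyApp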
There are many different algorithms that can be used to implement garbage collection. Not all of them exhibit the behaviour you mention.
In the case of your question, you are referring to algorithms that use a form of mark-sweep. If we take the HotSpot JVM as an example, the old generation can be collected using the CMS collector. This uses a marking phase, where all objects that are accessible from application code are marked. Initially, a root set of directly accessible objects (object references on the stack, registers, etc.) is created. Each object in this set has the mark-bit set in its header to indicate it is still in use. All references from these objects are recursively followed and ultimately every accessible object has the mark-bit set. How long this takes is proportional to the number of live objects, not the size of the heap.
The sweeping phase then has to sweep through the entire heap, looking for objects with the mark-bit set and determining the gaps between them so that they can be added to free lists. These are used to allocate space for objects being promoted from the young generation. Since the whole heap must be swept, the time this takes is proportional to the size of the heap, regardless of how much live data is in the heap.
In the case of G1, the algorithm is similar, but each generation of the heap is divided into regions so that space can be reclaimed in a more efficient way.
While reading Oracle's documentation on the G1 garbage collector, I noted the following:
When performing garbage collections, G1 operates in a manner similar to the CMS collector. G1 performs a concurrent global marking phase to determine the liveness of objects throughout the heap. After the mark phase completes, G1 knows which regions are mostly empty. It collects in these regions first, which usually yields a large amount of free space. This is why this method of garbage collection is called Garbage-First
As mentioned in the above quote, G1 selects the regions that are mostly empty.
My question is: if a region is mostly empty, how would collecting it yield a large amount of free memory? If it is mostly empty, isn't it already part of free memory?
Could anyone here help me clarify this ?
In this sentence:
After the mark phase completes, G1 knows which regions are mostly empty.
"mostly empty" means "contains the most reclaimable garbage". This is clear from the context. The purpose of the mark phase is to determine which objects are definitely or probably reachable. The rest are definitely unreachable, and can be collected.
Collecting regions with the largest amount of reclaimable space is good for two reasons:
You get the most space back soonest.
With a copying collector, there is less work to do if the "from" space mostly contains stuff that you don't need to copy to the "to" space. So you get the most space back efficiently.
In most use-cases, the 2nd reason is more significant. It is rarely important to get space back quickly. You just need the space to be available when the application requests it. (GC pauses are a different matter, but they are caused by other things.)
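The "most garbage-y regions first" policy can be sketched in a few lines of hypothetical Java. Region, its fields and CollectionSetChooser are invented names; G1's real heuristics also weigh the predicted copying cost against the pause-time goal:

import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Toy model of G1's region selection: names and fields are invented for illustration.
class Region {
    long capacityBytes;
    long liveBytes;                                    // measured by the concurrent mark phase

    long reclaimableBytes() {
        return capacityBytes - liveBytes;              // garbage that evacuating this region would free
    }
}

class CollectionSetChooser {
    // Prefer regions with the most reclaimable space: most space back for the least copying.
    List<Region> choose(List<Region> oldRegions, int maxRegionsPerCycle) {
        return oldRegions.stream()
                .sorted(Comparator.comparingLong(Region::reclaimableBytes).reversed())
                .limit(maxRegionsPerCycle)
                .collect(Collectors.toList());
    }
}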
Think of "garbage first" as a shorthand for "most garbage-y first." That is, it defines various blocks of memory and then prioritizes the ones with the most garbage -- thus getting the most bang for its buck.
From the page you cited:
G1 concentrates its collection and compaction activity on the areas of the heap that are likely to be full of reclaimable objects, that is, garbage. (emphasis added)
... as opposed to just treating every block of memory as equally-good for GC, even if 99% of it contains live objects.
I've often read that in the Sun JVM short-lived objects ("relatively new objects") can be garbage collected more efficiently than long-lived objects ("relatively old objects")
Why is that so?
Is that specific to the Sun JVM or does this result from a general garbage collection principle?
Most Java apps create objects and then discard them rather quickly, e.g. you create some objects in a method, and once you exit the method all those objects die. Most apps behave this way and most people tend to code their apps this way. The Java heap is roughly broken up into three parts: the permanent generation, the old (long-lived) generation, and the young (short-lived) generation. The young gen is further broken up into S1, S2 and Eden. These are just heaps.
Most objects are created in the young gen. The idea here is that, since the mortality rate of objects is high, we quickly create them, use them and then discard them. Speed is of the essence. As you create objects, the young gen fills up, until a minor GC occurs. In a minor GC, all objects that are alive are copied over from Eden and, say, S2 to S1. Then the 'pointer' is reset on Eden and S2.
Every copy ages the object. By default, if an object survives 32 copies, i.e. 32 minor GCs, then the GC figures that it is going to be around for a lot longer. So what it does is tenure it, by moving it to the old generation. The old gen is just one big space. When the old gen fills up, a full GC, or major GC, happens in the old gen. Because there is no other space to copy to, the GC has to compact. This is a lot slower than a minor GC, which is why we try to avoid it happening frequently.
You can tune the tenuring parameter with
java -XX:MaxTenuringThreshold=16
if you know that you have lots of long-lived objects. You can print the various age buckets of your app with
java -XX:+PrintTenuringDistribution
(See the above explanations for more general GC background; this answers WHY new objects are cheaper to GC than old ones.)
The reason Eden can be cleared faster is simple: the algorithm is proportional to the number of objects that will survive GC in the Eden space, not proportional to the number of live objects in the whole heap. That is, if you have an average object death rate of 99% in Eden (i.e. 99% of objects do not survive GC, which is not abnormal), you only need to look at and copy that 1%. For "old" GC, all live objects in the full heap need to be marked/swept. That is significantly more expensive.
This is generational garbage collection. It's used pretty widely these days. See more here: (wiki).
Essentially, the GC assumes that new objects are more likely to become unreachable than older ones.
There is this phenomenon that "most objects die young". Many objects are created inside a method and never stored in a field. Therefore, as soon as the method exits, these objects "die" and thus are candidates for collection at the next collection cycle.
Here is an example:
public String concatenate(int[] arr) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < arr.length; ++i)
        sb.append(i > 0 ? "," : "").append(arr[i]);
    return sb.toString();
}
The sb object will become garbage as soon as the method returns.
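For contrast, here is a hypothetical example of an object that does not die young: anything that stays reachable from a static field survives every young collection and will eventually be tenured into the old generation. LongLivedExample and CACHE are invented names for illustration.

import java.util.ArrayList;
import java.util.List;

public class LongLivedExample {
    // Reachable from a GC root (the class), so entries added here survive every young collection
    // and are eventually promoted to the old generation.
    private static final List<String> CACHE = new ArrayList<>();

    public static void remember(String value) {
        CACHE.add(value);   // keeps a reference: this String will not die young
    }
}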
By splitting the object space into two (or more) age-based areas, the GC can be more efficient: instead of frequently scanning the entire heap, the GC frequently scans only the nursery (the young objects area), which, obviously, takes much less time than a full heap scan. The older objects area is scanned less frequently.
Young objects are managed more efficiently (not only collected; accesses to young objects are also faster) because they are allocated in a special area (the "young generation"). That special area is more efficient because it is collected "in one go" (with all threads stopped) and neither the collector nor the application code has to deal with concurrent access from the other.
The trade-off, here, is that the "world" is stopped when the "efficient area" is collected. This may induce a noticeable pause. The JVM keeps pause times low by keeping the efficient area small enough. In other words, if there is an efficiently-managed area, then that area must be small.
A very common heuristic, applicable to many programs and programming languages, is that many objects are very short-lived, and most of the write accesses occur in young objects (those which were created recently). It is possible to write application code which does not work that way, but this heuristic will be "mostly true" for "most applications". Thus, it makes sense to store young objects in the efficiently-managed area, which is what the JVM GC does, and which is why that efficient area is called the "young generation".
Note that there are systems where the whole memory is handled "efficiently". When the GC must run, the application becomes "frozen" for a few seconds. This is harmless for long-run computations, but detrimental to interactivity, which is why most modern GC-enabled programming environments use generational GC with a limited-size young generation.
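On HotSpot, the size of that young area can be bounded explicitly, for example with -Xmn (or indirectly with -XX:NewRatio); the 256m value below is purely illustrative and MyApp is a placeholder:

java -Xmn256m MyApp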
This is based on the observation that the life-expectancy of an object goes up as it ages. So it makes sense to move objects to a less-frequently collected pool once they reach a certain age.
This isn't a fundamental property of the way programs use memory. You could write a pathological program that kept all objects around for a long time (and the same length of time for all objects), but this tends not to happen by accident.
The JVM (usually) uses a generational garbage collector. This kind of collector separates the heap memory into several pools, according to the age of the objects in there. The reasoning here is based on the observation that most objects are short-lived, so that if you do a garbage collection on an area of memory with "young" objects, you can reclaim relatively more memory than if you do garbage collection across "older" objects.
In the HotSpot JVM, new objects get allocated in the so-called Eden area. When this area fills up, the JVM will sweep the Eden area (which does not take too much time, because it is not so big). Objects that are still alive are moved to the Survivor area, and the rest is discarded, freeing up Eden for the next generation. Only when the Eden collection is not sufficient does the garbage collector move on to the older generations (which takes more work).
All GCs behave that way. The basic idea is that you try to reduce the number of objects that you need to check every time you run the GC, because this is a pretty expensive operation. So if you have millions of objects but just need to check a few, that's way better than having to check all of them. Also, a feature of GC plays into your hands: temporary objects (which can't be reached by anyone anymore) have no cost during the GC run (well, let's ignore the finalize() method for now). Only objects which survive cost CPU time. Next, there is the observation that many objects are short-lived.
Therefore, objects are created in a small space (called "Eden" or "young gen"). After a while, all objects that can be reached are copied (= expensive) out of this space and the space is then declared empty (so Java effectively forgets about all unreachable objects, and they have no cost since they don't have to be copied). Over time, long-lived objects are moved to "older" spaces, and the older spaces are swept less often to reduce the GC overhead (for example, every N runs the GC will collect an old space instead of the Eden space).
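You can watch this generational behaviour directly by turning on GC logging (-Xlog:gc on JDK 9 and later, -verbose:gc on older JVMs); minor collections of the young space will show up far more often than full collections. MyApp is a placeholder for your main class:

java -Xlog:gc MyApp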
Just to compare: if you allocate an object in C/C++, you need to call free() (or delete, which runs the destructor) for each of them. This is one reason why GC can be faster than traditional, manual memory management.
Of course, this is a rather simplified look. Today, working on GC is at the level of compiler design (i.e. done by very few people). GCs pull all kinds of tricks to make the whole process efficient and unnoticeable. See the Wikipedia article for some pointers.