Related
At least in old GCs, it holds true. (I know there are new ones like ZGC and Shenandoah that aim to eliminate that)
As far as I know GC keeps tracks of living objects, so shouldn't the GC times be mostly affected by the number of objects (living/needs to be cleared)?
EDIT:
I meant grows in terms of capacity, meaning bigger heap but same utilization of it by the application
Didn't you answer your own question?
As far as I know GC keeps tracks of living objects, so shouldn't the GC times be mostly affected by the number of objects (living/needs to be cleared)?
The more the heap grows, the more live objects it has, the slower the GC (I'm sure there are exceptions to this rule, in particular for minor collections, but that's the rough idea). The number of objects to be cleared is irrelevant, what matters most is the total number of live objects. Now if your heap is growing because you're storing long-lived objects, it might be ok as long as you don't keep on adding more and more of them. Eventually, long-lived objects will move towards the survivor space and will only be subject to major collections and not minor ones. As long as the minor GC always achieve sufficient memory freeing from the young generation, major GC won't be triggered on all objects (which includes long-lived ones).
I have also observed a different behaviour with G1. We had a low-latency application (40ms p99), so we attempted to configure G1 to make very short pauses (can't remember how much, maybe 5ms or so). What happened is that G1 was more or less meeting the 5ms target, but it had to run extremely frequently because 5ms was not enough to cope with all dead objects we had in our heap. Therefore, it's not exactly true to say individual garbage collection runs are going to get slower with increased heap size, however the average time spent in garbage collection in a given period of time is most likely going to increase.
There are many different algorithms that can be used to implement garbage collection. Not all of them exhibit the behaviour you mention.
In the case of your question, you are referring to algorithms that use a form of mark-sweep. If we take the HostSpot JVM as an example, the old generation can be collected using the CMS collector. This uses a marking phase, where all objects that are accessible from application code are marked. Initially, a root set of directly accessible objects (object references on the stack, registers, etc.) is created. Each object in this set has the mark-bit set in its header to indicate it is still in use. All references from these objects are recursively followed and ulitmately every accessible object has the mark-bit set. How long this takes is proportional to the number of live objects, not the size of the heap.
The sweeping phase then has to sweep throught the entire heap, looking for objects with the mark-bit set and determining the gaps between them so that they can be added to free lists. These are used to allocate space for objects being promoted from the young generation. Since the whole heap must be swept, the time this takes is proportional to the size of the heap, regardless of how much live data is in the heap.
In the case of G1, the algorithm is similar but each generation of the heap is divided into regions so that space can be reclaimed in a more efficent way.
I read that garbage collection can lead to memory fragmentation problem at run-time. To solve this problem, compacting is done by the JVM where it takes all the active objects and assigns them contiguous memory.
This means that the object addresses must change from time to time? Also, if this happens,
Are the references to these objects also re-assigned?
Won't this cause significant performance issues? How does Java cope with it?
I read that garbage collection can lead to memory fragmentation problem at run-time.
This is not an exclusive problem of garbage collected heaps. When you have a manually managed heap and free memory in a different order than the preceding allocations, you may get a fragmented heap as well. And being able to have different lifetimes than the last-in-first-out order of automatic storage aka stack memory, is one of the main motivations to use the heap memory.
To solve this problem, compacting is done by the JVM where it takes all the active objects and assigns them contiguous memory.
Not necessarily all objects. Typical implementation strategies will divide the memory into logical regions and only move objects from a specific region to another, but not all existing objects at a time. These strategies may incorporate the age of the objects, like generational collectors moving objects of the young generation from the Eden space to a Survivor space, or the distribution of the remaining objects, like the “Garbage First” collector which will, like the name suggests, evacuate the fragment with the highest garbage ratio first, which implies the smallest work to get a free contiguous memory block.
This means that the object addresses must change from time to time?
Of course, yes.
Also, if this happens,
Are the references to these objects also re-assigned?
The specification does not mandate how object references are implemented. An indirect pointer may eliminate the need to adapt all references, see also this Q&A. However, for JVMs using direct pointers, this does indeed imply that these pointers need to get adapted.
Won't this cause significant performance issues? How does Java cope with it?
First, we have to consider what we gain from that. To “eliminate fragmentation” is not an end in itself. If we don’t do it, we have to scan the reachable objects for gaps between them and create a data structure maintaining this information, which we would call “free memory” then. We also needed to implement memory allocations as a search for matching chunks in this data structure or to split chunks if no exact match has been found. This is a rather expensive operation compared to an allocation from a contiguous free memory block, where we only have to bump the pointer to the next free byte by the required size.
Given that allocations happen much more often than garbage collection, which only runs when the memory is full (or a threshold has been crossed), this does already justify more expensive copy operations. It also implies that just using a larger heap can solve performance issues, as it reduces the number of required garbage collector runs, whereas the number of survivor objects will not scale with the memory (unreachable objects stay unreachable, regardless of how long you defer the collection). In fact, deferring the collection raises the chances that more objects became unreachable in the meanwhile. Compare also with this answer.
The costs of adapting references are not much higher than the costs of traversing references in the marking phase. In fact, non-concurrent collectors could even combine these two steps, transferring an object on first encounter and adapting subsequently encountered references, instead of marking the object. The actual copying is the more expensive aspect, but as explained above, it is reduced by not copying all objects but using certain strategies based on typical application behavior, like generational approaches or the “garbage first” strategy, to minimize the required work.
If you move an object around the memory, its address will change. Therefore, references pointing to it will need to be updated. Memory fragmentation occurs when an object in a contigous (in memory) sequence of objects gets deleted. This creates a hole in the memory space, which is generally bad because contigous chunks of memory have faster access times and a higher probability of fitting in chache lines, among other things. It should be noted that the use of indirection tables can prevent reference updates up to the maximum level of indirection used.
Garbage collection has a moderate performance overhead, not just in Java but in other languages as well, such as C# for example. As how Java copes with this, the strategies for performing garbage collection and how to minimize its impact on performance depends on the particular JVM being used, since each JVM can implement garbage collection however it pleases; the only requirement is that it meets the JVM specification.
However, as a programmer, there are some best practices you should follow to make the best out of garbage collection and to minimze its performance hit on your application. See this, also this, this, this blog post, and this other blog post. You might want to check the JVM specs but it's a bit dense.
im reading android training article: Performance Tips
Object creation is never free. A generational garbage collector with
per-thread allocation pools for temporary objects can make
allocation cheaper, but allocating memory is always more expensive
than not allocating memory.
what's per-thread allocation pools for temporary objects?
I did't find any docs about this.
Read it as : A generational garbage collector with per-thread allocation , pools for temporary objects .
Per thread garbage collection is that Objects associated only with a thread that created them are tracked. At a garbage collection time for a particular thread, it is determined which objects associated only with that thread remain reachable from a restricted root set associated with the thread. Any thread-only objects that are not determined to be reachable are garbage collected.
What they are saying, and they are right, is object creation (and subsequent collection) can be a major time-taker.
If you look at this example you'll see that at one point, memory management dominated the time, and was fixed by keeping used objects of each class in a free-list, so they could be efficiently re-used.
However, also note in that example, memory management was not the biggest problem at first.
It only became the biggest problem after even bigger problems were removed.
For example, suppose you have a team of people who want to lose weight, relative to another team.
Suppose the team has
1) a 400-lb person, (corresponding to some other problem)
2) a 200-lb person (corresponding to the memory management problem), and
3) a 100-lb person (corresponding to some other problem).
If the team as a whole wants to lose the most weight, where should it concentrate first?
Obviously, they need to work on all three, but if they miss out on the big guy, they're not going to get very far.
So the most aggressive procedure is first find out what the biggest problem is (not by guessing), and fix that.
Then the next biggest, and so on.
The big secret is don't guess.
Everybody knows that, but what do they do? - they guess anyway.
Guesses, by definition, are often wrong, missing the biggest issues.
Let the program tell you what the biggest problem is.
(I use random pausing as in that example.)
What are the "best practices" for creating (and releasing) millions of small objects?
I am writing a chess program in Java and the search algorithm generates a single "Move" object for each possible move, and a nominal search can easily generate over a million move objects per second. The JVM GC has been able to handle the load on my development system, but I'm interested in exploring alternative approaches that would:
Minimize the overhead of garbage collection, and
reduce the peak memory footprint for lower-end systems.
A vast majority of the objects are very short-lived, but about 1% of the moves generated are persisted and returned as the persisted value, so any pooling or caching technique would have to provide the ability to exclude specific objects from being re-used.
I don't expect fully-fleshed out example code, but I would appreciate suggestions for further reading/research, or open source examples of a similar nature.
Run the application with verbose garbage collection:
java -verbose:gc
And it will tell you when it collects. There would be two types of sweeps, a fast and a full sweep.
[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]
The arrow is before and after size.
As long as it is just doing GC and not a full GC you are home safe. The regular GC is a copy collector in the 'young generation', so objects that are no longer referenced are simply just forgotten about, which is exactly what you would want.
Reading Java SE 6 HotSpot Virtual Machine Garbage Collection Tuning is probably helpful.
Since version 6, the server mode of JVM employs an escape analysis technique. Using it you can avoid GC all together.
Well, there are several questions in one here !
1 - How are short-lived objects managed ?
As previously stated, the JVM can perfectly deal with a huge amount of short lived object, since it follows the Weak Generational Hypothesis.
Note that we are speaking of objects that reached the main memory (heap). This is not always the case. A lot of objects you create does not even leave a CPU register. For instance, consider this for-loop
for(int i=0, i<max, i++) {
// stuff that implies i
}
Let's not think about loop unrolling (an optimisations that the JVM heavily performs on your code). If max is equal to Integer.MAX_VALUE, you loop might take some time to execute. However, the i variable will never escape the loop-block. Therefore the JVM will put that variable in a CPU register, regularly increment it but will never send it back to the main memory.
So, creating millions of objects are not a big deal if they are used only locally. They will be dead before being stored in Eden, so the GC won't even notice them.
2 - Is it useful to reduce the overhead of the GC ?
As usual, it depends.
First, you should enable GC logging to have a clear view about what is going on. You can enable it with -Xloggc:gc.log -XX:+PrintGCDetails.
If your application is spending a lot of time in a GC cycle, then, yes, tune the GC, otherwise, it might not be really worth it.
For instance, if you have a young GC every 100ms that takes 10ms, you spend 10% of your time in the GC, and you have 10 collections per second (which is huuuuuge). In such a case, I would not spend any time in GC tuning, since those 10 GC/s would still be there.
3 - Some experience
I had a similar problem on an application that was creating a huge amount of a given class. In the GC logs, I noticed that the creation rate of the application was around 3 GB/s, which is way too much (come on... 3 gigabytes of data every second ?!).
The problem : Too many frequent GC caused by too many objects being created.
In my case, I attached a memory profiler and noticed that a class represented a huge percentage of all my objects. I tracked down the instantiations to find out that this class was basically a pair of booleans wrapped in an object. In that case, two solutions were available :
Rework the algorithm so that I do not return a pair of booleans but instead I have two methods that return each boolean separately
Cache the objects, knowing that there were only 4 different instances
I chose the second one, as it had the least impact on the application and was easy to introduce. It took me minutes to put a factory with a not-thread-safe cache (I did not need thread safety since I would eventually have only 4 different instances).
The allocation rate went down to 1 GB/s, and so did the frequency of young GC (divided by 3).
Hope that helps !
If you have just value objects (that is, no references to other objects) and really but I mean really tons and tons of them, you can use direct ByteBuffers with native byte ordering [the latter is important] and you need some few hundred lines of code to allocate/reuse + getter/setters. Getters look similar to long getQuantity(int tupleIndex){return buffer.getLong(tupleInex+QUANTITY_OFFSSET);}
That would solve the GC problem almost entirely as long as you do allocate once only, that is, a huge chunk and then manage the objects yourself. Instead of references you'd have only index (that is, int) into the ByteBuffer that has to be passed along. You may need to do the memory align yourself as well.
The technique would feel like using C and void*, but with some wrapping it's bearable. A performance downside could be bounds checking if the compiler fails to eliminate it. A major upside is the locality if you process the tuples like vectors, the lack of the object header reduces the memory footprint as well.
Other than that, it's likely you'd not need such an approach as the young generation of virtually all JVM dies trivially and the allocation cost is just a pointer bump. Allocation cost can be a bit higher if you use final fields as they require memory fence on some platforms (namely ARM/Power), on x86 it is free, though.
Assuming you find GC is an issue (as others point out it might not be) you will be implementing your own memory management for you special case i.e. a class which suffers massive churn. Give object pooling a go, I've seen cases where it works quite well. Implementing object pools is a well trodden path so no need to re-visit here, look out for:
multi-threading: using thread local pools might work for your case
backing data structure: consider using ArrayDeque as it performs well on remove and has no allocation overhead
limit the size of your pool :)
Measure before/after etc,etc
I've met a similar problem. First of all, try to reduce the size of the small objects. We introduced some default field values referencing them in each object instance.
For example, MouseEvent has a reference to Point class. We cached Points and referenced them instead of creating new instances. The same for, for example, empty strings.
Another source was multiple booleans which were replaced with one int and for each boolean we use just one byte of the int.
I dealt with this scenario with some XML processing code some time ago. I found myself creating millions of XML tag objects which were very small (usually just a string) and extremely short-lived (failure of an XPath check meant no-match so discard).
I did some serious testing and came to the conclusion that I could only achieve about a 7% improvement on speed using a list of discarded tags instead of making new ones. However, once implemented I found that the free queue needed a mechanism added to prune it if it got too big - this completely nullified my optimisation so I switched it to an option.
In summary - probably not worth it - but I'm glad to see you are thinking about it, it shows you care.
Given that you are writing a chess program there are some special techniques you can use for decent performance. One simple approach is to create a large array of longs (or bytes) and treat it as a stack. Each time your move generator creates moves it pushes a couple of numbers onto the stack, e.g. move from square and move to square. As you evaluate the search tree you will be popping off moves and updating a board representation.
If you want expressive power use objects. If you want speed (in this case) go native.
One solution I've used for such search algorithms is to create just one Move object, mutate it with new move, and then undo the move before leaving the scope. You are probably analyzing just one move at a time, and then just storing the best move somewhere.
If that's not feasible for some reason, and you want to decrease peak memory usage, a good article about memory efficiency is here: http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf
Just create your millions of objects and write your code in the proper way: don't keep unnecessary references to these objects. GC will do the dirty job for you. You can play around with verbose GC as mentioned to see if they are really GC'd. Java IS about creating and releasing objects. :)
I think you should read about stack allocation in Java and escape analysis.
Because if you go deeper into this topic you may find that your objects are not even allocated on the heap, and they are not collected by GC the way that objects on the heap are.
There is a wikipedia explanation of escape analysis, with example of how this works in Java:
http://en.wikipedia.org/wiki/Escape_analysis
I am not a big fan of GC, so I always try finding ways around it. In this case I would suggest using Object Pool pattern:
The idea is to avoid creating new objects by store them in a stack so you can reuse it later.
Class MyPool
{
LinkedList<Objects> stack;
Object getObject(); // takes from stack, if it's empty creates new one
Object returnObject(); // adds to stack
}
Object pools provide tremendous (sometime 10x) improvements over object allocation on the heap. But the above implementation using a linked list is both naive and wrong! The linked list creates objects to manage its internal structure nullifying the effort.
A Ringbuffer using an array of objects work well. In the example give (a chess programm managing moves) the Ringbuffer should be wrapped into a holder object for the list of all computed moves. Only the moves holder object references would then be passed around.
I have always wondered why the garbage collector in Java activates whenever it feels like it rather than do:
if(obj.refCount == 0)
{
delete obj;
}
Are there any big advantages to how Java does it that I overlooked?
Thanks
Each JVM is different, but the HotSpot JVM does not primarily rely on reference counting as a means for garbage collection. Reference counting has the advantage of being simple to implement, but it is inherently error-prone. In particular, if you have a reference cycle (a set of objects that all refer to one another in a cycle), then reference counting will not correctly reclaim those objects because they all have nonzero reference count. This forces you to use an auxiliary garbage collector from time to time, which tends to be slower (Mozilla Firefox had this exact problem, and IIRC their solution was to add in a garbage collector on top of reference counting). This is why, for example, languages like C++ tend to have a combination of shared_ptrs that use reference counting and weak_ptrs that don't use reference cycles.
Additionally, associating a reference count with each object makes the cost of assigning a reference greater than normal, because of the extra bookkeeping involved of adjusting the reference count (which only gets worse in the presence of multithreading). Furthermore, using reference counting precludes the use of certain types fast of memory allocators, which can be a problem. It also tends to lead to heap fragmentation in its naive form, since objects are scattered through memory rather than tightly-packed, decreasing allocation times and causing poor locality.
The HotSpot JVM uses a variety of different techniques to do garbage collection, but its primary garbage collector is called a stop-and-copy collector. This collector works by allocating objects contiguously in memory next to one another, and allows for extremely fast (one or two assembly instructions) allocation of new objects. When space runs out, all of the new objects are GC'ed simultaneously, which usually kills off most of the new objects that were constructed. As a result, the GC is much faster than a typically reference-counting implementation, and ends up having better locality and better performance.
For a comparison of techniques in garbage collecting, along with a quick overview of how the GC in HotSpot works, you may want to check out these lecture slides from a compilers course that I taught last summer. You may also want to look at the HotSpot garbage collection white paper that goes into much more detail about how the garbage collector works, including ways of tuning the collector on an application-by-application basis.
Hope this helps!
Reference counting has the following limitations:
It is VERY bad for multithreading performance (basically, every assignment of an object reference must be protected).
You cannot free cycles automatically
Because it doesn't work strictly based on reference counting.
Consider circular references which are no longer reachable from the "root" of the application.
For example:
APP has a reference to SOME_SCREEN
SOME_SCREEN has a reference to SOME_CHILD
SOME_CHILD has a reference to SOME_SCREEN
now, APP drops it's reference to SOME_SCREEN.
In this case, SOME_SCREEN still has a reference to SOME_CHILD, and SOME_CHILD still has a reference to SOME_SCREEN - so, in this case, your example doesn't work.
Now, others (Apple with ARC, Microsoft with COM, many others) have solutions for this and work more similarly to how you describe it.
With ARC you have to annotate your references with keywords like strong and weak to let ARC know how to deal with these references (and avoid circular references)... (don't read too far into my specific example with ARC because ARC handles these things ahead-of-time during the compilation process and doesn't require a specific runtime per-se) so it can definitely be done similarly to how you describe it, but it's just not workable with some of the features of Java. I also believe COM works more similarly to how you describe... but again, that's not free of some amount of consideration on the developer's part.
In fact, no "simple" reference counting scheme would ever be workable without some amount of thought by the application developer (to avoid circular references, etc)
Because the garbage collector in modern JVMs is no longer tracking references count. This algorithm is used to teach how GC works, but it was both resource-consuming and error-prone (e.g. cyclic dependencies).
because the garbage collector in java is based on copying collector for 'youg generation' objects, and
mark and sweep for `tenure generations' objects.
Resources from: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html