Garbage collector vs. collections - java

I have read a few posts about garbage collection in Java, but I still cannot decide whether explicitly clearing a collection is considered good practice or not... and since I could not find a clear answer, I decided to ask here.
Consider this example:
List<String> list = new LinkedList<>();
// here we use the list, perhaps adding hundreds of items in it...
// ...and now the work is done, the list is not needed anymore
list.clear();
list = null;
From what I saw in implementations of e.g. LinkedList or HashSet, the clear() method basically just loops over all the items in the given collection, setting each element (and, in the case of LinkedList, also the references to the next and previous nodes) to null.
If I got it right, setting list to null just removes one reference to the list; if that was the only reference to it, the garbage collector will eventually take care of it. I just don't know how long it would take until the list's elements are also processed by the garbage collector in this case.
So my question is: do the last two lines of the example code above actually help the garbage collector work more efficiently (i.e. collect the list's elements earlier), or would they just keep my application busy with "irrelevant tasks"?

The last two lines do not help.
Once the list variable goes out of scope*, if that's the last reference to the linked list then the list becomes eligible for garbage collection. Setting list to null immediately beforehand adds no value.
Once the list becomes eligible for garbage collection, so do its elements, provided the list holds the only references to them. Clearing the list is unnecessary.
For the most part you can trust the garbage collector to do its job and do not need to "help" it.
* Pedantically speaking, it's not scope that controls garbage collection, but reachability. Reachability isn't easy to sum up in one sentence. See this Q&A for an explanation of this distinction.
One common exception to this rule is if you have code that will retain references longer than they're needed. The canonical example of this is with listeners. If you add a listener to some component, and later on that listener is no longer needed, you need to explicitly remove it. If you don't, that listener can inhibit garbage collection of both itself and of the objects it has references to.
Let's say I added a listener to a button like so:
button.addListener(event -> label.setText("clicked!"));
Then later on the label is removed, but the button remains.
window.removeChild(label);
This is a problem because the button has a reference to the listener and the listener has a reference to the label. The label can't be garbage collected even though it's no longer visible on screen.
This is a time to take action and get on the GC's good side. I need to remember the listener when I add it...
Listener listener = event -> label.setText("clicked!");
button.addListener(listener);
...so that I can remove it when I'm done with the label:
window.removeChild(label);
button.removeListener(listener);
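The listener example above uses an unnamed GUI toolkit. Here is a minimal, self-contained sketch in plain Java that makes the reference chain button → listener → label explicit. The `Button`, `Label` and `Listener` types are hypothetical stand-ins, not classes from any real toolkit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical listener type standing in for a GUI toolkit's listener interface.
interface Listener extends Consumer<String> {}

// Hypothetical label.
class Label {
    String text = "";
    void setText(String text) { this.text = text; }
}

// Hypothetical button: it holds strong references to every registered listener.
class Button {
    private final List<Listener> listeners = new ArrayList<>();

    void addListener(Listener l) { listeners.add(l); }
    void removeListener(Listener l) { listeners.remove(l); }
    int listenerCount() { return listeners.size(); }

    void click() {
        for (Listener l : listeners) l.accept("click");
    }
}

class ListenerDemo {
    public static void main(String[] args) {
        Button button = new Button();
        Label label = new Label();

        // The lambda captures 'label', so button -> listener -> label.
        Listener listener = event -> label.setText("clicked!");
        button.addListener(listener);
        button.click();
        System.out.println(label.text); // clicked!

        // When the label is no longer needed, unregister the listener;
        // otherwise the button keeps the label reachable.
        button.removeListener(listener);
        System.out.println(button.listenerCount()); // 0
    }
}
```

The key design point is that the listener must be kept in a variable: a lambda passed directly to addListener cannot be removed later, because there is no reference left to pass to removeListener.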

It depends on the following factors
how clear() is implemented
the allocation patterns for the entries held by the collection
the garbage collector
whether there might be other things holding onto the collection or subviews of it (does not apply to your example but common in the real world)
For a primitive, non-generational, tracing garbage collector, clearing out references only means extra work without making things much easier for the GC. But clearing may still help if you cannot guarantee that all references to the collection are nulled out in a timely manner.
For generational GCs and especially G1GC nulling out references inside a collection (or a reference array) may be helpful under some circumstances by reducing cross-region references.
But that only helps if you actually have allocation patterns that create objects in different regions and put them into a collection living in another region. And it also depends on the clear() implementation nulling out those references, which turns clearing into an O(n) operation when it could often be implemented as an O(1) one.
So for your concrete example the answer would be as follows:
If
your list is long-lived
the lists created on that code-path make up/hold onto a significant fraction of the garbage your application produces
you're using G1 or a similar multi-generational collector
the list slowly accumulates objects before eventually being released (this usually puts them in different regions, thus creating cross-region references)
you wish to trade CPU-time on clearing for reduced GC workload
the clear() implementation is O(n) instead of O(1), i.e. nulls out all entries. OpenJDK's 1.8 LinkedList does this.
then it may be beneficial to call clear() before releasing the collection itself.
So at best this is a very workload-specific micro-optimization that should only be applied after profiling/monitoring the application under realistic conditions and determining that GC overhead justifies the extra cost of clearing.
For reference, OpenJDK 1.8's LinkedList::clear
/**
* Removes all of the elements from this list.
* The list will be empty after this call returns.
*/
public void clear() {
// Clearing all of the links between nodes is "unnecessary", but:
// - helps a generational GC if the discarded nodes inhabit
// more than one generation
// - is sure to free memory even if there is a reachable Iterator
for (Node<E> x = first; x != null; ) {
Node<E> next = x.next;
x.item = null;
x.next = null;
x.prev = null;
x = next;
}
first = last = null;
size = 0;
modCount++;
}

I don't believe the clear() will help in this instance. The GC will remove items once there are no more references to them, so in theory, just setting the list = null will have the same effect.
You cannot control when the GC will run, so in my view it's not worth worrying about unless you have specific resource/performance requirements. Personally, I'd still stick with list = null;
If you want to reuse the list variable, then of course clear() is the best option rather than creating a new list object.

In Java an object is either alive (reachable via a reference held by a GC root, such as a local variable, or by some other live object) or dead (not reachable by any such reference). Objects that are only reachable from dead objects are also considered dead and eligible for garbage collection.
If no live object has a reference to your collection, then it is unreachable and eligible for garbage collection. What this also means is that all of your collection's elements (and any other helper objects that it may have created) are also unreachable unless some other live object has a reference to them.
Therefore, the clear method has no effect other than erasing a reference from one dead object to another. They will get garbage collected either way.

Related

How to make object in List eligible for garbage collection?

I understand that when an object is added to a List, the List retains a reference to that object based on answer from this question
Is this java Object eligible for garbage collection in List
How, then, do you make the object in the List eligible for garbage collection so that it's removed from the heap and no longer takes up memory?
I ask because in JavaFX, a VBox's getChildren method returns an observable list containing the child nodes of the VBox. If a UI element is removed but is not eligible for garbage collection, will this object still be on the heap consuming memory?
Removing the references from that list should make them subject to garbage collection (as long as no other object keeps references to them!).
You know, that is the whole idea of how a GC works: it keeps the objects that are alive (those that can be reached from your initial starting points). Everything else is garbage, and subject to disposal once the GC decides to collect it. To be precise, these are two different events: there may be a long time between object X "turning into garbage" and "X getting collected and its memory freed up".
One can use WeakReferences to avoid this, but that requires that some piece of code puts such WeakReference objects into the list in the first place. So, if you "own" that code, you could change it. Of course, that means you always have to check whether the object behind the WeakReference is still there when accessing it.
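A minimal sketch of that WeakReference idea, assuming a hypothetical `WeakList` wrapper (it is not a JavaFX or JDK class). It stores `java.lang.ref.WeakReference` objects and, as described above, has to check on every access whether each referent is still there.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: a list that does not keep its elements alive.
class WeakList<T> {
    private final List<WeakReference<T>> refs = new ArrayList<>();

    void add(T item) {
        refs.add(new WeakReference<>(item));
    }

    // Returns the elements that have not been collected yet and drops
    // the wrappers whose referents are already gone.
    List<T> liveElements() {
        List<T> live = new ArrayList<>();
        Iterator<WeakReference<T>> it = refs.iterator();
        while (it.hasNext()) {
            T item = it.next().get(); // null once the referent has been collected
            if (item == null) {
                it.remove();
            } else {
                live.add(item);
            }
        }
        return live;
    }
}
```

As long as some other code still holds a strong reference to an element, liveElements() returns it. Once the last strong reference is dropped, the element may disappear after some later GC cycle, at a time the program cannot predict.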
"How, then, do you make the object in the List eligible for garbage collection so that it's removed from the heap and not taking up memory?"
Assuming those objects are only referenced by this List, simply use the clear method.
"If a UI element is removed but not eligible for garbage collection, will this object still be on the heap consuming memory?"
As long as an object is hard-referenced by at least one other object that is not itself eligible for garbage collection, the object is not eligible for garbage collection either, so the GC won't collect it and it will remain on the heap.
If you cannot remove the object from the List, the only way I can think about to handle this would be wrapping your Objects into a WeakReference.

Why is clear an O(n) operation for linked list?

According to attachment 1, a linked list's clear operation is O(n).
I have a question about why that is.
Here is how we implemented the linked list in class (Java):
public class LinkedIntList {
private ListNode front;
......
}
And if I were to write a clear method for this linked list class, this is how I would write it
public void clear() {
front = null;
}
Given this implementation (I think this is how most people would write it), clearing would be a single operation, independent of the size of the list (just setting front to null). Also, by setting the front pointer to null, wouldn't you essentially be asking the garbage collector to "reclaim the underlying memory and reuses it for future object allocation"? In this case, the underlying memory would be the front node and all the nodes consecutively attached to it. (http://javabook.compuware.com/content/memory/how-garbage-collection-works.aspx)
After stating all of that, how is clear an O(n) operation for linked list?
Attachment 1:
This is from a data structures class I am in
Remember that a linked list has n entries that were allocated for it, and clearing it means you actually need to free them.
Since Java has a built-in garbage collector (GC), you don't need to free them explicitly, but the GC will go over each and every one of them and free them when the time comes.
So even though your explicit method is O(1), invoking it requires O(n) time from the GC, which makes your program O(n).
I expect that your data structures class does not assume that Java is the only system in the world.
In C, C++, Pascal, Assembly, Machine Code, Objective C, VB 6, etc., it takes a fixed time to free each block of memory, as they do not have a garbage collector. Until very recently most programs were written without the benefit of a garbage collector.
So in any of the above, every node needs to be passed to free(), and each call to free() takes roughly a fixed time.
In Java, a simple implementation of a linked list would take O(1) time to clear.
However, since nodes may be pointed to from outside the list, or a garbage collector may consider different parts of memory at different times, there can be real-life benefits to setting all the “next” and “prev” pointers to null. But in 99% of cases, it is best just to set the “front” pointer in the header to null, as your code shows.
I think you should ask your lecturer about this, as I expect many of the students in the class will have the same question. You need to learn C well before you can understand most general data structures books or classes.
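To make the two options discussed above concrete, here is a minimal sketch built on the question's `LinkedIntList`. The `ListNode` fields, the `add` and `size` methods, and the name `clearAndUnlink` are assumptions for illustration. `clear()` is the O(1) version from the question; `clearAndUnlink()` walks the chain and nulls out each `next` link, in the spirit of OpenJDK's `LinkedList.clear()`, and is therefore O(n).

```java
// Singly linked node; field names assumed from the class's LinkedIntList.
class ListNode {
    int data;
    ListNode next;

    ListNode(int data, ListNode next) {
        this.data = data;
        this.next = next;
    }
}

class LinkedIntList {
    private ListNode front;
    private int size;

    // Prepends for brevity; element order does not matter for this sketch.
    void add(int value) {
        front = new ListNode(value, front);
        size++;
    }

    int size() { return size; }

    // O(1): drop the only reference to the chain; the GC reclaims the
    // nodes later, once nothing else references them.
    void clear() {
        front = null;
        size = 0;
    }

    // O(n): additionally break every link, so a node that is still referenced
    // from outside (e.g. by an iterator) does not keep the rest of the chain reachable.
    void clearAndUnlink() {
        ListNode current = front;
        while (current != null) {
            ListNode next = current.next;
            current.next = null;
            current = next;
        }
        front = null;
        size = 0;
    }
}
```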

Java ArrayList reference

I am creating an ArrayList of objects using generics. Each thread does some calculations and stores the resulting object in the array list.
However, when looking at the ArrayList, which is static and volatile, all the object attributes are set to null. My thought is that it has something to do with the garbage collector removing the instances created in the threads, so that once the threads have finished there is no reference to them.
Any help would be appreciated.
The garbage collector will not remove instances¹ from an array list. That is not the problem.
The problem is most likely that you are accessing and updating the array list object without proper synchronization. If you do not synchronize properly, one thread won't always see the changes made by another one.
Declaring the reference to the ArrayList object volatile only guarantees that the threads will see the same list object reference. It makes no guarantees about what happens with the operations on the list object.
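A minimal sketch of properly synchronized accumulation, assuming each thread computes one hypothetical `Result` and that the reading thread looks at the list only after all workers have finished. `Collections.synchronizedList` serializes the `add` calls, and `Thread.join()` establishes the happens-before relationship that makes each worker's writes visible to the reader; marking a field volatile would do neither for the list's contents.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical result object; final fields are safely published.
class Result {
    final int id;
    final long value;

    Result(int id, long value) {
        this.id = id;
        this.value = value;
    }
}

class Workers {
    // Starts 'threadCount' workers that each add one Result, then waits for all of them.
    static List<Result> runAll(int threadCount) {
        // The wrapper guards every add() with a single lock.
        List<Result> results = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();

        for (int i = 0; i < threadCount; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                long value = (long) id * id; // stand-in for the real calculation
                results.add(new Result(id, value));
            });
            threads.add(t);
            t.start();
        }

        // join() makes each worker's writes visible to the calling thread.
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return results;
    }
}
```

If the reader needs to iterate while workers are still adding, it must hold the list's lock (synchronized (results) { ... }) for the whole iteration, as the Collections.synchronizedList documentation requires.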
1 - Assuming that the array list is reachable when the GC runs, then all elements that have been properly added to the list will also be reachable. Nothing that is reachable will be deleted by the garbage collector. Besides, the GC won't ever reach into an object that your application can still see and change ordinary references to null.

Map clear vs null

I have a map that I use to store dynamic data that are discarded as soon as they are created (i.e. used; they are consumed quickly). It responds to user interaction in the sense that when user clicks a button the map is filled and then the data is used to do some work and then the map is no longer needed.
So my question is: what's the better approach for emptying the map? Should I set it to null each time or should I call clear()? I know clear is linear in time, but I don't know how to compare that cost with the cost of creating the map each time. The size of the map is not constant, though it may range from n to 3n elements between creations.
If a map is not referenced from other objects where it may be hard to set a new one, simply null-ing out an old map and starting from scratch is probably lighter-weight than calling a clear(), because no linear-time cleanup needs to happen. With the garbage collection costs being tiny on modern systems, there is a good chance that you would save some CPU cycles this way. You can avoid resizing the map multiple times by specifying the initial capacity.
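A minimal sketch of the "fresh map per click" approach with a presized HashMap, assuming the caller can estimate how many entries will be stored (`ClickHandler`, `presizedMap` and `handleClick` are hypothetical names). With HashMap's default load factor of 0.75, an initial capacity of at least `expectedEntries / 0.75` avoids rehashing while the map is filled.

```java
import java.util.HashMap;
import java.util.Map;

class ClickHandler {
    // HashMap's default load factor.
    private static final float LOAD_FACTOR = 0.75f;

    // Creates a map large enough for 'expectedEntries' without resizing.
    static <K, V> Map<K, V> presizedMap(int expectedEntries) {
        int capacity = (int) Math.ceil(expectedEntries / (double) LOAD_FACTOR);
        return new HashMap<>(capacity, LOAD_FACTOR);
    }

    // Called on each button click: fill a fresh map, use it, then let it go.
    static int handleClick(int itemCount) {
        Map<String, Integer> data = presizedMap(itemCount);
        for (int i = 0; i < itemCount; i++) {
            data.put("item" + i, i);
        }
        int total = 0;
        for (int v : data.values()) {
            total += v; // stand-in for "use the data to do some work"
        }
        return total; // 'data' becomes unreachable once this method returns
    }
}
```

On Java 19 and later, HashMap.newHashMap(int numMappings) performs the same capacity calculation for you.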
One situation where clear() is preferred would be when the map object is shared among multiple objects in your system. For example, if you create a map, give it to several objects, and then keep some shared information in it, setting the map to a new one in all these objects may require keeping references to objects that have the map. In situations like that it's easier to keep calling clear() on the same shared map object.
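A small sketch of the shared-map case described above, using a hypothetical `Holder` class. Calling `clear()` on the shared instance is seen by every holder, whereas assigning a new map in one holder leaves the others pointing at (and keeping alive) the old map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical object that keeps a reference to a shared map.
class Holder {
    Map<String, String> shared;

    Holder(Map<String, String> shared) {
        this.shared = shared;
    }
}

class SharedMapDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        Holder a = new Holder(map);
        Holder b = new Holder(map);
        map.put("key", "value");

        // Replacing the map in one holder only: b still sees (and retains) the old data.
        a.shared = new HashMap<>();
        System.out.println(b.shared.size()); // 1

        // Clearing the shared instance is visible to everyone still holding it.
        b.shared.clear();
        System.out.println(map.isEmpty()); // true
    }
}
```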
Well, it depends on how much memory you can throw at it. If you have a lot, then it doesn't matter. However, setting the map itself to null leaves the work to the garbage collector: if only the map has references to the instances inside it, the garbage collector can collect not only the map but also those instances. clear() does empty the map, but it has to iterate over everything in the map to set each reference to null, and this takes place during your own execution time; the garbage collector essentially has to do this work anyway, so let it do its thing. Just note that setting the variable to null doesn't let you reuse the map. A typical pattern for reusing a map variable is:
Map<String, String> whatever = new HashMap<String, String>();
// .. do something with map
whatever = new HashMap<String, String>();
This lets you reuse the variable without setting it to null at all: you silently discard the reference to the old map. This is atrocious practice in languages without managed memory, since there you must keep the old pointer around in order to free the old map (otherwise its memory simply leaks), but in Java, since nothing references the old map any more, the GC considers it eligible for collection.
I feel nulling out the existing map is cheaper than clear(), as object creation is very cheap in modern JVMs.
Short answer: use Collection.clear() unless it is too complicated to keep the collection around.
Detailed answer: In Java, memory allocation is almost instantaneous; it is little more than a pointer that gets moved inside the VM. However, the initialization of those objects might add up to something significant. Also, all objects that use an internal buffer are sensitive to resizing and copying of their content. Using clear() makes sure that buffers eventually stabilize at some size, so that reallocating memory and copying the old buffer into a new one will no longer be necessary.
Another important issue is that allocating and then releasing a lot of objects requires more frequent garbage collector runs, which might cause sudden lag.
If you always hold on to the map, it will be promoted to the old generation. If each user has a corresponding map, the number of maps in the old generation is proportional to the number of users. This may trigger full GCs more frequently as the number of users increases.
You can use both with similar results.
One prior answer notes that clear is expected to take constant time in a mature map implementation. Without checking the source code of the likes of HashMap, TreeMap, ConcurrentHashMap, I would expect their clear method to take constant time, plus amortized garbage collection costs. (In OpenJDK, however, HashMap.clear() sets every slot of its internal table to null, so its cost is linear in the table's capacity.)
Another poster notes that a shared map cannot be nulled. Well, it can if you want it, but you do it by using a proxy object which encapsulates a proper map and nulls it out when needed. Of course, you'd have to implement the proxy map class yourself.
Map<Foo, Bar> myMap = new ProxyMap<Foo, Bar>();
// Internally, the above object holds a reference to a proper map,
// for example, a hash map. Furthermore, this delegates all calls
// to the underlying map. A true proxy.
myMap.clear();
// The clear method simply reinitializes the underlying map.
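One way to write the ProxyMap that the answer leaves to the reader, as a minimal sketch: the class name comes from the answer, but the design (extending `java.util.AbstractMap` and forwarding to a `HashMap` delegate) is an assumption. `clear()` replaces the delegate instead of emptying it, so it is O(1) and the old map becomes garbage if nothing else references it.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical proxy: callers keep one stable Map reference, while
// clear() swaps in a fresh underlying map.
class ProxyMap<K, V> extends AbstractMap<K, V> {
    private Map<K, V> delegate = new HashMap<>();

    @Override
    public V put(K key, V value) {
        return delegate.put(key, value);
    }

    @Override
    public V get(Object key) {
        return delegate.get(key);
    }

    @Override
    public V remove(Object key) {
        return delegate.remove(key);
    }

    @Override
    public boolean containsKey(Object key) {
        return delegate.containsKey(key);
    }

    @Override
    public int size() {
        return delegate.size();
    }

    // AbstractMap builds its remaining methods on top of entrySet().
    @Override
    public Set<Map.Entry<K, V>> entrySet() {
        return delegate.entrySet();
    }

    // O(1): the old map becomes unreachable if nothing else references it.
    @Override
    public void clear() {
        delegate = new HashMap<>();
    }
}
```

One caveat of this design: an entrySet() view obtained before clear() still refers to the old delegate, so callers should not cache it.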
Unless you did something like the above, clear and nulling out are equivalent in the ways that matter, but I think it's more mature to assume your map, even if not currently shared, may become shared at a later time due to forces you can't foresee.
There is another reason to clear instead of nulling out, even if the map is not shared. Your map may be instantiated by an external client, like a factory, so if you clear your map by nulling it out, you might end up coupling yourself to the factory unnecessarily. Why should the object that clears the map have to know that you instantiate your maps using Guava's Maps.newHashMap() with God knows what parameters? Even if this is not a realistic concern in your project, it still pays off to align yourself to mature practices.
For the above reasons, and all else being equal, I would vote for clear.
HTH.

Question about Garbage Collection in Java

Suppose I have a doubly linked list. I create it as such:
MyList list = new MyList();
Then I add some nodes, use it and afterwards decide to throw away the old list like this:
list = new MyList();
Since I just created a new list, the nodes inside the old memory area are still pointing to each other. Does that mean the region with the old nodes won't get garbage collected? Do I need to make each node point to null so they're GC'd?
No, you don't. The Java GC handles cyclic references just fine.
Conceptually, each time the GC runs, it looks at all the "live" root references in the system:
Local variables in every stack frame
"this" references in every instance method stack frame
Effectively, all static variables (in fact these are really referenced by Class objects, which are in turn referenced by ClassLoaders, but let's ignore that for the moment.)
With those "known live" objects, it examines the fields within them, adding to the list. It recurses down into those referenced objects, and so on, until it's found every live object in the system. It then garbage collects everything that it hasn't deemed to be live.
Your cyclically referenced nodes refer to each other, but no live object refers to them, so they're eligible for garbage collection.
Note that this is a grossly simplified summary of how a garbage collector conceptually works. In reality they're hugely complicated, with generations, compaction, concurrency issues and the like.
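The marking step described above can be sketched as a graph traversal. This is a toy model, not how any JVM is implemented: the hypothetical `Obj` class stands for a heap object with outgoing references, and the `roots` list stands for stack and static references. The old doubly linked nodes form a cycle, yet they are never marked because no root leads to them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical heap object with outgoing references.
class Obj {
    final String name;
    final List<Obj> refs = new ArrayList<>();

    Obj(String name) { this.name = name; }
}

class ToyCollector {
    // Marks every object reachable from the roots (breadth-first).
    static Set<Obj> mark(List<Obj> roots) {
        Set<Obj> live = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.poll();
            if (live.add(o)) {          // false if already marked, so cycles terminate
                pending.addAll(o.refs); // visit its references next
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Obj newList = new Obj("new list");
        Obj oldA = new Obj("old node A");
        Obj oldB = new Obj("old node B");
        // A <-> B: the old nodes still point at each other (a cycle).
        oldA.refs.add(oldB);
        oldB.refs.add(oldA);

        // Only the new list is referenced from a root (the local variable).
        Set<Obj> live = mark(List.of(newList));
        System.out.println(live.contains(newList)); // true
        System.out.println(live.contains(oldA));    // false: garbage despite the cycle
    }
}
```

Everything not in the returned set would be reclaimed in the sweep (or copy) phase, which is why cyclic references alone never keep objects alive.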
If you created your own doubly linked list and put containers in it (which hold the items of your list), only those containers are linked to one another.
So in your list you'll have an object A contained in A'. A' is linked to B', and B' is a container that holds B, etc. None of the contained objects has to reference another.
Normally those containers won't be accessible from outside (only the content is interesting), so only your list will have references to your containers (remember that your content isn't aware of its container).
If you remove the last reference to your list (the list, not a container or its content), the GC will try to collect what the list holds, which is your containers and your contents.
Since your containers are not accessible from outside, the only references to them come from each other and from the main list. All of that is called an island of isolation. As for the content, if it is still referenced elsewhere in your application, it will survive the GC; if not, it won't.
So when you remove your list, only A' and B' will be deleted: even though they still have references, those references are part of an island. If A and B have no other references, they will be deleted too.
No -- Java (at least as normally implemented) doesn't use reference counting, it uses a real garbage collector. That means (in essence) when it runs out of memory, it looks at the pointers on the stack, in registers, and other places that are always accessible, and "chases" them to find everything that's accessible from them.
Pointers within other data structures like your doubly-linked list simply don't matter unless there's some outside pointer (that is accessible) that leads to them.
No, the GC will reclaim them anyway, so you don't need to point them to null. Here's a good one-paragraph description from this JavaWorld article:
Any garbage collection algorithm must do two basic things. First, it must detect garbage objects. Second, it must reclaim the heap space used by the garbage objects and make it available to the program. Garbage detection is ordinarily accomplished by defining a set of roots and determining reachability from the roots. An object is reachable if there is some path of references from the roots by which the executing program can access the object. The roots are always accessible to the program. Any objects that are reachable from the roots are considered live. Objects that are not reachable are considered garbage, because they can no longer affect the future course of program execution.
The garbage collector looks if objects are referenced by live threads. If objects are not reachable by any live threads, they are eligible for garbage collection.
It doesn't matter if the objects are referencing each other.
As others have pointed out, the Java garbage collector doesn't simply look at reference counting; instead it essentially looks at a graph where the nodes are the objects that currently exist and links are a reference from one object to another. It starts from a node that is known to be live (the main method, for instance) and then garbage collects anything that can't be reached.
The Wikipedia article on garbage collection discusses a variety of ways that this can be done, although I don't know exactly which method is used by any of the JVM implementations.
The garbage collector looks for objects that aren't reachable from anywhere in the running program.
So if you create an object and lose the reference to it, as in your example, the garbage collector will collect it.
