Free memory from complex objects in Java

I'll do my best to explain my question. Maybe it's a bit abstract.
I've read some literature about not invoking the GC explicitly in Java code, the finalize method, setting references to null, etc.
I have some large XML files (customer invoices). Using JAXB, each file is unmarshalled into a complex Java object. Its attributes are basic types (Integer, BigDecimal, String, etc.) but also instances of other complex classes, lists of other classes, lists of classes that themselves have lists as attributes, etc.
When I'm done with the object, I need to remove it from memory. Some XML files are very large and I want to avoid a memory leak or an OutOfMemoryError.
So, my questions are:
Is it enough to assign the big object to null? I read that, if there are soft references, the GC will not free the object.
Should I do a deep clearing of the object, clearing all lists, assigning null to the attributes, etc.?
What about JAXB (I'm using Java 6, so JAXB is built in) and soft references? JAXB is faster than the old JiBX marshaller, but I don't know if it's worse in memory usage.
Should I wrap the huge JAXB class in a WeakReference or something like that?
Excuse me for mixing Java memory usage concepts, JAXB, etc. I'm studying the stability of a large running process, and the .hprof files evidence that all customer data from all invoices remains in memory.
Excuse me if it's a simple, basic or rare question.
Thanks in advance

Unless something else points to parts of your big object (graph), assigning null to the big object reference is enough.
The safest approach, though, would be to use a profiler after your application has been running for a while, look at the object references, and see if there's something that isn't properly GC'ed.
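A minimal sketch of the "nothing else points to it" situation, assuming the invoices can be processed one at a time. `Invoice`, `InvoiceLine` and `loadInvoice` are hypothetical stand-ins for the JAXB-generated classes and the unmarshalling call; the point is only that the whole graph is reachable through a single local variable, so it becomes unreachable when the method returns, with no explicit nulling needed.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for JAXB-generated classes; only the shape of the graph matters.
class InvoiceLine {
    final String product;
    final BigDecimal amount;

    InvoiceLine(String product, BigDecimal amount) {
        this.product = product;
        this.amount = amount;
    }
}

class Invoice {
    final String customer;
    final List<InvoiceLine> lines = new ArrayList<InvoiceLine>();

    Invoice(String customer) {
        this.customer = customer;
    }
}

class InvoiceBatch {
    // Hypothetical loader standing in for unmarshalling one XML file.
    static Invoice loadInvoice(String customer, int lineCount) {
        Invoice invoice = new Invoice(customer);
        for (int i = 0; i < lineCount; i++) {
            invoice.lines.add(new InvoiceLine("item" + i, BigDecimal.ONE));
        }
        return invoice;
    }

    // Only the small result escapes; the invoice graph is referenced solely by the
    // local variable, so it is unreachable once this method returns.
    static BigDecimal total(String customer, int lineCount) {
        Invoice invoice = loadInvoice(customer, lineCount);
        BigDecimal sum = BigDecimal.ZERO;
        for (InvoiceLine line : invoice.lines) {
            sum = sum.add(line.amount);
        }
        return sum;
    }

    static BigDecimal totalOfAll(List<String> customers, int lineCount) {
        BigDecimal grandTotal = BigDecimal.ZERO;
        for (String customer : customers) {
            // No collection keeps the invoices, so each one can be collected after use.
            grandTotal = grandTotal.add(total(customer, lineCount));
        }
        return grandTotal;
    }
}
```

If the heap dump still shows every invoice, the culprit is whatever collection, cache or field keeps a reference beyond the equivalent of `total`.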

Is it enough to assign the big object to null? I read that, if there are soft references, the GC will not free the object.
The short answer is yes. It is enough to assign (all strong references to) a big object to null - if you do this, the object will no longer be considered "strongly reachable" by the Garbage Collector.
Soft references will not be a problem in your case, because it's guaranteed that softly reachable objects will be garbage collected before an OutOfMemoryError is thrown. They might well prevent the garbage collector from collecting the object immediately (if they didn't, they'd act exactly the same as weak references). But this memory use would be "temporary", in that it would be freed up if it were needed to fulfil an allocation request.
Should I do a deep clearing of the object, clearing all lists, assigning null to the attributes, etc.?
That would probably be a bad idea. If the field values are only referenced by the outer big object, then they will also be garbage collected when the big object is collected. And if they are not, then the other parts of the code that reference them will not be happy to see that you're removing members from a list they're using!
In the best case this does nothing, and in the worst case this will break your program. Don't let the lure of this distract you from addressing the sole actual issue of whether your object is strongly-reachable or not.
What about JAXB (I'm using Java 6, so JAXB is built in) and soft references? JAXB is faster than the old JiBX marshaller, but I don't know if it's worse in memory usage.
I'm not especially familiar with the relative time and space performance of those libraries. But in general, it's safe to assume a very strong "innocent until proven guilty" attitude with core libraries. If there were a memory leak bug, it would probably have been found, reported and fixed by now (unless you're doing something very niche).
If there's a memory leak, I'm 99.9% sure that it's your own code that's at fault.
Should I wrap the huge JAXB class in a WeakReference or something like that?
This sounds like you may be throwing GC "fixes" at the problem without thinking through what is actually needed.
If the JaxB class ought to be weakly referenced, then by all means this is a good idea (and it should be there already). But if it shouldn't, then definitely don't do this. Weak referencing is more a question of the overall semantics, and shouldn't be something you introduce specifically to avoid memory issues.
If the outer code needs a reference to the object, then it needs a reference - there's no magic you can do to have the instance be garbage collected yet still available. If it doesn't need a reference (beyond a certain point), then it doesn't need one at all - better to just null out a standard [strong] reference, or let it fall out of scope. Weak references are for specialist situations, and are generally used when you don't have full control over the point where an object ceases to be relevant. Which is probably not the case here.
the .hprof files evidence that all customers data of all invoices remains in memory.
This suggests that they are indeed being referenced longer than is necessary.
The good news is that the hprof file will contain details of exactly what is referencing them. Look at an invoice instance that you would expect to have been GCed, and see what is referencing it and preventing it from being GCed. Then look into the class in question to see how you expect that reference to be freed, and why it hasn't been in this case.
All good performance/memory tweaking is based on measurements. Taking heap dumps, and inspecting the instances and references to them, are your measurements. Do this and act on the results, rather than wrapping things in WeakReferences in the hope that it might help.
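A sketch of taking such a heap dump from inside the process, assuming a HotSpot-based JVM (OpenJDK or Oracle), since `HotSpotDiagnosticMXBean` lives in `com.sun.management` rather than the standard API. From the command line, `jmap -dump:live,format=b,file=heap.hprof <pid>` produces the same kind of file. `HeapDumper` is a hypothetical helper name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

class HeapDumper {
    // Writes an .hprof snapshot of the current JVM to 'target'.
    // The file must not already exist, and recent JDKs require the ".hprof" suffix.
    static Path dump(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // liveOnly = true restricts the dump to objects that are still reachable,
        // which is what you want when hunting for invoices that should have been freed.
        bean.dumpHeap(target.toString(), liveOnly);
        return target;
    }
}
```

Open the resulting file in a heap analyzer, pick an invoice instance you expected to be gone, and follow its path to the GC roots.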

You wrote
hprof files evidence that all customers data of all invoices remains in memory.
You should analyse it using MAT (the Eclipse Memory Analyzer Tool). Some good notes at http://memoryanalyzer.blogspot.in/

Related

Is there a way to receive object, without having reference to it?

Suppose following code:
Object obj = new Object();
obj = null;
At this point, I don't have any reference to this object, but it's still on the heap, because garbage collection doesn't happen instantly. Is there a way to re-obtain a reference to this object before it is collected by the GC?
The only possible way I've seen so far is to use Unsafe, which provides direct memory access, but I would need to know exactly where in memory the object is allocated. There are also Weak/SoftReference, but they are implemented by special GC behavior.
P.S. To predict questions like "Why do you need it?" - Because science is not about why, it's about why not! (c)

This is highly JVM implementation specific. In a naive implementation having memory allocation information associated with each object, you could find an object whose memory has not been freed yet and it seems you are thinking into that direction.
However, sophisticated JVMs don’t work that way. Associating allocation information with each object would create a giant overhead, given that you may have millions of objects in your runtime. This applies not only to the memory requirement but also to the amount of work needed to maintain this information when allocating or freeing an object.
So what makes a part of your heap memory an object? Only the reference you are holding to it. The garbage collector traverses existing references and within the objects found this way, it will find meta information (i.e. a pointer to class specific information) needed to understand how much memory belongs to the object and how to interpret the contained data (to traverse the sub-references, if any). Everything unreferenced is unused per se and might contain old objects or might have never been used at all, who knows. Once all references to an object are gone, there is no information left about the former existence of this object.
Getting to the point, there is no explicit freeing action. When the garbage collector has found surviving objects, they will be copied to a dedicated new place and their old place is considered to be free, regardless of how many objects there were before and how much memory each individual object occupied when it was alive.
When you search memory that is considered to be unused, you may find remnants of old objects, but without references to their starting points, it’s impossible to say whether a bit pattern that looks like an object really is a dead object or just a coincidence. Even if you managed to resurrect an object that way, it would have nothing to do with your original idea of resurrecting a reference because the GC hasn’t run yet.
Note that all modifications to this ordinary life time work by holding another reference to the object. E.g., when the class defines a non-trivial finalize() method, the JVM has to add a reference to the queue of objects needing finalization. Similarly, soft, weak and phantom references encapsulate a reference to the object in question. Also a debugger may keep a reference to an object, once it has seen it.
But for your trivial code Object obj = new Object(); obj = null;, assuming there’s no breakpoint set in-between, there will be no additional reference and hence no way of resurrecting the object. A JVM may even elide the entire allocation when optimizing the code at runtime. In that case you wouldn’t even find remnants of the object when searching RAM, as the object effectively never existed.

At this point, I don't have any reference to this object, but it's still on the heap, because garbage collection doesn't happen instantly.
It is undefined where it is, and it is also undefined whether or not garbage collection happens instantly.
Is there a way to re-obtain a reference to this object before it is collected by the GC?
You already had one and you threw it away. Just keep it.
I would need to know exactly where in memory the object is allocated.
There is nothing in standard Java that will tell you that, and no useful way you could make use of the information if you could get it.
Also, there is Weak/SoftReference, but they are implemented by special GC behavior.
I don't see how this affects your question, whatever it is.

Java efficiency - child object referencing parent object

I'm new to java/garbage collected languages and I still am getting my head around what it means to have an object reference (because I'm told it's not a pointer?) so I'm pondering this question:
I have a parent/child object structure where the parent will have several lists of several children each. Is there any inefficiency, or any other reason not to have a pointer in each child back to its parent? In my prior language (Delphi) it was a simple pointer, so not a problem at all. Are there any considerations with this practice in Java?

There shouldn't be any issue here. Technically, Java references are not pointers, but for most purposes you can think of them similarly. On typical JVMs an object reference is a value that locates the object on Java's heap, so each additional place it's stored costs one more reference-sized slot (typically 4 or 8 bytes). Reasonably small, generally speaking.
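A small sketch of the structure being asked about, with hypothetical `TreeParent`/`TreeChild` names. The back-reference creates a cycle, but the JVM's collector traces reachability from roots rather than counting references, so once nothing outside the tree references the parent or any child, the whole structure is collectible as a unit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TreeParent {
    private final List<TreeChild> children = new ArrayList<>();

    // Creates a child that knows its parent, and records it in the parent's list.
    TreeChild addChild(String name) {
        TreeChild child = new TreeChild(this, name);
        children.add(child);
        return child;
    }

    List<TreeChild> getChildren() {
        return Collections.unmodifiableList(children);
    }
}

class TreeChild {
    // An ordinary field holding a reference; no special cleanup is needed for it.
    private final TreeParent parent;
    private final String name;

    TreeChild(TreeParent parent, String name) {
        this.parent = parent;
        this.name = name;
    }

    TreeParent getParent() {
        return parent;
    }

    String getName() {
        return name;
    }
}
```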
You can (generally!) trust Java to do the right thing when it comes to object management, and shouldn't have to worry too much about garbage collection or the intricacies of how object references work.

From what I know I'd say you'd be fine doing that. Java does a good job of cleaning up your garbage and I usually have a 'parent' field in children classes.

As previous answers have stated, the GC is generally pretty good at clearing things up. On Android, your primary concern will be objects that persist after you leave an Activity and hold onto its Context. This will keep your Activity in memory, because you have a reference to it that is not in its parent/child tree.
More on this here

I think it would be helpful to read up on reference types as well (strong, weak, soft and phantom). Also, read up on how GC works across the different generations (young/survivor spaces and old generation), which garbage collectors are available, and the GC parameters you can specify.

Is there any advantage to assign null to a ref when it is no longer used?

I heard that explicitly assigning null to a reference will help the GC collect the object.
Is that true?
If an object is out of scope, will it be GC'd quickly?

If an object is out of scope, will it be GC'd quickly?
That is impossible to answer in general. However, if a reference is about to go out of scope, setting it to null just before it does will almost certainly achieve nothing.
On the other hand, if the reference variable is long-lived, then setting it to null may be useful if the referenced object is no longer needed.
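A sketch of the long-lived case, using a hypothetical `ReportSession` that lives for the whole run of an application and temporarily holds a large payload in a field. A field never goes out of scope while its owner is alive, so clearing it is what lets the payload be collected.

```java
class ReportSession {
    // Hypothetical large, temporary payload held by a long-lived object.
    private byte[] uploadBuffer;

    void receive(byte[] data) {
        uploadBuffer = data;
    }

    int processUpload() {
        int checksum = 0;
        for (byte b : uploadBuffer) {
            checksum += b;
        }
        // The session outlives the buffer, so drop the field once the work is done;
        // otherwise the array stays reachable for as long as the session does.
        uploadBuffer = null;
        return checksum;
    }

    boolean hasPendingUpload() {
        return uploadBuffer != null;
    }
}
```

For a local variable that is about to go out of scope, the same `= null` would add nothing.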

Typically the JVM will do garbage collection when it needs to, so assigning a reference to null will not help it happen more quickly.

If you have an array or reference which you intend to keep it can be worth nulling it out.
If you have a long method in which a local variable refers to a large object and will not go out of scope for a while, it could be worth nulling it out. However, in this situation it is better to break up the method at the point where the object is no longer needed, so that it goes out of scope.

If that is the only reference then the GC can free the space on the heap that the object was using. However, if there is another reference to the same object then setting the first reference to null will do nothing. The second reference will still keep the object alive - so the GC will not free the space.

In general, if the reference is about to go out of scope, you don't need to explicitly null it as it will be gone soon any way.
The only explicit nulling or removing of references I consciously use are:
Closing a resource associated with an object that has a longer lifetime than that resource. For example: a java.sql.Connection implementation usually has an associated physical connection (e.g. a Socket): when the java.sql.Connection is closed you can null this physical connection as you are no longer using it, while the actual java.sql.Connection will (might) still be held by the user for an indefinite time. (I use this example as I am a developer of a JDBC driver; in general this example does not occur for a Java developer, but similar situations exist.)
Processing relatively large 'throw-away' objects in a list or array structure. For example: the JavaMail library provides methods to get messages; these return an array of messages. If you process the messages sequentially (and then no longer need them), nulling the array-entry after processing could reduce the memory footprint of your application (with IMAP it can 'on demand' load information from the server and store it in the message, increasing its size due to processing).
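A minimal sketch of the second case. `MailBatch.Message` is a hypothetical stand-in, not JavaMail's `javax.mail.Message`; the technique is just clearing each array slot once its element has been processed, which only frees memory if nothing else still holds that element.

```java
class MailBatch {
    // Hypothetical stand-in for a message that may grow as content is loaded on demand.
    static final class Message {
        final String body;

        Message(String body) {
            this.body = body;
        }
    }

    // Processes messages in order and clears each slot afterwards, so a processed
    // message is no longer reachable through the array.
    static int processAll(Message[] messages) {
        int totalLength = 0;
        for (int i = 0; i < messages.length; i++) {
            totalLength += messages[i].body.length();
            messages[i] = null;
        }
        return totalLength;
    }
}
```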
There are probably some other cases I will explicitly null a reference, but that is usually to signify the unavailability of something not out of concern about garbage collection or memory usage.
However as always: don't just null because you think it could help: null if you know it will help (so profile the code, measure memory usage etc). And if it is a one off hobby application or university assignment: don't bother.

Is there a practical use for weak references? [duplicate]

Possible Duplicate:
Weak references - how useful are they?
Since weak references can be claimed by the garbage collector at any time, is there any practical reason for using them?

If you want to keep a reference to something as long as it is used elsewhere e.g. a Listener, you can use a weak reference.
WeakHashMap can be used as a short-lived cache of keys to derived data. It can also be used to keep information about objects used elsewhere when you don't know when those objects are discarded.
BTW Soft References are like Weak references, but they will not always be cleaned up immediately. The GC will always discard weak references when it can and retain Soft References when it can.
There is another kind of reference called a Phantom Reference. This is used in the GC clean-up process and refers to an object which isn't accessible to "normal" code because it's in the process of being cleaned up.

Since weak reference can be claimed by garbage collector at any time, is there any practical reason to use it?
Of course there are practical reasons to use it. It would be awfully strange if the framework designers went to the enormous expense of building a weak reference system that was impractical, don't you think?
I think the question you intended to ask was:
What are realistic situations in which people use weak references?
There are many. A common one is to achieve a performance goal. When performance tuning an application one often must make a tradeoff between more memory usage and more time usage. Suppose for example there is a complex calculation that you must perform many times, but the computation is "pure": the answer depends only on the arguments, not upon exogenous state. You can build a cache (a map from the arguments to the result), but that then uses memory. You might never ask the question again, and that memory would then be wasted.
Weak references possibly solve this problem; the cache can get quite large, and therefore time is saved if the same question is asked many times. But if the cache gets large enough that the garbage collector needs to reclaim space, it can do so safely.
The downside is of course that the cleanup policy of the garbage collector is tuned to meet the goals of the whole system, not your specific cache problem. If the GC policy and your desired cache policy are sufficiently aligned then weak references are a highly pragmatic solution to this problem.
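A sketch of such a cache in Java. Note that it swaps in `SoftReference` rather than `WeakReference`: the `SoftReference` Javadoc describes soft references as most often used for memory-sensitive caches, whereas a weakly held result with no other references may be cleared at the very next collection. `SoftMemo` is a hypothetical name, and this simple version never purges the map entries whose references were cleared (a `ReferenceQueue` would be the usual way to do that).

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongUnaryOperator;

class SoftMemo {
    private final Map<Long, SoftReference<Long>> cache = new HashMap<>();
    private final LongUnaryOperator pureFunction;
    private int computations = 0;

    SoftMemo(LongUnaryOperator pureFunction) {
        this.pureFunction = pureFunction;
    }

    long apply(long argument) {
        SoftReference<Long> ref = cache.get(argument);
        Long result = (ref == null) ? null : ref.get();
        if (result == null) {
            // Never computed, or cleared by the GC because memory was needed.
            result = pureFunction.applyAsLong(argument);
            computations++;
            cache.put(argument, new SoftReference<>(result));
        }
        return result;
    }

    int computations() {
        return computations;
    }
}
```

As the answer says, the GC decides when soft entries disappear, so this only fits when its policy is close enough to the cache policy you want.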

If a WeakReference is the only reference to an object, and you want the object to hang around, you should probably be using a SoftReference instead.
WeakReferences are best used in cases where there will be other references to the object, but you can't (or don't want to have to) detect when those other references are no longer used. Then, the other reference will prevent the object from being garbage collected, and the WeakReference will just be another way of getting to the same object.
Two common use cases are:
For holding additional (often expensively calculated but reproducible) information about specific objects that you cannot modify directly, and whose lifecycle you have little control over. WeakHashMap is a perfect way of holding these references: the key in the WeakHashMap is only weakly held, and so when the key is garbage collected, the value can be removed from the Map too, and hence be garbage collected.
For implementing some kind of eventing or notification system, where "listeners" are registered with some kind of coordinator, so they can be informed when something occurs – but where you don't want to prevent these listeners from being garbage collected when they come to the end of their life. A WeakReference will point to the object while it is still alive, but point to "null" once the original object has been garbage collected.
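A sketch of the first use case with `java.util.WeakHashMap`. `MetadataRegistry` and the "expensive" derived value (here just a string length) are hypothetical. Two caveats worth knowing: `WeakHashMap` compares keys with `equals`/`hashCode`, so identity-like keys work best, and the value must not strongly reference its own key, or the entry can never be cleared.

```java
import java.util.Map;
import java.util.WeakHashMap;

class MetadataRegistry<K> {
    // Keys are held weakly: once a key object is unreachable elsewhere, its entry
    // becomes eligible for removal and the derived value can be collected with it.
    private final Map<K, Integer> derived = new WeakHashMap<>();

    // Hypothetical "expensive but reproducible" computation on an object we do not own.
    int describe(K key) {
        return derived.computeIfAbsent(key, k -> k.toString().length());
    }

    int size() {
        return derived.size();
    }
}
```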

We use it for that reason - in our example, we have a variety of listeners that must register with a service. The service keeps weak references to the listeners, while the instantiated classes keep strong references. If the classes at any time get GC'ed, the weak reference is all that remains of the listeners, which will then be GC'ed as well. It makes keeping track of the intermediary classes much easier.
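A sketch of that arrangement, with hypothetical `ServiceListener`/`EventService` names: the service wraps each listener in a `WeakReference`, and stale entries are dropped lazily when an event is published. A common pitfall with this design is registering a lambda or anonymous listener that nothing else holds strongly, since it can vanish at the next collection.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

interface ServiceListener {
    void onEvent(String event);
}

class EventService {
    // Only weak references here; the owners of the listeners hold the strong ones.
    private final List<WeakReference<ServiceListener>> listeners = new ArrayList<>();

    void register(ServiceListener listener) {
        listeners.add(new WeakReference<>(listener));
    }

    void publish(String event) {
        Iterator<WeakReference<ServiceListener>> it = listeners.iterator();
        while (it.hasNext()) {
            ServiceListener listener = it.next().get();
            if (listener == null) {
                it.remove(); // the listener was collected; drop the stale entry
            } else {
                listener.onEvent(event);
            }
        }
    }

    int registeredCount() {
        return listeners.size();
    }
}
```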

The most common usage of weak references is for values in "lookup" Maps.
With normal (hard) value references, if the value in the map no longer has references to it elsewhere, you often don't need the lookup any more. With weakly referenced map values, once there are no other references to it, the object becomes a candidate for garbage collection.
The fact that the map itself has a (the only) reference to the object does not stop it from being garbage collected, because the reference is a weak reference.

To prevent memory leaks, see this article for details.

A weak reference is a reference that does not protect the referent object from collection by a garbage collector. An object referenced only by weak references is considered unreachable (or "weakly reachable") and so may be collected at any time. Weak references are used to avoid keeping memory referenced by unneeded objects. Some garbage-collected languages feature or support various levels of weak references, such as Java, C#, Python, Perl, PHP or Lisp.

Garbage collection is used to reduce the potential for memory leaks and data corruption. There are two main types of garbage collection: tracing and reference counting. Reference counting schemes record the number of references to a given object and collect the object when the reference count becomes zero. Reference counting cannot collect cyclic (or circular) references, because every object in a cycle is still referenced by another member of the cycle, so no count ever drops to zero. Groups of mutually referencing objects which are not directly referenced by other objects and are unreachable can thus become permanently resident; if an application continually generates such unreachable groups, this will have the effect of a memory leak. Weak references may be used to solve the problem of circular references if the reference cycles are avoided by using weak references for some of the references within the group.

Weak references are also used to minimize the number of unnecessary objects in memory by allowing the program to indicate which objects are not critical by only weakly referencing them.
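A toy simulation of the cycle problem described above. It is not how any JVM manages memory (the JVM traces reachability instead); `RcNode` is a hypothetical class that keeps a manual count to show why pure reference counting gets stuck on a two-node cycle.

```java
class RcNode {
    int refCount = 0;
    RcNode next;

    // Points this node at another node, adjusting counts as a reference-counting scheme would.
    void pointTo(RcNode target) {
        if (next != null) {
            next.refCount--;
        }
        next = target;
        if (target != null) {
            target.refCount++;
        }
    }

    // A reference-counting collector may only free a node whose count has reached zero.
    boolean freeable() {
        return refCount == 0;
    }

    // Builds a two-node cycle, drops the external references, and reports whether
    // either node could be freed by pure reference counting.
    static boolean anyFreeableAfterDroppingRoots() {
        RcNode a = new RcNode();
        RcNode b = new RcNode();
        a.refCount++; // external reference from a local "root"
        b.refCount++;
        a.pointTo(b); // a -> b
        b.pointTo(a); // b -> a, closing the cycle
        a.refCount--; // the roots go away
        b.refCount--;
        // Each node is still counted by the other, so neither count is zero.
        return a.freeable() || b.freeable();
    }
}
```

If one of the two links did not increment the count (the role a weak reference plays in the quoted text), the cycle would no longer keep both nodes alive.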

I use it generally for some type of cache. Recently accessed items are available immediately and in the case of cache miss you reload the item (DB, FS, whatever).

I use WeakSet to encode links in a graph. If a node is deleted, the links automatically disappear.
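Java has no built-in `WeakSet` class; the usual equivalent is a set view over a `WeakHashMap`, created with `Collections.newSetFromMap`. A sketch with a hypothetical `GraphNode`, assuming the nodes themselves are strongly held somewhere else (for example in the graph's node list), since the links alone will not keep a node alive.

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

class GraphNode {
    final String name;
    // Weakly held links: when a target node becomes otherwise unreachable,
    // its entry disappears from this set after the GC clears it.
    final Set<GraphNode> links =
            Collections.newSetFromMap(new WeakHashMap<GraphNode, Boolean>());

    GraphNode(String name) {
        this.name = name;
    }

    void linkTo(GraphNode other) {
        links.add(other);
    }
}
```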

Can there be memory leak in Java

I get this question asked many times. What is a good way to answer

Can there be memory leak in Java?
The answer is that it depends on what kind of memory leak you are talking about.
Classic C / C++ memory leaks occur when an application neglects to free or dispose an object when they are done with it, and it leaks. Cyclic references are a sub-case of this where the application has difficulty knowing when to free / dispose, and neglects to do it as a result. Related problems are where the application uses an object after it has been freed, or attempts to free it twice. (You could call the latter problems memory leaks, or just bugs. Either way ... )
Java and other (fully¹) managed languages mostly don't suffer from these problems because the GC takes care of freeing objects that are no longer reachable. (Certainly, dangling pointer and double-free problems don't exist, and cycles are not problematic as they are for C / C++ "smart pointers" and other reference count schemes.)
But in some cases GC in Java will miss objects that (from the perspective of the programmer) should be garbage collected. This happens when the GC cannot figure out that an object cannot be reached:
The logic / state of the program might be such that the execution paths that would use some variable cannot occur. The developer can see this as obvious, but the GC cannot be sure, and errs on the side of caution (as it is required to).
The programmer could be wrong about it, and the GC is avoiding what might otherwise result in a dangling reference.
(Note that the causes of memory leaks in Java can be simple, or quite subtle; see @jonathan.cone's answer for some subtle ones. The last one potentially involves external resources that you shouldn't rely on the GC to deal with anyway.)
Either way, you can have a situation where unwanted objects cannot be garbage collected, and hang around tying up memory ... a memory leak.
Then there is the problem that a Java application or library can allocate off-heap objects via native code that need to be managed manually. If the application / library is buggy or is used incorrectly, you can get a native memory leak. (For example: Android Bitmap memory leak ... noting that this problem is fixed in later versions of Android.)
¹ I'm alluding to a couple of things. Some managed languages allow you to write unmanaged code where you can create classic storage leaks. Some other managed languages (or more precisely language implementations) use reference counting rather than proper garbage collecting. A reference count-based storage manager needs something (i.e. the application) to break cycles ... or else storage leaks will ensue.

Yes. Memory leaks can still occur even when you have a GC. For example, you might hold on to resources such as database result sets which you must close manually.

Well, considering that java uses a garbage collector to collect unused objects, you can't have a dangling pointer. However, you could keep an object in scope for longer than it needs to be, which could be considered a memory leak. More on this here: http://web.archive.org/web/20120722095536/http://www.ibm.com:80/developerworks/rational/library/05/0816_GuptaPalanki/
Are you taking a test on this or something? Because that's at least an A+ right there.

The answer is a resounding yes, but this is generally a result of the programming model rather than an indication of some defect in the JVM. This is common when frameworks have lifecycles different from that of the running JVM. Some examples are:
Reloading a context
Failing to dereference observers (listeners)
Forgetting to clean up resources after you're finished using them *
* - Billions of consulting dollars have been made resolving the last one

Yes, in the sense that your Java application can accumulate memory over time that the garbage collector is unable to free.
By maintaining references to unneeded/unwanted objects, they will never fall out of scope and their memory will not be claimed back.

Yes, if you don't de-reference objects they will never be garbage-collected and memory usage will increase. However, because of how Java is designed, this is difficult to achieve, whereas in some other languages it is sometimes difficult not to achieve.
Edit: read Amokrane's link. It's good.

Yes it is possible.
In Effective Java there is an example involving a stack implemented using arrays. If your pop operations simply decrement the index value it is possible to have a memory leak. Why? Because your array still has a reference to the popped value and you still have a reference to the stack object. So the correct thing to do for this stack implementation would be to clear the reference to the popped value using an explicit null assignment at the popped array index.
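A sketch of that array-backed stack with the fix applied. `ArrayStack` and the `slotCleared` inspection method are hypothetical names added here so the effect can be checked; the essential line is the `null` assignment in `pop`.

```java
import java.util.Arrays;
import java.util.EmptyStackException;

class ArrayStack<E> {
    private Object[] elements = new Object[16];
    private int size = 0;

    void push(E e) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, 2 * size);
        }
        elements[size++] = e;
    }

    @SuppressWarnings("unchecked")
    E pop() {
        if (size == 0) {
            throw new EmptyStackException();
        }
        E result = (E) elements[--size];
        // Clear the obsolete reference; without this line the array would keep the
        // popped object reachable for as long as the stack itself is reachable.
        elements[size] = null;
        return result;
    }

    // Inspection helper for this example only: is the given slot empty?
    boolean slotCleared(int index) {
        return elements[index] == null;
    }
}
```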

The short answer:
A competent JVM has no memory leaks, but more memory can be used than is needed, because not all unused objects have been garbage collected yet. Also, Java apps themselves can hold references to objects they no longer need, and this can result in a memory leak.

The book Effective Java gives two more reasons for "memory leaks":
Once you put an object reference in a cache, it's easy to forget that it's there; the reference remains in the cache long after it has become irrelevant. The solution is to represent the cache as a WeakHashMap.
In an API where clients register callbacks but don't deregister them explicitly. The solution is to store only weak references to them.

Yes, it can be, in a context where a program mistakenly holds a reference to an object that will never be used again, so it is not cleaned up by the GC.
An example to it would be forgetting to close an opened stream:
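import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class MemoryLeak {
    private void startLeaking() throws IOException {
        StringBuilder input = new StringBuilder();
        // The URL needs a scheme; without "http://" new URL(...) throws MalformedURLException.
        URLConnection conn = new URL("http://www.example.com/file.txt").openConnection();
        BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
        String line;
        // Read each line once; calling readLine() twice per iteration skipped every other line.
        while ((line = br.readLine()) != null) {
            input.append(line);
        }
        // The leak being illustrated: br (and the connection's stream) is never closed.
    }

    public static void main(String[] args) throws IOException {
        MemoryLeak ml = new MemoryLeak();
        ml.startLeaking();
    }
}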
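For contrast, a sketch of the non-leaking version using try-with-resources (Java 7+), which closes the reader, and with it the underlying stream, even if reading throws. `SafeReader` is a hypothetical name; it takes any `Reader`, so in the example above you would pass `new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

class SafeReader {
    // Reads all lines from 'source' and always closes it afterwards.
    static String readAll(Reader source) throws IOException {
        StringBuilder input = new StringBuilder();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                input.append(line).append('\n');
            }
        }
        return input.toString();
    }
}
```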

One simple answer: the JVM will take care of memory management for all your POJOs (plain old Java objects) as long as you are not working with JNI. With JNI, if you have made any memory allocation in native code, you have to take care of that memory yourself.

Yes. A memory leak is unused memory not released to the memory manager by the app.
I've seen Java code many times which stores items in a data structure but never removes them, filling memory until an OutOfMemoryError:
void f() {
    List<Integer> w = new ArrayList<Integer>();
    while (true) {
        w.add(Integer.valueOf(42)); // the list only grows; nothing is ever removed
    }
}
While this example is too obvious, Java memory errors tend to be more subtle. For example, using Dependency Injection to store a huge object in a component with SESSION scope, without releasing it when the object is no longer used.
On a 64-bit VM this tends to get worse, since swap space starts to fill up until the system crawls under too many I/O operations.
