Java efficiency - child object referencing parent object


I'm new to Java and garbage-collected languages, and I'm still getting my head around what it means to have an object reference (because I'm told it's not a pointer?), so I'm pondering this question:
I have a parent/child object structure where the parent will have several lists of several children each. Is there any inefficiency, or any other reason not to have a pointer in each child back to its parent? In my prior language (Delphi) it was a simple pointer, so not a problem at all. Are there any considerations with this practice in Java?

There shouldn't be any issue here. Technically, yes, Java references are not pointers, but for most purposes you can think of them similarly. In a typical JVM a reference is essentially an address into Java's heap (often compressed to 32 bits on 64-bit JVMs), so each additional place it's stored costs one reference-sized field. Reasonably small, generally speaking.
You can (generally!) trust Java to do the right thing when it comes to object management, and shouldn't have to worry too much about garbage collection or the intricacies of how object references work.
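A minimal sketch of the structure the question describes, with hypothetical class names. The parent-to-child and child-to-parent links form a cycle, but a tracing collector reclaims the whole group once nothing outside it refers to any of its objects, so the back-reference needs no special handling.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical classes: each Child keeps a plain back-reference to its Parent.
final class Parent {
    private final List<Child> children = new ArrayList<>();

    Child addChild(String name) {
        Child child = new Child(this, name); // the back-reference costs one reference-sized field
        children.add(child);
        return child;
    }

    List<Child> getChildren() {
        return Collections.unmodifiableList(children);
    }
}

final class Child {
    private final Parent parent; // behaves much like the Delphi pointer
    private final String name;

    Child(Parent parent, String name) {
        this.parent = parent;
        this.name = name;
    }

    Parent getParent() { return parent; }
    String getName() { return name; }
}
```

Dropping the last outside reference to the `Parent` (and to its children) makes the whole cycle unreachable; nothing needs to be nulled out first.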

From what I know, I'd say you'd be fine doing that. Java does a good job of cleaning up your garbage, and I usually have a 'parent' field in child classes.

As previous answers have stated, the GC is generally pretty good at clearing things up. Your primary concern (on Android) will be objects that persist after you leave an Activity and hold onto its Context. Such an object keeps your Activity in memory, because it holds a reference to the Activity from outside the Activity's own parent/child tree.
More on this here

I think it would be helpful to read up on the reference types as well: strong, weak, soft and phantom. Also read up on how GC works (for the different generations: young/survivor spaces and old generation), which garbage collectors are available, and the GC parameters you can specify.
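A quick way to see the four strengths side by side, using only `java.lang.ref` (the class name `ReferenceKinds` is made up for illustration). It checks only what is guaranteed while the object is still strongly reachable; whether and when the GC later clears the soft and weak references is up to the JVM.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

final class ReferenceKinds {
    static String describe() {
        Object strong = new Object();                           // ordinary (strong) reference
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        // While 'strong' is still in use, soft and weak references return the referent.
        boolean softSees = soft.get() == strong;
        boolean weakSees = weak.get() == strong;
        // A phantom reference never hands the referent back: get() always returns null.
        boolean phantomHidden = phantom.get() == null;
        return softSees + " " + weakSees + " " + phantomHidden;
    }
}
```

`ReferenceKinds.describe()` yields `"true true true"`.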


Is there a way to retrieve an object without having a reference to it?

Suppose the following code:
Object obj = new Object();
obj = null;
At this point, I don't have any reference to this object, but it's still on the heap, because garbage collection doesn't happen instantly. Is there a way to re-obtain a reference to this object before it is collected by the GC?
The only possible way I have seen so far is to use Unsafe, which provides direct memory access, but I would need to know exactly where in memory the object is allocated. There are also Weak/SoftReference, but they are implemented by special GC behavior.
P.S. To pre-empt questions like "Why do you need it?": Because science is not about why, it's about why not! (c)

This is highly JVM-implementation-specific. In a naive implementation that associates memory allocation information with each object, you could find an object whose memory has not been freed yet, and it seems you are thinking in that direction.
However, sophisticated JVMs don’t work that way. Associating allocation information with each object would create a giant overhead, given that you may have millions of objects in your runtime: not only in memory requirements, but also in the amount of work needed to maintain this information when allocating or freeing an object.
So what makes a part of your heap memory an object? Only the reference you are holding to it. The garbage collector traverses existing references and within the objects found this way, it will find meta information (i.e. a pointer to class specific information) needed to understand how much memory belongs to the object and how to interpret the contained data (to traverse the sub-references, if any). Everything unreferenced is unused per se and might contain old objects or might have never been used at all, who knows. Once all references to an object are gone, there is no information left about the former existence of this object.
Getting to the point, there is no explicit freeing action. When the garbage collector has found surviving objects, they will be copied to a dedicated new place and their old place is considered to be free, regardless of how many objects there were before and how much memory each individual object occupied when it was alive.
When you search memory that is considered to be unused, you may find remnants of old objects, but without references to their starting points, it’s impossible to say whether a bit pattern that looks like an object really is a dead object or just a coincidence. Even if you managed to resurrect an object that way, it would have nothing to do with your original idea, which was to regain the reference while the object still exists because the GC hasn't run yet.
Note that all modifications to this ordinary lifetime work by holding another reference to the object. E.g., when the class defines a non-trivial finalize() method, the JVM has to add a reference to the queue of objects needing finalization. Similarly, soft, weak and phantom references encapsulate a reference to the object in question. Also, a debugger may keep a reference to an object once it has seen it.
But for your trivial code Object obj = new Object(); obj = null;, assuming there’s no breakpoint set in-between, there will be no additional reference and hence no way of resurrecting the object. A JVM may even elide the entire allocation when optimizing the code at runtime, in which case you wouldn't even find remains of the object in RAM, as the object effectively never existed.

At this point, I don't have any reference to this object, but it's still on the heap, because garbage collection doesn't happen instantly.
It is undefined where it is, and it is also undefined whether or not garbage collection happens instantly.
Is there a way to re-obtain a reference to this object before it is collected by the GC?
You already had one and you threw it away. Just keep it.
I would need to know exactly where in memory the object is allocated.
There is nothing in standard Java that will tell you that, and no useful way you could make use of the information if you could get it.
There are also Weak/SoftReference, but they are implemented by special GC behavior.
I don't see how this affects your question, whatever it is.

Free memory from complex objects in Java

I'll do my best to explain my question. Maybe it's a bit abstract.
I have read some literature about not invoking the GC explicitly in Java code, the finalize method, setting references to null, etc.
I have some large XML files (customer invoices). Using JaxB, each file is unmarshalled into a complex Java object. Its attributes are basic types (Integer, BigDecimal, String, etc.) but also instances of other complex classes, lists of other classes, lists of classes that themselves have lists as attributes, etc.
When I'm done with the object, I need to remove it from memory. Some XML files are very large, and I want to avoid a memory leak or an OutOfMemoryError.
So, my questions are:
Is it enough to assign big object to null? I read that, if there are soft references, GC will not free the object.
Should I do a deep clearing of the object, clearing all lists, assigning null to the attributes, etc.?
What about JaxB (I'm using Java6, so JaxB is built in) and soft references? JaxB is faster than old JibX marshaller, but I don't know if it's worse in memory usage.
Should I wrap the megacomplex JaxB class with WeakReference or something like this?
Excuse me for mixing Java memory usage concepts, JaxB, etc. I'm studying the stability of a large running process, and the .hprof files evidence that all customers data of all invoices remains in memory.
Excuse me if it's a simple, basic or rare question.
Thanks in advance

Unless something else points to parts of your big object (graph), assigning the big object reference null is enough.
The safest approach, though, would be to use a profiler after your application has been running for a while, look at the object references, and see if there's something that isn't properly GC'ed.
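A small illustration of the "unless something else points to parts of your big object" case. The `Invoice`, `InvoiceLine` and `InvoiceProcessor` classes and the static `AUDIT_LOG` are hypothetical stand-ins for the JaxB-generated classes and for some long-lived structure in the application. Letting the local `invoice` variable go out of scope is enough on its own, but every line stored in the static list stays strongly reachable regardless.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the unmarshalled invoice graph.
final class InvoiceLine {
    final BigDecimal amount;
    InvoiceLine(BigDecimal amount) { this.amount = amount; }
}

final class Invoice {
    final List<InvoiceLine> lines = new ArrayList<>();
}

final class InvoiceProcessor {
    // Hypothetical long-lived structure: anything reachable from here stays in memory.
    static final List<InvoiceLine> AUDIT_LOG = new ArrayList<>();

    static BigDecimal process(Invoice invoice, boolean leak) {
        BigDecimal total = BigDecimal.ZERO;
        for (InvoiceLine line : invoice.lines) {
            total = total.add(line.amount);
            if (leak) {
                AUDIT_LOG.add(line); // keeps this part of the graph strongly reachable
            }
        }
        return total;
    }

    static BigDecimal processAll(int count, boolean leak) {
        BigDecimal grandTotal = BigDecimal.ZERO;
        for (int i = 0; i < count; i++) {
            Invoice invoice = new Invoice();   // stand-in for unmarshalling one file
            invoice.lines.add(new InvoiceLine(BigDecimal.TEN));
            grandTotal = grandTotal.add(process(invoice, leak));
            // 'invoice' goes out of scope here; no explicit null assignment is needed,
            // but any line added to AUDIT_LOG survives anyway.
        }
        return grandTotal;
    }
}
```

In a heap dump, such leaked lines would show `AUDIT_LOG` on their path to a GC root.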

Is it enough to assign big object to null? I read that, if there are soft references, GC will not free the object.
The short answer is yes. It is enough to assign (all strong references to) a big object to null - if you do this, the object will no longer be considered "strongly reachable" by the Garbage Collector.
Soft references will not be a problem in your case, because it's guaranteed that softly reachable objects will be garbage collected before an OutOfMemoryError is thrown. They might well prevent the garbage collector from collecting the object immediately (if they didn't, they'd act exactly the same as weak references). But this memory use would be "temporary", in that it would be freed up if it were needed to fulfil an allocation request.
Should I do a deep clearing of the object, clearing all lists, assigning null to the attributes, etc.?
That would probably be a bad idea. If the field values are only referenced by the outer big object, then they will also be garbage collected when the big object is collected. And if they are not, then the other parts of the code that reference them will not be happy to see that you're removing members from a list they're using!
In the best case this does nothing, and in the worst case this will break your program. Don't let the lure of this distract you from addressing the sole actual issue of whether your object is strongly-reachable or not.
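A small illustration of the "worst case", with hypothetical class names. When another object happens to share a list with the big object, "deep clearing" it for the GC's sake silently empties the other object's data, whereas simply dropping the reference to the big object would have left it intact.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SharedListDemo {
    // Hypothetical: two objects that share the same list instance.
    static final class Report {
        final List<String> customers;
        Report(List<String> customers) { this.customers = customers; }
    }

    static final class BigObject {
        List<String> customers;
        BigObject(List<String> customers) { this.customers = customers; }
    }

    static int remainingAfterDeepClear() {
        List<String> shared = new ArrayList<>(Arrays.asList("ACME", "Globex"));
        Report report = new Report(shared);
        BigObject big = new BigObject(shared);

        // "Deep clearing" the big object also empties the list the report still uses.
        big.customers.clear();
        big.customers = null;
        big = null;

        return report.customers.size(); // 0: the report has lost its data
    }
}
```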
What about JaxB (I'm using Java6, so JaxB is built in) and soft references? JaxB is faster than old JibX marshaller, but I don't know if it's worse in memory usage.
I'm not especially familiar with the relative time and space performance of those libraries. But in general, it's safe to assume a very strong "innocent until proven guilty" attitude with core libraries. If there were a memory leak bug, it would probably have been found, reported and fixed by now (unless you're doing something very niche).
If there's a memory leak, I'm 99.9% sure that it's your own code that's at fault.
Should I wrap the megacomplex JaxB class with WeakReference or something like this?
This sounds like you may be throwing GC "fixes" at the problem without thinking through what is actually needed.
If the JaxB class ought to be weakly referenced, then by all means this is a good idea (and it should be there already). But if it shouldn't, then definitely don't do this. Weak referencing is more a question of the overall semantics, and shouldn't be something you introduce specifically to avoid memory issues.
If the outer code needs a reference to the object, then it needs a reference - there's no magic you can do to have the instance be garbage collected yet still available. If it doesn't need a reference (beyond a certain point), then it doesn't need one at all - better to just nullify a standard [strong] reference, or let it fall out of scope. Weak references are a specialist situation, and are generally used when you don't have full control over the point where an object ceases to be relevant. Which is probably not the case here.
the .hprof files evidence that all customers data of all invoices remains in memory.
This suggests that they are indeed being referenced longer than is necessary.
The good news is that the hprof file will contain details of exactly what is referencing them. Look at an invoice instance that you would expect to have been GCed, and see what is referencing it and preventing it from being GCed. Then look into the class in question to see how you expect that reference to be freed, and why it hasn't been in this case.
All good performance/memory tweaking is based on measurements. Taking heap dumps and inspecting the instances and the references to them are your measurements. Do this and act on the results, rather than wrapping things in WeakReferences in the hope that it might help.
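If it is awkward to trigger a dump from outside the process, one option on HotSpot-based JVMs (OpenJDK, Oracle) is `com.sun.management.HotSpotDiagnosticMXBean`, which can write an .hprof file from inside the running program. A minimal sketch, assuming such a JVM; the `HeapDumper` class and the temp-file naming are illustrative only.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

final class HeapDumper {
    /** Writes an .hprof heap dump of the current JVM (HotSpot-specific API). */
    static Path dump(boolean liveObjectsOnly) {
        try {
            Path file = Files.createTempFile("invoices-", ".hprof");
            Files.delete(file); // dumpHeap refuses to write to a file that already exists
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // live = true dumps only objects that are still reachable, which is what
            // matters when looking for invoices that should already be gone.
            bean.dumpHeap(file.toString(), liveObjectsOnly);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting file can be opened in a heap analyzer to see which references keep a supposedly dead invoice alive.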

You wrote
hprof files evidence that all customers data of all invoices remains in memory.
You should analyse it using the Eclipse Memory Analyzer (MAT). There are some good notes at http://memoryanalyzer.blogspot.in/

Is there a practical use for weak references? [duplicate]

Possible Duplicate:
Weak references - how useful are they?
Since weak references can be claimed by the garbage collector at any time, is there any practical reason for using them?

If you want to keep a reference to something as long as it is used elsewhere e.g. a Listener, you can use a weak reference.
WeakHashMap can be used as a short-lived cache mapping keys to derived data. It can also be used to keep information about objects that are used elsewhere when you don't know when those objects are discarded.
BTW, soft references are like weak references, but their referents are not always cleaned up immediately. The GC clears a weak reference as soon as it finds the referent only weakly reachable, but keeps softly reachable objects around as long as memory allows.
There is another kind of reference called a phantom reference. It is used in the GC clean-up process and refers to an object that isn't accessible to "normal" code because it's in the process of being cleaned up.
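A minimal sketch of the "keys to derived data" use, assuming the key objects come from elsewhere and you cannot tell when they are discarded. `DerivedDataCache` and `expensiveSizeOf` are hypothetical names; the string-length computation stands in for something genuinely expensive.

```java
import java.util.Map;
import java.util.WeakHashMap;

final class DerivedDataCache {
    // Keys are held weakly: once a key is no longer strongly reachable elsewhere,
    // its entry becomes eligible for removal. Values are held strongly, so they
    // must not refer back to their own key, or the key can never be collected.
    private final Map<Object, Integer> sizes = new WeakHashMap<>();
    private int computations = 0;

    int expensiveSizeOf(Object key) {
        return sizes.computeIfAbsent(key, k -> {
            computations++;
            return k.toString().length(); // hypothetical stand-in for an expensive calculation
        });
    }

    int computations() { return computations; }
}
```

Note that `WeakHashMap` compares keys with `equals`/`hashCode`, so it works most naturally with keys that use identity equality.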

Since weak reference can be claimed by garbage collector at any time, is there any practical reason to use it?
Of course there are practical reasons to use it. It would be awfully strange if the framework designers went to the enormous expense of building a weak reference system that was impractical, don't you think?
I think the question you intended to ask was:
What are realistic situations in which people use weak references?
There are many. A common one is to achieve a performance goal. When performance tuning an application one often must make a tradeoff between more memory usage and more time usage. Suppose for example there is a complex calculation that you must perform many times, but the computation is "pure" -- the answer depends only on the arguments, not upon exogenous state. You can build a cache -- a map from the arguments to the result -- but that then uses memory. You might never ask the question again, and that memory would then be wasted.
Weak references possibly solve this problem; the cache can get quite large, and therefore time is saved if the same question is asked many times. But if the cache gets large enough that the garbage collector needs to reclaim space, it can do so safely.
The downside is of course that the cleanup policy of the garbage collector is tuned to meet the goals of the whole system, not your specific cache problem. If the GC policy and your desired cache policy are sufficiently aligned then weak references are a highly pragmatic solution to this problem.
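A minimal Java sketch of that memoizing cache. One hedge: in Java a SoftReference is usually a better fit for this than a WeakReference, because the GC clears a weak reference as soon as the referent is only weakly reachable, while softly reachable objects are kept until memory is needed. `MemoCache` and its methods are illustrative names, not a library API.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** A memoizing cache whose results the GC may reclaim under memory pressure. */
final class MemoCache<K, V> {
    private final Map<K, SoftReference<V>> cache = new HashMap<>();
    private final Function<K, V> compute; // must be "pure": the result depends only on the key
    private int misses = 0;

    MemoCache(Function<K, V> compute) { this.compute = compute; }

    V get(K key) {
        SoftReference<V> ref = cache.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {               // never computed, or reclaimed by the GC
            misses++;
            value = compute.apply(key);
            cache.put(key, new SoftReference<>(value));
        }
        return value;
    }

    int misses() { return misses; }
}
```

A production cache would also purge entries whose references have been cleared (for example by registering them with a `ReferenceQueue`); this sketch leaves those stale keys in the map.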

If a WeakReference is the only reference to an object, and you want the object to hang around, you should probably be using a SoftReference instead.
WeakReferences are best used in cases where there will be other references to the object, but you can't (or don't want to have to) detect when those other references are no longer used. Then, the other reference will prevent the object from being garbage collected, and the WeakReference will just be another way of getting to the same object.
Two common use cases are:
For holding additional (often expensively calculated but reproducible) information about specific objects that you cannot modify directly, and whose lifecycle you have little control over. WeakHashMap is a perfect way of holding these references: the key in the WeakHashMap is only weakly held, and so when the key is garbage collected, the value can be removed from the Map too, and hence be garbage collected.
For implementing some kind of eventing or notification system, where "listeners" are registered with some kind of coordinator, so they can be informed when something occurs – but where you don't want to prevent these listeners from being garbage collected when they come to the end of their life. A WeakReference will point to the object while it is still alive, but point to "null" once the original object has been garbage collected.

We use it for that reason - in our example, we have a variety of listeners that must register with a service. The service keeps weak references to the listeners, while the instantiated classes keep strong references. If the classes at any time get GC'ed, the weak reference is all that remains of the listeners, which will then be GC'ed as well. It makes keeping track of the intermediary classes much easier.
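A minimal sketch of such a service, with hypothetical names. The service stores only `WeakReference`s, so a listener lives exactly as long as whoever registered it keeps a strong reference; stale entries are pruned when events are fired.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical service that holds its listeners only weakly. */
final class WeakListenerService<E> {
    private final List<WeakReference<Consumer<E>>> listeners = new ArrayList<>();

    void register(Consumer<E> listener) {
        listeners.add(new WeakReference<>(listener));
    }

    /** Notifies live listeners and prunes those that have been garbage collected. */
    int fire(E event) {
        int notified = 0;
        for (Iterator<WeakReference<Consumer<E>>> it = listeners.iterator(); it.hasNext(); ) {
            Consumer<E> listener = it.next().get();
            if (listener == null) {
                it.remove();               // its owner is gone; forget the stale entry
            } else {
                listener.accept(event);
                notified++;
            }
        }
        return notified;
    }
}
```

One pitfall with this design: registering a lambda that nobody else holds means the listener can disappear at the next collection, so the registering class must keep its own strong reference.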

The most common usage of weak references is for values in "lookup" Maps.
With normal (hard) value references, if the value in the map is no longer referenced anywhere else, you often don't need the lookup any more. With weakly referenced map values, once there are no other references to a value, it becomes a candidate for garbage collection.
The fact that the map itself holds a (the only) reference to the object does not stop it from being garbage collected, because that reference is a weak reference.
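The JDK's `WeakHashMap` weakens keys, not values, so a lookup map with weak values has to wrap each value itself. A minimal sketch, with the hypothetical name `WeakValueLookup`:

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup table whose values are held only weakly. */
final class WeakValueLookup<K, V> {
    private final Map<K, WeakReference<V>> map = new HashMap<>();

    void put(K key, V value) {
        map.put(key, new WeakReference<>(value));
    }

    /** Returns the value if it is still strongly reachable elsewhere, otherwise null. */
    V get(K key) {
        WeakReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (ref != null && value == null) {
            map.remove(key); // the value was collected; drop the stale entry
        }
        return value;
    }
}
```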

To prevent memory leaks, see this article for details.

A weak reference is a reference that does not protect the referent object from collection by a garbage collector.
An object referenced only by weak references is considered unreachable (or "weakly reachable") and so may be collected at any time. Weak references are used to avoid keeping memory referenced by unneeded objects. Some garbage-collected languages feature or support various levels of weak references, such as Java, C#, Python, Perl, PHP or Lisp.
Garbage collection is used to reduce the potential for memory leaks and data corruption. There are two main types of garbage collection: tracing and reference counting. Reference counting schemes record the number of references to a given object and collect the object when the reference count becomes zero. Reference-counting cannot collect cyclic (or circular) references because only one object may be collected at a time. Groups of mutually referencing objects which are not directly referenced by other objects and are unreachable can thus become permanently resident; if an application continually generates such unreachable groups of unreachable objects this will have the effect of a memory leak. Weak references may be used to solve the problem of circular references if the reference cycles are avoided by using weak references for some of the references within the group.
Weak references are also used to minimize the number of unnecessary objects in memory by allowing the program to indicate which objects are not critical by only weakly referencing them.
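A toy reference-counting model (not how mainstream JVMs manage memory) that shows the cycle problem described above. `RefCountDemo` and its nodes are purely illustrative: dropping the only external reference frees an acyclic pair, but a two-node cycle keeps each count above zero.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy reference-counting model showing why cycles leak. */
final class RefCountDemo {
    static final class Node {
        final String name;
        int refCount = 0;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    static void addRef(Node from, Node to) {
        from.refs.add(to);
        to.refCount++;
    }

    /** Drops one reference to 'node'; frees it and cascades if the count reaches zero. */
    static void release(Node node, List<String> freed) {
        if (--node.refCount == 0) {
            freed.add(node.name);
            for (Node child : node.refs) release(child, freed);
            node.refs.clear();
        }
    }

    static List<String> dropExternalReference(boolean cyclic) {
        Node a = new Node("a"), b = new Node("b");
        a.refCount++;                  // one external ("root") reference to a
        addRef(a, b);                  // a -> b
        if (cyclic) addRef(b, a);      // b -> a closes the cycle
        List<String> freed = new ArrayList<>();
        release(a, freed);             // the program drops its reference to a
        return freed;                  // [a, b] without the cycle, [] with it
    }
}
```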

I use it generally for some type of cache. Recently accessed items are available immediately and in the case of cache miss you reload the item (DB, FS, whatever).

I use WeakSet to encode links in a graph. If a node is deleted, the links automatically disappear.
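The JDK has no `WeakSet` class, but a set view over a `WeakHashMap` gives the same effect. A minimal sketch with a hypothetical `GraphNode`. Here "deleted" means the node is no longer strongly referenced elsewhere (say, removed from the graph's master node list); its incoming links then vanish after a collection, not instantly.

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

/** Hypothetical graph node whose outgoing links do not keep their targets alive. */
final class GraphNode {
    final String label;
    // Set view backed by a WeakHashMap: members are held only weakly.
    final Set<GraphNode> links = Collections.newSetFromMap(new WeakHashMap<>());

    GraphNode(String label) { this.label = label; }

    void linkTo(GraphNode other) { links.add(other); }
}
```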

Does circular GC work in a map?

I have a User object which strongly refers to a Data object.
If I create a Map<Data, User> (with Guava's MapMaker) with weak keys, such a key would only be removed if it's not referenced anywhere else. However, it is always referred to by the User object that it maps to, which in turn is only removed from the map when the Data key is removed, i.e. never, unless the GC's circular reference detection also works across a map (I hope you understand what I mean :P).
Will Users+Datas be garbage collected if they're no longer used elsewhere in the application, or do I need to specify weak values as well?

The GC doesn't detect circular references because it doesn't need to.
The approach it takes is to keep all the objects that are strongly reachable from root nodes, e.g. thread stacks. This way, objects that are not strongly reachable (whether or not they are part of a cycle) are collected.
EDIT: This may help explain the "myth"
http://www.javacoffeebreak.com/articles/thinkinginjava/abitaboutgarbagecollection.html
Reference counting is commonly used to explain one kind of garbage collection but it doesn't seem to be used in any JVM implementations.
This is an interesting link http://www.ibm.com/developerworks/library/j-jtp10283/
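A toy mark phase, purely illustrative (object graph as a map of names), shows why cycles need no special detection: anything not reached from the roots is garbage, including the `x`/`y` cycle below. `MarkFromRoots` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy mark phase: everything not reachable from the roots is garbage, cycles included. */
final class MarkFromRoots {
    static Set<String> mark(Map<String, List<String>> heap, List<String> roots) {
        Set<String> live = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (live.add(obj)) {                            // first visit: follow its references
                work.addAll(heap.getOrDefault(obj, Collections.emptyList()));
            }
        }
        return live;
    }
}
```

With `stack -> user -> data` plus an isolated cycle `x <-> y`, marking from `stack` returns `{stack, user, data}`; `x` and `y` are never visited, so a collector would reclaim them.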

The documentation says:
weakKeys()
Specifies that each key (not value) stored in the map should be wrapped in a WeakReference (by default, strong references are used).
Since the key is weakly referenced, it will be collected.
http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/MapMaker.html
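One caveat worth checking against the `java.util.WeakHashMap` javadoc: it warns that value objects must not strongly refer to their own keys, directly or indirectly, because the map holds values strongly and that keeps the keys reachable. Guava's `weakKeys()` likewise weakens only the key by default. A minimal JDK-only sketch of the "weak values as well" option, with hypothetical `Data`, `User` and `UserIndex` classes, wraps each value in a `WeakReference`:

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

final class Data { }

final class User {
    final Data data;          // the value strongly refers to its own key
    User(Data data) { this.data = data; }
}

final class UserIndex {
    // Weak keys AND weak values: the map itself keeps neither Data nor User alive.
    private final Map<Data, WeakReference<User>> byData = new WeakHashMap<>();

    void add(User user) { byData.put(user.data, new WeakReference<>(user)); }

    User find(Data data) {
        WeakReference<User> ref = byData.get(data);
        return ref == null ? null : ref.get();
    }
}
```

Once the application stops using a `User`, nothing strong points at it, so it can be collected; its `Data` key is then only weakly reachable and the entry can be expunged. Storing the `User` directly as the value would instead keep the `Data` key reachable through the map for as long as the map lives.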

How are weak references implemented?

I wonder how weak references work internally, for example in .NET or in Java. My three general ideas are:
"Intrusive": add a list of weak references to the topmost class (the object class). Then, when an object is destroyed, all its weak references can be iterated over and set to null.
"Non-intrusive": maintain a hashtable mapping object pointers to lists of weak references. When a weak reference A is created to an object B, an entry in the hashtable would be modified or created, whose key would be the pointer to B.
"Dirty": store a special hash value with each object, which would be zeroed when the object is destroyed. Weak references would copy that hash value and compare it with the object's value to check whether the object is alive. However, this would cause access violation errors when used directly, so I think there would need to be an additional object holding that hash value.
None of these solutions seems clean or efficient. Does anyone know how it is actually done?

In .NET, when a WeakReference is created, the GC is asked for a handle/opaque token representing the reference. Then, when needed, WeakReference uses this handle to ask the GC if that handle is still valid (i.e. the original object still exists) - and if so, it can get the actual object reference.
So this is building a list of tokens/handles against object addresses (and presumably maintaining that list during defragmentation etc)
I'm not sure I 100% understand the three bullets, so I hesitate to guess which (if any) that is closest to.

Not sure I understood your question, but you can have a look at the implementation for the class WeakReference and its superclass Reference in Java. It is well commented and you can see it has a field treated specially by the GC and another one used directly by the VM.

Python's PEP 205 has a decent explanation of how weak references should behave in Python, and this gives some insight into how they can be implemented. Since a weak reference is immutable, you could have just one for each object, to which you pass out references as needed. Thus, when the object is destroyed, only one weak reference needs to be invalidated.

It seems that implementation of weak references is well-kept secret in the industry ;-). For example, as of now, wikipedia article lacks any implementation details. And look at the answers above (including the accepted): "go look at the source" or "I think" ;-\ .
Of all the answers, only the one referencing Python's PEP 205 is insightful. As it says, for any single object, there can be at most one weak reference, if we treat weakref as an entity itself.
The rest describes the Squirrel language's implementation. A weakref is itself an object: when you put a weak reference to an object in some container, you actually put a reference to the weakref object. Each ref-countable object has a field to store a pointer to its weakref, which is NULL until a weakref to that object is actually requested. Each object has a method to request a weakref, which either returns the existing (singleton) weakref from the field, or creates one and caches it in the field.
Of course, the weakref points to the original object. So you then just need to go through all the places where references to objects are handled and add transparent handling of weakrefs (i.e. automatically dereference them). (The "transparent" alternative is to add a virtual "access" method which is the identity for most objects and an actual dereference for weakrefs.)
And as object has pointer to its weakref, then the object can NULLify the weakref in own destructor.
This implementation is pretty clean (no magic "calls into GC" and stuff) and has O(1) runtime cost. Of course, it's pretty greedy of memory - need to add +1 pointer field to each object, even though typically for 90+% objects that would be NULL. Of course, VHLLs already have large memory overhead per object, and there may be chance to compact different "extra" fields. For example, object type is typically a small enumeration, so it may be possible to merge type and some kind of weakref reference into single machine word (say, keep weakref objects in a separate arena, and use index to that).
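A toy Java model of that scheme, with hypothetical `ToyObject`/`ToyWeakRef` classes and an explicit `destroy()` standing in for the destructor that runs when a ref count hits zero. It is not how the JVM implements `java.lang.ref.WeakReference`; it only shows the lazily created singleton weakref and the O(1) invalidation.

```java
/** Toy object with a lazily created, cached weakref slot. */
final class ToyObject {
    private ToyWeakRef weakRef;   // null until someone first asks for a weak reference
    final String payload;

    ToyObject(String payload) { this.payload = payload; }

    /** Returns the single weakref object for this object, creating it on first request. */
    ToyWeakRef weakRef() {
        if (weakRef == null) weakRef = new ToyWeakRef(this);
        return weakRef;
    }

    /** Stands in for the destructor: invalidate the one weakref, if any. */
    void destroy() {
        if (weakRef != null) weakRef.target = null;   // O(1), no scan needed
    }
}

final class ToyWeakRef {
    ToyObject target;
    ToyWeakRef(ToyObject target) { this.target = target; }
    ToyObject get() { return target; }
}
```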

The normal approach, I think, is for the system to maintain some sort of list of weak references. When the garbage collector executes, before dead objects are removed, the system iterates through the list of weak references and invalidates any reference whose target has not been tagged live. Depending upon the system, this may occur before or after the system temporarily resurrects objects which are eligible for immediate finalization (in the case of .net, there are two kinds of WeakReference--one of which is effectively processed before the system scans for finalizers, meaning that it will become invalid when its target becomes eligible for finalization, and one of which is processed after).
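A toy sketch of that approach, assuming a simple mark-and-sweep collector (all names hypothetical, and finalization is ignored): mark everything strongly reachable from the roots, then walk the system-wide list of weak references and clear those whose targets were not marked, before anything is reclaimed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy collector: mark from roots, then clear weak refs whose targets were not marked. */
final class ToyCollector {
    static final class Obj {
        final List<Obj> strongRefs = new ArrayList<>();
    }

    static final class ToyWeak {
        Obj target;
        ToyWeak(Obj target) { this.target = target; }
    }

    final List<Obj> roots = new ArrayList<>();
    final List<ToyWeak> weakRefs = new ArrayList<>();   // the list of all weak references

    ToyWeak newWeak(Obj target) {
        ToyWeak w = new ToyWeak(target);
        weakRefs.add(w);
        return w;
    }

    void collect() {
        // 1. Mark phase: find everything strongly reachable from the roots.
        Set<Obj> marked = new HashSet<>();
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (marked.add(o)) work.addAll(o.strongRefs);
        }
        // 2. Before anything is reclaimed, invalidate weak refs to unmarked targets.
        for (ToyWeak w : weakRefs) {
            if (w.target != null && !marked.contains(w.target)) w.target = null;
        }
        // 3. A real collector would now reclaim the unmarked objects.
    }
}
```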
Incidentally, if I were designing a GC-based framework, I would add a couple of other goodies: (1) a means of declaring a reference-type storage location as holding a reference that's primarily of interest to someone else, and (2) a variety of WeakReference which could indicate that the only references to an object are in "of interest to someone else" storage locations. Although WeakReference is a useful type, the act of turning a weak reference into a strong reference may prevent the system from ever recognizing that nobody would mind if its target disappeared.
