What are WeakReferences, Weakhashmaps, softreferences used for?

Please explain what WeakReferences are used for. I usually do understand Java concepts, but this one is giving me trouble.
I do understand what WeakReferences are, but their usage and nature are a little vague in my head. I am not able to visualize a scenario in which using WeakReferences becomes a necessity.
I also know that WeakHashMap is related to WeakReferences, in that an entry whose key has become null gets removed automatically. I can't visualize how this works: I have a WeakHashMap somewhere, some other process nullifies a key, and then the WeakHashMap saves the day by removing that entry.
Also, this article that everyone refers to does not provide a case study that would help me understand.
If anyone out there can come up with a scenario and give me some understanding into this, I would be really grateful.

Weak references are basically used when you don't want an object to "stick" around if no one else is pointing to it. One very common use case, which I believe helps when thinking about weak references, is using a WeakHashMap to maintain a canonical mapping.
Consider a case wherein you need to maintain a mapping between a Class<?> instance and the list of all methods it holds. Given that the JVM is perfectly capable of dynamic class loading and unloading, it's quite possible that the class you have in your map as a key is no longer needed (nothing else points to it). Now, if you had used a "strong" reference to maintain the class-to-method mapping, your class would stick around as long as your map is reachable, which isn't a good position to be in. What you really want is that once there are no live references to your class, the map lets it go. This is exactly what a WeakHashMap is used for. Keep in mind that a WeakHashMap holds its values strongly, so the values must not strongly refer back to their key, or the entry can never be released.
EDIT: I would recommend giving this thread a read.
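The class-to-methods mapping described above could be sketched roughly as follows. The names MethodCache and methodNamesOf are made up for illustration. The sketch stores method names rather than java.lang.reflect.Method objects, on the assumption that names are enough for the caller: a Method strongly references its declaring Class, and a value that refers to its own key would keep the entry alive.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.stream.Collectors;

// Hypothetical helper: caches the declared method names of each class.
final class MethodCache {
    // Keys are held weakly: once a Class is no longer strongly reachable
    // elsewhere (for example after its class loader is discarded), its
    // entry becomes eligible for removal. WeakHashMap is not thread-safe,
    // so it is wrapped in a synchronized view.
    private static final Map<Class<?>, List<String>> CACHE =
            Collections.synchronizedMap(new WeakHashMap<Class<?>, List<String>>());

    static List<String> methodNamesOf(Class<?> type) {
        return CACHE.computeIfAbsent(type, t ->
                Arrays.stream(t.getDeclaredMethods())
                      .map(Method::getName)
                      .collect(Collectors.toList()));
    }
}
```

Calling MethodCache.methodNamesOf(String.class) twice returns the same cached list, because String.class stays strongly reachable for the lifetime of the JVM.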

Related

Identity Hashcode to Java Object

A friend of mine and I have the following bet going:
It is possible to get the Object again from the memory by using the Identity Hashcode received for that Object using System.identityHashCode() in Java. With the restriction that it has not yet been cleaned up by the Garbage Collector.
I have been looking for an answer for quite some while now and am not able to find a definite one.
I think that it might be possible to do so using the JVMTI, but I haven't worked with it yet.
Does any of you have an answer to that? I will buy you a coffee if I can do so on your site ;)
Thanks in advance,
Felix
P.S. I am saying this behaviour can be achieved, and my friend says it is not possible.

In theory it is possible; however, you have some issues:
It is randomly generated, so it is not unique. Any number of objects (though unlikely) could have the same identity hash code.
It is not a memory location: it doesn't change when the object is moved from Eden, around the survivor spaces, or into the tenured space.
You need to find all the object roots to potentially find it.
If you can assume it is visible to a known object, like a static collection, it should be easy to navigate to via reflection.
BTW On the 64-bit OpenJDK/Oracle JVM, the identity hash code is stored in the object header from offset 1, which means you can read it, or even change it, using sun.misc.Unsafe. ;)
BTW2 The 31-bit hashCode (not 32-bit) stored in the header is lazily set and is also used for biased locking. i.e. once you call Object.hashCode() or System.identityHashCode() you disable biased locking for the object.
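As a rough sketch of the "known collection" case above: if the candidates are already reachable from something you hold, you can scan them and compare identity hash codes. IdentityLookup and findByIdentityHash are hypothetical names. The method returns a list because more than one object may share the same identity hash code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: scans objects we can already reach.
final class IdentityLookup {
    // Returns every candidate whose identity hash code matches. There may be
    // several, since identity hash codes are not guaranteed to be unique.
    static List<Object> findByIdentityHash(Iterable<?> candidates, int hash) {
        List<Object> matches = new ArrayList<>();
        for (Object candidate : candidates) {
            if (candidate != null && System.identityHashCode(candidate) == hash) {
                matches.add(candidate);
            }
        }
        return matches;
    }
}
```

This only works for objects you can still reach by other means; the hash code itself gives no way to locate an arbitrary object on the heap.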

I think your friend is going to win this bet. Java/the JVM manages the memory for you and there is no way to access it once you drop all your references to something.
Weak references, soft references, etc. are designed to allow just what you are describing: if you keep a WeakReference to something, you can get the object back through it for as long as it has not been collected. (A PhantomReference does not help here, because its get() always returns null.) identityHashCode is neither, though.
C and C++ might let you do this, since you have more direct control of the memory, but even then you would need the memory location, not a hash of it.
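If the goal is to get back to an object from its identity hash code, one option is to register the object yourself while you still hold it. The sketch below is only an illustration under that assumption: WeakRegistry, register and lookup are invented names, and stale entries are never pruned, to keep it short.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry: maps identity hash codes to weakly held objects.
final class WeakRegistry {
    private final Map<Integer, List<WeakReference<Object>>> byHash = new HashMap<>();

    void register(Object o) {
        List<WeakReference<Object>> refs =
                byHash.computeIfAbsent(System.identityHashCode(o), k -> new ArrayList<>());
        refs.add(new WeakReference<Object>(o));
    }

    // Returns the registered objects with this identity hash that are still alive.
    List<Object> lookup(int identityHash) {
        List<Object> live = new ArrayList<>();
        List<WeakReference<Object>> refs = byHash.get(identityHash);
        if (refs != null) {
            for (WeakReference<Object> ref : refs) {
                Object o = ref.get();   // null once the referent has been collected
                if (o != null) {
                    live.add(o);
                }
            }
        }
        return live;
    }
}
```

Because the registry holds only weak references, it does not keep registered objects alive; a lookup can therefore come back empty once the object has been collected.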

No, because the identityHashCodes are not necessarily unique. They are not pointers to the objects.

No. The identityHashCode is not necessarily a memory address: it is only the default implementation of hashCode. It is also not guaranteed to be unique for all objects (but different instances should have different identityHashCodes).
Even if the identityHashCode is derived from a memory address, the object may be moved to a different address by the garbage collector (but the identityHashCode cannot change, by definition).

Keeping track of already seen objects

I'm trying to implement an interceptor for my application, that would be able to keep track of objects it has seen. I need to be able to tell, whether the object I'm seeing now is something new, or a reused one.
Assuming I have an interface like this one:
public interface Interceptor {
    void process(Object o);
}
I've been thinking about adding a Set that would keep track of those objects. But since I don't want to cause memory leaks with that kind of behavior, perhaps I should devise some other pattern? In the end, those objects may be destroyed in other layers.
Possible solutions seem:
putting hashCode of an object into the Set
using WeakHashSet instead of HashSet
The first option does not seem 100% reliable, because hashCode may not be unique. As for the second option, I'm not sure it will prevent memory leaks.
And one more note, I'm not able to modify the objects, I can't add fields, methods. Wrapping is also not an option.
Any ideas?

WeakReferences are the way to go. From here:
A weak reference, simply put, is a reference that isn't strong enough
to force an object to remain in memory. Weak references allow you to
leverage the garbage collector's ability to determine reachability for
you, so you don't have to do it yourself.
That is, keeping a WeakReference won't force the JVM to keep the object in memory.
Of course the weak reference isn't strong enough to prevent garbage
collection, so you may find (if there are no strong references to the
widget) that weakWidget.get() suddenly starts returning null.
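A small sketch of how code usually copes with get() returning null: copy the result into a local variable first, then check it. Widget and getOrCreate are placeholder names for illustration.

```java
import java.lang.ref.WeakReference;

final class WeakWidgetDemo {
    // Placeholder type standing in for the "widget" in the quote.
    static final class Widget { }

    // Returns the referent if it is still alive, otherwise a fresh Widget.
    static Widget getOrCreate(WeakReference<Widget> weakWidget) {
        // Copy into a local first: the local is a strong reference, so the
        // object cannot disappear between the null check and its use.
        Widget widget = weakWidget.get();
        return (widget != null) ? widget : new Widget();
    }
}
```

Calling weakWidget.clear() has the same visible effect as the collector clearing the reference, which makes the null branch easy to exercise without depending on GC timing.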

Just an addition to Brian Agnew's correct answer: there is no WeakHashSet class in the Java API, so you'll need to create one from a WeakHashMap like this:
Set<Object> weakHashSet = Collections.newSetFromMap(
        new WeakHashMap<Object, Boolean>());
See the Collections.newSetFromMap Javadoc.
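Putting the pieces together, the interceptor from the question might look roughly like this. SeenTrackingInterceptor and its two counters are invented for illustration; the Interceptor interface is repeated only so that the sketch compiles on its own.

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

interface Interceptor {
    void process(Object o);
}

// Hypothetical implementation; the counters exist only to show the outcome.
final class SeenTrackingInterceptor implements Interceptor {
    // Weak set: an entry disappears after its object becomes unreachable
    // elsewhere, so the tracker itself does not cause a leak.
    private final Set<Object> seen =
            Collections.newSetFromMap(new WeakHashMap<Object, Boolean>());
    private int newCount;
    private int reusedCount;

    @Override
    public void process(Object o) {
        // add() returns true only if the object was not already in the set.
        if (seen.add(o)) {
            newCount++;
        } else {
            reusedCount++;
        }
    }

    int newCount() { return newCount; }
    int reusedCount() { return reusedCount; }
}
```

One caveat: WeakHashMap compares keys with equals() and hashCode(), so two distinct but equal objects (two equal Strings, say) count as the same object. If "same" must mean the same instance, this sketch is only adequate for classes that do not override equals().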

Java efficiency - child object referencing parent object

I'm new to Java and garbage-collected languages, and I am still getting my head around what it means to have an object reference (because I'm told it's not a pointer?), so I'm pondering this question:
I have a parent/child object structure where the parent will have several lists of several children each. Is there any inefficiency, or any other reason, not to have a pointer in each child back to its parent? In my prior language (Delphi) it was a simple pointer, so not a problem at all. Are there any considerations with this practice in Java?

There shouldn't be any issue here. Technically, yes, Java references are not pointers, but for most purposes you can think of them similarly. In typical JVMs an object reference is a small fixed-size value (4 or 8 bytes) that identifies an object on the Java heap, so each additional place it's stored costs only that much. Reasonably small, generally speaking.
You can (generally!) trust Java to do the right thing when it comes to object management, and shouldn't have to worry too much about garbage collection or the intricacies of how object references work.
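For illustration, a parent/child structure with a back-reference might look like the sketch below. Parent, Child and addChild are placeholder names, not taken from the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class Parent {
    private final List<Child> children = new ArrayList<>();

    Child addChild(String name) {
        Child child = new Child(this, name);   // the child keeps a reference back to this parent
        children.add(child);
        return child;
    }

    List<Child> getChildren() {
        return Collections.unmodifiableList(children);
    }
}

final class Child {
    private final Parent parent;   // one extra reference per child, not a copy of the parent
    private final String name;

    Child(Parent parent, String name) {
        this.parent = parent;
        this.name = name;
    }

    Parent getParent() { return parent; }
    String getName() { return name; }
}
```

The parent and its children refer to each other, but that cycle does not prevent collection: once nothing outside the structure refers to it, the whole group becomes unreachable together.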

From what I know I'd say you'd be fine doing that. Java does a good job of cleaning up your garbage and I usually have a 'parent' field in children classes.

As previous answers have stated, the GC is generally pretty good at clearing things up. Your primary concern will be objects that persist after you leave an Activity and hold onto its Context. This will cause your Activity to stay in memory, because you have a reference to it that is not in its parent/child tree.
More on this here

I think it would be helpful to read up on the reference types as well: strong, soft, weak and phantom. Also read up on how GC works (for the different generations: young/survivor spaces and old generation), which garbage collectors are available, and the GC parameters that you can specify.

Does circular GC work in a map?

I have a User object which strongly refers to a Data object.
If I create a Map<Data, User> (with Guava MapMaker) with weak keys, such a key would only be removed if it's not referenced anywhere else. However, it is always referred to by the User object that it maps to, which is in turn only removed from the map when the Data key is removed, i.e. never, unless the GC's circular reference detection also works across a map (I hope you understand what I mean :P).
Will Users+Datas be garbage collected if they're no longer used elsewhere in the application, or do I need to specify weak values as well?

The GC doesn't detect circular references because it doesn't need to.
The approach it takes is to keep all the objects which are strongly referenced from root nodes e.g. Thread stacks. This way objects not accessible strongly (with circular references or not) are collected.
EDIT: This may help explain the "myth"
http://www.javacoffeebreak.com/articles/thinkinginjava/abitaboutgarbagecollection.html
Reference counting is commonly used to explain one kind of garbage collection but it doesn't seem to be used in any JVM implementations.
This is an interesting link http://www.ibm.com/developerworks/library/j-jtp10283/
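The reachability idea can be shown with a toy model. This is not how any JVM is implemented; Node and ToyMarker are invented for illustration, and the "heap" is just a handful of ordinary objects. The point is that marking starts from the roots, so a cycle that no root leads to is simply never visited.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model only: stands in for an object that holds references to others.
final class Node {
    final String name;
    final List<Node> references = new ArrayList<>();

    Node(String name) { this.name = name; }
}

final class ToyMarker {
    // Returns every node reachable from the given roots.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (marked.add(node)) {              // visit each node once, so cycles terminate
                pending.addAll(node.references);
            }
        }
        return marked;
    }
}
```

If a and b refer only to each other and no root leads to them, neither ends up in the marked set, so both would be reclaimed without any special cycle detection.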

In the documentation you see:
weakKeys()
Specifies that each key (not value) stored in the map should be wrapped in a WeakReference (by default, strong references are used).
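Since the map references the key only weakly, the key can be collected once nothing else strongly references it. In this case, though, the map holds the User value strongly and the User strongly refers to its Data key, so the key stays reachable for as long as the entry is in the map; the java.util.WeakHashMap documentation warns about the same situation. Weak values are needed as well if the pair should become collectable.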
http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/MapMaker.html

Fetching customer orders: Set<Order> getAllOrders() vs. Set<Integer> getAllOrders()

I haven’t done much Java programming, and hence a lot of unanswered ORM questions revolve in my head that might seem fairly straightforward to more seasoned folks.
Let's say we have two classes: Customer and Order. The Customer class implements a method called getAllOrders. What should the method’s signature be?
Set<Order> getAllOrders(); // the OOP way
Set<Integer> getAllOrders(); // db-friendly way, based on the assumption that each order is assigned a unique int id
int[] getAllOrders(); // db-friendly way, optimised
Set<OrderID> getAllOrders(); // the OOP way, optimised
Other? A combination of the above through method overloading?
My thinking is to dismiss 3rd option right away, since the optimisation is premature, uncalled for and is likely to cause more trouble than good.
Choosing between the 1st and the 2nd options, the main argument seems to be that in many scenarios returning a collection of orders instead of IDs is going to be overkill.
The 1st option would always require pre-loading of orders into memory and although some order properties can be handled through lazy loading the Order objects themselves would still require significantly more memory allocated as opposed to a set of bare id’s. The other reason seems to be that lazy loading won’t give me the advantage of using the final modifier to enforce immutability of Order fields.
However, option 2 doesn’t really encapsulate order numbering, i.e. if it were decided to start using String UUID or long as order identifier instead of integer it would become necessary to do a serious rewrite. Obviously this could be mitigated by introduction of a new lightweight object called OrderId (4th approach).
Well, at this point I'd really appreciate some help from ORM and Java gurus!

It always helps to know what the caller wants. It isn't an optimization to return an order ID if the caller is then going to have to look up every single order every time. On the other hand, if most of the time they don't need all that information, then there isn't any point in returning it.
I would go for option 1 most of the time. But if it's something memory constrained like a smart phone, I think I'd go for the lazy expansion.

One of the hardest things for people that are new to ORM tools is that they try to apply the knowledge they have of the DB and the optimizations that were used in that context (storing IDs, etc.). I would suggest that you work with objects, and address performance problems if they come up.
Most ORMs handle this type of concern for you; Hibernate, for example, will by default lazy-load a collection (of Orders in your case). Internally it will store a collection of the ID values for you, but this is really not your concern, as you will only interact with the object and its collection of associated objects. If you focus on your domain objects and think "OO", your ORM tool should support you and allow you to optimize further later if it becomes necessary.

I suggest using option one, because you are then dealing with business objects rather than messing with numbers. If you need the ID, you can still obtain it from an order. And if you use a smart OR mapper, only the primary key of an order will be loaded until you access some real data, so there is no point in worrying about database or memory performance.

I have to agree with Paul Tomblin: if the function is called getAllOrders(), I would expect it to return Order objects (assuming such a class exists). The function should be called getAllOrderIds() if it's going to return something more specific.
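A minimal sketch of that naming split, without any ORM involved. The in-memory set, the addOrder method and the int ID are assumptions made for illustration; an ORM would normally populate the collection and could load it lazily.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

final class Order {
    private final int id;

    Order(int id) { this.id = id; }

    int getId() { return id; }
}

final class Customer {
    private final Set<Order> orders = new LinkedHashSet<>();

    void addOrder(Order order) { orders.add(order); }

    // The name promises Order objects, so it returns Order objects.
    Set<Order> getAllOrders() {
        return Collections.unmodifiableSet(orders);
    }

    // Callers that only need identifiers get a method named accordingly.
    Set<Integer> getAllOrderIds() {
        Set<Integer> ids = new LinkedHashSet<>();
        for (Order order : orders) {
            ids.add(order.getId());
        }
        return ids;
    }
}
```

The ID variant is derived from the orders here, so the two methods cannot drift apart; whether it is worth having at all depends on what the callers actually need.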

Option #2 is, like option #3, a premature optimization. Your client wants to know what the orders are; it may very well not even care what the orders' IDs are. There are circumstances where a client might want the IDs, but if you see that happening, you should probably ask Why? and find a way to operate directly on the Orders themselves.

I'll echo Paul Tomblin in saying that the question is not, "What's the best thing to do?" but "What's the best way to accomplish this objective?" Before you can say what the return value of a function should be, you have to decide what the caller is going to do with the information. If what the caller wants to do is, say, display a list of Order IDs for the user, then return Order IDs. If the caller wants to display a list of order dates, total cost, and shipment statuses, then I'd return a set of objects containing that data. Etc.
