Keeping track of already seen objects - java

I'm trying to implement an interceptor for my application that can keep track of the objects it has seen. I need to be able to tell whether the object I'm seeing now is new or a reused one.
Assuming I have an interface like this one:
public interface Interceptor {
    void process(Object o);
}
I've been thinking about adding a Set that would keep track of those objects. But since I don't want to cause memory leaks with that kind of behavior, perhaps I should devise some other pattern? In the end, those objects may be destroyed in other layers.
Possible solutions seem to be:
putting the hashCode of an object into the Set
using a WeakHashSet instead of a HashSet
The first option does not seem 100% reliable, because a hashCode may not be unique. As for the second option, I'm not sure it will prevent memory leaks.
One more note: I'm not able to modify the objects, so I can't add fields or methods. Wrapping is also not an option.
Any ideas?

WeakReferences are the way to go. From here:
A weak reference, simply put, is a reference that isn't strong enough
to force an object to remain in memory. Weak references allow you to
leverage the garbage collector's ability to determine reachability for
you, so you don't have to do it yourself.
That is, keeping a WeakReference to an object won't force the JVM to keep that object in memory.
Of course the weak reference isn't strong enough to prevent garbage
collection, so you may find (if there are no strong references to the
widget) that weakWidget.get() suddenly starts returning null.

To add to Brian Agnew's correct answer: there is no WeakHashSet class in the Java API, so you'll need to create one from a WeakHashMap like this:
Set<Object> weakHashSet = Collections.newSetFromMap(
        new WeakHashMap<Object, Boolean>());
See the Collections.newSetFromMap Javadoc.
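Putting the two answers together, an interceptor along these lines might look like the sketch below. The class name SeenTrackingInterceptor and the two counters are made up for illustration; only Collections.newSetFromMap and WeakHashMap come from the answers above. One assumption to be aware of: WeakHashMap compares keys with equals() and hashCode(), not by identity, so two distinct but equal objects would count as the same object here.

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

interface Interceptor {
    void process(Object o);
}

// Hypothetical implementation, for illustration only.
class SeenTrackingInterceptor implements Interceptor {
    // Weakly keyed set: an entry disappears once its object has been garbage collected.
    private final Set<Object> seen =
            Collections.newSetFromMap(new WeakHashMap<Object, Boolean>());
    private int newCount;     // how many previously unseen objects were processed
    private int reusedCount;  // how many already seen objects were processed

    @Override
    public synchronized void process(Object o) {
        // Set.add returns true only if the element was not already present.
        if (seen.add(o)) {
            newCount++;
        } else {
            reusedCount++;
        }
    }

    synchronized int newCount() { return newCount; }
    synchronized int reusedCount() { return reusedCount; }
}
```

The set never keeps an object alive on its own, so objects destroyed in other layers can still be collected; the price is that an object collected and later recreated is reported as new again.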

Related

What are WeakReferences, Weakhashmaps, softreferences used for?

Please explain what WeakReferences are used for. I usually do understand Java concepts, but this one is giving me trouble.
I do understand what WeakReferences are, but their usage and nature are a little vague in my head. I am not able to visualize a scenario in which using WeakReferences becomes a necessity.
I also know that WeakHashMap is related to WeakReferences, in that a row whose key has become null is removed automatically. I can't visualize how this can happen: I have a WeakHashMap somewhere, some other process nullifies a key, and then WeakHashMap saves the day by removing that row.
Also this article that everyone refers to, does not provide a case study that would help me understand.
If anyone out there can come up with a scenario and give me some understanding into this, I would be really grateful.
Weak references are basically used when you don't want the object to "stick" around if no one else is pointing to it. One very common use case which I believe helps when thinking of weak references is the use of weak hash map for maintaining canonical mapping.
Consider a case in which you need to maintain a mapping between a Class<?> instance and the list of all methods it holds. Given that the JVM is perfectly capable of dynamic class loading and unloading, it's quite possible that the class you have in your map as a key is no longer needed (nothing else points to it). If you had used a "strong" reference to maintain the class-to-method mapping, your class would stick around for as long as your map is reachable, which isn't what you want in this case. What you really want is that once there are no live references to your class, the map lets go of it. This is exactly what a weak hash map is used for.
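A minimal sketch of such a cache follows. The class MethodCache and its methods are hypothetical names. Note one deliberate simplification: it stores method names rather than java.lang.reflect.Method objects, because a Method holds a strong reference to its declaring class, and a WeakHashMap value that strongly references its own key prevents the entry from ever being discarded (the WeakHashMap Javadoc warns about this).

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical cache, for illustration only.
class MethodCache {
    // Keys are held weakly: an entry can go away once its Class is otherwise unreachable.
    private final Map<Class<?>, List<String>> cache =
            new WeakHashMap<Class<?>, List<String>>();

    synchronized List<String> methodNamesOf(Class<?> type) {
        List<String> names = cache.get(type);
        if (names == null) {
            List<String> collected = new ArrayList<String>();
            for (Method m : type.getDeclaredMethods()) {
                collected.add(m.getName()); // Strings do not reference the Class key
            }
            names = Collections.unmodifiableList(collected);
            cache.put(type, names);
        }
        return names;
    }

    synchronized int size() { return cache.size(); }
}
```

Calling methodNamesOf twice with the same class returns the cached list the second time; the entry disappears on its own once the class has been unloaded and the map is next accessed.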
EDIT: I would recommend giving this thread a read.

Is there any method to obtain objects created by a specific class in Java?

I'd need to obtain all the objects created by a class. Is there any pre-built method
to do this in Java API?
It's not built into the API because there's no such thing as "created by an object" in terms of Java. You could use a cache for this, but you'd have to maintain it yourself. You can save yourself some trouble by using factory methods, so you don't forget to store each object in the cache.
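import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class Stuff {} // placeholder for whatever type you create

class Creator {
    private final Set<Stuff> set = new HashSet<Stuff>();

    // each time you create an object, do this
    void foo() {
        Stuff stuff = new Stuff(); // what you create
        set.add(stuff); // Set has add(), not put()
    }

    // you get them like this: a read-only view of the tracked objects
    Set<Stuff> objectsCreatedByThis() { return Collections.unmodifiableSet(set); }
}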
One thing to be concerned about is that this has a high potential for memory leaks. You could use WeakReferences to get around this, so that once nothing else holds a strong reference to the object, the set lets go of it too. After all, you wouldn't want your set to store everything so that it never gets GC'd.
No. There is nothing in the standard language nor standard API to help you. Your only chance is to restrict your problem to some domain which is completely under your control. Then you can use some kind of reference tracking either by a factory or directly in the constructor.
Be careful though: A naive implementation will keep references to all objects forever and this will spoil garbage collection. java.lang.ref.Reference<T> and associated stuff might help here. But frankly - this is not stuff for beginners.
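As a rough sketch of the factory-plus-weak-references idea from both answers, assuming a hypothetical Widget class that you fully control:

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical class, for illustration only.
class Widget {
    // Weak references, so the registry alone never keeps a Widget alive.
    private static final List<WeakReference<Widget>> INSTANCES =
            new ArrayList<WeakReference<Widget>>();

    private final String name;

    private Widget(String name) { this.name = name; }

    // Factory method: the only way to create a Widget, so registration cannot be forgotten.
    static Widget create(String name) {
        Widget w = new Widget(name);
        synchronized (INSTANCES) {
            INSTANCES.add(new WeakReference<Widget>(w));
        }
        return w;
    }

    String name() { return name; }

    // Returns the instances that are still alive and drops references that were cleared.
    static List<Widget> liveInstances() {
        List<Widget> live = new ArrayList<Widget>();
        synchronized (INSTANCES) {
            for (Iterator<WeakReference<Widget>> it = INSTANCES.iterator(); it.hasNext(); ) {
                Widget w = it.next().get();
                if (w == null) {
                    it.remove(); // the referent has been garbage collected
                } else {
                    live.add(w);
                }
            }
        }
        return live;
    }
}
```

The list returned by liveInstances holds strong references, so callers should not keep it around longer than needed.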

When a PhantomReference/SoftReference/WeakReference is queued, how do you know what it referred to?

I haven't used PhantomReferences. There seem to be very few good examples of real-world use.
When a phantom shows up in your queue, how do you know which object it is/was? The get() method appears to be useless. According to the JavaDoc,
Because the referent of a phantom reference is always inaccessible,
this method always returns null.
I think that unless your object is a singleton, you always want to use a subclass of PhantomReference, in which you place whatever mementos you need in order to understand what died.
Is this correct, or did I miss something?
Is this also true for SoftReferences?
For WeakReferences?
Links to relevant examples of usage would be great.
I think that unless your object is a singleton, you always want to use a subclass of PhantomReference, in which you place whatever mementos you need in order to understand what died.
You could also use a Map<Reference<?>, SomeMetadataClassOrInterface> to recover whatever metadata you need. Since ReferenceQueue<T> returns a Reference<T>, you either have to cast it to whatever subclass of PhantomReference you expect, or let a Map<> do it for you.
For what it's worth, it looks like using PhantomReferences puts some burden on you:
Unlike soft and weak references, phantom references are not automatically cleared by the garbage collector as they are enqueued. An object that is reachable via phantom references will remain so until all such references are cleared or themselves become unreachable.
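so you'd have to clear() the references yourself in order for memory to be reclaimed. (Why having to do so would be more useful than letting the JVM do it for you is beyond me.) Note that the passage quoted above describes Java 8 and earlier; since Java 9, phantom references are cleared automatically when they are enqueued, just like soft and weak references.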
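The subclass approach from the question could be sketched as follows. LabeledPhantom, PhantomDemo and the String label are hypothetical; the label stands in for whatever memento you need, and it must not reference the referent itself, or the referent would never become unreachable.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

// Hypothetical subclass, for illustration only.
class LabeledPhantom<T> extends PhantomReference<T> {
    // The memento that tells you what died once the reference is dequeued.
    private final String label;

    LabeledPhantom(T referent, ReferenceQueue<? super T> queue, String label) {
        super(referent, queue);
        this.label = label;
    }

    String label() { return label; }
}

class PhantomDemo {
    // Drains the queue without blocking and records the label of each dequeued reference.
    static int drain(ReferenceQueue<Object> queue, StringBuilder report) {
        int count = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            if (ref instanceof LabeledPhantom) {
                report.append(((LabeledPhantom<?>) ref).label()).append('\n');
            }
            ref.clear(); // required before Java 9, harmless afterwards
            count++;
        }
        return count;
    }
}
```

The reference objects themselves have to stay strongly reachable (for example in a set) until they are dequeued, otherwise they may be collected without ever being enqueued. The same pattern works for WeakReference and SoftReference subclasses.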
Your question has caused me to look into it a little more, and I found this very well written explanation and examples of all the reference types. He even talks about some (tenuous) uses of phantom references.
http://weblogs.java.net/blog/2006/05/04/understanding-weak-references

Does circular GC work in a map?

I have a User object which strongly refers to a Data object.
If I create a Map<Data, User> (with Guava MapMaker) with weak keys, such a key would only be removed if it's not referenced anywhere else. However, it is always referred to by the User object that it maps to, which is in turn only removed from the map when the Data key is removed, i.e. never, unless the GC's circular reference detection also works when crossing a map (I hope you understand what I mean :P)
Will Users+Datas be garbage collected if they're no longer used elsewhere in the application, or do I need to specify weak values as well?
The GC doesn't detect circular references because it doesn't need to.
The approach it takes is to keep all the objects which are strongly referenced from root nodes e.g. Thread stacks. This way objects not accessible strongly (with circular references or not) are collected.
EDIT: This may help explain the "myth"
http://www.javacoffeebreak.com/articles/thinkinginjava/abitaboutgarbagecollection.html
Reference counting is commonly used to explain one kind of garbage collection but it doesn't seem to be used in any JVM implementations.
This is an interesting link http://www.ibm.com/developerworks/library/j-jtp10283/
In documentation you see:
weakKeys()
Specifies that each key (not value) stored in the map should be wrapped in a WeakReference (by default, strong references are used).
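Because the map references the key only weakly, the map by itself will not keep the key alive. The value is a different matter: unless weakValues() is also specified, the map holds each value strongly, so a User value that strongly references its own Data key keeps that key reachable for as long as the map is. The WeakHashMap Javadoc warns about the same situation for that class.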
http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/MapMaker.html

Considering object encapsulation, should getters return an immutable property?

When a getter returns a property, such as a List of other related objects, should that list and its objects be immutable, to prevent code outside of the class from changing the state of those objects without the main parent object knowing?
For example, if a Contact object has a getDetails getter which returns a List of ContactDetails objects, then any code calling that getter:
can remove ContactDetails objects from that list without the Contact object knowing of it.
can change each ContactDetails object without the Contact object knowing of it.
So what should we do here? Should we just trust the calling code and return easily mutable objects, or go the hard way and make an immutable class for each mutable class?
It's a matter of whether you should be "defensive" in your code. If you're the (sole) user of your class and you trust yourself then by all means no need for immutability. However, if this code needs to work no matter what, or you don't trust your user, then make everything that is externalized immutable.
That said, most properties I create are mutable. An occasional user botches this up, but then again it's his/her fault, since it is clearly documented that mutation should not occur via mutable objects received via getters.
It depends on the context. If the list is intended to be mutable, there is no point in cluttering up the API of the main class with methods to mutate it when List has a perfectly good API of its own.
However, if the main class can't cope with mutations, then you'll need to return an immutable list - and the entries in the list may also need to be immutable themselves.
Don't forget, though, that you can return a custom List implementation that knows how to respond safely to mutation requests, whether by firing events or by performing any required actions directly. In fact, this is a classic example of a good time to use an inner class.
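One way to read the inner-class suggestion is sketched below. NotifyingContact and its change counter are hypothetical; the counter simply stands in for firing an event or performing whatever action the owner needs. The view extends AbstractList, which routes add(E), remove(Object) and iterator removal through the overridden index-based methods.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, for illustration only.
class NotifyingContact {
    private final List<String> details = new ArrayList<String>();
    private int changeCount; // stands in for "firing an event"

    // Inner class: a List view that routes every mutation through the owning object.
    private class DetailsView extends AbstractList<String> {
        @Override public String get(int index) { return details.get(index); }
        @Override public int size() { return details.size(); }

        @Override public void add(int index, String element) {
            details.add(index, element);
            changeCount++;
        }

        @Override public String set(int index, String element) {
            String old = details.set(index, element);
            changeCount++;
            return old;
        }

        @Override public String remove(int index) {
            String old = details.remove(index);
            changeCount++;
            return old;
        }
    }

    List<String> getDetails() { return new DetailsView(); }
    int changeCount() { return changeCount; }
}
```

Callers get the full List API, while the owning object sees every change and could validate or veto it in the overridden methods.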
If you have control of the calling code then what matters most is that the choice you make is documented well in all the right places.
Joshua Bloch in his excellent "Effective Java" book says that you should ALWAYS make defensive copies when returning something like this. That may be a little extreme, especially if the ContactDetails objects are not Cloneable, but it's always the safe way. If in doubt, always favour code safety over performance, unless profiling has shown that the cloning is a real performance bottleneck.
There are actually several levels of protection you can add. First, you can simply return the member, which essentially gives any other class access to the internals of your class. This is very unsafe, but in fairness widely done. It will also cause you trouble later if you want to change the internals so that the ContactDetails are stored in a Set. Second, you can return a newly created list with references to the same objects as the internal list. This is safer: another class can't remove from or add to your internal list, but it can still modify the existing objects. Third, you can return a newly created list with copies of the ContactDetails objects. That's the safe way, but it can be expensive.
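The three levels can be put side by side in a small sketch. The getter names and the copy constructor on ContactDetails are assumptions made for the example; the question only mentions Contact, getDetails and ContactDetails.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable element type with a copy constructor.
class ContactDetails {
    private String value;

    ContactDetails(String value) { this.value = value; }
    ContactDetails(ContactDetails other) { this.value = other.value; }

    String getValue() { return value; }
    void setValue(String value) { this.value = value; }
}

class Contact {
    private final List<ContactDetails> details = new ArrayList<ContactDetails>();

    void addDetails(ContactDetails d) { details.add(d); }

    // Level 1: expose the internal list itself (callers can change everything).
    List<ContactDetails> getDetailsDirect() { return details; }

    // Level 2: new list, same elements (the internal list is safe, its elements are not).
    List<ContactDetails> getDetailsShallowCopy() {
        return new ArrayList<ContactDetails>(details);
    }

    // Level 3: new list of copied elements (nothing internal is reachable).
    List<ContactDetails> getDetailsDeepCopy() {
        List<ContactDetails> copy = new ArrayList<ContactDetails>(details.size());
        for (ContactDetails d : details) {
            copy.add(new ContactDetails(d));
        }
        return copy;
    }
}
```

The cost grows with each level: level 2 allocates a list per call, and level 3 additionally copies every element.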
I would do this a better way. Don't return a list at all; instead, return an iterator over the list. That way you don't have to create a new list (List has a method to get an iterator) and the external class can't add to the list. Be aware that the iterator of a standard list supports remove(), so return the iterator of a read-only view, such as Collections.unmodifiableList(list).iterator(), if callers must not remove elements either. The caller can still modify the items, unless you write your own iterator that clones the elements as needed. If you later switch to using another collection internally, it can still return an iterator, so no external changes are needed.
In the particular case of a Collection, List, Set, or Map in Java, it is easy to return an immutable view to the class using return Collections.unmodifiableList(list);
Of course, if it is possible that the backing-data will still be modified then you need to make a full copy of the list.
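The difference between a read-only view and a read-only copy shows up in a short sketch. PhoneBook and its methods are hypothetical names; Collections.unmodifiableList is the call from the answer above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class, for illustration only.
class PhoneBook {
    private final List<String> numbers = new ArrayList<String>();

    void add(String number) { numbers.add(number); }

    // Read-only view: rejects writes, but later changes to 'numbers' show through.
    List<String> view() { return Collections.unmodifiableList(numbers); }

    // Read-only snapshot: copies first, so later changes do not show through.
    List<String> snapshot() {
        return Collections.unmodifiableList(new ArrayList<String>(numbers));
    }
}
```

Mutating calls such as add or remove on either returned list throw UnsupportedOperationException; only the snapshot is isolated from later changes made by the owning object.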
Depends on the context, really. But generally, yes, one should write as defensive code as possible (returning array copies, returning readonly wrappers around collections etc.). In any case, it should be clearly documented.
I used to return a read-only version of the list, or at least, a copy. But each object contained in the list must be editable, unless they are immutable by design.
I think you'll find that it's very rare for every gettable to be immutable.
What you could do is to fire events when a property is changed within such objects. Not a perfect solution either.
Documentation is probably the most pragmatic solution ;)
Your first imperative should be to follow the Law of Demeter or ‘Tell don't ask’; tell the object instance what to do e.g.
contact.print( printer ) ; // or
contact.show( new Dialog() ) ; // or
contactList.findByName( searchName ).print( printer ) ;
Object-oriented code tells objects to do things. Procedural code gets information and then acts on that information. Asking an object to reveal the details of its internals breaks encapsulation; it is procedural code, not sound OO programming, and as Will has already said, it is a flawed design.
If you follow the Law of Demeter approach any change in the state of an object occurs through its defined interface, therefore side-effects are known and controlled. Your problem goes away.
When I was starting out I was still heavily under the influence of HIDE YOUR DATA OO PRINCIPLES LOL. I would sit and ponder what would happen if somebody changed the state of one of the objects exposed by a property. Should I make them read only for external callers? Should I not expose them at all?
Collections brought out these anxieties to the extreme. I mean, somebody could remove all the objects in the collection while I'm not looking!
I eventually realized that if your objects hold such tight dependencies on their externally visible properties and their types that you go boom when somebody touches them in a bad place, your architecture is flawed.
There are valid reasons to make your external properties readonly and their types immutable. But that is the corner case, not the typical one, imho.
First of all, setters and getters are an indication of bad OO. Generally the idea of OO is you ask the object to do something for you. Setting and getting is the opposite. Sun should have figured out some other way to implement Java beans so that people wouldn't pick up this pattern and think it's "Correct".
Secondly, each object you have should be a world in itself. Generally, if you are going to use setters and getters, they should return fairly safe, independent objects. Those objects may or may not be immutable, because they are just first-class objects. The other possibility is that they return primitive types, which are returned by value, so the caller cannot affect your object through them. So asking "Should setters and getters return something immutable?" doesn't make too much sense.
As for making immutable objects themselves, you should virtually always make the members inside your object final unless you have a strong reason not to (Final should have been the default, "mutable" should be a keyword that overrides that default). This implies that wherever possible, objects will be immutable.
As for predefined quasi-object things you might pass around, I recommend you wrap stuff like collections and groups of values that go together into their own classes with their own methods. I virtually never pass around an unprotected collection simply because you aren't giving any guidance/help on how it's used where the use of a well-designed object should be obvious. Safety is also a factor since allowing someone access to a collection inside your class makes it virtually impossible to ensure that the class will always be valid.
