Immutable-but-refreshable collections? - java

A common pattern I keep encountering is the need to refresh a collection such as a List or a HashMap from a database, while not allowing values to be added or removed from it in between refreshes.
I recently discovered the ImmutableList and ImmutableMap in Google's Guava library which I like a lot. But I cannot do a "clearAll()" on these types nor repopulate them. But I like that they are immutable beyond that.
So if I wanted to enforce this pattern of "only mutable on a database refresh", I guess I would have to use a volatile variable each time? Or is there a better pattern? This can be in a multithreaded environment too, and I would ensure there are no race conditions on the refresh().
public class MarketManager {
    private volatile ImmutableList<Market> markets = null;

    private MarketManager() {
    }

    public void refresh() {
        markets = // build a new immutable market list
    }

    public static MarketManager getInstance() {
        MarketManager marketManager = new MarketManager();
        marketManager.refresh();
        return marketManager;
    }
}

Here is how you could code your class so that you use Collections.unmodifiableList():
public class MarketManager {
    private volatile List<Market> markets = Collections.emptyList();

    private MarketManager() {
    }

    public void refresh() {
        markets = // build a new market list
    }

    public List<Market> getMarkets() {
        return Collections.unmodifiableList(markets);
    }

    // ...
}
Collections.unmodifiableList() simply wraps any List implementation so that all mutation operations are "disabled" (that is, they throw an exception).

I think you're trying to mix together two concepts: unmodifiable objects and immutable objects.
Unmodifiable objects are objects that YOU cannot modify. However, they can be changed "under you" by other actors. Such objects are useful when you need to control access to your object's internal state. However, they are not useful for thread safety or optimization.
Immutable objects are truly immutable. Once created, they cannot be changed. Not by you, not by anyone. They offer all the advantages of unmodifiable objects, plus they permit optimizations and guarantee thread safety.
What you're proposing will turn an immutable collection (created and frozen) into an unmodifiable one (I can't change it, but it can be refreshed from the DB). Just use the unmodifiable wrappers from the Java collections framework; there is no need to change the Guava immutable collections.
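The difference shows up directly in code. A minimal sketch (using List.copyOf, available since Java 10, as a stand-in for Guava's ImmutableList):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ViewVsCopy {
    public static void main(String[] args) {
        List<String> backing = new ArrayList<>(List.of("GOOG"));

        // Unmodifiable: a read-only *view* of the backing list.
        List<String> view = Collections.unmodifiableList(backing);

        // Immutable: an independent *copy*, frozen at creation time.
        List<String> copy = List.copyOf(backing);

        backing.add("AAPL");              // mutate the backing list

        System.out.println(view.size()); // 2 -- the view sees the change
        System.out.println(copy.size()); // 1 -- the copy does not
    }
}
```

The view can still "change under you" through the backing list, while the copy cannot; that is exactly the unmodifiable/immutable distinction described above.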

Related

Java, create a cache of re-usable thread unsafe objects

Sometimes in Java I have objects that are thread-unsafe and expensive to create. I would like to create a cache of those objects so I don't need to re-create them, but it must also prevent concurrent access to the same object.
For example I might have DateFormat and creating it is too expensive, but I can't share a single DateFormat. For arguments sake assume that I can't use a thread safe DateFormat.
What would be fantastic is to be able to create some cache like this:
Cache<DateFormat> cache = new Cache<>(() -> dateFormatCreator());
// and now make use of a dateFormat that might be created for this call
// or it might be an existing one from the cache.
cache.withExclusiveAccessToObject(dateFormat -> {
    // use the dateFormat here, it is not in use by any other thread at the same time.
    // new dateFormats can be created on the fly as needed.
});
I should have also mentioned that ThreadLocal is not ideal, as I cannot ensure threads are going to be re-used.
I believe there are two paths you can go:
Option 1
Maintain an object-per-thread
This can work if you access the expensive object from a limited well defined set of threads (read, using a thread-pool and not creating threads every time, which is what happens anyway in many applications).
In this case you can use a ThreadLocal. Since within one thread everything is expected to be sequential, you can keep thread-unsafe objects in a thread local.
You can think of ThreadLocal as a map that per thread maintains a dedicated instance of an expensive object.
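For the DateFormat case from the question, Option 1 amounts to a one-liner. A minimal sketch (assuming SimpleDateFormat with a hypothetical pattern is the expensive, thread-unsafe object):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class PerThreadFormat {
    // Each thread lazily gets (and keeps) its own SimpleDateFormat instance,
    // so the thread-unsafe object is never shared between threads.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static String format(Date date) {
        return FORMAT.get().format(date);
    }
}
```

The cost is one instance per thread that ever calls format(), which is usually acceptable with a bounded thread pool.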
Option 2
Share one (or N in general) objects between M threads so that N < M. In this case there might be a situation where two threads will try to work with the same object.
I'm not aware of a ready-made solution for this (after all, these are your own objects that you want to maintain), but in general it's fairly easy to write your own wrapper that provides some form of locked/synchronized access to objects of your types.
Implementations can vary. As one idea: you can wrap the actual object in a runtime- or build-time-generated proxy, making it effectively thread safe:
public interface IMyObject {
    void inc();
    void dec();
}

// this is an object that you would like to make thread safe
public class MyActualObject implements IMyObject {
    private int counter = 0;

    @Override
    public void inc() { counter++; }

    @Override
    public void dec() { counter--; }
}

public class MyThreadSafeProxy implements IMyObject {
    private final IMyObject realObject;

    public MyThreadSafeProxy(IMyObject realObject) {
        this.realObject = realObject;
    }

    @Override
    public synchronized void inc() {
        realObject.inc();
    }

    @Override
    public synchronized void dec() {
        realObject.dec();
    }
}
Instead of storing MyActualObject instances directly, you can wrap them in MyThreadSafeProxy.
It is also possible to generate such a proxy automatically: see the cglib framework or dynamic proxies (the java.lang.reflect.Proxy class).
In my experience, Option 1 is usually preferable, unless the objects you work with are so expensive that with N threads in the pool you can't really afford N objects in memory.
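The Cache from the question (the names Cache and withExclusiveAccessToObject come from there, not from any library) could be sketched for Option 2 with a concurrent queue as the pool. This is a sketch, not a hardened implementation; note the pool is unbounded and grows to the peak number of concurrent users:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class Cache<T> {
    private final ConcurrentLinkedQueue<T> pool = new ConcurrentLinkedQueue<>();
    private final Supplier<T> creator;

    public Cache(Supplier<T> creator) {
        this.creator = creator;
    }

    public void withExclusiveAccessToObject(Consumer<T> action) {
        T obj = pool.poll();          // take an idle instance, if any
        if (obj == null) {
            obj = creator.get();      // none available: create on the fly
        }
        try {
            action.accept(obj);       // no other thread holds this instance
        } finally {
            pool.offer(obj);          // return it to the pool for reuse
        }
    }
}
```

Usage then matches the question almost verbatim: new Cache<>(() -> new SimpleDateFormat("yyyy-MM-dd")) followed by cache.withExclusiveAccessToObject(df -> ...).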

Does client-side locking violate encapsulation of synchronization policy?

As mentioned by the author (Java Concurrency in Practice):
Client-side locking entails guarding client code that uses some object X with the lock X uses to guard its own state.
The object X in the code below is list. The point above says that synchronizing putIfAbsent() with a lock owned by the ListHelper object would be using the wrong lock.
package compositeobjects;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ListHelper<E> {
    private List<E> list =
            Collections.synchronizedList(new ArrayList<E>());

    public boolean putIfAbsent(E x) {
        synchronized (list) {
            boolean absent = !list.contains(x);
            if (absent) {
                list.add(x);
            }
            return absent;
        }
    }
}
But the author says,
Client-side locking has a lot in common with class extension—they both couple the behavior of the derived class to the implementation of the base class. Just as extension violates encapsulation of implementation [EJ Item 14], client-side locking violates encapsulation of synchronization policy.
My understanding is that the wrapper instance returned by Collections.synchronizedList() also uses the list object itself as its lock.
Why does the use of client-side locking (on list) in ListHelper violate the encapsulation of the synchronization policy?
You are relying upon the fact that the synchronizedList uses itself as the monitor, which happens to be true at present.
You're even relying upon the fact that synchronizedList uses synchronized to achieve synchronization, which also happens to be true at present (it's a reasonable assumption, but it's not one that is necessary).
There are ways in which the implementation of synchronizedList could be changed such that your code wouldn't work correctly.
For instance, the constructor of synchronizedList:
SynchronizedList(List<E> list) {
    super(list);
    // ...
}
could be changed to
SynchronizedList(List<E> list) {
    super(list, new Object());
    // ...
}
Now, the mutex field used by the methods in the SynchronizedList implementation is no longer this (effectively), so synchronizing externally on list would no longer work.
That said, the fact that synchronizing on the list itself has the intended effect is documented in the Javadoc, so this behavior won't be changed, and what you are doing now is absolutely fine. It relies on a leaky abstraction, and you shouldn't design something similar this way from scratch, but that leaky abstraction's properties are documented.
Your code basically creates a synchronized set: it only adds an element if it is not already in the list, which is the very definition of a set.
Regardless of how the synchronized list does its own locking, your code must provide its own locking mechanism, because it makes two calls to the synchronized list, and the list releases its lock between them. If two threads added the same object, both could pass the contains() check and both could add it to the list. The compound synchronized block makes sure that cannot happen. It is paramount that all use of the list goes through your utility class; otherwise it can still fail.
As I wrote in a comment, exactly the same behaviour can be achieved with a synchronized set, which also guarantees that the element has not yet been added while locking the entire operation. With such a synchronized set, access and modification without your utility class are fine.
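The set-based alternative works because Set.add() already reports whether the element was absent, so the check and the insert become a single atomic call on the synchronized wrapper. A minimal sketch:

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class SetHelper<E> {
    // LinkedHashSet keeps insertion order, like a list without duplicates.
    private final Set<E> set =
            Collections.synchronizedSet(new LinkedHashSet<E>());

    public boolean putIfAbsent(E x) {
        // add() is one call on the synchronized wrapper: it atomically
        // checks for presence and inserts, returning true if x was absent.
        return set.add(x);
    }
}
```

Because the operation is a single call, no client-side lock on the wrapper is needed at all.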
Edit:
If your code needs a list and not a set, and a LinkedHashSet isn't an option, I would create a new synchronized list myself:
public class SynchronizedList<E> implements List<E> {
    private final List<E> wrapped = new ArrayList<E>();

    // ...

    @Override
    public int size() {
        synchronized (this) {
            return wrapped.size();
        }
    }

    // ...

    @Override
    public boolean add(E element) {
        synchronized (this) {
            boolean absent = !wrapped.contains(element);
            if (absent) {
                wrapped.add(element);
            }
            return absent;
        }
    }
}

Java: Should I construct lightweight objects each time or cache instance?

During code review, a colleague of mine looked at this piece of code:
public List<Item> extractItems(List<Object[]> results) {
    return Lists.transform(results, new Function<Object[], Item>() {
        @Override
        public Item apply(Object[] values) {
            ...
        }
    });
}
He suggests changing it to this:
public List<Item> extractItems(List<Object[]> results) {
    return Lists.transform(results, getTransformer());
}

private Function<Object[], Item> transformer;

private Function<Object[], Item> getTransformer() {
    if (transformer == null) {
        transformer = new Function<Object[], Item>() {
            @Override
            public Item apply(Object[] values) {
                ...
            }
        };
    }
    return transformer;
}
So we are looking at taking the new Function() construction, and moving it over to be a member variable and re-used next time.
While I understand his logic and reasoning, I guess I'm not sold that I should do this for every possible object that I create that follows this pattern. It seems like there would be some good reasons not to do this, but I'm not sure.
What are your thoughts? Should we always cache duplicately created objects like this?
UPDATE
The Function is a google guava thing, and holds no state. A couple people have pointed out the non-thread-safe aspect of this change, which is perfectly valid, but isn't actually a concern here. I'm more asking about the practice of constructing vs caching small objects, which is better?
Your colleague's proposal is not thread safe. It also reeks of premature optimization. Is the construction of a Function object a known (tested) CPU bottleneck? If not, there's no reason to do this. It's not a memory problem - you're not keeping a reference, so GC will sweep it away, probably from Eden.
As already said, it's all premature optimization. The gain is probably not measurable and the whole story should be forgotten.
However, with the transformer being stateless, I'd go for it for readability reasons. Anonymous functions as an argument rather pollute the code.
Just drop the lazy initialization - you're gonna use the transformer whenever you use the class, right? (*) So put it in a static final field and maybe you can reuse it somewhere else.
(*) And even if not, creating and holding a cheap object during the whole application lifetime doesn't matter.
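The static final version suggested above could look like the sketch below. It uses java.util.function.Function rather than Guava's interface, just to keep the example self-contained, and the mapping inside the lambda is hypothetical:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ItemExtractor {
    // Stateless, so one shared instance is safe for all threads;
    // no lazy initialization (and no null check race) needed.
    private static final Function<Object[], String> TRANSFORMER =
            values -> String.valueOf(values[0]);   // hypothetical mapping

    public List<String> extractItems(List<Object[]> results) {
        return results.stream().map(TRANSFORMER).collect(Collectors.toList());
    }
}
```

The static final field is initialized once by the class loader, which sidesteps the thread-safety problem of the lazy getter entirely.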

Object locking private class members - best practice? (Java)

I asked a similar question the other day but wasn't satisfied with the response, mainly because the code I supplied had some issues that people focused on.
Basically, what is the best practice for locking private members in Java? Assuming each private field can only be manipulated in isolation and never together (like in my Test class example below), should you lock each private field directly (example 1), or should you use a general lock object per private field you wish to lock (example 2)?
Example 1: Lock private fields directly
class Test {
    private final List<Object> xList = new ArrayList<Object>();
    private final List<Object> yList = new ArrayList<Object>();

    /* xList methods */
    public void addToX(Object o) {
        synchronized (xList) {
            xList.add(o);
        }
    }

    public void removeFromX(Object o) {
        synchronized (xList) {
            xList.remove(o);
        }
    }

    /* yList methods */
    public void addToY(Object o) {
        synchronized (yList) {
            yList.add(o);
        }
    }

    public void removeFromY(Object o) {
        synchronized (yList) {
            yList.remove(o);
        }
    }
}
Example 2: Use lock objects per private field
class Test {
    private final Object xLock = new Object();
    private final Object yLock = new Object();
    private List<Object> xList = new ArrayList<Object>();
    private List<Object> yList = new ArrayList<Object>();

    /* xList methods */
    public void addToX(Object o) {
        synchronized (xLock) {
            xList.add(o);
        }
    }

    public void removeFromX(Object o) {
        synchronized (xLock) {
            xList.remove(o);
        }
    }

    /* yList methods */
    public void addToY(Object o) {
        synchronized (yLock) {
            yList.add(o);
        }
    }

    public void removeFromY(Object o) {
        synchronized (yLock) {
            yList.remove(o);
        }
    }
}
Personally I prefer the second form. No other code at all can use that reference (barring reflection, debugging APIs etc). You don't need to worry about whether the internal details of the list tries to synchronize on it. (Any method you call on a list obviously has access to this, so could synchronize on it.) You're purely using it for locking - so you've also got separation of concerns between "I'm a lock" and "I'm a list".
I find that way it's easier to reason about the monitor, as you can easily see all the possible code that uses it.
You may wish to create a separate class purely for use as monitors, with an override for toString() which could help with diagnostics. It would also make the purpose of the variable clearer.
Admittedly this approach does take more memory, and usually you don't need to worry about code locking on this... but I personally feel that the benefit of separating the concerns and not having to worry about whether that code does lock on itself outweighs the efficiency cost. You can always choose to go for the first form if you find that the "wasted" objects are a performance bottleneck for some reason (and after you've analyzed the code in the class you're potentially going to synchronize on).
(Personally I wish that both Java and .NET hadn't gone down the "every object has an associated monitor" route, but that's a rant for a different day.)
Example 1 is much better. Since xList and yList are final, they are great for synchronization. There is no need for extra lock objects, which unnecessarily complicate the code and consume memory. Just make sure the lists themselves are never exposed to the outside world, which would break encapsulation and thread safety.
However consider:
CopyOnWriteArrayList
Collections.synchronizedList() - and see also: Java synchronized block vs. Collections.synchronizedMap
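For comparison, with CopyOnWriteArrayList the class from the question needs no explicit locking at all for these single-call operations. A sketch (renamed CowLists here; note that copy-on-write only pays off when reads vastly outnumber writes):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CowLists {
    // Each mutation copies the backing array internally, so individual
    // calls are atomic and thread-safe without synchronized blocks.
    private final List<Object> xList = new CopyOnWriteArrayList<>();
    private final List<Object> yList = new CopyOnWriteArrayList<>();

    public void addToX(Object o)      { xList.add(o); }
    public void removeFromX(Object o) { xList.remove(o); }
    public void addToY(Object o)      { yList.add(o); }
    public void removeFromY(Object o) { yList.remove(o); }

    public int xSize() { return xList.size(); }
}
```

This only holds as long as each operation is a single call; a compound check-then-act would still need external locking.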
Let's put it this way: the second approach uses more code -- what does that extra code buy you? As far as concurrency, the two are exactly the same, so it must be some other aspect from the bigger picture of your app design.
Even if you're sure the object you are locking on will never change, I find it more reassuring to use a special object just for locking. It makes things more transparent. If the class were significantly expanded and/or modified in the future by someone else, they might find a reason to make xList non-final without noticing that it's used for locking. That could quickly lead to problems. Thread safety is not trivial and grows more complex as code evolves, so make it as clear and as safe as possible. The cost of a separate lock object is small compared to the cost of diagnosing thread-safety problems.

What's the preferred way to assign a collection from a parameter?

I have this class:
public class MyClass {
    public void initialize(Collection<String> data) {
        this.data = data; // <-- Bad!
    }

    private Collection<String> data;
}
This is obviously bad style, because I'm introducing a shared mutable state. What's the preferred way to handle this?
Ignore it?
Clone the collection?
...?
EDIT: To clarify why this is bad, imagine this:
MyClass myObject = new MyClass();
List<String> data = new ArrayList<String>();
myObject.initialize(data); // myObject.data.size() == 0
data.add("Test"); // myObject.data.size() == 1
Just storing the reference poses a way to inject data into the private field myObject.data, although it should be completely private.
Depending on the nature of MyClass this could have serious impacts.
The best way is to deep clone the parameter. For performance reasons, this is usually not feasible. On top of that, not all objects can be cloned, so deep copying might throw exceptions and cause all kinds of headaches.
The next best way would be a "copy-on-write" clone. There is no support for this in the Java runtime.
If you think that it's possible someone mutates the collection, do a shallow copy using the copy constructor:
this.data = new HashSet<String>(data);
This will solve your problem (since String is immutable) but it will fail when the type in the set is mutable.
Another solution is to always make the sets immutable as soon as you store them somewhere:
Set<String> set = ...
...build the set...
// Freeze the set
set = Collections.unmodifiableSet(set);
// Now you can safely pass it elsewhere
obj.setData(set);
The idea here is to turn collections into "value objects" as soon as possible. Anyone who wants to change the collection must copy it, change it and then save it back.
Within a class, you can keep the set mutable and wrap it in the getter (which you should do anyway).
Problems with this approach: Performance (but it's probably not as bad as you'd expect) and discipline (breaks if you forget it somewhere).
Null check (if you want to restrict null)
Either defensive copy (if you don't want shared state)
or as you did (if a live view on data is useful)
Depends heavily on your requirements.
Edited:
Ignoring should be no option. Silent fail is, well... a debugging nightmare.
public class Foo {
    private final Collection<String> collection = new ArrayList<>();

    public void initialise(final Collection<String> collection) {
        this.collection.addAll(collection);
    }
}
Sorry for not addressing your concern directly, but I would never directly pass a Collection to a setXxx() bean setter method. Instead, I would do:
private final List<MyClass> theList;

public void addXxx(MyClass item) { ... }
public void removeXxx(MyClass item) { ... } // or by index
public Iterator<MyClass> iterateXxx() {
    return Collections.unmodifiableList(theList).iterator();
}
I would go for defensive copying / deep cloning only if I am sure there would be no side effects from using it, and as for the speed, I wouldn't concern myself with it, since in business applications reliability has 10 times more priority than speed. ;-)
One idea would be to pass the data as a String array and create the Set inside MyClass. Of course, MyClass should check that the input data is valid. I believe that this is good practice anyway.
If both the caller of MyClass and MyClass itself actually work with a Set<String>, then you could consider cloning the collection. The Set however needs to be constructed somehow. I would prefer to move this responsibility to MyClass.
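A defensive copy at both boundaries, copying in the setter and returning a read-only view from the getter, is the most common compromise. A minimal sketch:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class MyClass {
    private final List<String> data;

    public MyClass(Collection<String> data) {
        // Copy in: later changes to the caller's collection can't leak in.
        this.data = new ArrayList<>(data);
    }

    public List<String> getData() {
        // Read-only view out: callers can't mutate the internal state.
        return Collections.unmodifiableList(data);
    }
}
```

As noted above, this is a shallow copy: it is fully safe here only because String is immutable.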
