Java: Should I construct lightweight objects each time or cache an instance?

During code review, a colleague of mine looked at this piece of code:
public List<Item> extractItems(List<Object[]> results) {
return Lists.transform(results, new Function<Object[], Item>() {
@Override
public Item apply(Object[] values) {
...
}
});
}
He suggests changing it to this:
public List<Item> extractItems(List<Object[]> results) {
return Lists.transform(results, getTransformer());
}
private Function<Object[], Item> transformer;
private Function<Object[], Item> getTransformer() {
if(transformer == null) {
transformer = new Function<Object[], Item>() {
@Override
public Item apply(Object[] values) {
...
}
};
}
return transformer;
}
So we are looking at taking the new Function() construction, and moving it over to be a member variable and re-used next time.
While I understand his logic and reasoning, I guess I'm not sold that I should do this for every possible object that I create that follows this pattern. It seems like there would be some good reasons not to do this, but I'm not sure.
What are your thoughts? Should we always cache duplicately created objects like this?
UPDATE
The Function is from Google Guava and holds no state. A couple of people have pointed out that this change is not thread-safe, which is perfectly valid, but isn't actually a concern here. I'm really asking about the practice of constructing versus caching small objects: which is better?

Your colleague's proposal is not thread safe. It also reeks of premature optimization. Is the construction of a Function object a known (tested) CPU bottleneck? If not, there's no reason to do this. It's not a memory problem - you're not keeping a reference, so GC will sweep it away, probably from Eden.

As already said, it's all premature optimization. The gain is probably not measurable and the whole story should be forgotten.
However, with the transformer being stateless, I'd go for it for readability reasons. Anonymous functions as an argument rather pollute the code.
Just drop the lazy initialization - you're gonna use the transformer whenever you use the class, right? (*) So put it in a static final field and maybe you can reuse it somewhere else.
(*) And even if not, creating and holding a cheap object during the whole application lifetime doesn't matter.
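A minimal sketch of the static final suggestion above. To keep it free of external dependencies it uses java.util.function.Function as a stand-in for Guava's Function, and a stream instead of Lists.transform (so the result is built eagerly rather than as a lazy view). Item, its name field, the ItemExtractor class and the body of the function are all hypothetical, since the question elides them.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical value type; the question does not show Item.
final class Item {
    final String name;
    Item(String name) { this.name = name; }
}

final class ItemExtractor {
    // Stateless, so one shared instance is safe to use from any thread.
    // Eager initialization in a static final field avoids the racy null check.
    static final Function<Object[], Item> TO_ITEM =
            values -> new Item(String.valueOf(values[0])); // assumed mapping

    List<Item> extractItems(List<Object[]> results) {
        return results.stream().map(TO_ITEM).collect(Collectors.toList());
    }
}
```

Because the field is initialized when the class is loaded, there is no lazy getter to get wrong, and the function has a name that other code can reuse.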

Related

Multithreading with classes

This is a bit of an interesting question but I wanted to know everyone's thoughts on this design pattern.
public class MyThreadedMap {
private ConcurrentHashMap<Integer, Object> map;
...
public class Wrapper {
public Object get(int index){
return map.get(index);
}
}
}
At this point multiple threads will have their own instance of Wrapper and would be accessing the map with wrapper.get(index).
I found that performance with the wrapper is only slightly better than without it; that is, the wrapper helps a little. When I mark the get method as synchronized, there is a serious performance hit.
What exactly is happening here? When an inner class is instantiated am I creating a copy of that get method for each instance? Would it be best if I just left the wrapper out since there is no real performance gain?
ConcurrentHashMap is designed to minimize synchronization overhead; in particular, retrieval operations such as get generally do not block. When you synchronize the get method, you impose normal synchronization overhead on every call, hence the performance hit.
If there is no other code in the Wrapper class, I would just leave it out as it doesn't appear to add anything.

Object locking private class members - best practice? (Java)

I asked a similar question the other day but wasn't satisfied with the response, mainly because the code I supplied had some issues that people focused on.
Basically, what is the best practice for locking private members in Java? Assuming each private field can only be manipulated in isolation and never together (like in my Test class example below), should you lock each private field directly (example 1), or should you use a general lock object per private field you wish to lock (example 2)?
Example 1: Lock private fields directly
class Test {
private final List<Object> xList = new ArrayList<Object>();
private final List<Object> yList = new ArrayList<Object>();
/* xList methods */
public void addToX(Object o) {
synchronized(xList) {
xList.add(o);
}
}
public void removeFromX(Object o) {
synchronized(xList) {
xList.remove(o);
}
}
/* yList methods */
public void addToY(Object o) {
synchronized(yList) {
yList.add(o);
}
}
public void removeFromY(Object o) {
synchronized(yList) {
yList.remove(o);
}
}
}
Example 2: Use lock objects per private field
class Test {
private final Object xLock = new Object();
private final Object yLock = new Object();
private List<Object> xList = new ArrayList<Object>();
private List<Object> yList = new ArrayList<Object>();
/* xList methods */
public void addToX(Object o) {
synchronized(xLock) {
xList.add(o);
}
}
public void removeFromX(Object o) {
synchronized(xLock) {
xList.remove(o);
}
}
/* yList methods */
public void addToY(Object o) {
synchronized(yLock) {
yList.add(o);
}
}
public void removeFromY(Object o) {
synchronized(yLock) {
yList.remove(o);
}
}
}
Personally I prefer the second form. No other code at all can use that reference (barring reflection, debugging APIs etc). You don't need to worry about whether the internal details of the list try to synchronize on it. (Any method you call on a list obviously has access to this, so could synchronize on it.) You're purely using it for locking - so you've also got separation of concerns between "I'm a lock" and "I'm a list".
I find that way it's easier to reason about the monitor, as you can easily see all the possible code that uses it.
You may wish to create a separate class purely for use as monitors, with an override for toString() which could help with diagnostics. It would also make the purpose of the variable clearer.
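A rough sketch of what such a monitor-only class might look like. NamedLock, LockedLists and sizeOfX are hypothetical names chosen for illustration, not something from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: an object used only as a monitor, with a readable name.
final class NamedLock {
    private final String name;
    NamedLock(String name) { this.name = name; }

    @Override
    public String toString() { return "NamedLock[" + name + "]"; }
}

final class LockedLists {
    // The field type alone tells the reader this object exists only for locking.
    private final NamedLock xLock = new NamedLock("xList");
    private final List<Object> xList = new ArrayList<>();

    void addToX(Object o) {
        synchronized (xLock) { xList.add(o); }
    }

    int sizeOfX() {
        synchronized (xLock) { return xList.size(); }
    }
}
```

The name shows up wherever the lock is printed, for example in log messages, which is the diagnostic benefit mentioned above.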
Admittedly this approach does take more memory, and usually you don't need to worry about code locking on this... but I personally feel that the benefit of separating the concerns and not having to worry about whether that code does lock on itself outweighs the efficiency cost. You can always choose to go for the first form if you find that the "wasted" objects are a performance bottleneck for some reason (and after you've analyzed the code in the class you're potentially going to synchronize on).
(Personally I wish that both Java and .NET hadn't gone down the "every object has an associated monitor" route, but that's a rant for a different day.)
Example 1 is much better. Since xList and yList are final, they are fine to synchronize on. There is no need for extra lock objects, which unnecessarily complicate the code and consume memory. Just make sure the lists themselves are never exposed to the outside world, which would break encapsulation and thread safety.
However consider:
CopyOnWriteArrayList
Collections.synchronizedList() - and see also: Java synchronized block vs. Collections.synchronizedMap
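For comparison, a small sketch of the two alternatives just listed, assuming the class only needs the add/remove operations from the question. ListHolder, countX and countY are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class ListHolder {
    // Every individual call is synchronized on the wrapper itself.
    private final List<Object> xList = Collections.synchronizedList(new ArrayList<>());
    // Each write copies the backing array; suited to read-mostly data.
    private final List<Object> yList = new CopyOnWriteArrayList<>();

    void addToX(Object o) { xList.add(o); }
    void removeFromX(Object o) { xList.remove(o); }
    void addToY(Object o) { yList.add(o); }
    void removeFromY(Object o) { yList.remove(o); }

    int countX() {
        // Iterating a synchronizedList still needs a manual lock on the wrapper.
        synchronized (xList) {
            int n = 0;
            for (Object ignored : xList) n++;
            return n;
        }
    }

    int countY() { return yList.size(); }
}
```

Both remove the explicit synchronized blocks for single calls, but neither makes a sequence of calls atomic.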
Let's put it this way: the second approach uses more code -- what does that extra code buy you? As far as concurrency, the two are exactly the same, so it must be some other aspect from the bigger picture of your app design.
Even if you're sure the object you are locking on will never change, I find it more reassuring to use a special object just for locking. It makes the intent more transparent. If the class is significantly expanded or modified in the future by someone else, they might find a reason to make xList non-final without noticing that it's used for locking. This could quickly lead to problems. Thread safety is not trivial and can grow more complex as code evolves, so make it as clear and as safe as possible. The cost of having a separate object just for locking is small compared to the cost of diagnosing thread-safety problems.

Determining synchronization scope?

In trying to improve my understanding of concurrency issues, I am looking at the following scenario (Edit: I've changed the example from List to Runtime, which is closer to what I am trying to do):
public class Example {
private final Object lock = new Object();
private final Runtime runtime = Runtime.getRuntime();
public void add(Object o) {
synchronized (lock) { runtime.exec(program + " -add "+o); }
}
public Object[] getAll() {
synchronized (lock) { return runtime.exec(program + " -list "); }
}
public void remove(Object o) {
synchronized (lock) { runtime.exec(program + " -remove "+o); }
}
}
As it stands, each method is thread safe when used standalone. Now, what I'm trying to figure out is how to handle the case where the calling class wishes to call:
for (Object o : example.getAll()) {
// problems if multiple threads perform this operation concurrently
example.remove(o);
}
But as noted, there is no guarantee that the state will be consistent between the call to getAll() and the calls to remove(). If multiple threads call this, I'll be in trouble. So my question is - How should I enable the developer to perform the operation in a thread safe manner? Ideally I wish to enforce the thread safety in a way that makes it difficult for the developer to avoid/miss, but at the same time not complicated to achieve. I can think of three options so far:
A: Make the lock 'this', so the synchronization object is accessible to calling code, which can then wrap the code blocks. Drawback: Hard to enforce at compile time:
synchronized (example) {
for (Object o : example.getAll()) {
example.remove(o);
}
}
B: Place the combined code into the Example class, and benefit from being able to optimize the implementation, as in this case. Drawback: Pain to add extensions, and potential mixing of unrelated logic:
public class Example {
...
public void removeAll() {
synchronized (lock) { runtime.exec(program + " -clear"); }
}
}
C: Provide a Closure class. Drawback: Excess code, potentially too generous of a synchronization block, could in fact make deadlocks easier:
public interface ExampleClosure {
public void execute(Example example);
}
public class Example {
...
public void execute(ExampleClosure closure) {
synchronized (this) { closure.execute(this); }
}
}
example.execute(new ExampleClosure() {
public void execute(Example example) {
for (Object o : example.getAll()) {
example.remove(o);
}
}
}
);
Is there something I'm missing? How should synchronization be scoped to ensure the code is thread safe?
Use a ReentrantReadWriteLock which is exposed via the API. That way, if someone needs to synchronize several API calls, they can acquire a lock outside of the method calls.
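A minimal sketch of that idea, assuming an in-memory list in place of the external program the question shells out to. LockedStore and its lock() accessor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class LockedStore {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<Object> items = new ArrayList<>();

    // Exposed so callers can hold the lock across several API calls.
    ReentrantReadWriteLock lock() { return lock; }

    void add(Object o) {
        lock.writeLock().lock();
        try { items.add(o); } finally { lock.writeLock().unlock(); }
    }

    List<Object> getAll() {
        lock.readLock().lock();
        try { return new ArrayList<>(items); } finally { lock.readLock().unlock(); }
    }

    void remove(Object o) {
        lock.writeLock().lock();
        try { items.remove(o); } finally { lock.writeLock().unlock(); }
    }
}
```

A caller that wants the getAll/remove loop to be atomic takes store.lock().writeLock() around the whole loop and releases it in a finally block. It has to be the write lock: a thread holding the write lock may also acquire the read lock, but ReentrantReadWriteLock does not allow upgrading from read to write, so taking only the read lock around the loop would block forever on the first remove.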
In general, this is a classic multithreaded design issue. By synchronizing the data structure rather than synchronizing concepts that use the data structure, it's hard to avoid the fact that you essentially have a reference to the data structure without a lock.
I would recommend that locks not be done so close to the data structure. But it's a popular option.
A potential technique to make this style work is to use an editing tree-walker. Essentially, you expose a function that does a callback on each element.
// pointer to function:
// - takes Object by reference and can be safely altered
// - if returns true, Object will be removed from list
typedef bool (*callback_function)(Object *o);
public void editAll(callback_function func) {
synchronized (lock) {
for each element o { if (func(o)) { remove o } } }
}
So then your loop becomes:
bool my_function(Object *o) {
...
if (some condition) return true;
}
...
editAll(my_function);
...
The company I work for (Corensic) has test cases extracted from real bugs to verify that Jinx is finding the concurrency errors properly. This type of low-level data structure locking without higher-level synchronization is a pretty common pattern. The tree-editing callback seems to be a popular fix for this race condition.
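The pseudocode above mixes C and Java syntax. In Java the same callback idea could be sketched as follows, assuming Java 8+ and an in-memory list; EditableStore and size() are hypothetical names, and Predicate plays the role of callback_function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class EditableStore {
    private final Object lock = new Object();
    private final List<Object> items = new ArrayList<>();

    void add(Object o) {
        synchronized (lock) { items.add(o); }
    }

    // Runs the callback on every element under a single lock acquisition;
    // elements for which it returns true are removed.
    void editAll(Predicate<Object> shouldRemove) {
        synchronized (lock) { items.removeIf(shouldRemove); }
    }

    int size() {
        synchronized (lock) { return items.size(); }
    }
}
```

The caller never sees the list or the lock, so the read-then-remove sequence cannot be interleaved with another thread's changes. The callback runs while the lock is held, so it should be short and should not call back into other locked code.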
I think everyone is missing his real problem. When iterating over the new array of Objects and trying to remove them one at a time, the operation is still technically unsafe (an ArrayList implementation would not explode, it just wouldn't give the expected results).
Even with CopyOnWriteArrayList, the snapshot you read may already be out of date by the time you try to remove.
The two suggestions you offered are fine (A and B). My general suggestion is B. Making a collection thread-safe is very difficult. A good way to do it is to give the client as little functionality as possible (within reason). So offering the removeAll method and removing the getAll method would suffice.
Now you can at the same time say, 'well, I want to keep the API the way it is and let the client worry about additional thread-safety'. If that's the case, document the thread-safety. Document the fact that a 'lookup and modify' action is neither atomic nor thread-safe.
Today's concurrent list implementations are all thread safe for the single functions that are offered (get, remove and add are all thread safe). Compound functions are not, though, and the best that can be done is to document how to make them thread safe.
I think j.u.c.CopyOnWriteArrayList is a good example of a problem similar to the one you're trying to solve.
The JDK had a similar problem with Lists: there were various ways to synchronize individual methods, but no synchronization across multiple invocations (which is understandable).
So CopyOnWriteArrayList implements the same interface but has a very special contract, and whoever calls it is aware of that.
Similarly with your solution: you should probably implement List (or whatever interface this is) and at the same time define special contracts for existing and new methods. For example, getAll's consistency is not guaranteed, and calls to remove do not fail if o is null or isn't in the list, etc. If users want operations that are both combined and safe/consistent, this class of yours would provide a special method that does exactly that (e.g. safeDeleteAll), leaving the other methods as close to the original contract as possible.
So to answer your question: I would pick option B, but would also implement the interface your original object implements.
From the Javadoc for List.toArray():
"The returned array will be "safe" in that no references to it are maintained by this list. (In other words, this method must allocate a new array even if this list is backed by an array). The caller is thus free to modify the returned array."
Maybe I don't understand what you're trying to accomplish. Do you want the Object[] array to always be in-sync with the current state of the List? In order to achieve that, I think you would have to synchronize on the Example instance itself and hold the lock until your thread is done with its method call AND any Object[] array it is currently using. Otherwise, how will you ever know if the original List has been modified by another thread?
You have to use the appropriate granularity when you choose what to lock. What you're complaining about in your example is too low a level of granularity, where the lock doesn't cover all the methods that have to happen together. You need to make methods that combine all the actions that need to happen together within the same lock.
Locks are reentrant so the high-level method can call low-level synchronized methods without a problem.
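To illustrate the reentrancy point, here is a small sketch, again assuming an in-memory list rather than the external program from the question. ReentrantExample is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class ReentrantExample {
    private final Object lock = new Object();
    private final List<Object> items = new ArrayList<>();

    void add(Object o) {
        synchronized (lock) { items.add(o); }
    }

    List<Object> getAll() {
        synchronized (lock) { return new ArrayList<>(items); }
    }

    void remove(Object o) {
        synchronized (lock) { items.remove(o); }
    }

    // High-level operation: holds the lock for the whole loop.
    // The nested calls re-acquire the same monitor, which Java permits
    // because intrinsic locks are reentrant for the owning thread.
    void removeAll() {
        synchronized (lock) {
            for (Object o : getAll()) {
                remove(o);
            }
        }
    }
}
```

The low-level methods stay individually safe, and the compound method gets the coarser granularity without any extra lock.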

Generating singletons

This might sound like a weird idea and I haven't thought it through properly yet.
Say you have an application that ends up requiring a certain number of singletons to do some I/O for example. You could write one singleton and basically reproduce the code as many times as needed.
However, as programmers we're supposed to come up with inventive solutions that avoid redundancy or repetition of any kind. What would be a way to create multiple somethings that could each act as a singleton?
P.S: This is for a project where a framework such as Spring can't be used.
You could introduce an abstraction like this:
public abstract class Singleton<T> {
private T object;
public synchronized T get() {
if (object == null) {
object = create();
}
return object;
}
protected abstract T create();
}
Then for each singleton, you just need to write this:
public final Singleton<Database> database = new Singleton<Database>() {
@Override
protected Database create() {
// connect to the database, return the Database instance
}
};
public final Singleton<LogCluster> logs = new Singleton<LogCluster>() {
...
Then you can use the singletons by writing database.get(). If the singleton hasn't been created, it is created and initialized.
The reason people probably don't do this, and prefer to just repeatedly write something like this:
private Database database;
public synchronized Database getDatabase() {
if (database == null) {
// connect to the database, assign the database field
}
return database;
}
private LogCluster logs;
public synchronized LogCluster getLogs() {
...
is because in the end it is only one more line of code for each singleton, and the chance of getting the initialize-singleton pattern wrong is pretty low.
However, as programmers we're supposed to come up with inventive solutions that avoid redundancy or repetition of any kind.
That is not correct. As programmers, we are supposed to come up with solutions that meet the following criteria:
meet the functional requirements; e.g. perform as required without bugs,
are delivered within the mandated timeframe,
are maintainable; e.g. the next developer can read and modify the code,
perform fast enough for the task in hand, and
can be reused in future tasks.
(These criteria are roughly ordered by decreasing priority, though different contexts may dictate a different order.)
Inventiveness is NOT a requirement, and "avoid[ing] redundancy or repetition of any kind" is not either. In fact both of these can be distinctly harmful ... if the programmer ignores the real criteria.
Bringing this back to your question. You should only be looking for alternative ways to do singletons if it is going to actually make the code more maintainable. Complicated "inventive" solutions may well return to bite you (or the people who have to maintain your code in the future), even if they succeed in reducing the number of lines of repeated code.
And as others have pointed out (e.g. @BalusC), current thinking is that the singleton pattern should be avoided in a lot of classes of application.
There does exist a multiton pattern. Regardless, I am 60% certain that the real solution to the original problem is a RDBMS.
@BalusC is right, but I will say it more strongly: Singletons are evil in all contexts.
Webapps, desktop apps, etc. Just don't do it.
All a singleton is in reality is a global wad of data. Global data is bad. It makes proper unit testing impossible. It makes tracing down weird bugs much, much harder.
The Gang of Four book is flat out wrong here. Or at least obsolete by a decade and a half.
If you want only one instance, have a factory that makes only one. It's easy.
How about passing a parameter to the function that creates the singleton (for example, its name or specialization), so that it knows to create one singleton for each unique parameter?
I know you asked about Java, but here is a solution in PHP as an example:
abstract class Singleton
{
protected function __construct()
{
}
final public static function getInstance()
{
static $instances = array();
$calledClass = get_called_class();
if (!isset($instances[$calledClass]))
{
$instances[$calledClass] = new $calledClass();
}
return $instances[$calledClass];
}
final private function __clone()
{
}
}
Then you just write:
class Database extends Singleton {}
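Since the question is about Java, here is a rough Java counterpart of the one-instance-per-key idea, assuming Java 8+. Instances, get and the string keys are hypothetical; the PHP version keys on the called class instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class Instances {
    private static final Map<String, Object> INSTANCES = new ConcurrentHashMap<>();

    private Instances() {}

    // Returns the single instance registered under key, creating it on first use.
    // computeIfAbsent runs the factory at most once per key.
    @SuppressWarnings("unchecked")
    static <T> T get(String key, Supplier<T> factory) {
        return (T) INSTANCES.computeIfAbsent(key, k -> factory.get());
    }
}
```

A call such as Instances.get("database", Database::new) would then return the same object every time, with Database standing in for whatever class the application needs. The unchecked cast means the caller must use a consistent type per key; asking for the same key with a different type fails at runtime.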

What's the preferred way to assign a collection from a parameter?

I have this class:
public class MyClass {
public void initialize(Collection<String> data) {
this.data = data; // <-- Bad!
}
private Collection<String> data;
}
This is obviously bad style, because I'm introducing a shared mutable state. What's the preferred way to handle this?
Ignore it?
Clone the collection?
...?
EDIT: To clarify why this is bad, imagine this:
MyClass myObject = new MyClass();
List<String> data = new ArrayList<String>();
myObject.initialize(data); // myObject.data.size() == 0
data.add("Test"); // myObject.data.size() == 1
Just storing the reference poses a way to inject data into the private field myObject.data, although it should be completely private.
Depending on the nature of MyClass this could have serious impacts.
The best way is to deep clone the parameter. For performance reasons, this is usually not possible. On top of that, not all objects can be cloned, so deep copying might throw exceptions and cause all kinds of headache.
The next best way would be a "copy-on-write" clone. There is no support for this in the Java runtime.
If you think that it's possible someone mutates the collection, do a shallow copy using the copy constructor:
this.data = new HashSet<String> (data);
This will solve your problem (since String is immutable) but it will fail when the type in the set is mutable.
Another solution is to always make the sets immutable as soon as you store them somewhere:
Set<String> set = ...
...build the set...
// Freeze the set
set = Collections.unmodifiableSet(set);
// Now you can safely pass it elsewhere
obj.setData (set);
The idea here is turn collections into "value objects" as soon as possible. Anyone who wants to change the collection must copy it, change it and then save it back.
Within a class, you can keep the set mutable and wrap it in the getter (which you should do anyway).
Problems with this approach: Performance (but it's probably not as bad as you'd expect) and discipline (breaks if you forget it somewhere).
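Putting the shallow copy and the wrapped getter together, a minimal sketch might look like this. The getData accessor is an assumption; the question only shows initialize.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class MyClass {
    private Set<String> data = new HashSet<>();

    void initialize(Collection<String> input) {
        // Shallow copy: later changes to the caller's collection are not seen here.
        // Sufficient for String elements because String is immutable.
        this.data = new HashSet<>(input);
    }

    Set<String> getData() {
        // Read-only view: callers cannot mutate the internal set through it.
        return Collections.unmodifiableSet(data);
    }
}
```

With this version, the data.add("Test") call from the question no longer affects the object's internal state, and any attempt to modify the set returned by getData() throws UnsupportedOperationException.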
Null check (if you want to restrict null)
Either defensive copy (if you don't want shared state)
or as you did (if a live view on data is useful)
Depends heavily on your requirements.
Edited:
Ignoring should be no option. Silent fail is, well... a debugging nightmare.
public class Foo {
private final Collection<String> collection = new ArrayList<String>();
public void initialise(final Collection<String> collection) {
this.collection.addAll(collection);
}
}
Sorry for not addressing your concern directly, but I would never directly pass a Collection to a setXxx() bean setter method. Instead, I would do:
private final List<MyClass> theList;
public void addXxx(MyClass item) { ... }
public void removeXxx(MyClass item) { ... } // or index.
public Iterator<MyClass> iterateXxx() {
return Collections.unmodifiableList(theList).iterator();
}
I would go for defensive copying / deep cloning only if I am sure there would be no side effects from using it, and as for the speed, I wouldn't concern myself with it, since in business applications reliability has 10 times more priority than speed. ;-)
One idea would be to pass the data as a String array and create the Set inside MyClass. Of course MyClass should check that the input data is valid. I believe that this is good practice anyway.
If both the caller of MyClass and MyClass itself actually work with a Set<String>, then you could consider cloning the collection. The Set however needs to be constructed somehow. I would prefer to move this responsibility to MyClass.
