Java / Android: LinkedList race condition

I am using a LinkedList in an Android application, where I encounter race conditions. One thread adds data to the LinkedList while another removes items as they are retrieved. I encountered a situation where my newly added objects are never handled.
I Googled for Java synchronized objects, but either I am thinking about this too simply or I am missing something. From what I've read, "object synchronization" is the more difficult of the two, so I was wondering whether just enclosing a piece of code in synchronized(object) { } is enough.
So, I have:
public void addMove(Object move) {
    synchronized (list) {
        list.add(move);
    }
}
public void second() {
    synchronized (list) {
        // iterate through the list
        list.clear();
    }
}
Is this really all I need?

- In your case, if your list is properly encapsulated in the object and there is no way to access it except through these methods, then I think that would be enough.
- The approach below can also be used:
public class Test {
    private LinkedList<Object> list = new LinkedList<Object>();
    public void addMove(Object move) {
        synchronized (this) {
            list.add(move);
        }
    }
    public void second() {
        synchronized (this) {
            list.clear();
        }
    }
}
//////////////////////////////////Edited Part///////////////////////
- Synchronization protects the critical state of an object. It is applied to the methods or blocks of statements that access the fields whose data must be protected.
- Every object has exactly one lock, which a thread must acquire before entering a synchronized method or synchronized block that guards the object's instance variables.
- Similarly, every class has exactly one lock, which a thread must acquire before entering a static synchronized method or a block synchronized on the class, guarding its static variables.
- When a thread holds the lock of an object, it can run that object's synchronized methods and blocks. If another thread tries to acquire the same lock in the meantime, it is denied access and moves to the blocked state until the lock is released.

If the list is properly encapsulated into your object (i.e. no other object has access to the list), and if these are the only methods accessing the list, then yes, this is all you need.
The key is that every access to the list must be done in a synchronized block, and every synchronized block must synchronize on the same object (list, in your code example).
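As a rough sketch of that rule, the class below keeps the list private and touches it only inside blocks synchronized on the same object. The names MoveBuffer and drain are hypothetical and not from the question; drain stands in for the "iterate, then clear" method, assuming the consumer wants everything added so far.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical wrapper: the list never escapes this object.
class MoveBuffer {
    private final LinkedList<Object> list = new LinkedList<>();

    // Producer side: append while holding the list's lock.
    void addMove(Object move) {
        synchronized (list) {
            list.add(move);
        }
    }

    // Consumer side: copy and clear inside one critical section, so no
    // element can be added between the copy and the clear and then lost.
    List<Object> drain() {
        synchronized (list) {
            List<Object> snapshot = new ArrayList<>(list);
            list.clear();
            return snapshot;
        }
    }
}
```

The consumer can then process the returned snapshot without holding the lock, which keeps the critical section short.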

Yes, this is how you can control thread access to your object. You should also pay attention to the timing of your methods: clear should only be called once data has been added to the list.

I don't know if Android supports Java 5, but java.util.concurrent contains lots of excellent classes to support queues, such as ConcurrentLinkedQueue.
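A minimal sketch of the queue-based alternative, assuming java.util.concurrent is available on the target platform. The class name MoveQueue and the drain method are made up for illustration; offer and poll are the standard ConcurrentLinkedQueue calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical wrapper around a lock-free queue.
class MoveQueue {
    private final ConcurrentLinkedQueue<Object> queue = new ConcurrentLinkedQueue<>();

    // Producer thread: offer() appends to the tail without explicit locking.
    void addMove(Object move) {
        queue.offer(move);
    }

    // Consumer thread: poll() removes and returns the head, or null when the
    // queue is empty, so each element is handed out exactly once.
    List<Object> drain() {
        List<Object> handled = new ArrayList<>();
        Object move;
        while ((move = queue.poll()) != null) {
            handled.add(move);
        }
        return handled;
    }
}
```

Because elements are removed one by one with poll instead of being iterated and then cleared, an element added during the loop is either picked up now or left for the next drain.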

Related

Why do we need to specify the lock for synchronized statements?

Given that there's only one lock for each instance of a class, then why doesn't Java just allow us to do this:
void method() {
synchronized {
// do something
}
// do other things
}
instead of this:
void method() {
synchronized (lock) {
// do something
}
// do other things
}
What's the purpose of specifying a lock? Does it make a difference if I choose one object as a lock over the other? Or could I just choose any random object?
EDIT:
It turned out that my comprehension of synchronized methods is wrong at the fundamental level.
I thought different synchronized methods or blocks are entirely independent of each other regardless of locks. Rather, all synchronized methods or blocks with the same lock can be accessed only by one thread, even if such synchronized methods/blocks are from different classes (the documentation should have emphasized this more: ALL synced methods/blocks, regardless of location, all that matters is the lock).

Given that there's only one lock for each instance of a class, then why doesn't Java just allow us to do this:
void method() {
synchronized {
// do something
}
// do other things
}
Although an intrinsic lock is provided with each instance,
that's not necessarily the "obvious" lock to use.
You're perhaps right that they could have provided synchronized { ... } as a shorthand for synchronized (this) { ... }.
I don't know why they didn't, but I never missed it.
But concurrent programming is tricky,
so making the lock object an explicit required parameter may make things clearer to readers, which is a good thing, as @ajb pointed out in a comment.
In any case, I don't think syntax is your main question, so let's move on.
What's the purpose of specifying a lock?
Uhm, the lock is perhaps the single most important thing in the synchronization mechanism. The key point in synchronization is that only one thread at a time can hold a given lock. Two threads holding different locks are not synchronized with each other. So knowing which lock guards the synchronization is crucial.
Does it make a difference if I choose one object as a lock over the other?
I hope the previous section makes it clear that yes, you have to choose the object carefully. It has to be an object visible by all threads involved,
it has to be not null, and it has to be something that won't get reassigned during the period of synchronization.
Or could I just choose any random object?
Certainly not. See the previous section.
To understand concurrency in Java, I recommend the book Java Concurrency in Practice by one of the authors of the API, or Oracle's tutorials on the subject.

It's so you can lock on something completely different than this.
Remember how Vector is "thread-safe?" It's not quite that simple; each call is, but code like this isn't because it could have been updated between getting the size of the vector and getting the element:
for (int i = 0; i < vector.size(); ++i) System.out.println(vector.get(i));
Since Vector, along with Collections.synchronized*, is synchronized with the older synchronized keyword, you can make that above code thread-safe by enclosing it all within a lock:
synchronized (vector) {
for (int i = 0; i < vector.size(); ++i) System.out.println(vector.get(i));
}
This could be in a method that isn't thread-safe, isn't synchronized, or uses ReentrantLock; locking the vector is separate from locking this.

It most certainly makes a difference what object you use as a lock. If you say
void method() {
synchronized (x) {
// do something
}
// do other things
}
Now, if one thread is executing the block and another tries to enter the block, if x is the same for both of them, then the second thread will have to wait. But if x is different, the second thread can execute the block at the same time. So, for example, if method is an instance method and you say
void method() {
synchronized (this) {
// do something
}
// do other things
}
Now two threads running the method using the same object can't execute the block simultaneously, but two threads can still run the method on different objects without blocking each other. This is what you'd want when you want to prevent simultaneous access to the instance variables in that object, but you don't have anything else you need to protect. It's not a problem if two threads are accessing variables in two different objects.
But say the block of code is accessing a common resource, and you want to make sure all other threads are locked out of accessing that resource. For example, you're accessing a database, and the block does a series of updates and you want to make sure they're done atomically, i.e. no other code should access the database while you're in between two updates. Now synchronized (this) isn't good enough, because you could have the method running for two different objects but accessing the same database. In this case, you'd need a lock that is the same for all objects that might access the same database. Here, making the database object itself the lock would work. Now no two threads can use method to enter this block at the same time, if they're working with the same database, even if the objects are different.
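The database case might look roughly like the sketch below. Database and Transfer are hypothetical stand-ins (an in-memory object plays the role of the database); the point is only that every Transfer instance locks the shared db object rather than this.

```java
// Hypothetical shared resource; a real one would talk to a database.
class Database {
    private int balanceA = 100;
    private int balanceB = 0;

    void debitA(int n) { balanceA -= n; }
    void creditB(int n) { balanceB += n; }
    int balanceB() { return balanceB; }
    int total() { return balanceA + balanceB; }
}

class Transfer {
    private final Database db;

    Transfer(Database db) { this.db = db; }

    void move(int amount) {
        // Lock on the shared database, not on this: two different Transfer
        // objects using the same db still exclude each other here.
        synchronized (db) {
            db.debitA(amount);
            db.creditB(amount);
        }
    }

    int observedTotal() {
        // Readers take the same lock, so they never see a half-done update.
        synchronized (db) {
            return db.total();
        }
    }
}
```

With synchronized (this) instead, two Transfer objects sharing one Database could interleave their debit and credit steps.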

Suppose you have multiple objects, b1 and b2, that need to be updated concurrently:
class A {
    private B b1, b2;
}
If you have only one lock, say the instance of class A itself:
synchronized (this) { ... }
then two threads updating b1 and b2 at the same time will run one after the other, because both synchronize on this.
But if you have two locks, one for b1 and one for b2:
private final Object lock1 = new Object(), lock2 = new Object();
the two threads will run concurrently, because synchronized (lock1) does not affect synchronized (lock2). This sometimes means better performance.
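A small sketch of the two-lock idea, with plain int counters standing in for b1 and b2 (the class and method names are hypothetical):

```java
// Two independent pieces of state, each guarded by its own lock.
class TwoLockCounters {
    private final Object lock1 = new Object();
    private final Object lock2 = new Object();
    private int b1; // guarded by lock1
    private int b2; // guarded by lock2

    // A thread in incrementB1() never blocks a thread in incrementB2(),
    // because the two methods synchronize on different objects.
    void incrementB1() {
        synchronized (lock1) { b1++; }
    }

    void incrementB2() {
        synchronized (lock2) { b2++; }
    }

    int getB1() {
        synchronized (lock1) { return b1; }
    }

    int getB2() {
        synchronized (lock2) { return b2; }
    }
}
```

This only works when b1 and b2 are truly independent; an operation that must update both consistently would need to hold both locks, always acquired in the same order.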

In synchronized (lock) { ... }, the lock can be an object-level lock or a class-level lock.
Example 1, class-level lock (a static field shared by all instances):
private static final Object lock = new Object();
synchronized (lock) {
    // do something
}
Example 2, object-level lock (one per instance):
private final Object lock = new Object();
synchronized (lock) {
    // do something
}

Thread-safe classes explanation in Java

Let's consider this situation:
public class A {
private Vector<B> v = new Vector<B>();
}
public class B {
private HashSet<C> hs = new HashSet<C>();
}
public class C {
private String sameString;
public void setSameString(String s){
this.sameString = s;
}
}
My questions are:
Vector is thread-safe, so when a thread calls a method on it, for instance get(int index), is this thread the only owner of the HashSet hs?
If a thread calls get(int index) on v and obtains a B object, then obtains a C object and invokes its setSameString(String s) method, is this write thread-safe? Or are mechanisms such as Lock needed?

First of all, take a look at this SO question on reasons not to use Vector. That being said:
1) Vector locks on every operation. That means it only allows one thread at a time to call any of its operations (get, set, add, etc.). There is nothing preventing multiple threads from modifying Bs or their members, because they can obtain a reference to them at different times. The only guarantee with Vector (or classes that have similar synchronization policies) is that no two threads can concurrently modify the vector and thus get into a race condition (which could throw ConcurrentModificationException and/or lead to undefined behavior);
2) As above, there is nothing preventing multiple threads from accessing Cs at the same time, because they can obtain a reference to them at different times.
If you need to protect the state of an object, you need to do it as close to the state as possible. Java has no concept of a thread owning an object. So in your case, if you want to prevent many threads from calling setSameString concurrently, you need to declare the method synchronized.
I recommend the excellent book by Brian Goetz on concurrency for more on the topic.
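Applied to the class C from the question, protecting the state "as close to the state as possible" could be sketched like this. The getter is an addition for illustration and is not in the original code; it is synchronized as well so that readers see the latest write.

```java
// Class C from the question, with its own state guarded by its own lock.
class C {
    private String sameString;

    // Only one thread at a time can run a synchronized method on a given C.
    public synchronized void setSameString(String s) {
        this.sameString = s;
    }

    // Hypothetical getter: reading under the same lock guarantees visibility
    // of the most recent setSameString call.
    public synchronized String getSameString() {
        return sameString;
    }
}
```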

In case 2, it's not thread-safe because multiple threads could access the data at the same time. Consider using a read-write lock if you want to achieve better performance: http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReadWriteLock.html#readLock()
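A minimal sketch of the read-write lock suggestion, using ReentrantReadWriteLock from java.util.concurrent.locks. GuardedString is a hypothetical stand-in for class C; whether this actually beats plain synchronized depends on how read-heavy the workload is.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Many readers may hold the read lock at once; a writer gets exclusive access.
class GuardedString {
    private final ReadWriteLock rw = new ReentrantReadWriteLock();
    private String value;

    void set(String s) {
        rw.writeLock().lock();
        try {
            value = s;
        } finally {
            rw.writeLock().unlock(); // always release, even on exceptions
        }
    }

    String get() {
        rw.readLock().lock();
        try {
            return value;
        } finally {
            rw.readLock().unlock();
        }
    }
}
```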

Will all java threads see shared resource updates after modification in synchronized method?

Is it necessary to use specialized concurrent versions of Java's Collections data structures (e.g. CopyOnWriteArrayList vs ArrayList) if all access to a pair of data structures is always wrapped with the acquisition and release of a lock (in particular, using a static synchronized method for any modifications to the data structure). For example:
public static synchronized Item doIt() {
// remove something from data structure 1
// add the removed item to data structure 2
// return removed item
}
I know the synchronized method will enforce only one thread at a time performing the updates, but by the time a thread has exited the method, are the other threads guaranteed to see the updated data structures, or do I still need specialized concurrent data structures for that guarantee?
Edit:
Here's a better example of what I'm trying to do:
private static final List<Item> A;
private static final HashMap<Integer,Item> B;
public static Item doSomething() {
// some stuff ...
Item item = doIt();
// some other stuff ...
return item;
}
private static synchronized Item doIt() {
Item theItem = A.remove( A.size()-1 );
B.put( theItem.getId(), theItem );
return theItem;
}

Yes, if the access is always wrapped in synchronized methods/blocks.
This is because synchronized establishes a happens-before relation between synchronized methods/blocks (on the same object). Quoting from Synchronized Methods in the Java Tutorial:
Second, when a synchronized method exits, it automatically establishes
a happens-before relationship with any subsequent invocation of a
synchronized method for the same object. This guarantees that changes
to the state of the object are visible to all threads.
However, it is important that you really wrap all access in synchronized blocks. If you would, for example, return a reference to the list from a synchronized method like this
public synchronized List<Object> GetList() {
return this.myList;
}
and use the list outside the synchronized method, you will not get that guarantee!

Synchronization is about quite a bit more than just mutual exclusion. Namely, it is about the visibility of all actions which go on within the block (and precede it as well) to any other thread which subsequently acquires the same lock.
Therefore you don't need concurrent data structures when you use locking for all access to the structures.
Finally, just to make sure: you must use locking for all access, including all reads.
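To illustrate "all access, including all reads", here is a sketch loosely modeled on the question's doIt example. Registry, submit and doneCount are hypothetical names, and String items replace the question's Item type; every static synchronized method locks the same Registry.class monitor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Registry {
    // Plain, non-concurrent collections: safe only because every access
    // below happens while holding the Registry.class lock.
    private static final List<String> pending = new ArrayList<>();
    private static final Map<Integer, String> done = new HashMap<>();

    static synchronized void submit(String item) {
        pending.add(item);
    }

    // Moves the last pending item into the done map as one atomic step.
    static synchronized String doIt() {
        String item = pending.remove(pending.size() - 1);
        done.put(item.hashCode(), item);
        return item;
    }

    // A read is also synchronized; an unsynchronized read would have no
    // happens-before edge with the writes and could see stale state.
    static synchronized int doneCount() {
        return done.size();
    }
}
```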

Non-thread-safe Attempt to Implement Put-if-absent?

There is one code snippet in the 4th chapter in Java Concurrency in Practice
public class ListHelper<E> {
public List<E> list =
Collections.synchronizedList(new ArrayList<E>());
...
public synchronized boolean putIfAbsent(E x) {
boolean absent = !list.contains(x);
if (absent)
list.add(x);
return absent;
}
}
It says this is not thread safe because it uses different locks: putIfAbsent is not atomic relative to other operations on the List.
But I think "synchronized" prevents multiple threads from entering putIfAbsent, and if there are other methods that do other operations on the List, the keyword synchronized should also be used as a method attribute on them. Following this approach, should it be thread safe? Under what circumstances is it "not atomic"?

putIfAbsent is not atomic relative to other operations on the List. But I think "synchronized" prevents multiple threads from entering putIfAbsent
This is true, but there is no guarantee that threads are not accessing the list in other ways. The list field is public (which is always a bad idea), which means that other threads can call methods on the list directly. To properly protect the list, you should make it private and add add(...) and other methods to your ListHelper that are also synchronized, to fully control all access to the synchronized-list.
// we are synchronizing the list so no reason to use Collections.synchronizedList
private List<E> list = new ArrayList<E>();
...
public synchronized boolean add(E e) {
return list.add(e);
}
If the list is private and all of the methods that access the list are synchronized, then you can remove the Collections.synchronizedList(...) since you are synchronizing it yourself.
if there are other methods that do other operations on the List, the keyword synchronized should also be used as a method attribute on them. Following this approach, should it be thread safe?
Not sure I fully parse this part of the question. But if you make the list be private and you add other methods to access the list that are all synchronized then you are correct.
Under what case "it is not atomic"?
putIfAbsent(...) is not atomic because there are multiple calls to the synchronized-list. If multiple threads are operating on the list then another thread could have called list.add(...) between the time putIfAbsent(...) called list.contains(x) and then calls list.add(x). The Collections.synchronizedList(...) protects the list against corruption by multiple threads but it cannot protect against race-conditions when there are multiple calls to list methods that could interleave with calls from other threads.

Any unsynchronized method that modifies the list may introduce the absent element after list.contains() returns false, but before the element has been added.
Picture this as two threads:
boolean absent = !list.contains(x); // Returns true
-> list.add(theSameElementAsX); // Another thread
if(absent) // absent is true, but the list has been modified!
list.add(x);
return absent;
This could be accomplished with simply a method as follows:
public void add(E e) {
list.add(e);
}
If the method were synchronized, there would be no problem, since the add method wouldn't be able to run before putIfAbsent() was fully finished.
A proper correction would include making the List private, and making sure that compound operations on it are properly synchronized (i.e. on the class or the list itself).

Thread safety is not composable! Imagine a program built entirely out of "thread safe" classes. Is the program itself "thread safe?" Not necessarily. It depends on what the program does with those classes.
The synchronizedList wrapper makes each individual method of a List "thread safe". What does that mean? It means that none of those wrapped methods can corrupt the internal structure of the list when called in a multi-threaded environment.
That doesn't protect the way in which any given program uses the list. In the example code, the list appears to be used as an implementation of a set: The program doesn't allow the same object to appear in the list more than one time. There's nothing in the synchronizedList wrapper that will enforce that particular guarantee though, because that guarantee has nothing to do with the internal structure of the list. The list can be perfectly valid as a list, but not valid as a set.
That's why the additional synchronization on the putIfAbsent() method.

Collections.synchronizedList() creates a collection that synchronizes every one of its methods on a private mutex field. With the one-argument factory used in the example, this mutex is the returned list itself (its own this); a two-argument factory that accepts an explicit mutex also exists, although in the JDK that overload is not public. That's why we need an external lock to make the subsequent contains() and add() calls atomic.
If the list is accessible directly rather than only via ListHelper, this code is broken, because access will then be guarded by different locks. To prevent that, make the list private so that it cannot be accessed directly, and wrap all necessary API methods with synchronization on the same mutex, either one declared in ListHelper or the this of ListHelper itself.
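Another way to get both operations under one lock, and the one the book itself goes on to show, is to synchronize on the list rather than on the helper. A sketch along those lines, relying on the documented behavior that the wrapper returned by Collections.synchronizedList locks on itself:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ListHelper<E> {
    public final List<E> list =
            Collections.synchronizedList(new ArrayList<E>());

    public boolean putIfAbsent(E x) {
        // Same monitor the wrapper uses for its own methods, so a direct
        // list.add(...) from another thread cannot slip in between the
        // contains() check and the add().
        synchronized (list) {
            boolean absent = !list.contains(x);
            if (absent) {
                list.add(x);
            }
            return absent;
        }
    }
}
```

Here the list can even stay public, because every caller, inside or outside the helper, ends up using the same lock.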

Determining synchronization scope?

In trying to improve my understanding of concurrency issues, I am looking at the following scenario (Edit: I've changed the example from List to Runtime, which is closer to what I am trying to do):
public class Example {
private final Object lock = new Object();
private final Runtime runtime = Runtime.getRuntime();
public void add(Object o) {
synchronized (lock) { runtime.exec(program + " -add "+o); }
}
public Object[] getAll() {
synchronized (lock) { return runtime.exec(program + " -list "); }
}
public void remove(Object o) {
synchronized (lock) { runtime.exec(program + " -remove "+o); }
}
}
As it stands, each method is thread safe when used standalone. Now, what I'm trying to figure out is how to handle the case where the calling class wishes to call:
for (Object o : example.getAll()) {
    // problems if multiple threads perform this operation concurrently
    example.remove(o);
}
But as noted, there is no guarantee that the state will be consistent between the call to getAll() and the calls to remove(). If multiple threads call this, I'll be in trouble. So my question is - How should I enable the developer to perform the operation in a thread safe manner? Ideally I wish to enforce the thread safety in a way that makes it difficult for the developer to avoid/miss, but at the same time not complicated to achieve. I can think of three options so far:
A: Make the lock 'this', so the synchronization object is accessible to calling code, which can then wrap the code blocks. Drawback: Hard to enforce at compile time:
synchronized (example) {
    for (Object o : example.getAll()) {
        example.remove(o);
    }
}
B: Place the combined code into the Example class - and benefit from being able to optimize the implementation, as in this case. Drawback: Pain to add extensions, and potential mixing unrelated logic:
public class Example {
    ...
    public void removeAll() {
        synchronized (lock) { runtime.exec(program + " -clear"); }
    }
}
C: Provide a Closure class. Drawback: Excess code, potentially too generous of a synchronization block, could in fact make deadlocks easier:
public interface ExampleClosure {
    public void execute(Example example);
}
public class Example {
    ...
    public void execute(ExampleClosure closure) {
        synchronized (this) { closure.execute(this); }
    }
}
example.execute(new ExampleClosure() {
    public void execute(Example example) {
        for (Object o : example.getAll()) {
            example.remove(o);
        }
    }
});
Is there something I'm missing? How should synchronization be scoped to ensure the code is thread safe?

Use a ReentrantReadWriteLock which is exposed via the API. That way, if someone needs to synchronize several API calls, they can acquire a lock outside of the method calls.

In general, this is a classic multithreaded design issue. By synchronizing the data structure rather than synchronizing concepts that use the data structure, it's hard to avoid the fact that you essentially have a reference to the data structure without a lock.
I would recommend that locks not be done so close to the data structure. But it's a popular option.
A potential technique to make this style work is to use an editing tree-walker. Essentially, you expose a function that does a callback on each element.
// pointer to function:
// - takes Object by reference and can be safely altered
// - if returns true, Object will be removed from list
typedef bool (*callback_function)(Object *o);
public void editAll(callback_function func) {
    synchronized (lock) {
        for each element o { if (func(o)) {remove o} }
    }
}
So then your loop becomes:
bool my_function(Object *o) {
...
if (some condition) return true;
}
...
editAll(my_function);
...
The company I work for (corensic) has test cases extracted from real bugs to verify that Jinx is finding the concurrency errors properly. This type of low level data structure locking without higher level synchronization is pretty common pattern. The tree editing callback seems to be a popular fix for this race condition.
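The callback idea above is written as C-style pseudocode; in Java it could be sketched roughly as follows. EditableStore is a hypothetical class, an in-memory list stands in for the external program from the question, and Predicate assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

class EditableStore {
    private final Object lock = new Object();
    private final List<Object> items = new ArrayList<>();

    void add(Object o) {
        synchronized (lock) { items.add(o); }
    }

    // Visits every element while holding the lock; the callback returns
    // true for elements that should be removed.
    void editAll(Predicate<Object> shouldRemove) {
        synchronized (lock) {
            for (Iterator<Object> it = items.iterator(); it.hasNext(); ) {
                if (shouldRemove.test(it.next())) {
                    it.remove(); // safe removal during iteration
                }
            }
        }
    }

    int size() {
        synchronized (lock) { return items.size(); }
    }
}
```

A caller would write something like store.editAll(o -> o instanceof String). The callback runs while the lock is held, so it should be short and should not call out to code that takes other locks.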

I think everyone is missing his real problem. When iterating over the new array of Objects and trying to remove them one at a time, the operation is still technically unsafe (the ArrayList implementation would not explode, it just wouldn't produce the expected results).
Even with CopyOnWriteArrayList, there is the possibility that your read of the list is out of date by the time you try to remove.
The two suggestions you offered are fine (A and B). My general suggestion is B. Making a collection thread-safe is very difficult. A good way to do it is to give the client as little functionality as possible (within reason). So offering the removeAll method and removing the getAll method would suffice.
Now you can at the same time say, 'well, I want to keep the API the way it is and let the client worry about additional thread-safety'. If that's the case, document thread-safety. Document the fact that a 'lookup and modify' action is both non-atomic and non-thread-safe.
Today's concurrent list implementations are all thread safe for the single functions that are offered (get, remove, and add are all thread safe). Compound functions are not, though, and the best that can be done is documenting how to make them thread safe.

I think j.u.c.CopyOnWriteArrayList is a good example of a problem similar to the one you're trying to solve.
The JDK had a similar problem with Lists: there were various ways to synchronize individual methods, but no synchronization across multiple invocations (and that's understandable).
So CopyOnWriteArrayList implements the same interface but has a very special contract, and whoever calls it is aware of it.
The same goes for your solution: you should probably implement List (or whatever interface this is) and at the same time define special contracts for existing/new methods. For example, getAll's consistency is not guaranteed, and calls to .remove do not fail if o is null or isn't inside the list, etc. If users want both combined and safe/consistent options, this class of yours would provide a special method that does exactly that (e.g. safeDeleteAll), leaving other methods as close to the original contract as possible.
So to answer your question: I would pick option B, but would also implement the interface your original object is implementing.

From the Javadoc for List.toArray():
The returned array will be "safe" in
that no references to it are
maintained by this list. (In other
words, this method must allocate a new
array even if this list is backed by
an array). The caller is thus free to
modify the returned array.
Maybe I don't understand what you're trying to accomplish. Do you want the Object[] array to always be in-sync with the current state of the List? In order to achieve that, I think you would have to synchronize on the Example instance itself and hold the lock until your thread is done with its method call AND any Object[] array it is currently using. Otherwise, how will you ever know if the original List has been modified by another thread?

You have to use the appropriate granularity when you choose what to lock. What you're complaining about in your example is too low a level of granularity, where the lock doesn't cover all the methods that have to happen together. You need to make methods that combine all the actions that need to happen together within the same lock.
Locks are reentrant so the high-level method can call low-level synchronized methods without a problem.
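As a small illustration of that reentrancy, here is a sketch with a hypothetical Inventory class, using an in-memory list in place of the external program from the question. The high-level removeAll calls the low-level synchronized methods while already holding the same monitor.

```java
import java.util.ArrayList;
import java.util.List;

class Inventory {
    private final List<Object> items = new ArrayList<>();

    public synchronized void add(Object o) { items.add(o); }

    // Returns a copy so callers never iterate the guarded list directly.
    public synchronized List<Object> getAll() { return new ArrayList<>(items); }

    public synchronized void remove(Object o) { items.remove(o); }

    // Holds the lock across the whole compound operation. The nested calls
    // to getAll() and remove() re-enter the monitor this thread already
    // owns, so they do not deadlock.
    public synchronized int removeAll() {
        List<Object> snapshot = getAll();
        for (Object o : snapshot) {
            remove(o);
        }
        return snapshot.size();
    }
}
```

No other thread can add or remove elements between the getAll and the remove calls, which is exactly the gap the caller-side loop in the question leaves open.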
