We have a system in which each thread (there can be dozens of them) works as an individual agent.
It has its own inner variables and objects, and it monitors other threads' objects (as well as its own) in order to make decisions.
Unfortunately the system is deadlocking quite often.
Going through the Java tutorial (http://download.oracle.com/javase/tutorial/essential/concurrency/index.html) and through other topics here on Stack Overflow, I managed to avoid some of these deadlocks by synchronizing the methods and using a monitor, as in:
Producer->monitor->Consumer.
However, not all communication between threads can be modeled like this. As I've mentioned before, at a given time one thread must have access to the objects (variables, lists, etc) of the other threads.
The way it's being done now is that each thread has a list with pointers to every other thread, forming a network. By looping through this list, one thread can read all the information it needs from all the others. Even though there is no writing involved (there shouldn't be any problems with data corruption), it still deadlocks.
My question is: is there an already known way for dealing with this sort of problem? A standard pattern such as the monitor solution?
Please let me know if the question needs more explanation and I'll edit the post.
Thank you in advance!
-Edit----
After getting these answers I studied more about java.util.concurrent and also the actor model. At the moment the problem seems to be fixed by using a reentrant lock:
http://download.oracle.com/javase/tutorial/essential/concurrency/newlocks.html
Since it can back out of an attempt to acquire the locks, it doesn't seem to have the problem of waiting forever for them.
I also started implementing an alternate version following the actor model since it seems to be an interesting solution to this case.
My main mistakes were:
-Blindly trusting synchronize
-When in the tutorial they say "the lock is on the object" what they actually mean is the whole object running the thread (in my case), not the object I would like to access.
Thank you all for the help!
Look at higher-level concurrency constructs such as the java.util.concurrent package and the Akka framework/library. Synchronizing and locking manually is a guaranteed way to fail with threads in Java.
I would recommend applying the Actor model here (a kind of share-nothing parallelism model).
Using this model means that your threads don't interrupt each other explicitly, and you don't need to do any synchronization at all.
Instead of synchronizing, you use messages. When one Actor (thread) needs to get info about another Actor, it just asynchronously sends a corresponding message to that Actor.
Each Actor can also respond to messages of certain types. So, when a new message comes in, the Actor analyses it and sends a response (or does any other activity). The key point here is that incoming messages are processed synchronously (i.e. it's the only point where you need the simplest form of synchronization: just mark the method which processes messages with the synchronized modifier).
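A minimal sketch of the idea, without any framework. The class and message names (CounterActor, Increment, GetCount, Stop) are made up for illustration. Instead of a synchronized handler method, this version lets a single thread drain a BlockingQueue mailbox, which gives the same one-message-at-a-time guarantee.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical actor: owns its state and exposes it only through messages.
class CounterActor implements Runnable {
    // Message types are illustrative, not taken from any framework.
    interface Message {}
    static final class Increment implements Message {}
    static final class GetCount implements Message {
        final CompletableFuture<Integer> reply = new CompletableFuture<>();
    }
    static final class Stop implements Message {}

    private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();
    private int count; // touched only by the actor's own thread

    void send(Message m) { mailbox.add(m); }

    @Override
    public void run() {
        try {
            while (true) {
                Message m = mailbox.take(); // one message at a time
                if (m instanceof Increment) {
                    count++;
                } else if (m instanceof GetCount) {
                    ((GetCount) m).reply.complete(count);
                } else if (m instanceof Stop) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Small demo: two increments, then ask the actor for its count.
    static int demo() {
        CounterActor actor = new CounterActor();
        new Thread(actor).start();
        actor.send(new Increment());
        actor.send(new Increment());
        GetCount query = new GetCount();
        actor.send(query);
        int result = query.reply.join(); // blocks until the actor answers
        actor.send(new Stop());
        return result;
    }
}
```

No other thread ever reads count directly; it only sees the value the actor chose to send back.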
When one thread needs to synchronize with many other threads in a manner that a deadlock may occur, greedily acquire all your resources, and in the case that you can't acquire a single resource out of the set, release all resources and try again.
It's an algorithm based on the dining philosophers problem.
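A rough sketch of that "take everything or give everything back" loop, using Lock.tryLock() from java.util.concurrent.locks. The helper names (AllOrNothing, tryLockAll, withAll) are hypothetical, and the random back-off is my own addition to make two threads less likely to keep colliding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.Lock;

class AllOrNothing {
    // Tries to take every lock without blocking; on any failure it
    // releases the locks it already holds and reports false.
    static boolean tryLockAll(List<? extends Lock> locks) {
        List<Lock> held = new ArrayList<>();
        for (Lock lock : locks) {
            if (lock.tryLock()) {
                held.add(lock);
            } else {
                for (Lock h : held) h.unlock();
                return false;
            }
        }
        return true;
    }

    // Retries until all locks are held, runs the action, then releases everything.
    static void withAll(List<? extends Lock> locks, Runnable action)
            throws InterruptedException {
        while (!tryLockAll(locks)) {
            // Assumed back-off of 1 to 9 ms before the next attempt.
            Thread.sleep(ThreadLocalRandom.current().nextInt(1, 10));
        }
        try {
            action.run();
        } finally {
            for (Lock lock : locks) lock.unlock();
        }
    }
}
```

Because a thread never blocks while holding a lock, it cannot take part in a deadlock cycle; the price is that it may have to retry.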
One important thing to remember is that you have to acquire all locks in a consistent order across all your threads, in order to avoid the following situation:
Thread 1      Thread 2
acquire A     acquire B
acquire B     acquire A
One way to do it would be to have only objects used as locks, which can be ordered.
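import java.util.concurrent.atomic.AtomicLong;
class Lock {
    // Shared counter hands out a unique, totally ordered id per lock object.
    static final AtomicLong counter = new AtomicLong();
    final long id = counter.incrementAndGet();
}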
which must be used like
if (lock1.id < lock2.id) {
    synchronized (lock1) {
        synchronized (lock2) {
            ...
        }
    }
} else {
    synchronized (lock2) {
        synchronized (lock1) {
            ...
        }
    }
}
Obviously, this becomes tedious soon, in particular, the more locks are involved. Using explicit ReentrantLocks might help, as it more easily allows all that stuff to be factored out into a generic “grab multiple locks method“.
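Such a generic "grab multiple locks" method could look roughly like this. It is only a sketch: OrderedLock and MultiLock are hypothetical names, and the id scheme is the AtomicLong counter from the Lock class above, paired with a ReentrantLock.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical lock wrapper: a ReentrantLock plus a unique ordering id.
class OrderedLock {
    private static final AtomicLong COUNTER = new AtomicLong();
    final long id = COUNTER.incrementAndGet();
    final ReentrantLock lock = new ReentrantLock();
}

class MultiLock {
    // Acquires the locks in ascending id order, runs the action,
    // then releases whatever was acquired in reverse order.
    static void withLocks(List<OrderedLock> locks, Runnable action) {
        List<OrderedLock> sorted = new ArrayList<>(locks);
        sorted.sort(Comparator.comparingLong((OrderedLock l) -> l.id));
        int acquired = 0;
        try {
            for (OrderedLock l : sorted) {
                l.lock.lock();
                acquired++;
            }
            action.run();
        } finally {
            for (int i = acquired - 1; i >= 0; i--) {
                sorted.get(i).lock.unlock();
            }
        }
    }
}
```

Callers can pass the locks in any order; the sort guarantees every thread acquires them in the same sequence.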
Another strategy, which might be applicable for your problem, would be "hand-over-hand" locking. Consider
class Node {
    final ReentrantLock lock = new ReentrantLock();
    Node previous;
    Node next;
}
with a traversal operation like
Node start = ...;
Node successor;
start.lock.lock();
try {
    successor = start.next;
    successor.lock.lock();
} finally {
    start.lock.unlock();
}
// Here, we own the lock on start's next sibling. We could continue
// with this scheme, traversing the entire graph, at any time holding
// at most two locks: the node we come from and the node we want to
// go to.
The above scheme still requires that the locks are acquired in a consistent order across all threads. This means that you can only ever traverse the graph either in the "forward" direction (i.e., following the chain of next pointers) or in the "backward" direction (going via previous). As soon as you start using both at random, things become prone to deadlocks again. This is potentially also true if you make arbitrary changes to the graph structure, changing the positions of nodes.
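For completeness, here is a sketch of a full forward-only traversal with hand-over-hand locking. ChainNode and its value field are hypothetical (the Node class above carries no payload), and the walk only ever follows next, in line with the ordering rule just described.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical node with a payload, in the spirit of the Node class above.
class ChainNode {
    final ReentrantLock lock = new ReentrantLock();
    final int value;
    ChainNode next; // guarded by this node's lock
    ChainNode(int value) { this.value = value; }
}

class HandOverHand {
    // Sums all values walking forward only, holding at most two locks at a time.
    static int sum(ChainNode start) {
        if (start == null) return 0;
        int total = 0;
        ChainNode current = start;
        current.lock.lock();
        while (current != null) {
            ChainNode successor;
            try {
                total += current.value;
                successor = current.next;
                if (successor != null) {
                    successor.lock.lock(); // take the next lock before letting go
                }
            } finally {
                current.lock.unlock();
            }
            current = successor;
        }
        return total;
    }
}
```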
How about the actor model? In short, in actor-based programming all threads work as independent actors (or, as you said, agents). Communication is done via messages. Each actor has its own message queue and processes these messages one by one. This model is implemented in the Scala programming language, and one of its frameworks, Akka, may be used from Java.
What I do is use an ExecutorService for each thread pool. When you want another thread to do work, you pass it copies (or immutable data) of all the information it will need. This way you have state which is local to a thread or thread pool, and you have information which is passed to another thread, i.e. you never pass mutable state to another thread. This avoids the need to ever lock another thread's data.
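A small sketch of that hand-off, assuming the state is just a list of integers. SnapshotHandOff and sumOnWorker are invented names; the point is that the worker receives an unmodifiable copy, so the owner can keep mutating its own list without any lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SnapshotHandOff {
    static int sumOnWorker(List<Integer> mutableState) {
        // Copy first, then wrap: the worker can never observe later changes.
        List<Integer> snapshot =
                Collections.unmodifiableList(new ArrayList<>(mutableState));
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> result = pool.submit(() -> {
                int sum = 0;
                for (int v : snapshot) sum += v;
                return sum;
            });
            mutableState.add(1000); // safe: the task only sees the snapshot
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```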
Is there any way to put any sort of event listener that will be called when some thread - for example, the current thread - stops its activity and starts waiting or terminates?
I need this so that the object is notified and can release some resources when it is not in active use in this thread but is still stored in memory somewhere that prevents it from being garbage collected; otherwise I'd place that resource-releasing code in the finalize() method.
UPD
Use case: an object that keeps a reference to a jdbc resultset or a database connection; the respective close() or commit() should be called automatically when the object is set aside temporarily or discarded at all without requiring the program to call any sort of cleanup method.
(There is no question how do I lock the object to be accessed from only one thread at a time, it is solved.)
The distinct non-answer: wrong design point. Threads don't "own" resources.
Threads are simply "threads of execution". They run the code you tell them to run. Therefore a thread doesn't own any of the objects it comes by.
As a consequence, there are no built-in mechanisms to help with your requirement. You would have to implement something yourself, relying on monitoring threads, and their states. Which would be a hard and challenging task. Mainly because: multi threading is hard.
The serious recommendation here: step back from this design. Rather think about other, different ways to deal with such "resources".
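One such different way, offered only as a suggestion, is to tie the resource to a block of code rather than to a thread, using try-with-resources. TrackedResource below is a hypothetical stand-in; JDBC's Connection and ResultSet implement AutoCloseable as well, so the same shape applies to them.

```java
// Hypothetical resource wrapper standing in for a connection or result set.
class TrackedResource implements AutoCloseable {
    private boolean closed;

    String read() {
        if (closed) throw new IllegalStateException("already closed");
        return "row";
    }

    boolean isClosed() { return closed; }

    @Override
    public void close() { closed = true; } // release the underlying resource here
}

class ScopedUse {
    // The resource lives exactly as long as the block, on whichever thread runs it.
    static TrackedResource useOnce() {
        TrackedResource r = new TrackedResource();
        try (TrackedResource scoped = r) {
            scoped.read();
        } // close() runs here, on normal exit or on an exception
        return r; // returned only so the caller can observe that it was closed
    }
}
```

This does require the calling code to delimit the period of use, which is exactly the explicit cleanup the question hoped to avoid, but it is the mechanism the language offers.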
This is indeed a wrong approach.
You can obviously lock the object and unlock it in a finally block like this:
private Lock lock = new ReentrantLock();
public void useObject() {
    lock.lock();
    try {
        // do something with your resource.
    }
    finally {
        lock.unlock();
    }
}
This way if the thread that runs useObject terminates, it will execute the finally block, and unlock the lock that protects the resource.
But there's NO way to detect that the thread is not doing anything. If the thread is preempted by the operating system, there's no way for you to know about it. That's below the abstraction level at which you, as a developer, operate.
If you want to gain more understanding of how the OS works with threads, and what you can and cannot do, you should check out the Java Multithreading, Concurrency & Performance Optimization course on Udemy.
It also talks about how to properly use the right locks to do this kind of safe synchronization, and get the best performance from your application when you have to share resources such as database connections.
I hope it helps
Synchronization works by providing exclusive access to an object or method by putting the synchronized keyword before a method name. What if I want to give higher precedence to one particular access if two or more accesses to a method occur at the same time? Can we do that?
Or just may be I'm misunderstanding the concept of Synchronization in java. Please correct me.
I have other questions as well,
Under what requirements should we make method synchronized?
When to make method synchronized ? And when to make block synchronized ?
Also if we make a method synchronized will the class too be synchronized ? little confused here.
Please Help. Thanks.
No. Sadly, Java synchronization and wait/notify appear to have been copied from the very poor example of Unix, rather than almost anywhere else, where there would have been priority queues instead of thundering herds. When Per Brinch Hansen, author of monitors and Concurrent Pascal, saw Java, he commented 'clearly I have laboured in vain'.
There is a solution for almost everything you need in multi-threading and synchronization in the concurrent package, it however requires some thinking about what you do first. The synchronized, wait and notify constructs are like the most basic tools if you have just a very basic problem to solve, but realistically most advanced programs will (/should) never use those and instead rely on the tools available in the Concurrent package.
The way you think about threads is slightly wrong. There is no such thing as a more important thread, there is only a more important task. This is why Java clearly distinguishes between Threads, Runnables and Callables.
Synchronization is a concept to prevent more than one thread from entering a specific part of code, which is - again - the most basic concept of avoiding threading issues. Those issues happen if more than one thread accesses some data, where at least one of those multiple threads is trying to modify that data. Think about an array that is read by Thread A, while it is written by Thread B at the same time. Eventually Thread B will write the cell that Thread A is just about to read. Now as the order of execution of threads is undefined, it is as well undefined whether Thread A will read the old value, the new value or something messed up in between.
A synchronized "lock" around this access is a very blunt way of ensuring that this will never happen. More sophisticated tools are available in the java.util.concurrent package, like CopyOnWriteArrayList, which seamlessly handles the above issue by creating a copy for the writing thread, so neither Thread A nor Thread B needs to wait. Other tools are available for other solutions and problems.
If you dig a bit into the available tools you soon learn that they are highly sophisticated, and that the difficulty in using them usually lies with the programmer and not with the tools, because countless hours of thinking, improving and testing have gone into them.
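To make the copy-on-write behaviour concrete, here is a single-threaded sketch using java.util.concurrent.CopyOnWriteArrayList. The class and method names are made up; the reader's iterator plays the role of Thread A and the add() call plays Thread B.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotIteration {
    // Returns how many elements a reader sees when a write happens mid-iteration.
    static int elementsSeenByReader() {
        List<Integer> cells = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        Iterator<Integer> reader = cells.iterator(); // snapshot of the current array
        cells.add(4);                                // the write goes to a fresh copy
        int seen = 0;
        while (reader.hasNext()) {
            reader.next();
            seen++;
        }
        return seen; // the reader still walks the three original elements
    }
}
```

The reader never throws ConcurrentModificationException and never sees a half-written array; it simply works on the older version.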
Edit: to clarify a bit why the importance is on the task even though you set it on the thread:
Imagine a street with 3 lanes that narrows to 1 lane (the synchronized block), and 5 cars (threads) are arriving. Let's further assume there is one person (the car scheduler) who has to decide which cars get the first row and which ones get the other rows. As there is only 1 lane, he can at best assign 1 car to the first row, and the others need to come behind. If all cars look the same, he will most likely assign the order more or less randomly, though a car already in front is more likely to stay in front, just because it would be too troublesome to move those cars around.
Now let's say one car has a sign on top, "President of the USA inside", so the scheduler will most likely give that car priority in his decision. But even though the sign is on the car, the reason for his decision is not the importance of the car (thread), but the importance of the people inside (task). So the sign is nothing but a piece of information for the scheduler that this car transports more important people. Whether or not this is true, however, the scheduler can't say (at least not without inspection), so he just has to trust the sign on the car.
Now if in another scenario all 5 cars have the "President inside" sign, the scheduler doesn't have any way to decide which one goes first, and he is in the same situation again as he was with all the cars having no sign at all.
Well, in the case of synchronized, the order of access is unspecified (effectively random) if multiple threads are waiting for the lock. But if you need a first-come, first-served basis, then you can probably use ReentrantLock(fairness). This is what the API says:
The constructor for this class accepts an optional fairness parameter.
When set true, under contention, locks favor granting access to the
longest-waiting thread.
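In code, the fairness parameter is just the boolean constructor argument. A minimal sketch (FairCounter is an invented name):

```java
import java.util.concurrent.locks.ReentrantLock;

class FairCounter {
    // true = fair: under contention the longest-waiting thread gets the lock next.
    private final ReentrantLock lock = new ReentrantLock(true);
    private int value;

    int increment() {
        lock.lock();
        try {
            return ++value;
        } finally {
            lock.unlock();
        }
    }

    boolean isFair() { return lock.isFair(); }
}
```

Note that fairness is about waiting time, not about priority: it gives first-come, first-served ordering and nothing more.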
If instead you wish to grant access based on some other factor, then I guess it shouldn't be complicated to build one. Have a class whose lock call blocks if some other thread is executing. When unlock is called, it unblocks a waiting thread chosen by whatever algorithm you wish.
There's no such thing as "priority" among synchronized methods/blocks or accesses to them. If some other thread is already holding the object's monitor (i.e. if another synchronized method or synchronized (this) {} block is in progress and hasn't relinquished the monitor by a call to this.wait()), all other threads will have to wait until it's done.
There are classes in the java.util.concurrent package that might be able to help you if used correctly, such as priority queues. Full guidance on how to use them correctly is probably beyond the scope of this question - you should probably read a decent tutorial to start with.
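As one possible starting point, here is a sketch that attaches the priority to the task rather than to the thread, using java.util.concurrent.PriorityBlockingQueue. PrioritizedTask and its fields are hypothetical, and a lower number is assumed to mean "more urgent".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical task type: a lower number means more urgent.
class PrioritizedTask implements Comparable<PrioritizedTask> {
    final int priority;
    final String name;

    PrioritizedTask(int priority, String name) {
        this.priority = priority;
        this.name = name;
    }

    @Override
    public int compareTo(PrioritizedTask other) {
        return Integer.compare(priority, other.priority);
    }
}

class PriorityDemo {
    // Returns task names in the order a consumer would receive them.
    static List<String> drainOrder() {
        PriorityBlockingQueue<PrioritizedTask> queue = new PriorityBlockingQueue<>();
        queue.add(new PrioritizedTask(5, "log cleanup"));
        queue.add(new PrioritizedTask(1, "president"));
        queue.add(new PrioritizedTask(3, "report"));
        List<String> order = new ArrayList<>();
        PrioritizedTask next;
        while ((next = queue.poll()) != null) { // a worker thread would call take()
            order.add(next.name);
        }
        return order;
    }
}
```

A single worker thread that takes from such a queue and runs each task gives prioritized access to whatever the tasks touch, without any synchronized block in the tasks themselves.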
Is it possible in Java (Android) to implement a customized version of a Thread which carries its own States?
What I mean is:
While ThreadA is in Running state, it still can be polled by ThreadB that asks for its state
e.g.
ThreadA.getState();
It is possible to modify the states values to some custom ones? So as to implement a sort of basic communication system between those two threads?
Thanks.
Yes, that is possible. I used this a lot in my previous projects; all you need is to extend the Thread class.
public class StateThread extends Thread {
    private String customState = "ThreadState";
    // Thread already declares getState() returning Thread.State, so the
    // custom accessors need different names or the class will not compile.
    public synchronized void setCustomState(String newState) {
        customState = newState;
    }
    public synchronized String getCustomState() {
        return customState;
    }
    @Override
    public void run() {
        // Do stuff and update state...
    }
}
Yes, it is possible to perform this task.
Is it a good design? I don't think so.
There are other means to perform communication between threads -
For example, you should use a queue with a Producer/Consumer pattern.
I am sure that Android, like Java SE, supports thread-locals; you can use them to manage thread-local data (including states), maybe in combination with a queue that receives "operations" to change the state managed by a thread.
If you do decide to go for the solution of having setState and getState methods, at least consider using a ReadWriteLock to optimize your locking.
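A sketch of what the read/write-lock variant might look like, using java.util.concurrent.locks.ReentrantReadWriteLock. SharedState and the state strings are invented for illustration.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical holder for a custom state string shared between threads.
class SharedState {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private String state = "INIT";

    // Many readers may hold the read lock at the same time.
    String get() {
        rw.readLock().lock();
        try {
            return state;
        } finally {
            rw.readLock().unlock();
        }
    }

    // A writer waits for readers to finish and excludes everyone else.
    void set(String newState) {
        rw.writeLock().lock();
        try {
            state = newState;
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```

For a single reference like this, declaring the field volatile would give the same visibility with less ceremony; the lock starts to pay off when several fields must change together.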
A thread's state is maintained by the virtual machine. The VM uses the state to monitor and manage the actual thread.
That's why there is no mechanism to modify the state of a Thread. There is no setState function that lets you set a custom state.
For your application purpose, you can define your own instance variables by extending Thread but that cannot alter Thread's state in any way.
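To illustrate that the built-in state is read-only and driven by the VM, this sketch (class name invented) reads Thread.getState() before the thread starts and after it has finished:

```java
class BuiltInStateDemo {
    // Returns the JVM-managed states observed before start and after termination.
    static Thread.State[] observe() {
        Thread worker = new Thread(() -> { /* no work */ });
        Thread.State before = worker.getState(); // NEW: not started yet
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Thread.State after = worker.getState(); // TERMINATED once join() returns
        return new Thread.State[] {before, after};
    }
}
```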
Synchronizing with shared data is not very useful for determining the 'state' of a thread - the thread writes its state as 'healthy', then gets stuck - the monitor thread then checks the state and finds it healthy.
Monitoring the 'state' should mean making the checked thread do something, not just looking directly at some shared object.
If you have a message-passing design (as suggested by zaske), you can pass a 'state record' around on the input queue of every thread, asking it to record its state inside and pass it on to the next thread. The 'monitor' thread waits for the record to come back, all filled in. If it does not get it back in a reasonable time, it could log what it has got: it keeps a reference to the state record object, so it could see which thread has not updated its state. It could, perhaps, fail to feed a watchdog timer.
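A rough sketch of that circulating record, assuming each worker has a BlockingQueue as its input queue. All names here (StateRecord, RingWorker, Watchdog) are hypothetical, and the workers do nothing except answer the probe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical record that travels from worker to worker.
class StateRecord {
    final Map<String, String> states = new ConcurrentHashMap<>();
}

class RingWorker implements Runnable {
    final String name;
    final BlockingQueue<StateRecord> inbox = new LinkedBlockingQueue<>();
    BlockingQueue<StateRecord> next; // next worker's inbox, or the monitor's queue

    RingWorker(String name) { this.name = name; }

    @Override
    public void run() {
        try {
            while (true) {
                StateRecord record = inbox.take();
                record.states.put(name, "alive"); // the worker itself has to act
                next.put(record);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop when interrupted
        }
    }
}

class Watchdog {
    // Sends one record through the chain and waits up to timeoutMillis for it.
    // The record is returned either way; missing names show who did not answer.
    static StateRecord probe(int workers, long timeoutMillis) {
        BlockingQueue<StateRecord> monitorInbox = new LinkedBlockingQueue<>();
        List<RingWorker> ring = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) ring.add(new RingWorker("worker-" + i));
        for (int i = 0; i < workers; i++) {
            ring.get(i).next = (i + 1 < workers) ? ring.get(i + 1).inbox : monitorInbox;
            Thread t = new Thread(ring.get(i));
            t.setDaemon(true);
            threads.add(t);
        }
        for (Thread t : threads) t.start();
        StateRecord record = new StateRecord();
        try {
            ring.get(0).inbox.put(record);
            monitorInbox.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            for (Thread t : threads) t.interrupt(); // demo only: shut the workers down
        }
        return record;
    }
}
```

A stuck worker never forwards the record, so the poll times out and the map shows how far the record got.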
Say, I have a data object:
class ValueRef { double value; }
Where each data object is stored in a master collection:
Collection<ValueRef> masterList = ...;
I also have a collection of jobs, where each job has a local collection of data objects (where each data object also appears in the masterList):
class Job implements Runnable {
    Collection<ValueRef> neededValues = ...;
    public void run() {
        double sum = 0;
        for (ValueRef x : neededValues) sum += x.value;
        System.out.println(sum);
    }
}
Use-case:
for (ValueRef x: masterList) { x.value = Math.random(); }
Populate a job queue with some jobs.
Wake up a thread pool
Wait until each job has been evaluated
Note: During the job evaluation, all of the values are all constant. The threads however, have possibly evaluated jobs in the past, and retain cached values.
Question: what is the minimal amount of synchronization necessary to ensure each thread sees the latest values?
I understand synchronize from the monitor/lock-perspective, I do not understand synchronize from the cache/flush-perspective (ie. what is being guaranteed by the memory model on enter/exit of the synchronized block).
To me, it feels like I should need to synchronize once in the thread that updates the values to commit the new values to main memory, and once per worker thread, to flush the cache so the new values are read. But I'm unsure how best to do this.
My approach: create a global monitor: static Object guard = new Object(); Then, synchronize on guard, while updating the master list. Then finally, before starting the thread pool, once for each thread in the pool, synchronize on guard in an empty block.
Does that really cause a full flush of any value read by that thread? Or just values touched inside the synchronize block? In which case, instead of an empty block, maybe I should read each value once in a loop?
Thanks for your time.
Edit: I think my question boils down to, once I exit a synchronized block, does every first read (after that point) go to main memory? Regardless of what I synchronized upon?
It doesn't matter that threads of a thread pool have evaluated some jobs in the past.
Javadoc of Executor says:
Memory consistency effects: Actions in a thread prior to submitting a Runnable object to an Executor happen-before its execution begins, perhaps in another thread.
So, as long as you use standard thread pool implementation and change the data before submitting the jobs you shouldn't worry about memory visibility effects.
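A sketch of that ordering, reusing the ValueRef shape from the question. SubmitAfterUpdate and runRound are invented names. The fields are plain (not volatile) and no synchronized block appears; the only thing relied upon is that every update is finished before the job is submitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ValueRef { double value; }

class SubmitAfterUpdate {
    static double runRound(ExecutorService pool, List<ValueRef> masterList, double newValue) {
        // 1. Update plain fields on the submitting thread.
        for (ValueRef x : masterList) x.value = newValue;
        // 2. Submit afterwards: submission happens-before the task's execution.
        Callable<Double> job = () -> {
            double sum = 0;
            for (ValueRef x : masterList) sum += x.value;
            return sum;
        };
        try {
            return pool.submit(job).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    static double[] demo() {
        List<ValueRef> masterList = new ArrayList<>();
        for (int i = 0; i < 4; i++) masterList.add(new ValueRef());
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // The same pooled threads are reused across both rounds.
            return new double[] {
                runRound(pool, masterList, 1.0),
                runRound(pool, masterList, 2.5)
            };
        } finally {
            pool.shutdown();
        }
    }
}
```

The second round must not start updating values until the jobs of the first round have completed; here the blocking get() takes care of that.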
What you are planning sounds sufficient. It depends on how you plan to "wake up thread pool."
The Java Memory Model provides that all writes performed by a thread before it releases a lock (that is, before it exits a synchronized block) are visible to threads that subsequently synchronize on that same lock.
So, if you are sure the worker threads are blocked in a wait() call (which must be inside a synchronized block) during the time you update the master list, when they wake up and become runnable, the modifications made by the master thread will be visible to these threads.
I would encourage you, however, to apply the higher level concurrency utilities in the java.util.concurrent package. These will be more robust than your own solution, and are a good place to learn concurrency before delving deeper.
Just to clarify: It's almost impossible to control worker threads without using a synchronized block where a check is made to see whether the worker has a task to execute. Thus, any changes made by the controller thread to the job happen-before the worker thread awakes. You require a synchronized block, or at least a volatile variable to act as a memory barrier; however, I can't think how you'd create a thread pool without using one of these.
As an example of the advantages of using the java.util.concurrent package, consider this: you could use a synchronized block with a wait() call in it, or a busy-wait loop with a volatile variable. Because of the overhead of context switching between threads, a busy wait can actually perform better under certain conditions; it's not necessarily the horrible idea that one might assume at first glance.
If you use the Concurrency utilities (in this case, probably an ExecutorService), the best selection for your particular case can be made for you, factoring in the environment, the nature of the task, and the needs of other threads at a given time. Achieving that level of optimization yourself is a lot of needless work.
Why don't you make Collection<ValueRef> and ValueRef immutable, or at least not modify the values in the collection after you have published the reference to the collection? Then you will not have to worry about synchronization.
That is, when you want to change the values of the collection, create a new collection and put the new values in it. Once the values have been set, pass the collection reference to new job objects.
The only reason not to do this would be if the size of the collection is so large that it barely fits in memory and you cannot afford to have two copies, or the swapping of the collections would cause too much work for the garbage collector (prove that one of these is a problem before you use a mutable data structure for threaded code).
I have a Results object which is written to by several threads concurrently. However, each thread has a specific purpose and owns certain fields, so that no data is actually modified by more than one thread. The consumer of this data will not try to read it until all of the writer threads are done writing it. Because I know this to be true, there is no synchronization on the data writes and reads.
There is a RunningState object associated with this Results object which serves to coordinate this work. All of its methods are synchronized. When a thread is done with its work on this Results object, it calls done() on the RunningState object, which does the following: decrements a counter, checks if the counter has gone to 0 (indicating that all writers are done), and if so, puts this object on a concurrent queue. That queue is consumed by a ResultsStore which reads all of the fields and stores data in the database. Before reading any data, the ResultsStore calls RunningState.finalizeResult(), which is an empty method whose sole purpose is to synchronize on the RunningState object, to ensure that writes from all of the threads are visible to the reader.
Here are my concerns:
1) I believe that this will work correctly, but I feel like I'm violating good design principles to not synchronize on the data modifications to an object that is shared by multiple threads. However, if I were to add synchronization and/or split things up so each thread only saw the data it was responsible for, it would complicate the code. Anyone who modifies this area had better understand what's going on in any case or they're likely to break something, so from a maintenance standpoint I think the simpler code with good comments explaining how it works is a better way to go.
2) The fact that I need to call this do-nothing method seems like an indication of wrong design. Is it?
Opinions appreciated.
This seems mostly right, if a bit fragile (if you change the thread-local nature of one field, for instance, you may forget to synchronize it and end up with hard-to-trace data races).
The big area of concern is in memory visibility; I don't think you've established it. The empty finalizeResult() method may be synchronized, but if the writer threads didn't also synchronize on whatever it synchronizes on (presumably this?), there's no happens-before relationship. Remember, synchronization isn't absolute -- you synchronize relative to other threads that are also synchronized on the same object. Your do-nothing method will indeed do nothing, not even ensure any memory barrier.
You somehow need to establish a happens-before relationship between each thread doing its writes, and the thread that eventually reads. One way to do this without synchronization is via a volatile variable, or an AtomicInteger (or other atomic classes).
For instance, each writer thread can invoke counter.incrementAndGet() on the object, and the reading thread can then check that counter.get() == THE_CORRECT_VALUE. There's a happens-before relationship between a volatile/atomic field being written and it being read, which gives you the needed visibility.
Your design is sound, but it can be improved if you are using a true concurrent queue since a concurrent queue from the java.util.concurrent package already guarantees a happens before relationship between the thread putting an item into the queue, and the thread taking an item out, so this precludes needing to call finalizeResult() in the taking thread (so no need for that "do nothing" method call).
From java.util.concurrent package description:
The methods of all classes in java.util.concurrent and its subpackages
extend these guarantees to higher-level synchronization. In
particular:
Actions in a thread prior to placing an object into any
concurrent collection happen-before actions subsequent to the access
or removal of that element from the collection in another thread.
The comments in another answer concerning using an AtomicInteger instead of synchronization are also wise (as using an AtomicInteger to do your thread counting will likely perform better than synchronization), just make sure to get the value of the count after the atomic decrement (e.g. decrementAndGet()) when comparing to 0 in order to avoid adding to the queue twice.
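Putting both suggestions together, a sketch could look like this. Results and RunningState here are hypothetical stand-ins for the classes in the question (the real ones were not posted), with two writers and a LinkedBlockingQueue assumed as the concurrent queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for the Results and RunningState classes in the question.
class Results {
    int fieldA; // written only by writer A
    int fieldB; // written only by writer B
}

class RunningState {
    private final AtomicInteger remaining;
    private final Results results;
    private final BlockingQueue<Results> finished;

    RunningState(int writers, Results results, BlockingQueue<Results> finished) {
        this.remaining = new AtomicInteger(writers);
        this.results = results;
        this.finished = finished;
    }

    // Each writer calls this exactly once, after its last write.
    void done() {
        if (remaining.decrementAndGet() == 0) { // only the last writer sees 0
            finished.add(results);
        }
    }
}

class HandOffDemo {
    static int sumSeenByReader() {
        Results results = new Results();
        BlockingQueue<Results> finished = new LinkedBlockingQueue<>();
        RunningState state = new RunningState(2, results, finished);
        new Thread(() -> { results.fieldA = 40; state.done(); }).start();
        new Thread(() -> { results.fieldB = 2; state.done(); }).start();
        try {
            // Taking from the queue is the only synchronization the reader needs.
            Results r = finished.poll(5, TimeUnit.SECONDS);
            return r == null ? -1 : r.fieldA + r.fieldB;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }
}
```

The chain of guarantees is: each writer's field writes come before its decrementAndGet(), the last decrement comes before the add(), and the add() comes before the reader's poll() returns the object.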
What you've described is indeed safe, but it also sounds, frankly, brittle and (as you note) maintenance could become an issue. Without sample code, it's really hard to tell what's really easiest to understand, so an already subjective question becomes frankly unanswerable. Could you ask a coworker for a code review? (Particularly one that's likely to have to deal with this pattern.) I'm going to trust you that this is indeed the simplest approach, but doing something like wrapping synchronized blocks around writes would increase safety now and in the future. That said, you obviously know your code better than I do.