Thread Pool, Shared Data, Java Synchronization

Thread Pool, Shared Data, Java Synchronization - java

Say, I have a data object:
class ValueRef { double value; }
Where each data object is stored in a master collection:
Collection<ValueRef> masterList = ...;
I also have a collection of jobs, where each job has a local collection of data objects (where each data object also appears in the masterList):
class Job implements Runnable {
Collection<ValueRef> neededValues = ...;
void run() {
double sum = 0;
for (ValueRef x: neededValues) sum += x;
System.out.println(sum);
}
}
Use-case:
for (ValueRef x: masterList) { x.value = Math.random(); }
Populate a job queue with some jobs.
Wake up a thread pool
Wait until each job has been evaluated
Note: During the job evaluation, all of the values are all constant. The threads however, have possibly evaluated jobs in the past, and retain cached values.
Question: what is the minimal amount of synchronization necessary to ensure each thread sees the latest values?
I understand synchronize from the monitor/lock-perspective, I do not understand synchronize from the cache/flush-perspective (ie. what is being guaranteed by the memory model on enter/exit of the synchronized block).
To me, it feels like I should need to synchronize once in the thread that updates the values to commit the new values to main memory, and once per worker thread, to flush the cache so the new values are read. But I'm unsure how best to do this.
My approach: create a global monitor: static Object guard = new Object(); Then, synchronize on guard, while updating the master list. Then finally, before starting the thread pool, once for each thread in the pool, synchronize on guard in an empty block.
Does that really cause a full flush of any value read by that thread? Or just values touched inside the synchronize block? In which case, instead of an empty block, maybe I should read each value once in a loop?
Thanks for your time.
Edit: I think my question boils down to, once I exit a synchronized block, does every first read (after that point) go to main memory? Regardless of what I synchronized upon?

It doesn't matter that threads of a thread pool have evaluated some jobs in the past.
Javadoc of Executor says:
Memory consistency effects: Actions in a thread prior to submitting a Runnable object to an Executor happen-before its execution begins, perhaps in another thread.
So, as long as you use standard thread pool implementation and change the data before submitting the jobs you shouldn't worry about memory visibility effects.

What you are planning sounds sufficient. It depends on how you plan to "wake up thread pool."
The Java Memory Model provides that all writes performed by a thread before entering a synchronized block are visible to threads that subsequently synchronize on that lock.
So, if you are sure the worker threads are blocked in a wait() call (which must be inside a synchronized block) during the time you update the master list, when they wake up and become runnable, the modifications made by the master thread will be visible to these threads.
I would encourage you, however, to apply the higher level concurrency utilities in the java.util.concurrent package. These will be more robust than your own solution, and are a good place to learn concurrency before delving deeper.
Just to clarify: It's almost impossible to control worker threads without using a synchronized block where a check is made to see whether the worker has a task to implement. Thus, any changes made by the controller thread to the job happen-before the worker thread awakes. You require a synchronized block, or at least a volatile variable to act as a memory barrier; however, I can't think how you'd create a thread pool with using one of these.
As an example of the advantages of using the java.util.concurrency package, consider this: you could use a synchronized block with a wait() call in it, or a busy-wait loop with a volatile variable. Because of the overhead of context switching between threads, a busy wait can actually perform better under certain conditions—it's not necessary the horrible idea that one might assume at first glance.
If you use the Concurrency utilities (in this case, probably an ExecutorService), the best selection for your particular case can be made for you, factoring in the environment, the nature of the task, and the needs of other threads at a given time. Achieving that level of optimization yourself is a lot of needless work.

Why don't you make Collection<ValueRef> and ValueRef immutable or at least don't modify the values in the collection after you have published the reference to the collection. Then you will not have any worry about synchronization.
That is when you want to change the values of the collection, create a new collection and put new values in it. Once the values have been set pass the collection reference new job objects.
The only reason not to do this would be if the size of the collection is so large that it barely fits in memory and you cannot afford to have two copies, or the swapping of the collections would cause too much work for the garbage collector (prove that one of these is a problem before you use a mutable data structure for threaded code).

Related

Java prioritizing threads

My main thread has a private LinkedList which contains task objects for the players in my game. I then have a separate thread that runs every hour that accesses and clears that LinkedList and runs my algorithm which randomly adds new uncompleted tasks to every players LinkedList. Right now I made a getter method that is synchronized so that I dont run into any concurrency issues. This works fine but the synchronized keyword has a lot of overhead especially since its accessed a ton from the main thread while only accessed hourly from my second thread.
I am wondering if there is a way to prioritize the main thread? For example on that 2nd thread I could loop through the players then make a new LinkedList then run my algorithm and add all the tasks to that LinkedList then quickly assign the old LinkedList equal to the new one. This would slightly increase memory usage on the stack while improving main thread speed.
Basically I am trying to avoid making my main thread use synchronization when it will only be used once an hour at most and I am willing to greatly degrade the performance of the 2nd thread to keep the main threads speed. Is there a way I can use the 2nd thread to notify the 1st that it will be locking a method instead of having the 1st thread physically have to go through all of the synchronization over head steps? I feel like this would be possible since if that 2nd thread shares a cache with the main thread and it could change a boolean denoting that the main thread has to wait till that variable is changed back. The main thread would have to check that boolean every time it tries run that method and if the 2nd thread is telling it to wait the main thread will then freeze till the boolean is changed.
Of course the 2nd thread would have to specify which object and method has the lock along with a binary 0 or 1 denoting if its locked or not. Then the main thread would just need to check its shared cache for the object and the binary boolean value once it reaches that method which seems way faster than normal synchronization. Anyways this would then result in them main thread running at normal speed while the 2nd thread handles a bunch of work behind the scenes without degrading main thread performance. Does this exist if so how can I do it and if it does not exist how hard would it actually be to implement?

Premature optimization
It sounds like you are overly worried about the cost of synchronization. Doing a dozen, or a hundred, or even a thousand synchronizations once an hour is not going to impact the performance of your app by any significant amount.
If your concern has not yet been validated by careful study with a profiling tool, you’ve fallen into the common trap of premature optimization.
AtomicReference
Nevertheless, I can suggest an alternative approach.
You want to replace a list once an hour. If you do not mind letting any threads continue using the current list already accessed while you swap out for a new list, then use AtomicReference. An object of this class holds the reference to another object of a specified type.
I generally like the Atomic… classes for thread-safety work because they scream out to the reader that a concurrency problem is at hand.
AtomicReference < List < Task > > listRef = new AtomicReference<>( originalList ) ;
A different thread is able to replace that reference to the old list with a reference to the new list.
listRef.set( newList ) ;
Access by the other thread:
List< Task > list = listRef.get() ;
Note that this approach does not make thread-safe the payload, the list itself. But you claim that only a single thread will ever be manipulating the content of the list. You claim a different thread will only replace the entire list. So this AtomicReference serves the purpose of replacing the list in a thread-safe manner while making the issue of concurrency quite obvious.
volatile
Using AtomicReference accomplishes the same goal as volatile. I’m wary of volatile because (a) its use may go unnoticed by the reader, and (b) I suspect many Java programmers do not understand volatile, especially since its meaning was redefined.
For more info about why plain reference assignment is not thread-safe, see this Question.

Why do unsynchronized objects perform better than synchronized ones?

Question arises after reading this one. What is the difference between synchronized and unsynchronized objects? Why are unsynchronized objects perform better than synchronized ones?

What is the difference between Synchronized and Unsynchronized objects ? Why is Unsynchronized Objects perform better than Synchronized ones ?
HashTable is considered synchronized because its methods are marked as synchronized. Whenever a thread enters a synchronized method or a synchronized block it has to first get exclusive control over the monitor associated with the object instance being synchronized on. If another thread is already in a synchronized block on the same object then this will cause the thread to block which is a performance penalty as others have mentioned.
However, the synchronized block also does memory synchronization before and after which has memory cache implications and also restricts code reordering/optimization both of which have significant performance implications. So even if you have a single thread calling entering the synchronized block (i.e. no blocking) it will run slower than none.
One of the real performance improvements with threaded programs is realized because of separate CPU high-speed memory caches. When a threaded program does memory synchronization, the blocks of cached memory that have been updated need to be written to main memory and any updates made to main memory will invalidate local cached memory. By synchronizing more, again even in a single threaded program, you will see a performance hit.
As an aside, HashTable is an older class. If you want a reentrant Map then ConcurrentHashMap should be used.

Popular speaking the Synchronized Object is a single thread model，if there are 2 thread want to modify the Synchronized Object . if the first one get the lock of the Object ,that the last one should be waite。but if the Object is Unsynchronized，they can operat the object at the same time，It is the reason that why the Unsynchronized is unsafe。

For synchronization to work, the JVM has to prevent more than one thread entering a synchronized block at a time. This requires extra processing than if the synchronized block did not exist placing additional load on the JVM and therefore reducing performance.
The exact locking mechanisms in play when synchronization occurs are explain in How the Java virtual machine performs thread synchronization

Synchronization:
Array List is non-synchronized which means multiple threads can work
on Array List at the same time. For e.g. if one thread is performing
an add operation on Array List, there can be an another thread
performing remove operation on Array List at the same time in a multi
threaded environment
while Vector is synchronized. This means if one thread is working on
Vector, no other thread can get a hold of it. Unlike Array List, only
one thread can perform an operation on vector at a time.
Performance:
Synchronized operations consumes more time compared to
non-synchronized ones so if there is no need for thread safe
operation, Array List is a better choice as performance will be
improved because of the concurrent processes.

Synchronization is useful because it allows you to prevent code from being run twice at the same time (commonly called concurrency). This is important in a threaded environment for a multitude of reasons. In order to provide this guarantee the JVM has to do extra work which means that performance decreases. Because synchronization requires that only one process be allowed to execute at a time, it can cause multi-threaded programs to function as slowly (or slower!) than single-threaded programs.
It is important to note that the amount of performance decrease is not always obvious. Depending on the circumstances, the decrease may be tiny or huge. This depends on all sorts of things.
Finally, I'd like to add a short warning: Concurrent programming using synchronization is hard. I've found that usually other concurrency controls better suit my needs. One of my favorites is Atomic Reference. This utility is great because it very narrowly limits the amount of synchronized code. This makes it easier to read, maintain and write.

Thread safety within Java

So, while working on something that was having locking issues, a question came to me. Do objects that only can be accessed from a single thread require locks or synchronization at all?
For example, given Thread1, Thread2, and Thread3, along with Buffer1, Buffer2, Buffer3, where each buffer is instanced as a thread is created, meaning that Thread1 will only ever access Buffer1, and the same for Thread2 and Buffer2, along with Thread3 and Buffer3. Thread1 will never touch Buffer2 or Buffer3. While adding/removing/modifying bytes in the stream, are locks needed?

No, You wont need any locks in this case. Locking and synchronization is only required when any resource is being shared between multiple threads.
If you go ahead and add synchronization on the private instance of that buffer then still it wont make any difference as there will be no thread waiting to acquire locks, The only one locking and releasing the buffer will be the owner thread.

1. When more than one thread try to access an object, then locking becomes necessary.
2. Moreover classes when developed needs to be thread safe, if concurrent access by threads is possible.
3. A class is said to be thread safe, it if behaves correctly in the presence of interleaving and scheduling of the underlying OS , without any synchronization mechanism from the client.
4. Locking the resources can cause overhead, prevents concurrent access, and bottle neck situations.

Only when two or more threads need to access a shared object you need to worry about locking.

No. This strategy for ensuring thread-safety is generally referred to as confinement.
Confinement relies on encapsulation techniques to ensure that multiple threads cannot access an object. "Concurrent Programming in Java" by Doug Lea has good chapter on the details of confinement and its strengths and weaknesses compared to other exclusion techniques.
Paraphrasing from Lea, in general there are 4 conditions needed for confinement of a reference r, to an object x, within a method m:
m cannot pass r as an argument to another method.
m cannot pass r as a return value.
m cannot record r in a field (instance or static) that is accessible from another thread.
m cannot may not let any other references escape (via 1-3) that may be traversed to r.

From what I remember from my studies, if you are using a private buffer for every thread you should not worry about locking it to avoid concurrent access, since you don't have any.
If no-one is reading the buffer apart from the creator, it could do whatever he wants on it without worrying that someone else is reading or writing it. so you should be fine
But you have to remember that a thread can be interrupted at any time, so your internal buffer can be in a inconsistent state. (this shouldn't be a problem since you are accessing only sequentially from the same thread)

Locks are not needed unless threads are concurrently using the same data structure.
Hence if different data structures are used by each thread, your code is guaranteed to be thread safe.
Incidentally, this is one of the main reasons why the key Java collection classes like java.util.ArrayList are not thread safe: making them thread safe would add a performance overhead which you shouldn't have to pay for if you don't need, and in a lot of cases you don't need it because you can ensure in some other way that only one thread accesses the ArrayList at once.

Is there a way to stabilize direct communication between threads in java?

We have a system in which each thread (there can be dozens of them) works as an individual agent.
It has its own inner variables and objects, and it monitors other threads' objects as well as its own) in order to make decisions.
Unfortunately the system is deadlocking quite often.
Going through java tutorial (http://download.oracle.com/javase/tutorial/essential/concurrency/index.html) and through other topics here at stackoverflow, I managed to avoid some of these deadlocks by synchronizing the methods and using a monitor, as in:
Producer->monitor->Consumer.
However, not all communication between threads can be modeled like this. As I've mentioned before, at a given time one thread must have access to the objects (variables, lists, etc) of the other threads.
The way it's being done now is that each thread has a list with pointers to every other thread, forming a network. By looping through this list, one thread can read all the information it needs from all the others. Even though there is no writing involved (there shouldn't be any problems with data corruption), it still deadlocks.
My question is: is there an already known way for dealing with this sort of problem? A standard pattern such as the monitor solution?
Please let me know if the question needs more explanation and I'll edit the post.
Thank you in advance!
-Edit----
After getting these answers I studied more about java.concurrency and also the actor model. At the moment the problem seems to be fixed by using a reentrant lock:
http://download.oracle.com/javase/tutorial/essential/concurrency/newlocks.html
Since it can back out from an attempt to acquire the locks, it doesn't seem to have the problem of waiting forever for the them.
I also started implementing an alternate version following the actor model since it seems to be an interesting solution to this case.
My main mistakes were:
-Blindly trusting synchronize
-When in the tutorial they say "the lock is on the object" what they actually mean is the whole object running the thread (in my case), not the object I would like to access.
Thank you all for the help!

Look at higher-level concurrency constructs such as the java.util.concurrent package and the Akka framework/library. Synchronizing and locking manually is a guaranteed way to fail with threads in Java.

I would recommend to apply Actor model here (kind of share nothing parallelism model).
Using this model means that all your thread don't interrupt each other explicitely and you don't need to do any synchronization at all.
Instead of making synchronization you'll use messages. When one Actor (thread) needs to get info about another Actor, it just asynchronously send a correspondent message to that Actor.
Each Actor can also respond to messages of certain types. So, when a new message comes, Actor analyses it and sends a response (or does any other activity). The key point here is that processing of incoming messages is being done synchronously (i.e. it's the only point where you need the simplest way of synchronization - just mark the method which processes messages with synchronized modifier).

When one thread needs to synchronize with many other threads in a manner that a deadlock may occur, greedily acquire all your resources, and in the case that you can't acquire a single resource out of the set, release all resources and try again.
It's an algorithm based on the dining philosophers problem.

One important thing to remember is, that you have to aquire all locks in a consistent order across all your threads, in order to avoid the following situation:
Thread 1 Thread 2
acquire A acquire B
acquire B acquire A
One way to do it would be to have only objects used as locks, which can be ordered.
class Lock {
static final AtomicLong counter = new AtomicLong()
final long id = counter.incrementAndGet();
}
which must be used like
if (lock1.id < lock2.id) {
synchronized (lock1) {
synchronized (lock2) {
...
}
}
} else {
synchronized (lock2) {
synchronized (lock1) {
...
}
}
}
Obviously, this becomes tedious soon, in particular, the more locks are involved. Using explicit ReentrantLocks might help, as it more easily allows all that stuff to be factored out into a generic “grab multiple locks method“.
Another strategy, which might be applicable for your problem, would be "hand-over-hand" locking. Consider
class Node {
final ReentrantLock lock = new ReentrantLock();
Node previous;
Node next;
}
with a traversal operation like
Node start = ...;
Node successor;
start.lock.lock();
try {
successor = start.next;
successor.lock.lock();
} finally {
start.lock.unlock();
}
// Here, we own the lock on start's next sibling. We could continue
// with this scheme, traversing the entire graph, at any time holding
// at most two locks: the node we come from and the node we want to
// go to.
The above scheme still requires, that the locks are acquired in a consistent order across all threads. This means, that you can only every traverse the graph either in "forward" direction (i.e., following the thread of next pointers) or "backward" direction (going via previous). As soon as you start using both at random, things become prone to deadlocks again. This is potentially true also, if you make arbitrary changes to the graph structure, changing the positions of nodes.

How about actor model? Shortly speaking, in actor-based programming all threads work as independent actors (or, as you said, agents). Communication is done via messages. Each actor has its own message queue and processes these messages one by one. This model is implemented in a Scala programming language, and one of its frameworks - Akka - may be used from Java.

What I do is use ExecutorServices for each Thread Pool. When you want another thread to do work, you pass it copies (or immutable data) of all the information it will need. This way you have state which is local to a thread or thread pool and you have information which is passed to another thread. i.e. you never pass mutable state to another thread. This avoid the need to ever lock another threads data.

Disadvantage of synchronized methods in Java

What are the disadvantages of making a large Java non-static method synchronized? Large method in the sense it will take 1 to 2 mins to complete the execution.

If you synchronize the method and try to call it twice at the same time, one thread will have to wait two minutes.
This is not really a question of "disadvantages". Synchronization is either necessary or not, depending on what the method does.
If it is critical that the code runs only once at the same time, then you need synchronization.
If you want to run the code only once at the same time to preserve system resources, you may want to consider a counting Semaphore, which gives more flexibility (such as being able to configure the number of concurrent executions).
Another interesting aspect is that synchronization can only really be used to control access to resources within the same JVM. If you have more than one JVM and need to synchronize access to a shared file system or database, the synchronized keyword is not at all sufficient. You will need to get an external (global) lock for that.

If the method takes on the order of minutes to execute, then it may not need to be synchronized at such a coarse level, and it may be possible to use a more fine-grained system, perhaps by locking only the portion of a data structure that the method is operating on at the moment. Certainly, you should try to make sure that your critical section isn't really 2 minutes long - any method that takes that long to execute (regardless of the presence of other threads or locks) should be carefully studied as a candidate for parallelization. For a computation this time-consuming, you could be acquiring and releasing hundreds of locks and still have it be negligible. (Or, to put it another way, even if you need to introduce a lot of locks to parallelize this code, the overhead probably won't be significant.)

Since your method takes a huge amount of time to run, the relatively tiny amount of time it takes to acquire the synchronized lock should not be important.
A bigger problem could appear if your program is multithreaded (which I'm assuming it is, since you're making the method synchronized), and more than one thread needs to access that method, it could become a bottleneck. To prevent this, you might be able to rewrite the method so that it does not require synchronization, or use a synchronized block to reduce the size of the protected code (in general, the smaller the amount of code that is protected by the synchronize keyword, the better).
You can also look at the java.util.concurrent classes, as you may find a better solution there as well.

If the object is shared by multiple threads, if one thread tries to call the synchronized method on the object while another's call is in progress, it will be blocked for 1 to 2 minutes. In the worst case, you could end up with a bottleneck where the throughput of your system is dominated by executing these computations one at a time.
Whether this is a problem or not depends on the details of your application, but you probably should look at more fine-grained synchronization ... if that is practical.

In simple two lines Disadvantage of synchronized methods in Java :
Increase the waiting time of the thread
Create performance problem

First drawback is that threads that are blocked waiting to execute synchronize code can't be interrupted.Once they're blocked their stuck there, until they get the lock for the object the code is synchronizing on.
Second drawback is that the synchronized block must be within the same method in other words we can't start a synchronized block in one method and end the syncronized block in another for obvious reasons.
The third drawback is that we can't test to see if an object's intrinsic lock is available or find out any other information about the lock also if the lock isn't available we can't timeout after we waited lock for a while. When we reach the beginning of a synchronized block we can either get the lock and continue executing or block at that line of code until we get the lock.
The fourth drawback is that if multiple threads are awaiting to get lock, it's not first come first served. There isn't set order in which the JVM will choose the next thread that gets the lock, so the first thread that blocked could be the last thread to get the lock and vice Versa.
so instead of using synchronization we can prevent thread interference using classes that implement the java.util.concurrent locks.lock interface.

In simple two lines Disadvantage of synchronized methods in Java :
1. Increase the waiting time of the thread
2. Create a performance problem

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.