ConcurrentHashMap method execution order in multithreaded environment


As per the JLS rule below:
Each action in a thread happens-before every action in that thread that comes later in the program's order.
In the case below, would clear() always execute before put() in a multithreaded environment?
private ConcurrentMap<Feature, Boolean> featureMap = new ConcurrentHashMap<>();
public void loadAllConfiguration() {
    featureMap.clear();
    featureMap.put(feature, enabled); // hypothetical arguments; the question omitted them
}

In the case below, would clear() always execute before put() in a multithreaded environment?
Yes: within each thread, clear() comes before put(), even in a multithreaded application. However, once you look at how multiple threads interact with the shared ConcurrentHashMap, that per-thread ordering says nothing about the order of operations across threads.
For example, due to race conditions, you might see the following sequence of events:
Thread 1 calls clear().
Thread 2 calls clear().
Thread 3 calls clear().
Thread 2 calls put().
Thread 1 calls put().
Thread 3 calls put().
Even though each thread calls clear() and then put(), there is no guarantee that the ConcurrentHashMap will contain only one item, if that was the point of your question.
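A minimal sketch of the interleaving above, assuming String keys in place of the question's Feature type (the class name, keys, and thread count are hypothetical). A CyclicBarrier forces every thread's clear() to finish before any put() runs, which is one of the schedules listed above, so the map ends up with one entry per thread rather than one in total.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CyclicBarrier;

public class ClearThenPutRace {
    // Each worker runs clear() then put() with its own hypothetical key.
    // The barrier releases the workers only after all of them have called clear(),
    // so no put() can be wiped out by a later clear().
    static int runInterleaved(int threads) {
        ConcurrentMap<String, Boolean> featureMap = new ConcurrentHashMap<>();
        CyclicBarrier allCleared = new CyclicBarrier(threads);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            String key = "feature-" + i;
            workers[i] = new Thread(() -> {
                featureMap.clear();
                try {
                    allCleared.await();
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
                featureMap.put(key, Boolean.TRUE);
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return featureMap.size();
    }

    public static void main(String[] args) {
        System.out.println(runInterleaved(3)); // prints 3, not 1
    }
}
```

Without the barrier, the final size depends on scheduling and can be anything from 1 up to the number of threads.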

I'm not super clear on the question, but I think that
Each action in a thread happens-before every action in that thread that comes later in the program's order.
means that within the context of a single thread (since clear() and put() are blocking, synchronous calls) the runtime guarantees that they will be executed in the order they are invoked.
Based on my limited understanding of Java, this does NOT extend to a multithreaded environment. Suppose you have a single concurrent map shared between two threads, and each of those threads invokes loadAllConfiguration against the shared featureMap.
The threads can execute concurrently, so their operations can be interleaved.
This could result in an execution order of:
**THREAD 1** **THREAD 2**
map.clear()
map.put()
map.clear()
map.put()
or even both clear() calls running concurrently, followed by both put() calls running concurrently.
I haven't used Java, so I'm not sure exactly what ConcurrentHashMap provides, but I assume it only protects you from data races on individual operations (one thread writing while another reads) by using some form of internal synchronization. It can still leave you exposed to logical errors, i.e. clear() and put() calls from different threads being interleaved in a nondeterministic way.
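If the goal is for every reload to replace the whole configuration atomically, one option (an assumption on my part, not something ConcurrentHashMap does for you) is to build the new map privately and publish it with a single write to a volatile field. The class and the helper methods other than loadAllConfiguration are hypothetical, and String keys stand in for the question's Feature type.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class FeatureConfig {
    // Each published map is never modified again; readers just follow the reference.
    private volatile Map<String, Boolean> featureMap = Collections.emptyMap();

    // Build the complete replacement privately, then publish it with one volatile
    // write. Nobody sees a cleared-but-not-refilled map, and concurrent loaders
    // cannot mix their entries (the last write wins).
    public void loadAllConfiguration(Map<String, Boolean> loaded) {
        featureMap = Collections.unmodifiableMap(new HashMap<>(loaded));
    }

    public boolean isEnabled(String feature) {
        return featureMap.getOrDefault(feature, Boolean.FALSE);
    }

    public int featureCount() {
        return featureMap.size();
    }

    public static void main(String[] args) {
        FeatureConfig config = new FeatureConfig();
        Map<String, Boolean> first = new HashMap<>();
        first.put("a", true);
        first.put("b", true);
        config.loadAllConfiguration(first);
        Map<String, Boolean> second = new HashMap<>();
        second.put("c", true);
        config.loadAllConfiguration(second);
        // prints "1 false": the second load replaced the first one entirely
        System.out.println(config.featureCount() + " " + config.isEnabled("a"));
    }
}
```

A reader that needs several mutually consistent lookups should copy the field into a local variable once, since two separate reads of the volatile field may see two different snapshots.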

Related

Difference between Locks and .join() method

Let's say you have two threads, thread1 and thread2. If you call thread1.start() and thread2.start() at the same time and they both print out numbers between 1 and 5, they will both run at the same time and they will randomly print out the numbers in any order, if I am not mistaken. To prevent this, you use the .join() method to make sure that a certain thread gets executed first. If this is what the .join() method does, what is the Lock object used for?

Thread.join is used to wait for another thread to finish. The join method uses the implicit lock on the Thread object and calls wait on it. When the thread being waited for finishes, it notifies the waiting thread so it can stop waiting.
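A minimal sketch of the join() usage the question describes, assuming each thread records the numbers 1 to 5 into a list instead of printing them (the class and method names are hypothetical). Starting thread2 only after thread1.join() returns makes the order deterministic, and the same start()/join() calls provide the happens-before edges that make a plain ArrayList safe here.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinOrdering {
    static List<String> runInOrder() {
        // A plain ArrayList is safe only because the two threads never run at the same time.
        List<String> output = new ArrayList<>();
        Thread thread1 = new Thread(() -> {
            for (int i = 1; i <= 5; i++) output.add("thread1:" + i);
        });
        Thread thread2 = new Thread(() -> {
            for (int i = 1; i <= 5; i++) output.add("thread2:" + i);
        });
        try {
            thread1.start();
            thread1.join();  // the caller waits here until thread1 has terminated
            thread2.start(); // only now does thread2 begin
            thread2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return output;
    }

    public static void main(String[] args) {
        // all thread1 entries appear before any thread2 entry
        System.out.println(runInOrder());
    }
}
```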
Java has different ways to use locks to protect access to data. There is implicit locking, which uses a lock built into every Java object (this is where the synchronized keyword comes in), and there are explicit Lock objects. Both protect data from concurrent access; the difference is that explicit Locks are more flexible and powerful, while implicit locking is designed to be easier to use.
With implicit locks, for instance, I can't forget to release the lock at the end of a synchronized method or block: the JVM makes sure that the lock gets released as the thread leaves. But programming with implicit locks can be limiting. For instance, there are no separate condition objects, so if different threads access a shared object for different purposes, notifying only a subset of them is not possible.
With explicit Locks you get separate condition objects and can notify only those threads waiting on a particular condition (producers might wait on one condition while consumers wait on another, see the ArrayBlockingQueue class for an example), and you can implement more involved kinds of patterns, like hand-over-hand locking. But you need to be much more careful, because the extra features introduce complications, and releasing the lock is up to you.
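The producer/consumer pattern mentioned above can be sketched with one ReentrantLock and two Condition objects. This is a simplified variant of the bounded-buffer example in the Condition Javadoc, not the actual source of ArrayBlockingQueue; the class name and the produceAndConsume helper are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedBuffer<T> {
    private final Lock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();  // producers wait here
    private final Condition notEmpty = lock.newCondition(); // consumers wait here
    private final Deque<T> items = new ArrayDeque<>();
    private final int capacity;

    public BoundedBuffer(int capacity) { this.capacity = capacity; }

    public void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) notFull.await();
            items.addLast(item);
            notEmpty.signal(); // wakes a waiting consumer, never a waiting producer
        } finally {
            lock.unlock(); // unlike synchronized, releasing the lock is up to us
        }
    }

    public T take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) notEmpty.await();
            T item = items.removeFirst();
            notFull.signal(); // wakes a waiting producer only
            return item;
        } finally {
            lock.unlock();
        }
    }

    // One producer thread puts 1..count; the calling thread takes them and sums them.
    public static int produceAndConsume(int count, int capacity) {
        BoundedBuffer<Integer> buffer = new BoundedBuffer<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) buffer.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < count; i++) sum += buffer.take();
            producer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(produceAndConsume(5, 2)); // prints 15
    }
}
```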

Locking typically prevents more than one thread from running a block of code at the same time. This is because only ONE thread at a time can acquire the lock and run the code within. If a thread wants the lock but it is already taken, then that thread goes into a wait state until the lock is released. If you have many threads waiting for the lock to be released, which one gets the lock next is INDETERMINATE (can't be predicted). This can lead to "thread starvation" where a thread is waiting for the lock, but it just never gets it because other threads always seem to get it instead. This is a very generic answer because you didn't specify a language. Some languages may differ slightly in that they might have a determinate method of deciding who gets the lock next.

Does a new thread have full memory-visibility of all other threads' previous actions on shared objects?

I have thread A maintaining a data structure (adding, deleting, changing values in a ConcurrentHashMap).
I have thread B listening on a socket and occasionally creating thread C to handle a new client connection.
All thread Cs will only ever read from the ConcurrentHashMap maintained by thread A (never update it).
Is thread C guaranteed to see all updates that were performed by thread A, on the ConcurrentHashMap, before thread C was created/started by thread B?
(Edited last sentence to make question clearer: only concerned about updates to the ConcurrentHashMap.)

Is thread C guaranteed to see all updates that were performed by thread A before thread C was created/started by thread B?
In general (for example, with an ordinary HashMap), No.
But (again in general) if thread C was created by thread A, then the answer would be yes.
(There is a happens-before relation between one thread calling start() on a thread object, and the start of the new thread's run() method. But you have introduced a third thread ... and have not described anything that would give you a happens-before chain from A to B to C.)
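A minimal sketch of the case where thread A starts thread C itself, assuming an ordinary HashMap (the class name, keys, and values are hypothetical). Everything A does before calling start() is visible to C; writes A makes after start() would not be covered by this guarantee.

```java
import java.util.HashMap;
import java.util.Map;

public class StartHappensBefore {
    static int readFromStartedThread() {
        // Thread A (the caller) fills a non-thread-safe map...
        Map<String, Integer> config = new HashMap<>();
        config.put("timeout", 30);
        config.put("retries", 3);
        int[] seen = new int[1];
        // ...then starts thread C. start() happens-before every action in C,
        // so C is guaranteed to see both entries.
        Thread threadC = new Thread(() -> seen[0] = config.get("timeout") + config.get("retries"));
        threadC.start();
        try {
            threadC.join(); // join() makes C's write to seen[0] visible to A
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(readFromStartedThread()); // prints 33
    }
}
```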
However, you are talking about a ConcurrentHashMap here, and concurrent maps have innate memory consistency:
"Memory consistency effects: As with other concurrent collections, actions in a thread prior to placing an object into a ConcurrentMap as a key or value happen-before actions subsequent to the access or removal of that object from the ConcurrentMap in another thread."
(From the ConcurrentMap javadoc.)
So, whenever multiple threads share a ConcurrentHashMap, a read-only thread is guaranteed to see updates made by another thread ... modulo the documented behavior of iterators.
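A minimal sketch of the guarantee quoted above, assuming a mutable value class with plain fields (all names are hypothetical). The reader thread is started before the data exists, so Thread.start() gives no visibility here; the happens-before edge comes only from put() and the get() that returns the value. The polling loop is for demonstration only.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MapPublication {
    // Mutable value with plain (non-volatile, non-final) fields.
    static class Settings {
        int timeout;
        String mode;
    }

    static String readerSees() {
        ConcurrentMap<String, Settings> shared = new ConcurrentHashMap<>();
        String[] seen = new String[1];
        Thread reader = new Thread(() -> {
            Settings s;
            // Poll until the writer's put() is visible (demo only; prefer a blocking handoff).
            while ((s = shared.get("db")) == null) {
                Thread.yield();
            }
            // The put() happens-before this get(), so both field writes are visible.
            seen[0] = s.mode + ":" + s.timeout;
        });
        reader.start();
        // Writer: fill in the object first, then publish it through the map.
        Settings settings = new Settings();
        settings.timeout = 30;
        settings.mode = "primary";
        shared.put("db", settings);
        try {
            reader.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(readerSees()); // prints primary:30
    }
}
```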

Yes, as stated in the docs:
Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.)

Yes, if the ConcurrentHashMap object is shared globally or passed to thread C.

Your phrase
performed by thread A before thread C was created/started by thread B?
deals with an ill-defined concept, because the Java Memory Model doesn't provide any guarantees based on time ordering. You have to ensure a happens-before ordering, either through program order or through synchronization order, and you seem to be doing neither.
So, to answer your question with "no" would be wrong, because the question has a false hidden assumption.

I am new to this, but I want to add something. Dariusz, don't you think we could use the volatile modifier to get updated values?
Correct me if I am wrong, but volatile variables always return their most recently written value.

Multi Threading in Java

When I have a synchronized method in Java and multiple threads (let's say 10 threads) try to access it, let's assume one thread gets access, finishes executing the method, and releases the lock. Which of the remaining 9 threads gets access to the method next? Is there a standard mechanism for selecting the next thread from the pool, is it selected in FIFO order, or is it selected randomly?

Thread scheduling in Java is platform-specific. There is no guarantee in the order of thread execution in a synchronization scenario.
Having said that, the procedure is roughly as follows:
A preemptive scheduling algorithm is employed
Each thread is assigned a priority number by the JVM
The thread with the highest priority is selected
FIFO ordering is followed among threads with identical priorities
The JVM runs the thread with the highest priority. Priorities can be programmatically set, too, via the setPriority() method of the Thread class.

The next thread will be selected essentially at random, and the algorithm for selecting the next thread may be different on different machines. This is necessary for Java to gain the efficiencies of using native threads.
If you need first in, first out behavior, you may want to use something from the java.util.concurrent package, such as the Semaphore class with fairness set to true.
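A minimal sketch of the fair Semaphore suggestion, assuming a single permit is used as a FIFO mutex around a shared counter (the class and method names are hypothetical). Fair ordering can cost some throughput, so it is worth enabling only when the FIFO guarantee actually matters.

```java
import java.util.concurrent.Semaphore;

public class FairSection {
    // fairness = true: the permit is granted to waiting threads in FIFO order,
    // which a synchronized method does not promise.
    private final Semaphore permit = new Semaphore(1, true);
    private int counter = 0; // only touched while holding the permit

    void increment() throws InterruptedException {
        permit.acquire();
        try {
            counter++;
        } finally {
            permit.release(); // release() happens-before the next successful acquire()
        }
    }

    static int run(int threads, int perThread) {
        FairSection section = new FairSection();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    for (int j = 0; j < perThread; j++) section.increment();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return section.counter;
    }

    public static void main(String[] args) {
        System.out.println(new Semaphore(1, true).isFair()); // prints true
        System.out.println(run(4, 1000));                    // prints 4000
    }
}
```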

Thread Pool, Shared Data, Java Synchronization

Say, I have a data object:
class ValueRef { double value; }
Where each data object is stored in a master collection:
Collection<ValueRef> masterList = ...;
I also have a collection of jobs, where each job has a local collection of data objects (where each data object also appears in the masterList):
class Job implements Runnable {
    Collection<ValueRef> neededValues = ...;
    public void run() {
        double sum = 0;
        for (ValueRef x : neededValues) sum += x.value; // add each referenced value
        System.out.println(sum);
    }
}
Use-case:
for (ValueRef x: masterList) { x.value = Math.random(); }
Populate a job queue with some jobs.
Wake up a thread pool
Wait until each job has been evaluated
Note: During job evaluation, all of the values are constant. The threads, however, may have evaluated jobs in the past and may retain cached values.
Question: what is the minimal amount of synchronization necessary to ensure each thread sees the latest values?
I understand synchronized from the monitor/lock perspective; I do not understand it from the cache/flush perspective (i.e. what the memory model guarantees on entry to and exit from a synchronized block).
To me, it feels like I should need to synchronize once in the thread that updates the values to commit the new values to main memory, and once per worker thread, to flush the cache so the new values are read. But I'm unsure how best to do this.
My approach: create a global monitor: static Object guard = new Object(); Then, synchronize on guard, while updating the master list. Then finally, before starting the thread pool, once for each thread in the pool, synchronize on guard in an empty block.
Does that really cause a full flush of any value read by that thread? Or only of values touched inside the synchronized block? In which case, instead of an empty block, maybe I should read each value once in a loop?
Thanks for your time.
Edit: I think my question boils down to, once I exit a synchronized block, does every first read (after that point) go to main memory? Regardless of what I synchronized upon?

It doesn't matter that threads of a thread pool have evaluated some jobs in the past.
The Javadoc of Executor says:
Memory consistency effects: Actions in a thread prior to submitting a Runnable object to an Executor happen-before its execution begins, perhaps in another thread.
So, as long as you use a standard thread pool implementation and change the data before submitting the jobs, you shouldn't need to worry about memory visibility effects.
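A minimal sketch of that approach, assuming the question's Job is turned into a Callable that returns its sum so the results can be checked (the class name PoolVisibility and the runRound helper are hypothetical). The values are written by the submitting thread with no locks; the Executor's memory-consistency guarantee is what makes them visible to the pool threads, including threads that ran earlier jobs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolVisibility {
    static class ValueRef { double value; } // plain field, as in the question

    // The question's Job, returning its sum instead of printing it.
    static class Job implements Callable<Double> {
        private final List<ValueRef> neededValues;

        Job(List<ValueRef> neededValues) { this.neededValues = neededValues; }

        @Override
        public Double call() {
            double sum = 0;
            for (ValueRef x : neededValues) sum += x.value;
            return sum;
        }
    }

    // Assumes masterList holds at least two values.
    static double runRound(ExecutorService pool, List<ValueRef> masterList, double newValue) {
        // 1. Plain writes in the submitting thread, no locks.
        for (ValueRef x : masterList) x.value = newValue;
        // 2. Submit afterwards: actions before submission happen-before each job
        //    starts, even on pool threads that ran jobs in earlier rounds.
        List<Callable<Double>> jobs = new ArrayList<>();
        jobs.add(new Job(masterList.subList(0, 2)));
        jobs.add(new Job(masterList.subList(2, masterList.size())));
        double total = 0;
        try {
            // invokeAll waits for every job, so no job is still reading during the next round's writes.
            for (Future<Double> f : pool.invokeAll(jobs)) total += f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        List<ValueRef> masterList = new ArrayList<>();
        for (int i = 0; i < 4; i++) masterList.add(new ValueRef());
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(runRound(pool, masterList, 1.0)); // prints 4.0
            System.out.println(runRound(pool, masterList, 2.5)); // prints 10.0
        } finally {
            pool.shutdown();
        }
    }
}
```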

What you are planning sounds sufficient. It depends on how you plan to "wake up thread pool."
The Java Memory Model guarantees that all writes performed by a thread before it releases a lock (for example, by exiting a synchronized block) are visible to threads that subsequently acquire the same lock.
So, if you are sure the worker threads are blocked in a wait() call (which must be inside a synchronized block) during the time you update the master list, when they wake up and become runnable, the modifications made by the master thread will be visible to these threads.
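A minimal sketch of the wait()/notifyAll() handoff described above, assuming the workers only read the values while they stay constant, as the question states (the class, field, and method names are hypothetical). The updater writes plain array elements and sets a flag while holding the monitor; the worker reads them after it has acquired the same monitor.

```java
public class GuardedHandoff {
    private final Object guard = new Object();
    private final double[] values = new double[3]; // plain array, not volatile
    private boolean published = false;              // guarded by 'guard'

    // Updater: write the data and set the flag while holding the monitor.
    void publish(double a, double b, double c) {
        synchronized (guard) {
            values[0] = a;
            values[1] = b;
            values[2] = c;
            published = true;
            guard.notifyAll();
        }
    }

    // Worker: wait() is only legal while holding the monitor. Once the loop exits,
    // this thread has acquired 'guard' after the updater released it, so the
    // updater's earlier writes, including the plain array elements, are visible.
    double awaitAndSum() throws InterruptedException {
        synchronized (guard) {
            while (!published) {
                guard.wait();
            }
        }
        // Safe only because nobody writes 'values' again while workers read them.
        double sum = 0;
        for (double v : values) sum += v;
        return sum;
    }

    static double demo() {
        GuardedHandoff handoff = new GuardedHandoff();
        double[] result = new double[1];
        Thread worker = new Thread(() -> {
            try {
                result[0] = handoff.awaitAndSum();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        handoff.publish(1.0, 2.0, 3.0);
        try {
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 6.0
    }
}
```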
I would encourage you, however, to apply the higher level concurrency utilities in the java.util.concurrent package. These will be more robust than your own solution, and are a good place to learn concurrency before delving deeper.
Just to clarify: it's almost impossible to control worker threads without using a synchronized block in which a check is made to see whether the worker has a task to execute. Thus, any changes made by the controller thread to the job happen-before the worker thread wakes. You require a synchronized block, or at least a volatile variable, to act as a memory barrier; however, I can't think how you'd create a thread pool without using one of these.
As an example of the advantages of using the java.util.concurrent package, consider this: you could use a synchronized block with a wait() call in it, or a busy-wait loop with a volatile variable. Because of the overhead of context switching between threads, a busy wait can actually perform better under certain conditions; it's not necessarily the horrible idea that one might assume at first glance.
If you use the Concurrency utilities (in this case, probably an ExecutorService), the best selection for your particular case can be made for you, factoring in the environment, the nature of the task, and the needs of other threads at a given time. Achieving that level of optimization yourself is a lot of needless work.

Why not make Collection<ValueRef> and ValueRef immutable, or at least not modify the values in the collection after you have published the reference to it? Then you will not have to worry about synchronization.
That is, when you want to change the values, create a new collection and put the new values in it. Once the values have been set, pass the collection reference to the new job objects.
The only reason not to do this would be if the size of the collection is so large that it barely fits in memory and you cannot afford to have two copies, or the swapping of the collections would cause too much work for the garbage collector (prove that one of these is a problem before you use a mutable data structure for threaded code).

Disadvantage of synchronized methods in Java

What are the disadvantages of making a large non-static Java method synchronized? Large in the sense that it takes 1 to 2 minutes to complete.

If you synchronize the method and try to call it twice at the same time, one thread will have to wait two minutes.
This is not really a question of "disadvantages". Synchronization is either necessary or not, depending on what the method does.
If it is critical that only one thread runs the code at a time, then you need synchronization.
If you only want to limit how many threads run the code at once in order to preserve system resources, you may want to consider a counting Semaphore, which gives more flexibility (such as being able to configure the number of concurrent executions).
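A minimal sketch of the counting Semaphore idea, assuming a short sleep stands in for the long-running method (the class and method names are hypothetical). The atomic counters exist only to check that no more than the configured number of threads are ever inside the guarded section at once.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class LimitedConcurrency {
    static int maxObservedConcurrency(int permits, int threads) {
        Semaphore semaphore = new Semaphore(permits);
        AtomicInteger inside = new AtomicInteger();    // threads currently in the section
        AtomicInteger maxInside = new AtomicInteger(); // highest value seen so far
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    semaphore.acquire(); // blocks while all permits are taken
                    try {
                        maxInside.accumulateAndGet(inside.incrementAndGet(), Math::max);
                        Thread.sleep(20); // hypothetical stand-in for the long-running work
                    } finally {
                        inside.decrementAndGet();
                        semaphore.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return maxInside.get();
    }

    public static void main(String[] args) {
        // Never more than 2, however many threads call in.
        System.out.println(maxObservedConcurrency(2, 6));
    }
}
```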
Another interesting aspect is that synchronization can only really be used to control access to resources within the same JVM. If you have more than one JVM and need to synchronize access to a shared file system or database, the synchronized keyword is not at all sufficient. You will need to get an external (global) lock for that.

If the method takes on the order of minutes to execute, then it may not need to be synchronized at such a coarse level, and it may be possible to use a more fine-grained system, perhaps by locking only the portion of a data structure that the method is operating on at the moment. Certainly, you should try to make sure that your critical section isn't really 2 minutes long - any method that takes that long to execute (regardless of the presence of other threads or locks) should be carefully studied as a candidate for parallelization. For a computation this time-consuming, you could be acquiring and releasing hundreds of locks and still have it be negligible. (Or, to put it another way, even if you need to introduce a lot of locks to parallelize this code, the overhead probably won't be significant.)

Since your method takes a huge amount of time to run, the relatively tiny amount of time it takes to acquire the synchronized lock should not be important.
A bigger problem is that if your program is multithreaded (which I'm assuming it is, since you're making the method synchronized) and more than one thread needs to access that method, the method could become a bottleneck. To prevent this, you might be able to rewrite the method so that it does not require synchronization, or use a synchronized block to reduce the size of the protected code (in general, the smaller the amount of code protected by the synchronized keyword, the better).
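A minimal sketch of shrinking the synchronized region, assuming the expensive part of the method only works on thread-local data and just the final update touches shared state (the class, the summation stand-in, and the results list are hypothetical).

```java
import java.util.ArrayList;
import java.util.List;

public class NarrowLock {
    private final Object lock = new Object();
    private final List<Long> results = new ArrayList<>(); // guarded by 'lock'

    // Only the shared-state update is synchronized; the slow, thread-confined
    // computation runs outside the lock, so callers do not queue behind it.
    public void process(long input) {
        long result = expensiveComputation(input); // touches no shared state
        synchronized (lock) {
            results.add(result);
        }
    }

    // Hypothetical stand-in for the long method body: sums 1..n.
    private static long expensiveComputation(long n) {
        long acc = 0;
        for (long i = 1; i <= n; i++) acc += i;
        return acc;
    }

    public long total() {
        synchronized (lock) {
            long sum = 0;
            for (long r : results) sum += r;
            return sum;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        NarrowLock worker = new NarrowLock();
        Thread t1 = new Thread(() -> worker.process(1_000));
        Thread t2 = new Thread(() -> worker.process(2_000));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(worker.total()); // prints 2501500
    }
}
```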
You can also look at the java.util.concurrent classes, as you may find a better solution there as well.

If the object is shared by multiple threads and one thread tries to call the synchronized method on the object while another thread's call is in progress, it will be blocked for 1 to 2 minutes. In the worst case, you could end up with a bottleneck where the throughput of your system is dominated by executing these computations one at a time.
Whether this is a problem or not depends on the details of your application, but you probably should look at more fine-grained synchronization ... if that is practical.

In two simple lines, the disadvantages of synchronized methods in Java:
Increased waiting time for threads
Performance problems

The first drawback is that threads blocked waiting to execute synchronized code can't be interrupted. Once they're blocked, they're stuck there until they get the lock for the object the code is synchronizing on.
The second drawback is that a synchronized block must be within a single method; in other words, we can't start a synchronized block in one method and end it in another, for obvious reasons.
The third drawback is that we can't test whether an object's intrinsic lock is available or find out any other information about the lock. Also, if the lock isn't available, we can't time out after waiting for it for a while. When we reach the beginning of a synchronized block, we can either get the lock and continue executing, or block at that line of code until we get the lock.
The fourth drawback is that if multiple threads are waiting to get the lock, it's not first come, first served. There isn't a set order in which the JVM chooses the next thread that gets the lock, so the first thread that blocked could be the last thread to get the lock, and vice versa.
So, instead of using synchronization, we can prevent thread interference using classes that implement the java.util.concurrent.locks.Lock interface.
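A minimal sketch of how ReentrantLock, one implementation of java.util.concurrent.locks.Lock, addresses the first and third drawbacks above: tryLock with a timeout gives up instead of blocking forever, and lockInterruptibly lets a waiting thread respond to interrupt(). The class and helper method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockDrawbacksAddressed {
    private final ReentrantLock lock = new ReentrantLock();

    // Third drawback: wait at most timeoutMillis for the lock, then give up.
    boolean tryWork(long timeoutMillis) throws InterruptedException {
        if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false;
        }
        try {
            return true; // the guarded work would run here
        } finally {
            lock.unlock();
        }
    }

    // First drawback: a thread blocked here throws InterruptedException when interrupted.
    void interruptibleWork() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            // the guarded work would run here
        } finally {
            lock.unlock();
        }
    }

    // Another thread makes a timed attempt while the caller holds the lock.
    static boolean timedAttemptSucceedsWhileHeld() {
        LockDrawbacksAddressed demo = new LockDrawbacksAddressed();
        boolean[] acquired = new boolean[1];
        demo.lock.lock();
        try {
            Thread other = new Thread(() -> {
                try {
                    acquired[0] = demo.tryWork(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            other.start();
            other.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            demo.lock.unlock();
        }
        return acquired[0];
    }

    // A waiting thread is interrupted while the caller holds the lock.
    static boolean waiterIsInterrupted() {
        LockDrawbacksAddressed demo = new LockDrawbacksAddressed();
        boolean[] interrupted = new boolean[1];
        demo.lock.lock();
        try {
            Thread waiter = new Thread(() -> {
                try {
                    demo.interruptibleWork(); // blocks: the lock is held by the caller
                } catch (InterruptedException e) {
                    interrupted[0] = true;
                }
            });
            waiter.start();
            waiter.interrupt(); // a thread blocked entering a synchronized block would stay blocked
            waiter.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            demo.lock.unlock();
        }
        return interrupted[0];
    }

    public static void main(String[] args) {
        System.out.println(timedAttemptSucceedsWhileHeld()); // prints false
        System.out.println(waiterIsInterrupted());           // prints true
    }
}
```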
