Can CountDownLatch.await() be reordered by the compiler?

I have to invoke an operation on a different system. The other system is highly concurrent and distributed, so I integrate it over a MessageBus. One implementation is required to let the caller wait until the result is received on the bus or the timeout expires.
The implementation consists of three steps: First I have to pack the future and the CountDownLatch, which is released once the result is received, into a ConcurrentHashMap. This Map is shared with the component listening on the MessageBus. Then I have to start the execution and finally wait on the latch.
public FailSafeFuture execute(Execution execution, long timeout, TimeUnit timeoutUnit)
        throws InterruptedException {
    // Step 1
    final WaitingFailSafeFuture future = new WaitingFailSafeFuture();
    final CountDownLatch countDownLatch = new CountDownLatch(1);
    final PendingExecution pendingExecution = new PendingExecution(future, countDownLatch);
    final String id = execution.getId();
    pendingExecutions.put(id, pendingExecution); // ConcurrentHashMap shared with Bus

    // Step 2
    execution.execute();

    // Step 3
    final boolean awaitSuccessful = countDownLatch.await(timeout, timeoutUnit);
    // ...
    return future;
}
So the question is: can those three steps be reordered by the compiler? As far as I understand, only step 1 and step 3 form a happens-before relationship. So in theory step 2 could be freely moved by the compiler. Could step 2 even be moved behind the await and therefore invalidate the code?
Follow-up question: would replacing the CountDownLatch with a wait/notify combination on a shared object solve the problem without further synchronization (leaving out the timeout, of course)?

Instruction reordering can happen if the end result remains the same (as far as the compiler can tell), and may require additional synchronization if multiple threads are involved (since the compiler doesn't know that other threads may expect a specific order).
In a single thread, as in your example, all steps have a happens-before relationship with each other. Even if the compiler could reorder those method invocations, the end result would need to be the same (execute() called before await()). As await() involves synchronization, there's no way that execute() could somehow slip after it, unless you were using a badly broken implementation.
The happens-before relationship between countDown() and await() makes sure that everything the bus-listener thread did before calling countDown() is visible to the code that runs after await() returns. So there's nothing wrong with the code shown.
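For illustration, here is a rough sketch of the listener side this relies on (the question doesn't show PendingExecution's internals or the bus callback, so the type and method names here are assumptions):
public void onResultReceived(String id, Result result) {
    // Hypothetical bus callback; Result and the getters are assumed names.
    final PendingExecution pending = pendingExecutions.remove(id);
    if (pending != null) {
        pending.getFuture().setResult(result);   // publish the result first...
        pending.getCountDownLatch().countDown(); // ...then release the waiter
    }
}
Because the future is completed before countDown(), the countDown()/await() edge guarantees that the thread returning from await() sees the finished result.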
You should always prefer the java.util.concurrent classes over wait/notify, as they're easier to use and provide a lot more functionality.

Related

Is it thread-safe to synchronize only on add to HashSet?

Imagine having a main thread which creates a HashSet and starts a lot of worker threads, passing the HashSet to them.
Just like in the code below:
void main() {
    final Set<String> set = new HashSet<>();
    final ExecutorService threadExecutor =
            Executors.newFixedThreadPool(10);
    threadExecutor.submit(() -> doJob(set));
}
void doJob(final Set<String> pSet) {
    // do some stuff
    final String x = ... // doesn't matter how we received the value.
    if (!pSet.contains(x)) {
        synchronized (pSet) {
            // double check to prevent multiple adds within different threads
            if (!pSet.contains(x)) {
                // do some exclusive work with x.
                pSet.add(x);
            }
        }
    }
    // do some stuff
}
I'm wondering: is it thread-safe to synchronize only on the add method? Are there any possible issues if contains is not synchronized?
My intuition tells me this is fine: after leaving the synchronized block, changes made to the set should be visible to all threads. But the JMM can be counter-intuitive sometimes.
P.S. I don't think it's a duplicate of How to lock multiple resources in java multithreading
Even though answers to both could be similar, this question addresses more particular case.
I'm wondering is it thread-safe to synchronize only on the add method? Are there any possible issues if contains is not synchronized as well?
Short answers: No and Yes.
There are two ways of explaining this:
The intuitive explanation
Java synchronization (in its various forms) guards against a number of things, including:
Two threads updating shared state at the same time.
One thread trying to read state while another is updating it.
Threads seeing stale values because memory caches have not been written to main memory.
In your example, synchronizing on add is sufficient to ensure that two threads cannot update the HashSet simultaneously, and that both calls will be operating on the most recent HashSet state.
However, if contains is not synchronized as well, a contains call could happen simultaneously with an add call. This could lead to the contains call seeing an intermediate state of the HashSet, leading to an incorrect result, or worse. This can also happen if the calls are not simultaneous, due to changes not being flushed to main memory immediately and/or the reading thread not reading from main memory.
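A minimal sketch that exhibits the race (class name and iteration count are arbitrary); the reader thread calls contains() without holding the lock, so its results are undefined:
import java.util.HashSet;
import java.util.Set;

public class UnsafeContainsDemo {
    public static void main(String[] args) throws InterruptedException {
        final Set<Integer> set = new HashSet<>();
        // Writer: adds under the lock, as in the question.
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                synchronized (set) {
                    set.add(i);
                }
            }
        });
        // Reader: an unsynchronized contains() is a data race. Stale answers,
        // wrong answers, or worse are all permitted outcomes under the JMM.
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                set.contains(i);
            }
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
    }
}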
The Memory Model explanation
The JLS specifies the Java Memory Model, which sets out the conditions that must be fulfilled by a multi-threaded application to guarantee that one thread sees the memory updates made by another. The model is expressed in mathematical language and is not easy to understand, but the gist is that visibility is guaranteed if and only if there is a chain of happens-before relationships from the write to a subsequent read. If the write and read are in different threads, then synchronization between the threads is the primary source of these relationships. For example, in
// thread one
synchronized (sharedLock) {
    sharedVariable = 42;
}

// thread two
synchronized (sharedLock) {
    other = sharedVariable;
}
Assuming that the thread one code is run before the thread two code, there is a happens-before relationship between thread one releasing the lock and thread two acquiring it. With this and the "program order" relations, we can build a chain from the write of 42 to the assignment to other. This is sufficient to guarantee that other will be assigned 42 (or possibly a later value of the variable) and NOT any value held by sharedVariable before 42 was written to it.
Without the synchronized block synchronizing on the same lock, the second thread could see a stale value of sharedVariable; i.e. some value written to it before 42 was assigned to it.
That code is thread safe for the synchronized (pSet) { } part:
if (!pSet.contains(x)) {
    synchronized (pSet) {
        // Here you are sure to have the updated value of pSet
        if (!pSet.contains(x)) {
            // do some exclusive work with x.
            pSet.add(x);
        }
    }
}
because inside the synchronized statement on the pSet object:
one and only one thread may be in this block,
and inside it, pSet's updated state is also guaranteed by the happens-before relationship established by the synchronized keyword.
So whatever value the first if (!pSet.contains(x)) statement returned for a waiting thread, when this waiting thread wakes up and enters the synchronized statement, it will see the last updated value of pSet. So even if the same element was added by a previous thread, the second if (!pSet.contains(x)) will return false.
But this code is not thread safe for the first if (!pSet.contains(x)) statement, which could execute while another thread is writing to the Set.
As a rule of thumb, a collection not designed to be thread safe should not be used for concurrent reading and writing, because a read could observe the collection in an in-progress/inconsistent state while a write is under way.
While some non-thread-safe collection implementations may tolerate such usage in practice, there is no guarantee at all that this will always hold.
So you should use a thread-safe Set implementation to make the whole thing thread safe.
For example:
Set<String> pSet = ConcurrentHashMap.newKeySet();
Under the hood this uses a ConcurrentHashMap, so reads take no lock and writes take a minimal one (only on the entry being modified, not the whole structure).
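A sketch of the simplified version (the method shape follows the question's doJob; note that since Set.add already returns whether the element was newly inserted, the atomic add can replace the double-checked contains entirely):
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class Job {
    private final Set<String> pSet = ConcurrentHashMap.newKeySet();

    void doJob(final String x) {
        // add() is atomic and returns true only for the one thread that
        // actually inserted x, so the check-then-act locking goes away.
        if (pSet.add(x)) {
            // do some exclusive work with x.
        }
    }
}
One caveat: the "exclusive work" is no longer serialized against work on other elements the way the original synchronized (pSet) block was; if that global mutual exclusion was intended, a lock is still needed.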
No,
You don't know what state the HashSet might be in while another thread is adding to it. There might be fundamental changes ongoing, like the splitting of buckets, so that contains may return false while another thread is adding, even though the element would be there in a single-threaded HashSet. In that case you would try to add the element a second time.
Even worse scenario: contains might get into an endless loop or throw an exception because of a temporarily invalid state of the HashSet in the memory shared by the two threads.

Thread safety & atomicity of java semaphore

How are they achieved? I have seen the code of that class, and it doesn't look like any synchronisation mechanism is used to ensure thread safety or atomicity of the method calls.
I am referring to the class java.util.concurrent.Semaphore
Edit: please understand this is in no way a bug report or a lack of confidence in Java technology. Rather, it is a request to help me understand.
Introduction
Viewed through a broader lens, the question becomes: how does any locking mechanism achieve thread safety? Since this question has a few parts, I will go through them one at a time.
Compare-and-Swap (CAS)
Compare-and-Swap (CAS) is an atomic instruction at the machine level and an atomic operation at the programmatic level. Now, in your specific semaphore question, this technique is used while acquiring a permit from the semaphore:
Grab the current value.
Calculate the new value.
Perform a CAS operation, passing in the current value and the new value: the operation checks whether the value at that memory location is still the expected current value; if it is, the new value is stored there and it returns true, and if a different value is found, the memory location is left unmodified and it returns false. (A sketch of this retry loop follows below.)
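A minimal sketch of that retry loop, using AtomicInteger to stand in for the semaphore's internal state (class name and permit count are arbitrary):
import java.util.concurrent.atomic.AtomicInteger;

public final class CasPermits {
    private final AtomicInteger permits = new AtomicInteger(5);

    // Returns true if a permit was acquired, false if none were left.
    public boolean tryAcquire() {
        for (;;) {
            int current = permits.get();     // 1. grab the current value
            int next = current - 1;          // 2. calculate the new value
            if (next < 0) {
                return false;                // no permits available
            }
            if (permits.compareAndSet(current, next)) { // 3. CAS
                return true;                 // swap succeeded
            }
            // CAS failed: another thread won the race; loop and retry.
        }
    }
}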
AbstractQueuedSynchronizer
The AbstractQueuedSynchronizer class handles the synchronization around acquiring permits. The implementation creates a FIFO queue (although different types of queues can be used) and utilizes park/unpark techniques to control the threads' running state.
Park/Unpark
A thread parks when there are no available permits in the semaphore. Conversely, a thread is unparked once a permit becomes available (assuming the FIFO implementation, the first thread in the queue is unparked) and then attempts a CAS operation to acquire the permit. If it fails, it parks again and the process repeats.
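Under the hood this is LockSupport. A minimal sketch of the park/unpark handshake (the real AQS queue management is far more involved, and park() may also return spuriously, which is why real code re-checks in a loop):
import java.util.concurrent.locks.LockSupport;

public class ParkDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() -> {
            System.out.println("no permit available: parking");
            LockSupport.park(); // blocks here until unparked
            System.out.println("unparked: would now retry the CAS");
        });
        waiter.start();
        Thread.sleep(100);          // crude: give the waiter time to park
        LockSupport.unpark(waiter); // a released permit wakes the waiter
        waiter.join();
    }
}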
Conclusion
These concepts work together, using atomicity through machine-level instructions and programmatic operations, to ensure that permits/locks are only acquired by a single thread at a time, thus limiting thread access to blocks of logic to the desired amount.
The best way to understand this is by looking at nonfairTryAcquireShared():
final int nonfairTryAcquireShared(int acquires) {
    for (;;) {
        int available = getState();
        int remaining = available - acquires;
        if (remaining < 0 ||
            compareAndSetState(available, remaining))
            return remaining;
    }
}
So we loop: read the current state, compute what would remain, and try to claim it with a CAS; if the CAS fails because another thread got there first, we retry.
The most interesting part is compareAndSetState():
protected final boolean compareAndSetState(int expect, int update) {
    // See below for intrinsics setup to support this
    return unsafe.compareAndSwapInt(this, stateOffset, expect, update);
}
compareAndSwapInt is a native function that guarantees atomicity, and thus, in our case, synchronization.
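As an aside, sun.misc.Unsafe is internal API; on Java 9+ the same CAS can be expressed with a VarHandle. A rough sketch of an equivalent (field and class names are mine):
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class CasState {
    private volatile int state;

    private static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup()
                    .findVarHandle(CasState.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Atomic at the hardware level, like unsafe.compareAndSwapInt.
    protected final boolean compareAndSetState(int expect, int update) {
        return STATE.compareAndSet(this, expect, update);
    }
}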

Processing sub-streams of a stream in Java using executors

I have a program that processes a huge stream (not in the sense of java.util.stream, but rather InputStream) of data coming in through the network. The stream consists of objects, each having a sort of sub-stream identifier. Right now the whole processing is done in a single thread, but it takes a lot of CPU time and each sub-stream can easily be processed independently, so I'm thinking of multi-threading it.
However, each sub-stream requires to keep a lot of bulky state, including various buffers, hash maps and such. There is no particular reason to make it concurrent or synchronized since sub-streams are independent of each other. Moreover, each sub-stream requires that its objects are processed in the order they arrive, which means that probably there should be a single thread for each sub-stream (but possibly one thread processing multiple sub-streams).
I'm thinking of several approaches to this, but they are not quite elegant.
Create a single ThreadPoolExecutor for all tasks. Each task will contain the next object to process and the reference to a Processor instance which keeps all the state. That would establish the necessary happens-before relationship, ensuring that the processing thread sees the up-to-date state for this sub-stream. This approach has no way to make sure that the next object of the same sub-stream will be processed in the same thread, as far as I can see. Moreover, it needs some guarantee that objects will be processed in the order they come in, which will require additional synchronization of the Processor objects, introducing unnecessary delays.
Create multiple single-thread executors manually and a sort of hash-map that maps sub-stream identifiers to executor. This approach requires manual management of executors, creating or shutting down them as new sub-streams begin or end, and distributing the tasks between them accordingly.
Create a custom executor that processes a special subclass of tasks, each having a sub-stream ID. The executor would use it as a hint to execute the task on the same thread as the previous task with the same ID. However, I don't see an easy way to implement such an executor. Unfortunately, it doesn't seem possible to extend any of the existing executor classes, and implementing an executor from scratch is kind of overkill.
Create a single ThreadPoolExecutor, but instead of creating a task for each incoming object, create a single long-running task for each sub-stream that would block in a concurrent queue, waiting for the next object. Then put objects in queues according to their sub-stream IDs. This approach needs as many threads as there are sub-streams because the tasks will be blocked. The expected number of sub-streams is about 30-60, so that may be acceptable.
Alternatively, proceed as in 4, but limit the number of threads, assigning multiple sub-streams to a single task. This is sort of a hybrid between 2 and 4. As far as I can see, this is the best approach of these, but it still requires some sort of manual sub-stream distribution between tasks and some way to shut the extra tasks down as sub-streams end.
What would be the best way to ensure that each sub-stream is processed in its own thread without a lot of error-prone code? So that the following pseudo-code will work:
// loop {
    Item next = stream.read();
    int id = next.getSubstreamID();
    Processor processor = getProcessor(id);
    SubstreamTask task = new SubstreamTask(processor, next, id);
    executor.submit(task); // This makes sure that the task will
                           // be executed in the same thread as the
                           // previous task with the same ID.
// } // loop
I suggest having an array of single threaded executors. If you can devise a consistent hashing strategy for sub-streams, you can map sub-streams to individual threads. e.g.
final ExecutorService[] es = ...

public void submit(int id, Runnable run) {
    es[(id & 0x7FFFFFFF) % es.length].submit(run);
}
The key could be a String or a long, anything that identifies the sub-stream. If you know a particular sub-stream is very expensive, you could assign it a dedicated thread.
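For completeness, a sketch of how the elided setup might look, with a String-keyed variant (array size is arbitrary; Math.floorMod plays the same role as the & 0x7FFFFFFF masking above):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final ExecutorService[] es = new ExecutorService[8];
{
    for (int i = 0; i < es.length; i++) {
        es[i] = Executors.newSingleThreadExecutor();
    }
}

public void submit(String key, Runnable run) {
    // floorMod guards against negative hashCode values.
    es[Math.floorMod(key.hashCode(), es.length)].submit(run);
}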
The solution I finally chose looks like this:
private final Executor[] streamThreads
        = new Executor[Runtime.getRuntime().availableProcessors()];
{
    for (int i = 0; i < streamThreads.length; ++i) {
        streamThreads[i] = Executors.newSingleThreadExecutor();
    }
}

private final ConcurrentHashMap<SubstreamId, Integer>
        threadById = new ConcurrentHashMap<>();
This code determines which executor to use:
Message msg = in.readNext();
SubstreamId msgSubstream = msg.getSubstreamId();
int exe = threadById.computeIfAbsent(msgSubstream,
        id -> findBestExecutor());
streamThreads[exe].execute(() -> {
    // processing goes here
});
And the findBestExecutor() function is this:
private int findBestExecutor() {
    // Thread index -> substream count mapping:
    final int[] loads = new int[streamThreads.length];
    for (int thread : threadById.values()) {
        ++loads[thread];
    }
    // return the index of the minimum load
    return IntStream.range(0, streamThreads.length)
            .reduce((i, j) -> loads[i] <= loads[j] ? i : j)
            .orElse(0);
}
This is, of course, not very efficient, but note that this function is only called when a new sub-stream shows up (which happens several times every few hours, so it's not a big deal in my case). My real code looks a bit more complicated because I have a way to determine whether two sub-streams are likely to finish simultaneously, and if they are, I try to assign them to different threads in order to maintain even load after they do finish. But since I never mentioned this detail in the question, I guess it doesn't belong to the answer either.

Can the synchronized statements be re-ordered by Java compiler for optimization?

Can the synchronized statements be reordered? I.e., can:
synchronized(A) {
    synchronized(B) {
        ......
    }
}
become:
synchronized(B) {
    synchronized(A) {
        ......
    }
}
Can the synchronization statements be reordered?
I assume you are asking if the compiler can reorder the synchronized blocks so the lock order happens in a different order than the code.
The answer is no. A synchronized block (and a volatile field access) imposes ordering restrictions on the compiler. In your case, you cannot move a monitor-enter before another monitor-enter, nor a monitor-exit after another monitor-exit. See the grid referenced below.
To quote from JSR 133 (Java Memory Model) FAQ:
It is not possible, for example, for the compiler to move your code before an acquire or after a release. When we say that acquires and releases act on caches, we are using shorthand for a number of possible effects.
Doug Lea's JSR-133 Cookbook has a grid which shows the reordering possibilities. A blank entry in the grid means that reordering is allowed. In your case, entering a synchronized block is a "MonitorEnter" (same reordering limitations as loading of a volatile field) and exiting a synchronized block is a "MonitorExit" (same as storing to a volatile field).
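These rules are often summarized as "roach motel" semantics: ordinary code may be moved into a synchronized region, but never out of it. A small sketch (the comments describe what the compiler may and may not do; the code itself does nothing interesting):
class RoachMotel {
    private final Object lock = new Object();
    private int shared;

    void demo() {
        int a = 1;            // MAY legally be moved into the block below
        synchronized (lock) { // MonitorEnter: nothing below may move above it
            shared = a;       // must stay between enter and exit
        }                     // MonitorExit: nothing above may move below it
        int b = 2;            // MAY legally be moved into the block above,
        System.out.println(b); // but never above its MonitorEnter
    }
}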
Yes and no.
The order must be consistent.
Suppose you are creating a transaction between two bank accounts, and always grab the sender's lock first, then grab the receiver's lock. Problem is - say both Dan and Bob want to transfer money to each other at the same time.
Thread 1 might grab Dan's lock, as it processes Dan's transaction to Bob.
Then thread 2 grabs Bob's lock, as it processes Bob's transaction to Dan.
Then, bam, deadlock.
The morals are:
Lock less.
Read Java: Concurrency in Practice. My example is taken from there. I like arguing about the merits of books in programming as much as the next guy, but it's pretty rare you get comprehensive coverage of a difficult topic between two covers, so enjoy it.
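The standard fix from that book is to impose a global lock ordering. A sketch, assuming a hypothetical Account class with a unique numeric id:
// Always lock the account with the smaller id first, regardless of which
// one is the sender, so every thread agrees on the acquisition order.
void transfer(Account from, Account to, long amount) {
    Account first  = from.getId() < to.getId() ? from : to;
    Account second = (first == from) ? to : from;
    synchronized (first) {
        synchronized (second) {
            from.debit(amount);
            to.credit(amount);
        }
    }
}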
So this is the part of the answer where I guess at other things you might have been trying to ask instead, because the expectation is firmly on me that I act psychic.
The JVM will not acquire the locks in an order different from which you have programmed. How do I know this? Because otherwise it would not be possible to solve the problem in the first half of my answer.
Synchronized statements are never reordered by the compiler, since doing so would have a big effect on what ends up happening.
Synchronized blocks are used to obtain a lock on the specific Object placed between the synchronized parentheses.
private final Object LOCK_1 = new Object();

public void foo(){
    synchronized(LOCK_1){
        //code here...
    }
}
This obtains the lock for the Object LOCK_1 and releases it when the synchronized block completes. Since synchronized blocks are used to guard against concurrent access, it may sometimes be necessary to use multiple locks, especially when multiple thread-unsafe objects are being written and read.
Consider the following code that uses a nested synchronization block:
private final Object LOCK_1 = new Object();
private final Object LOCK_2 = new Object();

public void bar(){
    synchronized(LOCK_1){
        //Point A
        synchronized(LOCK_2){
            //Point B
        }
        //Point C
    }
    //Point D
}
If we look at points A, B, C and D, we can see why the order of synchronization matters.
First, at point A, the lock for LOCK_1 is obtained; therefore any other threads trying to obtain LOCK_1 are put into a queue.
At point B, the currently executing thread owns the lock for both LOCK_1 and LOCK_2.
At point C, the currently executing thread has released the lock for LOCK_2.
At point D, the currently executing thread has released all locks.
If we flip this example around and put LOCK_2 on the outer block, you will realize that the order in which the thread obtains locks changes, which has a big effect on what it ends up doing. Normally, when I write programs with synchronized blocks, I use one MUTEX object per thread-unsafe resource I am accessing (or one MUTEX per group). Say I want to read from a stream using LOCK_1 and write to a stream using LOCK_2. It would be illogical to think that swapping the locking order around means the same thing.
Consider that LOCK_2 (the writing lock) is being held by another thread. If we have LOCK_1 on the outer block, the currently executing thread can at least process all the reading code before being put into a queue for the writing lock (essentially, the ability to execute the code at point A). If we flipped the order of the locks around, the currently executing thread would end up having to wait until the writing is complete, then proceed to read and write whilst holding the writing lock (all the way through the reading, too).
Another problem comes up when the order of locks is switched inconsistently (some code takes LOCK_1 first and other code takes LOCK_2 first). Consider two threads that both eagerly try to execute code with different locking orders. Thread 1 obtains LOCK_1 in its outer block and thread 2 obtains LOCK_2 in its outer block. Now when thread 1 tries to obtain LOCK_2, it can't, since thread 2 has it. And when thread 2 tries to obtain LOCK_1, it can't either, because thread 1 has it. The two threads block on each other forever, forming a deadlock.
To answer your question: if you want to lock on two objects immediately, without doing any sort of processing between the locks, then the order is irrelevant (essentially no processing at point A or C). HOWEVER, it is essential to keep the order consistent throughout your program so as to avoid deadlocking.

These three threads don't take turns when using Thread.yield()?

In an effort to practice my rusty Java, I wanted to try a simple multi-threaded shared data example and I came across something that surprised me.
Basically we have a shared AtomicInteger counter between three threads that each take turns incrementing and printing the counter.
main
AtomicInteger counter = new AtomicInteger(0);
CounterThread ct1 = new CounterThread(counter, "A");
CounterThread ct2 = new CounterThread(counter, "B");
CounterThread ct3 = new CounterThread(counter, "C");
ct1.start();
ct2.start();
ct3.start();
CounterThread
public class CounterThread extends Thread
{
    private AtomicInteger _count;
    private String _id;

    public CounterThread(AtomicInteger count, String id)
    {
        _count = count;
        _id = id;
    }

    public void run()
    {
        while(_count.get() < 1000)
        {
            System.out.println(_id + ": " + _count.incrementAndGet());
            Thread.yield();
        }
    }
}
I expected that when each thread executed Thread.yield(), it would hand execution over to another thread to increment _count, like this:
A: 1
B: 2
C: 3
A: 4
...
Instead, I got output where A would increment _count 100 times, then pass it off to B. Sometimes all three threads would take turns consistently, but sometimes one thread would dominate for several increments.
Why doesn't Thread.yield() always yield processing over to another thread?
I expected that when each thread executed Thread.yield(), it would hand execution over to another thread to increment _count, like this:
In threaded applications that are spinning, predicting the output is extremely hard. You would have to do a lot of work with locks and stuff to get perfect A:1 B:2 C:3 ... type output.
The problem is that everything here is a race, unpredictable due to hardware effects, time-slicing randomness, and other factors. For example, when the first thread starts, it may run for a couple of millis before the next thread starts, so there would be no one to yield() to. Also, even if it yields, maybe you are on a 4-processor box, so there is no reason to pause any other threads at all.
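If you really want the strict A: 1, B: 2, C: 3 interleaving, the "locks and stuff" look roughly like this sketch: a shared monitor plus a turn counter, so each thread only increments when it is its turn (the limit and thread count are arbitrary):
class TurnTakingCounter implements Runnable {
    private static final Object lock = new Object();
    private static int turn = 0;   // whose turn it is (guarded by lock)
    private static int count = 0;  // the shared counter (guarded by lock)
    private static final int LIMIT = 12;

    private final int index;       // this thread's position: 0..threads-1
    private final int threads;     // total number of threads

    TurnTakingCounter(int index, int threads) {
        this.index = index;
        this.threads = threads;
    }

    public void run() {
        synchronized (lock) {
            while (count < LIMIT) {
                while (turn != index && count < LIMIT) {
                    try {
                        lock.wait();             // not our turn: block
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                if (count >= LIMIT) {
                    break;
                }
                System.out.println((char) ('A' + index) + ": " + (++count));
                turn = (turn + 1) % threads;     // pass the turn along
                lock.notifyAll();                // wake the others to re-check
            }
            lock.notifyAll();                    // let everyone exit
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            new Thread(new TurnTakingCounter(i, 3)).start();
        }
    }
}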
Instead, I got output where A would increment _count 100 times, then pass it off to B. Sometimes all three threads would take turns consistently, but sometimes one thread would dominate for several increments.
Right. In general with these spinning loops, you see bursts of output from a single thread as it gets time slices. This is also muddied by the fact that System.out.println(...) is synchronized, which affects the timing as well. If it were not doing a synchronized operation, you would see even more bursty output.
Why doesn't Thread.yield() always yield processing over to another thread?
I very rarely use Thread.yield(). It is a hint to the scheduler at best and is probably ignored on some architectures. The idea that it "pauses" the thread is very misleading. It may cause the thread to be put back at the end of the run queue, but there is no guarantee that there are any threads waiting, so it may keep running as if the yield were removed.
See my answer here for more info : unwanted output in multithreading
Let's read some javadoc, shall we?
A hint to the scheduler that the current thread is willing to yield
its current use of a processor. The scheduler is free to ignore this
hint.
[...]
It is rarely appropriate to use this method. It may be useful
for debugging or testing purposes, where it may help to reproduce bugs
due to race conditions. It may also be useful when designing
concurrency control constructs such as the ones in the
java.util.concurrent.locks package.
You cannot guarantee that another thread will obtain the processor after a yield(). It's up to the scheduler, and in your case it apparently chooses not to. You might consider sleep()ing instead, for testing.
