Imagine having a main thread which creates a HashSet and starts a number of worker threads, passing the HashSet to them, just like in the code below:
void main() {
    final Set<String> set = new HashSet<>();
    final ExecutorService threadExecutor = Executors.newFixedThreadPool(10);
    for (int i = 0; i < 10; i++) {               // one job per worker
        threadExecutor.submit(() -> doJob(set));
    }
}

void doJob(final Set<String> pSet) {
    // do some stuff
    final String x = ... // doesn't matter how we received the value.
    if (!pSet.contains(x)) {
        synchronized (pSet) {
            // double-check to prevent multiple adds from different threads
            if (!pSet.contains(x)) {
                // do some exclusive work with x.
                pSet.add(x);
            }
        }
    }
    // do some stuff
}
I'm wondering: is it thread-safe to synchronize only on the add method? Are there any possible issues if contains is not synchronized?
My intuition tells me this is fine: after leaving the synchronized block, changes made to the set should be visible to all threads. But the JMM can be counter-intuitive at times.
P.S. I don't think this is a duplicate of How to lock multiple resources in java multithreading.
Even though the answers to both could be similar, this question addresses a more particular case.
I'm wondering is it thread-safe to synchronize only on the add method? Are there any possible issues if contains is not synchronized as well?
Short answers: No and Yes.
There are two ways of explaining this:
The intuitive explanation
Java synchronization (in its various forms) guards against a number of things, including:
Two threads updating shared state at the same time.
One thread trying to read state while another is updating it.
Threads seeing stale values because memory caches have not been written to main memory.
In your example, synchronizing on add is sufficient to ensure that two threads cannot update the HashSet simultaneously, and that both calls will be operating on the most recent HashSet state.
However, if contains is not synchronized as well, a contains call could happen simultaneously with an add call. This could lead to the contains call seeing an intermediate state of the HashSet, leading to an incorrect result, or worse. This can also happen if the calls are not simultaneous, due to changes not being flushed to main memory immediately and/or the reading thread not reading from main memory.
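Concretely, one safe variant of the question's doJob is to take the same lock around every access, contains included. A minimal sketch (x is passed in as a parameter for brevity; in the original it's obtained elsewhere):
void doJob(final Set<String> pSet, final String x) {
    synchronized (pSet) {              // one lock guards both reads and writes
        if (!pSet.contains(x)) {
            // do some exclusive work with x.
            pSet.add(x);
        }
    }
}
With the check inside the block, the separate double-check is no longer needed, at the cost of holding the lock for the contains call as well.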
The Memory Model explanation
The JLS specifies the Java Memory Model, which sets out the conditions that must be fulfilled by a multi-threaded application to guarantee that one thread sees the memory updates made by another. The model is expressed in mathematical language and is not easy to understand, but the gist is that visibility is guaranteed if and only if there is a chain of happens-before relationships from the write to a subsequent read. If the write and read are in different threads, then synchronization between the threads is the primary source of these relationships. For example, in
// thread one
synchronized (sharedLock) {
    sharedVariable = 42;
}

// thread two
synchronized (sharedLock) {
    other = sharedVariable;
}
Assuming that the thread one code runs before the thread two code, there is a happens-before relationship between thread one releasing the lock and thread two acquiring it. Combining this with the "program order" relations, we can build a chain from the write of 42 to the assignment to other. This is sufficient to guarantee that other will be assigned 42 (or possibly a later value of the variable) and NOT any value that was in sharedVariable before 42 was written to it.
Without the synchronized block synchronizing on the same lock, the second thread could see a stale value of sharedVariable; i.e. some value written to it before 42 was assigned to it.
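To make the example concrete, here is a self-contained sketch of the two-thread case; the class name and the join() call are illustrative additions, not part of the original snippet:
public class HappensBefore {
    private static final Object sharedLock = new Object();
    private static int sharedVariable;

    public static void main(String[] args) throws InterruptedException {
        Thread one = new Thread(() -> {
            synchronized (sharedLock) {
                sharedVariable = 42;            // write under the lock
            }
        });
        Thread two = new Thread(() -> {
            synchronized (sharedLock) {
                int other = sharedVariable;     // read under the same lock
                System.out.println(other);      // prints 42, never a stale 0
            }
        });
        one.start();
        one.join();   // ensures thread one's block really runs first
        two.start();
    }
}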
That code is thread-safe with respect to the synchronized (pSet) { } part:
if (!pSet.contains(x)) {
    synchronized (pSet) {
        // Here you are sure to have the updated value of pSet
        if (!pSet.contains(x)) {
            // do some exclusive work with x.
            pSet.add(x);
        }
    }
}
because inside the synchronized statement on the pSet object:
one and only one thread may be in this block,
and inside it, pSet is guaranteed to have its up-to-date state, thanks to the happens-before relationship established by the synchronized keyword.
So whatever value the first if (!pSet.contains(x)) statement returned for a waiting thread, when that thread wakes up and enters the synchronized statement, it will see the latest state of pSet. So even if the same element was added by a previous thread, the second if (!pSet.contains(x)) will return false.
But this code is not thread-safe for the first if (!pSet.contains(x)) statement, which could execute while another thread is writing to the Set.
As a rule of thumb, a collection not designed to be thread-safe should not be used for concurrent reading and writing, because its internal state could be in an in-progress/inconsistent state when a read occurs in the middle of a write.
While some non-thread-safe collection implementations happen to tolerate such usage in practice, there is no guarantee at all that they always will.
So you should use a thread-safe Set implementation to make the whole thing thread-safe.
For example with :
Set<String> pSet = ConcurrentHashMap.newKeySet();
Under the hood this uses a ConcurrentHashMap, so reads take no lock and writes lock only the entry being modified, not the whole structure.
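With a concurrent set, the double-checked locking in the question can disappear entirely, because Set.add already returns atomically whether the element was new. A sketch of that idea (the class and field names are illustrative):
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class Worker {
    private final Set<String> set = ConcurrentHashMap.newKeySet();

    void doJob(final String x) {
        // add() returns true for exactly one thread per element, so the
        // exclusive work runs at most once per x, with no explicit lock.
        if (set.add(x)) {
            // do some exclusive work with x.
        }
    }
}
Note one behavioral difference: here the exclusive work runs after x has been published to the set, whereas the question's version runs it before the add.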
No.
You don't know what state the HashSet might be in while another thread is in the middle of an add. There might be fundamental changes in progress, such as splitting of buckets, so that contains may return false during another thread's add even though the element would be present in a single-threaded HashSet. In that case you would try to add the element a second time.
Even worse: contains might get into an endless loop, or throw an exception, because of a temporarily invalid state of the HashSet in the memory used by the two threads at the same time.
Related
I have a use case where multiple threads can be reading and modifying an ArrayList of booleans whose default values are true.
The only modification the threads can make is setting an element of that ArrayList from true to false.
All of the threads will also be reading the ArrayList concurrently, but it is okay to read stale values.
Note:
The size of the ArrayList will not change throughout the lifetime of the ArrayList.
Question:
Is it necessary to synchronize the ArrayList across these threads? The only synchronization I'm doing is marking the ArrayList as volatile, such that any update to it will be flushed back to main memory from a thread's local memory. Is this enough?
Here is sample code showing how this ArrayList gets used by the threads.
myList is the ArrayList in question and its values are initialized to true.
if (!myList.get(index)) {
    return;
} else {
    // do some operations to determine if we should
    // update the value of myList to false.
    if (needToUpdateList) {
        myList.set(index, false);
    }
}
Update
I previously said these threads do not care about stale values, which is true. However, I have another thread that only reads these values and performs one final action. This thread does care about staleness. Does the synchronization requirement change?
Is there a cheaper way to "publish" the updated values besides requiring synchronization on every update? I'm trying to minimize locking as much as possible.
As it says in the Javadoc of ArrayList:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.
You're not modifying the list structurally, so you don't need to synchronize for that reason.
The other reason you'd want to synchronize is to avoid reading stale values; but you say you don't care about that.
As such there is no reason to synchronize.
Edit for the update:
If you do care about reading stale values, you do need to synchronize.
An alternative to synchronization, which would avoid locking the entire list, would be to make it a List<AtomicBoolean>. This would not require synchronizing the list, because you aren't changing which objects are stored in the list; and reads and writes of an AtomicBoolean's value guarantee visibility.
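A sketch of that idea (the class, size parameter, and method are illustrative):
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

class Flags {
    private final List<AtomicBoolean> flags = new ArrayList<>();

    Flags(int size) {
        for (int i = 0; i < size; i++) {
            flags.add(new AtomicBoolean(true)); // list structure never changes afterwards
        }
    }

    void process(int index, boolean needToUpdateList) {
        if (!flags.get(index).get()) {
            return;                      // volatile-style read, never stale
        }
        // do some operations to determine if we should update the value...
        if (needToUpdateList) {
            flags.get(index).set(false); // visible to all subsequent readers
        }
    }
}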
It depends on what you want to do when an element is true. Consider your code, with a separate thread messing with the value you're looking at:
if (!myList.get(index)) { // <--- at this point, the value is true, so go to else branch
    return;
} else {
    // <--- at this point, the other thread sets the value to false.
    // do some operations to determine if we should
    // update the value of myList to false.
    // **** Do these operations assume the value is still true?
    // **** If so, then your application is not thread-safe.
    if (needToUpdateList) {
        myList.set(index, false);
    }
}
Update
I previously said these threads do not care about stale values, which is true. However, I have another thread that only reads these values and performs one final action. This thread does care about staleness. Does the synchronization requirement change?
You just invalidated a lot of perfectly good answers.
Yes, synchronization matters now. In fact, atomicity probably matters too. Use a synchronized List, or maybe even a concurrent list or map of some sort.
Anytime you read-modify-write the list, you probably also have to hold the synchronized lock to preserve atomicity:
synchronized (myList) {
    if (!myList.get(index)) {
        return;
    } else {
        // do some operations to determine if we should
        // update the value of myList to false.
        if (needToUpdateList) {
            myList.set(index, false);
        }
    }
}
Edit: to reduce the time the lock is held, a lot depends on how long "do some operations" takes, but a ConcurrentHashMap reduces lock contention at the cost of some additional overhead. It might be worth profiling your code to determine the actual overhead and which method is faster/better.
ConcurrentHashMap<Integer, Boolean> myList = new ConcurrentHashMap<>();
// ...
if (myList.get(index) != null) return;
// "do some operations" here
if (needToUpdate)
    myList.put(index, false);
But I'm still not convinced that this isn't premature optimization. Write correct code first; fully synchronizing the list is probably fine. Then profile the working code. If you do find a bottleneck, then you can worry about whether reducing lock contention is a good idea. But the bottleneck probably won't be in lock contention and will in fact be somewhere totally different.
I did some more googling and found that each thread might be keeping values in registers or a local cache. The spec only offers guarantees about when data must be written back to shared/global memory:
https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.5
Basically: volatile, synchronized, Thread.start(), Thread.join()...
So yeah, using AtomicBoolean will probably be the easiest, but you can also synchronize, or make a class with a volatile boolean in it (a sketch follows below).
Check this link out:
http://tutorials.jenkov.com/java-concurrency/volatile.html#variable-visibility-problems
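For completeness, a minimal sketch of the "class with a volatile boolean" option (the class name is made up):
class VolatileFlag {
    // volatile guarantees that a write by one thread is visible to
    // subsequent reads by any other thread.
    private volatile boolean value = true;

    boolean isSet() { return value; }
    void clear()    { value = false; }
}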
My teacher in an upper level Java class on threading said something that I wasn't sure of.
He stated that the following code would not necessarily update the ready variable. According to him, the two threads don't necessarily share the static variable, specifically in the case where each thread (the main thread versus the ReaderThread) is running on its own processor and therefore doesn't share the same registers/caches/etc., so one CPU won't update the other.
Essentially, he said it is possible that ready is updated in the main thread, but NOT in the ReaderThread, so that ReaderThread will loop infinitely.
He also claimed it was possible for the program to print 0 or 42. I understand how 42 could be printed, but not 0. He mentioned this would be the case when the number variable is set to the default value.
I thought perhaps it is not guaranteed that the static variable is updated between the threads, but this strikes me as very odd for Java. Does making ready volatile correct this problem?
He showed this code:
public class NoVisibility {
    private static boolean ready;
    private static int number;

    private static class ReaderThread extends Thread {
        public void run() {
            while (!ready) Thread.yield();
            System.out.println(number);
        }
    }

    public static void main(String[] args) {
        new ReaderThread().start();
        number = 42;
        ready = true;
    }
}
There isn't anything special about static variables when it comes to visibility. If they are accessible any thread can get at them, so you're more likely to see concurrency problems because they're more exposed.
There is a visibility issue imposed by the JVM's memory model. Here's an article talking about the memory model and how writes become visible to threads. You can't count on changes one thread makes becoming visible to other threads in a timely manner (actually the JVM has no obligation to make those changes visible to you at all, in any time frame), unless you establish a happens-before relationship.
Here's a quote from that link (supplied in the comment by Jed Wesley-Smith):
Chapter 17 of the Java Language Specification defines the happens-before relation on memory operations such as reads and writes of shared variables. The results of a write by one thread are guaranteed to be visible to a read by another thread only if the write operation happens-before the read operation. The synchronized and volatile constructs, as well as the Thread.start() and Thread.join() methods, can form happens-before relationships. In particular:
Each action in a thread happens-before every action in that thread that comes later in the program's order.
An unlock (synchronized block or method exit) of a monitor happens-before every subsequent lock (synchronized block or method entry) of that same monitor. And because the happens-before relation is transitive, all actions of a thread prior to unlocking happen-before all actions subsequent to any thread locking that monitor.
A write to a volatile field happens-before every subsequent read of that same field. Writes and reads of volatile fields have similar memory consistency effects as entering and exiting monitors, but do not entail mutual exclusion locking.
A call to start on a thread happens-before any action in the started thread.
All actions in a thread happen-before any other thread successfully returns from a join on that thread.
He was talking about visibility and not to be taken too literally.
Static variables are indeed shared between threads, but the changes made in one thread may not be visible to another thread immediately, making it seem like there are two copies of the variable.
This article presents a view that is consistent with how he presented the info:
http://jeremymanson.blogspot.com/2008/11/what-volatile-means-in-java.html
First, you have to understand a little something about the Java memory model. I've struggled a bit over the years to explain it briefly and well. As of today, the best way I can think of to describe it is if you imagine it this way:
Each thread in Java takes place in a separate memory space (this is clearly untrue, so bear with me on this one).
You need to use special mechanisms to guarantee that communication happens between these threads, as you would on a message passing system.
Memory writes that happen in one thread can "leak through" and be seen by another thread, but this is by no means guaranteed. Without explicit communication, you can't guarantee which writes get seen by other threads, or even the order in which they get seen.
...
But again, this is simply a mental model to think about threading and volatile, not literally how the JVM works.
Basically it's true, but actually the problem is more complex. Visibility of shared data can be affected not only by CPU caches, but also by out-of-order execution of instructions.
Therefore Java defines a Memory Model, which states under which circumstances threads can see a consistent state of shared data.
In your particular case, adding volatile guarantees visibility.
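Concretely, the fix to the question's example is a one-word change. The volatile write to ready happens-before any read that sees true, and by program order and transitivity so does the earlier write to number, so the program can then only print 42:
public class NoVisibility {
    private static volatile boolean ready;   // <-- the fix
    private static int number;

    private static class ReaderThread extends Thread {
        public void run() {
            while (!ready) Thread.yield();
            System.out.println(number);      // guaranteed to print 42
        }
    }

    public static void main(String[] args) {
        new ReaderThread().start();
        number = 42;    // must stay before the volatile write
        ready = true;
    }
}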
They are "shared" of course in the sense that they both refer to the same variable, but they don't necessarily see each other's updates. This is true for any variable, not just static.
And in theory, writes made by another thread can appear to be in a different order, unless the variables are declared volatile or the writes are explicitly synchronized.
Within a single classloader, static fields are always shared. To explicitly scope data to threads, you'd want to use a facility like ThreadLocal.
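A quick sketch of ThreadLocal, in case per-thread state is what you actually want (the class and names are illustrative):
class PerThreadCounter {
    // Each thread sees its own independent copy, initialized to 0.
    private static final ThreadLocal<Integer> counter = ThreadLocal.withInitial(() -> 0);

    static void increment() {
        counter.set(counter.get() + 1);   // affects only the calling thread's copy
    }

    static int current() {
        return counter.get();
    }
}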
When you declare a static variable of a primitive type, Java assigns it a default value.
public static int i;
When you define the variable like this, the default value is i = 0; that's why there is a possibility of getting 0.
Then the main thread updates the value of the boolean ready to true. Since ready is a static variable, the main thread and the other thread reference the same memory address, so the ready variable changes. The secondary thread then gets out of the while loop and prints the value.
The value printed can be the initialized value of number, which is 0: if the reader thread passes the while loop before the main thread updates the number variable, there is a possibility of printing 0.
@dontocsata
You can go back to your teacher and school him a little :)
A few notes from the real world, regardless of what you see or are told.
Please NOTE: the words below concern this particular case, in the exact order shown.
The following two variables will reside on the same cache line under virtually any known architecture.
private static boolean ready;
private static int number;
Thread exit (of the main thread) is guaranteed to occur, and exit is guaranteed to cause a memory fence, due to the thread-group thread removal (and many other issues). (It's a synchronized call, and I see no way it could be implemented without the sync part, since the ThreadGroup must terminate as well if no daemon threads are left, etc.)
The started ReaderThread is going to keep the process alive, since it is not a daemon thread!
Thus ready and number will be flushed together (or number before, if a context switch occurs), and there is no real reason for reordering in this case; at least I can't think of one.
You would need something truly weird to see anything but 42. Again, I presume both static variables are in the same cache line. I just can't imagine a cache line only 4 bytes long, or a JVM that would not allocate them in a contiguous area (cache line).
I am learning multi-thread programming from 'Java Concurrency in Practice'.
At one point, the book says that even an innocuous-looking increment operation is not thread-safe, as it consists of three different operations: read, modify, and write.
class A {
    private int c;

    public void increment() {
        ++c;
    }
}
So the increment statement is not atomic, and hence not thread-safe.
My question is: if an environment is truly concurrent (i.e., multiple threads can execute their statements at exactly the same time), then even a statement that really is atomic couldn't be thread-safe, since multiple threads could read the same value.
So how does having an atomic statement help in achieving thread safety in a concurrent environment?
True concurrency does not exist when it comes to modifying state.
This post has some good descriptions of Concurrency and Parallelism.
As stated by @RitchieHindle in that post:
Concurrency is when two tasks can start, run, and complete in overlapping time periods. It doesn't necessarily mean they'll ever both be running at the same instant. Eg. multitasking on a single-core machine.
As an example, the danger of non-atomic operations is that one thread might read the value, another might modify the value, and then the original thread might modify and write the value (thus negating the modification the second thread did).
Atomic operations do not allow other operations to access the state while the atomic operation is in progress. If, for example, the increment operator were atomic, it would read, modify, and write without any other thread having access to that variable's state while those operations took place.
You can use AtomicInteger. The linked Javadoc says (in part) that it is an int value that may be updated atomically. AtomicInteger also implements addAndGet(int), which atomically adds the given value to the current value.
private AtomicInteger ai = new AtomicInteger(1); // <-- or another initial value

public int increment() {
    return ai.addAndGet(1); // <-- or another increment value
}
That can (for example) allow you to guarantee write order consistency for multiple threads. Consider, ai might represent (or include) some static (or global) resource. If a value is thread local then you don't need to consider atomicity.
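A quick usage sketch under those assumptions: many threads incrementing concurrently still produce an exact total, because each increment is atomic (the thread count and iteration count are arbitrary):
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CounterDemo {
    public static void main(String[] args) throws InterruptedException {
        final AtomicInteger counter = new AtomicInteger(0);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 1000; i++) {
            pool.submit(counter::incrementAndGet); // atomic read-modify-write
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(counter.get()); // always 1000; a plain int could print less
    }
}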
I want to use a LinkedHashMap in a multi-threaded environment, where multiple threads can access the hashmap (read/write), but only one thread will do so at one time. Hence synchronization is not required. But, I need to ensure that the changes done by one thread are readable by any other thread which accesses it later.
For example:
LinkedHashMap<String, Object> map = new LinkedHashMap<>();

// Thread 1
map.put(key1, val1);

// Thread 2. It starts after thread 1 has finished.
Object val = map.get(key1);
assert(val == val1);
Edit:
Some people wanted the question to be explicitly stated. So here it is:
"I want to ensure that the changes done to a LinkedHashMap are visible to other threads i.e the changes are written to the main memory, and other threads read the map from main memory only. There is no concurrent access of the map."
The answer depends on how exactly you enforce "only one thread will access the map at one time". (The naive understanding of time is inapplicable here; synchronization is what gives meaning to time in programs.) If you are piggybacking on external synchronization somewhere else, then it will probably work, albeit in a fragile way.
"I want to ensure that the changes done to a LinkedHashMap are visible
to other threads i.e the changes are written to the main memory, and
other threads read the map from main memory only."
Then you have to use synchronization. Nothing else will get you the semantics you are asking for. Guarding all accesses with synchronized on the same object is a routine way, but that penalizes the reads. It might be more performant to guard the LinkedHashMap accesses with ReadWriteLock, if reads greatly outnumber the writes.
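A sketch of the ReadWriteLock approach (the class and method names are illustrative): readers proceed in parallel, writers get exclusive access, and the lock hand-offs provide the happens-before edges:
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedMap {
    private final Map<String, Object> map = new LinkedHashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    void put(String key, Object value) {
        lock.writeLock().lock();       // exclusive: blocks readers and writers
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    Object get(String key) {
        lock.readLock().lock();        // shared: many readers may hold this at once
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }
}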
Try ConcurrentHashMap. As with other concurrent collections, actions in a thread prior to placing an object into the map happen-before actions subsequent to the access or removal of that object from the map in another thread, so updates are visible to later readers.
Example:
ConcurrentMap<String, String> concurrentMap = new ConcurrentHashMap<String, String>();
concurrentMap.put("key", "value");
String value = concurrentMap.get("key");
For the sake of inter-thread visibility, you still need to use some facility which ensures a happens-before relationship (JLS7 Section 17.4.5) between a write to the map and subsequent reads from other threads. The simplest way is the good old synchronized block:
final Object lock = new Object();
...
synchronized (lock) {
    // write to map
}
...
synchronized (lock) {
    // read from map
}
Or, maybe, at least one of the following conditions applies to your code, so you won't need synchronized:
A call to start() on a thread happens-before any actions in the started thread.
All actions in a thread happen-before any other thread successfully returns from a join() on that thread.
E.g., your writer starts the reader after a write, or the reader calls writer.join() before doing its read.
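A minimal sketch of the join() case, with no synchronized at all (the names are illustrative):
import java.util.LinkedHashMap;
import java.util.Map;

class JoinVisibility {
    public static void main(String[] args) throws InterruptedException {
        final Map<String, String> map = new LinkedHashMap<>();

        Thread writer = new Thread(() -> map.put("key1", "val1"));
        writer.start();
        writer.join();  // everything writer did happens-before this returns

        System.out.println(map.get("key1")); // guaranteed to print "val1"
    }
}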
Can synchronized statements be reordered? I.e., can:
synchronized (A) {
    synchronized (B) {
        ......
    }
}

become:

synchronized (B) {
    synchronized (A) {
        ......
    }
}
Can the synchronization statements be reordered?
I assume you are asking if the compiler can reorder the synchronized blocks so the lock order happens in a different order than the code.
The answer is no. A synchronized block (and a volatile field access) impose ordering restrictions on the compiler. In your case, you cannot move a monitor-enter before another monitor-enter nor a monitor-exit after another monitor-exit. See the grid below.
To quote from JSR 133 (Java Memory Model) FAQ:
It is not possible, for example, for the compiler to move your code before an acquire or after a release. When we say that acquires and releases act on caches, we are using shorthand for a number of possible effects.
Doug Lea's JSR-133 Cookbook has a grid which shows the reordering possibilities. A blank entry in the grid means that reordering is allowed. In your case, entering a synchronized block is a "MonitorEnter" (same reordering limitations as loading of a volatile field) and exiting a synchronized block is a "MonitorExit" (same as storing to a volatile field).
Yes and no.
The order must be consistent.
Suppose you are creating a transaction between two bank accounts, and always grab the sender's lock first, then grab the receiver's lock. Problem is - say both Dan and Bob want to transfer money to each other at the same time.
Thread 1 might grab Dan's lock, as it processes Dan's transaction to Bob.
Then thread 2 grabs Bob's lock, as it processes Bob's transaction to Dan.
Then, bam, deadlock.
The morals are:
Lock less.
Read Java Concurrency in Practice. My example is taken from there. I like arguing about the merits of programming books as much as the next guy, but it's pretty rare to get comprehensive coverage of a difficult topic between two covers, so enjoy it.
So this is the part of the answer where I guess at other things you might have been trying to ask instead, because the expectation is firmly on me that I act psychic.
The JVM will not acquire the locks in an order different from which you have programmed. How do I know this? Because otherwise it would not be possible to solve the problem in the first half of my answer.
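One common way to enforce a consistent order is to pick a global ordering criterion for the locks, for example an account id. A sketch (the Account class and its id field are illustrative assumptions, not from the book):
class Account {
    final long id;     // assumed unique per account
    long balance;

    Account(long id, long balance) {
        this.id = id;
        this.balance = balance;
    }
}

class Bank {
    static void transfer(Account from, Account to, long amount) {
        // Always lock the account with the smaller id first, so two
        // opposing transfers can never each hold one lock and wait forever.
        Account first  = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance   += amount;
            }
        }
    }
}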
Synchronized statements are never reordered by the compiler, as doing so would have a big effect on what ends up happening.
Synchronized blocks are used to obtain a lock on the specific Object named between the synchronized statement's parentheses.
private final Object LOCK_1 = new Object();

public void foo() {
    synchronized (LOCK_1) {
        // code here...
    }
}
This obtains the lock for the Object LOCK_1 and releases it when the synchronized block completes. Since synchronized blocks are used to guard against concurrent access, it may sometimes be necessary to use multiple locks, especially when multiple thread-unsafe objects are being written to and read from.
Consider the following code that uses a nested synchronization block:
private final Object LOCK_1 = new Object();
private final Object LOCK_2 = new Object();

public void bar() {
    synchronized (LOCK_1) {
        // Point A
        synchronized (LOCK_2) {
            // Point B
        }
        // Point C
    }
    // Point D
}
If we look at points A, B, C, and D, we can see why the order of synchronization matters.
First, at point A, the lock for LOCK_1 is obtained; any other thread trying to obtain LOCK_1 is therefore put into a queue.
At point B, the currently executing thread owns the locks for both LOCK_1 and LOCK_2.
At point C, the currently executing thread has released the lock for LOCK_2.
At point D, the currently executing thread has released all locks.
If we flip this example around and decide to put LOCK_2 on the outer block, you will realize that the order in which a thread obtains locks changes, which has a big effect on what it ends up doing. Normally, when I write programs with synchronized blocks, I use one mutex object per thread-unsafe resource I am accessing (or one mutex per group). Say I want to read from a stream using LOCK_1 and write to a stream using LOCK_2. It would be illogical to think that swapping the locking order around means the same thing.
Consider that LOCK_2 (the writing lock) is being held by another thread. If we have LOCK_1 on the outer block, the currently executing thread can at least process all the reading code before being queued for the writing lock (essentially, the ability to execute the code at point A). If we flipped the order of the locks around, the currently executing thread would end up having to wait for the writing to complete, and would then proceed to read and write while still holding the writing lock (all the way through the reading too).
Another problem comes up when the order of the locks is switched inconsistently (some code takes LOCK_1 first and other code takes LOCK_2 first). Consider two threads that both eagerly try to execute code with different locking orders. Thread 1 obtains LOCK_1 in its outer block, and thread 2 obtains LOCK_2 in its outer block. Now when thread 1 tries to obtain LOCK_2, it can't, since thread 2 has it. And when thread 2 tries to obtain LOCK_1, it can't either, because thread 1 has it. The two threads block on each other forever, forming a deadlock.
To answer your question: if you want to lock on two objects immediately, without doing any processing between the locks, then the order is irrelevant (essentially no processing at point A or C). HOWEVER, it is essential to keep the order consistent throughout your program so as to avoid deadlock, as the sketch below shows.
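Here is a self-contained sketch of that inconsistent-order failure; the program usually hangs forever (the sleep just widens the race window, and the names are illustrative):
class DeadlockDemo {
    private static final Object LOCK_1 = new Object();
    private static final Object LOCK_2 = new Object();

    public static void main(String[] args) {
        new Thread(() -> {
            synchronized (LOCK_1) {            // thread 1: LOCK_1 first
                pause();
                synchronized (LOCK_2) { System.out.println("t1 done"); }
            }
        }).start();
        new Thread(() -> {
            synchronized (LOCK_2) {            // thread 2: LOCK_2 first -- the bug
                pause();
                synchronized (LOCK_1) { System.out.println("t2 done"); }
            }
        }).start();
        // Each thread holds one lock and waits for the other: deadlock.
    }

    private static void pause() {
        try { Thread.sleep(100); } catch (InterruptedException ignored) {}
    }
}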