From the Source Code of ConcurrentHashMap
/**
 * Number of unsynchronized retries in size and containsValue
 * methods before resorting to locking. This is used to avoid
 * unbounded retries if tables undergo continuous modification
 * which would make it impossible to obtain an accurate result.
 */
static final int RETRIES_BEFORE_LOCK = 2;
1. I have read that iteration doesn't hold a lock, so what does the above statement mean? Do operations like get also hold a lock? Please provide a scenario as well.
2. Will an update operation running in thread 1 be visible to an iteration in thread 2 if the iteration has not yet reached that element? (volatility and visibility?)
3. Are there any other situations besides updates where locking is done?
4. "When getting data, a volatile read is used. If the volatile read results in a miss, then the lock for the segment is obtained for a last attempt at a successful read." What does this mean? What is a volatile read?
I have read that iteration doesn't hold a lock, so what does the above
statement mean?
One can argue that the size method should be able to go ahead and never hold a lock. But this implementation will compute the size of the ConcurrentHashMap twice without locking; if the first result does not equal the second, it will retry. If the results still disagree after RETRIES_BEFORE_LOCK attempts, it will lock all segments and compute the size one last time.
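In pseudo-Java, that strategy looks roughly like this. This is a simplified sketch, not the actual source: segmentCounts and segmentLocks are hypothetical stand-ins for the real Segment fields, and the real code also compares per-segment modCounts between passes.

import java.util.concurrent.locks.ReentrantLock;

class SizeSketch {
    static final int RETRIES_BEFORE_LOCK = 2;
    final int[] segmentCounts = new int[16];             // stand-in for per-segment counts
    final ReentrantLock[] segmentLocks = new ReentrantLock[16];

    SizeSketch() {
        for (int i = 0; i < segmentLocks.length; i++)
            segmentLocks[i] = new ReentrantLock();
    }

    int size() {
        long previous = -1;
        for (int retries = 0; retries <= RETRIES_BEFORE_LOCK; retries++) {
            long current = sum();                        // unsynchronized pass
            if (current == previous)
                return (int) current;                    // two consecutive passes agreed
            previous = current;
        }
        for (ReentrantLock l : segmentLocks) l.lock();   // give up and resort to locking
        try {
            return (int) sum();                          // stable while all segments are locked
        } finally {
            for (ReentrantLock l : segmentLocks) l.unlock();
        }
    }

    private long sum() {
        long total = 0;
        for (int c : segmentCounts) total += c;
        return total;
    }
}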
Do operations like get also hold a lock? Please provide a scenario as
well.
Technically yes, it can, but it probably never will. In the event the JVM publishes one of the CHM's entries before the entry's value field is visible, the CHM will redo the read under the segment lock (again, this probably will never happen).
Java 8 is releasing a new implementation of CHM, so this is probably going to be outdated soon.
Will the update operation running in thread 1 be visible to the
iteration in thread 2 if the iteration has not yet reached that
element? (volatility?)
If thread 1 issues the put prior to thread 2 issuing the get, then thread 2 will see the updated entry.
If thread 2 is doing a get at the same time thread 1 is doing the put, then thread 2 may or may not see the entry; it's a matter of timing. This is because get is non-blocking. It is still thread safe, because the CHM's contract is to return the entry as it is at the moment you are searching the map.
Are there any other situations besides updates where locking is done?
Besides all modification methods, serialization requires a lock.
When getting data, a volatile read is used. If the volatile read
results in a miss, then the lock for the segment is obtained for a
last attempt at a successful read. What does this mean? What is a
volatile read?
I have already alluded to this with my "technically" answer. What you are referring to is the readValueUnderLock method. According to the JMM, the write to a non-final field of an object, whether by inline initialization or within the constructor, can be published after the object itself is available to another thread.
So

public class Entry {
    volatile Object value;

    public Entry(Object v) {
        value = v;
    }
}
Thread 1:

Entry e = new Entry(new Object());

Thread 2:

if (e != null) {
    Object value = e.value; // here, according to the JMM, value can be null.
                            // If value were final it would never be null.
}
Reading the value under the lock synchronizes the read with the previous write from the other thread, so a null value will never be observed.
That all being said, this scenario is either highly unlikely or, on the x86 architecture, impossible.
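For reference, this is roughly how the pre-Java-8 Segment (which extends ReentrantLock, hence the bare lock()/unlock() calls) performs that last-chance read; a sketch based on the JDK 6 source:

V readValueUnderLock(HashEntry<K,V> e) {
    lock();              // take the segment lock, synchronizing with the writer
    try {
        return e.value;  // non-null once the write has been synchronized with
    } finally {
        unlock();
    }
}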
Related
Imagine having a main thread that creates a HashSet and starts a lot of worker threads, passing the HashSet to them.
Just like in the code below:
void main() {
    final Set<String> set = new HashSet<>();
    final ExecutorService threadExecutor = Executors.newFixedThreadPool(10);
    threadExecutor.submit(() -> doJob(set));
}
void doJob(final Set<String> pSet) {
// do some stuff
final String x = ... // doesn't matter how we received the value.
if (!pSet.contains(x)) {
synchronized (pSet) {
// double check to prevent multiple adds within different threads
if (!pSet.contains(x)) {
// do some exclusive work with x.
pSet.add(x);
}
}
}
// do some stuff
}
I'm wondering: is it thread-safe to synchronize only the add call? Are there any possible issues if contains is not synchronized?
My intuition tells me this is fine; after leaving the synchronized block, changes made to the set should be visible to all threads. But the JMM can be counter-intuitive sometimes.
P.S. I don't think it's a duplicate of How to lock multiple resources in java multithreading
Even though answers to both could be similar, this question addresses more particular case.
I'm wondering: is it thread-safe to synchronize only on the add method? Are there any possible issues if contains is not synchronized as well?
Short answers: No and Yes.
There are two ways of explaining this:
The intuitive explanation
Java synchronization (in its various forms) guards against a number of things, including:
Two threads updating shared state at the same time.
One thread trying to read state while another is updating it.
Threads seeing stale values because memory caches have not been written to main memory.
In your example, synchronizing on add is sufficient to ensure that two threads cannot update the HashSet simultaneously, and that both calls will be operating on the most recent HashSet state.
However, if contains is not synchronized as well, a contains call could happen simultaneously with an add call. This could lead to the contains call seeing an intermediate state of the HashSet, leading to an incorrect result, or worse. This can also happen if the calls are not simultaneous, due to changes not being flushed to main memory immediately and/or the reading thread not reading from main memory.
The Memory Model explanation
The JLS specifies the Java Memory Model, which sets out the conditions that must be fulfilled by a multi-threaded application to guarantee that one thread sees the memory updates made by another. The model is expressed in mathematical language and is not easy to understand, but the gist is that visibility is guaranteed if and only if there is a chain of happens-before relationships from the write to a subsequent read. If the write and read are in different threads, then synchronization between the threads is the primary source of these relationships. For example, in
// thread one
synchronized (sharedLock) {
sharedVariable = 42;
}
// thread two
synchronized (sharedLock) {
other = sharedVariable;
}
Assuming that the thread one code runs before the thread two code, there is a happens-before relationship between thread one releasing the lock and thread two acquiring it. With this and the "program order" relations, we can build a chain from the write of 42 to the assignment to other. This is sufficient to guarantee that other will be assigned 42 (or possibly a later value of the variable) and NOT any value that was in sharedVariable before 42 was written to it.
Without the synchronized block synchronizing on the same lock, the second thread could see a stale value of sharedVariable; i.e. some value written to it before 42 was assigned to it.
That code is thread safe for the synchronized (pSet) { ... } part:
if (!pSet.contains(x)) {
    synchronized (pSet) {
        // Here you are sure to have the updated value of pSet
        if (!pSet.contains(x)) {
            // do some exclusive work with x.
            pSet.add(x);
        }
    }
}
because inside the synchronized statement on the pSet object:
one and only one thread may be in this block,
and inside it, pSet's up-to-date state is guaranteed by the happens-before relationship that the synchronized keyword establishes.
So whatever value the first if (!pSet.contains(x)) statement returns for a waiting thread, when that thread wakes up and enters the synchronized statement, it will see the last updated state of pSet. So even if the same element was added by a previous thread, the second if (!pSet.contains(x)) will evaluate to false.
But this code is not thread safe for the first if (!pSet.contains(x)) statement, which could be executed during a write to the Set.
As a rule of thumb, a collection not designed to be thread safe should not be used for concurrent reading and writing, because its internal state could be in an in-progress/inconsistent state when a reading operation occurs during a writing operation.
While some non-thread-safe collection implementations tolerate such usage in practice, there is no guarantee at all that they always will.
So you should use a thread safe Set implementation to make the whole thing thread safe.
For example with :
Set<String> pSet = ConcurrentHashMap.newKeySet();
This uses a ConcurrentHashMap under the hood, so there is no lock for reading and a minimal lock for writing (only around the bin being modified, not the whole structure).
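With such a set, the double-checked locking in doJob collapses to a single atomic call; a minimal sketch (the Job wrapper class is hypothetical):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class Job {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    void doJob(String x) {
        // add() is atomic and returns true only for the one thread that
        // actually inserted x, so no double-checked locking is needed
        if (seen.add(x)) {
            // do some exclusive work with x.
        }
    }
}

Note that this inverts the original order: x is claimed before the exclusive work runs, not after. If other threads must not see x in the set until the work is done, keep the synchronized block instead.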
No.
You don't know what state the HashSet might be in during an add by another thread. There might be fundamental changes in progress, such as the splitting of buckets, so contains may return false while another thread is adding, even though the element would be present in a single-threaded HashSet. In that case you would try to add the element a second time.
Even worse scenario: contains might get into an endless loop or throw an exception because of a temporarily invalid state of the HashSet in the memory shared by the two threads.
I have a use case where multiple threads can be reading and modifying an ArrayList of Booleans whose elements are initialized to true.
The only modification the threads can make is setting an element of that ArrayList from true to false.
All of the threads will also be reading the ArrayList concurrently, but it is okay to read stale values.
Note:
The size of the ArrayList will not change throughout the lifetime of the ArrayList.
Question:
Is it necessary to synchronize the ArrayList across these threads? The only synchronization I'm doing is marking the ArrayList as volatile such that any update to it will be flushed back to the main memory from a thread's local memory. Is this enough?
Here is a sample code on how this ArrayList gets used by threads
myList is the ArrayList in question and its values are initialized to true
if (!myList.get(index)) {
    return;
} else {
    // do some operations to determine if we should
    // update the value of myList to false.
    if (needToUpdateList) {
        myList.set(index, false);
    }
}
Update
I previously said these threads do not care about stale values, which is true. However, I have another thread that only reads these values and performs one final action. This thread does care about staleness. Does the synchronization requirement change?
Is there a cheaper way to "publish" the updated values besides requiring synchronization on every update? I'm trying to minimize locking as much as possible.
As it says in the Javadoc of ArrayList:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.
You're not modifying the list structurally, so you don't need to synchronize for that reason.
The other reason you'd want to synchronize is to avoid reading stale values; but you say that don't care about that.
As such there is no reason to synchronize.
Edit, for the update:
If you do care about reading stale values, you do need to synchronize.
An alternative to synchronization which would avoid locking the entire list would be to make it a List<AtomicBoolean>: this would not require synchronization of the list, because you aren't changing the references stored in the list, and reads of an AtomicBoolean's value guarantee visibility.
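A minimal sketch of that approach, assuming the size is fixed at construction (the Flags wrapper class is hypothetical):

import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class Flags {
    // fixed-size list of flags, all initially true; the list itself is
    // never structurally modified, only the AtomicBooleans inside it
    private final List<AtomicBoolean> flags;

    Flags(int size) {
        this.flags = Stream.generate(() -> new AtomicBoolean(true))
                           .limit(size)
                           .collect(Collectors.toList());
    }

    void maybeClear(int index, boolean needToUpdate) {
        if (!flags.get(index).get()) {
            return;                       // already false
        }
        if (needToUpdate) {
            flags.get(index).set(false);  // volatile write: visible to the reader thread
        }
    }
}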
It depends on what you want to do when an element is true. Consider your code, with a separate thread messing with the value you're looking at:
if (!myList.get(index)) { // <--- at this point, the value is true, so go to else branch
    return;
} else {
    // <--- at this point, the other thread sets the value to false.
    // do some operations to determine if we should
    // update the value of myList to false.
    // **** Do these operations assume the value is still true?
    // **** If so, then your application is not thread-safe.
    if (needToUpdateList) {
        myList.set(index, false);
    }
}
Update
I previously said these threads do not care about stale values, which is true. However, I have another thread that only reads these values and performs one final action. This thread does care about staleness. Does the synchronization requirement change?
You just invalidated a lot of perfectly good answers.
Yes, synchronization matters now. In fact, atomicity probably matters too. Use a synchronized List or maybe even a concurrent list or map of some sort.
Any time you read-modify-write the list, you probably also have to hold the synchronized lock to preserve atomicity:
synchronized (myList) {
    if (!myList.get(index)) {
        return;
    } else {
        // do some operations to determine if we should
        // update the value of myList to false.
        if (needToUpdateList) {
            myList.set(index, false);
        }
    }
}
Edit: to reduce the time the lock is held, a lot depends on how long "do some operations" takes, but a ConcurrentHashMap reduces lock contention at the cost of some additional overhead. It might be worth profiling your code to determine the actual overhead and which method is faster/better.
ConcurrentHashMap<Integer, Boolean> myList = new ConcurrentHashMap<>();
// ...
if (myList.get(index) != null) return; // an entry exists only once the flag has been cleared
// "do some operations" here
if (needToUpdate)
    myList.put(index, false);
But I'm still not convinced that this isn't premature optimization. Write correct code first; fully synchronizing the list is probably fine. Then profile the working code. If you do find a bottleneck, then you can worry about whether reducing lock contention is a good idea. But the bottleneck probably won't be in the lock contention and will in fact be somewhere totally different.
I did some more googling and found that each thread might be storing values in registers or a local cache. The spec only offers guarantees about when data will be written to shared/global memory:
https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.5
basically volatile, synchronized, thread.start(), thread.join()...
So yeah, using AtomicBoolean will probably be the easiest, but you can also synchronize or make a class with a volatile boolean in it, as sketched below.
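A minimal sketch of that last option (the Flag class is hypothetical):

class Flag {
    // volatile guarantees that a write in one thread is visible to reads in others
    private volatile boolean value = true;

    boolean isSet() {
        return value;
    }

    void clear() {
        value = false;  // a plain write to a boolean is atomic; volatile adds visibility
    }
}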
check this link out:
http://tutorials.jenkov.com/java-concurrency/volatile.html#variable-visibility-problems
I read the following somewhere:

The Java volatile keyword doesn't mean atomic; it's a common
misconception that after declaring volatile, the ++ operation will be
atomic. To make the operation atomic you still need to ensure
exclusive access using a synchronized method or block in Java.
So what will happen if two threads update a volatile primitive variable at the same time?
Does this mean that whichever thread takes a lock on it will set its value first? And if in the meantime some other thread comes along and reads the old value while the first thread is changing its value, won't the new thread read the old value?
What is the difference between the Atomic classes and the volatile keyword?
The effect of the volatile keyword is approximately that each individual read or write operation on that variable is made atomically visible to all threads.
Notably, however, an operation that requires more than one read/write -- such as i++, which is equivalent to i = i + 1, which does one read and one write -- is not atomic, since another thread may write to i between the read and the write.
The Atomic classes, like AtomicInteger and AtomicReference, provide a wider variety of operations atomically, specifically including increment for AtomicInteger.
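To make the difference concrete, here is a minimal sketch contrasting the two (the Counters class is hypothetical):

import java.util.concurrent.atomic.AtomicInteger;

class Counters {
    private volatile int volatileCounter = 0;
    private final AtomicInteger atomicCounter = new AtomicInteger();

    void unsafeIncrement() {
        volatileCounter++;               // one read plus one write: another thread
                                         // can write in between, losing an update
    }

    void safeIncrement() {
        atomicCounter.incrementAndGet(); // a single atomic read-modify-write
    }
}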
Volatile and atomic are two different concepts. Volatile ensures that a certain, expected (memory) state is true across different threads, while atomics ensure that operations on variables are performed atomically.
Take the following example of two threads in Java:
Thread A:
value = 1;
done = true;
Thread B:
if (done)
System.out.println(value);
Starting with value = 0 and done = false, the rules of threading tell us that it is undefined whether or not Thread B will print value. Furthermore, value is undefined at that point as well! To explain this you need to know a bit about Java memory management (which can be complex). In short: threads may create local copies of variables, and the JVM can reorder code to optimize it, so there is no guarantee that the above code runs in exactly that order. Setting done to true and then setting value to 1 would be a possible outcome of the JIT's optimizations.
volatile only ensures that at the moment such a variable is accessed, the new value is immediately visible to all other threads, and that the order of execution ensures the code is in the state you would expect. So in the case of the code above, defining done as volatile will ensure that whenever Thread B checks the variable, it is either false or true, and if it is true, then value has been set to 1 as well.
As a side effect of volatile, reads and writes of such a variable are atomic thread-wide (at a very minor cost in execution speed). This is, however, only important on 32-bit systems that, for example, use long (64-bit) variables (or similar); in most other cases setting/reading a variable is atomic anyway. But there is an important difference between an atomic access and an atomic operation: volatile only ensures that the access is atomic, while atomics ensure that the operation is atomic.
Take the following example:
i = i + 1;
No matter how you define i, a different thread reading the value just as the above line is executed might get i or i + 1, because the operation is not atomic. If the other thread sets i to a different value, in the worst case i could be set back by thread A to whatever it was before, because thread A was just in the middle of calculating i + 1 based on the old value, and then sets i again to that old value + 1. Explanation:
Assume i = 0
Thread A reads i, calculates i+1, which is 1
Thread B sets i to 1000 and returns
Thread A now sets i to the result of the operation, which is i = 1
Atomics like AtomicInteger ensure that such operations happen atomically. So the above issue cannot happen; i would be either 1000 or 1001 once both threads are finished.
There are two important concepts in multithreading environment:
atomicity
visibility
The volatile keyword eradicates visibility problems, but it does not deal with atomicity. volatile prevents the compiler from reordering instructions around a write and a subsequent read of a volatile variable, but that does not make a compound action like k++ atomic.
Here, k++ is not a single machine instruction, but three:
copy the value to a register;
increment the value;
place it back.
So, even if you declare a variable as volatile, this will not make the operation atomic; it means another thread can see an intermediate result, which is a stale or unwanted value from the other thread's point of view.
On the other hand, AtomicInteger and AtomicReference are based on the compare-and-swap (CAS) instruction. CAS has three operands: a memory location V on which to operate, the expected old value A, and the new value B. CAS atomically updates V to the new value B, but only if the value in V matches the expected old value A; otherwise, it does nothing. In either case, it returns the value currently in V. The compareAndSet() methods of AtomicInteger and AtomicReference take advantage of this functionality if it is supported by the underlying processor; if it is not, the JVM implements it via a spin lock.
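The retry loop that these classes build on top of CAS looks roughly like this (a sketch of the pattern; JDK versions differ in how incrementAndGet() is actually implemented):

import java.util.concurrent.atomic.AtomicInteger;

class CasIncrement {
    // the classic CAS retry loop that incrementAndGet() is built on
    static int increment(AtomicInteger counter) {
        int current;
        do {
            current = counter.get();                             // read the current value
        } while (!counter.compareAndSet(current, current + 1));  // retry if another thread raced us
        return current + 1;
    }
}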
As Trying has indicated, volatile deals only with visibility.
Consider this snippet in a concurrent environment:
boolean isStopped = false;
// ...

while (!isStopped) {
    // do some kind of work
}
The idea here is that some thread could change the value of isStopped from false to true in order to indicate to the subsequent loop that it is time to stop looping.
Intuitively, there is no problem: logically, if another thread sets isStopped to true, then the loop must terminate. The reality is that the loop will likely never terminate, even if another thread does set isStopped to true.
The reason for this is not intuitive, but consider that modern processors have multiple cores, and that each core has multiple registers and multiple levels of cache memory that are not accessible to other processors. In other words, values that are cached in one processor's local memory are not visible to threads executing on a different processor. Herein lies one of the central problems with concurrency: visibility.
The Java Memory Model makes no guarantees whatsoever about when changes that are made to a variable in one thread may become visible to other threads. In order to guarantee that updates are visible as soon as they are made, you must synchronize.
The volatile keyword is a weak form of synchronization. While it does nothing for mutual exclusion or atomicity, it does guarantee that changes made to a variable in one thread become visible to other threads as soon as they are made. Because individual reads and writes of variables other than long and double are atomic in Java, declaring variables volatile provides an easy mechanism for providing visibility in situations where there are no other atomicity or mutual exclusion requirements.
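Declaring the flag volatile is all it takes to fix the loop above; a minimal sketch (the Worker class is hypothetical):

class Worker implements Runnable {
    // volatile guarantees the loop sees the write from the stopping thread
    private volatile boolean isStopped = false;

    public void stop() {
        isStopped = true;   // becomes visible to run() promptly
    }

    @Override
    public void run() {
        while (!isStopped) {
            // do some kind of work
        }
    }
}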
The volatile keyword is used:
to make non-atomic 64-bit accesses (long and double) atomic (all other primitive accesses are already guaranteed to be atomic!)
to guarantee that variable updates are seen by other threads, plus a visibility effect: after writing to a volatile variable, all the variables that were visible before writing that variable become visible to another thread after it reads the same volatile variable (happens-before ordering).
The java.util.concurrent.atomic.* classes are, according to the java docs:
A small toolkit of classes that support lock-free thread-safe
programming on single variables. In essence, the classes in this
package extend the notion of volatile values, fields, and array
elements to those that also provide an atomic conditional update
operation of the form:
boolean compareAndSet(expectedValue, updateValue);
The atomic classes are built around the atomic compareAndSet(...) function, which maps to an atomic CPU instruction. The atomic classes introduce happens-before ordering just as volatile variables do (with one exception: weakCompareAndSet(...)).
From the java docs:
When a thread sees an update to an atomic variable caused by a
weakCompareAndSet, it does not necessarily see updates to any other
variables that occurred before the weakCompareAndSet.
To your question:
Does this mean that whichever thread takes a lock on it will set its
value first? And if in the meantime some other thread comes along and
reads the old value while the first thread is changing its value,
won't the new thread read the old value?
You don't lock anything; what you are describing is a typical race condition that will happen eventually if threads access shared data without proper synchronization. As already mentioned, declaring a variable volatile in this case will only ensure that other threads see changes to the variable (the value will not be cached in a register or some cache that is seen by only one thread).
What is the difference between AtomicInteger and volatile int?
AtomicInteger provides atomic operations on an int with proper synchronization (e.g. incrementAndGet(), getAndAdd(...), ...); volatile int will just ensure the visibility of the int to other threads.
So what will happen if two threads update a volatile primitive variable at the same time?
Usually each one can increment the value. Sometimes, however, both will update the value at the same time, and instead of the total being incremented by 2, both threads increment it by 1 and only 1 is added.
Does this mean that whichever thread takes a lock on it will set its value first?
There is no lock. That is what synchronized is for.
And if in the meantime some other thread comes along and reads the old value while the first thread is changing its value, won't the new thread read the old value?
Yes.
What is the difference between the Atomic classes and the volatile keyword?
AtomicXxxx wraps a volatile, so they are basically the same; the difference is that the atomic class provides higher-level operations such as compare-and-swap, which is used to implement increment.
AtomicXxxx also supports lazySet. This is like a volatile set, but it doesn't stall the pipeline waiting for the write to complete. It can mean that if you read a value you just wrote you might see the old value, but you shouldn't be doing that anyway. The difference is that setting a volatile takes about 5 ns, but lazySet takes about 0.5 ns.
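A minimal sketch of the two kinds of write (the LazySetDemo class is hypothetical; the timing figures above are the answerer's):

import java.util.concurrent.atomic.AtomicInteger;

class LazySetDemo {
    private final AtomicInteger progress = new AtomicInteger();

    void update() {
        progress.set(10);      // ordinary volatile write: fully ordered store
        progress.lazySet(20);  // cheaper store; a racing reader may briefly still see 10
    }
}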
This is a code snippet from Collections' SynchronizedMap. My question is not specific to the code snippet below, but a generic one: why does a get operation need synchronization?
public V get(Object key) {
synchronized (mutex) {return m.get(key);}
}
If your threads are only ever getting from the Map, the synchronization is not needed. In this case it might be a good idea to express this fact by using an immutable map, like the one from the Guava libraries; this also protects you at compile time from accidentally modifying the map.
The trouble begins when multiple threads are reading and modifying the map, because the internal structure of, for example, the HashMap implementation from the Java standard library is not prepared for that. In this case you can either wrap an external synchronization layer around that map by:
using the synchronized keyword,
slightly safer, using a SynchronizedMap, because then you can't forget the synchronized keyword everywhere it's needed,
protecting the map using a ReadWriteLock, which allows multiple concurrently reading threads (which is fine; see the sketch after this list),
or switch to a ConcurrentHashMap altogether, which is prepared for being accessed by multiple threads.
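The ReadWriteLock option might look like this (a minimal sketch; the RwMap wrapper is hypothetical):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RwMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public V get(K key) {
        lock.readLock().lock();   // many readers may hold the read lock at once
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public V put(K key, V value) {
        lock.writeLock().lock();  // a writer gets exclusive access
        try {
            return map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}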
But coming back to your original question, why is the synchronization needed in the first place: this is a bit hard to tell without looking at the code of the class. Possibly it would break when a put or remove from one thread causes the bucket count to change, which would cause a reading thread to see too many or too few elements because the resize is not finished yet. Maybe it's something completely different; I don't know, and it's not really important, because the exact reason(s) why it is unsafe can change at any time with a new Java release. The important fact is that it is not supported, and your code will likely blow up one way or another at runtime.
If the table gets resized in the middle of the call to get(), it could potentially look in the wrong bucket and return null incorrectly.
Consider the steps that happen in m.get():
A hash is calculated for the key.
The current length of the table (the buckets in the HashMap) is read.
This length is used to calculate the correct bucket to get from the table.
The bucket is retrieved and the entries in the bucket are walked until a match is found or until the end of the bucket is reached.
If another thread changes the map and causes the table to be resized in between 2 & 3, the wrong bucket could be used to look for the entry, potentially giving an incorrect result.
The reason why synchronization is needed in a concurrent environment is that Java operations aren't atomic. This means that a single Java operation like counter++ causes the underlying VM to execute more than one machine operation:
Read value
Increment value
Write value
While those three operations are performed, another thread, T2, may be scheduled and read the old value of that variable, e.g. 10. T1 increments that value and writes 11 back. But T2 has read the value 10! In case T2 should also increment this value, the result stays the same, namely 11 instead of 12.
Synchronization avoids such concurrency errors:
T1:
Set synchronizer token
Read value
Another thread T2 was invoked and tries to read the value. But since the synchronizer token was already set, T2 has to wait.
Increment value
Write value
Remove synchronizer token
T2:
Set synchronizer token
Read value
Increment value
Write value
Remove synchronizer token
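In Java, the "synchronizer token" above is a monitor lock; a minimal sketch of the counter with the token applied (the Counter class is hypothetical):

class Counter {
    private int value = 0;

    // the lock on 'this' plays the role of the synchronizer token above
    synchronized void increment() {
        value++;   // read, increment, write now form one atomic unit
    }

    synchronized int get() {
        return value;
    }
}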
By synchronising the get method you force the thread to cross the memory barrier and read the value from main memory. If you didn't synchronise the get method, the JVM would be free to apply underlying optimisations that might result in that thread, blissfully unaware, reading a stale value stored in registers or caches.
I'm studying the java.util.concurrent library and found many infinite loops in the source code, like this one:
//java.util.concurrent.atomic.AtomicInteger by Doug Lea
public final int getAndSet(int newValue) {
for (;;) {
int current = get();
if (compareAndSet(current, newValue))
return current;
}
}
I wonder, in what cases can the actual value differ from the expected value (in which case compareAndSet returns false)?
On many modern CPUs, compareAndSet() maps to an atomic hardware operation. That means it is thread-safe without requiring synchronization (which is a relatively expensive operation in comparison). However, it is only compareAndSet() itself which is atomic, so in order to getAndSet() (i.e. set the variable to a given value and return the value it had at that time, without the possibility of it being set to a different value in between) the code uses a trick: first it gets the value, then it attempts compareAndSet() with the value it just got and the new value. If that fails, the variable was manipulated by another thread in between, and the code tries again.
This is faster than using synchronization if compareAndSet() fails rarely, i.e. if there are not too many threads writing to the variable at the same time. In an extreme case where many threads are writing to the variable at all times, synchronization can actually be faster because while there is an overhead to synchronize, other threads trying to access the variable will wait and be woken up when it's their turn, rather than having to retry the operation repeatedly.
When the value is modified in another thread, the get() and compareAndSet() can see different values. This is the sort of thing a concurrent library needs to worry about.
This is not an infinite loop; it is standard practice when dealing with a CAS (compare-and-set) algorithm. What the loop does is (a) read from memory (with volatile semantics), (b) compute a new value, and (c) write the new value if the old value has not changed in the meantime.
In database land this is known as optimistic locking. It leverages the fact that most concurrent updates to shared memory are uncontended, and in that case, this is the cheapest possible way to do it.
In fact, this is basically what an unbiased lock will do in the uncontended case. It will read the value of the lock, and if it is unlocked, it will do a CAS of the thread ID, and if that succeeds, the lock is now held. If it fails, someone else got the lock first. Locks, though, deal with the failure case in a much more sophisticated way than merely retrying the operation over and over again. They'll keep reading the value for a little while in case the lock is quickly unlocked (spin-locking), then usually go to sleep for a bit to let other threads in until their turn (exponential back-off).
Here is an actual usage of the compareAndSet operation: Imagine that you design an algorithm that calculates something in multiple threads.
Each thread remembers an old value and based on it performs a complicated calculation.
Then it wants to set the new result ONLY if the old value hasn't already been changed by another calculation thread. If the old value is not the expected one, the thread discards its own work, takes the new value and restarts the calculation. It uses compareAndSet for that.
Furthermore, other threads are guaranteed to get only fresh values to continue the calculations. A generic version of this pattern is sketched below.
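A sketch of that discard-and-retry pattern (the Optimistic class and the recalculate parameter stand in for the algorithm's own types; since Java 8, AtomicReference.updateAndGet does essentially this):

import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

class Optimistic {
    static <T> T update(AtomicReference<T> ref, UnaryOperator<T> recalculate) {
        while (true) {
            T old = ref.get();                  // remember the old value
            T fresh = recalculate.apply(old);   // expensive work based on it
            if (ref.compareAndSet(old, fresh))  // publish only if nobody changed it
                return fresh;
            // another thread won the race: discard our work and retry
        }
    }
}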
The "infinite" loops are used to implement "busy waiting" which might be much less expensive than putting the thread to sleep especially when the thread contention is low.
Cheers!