I have a map defined
private static HashMap<Object, Object> myMap;
It is populated in a single thread and then that single thread spawns more threads that alter the data inside the map elements, but do not alter the map structure (no removes/puts/etc). Also, each thread alters one and only one unique member of the map (no two threads alter the same member).
My question is: will the main thread see the changes to the hashmap members, once all changes are complete? If not, would adding volatile to the declaration work, or would it only guarantee other threads see changes to the structure? Thanks
EDIT: Code that hopefully highlights what I'm doing in a more clear way
import java.util.HashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TestingRandomStuff {

    public static void main(String[] args) throws Exception {
        HashMap<Object, Object> myMap = new HashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        // myMap is populated by this thread, and the objects inside are initialized,
        // but are left largely empty
        populate();

        for (Object o : myMap.values()) {
            Runnable r = new Task(o);
            pool.execute(r);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
        } catch (InterruptedException e) {;}

        // How do I guarantee that the objects inside myMap are displayed correctly
        // with all the data that was just loaded by separate threads,
        // or is it already guaranteed?
        displayObjectData();
    }

    public static class Task implements Runnable {
        private final Object o;

        public Task(Object o) { this.o = o; }

        public void run() {
            try {
                o.load(); // o contains many complicated instance variables
                          // that will be created and written to
            } catch (Exception e) {;}
        }
    }
}
EDIT: in your example, the map doesn't get accessed in other threads, only objects which are referenced by the map.
The objects themselves should be thread safe due to the way they are being used.
Note: if you used a parallelStream() the code would be simpler.
will other threads see the changes to the hashmap members?
Probably, but there is no guarantee.
If not, would adding volatile to the declaration work,
volatile on the field only adds a read barrier on the reference to the Map. (Unless you change the field to point to another Map, in which case you will get a write barrier.)
or would it only guarantee other threads see changes to the structure?
No, it only guarantees that other threads see changes to the myMap reference, not changes to the Map or anything in the Map. I.e. the guarantee is very shallow.
There are a number of ways you can provide thread safety; however the simplest is to synchronize on the object for both the write and the read. There are tricks you can do with volatile fields, but it is highly dependent on what you are doing as to whether this will work.
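The parallelStream() suggestion above can be sketched as follows. This is a hedged illustration: the Item class and its load() method are hypothetical stand-ins for the question's value objects.

```java
import java.util.HashMap;
import java.util.Map;

public class ParallelLoad {

    // Hypothetical stand-in for the question's value objects.
    static class Item {
        boolean loaded;
        void load() { loaded = true; }
    }

    static long demo() {
        Map<String, Item> myMap = new HashMap<>();
        myMap.put("a", new Item());
        myMap.put("b", new Item());

        // The terminal operation returns only after every element has been
        // processed, so the calling thread sees the completed loads.
        myMap.values().parallelStream().forEach(Item::load);

        return myMap.values().stream().filter(it -> it.loaded).count();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2
    }
}
```

The map itself is never structurally modified by the worker threads here, matching the question's usage; only the referenced objects are mutated.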
You could use ConcurrentHashMap.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset.
This means changes done by a thread are visible to another thread reading value for the same key. But the two can interfere.
In Java, there are no objects embedded in other objects, all data structures consisting of multiple objects are reference graphs, or in other words, a collection of objects is a collection of references and a map with key and value objects is a map containing references to these objects.
Therefore, your objects never became “hashmap members”, but are just referenced by your hashmap. So, to discuss the thread safety of your code, the existence of the HashMap is just a red herring, as your multi-threaded code never sees any artifact of the HashMap.
You have code creating several distinct objects, then submitting Task instances, each containing a reference to one of these objects, to an ExecutorService to be processed. Assuming that these objects do not share mutable state, this is a straightforward, thread-safe approach.
After waiting for the completion of all jobs, the main thread can be sure to see the result of all actions made within the jobs, i.e. the modifications made to these objects. It will again be entirely irrelevant whether you use that HashMap or anything else to get a reference to one of these objects to look at these modifications.
It would be different if you were modifying keys of a map in a way that it affects their equality or hash code, but that’s independent from thread safety concerns. You must never modify map keys in such a way, even in single threaded code, and violating this contract would even break thread safe maps. But since your objects are referenced as values, there is no such problem.
There is only one corner case to pay attention to. Your wait for completion contains the line catch (InterruptedException e) {;}, so there is no full guarantee that after the execution of the statement truly all jobs have been completed, but that is the requirement for the visibility guarantee. Since you seem to assume that interruption should never happen, you should use something like catch(InterruptedException e) { throw new AssertionError(e); }, to be sure that violations of that assumption do not happen silently, while at the same time getting the full visibility guarantee, as now the displayObjectData(); statement can only be reached when all jobs have been completed (or Long.MAX_VALUE seconds elapsed, which none of us will ever witness).
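An alternative to the shutdown()/awaitTermination() dance that sidesteps the interruption corner case is invokeAll(), which returns only when every task is done. A minimal sketch, with a hypothetical Item type standing in for the question's value objects:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InvokeAllExample {

    // Hypothetical stand-in for the question's value objects.
    static class Item {
        int data;
        void load() { data = 42; }
    }

    static int demo() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Item> items = List.of(new Item(), new Item(), new Item());

        List<Callable<Void>> tasks = new ArrayList<>();
        for (Item it : items) {
            tasks.add(() -> { it.load(); return null; });
        }

        try {
            // invokeAll blocks until every task has completed; get() on each
            // Future surfaces task failures and is where the documented
            // happens-before edge with the workers' writes is anchored.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        pool.shutdown();

        int sum = 0;
        for (Item it : items) sum += it.data; // guaranteed to see the 42s
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 126
    }
}
```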
Related
Imagine having a main thread which creates a HashSet and starts a lot of worker threads, passing the HashSet to them.
Just like in code below:
void main() {
    final Set<String> set = new HashSet<>();
    final ExecutorService threadExecutor = Executors.newFixedThreadPool(10);

    threadExecutor.submit(() -> doJob(set));
}

void doJob(final Set<String> pSet) {
    // do some stuff
    final String x = ... // doesn't matter how we received the value.
    if (!pSet.contains(x)) {
        synchronized (pSet) {
            // double check to prevent multiple adds within different threads
            if (!pSet.contains(x)) {
                // do some exclusive work with x.
                pSet.add(x);
            }
        }
    }
    // do some stuff
}
I'm wondering is it thread-safe to synchronize only on add method? Is there any possible issues if contains is not synchronized?
My intuition tells me this is fine; after leaving the synchronized block, changes made to the set should be visible to all threads, but the JMM can be counter-intuitive sometimes.
P.S. I don't think it's a duplicate of How to lock multiple resources in java multithreading
Even though answers to both could be similar, this question addresses more particular case.
I'm wondering is it thread-safe to synchronize only on the add method? Are there any possible issues if contains is not synchronized as well?
Short answers: No and Yes.
There are two ways of explaining this:
The intuitive explanation
Java synchronization (in its various forms) guards against a number of things, including:
Two threads updating shared state at the same time.
One thread trying to read state while another is updating it.
Threads seeing stale values because memory caches have not been written to main memory.
In your example, synchronizing on add is sufficient to ensure that two threads cannot update the HashSet simultaneously, and that both calls will be operating on the most recent HashSet state.
However, if contains is not synchronized as well, a contains call could happen simultaneously with an add call. This could lead to the contains call seeing an intermediate state of the HashSet, leading to an incorrect result, or worse. This can also happen if the calls are not simultaneous, due to changes not being flushed to main memory immediately and/or the reading thread not reading from main memory.
The Memory Model explanation
The JLS specifies the Java Memory Model, which sets out the conditions that must be fulfilled by a multi-threaded application to guarantee that one thread sees the memory updates made by another. The model is expressed in mathematical language and is not easy to understand, but the gist is that visibility is guaranteed if and only if there is a chain of happens-before relationships from the write to a subsequent read. If the write and read are in different threads, then synchronization between the threads is the primary source of these relationships. For example, in
// thread one
synchronized (sharedLock) {
    sharedVariable = 42;
}

// thread two
synchronized (sharedLock) {
    other = sharedVariable;
}
Assuming that the thread one code is run before the thread two code, there is a happens-before relationship between thread one releasing the lock and thread two acquiring it. With this and the "program order" relations, we can build a chain from the write of 42 to the assignment to other. This is sufficient to guarantee that other will be assigned 42 (or possibly a later value of the variable) and NOT any value in sharedVariable before 42 was written to it.
Without the synchronized block synchronizing on the same lock, the second thread could see a stale value of sharedVariable; i.e. some value written to it before 42 was assigned to it.
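The two fragments above can be combined into one runnable sketch; join() is used here to enforce the "thread one runs first" assumption (and itself adds a happens-before edge, so this demo is deterministic):

```java
public class SafePublication {

    static final Object sharedLock = new Object();
    static int sharedVariable;

    static int demo() {
        Thread one = new Thread(() -> {
            synchronized (sharedLock) {
                sharedVariable = 42; // write under the lock
            }
        });
        one.start();
        try {
            one.join(); // forces the ordering assumed in the text
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }

        int other;
        synchronized (sharedLock) {
            other = sharedVariable; // read under the same lock
        }
        return other;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 42
    }
}
```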
That code is thread safe for the synchronized (pSet) { ... } part:

if (!pSet.contains(x)) {
    synchronized (pSet) {
        // Here you are sure to have the updated value of pSet
        if (!pSet.contains(x)) {
            // do some exclusive work with x.
            pSet.add(x);
        }
    }
}
because inside the synchronized statement on the pSet object:
one and only one thread may be in this block.
and inside it, pSet's updated state is also guaranteed, by the happens-before relationship established by the synchronized keyword.
So whatever value the first if (!pSet.contains(x)) statement returned for a waiting thread, when that thread wakes up and enters the synchronized statement, it will see the last updated state of pSet. So even if the same element was added by a previous thread, the second if (!pSet.contains(x)) will return false.
But this code is not thread safe for the first if (!pSet.contains(x)) statement, which could be executed while another thread is writing to the Set.
As a rule of thumb, a collection not designed to be thread safe should not be used for concurrent reading and writing, because the internal state of the collection could be in-progress/inconsistent when a reading operation occurs in the middle of a writing operation.
While some non-thread-safe collection implementations tolerate such usage in practice, there is no guarantee that this will always hold.
So you should use a thread-safe Set implementation to make the whole thing thread safe.
For example with :
Set<String> pSet = ConcurrentHashMap.newKeySet();
That uses a ConcurrentHashMap under the hood, so there is no lock for reading and only minimal locking for writing (only on the bin being modified, not the whole structure).
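With a thread-safe set the double-check collapses entirely, because add() is atomic and reports whether this thread was the one that actually inserted the element. A hedged sketch of the question's doJob() reworked that way (the return value is added here just to make the behavior observable):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ExclusiveAdd {

    private static final Set<String> pSet = ConcurrentHashMap.newKeySet();

    // Returns true only for the single thread that actually inserted x.
    static boolean doJob(String x) {
        if (pSet.add(x)) { // atomic check-and-add; no synchronized needed
            // do some exclusive work with x.
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(doJob("a")); // true: first caller wins
        System.out.println(doJob("a")); // false: already present
    }
}
```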
No,
You don't know what state the HashSet might be in during an add by another thread. There might be fundamental changes in progress, such as splitting of buckets, so contains may return false during the add by another thread even though the element would be found in a single-threaded HashSet. In that case you would try to add the element a second time.
Even worse: contains might get into an endless loop or throw an exception because of a temporarily invalid state of the HashSet in the memory shared by the two threads.
I am a bit confused regarding one pattern I have seen in some legacy code of ours.
The controller uses a map as a cache, with an approach that should be thread safe, however I am still not confident it indeed is. We have a map, which is properly synchronized during addition and retrieval, however, there is a bit of logic outside of the synchronized block, that does some additional filtering.
(the map itself and the lists are never accessed outside of this method, so concurrent modification is not an issue; the map holds some stable parameters, which basically never change, but are used often).
The code looks like the following sample:
public class FooBarController {

    private final Map<String, List<FooBar>> fooBarMap =
            new HashMap<String, List<FooBar>>();

    public FooBar getFooBar(String key, String foo, String bar) {
        List<FooBar> foobarList;
        synchronized (fooBarMap) {
            if (fooBarMap.get(key) == null) {
                foobarList = queryDbByKey(key);
                fooBarMap.put(key, foobarList);
            } else {
                foobarList = fooBarMap.get(key);
            }
        }

        for (FooBar fooBar : foobarList) {
            if (foo.equals(fooBar.getFoo()) && bar.equals(fooBar.getBar()))
                return fooBar;
        }
        return null;
    }

    private List<FooBar> queryDbByKey(String key) {
        // ... (simple Hibernate-query)
    }

    // ...
}
Based on what I know about the JVM memory model, this should be fine, since if one thread populates a list, another one can only retrieve it from the map with proper synchronization in place, ensuring that the entries of the list is visible. (putting the list happens-before getting it)
However, we keep seeing cases, where an entry expected to be in the map is not found, combined with the typical notorious symptoms of concurrency issues (e.g. intermittent failures in production, which I cannot reproduce in my development environment; different threads can properly retrieve the value etc.)
I am wondering if iterating through the elements of the List like this is thread-safe?
The code you provided is correct in terms of concurrency. Here are the guarantees:
only one thread at a time adds values to the map, because of the synchronization on the map object
values added by a thread become visible to all other threads that enter the synchronized block
Given that, you can be sure that all threads that iterate a list see the same elements. The issues you described are indeed strange but I doubt they're related to the code you provided.
It is thread safe only if all accesses to fooBarMap are synchronized. A little out of scope, but a safer option may be to use a ConcurrentHashMap.
There is a great article on how hashmaps can be synchronized here.
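With a ConcurrentHashMap, the whole synchronized block can shrink to a single computeIfAbsent call, which runs the loader at most once per key and publishes the list safely to every caller. A sketch, with the Hibernate query replaced by a hypothetical stub:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FooBarCache {

    private final Map<String, List<String>> fooBarMap = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the Hibernate query in the question.
    private List<String> queryDbByKey(String key) {
        return List.of(key + "-foo", key + "-bar");
    }

    public List<String> getFooBars(String key) {
        // Atomic check-then-load: at most one thread runs the query per key,
        // and every caller sees the fully built list.
        return fooBarMap.computeIfAbsent(key, this::queryDbByKey);
    }
}
```

Repeated calls for the same key return the same cached list instance, so the subsequent filtering loop from the question works unchanged.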
In situations like this, the best option is to use a ConcurrentHashMap.
Verify that all updates and reads happen in order.
As I understood from your question, there is a fixed set of params which never changes. One approach I prefer in situations like this is:
I. Create the map cache during start-up and keep only one instance of it.
II. Read that map instance anytime, anywhere in the application.
In the for loop you are returning references to the fooBar objects in foobarList.
So the method calling getFooBar() has access to the map's contents through that fooBar reference.
Try cloning fooBar before returning it from getFooBar().
This question already has answers here:
Java double checked locking
(11 answers)
Closed 7 years ago.
The following code uses a double-checked pattern to initialize variables. I believe the code is thread safe, as the map won't be partially assigned even if two threads get into the getMap() method at the same time, so I don't have to make the map volatile as well. Is the reasoning correct? NOTE: The map is immutable once it is initialized.
class A {

    private Map<String, Integer> map;
    private final Object lock = new Object();

    public static Map<String, Integer> prepareMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("test", 1);
        return map;
    }

    public Map<String, Integer> getMap() {
        if (map == null) {
            synchronized (lock) {
                if (map == null) {
                    map = prepareMap();
                }
            }
        }
        return map;
    }
}
According to the top names in the Java world, no it is not thread safe. You can read why here: http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
You'd be better off using a ConcurrentHashMap or synchronizing your Map.
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentHashMap.html
Edit: If you only want to make the initialization of the map thread safe (so that two or more maps are not accidentally created) then you can do two things. 1) initialize the map when it is declared. 2) make the getMap() method synchronized.
No, your reasoning is wrong. Access to the map is not thread safe, because the threads that call getMap() after the initialization may never enter synchronized(lock) and thus are not in a happens-before relation with the initializing thread.
The map has to be volatile.
The code could be optimized by inlining to
public Map<String, Integer> getMap() {
    if (map == null) {
        synchronized (lock) {
            if (map == null) {
                map = new HashMap<>(); // partial map exposed
                map.put("test", 1);
            }
        }
    }
    return map;
}
Having a HashMap under concurrent read and write is VERY dangerous, don't do it. Google HashMap infinite loop.
Solutions -
Expand synchronized to the entire method, so that reading map variable is also under lock. This is a little expensive.
Declare map as volatile, to prevent reordering optimization. This is simple, and pretty cheap.
Use an immutable map. The final fields will also prevent exposing partial object state. In your particular example, we can use Collections.singletonMap; for maps with more entries, Map.of (Java 9+) provides immutable implementations.
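The volatile solution above can be sketched like this; the local variable is a common companion trick that avoids re-reading the volatile field on the fast path. This is an illustrative rewrite of the question's class, not the only correct form:

```java
import java.util.HashMap;
import java.util.Map;

class A {

    private volatile Map<String, Integer> map; // volatile closes the visibility gap
    private final Object lock = new Object();

    public Map<String, Integer> getMap() {
        Map<String, Integer> local = map;      // single volatile read
        if (local == null) {
            synchronized (lock) {
                local = map;
                if (local == null) {
                    Map<String, Integer> m = new HashMap<>();
                    m.put("test", 1);
                    map = local = m;           // publish only the finished map
                }
            }
        }
        return local;
    }
}
```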
This is just one example of how things can go wrong. To fully understand the issues, there is no substitute for reading The "Double-Checked Locking is Broken" Declaration, referenced in a prior answer.
To get anything approaching the full flavor, think about two processors, A and B, each with its own caches, and a main memory that they share.
Suppose Thread A, running on Processor A, first calls getMap. It does several assignments inside the synchronized block. Suppose the assignment to map gets written to main memory first, before Thread A reaches the end of the synchronized block.
Meanwhile, on Processor B, Thread B also calls getMap, and does not happen to have the memory location representing map in its cache. It goes out to main memory to get it, and its read happens to hit just after Thread A's assignment to map, so it sees a non-null map. Thread B does not enter the synchronized block.
At this point, Thread B can go ahead and attempt to use the HashMap, despite the fact that Thread A's work on creating it has not yet been written to main memory. Thread B may even have the memory pointed to by map in its cache because of a prior use.
If you are tempted to try to work around this, consider the following quote from the referenced article:
There are lots of reasons it doesn't work. The first couple of reasons
we'll describe are more obvious. After understanding those, you may be
tempted to try to devise a way to "fix" the double-checked locking
idiom. Your fixes will not work: there are more subtle reasons why
your fix won't work. Understand those reasons, come up with a better
fix, and it still won't work, because there are even more subtle
reasons.
This answer only contains one of the most obvious reasons.
No, it is not thread safe.
The basic reason is that you can have reordering of operations you don't even see in the Java code. Let's imagine a similar pattern with an even simpler class:
class Simple {
    int value = 42;
}
In the analogous getSimple() method, you assign /* non-volatile */ simple = new Simple(). What happens here?
the JVM allocates some space for the new object
the JVM sets some bit of this space to 42 (for value)
the JVM returns the address of this space, which is then assigned to simple
Without synchronization instructions to prohibit it, these instructions can be reordered. In particular, steps 2 and 3 can be reordered such that simple gets the new object's address before the constructor finishes! If another thread then reads simple.value, it'll see the value 0 (the field's default value) instead of 42. This is called seeing a partially-constructed object. Yes, that's weird; yes, I've seen things like that happen. It's a real bug.
You can imagine how if the object is a non-trivial object, like HashMap, the problem is even worse; there are a lot more operations, and so more possibilities for weird ordering.
Marking the field as volatile is a way of telling the JVM, "any thread that reads a value from this field must also read all operations that happened before that value was written." That prohibits those weird reorderings, which guarantees you'll see the fully-constructed object.
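Applied to the Simple example, the fix looks like this; note that a final field would independently guard against seeing the partially-constructed value, since final fields are safely published after construction:

```java
class SimpleHolder {

    static class Simple {
        final int value = 42; // final: safely published once the constructor finishes
    }

    private volatile Simple simple; // volatile forbids the reordering described above

    Simple getSimple() {
        Simple local = simple;
        if (local == null) {
            synchronized (this) {
                local = simple;
                if (local == null) {
                    simple = local = new Simple();
                }
            }
        }
        return local;
    }
}
```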
Unless you declare map as volatile, this code may be compiled to non-thread-safe bytecode.
The compiler may optimize the expression map == null, cache the value of the expression, and thus read the map field only once.
Declaring the field as volatile Map<String, Integer> map instructs the JVM to always read the field when it is accessed. This forbids such optimization by the compiler.
Please refer to JLS Chapter 17. Threads and Locks
Local variables are thread safe in Java. Is using a HashMap declared inside a method thread safe?
For example:
void usingHashMap() {
    HashMap<Integer, String> map = new HashMap<>();
}
When two threads run the same method usingHashMap(), they are in no way related. Each thread will create its own copy of every local variable, and these copies will not interact with each other in any way.
If variables aren't local, then they are attached to the instance. In this case, two threads running the same method both see the one variable, and this isn't thread safe.
public class usingHashMapNotThreadSafe {
    HashMap<Integer, String> map = new HashMap<Integer, String>();

    public int work() {
        // manipulating the hashmap here
    }
}

public class usingHashMapThreadSafe {
    public int worksafe() {
        HashMap<Integer, String> map = new HashMap<Integer, String>();
        // manipulating the hashmap here
    }
}
In usingHashMapNotThreadSafe, two threads running on the same instance will see the same map. This could be dangerous, because the threads are trying to change map! In the second, two threads running on the same instance of usingHashMapThreadSafe will see totally different maps, and can't affect each other.
As long as the reference to the HashMap object is not published (is not passed to another method), it is threadsafe.
The same applies to the keys/values stored in the map. They need to be either immutable (cannot change their states after being created) or used only within this method.
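A minimal sketch of that confinement: the map below never escapes the method, so any number of threads can call countWords concurrently without interference (the word-counting task itself is just an illustrative example):

```java
import java.util.HashMap;
import java.util.Map;

public class LocalMapDemo {

    static int countWords(String text) {
        // Confined to this invocation: each calling thread gets its own map.
        Map<String, Integer> counts = new HashMap<>();
        for (String w : text.split(" ")) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts.size(); // number of distinct words
    }

    public static void main(String[] args) {
        System.out.println(countWords("a b a")); // 2
    }
}
```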
I think that to ensure complete concurrency, a ConcurrentHashMap should be used in any case, even if it is local in scope. ConcurrentHashMap implements ConcurrentMap. The partitioning is essentially an attempt, as explained in the documentation, to:
The table is internally partitioned to try to permit the indicated number of concurrent updates without contention. Because placement in hash tables is essentially random, the actual concurrency will vary. Ideally, you should choose a value to accommodate as many threads as will ever concurrently modify the table. Using a significantly higher value than you need can waste space and time, and a significantly lower value can lead to thread contention.
so I have a HashMap that is declared at class level like so:
private static volatile HashMap<String, ArrayList<String>> map =
    new HashMap<String, ArrayList<String>>();
I have several threads updating the same map and the threads are declared in the class level like so:
private class UpdateThread extends Thread {
    @Override
    public void run() {
        // update map here
        // map actually gets updated here
    }
}
But after the threads exit:
for (UpdateThread thread : listOfThreads) {
    thread.start();
}

for (UpdateThread thread : listOfThreads) {
    try {
        thread.join();
        // map not updated anymore :-[
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
Why are the map changes that are occurring inside the thread not persisting after the thread is done? I've declared the map static and volatile already...
Thanks in advance
Why are the map changes that are occurring inside the thread not persisting after the thread is done? I've declared the map static and volatile already...
It depends highly on how you are updating the map.
// update map here -- what's happening here?
As #Louis points out, if multiple threads are updating the same map instance, volatile won't help you and you should be using a ConcurrentHashMap. As #Gerhard points out, volatile is only protecting the updating of the HashMap reference and not the innards of the map itself. You need to fully lock the map if the threads are updating it in parallel or use a concurrent map.
However, if each thread is replacing the map with a new map then the volatile method would work. Then again, each thread may be overwriting the central map because of race conditions.
If you show us your update code, we should be able to explain it better.
The keyword volatile only makes the reference to the HashMap visible to all threads.
If you want to access a HashMap in several threads, you need to use a synchronized map. The easiest choices are using java.util.Hashtable or using Collections.synchronizedMap(map). The volatile declaration is useless in your case, since your variable is initialized at the beginning.
The semantics of volatile apply only to the variable you are declaring.
In your case, the variable that holds your reference to map is volatile, and so the JVM will go to lengths to assure that changes you make to the reference contained by map are visible to other threads.
However, the object referred to by map is not covered by any such guarantee, and in order for changes to any object or any object graph to be viewed by other threads, you will need to establish a happens-before relationship. With mutable state objects, this usually means synchronizing on a lock or using a thread-safe object designed for concurrency. Happily, in your case, a high-performance Map implementation designed for concurrent access is part of the Java library: ConcurrentHashMap.
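A hedged sketch of that fix: a ConcurrentHashMap absorbs the concurrent puts safely, and join() gives the main thread the happens-before edge it needs to read the results afterwards (the sku-style keys are made up for the example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MapUpdateDemo {

    static int demo() {
        Map<String, List<String>> map = new ConcurrentHashMap<>();

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            String key = "sku-" + i; // each worker writes its own key
            threads.add(new Thread(() -> map.put(key, List.of(key))));
        }
        for (Thread t : threads) t.start();
        try {
            // join() guarantees this thread sees everything each worker wrote.
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }

        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 4
    }
}
```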