So I have a HashMap that is declared at class level like so:
private static volatile HashMap<String, ArrayList<String>> map =
new HashMap<String, ArrayList<String>>();
I have several threads updating the same map and the threads are declared in the class level like so:
private class UpdateThread extends Thread {
@Override
public void run() {
// update map here
// map actually gets updated here
}
}
But after the threads exit:
for (FetchSKUsThread thread : listOfThreads) {
thread.start();
}
for (FetchSKUsThread thread : listOfThreads) {
try {
thread.join();
// map not updated anymore :-[
} catch (InterruptedException e) {
e.printStackTrace();
}
}
Why are the map changes that are occurring inside the thread not persisting after the thread is done? I've declared the map static and volatile already...
Thanks in advance
Why are the map changes that are occurring inside the thread not persisting after the thread is done? I've declared the map static and volatile already...
It depends highly on how you are updating the map.
// update map here -- what's happening here?
As @Louis points out, if multiple threads are updating the same map instance, volatile won't help you and you should be using a ConcurrentHashMap. As @Gerhard points out, volatile is only protecting the updating of the HashMap reference and not the innards of the map itself. You need to fully lock the map if the threads are updating it in parallel or use a concurrent map.
However, if each thread is replacing the map with a new map then the volatile method would work. Then again, each thread may be overwriting the central map because of race conditions.
If you show us your update code, we should be able to explain it better.
The keyword volatile only makes the reference to the HashMap visible to all threads.
If you want to access a HashMap in several threads, you need to use a synchronized map. The easiest choices are using java.util.Hashtable or using Collections.synchronizedMap(map). The volatile declaration is useless in your case, since your variable is initialized at the beginning.
The semantics of volatile apply only to the variable you are declaring.
In your case, the variable that holds your reference to map is volatile, and so the JVM will go to lengths to assure that changes you make to the reference contained by map are visible to other threads.
However, the object referred to by map is not covered by any such guarantee, and in order for changes to any object or any object graph to be seen by other threads, you will need to establish a happens-before relationship. With mutable state objects, this usually means synchronizing on a lock or using a thread-safe object designed for concurrency. Happily, in your case, a high-performance Map implementation designed for concurrent access is part of the Java library: ConcurrentHashMap.
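A minimal sketch of what the answers above suggest, assuming every thread adds a value under a shared key. The class name SharedMapDemo, the key "sku" and the helper runUpdaters are made up for illustration and are not from the question. The map reference is final rather than volatile because it is never reassigned, and the list values are CopyOnWriteArrayList because the lists themselves are also shared between threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; names are illustrative, not taken from the question.
class SharedMapDemo {
    // final, not volatile: the reference never changes, only the map's contents do.
    private static final Map<String, List<String>> MAP = new ConcurrentHashMap<>();

    static Map<String, List<String>> runUpdaters(int threadCount) {
        MAP.clear();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            final String value = "value-" + i;
            // computeIfAbsent is atomic per key; the thread-safe list makes add() safe too.
            threads.add(new Thread(() ->
                    MAP.computeIfAbsent("sku", k -> new CopyOnWriteArrayList<>()).add(value)));
        }
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // everything the thread did is visible after join() returns
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return MAP;
    }

    public static void main(String[] args) {
        System.out.println(runUpdaters(4).get("sku").size());
    }
}
```

If the lists are written far more often than they are read, a synchronized list may be a better fit than CopyOnWriteArrayList; that choice depends on the workload.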
Related
I have a map defined
private static HashMap<Object, Object> myMap;
It is populated in a single thread and then that single thread spawns more threads that alter the data inside the map elements, but do not alter the map structure (no removes/puts/etc). Also, each thread alters one and only one unique member of the map (no two threads alter the same member).
My question is: will the main thread see the changes to the hashmap members, once all changes are complete? If not, would adding volatile to the declaration work, or would it only guarantee other threads see changes to the structure? Thanks
EDIT: Code that hopefully highlights what I'm doing in a more clear way
import java.util.HashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class TestingRandomStuff {
public static void main(String[] args) throws Exception {
HashMap<Object, Object> myMap = new HashMap<>();
ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
//myMap is populated by this thread, and the objects inside are initialized, but are left largely empty
populate();
for (Object o : myMap.values()) {
Runnable r = new Task(o);
pool.execute(r);
}
pool.shutdown();
try {
pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
} catch (InterruptedException e) {;}
//How do I guarantee that the objects inside myMap are displayed correctly with all the data that was just loaded by separate threads,
//or is it already guaranteed?
displayObjectData();
}
public static class Task implements Runnable {
private Object o;
public Task(Object o) {this.o = o;}
public void run() {
try {
o.load(); //o contains many complicated instance variables that will be created and written to
} catch (Exception e) {;}
}
}
}
EDIT: in your example, the map doesn't get accessed in other threads, only objects which are referenced by the map.
The objects themselves should be thread safe due to the way they are being used.
Note: if you used a parallelStream() the code would be simpler.
will other threads see the changes to the hashmap members?
Probably, but there is no guarantee.
If not, would adding volatile to the declaration work,
volatile on the field only adds a read barrier on the reference to the Map (unless you change the field to point to another Map, in which case you also get a write barrier).
or would it only guarantee other threads see changes to the structure?
No, it only guarantees that other threads see changes to the myMap reference, not changes to the Map or anything in the Map; i.e. the guarantee is very shallow.
There are a number of ways you can provide thread safety; the simplest is to synchronize on the object for both writes and reads. There are tricks you can do with volatile fields, but whether they work is highly dependent on what you are doing.
You could use ConcurrentHashMap.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset.
This means changes made by one thread are visible to another thread reading the value for the same key. But the two can interfere.
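One way the interference mentioned above can show up is a lost update when a thread calls get() and then put() as two separate steps. A small sketch, assuming a simple counter use case that is not in the question; AtomicUpdateDemo, countHits and the key "hits" are hypothetical names. It uses ConcurrentHashMap.merge, which performs the read-modify-write atomically for a key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; the counter use case is an assumption for illustration.
class AtomicUpdateDemo {
    static int countHits(int tasks) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < tasks; i++) {
            // merge() is atomic for the key; a separate get() followed by put()
            // could overwrite another thread's increment.
            pool.execute(() -> counts.merge("hits", 1, Integer::sum));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(countHits(1000));
    }
}
```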
In Java, there are no objects embedded in other objects, all data structures consisting of multiple objects are reference graphs, or in other words, a collection of objects is a collection of references and a map with key and value objects is a map containing references to these objects.
Therefore, your objects never became “hashmap members”, but are just referenced by your hashmap. So, to discuss the thread safety of your code, the existence of the HashMap is just a red herring, as your multi-threaded code never sees any artifact of the HashMap.
You have code creating several distinct objects, then submitting Task instances, each containing a reference to one of these objects, to an ExecutorService to be processed. Assuming that these objects do not share mutable state, this is a straightforward thread-safe approach.
After waiting for the completion of all jobs, the main thread can be sure to see the result of all actions made within the jobs, i.e. the modifications made to these objects. It will again be entirely irrelevant whether you use that HashMap or anything else to get a reference to one of these objects to look at these modifications.
It would be different if you were modifying keys of a map in a way that it affects their equality or hash code, but that’s independent from thread safety concerns. You must never modify map keys in such a way, even in single threaded code, and violating this contract would even break thread safe maps. But since your objects are referenced as values, there is no such problem.
There is only one corner case to pay attention to. Your wait for completion contains the line catch (InterruptedException e) {;}, so there is no full guarantee that all jobs have truly completed once the statement has executed, yet completion is the requirement for the visibility guarantee. Since you seem to assume that interruption should never happen, you should use something like catch(InterruptedException e) { throw new AssertionError(e); } to be sure that violations of that assumption do not happen silently. At the same time, you get the full visibility guarantee, as the displayObjectData(); statement can now only be reached when all jobs have been completed (or Long.MAX_VALUE seconds have elapsed, which none of us will ever witness).
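The pattern this answer describes could look roughly like the sketch below. Item is a hypothetical stand-in for the question's value objects, and loadAll is an invented helper. The visibility after awaitTermination is taken from the answer's reasoning; the sketch only shows the structure, including the AssertionError suggestion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; Item stands in for the question's objects.
class LoadThenDisplayDemo {
    static class Item {
        private final int id;
        private String data; // plain field, written by exactly one worker

        Item(int id) { this.id = id; }

        void load() { data = "loaded-" + id; }

        String data() { return data; }
    }

    static Map<Integer, Item> loadAll(int count) {
        // Populated by the calling thread only; workers never touch the map itself.
        Map<Integer, Item> myMap = new HashMap<>();
        for (int i = 0; i < count; i++) {
            myMap.put(i, new Item(i));
        }
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (Item item : myMap.values()) {
            pool.execute(item::load); // each task mutates one distinct object
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS)) {
                throw new AssertionError("timed out before all tasks completed");
            }
        } catch (InterruptedException e) {
            throw new AssertionError(e); // do not continue with half-loaded data
        }
        return myMap;
    }

    public static void main(String[] args) {
        System.out.println(loadAll(3).get(2).data());
    }
}
```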
Local variables are thread safe in Java. Is using a hashmap declared inside a method thread safe?
For Example-
void usingHashMap()
{
HashMap<Integer, String> map = new HashMap<Integer, String>();
}
When two threads run the same method here, usingHashMap(), they are in no way related. Each thread will create its own version of every local variable, and these variables will not interact with each other in any way.
If variables aren't local, then they are attached to the instance. In this case, two threads running the same method both see the one variable, and this isn't thread safe.
public class usingHashMapNotThreadSafe {
HashMap<Integer, String> map = new HashMap<Integer, String>();
public int work() {
//manipulating the hashmap here
}
}
public class usingHashMapThreadSafe {
public int worksafe() {
HashMap<Integer, String> map = new HashMap<Integer, String>();
//manipulating the hashmap here
}
}
In usingHashMapNotThreadSafe, two threads running on the same instance of usingHashMapNotThreadSafe will see the same map. This could be dangerous, because the threads are trying to change map! In the second, two threads running on the same instance of usingHashMapThreadSafe will see totally different versions of map, and can't affect each other.
As long as the reference to the HashMap object is not published (is not passed to another method), it is threadsafe.
The same applies to the keys/values stored in the map. They need to be either immutable (cannot change their states after being created) or used only within this method.
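To illustrate the confinement argument, here is a small sketch in which two threads call the same method and each gets its own local HashMap. LocalMapDemo, countDistinctLengths and the sample words are hypothetical. Only the int result leaves the method, so the map is never published.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; method and data are illustrative only.
class LocalMapDemo {
    // The map is created per call and its reference never escapes the method.
    static int countDistinctLengths(String[] words) {
        Map<Integer, String> map = new HashMap<>();
        for (String w : words) {
            map.put(w.length(), w);
        }
        return map.size();
    }

    static int[] runInTwoThreads() {
        int[] results = new int[2];
        Thread a = new Thread(() ->
                results[0] = countDistinctLengths(new String[] {"a", "bb", "cc"}));
        Thread b = new Thread(() ->
                results[1] = countDistinctLengths(new String[] {"x", "yy", "zzz"}));
        a.start();
        b.start();
        try {
            a.join(); // join() makes each thread's write to results visible here
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        int[] r = runInTwoThreads();
        System.out.println(r[0] + " " + r[1]); // prints "2 3"
    }
}
```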
I think to ensure complete concurrency, a ConcurrentHashMap should be used in any case, even if it is local in scope. ConcurrentHashMap implements ConcurrentMap. As the documentation explains, its internal partitioning is an attempt to permit concurrent updates without contention:
The table is internally partitioned to try to permit the indicated number of concurrent updates without contention. Because placement in hash tables is essentially random, the actual concurrency will vary. Ideally, you should choose a value to accommodate as many threads as will ever concurrently modify the table. Using a significantly higher value than you need can waste space and time, and a significantly lower value can lead to thread contention.
I am trying to get a firm handle on how a variable declared as
private volatile HashMap<Object, ArrayList<String>> data;
would behave in a multi-threaded environment.
What I understand is that volatile means get from main memory and not from the thread cache. That means that if a variable is being updated I will not see the new values until the update is complete and I will not block, rather what I see is the last updated value. (This is exactly what I want BTW.)
My question is: when I retrieve the ArrayList<String> and add or remove strings in thread A while thread B is reading, what exactly is affected by the volatile keyword? The HashMap only, or is the effect extended to the contents (K and V) of the HashMap as well? That is, when thread B gets an ArrayList<String> that is currently being modified in thread A, is what is actually returned the last value of the ArrayList<String> that existed before the update began?
Just to be clear, let's say the update is adding 2 strings. One string has already been added in thread A when thread B gets the array. Does thread B get the array as it was before the first string was added?
That means that if a variable is being updated I will not see the new values until the update is complete and I will not block, rather what I see is the last updated value
This is your source of confusion. What volatile does is make sure that reads and writes to that field are atomic - so no other threads could ever see a partially written value.
A non-atomic long field (which takes 2 memory addresses on a 32-bit machine) could be read incorrectly if a write operation was preempted after writing to the first address, and before writing to the second address.
Note that the atomicity of reads/writes to a field has nothing to do with updating the inner state of a HashMap. Updating the inner state of a HashMap entails multiple instructions, which are not atomic as a whole. That's why you'd use locks to synchronize access to the HashMap.
Also, since read/write operations on references are always atomic, even if the field is not marked as volatile, there is no difference between a volatile and a non-volatile HashMap, regarding atomicity. In that case, all volatile does is give you acquire-release semantics. This means that, even though the processor and the compiler are still allowed to slightly reorder your instructions, no instructions may ever be moved above a volatile read or below a volatile write.
The volatile keyword here applies only to the HashMap, not to the data stored within it, which in this case is the ArrayList.
As stated in HashMap documentation:
Note that this implementation is not synchronized. If multiple threads
access a hash map concurrently, and at least one of the threads
modifies the map structurally, it must be synchronized externally. (A
structural modification is any operation that adds or deletes one or
more mappings; merely changing the value associated with a key that an
instance already contains is not a structural modification.) This is
typically accomplished by synchronizing on some object that naturally
encapsulates the map. If no such object exists, the map should be
"wrapped" using the Collections.synchronizedMap method. This is best
done at creation time, to prevent accidental unsynchronized access to
the map:
Map m = Collections.synchronizedMap(new HashMap(...));
The volatile keyword affects neither operations on the HashMap (e.g. put, get) nor operations on the ArrayLists within the HashMap. The volatile keyword only affects reads and writes on this particular reference to the HashMap. Again, there can be further references to the same HashMap, which are not affected.
If you want to synchronise all operations on:
- the reference
- the HashMap
- and the ArrayList,
then use an additional Lock object for synchronisation as in the following code.
private final Object lock = new Object();
private Map<Object, List<String>> map = new HashMap<>();
// access reference
synchronized (lock) {
map = new HashMap<>();
}
// access reference and HashMap
synchronized (lock) {
return map.containsKey(42);
}
// access reference, HashMap and ArrayList
synchronized (lock) {
map.get(42).add("foobar");
}
If the reference is not changed, you can use the HashMap for synchronization (instead of the Lock).
@Singleton
@LocalBean
@Startup
@ConcurrencyManagement(ConcurrencyManagementType.BEAN)
public class DeliverersHolderSingleton {
private volatile Map<String, Deliverer> deliverers;
@PostConstruct
private void init() {
Map<String, Deliverer> deliverersMod = new HashMap<>();
for (String delivererName : delivererNames) {
/* getting deliverer by name */
deliverersMod.put(delivererName, deliverer);
}
deliverers = Collections.unmodifiableMap(deliverersMod);
}
public Deliverer getDeliverer(String delivererName) {
return deliverers.get(delivererName);
}
@Schedule(minute="*", hour="*")
public void maintenance() {
init();
}
}
The singleton is used for storing data. The data is updated once per minute.
Is it possible that reads from the unmodifiableMap will cause a synchronization problem? Is it possible that reordering will occur in the init method, so that the reference to the collection is published before the collection is completely filled?
The Java Memory Model guarantees that there is a happens-before relationship between a write and a subsequent read to a volatile variable. In other words, if you write to a volatile variable and subsequently read that same variable, you have the guarantee that the write operation will be visible, even if multiple threads are involved:
A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.
It goes further and guarantees that any operation that happened before the write operation will also be visible at the reading point (thanks to the program order rule and the fact that the happens-before relationship is transitive).
Your getDeliverer method reads from the volatile variable, so it will see the latest write performed on the line deliverers = Collections.unmodifiableMap(deliverersMod); as well as the preceding operations where the map is populated.
So your code is thread safe and your getDeliverer method will return a result based on the latest version of your map.
Thread safety issues here:
multiple reads from the HashMap - is thread safe, because multiple reads are allowed as long as there are no modifications to the collection and writes to the HashMap will not happen, because the map is an unmodifiableMap()
read/write on deliverers - is thread safe, because all java reference assignments are atomic
I can see no thread-unsafe operations here.
I would like to note that the name of the init() method is misleading; it suggests that it is called once during initialization. I'd suggest calling it rebuild() or recreate().
According to the Reordering Grid found here http://g.oswego.edu/dl/jmm/cookbook.html, the 1st operation being Normal Store cannot be reordered with the second operation being Volatile Store, so in your case, as long as the immutable map is not null, there wouldn't be any reordering problems.
Also, all writes that occur prior to a volatile store will be visible, so you will not see any publishing issues.
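Stripped of the EJB annotations, the publication pattern discussed in these answers can be sketched in plain Java as follows. SnapshotHolder, rebuild and the String values are hypothetical; the question stores Deliverer objects, which are replaced by strings here to keep the sketch self-contained. The new map is filled completely and only then assigned to the volatile field.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical plain-Java version of the holder; String stands in for Deliverer.
class SnapshotHolder {
    private volatile Map<String, String> deliverers = Collections.emptyMap();

    // Build the replacement map fully, then publish it with one volatile write.
    void rebuild(String... names) {
        Map<String, String> fresh = new HashMap<>();
        for (String name : names) {
            fresh.put(name, "deliverer:" + name);
        }
        // All puts above happen-before this write, so readers see a complete map.
        deliverers = Collections.unmodifiableMap(fresh);
    }

    String getDeliverer(String name) {
        // Volatile read: sees either the old complete map or the new complete map.
        return deliverers.get(name);
    }

    public static void main(String[] args) {
        SnapshotHolder holder = new SnapshotHolder();
        holder.rebuild("alice", "bob");
        System.out.println(holder.getDeliverer("alice")); // prints "deliverer:alice"
    }
}
```

A reader that needs several lookups against one consistent snapshot would have to copy the field into a local variable first, since the field can be replaced between two calls.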
I have an @ApplicationScoped bean for all users that stores ids -> names and vice versa in Trove and java.util maps.
I build the maps just once, at construction of the bean (or on a manual refresh by the website admin).
Inside the bean methods, I only call get() on the maps, so I am not modifying them. Is this going to be thread safe, since the maps are used only for read purposes? I am not sharing the maps with any other beans, and I am not modifying the maps (adding/removing entries) anywhere in my code.
Also, is it necessary in this case to make the fields final?
Bean code as follows:
@ApplicationScoped
@ManagedBean(name="directory", eager=true)
public class directory {
private static TIntObjectHashMap<String> idsToNamesMap;
private static TreeMap<String, Integer> namesToIdsMap;
@PostConstruct
public void buildDirectory(){
// building directory here ....
}
public String getName(int topicId){
return idsToNamesMap.get(topicId);
}
public List<Entry<String, Integer>> searchTopicsByName(String query){
return new ArrayList<>(namesToIdsMap.subMap(query, true, query+"z", true).entrySet());
}
}
You don't have to declare them volatile or protect them with any kind of synchronization in this case, as long as the constructing thread builds them and synchronizes with main memory.
For that, the constructing thread just needs to make a single write to a volatile variable or enter/exit a synchronized block. This passes a memory barrier, and all thread-local data will be flushed to main memory. Then it will be safe for all other threads to read this data.
Moreover, an unnecessary volatile or synchronized block carries a serious performance penalty: each access to the variable passes the memory barrier, which is an expensive operation.
There could be a visibility issue after the object is constructed. That is, in the immediate aftermath of your constructor calls, the maps may appear populated to the thread that populated them, but not necessarily to other threads, at least not right away. This type of issue is extensively discussed in chapter 3 of Java Concurrency in Practice. However, I think that if you declare the maps as volatile:
private static volatile TIntObjectHashMap<String> idsToNamesMap;
private static volatile TreeMap<String, Integer> namesToIdsMap;
You should be OK.
Update
I just realized something while looking at your code again. The maps are static - why are they being populated in an instance context by a constructor? First off, it is confusing to the reader. Second, if more than one instance of the object is created, then you will have additional writes to the maps, not just one, possibly while other threads are reading them.
You should either make them non-static, or populate them in a static initialization block.
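A sketch of the static initialization option, assuming the directory data can be built when the class loads. TopicDirectory, idOf and the sample entries are made up, and a plain TreeMap is used instead of the Trove map to keep the example free of external libraries. Class initialization runs once, and the final fields it sets are visible to every thread that uses the class afterwards.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical read-only directory; the entries are placeholder data.
class TopicDirectory {
    private static final Map<String, Integer> NAMES_TO_IDS;

    static {
        // Runs exactly once, during class initialization, before any thread
        // can call idOf(); no volatile or locking is needed for later reads.
        TreeMap<String, Integer> map = new TreeMap<>();
        map.put("alpha", 1);
        map.put("beta", 2);
        NAMES_TO_IDS = Collections.unmodifiableMap(map);
    }

    static Integer idOf(String name) {
        return NAMES_TO_IDS.get(name);
    }

    public static void main(String[] args) {
        System.out.println(idOf("beta")); // prints 2
    }
}
```

This does not cover the manual refresh by an admin that the question mentions; a map that is rebuilt at runtime would need to be republished safely, for example through a volatile field.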