Is the following code set up to correctly synchronize the calls on synchronizedMap?
public class MyClass {
    private static Map<String, List<String>> synchronizedMap = Collections.synchronizedMap(new HashMap<String, List<String>>());

    public void doWork(String key) {
        List<String> values = null;
        while ((values = synchronizedMap.remove(key)) != null) {
            //do something with values
        }
    }

    public static void addToMap(String key, String value) {
        synchronized (synchronizedMap) {
            if (synchronizedMap.containsKey(key)) {
                synchronizedMap.get(key).add(value);
            }
            else {
                List<String> valuesList = new ArrayList<String>();
                valuesList.add(value);
                synchronizedMap.put(key, valuesList);
            }
        }
    }
}
From my understanding, I need the synchronized block in addToMap() to prevent another thread from calling remove() or containsKey() before I get through the call to put(). However, I do not need a synchronized block in doWork(), because I created the Map with Collections.synchronizedMap(), so another thread cannot enter the synchronized block in addToMap() before remove() returns. Is that correct? Is there a better way to do this?
Collections.synchronizedMap() guarantees that each individual operation you run on the map will be synchronized.
Running two (or more) operations on the map as one atomic unit, however, must be synchronized in a block.
So yes - you are synchronizing correctly.
If you are using JDK 6 then you might want to check out ConcurrentHashMap.
Note the putIfAbsent method in that class.
There is the potential for a subtle bug in your code.
[UPDATE: Since he's using map.remove() this description isn't totally valid. I missed that fact the first time thru. :( Thanks to the question's author for pointing that out. I'm leaving the rest as is, but changed the lead statement to say there is potentially a bug.]
In doWork() you get the List value from the Map in a thread-safe way. Afterward, however, you are accessing that list in an unsafe manner. For instance, one thread may be using the list in doWork() while another thread invokes synchronizedMap.get(key).add(value) in addToMap(). Those two accesses are not synchronized. The rule of thumb is that a collection's thread-safety guarantees don't extend to the keys or values it stores.
You could fix this by inserting a synchronized list into the map like
List<String> valuesList = new ArrayList<String>();
valuesList.add(value);
synchronizedMap.put(key, Collections.synchronizedList(valuesList)); // sync'd list
Alternatively you could synchronize on the map while you access the list in doWork():
public void doWork(String key) {
List<String> values = null;
while ((values = synchronizedMap.remove(key)) != null) {
synchronized (synchronizedMap) {
//do something with values
}
}
}
The last option will limit concurrency a bit, but is somewhat clearer IMO.
Also, a quick note about ConcurrentHashMap. This is a really useful class, but is not always an appropriate replacement for synchronized HashMaps. Quoting from its Javadocs,
This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
In other words, putIfAbsent() is great for atomic inserts but does not guarantee other parts of the map won't change during that call; it guarantees only atomicity. In your sample program, you are relying on the synchronization details of (a synchronized) HashMap for things other than put()s.
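As a rough sketch of what an atomic "create or append" could look like with ConcurrentHashMap, the example below uses compute(), which ConcurrentHashMap performs atomically for the given key. It assumes Java 8 or later; the class name PerKeyQueue and the method drain() are hypothetical and not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical class; a sketch, not the original poster's code.
class PerKeyQueue {
    // Declared as ConcurrentHashMap on purpose: its compute() is atomic per key.
    private final ConcurrentHashMap<String, List<String>> map = new ConcurrentHashMap<>();

    // The whole lambda runs atomically for this key, so "create the list if
    // missing, then append" cannot interleave with a remove() of the same key.
    void addToMap(String key, String value) {
        map.compute(key, (k, list) -> {
            if (list == null) {
                list = new ArrayList<>();
            }
            list.add(value);
            return list;
        });
    }

    // remove() detaches the list from the map in one step. After it returns,
    // later addToMap() calls create a fresh list, so the caller owns this one.
    List<String> drain(String key) {
        List<String> values = map.remove(key);
        return values == null ? new ArrayList<String>() : values;
    }
}
```

Because the list is only ever mutated inside compute() and is unreachable from the map once remove() returns, the drained list can be processed without further locking in this sketch.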
Last thing. :) This great quote from Java Concurrency in Practice always helps me in designing and debugging multi-threaded programs.
For each mutable state variable that may be accessed by more than one thread, all accesses to that variable must be performed with the same lock held.
Yes, you are synchronizing correctly. I will explain this in more detail.
You need to synchronize a sequence of two or more method calls on the synchronizedMap object only when a later call in the sequence relies on the result of an earlier call.
Let’s take a look at this code:
synchronized (synchronizedMap) {
if (synchronizedMap.containsKey(key)) {
synchronizedMap.get(key).add(value);
}
else {
List<String> valuesList = new ArrayList<String>();
valuesList.add(value);
synchronizedMap.put(key, valuesList);
}
}
In this code
synchronizedMap.get(key).add(value);
and
synchronizedMap.put(key, valuesList);
method calls rely on the result of the previous
synchronizedMap.containsKey(key)
method call.
If the sequence of method calls were not synchronized the result might be wrong.
For example thread 1 is executing the method addToMap() and thread 2 is executing the method doWork()
The sequence of method calls on the synchronizedMap object might be as follows:
Thread 1 has executed the method
synchronizedMap.containsKey(key)
and the result is "true".
After that, the operating system has switched execution control to thread 2 and it has executed
synchronizedMap.remove(key)
After that, execution control has been switched back to thread 1 and it has executed, for example,
synchronizedMap.get(key).add(value);
believing the synchronizedMap object still contains the key. A NullPointerException will be thrown because synchronizedMap.get(key) returns null.
If the sequence of method calls on the synchronizedMap object is not dependent on the results of each other then you don't need to synchronize the sequence.
For example you don't need to synchronize this sequence:
synchronizedMap.put(key1, valuesList1);
synchronizedMap.put(key2, valuesList2);
Here
synchronizedMap.put(key2, valuesList2);
method call does not rely on the results of the previous
synchronizedMap.put(key1, valuesList1);
method call (it does not care if some thread has interfered in between the two method calls and for example has removed the key1).
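The interleaving described in this answer can be replayed deterministically on a single thread, which makes the stale check easy to see. This is only an illustration: the class name CheckThenActReplay is made up, and the "thread 1" and "thread 2" labels in the comments mark which thread would have made each call in the real race.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; replays the race without real threads.
class CheckThenActReplay {
    static boolean replay() {
        Map<String, List<String>> map =
                Collections.synchronizedMap(new HashMap<String, List<String>>());
        map.put("k", new ArrayList<String>());

        boolean present = map.containsKey("k"); // "thread 1": the check
        map.remove("k");                        // "thread 2": runs in between
        List<String> values = map.get("k");     // "thread 1": acts on a stale answer

        // The check said the key was present, yet get() now returns null.
        // Calling values.add(...) at this point would throw NullPointerException.
        return present && values == null;
    }

    public static void main(String[] args) {
        System.out.println("stale check observed: " + replay());
    }
}
```

Each call on its own is synchronized by the wrapper; the problem is only that nothing holds the lock across the three calls.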
That looks correct to me. If I were to change anything, I would stop using the Collections.synchronizedMap() and synchronize everything the same way, just to make it clearer.
Also, I'd replace
if (synchronizedMap.containsKey(key)) {
synchronizedMap.get(key).add(value);
}
else {
List<String> valuesList = new ArrayList<String>();
valuesList.add(value);
synchronizedMap.put(key, valuesList);
}
with
List<String> valuesList = synchronizedMap.get(key);
if (valuesList == null)
{
    valuesList = new ArrayList<String>();
    synchronizedMap.put(key, valuesList);
}
valuesList.add(value);
The way you have synchronized is correct, but there is a catch.
The synchronized wrapper provided by the Collections framework ensures that individual method calls, i.e. put/get/containsKey, run mutually exclusively.
However, in the real world you would generally query the map before putting in a value. That takes two operations, and hence a synchronized block is needed. So the way you have used it is correct.
However, you could have used a concurrent implementation of Map available in the Collections framework. The benefits of ConcurrentHashMap are:
a. It has a putIfAbsent API, which does the same thing in a more efficient manner.
b. It is efficient: ConcurrentHashMap locks only the part of the map being updated, so it does not block the whole map, whereas your synchronized block locks the entire map, keys as well as values.
c. You could have passed the reference to your map object somewhere else in your codebase, where you or another developer on your team may end up using it incorrectly, e.g. by performing a check followed by a put() without locking on the map object. That sequence would not be atomic with respect to your synchronized block. A concurrent implementation offers atomic compound methods such as putIfAbsent, which removes much of that risk.
Check out Google Collections' Multimap, e.g. page 28 of this presentation.
If you can't use that library for some reason, consider using ConcurrentHashMap instead of a synchronized HashMap; it has a nifty putIfAbsent(K,V) method with which you can atomically add the element list if it's not already there. Also, consider using CopyOnWriteArrayList for the map values if your usage patterns warrant doing so.
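A minimal sketch of the putIfAbsent plus CopyOnWriteArrayList combination suggested here might look as follows. The class name ConcurrentMultimapSketch and its methods are hypothetical, and the sketch assumes keys are only ever added, never removed.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical add-only multimap; a sketch under the assumptions stated above.
class ConcurrentMultimapSketch {
    private final ConcurrentHashMap<String, List<String>> map = new ConcurrentHashMap<>();

    void add(String key, String value) {
        List<String> fresh = new CopyOnWriteArrayList<>();
        // putIfAbsent returns the list already mapped to the key,
        // or null if our fresh list was installed.
        List<String> existing = map.putIfAbsent(key, fresh);
        List<String> target = (existing != null) ? existing : fresh;
        target.add(value); // the list itself handles concurrent adds
    }

    List<String> get(String key) {
        List<String> values = map.get(key);
        return values == null ? Collections.<String>emptyList() : values;
    }
}
```

If entries are also removed and processed, as in the original doWork(), this is not sufficient on its own: a value could be appended to a list that another thread has already removed from the map.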
Related
Suppose I have the following code:
private ConcurrentHashMap<Integer, Book> shelf;
public Library(ConcurrentHashMap<Integer, Book> shelf){
this.shelf = new ConcurrentHashMap<Integer, Book>(shelf);
}
Given that I'm using a thread safe collection would the following method be okay to use or do I need to worry about thread safety?
public void addBook(int index, Book add){
shelf.put(index, add);
}
If the above method isn't safe to use, would adding synchronized be the proper way of doing it? Like so,
public synchronized void addBook(int index, Book add){
shelf.put(index, add);
}
You don't need to worry if you are ONLY calling shelf.put. Since put is already thread-safe, you are OK.
You would need to worry about synchronized when you are doing multiple operations that together need to be atomic. For example, maybe you had a method called updateBook that looks like
public void updateBook(int index, String newTitle){
Book book = shelf.get(index);
// do something with book or maybe update book.setTitle(newTitle);
shelf.put(index, book);
}
This method would have to be synchronized because otherwise another thread can get a book that is not updated yet.
The synchronized keyword essentially puts a mutex lock around the entire addBook method.
A ConcurrentHashMap ensures that all operations (such as put) are thread-safe, but using retrieval operations (such as get) in conjunction with them might lead to a situation where you are retrieving contents from the map at the same time that you are putting, and get unexpected results.
Individually, all methods in the ConcurrentHashMap are thread-safe, but when they are used in conjunction in separate threads you cannot necessarily be certain of the order in which they execute. (Thanks to @jtahlborn for clarification).
So, in your specific case, adding the synchronized keyword to the addBook method is redundant.
If you're doing more complex operations involving multiple retrievals and puts, you may want to consider some external locking (your own mutex).
See: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentHashMap.html
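For a compound update like the updateBook example in the answers above, one alternative to an external lock is to let the map perform the read-modify-write itself. The sketch below assumes Java 8 or later and uses ConcurrentHashMap.computeIfPresent; the immutable Book class with a title and a revision counter is hypothetical, since the question does not show Book.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical immutable Book; the question does not show this class.
final class Book {
    final String title;
    final int revision;

    Book(String title, int revision) {
        this.title = title;
        this.revision = revision;
    }
}

class Library {
    private final ConcurrentHashMap<Integer, Book> shelf = new ConcurrentHashMap<>();

    void addBook(int index, Book book) {
        shelf.put(index, book); // a single put is already atomic
    }

    // Reads the old book and stores its replacement as one atomic step
    // for this key. Returns false when no book is stored at index.
    boolean updateTitle(int index, String newTitle) {
        Book updated = shelf.computeIfPresent(index,
                (key, old) -> new Book(newTitle, old.revision + 1));
        return updated != null;
    }

    Book getBook(int index) {
        return shelf.get(index);
    }
}
```

Keeping Book immutable means a reader that already holds a reference never sees a half-updated object; it sees either the old book or the new one.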
I am a bit confused regarding one pattern I have seen in some legacy code of ours.
The controller uses a map as a cache, with an approach that should be thread safe, however I am still not confident it indeed is. We have a map, which is properly synchronized during addition and retrieval, however, there is a bit of logic outside of the synchronized block, that does some additional filtering.
(the map itself and the lists are never accessed outside of this method, so concurrent modification is not an issue; the map holds some stable parameters, which basically never change, but are used often).
The code looks like the following sample:
public class FooBarController {
    private final Map<String, List<FooBar>> fooBarMap =
            new HashMap<String, List<FooBar>>();

    public FooBar getFooBar(String key, String foo, String bar) {
        List<FooBar> foobarList;
        synchronized (fooBarMap) {
            if (fooBarMap.get(key) == null) {
                foobarList = queryDbByKey(key);
                fooBarMap.put(key, foobarList);
            } else {
                foobarList = fooBarMap.get(key);
            }
        }
        for(FooBar fooBar : foobarList) {
            if(foo.equals(fooBar.getFoo()) && bar.equals(fooBar.getBar()))
                return fooBar;
        }
        return null;
    }

    private List<FooBar> queryDbByKey(String key) {
        // ... (simple Hibernate-query)
    }
    // ...
}
Based on what I know about the JVM memory model, this should be fine, since if one thread populates a list, another one can only retrieve it from the map with proper synchronization in place, ensuring that the entries of the list are visible. (putting the list happens-before getting it)
However, we keep seeing cases, where an entry expected to be in the map is not found, combined with the typical notorious symptoms of concurrency issues (e.g. intermittent failures in production, which I cannot reproduce in my development environment; different threads can properly retrieve the value etc.)
I am wondering if iterating through the elements of the List like this is thread-safe?
The code you provided is correct in terms of concurrency. Here are the guarantees:
only one thread at a time adds values to map, because of synchronization on map object
values added by thread become visible for all other threads, that enter synchronized block
Given that, you can be sure that all threads that iterate a list see the same elements. The issues you described are indeed strange but I doubt they're related to the code you provided.
It can be thread safe only if all accesses to fooBarMap are synchronized. A little out of scope, but it may be safer to use a ConcurrentHashMap.
There is a great article on how hashmaps can be synchronized here.
In a situation like this, the best option is to use ConcurrentHashMap.
Verify that all updates and reads happen in the expected order.
As I understand from your question, there is a fixed set of parameters that never changes. One approach I prefer in a situation like this is:
I. Create the map cache during startup and keep only one instance of it.
II. Read the map instance anytime, anywhere in the application.
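The ConcurrentHashMap suggestion from the answers above could be sketched roughly as follows, assuming Java 8 or later. The class name ListCache is hypothetical, and the loader function stands in for queryDbByKey(); storing an unmodifiable copy is an extra precaution added here, not something the original code does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache class; the loader stands in for queryDbByKey().
class ListCache<V> {
    private final ConcurrentHashMap<String, List<V>> cache = new ConcurrentHashMap<>();
    private final Function<String, List<V>> loader;

    ListCache(Function<String, List<V>> loader) {
        this.loader = loader;
    }

    List<V> get(String key) {
        // computeIfAbsent invokes the loader only when the key is missing
        // and publishes the stored list safely to other threads.
        // The loader is assumed to return a non-null list.
        return cache.computeIfAbsent(key,
                k -> Collections.unmodifiableList(new ArrayList<V>(loader.apply(k))));
    }
}
```

Since the cached lists cannot be modified after they are stored, iterating over them outside any lock is safe in this sketch. The loader runs while the map holds an internal lock for that entry, so it should stay short.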
In the for loop you are returning a reference to a fooBar object in the foobarList.
So the method calling getFooBar() has access to an object stored in the map through this fooBar reference.
Try to clone fooBar before returning it from getFooBar().
Here is a short bit of text from the Oracle Java Tutorials:
“Synchronized Statements
Another way to create synchronized code is with synchronized statements. Unlike synchronized methods, synchronized statements must specify the object that provides the intrinsic lock:
public void addName(String name) {
synchronized(this) {
lastName = name;
nameCount++;
}
nameList.add(name);
}
In this example, the addName method needs to synchronize changes to lastName and nameCount, but also needs to avoid synchronizing invocations of other objects' methods. (Invoking other objects' methods from synchronized code can create problems that are described in the section on Liveness.) Without synchronized statements, there would have to be a separate, unsynchronized method for the sole purpose of invoking nameList.add.”
I understand their point about the flexibility Synchronized gives. But why did Oracle decide that nameList.add did not need to be synchronized? More generally, how can I determine which objects methods need to be synchronized and which don't?
Synchronizing has its price, performance wise. The general rule of thumb (which the JDK itself also follows, in this case too) is not to synchronize unless it's absolutely required.
Since there are many, many cases where you'd want to add an element to a list without requiring synchronization (e.g., if said list is a local variable), it was not defined as synchronized. When you need to synchronize such an operation, you can always use a CopyOnWriteArrayList, or synchronize your access explicitly.
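To illustrate the CopyOnWriteArrayList option mentioned above: its iterators work on a snapshot of the list, so the list can be modified during iteration without explicit locking. The class name SnapshotIterationDemo below is made up for this sketch.

```java
import java.util.Arrays;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class for snapshot iteration.
class SnapshotIterationDemo {
    // Adds one element per visited element and returns how many were visited.
    static int iterateWhileAdding(CopyOnWriteArrayList<String> names) {
        int visited = 0;
        for (String name : names) {
            // Allowed: the iterator sees the list as it was when the loop
            // started, so this add does not affect the ongoing iteration.
            names.add(name + "-copy");
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        CopyOnWriteArrayList<String> names =
                new CopyOnWriteArrayList<>(Arrays.asList("a", "b", "c"));
        int visited = iterateWhileAdding(names);
        System.out.println(visited + " visited, " + names.size() + " stored");
    }
}
```

Every write copies the underlying array, so this class tends to suit lists that are read far more often than they are modified.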
Synchronize based on what you know the threads will access.
If they are trying to access a list then synchronize the getter. In this case, Oracle was synchronizing the call to the method ONLY.
If they are trying to modify the list, then synchronize on the list reference in your block!
So:
synchronized(this.nameList){
//now I'm safe to modify
nameList.add(name);
}
You can create a thread-safe list of Strings like this:
List<String> list = Collections.synchronizedList(new ArrayList<String>());
The methods add() and remove() are thread safe. If you don't need to iterate over the elements, you're just fine.
Note: It is imperative that you manually synchronize on the returned collection when iterating over it:
synchronized (list) {
    Iterator<String> i = list.iterator(); // Must be in the synchronized block
    while (i.hasNext())
        yourmethod(i.next()); // Your logic on elements...
}
Before using read the docs here: Synchronized Collection
There is one code snippet in the 4th chapter in Java Concurrency in Practice
public class ListHelper<E> {
    public List<E> list =
        Collections.synchronizedList(new ArrayList<E>());
    ...
    public synchronized boolean putIfAbsent(E x) {
        boolean absent = !list.contains(x);
        if (absent)
            list.add(x);
        return absent;
    }
}
It says this is not thread safe because it synchronizes on a different lock: putIfAbsent is not atomic relative to other operations on the List.
But I think "synchronized" preventing multithreads enter putIfAbsent, if there are other methods that do other operations on the List, key word synchronized should also be as the method attribute. So following this way, should it be thread safe? Under what case "it is not atomic"?
putIfAbsent is not atomic relative to other operations on the List. But I think "synchronized" preventing multithreads enter putIfAbsent
This is true, but there is no guarantee that threads are not accessing the list in other ways. The list field is public (which is always a bad idea), which means that other threads can call methods on the list directly. To properly protect the list, you should make it private and add add(...) and other methods to your ListHelper that are also synchronized, so that you fully control all access to the list.
// we are synchronizing the list so no reason to use Collections.synchronizedList
private List<E> list = new ArrayList<E>();
...
public synchronized boolean add(E e) {
return list.add(e);
}
If the list is private and all of the methods are synchronized that access the list then you can remove the Collections.synchronizedList(...) since you are synchronizing it yourself.
if there are other methods that do other operations on the List, key word synchronized should also be as the method attribute. So following this way, should it be thread safe?
Not sure I fully parse this part of the question. But if you make the list be private and you add other methods to access the list that are all synchronized then you are correct.
Under what case "it is not atomic"?
putIfAbsent(...) is not atomic because there are multiple calls to the synchronized-list. If multiple threads are operating on the list then another thread could have called list.add(...) between the time putIfAbsent(...) called list.contains(x) and then calls list.add(x). The Collections.synchronizedList(...) protects the list against corruption by multiple threads but it cannot protect against race-conditions when there are multiple calls to list methods that could interleave with calls from other threads.
Any unsynchronized method that modifies the list may introduce the absent element after list.contains() returns false, but before the element has been added.
Picture this as two threads:
boolean absent = !list.contains(x); // Returns true
-> list.add(theSameElementAsX); // Another thread
if(absent) // absent is true, but the list has been modified!
list.add(x);
return absent;
This could be accomplished with simply a method as follows:
public void add(E e) {
list.add(e);
}
If the method were synchronized, there would be no problem, since the add method wouldn't be able to run before putIfAbsent() was fully finished.
A proper correction would include making the List private, and making sure that compound operations on it are properly synchronized (i.e. on the class or the list itself).
Thread safety is not composable! Imagine a program built entirely out of "thread safe" classes. Is the program itself "thread safe?" Not necessarily. It depends on what the program does with those classes.
The synchronizedList wrapper makes each individual method of a List "thread safe". What does that mean? It means that none of those wrapped methods can corrupt the internal structure of the list when called in a multi-threaded environment.
That doesn't protect the way in which any given program uses the list. In the example code, the list appears to be used as an implementation of a set: The program doesn't allow the same object to appear in the list more than one time. There's nothing in the synchronizedList wrapper that will enforce that particular guarantee though, because that guarantee has nothing to do with the internal structure of the list. The list can be perfectly valid as a list, but not valid as a set.
That's why the additional synchronization on the putIfAbsent() method is needed.
Collections.synchronizedList() creates a collection that synchronizes every one of its methods on a private mutex. For the one-argument factory used in the example, this mutex is the returned list itself (its this); a mutex can be supplied through a two-argument factory, but that overload is internal to java.util. That's why we need an external lock to make the consecutive contains() and add() calls atomic.
If the list is accessible directly, not only via ListHelper, this code is broken, because access is guarded by different locks in that case. To prevent that, make the list private so it cannot be accessed directly, and wrap all necessary API with synchronization on the same mutex, either one declared in ListHelper or the this of ListHelper itself.
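One way to apply the "same lock" idea from the answers above is client-side locking on the synchronized list itself, which is the lock the wrapper's own methods use. The sketch below is an assumption-laden illustration: the class name SafeListHelper and the race() helper are hypothetical, and the list is kept private as the answers recommend.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical variant of ListHelper that locks on the list, not on this.
class SafeListHelper<E> {
    private final List<E> list = Collections.synchronizedList(new ArrayList<E>());

    boolean putIfAbsent(E x) {
        synchronized (list) { // same lock as list.contains() and list.add()
            boolean absent = !list.contains(x);
            if (absent) {
                list.add(x);
            }
            return absent;
        }
    }

    int size() {
        return list.size();
    }

    // Starts several threads that all try to insert the same element
    // and returns the resulting list size.
    static int race(int threads) throws InterruptedException {
        SafeListHelper<String> helper = new SafeListHelper<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> helper.putIfAbsent("x"));
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return helper.size();
    }
}
```

Because contains() and add() now run under the list's own lock, no other operation on the list can slip in between them, so race() should always report a size of 1.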
If I do something to a list inside a synchronized block, does it prevent other threads from accessing that list elsewhere?
List<String> myList = new ArrayList<String>();
synchronized {
myList.add("Hello");
}
Does this prevent other threads from iterating over myList and removing/adding values?
I'm looking to add/remove values from a list, but at the same time protect it from other threads/methods from iterating over it (as the values in the list might be invalidated)
No, it does not.
The synchronized block only prevents other threads from entering the block (more accurately, it prevents other threads from entering all blocks synchronized on the same object instance - in this case blocks synchronized on this).
You need to use the instance you want to protect in the synchronized block:
synchronized(myList) {
myList.add("Hello");
}
The whole area is quite well explained in the Java tutorial:
http://download.oracle.com/javase/tutorial/essential/concurrency/syncmeth.html
Yes, but only if all other accesses to myList are protected by synchronized blocks on the same object. The code sample you posted is missing an object on which you synchronize (i.e., the object whose mutex lock you acquire). If you synchronize on different objects or fail to synchronize at all in one instance, then other threads may very well access the list concurrently. Therefore, you must ensure that all threads have to enter a synchronized block on the same object (e.g., using synchronized (myList) { ... } consistently) before accessing the list. In fact, there is already a factory method that will wrap each method of your list with synchronized methods for you: Collections.synchronizedList.
However, while Collections.synchronizedList makes each method of your list individually synchronized, that doesn't necessarily mean that your application's invariants are maintained. Individually synchronizing each method of the list will ensure that the list's internal state remains consistent, but your application may need more, in which case you will need to write some more complex synchronization logic or see if you can take advantage of the Concurrency API (highly recommended).
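A small sketch of the "every access goes through the same lock" rule: the list is private, each method takes the same lock, and iteration is done on a copy so callers never walk the live list. The class name GuardedNames and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper; all access to the list goes through one lock.
class GuardedNames {
    private final Object lock = new Object();
    private final List<String> names = new ArrayList<>(); // guarded by lock

    void add(String name) {
        synchronized (lock) {
            names.add(name);
        }
    }

    boolean remove(String name) {
        synchronized (lock) {
            return names.remove(name);
        }
    }

    // Returns a copy taken under the lock; callers can iterate it freely
    // while other threads keep adding and removing.
    List<String> snapshot() {
        synchronized (lock) {
            return new ArrayList<>(names);
        }
    }
}
```

Returning a snapshot trades some copying cost for never holding the lock while caller code runs; holding the lock for the whole iteration is the alternative when the copy would be too large.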
Here, synchronized makes sure that only one thread at a time is adding Hello to myList...
To be more specific about which object you synchronize on, you can use
synchronized( myList ) //object name
{
//other code
}
vinod
From my limited understanding of concurrency control in Java I would say that it is unlikely that the code above would present the behaviour you are looking for.
The synchronised block would use the lock of whatever object you are calling said code in, which would in no way stop any other code from accessing that list unless said other code was also synchronised using the same lock object.
I have no idea if this would work, or if it's in any way advised, but I think that:
List<String> myList = new ArrayList<String>();
synchronized(myList) {
myList.add("Hello");
}
would give the behaviour you describe, by synchronizing on the lock object of the list itself.
However, the Java documentation recommends this way to get a synchronized list:
List list = Collections.synchronizedList(new ArrayList(...));
See: http://download.oracle.com/javase/1.4.2/docs/api/java/util/ArrayList.html