Using Synchronized with Thread-Safe Collection? - java

Suppose I have the following code:
private ConcurrentHashMap<Integer, Book> shelf;
public Library(ConcurrentHashMap<Integer, Book> shelf){
    this.shelf = new ConcurrentHashMap<Integer, Book>(shelf);
}
Given that I'm using a thread safe collection would the following method be okay to use or do I need to worry about thread safety?
public void addBook(int index, Book add){
    shelf.put(index, add);
}
If the above method isn't safe to use, would adding synchronized be the proper way of doing it? Like so,
public synchronized void addBook(int index, Book add){
    shelf.put(index, add);
}

You don't need to worry if you are ONLY calling shelf.put. Since put is already thread-safe, you are OK.
You would need to worry about synchronized when you are doing multiple operations that together need to be atomic. For example, maybe you had a method called updateBook that looks like
public void updateBook(int index, String newTitle){
    Book book = shelf.get(index);
    // do something with book or maybe update book.setTitle(newTitle);
    shelf.put(index, book);
}
This method would have to be synchronized because otherwise another thread could get a book that has not been updated yet, or replace the entry between the get and the put and have its change overwritten.
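If the shelf is a ConcurrentHashMap, an alternative to synchronizing the whole method is to let the map do the read-modify-write atomically for that key. A minimal sketch, assuming a hypothetical immutable Book with a single title field (the question does not show the real class) and a made-up titleAt helper:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical immutable Book; the real class is not shown in the question.
final class Book {
    final String title;
    Book(String title) { this.title = title; }
}

class Library {
    private final ConcurrentHashMap<Integer, Book> shelf = new ConcurrentHashMap<>();

    void addBook(int index, Book book) {
        shelf.put(index, book); // a single call, already atomic
    }

    // computeIfPresent applies the function atomically for this key, so no
    // other thread can replace the entry between the read and the write.
    void updateBook(int index, String newTitle) {
        shelf.computeIfPresent(index, (key, old) -> new Book(newTitle));
    }

    // Illustrative accessor, returns null when the slot is empty.
    String titleAt(int index) {
        Book book = shelf.get(index);
        return book == null ? null : book.title;
    }
}
```

If Book stays mutable and is changed in place with setTitle, the map cannot help with visibility of that change; Book itself would then need to be thread-safe.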

The synchronized keyword essentially puts a mutex lock around the entire addBook method.
A ConcurrentHashMap ensures that each individual operation (such as put) is thread-safe, but if you combine it with retrieval operations (such as get), you might retrieve contents from the map at the same time that another thread is putting, and get unexpected results.
Individually, all methods in the ConcurrentHashMap are thread-safe, but when they are used together from separate threads you cannot be certain of the order in which they execute. (Thanks to @jtahlborn for clarification).
So, in your specific case, adding the synchronized keyword to the addBook method is redundant.
If you're doing more complex operations involving multiple retrievals and puts, you may want to consider some external locking (your own mutex).
See: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentHashMap.html
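For the common "check, then put" combination, external locking can often be avoided by using one of the map's own atomic compound methods. A small sketch; the ShelfDemo class and its method names are invented for illustration, and String stands in for Book:

```java
import java.util.concurrent.ConcurrentHashMap;

class ShelfDemo {
    private final ConcurrentHashMap<Integer, String> shelf = new ConcurrentHashMap<>();

    // Racy: another thread may put between containsKey and put,
    // and this call would then silently overwrite its value.
    boolean addIfFreeRacy(int index, String title) {
        if (!shelf.containsKey(index)) {
            shelf.put(index, title);
            return true;
        }
        return false;
    }

    // Atomic: putIfAbsent returns null only when this call stored the value.
    boolean addIfFree(int index, String title) {
        return shelf.putIfAbsent(index, title) == null;
    }

    String get(int index) {
        return shelf.get(index);
    }
}
```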

Related

Why is nameList.add not synchronized?

Here is a short bit of text from the Oracle Java Tutorials:
“Synchronized Statements
Another way to create synchronized code is with synchronized statements. Unlike synchronized methods, synchronized statements must specify the object that provides the intrinsic lock:
public void addName(String name) {
    synchronized(this) {
        lastName = name;
        nameCount++;
    }
    nameList.add(name);
}
In this example, the addName method needs to synchronize changes to lastName and nameCount, but also needs to avoid synchronizing invocations of other objects' methods. (Invoking other objects' methods from synchronized code can create problems that are described in the section on Liveness.) Without synchronized statements, there would have to be a separate, unsynchronized method for the sole purpose of invoking nameList.add.”
I understand their point about the flexibility synchronized statements give. But why did Oracle decide that nameList.add did not need to be synchronized? More generally, how can I determine which objects' methods need to be synchronized and which don't?
Synchronizing has its price, performance wise. The general rule of thumb (which the JDK itself also follows, in this case too) is not to synchronize unless it's absolutely required.
Since there are many, many cases where you'd want to add an element to a list without requiring synchronization (e.g., if said list is a local variable), it was not defined as synchronized. When you need to synchronize such an operation, you can always use a CopyOnWriteArrayList, or synchronize your access explicitly.
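As a rough illustration of the CopyOnWriteArrayList option: its iterators work on a snapshot of the backing array, so a loop needs no lock and is unaffected by concurrent adds. The NameRegistry class below is a made-up example, not the tutorial's code:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class NameRegistry {
    private final List<String> names = new CopyOnWriteArrayList<>();

    void addName(String name) {
        names.add(name); // each add copies the backing array, so writes are costly
    }

    // The for-each loop iterates over the snapshot taken when the iterator
    // was created; adding inside the loop does not disturb it. A plain
    // ArrayList would fail with ConcurrentModificationException on the next
    // iteration step.
    int countWhileAdding() {
        int seen = 0;
        for (String name : names) {
            names.add(name + "-copy");
            seen++;
        }
        return seen;
    }

    int size() {
        return names.size();
    }
}
```

This trade-off tends to suit lists that are read far more often than they are written.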
Synchronize based on what you know the threads will access.
If they are trying to access a list then synchronize the getter. In this case, Oracle was synchronizing the call to the method ONLY.
If they are trying to modify the list, then synchronize on the list itself inside your block!
So:
synchronized(this.nameList){
    //now I'm safe to modify
    nameList.add(name);
}
You can create a thread safe List of Strings like this:
List<String> list = Collections.synchronizedList(new ArrayList<String>());
The methods add() and remove() are thread safe. If you don't need to iterate over the elements you're just fine.
Note: It is imperative that you manually synchronize on the returned collection when iterating over it:
synchronized (list) {
    Iterator<String> i = list.iterator(); // Must be in the synchronized block
    while (i.hasNext())
        yourmethod(i.next()); // Your logic on elements...
}
Before using it, read the docs here: Synchronized Collection

Non-thread-safe Attempt to Implement Put-if-absent?

There is one code snippet in the 4th chapter in Java Concurrency in Practice
public class ListHelper<E> {
    public List<E> list =
        Collections.synchronizedList(new ArrayList<E>());
    ...
    public synchronized boolean putIfAbsent(E x) {
        boolean absent = !list.contains(x);
        if (absent)
            list.add(x);
        return absent;
    }
}
It says this is not thread safe because it uses different locks: putIfAbsent is not atomic relative to other operations on the List.
But I think "synchronized" prevents multiple threads from entering putIfAbsent at the same time, and if there are other methods that do other operations on the List, the keyword synchronized should also be added to them as a method attribute. Done this way, should it be thread safe? In what case is it "not atomic"?
putIfAbsent is not atomic relative to other operations on the List. But I think "synchronized" prevents multiple threads from entering putIfAbsent
This is true, but there is no guarantee that threads are not accessing the list in other ways. The list field is public (which is always a bad idea), which means that other threads can call methods on the list directly. To properly protect the list, you should make it private and add add(...) and other methods to your ListHelper that are also synchronized, to fully control all access to the list.
// we are synchronizing the list so no reason to use Collections.synchronizedList
private List<E> list = new ArrayList<E>();
...
public synchronized boolean add(E e) {
    return list.add(e);
}
If the list is private and all of the methods are synchronized that access the list then you can remove the Collections.synchronizedList(...) since you are synchronizing it yourself.
if there are other methods that do other operations on the List, the keyword synchronized should also be added to them as a method attribute. Done this way, should it be thread safe?
Not sure I fully parse this part of the question. But if you make the list be private and you add other methods to access the list that are all synchronized then you are correct.
Under what case "it is not atomic"?
putIfAbsent(...) is not atomic because there are multiple calls to the synchronized-list. If multiple threads are operating on the list then another thread could have called list.add(...) between the time putIfAbsent(...) called list.contains(x) and then calls list.add(x). The Collections.synchronizedList(...) protects the list against corruption by multiple threads but it cannot protect against race-conditions when there are multiple calls to list methods that could interleave with calls from other threads.
Any unsynchronized method that modifies the list may introduce the absent element after list.contains() returns false, but before the element has been added.
Picture this as two threads:
boolean absent = !list.contains(x); // Returns true
-> list.add(theSameElementAsX); // Another thread
if(absent) // absent is true, but the list has been modified!
list.add(x);
return absent;
This could be accomplished with simply a method as follows:
public void add(E e) {
    list.add(e);
}
If the method were synchronized, there would be no problem, since the add method wouldn't be able to run before putIfAbsent() was fully finished.
A proper correction would include making the List private, and making sure that compound operations on it are properly synchronized (i.e. on the class or the list itself).
Thread safety is not composable! Imagine a program built entirely out of "thread safe" classes. Is the program itself "thread safe?" Not necessarily. It depends on what the program does with those classes.
The synchronizedList wrapper makes each individual method of a List "thread safe". What does that mean? It means that none of those wrapped methods can corrupt the internal structure of the list when called in a multi-threaded environment.
That doesn't protect the way in which any given program uses the list. In the example code, the list appears to be used as an implementation of a set: The program doesn't allow the same object to appear in the list more than one time. There's nothing in the synchronizedList wrapper that will enforce that particular guarantee though, because that guarantee has nothing to do with the internal structure of the list. The list can be perfectly valid as a list, but not valid as a set.
That's why the additional synchronization on the putIfAbsent() method.
Collections.synchronizedList() creates a collection that synchronizes every one of its methods on an internal mutex. For the one-argument factory used in the example, this mutex is the returned list itself (its this); a two-argument factory that accepts the mutex also exists, but it is not part of the public API. That's why we need an external lock to make the successive contains() and add() calls atomic.
If the list is accessible directly, not only via ListHelper, this code is broken, because access is then guarded by different locks. To prevent that, make the list private to rule out direct access, and wrap all necessary API with synchronization on the same mutex, either one declared in ListHelper or the this of ListHelper itself.
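Putting these answers together, one possible correction keeps the synchronized wrapper, makes the field private, and locks on the list itself for the compound operation, so that putIfAbsent and the wrapper's own methods share a single monitor. This is a sketch under those assumptions; the add and size methods are added only for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ListHelper<E> {
    // Private, so every access goes through this class.
    private final List<E> list = Collections.synchronizedList(new ArrayList<E>());

    // Locks on the list, the same monitor the wrapper returned by the
    // one-argument synchronizedList uses for its own methods.
    boolean putIfAbsent(E x) {
        synchronized (list) {
            boolean absent = !list.contains(x);
            if (absent) {
                list.add(x);
            }
            return absent;
        }
    }

    // Single call; the wrapper synchronizes it on the same monitor, so it
    // cannot run between contains and add above.
    void add(E x) {
        list.add(x);
    }

    int size() {
        return list.size();
    }
}
```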

Java thread safe, data races and good implementation

I'm coding a Spring controller that declares some private HashMaps and updates them in some methods. Of course, I'm aware of the problem of concurrent access, so I'm using the simplest way to avoid it: Java's "synchronized" keyword. But I was wondering whether I should synchronize my method or only the HashMap I need to update in a thread-safe way:
@Controller
public class myController{
    private HashMap<String, String> myHashMap = new HashMap<String, String>();
    ...
    //That way ?
    public synchronized void updateMyHashmap(){
        myHashMap.put(key, value);
    }
    //or this way ?
    public static void updateMyHashMap(){
        synchronized(myHashMap){
            myHashMap.put(key, value);
        }
    }
}
Are these methods equivalent? Will my application behave the same way?
These methods are not equivalent.
synchronized on a method synchronizes on this, so:
public synchronized void updateMyHashmap(){
    myHashMap.put(key, value);
}
is equivalent to:
public void updateMyHashmap(){
    synchronized(this) {
        myHashMap.put(key, value);
    }
}
Here, this is the instance of myController. (Side-note: it's usually recommended to start class names in Java with a capital letter).
Your second method is incorrect and shouldn't compile, since you're accessing a non-static member (myHashMap) from a static method.
Assuming it wasn't static, it would synchronize on the hashmap and not the myController instance.
What you want to synchronize on will almost certainly depend on what else you want to do in this synchronized block (for example, where does this value come from, did it have to be taken from somewhere else, and does the whole operation need to be synchronized using the myController instance).
I would suggest reading Java Concurrency in Practice to learn more about synchronization problems.
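To see which monitor each form actually takes, Thread.holdsLock can be queried from inside the critical section. A small sketch; the LockTargets class and its method names are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

class LockTargets {
    private final Map<String, String> map = new HashMap<>();

    // A synchronized instance method holds the monitor of this, not of map.
    // Returns { holds lock on this, holds lock on map }.
    synchronized boolean[] viaMethod() {
        return new boolean[] { Thread.holdsLock(this), Thread.holdsLock(map) };
    }

    // A synchronized block holds the monitor of whatever object it names.
    boolean[] viaBlock() {
        synchronized (map) {
            return new boolean[] { Thread.holdsLock(this), Thread.holdsLock(map) };
        }
    }
}
```

Because the two forms use different monitors, a thread inside viaMethod does not keep another thread out of viaBlock, which is why mixing them on the same data is unsafe.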
If the myHashMap variable is private and only accessed via your myController methods, then you only need to synchronize the methods. To put it another way, if you synchronize your methods, they will be thread safe, but if you then directly access your myHashMap variable without using a synchronize block/method then you can subvert the lock.
To put it in even simpler terms. If you have a room with a door that allows one person in at a time then you'll only ever have one person in the room at a time... but if you put a window in the room then you can still jump into the room via the window. :)
EDIT: To elaborate: by placing synchronized on the method as a keyword, your critical section covers the entire method. That means that from the start to the end of the method call, no other thread can run any synchronized method on that object. By using synchronized blocks, you can hop in and out of critical sections as you need to. Say you had a method with 100 lines, of which only 10 needed a shared resource. If you put synchronized on the method, the object would be locked for the whole 100 lines of code, but if you put the lock around only the 10 lines that need it, you'd only have a critical section on those 10 lines.
It's all about the situation you're in. For the example you've given there is absolutely no difference.
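For a longer method the difference does show. A sketch of the narrower critical section, using a hypothetical Cache class that is not part of the question:

```java
import java.util.HashMap;
import java.util.Map;

class Cache {
    private final Map<String, Integer> lengths = new HashMap<>();

    void record(String raw) {
        // Work on local variables only: no shared state, so no lock needed.
        String key = raw.trim().toLowerCase();
        int length = key.length();

        // Only the access to the shared map forms the critical section.
        synchronized (this) {
            lengths.put(key, length);
        }
    }

    // Readers take the same monitor (this) as the block in record.
    synchronized Integer lengthOf(String key) {
        return lengths.get(key);
    }
}
```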
I would use a ConcurrentHashMap and not bother manually synchronizing access.
Making methods static won't help you much here; especially since you need an object to synchronize on, which is normally this.
These two are different things, but they can be used for the same kind of purpose.
See here,
//Here you are locking your method
public synchronized void updateMyHashmap(){
    System.out.println("Inside method");
    //some code
    myHashMap.put(key, value);
}
Now in this method
public void updateMyHashmap(){
    //Some code here
    System.out.println("Inside method");
    synchronized(this) {
        myHashMap.put(key, value);
    }
}
So I think you can now easily see the difference between these two strategies. It depends on what you want to do and according to that you can choose any one of them.
It is more advisable to synchronize on the object you want to lock.

Synchronized Map or synchronized methods

I have the following class for a Router's table with synchronised methods:
public class RouterTable {
    private String tableForRouter;
    private Map<String,RouterTableEntry> table;
    public RouterTable(String router){
        tableForRouter = router;
        table = new HashMap<String,RouterTableEntry>();
    }
    public String owner(){
        return tableForRouter;
    }
    public synchronized void add(String network, String ipAddress, int distance){
        table.put(network, new RouterTableEntry(ipAddress, distance));
    }
    public synchronized boolean exists(String network){
        return table.containsKey(network);
    }
}
Multiple threads will read and write to the HashMap. I was wondering if it would be best to remove the synchronized on the methods and just use Collections.synchronizedMap(new HashMap<String,RouterTableEntry>()). What is the most sensible way in Java to do this?
I would suggest using a ConcurrentHashMap. This is a newer data structure, introduced in Java 5. It provides thread safety and allows concurrent operations, as opposed to a synchronized map, which will do one operation at a time.
If the map is the only place where thread safety is required, then just using the ConcurrentHashMap is fine. However, if you have atomic operations involving more state variables, I would suggest using synchronized code blocks instead of synchronized methods.
In the absence of strict requirements about happens-before relationships and point in time correctness, the sensible thing to do in modern java is usually just use a ConcurrentMap.
Otherwise, yes, using a Collections#synchronizedMap is both safer and likely more performant (because you won't enclose any tertiary code that doesn't need synchronization) than manually synchronizing everything yourself.
The best is to use a java.util.concurrent.ConcurrentHashMap, which is designed from the ground up for concurrent access (read & write).
Using synchronization like you do works, but leads to high contention and therefore suboptimal performance. A collection obtained through Collections.synchronizedMap() would do just the same (it only wraps a standard collection with synchronized methods).
ConcurrentHashMap, on the contrary, uses various techniques to be thread-safe and provide good concurrency; for example, in implementations before Java 8 it has (by default) 16 segments, each guarded by a distinct lock, so that up to 16 threads can write to it concurrently.
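As a rough illustration of concurrent writers, the sketch below has several threads increment one counter through merge, which ConcurrentHashMap performs atomically per key (merge requires Java 8 or later). The HitCounter class, the "net-a" key and the runDemo helper are all invented for the example:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class HitCounter {
    private final ConcurrentHashMap<String, Integer> hits = new ConcurrentHashMap<>();

    void hit(String key) {
        hits.merge(key, 1, Integer::sum); // atomic read-modify-write for this key
    }

    int count(String key) {
        return hits.getOrDefault(key, 0);
    }

    // Starts `threads` workers that each record `perThread` hits on one key,
    // waits for them to finish, and returns the final count.
    static int runDemo(int threads, int perThread) {
        HitCounter counter = new HitCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.hit("net-a");
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.count("net-a");
    }
}
```

With a plain HashMap and a get followed by a put, the same loop would usually lose updates.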
Synchronizing the map will prevent users of your class from doing meaningful synchronization.
They will have no way of knowing if the result from exists is still valid once they get into their if statement, and will need to do external synchronization.
With the synchronized methods as you show, they could lock on your class until they are done with a block of method calls.
The other option is to do no synchronization and let the user handle that, which they need to do anyway to be safe.
Adding your own synchronization is what was wrong with Hashtable.
The current common style tends to prefer Synchronized collections over explicit synchronized qualification on the methods that access them. However, this is not set in stone, and your decision should depend on the way you use this code/will use this code in the future.
Points to consider:
(a) If your map is going to be used by code that is outside of the RouterTable then you need to use a SynchronizedMap.
(b) OTOH, if you are going to add some additional fields to RouterTable, and their values need to be consistent with the values in the map (in other words: you want changes to the map and to the additional fields to happen in one atomic step), then you need to use synchronized methods.
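Case (b) could look roughly like this. The version field and the addIfAbsent method are hypothetical additions, and a plain String stands in for RouterTableEntry to keep the sketch short:

```java
import java.util.HashMap;
import java.util.Map;

class RouterTable {
    private final Map<String, String> table = new HashMap<>(); // network -> ipAddress
    private int version; // hypothetical field that must stay in step with the map

    // The check, the put and the counter update happen under one lock,
    // so other threads never see the map changed without the version bump.
    synchronized boolean addIfAbsent(String network, String ipAddress) {
        if (table.containsKey(network)) {
            return false;
        }
        table.put(network, ipAddress);
        version++;
        return true;
    }

    synchronized boolean exists(String network) {
        return table.containsKey(network);
    }

    synchronized int version() {
        return version;
    }
}
```

A synchronized or concurrent map alone could not provide this, because the counter lives outside the map.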

Determining synchronization scope?

In trying to improve my understanding of concurrency issues, I am looking at the following scenario (Edit: I've changed the example from List to Runtime, which is closer to what I am trying):
public class Example {
    private final Object lock = new Object();
    private final Runtime runtime = Runtime.getRuntime();
    public void add(Object o) {
        synchronized (lock) { runtime.exec(program + " -add "+o); }
    }
    public Object[] getAll() {
        synchronized (lock) { return runtime.exec(program + " -list "); }
    }
    public void remove(Object o) {
        synchronized (lock) { runtime.exec(program + " -remove "+o); }
    }
}
As it stands, each method is thread safe when used standalone. Now, what I'm trying to figure out is how to handle the case where the calling class wishes to call:
for (Object o : example.getAll()) {
    // problems if multiple threads perform this operation concurrently
    example.remove(o);
}
But as noted, there is no guarantee that the state will be consistent between the call to getAll() and the calls to remove(). If multiple threads call this, I'll be in trouble. So my question is - How should I enable the developer to perform the operation in a thread safe manner? Ideally I wish to enforce the thread safety in a way that makes it difficult for the developer to avoid/miss, but at the same time not complicated to achieve. I can think of three options so far:
A: Make the lock 'this', so the synchronization object is accessible to calling code, which can then wrap the code blocks. Drawback: Hard to enforce at compile time:
synchronized (example) {
    for (Object o : example.getAll()) {
        example.remove(o);
    }
}
B: Place the combined code into the Example class - and benefit from being able to optimize the implementation, as in this case. Drawback: Pain to add extensions, and potential mixing unrelated logic:
public class Example {
    ...
    public void removeAll() {
        synchronized (lock) { runtime.exec(program + " -clear"); }
    }
}
C: Provide a Closure class. Drawback: Excess code, potentially too generous of a synchronization block, could in fact make deadlocks easier:
public interface ExampleClosure {
    public void execute(Example example);
}
public class Example {
    ...
    public void execute(ExampleClosure closure) {
        synchronized (this) { closure.execute(this); }
    }
}
example.execute(new ExampleClosure() {
    public void execute(Example example) {
        for (Object o : example.getAll()) {
            example.remove(o);
        }
    }
});
Is there something I'm missing? How should synchronization be scoped to ensure the code is thread safe?
Use a ReentrantReadWriteLock which is exposed via the API. That way, if someone needs to synchronize several API calls, they can acquire a lock outside of the method calls.
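A minimal sketch of that idea, with an in-memory list standing in for the external program from the question. The writeLock accessor and the removeEverything helper are invented names, not part of the original Example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class Example {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<Object> items = new ArrayList<>(); // stand-in for the external program

    // Exposed so callers can group several calls under one write lock.
    ReentrantReadWriteLock.WriteLock writeLock() {
        return lock.writeLock();
    }

    void add(Object o) {
        lock.writeLock().lock();
        try { items.add(o); } finally { lock.writeLock().unlock(); }
    }

    List<Object> getAll() {
        lock.readLock().lock();
        try { return new ArrayList<>(items); } finally { lock.readLock().unlock(); }
    }

    void remove(Object o) {
        lock.writeLock().lock();
        try { items.remove(o); } finally { lock.writeLock().unlock(); }
    }

    // Caller-side compound operation: the whole loop runs under one write lock.
    static void removeEverything(Example example) {
        example.writeLock().lock();
        try {
            for (Object o : example.getAll()) {
                example.remove(o);
            }
        } finally {
            example.writeLock().unlock();
        }
    }
}
```

This works because the lock is reentrant and a thread that holds the write lock may also take the read lock, so getAll and remove do not block inside removeEverything. The drawback noted for option A still applies: nothing forces callers to take the lock.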
In general, this is a classic multithreaded design issue. By synchronizing the data structure rather than synchronizing concepts that use the data structure, it's hard to avoid the fact that you essentially have a reference to the data structure without a lock.
I would recommend that locks not be done so close to the data structure. But it's a popular option.
A potential technique to make this style work is to use an editing tree-walker. Essentially, you expose a function that does a callback on each element.
// pointer to function:
// - takes Object by reference and can be safely altered
// - if returns true, Object will be removed from list
typedef bool (*callback_function)(Object *o);
public void editAll(callback_function func) {
    synchronized (lock) {
        for each element o { if (func(o)) {remove o} }
    }
}
So then your loop becomes:
bool my_function(Object *o) {
    ...
    if (some condition) return true;
    return false;
}
...
editAll(my_function);
...
The company I work for (Corensic) has test cases extracted from real bugs to verify that Jinx is finding the concurrency errors properly. This type of low-level data structure locking without higher-level synchronization is a pretty common pattern. The tree-editing callback seems to be a popular fix for this race condition.
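In Java the same callback idea might be expressed with a Predicate (Java 8 or later). This is only a sketch: the EditableStore class is invented, and an in-memory list again replaces the external program:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class EditableStore {
    private final Object lock = new Object();
    private final List<Object> items = new ArrayList<>();

    void add(Object o) {
        synchronized (lock) { items.add(o); }
    }

    // Visits every element under a single lock; elements for which the
    // callback returns true are removed. No caller ever iterates unlocked.
    void editAll(Predicate<Object> shouldRemove) {
        synchronized (lock) {
            items.removeIf(shouldRemove);
        }
    }

    int size() {
        synchronized (lock) { return items.size(); }
    }
}
```

A call such as store.editAll(o -> o instanceof String) then replaces the getAll/remove loop. As with option C, the callback runs while the lock is held, so it should stay short and avoid taking other locks.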
I think everyone is missing his real problem. Iterating over the new array of Objects and removing them one at a time is still technically unsafe (though an ArrayList implementation would not explode, it just wouldn't give the expected results).
Even with CopyOnWriteArrayList, there is the possibility that the snapshot you read is out of date relative to the current list by the time you try to remove.
The two suggestions you offered are fine (A and B). My general suggestion is B. Making a collection thread-safe is very difficult. A good way to do it is to give the client as little functionality as possible (within reason). So offering the removeAll method and removing the getAll method would suffice.
Now you can at the same time say, 'well I want to keep the API the way it is and let the client worry about additional thread-safety'. If that's the case, document thread-safety. Document the fact that a 'lookup and modify' action is both non-atomic and non-thread-safe.
Today's concurrent list implementations are all thread safe for the single functions that are offered (get, remove and add are all thread safe). Compound functions are not, though, and the best that can be done is to document how to make them thread safe.
I think j.u.c.CopyOnWriteArrayList is a good example of similar problem you're trying to solve.
JDK had a similar problem with Lists - there were various ways to synchronize on arbitrary methods, but no synchronization on multiple invocations (and that's understandable).
So CopyOnWriteArrayList actually implements the same interface but has a very special contract, and whoever calls it, is aware of it.
Similar with your solution - you should probably implement List (or whatever interface this is) and at the same time define special contracts for existing/new methods. For example, getAll's consistency is not guaranteed, and calls to .remove do not fail if o is null, or isn't inside the list, etc. If users want both combined and safe/consistent options - this class of yours would provide a special method that does exactly that (e.g. safeDeleteAll), leaving other methods close to original contract as possible.
So to answer your question: I would pick option B, but would also implement the interface your original object is implementing.
From the Javadoc for List.toArray():
The returned array will be "safe" in that no references to it are maintained by this list. (In other words, this method must allocate a new array even if this list is backed by an array). The caller is thus free to modify the returned array.
Maybe I don't understand what you're trying to accomplish. Do you want the Object[] array to always be in-sync with the current state of the List? In order to achieve that, I think you would have to synchronize on the Example instance itself and hold the lock until your thread is done with its method call AND any Object[] array it is currently using. Otherwise, how will you ever know if the original List has been modified by another thread?
You have to use the appropriate granularity when you choose what to lock. What you're complaining about in your example is too low a level of granularity, where the lock doesn't cover all the methods that have to happen together. You need to make methods that combine all the actions that need to happen together within the same lock.
Locks are reentrant so the high-level method can call low-level synchronized methods without a problem.
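A short sketch of that point, using an invented Inventory class rather than the question's Example: the high-level method holds the lock for the whole operation, and the nested calls simply reacquire the monitor the thread already owns.

```java
import java.util.ArrayList;
import java.util.List;

class Inventory {
    private final List<String> items = new ArrayList<>();

    synchronized void add(String item) { items.add(item); }

    synchronized List<String> getAll() { return new ArrayList<>(items); }

    synchronized void remove(String item) { items.remove(item); }

    // High-level operation: everything below runs under one lock on this.
    // The calls to getAll and remove reenter the same monitor without
    // blocking, and no other thread can interleave between them.
    synchronized int removeAll() {
        int removed = 0;
        for (String item : getAll()) {
            remove(item);
            removed++;
        }
        return removed;
    }
}
```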
