How to make a class with multiple state fields thread-safe? - java

I have the class below:
public class LRUCache {
    private HashMap<String,String> dataMap = new HashMap<>();
    private HashMap<String,String> analyticsMap = new HashMap<>();
    public void put(String key, String value) {
        dataMap.put(key, value);
        String date = getCurrentDateAsString();
        analyticsMap.put(key, date);
    }
    public String get(String key) {
        String date = analyticsMap.get(key);
        boolean dateExpired = isDateExpired(date);
        String value = null;
        if (!dateExpired)
            value = dataMap.get(key);
        return value;
    }
}
The class above has two HashMaps, both of which are accessed in the get and put methods. How do I make this class thread-safe?
Would synchronizing both get and put solve my problem?
In general, if a class has more than one piece of state, should I put the accesses in synchronized methods instead of replacing each map with a ConcurrentHashMap?

Merely using ConcurrentHashMap structures doesn't make your LRUCache class thread-safe. You'd need to properly control access so no other thread can modify the underlying contents when you're doing multi-step put/get operations. This can be accomplished with synchronized methods, or with ReentrantReadWriteLock read/write locks.
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html
From the official Javadoc (my highlights) https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html :
A hash table supporting full concurrency of retrievals and high
expected concurrency for updates. This class obeys the same functional
specification as Hashtable, and includes versions of methods
corresponding to each method of Hashtable. However, even though all
operations are thread-safe, retrieval operations do not entail
locking, and there is not any support for locking the entire table in
a way that prevents all access. This class is fully interoperable with
Hashtable in programs that rely on its thread safety but not on its
synchronization details.

I'd use a ReentrantLock https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantLock.html so that you can lock just the block that needs it rather than the whole method.
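A minimal sketch of the lock-based approach suggested in the answers above. The class name GuardedCache and the ttlMillis expiry rule are assumptions made for illustration: the question does not show getCurrentDateAsString or isDateExpired, so a timestamp comparison stands in for them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the question's LRUCache: one lock guards both maps,
// so a reader never sees a key in one map but not yet in the other.
class GuardedCache {
    private final Map<String, String> dataMap = new HashMap<>();
    private final Map<String, Long> writeTimes = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final long ttlMillis; // assumed expiry rule, not taken from the question

    GuardedCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void put(String key, String value) {
        lock.writeLock().lock();
        try {
            // Both updates happen under the same write lock.
            dataMap.put(key, value);
            writeTimes.put(key, System.currentTimeMillis());
        } finally {
            lock.writeLock().unlock();
        }
    }

    String get(String key) {
        lock.readLock().lock();
        try {
            Long written = writeTimes.get(key);
            if (written == null || System.currentTimeMillis() - written > ttlMillis) {
                return null; // missing or expired
            }
            return dataMap.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The read lock is only appropriate here because get does not modify either map. A cache that reorders entries on access would need the write lock (or plain synchronized methods) for get as well.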

Related

Using Synchronized with Thread-Safe Collection?

Suppose I have the following code:
private ConcurrentHashMap<Integer, Book> shelf;
public Library(ConcurrentHashMap<Integer, Book> shelf){
this.shelf = new ConcurrentHashMap<Integer, Book>(shelf);
}
Given that I'm using a thread safe collection would the following method be okay to use or do I need to worry about thread safety?
public void addBook(int index, Book add){
shelf.put(index, add);
}
If the above method isn't safe to use, would adding synchronized be the proper way of doing it? Like so,
public synchronized void addBook(int index, Book add){
shelf.put(index, add);
}
You don't need to worry if you are ONLY calling shelf.put. Since put is already thread-safe, you are OK.
You would need to worry about synchronized when you are doing multiple operations that together need to be atomic. For example, maybe you had a method called updateBook that looks like
public void updateBook(int index, String newTitle){
Book book = shelf.get(index);
// do something with book or maybe update book.setTitle(newTitle);
shelf.put(index, book);
}
This method would have to be synchronized because otherwise another thread can get a book that has not been updated yet.
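As an alternative to synchronizing the whole method, the read-modify-write can be pushed into the map itself. The following is a sketch only, assuming Java 8 or later and a hypothetical immutable Book class with a single title field; titleAt is a helper added for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical immutable Book, used only for illustration.
final class Book {
    final String title;

    Book(String title) {
        this.title = title;
    }
}

class Library {
    private final ConcurrentHashMap<Integer, Book> shelf = new ConcurrentHashMap<>();

    void addBook(int index, Book book) {
        shelf.put(index, book); // a single call, already thread-safe
    }

    // computeIfPresent runs the remapping function atomically for this key,
    // so no other thread can slip in between the read and the write.
    void updateBook(int index, String newTitle) {
        shelf.computeIfPresent(index, (key, old) -> new Book(newTitle));
    }

    // Illustrative helper: returns the title at index, or null if the slot is empty.
    String titleAt(int index) {
        Book book = shelf.get(index);
        return book == null ? null : book.title;
    }
}
```

Replacing the Book instead of mutating it means readers only ever see a fully constructed value.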
The synchronized keyword essentially puts a mutex lock around the entire addBook method.
A ConcurrentHashMap ensures that all operations (such as put) are thread-safe, but using retrieval operations (such as get) in conjunction with them might leave you retrieving contents from the map at the same time that you are putting, and getting unexpected results.
Individually, all methods in ConcurrentHashMap are thread-safe, but when they are used in conjunction from separate threads you cannot necessarily be certain of the order in which they execute. (Thanks to @jtahlborn for clarification.)
So, in your specific case, adding the synchronized keyword to the addBook method is redundant.
If you're doing more complex operations involving multiple retrievals and puts, you may want to consider some external locking (your own mutex).
See: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentHashMap.html

Synchronized collection vs synchronized method?

I have a class container containing a collection which is going to be used by multiple threads:
public class Container{
private Map<String, String> map;
//ctor, other methods reading the map
public void doSomeWithMap(String key, String value){
//do some threads safe action
map.put(key, value);
//do something else, also thread safe
}
}
What would be better, to declare the method synchronized:
public synchronized void doSomeWithMap(String key, String value)
or to use standard thread-safe decorator?
Collections.synchronizedMap(map);
Generally speaking, synchronizing the map will protect most access to it without having to think about it further. However, the "synchronized map" is not safe for iteration which may be an issue depending on your use case. It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views.
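The iteration caveat can be sketched as follows. The class and method names here are hypothetical; the only point is that the loop over a collection view is wrapped in a synchronized block on the wrapper map itself, as the Collections.synchronizedMap documentation requires.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedIteration {
    private final Map<String, String> map =
            Collections.synchronizedMap(new HashMap<String, String>());

    void put(String key, String value) {
        map.put(key, value); // individually synchronized by the wrapper
    }

    int totalValueLength() {
        int total = 0;
        // Iteration spans many calls, so hold the wrapper's lock for the whole loop.
        synchronized (map) {
            for (Map.Entry<String, String> entry : map.entrySet()) {
                total += entry.getValue().length();
            }
        }
        return total;
    }
}
```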
Consider using ConcurrentHashMap if that will meet your use case.
If there is other state to this object that needs to be protected from concurrency errors, then you will need to use synchronized or a Lock.
If your doSomeWithMap method will access the map more than once, you must synchronize the doSomeWithMap method. If the only access is the put() call shown, then it's better to use a ConcurrentHashMap.
Note that "more than once" is any call, and an iterator is by nature many "gets".
If you look at the implementation of SynchronizedMap, you'll see that it's simply a map wrapping a non-thread-safe map that acquires a mutex before calling any method:
public V get(Object key) {
synchronized (mutex) {return m.get(key);}
}
public V put(K key, V value) {
synchronized (mutex) {return m.put(key, value);}
}
public Set<Map.Entry<K,V>> entrySet() {
synchronized (mutex) {
if (entrySet==null)
entrySet = new SynchronizedSet<>(m.entrySet(), mutex);
return entrySet;
}
}
If all you want is protecting get and put, this implementation does it for you.
However it's not suitable if you want a Map that can be iterated over and updated by two or more threads, in which case you should use a ConcurrentHashMap.
If the other things you do inside doSomeWithMap would cause problems if done concurrently by different threads (for example, they update class-level variables) then you should synchronise the whole method. If this is not the case then you should use a synchronised Map in order to minimise the length of time the synchronisation lock is left in place.
Synchronize only the block that actually requires it, and keep the following in mind.
When you use a thread-safe collection such as ConcurrentHashMap, or a wrapper from Collections such as synchronizedMap() or synchronizedList(), only the Map/List itself is synchronized. To explain a little further,
consider:
Map<String, Object> map = new HashMap<>();
Map<String, Object> synchMap = Collections.synchronizedMap(map);
This makes the map's operations, such as get, synchronized, but not the objects stored inside it.
Object o = synchMap.get("1");// Object referenced by o is not synchronized. Only the map is.
If you want to protect the objects inside the Map, then you also have to put the code that uses them inside a synchronized block. This is good to remember, as many people forget to safeguard the objects themselves.
For a little more information, see Collections.synchronizedMap.
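To illustrate the point about objects stored in the map, here is a small sketch. Counter and CounterRegistry are hypothetical names; the map wrapper protects lookups and inserts, while a separate synchronized block protects the mutable value.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical mutable value type; the map wrapper does nothing to protect it.
class Counter {
    int value;
}

class CounterRegistry {
    private final Map<String, Counter> counters =
            Collections.synchronizedMap(new HashMap<String, Counter>());

    void increment(String name) {
        // The wrapper synchronizes this lookup-or-create step...
        Counter counter = counters.computeIfAbsent(name, key -> new Counter());
        // ...but the mutation of the stored object needs its own lock.
        synchronized (counter) {
            counter.value++;
        }
    }

    int current(String name) {
        Counter counter = counters.get(name);
        if (counter == null) {
            return 0;
        }
        synchronized (counter) {
            return counter.value;
        }
    }
}
```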

Avoiding concurrent structures by manually triggering memory barriers

Background
I have a class whose instances are used to collect and publish data (uses Guava's HashMultimap):
public class DataCollector {
private final SetMultimap<String, String> valueSetsByLabel
= HashMultimap.create();
public void addLabelValue(String label, String value) {
valueSetsByLabel.put(label, value);
}
public Set<String> getLabels() {
return valueSetsByLabel.keySet();
}
public Set<String> getLabelValues(String label) {
return valueSetsByLabel.get(label);
}
}
Instances of this class will now be passed between threads, so I need to modify it for thread-safety. Since Guava's Multimap implementations aren't thread-safe, I used a LoadingCache that lazily creates concurrent hash sets instead (see the CacheBuilder and MapMaker javadocs for details):
public class ThreadSafeDataCollector {
private final LoadingCache<String, Set<String>> valueSetsByLabel
= CacheBuilder.newBuilder()
.concurrencyLevel(1)
.build(new CacheLoader<String, Set<String>>() {
@Override
public Set<String> load(String label) {
// make and return a concurrent hash set
final ConcurrentMap<String, Boolean> map = new MapMaker()
.concurrencyLevel(1)
.makeMap();
return Collections.newSetFromMap(map);
}
});
public void addLabelValue(String label, String value) {
valueSetsByLabel.getUnchecked(label).add(value);
}
public Set<String> getLabels() {
return valueSetsByLabel.asMap().keySet();
}
public Set<String> getLabelValues(String label) {
return valueSetsByLabel.getUnchecked(label);
}
}
You'll notice I'm setting the concurrency level for both the loading cache and nested concurrent hash sets to 1 (meaning they each only read from and write to one underlying table). This is because I only expect one thread at a time to read from and write to these objects.
(To quote the concurrencyLevel javadoc, "A value of one permits only one thread to modify the map at a time, but since read operations can proceed concurrently, this still yields higher concurrency than full synchronization.")
Problem
Because I can assume there will only be a single reader/writer at a time, I feel that using many concurrent hash maps per object is heavy-handed. Such structures are meant to handle concurrent reads and writes, and guarantee atomicity of concurrent writes. But in my case atomicity is unimportant - I only need to make sure each thread sees the last thread's changes.
In my search for a more optimal solution I came across this answer by erickson, which says:
Any data that is shared between thread needs a "memory barrier" to ensure its visibility.
[...]
Changes to any member that is declared volatile are visible to all
threads. In effect, the write is "flushed" from any cache to main
memory, where it can be seen by any thread that accesses main memory.
Now it gets a bit trickier. Any writes made by a thread before that
thread writes to a volatile variable are also flushed. Likewise, when
a thread reads a volatile variable, its cache is cleared, and
subsequent reads may repopulate it from main memory.
[...]
One way to make this work is to have the thread that is populating
your shared data structure assign the result to a volatile variable. [...]
When other threads access that variable, not only are they guaranteed
to get the most recent value for that variable, but also any changes
made to the data structure by the thread before it assigned the value
to the variable.
(See this InfoQ article for a further explanation of memory barriers.)
The problem erickson is addressing is slightly different in that the data structure in question is fully populated and then assigned to a variable that he suggests be made volatile, whereas my structures are assigned to final variables and gradually populated across multiple threads. But his answer suggests I could use a volatile dummy variable to manually trigger memory barriers:
public class ThreadVisibleDataCollector {
private final SetMultimap<String, String> valueSetsByLabel
= HashMultimap.create();
private volatile boolean dummy;
private void readMainMemory() {
if (dummy) { }
}
private void writeMainMemory() {
dummy = false;
}
public void addLabelValue(String label, String value) {
readMainMemory();
valueSetsByLabel.put(label, value);
writeMainMemory();
}
public Set<String> getLabels() {
readMainMemory();
return valueSetsByLabel.keySet();
}
public Set<String> getLabelValues(String label) {
readMainMemory();
return valueSetsByLabel.get(label);
}
}
Theoretically, I could take this a step further and leave it to the calling code to trigger memory barriers, in order to avoid unnecessary volatile reads and writes between calls on the same thread (potentially by using Unsafe.loadFence and Unsafe.storeFence, which were added in Java 8). But that seems too extreme and hard to maintain.
Question
Have I drawn the correct conclusions from my reading of erickson's answer (and the JMM) and implemented ThreadVisibleDataCollector correctly? I wasn't able to find examples of using a volatile dummy variable to trigger memory barriers, so I want to verify that this code will behave as expected across architectures.
The thing you are trying to do is called “Premature Optimization”. You don’t have a real performance problem but try to make your entire program very complicated and possibly error prone, without any gain.
The reason why you will never experience any (notable) gain lies in the way a lock works. You can learn a lot about it by studying the documentation of the class AbstractQueuedSynchronizer.
A Lock is formed around a simple int value with volatile semantics and atomic updates. In the simplest form, i.e. without contention, locking and unlocking consist of a single atomic update of this int variable. Since you claim that you can be sure that there will be only one thread accessing the data at a given time, there will be no contention, and the lock state update has performance characteristics similar to those of your volatile boolean attempts, with the difference that the Lock code works reliably and is heavily tested.
The ConcurrentMap approach goes a step further and allows a lock-free read that has the potential to be even more efficient than your volatile read (depending on the actual implementation).
So you are creating a potentially slower and possibly error prone program just because you “feel that using many concurrent hash maps per object is heavy-handed”. The only answer can be: don’t feel. Measure. Or just leave it as is as long as there is no real performance problem.
A write to a volatile variable happens-before every subsequent read of that variable. As a consequence, reading and writing it gives you the visibility guarantees you want, so the answer is yes, this solves the visibility issues.
Besides the problems mentioned by Darren Gilroy in his answer, I'd like to remind you that Java 8 has explicit memory barrier methods in the Unsafe class:
/**
* Ensures lack of reordering of loads before the fence
* with loads or stores after the fence.
*/
void loadFence();
/**
* Ensures lack of reordering of stores before the fence
* with loads or stores after the fence.
*/
void storeFence();
/**
* Ensures lack of reordering of loads or stores before the fence
* with loads or stores after the fence.
*/
void fullFence();
Although Unsafe is not a public API, I still recommend at least considering it if you're using Java 8.
One more solution comes to mind. You have set concurrencyLevel to 1, which means that only one thread at a time can modify the collection. IMO standard Java synchronized or ReentrantLock (for cases of high contention) would also fit your task, and both provide visibility guarantees. If you want a one-writer, many-readers access pattern, consider using ReentrantReadWriteLock.
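A sketch of the plain synchronized variant mentioned above. To keep the example free of external dependencies, a HashMap of HashSets stands in for Guava's HashMultimap; that substitution, the class name, and the decision to return defensive copies are assumptions for illustration, not part of the original code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// JDK-only stand-in for the question's DataCollector. Every method takes the
// same monitor, which gives both mutual exclusion and visibility between threads.
class SynchronizedDataCollector {
    private final Map<String, Set<String>> valueSetsByLabel = new HashMap<>();

    synchronized void addLabelValue(String label, String value) {
        valueSetsByLabel.computeIfAbsent(label, key -> new HashSet<String>()).add(value);
    }

    synchronized Set<String> getLabels() {
        // Return a copy so callers never touch the guarded map outside the lock.
        return new HashSet<String>(valueSetsByLabel.keySet());
    }

    synchronized Set<String> getLabelValues(String label) {
        Set<String> values = valueSetsByLabel.get(label);
        return values == null ? new HashSet<String>() : new HashSet<String>(values);
    }
}
```

With a single thread touching the collector at a time, the monitor is uncontended, which is the case the answers above describe as cheap.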
Well, that's still not particularly safe, because it depends a lot on the underlying implementation of HashMultimap.
You might take a look at the following blog post for a discussion: http://mailinator.blogspot.com/2009/06/beautiful-race-condition.html
For this type of thing, a common pattern is to load a "most recent version" into a volatile variable and have your readers read immutable versions through that. This is how CopyOnWriteArrayList is implemented.
Something like ...
class Collector {
    private volatile HashMultimap<String, String> values = HashMultimap.create();
    public void add(String k, String v) {
        // copy the current version, modify the copy, then publish it
        HashMultimap<String, String> t = HashMultimap.create(values);
        t.put(k, v);
        this.values = t; // this volatile write acts as a memory barrier
    }
    public Set<String> get(String k) {
        return values.get(k); // this volatile read acts as a memory barrier
    }
}
However, both your and my solution still have a bit of a problem -- we are both returning mutable views on the underlying data structure. I might change the HashMultimap to an ImmutableMultimap to fix the mutability issue. Beware also that callers retain a reference to the full internal map (not just the returned Set) as a side effect of things being a view.
Creating a new copy can seem somewhat wasteful, but I suspect that if you have only one thread writing, then you have an understanding of the rate of change and can decide whether that's reasonable. For example, if you wanted to return Set<String> instances that update dynamically as things change, then the solution based on MapMaker doesn't seem heavy-handed.
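The immutable-snapshot idea can also be sketched with JDK collections only. Unmodifiable wrappers over private copies are used here in place of Guava's ImmutableMultimap; the class name is hypothetical, and add is synchronized on the assumption that more than one writer might appear.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SnapshotCollector {
    // Readers only ever see a fully built, unmodifiable snapshot.
    private volatile Map<String, Set<String>> snapshot =
            Collections.<String, Set<String>>emptyMap();

    synchronized void add(String label, String value) {
        Map<String, Set<String>> copy = new HashMap<String, Set<String>>(snapshot);
        Set<String> values = new HashSet<String>(
                copy.getOrDefault(label, Collections.<String>emptySet()));
        values.add(value);
        copy.put(label, Collections.unmodifiableSet(values));
        snapshot = Collections.unmodifiableMap(copy); // volatile write publishes the copy
    }

    Set<String> get(String label) {
        // Volatile read; the returned set can no longer be changed by anyone.
        return snapshot.getOrDefault(label, Collections.<String>emptySet());
    }
}
```

A set returned by get stays fixed after later add calls, so callers need to call get again to see new values.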

Synchronized Map or synchronized methods

I have the following class for a Router's table with synchronised methods:
public class RouterTable {
private String tableForRouter;
private Map<String,RouterTableEntry> table;
public RouterTable(String router){
tableForRouter = router;
table = new HashMap<String,RouterTableEntry>();
}
public String owner(){
return tableForRouter;
}
public synchronized void add(String network, String ipAddress, int distance){
table.put(network, new RouterTableEntry(ipAddress, distance));
}
public synchronized boolean exists(String network){
return table.containsKey(network);
}
}
Multiple threads will read and write to the HashMap. I was wondering if it would be best to remove synchronized from the methods and just use Collections.synchronizedMap(new HashMap<String,RouterTableEntry>()). What is the most sensible way to do this in Java?
I would suggest using a ConcurrentHashMap. This is a newer data structure, introduced in Java 5. It provides thread safety and allows concurrent operations, as opposed to a synchronized map, which performs one operation at a time.
If the map is the only place where thread safety is required, then just using the ConcurrentHashMap is fine. However, if you have atomic operations involving more state variables, I would suggest using synchronized code blocks instead of synchronized methods.
In the absence of strict requirements about happens-before relationships and point-in-time correctness, the sensible thing to do in modern Java is usually to just use a ConcurrentMap.
Otherwise, yes, using Collections#synchronizedMap is both safer and likely more performant (because you won't enclose any unrelated code that doesn't need synchronization) than manually synchronizing everything yourself.
The best is to use a java.util.concurrent.ConcurrentHashMap, which is designed from the ground up for concurrent access (read & write).
Using synchronization as you do works, but it causes high contention and therefore suboptimal performance. A collection obtained through Collections.synchronizedMap() would do just the same (it only wraps a standard collection with synchronized methods).
ConcurrentHashMap, by contrast, uses various techniques to be thread-safe while providing good concurrency; for example, in versions before Java 8 it had (by default) 16 segments, each guarded by a distinct lock, so that up to 16 threads could update it concurrently.
Synchronizing the map will prevent users of your class from doing meaningful synchronization.
They will have no way of knowing if the result from exists is still valid once they get into their if statement, and will need to do external synchronization.
With the synchronized methods as you show, they could lock on your class until they are done with a block of method calls.
The other option is to do no synchronization and let the user handle that, which they need to do anyway to be safe.
Adding your own synchronization is what was wrong with Hashtable.
The current common style tends to prefer synchronized collections over an explicit synchronized qualifier on the methods that access them. However, this is not set in stone, and your decision should depend on how you use this code now and how you will use it in the future.
Points to consider:
(a) If your map is going to be used by code that is outside of RouterTable, then you need to use a synchronized map.
(b) OTOH, if you are going to add some additional fields to RouterTable, and their values need to be consistent with the values in the map (in other words: you want changes to the map and to the additional fields to happen in one atomic step), then you need to use synchronized methods.
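For the case where the map is the only shared state, a ConcurrentHashMap-based version might look like the sketch below. The RouterTableEntry fields and the addIfAbsent method are assumptions added for illustration; addIfAbsent shows how a check-then-add can be made a single atomic step instead of calling exists and then add.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical entry type; the question does not show its definition.
final class RouterTableEntry {
    final String ipAddress;
    final int distance;

    RouterTableEntry(String ipAddress, int distance) {
        this.ipAddress = ipAddress;
        this.distance = distance;
    }
}

class RouterTable {
    private final String tableForRouter;
    private final ConcurrentMap<String, RouterTableEntry> table =
            new ConcurrentHashMap<String, RouterTableEntry>();

    RouterTable(String router) {
        this.tableForRouter = router;
    }

    String owner() {
        return tableForRouter;
    }

    void add(String network, String ipAddress, int distance) {
        table.put(network, new RouterTableEntry(ipAddress, distance));
    }

    boolean exists(String network) {
        return table.containsKey(network);
    }

    // Illustrative addition: inserts only if the network is absent, atomically.
    // Returns true if this call added the entry.
    boolean addIfAbsent(String network, String ipAddress, int distance) {
        return table.putIfAbsent(network, new RouterTableEntry(ipAddress, distance)) == null;
    }
}
```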

Does a static ConcurrentHashMap need external synchronisation?

Does a static ConcurrentHashMap need to be externally synchronized using a synchronized block or locks?
Yes and no. It depends on what you're doing. ConcurrentHashMap is thread safe for all of its methods (e.g. get and put). However, it does not make compound (non-atomic) operations thread safe. Here is an example of a method that performs a non-atomic operation:
public class Foo {
    Map<String, Object> map = new ConcurrentHashMap<String, Object>();
    public Object getFoo(String bar) {
        // check-then-act: another thread may put between the get and the put
        Object value = map.get(bar);
        if (value == null) {
            value = new Object();
            map.put(bar, value);
        }
        return value;
    }
}
The flaw here is that it is possible for two threads calling getFoo with the same key to receive different Objects. Remember that when dealing with any data structure or type, even one as simple as an int, non-atomic operations always require external synchronization. Classes such as AtomicInteger and ConcurrentHashMap assist in making some common operations thread safe, but do not protect against check-then-set operations such as in getFoo above.
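On Java 8 and later, one way to close this particular gap without external locking is ConcurrentHashMap.computeIfAbsent, which performs the check and the insert atomically. A minimal sketch, reusing the Foo and getFoo names from the example above:

```java
import java.util.concurrent.ConcurrentHashMap;

class Foo {
    private final ConcurrentHashMap<String, Object> map =
            new ConcurrentHashMap<String, Object>();

    Object getFoo(String bar) {
        // The mapping function is applied at most once per absent key, and
        // every caller asking for that key receives the same instance.
        return map.computeIfAbsent(bar, key -> new Object());
    }
}
```

The mapping function should be short and must not update other mappings of the same map, since other updates to the map may be blocked while it runs.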
You only need external synchronization if you need to obtain a lock on the collection. The collection doesn't expose its internal locks.
ConcurrentMap has putIfAbsent, however if the creation of the object is expensive you may not want to use this.
final ConcurrentMap<Key, Value> map = new ConcurrentHashMap<Key, Value>();
public Value get(Key key) {
// allow concurrent read
return map.get(key);
}
public Value getOrCreate(Key key) {
// could put an extra check here to avoid synchronization.
synchronized(map) {
Value val = map.get(key);
if (val == null)
map.put(key, val = new ExpensiveValue(key));
return val;
}
}
As far as I know, all the locking that is needed is done inside this class, so you don't need to worry about it unless you are doing something specific that requires additional locking.
On http://download.oracle.com/javase/1,5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html it says:
However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset.
So as long as this does not pose a problem in your specific application, you do not need to worry about it.
No: No need to synchronise externally.
All methods on the java.util.concurrent classes are threadsafe.
