Which implementation of the Map interface should I use? - java

I need to make the following class thread-safe:
//Shared among all threads
public class SharedCache {
    private Map<Object, Future<Collection<Integer>>> chachedFutures;
    {
        chachedFutures = new ConcurrentHashMap<>(); //not sure about that
    }

    public Future<Collection<Integer>> ensureFuture(Object value,
                                                    FutureFactory<Collection<Integer>> ff) {
        if (chachedFutures.containsKey(value))
            return chachedFutures.get(value);
        Future<Collection<Integer>> ftr = ff.create();
        chachedFutures.put(value, ftr);
        return ftr;
    }

    public Future<Collection<Integer>> remove(Object value) {
        return chachedFutures.remove(value);
    }
}
After reading the article about the ConcurrentHashMap class, it's still difficult for me to make the right decision.
At first I was inclined simply to make the methods ensureFuture and remove synchronized. That would work, but it is not great from a performance standpoint because of the mutual exclusion.
I don't know the number of threads that will access the cache simultaneously (even approximately), nor the size of the cache. Taking into account that
resizing this or any other kind of hash table is a relatively slow
operation
I didn't specify the initial size of the map, nor the concurrencyLevel parameter. Is it justified to use ConcurrentHashMap here, or would synchronized methods be enough?

You have the following methods:
public Future<Collection<Integer>> ensureFuture(Object value,
                                                FutureFactory<Collection<Integer>> ff) {
    if (chachedFutures.containsKey(value))
        return chachedFutures.get(value);
    Future<Collection<Integer>> ftr = ff.create();
    chachedFutures.put(value, ftr);
    return ftr;
}

public Future<Collection<Integer>> remove(Object value) {
    return chachedFutures.remove(value);
}
There are some points to note:
Suppose ensureFuture is not synchronized. Then one thread may invoke containsKey, which returns true, but before the next line is executed another thread may remove the entry for that key. This is a race condition, a classic check-then-act scenario.
Also, you are using chachedFutures.put(value, ftr), but IMO you should use chachedFutures.putIfAbsent(value, ftr). If the specified key is not already associated with a value (or is mapped to null), this method associates it with the given value and returns null; otherwise it returns the current value. Using it you can also avoid the contains check.
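As a rough sketch (one way among several, keeping the original field and factory names), ensureFuture could look like this with putIfAbsent. Note that a losing thread may still have created a Future that gets discarded, which is assumed to be acceptable here:
public Future<Collection<Integer>> ensureFuture(Object value,
                                                FutureFactory<Collection<Integer>> ff) {
    Future<Collection<Integer>> existing = chachedFutures.get(value);
    if (existing != null)
        return existing;                                  // fast path, nothing to create
    Future<Collection<Integer>> ftr = ff.create();
    Future<Collection<Integer>> raced = chachedFutures.putIfAbsent(value, ftr);
    return (raced != null) ? raced : ftr;                 // another thread may have won the race
}
On Java 8+, chachedFutures.computeIfAbsent(value, v -> ff.create()) closes even that small window, because the factory is only called when the mapping is actually absent.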
Is it justified to use ConcurrentHashMap here or synchronized methods
would be enough?
It depends. A ConcurrentHashMap needs more memory than a HashMap because of its extra bookkeeping. Another alternative is Collections.synchronizedMap, which provides synchronization on top of a regular HashMap.
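For completeness, a minimal sketch of the synchronizedMap alternative for the cache above. The important caveat is that compound actions such as the check-then-put in ensureFuture still need an explicit synchronized block on the wrapper:
private final Map<Object, Future<Collection<Integer>>> chachedFutures =
        Collections.synchronizedMap(new HashMap<>());

public Future<Collection<Integer>> ensureFuture(Object value,
                                                FutureFactory<Collection<Integer>> ff) {
    synchronized (chachedFutures) {              // one lock held for the whole compound action
        Future<Collection<Integer>> existing = chachedFutures.get(value);
        if (existing != null)
            return existing;
        Future<Collection<Integer>> ftr = ff.create();
        chachedFutures.put(value, ftr);
        return ftr;
    }
}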

Related

Using a thread safe static mutable Map of Map in Java

I need to add a static thread safe HashMap. I have something like this -
private static Map<String, ConcurrentHashMap<Integer, ClassA>> myCache =
new ConcurrentHashMap<String, ConcurrentHashMap<Integer, ClassA>>();
Even though I am using ConcurrentHashMap, I see that if the main thread adds an element to myCache, and the same key is accessed in another thread even at a later time, that thread does not have the latest data in the myCache object. For example:
Thread 1: Adds an entry to the map:
myCache = {a1={1=com.x.y.z.ClassA@3ec8b657}}
Thread 2: Accesses the same key a1, but does not see the data added by Thread 1. Rather, it sees an empty value for this key: myCache = {a1={}}
As a result, data is getting corrupted: entries added for the a1 key in Thread 1 are not visible in Thread 2.
Thanks in advance for any pointers on how I can update this map in a thread-safe manner.
Even though I am using ConcurrentHashMap, I see that if the main thread adds an element to myCache, and the same key is accessed in another thread even at a later time, that thread does not have the latest data in the myCache object.
ConcurrentHashMap is running in a huge number of applications around the world at this very moment. If your fairly typical use case didn't work appropriately, many critical systems would be failing.
Something is going on, but chances are high that it has nothing to do with ConcurrentHashMap. So here are some questions to help you debug your code:
Are you sure that the cache lookup happens after the cache put? ConcurrentHashMap doesn't save you from race conditions in your code.
Any chance this is a problem with the hashCode() or equals() methods of the key object? By default, hashCode() and equals() are based on object identity rather than object value. See the consistency requirements.
Any chance that the cache lookup happens after a cache remove or a timeout of the cache value? Is there cache cleanup logic?
Do you have logic that performs two operations on the ConcurrentHashMap, for example testing for the existence of a cache entry and then making a second call to put the value? If so, you should be using putIfAbsent(...) or the other atomic calls; see the sketch below.
If you edit your post and show a small sample of your code with the key object, the real source of the issue may be revealed.
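If that last point applies, here is a hedged sketch (Java 8+, using the myCache field from the question; addEntry is a made-up helper name) of how the inner map can be created atomically instead of with a separate check and put:
// Create the inner map atomically the first time the outer key is seen,
// then add to the inner ConcurrentHashMap; no explicit locking needed.
ClassA addEntry(String outerKey, Integer innerKey, ClassA value) {
    return myCache
            .computeIfAbsent(outerKey, k -> new ConcurrentHashMap<Integer, ClassA>())
            .put(innerKey, value);
}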
I've never had good results with ConcurrentHashMap. When I need thread safety, I usually do something like this:
public class Cache<K, V> {
    private Map<K, V> cache = new HashMap<>();

    public V get(K key) {
        synchronized (cache) {
            return cache.get(key);
        }
    }

    public void put(K key, V value) {
        synchronized (cache) {
            cache.put(key, value);
        }
    }
}
The answer by Ryan is essentially correct. Remember that you must proxy every Map method that you wish to use, and you must synchronize on the cache element inside every proxied method.
For example:
public void clear()
{
    synchronized(cache)
    {
        cache.clear();
    }
}
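Iteration is the easiest thing to get wrong with this hand-rolled approach. One hedged option (a sketch, not part of the answers above, using java.util.HashSet) is to hand out a copy taken under the lock, so callers never iterate the live map:
public Set<K> keySnapshot()
{
    synchronized(cache)
    {
        // copy while holding the lock; callers can iterate the copy freely
        return new HashSet<K>(cache.keySet());
    }
}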

Synchronized block with logic in the lock expression

There is a special need to create a thread monitor based on a string value.
Ex:
Map<String, String> values = new HashMap<>(); // instance variable

values.put("1", "one");
values.put("2", "two");
values.put("3", "three");

void someMethod(String value) {
    synchronized (values.get(value) == null ? value : values.get(value)) {
        System.out.println("I'm done");
    }
}
The catch here is that the synchronized block's lock expression contains a ternary operator - is that allowed? I don't get any compile-time or runtime exception or error.
I'm also not sure the above code is really thread-safe; at any given time only one thread should be able to obtain the monitor associated with the string value.
Please provide thoughts on this. Is this good practice, or is there another way around it?
There are fundamental problems with this approach. You’re accessing a HashMap, which is not thread safe, before ever entering the synchronized block. If there are updates to the map after its construction, this approach is broken.
It’s crucial to use the same object instance for synchronizing when accessing the same data.
So even if you used a thread safe map here, using values.get(value) == null? value: values.get(value) means using changing objects for synchronization, when there are map updates, sometimes it uses the key, sometimes the mapped value, depending on whether a mapping is present. Even when the key is always present, it may use different mapped values.
It’s also pertinent to the Check-Then-Act anti-pattern, as you are checking values.get(value) == null first, and using values.get(value) afterwards, when the condition could have changed already.
You should never use strings for synchronization, as different string objects may be equal, so they map to the same data when using them as key to a Map, whereas synchronization fails due to different object identity. On the other hand, strings may get shared freely in a JVM and they are in case of string literals, so unrelated code performing synchronization on strings could block each other.
There’s a simple solution using a tool designed for this purpose. When using
ConcurrentMap<String, String> values = new ConcurrentHashMap<>();

void someMethod(String string) {
    values.compute(string, (key, value) -> {
        if (value == null) value = key.toUpperCase(); // construct when not present
        // update value
        return value;
    });
}
the string’s equality determines the mutual exclusion while not serving as the synchronization key itself. So equal keys provide the desired blocking, while unrelated code, e.g. using a different ConcurrentHashMap with similar or even the same key values, is not affected by these operations.

Synchronized collection vs synchronized method?

I have a container class holding a collection which is going to be used by multiple threads:
public class Container {
    private Map<String, String> map;

    //ctor, other methods reading the map

    public void doSomeWithMap(String key, String value) {
        //do some thread-safe action
        map.put(key, value);
        //do something else, also thread safe
    }
}
What would be better, to declare the method synchronized:
public synchronized void doSomeWithMap(String key, String value)
or to use standard thread-safe decorator?
Collections.synchronizedMap(map);
Generally speaking, synchronizing the map will protect most access to it without your having to think about it further. However, the "synchronized map" is not safe for iteration, which may be an issue depending on your use case: it is imperative that the user manually synchronize on the returned map when iterating over any of its collection views.
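The documented pattern for that iteration case looks roughly like this (a sketch, assuming the map field from the question has been wrapped):
Map<String, String> syncMap = Collections.synchronizedMap(map);
...
synchronized (syncMap) {                  // must lock the wrapper itself while iterating
    for (Map.Entry<String, String> e : syncMap.entrySet()) {
        // safe to read e.getKey() / e.getValue() here
    }
}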
Consider using ConcurrentHashMap if that will meet your use case.
If there is other state to this object that needs to be protected from concurrency errors, then you will need to use synchronized or a Lock.
If your doSomeWithMap method will access the map more than once, you must synchronize the doSomeWithMap method. If the only access is the put() call shown, then it's better to use a ConcurrentHashMap.
Note that "more than once" is any call, and an iterator is by nature many "gets".
If you look at the implementation of SynchronizedMap, you'll see that it's simply a wrapper around a non-thread-safe map that acquires a mutex before calling any method:
public V get(Object key) {
    synchronized (mutex) {return m.get(key);}
}
public V put(K key, V value) {
    synchronized (mutex) {return m.put(key, value);}
}
public Set<Map.Entry<K,V>> entrySet() {
    synchronized (mutex) {
        if (entrySet==null)
            entrySet = new SynchronizedSet<>(m.entrySet(), mutex);
        return entrySet;
    }
}
If all you want is protecting get and put, this implementation does it for you.
However it's not suitable if you want a Map that can be iterated over and updated by two or more threads, in which case you should use a ConcurrentHashMap.
If the other things you do inside doSomeWithMap would cause problems if done concurrently by different threads (for example, they update class-level variables) then you should synchronise the whole method. If this is not the case then you should use a synchronised Map in order to minimise the length of time the synchronisation lock is left in place.
You should probably scope the synchronized block to what your requirement actually needs. Please note this: when you use a synchronized collection like ConcurrentHashMap, or Collections methods like synchronizedMap() and synchronizedList(), only the Map/List itself is synchronized. To explain a little further, consider:
Map<String, Object> map = new HashMap<>();
Map<String, Object> synchMap = Collections.synchronizedMap(map);
This makes the map's get operation synchronized, but not the objects inside the map.
Object o = synchMap.get("1"); // The object referenced by o is not synchronized. Only the map is.
If you want to protect the objects inside the map, then you also have to put the code that uses them inside a synchronized block. This is good to remember, as many people forget to safeguard the object itself.
For a little more info, see also Collections.synchronizedMap.
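To illustrate that last point with a sketch (assuming the stored object is mutable and shared between threads), the mutation of the value itself still needs its own guard, for example by locking on the value or on a dedicated lock object:
Object o = synchMap.get("1");     // this lookup is synchronized by the wrapper
if (o != null) {
    synchronized (o) {            // the map did not protect this part;
        // ... mutate o here ...  // guard the value yourself (or make it immutable)
    }
}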

Creating a ConcurrentHashMap that supports "snapshots"

I'm attempting to create a ConcurrentHashMap that supports "snapshots" in order to provide consistent iterators, and am wondering if there's a more efficient way to do this. The problem is that if two iterators are created at the same time then they need to read the same values, and the definition of the concurrent hash map's weakly consistent iterators does not guarantee this to be the case. I'd also like to avoid locks if possible: there are several thousand values in the map and processing each item takes several dozen milliseconds, and I don't want to have to block writers during this time as this could result in writers blocking for a minute or longer.
What I have so far:
The ConcurrentHashMap's keys are Strings, and its values are instances of ConcurrentSkipListMap<Long, T>
When an element is added to the hashmap with putIfAbsent, then a new skiplist is allocated, and the object is added via skipList.put(System.nanoTime(), t).
To query the map, I use map.get(key).lastEntry().getValue() to return the most recent value. To query a snapshot (e.g. with an iterator), I use map.get(key).lowerEntry(iteratorTimestamp).getValue(), where iteratorTimestamp is the result of System.nanoTime() called when the iterator was initialized.
If an object is deleted, I use map.get(key).put(timestamp, SnapShotMap.DELETED), where DELETED is a static final object.
Questions:
Is there a library that already implements this? Or barring that, is there a data structure that would be more appropriate than the ConcurrentHashMap and the ConcurrentSkipListMap? My keys are comparable, so maybe some sort of concurrent tree would better support snapshots than a concurrent hash table.
How do I prevent this thing from continually growing? I can delete all of the skip list entries with keys less than X (except for the last key in the map) after all iterators that were initialized on or before X have completed, but I don't know of a good way to determine when this has happened: I can flag that an iterator has completed when its hasNext method returns false, but not all iterators are necessarily going to run to completion; I can keep a WeakReference to an iterator so that I can detect when it's been garbage collected, but I can't think of a good way to detect this other than by using a thread that iterates through the collection of weak references and then sleeps for several minutes - ideally the thread would block on the WeakReference and be notified when the wrapped reference is GC'd, but I don't think this is an option.
ConcurrentSkipListMap<Long, WeakReference<Iterator>> iteratorMap;

while (true) {
    long latestGC = 0;
    for (Map.Entry<Long, WeakReference<Iterator>> entry : iteratorMap.entrySet()) {
        if (entry.getValue().get() == null) {
            iteratorMap.remove(entry.getKey());
            latestGC = entry.getKey();
        } else break;
    }
    // remove ConcurrentHashMap entries with timestamps less than `latestGC`
    Thread.sleep(300000); // five minutes
}
Edit: To clear up some confusion in the answers and comments, I'm currently passing weakly consistent iterators to code written by another division in the company, and they have asked me to increase the strength of the iterators' consistency. They are already aware of the fact that it is infeasible for me to make 100% consistent iterators, they just want a best effort on my part. They care more about throughput than iterator consistency, so coarse-grained locks are not an option.
What is your actual use case that requires a special implementation? From the Javadoc of ConcurrentHashMap (emphasis added):
Retrievals reflect the results of the most recently completed update operations holding upon their onset. ... Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.
So the regular ConcurrentHashMap.values().iterator() will give you a "consistent" iterator, but only for one-time use by a single thread. If you need to use the same "snapshot" multiple times and/or by multiple threads, I suggest making a copy of the map.
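A copy-based snapshot is a one-liner; the names below are just placeholders for the map in the question. The copy itself is only weakly consistent with writes that are in flight while it is taken, but afterwards it never changes, so it can be iterated repeatedly and from several threads:
// stable snapshot of the current values (T is a placeholder for the value type)
List<T> valuesSnapshot = new ArrayList<>(concurrentMap.values());

// or, if keys are needed as well:
Map<String, T> mapSnapshot = new HashMap<>(concurrentMap);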
EDIT: With the new information and the insistence for a "strongly consistent" iterator, I offer this solution. Please note that the use of a ReadWriteLock has the following implications:
Writes will be serialized (only one writer at a time) so write performance may be impacted.
Concurrent reads are allowed as long as there is no write in progress, so read performance impact should be minimal.
Active readers block writers but only as long as it takes to retrieve the reference to the current "snapshot". Once a thread has the snapshot, it no longer blocks writers no matter how long it takes to process the information in the snapshot.
Readers are blocked while any write is active; once the write finishes then all readers will have access to the new snapshot until a new write replaces it.
Consistency is achieved by serializing the writes and making a copy of the current values on each and every write. Readers that hold a reference to a "stale" snapshot can continue to use the old snapshot without worrying about modification, and the garbage collector will reclaim old snapshots as soon as no one is using it any more. It is assumed that there is no requirement for a reader to request a snapshot from an earlier point in time.
Because snapshots are potentially shared among multiple concurrent threads, the snapshots are read-only and cannot be modified. This restriction also applies to the remove() method of any Iterator instances created from the snapshot.
import java.util.*;
import java.util.concurrent.locks.*;

public class StackOverflow16600019<K, V> {
    private final ReadWriteLock locks = new ReentrantReadWriteLock();
    private final HashMap<K,V> map = new HashMap<>();
    private Collection<V> valueSnapshot = Collections.emptyList();

    public V put(K key, V value) {
        locks.writeLock().lock();
        try {
            V oldValue = map.put(key, value);
            updateSnapshot();
            return oldValue;
        } finally {
            locks.writeLock().unlock();
        }
    }

    public V remove(K key) {
        locks.writeLock().lock();
        try {
            V removed = map.remove(key);
            updateSnapshot();
            return removed;
        } finally {
            locks.writeLock().unlock();
        }
    }

    public Collection<V> values() {
        locks.readLock().lock();
        try {
            return valueSnapshot; // read-only!
        } finally {
            locks.readLock().unlock();
        }
    }

    /** Callers MUST hold the WRITE LOCK. */
    private void updateSnapshot() {
        valueSnapshot = Collections.unmodifiableCollection(
                new ArrayList<V>(map.values())); // copy
    }
}
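A brief usage sketch of the class above, showing that a snapshot handed out by values() is unaffected by later writes:
StackOverflow16600019<String, Integer> cache = new StackOverflow16600019<>();
cache.put("a", 1);
Collection<Integer> snap = cache.values();   // snapshot containing only 1
cache.put("b", 2);                           // later write, does not touch snap
for (Integer v : snap) {
    System.out.println(v);                   // prints 1
}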
I've found that the ctrie is the ideal solution - it's a concurrent hash array mapped trie with constant-time snapshots.
Solution 1) What about just synchronizing on the puts and on the iteration? That should give you a consistent snapshot.
Solution 2) Start iterating and set a boolean flag to say so; override put and putAll so that, while the flag is set, changes go into a queue, and when the iteration is finished simply apply those queued puts with the changed values.

Correctly synchronizing equals() in Java

I have the following class, which contains only one field, i. Access to this field is guarded by the lock of the object ("this"). When implementing equals() I need to lock this instance (a) and the other (b). If thread 1 calls a.equals(b) and at the same time thread 2 calls b.equals(a), the locking order is reversed in the two invocations and may result in deadlock.
How should I implement equals() for a class which has synchronized fields?
public class Sync {
    // @GuardedBy("this")
    private int i = 0;

    public synchronized int getI() { return i; }
    public synchronized void setI(int i) { this.i = i; }

    public int hashCode() {
        final int prime = 31;
        int result = 1;
        synchronized (this) {
            result = prime * result + i;
        }
        return result;
    }

    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Sync other = (Sync) obj;
        synchronized (this) {
            synchronized (other) {
                // May deadlock if "other" calls
                // equals() on "this" at the same
                // time
                if (i != other.i)
                    return false;
            }
        }
        return true;
    }
}
Trying to synchronize equals and hashCode inside the object will not work properly. Consider the case of a HashMap that uses hashCode to discover which "bucket" an object will be in, and then uses equals to sequentially search all objects in the bucket.
If objects are allowed to mutate in a way that changes the outcome of hashCode or equals, you could end up with the following scenario: HashMap calls hashCode, which acquires the lock, computes the hash and releases the lock again. HashMap then proceeds to compute which "bucket" to use. But before HashMap can acquire the lock inside equals, someone else grabs the lock and mutates the object so that equals becomes inconsistent with the previous value of hashCode. This will lead to catastrophic results.
The hashCode and equals methods are used in a lot of places and are core to the Java collections API. It might be valuable to rethink your application structure so that it does not require synchronized access to these methods, or at the very least not to synchronize on the object itself.
Why synchronise? What is the use case where it matters if one of them changes during the comparison, but does not matter if it changes immediately afterwards, before the code that depends on the equality runs? (i.e. if you have code depending on equality, what happens if the values become unequal before or during that code?)
I think you have to take a look at the larger process to see where you need to lock.
Where is the point in synchronizing equals() if the result isn't guaranteed to still be true after the synchronization has been left:
if (o1.equals(o2)) {
    // is o1 still equal to o2?
}
Hence you could simply synchronize the calls to getI() inside equals one after another without changing the outcome - the result is simply not guaranteed to be valid anymore anyway.
You'll always have to synchronize the whole block:
synchronized (o1) {
    synchronized (o2) {
        if (o1.equals(o2)) {
            // is o1 still equal to o2?
        }
    }
}
Admittedly, you'll still face the same problem, but at least you're synchronizing at the right point ;)
As has been said often enough, the fields you use for hashCode(), equals() or compareTo() should be immutable, preferably final. In that case you don't need to synchronise them.
The only reason to implement hashCode() is so the object can be added to a hash collection, and you cannot validly change the hashCode() of an object which has been added to such a collection.
You are attempting to define a content-based "equals" and "hashCode" on a mutable object. This is not only impossible: it doesn't make sense. According to
http://java.sun.com/javase/6/docs/api/java/lang/Object.html
both "equals" and "hashCode" need to be consistent: return the same value for successive invocations on the same object(s). Mutability by definition prevents that. This is not just theory: many other classes (eg collections) depend on the objects implementing the correct semantics for equals/hashCode.
The synchronization issue is a red herring here. When you solve the underlying problem (mutability), you won't need to synchronize. If you don't solve the mutability problem, no amount of synchronization will help you.
(I assume that you're interested in the general case here, and not just in wrapped integers.)
You can't prevent two threads from calling set... methods in arbitrary order. So even when one thread gets a (valid) true from calling .equals(...), that result could be invalidated immediately by another thread that calls set... on one of the objects. IOW the result only means that the values were equal at the instant of comparison.
Therefore, synchronizing would protect against the case of the wrapped value being in an inconsistent state while you are attempting to do the compare (e.g. two int-sized halves of a wrapped long being updated consecutively). You could avoid a race condition by copying each value (i.e. independently synchronized, without overlap) and then comparing the copies.
The only way to know for sure whether synchronization is strictly necessary is to analyze the entire program. There are two things you need to look for: situations where one thread is changing an object while another is calling equals, and situations where the thread calling equals might see a stale value of i.
If you lock both this and the other object at the same time you do indeed risk a deadlock. But I'd question that you need to do this. Instead, I think you should implement equals(Object) like this:
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    Sync other = (Sync) obj;
    return this.getI() == other.getI();
}
This does not guarantee that the two objects have the same value of i at the same time, but that is unlikely to make any practical difference. After all, even if you did have that guarantee, you'd still have to cope with the issue that the two objects might no longer be equal by the time that the equals call returned. (This is #s's point!)
Furthermore, this does not entirely eliminate the risk of deadlock. Consider the case where a thread may call equals while holding a lock on one of the two objects; e.g.
// In the same class as above ...
public synchronized void frobbitt(Object other) {
    if (this.equals(other)) {
        ...
    }
}
Now if two threads call a.frobbitt(b) and b.frobbitt(a) respectively, there is a risk of deadlock.
(However, you do need to call getI() or declare i to be volatile, otherwise the equals() could see a stale value of i if it was recently updated by a different thread.)
This having been said, there is something rather worrying about a value-based equals method on an object whose component values may be mutated. For example, this will break many of the collection types. Combine this with multi-threading and you are going to have a lot of difficulty figuring out whether your code is really correct. I cannot help thinking that you would be better off changing the equals and hashcode methods so that they don't depend on state that may mutate after the methods have been called the first time.
Always lock them in the same order. One way you could decide the order is on the result of System.identityHashCode(Object).
Edit to include comment:
The best solution to deal with the rare case of the identityHashCodes being equal requires more details about what other locking of those objects is going on.
All multiple object lock requirements should use the same resolution process.
You could create a shared utility to track objects with the same identityHashCode for the short period of the lock requirements, and provide a repeatable ordering for them for the period that they're being tracked.
The correct implementation of equals() and hashCode() is required by various things like hashing data structures, so you have no real option there. From another perspective, equals() and hashCode() are just methods, with the same requirements on synchronization as other methods. You still have the deadlock problem, but it's not specific to the fact that it's equals() that's causing it.
As Jason Day points out, integer compares are already atomic, so synchronizing here is superfluous. But if you were just constructing a simplified example and in real life you're thinking of a more complex object:
The direct answer to your question is: ensure that you always compare the items in a consistent order. It doesn't matter what that order is, as long as it's consistent. In a case like this, System.identityHashCode would provide an ordering, like:
public boolean equals(Object o)
{
    if (this == o)
        return true;
    if (o == null || !(o instanceof Sync))   // note the parentheses around the instanceof test
        return false;
    Sync so = (Sync) o;
    if (System.identityHashCode(this) < System.identityHashCode(o))
    {
        synchronized (this)
        {
            synchronized (o)
            {
                return equalsHelper(so);
            }
        }
    }
    else
    {
        synchronized (o)
        {
            synchronized (this)
            {
                return equalsHelper(so);
            }
        }
    }
}
Then declare equalsHelper private and let it do the real work of comparing.
(But wow, that's a lot of code for such a trivial issue.)
Note that for this to work, any function that can change the state of the object would have to be declared synchronized.
Another option would be to synchronize on Sync.class rather than on either object, and then to also synchronize any setters on Sync.class. This would lock everything on a single mutex and avoid the whole problem. Of course, depending on what you're doing this might cause undesired blocking of some threads. You'd have to think through the implications in light of what your program is about.
If this is a real issue in a project you're working on, a serious alternative to consider would be to make the object immutable. Think of String and StringBuilder. You could create a SyncBuilder object that lets you do any work you need to build one of these things, then have a Sync object whose state is set by the constructor and can never change. Create a constructor that takes a SyncBuilder and sets its state to match, or have a SyncBuilder.toSync method. Either way, you do all your building in SyncBuilder, then turn it into a Sync, and now you're guaranteed immutability so you don't have to mess with synchronization at all.
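A hedged sketch of what the immutable end product might look like (not code from the question; the builder is omitted). With a final field there is nothing left to synchronize in equals() or hashCode():
public final class Sync {
    private final int i;                 // set once, never mutated

    public Sync(int i) { this.i = i; }

    public int getI() { return i; }

    @Override
    public int hashCode() {
        return 31 + i;                   // same formula as before, no lock needed
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Sync)) return false;
        return i == ((Sync) obj).i;
    }
}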
Do not use syncs. Think about unmodifiable beans.
You need to make sure the objects do not change between the calls to hashCode() and equals() (if called). Then you must ensure that the objects do not change (to the extent that hashCode and equals are concerned) while the object sits in a HashMap. To change the object you must first remove it, then change it and put it back.
As others have mentioned, if things change during the equals check, there is already a possibility of crazy behavior (even with correct synchronization). So all you really need to worry about is visibility (you want to make sure that a change which "happens before" your equals call is visible). Therefore, you can just do a "snapshot" equals, which is correct in terms of the "happens before" relationship and does not suffer from lock-ordering problems:
public boolean equals(Object o) {
    // ... standard boilerplate here ...

    // take a "snapshot" (acquire and release each lock in turn)
    int myI = getI();
    int otherI = ((Sync) o).getI();

    // and compare (no locks held at this point)
    return myI == otherI;
}
Reads and writes to int variables are already atomic, so there is no need to synchronize the getter and setter (see http://java.sun.com/docs/books/tutorial/essential/concurrency/atomic.html).
Likewise, you don't need to synchronize equals here. While you could prevent another thread from changing one of the i values during comparison, that thread would simply block until the equals method completes and change it immediately afterwards.
