EnumMap with concurrent put/get - java

I am considering using EnumMap in a concurrent environment. However, the environment is atypical, here's why:
EnumMap is always full: there are no unmapped keys when the map is exposed to the concurrent environment
Only put() and get() operations will be used (no iterating over, no remove(), etc.)
It is completely acceptable if a call to get() does not reflect a call to put() immediately or in order.
From what I could gather, including relevant method source code, this seems to be a safe scenario (unlike if iterations were allowed). Is there anything I might have overlooked?

In general, using non-thread-safe classes across threads is fraught with many problems. In your particular case, assuming safe publication after all keys have had values assigned (such that map.size() == TheEnum.values().length), the only problem I can see from a quickish glance of EnumMap's code in Java 1.6 is that a put may not ever get reflected in another thread. But that's only true because of the internals of EnumMap's implementation, which could change in the future. In other words, future changes could break the use case in more dangerous, subtle ways.
It's possible to write correct code that still contains data races -- but it's tricky. Why not just wrap the instance in a Collections.synchronizedMap?

Straight from the JavaDoc:
Like most collection implementations EnumMap is not synchronized. If multiple threads access an enum map concurrently, and at least one of the threads modifies the map, it should be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the enum map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap(java.util.Map<K, V>) method. This is best done at creation time, to prevent accidental unsynchronized access:
Map<EnumKey, V> m = Collections.synchronizedMap(new EnumMap<EnumKey, V>(...));
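A minimal runnable sketch of the Javadoc's advice, assuming Java 8+ and a hypothetical Color enum that stands in for the real key type. The map is filled and wrapped before any other thread can see it, so only the synchronized wrapper is ever shared:

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

class SynchronizedEnumMapSketch {
    // Hypothetical enum, used only for this illustration.
    enum Color { RED, GREEN, BLUE }

    // Builds a fully populated EnumMap and wraps it before it is shared.
    static Map<Color, Integer> newFullMap() {
        EnumMap<Color, Integer> raw = new EnumMap<>(Color.class);
        for (Color c : Color.values()) {
            raw.put(c, 0);
        }
        // The raw map never escapes; all access goes through the wrapper.
        return Collections.synchronizedMap(raw);
    }

    // Two writer threads overwrite existing keys through the wrapper.
    static Map<Color, Integer> runWriters() {
        Map<Color, Integer> map = newFullMap();
        Thread t1 = new Thread(() -> map.put(Color.RED, 1));
        Thread t2 = new Thread(() -> map.put(Color.BLUE, 2));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(runWriters());
    }
}
```

With the wrapper in place, every put() and get() goes through the same lock, so visibility no longer depends on EnumMap's internals.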

The problem you have is that threads may never see the change made by another thread, or they may see partially made changes. It's the same reason double-checked locking was broken before Java 5 fixed the memory-model semantics of volatile.
It might work if you made the EnumMap reference volatile, but I'm not 100% sure even then. You might need the internal references inside the EnumMap to be volatile, and obviously you can't do that without writing your own version of EnumMap.


Java Concurrency in Practice: 3.5.4 Effectively immutable objects: Do we need Thread-Safe Collection containers for effectively immutable objects

Section 3.5.4 discusses effectively immutable objects, that is, objects whose state is not changed by any code path once they have been safely and fully constructed.
Goetz gives an example:
For example, Date is mutable, but if you use it as if it were
immutable you may be able to eliminate the locking that would
otherwise be required when sharing a Date across threads.
Suppose you want to maintain a Map storing the last login time of each
user:
public Map<String, Date> lastLogin =
Collections.synchronizedMap(new HashMap<String, Date>());
If the Date values are not modified after they are placed in the
Map, then the synchronization in the synchronizedMap
implementation is sufficient to publish the Date values safely, and
no additional synchronization is needed when accessing them.
The point I am not able to understand is why we would want to use synchronizedMap and bear the extra overhead of its internal locking when we could simply have used an unsafe Map. After all, we would be placing effectively immutable Date objects in it, which means that once properly and fully constructed and published, they would not be mutated anymore. So even if the Map itself is unsafe, there would be no code in any code path that could concurrently mutate a Date instance while other threads have retrieved it from the unsafe Map.
To sum up, the very premise of effectively immutable objects does not necessitate thread-safe containers, since we should not have any mutator code in any code path for effectively immutable objects.
If you use an unsynchronized mutable map and share it across threads, you will have two thread-safety issues: visibility and atomicity. Thread-1 won't know whether Thread-2 has removed a map entry or replaced its value with a new Date object.
// not atomic and doesn't guarantee visibility
if (map.containsKey(key)) {
    map.put(key, newDate);
}
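As a hedged sketch of how both issues can be addressed with the synchronizedMap from the book's example: the wrapper's lock gives visibility, and holding that same lock around the check and the put makes the pair atomic. The class and method names here are hypothetical.

```java
import java.util.Collections;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

class LastLoginSketch {
    private final Map<String, Date> lastLogin =
            Collections.synchronizedMap(new HashMap<String, Date>());

    // A single put() is already guarded by the wrapper's lock.
    void recordFirstLogin(String user, Date when) {
        lastLogin.put(user, when);
    }

    // Replaces the login time only if the user is already known.
    // Synchronizing on the wrapper makes the check and the put one atomic step.
    boolean updateIfPresent(String user, Date when) {
        synchronized (lastLogin) {
            if (lastLogin.containsKey(user)) {
                lastLogin.put(user, when);
                return true;
            }
            return false;
        }
    }

    Date get(String user) {
        return lastLogin.get(user);
    }
}
```

The Date values themselves need no extra locking as long as nobody mutates them after they are placed in the map.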
The key phrase from the original text is, "fully constructed and published." "Published", in particular refers to making an object created by one thread visible to other threads, and when the object is not truly immutable, then it must be done safely (Google "Java safe publication").
Without synchronization, Java does not guarantee that updates to variables made by one thread will be seen by other threads, or in what order the updates will be seen.
In most computer architectures, providing a consistent view of shared memory to all of the threads is relatively expensive. By not requiring the threads to have a consistent view except when explicitly synchronizing, Java allows the threads to get a consistent view when it is needed, or to get the best performance possible when it is not needed.
Also, all of the above ignores the very real possibility that the program might need to synchronize accesses to the Map for other reasons (e.g., to prevent simultaneous updates from corrupting the Map itself).

ConcurrentHashMap operations

Following are some lines from the java docs of ConcurrentHashMap
This class obeys the same functional specification as Hashtable, and
includes versions of methods corresponding to each method of
Hashtable. However, even though all operations are thread-safe,
retrieval operations do not entail locking, and there is not any
support for locking the entire table in a way that prevents all
access.
What is the meaning of the statement
though all operations are thread-safe
from above paragraph?
Can anyone explain with any example of put() or get() methods?
ConcurrentHashMap allows concurrent modification of the Map from several threads without the need to block them. Collections.synchronizedMap(map) creates a blocking Map, which will degrade performance but ensures consistency (if used properly).
Use the second option if you need to ensure data consistency, and each thread needs to have an up-to-date view of the map. Use the first if performance is critical, and each thread only inserts data to the map, with reads happening less frequently.
Your question is odd. If you understand what "thread safety" means then you would be able to understand how it applies to get() and put() on your own. If you don't understand thread safety then there is no point to explain it specifically in relation to get() and put(). Are you sure this isn't a homework question?
However, answering your question anyway, the fact that ConcurrentHashMap is thread safe means that if you have several threads executing put()s on the same map at the same time, then: a) no damage will occur to the internal data structures of the map and: b) some other thread doing a get() will see all of the values put in by the other threads. With a non-thread safe Map such as HashMap neither of those are guaranteed.
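A small sketch of what that guarantee looks like in practice, assuming Java 8+ for the lambda. Several threads put distinct keys into one ConcurrentHashMap with no external locking, and no update is lost. The class name and parameters are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentPutSketch {
    // Starts `threads` writers that each put `perThread` distinct keys,
    // waits for them, and returns the final size of the map.
    static int fill(int threads, int perThread) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread; // each thread gets its own key range
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(4, 1000));
    }
}
```

Swapping in a plain HashMap here would make the result unpredictable: entries can be lost and the internal table can be corrupted.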

Is ConcurrentHashMap totally safe?

This is a passage from the Javadoc for ConcurrentHashMap. It says retrieval operations generally do not block, so they may overlap with update operations. Does this mean the get() method is not thread-safe?
"However, even though all operations are thread-safe, retrieval
operations do not entail locking, and there is not any support for
locking the entire table in a way that prevents all access. This class
is fully interoperable with Hashtable in programs that rely on its
thread safety but not on its synchronization details.
Retrieval operations (including get) generally do not block, so may
overlap with update operations (including put and remove). Retrievals
reflect the results of the most recently completed update operations
holding upon their onset."
The get() method is thread-safe, and the other users gave you useful answers regarding this particular issue.
However, although ConcurrentHashMap is a thread-safe drop-in replacement for HashMap, it is important to realize that if you are doing multiple operations you may have to change your code significantly. For example, take this code:
if (!map.containsKey(key))
return map.put(key, value);
else
return map.get(key);
In a multi-thread environment, this is a race condition. You have to use the ConcurrentHashMap.putIfAbsent(K key, V value) and pay attention to the return value, which tells you if the put operation was successful or not. Read the docs for more details.
Answering a comment that asks for clarification on why this is a race condition:
Imagine there are two threads A, B that are going to put two different values in the map, v1 and v2 respectively, having the same key. The key is initially not present in the map. They interleave in this way:
Thread A calls containsKey and finds out that the key is not present, but is immediately suspended.
Thread B calls containsKey and finds out that the key is not present, and has the time to insert its value v2.
Thread A resumes and inserts v1, "peacefully" overwriting (since put is threadsafe) the value inserted by thread B.
Now thread B "thinks" it has successfully inserted its very own value v2, but the map contains v1. This is really a disaster, because thread B may call v2.updateSomething() and will "think" that the consumers of the map (e.g. other threads) have access to that object and will see that possibly important update (like: "this visitor IP address is trying to perform a DoS, refuse all requests from now on"). Instead, the object will soon be garbage collected and lost.
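The race above goes away if the check and the insert are one call. A minimal sketch, with hypothetical class and method names: putIfAbsent returns the previous value, or null if the caller's value was inserted, so each thread can tell which object actually ended up in the map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class PutIfAbsentSketch {
    private final ConcurrentMap<String, String> map = new ConcurrentHashMap<>();

    // Returns the value that is really stored under the key after the call:
    // either the one that was already there, or the candidate just inserted.
    String getOrInsert(String key, String candidate) {
        String existing = map.putIfAbsent(key, candidate);
        return existing != null ? existing : candidate;
    }
}
```

In the scenario with threads A and B, whichever thread loses gets the winner's value back and can work with that object instead of its own orphaned one.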
It is thread-safe. However, the way it is being thread-safe may not be what you expect. There are some "hints" you can see from:
This class is fully interoperable with Hashtable in programs that
rely on its thread safety but not on its synchronization details
To know the whole story in a more complete picture, you need to be aware of the ConcurrentMap interface.
The original Map provides some very basic read/update methods. Even if I were able to make a thread-safe implementation of Map, there are lots of cases where people cannot use my Map without considering my synchronization mechanism. This is a typical example:
if (!threadSafeMap.containsKey(key)) {
    threadSafeMap.put(key, value);
}
This piece of code is not thread-safe, even though the map itself is. Two threads calling containsKey() at the same time could both think there is no such key, and both would therefore insert into the Map.
To fix the problem, we need to do extra synchronization explicitly. Assuming the thread safety of my Map is achieved with the synchronized keyword, you will need to do:
synchronized (threadSafeMap) {
    if (!threadSafeMap.containsKey(key)) {
        threadSafeMap.put(key, value);
    }
}
Such extra code requires you to know about the "synchronization details" of the map. In the above example, we need to know that the synchronization is achieved by "synchronized".
The ConcurrentMap interface takes this one step further. It defines some common "complex" actions that involve multiple accesses to the map. For example, the above example is exposed as putIfAbsent(). With these "complex" actions, users of ConcurrentMap (in most cases) don't need to synchronize actions that involve multiple accesses to the map. Hence, the implementation of Map can use a more complicated synchronization mechanism for better performance. ConcurrentHashMap is a good example. Thread safety is in fact maintained by keeping separate locks for different partitions of the map. It is thread-safe because concurrent access to the map will not corrupt the internal data structure, cause updates to be lost unexpectedly, etc.
With all the above in mind, the meaning of Javadoc will be clearer:
"Retrieval operations (including get) generally do not block" because ConcurrentHashMap is not using "synchronized" for its thread safety. The logic of get itself takes care of thread safety. If you look further in the Javadoc:
The table is internally partitioned to try to permit the indicated number
of concurrent updates without contention
Not only is retrieval non-blocking, even updates can happen concurrently. However, non-blocking/concurrent updates do not mean that it is thread-UNsafe. It simply means that it uses something other than plain "synchronized" for thread safety.
However, as the internal synchronization mechanism is not exposed, if you want to do some complicated actions other than those provided by ConcurrentMap, you may need to consider changing your logic, or consider not using ConcurrentHashMap. For example:
// only remove if both key1 and key2 exist
if (map.containsKey(key1) && map.containsKey(key2)) {
    map.remove(key1);
    map.remove(key2);
}
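One way to get an atomic version of the two-key removal above is the "not using ConcurrentHashMap" route: keep a plain HashMap private and guard every access with the same lock. This is only a sketch under that assumption, with a hypothetical class name. It trades the concurrency of ConcurrentHashMap for the ability to define compound operations.

```java
import java.util.HashMap;
import java.util.Map;

class PairRemovalSketch<K, V> {
    // Never exposed; every access goes through a synchronized method.
    private final Map<K, V> map = new HashMap<>();

    synchronized V put(K key, V value) {
        return map.put(key, value);
    }

    synchronized V get(K key) {
        return map.get(key);
    }

    synchronized int size() {
        return map.size();
    }

    // Removes both keys only if both are present.
    // No other method of this object can run between the check and the removals.
    synchronized boolean removeBoth(K key1, K key2) {
        if (map.containsKey(key1) && map.containsKey(key2)) {
            map.remove(key1);
            map.remove(key2);
            return true;
        }
        return false;
    }
}
```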
ConcurrentHashMap.get() is thread-safe, in the sense that
It will not throw any exception, including ConcurrentModificationException
It will return a result that was true at some (recent) time in the past. This means that two back-to-back calls to get can return different results. Of course, this is true of any other Map as well.
HashMap is divided into "buckets" based on hashCode. ConcurrentHashMap uses this fact. Its synchronization mechanism is based on locking buckets rather than the entire Map. This way several threads can simultaneously write to different buckets (one thread can write to one bucket at a time).
Reading from ConcurrentHashMap uses almost no synchronization. Synchronization is used only when, while fetching the value for a key, it sees a null value. Since ConcurrentHashMap can't store null values (yes, like keys, values can't be null either), fetching null while reading suggests that the read happened while another thread was in the middle of initializing the map entry (key-value pair): the key was assigned but the value was not yet, so it still holds the default null.
In such a case the reading thread needs to wait until the entry has been written fully.
So results from get() will be based on the current state of the map. If you read the value of a key that is in the middle of being updated, you will likely get the old value, since the write hasn't finished yet.
get() in ConcurrentHashMap is thread-safe because it reads the value, which is volatile. When the value for a key appears to be null, the get() method waits until it gets the lock and then reads the updated value.
In the older segment-based implementation, put() creates a new entry and links it into the table, and a reader may in principle observe that entry before its value field has been written. Because the map never stores null values, get() treats a null value as a signal that another thread is still writing the entry for that key.
It just means that when one thread is updating and one thread is reading there is no guarantee that the one that called the ConcurrentHashMap method first, in time, will have their operation occur first.
Think about an update on the item telling where Bob is. If one thread asks where Bob is at about the same time that another thread updates to say he came 'inside', you can't predict whether the reader thread will get Bob's status as 'inside' or 'outside'. Even if the update thread calls the method first, the reader thread might get the 'outside' status.
The threads will not cause each other problems. The code is ThreadSafe.
One thread won't go into an infinite loop or start generating weird NullPointerExceptions or get "itside" with half of the old status and half of the new.

Is java.util.Hashtable thread safe?

It's been a while since I've used hashtable for anything significant, but I seem to recall the get() and put() methods being synchronized.
The JavaDocs don't reflect this. They simply say that the class Hashtable is synchronized. What can I assume? If several threads access the hashtable at the same time (assuming they are not modifying the same entry), the operations will succeed, right? I guess what I'm asking is "Is java.util.Hashtable thread safe?"
Please guide me on this.
It is thread-safe because the get, put, contains methods, etc. are synchronized. Furthermore, several threads will not be able to access the hashtable at the same time, regardless of which entries they are modifying.
edit - amended to include the proviso that synchronization makes the hashtable internally thread-safe in that it is modified atomically; it doesn't guard against race conditions in outside code caused by concurrent access to the hashtable by multiple threads.
For general usage it is thread safe.
But you have to understand that it doesn't make your application logic around it thread-safe. For example, consider putting a value in a map only if it's not there already.
This idiom is called putIfAbsent. It's difficult to implement in a thread-safe manner using Hashtable alone. The same goes for the idiom replace(K,V,V).
Hence, for idioms like putIfAbsent and replace(K,V,V), I would recommend using ConcurrentHashMap.
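To illustrate the two idioms together, here is a hedged sketch of a lock-free counter built from putIfAbsent and replace(K,V,V) on a ConcurrentHashMap. The class name, the "page" key and the hammer helper are invented for the example, and the lambda assumes Java 8+.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ReplaceLoopSketch {
    private final ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<>();

    // Increments the counter for a key without any explicit lock.
    int increment(String key) {
        while (true) {
            Integer current = hits.get(key);
            if (current == null) {
                // No entry yet: try to create it. A non-null result means
                // another thread inserted first, so loop and retry.
                if (hits.putIfAbsent(key, 1) == null) {
                    return 1;
                }
            } else {
                // Succeeds only if the stored value still equals `current`.
                int next = current + 1;
                if (hits.replace(key, current, next)) {
                    return next;
                }
            }
        }
    }

    // Runs several threads against one counter and returns the final count.
    // Assumes threads >= 1 and perThread >= 1.
    static int hammer(int threads, int perThread) {
        ReplaceLoopSketch sketch = new ReplaceLoopSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    sketch.increment("page");
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sketch.hits.get("page");
    }
}
```

A failed replace just means another thread updated the value in between, so the loop re-reads and tries again. No increment is lost.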
Hashtable is effectively obsolete. Forget it. If you want to use synchronized collections, use the Collections.synchronized*() wrappers for that purpose. But these are not recommended either. Java 5 and 6 introduced new concurrent algorithms: copy-on-write, CAS, lock-free algorithms.
For the Map interface there are two concurrent implementations: ConcurrentHashMap (a concurrent hash map) and ConcurrentSkipListMap (a concurrent sorted map implementation).
The first one is optimized for reading, so retrievals do not block even while the table is being updated. Writes are also much faster than with the synchronized wrappers, because a ConcurrentHashMap consists of not one but a set of tables, called segments. The number of segments can be tuned with the last argument of the constructor:
public ConcurrentHashMap(int initialCapacity,
float loadFactor,
int concurrencyLevel);
ConcurrentHashMap is indispensable in highly concurrent contexts, where it performs far better than any available alternative.
No. It is 'threadsafe' only to the extent that its methods are synchronized. However it is not threadsafe in general, and it can't be, because classes that export internal state such as Iterators or Enumerations require the use of the internal state to be synchronized as well. That's why the new Collections classes are not synchronized, as the Java designers recognized that thread-safety is up to the user of the class, not the class itself.
I'm asking is "Is java.util.Hashtable thread safe?".
Yes, Hashtable is thread-safe. If thread safety is not needed in your application, use HashMap. If a thread-safe implementation is desired, it is recommended to use ConcurrentHashMap in place of Hashtable.
Note that a lot of the answers state that Hashtable is synchronized, but this gives you very little. The synchronization on the accessor/mutator methods will stop two threads from adding to or removing from the map concurrently, but in the real world you will often need additional synchronization.
Even iterating over a Hashtable's entries is not thread safe unless you also guard the Map from being modified through additional synchronization.
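A short sketch of that additional synchronization for iteration, with a hypothetical helper class. Hashtable's methods are synchronized on the table instance itself, so holding that same monitor for the whole loop keeps writers out until the traversal is done.

```java
import java.util.Hashtable;
import java.util.Map;

class HashtableIterationSketch {
    // Sums all values while holding the table's own monitor, so that
    // concurrent put()/remove() calls must wait until the loop finishes.
    static int sum(Hashtable<String, Integer> table) {
        synchronized (table) {
            int total = 0;
            for (Map.Entry<String, Integer> entry : table.entrySet()) {
                total += entry.getValue();
            }
            return total;
        }
    }
}
```

Without the synchronized block, a put() from another thread during the loop can make the iterator throw ConcurrentModificationException.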
If you look into Hashtable code, you will see that methods are synchronized such as:
public synchronized V get(Object key)
public synchronized V put(K key, V value)
public synchronized boolean containsKey(Object key)
You can hold down the Control key (Command on a Mac) and click on any method name in Eclipse to go to the Java source code.
Unlike the new collection implementations, Hashtable is synchronized. *If a thread-safe implementation is not needed, it is recommended to use HashMap* in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use ConcurrentHashMap in place of Hashtable.
http://download.oracle.com/javase/7/docs/api/java/util/Hashtable.html
Yes, Hashtable is thread-safe, so only one thread can access a Hashtable at any time.
HashMap, on the other hand, is not thread-safe (and thus 'faster').

Are concurrent classes provided by the JDK required to use their instance's own intrinsic lock for synchronization?

The JDK provides a set of thread-safe classes like ConcurrentHashMap, ConcurrentLinkedQueue and AtomicInteger.
Are these classes required to synchronize on this to implement their thread-safe behavior?
Provided that they do, can we implement our own synchronized operations on these objects and mix them with the built-in ones?
In other words is it safe to do:
ConcurrentMap<Integer, Account> accounts
    = new ConcurrentHashMap<Integer, Account>();
// Add an account atomically
synchronized(accounts) {
    if (!accounts.containsKey(id)) {
        Account account = new Account();
        accounts.put(id, account);
    }
}
And in another thread
// Access the object expecting it to synchronize(this){…} internally
accounts.get(id);
Note that the simple synchronized block above could probably be replaced by putIfAbsent() but I can see other cases where synchronizing on the object could be useful.
Are these classes required to
synchronize on this to implement their
thread-safe behavior.
No and, not only that, the various code inspection tools will warn you if you do try to use the object lock.
In the case of the put method above, note the javadoc:
A hash table supporting full
concurrency of retrievals and
adjustable expected concurrency for
updates. This class obeys the same
functional specification as Hashtable,
and includes versions of methods
corresponding to each method of
Hashtable. However, even though all
operations are thread-safe, retrieval
operations do not entail locking, and
there is not any support for locking
the entire table in a way that
prevents all access. This class is
fully interoperable with Hashtable in
programs that rely on its thread
safety but not on its synchronization
details.
This means that the operations are thread-safe and there isn't a way to do what you're trying to do above (lock the whole table). Furthermore, the operations that you use (put and get) do not require such locking.
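Since synchronized(accounts) does not exclude the map's own operations, the "add an account atomically" step is better expressed with one of the map's atomic methods. A minimal sketch, assuming Java 8+ for computeIfAbsent and using an empty stand-in for the question's Account class:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class AccountsSketch {
    // Hypothetical stand-in for the Account class from the question.
    static class Account { }

    private final ConcurrentMap<Integer, Account> accounts = new ConcurrentHashMap<>();

    // Creates the account on first use. computeIfAbsent performs the
    // check and the insert as one atomic operation inside the map.
    Account getOrCreate(int id) {
        return accounts.computeIfAbsent(id, key -> new Account());
    }

    // Plain reads need no extra locking.
    Account find(int id) {
        return accounts.get(id);
    }
}
```

On Java 5 to 7 the same effect can be had with putIfAbsent, at the cost of sometimes constructing an Account that is then discarded.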
I particularly like this quote from the javadoc from the values() method:
The view's iterator is a "weakly
consistent" iterator that will never
throw ConcurrentModificationException,
and guarantees to traverse elements as
they existed upon construction of the
iterator, and may (but is not
guaranteed to) reflect any
modifications subsequent to
construction.
So, if you use this method, you'll get a reasonable list: it will have the data as of the request time and might or might not have any later updates. The assurance that you won't have to worry about the ConcurrentModificationExceptions is a huge one: you can write simple code without the synchronized block that you show above and know that things will just work.
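As a rough illustration of the weakly consistent iterator, the sketch below removes entries from a ConcurrentHashMap while iterating over its key set. The class and method names are invented for the example. The same loop over a HashMap's keySet() would throw ConcurrentModificationException on the second step.

```java
import java.util.concurrent.ConcurrentHashMap;

class WeaklyConsistentSketch {
    // Fills a map with `count` entries, then removes each entry from inside
    // the iteration over the key set. Returns how many entries are left.
    static int drainWhileIterating(int count) {
        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(i, "v" + i);
        }
        for (Integer key : map.keySet()) {
            // Modifying the map mid-iteration is allowed here.
            map.remove(key);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(drainWhileIterating(10));
    }
}
```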
