ConcurrentHashMap was introduced in Java 1.5 as part of the java.util.concurrent package. Before that, the only way to have a thread-safe map was to use Hashtable or Collections.synchronizedMap(Map).
For all practical purposes in a multithreaded environment, ConcurrentHashMap is sufficient, except for one case: when a thread needs a uniform view of the map.
My question is: apart from needing a uniform view of the map, are there any other scenarios in which ConcurrentHashMap is not an option?
The usage of Hashtable has been discouraged since Java 1.2 and the utility of synchronizedMap is quite limited and almost always ends up being insufficient due to the too-fine granularity of locking. However, when you do have a scenario where individual updates are the grain size you need, ConcurrentHashMap is a no-brainer better choice over synchronizedMap. It has better concurrency, thread-safe iterators (no, synchronizedMap doesn't have those—this is due to its design as a wrapper around a non-thread-safe map), better overall performance, and very little extra memory weight to pay for it all.
This is a stretch but I will give it as a use case.
Suppose you need a thread-safe Map implementation on which you can perform some extra compound operation that isn't available via ConcurrentMap. Let's say you want to ensure two other objects don't exist before adding a third.
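import java.util.Hashtable;

// Hashtable synchronizes every method on the instance itself,
// so holding that lock makes the whole check-then-put atomic.
Hashtable<Object, Object> t = new Hashtable<>();
synchronized (t) {
    // containsKey tests keys; the legacy contains(Object) scans values
    if (!t.containsKey(object1) && !t.containsKey(object2)) {
        t.put(object3, object3);
    }
}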
Again, this is a stretch, but you would not be able to achieve this with a ConcurrentHashMap alone while ensuring atomicity and thread safety. Because all operations of a Hashtable and its synchronizedMap counterpart synchronize on the instance of the map, holding that same lock around the compound operation ensures thread safety.
At the end of the day I would seldom, if ever, use a synchronizedMap/Hashtable and I suggest you should do the same.
As far as I understand, ConcurrentMap is a replacement for Hashtable and Collections.synchronizedMap() for thread-safe purposes. The use of both of those is discouraged. Thus, the answer to your question is "no, there are no other scenarios".
See also: What's the difference between ConcurrentHashMap and Collections.synchronizedMap(Map)?
Why do we need thread-safe collections if we can easily convert a non-thread-safe collection into a thread-safe one?
For example, we can create a synchronized ArrayList by using the Collections.synchronizedList() method.
synchronizedList just wraps all methods with exclusive locks. That may be too strict for you. For example, you may very well want to allow any number of concurrent read operations to proceed at the same time (and only serialize writes). A specialized implementation can offer that.
synchronizedList is only thread-safe in the sense that its internal state does not get corrupted. That may not be enough for your application. For example if (list.isEmpty()) list.add(1); is not thread-safe even on a synchronized list. Nor is for (String x: list) giving you a snapshot iteration. Specialized implementations can add higher-level atomic operations.
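To make the check-then-act problem concrete, here is a minimal sketch of how the compound operation can be made atomic by locking on the wrapper itself, which is the lock Collections.synchronizedList uses when no other mutex is supplied. The class and method names (SyncListCompound, addIfEmpty, snapshot) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SyncListCompound {
    // Hypothetical helper: adds the value only if the list is empty.
    static boolean addIfEmpty(List<Integer> syncList, int value) {
        // Holding the wrapper's own monitor makes the isEmpty check
        // and the add a single atomic step.
        synchronized (syncList) {
            if (syncList.isEmpty()) {
                syncList.add(value);
                return true;
            }
            return false;
        }
    }

    // Hypothetical helper: iteration also needs the lock, so copy under it.
    static List<Integer> snapshot(List<Integer> syncList) {
        synchronized (syncList) {
            return new ArrayList<Integer>(syncList);
        }
    }

    public static void main(String[] args) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<Integer>());
        System.out.println(addIfEmpty(list, 1)); // true
        System.out.println(addIfEmpty(list, 2)); // false
        System.out.println(snapshot(list));      // [1]
    }
}
```

This only works if every thread that performs a compound operation takes the same lock; a specialized collection can offer such operations without asking callers to remember that.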
Why do we need a thread-safe collection...
You don't need them, because, as you have pointed out,
we can create Synchronized ArrayList by using Collections.synchronizedList() method.
So why does the library provide "concurrent" collection classes? It's because some of those classes can be implemented using thread-safe algorithms, and especially, non-blocking algorithms that may be more efficient or safer than using a mutex-protected algorithm.
Of course, as others have pointed out, simply protecting a collection might not always be enough for your application. You might need a mutex anyway to protect some other data that is related to the collection.
But, if the lock-free versions are helpful to you, then the good news is that they are there; and if they are not helpful, then the good news is that you don't have to use them.
We all know that ConcurrentHashMap performs better, but is there any scenario where Hashtable is the better choice?
I'll say this before answering the question: do not ever use Hashtable anymore. Hashtable is a legacy tool from Java 1.0, predating the superior collections framework introduced with Java 2.
If you require a simple hash map, use HashMap. If you require performant thread safety, use ConcurrentHashMap. If you require plain, basic, non-performant thread safety, wrap a HashMap with Collections.synchronizedMap(Map).
Now to actually answer the question as asked, which purely compares two specific classes without considering the full spectrum of possibilities:
Yes, such scenarios exist
From the Hashtable documentation:
If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use ConcurrentHashMap in place of Hashtable.
So yes, Hashtable is appropriate for scenarios where you need a thread-safe implementation but do not "desire" a highly concurrent one. Again, this holds strictly within the exclusive comparison between ConcurrentHashMap and Hashtable.
Also, if you need an Enumeration[1], Hashtable supports it directly, while you have to go through Collections.enumeration(...) for other Maps.
1. Enumeration is also a Java 1.0 interface. Switch to Iterator (if using Java 2 to 7) or Stream (if using Java 8+).
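For reference, a small sketch of the two routes to an Enumeration mentioned above: Hashtable.keys() directly, and Collections.enumeration(...) over a key set for any other Map. The class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;

final class EnumerationDemo {
    // Hashtable exposes an Enumeration of its keys directly.
    static Enumeration<String> keysOfHashtable(Hashtable<String, Integer> table) {
        return table.keys();
    }

    // Any other Map has to adapt its key set through Collections.enumeration.
    static Enumeration<String> keysOfMap(Map<String, Integer> map) {
        return Collections.enumeration(map.keySet());
    }

    // Hypothetical helper: drains an Enumeration into a list.
    static List<String> drain(Enumeration<String> e) {
        List<String> out = new ArrayList<String>();
        while (e.hasMoreElements()) {
            out.add(e.nextElement());
        }
        return out;
    }
}
```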
I have the following data structure:
Map<String,Map<String,List<CustomPOJO>>>
Read operations on this data structure will be very frequent; there will also be write operations, but not many.
As far as reads are concerned, I guess there is no issue with using the plain java.util.HashMap API.
For write operations there are two possible approaches:
1. Put the entire data in a ConcurrentHashMap and write to it directly.
2. Perform all write operations in a synchronized block/method and use the plain java.util.HashMap API.
Please suggest which approach would be better for writes, and whether there is any pitfall in the read path.
First, how predictable are the outer Map's key strings? If they are all known at design time, I would rather turn them into an Enum and use an EnumMap for the outer Map. The same applies to the inner map. In that case your structure turns into
EnumMap<Enum, EnumMap<Enum, List<POJO>>>
and is perfectly resolved.
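A rough sketch of what such a nested EnumMap could look like, assuming two made-up enums (Region and Kind) stand in for the string keys. Because every key is known up front, the whole map structure is built once in the constructor and never changes shape afterwards. Note that EnumMap itself is not synchronized and the lists here are plain ArrayLists, so concurrent writers to the same list would still need their own protection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class NestedEnumMapDemo {
    enum Region { EU, US }        // hypothetical outer keys
    enum Kind { ORDER, REFUND }   // hypothetical inner keys

    private final Map<Region, Map<Kind, List<String>>> data =
            new EnumMap<Region, Map<Kind, List<String>>>(Region.class);

    NestedEnumMapDemo() {
        // Pre-populate every (outer, inner) slot so no map is ever resized or extended.
        for (Region r : Region.values()) {
            Map<Kind, List<String>> inner = new EnumMap<Kind, List<String>>(Kind.class);
            for (Kind k : Kind.values()) {
                inner.put(k, new ArrayList<String>());
            }
            data.put(r, inner);
        }
    }

    void add(Region r, Kind k, String item) {
        data.get(r).get(k).add(item);
    }

    List<String> get(Region r, Kind k) {
        return Collections.unmodifiableList(data.get(r).get(k));
    }
}
```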
Second, since you are using a map-of-maps structure in an environment where performance matters, I would assume the number of keys in the outer map is much smaller than the total number of POJOs in the entire structure. That is to say, the chance that you add a new submap to the structure is very small. In this case a ReadWriteLock is the best fit for the outer Map; for the inner map you could consider either a ReadWriteLock or a ConcurrentHashMap.
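As an illustration of the ReadWriteLock idea, here is a simplified sketch that guards the whole nested structure with a single ReentrantReadWriteLock rather than one lock per level. The class name GuardedNestedMap and its methods are invented for this example; readers share the read lock, writers take the write lock, and get returns a copy so callers never touch the guarded lists outside the lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class GuardedNestedMap<V> {
    private final Map<String, Map<String, List<V>>> data =
            new HashMap<String, Map<String, List<V>>>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    void add(String outer, String inner, V value) {
        lock.writeLock().lock(); // exclusive: blocks readers and other writers
        try {
            Map<String, List<V>> innerMap = data.get(outer);
            if (innerMap == null) {
                innerMap = new HashMap<String, List<V>>();
                data.put(outer, innerMap);
            }
            List<V> list = innerMap.get(inner);
            if (list == null) {
                list = new ArrayList<V>();
                innerMap.put(inner, list);
            }
            list.add(value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    List<V> get(String outer, String inner) {
        lock.readLock().lock(); // shared: many readers may hold it at once
        try {
            Map<String, List<V>> innerMap = data.get(outer);
            List<V> list = (innerMap == null) ? null : innerMap.get(inner);
            // Return a copy so the caller never sees later mutations unguarded.
            return (list == null) ? new ArrayList<V>() : new ArrayList<V>(list);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Copying on every read is a trade-off made here for safety; with very large lists you might prefer to do the work on the list while still holding the read lock.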
There are 3 major design considerations for the ConcurrentHashMap:
1. It generates a lot of temporary objects, so if your application is GC-sensitive you may want to limit its usage.
2. By default it allows at most 16 threads to update it concurrently (the default concurrency level in implementations before Java 8), but this is unlikely to be a concern.
3. Its size() isn't constant time.
I would usually apply the ReadWriteLock pattern, or even an atomic-variable-based implementation, mainly when point 1 turns out to be a problem. Otherwise I think ConcurrentHashMap does fine in most circumstances.
Also, note that in the latest JDK implementation the read/write priority of ReadWriteLock changed. If my memory is correct, it seems to favor read operations over writes, so if you have too many reads your writer thread might starve. In that case you might want your own read/write lock implementation.
I think this article best suits your question; it has an explanation. Personally I think you should use ConcurrentHashMap because it allows readers to read without blocking the whole map.
http://javarevisited.blogspot.com/2011/04/difference-between-concurrenthashmap.html
I'm new to threading in Java and I need to access a data structure from a few active threads. I've heard that java.util.concurrent.ConcurrentHashMap is threading-friendly. Do I need to use synchronized(map){} while accessing ConcurrentHashMap, or will it handle locks itself?
It handles the locks itself, and in fact you have no access to them (there is no other option)
You can use synchronized in special cases for writes, but it is very rare that you should need to do this, for example if you need to implement your own putIfAbsent because the cost of creating an object is high.
Using synchronized for reads would defeat the purpose of using the concurrent collection.
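On Java 8 and later, the "expensive putIfAbsent" case can usually be covered without a synchronized block by computeIfAbsent, which applies the creation function at most once per absent key. This is an alternative to the hand-rolled approach described above, not what the answer itself proposed; the class ExpensiveCache and its expensiveCreate method are placeholders for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class ExpensiveCache {
    private final ConcurrentHashMap<String, String> cache =
            new ConcurrentHashMap<String, String>();
    // Counts how often the costly creation actually ran (for demonstration only).
    final AtomicInteger creations = new AtomicInteger();

    // Placeholder for some costly object construction.
    private String expensiveCreate(String key) {
        creations.incrementAndGet();
        return key.toUpperCase();
    }

    String get(String key) {
        // The mapping function runs only if the key is absent, and the
        // whole check-and-insert is performed atomically by the map.
        return cache.computeIfAbsent(key, this::expensiveCreate);
    }
}
```

The mapping function should be short and must not update other mappings of the same map while it runs.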
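ConcurrentHashMap is suited only to cases where you don't need any more atomicity than it provides out of the box. If, for example, you need to get a value, do something with it, and then set a new value, all as one atomic operation, separate get and put calls cannot achieve this; you need external locking, a replace(key, oldValue, newValue) retry loop, or, since Java 8, the map's own atomic methods such as compute and merge.
Where the operation spans more than the map can cover atomically, nothing can replace explicit locks in your code, and if every access already goes through such a lock, using this implementation instead of a basic HashMap is nothing but waste.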
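For the common single-key case of "read a value, derive a new one, write it back", here is a sketch of two ways to stay atomic without an external lock: merge (Java 8+) and an optimistic retry loop built on putIfAbsent and replace, which have been on ConcurrentMap since Java 5. The class AtomicUpdates and the hammer helper are invented for the example; anything that must update several keys or other state together still needs explicit locking.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class AtomicUpdates {
    // Java 8+: merge performs the whole read-modify-write atomically.
    static void increment(ConcurrentHashMap<String, Integer> counts, String key) {
        counts.merge(key, 1, Integer::sum);
    }

    // Java 5+: retry until no other thread changed the value in between.
    static void incrementWithRetry(ConcurrentMap<String, Integer> counts, String key) {
        for (;;) {
            Integer old = counts.get(key);
            if (old == null) {
                if (counts.putIfAbsent(key, 1) == null) {
                    return; // we inserted the first value
                }
            } else if (counts.replace(key, old, old + 1)) {
                return; // value was still 'old', swap succeeded
            }
            // otherwise another thread won the race; loop and try again
        }
    }

    // Hypothetical stress helper: several threads increment the same key.
    static int hammer(int threads, final int perThread) {
        final ConcurrentHashMap<String, Integer> counts =
                new ConcurrentHashMap<String, Integer>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    increment(counts, "k");
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counts.getOrDefault("k", 0);
    }
}
```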
Short answer: no you don't need to use synchronized(map).
Long answer:
all the operations provided by ConcurrentHashMap are thread safe and you can call them without worrying about locking
however, if you need some operations to be atomic in your code, you will still need some sort of locking at the client side
No, you don't need to, but if you depend on the map's internal synchronization (for example, locking on the map instance), you should use Collections.synchronizedMap instead. From the Javadoc of ConcurrentHashMap:
This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
Actually it won't synchronize on the whole data structure but on subparts (some buckets) of it.
This implies that ConcurrentHashMap's iterators are weakly consistent and the size of the map can be inaccurate. (But on the other hand, its put and get operations are still consistent and the throughput is higher.)
There is one more important feature of ConcurrentHashMap besides the concurrency it provides: its fail-safe iterator. Some people use ConcurrentHashMap just because they want to put or remove entries while iterating over the map.
Collections.synchronizedMap(Map) is the other option, but a ConcurrentModificationException may be thrown in that case.
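A small single-threaded sketch of the difference, assuming a made-up helper that removes every key while iterating over the key set. With a synchronized wrapper around a HashMap the iterator fails fast; with ConcurrentHashMap the iterator tolerates the removals.

```java
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class IterationDemo {
    // Hypothetical helper: removes each key while iterating.
    // Returns true if the iterator threw ConcurrentModificationException.
    static boolean throwsWhileRemoving(Map<String, Integer> map) {
        try {
            for (String key : map.keySet()) {
                map.remove(key); // structural change behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Hypothetical helper: fills a map with three entries.
    static Map<String, Integer> fill(Map<String, Integer> map) {
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> sync =
                fill(Collections.synchronizedMap(new HashMap<String, Integer>()));
        Map<String, Integer> chm = fill(new ConcurrentHashMap<String, Integer>());
        System.out.println(throwsWhileRemoving(sync)); // true
        System.out.println(throwsWhileRemoving(chm));  // false
    }
}
```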
Why is Java Vector considered a legacy class, obsolete or deprecated?
Isn't its use valid when working with concurrency?
And if I don't want to manually synchronize objects and just want to use a thread-safe collection without needing to make fresh copies of the underlying array (as CopyOnWriteArrayList does), then is it fine to use Vector?
What about Stack, which is a subclass of Vector, what should I use instead of it?
Vector synchronizes on each individual operation. That's almost never what you want to do.
Generally you want to synchronize a whole sequence of operations. Synchronizing individual operations is both less safe (if you iterate over a Vector, for instance, you still need to take out a lock to avoid anyone else changing the collection at the same time, which would cause a ConcurrentModificationException in the iterating thread) and slower (why take out a lock repeatedly when once will be enough?).
Of course, it also has the overhead of locking even when you don't need to.
Basically, it's a very flawed approach to synchronization in most situations. As Mr Brian Henk pointed out, you can decorate a collection using calls such as Collections.synchronizedList. The fact that Vector combines the "resizable array" collection implementation with the "synchronize every operation" bit is another example of poor design; the decoration approach gives a cleaner separation of concerns.
As for a Stack equivalent - I'd look at Deque/ArrayDeque to start with.
Vector was part of 1.0 -- the original implementation had two drawbacks:
1. Naming: vectors are really just lists which can be accessed as arrays, so it should have been called ArrayList (which is the Java 1.2 Collections replacement for Vector).
2. Concurrency: All of the get(), set() methods are synchronized, so you can't have fine grained control over synchronization.
There is not much difference between ArrayList and Vector, but you should use ArrayList.
From the API doc:
As of the Java 2 platform v1.2, this class was retrofitted to implement the List interface, making it a member of the Java Collections Framework. Unlike the new collection implementations, Vector is synchronized.
Besides the points already made in other answers, Vector also has a bunch of methods for enumeration and element retrieval that differ from the List interface, and developers (especially those who learned Java before 1.2) tend to use them if they see them in the code. Although Enumerations are faster, they don't check whether the collection was modified during iteration, which can cause issues. Given that Vector might be chosen for its synchronization, with the attendant access from multiple threads, this makes it a particularly pernicious problem. Using these methods also couples a lot of code to Vector, so that it won't be easy to replace it with a different List implementation.
You can use the synchronizedCollection/synchronizedList methods in java.util.Collections to get a thread-safe collection from a non-thread-safe one.
java.util.Stack inherits the synchronization overhead of java.util.Vector, which is usually not justified.
It inherits a lot more than that, though. The fact that java.util.Stack extends java.util.Vector is a mistake in object-oriented design. Purists will note that it also offers a lot of methods beyond the operations traditionally associated with a stack (namely: push, pop, peek, size). It's also possible to do search, elementAt, setElementAt, remove, and many other random-access operations. It's basically up to the user to refrain from using the non-stack operations of Stack.
For these performance and OOP design reasons, the JavaDoc for java.util.Stack recommends ArrayDeque as the natural replacement. (A deque is more than a stack, but at least it's restricted to manipulating the two ends, rather than offering random access to everything.)
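For completeness, a minimal sketch of ArrayDeque used as a stack through the Deque interface, where push, pop and peek all work on the head. The reverse helper is only an illustration. Keep in mind that ArrayDeque is not thread-safe, so this replaces Stack's stack behaviour, not its synchronization.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class DequeStackDemo {
    // Hypothetical helper: reverses a string using LIFO order.
    static String reverse(String s) {
        Deque<Character> stack = new ArrayDeque<Character>();
        for (char c : s.toCharArray()) {
            stack.push(c); // adds at the head
        }
        StringBuilder sb = new StringBuilder();
        while (!stack.isEmpty()) {
            sb.append(stack.pop()); // removes from the head
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("abc")); // cba
    }
}
```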