Why we need a thread-safe collection if we easily convert a non-thread-safe collection to Thread safe.
Ex: we can create Synchronized ArrayList by using Collections.synchronizedList() method.
synchronizedList just wraps all methods with exclusive locks. That may be too strict for you. For example, you may very well want to allow any number of concurrent read operations to proceed at the same time (and only serialize writes). A specialized implementation can offer that.
synchronizedList is only thread-safe in the sense that its internal state does not get corrupted. That may not be enough for your application. For example if (list.isEmpty()) list.add(1); is not thread-safe even on a synchronized list. Nor is for (String x: list) giving you a snapshot iteration. Specialized implementations can add higher-level atomic operations.
Why we need a thread-safe collection...
You don't need them, because, as you have pointed out,
we can create Synchronized ArrayList by using Collections.synchronizedList() method.
So why does the library provide "concurrent" collection classes? It's because some of those classes can be implemented using thread-safe algorithms, and especially, non-blocking algorithms that may be more efficient or safer than using a mutex-protected algorithm.
Of course, as others have pointed out, simply protecting a collection might not always be enough for your application. You might need a mutex anyway to protect some other data that is related to the collection.
But, if the lock-free versions are helpful to you, then the good news is that they are there; and if they are not helpful, then the good news is that you don't have to use them.
Related
By using VarHandle class in Java you can achieve atomic operations on objects, through which classes like AtomicReferenceArray can perform concurrent operations on elements of array without locking (synchronized) the whole array structure ending up in significant performance gain.
Is there any specific reason why other synchronized and thread safe versions of collections are not implementing such model?
For example all the synchronized factory methods in Collections class return the version of their corresponding collection that uses synchronized methods which locks the whole collection!
Atomic operations on objects don't make a whole collection necessarily thread-safe. The synchronized factory methods in Collections, for example, are the only way to make arbitrary collection implementations thread-safe.
It is possible that VarHandle could be used to optimize collections specifically built for concurrency, such as ConcurrentHashMap. If you think that VarHandles could be used to improve those types, you should file a feature request. It's entirely possible that just nobody's bothered to do it.
While reading Oracle tutorial on collections implementations, i found the following sentence :
If you need synchronization, a Vector will be slightly faster than an ArrayList synchronized with Collections.synchronizedList
source : List Implementations
but when searching for difference between them, many people discourage the using of Vector and should be replaced by SynchronizedList when the synchronization is needed.
So which side has right to be followed ?
When you use Collections.synchronizedList(new ArrayList<>()) you are separating the two implementation details. It’s clear, how you could change the underlying storage model from “array based” to, e.g. “linked nodes”, by simply replacing new ArrayList with new LinkedList without changing the synchronized decoration.
The statement that Vector “will be slightly faster” seems to be based on the fact that its use does not bear a delegation between the wrapper and the underlying storage, but deriving statements about the performance from that was even questionable by the time, when ArrayList and the synchronizedList wrapper were introduced.
It should be noted that when you are really concerned about the performance of a list accessed by multiple threads, you will use neither of these two alternatives. The idea of making a storage thread safe by making all access methods synchronized is flawed right from the start. Every operation that involves multiple access to the list, e.g. simple constructs like if(!list.contains(o)) list.add(o); or iterating over the list or even a simple Collections.swap(list, i, j); require additional manual synchronization to work correctly in a multi-threaded setup.
If you think it over, you will realize that most operations of a real life application consist of multiple access and therefore will require careful manual locking and the fact that every low level access method synchronizes additionally can not only take away performance, it’s also a disguise pretending a safety that isn’t there.
Vector is an old API, of course. No questions there.
For speed though, it might be only because the synchronized list involves extra method calls to reach and return the data since it is a "wrapper" on top of a list after all. That's all there is to it, imho.
Java has tons of different Collections designed for concurrency and thread safety, and I'm at a loss as to which one to choose for my situation.
Multiple threads may be calling .add() and .remove(), and I will be copying this list frequently with something like List<T> newList = new ArrayList<T>(concurrentList). I will never be looping over the concurrent list.
I thought about something like CopyOnWriteArrayList, but I've read that it can be very inefficient because it copies itself every time it's modified. I'm hoping to find a good compromise between safety and efficiency.
What is the best list (or set) for this situation?
As #SpiderPig said, the best case scenario with a List would be an immutable, singly-linked list.
However, looking at what's being done here, a List is unnecessary (#bhspencer's comment). A ConcurrentSkipListSet will work most efficiently (#augray).
This Related Thread's accepted answer offers more insight on the pros and cons of different concurrent collections.
You might want to look into whether a ctrie would be appropriate for your use case - it has thread-safe add and remove operations, and "copying" (in actuality, taking a snapshot of) the data structure runs in O(1). I'm aware of two JVM implementations of the data structure: implementation one, implementation two.
Collections.newSetFromMap(new ConcurrentHashMap<...>())
This is typically how a normal Set is done (HashSet is really a modified wrapper over HashMap). It offers both the advantages of performance/concurrecy from ConcurrentHashMap, and does not have extra features like ConcurrentSkipListSet (ordering), COW lists (copying every modification), or concurrent queues (FIFO/LIFO ordering).
Edit: I didn't see #bhspencer's comment on the original post, apologies for stealing the spotlight.
Hashset being hashing based would be better than List.
Add last and remove first will be good with LinkedList.
Search will be fast in arraylist being array index based.
Thanks,
I'm new to threading in Java and I need to access data structure from few active threads. I've heard that java.util.concurrent.ConcurrentHashMap is threading-friendly. Do I need to use synchronized(map){}
while accessing ConcurrentHashMap or it will handle locks itself?
It handles the locks itself, and in fact you have no access to them (there is no other option)
You can use synchronized in special cases for writes, but it is very rare that you should need to do this. e.g. if you need to implement your own putIfAbsent because the cost of creating an object is high.
Using syncrhonized for reads would defeat the purpose of using the concurrent collection.
ConcurrentHashMap is suited only to the cases where you don't need any more atomicity than provided out-of-the-box. If for example you need to get a value, do something with it, and then set a new value, all in an atomic operation, this cannot be achieved without external locking.
In all such cases nothing can replace explicit locks in your code and it is nothing but waste to use this implementation instead of the basic HashMap.
Short answer: no you don't need to use synchronized(map).
Long answer:
all the operations provided by ConcurrentHashMap are thread safe and you can call them without worrying about locking
however, if you need some operations to be atomic in your code, you will still need some sort of locking at the client side
No, you don't need, but if you need to depend on internal synchronization, you should use Collections.synchronizedMap instead. From the javadoc of ConcurrentHashMap:
This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
Actually it won't synchronize on the whole data structure but on subparts (some buckets) of it.
This implies that ConcurrentHashMap's iterators are weakly consistent and the size of the map can be inaccurate. (But on the other hand it's put and get operations are still consistent and the throughput is higher)
There is one more important feature to note for concurrenthmp other than the concurrency feature it provides, which is fail safe iterator. Use CHMP just because they want to edit the entryset for put/remove while iteration.
Collections.synchronizedMap(Map) is other one. But ConcurrentModificationException may come in the above case.
Why is Java Vector considered a legacy class, obsolete or deprecated?
Isn't its use valid when working with concurrency?
And if I don't want to manually synchronize objects and just want to use a thread-safe collection without needing to make fresh copies of the underlying array (as CopyOnWriteArrayList does), then is it fine to use Vector?
What about Stack, which is a subclass of Vector, what should I use instead of it?
Vector synchronizes on each individual operation. That's almost never what you want to do.
Generally you want to synchronize a whole sequence of operations. Synchronizing individual operations is both less safe (if you iterate over a Vector, for instance, you still need to take out a lock to avoid anyone else changing the collection at the same time, which would cause a ConcurrentModificationException in the iterating thread) but also slower (why take out a lock repeatedly when once will be enough)?
Of course, it also has the overhead of locking even when you don't need to.
Basically, it's a very flawed approach to synchronization in most situations. As Mr Brian Henk pointed out, you can decorate a collection using the calls such as Collections.synchronizedList - the fact that Vector combines both the "resized array" collection implementation with the "synchronize every operation" bit is another example of poor design; the decoration approach gives cleaner separation of concerns.
As for a Stack equivalent - I'd look at Deque/ArrayDeque to start with.
Vector was part of 1.0 -- the original implementation had two drawbacks:
1. Naming: vectors are really just lists which can be accessed as arrays, so it should have been called ArrayList (which is the Java 1.2 Collections replacement for Vector).
2. Concurrency: All of the get(), set() methods are synchronized, so you can't have fine grained control over synchronization.
There is not much difference between ArrayList and Vector, but you should use ArrayList.
From the API doc.
As of the Java 2 platform v1.2, this
class was retrofitted to implement the
List interface, making it a member of
the Java Collections Framework. Unlike
the new collection implementations,
Vector is synchronized.
Besides the already stated answers about using Vector, Vector also has a bunch of methods around enumeration and element retrieval which are different than the List interface, and developers (especially those who learned Java before 1.2) can tend to use them if they are in the code. Although Enumerations are faster, they don't check if the collection was modified during iteration, which can cause issues, and given that Vector might be chosen for its syncronization - with the attendant access from multiple threads, this makes it a particularly pernicious problem. Usage of these methods also couples a lot of code to Vector, such that it won't be easy to replace it with a different List implementation.
You can use the synchronizedCollection/List method in java.util.Collection to get a thread-safe collection from a non-thread-safe one.
java.util.Stack inherits the synchronization overhead of java.util.Vector, which is usually not justified.
It inherits a lot more than that, though. The fact that java.util.Stack extends java.util.Vector is a mistake in object-oriented design. Purists will note that it also offers a lot of methods beyond the operations traditionally associated with a stack (namely: push, pop, peek, size). It's also possible to do search, elementAt, setElementAt, remove, and many other random-access operations. It's basically up to the user to refrain from using the non-stack operations of Stack.
For these performance and OOP design reasons, the JavaDoc for java.util.Stack recommends ArrayDeque as the natural replacement. (A deque is more than a stack, but at least it's restricted to manipulating the two ends, rather than offering random access to everything.)