https://ivoanjo.me/blog/2018/07/21/writing-to-a-java-treemap-concurrently-can-lead-to-an-infinite-loop-during-reads/ demonstrates how multiple concurrent writers may corrupt a TreeMap in such a way that cycles are created, and iterating the structure becomes an infinite loop.
Is it also possible to get in an infinite loop when concurrently iterating and writing with at most one concurrent writer? If not, can anything happen other than skipping elements, processing elements twice, or throwing a ConcurrentModificationException?
Is it also possible to get in an infinite loop when concurrently iterating and writing with at most one concurrent writer?
I would say a cautious no: these infinite loops occur because multiple threads are re-wiring the relationships between the nodes, and so may make conflicting updates. A single thread won't conflict with itself, so such a re-wiring mixup would not occur.
However, I am not confident in this - but I don't need to be: such a usage of a TreeMap is in violation of the documentation:
If multiple threads access a map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
If you don't externally synchronize, the behavior is undefined. Implementors of the class are free to implement the class in any way that meets the specification; and it is possible that one way might result in an infinite loop.
If you are encountering an infinite loop in a TreeMap, that's a symptom; the root cause is unsynchronized access to mutable data. This means that there is no guarantee that the values being read by the only-reading threads are correct either.
If you need to concurrently access a map, you'll have to synchronize it, for example with Collections.synchronizedMap. Otherwise, don't expect it to work correctly.
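A minimal sketch of what that external synchronization might look like, assuming one writer thread and one iterating thread. The class and method names are made up for illustration. Note that the wrapper locks individual calls such as put, but the Javadoc for Collections.synchronizedMap requires the caller to hold the map's monitor while traversing any of its views.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

class SynchronizedTreeMapSketch {
    // One thread writes `count` entries while the calling thread iterates
    // repeatedly. Returns the final size of the map.
    static int writeWhileIterating(int count) {
        Map<Integer, Integer> map = Collections.synchronizedMap(new TreeMap<>());

        Thread writer = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                map.put(i, i); // each call locks the wrapper internally
            }
        });
        writer.start();

        for (int pass = 0; pass < 100; pass++) {
            // Iteration is NOT locked by the wrapper: hold the map's own
            // monitor for the whole traversal so the writer cannot interleave.
            synchronized (map) {
                long sum = 0;
                for (Map.Entry<Integer, Integer> e : map.entrySet()) {
                    sum += e.getValue();
                }
            }
        }

        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(writeWhileIterating(10_000));
    }
}
```

If sorted order is needed without a single global lock, java.util.concurrent.ConcurrentSkipListMap is another option worth considering.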
I just want to understand why do we really need thread safety with collections? I know that if there are two threads and first thread is iterating over the collection and second thread is modifying the collection then first thread will get ConcurrentModificationException.
But what if I know that none of my threads will iterate over the collection using an iterator? Does that mean thread safety is only needed because we want to allow other threads to iterate over the collection using an iterator? Are there any other reasons and use cases?
Any sort of reading and any sort of writing in two different threads can cause issues if you're not using a thread safe collection.
Iterating is a kind of reading, but so are List.get, Set.contains, Map.get, and many other operations.
(Iterating is a case in which Java can easily detect thread safety issues -- though multiple threads are rarely the actual reason for ConcurrentModificationException.)
I know that if there are two threads and first thread is iterating over the collection and second thread is modifying the collection then first thread will get ConcurrentModificationException.
That's not true. ConcurrentModificationException is for the situation where you iterate through a collection and change it at the same time.
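To illustrate that no second thread is required, here is a small single-threaded sketch (the class and method names are invented for this example). It removes an element from an ArrayList through the list itself while a for-each loop is iterating over it.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

class SingleThreadCmeSketch {
    // Returns true if modifying the list inside a for-each loop triggered
    // a ConcurrentModificationException.
    static boolean removeDuringForEach() {
        List<Integer> numbers = new ArrayList<>(List.of(1, 2, 3, 4, 5));
        try {
            for (Integer n : numbers) {
                if (n == 2) {
                    // Structural change made behind the iterator's back;
                    // the next call to next() detects it.
                    numbers.remove(n);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeDuringForEach());
    }
}
```

The check is best-effort: the ArrayList documentation describes fail-fast behavior as something that cannot be guaranteed, so code should not rely on the exception for correctness.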
Thread safety is complex concept that includes several parts. It won't be easy to explain why we need it.
The main point is that, for reasons outside the scope of this discussion, changes made in one thread may not be visible in another.
I just want to understand why do we really need thread safety with collections?
We need thread safety with any object that is being modified. Period. If two threads are sharing the same object and one thread makes a modification, there is no guarantee that the update to the object will be seen by the other thread and there are possibilities that the object may be partially updated causing exceptions, hangs, or other unexpected results.
One of the speedups that is gained with threads is local CPU cached memory. Each thread running in a CPU has local cached memory that is much faster than system memory. The cache is used as much as possible for high speed calculations and then invalidated or flushed to system memory when necessary. Without locking and memory synchronization, each thread could be working with invalid memory and could experience race conditions.
This is why threaded programs need to use concurrent collections (think ConcurrentHashMap) or protect collections (or any mutable object) using synchronized locks or other mechanisms. These ensure that the objects can't be modified at the same time and ensure that the modifications are published between threads appropriately.
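As a rough sketch of the "protect with synchronized locks" option, the hypothetical class below guards a plain ArrayList with a private lock object. Every read and write goes through the same lock, which provides both mutual exclusion and visibility between threads.

```java
import java.util.ArrayList;
import java.util.List;

class GuardedListSketch {
    private final List<Integer> items = new ArrayList<>(); // not thread-safe by itself
    private final Object lock = new Object();

    void add(int value) {
        synchronized (lock) { // writers take the lock
            items.add(value);
        }
    }

    int size() {
        synchronized (lock) { // readers take the same lock to see the writes
            return items.size();
        }
    }

    // Two threads each add `perThread` values; returns the final size.
    static int fillFromTwoThreads(int perThread) {
        GuardedListSketch guarded = new GuardedListSketch();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                guarded.add(i);
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return guarded.size();
    }

    public static void main(String[] args) {
        System.out.println(fillFromTwoThreads(50_000));
    }
}
```

Without the synchronized blocks, the same program could lose elements or throw from inside ArrayList, depending on timing.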
Java provides a subList function to get a view of the list between specified indices and is backed by the parent list meaning, any changes made to subList will reflect in the actual list. What I wish to know is whether these sublists get locked by the parent list if threads try to access them.
As an example, if I have an ArrayList of 100 elements and I create 4 sublists each with 25 elements and 4 threads try to work in parallel on these sublists, will they work on their independent sublists in a truly parallel manner or will the first thread which gets executed, lock the backing arraylist?
If an arraylist is not locked by default, I am assuming the threads will run in parallel on the sublists without waiting for each other and if I programmatically ensure or rather the logic itself ensures that these threads never work on anything else other than their sublists then it will truly be parallel processing of the sublists, right?
executor.addTask(new Thread(doneSignal, parentList.subList(subListStart, subListEnd)));
The reason I ask is, I tried to loop over the sublists in parallel and noticed that it was substantially slower than not creating 4 threads and looping over the actual parent list.
As it says in the Javadoc for java.util.ArrayList:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.
Of course, this applies to the subList method. So, no locking is done by the ArrayList itself; you need to do it yourself if you require it.
What I wish to know is whether these sublists get locked by the parent list if threads try to access them.
No, they won't.
... [w]ill they work on their independent sublists in a truly parallel manner
Yes. (Unless there are other factors that are working against parallelism.)
I tried to loop over the sublists in parallel and noticed that it was substantially slower than not creating 4 threads and looping over the actual parent list.
That could be down to other things. For example, thread creation overheads, a thread pool that is too small, or trying to run multi-threaded code when there are too few cores.
Another possibility is that you are creating sublists of a synchronized list. If you do that, when the sublist methods delegate operations to the parent list, those operations will all be locking the same list. Note however it is the parent list that is responsible for this, not the sublists.
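For reference, a sketch of read-only parallel work over sublists of an unsynchronized ArrayList. The class name, method name, and the choice of a fixed thread pool are assumptions for illustration, not the asker's actual code. The list is fully populated before the tasks are submitted and is never modified afterwards, so no locking is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SubListSumSketch {
    // Splits `parent` into up to `parts` sublist views and sums them in parallel.
    static long parallelSum(List<Integer> parent, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            int chunk = (parent.size() + parts - 1) / parts; // ceiling division
            List<Future<Long>> futures = new ArrayList<>();
            for (int start = 0; start < parent.size(); start += chunk) {
                int end = Math.min(start + chunk, parent.size());
                List<Integer> view = parent.subList(start, end); // a view, not a copy
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int v : view) {
                        sum += v;
                    }
                    return sum;
                };
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // waits for each partial result
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> parent = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            parent.add(i);
        }
        System.out.println(parallelSum(parent, 4));
    }
}
```

For a list of only 100 elements, the cost of creating threads and handing off tasks will likely dominate the work itself, which is consistent with the slowdown described in the question.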
I have an ArrayList instance that is shared by multiple threads. It gets initialized in a synchronized block (so there is a memory barrier to make it visible to all threads) and all threads only read from it. The ArrayList never changes.
I've read lots of posts online but it's still not clear to me if it's safe to read no matter how I do the read. If I get an iterator from it in each thread, do the iterators share some state that gets altered while iterating? I'm not sharing the iterators; each thread gets its own.
Is it thread safe for reads, no matter how I do the read?
As long as each thread has its own iterator, then you are OK.
The only time you need to worry about synchronization is when one thread is modifying (writing) a shared data structure while others are reading from it. This can lead to the data structure being in an inconsistent state (imagine the thread hadn't finished its modifications when all of a sudden the scheduler pre-empts it and switches to another thread).
When all threads are only reading, the data will never be in an inconsistent state, and you don't need to worry about thread synchronization.
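A small sketch of the read-only case, with hypothetical names. The list is built and frozen before any reader thread starts (Thread.start provides the needed visibility here), and each for-each loop obtains its own iterator, so nothing is shared except the immutable data. List.copyOf requires Java 10 or later.

```java
import java.util.List;

class ReadOnlySharedListSketch {
    // Each of `threads` readers sums the whole list with its own iterator.
    static long[] sumFromEachThread(List<Integer> source, int threads) {
        List<Integer> shared = List.copyOf(source); // immutable snapshot
        long[] sums = new long[threads];
        Thread[] readers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int slot = t;
            readers[t] = new Thread(() -> {
                long sum = 0;
                for (int v : shared) { // for-each creates a private iterator
                    sum += v;
                }
                sums[slot] = sum; // each thread writes only its own slot
            });
            readers[t].start();
        }
        for (Thread reader : readers) {
            try {
                reader.join(); // join makes the slot writes visible to the caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        for (long sum : sumFromEachThread(List.of(1, 2, 3, 4), 3)) {
            System.out.println(sum);
        }
    }
}
```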
As long as the iterators are used only for reading, this will work as expected. Also, an iterator is fail-fast: it may throw a ConcurrentModificationException for the following reasons:
- In multithreaded processing, if one thread is trying to modify a Collection while another thread is iterating over it.
- In single-threaded or multithreaded processing, if after the creation of the Iterator the collection is modified using its own methods rather than the Iterator's own methods.
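The second case has a standard remedy: remove through the iterator rather than through the collection. A short sketch, with names chosen for illustration:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class IteratorRemoveSketch {
    // Returns a copy of `input` with the even numbers removed.
    static List<Integer> dropEvens(List<Integer> input) {
        List<Integer> numbers = new ArrayList<>(input);
        Iterator<Integer> it = numbers.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove(); // keeps the iterator's bookkeeping in sync with the list
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(dropEvens(List.of(1, 2, 3, 4, 5, 6)));
    }
}
```

This only addresses the single-threaded case. It does not make the list safe to share between threads.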
I have a HashMap that I need multiple threads to access. My design is such that each thread will be reading and writing to its own entry in the map: the same map, but no two threads work on the same entry. No one thread will ever need to iterate over the map or call size(). So my question is: do I have to synchronize on the Hashmap or may I just use it with confidence that it will never throw a ConcurrentModificationException? My obvious worry is that synchronizing will create a huge bottleneck.
So my question is: do I have to synchronize on the Hashmap or may I just use it with confidence that it will never throw a ConcurrentModificationException?
You should use ConcurrentHashMap for this purpose. The issue is not just about iterating but it is also about memory synchronization.
... each thread will be reading and writing to its own entry in the map ...
This is somewhat ambiguous. If your HashMap is static, in that the threads are only reading from the map and only making changes to the object that is referenced in the map but not changing the value in the map, then you will be fine. You can initialize your map before the threads are started and they can use the map without memory synchronization.
However, if one thread changes the map in any manner to point to a new object, it must publish that change to central memory and other threads must see those updates. That requires memory synchronization on both the reading and writing sides.
My obvious worry is that synchronizing will create a huge bottleneck.
This smacks of premature optimization. Most likely you will be limited by IO long before contention over the map is an issue. In any case, switching to a ConcurrentHashMap will alleviate your worries, and it will be a minimal performance decrease compared to a non-synchronized map.
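A sketch of the design from the question using ConcurrentHashMap, where each thread inserts and updates only its own key. The key naming scheme and the method names are assumptions made for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PerThreadEntrySketch {
    // Starts `workers` threads; each one updates only its own entry.
    static Map<String, Integer> countPerWorker(int workers, int updates) {
        Map<String, Integer> totals = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            String key = "worker-" + w; // hypothetical per-thread key
            threads[w] = new Thread(() -> {
                for (int i = 0; i < updates; i++) {
                    // Inserting a new key changes the map structurally, which
                    // is why a plain HashMap would not be safe here.
                    totals.merge(key, 1, Integer::sum);
                }
            });
            threads[w].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(countPerWorker(4, 1000));
    }
}
```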
We can synchronize a collection by using 'Collections.synchronizedCollection(Collection c)' or 'Collections.synchronizedMap(Map c)', and we can also use the Java concurrent API, like ConcurrentHashMap or ArrayQueue or BlockingQueue.
Is there any difference in synchronization level between these two ways of getting synchronized collections, or are they almost the same?
Could anyone explain?
Yes: speed during massive parallel processing.
This can be illustrated in a very simple way: Imagine that 100 Threads are waiting to take something out of a collection.
The synchronized way: 99 Threads are put to sleep and 1 Thread gets its value.
The concurrent way: 100 Threads get their value immediately, none is put on hold.
Now the second method takes a little more time than a simple get, but as soon as a minimum like 2 Threads are using it on a constant basis, it is well worth the time you save thanks to concurrent execution.
So, as I understand it, the synchronized way is a wrapper that blocks the whole collection object, while in the concurrent way only objects inside the collection get synchronized, so we can access 2 or more elements of a collection at the same time.
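One observable difference between the two approaches, beyond locking granularity, is how iteration behaves. The sketch below (hypothetical names, single-threaded for determinism) runs the same remove-while-iterating loop on both kinds of map. The synchronized wrapper hands out the fail-fast iterator of the underlying HashMap, whereas ConcurrentHashMap documents its iterators as weakly consistent and never throwing ConcurrentModificationException.

```java
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class IterationSemanticsSketch {
    // Fills the map with keys 0..9, then removes even keys while iterating.
    // Returns true if the loop finished without ConcurrentModificationException.
    static boolean removeEvensWhileIterating(Map<Integer, Integer> map) {
        for (int i = 0; i < 10; i++) {
            map.put(i, i);
        }
        try {
            for (Integer key : map.keySet()) {
                if (key % 2 == 0) {
                    map.remove(key); // modifies the map, not the iterator
                }
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<Integer, Integer> wrapped = Collections.synchronizedMap(new HashMap<>());
        Map<Integer, Integer> concurrent = new ConcurrentHashMap<>();
        System.out.println("synchronizedMap completed: " + removeEvensWhileIterating(wrapped));
        System.out.println("ConcurrentHashMap completed: " + removeEvensWhileIterating(concurrent));
    }
}
```

With the synchronized wrapper, a multi-threaded program would additionally need to hold the wrapper's monitor for the whole traversal, which is one reason it scales worse under contention.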