Accessing HashMap from multiple threads without synchronization - java

I have a HashMap that I need multiple threads to access. My design is such that each thread will be reading and writing to its own entry in the map: the same map, but no two threads work on the same entry. No thread will ever need to iterate over the map or call size(). So my question is: do I have to synchronize on the HashMap, or may I just use it with confidence that it will never throw a ConcurrentModificationException? My obvious worry is that synchronizing will create a huge bottleneck.

So my question is: do I have to synchronize on the HashMap, or may I just use it with confidence that it will never throw a ConcurrentModificationException?
You should use ConcurrentHashMap for this purpose. The issue is not just about iterating but it is also about memory synchronization.
... each thread will be reading and writing to its own entry in the map ...
This is somewhat ambiguous. If your HashMap is static, in the sense that the threads only read from the map and only change the objects the map references, without changing which value a key maps to, then you will be fine. You can initialize your map before the threads are started and they can use the map without memory synchronization.
However, if one thread changes the map in any manner to point to a new object, it must publish that change to central memory and other threads must see those updates. That requires memory synchronization on both the reading and writing sides.
My obvious worry is that synchronizing will create a huge bottleneck.
This smacks of premature optimization. Most likely you will be limited by IO long before contention over the map is an issue. In any case, switching to a ConcurrentHashMap will alleviate your worries, and it costs only a minimal performance decrease compared to a non-synchronized map.
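As a rough illustration of the design described in the question, here is a minimal sketch in which every thread updates only its own key in a shared ConcurrentHashMap. The class name, the `worker-N` key naming, and the thread and update counts are assumptions made for the example, not part of the original question.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PerThreadEntries {
    // Each worker repeatedly updates only its own key in the shared map.
    static Map<String, Integer> run(int threads, int updates) {
        Map<String, Integer> shared = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            String key = "worker-" + i; // hypothetical key naming
            workers[i] = new Thread(() -> {
                for (int n = 0; n < updates; n++) {
                    // merge is atomic per key and safely publishes the new value
                    shared.merge(key, 1, Integer::sum);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        Map<String, Integer> result = run(4, 1000);
        System.out.println(result.size() + " entries, worker-0 = " + result.get("worker-0"));
    }
}
```

Because the entries are inserted while other threads are running, the map's internal table is being modified concurrently even though no two threads share a key, which is the case a plain HashMap does not support.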

Related

Why do we need thread safety with collections?

I just want to understand why do we really need thread safety with collections? I know that if there are two threads and first thread is iterating over the collection and second thread is modifying the collection then first thread will get ConcurrentModificationException.
But what if I know that none of my threads will iterate over the collection using an iterator? Does that mean thread safety is only needed because we want to allow other threads to iterate over the collection using an iterator? Are there any other reasons and use cases?
Any sort of reading and any sort of writing in two different threads can cause issues if you're not using a thread safe collection.
Iterating is a kind of reading, but so are List.get, Set.contains, Map.get, and many other operations.
(Iterating is a case in which Java can easily detect thread safety issues -- though multiple threads are rarely the actual reason for ConcurrentModificationException.)
I know that if there are two threads and first thread is iterating over the collection and second thread is modifying the collection then first thread will get ConcurrentModificationException.
That's not true. ConcurrentModificationException is thrown when a collection is structurally modified while it is being iterated, which can happen within a single thread.
Thread safety is a complex concept that includes several parts. It won't be easy to explain why we need it.
The main point is that, for reasons outside the scope of this discussion, changes made in one thread may not be visible in another.
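To make the single-threaded case concrete, here is a small sketch (class and method names are made up for the example) that removes an element through the list itself while a for-each loop is still iterating over it.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

class SingleThreadCme {
    // Returns true if modifying the list during a for-each loop triggers the exception.
    static boolean modifyWhileIterating() {
        List<String> names = new ArrayList<>(List.of("a", "b", "c"));
        try {
            for (String name : names) {
                if (name.equals("a")) {
                    names.remove(name); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("exception thrown: " + modifyWhileIterating());
    }
}
```

No second thread is involved here. The check is best-effort, so code should not rely on the exception being thrown for correctness.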
I just want to understand why do we really need thread safety with collections?
We need thread safety with any object that is being modified. Period. If two threads are sharing the same object and one thread makes a modification, there is no guarantee that the update to the object will be seen by the other thread and there are possibilities that the object may be partially updated causing exceptions, hangs, or other unexpected results.
One of the speedups that is gained with threads is local CPU cached memory. Each thread running in a CPU has local cached memory that is much faster than system memory. The cache is used as much as possible for high speed calculations and then invalidated or flushed to system memory when necessary. Without locking and memory synchronization, each thread could be working with invalid memory and could experience race conditions.
This is why threaded programs need to use concurrent collections (think ConcurrentHashMap) or protect collections (or any mutable object) using synchronized locks or other mechanisms. These ensure that the objects can't be modified at the same time and ensure that the modifications are published between threads appropriately.
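One way to picture the "protect with synchronized locks" option is the sketch below: a plain HashMap hidden behind synchronized methods, so every read and write goes through the same monitor. The class name and the "hits" key are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

class GuardedCounts {
    private final Map<String, Integer> counts = new HashMap<>();

    // Both methods lock on the same monitor (this), so updates are
    // mutually exclusive and visible to threads that lock afterwards.
    synchronized void increment(String key) {
        counts.merge(key, 1, Integer::sum);
    }

    synchronized int get(String key) {
        return counts.getOrDefault(key, 0);
    }

    static int run(int threads, int perThread) {
        GuardedCounts guarded = new GuardedCounts();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    guarded.increment("hits");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return guarded.get("hits");
    }

    public static void main(String[] args) {
        System.out.println("hits = " + run(4, 1000));
    }
}
```

The map never escapes the class, so there is no way for a caller to touch it without holding the lock.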

Can concurrently reading from and writing to a TreeMap cause infinite loops

https://ivoanjo.me/blog/2018/07/21/writing-to-a-java-treemap-concurrently-can-lead-to-an-infinite-loop-during-reads/ demonstrates how multiple concurrent writers may corrupt a TreeMap in such a way that cycles are created, and iterating the structure becomes an infinite loop.
Is it also possible to get in an infinite loop when concurrently iterating and writing with at most one concurrent writer? If not can anything other than skipping elements, processing elements twice, or throwing a ConcurrentModificationException happen?
Is it also possible to get in an infinite loop when concurrently iterating and writing with at most one concurrent writer?
I would say a cautious no: these infinite loops occur because multiple threads are re-wiring the relationships between the nodes, and so may make conflicting updates. A single thread won't conflict with itself, so such a re-wiring mixup would not occur.
However, I am not confident in this - but I don't need to be: such a usage of a TreeMap is in violation of the documentation:
If multiple threads access a map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
If you don't externally synchronize, the behavior is undefined. Implementors of the class are free to implement the class in any way that meets the specification; and it is possible that one way might result in an infinite loop.
If you are encountering an infinite loop in a TreeMap, that's a symptom, not the root cause - namely, unsynchronized access to mutable data. This means that there is no guarantee that the values being read by the only-reading threads are correct either.
If you need to access a map concurrently, you'll have to use Collections.synchronizedMap or a concurrent implementation such as ConcurrentSkipListMap. Otherwise, don't expect it to work correctly.
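A minimal sketch of the externally synchronized approach, assuming one writer thread and one iterating thread (the class name and entry count are made up for the example). It uses Collections.synchronizedSortedMap, the sorted-map counterpart of Collections.synchronizedMap. Individual calls such as put are locked by the wrapper, but iteration has to hold the map's lock explicitly.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

class SynchronizedTreeMapDemo {
    static int run(int entries) {
        SortedMap<Integer, String> map =
                Collections.synchronizedSortedMap(new TreeMap<Integer, String>());

        Thread writer = new Thread(() -> {
            for (int i = 0; i < entries; i++) {
                map.put(i, "v" + i); // each put takes the wrapper's lock
            }
        });
        writer.start();

        int seen = 0;
        // Iteration is a compound operation, so the caller must hold the
        // wrapper's lock for the whole traversal.
        synchronized (map) {
            for (Integer key : map.keySet()) {
                seen++;
            }
        }
        System.out.println("entries visible during iteration: " + seen);

        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("final size: " + run(1000));
    }
}
```

The number of entries seen during the traversal depends on timing, but the traversal itself always observes a consistent tree.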

HashMap stuck on put

I am coding in a multithreaded environment and I see threads are stuck on HashMap.put:
34 Threads
java.util.HashMap.put(HashMap.java:374)
com.aaa.bbb.MyClass.getDefinitionMap().
Investigating the method that uses the HashMap, I see that the method is synchronized:
@Override
public synchronized Map<String,String> getDefinitionMap() {
    //truncated some code here...
    colDefMap = new HashMap<String,String>();
    for (CD cd : (List<CD>)cm.getDef()) {
        colDefMap.put(cd.getIdentifier(),cd);
    }
    return colDefMap;
}
So after switching to ConcurrentHashMap, removing the synchronized keyword from the method signature and restarting the application server - problem is resolved.
My question is: why is a synchronized method not sufficient in this scenario to protect the map from concurrent access?
You don't say how "stuck" this is, whether you actually have a deadlock or a bottleneck.
I would expect the posted code to be a bottleneck, where almost all your threads are trying to access the same object, waiting on acquiring the lock used by the synchronized method. It's likely that whatever cm.getDef does takes a while and only one thread at a time can make progress. So synchronizing does protect the data from concurrent access, just at the expense of throughput.
This fits the definition of "starvation" given in the Java concurrency tutorial:
Starvation describes a situation where a thread is unable to gain regular access to shared resources and is unable to make progress. This happens when shared resources are made unavailable for long periods by "greedy" threads. For example, suppose an object provides a synchronized method that often takes a long time to return. If one thread invokes this method frequently, other threads that also need frequent synchronized access to the same object will often be blocked.
Switching to ConcurrentHashMap is a good improvement, as you observed. ConcurrentHashMap avoids locking threads out of the entire map, and supports concurrent updates, see the API doc (my emphasis):
A hash table supporting full concurrency of retrievals and high expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
You might consider caching whatever cm.getDef does so you don't have to call it every time, but the practicality of that will depend on your requirements, of course.
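If caching is acceptable, one possible shape is a lazily built, read-only map. This is only a sketch under assumptions: `loadDefinitions()` is a hypothetical stand-in for the slow work done through cm.getDef(), the `loads` counter exists purely to show how often that work runs, and the sketch assumes the definitions never change once loaded.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class DefinitionCache {
    // Counts how often the slow load runs; present only for illustration.
    final AtomicInteger loads = new AtomicInteger();

    // volatile so that a fully built map is safely published to other threads.
    private volatile Map<String, String> cached;

    // Hypothetical stand-in for the expensive call in the question.
    private Map<String, String> loadDefinitions() {
        loads.incrementAndGet();
        Map<String, String> definitions = new HashMap<>();
        definitions.put("id", "definition");
        return Collections.unmodifiableMap(definitions);
    }

    Map<String, String> getDefinitionMap() {
        Map<String, String> result = cached;
        if (result == null) {
            synchronized (this) {
                result = cached; // re-check under the lock
                if (result == null) {
                    result = loadDefinitions();
                    cached = result;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DefinitionCache cache = new DefinitionCache();
        cache.getDefinitionMap();
        cache.getDefinitionMap();
        System.out.println("loads: " + cache.loads.get());
    }
}
```

After the first call, readers take no lock at all, so the method stops being a point of contention.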
You were synchronizing on the getDefinitionMap method in the subclass, which is apparently not the only method (or class) that has access to cm.
The iterator on the class variable cm is the likely culprit:
for (CD cd : (List<CD>) cm.getDef())
{
    colDefMap.put(cd.getIdentifier(), cd);
}
In the above code, the cm variable is likely being modified while you are iterating over it.
You could have used the following:
synchronized (cm)
{
    for (CD cd : (List<CD>) cm.getDef())
    {
        colDefMap.put(cd.getIdentifier(), cd);
    }
}
However, this would have still left modification of cm open to other threads, if modifications to cm were performed without similar synchronization.
As you discovered, it is much easier to use the thread-safe versions of the collections classes than to implement workarounds for non-thread-safe collections in a multi-threaded environment.
I think you may be wrong in thinking that you solved your problem. Removing synchronized unlocks access to this method, which may resolve your problem but bring others. Your HashMap is created within the scope of your function, so it is obviously not where your concurrency problem lies (provided that what is put inside is not static, or is thread-safe). Nevertheless, using ConcurrentHashMap here has no effect.
I suggest you run a multi-threaded test to see whether your function does its job properly without the synchronized keyword (and without the ConcurrentHashMap).
In my opinion, without knowing the rest of your code, this function may be accessing static or shared data that is locked by another thread, so the problem comes not from the function itself but from another object interacting with it at some point.
Are you modifying it anywhere else? Are you 100% sure it's not being put somewhere else? I suspect you are and what is likely is that the second put is causing an infinite loop. http://mailinator.blogspot.com/2009/06/beautiful-race-condition.html
Otherwise, if this is the only place you are modifying the HashMap, it should be fine.

Can I get iterators from an ArrayList in multiple threads and use all of them safely?

I have an ArrayList instance that is shared by multiple threads. It gets initialized in a synchronized block (so there is a memory barrier to make it visible to all threads) and all threads only read from it. The ArrayList never changes.
I've read lots of posts online but it's still not clear to me if it's safe to read no matter how I do the read. If I get an iterator from it in each thread, do the iterators share some state that gets altered while iterating? I'm not sharing the iterators; each thread gets its own.
Is it thread safe for reads, no matter how I do the read?
As long as each thread has its own iterator, then you are OK.
The only time you need to worry about synchronization is when one thread is modifying (writing) a shared data structure while others are reading from it. This can lead to the data structure being in an inconsistent state (imagine the thread hadn't finished its modifications when all of a sudden the scheduler pre-empts it and switches to another thread).
When all threads are only reading, the data will never be in an inconsistent state, and you don't need to worry about thread synchronization.
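A small sketch of the read-only case, assuming the list is fully built before any reader thread starts (the class name and the list contents are made up). Each thread gets its own iterator from the for-each loop, and the only shared mutable state is the AtomicLong used to collect the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class ReadOnlyIteration {
    static long run(int threads) {
        // Fully populated before any reader starts; never modified afterwards.
        List<Integer> shared = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            shared.add(i);
        }

        AtomicLong total = new AtomicLong();
        Thread[] readers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            readers[t] = new Thread(() -> {
                long sum = 0;
                for (int value : shared) { // each for-each loop creates its own iterator
                    sum += value;
                }
                total.addAndGet(sum);
            });
            // Thread.start() makes everything written before it visible to the new thread.
            readers[t].start();
        }
        for (Thread reader : readers) {
            try {
                reader.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println("total = " + run(4));
    }
}
```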
As long as the iterators are used only for reading, this will work as expected. Also, an iterator is fail-fast: it may throw a ConcurrentModificationException for the following reasons:
In multithreaded processing, if one thread modifies a Collection while another thread is iterating over it.
In single-threaded or multithreaded processing, if the collection is modified after the Iterator is created, using the collection's own methods rather than the Iterator's own methods.
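For contrast with the failure cases listed above, here is a short sketch of removing elements through the Iterator's own remove method, which keeps the iterator and the list in step. The class and method names are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class IteratorRemove {
    // Returns a copy of the input with all even numbers removed.
    static List<Integer> dropEvens(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input);
        Iterator<Integer> it = copy.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove(); // removal goes through the iterator, so no exception is thrown
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(dropEvens(List.of(1, 2, 3, 4)));
    }
}
```

This only addresses the single-threaded cause. It does not make an ArrayList safe to modify while another thread iterates over it.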

Why do unsynchronized objects perform better than synchronized ones?

This question arose after reading this one. What is the difference between synchronized and unsynchronized objects? Why do unsynchronized objects perform better than synchronized ones?
What is the difference between synchronized and unsynchronized objects? Why do unsynchronized objects perform better than synchronized ones?
Hashtable is considered synchronized because its methods are marked as synchronized. Whenever a thread enters a synchronized method or a synchronized block, it first has to get exclusive control over the monitor associated with the object instance being synchronized on. If another thread is already in a synchronized block on the same object, the thread will block, which is a performance penalty, as others have mentioned.
However, the synchronized block also does memory synchronization before and after, which has memory cache implications and also restricts code reordering and optimization, both of which have significant performance implications. So even if you have a single thread entering the synchronized block (i.e. no blocking), it will run slower than unsynchronized code.
One of the real performance improvements with threaded programs is realized because of separate CPU high-speed memory caches. When a threaded program does memory synchronization, the blocks of cached memory that have been updated need to be written to main memory and any updates made to main memory will invalidate local cached memory. By synchronizing more, again even in a single threaded program, you will see a performance hit.
As an aside, Hashtable is an older class. If you want a thread-safe Map, then ConcurrentHashMap should be used.
Put simply, a synchronized object follows a single-thread model: if two threads want to modify the synchronized object and the first one acquires the object's lock, the second one has to wait. If the object is unsynchronized, both threads can operate on it at the same time, which is why unsynchronized objects are unsafe.
For synchronization to work, the JVM has to prevent more than one thread from entering a synchronized block at a time. This requires more processing than if the synchronized block did not exist, placing additional load on the JVM and therefore reducing performance.
The exact locking mechanisms in play when synchronization occurs are explained in How the Java virtual machine performs thread synchronization
Synchronization:
ArrayList is non-synchronized, which means multiple threads can work on an ArrayList at the same time. For example, in a multithreaded environment one thread can be performing an add operation on an ArrayList while another thread performs a remove operation on it at the same time.
Vector, by contrast, is synchronized. This means that if one thread is working on a Vector, no other thread can get hold of it. Unlike ArrayList, only one thread can perform an operation on a Vector at a time.
Performance:
Synchronized operations consume more time than non-synchronized ones, so if there is no need for thread-safe operations, ArrayList is a better choice: performance improves because threads never wait on a lock.
Synchronization is useful because it lets you prevent code from being run concurrently, that is, by more than one thread at the same time. This is important in a threaded environment for a multitude of reasons. In order to provide this guarantee the JVM has to do extra work, which means that performance decreases. Because synchronization requires that only one thread be allowed to execute the protected code at a time, it can cause multi-threaded programs to run as slowly as (or slower than!) single-threaded programs.
It is important to note that the amount of performance decrease is not always obvious. Depending on the circumstances, the decrease may be tiny or huge. This depends on all sorts of things.
Finally, I'd like to add a short warning: concurrent programming using synchronization is hard. I've found that other concurrency controls usually suit my needs better. One of my favorites is AtomicReference. This utility is great because it very narrowly limits the amount of synchronized code, which makes the code easier to read, maintain and write.
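As a rough idea of what using AtomicReference looks like, the sketch below has several threads append to a shared immutable String without any synchronized block. The class name and the appended "x" are assumptions for the example.

```java
import java.util.concurrent.atomic.AtomicReference;

class AtomicReferenceDemo {
    static String run(int threads) {
        AtomicReference<String> latest = new AtomicReference<>("");
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                // updateAndGet retries the function until its compare-and-set
                // succeeds, so no concurrent append is lost.
                latest.updateAndGet(current -> current + "x");
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return latest.get();
    }

    public static void main(String[] args) {
        System.out.println("length = " + run(8).length());
    }
}
```

The update function may run more than once under contention, so it should be free of side effects and the referenced value should be immutable.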
