HashMap stuck on put - java

I am coding on a multithreaded environment and I see threads are stuck on HashMap.put:
34 Threads
java.util.HashMap.put(HashMap.java:374)
com.aaa.bbb.MyClass.getDefinitionMap().
Looking at the method that calls HashMap.put, I see that it is synchronized:
@Override
public synchronized Map<String,String> getDefinitionMap() {
    //truncated some code here...
    colDefMap = new HashMap<String,String>();
    for (CD cd : (List<CD>)cm.getDef()) {
        colDefMap.put(cd.getIdentifier(),cd);
    }
    return colDefMap;
}
After switching to ConcurrentHashMap, removing the synchronized keyword from the method signature and restarting the application server, the problem was resolved.
My question is: why is a synchronized method not sufficient in this scenario to protect the map from concurrent access?

You don't say how "stuck" this is, whether you actually have a deadlock or a bottleneck.
I would expect the posted code to be a bottleneck, where almost all your threads are trying to access the same object, waiting on acquiring the lock used by the synchronized method. It's likely that whatever cm.getDef does takes a while and only one thread at a time can make progress. So synchronizing does protect the data from concurrent access, just at the expense of throughput.
This fits the definition of "starvation" given in the Java concurrency tutorial:
Starvation describes a situation where a thread is unable to gain regular access to shared resources and is unable to make progress. This happens when shared resources are made unavailable for long periods by "greedy" threads. For example, suppose an object provides a synchronized method that often takes a long time to return. If one thread invokes this method frequently, other threads that also need frequent synchronized access to the same object will often be blocked.
Switching to ConcurrentHashMap is a good improvement, as you observed. ConcurrentHashMap avoids locking threads out of the entire map, and supports concurrent updates, see the API doc (my emphasis):
A hash table supporting full concurrency of retrievals and high expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
You might consider caching whatever cm.getDef does so you don't have to call it every time, but the practicality of that will depend on your requirements, of course.
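A minimal sketch of that caching idea, assuming the definitions do not change after the first load. DefinitionCache and its Supplier argument are hypothetical stand-ins for the asker's class and for cm.getDef(); the map is built once under a lock and then published through a volatile field, so later callers never block:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical holder: caches a map built from a slow definition source.
class DefinitionCache {
    private final Supplier<List<String>> slowSource; // stands in for cm.getDef()
    private volatile Map<String, String> cached;     // published via a volatile write

    DefinitionCache(Supplier<List<String>> slowSource) {
        this.slowSource = slowSource;
    }

    Map<String, String> getDefinitionMap() {
        Map<String, String> result = cached;
        if (result == null) {
            synchronized (this) {
                result = cached; // re-check: another thread may have built it
                if (result == null) {
                    Map<String, String> built = new LinkedHashMap<>();
                    for (String id : slowSource.get()) {
                        built.put(id, "definition of " + id);
                    }
                    // Read-only view, so callers cannot mutate the shared map.
                    result = Collections.unmodifiableMap(built);
                    cached = result;
                }
            }
        }
        return result;
    }
}
```

Only the first call pays for the slow source. If the definitions can change, this sketch would also need an invalidation step, which is left out here.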

You were synchronizing on the getDefinitionMap method in the subclass, which is apparently not the only method (or class) that has access to cm.
The iterator on the class variable cm is the likely culprit:
for (CD cd : (List<CD>) cm.getDef())
{
colDefMap.put(cd.getIdentifier(), cd);
}
In the above code, the cm variable is likely being modified while you are iterating over it.
You could have used the following:
synchronized (cm)
{
for (CD cd : (List<CD>) cm.getDef())
{
colDefMap.put(cd.getIdentifier(), cd);
}
}
However, this would have still left modification of cm open to other threads, if modifications to cm were performed without similar synchronization.
As you discovered, it is much easier to use the thread-safe versions of the collections classes than to implement workarounds for non-thread-safe collections in a multi-threaded environment.

I think you may be wrong in thinking that you have solved your problem. Removing synchronized means that you unlock access to this method, which can resolve your problem but introduce others. Your HashMap is created within the scope of your function, so it is clearly not here that you should have a concurrency problem (as long as what is put inside is not static, or is thread-safe). Nevertheless, using ConcurrentHashMap here has no effect.
I suggest you check in a multithreaded test whether your function does its job properly without the synchronized keyword (and without the ConcurrentHashMap).
In my opinion, without knowing the rest of your code, this function may be accessing static or shared data that is locked by another thread, so the problem does not come from this function but from another object interacting with it at some point.

Are you modifying it anywhere else? Are you 100% sure it's not being put somewhere else? I suspect you are and what is likely is that the second put is causing an infinite loop. http://mailinator.blogspot.com/2009/06/beautiful-race-condition.html
Otherwise, if this is the only place you are modifying the HashMap, it should be fine.

Related

Threadsafe vs Synchronized

I'm new to java.
I'm a little bit confused about the difference between thread-safe and synchronized.
Thread-safe means that a method or class instance can be used by multiple threads at the same time without any problems occurring.
Whereas synchronized means only one thread can operate at a time.
So how are they related to each other?
The definition of thread safety given in Java Concurrency in Practice is:
A class is thread-safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or interleaving of the execution of those threads by the runtime environment, and with no additional synchronization or other coordination on the part of the calling code.
For example, a java.text.SimpleDateFormat object has internal mutable state that is modified when a method that parses or formats is called. If multiple threads call the methods of the same dateformat object, there is a chance a thread can modify the state needed by the other threads, with the result that the results obtained by some of the threads may be in error. The possibility of having internal state get corrupted causing bad output makes this class not threadsafe.
There are multiple ways of handling this problem. You can have every place in your application that needs a SimpleDateFormat object instantiate a new one every time it needs one, you can make a ThreadLocal holding a SimpleDateFormat object so that each thread of your program can access its own copy (so each thread only has to create one), you can use an alternative to SimpleDateFormat that doesn't keep state, or you can do locking using synchronized so that only one thread at a time can access the dateFormat object.
Locking is not necessarily the best approach, avoiding shared mutable state is best whenever possible. That's why in Java 8 they introduced a date formatter that doesn't keep mutable state.
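Two of the options above can be sketched as follows. The class name DateFormatting and the "yyyy-MM-dd" pattern are only illustrative; ThreadLocal.withInitial and java.time.format.DateTimeFormatter are the standard Java 8 APIs:

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Date;

class DateFormatting {
    // One SimpleDateFormat per thread: no instance is ever shared between threads.
    private static final ThreadLocal<SimpleDateFormat> PER_THREAD =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    // DateTimeFormatter is immutable, so a single shared instance is safe.
    private static final DateTimeFormatter SHARED =
            DateTimeFormatter.ofPattern("yyyy-MM-dd");

    static String formatLegacy(Date date) {
        return PER_THREAD.get().format(date);
    }

    static String formatModern(LocalDate date) {
        return SHARED.format(date);
    }
}
```

Neither method needs a lock: the first avoids sharing, the second avoids mutable state.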
The synchronized keyword is one way of restricting access to a method or block of code so that otherwise thread-unsafe data doesn't get corrupted. This keyword protects the method or block by requiring that a thread has to acquire exclusive access to a certain lock (the object instance, if synchronized is on an instance method, or the class instance, if synchronized is on a static method, or the specified lock if using a synchronized block) before it can enter the method or block, while providing memory visibility so that threads don't see stale data.
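As a small illustration of the instance-method case, here is a sketch with a hypothetical SynchronizedCounter class. Both methods lock on the counter object itself, so increments from different threads cannot interleave and the final read sees every update:

```java
class SynchronizedCounter {
    private int count;

    // Instance method: the lock is this SynchronizedCounter object.
    synchronized void increment() {
        count++;
    }

    // Reads take the same lock, so they see the latest written value.
    synchronized int get() {
        return count;
    }

    // Starts the given number of threads and returns the final count.
    static int countWithThreads(int threads, int incrementsPerThread) {
        SynchronizedCounter counter = new SynchronizedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

Without synchronized on increment(), count++ is a read-modify-write that can lose updates, so the total could come out lower than threads * incrementsPerThread.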
Thread safety is a desired behavior of the program, and the synchronized block helps you achieve that behavior. There are other ways of obtaining thread safety, e.g. immutable classes/objects. Hope this helps.
Thread safety: A thread-safe program protects its data from memory consistency errors. In a highly multi-threaded program, a thread-safe program does not cause any side effects with multiple read/write operations from multiple threads on shared data (objects). Different threads can share and modify object data without consistency errors.
synchronized is one basic method of achieving thread-safe code.
Refer to below SE questions for more details:
What does 'synchronized' mean?
You can achieve thread safety by using advanced concurrency API. This documentation page provides good programming constructs to achieve thread safety.
Lock Objects support locking idioms that simplify many concurrent applications.
Concurrent Collections make it easier to manage large collections of data, and can greatly reduce the need for synchronization.
Atomic Variables have features that minimize synchronization and help avoid memory consistency errors.
ThreadLocalRandom (in JDK 7) provides efficient generation of pseudorandom numbers from multiple threads.
Refer to java.util.concurrent and java.util.concurrent.atomic packages too for other programming constructs.
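A brief sketch of two of these constructs working together, under the assumption of a simple counting task (the class name and the "hit" key are made up for illustration). ConcurrentHashMap.merge and AtomicInteger.incrementAndGet are both atomic, so no synchronized keyword is needed:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ConcurrentUtilitiesDemo {
    // Runs the given number of tasks on a small pool and returns
    // { count stored in the map, count stored in the atomic }.
    static int[] run(int tasks) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                counts.merge("hit", 1, Integer::sum); // atomic per-key update
                processed.incrementAndGet();          // atomic increment
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { counts.getOrDefault("hit", 0), processed.get() };
    }
}
```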
Related SE question:
Synchronization vs Lock
Synchronized: only one thread can operate at same time.
Threadsafe: a method or class instance can be used by multiple threads at the same time without any problems occurring.
It helps to rephrase the question as: why are synchronized methods thread-safe?
Going by the definitions alone this looks contradictory, but it is not once you work through it.
Synchronized means: sequentially, one at a time, not concurrently (not at the same time).
A synchronized method does not allow another thread to execute it while one thread is already executing it. This avoids concurrent access.
An example of synchronization: if you want to buy a movie ticket and stand in a queue, you get your ticket only after the person in front of you gets theirs.
Thread-safe means: a method can be safely called by multiple threads at the same time without any problem. The synchronized keyword is one way to achieve thread safety. But remember: when multiple threads try to call a synchronized method, they take turns, and that is what makes it safe. The threads may run at the same time, but they cannot access the same resource (method or block) at the same time, because the resource is synchronized.
So once a method is synchronized, it is safe to let multiple threads call it. Remember: the threads do not act on it at the same time, and that is why we call synchronized methods thread-safe.
Hope this helps to understand.
After patiently reading through a lot of answers, and without getting too technical, here is my take, which is close to what Nayak has already replied:
Synchronization is not even close to brute-forcing thread safety; it just makes a piece of code (or a method) safe and incorruptible for a single authorized thread by preventing any other thread from using it at the same time.
Thread safety is about how all threads accessing a certain element behave and get their desired results in the same way if they would have been sequential (or even not so), without any form of undesired corruption (sorry for the pleonasm) as in an ideal world.
One of the ways of achieving proximity to thread-safety would be using classes in java.util.concurrent.atomic.
Sad, that they don't have final methods though!
Nayak, when we declare a method as synchronized, calls to it from other threads block until the lock is released, and they can wait indefinitely. Java now also provides other means of locking with Lock objects.
You can also declare a field final or volatile to guarantee that its value is visible to other concurrent threads.
ref: http://www.javamex.com/tutorials/threads/thread_safety.shtml
In practice, performance-wise, the thread-safe (synchronized) and non-thread-safe (unsynchronized) map classes are ordered as:
Hashtable (slowest) < Collections.synchronizedMap < HashMap (fastest)

Why do unsynchronized objects perform better than synchronized ones?

This question arises after reading this one. What is the difference between synchronized and unsynchronized objects? Why do unsynchronized objects perform better than synchronized ones?
Hashtable is considered synchronized because its methods are marked as synchronized. Whenever a thread enters a synchronized method or a synchronized block, it first has to get exclusive control over the monitor associated with the object instance being synchronized on. If another thread is already in a synchronized block on the same object, the thread will block, which is a performance penalty, as others have mentioned.
However, the synchronized block also does memory synchronization before and after, which has memory cache implications and also restricts code reordering/optimization, both of which have significant performance implications. So even if you have a single thread entering the synchronized block (i.e. no blocking), it will run slower than the same code without synchronization.
One of the real performance improvements with threaded programs is realized because of separate CPU high-speed memory caches. When a threaded program does memory synchronization, the blocks of cached memory that have been updated need to be written to main memory and any updates made to main memory will invalidate local cached memory. By synchronizing more, again even in a single threaded program, you will see a performance hit.
As an aside, Hashtable is an older class. If you want a thread-safe Map, then ConcurrentHashMap should be used.
Put simply, a synchronized object follows a single-thread model: if two threads want to modify a synchronized object and the first one gets the object's lock, the second one has to wait. If the object is unsynchronized, both threads can operate on it at the same time, which is why unsynchronized objects are unsafe.
For synchronization to work, the JVM has to prevent more than one thread entering a synchronized block at a time. This requires extra processing than if the synchronized block did not exist placing additional load on the JVM and therefore reducing performance.
The exact locking mechanisms in play when synchronization occurs are explained in How the Java virtual machine performs thread synchronization
Synchronization:
ArrayList is not synchronized, which means multiple threads can work on an ArrayList at the same time. For example, in a multithreaded environment one thread can be performing an add operation on an ArrayList while another thread performs a remove operation on it at the same time. Vector, by contrast, is synchronized. This means that if one thread is working on a Vector, no other thread can get hold of it. Unlike ArrayList, only one thread can perform an operation on a Vector at a time.
Performance:
Synchronized operations take more time than non-synchronized ones, so if there is no need for thread-safe operation, ArrayList is the better choice: performance improves because threads do not have to acquire a lock for every operation.
Synchronization is useful because it allows you to prevent code from being run by more than one thread at the same time (that is, concurrently). This is important in a threaded environment for a multitude of reasons. In order to provide this guarantee the JVM has to do extra work, which means that performance decreases. Because synchronization requires that only one thread be allowed to execute at a time, it can cause multi-threaded programs to run as slowly as (or slower than!) single-threaded programs.
It is important to note that the amount of performance decrease is not always obvious. Depending on the circumstances, the decrease may be tiny or huge. This depends on all sorts of things.
Finally, I'd like to add a short warning: concurrent programming using synchronization is hard. I've found that other concurrency controls usually suit my needs better. One of my favorites is AtomicReference. This utility is great because it very narrowly limits the amount of synchronized code. This makes it easier to read, maintain and write.
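A minimal sketch of the AtomicReference style, using a made-up example that tracks the longest string seen so far. updateAndGet retries the function until it is applied to an unchanged value, so the function should be free of side effects:

```java
import java.util.concurrent.atomic.AtomicReference;

class LongestSeen {
    private final AtomicReference<String> longest = new AtomicReference<>("");

    // Atomically replaces the stored string if the candidate is longer.
    void offer(String candidate) {
        longest.updateAndGet(current ->
                candidate.length() > current.length() ? candidate : current);
    }

    String get() {
        return longest.get();
    }
}
```

The only shared state is the single reference, and the only "critical section" is the one-line update function.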

Concurrency design principles in practice

I have a Results object which is written to by several threads concurrently. However, each thread has a specific purpose and owns certain fields, so that no data is actually modified by more than one thread. The consumer of this data will not try to read it until all of the writer threads are done writing it. Because I know this to be true, there is no synchronization on the data writes and reads.
There is a RunningState object associated with this Results object which serves to coordinate this work. All of its methods are synchronized. When a thread is done with its work on this Results object, it calls done() on the RunningState object, which does the following: decrements a counter, checks if the counter has gone to 0 (indicating that all writers are done), and if so, puts this object on a concurrent queue. That queue is consumed by a ResultsStore which reads all of the fields and stores data in the database. Before reading any data, the ResultsStore calls RunningState.finalizeResult(), which is an empty method whose sole purpose is to synchronize on the RunningState object, to ensure that writes from all of the threads are visible to the reader.
Here are my concerns:
1) I believe that this will work correctly, but I feel like I'm violating good design principles to not synchronize on the data modifications to an object that is shared by multiple threads. However, if I were to add synchronization and/or split things up so each thread only saw the data it was responsible for, it would complicate the code. Anyone who modifies this area had better understand what's going on in any case or they're likely to break something, so from a maintenance standpoint I think the simpler code with good comments explaining how it works is a better way to go.
2) The fact that I need to call this do-nothing method seems like an indication of wrong design. Is it?
Opinions appreciated.
This seems mostly right, if a bit fragile (if you change the thread-local nature of one field, for instance, you may forget to synchronize it and end up with hard-to-trace data races).
The big area of concern is in memory visibility; I don't think you've established it. The empty finalizeResult() method may be synchronized, but if the writer threads didn't also synchronize on whatever it synchronizes on (presumably this?), there's no happens-before relationship. Remember, synchronization isn't absolute -- you synchronize relative to other threads that are also synchronized on the same object. Your do-nothing method will indeed do nothing, not even ensure any memory barrier.
You somehow need to establish a happens-before relationship between each thread doing its writes, and the thread that eventually reads. One way to do this without synchronization is via a volatile variable, or an AtomicInteger (or other atomic classes).
For instance, each writer thread can invoke counter.incrementAndGet() on the object, and the reading thread can then check that counter.get() == THE_CORRECT_VALUE. There's a happens-before relationship between a volatile/atomic field being written and it being read, which gives you the needed visibility.
Your design is sound, but it can be improved if you are using a true concurrent queue since a concurrent queue from the java.util.concurrent package already guarantees a happens before relationship between the thread putting an item into the queue, and the thread taking an item out, so this precludes needing to call finalizeResult() in the taking thread (so no need for that "do nothing" method call).
From java.util.concurrent package description:
The methods of all classes in java.util.concurrent and its subpackages
extend these guarantees to higher-level synchronization. In
particular:
Actions in a thread prior to placing an object into any
concurrent collection happen-before actions subsequent to the access
or removal of that element from the collection in another thread.
The comments in another answer concerning using an AtomicInteger instead of synchronization are also wise (as using an AtomicInteger to do your thread counting will likely perform better than synchronization), just make sure to get the value of the count after the atomic decrement (e.g. decrementAndGet()) when comparing to 0 in order to avoid adding to the queue twice.
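The combination described in these answers can be sketched roughly as follows. The class names Results and RunningState follow the question, but the fields, constructor and HandoffDemo driver are assumptions made up for illustration. The last writer to finish sees decrementAndGet() return 0 and hands the object to a LinkedBlockingQueue, whose put/take pairing provides the visibility that the empty finalizeResult() call was meant to provide:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class Results {
    int fieldA; // written only by writer A
    int fieldB; // written only by writer B
}

class RunningState {
    private final AtomicInteger remaining;
    private final Results results;
    private final BlockingQueue<Results> queue;

    RunningState(int writers, Results results, BlockingQueue<Results> queue) {
        this.remaining = new AtomicInteger(writers);
        this.results = results;
        this.queue = queue;
    }

    void done() {
        // decrementAndGet returns the new value, so exactly one caller sees 0.
        if (remaining.decrementAndGet() == 0) {
            queue.add(results);
        }
    }
}

class HandoffDemo {
    static int run() {
        Results results = new Results();
        BlockingQueue<Results> queue = new LinkedBlockingQueue<>();
        RunningState state = new RunningState(2, results, queue);
        new Thread(() -> { results.fieldA = 1; state.done(); }).start();
        new Thread(() -> { results.fieldB = 2; state.done(); }).start();
        try {
            // take() blocks until the last writer has enqueued the object.
            Results finished = queue.take();
            return finished.fieldA + finished.fieldB;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Each writer's field write comes before its own decrementAndGet(), the atomic operations order the writers relative to each other, and the queue orders the final add before the take, so the consumer sees both fields without any synchronized method.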
What you've described is indeed safe, but it also sounds, frankly, brittle and (as you note) maintenance could become an issue. Without sample code, it's really hard to tell what's really easiest to understand, so an already subjective question becomes frankly unanswerable. Could you ask a coworker for a code review? (Particularly one that's likely to have to deal with this pattern.) I'm going to trust you that this is indeed the simplest approach, but doing something like wrapping synchronized blocks around writes would increase safety now and in the future. That said, you obviously know your code better than I do.

When do we make a call to use between Synchronised method and Synchronised Block

Can any one please share their experience on
"When do we make a call to use between Synchronised method and Synchronised Block"
Any Performance Issues?
When do we make a call to use between Synchronised method and Synchronised Block.
If you want to lock for the duration of a method call AND you want to lock on this (or the current class, for a static method), then synchronized methods are the right solution.
If you are locking on something else (e.g. a private lock object or some internal data structure), then the synchronized block approach is better.
Similarly, if only some of the code in a procedure call needs to be done holding a lock, it is better to use a synchronized block and put just that code in the block.
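To make the difference concrete, here is a sketch with two hypothetical account classes. The first locks on this for the whole method; the second locks on a private object and keeps only the balance update inside the block, so the string formatting runs without holding the lock:

```java
class MethodLockedAccount {
    private long balance;

    // The lock on this is held for the entire method, including formatting.
    synchronized String deposit(long amount) {
        balance += amount;
        return "balance=" + balance;
    }
}

class BlockLockedAccount {
    private final Object lock = new Object(); // private lock, invisible to callers
    private long balance;

    String deposit(long amount) {
        long newBalance;
        synchronized (lock) {
            // Only the shared-state update is protected.
            balance += amount;
            newBalance = balance;
        }
        // Formatting uses a local copy, so no lock is needed here.
        return "balance=" + newBalance;
    }
}
```

A private lock also means outside code cannot accidentally contend by synchronizing on the account object itself.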
Any Performance Issues?
None, apart from the general principle that it is a bad idea to hold a lock longer than you need to. (The longer a lock is held, the more likely it is that other threads will need to wait.)
I'm not sure what you mean by "synchronized statement". You use the synchronized keyword to either denote that a method is synchronized or (as you mention) a block of code within it.
I typically favour keeping methods small and manageable and therefore labelling the entire method as synchronized (when required). This has the advantage that it is immediately evident to a user of the class as to which methods represent critical sections. It also allows you as a programmer to more easily determine whether a class is thread-safe, namely: Are all public methods that access mutable data labelled as synchronized?
There is no performance difference between the approaches as both require obtaining a lock.
Always try to use a synchronized block if possible; only where that is not possible, go for a synchronized method. How much performance improves depends on the number of lines in the synchronized method: as the number of lines increases, performance degrades.
I tend to use synchronized methods when it is the public interface that requires synchronization (c.f. synchronized collections) and synchronized blocks for class internal synchronization, such as access to a shared resource which needs to be thread safe.
There is also a readability issue. I find method level synchronization to be neater and more obvious as the code is not cluttered with lock management.
As for performance, I'm not aware of any particular difference in the behaviour of either approach. I think it is more important to avoid excessive synchronization, so a method which only needs access to the shared resource 1 in 10 calls should use block level rather than method level synchronization.
My approach to any given scenario is usually based on a mix of these factors, modified by previous experience.
In terms of overall performance, there is no difference between having a synchronized block or method. The issue is really in terms of coding practices. Synchronizing a method seems like an easy thing to do however, when working with multiple people on a project, it becomes possible for someone to alter a simple light method that someone else synchronized into a heavy operation one. In fact, one really good example (from personal experience) of where you can get into trouble is when you are using a dependency injection framework and you have methods in a service object that interact with data access objects (daos) that are synchronized. The expectation is that the daos perform quickly so the locks are only held briefly. Someone else comes along and either alters the daos or creates and injects new ones that are much slower and suddenly things start to really slow down because the service object has synchronized interaction with it.
I don't think synchronized blocks can get around that issue that I described above however, at least with synchronized blocks, they are harder to miss than a declaration in the method.

Disadvantage of synchronized methods in Java

What are the disadvantages of making a large Java non-static method synchronized? Large method in the sense it will take 1 to 2 mins to complete the execution.
If you synchronize the method and try to call it twice at the same time, one thread will have to wait two minutes.
This is not really a question of "disadvantages". Synchronization is either necessary or not, depending on what the method does.
If it is critical that the code runs only once at the same time, then you need synchronization.
If you want to run the code only once at the same time to preserve system resources, you may want to consider a counting Semaphore, which gives more flexibility (such as being able to configure the number of concurrent executions).
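A counting Semaphore used this way might look like the sketch below. LimitedRunner, its demo method and the 10 ms pause are made up for illustration; java.util.concurrent.Semaphore is the standard class. With one permit it behaves much like a lock, and with more permits it allows that many concurrent executions:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class LimitedRunner {
    private final Semaphore permits;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger maxObserved = new AtomicInteger();

    LimitedRunner(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    void run(Runnable task) {
        permits.acquireUninterruptibly(); // blocks while all permits are taken
        try {
            // Record the highest number of tasks seen running at once.
            maxObserved.accumulateAndGet(running.incrementAndGet(), Math::max);
            task.run();
        } finally {
            running.decrementAndGet();
            permits.release();
        }
    }

    // Runs a short task on several threads and returns the peak concurrency.
    static int demo(int threads, int maxConcurrent) {
        LimitedRunner runner = new LimitedRunner(maxConcurrent);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> runner.run(LimitedRunner::pause));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return runner.maxObserved.get();
    }

    private static void pause() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```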
Another interesting aspect is that synchronization can only really be used to control access to resources within the same JVM. If you have more than one JVM and need to synchronize access to a shared file system or database, the synchronized keyword is not at all sufficient. You will need to get an external (global) lock for that.
If the method takes on the order of minutes to execute, then it may not need to be synchronized at such a coarse level, and it may be possible to use a more fine-grained system, perhaps by locking only the portion of a data structure that the method is operating on at the moment. Certainly, you should try to make sure that your critical section isn't really 2 minutes long - any method that takes that long to execute (regardless of the presence of other threads or locks) should be carefully studied as a candidate for parallelization. For a computation this time-consuming, you could be acquiring and releasing hundreds of locks and still have it be negligible. (Or, to put it another way, even if you need to introduce a lot of locks to parallelize this code, the overhead probably won't be significant.)
Since your method takes a huge amount of time to run, the relatively tiny amount of time it takes to acquire the synchronized lock should not be important.
A bigger problem could appear if your program is multithreaded (which I'm assuming it is, since you're making the method synchronized), and more than one thread needs to access that method, it could become a bottleneck. To prevent this, you might be able to rewrite the method so that it does not require synchronization, or use a synchronized block to reduce the size of the protected code (in general, the smaller the amount of code that is protected by the synchronize keyword, the better).
You can also look at the java.util.concurrent classes, as you may find a better solution there as well.
If the object is shared by multiple threads, if one thread tries to call the synchronized method on the object while another's call is in progress, it will be blocked for 1 to 2 minutes. In the worst case, you could end up with a bottleneck where the throughput of your system is dominated by executing these computations one at a time.
Whether this is a problem or not depends on the details of your application, but you probably should look at more fine-grained synchronization ... if that is practical.
In two lines, the disadvantages of synchronized methods in Java:
1. They increase the waiting time of threads
2. They create performance problems
The first drawback is that threads that are blocked waiting to execute synchronized code can't be interrupted. Once they're blocked, they're stuck there until they get the lock for the object the code is synchronizing on.
The second drawback is that the synchronized block must be within a single method; in other words, we can't start a synchronized block in one method and end it in another, for obvious reasons.
The third drawback is that we can't test whether an object's intrinsic lock is available or find out any other information about the lock. Also, if the lock isn't available, we can't time out after waiting for it for a while. When we reach the beginning of a synchronized block, we can either get the lock and continue executing, or block at that line of code until we get the lock.
The fourth drawback is that if multiple threads are waiting to get the lock, it's not first come, first served. There isn't a set order in which the JVM will choose the next thread that gets the lock, so the first thread that blocked could be the last thread to get the lock, and vice versa.
So instead of using synchronization, we can prevent thread interference using classes that implement the java.util.concurrent.locks.Lock interface.
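As a sketch of how a Lock addresses the third and fourth drawbacks, the hypothetical class below uses ReentrantLock with a timed tryLock and the fairness flag. tryLock(long, TimeUnit) also responds to interruption while waiting, which relates to the first drawback:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class TimedLockCounter {
    // true requests a fair lock, which favors the longest-waiting thread.
    private final ReentrantLock lock = new ReentrantLock(true);
    private int value;

    // Returns false instead of blocking forever if the lock stays busy.
    boolean tryIncrement(long timeoutMillis) {
        try {
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // timed out waiting for the lock
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false; // interrupted while waiting
        }
        try {
            value++;
            return true;
        } finally {
            lock.unlock(); // always release, even if the body throws
        }
    }

    int value() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    // Unlike an intrinsic lock, the lock state can be queried.
    boolean isLocked() {
        return lock.isLocked();
    }
}
```

The trade-off is that unlock() must be called explicitly in a finally block, which synchronized handles automatically.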
