Why ConcurrentHashMap cannot be locked for exclusive access?

Why ConcurrentHashMap cannot be locked for exclusive access? - java

A quote from #JCIP :
"Since a ConcurrentHashMap cannot be locked for exclusive access, we
cannot use client-side locking to create new atomic operations such as
put-if-absent, as we did for Vector"
Why we can't just acquire the lock in order to implement additional atomic methods and keep the collection thread-safe (like synchronized collections returned by Collections.synchronizedxxx factory) :

The whole point of the ConcurrentHashMap is that read operations never block, i.e. do not have to check for locks. That precludes the ability to have such a lock.
Why we can't just acquire the lock :
You could do that, but you have to do it consistently for all access paths to the map, and then you have completely negated to purpose of a concurrent data structure. It is supposed to be lock-free.

Why? Because the implementation does not support it. Straight from the ConcurrentHashMap JavaDocs:
There is not any support for locking the entire table in a way that prevents all access
...which is, by definition, "exclusive access."

Code you have written is your implementation, and if you use it that way then all other operations must work that way i.e. all operations must accquire same lock.
But the main point here is that java has not provided ConcurrentHashMap for this purpose, its purpose is to allow multiple thread to work simultaneously.
For your requirement go for HashTable.

Related

Confused about synchronization and thread safe ? java

Actually, I am a bit confused in regards of several explanation from website or blog about synchronization and thread-safe. I've done some research on different class of Core Java Api or Java Framework (Collections). And i've often noticed that some class are synchronize and thread-safe which means, at a time, only one thread can access the code.
But i need some precision :
A class is synchronize so its thread-safe ?
Or synchronize and thread-safe have two different meaning ?
Best regards

A class is synchronize so its thread-safe ?
A class is not synchronized. Rather a method, or a block of code is synchronized.
Synchronization (using synchronized) is one way to make code thread-safe. There are other ways.
Or synchronize and thread-safe have two different meaning ?
Yes. They have different meanings.
And i've often noticed that some class are synchronize and thread-safe which means, at a time, only one thread can access the code.
Actually, if you "noticed" that, you were not paying attention!
With a synchronized method, only one thread can access the code while holding a given lock; i.e. you get mutual exclusion. If two threads use different locks, then you won't get mutual exclusion.
The other thing to note is that merely using synchronized does not guarantee thread-safety. You need to use it in the right way:
threads need to synchronize on the appropriate objects / locks
threads need to synchronize in all appropriate code
if the code entails acquiring multiple locks, the locks need to be acquired in an order that avoids deadlocks.

Threadsafe vs Synchronized

I'm new to java.
I'm little bit confused between Threadsafe and synchronized.
Thread safe means that a method or class instance can be used by multiple threads at the same time without any problems occurring.
Where as Synchronized means only one thread can operate at single time.
So how they are related to each other?

The definition of thread safety given in Java Concurrency in Practice is:
A class is thread-safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or interleaving of the execution of those threads by the runtime environment, and with no additional synchronization or other coordination on the part of the calling code.
For example, a java.text.SimpleDateFormat object has internal mutable state that is modified when a method that parses or formats is called. If multiple threads call the methods of the same dateformat object, there is a chance a thread can modify the state needed by the other threads, with the result that the results obtained by some of the threads may be in error. The possibility of having internal state get corrupted causing bad output makes this class not threadsafe.
There are multiple ways of handling this problem. You can have every place in your application that needs a SimpleDateFormat object instantiate a new one every time it needs one, you can make a ThreadLocal holding a SimpleDateFormat object so that each thread of your program can access its own copy (so each thread only has to create one), you can use an alternative to SimpleDateFormat that doesn't keep state, or you can do locking using synchronized so that only one thread at a time can access the dateFormat object.
Locking is not necessarily the best approach, avoiding shared mutable state is best whenever possible. That's why in Java 8 they introduced a date formatter that doesn't keep mutable state.
The synchronized keyword is one way of restricting access to a method or block of code so that otherwise thread-unsafe data doesn't get corrupted. This keyword protects the method or block by requiring that a thread has to acquire exclusive access to a certain lock (the object instance, if synchronized is on an instance method, or the class instance, if synchronized is on a static method, or the specified lock if using a synchronized block) before it can enter the method or block, while providing memory visibility so that threads don't see stale data.

Thread safety is a desired behavior of the program, where the synchronized block helps you achieve that behavior. There are other methods of obtaining Thread safety e.g immutable class/objects. Hope this helps.

Thread safety: A thread safe program protects it's data from memory consistency errors. In a highly multi-threaded program, a thread safe program does not cause any side effects with multiple read/write operations from multiple threads on shared data (objects). Different threads can share and modify object data without consistency errors.
synchronized is one basic method of achieving ThreadSafe code.
Refer to below SE questions for more details:
What does 'synchronized' mean?
You can achieve thread safety by using advanced concurrency API. This documentation page provides good programming constructs to achieve thread safety.
Lock Objects support locking idioms that simplify many concurrent applications.
Concurrent Collections make it easier to manage large collections of data, and can greatly reduce the need for synchronization.
Atomic Variables have features that minimize synchronization and help avoid memory consistency errors.
ThreadLocalRandom (in JDK 7) provides efficient generation of pseudorandom numbers from multiple threads.
Refer to java.util.concurrent and java.util.concurrent.atomic packages too for other programming constructs.
Related SE question:
Synchronization vs Lock

Synchronized: only one thread can operate at same time.
Threadsafe: a method or class instance can be used by multiple threads at the same time without any problems occurring.
If you relate this question as, Why synchronized methods are thread safe? than you can get better idea.
As per the definition this appears to be confusive. But not,if you understand it analytically.
Synchronized means: sequentially one by one in an order,Not concurrently [Not at the same time].
synchronized method not allows to act another thread on it, While a thread is already working on it.This avoids concurrency.
example of synchronization: If you want to buy a movie ticket,and stand in a queue. you will get the ticket only after the person in front of you get the ticket.
Thread safe means: method becomes safe to be accessed by multiple threads without any problem at the same time.synchronized keyword is one of the way to achieve 'thread safe'. But Remember:Actually while multiple threads tries to access synchronized method they follow the order so becomes safe to access. Actually, Even they act at the same time, but cannot access the same resource(method/block) at the same time, because of synchronized behavior of the resource.
Because If a method becomes synchronized, so this is becomes safe to allow multiple threads to act on it, without any problem. Remember:: multiple threads "not act on it at the same time" hence we call synchronized methods thread safe.
Hope this helps to understand.

After patiently reading through a lot of answers and not being too technical at the same time, I could say something definite but close to what Nayak had already replied to fastcodejava above, which comes later on in my answer but look
synchronization is not even close to brute-forcing thread-safety; it's just making a piece of code (or method) safe and incorruptible for a single authorized thread by preventing it from being used by any other threads.
Thread safety is about how all threads accessing a certain element behave and get their desired results in the same way if they would have been sequential (or even not so), without any form of undesired corruption (sorry for the pleonasm) as in an ideal world.
One of the ways of achieving proximity to thread-safety would be using classes in java.util.concurrent.atomic.
Sad, that they don't have final methods though!

Nayak, when we declare a method as synchronized, all other calls to it from other threads are locked and can wait indefinitely. Java also provides other means of locking with Lock objects now.
You can also declare an object to be final or volatile to guarantee its availability to other concurrent threads.
ref: http://www.javamex.com/tutorials/threads/thread_safety.shtml

In practice, performance wise, Thread safe, Synchronised, non-thread safe and non-synchronised classes are ordered as:
Hashtable(slower) < Collections.SynchronizedMap < HashMap(fastest)

Explicit Locks vs Implicit Locks

Is using Locks (java.util.concurrent.locks.Lock) instead of keyword synchronized + method wait() and method notify() totally the same?
Can I thread-safely program using locks (explicit locks) rather than implicit locks (synchronized)?
As of know I have always been using implicit locks. I am aware of the advantages given by the Lock interface implementation like methods: isLocked(), getLockQueueLength(), getHoldCount(), etc... however still the old school way (wait() and notify()) would have other limits other than not having those methods?
I am also aware of the possibility of constructing a lock with a (boolean fairness) parameter which allows lack of starvation.

Yes, absolutely you can write thread-safe program using java.util.concurrent.locks.Lock. If you see any implementation of java.util.concurrent.locks.Lock like ReentrantLock internal implementation uses old synchronized blocks.
Lock implementations provide more extensive locking operations than can be obtained using synchronized methods and statements. They allow more flexible structuring, may have quite different properties, and may support multiple associated Condition objects.
Adding to my difference the synchronized keyword has naturally built in language support. This can mean the JIT can optimise synchronised blocks in ways it cannot with Locks. e.g. it can combine synchronized blocks.synchronized is best for a small number of threads accessing a lock and Lock may be best for a high number of threads accessing the same locks . Also synchronized block makes no guarantees about the sequence in which threads waiting to entering it are granted access.

Locks and synchronized blocks have the same semantics and provide the same guarantees from a Java Memory Model perspective. The main difference is that Locks provide more control (such as with tryLock or when asking a lock to be fair etc.) which allow for a more flexible and fine-grained lock management.
However, when you don't need those additional features, it is better to use a plain old synchronized block as it reduces the room for error (e.g. you can't "forget" to unlock it).

Concurrency design principles in practice

I have a Results object which is written to by several threads concurrently. However, each thread has a specific purpose and owns certain fields, so that no data is actually modified by more than one thread. The consumer of this data will not try to read it until all of the writer threads are done writing it. Because I know this to be true, there is no synchronization on the data writes and reads.
There is a RunningState object associated with this Results object which serves to coordinate this work. All of its methods are synchronized. When a thread is done with its work on this Results object, it calls done() on the RunningState object, which does the following: decrements a counter, checks if the counter has gone to 0 (indicating that all writers are done), and if so, puts this object on a concurrent queue. That queue is consumed by a ResultsStore which reads all of the fields and stores data in the database. Before reading any data, the ResultsStore calls RunningState.finalizeResult(), which is an empty method whose sole purpose is to synchronize on the RunningState object, to ensure that writes from all of the threads are visible to the reader.
Here are my concerns:
1) I believe that this will work correctly, but I feel like I'm violating good design principles to not synchronize on the data modifications to an object that is shared by multiple threads. However, if I were to add synchronization and/or split things up so each thread only saw the data it was responsible for, it would complicate the code. Anyone who modifies this area had better understand what's going on in any case or they're likely to break something, so from a maintenance standpoint I think the simpler code with good comments explaining how it works is a better way to go.
2) The fact that I need to call this do-nothing method seems like an indication of wrong design. Is it?
Opinions appreciated.

This seems mostly right, if a bit fragile (if you change the thread-local nature of one field, for instance, you may forget to synchronize it and end up with hard-to-trace data races).
The big area of concern is in memory visibility; I don't think you've established it. The empty finalizeResult() method may be synchronized, but if the writer threads didn't also synchronize on whatever it synchronizes on (presumably this?), there's no happens-before relationship. Remember, synchronization isn't absolute -- you synchronize relative to other threads that are also synchronized on the same object. Your do-nothing method will indeed do nothing, not even ensure any memory barrier.
You somehow need to establish a happens-before relationship between each thread doing its writes, and the thread that eventually reads. One way to do this without synchronization is via a volatile variable, or an AtomicInteger (or other atomic classes).
For instance, each writer thread can invoke counter.incrementAndGet(1) on the object, and the reading thread can then check that counter.get() == THE_CORRECT_VALUE. There's a happens-before relationship between a volatile/atomic field being written and it being read, which gives you the needed visibility.

Your design is sound, but it can be improved if you are using a true concurrent queue since a concurrent queue from the java.util.concurrent package already guarantees a happens before relationship between the thread putting an item into the queue, and the thread taking an item out, so this precludes needing to call finalizeResult() in the taking thread (so no need for that "do nothing" method call).
From java.util.concurrent package description:
The methods of all classes in java.util.concurrent and its subpackages
extend these guarantees to higher-level synchronization. In
particular:
Actions in a thread prior to placing an object into any
concurrent collection happen-before actions subsequent to the access
or removal of that element from the collection in another thread.
The comments in another answer concerning using an AtomicInteger instead of synchronization are also wise (as using an AtomicInteger to do your thread counting will likely perform better than synchronization), just make sure to get the value of the count after the atomic decrement (e.g. decrementAndGet()) when comparing to 0 in order to avoid adding to the queue twice.

What you've described is indeed safe, but it also sounds, frankly, brittle and (as you note) maintenance could become an issue. Without sample code, it's really hard to tell what's really easiest to understand, so an already subjective question becomes frankly unanswerable. Could you ask a coworker for a code review? (Particularly one that's likely to have to deal with this pattern.) I'm going to trust you that this is indeed the simplest approach, but doing something like wrapping synchronized blocks around writes would increase safety now and in the future. That said, you obviously know your code better than I do.

Analysing a BlockingQueue usage example

I was looking at the "usage example based on a typical producer-consumer scenario" at:
http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/BlockingQueue.html#put(E)
Is the example correct?
I think the put and take operations need a lock on some resource before proceeding to modify the queue, but that is not happening here.
Also, had this been a Concurrent kind of a queue, the lack of locks would have been understandable since atomic operations on a concurrent queue do not need locks.

I do not think there is something to add to what is written in api:
A Queue that additionally supports operations that wait for the queue to become non-empty when retrieving an element, and wait for space to become available in the queue when storing an element.
BlockingQueue implementations are thread-safe. All queuing methods achieve their effects atomically using internal locks or other forms of concurrency control.

BlockingQueue is just an interface. This implementation could be using synchronzed blocks, Lock or be lock-free. AFAIK most methods use Lock in the implementation.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.