ConcurrentHashMap: what's the point in locking updates only?

ConcurrentHashMap: what's the point in locking updates only? - java

I always thought that ConcurrentHashMap and similar classes (which maintain synchronized updates and don't synchronize reads) did a very useful and intuitive thing: they did not lock reads and lock all functionality on updates. And a strategy like that does keep all things consistent.
But I read the documentation carefully, and opened the implementation of ConcurrentHashMap, and as I understand now, it does not block reads when another thread is performing updates. And if one thread starts doing putAll(hugeCollection) and another thread repeats contains(theSameObjectForAllCalls) at the same time then it's more then likely that the second thread gets different results while putAll is still working.
Here is the related part from the docs:
For aggregate operations such as
putAll and clear, concurrent
retrievals may reflect insertion or
removal of only some entries.
Another interesting thing is that:
Retrievals reflect the results of the
most recently completed update
operations holding upon their onset.
This works not due to some locking, but because a new object is first being added and only after that objects counter is incremented and the object becomes visible for read operations.
So, what's the whole point of locking updates?

Brian Goetz explained the working in an article at developer works. That should help out.

These classes give you additional control of tradeoffs around concurrency versus consistency. If what you really want is to have atomic, consistent reads, then get yourself a synchronized hashmap. However, if you need more concurrency, you can get a ConcurrentHashMap, so long as you're aware that reads will get an accurate picture of what the map looked like at some point, just not necessarily now.
The tradeoff is that you have to think harder about the consistency of the data, but your app can have much much greater concurrency.

Related

MultiThreading : Map Performance

In multiThreading I want to use a map which will be updated, which Map will be better considering the performance 1. HashMap 2. ConcurrentHashMap? also, will it perform slow if i make it volatile?
It is going to be used in a Java batch for approx. 20Million records.
Currently i am not sharing this map among threads.
will sharing the map among threads reduce performance?

HashMap will be better performance-wise, as it is not synchronized in any way. ConcurrentHashMap adds overhead to manage concurrent read and - especially - concurrent write access.
That being said, in a multithreaded environment, you are responsible for synchronizing access to HashMap as needed, which will cost performance, too.
Therefore, I would go for HashMap only if the use case allows for very specific optimization of the synchronization logic. Otherwise, ConcurrentHashMap will save you a lot of time working out the synchronization.
However, please note that even with ConcurrentHashMap you will need to carefully consider what level of synchronization you need. ConcurrentHashMap is thread-safe, but not fully synchronized. For instance, if you absolutely need to synchronize each read access with each write access, you will still need custom logic, since for a read operation ConcurrentHashMap will provide the state after the last successfully finished write operation. That is, there might still be an ongoing write operation which will not be seen by the read.
As for volatile, this only ensures that changes to that particular field will be synchronized between threads. Since you will likely not change the reference to the HashMap / ConcurrentHashMap, but work on the instance, the performance overhead will be negligible.

Java. Read, write, separate synch

I am learning multithreading, and I have a little question.
When I am sharing some variable between threads (ArrayList, or something other like double, float), should it be lcoked by the same object in read/write? I mean, when 1 thread is setting variable value, can another read at same time withoud any problems? Or should it be locked by same object, and force thread to wait with reading, until its changed by another thread?

All access to shared state must be guarded by the same lock, both reads and writes. A read operation must wait for the write operation to release the lock.
As a special case, if all you would to inside your synchronized blocks amounts to exactly one read or write operation, then you may dispense with the synchronized block and mark the variable as volatile.

Short: It depends.
Longer:
There is many "correct answer" for each different scenarios. (and that makes programming fun)
Do the value to be read have to be "latest"?
Do the value to be written have let all reader known?
Should I take care any race-condition if two threads write?
Will there be any issue if old/previous value being read?
What is the correct behaviour?
Do it really need it to be correct ? (yes, sometime you don't care for good)
tl;dr
For example, not all threaded programming need "always correct"
sometime you tradeoff correctness with performance (e.g. log or progress counter)
sometime reading old value is just fine
sometime you need eventually correct (e.g. in map-reduce, nobody nor synchronized is right until all done)
in some cases, correct is mandatory for every moment (e.g. your bank account balance)
in write-once, read-only it doesn't matter.
sometime threads in groups with complex cases.
sometime many small, independent lock run faster, but sometime flat global lock is faster
and many many other possible cases
Here is my suggestion: If you are learning, you should thing "why should I need a lock?" and "why a lock can help in DIFFERENT cases?" (not just the given sample from textbook), "will if fail or what could happen if a lock is missing?"

If all threads are reading, you do not need to synchronize.
If one or more threads are reading and one or more are writing you will need to synchronize somehow. If the collection is small you can use synchronized. You can either add a synchronized block around the accesses to the collection, synchronized the methods that access the collection or use a concurrent threadsafe collection (for example, Vector).
If you have a large collection and you want to allow shared reading but exclusive writing you need to use a ReadWriteLock. See here for the JavaDoc and an exact description of what you want with examples:
ReentrantReadWriteLock
Note that this question is pretty common and there are plenty of similar examples on this site.

Approach to a thread safe program

All,
What should be the approach to writing a thread safe program. Given a problem statement, my perspective is:
1 > Start of with writing the code for a single threaded environment.
2 > Underline the fields which would need atomicity and replace with possible concurrent classes
3 > Underline the critical section and enclose them in synchronized
4 > Perform test for deadlocks
Does anyone have any suggestions on the other approaches or improvements to my approach. So far, I can see myself enclosing most of the code in synchronized blocks and I am sure this is not correct.
Programming in Java

Writing correct multi-threaded code is hard, and there is not a magic formula or set of steps that will get you there. But, there are some guidelines you can follow.
Personally I wouldn't start with writing code for a single threaded environment and then converting it to multi-threaded. Good multi-threaded code is designed with multi-threading in mind from the start. Atomicity of fields is just one element of concurrent code.
You should decide on what areas of the code need to be multi-threaded (in a multi-threaded app, typically not everything needs to be threadsafe). Then you need to design how those sections will be threadsafe. Methods of making one area of the code threadsafe may be different than making other areas different. For example, understanding whether there will be a high volume of reading vs writing is important and might affect the types of locks you use to protect the data.
Immutability is also a key element of threadsafe code. When elements are immutable (i.e. cannot be changed), you don't need to worry about multiple threads modifying them since they cannot be changed. This can greatly simplify thread safety issues and allow you to focus on where you will have multiple data readers and writers.
Understanding details of concurrency in Java (and details of the Java memory model) is very important. If you're not already familiar with these concepts, I recommend reading Java Concurrency In Practice http://www.javaconcurrencyinpractice.com/.

You should use final and immutable fields wherever possible, any other data that you want to change add inside:
synchronized (this) {
// update
}
And remember, sometimes stuff brakes, and if that happens, you don't want to prolong the program execution by taking every possible way to counter it - instead "fail fast".

As you have asked about "thread-safety" and not concurrent performance, then your approach is essentially sound. However, a thread-safe program that uses synchronisation probably does not scale much in a multi cpu environment with any level of contention on your structure/program.
Personally I like to try and identify the highest level state changes and try and think about how to make them atomic, and have the state changes move from one immutable state to another – copy-on-write if you like. Then the actual write can be either a compare-and-set operation on an atomic variable or a synchronised update or whatever strategy works/performs best (as long as it safely publishes the new state).
This can be a bit difficult to structure if your new state is quite different (requires updates to several fields for instance), but I have seen it very successfully solve concurrent performance issues with synchronised access.

Buy and read Brian Goetz's "Java Concurrency in Practice".

Any variables (memory) accessible by multiple threads potentially at the same time, need to be protected by a synchronisation mechanism.

Thread safe Hash Map?

I am writing an application which will return a HashMap to user. User will get reference to this MAP.
On the backend, I will be running some threads which will update the Map.
What I have done so far?
I have made all the backend threads so share a common channel to update the MAP. So at backend I am sure that concurrent write operation will not be an issue.
Issues I am having
If user tries to update the MAP and simultaneously MAP is being updated at backend --> Concurrent write operation problem.
If use tries to read something from MAP and simultaneously MAP is being updated at backend --> concurrent READ and WRITE Operation problem.
Untill now I have not face any such issue, but i m afraid that i may face in future. Please give sugesstions.
I am using ConcurrentHashMap<String, String>.

You are on the right track using ConcurrentHashMap. For each point:
Check out the methods putIfAbsent and replace both are threadsafe and combine checking current state of hashmap and updating it into one atomic operation.
The get method is not synchronized internally but will return the most recent value for the specified key available to it (check the ConcurrentHashMap class Javadoc for discussion).
The benefit of ConcurrentHashMap over something like Collections.synchronizedMap is the combined methods like putIfAbsent which provide traditional Map get and put logic in an internally synchronized way. Use these methods and do not try to provide your own custom synchronization over ConcurrentHashMap as it will not work. The java.util.concurrent collections are internally synchronized and other threads will not respond to attempts at synchronizing the object (e.g. synchronize(myConcurrentHashMap){} will not block other threads).

Side Note:
You might want to look into the lock free hash table implementation by Cliff Click, it's part of the Highly Scalable Java library
(Here's a Google Talk by Cliff Click about this lock free hash.)

ConcurrentHashMap was designed and implemented to avoid any issues with the scenarios you describe. You have nothing to worry about.
A hash table supporting full
concurrency of retrievals and
adjustable expected concurrency for
updates.updates.
javadoc of ConcurrentHashMap

How can I perform a threadsafe point-in-time snapshot of a key-value map in Java?

In my application, I have a key-value map that serves as a central repository for storing data that is used to return to a defined state after a crash or restart (checkpointing).
The application is multithreaded and several threads may put key-value pairs into that map. One thread is responsible for regularly creating a checkpoint, i. e. serialize the map to persistant storage.
While the checkpoint is being written, the map should remain unchanged. It's rather easy to avoid new items being added, but what about other threads changing members of "their" objects inside the map?
I could have a single object whose monitor is seized when the checkpointing starts and wrap all write access to any member of the map, and members thereof, in blocks synchronizing on that object. This seems very error-prone and tedious to me.
I could also make the map private to the checkpointer and only put copies of the submitted objects in it. But then I would have to ensure that the copies are deep copies and I wouldn't be able to have the data in the map being automatically updated, on every change to the submitted objects, the submitters would have to re-submit them. This seems like a lot of overhead and also error-prone, as I have to remember putting resubmit code in all the right places.
What's an elegant and reliable way to solve this?

what about other threads changing members of "their" objects inside the map
Here you have a problem :) and it cannot be solved by any kind of Map...
One solution would be to allow only immutable objects in your Map, but this may be impossible for you.
Otherwise you have to share a lock will all threads that may change data referenced by your map and block them all during your snapshot ; but this is a stop the world approach...

pgras is right that immutability would fix things, but that would also be tough. You could just lock the whole thing but that could be a performance problem. I can think of two good ideas.
First is to use a ReadWriteLock (which requires 1.5 or newer). Since your checkpoint can acquire the read lock it can be assured things are safe, but when no one is reading performance should be pretty good. This is still a pretty coarse lock, so you may also want to do #2...
Second is to break things up. Each area of the program could keep it's own map (the map for GUI stuff, the map for user settings, the map for hardware settings, whatever). Each one would have a lock on it and things would go about as usual. When it came time to checkpoint, the checkpointer would grab ALL the locks (so things are consistant) and then do it's job. The catch here is you have define an order for the locks to be grabbed in (say alphabetical) otherwise you'll end-up with deadlocks.
If the maps are orthogonal to each other (updates to one don't require updates to another to be consistent) then the easiest thing may be to push the updates to a central "backup" map in the checkpointer, not unlike something you described.
My biggest question to you would be, how much of a problem is this (performance wise)? Are updates very frequent, or are they rare? That would help to advise on something since my last idea (previous paragraph) could be slow, but it's easy and may not matter.
There is a fantastic book called Java Concurrency in Practice which is basically the Java threading bible. It discusses how to figure out this kind of stuff and strategies to avoid problems or make solving them easier. If you are going to be doing more threading, it's a very useful read.
Actually if your key values are orthogonal to eachother, then things are really easy. The ConcurrentMap interface (there are implemetations such as the ConcurrentHashMap) would solve your problems since they can do changes atomically, so readers wouldn't see inconsistent data. But if you have any two (or more) keys that must be updated at the same time this won't cover you.
I hope this helps. Threading access to shared data structures is complex stuff.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.