Collection with record level locking? - java

I need to make a data structure keyed off of username and then some data (additional collections) in a POJO. The data needs to be thread safe.
So I'm thinking for the main structure, ConcurrentHashMap<String, MyPOJO>. For the operations I need to perform on MyPOJO, I may either just read it, or I may perform write operations on it.
Would the best approach be to do a get on the map and then operate on MyPOJO in a syncronized block? I assume I just need to put a syncronized block in the update methods and the read methods would automatically be blocked? Is that the best approach in a highly concurrent app? Or do I need to use something like ReadWriteLock on BOTH the get/set operations?
If I use something like StampedLock, each MyPOJO would need one correct, so I can do record level locking?
Thanks!

Would the best approach be to do a get on the map and then operate on MyPOJO in a synchronized block?
I assume that you mean a synchronized block on the MyPOJO instance itself (or a private lock owned by the instance).
My answer is yes, if you do it right.
I assume I just need to put a synchronized block in the update methods and the read methods would automatically be blocked?
No, that's not correct. All methods that access or update a mutable object would need to synchronize on the same lock.
If you don't synchronize for both reads and writes, you risk various thread-safety concerns, including problems with visibility of writes. Heisenbugs.
Is that the best approach in a highly concurrent app? Or do I need to use something like ReadWriteLock on BOTH the get/set operations?
It depends.
On the ReadWriteLock issue:
Unless it is likely that you will get significant lock contention on a specific MyPOJO instance, it is probably not worth the effort to optimize this.
If the access and update methods only hold the lock for a relatively short period of time, that reduces the impact of any contention.
More generally, I have a suspicion that you might be confusing "highly concurrent" with "highly scalable". Java multi-threading only performs up to the limit of the cores (and memory) on a single machine. Beyond that, clever tweaks to improve concurrency get you nowhere. To scale up further, you need to change the system architecture so that requests are handled by multiple JVM instances on different machines.
So ... to sum up ... ReadWriteLock might help if you have significant contention on individual MyPOJO instances AND there are likely to be a lot of parallel read operations on individual instances.
If I use something like StampedLock, each MyPOJO would need one correct, so I can do record level locking?
I doubt that there would be much benefit unless you have significant contention; see above. But yes, if you used a StampedLock per instance you would get record-level locking ... just like you would other per-instance locking.
FWIW: This smells to me of "premature optimization". Furthermore, if you expect that your solution will need to scale beyond a single JVM in the short to medium term, then it is arguably a waste of time to optimize the single JVM solution too much.

Related

MultiThreading : Map Performance

In multiThreading I want to use a map which will be updated, which Map will be better considering the performance 1. HashMap 2. ConcurrentHashMap? also, will it perform slow if i make it volatile?
It is going to be used in a Java batch for approx. 20Million records.
Currently i am not sharing this map among threads.
will sharing the map among threads reduce performance?
HashMap will be better performance-wise, as it is not synchronized in any way. ConcurrentHashMap adds overhead to manage concurrent read and - especially - concurrent write access.
That being said, in a multithreaded environment, you are responsible for synchronizing access to HashMap as needed, which will cost performance, too.
Therefore, I would go for HashMap only if the use case allows for very specific optimization of the synchronization logic. Otherwise, ConcurrentHashMap will save you a lot of time working out the synchronization.
However, please note that even with ConcurrentHashMap you will need to carefully consider what level of synchronization you need. ConcurrentHashMap is thread-safe, but not fully synchronized. For instance, if you absolutely need to synchronize each read access with each write access, you will still need custom logic, since for a read operation ConcurrentHashMap will provide the state after the last successfully finished write operation. That is, there might still be an ongoing write operation which will not be seen by the read.
As for volatile, this only ensures that changes to that particular field will be synchronized between threads. Since you will likely not change the reference to the HashMap / ConcurrentHashMap, but work on the instance, the performance overhead will be negligible.

Penalty of AtomicVariables over locks in Java

For my current project, i try to use Atomic Integers and Atomic Booleans where ever possible when we have more than 1 thread accessing it. This helps in keeping the logic lock free(Internally i know it may still use locks) and the code much cleaner. The use case is mostly for configuration tags which may change abruptly.
I want to know what is the penalty performance wise of using Atomic Variables, will this invalidate the cache far too often and actually make my solution inferior than just using locks?
atomic* classes does not use locks, it uses CAS (compare-and-set) to achieve thread-safety. in general they are faster then locking variant, however under very high contention they tends to be actually slower.
if you want some analogy, it is something like optimistic and pessimistic locking in DB.
if you are interested in more details, you might want to check Java Performance: The Definitive Guide book.
ADD: With cache I assume you are referring to happens-before relationship. Here is quote from oracle tutorial:
The java.util.concurrent.atomic package defines classes that support
atomic operations on single variables. All classes have get and set
methods that work like reads and writes on volatile variables. That
is, a set has a happens-before relationship with any subsequent get on
the same variable.

Java. Read, write, separate synch

I am learning multithreading, and I have a little question.
When I am sharing some variable between threads (ArrayList, or something other like double, float), should it be lcoked by the same object in read/write? I mean, when 1 thread is setting variable value, can another read at same time withoud any problems? Or should it be locked by same object, and force thread to wait with reading, until its changed by another thread?
All access to shared state must be guarded by the same lock, both reads and writes. A read operation must wait for the write operation to release the lock.
As a special case, if all you would to inside your synchronized blocks amounts to exactly one read or write operation, then you may dispense with the synchronized block and mark the variable as volatile.
Short: It depends.
Longer:
There is many "correct answer" for each different scenarios. (and that makes programming fun)
Do the value to be read have to be "latest"?
Do the value to be written have let all reader known?
Should I take care any race-condition if two threads write?
Will there be any issue if old/previous value being read?
What is the correct behaviour?
Do it really need it to be correct ? (yes, sometime you don't care for good)
tl;dr
For example, not all threaded programming need "always correct"
sometime you tradeoff correctness with performance (e.g. log or progress counter)
sometime reading old value is just fine
sometime you need eventually correct (e.g. in map-reduce, nobody nor synchronized is right until all done)
in some cases, correct is mandatory for every moment (e.g. your bank account balance)
in write-once, read-only it doesn't matter.
sometime threads in groups with complex cases.
sometime many small, independent lock run faster, but sometime flat global lock is faster
and many many other possible cases
Here is my suggestion: If you are learning, you should thing "why should I need a lock?" and "why a lock can help in DIFFERENT cases?" (not just the given sample from textbook), "will if fail or what could happen if a lock is missing?"
If all threads are reading, you do not need to synchronize.
If one or more threads are reading and one or more are writing you will need to synchronize somehow. If the collection is small you can use synchronized. You can either add a synchronized block around the accesses to the collection, synchronized the methods that access the collection or use a concurrent threadsafe collection (for example, Vector).
If you have a large collection and you want to allow shared reading but exclusive writing you need to use a ReadWriteLock. See here for the JavaDoc and an exact description of what you want with examples:
ReentrantReadWriteLock
Note that this question is pretty common and there are plenty of similar examples on this site.

Most efficient way to make a data structure thread-safe (Java)

I have a shared Map data structure that needs to be thread-safe. Is synchronized the most efficient way to read or add to the Map?
Thanks!
Edit: The data structure is a non-updatable cache, i.e. once it fills up it does not update the cache. So lots of writes initially with some reads then it is mostly reads
"Most efficient" is relative, of course, and depends on your specific situation. But consider something like ConcurrentHashMap if you expect there to be many threads working with the map simultaneously; it's thread safe but still allows concurrent access, unlike Hashtable or Collections.synchronizedMap().
That depends on how you use it in the app.
If you're doing lots of reads and writes on it, a ConcurrentHashMap is possibly the best choice, if it's mostly reading, a common Map wrapped inside a collection using a ReadWriteLock
(since writes would not be common, you'd get faster access and locking only when writing).
Collections.synchronizedMap() is possibly the worst case, since it might just give you a wrapper with all methods synchronized, avoid it at all costs.
For your specific use case (non-updatable cache), a copy on write map will outperform both a synchronized map and ConcurrentHashMap.
See: https://labs.atlassian.com/wiki/display/CONCURRENT/CopyOnWriteMap as one example (I believe apache also has a copy on write map implementation).
synchronised methods or collections will certainly work. It's not the most efficient approach but is simple to implement and you won't notice the overhead unless you are access the structure millions of times per second.
A better idea though might be to use a ConcurrentHashMap - this was designed for concurrency from the start and should perform better in a highly concurrent situation.

Java avoid race condition WITHOUT synchronized/lock

In order to avoid race condition, we can synchronize the write and access methods on the shared variables, to lock these variables to other threads.
My question is if there are other (better) ways to avoid race condition? Lock make the program slow.
What I found are:
using Atomic classes, if there is only one shared variable.
using a immutable container for multi shared variables and declare this container object with volatile. (I found this method from book "Java Concurrency in Practice")
I'm not sure if they perform faster than syncnronized way, is there any other better methods?
thanks
Avoid state.
Make your application as stateless as it is possible.
Each thread (sequence of actions) should take a context in the beginning and use this context passing it from method to method as a parameter.
When this technique does not solve all your problems, use the Event-Driven mechanism (+Messaging Queue).
When your code has to share something with other components it throws event (message) to some kind of bus (topic, queue, whatever).
Components can register listeners to listen for events and react appropriately.
In this case there are no race conditions (except inserting events to the queue). If you are using ready-to-use queue and not coding it yourself it should be efficient enough.
Also, take a look at the Actors model.
Atomics are indeed more efficient than classic locks due to their non-blocking behavior i.e. a thread waiting to access the memory location will not be context switched, which saves a lot of time.
Probably the best guideline when synchronization is needed is to see how you can reduce the critical section size as much as possible. General ideas include:
Use read-write locks instead of full locks when only a part of the threads need to write.
Find ways to restructure code in order to reduce the size of critical sections.
Use atomics when updating a single variable.
Note that some algorithms and data structures that traditionally need locks have lock-free versions (they are more complicated however).
Well, first off Atomic classes uses locking (via synchronized and volatile keywords) just as you'd do if you did it yourself by hand.
Second, immutability works great for multi-threading, you no longer need monitor locks and such, but that's because you can only read your immutables, you cand modify them.
You can't get rid of synchronized/volatile if you want to avoid race conditions in a multithreaded Java program (i.e. if the multiple threads cand read AND WRITE the same data). Your best bet is, if you want better performance, to avoid at least some of the built in thread safe classes which do sort of a more generic locking, and make your own implementation which is more tied to your context and thus might allow you to use more granullar synchronization & lock aquisition.
Check out this implementation of BlockingCache done by the Ehcache guys;
http://www.massapi.com/source/ehcache-2.4.3/src/net/sf/ehcache/constructs/blocking/BlockingCache.java.html
One of the alternatives is to make shared objects immutable. Check out this post for more details.
You can perform up to 50 million lock/unlocks per second. If you want this to be more efficient I suggest using more course grain locking. i.e. don't lock every little thing, but have locks for larger objects. Once you have much more locks than threads, you are less likely to have contention and having more locks may just add overhead.

Categories