In multithreading I want to use a map that will be updated. Which map will perform better: 1. HashMap or 2. ConcurrentHashMap? Also, will it perform slowly if I make it volatile?
It is going to be used in a Java batch job for approximately 20 million records.
Currently I am not sharing this map among threads.
Will sharing the map among threads reduce performance?
HashMap will give better performance, as it is not synchronized in any way. ConcurrentHashMap adds overhead to manage concurrent read and, especially, concurrent write access.
That being said, in a multithreaded environment, you are responsible for synchronizing access to HashMap as needed, which will cost performance, too.
Therefore, I would go for HashMap only if the use case allows for very specific optimization of the synchronization logic. Otherwise, ConcurrentHashMap will save you a lot of time working out the synchronization.
However, please note that even with ConcurrentHashMap you will need to carefully consider what level of synchronization you need. ConcurrentHashMap is thread-safe, but not fully synchronized. For instance, if you absolutely need to synchronize each read access with each write access, you will still need custom logic, since for a read operation ConcurrentHashMap will provide the state after the last successfully finished write operation. That is, there might still be an ongoing write operation which will not be seen by the read.
As for volatile, it only ensures that writes to that particular field (the map reference) are visible to other threads. Since you will likely not change the reference to the HashMap / ConcurrentHashMap, but work on the instance it points to, the performance overhead will be negligible.
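To illustrate that last point, a minimal sketch (class and field names are hypothetical): since the reference to the map never changes, the field should simply be final rather than volatile, and ConcurrentHashMap handles per-key atomicity for the contents.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedCounts {
    // volatile would only make reassignment of this *reference* visible to
    // other threads; it does nothing for the map's contents. Since the
    // reference never changes, final is the right modifier here.
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();

    public void record(String key) {
        counts.merge(key, 1, Integer::sum); // atomic per key on a ConcurrentHashMap
    }

    public int countFor(String key) {
        return counts.getOrDefault(key, 0);
    }
}
```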
Related
I need to make a data structure keyed off of username and then some data (additional collections) in a POJO. The data needs to be thread safe.
So I'm thinking for the main structure, ConcurrentHashMap<String, MyPOJO>. For the operations I need to perform on MyPOJO, I may either just read it, or I may perform write operations on it.
Would the best approach be to do a get on the map and then operate on MyPOJO in a synchronized block? I assume I just need to put a synchronized block in the update methods and the read methods would automatically be blocked? Is that the best approach in a highly concurrent app? Or do I need to use something like ReadWriteLock on BOTH the get/set operations?
If I use something like StampedLock, each MyPOJO would need one, correct? So I can do record-level locking?
Thanks!
Would the best approach be to do a get on the map and then operate on MyPOJO in a synchronized block?
I assume that you mean a synchronized block on the MyPOJO instance itself (or a private lock owned by the instance).
My answer is yes, if you do it right.
I assume I just need to put a synchronized block in the update methods and the read methods would automatically be blocked?
No, that's not correct. All methods that access or update a mutable object would need to synchronize on the same lock.
If you don't synchronize both reads and writes, you risk various thread-safety problems, including writes not being visible to readers. Heisenbugs.
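A minimal sketch of "all methods synchronize on the same lock", assuming a private lock object owned by the instance (the fields are hypothetical):

```java
public class MyPOJO {
    // Guard all mutable state with a single private lock; both readers and
    // writers must synchronize on it, or readers may see stale values.
    private final Object lock = new Object();
    private int score;     // hypothetical fields for illustration
    private String note;

    public void update(int newScore, String newNote) {
        synchronized (lock) {
            score = newScore;
            note = newNote;
        }
    }

    public int getScore() {
        synchronized (lock) { // reads must lock too, for visibility
            return score;
        }
    }

    public String getNote() {
        synchronized (lock) {
            return note;
        }
    }
}
```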
Is that the best approach in a highly concurrent app? Or do I need to use something like ReadWriteLock on BOTH the get/set operations?
It depends.
On the ReadWriteLock issue:
Unless it is likely that you will get significant lock contention on a specific MyPOJO instance, it is probably not worth the effort to optimize this.
If the access and update methods only hold the lock for a relatively short period of time, that reduces the impact of any contention.
More generally, I have a suspicion that you might be confusing "highly concurrent" with "highly scalable". Java multi-threading only performs up to the limit of the cores (and memory) on a single machine. Beyond that, clever tweaks to improve concurrency get you nowhere. To scale up further, you need to change the system architecture so that requests are handled by multiple JVM instances on different machines.
So ... to sum up ... ReadWriteLock might help if you have significant contention on individual MyPOJO instances AND there are likely to be a lot of parallel read operations on individual instances.
If I use something like StampedLock, each MyPOJO would need one, correct? So I can do record-level locking?
I doubt that there would be much benefit unless you have significant contention; see above. But yes, if you used a StampedLock per instance you would get record-level locking ... just as you would with any other per-instance locking.
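If you do go the per-instance StampedLock route, it might look like the sketch below (the field is hypothetical). The optimistic read avoids blocking entirely in the common case where no write intervenes:

```java
import java.util.concurrent.locks.StampedLock;

public class MyPOJO {
    private final StampedLock lock = new StampedLock(); // one lock per record
    private int value; // hypothetical field

    public void setValue(int v) {
        long stamp = lock.writeLock();
        try {
            value = v;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    public int getValue() {
        long stamp = lock.tryOptimisticRead(); // no blocking in the common case
        int v = value;
        if (!lock.validate(stamp)) {           // a write intervened; fall back
            stamp = lock.readLock();
            try {
                v = value;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return v;
    }
}
```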
FWIW: This smells to me of "premature optimization". Furthermore, if you expect that your solution will need to scale beyond a single JVM in the short to medium term, then it is arguably a waste of time to optimize the single JVM solution too much.
I'm working on an ultra low latency and high performance application.
The core is single-threaded, so I don't need to worry about concurrency there.
I'm developing a scheduled log function that logs messages periodically, to prevent the same messages from flooding the log.
So the log class contains a ConcurrentHashMap; one thread updates it (putting new keys or updating existing values), and another thread periodically loops through the map to log all the messages.
My concern: since logging while looping through the map may take time, would it block the thread trying to update the map? Any blocking is unacceptable, since our application core is single-threaded.
And is there any other data structure other than ConcurrentHashMap I can use to reduce the memory footprint?
Is there a thread-safe way to iterate the map without blocking in the read-only case? Stale data during iteration is acceptable.
According to the Java API docs:
[...] even though all operations are thread-safe, retrieval operations do not entail locking [...]
Moreover, the entrySet() method documentation tells us that:
The [returned] set is backed by the map, so changes to the map are reflected in the set, and vice-versa.
That means the map can be modified while an iteration over it is in progress; iterating indeed doesn't block the whole map.
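A small demonstration of that weakly consistent behavior (a plain HashMap would throw ConcurrentModificationException here; class and method names are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentIteration {

    // Modifying the map mid-iteration is safe on a ConcurrentHashMap and
    // never throws ConcurrentModificationException; a plain HashMap would.
    static int iterateWhileModifying(Map<String, Integer> map) {
        int visited = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            map.put("c", 3); // modification during the loop: allowed
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        int visited = iterateWhileModifying(map);
        // The iterator is weakly consistent: it visits the two original
        // entries and may or may not also see the entry added mid-loop.
        System.out.println("visited=" + visited + ", size=" + map.size());
    }
}
```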
There may be other structures that would allow you to reduce the memory footprint, reduce latency, and perhaps offer a more uniform & consistent performance profile.
One of these is a worker pattern where your main worker publishes to a nonblocking queue and the logger pulls from that queue. This should decouple the two processes, allow for multiple concurrent publishers and allow you to scale out loggers.
One possible data structure for this is ConcurrentLinkedQueue. I have very little experience with Java, so I'm not sure how its performance profile differs from ConcurrentHashMap's, but this is a very common pattern in distributed systems and in Go.
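A rough sketch of that worker pattern (the class and method names are hypothetical). One caveat: ConcurrentLinkedQueue is unbounded, so under sustained backpressure memory can grow without limit; a bounded queue or ring buffer trades that for possible blocking or dropping.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class AsyncLogSketch {
    // Non-blocking, unbounded queue: offer() never blocks the producer.
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();

    // Called from the latency-sensitive core thread.
    public void log(String message) {
        pending.offer(message);
    }

    // Called periodically from a background logger thread.
    public int drainAndPrint() {
        int drained = 0;
        String msg;
        while ((msg = pending.poll()) != null) { // poll() is non-blocking
            System.out.println(msg);
            drained++;
        }
        return drained;
    }
}
```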
What is a generic approach to achieve thread safety when an object (e.g. a HashMap or an ArrayList or some POJO) is always modified by a single (same) thread but can be accessed by multiple threads?
HashMap is of most interest for me but I need a generic approach.
Is it enough to make it volatile?
Thanks.
Maybe you should take a look at ConcurrentHashMap.
public class ConcurrentHashMap<K,V>
extends AbstractMap<K,V>
implements ConcurrentMap<K,V>, Serializable
A hash table supporting full concurrency of retrievals and high expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.) For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time. Bear in mind that the results of aggregate status methods including size, isEmpty, and containsValue are typically useful only when a map is not undergoing concurrent updates in other threads. Otherwise the results of these methods reflect transient states that may be adequate for monitoring or estimation purposes, but not for program control.
More info here :
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html
It behaves programmatically exactly like a classic HashMap.
A generic approach will not be easy to implement, will require a lot of effort on your part, and things will still go wrong (it is too difficult to get right). So your best bet is ConcurrentHashMap, as suggested by Gomoku7.
A generic approach will require an implementation based on locks. You will have to lock objects before an update and release the lock afterwards. There are different types of locks in Java, so choose the one that suits your needs. Here are some tips:
Final is your friend: do not mutate objects unless necessary; make fields final where possible
Avoid creating temporary variables
Use Executors and Fork/Join where possible
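As one possible shape for such a lock-based generic approach, here is a hypothetical wrapper that funnels every access to an arbitrary object through a ReadWriteLock, so many readers can proceed in parallel while a writer gets exclusive access:

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;
import java.util.function.Function;

// Generic guard: wraps any mutable object and forces all access
// through read/write locks.
public class Guarded<T> {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final T value;

    public Guarded(T value) {
        this.value = value;
    }

    // Multiple readers may hold the read lock simultaneously.
    public <R> R read(Function<T, R> reader) {
        lock.readLock().lock();
        try {
            return reader.apply(value);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers get exclusive access.
    public void write(Consumer<T> writer) {
        lock.writeLock().lock();
        try {
            writer.accept(value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Usage might look like `Guarded<Map<String, Integer>> shared = new Guarded<>(new HashMap<>()); shared.write(m -> m.put("a", 1));` — the wrapped HashMap itself needs no synchronization because every access goes through the guard.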
I have a shared Map data structure that needs to be thread-safe. Is synchronized the most efficient way to read or add to the Map?
Thanks!
Edit: The data structure is a non-updatable cache, i.e. once it fills up it is not updated again. So lots of writes initially, with some reads; then it is mostly reads.
"Most efficient" is relative, of course, and depends on your specific situation. But consider something like ConcurrentHashMap if you expect there to be many threads working with the map simultaneously; it's thread safe but still allows concurrent access, unlike Hashtable or Collections.synchronizedMap().
That depends on how you use it in the app.
If you're doing lots of reads and writes on it, a ConcurrentHashMap is probably the best choice; if it's mostly reading, a plain Map guarded by a ReadWriteLock (since writes would be rare, you'd get faster access, locking only when writing).
Collections.synchronizedMap() is possibly the worst choice, since it just gives you a wrapper with every method synchronized; avoid it at all costs.
For your specific use case (a non-updatable cache), a copy-on-write map will outperform both a synchronized map and ConcurrentHashMap.
See: https://labs.atlassian.com/wiki/display/CONCURRENT/CopyOnWriteMap as one example (I believe Apache also has a copy-on-write map implementation).
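The standard library has no CopyOnWriteMap, but the idea can be sketched in a few lines (a hypothetical, simplified version; real implementations flesh out the full Map interface). Reads are plain volatile reads with no locking; each write copies the whole backing map, which is cheap only when writes are rare — exactly the "fill once, then read" pattern described above.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal copy-on-write map sketch: lock-free reads, copying writes.
public class CopyOnWriteMapSketch<K, V> {
    private volatile Map<K, V> snapshot = new HashMap<>();

    public V get(K key) {
        return snapshot.get(key);          // no locking on the read path
    }

    public synchronized void put(K key, V value) {
        Map<K, V> copy = new HashMap<>(snapshot); // copy, mutate, publish
        copy.put(key, value);
        snapshot = copy;                   // volatile write publishes safely
    }
}
```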
Synchronised methods or collections will certainly work. They're not the most efficient approach, but they are simple to implement, and you won't notice the overhead unless you access the structure millions of times per second.
A better idea though might be to use a ConcurrentHashMap - this was designed for concurrency from the start and should perform better in a highly concurrent situation.
In order to avoid race conditions, we can synchronize the write and read methods on the shared variables, locking those variables against other threads.
My question is whether there are other (better) ways to avoid race conditions, since locking makes the program slow.
What I found are:
using Atomic classes, if there is only one shared variable;
using an immutable container for multiple shared variables and declaring the container reference volatile (I found this method in the book "Java Concurrency in Practice").
I'm not sure whether these perform faster than the synchronized approach. Are there any other, better methods?
thanks
Avoid state.
Make your application as stateless as it is possible.
Each thread (sequence of actions) should take a context at the beginning and pass that context from method to method as a parameter.
When this technique does not solve all your problems, use the Event-Driven mechanism (+Messaging Queue).
When your code has to share something with other components it throws event (message) to some kind of bus (topic, queue, whatever).
Components can register listeners to listen for events and react appropriately.
In this case there are no race conditions (except inserting events to the queue). If you are using ready-to-use queue and not coding it yourself it should be efficient enough.
Also, take a look at the Actors model.
Atomics are indeed more efficient than classic locks due to their non-blocking behavior: a thread contending for the memory location is not context-switched, which saves a lot of time.
Probably the best guideline when synchronization is needed is to see how you can reduce the critical section size as much as possible. General ideas include:
Use read-write locks instead of full locks when only a part of the threads need to write.
Find ways to restructure code in order to reduce the size of critical sections.
Use atomics when updating a single variable.
Note that some algorithms and data structures that traditionally need locks have lock-free versions (they are more complicated however).
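For the single-variable case mentioned above, the atomic route might look like this (the counter is hypothetical); incrementAndGet is a lock-free read-modify-write built on compare-and-swap:

```java
import java.util.concurrent.atomic.AtomicLong;

public class AtomicCounter {
    // A single shared variable: an atomic avoids a lock entirely by using
    // hardware compare-and-swap under the hood.
    private final AtomicLong hits = new AtomicLong();

    public long increment() {
        return hits.incrementAndGet(); // lock-free read-modify-write
    }

    public long current() {
        return hits.get();
    }
}
```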
Well, first off, the Atomic classes don't actually take locks: they build on volatile semantics plus hardware compare-and-swap (CAS) instructions, which is why they can outperform hand-rolled synchronized code.
Second, immutability works great for multi-threading; you no longer need monitor locks and such, but that's because you can only read your immutables, you can't modify them.
You can't get rid of synchronized/volatile if you want to avoid race conditions in a multithreaded Java program (i.e. if multiple threads can read AND WRITE the same data). Your best bet, if you want better performance, is to avoid at least some of the built-in thread-safe classes, which do a more generic kind of locking, and make your own implementation that is more tied to your context and thus might allow you to use more granular synchronization and lock acquisition.
Check out this implementation of BlockingCache done by the Ehcache guys;
http://www.massapi.com/source/ehcache-2.4.3/src/net/sf/ehcache/constructs/blocking/BlockingCache.java.html
One of the alternatives is to make shared objects immutable. Check out this post for more details.
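The immutable-container idea the question mentions (from "Java Concurrency in Practice") can be sketched like this: related values are frozen into one immutable object, and publishing a new instance through a volatile field swaps the whole state atomically, so readers never see a half-updated pair (the class names are illustrative):

```java
// Immutable snapshot of related values; instances are never mutated.
public class Range {
    public final int lower;
    public final int upper;

    public Range(int lower, int upper) {
        if (lower > upper) throw new IllegalArgumentException();
        this.lower = lower;
        this.upper = upper;
    }
}

class RangeHolder {
    // Readers always see a consistent (lower, upper) pair because the
    // whole immutable object is swapped in one volatile write.
    private volatile Range range = new Range(0, 0);

    public void update(int lower, int upper) {
        range = new Range(lower, upper);
    }

    public Range get() {
        return range;
    }
}
```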
You can perform up to around 50 million lock/unlock operations per second. If you want this to be more efficient, I suggest using coarser-grained locking, i.e. don't lock every little thing, but have locks for larger objects. Once you have many more locks than threads, you are less likely to see contention, and adding even more locks beyond that just adds overhead.