I have a shared Map data structure that needs to be thread-safe. Is synchronized the most efficient way to read or add to the Map?
Thanks!
Edit: The data structure is a non-updatable cache, i.e. once it fills up, entries are never changed. So there are lots of writes initially with some reads, and after that it is mostly reads.
"Most efficient" is relative, of course, and depends on your specific situation. But consider something like ConcurrentHashMap if you expect there to be many threads working with the map simultaneously; it's thread safe but still allows concurrent access, unlike Hashtable or Collections.synchronizedMap().
That depends on how you use it in the app.
If you're doing lots of reads and writes on it, a ConcurrentHashMap is possibly the best choice. If it's mostly reads, consider a plain Map guarded by a ReadWriteLock
(since writes would not be common, readers proceed in parallel and exclusive locking happens only when writing).
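A minimal sketch of the ReadWriteLock idea, assuming a hypothetical wrapper class named ReadMostlyMap (the name and the two methods are illustrative, not from any library). Many readers can hold the read lock at once; a writer takes the exclusive write lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical read-mostly wrapper: a plain HashMap guarded by a ReadWriteLock. */
final class ReadMostlyMap<K, V> {
    private final Map<K, V> delegate = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Readers share the read lock, so they do not block each other. */
    V get(K key) {
        lock.readLock().lock();
        try {
            return delegate.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Writers take the exclusive lock; readers wait until the write is done. */
    V put(K key, V value) {
        lock.writeLock().lock();
        try {
            return delegate.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Whether this beats ConcurrentHashMap depends on the workload, so it is worth measuring before committing to it.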
Collections.synchronizedMap() is possibly the worst case, since it just gives you a wrapper with every method synchronized. Avoid it at all costs.
For your specific use case (non-updatable cache), a copy on write map will outperform both a synchronized map and ConcurrentHashMap.
See: https://labs.atlassian.com/wiki/display/CONCURRENT/CopyOnWriteMap as one example (I believe Apache also has a copy-on-write map implementation).
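To show the idea only, here is a hand-rolled sketch. It is not the Atlassian class or its API; SimpleCopyOnWriteMap is a hypothetical name. Reads go through a volatile reference to a map that is never mutated after publication, and each write copies the whole map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical copy-on-write map: lock-free reads, copying writes. */
final class SimpleCopyOnWriteMap<K, V> {
    // Readers only ever see a fully built map that is never mutated afterwards.
    private volatile Map<K, V> snapshot = new HashMap<>();

    /** No locking on the read path: just one volatile read. */
    V get(K key) {
        return snapshot.get(key);
    }

    /** Writers are serialized; each write copies the current map and swaps it in. */
    synchronized V put(K key, V value) {
        Map<K, V> copy = new HashMap<>(snapshot);
        V previous = copy.put(key, value);
        snapshot = copy; // volatile write publishes the new map
        return previous;
    }

    int size() {
        return snapshot.size();
    }
}
```

Note that every put copies all existing entries, so the initial fill phase costs more the larger the cache gets. The payoff comes only once the map is mostly read.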
Synchronized methods or collections will certainly work. It's not the most efficient approach, but it is simple to implement and you won't notice the overhead unless you are accessing the structure millions of times per second.
A better idea though might be to use a ConcurrentHashMap - this was designed for concurrency from the start and should perform better in a highly concurrent situation.
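For the fill-once cache described in the question, a ConcurrentHashMap-based version might look like the sketch below. FillOnceCache and its loader function are hypothetical names used for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical fill-once cache: a value is computed on first request and never replaced. */
final class FillOnceCache<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    FillOnceCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, invoking the loader at most once per key. */
    V get(K key) {
        // ConcurrentHashMap.computeIfAbsent is atomic per key.
        return entries.computeIfAbsent(key, loader);
    }

    int size() {
        return entries.size();
    }
}
```

Once the cache is full, reads of existing keys do not need to take a lock in the common case.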
I need to make a data structure keyed off of username and then some data (additional collections) in a POJO. The data needs to be thread safe.
So I'm thinking for the main structure, ConcurrentHashMap<String, MyPOJO>. For the operations I need to perform on MyPOJO, I may either just read it, or I may perform write operations on it.
Would the best approach be to do a get on the map and then operate on MyPOJO in a synchronized block? I assume I just need to put a synchronized block in the update methods and the read methods would automatically be blocked? Is that the best approach in a highly concurrent app? Or do I need to use something like ReadWriteLock on BOTH the get/set operations?
If I use something like StampedLock, each MyPOJO would need one, correct, so that I can do record-level locking?
Thanks!
Would the best approach be to do a get on the map and then operate on MyPOJO in a synchronized block?
I assume that you mean a synchronized block on the MyPOJO instance itself (or a private lock owned by the instance).
My answer is yes, if you do it right.
I assume I just need to put a synchronized block in the update methods and the read methods would automatically be blocked?
No, that's not correct. All methods that access or update a mutable object would need to synchronize on the same lock.
If you don't synchronize for both reads and writes, you risk various thread-safety concerns, including problems with visibility of writes. Heisenbugs.
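A minimal sketch of "all methods synchronize on the same lock", assuming a hypothetical UserRecord class standing in for MyPOJO and a hypothetical UserRegistry holding the map. Both the read and the write path take the same private lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for MyPOJO: every access to mutable state uses the same lock. */
final class UserRecord {
    private final Object lock = new Object();
    private final List<String> items = new ArrayList<>();

    void addItem(String item) {
        synchronized (lock) {
            items.add(item);
        }
    }

    /** Reads synchronize too, and return a copy so callers never touch the guarded list. */
    List<String> snapshotItems() {
        synchronized (lock) {
            return new ArrayList<>(items);
        }
    }
}

/** Hypothetical registry keyed by username. */
final class UserRegistry {
    private final ConcurrentHashMap<String, UserRecord> byUsername = new ConcurrentHashMap<>();

    /** Creates the record atomically on first use, then always returns the same instance. */
    UserRecord forUser(String username) {
        return byUsername.computeIfAbsent(username, u -> new UserRecord());
    }
}
```

The map protects only the mapping from username to record. The record's own lock protects what is inside it.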
Is that the best approach in a highly concurrent app? Or do I need to use something like ReadWriteLock on BOTH the get/set operations?
It depends.
On the ReadWriteLock issue:
Unless it is likely that you will get significant lock contention on a specific MyPOJO instance, it is probably not worth the effort to optimize this.
If the access and update methods only hold the lock for a relatively short period of time, that reduces the impact of any contention.
More generally, I have a suspicion that you might be confusing "highly concurrent" with "highly scalable". Java multi-threading only performs up to the limit of the cores (and memory) on a single machine. Beyond that, clever tweaks to improve concurrency get you nowhere. To scale up further, you need to change the system architecture so that requests are handled by multiple JVM instances on different machines.
So ... to sum up ... ReadWriteLock might help if you have significant contention on individual MyPOJO instances AND there are likely to be a lot of parallel read operations on individual instances.
If I use something like StampedLock, each MyPOJO would need one, correct, so that I can do record-level locking?
I doubt that there would be much benefit unless you have significant contention; see above. But yes, if you used a StampedLock per instance you would get record-level locking ... just as you would with any other per-instance locking.
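For reference, per-instance locking with StampedLock could look roughly like this. Balance and its fields are hypothetical; the optimistic-read-then-validate shape follows the usage pattern in the StampedLock documentation.

```java
import java.util.concurrent.locks.StampedLock;

/** Hypothetical record with its own StampedLock, i.e. record-level locking. */
final class Balance {
    private final StampedLock lock = new StampedLock();
    private long cents;
    private long updates;

    void deposit(long amount) {
        long stamp = lock.writeLock();
        try {
            cents += amount;
            updates++;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    /** Returns {cents, updates} as one consistent pair. */
    long[] read() {
        long stamp = lock.tryOptimisticRead();
        long c = cents;
        long u = updates;
        if (!lock.validate(stamp)) {
            // A write intervened: fall back to a real read lock and re-read.
            stamp = lock.readLock();
            try {
                c = cents;
                u = updates;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return new long[] {c, u};
    }
}
```

This is more code than a synchronized block, which is part of why it is only worth it under measured contention.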
FWIW: This smells to me of "premature optimization". Furthermore, if you expect that your solution will need to scale beyond a single JVM in the short to medium term, then it is arguably a waste of time to optimize the single JVM solution too much.
In a multithreaded program I want to use a map which will be updated. Which Map will perform better: 1. HashMap or 2. ConcurrentHashMap? Also, will it be slower if I make it volatile?
It is going to be used in a Java batch for approx. 20Million records.
Currently I am not sharing this map among threads.
Will sharing the map among threads reduce performance?
HashMap will be better performance-wise, as it is not synchronized in any way. ConcurrentHashMap adds overhead to manage concurrent read and - especially - concurrent write access.
That being said, in a multithreaded environment, you are responsible for synchronizing access to HashMap as needed, which will cost performance, too.
Therefore, I would go for HashMap only if the use case allows for very specific optimization of the synchronization logic. Otherwise, ConcurrentHashMap will save you a lot of time working out the synchronization.
However, please note that even with ConcurrentHashMap you will need to carefully consider what level of synchronization you need. ConcurrentHashMap is thread-safe, but not fully synchronized. For instance, if you absolutely need to synchronize each read access with each write access, you will still need custom logic, since for a read operation ConcurrentHashMap will provide the state after the last successfully finished write operation. That is, there might still be an ongoing write operation which will not be seen by the read.
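One common case of this: a get followed by a put is two separate operations, so an update from another thread can slip in between. A sketch of the atomic alternative, using a hypothetical RecordCounter class:

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-key counter for a batch job. */
final class RecordCounter {
    private final ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();

    /** merge performs the read-modify-write as one atomic step per key. */
    void record(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    long countFor(String key) {
        return counts.getOrDefault(key, 0L);
    }
}
```

Writing the same thing as counts.put(key, counts.get(key) + 1) could lose updates when several threads hit the same key.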
As for volatile, this only ensures that changes to that particular field will be synchronized between threads. Since you will likely not change the reference to the HashMap / ConcurrentHashMap, but work on the instance, the performance overhead will be negligible.
First, I'll describe what I want and then I'll elaborate on the possibilities I am considering. I don't know which is the best so I want some help.
I have a hash map on which I do read and write operations from a Servlet. Now, since this Servlet is on Tomcat, I need the hash map to be thread safe. Basically, when it is being written to, nothing else should write to it and nothing should be able to read it as well.
I have seen ConcurrentHashMap but noticed its get method is not thread-safe. Then, I have seen locks and something called synchronized.
I want to know which is the most reliable way to do it.
ConcurrentHashMap.get() is thread safe.
You can make HashMap thread safe by wrapping it with Collections.synchronizedMap().
EDIT: removed false information
In any case, the synchronized keyword is a safe bet. While one thread is inside a synchronized block, any other thread that synchronizes on the same object is blocked until the first thread leaves the block.
// Anything can modify map at this point, making it not thread safe
map.get(0);
as opposed to
// No other thread that also synchronizes on map can modify it until this block completes
synchronized (map) {
    map.get(0);
}
I would suggest going with ConcurrentHashMap for the requirement you have described. I had the same kind of requirement for our application earlier, though we were a little more focused on performance.
I ran both ConcurrentHashMap and the map returned by Collections.synchronizedMap() under various types of load, launching multiple threads at a time using JMeter, and monitored them using JProfiler. After all these tests we came to the conclusion that the map returned by Collections.synchronizedMap() was not as efficient as ConcurrentHashMap.
I have written a post also on the same about my experience with both.
Thanks
Collections.synchronizedMap(new HashMap<K, V>());
Returns a synchronized (thread-safe) map backed by the specified map. In order to guarantee serial access, it is critical that all access to the backing map is accomplished through the returned map.
It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views:
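A small sketch of what that manual synchronization looks like. SynchronizedMapIteration and sumValues are hypothetical names; the point is that the lock is taken on the wrapper map itself, not on the collection view.

```java
import java.util.Map;

/** Hypothetical helper showing safe iteration over a Collections.synchronizedMap wrapper. */
final class SynchronizedMapIteration {
    /** Expects a map obtained from Collections.synchronizedMap(...). */
    static int sumValues(Map<String, Integer> syncMap) {
        int total = 0;
        // Lock on the wrapper, the same monitor its own methods use.
        synchronized (syncMap) {
            for (Integer value : syncMap.values()) {
                total += value;
            }
        }
        return total;
    }
}
```

Without the synchronized block, another thread could modify the map in the middle of the loop.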
This is the point of the ConcurrentHashMap class. It protects your collection when you have more than one thread.
To avoid race conditions, we can synchronize the write and access methods on the shared variables, locking these variables against other threads.
My question is whether there are other (better) ways to avoid race conditions. Locks make the program slow.
What I found are:
using Atomic classes, if there is only one shared variable.
using an immutable container for multiple shared variables and declaring this container object volatile. (I found this method in the book "Java Concurrency in Practice")
I'm not sure if they perform faster than the synchronized way. Are there any other, better methods?
thanks
Avoid state.
Make your application as stateless as possible.
Each thread (sequence of actions) should take a context in the beginning and use this context passing it from method to method as a parameter.
When this technique does not solve all your problems, use the Event-Driven mechanism (+Messaging Queue).
When your code has to share something with other components, it publishes an event (message) to some kind of bus (topic, queue, whatever).
Components can register listeners to listen for events and react appropriately.
In this case there are no race conditions (except inserting events to the queue). If you are using ready-to-use queue and not coding it yourself it should be efficient enough.
Also, take a look at the Actors model.
Atomics are indeed more efficient than classic locks due to their non-blocking behavior i.e. a thread waiting to access the memory location will not be context switched, which saves a lot of time.
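As an illustration of the non-blocking style, here is a compare-and-set retry loop on a single shared variable. HighWaterMark is a hypothetical class that tracks the largest value offered by any thread.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical lock-free maximum tracker built on AtomicLong. */
final class HighWaterMark {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    /** Raises the stored maximum if candidate is larger; never blocks. */
    void offer(long candidate) {
        long current;
        do {
            current = max.get();
            if (candidate <= current) {
                return; // nothing to update
            }
            // If another thread changed max in the meantime, the CAS fails and we retry.
        } while (!max.compareAndSet(current, candidate));
    }

    long get() {
        return max.get();
    }
}
```

A thread that loses the race simply retries instead of being parked, which is where the saving over a contended lock comes from.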
Probably the best guideline when synchronization is needed is to see how you can reduce the critical section size as much as possible. General ideas include:
Use read-write locks instead of full locks when only a part of the threads need to write.
Find ways to restructure code in order to reduce the size of critical sections.
Use atomics when updating a single variable.
Note that some algorithms and data structures that traditionally need locks have lock-free versions (they are more complicated however).
Well, first off, the Atomic classes are not free of synchronization either: they rely on volatile semantics and hardware compare-and-swap instead of the synchronized keyword, so they still coordinate between threads much as you would by hand.
Second, immutability works great for multi-threading. You no longer need monitor locks and such, but that's because you can only read your immutables; you can't modify them.
You can't get rid of synchronized/volatile if you want to avoid race conditions in a multithreaded Java program (i.e. if multiple threads can read AND WRITE the same data). If you want better performance, your best bet is to avoid at least some of the built-in thread-safe classes, which use fairly generic locking, and write your own implementation that is tied more closely to your context and so might allow more granular synchronization and lock acquisition.
Check out this implementation of BlockingCache done by the Ehcache guys;
http://www.massapi.com/source/ehcache-2.4.3/src/net/sf/ehcache/constructs/blocking/BlockingCache.java.html
One of the alternatives is to make shared objects immutable. Check out this post for more details.
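A sketch of the immutable-holder approach, using hypothetical Range and RangeHolder classes. Two related values live in one immutable object, and a volatile reference publishes a new object whenever they change, so a reader never sees a lower bound from one update paired with an upper bound from another.

```java
/** Hypothetical immutable pair of related values. */
final class Range {
    final int lower;
    final int upper;

    Range(int lower, int upper) {
        if (lower > upper) {
            throw new IllegalArgumentException("lower must not exceed upper");
        }
        this.lower = lower;
        this.upper = upper;
    }
}

/** Hypothetical holder: readers take no lock, writers swap in a whole new Range. */
final class RangeHolder {
    private volatile Range range = new Range(0, 0);

    Range current() {
        return range;
    }

    void update(int lower, int upper) {
        range = new Range(lower, upper);
    }
}
```

This covers only updates that do not depend on the previous value. If the new state is derived from the old one, an AtomicReference with compareAndSet, or a lock, would still be needed.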
You can perform up to 50 million lock/unlocks per second. If you want this to be more efficient, I suggest using more coarse-grained locking, i.e. don't lock every little thing, but have locks for larger objects. Once you have many more locks than threads, you are less likely to have contention, and having more locks may just add overhead.
I am facing this issue:
I have lots of threads (1024) that access one large collection, a Vector.
Question:
Is it possible to do something that would allow concurrent actions on it without having to synchronize everything (since that takes time)? What I mean is something like the way a MySQL database works: you don't have to worry about synchronization and thread-safety issues. Is there a collection like that in Java? Thanks
Vector is a very old Java class - predates the Collections API. It synchronizes on every operation, so you're not going to have any luck trying to speed it up.
You should consider reworking your code to use something like ConcurrentHashMap or a LinkedBlockingQueue, which are highly optimized for concurrent access.
Failing that, you mention that you'd like performance and access semantics similar to a database - why not use a dedicated database or a message queue? They are likely to implement it better than you ever will, and it's less code for you to write!
[edit] Given your comment:
all what thread does is adding elements to vector
(only if num of elements in vector = 0) &
removing elements from vector. (if vector size > 0)
it sounds very much like you should be using something much more like a queue than a list! A bounded queue with size 1 will give you these semantics - although I'd question why you can't add elements if there is already something there. When you've got thousands of threads this seems like a very inefficient design.
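A rough sketch of those semantics with a capacity-1 queue. SingleSlot is a hypothetical wrapper; it uses the non-blocking offer and poll methods, so an add is refused while an element is present and a remove returns null when the slot is empty.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical one-element slot backed by a bounded queue of size 1. */
final class SingleSlot<T> {
    private final BlockingQueue<T> slot = new ArrayBlockingQueue<>(1);

    /** Adds the item only if the slot is empty; returns false otherwise. */
    boolean tryAdd(T item) {
        return slot.offer(item);
    }

    /** Removes and returns the item, or null if the slot is empty. */
    T tryRemove() {
        return slot.poll();
    }
}
```

The blocking variants put and take are available on the same queue if threads should wait rather than give up.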
Well, first off, this design doesn't sound right. It sounds like you need to think about using a proper database rather than a simple data structure, even if this means just using something like an in-memory instance of HypersonicDB.
However, if you insist on doing things this way, then the java.util.concurrent package has a number of highly concurrent, non-locking data structures. One of them might suit your purpose (e.g. ConcurrentHashMap, if you can use a Map rather than a List)
Looks like you are implementing the producer consumer pattern, you should google "producer consumer java" or have a look at the BlockingQueue interface
I agree with skaffman about looking at java.util.concurrent.
ConcurrentHashMap is very scalable. However, the size() call on it returns only an approximation. So e.g. your app will occasionally be adding elements to it even if !(num of elements in vector = 0).
If you want to strictly enforce the condition you gave, there is no other way than to synchronize.
Instead of having tons of context switches, I guess you could let your user threads post a Callable on a queue and have only one thread dealing with the mutation. This will eliminate the need for synchronization on the collection. The user threads can wait on Future.get().
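Something along these lines, as a sketch only. SingleWriterList is a hypothetical class, and it uses CompletableFuture (which implements Future, so get() works too) on top of a single-threaded executor that is the sole owner of the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical list that is only ever touched by one worker thread. */
final class SingleWriterList implements AutoCloseable {
    // No synchronization needed: only the worker thread reads or writes this list.
    private final List<String> items = new ArrayList<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "list-owner");
        t.setDaemon(true); // do not keep the JVM alive for this sketch
        return t;
    });

    /** Adds the item only if the list is empty; the future reports whether it was added. */
    CompletableFuture<Boolean> addIfEmpty(String item) {
        return CompletableFuture.supplyAsync(() -> items.isEmpty() && items.add(item), worker);
    }

    /** Removes the first item if there is one; the future yields null when the list is empty. */
    CompletableFuture<String> removeFirst() {
        return CompletableFuture.supplyAsync(() -> items.isEmpty() ? null : items.remove(0), worker);
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}
```

The check and the mutation run as one task on the worker thread, so the "only if empty" condition holds without any lock on the list.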
Just an idea.
If you do not want to change your data structure and have only infrequent writes, you might also use one or more ReentrantReadWriteLocks to synchronize access. Then many threads can read at the same time, but when a thread wants to write, all reads are blocked until the write is done.
But you should check whether the used data structure is appropriate for the task, or whether another of the many java.util or java.util.concurrent classes is more appropriate. java.util.Vector is synchronized, by the way.