I came across the following piece of code, and noted a few inconsistencies with respect to multi-thread safety.
Map<String, Map<String, Set<String>>> clusters = new HashMap<>();
Map<String, Set<String>> servers = clusters.get(clusterkey);
if (servers == null) {
    synchronized (clusterkey) {
        servers = clusters.get(clusterkey);
        if (servers == null) { /* ...initialize new HashMap and put... */ }
    }
}
Set<String> users = servers.get(serverkey);
if (users == null) {
    synchronized (serverkey) {
        users = servers.get(serverkey);
        if (users == null) { /* ...initialize new HashSet and put... */ }
    }
}
users.add(userid);
Why would the map be synchronized on clusterkey - shouldn't it be synchronized on the map itself as the monitor?
Shouldn't the last users.add(...) be synchronized too?
This seems to be a lot of code to add a single user in a thread-safe manner. What would be a smarter implementation?
Here are just some observations:
Synchronizing on a String is a very bad idea: two Strings with the same content are not necessarily the same object, so synchronizing on clusterKey and serverKey will probably not work the way intended.
Better would be to use ConcurrentHashMaps and concurrent sets (e.g. the set view returned by ConcurrentHashMap.newKeySet()); see the sketch at the end of these observations.
Though without more context it is not really possible to answer this question. It seems the code-author wanted to safely create just 1 mapping per clusterKey and serverKey so the user can be added just once.
A (probably better) way would be to just synchronize on the clusters map itself and then you're safe as only one thread can read and/or write to said map.
Another way would be to use custom Locks, maybe one for reading, and another one for writing, though this may lead again to inconsistencies if one thread is writing to the Map while another is reading that exact value from it.
The code looks like a non-thought through version of the Double checked locking idiom that sometimes is used for lazy initialisation. Read the provided link for why this is a really bad implementation of it.
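To make the ConcurrentHashMap suggestion concrete, here is a minimal sketch of what the same cluster -> server -> users structure could look like; the class and method names are made up, only clusters, clusterKey, serverKey and the user id come from the question:

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ClusterRegistry {
    // cluster -> server -> users
    private final Map<String, Map<String, Set<String>>> clusters = new ConcurrentHashMap<>();

    // computeIfAbsent creates each nested level atomically, so the
    // double-checked locking from the question is not needed.
    public void addUser(String clusterKey, String serverKey, String userId) {
        clusters
            .computeIfAbsent(clusterKey, k -> new ConcurrentHashMap<>())
            .computeIfAbsent(serverKey, k -> ConcurrentHashMap.newKeySet())
            .add(userId);
    }
}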
The problem with the given code is that it fails intermittently. There is a race condition when there are several threads trying to work on the map using the same key (or keys with the same hashcode) which means that the map created first might be replaced by the second hashmap.
1 - The synchronization is trying to avoid two threads creating a new entry in that Map at the same time. The second one must wait so that its (servers == null) check doesn't also return true.
2 - That users set seems to be out of scope here, but it looks like it doesn't need synchronization. Maybe the programmer knows there are no duplicated userIds, or maybe he doesn't care about adding the same user again and again.
3 - ConcurrentHashMap, maybe?
I don't know why I can't get my head around this question.
I see examples on the internet where people talk about using striped locking to synchronize a map. The idea is to use multiple locks to allow for more concurrency, while retaining correctness. But is an approach like this actually correct?
I believe that the idea is to have one lock per bucket, the same thing that ConcurrentHashMap does under the hood, but how can one achieve such a feat given that the mapping key -> bucket is an internal Map implementation detail, and it is not actually possible to match it from outside the Map?
Take this example:
public void concurrentMethod(String key) {
    Lock lock = stripedLock.get(key);
    lock.lock();
    try {
        // do work on map[key]
    } finally {
        lock.unlock();
    }
}
there's no guarantee that stripedLock.get(key) will return the same lock for 2 keys that will end up in the same bucket. So, as far as I can tell, this is not correct and the synchronization doesn't actually work.
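For reference, a striped lock is typically just a fixed array of locks indexed by the key's hash. The following is a minimal hand-rolled sketch (hypothetical, not from any particular library); it shows that the stripe is derived only from the key's hash code, independently of how the map assigns buckets:

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class StripedLock {
    private final Lock[] stripes;

    public StripedLock(int count) {
        stripes = new Lock[count];
        for (int i = 0; i < count; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    // The same key always maps to the same stripe; two keys that share a
    // HashMap bucket may map to different stripes.
    public Lock get(Object key) {
        int index = Math.floorMod(key.hashCode(), stripes.length);
        return stripes[index];
    }
}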
Is my reasoning wrong here? Can an approach like this lead to correct synchronization?
I'm trying to clarify HashMap vs ConcurrentHashMap regarding thread-safety and also performance. I came across a lot of good articles, but I'm still having trouble figuring it all out.
Let's take the following example using a ConcurrentHashMap, where I will try to add a value for a key that is not already there and return it; the new way of doing it would be:
private final Map<String, Object> map = new ConcurrentHashMap<>();
return map.putIfAbsent(key, new Object());
Let's assume we don't want to use the putIfAbsent method; the above code would then look something like this:
private final Map<String, Object> map = new ConcurrentHashMap<>();
synchronized (map) {
    Object value = map.get(key); // Edit: fetching the value inside the synchronized block
    if (value == null) {
        map.put(key, new Object());
    }
}
return map.get(key);
Is the problem with this approach the fact that the whole map is locked, whereas in the first approach the putIfAbsent method only synchronizes on the bucket that the hash of the key falls into, thus leading to lower performance? Would the second approach work fine with just a HashMap?
Is the problem with this approach the fact that the whole map is locked
There are two problems with this approach.
It's not intrinsic
The fact that you've acquired the lock on the map reference has zero effect whatsoever, except with regard to any other code that tries to acquire this lock. Crucially, ConcurrentHashMap itself does not acquire this lock.
So, if, during that second snippet (with synchronized), some other thread does this:
map.putIfAbsent(key, new Object());
Then it may occur that your map.get(key) call returns null and your follow-up map.put call nevertheless ends up overwriting. In other words, both your thread and that hypothetical thread running putIfAbsent decide to write.
Presumably that is not just fine in your book; otherwise, why use putIfAbsent, or check whether map.get returns null, in the first place?
Had the other thread done this:
synchronized (map) {
map.putIfAbsent(key, new Object());
}
then there'd be no problem; either your get-check-if-null-then-set code will set and the putIfAbsent call is a noop, or vice versa, but they couldn't possibly both 'decide to write'.
Which leads us to;
This is pointless
There are two different ways to achieve concurrency with maps: Intrinsic and extrinsic. There is zero point in doing both, and they do not interact.
If you have a structure whereby all access (both read and write) to a plain old, entirely non-thread-safe java.util.HashMap goes through some shared lock (the HashMap instance itself, or any other lock, as long as all threads that interact with that particular map instance use the same one), then that works fine and there is therefore no reason or point to using ConcurrentHashMap instead.
The point of ConcurrentHashMap is to streamline concurrent processes without the use of extrinsic locking: To let the map do the locking.
One of the reasons you want this is that the ConcurrentHashMap impl is significantly faster at the jobs it is capable of doing; these jobs are spelled out explicitly: It's the methods that ConcurrentHashMap has.
Atomicity
The central problem of your code snippet is that it lacks atomicity. Check-then-act is fundamentally broken in concurrent models (in your case - Check: is key 'k' associated with no value or null? Then Act: set the mapping of key 'k' to value 'v'). This is broken because what if the thing you checked changes in between? What if two threads both 'check-and-act' and run simultaneously? Then they may both check first and then both act, and broken things ensue: one of the two threads will be acting upon a state that is no longer equal to the state it checked, which means its check is broken.
The right model is act-then-check: Act first, and then check the result of the operation. Of course, this requires redefining, and integrating, the code you wrote explicitly in your snippet, into the very definition of your 'act' phase.
In other words, putIfAbsent is not a convenience method! It is a fundamental operation! It's the only way (short of extrinsic locking) to convey the notion of: "Perform the action of associating 'v' with 'k', but only if there is no association yet. I'll check the results of this operation next." There is no way to break that down into if (!map.containsKey(key)) map.put(key, v); because check-then-act does not work in concurrent modelling.
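To make the act-then-check pattern concrete, a minimal sketch (assuming String keys and plain Object values, as in the question's snippets):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ActThenCheck {
    public static void main(String[] args) {
        Map<String, Object> map = new ConcurrentHashMap<>();

        // Act first: atomically associate a value if nothing is there yet.
        Object previous = map.putIfAbsent("k", new Object());

        // Then check the result of the operation: null means our value went
        // in, non-null means another thread's value was already present.
        if (previous == null) {
            System.out.println("we created the mapping");
        } else {
            System.out.println("someone else got there first: " + previous);
        }
    }
}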
Conclusions
Either get rid of ConcurrentHashMap, or get rid of synchronized. Having code that uses both is probably broken, and even if it isn't, it's error-prone, confusing, and I can guarantee you there's a much better way to write it (better in that it is more idiomatic, easier to read, more flexible in the face of future change requests, easier to test, and less likely to have hard-to-test-for bugs in it).
If you can state all operations you need to perform 100% in terms of the methods that CHM has, then do that, because CHM is vastly superior. It even has mechanisms for arbitrary operations: for example, unlike basic hashmaps, you can iterate through a CHM even if other threads are also messing with it, whereas with a normal HashMap you need to hold the lock for the entire duration of the operation, which means any other thread trying to do anything to that HashMap, even just asking for its size, needs to wait. Hence, for most use cases, CHM results in orders of magnitude better performance.
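As a small illustration of iterating a ConcurrentHashMap while another thread writes to it (a sketch; the numbers and thread setup are made up):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IterateWhileWriting {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 1000; i++) map.put(i, "v" + i);

        // Writer thread keeps mutating the map...
        Thread writer = new Thread(() -> {
            for (int i = 1000; i < 2000; i++) map.put(i, "v" + i);
        });
        writer.start();

        // ...while we iterate. With ConcurrentHashMap this is safe (weakly
        // consistent iteration); a plain HashMap could throw
        // ConcurrentModificationException or corrupt itself.
        int seen = 0;
        for (Integer key : map.keySet()) seen++;
        System.out.println("entries seen during concurrent writes: " + seen);

        writer.join();
    }
}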
in first approach the putIfAbsent method only synchronizes on the bucket
That is incorrect, ConcurrentHashMap doesn't synchronize on anything, it uses different mechanics to ensure thread safety.
Would the second approach work fine with just a HashMap ?
Yes, except the second approach is flawed. If using synchronization to make a Map thread-safe, then all access to the Map should use synchronization. As such, it would be best to call Collections.synchronizedMap(map). Performance will be worse than using ConcurrentHashMap.
private final Map<Integer, Object> map = Collections.synchronizedMap(new HashMap<>());
let's assume we don't want to use the putIfAbsent method.
Why? Oh, because it wastes an allocation if the key is already in the map, which is why we should be using computeIfAbsent() instead:
map.computeIfAbsent(key, k -> new Object());
I know you have to synchronize around anything that would change the structure of a HashMap (put or remove), but it seems to me you also have to synchronize around reads of the HashMap, otherwise you might be reading while another thread is changing its structure.
So I sync around gets and puts to my hashmap.
The only machines I have available to test with have only one processor, so I never had any real concurrency until the system went to production and started failing. Items were missing from my hashmap. I assume this is because two threads were writing at the same time, but based on the code below, this should not be possible. When I turned down the number of threads to 1 it started working flawlessly, so it's definitely a threading problem.
Details:
// something for all the threads to sync on
private static Object EMREPORTONE = new Object();
synchronized (EMREPORTONE)
{
reportdatacache.put("name.." + eri.recip_map_id, eri.name);
reportdatacache.put("subjec" + eri.recip_map_id, eri.subject);
etc...
}
... and elsewhere....
synchronized (EMREPORTONE)
{
eri.name = (String)reportdatacache.get("name.." + eri.recip_map_id);
eri.subject = (String)reportdatacache.get("subjec" + eri.recip_map_id);
etc...
}
and that's it. I pass around reportdatacache between functions, but that's just the reference to the hashmap.
Another important point is that this is running as a servlet in an appserver (iplanet to be specific, but I know none of you have ever heard of that)
But regardless, EMREPORTONE is global to the webserver process, no two threads should be able to step on each other, yet my hashmap is getting wrecked. Any thoughts?
In a servlet container environment, static variables depend on the classloader. So you may think that you're dealing with the same static instance, but in fact it could be a completely different one.
Additionally, check that you are not using the map through an escaped reference elsewhere and writing/removing keys from it there.
And yes, use ConcurrentHashMap instead.
Yes, synchronization is not only important when writing, but also when reading. While a write will be performed under mutual exclusion, a reader might otherwise access an erroneous state of the map.
I cannot recommend, under any circumstances, synchronizing the Java collections manually; there are thread-safe counterparts: Collections.synchronizedMap and ConcurrentHashMap. Use them; they will ensure that access is safe in a multithreaded environment.
A further hint: it seems that everything is accessing the reportdatacache. Is there only one instance of that object? Why not synchronize on the cache itself then? But forget that when trying to solve your problem; use the sugar from java.util.concurrent.
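For example, a minimal sketch of the ConcurrentHashMap variant, mirroring the snippets from the question (it assumes the cache holds only Strings, which is a guess):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// one shared, thread-safe cache; no external synchronized blocks are
// needed for simple get/put calls
private static final Map<String, String> reportdatacache = new ConcurrentHashMap<>();

// writer side
reportdatacache.put("name.." + eri.recip_map_id, eri.name);
reportdatacache.put("subjec" + eri.recip_map_id, eri.subject);

// reader side
eri.name = reportdatacache.get("name.." + eri.recip_map_id);
eri.subject = reportdatacache.get("subjec" + eri.recip_map_id);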
As I see it there are 3 possibilities here:
You are locking on two different objects. However, EMREPORTONE is private static and the code that accesses the reportdatacache is in one file only. Ok, that isn't it then. But I would recommend locking on reportdatacache instead of EMREPORTONE. Cleaner code.
You are missing some read or write to reportdatacache somewhere. There are other accesses to the map that are not synchronized. Are things never removed from the cache?
This isn't a synchronization problem but rather a race condition issue. The data in the hashmap is fine, but you are expecting things to be in the cache that haven't been stored by the other thread yet. Maybe 2 requests come in for the same eri at the same time and they are both putting values into the cache? Maybe check whether the old value returned by put(...) is always null? Explaining more about how you know that items are missing from the map would help with this.
As an aside, you are doing this:
reportdatacache.put("name.." + eri.recip_map_id, eri.name);
reportdatacache.put("subjec" + eri.recip_map_id, eri.subject);
But it seems like you really should be storing the eri by its id.
reportdatacache.put(eri.recip_map_id, eri);
Then you aren't creating fake keys with the "name.." prefix. Or maybe you should create a NameSubject private static class to store the name and subject in the cache. Cleaner.
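A hypothetical sketch of that NameSubject idea (not code from the question):

// hypothetical value holder so the cache needs only one entry per id
private static class NameSubject {
    final String name;
    final String subject;

    NameSubject(String name, String subject) {
        this.name = name;
        this.subject = subject;
    }
}

// usage:
reportdatacache.put(eri.recip_map_id, new NameSubject(eri.name, eri.subject));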
Hope something here helps.
So I want to have an ArrayList that stores a series of stock quotes, where I keep track of the bid price, ask price and last price for each.
Of course, at any time, the bid, ask or last of a given stock can change.
I have one thread that updates the prices and one that reads them.
I want to make sure that when reading, no other thread is updating a price. So I looked at synchronized collections, but that seems to only prevent reading while another thread is adding or deleting an entry in the ArrayList.
So now I'm onto the wrapper approach:
import java.util.ArrayList;

public class Qte_List {
    private final ArrayList<Qte> the_list = new ArrayList<>();

    public void UpdateBid(String p_sym, double p_bid) {
        synchronized (the_list) {
            Qte q = Qte.FindBySym(the_list, p_sym);
            q.bid = p_bid;
        }
    }

    public double ReadBid(String p_sym) {
        synchronized (the_list) {
            Qte q = Qte.FindBySym(the_list, p_sym);
            return q.bid;
        }
    }
}
What I want to accomplish with this is that only one thread can be doing anything - reading or updating the_list's contents - at one time. Am I approaching this right?
Thanks.
Yes, you are on the right track and that should work.
But why not use the existing Hashtable collection, which is synchronized, and provides a key-value lookup already?
As I understand it you are using the map to store the quotes; the number of quotes never changes, but each quote can be read or modified to reflect current prices. It is important to know that locking the collection only protects against changes to which Quote objects are in the map: it does not in any way restrict the modification of the contents of those Quotes. If you want to restrict that access you will have to provide locking on the Quote object.
Looking at your code however I don't believe you have a significant synchronization problem. If you try to do a read at the same time as a write, you will either get the price before or the price after the write. If you didn't know the write was going to occur that shouldn't matter to you. You may need locking at a higher level so that
if (getBidPrice(mystock)<10.0) {
sell(10000);
}
happens as an atomic operation and you don't end up selling at 5.0 rather than 10.0.
If the number of quotes really doesn't change then I would recommend allowing Qte objects to be added only in the constructor of Qte_List. This would make locking the collection irrelevant. The technical term for this is making Qte_List immutable.
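A sketch of that idea (hypothetical; Qte is the class from the question):

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class Qte_List {
    private final List<Qte> the_list;

    // Every Qte is added here, in the constructor, and never afterwards:
    // the structure of the list can no longer change, so locking the
    // collection itself becomes unnecessary. Only the mutable fields of
    // the individual Qte objects still need protection.
    public Qte_List(Collection<Qte> quotes) {
        the_list = new ArrayList<>(quotes);
    }
}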
That looks like a reasonable approach. Nit-picking, though, you probably shouldn't include the return statement inside the synchronized block:
public double ReadBid(String p_sym){
double bid;
synchronized (the_list) {
Qte q = Qte.FindBySym(the_list, p_sym);
bid = q.bid;
}
return bid;
}
I'm not sure if it's just my taste or there's some concurrency gotcha involved, but at the very least it looks cleaner to me ;-).
Yes, this will work. Anyway, you don't need to do it yourself, since it is already implemented in the Collections framework:
Collections.synchronizedList
Your approach should do the trick, but as you stated, there can only be one reader or writer at a time. This isn't very scalable.
There are some ways to improve performance without losing thread-safety here.
You could use a ReadWriteLock for example. This will allow multiple readers at a time, but when someone gets the write-lock, all others must wait for him to finish.
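A minimal sketch of the ReadWriteLock idea, applied to the ReadBid/UpdateBid pair from the question (an assumption about how it would be wired in, not the poster's code):

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

public double ReadBid(String p_sym) {
    rwLock.readLock().lock();       // many readers may hold the read lock at once
    try {
        return Qte.FindBySym(the_list, p_sym).bid;
    } finally {
        rwLock.readLock().unlock();
    }
}

public void UpdateBid(String p_sym, double p_bid) {
    rwLock.writeLock().lock();      // the write lock is exclusive
    try {
        Qte.FindBySym(the_list, p_sym).bid = p_bid;
    } finally {
        rwLock.writeLock().unlock();
    }
}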
Another way would be to use a proper collection. It seems you could exchange your list with a thread-safe implementation of Map. Have a look at the ConcurrentMap documentation for possible candidates.
Edit:
Assuming that you need ordering for your Map, have a look at the ConcurrentNavigableMap interface.
What you have will work, but locking the entire list every time you want to read or update the value of an element is not scalable. If this doesn't matter, then you're fine with what you have. If you want to make it more scalable consider the following...
You didn't say whether you need to be able to make structural changes to the_list (adding or removing elements), but if you don't, one big improvement would be to move the call to FindBySym() outside of the synchronized block. Then instead of synchronizing on the_list, you can just synchronize on q (the Qte object). That way you can update different Qte objects concurrently. Also, if you can make the Qte objects immutable as well, you don't actually need any synchronization at all (to update, just replace the element, e.g. the_list.set(i, new Qte(...))).
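For example, something along these lines (a sketch that assumes the_list is never structurally modified):

public void UpdateBid(String p_sym, double p_bid) {
    // no list lock needed if the list structure never changes
    Qte q = Qte.FindBySym(the_list, p_sym);
    synchronized (q) {
        q.bid = p_bid;
    }
}

public double ReadBid(String p_sym) {
    Qte q = Qte.FindBySym(the_list, p_sym);
    synchronized (q) {
        return q.bid;
    }
}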
If you do need to be able to make structural changes to the list, you can use a ReentrantReadWriteLock to allow for concurrent reads and exclusive writes.
I'm also curious why you want to use an ArrayList rather than a synchronized HashMap.
I am attempting to troubleshoot an intermittent failure that appears to be related to removing an object from a HashMap and then putting that same object back using a new key. My HashMap is created as follows:
transactions = new HashMap<Short, TransactionBase>();
The code that does the re-assignment is as follows:
transactions.remove(transaction.tran_no);
transaction.tran_no = generate_transaction_id();
transactions.put(transaction.tran_no, transaction);
The intermittent behaviour that I am seeing is that code that executes immediately after this, and that depends upon the transaction object being locatable, does not appear to find the transaction object using the new transaction id. However, at some future point, the transaction can be located. So, grasping at straws, is there any sort of asynchronous effect to put() or remove() that can cause this sort of behaviour?
I should mention that, to the best of my knowledge, the container is being accessed by only one thread. I have already read in the documentation that class HashMap is not "synchronized".
There is a slight difference between remove/get and put (although my guess is that you have a threading issue).
The parameter for remove/get is of type Object; for put it is of type K. The reason for that has been stated many times before. This means it has problems with boxing. I'm not even going to guess what the rules are. If a value gets boxed as a Byte in one place and a Short in another, then those two objects cannot be equal.
There is a similar issue with List.remove(int) and List.remove(Object).
I presume that every time you check for the presence of the item you're definitely using a short or Short argument to Map.get() or Map.containsKey()?
These methods take Object arguments so if you pass them an int it will be converted to an Integer and will never match any item in your Map because they will all have Short keys.
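A small demonstration of that boxing pitfall (the key value is made up):

import java.util.HashMap;
import java.util.Map;

public class BoxingPitfall {
    public static void main(String[] args) {
        Map<Short, String> transactions = new HashMap<>();
        transactions.put((short) 42, "my transaction");

        // get() takes Object, so the int literal autoboxes to Integer,
        // which never equals the Short key: this prints null.
        System.out.println(transactions.get(42));

        // Casting to short boxes to Short and matches: prints "my transaction".
        System.out.println(transactions.get((short) 42));
    }
}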
There are no "asynchronous" effects in the HashMap class. As soon as you put something in there, it's there. You should double- and triple- check to make sure that there are no threading issues.
The only other thing I can think of is that you're making a copy of the HashMap somewhere. The copy obviously won't be affected by you adding stuff into the original, or vice versa.
Just one suggestion...you're focusing on accesses to the HashMap but I wonder if you should also see if your generate_transaction_id() is thread safe or if it is behaving in an unexpected way.
Have you overridden equals() but not hashCode() in your model objects? How about compareTo() ? If you get these wrong, Collections will behave strangely indeed.
Check Java Practices on equals() and compareTo().
What does this generate_transaction_id() function do? If it is generating a 16-bit GUID-like thing, you could easily get hash collisions. Combined with threading, you could get:
T1: transaction1.tran_no = generate_transaction_id(); => 1729
T2: transaction2.tran_no = generate_transaction_id(); => 1729
T1: transactions.put(transaction1.tran_no, transaction1); => map.put(1729, transaction1)
T2: transactions.put(transaction2.tran_no, transaction2); => map.put(1729, transaction2)
T1: TransactionBase t = transactions.get(1729); => transaction2
T1: transactions.remove(transaction1.tran_no); => map.remove(1729)
T2: TransactionBase t = transactions.get(1729); => null
Of course, this could only be the explanation if that 'to the best of your knowledge' part is not true.
So you know HashMap's not thread safe. Are you sure it's being accessed by only one thread? After all, intermittent failures are frequently threading related. If not, you can wrap it with Collections.synchronizedMap(), like so:
transactions = Collections.synchronizedMap(transactions);
You could always just try so you can eliminate that possibility.
It should be pointed out that this just wraps the original map with one with all the methods synchronized. You may want to consider using a synchronized block if the access is localized.
Threading has been mentioned in a few responses already, but have you considered the visibility issue for objects used by multiple threads? It's possible (and quite common) that if you put an object into a collection in one thread, it will not be "published" to other threads unless you have properly synchronized on the collection.
Threads and Locks
Synchronization and thread safety in Java
As others have observed, you HAVE to know whether the HashMap is accessed by just one thread or not. CollectionSpy is a new profiler that lets you find out instantly, for all containers, how many threads perform any accesses. See www.collectionspy.com for more details.