Missing items from synchronized HashMap in a threaded environment - java

I know you have to synchronize around anything that would change the structure of a hashmap (put or remove) but it seems to me you also have to synchronize around reads of the hashmap otherwise you might be reading while another thread is changing the structure of the hashmap.
So I sync around gets and puts to my hashmap.
The only machines I have available to me to test with all only have one processor so I never had any real concurrency until the system went to production and started failing. Items were missing out of my hashmap. I assume this is because two threads were writing at the same time, but based on the code below, this should not be possible. When I turned down the number of threads to 1 it started working flawlessly, so it's definitely a threading problem.
Details:
// something for all the threads to sync on
private static Object EMREPORTONE = new Object();
synchronized (EMREPORTONE)
{
reportdatacache.put("name.." + eri.recip_map_id, eri.name);
reportdatacache.put("subjec" + eri.recip_map_id, eri.subject);
etc...
}
... and elsewhere....
synchronized (EMREPORTONE)
{
eri.name = (String)reportdatacache.get("name.." + eri.recip_map_id);
eri.subject = (String)reportdatacache.get("subjec" + eri.recip_map_id);
etc...
}
and that's it. I pass around reportdatacache between functions, but that's just the reference to the hashmap.
Another important point is that this is running as a servlet in an appserver (iplanet to be specific, but I know none of you have ever heard of that)
But regardless, EMREPORTONE is global to the webserver process, no two threads should be able to step on each other, yet my hashmap is getting wrecked. Any thoughts?

In servlet container environment static variables depend on classloader. So you may think that you're dealing with same static instance, but in fact it could be completely different one.
Additionally, check if you do not use the map by escaped reference elsewhere and write/remove keys from it.
And yes, use ConcurrentHashMap instead.

Yes, synchronization is not only important when writing, but also when reading. While a write will be performed under mutually exclusion, a reader might access an errenous state of the map.
I cannot recommend you under any circumstances to synchronize the Java Collections manually, there are thread-safe counterparts: Collections.synchronizedMap and ConcurrentHashMap. Use them, they will ensure, that access to them is safe in a multithreaded environment.
Futher hints, it seems that everyone is accesing the datareportcache. Is there only one instance of that object? Why not synchronize then on the cache itself? But forget then when trying to solve your problems, use the sugar from java.util.concurrent.

As I see it there are 3 possibilities here:
You are locking on two different objects. EMREPORTONE is private static however and the code that accesses the reportdatacache is in one file only. Ok, that isn't it then. But I would recommend locking on reportdatacache instead of EMREPORTONE however. Cleaner code.
You are missing some read or write to reportdatacache somewhere. There are other accesses to the map that are not synchronized. Are things never removed from the cache?
This isn't a synchronization problem but rather a race condition issue. The data in the hashmap is fine but you are expecting things to be in the cache but they haven't be stored by the other thread yet. Maybe 2 requests come in for the same eri at the same time and they are both putting values into the cache? Maybe check to see if the old value returned by put(...) is always null? Maybe explaining more about how you know that items are missing from the map would help with this.
As an aside, you are doing this:
reportdatacache.put("name.." + eri.recip_map_id, eri.name);
reportdatacache.put("subjec" + eri.recip_map_id, eri.subject);
But it seems like you really should be storing the eri by its id.
reportdatacache.put(recip_map_id, eri);
Then you aren't creating fake keys with the "name.." prefix. Or maybe you should create a NameSubject private static class to store the name and subject in the cache. Cleaner.
Hope something here helps.

Related

thread safe map operation

I came across the following piece of code ,and noted a few inconsistencies - for a multi-thread safe code.
Map<String,Map<String,Set<String>> clusters = new HashMap<.........>;
Map<String,Set<String>> servers = clusters.get(clusterkey);
if(servers==null){
synchronized(clusterkey){
servers = clusters.get(clusterkey);
if(servers==null){....initialize new hashmap and put...}
}
}
Set<String> users=servers.get(serverkey);
if(users==null){
synchronized(serverkey){
users=servers.get(serverkey);
if(users==null){ ... initialize new hashset and put...}
}
}
users.add(userid);
Why would map be synchronized on clusterkey- shouldnt it be on the map as monitor itself?
Shouldnt the last users.add... be synchronized too?
This seems to be a lot of code to add a single user in a thread-safe manner.What would be a smarter implementation?
Here just some observations:
Synchronizing on a String is a very bad idea -> sync on clusterKey and serverKey will probably not work the way intended.
Better would be to use ConcurrentHashMaps and ConcurrentHashSets.
Though without more context it is not really possible to answer this question. It seems the code-author wanted to safely create just 1 mapping per clusterKey and serverKey so the user can be added just once.
A (probably better) way would be to just synchronize on the clusters map itself and then you're safe as only one thread can read and/or write to said map.
Another way would be to use custom Locks, maybe one for reading, and another one for writing, though this may lead again to inconsistencies if one thread is writing to the Map while another is reading that exact value from it.
The code looks like a non-thought through version of the Double checked locking idiom that sometimes is used for lazy initialisation. Read the provided link for why this is a really bad implementation of it.
The problem with the given code is that it fails intermittently. There is a race condition when there are several threads trying to work on the map using the same key (or keys with the same hashcode) which means that the map created first might be replaced by the second hashmap.
1 -The synch is trying to avoid that two threads, at the same time, create a new Entry in that Map. The second one must wait so his (servers==null) doesn't also return true.
2 - That users list seems to be out of scope, but seems like it doesn't need a synch. Maybe the programmer knows there is no duplicated userIds, or maybe he doesn't care about resetting the same user again and again.
3- ConcurrentHashMap maybe?

Is this the correct way of using synchronized in Java?

The for loop shown here is run within a thread. Inside the sychronized block, a thread writes to some file. There are several different files, so the writers are kept in an array. What I want to make sure here is that no two different threads are writing to the same file at the same time. However, they can write to different files. Am I using the correct parameter with the synchronized block?
for(Element e: elements)
{
int i = getWriterIndex(e)
writeri = writers(i)
synchronized(writeri)
{
// Write to corresponding segment
writers(i).write(e)
recordsWritten(i) += 1
}
}
While I think that this would work, I strongly suggest that you avoid using synchronized. The reason being that it is quite common that you end up being to strict in your synchronization policy. As others have mentioned this seems to be a perfect use case for queues.
If you do not want to use queues in most scenarios (including this) I would suggest using locks in order to maintain thread safety (typically ReentrantReadWriteLock). You can find an example here
In your case I would create a single lock per writer and require that in order to use the writer the current thread must hold the writelock. (If you are only writing you might use simple locks instead of ReentrantReadWriteLock.
Yes, your synchronisation code will work as expected, as long as the synchronised block accesses only the data structure that is used as the lock. So, in your code, writeri cannot be accessed by multiple threads concurrently, so it is thread-safe.
However, you have to make sure that you are not accessing the variable recordsWritten somewhere else, because then you will have race-conditions. So, ideally you would also lock on that variable as well (in every place you access it), or you can use some Java primitive such as AtomicInteger

Java multi threading atomic assignment

Same with the follow link, I use the same code with the questioner.
Java multi-threading atomic reference assignment
In my code, there
HashMap<String,String> cache = new HashMap<String,String>();
public class myClass {
private HashMap<String,String> cache = null;
public void init() {
refreshCache();
}
// this method can be called occasionally to update the cache.
//Only one threading will get to this code.
public void refreshCache() {
HashMap<String,String> newcache = new HashMap<String,String>();
// code to fill up the new cache
// and then finally
cache = newcache; //assign the old cache to the new one in Atomic way
}
//Many threads will run this code
public void getCache(Object key) {
ob = cache.get(key)
//do something
}
}
I read the sjlee's answer again and again, I can't understand in which case these code will go wrong. Can anyone give me a example?
Remember I don't care about the getCache function will get the old data.
I'm sorry I can't add comment to the above question because I don't have 50 reputation.
So I just add a new question.
Without a memory barrier you might see null or an old map but you could see an incomplete map. I.e. you see bits of it but not all. Thus is not a problem if you don't mind entries being missing but you risk seeing the Map object but not anything it refers to resulting in a possible NPE.
There is no guarantee you will see a complete Map.
final fields will be visible but non - final fields might not.
this is a very interesting problem, and it shows that one of your core assumptions
"Remember I don't care about the getCache function will get the old
data."
is not correct.
we think, that if "refreshCache" and "getCache" is not synchronized, then we will only get old data, which is not true.
Their call by the initial thread may never reflect in other threads. Since cache is not volatile, every thread is free to keep it's own local copy of it and never make it consistent across threads.
Because the "visibility" aspect of multi-threading, which says that unless we use appropriate locking, or use volatile, we do not trigger a happens-before scenario, which forces threads to make shared variable value consistent across the multiple processors they are running on, which means "cache" , may never get initialized causing an obvious NPE in getCache
to understand this properly, i would recommend reading section 16.2.4 of "Java concurrency in practice" book which deals with a similar problem in double checked locking code.
Solution: would be
To make refreshCache synchronized to force, all threads to update their copy of HashMap whenever any one thread calls it, or
To make cache volatile or
You would have to call refreshCache in every single thread that calls getCache which kind of defeats the purpose of a common cache.

synchronizing reads to a java collection

so i want to have an arraylist that stores a series of stock quotes. but i keep track of bid price, ask price and last price for each.
of course at any time, the bid ask or last of a given stock can change.
i have one thread that updates the prices and one that reads them.
i want to make sure that when reading no other thread is updating a price. so i looked at synchronized collection. but that seems to only prevent reading while another thread is adding or deleting an entry to the arraylist.
so now i'm onto the wrapper approach:
public class Qte_List {
private final ArrayList<Qte> the_list;
public void UpdateBid(String p_sym, double p_bid){
synchronized (the_list){
Qte q = Qte.FindBySym(the_list, p_sym);
q.bid=p_bid;}
}
public double ReadBid(String p_sym){
synchronized (the_list){
Qte q = Qte.FindBySym(the_list, p_sym);
return q.bid;}
}
so what i want to accomplish with this is only one thread can be doing anything - reading or updating an the_list's contents - at one time. am i approach this right?
thanks.
Yes, you are on the right track and that should work.
But why not use the existing Hashtable collection, which is synchronized, and provides a key-value lookup already?
As I understand it you are using the map to store the quotes; the number of quotes never changes, but each quote can be read or modified to reflect current prices. It is important to know that locking the collection only protects against changes to which Quote objects are in the map: it does not in any way restrict the modification of the contents of those Quotes. If you want to restrict that access you will have to provide locking on the Quote object.
Looking at your code however I don't believe you have a significant synchronization problem. If you try to do a read at the same time as a write, you will either get the price before or the price after the write. If you didn't know the write was going to occur that shouldn't matter to you. You may need locking at a higher level so that
if (getBidPrice(mystock)<10.0) {
sell(10000);
}
happens as an atomic operation and you don't end up selling at 5.0 rather than 10.0.
If the number of quotes really doesn't change then I would recommend allowing Qte objects to be added only in the constructor of Qte_List. This would make locking the collection irrelevant. The technical term for this is making Qte_List immutable.
That looks like a reasonable approach. Nit-picking, though, you probably shouldn't include the return statement inside the synchronized block:
public double ReadBid(String p_sym){
double bid;
synchronized (the_list) {
Qte q = Qte.FindBySym(the_list, p_sym);
bid = q.bid;
}
return bid;
}
I'm not sure if it's just my taste or there's some concurrency gotcha involved, but at the very least it looks cleaner to me ;-).
Yes this will work, anyway you don't need to do it yourself since it is already implemented in the Collections framework
Collections.synchronizedList
Your approach should do the trick, but as you stated, there can only be one reader and writer at a time. This isn't very scaleable.
There are some ways to improve performance without loosing thread-safety here.
You could use a ReadWriteLock for example. This will allow multiple readers at a time, but when someone gets the write-lock, all others must wait for him to finish.
Another way would be to use a proper collection. It seems you could exchange your list with a thread-safe implementation of Map. Have a look at the ConcurrentMap documentation for possible candidates.
Edit:
Assuming that you need ordering for your Map, have a look at the ConcurrentNavigableMap interface.
What you have will work, but locking the entire list every time you want to read or update the value of an element is not scalable. If this doesn't matter, then you're fine with what you have. If you want to make it more scalable consider the following...
You didn't say whether you need to be able to make structural changes to the_list (adding or removing elements), but if you don't, one big improvement would be to move the call to FindBySym() outside of the synchronized block. Then instead of synchronizing on the_list, you can just synchronize on q (the Qte object). That way you can update different Qte objects concurrently. Also, if you can make the Qte objects immutable as well you actually don't need any synchronization at all. (to update, just use the_list[i] = new Qte(...) ).
If you do need to be able to make structural changes to the list, you can use a ReentrantReadWriteLock to allow for concurrent reads and exclusive writes.
I'm also curious why you want to use an ArrayList rather than a synchronized HashMap.

Detecting concurrent modifications?

In a multi-threaded application I'm working on, we occasionally see ConcurrentModificationExceptions on our Lists (which are mostly ArrayList, sometimes Vectors). But there are other times when I think concurrent modifications are happening because iterating through the collection appears to be missing items, but no exceptions are thrown. I know that the docs for ConcurrentModificationException says you can't rely on it, but how would I go about making sure I'm not concurrently modifying a List? And is wrapping every access to the collection in a synchronized block the only way to prevent it?
Update: Yes, I know about Collections.synchronizedCollection, but it doesn't guard against somebody modifying the collection while you're iterating through it. I think at least some of my problem is happening when somebody adds something to a collection while I'm iterating through it.
Second Update If somebody wants to combine the mention of the synchronizedCollection and cloning like Jason did with a mention of the java.util.concurrent and the apache collections frameworks like jacekfoo and Javamann did, I can accept an answer.
Depending on your update frequency one of my favorites is the CopyOnWriteArrayList or CopyOnWriteArraySet. They create a new list/set on updates to avoid concurrent modification exception.
Your original question seems to be asking for an iterator that sees live updates to the underlying collection while remaining thread-safe. This is an incredibly expensive problem to solve in the general case, which is why none of the standard collection classes do it.
There are lots of ways of achieving partial solutions to the problem, and in your application, one of those may be sufficient.
Jason gives a specific way to achieve thread safety, and to avoid throwing a ConcurrentModificationException, but only at the expense of liveness.
Javamann mentions two specific classes in the java.util.concurrent package that solve the same problem in a lock-free way, where scalability is critical. These only shipped with Java 5, but there have been various projects that backport the functionality of the package into earlier Java versions, including this one, though they won't have such good performance in earlier JREs.
If you are already using some of the Apache Commons libraries, then as jacekfoo points out, the apache collections framework contains some helpful classes.
You might also consider looking at the Google collections framework.
Check out java.util.concurrent for versions of the standard Collections classes that are engineered to handle concurrency better.
Yes you have to synchronize access to collections objects.
Alternatively, you can use the synchronized wrappers around any existing object. See Collections.synchronizedCollection(). For example:
List<String> safeList = Collections.synchronizedList( originalList );
However all code needs to use the safe version, and even so iterating while another thread modifies will result in problems.
To solve the iteration problem, copy the list first. Example:
for ( String el : safeList.clone() )
{ ... }
For more optimized, thread-safe collections, also look at java.util.concurrent.
Usually you get a ConcurrentModificationException if you're trying to remove an element from a list whilst it's being iterated through.
The easiest way to test this is:
List<Blah> list = new ArrayList<Blah>();
for (Blah blah : list) {
list.remove(blah); // will throw the exception
}
I'm not sure how you'd get around it. You may have to implement your own thread-safe list, or you could create copies of the original list for writing and have a synchronized class that writes to the list.
You could try using defensive copying so that modifications to one List don't affect others.
Wrapping accesses to the collection in a synchronized block is the correct way to do this. Standard programming practice dictates the use of some sort of locking mechanism (semaphore, mutex, etc) when dealing with state that is shared across multiple threads.
Depending on your use case however you can usually make some optimizations to only lock in certain cases. For example, if you have a collection that is frequently read but rarely written, then you can allow concurrent reads but enforce a lock whenever a write is in progress. Concurrent reads only cause conflicts if the collection is in the process of being modified.
ConcurrentModificationException is best-effort because what you're asking is a hard problem. There's no good way to do this reliably without sacrificing performance besides proving that your access patterns do not concurrently modify the list.
Synchronization would likely prevent concurrent modifications, and it may be what you resort to in the end, but it can end up being costly. The best thing to do is probably to sit down and think for a while about your algorithm. If you can't come up with a lock-free solution, then resort to synchronization.
See the implementation. It basically stores an int:
transient volatile int modCount;
and that is incremented when there is a 'structural modification' (like remove). If iterator detects that modCount changed it throws Concurrent modification exception.
Synchronizing (via Collections.synchronizedXXX) won't do good since it does not guarantee iterator safety it only synchronizes writes and reads via put, get, set ...
See java.util.concurennt and apache collections framework (it has some classes that are optimized do work correctly in concurrent environment when there is more reads (that are unsynchronized) than writes - see FastHashMap.
You can also synchronize over iteratins over the list.
List<String> safeList = Collections.synchronizedList( originalList );
public void doSomething() {
synchronized(safeList){
for(String s : safeList){
System.out.println(s);
}
}
}
This will lock the list on synchronization and block all threads that try to access the list while you edit it or iterate over it. The downside is that you create a bottleneck.
This saves some memory over the .clone() method and might be faster depending on what you're doing in the iteration...
Collections.synchronizedList() will render a list nominally thread-safe and java.util.concurrent has more powerful features.
This will get rid of your concurrent modification exception. I won't speak to the efficiency however ;)
List<Blah> list = fillMyList();
List<Blah> temp = new ArrayList<Blah>();
for (Blah blah : list) {
//list.remove(blah); would throw the exception
temp.add(blah);
}
list.removeAll(temp);

Categories