Checkstyle reports this code as "The double-checked locking idiom is broken", but I don't think my code is actually affected by the problems with double-checked locking.
The code is supposed to create a row in a database if a row with that id doesn't exist. It runs in a multi-threaded environment, and I want to avoid primary-key-violation SQLExceptions.
The pseudo-code:
private void createRow(int id) {
    Row row = dao().fetch(id);              // first check, without the lock
    if (row == null) {
        synchronized (TestClass.class) {
            row = dao().fetch(id);          // second check, under the lock
            if (row == null) {
                dao().create(id);
            }
        }
    }
}
I can agree that it looks like double-checked locking, but I am not using static variables, and the code in fetch() and create() is probably too complex to be inlined and reordered.
Am I wrong or checkstyle? :)
I think in this case, checkstyle is correct. In your code as presented, consider what would happen if two threads both had row == null at the entry to the synchronized block. Thread A would enter the block, and insert the new row. Then after thread A exits the block, thread B would enter the block (because it doesn't know what just happened), and try to insert the same new row again.
I see you just changed the code and added a pretty important missing line in there. In the new code, you might be able to get away with that, since two threads won't be relying on changes to a shared (static) variable. But you might be better off seeing if your DBMS supports a statement such as INSERT OR UPDATE.
Another good reason to delegate this functionality to the DBMS is if you ever need to deploy more than one application server. Since synchronized blocks don't work across machines, you will have to do something else in that case anyway.
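A related alternative to an upsert statement is to skip the Java lock entirely and let the primary-key constraint reject duplicates. The sketch below assumes the JDBC driver reports the violation as `java.sql.SQLIntegrityConstraintViolationException`; not every driver does (some throw a plain `SQLException` with a SQLState in class 23). `RowDao`, `RowCreator` and the in-memory fake are hypothetical stand-ins for the question's `dao()`.

```java
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical DAO contract standing in for the question's dao().
interface RowDao {
    void create(int id) throws SQLException;
}

final class RowCreator {
    private final RowDao dao;

    RowCreator(RowDao dao) {
        this.dao = dao;
    }

    /**
     * Inserts the row and relies on the primary-key constraint to reject duplicates.
     * Returns true if this call created the row, false if someone else already had.
     * No Java-level lock is involved, so this also works across several app servers.
     */
    boolean createRowIfAbsent(int id) throws SQLException {
        try {
            dao.create(id);
            return true;
        } catch (SQLIntegrityConstraintViolationException duplicate) {
            // Another thread or server inserted the same id first; the row exists now.
            return false;
        }
    }
}

// In-memory fake that mimics a primary-key constraint, for trying the idea without a database.
final class InMemoryRowDao implements RowDao {
    private final Set<Integer> ids = new HashSet<>();

    @Override
    public synchronized void create(int id) throws SQLException {
        if (!ids.add(id)) {
            throw new SQLIntegrityConstraintViolationException("duplicate key " + id);
        }
    }
}
```

On some databases a failed INSERT aborts the surrounding transaction, so this pattern fits best with auto-commit or a dedicated transaction.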
Assuming you want that innermost line to read:
row = dao().create(id);
It's not a classic double-checked lock problem, assuming dao().fetch is properly mutexed with respect to the create method.
Edit: (code was updated)
The classic problem of a double-checked lock is a value being assigned before its initialization has completed, while two threads are accessing that same value.
Assuming the DAO is properly synchronized and will not return a partially initialized value, this doesn't suffer from the flaws of the double-checked lock idiom.
If you're tempted to write code like this, consider:
Since Java 1.4, synchronizing methods has become pretty cheap. It's not free, but the runtime cost is small enough that it isn't worth risking data corruption to avoid it.
Since Java 1.5, you have the Atomic* classes, which allow you to read and set fields atomically. Unfortunately, they don't solve your problem. Why they didn't add an AtomicCachedReference or something similar (which would call an overridable method when get() is called and the current value is null) is beyond me.
Try ehcache. It lets you set up a cache (i.e. an object which allows you to call code if a key is not contained in a map). This is usually what you want, and caches really do solve your problem (and all those other problems you didn't even know existed).
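The `AtomicCachedReference` wished for above does not exist in the JDK, but a rough sketch of one can be built on `AtomicReference.compareAndSet`. The class name and the `Supplier`-based design are assumptions. Under contention the supplier may run in more than one thread, and only the first successfully published value is kept, so this only suits side-effect-free initialization.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical lazily-initialized reference; not a JDK class.
final class AtomicCachedReference<T> {
    private final AtomicReference<T> ref = new AtomicReference<>();
    private final Supplier<? extends T> initializer; // should return non-null to be cached

    AtomicCachedReference(Supplier<? extends T> initializer) {
        this.initializer = initializer;
    }

    T get() {
        T current = ref.get();
        if (current != null) {
            return current; // fast path: already initialized and safely published
        }
        T candidate = initializer.get(); // may run in several threads concurrently
        // Only the first thread to swap null -> candidate wins; the others discard theirs.
        if (ref.compareAndSet(null, candidate)) {
            return candidate;
        }
        return ref.get();
    }
}
```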
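If a full caching library is more than needed, the JDK's `ConcurrentHashMap.computeIfAbsent` (Java 8+) covers the basic "call code if the key is missing" behavior: for a given absent key, the mapping function is applied at most once, atomically. This is a swapped-in standard-library technique, not ehcache's API; it has no eviction or expiry, and the `LoadingCache` wrapper and its loader are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical minimal loading cache: no eviction, no expiry, unbounded growth.
final class LoadingCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final Function<? super K, ? extends V> loader;

    LoadingCache(Function<? super K, ? extends V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // The loader runs at most once per absent key; concurrent callers for the same
        // key wait for it. The loader should be short and must not modify this map.
        return map.computeIfAbsent(key, loader);
    }
}
```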
As others have pointed out, this code will do what you intend as is, but only under a strict set of non-obvious assumptions:
The Java code is non-clustered (see Greg H's answer)
The "row" reference is only being checked for null in the first line, before the synchronization block.
The reason the double-checked locking idiom is broken (per section 16.2.4 of Java Concurrency in Practice) is that it's possible for a thread running this method to see a non-null but improperly initialized reference to "row", before entering the synchronized block (unless "dao" provides proper synchronization). If your method were doing anything with "row" other than checking that it's null or not, it would be broken. As it stands, it is probably okay but very fragile - personally I wouldn't be comfortable committing this code if I thought there were even a remote chance that some other developer at some later time might modify the method without understanding the subtleties of DCL.
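For contrast, the form of double-checked locking that is correct under the Java 5+ memory model declares the lazily-initialized field `volatile`, so a thread that sees a non-null reference also sees a fully constructed object. The `Holder` and `ExpensiveObject` names are illustrative, not from the question.

```java
// Illustrative lazily-initialized value; the names are hypothetical.
final class ExpensiveObject {
    final int value;

    ExpensiveObject(int value) {
        this.value = value;
    }
}

final class Holder {
    // volatile is what makes the unsynchronized first check safe (Java 5+ memory model):
    // the volatile write below happens-before any later read that sees it.
    private volatile ExpensiveObject instance;

    ExpensiveObject get() {
        ExpensiveObject local = instance; // single volatile read on the fast path
        if (local == null) {
            synchronized (this) {
                local = instance;          // second check, under the lock
                if (local == null) {
                    local = new ExpensiveObject(42);
                    instance = local;      // publishes the fully built object
                }
            }
        }
        return local;
    }
}
```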
I have a class that contains a boolean field like this one:
public class MyClass
{
    private bool boolVal;

    public bool BoolVal
    {
        get { return boolVal; }
        set { boolVal = value; }
    }
}
The field can be read and written from many threads using the property. My question is whether I should fence the getter and setter with a lock statement. Or should I simply use the volatile keyword and save the locking? Or should I totally ignore multithreading, since getting and setting boolean values is atomic?
regards,
There are several issues here.
The simple one first. Yes, reading and writing a boolean variable are atomic operations. (Clarification: what I mean is that read and write operations by themselves are atomic for booleans; a read followed by a write will of course be two operations, which together are not atomic.)
However, unless you take extra steps, the compiler might optimize away such reading and writing, or move the operations around, which could make your code operate differently from what you intend.
Marking the field as volatile means that the operations will not be optimized away; the directive basically says that the compiler should never assume the value in this field is the same as the previous one, even if it just read it in the previous instruction.
However, on multicore and multi-CPU machines, different cores and CPUs might have a different value for the field in their caches, so you add a lock { } clause, or anything else that forces a memory barrier. This will ensure that the field value is consistent across cores. Additionally, reads and writes will not move past a memory barrier in the code, which means you have predictability in where the operations happen.
So if you suspect, or know, that this field will be written to and read from multiple threads, I would definitely add locking and volatile to the mix.
Note that I'm no expert in multithreading; I'm able to hold my own, but I usually program defensively. It might be possible (I would assume it is highly likely) to implement something that doesn't use a lock (there are many lock-free constructs), but sadly I'm not experienced enough in this topic to handle those things. Thus my advice is to add both a lock clause and a volatile directive.
volatile alone is not enough and serves a different purpose. lock should be fine, but in the end it depends on whether anyone is going to set boolVal in MyClass itself; who knows, you may have a worker thread spinning in there. It also depends on how you are using boolVal internally. You may also need protection elsewhere. If you ask me, if you are not DEAD SURE you are going to use MyClass in more than one thread, then it's not worth even thinking about it.
P.S. you may also want to read this section
Same as the following link; I use the same code as the questioner there.
Java multi-threading atomic reference assignment
My code is as follows:
HashMap<String,String> cache = new HashMap<String,String>();
public class myClass {
    private HashMap<String,String> cache = null;

    public void init() {
        refreshCache();
    }

    // This method can be called occasionally to update the cache.
    // Only one thread will run this code.
    public void refreshCache() {
        HashMap<String,String> newcache = new HashMap<String,String>();
        // code to fill up the new cache
        // and then finally
        cache = newcache; // replace the old cache with the new one in an atomic way
    }

    // Many threads will run this code
    public void getCache(Object key) {
        String ob = cache.get(key);
        // do something
    }
}
I have read sjlee's answer again and again, but I can't understand in which case this code will go wrong. Can anyone give me an example?
Remember I don't care about the getCache function will get the old data.
I'm sorry; I can't add a comment to the above question because I don't have 50 reputation,
so I am asking a new question.
Without a memory barrier you might see null or an old map, but you could also see an incomplete map, i.e. you see some of it but not all. This is not a problem if you don't mind entries being missing, but you risk seeing the Map object without anything it refers to, resulting in a possible NPE.
There is no guarantee you will see a complete Map.
final fields will be visible, but non-final fields might not be.
This is a very interesting problem, and it shows that one of your core assumptions
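One way to rule out the "incomplete map" case is to publish the map only through an object whose field is final and fully populated in the constructor: under the Java Memory Model's final-field rules, a thread that sees such a holder reference also sees the map contents as of the end of the constructor (provided `this` does not escape during construction). This does not make the reference update itself visible promptly, so a reader may still see null or an old snapshot; it is usually combined with volatile. `CacheSnapshot` is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable snapshot of the cache contents.
final class CacheSnapshot {
    // final and fully built in the constructor: any thread that obtains a CacheSnapshot
    // reference sees every entry written before the constructor finished.
    private final Map<String, String> entries;

    CacheSnapshot(Map<String, String> source) {
        // Defensive copy, so later changes to source do not leak into the snapshot.
        this.entries = Collections.unmodifiableMap(new HashMap<>(source));
    }

    String get(String key) {
        return entries.get(key);
    }

    int size() {
        return entries.size();
    }
}
```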
"Remember I don't care about the getCache function will get the old
data."
is not correct.
We might think that if refreshCache and getCache are not synchronized, the worst that can happen is getting old data, but that is not true.
The call made by the initial thread may never become visible to other threads. Since cache is not volatile, every thread is free to keep its own local copy of it and never make it consistent across threads.
This is the "visibility" aspect of multi-threading: unless we use appropriate locking or volatile, we do not establish a happens-before relationship, which is what forces threads to see a consistent value of a shared variable across the processors they are running on. This means "cache" may never appear initialized to other threads, causing an obvious NPE in getCache.
To understand this properly, I would recommend reading section 16.2.4 of the book "Java Concurrency in Practice", which deals with a similar problem in double-checked locking code.
Possible solutions:
Make refreshCache and getCache synchronized on the same lock, so that every reader sees the HashMap published by the last writer (synchronizing only the writer is not enough), or
Make cache volatile, or
Call refreshCache in every single thread that calls getCache, which kind of defeats the purpose of a common cache.
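A sketch of the volatile option applied to the question's class: each refresh builds a new map and swaps it in with a single write, and no map is ever modified after readers can see it. The writes that fill the new map happen-before the volatile write, so a reader that sees the new reference also sees its entries. `VolatileCache` and the `freshData` parameter are assumptions standing in for the question's "code to fill up the new cache".

```java
import java.util.HashMap;
import java.util.Map;

final class VolatileCache {
    // volatile: the reference swap in refreshCache() is visible to all reader threads,
    // together with everything written into the map before the swap.
    // Starting with an empty map (not null) avoids the NPE before the first refresh.
    private volatile Map<String, String> cache = new HashMap<>();

    // Called occasionally by a single thread.
    void refreshCache(Map<String, String> freshData) {
        Map<String, String> newCache = new HashMap<>(freshData); // placeholder for the real fill-up
        cache = newCache; // one atomic reference assignment, safely published
    }

    // Called by many threads: they may see the previous map, but never a half-filled one,
    // as long as nobody mutates a map after it has been published.
    String get(String key) {
        return cache.get(key);
    }
}
```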
I know you have to synchronize around anything that would change the structure of a hashmap (put or remove) but it seems to me you also have to synchronize around reads of the hashmap otherwise you might be reading while another thread is changing the structure of the hashmap.
So I sync around gets and puts to my hashmap.
The only machines I have available for testing have just one processor, so I never had any real concurrency until the system went to production and started failing. Items were missing from my hashmap. I assume this is because two threads were writing at the same time, but based on the code below, this should not be possible. When I turned the number of threads down to 1, it started working flawlessly, so it's definitely a threading problem.
Details:
// something for all the threads to sync on
private static Object EMREPORTONE = new Object();

synchronized (EMREPORTONE)
{
    reportdatacache.put("name.." + eri.recip_map_id, eri.name);
    reportdatacache.put("subjec" + eri.recip_map_id, eri.subject);
    // etc...
}

// ... and elsewhere ...

synchronized (EMREPORTONE)
{
    eri.name = (String)reportdatacache.get("name.." + eri.recip_map_id);
    eri.subject = (String)reportdatacache.get("subjec" + eri.recip_map_id);
    // etc...
}
And that's it. I pass reportdatacache around between functions, but that's just a reference to the hashmap.
Another important point is that this is running as a servlet in an app server (iPlanet, to be specific, though I know none of you have ever heard of it).
But regardless, EMREPORTONE is global to the web server process, so no two threads should be able to step on each other, yet my hashmap is getting wrecked. Any thoughts?
In a servlet container environment, static variables are scoped to their classloader. So you may think that you're dealing with the same static instance, but in fact it could be a completely different one.
Additionally, check that you don't use the map through an escaped reference elsewhere to write or remove keys.
And yes, use ConcurrentHashMap instead.
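A minimal sketch of the question's map switched to `ConcurrentHashMap`, assuming `recip_map_id` is an int and the values are strings; `Eri` and `ReportDataCache` are hypothetical stand-ins for the question's `eri` object and surrounding code. Each individual `put`/`get` is thread-safe without an external lock, but a group of puts is still not atomic as a whole, and `ConcurrentHashMap` rejects null keys and values.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the question's eri object.
final class Eri {
    int recip_map_id;
    String name;
    String subject;
}

final class ReportDataCache {
    // Thread-safe per operation; no synchronized(EMREPORTONE) needed around single calls.
    private final Map<String, String> reportdatacache = new ConcurrentHashMap<>();

    void store(Eri eri) {
        // put() throws NullPointerException if eri.name or eri.subject is null.
        reportdatacache.put("name.." + eri.recip_map_id, eri.name);
        reportdatacache.put("subjec" + eri.recip_map_id, eri.subject);
        // A reader may briefly see the name without the subject: these are two separate puts.
    }

    void load(Eri eri) {
        eri.name = reportdatacache.get("name.." + eri.recip_map_id);
        eri.subject = reportdatacache.get("subjec" + eri.recip_map_id);
    }
}
```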
Yes, synchronization is important not only when writing but also when reading. While a write is performed under mutual exclusion, an unsynchronized reader might see the map in an erroneous state.
I cannot recommend synchronizing the Java collections manually under any circumstances; there are thread-safe counterparts: Collections.synchronizedMap and ConcurrentHashMap. Use them; they ensure that individual operations on them are safe in a multithreaded environment.
Further hint: it seems that everyone is accessing reportdatacache. Is there only one instance of that object? Then why not synchronize on the cache itself? But when trying to solve your problems, forget all that and use the sugar from java.util.concurrent.
As I see it, there are 3 possibilities here:
You are locking on two different objects. However, EMREPORTONE is private static, and the code that accesses reportdatacache is in one file only. OK, that isn't it then. But I would recommend locking on reportdatacache instead of EMREPORTONE; it makes for cleaner code.
You are missing some read or write to reportdatacache somewhere, i.e. there are other accesses to the map that are not synchronized. Are things never removed from the cache?
This isn't a synchronization problem but rather a race condition. The data in the hashmap is fine, but you are expecting things to be in the cache that haven't been stored by the other thread yet. Maybe 2 requests for the same eri come in at the same time and both put values into the cache? Maybe check whether the old value returned by put(...) is always null? Explaining more about how you know that items are missing from the map would help.
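To test the third possibility, one hedged approach is to report whenever `put` replaces an existing value, since `Map.put` returns the previous value for the key (or null if there was none). The `putAndReport` helper and the logging to stderr are illustrative assumptions; with a plain `HashMap`, the call still has to happen inside the same synchronized block as the other accesses.

```java
import java.util.Map;

final class CacheDiagnostics {
    private CacheDiagnostics() {}

    /**
     * Puts the value and reports whether the key already had a value, which would
     * suggest two requests are writing the same key.
     */
    static boolean putAndReport(Map<String, String> cache, String key, String value) {
        String previous = cache.put(key, value); // Map.put returns the old value or null
        if (previous != null) {
            System.err.println("Overwrote " + key + ": '" + previous + "' -> '" + value + "'");
            return true;
        }
        return false;
    }
}
```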
As an aside, you are doing this:
reportdatacache.put("name.." + eri.recip_map_id, eri.name);
reportdatacache.put("subjec" + eri.recip_map_id, eri.subject);
But it seems like you really should be storing the eri by its id.
reportdatacache.put(recip_map_id, eri);
Then you aren't creating fake keys with the "name.." prefix. Or maybe you should create a private static NameSubject class to store the name and subject in the cache. Cleaner.
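A sketch of that suggestion, with a hypothetical immutable `NameSubject` value stored under the numeric id in a `ConcurrentHashMap`. Because name and subject travel in one object, a single `put` makes both visible together, which removes the window where a reader sees one field without the other. The int key type is an assumption about `recip_map_id`, and `ReportCache` is an illustrative name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ReportCache {
    // Hypothetical immutable value: final fields make it safe to share once published.
    static final class NameSubject {
        final String name;
        final String subject;

        NameSubject(String name, String subject) {
            this.name = name;
            this.subject = subject;
        }
    }

    private final ConcurrentMap<Integer, NameSubject> byId = new ConcurrentHashMap<>();

    void store(int recipMapId, String name, String subject) {
        byId.put(recipMapId, new NameSubject(name, subject)); // one put covers both fields
    }

    NameSubject lookup(int recipMapId) {
        return byId.get(recipMapId); // null if nothing stored yet
    }
}
```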
Hope something here helps.
All,
I started learning Java threads in the past few days and have only read about scenarios where, even after using synchronized methods/blocks, the code/class remains vulnerable to concurrency issues. Can anyone please provide a scenario where synchronized blocks/methods fail? And what should the alternative be in these cases to ensure thread safety?
Proper behaviour under concurrent access is a complex topic, and it's not as simple as just slapping synchronized on everything, as now you have to think about how operations might interleave.
For instance, imagine you have a class like a list, and you want to make it threadsafe. So you make all the methods synchronized and continue. Chances are, clients might be using your list in the following way:
int index = ...; // this gets set somewhere, maybe passed in as an argument

// Check that the list has enough elements for this call to make sense
if (list.size() > index)
{
    return list.get(index);
}
else
{
    return DEFAULT_VALUE;
}
In a single-threaded environment this code is perfectly safe. However, if the list is being accessed (and possibly modified) concurrently, it's possible for the list's size to change after the call to size(), but before the call to get(). So the list could "impossibly" throw an IndexOutOfBoundsException (or similar) in this case, even though the size was checked beforehand.
There's no shortcut for fixing this: you simply need to think carefully about the use cases for your class/interface, and ensure that you can actually guarantee them when interleaved with any other valid operations. Often this might require some additional complexity, or simply more specifics in the documentation. If the hypothetical list class specified that it always synchronized on its own monitor, then that specific situation could be fixed as
synchronized(list)
{
    if (list.size() > index)
    {
        return list.get(index);
    }
    return DEFAULT_VALUE;
}
but under other synchronization schemes, this would not work. Or it might be too much of a bottleneck. Or forcing the clients to make the multiple calls within the same lexical scope may be an unacceptable constraint. It all depends on what you're trying to achieve, as to how you can make your interface safe, performant and elegant.
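`Collections.synchronizedList` is one JDK list whose documentation does specify its lock: the returned wrapper synchronizes on itself, and its Javadoc tells clients to synchronize on it when iterating. That makes the check-then-act above safe, as long as every modification goes through the wrapper. `ListReader` and its `getOrDefault` helper are illustrative names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ListReader {
    private ListReader() {}

    /**
     * Check-then-act on a list from Collections.synchronizedList: holding the list's own
     * monitor keeps other threads' add/remove calls out between size() and get().
     */
    static <T> T getOrDefault(List<T> syncList, int index, T defaultValue) {
        synchronized (syncList) {
            if (index >= 0 && syncList.size() > index) {
                return syncList.get(index);
            }
            return defaultValue;
        }
    }

    static List<String> newSharedList() {
        return Collections.synchronizedList(new ArrayList<>());
    }
}
```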
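Scenario 1: Classic deadlock:
final Object Mutex1 = new Object();
final Object Mutex2 = new Object();

public void method1() {
    synchronized (Mutex1) {
        synchronized (Mutex2) {
            // a thread here took Mutex1 first, then waits for Mutex2
        }
    }
}

public void method2() {
    synchronized (Mutex2) {
        synchronized (Mutex1) {
            // a thread here took Mutex2 first, then waits for Mutex1
        }
    }
}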
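The usual fix for this deadlock is to make every code path acquire the locks in the same global order. A minimal sketch reusing the two-mutex shape above (names lowercased); `runBoth` is a hypothetical helper that runs both methods concurrently and reports whether they finished within a timeout.

```java
final class OrderedLocks {
    private final Object mutex1 = new Object();
    private final Object mutex2 = new Object();
    private int counter; // shared state guarded by both locks

    // Both methods take mutex1 before mutex2, so no thread can hold mutex2 while waiting for mutex1.
    void method1() {
        synchronized (mutex1) {
            synchronized (mutex2) {
                counter++;
            }
        }
    }

    void method2() {
        synchronized (mutex1) { // was mutex2 first in the deadlocking version
            synchronized (mutex2) {
                counter--;
            }
        }
    }

    /** Runs method1 and method2 concurrently; returns true if both threads finished in time. */
    static boolean runBoth(int iterations, long timeoutMillis) {
        OrderedLocks locks = new OrderedLocks();
        Thread a = new Thread(() -> { for (int i = 0; i < iterations; i++) locks.method1(); });
        Thread b = new Thread(() -> { for (int i = 0; i < iterations; i++) locks.method2(); });
        a.setDaemon(true); // a stuck thread would not keep the JVM alive
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            a.join(timeoutMillis);
            b.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !a.isAlive() && !b.isAlive();
    }
}
```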
Other scenarios include anything with a shared resource, even a variable, because one thread could change the variable's contents, or even make it point to null, without the other thread knowing. Writing to I/O has similar issues: try writing to a file, or out to a socket, using two threads.
Very good articles about concurrency and the Java Memory Model can be found on Angelika Langer's website.
"vulnerable to concurrency issues" is very vague. It would help to know what you have actually read and where. Two things that come to mind:
Just slapping on "synchronized" somewhere does not mean the code is synchronized correctly - it can be very hard to do correctly, and developers frequently miss some problematic scenarios even when they think they're doing it right.
Even if the synchronization correctly prevents non-deterministic changes to the data, you can still run into deadlocks.
Synchronized methods prevent other methods/blocks requiring the same monitor from being executed while you execute them.
But if you have 2 methods, let's say int get() and set(int val), and somewhere else a method which does
obj.set(1+obj.get());
and this method runs in two threads, you can end up with the value increased by one or by two, depending on unpredictable factors.
Therefore you must somehow protect such uses of these methods too (but only where needed).
By the way, use each monitor for as few methods/blocks as possible, so that only those which can wrongly influence each other are synchronized.
And try to expose as few methods requiring further protection as possible.
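A sketch of protecting that compound operation: put the read-modify-write in one synchronized method on the same monitor as get() and set(), instead of relying on callers to lock around `obj.set(1 + obj.get())`. `SharedInt`, `increment()` and the `hammer` test helper are hypothetical names; `java.util.concurrent.atomic.AtomicInteger.incrementAndGet()` is a lock-free alternative for this particular case.

```java
final class SharedInt {
    private int value;

    synchronized int get() { return value; }

    synchronized void set(int val) { value = val; }

    // The whole read-modify-write holds the monitor, unlike obj.set(1 + obj.get()),
    // where another thread can run between the get() and the set().
    synchronized void increment() { value++; }

    /** Starts `threads` threads that each call increment() `perThread` times; returns the final value. */
    static int hammer(int threads, int perThread) {
        SharedInt shared = new SharedInt();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.get();
    }
}
```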
I have this snippet of code
private Templates retrieveFromCache(String name) {
    TemplatesWrapper t = xlCache.get(name);
    synchronized(t) {
        if (!t.isValid()) {
            xlCache.remove(name);
            return null;
        }
    }
    return t.getTemplate();
}
xlCache is a ConcurrentHashMap; my reason for synchronizing on t is that 2 threads could interleave such that, by the time Thread 1 verifies the predicate, Thread 2 has already removed the object from the map, and then a NullPointerException would be thrown. Is my assumption correct? I know concurrency is one of the more difficult things to reason about. And then, to my original question: can I lock on t even if it's a local variable?
Also, this is a private method which gets called from a public method; does that make a difference?
EDIT: My original premise that a NullPointerException would be thrown was incorrect, as remove() simply returns null when the key is absent, making the synchronization moot; however, my question about locking on a local object was answered.
ConcurrentHashMap (and Map/ConcurrentMap in general) won't throw an exception if the specified key doesn't exist. Instead, remove(key) returns the value that was removed, or null if there was none, so you can tell whether anything was actually removed.
But yes, you can lock on the local variable. After all, you're really locking via a reference (and the monitor associated with the referenced object), not a variable - and the other concurrently running method would have the same reference.
You can lock on any object you want. However, in your case, it looks like you could solve it in a clearer and safer way.
Synchronization should be as localized as possible. Since you're getting the TemplatesWrapper from some unknown location, it's possible that anyone can synchronize on it, making it really hard to control the concurrency. It should also be as obvious as possible, just by looking at the code, why something gets locked.
It would be better to put the synchronization inside xlCache, with something like removeIfInvalid().
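A sketch of that idea, assuming Java 8+: the cache class owns its map and exposes a lookup that removes invalid entries through `ConcurrentHashMap.computeIfPresent`, which runs the check-and-remove atomically for that key (returning null from the remapping function removes the entry). `TemplateCache` and its `Entry` type with `isValid()` are hypothetical stand-ins for the question's `xlCache` and `TemplatesWrapper`.

```java
import java.util.concurrent.ConcurrentHashMap;

final class TemplateCache {
    // Hypothetical stand-in for TemplatesWrapper.
    static final class Entry {
        private final String template;
        private volatile boolean valid = true;

        Entry(String template) { this.template = template; }
        boolean isValid() { return valid; }
        void invalidate() { valid = false; }
        String getTemplate() { return template; }
    }

    private final ConcurrentHashMap<String, Entry> map = new ConcurrentHashMap<>();

    void put(String name, Entry entry) {
        map.put(name, entry);
    }

    /** Returns the template, or null if absent; an invalid entry is removed atomically. */
    String retrieve(String name) {
        // computeIfPresent is atomic per key; returning null from the function removes the mapping.
        Entry kept = map.computeIfPresent(name, (key, e) -> e.isValid() ? e : null);
        return kept == null ? null : kept.getTemplate();
    }
}
```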
Yep, that will work just fine.
You can synchronize on any object in Java, so your code will work and will be thread-safe.
Apart from the fact that you aren't checking for t being null. I'm guessing you just left that out of your code example?
A better way to do this would be to use the 2-arg remove method from ConcurrentMap (assuming t has a reasonable equals implementation); then you don't need any synchronization:
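private Templates retrieveFromCache(String name) {
    TemplatesWrapper t = xlCache.get(name);
    if (t == null) {
        return null; // not cached, or already removed by another thread
    }
    if (!t.isValid()) {
        // removes the entry only if it is still mapped to this same wrapper
        xlCache.remove(name, t);
        return null;
    }
    return t.getTemplate();
}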
If remove(null) would throw a NullPointerException, this seems reasonable. If you don't expect collisions to be a regular problem, you could also implement a possibly faster version of the code by just wrapping a try/catch around it instead of a synchronized block.
In either case, I'd add a comment there to explain why you did what you did, so that a month from now, it still makes sense.