I understand the overall concepts of multi-threading and synchronization but am new to writing thread-safe code. I currently have the following code snippet:
synchronized(compiledStylesheets) {
if(compiledStylesheets.containsKey(xslt)) {
exec = compiledStylesheets.get(xslt);
} else {
exec = compile(s, imports);
compiledStylesheets.put(xslt, exec);
}
}
where compiledStylesheets is a HashMap (private, final). I have a few questions.
The compile method can take a few hundred milliseconds to return. This seems like a long time to have the object locked, but I don't see an alternative. Also, it is unnecessary to use Collections.synchronizedMap in addition to the synchronized block, correct? This is the only code that hits this object other than initialization/instantiation.
Alternatively, I know of the existence of a ConcurrentHashMap but I don't know if that's overkill. The putIfAbsent() method will not be usable in this instance because it doesn't allow me to skip the compile() method call. I also don't know if it will solve the "modified after containsKey() but before put()" problem, or if that's even really a concern in this case.
Edit: Spelling
For tasks of this nature, I highly recommend Guava caching support.
If you can't use that library, here is a compact implementation of a Multiton. Use of the FutureTask was a tip from assylias, here, via OldCurmudgeon.
public abstract class Cache<K, V>
{
private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();
public final V get(K key)
throws InterruptedException, ExecutionException
{
Future<V> ref = cache.get(key);
if (ref == null) {
FutureTask<V> task = new FutureTask<>(new Factory(key));
ref = cache.putIfAbsent(key, task);
if (ref == null) {
task.run();
ref = task;
}
}
return ref.get();
}
protected abstract V create(K key)
throws Exception;
private final class Factory
implements Callable<V>
{
private final K key;
Factory(K key)
{
this.key = key;
}
#Override
public V call()
throws Exception
{
return create(key);
}
}
}
I think you are looking for a Multiton.
There's a very good Java one here that #assylas posted some time ago.
You can loosen the lock at the risk of an occasional doubly compiled stylesheet in race condition.
Object y;
// lock here if needed
y = map.get(x);
if(y == null) {
y = compileNewY();
// lock here if needed
map.put(x, y); // this may happen twice, if put is t.s. one will be ignored
y = map.get(x); // essential because other thread's y may have been put
}
This requires get and put to be atomic, which is true in the case of ConcurrentHashMap and you can achieve by wrapping individual calls to get and put with a lock in your class. (As I tried to explain with "lock here if needed" comments - the point being you only need to wrap individual calls, not have one big lock).
This is a standard thread safe pattern to use even with ConcurrentHashMap (and putIfAbsent) to minimize the cost of compiling twice. It still needs to be acceptable to compile twice sometimes, but it should be okay even if expensive.
By the way, you can solve that problem. Usually the above pattern isn't used with a heavy function like compileNewY but a lightweight constructor new Y(). e.g. do this:
class PrecompiledY {
public volatile Y y;
private final AtomicBoolean compiled = new AtomicBoolean(false);
public void compile() {
if(!compiled.getAndSet(true)) {
y = compile();
}
}
}
// ...
ConcurrentMap<X, PrecompiledY> myMap; // alternatively use proper locking
py = map.get(x);
if(py == null) {
py = new PrecompiledY(); // much cheaper than compiling
map.put(x, y); // this may happen twice, if put is t.s. one will be ignored
y = map.get(x); // essential because other thread's y may have been put
y.compile(); // object that didn't get inserted never gets compiled
}
Also:
Alternatively, I know of the existence of a ConcurrentHashMap but I don't know if that's overkill.
Given that your code is heavily locking, ConcurrentHashMap is almost certainly far faster, so not overkill. (And much more likely to be bug-free. Concurrency bugs are not fun to fix.)
Please see Erickson's comment below. Using double-checked locking with Hashmaps is not very smart
The compile method can take a few hundred milliseconds to return. This seems like a long time to have the object locked, but I don't see an alternative.
You can use double-checked locking, and note that you don't need any lock before get since you never remove anything from the map.
if(compiledStylesheets.containsKey(xslt)) {
exec = compiledStylesheets.get(xslt);
} else {
synchronized(compiledStylesheets) {
if(compiledStylesheets.containsKey(xslt)) {
// another thread might have created it while
// this thread was waiting for lock
exec = compiledStylesheets.get(xslt);
} else {
exec = compile(s, imports);
compiledStylesheets.put(xslt, exec);
}
}
}
}
Also, it is unnecessary to use Collections.synchronizedMap in addition to the synchronized block, correct?
Correct
This is the only code that hits this object other than initialization/instantiation.
First of all, the code as you posted it is race-condition-free because containsKey() result will never change while compile() method is running.
Collections.synchronizedMap() is useless for your case as stated above because it wraps all map methods into a synchronized block using either this as a mutex or another object you provided (for two-argument version).
IMO using ConcurrentHashMap is also not an option because it stripes locks based on key hashCode() result; its concurrent iterators is also useless here.
If you really want compile() out of synchronized block, you may pre-calculate if before checking containsKey(). This may draw the overall performance back, but may be better than calling it in synchronized block. To make a decision, personally I would consider how often key "miss" is happening and so, which option is preferrable - keep the lock for longer times or calculate your stuff always.
Related
my question is about synchronisation and preventing deadlocks when using threads. In this example an object simply holds an integer variable and multiple threads call swapValue on those objects.
public class Data {
private long value;
public Data(long value) {
this.value = value;
}
public synchronized long getValue() {
return value;
}
public synchronized void setValue(long value) {
this.value = value;
}
public void swapValue(Data other) {
long temp = getValue();
long newValue = other.getValue();
setValue(newValue);
other.setValue(temp);
}
}
The swapValue method should be thread safe and should not skip swapping the values if the resources are not available. Simply using the synchronized keyword on the method signature will result in a deadlock. I came up with this (apparently) working solution, which is only based on the probability that one thread unlocks its resource and the other tries to claim it while the resource is still unlocked.
private Lock lock = new ReentrantLock();
...
public void swapValue(Data other) {
lock.lock();
while(!other.lock.tryLock())
{
lock.unlock();
lock.lock();
}
long temp = getValue();
long newValue = other.getValue();
setValue(newValue);
other.setValue(temp);
other.lock.unlock();
lock.unlock();
}
To me this looks like a hack. Is this a common solution for these kind of problems? Are there solutions that are "more deterministic" in their behaviour and also applicable in practice?
There are two issues at play here:
First, mixing Data.lock with the built-in lock used by the synchronized keyword
Second, inconsistent locking order among four (!) locks - this.lock, other.lock, the built-in lock of this, and the built-in lock of other
Even without synchronized, a.swapValue(b) and b.swapValue(a) can deadlock unless you use your approach to try to spin while locking and unlocking, which is inefficient.
One approach that you could take is add a field with some kind of final unique ID to each Data object - when swapping data of two objects, lock the one with a lower ID before the one with the higher ID, regardless of which is this and which is other. Note that System.identityHashCode is unfortunately not unique so it can't be easily used here.
The unlock ordering isn't critical here, but unlocking in the reverse order of locking is generally a good practice to follow where possible.
#Nanofarad has the right idea: Give every Data instance a unique, permanent numeric ID, and then use those IDs to decide which object to lock first. Here's what that might look like in practice:
private static void lockBoth(Data a, Data b) {
Lock first = a.lock;
Lock second = b.lock;
if (a.getID() < b.getID()) {
first = b.lock;
second = a.lock;
}
first.lock();
second.lock();
}
private static void unlockBoth(Data a, Data b) {
a.lock.unlock();
b.lock.unlock();
// Note: #Queeg suggests in comments below that in the general case,
// it would be good practice to make this routine always unlock the
// two locks in the order opposite to which `lockBoth()` locked them.
// See https://stackoverflow.com/a/8949355/801894 for an explanation.
}
public void swapValue(Data other) {
lockBoth(this, other);
...swap 'em...
unlockBoth(this, other);
}
In your case, just use AtomicInteger or AtomicLong instead inventing the wheel again. About the synchronization and deadlocks part of your question in general - DO NOT RELY ON PROBABILITY -- it is way too tricky and too easy to get it wrong, unless you're experienced mathematician knowing exactly what youre doing - but even then it is risky. One example when probability is used is UUID, but if computers will get fast enough then the code that shouldn't reasonably break till the end of universe can break in matter of milliseconds or faster, it is better to write code that do not rely on probability, especially concurrent code.
ConcurrentHashMap<String, Config> configStore = new ConcurrentHashMap<>();
...
void updateStore() {
Config newConfig = generateNewConfig();
Config oldConfig = configStore.get(configName);
if (newConfig.replaces(oldConfig)) {
configStore.put(configName, newConfig);
}
}
The ConcurrentHashMap can be read by multiple threads but can be updated only by a single thread. I'd like to block the get() operations when a put() operation is in progress. The rationale here being that if a put() operation is in progress, that implies the current entry in the map is stale and all get() operations should block until the put() is complete. How can I go about achieving this in Java without synchronizing the whole map?
It surely looks like you can defer this to compute and it will take care for that for you:
Config newConfig = generateNewConfig();
configStore.compute(
newConfig,
(oldConfig, value) -> {
if (newConfig.replaces(oldConfig)) {
return key;
}
return oldConfig;
}
);
You get two guarantees from using this method:
Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple
and
The entire method invocation is performed atomically
according to its documentation.
The accepted answer proposed to use compute(...) instead of put().
But if you want
to block the get() operations when a put() operation is in progress
then you should also use compute(...) instead of get().
That's because for ConcurrentHashMap get() doesn't block while compute() is in progress.
Here is a unit test to prove it:
#Test
public void myTest() throws Exception {
var map = new ConcurrentHashMap<>(Map.of("key", "v1"));
var insideComputeLatch = new CountDownLatch(1);
var threadGet = new Thread(() -> {
try {
insideComputeLatch.await();
System.out.println("threadGet: before get()");
var v = map.get("key");
System.out.println("threadGet: after get() (v='" + v + "')");
} catch (InterruptedException e) {
throw new Error(e);
}
});
var threadCompute = new Thread(() -> {
System.out.println("threadCompute: before compute()");
map.compute("key", (k, v) -> {
try {
System.out.println("threadCompute: inside compute(): start");
insideComputeLatch.countDown();
threadGet.join();
System.out.println("threadCompute: inside compute(): end");
return "v2";
} catch (InterruptedException e) {
throw new Error(e);
}
});
System.out.println("threadCompute: after compute()");
});
threadGet.start();
threadCompute.start();
threadGet.join();
threadCompute.join();
}
Output:
threadCompute: before compute()
threadCompute: inside compute(): start
threadGet: before get()
threadGet: after get() (v='v1')
threadCompute: inside compute(): end
threadCompute: after compute()
This fundamentally doesn't work. Think about it: When the code realizes that the information is stale, some time passes and then a .put call is done. Even if the .put call somehow blocks, the timeline is as follows:
Some event occurs in the cosmos that makes your config stale.
Some time passes. [A]
Your run some code that realizes that this is the case.
Some time passes. [B]
Your code begins the .put call.
An extremely tiny amount of time passes. [C]
Your code finishes the .put call.
What you're asking for is a strategy that eliminates [C] while doing absolutely nothing whatsoever to prevent reads of stale data at point [A] and [B], both of which seem considerably more problematic.
Whatever, just give me the answer
ConcurrentHashMap is just wrong if you want this, it's a thing that is designed for multiple concurrent (hence the name) accesses. What you want is a plain old HashMap, where every access to it goes through a lock. Or, you can turn the logic around: The only way to do what you want is to engage a lock for everything (both reads and writes); at which point the 'Concurrent' part of ConcurrentHashMap has become completely pointless:
private final Object lock = new Object[0];
public void updateConfig() {
synchronized (lock) {
// do the stuff
}
}
public Config getConfig(String key) {
synchronized (lock) {
return configStore.get(key);
}
}
NB: Use private locks; public locks are like public fields. If there is an object that code outside of your control can get a ref to, and you lock on it, you need to describe the behaviour of your code in regards to that lock, and then sign up to maintain that behaviour forever, or indicate clearly when you change the behaviour that your API just went through a breaking change, and you should thus also bump the major version number.
For the same reason public fields are almost invariably a bad idea in light of the fact that you want API control, you want the refs you lock on to be not accessible to anything except code you have under your direct control. Hence why the above code does not use the synchronized keyword on the method itself (as this is usually a ref that leaks all over the place).
Okay, maybe I want the different answer
The answer is either 'it does not matter' or 'use locks'. If [C] truly is all you care about, that time is so short, and pales in comparison to the times for [A] and [B], that if A/B are acceptable, certainly so is C. In that case: Just accept the situation.
Alternatively, you can use locks but lock even before the data ever becomes stale. This timeline guarantees that no stale data reads can ever occur:
The cosmos cannot ever make your data stale.
Your code, itself, is the only causal agent for stale date.
Whenever code runs that will or may end up making data stale:
Acquire a lock before you even start.
Do the thing that (may) make some config stale.
Keep holding on to the lock; fix the config.
Release the lock.
How can I go about achieving this in Java without synchronizing the whole map?
There are some good answers here but there is a simpler answer to use the ConcurrentMap.replace(key, oldValue, newValue) method which is atomic.
while (true) {
Config newConfig = generateNewConfig();
Config oldConfig = configStore.get(configName);
if (!newConfig.replaces(oldConfig)) {
// nothing to do
break;
}
// this is atomic and will only replace the config if the old hasn't changed
if (configStore.replace(configName, oldConfig, newConfig)) {
// if we replaced it then we are done
break;
}
// otherwise, loop around and create a new config
}
I have a scenario where I have to maintain a Map which can be populated by multiple threads, each modifying their respective List (unique identifier/key being the thread name), and when the list size for a thread exceeds a fixed batch size, we have to persist the records to the database.
Aggregator class
private volatile ConcurrentHashMap<String, List<T>> instrumentMap = new ConcurrentHashMap<String, List<T>>();
private ReentrantLock lock ;
public void addAll(List<T> entityList, String threadName) {
try {
lock.lock();
List<T> instrumentList = instrumentMap.get(threadName);
if(instrumentList == null) {
instrumentList = new ArrayList<T>(batchSize);
instrumentMap.put(threadName, instrumentList);
}
if(instrumentList.size() >= batchSize -1){
instrumentList.addAll(entityList);
recordSaver.persist(instrumentList);
instrumentList.clear();
} else {
instrumentList.addAll(entityList);
}
} finally {
lock.unlock();
}
}
There is one more separate thread running after every 2 minutes (using the same lock) to persist all the records in Map (to make sure we have something persisted after every 2 minutes and the map size does not gets too big)
if(//Some condition) {
Thread.sleep(//2 minutes);
aggregator.getLock().lock();
List<T> instrumentList = instrumentMap.values().stream().flatMap(x->x.stream()).collect(Collectors.toList());
if(instrumentList.size() > 0) {
saver.persist(instrumentList);
instrumentMap .values().parallelStream().forEach(x -> x.clear());
aggregator.getLock().unlock();
}
}
This solution is working fine in almost for every scenario that we tested, except sometimes we see some of the records went missing, i.e. they are not persisted at all, although they were added fine to the Map.
My questions are:
What is the problem with this code?
Is ConcurrentHashMap not the best solution here?
Does the List that is used with the ConcurrentHashMap have an issue?
Should I use the compute method of ConcurrentHashMap here (no need I think, as ReentrantLock is already doing the same job)?
The answer provided by #Slaw in the comments did the trick. We were letting the instrumentList instance escape in non-synchronized way i.e. access/operations are happening over list without any synchonization. Fixing the same by passing the copy to further methods did the trick.
Following line of code is the one where this issue was happening
recordSaver.persist(instrumentList);
instrumentList.clear();
Here we are allowing the instrumentList instance to escape in non-synchronized way i.e. it is passed to another class (recordSaver.persist) where it was to be actioned on but we are also clearing the list in very next line(in Aggregator class) and all of this is happening in non-synchronized way. List state can't be predicted in record saver... a really stupid mistake.
We fixed the issue by passing a cloned copy of instrumentList to recordSaver.persist(...) method. In this way instrumentList.clear() has no affect on list available in recordSaver for further operations.
I see, that you are using ConcurrentHashMap's parallelStream within a lock. I am not knowledgeable about Java 8+ stream support, but quick searching shows, that
ConcurrentHashMap is a complex data structure, that used to have concurrency bugs in past
Parallel streams must abide to complex and poorly documented usage restrictions
You are modifying your data within a parallel stream
Based on that information (and my gut-driven concurrency bugs detector™), I wager a guess, that removing the call to parallelStream might improve robustness of your code. In addition, as mentioned by #Slaw, you should use ordinary HashMap in place of ConcurrentHashMap if all instrumentMap usage is already guarded by lock.
Of course, since you don't post the code of recordSaver, it is possible, that it too has bugs (and not necessarily concurrency-related ones). In particular, you should make sure, that the code that reads records from persistent storage — the one, that you are using to detect loss of records — is safe, correct, and properly synchronized with rest of your system (preferably by using a robust, industry-standard SQL database).
It looks like this was an attempt at optimization where it was not needed. In that case, less is more and simpler is better. In the code below, only two concepts for concurrency are used: synchronized to ensure a shared list is properly updated and final to ensure all threads see the same value.
import java.util.ArrayList;
import java.util.List;
public class Aggregator<T> implements Runnable {
private final List<T> instruments = new ArrayList<>();
private final RecordSaver recordSaver;
private final int batchSize;
public Aggregator(RecordSaver recordSaver, int batchSize) {
super();
this.recordSaver = recordSaver;
this.batchSize = batchSize;
}
public synchronized void addAll(List<T> moreInstruments) {
instruments.addAll(moreInstruments);
if (instruments.size() >= batchSize) {
storeInstruments();
}
}
public synchronized void storeInstruments() {
if (instruments.size() > 0) {
// in case recordSaver works async
// recordSaver.persist(new ArrayList<T>(instruments));
// else just:
recordSaver.persist(instruments);
instruments.clear();
}
}
#Override
public void run() {
while (true) {
try { Thread.sleep(1L); } catch (Exception ignored) {
break;
}
storeInstruments();
}
}
class RecordSaver {
void persist(List<?> l) {}
}
}
I've this kind of code:
public class RecursiveQueue {
//#Inject
private QueueService queueService;
public static void main(String[] args) {
RecursiveQueue test = new RecursiveQueue();
test.enqueue(new Node("X"), true);
test.enqueue(new Node("Y"), false);
test.enqueue(new Node("Z"), false);
}
private void enqueue(final Node node, final boolean waitTillFinished) {
final AtomicLong totalDuration = new AtomicLong(0L);
final AtomicInteger counter = new AtomicInteger(0);
AfterCallback callback= new AfterCallback() {
#Override
public void onFinish(Result result) {
for(Node aNode : result.getChildren()) {
counter.incrementAndGet();
queueService.requestProcess(aNode, this);
}
totalDuration.addAndGet(result.getDuration());
if(counter.decrementAndGet() <= 0) { //last one
System.out.println("Processing of " + node.toString() + " has finished in " + totalDuration.get() + " ms");
if(waitTillFinished) {
counter.notify();
}
}
}
};
counter.incrementAndGet();
queueService.requestProcess(node, callback);
if(waitTillFinished) {
try {
counter.wait();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
Imagine there is a queueService which uses blocking queue and few consumer threads to process nodes = calls DAO to fetch children of nodes (it's a tree).
So requestProcess method just enqueues the node and does not block.
Is there some better/safe way to avoid using wait/notify in this sample ?
According to some findings I can use Phaser (but I work on java 6) or conditions (but I'm not using locks).
There is no synchronized anything in your example. You mustn't call o.wait() or o.notify() except from within a synchronized(o) {...} block.
Your call to wait() is not in a loop. This may not ever happen in your JVM, but the language spec permits wait() to return prematurely (that's known as a spurious wakeup) More generally, it is good practice to always use a loop because it's a familiar design pattern. A while statement costs no more than an if, and you should have it because of the possibility of spurious wakeup, and you'd absolutely must have it in a multi-consumer situation, and so you might as well just always write it that way.
Since you must use synchronized blocks in order to use wait() and notify(), there probably is no reason to use Atomic anything.
This "recursive" thing seems awfully complicated, what with the callback adding more items to the queue. How deep can that go?
I think you are looking for CountDownLatch.
You actually use locks or, let's put it this way, you should be using them if you try to use wait/notify as James pointed out. As you are bound to Java 1.6 and ForkJoin or Phaser are not available to you the choice is either implementing wait/notify properly or using Condition with explicit lock. This would be a matter of your personal preferences.
Another alternative is to try and restructure your algorithm so you first get to know the entire set of steps you would need to execute. It is not always possible though.
I have a web application and I am using Oracle database and I have a method basically like this:
public static void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
if (!methodThatChecksThatObjectAlreadyExists) {
storemyObject() //pseudo code
}
// Have to do a lot other saving stuff, because it either saves everything or nothing
commit() // pseudo code to actually commit all my changes to the database.
}
Right now there is no synchronization of any kind so n threads can of course access this method freely, the problem arises when 2 threads enter this method both check and of course there is nothing just yet, and then they can both commit the transaction, creating a duplicate object.
I do not want to solve this with a unique key identifier in my Database, because I don't think I should be catching that SQLException.
I also cannot check right before the commit, because there are several checks not only 1, which would take a considerable amount of time.
My experience with locks and threads is limited, but my idea is basically to lock this code on the object that it is receiving. I don't know if for example say I receive an Integer Object, and I lock on my Integer with value 1, would that only prevent threads with another Integer with value 1 from entering, and all the other threads with value != 1 can enter freely?, is this how it works?.
Also if this is how it works, how is the lock object compared? how is it determined that they are in fact the same object?. A good article on this would also be appreciated.
How would you solve this?.
Your idea is a good one. This is the simplistic/naive version, but it's unlikely to work:
public static void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
synchronized (theObjectIwantToSave) {
if (!methodThatChecksThatObjectAlreadyExists) {
storemyObject() //pseudo code
}
// Have to do a lot other saving stuff, because it either saves everything or nothing
commit() // pseudo code to actually commit all my changes to the database.
}
}
This code uses the object itself as the lock. But it has to be the same object (ie objectInThreadA == objectInThreadB) if it's to work. If two threads are operating on an object that is a copy of each other - ie has the same "id" for example, then you'll need to either synchronize the whole method:
public static synchronized void saveSomethingImportantToDataBase(Object theObjectIwantToSave) ...
which will of course greatly reduce concurrency (throughput will drop to one thread at a time using the method - to be avoided).
Or find a way to get the same lock object based on the save object, like this approach:
private static final ConcurrentHashMap<Object, Object> LOCKS = new ConcurrentHashMap<Object, Object>();
public static void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
synchronized (LOCKS.putIfAbsent(theObjectIwantToSave.getId(), new Object())) {
....
}
LOCKS.remove(theObjectIwantToSave.getId()); // Clean up lock object to stop memory leak
}
This last version it the recommended one: It will ensure that two save objects that share the same "id" are locked with the same lock object - the method ConcurrentHashMap.putIfAbsent() is threadsafe, so "this will work", and it requires only that objectInThreadA.getId().equals(objectInThreadB.getId()) to work properly. Also, the datatype of getId() can be anything, including primitives (eg int) due to java's autoboxing.
If you override equals() and hashcode() for your object, then you could use the object itself instead of object.getId(), and that would be an improvement (Thanks #TheCapn for pointing this out)
This solution will only work with in one JVM. If your servers are clustered, that a whole different ball game and java's locking mechanism will not help you. You'll have to use a clustered locking solution, which is beyond the scope of this answer.
Here is an option adapted from And360's comment on Bohemian's answer, that tries to avoid race conditions, etc. Though I prefer my other answer to this question over this one, slightly:
import java.util.HashMap;
import java.util.concurrent.atomic.AtomicInteger;
// it is no advantage of using ConcurrentHashMap, since we synchronize access to it
// (we need to in order to "get" the lock and increment/decrement it safely)
// AtomicInteger is just a mutable int value holder
// we don't actually need it to be atomic
static final HashMap<Object, AtomicInteger> locks = new HashMap<Integer, AtomicInteger>();
public static void saveSomethingImportantToDataBase(Object objectToSave) {
AtomicInteger lock;
synchronized (locks) {
lock = locks.get(objectToSave.getId());
if (lock == null) {
lock = new AtomicInteger(1);
locks.put(objectToSave.getId(), lock);
}
else
lock.incrementAndGet();
}
try {
synchronized (lock) {
// do synchronized work here (synchronized by objectToSave's id)
}
} finally {
synchronized (locks) {
lock.decrementAndGet();
if (lock.get() == 0)
locks.remove(id);
}
}
}
You could split these out into helper methods "get lock object" and "release lock" or what not, as well, to cleanup the code. This way feels a little more kludgey than my other answer.
Bohemian's answer seems to have race condition problems if one thread is in the synchronized section while another thread removes the synchro-object from the Map, etc. So here is an alternative that leverages WeakRef's.
// there is no synchronized weak hash map, apparently
// and Collections.synchronizedMap has no putIfAbsent method, so we use synchronized(locks) down below
WeakHashMap<Integer, Integer> locks = new WeakHashMap<>();
public void saveSomethingImportantToDataBase(DatabaseObject objectToSave) {
Integer lock;
synchronized (locks) {
lock = locks.get(objectToSave.getId());
if (lock == null) {
lock = new Integer(objectToSave.getId());
locks.put(lock, lock);
}
}
synchronized (lock) {
// synchronized work here (synchronized by objectToSave's id)
}
// no releasing needed, weakref does that for us, we're done!
}
And a more concrete example of how to use the above style system:
static WeakHashMap<Integer, Integer> locks = new WeakHashMap<>();
static Object getSyncObjectForId(int id) {
synchronized (locks) {
Integer lock = locks.get(id);
if (lock == null) {
lock = new Integer(id);
locks.put(lock, lock);
}
return lock;
}
}
Then use it elsewhere like this:
...
synchronized (getSyncObjectForId(id)) {
// synchronized work here
}
...
The reason this works is basically that if two objects with matching keys enter the critical block, the second will retrieve the lock the first is already using (or the one that is left behind and hasn't been GC'ed yet). However if it is unused, both will have left the method behind and removed their references to the lock object, so it is safely collected.
If you have a limited "known size" of synchronization points you want to use (one that doesn't have to decrease in size eventually), you could probably avoid using a HashMap and use a ConcurrentHashMap instead, with its putIfAbsent method which might be easier to understand.
My opinion is you are not struggling with a real threading problem.
You would be better off letting the DBMS automatically assign a non conflicting row id.
If you need to work with existing row ids store them as thread local variables.
If there is no need for shared data do not share data between threads.
http://download.oracle.com/javase/6/docs/api/java/lang/ThreadLocal.html
An Oracle dbms is much better in keeping the data consistent when an application server or a web container.
"Many database systems automatically generate a unique key field when a row is inserted. Oracle Database provides the same functionality with the help of sequences and triggers. JDBC 3.0 introduces the retrieval of auto-generated keys feature that enables you to retrieve such generated values. In JDBC 3.0, the following interfaces are enhanced to support the retrieval of auto-generated keys feature ...."
http://download.oracle.com/docs/cd/B19306_01/java.102/b14355/jdbcvers.htm#CHDEGDHJ
If you can live with occasional over-synchronization (ie. work done sequentially when not needed) try this:
Create a table with lock objects. The bigger table, the fewer chances for over-synchronizaton.
Apply some hashing function to your id to compute table index. If your id is numeric, you can just use a remainder (modulo) function, if it is a String, use hashCode() and a remainder.
Get a lock from the table and synchronize on it.
An IdLock class:
public class IdLock {
private Object[] locks = new Object[10000];
public IdLock() {
for (int i = 0; i < locks.length; i++) {
locks[i] = new Object();
}
}
public Object getLock(int id) {
int index = id % locks.length;
return locks[index];
}
}
and its use:
private idLock = new IdLock();
public void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
synchronized (idLock.getLock(theObjectIwantToSave.getId())) {
// synchronized work here
}
}
public static void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
synchronized (theObjectIwantToSave) {
if (!methodThatChecksThatObjectAlreadyExists) {
storemyObject() //pseudo code
}
// Have to do a lot other saving stuff, because it either saves everything or nothing
commit() // pseudo code to actually commit all my changes to the database.
}
}
The synchronized keyword locks the object you want so that no other method could access it.
I don't think you have any choice but to take one of the solutions that you do not seem to want to do.
In your case, I don't think any type of synchronization on the objectYouWantToSave is going to work since they are based on web requests. Therefore each request (on its own thread) is most likely going to have it's own instance of the object. Even though they might be considered logically equal, that doesn't matter for synchronization.
synchronized keyword (or another sync operation) is must but is not enough for your problem. You should use a data structure to store which integer values are used. In our example HashSet is used. Do not forget clean too old record from hashset.
private static HashSet <Integer>isUsed= new HashSet <Integer>();
public synchronized static void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
if(isUsed.contains(theObjectIwantToSave.your_integer_value) != null) {
if (!methodThatChecksThatObjectAlreadyExists) {
storemyObject() //pseudo code
}
// Have to do a lot other saving stuff, because it either saves everything or nothing
commit() // pseudo code to actually commit all my changes to the database.
isUsed.add(theObjectIwantToSave.your_integer_value);
}
}
To answer your question about locking the Integer, the short answer is NO - it won't prevent threads with another Integer instance with the same value from entering. The long answer: depends on how you obtain the Integer - by constructor, by reusing some instances or by valueOf (that uses some caching). Anyway, I wouldn't rely on it.
A working solution that will work is to make the method synchronized:
public static synchronized void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
if (!methodThatChecksThatObjectAlreadyExists) {
storemyObject() //pseudo code
}
// Have to do a lot other saving stuff, because it either saves everything or nothing
commit() // pseudo code to actually commit all my changes to the database.
}
This is probably not the best solution performance-wise, but it is guaranteed to work (note, if you are not in a clustered environment) until you find a better solution.
private static final Set<Object> lockedObjects = new HashSet<>();
private void lockObject(Object dbObject) throws InterruptedException {
synchronized (lockedObjects) {
while (!lockedObjects.add(dbObject)) {
lockedObjects.wait();
}
}
}
private void unlockObject(Object dbObject) {
synchronized (lockedObjects) {
lockedObjects.remove(dbObject);
lockedObjects.notifyAll();
}
}
public void saveSomethingImportantToDatabase(Object theObjectIwantToSave) throws InterruptedException {
try {
lockObject(theObjectIwantToSave);
if (!methodThatChecksThatObjectAlreadyExists(theObjectIwantToSave)) {
storeMyObject(theObjectIwantToSave);
}
commit();
} finally {
unlockObject(theObjectIwantToSave);
}
}
You must correctly override methods 'equals' and 'hashCode' for your objects' classes. If you have unique id (String or Number) inside your object then you can just check this id instead of the whole object and no need to override 'equals' and 'hashCode'.
try-finally - is very important - you must guarantee to unlock waiting threads after your operation even if your operation threw exception.
This approach will not work if your back-end is distributed across multiple servers.