ConcurrentHashMap<String, Config> configStore = new ConcurrentHashMap<>();
...
void updateStore() {
Config newConfig = generateNewConfig();
Config oldConfig = configStore.get(configName);
if (newConfig.replaces(oldConfig)) {
configStore.put(configName, newConfig);
}
}
The ConcurrentHashMap can be read by multiple threads but can be updated only by a single thread. I'd like to block the get() operations when a put() operation is in progress. The rationale here being that if a put() operation is in progress, that implies the current entry in the map is stale and all get() operations should block until the put() is complete. How can I go about achieving this in Java without synchronizing the whole map?
It looks like you can delegate this to compute(), which handles it for you:
Config newConfig = generateNewConfig();
configStore.compute(
    configName,
    (key, oldConfig) -> {
        // keep the new config only if it supersedes the current one
        if (newConfig.replaces(oldConfig)) {
            return newConfig;
        }
        return oldConfig;
    }
);
You get two guarantees from using this method:
Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple
and
The entire method invocation is performed atomically
according to its documentation.
The accepted answer proposed to use compute(...) instead of put().
But if you want
to block the get() operations when a put() operation is in progress
then you should also use compute(...) instead of get().
That's because for ConcurrentHashMap get() doesn't block while compute() is in progress.
Here is a unit test to prove it:
@Test
public void myTest() throws Exception {
var map = new ConcurrentHashMap<>(Map.of("key", "v1"));
var insideComputeLatch = new CountDownLatch(1);
var threadGet = new Thread(() -> {
try {
insideComputeLatch.await();
System.out.println("threadGet: before get()");
var v = map.get("key");
System.out.println("threadGet: after get() (v='" + v + "')");
} catch (InterruptedException e) {
throw new Error(e);
}
});
var threadCompute = new Thread(() -> {
System.out.println("threadCompute: before compute()");
map.compute("key", (k, v) -> {
try {
System.out.println("threadCompute: inside compute(): start");
insideComputeLatch.countDown();
threadGet.join();
System.out.println("threadCompute: inside compute(): end");
return "v2";
} catch (InterruptedException e) {
throw new Error(e);
}
});
System.out.println("threadCompute: after compute()");
});
threadGet.start();
threadCompute.start();
threadGet.join();
threadCompute.join();
}
Output:
threadCompute: before compute()
threadCompute: inside compute(): start
threadGet: before get()
threadGet: after get() (v='v1')
threadCompute: inside compute(): end
threadCompute: after compute()
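To illustrate the point about reading through compute(...), here is a minimal sketch. The class name BlockingReadDemo and the helper blockingGet are hypothetical names chosen for this example, and the 200 ms sleep only simulates a slow update. Because compute() is atomic per key, the read cannot overlap with an update of the same key that is already in progress.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

final class BlockingReadDemo {
    // Hypothetical helper: reads through compute() so the read waits for any
    // compute() that is currently running on the same key.
    static <K, V> V blockingGet(ConcurrentHashMap<K, V> map, K key) {
        return map.compute(key, (k, v) -> v); // returns the current value unchanged
    }

    // Starts a slow writer, then reads while the writer is still inside compute().
    static String demo() {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>(Map.of("key", "v1"));
        CountDownLatch insideCompute = new CountDownLatch(1);
        Thread writer = new Thread(() -> map.compute("key", (k, v) -> {
            insideCompute.countDown();
            try {
                Thread.sleep(200); // simulate a slow update
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "v2";
        }));
        writer.start();
        try {
            insideCompute.await();
            String seen = blockingGet(map, "key"); // waits until the writer's compute() returns
            writer.join();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "v2"
    }
}
```

The trade-off is that every such read now takes the same per-bin lock as writers, so reads of that key no longer run concurrently with each other.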
This fundamentally doesn't work. Think about it: When the code realizes that the information is stale, some time passes and then a .put call is done. Even if the .put call somehow blocks, the timeline is as follows:
Some event occurs in the cosmos that makes your config stale.
Some time passes. [A]
You run some code that realizes that this is the case.
Some time passes. [B]
Your code begins the .put call.
An extremely tiny amount of time passes. [C]
Your code finishes the .put call.
What you're asking for is a strategy that eliminates [C] while doing absolutely nothing whatsoever to prevent reads of stale data at points [A] and [B], both of which seem considerably more problematic.
Whatever, just give me the answer
ConcurrentHashMap is just wrong if you want this, it's a thing that is designed for multiple concurrent (hence the name) accesses. What you want is a plain old HashMap, where every access to it goes through a lock. Or, you can turn the logic around: The only way to do what you want is to engage a lock for everything (both reads and writes); at which point the 'Concurrent' part of ConcurrentHashMap has become completely pointless:
private final Object lock = new Object[0];
public void updateConfig() {
synchronized (lock) {
// do the stuff
}
}
public Config getConfig(String key) {
synchronized (lock) {
return configStore.get(key);
}
}
NB: Use private locks; public locks are like public fields. If there is an object that code outside of your control can get a ref to, and you lock on it, you need to describe the behaviour of your code in regards to that lock, and then sign up to maintain that behaviour forever, or indicate clearly when you change the behaviour that your API just went through a breaking change, and you should thus also bump the major version number.
For the same reason public fields are almost invariably a bad idea in light of the fact that you want API control, you want the refs you lock on to be not accessible to anything except code you have under your direct control. Hence why the above code does not use the synchronized keyword on the method itself (as this is usually a ref that leaks all over the place).
Okay, maybe I want a different answer
The answer is either 'it does not matter' or 'use locks'. If [C] truly is all you care about, that time is so short, and pales in comparison to the times for [A] and [B], that if A/B are acceptable, certainly so is C. In that case: Just accept the situation.
Alternatively, you can use locks but lock even before the data ever becomes stale. This timeline guarantees that no stale data reads can ever occur:
The cosmos cannot ever make your data stale.
Your code, itself, is the only causal agent for stale data.
Whenever code runs that will or may end up making data stale:
Acquire a lock before you even start.
Do the thing that (may) make some config stale.
Keep holding on to the lock; fix the config.
Release the lock.
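The timeline above could be sketched roughly as follows. This is only an illustration under assumptions: GuardedConfigStore, changeAndRefresh, and the two callback parameters are hypothetical names, String stands in for the Config type, and a ReentrantReadWriteLock is swapped in for the single synchronized lock so that readers only block each other when an update is running.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.UnaryOperator;

final class GuardedConfigStore {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> store = new HashMap<>(); // plain map, always accessed under the lock

    // Readers share the read lock, so they wait only while an update holds the write lock.
    String get(String key) {
        lock.readLock().lock();
        try {
            return store.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is acquired BEFORE the action that may make the config stale
    // and is released only after the config has been fixed.
    void changeAndRefresh(String key, Runnable staleMakingAction, UnaryOperator<String> regenerate) {
        lock.writeLock().lock();
        try {
            staleMakingAction.run();
            store.put(key, regenerate.apply(store.get(key)));
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

This only helps if every action that can invalidate the config goes through changeAndRefresh; anything outside that path reintroduces windows [A] and [B].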
How can I go about achieving this in Java without synchronizing the whole map?
There are some good answers here, but a simpler option is the ConcurrentMap.replace(key, oldValue, newValue) method, which is atomic.
while (true) {
Config newConfig = generateNewConfig();
Config oldConfig = configStore.get(configName);
if (!newConfig.replaces(oldConfig)) {
// nothing to do
break;
}
// this is atomic and will only replace the config if the old hasn't changed
if (configStore.replace(configName, oldConfig, newConfig)) {
// if we replaced it then we are done
break;
}
// otherwise, loop around and create a new config
}
It came as a surprise to me while trying to implement some compound actions with a BlockingQueue-based producer/consumer pattern, which makes me think I have most likely missed something obvious.
1. In short
I need
my consumer to make sequence actions in form of ‘take obj from the queue + do more consumer operations on the obj’ atomic and
My producer to make sequence actions in form of ‘offer obj onto the queue + do more producer operations on the obj’ atomic and
The two above atomic sequences synchronized on the same obj, obviously
Without such atomicity, problem may occur, see 'PROBLEM!!' as an example in the comment in code for the producer in the following section 2.
But I can’t simply put a synchronized block around the call to take() and its associated consumer operations as when the queue is empty, this consumer will be stuck there FOREVER since it will still possess the sync lock while it waits on the producer to fill the queue with an obj, and that sync lock possession of consumer will in turn stop the producer from entering corresponding critical region to do any 'producing'.
2. Specifically, simplified example code is as follows:
Common code known to the producer and consumer classes:
Queue<QObj> nbq = new ConcurrentLinkedQueue<>();
BlockingQueue<QObj> bq = new LinkedBlockingQueue<>();
List<String> idList = new LinkedList<>();
Object lockObj = idList;
int Idx = 1;
public static class QObj {
public String id;
public String content;
public QObj(String id, String content) {
this.id = id;
this.content = content;
}
}
Main logic in producer class:
public void produceBlocking() {
QObj o = new QObj(String.valueOf(Idx), "Content_" + Idx++);
// synchronized(lockObj) {
// no point to include Queue.offer(...) call in a synchronized block as we
// won't be able to use synchronized() in corresponding consumer anyway
// for the reason described above
bq.offer(o);
synchronized (lockObj) {
// PROBLEM!! by now, 'o' could have been 'consumed' already
// hence we shouldn't do the following operations:
// do the associated part of compound action of 'producer'
idList.add(o.id);
// do some more operation as part of this compound action ...
}
// }
}
Main logic in consumer class:
public void consumeBlocking() {
while (true) {
try {
// synchronized (lockObj) {
// can't simply put synchronized() here to make the following compound action atomic
// - when the queue is empty, this consumer will be stuck here forever since it still possesses
// the lockObj, which stops the producer from entering the critical region to do any 'producing'
QObj o = bq.take();
synchronized (lockObj) {
// do the associated part of compound action of 'consumer'
idList.remove(o.id);
// do some more operation as part of this compound action ...
}
// }
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
3. Why has this not been a common problem?
I feel that this must have been a common occurring problem when people are using BlockingQueue, and the fact that I couldn’t really locate anything addressing directly to a similar problem affirms my belief that I might have got something fundamentally wrong.
Can someone give some hint about a direct solution or point out where I thought wrong about this problem?
4. Alternative Ideas
I did think of a few ideas as alternatives, but I feel none of them is addressing this issue directly and all have some drawbacks (as highlighted 'DRAWBACK!!' in the comments in the code)
4.1 -
Do a check using Queue.contains() before continuing
public void produceBlockingWithCheck() {
QObj o = new QObj(String.valueOf(Idx), "Content_" + Idx++);
bq.offer(o);
synchronized (lockObj) {
// First, Check if the obj could have already been consumed
// DRAWBACK!!: this could be very costly, e.g.
// when 'bq' is a LinkedBlockingQueue, and contains(...) always triggers
// a sequential traversal, the Queue itself can be very large
if (bq.contains(o)) {
// do the associated part of compound action of 'producer'
idList.add(o.id);
// do some more operation as part of this compound action ...
}
}
}
4.2 -
Adjust the order of ops on the producer, move the Queue.offer() call to the end
public void produceBlockingOrderAdjusted() {
QObj o = new QObj(String.valueOf(Idx), "Content_" + Idx++);
// do the associated part of compound action of 'producer', only before
// calling BlockingQueue.offer(...)
// DRAWBACK!!: even if this works for this simple case, such an order adjustment
// won't be logically possible for all cases, will it?
synchronized (lockObj) {
idList.add(o.id);
// do some more operation as part of this compound action ...
}
bq.offer(o);
}
4.3 -
Use non-blocking queues instead.
public void produceNonBlocking() {
QObj o = new QObj(String.valueOf(Idx), "Content_" + Idx++);
synchronized(lockObj) {
nbq.offer(o);
// do the associated part of compound action of 'producer'
idList.add(o.id);
// do some more operation as part of this compound action ...
}
}
public void consumeNonBlocking() {
while (true) {
synchronized (lockObj) {
// kind of doing our own blocking.
QObj o = nbq.poll();
if (o != null) {
// do the associated part of compound action of 'consumer'
idList.remove(o.id);
// do some more operation as part of this compound action ...
}
// DRAWBACK!!: if the 'producers' don't produce faster than the 'consumers' consuming,
// this 'miss' could be happening too often and get costly
}
}
}
Why has this not been a common problem?
Multi-threading is like the old board game, "Othello," which was marketed with the tag line, "A minute to learn, a lifetime to master." Modern threading libraries make it easy to get started writing multi-threaded code, but it's not easy to design algorithms that use multi-threading effectively. Sometimes, the same design principles that underlie efficient, single-threaded algorithms can be completely inappropriate to use in multi-threaded code.
An experienced designer knows that when thread A puts some object in a queue to be "consumed" by thread B, it's best to let thread A be done with that object for good. Simply taking the object out of the queue should be enough for thread B to have exclusive use of it. If you can't do that without adding complexity to your design,... Well, that's the price you pay for using multiple threads.
A multi-threaded, parallel computation that's only half as efficient as a single-threaded implementation could still run four times as fast if it's running on an eight-core machine.
I need
my consumer to make sequence actions in form of ‘take obj from the queue + do more consumer operations on the obj’ atomic and
My producer to make sequence actions in form of ‘offer obj onto the queue + do more producer operations on the obj’ atomic and
The two above atomic sequences synchronized on the same obj, obviously
You can use wait+notifyAll for that.
Try reading this article:
it explains wait+notifyAll in detail.
But I can’t simply put a synchronized block around the call to take() and its associated consumer operations as when the queue is empty, this consumer will be stuck there FOREVER since it will still possess the sync lock while it waits on the producer to fill the queue with an obj, and that sync lock possession of consumer will in turn stop the producer from entering corresponding critical region to do any 'producing'.
wait+notifyAll solves this problem because a thread that is waiting inside wait() releases the lock (and later, when wait() needs to return, the thread acquires the lock again).
You can also look at the Condition javadocs.
Condition is the same concept as wait+notify but for the Lock interface (which is a more flexible and powerful version of synchronized).
Again, look at the BoundedBuffer example in the javadocs - it seems like it could be modified to do what you want in your code.
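A minimal sketch of the wait+notifyAll idea, assuming a single lock guards both the queue and the id list. The class CompoundQueue and its method names are hypothetical, and a plain ArrayDeque replaces the BlockingQueue because the blocking is done by wait() itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class CompoundQueue {
    private final Object lock = new Object();
    private final Deque<String> queue = new ArrayDeque<>();
    private final List<String> idList = new ArrayList<>();

    // Producer side: enqueue and bookkeeping happen as one atomic step.
    void produce(String id) {
        synchronized (lock) {
            queue.addLast(id);
            idList.add(id);
            lock.notifyAll(); // wake any consumer waiting for an element
        }
    }

    // Consumer side: wait() releases the lock while the queue is empty,
    // so producers can still enter their synchronized block.
    String consume() throws InterruptedException {
        synchronized (lock) {
            while (queue.isEmpty()) {
                lock.wait();
            }
            String id = queue.removeFirst();
            idList.remove(id); // remove(Object): drops the matching id
            return id;
        }
    }

    // Number of ids currently tracked, read under the same lock.
    int trackedIds() {
        synchronized (lock) {
            return idList.size();
        }
    }
}
```

The while loop around wait() matters: a woken consumer must re-check the condition, since another consumer may have taken the element first.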
I was about to write something about this, but maybe it is better to have a second opinion before appearing like a fool...
So the idea in the next piece of code (Android's Room package v2.4.1, RoomTrackingLiveData) is that the winning thread is kept alive and forced to check for contention that may have entered the process (coming from losing threads) while it was computing.
Meanwhile, the failed CAS operations performed by these losing threads keep them from entering and executing the code, preventing repeated signals (mComputeFunction.call() or postValue()).
final Runnable mRefreshRunnable = new Runnable() {
@WorkerThread
@Override
public void run() {
if (mRegisteredObserver.compareAndSet(false, true)) {
mDatabase.getInvalidationTracker().addWeakObserver(mObserver);
}
boolean computed;
do {
computed = false;
if (mComputing.compareAndSet(false, true)) {
try {
T value = null;
while (mInvalid.compareAndSet(true, false)) {
computed = true;
try {
value = mComputeFunction.call();
} catch (Exception e) {
throw new RuntimeException("Exception while computing database"
+ " live data.", e);
}
}
if (computed) {
postValue(value);
}
} finally {
mComputing.set(false);
}
}
} while (computed && mInvalid.get());
}
};
final Runnable mInvalidationRunnable = new Runnable() {
@MainThread
@Override
public void run() {
boolean isActive = hasActiveObservers();
if (mInvalid.compareAndSet(false, true)) {
if (isActive) {
getQueryExecutor().execute(mRefreshRunnable);
}
}
}
};
The most obvious thing here is that atomics are being used for everything they are not good at:
Identifying losers and ignoring winners (what reactive patterns need).
AND a happens once behavior, performed by the loser thread.
So this is completely counter intuitive to what atomics are able to achieve, since they are extremely good at defining winners, AND anything that requires a "happens once" becomes impossible to ensure state consistency (the last one is suitable to start a philosophical debate about concurrency, and I will definitely agree with any conclusion).
If atomics are used as "contention checkers" and "contention blockers", then we can implement the same principle with a volatile read of an atomic after a successful CAS, checking that value against the snapshot/witness at every later step of the process.
private final AtomicInteger invalidationCount = new AtomicInteger();
private final IntFunction<Runnable> invalidationRunnableFun = invalidationVersion -> (Runnable) () -> {
if (invalidationVersion != invalidationCount.get()) return;
try {
T value = computeFunction.call();
if (invalidationVersion != invalidationCount.get()) return; //In case computation takes too long...
postValue(value);
} catch (Exception e) {
e.printStackTrace();
}
};
getQueryExecutor().execute(invalidationRunnableFun.apply(invalidationCount.incrementAndGet()));
In this case, each thread is left with the individual responsibility of checking their position in the contention lane, if their position moved and is not at the front anymore, it means that a new thread entered the process, and they should stop further processing.
This alternative is so laughably simple that my first question is:
Why didn't they do it like this?
Maybe my solution has a flaw... but the thing about the first alternative (the nested spin-lock) is that it follows the idea that an atomic CAS operation cannot be verified a second time, and that a verification can only be achieved with a cmpxchg process.... which is... false.
It also follows the common (but wrong) belief that what you define after a successful CAS is the sacred word of GOD... as I've seen that code seldom checks for concurrency issues once it enters the if body.
if (mInvalid.compareAndSet(false, true)) {
// Ummm... yes... mInvalid is still true...
// Let's use a second atomicReference just in case...
}
It also follows common code conventions that involve "double-<enter something>" in concurrency scenarios.
So only because the first code follows those ideas, is that I am inclined to believe that my solution is a valid and better alternative.
There is an argument in favor of the "nested spin-lock" option, but it does not hold up well:
The first alternative is "safer" precisely because it is SLOWER, so it has MORE time to identify contention at the end of the current of incoming threads.
BUT is not even 100% safe because of the "happens once" thing that is impossible to ensure.
There is also a behavior in the code where, when it reaches the end of a continuous flow of incoming threads, two signals are dispatched one after the other: the second-to-last one, and then the last one.
But IF it is safer because it is slower, wouldn't that defeat the goal of using atomics, since their usage is supposed to be with the aim of being a better performance alternative in the first place?
I need to replace the first value in a Deque with the new value, but only if the size would otherwise exceed the limit. I wrote this code to solve it:
final class Some {
final int buffer;
final Deque<Operation> operations = new ConcurrentLinkedDeque<>();
// constructors omitted;
@Override
public void register(final Operation operation) {
if (this.operations.size() == this.buffer) {
// remove the oldest operation
this.operations.removeFirst();
}
// add new operation to the tail
this.operations.addLast(operation);
}
@Override
public void apply() {
// take the fresh operation from tail and perform it
this.operations.removeLast().perform();
}
}
As you can see, I have two methods that modify the Deque. I doubt that this code will work correctly in a multithreaded environment. The question is: is it safe to check size() and then perform operations that modify the ConcurrentLinkedDeque afterward? I want to have as few locks as possible. If this code won't work, then I have to introduce locking, and then there is no point in using ConcurrentLinkedDeque.
final class Some {
final int buffer;
final Deque<Operation> operations = new LinkedList<>();
final Lock lock = new ReentrantLock();
// constructors omitted;
@Override
public void register(final Operation operation) {
this.lock.lock();
try {
if (this.operations.size() == this.buffer) {
// remove the oldest operation
this.operations.removeFirst();
}
// add new operation to the tail
this.operations.addLast(operation);
} finally {
lock.unlock();
}
}
@Override
public void apply() {
this.lock.lock();
try {
// take the fresh operation from tail and perform it
this.operations.removeLast().perform();
} finally {
this.lock.unlock();
}
}
}
This is the alternative with the Lock. Is that the only way to achieve what I want? I am especially interested in trying to use the concurrent collections.
Concurrent collections are thread-safe when it comes to internal state. In other words, they
Allow multiple threads to read/write concurrently without having to worry that the internal state will become corrupted
Allow iteration and removal while other threads are modifying the collection
Not all, however. I believe CopyOnWriteArrayList's Iterator does not support the remove() operation
Guarantees things such as happens-before
Meaning a write by one thread will happen-before a read by a subsequent thread
However, they are not thread-safe across external method calls. When you call one method it will acquire whatever locks are necessary but those locks are released by the time the method returns. If you're not careful this can lead to a check-then-act race condition. Looking at your code
if (this.operations.size() == this.buffer) {
this.operations.removeFirst();
}
this.operations.addLast(operation);
the following can happen:
Thread-A checks size condition, result is false
Thread-A moves to add new Operation
Before Thread-A can add the Operation, Thread-B checks size condition which results in false as well
Thread-B goes to add new Operation
Thread-A does add new Operation
Oh, no! The Operation added by Thread-A causes the size threshold to be reached
Thread-B, already past the if statement, adds its Operation making the deque have one too many Operations
This is why a check-then-act requires external synchronization, which you do in your second example using a Lock. Note you could also use a synchronized block on the Deque.
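As a rough sketch of the synchronized-block variant mentioned above, the check and the modification can share the deque's own monitor. BoundedHistory and its method names are hypothetical, and a plain ArrayDeque is assumed because every access is already guarded.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class BoundedHistory<T> {
    private final int capacity;
    private final Deque<T> items = new ArrayDeque<>(); // private, so no outside code can lock on it

    BoundedHistory(int capacity) {
        this.capacity = capacity;
    }

    // The size check and both modifications happen under one lock,
    // so no other thread can slip in between the check and the act.
    void register(T item) {
        synchronized (items) {
            if (items.size() == capacity) {
                items.removeFirst(); // evict the oldest element
            }
            items.addLast(item);
        }
    }

    // Returns the newest element, or null if the deque is empty.
    T pollNewest() {
        synchronized (items) {
            return items.pollLast();
        }
    }

    int size() {
        synchronized (items) {
            return items.size();
        }
    }
}
```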
Unrelated to your question: You call Operation.perform() in your second example while still holding the Lock. This means no other thread can attempt to add another Operation to the Deque while perform() executes. If this isn't desired you can change the code like so:
Operation op;
lock.lock();
try {
op = deque.pollLast(); // poll won't throw exception if there is no element
} finally {
lock.unlock();
}
if (op != null) {
op.perform();
}
From the doc of size()
Beware that, unlike in most collections, this method is NOT a constant-time operation. Because of the asynchronous nature of these deques, determining the current number of elements requires traversing them all to count them. Additionally, it is possible for the size to change during execution of this method, in which case the returned result will be inaccurate. Thus, this method is typically not very useful in concurrent applications.
While @Slaw is correct, I would add that an addition or removal can occur during the traversal.
I don't use size() in my software. I keep my own count of what is in the collection with an AtomicInteger. If count.get() < max, I can add. Being a little over max is ok for my usage. You can use a lock on count to force compliance.
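A possible shape for the separate-counter idea, as a sketch only. CountedDeque and tryAdd are hypothetical names. This version reserves a slot with a CAS loop so the count never goes over max, and it rejects new items when full rather than evicting the oldest one, which is a different policy from the question's.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;

final class CountedDeque<T> {
    private final int max;
    private final AtomicInteger count = new AtomicInteger();
    private final ConcurrentLinkedDeque<T> items = new ConcurrentLinkedDeque<>();

    CountedDeque(int max) {
        this.max = max;
    }

    // Reserve a slot first; only the thread whose CAS succeeds may add.
    boolean tryAdd(T item) {
        while (true) {
            int current = count.get();
            if (current >= max) {
                return false; // full: caller decides what to do
            }
            if (count.compareAndSet(current, current + 1)) {
                items.addLast(item);
                return true;
            }
            // CAS lost to another thread: re-read the count and retry
        }
    }

    // Returns the newest element or null; the slot is released only on success.
    T pollLast() {
        T item = items.pollLast();
        if (item != null) {
            count.decrementAndGet();
        }
        return item;
    }

    int count() {
        return count.get(); // constant time, unlike ConcurrentLinkedDeque.size()
    }
}
```

Note that the counter and the deque are updated in two steps, so a reader can briefly see a reserved slot whose element has not been added yet.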
I understand the overall concepts of multi-threading and synchronization but am new to writing thread-safe code. I currently have the following code snippet:
synchronized(compiledStylesheets) {
if(compiledStylesheets.containsKey(xslt)) {
exec = compiledStylesheets.get(xslt);
} else {
exec = compile(s, imports);
compiledStylesheets.put(xslt, exec);
}
}
where compiledStylesheets is a HashMap (private, final). I have a few questions.
The compile method can take a few hundred milliseconds to return. This seems like a long time to have the object locked, but I don't see an alternative. Also, it is unnecessary to use Collections.synchronizedMap in addition to the synchronized block, correct? This is the only code that hits this object other than initialization/instantiation.
Alternatively, I know of the existence of a ConcurrentHashMap but I don't know if that's overkill. The putIfAbsent() method will not be usable in this instance because it doesn't allow me to skip the compile() method call. I also don't know if it will solve the "modified after containsKey() but before put()" problem, or if that's even really a concern in this case.
For tasks of this nature, I highly recommend Guava caching support.
If you can't use that library, here is a compact implementation of a Multiton. Use of the FutureTask was a tip from assylias, here, via OldCurmudgeon.
public abstract class Cache<K, V>
{
private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();
public final V get(K key)
throws InterruptedException, ExecutionException
{
Future<V> ref = cache.get(key);
if (ref == null) {
FutureTask<V> task = new FutureTask<>(new Factory(key));
ref = cache.putIfAbsent(key, task);
if (ref == null) {
task.run();
ref = task;
}
}
return ref.get();
}
protected abstract V create(K key)
throws Exception;
private final class Factory
implements Callable<V>
{
private final K key;
Factory(K key)
{
this.key = key;
}
@Override
public V call()
throws Exception
{
return create(key);
}
}
}
I think you are looking for a Multiton.
There's a very good Java one here that @assylias posted some time ago.
You can loosen the lock at the risk of an occasional doubly compiled stylesheet in race condition.
Object y;
// lock here if needed
y = map.get(x);
if(y == null) {
y = compileNewY();
// lock here if needed
map.put(x, y); // this may happen twice, if put is t.s. one will be ignored
y = map.get(x); // essential because other thread's y may have been put
}
This requires get and put to be atomic, which is true in the case of ConcurrentHashMap and you can achieve by wrapping individual calls to get and put with a lock in your class. (As I tried to explain with "lock here if needed" comments - the point being you only need to wrap individual calls, not have one big lock).
This is a standard thread safe pattern to use even with ConcurrentHashMap (and putIfAbsent) to minimize the cost of compiling twice. It still needs to be acceptable to compile twice sometimes, but it should be okay even if expensive.
By the way, you can solve that problem. Usually the above pattern isn't used with a heavy function like compileNewY but a lightweight constructor new Y(). e.g. do this:
class PrecompiledY {
    public volatile Y y;
    private final AtomicBoolean compiled = new AtomicBoolean(false);
    public void compile() {
        // only the first caller performs the expensive compilation
        if (!compiled.getAndSet(true)) {
            y = compileNewY();
        }
    }
}
// ...
ConcurrentMap<X, PrecompiledY> map; // alternatively use proper locking
PrecompiledY py = map.get(x);
if (py == null) {
    py = new PrecompiledY(); // much cheaper than compiling
    map.putIfAbsent(x, py); // this may be attempted twice; only the first insert wins
    py = map.get(x); // essential because another thread's py may have been put
    py.compile(); // an object that didn't get inserted never gets compiled
}
Also:
Alternatively, I know of the existence of a ConcurrentHashMap but I don't know if that's overkill.
Given that your code is heavily locking, ConcurrentHashMap is almost certainly far faster, so not overkill. (And much more likely to be bug-free. Concurrency bugs are not fun to fix.)
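For comparison, on Java 8 and later ConcurrentHashMap.computeIfAbsent covers the "compile only on a miss" case directly, which putIfAbsent could not. The sketch below is an illustration, not the poster's code: CompiledCache is a hypothetical wrapper, the compiler function stands in for compile(s, imports), and the counter exists only to show how often compilation ran.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class CompiledCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> compiler;
    final AtomicInteger compilations = new AtomicInteger(); // demonstration only

    CompiledCache(Function<K, V> compiler) {
        this.compiler = compiler;
    }

    // The mapping function runs at most once per absent key; callers that hit
    // an existing entry never invoke the compiler.
    V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            compilations.incrementAndGet();
            return compiler.apply(k);
        });
    }
}
```

One caveat: while the mapping function runs, other updates that land in the same bin of the map may block, so a compile taking hundreds of milliseconds can still delay unrelated keys occasionally. The Future-based cache shown earlier avoids holding any map lock during compilation.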
Please see Erickson's comment below. Using double-checked locking with Hashmaps is not very smart
The compile method can take a few hundred milliseconds to return. This seems like a long time to have the object locked, but I don't see an alternative.
You can use double-checked locking, and note that you don't need any lock before get since you never remove anything from the map.
if(compiledStylesheets.containsKey(xslt)) {
exec = compiledStylesheets.get(xslt);
} else {
synchronized(compiledStylesheets) {
if(compiledStylesheets.containsKey(xslt)) {
// another thread might have created it while
// this thread was waiting for lock
exec = compiledStylesheets.get(xslt);
} else {
exec = compile(s, imports);
compiledStylesheets.put(xslt, exec);
}
}
}
Also, it is unnecessary to use Collections.synchronizedMap in addition to the synchronized block, correct?
Correct
This is the only code that hits this object other than initialization/instantiation.
First of all, the code as you posted it is race-condition-free because containsKey() result will never change while compile() method is running.
Collections.synchronizedMap() is useless for your case as stated above because it wraps all map methods into a synchronized block using either this as a mutex or another object you provided (for two-argument version).
IMO using ConcurrentHashMap is also not an option because it stripes locks based on the key's hashCode() result; its concurrent iterators are also useless here.
If you really want compile() out of the synchronized block, you may pre-calculate it before checking containsKey(). This may hurt overall performance, but may be better than calling it in the synchronized block. To make a decision, personally I would consider how often a key "miss" happens and, based on that, which option is preferable: keeping the lock for longer or always calculating your stuff.
I have a web application and I am using Oracle database and I have a method basically like this:
public static void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
if (!methodThatChecksThatObjectAlreadyExists) {
storemyObject() //pseudo code
}
// Have to do a lot other saving stuff, because it either saves everything or nothing
commit() // pseudo code to actually commit all my changes to the database.
}
Right now there is no synchronization of any kind so n threads can of course access this method freely, the problem arises when 2 threads enter this method both check and of course there is nothing just yet, and then they can both commit the transaction, creating a duplicate object.
I do not want to solve this with a unique key identifier in my Database, because I don't think I should be catching that SQLException.
I also cannot check right before the commit, because there are several checks not only 1, which would take a considerable amount of time.
My experience with locks and threads is limited, but my idea is basically to lock this code on the object that it is receiving. For example, if I receive an Integer object and I lock on my Integer with value 1, would that only prevent threads with another Integer with value 1 from entering, while all the other threads with value != 1 can enter freely? Is this how it works?
Also, if this is how it works, how is the lock object compared? How is it determined that they are in fact the same object? A good article on this would also be appreciated.
How would you solve this?
Your idea is a good one. This is the simplistic/naive version, but it's unlikely to work:
public static void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
synchronized (theObjectIwantToSave) {
if (!methodThatChecksThatObjectAlreadyExists) {
storemyObject() //pseudo code
}
// Have to do a lot other saving stuff, because it either saves everything or nothing
commit() // pseudo code to actually commit all my changes to the database.
}
}
This code uses the object itself as the lock. But it has to be the same object (ie objectInThreadA == objectInThreadB) if it's to work. If two threads are operating on an object that is a copy of each other - ie has the same "id" for example, then you'll need to either synchronize the whole method:
public static synchronized void saveSomethingImportantToDataBase(Object theObjectIwantToSave) ...
which will of course greatly reduce concurrency (throughput will drop to one thread at a time using the method - to be avoided).
Or find a way to get the same lock object based on the save object, like this approach:
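private static final ConcurrentHashMap<Object, Object> LOCKS = new ConcurrentHashMap<Object, Object>();
public static void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
// putIfAbsent() returns null when no lock was mapped yet, so fall back to the new lock
Object newLock = new Object();
Object existingLock = LOCKS.putIfAbsent(theObjectIwantToSave.getId(), newLock);
Object lock = (existingLock != null) ? existingLock : newLock;
synchronized (lock) {
....
}
// Clean up lock object to stop memory leak (note: removing here can race with threads still waiting on this lock)
LOCKS.remove(theObjectIwantToSave.getId());
}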
This last version is the recommended one: it ensures that two save objects that share the same "id" are locked with the same lock object. The method ConcurrentHashMap.putIfAbsent() is thread-safe, so "this will work", and it requires only that objectInThreadA.getId().equals(objectInThreadB.getId()) to work properly. Also, the datatype of getId() can be anything, including primitives (e.g. int), thanks to Java's autoboxing.
If you override equals() and hashCode() for your object, then you could use the object itself instead of object.getId(), and that would be an improvement (thanks @TheCapn for pointing this out).
This solution will only work within one JVM. If your servers are clustered, that's a whole different ball game and Java's locking mechanism will not help you. You'll have to use a clustered locking solution, which is beyond the scope of this answer.
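As a rough sketch of the equals()/hashCode() variant mentioned above, the snippet below keys the lock map on a small value class. SaveKey and KeyedLocks are hypothetical names, not part of the original code. It also swaps putIfAbsent() for computeIfAbsent(), and it never evicts entries, so it assumes the set of keys stays bounded.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical value class: two instances with the same id are equal.
final class SaveKey {
    private final long id;

    SaveKey(long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SaveKey && ((SaveKey) other).id == id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }
}

// Hypothetical registry handing out one lock object per distinct key.
final class KeyedLocks {
    private static final ConcurrentHashMap<SaveKey, Object> LOCKS = new ConcurrentHashMap<>();

    // computeIfAbsent creates the lock on first use and returns the
    // already-mapped lock on every later call for an equal key.
    static Object lockFor(SaveKey key) {
        return LOCKS.computeIfAbsent(key, k -> new Object());
    }

    public static void main(String[] args) {
        Object first = lockFor(new SaveKey(42));
        Object second = lockFor(new SaveKey(42)); // different instance, equal key
        System.out.println(first == second);      // prints true: same lock object
    }
}
```

A caller would then wrap its work in synchronized (KeyedLocks.lockFor(key)) { ... }.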
Here is an option adapted from And360's comment on Bohemian's answer, that tries to avoid race conditions, etc. Though I prefer my other answer to this question over this one, slightly:
import java.util.HashMap;
import java.util.concurrent.atomic.AtomicInteger;
// there is no advantage to using ConcurrentHashMap, since we synchronize access to the map ourselves
// (we need to in order to "get" the lock and increment/decrement it safely)
// AtomicInteger is just a mutable int value holder
// we don't actually need it to be atomic
static final HashMap<Object, AtomicInteger> locks = new HashMap<>();
public static void saveSomethingImportantToDataBase(Object objectToSave) {
AtomicInteger lock;
synchronized (locks) {
lock = locks.get(objectToSave.getId());
if (lock == null) {
lock = new AtomicInteger(1);
locks.put(objectToSave.getId(), lock);
}
else
lock.incrementAndGet();
}
try {
synchronized (lock) {
// do synchronized work here (synchronized by objectToSave's id)
}
} finally {
synchronized (locks) {
lock.decrementAndGet();
if (lock.get() == 0)
locks.remove(objectToSave.getId());
}
}
}
You could also split these out into helper methods such as "get lock object" and "release lock" to clean up the code. This way feels a little more kludgey than my other answer.
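One possible shape for those helper methods is sketched below. RefCountedLocks, acquire, release and withLock are hypothetical names chosen for illustration, and a plain int field stands in for the AtomicInteger, since every access to it happens under synchronized (locks).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper wrapping the reference-counted lock map.
final class RefCountedLocks {
    // Lock object that also counts the threads currently holding or awaiting it.
    private static final class Entry {
        int users;
    }

    private final Map<Object, Entry> locks = new HashMap<>();

    // "Get lock object": look up or create the entry and register this thread.
    private Entry acquire(Object id) {
        synchronized (locks) {
            Entry entry = locks.computeIfAbsent(id, k -> new Entry());
            entry.users++;
            return entry;
        }
    }

    // "Release lock": unregister and drop the entry once nobody uses it.
    private void release(Object id, Entry entry) {
        synchronized (locks) {
            if (--entry.users == 0) {
                locks.remove(id);
            }
        }
    }

    // Runs the action while holding the lock that belongs to the given id.
    void withLock(Object id, Runnable action) {
        Entry entry = acquire(id);
        try {
            synchronized (entry) {
                action.run();
            }
        } finally {
            release(id, entry);
        }
    }

    // Number of ids that currently have a live lock entry.
    int activeLocks() {
        synchronized (locks) {
            return locks.size();
        }
    }
}
```

The save method then shrinks to a single call such as locks.withLock(objectToSave.getId(), () -> { ... }).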
Bohemian's answer seems to have race condition problems if one thread is in the synchronized section while another thread removes the synchro-object from the Map, etc. So here is an alternative that leverages WeakRef's.
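import java.lang.ref.WeakReference;
import java.util.WeakHashMap;
// there is no synchronized weak hash map, apparently
// and Collections.synchronizedMap has no putIfAbsent method, so we use synchronized(locks) down below
// the value is wrapped in a WeakReference: a strongly held value that refers to its own key
// would keep the WeakHashMap entry alive forever
WeakHashMap<Integer, WeakReference<Integer>> locks = new WeakHashMap<>();
public void saveSomethingImportantToDataBase(DatabaseObject objectToSave) {
Integer lock;
synchronized (locks) {
WeakReference<Integer> ref = locks.get(objectToSave.getId());
lock = (ref == null) ? null : ref.get();
if (lock == null) {
lock = new Integer(objectToSave.getId()); // deliberately a fresh instance, not Integer.valueOf
locks.put(lock, new WeakReference<>(lock));
}
}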
synchronized (lock) {
// synchronized work here (synchronized by objectToSave's id)
}
// no releasing needed, weakref does that for us, we're done!
}
And a more concrete example of how to use the above style system:
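// values are WeakReferences so that a value never keeps its own key (and thus the entry) alive
static WeakHashMap<Integer, WeakReference<Integer>> locks = new WeakHashMap<>();
static Object getSyncObjectForId(int id) {
synchronized (locks) {
WeakReference<Integer> ref = locks.get(id);
Integer lock = (ref == null) ? null : ref.get();
if (lock == null) {
lock = new Integer(id); // deliberately a fresh instance, not Integer.valueOf
locks.put(lock, new WeakReference<>(lock));
}
return lock;
}
}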
Then use it elsewhere like this:
...
synchronized (getSyncObjectForId(id)) {
// synchronized work here
}
...
The reason this works is basically that if two objects with matching keys enter the critical block, the second will retrieve the lock the first is already using (or the one that is left behind and hasn't been GC'ed yet). However if it is unused, both will have left the method behind and removed their references to the lock object, so it is safely collected.
If you have a limited "known size" of synchronization points you want to use (one that doesn't have to decrease in size eventually), you could probably avoid using a HashMap and use a ConcurrentHashMap instead, with its putIfAbsent method which might be easier to understand.
My opinion is you are not struggling with a real threading problem.
You would be better off letting the DBMS automatically assign a non conflicting row id.
If you need to work with existing row ids store them as thread local variables.
If there is no need for shared data do not share data between threads.
http://download.oracle.com/javase/6/docs/api/java/lang/ThreadLocal.html
An Oracle DBMS is much better at keeping the data consistent than an application server or a web container.
"Many database systems automatically generate a unique key field when a row is inserted. Oracle Database provides the same functionality with the help of sequences and triggers. JDBC 3.0 introduces the retrieval of auto-generated keys feature that enables you to retrieve such generated values. In JDBC 3.0, the following interfaces are enhanced to support the retrieval of auto-generated keys feature ...."
http://download.oracle.com/docs/cd/B19306_01/java.102/b14355/jdbcvers.htm#CHDEGDHJ
If you can live with occasional over-synchronization (ie. work done sequentially when not needed) try this:
Create a table of lock objects. The bigger the table, the lower the chance of over-synchronization.
Apply some hashing function to your id to compute the table index. If your id is numeric, you can just use a remainder (modulo) function; if it is a String, use hashCode() and a remainder.
Get a lock from the table and synchronize on it.
An IdLock class:
public class IdLock {
private Object[] locks = new Object[10000];
public IdLock() {
for (int i = 0; i < locks.length; i++) {
locks[i] = new Object();
}
}
public Object getLock(int id) {
int index = Math.floorMod(id, locks.length); // stays non-negative even for negative ids
return locks[index];
}
}
and its use:
private final IdLock idLock = new IdLock();
public void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
synchronized (idLock.getLock(theObjectIwantToSave.getId())) {
// synchronized work here
}
}
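For String ids, the hashCode()-and-remainder step described above might look like the following sketch. StringIdLock is a hypothetical name, and the table size is a constructor parameter purely for illustration.

```java
// Hypothetical variant of IdLock that accepts String ids.
final class StringIdLock {
    private final Object[] locks;

    StringIdLock(int size) {
        locks = new Object[size];
        for (int i = 0; i < size; i++) {
            locks[i] = new Object();
        }
    }

    Object getLock(String id) {
        // hashCode() may be negative; floorMod keeps the index within bounds.
        return locks[Math.floorMod(id.hashCode(), locks.length)];
    }
}
```

Equal strings always map to the same slot, while unrelated strings may occasionally share one, which is the over-synchronization trade-off this answer accepts.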
public static void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
synchronized (theObjectIwantToSave) {
if (!methodThatChecksThatObjectAlreadyExists) {
storemyObject() //pseudo code
}
// Have to do a lot other saving stuff, because it either saves everything or nothing
commit() // pseudo code to actually commit all my changes to the database.
}
}
The synchronized keyword locks on the object you pass to it, so that no other thread can enter a block synchronized on that same object while the lock is held.
I don't think you have any choice but to take one of the solutions that you do not seem to want to do.
In your case, I don't think any type of synchronization on the objectYouWantToSave is going to work, since the objects are based on web requests. Each request (on its own thread) is therefore most likely going to have its own instance of the object. Even though the instances might be considered logically equal, that doesn't matter for synchronization.
The synchronized keyword (or another synchronization mechanism) is a must, but it is not enough for your problem. You should use a data structure to record which integer values have already been used. In this example a HashSet is used. Do not forget to clean old records out of the HashSet.
private static HashSet<Integer> isUsed = new HashSet<Integer>();
public synchronized static void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
if (!isUsed.contains(theObjectIwantToSave.your_integer_value)) { // skip values that were already saved
if (!methodThatChecksThatObjectAlreadyExists) {
storemyObject() //pseudo code
}
// Have to do a lot other saving stuff, because it either saves everything or nothing
commit() // pseudo code to actually commit all my changes to the database.
isUsed.add(theObjectIwantToSave.your_integer_value);
}
}
To answer your question about locking the Integer, the short answer is NO - it won't prevent threads with another Integer instance with the same value from entering. The long answer: depends on how you obtain the Integer - by constructor, by reusing some instances or by valueOf (that uses some caching). Anyway, I wouldn't rely on it.
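The identity rule can be checked with a small experiment. The sketch below uses Thread.holdsLock() to show that holding the monitor of one object says nothing about an equal but distinct object. It uses String instances rather than Integer only to avoid the deprecated Integer constructor, and MonitorIdentityDemo is a made-up name.

```java
final class MonitorIdentityDemo {
    // Returns true if the current thread holds the monitor of `other`
    // while it is inside a block synchronized on `locked`.
    static boolean holdsOtherWhileLocking(Object locked, Object other) {
        synchronized (locked) {
            return Thread.holdsLock(other);
        }
    }

    public static void main(String[] args) {
        String a = new String("1");
        String b = new String("1"); // equal to a, but a different instance
        System.out.println(a.equals(b));                  // true
        System.out.println(holdsOtherWhileLocking(a, a)); // true
        System.out.println(holdsOtherWhileLocking(a, b)); // false
    }
}
```

Monitors are attached to object identity (==), so equals() plays no part in deciding which threads block each other.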
A solution that will work is to make the method synchronized:
public static synchronized void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
if (!methodThatChecksThatObjectAlreadyExists) {
storemyObject() //pseudo code
}
// Have to do a lot other saving stuff, because it either saves everything or nothing
commit() // pseudo code to actually commit all my changes to the database.
}
This is probably not the best solution performance-wise, but it is guaranteed to work (provided you are not in a clustered environment) until you find a better solution.
private static final Set<Object> lockedObjects = new HashSet<>();
private void lockObject(Object dbObject) throws InterruptedException {
synchronized (lockedObjects) {
while (!lockedObjects.add(dbObject)) {
lockedObjects.wait();
}
}
}
private void unlockObject(Object dbObject) {
synchronized (lockedObjects) {
lockedObjects.remove(dbObject);
lockedObjects.notifyAll();
}
}
public void saveSomethingImportantToDatabase(Object theObjectIwantToSave) throws InterruptedException {
lockObject(theObjectIwantToSave); // acquire before try, so finally only unlocks what we actually locked
try {
if (!methodThatChecksThatObjectAlreadyExists(theObjectIwantToSave)) {
storeMyObject(theObjectIwantToSave);
}
commit();
} finally {
unlockObject(theObjectIwantToSave);
}
}
You must correctly override the 'equals' and 'hashCode' methods of your objects' classes. If you have a unique id (String or Number) inside your object, then you can lock on this id instead of the whole object, and there is no need to override 'equals' and 'hashCode'.
The try-finally is very important: you must guarantee that waiting threads are released after your operation, even if it throws an exception.
This approach will not work if your back-end is distributed across multiple servers.