In the JCIP book, Listing 5.19 is the final implementation of Memoizer. My questions are:
1) Is the endless while loop here because of the atomic putIfAbsent()?
2) Should the while loop be inside the implementation of putIfAbsent() instead of in client code?
3) Should the while loop have a smaller scope, wrapping just the putIfAbsent() call?
4) The while loop looks bad for readability.
Code:
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public class Memoizer<A, V> implements Computable<A, V> {
    private final ConcurrentMap<A, Future<V>> cache
            = new ConcurrentHashMap<A, Future<V>>();
    private final Computable<A, V> c;

    public Memoizer(Computable<A, V> c) { this.c = c; }

    public V compute(final A arg) throws InterruptedException {
        while (true) { // <==== WHY?
            Future<V> f = cache.get(arg);
            if (f == null) {
                Callable<V> eval = new Callable<V>() {
                    public V call() throws InterruptedException {
                        return c.compute(arg);
                    }
                };
                FutureTask<V> ft = new FutureTask<V>(eval);
                f = cache.putIfAbsent(arg, ft);
                if (f == null) { f = ft; ft.run(); }
            }
            try {
                return f.get();
            } catch (CancellationException e) {
                cache.remove(arg, f);
            } catch (ExecutionException e) {
                // launderThrowable is JCIP's utility for unwrapping the cause
                throw launderThrowable(e.getCause());
            }
        }
    }
}
1) Is the endless while loop here because of the atomic putIfAbsent()?
The while loop is there to repeat the computation when a previous computation was cancelled (the first catch clause in the try block).
2) Should the while loop be inside the implementation of putIfAbsent() instead of in client code?
No. Read what putIfAbsent does: it only ever tries to put an object once.
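For reference, the ConcurrentMap javadoc describes putIfAbsent(key, value) as equivalent to the following, except that the real method performs it as a single atomic step:

// Equivalent of map.putIfAbsent(key, value), but atomic in the real method.
if (!map.containsKey(key)) {
    return map.put(key, value); // no previous mapping: returns null
} else {
    return map.get(key);        // an existing value wins; it is returned
}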
3) Should the while loop have a smaller scope, wrapping just the putIfAbsent() call?
No, it shouldn't. See #1.
4) The while loop looks bad for readability.
You are free to offer something better. In fact, this construction suits the situation perfectly: you have to keep trying to do something until it succeeds.
No, you cannot reduce the scope of the while loop. You want to call f.get() on the value that is in the cache. If there was no value for arg in the map, you want to call get() on your own result; otherwise you want to fetch the existing value for arg from the map and call get() on that one.
The problem is that there are no locks in this implementation, so between checking whether there is a value and trying to insert one, another thread could have inserted its own value. Equally, between the insertion failing and the retrieval, the value could have been removed from the cache (due to a CancellationException). Because of these failure cases, you spin in the while (true) until either you can get the canonical value out of the map or you insert a new value into the map (making your value the canonical one).
It would seem that you could move the f.get() out of the loop, but it is kept inside because of the risk of a CancellationException, after which you want to keep trying.
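A related detail: the cancellation path uses the two-argument cache.remove(arg, f), which the ConcurrentMap javadoc describes as equivalent to the following (again, performed atomically), so a fresh Future installed by another thread is never removed by mistake:

// Equivalent of map.remove(key, value), but atomic in the real method.
if (map.containsKey(key) && map.get(key).equals(value)) {
    map.remove(key);
    return true;  // the cancelled Future was still mapped; it is gone now
} else {
    return false; // another thread already replaced it; leave it alone
}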
Related
I have two functions which must run in a critical section:
public synchronized void f1() { ... }
public synchronized void f2() { ... }
Assume that the behavior is as following:
f1 is almost never called. Actually, under normal conditions, this method is never called. If f1 is called anyway, it should return quickly.
f2 is called at a very high rate. It returns very quickly.
These methods never call each other, and there is no reentrancy either.
In other words, there is very low contention. So when f2 is called, we have some overhead to acquire the lock, which is granted immediately in 99.9% of cases. I am wondering if there are approaches to avoid this overhead.
I came up with the following alternative:
private final AtomicInteger lock = new AtomicInteger(0);

public void f1() {
    while (!lock.compareAndSet(0, 1)) {}
    try {
        ...
    } finally {
        lock.set(0);
    }
}

public void f2() {
    while (!lock.compareAndSet(0, 2)) {}
    try {
        ...
    } finally {
        lock.set(0);
    }
}
Are there other approaches? Does the java.util.concurrent package offer something natively?
update
Although my intention is to have a generic question, some information regarding my situation:
f1: This method creates a new remote stream if, for some reason, the current one becomes corrupt, for example due to a timeout. A remote stream can be thought of as a socket connection that consumes a remote queue starting from a given position:
private Stream stream;

public synchronized Stream f1() { // must return Stream, not void, since it returns the new stream
    final Stream stream = new Stream(...);
    if (this.stream != null) {
        stream.setPosition(this.stream.getPosition());
    }
    this.stream = stream;
    return stream;
}
f2: This method advances the stream position. It is a plain setter:
public synchronized void f2(Long p) {
    stream.setPosition(p);
}
Here, stream.setPosition(Long) is implemented as a plain setter as well:
public class Stream {
    private volatile Long position = 0L; // 0L: a plain int literal would not box to Long

    public void setPosition(Long position) {
        this.position = position;
    }
}
In Stream, the current position is periodically and asynchronously sent to the server. Note that Stream is not implemented by me.
My idea was to introduce compare-and-swap as illustrated above, and mark stream as volatile.
Your example isn't doing what you want it to: you are actually executing your code while the lock is being used. Try something like this:
public void f1() {
    while (!lock.compareAndSet(0, 1)) {
    }
    try {
        ...
    } finally {
        lock.set(0);
    }
}
To answer your question, I don't believe that this will be any faster than using synchronized methods, and this method is harder to read and comprehend.
From the description and your example code, I've inferred the following:
Stream has its own internal position, and you're also tracking the most recent position externally. You use this as a sort of 'resume point': when you need to reinitialize the stream, you advance it to this point.
The last known position may be stale; I'm assuming this based on your statement that the stream periodically and asynchronously notifies the server of its current position.
At the time f1 is called, the stream is known to be in a bad state.
The functions f1 and f2 access the same data, and may run concurrently. However, neither f1 nor f2 will ever run concurrently against itself. In other words, you almost have a single-threaded program, except for the rare cases when both f1 and f2 are executing.
[Side note: My solution doesn't actually care if f1 gets called concurrently with itself; it only cares that f2 is not called concurrently with itself]
If any of this is wrong, then the solution below is wrong. Heck, it might be wrong anyway, either because of some detail left out, or because I made a mistake. Writing low-lock code is hard, which is exactly why you should avoid it unless you've observed an actual performance issue.
static class Stream {
    private long position = 0L;

    void setPosition(long position) {
        this.position = position;
    }
}

final static class StreamInfo {
    final Stream stream = new Stream();
    volatile long resumePosition = -1;

    final void setPosition(final long position) {
        stream.setPosition(position);
        resumePosition = position;
    }
}

private final Object updateLock = new Object();
private final AtomicReference<StreamInfo> currentInfo = new AtomicReference<>(new StreamInfo());

void f1() {
    synchronized (updateLock) {
        final StreamInfo oldInfo = currentInfo.getAndSet(null);
        final StreamInfo newInfo = new StreamInfo();
        if (oldInfo != null && oldInfo.resumePosition > 0L) {
            newInfo.setPosition(oldInfo.resumePosition);
        }
        // Only `f2` can modify `currentInfo`, so update it last.
        currentInfo.set(newInfo);
        // The `f2` thread might be waiting for us, so wake them up.
        updateLock.notifyAll();
    }
}

void f2(final long newPosition) {
    while (true) {
        final StreamInfo s = acquireStream();
        s.setPosition(newPosition);
        s.resumePosition = newPosition;
        // Make sure the stream wasn't replaced while we worked.
        // If it was, run again with the new stream.
        if (acquireStream() == s) {
            break;
        }
    }
}

private StreamInfo acquireStream() {
    // Optimistic concurrency: hope we get a stream that's ready to go.
    // If we fail, branch off into a slower code path that waits for it.
    final StreamInfo s = currentInfo.get();
    return s != null ? s : acquireStreamSlow();
}

private StreamInfo acquireStreamSlow() {
    synchronized (updateLock) {
        while (true) {
            final StreamInfo s = currentInfo.get();
            if (s != null) {
                return s;
            }
            try {
                updateLock.wait();
            } catch (final InterruptedException ignored) {
            }
        }
    }
}
If the stream has faulted and is being replaced by f1, it is possible that an earlier call to f2 is still performing some operations on the (now defunct) stream. I'm assuming this is okay, and that it won't introduce undesirable side effects (beyond those already present in your lock-based version). I make this assumption because we've already established in the list above that your resume point may be stale, and we also established that f1 is only called once the stream is known to be in a bad state.
Based on my JMH benchmarks, this approach is around 3x faster than the CAS or synchronized versions (which are pretty close themselves).
Another approach is to use a timestamp lock which works like a modification count. This works well if you have a high read to write ratio.
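For example, java.util.concurrent.locks.StampedLock (Java 8+) implements exactly this stamp-and-validate scheme; a sketch adapted to the position field here (my illustration, not code from the answer):

import java.util.concurrent.locks.StampedLock;

public class Position {
    private final StampedLock sl = new StampedLock();
    private long position;

    void setPosition(long p) {
        long stamp = sl.writeLock();
        try {
            position = p;
        } finally {
            sl.unlockWrite(stamp);
        }
    }

    long getPosition() {
        long stamp = sl.tryOptimisticRead(); // no blocking, just a stamp
        long p = position;
        if (!sl.validate(stamp)) {           // a write intervened; fall back
            stamp = sl.readLock();
            try {
                p = position;
            } finally {
                sl.unlockRead(stamp);
            }
        }
        return p;
    }
}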
Another approach is to have an immutable object which stores state via an AtomicReference. This works well if you have a very high read to write ratio.
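A sketch of that immutable-state variant (the State class and names are hypothetical, my illustration):

import java.util.concurrent.atomic.AtomicReference;

public class StreamState {
    // Immutable snapshot: readers always see a consistent view without locking.
    static final class State {
        final long position;
        State(long position) { this.position = position; }
    }

    private final AtomicReference<State> state = new AtomicReference<>(new State(0L));

    void setPosition(long p) {
        state.set(new State(p)); // replace the whole snapshot atomically
    }

    long getPosition() {
        return state.get().position; // plain read, no lock
    }
}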
Java Concurrency in Practice by Brian Goetz provides an example of an efficient, scalable cache for concurrent use. The final version of the example, the implementation of the Memoizer class (p. 108), shows such a cache. I am wondering why there are both an inner and an outer if (f == null) check.
The second one does not make any sense to me because:
there is a check ahead, and the immediately preceding step, cache.putIfAbsent(arg, ft), will definitely return a non-null value;
the ft.run() inside the second check does not make sense, because f.get() will be called immediately thereafter.
Here is the code for Memoizer:
public class Memoizer<A, V> implements Computable<A, V> {
    private final ConcurrentMap<A, Future<V>> cache
            = new ConcurrentHashMap<A, Future<V>>();
    private final Computable<A, V> c;

    public Memoizer(Computable<A, V> c) { this.c = c; }

    public V compute(final A arg) throws InterruptedException {
        while (true) {
            Future<V> f = cache.get(arg);
            if (f == null) {
                Callable<V> eval = new Callable<V>() {
                    public V call() throws InterruptedException {
                        return c.compute(arg);
                    }
                };
                FutureTask<V> ft = new FutureTask<V>(eval);
                f = cache.putIfAbsent(arg, ft);
                if (f == null) { f = ft; ft.run(); }
            }
            try {
                return f.get();
            } catch (CancellationException e) {
                cache.remove(arg, f);
            } catch (ExecutionException e) {
                throw launderThrowable(e.getCause());
            }
        }
    }
}
there is a check ahead, and the immediately preceding step, cache.putIfAbsent(arg, ft), will definitely return a non-null value
If only a single thread calls compute, then cache.putIfAbsent(arg, ft) will always return null, because there is no previous value.
If two or more threads call compute at the same time, only one of them gets null back from cache.putIfAbsent(arg, ft); the others get the ft instance that the winning thread created.
In that case, the other threads throw away their own FutureTask instance and continue with the instance they received from cache.putIfAbsent(arg, ft).
the ft.run() inside the second check does not make sense, because f.get() will be called immediately thereafter
You need to run a FutureTask in order to get a value from it later; if you never call run(), get() will never return. The thread whose FutureTask got stored in the cache runs it, and its get() then returns immediately, because the task is already complete at that point.
But the other threads that called compute at the same time, and got a non-null value back from putIfAbsent, proceed to the get() call and wait until the first thread is done with the run() method.
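A minimal, self-contained sketch of that interplay (my illustration, not from the book):

import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class FutureTaskDemo {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        // The callable runs on whichever thread invokes run().
        FutureTask<Integer> ft = new FutureTask<>(() -> 6 * 7);
        ft.run();                      // computes on the current thread
        System.out.println(ft.get()); // returns immediately: already complete
        // Any other thread holding the same FutureTask would block in get()
        // until run() finished, then observe the same value (42).
    }
}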
I already have a topic with the same code:
import java.util.HashMap;
import java.util.Map;

public abstract class Digest {
    private Map<String, byte[]> cache = new HashMap<>();

    public byte[] digest(String input) {
        byte[] result = cache.get(input);
        if (result == null) {
            synchronized (cache) {
                result = cache.get(input);
                if (result == null) {
                    result = doDigest(input);
                    cache.put(input, result);
                }
            }
        }
        return result;
    }

    protected abstract byte[] doDigest(String input);
}
In the previous topic it was proven that this code is not thread-safe.
In this topic I want to present the solutions I have in mind and ask you to review them:
Solution #1, using a ReadWriteLock:
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public abstract class Digest {
    private final ReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock readLock = rwl.readLock();
    private final Lock writeLock = rwl.writeLock();
    private Map<String, byte[]> cache = new HashMap<>(); // I still don't know whether I should use volatile or not

    public byte[] digest(String input) {
        byte[] result = null;
        readLock.lock();
        try {
            result = cache.get(input);
        } finally {
            readLock.unlock();
        }
        if (result == null) {
            writeLock.lock();
            try {
                result = cache.get(input);
                if (result == null) {
                    result = doDigest(input);
                    cache.put(input, result);
                }
            } finally {
                writeLock.unlock();
            }
        }
        return result;
    }

    protected abstract byte[] doDigest(String input);
}
Solution #2, using ConcurrentHashMap:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public abstract class Digest {
    private Map<String, byte[]> cache = new ConcurrentHashMap<>(); // should this be volatile?

    public byte[] digest(String input) {
        return cache.computeIfAbsent(input, this::doDigest);
    }

    protected abstract byte[] doDigest(String input);
}
Please review the correctness of both solutions. This is not a question about which solution is better; I understand that the CHM version is better. Please just review the correctness of the implementations.
Unlike the clusterfudge we got into in the last question, this is better.
As was shown in the previous question's duplicate, the original code is not thread-safe, since HashMap is not thread-safe and the initial get() can be called while the put() is being executed inside the synchronized block. That can break all sorts of things, so it's definitely not thread-safe.
The ReadWriteLock solution is thread-safe, since all accesses to the cache are done in guarded code. The initial get() is protected by the read lock, and the put() is done while holding the write lock, guaranteeing that threads can't read the cache while it's being written to, but are free to read it at the same time as other reading threads. No concurrency issues, no visibility issues, no chance of deadlock. Everything's fine.
The last solution is of course the most elegant. Since computeIfAbsent() is an atomic operation, it guarantees that the value is either returned directly or computed at most once. From the javadoc:
If the specified key is not already associated with a value, attempts to compute its value using the given mapping function and enters it into this map unless null. The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.
The Map in question shouldn't be volatile, but it should be final. If it's not final, it could (at least in theory) be changed, and it would be possible for two threads to work on different objects, which is not what you want.
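In other words, the declaration in Solution #2 would look like this (a one-line sketch):

// final: the reference never changes, so every thread sees the same map;
// volatile is unnecessary for a reference that is assigned exactly once.
private final Map<String, byte[]> cache = new ConcurrentHashMap<>();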
I'm dealing with some third-party library code that involves creating expensive objects and caching them in a Map. The existing implementation is something like
lock.lock();
try {
    Foo result = cache.get(key);
    if (result == null) {
        result = createFooExpensively(key);
        cache.put(key, result);
    }
    return result;
} finally {
    lock.unlock();
}
Obviously this is not the best design when Foos for different keys can be created independently.
My current hack is to use a Map of Futures:
lock.lock();
Future<Foo> future;
try {
    future = allFutures.get(key);
    if (future == null) {
        future = executorService.submit(new Callable<Foo>() {
            public Foo call() {
                return createFooExpensively(key);
            }
        });
        allFutures.put(key, future);
    }
} finally {
    lock.unlock();
}
try {
    return future.get();
} catch (InterruptedException e) {
    throw new MyRuntimeException(e);
} catch (ExecutionException e) {
    throw new MyRuntimeException(e);
}
But this seems... a little hacky, for two reasons:
The work is done on an arbitrary pooled thread. I'd be happy to have the work done on the first thread that tries to get that particular key, especially since it's going to be blocked anyway.
Even when the Map is fully populated, we still go through Future.get() to get the results. I expect this is pretty cheap, but it's ugly.
What I'd like is to replace cache with a Map that will block gets for a given key until that key has a value, but allow other gets meanwhile. Does any such thing exist? Or does someone have a cleaner alternative to the Map of Futures?
Creating a lock per key sounds tempting, but it may not be what you want, especially when the number of keys is large.
You would probably need to create a dedicated (read-write) lock for each key, which has an impact on your memory usage. Also, that fine granularity may hit a point of diminishing returns given a finite number of cores, if concurrency is truly high.
ConcurrentHashMap is often a good enough solution in a situation like this. It normally provides full reader concurrency (readers do not block), and updates can be concurrent up to the desired concurrency level. This gives you pretty good scalability. The above code can be expressed with ConcurrentHashMap like this:
ConcurrentMap<Key, Foo> cache = new ConcurrentHashMap<>();
...
Foo result = cache.get(key);
if (result == null) {
    result = createFooExpensively(key);
    Foo old = cache.putIfAbsent(key, result);
    if (old != null) {
        result = old;
    }
}
The straightforward use of ConcurrentHashMap does have one drawback: multiple threads may find that the key is not cached, and each may invoke createFooExpensively(). As a result, some threads may do throw-away work. To avoid this, you would want to use the Memoizer pattern mentioned in "Java Concurrency in Practice".
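On Java 8 and later there is also a one-liner that avoids the throw-away work (my addition, not part of the original answer): ConcurrentHashMap.computeIfAbsent applies the factory at most once per key, and concurrent callers for the same key block until the value is installed:

// The mapping function is applied at most once per key; other callers
// for the same key wait until the computed value is in the map.
Foo result = cache.computeIfAbsent(key, k -> createFooExpensively(k));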
But then again, the nice folks at Google already solved these problems for you in the form of CacheBuilder:
LoadingCache<Key, Foo> cache = CacheBuilder.newBuilder()
        .concurrencyLevel(32)
        .build(new CacheLoader<Key, Foo>() {
            public Foo load(Key key) {
                return createFooExpensively(key);
            }
        });
...
Foo result = cache.get(key);
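One caveat about the snippet above (my note, not part of the original answer): LoadingCache.get(key) declares a checked ExecutionException. If createFooExpensively cannot throw checked exceptions, getUnchecked avoids the boilerplate:

// getUnchecked wraps loader failures in UncheckedExecutionException,
// so the call site needs no checked-exception handling.
Foo result = cache.getUnchecked(key);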
You can use funtom-java-utils - PerKeySynchronizedExecutor.
It will create a lock for each key, but will clear it for you immediately when it becomes unused.
It will also guarantee memory visibility between invocations with the same key, and it is designed to be very fast and to minimize contention between invocations of different keys.
Declare it in your class:
final PerKeySynchronizedExecutor<KEY_CLASS> executor = new PerKeySynchronizedExecutor<>();
Use it:
Foo foo = executor.execute(key, () -> createFooExpensively());
public class Cache {
    private static final Set<String> lockedKeys = new HashSet<>();

    private void lock(String key) {
        synchronized (lockedKeys) {
            while (!lockedKeys.add(key)) {
                try {
                    lockedKeys.wait();
                } catch (InterruptedException e) {
                    log.error("...");
                    throw new RuntimeException(e);
                }
            }
        }
    }

    private void unlock(String key) {
        synchronized (lockedKeys) {
            lockedKeys.remove(key);
            lockedKeys.notifyAll();
        }
    }

    public Foo getFromCache(String key) {
        try {
            lock(key);
            Foo result = cache.get(key);
            if (result == null) {
                result = createFooExpensively(key);
                cache.put(key, result);
            }
            return result;
            // For different keys this executes in parallel.
            // For the same key it executes synchronously.
        } finally {
            unlock(key);
        }
    }
}
The key doesn't have to be a String; it can be any class with correctly overridden equals and hashCode methods.
The try-finally is very important: you must guarantee that waiting threads are unlocked after your operation, even if your operation threw an exception.
It will not work if your back end is distributed across multiple servers/JVMs.
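If a hand-rolled key set feels too low-level, Guava's Striped achieves roughly the same effect with bounded memory (my suggestion, not part of the original answer): a fixed pool of locks is shared across keys, trading exact per-key granularity for a hard cap on the number of lock objects:

import java.util.concurrent.locks.Lock;
import com.google.common.util.concurrent.Striped;

// 64 stripes: keys hashing to the same stripe share one lock,
// which bounds memory no matter how many distinct keys exist.
private final Striped<Lock> stripes = Striped.lock(64);

public Foo getFromCache(String key) {
    Lock lock = stripes.get(key);
    lock.lock();
    try {
        Foo result = cache.get(key);
        if (result == null) {
            result = createFooExpensively(key);
            cache.put(key, result);
        }
        return result;
    } finally {
        lock.unlock();
    }
}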
I have a Map object that could be null, or simply cleared, when the application first starts. I need all threads accessing this map to block until the map is initialized, and only then signal all threads that they may access it.
This map holds configuration data, and it will be read-only unless a single thread decides to refresh it by loading new configuration data (so for the sake of performance it doesn't need to be synchronized, and I don't find that necessary either). I tried using a Condition object from a ReentrantLock, but it threw IllegalMonitorStateException whenever I tried to signalAll() or await().
Here is pseudocode for what I need to do:
void monitorThread() {
    while (someCondition) {
        map = updatedMap();
        condition.signalAll();
    }
}

String readValueFromMap(String key) {
    if (map == null) {
        condition.await();
    }
    return map.get(key);
}
CountDownLatch is all you need.
CountDownLatch latch = new CountDownLatch(1);
When you initialize the hashmap, call latch.countDown(), and in the reader threads use latch.await():
void monitorThread() {
    map = updatedMap();
    latch.countDown();
}

String readValueFromMap(String key) throws InterruptedException {
    latch.await();
    return map.get(key);
}
Please note that CountDownLatch's await() method only waits while the count is greater than zero, so it blocks only until the first countDown() call.
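Putting it together, a self-contained sketch (the class and field names are mine, purely for illustration):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

public class Config {
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile Map<String, String> map; // volatile publishes refreshed maps safely

    void monitorThread() {
        Map<String, String> m = new HashMap<>();
        m.put("someKey", "someValue"); // hypothetical configuration entry
        map = m;                       // publish the fully built map
        latch.countDown();             // release every thread blocked in await()
    }

    String readValueFromMap(String key) throws InterruptedException {
        latch.await();                 // a no-op once the count has reached zero
        return map.get(key);
    }
}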
To do this right, you need a memory barrier, hence the volatile. Because the map may be null initially, you are also going to need a separate lock object. The following should work:
private final Object lockObject = new Object();
private volatile Map<...> map;

void monitorThread() {
    while (condition) {
        // do this outside of the synchronized block in case it takes a while
        Map<...> updatedMap = updatedMap();
        synchronized (lockObject) {
            map = updatedMap;
            // notify everyone that may be waiting for the map to be initialized
            lockObject.notifyAll();
        }
    }
}

String readValueFromMap(String key) throws InterruptedException {
    // grab a copy of the reference to avoid race conditions in case the map
    // is updated in the future
    Map<...> mapRef = map;
    if (mapRef == null) {
        synchronized (lockObject) {
            // the while loop handles spurious wake-ups
            while (map == null) {
                // wait for the map to be initialized
                lockObject.wait();
            }
            mapRef = map;
        }
    }
    return mapRef.get(key);
}
Sounds like all you need is a "Lock" object that guards access to the Map.
These are pretty easy to use:
Lock l = ...;
l.lock();
try {
    // access the resource protected by this lock
} finally {
    l.unlock();
}
You could probably use java.util.concurrent.locks.ReentrantReadWriteLock.ReadLock.
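A minimal sketch of how that could look here (my illustration; note that a read-write lock guards access but does not by itself make readers wait for the initial load, so you would still combine it with one of the approaches above):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ConfigMap {
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private Map<String, String> map; // guarded by rwl

    void refresh(Map<String, String> updated) {
        rwl.writeLock().lock();      // exclusive: blocks all readers
        try {
            map = new HashMap<>(updated);
        } finally {
            rwl.writeLock().unlock();
        }
    }

    String readValueFromMap(String key) {
        rwl.readLock().lock();       // shared: readers do not block each other
        try {
            return map == null ? null : map.get(key);
        } finally {
            rwl.readLock().unlock();
        }
    }
}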