Minimal blocking cache store

Suppose we have several methods that make HTTP calls, each invoked with a specific argument. We want to compare the latest response for a given method + argument combination with the previous one, and proceed only if the response has changed.
method1(Arg arg)
method2(Arg arg)
When we make a particular call, we compute a hash of the response so that we can store it in a map:
{"key" : "method1|arg", "value" : "hash"}
The next time we get a response, we retrieve that hash from the cache store and compare it.
All the method|arg calls are concurrent, and many calls with the same combination may run in parallel. The only concurrency issue is at the entry level: one call may try to read or update a cache entry while another call with the same combination is updating it.
So we need to synchronize on an entry object. That way only calls with exactly the same "method|arg" combination block each other, and unrelated calls are not blocked.
Is there already a library (cache) for this purpose?
If not, is there a Map implementation that lets me get the Entry by key, or should I keep another map?
And in general, is it safe to use a HashMap and synchronize on Entry objects? (I can't really picture what happens when the HashMap is rehashing while concurrent gets are executing.)
UPDATE
Here is the implementation I've come up with. ConcurrentHashMap probably covers this case, but the idea was to lock only on an entry, not on the entire map (except on writes).
public class HashCache {
final HashMap<String, Holder> hashCache = new HashMap<>();
public boolean hasChanged(String key, Object hash) {
assert key != null && hash != null;
Holder holder = hashCache.get(key);
if (holder == null) {
synchronized (hashCache) {
hashCache.put(key, new Holder(hash));
}
return true; // first hash
} else {
synchronized (holder) {
if (Objects.equals(holder.object, hash)) {
return false; // hash not changed
} else {
holder.object = hash;
return true; // hash changed
}
}
}
}
private static class Holder {
Object object;
Holder(Object object) {
this.object = object;
}
}
}
If you see a possible bug, please comment :)

I think you'd be OK with a ConcurrentHashMap. I don't believe you need a cache for this, since you don't need to cache the response, but to store response's hash.
ConcurrentHashMap is a highly optimized Map which avoids thread contention as much as possible, especially for reads (I believe this matches your case).
You could use another approach and lock on every entry once you get it from a common HashMap, however I don't think it's worth the effort. I'd go first with the ConcurrentHashMap and test it, and would only change the implementation if behavior differs from expected results.
EDIT:
As per your edit, I must insist on recommending a ConcurrentHashMap. Anyway, if for some reason that is not an option for you, I believe you should double-check when putting the value in the map for the first time:
public boolean hasChanged(String key, Object hash) {
assert key != null && hash != null;
Holder holder = hashCache.get(key);
if (holder == null) {
synchronized (hashCache) { // Double-check that value hasn't been changed
// before entering synchronized block
holder = hashCache.get(key);
if (holder == null) {
hashCache.put(key, new Holder(hash));
return true; // first hash
} // inner if
} // sync block
} // outer if
// No more else!
synchronized (holder) {
if (Objects.equals(holder.object, hash)) {
return false; // hash not changed
} else {
holder.object = hash;
return true; // hash changed
}
}
}
The double-check is needed because another thread might have put a value for the same key after your first get() but before you enter the synchronized block.
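For comparison, here is a minimal sketch of the ConcurrentHashMap route recommended above. The class name ConcurrentHashCache is made up for illustration, and the sketch assumes the stored hash has a meaningful equals(). It relies on put() being atomic and returning the previous value, so there is no window between the check and the update.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical name; a sketch of the idea, not a drop-in replacement.
final class ConcurrentHashCache {
    private final ConcurrentMap<String, Object> hashes = new ConcurrentHashMap<>();

    /** Stores the new hash and reports whether it differs from the previous one. */
    boolean hasChanged(String key, Object hash) {
        // put() atomically swaps in the new hash and returns the old one
        // (null when the key was not present yet).
        Object previous = hashes.put(key, hash);
        return !hash.equals(previous);
    }
}
```

As in the original code, two concurrent calls that carry the same new hash yield one true and one false, and calls for different keys never wait on each other's entries.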

Related

Is double-checked locking on ConcurrentHashMap thread-safe? [duplicate]

I have a piece of code that can be executed by multiple threads that needs to perform an I/O-bound operation in order to initialize a shared resource that is stored in a ConcurrentMap. I need to make this code thread safe and avoid unnecessary calls to initialize the shared resource. Here's the buggy code:
private ConcurrentMap<String, Resource> map;
// .....
String key = "somekey";
Resource resource;
if (map.containsKey(key)) {
resource = map.get(key);
} else {
resource = getResource(key); // I/O-bound, expensive operation
map.put(key, resource);
}
With the above code, multiple threads may check the ConcurrentMap and see that the resource isn't there, and all attempt to call getResource() which is expensive. In order to ensure only a single initialization of the shared resource and to make the code efficient once the resource has been initialized, I want to do something like this:
String key = "somekey";
Resource resource;
if (!map.containsKey(key)) {
synchronized (map) {
if (!map.containsKey(key)) {
resource = getResource(key);
map.put(key, resource);
}
}
}
Is this a safe version of double checked locking? It seems to me that since the checks are called on ConcurrentMap, it behaves like a shared resource that is declared to be volatile and thus prevents any of the "partial initialization" problems that may happen.

If you can use external libraries, take a look at Guava's MapMaker.makeComputingMap(). It's tailor-made for what you're trying to do.

Yes, it's safe.
If map.containsKey(key) is true, according to doc, map.put(key, resource) happens before it. Therefore getResource(key) happens before resource = map.get(key), everything is safe and sound.

Why not use the putIfAbsent() method on ConcurrentMap?
if(!map.containsKey(key)){
map.putIfAbsent(key, getResource(key));
}
Conceivably you might call getResource() more than once, but it won't happen a bunch of times. Simpler code is less likely to bite you.
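If even an occasional duplicate call to getResource() is too expensive, one option is to store a future in the map instead of the resource itself. The following is a rough sketch under that assumption; OnceLoader and its loader function are hypothetical names, not part of the question's code. The thread that wins putIfAbsent() runs the loader, and every other thread waits on the same future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical helper: runs the loader at most once per key while a value is cached.
final class OnceLoader<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> futures = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    OnceLoader(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        CompletableFuture<V> existing = futures.get(key);
        if (existing == null) {
            CompletableFuture<V> mine = new CompletableFuture<>();
            existing = futures.putIfAbsent(key, mine);
            if (existing == null) {
                // This thread won the race, so it is the only one that runs the loader.
                try {
                    mine.complete(loader.apply(key));
                } catch (RuntimeException e) {
                    futures.remove(key, mine); // allow a later call to retry
                    mine.completeExceptionally(e);
                    throw e;
                }
                return mine.join();
            }
        }
        // Another thread owns the load: wait for its result.
        return existing.join();
    }
}
```

Unlike a synchronized block on the whole map, loads for different keys do not wait on each other here.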

In general, double-checked locking is safe if the variable you're synchronizing on is marked volatile. But you're better off synchronizing the entire function:
public synchronized Resource getResource(String key) {
Resource resource = map.get(key);
if (resource == null) {
resource = expensiveGetResourceOperation(key);
map.put(key, resource);
}
return resource;
}
The performance hit will be tiny, and you'll be certain that there will be no sync problems.
Edit:
This is actually faster than the alternatives, because you won't have to do two calls to the map in most cases. The only extra operation is the null check, and the cost of that is close to zero.
Second edit:
Also, you don't have to use ConcurrentMap. A regular HashMap will do it. Faster still.

No need for that: ConcurrentMap supports this with its atomic putIfAbsent method.
Don't reinvent the wheel: Always use the API where possible.

The verdict is in. I timed 3 different solutions in nanosecond accuracy, since after all the initial question was about performance:
Fully synching the function on a regular HashMap:
synchronized (map) {
Object result = map.get(key);
if (result == null) {
result = new Object();
map.put(key, result);
}
return result;
}
first invocation: 15,000 nanoseconds, subsequent invocations: 700 nanoseconds
Using the double check lock with a ConcurrentHashMap:
if (!map.containsKey(key)) {
synchronized (map) {
if (!map.containsKey(key)) {
map.put(key, new Object());
}
}
}
return map.get(key);
first invocation: 15,000 nanoseconds, subsequent invocations: 1500 nanoseconds
A different flavor of double checked ConcurrentHashMap:
Object result = map.get(key);
if (result == null) {
synchronized (map) {
if (!map.containsKey(key)) {
result = new Object();
map.put(key, result);
} else {
result = map.get(key);
}
}
}
return result;
first invocation: 15,000 nanoseconds, subsequent invocations: 1000 nanoseconds
You can see that the biggest cost was on the first invocation, but it was similar for all 3. Subsequent invocations were fastest on the regular HashMap with method sync, like user237815 suggested, but only by 300 NANO seconds. And after all, we are talking about NANO seconds here, which means BILLIONTHS of a second.

Cache using ConcurrentHashMap

I have the following code:
public class Cache {
private final Map map = new ConcurrentHashMap();
public Object get(Object key) {
Object value = map.get(key);
if (value == null) {
value = new SomeObject();
map.put(key, value);
}
return value;
}
}
My question is:
The put and get methods of the map are thread safe, but since the whole block is not synchronized, could multiple threads add the same key twice?

put and get are thread safe in the sense that calling them from different threads cannot corrupt the data structure (as, e.g., is possible with a normal java.util.HashMap).
However, since the block is not synchronized, you may still have multiple threads adding the same key:
Both threads may pass the null check; one adds the key and returns its value, and then the second overwrites that value with a new one and returns it.

As of Java 8, you can also prevent this addition of duplicate keys with:
public class Cache {
private final Map map = new ConcurrentHashMap();
public Object get(Object key) {
// The lambda parameter must not reuse the name of the method parameter "key"
Object value = map.computeIfAbsent(key, k -> new SomeObject());
return value;
}
}
The API docs state:
If the specified key is not already associated with a value, attempts
to compute its value using the given mapping function and enters it
into this map unless null. The entire method invocation is performed
atomically, so the function is applied at most once per key. Some
attempted update operations on this map by other threads may be
blocked while computation is in progress, so the computation should be
short and simple, and must not attempt to update any other mappings of
this map.

Could multiple threads add the same key twice?
Yes, they could. To fix this problem you can:
1) Use the putIfAbsent method instead of put. It is very fast, but unnecessary SomeObject instances can be created.
2) Use double checked locking:
Object value = map.get(key);
if (value == null) {
synchronized (map) {
value = map.get(key);
if (value == null) {
value = new SomeObject();
map.put(key, value);
}
}
}
return value;
Locking is much slower, but only the necessary objects will be created.

You could also combine checking and putIfAbsent, such as:
Object value = map.get(key);
if (value == null) {
Object fresh = new SomeObject();
// putIfAbsent returns the previous value, or null if ours was stored
value = map.putIfAbsent(key, fresh);
if (value == null) {
value = fresh;
}
}
return value;
thereby reducing the unnecessary new objects to cases where new entries are introduced in the short time between the check and the putIfAbsent.
If you are feeling lucky and reads vastly outnumber writes to your map, you can also create your own copy-on-write map similar to CopyOnWriteArrayList.
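A copy-on-write map of that kind could look roughly like the sketch below. CopyOnWriteCache is a hypothetical name and the factory parameter is an assumption for illustration. Reads go to an immutable snapshot without locking; every write copies the whole map, so this only pays off when writes are rare.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: cheap reads, expensive writes.
final class CopyOnWriteCache<K, V> {
    // Never mutated after publication; replaced wholesale on every write.
    private volatile Map<K, V> snapshot = new HashMap<K, V>();

    V getOrCreate(K key, Function<? super K, ? extends V> factory) {
        V value = snapshot.get(key);          // lock-free read of the current snapshot
        if (value != null) {
            return value;
        }
        synchronized (this) {
            value = snapshot.get(key);        // re-check under the write lock
            if (value == null) {
                value = factory.apply(key);
                Map<K, V> copy = new HashMap<K, V>(snapshot);
                copy.put(key, value);
                snapshot = copy;              // volatile write publishes the new map
            }
            return value;
        }
    }
}
```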

Synchronizing a Map of Sets/Lists

I would like to implement a variation on the "Map of Sets" collection that will be constantly accessed by multiple threads. I am wondering whether the synchronization I am doing is sufficient to guarantee that no issues will manifest.
So given the following code, where Map, HashMap, and Set are the Java implementations, and Key and Value are some arbitrary Objects:
public class MapOfSets {
private Map<Key, Set<Value>> map;
public MapOfSets() {
map = Collections.synchronizedMap(new HashMap<Key, Set<Value>>());
}
//adds value to the set mapped to key
public void add(Key key, Value value) {
Set<Value> old = map.get(key);
//if no previous set exists on this key, create it and add value to it
if(old == null) {
old = new HashSet<Value>();
old.add(value);
map.put(key, old);
}
//otherwise simply insert the value to the existing set
else {
old.add(value);
}
}
//similar to add
public void remove(Key key, Value value) {...}
//perform some operation on all elements in the set mapped to key
public void foo(Key key) {
Set<Value> set = map.get(key);
for(Value v : set)
v.bar();
}
}
The idea here is that because I've synchronized the Map itself, the get() and put() method should be atomic right? So there should be no need to do additional synchronization on the Map or the Sets contained in it. So will this work?
Alternatively, would the above code be advantageous over another possible synchronization solution:
public class MapOfSets {
private Map<Key, Set<Value>> map;
public MapOfSets() {
map = new HashMap<Key, Set<Value>>();
}
public synchronized void add(Key key, Value value) {
Set<Value> old = map.get(key);
//if no previous set exists on this key, create it and add value to it
if(old == null) {
old = new HashSet<Value>();
old.add(value);
map.put(key, old);
}
//otherwise simply insert the value to the existing set
else {
old.add(value);
}
}
//similar to add
public synchronized void remove(Key key, Value value) {...}
//perform some operation on all elements in the set mapped to key
public synchronized void foo(Key key) {
Set<Value> set = map.get(key);
for(Value v : set)
v.bar();
}
}
Where I leave the data structures unsynchronized but synchronize all the possible public methods instead. So which ones will work, and which one is better?

The first implementation you posted is not thread safe. Consider what happens when the add method is accessed by two concurrent threads with the same key:
thread A executes line 1 of the method, and gets a null reference because no item with the given key is present
thread B executes line 1 of the method, and gets a null reference because no item with the given key is present — this will happen after A returns from the first call, as the map is synchronized
thread A evaluates the if condition to true
thread B evaluates the if condition to true
From that point on, the two threads will carry on with execution of the true branch of the if statement, and you will lose one of the two value objects.
The second variant of the method you posted looks safer.
However, if you can use third-party libraries, I would suggest checking out Google Guava, as it offers concurrent multimaps.

The second one is correct, but the first one isn't.
Think about it a minute, and suppose two threads are calling add() in parallel. Here's what could occur:
Thread 1 calls add("foo", "bar");
Thread 2 calls add("foo", "baz");
Thread 1 gets the set for "foo" : null
Thread 2 gets the set for "foo" : null
Thread 1 creates a new set and adds "bar" in it
Thread 2 creates a new set and adds "baz" in it
Thread 1 puts its set in the map
Thread 2 puts its set in the map
At the end of the story, the map contains one value for "foo" instead of two.
Synchronizing the map makes sure that its internal state is coherent, and that each method you call on the map is thread-safe. But it doesn't make the get-then-put operation atomic.
Consider using one of Guava's SetMultimap implementations, which does everything for you. Wrap it in a call to Multimaps.synchronizedSetMultimap(SetMultimap) to make it thread-safe.
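Without Guava, the get-then-put sequence can be made atomic by holding the synchronized map's own monitor across the whole compound operation. The sketch below assumes that approach; the class name SynchronizedMapOfSets and the snapshot() helper are hypothetical. It works because the wrapper returned by Collections.synchronizedMap locks on itself.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: compound operations share the wrapper's lock.
final class SynchronizedMapOfSets<K, V> {
    private final Map<K, Set<V>> map =
            Collections.synchronizedMap(new HashMap<K, Set<V>>());

    void add(K key, V value) {
        synchronized (map) {               // same monitor the wrapper uses internally
            Set<V> set = map.get(key);
            if (set == null) {
                set = new HashSet<V>();
                map.put(key, set);
            }
            set.add(value);                // the inner HashSet is guarded by the same lock
        }
    }

    /** Returns a copy, so callers can iterate without holding the lock. */
    Set<V> snapshot(K key) {
        synchronized (map) {
            Set<V> set = map.get(key);
            return set == null ? new HashSet<V>() : new HashSet<V>(set);
        }
    }
}
```

This has the same coarse locking as the fully synchronized variant from the question, so it is a correctness fix rather than a gain in concurrency.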

Your second implementation will work, but it holds locks for longer than it needs to (an inevitable problem with using synchronized methods rather than synchronized blocks), which will reduce concurrency. If you find that the limit on concurrency here is a bottleneck, you could shrink the locked regions a bit.
Alternatively, you could use some of the lock-free collections provided by java.util.concurrent. Here's my attempt at that; this isn't tested, and it requires Key to be comparable, but it should not perform any locking ever:
public class MapOfSets {
private final ConcurrentMap<Key, Set<Value>> map;
public MapOfSets() {
map = new ConcurrentSkipListMap<Key, Set<Value>>();
}
private static ThreadLocal<Set<Value>> freshSets = new ThreadLocal<Set<Value>>() {
@Override
protected Set<Value> initialValue() {
return new ConcurrentSkipListSet<Value>();
}
};
public void add(Key key, Value value) {
Set<Value> freshSet = freshSets.get();
Set<Value> set = map.putIfAbsent(key, freshSet);
if (set == null) {
set = freshSet;
freshSets.remove();
}
set.add(value);
}
public void remove(Key key, Value value) {
Set<Value> set = map.get(key);
if (set != null) {
set.remove(value);
}
}
//perform some operation on all elements in the set mapped to key
public void foo(Key key) {
Set<Value> set = map.get(key);
if (set != null) {
for (Value v: set) {
v.bar();
}
}
}
}

For your Map implementation you could just use a ConcurrentHashMap. You wouldn't have to worry about ensuring thread safety for access, whether it's insertion or retrieval, as the implementation takes care of that for you.
And if you really want to use a Set, you could call
Collections.newSetFromMap(new ConcurrentHashMap<Object,Boolean>())
on your ConcurrentHashMap.
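Put together, a map of sets on top of ConcurrentHashMap might look like the following sketch. It assumes Java 8 or later and uses ConcurrentHashMap.newKeySet(), which gives a concurrent set in the same way as the newSetFromMap call above. The class name ConcurrentMapOfSets is made up for illustration.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch, assuming Java 8+.
final class ConcurrentMapOfSets<K, V> {
    private final ConcurrentMap<K, Set<V>> map = new ConcurrentHashMap<K, Set<V>>();

    void add(K key, V value) {
        // computeIfAbsent creates the set atomically, so two threads adding
        // to a new key always end up sharing one set.
        map.computeIfAbsent(key, k -> ConcurrentHashMap.<V>newKeySet()).add(value);
    }

    boolean remove(K key, V value) {
        Set<V> set = map.get(key);
        return set != null && set.remove(value);
    }

    /** Read-only view; iteration over a concurrent set is weakly consistent. */
    Set<V> get(K key) {
        Set<V> set = map.get(key);
        return set == null ? Collections.<V>emptySet() : Collections.unmodifiableSet(set);
    }
}
```

Empty sets are deliberately left in the map here; removing them safely would need extra coordination with concurrent add() calls.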

Keep 'obvious' lock-retrieve or employ double-checked locking?

I suck at formulating questions. I have the following piece of (Java) code (pseudo):
public SomeObject getObject(Identifier someIdentifier) {
// getUniqueIdentifier retrieves a singleton instance of the identifier object,
// to prevent two Identifiers that are equals() but not == (reference equals) in the system.
Identifier singletonInstance = getUniqueIdentifier(someIdentifier);
synchronized (singletonInstance) {
SomeObject cached = cache.get(singletonInstance);
if (cached != null) {
return cached;
} else {
SomeObject newInstance = createSomeObject(singletonInstance);
cache.put(singletonInstance, newInstance);
return newInstance;
}
}
}
Basically, it makes an identifier 'unique' (reference equals, as in ==), checks a cache, and in case of a cache miss, calls an expensive method (involving calling an external resource and parsing, etc), puts that in the cache, and returns. The synchronized Identifier, in this case, avoids two equals() but not == Identifier objects being used to call the expensive method, which would retrieve the same resource simultaneously.
The above works. I'm just wondering, and probably micro-optimizing, would a rewrite such as the following that employs more naïve cache retrieval and double-checked locking be 'safe' (safe as in threadsafe, void of odd race conditions) and be 'more optimal' (as in a reduction of unneeded locking and threads having to wait for a lock)?
public SomeObject getObject(Identifier someIdentifier) {
// just check the cache, reference equality is not relevant just yet.
SomeObject cached = cache.get(someIdentifier);
if (cached != null) {
return cached;
}
Identifier singletonInstance = getUniqueIdentifier(someIdentifier);
synchronized (singletonInstance) {
// re-check the cache here, in case of a context switch in between the
// cache check and the opening of the synchronized block.
cached = cache.get(singletonInstance);
if (cached != null) {
return cached;
} else {
SomeObject newInstance = createSomeObject(singletonInstance);
cache.put(singletonInstance, newInstance);
return newInstance;
}
}
}
You could say 'Just test it' or 'Just do a micro-benchmark', but testing multi-threaded bits of code isn't my strong point, and I doubt I'd be able to simulate realistic situations or accurately fake race conditions. Plus it'd take me half a day, whereas writing a SO question only takes me a few minutes :).

You are reinventing Google-Collections/Guava's MapMaker/ComputingMap:
ConcurrentMap<Identifier, SomeObject> cache = new MapMaker().makeComputingMap(new Function<Identifier, SomeObject>() {
public SomeObject apply(Identifier from) {
return createSomeObject(from);
}
});
public SomeObject getObject(Identifier someIdentifier) {
return cache.get(someIdentifier);
}
Interning is not necessary here as the ComputingMap guarantees a single thread will only attempt to populate if absent and another thread asking for the same item will block and wait for the result. If you remove a key that is in the process of being populated then that thread and any that are currently waiting would still get that result but subsequent requests will start the population again.
If you do need interning, that library provides the excellent Interner class that has both strongly and weakly referenced caching.

synchronized takes up to 2 micro-seconds. Unless you need to cut this further you may be better off with the simplest solution.
BTW You can write
SomeObject cached = cache.get(singletonInstance);
if (cached == null)
cache.put(singletonInstance, cached = createSomeObject(singletonInstance));
return cached;

If "cache" is a map (which I suspect it is), then this problem is quite different than a simple double-checked locking problem.
If cache is a plain HashMap, then the problem is actually much worse; i.e. your proposed "double-checked pattern" behaves much worse than a simple reference-based double-checking. In fact, it can lead to ConcurrentModificationExceptions, getting incorrect values, or even an infinite loop.
If it is based on a plain HashMap, I would suggest using a ConcurrentHashMap as the first approach. With a ConcurrentHashMap, there is no explicit locking needed on your part.
public SomeObject getObject(Identifier someIdentifier) {
// cache is a ConcurrentHashMap
// just check the cache, reference equality is not relevant just yet.
SomeObject cached = cache.get(someIdentifier);
if (cached != null) {
return cached;
}
Identifier singletonInstance = getUniqueIdentifier(someIdentifier);
SomeObject newInstance = createSomeObject(singletonInstance);
SomeObject old = cache.putIfAbsent(singletonInstance, newInstance);
if (old != null) {
newInstance = old;
}
return newInstance;
}

Java synchronized block using method call to get synch object

We are writing some locking code and have run into a peculiar question. We use a ConcurrentHashMap for fetching instances of Object that we lock on. So our synchronized blocks look like this
synchronized(locks.get(key)) { ... }
We have overridden the get method of ConcurrentHashMap to make it always return a new object if it did not contain one for the key.
@Override
public Object get(Object key) {
Object o = super.get(key);
if (null == o) {
Object no = new Object();
o = putIfAbsent((K) key, no);
if (null == o) {
o = no;
}
}
return o;
}
But is there a state in which the get method has returned the object but the thread has not yet entered the synchronized block, allowing other threads to get the same object and lock on it?
We have a potential race condition where
thread 1: gets the object with key A, but does not enter the synchronized block
thread 2: gets the object with key A, enters a synchronized block
thread 2: removes the object from the map, exits synchronized block
thread 1: enters the synchronized block with the object that is no longer in the map
thread 3: gets a new object for key A (not the same object as thread 1 got)
thread 3: enters a synchronized block, while thread 1 also is in its synchronized block both using key A
This situation would not be possible if java entered the synchronized block directly after the call to get has returned. If not, does anyone have any input on how we could remove keys without having to worry about this race condition?

As I see it, the problem originates from the fact that you lock on map values, while in fact you need to lock on the key (or some derivation of it). If I understand correctly, you want to avoid 2 threads from running the critical section using the same key.
Is it possible for you to lock on the keys? Can you guarantee that you always use the same instance of the key?
A nice alternative:
Don't delete the locks at all. Use a ReferenceMap with weak values. This way, a map entry is removed only if it is not currently in use by any thread.
Note:
1) Now you will have to synchronize this map (using Collections.synchronizedMap(..)).
2) You also need to synchronize the code that generates/returns a value for a given key.

You have 2 options:
a. You could check the map once more inside the synchronized block.
Object o = map.get(k);
synchronized(o) {
if(map.get(k) != o) {
// object removed, handle...
}
}
b. you could extend your values to contain a flag indicating their status. when a value is removed from the map, you set a flag indicating that it was removed (within the sync block).
CacheValue v = map.get(k);
synchronized(v) {
if(v.isRemoved()) {
// object removed, handle...
}
}

The code as is, is thread safe. That being said, if you are removing from the CHM then any type of assumptions that are made when synchronizing on an object returned from the collection will be lost.
But is there a state in which the get-method has returned the object, but the thread has not yet entered the synchronized block. Allowing other threads to get the same object and lock on it.
Yes, but that happens any time you synchronize on an Object. What is guaranteed is that the other thread will not enter the synchronized block until the first one exits.
If not, does anyone have any input on how we could remove keys without having to worry about this race condition?
The only real way of ensuring this atomicity is to either synchronize on the CHM or another object (shared by all threads). The best way is to not remove from the CHM.

Thanks for all the great suggestions and ideas, really appreciate it! Eventually this discussion made me come up with a solution that does not use objects for locking.
Just a brief description of what we're actually doing.
We have a cache that receives data continuously from our environment. The cache has several 'buckets' for each key and aggregates events into the buckets as they come in. The events coming in have a key that determines the cache entry to be used, and a timestamp determining the bucket in the cache entry that should be incremented.
The cache also has an internal flush task that runs periodically. It iterates over all cache entries and flushes all buckets but the current one to the database.
Now the timestamps of the incoming data can be for any time in the past, but the majority of them are for very recent timestamps. So the current bucket will get more hits than buckets for previous time intervals.
Knowing this, I can demonstrate the race condition we had. All this code is for one single cache entry, since the issue was isolated to concurrent writing and flushing of single cache elements.
// buckets :: ConcurrentMap<Long, AtomicLong>
void incrementBucket(long timestamp, long value) {
long key = bucketKey(timestamp, LOG_BUCKET_INTERVAL);
AtomicLong bucket = buckets.get(key);
if (null == bucket) {
AtomicLong newBucket = new AtomicLong(0);
bucket = buckets.putIfAbsent(key, newBucket);
if (null == bucket) {
bucket = newBucket;
}
}
bucket.addAndGet(value);
}
Map<Long, Long> flush() {
long now = System.currentTimeMillis();
long nowKey = bucketKey(now, LOG_BUCKET_INTERVAL);
Map<Long, Long> flushedValues = new HashMap<Long, Long>();
for (Long key : new TreeSet<Long>(buckets.keySet())) {
if (key != nowKey) {
AtomicLong bucket = buckets.remove(key);
if (null != bucket) {
long databaseKey = databaseKey(key);
long n = bucket.get();
if (!flushedValues.containsKey(databaseKey)) {
flushedValues.put(databaseKey, n);
} else {
long sum = flushedValues.get(databaseKey) + n;
flushedValues.put(databaseKey, sum);
}
}
}
}
return flushedValues;
}
What could happen was: (fl = flush thread, it = increment thread)
it: enters incrementBucket, executes until just before the call to addAndGet(value)
fl: enters flush and iterates the buckets
fl: reaches the bucket that is being incremented
fl: removes it and calls bucket.get() and stores the value to the flushed values
it: increments the bucket (which will be lost now, because the bucket has been flushed and removed)
The solution:
void incrementBucket(long timestamp, long value) {
long key = bucketKey(timestamp, LOG_BUCKET_INTERVAL);
boolean done = false;
while (!done) {
AtomicLong bucket = buckets.get(key);
if (null == bucket) {
AtomicLong newBucket = new AtomicLong(0);
bucket = buckets.putIfAbsent(key, newBucket);
if (null == bucket) {
bucket = newBucket;
}
}
synchronized (bucket) {
// double check if the bucket still is the same
if (buckets.get(key) != bucket) {
continue;
}
done = true;
bucket.addAndGet(value);
}
}
}
Map<Long, Long> flush() {
long now = System.currentTimeMillis();
long nowKey = bucketKey(now, LOG_BUCKET_INTERVAL);
Map<Long, Long> flushedValues = new HashMap<Long, Long>();
for (Long key : new TreeSet<Long>(buckets.keySet())) {
if (key != nowKey) {
AtomicLong bucket = buckets.get(key);
if (null != bucket) {
synchronized(bucket) {
buckets.remove(key);
long databaseKey = databaseKey(key);
long n = bucket.get();
if (!flushedValues.containsKey(databaseKey)) {
flushedValues.put(databaseKey, n);
} else {
long sum = flushedValues.get(databaseKey) + n;
flushedValues.put(databaseKey, sum);
}
}
}
}
}
return flushedValues;
}
I hope this will be useful for others who might run into the same problem.
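As an aside, the same lost-update problem can be avoided without synchronizing on bucket objects if the map stores plain Long totals instead of AtomicLong instances. This is a different technique from the solution above, sketched here under the assumption of Java 8 or later; BucketCounter and flushAllExcept are hypothetical names. merge() and remove() on ConcurrentHashMap are each atomic per key, so an increment either lands before the removal and is flushed, or lands after it and starts a fresh bucket.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: immutable Long totals replace the AtomicLong buckets.
final class BucketCounter {
    private final ConcurrentHashMap<Long, Long> buckets = new ConcurrentHashMap<Long, Long>();

    void increment(long bucketKey, long value) {
        // Atomically creates the bucket or adds to the existing total.
        buckets.merge(bucketKey, value, Long::sum);
    }

    Map<Long, Long> flushAllExcept(long currentKey) {
        Map<Long, Long> flushed = new HashMap<Long, Long>();
        for (Long key : buckets.keySet()) {      // weakly consistent iteration
            if (key != currentKey) {             // numeric comparison after unboxing
                Long total = buckets.remove(key); // atomically takes the final total
                if (total != null) {
                    flushed.put(key, total);
                }
            }
        }
        return flushed;
    }
}
```

As with the retry loop above, a late increment for an old timestamp creates a new bucket that the next flush picks up, so the database side has to add to existing rows rather than overwrite them.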

The two code snippets you've provided are fine, as they are. What you've done is similar to how lazy instantiation with Guava's MapMaker.makeComputingMap() might work, but I see no problems with the way that the keys are lazily created.
You're right, by the way, that it's entirely possible for a thread to be preempted after the get() lookup of a lock object, but before entering synchronized.
My problem is with the third bullet point in your race condition description. You say:
thread 2: removes the object from the map, exits synchronized block
Which object, and which map? In general, I presumed that you were looking up a key to lock on, and then would be performing some other operations on other data structures, within the synchronized block. If you're talking about removing the lock object from the ConcurrentHashMap mentioned at the start, that's a massive difference.
And the real question is whether this is necessary at all. In a general purpose environment, I don't think there will be any memory issues with just remembering all of the lock objects for all the keys that have ever been looked up (even if those keys no longer represent live objects). It is much harder to come up with some way of safely disposing of an object that may be stored in a local variable of some other thread at any time, and if you do want to go down this route I have a feeling that performance will degrade to that of a single coarse lock around the key lookup.
If I've misunderstood what's going on there then feel free to correct me.
Edit: OK - in which case I stand by my above claim that the easiest way to do this is not remove the keys; this might not actually be as problematic as you think, since the rate at which the space grows will be very small. By my calculations (which may well be off, I'm not an expert in space calculations and your JVM may vary) the map grows by about 14Kb/hour. You'd have to have a year of continuous uptime before this map used up 100MB of heap space.
But let's assume that the keys really do need to be removed. This poses the problem that you can't remove a key until you know that no threads are using it. This leads to the chicken-and-egg problem that you'll require all threads to synchronize on something else in order to get atomicity (of checking) and visibility across threads, which then means that you can't do much else than slap a single synchronized block around the whole thing, completely subverting your lock striping strategy.
Let's revisit the constraints. The main thing here is that things get cleared up eventually. It's not a correctness constraint but just a memory issue. Hence what we really want to do is identify some point at which the key could definitely no longer be used, and then use this as the trigger to remove it from the map. There are two cases here:
You can identify such a condition, and logically test for it. In which case you can remove the keys from the map with (in the worst case) some kind of timer thread, or hopefully some logic that's more cleanly integrated with your application.
You cannot identify any condition by which you know that a key will no longer be used. In this case, by definition, there is no point at which it's safe to remove the keys from the map. So in fact, for correctness' sake, you must leave them in.
In any case, this effectively boils down to manual garbage collection. Remove the keys from the map when you can lazily determine that they're no longer going to be used. Your current solution is too eager here since (as you point out) it's doing the removal before this situation holds.
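One way to make "no longer going to be used" testable is to count the threads currently interested in each lock and remove the entry only when that count drops to zero. The sketch below assumes Java 8 or later and is not taken from the question's code; KeyedLocks, withLock and activeKeys are hypothetical names. The counter is only ever changed inside ConcurrentHashMap.compute(), which is atomic per key, so a lock object cannot be removed while another thread has registered for it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of reference-counted per-key locks.
final class KeyedLocks<K> {
    private static final class Entry {
        final ReentrantLock lock = new ReentrantLock();
        int users; // only touched inside compute(), which is atomic per key
    }

    private final ConcurrentHashMap<K, Entry> entries = new ConcurrentHashMap<K, Entry>();

    void withLock(K key, Runnable action) {
        // Register interest before locking, so the entry cannot vanish in between.
        Entry entry = entries.compute(key, (k, current) -> {
            Entry e = (current == null) ? new Entry() : current;
            e.users++;
            return e;
        });
        entry.lock.lock();
        try {
            action.run();
        } finally {
            entry.lock.unlock();
            // Returning null from compute() removes the mapping once nobody uses it.
            entries.compute(key, (k, current) -> (--current.users == 0) ? null : current);
        }
    }

    int activeKeys() {
        return entries.size();
    }
}
```

The map then holds only keys that some thread is working on at that moment, at the cost of two compute() calls per critical section.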
