java.util.concurrent: external synchronize to remove map value

I need to keep track of multiple values against unique keys i.e. 1(a,b) 2(c,d) etc...
The solution is accessed by multiple threads so effectively I have the following defined;
ConcurrentSkipListMap<key, ConcurrentSkipListSet<values>>
My question is: does the removal of the key when the value set size is 0 need to be synchronized? I know that the two classes are "concurrent" and I've looked through the OpenJDK source code, but there would appear to be a window between one thread T1 checking that the Set is empty and removing the Map entry in remove(...) and another thread T2 calling add(...). The result is that T1 removes the last Set entry and removes the Map entry, interleaved with T2 just adding a Set entry. Thus the Map entry and T2's Set entry are removed by T1 and data is lost.
Do I just "synchronize" the add() and remove() methods or is there a "better" way?
The Map is modified by multiple threads but only through two methods.
Code snippet as follows;
protected static class EndpointSet extends U4ConcurrentSkipListSet<U4Endpoint> {
private static final long serialVersionUID = 1L;
public EndpointSet() {
super();
}
}
protected static class IDToEndpoint extends U4ConcurrentSkipListMap<String, EndpointSet> {
private static final long serialVersionUID = 1L;
protected Boolean add(String id, U4Endpoint endpoint) {
EndpointSet endpoints = get(id);
if (endpoints == null) {
endpoints = new EndpointSet();
put(id, endpoints);
}
endpoints.add(endpoint);
return true;
}
protected Boolean remove(String id, U4Endpoint endpoint) {
EndpointSet endpoints = get(id);
if (endpoints == null) {
return false;
} else {
endpoints.remove(endpoint);
if (endpoints.size() == 0) {
remove(id);
}
return true;
}
}
}

As it stands, your code has race conditions. Examples of what could happen:
a thread could add between if (endpoints.size() == 0) and remove(id); - you saw that
in add, a thread could read a non null value in EndpointSet endpoints = get(id); and another thread could remove data from that set, remove the set from the map because the set is empty. The initial thread would then add a value to the set, which is not held in the map any longer => data gets lost too as it becomes unreachable.
The easiest way to solve your issue is to make both add and remove synchronized. But you then lose all the performance benefits of using a ConcurrentMap.
Alternatively, you could simply leave the empty sets in the map - unless you have memory constraints. You would still need some form of synchronization but it would be easier to optimise.
If contention (performance) is an issue, you could try a more fine grained locking strategy by synchronizing on the keys or values but it could be quite tricky (and locking on Strings is not such a good idea because of String pooling).
It seems that in all cases, you could use a non concurrent set as you will need to synchronize it externally yourself.
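If the sort order of the skip-list map is not needed, one way to close both windows is to let the map itself run the check-and-modify step atomically. The sketch below is not the asker's code: it swaps ConcurrentSkipListMap for ConcurrentHashMap, whose compute and computeIfPresent are documented to run atomically per key, and the names AtomicMultiMap and snapshot are made up for illustration. The inner sets are plain HashSets because they are only ever touched inside those atomic callbacks.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of the question's code base.
final class AtomicMultiMap<K, V> {
    private final ConcurrentHashMap<K, Set<V>> map = new ConcurrentHashMap<>();

    // Creates the set if needed and adds the value in one atomic step.
    boolean add(K key, V value) {
        boolean[] added = new boolean[1];
        map.compute(key, (k, set) -> {
            if (set == null) {
                set = new HashSet<>();
            }
            added[0] = set.add(value);
            return set;
        });
        return added[0];
    }

    // Removes the value and, if the set became empty, the map entry,
    // in one atomic step (returning null from the callback removes the entry).
    boolean remove(K key, V value) {
        boolean[] removed = new boolean[1];
        map.computeIfPresent(key, (k, set) -> {
            removed[0] = set.remove(value);
            return set.isEmpty() ? null : set;
        });
        return removed[0];
    }

    // Readers get a copy taken inside the atomic callback, never the live set.
    Set<V> snapshot(K key) {
        Set<V> copy = new HashSet<>();
        map.computeIfPresent(key, (k, set) -> {
            copy.addAll(set);
            return set;
        });
        return copy;
    }

    boolean containsKey(K key) {
        return map.containsKey(key);
    }
}
```

The trade-off is that writers to the same key are serialized by the map and readers pay for a copy, which is usually acceptable when the sets are small.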

Related

Missing updates with locks and ConcurrentHashMap

I have a scenario where I have to maintain a Map which can be populated by multiple threads, each modifying their respective List (unique identifier/key being the thread name), and when the list size for a thread exceeds a fixed batch size, we have to persist the records to the database.
Aggregator class
private volatile ConcurrentHashMap<String, List<T>> instrumentMap = new ConcurrentHashMap<String, List<T>>();
private ReentrantLock lock ;
public void addAll(List<T> entityList, String threadName) {
try {
lock.lock();
List<T> instrumentList = instrumentMap.get(threadName);
if(instrumentList == null) {
instrumentList = new ArrayList<T>(batchSize);
instrumentMap.put(threadName, instrumentList);
}
if(instrumentList.size() >= batchSize -1){
instrumentList.addAll(entityList);
recordSaver.persist(instrumentList);
instrumentList.clear();
} else {
instrumentList.addAll(entityList);
}
} finally {
lock.unlock();
}
}
There is one more separate thread that runs every 2 minutes (using the same lock) to persist all the records in the Map (to make sure we have something persisted every 2 minutes and the map does not get too big)
if(//Some condition) {
Thread.sleep(//2 minutes);
aggregator.getLock().lock();
List<T> instrumentList = instrumentMap.values().stream().flatMap(x->x.stream()).collect(Collectors.toList());
if(instrumentList.size() > 0) {
saver.persist(instrumentList);
instrumentMap .values().parallelStream().forEach(x -> x.clear());
aggregator.getLock().unlock();
}
}
This solution works fine in almost every scenario that we tested, except that sometimes some of the records go missing, i.e. they are not persisted at all, although they were added to the Map correctly.
My questions are:
What is the problem with this code?
Is ConcurrentHashMap not the best solution here?
Does the List that is used with the ConcurrentHashMap have an issue?
Should I use the compute method of ConcurrentHashMap here (no need I think, as ReentrantLock is already doing the same job)?

The answer provided by @Slaw in the comments did the trick. We were letting the instrumentList instance escape in a non-synchronized way, i.e. access to and operations on the list were happening without any synchronization. Passing a copy to the downstream methods fixed it.
Following line of code is the one where this issue was happening
recordSaver.persist(instrumentList);
instrumentList.clear();
Here we are allowing the instrumentList instance to escape in a non-synchronized way: it is passed to another class (recordSaver.persist), where it is acted on, but we also clear the list on the very next line (in the Aggregator class), and none of this is synchronized. The state of the list can't be predicted in the record saver... a really stupid mistake.
We fixed the issue by passing a cloned copy of instrumentList to the recordSaver.persist(...) method. This way instrumentList.clear() has no effect on the list that recordSaver keeps working on.
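A minimal sketch of that fix, assuming the saver may keep using the list after persist returns. BatchBuffer and the Consumer standing in for recordSaver.persist are hypothetical names, not the original Aggregator; the point is only that the copy is taken and the shared buffer cleared while the lock is held, and the saver only ever sees the private copy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the Aggregator in the question.
final class BatchBuffer<T> {
    private final List<T> buffer = new ArrayList<>();
    private final int batchSize;
    private final Consumer<List<T>> saver; // stands in for recordSaver.persist

    BatchBuffer(int batchSize, Consumer<List<T>> saver) {
        this.batchSize = batchSize;
        this.saver = saver;
    }

    void addAll(List<T> items) {
        List<T> batch = null;
        synchronized (buffer) {
            buffer.addAll(items);
            if (buffer.size() >= batchSize) {
                // Copy first, then clear: the copy never changes afterwards.
                batch = new ArrayList<>(buffer);
                buffer.clear();
            }
        }
        if (batch != null) {
            // Handing over the private copy outside the lock keeps slow I/O
            // from blocking other producers.
            saver.accept(batch);
        }
    }
}
```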

I see that you are using ConcurrentHashMap's parallelStream within a lock. I am not knowledgeable about Java 8+ stream support, but a quick search shows that:
ConcurrentHashMap is a complex data structure that has had concurrency bugs in the past
Parallel streams must abide by complex and poorly documented usage restrictions
You are modifying your data within a parallel stream
Based on that information (and my gut-driven concurrency bugs detector™), I would guess that removing the call to parallelStream might improve the robustness of your code. In addition, as mentioned by @Slaw, you should use an ordinary HashMap in place of ConcurrentHashMap if all instrumentMap usage is already guarded by the lock.
Of course, since you don't post the code of recordSaver, it is possible that it has bugs too (and not necessarily concurrency-related ones). In particular, you should make sure that the code that reads records from persistent storage (the one you are using to detect loss of records) is safe, correct, and properly synchronized with the rest of your system (preferably by using a robust, industry-standard SQL database).

It looks like this was an attempt at optimization where it was not needed. In that case, less is more and simpler is better. In the code below, only two concepts for concurrency are used: synchronized to ensure a shared list is properly updated and final to ensure all threads see the same value.
import java.util.ArrayList;
import java.util.List;
public class Aggregator<T> implements Runnable {
private final List<T> instruments = new ArrayList<>();
private final RecordSaver recordSaver;
private final int batchSize;
public Aggregator(RecordSaver recordSaver, int batchSize) {
super();
this.recordSaver = recordSaver;
this.batchSize = batchSize;
}
public synchronized void addAll(List<T> moreInstruments) {
instruments.addAll(moreInstruments);
if (instruments.size() >= batchSize) {
storeInstruments();
}
}
public synchronized void storeInstruments() {
if (instruments.size() > 0) {
// in case recordSaver works async
// recordSaver.persist(new ArrayList<T>(instruments));
// else just:
recordSaver.persist(instruments);
instruments.clear();
}
}
@Override
public void run() {
while (true) {
try { Thread.sleep(1L); } catch (Exception ignored) {
break;
}
storeInstruments();
}
}
class RecordSaver {
void persist(List<?> l) {}
}
}

Delegating thread-safety to ConcurrentMap and AtomicInteger

I need to provide thread-safe implementation of the following container:
public interface ParameterMetaData<ValueType> {
public String getName();
}
public interface Parameters {
public <M> M getValue(ParameterMetaData<M> pmd);
public <M> void put(ParameterMetaData<M> p, M value);
public int size();
}
The thing is that the size method should return the accurate number of parameters currently contained in a Parameters instance. So, my first attempt was to try delegating thread-safety as follows:
public final class ConcurrentParameters implements Parameters{
private final ConcurrentMap<ParameterMetaData<?>, Object> parameters =
new ConcurrentHashMap<>();
//Should represent the ACCURATE size of the internal map
private final AtomicInteger size = new AtomicInteger();
@Override
public <M> M getValue(ParameterMetaData<M> pmd) {
@SuppressWarnings("unchecked")
M value = (M) parameters.get(pmd);
return value;
}
@Override
public <M> void put(ParameterMetaData<M> p, M value){
if(value == null)
return;
//The problem is in the code below
Object previous = parameters.putIfAbsent(p, value);
if(previous != null)
throw new IllegalStateException("Parameter already exists: " + p.getName());
size.incrementAndGet();
}
@Override
public int size() {
return size.intValue();
}
}
The problem is that I can't just call parameters.size() on the ConcurrentHashMap instance to return the actual size, as that operation performs a traversal without locking and there's no guarantee that it will retrieve the actual size. That isn't acceptable in my case. So, I decided to maintain a field containing the size.
QUESTION: Is it possible somehow to delegate thread safety and preserve the invariants?

The outcome you want to achieve is not atomic. You want to modify the map and then get a count of elements that is consistent within the scope of a single thread. The only way to achieve that is to make this flow an atomic operation by synchronizing access to the map. This is the only way to ensure that the count will not change due to modifications made in another thread.
Synchronize the modify-and-count access to the map via synchronized or a Semaphore so that only a single thread can modify the map and count elements at a time.
Using an additional field as a counter does not guarantee thread safety here: after the map modification and before the counter update, another thread can modify the map, and the counter value will no longer be valid.
This is the reason why the map does not keep its size internally but has to traverse the elements: to give the most accurate result at a given point in time.
EDIT:
To be 100% clear, this is the most convenient way to achieve this:
synchronized(yourMap){
doSomethingWithTheMap();
yourMap.size();
}
So if you wrap every map operation in such a block, you guarantee that size() returns an accurate count of elements. The only condition is that all data manipulation is done inside such a synchronized block.
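Applied to the container from the question, that advice could look roughly like the sketch below. It is simplified on purpose: keys are plain String names instead of ParameterMetaData, and the class name SynchronizedParameters is an assumption for illustration. Every method locks the same monitor, so the map's own size() is always consistent with the puts that have completed and no separate counter is needed.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical variant of the question's container.
final class SynchronizedParameters {
    private final Map<String, Object> values = new HashMap<>();

    // Insert-only put: a second put for the same name is rejected.
    synchronized void put(String name, Object value) {
        if (value == null) {
            return;
        }
        if (values.putIfAbsent(name, value) != null) {
            throw new IllegalStateException("parameter already exists: " + name);
        }
    }

    synchronized Object getValue(String name) {
        return values.get(name);
    }

    // Runs under the same lock as put, so no modification can interleave.
    synchronized int size() {
        return values.size();
    }
}
```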

Updating highly read Lists/Maps in a concurrent environment

The following class acts as a simple cache that gets updated very infrequently (say e.g. twice a day) and gets read quite a lot (up to several times a second). There are two different types, a List and a Map. My question is about the new assignment after the data gets updated in the update method. What's the best (safest) way for the new data to get applied?
I should add that it isn't necessary for readers to see the absolute latest value. The requirements are just to get either the old or the new value at any given time.
public class Foo {
private ThreadPoolExecutor _executor;
private List<Object> _listObjects = new ArrayList<Object>(0);
private Map<Integer, Object> _mapObjects = new HashMap<Integer, Object>();
private Object _mutex = new Object();
private boolean _updateInProgress;
public void update() {
synchronized (_mutex) {
if (_updateInProgress) {
return;
} else {
_updateInProgress = true;
}
}
_executor.execute(new Runnable() {
@Override
public void run() {
try {
List<Object> newObjects = loadListObjectsFromDatabase();
Map<Integer, Object> newMapObjects = loadMapObjectsFromDatabase();
/*
* this is the interesting part
*/
_listObjects = newObjects;
_mapObjects = newMapObjects;
} catch (final Exception ex) {
// error handling
} finally {
synchronized (_mutex) {
_updateInProgress = false;
}
}
}
});
}
public Object getObjectById(Integer id) {
return _mapObjects.get(id);
}
public List<Object> getListObjects() {
return new ArrayList<Object>(_listObjects);
}
}
As you see, currently no ConcurrentHashMap or CopyOnWriteArrayList is used. The only synchronisation is done in the update method.
Although not necessary for my current problem, it would be also great to know the best solution for cases where it is essential for readers to always get the absolute latest value.

You could use plain synchronization unless you are reading over 10,000 times per second.
If you want concurrent access I would use one of the concurrent collections like ConcurrentHashMap or CopyOnWriteArrayList. These are simpler to use than synchronizing the collection yourself (i.e. you don't need them for performance reasons; use them for simplicity).
BTW: A modern CPU can perform billions of operations in 0.1 seconds, so several times a second is an eternity to a computer.
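For the "old or new value, never a half-built one" requirement there is also a third route that neither suggestion above spells out: publish an immutable snapshot through a single volatile field. This is a sketch of that alternative technique, not something the asker's code does; SnapshotCache, Snapshot and replace are hypothetical names. Because list and map live in one holder object, a reader can never combine the new list with the old map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cache that swaps both collections in one volatile write.
final class SnapshotCache {
    private static final class Snapshot {
        final List<Object> list;
        final Map<Integer, Object> byId;

        Snapshot(List<Object> list, Map<Integer, Object> byId) {
            // Defensive copies wrapped as read-only views: nothing can
            // mutate a snapshot after it has been published.
            this.list = Collections.unmodifiableList(new ArrayList<>(list));
            this.byId = Collections.unmodifiableMap(new HashMap<>(byId));
        }
    }

    // The volatile write in replace() makes the fully built snapshot
    // visible to all reader threads.
    private volatile Snapshot current =
            new Snapshot(Collections.emptyList(), Collections.emptyMap());

    void replace(List<Object> newList, Map<Integer, Object> newMap) {
        current = new Snapshot(newList, newMap);
    }

    Object getObjectById(Integer id) {
        return current.byId.get(id);
    }

    List<Object> getListObjects() {
        return current.list; // already unmodifiable, no copy needed
    }
}
```

Readers never block, and a reader that needs list and map to agree with each other would read the current snapshot once and use both fields from it.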

I am also seeing this issue and can think of multiple solutions:
Use a synchronized block around both pieces of code, the one that reads and the one that writes.
Keep a separate removal list and add all items to be removed to it. Perform the removals in the same thread that reads the list, right after the read is done. This way reading and deleting happen in sequence and no error occurs.

Java threads locking on a specific object

I have a web application and I am using Oracle database and I have a method basically like this:
public static void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
if (!methodThatChecksThatObjectAlreadyExists) {
storemyObject() //pseudo code
}
// Have to do a lot other saving stuff, because it either saves everything or nothing
commit() // pseudo code to actually commit all my changes to the database.
}
Right now there is no synchronization of any kind so n threads can of course access this method freely, the problem arises when 2 threads enter this method both check and of course there is nothing just yet, and then they can both commit the transaction, creating a duplicate object.
I do not want to solve this with a unique key identifier in my Database, because I don't think I should be catching that SQLException.
I also cannot check right before the commit, because there are several checks not only 1, which would take a considerable amount of time.
My experience with locks and threads is limited, but my idea is basically to lock this code on the object that it is receiving. Say I receive an Integer object and I lock on my Integer with value 1: would that only prevent threads with another Integer with value 1 from entering, while all the other threads with value != 1 can enter freely? Is this how it works?
Also, if this is how it works, how is the lock object compared? How is it determined that they are in fact the same object? A good article on this would also be appreciated.
How would you solve this?

Your idea is a good one. This is the simplistic/naive version, but it's unlikely to work:
public static void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
synchronized (theObjectIwantToSave) {
if (!methodThatChecksThatObjectAlreadyExists) {
storemyObject() //pseudo code
}
// Have to do a lot other saving stuff, because it either saves everything or nothing
commit() // pseudo code to actually commit all my changes to the database.
}
}
This code uses the object itself as the lock. But it has to be the same object (ie objectInThreadA == objectInThreadB) if it's to work. If two threads are operating on an object that is a copy of each other - ie has the same "id" for example, then you'll need to either synchronize the whole method:
public static synchronized void saveSomethingImportantToDataBase(Object theObjectIwantToSave) ...
which will of course greatly reduce concurrency (throughput will drop to one thread at a time using the method - to be avoided).
Or find a way to get the same lock object based on the save object, like this approach:
private static final ConcurrentHashMap<Object, Object> LOCKS = new ConcurrentHashMap<Object, Object>();
public static void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
// computeIfAbsent (Java 8+) returns the existing lock or the newly created one;
// putIfAbsent would return null the first time a key is inserted
synchronized (LOCKS.computeIfAbsent(theObjectIwantToSave.getId(), id -> new Object())) {
....
}
LOCKS.remove(theObjectIwantToSave.getId()); // Clean up lock object to stop memory leak
}
This last version is the recommended one: it will ensure that two save objects that share the same "id" are locked with the same lock object. The method ConcurrentHashMap.computeIfAbsent() is thread-safe and always returns the lock currently mapped to the id (putIfAbsent() returns null when the key was absent, which would make synchronized throw a NullPointerException), so "this will work", and it requires only that objectInThreadA.getId().equals(objectInThreadB.getId()) to work properly. Also, the datatype of getId() can be anything, including primitives (eg int) due to java's autoboxing.
If you override equals() and hashCode() for your object, then you could use the object itself instead of object.getId(), and that would be an improvement (Thanks @TheCapn for pointing this out)
This solution will only work within one JVM. If your servers are clustered, that's a whole different ball game and java's locking mechanism will not help you. You'll have to use a clustered locking solution, which is beyond the scope of this answer.

Here is an option adapted from And360's comment on Bohemian's answer, that tries to avoid race conditions, etc. Though I prefer my other answer to this question over this one, slightly:
import java.util.HashMap;
import java.util.concurrent.atomic.AtomicInteger;
// there is no advantage in using ConcurrentHashMap, since we synchronize access to it
// (we need to in order to "get" the lock and increment/decrement it safely)
// AtomicInteger is just a mutable int value holder
// we don't actually need it to be atomic
static final HashMap<Object, AtomicInteger> locks = new HashMap<Object, AtomicInteger>();
public static void saveSomethingImportantToDataBase(Object objectToSave) {
AtomicInteger lock;
synchronized (locks) {
lock = locks.get(objectToSave.getId());
if (lock == null) {
lock = new AtomicInteger(1);
locks.put(objectToSave.getId(), lock);
}
else
lock.incrementAndGet();
}
try {
synchronized (lock) {
// do synchronized work here (synchronized by objectToSave's id)
}
} finally {
synchronized (locks) {
lock.decrementAndGet();
if (lock.get() == 0)
locks.remove(objectToSave.getId());
}
}
}
You could split these out into helper methods "get lock object" and "release lock" or what not, as well, to cleanup the code. This way feels a little more kludgey than my other answer.
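Split into helpers, the same reference-counting idea might look like the sketch below. RefCountedLocks, acquire, release and withLock are names invented here, and a plain int counter replaces the AtomicInteger since it is only touched while holding the map's monitor. An entry is removed only when the last user releases it, so no thread can be handed a fresh lock object while another thread still holds or waits on the old one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper wrapping the "get lock object" / "release lock" steps.
final class RefCountedLocks<K> {
    private static final class Entry {
        int users; // guarded by the monitor of 'locks'
    }

    private final Map<K, Entry> locks = new HashMap<>();

    private Entry acquire(K key) {
        synchronized (locks) {
            Entry entry = locks.computeIfAbsent(key, k -> new Entry());
            entry.users++;
            return entry;
        }
    }

    private void release(K key, Entry entry) {
        synchronized (locks) {
            if (--entry.users == 0) {
                locks.remove(key); // last user gone: safe to drop the entry
            }
        }
    }

    // Runs 'work' while holding the lock that belongs to 'key'.
    void withLock(K key, Runnable work) {
        Entry entry = acquire(key);
        try {
            synchronized (entry) {
                work.run();
            }
        } finally {
            release(key, entry);
        }
    }

    // Number of keys that currently have at least one user.
    int size() {
        synchronized (locks) {
            return locks.size();
        }
    }
}
```

A caller would then write something like locks.withLock(objectToSave.getId(), () -> save(objectToSave)), where save is whatever work needs to be serialized per id.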

Bohemian's answer seems to have race condition problems if one thread is in the synchronized section while another thread removes the synchro-object from the Map, etc. So here is an alternative that leverages WeakRef's.
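// there is no synchronized weak hash map, apparently
// and Collections.synchronizedMap has no putIfAbsent method, so we use synchronized(locks) down below
// (needs java.util.WeakHashMap and java.lang.ref.WeakReference)
// the values are wrapped in WeakReference because WeakHashMap holds its values strongly:
// a value that referenced its own key directly would keep the entry alive forever
WeakHashMap<Integer, WeakReference<Integer>> locks = new WeakHashMap<>();
public void saveSomethingImportantToDataBase(DatabaseObject objectToSave) {
Integer lock;
synchronized (locks) {
WeakReference<Integer> ref = locks.get(objectToSave.getId());
lock = (ref == null) ? null : ref.get();
if (lock == null) {
// deliberately a fresh instance: Integer.valueOf may return a shared cached object
lock = new Integer(objectToSave.getId());
locks.put(lock, new WeakReference<>(lock));
}
}
synchronized (lock) {
// synchronized work here (synchronized by objectToSave's id)
}
// no releasing needed, weakref does that for us, we're done!
}
And a more concrete example of how to use the above style system:
static WeakHashMap<Integer, WeakReference<Integer>> locks = new WeakHashMap<>();
static Object getSyncObjectForId(int id) {
synchronized (locks) {
WeakReference<Integer> ref = locks.get(id);
Integer lock = (ref == null) ? null : ref.get();
if (lock == null) {
lock = new Integer(id);
locks.put(lock, new WeakReference<>(lock));
}
return lock;
}
}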
Then use it elsewhere like this:
...
synchronized (getSyncObjectForId(id)) {
// synchronized work here
}
...
The reason this works is basically that if two objects with matching keys enter the critical block, the second will retrieve the lock the first is already using (or the one that is left behind and hasn't been GC'ed yet). However if it is unused, both will have left the method behind and removed their references to the lock object, so it is safely collected.
If you have a limited "known size" of synchronization points you want to use (one that doesn't have to decrease in size eventually), you could probably avoid using a HashMap and use a ConcurrentHashMap instead, with its putIfAbsent method which might be easier to understand.
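For that bounded case, a sketch of the ConcurrentHashMap variant follows. KeyLocks and lockFor are hypothetical names, and locks are never removed, which is only reasonable under the "known size" assumption stated above. The detail that is easy to get wrong is that putIfAbsent returns null when it inserted the new value, so the result has to be checked before it is used as a lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical per-key lock registry; entries live as long as the registry.
final class KeyLocks<K> {
    private final ConcurrentMap<K, Object> locks = new ConcurrentHashMap<>();

    Object lockFor(K key) {
        Object fresh = new Object();
        // Returns the lock that was already mapped, or null if 'fresh' won.
        Object existing = locks.putIfAbsent(key, fresh);
        return existing != null ? existing : fresh;
    }
}
```

Usage would be synchronized (keyLocks.lockFor(id)) { ... }; any two keys that are equal according to equals() and hashCode() get the same lock object.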

My opinion is you are not struggling with a real threading problem.
You would be better off letting the DBMS automatically assign a non conflicting row id.
If you need to work with existing row ids store them as thread local variables.
If there is no need for shared data do not share data between threads.
http://download.oracle.com/javase/6/docs/api/java/lang/ThreadLocal.html
An Oracle DBMS is much better at keeping the data consistent than an application server or a web container.
"Many database systems automatically generate a unique key field when a row is inserted. Oracle Database provides the same functionality with the help of sequences and triggers. JDBC 3.0 introduces the retrieval of auto-generated keys feature that enables you to retrieve such generated values. In JDBC 3.0, the following interfaces are enhanced to support the retrieval of auto-generated keys feature ...."
http://download.oracle.com/docs/cd/B19306_01/java.102/b14355/jdbcvers.htm#CHDEGDHJ

If you can live with occasional over-synchronization (ie. work done sequentially when not needed) try this:
Create a table with lock objects. The bigger the table, the smaller the chance of over-synchronization.
Apply some hashing function to your id to compute table index. If your id is numeric, you can just use a remainder (modulo) function, if it is a String, use hashCode() and a remainder.
Get a lock from the table and synchronize on it.
An IdLock class:
public class IdLock {
private Object[] locks = new Object[10000];
public IdLock() {
for (int i = 0; i < locks.length; i++) {
locks[i] = new Object();
}
}
public Object getLock(int id) {
// floorMod keeps the index non-negative even for negative ids
int index = Math.floorMod(id, locks.length);
return locks[index];
}
}
and its use:
private final IdLock idLock = new IdLock();
public void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
synchronized (idLock.getLock(theObjectIwantToSave.getId())) {
// synchronized work here
}
}
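The String case mentioned in the steps above (hashCode() plus a remainder) can be covered by the same table if the index is derived from the key's hash code. A sketch under that assumption; StripedLocks and lockFor are hypothetical names, and Math.floorMod is used because hashCode() can be negative.

```java
// Hypothetical generalisation of IdLock to arbitrary keys.
final class StripedLocks {
    private final Object[] stripes;

    StripedLocks(int stripeCount) {
        if (stripeCount <= 0) {
            throw new IllegalArgumentException("stripeCount must be positive");
        }
        stripes = new Object[stripeCount];
        for (int i = 0; i < stripes.length; i++) {
            stripes[i] = new Object();
        }
    }

    // Equal keys always map to the same stripe; unequal keys may share one,
    // which is the occasional over-synchronization described above.
    Object lockFor(Object key) {
        return stripes[Math.floorMod(key.hashCode(), stripes.length)];
    }
}
```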

public static void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
synchronized (theObjectIwantToSave) {
if (!methodThatChecksThatObjectAlreadyExists) {
storemyObject() //pseudo code
}
// Have to do a lot other saving stuff, because it either saves everything or nothing
commit() // pseudo code to actually commit all my changes to the database.
}
}
The synchronized keyword acquires the monitor of the object you pass to it, so no other thread can enter a block synchronized on that same object instance until the first thread leaves it.

I don't think you have any choice but to take one of the solutions that you do not seem to want to do.
In your case, I don't think any type of synchronization on the objectYouWantToSave is going to work since they are based on web requests. Therefore each request (on its own thread) is most likely going to have its own instance of the object. Even though they might be considered logically equal, that doesn't matter for synchronization.

The synchronized keyword (or another synchronization mechanism) is a must, but it is not enough for your problem. You should use a data structure to record which integer values have already been used. In this example a HashSet is used. Do not forget to clean old records out of the HashSet.
private static HashSet<Integer> isUsed = new HashSet<Integer>();
public synchronized static void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
if (!isUsed.contains(theObjectIwantToSave.your_integer_value)) {
if (!methodThatChecksThatObjectAlreadyExists) {
storemyObject() //pseudo code
}
// Have to do a lot other saving stuff, because it either saves everything or nothing
commit() // pseudo code to actually commit all my changes to the database.
isUsed.add(theObjectIwantToSave.your_integer_value);
}
}

To answer your question about locking the Integer, the short answer is NO - it won't prevent threads with another Integer instance with the same value from entering. The long answer: depends on how you obtain the Integer - by constructor, by reusing some instances or by valueOf (that uses some caching). Anyway, I wouldn't rely on it.
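The identity rule can be checked directly with Thread.holdsLock, which reports whether the current thread owns the monitor of a given object. The helper below is a hypothetical demo (LockIdentityDemo and alsoLocks are made-up names): it locks one object and asks whether that also locks another.

```java
// Hypothetical demo class; only the JDK's Thread.holdsLock is real API.
final class LockIdentityDemo {
    // True only if 'held' and 'other' are the very same object instance.
    static boolean alsoLocks(Object held, Object other) {
        synchronized (held) {
            return Thread.holdsLock(other);
        }
    }
}
```

Two strings created with new String("id-1") are equal but distinct instances, so alsoLocks returns false for them. Integer.valueOf(1) called twice returns the same cached instance (the cache is guaranteed for values from -128 to 127), so alsoLocks returns true there, while for larger values the result depends on the JVM, which is exactly why relying on it is fragile.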
A solution that will work is to make the method synchronized:
public static synchronized void saveSomethingImportantToDataBase(Object theObjectIwantToSave) {
if (!methodThatChecksThatObjectAlreadyExists) {
storemyObject() //pseudo code
}
// Have to do a lot other saving stuff, because it either saves everything or nothing
commit() // pseudo code to actually commit all my changes to the database.
}
This is probably not the best solution performance-wise, but it is guaranteed to work (note, if you are not in a clustered environment) until you find a better solution.

private static final Set<Object> lockedObjects = new HashSet<>();
private void lockObject(Object dbObject) throws InterruptedException {
synchronized (lockedObjects) {
while (!lockedObjects.add(dbObject)) {
lockedObjects.wait();
}
}
}
private void unlockObject(Object dbObject) {
synchronized (lockedObjects) {
lockedObjects.remove(dbObject);
lockedObjects.notifyAll();
}
}
public void saveSomethingImportantToDatabase(Object theObjectIwantToSave) throws InterruptedException {
// acquire before the try block so that an interrupted lockObject never triggers unlockObject
lockObject(theObjectIwantToSave);
try {
if (!methodThatChecksThatObjectAlreadyExists(theObjectIwantToSave)) {
storeMyObject(theObjectIwantToSave);
}
commit();
} finally {
unlockObject(theObjectIwantToSave);
}
}
You must correctly override methods 'equals' and 'hashCode' for your objects' classes. If you have unique id (String or Number) inside your object then you can just check this id instead of the whole object and no need to override 'equals' and 'hashCode'.
try-finally - is very important - you must guarantee to unlock waiting threads after your operation even if your operation threw exception.
This approach will not work if your back-end is distributed across multiple servers.

Java synchronized block using method call to get synch object

We are writing some locking code and have run into a peculiar question. We use a ConcurrentHashMap for fetching instances of Object that we lock on. So our synchronized blocks look like this
synchronized(locks.get(key)) { ... }
We have overridden the get method of ConcurrentHashMap to make it always return a new object if it did not contain one for the key.
@Override
public Object get(Object key) {
Object o = super.get(key);
if (null == o) {
Object no = new Object();
o = putIfAbsent((K) key, no);
if (null == o) {
o = no;
}
}
return o;
}
But is there a state in which the get method has returned the object but the thread has not yet entered the synchronized block, allowing other threads to get the same object and lock on it?
We have a potential race condition where
thread 1: gets the object with key A, but does not enter the synchronized block
thread 2: gets the object with key A, enters a synchronized block
thread 2: removes the object from the map, exits synchronized block
thread 1: enters the synchronized block with the object that is no longer in the map
thread 3: gets a new object for key A (not the same object as thread 1 got)
thread 3: enters a synchronized block, while thread 1 also is in its synchronized block both using key A
This situation would not be possible if java entered the synchronized block directly after the call to get has returned. If not, does anyone have any input on how we could remove keys without having to worry about this race condition?

As I see it, the problem originates from the fact that you lock on map values, while in fact you need to lock on the key (or some derivation of it). If I understand correctly, you want to avoid 2 threads from running the critical section using the same key.
Is it possible for you to lock on the keys? can you guarantee that you always use the same instance of the key?
A nice alternative:
Don't delete the locks at all. Use a ReferenceMap with weak values. This way, a map entry is removed only if it is not currently in use by any thread.
Note:
1) Now you will have to synchronize this map (using Collections.synchronizedMap(..)).
2) You also need to synchronize the code that generates/returns a value for a given key.

you have 2 options:
a. you could check the map once inside the synchronized block.
Object o = map.get(k);
synchronized(o) {
if(map.get(k) != o) {
// object removed, handle...
}
}
b. you could extend your values to contain a flag indicating their status. when a value is removed from the map, you set a flag indicating that it was removed (within the sync block).
CacheValue v = map.get(k);
synchronized(v) {
if(v.isRemoved()) {
// object removed, handle...
}
}
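Option (b) could be fleshed out along these lines. This is a sketch with invented names (FlaggedCounters, Cell, drain), not the asker's cache: the remover sets the flag and removes the map entry while holding the value's monitor, and a writer that finds the flag set simply retries with a fresh value.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical counter cache illustrating the "removed" flag.
final class FlaggedCounters {
    private static final class Cell {
        long total;      // guarded by the cell's own monitor
        boolean removed; // guarded by the cell's own monitor
    }

    private final ConcurrentMap<String, Cell> cells = new ConcurrentHashMap<>();

    void add(String key, long amount) {
        while (true) {
            Cell cell = cells.computeIfAbsent(key, k -> new Cell());
            synchronized (cell) {
                if (cell.removed) {
                    continue; // lost the race with drain(): retry on a fresh cell
                }
                cell.total += amount;
                return;
            }
        }
    }

    // Removes the entry and returns its final total (0 if there was none).
    long drain(String key) {
        Cell cell = cells.get(key);
        if (cell == null) {
            return 0;
        }
        synchronized (cell) {
            if (cell.removed) {
                return 0; // another drain got here first
            }
            cell.removed = true;
            cells.remove(key, cell);
            return cell.total;
        }
    }
}
```

Because the flag is only read and written under the cell's monitor, an add either lands before the drain reads the total or is redirected to a new cell, so no increment is silently dropped.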

The code as is, is thread safe. That being said, if you are removing from the CHM then any type of assumptions that are made when synchronizing on an object returned from the collection will be lost.
"But is there a state in which the get-method has returned the object, but the thread has not yet entered the synchronized block. Allowing other threads to get the same object and lock on it."
Yes, but that happens any time you synchronize on an Object. What is guaranteed is that the other thread will not enter the synchronized block until the first one exits.
"If not, does anyone have any input on how we could remove keys without having to worry about this race condition?"
The only real way of ensuring this atomicity is to either synchronize on the CHM or another object (shared by all threads). The best way is to not remove from the CHM.

Thanks for all the great suggestions and ideas, really appreciate it! Eventually this discussion made me come up with a solution that does not use objects for locking.
Just a brief description of what we're actually doing.
We have a cache that receives data continuously from our environment. The cache has several 'buckets' for each key and aggregates events into the buckets as they come in. The events coming in have a key that determines the cache entry to be used, and a timestamp determining the bucket in the cache entry that should be incremented.
The cache also has an internal flush task that runs periodically. It iterates over all cache entries and flushes all buckets but the current one to the database.
Now the timestamps of the incoming data can be for any time in the past, but the majority of them are for very recent timestamps. So the current bucket will get more hits than buckets for previous time intervals.
Knowing this, I can demonstrate the race condition we had. All this code is for one single cache entry, since the issue was isolated to concurrent writing and flushing of single cache elements.
// buckets :: ConcurrentMap<Long, AtomicLong>
void incrementBucket(long timestamp, long value) {
    long key = bucketKey(timestamp, LOG_BUCKET_INTERVAL);
    AtomicLong bucket = buckets.get(key);
    if (null == bucket) {
        AtomicLong newBucket = new AtomicLong(0);
        bucket = buckets.putIfAbsent(key, newBucket);
        if (null == bucket) {
            bucket = newBucket;
        }
    }
    // race window: flush() may remove this bucket before the next line runs
    bucket.addAndGet(value);
}
Map<Long, Long> flush() {
    long now = System.currentTimeMillis();
    long nowKey = bucketKey(now, LOG_BUCKET_INTERVAL);
    Map<Long, Long> flushedValues = new HashMap<Long, Long>();
    for (Long key : new TreeSet<Long>(buckets.keySet())) {
        if (key != nowKey) {
            AtomicLong bucket = buckets.remove(key);
            if (null != bucket) {
                long databaseKey = databaseKey(key);
                long n = bucket.get();
                if (!flushedValues.containsKey(databaseKey)) {
                    flushedValues.put(databaseKey, n);
                } else {
                    long sum = flushedValues.get(databaseKey) + n;
                    flushedValues.put(databaseKey, sum);
                }
            }
        }
    }
    return flushedValues;
}
What could happen was: (fl = flush thread, it = increment thread)
it: enters incrementBucket, executes until just before the call to addAndGet(value)
fl: enters flush and iterates the buckets
fl: reaches the bucket that is being incremented
fl: removes it and calls bucket.get() and stores the value to the flushed values
it: increments the bucket (which will be lost now, because the bucket has been flushed and removed)
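The interleaving above can be replayed deterministically on a single thread, which makes the lost update easy to see. This is only an illustrative sketch: the class name, the key 42 and the counts 5 and 3 are made up.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

class LostIncrementReplay {
    // Replays the steps in order and returns
    // {value seen by the flush, value left in the map afterwards}.
    static long[] replay() {
        ConcurrentMap<Long, AtomicLong> buckets = new ConcurrentHashMap<>();
        long key = 42L; // arbitrary bucket key
        buckets.put(key, new AtomicLong(5));

        // it: looks up the bucket, then is paused just before addAndGet
        AtomicLong seenByIncrement = buckets.get(key);

        // fl: removes the bucket and reads its value for the database
        AtomicLong removed = buckets.remove(key);
        long flushed = removed.get();

        // it: resumes and increments a bucket that nobody will read again
        seenByIncrement.addAndGet(3);

        AtomicLong remaining = buckets.get(key);
        return new long[] { flushed, remaining == null ? 0L : remaining.get() };
    }
}
```

Here replay() returns {5, 0}: eight events were counted in total, but only five reach the flushed values and nothing is left in the map.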
The solution:
void incrementBucket(long timestamp, long value) {
    long key = bucketKey(timestamp, LOG_BUCKET_INTERVAL);
    boolean done = false;
    while (!done) {
        AtomicLong bucket = buckets.get(key);
        if (null == bucket) {
            AtomicLong newBucket = new AtomicLong(0);
            bucket = buckets.putIfAbsent(key, newBucket);
            if (null == bucket) {
                bucket = newBucket;
            }
        }
        synchronized (bucket) {
            // double check if the bucket still is the same;
            // if flush() removed it in the meantime, retry with a fresh lookup
            if (buckets.get(key) != bucket) {
                continue;
            }
            done = true;
            bucket.addAndGet(value);
        }
    }
}
Map<Long, Long> flush() {
    long now = System.currentTimeMillis();
    long nowKey = bucketKey(now, LOG_BUCKET_INTERVAL);
    Map<Long, Long> flushedValues = new HashMap<Long, Long>();
    for (Long key : new TreeSet<Long>(buckets.keySet())) {
        if (key != nowKey) {
            AtomicLong bucket = buckets.get(key);
            if (null != bucket) {
                // remove and read under the same lock the incrementer takes
                synchronized (bucket) {
                    buckets.remove(key);
                    long databaseKey = databaseKey(key);
                    long n = bucket.get();
                    if (!flushedValues.containsKey(databaseKey)) {
                        flushedValues.put(databaseKey, n);
                    } else {
                        long sum = flushedValues.get(databaseKey) + n;
                        flushedValues.put(databaseKey, sum);
                    }
                }
            }
        }
    }
    return flushedValues;
}
I hope this will be useful for others that might run in to the same problem.
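For readers on Java 8 or later, a different technique from the one in the post above may be worth considering: storing plain Long totals and relying on ConcurrentHashMap.merge, which the class documents as atomic per key. An increment then either lands before remove (and is included in the returned total) or after it (and starts a new entry that a later flush picks up). The sketch below is an assumption-laden simplification: the class name MergeBuckets is hypothetical, and the bucketKey/databaseKey mapping from the post is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MergeBuckets {
    private final ConcurrentHashMap<Long, Long> buckets = new ConcurrentHashMap<>();

    // Atomically creates the bucket or adds to its current total.
    void increment(long bucketKey, long value) {
        buckets.merge(bucketKey, value, Long::sum);
    }

    // Removes every bucket except the current one and returns the removed totals.
    Map<Long, Long> flushAllExcept(long currentKey) {
        Map<Long, Long> flushed = new HashMap<>();
        for (Long key : buckets.keySet()) {
            if (key != currentKey) {
                // remove() hands back the final total for this entry, or null if already gone
                Long total = buckets.remove(key);
                if (total != null) {
                    flushed.put(key, total);
                }
            }
        }
        return flushed;
    }
}
```

Because no bucket object outlives its map entry, there is nothing left for a late increment to write into after a flush.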

The two code snippets you've provided are fine, as they are. What you've done is similar to how lazy instantiation with Guava's MapMaker.makeComputingMap() might work, but I see no problems with the way that the keys are lazily created.
You're right, by the way, that it's entirely possible for a thread to be preempted after the get() lookup of a lock object, but before entering synchronized.
My problem is with the third bullet point in your race condition description. You say:
thread 2: removes the object from the map, exits synchronized block
Which object, and which map? In general, I presumed that you were looking up a key to lock on, and then would be performing some other operations on other data structures, within the synchronized block. If you're talking about removing the lock object from the ConcurrentHashMap mentioned at the start, that's a massive difference.
And the real question is whether this is necessary at all. In a general purpose environment, I don't think there will be any memory issues with just remembering all of the lock objects for all the keys that have ever been looked up (even if those keys no longer represent live objects). It is much harder to come up with some way of safely disposing of an object that may be stored in a local variable of some other thread at any time, and if you do want to go down this route I have a feeling that performance will degrade to that of a single coarse lock around the key lookup.
If I've misunderstood what's going on there then feel free to correct me.
Edit: OK - in which case I stand by my above claim that the easiest way to do this is to not remove the keys; this might not actually be as problematic as you think, since the rate at which the space grows will be very small. By my calculations (which may well be off, I'm not an expert in space calculations and your JVM may vary) the map grows by about 14 KB/hour. You'd have to have a year of continuous uptime before this map used up 100MB of heap space.
But let's assume that the keys really do need to be removed. This poses the problem that you can't remove a key until you know that no threads are using it. This leads to the chicken-and-egg problem that you'll require all threads to synchronize on something else in order to get atomicity (of checking) and visibility across threads, which then means that you can't do much else than slap a single synchronized block around the whole thing, completely subverting your lock striping strategy.
Let's revisit the constraints. The main thing here is that things get cleared up eventually. It's not a correctness constraint but just a memory issue. Hence what we really want to do is identify some point at which the key could definitely no longer be used, and then use this as the trigger to remove it from the map. There are two cases here:
You can identify such a condition, and logically test for it. In which case you can remove the keys from the map with (in the worst case) some kind of timer thread, or hopefully some logic that's more cleanly integrated with your application.
You cannot identify any condition by which you know that a key will no longer be used. In this case, by definition, there is no point at which it's safe to remove the keys from the map. So in fact, for correctness' sake, you must leave them in.
In any case, this effectively boils down to manual garbage collection. Remove the keys from the map when you can lazily determine that they're no longer going to be used. Your current solution is too eager here since (as you point out) it's doing the removal before this situation holds.
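A rough sketch of the first case, assuming the application can declare a key dead once it has been idle for longer than some threshold. That assumption is what makes the removal safe; without it this is just as racy as eager removal. The class IdleKeySweeper and its methods are hypothetical, and the caller passes the current time explicitly so that a timer thread (or a test) can drive it.

```java
import java.util.concurrent.ConcurrentHashMap;

class IdleKeySweeper<K> {
    private final ConcurrentHashMap<K, Long> lastUsed = new ConcurrentHashMap<>();
    private final long maxIdleMillis;

    IdleKeySweeper(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    // Record that the key was used at the given time.
    void touch(K key, long nowMillis) {
        lastUsed.put(key, nowMillis);
    }

    // Remove keys idle for longer than maxIdleMillis; returns how many were removed.
    int sweep(long nowMillis) {
        int removed = 0;
        for (K key : lastUsed.keySet()) {
            Long seen = lastUsed.get(key);
            // remove(key, value) only succeeds if the stored timestamp still equals
            // the one we just checked, so a concurrent touch() with a newer time wins
            if (seen != null && nowMillis - seen > maxIdleMillis && lastUsed.remove(key, seen)) {
                removed++;
            }
        }
        return removed;
    }

    boolean isTracked(K key) {
        return lastUsed.containsKey(key);
    }
}
```

A periodic task could call sweep(System.currentTimeMillis()) and drop the corresponding entries from the lock map for whichever keys are no longer tracked.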
