Creating a ConcurrentHashMap that supports "snapshots"

Creating a ConcurrentHashMap that supports "snapshots" - java

I'm attempting to create a ConcurrentHashMap that supports "snapshots" in order to provide consistent iterators, and am wondering if there's a more efficient way to do this. The problem is that if two iterators are created at the same time then they need to read the same values, and the definition of the concurrent hash map's weakly consistent iterators does not guarantee this to be the case. I'd also like to avoid locks if possible: there are several thousand values in the map and processing each item takes several dozen milliseconds, and I don't want to have to block writers during this time as this could result in writers blocking for a minute or longer.
What I have so far:
The ConcurrentHashMap's keys are Strings, and its values are instances of ConcurrentSkipListMap<Long, T>
When an element is added to the hashmap with putIfAbsent, then a new skiplist is allocated, and the object is added via skipList.put(System.nanoTime(), t).
To query the map, I use map.get(key).lastEntry().getValue() to return the most recent value. To query a snapshot (e.g. with an iterator), I use map.get(key).lowerEntry(iteratorTimestamp).getValue(), where iteratorTimestamp is the result of System.nanoTime() called when the iterator was initialized.
If an object is deleted, I use map.get(key).put(timestamp, SnapShotMap.DELETED), where DELETED is a static final object.
Questions:
Is there a library that already implements this? Or barring that, is there a data structure that would be more appropriate than the ConcurrentHashMap and the ConcurrentSkipListMap? My keys are comparable, so maybe some sort of concurrent tree would better support snapshots than a concurrent hash table.
How do I prevent this thing from continually growing? I can delete all of the skip list entries with keys less than X (except for the last key in the map) after all iterators that were initialized on or before X have completed, but I don't know of a good way to determine when this has happened: I can flag that an iterator has completed when its hasNext method returns false, but not all iterators are necessarily going to run to completion; I can keep a WeakReference to an iterator so that I can detect when it's been garbage collected, but I can't think of a good way to detect this other than by using a thread that iterates through the collection of weak references and then sleeps for several minutes - ideally the thread would block on the WeakReference and be notified when the wrapped reference is GC'd, but I don't think this is an option.
ConcurrentSkipListMap<Long, WeakReference<Iterator>> iteratorMap;
while(true) {
long latestGC = 0;
for(Map.Entry<Long, WeakReference<Iterator>> entry : iteratorMap.entrySet()) {
if(entry.getValue().get() == null) {
iteratorMap.remove(entry.getKey());
latestGC = entry.getKey();
} else break;
}
// remove ConcurrentHashMap entries with timestamps less than `latestGC`
Thread.sleep(300000); // five minutes
}
Edit: To clear up some confusion in the answers and comments, I'm currently passing weakly consistent iterators to code written by another division in the company, and they have asked me to increase the strength of the iterators' consistency. They are already aware of the fact that it is infeasible for me to make 100% consistent iterators, they just want a best effort on my part. They care more about throughput than iterator consistency, so coarse-grained locks are not an option.

What is your actual use case that requires a special implementation? From the Javadoc of ConcurrentHashMap (emphasis added):
Retrievals reflect the results of the most recently completed update operations holding upon their onset. ... Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.
So the regular ConcurrentHashMap.values().iterator() will give you a "consistent" iterator, but only for one-time use by a single thread. If you need to use the same "snapshot" multiple times and/or by multiple threads, I suggest making a copy of the map.
EDIT: With the new information and the insistence for a "strongly consistent" iterator, I offer this solution. Please note that the use of a ReadWriteLock has the following implications:
Writes will be serialized (only one writer at a time) so write performance may be impacted.
Concurrent reads are allowed as long as there is no write in progress, so read performance impact should be minimal.
Active readers block writers but only as long as it takes to retrieve the reference to the current "snapshot". Once a thread has the snapshot, it no longer blocks writers no matter how long it takes to process the information in the snapshot.
Readers are blocked while any write is active; once the write finishes then all readers will have access to the new snapshot until a new write replaces it.
Consistency is achieved by serializing the writes and making a copy of the current values on each and every write. Readers that hold a reference to a "stale" snapshot can continue to use the old snapshot without worrying about modification, and the garbage collector will reclaim old snapshots as soon as no one is using it any more. It is assumed that there is no requirement for a reader to request a snapshot from an earlier point in time.
Because snapshots are potentially shared among multiple concurrent threads, the snapshots are read-only and cannot be modified. This restriction also applies to the remove() method of any Iterator instances created from the snapshot.
import java.util.*;
import java.util.concurrent.locks.*;
public class StackOverflow16600019 <K, V> {
private final ReadWriteLock locks = new ReentrantReadWriteLock();
private final HashMap<K,V> map = new HashMap<>();
private Collection<V> valueSnapshot = Collections.emptyList();
public V put(K key, V value) {
locks.writeLock().lock();
try {
V oldValue = map.put(key, value);
updateSnapshot();
return oldValue;
} finally {
locks.writeLock().unlock();
}
}
public V remove(K key) {
locks.writeLock().lock();
try {
V removed = map.remove(key);
updateSnapshot();
return removed;
} finally {
locks.writeLock().unlock();
}
}
public Collection<V> values() {
locks.readLock().lock();
try {
return valueSnapshot; // read-only!
} finally {
locks.readLock().unlock();
}
}
/** Callers MUST hold the WRITE LOCK. */
private void updateSnapshot() {
valueSnapshot = Collections.unmodifiableCollection(
new ArrayList<V>(map.values())); // copy
}
}

I've found that the ctrie is the ideal solution - it's a concurrent hash array mapped trie with constant time snapshots

Solution1) What about just synchronizing on the puts, and on the iteration. That should give you a consistent snapshot.
Solution2) Start iterating and make a boolean to say so, then override the puts, putAll so that they go into a queue, when the iteration is finished simply make those puts with the changed values.

Related

ConcurrentHashMap thread-safety without using putIfAbsent

I'am trying to clarify HashMap vs ConcurrentHashMap regarding type-safety and also performance. I came across a lot of good articles, but still getting troubles figuring it all out.
Let's take the following example using a ConcurrentHashMap, where I will try to add a value for a key not already there and returning it, the new way of doing it would be:
private final Map<K,V> map = new ConcurrentHashMap<>();
return map.putIfAbsent(k, new Object());
let's assume we don't want to use the putIfAbsent method, the above code should look something like this:
private final Map<K,V> map = new ConcurrentHashMap<>();
synchronized (map) {
V value = map.get(key); //Edit adding the value fetch inside synchronized block
if (!nonNull(value)) {
map.put(key, new Object());
}
}
return map.get(key)
Is the problem with this approach the fact that the whole map is locked whereas in first approach the putIfAbsent method only synchronizes on the bucket on which the hash of the key is, and thus leading to less performance ? Would the second approach work fine with just a HashMap ?

Is the problem with this approach the fact that the whole map is locked
There are two problems with this approach.
It's not intrinsic
The fact that you've acquired the lock on the map reference has zero effect whatsoever, except in regards to any other code that (tries) to acquire this lock. Crucially, ConcurrentHashmap itself does not acquire this lock.
So, if, during that second snippet (with synchronized), some other thread does this:
map.putIfAbsent(key, new Object());
Then it may occur that your map.get(key) call returns null, and nevertheless your followup map.put call ends up overwriting. In other words, that both your thread, and that hypothetical thread running putIfAbsent, both decided to write.
Presumably, if that is just fine in your book, that'd be weird. Why use putIfAbsent and check if map.get returns null in the first place?
Had the other thread done this:
synchronized (map) {
map.putIfAbsent(key, new Object());
}
then there'd be no problem; either your get-check-if-null-then-set code will set and the putIfAbsent call is a noop, or vice versa, but they couldn't possibly both 'decide to write'.
Which leads us to;
This is pointless
There are two different ways to achieve concurrency with maps: Intrinsic and extrinsic. There is zero point in doing both, and they do not interact.
If you have structure whereby all access (both read and write) out of a plain old entirely non-multicore capable java.util.HashMap goes through some shared lock (the hashmap instance itself, or any other lock, long as all threads that interact with that particular map instance use the same one), then that works fine and there is therefore no reason or point to using ConcurrentHashMap instead.
The point of ConcurrentHashMap is to streamline concurrent processes without the use of extrinsic locking: To let the map do the locking.
One of the reasons you want this is that the ConcurrentHashMap impl is significantly faster at the jobs it is capable of doing; these jobs are spelled out explicitly: It's the methods that ConcurrentHashMap has.
Atomicity
The central problem of your code snippet is that it lacks atomicity. Check-then-act is fundamentally broken in concurrent models (in your case: Check: Is key 'k' associated with no value or null?, then Act: Set the mapping of key 'k' to value 'v'). This is broken because what if the thing you checked changes in between? What if you have two threads that both 'check-and-act' and then run simultaneously; then they both check first, then both act first, and broken things ensue: One of the two threads will be acting upon a state that isn't equal to the state as it was when you checked, which means your check's broken.
The right model is act-then-check: Act first, and then check the result of the operation. Of course, this requires redefining, and integrating, the code you wrote explicitly in your snippet, into the very definition of your 'act' phase.
In other words, putIfAbsent is not a convenience method! is a fundamental operation! It's the only way (short of extrinsic locking) to convey the notion of: "Perform the action of associating 'v' with 'k', but only if there is no association yet. I'll check the results of this operation next". There is no way to break that down into if (!map.containsKey(key)) map.put(key, v); because check-then-act does not work in concurrent modelling.
Conclusions
Either get rid of concurrenthashmap, or get rid of synchronized. Having code that uses both is probably broken and even if it isn't, it's error prone, confusing, and I can guarantee you there's a much better way to write it (better in that it is more idiomatic, easier to read, more flexible in the face of future change requests, easier to test, and less likely to have hard-to-test-for bugs in it).
If you can state all operations you need to perform 100% in terms of the methods that CHM has, then do that, because CHM is vastly superior. It even has mechanisms for arbitrary operations: For example, unlike basic hashmaps, you can iterate through a CHM even if other threads are also messing with it, whereas with a normal hashmap you need to hold the lock for the entire duration of the operation, which means any other thread trying to do anything to that hashmap, even just 'ask for its size', need to wait. Hence, for most use cases, CHM results in orders of magnitude better performance.

in first approach the putIfAbsent method only synchronizes on the bucket
That is incorrect, ConcurrentHashMap doesn't synchronize on anything, it uses different mechanics to ensure thread safety.
Would the second approach work fine with just a HashMap ?
Yes, except the second approach is flawed. If using synchronization to make a Map thread-safe, then all access of the Map should use synchronization. As such, it would be best to call Collections.synchronizedMap(map). Performance will be worse than using ConcurrentHashMap.
private final Map<Integer, Object> map = Collections.synchronizedMap(new HashMap<>());
let's assume we don't want to use the putIfAbsent method.
Why? Oh, because it wastes a allocation if the key is already in the map, which is why we should be using computeIfAbsent() instead
map.computeIfAbsent(key, k -> new Object());

Thread safe data structure to check for existence and write if not

I want to parse a long list of strings with duplicates and save each unique string to an array exactly once. In a multi threaded approach, threads will check a shared data structure for existence and write if it does not exist.
I forget what data structure is appropriate for this.
Anything from Java.util is fine and so are high performance third party libraries.

The collection classes in the java.util package are not thread-safe in order to provide maximum performance in single-threaded applications. (Vector and Hashtable being exceptions)
There are a few ways to achieve the thread safety you are looking for.
Sychronized Wrapper
Set<String> safeSet = Collections.synchronizedSet(new HashSet<>());
This will wrap all the calls to the underlying set in in a synchronized block, locking on the object. However, That means when a thread is iterating over elements in a collection, all other collection’s methods block, causing other threads having to wait.
java.util.concurrent Package
Java 5 introduced concurrent collections that provide much better performance than synchronized wrappers.
There are different flavors: copy-on-write, Compare-And-Swap, and concurrent collections.
The concurrent collections use a special Lock that is more flexible than synchronization.
So for what you are doing, a HashSet is probably a good match, if it was single threaded. In the concurrent package you could use ConcurrentHashMap.
It would look like this:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
...
private static final Object PRESENT = new Object();
Map<String, Object> seenStrings = new ConcurrentHashMap<>();
for ( String aString : stringList ) {
if ( seenStrings.containsKey(aString) ) {
// Already there
} else {
// Not seen yet
seenStrings.put(aString, PRESENT);
}
}
Update
Andy's comment is a good one, I wasn't sure what you wanted to do if you had already seen an item or if you haven't.
You could do this to ensure the check and insert are executed atomically
if (seenStrings.put(aString, PRESENT) == null) {
// Not seen yet
}
Update In Java 8+, you can create a set backed by the specified map. Effectively a ConcurrentHashSet.
Set<String> seenStrings = Collections.newSetFromMap(new ConcurrentHashMap<>());
for (String aString : stringList) {
if (seenStrings.add(aString)) {
// Not seen yet
}
}

You can either use CopyOnWriteArrayList or ConcurrentLinkedQueue for this purpose. However if you have many writes CopyOnWrite approach would be costly.
If you want to remove duplicates consider using CopyOnWriteArraySet

Avoiding concurrent structures by manually triggering memory barriers

Background
I have a class whose instances are used to collect and publish data (uses Guava's HashMultimap):
public class DataCollector {
private final SetMultimap<String, String> valueSetsByLabel
= HashMultimap.create();
public void addLabelValue(String label, String value) {
valueSetsByLabel.put(label, value);
}
public Set<String> getLabels() {
return valueSetsByLabel.keySet();
}
public Set<String> getLabelValues(String label) {
return valueSetsByLabel.get(label);
}
}
Instances of this class will now be passed between threads, so I need to modify it for thread-safety. Since Guava's Multimap implementations aren't thread-safe, I used a LoadingCache that lazily creates concurrent hash sets instead (see the CacheBuilder and MapMaker javadocs for details):
public class ThreadSafeDataCollector {
private final LoadingCache<String, Set<String>> valueSetsByLabel
= CacheBuilder.newBuilder()
.concurrencyLevel(1)
.build(new CacheLoader<String, Set<String>>() {
#Override
public Set<String> load(String label) {
// make and return a concurrent hash set
final ConcurrentMap<String, Boolean> map = new MapMaker()
.concurrencyLevel(1)
.makeMap();
return Collections.newSetFromMap(map);
}
});
public void addLabelValue(String label, String value) {
valueSetsByLabel.getUnchecked(label).add(value);
}
public Set<String> getLabels() {
return valueSetsByLabel.asMap().keySet();
}
public Set<String> getLabelValues(String label) {
return valueSetsByLabel.getUnchecked(label);
}
}
You'll notice I'm setting the concurrency level for both the loading cache and nested concurrent hash sets to 1 (meaning they each only read from and write to one underlying table). This is because I only expect one thread at a time to read from and write to these objects.
(To quote the concurrencyLevel javadoc, "A value of one permits only one thread to modify the map at a time, but since read operations can proceed concurrently, this still yields higher concurrency than full synchronization.")
Problem
Because I can assume there will only be a single reader/writer at a time, I feel that using many concurrent hash maps per object is heavy-handed. Such structures are meant to handle concurrent reads and writes, and guarantee atomicity of concurrent writes. But in my case atomicity is unimportant - I only need to make sure each thread sees the last thread's changes.
In my search for a more optimal solution I came across this answer by erickson, which says:
Any data that is shared between thread needs a "memory barrier" to ensure its visibility.
[...]
Changes to any member that is declared volatile are visible to all
threads. In effect, the write is "flushed" from any cache to main
memory, where it can be seen by any thread that accesses main memory.
Now it gets a bit trickier. Any writes made by a thread before that
thread writes to a volatile variable are also flushed. Likewise, when
a thread reads a volatile variable, its cache is cleared, and
subsequent reads may repopulate it from main memory.
[...]
One way to make this work is to have the thread that is populating
your shared data structure assign the result to a volatile variable. [...]
When other threads access that variable, not only are they guaranteed
to get the most recent value for that variable, but also any changes
made to the data structure by the thread before it assigned the value
to the variable.
(See this InfoQ article for a further explanation of memory barriers.)
The problem erickson is addressing is slightly different in that the data structure in question is fully populated and then assigned to a variable that he suggests be made volatile, whereas my structures are assigned to final variables and gradually populated across multiple threads. But his answer suggests I could use a volatile dummy variable to manually trigger memory barriers:
public class ThreadVisibleDataCollector {
private final SetMultimap<String, String> valueSetsByLabel
= HashMultimap.create();
private volatile boolean dummy;
private void readMainMemory() {
if (dummy) { }
}
private void writeMainMemory() {
dummy = false;
}
public void addLabelValue(String label, String value) {
readMainMemory();
valueSetsByLabel.put(label, value);
writeMainMemory();
}
public Set<String> getLabels() {
readMainMemory();
return valueSetsByLabel.keySet();
}
public Set<String> getLabelValues(String label) {
readMainMemory();
return valueSetsByLabel.get(label);
}
}
Theoretically, I could take this a step further and leave it to the calling code to trigger memory barriers, in order to avoid unnecessary volatile reads and writes between calls on the same thread (potentially by using Unsafe.loadFence and Unsafe.storeFence, which were added in Java 8). But that seems too extreme and hard to maintain.
Question
Have I drawn the correct conclusions from my reading of erickson's answer (and the JMM) and implemented ThreadVisibleDataCollector correctly? I wasn't able to find examples of using a volatile dummy variable to trigger memory barriers, so I want to verify that this code will behave as expected across architectures.

The thing you are trying to do is called “Premature Optimization”. You don’t have a real performance problem but try to make your entire program very complicated and possibly error prone, without any gain.
The reason why you will never experience any (notable) gain lies in the way how a lock works. You can learn a lot of it by studying the documentation of the class AbstractQueuedSynchronizer.
A Lock is formed around a simple int value with volatile semantics and atomic updates. In the simplest form, i.e. without contention, locking and unlocking consist of a single atomic update of this int variable. Since you claim that you can be sure that there will be only one thread accessing the data at a given time, there will be no contention and the lock state update has similar performance characteristics compared to your volatile boolean attempts but with the difference that the Lock code works reliable and is heavily tested.
The ConcurrentMap approach goes a step further and allows a lock-free read that has the potential to be even more efficient than your volatile read (depending on the actual implementation).
So you are creating a potentially slower and possibly error prone program just because you “feel that using many concurrent hash maps per object is heavy-handed”. The only answer can be: don’t feel. Measure. Or just leave it as is as long as there is no real performance problem.

Some value is written to volatile variable happens-before this value can be read from it. As a consequence, the visibility guarantees you want will be achieved by reading/writing it, so the answer is yes, this solves visibility issues.
Besides the problems mentioned by Darren Gilroy in his answer, I'd like to remember that in Java 8 there are explicit memory barrier instructions in Unsafe class:
/**
* Ensures lack of reordering of loads before the fence
* with loads or stores after the fence.
*/
void loadFence();
/**
* Ensures lack of reordering of stores before the fence
* with loads or stores after the fence.
*/
void storeFence();
/**
* Ensures lack of reordering of loads or stores before the fence
* with loads or stores after the fence.
*/
void fullFence();
Although Unsafe is not a public API, I still recommend to at least consider using it, if you're using Java 8.
One more solution is coming to my mind. You have set your concurrencyLevel to 1 which means that only one thread at a time can do anything with a collection. IMO standard Java synchronized or ReentrantLock (for the cases of high contention) will also fit for your task and do provide visibility guarantees. Although, if you want one writer, many readers access pattern, consider using ReentrantReadWriteLock.

Well, that's still not particularly safe, b/c it depends a lot of the underlying implementation of the HashMultimap.
You might take a look at the following blog post for a discussion: http://mailinator.blogspot.com/2009/06/beautiful-race-condition.html
For this type of thing, a common pattern is to load a "most recent version" into a volatile variable and have your readers read immutable versions through that. This is how CopyOnWriteArrayList is implemented.
Something like ...
class Collector {
private volatile HashMultimap values = HashMultimap.create();
public add(String k, String v) {
HashMultimap t = HashMultimap.create(values);
t.put(k,v);
this.values = t; // this invokes a memory barrier
}
public Set<String> get(String k) {
values.get(k); // this volatile read is memory barrier
}
}
However, both your and my solution still have a bit of a problem -- we are both returning mutable views on the underlying data structure. I might change the HashMultimap to an ImmutableMultimap to fix the mutability issue. Beware also that callers retain a reference to the full internal map (not just the returned Set) as a side effect of things being a view.
Creating a new copy can seem somewhat wasteful, but I suspect that if you have only one thread writing, then you have an understanding of the rate of change and can decide if that's reasonable or not. For example, f you wanted to return Set<String> instances which update dynamically as things change then the solution based on map maker doesn't seem heavy handed.

How to use ReadWriteLock?

I'm the following situation.
At web application startup I need to load a Map which is thereafter used by multiple incoming threads. That is, requests comes in and the Map is used to find out whether it contains a particular key and if so the value (the object) is retrieved and associated to another object.
Now, at times the content of the Map changes. I don't want to restart my application to reload the new situation. Instead I want to do this dynamically.
However, at the time the Map is re-loading (removing all items and replacing them with the new ones), concurrent read requests on that Map still arrive.
What should I do to prevent all read threads from accessing that Map while it's being reloaded ? How can I do this in the most performant way, because I only need this when the Map is reloading which will only occur sporadically (each every x weeks) ?
If the above is not an option (blocking) how can I make sure that while reloading my read request won't suffer from unexpected exceptions (because a key is no longer there, or a value is no longer present or being reloaded) ?
I was given the advice that a ReadWriteLock might help me out. Can you someone provide me an example on how I should use this ReadWriteLock with my readers and my writer ?
Thanks,
E

I suggest to handle this as follow:
Have your map accessible at a central place (could be a Spring singleton, a static ...).
When starting to reload, let the instance as is, work in a different Map instance.
When that new map is filled, replace the old map with this new one (that's an atomic operation).
Sample code:
static volatile Map<U, V> map = ....;
// **************************
Map<U, V> tempMap = new ...;
load(tempMap);
map = tempMap;
Concurrency effects :
volatile helps with visibility of the variable to other threads.
While reloading the map, all other threads see the old value undisturbed, so they suffer no penalty whatsoever.
Any thread that retrieves the map the instant before it is changed will work with the old values.
It can ask several gets to the same old map instance, which is great for data consistency (not loading the first value from the older map, and others from the newer).
It will finish processing its request with the old map, but the next request will ask the map again, and will receive the newer values.

If the client threads do not modify the map, i.e. the contents of the map is solely dependent on the source from where it is loaded, you can simply load a new map and replace the reference to the map your client threads are using once the new map is loaded.
Other then using twice the memory for a short time, no performance penalty is incurred.
In case the map uses too much memory to have 2 of them, you can use the same tactic per object in the map; iterate over the map, construct a new mapped-to object and replace the original mapping once the object is loaded.

Note that changing the reference as suggested by others could cause problems if you rely on the map being unchanged for a while (e.g. if (map.contains(key)) {V value = map.get(key); ...}. If you need that, you should keep a local reference to the map:
static Map<U,V> map = ...;
void do() {
Map<U,V> local = map;
if (local.contains(key)) {
V value = local.get(key);
...
}
}
EDIT:
The assumption is that you don't want costly synchronization for your client threads. As a trade-off, you allow client threads to finish their work that they've already begun before your map changed - ignoring any changes to the map that happened while it is running. This way, you can safely made some assumptions about your map - e.g. that a key is present and always mapped to the same value for the duration of a single request. In the example above, if your reader thread changed the map just after a client called map.contains(key), the client might get null on map.get(key) - and you'd almost certainly end this request with a NullPointerException. So if you're doing multiple reads to the map and need to do some assumptions as the one mentioned before, it's easiest to keep a local reference to the (maybe obsolete) map.
The volatile keyword isn't strictly necessary here. It would just make sure that the new map is used by other threads as soon as you changed the reference (map = newMap). Without volatile, a subsequent read (local = map) could still return the old reference for some time (we're talking about less than a nanosecond though) - especially on multicore systems if I remember correctly. I wouldn't care about it, but f you feel a need for that extra bit of multi-threading beauty, your free to use it of course ;)

I like the volatile Map solution from KLE a lot and would go with that. Another idea that someone might find interesting is to use the map equivalent of a CopyOnWriteArrayList, basically a CopyOnWriteMap. We built one of these internally and it is non-trivial but you might be able to find a COWMap out in the wild:
http://old.nabble.com/CopyOnWriteMap-implementation-td13018855.html

This is the answer from the JDK javadocs for ReentrantReadWriteLock implementation of ReadWriteLock. A few years late but still valid, especially if you don't want to rely only on volatile
class RWDictionary {
private final Map<String, Data> m = new TreeMap<String, Data>();
private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
private final Lock r = rwl.readLock();
private final Lock w = rwl.writeLock();
public Data get(String key) {
r.lock();
try { return m.get(key); }
finally { r.unlock(); }
}
public String[] allKeys() {
r.lock();
try { return m.keySet().toArray(); }
finally { r.unlock(); }
}
public Data put(String key, Data value) {
w.lock();
try { return m.put(key, value); }
finally { w.unlock(); }
}
public void clear() {
w.lock();
try { m.clear(); }
finally { w.unlock(); }
}
}

Synchronizing on an Integer value [duplicate]

This question already has an answer here:
Closed 10 years ago.
Possible Duplicate:
What is the best way to increase number of locks in java
Suppose I want to lock based on an integer id value. In this case, there's a function that pulls a value from a cache and does a fairly expensive retrieve/store into the cache if the value isn't there.
The existing code isn't synchronized and could potentially trigger multiple retrieve/store operations:
//psuedocode
public Page getPage (Integer id){
Page p = cache.get(id);
if (p==null)
{
p=getFromDataBase(id);
cache.store(p);
}
}
What I'd like to do is synchronize the retrieve on the id, e.g.
if (p==null)
{
synchronized (id)
{
..retrieve, store
}
}
Unfortunately this won't work because 2 separate calls can have the same Integer id value but a different Integer object, so they won't share the lock, and no synchronization will happen.
Is there a simple way of insuring that you have the same Integer instance? For example, will this work:
syncrhonized (Integer.valueOf(id.intValue())){
The javadoc for Integer.valueOf() seems to imply that you're likely to get the same instance, but that doesn't look like a guarantee:
Returns a Integer instance
representing the specified int value.
If a new Integer instance is not
required, this method should generally
be used in preference to the
constructor Integer(int), as this
method is likely to yield
significantly better space and time
performance by caching frequently
requested values.
So, any suggestions on how to get an Integer instance that's guaranteed to be the same, other than the more elaborate solutions like keeping a WeakHashMap of Lock objects keyed to the int? (nothing wrong with that, it just seems like there must be an obvious one-liner than I'm missing).

You really don't want to synchronize on an Integer, since you don't have control over what instances are the same and what instances are different. Java just doesn't provide such a facility (unless you're using Integers in a small range) that is dependable across different JVMs. If you really must synchronize on an Integer, then you need to keep a Map or Set of Integer so you can guarantee that you're getting the exact instance you want.
Better would be to create a new object, perhaps stored in a HashMap that is keyed by the Integer, to synchronize on. Something like this:
public Page getPage(Integer id) {
Page p = cache.get(id);
if (p == null) {
synchronized (getCacheSyncObject(id)) {
p = getFromDataBase(id);
cache.store(p);
}
}
}
private ConcurrentMap<Integer, Integer> locks = new ConcurrentHashMap<Integer, Integer>();
private Object getCacheSyncObject(final Integer id) {
locks.putIfAbsent(id, id);
return locks.get(id);
}
To explain this code, it uses ConcurrentMap, which allows use of putIfAbsent. You could do this:
locks.putIfAbsent(id, new Object());
but then you incur the (small) cost of creating an Object for each access. To avoid that, I just save the Integer itself in the Map. What does this achieve? Why is this any different from just using the Integer itself?
When you do a get() from a Map, the keys are compared with equals() (or at least the method used is the equivalent of using equals()). Two different Integer instances of the same value will be equal to each other. Thus, you can pass any number of different Integer instances of "new Integer(5)" as the parameter to getCacheSyncObject and you will always get back only the very first instance that was passed in that contained that value.
There are reasons why you may not want to synchronize on Integer ... you can get into deadlocks if multiple threads are synchronizing on Integer objects and are thus unwittingly using the same locks when they want to use different locks. You can fix this risk by using the
locks.putIfAbsent(id, new Object());
version and thus incurring a (very) small cost to each access to the cache. Doing this, you guarantee that this class will be doing its synchronization on an object that no other class will be synchronizing on. Always a Good Thing.

Use a thread-safe map, such as ConcurrentHashMap. This will allow you to manipulate a map safely, but use a different lock to do the real computation. In this way you can have multiple computations running simultaneous with a single map.
Use ConcurrentMap.putIfAbsent, but instead of placing the actual value, use a Future with computationally-light construction instead. Possibly the FutureTask implementation. Run the computation and then get the result, which will thread-safely block until done.

Integer.valueOf() only returns cached instances for a limited range. You haven't specified your range, but in general, this won't work.
However, I would strongly recommend you not take this approach, even if your values are in the correct range. Since these cached Integer instances are available to any code, you can't fully control the synchronization, which could lead to a deadlock. This is the same problem people have trying to lock on the result of String.intern().
The best lock is a private variable. Since only your code can reference it, you can guarantee that no deadlocks will occur.
By the way, using a WeakHashMap won't work either. If the instance serving as the key is unreferenced, it will be garbage collected. And if it is strongly referenced, you could use it directly.

Using synchronized on an Integer sounds really wrong by design.
If you need to synchronize each item individually only during retrieve/store you can create a Set and store there the currently locked items. In another words,
// this contains only those IDs that are currently locked, that is, this
// will contain only very few IDs most of the time
Set<Integer> activeIds = ...
Object retrieve(Integer id) {
// acquire "lock" on item #id
synchronized(activeIds) {
while(activeIds.contains(id)) {
try {
activeIds.wait();
} catch(InterruptedExcption e){...}
}
activeIds.add(id);
}
try {
// do the retrieve here...
return value;
} finally {
// release lock on item #id
synchronized(activeIds) {
activeIds.remove(id);
activeIds.notifyAll();
}
}
}
The same goes to the store.
The bottom line is: there is no single line of code that solves this problem exactly the way you need.

How about a ConcurrentHashMap with the Integer objects as keys?

You could have a look at this code for creating a mutex from an ID. The code was written for String IDs, but could easily be edited for Integer objects.

As you can see from the variety of answers, there are various ways to skin this cat:
Goetz et al's approach of keeping a cache of FutureTasks works quite well in situations like this where you're "caching something anyway" so don't mind building up a map of FutureTask objects (and if you did mind the map growing, at least it's easy to make pruning it concurrent)
As a general answer to "how to lock on ID", the approach outlined by Antonio has the advantage that it's obvious when the map of locks is added to/removed from.
You may need to watch out for a potential issue with Antonio's implementation, namely that the notifyAll() will wake up threads waiting on all IDs when one of them becomes available, which may not scale very well under high contention. In principle, I think you can fix that by having a Condition object for each currently locked ID, which is then the thing that you await/signal. Of course, if in practice there's rarely more than one ID being waited on at any given time, then this isn't an issue.

Steve,
your proposed code has a bunch of problems with synchronization. (Antonio's does as well).
To summarize:
You need to cache an expensive
object.
You need to make sure that while one thread is doing the retrieval, another thread does not also attempt to retrieve the same object.
That for n-threads all attempting to get the object only 1 object is ever retrieved and returned.
That for threads requesting different objects that they do not contend with each other.
pseudo code to make this happen (using a ConcurrentHashMap as the cache):
ConcurrentMap<Integer, java.util.concurrent.Future<Page>> cache = new ConcurrentHashMap<Integer, java.util.concurrent.Future<Page>>;
public Page getPage(Integer id) {
Future<Page> myFuture = new Future<Page>();
cache.putIfAbsent(id, myFuture);
Future<Page> actualFuture = cache.get(id);
if ( actualFuture == myFuture ) {
// I am the first w00t!
Page page = getFromDataBase(id);
myFuture.set(page);
}
return actualFuture.get();
}
Note:
java.util.concurrent.Future is an interface
java.util.concurrent.Future does not actually have a set() but look at the existing classes that implement Future to understand how to implement your own Future (Or use FutureTask)
Pushing the actual retrieval to a worker thread will almost certainly be a good idea.

See section 5.6 in Java Concurrency in Practice: "Building an efficient, scalable, result cache". It deals with the exact issue you are trying to solve. In particular, check out the memoizer pattern.
(source: umd.edu)

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.