Atomic multi-entry operations on ConcurrentHashMap


I need to perform a concurrent operation on two entries of a ConcurrentHashMap atomically.
I have a ConcurrentHashMap of Client, with an Integer id as key; every client has a selectedId attribute, which contains the id of another client or itself (means nobody selected).
At every clientChangedSelection(int whoChangedSelection) concurrent event, I need to check atomically if both the client and the selected client are referencing each other. If they do, they get removed and returned.
In the meantime clients can be added or removed by other threads.
The "ideal" solution would be to have a lock for every entry and lock the affected entries; every clientChangedSelection runs in its own thread, so they would wait if necessary. Of course that's not practical. On top of that, ConcurrentHashMap doesn't offer APIs to manually lock buckets as far as I know. And on top of that again, I've read somewhere that the buckets' locks aren't reentrant. Not sure if that's true or why.
My "imaginative" approach makes heavy use of nested compute() methods to guarantee atomicity. If ConcurrentHashMap's locks aren't reentrant, this won't work. It loses any readability, requires "value capturing" workarounds, and performance is probably bad. But performance isn't much of an issue as long as it doesn't affect threads working on unrelated entries (i.e. in different buckets).
public Client[] match(int id){
    final Client players[]=new Client[]{null,null};
    clients.computeIfPresent(id,(idA, playerA)->{
        if(playerA.selectedId!=idA){
            clients.computeIfPresent(playerA.selectedId,(idB, playerB)->{
                if(playerB.selectedId==idA){
                    players[0]=playerA;
                    players[1]=playerB;
                    return null;
                }else{
                    return playerB;
                }
            });
        }
        if(players[0]==null){
            return playerA;
        }else{
            return null;
        }
    });
    if(players[0]==null){
        return null;
    }else{
        return players;
    }
}
The "unacceptable" approach synchronizes the entire match method. This invalidates the point of having concurrent events in the first place.
The "wrong" approach temporarily removes the two clients while working with them, and adds them back in case. This makes concurrent events using the entries fail instead of waiting, as "in use" becomes indistinguishable from "not present".
I think I'll go back to a timer which inspects the whole map in one pass every n seconds. No additional synchronization would be required, but it's less elegant.
This is, more or less, a common concurrency situation, but it's made interesting by ConcurrentHashMap, which discourages reinventing the wheel too much.
What would your approach be? Any suggestions?
Edit 1
Synchronizing every access (thus defeating the point of using a ConcurrentHashMap) is not a viable solution either. Concurrent access must be preserved, else the problem itself wouldn't exist.
I've removed the selectedId parameter from match(), but note that doesn't really matter. The fictitious event clientChangedSelection(int whoChangedSelection) represents the concurrent event. Could happen any time in any operating thread. match() is just an example function that gets called to handle the matching. Hope I made it clearer.
Edit 2
This is the doubly-synchronized function I ended up with. idSelected() is an example of a method that requires synchronization, as it modifies client attributes. Synchronization for put() and remove() is not required in this case; what the function sees is new enough.
There are two checks: the first one is there just to get the clients to synchronize on, the second one is there to tell whether a previously executed match succeeded and removed the client while the current one was waiting.
match() can't match the same client twice, and that was important (the atomic part).
match() can still match concurrently removed clients (removed with classic map apis, not by the same function), and that's tolerable.
public void idSelected(int id, int selectedId){
    Client playerA=clients.get(id);
    if(playerA!=null){
        synchronized(playerA){
            playerA.selectedId=selectedId;
        }
    }
}
public Client[] match(int id, int selectedId){
    // determine whether both players exist, so they can be synchronized on
    Client playerA=clients.get(id);
    if(playerA==null){
        return null;
    }
    Client playerB=clients.get(selectedId);
    if(playerB==null){
        return null;
    }
    // sort players by id so the nested locks are always taken in the same order
    if(id>selectedId){
        final Client t=playerA;
        playerA=playerB;
        playerB=t;
    }
    // re-check under synchronization: an earlier match may have removed either player
    synchronized(playerA){
        if(clients.containsKey(playerA.id)){
            synchronized(playerB){
                if(clients.containsKey(playerB.id)){
                    if(playerA.selectedId==playerB.id&&playerB.selectedId==playerA.id){
                        clients.remove(id);
                        clients.remove(selectedId);
                        return new Client[]{playerA,playerB};
                    }
                }
            }
        }
    }
    return null;
}
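The "lock for every entry" idea from the question can be approximated with lock striping: a fixed array of locks, with each id mapped to one of them. The following is only a sketch under assumptions, not the code from the question. The Client class, the StripedMatcher name and the stripe count of 16 are hypothetical, and plain put()/remove() calls still bypass the stripes, which the question describes as tolerable.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the question's Client class.
class Client {
    final int id;
    volatile int selectedId; // equal to id means "nobody selected"
    Client(int id, int selectedId) { this.id = id; this.selectedId = selectedId; }
}

class StripedMatcher {
    private static final int STRIPES = 16; // assumed stripe count
    private final ReentrantLock[] locks = new ReentrantLock[STRIPES];
    final ConcurrentHashMap<Integer, Client> clients = new ConcurrentHashMap<>();

    StripedMatcher() {
        for (int i = 0; i < STRIPES; i++) locks[i] = new ReentrantLock();
    }

    private int stripe(int id) { return Math.floorMod(id, STRIPES); }

    // Selection changes take the stripe lock so match() sees a stable value.
    void idSelected(int id, int selectedId) {
        ReentrantLock lock = locks[stripe(id)];
        lock.lock();
        try {
            Client c = clients.get(id);
            if (c != null) c.selectedId = selectedId;
        } finally {
            lock.unlock();
        }
    }

    Client[] match(int id) {
        Client first = clients.get(id);
        if (first == null) return null;
        int otherId = first.selectedId;
        if (otherId == id) return null; // nobody selected
        // Always lock the lower stripe index first to avoid deadlock.
        int low = Math.min(stripe(id), stripe(otherId));
        int high = Math.max(stripe(id), stripe(otherId));
        locks[low].lock();
        try {
            locks[high].lock(); // ReentrantLock, so low == high is fine
            try {
                // Re-read both entries under the locks before deciding.
                Client a = clients.get(id);
                Client b = clients.get(otherId);
                if (a != null && b != null
                        && a.selectedId == otherId && b.selectedId == id) {
                    clients.remove(id);
                    clients.remove(otherId);
                    return new Client[]{a, b};
                }
                return null;
            } finally {
                locks[high].unlock();
            }
        } finally {
            locks[low].unlock();
        }
    }
}
```

Threads working on ids that fall into other stripes never wait on each other, while two match() calls touching the same pair are serialized, so the same client cannot be matched twice.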

Related

How to guarantee get() of ConcurrentHashMap to always return the latest actual value?

Introduction
Suppose I have a ConcurrentHashMap singleton:
public class RecordsMapSingleton {
private static final ConcurrentHashMap<String,Record> payments = new ConcurrentHashMap<>();
public static ConcurrentHashMap<String, Record> getInstance() {
return payments;
}
}
Then I have three subsequent requests (all processed by different threads) from different sources.
The first service makes a request, that gets the singleton, creates Record instance, generates unique ID and places it into Map, then sends this ID to another service.
Then the second service makes another request, with that ID. It gets the singleton, finds Record instance and modifies it.
Finally (probably after half an hour) the second service makes another request, in order to modify Record further.
Problem
In some really rare cases, I'm experiencing a heisenbug. In the logs I can see that the first request successfully placed the Record into the Map, the second request found it by ID and modified it, and then the third request tried to find the Record by ID, but found nothing (get() returned null).
The single thing that I found about ConcurrentHashMap guarantees, is:
Actions in a thread prior to placing an object into any concurrent
collection happen-before actions subsequent to the access or removal
of that element from the collection in another thread.
from here. If I got it right, it literally means that get() could return any value that was in the Map at some point, as long as it doesn't break the happens-before relationship between actions in different threads.
In my case it applies like this: if third request doesn't care about what happened during processing of first and second, then it could read null from Map.
It doesn't suit me, because I really need to get from Map the latest actual Record.
What have I tried
So I started to think, how to form happens-before relationship between subsequent Map modifications; and came with idea. JLS says (in 17.4.4) that:
A write to a volatile variable v (§8.3.1.4) synchronizes-with all
subsequent reads of v by any thread (where "subsequent" is defined
according to the synchronization order).
So, let's suppose, I'll modify my singleton like this:
public class RecordsMapSingleton {
private static final ConcurrentHashMap<String,Record> payments = new ConcurrentHashMap<>();
private static volatile long revision = 0;
public static ConcurrentHashMap<String, Record> getInstance() {
return payments;
}
public static void incrementRevision() {
revision++;
}
public static long getRevision() {
return revision;
}
}
Then, after each modification of Map or Record inside, I'll call incrementRevision() and before any read from Map I'll call getRevision().
Question
Due to the nature of heisenbugs, no amount of testing is enough to tell that this solution is correct. And I'm not an expert in concurrency, so I couldn't verify it formally.
Can someone confirm that following this approach guarantees that I'm always going to get the latest actual value from the ConcurrentHashMap? If this approach is incorrect or appears to be inefficient, could you recommend something else?

Your approach will not work, as you are actually repeating the same mistake. ConcurrentHashMap.put and ConcurrentHashMap.get create a happens-before relationship but give no time-ordering guarantee, and exactly the same applies to your reads and writes of the volatile variable: if one thread happens to call get before the other has performed put, then its volatile read will likewise happen before the volatile write. Besides that, you are adding another error, as applying the ++ operator to a volatile variable is not atomic.
The guarantees made for volatile variables are not stronger than those made for a ConcurrentHashMap. Its documentation explicitly states:
Retrievals reflect the results of the most recently completed update operations holding upon their onset.
The JLS states that external actions are inter-thread actions regarding the program order:
An inter-thread action is an action performed by one thread that can be detected or directly influenced by another thread. There are several kinds of inter-thread action that a program may perform:
…
External Actions. An external action is an action that may be observable outside of an execution, and has a result based on an environment external to the execution.
Simply said, if one thread puts into a ConcurrentHashMap and sends a message to an external entity and a second thread gets from the same ConcurrentHashMap after receiving a message from an external entity depending on the previously sent message, there can’t be a memory visibility issue.
It might be the case that these actions aren't programmed that way or that the external entity doesn't have the assumed dependency, or the error may lie in a completely different area, but we can't tell as you didn't post the relevant code; e.g. the key might not match or the printing code might be wrong. But whatever it is, it won't be fixed by the volatile variable.
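On the side remark that ++ on a volatile field is not atomic: if a revision counter were kept for some other reason, an AtomicLong would avoid lost increments. This is only a minimal sketch with hypothetical names (RevisionCounter, runConcurrently), and as the answer explains it does nothing about the ordering problem in the question.

```java
import java.util.concurrent.atomic.AtomicLong;

class RevisionCounter {
    private final AtomicLong revision = new AtomicLong();

    // incrementAndGet() is a single atomic read-modify-write,
    // unlike revision++ on a volatile long.
    long increment() { return revision.incrementAndGet(); }

    long get() { return revision.get(); }

    // Hypothetical helper: several threads increment one counter,
    // then the final value is returned once all of them have finished.
    static long runConcurrently(int threads, int perThread) {
        RevisionCounter counter = new RevisionCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) counter.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

With a plain volatile long and ++ in place of the AtomicLong, the same helper could return less than threads * perThread, because concurrent increments can overwrite each other.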

Scalable patterns for thread-safe hashtable puts when keeping track of frequency

This was an interview question I got some time last week and it ended at a cliffhanger. The question was simple: Design a service that keeps track of the frequency of "messages" (a 1 line string, could be in different languages) passed to it. There are 2 broad apis: submitMsg(String msg) and getFrequency(String msg). My immediate reaction was to use a HashMap that uses a String as a key (in this case, a message) and an Integer as a value (to keep track of counts/frequency).
The submitMsg api simply sees whether a message exists in the hashMap. If it doesn't, put the message and set the frequency to 1; if it does, then get the current count and increment it by 1. The interviewer then pointed out this would fail miserably in the event multiple threads access the SAME key at the SAME exact time.
For example: at 12:00:00:000 Thread1 would try to "submitMsg", and my method would (1) do a get on the hashMap and see that the value is not null; it is in fact, say, 100, and (2) do a put, incrementing the frequency by 1 so that the key's value is 101. Meanwhile, consider that Thread2 ALSO tried to do a submitMsg at exactly 12:00:00:000, and the method once again internally did a get on the hashMap (which returned 100; this is the race condition), after which it also set the frequency to 101. Alas, the true frequency should have been 102 and not 101, and this is a major design flaw in a heavily multithreaded environment. I wasn't sure how to stop this from happening: putting a lock on just the write isn't good enough, and having a lock on a read didn't make sense. What would have been ideal is to "lock" an element if a get was invoked internally via the submitMsg api, because we expect it to be "written to" soon after. The lock would be released once the frequency had been updated, but if someone were to use the getFrequency() api, having a pure lock wouldn't make sense. I'm not sure whether a mutex would help here because I don't have a strong background in distributed systems.
I'm looking to the SO community for help on the best way to think through a problem like this. Is the magic in the datastructure to be used or some kind of synchronization that I need to do in my api itself? How can we maintain the integrity of "frequency" while maintaining the scalability of the service as well?

Well, your initial idea isn't a million miles off, you just need to make it thread safe. For instance, you could use a ConcurrentHashMap<String, AtomicInteger>.
public void submitMsg(String msg) {
AtomicInteger previous = map.putIfAbsent(msg, new AtomicInteger(1));
if (null != previous) {
previous.incrementAndGet();
}
}
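On Java 8 or later, the same idea can also be written with ConcurrentHashMap.merge, which performs the whole read-increment-write for a key atomically. A minimal sketch, assuming a hypothetical FrequencyService class that exposes the two apis from the question:

```java
import java.util.concurrent.ConcurrentHashMap;

class FrequencyService {
    private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    // merge() inserts 1 if the key is absent, otherwise applies
    // Integer::sum to the old value and 1, atomically per key.
    void submitMsg(String msg) {
        counts.merge(msg, 1, Integer::sum);
    }

    // Reads never block; unknown messages report a frequency of 0.
    int getFrequency(String msg) {
        return counts.getOrDefault(msg, 0);
    }
}
```

Updates to different messages do not contend with each other beyond the map's internal bin locking, and getFrequency() needs no lock at all.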

The simplest solution is using Guava's com.google.common.collect.ConcurrentHashMultiset:
private final ConcurrentHashMultiset<String> multiset = ConcurrentHashMultiset.create();
public void submitMsg(String msg) {
multiset.add(msg);
}
public int count(String msg) {
return multiset.count(msg);
}
But this is basically the same as Aurand's solution, just that somebody already implemented the boring details like creating the counter if it doesn't exist yet, etc.

Treat it as a Producer–consumer problem.
The service is the producer; it should add each message to a queue that feeds the consumer. You could run one queue per producer to ensure that the producers do not wait.
The consumer encapsulates the HashTable, and pulls the messages off the queue and updates the table.
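A rough sketch of that producer-consumer layout, assuming a single shared queue and a single consumer thread; the QueuedCounter name and the STOP sentinel are hypothetical. Because only the consumer thread touches the table, a plain HashMap is enough.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueuedCounter {
    // Hypothetical poison pill; compared by identity, so a real
    // message with the text "STOP" is still counted normally.
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Map<String, Integer> counts = new HashMap<>(); // consumer thread only
    private final Thread consumer = new Thread(this::drain);

    QueuedCounter() { consumer.start(); }

    // Producers only enqueue; they never touch the table.
    void submitMsg(String msg) { queue.add(msg); }

    private void drain() {
        try {
            for (String msg = queue.take(); msg != STOP; msg = queue.take()) {
                counts.merge(msg, 1, Integer::sum);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stops the consumer and returns the final table. join() makes the
    // consumer's writes visible to the calling thread.
    Map<String, Integer> shutdownAndGetCounts() {
        queue.add(STOP);
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts;
    }
}
```

This sketch only exposes the counts after shutdown. A live getFrequency() would either have to be routed through the consumer as well or read from a concurrent map, which brings back the other answers' approach.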

specific question on java threading + synchronization

I know this question sounds crazy, but consider the following java snippets:
Part - I:
class Consumer implements Runnable{
    private boolean shouldTerminate = false;
    public void run() {
        while( !shouldTerminate ){
            //consume and perform some operation.
        }
    }
    public void terminate(){
        this.shouldTerminate = true;
    }
}
So, the first question is: do I ever need to synchronize on the shouldTerminate boolean? If so, why? I don't mind missing the flag set to true for one or two cycles (cycle = 1 loop execution). And second, can a boolean variable ever be in an inconsistent state (anything other than true or false)?
Part - II of the question:
class Cache<K,V> {
    private Map<K, V> cache = new HashMap<K, V>();
    public V getValue(K key) {
        if ( !cache.containsKey(key) ) {
            synchronized(this.cache){
                V value = loadValue(key);
                cache.put(key, value);
            }
        }
        return cache.get(key);
    }
}
Should access to the whole map be synchronized? Is there any possibility where two threads try to run this method, with one "writer thread" halfway through the process of storing value into the map and simultaneously, a "reader thread" invoking the "contains" method. Will this cause the JVM to blow up? (I don't mind overwriting values in the map -- if two writer threads try to load at the same time)

Both of the code examples have broken concurrency.
The first one requires at least the field marked volatile or else the other thread might never see the variable being changed (it may store its value in CPU cache or a register, and not check whether the value in memory has changed).
The second one is even more broken, because the internals of HashMap are not thread-safe and it's not just a single value but a complex data structure; using it from many threads produces completely unpredictable results. The general rule is that both reading and writing the shared state must be synchronized. You may also use ConcurrentHashMap for better performance.
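Both fixes mentioned here can be sketched as follows. The loader function passed to the cache is a hypothetical replacement for the question's loadValue(key), and computeIfAbsent needs Java 8 or later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class Consumer implements Runnable {
    // volatile guarantees that the write in terminate() becomes
    // visible to the thread spinning in run().
    private volatile boolean shouldTerminate = false;

    @Override
    public void run() {
        while (!shouldTerminate) {
            // consume and perform some operation
            Thread.yield();
        }
    }

    public void terminate() {
        this.shouldTerminate = true;
    }
}

class Cache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hypothetical stand-in for loadValue(key)

    Cache(Function<K, V> loader) { this.loader = loader; }

    // computeIfAbsent runs the loader at most once per absent key and is
    // safe against concurrent readers; the loader must not return null
    // if the value is supposed to be cached.
    public V getValue(K key) {
        return cache.computeIfAbsent(key, loader);
    }
}
```

Note that computeIfAbsent may block other updates to the same bin while the loader runs, so this fits best when loading a value is reasonably quick.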

Unless you either synchronize on the variable, or mark the variable as volatile, there is no guarantee that separate threads' views of the object ever get reconciled. To quote the Wikipedia article on the Java Memory Model:
The major caveat of this is that as-if-serial semantics do not prevent different threads from having different views of the data.
Realistically, so long as the two threads synchronize on some lock at some time, the update to the variable will be seen.
I am wondering why you wouldn't want to mark the variable volatile?

It's not that the JVM will "blow up" as such. But both cases are incorrectly synchronised, and so the results will be unpredictable. The bottom line is that JVMs are designed to behave in a particular way if you synchronise in a particular way; if you don't synchronise correctly, you lose that guarantee.
It's not uncommon for people to think they've found a reason why certain synchronisation can be omitted, or to unknowingly omit necessary synchronisation but with no immediately obvious problem. But with inadequate synchronisation, there is a danger that your program could appear to work fine in one environment, only for an issue to appear later when a particular factor is changed (e.g. moving to a machine with more CPUs, or an update to the JVM that adds a particular optimisation).

Synchronizing shouldTerminate: see Dilum's answer.
Your bool value will never be in an inconsistent state.
If one thread is calling cache.containsKey(key) while another thread is calling cache.put(key, value), the JVM will not literally blow up (containsKey does not throw ConcurrentModificationException), but something bad might happen if that put call caused the map to grow. It will usually mostly work, which is worse than failure.

Limiting concurrent access to a method

I have a problem with limiting concurrent access to a method. I have a method MyService that can be called from many places at many times. This method must return a String, that should be updated according to some rules. For this, I have an updatedString class. Before getting the String, it makes sure that the String is updated, if not, it updates it. Many threads could read the String at the same time but ONLY ONE should renew the String at the same time if it is out of date.
public final class updatedString {
private static final String UPstring;
private static final Object lock = new Object();
public static String getUpdatedString(){
synchronized(lock){
if(stringNeedRenewal()){
renewString();
}
}
return getString();
}
...
This works fine. If I have 7 threads getting the String, it guarantees that, if necessary, ONLY one thread is updating the String.
My question is, is it a good idea to have all this static? Why if not? Is it fast? Is there a better way to do this?
I have read posts like this:
What Cases Require Synchronized Method Access in Java? which suggests that static mutable variables are not a good idea, and static classes either. But I cannot see any dead-lock in the code or a better valid solution. Only that some threads will have to wait until the String is updated (if necessary) or wait for other thread to leave the synchronized block (which causes a small delay).
If the method is not static, then I have a problem, because this will not work: the synchronized method acts only on the current instance that the thread is using. Synchronizing the method does not work either; it seems that the lock is instance-specific and not class-specific.
The other solution could be to have a Singleton that avoids creating more than one instance and then use a single synchronized not-static class, but I do not like this solution too much.
Additional information:
stringNeedRenewal() is not too expensive although it has to read from a database. renewString() on the contrary is very expensive, and has to read from several tables on the database to finally come to an answer. The String needs arbitrary renewal, but this does not happen very often (from once per hour to once per week).
@forsvarir made me think... and I think he/she was right. return getString(); MUST be inside the synchronized block. At first sight it looks as if it could be outside it, so that threads would be able to read it concurrently, but what happens if a thread stops running WHILE calling getString() and another thread partially executes renewString()? We could have this situation (assuming a single processor):
THREAD 1 starts getString(). The OS starts copying into memory the bytes to be returned.
THREAD 1 is stopped by the OS before finishing the copy.
THREAD 2 enters the synchronized block and starts renewString(), changing the original String in memory.
THREAD 1 gets control back and finishes getString() using a corrupted String!! So it copied one part from the old string and another from the new one.
Having the read inside the synchronized block can make everything very slow, since threads could only access this one by one.
As @Jeremy Heiler pointed out, this is the abstract problem of a cache. If the cache is old, renew it. If not, use it. It is clearer to picture the problem like this instead of as a single String (or imagine that there are 2 strings instead of one). So what happens if someone is reading at the same time as someone is modifying the cache?

First of all, you can remove the lock and the synchronized block and simply use:
public static synchronized String getUpdatedString(){
if(stringNeedRenewal()){
renewString();
}
return getString();
}
This synchronizes on the updatedString.class object.
Another thing you can do is use double-checked locking to prevent unnecessary waiting. Declare the string to be volatile and:
public static String getUpdatedString(){
if(stringNeedRenewal()){
synchronized(lock) {
if(stringNeedRenewal()){
renewString();
}
}
}
return getString();
}
Then, whether to use static or not - it seems it should be static, since you want to invoke it without any particular instance.
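For completeness, here is a self-contained sketch of the double-checked variant. It is instance-based rather than static so that it can be exercised on its own, and the two functional parameters are hypothetical stand-ins for stringNeedRenewal() and renewString(). Because String is immutable and reference writes are atomic in Java, a reader of the volatile field sees either the old or the new string as a whole, never a mix of both.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

class UpdatedString {
    private final Object lock = new Object();
    private volatile String value;              // published via a volatile reference
    private final BooleanSupplier needsRenewal; // stand-in for stringNeedRenewal()
    private final Supplier<String> renew;       // stand-in for renewString()

    UpdatedString(BooleanSupplier needsRenewal, Supplier<String> renew) {
        this.needsRenewal = needsRenewal;
        this.renew = renew;
    }

    String getUpdatedString() {
        if (needsRenewal.getAsBoolean()) {          // first check, no lock held
            synchronized (lock) {
                if (needsRenewal.getAsBoolean()) {  // second check, under the lock
                    value = renew.get();            // only one thread renews
                }
            }
        }
        return value;                               // lock-free read of the current string
    }
}
```

Readers only wait when a renewal is actually pending. The unlocked first check assumes the renewal check itself is safe to call from several threads at once.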

I would suggest looking into a ReentrantReadWriteLock. (Whether or not it is performant is up to you to decide.) This way you can have many read operations occur simultaneously.
Here is the example from the documentation:
class CachedData {
Object data;
volatile boolean cacheValid;
ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
void processCachedData() {
rwl.readLock().lock();
if (!cacheValid) {
// Must release read lock before acquiring write lock
rwl.readLock().unlock();
rwl.writeLock().lock();
// Recheck state because another thread might have acquired
// write lock and changed state before we did.
if (!cacheValid) {
data = ...
cacheValid = true;
}
// Downgrade by acquiring read lock before releasing write lock
rwl.readLock().lock();
rwl.writeLock().unlock(); // Unlock write, still hold read
}
use(data);
rwl.readLock().unlock();
}
}

This isn't exactly what you're after, and I'm not a Java specialist, so take this with a pinch of salt :)
Perhaps the code sample you've provided is contrived, but if not, I'm unclear what the purpose of the class is. You only want one thread to update the string to its new value. Why? Is it to save effort (because you'd rather use the processor cycles on something else)? Is it to maintain consistency (once a certain point is reached, the string must be updated)?
How long is the cycle between required updates?
Looking at your code...
public final class updatedString {
private static final String UPstring;
private static final Object lock = new Object();
public static String getUpdatedString(){
synchronized(lock){
// One thread is in this block at a time
if(stringNeedRenewal()){
renewString(); // This updates the shared string?
}
}
// At this point, you're calling out to a method. I don't know what the
// method does, I'm assuming it just returns UPstring, but at this point,
// you're no longer synchronized. The string actually returned may or may
// not be the same one that was present when the thread went through the
// synchronized section hence the question, what is the purpose of the
// synchronization...
return getString(); // This returns the shared string?
}
The right locking / optimizations depend upon the reason that you're putting them in place, the likelihood of a write being required and, as Paulo has said, the cost of the operations involved.
For some situations where writes are rare, and obviously depending upon what renewString does, it may be desirable to use an optimistic write approach, where each thread checks if a refresh is required, proceeds to perform the update on a local and then, only at the end, assigns the value across to the field being read (you need to track the age of your updates if you follow this approach). This would be faster for reading, since the check for 'does the string need renewing' can be performed outside of the synchronised section. Various other approaches could be used, depending upon the individual scenario...

As long as your lock is static, everything else doesn't have to be, and things will work just as they do now.

Synchronizing on an Integer value [duplicate]

This question already has an answer here:
Closed 10 years ago.
Possible Duplicate:
What is the best way to increase number of locks in java
Suppose I want to lock based on an integer id value. In this case, there's a function that pulls a value from a cache and does a fairly expensive retrieve/store into the cache if the value isn't there.
The existing code isn't synchronized and could potentially trigger multiple retrieve/store operations:
// pseudocode
public Page getPage (Integer id){
    Page p = cache.get(id);
    if (p==null)
    {
        p=getFromDataBase(id);
        cache.store(p);
    }
    return p;
}
What I'd like to do is synchronize the retrieve on the id, e.g.
if (p==null)
{
synchronized (id)
{
..retrieve, store
}
}
Unfortunately this won't work because 2 separate calls can have the same Integer id value but a different Integer object, so they won't share the lock, and no synchronization will happen.
Is there a simple way of ensuring that you have the same Integer instance? For example, will this work:
synchronized (Integer.valueOf(id.intValue())){
The javadoc for Integer.valueOf() seems to imply that you're likely to get the same instance, but that doesn't look like a guarantee:
Returns a Integer instance representing the specified int value. If a new Integer instance is not required, this method should generally be used in preference to the constructor Integer(int), as this method is likely to yield significantly better space and time performance by caching frequently requested values.
So, any suggestions on how to get an Integer instance that's guaranteed to be the same, other than the more elaborate solutions like keeping a WeakHashMap of Lock objects keyed to the int? (nothing wrong with that, it just seems like there must be an obvious one-liner than I'm missing).

You really don't want to synchronize on an Integer, since you don't have control over what instances are the same and what instances are different. Java just doesn't provide such a facility (unless you're using Integers in a small range) that is dependable across different JVMs. If you really must synchronize on an Integer, then you need to keep a Map or Set of Integer so you can guarantee that you're getting the exact instance you want.
Better would be to create a new object, perhaps stored in a HashMap that is keyed by the Integer, to synchronize on. Something like this:
public Page getPage(Integer id) {
    Page p = cache.get(id);
    if (p == null) {
        synchronized (getCacheSyncObject(id)) {
            p = getFromDataBase(id);
            cache.store(p);
        }
    }
    return p;
}
private ConcurrentMap<Integer, Integer> locks = new ConcurrentHashMap<Integer, Integer>();
private Object getCacheSyncObject(final Integer id) {
    locks.putIfAbsent(id, id);
    return locks.get(id);
}
To explain this code, it uses ConcurrentMap, which allows use of putIfAbsent. You could do this:
locks.putIfAbsent(id, new Object());
but then you incur the (small) cost of creating an Object for each access. To avoid that, I just save the Integer itself in the Map. What does this achieve? Why is this any different from just using the Integer itself?
When you do a get() from a Map, the keys are compared with equals() (or at least the method used is the equivalent of using equals()). Two different Integer instances of the same value will be equal to each other. Thus, you can pass any number of different Integer instances of "new Integer(5)" as the parameter to getCacheSyncObject and you will always get back only the very first instance that was passed in that contained that value.
There are reasons why you may not want to synchronize on Integer ... you can get into deadlocks if multiple threads are synchronizing on Integer objects and are thus unwittingly using the same locks when they want to use different locks. You can fix this risk by using the
locks.putIfAbsent(id, new Object());
version and thus incurring a (very) small cost to each access to the cache. Doing this, you guarantee that this class will be doing its synchronization on an object that no other class will be synchronizing on. Always a Good Thing.
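A sketch of the private-lock-object variant, written with computeIfAbsent (Java 8+) so that a lock object is only created when the id has none yet. The PageCache name and the loader function are hypothetical, and String stands in for the Page type. Unlike the code above, the cache is checked again inside the synchronized block, so a thread that waited for the lock does not repeat the expensive load.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class PageCache {
    private final ConcurrentHashMap<Integer, Object> locks = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<Integer, String> cache = new ConcurrentHashMap<>();
    private final Function<Integer, String> loader; // stand-in for getFromDataBase(id)

    PageCache(Function<Integer, String> loader) { this.loader = loader; }

    String getPage(Integer id) {
        String p = cache.get(id);
        if (p == null) {
            // One private lock object per id value, whichever Integer
            // instance the caller happens to pass in.
            synchronized (locks.computeIfAbsent(id, k -> new Object())) {
                p = cache.get(id); // re-check: another thread may have loaded it
                if (p == null) {
                    p = loader.apply(id); // assumed to return non-null
                    cache.put(id, p);
                }
            }
        }
        return p;
    }
}
```

The locks map in this sketch is never pruned, so it grows with the number of distinct ids. That is acceptable for a bounded id space but would need cleanup otherwise.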

Use a thread-safe map, such as ConcurrentHashMap. This will allow you to manipulate a map safely, but use a different lock to do the real computation. In this way you can have multiple computations running simultaneous with a single map.
Use ConcurrentMap.putIfAbsent, but instead of placing the actual value, use a Future with computationally-light construction instead. Possibly the FutureTask implementation. Run the computation and then get the result, which will thread-safely block until done.
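The putIfAbsent-plus-FutureTask idea might look roughly like this. It is a sketch in the spirit of the description above, with a hypothetical Memoizer class and loader function, not a reference implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

class Memoizer<K, V> {
    private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hypothetical expensive computation

    Memoizer(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        Future<V> f = cache.get(key);
        if (f == null) {
            // Creating the task is cheap; nothing is computed yet.
            FutureTask<V> task = new FutureTask<V>(() -> loader.apply(key));
            f = cache.putIfAbsent(key, task);
            if (f == null) { // this thread won the race, so it computes
                f = task;
                task.run();
            }
        }
        try {
            return f.get(); // blocks until the winning thread is done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            cache.remove(key, f); // drop the failed future so a later call can retry
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

The map is only locked for the brief putIfAbsent; the expensive work runs outside any map lock, and computations for different keys proceed in parallel.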

Integer.valueOf() only returns cached instances for a limited range. You haven't specified your range, but in general, this won't work.
However, I would strongly recommend you not take this approach, even if your values are in the correct range. Since these cached Integer instances are available to any code, you can't fully control the synchronization, which could lead to a deadlock. This is the same problem people have trying to lock on the result of String.intern().
The best lock is a private variable. Since only your code can reference it, you can guarantee that no deadlocks will occur.
By the way, using a WeakHashMap won't work either. If the instance serving as the key is unreferenced, it will be garbage collected. And if it is strongly referenced, you could use it directly.

Using synchronized on an Integer sounds really wrong by design.
If you need to synchronize each item individually only during retrieve/store you can create a Set and store there the currently locked items. In other words,
// this contains only those IDs that are currently locked, that is, this
// will contain only very few IDs most of the time
Set<Integer> activeIds = ...
Object retrieve(Integer id) {
// acquire "lock" on item #id
synchronized(activeIds) {
while(activeIds.contains(id)) {
try {
activeIds.wait();
} catch(InterruptedException e){...}
}
activeIds.add(id);
}
try {
// do the retrieve here...
return value;
} finally {
// release lock on item #id
synchronized(activeIds) {
activeIds.remove(id);
activeIds.notifyAll();
}
}
}
The same goes to the store.
The bottom line is: there is no single line of code that solves this problem exactly the way you need.

How about a ConcurrentHashMap with the Integer objects as keys?

You could have a look at this code for creating a mutex from an ID. The code was written for String IDs, but could easily be edited for Integer objects.

As you can see from the variety of answers, there are various ways to skin this cat:
Goetz et al's approach of keeping a cache of FutureTasks works quite well in situations like this where you're "caching something anyway" so don't mind building up a map of FutureTask objects (and if you did mind the map growing, at least it's easy to make pruning it concurrent)
As a general answer to "how to lock on ID", the approach outlined by Antonio has the advantage that it's obvious when the map of locks is added to/removed from.
You may need to watch out for a potential issue with Antonio's implementation, namely that the notifyAll() will wake up threads waiting on all IDs when one of them becomes available, which may not scale very well under high contention. In principle, I think you can fix that by having a Condition object for each currently locked ID, which is then the thing that you await/signal. Of course, if in practice there's rarely more than one ID being waited on at any given time, then this isn't an issue.

Steve,
your proposed code has a bunch of problems with synchronization. (Antonio's does as well).
To summarize:
You need to cache an expensive object.
You need to make sure that while one thread is doing the retrieval, another thread does not also attempt to retrieve the same object.
That for n-threads all attempting to get the object only 1 object is ever retrieved and returned.
That for threads requesting different objects that they do not contend with each other.
pseudo code to make this happen (using a ConcurrentHashMap as the cache):
ConcurrentMap<Integer, java.util.concurrent.Future<Page>> cache = new ConcurrentHashMap<Integer, java.util.concurrent.Future<Page>>();
public Page getPage(Integer id) {
Future<Page> myFuture = new Future<Page>();
cache.putIfAbsent(id, myFuture);
Future<Page> actualFuture = cache.get(id);
if ( actualFuture == myFuture ) {
// I am the first w00t!
Page page = getFromDataBase(id);
myFuture.set(page);
}
return actualFuture.get();
}
Note:
java.util.concurrent.Future is an interface
java.util.concurrent.Future does not actually have a set() but look at the existing classes that implement Future to understand how to implement your own Future (Or use FutureTask)
Pushing the actual retrieval to a worker thread will almost certainly be a good idea.
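On Java 8 or later, CompletableFuture is one existing Future implementation that can be completed by hand, so the pseudo code above can be made runnable with complete() in place of the missing set(). This is a sketch under assumptions: FuturePageCache and the loader function are hypothetical, String stands in for Page, and the load runs on the calling thread rather than on a worker.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

class FuturePageCache {
    private final ConcurrentMap<Integer, CompletableFuture<String>> cache =
            new ConcurrentHashMap<>();
    private final Function<Integer, String> loader; // stand-in for getFromDataBase(id)

    FuturePageCache(Function<Integer, String> loader) { this.loader = loader; }

    String getPage(Integer id) {
        CompletableFuture<String> mine = new CompletableFuture<>();
        CompletableFuture<String> existing = cache.putIfAbsent(id, mine);
        if (existing != null) {
            // Someone else is (or was) first: wait for their result.
            return existing.join();
        }
        try {
            // This thread is first, so it does the retrieval.
            mine.complete(loader.apply(id));
        } catch (RuntimeException | Error e) {
            cache.remove(id, mine);        // let a later call retry
            mine.completeExceptionally(e); // release any waiting threads
        }
        return mine.join(); // rethrows a failure as CompletionException
    }
}
```

Threads asking for the same id share one retrieval, and threads asking for different ids do not contend with each other beyond the brief putIfAbsent.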

See section 5.6 in Java Concurrency in Practice: "Building an efficient, scalable, result cache". It deals with the exact issue you are trying to solve. In particular, check out the memoizer pattern.
