How to make use of element versioning in an EHCache instance?

How to make use of element versioning in an EHCache instance? - java

I am caching objects that are being sent to my component in an asynchronous way. In other words, the order in which these objects arrive is unpredictable. To avoid any issues, I have included a version attribute to my objects (which basically is a timestamp). The idea is that any object that arrives with a version that's older than the one that has already been cached, it can be discarded.
The "Element" class of EHCache (which wraps objects in an EHCache) seems to facilitate this: apart from a key and value, the constructor can take a (long-based) version. I cannot make this work in the way I'd expect it to work though. The following code snippet demonstrates my problem (Using EHCache 2.1.1):
public static void main(String[] args) {
final CacheManager manager = CacheManager.create();
final Cache testCache = new Cache(new CacheConfiguration("test", 40));
manager.addCache(testCache);
final String key = "key";
final Element elNew = new Element(key, "NEW", 2L);
testCache.put(elNew);
final Element elOld = new Element(key, "OLD", 1L);
testCache.put(elOld);
System.out.println("Cache content:");
for (Object k : testCache.getKeys()) {
System.out.println(testCache.get(k));
}
}
I'd expect the code above to cause the cached value to be "NEW", instead, "OLD" is printed. If you play a bit with the order in which elements are inserted, you'll find that the last one that has been inserted is the one that will remain in cache. Versioning seems to be ignored.
Am I not using the versioning-feature properly, or is it perhaps not intended to be used for this purpose? Can anyone recommend alternatives?

EhCache apparently ignores the value of the version field — its meaning is defined by the user. So EhCache overwrites your version 2L with version 1L without knowing what the version numbers mean.
See 1) http://jira.terracotta.org/jira/browse/EHC-765
it was decided that providing an internal versioning scheme would
cause unnecessary overhead for all users. Instead we now leave the
version value untouched so that it is entirely within the control of
the user.
And 2) http://jira.terracotta.org/jira/browse/EHC-666
[...] I would much prefer the solution proposed by Marek, that we grant the
user complete control over the version attribute and to not mutate it
at all internally. This prevents there being any performance impact
for the bulk of users, and allows the user the flexibility to use it
as they see fit. [...]
As agreed with Greg via email I fixed this as
per my last comment.
I suppose using the version field might lead to race conditions, resulting in one thread overwriting the up-to-date version of a cache item with a some what somewhat older version. Therefore, in my application, I have a counter that keeps track of the most recent version of the database, and when I load a cached value with a version field different from the most-recent-database-version-value, I know that the cached value might be stale and ignore it.

Related

Missing items from synchronized HashMap in a threaded environment

I know you have to synchronize around anything that would change the structure of a hashmap (put or remove) but it seems to me you also have to synchronize around reads of the hashmap otherwise you might be reading while another thread is changing the structure of the hashmap.
So I sync around gets and puts to my hashmap.
The only machines I have available to me to test with all only have one processor so I never had any real concurrency until the system went to production and started failing. Items were missing out of my hashmap. I assume this is because two threads were writing at the same time, but based on the code below, this should not be possible. When I turned down the number of threads to 1 it started working flawlessly, so it's definitely a threading problem.
Details:
// something for all the threads to sync on
private static Object EMREPORTONE = new Object();
synchronized (EMREPORTONE)
{
reportdatacache.put("name.." + eri.recip_map_id, eri.name);
reportdatacache.put("subjec" + eri.recip_map_id, eri.subject);
etc...
}
... and elsewhere....
synchronized (EMREPORTONE)
{
eri.name = (String)reportdatacache.get("name.." + eri.recip_map_id);
eri.subject = (String)reportdatacache.get("subjec" + eri.recip_map_id);
etc...
}
and that's it. I pass around reportdatacache between functions, but that's just the reference to the hashmap.
Another important point is that this is running as a servlet in an appserver (iplanet to be specific, but I know none of you have ever heard of that)
But regardless, EMREPORTONE is global to the webserver process, no two threads should be able to step on each other, yet my hashmap is getting wrecked. Any thoughts?

In servlet container environment static variables depend on classloader. So you may think that you're dealing with same static instance, but in fact it could be completely different one.
Additionally, check if you do not use the map by escaped reference elsewhere and write/remove keys from it.
And yes, use ConcurrentHashMap instead.

Yes, synchronization is not only important when writing, but also when reading. While a write will be performed under mutually exclusion, a reader might access an errenous state of the map.
I cannot recommend you under any circumstances to synchronize the Java Collections manually, there are thread-safe counterparts: Collections.synchronizedMap and ConcurrentHashMap. Use them, they will ensure, that access to them is safe in a multithreaded environment.
Futher hints, it seems that everyone is accesing the datareportcache. Is there only one instance of that object? Why not synchronize then on the cache itself? But forget then when trying to solve your problems, use the sugar from java.util.concurrent.

As I see it there are 3 possibilities here:
You are locking on two different objects. EMREPORTONE is private static however and the code that accesses the reportdatacache is in one file only. Ok, that isn't it then. But I would recommend locking on reportdatacache instead of EMREPORTONE however. Cleaner code.
You are missing some read or write to reportdatacache somewhere. There are other accesses to the map that are not synchronized. Are things never removed from the cache?
This isn't a synchronization problem but rather a race condition issue. The data in the hashmap is fine but you are expecting things to be in the cache but they haven't be stored by the other thread yet. Maybe 2 requests come in for the same eri at the same time and they are both putting values into the cache? Maybe check to see if the old value returned by put(...) is always null? Maybe explaining more about how you know that items are missing from the map would help with this.
As an aside, you are doing this:
reportdatacache.put("name.." + eri.recip_map_id, eri.name);
reportdatacache.put("subjec" + eri.recip_map_id, eri.subject);
But it seems like you really should be storing the eri by its id.
reportdatacache.put(recip_map_id, eri);
Then you aren't creating fake keys with the "name.." prefix. Or maybe you should create a NameSubject private static class to store the name and subject in the cache. Cleaner.
Hope something here helps.

Pattern for Java ConcurrentHashMap of Sets

A data structure that I use commonly in multi-threaded applications is a ConcurrentHashMap where I want to save a group of items that all share the same key. The problem occurs when installing the first item for a particular key value.
The pattern that I have been using is:
final ConcurrentMap<KEYTYPE, Set<VALUETYPE>> hashMap = new ConcurrentHashMap<KEYTYPE, Set<VALUETYPE>>();
// ...
Set<VALUETYPE> newSet = new HashSet<VALUETYPE>();
final Set<VALUETYPE> set = hashMap.putIfAbsent(key, newSet)
if (set != null) {
newSet = set;
}
synchronized (newSet) {
if (!newSet.contains(value)) {
newSet.add(value);
}
}
Is there a better pattern for doing this operation? Is this even thread-safe? Is there a better class to use for the inner Set than java.util.HashSet?

I strongly recommend using the Google Guava libraries for this, specifically an implementation of Multimap. The HashMultimap would be your best bet, though if you need concurrent update opterations you would need to wrap it in a delegate using Multimaps.synchronizedSetMultimap().
Another option is to use a ComputingMap (also from Guava), which is a map that, if the Value returned from a call to get(Key) does not exist, it is instantiated there and then. ComputingMaps are created using MapMaker.
The code from your question would be roughly:
ConcurrentMap<KEYTYPE, Set<VALUETYPE>> hashMap = new MapMaker()
.makeComputingMap(
new Function<KEYTYPE, VALUETYPE>() {
public Graph apply(KEYTYPE key) {
return new HashSet<VALUETYPE>();
}
});
The Function would only be called when a call to get() for a specific key would otherwise return null. This means that you can then do this:
hashMap.get(key).put(value);
safely knowing that the HashSet<VALUETYPE> is created if it doesn't already exist.
MapMaker is also relevant because of the control it gives you over the tuning of the returned Map, letting you specify, for example, the concurrency level using the method concurrencyLevel(). You may find that useful:
Guides the allowed concurrency among update operations. Used as a hint for internal sizing. The table is internally partitioned to try to permit the indicated number of concurrent updates without contention. Because assignment of entries to these partitions is not necessarily uniform, the actual concurrency observed may vary.

I think using java.util.concurrent.ConcurrentSkipListMap and java.util.concurrent.ConcurrentSkipListSet could help you resolve the concurrency concerns.

Persist guava cache on shutdown

I use the following guava cache to store messages for a specific time waiting for a possible response. So I use the cache more like a timeout for messages:
Cache cache = CacheBuilder.newBuilder().expireAfterWrite(7, TimeUnit.DAYS).build();
cache.put(id,message);
...
cache.getIfPresent(id);
In the end I need to persist the messages with its currently 'timeout' information on shutdown
and restore it on startup with the internal already expired times per entry. I couldn't find any methods which give me access to the time information, so I can handle it by myself.
The gauva wiki says:
Your application will not need to store more data than what would fit in RAM. (Guava caches are local to a single run of your application. They do not store data in files, or on outside servers. If this does not fit your needs, consider a tool like Memcached.)
Do you think this restriction address also a 'timeout' map to persist on shutdown?

I don't believe there's any way to recreate the cache with per-entry expiration values -- even if you do use reflection. You might be able to simulate it by using a DelayedQueue in a separate thread that explicitly invalidates entries that should have expired, but that's the best I think you can do.
That said, if you're just interested in peeking at the expiration information, I would recommend wrapping your cache values in a class that remembers the expiration time, so you can look up the expiration time for an entry just by looking up its value and calling a getExpirationTime() method or what have you.
That approach, at least, should not break with new Guava releases.

Well, unfortunately Guava doesn't seems to expose this functionality but if you feel adventurous and absolutely must have this you could always use reflection. Just look at sources and see what methods do you need. As always care should be taken as your code might break when Guaval internal implementation changes. Code below seems to work with Guava 10.0.1:
Cache<Integer, String> cache = CacheBuilder.newBuilder().expireAfterWrite(7, TimeUnit.DAYS).build(new CacheLoader<Integer, String>() {
#Override
public String load(Integer key) throws Exception {
return "The value is "+key.toString();
}
});
Integer key_1 = Integer.valueOf(1);
Integer key_2 = Integer.valueOf(2);
System.out.println(cache.get(key_1));
System.out.println(cache.get(key_2));
ConcurrentMap<Integer, String> map = cache.asMap();
Method m = map.getClass().getDeclaredMethod("getEntry", Object.class);
m.setAccessible(true);
for(Integer key: map.keySet()) {
Object value = m.invoke(map, key);
Method m2 = value.getClass().getDeclaredMethod("getExpirationTime", null);
m2.setAccessible(true);
Long expirationTime = (Long)m2.invoke(value, null);
System.out.println(key+" expiration time is "+expirationTime);
}

reliably forcing Guava map eviction to take place

EDIT: I've reorganized this question to reflect the new information that since became available.
This question is based on the responses to a question by Viliam concerning Guava Maps' use of lazy eviction: Laziness of eviction in Guava's maps
Please read this question and its response first, but essentially the conclusion is that Guava maps do not asynchronously calculate and enforce eviction. Given the following map:
ConcurrentMap<String, MyObject> cache = new MapMaker()
.expireAfterAccess(10, TimeUnit.MINUTES)
.makeMap();
Once ten minutes has passed following access to an entry, it will still not be evicted until the map is "touched" again. Known ways to do this include the usual accessors - get() and put() and containsKey().
The first part of my question [solved]: what other calls cause the map to be "touched"? Specifically, does anyone know if size() falls into this category?
The reason for wondering this is that I've implemented a scheduled task to occasionally nudge the Guava map I'm using for caching, using this simple method:
public static void nudgeEviction() {
cache.containsKey("");
}
However I'm also using cache.size() to programmatically report the number of objects contained in the map, as a way to confirm this strategy is working. But I haven't been able to see a difference from these reports, and now I'm wondering if size() also causes eviction to take place.
Answer: So Mark has pointed out that in release 9, eviction is invoked only by the get(), put(), and replace() methods, which would explain why I wasn't seeing an effect for containsKey(). This will apparently change with the next version of guava which is set for release soon, but unfortunately my project's release is set sooner.
This puts me in an interesting predicament. Normally I could still touch the map by calling get(""), but I'm actually using a computing map:
ConcurrentMap<String, MyObject> cache = new MapMaker()
.expireAfterAccess(10, TimeUnit.MINUTES)
.makeComputingMap(loadFunction);
where loadFunction loads the MyObject corresponding to the key from a database. It's starting to look like I have no easy way of forcing eviction until r10. But even being able to reliably force eviction is put into doubt by the second part of my question:
The second part of my question [solved]: In reaction to one of the responses to the linked question, does touching the map reliably evict all expired entries? In the linked answer, Niraj Tolia indicates otherwise, saying eviction is potentially only processed in batches, which would mean multiple calls to touch the map might be needed to ensure all expired objects were evicted. He did not elaborate, however this seems related to the map being split into segments based on concurrency level. Assuming I used r10, in which a containsKey("") does invoke eviction, would this then be for the entire map, or only for one of the segments?
Answer: maaartinus has addressed this part of the question:
Beware that containsKey and other reading methods only run postReadCleanup, which does nothing but on each 64th invocation (see DRAIN_THRESHOLD). Moreover, it looks like all cleanup methods work with single Segment only.
So it looks like calling containsKey("") wouldn't be a viable fix, even in r10. This reduces my question to the title: How can I reliably force eviction to occur?
Note: Part of the reason my web app is noticeably affected by this issue is that when I implemented caching I decided to use multiple maps - one for each class of my data objects. So with this issue there is the possibility that one area of code is executed, causing a bunch of Foo objects to be cached, and then the Foo cache isn't touched again for a long time so it doesn't evict anything. Meanwhile Bar and Baz objects are being cached from other areas of code, and memory is being eaten. I'm setting a maximum size on these maps, but this is a flimsy safeguard at best (I'm assuming its effect is immediate - still need to confirm this).
UPDATE 1: Thanks to Darren for linking the relevant issues - they now have my votes. So it looks like a resolution is in the pipeline, but seems unlikely to be in r10. In the meantime, my question remains.
UPDATE 2: At this point I'm just waiting for a Guava team member to give feedback on the hack maaartinus and I put together (see answers below).
LAST UPDATE: feedback received!

I just added the method Cache.cleanUp() to Guava. Once you migrate from MapMaker to CacheBuilder you can use that to force eviction.

I was wondering the about the same issue you described in the first part of your question. From what I can tell from looking at the source code for Guava's CustomConcurrentHashMap (release 9), it appears that entries are evicted on the get(), put(), and replace() methods. The containsKey() method does not appear to invoke eviction. I'm not 100% sure because I took a quick pass at the code.
Update:
I also found a more recent version of the CustomConcurrentHashmap in Guava's git repository and it looks like containsKey() has been updated to invoke eviction.
Both release 9 and the latest version I just found do not invoke eviction when size() is called.
Update 2:
I recently noticed that Guava r10 (yet to be released) has a new class called CacheBuilder. Basically this class is a forked version of the MapMaker but with caching in mind. The documentation suggests that it will support some of the eviction requirements you are looking for.
I reviewed the updated code in r10's version of the CustomConcurrentHashMap and found what looks like a scheduled map cleaner. Unfortunately, that code appears unfinished at this point but r10 looks more and more promising each day.

Beware that containsKey and other reading methods only run postReadCleanup, which does nothing but on each 64th invocation (see DRAIN_THRESHOLD). Moreover, it looks like all cleanup methods work with single Segment only.
The easiest way to enforce eviction seems to be to put some dummy object into each segment. For this to work, you'd need to analyze CustomConcurrentHashMap.hash(Object), which is surely no good idea, as this method may change anytime. Moreover, depending on the key class it may be hard to find a key with a hashCode ensuring it lands in a given segment.
You could use reads instead, but would have to repeat them 64 times per segment. Here, it'd easy to find a key with an appropriate hashCode, since here any object is allowed as an argument.
Maybe you could hack into the CustomConcurrentHashMap source code instead, it could be as trivial as
public void runCleanup() {
final Segment<K, V>[] segments = this.segments;
for (int i = 0; i < segments.length; ++i) {
segments[i].runCleanup();
}
}
but I wouldn't do it without a lot of testing and/or an OK by a guava team member.

Yep, we've gone back and forth a few times on whether these cleanup tasks should be done on a background thread (or pool), or should be done on user threads. If they were done on a background thread, this would eventually happen automatically; as it is, it'll only happen as each segment gets used. We're still trying to come up with the right approach here - I wouldn't be surprised to see this change in some future release, but I also can't promise anything or even make a credible guess as to how it will change. Still, you've presented a reasonable use case for some kind of background or user-triggered cleanup.
Your hack is reasonable, as long as you keep in mind that it's a hack, and liable to break (possibly in subtle ways) in future releases. As you can see in the source, Segment.runCleanup() calls runLockedCleanup and runUnlockedCleanup: runLockedCleanup() will have no effect if it can't lock the segment, but if it can't lock the segment it's because some other thread has the segment locked, and that other thread can be expected to call runLockedCleanup as part of its operation.
Also, in r10, there's CacheBuilder/Cache, analogous to MapMaker/Map. Cache is the preferred approach for many current users of makeComputingMap. It uses a separate CustomConcurrentHashMap, in the common.cache package; depending on your needs, you may want your GuavaEvictionHacker to work with both. (The mechanism is the same, but they're different Classes and therefore different Methods.)

I'm not a big fan of hacking into or forking external code until absolutely necessary. This problem occurs in part due to an early decision for MapMaker to fork ConcurrentHashMap, thereby dragging in a lot of complexity that could have been deferred until after the algorithms were worked out. By patching above MapMaker, the code is robust to library changes so that you can remove your workaround on your own schedule.
An easy approach is to use a priority queue of weak reference tasks and a dedicated thread. This has the drawback of creating many stale no-op tasks, which can become excessive in due to the O(lg n) insertion penalty. It works reasonably well for small, less frequently used caches. It was the original approach taken by MapMaker and its simple to write your own decorator.
A more robust choice is to mirror the lock amortization model with a single expiration queue. The head of the queue can be volatile so that a read can always peek to determine if it has expired. This allows all reads to trigger an expiration and an optional clean-up thread to check regularly.
By far the simplest is to use #concurrencyLevel(1) to force MapMaker to use a single segment. This reduces the write concurrency, but most caches are read heavy so the loss is minimal. The original hack to nudge the map with a dummy key would then work fine. This would be my preferred approach, but the other two options are okay if you have high write loads.

I don't know if it is appropriate for your use case, but your main concern about the lack of background cache eviction seems to be memory consumption, so I would have thought that using softValues() on the MapMaker to allow the Garbage Collector to reclaim entries from the cache when a low memory situation occurs. Could easily be the solution for you. I have used this on a subscription-server (ATOM) where entries are served through a Guava cache using SoftReferences for values.

Based on maaartinus's answer, I came up with the following code which uses reflection rather than directly modifying the source (If you find this useful please upvote his answer!). While it will come at a performance penalty for using reflection, the difference should be negligible since I'll run it about once every 20 minutes for each caching Map (I'm also caching the dynamic lookups in the static block which will help). I have done some initial testing and it appears to work as intended:
public class GuavaEvictionHacker {
//Class objects necessary for reflection on Guava classes - see Guava docs for info
private static final Class<?> computingMapAdapterClass;
private static final Class<?> nullConcurrentMapClass;
private static final Class<?> nullComputingConcurrentMapClass;
private static final Class<?> customConcurrentHashMapClass;
private static final Class<?> computingConcurrentHashMapClass;
private static final Class<?> segmentClass;
//MapMaker$ComputingMapAdapter#cache points to the wrapped CustomConcurrentHashMap
private static final Field cacheField;
//CustomConcurrentHashMap#segments points to the array of Segments (map partitions)
private static final Field segmentsField;
//CustomConcurrentHashMap$Segment#runCleanup() enforces eviction on the calling Segment
private static final Method runCleanupMethod;
static {
try {
//look up Classes
computingMapAdapterClass = Class.forName("com.google.common.collect.MapMaker$ComputingMapAdapter");
nullConcurrentMapClass = Class.forName("com.google.common.collect.MapMaker$NullConcurrentMap");
nullComputingConcurrentMapClass = Class.forName("com.google.common.collect.MapMaker$NullComputingConcurrentMap");
customConcurrentHashMapClass = Class.forName("com.google.common.collect.CustomConcurrentHashMap");
computingConcurrentHashMapClass = Class.forName("com.google.common.collect.ComputingConcurrentHashMap");
segmentClass = Class.forName("com.google.common.collect.CustomConcurrentHashMap$Segment");
//look up Fields and set accessible
cacheField = computingMapAdapterClass.getDeclaredField("cache");
segmentsField = customConcurrentHashMapClass.getDeclaredField("segments");
cacheField.setAccessible(true);
segmentsField.setAccessible(true);
//look up the cleanup Method and set accessible
runCleanupMethod = segmentClass.getDeclaredMethod("runCleanup");
runCleanupMethod.setAccessible(true);
}
catch (ClassNotFoundException cnfe) {
throw new RuntimeException("ClassNotFoundException thrown in GuavaEvictionHacker static initialization block.", cnfe);
}
catch (NoSuchFieldException nsfe) {
throw new RuntimeException("NoSuchFieldException thrown in GuavaEvictionHacker static initialization block.", nsfe);
}
catch (NoSuchMethodException nsme) {
throw new RuntimeException("NoSuchMethodException thrown in GuavaEvictionHacker static initialization block.", nsme);
}
}
/**
* Forces eviction to take place on the provided Guava Map. The Map must be an instance
* of either {#code CustomConcurrentHashMap} or {#code MapMaker$ComputingMapAdapter}.
*
* #param guavaMap the Guava Map to force eviction on.
*/
public static void forceEvictionOnGuavaMap(ConcurrentMap<?, ?> guavaMap) {
try {
//we need to get the CustomConcurrentHashMap instance
Object customConcurrentHashMap;
//get the type of what was passed in
Class<?> guavaMapClass = guavaMap.getClass();
//if it's a CustomConcurrentHashMap we have what we need
if (guavaMapClass == customConcurrentHashMapClass) {
customConcurrentHashMap = guavaMap;
}
//if it's a NullConcurrentMap (auto-evictor), return early
else if (guavaMapClass == nullConcurrentMapClass) {
return;
}
//if it's a computing map we need to pull the instance from the adapter's "cache" field
else if (guavaMapClass == computingMapAdapterClass) {
customConcurrentHashMap = cacheField.get(guavaMap);
//get the type of what we pulled out
Class<?> innerCacheClass = customConcurrentHashMap.getClass();
//if it's a NullComputingConcurrentMap (auto-evictor), return early
if (innerCacheClass == nullComputingConcurrentMapClass) {
return;
}
//otherwise make sure it's a ComputingConcurrentHashMap - error if it isn't
else if (innerCacheClass != computingConcurrentHashMapClass) {
throw new IllegalArgumentException("Provided ComputingMapAdapter's inner cache was an unexpected type: " + innerCacheClass);
}
}
//error for anything else passed in
else {
throw new IllegalArgumentException("Provided ConcurrentMap was not an expected Guava Map: " + guavaMapClass);
}
//pull the array of Segments out of the CustomConcurrentHashMap instance
Object[] segments = (Object[])segmentsField.get(customConcurrentHashMap);
//loop over them and invoke the cleanup method on each one
for (Object segment : segments) {
runCleanupMethod.invoke(segment);
}
}
catch (IllegalAccessException iae) {
throw new RuntimeException(iae);
}
catch (InvocationTargetException ite) {
throw new RuntimeException(ite.getCause());
}
}
}
I'm looking for feedback on whether this approach is advisable as a stopgap until the issue is resolved in a Guava release, particularly from members of the Guava team when they get a minute.
EDIT: updated the solution to allow for auto-evicting maps (NullConcurrentMap or NullComputingConcurrentMap residing in a ComputingMapAdapter). This turned out to be necessary in my case, since I'm calling this method on all of my maps and a few of them are auto-evictors.

Synchronizing on an Integer value [duplicate]

This question already has an answer here:
Closed 10 years ago.
Possible Duplicate:
What is the best way to increase number of locks in java
Suppose I want to lock based on an integer id value. In this case, there's a function that pulls a value from a cache and does a fairly expensive retrieve/store into the cache if the value isn't there.
The existing code isn't synchronized and could potentially trigger multiple retrieve/store operations:
//psuedocode
public Page getPage (Integer id){
Page p = cache.get(id);
if (p==null)
{
p=getFromDataBase(id);
cache.store(p);
}
}
What I'd like to do is synchronize the retrieve on the id, e.g.
if (p==null)
{
synchronized (id)
{
..retrieve, store
}
}
Unfortunately this won't work because 2 separate calls can have the same Integer id value but a different Integer object, so they won't share the lock, and no synchronization will happen.
Is there a simple way of insuring that you have the same Integer instance? For example, will this work:
syncrhonized (Integer.valueOf(id.intValue())){
The javadoc for Integer.valueOf() seems to imply that you're likely to get the same instance, but that doesn't look like a guarantee:
Returns a Integer instance
representing the specified int value.
If a new Integer instance is not
required, this method should generally
be used in preference to the
constructor Integer(int), as this
method is likely to yield
significantly better space and time
performance by caching frequently
requested values.
So, any suggestions on how to get an Integer instance that's guaranteed to be the same, other than the more elaborate solutions like keeping a WeakHashMap of Lock objects keyed to the int? (nothing wrong with that, it just seems like there must be an obvious one-liner than I'm missing).

You really don't want to synchronize on an Integer, since you don't have control over what instances are the same and what instances are different. Java just doesn't provide such a facility (unless you're using Integers in a small range) that is dependable across different JVMs. If you really must synchronize on an Integer, then you need to keep a Map or Set of Integer so you can guarantee that you're getting the exact instance you want.
Better would be to create a new object, perhaps stored in a HashMap that is keyed by the Integer, to synchronize on. Something like this:
public Page getPage(Integer id) {
Page p = cache.get(id);
if (p == null) {
synchronized (getCacheSyncObject(id)) {
p = getFromDataBase(id);
cache.store(p);
}
}
}
private ConcurrentMap<Integer, Integer> locks = new ConcurrentHashMap<Integer, Integer>();
private Object getCacheSyncObject(final Integer id) {
locks.putIfAbsent(id, id);
return locks.get(id);
}
To explain this code, it uses ConcurrentMap, which allows use of putIfAbsent. You could do this:
locks.putIfAbsent(id, new Object());
but then you incur the (small) cost of creating an Object for each access. To avoid that, I just save the Integer itself in the Map. What does this achieve? Why is this any different from just using the Integer itself?
When you do a get() from a Map, the keys are compared with equals() (or at least the method used is the equivalent of using equals()). Two different Integer instances of the same value will be equal to each other. Thus, you can pass any number of different Integer instances of "new Integer(5)" as the parameter to getCacheSyncObject and you will always get back only the very first instance that was passed in that contained that value.
There are reasons why you may not want to synchronize on Integer ... you can get into deadlocks if multiple threads are synchronizing on Integer objects and are thus unwittingly using the same locks when they want to use different locks. You can fix this risk by using the
locks.putIfAbsent(id, new Object());
version and thus incurring a (very) small cost to each access to the cache. Doing this, you guarantee that this class will be doing its synchronization on an object that no other class will be synchronizing on. Always a Good Thing.

Use a thread-safe map, such as ConcurrentHashMap. This will allow you to manipulate a map safely, but use a different lock to do the real computation. In this way you can have multiple computations running simultaneous with a single map.
Use ConcurrentMap.putIfAbsent, but instead of placing the actual value, use a Future with computationally-light construction instead. Possibly the FutureTask implementation. Run the computation and then get the result, which will thread-safely block until done.

Integer.valueOf() only returns cached instances for a limited range. You haven't specified your range, but in general, this won't work.
However, I would strongly recommend you not take this approach, even if your values are in the correct range. Since these cached Integer instances are available to any code, you can't fully control the synchronization, which could lead to a deadlock. This is the same problem people have trying to lock on the result of String.intern().
The best lock is a private variable. Since only your code can reference it, you can guarantee that no deadlocks will occur.
By the way, using a WeakHashMap won't work either. If the instance serving as the key is unreferenced, it will be garbage collected. And if it is strongly referenced, you could use it directly.

Using synchronized on an Integer sounds really wrong by design.
If you need to synchronize each item individually only during retrieve/store you can create a Set and store there the currently locked items. In another words,
// this contains only those IDs that are currently locked, that is, this
// will contain only very few IDs most of the time
Set<Integer> activeIds = ...
Object retrieve(Integer id) {
// acquire "lock" on item #id
synchronized(activeIds) {
while(activeIds.contains(id)) {
try {
activeIds.wait();
} catch(InterruptedExcption e){...}
}
activeIds.add(id);
}
try {
// do the retrieve here...
return value;
} finally {
// release lock on item #id
synchronized(activeIds) {
activeIds.remove(id);
activeIds.notifyAll();
}
}
}
The same goes to the store.
The bottom line is: there is no single line of code that solves this problem exactly the way you need.

How about a ConcurrentHashMap with the Integer objects as keys?

You could have a look at this code for creating a mutex from an ID. The code was written for String IDs, but could easily be edited for Integer objects.

As you can see from the variety of answers, there are various ways to skin this cat:
Goetz et al's approach of keeping a cache of FutureTasks works quite well in situations like this where you're "caching something anyway" so don't mind building up a map of FutureTask objects (and if you did mind the map growing, at least it's easy to make pruning it concurrent)
As a general answer to "how to lock on ID", the approach outlined by Antonio has the advantage that it's obvious when the map of locks is added to/removed from.
You may need to watch out for a potential issue with Antonio's implementation, namely that the notifyAll() will wake up threads waiting on all IDs when one of them becomes available, which may not scale very well under high contention. In principle, I think you can fix that by having a Condition object for each currently locked ID, which is then the thing that you await/signal. Of course, if in practice there's rarely more than one ID being waited on at any given time, then this isn't an issue.

Steve,
your proposed code has a bunch of problems with synchronization. (Antonio's does as well).
To summarize:
You need to cache an expensive
object.
You need to make sure that while one thread is doing the retrieval, another thread does not also attempt to retrieve the same object.
That for n-threads all attempting to get the object only 1 object is ever retrieved and returned.
That for threads requesting different objects that they do not contend with each other.
pseudo code to make this happen (using a ConcurrentHashMap as the cache):
ConcurrentMap<Integer, java.util.concurrent.Future<Page>> cache = new ConcurrentHashMap<Integer, java.util.concurrent.Future<Page>>;
public Page getPage(Integer id) {
Future<Page> myFuture = new Future<Page>();
cache.putIfAbsent(id, myFuture);
Future<Page> actualFuture = cache.get(id);
if ( actualFuture == myFuture ) {
// I am the first w00t!
Page page = getFromDataBase(id);
myFuture.set(page);
}
return actualFuture.get();
}
Note:
java.util.concurrent.Future is an interface
java.util.concurrent.Future does not actually have a set() but look at the existing classes that implement Future to understand how to implement your own Future (Or use FutureTask)
Pushing the actual retrieval to a worker thread will almost certainly be a good idea.

See section 5.6 in Java Concurrency in Practice: "Building an efficient, scalable, result cache". It deals with the exact issue you are trying to solve. In particular, check out the memoizer pattern.
(source: umd.edu)

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.