My service endpoint recieves a list of metric every minute along with their timestamps. If a metric passes certain conditions we need to store them in a cache, so that they can be accessed later. The access functions for this service are -
List<Metrics> GetAllInterestingMetrics5Mins();
List<Metrics> GetAllInterestingMetrics10Mins();
List<Metrics> GetAllInterestingMetrics30Mins();
My curent solution is to use 3 Guava caches with time based eviction set to 5, 10 & 15 minutes. When somebody calls one of the above functions, I return all the metrics from the relvant cache.
There are 2 problems with this -
Guava cache start timing for eviction based on when the value is put in the cache (or accessed, depending upon setting). Now its possible for a metric to be delayed, so the timestamp would be earlier than the time when the metric is put in the cache.
I dont like that I have to create 3 caches, when one cache with 30 mins should suffice, it increases memory footprint and complexity in cache handling.
Is there a way to solve these 2 problems in Guava or any other out of box caching solution ?
There is a particular difference between caching solutions like Guava and EHCache and what you are trying to implement. The sole purpose of these caches is to act in the same way than getter functions work. So, caches are intended to retrieve a single element by its key and store it for further use; evicting it after it stops being used.
E.g.
#Cacheable
public Object getter(String key){
...
}
That's why getting a whole set of objects from the cache feels a little like forcing the cache and the eviction policy to work differently from its original purpose.
What you need, instead of Guava cache (or other caching solutions), is a collection that can be evicted all at once by a timer function. Sadly, Guava doesn't provide that right now. You would still need a timer function provided by the application that would remove all existing elements from the cache.
So, my suggestion would be the following:
Even when it is possible for Guava to behave in the way you want it to, you will find out that you are not using the features that make Guava really valuable, and you are "forcing" it to behave differently. So I suggest you forget about the Guava implementation and consider using, for example, an specialization from the AbstractMap class, along with a timer function that will evict its contents every N seconds.
This way you will be able to have all your entries in a single cache and stop worrying about the discrepancies between the timestamp and the time the entry was added to the cache.
Regarding Topic 1:
Just a sidenote: Please do not confuse expiry and eviction. Expiry means the entry may not returned by the cache any more and may happen at a specified point in time or after a duration. Eviction is the action to free resources, the entry is removed from the cache. After expiry, eviction may happen at the same time or later.
All common cache products do not have support for exact, aka "point in time", expiry. We need that usecase very often in our applications so I spent some effort with cache2k to support this.
Here is a blueprint for cache2k:
static class MetricsEntry {
long nextUpdate;
List<Metrics> metrics;
}
static class MyEntryExpiryCalculator implements EntryExpiryCalculator<Integer, MetricsEntry> {
#Override
public long calculateExpiryTime(Integer _key, MetricsEntry _value, long _fetchTime, CacheEntry _oldEntry) {
return _value.nextUpdate;
}
}
Cache createTheCache() {
Cache<Integer, MetricsEntry> cache =
CacheBuilder.newCache(Integer.class, MetricsEntry.class)
.sharpExpiry(true)
.entryExpiryCalculator(new MyEntryExpiryCalculator())
.source(new MySource())
.build();
return cache;
}
If you have a time reference in the metrics objects, you can use that and you can omit the additional entry class. sharpExpiry(true) instructs cache2k for exact expiry. If you leave this out, the expiry may be a few milliseconds off, but the access time is slightly faster.
Regarding Topic 2:
The straight forward approach would be to use the interval minutes as cache key.
Here is a cache source (aka cache loader) that strictly returns the metrics of the previous interval:
static class MySource implements CacheSource<Integer, MetricsEntry> {
#Override
public MetricsEntry get(Integer interval) {
MetricsEntry e = new MetricsEntry();
boolean crossedIntervalEnd;
do {
long now = System.currentTimeMillis();
long intervalMillis = interval * 1000 * 60;
long startOfInterval = now % (intervalMillis);
e.metrics = calculateMetrics(startOfInterval, interval);
e.nextUpdate = startOfInterval + intervalMillis;
now = System.currentTimeMillis();
crossedIntervalEnd = now >= e.nextUpdate;
} while (crossedIntervalEnd);
return e;
}
}
That would return the metrics for 10:00-10:05 if you do the request on lets say 10:07.
If you just want to calculate instantly the metrics of the past interval, then it is simpler:
static class MySource implements CacheSource<Integer, MetricsEntry> {
#Override
public MetricsEntry get(Integer interval) {
MetricsEntry e = new MetricsEntry();
long intervalMillis = interval * 1000 * 60;
long startOfInterval = System.currentTimeMillis();
e.metrics = calculateMetrics(startOfInterval, interval);
e.nextUpdate = startOfInterval + intervalMillis;
return e;
}
}
The use of the cache source has an advantage over put(). cache2k is blocking, so if multiple requests come in for one metric, only one metric calculation is started.
If you don't need exact expiry to the millisecond, you can use other caches, too. The thing you need to do is to store the time it takes to calculate the metrics within your cache value and then correct the expiry duration accordingly.
Have a good one!
Have you considered using something like a Deque instead? Just put the metrics in the queue and when you want to retrieve metrics for the last N minutes, just start at the end with the most recent additions and take everything until you find one that's from > N minutes ago. You can evict entries that are too old from the other end in a similar way. (It's not clear to me from your question how the key/value aspect of Cache relates to your problem.)
Related
I have List<CompletableFuture> stored in Map
private final Map<UUID, List<CompletableFuture>> hydrationProcesses = new ConcurrentHashMap<>();
Currently there is a daemon thread which runs every 30 sec and removes all Futures that have alredy been completed.
There are lots of TTL invalidation implementations, I am looking for invalidation based on some predicate. I want to get rid of this daemon thread.
Is there any out of a box solution for scheduled cache invalidation based on some custom logic. Maybe something in Spring/Guava that I missed?
Maybe something similar to Guava's
CacheBuilder.newBuilder()
.expireAfterAccess(2,TimeUnit.MILLISECONDS)
.build(loader);
But rather then mark everything as expired after access I need to check if this Future has already been completed and then remove it from cache.
I doubt there is a solution based on Guava. Since you don't specifically ask for a "Guava-only" solution, I'd like to give an idea how to solve the problem with cache2k.
Here is how the setup could look like:
final long RECHECK_INTERVAL_MILLIS = 30000;
Cache<UUID, CompletableFuture<Void>> cache =
new Cache2kBuilder<UUID, CompletableFuture<Void>>(){}
.loader(new AdvancedCacheLoader<UUID, CompletableFuture<Void>>() {
#Override
public CompletableFuture<Void> load(UUID key,
long currentTime,
CacheEntry<UUID, CompletableFuture<Void>> currentEntry) {
return currentEntry != null ? currentEntry.getValue() : null;
}
})
.expiryPolicy(new ExpiryPolicy<UUID, CompletableFuture<Void>>() {
#Override
public long calculateExpiryTime(UUID key,
CompletableFuture<Void> value,
long loadTime,
CacheEntry<UUID, CompletableFuture<Void>> oldEntry) {
return value.isDone() ? NOW : loadTime + RECHECK_INTERVAL_MILLIS;
}
})
.refreshAhead(true)
.build();
Actually cache2k has similar capabilities then Guava or other caches. However there are tiny extensions that allow more sophisticated setups.
The trick used here, is to configure the cache in read through operation, but make the loader return the current cache value. When an entry expires, the loader is called, because of refreshAhead(true), but the current value is preserved and the expiry policy is evaluated again. The predicate that you like to be checked goes into the expiry policy.
Other caches have read through and custom expiry as well, but lack the concept of a "smarter loader" (AdvancedCacheLoader) that can act more efficiently based on the existing cache value.
We use setups similar to this in production.
There is a downside, too. cache2k uses one timing thread per cache, if custom expiry is used. This means your extra thread will not go away. cache2k will be enhanced in the future to have a global set of threads for timing that are shared among all caches. Update: New versions of cache2k use a common thread pool for timers.
Disclaimer: I am the author of cache2k, so I cannot speak with absolute certainty that there is no possible solution based on Guava.
I'm using CacheBuilder and LocalCache from guava library, but have some performance issues p99.9 latency around 300-400 ms for getAllPresent.
Latency for requests almost doubles between p99 and p99.9 (p99 is around 150 ms)
The following configuration is used:
120 sec for refreshAfterWrite, maxsize is set to be 2e6 and expiration for 24 hours, initial capacity is 1e6. No removeListener is used and no expireAfterWrite. ConcurrencyLevel 256 (Tried different values). Machine has 12 cores.
While cache is in use it has between 8e5 to 1.2e6 entries.
Pattern of usage is getAllPresent for around 3k keys on p99.9 and around 100 qps.
Key is a complex object for hashCode, Objects.hash method is used with all fields supplied there. I tried different hash function to make sure that distribution is uniform (murmur3 shown similar results). So, the problem is not in collisions.
Any pointers on how to tune it to be more performant?
I would say it is efficient in Java for the 99%tile to be double the 90%tile and for the 99.9%tile to be double the 99%tile. If you see this pattern, you will need to reduce the cost of the operation over all to reduce the latency i.e. it is unlikely there is some quick wins that will help you.
NOTE: when you have a large cache and scan across it you can expect every entry to involve at least one or two L3 cache misses. This is going to be expensive. For a small cache which fits in your CPU cache this will be many times faster.
I would use a profiler to reduce CPU and memory allocation for this operation, or change the how you call the cache to do what you need and this will also bring down the 99.9%tile.
On varying request times / "Request times doubles between p99 and p99.9"
That might simply be an occasional GC during the getAllPresent call. To really investigate this you should do a stripped down benchmark which tracks the GC activity (just the counters).
Another source of trouble may be a lock contention. I am missing in your problem statement the exact access pattern. How many requests are done in parallel? How does the key space overlap? Guava partitions the cache hashtables internally and uses the concurrencyLevel as hint. The read access is not completely lock free, since the LRU list needs to be updated. For accessing the same key from different threads, this is a source for lock contention. Here is an (outdated) evaluation on nitro cache performance showing this effect. (Update: the guava cache has some strategy to avoid the locks on read; this needs further investigation)
On how to get (15 times?) faster
The most costly thing when you access the cache is the eviction algorithm updating its data structure. However, your maximum cache size (2E6) is above the maximum experienced size (1.2E6). This means no eviction will take place, because the capacity limit is never reached. This means that all the updating of the LRU list in Guava Cache is senseless. I have benchmarked the cache runtime for Google Guava, EHCache, infinispan and different eviction strategies at cache2k benchmarks see the "runtime comparison for hits". Benchmarks for multi threaded accesses are missing yet, this will show up during august.
From my understanding there is no option to change or switch of the eviction strategy in Guava Cache (can anybody second this?).
Within cache2k I do experiment with alternative eviction strategies which allow a lock free read access. Within your scenario, you could simply select "random eviction", and I would expect a speedup of about factor 15. BTW: The cache2k cache also prints out hash table statistics and a quality metric for your hashCode() implementation see the notes on cache2k statistics.
It should be possible to do a quick evaluation. Here some code snippets to get you started quickly:
<dependency>
<groupId>org.cache2k</groupId>
<artifactId>cache2k-core</artifactId>
<version>0.19.1</version>
</dependency>
<dependency>
<groupId>org.cache2k</groupId>
<artifactId>cache2k-api</artifactId>
<version>0.19.1</version>
</dependency>
Remark: The cache implementations are not exposed in the API module, that's why we need the core module in the compile scope. Cache initialization:
// optional data source (similar to CacheLoader)
CacheSource<Integer, String> source =
new CacheSource<Integer, String>() {
public String get(Integer o) {
return o + "hello";
}
};
Cache<Integer, String> cache =
CacheBuilder.newCache(Integer.class, String.class)
.implementation(RandomCache.class)
.maxSize(3000000)
.expiryMillis(120 * 1000)
/* optional, if cache should do the refresh itself
.source(source)
.backgroundRefresh(true)
*/
.build();
You can experiment with other eviction algorithms by altering the implementation option.
getAllPresent is not available in cache2k, you can code it yourself:
public Map<Integer, String> getAllPresent(Iterator<Integer> it) {
HashMap<Integer, String> hash = new HashMap<>();
while(it.hasNext()) {
int k = it.next();
String v = cache.peek(k);
if (v != null) {
hash.put(k, v);
}
}
return hash;
}
In cache2k cache.peek() returns a mapped element without invoking the cache source, that is exactly the intended semantic of getAllPresent. Building up the hash map produces actually a lot GC load. The usage of bulk operations like getAll or getAllPresent should be a careful decision. Since the access times in cache2k are similar to a hash table access time, bulk operations will probably not speed things up.
A note on getAllPresent()
Within cache2k there is a JSR107 compatible getAll() method which serves about the same purpose. From an API designers standpoint these methods are evil, since it contradicts the idea of the cache to control the resources. Just got with cache.get() or cache.peek(). If there is a CacheSource (aka CacheLoader) use cache.prefetch(keys) "say to the cache" that you want to work with these keys next.... Sorry, a little offtopic.
I use the following guava cache to store messages for a specific time waiting for a possible response. So I use the cache more like a timeout for messages:
Cache cache = CacheBuilder.newBuilder().expireAfterWrite(7, TimeUnit.DAYS).build();
cache.put(id,message);
...
cache.getIfPresent(id);
In the end I need to persist the messages with its currently 'timeout' information on shutdown
and restore it on startup with the internal already expired times per entry. I couldn't find any methods which give me access to the time information, so I can handle it by myself.
The gauva wiki says:
Your application will not need to store more data than what would fit in RAM. (Guava caches are local to a single run of your application. They do not store data in files, or on outside servers. If this does not fit your needs, consider a tool like Memcached.)
Do you think this restriction address also a 'timeout' map to persist on shutdown?
I don't believe there's any way to recreate the cache with per-entry expiration values -- even if you do use reflection. You might be able to simulate it by using a DelayedQueue in a separate thread that explicitly invalidates entries that should have expired, but that's the best I think you can do.
That said, if you're just interested in peeking at the expiration information, I would recommend wrapping your cache values in a class that remembers the expiration time, so you can look up the expiration time for an entry just by looking up its value and calling a getExpirationTime() method or what have you.
That approach, at least, should not break with new Guava releases.
Well, unfortunately Guava doesn't seems to expose this functionality but if you feel adventurous and absolutely must have this you could always use reflection. Just look at sources and see what methods do you need. As always care should be taken as your code might break when Guaval internal implementation changes. Code below seems to work with Guava 10.0.1:
Cache<Integer, String> cache = CacheBuilder.newBuilder().expireAfterWrite(7, TimeUnit.DAYS).build(new CacheLoader<Integer, String>() {
#Override
public String load(Integer key) throws Exception {
return "The value is "+key.toString();
}
});
Integer key_1 = Integer.valueOf(1);
Integer key_2 = Integer.valueOf(2);
System.out.println(cache.get(key_1));
System.out.println(cache.get(key_2));
ConcurrentMap<Integer, String> map = cache.asMap();
Method m = map.getClass().getDeclaredMethod("getEntry", Object.class);
m.setAccessible(true);
for(Integer key: map.keySet()) {
Object value = m.invoke(map, key);
Method m2 = value.getClass().getDeclaredMethod("getExpirationTime", null);
m2.setAccessible(true);
Long expirationTime = (Long)m2.invoke(value, null);
System.out.println(key+" expiration time is "+expirationTime);
}
EDIT: I've reorganized this question to reflect the new information that since became available.
This question is based on the responses to a question by Viliam concerning Guava Maps' use of lazy eviction: Laziness of eviction in Guava's maps
Please read this question and its response first, but essentially the conclusion is that Guava maps do not asynchronously calculate and enforce eviction. Given the following map:
ConcurrentMap<String, MyObject> cache = new MapMaker()
.expireAfterAccess(10, TimeUnit.MINUTES)
.makeMap();
Once ten minutes has passed following access to an entry, it will still not be evicted until the map is "touched" again. Known ways to do this include the usual accessors - get() and put() and containsKey().
The first part of my question [solved]: what other calls cause the map to be "touched"? Specifically, does anyone know if size() falls into this category?
The reason for wondering this is that I've implemented a scheduled task to occasionally nudge the Guava map I'm using for caching, using this simple method:
public static void nudgeEviction() {
cache.containsKey("");
}
However I'm also using cache.size() to programmatically report the number of objects contained in the map, as a way to confirm this strategy is working. But I haven't been able to see a difference from these reports, and now I'm wondering if size() also causes eviction to take place.
Answer: So Mark has pointed out that in release 9, eviction is invoked only by the get(), put(), and replace() methods, which would explain why I wasn't seeing an effect for containsKey(). This will apparently change with the next version of guava which is set for release soon, but unfortunately my project's release is set sooner.
This puts me in an interesting predicament. Normally I could still touch the map by calling get(""), but I'm actually using a computing map:
ConcurrentMap<String, MyObject> cache = new MapMaker()
.expireAfterAccess(10, TimeUnit.MINUTES)
.makeComputingMap(loadFunction);
where loadFunction loads the MyObject corresponding to the key from a database. It's starting to look like I have no easy way of forcing eviction until r10. But even being able to reliably force eviction is put into doubt by the second part of my question:
The second part of my question [solved]: In reaction to one of the responses to the linked question, does touching the map reliably evict all expired entries? In the linked answer, Niraj Tolia indicates otherwise, saying eviction is potentially only processed in batches, which would mean multiple calls to touch the map might be needed to ensure all expired objects were evicted. He did not elaborate, however this seems related to the map being split into segments based on concurrency level. Assuming I used r10, in which a containsKey("") does invoke eviction, would this then be for the entire map, or only for one of the segments?
Answer: maaartinus has addressed this part of the question:
Beware that containsKey and other reading methods only run postReadCleanup, which does nothing but on each 64th invocation (see DRAIN_THRESHOLD). Moreover, it looks like all cleanup methods work with single Segment only.
So it looks like calling containsKey("") wouldn't be a viable fix, even in r10. This reduces my question to the title: How can I reliably force eviction to occur?
Note: Part of the reason my web app is noticeably affected by this issue is that when I implemented caching I decided to use multiple maps - one for each class of my data objects. So with this issue there is the possibility that one area of code is executed, causing a bunch of Foo objects to be cached, and then the Foo cache isn't touched again for a long time so it doesn't evict anything. Meanwhile Bar and Baz objects are being cached from other areas of code, and memory is being eaten. I'm setting a maximum size on these maps, but this is a flimsy safeguard at best (I'm assuming its effect is immediate - still need to confirm this).
UPDATE 1: Thanks to Darren for linking the relevant issues - they now have my votes. So it looks like a resolution is in the pipeline, but seems unlikely to be in r10. In the meantime, my question remains.
UPDATE 2: At this point I'm just waiting for a Guava team member to give feedback on the hack maaartinus and I put together (see answers below).
LAST UPDATE: feedback received!
I just added the method Cache.cleanUp() to Guava. Once you migrate from MapMaker to CacheBuilder you can use that to force eviction.
I was wondering the about the same issue you described in the first part of your question. From what I can tell from looking at the source code for Guava's CustomConcurrentHashMap (release 9), it appears that entries are evicted on the get(), put(), and replace() methods. The containsKey() method does not appear to invoke eviction. I'm not 100% sure because I took a quick pass at the code.
Update:
I also found a more recent version of the CustomConcurrentHashmap in Guava's git repository and it looks like containsKey() has been updated to invoke eviction.
Both release 9 and the latest version I just found do not invoke eviction when size() is called.
Update 2:
I recently noticed that Guava r10 (yet to be released) has a new class called CacheBuilder. Basically this class is a forked version of the MapMaker but with caching in mind. The documentation suggests that it will support some of the eviction requirements you are looking for.
I reviewed the updated code in r10's version of the CustomConcurrentHashMap and found what looks like a scheduled map cleaner. Unfortunately, that code appears unfinished at this point but r10 looks more and more promising each day.
Beware that containsKey and other reading methods only run postReadCleanup, which does nothing but on each 64th invocation (see DRAIN_THRESHOLD). Moreover, it looks like all cleanup methods work with single Segment only.
The easiest way to enforce eviction seems to be to put some dummy object into each segment. For this to work, you'd need to analyze CustomConcurrentHashMap.hash(Object), which is surely no good idea, as this method may change anytime. Moreover, depending on the key class it may be hard to find a key with a hashCode ensuring it lands in a given segment.
You could use reads instead, but would have to repeat them 64 times per segment. Here, it'd easy to find a key with an appropriate hashCode, since here any object is allowed as an argument.
Maybe you could hack into the CustomConcurrentHashMap source code instead, it could be as trivial as
public void runCleanup() {
final Segment<K, V>[] segments = this.segments;
for (int i = 0; i < segments.length; ++i) {
segments[i].runCleanup();
}
}
but I wouldn't do it without a lot of testing and/or an OK by a guava team member.
Yep, we've gone back and forth a few times on whether these cleanup tasks should be done on a background thread (or pool), or should be done on user threads. If they were done on a background thread, this would eventually happen automatically; as it is, it'll only happen as each segment gets used. We're still trying to come up with the right approach here - I wouldn't be surprised to see this change in some future release, but I also can't promise anything or even make a credible guess as to how it will change. Still, you've presented a reasonable use case for some kind of background or user-triggered cleanup.
Your hack is reasonable, as long as you keep in mind that it's a hack, and liable to break (possibly in subtle ways) in future releases. As you can see in the source, Segment.runCleanup() calls runLockedCleanup and runUnlockedCleanup: runLockedCleanup() will have no effect if it can't lock the segment, but if it can't lock the segment it's because some other thread has the segment locked, and that other thread can be expected to call runLockedCleanup as part of its operation.
Also, in r10, there's CacheBuilder/Cache, analogous to MapMaker/Map. Cache is the preferred approach for many current users of makeComputingMap. It uses a separate CustomConcurrentHashMap, in the common.cache package; depending on your needs, you may want your GuavaEvictionHacker to work with both. (The mechanism is the same, but they're different Classes and therefore different Methods.)
I'm not a big fan of hacking into or forking external code until absolutely necessary. This problem occurs in part due to an early decision for MapMaker to fork ConcurrentHashMap, thereby dragging in a lot of complexity that could have been deferred until after the algorithms were worked out. By patching above MapMaker, the code is robust to library changes so that you can remove your workaround on your own schedule.
An easy approach is to use a priority queue of weak reference tasks and a dedicated thread. This has the drawback of creating many stale no-op tasks, which can become excessive in due to the O(lg n) insertion penalty. It works reasonably well for small, less frequently used caches. It was the original approach taken by MapMaker and its simple to write your own decorator.
A more robust choice is to mirror the lock amortization model with a single expiration queue. The head of the queue can be volatile so that a read can always peek to determine if it has expired. This allows all reads to trigger an expiration and an optional clean-up thread to check regularly.
By far the simplest is to use #concurrencyLevel(1) to force MapMaker to use a single segment. This reduces the write concurrency, but most caches are read heavy so the loss is minimal. The original hack to nudge the map with a dummy key would then work fine. This would be my preferred approach, but the other two options are okay if you have high write loads.
I don't know if it is appropriate for your use case, but your main concern about the lack of background cache eviction seems to be memory consumption, so I would have thought that using softValues() on the MapMaker to allow the Garbage Collector to reclaim entries from the cache when a low memory situation occurs. Could easily be the solution for you. I have used this on a subscription-server (ATOM) where entries are served through a Guava cache using SoftReferences for values.
Based on maaartinus's answer, I came up with the following code which uses reflection rather than directly modifying the source (If you find this useful please upvote his answer!). While it will come at a performance penalty for using reflection, the difference should be negligible since I'll run it about once every 20 minutes for each caching Map (I'm also caching the dynamic lookups in the static block which will help). I have done some initial testing and it appears to work as intended:
public class GuavaEvictionHacker {
//Class objects necessary for reflection on Guava classes - see Guava docs for info
private static final Class<?> computingMapAdapterClass;
private static final Class<?> nullConcurrentMapClass;
private static final Class<?> nullComputingConcurrentMapClass;
private static final Class<?> customConcurrentHashMapClass;
private static final Class<?> computingConcurrentHashMapClass;
private static final Class<?> segmentClass;
//MapMaker$ComputingMapAdapter#cache points to the wrapped CustomConcurrentHashMap
private static final Field cacheField;
//CustomConcurrentHashMap#segments points to the array of Segments (map partitions)
private static final Field segmentsField;
//CustomConcurrentHashMap$Segment#runCleanup() enforces eviction on the calling Segment
private static final Method runCleanupMethod;
static {
try {
//look up Classes
computingMapAdapterClass = Class.forName("com.google.common.collect.MapMaker$ComputingMapAdapter");
nullConcurrentMapClass = Class.forName("com.google.common.collect.MapMaker$NullConcurrentMap");
nullComputingConcurrentMapClass = Class.forName("com.google.common.collect.MapMaker$NullComputingConcurrentMap");
customConcurrentHashMapClass = Class.forName("com.google.common.collect.CustomConcurrentHashMap");
computingConcurrentHashMapClass = Class.forName("com.google.common.collect.ComputingConcurrentHashMap");
segmentClass = Class.forName("com.google.common.collect.CustomConcurrentHashMap$Segment");
//look up Fields and set accessible
cacheField = computingMapAdapterClass.getDeclaredField("cache");
segmentsField = customConcurrentHashMapClass.getDeclaredField("segments");
cacheField.setAccessible(true);
segmentsField.setAccessible(true);
//look up the cleanup Method and set accessible
runCleanupMethod = segmentClass.getDeclaredMethod("runCleanup");
runCleanupMethod.setAccessible(true);
}
catch (ClassNotFoundException cnfe) {
throw new RuntimeException("ClassNotFoundException thrown in GuavaEvictionHacker static initialization block.", cnfe);
}
catch (NoSuchFieldException nsfe) {
throw new RuntimeException("NoSuchFieldException thrown in GuavaEvictionHacker static initialization block.", nsfe);
}
catch (NoSuchMethodException nsme) {
throw new RuntimeException("NoSuchMethodException thrown in GuavaEvictionHacker static initialization block.", nsme);
}
}
/**
* Forces eviction to take place on the provided Guava Map. The Map must be an instance
* of either {#code CustomConcurrentHashMap} or {#code MapMaker$ComputingMapAdapter}.
*
* #param guavaMap the Guava Map to force eviction on.
*/
public static void forceEvictionOnGuavaMap(ConcurrentMap<?, ?> guavaMap) {
try {
//we need to get the CustomConcurrentHashMap instance
Object customConcurrentHashMap;
//get the type of what was passed in
Class<?> guavaMapClass = guavaMap.getClass();
//if it's a CustomConcurrentHashMap we have what we need
if (guavaMapClass == customConcurrentHashMapClass) {
customConcurrentHashMap = guavaMap;
}
//if it's a NullConcurrentMap (auto-evictor), return early
else if (guavaMapClass == nullConcurrentMapClass) {
return;
}
//if it's a computing map we need to pull the instance from the adapter's "cache" field
else if (guavaMapClass == computingMapAdapterClass) {
customConcurrentHashMap = cacheField.get(guavaMap);
//get the type of what we pulled out
Class<?> innerCacheClass = customConcurrentHashMap.getClass();
//if it's a NullComputingConcurrentMap (auto-evictor), return early
if (innerCacheClass == nullComputingConcurrentMapClass) {
return;
}
//otherwise make sure it's a ComputingConcurrentHashMap - error if it isn't
else if (innerCacheClass != computingConcurrentHashMapClass) {
throw new IllegalArgumentException("Provided ComputingMapAdapter's inner cache was an unexpected type: " + innerCacheClass);
}
}
//error for anything else passed in
else {
throw new IllegalArgumentException("Provided ConcurrentMap was not an expected Guava Map: " + guavaMapClass);
}
//pull the array of Segments out of the CustomConcurrentHashMap instance
Object[] segments = (Object[])segmentsField.get(customConcurrentHashMap);
//loop over them and invoke the cleanup method on each one
for (Object segment : segments) {
runCleanupMethod.invoke(segment);
}
}
catch (IllegalAccessException iae) {
throw new RuntimeException(iae);
}
catch (InvocationTargetException ite) {
throw new RuntimeException(ite.getCause());
}
}
}
I'm looking for feedback on whether this approach is advisable as a stopgap until the issue is resolved in a Guava release, particularly from members of the Guava team when they get a minute.
EDIT: updated the solution to allow for auto-evicting maps (NullConcurrentMap or NullComputingConcurrentMap residing in a ComputingMapAdapter). This turned out to be necessary in my case, since I'm calling this method on all of my maps and a few of them are auto-evictors.
I'm writing some client-server-application where I have to deal with multiple threads. I've got some servers, that send alive-packets every few seconds. Those servers are maintained in a ConcurrentHashMap, that contains their EndPoints paired with the time the last alive-package arrived of the respective server.
Now I've got a thread, that has to "sort out" all the servers that haven't sent alive-packets for a specific amount of time.
I guess I can't just do it like that, can I?
for( IPEndPoint server : this.fileservers.keySet() )
{
Long time = this.fileservers.get( server );
//If server's time is updated here, I got a problem
if( time > fileserverTimeout )
this.fileservers.remove( server );
}
Is there a way I can get around that without aquiring a lock for the whole loop (that I then have to respect in the other threads as well)?
There is probably no problem here, depending on what exactly you store in the map. Your code looks a little weird to me, since you seem to save "the duration for which the server hasn't been active".
My first idea for recording that data was to store "the latest timestamp at which the server has been active". Then your code would look like this:
package so3950354;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
public class ServerManager {
private final ConcurrentMap<Server, Long> lastActive = new ConcurrentHashMap<Server, Long>();
/** May be overridden by a special method for testing. */
protected long now() {
return System.currentTimeMillis();
}
public void markActive(Server server) {
lastActive.put(server, Long.valueOf(now()));
}
public void removeInactive(long timeoutMillis) {
final long now = now();
Iterator<Map.Entry<Server, Long>> it = lastActive.entrySet().iterator();
while (it.hasNext()) {
final Map.Entry<Server, Long> entry = it.next();
final long backThen = entry.getValue().longValue();
/*
* Even if some other code updates the timestamp of this server now,
* the server had timed out at some point in time, so it may be
* removed. It's bad luck, but impossible to avoid.
*/
if (now - backThen >= timeoutMillis) {
it.remove();
}
}
}
static class Server {
}
}
If you really want to avoid that no code ever calls markActive during a call to removeInactive, there is no way around explicit locking. What you probably want is:
concurrent calls to markActive are allowed.
during markActive no calls to removeInactive are allowed.
during removeInactive no calls to markActive are allowed.
This looks like a typical scenario for a ReadWriteLock, where markActive is the "reading" operation and removeInactive is the "writing" Operation.
I don't see how another thread can update the server's time at that point in your code. Once you've retrieved the time of a server from the map using this.fileservers.get( server ), another thread cannot change its value as Long objects are immutable. Yes, another thread can put a new Long object for that server into the map, but that doesn't affect this thread, because it has already retrieved the time of the server.
So as it stands I can't see anything wrong with your code. The iterators in a ConcurrentHashMap are weakly consistent which means they can tolerate concurrent modification, so there is no risk of a ConcurrentModificationException being thrown either.
(See Roland's answer, which takes the ideas here and fleshes them out into a fuller example, with some great additional insights.)
Since it's a concurrent hash map, you can do the following. Note that CHM's iterators all implement the optional methods, including remove(), which you want. See CHM API docs, which states:
This class and its views and iterators
implement all of the optional methods
of the Map and Iterator interfaces.
This code should work (I don't know the type of the Key in your CHM):
ConcurrentHashMap<K,Long> fileservers = ...;
for(Iterator<Map.Entry<K,Long>> fsIter = fileservers.entrySet().iterator(); fileservers.hasNext(); )
{
Map.Entry<K,Long> thisEntry = fsIter.next();
Long time = thisEntry.getValue();
if( time > fileserverTimeout )
fsIter.remove( server );
}
But note that there may be race conditions elsewhere... You need to make sure that other bits of code accessing the map can cope with this kind of spontaneous removal -- i.e., probably whereever you touch fileservers.put() you'll need a bit of logic involving fileservers.putIfAbsent(). This solution is less likely to create bottlenecks than using synchronized, but it also requires a bit more thought.
Where you wrote "If server's time is updated here, I got a problem" is exactly where putIfAbsent() comes in. If the entry is absent, either you hadn't seen it before, or you just recently dropped it from the table. If the two sides of this need to be coordinated, then you may instead want to introduce a lockable record for the entry, and carry out the synchronization at that level (i.e., sync on the record while doing remove(), rather than on the whole table). Then the put() end of things can also sync on the same record, eliminating a potential race.
Firstly make map synchronized
this.fileservers = Collections.synchronizedMap(Map)
then use the strategy which is used in Singleton classes
if( time > fileserverTimeout )
{
synchronized(this.fileservers)
{
if( time > fileserverTimeout )
this.fileservers.remove( server );
}
}
Now this makes sure that once you inside the synchronized block, no updates can occur. This is so because once the lock on the map is taken, map(synchronized wrapper) will not have itself available to provide a thread lock on it for update, remove etc.
Checking for time twice makes sure that synchronization is used only when there is a genuine case of delete