Scheduled in-memory cache invalidation based on predicate - java

I have a List<CompletableFuture> stored in a Map:
private final Map<UUID, List<CompletableFuture>> hydrationProcesses = new ConcurrentHashMap<>();
Currently there is a daemon thread which runs every 30 seconds and removes all Futures that have already completed.
There are lots of TTL-based invalidation implementations, but I am looking for invalidation based on some predicate, and I want to get rid of this daemon thread.
Is there any out-of-the-box solution for scheduled cache invalidation based on some custom logic? Maybe something in Spring/Guava that I missed?
Maybe something similar to Guava's
CacheBuilder.newBuilder()
    .expireAfterAccess(2, TimeUnit.MILLISECONDS)
    .build(loader);
But rather than marking everything as expired after access, I need to check whether the Future has already completed and, if so, remove it from the cache.

I doubt there is a solution based on Guava. Since you don't specifically ask for a "Guava-only" solution, I'd like to give an idea of how to solve the problem with cache2k.
Here is how the setup could look like:
final long RECHECK_INTERVAL_MILLIS = 30000;

Cache<UUID, CompletableFuture<Void>> cache =
    new Cache2kBuilder<UUID, CompletableFuture<Void>>(){}
        .loader(new AdvancedCacheLoader<UUID, CompletableFuture<Void>>() {
            @Override
            public CompletableFuture<Void> load(UUID key,
                                                long currentTime,
                                                CacheEntry<UUID, CompletableFuture<Void>> currentEntry) {
                //preserve the current value instead of loading from an external source
                return currentEntry != null ? currentEntry.getValue() : null;
            }
        })
        .expiryPolicy(new ExpiryPolicy<UUID, CompletableFuture<Void>>() {
            @Override
            public long calculateExpiryTime(UUID key,
                                            CompletableFuture<Void> value,
                                            long loadTime,
                                            CacheEntry<UUID, CompletableFuture<Void>> oldEntry) {
                //completed futures expire (and are removed) immediately;
                //pending ones are re-checked after the interval
                return value.isDone() ? NOW : loadTime + RECHECK_INTERVAL_MILLIS;
            }
        })
        .refreshAhead(true)
        .build();
Actually, cache2k has capabilities similar to Guava and other caches. However, there are small extensions that allow more sophisticated setups.
The trick used here is to configure the cache for read-through operation, but make the loader return the current cache value. When an entry expires, the loader is called because of refreshAhead(true), but the current value is preserved and the expiry policy is evaluated again. The predicate you want checked goes into the expiry policy.
Other caches have read-through and custom expiry as well, but lack the concept of a "smarter loader" (AdvancedCacheLoader) that can act more efficiently based on the existing cache value.
We use setups similar to this in production.
There is a downside, too: cache2k uses one timing thread per cache if custom expiry is used. This means your extra thread will not go away. cache2k will be enhanced in the future to have a global set of timing threads that are shared among all caches. Update: New versions of cache2k use a common thread pool for timers.
Disclaimer: I am the author of cache2k, so I cannot speak with absolute certainty that there is no possible solution based on Guava.

Related

Is my class thread-safe or not?

The service must cache data in memory and save the data in the database. getAmount(id) retrieves the current balance, or zero if addAmount() has not been called before for the
specified id. addAmount(id, amount) increases the balance, or sets it if the method is called for the first time. The service must be thread-safe. Is my implementation thread-safe? What improvements can be made?
public class AccountServiceImpl implements AccountService {

    private static final Logger LOGGER = LoggerFactory.getLogger(AccountServiceImpl.class);

    private LoadingCache cache;
    private AccountDAO accountDAO = new AccountDAOImpl();

    public AccountServiceImpl() {
        cache = CacheBuilder.newBuilder()
                .expireAfterAccess(1, TimeUnit.HOURS)
                .concurrencyLevel(4)
                .maximumSize(10000)
                .recordStats()
                .build(new CacheLoader<Integer, Account>() {
                    @Override
                    public Account load(Integer id) throws Exception {
                        return new Account(id, accountDAO.getAmountById(id));
                    }
                });
    }

    public Long getAmount(Integer id) throws Exception {
        synchronized (cache.get(id)) {
            return cache.get(id).getAmount();
        }
    }

    public void addAmount(Integer id, Long value) throws Exception {
        Account account = cache.get(id);
        synchronized (account) {
            accountDAO.addAmount(id, value);
            account.setAmount(accountDAO.getAmountById(id));
            cache.put(id, account);
        }
    }
}
A race condition could occur if the Account is evicted from the cache and multiple updates to that account are taking place. The eviction results in multiple Account instances, so the synchronization doesn't provide atomicity and a stale value could be inserted into the cache.
The race is more obvious if you change the settings, e.g. maximumSize(0). At the current settings the likelihood of the race may be low, but eviction may still occur even after an access. This is because the entry might be chosen for eviction but not yet removed, so a subsequent read succeeds even though the access is ignored from the policy's perspective.
The proper way to do this in Guava is to Cache.invalidate() the entry. The DAO is transactionally updating the system of record, so it ensures atomicity of the operation. The LoadingCache ensures atomicity of an entry being computed, so reads will be blocked while a fresh value is loaded. This results in an extra database lookup which seems unnecessary, but is negligible in practice. Unfortunately there is still a tiny potential race, because Guava does not invalidate in-flight loading entries.
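As a concrete illustration, here is a minimal sketch of that invalidation-based update, assuming the accountDAO and cache fields from the question:
public void addAmount(Integer id, Long value) throws Exception {
    //the DAO updates the system of record transactionally
    accountDAO.addAmount(id, value);
    //drop the entry; the loading cache repopulates it on the next get()
    cache.invalidate(id);
}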
Guava doesn't support the write-through caching behavior you are trying to implement. Its successor, Caffeine, does by exposing Java 8's compute map methods and soon a CacheWriter abstraction. That said, the loading approach Guava expects is simple, elegant, and less error prone than manual updates.
There are two issues here to take care of:
The update of the amount value must be atomic.
If you have declared:
class Account { long amount; }
Changing a plain long field is not guaranteed to be atomic: on 32-bit JVMs it may be written as two 32-bit halves, while on 64-bit JVMs it is atomic in practice. See: Are 64 bit assignments in Java atomic on a 32 bit machine?
So the best way would be to change the declaration to "volatile long amount;". Then a write of the value is always atomic, plus the volatile ensures that other threads/CPUs see the changed value. (Note that this only makes single reads and writes atomic; a compound update like amount += x still needs synchronization or an AtomicLong.)
That means for updating the single value, you don't need the synchronized block.
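A minimal sketch of that declaration:
class Account {
    //volatile guarantees atomic reads/writes of the 64-bit field
    //and visibility of changes across threads
    private volatile long amount;

    long getAmount() {
        return amount;
    }

    void setAmount(long amount) {
        this.amount = amount;
    }
}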
Race between inserting and modifying
With your synchronized statements you just solve the first problem. But there are multiple races in your code.
See this code:
synchronized (cache.get(id)) {
    return cache.get(id).getAmount();
}
You obviously assume that cache.get(id) returns the same object instance when called with the same id. That is not the case; a cache simply does not guarantee this.
Guava Cache blocks until loading is complete. Other caches may or may not block; if requests come in in parallel, multiple loads may be called, resulting in multiple instances of the stored cache value.
Still, Guava Cache is a cache, so the item may be evicted at any time, and the next get may return another instance.
Same problem here:
public void addAmount(Integer id, Long value) throws Exception {
    Account account = cache.get(id);
    /* What happens if lots of requests come in and another
       thread evicts the account object from the cache? */
    synchronized (account) {
        . . .
In general: never synchronize on an object whose life cycle is not under your control. BTW: other cache implementations may store just the serialized object value and return another instance on each request.
Since you have a cache.put after the modification, your solution will probably work. However, the synchronized block then only serves the purpose of flushing memory; it may or may not actually provide mutual exclusion.
The update of the cache happens after the value has changed in the database. This means an application may read a stale value even though it has already changed in the database, which can lead to inconsistencies.
Solution 1
Have a static set of lock objects that you choose by the key value, e.g. locks[id % locks.length]. See my answer here: Guava Cache, how to block access while doing removal
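A sketch of that striped-lock idea, combined with the invalidation-based update from the previous answer; the array size of 64 is arbitrary, and Math.floorMod guards against negative keys:
private static final Object[] LOCKS = new Object[64];
static {
    for (int i = 0; i < LOCKS.length; i++) {
        LOCKS[i] = new Object();
    }
}

public void addAmount(Integer id, Long value) throws Exception {
    //the same id always maps to the same lock, so updates to one
    //account stay serialized even across cache evictions
    synchronized (LOCKS[Math.floorMod(id, LOCKS.length)]) {
        accountDAO.addAmount(id, value);
        cache.invalidate(id);
    }
}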
Solution 2
Use database transactions, and update with the pattern:
Transaction.begin();
cache.remove(id);
accountDAO.addAmount(id, value);
Transaction.commit();
Do not update the value directly inside the cache. This will lead to update races and needs locking again.
If the transactions are solely handled in the DAO, this means for your software architecture that the caching should be implemented in the DAO and not outside.
Solution 3
Why not just store the amount value in the cache? If the cache results are allowed to be inconsistent with the database content while an update is in flight, the simplest solution is:
public AccountServiceImpl() {
    //the field becomes: private final LoadingCache<Integer, Long> cache;
    cache = CacheBuilder.newBuilder()
            .expireAfterAccess(1, TimeUnit.HOURS)
            .concurrencyLevel(4)
            .maximumSize(10000)
            .recordStats()
            .build(new CacheLoader<Integer, Long>() {
                @Override
                public Long load(Integer id) throws Exception {
                    //cache the bare amount instead of an Account wrapper
                    return accountDAO.getAmountById(id);
                }
            });
}

public Long getAmount(Integer id) throws Exception {
    return cache.get(id);
}

public void addAmount(Integer id, Long value) throws Exception {
    accountDAO.addAmount(id, value);
    //Guava's equivalent of remove: the next get() reloads the fresh value
    cache.invalidate(id);
}
No.
private LoadingCache cache;
must be final.
cache.get(id)
must be synchronized. Are you using a library for that?
The cache must be synchronized; otherwise, with two threads updating the amount at the same time, you can never be sure of the final result. Check the implementation of the `put' method of the library you use.

Caching with eviction based on timestamp

My service endpoint receives a list of metrics every minute, along with their timestamps. If a metric passes certain conditions, we need to store it in a cache so that it can be accessed later. The access functions for this service are -
List<Metrics> GetAllInterestingMetrics5Mins();
List<Metrics> GetAllInterestingMetrics10Mins();
List<Metrics> GetAllInterestingMetrics30Mins();
My current solution is to use 3 Guava caches with time-based eviction set to 5, 10 & 30 minutes. When somebody calls one of the above functions, I return all the metrics from the relevant cache.
There are 2 problems with this -
Guava caches start timing for eviction based on when the value is put in the cache (or accessed, depending on the setting). Now it's possible for a metric to be delayed, so its timestamp would be earlier than the time the metric is put in the cache.
I don't like that I have to create 3 caches when one cache with 30 minutes should suffice; it increases the memory footprint and the complexity of cache handling.
Is there a way to solve these 2 problems in Guava or any other out-of-the-box caching solution?
There is a particular difference between caching solutions like Guava and EHCache and what you are trying to implement. The sole purpose of these caches is to act in the same way as getter functions: caches are intended to retrieve a single element by its key and store it for further use, evicting it after it stops being used.
E.g.
@Cacheable
public Object getter(String key) {
    ...
}
That's why getting a whole set of objects from the cache feels a little like forcing the cache and the eviction policy to work differently from their original purpose.
What you need, instead of a Guava cache (or other caching solution), is a collection that can be evicted all at once by a timer function. Sadly, Guava doesn't provide that right now; you would still need a timer function, provided by the application, that removes all existing elements from the cache.
So, my suggestion would be the following:
Even if it is possible to make Guava behave the way you want it to, you will find that you are not using the features that make Guava really valuable, and you are "forcing" it to behave differently. So I suggest you forget about the Guava implementation and consider using, for example, a specialization of the AbstractMap class, along with a timer function that evicts its contents every N seconds.
This way you will be able to have all your entries in a single cache and stop worrying about the discrepancies between the timestamp and the time the entry was added to the cache.
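A minimal sketch of that suggestion, assuming a metric type that carries its own timestamp (all names here are illustrative): a plain ConcurrentHashMap plus a scheduled pruning task that judges age by the metric's timestamp, which also sidesteps the delayed-arrival problem from the question.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MetricsStore {

    //hypothetical value type: a metric carrying its own timestamp
    static class Metric {
        final long timestampMillis;
        Metric(long timestampMillis) {
            this.timestampMillis = timestampMillis;
        }
    }

    private final ConcurrentHashMap<String, Metric> store = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public MetricsStore() {
        //prune by the metric's own timestamp, not insertion time,
        //so late-arriving metrics are aged correctly
        scheduler.scheduleAtFixedRate(() -> {
            long cutoff = System.currentTimeMillis() - TimeUnit.MINUTES.toMillis(30);
            store.values().removeIf(m -> m.timestampMillis < cutoff);
        }, 1, 1, TimeUnit.MINUTES);
    }
}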
Regarding Topic 1:
Just a side note: please do not confuse expiry and eviction. Expiry means the entry may no longer be returned by the cache, and may happen at a specified point in time or after a duration. Eviction is the action of freeing resources: the entry is removed from the cache. After expiry, eviction may happen at the same time or later.
Common cache products have no support for exact, aka "point in time", expiry. We need that use case very often in our applications, so I spent some effort in cache2k to support it.
Here is a blueprint for cache2k:
static class MetricsEntry {
    long nextUpdate;
    List<Metrics> metrics;
}

static class MyEntryExpiryCalculator implements EntryExpiryCalculator<Integer, MetricsEntry> {
    @Override
    public long calculateExpiryTime(Integer _key, MetricsEntry _value, long _fetchTime, CacheEntry _oldEntry) {
        //expire exactly when the next interval begins
        return _value.nextUpdate;
    }
}

Cache<Integer, MetricsEntry> createTheCache() {
    Cache<Integer, MetricsEntry> cache =
        CacheBuilder.newCache(Integer.class, MetricsEntry.class)
            .sharpExpiry(true)
            .entryExpiryCalculator(new MyEntryExpiryCalculator())
            .source(new MySource())
            .build();
    return cache;
}
If you have a time reference in the metrics objects, you can use that and omit the additional entry class. sharpExpiry(true) instructs cache2k to expire entries at exactly the calculated time. If you leave this out, the expiry may be a few milliseconds off, but access times are slightly faster.
Regarding Topic 2:
The straightforward approach would be to use the interval minutes as the cache key.
Here is a cache source (aka cache loader) that strictly returns the metrics of the previous interval:
static class MySource implements CacheSource<Integer, MetricsEntry> {
    @Override
    public MetricsEntry get(Integer interval) {
        MetricsEntry e = new MetricsEntry();
        boolean crossedIntervalEnd;
        do {
            long now = System.currentTimeMillis();
            long intervalMillis = interval * 1000L * 60;
            //align to the start of the current interval
            long startOfInterval = now - now % intervalMillis;
            e.metrics = calculateMetrics(startOfInterval, interval);
            e.nextUpdate = startOfInterval + intervalMillis;
            //if the calculation ran past the interval boundary, redo it
            now = System.currentTimeMillis();
            crossedIntervalEnd = now >= e.nextUpdate;
        } while (crossedIntervalEnd);
        return e;
    }
}
That would return the metrics for 10:00-10:05 if you make the request at, let's say, 10:07.
If you just want to calculate the metrics of the past interval instantly, it is simpler:
static class MySource implements CacheSource<Integer, MetricsEntry> {
    @Override
    public MetricsEntry get(Integer interval) {
        MetricsEntry e = new MetricsEntry();
        long intervalMillis = interval * 1000L * 60;
        long startOfInterval = System.currentTimeMillis();
        e.metrics = calculateMetrics(startOfInterval, interval);
        e.nextUpdate = startOfInterval + intervalMillis;
        return e;
    }
}
The use of the cache source has an advantage over put(). cache2k is blocking, so if multiple requests come in for one metric, only one metric calculation is started.
If you don't need exact expiry to the millisecond, you can use other caches, too. The thing you need to do is to store the time it takes to calculate the metrics within your cache value and then correct the expiry duration accordingly.
Have a good one!
Have you considered using something like a Deque instead? Just put the metrics in the queue, and when you want to retrieve the metrics for the last N minutes, start at the end with the most recent additions and take everything until you find one that is more than N minutes old. You can evict entries that are too old from the other end in a similar way. (It's not clear to me from your question how the key/value aspect of Cache relates to your problem.)
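A rough sketch of that Deque idea (the types and names are illustrative, and it assumes metrics are appended roughly in timestamp order):
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

class MetricsWindow {

    //hypothetical value type: a metric plus the timestamp it was reported for
    static class TimedMetric {
        final long timestampMillis;
        TimedMetric(long timestampMillis) {
            this.timestampMillis = timestampMillis;
        }
    }

    private final ConcurrentLinkedDeque<TimedMetric> deque = new ConcurrentLinkedDeque<>();

    void add(TimedMetric m) {
        deque.addLast(m);
    }

    //walk from the newest end; stop at the first entry older than the window
    List<TimedMetric> lastMinutes(int minutes) {
        long cutoff = System.currentTimeMillis() - minutes * 60_000L;
        List<TimedMetric> result = new ArrayList<>();
        Iterator<TimedMetric> it = deque.descendingIterator();
        while (it.hasNext()) {
            TimedMetric m = it.next();
            if (m.timestampMillis < cutoff) {
                break;
            }
            result.add(m);
        }
        return result;
    }

    //evict stale entries from the old end
    void evictOlderThan(int minutes) {
        long cutoff = System.currentTimeMillis() - minutes * 60_000L;
        TimedMetric head;
        while ((head = deque.peekFirst()) != null && head.timestampMillis < cutoff) {
            deque.pollFirst();
        }
    }
}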

Persist guava cache on shutdown

I use the following guava cache to store messages for a specific time waiting for a possible response. So I use the cache more like a timeout for messages:
Cache cache = CacheBuilder.newBuilder().expireAfterWrite(7, TimeUnit.DAYS).build();
cache.put(id,message);
...
cache.getIfPresent(id);
In the end I need to persist the messages with their current 'timeout' information on shutdown,
and restore them on startup with the already-elapsed expiry time per entry. I couldn't find any methods that give me access to the timing information, so that I can handle it myself.
The Guava wiki says:
Your application will not need to store more data than what would fit in RAM. (Guava caches are local to a single run of your application. They do not store data in files, or on outside servers. If this does not fit your needs, consider a tool like Memcached.)
Do you think this restriction also applies to a 'timeout' map that is persisted on shutdown?
I don't believe there's any way to recreate the cache with per-entry expiration values -- even if you do use reflection. You might be able to simulate it by using a DelayQueue in a separate thread that explicitly invalidates entries that should have expired, but that's the best I think you can do.
That said, if you're just interested in peeking at the expiration information, I would recommend wrapping your cache values in a class that remembers the expiration time, so you can look up the expiration time for an entry just by looking up its value and calling a getExpirationTime() method or what have you.
That approach, at least, should not break with new Guava releases.
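For example, a minimal sketch of such a wrapper (generic here, since the message type isn't shown in the question):
class Expiring<V> {
    final V value;
    //absolute time at which the cache-wide expireAfterWrite will drop this entry
    final long expirationTimeMillis;

    Expiring(V value, long ttlMillis) {
        this.value = value;
        this.expirationTimeMillis = System.currentTimeMillis() + ttlMillis;
    }

    long getExpirationTime() {
        return expirationTimeMillis;
    }
}
On shutdown you can iterate cache.asMap(), persisting each value together with its getExpirationTime(); on startup, skip entries whose expiration time has already passed and re-insert the rest.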
Well, unfortunately Guava doesn't seem to expose this functionality, but if you feel adventurous and absolutely must have it, you could always use reflection. Just look at the sources and see which methods you need. As always, care should be taken, as your code might break when Guava's internal implementation changes. The code below seems to work with Guava 10.0.1:
Cache<Integer, String> cache = CacheBuilder.newBuilder()
        .expireAfterWrite(7, TimeUnit.DAYS)
        .build(new CacheLoader<Integer, String>() {
            @Override
            public String load(Integer key) throws Exception {
                return "The value is " + key.toString();
            }
        });

Integer key_1 = Integer.valueOf(1);
Integer key_2 = Integer.valueOf(2);
System.out.println(cache.get(key_1));
System.out.println(cache.get(key_2));

ConcurrentMap<Integer, String> map = cache.asMap();
//reflectively reach the map view's internal getEntry method
Method m = map.getClass().getDeclaredMethod("getEntry", Object.class);
m.setAccessible(true);
for (Integer key : map.keySet()) {
    Object entry = m.invoke(map, key);
    //each internal entry exposes its absolute expiration time
    Method m2 = entry.getClass().getDeclaredMethod("getExpirationTime");
    m2.setAccessible(true);
    Long expirationTime = (Long) m2.invoke(entry);
    System.out.println(key + " expiration time is " + expirationTime);
}

reliably forcing Guava map eviction to take place

EDIT: I've reorganized this question to reflect the new information that has since become available.
This question is based on the responses to a question by Viliam concerning Guava Maps' use of lazy eviction: Laziness of eviction in Guava's maps
Please read this question and its response first, but essentially the conclusion is that Guava maps do not asynchronously calculate and enforce eviction. Given the following map:
ConcurrentMap<String, MyObject> cache = new MapMaker()
.expireAfterAccess(10, TimeUnit.MINUTES)
.makeMap();
Once ten minutes have passed following access to an entry, it will still not be evicted until the map is "touched" again. Known ways to do this include the usual accessors - get() and put() and containsKey().
The first part of my question [solved]: what other calls cause the map to be "touched"? Specifically, does anyone know if size() falls into this category?
The reason for wondering this is that I've implemented a scheduled task to occasionally nudge the Guava map I'm using for caching, using this simple method:
public static void nudgeEviction() {
cache.containsKey("");
}
However I'm also using cache.size() to programmatically report the number of objects contained in the map, as a way to confirm this strategy is working. But I haven't been able to see a difference from these reports, and now I'm wondering if size() also causes eviction to take place.
Answer: So Mark has pointed out that in release 9, eviction is invoked only by the get(), put(), and replace() methods, which would explain why I wasn't seeing an effect for containsKey(). This will apparently change with the next version of guava which is set for release soon, but unfortunately my project's release is set sooner.
This puts me in an interesting predicament. Normally I could still touch the map by calling get(""), but I'm actually using a computing map:
ConcurrentMap<String, MyObject> cache = new MapMaker()
.expireAfterAccess(10, TimeUnit.MINUTES)
.makeComputingMap(loadFunction);
where loadFunction loads the MyObject corresponding to the key from a database. It's starting to look like I have no easy way of forcing eviction until r10. But even being able to reliably force eviction is put into doubt by the second part of my question:
The second part of my question [solved]: In reaction to one of the responses to the linked question, does touching the map reliably evict all expired entries? In the linked answer, Niraj Tolia indicates otherwise, saying eviction is potentially only processed in batches, which would mean multiple calls to touch the map might be needed to ensure all expired objects were evicted. He did not elaborate, however this seems related to the map being split into segments based on concurrency level. Assuming I used r10, in which a containsKey("") does invoke eviction, would this then be for the entire map, or only for one of the segments?
Answer: maaartinus has addressed this part of the question:
Beware that containsKey and other reading methods only run postReadCleanup, which does nothing except on every 64th invocation (see DRAIN_THRESHOLD). Moreover, it looks like all cleanup methods work with a single Segment only.
So it looks like calling containsKey("") wouldn't be a viable fix, even in r10. This reduces my question to the title: How can I reliably force eviction to occur?
Note: Part of the reason my web app is noticeably affected by this issue is that when I implemented caching I decided to use multiple maps - one for each class of my data objects. So with this issue there is the possibility that one area of code is executed, causing a bunch of Foo objects to be cached, and then the Foo cache isn't touched again for a long time so it doesn't evict anything. Meanwhile Bar and Baz objects are being cached from other areas of code, and memory is being eaten. I'm setting a maximum size on these maps, but this is a flimsy safeguard at best (I'm assuming its effect is immediate - still need to confirm this).
UPDATE 1: Thanks to Darren for linking the relevant issues - they now have my votes. So it looks like a resolution is in the pipeline, but seems unlikely to be in r10. In the meantime, my question remains.
UPDATE 2: At this point I'm just waiting for a Guava team member to give feedback on the hack maaartinus and I put together (see answers below).
LAST UPDATE: feedback received!
I just added the method Cache.cleanUp() to Guava. Once you migrate from MapMaker to CacheBuilder you can use that to force eviction.
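For later readers, the migrated setup looks roughly like this (MyObject stands in for the question's value type):
Cache<String, MyObject> cache = CacheBuilder.newBuilder()
        .expireAfterAccess(10, TimeUnit.MINUTES)
        .build();

//performs any pending maintenance, including removing expired entries,
//on the calling thread
cache.cleanUp();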
I was wondering about the same issue you described in the first part of your question. From what I can tell from looking at the source code for Guava's CustomConcurrentHashMap (release 9), it appears that entries are evicted on the get(), put(), and replace() methods. The containsKey() method does not appear to invoke eviction. I'm not 100% sure because I took a quick pass at the code.
Update:
I also found a more recent version of the CustomConcurrentHashMap in Guava's git repository, and it looks like containsKey() has been updated to invoke eviction.
Both release 9 and the latest version I just found do not invoke eviction when size() is called.
Update 2:
I recently noticed that Guava r10 (yet to be released) has a new class called CacheBuilder. Basically this class is a forked version of the MapMaker but with caching in mind. The documentation suggests that it will support some of the eviction requirements you are looking for.
I reviewed the updated code in r10's version of the CustomConcurrentHashMap and found what looks like a scheduled map cleaner. Unfortunately, that code appears unfinished at this point but r10 looks more and more promising each day.
Beware that containsKey and other reading methods only run postReadCleanup, which does nothing except on every 64th invocation (see DRAIN_THRESHOLD). Moreover, it looks like all cleanup methods work with a single Segment only.
The easiest way to enforce eviction seems to be to put some dummy object into each segment. For this to work, you'd need to analyze CustomConcurrentHashMap.hash(Object), which is surely not a good idea, as this method may change at any time. Moreover, depending on the key class, it may be hard to find a key with a hashCode ensuring it lands in a given segment.
You could use reads instead, but you would have to repeat them 64 times per segment. Here it would be easy to find a key with an appropriate hashCode, since any object is allowed as an argument.
Maybe you could hack into the CustomConcurrentHashMap source code instead; it could be as trivial as:
public void runCleanup() {
    final Segment<K, V>[] segments = this.segments;
    for (int i = 0; i < segments.length; ++i) {
        segments[i].runCleanup();
    }
}
but I wouldn't do it without a lot of testing and/or an OK by a guava team member.
Yep, we've gone back and forth a few times on whether these cleanup tasks should be done on a background thread (or pool), or should be done on user threads. If they were done on a background thread, this would eventually happen automatically; as it is, it'll only happen as each segment gets used. We're still trying to come up with the right approach here - I wouldn't be surprised to see this change in some future release, but I also can't promise anything or even make a credible guess as to how it will change. Still, you've presented a reasonable use case for some kind of background or user-triggered cleanup.
Your hack is reasonable, as long as you keep in mind that it's a hack, and liable to break (possibly in subtle ways) in future releases. As you can see in the source, Segment.runCleanup() calls runLockedCleanup and runUnlockedCleanup: runLockedCleanup() will have no effect if it can't lock the segment, but if it can't lock the segment it's because some other thread has the segment locked, and that other thread can be expected to call runLockedCleanup as part of its operation.
Also, in r10, there's CacheBuilder/Cache, analogous to MapMaker/Map. Cache is the preferred approach for many current users of makeComputingMap. It uses a separate CustomConcurrentHashMap, in the common.cache package; depending on your needs, you may want your GuavaEvictionHacker to work with both. (The mechanism is the same, but they're different Classes and therefore different Methods.)
I'm not a big fan of hacking into or forking external code until absolutely necessary. This problem occurs in part due to an early decision for MapMaker to fork ConcurrentHashMap, thereby dragging in a lot of complexity that could have been deferred until after the algorithms were worked out. By patching above MapMaker, the code is robust to library changes so that you can remove your workaround on your own schedule.
An easy approach is to use a priority queue of weak-reference tasks and a dedicated thread. This has the drawback of creating many stale no-op tasks, which can become excessive due to the O(lg n) insertion penalty. It works reasonably well for small, less frequently used caches. It was the original approach taken by MapMaker, and it's simple to write your own decorator.
A more robust choice is to mirror the lock amortization model with a single expiration queue. The head of the queue can be volatile so that a read can always peek to determine if it has expired. This allows all reads to trigger an expiration and an optional clean-up thread to check regularly.
By far the simplest is to use concurrencyLevel(1) to force MapMaker to use a single segment. This reduces the write concurrency, but most caches are read-heavy, so the loss is minimal. The original hack to nudge the map with a dummy key would then work fine. This would be my preferred approach, but the other two options are okay if you have high write loads.
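A sketch of that single-segment setup, combined with the dummy-key nudge from the question; the 64-iteration loop is a guard against the DRAIN_THRESHOLD read batching described above:
ConcurrentMap<String, MyObject> cache = new MapMaker()
        .concurrencyLevel(1)   //one segment, so one nudge reaches every entry
        .expireAfterAccess(10, TimeUnit.MINUTES)
        .makeMap();

void nudgeEviction() {
    //reads may only drain cleanup work every 64th invocation,
    //so loop past the threshold to be safe
    for (int i = 0; i < 64; i++) {
        cache.containsKey("");
    }
}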
I don't know if it is appropriate for your use case, but your main concern about the lack of background cache eviction seems to be memory consumption, so I would have thought that using softValues() on the MapMaker, to allow the garbage collector to reclaim entries from the cache when a low-memory situation occurs, could easily be the solution for you. I have used this on a subscription server (ATOM) where entries are served through a Guava cache using SoftReferences for values.
Based on maaartinus's answer, I came up with the following code, which uses reflection rather than directly modifying the source (if you find this useful, please upvote his answer!). While it comes with a performance penalty for using reflection, the difference should be negligible since I'll run it about once every 20 minutes for each caching Map (I'm also caching the dynamic lookups in the static block, which will help). I have done some initial testing and it appears to work as intended:
public class GuavaEvictionHacker {

    //Class objects necessary for reflection on Guava classes - see Guava docs for info
    private static final Class<?> computingMapAdapterClass;
    private static final Class<?> nullConcurrentMapClass;
    private static final Class<?> nullComputingConcurrentMapClass;
    private static final Class<?> customConcurrentHashMapClass;
    private static final Class<?> computingConcurrentHashMapClass;
    private static final Class<?> segmentClass;

    //MapMaker$ComputingMapAdapter#cache points to the wrapped CustomConcurrentHashMap
    private static final Field cacheField;

    //CustomConcurrentHashMap#segments points to the array of Segments (map partitions)
    private static final Field segmentsField;

    //CustomConcurrentHashMap$Segment#runCleanup() enforces eviction on the calling Segment
    private static final Method runCleanupMethod;

    static {
        try {
            //look up Classes
            computingMapAdapterClass = Class.forName("com.google.common.collect.MapMaker$ComputingMapAdapter");
            nullConcurrentMapClass = Class.forName("com.google.common.collect.MapMaker$NullConcurrentMap");
            nullComputingConcurrentMapClass = Class.forName("com.google.common.collect.MapMaker$NullComputingConcurrentMap");
            customConcurrentHashMapClass = Class.forName("com.google.common.collect.CustomConcurrentHashMap");
            computingConcurrentHashMapClass = Class.forName("com.google.common.collect.ComputingConcurrentHashMap");
            segmentClass = Class.forName("com.google.common.collect.CustomConcurrentHashMap$Segment");

            //look up Fields and set accessible
            cacheField = computingMapAdapterClass.getDeclaredField("cache");
            segmentsField = customConcurrentHashMapClass.getDeclaredField("segments");
            cacheField.setAccessible(true);
            segmentsField.setAccessible(true);

            //look up the cleanup Method and set accessible
            runCleanupMethod = segmentClass.getDeclaredMethod("runCleanup");
            runCleanupMethod.setAccessible(true);
        }
        catch (ClassNotFoundException cnfe) {
            throw new RuntimeException("ClassNotFoundException thrown in GuavaEvictionHacker static initialization block.", cnfe);
        }
        catch (NoSuchFieldException nsfe) {
            throw new RuntimeException("NoSuchFieldException thrown in GuavaEvictionHacker static initialization block.", nsfe);
        }
        catch (NoSuchMethodException nsme) {
            throw new RuntimeException("NoSuchMethodException thrown in GuavaEvictionHacker static initialization block.", nsme);
        }
    }

    /**
     * Forces eviction to take place on the provided Guava Map. The Map must be an instance
     * of either {@code CustomConcurrentHashMap} or {@code MapMaker$ComputingMapAdapter}.
     *
     * @param guavaMap the Guava Map to force eviction on.
     */
    public static void forceEvictionOnGuavaMap(ConcurrentMap<?, ?> guavaMap) {
        try {
            //we need to get the CustomConcurrentHashMap instance
            Object customConcurrentHashMap;

            //get the type of what was passed in
            Class<?> guavaMapClass = guavaMap.getClass();

            //if it's a CustomConcurrentHashMap we have what we need
            if (guavaMapClass == customConcurrentHashMapClass) {
                customConcurrentHashMap = guavaMap;
            }
            //if it's a NullConcurrentMap (auto-evictor), return early
            else if (guavaMapClass == nullConcurrentMapClass) {
                return;
            }
            //if it's a computing map we need to pull the instance from the adapter's "cache" field
            else if (guavaMapClass == computingMapAdapterClass) {
                customConcurrentHashMap = cacheField.get(guavaMap);

                //get the type of what we pulled out
                Class<?> innerCacheClass = customConcurrentHashMap.getClass();

                //if it's a NullComputingConcurrentMap (auto-evictor), return early
                if (innerCacheClass == nullComputingConcurrentMapClass) {
                    return;
                }
                //otherwise make sure it's a ComputingConcurrentHashMap - error if it isn't
                else if (innerCacheClass != computingConcurrentHashMapClass) {
                    throw new IllegalArgumentException("Provided ComputingMapAdapter's inner cache was an unexpected type: " + innerCacheClass);
                }
            }
            //error for anything else passed in
            else {
                throw new IllegalArgumentException("Provided ConcurrentMap was not an expected Guava Map: " + guavaMapClass);
            }

            //pull the array of Segments out of the CustomConcurrentHashMap instance
            Object[] segments = (Object[]) segmentsField.get(customConcurrentHashMap);

            //loop over them and invoke the cleanup method on each one
            for (Object segment : segments) {
                runCleanupMethod.invoke(segment);
            }
        }
        catch (IllegalAccessException iae) {
            throw new RuntimeException(iae);
        }
        catch (InvocationTargetException ite) {
            throw new RuntimeException(ite.getCause());
        }
    }
}
I'm looking for feedback on whether this approach is advisable as a stopgap until the issue is resolved in a Guava release, particularly from members of the Guava team when they get a minute.
EDIT: updated the solution to allow for auto-evicting maps (NullConcurrentMap or NullComputingConcurrentMap residing in a ComputingMapAdapter). This turned out to be necessary in my case, since I'm calling this method on all of my maps and a few of them are auto-evictors.

Thread safety when iterating through concurrent collections

I'm writing a client-server application where I have to deal with multiple threads. I've got some servers that send alive-packets every few seconds. Those servers are maintained in a ConcurrentHashMap that maps their EndPoints to the time the last alive-packet arrived from the respective server.
Now I've got a thread, that has to "sort out" all the servers that haven't sent alive-packets for a specific amount of time.
I guess I can't just do it like that, can I?
for (IPEndPoint server : this.fileservers.keySet()) {
    Long time = this.fileservers.get(server);
    //If server's time is updated here, I got a problem
    if (time > fileserverTimeout)
        this.fileservers.remove(server);
}
Is there a way I can get around that without acquiring a lock for the whole loop (which I would then have to respect in the other threads as well)?
There is probably no problem here, depending on what exactly you store in the map. Your code looks a little weird to me, since you seem to save "the duration for which the server hasn't been active".
My first idea for recording that data was to store "the latest timestamp at which the server has been active". Then your code would look like this:
package so3950354;

import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ServerManager {

    private final ConcurrentMap<Server, Long> lastActive = new ConcurrentHashMap<Server, Long>();

    /** May be overridden by a special method for testing. */
    protected long now() {
        return System.currentTimeMillis();
    }

    public void markActive(Server server) {
        lastActive.put(server, Long.valueOf(now()));
    }

    public void removeInactive(long timeoutMillis) {
        final long now = now();
        Iterator<Map.Entry<Server, Long>> it = lastActive.entrySet().iterator();
        while (it.hasNext()) {
            final Map.Entry<Server, Long> entry = it.next();
            final long backThen = entry.getValue().longValue();
            /*
             * Even if some other code updates the timestamp of this server now,
             * the server had timed out at some point in time, so it may be
             * removed. It's bad luck, but impossible to avoid.
             */
            if (now - backThen >= timeoutMillis) {
                it.remove();
            }
        }
    }

    static class Server {
    }
}
If you really want to make sure that no code ever calls markActive during a call to removeInactive, there is no way around explicit locking. What you probably want is:
concurrent calls to markActive are allowed.
during markActive no calls to removeInactive are allowed.
during removeInactive no calls to markActive are allowed.
This looks like a typical scenario for a ReadWriteLock, where markActive is the "reading" operation and removeInactive is the "writing" operation.
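A sketch of that layering over the ServerManager above; the "read" lock is shared by concurrent markActive callers, while the "write" lock is exclusive:
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

private final ReadWriteLock lock = new ReentrantReadWriteLock();

public void markActive(Server server) {
    lock.readLock().lock();   //shared: many markActive calls may run in parallel
    try {
        lastActive.put(server, Long.valueOf(now()));
    } finally {
        lock.readLock().unlock();
    }
}

public void removeInactive(long timeoutMillis) {
    lock.writeLock().lock();  //exclusive: blocks all markActive calls
    try {
        //... same iteration and removal as in removeInactive above ...
    } finally {
        lock.writeLock().unlock();
    }
}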
I don't see how another thread can update the server's time at that point in your code. Once you've retrieved the time of a server from the map using this.fileservers.get( server ), another thread cannot change its value as Long objects are immutable. Yes, another thread can put a new Long object for that server into the map, but that doesn't affect this thread, because it has already retrieved the time of the server.
So as it stands I can't see anything wrong with your code. The iterators in a ConcurrentHashMap are weakly consistent which means they can tolerate concurrent modification, so there is no risk of a ConcurrentModificationException being thrown either.
(See Roland's answer, which takes the ideas here and fleshes them out into a fuller example, with some great additional insights.)
Since it's a concurrent hash map, you can do the following. Note that CHM's iterators all implement the optional methods, including remove(), which you want. See the CHM API docs, which state:
This class and its views and iterators implement all of the optional methods of the Map and Iterator interfaces.
This code should work (I don't know the type of the Key in your CHM):
ConcurrentHashMap<K, Long> fileservers = ...;

for (Iterator<Map.Entry<K, Long>> fsIter = fileservers.entrySet().iterator(); fsIter.hasNext(); ) {
    Map.Entry<K, Long> thisEntry = fsIter.next();
    Long time = thisEntry.getValue();
    if (time > fileserverTimeout)
        fsIter.remove();
}
But note that there may be race conditions elsewhere... You need to make sure that other bits of code accessing the map can cope with this kind of spontaneous removal -- i.e., probably wherever you touch fileservers.put() you'll need a bit of logic involving fileservers.putIfAbsent(). This solution is less likely to create bottlenecks than using synchronized, but it also requires a bit more thought.
Where you wrote "If server's time is updated here, I got a problem" is exactly where putIfAbsent() comes in. If the entry is absent, either you hadn't seen it before, or you just recently dropped it from the table. If the two sides of this need to be coordinated, then you may instead want to introduce a lockable record for the entry, and carry out the synchronization at that level (i.e., sync on the record while doing remove(), rather than on the whole table). Then the put() end of things can also sync on the same record, eliminating a potential race.
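A sketch of that putIfAbsent pattern on the alive-packet path; registerServer is a hypothetical hook, not something from the question:
long now = System.currentTimeMillis();
Long previous = fileservers.putIfAbsent(endPoint, now);
if (previous != null) {
    //known server: just refresh its timestamp
    fileservers.put(endPoint, now);
} else {
    //absent entry: either a brand-new server, or one the cleanup loop
    //just removed; do any (re)registration work here
    registerServer(endPoint);
}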
First, make the map synchronized:
this.fileservers = Collections.synchronizedMap(map)
then use the double-checked strategy familiar from singleton classes:
if (time > fileserverTimeout) {
    synchronized (this.fileservers) {
        if (time > fileserverTimeout)
            this.fileservers.remove(server);
    }
}
This makes sure that once you are inside the synchronized block, no updates can occur: once the lock on the map is taken, the synchronized wrapper cannot hand out the lock to any other thread for update, remove, etc.
Checking the time twice ensures that synchronization is only used when there is a genuine case for deletion.
