I've been trying to use the DBSCAN clusterer from Weka to cluster instances. From what I understand I should be using the clusterInstance() method for this, but to my surprise, when taking a look at the code of that method, it looks like the implementation ignores the parameter:
/**
 * Classifies a given instance.
 *
 * @param instance The instance to be assigned to a cluster
 * @return int The number of the assigned cluster as an integer
 * @throws java.lang.Exception If instance could not be clustered successfully
 */
public int clusterInstance(Instance instance) throws Exception {
    if (processed_InstanceID >= database.size()) processed_InstanceID = 0;
    int cnum = (database.getDataObject(Integer.toString(processed_InstanceID++))).getClusterLabel();
    if (cnum == DataObject.NOISE)
        throw new Exception();
    else
        return cnum;
}
This doesn't seem right. How is that supposed to work? Is there a different method I should be using for clustering? Do I have to run this method sequentially on all instances, in some specific order, if I want to get any useful information out of it?
This has been reported as a bug - [Wekalist] DBScan - Issue/Bug with "clusterInstance()"-Function.
I'm doing some clustering with the DBScan library. Unfortunately it
seems that there is a bug in the function "clusterInstance()". The
function doesn't return the number of the cluster assigned to the
passed instance, but only returns the cluster number of the first
database element (or the second on the second call, the third on the
third call, and so on).
It simply cannot work, because the passed instance is never used in
the function.
The response reads:
DBScan and Optics are contributions to Weka. It's probably best if you
contact the authors to see if they can suggest a bug fix. The code and
package info (Weka 3.7) has contact information:
http://weka.sourceforge.net/packageMetaData/optics_dbScan/index.html
I'm afraid I am unfamiliar with the DBScan algorithm and the code is quite old now (2004), but you might be lucky and find that you are still able to contact the authors at LMU Munich.
I did find numerous copies of it via Google Code Search and GitHub, but I could not find an example where it had been fixed. While searching I did notice several other implementations of DBScan that you could examine to work out how this one could be fixed (e.g. ELKI's DBSCAN).
As I have said, I am unfamiliar with DBScan, but looking at the JavaDocs gave me the impression that the actual clustering is invoked by calling buildClusterer(Instances instances). Examining the source code, there seems to be much more going on inside the buildClusterer method than the clusterInstance method. OPTICS.java contains a clusterInstance method too, and that one just throws an exception. If you are lucky, maybe you can get by without a functioning clusterInstance method.
I found an example of Weka's DBScan being used here: DBSCANClustering.java
The example posted by Mark shows well how to use the DBScan class.
The method that does the actual clustering is DBScan.buildClusterer(Instances instances).
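A minimal usage sketch along those lines (the setEpsilon/setMinPoints setters are my reading of the optics_dbScan package's options, so treat them as assumptions and check them against your version):

Instances data = new Instances(new BufferedReader(new FileReader("data.arff")));
DBScan dbscan = new DBScan();
dbscan.setEpsilon(0.9);      //neighbourhood radius (assumed setter name)
dbscan.setMinPoints(6);      //minimum neighbours for a core point (assumed setter name)
dbscan.buildClusterer(data); //this is where the clustering actually happens
System.out.println(dbscan.numberOfClusters());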
DBScan.clusterInstance(Instance instance) is supposed to return the number of the assigned cluster for a given instance (after you have run the buildClusterer method). But it's true that the parameter is actually ignored, so it won't do what it's supposed to do.
As Mark answered, this is obviously a bug. As long as you query instances in the exact same order in which they were inserted into the clusterer, it's okay; but it won't work in any other case.
A co-worker solved this by writing her own version of the DBScan class: essentially identical (copy-pasted), except that she maintains a mapping between instances and cluster labels. This mapping can be produced by iterating over the contents of the database instance. The appropriate cluster for an instance can then be immediately retrieved from that mapping.
Editing this method is also a good opportunity to change the throw new Exception into something more sensible in this context, such as return -1.
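As a rough illustration of that workaround (a sketch only; it assumes the database keys are the insertion indices "0", "1", ... as the quoted clusterInstance() code suggests, and that they line up with the order of the training Instances):

//after buildClusterer() has run, record each data object's cluster label by insertion index
Map<Integer, Integer> clusterByIndex = new HashMap<>();
for (int i = 0; i < database.size(); i++) {
    int label = database.getDataObject(Integer.toString(i)).getClusterLabel();
    clusterByIndex.put(i, label == DataObject.NOISE ? -1 : label); //-1 instead of throwing
}
//clusterByIndex.get(i) now gives the cluster of the i-th training instance, regardless of query order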
Related
I like to poke around the Java source with IntelliJ. However, I have noticed something strange. Some non-public methods, particularly those in low-level classes, end in 0. This is most often the case with native methods, however I have observed some non-native methods with this name too. For example, java.lang.reflect.Executable#getAnnotatedReturnType0(Type), and java.lang.reflect.AccessibleObject#setAccessible0(boolean). The former simply sets two boolean flags! What is the reasoning behind this strange convention?
It's a somewhat common convention for helper methods.
Java does not allow methods-in-methods (though there is some traffic about adding them). Sometimes you have a repetitive task that nevertheless doesn't fit properly in a looping structure (such as a while or for loop) and doesn't easily fit in a lambda (because its parameter + checked exception requirements don't line up to a convenient type, or the code predates JDK8).
The JDK itself, and some other projects, use the convention that a helper method gets the same name as the method it helps, with a single-digit suffix, generally 0 (if there are more helpers, they'd be 1, 2, etc.).
This is particularly common with overloading. Let's say you have the following method signature:
/**
 * Detaches a disk from a virtual computer.
 * The disk must have been unmounted already on the virtual PC, unless
 * the {@code force} parameter is true, but note that (long story
 * about the perils of force disconnecting here).
 *
 * @param computerId computer to detach the disk from.
 * @param diskId ID of disk to detach
 * @param force Force the issue even if the disk is in use.
 * @throws DiskInUseException If the disk is in use. Never thrown if
 *         {@code force} is {@code true}.
 */
public void detachDisk(
String computerId, String diskId, boolean force) throws DiskInUseException { ... }
That's a bad API. It's got all sorts of ugly warts on it:
The docs specify a lot of caveats that literally do not apply whatsoever if I set force to false.
It throws a checked exception that may never occur, which is always bad (don't force calling code to catch impossible exceptions, obviously!) - if force is true, the DiskInUseException cannot occur.
If I call it: detachDisk(computer, disk, true) and I see that in code, I can guess that computer is a variable referring to a virtualPC, same for disk, but what might true be about? I might guess that it's about forcing things, but maybe in my head that third parameter means safely (the reverse of force).
We can solve a few issues by using an enum instead of a boolean, but that leaves the rest. This is vastly superior API design:
/**
 * Same text, but leave out all the stuff about the dangers of forcing.
 * @see #forceDetachDisk(String, String)
 */
public void detachDisk(String computerId, String diskId)
        throws DiskInUseException { .. }

/**
 * Same, but now highlight the dangers of forcing.
 * @see #detachDisk(String, String)
 */
public void forceDetachDisk(String computerId, String diskId) { .. }
Muuuch better, but most likely these 2 methods share most of the implementation. That naturally leads to making this thing:
private void detachDisk0(String computerId, String diskId,
        boolean force) throws DiskInUseException { .. }
which is private, so the confusion and warts it has don't matter, and the public detachDisk methods simply call the private one. However, some auto-complete dialogs, and certainly most Java compilers including javac itself, do recognize calls to private methods and will 'helpfully' point out that the method exists and ask whether you perhaps want to change its access modifier (if it's a source file and not a class dependency). That's not so nice. Tossing that 0 in there lessens the effect, and makes it clearer that the private method is meant to be invoked solely by the 0-less methods and no other method; you wrote it only for use from the detachDisk methods, not from any other method, not even in the same source file.
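To make the delegation concrete, a minimal sketch of how the two public methods would funnel into the private helper (the AssertionError handling for the impossible exception is my own choice, not part of the original answer):

public void detachDisk(String computerId, String diskId) throws DiskInUseException {
    detachDisk0(computerId, diskId, false);
}

public void forceDetachDisk(String computerId, String diskId) {
    try {
        detachDisk0(computerId, diskId, true);
    } catch (DiskInUseException e) {
        //cannot happen when force is true, so surface it as a programming error
        throw new AssertionError(e);
    }
}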
That's an API design more projects should be using.
This "convention" has been used not only for Java. I remember seeing the same in C#. Usually you will see a public method with the same name (without the zero at the end) calling the private method ending in zero.
But anyway, this naming convention for a private method is not a good example or something you should copy.
I have the following problem: I solve a large VRP with many synchronization constraints using CPLEX integrated with a heuristic component to improve incumbents. The general algorithm is as follows: If a new incumbent is found in CPLEX, or if a time limit is reached, I move to a heuristic, and try to improve the current incumbent. The former is done using an Incumbent Callback, the latter using a Heuristic Callback. While I am able to query all variables in the incumbent callback, I get some weird behavior in the heuristic callback:
When I query
this.getStatus().toString()
this returns "Optimal", even though the solution is not optimal yet (there is an incumbent, but still a rather large integrality gap). I made sure that the model actually queries the correct cplex object by looking into objective value and current integrality gap, they match the log. Then,
this.getIncumbentValue(v[n][i][j]);
fails (it also fails if I query the values using this.getIncumbentValues(v[n][i]);).
When I check the model (using cplex.exportModel(String filename)), all variables are present.
I was thinking that this might be related to the fact that I use CPLEX as a singleton, but the status is already "Optimal" when I use the singleton for the first time (in the first iteration, however, all variables can be queried; this problem only exists in the second iteration).
I create the singleton as such:
public static IloCplex getCplex() {
    if (cplex == null) {
        try {
            cplex = new IloCplex();
        } catch (IloException e) {
            e.printStackTrace();
        }
    } else {
        try {
            cplex.clearModel();
            cplex.setDefaults();
        } catch (IloException e) {
            e.printStackTrace();
        }
    }
    return cplex;
}
Did I maybe do something wrong here?
EDIT: The exact error message including back trace is:
ilog.cplex.IloCplex$UnknownObjectException: CPLEX Error: object is unknown to IloCplex
    at ilog.cplex.CpxNumVar.getVarIndexValue(CpxNumVar.java:295)
    at ilog.cplex.IloCplex$MIPInfoCallback.getIndex(IloCplex.java:13648)
    at ilog.cplex.IloCplex$MIPInfoCallback.getIncumbentValues(IloCplex.java:13807)
    at ilog.cplex.IloCplex$MIPInfoCallback.getIncumbentValues(IloCplex.java:13785)
    at SolverHybridCRP$InsertSolution.getV(SolverHybridCRP.java:2091)
    at SolverHybridCRP$InsertSolution.improveIncumbent(SolverHybridCRP.java:2054)
    at SolverHybridCRP$InsertSolution.main(SolverHybridCRP.java:2024)
    at ilog.cplex.CpxCallback.callmain(CpxCallback.java:160)
    at ilog.cplex.CpxHeuristicCallbackFunction.callIt(CpxHeuristicCallbackFunction.java:48)
    at ilog.cplex.Cplex.CPXmipopt(Native Method)
    at ilog.cplex.CplexI$SolveHandle.start(CplexI.java:2837)
    at ilog.cplex.CplexI.solve(CplexI.java:2963)
    at ilog.cplex.IloCplex.solve(IloCplex.java:10254)
    at SolverHybridCRP.solveModel(SolverHybridCRP.java:1525)
    at AppHelp.runtimeTest4(AppHelp.java:1218)
    at AppHelp.main(AppHelp.java:61)
It occurs when I query ANY variable, but only after I query the cplex object for the second time. (So: I start the program, it iterates over a lot of instances; the first instance is fine and all heuristic callbacks work, but in all further iterations I end up in the catch block and get the above exception trace.)
That is why I assumed that maybe the singleton does not work exactly as intended, and that not everything is deleted from the first iteration.
Looking at the reference documentation for IloCplex.HeuristicCallback.getStatus(), you can see:
Returns the solution status for the current node.
This method returns the status of the solution found by the instance
of IloCplex at the current node during the last call to the method
IloCplex.HeuristicCallback.solve (which may have been called directly
in the callback or by IloCplex when processing the node just before
the callback is called).
In other words, the function does not return a global status but only a node-local status. It is expected that the node is currently solved to optimality when the callback is invoked.
With respect to the exception in the callback: you are trying to access a variable object that is not in the model being solved. Typical cases in which this happens are:
You created the variable but it does not appear in any constraints or in the objective, i.e., it is not used anywhere. You can force its usage by explicitly calling cplex.add() with the variable as argument.
The variable was created in a previous iteration and is no longer part of the model in the current iteration. A good way to debug this is to assign a name to each variable and have that name include the iteration index. Then, in the exception handler, print the name of the offending variable. That should give a very good hint about what is wrong.
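A rough sketch of that debugging idea (the variable naming scheme and the callback wiring here are illustrative, not from the original post):

//when building the model in iteration `iteration`, embed the index in each variable name
IloNumVar var = cplex.boolVar("v_iter" + iteration + "_" + n + "_" + i + "_" + j);

//inside the heuristic callback, report the name of any variable CPLEX does not know
class DebuggingHeuristicCallback extends IloCplex.HeuristicCallback {
    private final IloNumVar[] vars;

    DebuggingHeuristicCallback(IloNumVar[] vars) { this.vars = vars; }

    @Override
    protected void main() throws IloException {
        for (IloNumVar v : vars) {
            try {
                System.out.println(v.getName() + " = " + getIncumbentValue(v));
            } catch (IloCplex.UnknownObjectException e) {
                //the embedded iteration index reveals which model the variable was created for
                System.err.println("Unknown to CPLEX: " + v.getName());
            }
        }
    }
}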
I am trying to build a static code analysis that collects all Strings passed to a function, without running the code. I am using Eclipse JDT (3.10.0) to parse the code.
Assumptions/Preconditions:
Every passed argument can be resolved to a String literal
The value is not saved to any (non-static) fields and then passed on in another call
All callers can be identified and visited by the parser
The MethodInvocation of the examined method is already identified
What I have:
At the moment I am able to identify all MethodInvocations of that particular Method and am therefore able to collect all arguments passed as StringLiterals.
I am able to see all argument Types but, of course, cannot determine the value of parameters, fields, objects, etc., as the value binding would only be available at runtime.
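For reference, that collection step might look roughly like the following visitor (a sketch under the assumptions above; matching the method by its simple name is a simplification I'm making here, a real check would resolve the method binding):

import java.util.ArrayList;
import java.util.List;
import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.MethodInvocation;
import org.eclipse.jdt.core.dom.StringLiteral;

//collects every string literal passed directly to a method with the given name
public class LiteralArgumentCollector extends ASTVisitor {
    private final String targetMethodName;
    private final List<String> literals = new ArrayList<>();

    public LiteralArgumentCollector(String targetMethodName) {
        this.targetMethodName = targetMethodName;
    }

    @Override
    public boolean visit(MethodInvocation node) {
        if (node.getName().getIdentifier().equals(targetMethodName)) {
            for (Object arg : node.arguments()) {
                if (arg instanceof StringLiteral) {
                    literals.add(((StringLiteral) arg).getLiteralValue());
                }
                //non-literal arguments would go onto the working set described below
            }
        }
        return true; //keep visiting children
    }

    public List<String> getLiterals() {
        return literals;
    }
}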
The Problem
Under the assumption that every single passed Argument (no matter the Type) is at some point resolvable to a StringLiteral or a concatenation of StringLiterals, I should be able to determine the distinct set of all values which are passed to this method by the program.
Is there a way to recursively determine the String value of all method calls without following every call path and manually implementing the logic for every occurrence?
Imagine the following examples:
public class IAmAnalysed {
    public void analysedMethod(String argument) {
        //do something useful
    }
}

//Values in a map
hashMap.put("test", "TestString");
hashMap.put("test2", "TestString2");
for (Map.Entry<String, String> e : hashMap.entrySet()) {
    iAmAnalysed.analysedMethod(e.getValue());
}

//util method
public void util(String argument) {
    iAmAnalysed.analysedMethod(argument + "utilCalled");
}
util("TestString3");
This should give me the following set of values:
TestString
TestString2
TestString3utilCalled
The only approach I can think of (using Eclipse JDT) is to add every argument that is not a StringLiteral to a working set and start one more iteration with the ASTParser to determine where the passed value is set, or where it comes from. Then I add this location to the working set and iterate once more. At the end, this should lead me to all possible arguments.
Unfortunately, with this approach I would have to implement logic for every single possible way the value could be passed (imagine all the other possibilities besides the two above).
Examining all possible data flows into a given method is not generally feasible by static analysis. The approach you outline can work for a small class of programs, until you hit, e.g., recursion, at which point the thing will blow up.
Maybe a comprehensive test suite will give better results than static analysis on this one.
/**
 * Retrieves and removes the head (first element) of this list.
 *
 * @return the head of this list, or {@code null} if this list is empty
 * @since 1.5
 */
public E poll() {
    final Node<E> f = first; //why final here? <<-------- ******* --------
    return (f == null) ? null : unlinkFirst(f);
}
Hi there, I'm reading the source code of JDK 1.7. In the above code snippet from LinkedList.java, I cannot understand why 'final' is needed in the poll() method. Why not:
public E poll() {
    return (first == null) ? null : unlinkFirst(first);
}
Can you share the insight behind this implementation? Thanks.
Most of the methods in LinkedList use the final declaration on local variables.
LinkedList JDK 1.7 Source
This is likely related to the concept behind Using "final" modifier whenever applicable in java.
Adding final to all things which should not change simply narrows down the possibilities that you (or the next programmer, working on your code) will misinterpret or misuse the thought process which resulted in your code. At least it should ring some bells when they now want to change your previously immutable thing.
Technically, at the cost of 6 letters, you guarantee that something you don't ever expect to change will never change.
Does your proposed code work? Yes. I don't see any scenarios where it wouldn't. It is programmatically valid.
However, the use of final throughout the code supports sanity testing, and understandably, for all the util stuff that holds pretty much all of the things we do in Java together, it'd be nice to know that everything is working as intended.
Note: if there is a security issue that I have not seen, I would be interested to know about that in a separate answer.
Martin Buchholz answered this on the concurrency-interest list in a related question:
We in jsr166-land consider our software important enough to make
optimizations we don't recommend to regular java programmers. Copying final
fields to locals generates smaller bytecode and might help the jit produce
better code (and with current hotspot, still does).
Using final on locals has no performance advantage, but it does have some
software engineering advantages. We tend to use it for locals with the same
name as a field, e.g.
final Foo foo = this.foo;
Compass's answer is very reasonable, but I just want to add a further guess. I think it's a micro-optimization, since access to the local variable f should, on average, be faster than access to the class field first. The final modifier is good practice but doesn't affect access latency. In this specific case the gain would likely be extremely small, since you are trading two class field accesses for a single class field access and two local variable accesses.
This is not something that you should do in everyday programming, but since the collections library is used by basically every single Java program in existence these things are justifiable here.
EDIT: I've reorganized this question to reflect the new information that since became available.
This question is based on the responses to a question by Viliam concerning Guava Maps' use of lazy eviction: Laziness of eviction in Guava's maps
Please read this question and its response first, but essentially the conclusion is that Guava maps do not asynchronously calculate and enforce eviction. Given the following map:
ConcurrentMap<String, MyObject> cache = new MapMaker()
        .expireAfterAccess(10, TimeUnit.MINUTES)
        .makeMap();
Once ten minutes has passed following access to an entry, it will still not be evicted until the map is "touched" again. Known ways to do this include the usual accessors - get() and put() and containsKey().
The first part of my question [solved]: what other calls cause the map to be "touched"? Specifically, does anyone know if size() falls into this category?
The reason for wondering this is that I've implemented a scheduled task to occasionally nudge the Guava map I'm using for caching, using this simple method:
public static void nudgeEviction() {
cache.containsKey("");
}
However I'm also using cache.size() to programmatically report the number of objects contained in the map, as a way to confirm this strategy is working. But I haven't been able to see a difference from these reports, and now I'm wondering if size() also causes eviction to take place.
Answer: So Mark has pointed out that in release 9, eviction is invoked only by the get(), put(), and replace() methods, which would explain why I wasn't seeing an effect for containsKey(). This will apparently change with the next version of Guava, which is set for release soon, but unfortunately my project's release is set sooner.
This puts me in an interesting predicament. Normally I could still touch the map by calling get(""), but I'm actually using a computing map:
ConcurrentMap<String, MyObject> cache = new MapMaker()
        .expireAfterAccess(10, TimeUnit.MINUTES)
        .makeComputingMap(loadFunction);
where loadFunction loads the MyObject corresponding to the key from a database. It's starting to look like I have no easy way of forcing eviction until r10. But even being able to reliably force eviction is put into doubt by the second part of my question:
The second part of my question [solved]: In reaction to one of the responses to the linked question, does touching the map reliably evict all expired entries? In the linked answer, Niraj Tolia indicates otherwise, saying eviction is potentially only processed in batches, which would mean multiple calls to touch the map might be needed to ensure all expired objects were evicted. He did not elaborate, however this seems related to the map being split into segments based on concurrency level. Assuming I used r10, in which a containsKey("") does invoke eviction, would this then be for the entire map, or only for one of the segments?
Answer: maaartinus has addressed this part of the question:
Beware that containsKey and other reading methods only run postReadCleanup, which does nothing except on every 64th invocation (see DRAIN_THRESHOLD). Moreover, it looks like all cleanup methods work with a single Segment only.
So it looks like calling containsKey("") wouldn't be a viable fix, even in r10. This reduces my question to the title: How can I reliably force eviction to occur?
Note: Part of the reason my web app is noticeably affected by this issue is that when I implemented caching I decided to use multiple maps - one for each class of my data objects. So with this issue there is the possibility that one area of code is executed, causing a bunch of Foo objects to be cached, and then the Foo cache isn't touched again for a long time so it doesn't evict anything. Meanwhile Bar and Baz objects are being cached from other areas of code, and memory is being eaten. I'm setting a maximum size on these maps, but this is a flimsy safeguard at best (I'm assuming its effect is immediate - still need to confirm this).
UPDATE 1: Thanks to Darren for linking the relevant issues - they now have my votes. So it looks like a resolution is in the pipeline, but seems unlikely to be in r10. In the meantime, my question remains.
UPDATE 2: At this point I'm just waiting for a Guava team member to give feedback on the hack maaartinus and I put together (see answers below).
LAST UPDATE: feedback received!
I just added the method Cache.cleanUp() to Guava. Once you migrate from MapMaker to CacheBuilder you can use that to force eviction.
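For illustration, that route would look roughly like this (a sketch against the current CacheBuilder API; the exact builder return types differed slightly in the earliest releases, and loadFromDatabase stands in for the original loadFunction):

LoadingCache<String, MyObject> cache = CacheBuilder.newBuilder()
        .expireAfterAccess(10, TimeUnit.MINUTES)
        .build(new CacheLoader<String, MyObject>() {
            @Override
            public MyObject load(String key) throws Exception {
                return loadFromDatabase(key); //hypothetical loader, mirroring loadFunction
            }
        });

//from a scheduled task, force any pending evictions to be processed:
cache.cleanUp();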
I was wondering the about the same issue you described in the first part of your question. From what I can tell from looking at the source code for Guava's CustomConcurrentHashMap (release 9), it appears that entries are evicted on the get(), put(), and replace() methods. The containsKey() method does not appear to invoke eviction. I'm not 100% sure because I took a quick pass at the code.
Update:
I also found a more recent version of the CustomConcurrentHashMap in Guava's git repository, and it looks like containsKey() has been updated to invoke eviction.
Both release 9 and the latest version I just found do not invoke eviction when size() is called.
Update 2:
I recently noticed that Guava r10 (yet to be released) has a new class called CacheBuilder. Basically this class is a forked version of the MapMaker but with caching in mind. The documentation suggests that it will support some of the eviction requirements you are looking for.
I reviewed the updated code in r10's version of the CustomConcurrentHashMap and found what looks like a scheduled map cleaner. Unfortunately, that code appears unfinished at this point but r10 looks more and more promising each day.
Beware that containsKey and other reading methods only run postReadCleanup, which does nothing except on every 64th invocation (see DRAIN_THRESHOLD). Moreover, it looks like all cleanup methods work with a single Segment only.
The easiest way to enforce eviction seems to be to put some dummy object into each segment. For this to work, you'd need to analyze CustomConcurrentHashMap.hash(Object), which is surely no good idea, as this method may change anytime. Moreover, depending on the key class it may be hard to find a key with a hashCode ensuring it lands in a given segment.
You could use reads instead, but you would have to repeat them 64 times per segment. Here it'd be easy to find a key with an appropriate hashCode, since any object is allowed as an argument.
Maybe you could hack into the CustomConcurrentHashMap source code instead, it could be as trivial as
public void runCleanup() {
    final Segment<K, V>[] segments = this.segments;
    for (int i = 0; i < segments.length; ++i) {
        segments[i].runCleanup();
    }
}
but I wouldn't do it without a lot of testing and/or an OK by a guava team member.
Yep, we've gone back and forth a few times on whether these cleanup tasks should be done on a background thread (or pool), or should be done on user threads. If they were done on a background thread, this would eventually happen automatically; as it is, it'll only happen as each segment gets used. We're still trying to come up with the right approach here - I wouldn't be surprised to see this change in some future release, but I also can't promise anything or even make a credible guess as to how it will change. Still, you've presented a reasonable use case for some kind of background or user-triggered cleanup.
Your hack is reasonable, as long as you keep in mind that it's a hack, and liable to break (possibly in subtle ways) in future releases. As you can see in the source, Segment.runCleanup() calls runLockedCleanup and runUnlockedCleanup: runLockedCleanup() will have no effect if it can't lock the segment, but if it can't lock the segment it's because some other thread has the segment locked, and that other thread can be expected to call runLockedCleanup as part of its operation.
Also, in r10, there's CacheBuilder/Cache, analogous to MapMaker/Map. Cache is the preferred approach for many current users of makeComputingMap. It uses a separate CustomConcurrentHashMap, in the common.cache package; depending on your needs, you may want your GuavaEvictionHacker to work with both. (The mechanism is the same, but they're different Classes and therefore different Methods.)
I'm not a big fan of hacking into or forking external code until absolutely necessary. This problem occurs in part due to an early decision for MapMaker to fork ConcurrentHashMap, thereby dragging in a lot of complexity that could have been deferred until after the algorithms were worked out. By patching above MapMaker, the code is robust to library changes so that you can remove your workaround on your own schedule.
An easy approach is to use a priority queue of weak reference tasks and a dedicated thread. This has the drawback of creating many stale no-op tasks, which can become excessive due to the O(lg n) insertion penalty. It works reasonably well for small, less frequently used caches. It was the original approach taken by MapMaker, and it's simple to write your own decorator.
A more robust choice is to mirror the lock amortization model with a single expiration queue. The head of the queue can be volatile so that a read can always peek to determine if it has expired. This allows all reads to trigger an expiration and an optional clean-up thread to check regularly.
By far the simplest is to use #concurrencyLevel(1) to force MapMaker to use a single segment. This reduces the write concurrency, but most caches are read heavy so the loss is minimal. The original hack to nudge the map with a dummy key would then work fine. This would be my preferred approach, but the other two options are okay if you have high write loads.
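As a rough sketch of that single-segment idea (hedged: I'm using a dummy write as the nudge here, since as noted above the read paths only clean up every 64th call; the dummy key and value are placeholders of my own):

ConcurrentMap<String, MyObject> cache = new MapMaker()
        .concurrencyLevel(1) //one segment, so a nudge reaches the whole map
        .expireAfterAccess(10, TimeUnit.MINUTES)
        .makeMap();

//from a scheduled task: a dummy write touches the single segment and triggers eviction
cache.put("__evictionNudge__", someDummyMyObject); //someDummyMyObject: any placeholder MyObject
cache.remove("__evictionNudge__");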
I don't know if it is appropriate for your use case, but your main concern about the lack of background cache eviction seems to be memory consumption, so I would have thought that using softValues() on the MapMaker, to allow the garbage collector to reclaim entries from the cache when a low-memory situation occurs, could easily be the solution for you. I have used this on a subscription server (ATOM) where entries are served through a Guava cache using SoftReferences for values.
Based on maaartinus's answer, I came up with the following code, which uses reflection rather than directly modifying the source (if you find this useful, please upvote his answer!). While it comes with a performance penalty for using reflection, the difference should be negligible, since I'll run it about once every 20 minutes for each caching Map (I'm also caching the dynamic lookups in the static block, which will help). I have done some initial testing and it appears to work as intended:
import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.concurrent.ConcurrentMap;

public class GuavaEvictionHacker {

    //Class objects necessary for reflection on Guava classes - see Guava docs for info
    private static final Class<?> computingMapAdapterClass;
    private static final Class<?> nullConcurrentMapClass;
    private static final Class<?> nullComputingConcurrentMapClass;
    private static final Class<?> customConcurrentHashMapClass;
    private static final Class<?> computingConcurrentHashMapClass;
    private static final Class<?> segmentClass;

    //MapMaker$ComputingMapAdapter#cache points to the wrapped CustomConcurrentHashMap
    private static final Field cacheField;

    //CustomConcurrentHashMap#segments points to the array of Segments (map partitions)
    private static final Field segmentsField;

    //CustomConcurrentHashMap$Segment#runCleanup() enforces eviction on the calling Segment
    private static final Method runCleanupMethod;

    static {
        try {
            //look up Classes
            computingMapAdapterClass = Class.forName("com.google.common.collect.MapMaker$ComputingMapAdapter");
            nullConcurrentMapClass = Class.forName("com.google.common.collect.MapMaker$NullConcurrentMap");
            nullComputingConcurrentMapClass = Class.forName("com.google.common.collect.MapMaker$NullComputingConcurrentMap");
            customConcurrentHashMapClass = Class.forName("com.google.common.collect.CustomConcurrentHashMap");
            computingConcurrentHashMapClass = Class.forName("com.google.common.collect.ComputingConcurrentHashMap");
            segmentClass = Class.forName("com.google.common.collect.CustomConcurrentHashMap$Segment");

            //look up Fields and set accessible
            cacheField = computingMapAdapterClass.getDeclaredField("cache");
            segmentsField = customConcurrentHashMapClass.getDeclaredField("segments");
            cacheField.setAccessible(true);
            segmentsField.setAccessible(true);

            //look up the cleanup Method and set accessible
            runCleanupMethod = segmentClass.getDeclaredMethod("runCleanup");
            runCleanupMethod.setAccessible(true);
        }
        catch (ClassNotFoundException cnfe) {
            throw new RuntimeException("ClassNotFoundException thrown in GuavaEvictionHacker static initialization block.", cnfe);
        }
        catch (NoSuchFieldException nsfe) {
            throw new RuntimeException("NoSuchFieldException thrown in GuavaEvictionHacker static initialization block.", nsfe);
        }
        catch (NoSuchMethodException nsme) {
            throw new RuntimeException("NoSuchMethodException thrown in GuavaEvictionHacker static initialization block.", nsme);
        }
    }

    /**
     * Forces eviction to take place on the provided Guava Map. The Map must be an instance
     * of either {@code CustomConcurrentHashMap} or {@code MapMaker$ComputingMapAdapter}.
     *
     * @param guavaMap the Guava Map to force eviction on.
     */
    public static void forceEvictionOnGuavaMap(ConcurrentMap<?, ?> guavaMap) {
        try {
            //we need to get the CustomConcurrentHashMap instance
            Object customConcurrentHashMap;

            //get the type of what was passed in
            Class<?> guavaMapClass = guavaMap.getClass();

            //if it's a CustomConcurrentHashMap we have what we need
            if (guavaMapClass == customConcurrentHashMapClass) {
                customConcurrentHashMap = guavaMap;
            }
            //if it's a NullConcurrentMap (auto-evictor), return early
            else if (guavaMapClass == nullConcurrentMapClass) {
                return;
            }
            //if it's a computing map we need to pull the instance from the adapter's "cache" field
            else if (guavaMapClass == computingMapAdapterClass) {
                customConcurrentHashMap = cacheField.get(guavaMap);

                //get the type of what we pulled out
                Class<?> innerCacheClass = customConcurrentHashMap.getClass();

                //if it's a NullComputingConcurrentMap (auto-evictor), return early
                if (innerCacheClass == nullComputingConcurrentMapClass) {
                    return;
                }
                //otherwise make sure it's a ComputingConcurrentHashMap - error if it isn't
                else if (innerCacheClass != computingConcurrentHashMapClass) {
                    throw new IllegalArgumentException("Provided ComputingMapAdapter's inner cache was an unexpected type: " + innerCacheClass);
                }
            }
            //error for anything else passed in
            else {
                throw new IllegalArgumentException("Provided ConcurrentMap was not an expected Guava Map: " + guavaMapClass);
            }

            //pull the array of Segments out of the CustomConcurrentHashMap instance
            Object[] segments = (Object[]) segmentsField.get(customConcurrentHashMap);

            //loop over them and invoke the cleanup method on each one
            for (Object segment : segments) {
                runCleanupMethod.invoke(segment);
            }
        }
        catch (IllegalAccessException iae) {
            throw new RuntimeException(iae);
        }
        catch (InvocationTargetException ite) {
            throw new RuntimeException(ite.getCause());
        }
    }
}
I'm looking for feedback on whether this approach is advisable as a stopgap until the issue is resolved in a Guava release, particularly from members of the Guava team when they get a minute.
EDIT: updated the solution to allow for auto-evicting maps (NullConcurrentMap or NullComputingConcurrentMap residing in a ComputingMapAdapter). This turned out to be necessary in my case, since I'm calling this method on all of my maps and a few of them are auto-evictors.