my ideal cache using guava - java

Off and on for the past few weeks I've been trying to find my ideal cache implementation using guava's MapMaker. See my previous two questions here and here to follow my thought process.
Taking what I've learned, my next attempt is going to ditch soft values in favor of maximumSize and expireAfterAccess:
ConcurrentMap<String, MyObject> cache = new MapMaker()
.maximumSize(MAXIMUM_SIZE)
.expireAfterAccess(MINUTES_TO_EXPIRY, TimeUnit.MINUTES)
.makeComputingMap(loadFunction);
where
Function<String, MyObject> loadFunction = new Function<String, MyObject>() {
@Override
public MyObject apply(String uidKey) {
return getFromDataBase(uidKey);
}
};
However, the one remaining issue I'm still grappling with is that this implementation will evict objects even if they are strongly reachable, once their time is up. This could result in multiple objects with the same UID floating around in the environment, which I don't want (I believe what I'm trying to achieve is known as canonicalization).
So as far as I can tell the only answer is to have an additional map which functions as an interner that I can check to see if a data object is still in memory:
ConcurrentMap<String, MyObject> interner = new MapMaker()
.weakValues()
.makeMap();
and the load function would be revised:
Function<String, MyObject> loadFunction = new Function<String, MyObject>() {
@Override
public MyObject apply(String uidKey) {
MyObject dataObject = interner.get(uidKey);
if (dataObject == null) {
dataObject = getFromDataBase(uidKey);
interner.put(uidKey, dataObject);
}
return dataObject;
}
};
However, using two maps instead of one for the cache seems inefficient. Is there a more sophisticated way to approach this? In general, am I going about this the right way, or should I rethink my caching strategy?

Whether two maps is efficient depends entirely on how expensive getFromDatabase() is, and how big your objects are. It does not seem out of all reasonable boundaries to do something like this.
As for the implementation, it looks like you can probably layer your maps in a slightly different way to get the behavior you want and still have good concurrency properties.
Create your first map with weak values, and put the computing function getFromDatabase() on this map.
The second map is the expiring one, also computing, but this function just gets from the first map.
Do all your access through the second map.
In other words, the expiring map acts to pin a most-recently-used subset of your objects in memory, while the weak-reference map is the real cache.
-dg
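The layering described above can be sketched with plain JDK classes. This is only an illustration under stated assumptions: a HashMap of WeakReference values stands in for the weak-values map, an access-ordered LinkedHashMap stands in for the expiring map (size-based eviction only, no timed expiry), and the names LayeredCache and loader are hypothetical.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: JDK classes standing in for the two MapMaker maps.
final class LayeredCache<K, V> {
    // Layer 1: the canonical map. Values are weakly held, so an entry stays
    // usable for as long as anything else strongly references the value.
    private final Map<K, WeakReference<V>> canonical = new HashMap<>();
    // Layer 2: pins the most recently used values with strong references.
    private final Map<K, V> pinned;
    private final Function<K, V> loader;

    LayeredCache(int maxPinned, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry eldest.
        this.pinned = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxPinned;
            }
        };
    }

    // A single lock keeps the sketch simple; it is not tuned for contention.
    synchronized V get(K key) {
        V value = pinned.get(key);
        if (value == null) {
            WeakReference<V> ref = canonical.get(key);
            value = (ref == null) ? null : ref.get();
            if (value == null) {
                // No longer reachable anywhere, so loading a fresh copy is safe.
                value = loader.apply(key);
                canonical.put(key, new WeakReference<>(value));
            }
            pinned.put(key, value);
        }
        return value;
    }
}
```

An object evicted from the pinned layer is still returned as the same instance while other code holds it. The sketch never purges cleared WeakReference entries; a real implementation would need to do that, for example with a ReferenceQueue.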

I don't understand the full picture here, but two things.
Given this statement: "this implementation will evict objects even if they are strongly reachable, once their time is up. This could result in multiple objects with the same UID floating around in the environment, which I don't want." -- it sounds like you just need to use weakKeys() and NOT use either timed or size-based eviction.
Or if you do want to bring an "interner" into this, I'd use a real Interners.newWeakInterner.

Related

Is updating the value of a Map by mutating its reference bad practice?

I have code similar to this all around a codebase I'm working on:
Map<String, Map<Object, Integer>> mapOfMap = new HashMap<>();
Map<Object, Integer> mapA = mapOfMap.computeIfAbsent(keyParam, k -> new HashMap<>());
mapA.put(objectParam, mapA.getOrDefault(objectParam, 0) + 1);
This way, we have updated the value of 'keyParam' in the first map without a direct call to the put() method.
I tend to prefer when code is explicit and would typically write a plain call to 'put'. However, I'm wondering if I'm overthinking it and this conciseness is something we gain for having object references? Can this be written in a similarly concise way in Java that pretends the objects are immutable?
It can be, but not in this case.
In this case, the code presumably knows what it's doing. The object may even be intended to be mutable. The only way around this is to copy the value object, mutate the copy, then put it back. That could be quite costly in some programs, and I am not sure the style is really better: I don't feel it would be more readable or always less error-prone. It depends on how much you believe in immutable objects in general.
Where it does become a problem is when things are mutated that should not be. But this has nothing to do with maps per se.
For example:
In this issue on github, a reference to a list that is supposed to be immutable gets passed to a part of the code that does mutate it, leading to at least a memory leak in this case.
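For the counter in the question, Java 8's Map.merge is one way to make the write explicit in a single call. A minimal sketch, assuming the same nested-map shape as the question; the class name NestedCounter and its methods are hypothetical. It still mutates the inner map in place, it just does not hide the update behind a separate get.

```java
import java.util.HashMap;
import java.util.Map;

final class NestedCounter {
    private final Map<String, Map<Object, Integer>> mapOfMap = new HashMap<>();

    // Increments the count for (key, item), creating the inner map on first use.
    void increment(String key, Object item) {
        mapOfMap.computeIfAbsent(key, k -> new HashMap<>())
                .merge(item, 1, Integer::sum);
    }

    // Returns the current count, or 0 if the pair was never incremented.
    int count(String key, Object item) {
        Map<Object, Integer> inner = mapOfMap.get(key);
        return inner == null ? 0 : inner.getOrDefault(item, 0);
    }
}
```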

Java: How can I populate a map if I use callables?

I want to use a Map as a form of small database "cache" in my application.
I thought that it would be better to use something like:
ConcurrentHashMap<K,Callable<V>>
So that I have a single cache for many kinds of database objects (and not one for each kind, i.e. ConcurrentHashMap<K,V> where V would be some specific object).
My problem now (assuming all the above thoughts are reasonable) is how would I pre-load this cache on start up from DB?
I mean that, using a Callable, if I need something that is not in the cache, the Callable would fetch it the first time and have it ready for the next get.
But how can I pre-load the cache if I use callables?
Note: I am not interested in using a library since my needs are small.
You might have better luck with ConcurrentHashMap<K, Future<V>>, since Future better matches the concept of "something in the process of being computed, or possibly already computed." You could just initialize some elements of the cache with a Future that's already computed.
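A rough sketch of that idea, using only the JDK. The class name FutureCache and the loader function (a stand-in for the database read) are hypothetical; pre-loading stores already-completed futures, and anything else is loaded on first access.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.function.Function;

final class FutureCache<K, V> {
    private final ConcurrentHashMap<K, Future<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hypothetical stand-in for the DB read

    FutureCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Pre-load: values that are already known go in as completed futures.
    void preload(Map<K, V> known) {
        known.forEach((k, v) -> cache.put(k, CompletableFuture.completedFuture(v)));
    }

    // Lazy path: the first caller for a key starts the load, and later
    // callers for the same key share the same future.
    V get(K key) {
        Future<V> future = cache.computeIfAbsent(
                key, k -> CompletableFuture.supplyAsync(() -> loader.apply(k)));
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Note that a failed load stays in the map as a failed future in this sketch; evicting it on failure is left out for brevity.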
Couldn't you just do something simple like this?
for (Callable<V> c : map.values()) {
c.call();
}
You should probably use an interface on your objects:
public interface Cacheable {}
public class MyObject implements Cacheable {...}
ConcurrentHashMap<K, Cacheable> cache = ...

Reuse hashmaps in array

I am holding an array of hashmaps. I want the best performance and memory usage, so I would like to reuse the hashmaps inside the array.
So when there is a hashmap in the array that is no longer needed and I want to add a new hashmap to the array, I just clear the old one and use put() to add the new values.
I also need to copy the values out when I retrieve a hashmap from the array.
I am not sure if this is better than creating new HashMap() every time.
What is better?
UPDATE
I need to cycle through about 50 million hashmaps, each with about 10 key-value pairs. If the size of the array is 20,000, I need just 20,000 hashmaps instead of 50 million new HashMap() instances.
Be very careful with this approach. Although it may be better performance-wise to recycle objects, you may get into trouble by modifying the same reference several times, as illustrated in the following example:
public class A {
public int counter = 0;
public static void main(String[] args) {
A a = new A();
a.counter = 5;
A b = a; // I want to save a into b and then recycle a for other purposes
a.counter = 10; // now b.counter is also 10
}
}
I'm sure you get the point. However, if you are not copying references to the HashMaps in the array around, it should be OK.
Doesn't matter. Premature optimization. Come back when you have profiler results telling you where you're actually spending the most memory or CPU cycles.
It is entirely unclear why re-using maps in this manner would improve performance and/or memory usage. For all we know, it might make no difference, or might have the opposite effect.
You should do whatever results in the most readable code, then profile, and finally optimize the parts of the code that the profiler highlights as bottlenecks.
In most cases you will not feel any difference.
Typically the number of map entries is MUCH higher than the number of map objects. When you populate a map, an instance of Map.Entry is created per entry. That is a relatively lightweight object, but it is still allocated with new. The map itself, without data, is lightweight too, so you will not gain anything from these tricks unless your maps hold only 1-2 entries.
Bottom line.
Forget about premature optimization. Implement your application. If you have performance problems, profile the application, find the bottlenecks, and fix them. I can 99% guarantee you that the bottleneck will never be in the new HashMap() call.
I think what you want is an object pool: you get an object (in your case, a HashMap) from the pool, perform your operations, and put it back in the pool when it is no longer needed.
Look up the Object Pool design pattern; for further reference, see this link:
http://sourcemaking.com/design_patterns/object_pool
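A minimal pool along those lines might look like the following. The class name MapPool is hypothetical, and the sketch is not thread-safe. As other answers point out, it recycles only the HashMap and its table, not the per-entry objects.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

final class MapPool<K, V> {
    private final ArrayDeque<Map<K, V>> free = new ArrayDeque<>();

    // Hands out a recycled map if one is available, otherwise a new one.
    Map<K, V> acquire() {
        Map<K, V> map = free.poll();
        return (map != null) ? map : new HashMap<>();
    }

    // Clears the map and returns it to the pool. The caller must drop its
    // own reference afterwards, or it will see the next user's data.
    void release(Map<K, V> map) {
        map.clear();
        free.push(map);
    }
}
```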
The problem you have is that most of the objects are Map.Entry objects in the HashMap. While you can recycle the HashMap itself (and its array), these are only a small portion of the objects. One way around this is to use FastMap from Javolution, which recycles everything and has support for managing the lifecycle (it's designed to minimise garbage this way).
I suspect the most efficient way is to use an EnumMap if possible (if you have known key attributes), or POJOs even if most fields are not used.
There are a few problems with reusing HashMaps.
Even if the key and value data were to take no memory (shared from other places), the Map.Entry objects would dominate memory usage but not be reused (unless you did something a bit special).
Because of generational GC, generally having old objects point to new is expensive (and relatively difficult to see what's going on). Might not be an issue if you are keeping millions of these.
More complicated code is more difficult to optimise. So keep it simple, and then do the big optimisations, which probably involve changing the data structures.

Accessing hidden getEntry(Object key) in HashMap

I have a problem similar to the one discussed here, but with a stronger practical use case.
For example, I have a Map<String, Integer> and a function that is given a key and, if the mapped integer value is negative, puts null into the map:
Map<String, Integer> map = new HashMap<String, Integer>();
public void nullifyIfNegative(String key) {
Integer value = map.get(key);
if (value != null && value.intValue() < 0) {
map.put(key, null);
}
}
In this case, the lookup (and hence the hashCode calculation for the key) is done twice: once for the lookup and once for the replacement. It would be nice to have another method (which already exists inside HashMap) that allows this to be done more efficiently:
public void nullifyIfNegative(String key) {
Map.Entry<String, Integer> entry = map.getEntry(key);
if (entry != null && entry.getValue().intValue() < 0) {
entry.setValue(null);
}
}
The same applies to cases where you want to manipulate immutable objects stored as map values:
Map<String, String>: I want to append something to the string value.
Map<String, int[]>: I want to insert a number into the array.
So the case is quite common. Solutions that might work, but not for me:
Reflection. It works, but I cannot sacrifice performance just for this nice feature.
Use org.apache.commons.collections.map.AbstractHashedMap (it has at least a protected getEntry() method), but unfortunately commons-collections does not support generics.
Use generic commons-collections, but this library (AFAIK) is out of date (not in sync with the latest library version from Apache), and (what is critical) is not available in the central Maven repository.
Use value wrappers, which means "making values mutable" (e.g. use mutable integers [e.g. org.apache.commons.lang.mutable.MutableInt], or collections instead of arrays). This solution leads to memory overhead, which I would like to avoid.
Try to extend java.util.HashMap with a custom class implementation (which should be in the java.util package) and put it in the endorsed folder (as java.lang.ClassLoader will refuse to load it in Class<?> defineClass(String name, byte[] b, int off, int len), see sources), but I don't want to patch the JDK, and it seems that the list of packages that can be endorsed does not include java.util.
A similar question has already been raised on the sun.com bug tracker, but I would like to know the community's opinion and what the way out could be, keeping in mind maximum memory and performance efficiency.
If you agree that this is nice and beneficial functionality, please vote for this bug!
As a logical matter, you're right in that a single getEntry would save you a hash lookup. As a practical matter, unless you have a specific use case where you have reason to be concerned about the performance hit (which seems pretty unlikely: hash lookup is common, O(1), and well optimized), what you're worrying about is probably negligible.
Why don't you write a test? Create a hashtable with a few 10's of millions of objects, or whatever's an order of magnitude greater than what your application is likely to create, and average the time of a get() over a million or so iterations (hint: it's going to be a very small number).
A bigger issue with what you're doing is synchronization. You should be aware that if you're doing conditional alterations on a map you could run into issues, even if you're using a synchronized map, as you'd have to hold a lock covering the span of both the get() and put() operations.
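One way to make such a check-then-update atomic without an external lock is ConcurrentHashMap.computeIfPresent (Java 8+). The sketch below makes an assumption the question does not: since ConcurrentHashMap rejects null values, negative values are replaced with 0 rather than null. The class name AtomicClamp is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

final class AtomicClamp {
    private final ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();

    void put(String key, int value) {
        map.put(key, value);
    }

    Integer get(String key) {
        return map.get(key);
    }

    // The check and the replacement run as one atomic step for this key.
    // Assumption: 0 is stored instead of null, because ConcurrentHashMap
    // does not permit null values.
    void zeroIfNegative(String key) {
        map.computeIfPresent(key, (k, v) -> v < 0 ? 0 : v);
    }
}
```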
Not pretty, but you could use a lightweight object to hold a reference to the actual value to avoid second lookups.
HashMap<String, String[]> map = ...;
// append value to the current value of key
String key = "key";
String value = "value";
// I use an array to hold a reference - even uglier than the whole idea itself ;)
String[] ref = new String[1]; // lightweight object
String[] prev = map.put(key, ref);
ref[0] = (prev != null) ? prev[0] + value : value;
I wouldn't worry about hash lookup performance too much though (Steve B's answer is pretty good in pointing out why). Especially with String keys, I wouldn't worry too much about hashCode() as its result is cached. You could worry about equals() though as it might be called more than once per lookup. But for short strings (which are often used as keys) this is negligible too.
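On Java 8 and later, the append case can also be written with Map.merge, which avoids the holder array. A small sketch with a hypothetical helper class; as far as I know, java.util.HashMap supplies its own merge that locates the entry once, whereas the default Map.merge is specified in terms of a get followed by a put.

```java
import java.util.Map;

final class SingleLookupAppend {
    // Appends suffix to the value stored under key, or stores suffix
    // as the value if the key is absent.
    static void append(Map<String, String> map, String key, String suffix) {
        map.merge(key, suffix, String::concat);
    }
}
```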
There is no performance gain from this proposal, because the performance of Map in the average case is O(1). But enabling access to the raw Entry would raise another problem: it would be possible to change the key in the entry (even if only via reflection) and thereby break the order of the internal array.

What's the quickest way to remove an element from a Map by value in Java?

Currently I'm using:
DomainObj valueToRemove = new DomainObj();
String removalKey = null;
for (Map.Entry<String, DomainObj> entry : map.entrySet()) {
if (valueToRemove.equals(entry.getValue())) {
removalKey = entry.getKey();
break;
}
}
if (removalKey != null) {
map.remove(removalKey);
}
The correct and fast one-liner would actually be:
while (map.values().remove(valueObject));
Kind of strange that most examples above assume the valueObject to be unique.
Here's the one-line solution:
map.values().remove(valueToRemove);
That's probably faster than defining your own iterator, since the JDK collection code has been significantly optimized.
As others have mentioned, a bimap will have faster value removes, though it requires more memory and takes longer to populate. Also, a bimap only works when the values are unique, which may or may not be the case in your code.
Without using a bidirectional map (commons-collections and google collections have them), you're stuck with iterating the Map.
map.values().removeAll(Collections.singleton(null));
With reference to How to filter "Null" values from HashMap<String, String>?, we can do the following in Java 8:
map.values().removeIf(valueToRemove::equals);
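The difference between the one-liners in this thread is how many mappings they remove: values().remove(...) drops at most one match, while removeIf drops them all. A small sketch to make that concrete; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Objects;

final class RemoveByValue {
    // Removes at most one mapping whose value equals the argument.
    static <K, V> boolean removeFirst(Map<K, V> map, V value) {
        return map.values().remove(value);
    }

    // Removes every mapping whose value equals the argument (Java 8+).
    static <K, V> boolean removeAll(Map<K, V> map, V value) {
        return map.values().removeIf(v -> Objects.equals(v, value));
    }
}
```

Both still scan the values, so each call is O(N) in the size of the map.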
If you don't have a reverse map, I'd go for an iterator.
DomainObj valueToRemove = new DomainObj();
for (
Iterator<Map.Entry<String, DomainObj>> iter = map.entrySet().iterator();
iter.hasNext();
) {
Map.Entry<String, DomainObj> entry = iter.next();
if (valueToRemove.equals(entry.getValue())) {
iter.remove();
break; // if only want to remove first match.
}
}
You could always use the values collection, since any changes made to that collection will result in the change being reflected in the map. So if you were to call Map.values().remove(valueToRemove) that should work - though I'm not sure if you'll see performance better than what you have with that loop. One idea would be to extend or override the map class such that the backing collection then is always sorted by value - that would enable you to do a binary search on the value which may be faster.
Edit: This is essentially the same as Alcon's answer except I don't think his will work since the entrySet is still going to be ordered by key - in which case you can't call .remove() with the value.
This is also assuming that the value is supposed to be unique or that you would want to remove any duplicates from the Map as well.
I would use this:
Map x = new HashMap();
x.put(1, "value1");
x.put(2, "value2");
x.put(3, "value3");
x.put(4, "value4");
x.put(5, "value5");
x.put(6, "value6");
x.values().remove("value4");
Edit: this works because objects are referenced by "pointer", not by value.
If you have no way to figure out the key from the DomainObj, then I don't see how you can improve on that. There's no built in method to get the key from the value, so you have to iterate through the map.
If this is something you're doing all the time, you might maintain two maps (string->DomainObj and DomainObj->Key).
Like most of the other posters have said, it's generally an O(N) operation because you're going to have to look through the whole list of hashtable values regardless. @tackline has the right solution for keeping the memory usage at O(1) (I gave him an up-vote for that).
Your other option is to sacrifice memory space for the sake of speed. If your map is reasonably sized, you could store two maps in parallel.
If you have a Map<String, DomainObj>, then maintain a Map<DomainObj, String> in parallel to it. When you insert or remove on one map, do it on the other as well. Granted, this is uglier because you're wasting space and you'll have to make sure the hashCode method of DomainObj is written properly, but your removal time drops from O(N) to O(1) because you can look up the key/object mapping in constant time in either direction.
Not generally the best solution, but if your number one concern is speed, I think this is probably as fast as you're gonna get.
====================
Addendum: This is essentially what @msaeed suggested, just sans the third-party library.
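A minimal sketch of the two-maps-in-parallel approach, assuming values are unique and have consistent equals and hashCode. The class name TwoWayMap and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class TwoWayMap<K, V> {
    private final Map<K, V> forward = new HashMap<>();
    private final Map<V, K> reverse = new HashMap<>();

    // Assumes values are unique: storing a value that is already present
    // under another key displaces that other key.
    void put(K key, V value) {
        V oldValue = forward.put(key, value);
        if (oldValue != null) {
            reverse.remove(oldValue);
        }
        K oldKey = reverse.put(value, key);
        if (oldKey != null && !oldKey.equals(key)) {
            forward.remove(oldKey);
        }
    }

    V get(K key) {
        return forward.get(key);
    }

    // Expected O(1): one lookup in each map instead of a scan over all values.
    boolean removeValue(V value) {
        K key = reverse.remove(value);
        if (key == null) {
            return false;
        }
        forward.remove(key);
        return true;
    }

    int size() {
        return forward.size();
    }
}
```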
A shorter usage of iterator is to use a values() iterator.
DomainObj valueToRemove = new DomainObj();
for (Iterator<DomainObj> it = map.values().iterator(); it.hasNext();) {
if (valueToRemove.equals(it.next())) {
it.remove();
break;
}
}
We know this situation arises rarely, but when it does, a bidirectional map is extremely helpful. I'd prefer BidiMap from org.apache.commons.collections.
I don't think this will happen only once in the lifetime of your app.
So what I would do is delegate to another object the responsibility of maintaining a reference to the objects added to that map.
So the next time you need to remove it, you use that "reverse map":
class MapHolder {
private Map<String, DomainObj> originalMap;
private Map<DomainObj,String> reverseMap;
public void remove( DomainObj value ) {
if ( reverseMap.containsKey( value ) ) {
originalMap.remove( reverseMap.get( value ) );
reverseMap.remove( value );
}
}
}
This is much, much faster than iterating.
Obviously you need to keep them synchronized. But it should not be that hard if you refactor your code to have one object responsible for the state of the map.
Remember that in OOP we have objects that have state and behavior. If your data is passed around in variables all over the place, you are creating unnecessary dependencies between objects.
Yes, it will take you some time to correct the code, but the time spent correcting it will save you a lot of headaches in the future. Think about it.
