Will Java streams sum values of a ConcurrentHashMap in a consistent manner? - java

I have a ConcurrentHashMap instance that some threads add entries to. The values are integers.
Simultaneously, other threads wish to retrieve the sum of all the values in the map. I would like these threads to see a consistent value; however, it doesn't need to be the latest value.
Is the following code thread safe?
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MyClass {

    private Map<Integer, Integer> values = new ConcurrentHashMap<>();

    public void addValue(Integer key, int value) {
        values.put(key, value);
    }

    public long sumOfValues() {
        return values
            .values()
            .stream()
            .mapToInt(Integer::intValue)
            .sum();
    }
}
Will the sum operation be calculated on a consistent set of values?
When the sum operation is happening, will calls to put() be blocked?
Of course I could synchronize the access myself, and even split the read and write locks to allow concurrent read access with synchronized write access, but I am curious whether that's necessary when using ConcurrentHashMap as the collection implementation.

The documentation says about ConcurrentHashMap's keySet() and entrySet(): The view's iterators and spliterators are weakly consistent.
Weakly consistent is characterized as:
they may proceed concurrently with other operations
they will never throw ConcurrentModificationException
they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect any modifications subsequent to construction.
So...
Is the following code thread safe?
Yes, in the narrow sense of the absence of ConcurrentModificationException and of internal inconsistencies of the map.
Will the sum operation be calculated on a consistent set of values?
On a weakly consistent set of values.
When the sum operation is happening, will calls to put() be blocked?
No

The point of ConcurrentHashMap is that the entries are as independent of one another as possible. There isn't a consistent view of the whole map; indeed, even size() doesn't return a very useful value.
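To make this concrete, here is a small demonstration (a sketch, not from the original thread): the writer keeps the invariant "all 100 values equal the round number" after each round, while the reader sums concurrently; the reader may observe values from different rounds, so its sum need not be a multiple of 100.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentSumDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> values = new ConcurrentHashMap<>();

        // Writer: after each full round, all 100 values equal the round number
        Thread writer = new Thread(() -> {
            for (int round = 1; round <= 1_000; round++) {
                for (int key = 0; key < 100; key++) {
                    values.put(key, round);
                }
            }
        });

        // Reader: sums concurrently; a mixed snapshot shows up as a sum
        // that is not a multiple of 100
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 1_000; i++) {
                long sum = values.values().stream().mapToInt(Integer::intValue).sum();
                if (sum % 100 != 0) {
                    System.out.println("Observed a mixed snapshot: " + sum);
                }
            }
        });

        writer.start();
        reader.start();
        writer.join();
        reader.join();
    }
}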

If you need to query the sum concurrently, one solution is to write a wrapper class which maintains both the map's state and the sum, using a LongAdder to atomically maintain the sum.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

public class MapSum {

    private final ConcurrentMap<Integer, Integer> map = new ConcurrentHashMap<>();
    private final LongAdder sum = new LongAdder();

    public Integer get(Integer k) {
        return map.get(k);
    }

    public Integer put(Integer k, Integer v) {
        Integer[] out = new Integer[1];
        map.compute(k, (_k, old) -> {
            out[0] = old;
            // cast to long to avoid overflow
            sum.add((long) v - (old != null ? old : 0));
            return v;
        });
        return out[0];
    }

    public Integer remove(Integer k) {
        Integer[] out = new Integer[1];
        map.compute(k, (_k, old) -> {
            out[0] = old;
            // cast to long to avoid overflow; -Integer.MIN_VALUE == Integer.MIN_VALUE
            if (old != null) { sum.add(-(long) old); }
            return null;
        });
        return out[0];
    }

    public long sum() {
        return sum.sum();
    }
}
This has the added benefit of querying the sum in O(1) instead of O(n) time. You can add more Map methods if you like, and even implement Map<Integer, Integer> - just be careful to maintain the sum when you change the map's contents in any way.
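For illustration, a quick usage sketch of the wrapper above:

MapSum ms = new MapSum();
ms.put(1, 10);                // sum = 10
ms.put(2, 5);                 // sum = 15
ms.put(1, 7);                 // replaces 10 with 7, sum = 12
ms.remove(2);                 // sum = 7
System.out.println(ms.sum()); // prints 7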

Related

Is it possible to create Stream implementation that counts their elements in a single operation

Q: Is it possible to create Stream implementation that counts their elements in a single operation rather than counting each and every element in the stream?
I came to this thought when I tried to compare two methods on a list:
size()
count()
Stream::count terminal operation counts the number of elements in a Stream. The complexity of the operation is often O(N), meaning that the number of sub-operations is proportional to the number of elements in the Stream.
List::size method has a complexity of O(1), which means that regardless of the number of elements in the List, the size() method will return in constant time.
List<Integer> list = IntStream.range(0, 100).boxed().collect(toList());
System.out.println(list.size());
System.out.println(list.stream().count());
size() took relatively less time than count(), so is there any possible way to create a Stream implementation that counts its elements in a single operation with O(1) complexity?
Edit: the answer is yes.
It is possible to create a Stream implementation that counts its elements in a single O(1) operation rather than counting each and every element in the stream. This can improve performance significantly, especially for streams with many elements.
This is already happening in Java 9 and newer (considering the OpenJDK implementation which is also the base for Oracle’s JDK).
If you want a similar operation, you can use, e.g.
// requires java.util.Spliterator, java.util.function.*, java.util.stream.BaseStream
public static long count(BaseStream<?, ?> s) {
    Spliterator<?> sp = s.spliterator();
    long size = sp.getExactSizeIfKnown();
    if (size >= 0) return size;
    // implement the primitive consumer interfaces too, to avoid boxing where possible
    final class Counter implements Consumer<Object>, IntConsumer, LongConsumer, DoubleConsumer {
        long count;
        public void accept(Object t) { count++; }
        public void accept(int value) { count++; }
        public void accept(long value) { count++; }
        public void accept(double value) { count++; }
    }
    Counter counter = new Counter();
    sp.forEachRemaining(counter);
    return counter.count;
}
You can check that it won’t process all elements with
System.out.println(count(IntStream.range(0, 100).peek(System.out::println)));
System.out.println(count(Stream.of("a", "b", "c").peek(System.out::println)));
whereas inserting a filter operation like
System.out.println(count(Stream.of("a", "b", "c")
.peek(System.out::println).filter(x -> true)));
will make the count unpredictable and require a traversal.
As said above, in JDK 9 or newer, you can simply use
System.out.println(Stream.of("a", "b", "c").peek(System.out::println).count());
and
System.out.println(Stream.of("a", "b", "c")
.peek(System.out::println).filter(x -> true).count());
to see that the traversal does not happen when the count is predictable.

Do I need to synchronize ConcurrentMap when adding key only if needed?

I have a ConcurrentMap<String, SomeObject> object. I want to write a method that would return the SomeObject value if it exists, or create a new SomeObject, put it in the Map, and return it if it doesn't exist.
Ideally, I could use ConcurrentMap's putIfAbsent(key, new SomeObject(key)), but that means that I create a new SomeObject(key) each time, which seems very wasteful.
So I resorted to the following code, but am not sure that it's the best way to handle this:
public SomeObject getSomeObject(String key) {
    SomeObject result = concurrentMap.get(key);
    if (result != null)
        return result;
    synchronized (concurrentMap) {
        result = concurrentMap.get(key);
        if (result == null) {
            result = new SomeObject(key);
            concurrentMap.put(key, result);
        }
        return result;
    }
}
Ideally, I could use ConcurrentMap's putIfAbsent(key, new SomeObject(key)), but that means that I create a new SomeObject(key) each time, which seems very wasteful.
Then use computeIfAbsent:
concurrentMap.computeIfAbsent(key, SomeObject::new);
Using synchronized with a ConcurrentMap doesn't prevent other threads from performing operations on the map in the middle of the synchronized block. ConcurrentMap doesn't promise to use the map's monitor for synchronization, and neither ConcurrentHashMap nor ConcurrentSkipListMap synchronize on the map object.
Note that the ConcurrentMap interface doesn't promise that the value will only be computed once, or that the value won't be computed if the key is already present. ConcurrentHashMap makes these promises, but ConcurrentSkipListMap doesn't.
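Putting it together, the whole method from the question collapses to a one-liner (assuming, as in the question, that SomeObject has a constructor taking the key):

public SomeObject getSomeObject(String key) {
    // For ConcurrentHashMap the mapping function runs atomically and at most
    // once per absent key; it is not invoked at all when the key is present,
    // so no SomeObject is constructed wastefully.
    return concurrentMap.computeIfAbsent(key, SomeObject::new);
}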

How to update a LinkedList value stored in cache using Guava's LoadingCache

I am trying to utilize LoadingCache from the Guava library to cache a LinkedList.
LoadingCache<Integer, LinkedList<String>> cache;
I've set up a CacheLoader to handle misses, which is working fine. However, there is another system that needs to submit updates to existing cache entries. Each update needs to be appended to the LinkedList and will arrive at a fairly quick rate (thousands per minute). Finally, it needs to be thread safe.
Here is a naive approach that illustrates the logic but is not thread safe:
public void add(Integer key, String value) throws ExecutionException {
    LinkedList<String> list = cache.get(key);
    list.add(value);
    cache.put(key, list);
}
Any advice on how to make this work? I can look at other libraries but Guava 14 is already a dependency of this codebase and would be very convenient.
The last line in

public void add(Integer key, String value) throws ExecutionException {
    LinkedList<String> list = cache.get(key);
    list.add(value);
    cache.put(key, list);
}

is not needed, as you already modify the object obtained from the cache. Maybe all you need is

public void add(Integer key, String value) throws ExecutionException {
    LinkedList<String> list = cache.get(key);
    synchronized (list) {
        list.add(value);
    }
}
Whether this works depends on eviction. If there's no eviction at all, it will work. If an entry can get evicted before the updating method finishes, you're out of luck.
Nonetheless, there's a simple solution: Using a global lock would work, but obviously inefficiently. So use a list of locks:
private static final int CONCURRENCY_LEVEL = 64; // must be a power of two
private final List<Object> locks = Lists.newArrayList(); // an array would do as well
{
    for (int i = 0; i < CONCURRENCY_LEVEL; ++i) locks.add(new Object());
}

public void add(Integer key, String value) throws ExecutionException {
    synchronized (locks.get(hash(key))) {
        cache.get(key).add(value);
    }
}
where hash, depending on the distribution of your keys, can be as simple as key.intValue() & (CONCURRENCY_LEVEL - 1), or something that further randomizes the distribution.
While my above list of locks should work, Guava has Striped.lock(int), which makes this a bit simpler and also takes care of padding (see "false sharing" for what that's good for).
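A sketch of the same add method using Striped (assuming Guava 13+, where Striped was introduced, and the LoadingCache from the question):

import com.google.common.util.concurrent.Striped;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.locks.Lock;

private final Striped<Lock> stripes = Striped.lock(64);

public void add(Integer key, String value) throws ExecutionException {
    Lock lock = stripes.get(key); // equal keys always map to the same stripe
    lock.lock();
    try {
        cache.get(key).add(value);
    } finally {
        lock.unlock();
    }
}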
Most probably you should not use LinkedList as it's nearly always slower than ArrayList.

Reset all values in hashmap without iterating?

I am trying to reset all values in a HashMap to some default value if a condition fails.
Currently I am doing this by iterating over all the keys and resetting the values individually. Is there any way to set the same value for all the keys without iterating?
Something like:
hm.putAll("some val") //hm is hashmap object
You can't avoid iterating, but if you're using Java 8, you can use the replaceAll method, which will do that for you.
Applies the specified function to each entry in this map, replacing each entry's value with the result of calling the function with the current entry's key and value.
m.replaceAll((k,v) -> yourDefaultValue);
Basically it iterates through each node of the table the map holds and assigns the function's return value to each value.
@Override
public void replaceAll(BiFunction<? super K, ? super V, ? extends V> function) {
    Node<K,V>[] tab;
    if (function == null)
        throw new NullPointerException();
    if (size > 0 && (tab = table) != null) {
        int mc = modCount;
        for (int i = 0; i < tab.length; ++i) {
            for (Node<K,V> e = tab[i]; e != null; e = e.next) {
                e.value = function.apply(e.key, e.value); // <-- here
            }
        }
        if (modCount != mc)
            throw new ConcurrentModificationException();
    }
}
Example:
public static void main(String[] args) {
    Map<String, Integer> m = new HashMap<>();
    m.put("1", 1);
    m.put("2", 2);
    System.out.println(m);
    m.replaceAll((k, v) -> null);
    System.out.println(m);
}
Output:
{1=1, 2=2}
{1=null, 2=null}
You can't avoid iterating in some fashion.
You could get the values via Map.values() and iterate over those. You'll bypass the lookup by key, and it's probably the most efficient solution (although I suspect that would generally save you relatively little, and it's perhaps not the most obvious to a casual reader of your code).
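In practice that means writing through a view instead of looking each key up again. Note that values() alone doesn't let you replace values, so a sketch of this approach uses the entry view (hm and the default 0 are illustrative):

for (Map.Entry<String, Integer> e : hm.entrySet()) {
    e.setValue(0); // reset in place, no second hash lookup per key
}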
IMHO you could create your own data structure that extends a Map and write a resetAll() method that assigns the default value. A map lets you walk the structure quickly and set each value, and since a reset doesn't change the keys, the structure is the same before and after; no worries about speed.
Only, be careful with concurrent threads. Maybe you should use ConcurrentHashMap.
public class MyMap<K, V> extends ConcurrentHashMap<K, V> {
    public void resetAll(V value) {
        Iterator<Map.Entry<K, V>> it = this.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, V> pairs = it.next();
            pairs.setValue(value);
        }
    }
}
If you're willing to keep a copy of it (a hashmap with the default values), you can first clear your hashmap and then copy over the default values:

hm.clear();
hm.putAll(defaultMap);
It is not possible to apply an operation to all values in a collection in less than O(n) time, however if your objection is truly with iteration itself, there are some possible alternatives, notably functional programming.
This is made most easy by the Guava library (or natively in Java 8), and their functional programming utilities. Their Maps.transformValues() provides a view of the map, with the provided function applied. This means that the function returns in O(1) time, unlike your iteration, but that the computation is done on the fly whenever you .get() from the returned map. This is obviously a tradeoff - if you only need to .get() certain elements from the transformed map, you save time by avoiding computing unnecessary values. On the other hand, if you know you'll later hit every element at least once, using this behavior means you'll actually waste time. In essence, this approach is O(k) where k is the number of lookups you plan to do. If k is always less than n, then using the transformation approach is optimal.
Read carefully however the caveat at the top of the page; iteration is a simple, easy, and generally ideally efficient way to work with the members of a map. You should only try to optimize past that when absolutely necessary.
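For reference, a minimal sketch of the view approach described above, assuming Guava is on the classpath (hm and the Integer value type are illustrative):

import com.google.common.collect.Maps;
import java.util.Map;

// A lazy view: every get() computes 0 on the fly; nothing is stored or copied
Map<String, Integer> defaults = Maps.transformValues(hm, v -> 0);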
Assuming that your problem is not with doing the iteration yourself, but with the fact that O(n) is going on at some point, I would suggest a couple of alternative approaches. Bear in mind I have no idea what you are using this for, so it might not make any sense to you.
Case A: If your set of keys is known and fixed beforehand, keep a copy (not a reference, an actual clone) somewhere with the values reset to the one you want. Then on that condition you mention, simply switch the references to use the default one.
Case B: If the keys change over time, use the idea from case A, but add new entries with the default value for every new key added (or remove accordingly). Your updates should hardly notice, but you can still switch back to the default in O(1).
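A hedged sketch of Case A (class and field names are illustrative; Case B is the same plus per-key maintenance of the spare copy):

import java.util.HashMap;
import java.util.Map;

class Resettable {
    private final Map<String, Integer> defaults; // fixed keys, default values
    private volatile Map<String, Integer> current;
    private Map<String, Integer> spare; // pre-built clone, ready to swap in

    Resettable(Map<String, Integer> defaults) {
        this.defaults = defaults;
        this.current = new HashMap<>(defaults); // actual clone, not a reference
        this.spare = new HashMap<>(defaults);
    }

    Map<String, Integer> view() { return current; }

    synchronized void reset() {
        current = spare;                 // the O(1) reference switch
        spare = new HashMap<>(defaults); // rebuild off the hot path
    }
}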

Most efficient way to increment a Map value in Java

I hope this question is not considered too basic for this forum, but we'll see. I'm wondering how to refactor some code for better performance that is getting run a bunch of times.
Say I'm creating a word frequency list, using a Map (probably a HashMap), where each key is a String with the word that's being counted and the value is an Integer that's incremented each time a token of the word is found.
In Perl, incrementing such a value would be trivially easy:
$map{$word}++;
But in Java, it's much more complicated. Here's the way I'm currently doing it:
int count = map.containsKey(word) ? map.get(word) : 0;
map.put(word, count + 1);
Which of course relies on the autoboxing feature in the newer Java versions. I wonder if you can suggest a more efficient way of incrementing such a value. Are there even good performance reasons for eschewing the Collections framework and using something else instead?
Update: I've done a test of several of the answers. See below.
Now there is a shorter way with Java 8 using Map::merge.
myMap.merge(key, 1, Integer::sum)
or
myMap.merge(key, 1L, Long::sum)
for longs respectively.
What it does:
if the key does not exist, put 1 as the value
otherwise add 1 to the value linked to the key
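For instance, the word-frequency scenario from this question reduces to the following (words is an illustrative source of tokens):

Map<String, Integer> counts = new HashMap<>();
for (String word : words) {
    counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the current count
}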
Some test results
I've gotten a lot of good answers to this question--thanks folks--so I decided to run some tests and figure out which method is actually fastest. The five methods I tested are these:
the "ContainsKey" method that I presented in the question
the "TestForNull" method suggested by Aleksandar Dimitrov
the "AtomicLong" method suggested by Hank Gay
the "Trove" method suggested by jrudolph
the "MutableInt" method suggested by phax.myopenid.com
Method
Here's what I did...
created five classes that were identical except for the differences shown below. Each class had to perform an operation typical of the scenario I presented: opening a 10MB file and reading it in, then performing a frequency count of all the word tokens in the file. Since this took an average of only 3 seconds, I had it perform the frequency count (not the I/O) 10 times.
timed the loop of 10 iterations but not the I/O operation and recorded the total time taken (in clock seconds) essentially using Ian Darwin's method in the Java Cookbook.
performed all five tests in series, and then did this another three times.
averaged the four results for each method.
Results
I'll present the results first and the code below for those who are interested.
The ContainsKey method was, as expected, the slowest, so I'll give the speed of each method in comparison to the speed of that method.
ContainsKey: 30.654 seconds (baseline)
AtomicLong: 29.780 seconds (1.03 times as fast)
TestForNull: 28.804 seconds (1.06 times as fast)
Trove: 26.313 seconds (1.16 times as fast)
MutableInt: 25.747 seconds (1.19 times as fast)
Conclusions
It would appear that only the MutableInt method and the Trove method are significantly faster, in that only they give a performance boost of more than 10%. However, if threading is an issue, AtomicLong might be more attractive than the others (I'm not really sure). I also ran TestForNull with final variables, but the difference was negligible.
Note that I haven't profiled memory usage in the different scenarios. I'd be happy to hear from anybody who has good insights into how the MutableInt and Trove methods would be likely to affect memory usage.
Personally, I find the MutableInt method the most attractive, since it doesn't require loading any third-party classes. So unless I discover problems with it, that's the way I'm most likely to go.
The code
Here is the crucial code from each method.
ContainsKey
import java.util.HashMap;
import java.util.Map;
...
Map<String, Integer> freq = new HashMap<String, Integer>();
...
int count = freq.containsKey(word) ? freq.get(word) : 0;
freq.put(word, count + 1);
TestForNull
import java.util.HashMap;
import java.util.Map;
...
Map<String, Integer> freq = new HashMap<String, Integer>();
...
Integer count = freq.get(word);
if (count == null) {
    freq.put(word, 1);
} else {
    freq.put(word, count + 1);
}
AtomicLong
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;
...
final ConcurrentMap<String, AtomicLong> map =
new ConcurrentHashMap<String, AtomicLong>();
...
map.putIfAbsent(word, new AtomicLong(0));
map.get(word).incrementAndGet();
Trove
import gnu.trove.TObjectIntHashMap;
...
TObjectIntHashMap<String> freq = new TObjectIntHashMap<String>();
...
freq.adjustOrPutValue(word, 1, 1);
MutableInt
import java.util.HashMap;
import java.util.Map;
...
class MutableInt {
    int value = 1; // note that we start at 1 since we're counting
    public void increment() { ++value; }
    public int get() { return value; }
}
...
Map<String, MutableInt> freq = new HashMap<String, MutableInt>();
...
MutableInt count = freq.get(word);
if (count == null) {
    freq.put(word, new MutableInt());
} else {
    count.increment();
}
A little research in 2016: https://github.com/leventov/java-word-count, benchmark source code
Best results per method (smaller is better):
time, ms
kolobokeCompile 18.8
koloboke 19.8
trove 20.8
fastutil 22.7
mutableInt 24.3
atomicInteger 25.3
eclipse 26.9
hashMap 28.0
hppc 33.6
hppcRt 36.5
Time/space results: (chart from the linked benchmark omitted)
Map<String, Integer> map = new HashMap<>();
String key = "a random key";
int count = map.getOrDefault(key, 0); // ensure count will be one of 0,1,2,3,...
map.put(key, count + 1);
And that's how you increment a value with simple code.
Benefit:
No need to add a new class or use another concept of mutable int
Not relying on any library
Easy to understand what's going on exactly (Not too much abstraction)
Downside:
The hash map will be searched twice for get() and put(). So it will not be the most performant code.
Theoretically, once you call get(), you already know where to put(), so you should not have to search again. But searching in hash map usually takes a very minimal time that you can kind of ignore this performance issue.
But if you are very serious about the issue, or you are a perfectionist, another way is to use the merge method. This is (probably) more efficient than the previous code snippet, as you will (theoretically) search the map only once. Though this code is not obvious at first sight, it's short and performant:
map.merge(key, 1, (a,b) -> a+b);
Suggestion: most of the time you should care about code readability more than a small performance gain. If the first code snippet is easier for you to understand, use it. But if you can understand the second one just as well, go for it!
As a follow-up to my own comment: Trove looks like the way to go. If, for whatever reason, you wanted to stick with the standard JDK, ConcurrentMap and AtomicLong can make the code a tiny bit nicer, though YMMV.
final ConcurrentMap<String, AtomicLong> map = new ConcurrentHashMap<String, AtomicLong>();
map.putIfAbsent("foo", new AtomicLong(0));
map.get("foo").incrementAndGet();
will leave 1 as the value in the map for foo. Realistically, increased friendliness to threading is all that this approach has to recommend it.
Google Guava is your friend...
...at least in some cases. They have this nice AtomicLongMap. Especially nice because you are dealing with long as value in your map.
E.g.
AtomicLongMap<String> map = AtomicLongMap.create();
[...]
map.getAndIncrement(word);
Also possible to add more than 1 to the value:
map.getAndAdd(word, 112L);
It's always a good idea to look at the Google Collections Library for this kind of thing. In this case a Multiset will do the trick:
Multiset<String> bag = HashMultiset.create();
String word = "foo";
bag.add(word);
bag.add(word);
System.out.println(bag.count(word)); // Prints 2
There are Map-like methods for iterating over keys/entries, etc. Internally the implementation currently uses a HashMap<E, AtomicInteger>, so you will not incur boxing costs.
You should be aware of the fact that your original attempt
int count = map.containsKey(word) ? map.get(word) : 0;
contains two potentially expensive operations on a map, namely containsKey and get. The former performs an operation potentially pretty similar to the latter, so you're doing the same work twice!
If you look at the API for Map, get operations usually return null when the map does not contain the requested element.
Note that this will make a solution like
map.put( key, map.get(key) + 1 );
dangerous, since it might yield NullPointerExceptions. You should check for a null first.
Also note, and this is very important, that HashMaps can contain null values by definition. So not every returned null says "there is no such element". In this respect, containsKey behaves differently from get in actually telling you whether there is such an element. Refer to the API for details.
For your case, however, you might not want to distinguish between a stored null and "noSuchElement". If you don't want to permit nulls you might prefer a Hashtable. Using a wrapper library as was already proposed in other answers might be a better solution to manual treatment, depending on the complexity of your application.
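A quick illustration of the get/containsKey distinction with a stored null:

Map<String, Integer> m = new HashMap<>();
m.put("a", null);
System.out.println(m.get("a"));         // null, although the key exists
System.out.println(m.containsKey("a")); // true
System.out.println(m.containsKey("b")); // false: really absent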
To complete the answer (and I forgot to put that in at first, thanks to the edit function!), the best way of doing it natively is to get the value into a final variable, check for null, and put it back in with 1 added. The variable should be final because it's immutable anyway. The compiler might not need this hint, but it's clearer that way.
final Map<Object, Integer> map = generateRandomHashMap();
final Object key = fetchSomeKey();
final Integer i = map.get(key);
if (i != null) {
    map.put(key, i + 1);
} else {
    // do something
}
If you do not want to rely on autoboxing, you should say something like map.put(key, Integer.valueOf(i.intValue() + 1)); instead.
Another way would be creating a mutable integer:
class MutableInt {
    int value = 0;
    public void inc() { ++value; }
    public int get() { return value; }
}
...
Map<String, MutableInt> map = new HashMap<String, MutableInt>();
MutableInt value = map.get(key);
if (value == null) {
    value = new MutableInt();
    map.put(key, value);
} else {
    value.inc();
}
Of course this implies creating an additional object, but the overhead compared to creating an Integer (even with Integer.valueOf) should not be much.
You can make use of the computeIfAbsent method in the Map interface, provided in Java 8.
final Map<String,AtomicLong> map = new ConcurrentHashMap<>();
map.computeIfAbsent("A", k->new AtomicLong(0)).incrementAndGet();
map.computeIfAbsent("B", k->new AtomicLong(0)).incrementAndGet();
map.computeIfAbsent("A", k->new AtomicLong(0)).incrementAndGet(); //[A=2, B=1]
The method computeIfAbsent checks whether the specified key is already associated with a value. If there is no associated value, it attempts to compute one using the given mapping function. In either case, it returns the current (existing or computed) value associated with the specified key, or null if the computed value is null.
On a side note, if you have a situation where multiple threads update a common sum, you can have a look at the LongAdder class. Under high contention, expected throughput of this class is significantly higher than AtomicLong, at the expense of higher space consumption.
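A sketch of the same counting idiom with LongAdder in place of AtomicLong (word is illustrative):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

Map<String, LongAdder> counts = new ConcurrentHashMap<>();
counts.computeIfAbsent(word, k -> new LongAdder()).increment();
long total = counts.get(word).sum(); // sum() reads the accumulated count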
Quite simple, just use the built-in function in Map.java as follows:
map.put(key, map.getOrDefault(key, 0) + 1);
Memory rotation may be an issue here, since every boxing of an int outside the range -128 to 127 causes an object allocation (see Integer.valueOf(int)). Although the garbage collector deals very efficiently with short-lived objects, performance will suffer to some degree.
If you know that the number of increments made will largely outnumber the number of keys (=words in this case), consider using an int holder instead. Phax already presented code for this. Here it is again, with two changes (holder class made static and initial value set to 1):
static class MutableInt {
    int value = 1;
    void inc() { ++value; }
    int get() { return value; }
}
...
Map<String, MutableInt> map = new HashMap<String, MutableInt>();
MutableInt value = map.get(key);
if (value == null) {
    value = new MutableInt();
    map.put(key, value);
} else {
    value.inc();
}
If you need extreme performance, look for a Map implementation which is directly tailored towards primitive value types. jrudolph mentioned GNU Trove.
By the way, a good search term for this subject is "histogram".
I suggest using Java 8's Map::compute.
It considers the case when a key doesn't exist, too.

map.compute(word, (k, v) -> (v == null) ? 1 : v + 1);
Instead of calling containsKey(), it is faster just to call map.get() and check whether the returned value is null:

Integer count = map.get(word);
if (count == null) {
    count = 0;
}
map.put(word, count + 1);
Are you sure that this is a bottleneck? Have you done any performance analysis?
Try using the NetBeans profiler (it's free and built into NetBeans 6.1) to look at hotspots.
Finally, a JVM upgrade (say from 1.5->1.6) is often a cheap performance booster. Even an upgrade in build number can provide good performance boosts. If you are running on Windows and this is a server class application, use -server on the command line to use the Server Hotspot JVM. On Linux and Solaris machines this is autodetected.
There are a couple of approaches:
Use a Bag algorithm like the multisets contained in Google Collections.
Create a mutable container which you can use in the Map:

class My {
    final String word;
    int count;
    My(String word) { this.word = word; }
}

And use put("word", new My("word")); then you can check if it exists and increment when adding.
Avoid rolling your own solution using lists, because if you end up with inner-loop searching and sorting, your performance will stink. The first HashMap solution is actually quite fast, but a proper implementation like that found in Google Collections is probably better.
Counting words using Google Collections, looks something like this:
Multiset<String> s = HashMultiset.create();
s.add("word");
s.add("word");
System.out.println(s.count("word"));
Using the HashMultiset is quite elegant, because a bag algorithm is just what you need when counting words.
A variation on the MutableInt approach that might be even faster, if a bit of a hack, is to use a single-element int array:
Map<String,int[]> map = new HashMap<String,int[]>();
...
int[] value = map.get(key);
if (value == null)
    map.put(key, new int[] { 1 });
else
    ++value[0];
It would be interesting if you could rerun your performance tests with this variation. It might be the fastest.
Edit: The above pattern worked fine for me, but eventually I changed to use Trove's collections to reduce memory size in some very large maps I was creating -- and as a bonus it was also faster.
One really nice feature is that the TObjectIntHashMap class has a single adjustOrPutValue call that, depending on whether there is already a value at that key, will either put an initial value or increment the existing value. This is perfect for incrementing:
TObjectIntHashMap<String> map = new TObjectIntHashMap<String>();
...
map.adjustOrPutValue(key, 1, 1);
Google Collections HashMultiset:
- quite elegant to use
- but consumes CPU and memory

Best would be to have a method like Entry<K,V> getOrPut(K); (elegant, and low cost).

Such a method would compute hash and index only once, and then we could do what we want with the entry (either replace or update the value).

More elegant:
- take a HashSet<Entry>
- extend it so that get(K) puts a new Entry if needed
- Entry could be your own object.

--> (new MyHashSet()).get(k).increment();
"put" need "get" (to ensure no duplicate key).
So directly do a "put",
and if there was a previous value, then do an addition:
Map<String, MutableInt> map = new HashMap<>();
MutableInt newValue = new MutableInt(1); // default = inc
MutableInt oldValue = map.put(key, newValue);
if (oldValue != null) {
    newValue.add(oldValue); // old + inc
}
If count starts at 0, then add 1 (or any other value...):

Map<String, MutableInt> map = new HashMap<>();
MutableInt newValue = new MutableInt(0); // default
MutableInt oldValue = map.put(key, newValue);
if (oldValue != null) {
    newValue.setValue(oldValue.intValue() + 1); // old + inc
}
Notice: this code is not thread safe. Use it to build the map and then read it, not to update it concurrently.
Optimization: in a loop, keep the old value around to become the new value of the next iteration.

Map<String, MutableInt> map = new HashMap<>();
final int defaultValue = 0;
final int inc = 1;
MutableInt oldValue = new MutableInt(defaultValue);
while (true) {
    MutableInt newValue = oldValue;
    oldValue = map.put(key, newValue); // insert or...
    if (oldValue != null) {
        newValue.setValue(oldValue.intValue() + inc); // ...update
        oldValue.setValue(defaultValue); // reuse the displaced instance
    } else {
        oldValue = new MutableInt(defaultValue); // renew
    }
}
The various primitive wrappers, e.g., Integer, are immutable, so there's really not a more concise way to do what you're asking unless you can do it with something like AtomicLong. I can give that a go in a minute and update. BTW, Hashtable is part of the Collections Framework.
I'd use Apache Commons Collections' LazyMap (to initialize values to 0) and Apache Commons Lang's MutableInt as the values in that map.
The biggest cost is having to search the map twice in your method. In mine you have to do it just once. Just get the value (it will be initialized if absent) and increment it.
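A minimal sketch of that idea, assuming Commons Collections 4 and Commons Lang 3 (the older Apache Collections/Lang APIs differ in package names):

import java.util.HashMap;
import java.util.Map;
import org.apache.commons.collections4.map.LazyMap;
import org.apache.commons.lang3.mutable.MutableInt;

// The factory creates a MutableInt(0) on the first get() of an absent key
Map<String, MutableInt> freq = LazyMap.lazyMap(new HashMap<>(), MutableInt::new);

freq.get(word).increment(); // one lookup: initialize if absent, then bump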
The Functional Java library's TreeMap datastructure has an update method in the latest trunk head:
public TreeMap<K, V> update(final K k, final F<V, V> f)
Example usage:
import static fj.data.TreeMap.empty;
import static fj.function.Integers.add;
import static fj.pre.Ord.stringOrd;
import fj.data.TreeMap;

public class TreeMap_Update {
    public static void main(String[] a) {
        TreeMap<String, Integer> map = empty(stringOrd);
        map = map.set("foo", 1);
        map = map.update("foo", add.f(1));
        System.out.println(map.get("foo").some());
    }
}
This program prints "2".
I don't know how efficient it is, but the code below works as well. You need to define a BiFunction at the beginning. Plus, you can do more than just increment with this method.
public static Map<String, Integer> strInt = new HashMap<String, Integer>();

public static void main(String[] args) {
    BiFunction<Integer, Integer, Integer> bi = (x, y) -> {
        if (x == null)
            return y;
        return x + y;
    };
    strInt.put("abc", 0);
    strInt.merge("abc", 1, bi);
    strInt.merge("abc", 1, bi);
    strInt.merge("abc", 1, bi);
    strInt.merge("abcd", 1, bi);
    System.out.println(strInt.get("abc"));
    System.out.println(strInt.get("abcd"));
}
output is
3
1
If you're using Eclipse Collections, you can use a HashBag. It will be the most efficient approach in terms of memory usage and it will also perform well in terms of execution speed.
HashBag is backed by a MutableObjectIntMap which stores primitive ints instead of Counter objects. This reduces memory overhead and improves execution speed.
HashBag provides the API you'd need since it's a Collection that also allows you to query for the number of occurrences of an item.
Here's an example from the Eclipse Collections Kata.
MutableBag<String> bag =
HashBag.newBagWith("one", "two", "two", "three", "three", "three");
Assert.assertEquals(3, bag.occurrencesOf("three"));
bag.add("one");
Assert.assertEquals(2, bag.occurrencesOf("one"));
bag.addOccurrences("one", 4);
Assert.assertEquals(6, bag.occurrencesOf("one"));
Note: I am a committer for Eclipse Collections.
Counting using streams and getOrDefault:
String s = "abcdeff";
s.chars().mapToObj(c -> (char) c)
.forEach(c -> {
int count = countMap.getOrDefault(c, 0) + 1;
countMap.put(c, count);
});
Since a lot of people search Java topics for Groovy answers, here's how you can do it in Groovy:
def map = new HashMap<String, Integer>()
map.put("key1", 3)
map.merge("key1", 1) {a, b -> a + b}
map.merge("key2", 1) {a, b -> a + b}
Hope I'm understanding your question correctly; I'm coming to Java from Python, so I can empathize with your struggle.
if you have
map.put(key, 1)
you would do
map.put(key, map.get(key) + 1)
Hope this helps!
The simple and easy way in Java 8 is the following:
final ConcurrentMap<String, AtomicLong> map = new ConcurrentHashMap<String, AtomicLong>();
map.computeIfAbsent("foo", key -> new AtomicLong(0)).incrementAndGet();
