Specifically, I need a collection which uses one field A for accessing and a different one (field S) for sorting, but a sorted collection which accepts duplicates would be sufficient.
I often come to this point where I need exactly this collection, and TreeMap is not an option as it does not allow duplicates. So now it is time to ask here. There are several workarounds, as pointed out on Stack Overflow here and here, namely:
PriorityQueue: slow update (remove(Object) + add(Object)), and boxing of primitive keys
Fibonacci heap: memory waste (?)
TreeMap<Field_S, List<Value>>: problem for me is the memory overhead of the list, and boxing of primitive keys
sorted list or array: problem is the slow insert and remove -> should I implement a segmented sorted list?
TreeMultimap from guava (docs): external dependency and probably memory inefficient (?)
Anyone with better suggestions? Or should I roll my own sorted data structure (which one?)? Pointers to other sources (in Java, open source, with unit tests and small dependencies) would also be nice.
Update
More details on my use case at the moment (I've had similar needs several times recently). I have a collection (with millions) of references where I want to be able
to poll or get the smallest element regarding field S
and update field S with the help of field A
identical values of field S can happen. field A is actually an integer pointing into another array
the only dependency I want is trove4j. I could use a different one, like the Mahout collections, if that were required. But not Guava: although it is a nice lib, its collections are not tuned to be memory efficient (boxing/unboxing).
So everything cries out for a Fibonacci heap, but I fear it has too much overhead per element -> that was the reason I thought about a more memory-efficient "sorted+segmented array" solution.
When you need a sorted collection, you should analyze your needs carefully.
If the majority of operations is inserting and only a few are searching, then using a sorted collection, i.e. keeping the elements constantly sorted, would not be a good option (due to the overhead of keeping the elements sorted on insert, which would be the most common operation).
In that case it is best to keep an unsorted collection and sort only when needed, i.e. before the search. You could even use a simple List and sort it (with Collections.sort, i.e. mergesort) when needed. But I recommend this with caution, as for this to be efficient the assumption is that you work on large data. On really small data even linear search is good enough.
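For example, a minimal sketch of the sort-when-needed approach (the values are illustrative):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

List<Integer> values = new ArrayList<>();
values.add(42);
values.add(7);
values.add(19);                                    // cheap unsorted inserts
Collections.sort(values);                          // sort once, before searching
int idx = Collections.binarySearch(values, 19);    // then O(log n) lookups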
If the majority of operations is searching, then you could use a sorted collection; from my point of view there are several data structures to choose from (some you already mention), and you could benchmark to see which one fits your needs.
What about Guava's TreeMultiset? It is what you asked for: a sorted collection which accepts duplicates. I don't know anything about its performance though.
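A minimal usage sketch (I haven't measured its performance either):

import com.google.common.collect.TreeMultiset;

TreeMultiset<Integer> ms = TreeMultiset.create();
ms.add(4);
ms.add(4);                                    // duplicates allowed: the count of 4 becomes 2
ms.add(1);
int smallest = ms.firstEntry().getElement();  // 1
ms.pollFirstEntry();                          // removes one occurrence of the smallest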
I decided to roll my own, though not an optimal solution, just a TreeMap variant. I'll keep this updated if I fine-tune this collection regarding memory. Speed is already a lot better than the previous PriorityQueue attempt, as I needed the collection.remove(Object) method (for updating an entry):
package com.graphhopper.coll;

import gnu.trove.iterator.TIntIterator;
import gnu.trove.set.hash.TIntHashSet;
import java.util.Map.Entry;
import java.util.TreeMap;

/**
 * A priority queue implemented by a TreeMap to allow fast key updates. Or should we use a
 * standard b-tree?
 */
public class MySortedCollection {

    private int size;
    private int slidingMeanValue = 20;
    private TreeMap<Integer, TIntHashSet> map;

    public MySortedCollection(int size) {
        // the size hint is currently unused
        map = new TreeMap<Integer, TIntHashSet>();
    }

    void remove(int key, int value) {
        TIntHashSet set = map.get(value);
        if (set == null || !set.remove(key))
            throw new IllegalStateException("cannot remove key " + key + " with value " + value
                    + " - did you insert " + key + "," + value + " before?");
        size--;
        if (set.isEmpty())
            map.remove(value);
    }

    public void update(int key, int oldValue, int value) {
        remove(key, oldValue);
        insert(key, value);
    }

    public void insert(int key, int value) {
        TIntHashSet set = map.get(value);
        if (set == null)
            map.put(value, set = new TIntHashSet(slidingMeanValue));
        // else
        //     slidingMeanValue = Math.max(5, (slidingMeanValue + set.size()) / 2);
        if (!set.add(key))
            throw new IllegalStateException("use update if you want to update " + key);
        size++;
    }

    public int peekValue() {
        if (size == 0)
            throw new IllegalStateException("collection is already empty!?");
        Entry<Integer, TIntHashSet> e = map.firstEntry();
        if (e.getValue().isEmpty())
            throw new IllegalStateException("internal set is already empty!?");
        return e.getKey();
    }

    public int peekKey() {
        if (size == 0)
            throw new IllegalStateException("collection is already empty!?");
        TIntHashSet set = map.firstEntry().getValue();
        if (set.isEmpty())
            throw new IllegalStateException("internal set is already empty!?");
        return set.iterator().next();
    }

    public int pollKey() {
        size--;
        if (size < 0)
            throw new IllegalStateException("collection is already empty!?");
        Entry<Integer, TIntHashSet> e = map.firstEntry();
        TIntHashSet set = e.getValue();
        if (set.isEmpty())
            throw new IllegalStateException("internal set is already empty!?");
        TIntIterator iter = set.iterator();
        int val = iter.next();
        iter.remove();
        if (set.isEmpty())
            map.remove(e.getKey());
        return val;
    }

    public int size() {
        return size;
    }

    public boolean isEmpty() {
        return size == 0;
    }

    public int getSlidingMeanValue() {
        return slidingMeanValue;
    }

    @Override
    public String toString() {
        return "size " + size + " min=(" + peekKey() + "=>" + peekValue() + ")";
    }
}
You need to decide if you want external dependencies or not. I wouldn't roll my own implementation for something like this.
That said, you've told us almost nothing about what you're using this for, and what you plan to do with it. Without enough data, there's only so much we can tell you -- do you actually need to access the elements in random order? How large do you expect this collection to be? We really don't have enough data to pick out the one right data structure for your needs.
That said, here are some options I would consider.
ArrayList or PriorityQueue, depending on whether or not you actually need to support remove(Object). Do you? Are you sure? (Even if you do need to support remove(Object), I would choose this option if the collection is likely to stay small.)
Not the TreeList you linked to, but instead the Apache Commons Collections TreeList. Despite the name, it doesn't actually maintain sorted order, but what it does is support O(log n) add, remove, and get from anywhere in the list. Using binary search, you could potentially achieve O((log n)^2) time for add, remove, or lookup according to the sorted part of your values.
The TreeList you linked to, or -- if you're like me, and care about the List contract -- a custom Guava ListMultimap, obtained with Multimaps.newListMultimap(new TreeMap<K, Collection<V>>(), new Supplier<List<V>>() { public List<V> get() { return new ArrayList<V>(); }}).
If you also care about primitive boxing, or can't tolerate third-party dependencies, you're going to have no choice but to write up your own data structure. I'd just adapt one of the implementations above to your primitive type, but this is going to be a royal pain.
Finally: I'd really like to hear your use case. Guava doesn't have any support for things like this because we haven't had enough demand, or seen a use case for which a more sophisticated data structure is really appropriate.
I would go with a skip list - more memory efficient than a tree, allows duplicates, provides O(log n) inserts and deletes. You can even implement an indexed skip list, which gives you indexed access, something that's hard to get with a tree.
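Staying in the JDK, here is a hedged sketch of the idea: java.util.concurrent.ConcurrentSkipListSet ordered on (S, A), where the unique field A breaks ties so duplicate values of S are kept. The int[] layout is just an illustration:

import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListSet;

// each element: [0] = sort value S, [1] = unique id A (tie-breaker, so equal S values coexist)
Comparator<int[]> byValueThenId =
        Comparator.<int[]>comparingInt(e -> e[0]).thenComparingInt(e -> e[1]);
ConcurrentSkipListSet<int[]> set = new ConcurrentSkipListSet<>(byValueThenId);

set.add(new int[]{5, 100});
set.add(new int[]{5, 101});   // same S, different A: both are stored
int[] min = set.pollFirst();  // polls the smallest element regarding S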
I have good experience with TreeMultimap https://guava.dev/releases/19.0/api/docs/com/google/common/collect/TreeMultimap.html
I'm working with a large ArrayList<HashMap<A,B>>, and I would repeatedly need to select a random key from a random HashMap (and do some stuff with it). Selecting the random HashMap is trivial, but how should I select a random key from within this HashMap?
Speed is important (as I need to do this 10000 times and the hashmaps are large), so just selecting a random number k in [0,9999] and then doing .next() on the iterator k times, is really not an option. Similarly, converting the HashMap to an array or ArrayList on every random pick is really not an option. Please, read this before replying.
Technically I feel that this should be possible, since the HashMap stores its keys in an Entry[] internally, and selecting at random from an array is easy, but I can't figure out how to access this Entry[]. So any ideas to access the internal Entry[] are more than welcome. Other solutions (as long as they don't consume linear time in the hashmap size) are also welcome of course.
Note: heuristics are fine, so if there's a method that excludes 1% of the elements (e.g. because of multi-filled buckets) that's no problem at all.
Off the top of my head:

List<A> keysAsArray = new ArrayList<A>(map.keySet());
Random r = new Random();

then just

map.get(keysAsArray.get(r.nextInt(keysAsArray.size())));
I managed to find a solution without performance loss. I will post it here since it may help other people -- and potentially answer several open questions on this topic (I'll search for these later).
What you need is a second custom Set-like data structure to store the keys -- not a list, as some suggested here. List-like data structures are too expensive to remove items from. The operations needed are adding/removing elements in constant time (to keep it up to date with the HashMap) and a procedure to select a random element. The following class MySet does exactly this:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Random;

class MySet<A> {
    ArrayList<A> contents = new ArrayList<A>();
    HashMap<A, Integer> indices = new HashMap<A, Integer>();
    Random R = new Random();

    // selects a random element in constant time
    A randomKey() {
        return contents.get(R.nextInt(contents.size()));
    }

    // adds a new element in constant time
    void add(A a) {
        indices.put(a, contents.size());
        contents.add(a);
    }

    // removes an element in constant time: move the last element into the removed slot
    void remove(A a) {
        int index = indices.get(a);
        contents.set(index, contents.get(contents.size() - 1));
        indices.put(contents.get(index), index);
        contents.remove((int) (contents.size() - 1));
        indices.remove(a);
    }
}
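A usage sketch, assuming MySet is kept in sync with every put/remove on the HashMap:

MySet<String> keys = new MySet<String>();
keys.add("alpha");             // mirror each map.put(...)
keys.add("beta");
String k = keys.randomKey();   // uniform random pick in O(1)
keys.remove("alpha");          // mirror each map.remove(...)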
You need access to the underlying entry table.

// defined statically; getDeclaredField throws a checked NoSuchFieldException
Field table = HashMap.class.getDeclaredField("table");
table.setAccessible(true);
Random rand = new Random();

public Entry randomEntry(HashMap map) throws IllegalAccessException {
    Entry[] entries = (Entry[]) table.get(map);
    int start = rand.nextInt(entries.length);
    for (int i = 0; i < entries.length; i++) {
        int idx = (start + i) % entries.length;
        Entry entry = entries[idx];
        if (entry != null)
            return entry;
    }
    return null;
}
This still has to traverse the entries to find one which is there, so the worst case is O(n), but the typical behaviour is O(1).
Sounds like you should consider either an ancillary List of keys or a real object, not a Map, to store in your list.
As @Alberto Di Gioacchino pointed out, there is a bug in the accepted solution with the removal operation. This is how I fixed it.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Random;

class MySet<A> {
    ArrayList<A> contents = new ArrayList<A>();
    HashMap<A, Integer> indices = new HashMap<A, Integer>();
    Random R = new Random();

    // selects a random element in constant time
    A randomKey() {
        return contents.get(R.nextInt(contents.size()));
    }

    // adds a new element in constant time
    void add(A item) {
        indices.put(item, contents.size());
        contents.add(item);
    }

    // removes an element in constant time
    void remove(A item) {
        int index = indices.get(item);
        contents.set(index, contents.get(contents.size() - 1));
        indices.put(contents.get(index), index);
        contents.remove(contents.size() - 1);
        indices.remove(item);
    }
}
I'm assuming you are using HashMap as you need to look something up at a later date?
If not the case, then just change your HashMap to an Array/ArrayList.
If this is the case, why not store your objects in a Map AND an ArrayList so you can look up randomly or by key.
Alternatively, could you use a TreeMap instead of HashMap? I don't know what type your key is, but you could use TreeMap.floorKey() in conjunction with some key randomizer.
After spending some time, I came to the conclusion that you need to create a model which can be backed by a List<Map<A, B>> and a List<A> to maintain your keys. Keep the List<Map<A, B>> and List<A> private and only expose operations/methods to the caller. In this way, you will have full control over the implementation, and the actual objects will be safer from external changes.
Btw, your questions lead me to,
Why does the java.util.Set<V> interface not provide a get(Object o) method?, and
Bimap: I was trying to be clever but, of course, its values() method also returns Set.
This example, IndexedSet, may give you an idea of how to do it.
[edited]
This class, SetUniqueList, might help you if you decide to create your own model. It explicitly states that it wraps the list, not copies it. So, I think, we can do something like,

List<A> list = new ArrayList(map.keySet());
SetUniqueList unikList = new SetUniqueList(list, map.keySet());
// Now unikList should reflect all the changes to the map keys
...
// Then you can do
unikList.get(i);
Note: I didn't try this myself. Will do that later (rushing to home).
Since Java 8, there is an O(log(N)) approach with O(log(N)) additional memory: create a Spliterator via map.entrySet().spliterator(), make log(map.size()) trySplit() calls and choose either the first or the second half randomly. When there are say less than 10 elements left in a Spliterator, dump them into a list and make a random pick.
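A sketch of that approach (the helper is illustrative, and the pick is only approximately uniform when trySplit() yields unequal halves, which the question explicitly tolerates):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Spliterator;

static <K, V> Map.Entry<K, V> randomEntry(Map<K, V> map, Random rnd) {
    Spliterator<Map.Entry<K, V>> spl = map.entrySet().spliterator();
    while (spl.estimateSize() > 10) {
        Spliterator<Map.Entry<K, V>> prefix = spl.trySplit();
        if (prefix == null)
            break;                        // cannot split any further
        if (rnd.nextBoolean())
            spl = prefix;                 // descend into one half at random
    }
    List<Map.Entry<K, V>> rest = new ArrayList<>();
    spl.forEachRemaining(rest::add);      // at most ~10 elements remain
    return rest.isEmpty() ? null : rest.get(rnd.nextInt(rest.size()));
}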
If you absolutely need to access the Entry array in HashMap, you can use reflection. But then your program will be dependent on that concrete implementation of HashMap.
As proposed, you can keep a separate list of keys for each map. You would not keep deep copies of the keys, so the actual memory denormalisation wouldn't be that big.
Third approach is to implement your own Map implementation, the one that keeps keys in a list instead of a set.
How about wrapping HashMap in another implementation of Map? The other map maintains a List, and on put() it does:
if (inner.put(key, value) == null) listOfKeys.add(key);
(I assume that nulls for values aren't permitted; if they are, use containsKey, but that's slower.)
I have a LinkedHashMap (called info) that contains name/age (string/int) pairs. How can I get the position of the key/value if I input the key? For example, if my LinkedHashMap looked like this {bob=12, jeremy=42, carly=21} and I was to search jeremy, it should return 1 as its in position 1. I was hoping I can use something like info.getIndex("jeremy").
HashMap implementations in general are unordered for iteration.
LinkedHashMap is predictably ordered for iteration (insertion order), but it does not expose the List interface, and the LinkedList that mirrors the key set insertion order does not track index positions itself either; finding an index in it is very inefficient. LinkedHashMap doesn't expose a reference to the internal LinkedList either.
The actual "Linked List" behavior is implementation specific. Some implementations may actually use an instance of LinkedList; many just have each Entry track a previous and next Entry and use that as the implementation. Don't assume anything without looking at the source.
The KeySet that contains the keys does not guarantee order either, because of the hashing algorithms used for placement in the backing data structure of the inherited HashMap. So you can't use that.
The only way to do this, without writing your own implementation, is to walk the Iterator which uses the mirroring LinkedList and keep a count of where you are; this will be very inefficient with large data sets.
Solution
What it sounds like you want is original insertion-order index positions. You would have to mirror the keys in the KeySet in something like an ArrayList, keep it in sync with updates to the HashMap, and use it for finding position. Creating a sub-class of HashMap, say IndexedHashMap, adding this ArrayList internally, and adding a .getKeyIndex(K key) that delegates to the internal ArrayList's .indexOf() is probably the best way to go about this.
This is what LinkedHashMap does but with a LinkedList mirroring the KeySet instead of an ArrayList.
int pos = new ArrayList<String>(info.keySet()).indexOf("jeremy");
I saw a suggestion from one of the duplicates of this question at
How get value from LinkedHashMap based on index not on key?
and I liked the suggestion as described as pseudo code by @schippi in the comments. I thought some working Java code might be useful to others on this approach.
import java.util.ArrayList;
import java.util.LinkedHashMap;

public class IndexedLinkedHashMap<K, V> extends LinkedHashMap<K, V> {

    private static final long serialVersionUID = 1L;

    ArrayList<K> al_Index = new ArrayList<K>();

    @Override
    public V put(K key, V val) {
        if (!super.containsKey(key))
            al_Index.add(key);
        V returnValue = super.put(key, val);
        return returnValue;
    }

    public V getValueAtIndex(int i) {
        return (V) super.get(al_Index.get(i));
    }

    public K getKeyAtIndex(int i) {
        return (K) al_Index.get(i);
    }

    public int getIndexOf(K key) {
        return al_Index.indexOf(key);
    }
}
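Usage with the map from the question (note that getIndexOf delegates to ArrayList.indexOf, which is O(n), and that this sketch does not keep the index in sync on remove):

IndexedLinkedHashMap<String, Integer> info = new IndexedLinkedHashMap<>();
info.put("bob", 12);
info.put("jeremy", 42);
info.put("carly", 21);
System.out.println(info.getIndexOf("jeremy")); // 1
System.out.println(info.getKeyAtIndex(1));     // jeremy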
Considering that LinkedHashMap keeps the order of insertion, you can use the keySet() and List.copyOf() (since Java 10) methods like this:
List<String> keys = List.copyOf( yourLinkedHashMap.keySet() );
System.out.println( keys.indexOf("jeremy") ); // prints '1'
LinkedHashMap has "predictable iteration order" (javadoc). Items don't know their location, though, so you'll have to iterate the collection to get it. If you're maintaining a large map you may want to use a different structure for storage.
Edit: clarified iteration
You can use com.google.common.collect.LinkedListMultimap from the Google Guava library. You don't need the multimap behaviour of this class; what you want is that its keys are returned in insertion order, so they can be used to construct a List, on which you can use indexOf() to find the required index position.
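For example (a sketch; the keySet() of LinkedListMultimap iterates keys in the order they were first added):

import com.google.common.collect.LinkedListMultimap;
import java.util.ArrayList;

LinkedListMultimap<String, Integer> mm = LinkedListMultimap.create();
mm.put("bob", 12);
mm.put("jeremy", 42);
mm.put("carly", 21);
int pos = new ArrayList<>(mm.keySet()).indexOf("jeremy"); // 1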
I extract the positions of the keys into a concurrent map like this (for a Map, someListOfComplexObject would be entrySet() and getComplexStringKeyElem() would be getKey()):

final int[] index = {0};
Stream<ComplexObject> t = someListOfComplexObject.stream();
ConcurrentMap<String, List<Integer>> m =
        t.collect(Collectors.groupingBy(
                e -> e.getComplexStringKeyElem(),
                ConcurrentSkipListMap::new,
                Collectors.mapping(
                        e -> index[0]++,
                        Collectors.toList())));
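A self-contained version of the same collector over plain strings (hypothetical data), so the result shape is visible:

import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.stream.Collectors;

List<String> names = List.of("bob", "jeremy", "bob", "carly");
final int[] index = {0};
ConcurrentMap<String, List<Integer>> positions = names.stream()
        .collect(Collectors.groupingBy(
                name -> name,
                ConcurrentSkipListMap::new,
                Collectors.mapping(n -> index[0]++, Collectors.toList())));
// positions -> {bob=[0, 2], carly=[3], jeremy=[1]}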
I hope this question is not considered too basic for this forum, but we'll see. I'm wondering how to refactor some code that gets run a great many times, for better performance.
Say I'm creating a word frequency list, using a Map (probably a HashMap), where each key is a String with the word that's being counted and the value is an Integer that's incremented each time a token of the word is found.
In Perl, incrementing such a value would be trivially easy:
$map{$word}++;
But in Java, it's much more complicated. Here's the way I'm currently doing it:
int count = map.containsKey(word) ? map.get(word) : 0;
map.put(word, count + 1);
Which of course relies on the autoboxing feature in the newer Java versions. I wonder if you can suggest a more efficient way of incrementing such a value. Are there even good performance reasons for eschewing the Collections framework and using something else instead?
Update: I've done a test of several of the answers. See below.
Now there is a shorter way with Java 8 using Map::merge.
myMap.merge(key, 1, Integer::sum)
or
myMap.merge(key, 1L, Long::sum)
for longs respectively.
What it does:
if the key does not exist, put 1 as the value
otherwise add 1 to the value linked to the key
More information here.
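Applied to the word-frequency example from the question (assuming words holds the tokens):

Map<String, Integer> freq = new HashMap<>();
for (String word : words) {
    freq.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing count
}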
Some test results
I've gotten a lot of good answers to this question--thanks folks--so I decided to run some tests and figure out which method is actually fastest. The five methods I tested are these:
the "ContainsKey" method that I presented in the question
the "TestForNull" method suggested by Aleksandar Dimitrov
the "AtomicLong" method suggested by Hank Gay
the "Trove" method suggested by jrudolph
the "MutableInt" method suggested by phax.myopenid.com
Method
Here's what I did...
created five classes that were identical except for the differences shown below. Each class had to perform an operation typical of the scenario I presented: opening a 10MB file and reading it in, then performing a frequency count of all the word tokens in the file. Since this took an average of only 3 seconds, I had it perform the frequency count (not the I/O) 10 times.
timed the loop of 10 iterations but not the I/O operation and recorded the total time taken (in clock seconds) essentially using Ian Darwin's method in the Java Cookbook.
performed all five tests in series, and then did this another three times.
averaged the four results for each method.
Results
I'll present the results first and the code below for those who are interested.
The ContainsKey method was, as expected, the slowest, so I'll give the speed of each method in comparison to the speed of that method.
ContainsKey: 30.654 seconds (baseline)
AtomicLong: 29.780 seconds (1.03 times as fast)
TestForNull: 28.804 seconds (1.06 times as fast)
Trove: 26.313 seconds (1.16 times as fast)
MutableInt: 25.747 seconds (1.19 times as fast)
Conclusions
It would appear that only the MutableInt method and the Trove method are significantly faster, in that only they give a performance boost of more than 10%. However, if threading is an issue, AtomicLong might be more attractive than the others (I'm not really sure). I also ran TestForNull with final variables, but the difference was negligible.
Note that I haven't profiled memory usage in the different scenarios. I'd be happy to hear from anybody who has good insights into how the MutableInt and Trove methods would be likely to affect memory usage.
Personally, I find the MutableInt method the most attractive, since it doesn't require loading any third-party classes. So unless I discover problems with it, that's the way I'm most likely to go.
The code
Here is the crucial code from each method.
ContainsKey
import java.util.HashMap;
import java.util.Map;
...
Map<String, Integer> freq = new HashMap<String, Integer>();
...
int count = freq.containsKey(word) ? freq.get(word) : 0;
freq.put(word, count + 1);
TestForNull
import java.util.HashMap;
import java.util.Map;
...
Map<String, Integer> freq = new HashMap<String, Integer>();
...
Integer count = freq.get(word);
if (count == null) {
    freq.put(word, 1);
} else {
    freq.put(word, count + 1);
}
AtomicLong
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;
...
final ConcurrentMap<String, AtomicLong> map =
new ConcurrentHashMap<String, AtomicLong>();
...
map.putIfAbsent(word, new AtomicLong(0));
map.get(word).incrementAndGet();
Trove
import gnu.trove.TObjectIntHashMap;
...
TObjectIntHashMap<String> freq = new TObjectIntHashMap<String>();
...
freq.adjustOrPutValue(word, 1, 1);
MutableInt
import java.util.HashMap;
import java.util.Map;
...
class MutableInt {
    int value = 1; // note that we start at 1 since we're counting
    public void increment() { ++value; }
    public int get() { return value; }
}
...
Map<String, MutableInt> freq = new HashMap<String, MutableInt>();
...
MutableInt count = freq.get(word);
if (count == null) {
    freq.put(word, new MutableInt());
} else {
    count.increment();
}
A little research in 2016: https://github.com/leventov/java-word-count, benchmark source code
Best results per method (smaller is better):
time, ms
kolobokeCompile 18.8
koloboke 19.8
trove 20.8
fastutil 22.7
mutableInt 24.3
atomicInteger 25.3
eclipse 26.9
hashMap 28.0
hppc 33.6
hppcRt 36.5
Time/space results: see the charts in the linked repository.
Map<String, Integer> map = new HashMap<>();
String key = "a random key";
int count = map.getOrDefault(key, 0); // ensure count will be one of 0,1,2,3,...
map.put(key, count + 1);
And that's how you increment a value with simple code.
Benefit:
No need to add a new class or use another concept of mutable int
Not relying on any library
Easy to understand what's going on exactly (Not too much abstraction)
Downside:
The hash map will be searched twice for get() and put(). So it will not be the most performant code.
Theoretically, once you call get(), you already know where to put(), so you should not have to search again. But searching a hash map usually takes very little time, so you can more or less ignore this performance issue.
But if you are very serious about the issue, or are a perfectionist, another way is to use the merge method. This is (probably) more efficient than the previous code snippet, as you will (theoretically) search the map only once (though this code is not obvious at first sight, it's short and performant):
map.merge(key, 1, (a,b) -> a+b);
Suggestion: most of the time, you should care about code readability more than a small performance gain. If the first code snippet is easier for you to understand, then use it. But if you understand the second one just as well, you can also go for it!
As a follow-up to my own comment: Trove looks like the way to go. If, for whatever reason, you wanted to stick with the standard JDK, ConcurrentMap and AtomicLong can make the code a tiny bit nicer, though YMMV.
final ConcurrentMap<String, AtomicLong> map = new ConcurrentHashMap<String, AtomicLong>();
map.putIfAbsent("foo", new AtomicLong(0));
map.get("foo").incrementAndGet();
will leave 1 as the value in the map for foo. Realistically, increased friendliness to threading is all that this approach has to recommend it.
Google Guava is your friend...
...at least in some cases. They have this nice AtomicLongMap. Especially nice because you are dealing with long as value in your map.
E.g.
AtomicLongMap<String> map = AtomicLongMap.create();
[...]
map.getAndIncrement(word);
It's also possible to add more than 1 to the value:
map.getAndAdd(word, 112L);
It's always a good idea to look at the Google Collections Library for this kind of thing. In this case a Multiset will do the trick:
Multiset<String> bag = Multisets.newHashMultiset();
String word = "foo";
bag.add(word);
bag.add(word);
System.out.println(bag.count(word)); // Prints 2
There are Map-like methods for iterating over keys/entries, etc. Internally the implementation currently uses a HashMap<E, AtomicInteger>, so you will not incur boxing costs.
You should be aware of the fact that your original attempt
int count = map.containsKey(word) ? map.get(word) : 0;
contains two potentially expensive operations on a map, namely containsKey and get. The former performs an operation potentially pretty similar to the latter, so you're doing the same work twice!
If you look at the API for Map, get operations usually return null when the map does not contain the requested element.
Note that this will make a solution like
map.put( key, map.get(key) + 1 );
dangerous, since it might yield NullPointerExceptions. You should check for a null first.
Also note, and this is very important, that HashMaps can contain nulls by definition. So not every returned null says "there is no such element". In this respect, containsKey behaves differently from get in actually telling you whether there is such an element. Refer to the API for details.
For your case, however, you might not want to distinguish between a stored null and "noSuchElement". If you don't want to permit nulls you might prefer a Hashtable. Using a wrapper library, as was already proposed in other answers, might be a better solution than manual treatment, depending on the complexity of your application.
To complete the answer (and I forgot to put that in at first, thanks to the edit function!), the best way of doing it natively is to get the value into a final variable, check for null, and put it back in with 1 added. The variable should be final because it's immutable anyway. The compiler might not need this hint, but it's clearer that way.
final HashMap map = generateRandomHashMap();
final Object key = fetchSomeKey();
final Integer i = (Integer) map.get(key);
if (i != null) {
    map.put(key, i + 1);
} else {
    // do something
}

If you do not want to rely on autoboxing, you should say something like map.put(key, Integer.valueOf(i.intValue() + 1)); instead.
Another way would be creating a mutable integer:
class MutableInt {
    int value = 0;
    public void inc() { ++value; }
    public int get() { return value; }
}

...

Map<String, MutableInt> map = new HashMap<String, MutableInt>();
MutableInt value = map.get(key);
if (value == null) {
    value = new MutableInt();
    map.put(key, value);
} else {
    value.inc();
}
Of course this implies creating an additional object, but the overhead in comparison to creating an Integer (even with Integer.valueOf) should not be much.
You can make use of the computeIfAbsent method in the Map interface provided in Java 8.
final Map<String,AtomicLong> map = new ConcurrentHashMap<>();
map.computeIfAbsent("A", k->new AtomicLong(0)).incrementAndGet();
map.computeIfAbsent("B", k->new AtomicLong(0)).incrementAndGet();
map.computeIfAbsent("A", k->new AtomicLong(0)).incrementAndGet(); //[A=2, B=1]
The method computeIfAbsent checks whether the specified key is already associated with a value. If there is no associated value, it attempts to compute one using the given mapping function. In either case it returns the current (existing or computed) value associated with the specified key, or null if the computed value is null.
On a side note, if you have a situation where multiple threads update a common sum, you can have a look at the LongAdder class. Under high contention, expected throughput of this class is significantly higher than AtomicLong, at the expense of higher space consumption.
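A quick sketch of the LongAdder variant, counting word occurrences as in the question:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

ConcurrentMap<String, LongAdder> counts = new ConcurrentHashMap<>();
counts.computeIfAbsent(word, k -> new LongAdder()).increment(); // thread-safe increment
long total = counts.get(word).sum();                            // read the current count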
Quite simple, just use the built-in function in Map.java as follows:
map.put(key, map.getOrDefault(key, 0) + 1);
Memory rotation may be an issue here, since every boxing of an int outside the range -128..127 causes an object allocation (see Integer.valueOf(int)). Although the garbage collector very efficiently deals with short-lived objects, performance will suffer to some degree.
If you know that the number of increments made will largely outnumber the number of keys (=words in this case), consider using an int holder instead. Phax already presented code for this. Here it is again, with two changes (holder class made static and initial value set to 1):
static class MutableInt {
    int value = 1;
    void inc() { ++value; }
    int get() { return value; }
}

...

Map<String, MutableInt> map = new HashMap<String, MutableInt>();
MutableInt value = map.get(key);
if (value == null) {
    value = new MutableInt();
    map.put(key, value);
} else {
    value.inc();
}
If you need extreme performance, look for a Map implementation which is directly tailored towards primitive value types. jrudolph mentioned GNU Trove.
By the way, a good search term for this subject is "histogram".
I suggest using Java 8's Map::compute().
It considers the case when a key doesn't exist, too.

map.compute(num, (k, v) -> (v == null) ? 1 : v + 1);
Instead of calling containsKey(), it is faster just to call map.get and check whether the returned value is null:

Integer count = map.get(word);
if (count == null) {
    count = 0;
}
map.put(word, count + 1);
Are you sure that this is a bottleneck? Have you done any performance analysis?
Try using the NetBeans profiler (it's free and built into NB 6.1) to look at hotspots.
Finally, a JVM upgrade (say from 1.5->1.6) is often a cheap performance booster. Even an upgrade in build number can provide good performance boosts. If you are running on Windows and this is a server class application, use -server on the command line to use the Server Hotspot JVM. On Linux and Solaris machines this is autodetected.
There are a couple of approaches:
Use a Bag algorithm like the multisets contained in Google Collections.
Create a mutable container which you can use in the Map:

class My {
    String word;
    int count;

    My(String word) {
        this.word = word;
    }
}

And use map.put("word", new My("word")); Then you can check if it exists and increment when adding.
Avoid rolling your own solution using lists, because if you get inner-loop searching and sorting, your performance will stink. The first HashMap solution is actually quite fast, but a proper implementation like that found in Google Collections is probably better.
Counting words using Google Collections looks something like this:

HashMultiset<String> s = new HashMultiset<String>();
s.add("word");
s.add("word");
System.out.println(s.count("word"));

Using the HashMultiset is quite elegant, because a bag algorithm is just what you need when counting words.
A variation on the MutableInt approach that might be even faster, if a bit of a hack, is to use a single-element int array:
Map<String, int[]> map = new HashMap<String, int[]>();
...
int[] value = map.get(key);
if (value == null)
    map.put(key, new int[] { 1 });
else
    ++value[0];
It would be interesting if you could rerun your performance tests with this variation. It might be the fastest.
Edit: The above pattern worked fine for me, but eventually I changed to use Trove's collections to reduce memory size in some very large maps I was creating -- and as a bonus it was also faster.
One really nice feature is that the TObjectIntHashMap class has a single adjustOrPutValue call that, depending on whether there is already a value at that key, will either put an initial value or increment the existing value. This is perfect for incrementing:
TObjectIntHashMap<String> map = new TObjectIntHashMap<String>();
...
map.adjustOrPutValue(key, 1, 1);
Google Collections HashMultiset:
- quite elegant to use
- but consumes CPU and memory

Best would be to have a method like Entry<K,V> getOrPut(K); (elegant, and low cost).

Such a method would compute hash and index only once, and then we could do what we want with the entry (either replace or update the value).

More elegant:
- take a HashSet<Entry>
- extend it so that get(K) puts a new Entry if needed
- Entry could be your own object.

--> (new MyHashSet()).get(k).increment();
"put" need "get" (to ensure no duplicate key).
So directly do a "put",
and if there was a previous value, then do an addition:
Map map = new HashMap ();
MutableInt newValue = new MutableInt (1); // default = inc
MutableInt oldValue = map.put (key, newValue);
if (oldValue != null) {
newValue.add(oldValue); // old + inc
}
If count starts at 0, then add 1 (or any other value...):

Map<String, MutableInt> map = new HashMap<String, MutableInt>();
MutableInt newValue = new MutableInt(0); // default
MutableInt oldValue = map.put(key, newValue);
if (oldValue != null) {
    newValue.setValue(oldValue.intValue() + 1); // old + inc
}
Notice: this code is not thread safe. Use it to build the map, then use it; don't update it concurrently.
Optimization: in a loop, keep the old value around to become the new value of the next loop.
Map<String, MutableInt> map = new HashMap<String, MutableInt>();
final int defaultValue = 0;
final int inc = 1;
MutableInt oldValue = new MutableInt(defaultValue);
while (true) {
    MutableInt newValue = oldValue;
    oldValue = map.put(key, newValue); // insert or...
    if (oldValue != null) {
        newValue.setValue(oldValue.intValue() + inc); // ...update
        oldValue.setValue(defaultValue); // reuse
    } else {
        oldValue = new MutableInt(defaultValue); // renew
    }
}
The various primitive wrappers, e.g. Integer, are immutable, so there's really not a more concise way to do what you're asking unless you can do it with something like AtomicLong. I can give that a go in a minute and update. BTW, Hashtable is a part of the Collections Framework.
I'd use Apache Commons Collections' LazyMap (to initialize values to 0) and MutableInt from Apache Commons Lang as the values in that map.
The biggest cost is having to search the map twice in your method. In mine you have to do it just once: just get the value (it will be initialized if absent) and increment it.
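A sketch of what I mean, assuming commons-collections4 and commons-lang3 on the classpath (the factory signature varies between Commons Collections versions):

import java.util.HashMap;
import java.util.Map;
import org.apache.commons.collections4.map.LazyMap;
import org.apache.commons.lang3.mutable.MutableInt;

Map<String, MutableInt> freq =
        LazyMap.lazyMap(new HashMap<String, MutableInt>(), () -> new MutableInt(0));
freq.get(word).increment(); // the first access creates the 0, which is then incremented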
The Functional Java library's TreeMap datastructure has an update method in the latest trunk head:
public TreeMap<K, V> update(final K k, final F<V, V> f)
Example usage:
import static fj.data.TreeMap.empty;
import static fj.function.Integers.add;
import static fj.pre.Ord.stringOrd;
import fj.data.TreeMap;

public class TreeMap_Update {
    public static void main(String[] a) {
        TreeMap<String, Integer> map = empty(stringOrd);
        map = map.set("foo", 1);
        map = map.update("foo", add.f(1));
        System.out.println(map.get("foo").some());
    }
}
This program prints "2".
I don't know how efficient it is, but the code below works as well. You need to define a BiFunction at the beginning. Plus, you can do more than just increment with this method.
public static Map<String, Integer> strInt = new HashMap<String, Integer>();

public static void main(String[] args) {
    BiFunction<Integer, Integer, Integer> bi = (x, y) -> {
        if (x == null)
            return y;
        return x + y;
    };
    strInt.put("abc", 0);
    strInt.merge("abc", 1, bi);
    strInt.merge("abc", 1, bi);
    strInt.merge("abc", 1, bi);
    strInt.merge("abcd", 1, bi);
    System.out.println(strInt.get("abc"));
    System.out.println(strInt.get("abcd"));
}
The output is
3
1
If you're using Eclipse Collections, you can use a HashBag. It will be the most efficient approach in terms of memory usage and it will also perform well in terms of execution speed.
HashBag is backed by a MutableObjectIntMap which stores primitive ints instead of Counter objects. This reduces memory overhead and improves execution speed.
HashBag provides the API you'd need since it's a Collection that also allows you to query for the number of occurrences of an item.
Here's an example from the Eclipse Collections Kata.
MutableBag<String> bag =
HashBag.newBagWith("one", "two", "two", "three", "three", "three");
Assert.assertEquals(3, bag.occurrencesOf("three"));
bag.add("one");
Assert.assertEquals(2, bag.occurrencesOf("one"));
bag.addOccurrences("one", 4);
Assert.assertEquals(6, bag.occurrencesOf("one"));
Note: I am a committer for Eclipse Collections.
Counting using streams and getOrDefault:

Map<Character, Integer> countMap = new HashMap<>();
String s = "abcdeff";
s.chars().mapToObj(c -> (char) c)
        .forEach(c -> {
            int count = countMap.getOrDefault(c, 0) + 1;
            countMap.put(c, count);
        });
Since a lot of people search Java topics for Groovy answers, here's how you can do it in Groovy:
def map = new HashMap<String, Integer>()
map.put("key1", 3)
map.merge("key1", 1) {a, b -> a + b}
map.merge("key2", 1) {a, b -> a + b}
Hope I'm understanding your question correctly, I'm coming to Java from Python so I can empathize with your struggle.
if you have
map.put(key, 1)
you would do
map.put(key, map.get(key) + 1)
Hope this helps!
The simple and easy way in java 8 is the following:
final ConcurrentMap<String, AtomicLong> map = new ConcurrentHashMap<String, AtomicLong>();
map.computeIfAbsent("foo", key -> new AtomicLong(0)).incrementAndGet();