I need to hold a large number of elements (500k or so) in a list or a set, and I need high-performance traversal, addition and removal. This will be done in a multithreaded environment, and I don't care whether a traversal gets to see updates made after it began (weakly consistent). Which Java collection is right for this scenario?
I need to hold a large number of
elements (500k or so) in a list or a
set I need to do high performance
traversal, addition and removal.
...
This will be done in a multithreaded
environment
ConcurrentSkipListMap - it's not a List, but List semantics are practically useless in a concurrent environment.
It keeps the elements sorted in a tree-like structure rather than accessible via hashing, so you need some natural ordering (or an external one via a Comparator).
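To illustrate the weakly consistent traversal the question asks about, here is a minimal single-threaded sketch using ConcurrentSkipListSet (the Set counterpart of the map above). The class and method names are made up for the example. It removes and adds elements while iterating, which would throw ConcurrentModificationException on a plain TreeSet.

```java
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical demo class, not from the original answer.
class SkipListTraversal {
    // Iterates over 0..999 while modifying the set; returns how many elements were visited.
    static int traverseWhileModifying() {
        ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>();
        for (int i = 0; i < 1000; i++) {
            set.add(i);
        }
        int visited = 0;
        for (Integer value : set) {
            visited++;
            if (value % 2 == 0) {
                set.remove(value);   // removal during traversal is allowed
            }
            set.add(-value - 1);     // sorts before the cursor, so this iterator never reaches it
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println("visited: " + traverseWhileModifying());
    }
}
```

Note that elements added ahead of the cursor may well be seen by the iterator, so a loop that keeps adding larger elements while iterating might never terminate.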
If you need only add/remove at the ends of a Queue - ConcurrentLinkedQueue.
Synchronized collections are not suited to a multithreaded environment if you expect even moderate contention. They also require holding the lock for the entire traversal. I'd advise against ConcurrentHashMap, too.
In the end: if you are going for a real multi-CPU machine (64+ CPUs), expect high contention, and don't want natural ordering, follow this link: http://sourceforge.net/projects/high-scale-lib
Here is a very good article on selecting a collection depending on your application
http://www.developer.com/java/article.php/3829891/Selecting-the-Best-Java-Collection-Class-for-Your-Application.htm
You can try this as well:
http://www.javamex.com/tutorials/collections/how_to_choose.shtml
If traversal == read, and add/remove == update, I'd say that it's not often that a single collection is optimized for both operations.
But your best bet is likely to be a HashMap.
Multithreaded - so look at j.u.concurrent. Maybe ConcurrentHashMap used as a Set - e.g. use put(x, x) instead of add(x).
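A rough sketch of the "map used as a set" idea. The wrapper class and its method names are invented for illustration, and it uses putIfAbsent(x, x) rather than plain put(x, x) so that add can report whether the element was new.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper, only to show the idea of storing each element as its own key.
class MapAsSet {
    private final ConcurrentHashMap<String, String> elements = new ConcurrentHashMap<>();

    // Returns true if x was not already present, like Set.add.
    boolean add(String x) {
        return elements.putIfAbsent(x, x) == null;
    }

    boolean remove(String x) {
        return elements.remove(x) != null;
    }

    boolean contains(String x) {
        return elements.containsKey(x);
    }

    int size() {
        return elements.size();
    }

    public static void main(String[] args) {
        MapAsSet set = new MapAsSet();
        set.add("a");
        set.add("a");
        System.out.println("size: " + set.size());
    }
}
```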
If you do addition and removal often, then something "linked" is probably the best choice. That way, every time you add or remove an element only the neighbouring links have to be updated, in contrast to an ArrayList, for example, where the elements after that position in the backing array have to be moved. The problem is that you are asking for the holy grail of Collections.
Taking a look at the Concurrent Collections might help.
But what do you mean by "traversal"?
If you need to add or remove items in the middle of a list quickly, LinkedList is a good choice. To use it in a multithreaded environment, you need to synchronize it like this:
List l = Collections.synchronizedList(new LinkedList());
On the other hand, given the large size of the data, is it possible to store the data in a database and use an in-memory collection as a cache?
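One caveat worth sketching: the wrapper returned by Collections.synchronizedList makes individual calls thread-safe, but iteration still has to be guarded manually by synchronizing on the wrapper. The class and method names below are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class showing manual locking around iteration.
class SynchronizedListExample {
    static int sum(List<Integer> list) {
        int total = 0;
        // Iteration is a sequence of calls, so hold the wrapper's lock for the whole loop.
        synchronized (list) {
            for (int value : list) {
                total += value;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> list = Collections.synchronizedList(new LinkedList<Integer>());
        list.add(1);   // single calls are synchronized by the wrapper
        list.add(2);
        list.add(3);
        System.out.println("sum: " + sum(list));
    }
}
```

Holding the lock for the whole loop is exactly the traversal cost that makes this approach unattractive under contention with 500k elements.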
Are duplicate items allowed?
If yes, a Set can't be used. Otherwise you can use a SortedSet.
I have a collection of 'effects' I draw on an 'object' in a GUI ( gradients, textures, text etc ). The nature of the underlying system means that this effect collection can be accessed by multiple threads. The majority of operations are reads, so at the moment I'm using a CopyOnWriteArrayList which works ok.
But now I need to sort the collection of effects based on their draw order whenever I add a new effect or change an effect's draw order. I also need to be able to iterate through the collection forwards and in reverse ( iterator.next() & iterator.previous() ).
After some research I've found CopyOnWriteArrayLists don't like being sorted:
Behaviour of CopyOnWriteArrayList
If you tried to sort a CopyOnWriteArrayList you'll see the list throws an UnsupportedOperationException (the sort invokes set on the collection N times). You should only use this list when you are doing upwards of 90% reads.
I also found a suggestion of using ConcurrentSkipListSet, as it handles concurrency & sorting, but looking at the JavaDoc I'm worried about this:
Beware that, unlike in most collections, the size method is not a constant-time operation.
And I use & rely on the size() method quite a bit.
I could implement a system of manual synchronization on the effect collection as a standard ArrayList, but this is a very big refactor & I'd rather exhaust all other possibilities, if anyone has any ideas? Thanks for sticking with me this far.
Probably the best way to go is to manually synchronize at the point where you are sorting your collection. You could do something like (pseudo code) :
synchronize {
convert copyonwritearraylist to normal list.
sort normallist.
convert normal list to copyonwritearraylist and replace the shared instance
}
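In Java, the pseudocode above might look roughly like the following sketch. The class name is made up, and an Integer draw order stands in for the real effect objects, which the question does not show.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical holder; Integer stands in for an effect's draw order.
class SortedEffects {
    private final Object writeLock = new Object();
    // volatile so readers always see the most recently published list
    private volatile List<Integer> effects = new CopyOnWriteArrayList<>();

    void add(int drawOrder) {
        synchronized (writeLock) {
            List<Integer> copy = new ArrayList<>(effects);   // convert to a normal list
            copy.add(drawOrder);
            copy.sort(Comparator.naturalOrder());            // sort the normal list
            effects = new CopyOnWriteArrayList<>(copy);      // replace the shared instance
        }
    }

    // Readers take the current list without locking.
    List<Integer> current() {
        return effects;
    }

    public static void main(String[] args) {
        SortedEffects effects = new SortedEffects();
        effects.add(3);
        effects.add(1);
        effects.add(2);
        System.out.println(effects.current());
    }
}
```

Readers keep calling size() and listIterator() on whatever list they obtained, so both remain cheap and consistent for that snapshot.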
Alternatively you might just use a normal ArrayList and roll your own solution using ReentrantReadWriteLock. This should work OK if you have more reads than writes.
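A minimal sketch of the ReentrantReadWriteLock variant, again with Integer standing in for the effect type and with invented class and method names. Many readers can hold the read lock at once, while a writer gets exclusive access to add and re-sort.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper around a plain ArrayList.
class LockedSortedList {
    private final List<Integer> items = new ArrayList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void add(int value) {
        lock.writeLock().lock();       // exclusive: blocks readers and other writers
        try {
            items.add(value);
            Collections.sort(items);
        } finally {
            lock.writeLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();        // shared: other readers may proceed concurrently
        try {
            return items.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Returns a copy so callers can iterate in either direction without holding the lock.
    List<Integer> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(items);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        LockedSortedList list = new LockedSortedList();
        list.add(2);
        list.add(1);
        System.out.println(list.snapshot() + " size " + list.size());
    }
}
```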
The JDK ships with CopyOnWrite* implementations for Set and List, but none for Map and I've often lamented this fact. I know there are other collections implementations out there that have them, but it would be nice if one shipped as standard. It seems like an obvious omission and I'm wondering if there was a good reason for it. Anyone any idea why this was left out?
I guess this depends on your use case, but why would you need a CopyOnWriteMap when you already have a ConcurrentHashMap?
For a plain lookup table with many readers and only one or few updates it is a good fit.
Compared to a copy on write collection:
Read concurrency:
Equal to a copy on write collection. Several readers can retrieve elements from the map concurrently in a lock-free fashion.
Write concurrency:
Better concurrency than the copy on write collections, which basically serialize updates (one update at a time). With a concurrent hash map you have a good chance of doing several updates concurrently, provided your hash keys are evenly distributed.
If you do want to have the effect of a copy on write map, you can always initialize a ConcurrentHashMap with a concurrency level of 1.
The easiest implementation of a set would usually be to use an underlying map. They even have a Collections.newSetFromMap() method [maybe from 1.6 only].
What they should have done was have a CopyOnWriteMap and the CopyOnWriteSet being equivalent to Collections.newSetFromMap(new CopyOnWriteMap()).
But as you can see, the CopyOnWriteArraySet is actually backed by an array, not a map. And wouldn't Collections.newSetFromMap(new ConcurrentHashMap()) be acceptable for your use case?
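For reference, a small sketch of the newSetFromMap suggestion. The class and factory method names are made up; Collections.newSetFromMap requires an empty map with Boolean values.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical factory class for a concurrent Set view over a ConcurrentHashMap.
class ConcurrentSetExample {
    static Set<String> newConcurrentSet() {
        // The backing map must be empty when it is passed in.
        return Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
    }

    public static void main(String[] args) {
        Set<String> set = newConcurrentSet();
        set.add("a");
        set.add("a");   // duplicate, ignored
        set.add("b");
        System.out.println("size: " + set.size());
    }
}
```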
I'm looking for a good hash map implementation. Specifically, one that's good for creating a large number of maps, most of them small. So memory is an issue. It should be thread-safe (though losing the odd put might be an OK compromise in return for better performance), and fast for both get and put. And I'd also like the moon on a stick, please, with a side-order of justice.
The options I know are:
HashMap. Disastrously un-thread safe.
ConcurrentHashMap. My first choice, but this has a hefty memory footprint - about 2k per instance.
Collections.synchronizedMap(HashMap). That's working OK for me, but I'm sure there must be faster alternatives.
Trove or Colt - I think neither of these are thread-safe, but perhaps the code could be adapted to be thread safe.
Any others? Any advice on what beats what when? Any really good new hash map algorithms that Java could use an implementation of?
Thanks in advance for your input!
Collections.synchronizedMap() simply makes all the Map methods synchronized.
ConcurrentMap is really the interface you want, and there are several implementations (e.g. ConcurrentHashMap, ConcurrentSkipListMap). It has several operations that Map doesn't, which are important for thread-safe code. Plus it is more granular than a synchronized Map, as an operation will only lock a slice of the backing data structure rather than the entire thing.
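As an illustration of those extra operations, here is a sketch of a counter built on putIfAbsent and the three-argument replace. The class and method names are invented. With a synchronized Map the same get-then-put sequence would need an external lock to be atomic.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical counter showing the atomic check-then-act methods of ConcurrentMap.
class AtomicCounter {
    private final ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();

    // Increments the count for key and returns the new value.
    int increment(String key) {
        while (true) {
            Integer current = counts.putIfAbsent(key, 1);   // atomic "insert if missing"
            if (current == null) {
                return 1;
            }
            int next = current + 1;
            // Atomic compare-and-replace; retry if another thread changed the value first.
            if (counts.replace(key, current, next)) {
                return next;
            }
        }
    }

    public static void main(String[] args) {
        AtomicCounter counter = new AtomicCounter();
        counter.increment("a");
        System.out.println("a -> " + counter.increment("a"));
    }
}
```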
I have no experience of the following, but I worked with a project once who swore by Javolution for real time and memory sensitive tasks.
I notice in the API there is FastMap that claims to be thread safe. As I say, I've no idea if it's any good for you, but worth a look:
API for FastMap
Javolution Home
Google Collections' MapMaker seems like it can do the job too.
Very surprising that it has a 2k footprint! How about lowering ConcurrentHashMap's concurrency setting (e.g. to 2-3) and reducing its initial size?
I don't know where that memory consumption is coming from, but maybe it has something to do with maintaining striped locks. If you lower the concurrency setting, it will have less.
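The tuning suggested above goes through the three-argument constructor. A sketch follows; the class name and the concrete values 8, 0.75f and 2 are only illustrative assumptions, not measured recommendations.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical factory for many small concurrent maps.
class SmallConcurrentMaps {
    static <K, V> ConcurrentHashMap<K, V> newSmallMap() {
        // Arguments: initialCapacity, loadFactor, concurrencyLevel (all values assumed here).
        return new ConcurrentHashMap<>(8, 0.75f, 2);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = newSmallMap();
        map.put("a", 1);
        System.out.println(map);
    }
}
```

On Java 8 and later the implementation no longer uses lock segments and treats concurrencyLevel only as a sizing hint, so the per-instance overhead described here mostly applies to older JDKs.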
If you want good performance with out-of-the-box thread safety, ConcurrentHashMap is really nice.
Well, there's a spruced-up Colt in Apache Mahout. It's still not in the current business. What's wrong with protecting the code with a synchronized block? Are you expecting some devilishly complex scheme that hold locks for smaller granularity than put or get?
If you can code one, please contribute it to Mahout.
It's worth taking a look at the persistent hash maps in Clojure.
These are immutable, thread safe data structures with performance comparable to classic Java HashMaps. You'd obviously need to wrap them if you want a mutable map, but that shouldn't be difficult.
http://clojure.org/data_structures
I just saw this data structure in the Java 6 API and I'm curious about when it would be a useful resource. I'm studying for the SCJP exam and I don't see it covered in Kathy Sierra's book, even though I've seen mock exam questions that mention it.
ConcurrentSkipListSet and ConcurrentSkipListMap are useful when you need a sorted container that will be accessed by multiple threads. These are essentially the equivalents of TreeMap and TreeSet for concurrent code.
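A short sketch of the TreeMap-like navigation these classes offer; the class name and sample data are made up for illustration.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical demo of sorted, thread-safe map navigation.
class SortedConcurrentMap {
    static ConcurrentSkipListMap<Integer, String> sample() {
        ConcurrentSkipListMap<Integer, String> map = new ConcurrentSkipListMap<>();
        map.put(30, "c");   // insertion order does not matter; keys are kept sorted
        map.put(10, "a");
        map.put(20, "b");
        return map;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Integer, String> map = sample();
        System.out.println("first key: " + map.firstKey());
        System.out.println("ceiling of 15: " + map.ceilingKey(15));   // least key >= 15
        System.out.println("keys below 20: " + map.headMap(20));      // upper bound exclusive
    }
}
```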
The implementation for JDK 6 is based on High Performance Dynamic Lock-Free Hash Tables and List-Based Sets by Maged Michael at IBM, which shows that you can implement a lot of operations on skip lists atomically using compare and swap (CAS) operations. These are lock-free, so you don't have to worry about the overhead of synchronized (for most operations) when you use these classes.
There's currently no Red-Black tree based concurrent Map/Set implementation in Java. I looked through the literature a bit and found a couple papers that showed concurrent RB trees outperforming skip lists, but a lot of these tests were done with transactional memory, which isn't supported in hardware on any major architectures at the moment.
I'm assuming the JDK guys went with a skip list here because the implementation was well-known and because making it lock-free was simple and portable (using CAS). If anyone cares to clarify, please do. I'm curious.
Skip lists are sorted lists that are efficient to modify, with log(n) performance. In that regard a skip list is like a TreeSet. However, there is no ConcurrentTreeSet. What I heard is that a skip list is very easy to implement, which is probably why.
Anyway, when you need a concurrent, sorted and efficient set, you can use ConcurrentSkipListSet.
These are useful when you need a set that can safely be accessed by multiple threads simultaneously. It also provides decent performance by being weakly consistent -- inserts can be made safely while you're iterating through the Set, but there's no guarantee that your Iterator will see that insert.
ConcurrentSkipListMap was a fantastic find when I needed to implement a replication layer for a home-grown cache. The Map aspects implemented the cache, and the underlying List aspects let me keep track of the order in which objects appeared in the cache. The "skip" aspect of that list made it efficient to remove an object from one spot in the list and bump it to the end when it was replaced in the cache.
java.util.PriorityQueue allows a Comparator to be passed at construction time. When inserting elements, they are ordered according to the priority specified by the comparator.
What happens when the priority of an element changes after it has been inserted? When does the PriorityQueue reorder elements? Is it possible to poll an element that does not actually have minimal priority?
Are there good implementations of a priority queue which allow efficient priority updates?
You should remove the element, change it, and re-insert it, since ordering occurs when it is inserted. Although this involves several steps, it might be efficient enough. (I just noticed the comment about removal being O(n).)
One problem is that it will also re-order when you remove the element, which is redundant if you are just going to re-insert it a moment later. If you implement your own priority queue from scratch, you could have an update() that skips this step, but extending Java's class won't work because you are still limited to the remove() and add() provided by the base.
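The remove, change, re-insert approach could be sketched like this. The Task class, its fields and the helper method names are hypothetical, and remove(Object) does a linear search, so each update costs O(n).

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical example; Task is a stand-in for whatever is queued.
class PriorityUpdate {
    static final class Task {
        final String name;
        int priority;

        Task(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    static void updatePriority(PriorityQueue<Task> queue, Task task, int newPriority) {
        queue.remove(task);           // O(n): linear search, then the heap is repaired
        task.priority = newPriority;  // only mutate while the task is outside the queue
        queue.add(task);              // O(log n): sifted into its new position
    }

    // Builds a small queue, moves "c" to the front, and returns the name at the head.
    static String headAfterUpdate() {
        PriorityQueue<Task> queue =
                new PriorityQueue<>(Comparator.comparingInt((Task t) -> t.priority));
        Task a = new Task("a", 1);
        Task b = new Task("b", 2);
        Task c = new Task("c", 3);
        queue.add(a);
        queue.add(b);
        queue.add(c);
        updatePriority(queue, c, 0);
        return queue.peek().name;
    }

    public static void main(String[] args) {
        System.out.println("head: " + headAfterUpdate());
    }
}
```

Changing task.priority while the task is still in the queue would leave the heap order stale, which is how poll() could end up returning an element that is not minimal.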
I would expect the PriorityQueue to not reorder things - and it could get very confused if it tries to do a binary search to find the right place to put any new entries.
Generally speaking I'd expect changing the priority of something already in a queue to be a bad idea, just like changing the values making up a key in a hash table.