When is a ConcurrentSkipListSet useful? - java

I just saw this data-structure on the Java 6 API and I'm curious about when it would be an useful resource. I'm studying for the scjp exam and I don't see it covered on Kathy Sierra's book, even though I've seen mock exam questions that mention it.

ConcurrentSkipListSet and ConcurrentSkipListMap are useful when you need a sorted container that will be accessed by multiple threads. These are essentially the equivalents of TreeMap and TreeSet for concurrent code.
The implementation for JDK 6 is based on High Performance Dynamic Lock-Free Hash Tables and List-Based Sets by Maged Michael at IBM, which shows that you can implement a lot of operations on skip lists atomically using compare and swap (CAS) operations. These are lock-free, so you don't have to worry about the overhead of synchronized (for most operations) when you use these classes.
There's currently no Red-Black tree based concurrent Map/Set implementation in Java. I looked through the literature a bit and found a couple papers that showed concurrent RB trees outperforming skip lists, but a lot of these tests were done with transactional memory, which isn't supported in hardware on any major architectures at the moment.
I'm assuming the JDK guys went with a skip list here because the implementation was well-known and because making it lock-free was simple and portable (using CAS). If anyone cares to clarify, please do. I'm curious.

skip lists are sorted lists, and efficient to modify with log(n) performance. in that regard it's like TreeSet. however there is no ConcurrentTreeSet. what I heard is that skip list is very easy to implement, that's probably why.
Anyway, when you need a concurrent, sorted and efficient set, you can use ConcurrentSkipListSet

These are useful when you need a set that can safely be accessed by multiple threads simultaneously. It also provides decent performance by being weakly consistent -- inserts can be made safely while you're iterating through the Set, but there's no guarantee that your Iterator will see that insert.

ConcurrentSkipListMap was a fantastic find when I needed to implement a replication layer for a home-grown cache. The Map aspects implemented the cache, and the underlying List aspects let me keep track of the order in which objects appeared in the cache. The "skip" aspect of that list made it efficient to remove an object from one spot in the list and bump it to the end when it was replaced in the cache.

Related

Lightweight collection w/ Set contract

I'm developing in an application requiring lots of objects in memory. One of the largest structures is of the type
Map<String,Set<OwnObject>> (with Set as HashSet)
with OwnObject being a heavyweight object representing records in a database. The application works, but has a rather large memory footprint. Reading this Java Specialists newsletter from 2001, I've analyzed the memory usage of my large structure above. The HashSet uses a HashMap in the back, which in turn is quite a heavyweight object, and I guess this is where most of my additional memory goes.
Trying to optimize the memory usage of the structure, I tried around with multiple versions:
Map<String,List<OwnObject>> (with List as ArrayList)
Map<String,OwnObject[]>
Both work, and both are much more lean than the version using the Set<>. However, I'd like to keep the Set contract in place (uniqueness of entries).
One way would be to implement the logic myself. I could extend ArrayList and ensure the contract in add().
Are there frameworks implementing lightweight collections that honor the Set contract? Or do I miss something from the Java collections that I could use without ensuring uniqueness by myself?
The solution I implemented is the following:
Map<String,OwnObject[]>
Adding and removing to the array was done using Arrays.binarySearch() and 2 slice System.arraysCopy()s, by which sorting and uniqueness happen on the side.

Thread-safe Tree

Is there a thread-safe implementation of a tree in Java? I have found a bit of information that recommends using synchronized() around the add and remove methods, but I interested in seeing if there is anything built into Java.
Edit: I am trying to use an Octree. Just learning as I go, but I am using this project to learn both multi-threading and spatial indexing so there are lots of new topics for me here. If anyone has some particularly good reference material please do share.
From the documentation for TreeMap:
SortedMap m = Collections.synchronizedSortedMap(new TreeMap(...));
Note that this only makes each call synchronized. In many cases this is the wrong granularity for an application and you are better off synchronizing at a higher level. See the docs for synchronizedSortedMap.
You can use Collections.synchronizedSet() or synchronizedMap() to add the synchronization around individual methods, but thread safety isn't really a property of a data structue but of an application. The wrapper will not be sufficient if you iterate over the tree, or do series of operations that need to be atomic.
A java.util.concurrent.ConcurrentSkipListMap might be of interest. This is overkill for most uses, but if you need fine-grained synchronization there's nothing like it. And overkill beats underkill. Of course, it's not a tree, but does the same job. I do not believe you can get low-level synchronization in a real tree.

What is the right Java collection to use in this case?

I need to hold a large number of elements (500k or so) in a list or a set I need to do high performance traversal, addition and removal. This will be done in a multithreaded environment and I don't care if I gets to see the updates done after traversal began (weakly consistent), what Java collection is right for this scenario?
I need to hold a large number of
elements (500k or so) in a list or a
set I need to do high performance
traversal, addition and removal.
...
This will be done in a multithreaded
environment
ConcrrentSkipListMap - it's not a List but List semantics are practically useless in concurrent environment.
It will have the elements sorted in a tree alike structure and not accessible via hashing, so you need some natural ordering (or external via comparator)
If you need only add/remove at the ends of a Queue - ConcurrentLinkedQueue.
Synchronized collections are not suited for multi-threaded environment if you expect even moderate contention. They require full lock holding during the entire traverse operation as well. I'd advise against ConcurrentHashMap, either.
In the end: if you are going for real multi-CPU like 64+ and expect high contention and don't want natural ordering follow the link: http://sourceforge.net/projects/high-scale-lib
Here is a very good article on selecting a collection depending on your application
http://www.developer.com/java/article.php/3829891/Selecting-the-Best-Java-Collection-Class-for-Your-Application.htm
you can try this as well
http://www.javamex.com/tutorials/collections/how_to_choose.shtml
If traversal == read, and add/remove == update, I'd say that it's not often that a single collection is optimized for both operations.
But your best bet is likely to be a HashMap.
Multithreaded - so look at j.u.concurrent. Maybe ConcurrentHashMap used as a Set - e.g. use put(x, x) instead of add(x).
If you do addition and removal often, then something "linked" is probably the best choice. That way everytime you add/remove only an index has to be updated, in contrast to an ArrayList for example where the whole Array has to be "moved". The problem is that you are asking for the holy grail of Collections.
Taking a look at the Concurrent Collections might help.
But what do you mean by "traversal"?
If you need to add or remove items in the middle of a list quickly, LinkedList is a good choice. To use it in multithreaded enviroment, you need to synchronise it like this:
List l = Collections.synchronisedList(new LinkedList());
On other hand, due to large size of data, is it possible to store the data in database? And use memory collection as cache.
are duplicate items allowed?
is yes, Set can't be used. you can use SortedSet otherwise.

Why doesn't Java ship with a CopyOnWriteMap?

The JDK ships with CopyOnWrite* implementations for Set and List, but none for Map and I've often lamented this fact. I know there are other collections implementations out there that have them, but it would be nice if one shipped as standard. It seems like an obvious omission and I'm wondering if there was a good reason for it. Anyone any idea why this was left out?
I guess this depends on your use case, but why would you need a CopyOnWriteMap when you already have a ConcurrentHashMap?
For a plain lookup table with many readers and only one or few updates it is a good fit.
Compared to a copy on write collection:
Read concurrency:
Equal to a copy on write collection. Several readers can retrieve elements from the map concurrently in a lock-free fashion.
Write concurrency:
Better concurrency than the copy on write collections that basically serialize updates (one update at a time). Using a concurrent hash map you have a good chance of doing several updates concurrently. If your hash keys are evenly distributed.
If you do want to have the effect of a copy on write map, you can always initialize a ConcurrentHashMap with a concurrency level of 1.
The easiest implementation of a set would usually be to use an underlying map. They even have a Collections.newSetFromMap() method [maybe from 1.6 only].
What they should have done was have a CopyOnWriteMap and the CopyOnWriteSet being equivalent to Collections.newSetFromMap(new CopyOnWriteMap()).
But as you can see the CopyOnWriteArraySet is actually backed by an array not a map. And wouldn't Collections.newSetFromMap(ConcurrentHashMap()) be acceptable for your usecase?

Java: multi-threaded maps: how do the implementations compare?

I'm looking for a good hash map implementation. Specifically, one that's good for creating a large number of maps, most of them small. So memory is an issue. It should be thread-safe (though losing the odd put might be an OK compromise in return for better performance), and fast for both get and put. And I'd also like the moon on a stick, please, with a side-order of justice.
The options I know are:
HashMap. Disastrously un-thread safe.
ConcurrentHashMap. My first choice, but this has a hefty memory footprint - about 2k per instance.
Collections.sychronizedMap(HashMap). That's working OK for me, but I'm sure there must be faster alternatives.
Trove or Colt - I think neither of these are thread-safe, but perhaps the code could be adapted to be thread safe.
Any others? Any advice on what beats what when? Any really good new hash map algorithms that Java could use an implementation of?
Thanks in advance for your input!
Collections.synchronizedMap() simply makes all the Map methods synchronized.
ConcurrentMap is really the interface you want and there are several implementations (eg ConcurrentHashMap, ConcurrentSkipList). It has several operations that Map doesn't that are important for threadsafe operations. Plus it is more granular than a synchronized Map as an operation will only lock a slice of the backing data structure rather than the entire thing.
I have no experience of the following, but I worked with a project once who swore by Javolution for real time and memory sensitive tasks.
I notice in the API there is FastMap that claims to be thread safe. As I say, I've no idea if it's any good for you, but worth a look:
API for FastMap
Javolution Home
Google Collection's MapMaker seems like it can do the job too.
Very surprising that it has a 2k foot print!! How about making ConcurrentHashMap's concurrency setting lower (e.g. 2-3), and optimizing its initial size (= make smaller).
I don't know where that memory consumption is coming from, but maybe it has something to do with maintaining striped locks. If you lower the concurrency setting, it will have less.
If you want good performance with out-of-the-box thread safety, ConcurrentHashMap is really nice.
Well, there's a spruced-up Colt in Apache Mahout. It's still not in the current business. What's wrong with protecting the code with a synchronized block? Are you expecting some devilishly complex scheme that hold locks for smaller granularity than put or get?
If you can code one, please contribute it to Mahout.
It's worth taking a look at the persistent hash maps in Clojure.
These are immutable, thread safe data structures with performance comparable to classic Java HashMaps. You'd obviously need to wrap them if you want a mutable map, but that shouldn't be difficult.
http://clojure.org/data_structures

Categories