I'm looking for a good hash map implementation. Specifically, one that's good for creating a large number of maps, most of them small. So memory is an issue. It should be thread-safe (though losing the odd put might be an OK compromise in return for better performance), and fast for both get and put. And I'd also like the moon on a stick, please, with a side-order of justice.
The options I know are:
HashMap. Disastrously un-thread safe.
ConcurrentHashMap. My first choice, but this has a hefty memory footprint - about 2k per instance.
Collections.synchronizedMap(HashMap). That's working OK for me, but I'm sure there must be faster alternatives.
Trove or Colt - I think neither of these is thread-safe, but perhaps the code could be adapted to be.
Any others? Any advice on what beats what when? Any really good new hash map algorithms that Java could use an implementation of?
Thanks in advance for your input!
Collections.synchronizedMap() simply makes all the Map methods synchronized.
ConcurrentMap is really the interface you want, and there are several implementations (e.g. ConcurrentHashMap, ConcurrentSkipListMap). It has several operations that Map doesn't that are important for thread-safe code. Plus it is more granular than a synchronized Map, as an operation will only lock a slice of the backing data structure rather than the entire thing.
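A minimal sketch of why those extra operations matter (the map and key names are just for illustration):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class AtomicOps {
        public static void main(String[] args) {
            ConcurrentMap<String, Integer> counts =
                    new ConcurrentHashMap<String, Integer>();

            // With a synchronized Map, check-then-act is two separately
            // locked calls, so another thread can sneak in between them.
            // ConcurrentMap makes the whole thing a single atomic operation:
            if (counts.putIfAbsent("visits", 1) != null) {
                // CAS loop: retry until our replace wins the race.
                Integer current;
                do {
                    current = counts.get("visits");
                } while (!counts.replace("visits", current, current + 1));
            }
            System.out.println(counts.get("visits")); // 1
        }
    }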
I have no experience of the following, but I worked on a project once whose team swore by Javolution for real-time and memory-sensitive tasks.
I notice in the API there is a FastMap that claims to be thread-safe. As I say, I've no idea if it's any good for you, but it's worth a look:
API for FastMap
Javolution Home
Google Collections' MapMaker seems like it can do the job too.
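A sketch of what that looks like, assuming the Google Collections MapMaker builder API (the values 2 and 4 are illustrative, not recommendations):

    import com.google.common.collect.MapMaker;
    import java.util.concurrent.ConcurrentMap;

    // A lower concurrency level and a small initial capacity to shrink
    // the per-instance footprint of each map.
    ConcurrentMap<String, Object> map = new MapMaker()
            .concurrencyLevel(2)
            .initialCapacity(4)
            .makeMap();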
Very surprising that it has a 2k footprint! How about lowering ConcurrentHashMap's concurrency setting (e.g. to 2-3) and shrinking its initial size?
I don't know where that memory consumption is coming from, but maybe it has something to do with maintaining the striped locks. If you lower the concurrency setting, there will be fewer of them.
If you want good performance with out-of-the-box thread safety, ConcurrentHashMap is really nice.
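For example (the sizes here are illustrative):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // initialCapacity 8, loadFactor 0.75, concurrencyLevel 2:
    // fewer lock stripes and a smaller table mean a smaller footprint,
    // at the cost of more contention under heavy concurrent writes.
    ConcurrentMap<String, String> small =
            new ConcurrentHashMap<String, String>(8, 0.75f, 2);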
Well, there's a spruced-up Colt in Apache Mahout. It's still not in the concurrency business, though. What's wrong with protecting the code with a synchronized block? Are you expecting some devilishly complex scheme that holds locks at a smaller granularity than put or get?
If you can code one, please contribute it to Mahout.
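The synchronized-block approach might look like this (a plain HashMap stands in for a Colt/Mahout map here, since the idea is the same):

    import java.util.HashMap;
    import java.util.Map;

    public class SynchronizedWrapper {
        private final Map<Integer, String> map = new HashMap<Integer, String>();
        private final Object lock = new Object();

        public void put(int key, String value) {
            synchronized (lock) { // coarse-grained: one lock for the whole map
                map.put(key, value);
            }
        }

        public String get(int key) {
            synchronized (lock) {
                return map.get(key);
            }
        }
    }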
It's worth taking a look at the persistent hash maps in Clojure.
These are immutable, thread-safe data structures with performance comparable to the classic Java HashMap. You'd obviously need to wrap them if you want a mutable map, but that shouldn't be difficult.
http://clojure.org/data_structures
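A rough sketch of that wrapping from Java, assuming Clojure's runtime classes are on the classpath (the exact entry points may vary between Clojure versions):

    import clojure.lang.IPersistentMap;
    import clojure.lang.PersistentHashMap;
    import java.util.concurrent.atomic.AtomicReference;

    // An immutable map behind an AtomicReference gives a lock-free,
    // thread-safe "mutable" map: writers CAS in a new version, and
    // readers always see a consistent snapshot.
    public class PersistentMapWrapper {
        private final AtomicReference<IPersistentMap> ref =
                new AtomicReference<IPersistentMap>(PersistentHashMap.EMPTY);

        public void put(Object key, Object value) {
            IPersistentMap current, updated;
            do {
                current = ref.get();
                updated = current.assoc(key, value);
            } while (!ref.compareAndSet(current, updated));
        }

        public Object get(Object key) {
            return ref.get().valAt(key);
        }
    }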
Related
Is there a thread-safe implementation of a tree in Java? I have found a bit of information that recommends using synchronized() around the add and remove methods, but I'm interested in seeing if there is anything built into Java.
Edit: I am trying to use an Octree. Just learning as I go, but I am using this project to learn both multi-threading and spatial indexing so there are lots of new topics for me here. If anyone has some particularly good reference material please do share.
From the documentation for TreeMap:
SortedMap m = Collections.synchronizedSortedMap(new TreeMap(...));
Note that this only makes each call synchronized. In many cases this is the wrong granularity for an application and you are better off synchronizing at a higher level. See the docs for synchronizedSortedMap.
You can use Collections.synchronizedSet() or synchronizedMap() to add synchronization around individual methods, but thread safety isn't really a property of a data structure but of an application. The wrapper will not be sufficient if you iterate over the tree or perform a series of operations that need to be atomic.
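For example, iterating over a synchronized wrapper still requires holding the wrapper's monitor for the whole loop, per the javadoc:

    import java.util.Collections;
    import java.util.SortedMap;
    import java.util.TreeMap;

    SortedMap<Integer, String> m =
            Collections.synchronizedSortedMap(new TreeMap<Integer, String>());

    // Each individual call is synchronized, but the loop as a whole is
    // not - without this block another thread could modify m
    // mid-iteration and cause a ConcurrentModificationException.
    synchronized (m) {
        for (Integer key : m.keySet()) {
            System.out.println(key + " -> " + m.get(key));
        }
    }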
A java.util.concurrent.ConcurrentSkipListMap might be of interest. This is overkill for most uses, but if you need fine-grained synchronization there's nothing like it. And overkill beats underkill. Of course, it's not a tree, but it does the same job. I don't believe you can get low-level synchronization in a real tree.
I'm making a game in Java. Every enemy in the game is a thread and they are constantly looping through the game's data structures (I always use the class Vector).
Lately I have been getting ConcurrentModificationException because an element is being added to or removed from a Vector while a thread is looping through it. I know there are strategies to avoid the add/remove problem (I actually use some to avoid problems with removes, but I still have problems with adds).
I heard that Java provides a Vector/List implementation that avoids the ConcurrentModificationException.
Do you have any idea of what this structure might be?
Thanks.
Check out java.util.concurrent, it has what you're looking for.
CopyOnWriteArrayList. But read its javadocs carefully and consider whether it actually gives the behavior you expect (check the memory consistency effects), and whether the performance overhead is worth it. Beyond that, synchronization with ReentrantReadWriteLock, AtomicReference, or Collections.synchronizedList may help you.
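A minimal sketch of the pattern (String stands in for your enemy type):

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    List<String> enemies = new CopyOnWriteArrayList<String>();
    enemies.add("goblin");

    // The iterator works on a snapshot taken when the loop starts, so a
    // concurrent add() never throws ConcurrentModificationException -
    // the new element simply isn't visible until the next iteration.
    for (String enemy : enemies) {
        System.out.println(enemy);
    }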
I have to write a simple vector/matrix library for a small geometry-related project I'm working on. Here's what I'm wondering:
When doing mathematical operations on vectors in a Java environment, is it better practice to return a new instance of a vector or to modify the state of the original?
I've seen it done both ways and would just like to get a majority input.
Certain people say that vectors should be immutable and that static methods should be used to create new ones; others say they should be mutable and that normal methods should be used to modify their state. I've also seen cases where the object is immutable and normal methods are called that return a new vector without changing the state of the original - this seems a little off to me.
I would just like to get a feel for whether there is any best practice here - I imagine it's something that's been done a million times, and I'm really just wondering if there's a standard way to do this.
I noticed the Apache Commons Math library returns a new vector every time.
How important is performance going to be? Is vector arithmetic going to be a large enough component that it affects the performance of the overall system?
If it is not, and there is going to be a lot of concurrency, then immutable vectors will be useful because they reduce concurrency issues.
If there are a lot of mutations on vectors, then the overhead of the new objects that immutable vectors require will become significant, and it may be better to have mutable vectors and handle the concurrency the hard way.
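The trade-off in code, as a minimal sketch (Vec2 and MutableVec2 are illustrative names, not from any particular library):

    final class Vec2 {
        final double x, y;

        Vec2(double x, double y) { this.x = x; this.y = y; }

        // Immutable style: returns a fresh instance; safe to share
        // across threads, but allocates on every operation.
        Vec2 plus(Vec2 other) {
            return new Vec2(x + other.x, y + other.y);
        }
    }

    class MutableVec2 {
        double x, y;

        MutableVec2(double x, double y) { this.x = x; this.y = y; }

        // Mutable style: modifies the receiver in place; no garbage in
        // tight loops, but callers must handle synchronization themselves.
        void add(MutableVec2 other) { x += other.x; y += other.y; }
    }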
It depends. Generally speaking, immutability is better.
First and foremost, it is automatically thread-safe. It is also easier to maintain and test.
That said, sometimes you need speed where creating new instances will take too much time.
(Note: if you're not 100% positive you need that amount of speed, you don't need it. Think high-frequency trading and real-time math-intensive applications. And even then, you should go simple first and optimize later.)
As for static vs normal methods, following good OOP principles, you shouldn't have static methods. To create new vectors/matrices you can use the constructor.
Next, what's your backing structure? Your best bet is probably a single-dimensional array of doubles for vectors and a multi-dimensional array of doubles for matrices. That at least lets you stay relatively fast by using primitives.
If you get to the point where you need even more performance, you can add mutators to your Vector/Matrix that change the backing data. You could even decide that the dimensions are immutable but the contents are mutable, which would give you some other safeties as well.
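A sketch of that last idea - the dimension is fixed at construction while the contents stay mutable (the class name is illustrative):

    final class Vec {
        // The array reference (and thus the dimension) can never change.
        private final double[] data;

        Vec(int dimension) {
            this.data = new double[dimension];
        }

        int dimension() { return data.length; }

        double get(int i) { return data[i]; }

        // Contents are mutable, so hot loops can avoid allocating...
        void set(int i, double value) { data[i] = value; }

        // ...and in-place arithmetic is cheap, while the immutable shape
        // still guards against dimension mismatches.
        void addInPlace(Vec other) {
            if (other.data.length != data.length) {
                throw new IllegalArgumentException("dimension mismatch");
            }
            for (int i = 0; i < data.length; i++) {
                data[i] += other.data[i];
            }
        }
    }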
The JDK ships with CopyOnWrite* implementations for Set and List, but none for Map and I've often lamented this fact. I know there are other collections implementations out there that have them, but it would be nice if one shipped as standard. It seems like an obvious omission and I'm wondering if there was a good reason for it. Anyone any idea why this was left out?
I guess this depends on your use case, but why would you need a CopyOnWriteMap when you already have a ConcurrentHashMap?
For a plain lookup table with many readers and only one or a few writers, it is a good fit.
Compared to a copy-on-write collection:
Read concurrency:
Equal to a copy-on-write collection. Several readers can retrieve elements from the map concurrently in a lock-free fashion.
Write concurrency:
Better than copy-on-write collections, which basically serialize updates (one update at a time). With a concurrent hash map you have a good chance of performing several updates concurrently, provided your hash keys are evenly distributed.
If you do want the effect of a copy-on-write map, you can always initialize a ConcurrentHashMap with a concurrency level of 1.
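That is:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // A single lock stripe serializes writes (like copy-on-write does),
    // while reads remain lock-free and no array copy is made per write.
    ConcurrentMap<String, String> lookup =
            new ConcurrentHashMap<String, String>(16, 0.75f, 1);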
The easiest implementation of a set is usually to use an underlying map. There's even a Collections.newSetFromMap() method (added in Java 6).
What they should have done was provide a CopyOnWriteMap, with CopyOnWriteSet being equivalent to Collections.newSetFromMap(new CopyOnWriteMap()).
But as you can see, CopyOnWriteArraySet is actually backed by an array, not a map. And wouldn't Collections.newSetFromMap(new ConcurrentHashMap()) be acceptable for your use case?
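For example:

    import java.util.Collections;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // A thread-safe Set view over a ConcurrentHashMap - concurrent adds,
    // removes, and contains checks without copy-on-write's write cost.
    Set<String> set =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
    set.add("foo");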
I just saw this data structure in the Java 6 API and I'm curious about when it would be a useful resource. I'm studying for the SCJP exam and I don't see it covered in Kathy Sierra's book, even though I've seen mock exam questions that mention it.
ConcurrentSkipListSet and ConcurrentSkipListMap are useful when you need a sorted container that will be accessed by multiple threads. These are essentially the equivalents of TreeMap and TreeSet for concurrent code.
The implementation for JDK 6 is based on High Performance Dynamic Lock-Free Hash Tables and List-Based Sets by Maged Michael at IBM, which shows that you can implement a lot of operations on skip lists atomically using compare and swap (CAS) operations. These are lock-free, so you don't have to worry about the overhead of synchronized (for most operations) when you use these classes.
There's currently no Red-Black tree based concurrent Map/Set implementation in Java. I looked through the literature a bit and found a couple papers that showed concurrent RB trees outperforming skip lists, but a lot of these tests were done with transactional memory, which isn't supported in hardware on any major architectures at the moment.
I'm assuming the JDK guys went with a skip list here because the implementation was well-known and because making it lock-free was simple and portable (using CAS). If anyone cares to clarify, please do. I'm curious.
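A quick illustration of the sorted, lock-free behaviour:

    import java.util.concurrent.ConcurrentNavigableMap;
    import java.util.concurrent.ConcurrentSkipListMap;

    public class SkipListDemo {
        public static void main(String[] args) {
            ConcurrentNavigableMap<Integer, String> map =
                    new ConcurrentSkipListMap<Integer, String>();
            map.put(3, "three");
            map.put(1, "one");
            map.put(2, "two");

            // Keys come back in sorted order, and range views are cheap,
            // all without any external locking.
            System.out.println(map.firstKey()); // 1
            System.out.println(map.headMap(3)); // {1=one, 2=two}
        }
    }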
Skip lists are sorted lists that are efficient to modify, offering O(log n) performance. In that regard they're like a TreeSet; however, there is no ConcurrentTreeSet. What I've heard is that a skip list is very easy to implement, and that's probably why.
Anyway, when you need a concurrent, sorted, and efficient set, you can use ConcurrentSkipListSet.
These are useful when you need a set that can safely be accessed by multiple threads simultaneously. They also provide decent performance by being weakly consistent - inserts can be made safely while you're iterating through the Set, but there's no guarantee that your Iterator will see that insert.
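For instance, with ConcurrentSkipListSet:

    import java.util.Iterator;
    import java.util.concurrent.ConcurrentSkipListSet;

    ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<Integer>();
    set.add(1);
    set.add(3);

    Iterator<Integer> it = set.iterator();
    set.add(2); // safe during iteration - no ConcurrentModificationException

    while (it.hasNext()) {
        // Weakly consistent: the iterator may or may not see the 2
        // that was added after it was created.
        System.out.println(it.next());
    }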
ConcurrentSkipListMap was a fantastic find when I needed to implement a replication layer for a home-grown cache. The Map aspects implemented the cache, and the underlying List aspects let me keep track of the order in which objects appeared in the cache. The "skip" aspect of that list made it efficient to remove an object from one spot in the list and bump it to the end when it was replaced in the cache.