I understand that ConcurrentHashMap allows only one thread at a time to perform an update/write operation on each segment, while multiple threads may read values from the map at the same time.
For my project, I want to extend this functionality so that while a value is being read from a particular segment, no update/write operation can take place in that segment until the read has completed.
Any ideas to achieve this?
Just to elaborate on the problem I'm facing right now: after reading a value from the map, I perform certain update operations that depend strongly on the value read. If a separate thread updates a key's value and another thread's get() fails to see the most recent value, this leads to a big mess. So in this case, would extending be a good idea?
My gut says no. Extending ConcurrentHashMap does not sound like a good idea.
One of the most valuable design principles to which you can adhere is called "Separation of Concerns." The main "concern" of a HashMap is to store key/value pairs. Sounds like maintaining consistent relationships between certain data in your program is another concern.
Don't try to address both concerns with a single class. I would create a higher-level class to take care of maintaining the consistent relationships (maybe by using Lock objects), and I would use a plain HashMap or ConcurrentHashMap to store the key/value pairs.
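A minimal sketch of that separation, assuming a hypothetical StockLedger class (the class, its methods, and the stock-keeping scenario are illustrative and not from the question). The higher-level class owns the lock and the consistency rule, and the plain HashMap only stores the key/value pairs:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical higher-level class: it enforces the consistency rule,
// while the private HashMap is only used for storage.
final class StockLedger {
    private final Map<String, Integer> stock = new HashMap<>();
    private final Lock lock = new ReentrantLock();

    void restock(String item, int quantity) {
        lock.lock();
        try {
            stock.merge(item, quantity, Integer::sum);
        } finally {
            lock.unlock();
        }
    }

    // The read and the dependent write happen under the same lock,
    // so no other thread can update the entry in between.
    boolean reserve(String item, int quantity) {
        lock.lock();
        try {
            int available = stock.getOrDefault(item, 0);
            if (available < quantity) {
                return false;
            }
            stock.put(item, available - quantity);
            return true;
        } finally {
            lock.unlock();
        }
    }

    int available(String item) {
        lock.lock();
        try {
            return stock.getOrDefault(item, 0);
        } finally {
            lock.unlock();
        }
    }
}
```

Because every access to the map goes through the same lock, a plain HashMap is enough here; a ConcurrentHashMap would add nothing in this particular sketch.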
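Extend the ConcurrentHashMap class and wrap the read in a synchronized block (for example in an overridden get()), so that other threads synchronizing on the same lock are kept out until the read operation has completed. Note that ConcurrentHashMap's own write methods do not take that lock, so they would need the same treatment.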
Informally, you can think of a Map as a set of "variables", where each "variable" is addressed by a key (instead of by the static name of an ordinary variable).
(An array is formally a list of variables, each addressed by an integer index.)
In HashMap, these "variables" are like "plain" variables; if you access a "variable" concurrently, things may go wrong (just as with ordinary non-volatile variables).
In ConcurrentHashMap, these "variables" have volatile semantics. Therefore it is "more" safe to use concurrently. For example, a write will be visible to the "subsequent" read.
Of course, volatile is not enough sometimes; for example, we know we cannot use a volatile int for atomic increments (without locking). We need new devices, like AtomicInteger, for atomic operations.
Fortunately, Java 8 added new atomic methods to ConcurrentHashMap, so that we can now operate on these "variables" atomically. See whether the compute() method fits your use case.
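As a rough illustration of the compute() approach (Java 8 or later), the read and the dependent write can be expressed as a single atomic step on one key. The AccountBalances class and its method names are made up for this sketch:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example class; only ConcurrentHashMap.compute() is the real API here.
final class AccountBalances {
    private final ConcurrentHashMap<String, Integer> balances = new ConcurrentHashMap<>();

    // compute() reads the current value and stores the value derived from it
    // atomically for this key; other updates to the same key wait until it is done.
    int deposit(String account, int amount) {
        return balances.compute(account,
                (key, current) -> current == null ? amount : current + amount);
    }

    Integer balanceOf(String account) {
        return balances.get(account);
    }
}
```

The remapping function should be short and should not touch other keys of the same map, since it runs while the entry is locked.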
I have a data structure of the form:
Map<String,Map<String,List<CustomPOJO>>>
Reads on this data structure will be very frequent; writes will also occur, but far less often.
As far as reads are concerned, I guess there is no issue with using the plain java.util.HashMap API.
For write operations there are two possible approaches:
1. Put all the data in a ConcurrentHashMap and write to it directly.
2. Perform all write operations in a synchronized block/method and use the plain java.util.HashMap API.
Please suggest which approach would be better for writes, and whether there are any pitfalls on the read side.
Firstly, how predictable are the outer Map's key strings? If they are all known at design time, I would turn them into an Enum and use an EnumMap for the outer Map. The same applies to the inner map. In that case your structure becomes
EnumMap<Enum, EnumMap<Enum, List<POJO>>>
and is perfectly resolved.
Secondly, since you are using a map-of-maps structure in an environment where performance matters, I assume the number of keys in the outer map is much smaller than the total number of POJOs in the entire structure. That is to say, the chance that you add a new submap to the structure is very small. In this case a ReadWriteLock works best on the outer Map; for the inner map you could consider either a ReadWriteLock or a ConcurrentHashMap.
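The read/write lock idea can be sketched as follows. This is a simplified version that guards the whole nested structure with a single ReentrantReadWriteLock rather than using separate locking for the inner maps; the NestedRegistry class and its method names are assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper around Map<String, Map<String, List<V>>>.
final class NestedRegistry<V> {
    private final Map<String, Map<String, List<V>>> data = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Many readers may hold the read lock at the same time.
    List<V> get(String outerKey, String innerKey) {
        lock.readLock().lock();
        try {
            Map<String, List<V>> inner = data.get(outerKey);
            List<V> values = (inner == null) ? null : inner.get(innerKey);
            // Return a copy so callers never touch the guarded list outside the lock.
            return (values == null) ? Collections.<V>emptyList() : new ArrayList<>(values);
        } finally {
            lock.readLock().unlock();
        }
    }

    // A writer gets exclusive access; readers wait until it is done.
    void add(String outerKey, String innerKey, V value) {
        lock.writeLock().lock();
        try {
            data.computeIfAbsent(outerKey, k -> new HashMap<>())
                .computeIfAbsent(innerKey, k -> new ArrayList<>())
                .add(value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Copying the list on every read costs allocations; if the POJO lists are large, returning an unmodifiable view and treating the lists as immutable may be a better trade-off.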
There are 3 major design considerations for the ConcurrentHashMap:
1. It generates a lot of temporary objects, so if your application is GC-sensitive you will want to limit its use.
2. By default it allows at most 16 threads to update it concurrently, but this is unlikely to be a concern.
3. Its size() is not a constant-time operation.
I usually apply the ReadWriteLock pattern, or even an atomic-variable-based implementation, mainly when point 1 turns out to be a problem. Otherwise I think ConcurrentHashMap does fine in most circumstances.
Also, note that in the latest JDK implementation the read/write priority of the ReadWriteLock changed. If my memory is correct, it seems to favor reads over writes, so if you have too many reads your writer thread might be starved. In that case you might want your own read/write lock implementation.
I think this article suits your question best; it includes an explanation. Personally I think you should use ConcurrentHashMap, because it allows readers to read without locking the whole map.
http://javarevisited.blogspot.com/2011/04/difference-between-concurrenthashmap.html
I'm new to threading in Java and I need to access a data structure from a few active threads. I've heard that java.util.concurrent.ConcurrentHashMap is threading-friendly. Do I need to use synchronized(map){} while accessing the ConcurrentHashMap, or will it handle the locks itself?
It handles the locks itself, and in fact you have no access to them, so there is no other option.
You can use synchronized in special cases for writes, but it is very rare that you should need to do this, e.g. if you need to implement your own putIfAbsent because the cost of creating an object is high.
Using synchronized for reads would defeat the purpose of using the concurrent collection.
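On Java 8 and later, the "expensive putIfAbsent" case can often be handled without a hand-written synchronized block by using computeIfAbsent(), which runs the factory function at most once per absent key. A small sketch, in which the ExpensiveCache class and its load() method are stand-ins for whatever is costly to create:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical cache; load() stands in for an expensive object creation.
final class ExpensiveCache {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();

    String get(String key) {
        // The mapping function is invoked only if the key is absent.
        return cache.computeIfAbsent(key, this::load);
    }

    int loadCount() {
        return loads.get();
    }

    private String load(String key) {
        loads.incrementAndGet();   // counts how often the "expensive" path runs
        return key.toUpperCase();
    }
}
```

With plain putIfAbsent(key, create()), the value would be created on every call, even when the key is already present.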
ConcurrentHashMap is suited only to cases where you need no more atomicity than it provides out of the box. If, for example, you need to get a value, do something with it, and then set a new value, all as one atomic operation, separate get() and put() calls cannot achieve this without external locking.
If you end up guarding every access with explicit locks in your own code, using this implementation instead of a basic HashMap is just a waste.
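For simple get-then-set updates on a single key, one alternative to an external lock is a retry loop built on putIfAbsent() and the three-argument replace(), both of which ConcurrentHashMap provides. The following is only a sketch with a made-up CasCounter class; it does not cover updates that span several keys, where an explicit lock is still the straightforward choice:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical counter built on optimistic retries instead of a lock.
final class CasCounter {
    private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    int increment(String key) {
        for (;;) {
            Integer old = counts.get(key);
            if (old == null) {
                // Succeeds only if no other thread inserted the key first.
                if (counts.putIfAbsent(key, 1) == null) {
                    return 1;
                }
            } else {
                int next = old + 1;
                // Succeeds only if the value is still the one we read.
                if (counts.replace(key, old, next)) {
                    return next;
                }
            }
            // Another thread won the race: read again and retry.
        }
    }

    int get(String key) {
        Integer value = counts.get(key);
        return value == null ? 0 : value;
    }
}
```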
Short answer: no you don't need to use synchronized(map).
Long answer:
all the operations provided by ConcurrentHashMap are thread safe and you can call them without worrying about locking
however, if you need some operations to be atomic in your code, you will still need some sort of locking at the client side
No, you don't need to. But if you need to depend on the map's synchronization details (for example, locking the map itself to make a compound operation atomic), you should use Collections.synchronizedMap instead. From the javadoc of ConcurrentHashMap:
This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
Actually it won't synchronize on the whole data structure but on subparts (some buckets) of it.
This implies that ConcurrentHashMap's iterators are weakly consistent and the size of the map can be inaccurate. (On the other hand, its put and get operations are still consistent and the throughput is higher.)
There is one more important feature of ConcurrentHashMap to note besides the concurrency it provides: its fail-safe iterators. Some people use ConcurrentHashMap simply because they want to put or remove entries while iterating.
Collections.synchronizedMap(Map) is the other option, but a ConcurrentModificationException may be thrown in that case.
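To illustrate the iterator behaviour, the sketch below removes entries from a ConcurrentHashMap while iterating over its key set. The class and method names are invented for the example; with a HashMap, or a map wrapped by Collections.synchronizedMap, the same loop would typically fail with a ConcurrentModificationException:

```java
import java.util.concurrent.ConcurrentHashMap;

final class WeaklyConsistentIteration {
    // Removes every even key while iterating; the weakly consistent iterator
    // tolerates the modification and keeps traversing the remaining keys.
    static int removeEvenKeys(ConcurrentHashMap<Integer, String> map) {
        int removed = 0;
        for (Integer key : map.keySet()) {
            if (key % 2 == 0 && map.remove(key) != null) {
                removed++;
            }
        }
        return removed;
    }
}
```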
I know that with the volatile keyword in Java we get a kind of weak synchronization (it makes updates visible but does not provide actual locking). Is there any situation where volatile should be preferred over actual locking when implementing concurrent programs? There is a somewhat similar question on SO that discusses volatile as a synchronization mechanism, but it was tagged C#.
If the shared state consists of a single field, and you don't use any read-modify-write construct (like i++, for example) to assign it, then volatile is good enough. Most volatile usages can be replaced by the AtomicXxx types, though (which also provide atomic read-modify-write operations).
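A short sketch of that distinction, using a hypothetical Worker class: the stop flag is a single field that is only ever overwritten, so volatile suffices, while the counter is incremented and therefore uses AtomicInteger:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical worker state shared between threads.
final class Worker {
    // Written with plain assignments only, so volatile visibility is enough.
    private volatile boolean running = true;

    // Incremented (read-modify-write), so a volatile int would lose updates.
    private final AtomicInteger processed = new AtomicInteger();

    void stop() {
        running = false;
    }

    boolean isRunning() {
        return running;
    }

    int recordProcessed() {
        return processed.incrementAndGet();
    }

    int processedCount() {
        return processed.get();
    }
}
```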
In short, you should prefer to avoid locks wherever they are not necessary, since locks expose your program to deadlocks and hurt performance by excluding concurrency from critical parts of the code. So, whenever the situation permits, by all means rely on volatile; if all you additionally need is atomic two-step operations like compare-and-swap, use AtomicReference. Fall back to synchronized only for scenarios where it is the only option. For example, if you need to lazily initialize a heavy object, you'll need locks to prevent double initialization, but again not to fetch the already initialized instance (the double-checked locking idiom).
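The lazy-initialization case mentioned above is usually written as the double-checked locking idiom, which relies on the field being volatile (Java 5 or later). A minimal sketch, with HeavyService as a placeholder name for the expensive object:

```java
// Hypothetical lazily created singleton.
final class HeavyService {
    // volatile is required for double-checked locking to be safe.
    private static volatile HeavyService instance;

    private HeavyService() {
        // expensive setup would go here
    }

    static HeavyService getInstance() {
        HeavyService local = instance;          // first check, no lock
        if (local == null) {
            synchronized (HeavyService.class) {
                local = instance;               // second check, under the lock
                if (local == null) {
                    local = new HeavyService();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

Once the instance exists, callers only pay for a volatile read; the lock is taken only during the initial race.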
Volatile guarantees that all threads will see the last write of a variable by any other thread, that's it. There's no synchronization involved. If you synchronize both read and write method of an instance variable, then you don't have to make that variable volatile (all threads will see the most recent write).
If I need atomic access to an int field inside an object, is it sufficient to declare the field as an AtomicInteger or do I need to use an AtomicIntegerFieldUpdater? (and why?)
Using an AtomicInteger is sufficient. Atomic updaters are for use with volatile fields; the primary use case is data structures which have large numbers of fields that require atomic access; you use the field updater to use those fields with atomic semantics without having an AtomicInteger reference for each field.
For a detailed discussion, see this link.
AtomicInteger and friends should usually be sufficient, and are generally preferable as they do not involve reflection or other such hackery.
AtomicIntegerFieldUpdater can be useful where you have lots of instances in which the same field needs to be updated, as this reduces the total number of objects. It's particularly useful if operations other than straight reading and writing are infrequent. For instance, an AtomicReferenceFieldUpdater is used in java.nio for the attach method, which is generally set once (exposed as a get-and-set) and read many times.
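For comparison, here is roughly what the field-updater variant looks like. The TreeNode class is hypothetical; the point is that each instance carries only a volatile int, and a single static updater is shared by all instances:

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// Hypothetical node type that may exist in very large numbers.
final class TreeNode {
    // One updater for the whole class instead of one AtomicInteger per node.
    private static final AtomicIntegerFieldUpdater<TreeNode> HITS =
            AtomicIntegerFieldUpdater.newUpdater(TreeNode.class, "hits");

    // The updated field must be a non-static volatile int.
    private volatile int hits;

    int recordHit() {
        return HITS.incrementAndGet(this);
    }

    int hits() {
        return hits;
    }
}
```

The field name is passed as a string and resolved reflectively, which is why a plain AtomicInteger is the simpler choice unless the per-object overhead actually matters.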
In addition to biziclop's comment (see link):
Are java primitive ints atomic by design or by accident?
Just in case you've not come across this already.
I am working on my first multithreaded program and got stuck on a couple of aspects of synchronization. I have gone over the multi-threading tutorial on the Oracle/Sun homepage, as well as a number of questions here on SO, so I believe I have an idea of what synchronization is. However, as I mentioned, there are a couple of aspects I am not quite sure how to figure out. I have formulated them below as clear-cut questions:
Question 1: I have a singleton class that holds methods for checking valid identifiers. It turns out this class needs to hold two collections to keep track of associations between 2 different identifier types. (If the word identifier sounds complicated: these are just strings.) I chose to use two MultiValueMap instances to implement this many-to-many relationship. I am not sure if these collections have to be thread-safe, as the collections will be updated only at the creation of the instance of the singleton class, but nevertheless I noticed that the documentation says:
Note that MultiValueMap is not synchronized and is not thread-safe. If you wish to use this map from multiple threads concurrently, you must use appropriate synchronization. This class may throw exceptions when accessed by concurrent threads without synchronization.
Could anyone elaborate on this "appropriate synchronization"? What exactly does it mean? I can't really use MultiValueMap.decorate() on a synchronized HashMap, or have I misunderstood something?
Question 2: I have another class that extends HashMap to hold my experimental values, which are parsed in when the software starts. This class is meant to provide appropriate methods for my analysis, such as permutation(), randomization(), filtering(criteria) etc. Since I want to protect my data as much as possible, the class is created and updated once, and all the above-mentioned methods return new collections. Again, I am not sure if this class needs to be thread-safe, as it's not supposed to be updated from multiple threads, but the methods will most certainly be called from a number of threads, and to be "safe" I have added the synchronized modifier to all my methods. Can you foresee any problems with that? What kind of potential problems should I be aware of?
Thanks,
Answer 1: Your singleton class should not expose the collections it uses internally to other objects. Instead it should provide appropriate methods to expose the behaviours you want. For example, if your object has a Map in it, don't have a public or protected method to return that Map. Instead have a method that takes a key and returns the corresponding value in the Map (and optionally one that sets the value for the key). These methods can then be made thread safe if required.
NB even for collections that you do not intend to write to, I don't think you should assume that reads are necessarily thread safe unless they are documented to be so. The collection object might maintain some internal state that you don't see, but might get modified on reads.
Answer 2: Firstly, I don't think that inheritance is necessarily the correct thing to use here. I would have a class that provides your methods and has a HashMap as a private member. As long as your methods don't change the internal state of the object or the HashMap, they won't have to be synchronised.
It's hard to give general rules about synchronization, but your general understanding is right. A data structure that is used in a read-only way does not have to be synchronized. But (1) you have to ensure that nobody (i.e. no other thread) can use this structure before it is properly initialized, and (2) that the structure is indeed read-only. Remember, even iterators have a remove method.
To your second question: in order to ensure immutability, i.e. that it is read-only, I would not inherit from HashMap but use it inside your class.
Synchronization commonly is needed when you either could have concurrent modifications of the underlying data or one thread modifies the data while another reads and needs to see that modification.
In your case, if I understand it correctly, the MultiValueMap is filled once upon creation and then just read. So unless reading the map modifies some internals, it should be safe to read it from multiple threads without synchronization. The creation process should be synchronized, or you should at least prevent read access during initialization (a simple volatile flag might be sufficient).
The class you describe in question 2 might not need to be synchronized if you always return new collections and no internals of the base collection are modified during creation of those "copies".
One additional note: be aware that the values in the collections might need to be synchronized as well, since if you safely get an object from the collection in multiple threads but then concurrently modify that object, you'll still get problems.
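One way to keep readers out until initialization has finished is a CountDownLatch, which also guarantees that everything written before countDown() is visible to threads returning from await(). This is only a sketch under assumed names (LateInitializedLookup and its methods are hypothetical) and uses a plain HashMap instead of MultiValueMap to stay self-contained:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Hypothetical lookup table that is filled once and then only read.
final class LateInitializedLookup {
    private final Map<String, String> data = new HashMap<>();
    private final CountDownLatch ready = new CountDownLatch(1);

    // Must be called exactly once, by a single thread, before any reads can proceed.
    void initialize(Map<String, String> source) {
        data.putAll(source);
        ready.countDown();   // publishes the filled map to waiting readers
    }

    // Blocks until initialize() has completed, then reads without locking.
    String lookup(String key) {
        try {
            ready.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for initialization", e);
        }
        return data.get(key);
    }
}
```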
So as a general rule of thumb: read-only access does not necessarily need synchronization (if the objects are not modified during those reads or if that doesn't matter), write access should generally be synchronized.
If your maps are populated once, at the time the class is loaded (i.e. in a static initializer block), and are never modified afterwards (i.e. no elements or associations are added / removed), you are fine. Static initialization is guaranteed to be performed in a thread safe manner by the JVM, and its results are visible to all threads. So in this case you most probably don't need any further synchronization.
If the maps are instance members (this is not clear to me from your description), but not modified after creation, I would say again you are most probably safe if you declare your members final (unless you publish the this object reference prematurely, i.e. pass it to the outside world from the constructor somehow before the constructor is finished).
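Putting the advice from these answers together, a read-only holder could look roughly like the sketch below: the map is a private final member, it is filled entirely inside the constructor, it is wrapped as unmodifiable, and only query methods are exposed. IdentifierRegistry and its methods are hypothetical names, and a Map of Sets stands in for MultiValueMap:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical read-only registry; safe to share once the constructor has finished.
final class IdentifierRegistry {
    private final Map<String, Set<String>> associations;

    IdentifierRegistry(Map<String, Set<String>> source) {
        Map<String, Set<String>> copy = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : source.entrySet()) {
            // Defensive copies, so later changes to the source do not leak in.
            copy.put(entry.getKey(),
                    Collections.unmodifiableSet(new HashSet<>(entry.getValue())));
        }
        this.associations = Collections.unmodifiableMap(copy);
    }

    boolean isKnown(String id) {
        return associations.containsKey(id);
    }

    Set<String> associatedWith(String id) {
        Set<String> result = associations.get(id);
        return result == null ? Collections.<String>emptySet() : result;
    }
}
```

Since nothing is written after construction and the field is final, the query methods need no synchronized modifier.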