Java nested Map data structure read and write operations

I have a data structure like this:
Map<String,Map<String,List<CustomPOJO>>>
Reads on this data structure will be very frequent; there will also be writes, but not many.
As far as reads are concerned, I guess there is no issue with using the plain java.util.HashMap API.
For writes there can be two approaches:
Put the entire data in a ConcurrentHashMap and use that for writing.
Keep the plain java.util.HashMap API and perform all write operations in a synchronized block/method.
Please suggest which one would be better for writes, and also whether there is any pitfall on the read side.

Firstly, how predictable are the outer Map's string keys? If they are all known at design time, I would rather turn them into an enum and use an EnumMap to hold the outer Map. The same applies to the inner map. In that case your question turns into
EnumMap<Enum, EnumMap<Enum, List<POJO>>>
and is perfectly resolved.
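A minimal sketch of that shape, with hypothetical Region/Category enums standing in for the outer and inner string keys (note that EnumMap itself is not synchronized, so writes to the lists still need their own safety):

import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;

public class EnumMapSketch {
    enum Region { NORTH, SOUTH }          // hypothetical outer keys
    enum Category { RETAIL, WHOLESALE }   // hypothetical inner keys
    static class CustomPOJO { }           // stand-in for the asker's POJO

    public static void main(String[] args) {
        EnumMap<Region, EnumMap<Category, List<CustomPOJO>>> data = new EnumMap<>(Region.class);

        // The structure of the map is fixed by the enums; only the lists change.
        EnumMap<Category, List<CustomPOJO>> inner = new EnumMap<>(Category.class);
        inner.put(Category.RETAIL, new ArrayList<>());
        data.put(Region.NORTH, inner);

        // Reads are constant-time, array-indexed lookups.
        List<CustomPOJO> pojos = data.get(Region.NORTH).get(Category.RETAIL);
        System.out.println(pojos.size());
    }
}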
Secondly, since you are using a map-of-maps structure in an environment where performance matters, I would assume the number of keys in the outer map is much smaller than the total number of POJOs in the entire structure. That is to say, the chance that you add a new sub-map to the structure is very small. In that case a ReadWriteLock works best on the outer Map; for the inner maps you could consider either a ReadWriteLock or a ConcurrentHashMap.
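A minimal sketch of that read/write-lock approach on the outer map (class and method names are illustrative; CustomPOJO is stubbed in for the asker's type):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class GuardedNestedMap {
    static class CustomPOJO { }  // stub for the asker's value type

    private final Map<String, Map<String, List<CustomPOJO>>> data = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many readers can hold the read lock at the same time.
    public List<CustomPOJO> read(String outerKey, String innerKey) {
        lock.readLock().lock();
        try {
            Map<String, List<CustomPOJO>> inner = data.get(outerKey);
            return inner == null ? null : inner.get(innerKey);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers take the exclusive write lock; with few writes this is rarely contended.
    public void write(String outerKey, String innerKey, List<CustomPOJO> value) {
        lock.writeLock().lock();
        try {
            data.computeIfAbsent(outerKey, k -> new HashMap<>()).put(innerKey, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}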
There are three major caveats to consider with ConcurrentHashMap:
It generates a lot of temporary objects, so if your application is GC-sensitive you may want to limit its usage.
By default it allows at most 16 concurrently writing threads (its default concurrency level), but this is unlikely to be a concern.
Its size() is not a constant-time operation.
I would usually apply the read/write-lock pattern, or even an atomic-variable-based implementation, mainly when the first point turns out to be a problem. Otherwise I think ConcurrentHashMap does fine in most circumstances.
Also note that in more recent JDK implementations the read/write priority of the ReadWriteLock changed. If my memory is correct it seems to favor reads over writes, so if you have very many reads your writer threads might suffer starvation. In that case you might want your own read/write implementation.

I think this article suits your question best; it has a detailed explanation. Personally I think you should use ConcurrentHashMap because it allows readers to read without locking the whole map.
http://javarevisited.blogspot.com/2011/04/difference-between-concurrenthashmap.html

Related

Thread Safe get method in ConcurrentHashMap

I understand that ConcurrentHashMap allows only a single thread at a time to perform update/write operations on each segment, while multiple threads are allowed to read values from the map at the same time.
For my project, I want to extend this functionality so that while a value is being read from a particular segment, no update/write operation can take place in that segment until the read completes.
Any ideas to achieve this?
Just to elaborate on the problem I'm facing right now: after reading a value from the map I perform certain update operations which strongly depend on that read value. Thus, if one thread updates a key's value while another thread's get() misses the most recently updated value, this will lead to a big mess. So in this case, would extending be a good idea?
My gut says no. Extending ConcurrentHashMap does not sound like a good idea.
One of the most valuable design principles to which you can adhere is called "Separation of Concerns." The main "concern" of a HashMap is to store key/value pairs. Sounds like maintaining consistent relationships between certain data in your program is another concern.
Don't try to address both concerns with a single class. I would create a higher-level class to take care of maintaining the consistent relationships (maybe by using Lock objects), and I would use a plain HashMap or ConcurrentHashMap to store the key/value pairs.
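A minimal sketch of that separation, with a hypothetical AccountStore whose own lock (not the map) guards the read-then-update invariant:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class AccountStore {
    private final ConcurrentHashMap<String, Integer> balances = new ConcurrentHashMap<>();
    private final ReentrantLock lock = new ReentrantLock();

    // The read and the dependent write happen under one lock, so no other
    // thread can update the key between them.
    public void deposit(String accountId, int amount) {
        lock.lock();
        try {
            int current = balances.getOrDefault(accountId, 0);
            balances.put(accountId, current + amount);
        } finally {
            lock.unlock();
        }
    }

    // Plain reads that don't feed back into a write need no extra locking.
    public Integer balance(String accountId) {
        return balances.get(accountId);
    }
}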
Extend the ConcurrentHashMap class, and implement the getValue() method by including a synchronized block, so that no access is allowed to other threads until the read operation is completed.
Informally, you can think of a Map as a set of "variables", where each "variable" is addressed by a key (instead of by the static name of an ordinary variable).
(An array is formally a list of variables, each addressed by an integer index.)
In HashMap, these "variables" are like "plain" variables; if you access a "variable" concurrently, things may go wrong (just like with ordinary non-volatile variables).
In ConcurrentHashMap, these "variables" have volatile semantics, so it is safer to use concurrently. For example, a write will be visible to a "subsequent" read.
Of course, volatile is not enough sometimes; for example, we know we cannot use a volatile int for atomic increments (without locking). We need new devices, like AtomicInteger, for atomic operations.
Fortunately, in Java 8, new atomic methods were added to ConcurrentHashMap, so now we can operate on these "variables" atomically. See whether the compute() method fits your use case.
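A small sketch of that, assuming a simple counter-style map; compute() runs the whole read-modify-write for one key as a single atomic step:

import java.util.concurrent.ConcurrentHashMap;

public class ComputeSketch {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> counters = new ConcurrentHashMap<>();
        counters.put("hits", 41);

        // No other update to "hits" can interleave between the read and the write.
        counters.compute("hits", (key, value) -> value == null ? 1 : value + 1);

        System.out.println(counters.get("hits")); // 42
    }
}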

Vector vs SynchronizedList performance

While reading the Oracle tutorial on collection implementations, I found the following sentence:
If you need synchronization, a Vector will be slightly faster than an ArrayList synchronized with Collections.synchronizedList
source : List Implementations
but when searching for the difference between them, I found many people discouraging the use of Vector, saying it should be replaced by Collections.synchronizedList when synchronization is needed.
So which advice should be followed?
When you use Collections.synchronizedList(new ArrayList<>()) you are separating the two implementation details. It's clear how you could change the underlying storage model from "array based" to, e.g., "linked nodes" by simply replacing new ArrayList with new LinkedList, without changing the synchronized decoration.
The statement that Vector "will be slightly faster" seems to be based on the fact that its use does not involve delegation between a wrapper and the underlying storage, but deriving performance claims from that was already questionable by the time ArrayList and the synchronizedList wrapper were introduced.
It should be noted that when you are really concerned about the performance of a list accessed by multiple threads, you will use neither of these two alternatives. The idea of making storage thread safe by making all access methods synchronized is flawed right from the start. Every operation that involves multiple accesses to the list, e.g. simple constructs like if(!list.contains(o)) list.add(o);, iterating over the list, or even a simple Collections.swap(list, i, j);, requires additional manual synchronization to work correctly in a multi-threaded setup.
If you think it over, you will realize that most operations of a real-life application consist of multiple accesses and therefore require careful manual locking; the fact that every low-level access method additionally synchronizes not only takes away performance, it is also a disguise pretending a safety that isn't there.
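A small sketch of that point: even with the synchronized wrapper, both the check-then-add and the iteration still need a manual lock on the list itself:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ManualListLocking {
    public static void main(String[] args) {
        List<String> list = Collections.synchronizedList(new ArrayList<>());

        // Compound operation: check and add must happen under one lock,
        // otherwise another thread may insert "x" between the two calls.
        synchronized (list) {
            if (!list.contains("x")) {
                list.add("x");
            }
        }

        // Iteration also needs manual locking, as the synchronizedList javadoc requires.
        synchronized (list) {
            for (String s : list) {
                System.out.println(s);
            }
        }
    }
}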
Vector is an old API, of course. No questions there.
For speed though, it might be only because the synchronized list involves extra method calls to reach and return the data since it is a "wrapper" on top of a list after all. That's all there is to it, imho.

Using ConcurrentHashMap

I'm new to threading in Java and I need to access a data structure from a few active threads. I've heard that java.util.concurrent.ConcurrentHashMap is thread-friendly. Do I need to use synchronized(map){}
while accessing a ConcurrentHashMap, or will it handle locking itself?
It handles the locks itself, and in fact you have no access to them (there is no other option)
You can use synchronized in special cases for writes, but it is very rare that you should need to do this. e.g. if you need to implement your own putIfAbsent because the cost of creating an object is high.
Using synchronized for reads would defeat the purpose of using the concurrent collection.
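As a sketch of that rare write case: on Java 8+ computeIfAbsent already gives an atomic "create only if missing", so the expensive constructor is not run when the value is already present (class names are illustrative):

import java.util.concurrent.ConcurrentHashMap;

public class LazyCreationCache {
    static class ExpensiveObject {
        ExpensiveObject(String key) {
            // imagine costly construction here
        }
    }

    private final ConcurrentHashMap<String, ExpensiveObject> cache = new ConcurrentHashMap<>();

    // The mapping function runs at most once per absent key.
    public ExpensiveObject get(String key) {
        return cache.computeIfAbsent(key, ExpensiveObject::new);
    }
}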
ConcurrentHashMap is suited only to the cases where you don't need any more atomicity than provided out-of-the-box. If for example you need to get a value, do something with it, and then set a new value, all in an atomic operation, this cannot be achieved without external locking.
In all such cases nothing can replace explicit locks in your code, and it is pure waste to use this implementation instead of a basic HashMap.
Short answer: no you don't need to use synchronized(map).
Long answer:
all the operations provided by ConcurrentHashMap are thread safe and you can call them without worrying about locking
however, if you need some operations to be atomic in your code, you will still need some sort of locking at the client side
No, you don't need to; but if you rely on its synchronization details (locking the map as a whole), you should use Collections.synchronizedMap instead. From the javadoc of ConcurrentHashMap:
This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
Actually it won't synchronize on the whole data structure but on subparts (some buckets) of it.
This implies that ConcurrentHashMap's iterators are weakly consistent and that the size of the map can be inaccurate. (On the other hand, its put and get operations are still consistent and the throughput is higher.)
There is one more important feature of ConcurrentHashMap to note besides the concurrency it provides: its fail-safe (weakly consistent) iterator. People often use ConcurrentHashMap simply because they want to put/remove entries while iterating.
Collections.synchronizedMap(Map) is the other option, but with it a ConcurrentModificationException may be thrown in that case.
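A small sketch contrasting the two behaviours: ConcurrentHashMap's weakly consistent iterator tolerates concurrent puts, while a plain HashMap's fail-fast iterator throws:

import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IterationSketch {
    public static void main(String[] args) {
        Map<String, Integer> chm = new ConcurrentHashMap<>();
        chm.put("a", 1);
        chm.put("b", 2);

        // Weakly consistent: the new entry may or may not be seen, but no exception.
        for (String key : chm.keySet()) {
            chm.put("c", 3);
        }

        Map<String, Integer> plain = new HashMap<>();
        plain.put("a", 1);
        plain.put("b", 2);
        try {
            for (String key : plain.keySet()) {
                plain.put("c", 3); // structural modification during iteration
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("HashMap iteration failed fast");
        }
    }
}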

Specific usage of Hashtable over ConcurrentHashMap

ConcurrentHashMap was introduced in 1.5 as part of the java.util.concurrent package. Before that, the only way to have a thread-safe map was to use Hashtable or Collections.synchronizedMap(Map).
For all practical purposes (in a multithreaded environment), ConcurrentHashMap is sufficient to address the need, except for one case: where a thread needs a uniform view of the map.
My question is: apart from needing a uniform view of the map, are there any other scenarios where ConcurrentHashMap is not an option?
The usage of Hashtable has been discouraged since Java 1.2, and the utility of synchronizedMap is quite limited and almost always ends up being insufficient due to the too-fine granularity of locking (each call locks individually, while the unit of work usually spans several calls). However, when you do have a scenario where individual updates are the grain size you need, ConcurrentHashMap is a no-brainer choice over synchronizedMap. It has better concurrency, thread-safe iterators (no, synchronizedMap doesn't have those, a consequence of its design as a wrapper around a non-thread-safe map), better overall performance, and very little extra memory weight to pay for it all.
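To illustrate the iterator point: with Collections.synchronizedMap you must hold the map's monitor yourself for the whole traversal, something ConcurrentHashMap's iterators don't require:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SynchronizedMapIteration {
    public static void main(String[] args) {
        Map<String, Integer> map = Collections.synchronizedMap(new HashMap<>());
        map.put("a", 1);
        map.put("b", 2);

        // Individual calls are synchronized, but iteration is not atomic:
        // the javadoc requires locking the map for the duration of the loop.
        synchronized (map) {
            for (Map.Entry<String, Integer> e : map.entrySet()) {
                System.out.println(e.getKey() + "=" + e.getValue());
            }
        }
    }
}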
This is a stretch but I will give it as a use case.
If you need a thread-safe Map implementation on which you can perform some extra compound operation that isn't available via ConcurrentMap. Let's say you want to ensure two other objects don't exist before adding a third.
Hashtable<Object, Object> t = new Hashtable<>();
synchronized (t) {
    if (!t.contains(object1) && !t.contains(object2)) {
        t.put(object3, object3);
    }
}
Again this is a stretch, but you would not be able to achieve this with a ConcurrentHashMap while ensuring atomicity and thread-safety. Because all operations of a Hashtable, and of its synchronizedMap counterpart, synchronize on the Map instance itself, this compound check-then-put is thread-safe.
At the end of the day I would seldom, if ever, use a synchronizedMap/Hashtable and I suggest you should do the same.
As far as I understand, ConcurrentMap is a replacement for Hashtable and Collections.synchronizedMap() for thread-safety purposes. Use of all those classes is discouraged. Thus, the answer to your question is "no, there are no other scenarios".
See also: What's the difference between ConcurrentHashMap and Collections.synchronizedMap(Map)?

What are synchronizing strategies for Prevayler?

Prevayler guarantees that all writes (through its transactions) are synchronized. But what about reads?
Is it right that dirty reads are possible if no explicit synchronizing is used (in user code)?
Are they possible if a business object is read like this?
// get the 3rd account
Account account = ((Bank) prevayler.prevalentSystem()).getAccounts().get(2);
If so, what synchronization strategies are good for user code?
(Consider a business object A that contains a collection of business objects B.) For example:
using a synchronized collection (for the Bs inside A), e.g. from the java.util.concurrent package?
synchronizing collection reads outside transactions with the collection writes inside transactions, for example by putting synchronized(collection) blocks around both the reads and the writes?
The recommended way is to use JMatch Query and Prevayler.execute(Query), either directly or by subclassing.
The returned results must be either primitive values or immutable objects. If you plan to return mutable objects you should subclass JMatch Query to make deep copies. This way you get a system that locks every sensible read against other (sensible) reads and writes. This can speed up and simplify development, especially for developers without multithreaded programming experience.
If you need more performance under high concurrent load, which should be a rare case, you can indeed use the fine-grained locking described above, using "synchronized" and java.util.concurrent.
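As a sketch of that fine-grained approach, using the question's hypothetical Bank/Account objects: the write happens inside a Prevayler transaction (already serialized by Prevayler), the read happens outside, and both take the same monitor:

import java.util.ArrayList;
import java.util.List;

// Hypothetical prevalent system from the question.
public class Bank {
    private final List<Account> accounts = new ArrayList<>();

    // Invoked from inside a Prevayler transaction's execute method.
    public void addAccount(Account account) {
        synchronized (accounts) {
            accounts.add(account);
        }
    }

    // Invoked directly by reader threads, outside any transaction,
    // so it must take the same lock the writing transaction takes.
    public Account getAccount(int index) {
        synchronized (accounts) {
            return accounts.get(index);
        }
    }
}

class Account { }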
See this discussion for more details.
It's been a really long time since I looked at Prevayler (I used it in a proof-of-concept project about 6 or 7 years ago). I am pretty sure that if you do all your reads and writes through Prevayler, no further synchronization is required; certainly I didn't need any in what I did, and that had multiple threads using the datastore.
