Blocking Or NonBlocking - Adding Element during Rehashing in HashMap in Java

According to the HashMap documentation, when a HashMap becomes 75% full (the default load factor), it internally rehashes all existing entries.
If an element is added while rehashing is in progress:
Is HashMap blocking, meaning that rehashing finishes first and then the element is added?
Or
Is HashMap non-blocking, meaning that the element can be added in the middle of the rehashing process?
How does HashMap handle adding a new element while rehashing is going on?

From the Javadoc:
Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
You must apply external synchronization; otherwise the state of the map may get corrupted in the face of access by multiple threads.
If you are synchronizing, no other element can be added while the map is rearranging itself.
If you are not synchronizing, you are not using the class as documented, so the behaviour is undefined.

It's not blocking, but it won't "allow adding of element in between rehashing process" either. java.util.HashMap is documented as not thread-safe. If you try to add to or remove from the map from another thread while it is rehashing, you may get inconsistent behavior.
You might want to consider using java.util.concurrent.ConcurrentHashMap.
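As a rough illustration of that suggestion, the sketch below fills a map from several threads so that many resizes happen while other threads are inserting. The class name ConcurrentPutDemo, the method fillConcurrently, and the thread and key counts are made up for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentPutDemo {
    // Fills the given map from several threads and returns its final size.
    // Each thread writes its own range of keys, so no key is written twice.
    static int fillConcurrently(Map<Integer, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i); // growth forces repeated resizing
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        // ConcurrentHashMap coordinates resizing and insertion internally.
        System.out.println(fillConcurrently(new ConcurrentHashMap<>(), 4, 10_000));
        // Passing a plain HashMap instead is unsupported usage and may lose entries.
    }
}
```

Passing Collections.synchronizedMap(new HashMap<>()) to the same method should also be safe, at the cost of one lock around every call.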

Related

Java ConcurrentLinkedQueue instead of List?

I have a worker thread which should iterate over an ArrayList<ConcurrentLinkedQueue>. Other threads can add and remove objects (queues). But ArrayList is not thread-safe. Would it be fine to use ConcurrentLinkedQueue<ConcurrentLinkedQueue> instead of ArrayList<ConcurrentLinkedQueue>?

If you are asking if you can iterate a ConcurrentLinkedQueue safely, then the answer is Yes. The javadoc says:
Iterators are weakly consistent, returning elements reflecting the state of the queue at some point at or since the creation of the iterator. They do not throw ConcurrentModificationException, and may proceed concurrently with other operations. Elements contained in the queue since the creation of the iterator will be returned exactly once.
However, there are things that you can do on a List that you cannot do on a Queue (e.g. positional get / set, insertion / removal of arbitrary elements.) If your application needs to do those things or similar, then using ConcurrentLinkedQueue instead of ArrayList won't work.
Also, beware that ConcurrentLinkedQueue.size() is an O(N) operation!
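A minimal sketch of the queue-of-queues idea follows. To keep it deterministic, the modification happens on the iterating thread itself rather than on a second thread; the class and method names are invented for the example.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class QueueOfQueuesDemo {
    // Builds three inner queues, then adds a fourth while iterating.
    // Returns the number of inner queues at the end.
    static int iterateWhileModifying() {
        Queue<Queue<String>> queues = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < 3; i++) {
            Queue<String> inner = new ConcurrentLinkedQueue<>();
            inner.add("job-" + i);
            queues.add(inner);
        }
        boolean added = false;
        for (Queue<String> inner : queues) {
            if (!added) {
                // The same change on an ArrayList would make the iterator
                // throw ConcurrentModificationException.
                queues.add(new ConcurrentLinkedQueue<>());
                added = true;
            }
            inner.poll(); // drain one job, if any
        }
        return queues.size();
    }

    public static void main(String[] args) {
        System.out.println(iterateWhileModifying());
    }
}
```

Whether the iterator visits the queue added during the loop is not specified; the only promise is that iteration proceeds without an exception.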

Understanding concurrentHashMap

As is well known, the ConcurrentHashMap class allows us to use iterators safely. As far as I understand from the source of the map, this is achieved by storing the current map state in the iterator itself. Here is the inner class representing the iterator (a subclass of it is instantiated when iterator() is called):
abstract class HashIterator {
    int nextSegmentIndex;
    int nextTableIndex;
    HashEntry<K,V>[] currentTable;
    HashEntry<K, V> nextEntry;
    HashEntry<K, V> lastReturned;
    // Methods and constructor
}
But what if some thread writes something to the map during construction of the iterator? Do we get a non-deterministic state of the map then?
The thing is that none of the methods of the map is synchronized. There's a ReentrantLock for the put method, but that's it (as far as I could find). So I don't understand how the iterator can reflect a correct state even if some thread writes to the map during its construction.

The iterator offers a weakly consistent state. It doesn't offer a transactional view of the data. It only guarantees that you will see all the keys/values if the map is not altered; if it is altered, you may or may not see that change, but you won't get an error.

From the Javadoc of ConcurrentHashMap:
Retrieval operations (including get) generally do not block, so may
overlap with update operations (including put and remove). Retrievals
reflect the results of the most recently completed update operations
holding upon their onset. For aggregate operations such as putAll and
clear, concurrent retrievals may reflect insertion or removal of only
some entries. Similarly, Iterators and Enumerations return elements
reflecting the state of the hash table at some point at or since the
creation of the iterator/enumeration. They do not throw
ConcurrentModificationException. However, iterators are designed to be
used by only one thread at a time.
Now answering the questions.
But what if some thread writes to the Map something during
construction of the iterator?
As mentioned, an iterator represents the state at some point of time. So it may not be the most recent state.
how the iterator can support a correct state even if some thread
writes to the map during its construction?
The guarantee is that things will not break if you put/remove during iteration. However, there is no guarantee that one thread will see the changes to the map that another thread performs (without obtaining a new iterator from the map). The iterator is guaranteed to reflect the state of the map at some point at or since its creation. Further changes may be reflected in the iterator, but they do not have to be.
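The difference from a fail-fast iterator can be sketched as follows. The names are invented for the example, and the write happens on the iterating thread so that the outcome does not depend on thread timing.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WeaklyConsistentIteration {
    // Iterates over the key set while inserting a new key.
    // Returns how many keys the iterator visited.
    static int visitWhileAdding(Map<String, Integer> map) {
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        int visited = 0;
        for (String key : map.keySet()) {
            // A new mapping on the first pass; a plain overwrite afterwards.
            map.put("added-during-iteration", 99);
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        // ConcurrentHashMap: no exception; the new key may or may not be visited.
        System.out.println(visitWhileAdding(new ConcurrentHashMap<>()));
        try {
            visitWhileAdding(new HashMap<>());
        } catch (ConcurrentModificationException e) {
            System.out.println("HashMap iterator failed fast");
        }
    }
}
```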

Thread safe container for <key, value> pair

I need a container that holds [key, value] pairs.
Here, key = Integer and value = an object of a user-defined class.
Multiple threads are trying to add [key, value] pairs to this container.
If the key is already present in the container, I want to update the value after checking some condition.
At the end I want the container sorted by key.
My efforts:
I used Collections.synchronizedSortedMap with a SortedMap for this task.
SortedMap<Integer, USER_DEFINED_OBJECT> m = Collections.synchronizedSortedMap(new TreeMap<Integer, USER_DEFINED_OBJECT>());
This lets me add pairs to the container concurrently.
And yes, if the key is already present, I check some condition and then proceed.
Is my approach always thread-safe? If not, please correct me.
Updated
USER_DEFINED_OBJECT has a field called index.
When adding, if the key is already present, I compare the new USER_DEFINED_OBJECT with the existing USER_DEFINED_OBJECT on the basis of the "index" field mentioned above. If the new "index" is greater, I update the value.

Use ConcurrentHashMap from the java.util.concurrent package; see the ConcurrentHashMap API documentation.

java.util.concurrent.ConcurrentSkipListMap
A scalable concurrent ConcurrentNavigableMap implementation. The map is sorted according to the natural ordering of its keys, or by a Comparator provided at map creation time, depending on which constructor is used.
This class implements a concurrent variant of SkipLists providing expected average log(n) time cost for the containsKey, get, put and remove operations and their variants. Insertion, removal, update, and access operations safely execute concurrently by multiple threads. Iterators are weakly consistent, returning elements reflecting the state of the map at some point at or since the creation of the iterator. They do not throw ConcurrentModificationException, and may proceed concurrently with other operations. Ascending key ordered views and their iterators are faster than descending ones.
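For the "update only if the new index is greater" requirement, one possible approach on Java 8 or later is the merge method that ConcurrentSkipListMap inherits from ConcurrentMap. The sketch below assumes that Java version; Item is a hypothetical stand-in for USER_DEFINED_OBJECT, and the other names are invented as well.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

class SortedConditionalUpdate {
    // Hypothetical stand-in for USER_DEFINED_OBJECT; immutable on purpose.
    static final class Item {
        final int index;
        Item(int index) { this.index = index; }
    }

    // Inserts the candidate if the key is absent; otherwise keeps whichever
    // value has the greater index.
    static void addOrUpdate(ConcurrentNavigableMap<Integer, Item> map, int key, Item candidate) {
        map.merge(key, candidate,
                (existing, incoming) -> incoming.index > existing.index ? incoming : existing);
    }

    public static void main(String[] args) {
        ConcurrentNavigableMap<Integer, Item> map = new ConcurrentSkipListMap<>();
        addOrUpdate(map, 5, new Item(1));
        addOrUpdate(map, 5, new Item(3));
        addOrUpdate(map, 5, new Item(2)); // ignored: smaller index than the stored value
        addOrUpdate(map, 1, new Item(0));
        System.out.println(map.firstKey() + " " + map.get(5).index);
    }
}
```

The remapping function may be invoked more than once under contention, so it should be free of side effects, as the lambda here is.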

The concurrent collections let you call methods like put, remove, etc. as a kind of transaction, so each individual call is thread-safe.
From what I understand, your scenario for adding a new [key, value] pair is as follows:
Check whether the mapping already exists
If not, just add it
If yes, update the existing value in the mapping based on some check
I doubt there is an existing implementation that does this for you in a thread-safe way. If I have understood your use case correctly, you will need to add some manual synchronization of your own to make the update steps transactional.

Synchronizing LinkedHashmap externally

What is the best way to synchronize a LinkedHashMap externally, without using Collections.synchronizedMap?
When Collections.synchronizedMap is used, the entire data structure is locked, so performance suffers badly.
What is the best way to lock only the required part of the data structure? For example, if a thread is accessing key K1, it should lock only the key (K1) and value (V1) part of the data structure.

You can't get a fine-grained-locking, FIFO-eviction concurrent map from the built-in Java implementations.
Check out Guava's Cache or the open-source ConcurrentLinkedHashMap project.

I think you may want to synchronize the subsequent operation you do, just on the value coming from the map:
Object value = map.get(key);
if (value != null) { // synchronized (null) would throw NullPointerException
    synchronized (value) {
        doSomethingWith(value);
    }
}
Synchronizing on the values you get from the map makes sense, since they can be shared and accessed concurrently; the example above should be enough for what you need.
By the way you can also synchronize on the key doing two nested synchronized blocks:
synchronized (key) {
    Object value = map.get(key);
    if (value != null) { // synchronized (null) would throw NullPointerException
        synchronized (value) {
            doSomethingWith(value);
        }
    }
}
The key is usually just used to look up the object. Keys are matched by hashCode() and equals(), not by identity, so two threads can hold different but equal key instances; for that reason it doesn't fully make sense to me to synchronize on the key.
Or, maybe you can subclass ConcurrentHashMap adding what is missing from LinkedHashMap.

Louis Wasserman's suggestion is probably the best because it gives you a lot of useful functionality. However, even if you lock on the entire map, you have to be hitting it really, really hard to make that a bottleneck (as in, your code is mostly doing read/write on the map). If you don't need the additional functionality of Guava's Cache, a synchronized map could be simpler & better. You could also use a ReadWriteLock if you mostly read from the map.
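The ReadWriteLock option could look roughly like the sketch below. The wrapper class ReadWriteLockedMap and its methods are invented for this example, and it assumes an insertion-ordered LinkedHashMap; in access-order mode even get modifies the map, so a read lock would not be enough there.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadWriteLockedMap<K, V> {
    // Insertion-ordered map guarded by the lock below.
    private final Map<K, V> map = new LinkedHashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many readers may hold the read lock at the same time.
    V get(K key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers get exclusive access.
    V put(K key, V value) {
        lock.writeLock().lock();
        try {
            return map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Returns a copy so that callers never iterate the live map.
    List<K> keysInOrder() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(map.keySet());
        } finally {
            lock.readLock().unlock();
        }
    }
}
```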

The best option would be to use java.util.concurrent.ConcurrentHashMap.
I can't see how it would be possible to externally lock only parts of your map, since you cannot control which shared data structures are accessed internally by a call to any of the map's methods.

If you don't need a LinkedHashMap, use a ConcurrentHashMap from the java.util.concurrent package.
It is specifically designed for both speed and thread safety. It uses the minimal possible locking to achieve its thread safety.
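As a sketch of what per-key atomic updates look like with ConcurrentHashMap on Java 8 or later, the example below has several threads increment the counter stored under one key using merge. The class name, method name, and the key "K1" are chosen for illustration, and insertion order is not preserved.

```java
import java.util.concurrent.ConcurrentHashMap;

class PerKeyUpdate {
    // Starts several threads that all increment the counter stored under "K1"
    // and returns the final value.
    static int countConcurrently(int threads, int incrementsPerThread) {
        ConcurrentHashMap<String, Integer> counters = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // merge performs the read-modify-write for this key atomically
                    counters.merge("K1", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counters.getOrDefault("K1", 0);
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 1_000));
    }
}
```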

An insertion in a HashMap, or LinkedHashMap, can cause a rehash because it increases the ratio between the size and the number of buckets. Having two or more threads rehash simultaneously would be a disaster.
Even if you are only doing a get, another thread may be removing an entry from the same bucket, so you are scanning a linked list that is being modified under you. You could also have two or more threads appending to the main linked list at the same time.
If you can do without the linking, use java.util.concurrent.ConcurrentHashMap, as already suggested.

is there any Concurrent LinkedHashSet in JDK6.0 or other libraries?

My code throws the following exception:
java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:761)
at java.util.LinkedList$ListItr.next(LinkedList.java:696)
at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
at java.util.LinkedHashSet.<init>(LinkedHashSet.java:152)
...
I want a ConcurrentLinkedHashSet to fix it,
but I only found ConcurrentSkipListSet in java.util.concurrent, which is a concurrent counterpart of TreeSet, not LinkedHashSet.
Is there an easy way to get a ConcurrentLinkedHashSet in JDK 6.0?
Thanks for the help :)

A ConcurrentModificationException has nothing to do with concurrency in the form you're thinking of. This just means that while iterating over the Collection, someone (probably your own code - that happens often enough ;) ) is changing it, i.e. adding/removing some values.
Make sure you're using the Iterator to remove values from the collection and not the collection itself.
Edit: If another thread really is accessing the collection at the same time, the weak synchronization you get from the standard library is useless anyway, since you have to lock the collection for the whole duration of the operation, not just for a single add/remove. That is, something like:
synchronized (collection) {
    // do stuff here
}
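Both points, removing through the Iterator and holding one lock for the whole traversal, can be combined as in this sketch. The class and method names are invented, and the synchronized block only helps if every other thread that touches the collection locks on the same object.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;

class SafeRemoval {
    // Removes even numbers while iterating, using the iterator's own remove().
    static void removeEvens(Collection<Integer> collection) {
        synchronized (collection) { // hold the lock for the whole traversal
            for (Iterator<Integer> it = collection.iterator(); it.hasNext(); ) {
                if (it.next() % 2 == 0) {
                    // collection.remove(...) here would trigger
                    // ConcurrentModificationException on the next it.next()
                    it.remove();
                }
            }
        }
    }

    public static void main(String[] args) {
        Collection<Integer> numbers = new LinkedList<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        removeEvens(numbers);
        System.out.println(numbers);
    }
}
```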

You can always create a synchronized collection with Collections.synchronizedMap(myMap);. However, trying to alter the map while you're iterating (which I'm assuming is the cause of your error) will still be a problem.
From the docs for synchronizedMap:
Returns a synchronized (thread-safe) map backed by the specified map. In order to guarantee serial access, it is critical that all access to the backing map is accomplished through the returned map.
It is imperative that the user
manually synchronize on the returned
map when iterating over any of its
collection views ... Failure to follow
this advice may result in
non-deterministic behavior.
This is because:
Normally a concurrent collection guarantees atomic get/put but does not lock the entire collection during iteration, which would be too slow. There's no concurrency guarantee over iteration, which is actually many operations against the map.
It's not really concurrency if you're altering during iteration, as it's impossible to determine correct behavior. For example, how do you reconcile your iterator returning hasNext == true with deleting a value (possibly the next one) from the collection?
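Applied to the LinkedHashSet copy from the question's stack trace, the advice from the docs might look like this sketch. It assumes the shared collection is wrapped with Collections.synchronizedSet and that all threads access it only through that wrapper; the class and method names are invented.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

class SynchronizedSnapshot {
    // Copies a synchronized set while holding its lock, so no other thread
    // can modify it during the iteration that the copy constructor performs.
    static <T> Set<T> snapshot(Set<T> synchronizedSet) {
        synchronized (synchronizedSet) {
            return new LinkedHashSet<>(synchronizedSet);
        }
    }

    public static void main(String[] args) {
        Set<String> shared = Collections.synchronizedSet(new LinkedHashSet<>());
        shared.add("b");
        shared.add("a");
        shared.add("c");
        // Insertion order is preserved in the copy.
        System.out.println(snapshot(shared));
    }
}
```

The returned copy is private to the caller, so it can be iterated afterwards without any locking.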

There is ConcurrentLinkedHashMap - https://code.google.com/p/concurrentlinkedhashmap/
You can create a Set from it with java.util.Collections.newSetFromMap(map).

Unfortunately not. You could implement your own, wrapping a ConcurrentHashMap and a ConcurrentLinkedQueue, but this wouldn't allow you to remove values easily (removal would be O(N), since you'd have to iterate through everything in the queue) ...
What are you using the LinkedHashSet for though? Might be able to suggest alternatives...
