I have to store words and their corresponding integer indices in a hash map. The hash map will be updated concurrently.
For example, let's say the wordList is {a,b,c,a,d,e,a,d,e,b}
Then the hash map will contain the following key-value pairs:
a:1
b:2
c:3
d:4
e:5
The code for this is as follows:
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class Dictionary {

    private ConcurrentMap<String, Integer> wordToIndex;
    private AtomicInteger maxIndex;

    public Dictionary(int startFrom) {
        wordToIndex = new ConcurrentHashMap<String, Integer>();
        this.maxIndex = new AtomicInteger(startFrom);
    }

    public void insertAndComputeIndices(List<String> words) {
        Integer index;
        // iterate over the list of words
        for (String word : words) {
            // check if the word exists in the Map
            // if it does not exist, increment the maxIndex and put it in the
            // Map if it is still absent
            // set the maxIndex to the newly inserted index
            if (!wordToIndex.containsKey(word)) {
                index = maxIndex.incrementAndGet();
                index = wordToIndex.putIfAbsent(word, index);
                if (index != null)
                    maxIndex.set(index);
            }
        }
    }
}
My question is: is the above class thread safe or not?
Basically an atomic operation in this case should be to increment the maxIndex and then put the word in the hash map if it is absent.
Is there a better way to achieve concurrency in this situation?
Clearly another thread can see maxIndex incrementing and then getting clobbered.
Assuming this is all that is being done to the map (in particular, no removes), you could try putting the word in the map and only incrementing if that succeeds.
Integer oldIndex = wordToIndex.putIfAbsent(word, -1);
if (oldIndex == null) {
    wordToIndex.put(word, maxIndex.incrementAndGet());
}
(Alternatively for a single put, use some sort of mutable type in place of Integer.)
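One way to read that last suggestion (my own sketch, not the answer's code) is to make the map value itself mutable, for example an AtomicInteger holder, so that a single putIfAbsent is the only map operation and only the thread that wins the race fills in the real index. Note that readers may briefly observe a holder whose index has not been assigned yet, so this only works if callers tolerate that window:

ConcurrentMap<String, AtomicInteger> wordToIndex = new ConcurrentHashMap<String, AtomicInteger>();
AtomicInteger maxIndex = new AtomicInteger(0);

void insert(String word) {
    AtomicInteger holder = new AtomicInteger(0);              // 0 means "not assigned yet"
    AtomicInteger existing = wordToIndex.putIfAbsent(word, holder);
    if (existing == null) {
        // only the winning thread assigns the real index, exactly once
        holder.set(maxIndex.incrementAndGet());
    }
}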
No, it is not. If you have two methods A and B, both thread safe, this of course does not mean that calling A and B in a sequence is also thread safe, as a thread can interrupt another one between the function calls. This is what happens here:
if (!wordToIndex.containsKey(word)) {
    index = maxIndex.incrementAndGet();
    index = wordToIndex.putIfAbsent(word, index);
    if (index != null)
        maxIndex.set(index);
}
Thread A verifies that wordToIndex does not contain the word "dog" and proceeds inside the if. Before it can add the word "dog", thread B also finds that "dog" is not in the map (A did not add it yet) so it also proceeds inside the if. Now you have the word "dog" trying to be inserted twice.
Of course, putIfAbsent will guarantee that only one thread can add it, but I think that your goal is to not have two threads enter the if at the same time with the same key.
AtomicInteger is something you should consider using.
And you should wrap all the code that needs to happen as a transaction in a synchronized(this) block.
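For example, the whole check-then-act could be serialized like this (a minimal sketch; once every access goes through the lock, the map does not strictly need to be a ConcurrentHashMap any more):

public void insertAndComputeIndices(List<String> words) {
    for (String word : words) {
        synchronized (this) {
            // containsKey + incrementAndGet + put now form one atomic step
            if (!wordToIndex.containsKey(word)) {
                wordToIndex.put(word, maxIndex.incrementAndGet());
            }
        }
    }
}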
The other answers are correct: the compound check-then-put in your class is not thread safe, even though each field is a thread-safe type on its own. To start, here is how I would implement the threading:
1) I would make sure everything internal is private, although this is not a requirement of thread-safe code.
2) Find your accessor methods and make sure they are synchronized wherever the state of the shared object is modified (or at least that the if block is synchronized).
3) Test for deadlocks or bad counts. This can be implemented in a unit test by making sure the value of maxIndex is correct after 10000 threaded inserts, for example (a rough sketch follows below).
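As an illustration of point 3, here is a rough sketch of such a test (hypothetical: it assumes a getMaxIndex() accessor is added to Dictionary for test purposes, which the original class does not have):

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DictionaryConcurrencyTest {
    public static void main(String[] args) throws InterruptedException {
        Dictionary dict = new Dictionary(0);
        List<String> words = Arrays.asList("a", "b", "c", "d", "e");
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 10000; i++) {
            pool.submit(() -> dict.insertAndComputeIndices(words));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        // with 5 distinct words and startFrom = 0, a correct implementation ends with maxIndex == 5
        if (dict.getMaxIndex() != 5) {
            throw new AssertionError("bad count: " + dict.getMaxIndex());
        }
    }
}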
Here is my Java code:
static Map<BigInteger, Integer> cache = new ConcurrentHashMap<>();

static Integer minFinder(BigInteger num) {
    if (num.equals(BigInteger.ONE)) {
        return 0;
    }
    if (num.mod(BigInteger.valueOf(2)).equals(BigInteger.ZERO)) {
        // focus on what happens inside this block, since with the given inputs
        // it won't reach the last return
        return 1 + cache.computeIfAbsent(num.divide(BigInteger.valueOf(2)),
                n -> minFinder(n));
    }
    return 1 + Math.min(cache.computeIfAbsent(num.subtract(BigInteger.ONE), n -> minFinder(n)),
            cache.computeIfAbsent(num.add(BigInteger.ONE), n -> minFinder(n)));
}
I tried to memoize a function that returns the minimum number of actions, such as dividing by 2, subtracting one, or adding one.
The problem I'm facing is when I call it with smaller inputs such as:
minFinder(new BigInteger("32"))
it works, but with bigger values like:
minFinder(new BigInteger("64"))
It throws a Recursive Update exception.
Is there any way to increase recursion size to prevent this exception or any other way to solve this?
From the API docs of Map.computeIfAbsent():
The mapping function should not modify this map during computation.
The API docs of ConcurrentHashMap.computeIfAbsent() make that stronger:
The mapping function must not modify this map during computation.
(Emphasis added)
You are violating that by using your minFinder() method as the mapping function. That it seems nevertheless to work for certain inputs is irrelevant. You need to find a different way to achieve what you're after.
Is there any way to increase recursion size to prevent this exception or any other way to solve this?
You could avoid computeIfAbsent() and instead do the same thing the old-school way:
BigInteger halfNum = num.divide(BigInteger.valueOf(2));
Integer cachedValue = cache.get(halfNum);
if (cachedValue == null) {
    cachedValue = minFinder(halfNum);
    cache.put(halfNum, cachedValue);
}
return 1 + cachedValue;
But that's not going to be sufficient if the computation loops. You could perhaps detect that by putting a sentinel value into the map before you recurse, so that you can recognize loops.
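For illustration, a sketch of that sentinel idea (my own, untested against the full problem; it assumes minFinder's recursive calls are routed through this helper instead of computeIfAbsent(), and IN_PROGRESS is a hypothetical marker value that no real result can take):

static final Integer IN_PROGRESS = Integer.MIN_VALUE;  // marker: computation has started

static Integer cachedMinFinder(BigInteger num) {
    Integer cached = cache.get(num);
    if (cached != null) {
        if (cached.equals(IN_PROGRESS)) {
            // we came back to a number we are still computing: a loop
            throw new IllegalStateException("recursion loop at " + num);
        }
        return cached;
    }
    cache.put(num, IN_PROGRESS);       // sentinel goes in before recursing
    Integer result = minFinder(num);   // plain recursion, no computeIfAbsent
    cache.put(num, result);            // replace the sentinel with the real value
    return result;
}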
I would like to use ConcurrentHashMap to let one thread delete some items from the map periodically and other threads to put and get items from the map at the same time.
I'm using map.entrySet().removeIf(lambda) in the removing thread. I'm wondering what assumptions I can make about its behavior. I can see that removeIf method uses iterator to go through elements in the map, check the given condition and then remove them if needed using iterator.remove().
Documentation gives some info about ConcurrentHashMap iterators behavior:
Similarly, Iterators, Spliterators and Enumerations return elements
reflecting the state of the hash table at some point at or since the
creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.
As the whole removeIf call happens in one thread, I can be sure that the iterator is not used by more than one thread at a time. Still, I'm wondering if the course of events described below is possible:
Map contains mapping: 'A'->0
Deleting Thread starts executing map.entrySet().removeIf(entry->entry.getValue()==0)
Deleting Thread calls .iterator() inside the removeIf call and gets the iterator reflecting the current state of the collection
Another thread executes map.put('A', 1)
Deleting thread still sees 'A'->0 mapping (iterator reflects the old state) and because 0==0 is true it decides to remove A key from the map.
The map now contains 'A'->1 but deleting thread saw the old value of 0 and the 'A' ->1 entry is removed even though it shouldn't be. The map is empty.
I can imagine that the behavior may be prevented by the implementation in many ways. For example: maybe iterators are not reflecting put/remove operations but are always reflecting value updates or maybe the remove method of the iterator checks if the whole mapping (both key and value) is still present in the map before calling remove on the key. I couldn't find info about any of those things happening and I'm wondering if there's something which makes that use case safe.
I also managed to reproduce such a case on my machine.
I think, the problem is that EntrySetView (which is returned by ConcurrentHashMap.entrySet()) inherits its removeIf implementation from Collection, and it looks like:
default boolean removeIf(Predicate<? super E> filter) {
    Objects.requireNonNull(filter);
    boolean removed = false;
    final Iterator<E> each = iterator();
    while (each.hasNext()) {
        // `test` returns `true` for some entry
        if (filter.test(each.next())) {
            // entry has been just changed, `test` would return `false` now
            each.remove(); // ...but we still remove
            removed = true;
        }
    }
    return removed;
}
In my humble opinion, this cannot be considered a correct implementation for ConcurrentHashMap.
After discussion with user Zielu in comments below Zielu's answer I have gone deeper into the ConcurrentHashMap code and found out that:
The ConcurrentHashMap implementation provides a remove(key, value) method which calls replaceNode(key, null, value)
replaceNode checks that both the key and the value are still present in the map before removing, so using it should be fine. Its documentation says that it
Replaces node value with v, conditional upon match of cv if non-null.
In the case mentioned in the question ConcurrentHashMap's .entrySet() is called which returns EntrySetView class. Then removeIf method calls .iterator() which returns EntryIterator.
EntryIterator extends BaseIterator and inherits remove implementation that calls map.replaceNode(p.key, null, null) which disables conditional removal and just always removes the key.
The negative course of events could be still prevented if iterators always iterated over 'current' values and never returned old ones if some value is modified. I still don't know if that happens or not, but the test case mentioned below seems to verify the whole thing.
I think that I have created a test case which shows that the behavior described in my question can really happen. Please correct me if there are any mistakes in the code.
The code starts two threads. One of them (DELETING_THREAD) removes all entries mapped to 'false' boolean value. Another one (ADDING_THREAD) randomly puts (1, true) or (1,false) values into the map. If it puts true in the value it expects that the entry will still be there when checked and throws an exception if it is not. It throws an exception quickly when I run it locally.
package test;

import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

public class MainClass {

    private static final Random RANDOM = new Random();
    private static final ConcurrentHashMap<Integer, Boolean> MAP = new ConcurrentHashMap<Integer, Boolean>();
    private static final Integer KEY = 1;

    private static final Thread DELETING_THREAD = new Thread() {
        @Override
        public void run() {
            while (true) {
                MAP.entrySet().removeIf(entry -> entry.getValue() == false);
            }
        }
    };

    private static final Thread ADDING_THREAD = new Thread() {
        @Override
        public void run() {
            while (true) {
                boolean val = RANDOM.nextBoolean();
                MAP.put(KEY, val);
                if (val == true && !MAP.containsKey(KEY)) {
                    throw new RuntimeException("TRUE value was removed");
                }
            }
        }
    };

    public static void main(String[] args) throws InterruptedException {
        DELETING_THREAD.setDaemon(true);
        ADDING_THREAD.start();
        DELETING_THREAD.start();
        ADDING_THREAD.join();
    }
}
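Based on that finding, a workaround I would consider (a sketch, not part of the test program above; it needs an additional import of java.util.Map) is to do the filtering by hand with the conditional two-argument remove(key, value): the entry is only removed if it still maps to the value the filter saw, so a concurrent put(KEY, true) is not clobbered:

for (Map.Entry<Integer, Boolean> entry : MAP.entrySet()) {
    if (Boolean.FALSE.equals(entry.getValue())) {
        // removes only if the mapping still has this exact value at this moment
        MAP.remove(entry.getKey(), entry.getValue());
    }
}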
The following is the isEmpty() method from ConcurrentWeakKeyHashMap.java:
https://github.com/netty/netty/blob/master/src/main/java/org/jboss/netty/util/internal/ConcurrentWeakKeyHashMap.java
Why does it need mcsum, and what is the if (mcsum != 0) { ... } block doing?
And more importantly, how do I get
if (segments[i].count != 0 || mc[i] != segments[i].modCount)
to evaluate to true?
public boolean isEmpty() {
    final Segment<K, V>[] segments = this.segments;
    /*
     * We keep track of per-segment modCounts to avoid ABA problems in which
     * an element in one segment was added and in another removed during
     * traversal, in which case the table was never actually empty at any
     * point. Note the similar use of modCounts in the size() and
     * containsValue() methods, which are the only other methods also
     * susceptible to ABA problems.
     */
    int[] mc = new int[segments.length];
    int mcsum = 0;
    for (int i = 0; i < segments.length; ++ i) {
        if (segments[i].count != 0) {
            return false;
        } else {
            mcsum += mc[i] = segments[i].modCount;
        }
    }
    // If mcsum happens to be zero, then we know we got a snapshot before
    // any modifications at all were made. This is probably common enough
    // to bother tracking.
    if (mcsum != 0) {
        for (int i = 0; i < segments.length; ++ i) {
            if (segments[i].count != 0 || mc[i] != segments[i].modCount) {
                return false;
            }
        }
    }
    return true;
}
EDIT:
Code to evaluate the above if block is now in ConcurrentWeakKeyHashMapTest
Essentially one thread continuously monitors the concurrent map, while another thread continuously adds and removes the same key-value pair.
This method is a copy of the same method in Java's ConcurrentHashMap.
This kind of Map uses a modCount per segment to track, during operations, whether it has been changed by different threads. During our traversal of the Map there could actually be other operations modifying it. This is called an ABA problem: we ask the Map whether it is empty, and in fact it is not, but by accident it appears to be. A simple example:
Map with three segments
Segment 1: size=0
Segment 2: size=0
Segment 3: size=1
At this moment we decide to ask the Map and look into segment 1, which appears to be empty.
Now another algorithm comes along, inserts an element into segment 1, and removes the element from segment 3. The Map was never empty.
Our thread runs again and we look into segments 2 and 3; both are empty. As a result, the Map looks empty to us.
But for every empty segment we also tracked whether it was modified. For segment 3 we realize there have been modifications: mc[2] >= 1, which means mcsum >= 1. This means that since construction the Map has been modified at least once. So, to answer what mcsum is for: it is a shortcut for the default empty ConcurrentHashMap. If there have never been any modifications, we do not need to check for concurrent modifications.
So we know something happened and check each segment again. If a segment is empty now, we also know what its modCount was in the first loop. For segment 3, let's say it was 1; for segment 1 it was 0. Checking the modCount of segment 1 now, it is 1 and the count is > 0, so we know that the Map is not empty.
There could still be an ABA problem in the second loop as well. But because we know the modCounts, we can catch any other concurrent algorithm changing something. So we say: if the segment is empty but its modCount changed, it was not empty in the first place. That is what the second loop is doing.
Hope this helps.
EDIT
And more importantly, how do I get
if (segments[i].count != 0 || mc[i] != segments[i].modCount)
to evaluate to true?
This evaluates to true if a segment contains something or if something was modified since the first loop. It evaluates to false (which means: segment empty) if the segment contains nothing AND nothing was changed since the first loop. Or, to put it differently: we can be sure the segment has been empty the whole time since we first looked at it.
The mcsum checks if the map has ever been structurally modified. There appears to be no way to reset the modification counts to zero, so if the map has ever contained anything at all mcsum will be non-zero.
The weak keys are only cleaned up when the map is changed through a put, remove, etc., and they are only cleaned up within the modified segment. Retrieving values from the map does not clean up the weak keys. This means the map as implemented will hold many weak keys that have already been garbage collected, since they are only cleaned up when the same segment is modified.
This means results from the size() and isEmpty() methods will frequently return the wrong result.
With the API as provided your best recourse is to call purgeStaleEntries() prior to checking if the map is empty.
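For example (a minimal sketch; purgeStaleEntries() is the method named above, and the no-argument constructor mirrored from ConcurrentHashMap is an assumption on my part):

ConcurrentWeakKeyHashMap<String, Object> map = new ConcurrentWeakKeyHashMap<String, Object>();
// ... the map is used; some keys may have been garbage collected since the last write ...
map.purgeStaleEntries();             // drop stale entries in every segment
System.out.println(map.isEmpty());   // now only reflects entries with live keys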
Please, can anybody help me solve this problem? I have not been able to solve this error for many days. I tried using a synchronized method and other approaches, but nothing worked, so please help me.
Error
java.util.ConcurrentModificationException
at java.util.AbstractList$Itr.checkForComodification(Unknown Source)
at java.util.AbstractList$Itr.remove(Unknown Source)
at JCA.startAnalysis(JCA.java:103)
at PrgMain2.doPost(PrgMain2.java:235)
Code
public synchronized void startAnalysis() {
    // set Starting centroid positions - Start of Step 1
    setInitialCentroids();
    Iterator<DataPoint> n = mDataPoints.iterator();
    // assign DataPoint to clusters
    loop1:
    while (true) {
        for (Cluster c : clusters) {
            c.addDataPoint(n.next());
            if (!n.hasNext())
                break loop1;
        }
    }
    // calculate E for all the clusters
    calcSWCSS();
    // recalculate Cluster centroids - Start of Step 2
    for (Cluster c : clusters) {
        c.getCentroid().calcCentroid();
    }
    // recalculate E for all the clusters
    calcSWCSS();
    // List copy = new ArrayList(originalList);
    // synchronized (c) {
    for (int i = 0; i < miter; i++) {
        // enter the loop for cluster 1
        for (Cluster c : clusters) {
            for (Iterator<DataPoint> k = c.getDataPoints().iterator(); k.hasNext(); ) {
                // synchronized (k) {
                DataPoint dp = k.next();
                System.out.println("Value of DP" + dp);
                // pick the first element of the first cluster
                // get the current Euclidean distance
                double tempEuDt = dp.getCurrentEuDt();
                Cluster tempCluster = null;
                boolean matchFoundFlag = false;
                // call testEuclidean distance for all clusters
                for (Cluster d : clusters) {
                    // if testEuclidean < currentEuclidean then
                    if (tempEuDt > dp.testEuclideanDistance(d.getCentroid())) {
                        tempEuDt = dp.testEuclideanDistance(d.getCentroid());
                        tempCluster = d;
                        matchFoundFlag = true;
                    }
                    // if statement - Check whether the Last EuDt is > Present EuDt
                }
                // for variable 'd' - Looping between different Clusters for matching a Data Point.
                // add DataPoint to the cluster and calcSWCSS
                if (matchFoundFlag) {
                    tempCluster.addDataPoint(dp);
                    // k.notify();
                    // if (k.hasNext())
                    k.remove();
                    for (Cluster d : clusters) {
                        d.getCentroid().calcCentroid();
                    }
                    // for variable 'd' - Recalculating centroids for all Clusters
                    calcSWCSS();
                }
                // if statement - A Data Point is eligible for transfer between Clusters.
                // }// syn
            }
            // for variable 'k' - Looping through all Data Points of the current Cluster.
        } // for variable 'c' - Looping through all the Clusters.
    } // for variable 'i' - Number of iterations.
    // syn
}
You can't modify a list while you're iterating it, unless you do it through the Iterator.
From the API: ConcurrentModificationException
This exception may be thrown by methods that have detected concurrent modification of an object when such modification is not permissible.
For example, it is not generally permissible for one thread to modify a Collection while another thread is iterating over it.
Your code is a mess, so it's hard to figure out what's going on, but I'd check for:
Shared references
All remove AND add
I think that simply looking up the javadoc for ConcurrentModificationException would have answered your question. Did you try that?
Iterator.remove() is causing the exception, presumably on the line k.remove(). This means you modified the List it is iterating over while iterating, which is not allowed. So you need to figure out where c.getDataPoints() is changing. I am guessing it is because you eventually find a cluster d, assign it to tempCluster, and then change its data points (which may be the very list you are iterating over).
If you need to delete a few elements from your list, you can maintain another list of elements to be removed and finally call removeAll(collection). Of course, this is not good for huge amounts of data.
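For illustration, a sketch of that two-pass idea applied to the question's inner loop (findBetterCluster() is a hypothetical helper standing in for the distance comparison, and the java.util imports are omitted; the real code would inline the comparison):

List<DataPoint> toRemove = new ArrayList<DataPoint>();
Map<DataPoint, Cluster> moves = new HashMap<DataPoint, Cluster>();
for (DataPoint dp : c.getDataPoints()) {
    Cluster target = findBetterCluster(dp);   // hypothetical: null means dp stays where it is
    if (target != null && target != c) {
        moves.put(dp, target);
        toRemove.add(dp);
    }
}
// apply all modifications only after iteration has finished, so no iterator is invalidated
for (Map.Entry<DataPoint, Cluster> move : moves.entrySet()) {
    move.getValue().addDataPoint(move.getKey());
}
c.getDataPoints().removeAll(toRemove);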
Keep a few things in mind to avoid concurrent access issues:
First of all the method (startAnalysis) is an instance method. So synchronization will be specific to its instance. So you need to make sure that all the threads trying to access this method must use the same instance to avoid concurrent access issues. If every thread is referring to a different instance, then all the threads will be allowed to execute the method and eventually may lead to concurrency issues.
Secondly, one should always prefer to use an Iterator rather than the for-each loop to iterate over collections, to avoid concurrent access/modification issues.
You can also use the concurrent collection API classes to avoid concurrency issues. These classes are heavily used in such situations to avoid concurrent modification problems.
Hope this helps.
I am looking for a data structure that operates similar to a hash table, but where the table has a size limit. When the number of items in the hash reaches the size limit, a culling function should be called to get rid of the least-retrieved key/value pairs in the table.
Here's some pseudocode of what I'm working on:
class MyClass {
    private Map<Integer, Integer> cache = new HashMap<Integer, Integer>();

    public int myFunc(int n) {
        if (cache.containsKey(n))
            return cache.get(n);

        int next = . . . ; // some complicated math. guaranteed next != n.
        int ret = 1 + myFunc(next);
        cache.put(n, ret);
        return ret;
    }
}
What happens is that there are some values of n for which myFunc() will be called lots of times, but many other values of n which will only be computed once. So the cache could fill up with millions of values that are never needed again. I'd like to have a way for the cache to automatically remove elements that are not frequently retrieved.
This feels like a problem that must be solved already, but I'm not sure what the data structure is that I would use to do it efficiently. Can anyone point me in the right direction?
Update: I knew this had to be an already-solved problem. It's called an LRU cache and is easy to make by extending the LinkedHashMap class. Here is the code that incorporates the solution:
class MyClass {
    private final static int SIZE_LIMIT = 1000;

    private Map<Integer, Integer> cache =
        new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                return size() > SIZE_LIMIT;
            }
        };

    public int myFunc(int n) {
        if (cache.containsKey(n))
            return cache.get(n);

        int next = . . . ; // some complicated math. guaranteed next != n.
        int ret = 1 + myFunc(next);
        cache.put(n, ret);
        return ret;
    }
}
You are looking for an LRUList/Map. Check out LinkedHashMap:
The removeEldestEntry(Map.Entry) method may be overridden to impose a policy for removing stale mappings automatically when new mappings are added to the map.
Googling "LRU map" and "I'm feeling lucky" gives you this:
http://commons.apache.org/proper/commons-collections//javadocs/api-release/org/apache/commons/collections4/map/LRUMap.html
A Map implementation with a fixed maximum size which removes the least recently used entry if an entry is added when full.
Sounds pretty much spot on :)
WeakHashMap will probably not do what you expect it to... read the documentation carefully and ensure that you know exactly what you want from weak and strong references.
I would recommend you have a look at java.util.LinkedHashMap and use its removeEldestEntry method to maintain your cache. If your math is very resource intensive, you might want to move entries to the front whenever they are used to ensure that only unused entries fall to the end of the set.
The Adaptive Replacement Cache policy is designed to keep one-time requests from polluting your cache. This may be fancier than you're looking for, but it does directly address your "filling up with values that are never needed again".
Take a look at WeakHashMap
You probably want to implement a Least-Recently Used policy for your map. There's a simple way to do it on top of a LinkedHashMap:
http://www.roseindia.net/java/example/java/util/LRUCacheExample.shtml