ConcurrentWeakKeyHashMap isEmpty method


The following is isEmpty() method from ConcurrentWeakKeyHashMap.java,
https://github.com/netty/netty/blob/master/src/main/java/org/jboss/netty/util/internal/ConcurrentWeakKeyHashMap.java
Why does it need mcsum, and what is the if (mcsum != 0) {..} block doing?
And more importantly, how do I get
if (segments[i].count != 0 || mc[i] != segments[i].modCount)
to evaluate to true?
public boolean isEmpty() {
final Segment<K, V>[] segments = this.segments;
/*
* We keep track of per-segment modCounts to avoid ABA problems in which
* an element in one segment was added and in another removed during
* traversal, in which case the table was never actually empty at any
* point. Note the similar use of modCounts in the size() and
* containsValue() methods, which are the only other methods also
* susceptible to ABA problems.
*/
int[] mc = new int[segments.length];
int mcsum = 0;
for (int i = 0; i < segments.length; ++ i) {
if (segments[i].count != 0) {
return false;
} else {
mcsum += mc[i] = segments[i].modCount;
}
}
// If mcsum happens to be zero, then we know we got a snapshot before
// any modifications at all were made. This is probably common enough
// to bother tracking.
if (mcsum != 0) {
for (int i = 0; i < segments.length; ++ i) {
if (segments[i].count != 0 || mc[i] != segments[i].modCount) {
return false;
}
}
}
return true;
}
EDIT:
Code to exercise the above if block is now in ConcurrentWeakKeyHashMapTest.
Essentially, one thread continuously monitors the concurrent map while another thread continuously adds and removes the same key/value pair.

This method is a copy of the same method in Java's ConcurrentHashMap.
This kind of Map keeps a modCount per segment so that an operation can tell whether a segment was changed by other threads while it ran. During our traversal of the Map, other operations may be modifying it, and this can produce an ABA problem: we ask the Map whether it is empty, and although it never was, it appears to be by accident. A simple example:
Map with three segments
Segment 1: size=0
Segment 2: size=0
Segment 3: size=1
At this moment we ask the Map whether it is empty and look into segment 1, which is empty.
Now another thread inserts an element into segment 1 and removes the element from segment 3. The Map was never empty.
Our thread now runs again and looks into segments 2 and 3; both are empty. As far as we can tell, the Map is empty.
But for every empty segment we also recorded its modCount. For segment 3 we see that there have been modifications: mc[2] >= 1, which means mcsum >= 1. In other words, the Map has been modified at least once since construction. So, to answer what mcsum is for: it is a shortcut for a ConcurrentHashMap that is still in its freshly constructed, empty state. If there have never been any modifications, we do not need to check for concurrent modifications.
So we know something happened, and we check each segment again. For each segment that looked empty we know what its modCount was in the first pass. For segment 3, let's say it was 2 (one insert and one removal); for segment 1 it was 0. Checking segment 1 now, its modCount is 1 and its count is > 0, so we know that the Map is not empty.
There could still be an ABA problem in the second loop as well. But because we recorded the modCounts, we can catch any concurrent change: if a segment is empty but its modCount has changed, we cannot claim the Map was empty throughout. That is what the second loop does.
Hope this helps.
EDIT
And more importantly, how do I get
if (segments[i].count != 0 || mc[i] != segments[i].modCount)
to evaluate to true?
This evaluates to true if a segment contains something or if the segment was modified since the first loop. It evaluates to false (which means: segment empty) if the segment contains nothing AND nothing was changed since the first loop. Or, to put it differently: we can be sure the segment has been empty the whole time since we first looked at it.
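The interleaving described in this answer can be replayed in a single thread. The following is only a sketch: Segment, the interference hook and the recheck flag are hypothetical names invented for the demonstration, not part of Netty or the JDK, and a real map would see the interference from another thread rather than from a callback.

```java
// Single-threaded replay of the ABA scenario (hypothetical names throughout).
class ModCountDemo {
    static final class Segment {
        int count;      // number of entries in this segment
        int modCount;   // incremented on every structural change
        void add()    { count++; modCount++; }
        void remove() { count--; modCount++; }
    }

    // interference runs right after segment 0 has been inspected,
    // standing in for another thread. recheck toggles the second loop.
    static boolean isEmpty(Segment[] segments, Runnable interference, boolean recheck) {
        int[] mc = new int[segments.length];
        int mcsum = 0;
        for (int i = 0; i < segments.length; i++) {
            if (segments[i].count != 0) {
                return false;
            }
            mcsum += mc[i] = segments[i].modCount;
            if (i == 0) {
                interference.run();
            }
        }
        if (recheck && mcsum != 0) {
            for (int i = 0; i < segments.length; i++) {
                if (segments[i].count != 0 || mc[i] != segments[i].modCount) {
                    return false;
                }
            }
        }
        return true;
    }

    // Three segments; only the third one holds an element.
    static Segment[] newMap() {
        Segment[] s = { new Segment(), new Segment(), new Segment() };
        s[2].add();
        return s;
    }

    public static void main(String[] args) {
        Segment[] a = newMap();
        boolean naive = isEmpty(a, () -> { a[0].add(); a[2].remove(); }, false);
        Segment[] b = newMap();
        boolean checked = isEmpty(b, () -> { b[0].add(); b[2].remove(); }, true);
        System.out.println("naive=" + naive + " checked=" + checked);
        // prints: naive=true checked=false
    }
}
```

Without the second loop the method reports an empty map that was never empty; with it, the non-zero count of segment 1 is caught.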

mcsum checks whether the map has ever been structurally modified. There appears to be no way to reset the modification counts to zero, so if the map has ever contained anything at all, mcsum will be non-zero.
The weak keys are only cleaned up when the map is changed through a put, remove, etc., and they are only cleaned up within the modified segment. Retrieving values from the map does not clear stale weak keys. This means the map as implemented can hold many entries whose keys have already been garbage collected, because they are only cleaned up when the same segment is modified.
As a result, size() and isEmpty() will frequently return the wrong result.
With the API as provided, your best recourse is to call purgeStaleEntries() before checking whether the map is empty.

Related

Usage of Iterator's .hasNext(); and .next(); methods in Java

For two days I've been pretty confused about the .hasNext(); and .next(); methods of the Iterator interface, especially in a while loop. Let me give an example:
import java.util.*; // imported whole java.util package.
class Main {
public static void main(String[] args) {
ArrayList<String> cars = new ArrayList<String>(); // created ArrayList which name is cars.
cars.add("Volvo");
cars.add("Mercedes");
cars.add("BMW");
Iterator<String> x = cars.iterator();
while(x.hasNext()) {
System.out.print(x.next() + " "); // It prints Volvo Mercedes BMW
}
}
}
I understand that .hasNext(); returns a boolean, and that it returns true if the iteration has more elements. The .next(); method returns the next element. After the first element, Volvo, execution goes back to while(x.hasNext()) and enters the loop body again, but where is the counter of this loop? I mean, after printing Volvo, how does it move on to the next element? "It returns all the elements, and when there are no more, .hasNext(); returns false and the code continues with the next line" is a simple and correct answer, but I want to understand it clearly.

The iterator() method creates an iterator over all the elements in your ArrayList. In your code, the while condition x.hasNext() checks whether the iteration still has an element that has not been returned yet; it returns true if so, and false otherwise.
The first call to x.next() returns the first element of your ArrayList (Volvo in your case), much like following a pointer in a linked list. Calling the method gives you a reference to that object in the list and advances the iterator, so that it now stands before the next element (Mercedes in your case).
When you call next() again, Mercedes is returned. To see how this works, write System.out.println(x.next()) three times instead of the while loop, and you will see that each call moves on to the next position. If you write System.out.println(x.next()) a fourth time, it throws an exception because there are no further elements in your list: Exception in thread "main" java.util.NoSuchElementException.
That's why hasNext() is used: it checks whether another element is available.
You can compare this to printing a linked list (the data structure, if you know it), where one reference points to the head node, and we print the node and move to the next one. The same happens here: next() returns the current element and moves on to the next one.
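The experiment suggested above (three calls to next(), then a fourth) can be sketched as follows. The class and method names are made up for illustration; the exception is caught so that the program can show the result instead of terminating.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class NextFourTimes {
    // Calls next() four times on a three-element list and records what happens.
    static List<String> callNextFourTimes() {
        List<String> cars = new ArrayList<>();
        cars.add("Volvo");
        cars.add("Mercedes");
        cars.add("BMW");

        Iterator<String> x = cars.iterator();
        List<String> seen = new ArrayList<>();
        try {
            for (int call = 0; call < 4; call++) {
                seen.add(x.next()); // each call returns one element and advances
            }
        } catch (NoSuchElementException e) {
            // The fourth call lands here: nothing is left to return.
            seen.add("NoSuchElementException");
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(callNextFourTimes());
        // prints: [Volvo, Mercedes, BMW, NoSuchElementException]
    }
}
```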

while is java-ese for: Keep doing this until a thing changes.
It's a bit like this common household task:
How to wash dishes
Check if there are still dirty dishes on the right side of the counter.
If there are some, do the dishwash thing: Pick up the nearest dirty dish, and wash it, then stow it away on the left side of the counter, and go back to the start of this algorithm.
Otherwise (there are no dirty dishes), you are done.
while (counterRightSide.hasItems()) {
Dish dirty = counterRightSide.fetch();
Dish clean = sink.clean(dirty);
counterLeftSide.stow(clean);
}
EDIT: I realize now that 'a kitchen counter' is an unfortunate example, given the homonym 'kitchen counter' and 'counter in code'. I'll use 'accumulator' instead of 'counter in code' to fix this confusion.
Note how there is no accumulator here either, and you aren't counting in your head when you wash the dishes either. You COULD first take inventory of the amount of dirty dishes you have (say, 15 dishes), and then write a protocol where you grab a dirty dish exactly 15 times before decreeing that you've done the job, but surely you realize that that's just one way to do it, and another is to just... check if there are any dirty dishes left every time you're done washing 1 dirty dish. Which is how the above code works.
Note that the action of 'fetching' an item from the right side of the kitchen counter changes properties of that kitchen counter. It now has one less item on it.
The same applies to iterators: Calling .next() on an iterator changes it in some form. After all, if you call next twice, the results of these calls are different. Contrast with invoking, say, someArrayList.get(5) a bunch of times in a row; that doesn't change the arraylist at all, and therefore you get the same thing back every time. next() is not like that.
So how DOES that work? Who counts this stuff?
Well, that's the neat thing about abstraction! A 'Collection' is something that can do a few things, such as 'it must be able to report to you how many items it holds', and, 'it must be able to furnish you with an object that can be used to loop through each item that it holds'.
That's what an iterator is: An object that can do that.
So how does it work? Who knows! It doesn't matter! As long as you CAN do these things, you can be a collection, you are free to implement the task however you like.
Okay, but how does a common collection, like, say, ArrayList do this?
With a counter, of course. Here's the implementation, simplified:
public Iterator<T> iterator() {
return new Iterator<T>() {
int counter = 0; // here it is!
public boolean hasNext() {
return counter < size();
}
public T next() {
return get(counter++);
}
};
}
You're getting this object and referencing it in your code (your x variable), and that object has a field with a counter in it. You can't see it; it's private, and not actually in the Iterator type (Iterator is an interface; what you get is some unknown subtype of it, and that contains the counter variable). That's assuming you're getting an iterator by invoking .iterator() on an arraylist. If you invoke it on something else, there may or may not be a counter - as long as the iterator WORKS, it doesn't matter how it works, that's the beauty of interfaces.
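To illustrate the "may or may not be a counter" point, here is a minimal sketch of a collection whose iterator keeps no index at all and instead remembers a node reference. Chain and Node are hypothetical names invented for this example, not JDK classes.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// A tiny singly linked collection (hypothetical, for illustration only).
class Chain<T> implements Iterable<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private Node<T> head;
    private Node<T> tail;

    void add(T value) {
        Node<T> node = new Node<>(value);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Node<T> current = head; // no counter, just "where am I"

            @Override
            public boolean hasNext() {
                return current != null;
            }

            @Override
            public T next() {
                if (current == null) {
                    throw new NoSuchElementException();
                }
                T value = current.value;
                current = current.next; // this is what "moves on"
                return value;
            }
        };
    }

    public static void main(String[] args) {
        Chain<String> cars = new Chain<>();
        cars.add("Volvo");
        cars.add("Mercedes");
        cars.add("BMW");
        for (String car : cars) {
            System.out.print(car + " "); // prints: Volvo Mercedes BMW
        }
    }
}
```

The calling code is identical to the ArrayList case; only the hidden state inside the iterator differs.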

A while loop checks the condition and when condition is true then executes the body of the loop and iterates itself,
while(condition){
//do something.
}
hasNext() is a method from the Iterator interface which returns true if the "iteration" has more elements. If there are no more elements, it returns false and the body of the while loop is no longer entered:
while(x.hasNext()){
//do something.
}
next() is a method from the Iterator interface which returns the next element of the iteration.
while(x.hasNext()){
x.next();
}

I mean after printing Volvo how can it goes to the next element? It
returns all element and if there is no more .hasNext(); returns false
and code continues to next line is simple answer and correct but I
want to understand it clearly.
int cursor; // index of next element to return
//...
public boolean hasNext() {
return cursor != size;
}
public E next() {
checkForComodification();
int i = cursor;
if (i >= size)
throw new NoSuchElementException();
Object[] elementData = ArrayList.this.elementData;
if (i >= elementData.length)
throw new ConcurrentModificationException();
cursor = i + 1;
return (E) elementData[lastRet = i];
}
//...
Given above is how hasNext and next are implemented in the iterator that ArrayList returns (its inner class Itr). The code is straightforward and easy to understand. You can check the full code by using a decompiler or by reading the JDK source.

Java : Is the get method of an Arraylist cached?

Does the Arraylist object store the last requested value in memory to access it faster the next time? Or do I need to do this myself?
Or more concretely, in terms of performance, is it better to do this :
for (int i = 0; i < myArray.size(); i++){
int value = myArray.get(i);
int result = value + 2 * value - 5 / value;
}
Instead of doing this :
for (int i = 0; i < myArray.size(); i++) {
int result = myArray.get(i) + 2 * myArray.get(i) - 5 / myArray.get(i);
}

In terms of performance, it doesn't matter one bit. No, ArrayList doesn't cache anything, although the JITted end result could be a different issue.
If you're wondering which version to use, use the first one. It's clearer.

You can answer your (first) question yourself by looking into the actual source:
public E get(int index) {
rangeCheck(index);
return elementData(index);
}
So: no, there is no caching taking place, but you can also see that there is not much of a performance impact, because the get method is essentially just an array access.
But it's still good to avoid multiple calls, for several reasons:
int result = value + 2 * value - 5 / value is easier to understand (i.e. realizing that you use the same value three times in your calculation)
If you later decide to change the underlying list (e.g. to a LinkedList) you might end up with an impact on performance and then have to change your code to get around it.
As long as you don't synchronize access to the list, repeated calls of get(index) might actually return different values if a call of set(index, value) has taken place between two of them (this can happen even in small source blocks like this one - BTST)
The second point also affects how you should access all the values of a list: if you are going to iterate over every element, it's better to avoid list.get(i) altogether and use the Iterator or streams instead.
Your code would then look like this:
Iterator<Integer> it = myArray.iterator();
while (it.hasNext()) {
int value = it.next();
int result = value + 2 * value - 5 / value;
}
LinkedList is very slow when accessing elements by index but can iterate quickly from one element to the next, so the Iterator returned by LinkedList makes use of that, while the Iterator returned by ArrayList simply accesses the internal array (without the repeated range checks you can see in the get method above).
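For completeness, the same calculation can be written with a for-each loop (which uses the list's Iterator behind the scenes) or with a stream. This is a sketch only: the class name, the formula helper and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class IterateOnce {
    // The expression from the question; value is read from the list only once.
    static int formula(int value) {
        return value + 2 * value - 5 / value;
    }

    // for-each works efficiently for ArrayList and LinkedList alike.
    static List<Integer> withForEach(List<Integer> values) {
        List<Integer> results = new ArrayList<>();
        for (int value : values) {
            results.add(formula(value));
        }
        return results;
    }

    static List<Integer> withStream(List<Integer> values) {
        return values.stream()
                .map(IterateOnce::formula)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> myArray = Arrays.asList(1, 2, 5);
        System.out.println(withForEach(myArray)); // prints: [-2, 4, 14]
        System.out.println(withStream(myArray));  // prints: [-2, 4, 14]
    }
}
```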

I can't get to modify my static variable in java

You are given a grid (4x4 here) and need to find the total number of unique paths from (0,0) to (4,4). main() calls a function pathify for this. It finds the possible "next steps" and calls itself again. When (4,4) is reached, noOfPaths++; is supposed to execute. This doesn't happen, and I can't find the problem.
import java.util.ArrayList;
public class NoOfPaths {
static int xRows = 4;
static int yColumns = 4;
static int noOfPaths = 0;
/*A robot is located in the upper-left corner of a 4×4 grid.
* The robot can move either up, down, left, or right,
* but cannot go to the same location twice.
* The robot is trying to reach the lower-right corner of the grid.
* Your task is to find out the number of unique ways to reach the destination.
**/
static ArrayList validNeighbours (int x,int y, ArrayList visited) {
ArrayList valid = new ArrayList();
if((x+1 <= xRows) && !visited.contains(((x+1)*10)+y) ) {
valid.add(((x+1)*10)+y);
}
if((x-1 >= 0) && !visited.contains(((x-1)*10)+y) ) {
valid.add(((x-1)*10)+y);
}
if((y+1 <= yColumns) && !visited.contains(x*10+y+1) ) {
valid.add(x*10+y+1);
}
if((y-1 >= 0) && !visited.contains(x*10+y-1) ) {
valid.add(x*10+y-1);
}
return valid;
}
static void pathify(int x,int y, ArrayList alreadyVisited) {
if(x == xRows && y == yColumns) {
noOfPaths++;
} else {
alreadyVisited.add(x*10+y);
ArrayList callAgain = new ArrayList();
callAgain = validNeighbours(x,y,alreadyVisited);
for (int t=0,temp; t<callAgain.size(); t++) {
temp=(int) callAgain.get(t);
pathify(temp/10, temp%10, alreadyVisited);
}
}
}
public static void main(String[] args) {
ArrayList alreadyVisited = new ArrayList();
pathify(0, 0, alreadyVisited);
System.out.println(noOfPaths);
}
}

The error is in how you're handling alreadyVisited. The first time pathify is called, this list will contain only the initial square (0,0), which is fine. Here's the important part of your code:
for (int t=0,temp; t<callAgain.size(); t++) {
temp=(int) callAgain.get(t);
pathify(temp/10, temp%10, alreadyVisited);
}
You've found the neighbors of the initial cell. Your code will pick the first neighbor; then it will find paths starting with that neighbor, and the recursive calls to pathify will add cells to alreadyVisited.
Now, after all the recursive calls come back, you're ready to find paths starting with the second neighbor of the initial cell. But you have a problem: alreadyVisited still has all the cells it collected from the paths it found starting with the first neighbor. So you won't find all possible paths starting with the second neighbor; you won't find any path that includes any cell in any path you've previously found. This isn't what you want, since you only want to avoid visiting the same cell in each path--you don't want to avoid visiting the same cell in all your previous paths. (I simplified this a little bit. In reality, the problem will start occurring deeper down the recursive stack, and you won't even find all the paths beginning with the first neighbor.)
When implementing a recursive algorithm, I've found that it's generally a bad idea to keep an intermediate data structure that is shared by recursive invocations that will be modified by those invocations. In this case, that's the list alreadyVisited. The problem is that when an invocation deeper down the stack modifies the structure, this affects invocations further up, because they will see the modifications after the deeper invocations return, which is basically data they need changing underneath them. (I'm not talking about a collection that is used to hold a list of results, if the list is basically write-only.) The way to avoid it here is that instead of adding to alreadyVisited, you could create a clone of this list and then add to it. That way, a deeper invocation can be sure that it's not impacting the shallower invocations by changing their data. That is, instead of
alreadyVisited.add(x*10+y);
write
alreadyVisited = new ArrayList(alreadyVisited); // make a copy of alreadyVisited
alreadyVisited.add(x*10+y);
The add will modify a new list, not the list that other invocations are using. (Personally, I'd declare a new variable such as newAlreadyVisited, since I don't really like modifying parameters, for readability reasons.)
This may seem inefficient. It will definitely use more memory (although the memory should be garbage-collectible pretty quickly). But trying to share a data structure between recursive invocations is very, very difficult to do correctly. It can be done if you're very careful about cleaning up the changes and restoring the structure to what it was when the method began. That might be necessary if the structure is something like a large tree, making it unfeasible to copy for every invocation. But it can take a lot of skill to make things work.
EDIT: I tested it and it appears to work: 12 if xRows=yColumns=2, 8512 if both are 4 (is that correct?). Another approach: instead of copying the list, I tried
alreadyVisited.remove((Object)(x*10+y));
at the end of the method ((Object) is needed so that Java doesn't think you're removing at an index) and that gave me the same results. If you do that, you'll make sure that alreadyVisited is the same when pathify returns as it was when it started. But I want to emphasize that I don't recommend this "cleanup" approach unless you really know what you're doing.
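The copy-per-invocation approach can be sketched as a self-contained program. This is not the asker's code: it returns the count instead of updating a static field, uses a generic Set, and the names PathCounter and countPaths are made up for illustration. The x*10+y encoding is kept from the question and assumes coordinates below 10.

```java
import java.util.HashSet;
import java.util.Set;

class PathCounter {
    private static final int[][] STEPS = { {1, 0}, {-1, 0}, {0, 1}, {0, -1} };

    // Counts self-avoiding paths from (x, y) to (max, max).
    static int countPaths(int x, int y, int max, Set<Integer> visited) {
        if (x == max && y == max) {
            return 1;
        }
        // Copy first, then add: callers never see this invocation's changes.
        Set<Integer> seen = new HashSet<>(visited);
        seen.add(x * 10 + y);

        int total = 0;
        for (int[] step : STEPS) {
            int nx = x + step[0];
            int ny = y + step[1];
            boolean inside = nx >= 0 && nx <= max && ny >= 0 && ny <= max;
            if (inside && !seen.contains(nx * 10 + ny)) {
                total += countPaths(nx, ny, max, seen);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countPaths(0, 0, 2, new HashSet<>())); // prints: 12
        System.out.println(countPaths(0, 0, 4, new HashSet<>())); // prints: 8512
    }
}
```

Both results match the numbers reported in the EDIT above.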

Updating both a ConcurrentHashMap and an AtomicInteger safely

I have to store words and their corresponding integer indices in a hash map. The hash map will be updated concurrently.
For example: let's say the wordList is {a,b,c,a,d,e,a,d,e,b}
Then the hash map will contain the following key-value pairs
a:1
b:2
c:3
d:4
e:5
The code for this is as follows:
public class Dictionary {
private ConcurrentMap<String, Integer> wordToIndex;
private AtomicInteger maxIndex;
public Dictionary( int startFrom ) {
wordToIndex = new ConcurrentHashMap<String, Integer>();
this.maxIndex = new AtomicInteger(startFrom);
}
public void insertAndComputeIndices( List<String> words ) {
Integer index;
//iterate over the list of words
for ( String word : words ) {
// check if the word exists in the Map
// if it does not exist, increment the maxIndex and put it in the
// Map if it is still absent
// set the maxIndex to the newly inserted index
if (!wordToIndex.containsKey(word)) {
index = maxIndex.incrementAndGet();
index = wordToIndex.putIfAbsent(word, index);
if (index != null)
maxIndex.set(index);
}
}
}
My question is whether the above class is thread safe or not?
Basically an atomic operation in this case should be to increment the maxIndex and then put the word in the hash map if it is absent.
Is there a better way to achieve concurrency in this situation?

Clearly another thread can see maxIndex incrementing and then getting clobbered.
Assuming this is all that is going on to the map (in particular, no removes), then you could try putting the word in the map and only incrementing if that succeeds.
Integer oldIndex = wordToIndex.putIfAbsent(word, -1);
if (oldIndex == null) {
wordToIndex.put(word, maxIndex.incrementAndGet());
}
(Alternatively for a single put, use some sort of mutable type in place of Integer.)
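A different technique from the one in this answer, assuming Java 8 or later is available: ConcurrentHashMap.computeIfAbsent performs the check and the insert atomically per key, so the index is only incremented when the word is really new. The class below is a sketch with made-up names (WordIndex, indexOf), not the asker's Dictionary class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class WordIndex {
    // Declared as ConcurrentHashMap on purpose: its computeIfAbsent is atomic
    // and applies the mapping function at most once per key.
    private final ConcurrentHashMap<String, Integer> wordToIndex = new ConcurrentHashMap<>();
    private final AtomicInteger maxIndex;

    WordIndex(int startFrom) {
        this.maxIndex = new AtomicInteger(startFrom);
    }

    void insertAndComputeIndices(List<String> words) {
        for (String word : words) {
            // The counter is only touched when the word is absent.
            wordToIndex.computeIfAbsent(word, w -> maxIndex.incrementAndGet());
        }
    }

    Integer indexOf(String word) {
        return wordToIndex.get(word);
    }

    public static void main(String[] args) {
        WordIndex dictionary = new WordIndex(0);
        dictionary.insertAndComputeIndices(
                Arrays.asList("a", "b", "c", "a", "d", "e", "a", "d", "e", "b"));
        for (String word : Arrays.asList("a", "b", "c", "d", "e")) {
            System.out.print(word + ":" + dictionary.indexOf(word) + " ");
        }
        // prints: a:1 b:2 c:3 d:4 e:5
    }
}
```

With several threads inserting at once, which word gets which index depends on timing, but no index is skipped or handed out twice.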

No, it is not. If you have two methods A and B, both thread safe, this of course does not mean that calling A and B in a sequence is also thread safe, as a thread can interrupt another one between the function calls. This is what happens here:
if (!wordToIndex.containsKey(word)) {
index = maxIndex.incrementAndGet();
index = wordToIndex.putIfAbsent(word, index);
if (index != null)
maxIndex.set(index);
}
Thread A verifies that wordToIndex does not contain the word "dog" and proceeds inside the if. Before it can add the word "dog", thread B also finds that "dog" is not in the map (A did not add it yet) so it also proceeds inside the if. Now you have the word "dog" trying to be inserted twice.
Of course, putIfAbsent will guarantee that only one thread can add it, but I think that your goal is to not have two threads enter the if at the same time with the same key.

AtomicInteger is something you should consider using.
And you should wrap all the code that needs to happen as a transaction in a synchronized(this) block.

The other answers are correct: there are non-thread-safe fields in your class. What you should do, to start, is make sure you know how to implement the threading:
1) I would make sure everything internal is private, although this is not a requirement of thread-safe code.
2) Find all of your accessor methods and make sure they are synchronized wherever the state of the shared object is modified (OR AT LEAST THAT THE IF BLOCK IS SYNCHRONIZED).
3) Test for deadlocks or bad counts. This can be done in a unit test by making sure the value of maxIndex is correct after 10000 threaded inserts, for example...

Java java.util.ConcurrentModificationException error

Please, can anybody help me solve this problem? I have not been able to fix this error for many days. I tried using a synchronized method and other approaches, but nothing worked, so please help me.
Error
java.util.ConcurrentModificationException
at java.util.AbstractList$Itr.checkForComodification(Unknown Source)
at java.util.AbstractList$Itr.remove(Unknown Source)
at JCA.startAnalysis(JCA.java:103)
at PrgMain2.doPost(PrgMain2.java:235)
Code
public synchronized void startAnalysis() {
//set Starting centroid positions - Start of Step 1
setInitialCentroids();
Iterator<DataPoint> n = mDataPoints.iterator();
//assign DataPoint to clusters
loop1:
while (true) {
for (Cluster c : clusters)
{
c.addDataPoint(n.next());
if (!n.hasNext())
break loop1;
}
}
//calculate E for all the clusters
calcSWCSS();
//recalculate Cluster centroids - Start of Step 2
for (Cluster c : clusters) {
c.getCentroid().calcCentroid();
}
//recalculate E for all the clusters
calcSWCSS();
// List copy = new ArrayList(originalList);
//synchronized (c) {
for (int i = 0; i < miter; i++) {
//enter the loop for cluster 1
for (Cluster c : clusters) {
for (Iterator<DataPoint> k = c.getDataPoints().iterator(); k.hasNext(); ) {
// synchronized (k) {
DataPoint dp = k.next();
System.out.println("Value of DP" +dp);
//pick the first element of the first cluster
//get the current Euclidean distance
double tempEuDt = dp.getCurrentEuDt();
Cluster tempCluster = null;
boolean matchFoundFlag = false;
//call testEuclidean distance for all clusters
for (Cluster d : clusters) {
//if testEuclidean < currentEuclidean then
if (tempEuDt > dp.testEuclideanDistance(d.getCentroid())) {
tempEuDt = dp.testEuclideanDistance(d.getCentroid());
tempCluster = d;
matchFoundFlag = true;
}
//if statement - Check whether the Last EuDt is > Present EuDt
}
//for variable 'd' - Looping between different Clusters for matching a Data Point.
//add DataPoint to the cluster and calcSWCSS
if (matchFoundFlag) {
tempCluster.addDataPoint(dp);
//k.notify();
// if(k.hasNext())
k.remove();
for (Cluster d : clusters) {
d.getCentroid().calcCentroid();
}
//for variable 'd' - Recalculating centroids for all Clusters
calcSWCSS();
}
//if statement - A Data Point is eligible for transfer between Clusters.
// }// syn
}
//for variable 'k' - Looping through all Data Points of the current Cluster.
}//for variable 'c' - Looping through all the Clusters.
}//for variable 'i' - Number of iterations.
// syn
}

You can't modify a list while you're iterating it, unless you do it through the Iterator.
From the API: ConcurrentModificationException
This exception may be thrown by methods that have detected concurrent modification of an object when such modification is not permissible.
For example, it is not generally permissible for one thread to modify a Collection while another thread is iterating over it.
Your code is a mess, so it's hard to figure out what's going on, but I'd check for:
Shared references
All remove AND add

I think that simply looking up the javadoc for ConcurrentModificationException would have answered your question. Did you try that?
Iterator.remove() is causing the exception, presumably on the line k.remove(). This means you modified the List it is iterating over while iterating, which is not allowed. So you need to figure out where c.getDataPoints() is changing. I am guessing it is because you eventually find a cluster d, assign it to tempCluster, and then change its data points (which are eventually the list you are iterating over).
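The guess above can be reproduced in isolation. The sketch below assumes that addDataPoint ends up appending to the very list being iterated (for example when tempCluster turns out to be the current cluster c); the class name and the Integer "data points" are stand-ins invented for the demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class RemoveAfterAdd {
    static String tryRemoveAfterAdd() {
        List<Integer> dataPoints = new ArrayList<>(Arrays.asList(1, 2, 3));
        Iterator<Integer> k = dataPoints.iterator();
        Integer dp = k.next();

        // Stand-in for tempCluster.addDataPoint(dp) hitting the same list:
        // a modification that bypasses the iterator.
        dataPoints.add(dp);

        try {
            k.remove(); // the iterator notices the list changed behind its back
            return "removed";
        } catch (ConcurrentModificationException e) {
            return "ConcurrentModificationException";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryRemoveAfterAdd());
        // prints: ConcurrentModificationException
    }
}
```

Note that no second thread is involved, which is why adding synchronized does not help.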

If you need to delete a few elements from your list, you can maintain a second list of the elements to be removed, and call removeAll(collection) at the end. Of course, this is not a good fit for huge data sets.
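A minimal sketch of that idea, with made-up names and a simple "remove the even numbers" rule standing in for the real condition:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DeferredRemoval {
    // Collects the elements to drop during iteration, removes them afterwards.
    static List<Integer> removeEvens(List<Integer> source) {
        List<Integer> toRemove = new ArrayList<>();
        for (Integer value : source) {
            if (value % 2 == 0) {
                toRemove.add(value); // only remember it; do not touch source here
            }
        }
        source.removeAll(toRemove); // safe: the iteration is already finished
        return source;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        System.out.println(removeEvens(numbers)); // prints: [1, 3, 5]
    }
}
```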

Keep a few things in mind to avoid concurrent access issues:
First of all, the method (startAnalysis) is an instance method, so the synchronization is specific to its instance. You need to make sure that all the threads calling this method use the same instance. If every thread refers to a different instance, all the threads will be allowed to execute the method at the same time, which may eventually lead to concurrency issues.
Secondly, prefer an explicit Iterator over the for-each loop when you need to remove elements while iterating over a collection, since Iterator.remove() is the safe way to do that.
You can also use the concurrent collection classes, which are commonly used in such situations to avoid concurrent modification issues.
Hope this helps.
