Under which circumstances can toSet throw a java.lang.IllegalArgumentException? - java

Based on our Crashlytics logs it seems that we're running into the following exception from time to time:
Fatal Exception: java.lang.IllegalArgumentException
Illegal initial capacity: -1
...
java.util.HashMap.<init> (HashMap.java:448)
java.util.LinkedHashMap.<init> (LinkedHashMap.java:371)
java.util.HashSet.<init> (HashSet.java:161)
java.util.LinkedHashSet.<init> (LinkedHashSet.java:146)
kotlin.collections.CollectionsKt___CollectionsKt.toSet (CollectionsKt___CollectionsKt.java:1316)
But we're not sure how it is actually possible for this exception to be thrown. The relevant code looks something like this:
private val markersMap = mutableMapOf<Any, Marker>()
...
synchronized(markersMap) {
    val currentMarkers = markersMap.values.toSet() // it crashes here
    // performing some operation on the markers
}
Right now we suspect multithreading is causing the issue, since markersMap is modified in multiple places, but because the map is initialized up front we're not sure how it could ever end up in a "less than empty" state. We also took a look at the toSet implementation:
if (this is Collection) {
    return when (size) {
        0 -> emptySet()
        1 -> setOf(if (this is List) this[0] else iterator().next())
        else -> toCollection(LinkedHashSet<T>(mapCapacity(size)))
    }
}
Based on this, we'd assume that mapCapacity(size) returns -1, but we weren't able to find the actual implementation of mapCapacity to verify when this can happen.
Does anybody know when -1 is returned here, which in turn causes the constructor to fail?

Java collections are not synchronized; if you need to access a Map (or any other collection) from multiple threads, you are responsible for the synchronization yourself, as stated in LinkedHashMap's Javadoc:
Note that this implementation is not synchronized. If multiple threads
access a linked hash map concurrently, and at least one of the threads
modifies the map structurally, it must be synchronized externally.
My guess is that you are performing structural modifications (a mix of put and remove) on the map without synchronization, which can cause this issue. For example:
fun main() {
    val markersMap = mutableMapOf<Any, Any>()
    (1..1000).forEach { markersMap.put(it, "$it") }
    val t1 = Thread {
        (1..1000).forEach {
            markersMap.remove(it)
            if (markersMap.size < 0) {
                print("SIZE IS ${markersMap.size}")
            }
        }
    }
    val t2 = Thread {
        (1..1000).forEach {
            markersMap.remove(it)
            if (markersMap.size < 0) {
                print("SIZE IS ${markersMap.size}")
            }
        }
    }
    t1.start()
    t2.start()
}
On my machine this code prints SIZE IS -128, SIZE IS -127 and many other negative values, and when I added markersMap.values.toSet() inside one of the if blocks it crashed with the same IllegalArgumentException as in your stack trace.
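For contrast, here is a minimal sketch of the same scenario in Java with every access guarded by the same lock; once both the removal and the size check happen under synchronized, the size counter can no longer be observed as negative (the class and variable names are illustrative):
import java.util.HashMap;
import java.util.Map;

public class SynchronizedRemoval {
    // Shared map; every access below is guarded by the same lock (the map itself),
    // so its internal size counter is never seen in a corrupted state.
    private static final Map<Integer, String> markers = new HashMap<>();

    public static void main(String[] args) throws InterruptedException {
        for (int i = 1; i <= 1000; i++) {
            markers.put(i, String.valueOf(i));
        }
        Runnable remover = () -> {
            for (int i = 1; i <= 1000; i++) {
                synchronized (markers) { // guard the structural modification
                    markers.remove(i);
                    if (markers.size() < 0) {
                        System.out.println("SIZE IS " + markers.size()); // never prints
                    }
                }
            }
        };
        Thread t1 = new Thread(remover);
        Thread t2 = new Thread(remover);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("final size = " + markers.size()); // always 0
    }
}
The same applies to the original Kotlin code: every place that reads or mutates markersMap has to synchronize on the same lock, not just the toSet call site.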

Related

Reducing the scope of a synchronized block in Java unexpectedly corrupts my ArrayList, why is that the case?

A bit late, but I have a Christmas special for you. There is a Santa class with an ArrayList of presents and a Map to keep track of which children have already received their presents. Children are modeled as threads that constantly ask Santa for presents at the same time. For simplicity, each child receives exactly one (random) present.
Here is the method in the Santa class that occasionally throws an IllegalArgumentException from random.nextInt(presents.size()).
public Present givePresent(Child child) {
    if (gotPresent.containsKey(child) && !gotPresent.get(child)) {
        synchronized (this) {
            gotPresent.put(child, true);
            Random random = new Random();
            int randomIndex = random.nextInt(presents.size());
            Present present = presents.get(randomIndex);
            presents.remove(present);
            return present;
        }
    }
    return null;
}
However, making the whole method synchronized works just fine. I don't really understand the problem with the smaller synchronized block shown above. From my point of view, it should still ensure that a present isn't assigned to a kid multiple times, and there shouldn't be concurrent writes (or reads) on the presents ArrayList. Could you please tell me why my assumption is wrong?
That happens because the code contains a race condition. Let us use the following example to illustrate that race condition.
Imagine that Thread 1 reads
`if(gotPresent.containsKey(child) && !gotPresent.get(child))`
and it evaluates as true. While Thread 1 enters the synchronized block, another thread (i.e., Thread 2) also reads
if(gotPresent.containsKey(child) && !gotPresent.get(child))
before Thread 1 has had the time to do gotPresent.put(child, true);. Consequently, the aforementioned if also evaluates as true for Thread 2.
Thread 1 is inside the synchronized(this) block and removes the present from the list of presents (i.e., presents.remove(present);); assuming that was the last remaining present, the size of the list is now 0. Thread 1 exits the synchronized block while Thread 2 enters it, and Thread 2 eventually calls
int randomIndex = random.nextInt(presents.size());
Since presents.size() returns 0, and the random.nextInt implementation is as follows:
public int nextInt(int bound) {
    if (bound <= 0)
        throw new IllegalArgumentException(BadBound);
    ...
}
you get the IllegalArgumentException.
However, making the whole method synchronized works just fine.
Yes, because with
synchronized (this) {
    if (gotPresent.containsKey(child) && !gotPresent.get(child)) {
        gotPresent.put(child, true);
        Random random = new Random();
        int randomIndex = random.nextInt(presents.size());
        Present present = presents.get(randomIndex);
        presents.remove(present);
        return present;
    }
}
in the aforementioned race-condition example Thread 2 would have been waiting before the
if(gotPresent.containsKey(child) && !gotPresent.get(child))
and because Thread 1, before exiting the synchronized block, would have done
gotPresent.put(child, true);
by the time Thread 2 would have entered the synchronized block the following statement
!gotPresent.get(child)
would have evaluated as false, and consequently Thread 2 would have exited immediately without calling int randomIndex = random.nextInt(presents.size()); on a list of size 0.
Since the method you have shown is executed in parallel by multiple threads, you need to ensure mutual exclusion on the shared data structures, namely gotPresent and presents. That implies, for instance, that operations like containsKey, get, and put must be performed within the same synchronized block.
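For reference, this is essentially what the fully synchronized version described in the question looks like, as a self-contained sketch (Present, Child and the two fields are stand-ins for the classes from the question):
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class Santa {
    // Placeholder types standing in for the Present and Child classes of the question.
    static class Present {}
    static class Child {}

    private final List<Present> presents = new ArrayList<>();
    private final Map<Child, Boolean> gotPresent = new HashMap<>();

    // Equivalent to declaring the method synchronized: the check-then-act on gotPresent
    // and the read-then-remove on presents now run under one lock.
    public Present givePresent(Child child) {
        synchronized (this) {
            if (gotPresent.containsKey(child) && !gotPresent.get(child)) {
                gotPresent.put(child, true);
                Random random = new Random();
                int randomIndex = random.nextInt(presents.size());
                Present present = presents.get(randomIndex);
                presents.remove(present);
                return present;
            }
            return null;
        }
    }
}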

Not thread safe class

Why is the class below not thread-safe?
public class UnsafeCachingFactorizer implements Servlet {
    private final AtomicReference<BigInteger> lastNumber = new AtomicReference<>();
    private final AtomicReference<BigInteger[]> lastFactors = new AtomicReference<>();

    public void service(ServletRequest req, ServletResponse resp) {
        BigInteger i = extractFromRequest(req);
        if (i.equals(lastNumber.get())) {
            encodeIntoResponse(resp, lastFactors.get());
        } else {
            BigInteger[] factors = factor(i);
            lastNumber.set(i);
            lastFactors.set(factors);
            encodeIntoResponse(resp, factors);
        }
    }
}
The instance variables are thread-safe, so why is the whole class not thread-safe?
It's not thread safe because you don't always get the right answer when multiple threads call the code.
Let's say that lastNumber=1 and lastFactors=factors(1). In the one-thread case, where the thread calls with i=1:
T1: if (lastNumber.get().equals(1)) { // true
T1: encodeIntoResponse(resp, lastFactors.get());
Fine, this is the expected result. But consider a multi-threaded case, where the actions within each thread take place in the same order but can interleave arbitrarily. One such interleaving is (where i=1 and i=2 for the two threads respectively):
T1: if (lastNumber.get().equals(1)) { // true
T2: if (lastNumber.get().equals(2)) { // false
T2: } else {
T2: lastNumber.set(2);
T2: lastFactors.set(factors(2));
T1: encodeIntoResponse(resp, lastFactors.get()); // oops! You wrote the factors(2), not factors(1).
The problem is that you're not getting and setting the AtomicReferences atomically: that is, there is nothing to stop another thread sneaking in and changing the values (of one or either) between the get and the set.
In general, whilst individual calls to methods on an AtomicReference are atomic, multiple calls are not (and they definitely aren't atomic between instances of AtomicReference). So, if you ever find yourself writing code like:
if (/* some condition with ref.get() */) {
/* some statement with ref.set() */
}
then you probably aren't using AtomicReference correctly (or, at least, it's not thread-safe).
To fix this, you need something that can be read and set atomically. For example, create a simple class to hold both:
class Holder {
    final BigInteger number;
    final BigInteger[] factors;

    Holder(BigInteger number, BigInteger[] factors) {
        this.number = number;
        this.factors = factors;
    }
}
Then store this in a single AtomicReference, and use updateAndGet:
BigInteger[] factors = holderRef.updateAndGet(h -> {
    if (h != null && h.number.equals(i)) {
        return h;
    }
    return new Holder(i, factor(i));
}).factors;
encodeIntoResponse(resp, factors);
Upon reflection, updateAndGet isn't necessarily the right way to do this. If factors sometimes takes a long time to compute, then a long-time computation might get done many times, because lots of other shorter-time computations preempt it, so the update function keeps having to be called.
Instead, you can just always set the reference if you had to recompute it:
Holder h = holderRef.get();
if (h == null || !h.number.equals(i)) {
    h = new Holder(i, factor(i));
    holderRef.set(h);
}
return h.factors;
This may seem to violate what I said previously, in that separate calls to holderRef are not atomic, and thus not thread-safe.
It's a bit more nuanced, however: my first paragraph states that the lack of thread safety in the original code stems from the fact that you might get the factors for the wrong input. This problem doesn't occur here: you either get the holder for the right number (and hence the factors for the right number), or you compute the factors for the input.
The issue arises in what this holder is actually meant to be storing: the "last" number/factors is rather hard to define in terms of multithreading. When are you measuring "last-ness" from? The most recent call to start? The most recent call to finish? Other?
This code simply stores "a" previously computed value, without attempting to nail down this ambiguity.
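Putting the pieces together, a minimal sketch of that last approach (the class name CachingFactorizer and the method factorsOf are illustrative, and the factoring itself is stubbed out since it is not shown in the question):
import java.math.BigInteger;
import java.util.concurrent.atomic.AtomicReference;

public class CachingFactorizer {
    // Immutable pair of an input and its factors, published atomically as a unit.
    private static final class Holder {
        final BigInteger number;
        final BigInteger[] factors;

        Holder(BigInteger number, BigInteger[] factors) {
            this.number = number;
            this.factors = factors;
        }
    }

    private final AtomicReference<Holder> cache = new AtomicReference<>();

    public BigInteger[] factorsOf(BigInteger i) {
        Holder h = cache.get();               // one atomic read of the whole pair
        if (h == null || !h.number.equals(i)) {
            h = new Holder(i, factor(i));     // recompute outside any lock
            cache.set(h);                     // publish number and factors together
        }
        return h.factors;
    }

    private BigInteger[] factor(BigInteger i) {
        // Placeholder for the real factoring logic.
        return new BigInteger[] { i };
    }
}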

Can a forEach lambda result in a race condition?

I am unsure how lambdas work in practice, and I am concerned because under certain circumstances lambdas can result in errors such as ConcurrentModificationException if used incorrectly, which seems indicative of a race condition.
Consider the code below.
private class deltaCalculator<T> {
    Double valueA;
    Double valueB;

    // Init delta
    volatile Double valueDelta = null;

    private void calculateMinimum(List<T> dataSource) {
        dataSource.forEach(entry -> {
            valueA = entry.getA();
            valueB = entry.getB();
            Double dummyDelta = Math.abs(valueA - valueB);
            if (valueDelta == null) {
                setDelta(dummyDelta);
            } else {
                setDelta((valueDelta > dummyDelta) ? dummyDelta : valueDelta);
            }
        });
    }

    private void setDelta(Double d) {
        this.valueDelta = d;
    }
}
How does the forEach loop operate? Do different calls get passed to different threads where the JVM considers it appropriate, opening up the possibility of a race condition that could lead to incorrect minimum calculation?
If not, why can a forEach lambda throw a ConcurrentModificationException?
You'll get a ConcurrentModificationException if you try to modify the collection you're iterating over while the forEach runs. This could be done from a separate thread entirely, but it much more commonly happens when you modify the collection inside the loop body.
Do different calls get passed to different threads where the JVM considers it appropriate, opening up the possibility of a race condition that could lead to incorrect minimum calculation?
No. No multithreading is taking place in your example above: Iterable.forEach simply invokes the lambda for each element, sequentially, on the calling thread. Only a parallel stream's forEach would involve multiple threads.
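To make that concrete, here is a minimal single-threaded sketch (the class name is illustrative): the list is structurally modified inside its own forEach, so the fail-fast check detects the change and throws, even though only one thread is involved.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ForEachCme {
    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>(Arrays.asList(1, 2, 3));

        // Single thread, no parallelism: removing an element while forEach iterates
        // still throws ConcurrentModificationException.
        values.forEach(v -> {
            if (v == 2) {
                values.remove(v);
            }
        });
    }
}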

Does ConcurrentHashMap need synchronization when incrementing its values?

I know ConcurrentHashMap's individual operations (e.g. putIfAbsent, replace, etc.) are thread-safe, but I was wondering: is a block of code like the one below safe?
if (accumulator.containsKey(key)) { // accumulator is a ConcurrentHashMap
    accumulator.put(key, accumulator.get(key) + 1);
} else {
    accumulator.put(key, 0);
}
Keep in mind that the accumulator value for a key may be requested by two different threads simultaneously, which would cause a problem with a normal HashMap. So do I need something like this?
ConcurrentHashMap<Integer, Object> locks;
...
locks.putIfAbsent(key, new Object());
synchronized (locks.get(key)) {
    if (accumulator.containsKey(key)) {
        accumulator.put(key, accumulator.get(key) + 1);
    } else {
        accumulator.put(key, 0);
    }
}
if (accumulator.containsKey(key)) { // accumulator is a ConcurrentHashMap
    accumulator.put(key, accumulator.get(key) + 1);
} else {
    accumulator.put(key, 0);
}
No, this code is not thread-safe: the value of accumulator.get(key) can change between the get and the put, and an entry can be added by another thread between the containsKey and the put. If you're on Java 8, you can write accumulator.compute(key, (k, v) -> (v == null) ? 0 : v + 1), or any of the many equivalents, and it'll work. If you're not, the thing to do is to write something like
while (true) {
    Integer old = accumulator.get(key);
    if (old == null) {
        if (accumulator.putIfAbsent(key, 0) == null) {
            // note: it's a little surprising that you want to put 0 in this case;
            // are you sure you don't mean 1?
            break;
        }
    } else if (accumulator.replace(key, old, old + 1)) {
        break;
    }
}
...which loops until it manages to make the atomic swap. This sort of loop is pretty much how you have to do it: it's how AtomicInteger works, and what you're asking for is effectively an AtomicInteger across many keys.
Alternatively, you can use a library: e.g. Guava has AtomicLongMap and ConcurrentHashMultiset, which also do things like this.
I think the best solution for you would be to use an AtomicInteger as the map value. The nice feature here is that it is non-blocking, mutable, and thread-safe. You could use the replace method offered by the CHM, but with that you would have to hold the lock on the segment/bucket entry until the replace completes.
With an AtomicInteger you leverage quick, non-blocking updates.
ConcurrentMap<Key, AtomicInteger> map;
then
map.get(key).incrementAndGet();
If you are using Java 8, LongAdder would be better.
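One detail worth guarding against with that pattern: map.get(key) returns null until a counter has been created for the key. A small sketch of the complete pattern on Java 8+, assuming a String key (computeIfAbsent creates the AtomicInteger atomically on first use):
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CounterMap {
    private final ConcurrentMap<String, AtomicInteger> counts = new ConcurrentHashMap<>();

    // Creates the counter atomically on first use, then does a lock-free increment.
    int increment(String key) {
        return counts.computeIfAbsent(key, k -> new AtomicInteger(0)).incrementAndGet();
    }

    int current(String key) {
        AtomicInteger c = counts.get(key);
        return c == null ? 0 : c.get();
    }
}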
You are correct that your first code snippet is unsafe. It's entirely possible for a thread to be preempted right after the check has been performed and for another thread to begin executing. Therefore, in the first snippet the following could happen:
[Thread 1]: Check for key, return false
[Thread 2]: Check for key, return false
[Thread 2]: Put value 0 in for key
[Thread 1]: Put value 0 in for key
In this example, the behavior you would want would leave the value for that key at 1, not 0.
Therefore locking is necessary.
Only individual actions on ConcurrentHashMap are thread-safe; taking multiple actions in sequence is not. Your first block of code is not thread-safe. It is possible, for example:
THREAD A: accumulator.containsKey(key) = false
THREAD B: accumulator.containsKey(key) = false
THREAD B: accumulator.put(key, 0)
THREAD A: accumulator.put(key, 0)
Similarly, it is not thread-safe to get the accumulator value for a given key, increment it, and then put it back in the map. That is a three-step process, and another thread can interleave at any point.
Your second synchronized block of code is thread-safe.

ArrayList vs Vector - does this illustrate difference in synchronization?

I'm trying to understand the difference in behaviour between an ArrayList and a Vector. Does the following snippet in any way illustrate the difference in synchronization? The output for the ArrayList (f1) is unpredictable, while the output for the Vector (f2) is predictable. I think it may just be luck that f2 has predictable output, because modifying f2 slightly to make the thread sleep for even a millisecond (f3) results in an empty vector! What's causing that?
public class D implements Runnable {
    ArrayList<Integer> al;
    Vector<Integer> vl;

    public D(ArrayList al_, Vector vl_) {
        al = al_;
        vl = vl_;
    }

    public void run() {
        if (al.size() < 20)
            f1();
        else
            f2();
    }

    public void f1() {
        if (al.size() == 0)
            al.add(0);
        else
            al.add(al.get(al.size() - 1) + 1);
    }

    public void f2() {
        if (vl.size() == 0)
            vl.add(0);
        else
            vl.add(vl.get(vl.size() - 1) + 1);
    }

    public void f3() {
        if (vl.size() == 0) {
            try {
                Thread.sleep(1);
                vl.add(0);
            } catch (InterruptedException e) {
                System.out.println(e.getMessage());
            }
        } else {
            vl.add(vl.get(vl.size() - 1) + 1);
        }
    }

    public static void main(String... args) {
        Vector<Integer> vl = new Vector<Integer>(20);
        ArrayList<Integer> al = new ArrayList<Integer>(20);
        for (int i = 1; i < 40; i++) {
            new Thread(new D(al, vl), Integer.toString(i)).start();
        }
    }
}
To answer the question: yes, Vector is synchronized. This means that concurrent operations on the data structure itself won't corrupt it or lead to unexpected behaviour (e.g. NullPointerExceptions). Hence calls like size() are perfectly safe on a Vector in concurrent situations, but not on an ArrayList (note that if there are only read accesses, ArrayLists are safe too; we get into trouble as soon as at least one thread writes to the data structure, e.g. via add or remove).
The problem is that this low-level synchronization is basically useless, and your code already demonstrates why.
if (al.size() == 0)
    al.add(0);
else
    al.add(al.get(al.size() - 1) + 1);
What you want here is to add a number to the data structure depending on its current size (i.e. if N threads execute this, in the end we'd want the list to contain the numbers [0..N)). Sadly that does not work:
Assume that 2 threads execute this code sample concurrently on an empty list/vector. The following timeline is quite possible:
T1: size() # go to true branch of if
T2: size() # alas we again take the true branch.
T1: add(0)
T2: add(0) # ouch
Both execute size() and get back the value 0. They then both take the true branch of the if and both add 0 to the data structure. That's not what you want.
Hence you'll have to synchronize in your business logic anyhow to make sure that size() and add() are executed atomically, which makes the synchronization of Vector itself useless in almost any scenario. (Contrary to some claims, on modern JVMs the performance hit of an uncontended lock is negligible; but the Collections API is much nicer, so why not use it?)
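A minimal sketch of what "synchronize in your business logic" means here (the class name AppendNext is illustrative): the check-then-add compound action runs under one explicit lock, which makes it safe with a plain ArrayList and makes Vector's per-method locking redundant.
import java.util.List;

public class AppendNext implements Runnable {
    private final List<Integer> list;

    AppendNext(List<Integer> list) {
        this.list = list;
    }

    @Override
    public void run() {
        // size()/get()/add() execute as a single atomic step because every
        // thread uses the same lock for the whole compound action.
        synchronized (list) {
            if (list.isEmpty()) {
                list.add(0);
            } else {
                list.add(list.get(list.size() - 1) + 1);
            }
        }
    }
}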
In The Beginning (Java 1.0) there was the "synchronized vector".
Which entailed a potentially HUGE performance hit.
Hence the addition of "ArrayList" and friends in Java 1.2 onwards.
Your code illustrates the rationale for making vectors synchronized in the first place. But it's simply unnecessary most of the time, and better done in other ways most of the rest of the time.
IMHO...
PS:
An interesting link:
http://www.coderanch.com/t/523384/java/java/ArrayList-Vector-size-incrementation
Vectors are thread-safe; ArrayLists are not. That is why ArrayList is faster than Vector.
The link below has good info about this:
http://www.javaworld.com/javaworld/javaqa/2001-06/03-qa-0622-vector.html
I'm trying to understand the difference in behaviour of an ArrayList
and a Vector
Vector is synchronized while ArrayList is not. ArrayList is not thread-safe.
Does the following snippet in any way illustrate the difference in
synchronization ?
No difference, since only Vector is synchronized.
