Can a forEach lambda result in a race condition? - java

I am unsure of how lambdas work in practice, and I am concerned since under certain circumstances, lambdas can result in errors such as ConcurrentModificationExceptions if you use them incorrectly, which seems to be indicative of a race condition.
Consider the code below.
private class deltaCalculator {
    Double valueA;
    Double valueB;
    //Init delta
    volatile Double valueDelta = null;

    private void calculateMinimum(List<T> dataSource) {
        dataSource.forEach((entry -> {
            valueA = entry.getA();
            valueB = entry.getB();
            Double dummyDelta;
            dummyDelta = Math.abs(valueA - valueB);
            if (valueDelta == null) {
                setDelta(dummyDelta);
            } else {
                setDelta((valueDelta > dummyDelta) ? dummyDelta : valueDelta);
            }
        }));
    }

    private void setDelta(Double d) {
        this.valueDelta = d;
    }
}
How does the forEach loop operate? Do different calls get passed to different threads where the JVM considers it appropriate, opening up the possibility of a race condition that could lead to incorrect minimum calculation?
If not, why can a forEach lambda throw a ConcurrentModificationException?

You'll get a ConcurrentModificationException if you try to modify the collection that you're iterating over while the for each loop runs. This could be done in a separate thread entirely, but much more commonly occurs when you try to modify the collection in the loop body.
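A minimal sketch of the single-threaded case described above, assuming Java 9+ for List.of. The lambda adds to the same ArrayList it is iterating, and ArrayList.forEach detects the structural modification and throws. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class ForEachCme {
    // Returns true if modifying the list inside its own forEach triggers a CME.
    static boolean modifyDuringForEach() {
        List<Integer> list = new ArrayList<>(List.of(1, 2, 3));
        try {
            // Structural modification from the loop body, on the same single thread.
            list.forEach(x -> {
                if (x == 1) {
                    list.add(4);
                }
            });
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + modifyDuringForEach());
    }
}
```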
Do different calls get passed to different threads where the JVM considers it appropriate, opening up the possibility of a race condition that could lead to incorrect minimum calculation?
No. No multithreading is taking place in your example above.
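To check this yourself, the sketch below (hypothetical class name, Java 9+) records which thread runs the lambda body. List.forEach is a plain sequential loop on the calling thread, so only one thread name is recorded. Only a parallel stream, for example dataSource.parallelStream().forEach(...), may run the lambda on several ForkJoinPool worker threads, and in that case shared fields such as valueA and valueDelta would indeed be racy.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ForEachThreads {
    // Collects the names of the threads that executed the lambda body.
    static Set<String> threadsUsedBySequentialForEach(List<Integer> data) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        data.forEach(x -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(5, 3, 8, 1);
        // Only the calling thread's name is printed, e.g. [main]
        System.out.println(threadsUsedBySequentialForEach(data));
    }
}
```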

Related

Under which circumstances can toSet throw a java.lang.IllegalArgumentException?

Based on our Crashlytics logs it seems that we're running into the following exception from time to time:
Fatal Exception: java.lang.IllegalArgumentException
Illegal initial capacity: -1
...
java.util.HashMap.<init> (HashMap.java:448)
java.util.LinkedHashMap.<init> (LinkedHashMap.java:371)
java.util.HashSet.<init> (HashSet.java:161)
java.util.LinkedHashSet.<init> (LinkedHashSet.java:146)
kotlin.collections.CollectionsKt___CollectionsKt.toSet (CollectionsKt___CollectionsKt.java:1316)
But we're not sure when it is possible that this exception is actually thrown. The relevant code for this statement looks something like this:
private val markersMap = mutableMapOf<Any, Marker>()
...
synchronized(markersMap) {
    val currentMarkers = markersMap.values.toSet() //it crashes here
    // performing some operation on the markers
}
Right now we're suspecting multithreading to cause the issue as the markersMap is modified in multiple places, but as the map is already initialized by default we're not really sure how it can end up in less than an empty state. We also took a look at the toSet implementation:
if (this is Collection) {
    return when (size) {
        0 -> emptySet()
        1 -> setOf(if (this is List) this[0] else iterator().next())
        else -> toCollection(LinkedHashSet<T>(mapCapacity(size)))
    }
}
Based on this, we'd assume that mapCapacity(size) returns -1, but we weren't able to find the actual implementation of mapCapacity to verify when this can happen.
Does anybody know when -1 is returned here, which in turn causes the constructor to fail?
Java collections are not synchronized: if you need to access a Map or any other collection from multiple threads, you are responsible for the synchronization yourself. As stated in LinkedHashMap's class documentation:
Note that this implementation is not synchronized. If multiple threads
access a linked hash map concurrently, and at least one of the threads
modifies the map structurally, it must be synchronized externally.
My guess is that you are performing structural modifications (a mix of put and remove) on the map without synchronization, which can cause this issue. Wrapping toSet() in synchronized(markersMap) only helps if every put and remove elsewhere also synchronizes on the same markersMap object; unsynchronized writers can still corrupt the map's internal size counter. For example:
fun main() {
    val markersMap = mutableMapOf<Any, Any>()
    (1..1000).forEach { markersMap.put(it, "$it") }
    val t1 = Thread {
        (1..1000).forEach {
            markersMap.remove(it)
            if (markersMap.size < 0) {
                print("SIZE IS ${markersMap.size}")
            }
        }
    }
    val t2 = Thread {
        (1..1000).forEach {
            markersMap.remove(it)
            if (markersMap.size < 0) {
                print("SIZE IS ${markersMap.size}")
            }
        }
    }
    t1.start()
    t2.start()
}
On my machine, this code prints SIZE IS -128, SIZE IS -127, and many other negative values, and when I added markersMap.values.toSet() inside one of the if blocks, the same kind of IllegalArgumentException was thrown.
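A minimal Java sketch of the fix, assuming every access goes through one lock object: put, remove, size, and the snapshot all synchronize on markersMap, so two concurrent removers can no longer drive the size negative. GuardedMarkers and its methods are hypothetical names. In Kotlin, replacing mutableMapOf() with a java.util.concurrent.ConcurrentHashMap is another option.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

public class GuardedMarkers {
    // Plain HashMap; thread safety comes from always locking on this same object.
    private final Map<Integer, String> markersMap = new HashMap<>();

    void put(int key, String value) {
        synchronized (markersMap) { markersMap.put(key, value); }
    }

    void remove(int key) {
        synchronized (markersMap) { markersMap.remove(key); }
    }

    int size() {
        synchronized (markersMap) { return markersMap.size(); }
    }

    // Copy of the values, taken under the same lock the writers use.
    Set<String> snapshot() {
        synchronized (markersMap) { return new LinkedHashSet<>(markersMap.values()); }
    }

    // Two threads remove the same keys concurrently, as in the Kotlin example.
    // Returns true if a negative size was observed or the map did not end up empty.
    static boolean raceSymptomsObserved() {
        GuardedMarkers m = new GuardedMarkers();
        for (int i = 1; i <= 1000; i++) m.put(i, String.valueOf(i));
        AtomicBoolean negative = new AtomicBoolean(false);
        Runnable remover = () -> {
            for (int i = 1; i <= 1000; i++) {
                m.remove(i);
                if (m.size() < 0) negative.set(true);
                m.snapshot();
            }
        };
        Thread t1 = new Thread(remover);
        Thread t2 = new Thread(remover);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return negative.get() || m.size() != 0;
    }

    public static void main(String[] args) {
        System.out.println("Race symptoms observed: " + raceSymptomsObserved());
    }
}
```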

Not thread safe class

Why is the class below not thread-safe?
public class UnsafeCachingFactorizer implements Servlet {
    private final AtomicReference<BigInteger> lastNumber = new AtomicReference<>();
    private final AtomicReference<BigInteger[]> lastFactors = new AtomicReference<>();

    public void service(ServletRequest req, ServletResponse resp) {
        BigInteger i = extractFromRequest(req);
        if (i.equals(lastNumber.get())) {
            encodeIntoResponse(resp, lastFactors.get());
        } else {
            BigInteger[] factors = factor(i);
            lastNumber.set(i);
            lastFactors.set(factors);
            encodeIntoResponse(resp, factors);
        }
    }
}
The instance variables are thread-safe, so why is the whole class not thread-safe?
It's not thread safe because you don't always get the right answer when multiple threads call the code.
Let's say that lastNumber=1 and lastFactors=factors(1). In the one-thread case, where the thread calls with i=1:
T1: if (lastNumber.get().equals(1)) { // true
T1: encodeIntoResponse(resp, lastFactors.get());
Fine, this is the expected result. But consider a multi-threaded case, where the actions within each thread take place in the same order but can be arbitrarily interleaved. One such interleaving is (where i=1 and i=2 for the two threads respectively):
T1: if (lastNumber.get().equals(1)) { // true
T2: if (lastNumber.get().equals(2)) { // false
T2: } else {
T2: lastNumber.set(2);
T2: lastFactors.set(factors(2));
T1: encodeIntoResponse(resp, lastFactors.get()); // oops! You wrote the factors(2), not factors(1).
The problem is that you're not getting and setting the AtomicReferences atomically: that is, there is nothing to stop another thread sneaking in and changing the values (of one or both) between the get and the set.
In general, whilst individual calls to methods on an AtomicReference are atomic, multiple calls are not (and they definitely aren't atomic between instances of AtomicReference). So, if you ever find yourself writing code like:
if (/* some condition with ref.get() */) {
    /* some statement with ref.set() */
}
then you probably aren't using AtomicReference correctly (or, at least, it's not thread-safe).
To fix this, you need something that can be read and set atomically. For example, create a simple class to hold both:
class Holder {
    final BigInteger number;
    final BigInteger[] factors;

    Holder(BigInteger number, BigInteger[] factors) {
        this.number = number;
        this.factors = factors;
    }
}
Then store this in a single AtomicReference, and use updateAndGet:
BigInteger[] factors = holderRef.updateAndGet(h -> {
    if (h != null && h.number.equals(i)) {
        return h;
    }
    return new Holder(i, factor(i));
}).factors;
encodeIntoResponse(resp, factors);
Upon reflection, updateAndGet isn't necessarily the right way to do this. updateAndGet re-runs the update function whenever another thread changes the reference before its compare-and-set succeeds, so if factor() sometimes takes a long time, a long computation might be repeated many times because shorter computations in other threads keep getting in first.
Instead, you can just always set the reference if you had to recompute it:
Holder h = holderRef.get();
if (h == null || !h.number.equals(i)) {
    h = new Holder(i, factor(i));
    holderRef.set(h);
}
return h.factors;
This may seem to violate what I said previously, in that separate calls to holderRef are not atomic, and thus not thread-safe.
It's a bit more nuanced, however: my first paragraph states that the lack of thread safety in the original code stems from the fact that you might get the factors for the wrong input. This problem doesn't occur here: you either get the holder for the right number (and hence the factors for the right number), or you compute the factors for the input.
The issue arises in what this holder is actually meant to be storing: the "last" number/factors is rather hard to define in terms of multithreading. When are you measuring "last-ness" from? The most recent call to start? The most recent call to finish? Other?
This code simply stores "a" previously computed value, without attempting to nail down this ambiguity.
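Putting the pieces together, here is a self-contained sketch of the single-holder approach. The trial-division factor() and the method name factorsOf are hypothetical stand-ins for the servlet's own code, and the request/response handling is left out. Because number and factors live in one immutable Holder behind one AtomicReference, a reader can never pair one request's number with another request's factors.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class CachingFactorizer {
    // Immutable pair: a number and its factors are always published together.
    static final class Holder {
        final BigInteger number;
        final BigInteger[] factors;

        Holder(BigInteger number, BigInteger[] factors) {
            this.number = number;
            this.factors = factors.clone(); // defensive copy keeps the holder immutable
        }
    }

    private final AtomicReference<Holder> holderRef = new AtomicReference<>();

    // Returns the factors of i, reusing the cached holder when its number matches.
    public BigInteger[] factorsOf(BigInteger i) {
        Holder h = holderRef.get(); // one atomic read of number and factors together
        if (h == null || !h.number.equals(i)) {
            h = new Holder(i, factor(i));
            holderRef.set(h); // may overwrite another thread's holder; both are valid
        }
        return h.factors.clone(); // callers cannot mutate the cached array
    }

    // Hypothetical stand-in for factor(): trial division, assumes n >= 1.
    static BigInteger[] factor(BigInteger n) {
        List<BigInteger> result = new ArrayList<>();
        BigInteger d = BigInteger.valueOf(2);
        while (d.multiply(d).compareTo(n) <= 0) {
            while (n.mod(d).signum() == 0) {
                result.add(d);
                n = n.divide(d);
            }
            d = d.add(BigInteger.ONE);
        }
        if (n.compareTo(BigInteger.ONE) > 0) {
            result.add(n);
        }
        return result.toArray(new BigInteger[0]);
    }

    public static void main(String[] args) {
        CachingFactorizer f = new CachingFactorizer();
        System.out.println(Arrays.toString(f.factorsOf(BigInteger.valueOf(12)))); // [2, 2, 3]
    }
}
```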

Thread-safe Map in Java

I understand the overall concepts of multi-threading and synchronization but am new to writing thread-safe code. I currently have the following code snippet:
synchronized (compiledStylesheets) {
    if (compiledStylesheets.containsKey(xslt)) {
        exec = compiledStylesheets.get(xslt);
    } else {
        exec = compile(s, imports);
        compiledStylesheets.put(xslt, exec);
    }
}
where compiledStylesheets is a HashMap (private, final). I have a few questions.
The compile method can take a few hundred milliseconds to return. This seems like a long time to have the object locked, but I don't see an alternative. Also, it is unnecessary to use Collections.synchronizedMap in addition to the synchronized block, correct? This is the only code that hits this object other than initialization/instantiation.
Alternatively, I know of the existence of a ConcurrentHashMap but I don't know if that's overkill. The putIfAbsent() method will not be usable in this instance because it doesn't allow me to skip the compile() method call. I also don't know if it will solve the "modified after containsKey() but before put()" problem, or if that's even really a concern in this case.
For tasks of this nature, I highly recommend Guava caching support.
If you can't use that library, here is a compact implementation of a Multiton. Use of the FutureTask was a tip from assylias, here, via OldCurmudgeon.
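import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public abstract class Cache<K, V>
{
  private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();

  public final V get(K key)
    throws InterruptedException, ExecutionException
  {
    Future<V> ref = cache.get(key);
    if (ref == null) {
      FutureTask<V> task = new FutureTask<>(new Factory(key));
      // Only the thread whose task wins putIfAbsent runs the computation.
      ref = cache.putIfAbsent(key, task);
      if (ref == null) {
        task.run();
        ref = task;
      }
    }
    // Other threads block here until the winning task has finished.
    return ref.get();
  }

  protected abstract V create(K key)
    throws Exception;

  private final class Factory
    implements Callable<V>
  {
    private final K key;

    Factory(K key)
    {
      this.key = key;
    }

    @Override
    public V call()
      throws Exception
    {
      return create(key);
    }
  }
}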
I think you are looking for a Multiton.
There's a very good Java one here that @assylias posted some time ago.
You can loosen the lock at the risk of occasionally compiling a stylesheet twice in a race condition.
Object y;
// lock here if needed
y = map.get(x);
if (y == null) {
    y = compileNewY();
    // lock here if needed
    map.put(x, y); // may run in two threads at once; the later put overwrites the earlier value
    y = map.get(x); // essential, because another thread's y may have been put
}
This requires get and put to be individually atomic, which is true for ConcurrentHashMap and which you can achieve by wrapping individual calls to get and put with a lock in your class. (That is what the "lock here if needed" comments mean: you only need to wrap individual calls, not hold one big lock.)
This is a standard thread safe pattern to use even with ConcurrentHashMap (and putIfAbsent) to minimize the cost of compiling twice. It still needs to be acceptable to compile twice sometimes, but it should be okay even if expensive.
By the way, you can solve that problem too. Usually the above pattern isn't used with a heavy function like compileNewY() but with a lightweight constructor such as new Y(). For example:
class PrecompiledY {
    public volatile Y y;
    private final AtomicBoolean compiled = new AtomicBoolean(false);

    public void compile() {
        if (!compiled.getAndSet(true)) {
            y = compileNewY(); // only the first caller compiles
        }
        // a caller that loses getAndSet returns immediately, possibly before y is assigned
    }
}
// ...
ConcurrentMap<X, PrecompiledY> map; // alternatively use proper locking
PrecompiledY py = map.get(x);
if (py == null) {
    py = new PrecompiledY(); // much cheaper than compiling
    map.put(x, py); // may run in two threads at once; the later put overwrites the earlier value
    py = map.get(x); // essential, because another thread's py may have been put
    py.compile(); // an object that didn't stay in the map never gets compiled
}
Also:
Alternatively, I know of the existence of a ConcurrentHashMap but I don't know if that's overkill.
Given that your code is heavily locking, ConcurrentHashMap is almost certainly far faster, so not overkill. (And much more likely to be bug-free. Concurrency bugs are not fun to fix.)
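Since Java 8, ConcurrentHashMap also offers computeIfAbsent, which none of the answers here use. Per its documentation, the mapping function is applied at most once per key, and other threads asking for the same key wait for that result. A sketch under that assumption, with a hypothetical String-returning compile() standing in for compile(s, imports). One trade-off to weigh: the documentation warns that some other updates to the map may block while the computation runs, so with a compile step of a few hundred milliseconds the Future-based Cache above may still be preferable.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class StylesheetCache {
    private final ConcurrentMap<String, String> compiledStylesheets = new ConcurrentHashMap<>();
    // Counts calls to compile() so the at-most-once behaviour can be checked.
    final AtomicInteger compileCount = new AtomicInteger();

    // Hypothetical stand-in for the expensive compile(s, imports) call.
    private String compile(String xslt) {
        compileCount.incrementAndGet();
        return "compiled:" + xslt;
    }

    String get(String xslt) {
        // The mapping function runs at most once per absent key; other threads
        // requesting the same key wait for the result instead of compiling again.
        return compiledStylesheets.computeIfAbsent(xslt, this::compile);
    }

    public static void main(String[] args) throws InterruptedException {
        StylesheetCache cache = new StylesheetCache();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> cache.get("a.xsl"));
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println("compile() calls: " + cache.compileCount.get()); // 1
    }
}
```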
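Please see Erickson's comment below. Using double-checked locking with an unsynchronized HashMap is not safe: a get() that runs concurrently with a put() (for example, during a resize) is not guaranteed to see a consistent map.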
The compile method can take a few hundred milliseconds to return. This seems like a long time to have the object locked, but I don't see an alternative.
You can use double-checked locking, and note that you don't need any lock before get since you never remove anything from the map.
if (compiledStylesheets.containsKey(xslt)) {
    exec = compiledStylesheets.get(xslt);
} else {
    synchronized (compiledStylesheets) {
        if (compiledStylesheets.containsKey(xslt)) {
            // another thread might have created it while
            // this thread was waiting for lock
            exec = compiledStylesheets.get(xslt);
        } else {
            exec = compile(s, imports);
            compiledStylesheets.put(xslt, exec);
        }
    }
}
Also, it is unnecessary to use Collections.synchronizedMap in addition to the synchronized block, correct?
Correct
This is the only code that hits this object other than initialization/instantiation.
First of all, the code as you posted it is free of race conditions, because the containsKey() result cannot change while the compile() method is running.
Collections.synchronizedMap() is useless in your case, as stated above, because it simply wraps every map method in a synchronized block on the wrapper object; your own synchronized block already provides that.
IMO, ConcurrentHashMap is also not an option, because it stripes locks based on the key's hashCode() result; its concurrent iterators are also useless here.
If you really want compile() out of the synchronized block, you can pre-calculate it before checking containsKey(). This may hurt overall performance, but may be better than calling it inside the synchronized block. To decide, I would personally consider how often a key "miss" happens, and therefore which option is preferable: holding the lock for longer, or always computing the value.

Is synchronization needed while reading if no contention could occur

Consider the code snippet below:
package sync;

public class LockQuestion {
    private String mutable;

    public synchronized void setMutable(String mutable) {
        this.mutable = mutable;
    }

    public String getMutable() {
        return mutable;
    }
}
At time Time1, thread Thread1 will update the ‘mutable’ variable. Synchronization is needed in the setter in order to flush memory from the local cache to main memory.
At time Time2 ( Time2 > Time1, no thread contention) thread Thread2 will read value of mutable.
The question is: do I need to put synchronized on the getter? It looks like this won’t cause any issues: memory should be up to date, and Thread2’s local cache should have been invalidated and updated by Thread1, but I’m not sure.
Rather than wonder, why not just use the atomic references in java.util.concurrent?
(and for what it's worth, my reading of happens-before does not guarantee that Thread2 will see changes to mutable unless it also uses synchronized ... but I always get a headache from that part of the JLS, so use the atomic references)
It will be fine if you make mutable volatile; see the "cheap read-write lock" trick for details.
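A minimal sketch of the volatile variant, assuming the setter only ever performs a single write (if it did a read-modify-write, it would still need synchronized or an atomic class). The class name is hypothetical. The reader spins on the getter; each iteration performs a fresh volatile read, so the loop cannot be optimised into re-using a stale value, and the value written by the other thread becomes visible.

```java
public class VolatileLockQuestion {
    // volatile: a write by one thread happens-before any later read that sees it.
    private volatile String mutable;

    public void setMutable(String mutable) {
        this.mutable = mutable;
    }

    public String getMutable() {
        return mutable;
    }

    public static void main(String[] args) {
        VolatileLockQuestion q = new VolatileLockQuestion();
        Thread writer = new Thread(() -> q.setMutable("hello"));
        writer.start();
        // Each iteration is a fresh volatile read of the field.
        while (q.getMutable() == null) {
            Thread.onSpinWait();
        }
        System.out.println(q.getMutable()); // hello
    }
}
```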
Are you absolutely sure that the getter will be called only after the setter is called? If so, you don't need the getter to be synchronized, since concurrent reads do not need to be synchronized.
If there is a chance that get and set can be called concurrently then you definitely need to synchronize the two.
If you worry so much about the performance in the reading thread, then what you do is read the value once using proper synchronization or volatile or atomic references. Then you assign the value to a plain old variable.
The assignment to the plain variable is guaranteed to happen after the atomic read (how else could it get the value?), and if the value will never be written by another thread again, you are all set.
I think you should start with something that is correct and optimise later, when you know you have an issue. I would just use AtomicReference unless a few nanoseconds is too long. ;)
import java.util.concurrent.atomic.AtomicReference;

public class AtomicGetCost {
    public static void main(String... args) {
        AtomicReference<String> ars = new AtomicReference<String>();
        ars.set("hello");
        long start = System.nanoTime();
        int runs = 1000 * 1000 * 1000;
        int length = test(ars, runs);
        long time = System.nanoTime() - start;
        System.out.printf("get() costs %d ps.%n", 1000 * time / runs);
    }

    private static int test(AtomicReference<String> ars, int runs) {
        int len = 0;
        for (int i = 0; i < runs; i++)
            len = ars.get().length();
        return len;
    }
}
Prints
get() costs 1219 ps.
ps is a picosecond, which is one millionth of a microsecond.
This probably will never result in incorrect behavior, but unless you also guarantee the order in which the threads start up, you cannot necessarily guarantee that the compiler didn't reorder the read in Thread2 before the write in Thread1. More specifically, the Java runtime only has to guarantee that each thread executes as if it were run serially. So, as long as the thread produces the same output when run serially under optimizations, the entire language stack (compiler, hardware, language runtime) can do pretty much whatever it wants, including allowing Thread2 to cache the result of LockQuestion.getMutable().
In practice, I would be very surprised if that ever happened. If you want to guarantee that this doesn't happen, have LockQuestion.mutable be declared as final and get initialized in the constructor. Or use the following idiom:
private static class LazySomethingHolder {
    static final Something something = new Something();
}

public static Something getInstance() {
    return LazySomethingHolder.something;
}

Avoiding a lost update in Java without directly using synchronization

I am wondering if it is possible to avoid the lost update problem, where multiple threads are updating the same data, while avoiding synchronized(x) { }.
I will be doing numerous adds and increments:
val++;
ary[x] += y;
ary[z]++;
I do not know how Java will compile these into bytecode, or whether a thread could be interrupted in the middle of the bytecode for one of these statements. In other words, are those statements thread-safe?
Also, I know that the Vector class is synchronized, but I am not sure what that means. Will the following code be thread safe in that the value at position i will not change between the vec.get(i) and vec.set(...).
class myClass {
    Vector<Integer> vec = new Vector<>();

    public void someMethod(int value) {
        for (int i = 0; i < vec.size(); i++)
            vec.set(i, vec.get(i) + value);
    }
}
For the purposes of threading, ++ and += are treated as two operations (four for double and long). So updates can clobber one another, and not just by one: a scheduler acting at the wrong moment could wipe out milliseconds of updates.
java.util.concurrent.atomic is your friend.
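A sketch of what that looks like for the three statements in the question, assuming fixed-size arrays: AtomicInteger replaces val++, and AtomicIntegerArray replaces ary[x] += y and ary[z]++. Each call performs the read, add, and write as one atomic step, so no updates are lost even without synchronized. The class name and the thread and iteration counts are arbitrary.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class NoLostUpdates {
    static final int THREADS = 4;
    static final int INCREMENTS = 100_000;

    // Returns {val, ary[0], ary[1]} after all threads have finished.
    static int[] run() {
        AtomicInteger val = new AtomicInteger();
        AtomicIntegerArray ary = new AtomicIntegerArray(2);
        Thread[] threads = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < INCREMENTS; i++) {
                    val.incrementAndGet();  // atomic version of val++
                    ary.addAndGet(0, 3);    // atomic version of ary[0] += 3
                    ary.incrementAndGet(1); // atomic version of ary[1]++
                }
            });
            threads[t].start();
        }
        try {
            for (Thread th : threads) th.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { val.get(), ary.get(0), ary.get(1) };
    }

    public static void main(String[] args) {
        int[] r = run();
        // Always prints val=400000 ary[0]=1200000 ary[1]=400000
        System.out.printf("val=%d ary[0]=%d ary[1]=%d%n", r[0], r[1], r[2]);
    }
}
```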
Your code can be made safe, assuming you don't mind each element updating individually and you don't change the size(!), as:
for (int i = 0; i < vec.size(); i++) {
    synchronized (vec) {
        vec.set(i, vec.get(i) + value);
    }
}
If you want to add resizing to the Vector you'll need to move the synchronized statement outside of the for loop, and you might as well just use plain new ArrayList. There isn't actually a great deal of use for a synchronised list.
But you could use AtomicIntegerArray:
private final AtomicIntegerArray ints = new AtomicIntegerArray(KNOWN_SIZE);
[...]
    int len = ints.length();
    for (int i = 0; i < len; ++i) {
        ints.addAndGet(i, value);
    }
}
That has the advantage of no locks(!) and no boxing. The implementation is quite fun too, and you would need to understand it to do more complex updates (random number generators, for instance).
vec.set() and vec.get() are thread safe in that they will not set and retrieve values in such a way as to lose sets and gets in other threads. It does not mean that your set and your get will happen without an interruption.
If you're really going to be writing code like in the examples above, you should probably lock on something. And synchronized(vec) { } is as good as any. You're asking here for two operations to happen in sync, not just one thread safe operation.
Even java.util.concurrent.atomic only makes each single operation atomic, so a separate get followed by a set is still unsafe. You need to get and increment in one operation.
