Faced a project with this code:
public class IndexUpdater implements Runnable {
#Override
public void run() {
final AtomicInteger count = new AtomicInteger(0);
FindIterable<Document> iterable = mongoService.getDocuments(entryMeta, null, "guid");
iterable.forEach(new Block<Document>() {
#Override
public void apply(final Document document) {
count.incrementAndGet();
// A lot of code....
if (count.get() / 100 * 100 == count.get()) {
LOG.info(String.format("Processing: %s", count.get()));
}
}
});
}
}
Here I am interested in three lines of code:
if (count.get() / 100 * 100 == count.get()) {
LOG.info(String.format("Processing: %s", count.get()));
}
Does this condition make sense considering multithreading and the type of the AtomicInteger variable? Or is this a pointless check?
Interestingly, IntellijIdea does not emphasize this construct as meaningless.
I wouldn't call this code "meaningless", but rather wrong (or, it has probably-unintended semantics).
If this were being invoked in a multithreaded way, you wouldn't always get the same value for count.get() on the three invocations in the method (there are four if you include count.incrementAndGet()).
The consequences of this don't look catastrophic in this case - you'd perhaps miss a few logging statements, and you might see some unexpected messages like Processing 101, and then wonder why the number isn't a multiple of 100. But perhaps if you employed the same construct elsewhere, there would be more significant implications.
Put the result of count.incrementAndGet() into a variable (*), so you can use that afterwards.
But then, it would be easier to use count.get() % 100 == 0 as well:
int value = count.incrementAndGet();
// A lot of code....
if (value % 100 == 0) {
LOG.info(String.format("Processing: %s", value));
}
which is both correct (or, it is probably what is intended) and easier to read.
(*) Depending on what you actually want to show with this logging message, you might want to put the count.incrementAndGet() after "A lot of code".
Related
I was about to write something about this, but maybe it is better to have a second opinion before appearing like a fool...
So the idea in the next piece of code (android's room package v2.4.1, RoomTrackingLiveData), is that the winner thread is kept alive, and is forced to check for contention that may have entered the process (coming from losing threads) while computing.
While fail CAS operations performed by these losing threads keep them out from entering and executing code, preventing repeating signals (mComputeFunction.call() OR postValue()).
final Runnable mRefreshRunnable = new Runnable() {
#WorkerThread
#Override
public void run() {
if (mRegisteredObserver.compareAndSet(false, true)) {
mDatabase.getInvalidationTracker().addWeakObserver(mObserver);
}
boolean computed;
do {
computed = false;
if (mComputing.compareAndSet(false, true)) {
try {
T value = null;
while (mInvalid.compareAndSet(true, false)) {
computed = true;
try {
value = mComputeFunction.call();
} catch (Exception e) {
throw new RuntimeException("Exception while computing database"
+ " live data.", e);
}
}
if (computed) {
postValue(value);
}
} finally {
mComputing.set(false);
}
}
} while (computed && mInvalid.get());
}
};
final Runnable mInvalidationRunnable = new Runnable() {
#MainThread
#Override
public void run() {
boolean isActive = hasActiveObservers();
if (mInvalid.compareAndSet(false, true)) {
if (isActive) {
getQueryExecutor().execute(mRefreshRunnable);
}
}
}
};
The most obvious thing here is that atomics are being used for everything they are not good at:
Identifying losers and ignoring winners (what reactive patterns need).
AND a happens once behavior, performed by the loser thread.
So this is completely counter intuitive to what atomics are able to achieve, since they are extremely good at defining winners, AND anything that requires a "happens once" becomes impossible to ensure state consistency (the last one is suitable to start a philosophical debate about concurrency, and I will definitely agree with any conclusion).
If atomics are used as: "Contention checkers" and "Contention blockers" then we can implement the exact principle with a volatile check of an atomic reference after a successful CAS.
And checking this volatile against the snapshot/witness during every other step of the process.
private final AtomicInteger invalidationCount = new AtomicInteger();
private final IntFunction<Runnable> invalidationRunnableFun = invalidationVersion -> (Runnable) () -> {
if (invalidationVersion != invalidationCount.get()) return;
try {
T value = computeFunction.call();
if (invalidationVersion != invalidationCount.get()) return; //In case computation takes too long...
postValue(value);
} catch (Exception e) {
e.printStackTrace();
}
};
getQueryExecutor().execute(invalidationRunnableFun.apply(invalidationCount.incrementAndGet()));
In this case, each thread is left with the individual responsibility of checking their position in the contention lane, if their position moved and is not at the front anymore, it means that a new thread entered the process, and they should stop further processing.
This alternative is so laughably simple that my first question is:
Why didn't they do it like this?
Maybe my solution has a flaw... but the thing about the first alternative (the nested spin-lock) is that it follows the idea that an atomic CAS operation cannot be verified a second time, and that a verification can only be achieved with a cmpxchg process.... which is... false.
It also follows the common (but wrong) believe that what you define after a successful CAS is the sacred word of GOD... as I've seen code seldom check for concurrency issues once they enter the if body.
if (mInvalid.compareAndSet(false, true)) {
// Ummm... yes... mInvalid is still true...
// Let's use a second atomicReference just in case...
}
It also follows common code conventions that involve "double-<enter something>" in concurrency scenarios.
So only because the first code follows those ideas, is that I am inclined to believe that my solution is a valid and better alternative.
Even though there is an argument in favor of the "nested spin-lock" option, but does not hold up much:
The first alternative is "safer" precisely because it is SLOWER, so it has MORE time to identify contention at the end of the current of incoming threads.
BUT is not even 100% safe because of the "happens once" thing that is impossible to ensure.
There is also a behavior with the code, that, when it reaches the end of a continuos flow of incoming threads, 2 signals are dispatched one after the other, the second to last one, and then the last one.
But IF it is safer because it is slower, wouldn't that defeat the goal of using atomics, since their usage is supposed to be with the aim of being a better performance alternative in the first place?
Why below class is not thread safe ?
public class UnsafeCachingFactorizer implements Servlet {
private final AtomicReference<BigInteger> lastNumber = new AtomicReference<>();
private final AtomicReference<BigInteger[]> lastFactors = new AtomicReference<>();
public void service(ServletRequest req, ServletResponse resp) {
BigInteger i = extractFromRequest(req);
if i.equals(lastNumber.get())) {
encodeIntoResponse(resp, lastFactors.get());
}
else {
BigInteger[] factors = factor(i);
lastNumber.set(i);
lastFactors.set(factors);
encodeIntoResponse(resp, factors);
}
}
}
Instance variables are thread safe, then why the whole class is not thread safe ?
It's not thread safe because you don't always get the right answer when multiple threads call the code.
Let's say that lastNumber=1 and lastFactors=factors(1). In the one-thread case, where the thread calls with i=1:
T1: if (lastNumber.get().equals(1)) { // true
T1: encodeIntoResponse(resp, lastFactors.get());
Fine, this is the expected result. But consider a multi-threaded case, where the actions within each thread takes place in the same order, but can arbitrarily interleave. One such interleaving is (where i=1 and i=2 for the two threads respectively):
T1: if (lastNumber.get().equals(1)) { // true
T2: if (lastNumber.get().equals(2)) { // false
T2: } else {
T2: lastNumber.set(2);
T2: lastFactors.set(factors(2));
T1: encodeIntoResponse(resp, lastFactors.get()); // oops! You wrote the factors(2), not factors(1).
The problem is that you're not getting and setting the AtomicReferences atomically: that is, there is nothing to stop another thread sneaking in and changing the values (of one or either) between the get and the set.
In general, whilst individual calls to methods on an AtomicReference are atomic, multiple calls are not (and they definitely aren't atomic between instances of AtomicReference). So, if you ever find yourself writing code like:
if (/* some condition with ref.get() */) {
/* some statement with ref.set() */
}
then you probably aren't using AtomicReference correctly (or, at least, it's not thread-safe).
To fix this, you need something that can be read and set atomically. For example, create a simple class to hold both:
class Holder {
// Add a ctor to initialize these.
final BigInteger number;
final BigInteger[] factors;
}
Then store this in a single AtomicReference, and use updateAndGet:
BigInteger[] factors = holderRef.updateAndGet(h -> {
if (h != null && h.number.equals(i)) {
return h;
}
return new Holder(i, factor(i));
}).factors;
encodeIntoResponse(resp, factors);
Upon reflection, updateAndGet isn't necessarily the right way to do this. If factors sometimes takes a long time to compute, then a long-time computation might get done many times, because lots of other shorter-time computations preempt it, so the update function keeps having to be called.
Instead, you can just always set the reference if you had to recompute it:
Holder h = holderRef.get();
if (h == null || !h.number.equals(i)) {
h = new Holder(i, factors(i));
holderRef.set(h);
}
return h.factors;
This may seem to violate what I said previously, in that separate calls to holderRef are not atomic, and thus not thread-safe.
It's a bit more nuanced, however: my first paragraph states that the lack of thread safety in the original code stems from the fact that you might get the factors for the wrong input. This problem doesn't occur here: you either get the holder for the right number (and hence the factors for the right number), or you compute the factors for the input.
The issue arises in what this holder is actually meant to be storing: the "last" number/factors is rather hard to define in terms of multithreading. When are you measuring "last-ness" from? The most recent call to start? The most recent call to finish? Other?
This code simply stores "a" previously computed value, without attempting to nail down this ambiguity.
I am unsure of how lambdas work in practice, and I am concerned since under certain circumstances, lambdas can result in errors such as ConcurrentModificationExceptions if you use them incorrectly, which seems to be indicative of a race condition.
Consider the code below.
private class deltaCalculator{
Double valueA;
Double valueB;
//Init delta
volatile Double valueDelta = null;
private void calculateMinimum(List<T> dataSource){
dataSource.forEach((entry -> {
valueA = entry.getA();
valueB = entry.getB();
Double dummyDelta;
dummyDelta = Math.abs(valueA - valueB);
if(valueDelta == null){
setDelta(dummyDelta);
}else {
setDelta((valueDelta > dummyDelta) ? dummyDelta : valueDelta);
}
}));
}
private void setDelta(Double d){
this.valueDelta = d;
}
}
How does the forEach loop operate? Do different calls get passed to different threads where the JVM considers it appropriate, opening up the possibility of a race condition that could lead to incorrect minimum calculation?
If not, why can a forEach lambda throw a ConcurrentModificationException?
You'll get a ConcurrentModificationException if you try to modify the collection that you're iterating over while the for each loop runs. This could be done in a separate thread entirely, but much more commonly occurs when you try to modify the collection in the loop body.
Do different calls get passed to different threads where the JVM considers it appropriate, opening up the possibility of a race condition that could lead to incorrect minimum calculation?
No. No multithreading is taking place in your example above.
I'm trying to understand the difference in behaviour of an ArrayList and a Vector. Does the following snippet in any way illustrate the difference in synchronization ? The output for the ArrayList (f1) is unpredictable while the output for the Vector (f2) is predictable. I think it may just be luck that f2 has predictable output because modifying f2 slightly to get the thread to sleep for even a ms (f3) causes an empty vector ! What's causing that ?
public class D implements Runnable {
ArrayList<Integer> al;
Vector<Integer> vl;
public D(ArrayList al_, Vector vl_) {
al = al_;
vl = vl_;
}
public void run() {
if (al.size() < 20)
f1();
else
f2();
} // 1
public void f1() {
if (al.size() == 0)
al.add(0);
else
al.add(al.get(al.size() - 1) + 1);
}
public void f2() {
if (vl.size() == 0)
vl.add(0);
else
vl.add(vl.get(vl.size() - 1) + 1);
}
public void f3() {
if (vl.size() == 0) {
try {
Thread.sleep(1);
vl.add(0);
} catch (InterruptedException e) {
System.out.println(e.getMessage());
}
} else {
vl.add(vl.get(vl.size() - 1) + 1);
}
}
public static void main(String... args) {
Vector<Integer> vl = new Vector<Integer>(20);
ArrayList<Integer> al = new ArrayList<Integer>(20);
for (int i = 1; i < 40; i++) {
new Thread(new D(al, vl), Integer.toString(i)).start();
}
}
}
To answer the question: Yes vector is synchronized, this means that concurrent actions on the data structure itself won't lead to unexpected behavior (e.g. NullPointerExceptions or something). Hence calls like size() are perfectly safe with a Vector in concurrent situations, but not with an ArrayList (note if there are only read accesses ArrayLists are safe too, we get into problems as soon as at least one thread writes to the datastructure, e.g. add/remove)
The problem is, that this low level synchronization is basically completely useless and your code already demonstrates this.
if (al.size() == 0)
al.add(0);
else
al.add(al.get(al.size() - 1) + 1);
What you want here is to add a number to your datastructure depending on the current size (ie if N threads execute this, in the end we'd want the list to contain the numbers [0..N)). Sadly that does not work:
Assume that 2 threads execute this code sample concurrently on an empty list/vector. The following timeline is quite possible:
T1: size() # go to true branch of if
T2: size() # alas we again take the true branch.
T1: add(0)
T2: add(0) # ouch
Both execute size() and get back the value 0. They then go into the true branch of the and both add 0 to the datastructure. That's not what you want.
Hence you'll have to synchronize in your business logic anyhow to make sure that size() and add() are executed atomically. Hence the synchronization of vector is quite useless in almost any scenario (contrary to some claims on modern JVMs the performance hit of an uncontended lock is completely negligible though, but the Collections API is much nicer so why not use it)
In The Beginning (Java 1.0) there was the "synchronized vector".
Which entailed a potentially HUGE performance hit.
Hence the addition of "ArrayList" and friends in Java 1.2 onwards.
Your code illustrates the rationale for making vectors synchronized in the first place. But it's simply unnecessary most of the time, and better done in other ways most of the rest of the time.
IMHO...
PS:
An interesting link:
http://www.coderanch.com/t/523384/java/java/ArrayList-Vector-size-incrementation
Vectors are Thread safe. ArrayLists are not. That is why ArrayList is faster than the vector.
The below link has nice info about this.
http://www.javaworld.com/javaworld/javaqa/2001-06/03-qa-0622-vector.html
I'm trying to understand the difference in behaviour of an ArrayList
and a Vector
Vector is synchronized while ArrayList is not. ArrayList is not thread-safe.
Does the following snippet in any way illustrate the difference in
synchronization ?
No difference since only Vector is sunchronized
This code should produce even and uneven output because there is no synchronized on any methods. Yet the output on my JVM is always even. I am really confused as this example comes straight out of Doug Lea.
public class TestMethod implements Runnable {
private int index = 0;
public void testThisMethod() {
index++;
index++;
System.out.println(Thread.currentThread().toString() + " "
+ index );
}
public void run() {
while(true) {
this.testThisMethod();
}
}
public static void main(String args[]) {
int i = 0;
TestMethod method = new TestMethod();
while(i < 20) {
new Thread(method).start();
i++;
}
}
}
Output
Thread[Thread-8,5,main] 135134
Thread[Thread-8,5,main] 135136
Thread[Thread-8,5,main] 135138
Thread[Thread-8,5,main] 135140
Thread[Thread-8,5,main] 135142
Thread[Thread-8,5,main] 135144
I tried with volatile and got the following (with an if to print only if odd):
Thread[Thread-12,5,main] 122229779
Thread[Thread-12,5,main] 122229781
Thread[Thread-12,5,main] 122229783
Thread[Thread-12,5,main] 122229785
Thread[Thread-12,5,main] 122229787
Answer to comments:
the index is infact shared, because we have one TestMethod instance but many Threads that call testThisMethod() on the one TestMethod that we have.
Code (no changes besides the mentioned above):
public class TestMethod implements Runnable {
volatile private int index = 0;
public void testThisMethod() {
index++;
index++;
if(index % 2 != 0){
System.out.println(Thread.currentThread().toString() + " "
+ index );
}
}
public void run() {
while(true) {
this.testThisMethod();
}
}
public static void main(String args[]) {
int i = 0;
TestMethod method = new TestMethod();
while(i < 20) {
new Thread(method).start();
i++;
}
}
}
First off all: as others have noted there's no guarantee at all, that your threads do get interrupted between the two increment operations.
Note that printing to System.out pretty likely forces some kind of synchronization on your threads, so your threads are pretty likely to have just started a time slice when they return from that, so they will probably complete the two incrementation operations and then wait for the shared resource for System.out.
Try replacing the System.out.println() with something like this:
int snapshot = index;
if (snapshot % 2 != 0) {
System.out.println("Oh noes! " + snapshot);
}
You don't know that. The point of automatic scheduling is that it makes no guarantees. It might treat two threads that run the same code completely different. Or completely the same. Or completely the same for an hour and then suddenly different...
The point is, even if you fix the problems mentioned in the other answers, you still cannot rely on things coming out a particular way; you must always be prepared for any possible interleaving that the Java memory and threading model allows, and that includes the possibility that the println always happens after an even number of increments, even if that seems unlikely to you on the face of it.
The result is exactly as I would expect. index is being incremented twice between outputs, and there is no interaction between threads.
To turn the question around - why would you expect odd outputs?
EDIT: Whoops. I wrongly assumed a new runnable was being created per Thread, and therefore there was a distinct index per thread, rather than shared. Disturbing how such a flawed answer got 3 upvotes though...
You have not marked index as volatile. This means that the compiler is allowed to optimize accesses to it, and it probably merges your 2 increments to one addition.
You get the output of the very first thread you start, because this thread loops and gives no chance to other threads to run.
So you should Thread.sleep() or (not recommended) Thread.yield() in the loop.