Are accumulators thread-safe? - java

I'm using accumulators and wanted to know if these objects are thread-safe?
accumInt is an Accumulator<Integer> created with an AccumulatorParam<Integer>.
// Current value accumInt -> 6
AccumulatorThread t1 = new AccumulatorThread();
t1.setAccum(accumInt);
t1.setValueToAdd(5);
AccumulatorThread t2 = new AccumulatorThread();
t2.setAccum(accumInt);
t2.setValueToAdd(7);
new Thread(t1).start();
new Thread(t2).start();
System.out.println(accumInt.value()); // 11 or 13 or 18
AccumulatorThread class:
class AccumulatorThread implements Runnable {
    Accumulator<Integer> accum;
    Integer valueToAdd;

    public Integer getValueToAdd() {
        return valueToAdd;
    }
    public void setValueToAdd(Integer valueToAdd) {
        this.valueToAdd = valueToAdd;
    }
    public Accumulator<Integer> getAccum() {
        return accum;
    }
    public void setAccum(Accumulator<Integer> accum) {
        this.accum = accum;
    }
    public void run() {
        System.out.println("Value to Add in Thread : " + valueToAdd);
        accum.add(valueToAdd);
    }
}
The behavior suggests that it is not thread-safe. Am I missing something?

Out of curiosity, why are you both setting and reading the accumulator in the same program? Accumulators are generally added to by the worker threads and may only be read by the driver thread.
Worker1: accumulator.add(increment)
Worker2: accumulator.add(someOtherIncrement)
Driver: println(accumulator.value)
Now you are asking about multithreaded setting and reading of values in different threads on the driver. To what purpose? In that case just use a local JVM AtomicInteger or AtomicLong.
Accumulators are variables that are only “added” to through an associative operation and can therefore be efficiently supported in parallel.
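If what you actually need is a counter that several driver-local threads update, a plain AtomicInteger already does that. A minimal sketch mirroring the question's numbers (the class name and values are made up):

import java.util.concurrent.atomic.AtomicInteger;

public class DriverLocalCounter {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger(6); // same starting value as accumInt above

        Thread t1 = new Thread(() -> counter.addAndGet(5));
        Thread t2 = new Thread(() -> counter.addAndGet(7));
        t1.start();
        t2.start();
        t1.join(); // wait for both updates before reading
        t2.join();

        System.out.println(counter.get()); // always 18
    }
}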

Accumulators are not thread-safe. Only SparkContext can be used in multiple threads.

To expand on the other two great answers from @javadba and @zsxwing.
My understanding of Apache Spark is that they may or may not be thread-safe. It does not really matter. Since the driver is "far away" from its workers (they usually talk to each other over the network or at least between JVMs -- unless it's local mode) all updates to an accumulator arrive in messages that are processed one by one and therefore ensure single-threaded update to the accumulator.

Accumulators are not thread-safe, and in fact they do not need to be. For executors, accumulators are write-only variables: executors add to them and the driver reads them. The driver uses the DAGScheduler.updateAccumulators method to update accumulator values after a task completes, and that method is called only from the thread that runs the scheduling loop, which handles one task-completion event at a time. That is why accumulators do not need to be thread-safe.
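For reference, the intended usage pattern with the Spark 1.x Java API looks roughly like this (a sketch, assuming a local JavaSparkContext and the usual imports):

JavaSparkContext sc = new JavaSparkContext("local[4]", "accumulator-demo");
Accumulator<Integer> accum = sc.accumulator(0);   // created on the driver

sc.parallelize(Arrays.asList(1, 2, 3, 4))
  .foreach(x -> accum.add(x));                    // executors only ever add

System.out.println(accum.value());                // only the driver reads: 10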

Related

Multiple threads accessing same data but getting latest data?

I wrote this program:
package com.example.threads;

import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentHashMapBehaviour {
    private static ConcurrentHashMap<String, String> chm = new ConcurrentHashMap<>();
    private static Object _lock = new Object();

    public static void main(String[] args) {
        Thread t = new Thread(new MyThread());
        t.start();
        int counter = 0;
        while (true) {
            String val = "FirstVal" + counter;
            counter++;
            String currentVal = null;
            synchronized (_lock) {
                chm.put("first", val);
                currentVal = chm.get("first");
            }
            System.out.println("In Main thread, current value is : " + currentVal);
        }
    }

    static class MyThread implements Runnable {
        @Override
        public void run() {
            String val = null;
            while (true) {
                synchronized (_lock) {
                    val = chm.get("first");
                }
                System.out.println("Value seen in MyThread is " + val);
            }
        }
    }
}
I am sharing common data between these threads, viz. chm (a ConcurrentHashMap). I ran this in debug mode, letting the main thread run more iterations than MyThread; both are controlled by _lock.
So, for instance, I let the main thread run twice, so the value of the "first" key became "FirstVal1". Then I halted the main thread and let MyThread proceed, and it was able to see the latest value, even though the main thread had run multiple times.
How is this possible? I was under the impression that the variable needs to be volatile in order for MyThread to see the latest values.
I don't understand this behaviour. Can anyone point out what I am missing?
First, you're using a ConcurrentHashMap, which is safe to use in a multi-threaded environment, so if a thread puts a value into it, other threads will be able to see that value.
Second, you are synchronizing access to the map. That will ensure only one thread will write to the map.
Each such explicit synchronization also acts as a memory barrier, which flushes any results still waiting in a cache out to main memory, making them visible to other threads. That is essentially what a volatile variable access gives you: accesses to volatile variables come with memory-visibility guarantees.
If you want to see data races in your program, remove all synchronization primitives and try again. That does not guarantee that you'll observe a race all the time, but you should be able to see unexpected values every now and then.
There are three misconceptions here:
1. Writing to a volatile variable guarantees that all changes made by the writing thread are published, i.e. can be seen by other threads. See Chapter 17 of the Java Language Specification (the memory model) for all the details. This does not mean that the absence of the volatile modifier forbids publication; JVM implementations may be (and actually are) much more forgiving in practice. This is one of the reasons concurrency problems are so hard to trace.
2. "A hash table supporting full concurrency of retrievals and high expected concurrency for updates." is the first sentence of the API documentation for the ConcurrentHashMap class, and that pretty much sums it up. The concurrent hash map guarantees that any thread calling get sees the latest value; that is exactly the purpose of this class. If you look at its source code you can, by the way, see that it uses volatile fields internally.
3. You are additionally using synchronized blocks to access your data. These do not only guarantee exclusive access, they also guarantee that all changes made before leaving such a block are visible to all threads that synchronize on the same lock object.
To summarize: by using the concurrent hash map implementation and the synchronized blocks, you publish the changes and make the latest values visible to other threads. Either one of the two would already have been sufficient.
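To make that last point concrete, here is a minimal sketch of the same experiment with the synchronized blocks removed; the class name is invented, and the ConcurrentHashMap alone provides the visibility:

import java.util.concurrent.ConcurrentHashMap;

public class ChmOnly {
    private static final ConcurrentHashMap<String, String> chm = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        new Thread(() -> {
            while (true) {
                // No lock: get() alone establishes a happens-before with the put()
                // that stored the value, so the reader sees up-to-date values.
                System.out.println("Value seen in reader is " + chm.get("first"));
            }
        }).start();

        int counter = 0;
        while (true) {
            chm.put("first", "FirstVal" + counter++); // no lock on the writer either
        }
    }
}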

How to avoid congesting/stalling/deadlocking an executorservice with recursive callable

All the threads in an ExecutorService are busy with tasks that wait for tasks that are stuck in the queue of the executor service.
Example code:
ExecutorService es = Executors.newFixedThreadPool(8);
Set<Future<Set<String>>> outerSet = new HashSet<>();
for (int i = 0; i < 8; i++) {
    outerSet.add(es.submit(new Callable<Set<String>>() {
        @Override
        public Set<String> call() throws Exception {
            Thread.sleep(10000); // to simulate work
            Set<Future<String>> innerSet = new HashSet<>();
            for (int j = 0; j < 8; j++) {
                int k = j;
                innerSet.add(es.submit(new Callable<String>() {
                    @Override
                    public String call() throws Exception {
                        return "number " + k + " in inner loop";
                    }
                }));
            }
            Set<String> out = new HashSet<>();
            while (!innerSet.isEmpty()) {           // we are stuck at this loop because all the
                for (Future<String> f : innerSet) { // callables in innerSet are stuck in the queue
                    if (f.isDone()) {               // of es and can't start since all the threads
                        out.add(f.get());           // in es are busy waiting for them to finish
                    }
                }
            }
            return out;
        }
    }));
}
Is there any way to avoid this other than making a separate thread pool for each layer, or using a thread pool that is not fixed in size?
A practical example would be if some callables are submitted to ForkJoinPool.commonPool() and then these tasks use objects that also submit to the commonPool in one of their methods.
You should use a ForkJoinPool. It was made for this situation.
Whereas your solution blocks a thread permanently while it's waiting for its subtasks to finish, the work stealing ForkJoinPool can perform work while in join(). This makes it efficient for these kinds of situations where you may have a variable number of small (and often recursive) tasks that are being run. With a regular thread-pool you would need to oversize it, to make sure that you don't run out of threads.
With CompletableFuture you need to handle a lot more of the actual planning/scheduling yourself, and it will be more complex to tune if you decide to change things. With FJP the only thing you need to tune is the amount of threads in the pool, with CF you need to think about then vs. thenAsync as well.
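A rough sketch of how the example in the question could be restructured on top of ForkJoinPool (the inner task bodies are taken from the question; the decomposition into RecursiveTasks is only illustrative):

import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class OuterTask extends RecursiveTask<Set<String>> {
    @Override
    protected Set<String> compute() {
        // Fork the inner tasks instead of submitting them to the same
        // executor and spinning on their Futures.
        Set<RecursiveTask<String>> inner = new HashSet<>();
        for (int j = 0; j < 8; j++) {
            final int k = j;
            RecursiveTask<String> task = new RecursiveTask<String>() {
                @Override
                protected String compute() {
                    return "number " + k + " in inner loop";
                }
            };
            inner.add(task);
            task.fork();
        }
        // join() lets the worker help run other queued tasks while it waits,
        // so the pool cannot deadlock on its own subtasks.
        Set<String> out = new HashSet<>();
        for (RecursiveTask<String> task : inner) {
            out.add(task.join());
        }
        return out;
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(8);
        System.out.println(pool.invoke(new OuterTask()));
    }
}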
I would recommend trying to decompose the work to use completion stages via CompletableFuture
CompletableFuture.supplyAsync(outerTask)
    .thenCompose(outerResult -> CompletableFuture.allOf(innerTasks));
That way your outer task doesn’t hog the execution thread while processing inner tasks, but you still get a Future that resolves when the entire job is done. It can be hard to split those stages up if they’re too tightly coupled though.
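A hedged sketch of that decomposition (the task bodies and the es executor are placeholders; assumes java.util, java.util.concurrent and java.util.stream imports):

ExecutorService es = Executors.newFixedThreadPool(8);

CompletableFuture<Set<String>> job =
    CompletableFuture
        .supplyAsync(() -> "outer result", es)   // outer work
        .thenCompose(outer -> {                  // schedule inner tasks without blocking a pool thread
            List<CompletableFuture<String>> inner = new ArrayList<>();
            for (int j = 0; j < 8; j++) {
                int k = j;
                inner.add(CompletableFuture.supplyAsync(
                        () -> "number " + k + " in inner loop", es));
            }
            return CompletableFuture
                    .allOf(inner.toArray(new CompletableFuture[0]))
                    .thenApply(v -> inner.stream()
                            .map(CompletableFuture::join)   // all already completed here
                            .collect(Collectors.toSet()));
        });

Set<String> out = job.join(); // only the caller blocks; no pool thread is parked
es.shutdown();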
The approach you are suggesting, which essentially relies on things resolving themselves if the number of threads exceeds the number of tasks, will not work here as long as everything goes through a single fixed thread pool. You can try it and see. It's a simple case of deadlock, as you have stated in the comments of your code.
In such a case, use two separate thread pools, one for the outer tasks and another for the inner ones. When a task from the inner pool completes, simply return the value to the outer task.
Or you can simply create a thread on the fly, do the work in it, get the result and return it to the outer task.

Thread reduction for a single class in java

I have gotten my code into a state where I am creating a couple of threads and then inside those threads I use a library framework which spawns some additional threads over the life span of my application.
I have no control over how many threads are spawned inside the library framework, but I know they exist because I can see them in the Eclipse debugger. I have kept the threads I use outside the library framework to a minimum, because I really don't want a multithreaded application, but sometimes you have to.
Now I am at the point where I need to do things with sockets and I/O, both of which are inherently hard to deal with in a multithreaded environment. While I am going to make my program thread-safe, I'd rather not get into that situation in the first place, or at least minimize the occurrences. The classes I am attempting to reduce multithreading in aren't time-sensitive, and I'd like them to complete "when they get the time". As it happens the lazy work is all in the same class definition, but due to reasons the class is instantiated a hell of a lot.
I was wondering whether it is possible to make all instances of a single class type use only one thread, even when they are instantiated from multiple threads, and if so, how?
I imagine the only way to achieve this would be to create a separate thread specifically for handling and processing instances of that single class type.
Or do I just have to think of a new way to structure my code?
EDIT: included an example of my applications architecture;
public class Example {
    public static ArrayList<ThreadTypeA> threads = new ArrayList<ThreadTypeA>();

    public static void main(String[] args) {
        threads.add(new ThreadTypeA());
        // left out how dataObj gets to ThreadTypeB for brevity
        dataObj data = new dataObj(events);
    }
}

public class ThreadTypeA implements Runnable {
    public ArrayList<ThreadTypeB> newThreads = new ArrayList<ThreadTypeB>();
    public Thread thread = new Thread(this, "");
}

public class ThreadTypeB {
    // left out how dataObj gets to ThreadTypeB for brevity
    public libObj libObj = new Library(dataObj);
}

public class Library implements Runnable {
    public Thread thread = new Thread(this, "");

    @Override
    public void editMe(dataObj dataObj) {
        dataObj.callBack();
    }
}

public class dataObj {
    // constructed with the list of events to fire

    public void callMe() {
        for (Event event : events) {
            event.callMe();
        }
    }
}
There are a number of different events that can be fired, ranging from writing to files and making SQL queries to sending emails and using proprietary Ethernet-serial comms. I want all events to run on the same thread, sequentially.
Rather than having Threads, consider having Callable or Runnables. These are objects which represent the work that is to be done. Your code can pass these to a thread pool for execution - you'll get a Future. If you care about the answer, you'll call get on the future and your code will wait for the execution to complete. If it's a fire-and-forget then you can be assured it's queued and will get done in good time.
Generally it makes more sense to divorce your execution code from the threads that run it to allow patterns like this.
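Since you want all event work to run sequentially on one thread, a single-threaded executor is the usual tool. A minimal sketch, where EventProcessor and the event names are made up to stand in for your dataObj/event handling:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class EventProcessor {
    // One worker thread: tasks submitted from any thread are queued
    // and executed one at a time, in submission order.
    private static final ExecutorService EVENT_THREAD = Executors.newSingleThreadExecutor();

    public static Future<String> submit(final String eventName) {
        return EVENT_THREAD.submit(new Callable<String>() {
            @Override
            public String call() {
                // write the file / run the SQL query / send the email here
                return "handled " + eventName + " on " + Thread.currentThread().getName();
            }
        });
    }

    public static void main(String[] args) throws Exception {
        Future<String> a = submit("write-log");   // could be called from ThreadTypeA
        Future<String> b = submit("send-email");  // could be called from ThreadTypeB
        System.out.println(a.get());              // block only if you need the result
        System.out.println(b.get());
        EVENT_THREAD.shutdown();
    }
}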
To restrict thread resources use a limited thread pool:
ExecutorService executor = Executors.newFixedThreadPool(4);
for (int i = 0; i < 100; ++i) {
executor.execute(new Runnable() { ... });
}
executor.shutdown();
Reusing the threads of such a pool is also said to be faster than creating new ones each time.
It might be a far hope that the library does a similar thing, and maybe even has the thread pool size configurable.

How to have one java thread wait for the result of another thread?

I frequently need to have a thread wait for the result of another thread. Seems like there should be some support for this in java.util.concurrent, but I can't find it.
Exchanger is very close to what I'm talking about, but it's bi-directional. I only want Thread A to wait on Thread B, not have both wait on each other.
Yes, I know I can use a CountDownLatch or a Semaphore or Thread.wait() and then manage the result of the computation myself, but it seems like I must be missing a convenience class somewhere.
What am I missing?
UPDATE
// An Example which works using Exchanger
// but you would think there would be uni-directional solution
protected Exchanger<Integer> exchanger = new Exchanger<Integer>();

public void threadA() {
    // perform some computations
    int result = ...;
    exchanger.exchange(result);
}

public void threadB() {
    // retrieve the result of threadA
    int resultOfA = exchanger.exchange(null);
}
Are you looking for Future<T>? That's the normal representation of a task which has (usually) been submitted to a work queue, but may not have completed yet. You can find out its completion status, block until it's finished, etc.
Look at ExecutorService for the normal way of obtaining futures. Note that this is focused on getting the result of an individual task, rather than waiting for a thread to finish. A single thread may complete many tasks in its lifetime, of course - that's the whole point of a thread pool.
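A minimal sketch of that approach (the computation is a placeholder, and the checked exceptions of get() are left to the caller):

ExecutorService executor = Executors.newSingleThreadExecutor();

// "Thread B": the work is submitted as a task and runs on the pool.
Future<Integer> future = executor.submit(() -> {
    // perform some computations
    return 42;
});

// "Thread A": blocks here until the task has produced its result.
int resultOfB = future.get();
executor.shutdown();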
So far, it seems like BlockingQueue may be the best solution I've found.
eg.
BlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(1);
The waiting thread will call queue.take() to wait for the result, and the producing thread will call queue.add() to submit the result.
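A minimal sketch (a fragment; the produced value is a placeholder and InterruptedException handling is omitted):

BlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(1);

// Thread B: produce the result and hand it over.
new Thread(() -> queue.add(123)).start();

// Thread A: blocks in take() until the result is available.
int resultOfB = queue.take();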
The JDK doesn't provide a convenience class that provides the exact functionality you're looking for. However, it is actually fairly easy to write a small utility class to do just that.
You mentioned the CountDownLatch and your preference regarding it, but I would still suggest looking at it. You can build a small utility class (a "value synchronizer" if you will) pretty easily:
public class OneShotValueSynchronizer<T> {
    private volatile T value;
    private final CountDownLatch set = new CountDownLatch(1);

    public T get() throws InterruptedException {
        set.await();
        return value;
    }

    public synchronized void set(T value) {
        if (set.getCount() > 0) {
            this.value = value;
            set.countDown();
        }
    }

    // more methods if needed
}
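Usage would then look something like this (a sketch; the value type and numbers are arbitrary, and get() may throw InterruptedException):

OneShotValueSynchronizer<Integer> sync = new OneShotValueSynchronizer<>();

// Thread B publishes the result exactly once.
new Thread(() -> sync.set(42)).start();

// Thread A blocks until the value has been set.
int resultOfB = sync.get();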
Since Java 8 you can use CompletableFuture<T>. Thread A can wait for a result using the blocking get() method, while Thread B can pass the result of computation using complete().
If Thread B encounters an exception while calculating the result, it can communicate this to Thread A by calling completeExceptionally().
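A minimal sketch of that pattern (the computation is a placeholder; get() throws checked exceptions the caller must handle):

CompletableFuture<Integer> resultOfB = new CompletableFuture<>();

// Thread B: compute, then complete (or fail) the future.
new Thread(() -> {
    try {
        int computed = 42;                    // perform some computations
        resultOfB.complete(computed);
    } catch (Exception e) {
        resultOfB.completeExceptionally(e);   // Thread A's get() will rethrow this
    }
}).start();

// Thread A: blocks until complete() or completeExceptionally() is called.
int value = resultOfB.get();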
What's inconvenient in using Thread.join()?
I recently had the same problem, tried using a Future then a CountdownLatch but settled on an Exchanger. They are supposed to allow two threads to swap data but there's no reason why one of those threads can't just pass a null.
In the end I think it was the cleanest solution, but it may depend on what exactly you are trying to achieve.
You might use java.util.concurrent.CountDownLatch for this.
http://download.oracle.com/javase/6/docs/api/java/util/concurrent/CountDownLatch.html
Example:
CountDownLatch latch = new CountDownLatch(1);
// thread one
// do some work
latch.countDown();
// thread two
latch.await();

Java: Best way to retrieve timings from multiple threads

We have 1000 threads that hit a web service and time how long the call takes. We want each thread to return its own timing result to the main application, so that various statistics can be recorded.
Please note that various tools were considered for this, but for various reasons we need to write our own.
What would be the best way for each thread to return the timing - we have considered two options so far :-
1. Once a thread has its timing result, it calls a singleton that provides a synchronised method to write to a file. This ensures that each thread writes to the file in turn (although in an undetermined order, which is fine), and since the call is made after the thread has taken its timing, being blocked while waiting to write is not really an issue. When all threads have completed, the main application can then read the file to generate the statistics.
2. Using the Executor, Callable and Future interfaces
Which would be the best way, or are there any other better ways ?
Thanks very much in advance
Paul
Use the latter method.
Your workers implement Callable. You then submit them to a threadpool, and get a Future instance for each.
Then just call get() on the Futures to get the results of the calculations.
import java.util.*;
import java.util.concurrent.*;

public class WebServiceTester {

    public static class Tester implements Callable<Long> {
        public Long call() {
            long start = System.currentTimeMillis();
            // Do your test here
            long end = System.currentTimeMillis();
            return end - start;
        }
    }

    public static void main(String args[]) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1000);
        Set<Future<Long>> set = new HashSet<Future<Long>>();
        for (int i = 0; i < 1000; i++) {
            set.add(pool.submit(new Tester()));
        }
        List<Long> results = new ArrayList<Long>();
        for (Future<Long> future : set) {
            results.add(future.get());
        }
        pool.shutdown();
        // Manipulate results however you wish....
    }
}
Another possible solution I can think of would be to use a CountDownLatch (from the java.util.concurrent package): each thread decrements it to flag that it has finished, and once the latch reaches 0 your main thread can go through all the workers and ask each one what its time was.
The executor framework can be implemented here. The time processing can be done by the Callable object. The Future can help you identify if the thread has completed processing.
You could pass an ArrayBlockingQueue to the threads to report their results to. You could then have a file writing thread that takes from the queue to write to the file.
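A rough sketch of that queue-based approach (names are invented and the timing body is a placeholder):

BlockingQueue<Long> timings = new ArrayBlockingQueue<>(1000);

// Each worker measures its call and drops the result on the shared queue.
Runnable worker = () -> {
    long start = System.currentTimeMillis();
    // call the web service here
    timings.add(System.currentTimeMillis() - start);
};

// A single writer thread drains the queue and records the results.
new Thread(() -> {
    try {
        while (true) {
            long t = timings.take();
            System.out.println("call took " + t + " ms"); // or append to the file
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}).start();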
